What is the best approach for measuring cloud network latency?
February 14, 2012 3:59 PM

I would like to know the best approach for measuring a cloud provider's network latency and jitter, e.g. on Amazon EC2. While a single ping on the command line gives a round-trip latency snapshot for that one moment, I am looking for something more comprehensive and thorough. Are there any software tools that are particularly recommended, and hopefully easy to use, for measuring network latency and jitter at intervals, preferably on Windows? Is round-trip latency or end-to-end (one-way) latency the better metric to consider? Ping provides round-trip time, although if the packet came back along the same route it went out on, the end-to-end figure could be estimated by dividing by 2. Latency and jitter tend to increase with server load. How could I ramp up server load to, let's say, a specified percentage? Many thanks!
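For the "measuring at intervals" part of the question, here is a minimal sketch of how a series of round-trip samples could be reduced to latency and jitter figures. It assumes the samples (in milliseconds) have already been collected by whatever tool is used. `summarize_rtts` is a hypothetical helper name, and jitter is taken here as the mean absolute difference between consecutive samples, which is one common definition rather than the only one.

```python
from statistics import mean, pstdev


def summarize_rtts(rtts_ms):
    """Summarize round-trip samples (milliseconds) collected at intervals.

    Jitter here is the mean absolute difference between consecutive
    samples. This is an assumed definition; other tools may use another.
    """
    if len(rtts_ms) < 2:
        raise ValueError("need at least two samples to compute jitter")
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return {
        "min_ms": min(rtts_ms),
        "avg_ms": mean(rtts_ms),
        "max_ms": max(rtts_ms),
        "stdev_ms": pstdev(rtts_ms),
        "jitter_ms": mean(diffs),
        # Only a fair estimate if forward and return paths are symmetric.
        "one_way_estimate_ms": mean(rtts_ms) / 2,
    }


if __name__ == "__main__":
    samples = [20.0, 22.0, 21.0, 25.0]  # made-up values for illustration
    print(summarize_rtts(samples))
```

The halved average is only as good as the symmetric-route assumption mentioned in the question.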
posted by conrad101 to Computers & Internet (8 answers total) 5 users marked this as a favorite
It has been a long time since I've used it, but there used to be a program called PingPlotter that I used for tracking intermittent problems with my ISP. It just plots the ping times; I'm not sure how to help with the server load.
posted by gjc at 4:43 PM on February 14, 2012

mtr is a handy command line tool that will measure some of what you want. It's sort of traceroute combined with ping statistics, so you get info on the route to the host. There's also WinMTR for Windows, which I've never used personally.

As for what you're actually trying to measure, you're much better off measuring whatever application you really care about. I.e., if it's a web server, do an HTTP-based test. If you really want to measure at the network layer (instead of the application layer), ICMP ping is not a good tool because it's often treated differently from TCP. Also, mtr might mislead you since it doesn't give correct results if the route changes or is asymmetric, which is common.
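As one way to measure with TCP rather than ICMP, here is a minimal sketch that times the TCP handshake to a host and port. The function names are hypothetical, and the handshake time is only a rough stand-in for one network round trip plus some local overhead, not a full application-level test.

```python
import socket
import time


def tcp_connect_time_ms(host, port, timeout=3.0):
    """Return the time taken to open a TCP connection, in milliseconds.

    If `host` is a name rather than an IP address, the DNS lookup is
    included in the measurement; pass an IP address to exclude it.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0


def sample_connect_times(host, port, count=5, interval_s=1.0):
    """Take `count` handshake timings, `interval_s` seconds apart."""
    samples = []
    for i in range(count):
        samples.append(tcp_connect_time_ms(host, port))
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Something like `sample_connect_times("203.0.113.10", 80)` (a placeholder address) would then give a list of timings to summarize or plot.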

What I'd do is set up Wireshark to log all packets to my server, then generate a bunch of load, then use Wireshark's tools to categorize TCP round-trip time. That'll give you a much more realistic assessment of the network. There may be a tool that does exactly that analysis on pcap logs, but I don't know of one. Wireshark has some basic statistics built in.
posted by Nelson at 5:02 PM on February 14, 2012 [1 favorite]

First impulsive answer: no. Comprehensive + Easy + Windows is like one of those "pick any two" things, but more like "pick one". There is SmokePing, which is awesome but not truly comprehensive, difficult on Linux (a pain to configure: Perl, RRD, web server, FastCGI, etc.), and possible on Windows (but definitely not simple point and click). Once running, it will give beautiful drill-downable graphs of ping variance over scores and scores of hosts. It's still just groups of fast pings and storage of the results.

One-way ping is very, very hard. You want OWAMP. And it's not Windows, it's not easy, and both machines have to have excellent clocks synchronized closely to stable time sources or the data is lost in variance. We're talking possibly add-on reference clock cards, stratum 1 time sources, GPS+CDMA, and it's still sorta iffy as to whether you can keep the endpoints in sync enough to get good readings of one-way trip time. It is also something you do not want to run on a server with other applications or any sort of virtualization; there you'd get no better idea of one-way time than by taking RTT/2.
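To illustrate why clock synchronization matters so much here, a small worked example with made-up numbers. The function names and delay values are assumptions for illustration only. A one-way figure is computed from timestamps on two different clocks, so any offset between them lands directly in the result, while a round trip is timed on the sender's clock alone.

```python
def measured_one_way_ms(true_delay_ms, receiver_clock_offset_ms):
    """One-way delay as computed from a send timestamp on one clock and
    a receive timestamp on another. The clock offset is added to the
    true delay and cannot be told apart from it."""
    return true_delay_ms + receiver_clock_offset_ms


def measured_rtt_ms(forward_ms, reverse_ms):
    """Round-trip time uses only the sender's clock, so offset cancels."""
    return forward_ms + reverse_ms


if __name__ == "__main__":
    forward, reverse = 12.0, 18.0  # assumed asymmetric path delays
    offset = 5.0                   # receiver clock runs 5 ms ahead
    print(measured_one_way_ms(forward, offset))   # 17.0, not 12.0
    print(measured_rtt_ms(forward, reverse) / 2)  # 15.0, also not 12.0
```

With a 5 ms clock error on a 12 ms path, the "precise" one-way reading is further off than the crude RTT/2 estimate.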

Now, if you're interested in general network end-to-end performance, a good place to start might be the perfSONAR project, but that tends to be Comprehensive + Hard + Not Windows.

What are you trying to accomplish? I also suspect you might find some sort of service-speed test more appropriate. But you could probably pull off some sort of `fping` logging -> Excel spreadsheet -> graph sort of thing if you really wanted. I just think the numbers would vary so much due to the virtualized nature of cloud apps that the data would look more like a shotgun than a bullet.
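The logging -> spreadsheet idea could be sketched roughly as follows. This is not `fping` itself: `log_samples` is a hypothetical helper that takes any zero-argument callable returning a latency in milliseconds and writes timestamped CSV rows that Excel can open.

```python
import csv
import time
from datetime import datetime, timezone


def log_samples(measure, out_file, count, interval_s=1.0):
    """Call `measure()` `count` times and write timestamped CSV rows.

    `measure` is any zero-argument callable returning a latency in
    milliseconds, or raising OSError when a probe is lost.
    """
    writer = csv.writer(out_file)
    writer.writerow(["timestamp_utc", "latency_ms"])
    for i in range(count):
        stamp = datetime.now(timezone.utc).isoformat()
        try:
            writer.writerow([stamp, f"{measure():.3f}"])
        except OSError:
            writer.writerow([stamp, ""])  # blank cell marks a lost probe
        if i < count - 1:
            time.sleep(interval_s)
```

On Windows the output file is best opened with `open("latency.csv", "w", newline="")` so the csv module does not emit doubled line endings.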
posted by zengargoyle at 6:06 PM on February 14, 2012

Even before all this, are you prepared to move away from e.g. EC2 to another provider based on the results of your tests here? Because I can guarantee that sending your numbers to Amazon in hopes of enforcing some SLA or whatever is going to hit a brick wall quickly.

Another thing you can do with Wireshark is save a conversation to disk in tcpdump (pcap) format, which can then be replayed over and over or in parallel. Parallel is not so easy on Windows unless there are improved forking/subprocess capabilities in PowerShell/cmd, but it will enable you to use actual traffic to model load on your machines.
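For the parallel part, one possible workaround that does not depend on forking is a thread pool, which behaves the same on Windows. This is a minimal sketch: `run_in_parallel` is a hypothetical helper, and `task` stands in for whatever replays one captured request, which is not shown here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(task, workers, repeats):
    """Run `task()` `repeats` times across `workers` threads and return
    the duration of each call in milliseconds."""
    def timed_call(_):
        start = time.perf_counter()
        task()
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed_call, range(repeats)))
```

Threads suit this because a replay spends most of its time waiting on the network rather than computing.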
posted by rhizome at 6:35 PM on February 14, 2012

Have you looked into Amazon's CloudWatch service? I believe it can give you exactly what you're looking for.
posted by machinecraig at 7:07 PM on February 14, 2012

Paessler's PRTG can provide the graphing you're looking for, and if you combine it with the remote probes it'll be able to report things like jitter and QoS. I've used it pretty extensively, and their support is great too.
posted by Runes at 7:11 PM on February 14, 2012

Check out SolarWinds to see if they have any free tools that will work.
posted by roboton666 at 10:19 PM on February 14, 2012

Thanks for the input everyone!

My goal is just to test and benchmark the impact on network latency and jitter of working in the abstracted cloud.

As Nelson mentioned, a key part of this is the applications that are to be run on the virtualized server, which will affect latency. Is it possible to simulate, server-side, the effects of high levels of web server activity and perhaps latency-sensitive applications, for example VoIP?
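As a rough starting point for loading a server to a specified percentage, here is a minimal sketch that alternates busy-waiting and sleeping within a fixed period. The names `duty_cycle` and `burn_cpu` are hypothetical. It only approximates CPU load on one core and does not imitate real web server or VoIP traffic.

```python
import time


def duty_cycle(target_percent, period_s=0.1):
    """Split one period into (busy_s, idle_s) for a target CPU load."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    busy = period_s * target_percent / 100.0
    return busy, period_s - busy


def burn_cpu(target_percent, duration_s, period_s=0.1):
    """Keep one core roughly `target_percent` busy for `duration_s`."""
    busy_s, idle_s = duty_cycle(target_percent, period_s)
    end = time.perf_counter() + duration_s
    while time.perf_counter() < end:
        spin_until = time.perf_counter() + busy_s
        while time.perf_counter() < spin_until:
            pass  # busy-wait to consume CPU
        time.sleep(idle_s)
```

Running one such process per core would be needed to load a multi-core instance, and the achieved percentage should be checked against the system's own monitor.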

I am trying to get SmokePing to work. CloudWatch is something I was not aware of, and I will look at it closely.

Thanks again!
posted by conrad101 at 10:23 AM on February 15, 2012
