Earlier this week, Microsoft Research and the Polytechnic Institute of NYU issued a joint research paper entitled
Measuring and Evaluating Large-Scale CDNs. The authors conducted measurements and testing to better understand the performance of Limelight Networks' and Akamai's CDNs. Industry expert Dan Rayburn is reporting on this paper, and asked Limelight Networks to provide him with our thoughts and analysis of the methodology. Many of our best technical experts, from C-level execs to operations engineers to customer-facing solutions engineers, collaborated on our response. Here, in the spirit of transparency, we print our full message to Dan.
I’m writing to follow up with our impressions of the Microsoft Research/Polytechnic Institute of NYU paper released earlier this week, per your request. Thank you for the opportunity to offer your readers our thoughts on the paper’s approach.
First, we’re pleased that CDN performance is a topic on the agenda at the Internet Measurement Conference – it is great that more venues are promoting a deeper understanding of our industry. The paper itself is a worthwhile contribution to the study of CDN performance, and we support its overall conclusion that the Limelight and Akamai models are very different but deliver similar results.
However, it is important to note that the research was limited to studying essentially just two aspects of CDN performance: server uptime and single-packet latency. The real world of CDNs and their customers, on the other hand, is much more than two dimensions, as publishers have widely varying combinations of object sizes, library sizes, audience sizes and locations, patterns of object demand distribution, and object turnover rates.
Further, every CDN has a unique combination of customers and therefore a unique macro universe of objects, libraries, and object demand distribution patterns. They have varied physical and logical architectures, different numbers and locations of servers, different networks, and varying connectivity (direct and indirect) to the networks that make up the global Internet. Conditions in the CDN’s own infrastructure and the external Internet fluctuate rapidly, and the definition of “performance” is different for every customer.
Achieving high CDN performance on a global basis, across hundreds or thousands of diverse customers, and doing so consistently on a day-in, day-out basis in a highly dynamic Internet environment, is very difficult – which is one reason why many attempts at building a global CDN fail. Doing this successfully requires mastering difficult architectural and operational challenges across many dimensions, not just two.
This is why we always recommend augmenting any “laboratory-style” testing with a real-world trial using production content. After all, the end goal is to ensure a brilliant end-user experience, not to optimize algorithms or performance metrics in a lab.
Some examples of how measuring only two dimensions can lead to incomplete information:
The paper measures aggregate availability of the individual servers that comprise each CDN. But server downtime simply does not matter if it is not visible to users. CDN operators often stop providing a server’s IP address to potential users during planned server downtime, such as for software updates, hardware maintenance, and the like.
We suggest the researchers should have measured uptime associated with actual server IP addresses provided in CDN DNS resolutions, weighted by the object demand per resolution (note that this second bit of information is not available to the researchers without the cooperation of the CDN, though the researchers might have considered methods of estimating it). Evaluated on this basis, our guess is that Akamai and Limelight would have been essentially comparable, both with very high availability, likely over 99.9%.
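To make the suggested metric concrete, here is a minimal sketch of demand-weighted availability. The resolution log and the demand weights are invented for illustration; as noted above, real demand-per-resolution data would require the CDN's cooperation or an estimation method.

```python
# Hypothetical sketch: availability weighted by estimated object demand,
# computed only over server IPs that actually appeared in CDN DNS
# resolutions. All weights below are illustrative assumptions.

def weighted_availability(probes):
    """probes: list of (demand_weight, was_reachable) pairs, one per
    DNS-resolved server IP observed during the measurement window.
    Returns the demand-weighted fraction of probes that succeeded."""
    total = sum(weight for weight, _ in probes)
    up = sum(weight for weight, ok in probes if ok)
    return up / total if total else 0.0

# Example: one lightly used IP down, three heavier IPs up.
log = [(0.50, True), (0.30, True), (0.15, True), (0.05, False)]
print(f"{weighted_availability(log):.3f}")  # 0.950
```

Weighting this way keeps invisible, planned downtime from dragging the number down: an IP that was withdrawn from DNS never enters the probe set at all.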
The paper measures round-trip latency for a single-packet data request to a CDN server. However, CDN performance is a much more complex issue than this. Object size, total demand on servers, and network packet loss are far more important than single-packet latency. Further, cache-hit ratio, network-hit ratio, cache-miss performance, and network-miss performance can vastly overwhelm all other issues if managed well or poorly.
We suggest the researchers should have measured actual delivery performance across a range of object sizes and object demand distribution patterns – in other words, real world object deliveries. Publisher libraries are not all alike, nor are their audiences – which is why we recommend that customers do their own performance evaluations.
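As a rough illustration of what "real world object deliveries" could look like in an evaluation, the sketch below summarizes delivery timings by object-size class. The size buckets and sample records are invented for illustration; a real trial would time actual downloads of a publisher's production objects.

```python
# Hypothetical sketch: summarizing real-object delivery tests by size
# class, since single-packet latency says little about large-object
# throughput. Sample data is illustrative, not measured.
from statistics import median

def throughput_by_size(samples):
    """samples: list of (object_bytes, seconds) delivery timings.
    Returns median throughput (bytes/sec) per size bucket."""
    buckets = {}
    for size, secs in samples:
        label = "small (<100KB)" if size < 100_000 else "large (>=100KB)"
        buckets.setdefault(label, []).append(size / secs)
    return {label: median(rates) for label, rates in buckets.items()}

# Illustrative timings: two small objects, two large ones.
samples = [(50_000, 0.05), (80_000, 0.10), (5_000_000, 2.0), (20_000_000, 10.0)]
print(throughput_by_size(samples))
```

A small object's delivery time is dominated by latency, while a large object's is dominated by sustained throughput and loss handling, which is why a single number cannot rank CDNs for every publisher's library.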
In summary, while the Microsoft Research paper provides some good thinking on elements of CDN performance and some initial approaches to measuring certain aspects of it, customers are still well-advised to take matters into their own hands and do their own performance evaluations. There is no research study better than your own actual experience.
I’m sure that you will receive many questions after your post goes live. Please know that we are happy to answer any questions about CDN measurement, our architecture and approach, and our real-world implementations. Feel free to direct questions to my email address, and I’ll work with the appropriate Limelight expert on a response for you to publish, or even schedule a one-on-one call with a technical expert if necessary.
Senior Director, Corporate Communications
On behalf of the Limelight Networks Team