Benchmarking cloud servers: A Cloud Computing Insider’s Guide

Many new customers want to test performance when they start using CloudSigma; they are often looking to compare results between clouds and against their own infrastructure, and that makes sense. A straight price comparison by resource doesn’t tell anything like the whole story; what really matters is the end result: how much does it cost to achieve a specific computing task? For any given requirement the amount of resources needed may vary widely between clouds, so comparing prices alone doesn’t work. The flip-side is that comparing performance in isolation isn’t any better. Meaningful comparisons need to pull together both price and performance to calculate some measure of cost per computing unit. In this post I’m going to share some thoughts from benchmarking our cloud and others, provide some tips for getting useful results, and explain what those results really mean.

Health Warnings

To be upfront, I’m quite sceptical about benchmarking in general because it rarely offers a true insight into real world usage. There is no real replacement for running the actual applications you intend to use on the platform, simulating the load and usage patterns you expect to deal with. If you can do that at a reasonable cost in terms of time, nothing else comes close.

Another factor is how busy the cloud vendor is. You may benchmark a cloud and get excellent results but these may be largely due to the level of usage (or lack thereof) of that particular vendor. That may not be a positive sign as it might reflect difficulties in the operation, lost customers, past issues with availability and reliability etc. You should always therefore research the cloud vendor for past outages and other potential problems when interpreting their benchmark results.

As a final health warning, performance isn’t the only factor you should consider. Lower performance can often reflect a more robust (and redundant) hardware architecture underneath. It’s therefore always important to have a very clear understanding of what infrastructure the cloud is built on, so you can compare results fairly and make a meaningful purchasing decision.

Define the problem

Later in this post I set out the various aspects of performance and how best to interpret the results. Before doing any benchmarking, however, it is important to characterise the kind of computing you will be undertaking in the cloud; this determines the relative importance of the different performance metrics. For example, if you are placing a database server that will be under heavy read access but low write access, you should pay attention to disk performance in the cloud and particularly random (non-sequential) read access.

So, before you start any benchmarking, actually codify what you would consider good performance from the cloud, i.e. which aspects are key and have a disproportionate impact on the real world performance of your computing. Once you have a clear idea of this, you are in a position to start benchmarking.

Computational Performance

When we look at raw computational performance we are talking about CPU and RAM. The differences between clouds at a pure computational level are actually not that great, but a few factors do cause real differences.

By far the biggest factor affecting computational performance in the cloud is contention. Public clouds are multi-tenant environments. RAM and storage cannot be actually over-allocated (although they can be over-sold) but CPU can and is. The levels of contention vary considerably but essentially public cloud vendors are able to sell the CPU capacity of a physical host at more than 100%. Some of the largest vendors use CPU contention ratios of over three times i.e. the total ‘sold’ CPU capacity of all the virtual servers on the same physical machine might be three times its actual CPU capacity. They do this because most virtual servers aren’t utilising anything like 100% of their CPU allocation for most of the time. Still, contention ratios will directly affect performance benchmarks and real world usage. If contention is high (i.e. at anything more than 200% CPU allocation) then cloud server performance will deteriorate significantly. Simply put, if the load on any physical machine hits more than 1 per core, computational tasks are being queued and the time taken for that virtual machine to complete the job will be longer. Given that most clouds charge on a capacity/hour basis this has a direct cost impact for customers of that cloud.
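The arithmetic of contention can be sketched in a few lines. This is an illustrative model only; the ratios and the pro-rata sharing assumption are simplifications, not any vendor’s real scheduling behaviour.

```python
# Sketch: how CPU contention can dilute the capacity you actually receive.
# Assumes capacity is shared pro rata once the host saturates (a simplification).

def effective_ghz(purchased_ghz: float, contention_ratio: float,
                  peak_demand_fraction: float) -> float:
    """Worst-case CPU you can expect if every tenant on the host demands
    `peak_demand_fraction` of their allocation at the same time."""
    total_demand = contention_ratio * peak_demand_fraction
    if total_demand <= 1.0:
        return purchased_ghz             # host not saturated: full allocation
    return purchased_ghz / total_demand  # saturated: tasks queue, capacity shared

# A 2.0GHz cloud server on a host sold at 3x, with all tenants at 50% load:
print(round(effective_ghz(2.0, 3.0, 0.5), 2))
```

The point of the sketch is the cost link mentioned above: on a capacity/hour pricing model, a job that takes 1.5x as long on a contended host costs 1.5x as much.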

The other important factor affecting computational performance is the number of CPU cores the virtual machine has access to. This isn’t a factor for all applications, but many modern applications do support multi-threading, meaning the application and/or operating system can spread computational tasks across multiple cores. One great tip for improving performance is matching the number of threads an application can support to the number of cores the virtual machine has access to. Unfortunately this isn’t possible with many public clouds as their virtualisation platforms don’t support virtualisation at the CPU core level, i.e. each core can only be in use by one virtual machine at a time. In clouds that do support virtualisation of CPU cores, you should experiment with varying the number of cores whilst keeping the total CPU size the same. For example, with a 2GHz machine you can see how doubling the cores in use from two to four affects your benchmarks; applications on that virtual machine will then be able to execute tasks across four cores simultaneously. In our case you can set the number of cores a virtual machine uses via the ‘advanced’ tab of the server detail modal in the web console. Just remember to check the cloud vendor’s standard core size before manually overriding the number of cores in use. In our case it’s 2.2GHz per core, but it does vary from cloud to cloud.
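A quick way to see whether extra cores actually help your workload is to time the same CPU-bound job with different worker counts. The sketch below uses Python’s multiprocessing pool; the task sizes are arbitrary and the timings will of course vary by machine.

```python
# Sketch: time a CPU-bound job with 1, 2 and 4 workers to see core scaling.
import time
from multiprocessing import Pool

def burn(n: int) -> int:
    """A small CPU-bound task: sum of squares up to n."""
    return sum(i * i for i in range(n))

def timed_run(workers: int, tasks: int = 8, size: int = 200_000) -> float:
    """Seconds taken to complete `tasks` copies of the job with `workers` processes."""
    start = time.perf_counter()
    with Pool(workers) as pool:
        pool.map(burn, [size] * tasks)
    return time.perf_counter() - start

if __name__ == "__main__":
    for workers in (1, 2, 4):
        print(f"{workers} worker(s): {timed_run(workers):.2f}s")
```

If the timings stop improving as workers increase, either the workload doesn’t parallelise or the virtual machine isn’t really getting more cores, which is exactly what this experiment is meant to reveal.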

I’d recommend POV-Ray, CoreMark, Dhrystone or Whetstone for benchmarking computational performance.

Storage: the real cloud performance differentiator

All performance is limited by the weakest link, where a bottleneck develops. Virtualisation technology has advanced significantly with respect to CPU and RAM, i.e. a single physical machine can be virtualised into multiple cloud servers with minimal loss of total aggregate performance. Sadly, in the case of storage there is still a great deal of progress to be made. The end result is that in most cases the performance of virtual servers in the cloud is determined by the performance of that cloud’s storage solution. In short, storage is currently the limiting factor on most computational tasks in the cloud. Whatever results POV-Ray and other benchmarks may produce for pure computational tasks, the reality is that the speed with which the virtual server can read and write data to physical storage disks will determine the real world performance of a cloud server.

With that in mind, the real differences in cloud performance, even for computational tasks, tend to stem from differences in storage performance. As mentioned earlier in this post, customer needs differ greatly depending on the computing task, and this is never more true than for storage. Do you need fast read access to large sequential chunks of data (such as streaming media), or to small disparate pieces of information (perhaps in a large database)? Do you need to sustain heavy write access for fast-changing data which is accessed periodically in large bursts? There are numerous scenarios and each will perform differently on the same platform.

Fundamentally the differences in performance come down to architecture, and those differences in architecture usually result from different degrees of robustness with respect to the storage of data, its redundancy and therefore its actual likelihood of ever becoming irreparably lost. At a high level, clouds either employ centralised storage in the form of a Storage Area Network (SAN) or more distributed local storage located on each individual physical machine. Good SANs intrinsically have a high level of redundancy built in, but performance suffers as data needs to be sent from the SAN across the network to the virtual machine’s CPU and RAM for computing tasks. As a result, SAN-based clouds tend to have lower like-for-like performance than clouds with local distributed storage. Another disadvantage of a SAN is that it represents a very large single point of failure. SANs are extremely reliable, but if they ever do go seriously wrong (and they have), you are likely to face a very large outage and corruption of data. Most cloud vendors using SANs do not employ fully redundant fail-over solutions of the kind used in enterprise environments, largely for cost reasons. It’s important to realise that not every SAN is equal, and to find out from the cloud vendor what level of redundancy they employ with their SANs.

Local storage based clouds tend to have good disk performance but often they only offer local storage in a non-persistent form. This isn’t a fair comparison to persistent storage as temporary storage doesn’t have to be robust to failures in the same way as permanent storage. It is always important to compare persistent storage with persistent storage for meaningful results.

When looking at clouds with distributed local storage solutions you also need to know what redundancy they have. Hard disk drives fail at a significant rate, so the method of storage is critical. Most vendors use some form of RAID, but there are very different levels of safety. At the low end you have RAID1, where two disks essentially mirror each other. This usually performs well, but when one disk fails the data is at risk of complete loss until the replacement disk has copied all the data, should the second (now heavily loaded) disk also fail. During any rebuild of a RAID1 array, disk performance is also likely to be much lower than normal. Many vendors therefore use RAID5 (resilient to one disk failure) or RAID6 (resilient to two disk failures). RAID6 offers by far the safest solution for local storage but demands a big penalty on performance. Our approach is to use RAID6 combined with top-of-the-line hardware RAID controller cards which have large, battery-backed memory caches. The RAID controller cards we use are actually significantly more expensive than the whole disk arrays, but they mean we can deliver performance comparable with much less resilient set-ups whilst still offering the very large safety net of RAID6 storage. Read more about our cloud infrastructure set-up, which we are very open about.
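The capacity and resilience trade-offs between these RAID levels can be worked through with simple arithmetic. A minimal sketch, assuming equal-sized disks and ignoring controller and filesystem overhead:

```python
# Sketch: usable capacity and failure tolerance for the RAID levels discussed.
# Assumes equal-sized disks; real arrays lose further capacity to overhead.

def raid_usable_tb(level: int, disks: int, disk_tb: float) -> float:
    if level == 1:                     # mirroring: half the raw capacity
        return disks * disk_tb / 2
    if level == 5:                     # one disk's worth of parity
        return (disks - 1) * disk_tb
    if level == 6:                     # two disks' worth of parity
        return (disks - 2) * disk_tb
    raise ValueError("unsupported RAID level")

FAILURES_TOLERATED = {1: 1, 5: 1, 6: 2}

# An 8 x 2TB array under each scheme:
for level in (1, 5, 6):
    print(f"RAID{level}: {raid_usable_tb(level, 8, 2.0):.0f}TB usable, "
          f"survives {FAILURES_TOLERATED[level]} disk failure(s)")
```

Note that RAID6 gives up relatively little capacity versus RAID5 on larger arrays while doubling the number of failures survived; the performance penalty, not capacity, is its real cost.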

I recommend using IOzone or Bonnie++ for disk performance benchmarks.
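For a feel of what those tools measure, here is a deliberately crude probe of sequential write throughput and small random reads. It is a sketch only: the operating system’s page cache will flatter the random-read numbers, which is exactly why purpose-built tools like IOzone and Bonnie++ exist.

```python
# Sketch: crude sequential-write / random-read timing. Illustrative only;
# the OS page cache means random reads here may not hit the physical disk.
import os
import random
import time

def disk_probe(path: str, size_mb: int = 64, reads: int = 200) -> tuple[float, float]:
    block = os.urandom(1024 * 1024)          # 1MB of incompressible data
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())                 # force the data out to disk
    write_mb_s = size_mb / (time.perf_counter() - start)

    start = time.perf_counter()
    with open(path, "rb") as f:
        for _ in range(reads):               # 4KB reads at random offsets
            f.seek(random.randrange(size_mb * 1024 * 1024 - 4096))
            f.read(4096)
    read_iops = reads / (time.perf_counter() - start)
    os.remove(path)
    return write_mb_s, read_iops

w, r = disk_probe("probe.tmp")
print(f"sequential write: {w:.0f} MB/s, random 4K reads: {r:.0f} IOPS")
```

Even this crude probe will usually expose the sequential-versus-random gap discussed above; match whichever figure dominates your workload to the storage architecture on offer.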

So, when interpreting the results of storage benchmarks make sure you also have the following information:

  • what storage architecture is the cloud using (local, SAN, other)?
  • what fail-over and redundancy measures are in place for the data stored?
  • is the storage I’m benchmarking temporary or persistent?

Putting the answers to these three questions together with the results of the benchmarking will give you a fairly good insight into the actual storage performance.

Networking

The performance of networking is significantly more straightforward to determine and measure than computational and disk performance. Networking performance has two key aspects, latency and bandwidth.

Depending on your needs, the latency of the network used by the cloud vendor may or may not be important. If you are using the cloud for largely self-contained operations or overnight routines, it’s unlikely that latency will be a priority. If, however, you are running real-time applications interacting with the world outside the cloud, latency will be a critical performance determinant.

Usually the vast majority of latency results from sheer physical distance, i.e. most of the latency between London and San Francisco is actually the time it takes light to cover that distance. Differences in latency are determined by the varying efficiency of the route taken, and it is this aspect that differs from cloud to cloud. The efficiency of the route is a direct result of the network providers the cloud has direct connections to, either by taking IP connectivity from them or through peering. You can simply ping your cloud server to measure latency, but it is important to measure it between your actual end users and your cloud server. In other words, if most of your users are based in one geography, or access will be primarily from your company’s head office, test performance from those locations. Commercial services such as Pingdom offer a cost-effective way of measuring latency from a large number of locations worldwide simultaneously.
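Where ICMP ping is blocked, timing a TCP connect is a serviceable stand-in. The sketch below is self-contained, measuring against a local listener so it runs anywhere; in practice you would point `probe` at your cloud server’s address and an open port.

```python
# Sketch: median TCP connect latency as a rough stand-in for ping.
# Demonstrated against a local listener; point `probe` at a real host in practice.
import socket
import threading
import time

def probe(host: str, port: int, samples: int = 5) -> float:
    """Median TCP connect time to (host, port) in milliseconds."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=2):
            pass                          # connect, then close immediately
        times.append((time.perf_counter() - start) * 1000)
    return sorted(times)[len(times) // 2]

# Local listener standing in for a remote cloud server:
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(16)
port = server.getsockname()[1]
threading.Thread(target=lambda: [server.accept() for _ in range(5)],
                 daemon=True).start()

print(f"median connect latency: {probe('127.0.0.1', port):.2f} ms")
```

Taking the median rather than the mean keeps one slow handshake from skewing the result, the same reason ping utilities report several samples.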

The actual bandwidth your cloud server can achieve is also very important. Unlike more traditional hosting solutions, cloud vendors tend to charge on the aggregate volume of data transferred rather than on a time-dependent, per-Mbit basis that provides a fixed level of connectivity 24/7. Despite this, many cloud vendors will ‘throttle’ the bandwidth available to any virtual server, and this is invisible to the user until you hit that barrier. If you have quite a spiky bandwidth profile this could be an important performance factor to take into consideration.

To test the actual bandwidth of your cloud server, it’s important to download data to it from a source that isn’t restricting the transfer rate at its end. A great way of determining the speed available is to download a large file from a major vendor such as Microsoft or Ubuntu, or better still to patch the operating system; patching tends to download many different files from various locations simultaneously and will give you a pretty good feel for the speed of your connection. I often download a Fedora live CD from their main site as a standard test, but you should always experiment with at least a few different files and locations. If you are sitting on your own very fast corporate network, you may instead want to download a file from your cloud server to your own network as a test.
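Whichever source you pick, the measurement itself is just bytes over elapsed time. The sketch below streams from an in-memory buffer so it runs anywhere; in a real test you would pass it the response object from `urllib.request.urlopen` pointed at a large file on a fast mirror.

```python
# Sketch: time a streamed download and report MB/s. Demonstrated on an
# in-memory stream; pass a real urllib response object in practice.
import io
import time

def throughput_mb_s(stream, chunk_size: int = 64 * 1024) -> float:
    """Read `stream` to exhaustion and return the average rate in MB/s."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    elapsed = time.perf_counter() - start
    return (total / (1024 * 1024)) / max(elapsed, 1e-9)

# Stand-in for a real download: 32MB served from memory.
fake_download = io.BytesIO(bytes(32 * 1024 * 1024))
print(f"{throughput_mb_s(fake_download):.0f} MB/s")
```

Reading in fixed-size chunks rather than all at once matters for the spiky-bandwidth point above: per-chunk timestamps would also let you spot a throttle kicking in partway through a transfer.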

Now Add Pricing Back in to Weight the Results

Using the methods above you should be able to get a pretty good feel for how the various cloud server vendors perform overall, and to identify which aspects, being the most important to your particular needs, to focus in on.

The final step is to add a pricing dimension to the benchmark results. There is no formula for this as it depends on the relative importance of the various aspects outlined above, and that is determined by you. In a simple scenario, if one cloud delivers 40% better performance (as measured by you) but is only 30% more expensive, then clearly it looks attractive. Likewise, if you have a large bandwidth need, lower computational performance may be trumped by a competitive data transfer pricing plan. The key to making the right decision is to pull in all the various factors, of which any one aspect of performance is just one contributor.
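One simple way to formalise this is a weighted performance-per-price score. Everything in the sketch below is made up for illustration: the vendor names, the benchmark scores and the weights are all yours to supply from your own testing and priorities.

```python
# Sketch: weighted performance per unit of price. All names, scores and
# weights below are illustrative placeholders, not real vendor data.

def value_score(scores: dict, weights: dict, monthly_price: float) -> float:
    """Weighted performance per currency unit; higher is better."""
    performance = sum(scores[k] * weights[k] for k in weights)
    return performance / monthly_price

weights = {"cpu": 0.3, "disk": 0.5, "network": 0.2}   # your priorities
clouds = {
    "cloud_a": ({"cpu": 100, "disk": 80, "network": 90}, 100.0),
    "cloud_b": ({"cpu": 140, "disk": 90, "network": 85}, 130.0),
}
for name, (scores, price) in clouds.items():
    print(name, round(value_score(scores, weights, price), 3))
```

With these particular weights the 40%-faster-CPU vendor actually scores slightly worse per unit of price, which illustrates the point: the answer depends entirely on how you weight the aspects that matter to you.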

As I outlined under ‘health warnings’, benchmarking should be part of a larger process of determining which cloud is right for you. This should include other aspects such as service level agreements, data/vendor lock-in considerations, physical location, legal jurisdiction and quite simply whether the company is one that you actually like working with. By pulling together all these aspects you’ll position yourself to make the right choice of cloud computing vendor.