Benchmarking cloud servers: A Cloud Computing Insider’s Guide

When new customers start using CloudSigma, many want to test performance. They often benchmark cloud servers against their own infrastructure, and that makes sense. A straight price comparison by resource doesn’t tell anything like the whole story. What really matters is the end result: how much does it cost to achieve a specific computing task?

For any given requirement the number of resources needed to achieve it may vary widely between clouds. This means that comparing prices alone doesn’t work. The flip side is that comparing performance in isolation isn’t any better. Meaningful comparisons need to pull together both price and performance to calculate some measure of cost per computing unit. In this post I’m going to share some of my thoughts from benchmarking our cloud servers and others. I will also provide some tips for getting useful results and for understanding what they really mean.
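As a rough illustration of what a cost per computing unit can look like, here is a minimal sketch. The prices and run times are made-up figures, and `cost_per_task` is a hypothetical helper, not part of any vendor's tooling.

```python
def cost_per_task(price_per_hour: float, hours_to_complete: float) -> float:
    """Cost of finishing one fixed workload on a given cloud."""
    return price_per_hour * hours_to_complete


# Hypothetical figures for illustration only.
# Cloud B charges more per hour but finishes the same job sooner.
cloud_a = cost_per_task(price_per_hour=0.10, hours_to_complete=5.0)
cloud_b = cost_per_task(price_per_hour=0.13, hours_to_complete=3.5)

print(f"Cloud A: {cloud_a:.3f} per task")
print(f"Cloud B: {cloud_b:.3f} per task")
```

With these invented numbers the cloud with the higher hourly price ends up cheaper per task, which is exactly why neither price nor performance should be compared alone.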

Health Warnings

To explain upfront, I’m quite skeptical about benchmarking in general. It rarely offers a true insight into real-world usage. In short, there is no real replacement for running the actual applications you intend to use on the platform. If you can do this at a reasonable cost in terms of time, it will tell you far more than any synthetic benchmark.

Another factor is how busy the cloud vendor is. You may benchmark cloud servers and get excellent results. However, these may be largely due to the level of usage (or lack thereof) of that particular vendor. That may not be a positive sign. It might reflect difficulties in the operation, lost customers, past issues with availability and reliability, etc. You should always, therefore, research the cloud vendor for past outages and other potential problems when interpreting their benchmark results.

As a final health warning, performance isn’t the only factor you should consider. Often lower performance can reflect a more robust (and redundant) underlying hardware architecture. It’s always important, therefore, to have a very clear understanding of what infrastructure the cloud is built on. That way you can compare results fairly and make a meaningful purchasing decision.

Define the problem

Later in this post, I set out the various aspects of performance and how best to go about interpreting the results. Before doing any benchmarking, however, it is important to characterize the kind of computing you will be looking to undertake in the cloud; this will determine the relative importance of different performance metrics. For example, if you are looking to place a database server that will be under heavy read access but low write access, you should pay attention to the disk performance in the cloud and particularly non-sequential read access.

So, before you start benchmarking any cloud servers, actually codify what you would consider to be good performance from the cloud. You should determine which aspects are key and have a disproportionate impact on the real-world performance of your computing. Once you have a clear idea of this, you are in a position to start looking at benchmarking.

Computational Performance

When we look at raw computational performance we are talking about CPU and RAM. The differences in performance at a pure computational level between clouds are actually not that great. However, a few factors do cause real differences.

By far the biggest factor affecting computational performance in the cloud is contention. Public clouds are multi-tenant environments. RAM and storage cannot actually be over-allocated (although they can be over-sold) but CPU can be, and is. The levels of contention vary considerably but essentially public cloud vendors are able to sell the CPU capacity of a physical host at more than 100%.

Some of the largest vendors use CPU contention ratios of over three to one. This means the total ‘sold’ CPU capacity of all the virtual servers on the same physical machine might be three times its actual CPU capacity. They do this because most virtual servers aren’t utilizing anything like 100% of their CPU allocation for most of the time. Still, contention ratios will directly affect cloud server performance benchmarks and real-world usage. If the contention is high (i.e. at anything more than 200% CPU allocation) then cloud server performance will deteriorate significantly.

Simply put, if the load on any physical machine hits more than 1 per core, computational tasks are being queued and the time taken for that virtual machine to complete the job will be longer. Given that most clouds charge on a capacity/hour basis this has a direct cost impact for customers of that cloud.
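To make the arithmetic concrete, here is a small sketch of the two ratios discussed above. The host size and load figures are invented for illustration, and both function names are hypothetical. Note that a customer normally cannot observe the physical host's load directly; this only shows how the numbers relate.

```python
def contention_ratio(sold_ghz: float, physical_ghz: float) -> float:
    """Total CPU capacity sold on a host divided by what it physically has."""
    return sold_ghz / physical_ghz


def load_per_core(host_load: float, physical_cores: int) -> float:
    """Runnable tasks per physical core; above 1.0 means work is queuing."""
    return host_load / physical_cores


# Hypothetical host: 16 cores at 2.2GHz each, with 105.6GHz sold to tenants.
physical = 16 * 2.2
ratio = contention_ratio(sold_ghz=105.6, physical_ghz=physical)
print(f"Contention ratio: {ratio:.1f} to 1")

# Hypothetical moment when tenants together have 24 runnable tasks.
queued = load_per_core(host_load=24.0, physical_cores=16)
print(f"Load per core: {queued:.2f}")
```

A load per core of 1.5 in this example means roughly a third of the runnable work is waiting at any moment, and the customer is still being billed for the hours spent waiting.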

The other important factor affecting computational performance is the number of CPU cores that the virtual machine has access to. This isn’t a factor for all applications but many modern applications do support multi-threading. Effectively this means that the application and/or operating system is able to spread the computational tasks across multiple cores. One great tip for improving the performance of your computing is matching the number of threads (i.e. cores) that an application can support to the number of cores that the virtual machine has access to.

Unfortunately, this isn’t possible with many public clouds. This is because their virtualization platforms don’t support virtualization at the CPU core level. In other words, each core can only be in use by one virtual machine at a time. In clouds that do support virtualisation of CPU cores, you should experiment with varying the number of cores for that machine whilst keeping the total CPU size the same.

For example, if you have a 2GHz machine you can see how doubling the cores in use from two to four affects your benchmarking. By doing this, applications running on that virtual machine will be able to execute tasks via four cores simultaneously. In our case, you can set the number of cores a virtual machine uses via the ‘advanced’ tab on the server detail modal of our web console. Just remember to always check what the standard core size of the cloud vendor is before manually overriding the number of cores in use. In our case it’s 2.2GHz per core but it does vary from cloud to cloud.
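One way to see the effect of core count from inside the virtual machine is to run the same fixed amount of CPU work with different numbers of worker processes. The following is a minimal sketch under my own assumptions: the workload (`burn`) is an arbitrary arithmetic loop, not a recognized benchmark, and all function names are hypothetical. It uses processes rather than threads because CPU-bound Python threads do not run in parallel under the standard interpreter.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def burn(iterations: int) -> int:
    """Arbitrary CPU-bound work: sum of squares below `iterations`."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def split_work(total_iterations: int, workers: int) -> list:
    """Divide the work into `workers` chunks that add up to the total."""
    base, extra = divmod(total_iterations, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def time_with_workers(total_iterations: int, workers: int) -> float:
    """Seconds taken to finish the whole workload using `workers` processes."""
    chunks = split_work(total_iterations, workers)
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(burn, chunks))
    return time.perf_counter() - start


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    # Same total work each time; only the number of workers changes.
    for workers in sorted({1, 2, 4, cores}):
        seconds = time_with_workers(2_000_000, workers)
        print(f"{workers} worker(s): {seconds:.3f}s")
```

If the timings stop improving once the worker count passes the number of cores the machine exposes, that is the point at which extra threads in your application are no longer buying anything.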

I’d recommend using POV-Ray, CoreMark, Dhrystone or Whetstone for benchmarking cloud server computational performance.

Storage: the real cloud servers performance benchmark

All performance is limited by the weakest link, where a bottleneck develops. Virtualization technology has advanced significantly with respect to the use of CPU and RAM. For example, a single physical machine can be virtualized to host multiple cloud servers with minimal loss of total aggregate performance. Sadly, in the case of storage, there is still a great deal of progress to be made. The end result is that in most cases, the performance of virtual servers in the cloud is determined by the performance of that cloud’s storage solution.

In short, storage is currently the limiting factor on the performance of most computational tasks in the cloud. Whatever results POV-Ray and other benchmarks may produce for pure computational tasks, the reality is that the speed with which the virtual server can read data from and write data to physical storage disks will determine the real-world performance of a cloud server.

With that in mind, the real differences seen in performance in the cloud, even with respect to computational tasks, tend to stem from differences in storage performance. As mentioned earlier in this post, there are very different customer needs depending on the computing task. This is never more true than in respect of storage. Do you need fast read access to large sequential chunks of data (such as streaming media) or to small disparate pieces of information (perhaps in a large database)? Do you need to sustain heavy write access for fast-changing data which is accessed periodically in large bursts? There are numerous scenarios and each will perform differently on the same platform.
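To show what the two read patterns mean in practice, here is a small sketch that reads the same file once in order and once at shuffled offsets. It is only an illustration of the access patterns: the file size and 4KB block size are arbitrary assumptions, the helper names are hypothetical, and because the operating system will cache a file this small the timings are not a substitute for a proper disk benchmark.

```python
import os
import random
import tempfile
import time

BLOCK = 4096  # assumed block size in bytes


def make_test_file(size_bytes: int) -> str:
    """Create a temporary file of random data and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


def access_patterns(size_bytes: int, seed: int = 0):
    """Return (sequential_offsets, shuffled_offsets) covering the file."""
    sequential = [i * BLOCK for i in range(size_bytes // BLOCK)]
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)
    return sequential, shuffled


def read_blocks(path: str, offsets):
    """Read one block at each offset; return (bytes_read, seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb", buffering=0) as f:
        for offset in offsets:
            f.seek(offset)
            total += len(f.read(BLOCK))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    size = 8 * 1024 * 1024
    path = make_test_file(size)
    try:
        sequential, shuffled = access_patterns(size)
        for name, offsets in (("sequential", sequential), ("random", shuffled)):
            nbytes, seconds = read_blocks(path, offsets)
            print(f"{name}: {nbytes} bytes in {seconds:.4f}s")
    finally:
        os.remove(path)
```

Both passes read exactly the same data; only the order differs, which is the variable a streaming workload and a database workload disagree on.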

Fundamentally, the differences in performance come down to architecture. Those differences in architecture usually result from different degrees of robustness with respect to the storage of data: its redundancy and therefore the likelihood of it ever becoming irreparably lost. At a high level, clouds either employ centralized data solutions in the form of a Storage Area Network (SAN) or more distributed local storage solutions, in which the storage is located on each individual physical machine.

Good SANs intrinsically have a high level of redundancy built in. However, performance suffers as data needs to be sent from the SAN across the network to the virtual machine’s CPU and RAM for computing tasks. As a result, SAN-based clouds tend to have lower performance, like for like, than clouds with local distributed storage solutions. Another disadvantage of a SAN is that it represents a very large single point of failure. SANs are extremely reliable, but if they ever do go seriously wrong (and they have), you are likely to face a very large outage and corruption of data.

Most cloud vendors using SANs do not employ fully redundant fail-over solutions of the kind used in the enterprise environment, largely for cost reasons. It’s important to realize that not every SAN is equal and to find out what level of redundancy the cloud vendor employs with its SANs.

Local storage based clouds tend to have good disk performance. However, often they only offer local storage in a non-persistent form. This isn’t a fair comparison to persistent storage. Temporary storage doesn’t have to be robust to failures in the same way as permanent storage. It is always important to compare persistent storage with persistent storage for meaningful results.

When looking at clouds with distributed local storage solutions you also need to know what redundancy they have. Hard disk drives fail at a significant rate and so the method of storage is critical. Most vendors use some form of RAID but there are very different levels of safety. At the low end you have RAID1, where two disks are essentially mirroring each other. This usually has good performance. But when one disk fails, until the replacement disk has copied all the data off the surviving disk, the data is at risk of complete loss if that second (heavily loaded) disk also fails. Also, during any rebuilding of the RAID1 array, the disk performance is likely to be much lower than normal.

Many vendors, therefore, use RAID5 (resilient to one disk failure) or RAID6 (resilient to two disk failures). RAID6 offers by far the safest solution for local storage but carries a big performance penalty. Our approach is to use RAID6 but combine this with top-of-the-line hardware RAID controller cards. They have large memory caches and are battery-backed. The RAID controller cards we use are actually significantly more expensive than the whole disk arrays. Thus, we can deliver performance comparable with much less resilient set-ups whilst still offering the very large safety net of RAID6 storage. Read more about our cloud infrastructure set-up, which we are very open about.
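The trade-off between the RAID levels mentioned above can be summarized in a few lines. This is a simplified sketch: `raid_summary` is a hypothetical helper, it treats RAID1 as the two-disk mirror described above, and it says nothing about performance, which depends heavily on the controller.

```python
from typing import Tuple


def raid_summary(level: int, disks: int) -> Tuple[int, int]:
    """Return (usable_disks, disk_failures_tolerated) for RAID 1, 5 or 6."""
    if level == 1:
        if disks != 2:
            raise ValueError("this sketch models RAID1 as a two-disk mirror")
        return 1, 1
    if level == 5:
        if disks < 3:
            raise ValueError("RAID5 needs at least three disks")
        return disks - 1, 1  # one disk's worth of capacity holds parity
    if level == 6:
        if disks < 4:
            raise ValueError("RAID6 needs at least four disks")
        return disks - 2, 2  # two disks' worth of capacity holds parity
    raise ValueError("only RAID levels 1, 5 and 6 are covered here")


# Hypothetical eight-disk arrays for comparison.
for level, disks in ((1, 2), (5, 8), (6, 8)):
    usable, tolerated = raid_summary(level, disks)
    print(f"RAID{level} with {disks} disks: "
          f"{usable} usable, survives {tolerated} failure(s)")
```

The extra disk given up by RAID6 is what keeps the data safe during the rebuild window after a first failure.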

I recommend using IOzone or Bonnie++ for disk performance benchmarks.

So, when interpreting the results of storage benchmarks make sure you also have the following information:

  • what storage architecture is the cloud using (local, SAN, other)?
  • what fail-over and redundancy measures are in place for the data?
  • is the storage I’m benchmarking temporary or persistent?

Putting the answers to these three questions together with the results of the benchmarking will give you a fairly good insight into the actual storage performance.

Networking

The performance of networking is significantly more straightforward to determine and measure than computational and disk performance. Networking performance has two key aspects: latency and bandwidth.

Depending on your needs, the latency of the network the cloud vendor uses may or may not be important. If you are using the cloud for largely self-contained operations, it’s unlikely that latency will be a priority. If, however, you are running real-time applications that interact with the world outside of the cloud, then latency will be a critical performance determinant.

Usually, the vast majority of latency results from sheer physical distance. For example, most latency between London and San Francisco is actually the time it takes light to cover that distance. Differences in latency beyond that are determined by the varying efficiency of the route taken. This is the aspect that differs from cloud to cloud. The efficiency of the route is a direct result of the network providers that the cloud has direct connections to, either by taking IP connectivity from them or through peering. When looking at latencies you can simply ping your cloud server and determine its performance. However, it is important to determine the performance between your actual end-users and your cloud server.
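To get a feel for how much of a ping time is pure distance, you can estimate the physical floor for a round trip. The sketch below makes several assumptions of my own: approximate city-centre coordinates, a perfect great-circle path, and light in optical fibre travelling at roughly two thirds of its speed in a vacuum. Real routes are longer, so measured latency will be higher.

```python
import math

EARTH_RADIUS_KM = 6371.0          # mean Earth radius
C_VACUUM_KM_S = 299_792.458       # speed of light in a vacuum
FIBRE_FACTOR = 2 / 3              # assumption: typical slowdown in fibre


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km, speed_km_s):
    """Shortest possible there-and-back time at the given signal speed."""
    return 2 * distance_km / speed_km_s * 1000


# Approximate coordinates for London and San Francisco.
distance = great_circle_km(51.5074, -0.1278, 37.7749, -122.4194)
print(f"Distance: {distance:.0f} km")
print(f"Floor in a vacuum: {min_round_trip_ms(distance, C_VACUUM_KM_S):.1f} ms")
print(f"Floor in fibre:    "
      f"{min_round_trip_ms(distance, C_VACUUM_KM_S * FIBRE_FACTOR):.1f} ms")
```

Whatever a ping reports above the fibre floor is the part that routing, peering and equipment can actually influence, and so the part that differs between vendors.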

If most of your users are based in one geography or access will be primarily from the head office of your company, it’s important to test performance from those locations. Commercial services such as Pingdom offer a cost-effective way of determining latency from a large number of general locations simultaneously worldwide.

The actual bandwidth that your cloud server can achieve is also very important. Unlike more traditional hosting solutions, cloud vendors tend to charge in relation to the aggregate volume of data transfer, rather than in a time-dependent, per-Mbit fashion which provides you with a fixed level of connectivity 24/7. Despite this, many cloud vendors will ‘throttle’ the bandwidth to any virtual server. This will be invisible to you until you hit that barrier. If you have quite a spiky bandwidth profile this could be an important performance factor to take into consideration.

To test the actual bandwidth of your cloud server it’s important to try to download data to the cloud server from a source that isn’t restricting the transfer rate at its end. I often find a great way of determining the speed available is to download a large file from a major vendor such as Microsoft or Ubuntu or, even better, to patch the operating system. Patching tends to download many different files from various locations simultaneously. It will give you a pretty good feel for the speed of your connection.

I often download a Fedora live CD from their main site as a standard test, but you should always experiment with a few different files and locations at a minimum. If you have your own very fast corporate network then you may want to download a file from your cloud server to your own network as a test instead.
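If you want a number rather than a feel, timing the download yourself is straightforward. The following is a minimal sketch using only the Python standard library; `measure_download` and `mbit_per_second` are hypothetical helpers, and the URL is deliberately left for you to supply since any large file from an unthrottled source will do.

```python
import time
import urllib.request


def mbit_per_second(num_bytes: int, seconds: float) -> float:
    """Convert a transfer of `num_bytes` in `seconds` to megabits per second."""
    return num_bytes * 8 / 1_000_000 / seconds


def measure_download(url: str, chunk_size: int = 1 << 20) -> float:
    """Download `url`, discard the data, and return the average Mbit/s."""
    start = time.perf_counter()
    total = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return mbit_per_second(total, time.perf_counter() - start)


# Example use (supply your own large file URL):
#   print(f"{measure_download(my_url):.1f} Mbit/s")
```

Run it against several sources and at different times of day; a result that plateaus at the same figure regardless of source is a hint that the vendor is throttling.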

Now Add Pricing Back in to Weight the Results

Using the methods above you should be able to get a good feel for how the various vendors of cloud servers perform. Further, you should know which aspects to focus on because they matter most to your particular needs.

The final step is to add a pricing dimension to the benchmark results. There is no formula for this. It depends on the relative importance of the various aspects from above, and you determine these. If one cloud is producing 40% better performance (as determined by you) but is only 30% more expensive, then clearly it looks attractive. Likewise, if you have a large bandwidth need, lower computational performance may be trumped by a competitive data transfer pricing plan. The key to making the right decision is to pull in all the various factors.
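Although there is no single formula, the arithmetic behind the 40% and 30% example can be sketched as follows. The weights and per-aspect scores are hypothetical values you would choose yourself, and both function names are my own.

```python
def weighted_performance(relative_scores: dict, weights: dict) -> float:
    """Weighted average of per-aspect scores relative to a baseline of 1.0."""
    total_weight = sum(weights.values())
    return sum(relative_scores[k] * w for k, w in weights.items()) / total_weight


def value_ratio(relative_performance: float, relative_price: float) -> float:
    """Performance per unit of price relative to the baseline; above 1 is better."""
    return relative_performance / relative_price


# The example from the text: 40% better performance at 30% higher price.
print(f"Value ratio: {value_ratio(1.40, 1.30):.2f}")

# Hypothetical weighting for a disk-heavy workload.
weights = {"cpu": 0.3, "disk": 0.5, "network": 0.2}
scores = {"cpu": 1.1, "disk": 1.6, "network": 0.9}
overall = weighted_performance(scores, weights)
print(f"Weighted performance: {overall:.2f}")
print(f"Value at 30% higher price: {value_ratio(overall, 1.30):.2f}")
```

Changing the weights to reflect a different workload can easily reverse the ranking, which is why they have to come from your own requirements rather than from the benchmark tool.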

Finally, benchmarking should be part of a larger process of determining which cloud servers are right for you. This should include other aspects such as service level agreements, data/vendor lock-in considerations, physical location, and legal jurisdiction. By pulling together all these aspects you’ll position yourself to make the right choice of cloud computing vendor.

About Patrick Baillie

Patrick is co-founder of CloudSigma, and comes from a career working in Investment Banking Technology, as well as having previously run his own business.