Delivering robust yet high performing storage in the cloud has been one of the greatest hardware and software challenges in the explosion of cloud computing. Poor storage performance from many leading Infrastructure-as-a-Service (IaaS) clouds is one of the complaints most often cited by users. In this post I will outline the currently dominant approach to storage, our own approach and what the future holds for cloud storage. The great news is that a revolution in how data is stored and accessed is right around the corner!
As I outlined in my recent post on how to benchmark cloud servers, along with networking performance, storage performance is one of the key differentiating factors between different IaaS clouds. Storage performance varies widely across different clouds and even within the same cloud over time. While managing CPU, RAM and networking securely and reliably has been largely solved, delivery of secure reliable storage clearly hasn’t.
One of the key trade-offs traditionally with storage is between performance and redundancy/reliability. The more redundant a storage solution, the slower its write performance, because every write has to be duplicated or protected with parity in a way that less redundant storage doesn’t require. For example, holding storage in RAID1 gives much higher write performance than RAID5 or RAID6. If a drive fails in RAID1, all data on the remaining drive is at risk until the failed drive has been replaced and the mirror rebuilt, because a further drive failure would mean data loss (the same is the case under RAID5). That isn’t the case with RAID6, which can survive two drive failures, but under normal circumstances RAID6 has much lower performance.
It is also important to draw a distinction between persistent cloud storage and ephemeral/temporary storage. For temporary storage, which isn’t intended for storing critical data, it’s OK to have little or no resilience to hardware failure. For permanent storage, resilience is critical.
For persistent data storage, most public IaaS clouds employ a Storage Area Network (SAN) architecture. This architecture is a stalwart of the enterprise world and a tried and tested storage method. Modern SANs offer a high degree of reliability and significant built-in redundancy for stored data. They include granular controls on how data is stored and replicated. In short, the modern SAN is a big, highly sophisticated and expensive piece of storage equipment; it’s also very complicated and proprietary.
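To put rough numbers on this trade-off, here is a minimal sketch in Python. It uses the commonly quoted rule-of-thumb "write penalty" for each RAID level (the number of back-end disk operations needed per small random write) and ignores controller caches entirely. The names `RaidLevel` and `effective_write_iops`, the 8-drive array and the 150 IOPS per drive are all illustrative assumptions, not measurements, and "RAID1" here means two-way mirrored pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidLevel:
    write_penalty: int       # back-end disk I/Os per small random write (rule of thumb)
    failures_tolerated: int  # drive failures guaranteed survivable without data loss


# Commonly quoted rule-of-thumb values, not measurements from any real array.
RAID_LEVELS = {
    "RAID1": RaidLevel(write_penalty=2, failures_tolerated=1),
    "RAID5": RaidLevel(write_penalty=4, failures_tolerated=1),
    "RAID6": RaidLevel(write_penalty=6, failures_tolerated=2),
}


def effective_write_iops(level: str, drives: int, iops_per_drive: float) -> float:
    """Estimate random-write IOPS for an array, ignoring controller caches."""
    return drives * iops_per_drive / RAID_LEVELS[level].write_penalty


if __name__ == "__main__":
    # Hypothetical array: 8 drives at an assumed 150 random IOPS each.
    for name, raid in RAID_LEVELS.items():
        iops = effective_write_iops(name, drives=8, iops_per_drive=150)
        print(f"{name}: ~{iops:.0f} write IOPS, "
              f"survives {raid.failures_tolerated} drive failure(s)")
```

Under these assumptions the same set of drives delivers roughly a third of the random-write performance in RAID6 that it does when mirrored, which is the price paid for surviving a second drive failure.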
Many modern SANs claim to be virtually failure proof; sadly the practical reality doesn’t seem to bear this out. Some of the biggest outages and performance failures in the cloud have related to a SAN failure or a significant degradation in SAN performance, and that’s the rub. SANs don’t go wrong very often but when they do they are huge single points of failure. Not only this but their complexity and proprietary nature mean that when things do go wrong, you have a pretty big, complex problem on your hands. That’s why the outages, when they have occurred, have often been measured in hours not minutes. The sheer size of your average SAN means it takes quite some time just to repair itself, even after the initial problem has been solved.
Why not use redundant SANs? In principle this can be done and often is in mission critical enterprise environments. The reality in the price sensitive public cloud sector is that clouds don’t employ fully redundant SANs because doing so would price storage at a multiple of the current prevailing pricing levels. So, SANs are great until they aren’t!
There is another problem with SANs in a compute cloud and that is latency. The time it takes storage data to travel across the SAN and network to the compute nodes, where the CPU and RAM are doing all the work, is significant enough to dramatically affect performance. It’s really not a case of bandwidth; it’s a problem of latency. For this reason SANs create an upper bound on storage performance by virtue of the time it takes data to move back and forth between the compute nodes and the SAN.
Using a SAN is, in our opinion, an old solution to a new problem and its fit with the cloud is therefore not a good one. If the cloud is to deliver high performance, reliable and resilient storage, it needs to move beyond the SAN.
When building out our cloud we made the decision early that we preferred more frequent low impact problems to infrequent high impact problems. Essentially we’d rather solve a simple small problem which occurs more frequently (but still rarely) than a complicated large problem that occurs less frequently. For this reason we chose not to use SANs for our storage but local RAID6 arrays on each computing node.
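The way latency, rather than bandwidth, caps performance can be shown with a little arithmetic. The sketch below is only an illustration: the function names and the latency figures (100 microseconds for a local disk round trip, 1,000 microseconds across a storage network) are assumptions chosen to make the point, not measurements of any real SAN.

```python
def max_iops(round_trip_latency_us: float, queue_depth: int = 1) -> float:
    """Upper bound on I/O operations per second for a given round-trip latency.

    With `queue_depth` requests in flight, at most `queue_depth` operations
    can complete per round trip, however much bandwidth is available.
    """
    if round_trip_latency_us <= 0:
        raise ValueError("latency must be positive")
    return queue_depth * 1_000_000 / round_trip_latency_us


def throughput_mb_per_s(iops: float, block_size_kb: float = 4) -> float:
    """Throughput implied by an IOPS figure at a fixed block size."""
    return iops * block_size_kb / 1000


if __name__ == "__main__":
    # Assumed, illustrative round-trip latencies in microseconds.
    for label, latency_us in [("local array", 100), ("storage network", 1000)]:
        iops = max_iops(latency_us)
        print(f"{label}: at most {iops:.0f} IOPS, "
              f"{throughput_mb_per_s(iops):.0f} MB/s at 4 KB blocks")
```

With one request in flight at a time, a tenfold increase in round-trip time means a tenfold drop in the number of operations that can complete each second, and the resulting throughput at small block sizes is far below what the link could carry.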
By putting storage locally on each node where computing is taking place, for the most part the virtual disk and CPU/RAM are matched to the same physical machine. This means our storage has very low latency. For storage robustness we coupled this with RAID6. To prevent performance suffering we use high-end battery backed hardware RAID controllers with RAM caches. Our RAID controllers are able to deliver high performance even with RAID6 arrays and are resilient to power failures (our computing nodes have two independent power supplies in any case).
To further boost performance and reduce the impact of any drive failure we use small 2.5” 500GB drives. If any drive fails in an array we quickly replace it and the RAID array is re-constituted in a much shorter period of time. Not only that but the greater density of spindles per terabyte of storage means that the load of heavy disk access is spread across a greater number of drives. For this reason our storage performance is one of the best of any public cloud.
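A quick back-of-the-envelope calculation shows why smaller drives help on both counts. This is a rough sketch only: the 50 MB/s sustained rebuild rate, the 2TB comparison drive and the 4TB array size are assumptions for illustration, and real rebuild times depend heavily on the controller and on how busy the array is.

```python
import math


def rebuild_seconds(drive_gb: float, rebuild_mb_per_s: float) -> float:
    """Time to rewrite one whole replacement drive at a sustained rebuild rate."""
    return drive_gb * 1000 / rebuild_mb_per_s


def raid6_drives_needed(usable_gb: float, drive_gb: float) -> int:
    """Drives in a RAID6 array of the given usable size (two drives' worth of parity)."""
    return math.ceil(usable_gb / drive_gb) + 2


if __name__ == "__main__":
    # Assumed figures: 50 MB/s rebuild rate, 4 TB of usable space per array.
    for drive_gb in (500, 2000):
        hours = rebuild_seconds(drive_gb, rebuild_mb_per_s=50) / 3600
        spindles = raid6_drives_needed(usable_gb=4000, drive_gb=drive_gb)
        print(f"{drive_gb} GB drives: ~{hours:.1f} h to rebuild, "
              f"{spindles} spindles for 4 TB usable")
```

At the same rebuild rate a 500GB drive is rebuilt in a quarter of the time a 2TB drive would need, and the same usable capacity is served by more spindles sharing the load.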
Local storage has one main drawback: if a physical host which has your disk on it fails for some reason, you will lose access to that disk. In reality hosts rarely fail completely in this way (it hasn’t happened yet) and we maintain ‘hot spares’ which allow us to swap the disks into a new host almost immediately, usually minimising downtime to 10-15 minutes. Most of our customers have multiple servers across different physical machines. It means that the failure of any one host has a tiny impact on the cloud overall: it doesn’t affect most customers at all and those affected suffer a limited outage to only some of their infrastructure. Compare that to a SAN failure for complexity and time to recovery!
Despite this it would be great if a host failure didn’t mean loss of access to drives on that host machine. Likewise it would be great to have disks without upper size limits that could be larger than the size of storage on any one physical host.
It’s clear both SANs and local storage have their drawbacks. For us the drawbacks of local storage are much smaller than those of SANs; coupled with the better performance, it’s the right choice for a public cloud at the moment. However, the current way of delivering storage is about to be revolutionised by a new approach to storage. It’s called distributed replicated block devices (DRBD) or just ‘distributed block storage’.
Distributed block storage takes each local storage array and, in much the same way as RAID combines multiple drives into one single array, combines each storage/compute node into one huge array cloud wide. Unlike a SAN, management of a distributed block storage array is federated so there is no single point of failure in the management layer. Likewise any data stored on any single node is replicated across other nodes completely. If any physical server were to fail, there would be no loss of access to data stored on that machine. The distributed block storage arrangement means that the other virtual servers would simply access the data from other physical servers in the array.
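The idea of replicating every block across nodes and reading from a surviving copy can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions, not how Sheepdog or any commercial product actually works: the class `ToyBlockCluster`, the hash-based placement and the replica count of two are all invented for the example.

```python
import hashlib


class ToyBlockCluster:
    """In-memory stand-in for a set of storage/compute nodes (hypothetical)."""

    def __init__(self, node_names, replicas=2):
        if replicas > len(node_names):
            raise ValueError("need at least as many nodes as replicas")
        self.order = sorted(node_names)
        self.nodes = {name: {} for name in self.order}  # node -> {block_id: data}
        self.failed = set()
        self.replicas = replicas

    def placement(self, block_id):
        """Pick `replicas` distinct nodes for a block, deterministically by hash."""
        digest = hashlib.sha256(block_id.encode()).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.order)
        return [self.order[(start + i) % len(self.order)]
                for i in range(self.replicas)]

    def write(self, block_id, data):
        """Store a copy of the block on every healthy node in its placement."""
        for name in self.placement(block_id):
            if name not in self.failed:
                self.nodes[name][block_id] = data

    def read(self, block_id):
        """Return the block from the first healthy node that holds a copy."""
        for name in self.placement(block_id):
            if name not in self.failed and block_id in self.nodes[name]:
                return self.nodes[name][block_id]
        raise OSError(f"no healthy replica for {block_id}")

    def fail(self, name):
        """Simulate the loss of a physical server."""
        self.failed.add(name)


if __name__ == "__main__":
    cluster = ToyBlockCluster(["node-a", "node-b", "node-c", "node-d"])
    cluster.write("vol1/block0", b"customer data")
    cluster.fail(cluster.placement("vol1/block0")[0])  # lose the first copy
    print(cluster.read("vol1/block0"))  # still served from the second copy
```

A real system also has to re-replicate data after a failure and keep copies consistent under concurrent writes, which this sketch deliberately leaves out.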
If your virtual machine was unlucky enough to be on a host that fails (so you’d lose the CPU/RAM you were using), our system would simply bring it back up on another physical computing node immediately. Essentially this eliminates all single points of failure in storage, delivering a high availability solution to customers. We expect to be able to offer this at no premium to our current pricing levels.
Another great benefit of distributed block storage is the ability to create live snapshots of drives even if they are in active use by a virtual server. Rollbacks to previous versions in time are also possible in a seamless way. In essence backups become implicit in the system through replication with the added convenience of snapshots.
There are a number of open source solutions currently in development that are looking to deliver such a solution; one of the leading contenders is a project called Sheepdog. Within 6 months it is expected that an open source distributed block storage solution will be available in a pretty stable form.
On the commercial side a company called Amplidata have already launched an extremely robust, cost effective distributed block storage solution delivering the sort of advantages outlined above. They are fellow TechTour finalists along with ourselves and presented at the TechTour Cloud and ICT 2.0 event in Lausanne and CERN last week; it was certainly very interesting to listen to their presentation.
Another benefit of distributed block storage is the ability to spread the load from a heavy use virtual drive across multiple disk arrays. Whereas currently local storage means the load for a particular drive can only be spread within one RAID array, distributed block storage spreads the load from any one drive across a great many servers with separate disk arrays. The upshot is that drives in heavy use have a much more marginal impact on other cloud users as their impact is thinly spread across a great many physical disk arrays.
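As a rough illustration of that spreading effect, the sketch below counts how many requests land on each disk array when a busy virtual drive's blocks are striped round-robin across them. The function name, the round-robin layout and the figures of 1,000 requests and 10 arrays are assumptions made for the example; real systems use their own placement schemes.

```python
from collections import Counter


def requests_per_array(block_numbers, array_count):
    """Count requests per array when block i lives on array i % array_count."""
    return Counter(block % array_count for block in block_numbers)


if __name__ == "__main__":
    hot_drive_requests = range(1000)  # a heavily used virtual drive (assumed)
    local = requests_per_array(hot_drive_requests, array_count=1)
    spread = requests_per_array(hot_drive_requests, array_count=10)
    print("busiest array, local storage:", max(local.values()))
    print("busiest array, distributed:  ", max(spread.values()))
```

With local storage a single array absorbs every request, whereas spread across ten arrays each one sees only a tenth of the load, so neighbours on any one array are affected far less.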
The key lesson here is that in the future, cloud storage will be able to deliver a much more reliable, less variable level of performance. This will make many customers happy who currently suffer from wide variations in their disk performance.
I talked about the latency problem with SANs and how we avoid this with local storage. By distributing storage across a whole array of separate physical machines, won’t distributed block storage suffer from the same problems? In principle, yes it will. That’s why the storage network of the cloud needs to be reconsidered and modified in conjunction with distributed block storage implementation.
Currently our storage network is relatively low traffic because almost all virtual servers are on the same physical machine as their disks. It means traffic between physical servers is minimal and latency is very low. How do we cope when most disk traffic is travelling between physical servers on the storage network? The answer is to switch to low latency networking and increase cache sizes on each physical server. In this regard there are two main options: 10Gbps Ethernet or InfiniBand. Both have advantages and disadvantages; however, they both share the promise of significantly lower latency over their networks. Which is better is a whole blog post in itself!
In order to deliver the promise of high performance reliable storage, distributed block storage must therefore be implemented with a low latency storage network.
Solid State Drive (SSD) storage is ideal for storage which has a high read to write access ratio. It is not actually ideal for many heavy write uses, where many traditional storage solutions can outperform it. Currently the price of SSD makes it of limited use for most everyday storage needs. There is an argument for moving some heavy read storage onto SSD to boost overall performance and it’s something that, as a company, we are actively investigating. For a cool upcoming SSD storage solution check out SolidFire (not that much to look at yet but one to watch!).
Storage in the cloud currently is sub-optimal. The advent of distributed block storage will deliver SAN-style convenience and reliability with local storage level performance. The elimination of any single point of failure in storage is a huge leap forward and brings closer the fully matured, affordable high availability IaaS cloud.