Fusion-IO: the Pioneer

Fusion-IO is telling everyone who wants to listen that it is much more than a vendor of extremely fast PCIe flash cards. Even though it sells quite a few cards to storage giants like NetApp, Fusion-IO wants nothing less than to completely change and conquer the storage market.

Fusion-IO's first successful move was to sell extremely fast ioDrives to companies that live off scale-out applications, like Facebook and Apple. These companies ditched their traditional SAN environments very quickly: replacing centralized shared storage with a model where hundreds of servers each have a local PCIe flash storage system gave them up to ten times more storage performance at a fraction of the cost of a high-end SAN.

Ditching your centralized storage is not for everyone, of course: your application has to handle replication itself and thus be able to survive the loss of many server nodes. But as we all know, that is exactly what Google, Facebook, and other scale-out companies did: they built applications that replicate data between nodes so that nobody has to worry about a few failing nodes.
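To make the model concrete, the sketch below shows, in a very simplified form, how an application can replicate every object across several servers with local flash so that a failed node costs capacity rather than data. All names (`Node`, `REPLICAS`, `QUORUM`) and the placement scheme are hypothetical and chosen purely for illustration; real scale-out stacks are far more elaborate.

```python
# Minimal sketch of application-level replication over servers with local flash.
# All names and the placement scheme are illustrative assumptions only.
import hashlib

REPLICAS = 3   # copies of every object
QUORUM = 2     # acknowledgements needed before a write counts as durable

class Node:
    """Stands in for one server with a local PCIe flash card."""
    def __init__(self, name):
        self.name = name
        self.store = {}      # key -> value, i.e. the local flash
        self.alive = True

    def put(self, key, value):
        if not self.alive:
            raise IOError(f"{self.name} is down")
        self.store[key] = value

    def get(self, key):
        if not self.alive:
            raise IOError(f"{self.name} is down")
        return self.store[key]

def replica_nodes(nodes, key):
    """Pick REPLICAS nodes deterministically from the key (toy placement)."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return [nodes[(h + i) % len(nodes)] for i in range(REPLICAS)]

def put(nodes, key, value):
    acks = 0
    for node in replica_nodes(nodes, key):
        try:
            node.put(key, value)
            acks += 1
        except IOError:
            continue
    if acks < QUORUM:
        raise IOError("not enough replicas acknowledged the write")

def get(nodes, key):
    for node in replica_nodes(nodes, key):
        try:
            return node.get(key)
        except (IOError, KeyError):
            continue
    raise KeyError(key)

# A failing node is invisible to the application:
cluster = [Node(f"server-{i}") for i in range(5)]
put(cluster, "user:42", b"profile data")
replica_nodes(cluster, "user:42")[0].alive = False   # kill one replica
assert get(cluster, "user:42") == b"profile data"
```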

The ioDrive: up to 3TB and hundreds of thousands of IOPS

Although scale-out customers were extremely important to Fusion-IO, the company also went after the virtualization market, where centralized storage is king. ioTurbine is a hypervisor plug-in that enables server-side caching on a virtualized host with a Fusion-IO flash card. The beauty is that ioTurbine does not disable the typical goodies that centralized storage offers in a virtualized environment, such as vMotion and High Availability. ioTurbine works with ESXi, Windows Server 2012/2008, and RHEL.
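The reason a server-side cache can coexist with vMotion and HA is conceptually simple: as long as the cache is write-through (or otherwise kept consistent with the array), the SAN remains the authoritative copy of every block, so a VM that moves to another host simply starts with a cold cache. The sketch below is a generic write-through block cache, not ioTurbine's actual design; `san`, `capacity_blocks`, and the LRU policy are illustrative assumptions.

```python
# Generic write-through block cache in front of shared storage (illustrative only;
# not Fusion-IO's implementation). The SAN stays authoritative for every block,
# which is why shared-storage features like vMotion and HA keep working.
from collections import OrderedDict

class WriteThroughCache:
    def __init__(self, san, capacity_blocks):
        self.san = san                      # backing shared storage (authoritative)
        self.capacity = capacity_blocks     # size of the local flash cache
        self.cache = OrderedDict()          # block number -> data, in LRU order

    def read(self, block):
        if block in self.cache:             # cache hit: served from local flash
            self.cache.move_to_end(block)
            return self.cache[block]
        data = self.san.read(block)         # cache miss: fetch from the SAN
        self._insert(block, data)
        return data

    def write(self, block, data):
        self.san.write(block, data)         # write-through: the SAN is updated first
        self._insert(block, data)           # then the local copy is refreshed

    def _insert(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        if len(self.cache) > self.capacity: # evict the least recently used block
            self.cache.popitem(last=False)
```

The worst case after a migration is therefore a temporary drop in cache hit rate on the destination host, never lost writes.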

Fusion-IO's ION Data Accelerator is its take on the next-generation SAN: PCIe flash cards inside any decent x86 server, such as the Supermicro 6037 or the HP DL380p. ION is typically used for high-end database clusters, and Fusion-IO promises that this shared storage can deliver no less than one million IOPS.

With the acquisition of NexGen Storage, Fusion-IO is also targeting the midrange market with a "flash pool" kind of product. The key difference is that the NexGen arrays can use write-back caching, while most vendors do little or no writing to the flash disks. The Fusion-IO software is also able to provision a guaranteed amount of IOPS for each LUN.
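Guaranteeing a certain number of IOPS per LUN is essentially a quality-of-service problem. A common generic way to implement such a limit (not necessarily what the NexGen software does internally) is a token bucket per LUN, as sketched below; the LUN names and limits are made up for the example.

```python
# Token-bucket rate limiter per LUN: a generic way to cap or guarantee IOPS.
# Illustrative sketch only, not NexGen's actual QoS engine.
import time

class LunThrottle:
    def __init__(self, iops_limit, burst=None):
        self.rate = iops_limit              # tokens added per second (= IOPS budget)
        self.burst = burst or iops_limit    # maximum tokens that can accumulate
        self.tokens = float(self.burst)
        self.last = time.monotonic()

    def admit(self):
        """Return True if one I/O may be issued now, False if it must wait."""
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Example: a database LUN provisioned for 20,000 IOPS, a backup LUN for 2,000.
throttles = {"db-lun": LunThrottle(20_000), "backup-lun": LunThrottle(2_000)}

def submit_io(lun, io):
    while not throttles[lun].admit():
        time.sleep(0.0005)                  # back off briefly, then retry
    # ... issue the I/O to the flash/disk pool here ...
```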

But more than anything else, the Fusion-IO products offer extreme speeds. Even a single-array NexGen N5 series system targeted at SMBs promises 100K-300K IOPS, more than any of the much more expensive midrange SANs can offer right now.

The fastest product, the 10TB ioDrive Octal, costs around $100k and delivers one million IOPS. Even if those numbers are inflated, it is roughly an order of magnitude faster and cheaper (per GB) than NetApp's Flash Cache.

Comments

  • Brutalizer - Sunday, August 11, 2013 - link

    bitpushr,
    "That's because ZFS has had a minimal impact on the professional storage market."

    That is ignorant. If you had followed the professional storage market, you would have known that ZFS is the most widely deployed storage system in the enterprise. ZFS systems manage 3-5x more data than NetApp, and more data than NetApp and EMC Isilon combined. ZFS is the future and is eating everyone else's cake:
    http://blog.nexenta.com/blog/bid/257212/Evan-s-pre...
  • blak0137 - Monday, August 5, 2013 - link

    The Amplidata Bitspread data protection scheme sounds a lot like the OneFS filesystem on Isilon.

    A note on the NetApp section: the NVRAM does not store the hottest blocks; rather, it is only used for coalescing writes so that entire RAID-group-wide stripes can be destaged to disk at once. This use of NVRAM, along with the write characteristics of the WAFL filesystem, allows RAID-DP (NetApp's slightly customized version of RAID-6) to deliver write performance similar to RAID-10 with a much smaller usable space penalty, up to approximately 85-90% space utilization. Read cache is always held in RAM on the controller, and the FlashCache (formerly PAM) cards supplement that RAM-based cache. A thing to remember about the size of the FlashCache cards is that the space still benefits from the data efficiency features of Data ONTAP, such as deduplication and compression, so applications such as VDI get a massive boost in performance.
  • enealDC - Monday, August 5, 2013 - link

    I think you also need to discuss the effect of OSS or very low-cost solutions that can be built on white-box hardware. Those cause far greater disruptions than anything else I can think of!
    SCST and COMSTAR to name a few.
  • Ammohunt - Monday, August 5, 2013 - link

    One thing I didn't see mentioned is that in the good old days you spread the I/O out across many spindles, which was a huge advantage of SCSI, which was geared towards such a configuration. As drive sizes have increased, spindle counts have gone down, adding more latency. The fact is that expensive SSD-based storage systems are not needed in most medium-sized businesses. Their data needs can in most cases be served spectacularly well by a well-architected tiered storage model.
  • mryom - Monday, August 5, 2013 - link

    There's something missing here: take a look at PernixData. That's disruptive, and vSphere 5.5 is also going to be a game changer. Software Defined Storage is the way forward; we just need space for more disks in blade servers.
  • davegraham - Tuesday, August 6, 2013 - link

    SDS is an EMC marchitecture discussion (a la ViPR). I'd suggest that you avoid conflating what a marketing talking head discusses with what the technology can actually do. :)
  • Kevin G - Monday, August 5, 2013 - link

    My understanding is that with enterprise storage, the value isn't necessarily the hardware but rather the software interface and the support that comes with it. NetApp, for example, will dial home and order replacements for failed hard drives for you. Various interfaces I've used allow for the logical creation of multiple arrays across multiple controllers, each using a different RAID type. I have no sane reason why someone would want to do that, but the option is there and supported for the crazies.

    As far as performance goes, NVMe and SATA Express are clearly the future. I'm surprised that we haven't seen any servers with hot-swap mini-PCIe slots. With two lanes going to each slot, a single-socket Sandy Bridge-E chip could support twenty of those small form factor cards in the front of a 1U server. At 500 GB a piece, that is 10 TB of preformatted storage, not far off the 16 TB of preformatted capacity possible today using hard drives. The cost will of course be higher than disk, but the speeds are ludicrous.

    Going with standard PCIe form factors for storage only makes sense if there are tons of channels connected to the controller and they are PCIe native. So far the majority of offerings stick a hardware RAID chip with several SATA SSD controllers onto a PCIe card and call it a day.

    Also, for the enterprise market, it would be nice to see a PCIe SSD with an out-of-band management port that communicates via Ethernet and can fully function if the switch on the other end supports Power over Ethernet. The entire host could be fried but the data could still potentially be recovered. It also works great for hardware configuration, like on some Areca cards.
  • youshotwhointhewhatnow - Monday, August 5, 2013 - link

    The first link on "Cloudfounders: No More RAID" appears to be broken (http://www.amplidata.com/pdf/The-RAID Catastrophe.pdf).

    I read through the second link on that page (the Intel paper). I wouldn't consider that paper unbiased, considering Intel is clearly trying to use it to sell more Xeon chips. Regardless, I don't think your statement that it is "mathematically proven that the Reed-Solomon based erasure codes of RAID 6 are a dead end road for large storage systems" is justified. Sure, RAID6 will eventually give way to RAID7 (or RAIDZ3 in ZFS terms), but that still uses Reed-Solomon codes. The Intel paper just shows that RAID6+1 has much worse efficiency with slightly worse durability compared to Bitspread (a rough efficiency comparison is sketched after the comments). The same could be said for RAID7 instead of Bitspread, which really should have been part of the comparison.

    Another strange statement in the Intel paper is "Traditional erasure coding schemes implemented by competitive storage solutions have limited device-level BER protection (e.g., 4 four bit errors per device)". Umm, with a non-degraded RAID6 you could have as many UREs as you like, provided fewer than three occur on the same stripe (or fewer than two for a degraded array). Again, RAID7 allows even more UREs in the same stripe.

    This is not to say that the Bitspread technique isn't interesting, but you seem to be a little too quick to drink the kool-aid.
  • name99 - Tuesday, August 6, 2013 - link

    I imagine the reason people are quick to drink the koolaid is that convolutional FEC codes have proven how well they work through years of wireless experience. Loss of some Amplidata data is no different from puncturing, and puncturing just works --- we experience it every time we use WiFi or cell data.

    I also wouldn't read too much into Intel's support here. Obviously running a Viterbi algorithm to cope with a punctured convolutional code is more work than traditional parity-type recovery --- a LOT more work. And obviously, the first round of software you write to prove to yourself that this all works, you're going to write for a standard CPU. Intel is the obvious choice, and they're going to make a big deal about how they were the obvious choice.

    BUT the obvious next step is to go to Qualcomm or Broadcom and ask them to sell you a Viterbi cell, which you put on a SOC along with an ARM front-end, and hey presto --- you have a $20 chip you can stick in your box that's doing all the hard work of that $1500 Xeon.

    The point is, convolutional FEC is operating on a totally different dimension from block parity --- it is just so much more sophisticated, flexible, and powerful. The obvious thing that is being trumpeted here is destruction of one or more blocks in the storage device, but that's not the end of the story. FEC can also handle point bit errors. Recall that a traditional drive (HD or SSD) has its own FEC protecting each block, but if enough point errors occur in the block, that FEC is overwhelmed and the device reports a read error. NOW there is an alternative --- the device can report the raw bad data up to a higher level, which can combine it with data from other devices to run the second layer of FEC --- something like a form of Chase combining.

    Convolutional codes are a good start for this, of course, but the state of the art in WiFi and telco is LDPCs, and so the actual logical next step is to create the next device based not on a dedicated convolutional SOC but on a dedicated LDPC SOC. Depending on how big a company grows, and how much clout they eventually have with SSD or HD vendors, there's scope for a whole lot more here --- things like using weaker FEC at the device level precisely because you have a higher level of FEC distributed over multiple devices --- and this may allow you a 10% or more boost in capacity.
  • meorah - Monday, August 5, 2013 - link

    You forgot another implication of scale-out software design: namely, the ability to bypass flash completely and store your most performance-intensive workloads, the ones that use your most expensive software licensing, directly in memory. 16 gigs to run the host, the other 368 gigs as a nice RAM drive.
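The efficiency argument in the erasure-coding discussion above can be made concrete with some simple arithmetic. The sketch below compares the usable fraction of raw capacity for a few of the protection schemes mentioned; the group sizes are assumptions picked for illustration, not figures from the Intel paper or from any vendor.

```python
# Rough storage-efficiency comparison of protection schemes discussed above
# (illustrative arithmetic only; the parameters are assumptions, not vendor specs).

def efficiency(data_fragments, total_fragments, copies=1):
    """Usable fraction of raw capacity for a k-of-n code mirrored `copies` times."""
    return data_fragments / (total_fragments * copies)

schemes = {
    # 3-way replication, as used by many scale-out applications
    "3x replication":            efficiency(1, 1, copies=3),
    # RAID6: e.g. 8 data + 2 parity drives, tolerates any 2 drive failures
    "RAID6 (8+2)":               efficiency(8, 10),
    # RAID6+1: the same group mirrored, as in the Intel/Amplidata comparison
    "RAID6+1 (8+2, mirrored)":   efficiency(8, 10, copies=2),
    # Triple parity ("RAID7"/RAID-Z3 style): 8 data + 3 parity
    "Triple parity (8+3)":       efficiency(8, 11),
    # A k-of-n dispersal code in the Bitspread style, e.g. 16-of-20:
    # any 16 of 20 fragments reconstruct the data, so 4 losses are survivable
    "Dispersal (16 of 20)":      efficiency(16, 20),
}

for name, eff in schemes.items():
    print(f"{name:28s} usable capacity: {eff:5.1%}")
```

On these assumptions, mirroring a RAID6 group (RAID6+1) halves its efficiency to 40%, while triple parity or a wider k-of-n dispersal stays in the 70-80% range while tolerating three or four losses per group, which is the commenter's point about what a fair comparison should include.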
