EMC Software Solutions Blog


Cloud Storage: What Indy Doesn't Understand About Modern Archives



Guest post by Mark O’Connell

What do you think of when you think of an archive? For years, the final scene of Raiders of the Lost Ark represented the state of the art in archiving technology. In case you don’t remember the movie: Indiana “Indy” Jones survives being chased by scores of Nazis, manages to defeat them, and recovers the Lost Ark of the Covenant, an incredibly powerful religious artifact. The US government seizes the Ark and assures Indy that it is being studied by “top men,” while the final scene shows the Ark being archived: sealed in a wooden crate and carted into a giant warehouse filled to the brim with hundreds of thousands of similar crates, never to be studied because no one could ever hope to find it again.

If you’ve ever tried to recover information from banks and banks of labeled tapes, you might know this feeling. The information is there, somewhere, waiting to be discovered like a precious gem buried deep in the earth. In such an environment the archive isn’t quite dead, but getting anything out of it is a treasure hunt! You certainly have better uses for your time, and there are far more efficient ways to leverage your business-critical historical information.

This world began to change in 2002, when EMC introduced the Centera platform, the first storage platform designed specifically for the challenges of long-term storage of fixed-content archive data. Because Centera stored data on disk instead of tape or optical platters, retrieval time immediately improved by more than a factor of 10 and was suddenly measured in seconds or fractions of a second instead of minutes or worse.

With such a momentous change in the accessibility of information, the old paradigms were no longer sufficient. Companies began moving data from primary storage to the archive sooner and sooner, sometimes bypassing primary storage altogether and writing directly to Centera. As archived data became younger on average, retrieval demand rose sharply, and sub-second retrieval times were suddenly no longer a glorious luxury but an absolute business necessity.

To say that such changes were unanticipated by the Centera engineers would be an understatement. But they responded, and Centera today remains the premier platform for scale-out, long-term storage of fixed content and compliant archive information.

However, Centera remains designed around the storage of fixed-content data. With a treasure trove of historical information at their fingertips, or available over the web, businesses wanted to derive more and more value from it: running analytics, annotating the data, and even moving beyond fixed content to store mutable business content in the archive. To meet the demands of these next-generation archives, EMC introduced the Atmos platform. Building on the lessons of Centera, EMC designed Atmos to support both mutable and fixed-content data, to support annotations and indexing natively, and to offer multiple access methods, including firewall- and mobile-device-friendly REST protocols.

As with Centera, customers have taken the base capabilities of Atmos and used them to drive active archives in directions that would have been unimaginable even a few years ago. For customers whose primary concern is space efficiency, Atmos provides GeoParity, which applies distributed erasure coding to data, maintaining full read/write access while protecting against disk, node, and site failures for only 33% capacity overhead. For customers with existing applications that use a filesystem interface, Atmos GeoDrive lets those applications take advantage of scale-out cloud storage without any changes. For customers with global data storage needs, Atmos presents a single system image across geographically dispersed systems and minimizes read latency by serving reads from the location closest to the requesting client. And for customers with a mixed legacy of applications and use cases, Atmos multi-tenancy allows storage policies, self-service access, data indices, and workflows to be customized so that each application has the illusion of a storage platform tuned precisely to its needs, while the system administrator enjoys the ease of managing a single system with per-application personalities.
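A 33% overhead figure is consistent with the usual erasure-coding arithmetic: if each object is split into k data fragments plus m parity fragments, and any k of them suffice to rebuild it, the extra capacity consumed is m/k. The sketch below assumes a hypothetical 9+3 layout spread evenly over four sites (three fragments per site); the post does not say which layout GeoParity actually uses, so treat the numbers and helper names as illustrative. It also compares against keeping three full replicas.

```python
from fractions import Fraction


def erasure_overhead(data_fragments: int, parity_fragments: int) -> Fraction:
    """Extra raw capacity per unit of user data for a k+m erasure code (m/k)."""
    return Fraction(parity_fragments, data_fragments)


def replication_overhead(copies: int) -> Fraction:
    """Extra raw capacity per unit of user data when keeping full copies."""
    return Fraction(copies - 1)


def survives_site_loss(fragments_per_site: list[int], parity_fragments: int) -> bool:
    """True if losing any single site still leaves enough fragments to rebuild.

    A k+m code tolerates the loss of any m fragments, so no site may hold
    more than m of them.
    """
    return max(fragments_per_site) <= parity_fragments


# Hypothetical layout (an assumption, not a documented GeoParity setting):
# 9 data + 3 parity fragments, three fragments at each of four sites.
k, m = 9, 3
layout = [3, 3, 3, 3]
assert sum(layout) == k + m

print(f"{k}+{m} erasure coding overhead: {float(erasure_overhead(k, m)):.0%}")  # 33%
print(f"3-way replication overhead: {float(replication_overhead(3)):.0%}")  # 200%
print("survives loss of one site:", survives_site_loss(layout, m))  # True
```

Under these assumptions the erasure-coded layout rides out the loss of a whole site for a third of the extra capacity, while three full replicas cost twice the original data in extra space.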

Now imagine Indiana Jones in this new world. All the wooden crates, holding treasures from all over the world, are open and their contents visible, yet they take up only a fraction of the space they used to. Any item can be found in under a second using a well-known, unique metadata identifier. Similar items are grouped, so you can find “all arks from antiquity” or “items found by Indiana Jones” in a heartbeat. Research notes can be attached to any item and are instantly visible to all users, multiplying the value of the information and yielding insights that were never before possible. Archaeologists would see the collection organized by era or empire, while linguistics professors would see it organized by hieroglyphic or alphabetic family. Indiana Jones could only dream of living in such a world; for us it is the new business reality, driven by the changes Centera initiated and Atmos continues.
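The lookup, grouping, and annotation behavior described above can be sketched as a small in-memory index. Everything here (the `ToyArchive` class, its method names, and the sample objects) is hypothetical and only illustrates the idea; it is not the Atmos API, which the post describes as REST-based. Each stored object gets a unique identifier, its metadata key/value pairs feed an inverted index, and notes are attached per object.

```python
import uuid
from collections import defaultdict


class ToyArchive:
    """Hypothetical in-memory archive: unique IDs, metadata index, annotations."""

    def __init__(self) -> None:
        self._objects: dict[str, dict] = {}
        # Inverted index: (metadata key, value) -> IDs of objects carrying that pair.
        self._index: defaultdict[tuple[str, str], set[str]] = defaultdict(set)

    def store(self, content: bytes, **metadata: str) -> str:
        """Store content with metadata and return its unique identifier."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = {"content": content, "notes": []}
        for key, value in metadata.items():
            self._index[(key, value)].add(object_id)
        return object_id

    def get(self, object_id: str) -> bytes:
        """Direct lookup by identifier: a dictionary access, no scanning."""
        return self._objects[object_id]["content"]

    def find(self, **criteria: str) -> set[str]:
        """IDs of objects whose metadata matches every key=value pair given."""
        matches = [self._index.get((k, v), set()) for k, v in criteria.items()]
        return set.intersection(*matches) if matches else set()

    def group_by(self, key: str) -> dict[str, set[str]]:
        """One 'view' of the collection, organized by a single metadata field."""
        return {v: set(ids) for (k, v), ids in self._index.items() if k == key}

    def annotate(self, object_id: str, note: str) -> None:
        """Attach a research note; every reader of the object sees it."""
        self._objects[object_id]["notes"].append(note)

    def notes(self, object_id: str) -> list[str]:
        return list(self._objects[object_id]["notes"])


archive = ToyArchive()
ark = archive.store(b"ark scan", kind="ark", era="antiquity", found_by="Indiana Jones")
idol = archive.store(b"idol scan", kind="idol", era="pre-Columbian", found_by="Indiana Jones")
archive.annotate(ark, "Do not open.")

print(archive.find(kind="ark", era="antiquity") == {ark})  # True
print(archive.find(found_by="Indiana Jones") == {ark, idol})  # True
print(archive.group_by("era")["antiquity"] == {ark})  # True
print(archive.notes(ark))  # ['Do not open.']
```

Keying the index on (field, value) pairs is what lets a query like “items found by Indiana Jones” avoid scanning every object, and grouping by a different field gives each audience its own view of the same collection. A real system would persist and distribute such an index rather than hold it in memory.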

Photo courtesy of Lucasfilm Ltd.
