Fancy taglines aside, we have entered an age of data storage like no other in the history of information retrieval. The amount of unstructured data created every hour is mind-boggling. This includes video, photos, documents – you name it. Consumers and businesses alike create terabytes of data and need a place to store it.
There is no single approach to Big Data. Many people are tempted to save everything, and while that sounds great, the long-term cost in power, cooling and support makes it a fool’s game.
EMC’s Isilon is one of the leading Big Data offerings on the market. By combining clustered nodes with a single global namespace, Isilon has captured the lion’s share of the growing market for multi-petabyte storage solutions.
With the recent addition of Clustered Data ONTAP, combined with a scale-out architecture and its Distributed Content Repository solution, NetApp has also joined the Big Data parade.
The business of Big Data is not just storage; it is the analytics behind the data that make all of it relevant. Organizations are discovering that sorting through and analyzing Big Data lets them make important predictions. However, since an estimated 80 percent of this data is “unstructured,” it must be formatted (or structured) in a way that makes it suitable for data mining and subsequent analysis.
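As a solution, many large organizations are turning to Apache Hadoop, an open-source software framework for storing and processing large datasets across clusters of commodity servers, as the core platform for structuring Big Data. Hadoop keeps raw data in its distributed file system (HDFS) and processes it in parallel with the MapReduce programming model, turning unstructured input into results that analytics tools can use.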
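Hadoop’s processing layer is built around the MapReduce programming model: a map step emits key/value pairs from raw records, the framework groups those pairs by key, and a reduce step collapses each group into a result. The sketch below shows that flow for a word count, the classic introductory example. It is a single-process, pure-Python illustration, not Hadoop’s actual Java API; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical, and a real Hadoop job would run each phase in parallel across the cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit a (word, 1) pair for every word in one unstructured text record.
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    # Group intermediate values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Collapse each key's list of values into a single structured result.
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # Map every document, then shuffle and reduce the combined intermediate pairs.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["video photo document", "photo photo video"]
    print(word_count(docs))  # {'video': 2, 'photo': 3, 'document': 1}
```

The output is a small, structured table of counts derived from free-form text, which is the kind of transformation that makes unstructured data usable for downstream analysis.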
The biggest architectural change, however, will come from the move away from traditional file-based storage toward object-based storage. Object storage allows for much larger data sets without the metadata limitations that burden traditional file systems.
File-based systems, such as network-attached storage (NAS) devices, manage access and permissions using inodes. An inode stores metadata about a file or directory, including its permissions, ownership, size and the block locations of its data, but it has no awareness of the file’s name or contents; names are kept in directory entries that point to inode numbers.
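On a Unix-like system the split between inodes and file names can be observed directly. The sketch below assumes a POSIX file system that supports hard links; the helper `describe_inodes` and the file names are hypothetical. It creates a file, gives it a second name with `os.link`, and then compares what `os.stat` reports about the inode with what the directory listing records.

```python
import os
import stat
import tempfile


def describe_inodes(workdir):
    """Create a file plus a hard link to it; report inode metadata and directory entries."""
    original = os.path.join(workdir, "report.txt")
    alias = os.path.join(workdir, "alias.txt")
    with open(original, "w") as f:
        f.write("quarterly numbers")
    # A hard link adds a second directory entry (a second name) for the same inode.
    os.link(original, alias)

    info = os.stat(original)
    # The inode holds metadata such as size, permission bits and link count, but no name.
    inode_meta = {
        "inode": info.st_ino,
        "size": info.st_size,
        "mode": stat.filemode(info.st_mode),
        "links": info.st_nlink,
    }
    # Names live in the directory, which maps each entry name to an inode number.
    directory = {entry.name: entry.inode() for entry in os.scandir(workdir)}
    return inode_meta, directory


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        inode_meta, directory = describe_inodes(workdir)
        print(inode_meta)
        print(directory)
```

Both names resolve to one inode with a link count of 2, and nothing in the inode metadata says which name is the "real" one.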
Object storage, by contrast, stores data as discrete units called objects, organized in a flat address space rather than a directory tree. Each object carries its data together with a unique identifier and metadata, which can include multiple levels of system-generated and user-defined attributes. When it comes to managing several petabytes of data and billions of objects, the hierarchical file systems we have relied on in the past become increasingly difficult to scale and search, making them a less reliable method for information retrieval.
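A flat, metadata-rich address space can be illustrated with a toy in-memory store. This is a hypothetical sketch: the `FlatObjectStore` class and its `put`/`get`/`find` methods are illustrative only and do not mirror Amazon S3, Isilon or any other product’s API. Each object is addressed by a generated unique ID instead of a directory path, and carries system-generated metadata (size and SHA-256 checksum here) alongside arbitrary user-defined metadata that can be queried directly.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    system_metadata: dict
    user_metadata: dict = field(default_factory=dict)


class FlatObjectStore:
    """Toy object store: one flat namespace keyed by object ID, no directory tree."""

    def __init__(self):
        # object_id -> StoredObject; there are no parent/child relationships.
        self._objects = {}

    def put(self, data, **user_metadata):
        # A globally unique ID replaces the hierarchical path of a file system.
        object_id = str(uuid.uuid4())
        system_metadata = {
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        self._objects[object_id] = StoredObject(data, system_metadata, dict(user_metadata))
        return object_id

    def get(self, object_id):
        return self._objects[object_id]

    def find(self, **criteria):
        # Locate objects by their metadata instead of walking directories.
        return [
            oid
            for oid, obj in self._objects.items()
            if all(obj.user_metadata.get(k) == v for k, v in criteria.items())
        ]


if __name__ == "__main__":
    store = FlatObjectStore()
    clip_id = store.put(b"...video bytes...", content_type="video/mp4", camera="lobby")
    store.put(b"...scanned contract...", content_type="application/pdf", department="legal")
    print(store.find(camera="lobby") == [clip_id])  # prints True
```

The linear scan in `find` keeps the sketch short; a production object store would index metadata so that lookups stay fast across billions of objects.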