While some may think that Big Data is all hype that will fade once the next big thing comes along, they are probably not looking at the big picture. The reality is that the term Big Data really translates to analytics: information that helps you make better day-to-day decisions and ultimately keeps you competitive in a fast-paced business world. And the truth is, almost every industry today can use that.
Analytics & Business Intelligence systems have been around forever, but big data refers to the famous three Vs – Volume, Velocity and Variety. What this really means is that data is being generated by man and machine, and collected, far beyond our ability to make intelligent or timely use of it. This is mainly due to the widely varying structured and semi-structured formats that don't fit neatly into our traditional decision systems without significant manipulation. In many cases these data sets require a new suite of technologies to digest and use effectively. Having said that, not everyone needs a big data solution – that is pretty clear from the customers we've spoken to. Over time, though, as the vendors in this space consolidate and more standards emerge, big data analytics will be available to the masses. In some ways it already is, through Google, Facebook and others: most of us use big data technologies every day, but they are largely invisible to us.
Now, when it comes to preparing your data center for big data, things get interesting. Most large enterprises have already begun preparing, and it is definitely something you should be planning for now. While most organizations will benefit from standard analytical and visualization platforms (which are significantly more useful and feature-rich than they were even five years ago), end-user requirements are rapidly changing. For example, most of your users want to read and manipulate analytical reports on their tablets and phones. You may be able to get away with a client virtualization technology like server-based computing or virtual desktops for a while, but ultimately your end users will get a better experience with a localized app. The good news is that most analytics companies offer mobile-optimized web applications, or even native apps, for your end users. Those that don't will probably have them within the next year, or they will be fighting extinction in this market.
That being said, the new issue you will face is local LAN and Wi-Fi bandwidth saturation as all these devices come online and start consuming bandwidth on your network. For the most part, this is already happening, so take note: staying on top of wireless infrastructure upgrades, implementing network access control (NAC), and deploying other optimization technologies will be necessary to keep your users happy.
So far, that's basically keeping the lights on. What I mean is, we haven't really talked about the big data aspect yet. We've been talking about analytics, data warehousing applications, and large storage subsystems, which you probably already have experience managing. But when you get into the world of big data, you're playing a whole different ball game. Here are some tips to help you prepare.
1 – Understand what you want to get out of a Big Data initiative. This will probably be the hardest part of your project. Don't expect to stand up a large system, connect all your datasets, and have something great come out of it – unless you have a clear vision in mind. This is where hiring a Data Architect or a Data Scientist would greatly benefit your organization. An experienced architect can go through all your existing structured and unstructured data – from file servers, router logs and other internal sources to external machine-generated data, social media and so on – to paint an accurate picture of its potential. This project will take time and will likely be expensive, but the exercise is well worth it to understand exactly what you have that can drive better business decisions. It is entirely possible that at the end of the project, you discover you don't need a big data solution at all, and that a large data warehouse or even a cloud-based solution is all you need to get started.
2 – Get the right people involved from the beginning. In other words, get buy-in from anyone who will benefit from the system. Evangelize early, keep people involved in the design and decision-making process, and you will see greater success in your project. Always keep this in mind: you don't want to create a system that no one wants to use. If you've been through enough IT projects, you've probably learned that the hard way, and this type of project is no exception.
3 – If you don't have any experience with Hadoop or MapReduce (assuming that's the route you decide to go down), hire talent familiar with these core technologies, or an experienced partner. Trust me when I say that hiring the right talent will help you get started far faster than trying to figure it out yourself. I have talked to many highly skilled data center engineers who tried to stand up Hadoop clusters on their own and made no progress. There are many players beyond the original open-source Apache Hadoop project that can give you a head start – Cloudera, Microsoft, IBM, EMC Greenplum and Hortonworks, to name a few. Most of them add enterprise-class support (which you will need) and extensions that make Hadoop more interoperable with your existing systems.
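To give a flavor of what the MapReduce programming model actually involves, here is a minimal word-count sketch in Python, written in the style of a Hadoop Streaming job. This is an illustration, not a production job: real Hadoop distributes the mapper and reducer across the cluster, and the local `sorted()` call below merely stands in for Hadoop's shuffle-and-sort phase.

```python
# Word count in the MapReduce style: the mapper emits (word, 1) pairs,
# the framework sorts them by key, and the reducer sums counts per word.
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.strip().lower().split():
            yield (word, 1)

def reducer(pairs):
    """Sum counts per word; assumes pairs arrive sorted by word."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    lines = ["big data is big", "data is everywhere"]
    shuffled = sorted(mapper(lines))  # stand-in for Hadoop's shuffle/sort
    for word, count in reducer(shuffled):
        print(f"{word}\t{count}")
```

The split into a stateless mapper and a per-key reducer is what lets Hadoop parallelize the work across hundreds of nodes; the hard part in practice is operating the cluster, which is exactly why experienced talent matters.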
4 – Understand the impact of using white-box servers. The big Hadoop shops (500+ nodes) all use open or white-box servers, and while that may be very economical, remember that those shops have teams of hardware engineers who design and manage these systems. In the short term, you may be better off sticking with standard OEM server manufacturers like IBM, HP and Cisco. While that may be a little pricier, having one server platform to manage and maintain is always going to be easier for your data center.
5 – You don't necessarily need shared storage to build a Hadoop cluster. The whole point of Hadoop is to keep data local on commodity servers with economical local storage, yet we're now seeing larger storage providers offering hybrid solutions with shared storage subsystems. While I feel this is the right direction given the size of the datasets we're starting to deal with, it isn't what Hadoop was originally designed for, so check with your Hadoop distribution provider that your entire solution will work with a shared storage subsystem. Frankly, I think this shift is inevitable.
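For context on why shared storage is optional, HDFS gets its redundancy by replicating each block across multiple DataNodes' local disks rather than relying on a shared array. The replication factor is controlled by the standard `dfs.replication` property in `hdfs-site.xml`; the value shown below is the common default of three copies, which you would tune for your own cluster:

```xml
<!-- hdfs-site.xml: block replication across local disks
     replaces the redundancy a shared storage array would provide -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- each HDFS block is stored on three separate DataNodes -->
    <value>3</value>
  </property>
</configuration>
```

Note that triple replication also means raw local capacity must be roughly three times your usable data size, which is part of the economics to weigh against a shared storage subsystem.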
All the other IT considerations still apply: you have to plan for high availability and continuity – not so much for Hadoop itself, which has data redundancy built in, but for all the analytics and visualization platforms that are the front end to your system. This means accounting for additional software licenses, bandwidth to replicate your data, and so on, to make sure the system is always available to your end users.
Remember that Big Data is still in its infancy. This solution area is going to evolve rapidly over the next three years, and we will see new standards and new technologies emerge as more companies implement these solutions. Even if you don't need such a solution today, you eventually will, because the speed of data creation is only going to increase. And if your competitors can use this data in a meaningful way to make smarter business decisions, bring new products to market, and enhance existing offerings, you want to be in a position to do the same – to innovate and stay relevant.