Business Intelligence (BI), and Big Data in particular, are hot topics in the industry right now because of the vast potential organizations have to benefit from these capabilities: better decision making and, ideally, new insights discovered in the data. But for many, it can be hard to determine what is needed to build a BI solution that meets their needs. On top of that, Big Data can raise more questions than it answers, muddying the waters and wasting time, effort, and resources in the struggle to build an effective BI solution.
The Three V’s
Anyone who has read about or discussed Big Data recently will most likely recognize the three V’s: Volume, Velocity, and Variety.
Simply put, we are concerned with handling large amounts of data (volume), the speed at which the data is being collected and consumed (velocity), and multiple types of data (variety). The problem with these attributes is that they are all in the eye of the beholder. While a few terabytes (TBs) might be Big Data for one organization, another might be struggling with petabytes (PBs) of data – vastly different scales of volume that can call for different solutions to meet the needs of the organization.
Oftentimes we see customers looking to deploy Big Data solutions, specifically Hadoop distributions, to support BI simply because Big Data is the new technology on the block. But when we examine the structure and requirements of the BI solution, a more traditional data warehouse is often the better choice and can deliver superior performance. Building out a BI solution requires careful planning to determine the organization’s goals and requirements, and then to choose the appropriate technology to support those needs.
It’s important to remember that Big Data solutions are not replacing traditional data warehouses but rather augmenting and opening up new data sources and opportunities for a more holistic BI solution. Even with the added complexities of the three V’s, the ultimate goal still remains the same: to utilize your data in order to improve business decision making and discover new insights.
Traditional BI solutions have handled large-scale, and with the introduction of Massively Parallel Processing (MPP) systems even extremely large-scale, workloads for a long time now. But those solutions have typically been limited to structured data. One of the biggest hurdles many customers struggle with is handling the variety of data they would like to use in their BI solution, specifically semi-structured and unstructured data, which is not a strength of a traditional relational database system.
To handle semi-structured and unstructured data, we are normally required to transform that data into a structured format through an Extract, Transform, and Load (ETL) process before bringing it into our structured system. Because of storage and performance constraints, this process often forces us to reduce the volume and/or variety of data brought in from these semi-structured and unstructured sources. As a result, our solution holds only a sampling of the data set, or misses data that may become relevant as our BI solution evolves. It can also be difficult to adjust the structure of the database(s) as requirements change.
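To make the trade-off concrete, here is a minimal sketch of that Transform step, assuming a hypothetical clickstream event and warehouse schema (the field names are illustrative, not from any real system). Any field outside the fixed schema is simply dropped:

```python
import json

# Hypothetical fixed schema for the warehouse table: any field not
# listed here is discarded during the Transform step.
WAREHOUSE_COLUMNS = ["user_id", "page", "timestamp"]

def transform(raw_event: str) -> dict:
    """Flatten a semi-structured JSON event into the fixed warehouse schema."""
    event = json.loads(raw_event)
    # Keep only the columns the warehouse schema knows about.
    return {col: event.get(col) for col in WAREHOUSE_COLUMNS}

raw = ('{"user_id": 42, "page": "/checkout", '
       '"timestamp": "2014-06-01T12:00:00", '
       '"referrer": "search", "device": "mobile"}')
row = transform(raw)
# "referrer" and "device" never reach the warehouse -- exactly the kind
# of detail that may turn out to matter as the BI solution evolves.
```

Recovering those dropped fields later means re-ingesting the source data, if it was even retained.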
With a Hadoop solution, we have more flexibility to handle semi-structured and unstructured data as we can store massive volumes of data on inexpensive, commodity hardware and determine the structure of the data as necessary. We can also easily adjust that structure as our needs change.
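This "determine the structure as necessary" approach is often called schema-on-read. A tiny sketch, again with hypothetical event fields: the raw events are stored untouched, and each analysis projects only the columns it currently cares about, so a new question requires no reload:

```python
import json

# Schema-on-read: raw events are stored as-is; a schema is applied
# only at query time, so new questions need no re-ingestion.
raw_events = [
    '{"user_id": 42, "page": "/checkout", "device": "mobile"}',
    '{"user_id": 7, "page": "/home", "referrer": "search"}',
]

def query(events, columns):
    """Project whatever columns the current analysis needs; missing fields become None."""
    return [{c: json.loads(e).get(c) for c in columns} for e in events]

# Today's question needs device; tomorrow's can ask for referrer
# without changing how the data is stored.
by_device = query(raw_events, ["user_id", "device"])
```

In a real Hadoop deployment this projection would be done by tools such as Hive or Pig over files in HDFS, but the principle is the same.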
Big Data solutions like Hadoop are providing the capabilities to handle these difficult-to-manage data sources and bringing new opportunities for improving and expanding BI solutions.
How Big Data Can Improve BI
I recently attended a Big Data event in Chicago hosted by Gartner, where they shared some great customer stories about the use of Big Data in BI solutions.
The first story I want to share is a great example of using Big Data solutions to handle massive volumes of data. The customer is a major online travel site for booking flights, hotels, and rental cars that was looking to improve the client experience. Due to storage constraints, this organization had historically been able to retain only 14 days’ worth of behavioral data from visitors to their site, which obviously meant only a small sample of the data was available for analysis at any point in time.
By leveraging a Hadoop distribution, they were able to cost-effectively store and maintain exponentially more of this behavioral data. In a single year, they collected 750 TB of data, enabling a deeper analysis of visitor behavior patterns. As a result of that deeper analysis, they were able to improve the client experience and ultimately increase bookings by 2.6% – in other words, 50,000 additional transactions per day!
This is a terrific example of leveraging previously unmanageable amounts of data to improve the customer’s BI solution.
Another interesting story focuses on handling the velocity of data. The customer was a national steakhouse chain looking for a way to improve its brand and reputation. They decided to use social media to understand customer sentiment toward their brand by continuously scanning Twitter for any mention of their name.
By tapping Twitter as a data source, they spotted a tweet from a loyal customer saying his flight was delayed and he would not make it to dinner that night at his favorite steakhouse. Through integrations with their customer management system, they were able to identify the specific customer, and the marketing team jumped into action. Waiting for the customer as he got off the plane was a steak prepared just the way he likes it, delivered by a tuxedo-dressed waiter.
This is a great example of a solution that mashed together unstructured and structured data to provide marketing with relevant, time-sensitive information, allowing them to act quickly and deliver a great customer service story.
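The core of such a mash-up can be sketched in a few lines, under stated assumptions: the brand name, handles, and customer table below are entirely made up, and a production system would use Twitter's streaming APIs and a real CRM rather than in-memory dictionaries. The idea is simply to match free-text mentions (unstructured) against a customer table (structured) and surface the enriched hits to marketing:

```python
# Hypothetical structured customer table keyed by Twitter handle.
customers = {
    "@jdoe": {"name": "John Doe", "favorite_location": "Chicago"},
}

# Hypothetical stream of unstructured tweets.
tweets = [
    {"handle": "@jdoe", "text": "Flight delayed, missing dinner at BigSteak tonight"},
    {"handle": "@anon", "text": "Nice weather today"},
]

def brand_mentions(tweets, brand, customers):
    """Flag tweets mentioning the brand and enrich each hit with customer data."""
    hits = []
    for t in tweets:
        if brand.lower() in t["text"].lower():
            # Join the unstructured mention to the structured record, if any.
            hits.append({**t, "customer": customers.get(t["handle"])})
    return hits

alerts = brand_mentions(tweets, "BigSteak", customers)
```

The value comes less from the matching itself than from doing it continuously and fast enough for someone to act on the result.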
There are numerous other stories just like these two. But every customer and solution will be a little different, because everyone’s data is different. It is key to remember that BI solutions, especially those involving Big Data, take time and development to realize benefits. While vendors are making it easier to stand up the necessary infrastructure, it is not a magic box.
We must take that infrastructure and those tools and develop insights from our data. BI solutions need to be carefully planned based on goals and requirements, but keep in mind that the solution will continue to evolve as we look to leverage more and more new data sources of ever-changing volume, velocity, and variety. Don’t be afraid of what you don’t know; explore the numerous ways to deliver BI to your organization and find what works best to meet your needs.