CSG Blog

Do I Need Hadoop?

Written by Mark Peterman | Oct 15, 2016 3:59:19 PM

Does Hadoop fit my Company?

You might want to ask yourself a few questions about Hadoop before jumping aboard that train.

 

Is it mature?

Will it give me better results than the system I have now? Will it improve customer service and increase my earnings? And most importantly…

 

What is it?

Hadoop is essentially an open-source framework for storing and processing vast amounts of data across clusters of machines, rather than a conventional database management system. It handles Big Data much as a data warehouse does, but doesn't require the data to be structured on the way into the data store. People are wondering whether they need to get on the Hadoop express before they're left in the dust by more progressive companies. The disappointing answer for many of them is "Probably not," which, of course, immediately evokes a "But WHY?"

 

Not Ready for Prime Time

For a massive organization like Walmart, it only takes about 10 Data Engineers to deploy Hadoop for the entire company. That is because they are experienced at innovating and leading the pack, not because it is easy. Most companies would flounder trying to get up to speed. Remember that Walmart invented the empty warehouse concept. Moving from a stocked warehouse to an empty one was only possible when they imposed RFID (Radio Frequency Identification) tags on all their suppliers (under threat of being dropped by the world's largest retailer). They then integrated JiT (Just in Time) shipping. Pallets began to arrive with the exact contents for a specific store, identified with Walmart's own RFID tags. They got it down to an art form. A truck arrived, a pallet was scanned, and it was delivered directly to the appropriate truck on the other side of the warehouse. Total time spent inside the building was usually under 30 seconds.

 

Secure with Shareable Data

Love them or hate them, the difference with an intelligent company like Walmart is that they deploy carefully and thoughtfully. Every time they do something new, it corresponds to an increase in their profits. When they deployed Hadoop, they first separated the Financial Data from the Customer Data. Then they tokenized everything identifiable and encrypted the rest. Now all departments can access the data without revealing Personally Identifiable Information (PII).
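To make the tokenizing step a little more concrete, here is a minimal sketch in Python. It is not Walmart's actual method, which isn't public in this post. It assumes a keyed hash (HMAC-SHA256) as a stand-in for a real token vault, and the field names, the `SECRET_KEY` value, and the `tokenize` and `strip_pii` helpers are all made up for illustration. Encrypting the remaining fields is left out.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a key management system.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical field names, chosen for illustration only.
PII_FIELDS = {"name", "email", "phone"}


def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable token for an identifying value using a keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def strip_pii(record: dict) -> dict:
    """Replace identifying fields with tokens and leave other fields untouched."""
    return {
        field: tokenize(str(value)) if field in PII_FIELDS else value
        for field, value in record.items()
    }


sale = {"name": "Jane Doe", "email": "jane@example.com", "item": "blender", "price": 49.99}
shared = strip_pii(sale)
```

Because the same input always produces the same token, two departments could still join their records on the tokenized email without either one ever seeing the real address.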

 

Hadoop can Improve Customer Service

They still weren't done. Without PII in the data, they could make it available to everybody in the organization who had any possible use for it. It wasn't confined to just a few elite users. When one department worked with another to combine sales data with online marketing data, suddenly they could offer e-receipts. Once that was up and running, Walmart added a system that tracks the price paid. If the store has a price drop, or a competitor offers a lower price, the system automatically sends a gift certificate for the price difference to the customer's email. Consequently, customer loyalty skyrocketed, and once again sales improved as customers took advantage of the "free" Gift Certificate to make additional purchases.
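The price-difference rule described above comes down to a small piece of arithmetic. The sketch below is only an illustration of that rule, not Walmart's implementation; the function name `price_difference_credit` and its parameters are assumptions.

```python
from decimal import Decimal


def price_difference_credit(price_paid, store_price, competitor_price=None):
    """Return the gift-certificate amount owed, or zero if no lower price exists."""
    candidates = [store_price]
    if competitor_price is not None:
        candidates.append(competitor_price)
    lowest = min(candidates)
    # Never issue a negative credit when current prices are higher than the price paid.
    return max(price_paid - lowest, Decimal("0.00"))


credit = price_difference_credit(Decimal("49.99"), Decimal("44.99"), Decimal("46.50"))
```

Using `Decimal` rather than floating point keeps the cents exact, which matters once the result is turned into a real gift certificate.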

The Takeaway

Hadoop is still immature, and Data Lakes are turning into Data Swamps. Hadoop is likely not as fast as your current system (yet). It is not optimized for I/O (yet). The more people use it concurrently, the slower it runs. These things will come in time. Right now your Data Warehouse is probably an optimized system of hardware and software that easily outperforms Hadoop. Even now, people are building systems that incorporate SQL (Structured Query Language) in a manner that actually works with Hadoop. It's certainly going to improve, and as it gets faster and more accessible, experts say it will eventually replace data warehouses.

Keep in mind, however, that installing Hadoop requires new IT personnel with new skills. If you want to be a leader, this just might be a good time to start. But it's going to take the wits to follow in the footsteps of Walmart. First, tokenize, encrypt, and protect. Second, do not hoard your data, because it's useless if it's not shared. Finally, keep abreast of developments and implement them in a way that is useful to your organization, not just because something is the newest thing or "looks pretty". Consider carefully before you dive into Hadoop, because data warehouses are often the more efficient system for the time being.

For both traditional data warehouse technology and newer big data technology like Hadoop, you have a growing array of choices for how you deploy the solution. If your business needs dictate an on-premises solution, you can go that way. But setting up either option in the Cloud gets easier by the day. As you prepare for the future, you could begin training your people on these new technologies so they can better decide the path forward. With Cloud solutions, it's easy and inexpensive for them to begin learning and experimenting on their own. Of course, a consulting firm with experience in these technologies can help you get up to speed quickly. Or, if you know your needs are going to require big data technologies like Hadoop, you might start hiring the right people for those future needs. Usually the goal is to adopt technology ahead of the curve, but in this case the technology is not yet fully mature.

So, just this one time, sit back, look it over, discuss it with your IT people, and then take action carefully.