Data Warehousing: Should You Store Data Internally or in the Cloud?
One of the least understood and perhaps most divisive decisions in implementing a business intelligence solution concerns the storage of its data. The issue is not where your data will be stored, as it will most likely be in a data warehouse, but rather where your data warehouse will be stored. For most organizations it’s a two-horse race. You can either keep your data warehouse within the four walls of your business, or you can upload it to that mysterious place we call ‘the cloud’.
So which is the best choice? To find out, we should look both at how data warehousing has developed up to this point, and the sort of considerations that your organization should be keeping in mind when making the call.
A Very Brief History of Data Warehousing
While various things that could be labeled as data warehousing have existed since the early 70s (when market research giant ACNielsen provided its clients with a sales enhancement tool called a ‘data mart’), the term ‘data warehousing’ wasn’t seen in print until the late 80s. Perhaps unsurprisingly, the term originated from Big Blue. It was IBM that both saw the obvious benefits of data warehousing and had the wherewithal to act.
But while IBM’s efforts were kept very much in-house, a man by the name of Bill Inmon saw the commercial potential of the technology, and is credited with making data warehousing available to all businesses during the early 90s, through his company Prism Solutions. In the intervening quarter century the technology has developed at light speed, but some changes have been more marked than others.
The foundation on which data warehousing has been built is the storage of data within internal repositories, but with the explosion of cloud computing, it’s fair to say that this foundation has been rocked. Such a paradigm shift can be unsettling, and may raise questions in the minds of those who are either new to the field, or are used to doing things a certain way.
So is cloud computing compatible with data warehousing? Can data warehousing adapt to the vagaries of the cloud? And even if it can, is it a wise storage choice?
Considerations for Deciding between Internal or Cloud Storage
To answer the questions above, let’s take a look at some of the most common reservations that are voiced, and the considerations that one needs to make, when deciding between internal and cloud storage.
The most common reservation for organizations in this space concerns security. ‘If I send my data elsewhere, how do I know it’ll be kept safe?’ Some organizations feel far more comfortable with their own security safeguards than they do with those of cloud computing services. It’s an entirely understandable thought, but it’s one that has a clear logical rebuttal.
It must be understood that the full-time job of cloud service providers is to keep their clients’ data secure; their existence is entirely dependent on it. A reputable provider will employ a team of specialists tasked with keeping data safe.
Now ask yourself, does your business boast multiple teams of experts whose sole focus is on the security of your data? Unless you’re one of the biggest players in the market or a government organization, the answer is likely no. Therefore in all likelihood you’re less equipped to handle your own sensitive data than a cloud provider is.
In fact, according to Nominet, 61 percent of security professionals believe that the risk of a security breach is the same or lower in cloud environments compared to on-premises. In 2019, Nominet's VP of cyber security, Stuart Reed, argued that the challenge now isn't whether the cloud is more or less secure than on-premises storage; the challenge is to continually adapt your company's security efforts to the new technology. In his words: "[A]s we move into the ‘cloud era’, arguably security teams need to channel their concern into finding solutions that work with the cloud, just as they have been doing in an on-premise environment. The shift in attitude between on-premise and cloud doesn’t change the remit for security teams, it just puts us on a different type of playing field."
Using the cloud likely represents a security upgrade for your data, as more and more companies invest their time and expertise into building new data security strategies that work with the cloud.
Efficiency and Cost Savings
Internal data storage can be a pricey undertaking. Internal data warehousing compels an organization to purchase hardware, to stay on top of compliance and regulatory concerns and to hire contractors, employees and consultants to oversee its operation. Cloud computing, on the other hand, represents a classic example of economies of scale. By creating an architecture that can be utilized by large numbers of users, the amount of resources required to store and manage x amount of data is greatly reduced.
Just because cloud storage can be cheaper than on-premises storage doesn't mean that it always will be. In cases where you will be using a fixed amount of storage over a long period of time, storing your data on-premises may be more cost effective.
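To make the fixed-storage trade-off concrete, here is a minimal break-even sketch. Every figure in it (the price per terabyte, the upfront hardware cost, the monthly running cost) is a hypothetical placeholder rather than a real vendor quote, and the function names are invented for illustration. It simply finds the first month at which cumulative on-premises spending drops to or below cumulative cloud spending.

```python
def cumulative_cloud_cost(months, terabytes, price_per_tb_month):
    """Total cloud spend after `months`, assuming flat pay-as-you-go pricing."""
    return months * terabytes * price_per_tb_month


def cumulative_on_prem_cost(months, upfront_cost, monthly_running_cost):
    """Total on-premises spend: one-off hardware purchase plus running costs."""
    return upfront_cost + months * monthly_running_cost


def break_even_month(terabytes, price_per_tb_month, upfront_cost,
                     monthly_running_cost, horizon_months=120):
    """Return the first month where on-premises is no more expensive than cloud.

    Returns None if that never happens within the planning horizon.
    """
    for month in range(1, horizon_months + 1):
        on_prem = cumulative_on_prem_cost(month, upfront_cost, monthly_running_cost)
        cloud = cumulative_cloud_cost(month, terabytes, price_per_tb_month)
        if on_prem <= cloud:
            return month
    return None


# Hypothetical inputs: 100 TB held steady, $25 per TB per month in the cloud,
# versus $60,000 of hardware plus $1,000 per month to run it in-house.
month = break_even_month(terabytes=100, price_per_tb_month=25,
                         upfront_cost=60_000, monthly_running_cost=1_000)
print("On-premises breaks even in month:", month)
```

With these made-up numbers the in-house option catches up after 40 months. Lower the cloud price far enough and the function returns None, meaning the cloud stays cheaper across the whole horizon. The sketch ignores growth in data volume, hardware refresh cycles and staffing, all of which can move the answer considerably.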
For those companies whose needs fit the cloud, though, investment in cloud storage is increasing. According to Flexera's 2020 State of the Cloud report, respondents on average in 2020 use 2.2 public clouds and 2.2 private clouds, and they expect to increase their cloud spend by 47 percent in 2021. They are no longer wondering whether they will use the cloud; they're instead focused on how to better use their current cloud storage for cost savings. The cloud is here to stay, and the new concern is how to minimize waste in cloud spend.
One understated benefit of using the cloud is the ability to quickly and easily share data with those who need it. While you’ll obviously apply strict controls to exactly who can see what, an internal system simply can’t compete with the cloud when it comes to instantaneous data sharing with those outside your organization.
The ability to share data can in fact turn into a potential profit center for a business, particularly those with high volume, low sensitivity data. Manufacturers, for example, can purchase detailed information regarding the performance of their products from retailers, and use this data to enhance their offering. Or, in the case of iRobot and its infomercial-famous Roomba vacuum cleaner, information about the layout of your home could be sold off.
The Value of Microseconds
But where does the cloud fall down? One area is in lag time. Sure, with internet speeds getting exponentially higher this will be less and less of a concern for most, but the basic rules of physics still state that no matter how quick your connection, it takes longer for data to travel across the world than it does to travel to an internal data warehouse sitting just a stone’s throw away.
Even if information traveled through fiber-optic cables at the speed of light in a vacuum (which it can’t) and didn’t have to pass through any switches or relays (which it will), the data would still take roughly 67 milliseconds to reach the other side of the planet, some 20,000 kilometers away along the surface. It’s for this reason that in environments where milliseconds count, such as high-frequency trading, the trend is towards super localized data warehouses that facilitate the transaction of data in a literal instant. While not a common concern, it’s certainly one to consider.
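The physical floor on latency can be checked with a back-of-the-envelope calculation. The sketch below assumes an antipodal distance of about 20,000 km (half the Earth's circumference) and a refractive index of 1.47 for optical fiber, which is a typical textbook value and not a measurement of any particular cable. It covers propagation delay only and ignores switches, routing detours and processing time.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458     # speed of light in a vacuum
FIBER_REFRACTIVE_INDEX = 1.47         # assumed typical value for silica fiber
HALF_EARTH_CIRCUMFERENCE_KM = 20_000  # approximate distance to the antipode


def one_way_delay_ms(distance_km, refractive_index=1.0):
    """One-way propagation delay in milliseconds over `distance_km`.

    Light in a medium travels at c divided by the refractive index.
    """
    speed_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return distance_km / speed_km_s * 1000


vacuum_ms = one_way_delay_ms(HALF_EARTH_CIRCUMFERENCE_KM)
fiber_ms = one_way_delay_ms(HALF_EARTH_CIRCUMFERENCE_KM, FIBER_REFRACTIVE_INDEX)
local_ms = one_way_delay_ms(0.1, FIBER_REFRACTIVE_INDEX)  # 100 m of in-house fiber

print(f"Antipode, vacuum: {vacuum_ms:.1f} ms")
print(f"Antipode, fiber:  {fiber_ms:.1f} ms")
print(f"100 m, fiber:     {local_ms:.6f} ms")
```

Under these assumptions the one-way trip comes to about 67 ms in a vacuum and about 98 ms in fiber, while 100 meters of local fiber costs well under a microsecond. A round trip doubles each figure.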
The Need to Upload
Another obvious concern for those considering data storage in the cloud is the need to get it there. When you have terabytes or even petabytes of data to upload to the cloud, obvious questions regarding security and bandwidth are raised.
Many cloud providers get around this in an ever so quaint way by accepting preloaded hard drives via secure delivery. Who’d imagine that snail mail could be so helpful in cloud computing? Others use third-party providers like Equinix to facilitate direct connections to the cloud.
There are ways and means to mitigate the challenges presented by the transfer of excessive amounts of data, but some would argue that they aren’t particularly elegant. And these challenges evaporate as soon as you commit to internal data warehousing.
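To get a feel for why shipping hard drives can beat an internet upload, the following sketch estimates transfer time from data volume and link speed. The link speeds and the 80 percent utilization factor are illustrative assumptions, not measurements, and the function name is invented for this example.

```python
SECONDS_PER_DAY = 86_400


def upload_days(data_terabytes, link_mbps, utilization=0.8):
    """Estimated days needed to upload `data_terabytes` over a `link_mbps` link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits per second).
    `utilization` is the assumed share of the link actually available.
    """
    total_bits = data_terabytes * 1e12 * 8
    effective_bits_per_second = link_mbps * 1e6 * utilization
    return total_bits / effective_bits_per_second / SECONDS_PER_DAY


# Hypothetical scenarios: (terabytes, link speed in Mbps)
for terabytes, mbps in [(10, 100), (100, 1_000), (1_000, 1_000)]:
    days = upload_days(terabytes, mbps)
    print(f"{terabytes:>5} TB over {mbps:>5} Mbps: about {days:.1f} days")
```

Under these assumptions, 100 TB over a 1 Gbps link takes roughly eleven and a half days of continuous transfer, and a petabyte takes close to four months, which is why a courier with a crate of drives remains competitive.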
Snowflake: Bridging the Gap between Internal and Cloud Storage
We would be remiss not to mention one of the greatest innovations in data storage: Snowflake's unique database architecture, designed completely for the cloud. Snowflake unites your data lakes, data warehouses, data marts, and even your cloud storage, making your data accessible on any computer that has a Snowflake login.
We wrote a comprehensive blog on what Snowflake is and how it can be used in combination with your current data storage solutions, but here is a quick summary of how Snowflake may be useful in your company's data storage strategy:
First, Snowflake resolves the problem of constantly having to scale up your on-premises data storage. Its architecture, hosted in the cloud, can read data from several sources simultaneously (your data warehouse, data lake, cloud storage, etc.) and then almost instantly make it available to users, both internally and to customers. This eliminates the problem of having to manually update information across storage solutions if you use both the cloud and on-premises storage.
Additionally, Snowflake's unique architecture allows for extremely efficient data processing, making your data available more quickly than it is with purely on-premises storage. Because Snowflake is built to operate in multiple processing clusters that communicate in the cloud, those clusters process your analyst team’s queries all at once, and you can even set Snowflake to automatically expand and contract based on the size of your workload. With these features you’ll get insight faster than ever before, all without overloading your system.
Lastly, Snowflake is built for the cloud, using a completely different architecture from that of traditional data warehouses. This gives users instant access to resources across several types of storage, and you can even connect several cloud platforms at once into an overarching cloud strategy. With Snowflake you can easily add storage, share your data with anyone across the organization, and access data across multiple platforms.
We suggest reading our full blogpost on Snowflake to understand if it is the right solution for your business, but we can assuredly say that using Snowflake is an effective, efficient way to create a reliable single source of truth for your company.
The Choice is Yours
So which side of the data warehouse storage fence does your organization sit on? The answer, perhaps annoyingly, is not black and white. It will depend on how each of these considerations affects your business and can only be found by carefully analyzing your strengths, weaknesses, needs and wants.
The one concrete fact that you can take away from this article is that the importance of having an effective data storage strategy will only become more pronounced over time, so whichever data storage path you choose, you must ensure that it will be capable of servicing your organization well into the future.