The Future of the Data Warehouse

by Mangal Tiwari

Recent developments in data management have prompted debate about the future of the data warehouse. Many pundits have declared it dead, but most organizations continue to operate at least one data warehouse and expect to do so for the foreseeable future. Although the vision of the data warehouse as a single version of the truth has not been realized, it still provides value to organizations.

Today there is an explosion of sources feeding into the data warehouse and many applications making use of data. Product teams have to be able to leverage a plethora of data to inform the decisions they’re making and the data warehouse acts as a source of truth.

Traditional data warehouses can’t keep up

GigaSpaces describes how next-generation Operational Data Stores (ODS) address the limitations of the traditional ODS in performance and latency. In the same way, legacy data warehouses need modernization in order to fit into modern analytics ecosystems. Traditional data warehousing is unable to cope with these demands and needs to evolve both architecturally and technologically to be more agile, scalable and adaptable in the face of continuous change.

Some data warehouse problems

With the flexibility of the data warehouse comes the onus on teams to figure out how to structure the data.

One problem with using the data warehouse as a source of truth is how slowly it updates. Data warehouses aren’t designed to be refreshed every five minutes. They are very powerful and scalable, but that comes at the cost of data refresh speed. A view that joins several data sources can only be as fresh as its slowest source, and the latency can really add up.
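The point about joined views can be made concrete with a small sketch. The source names and refresh times below are hypothetical, chosen only for illustration; the idea is simply that a view built from several sources is as stale as the source that was refreshed longest ago.

```python
from datetime import datetime, timedelta


def view_freshness(last_refreshed: dict[str, datetime]) -> datetime:
    """Return the freshness of a view that joins all the given sources.

    The joined view is only as current as its least recently
    refreshed source, so this is the minimum of the refresh times.
    """
    return min(last_refreshed.values())


def view_staleness(last_refreshed: dict[str, datetime], now: datetime) -> timedelta:
    """Return how far behind 'now' the joined view is."""
    return now - view_freshness(last_refreshed)


# Hypothetical sources with very different refresh schedules.
now = datetime(2024, 1, 1, 12, 0)
sources = {
    "orders": now - timedelta(minutes=5),   # near real-time feed
    "crm": now - timedelta(hours=1),        # hourly sync
    "billing": now - timedelta(hours=24),   # nightly batch
}

staleness = view_staleness(sources, now)
print(f"Joined view is {staleness} behind")
```

Even though two of the three sources are fairly fresh, the nightly batch source sets the staleness of the whole view.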

Data consistency and correctness are an organization-wide challenge. Mistakes made upstream carry through to the warehouse and undermine data consistency. It is not possible to get everything correct, but taking more care with data collection helps improve accuracy.

Companies that are continually changing, improving and adapting will inevitably experience some problems as part of the permanent exploration loop.

Desirable qualities of a data warehouse

Integrated data: The disparity among data sources needs to be resolved to offer a consistent, reliable source of data for reporting and analysis. People are unsure about leveraging data if they don’t know it is correct.

Non-volatile data: Retains unrevised history and supports reliable reports and analysis of past events.

Cleansed data: Data must be cleansed to mitigate the risks coming from defects in data.

Subject-oriented data: Data is organized around major business subjects.

Time-variant history: Data is captured at uniform time intervals, kept beyond its lifespan in operational source systems, and organized to support time-series analysis.
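Two of the qualities above, non-volatile data and time-variant history, can be sketched together. The following is a minimal illustration under assumed names (the `HistoryTable` class, the customer key and the tier values are all hypothetical), not a description of how any particular warehouse implements them: rows are only ever appended, and a query asks what the value was as of a given date.

```python
from bisect import bisect_right
from datetime import date


class HistoryTable:
    """Append-only history of values per key, queryable as of a date."""

    def __init__(self):
        # key -> list of (effective_date, value), kept in date order
        self._rows = {}

    def append(self, key, effective, value):
        """Add a new version; existing versions are never revised."""
        rows = self._rows.setdefault(key, [])
        if rows and effective <= rows[-1][0]:
            raise ValueError("history is append-only; dates must increase")
        rows.append((effective, value))

    def as_of(self, key, when):
        """Return the value in effect on 'when', or None if none existed yet."""
        rows = self._rows.get(key, [])
        idx = bisect_right([d for d, _ in rows], when)
        return rows[idx - 1][1] if idx else None


history = HistoryTable()
history.append("customer-1", date(2023, 1, 1), "bronze")
history.append("customer-1", date(2023, 6, 1), "gold")

# A report about March still sees the value that was true in March.
print(history.as_of("customer-1", date(2023, 3, 15)))
```

Because earlier versions are retained rather than overwritten, reports on past periods stay reproducible even after the operational system has moved on.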

What’s new in data warehouses?

Data warehouses have existed for decades, but it wasn’t always possible to power several workloads at once. Only in the new generation of data warehouses is it possible to separate storage from compute. Within compute, it is also possible to separate workloads, so two users can scale without affecting each other. Nightly or monthly analytics batch processing used to take all the resources, but now users are able to explore more uses.

Cloud data warehouses

A new generation of cloud data warehouses has unique profiles in terms of how they scale and how they grow. They respond to many of the challenges of legacy data warehouses, offering scalability, elasticity, performance, and workload management.

Cloud data warehouses have created space in the ecosystem for new categories of products whose functions were previously handled by monolithic data warehouses. This has allowed entrepreneurs to create innovative solutions and to utilize all the data in ways that were not possible when a handful of companies controlled the gateway to having a data warehouse.

Data teams and business teams need to work together

The main value of having a data team serve the rest of the company is that the organization moves from working in silos to working together. This makes it possible to start distilling the needs of different teams into a common set of data points and to build a unified data model with core analytics. Data teams and business teams have the same goals. Data warehouses exist to drive value, and that value is actionable insights that can inform decision-makers.

What use cases could emerge in the next few years?

Data warehouse modernization is likely to occur as a series of changes implemented over time. We will continue to see more data sources, and the need for a source of truth to unify that data will become even more significant. Latency also becomes more important as more critical use cases are added.

People increasingly want to own their data and control their privacy, and the data warehouse is likely to play a big part in that over the next decade, particularly in how privacy controls and online security are built into it.

More advanced analytics are still out of the reach of most business users. There is another stage at the top of the value pyramid, where automating more advanced analytics or building better tooling around them can help with understanding why things are happening. This goes beyond looking at dashboards and reports to predicting or understanding what drives outcomes.

Several recent technologies are capable of addressing many of the legacy data warehouse challenges. One of these is in-memory columnar storage, which delivers measurable performance gains when working with large volumes of structured data and compute-intensive analytics applications.
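To illustrate what a columnar layout means, here is a small sketch using only the Python standard library. The order records and the `to_columns` helper are hypothetical, and real columnar engines add compression and vectorized execution that this example does not attempt; it only shows that an aggregate over one column no longer has to touch the other fields.

```python
from array import array

# Row-oriented layout: each record carries every field.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 200.0},
]


def to_columns(records):
    """Pivot a list of row dictionaries into one list per column."""
    return {name: [record[name] for record in records] for name in records[0]}


columns = to_columns(rows)

# Column-oriented layout: all amounts sit together in one compact,
# typed in-memory array, so summing them reads only that column.
amounts = array("d", columns["amount"])
total = sum(amounts)
print(total)
```

Analytical queries typically scan a few columns across many rows, which is why keeping each column contiguous in memory suits them.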

A final word

Modernizing legacy data warehouses includes migrating to more up-to-date technologies that can improve performance, data variety, freshness and scalability. The architectural framework needs to break down data silos, minimize data redundancy and maximize data re-use. An integrated analytics ecosystem provides the necessary range of data, improves data management, enhances data value, and creates a positive business impact.
