What lies beyond the data warehouse?

Since the 1990s, organisations have gathered, processed and analysed business information in data warehouses.

The term “data warehouse” was introduced to the IT mainstream by American computer scientist Bill Inmon in 1992, and the concept itself dates back further, with the founding of Teradata in 1979 and work carried out by IBM in the early 1980s.

Their goal was to allow enterprises to analyse business data to improve decision making, without the need to interrogate perhaps dozens of different business databases.

Since then, the technology has evolved, allowing organisations to process data at greater scale, speed and precision.

But some commentators now believe the data warehouse has reached the end of its useful life.

Ever greater volumes of data, along with the need to process and analyse information more quickly, including potentially in real time, are putting stress on conventional data warehouse architectures.

And data warehouse suppliers face competition from the cloud. An on-premise data warehouse can cost millions of dollars, take months to implement, and, critically, more months to reconfigure for new queries and new data types. CIOs are looking at the cloud as a more flexible home for analytics tools.

Exponential growth in business data

Conventional data warehouses are struggling with exponential growth in business data, says Richard Berkley, a data and analytics expert at business advisory firm PA Consulting.

“The cloud now provides much more scalability and agility than conventional data warehouses,” he says.

“Cloud technologies can scale dynamically, pulling in the processing power needed to complete queries quickly just for the processing time. You’re no longer paying for infrastructure that sits idle and you can get far better performance as the processing for individual queries is scaled far beyond what is feasible in on-premise services.”

Nor are data volumes the only challenge facing the data warehouse. Organisations want to avoid being locked into one database or data warehouse technology.

Increasingly, businesses want to draw insights from data streams – from social media, e-commerce, or sensors and the internet of things (IoT). Data warehouses, with their carefully crafted data schemas and extract, transform and load (ETL) processes, are not nimble enough to handle this type of query.
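The "carefully crafted schemas" of the warehouse reflect a schema-on-write discipline: records are conformed and validated before they are loaded, or rejected. A minimal sketch of that ETL step (the field names and validation rules here are invented purely for illustration):

```python
# Minimal schema-on-write ETL sketch: rows must conform to the
# warehouse schema *before* loading, or they are quarantined.
from datetime import date

# Hypothetical target schema: column name -> cast function
SCHEMA = {"order_id": int, "amount": float, "order_date": str}

def transform(raw_rows):
    """Clean and conform raw records to the target schema."""
    clean, rejected = [], []
    for row in raw_rows:
        try:
            conformed = {col: cast(row[col]) for col, cast in SCHEMA.items()}
            date.fromisoformat(conformed["order_date"])  # validate date format
            clean.append(conformed)
        except (KeyError, ValueError):
            rejected.append(row)  # quarantined for later inspection
    return clean, rejected

raw = [
    {"order_id": "1", "amount": "19.99", "order_date": "2021-06-01"},
    {"order_id": "2", "amount": "n/a", "order_date": "2021-06-02"},  # bad amount
]
loaded, quarantined = transform(raw)
```

It is exactly this up-front conforming step that makes the warehouse fast for repeated queries, and slow to adapt to new, loosely structured streams.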

“The market has evolved,” says Alex McMullan, chief technology officer for Europe, the Middle East and Africa at storage supplier Pure.

“It is no longer about an overnight batch report which you then give to the CEO as a colour printout. People are doing real-time analytics and making money in the space.” Applications, he says, run from “black box” financial trading to security monitoring.

Lakeside view

At one point, data lakes appeared set to take over from data warehouses. In a data lake, information is stored in its raw form, on object storage, mostly in the cloud.

Data lakes are quicker to set up and operate, as there is no prior processing or data cleansing, and the lake can hold structured and unstructured data. The processing, and ETL, takes place when an analyst runs a query.
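The schema-on-read approach described above can be sketched in a few lines: raw records land in the lake untouched, and structure is imposed only when a query runs. This is an illustrative sketch, not any particular lake engine's API, and the sample records are invented:

```python
# Schema-on-read sketch: raw records are stored as-is; parsing and
# filtering happen only at query time.
import json

# Raw lines as they might land in object storage, untouched on ingest.
raw_lake = [
    '{"user": "a", "event": "click", "ms": 120}',
    '{"user": "b", "event": "view"}',  # missing field: fine at ingest
    'not json at all',                 # malformed: also fine at ingest
]

def query(lake, predicate):
    """Apply structure at read time, skipping records that do not parse."""
    for line in lake:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # the cost of deferring validation to query time
        if predicate(rec):
            yield rec

clicks = list(query(raw_lake, lambda r: r.get("event") == "click"))
```

Ingest is trivially fast and tolerant, but every query pays the parsing and validation cost that the warehouse paid once, up front.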

Data lakes are increasingly used outside traditional business intelligence, in areas such as artificial intelligence and machine learning, and, because they move away from the rigid structure of the data warehouse, they are sometimes credited with democratising business intelligence.

They do, however, have drawbacks of their own. Data warehouses use their structure to deliver performance, and that discipline can be lost with a data lake.

“Organisations can accumulate more data than they know what to do with,” says Tony Baer, analyst at dbInsight. “They don’t have that discipline of an enterprise architecture approach. We gather more data than we need, and it is not being fully utilised.”

To deal with this, enterprises throw more resources at the problem – all too easy to do with the cloud – and end up with performance “almost as good as a data warehouse, through brute force”, he says.

Controlling queries and costs

This can be inefficient, and costly. Baer points out that cloud analytics suppliers such as Snowflake are building in more “guardrails” to control queries and costs. “They are moving in that direction, but it is still easy to keep adding VMs,” he says.

Data warehouses and data lakes also exist to support different enterprise requirements. The data warehouse is good for repeatable and repeated queries using high-quality, cleaned data, often run as a batch. The data lake supports a more ad-hoc – even speculative – approach to interrogating business information.

“If you are doing ‘what if’ queries, we are seeing data lakes or document management systems being used,” says Pure’s McMullan. He describes this as “hunter gatherer” analytics, while data warehouses are used for “farming” analytics. “Hunter gatherer analytics is looking for the questions to ask, rather than repeating the same question,” he says.

The goal for the industry, though, is to combine elasticity, speed, the ability to handle streamed data and efficient query processing in a single platform.

New architectures

This points to a number of new and emerging categories, including the data lakehouse – the approach taken by Databricks – Snowflake’s cloud-based, multi-cluster architecture, and Amazon’s Redshift Spectrum, which connects the supplier’s Redshift data warehouse to its S3 storage.

And, although the industry has largely moved away from trying to build data lakes around Hadoop, other open-source tools, such as Apache Spark, are gaining traction in the market.

Change is being prompted less by technology than by shifts in businesses’ analytics needs.

“Data requirements differ from those of five or 10 years ago,” says Noel Yuhanna, an analyst covering data management and data warehousing at Forrester. “People are looking at customer intelligence, change analysis and IoT analytics.

“There is a new generation of data sources, including sensor and IoT data, and data warehouses have evolved to address this, [by handling] semi-structured and unstructured data.”

The cloud adds elasticity and scale, he says, along with cost savings of at least 20% – and 50% or even 70% in some situations. However, he cautions that few companies genuinely operate their analytics systems at petabyte scale: Forrester calculates that fewer than 3% do.

Those that do are mostly in manufacturing and other highly instrumented businesses, which might turn to edge processing and machine learning to cut down data flows and speed decision making.

The other change is the move towards real-time processing, with “click stream” data in e-commerce, entertainment and social media producing constant flows of information that need immediate analysis, but have limited longer-term value. Organisations, for their part, will only invest in stream analytics if the business can react to the information, which in turn requires high levels of automation.
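The kind of stream analytics described here typically aggregates events into fixed time windows that can be discarded once processed. A minimal tumbling-window sketch (the page paths and window length are invented for the example, and a production system would use a streaming engine rather than plain Python):

```python
# Tumbling-window count over a click stream: events are grouped into
# fixed 60-second windows, which can be dropped once closed - matching
# data that needs immediate analysis but has little long-term value.
from collections import Counter

WINDOW_SECS = 60

def window_counts(events):
    """events: iterable of (timestamp_secs, page) pairs -> per-window page counts."""
    windows = {}
    for ts, page in events:
        window_start = ts - (ts % WINDOW_SECS)  # align to window boundary
        windows.setdefault(window_start, Counter())[page] += 1
    return windows

stream = [(0, "/home"), (15, "/cart"), (59, "/home"), (61, "/home")]
counts = window_counts(stream)
```

The automation point follows directly: the per-window counts are only worth computing if something downstream, such as an alert or a recommendation, consumes them before the next window closes.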

This is prompting suppliers to claim they can straddle both markets, combining the flexibility of the data lake with the structured processing of the data warehouse. Databricks, for example, says it can enable “business intelligence and machine learning on all data” in its data lakehouse, removing the need for its customers to run duplicated data warehouse and data lake architectures.

Whether that means the demise of the conventional data warehouse, though, is unclear.

“Without this lakehouse, the world is divided into two different parts,” says Ali Ghodsi, CEO of Databricks. “There are warehouses, which are mostly about the past, and you can ask questions about ‘what was my revenue last quarter?’ On the other side is AI and machine learning, which is all about the future. ‘Which of my customers is going to disappear? Is this engine going to break down?’ These are much more interesting questions.

“I think the lakehouse will be the way of the future, and 10 years from now, you won’t really see data warehouses being used like this anymore,” he says. “They will be around just like mainframes are around, but I think the lakehouse category is going to subsume the warehouse.”

Back to the future

By no means everyone believes the data warehouse has had its day, however. As Databricks’ Ghodsi concedes, some systems will carry on as long as they are useful. And there are risks inherent in moving to new platforms, however great their promise. “Data lakes, and new infrastructure models, can be too simplistic and do not fix the real complexity challenge of managing and integrating data,” says PA Consulting’s Berkley.

Much will depend on the insights organisations need from their data. “Data warehouses and data lakes are very complementary,” says Jonathan Ellis, chief technology officer of Datastax. “We don’t serve Twitter or Netflix out of a data warehouse, but we don’t serve a BI dashboard out of Cassandra. [We] run live applications out of Cassandra and do analytics in the data warehouse. What is exciting in the industry is the conjunction of streaming technology and the data warehouse.

“Databases are sticky, and although everybody in the data warehousing space broadly supports SQL, the devil is in the detail,” he says. “How you design schemas for optimum performance differs from supplier to supplier.”

He predicts a hybrid model, comprising on-premise and cloud, open source and proprietary software, to create a “deconstructed data warehouse” that is more flexible than conventional offerings, and more able to handle real-time data.

Others in the industry agree. We are likely to see a more diverse market, rather than one technology replacing all others, even if this poses a challenge for CIOs.

The data warehouse is likely to carry on, for some time at least, as the “gold copy” of enterprise data.

Pure Storage’s McMullan predicts that organisations will use warehouses, lakes and hubs to view different sets of data through different lenses. “It will be a lot harder than it used to be, with modern data sets and the requirements to go with it,” he says. “It is no longer about what you can do in your 42U, 19-inch rack.”
