Google Cloud federates warehouse and lake, BI and AI

Google Cloud is making a series of announcements today, covering a variety of its data, analytics, and AI services. The mix of preview and general availability (GA) releases together bolsters Google’s data and AI story as it competes with Amazon Web Services (AWS) and Microsoft Azure.

In a blog post, Gerrit Kazmaier, general manager of databases, data analytics and Looker at Google Cloud, said: “With the dramatic growth in the amount and types of data, workloads and users, we are at a tipping point where traditional data architectures, even when implemented in the cloud, cannot unlock their full potential. As a result, the gap between data and value is growing.”

Perhaps in response, the overall theme of Google’s announcements today is bringing things together. Google Cloud’s data warehouse and data lake will be more integrated; Google’s organically developed business intelligence (BI) components will work more closely with the Looker BI technology that Google acquired in 2020; and Google’s AI and analytics components will also work together more seamlessly.

A warehouse near the lake

Perhaps the most important of today’s announcements is the preview release of a new data lake offering, called BigLake. As you might guess from the name, this service will integrate data lakes stored in Google Cloud Storage (GCS) much more tightly with BigQuery, Google Cloud’s data warehouse service. Google Cloud customers will not only be able to query data across the lake and the warehouse together, from services like Spark, Presto, and even TensorFlow, but will also get unified security and governance across both.

This coordination of lake and warehouse will resonate with fans of the so-called lakehouse model, while still respecting that data lake and data warehouse technologies each have their own strengths. In other words, customers will have a choice of what data to store where, and can still have a unified query and governance experience. General availability of this service is likely to arrive at the end of the calendar year.

Google is also announcing something called Spanner change streams, a change data capture service that will replicate real-time data from Google Cloud Spanner to BigQuery, Pub/Sub, or Google Cloud Storage. This offering seems fairly comparable to Microsoft’s Azure Cosmos DB change feed. This service isn’t available yet, but Google says it’s “coming soon.”

A great deal (BI)

Six years ago, Google launched its self-service BI product called Google Data Studio, making it easy for business users to create visualizations of data stored in a variety of repositories and platforms. Later, extensions were made to make Google Sheets more data-savvy as well. But then Google Cloud also acquired independent BI player Looker, which left customers and industry journalists (including this one) wondering what the future held for Data Studio.

Google is clearing up that story today, explaining that Google Data Studio can now connect to data contained in Looker models, and that Google Connected Sheets can do the same. Looker includes the Explore data query and visualization front end, but it also has a kind of back end, in which customers create comprehensive models that combine data from different sources. Within a model, customers define which elements of the combined data are measures (metrics) and which are dimensions (categories, such as product, time, and location) used to aggregate or break down those metrics.
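To make the measure/dimension distinction concrete, here is a minimal sketch in plain Python. It is not LookML and not Looker's API; the sample rows, the `MODEL` dictionary and the `query` helper are all hypothetical names invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows, as if already combined from several sources.
ROWS = [
    {"product": "widget", "region": "EMEA", "revenue": 120.0},
    {"product": "widget", "region": "APAC", "revenue": 80.0},
    {"product": "gadget", "region": "EMEA", "revenue": 200.0},
]

# A toy model: dimensions are fields to group by,
# measures are named aggregations over a group of rows.
MODEL = {
    "dimensions": {"product", "region"},
    "measures": {
        "total_revenue": lambda rows: sum(r["revenue"] for r in rows),
        "row_count": len,
    },
}


def query(model, rows, dimension, measure):
    """Break one measure down by one dimension."""
    if dimension not in model["dimensions"]:
        raise KeyError(f"unknown dimension: {dimension}")
    aggregate = model["measures"][measure]

    # Group rows by the dimension's value, then aggregate each group.
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row)
    return {key: aggregate(members) for key, members in groups.items()}


if __name__ == "__main__":
    print(query(MODEL, ROWS, "product", "total_revenue"))
    print(query(MODEL, ROWS, "region", "row_count"))
```

The point of the sketch is that the definitions live in one place (the model), so any front end that reads the model gets the same answer for "revenue by product."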

Looker models are built in a special language called LookML (“ML” stands for modeling language, not machine learning), and those models will now be readable by Google Data Studio and Google Sheets, allowing them to serve developers, enterprise BI analysts, self-service business BI users and spreadsheet users alike.

AI, meet BI

Google has, for quite some time, seen itself as a leading contender to build the premier cloud for artificial intelligence (AI). And while the company’s AI prowess is quite apparent, Google Cloud AI was until recently more of a collection of individual services. The assortment included a TensorFlow cloud service, a variety of Web API-based cognitive services, and an in-database AI service called BigQuery ML (where, this time, ML stands for “machine learning”). Meanwhile, Microsoft’s Azure Machine Learning and AWS’s SageMaker offered more integrated machine learning platforms, albeit sometimes under a common brand.

Google’s response to this was its Vertex AI service, launched into general availability in May of last year. And here again, Google Cloud is focusing on cohesion and integration. An important part of the service, Vertex AI Workbench, which reaches GA today, natively integrates with BigQuery, Serverless Spark, and Dataproc.

Today, Google is adding a new model registry to Vertex AI. Think of a model registry in the world of machine learning as comparable to a data catalog in the world of databases and analytics, in that it is a central repository and governance tool that makes all of an organization’s machine learning models discoverable. Google also notes, in keeping with the general theme of unification, that the model registry will catalog models that live in both Vertex AI and BigQuery ML.

Analysis stack reduction

What’s interesting about all of today’s Google announcements is how reminiscent they are of patterns that have already appeared in the worlds of analytics and BI. For example, building a parallel data warehouse/data lake environment is a lot like what Microsoft’s Azure Synapse Analytics had already done: bring together the former Azure SQL Data Warehouse with Azure Data Lake Storage, Spark, and a data lake query engine.

On the BI side, bringing native and acquired technologies together is very reminiscent of what Microsoft, IBM, SAP, and Oracle did in the 2000s when they made their own BI acquisitions, of ProClarity, Cognos, BusinessObjects, and Hyperion, respectively. Even the idea of Google using Looker’s semantic layer technology to tie in with Data Studio and Connected Sheets has precedent. To this day, BusinessObjects “universes,” also a semantic data model technology, are a centerpiece of SAP’s BI story, both on-premises and in the company’s Analytics Cloud service.

In many ways, today’s leading cloud providers mirror the enterprise “mega providers” of fifteen or twenty years ago. And, fittingly, today’s Google Cloud data and analytics announcements show that the enterprise stack model is very much alive, even in the cloud era.
