Transforming Passive Data Catalog into Active Data Hub to Deliver Significant ROI

According to Gartner, organizations that provide access to a curated catalog of internal and external data assets will derive twice as much business value from their analytics investments by 2020 as those that do not.

Content Summary

Defining the data catalog
What’s wrong with data catalogs?
Introducing the active data hub
Insights: How to deliver a successful active data hub
Putting it into practice: Real-world case studies
Turning data into actionable business

Gartner’s prediction is a ringing endorsement of data catalogs, and a growing number of enterprises seem to agree. In fact, the global data catalog market is expected to grow from US$210.0 million in 2017 to US$620.0 million by 2022, at a Compound Annual Growth Rate (CAGR) of 24.2%.
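As a quick sanity check on the market figures, the growth rate can be recomputed from the two endpoints. This is a minimal sketch using the standard CAGR formula; the function name `cagr` is ours, and the only inputs are the numbers quoted above.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.242 means 24.2%)."""
    return (end_value / start_value) ** (1 / years) - 1


# US$210.0 million in 2017 growing to US$620.0 million by 2022 (five years).
rate = cagr(210.0, 620.0, 2022 - 2017)
print(f"{rate:.1%}")  # prints 24.2%
```

The result agrees with the quoted 24.2%, so the market figures are internally consistent.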

Why such large and intensifying demand for data catalogs? The primary driver is that many organizations are working to modernize their data platforms with data lakes, cloud-based data warehouses, advanced analytics and various SaaS applications in order to grow profitable digital initiatives. To support these digital initiatives and other business imperatives, organizations need more reliable, faster access to their data.

However, modernizing data platforms can create problems, including data sprawl and the propagation of ungoverned data and data quality issues. Although data catalogs can deliver some value by providing visibility into datasets across an organization and helping users better understand relevancy, usability, and relationships between data, is a traditional data catalog ultimately the best solution? Organizations looking to take a “future-proof” approach to data platform modernization may want to consider their options before jumping on the data catalog bandwagon.

Consider this: a traditional data catalog is “passive” and can only deliver limited business value. Enterprises that truly want to derive value from data into the future need to put in place an “active data hub” that enables broader self-service access and data transformation capabilities. Let’s look at why and how.

Defining the data catalog

A data catalog is an inventory of an organization’s data assets. It provides context through description and organization so that data consumers, including business users, data scientists and data analysts, can discover and understand the datasets they need for business initiatives.

In our work, we generally see two types of data catalogs. The first type is the “pure-play,” single-purpose solution, which generally focuses on inventorying data using machine learning and may also enable data annotation and some governance for updates.

The second type of data catalog is one that is embedded into data management, data governance and analytic applications. These catalogs have many of the same features as single-purpose data catalogs but are more geared towards improving the inventory of data for the overall application. Analytic applications may provide an improved or integrated catalog for models and model usage that is superior to that of single-purpose solutions.

There is value in both of these types of catalogs, depending on your internal requirements, whether you want to standardize your technology platforms, and the level of integration you want between data inventory and your data governance, data management or analytic management capabilities. There are options to consider – and it’s a good idea to take a closer look at the pros and cons.

What’s wrong with data catalogs?

There are many use cases for data catalogs that can bring business value. They can support new data initiatives, enable regulatory compliance, increase data preparation productivity and improve data quality control. However, we are finding that more of our customers also want to take action from their catalog, so that their data consumers not only understand what data they have but can also use it to support a more data-driven business approach. They want the data catalog to function as the starting point for self-service activities such as data provisioning and transformations. In this unconventional scenario, the platform would enable data consumers to “shop” for data in the catalog and then transform and move it to a sandbox or analytic or reporting environment.
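The “shop, transform, move” scenario can be made concrete with a small sketch. Everything here is hypothetical and for illustration only: `Dataset`, `Catalog`, `provision` and the in-memory `sandbox` dictionary are names we made up, not part of any product API, and a real platform would add governance, access control and persistent storage.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Dataset:
    """A catalog entry: metadata plus (here, in-memory) data rows."""
    name: str
    description: str
    rows: list[dict]


@dataclass
class Catalog:
    """Hypothetical catalog that data consumers can search ("shop")."""
    datasets: dict[str, Dataset] = field(default_factory=dict)

    def register(self, dataset: Dataset) -> None:
        self.datasets[dataset.name] = dataset

    def search(self, keyword: str) -> list[Dataset]:
        """Return datasets whose name or description mentions the keyword."""
        kw = keyword.lower()
        return [
            d for d in self.datasets.values()
            if kw in d.name.lower() or kw in d.description.lower()
        ]


def provision(dataset: Dataset,
              transform: Callable[[dict], dict],
              sandbox: dict[str, list[dict]]) -> list[dict]:
    """Apply a row-level transform and copy the result into a sandbox.

    Each row is copied first, so the cataloged source data is not modified.
    """
    sandbox[dataset.name] = [transform(dict(row)) for row in dataset.rows]
    return sandbox[dataset.name]


# Usage: shop for customer data, normalize a field, move it to a sandbox.
catalog = Catalog()
catalog.register(Dataset("customers", "Customer master records",
                         [{"id": 1, "country": "ca"}]))
sandbox: dict[str, list[dict]] = {}
for hit in catalog.search("customer"):
    provision(hit, lambda r: {**r, "country": r["country"].upper()}, sandbox)
```

The point of the sketch is the division of labor: a passive catalog stops at `search`, while the active scenario described above lets the data consumer carry out the `provision` step without waiting on IT.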

The problem with traditional data catalogs is that they are “passive” and don’t allow for self-service data preparation. Data consumers and lines of business must still rely on IT, which can create bottlenecks and slow down time to insight.

Traditional Data Catalog:

  • Passive
  • Relies on IT for data provisioning and transformations, causing bottlenecks
  • Enables access to inventory of data sets

Active Data Hub:

  • Active
  • Delivers rapid business value to lines of business
  • Enables self-service access to both metadata and data
  • Ensures governance throughout the entire self-service supply chain

Introducing the active data hub

If not a data catalog, then what? A future-proofed data catalog is what we call an “active data hub,” meaning an integrated catalog that enables faster delivery of “actionable” data to line of business (LOB) users.

Zaloni has found that more of its customers are demanding that the data catalog function as the starting point for self-service data preparation to more rapidly deliver “actionable” data to business users.

In addition to enabling users to discover data, an active data hub serves as the foundation of all self-service data activities. This built-in functionality helps ensure a self-service architecture that provides faster time to insight and reduced IT workload while ensuring a governed and managed process throughout the entire self-service lifecycle. This is absolutely essential for being able to trust your data quality and truly derive value from your data.

Through a unified platform, such as the Zaloni Data Platform (ZDP), data quality and governance are managed inside the platform without the need for users to learn new tooling or technologies. Different teams can use the data hub at any level and speed, such as to understand where data exists through a simple catalog, to enrich selected data from third-party sources quickly, or to “action” that enriched data from a self-service environment to their business applications. The catalog also serves to facilitate reuse and consolidation of metadata from enterprise definitions and standards from all types of applications, including other catalogs, data governance and metadata applications, as well as traditional data sources.

Another critical component of the data management platform is its ability to integrate with and leverage existing data management infrastructure, so that any work that needs to scale or be operationalized into an ongoing process can be managed via the IT team’s existing processes.

Active Data Hubs with the Zaloni Data Platform

Benefits of an Active Data Hub:

  • Significantly reduces load on IT and eliminates bottlenecks
  • Makes finding relevant data easier and faster for LOBs
  • Reduces complexity and speeds up data ingestion
  • Unifies traditionally fragmented management and BI solutions
  • Enables delivery of curated datasets to LOBs and data consumers

Insights: How to deliver a successful active data hub

From our extensive real-world experience in data management and governance, we’ve found the key to a successful enterprise-wide self-service data hub is to start small, one line of business at a time, and build from there.

It’s important to note that implementing a self-service data hub doesn’t happen all at once, and is tied to an organization’s level of data maturity. To help organizations determine their current level and map out where they are headed, we’ve developed the maturity model shown below.

Maturity Model to Active Data Hubs From Data Swamp to Fast Time-to-Value

Level 0 is essentially a data “swamp” with limited visibility into enterprise data. Level 1 involves implementation of a data catalog to improve visibility and provide broader self-service, typically role-based access. Level 2 includes implementation of the first phase of an active data hub, providing the ability to enrich data – e.g., annotate, socialize, prepare, cleanse and govern the catalog inventory. Finally, Level 3 is a mature self-service data hub that enables data provisioning, where users can “action” and use datasets in the catalog to share with other users or applications, publish to a workspace, or access directly via APIs.
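The four levels can be read as cumulative capability tiers, where each level requires everything the previous one does. The sketch below is one way to encode that reading. The capability names (`discover`, `enrich`, `provision`) and the `assess` helper are our own shorthand for illustration, not an official scoring method.

```python
from enum import IntEnum


class MaturityLevel(IntEnum):
    DATA_SWAMP = 0     # limited visibility into enterprise data
    DATA_CATALOG = 1   # catalog improves visibility and access
    ENRICHMENT = 2     # annotate, prepare, cleanse, govern
    PROVISIONING = 3   # "action" data: share, publish, access via APIs


# Assumed mapping: capabilities an organization must have at each level.
REQUIRED = {
    MaturityLevel.DATA_SWAMP: set(),
    MaturityLevel.DATA_CATALOG: {"discover"},
    MaturityLevel.ENRICHMENT: {"discover", "enrich"},
    MaturityLevel.PROVISIONING: {"discover", "enrich", "provision"},
}


def assess(capabilities: set[str]) -> MaturityLevel:
    """Return the highest level whose required capabilities are all present."""
    best = MaturityLevel.DATA_SWAMP
    for level, required in REQUIRED.items():
        if required <= capabilities:  # subset test
            best = max(best, level)
    return best


# An organization that can discover and provision data but cannot yet
# enrich it is still assessed at Level 1 under this cumulative reading.
level = assess({"discover", "provision"})
print(level.name, int(level))  # prints DATA_CATALOG 1
```

Treating the levels as cumulative mirrors the phased approach: a capability gained out of order does not raise the level until the earlier ones are in place.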

To ensure your enterprise-wide, active data hub delivers business value, we recommend a three-step approach to implementation:

Step 1: Build what would be considered a traditional catalog that can be accessed enterprise-wide and by every line of business (LOB).

Step 2: Work with one LOB at a time to enrich the data that the LOB needs to achieve its goals. As you work to understand the group’s requirements and enrich the data using both internal and external sources, the key is to enable your business users to process the data themselves and let them do it quickly in a self-governed way. We find there is no point in doing this at an enterprise level because it takes too long and delivers minimal value.

Step 3: Action the enriched data one LOB at a time. This should be a discrete enhancement, done separately from steps 1 and 2. This step is where you’ll see the power of the active data hub. This, to us, is Level 3 in the data maturity model. Business users will be empowered to take the enriched data definitions and underlying data and deliver them to their business applications. Furthermore, the data hub keeps everything coordinated and updated as the enriched data builds and changes.

Putting it into practice: Real-world case studies

Following are some case studies that illustrate how a unified active data hub resulted in real business value for leading global companies in various industries.

Global Financial Institution
A leading global financial institution, with more than 47 million consumer and business customers and 4,500 retail centers around the world, wanted to create a centralized resource for its enormous volumes of data so that the data could be more easily accessed by IT teams and various lines of business for data and analytics initiatives.

Solution: Self-service data hub that provided centralized data management and governance, and enabled self-service for data consumers and lines of business through its data catalog capability.

Results:

  • Data consumers could quickly find relevant, high-quality and accurate data through a shopping cart experience
  • Enabled new analytics use cases across the organization
  • Ensured regulatory compliance

Financial Services and Stock Exchange Company
An industry-leading financial services and stock exchange company based in Canada, with more than 1,200 stores in 17 states, was looking for a way to organize and manage its data lake so it could monetize its proprietary data and increase shareholder value.

Solution: Unified self-service data hub platform to ingest, organize, manage and govern data and provide governed self-service access to data consumers.

Results:

  • Enabled new use case to monetize proprietary data
  • Ensured regulatory compliance

Healthcare and Pharmaceutical Company
A multinational healthcare and pharmaceutical company, with more than 43,000 employees in 80 countries and products in more than 170 countries, wanted to build a cloud-based data lake for various use cases. One of the first use cases was to assist the marketing team with customer analytics in order to improve existing marketing campaigns and create more targeted ones.

Solution: Self-service data hub on top of a data lake in Amazon S3/EMR, using Amazon Redshift. Enabled automated data ingestion; metadata management; data governance, including lineage, data quality, privacy and security; and data lifecycle management.

Results:

  • Reduced time to insight and improved decision-making across the organization
  • Enabled auto-discovery and inventory to populate, manage, and govern data
  • Provided governed access to business users to perform self-service tasks

Turning data into actionable business

Traditional, broad data management solutions are complex and costly and deliver low value from data, while narrow point solutions are unusable in isolation and must be bolted together before they deliver value to a business. By contrast, Zaloni’s actionable data hub is designed to deliver the breadth of data management, governance and self-service capabilities enterprises require to move fast at scale.

Source: Zaloni