Modernizing data platforms can create challenges. Although traditional data catalogs can deliver some visibility across datasets, achieving DataOps success and delivering true value from your organization’s data will require more modern methods.
Collaborative Data Catalog to Achieve DataOps Success and Deliver Significant Value
If you’re looking to leverage a DataOps approach, you may want to consider the options before jumping on the traditional data catalog bandwagon.
In this article, you will learn about:
- Why traditional data catalogs fail to deliver value
- How a collaborative data catalog enables DataOps success
- Real-world case studies
- and more!
Content Summary
Introduction
Defining the data catalog
What’s wrong with today’s data catalogs?
Introducing the collaborative data catalog
Insights: How to achieve DataOps success
Putting it into practice: Real-world case studies
Turning data into actionable business
Introduction
According to Gartner, organizations that provide access to a curated catalog of internal and external data assets will derive twice as much business value from their analytics investments by 2020 as those that do not.
That’s a ringing endorsement of data catalogs, and a growing number of enterprises seem to agree. The global data catalog market is expected to grow from US$210.0 million in 2017 to US$620.0 million by 2022, at a Compound Annual Growth Rate (CAGR) of 24.2%.
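As a quick check of the arithmetic, the growth rate implied by those two market-size figures can be recomputed with the standard compound annual growth rate formula. This is only a sanity-check sketch: the function name `cagr` is illustrative, and the five-year span is an assumption taken from the 2017 and 2022 dates quoted above.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.242 means 24.2%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: US$210.0 million (2017) to US$620.0 million (2022).
rate = cagr(210.0, 620.0, 2022 - 2017)
print(f"{rate:.1%}")  # prints 24.2%
```

The result matches the 24.2% rate quoted in the forecast.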
Why such large and intensifying demand for data catalogs? The primary driver is that many organizations are working to modernize their data platforms with data lakes, cloud-based data warehouses, advanced analytics, and various SaaS applications to drive profitable digital initiatives. To support these digital initiatives and other business imperatives, organizations need more reliable, faster access to their data.
However, modernizing data platforms can create challenges, including data sprawl and the propagation of ungoverned data resulting in data quality issues. Although data catalogs can deliver some value by providing visibility into datasets across an organization and helping users better understand relevancy, usability, and relationships between data, is a traditional data catalog ultimately the best solution? Organizations looking to leverage a DataOps approach may want to consider their options before jumping on the traditional data catalog bandwagon.
Consider this: a traditional data catalog is “passive” and can only deliver limited business value. Enterprises that truly want to derive value from data into the future need to put in place an “active” data catalog that enables broader self-service access, data transformation capabilities, and collaboration features. Let’s look at why and how.
Defining the data catalog
A data catalog is an inventory of an organization’s data assets. By describing and organizing those assets, it provides the context that data consumers, including business users, data scientists, and data analysts, need to discover and understand the datasets required for business initiatives.
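To make the definition concrete, here is a minimal sketch of a catalog as an inventory of metadata records that consumers can search. Everything in it is hypothetical and chosen for illustration (the `CatalogEntry` fields, the `DataCatalog` class, the sample datasets); it does not describe how any particular catalog product is built.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one dataset; field names are illustrative."""
    name: str
    description: str
    owner: str
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """An in-memory inventory of dataset metadata (not the data itself)."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        """Add or replace the record for a dataset, keyed by its name."""
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[str]:
        """Return sorted names of datasets whose name, description, or tags mention the term."""
        needle = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in t.lower() for t in e.tags)
        )


# Made-up sample entries, for illustration only.
catalog = DataCatalog()
catalog.register(CatalogEntry("customer_orders", "Daily order lines per customer", "sales-ops", {"sales", "pii"}))
catalog.register(CatalogEntry("web_clickstream", "Raw site navigation events", "marketing", {"behavior"}))
print(catalog.search("customer"))  # prints ['customer_orders']
```

Note that the sketch stores only descriptions, owners, and tags. That is the "inventory" role: it helps users find and understand data without moving or changing it.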
While there are many use cases for data catalogs, here are some examples that show the business value they can bring to an organization:
- New business initiatives: Visibility into data sets across an organization helps users better understand the relevancy, usability, and relationships of data. That context gives any new project a jump-start.
- Regulatory compliance: Regulatory compliance initiatives are built on the foundation of being able to document, audit, and trace information assets. In the past, this type of documentation might have been left up to spreadsheets, SharePoint sites, or a combination of metadata management and data governance projects. Catalogs play a key role as the glue or the hub of information that allows companies to understand, document, and ensure proper usage of data.
- Advanced analytics: Data catalogs help data scientists quickly identify relevant data sets so they can collect and prepare their data for analytics. Ultimately the catalog improves productivity for data analysts and scientists because they can spend less time searching for data and more time on analytics and providing valuable insights to the business.
These types of use cases are driving the increased adoption of data catalogs we are seeing in the market today.
What’s wrong with today’s data catalogs?
Although there are many use cases for data catalogs that can bring business value, we are finding that more of our customers also want the ability to take action from their catalog to allow their data consumers to not only understand what data they have but use it to support a more data-driven business approach. They want the data catalog to function as the starting point for self-service activities such as data enrichment and provisioning to business applications.
The problem with traditional data catalogs is that they are “passive” and don’t allow for these self-service data enrichment and action capabilities. Data consumers and lines of business still must rely on IT, which can create bottlenecks and slow down time to insight.
To provide these self-service capabilities along with the data catalog, organizations try to pull in various tools and technologies and run into trouble determining how to stitch these cataloging and data preparation technologies into a single environment for delivering a unified data supply chain. This creates integration challenges and can limit the productivity and return on the data catalog.
Traditional Data Catalog:
- Passive
- Relies on IT for data provisioning/transformations, causing bottlenecks
- Enables access to the inventory of data sets
Collaborative Data Catalog:
- Active
- Delivers rapid business value to lines of business
- Enables collaboration across teams
- Provides self-service access to both metadata and data
- Allows for self-service enrichment and actioning of data
- Ensures governance throughout the entire data supply chain
Introducing the collaborative data catalog
The next generation of the data catalog is what we call a collaborative data catalog. A collaborative data catalog supports DataOps by enabling a unified data supply chain from source to consumer, supporting collaboration across teams, and providing self-service enrichment and actioning of data by end-users.
These built-in self-service and collaboration functionalities provide faster time to insight and reduced IT workload while ensuring a governed and managed process throughout the entire data supply chain. This is essential for being able to trust your data quality and truly derive value from your data.
Through a unified DataOps platform, such as Zaloni’s Arena Platform, data quality and governance are managed inside the platform without the need for users to learn new tooling or technologies. Different teams can use the platform at any level and speed, such as to understand where data exists through the data catalog, to enrich selected data from third-party sources quickly, or to “action” that enriched data from a self-service environment to their business applications. The catalog also serves to facilitate reuse and consolidation of metadata from enterprise definitions and standards from all types of applications, including other catalogs, data governance, and metadata applications, as well as traditional data sources.
Another critical component of the data management platform is its ability to integrate with and leverage existing data management infrastructure so that any work that needs to scale or be operationalized into an ongoing process can be managed via the IT team’s existing processes.
Unified DataOps with Arena
Benefits of a Collaborative Data Catalog:
- Significantly reduces the load on IT and eliminates bottlenecks
- Makes finding relevant data easier and faster for LOBs
- Lets users annotate, tag, enrich, and share data
- Provides self-service enrichment and actioning of data by LOB users
- Reduces complexity and speeds up data ingestion
- Unifies traditionally fragmented management and BI solutions
- Enables delivery of curated datasets to LOBs and data consumers
Insights: How to achieve DataOps success
Gartner describes DataOps as “a collaborative data management practice focused on improving the communication, integration, and automation of data flows between data managers and data consumers across an organization.”
Having a platform in place that can support communication and collaboration across teams along with end-to-end management of the data supply chain is critical for DataOps success.
Arena’s end-to-end platform gives you the control and visibility at each step in the data supply chain, resulting in reduced risk and improved analytics value. This level of visibility is also important when optimizing data operations and making process improvements that reduce costs.
Arena in the Life of Your Data
From our extensive real-world experience in data management and governance, we’ve found the key to successful enterprise-wide DataOps is to start small, one line of business at a time, and build from there.
To ensure your DataOps approach delivers business value, we recommend a three-step approach to implementation:
- Step 1: Build what would be considered a traditional catalog that can be accessed enterprise-wide and by every line of business (LOB). This will give you a centralized view of all data.
- Step 2: After you’ve built your catalog, start with one line of business or a specific use case and zero in on what data sets will be needed and any data quality and enrichment processes that will be required. As you work to understand the group’s requirements and enrich the data using both internal and external sources, the key is to enable your business users to process the data themselves and let them do it quickly in a self-governed way. We find there is no point in trying to do this at an enterprise level to start because it takes too long and delivers minimal value. Once you can do this successfully for one line of business, you can repeat the process with other lines of business and use cases.
- Step 3: Next is providing lines of business access for collaboration, preparation, and provisioning. This step is where you’ll see the power of the collaborative data catalog. Business users will be empowered to take the enriched data definitions and underlying data and deliver them to their business applications. Users can easily find, tag, annotate, and share relevant data. Furthermore, the DataOps platform keeps everything coordinated and updated as the data builds and changes.
Putting it into practice: Real-world case studies
The following are some case studies that illustrate how a DataOps platform with an active data catalog resulted in real business value for leading global companies in various industries.
Global Financial Institution
A leading global financial institution, with more than 47 million consumer and business customers and 4,500 retail centers around the world, wanted to create a centralized resource for its enormous volumes of data so that IT teams and various lines of business could more easily access it for data and analytics initiatives.
Solution: Arena’s centralized data management and governance enabled self-service for data consumers and lines of business through its data catalog capability. Data consumers could quickly find relevant, high-quality, and accurate data through a shopping cart experience.
Results:
- Reduced the time for data delivery for analytics from 3 months to 6 days
- Improved data quality with reduced costs
- Ensured regulatory compliance
Financial Services and Stock Exchange Company
An industry-leading financial services and stock exchange company based in Canada, with more than 1,200 stores in 17 states, was looking for a way to organize and manage their data lake so they could monetize their proprietary data and increase shareholder value.
Solution: Arena was used to build an Enterprise Data Lake on AWS for improved visibility and control across the organization along with centralized governance and a self-service data catalog.
Results:
- Governed, traceable, and easy data access for analysts
- Reduced costs while improving overall data quality
Healthcare and Pharmaceutical Company
A multinational healthcare and pharmaceutical company with more than 43,000 employees in 80 countries and products in more than 170 countries wanted to build a cloud-based data lake for various use cases. One of the first use cases was to assist the marketing team with customer analytics to improve and create more targeted marketing campaigns.
Solution: Arena was used to build a data lake on AWS. Arena provided automated data ingestion; metadata management; data governance such as lineage, data quality, and privacy and security; and data lifecycle management.
Results:
- Global view for commercial analytics in < 3 months
- Cloud implementation with availability to users in weeks
- BI teams collaborated with custom analytics sandboxes, increasing data value while reducing the compute cost
Turning data into actionable business
Unlike traditional, broad data management solutions that are complex and costly and deliver little value from data, or narrow point solutions that are unusable in isolation and must be bolted together before they deliver value to a business, Arena is designed to deliver the breadth of data management, governance, and self-service capabilities enterprises require to manage the end-to-end data supply chain and move fast at scale.
Source: Zaloni