A core driver of data value is the fact that analytics can extract real, actionable insight from your company’s information. But running analytics on that data can be difficult, as IT architectures become increasingly complex and data is stored in disparate locations. Read this article to learn how to better integrate cloud data for analytics.
Integrating Cloud Data for Better Analytics
Table of contents
Introduction: Cloud Data Integration Will Power the Future of Analytics
Serving The Needs of Both: Technical IT managers and business users
How CIOs Can Maximize Their Data Warehouse Investments: Data Lake vs. Data Warehouse
Modernizing The Data Estate: Data Warehousing And Dark Data
Challenges With Data Warehouse Projects
Solutions: Integration For Analytics
Emphasize performance, cost reduction, and control
Conclusion
Data integration and cloud architecture will power the future of analytics. Business executives, analysts, and cross-functional vision leaders increasingly rely on analytics platforms and data products to make critical decisions that impact short-term revenues and long-term growth for enterprises.
But building modern analytics for the enterprise is becoming more challenging because of difficulties integrating data efficiently. Complexity within IT infrastructure, limitations in governance, and the inability to provide users with proper data access all play a role in limiting the potential of enterprise data ecosystems.
Enterprises that have spent millions of dollars on large-scale data warehouse implementations have learned that integration for analytics and business intelligence requires very different capabilities than integration for system applications. The former provides speed, access, and activation; the latter has its benefits but has also proven cumbersome to the business.
This article explores how best practices and principles in cloud data integration for analytics result in superior data products for technical analysts and business users, enabling business impact and a competitive advantage for organizations:
- How to access, analyze, and act upon data with the needs of technical IT managers and business users alike
- Why integration for analytics, rather than system application integration, requires different tools but can deliver competitive advantage
- Cloud data integration maximizes value in data architecture investments
- Implementing proper governance standards for data integration
- Distinguishing integration processes from business intelligence tools
- Emphasizing performance, cost reduction, and control
Introduction: Cloud Data Integration Will Power the Future of Analytics
It’s no secret that analytics is becoming the lifeblood of any organization. Beyond dashboards, pipelines, and automated decision engines, business executives, analysts, and cross-functional vision leaders increasingly rely on data products to make critical decisions that impact short-term revenues and long-term growth for enterprises.
Cloud data integration consists of the processes used to manage and centralize flows of data from various sources so that data can guide decision-making. It is critical to building effective data products for business users. But application integration and system orchestration can be more burdensome to the business than integration for visualization and business use cases. IT vision leaders who focus on the latter will have a significant competitive advantage in building custom data products.
Building modern analytics products depends on integrating data from disparate and often fragmented sources. Ineffective data integration strategies, complexities within IT infrastructure, over-reliance on application integration, limitations in governance, and barriers to data access all play a role in limiting the potential of an enterprise’s analytics suite.
According to Forrester, traditional data integration fails to meet new business requirements that demand a combination of real-time connected data, self-service, and a high degree of automation, speed, and intelligence. New and expanding data sources, batch data movement, rigid transformation workflows, growing data volume, and data distribution across multi- and hybrid-cloud environments exacerbate the issue. While collecting data from various sources is often straightforward, enterprises often struggle to integrate, process, curate, and transform data with other sources to deliver a comprehensive view of the customer, partner, product, and employee.
As organizations look to modernize and become more digital and agile, the key factor in their success is how data is stored, how it flows, and how it’s accessed throughout the organization, so that better-informed decisions can be made—faster. These data management strategies include bringing all disparate data together from across systems. This article explores these challenges and details best practices and principles in data integration for building astute analytics products:
How to access, analyze, and act upon data with the needs of technical IT managers and business users alike
Advanced analytics platforms serve the needs of IT and business users alike. Analysts and power users need the ability to bring in their own data for analysis. However, current architecture makes it challenging for IT to extend their backend tech (ETL or iPaaS tools, data warehouses, etc.) to these individuals. Limitations in flexibility and cost, security, governance, and ease of use are the reasons they can’t extend these technologies. The key is in building best practices that are fundamentally mindful of both groups.
Why integration for analytics rather than system application integration requires different tools, but can deliver competitive advantage
As enterprises transition from on-premise enterprise applications to cloud computing and AI-enabled apps, the focus has been on building new applications. This approach represents an expensive, risky, and lengthy proposition. IT leaders that fundamentally understand that application integration and system orchestration can be burdensome to the business will have a significant advantage by focusing on integration for visualization and business intelligence use cases. This focus will result in speed to market, business velocity, and innovation in improving analytics for the organization.
Cloud data integration maximizes ROI in data architecture investments
Most CIOs today have invested a significant amount in their data warehouses but find it difficult to access, extract, transform, and put critical data in the hands of business users. They continue to rack up technical debt—spending countless hours updating analytical cubes and building operational processes while complexity compounds. Data warehouse projects often aim to solve inefficient integrations by adding more business processes to an already long backlog of jobs and IT data access requests. A simplified architecture, advanced connector configurations, more elegant ETL orchestration tools, and sophisticated auditing processes will prove more valuable in the long run in maximizing data architecture investments.
Implementing proper governance standards for data integration
Once technical business users get access to data through analytical cubes, IT loses control of the data and has minimal means of maintaining governance once intelligence is extracted. The speed of business continues to accelerate, demanding that data, sources, and data-driven products (e.g., reports, insights platforms, and analytics services) be delivered for business use in minimal time. Cloud-based integration platforms with agile interfaces can compress development cycles by incorporating new data sources and users quickly and adding much-needed governance and certification processes to the development of data products for business users.
Separating integration processes from business intelligence tools
Some organizations use unified tools across the entire company, but more often different teams have disparate tools because they serve different purposes. Marketing will use one tool, and finance will use another. Operations will use a different tool than finance. More advanced data integration solutions should enable any business user to bring their own visualization tool based on their preferences.
Emphasize performance, cost reduction, and control
The future of analytics will require speed, scale, and control. Sub-second response times on billions of rows will be a requirement across enterprise systems. Massively parallel processing and high performance must be prioritized to ensure data can flow and processes can run at the speed your organization requires. Data products must also empower users by giving them access to technology typically owned by IT, while at the same time giving IT all the control they need.
Data from the cloud and the Internet now coexists with enterprise data. Many organizations have data sources and targets on-premise and in the cloud, because they have embraced big data, social media, the Internet of Things (IoT), SaaS applications, and cloud storage—some of which are challenging to ingest and manage in legacy data warehouse environments. To accommodate this hybrid data environment, future-facing organizations need to modernize and extend their cloud data integration infrastructures to fully support the web and the cloud with more robust governance and data lineage capabilities.
If CIOs don’t implement these cloud data integration solutions, they will continue to rack up costly technical debt, field IT issues from business users, and compound the loss of control of their data. And they will never be able to connect their data in a governed manner. However, when organizations align their strategies with the cloud data integration practices outlined in this article, IT practitioners can maximize the potential of their data platform investment—and deliver best-in-class analytics within their enterprise.
Serving The Needs of Both: Technical IT managers and business users
To properly build superior analytics practices, IT vision leaders need to get their data into the hands of business users—faster. This access includes the ability to integrate all your data sources (clouds, applications, servers, existing data warehousing, big data analytics platforms, etc.) to build a unified source of truth at cloud scale, wherever your data resides and without the need to move sensitive or referential sources. It also means the ability to give users access to all data in real time for use in any BI or execution tool, while staying in full control of data governance. IT sets policies and user permissions with the flexibility and ease of use required for large collaborative enterprises, serving the needs of multiple stakeholder groups:
- From the data science perspective, the aim is to find the most robust and computationally least expensive model for a given problem.
- From the engineering perspective, the aim is to build things that others can depend on; to innovate either by building new things or finding better ways to build existing things that function 24×7 without much human intervention.
- From the business perspective, the aim is to deliver value to customers; science and engineering are means to that end.
The value of data products rests on the ability to unlock data from various sources, transform it into actionable insight, and deliver that insight promptly.
How CIOs Can Maximize Their Data Warehouse Investments: Data Lake vs. Data Warehouse
A data lake contains all data in its natural, raw form as it was received, usually in blobs or files. A data warehouse stores cleaned and transformed data along with a catalog and schema. The data in the lake and the warehouse can be of various types: structured (relational), semi-structured, binary, and real-time event streams.
Data lakes were largely IT-driven projects whose value to the business was not clear from the onset. They were built on the premise of “build a centralized data repository, and they will come.” Data lakes delivered on the promise of cheap storage, but continue to fall short in terms of flexibility, access, and utilization for business drivers. Schema-on-Read, applying a schema only when data is read and not when it was originally stored, also came with Hadoop. As a result, businesses started ingesting data into their lakes without worrying about how this data would be organized or accessed. Terabytes of structured and unstructured data began to flow into the data lakes.
This approach has proven to be one of the missteps of Hadoop-based data lakes. Data lake projects began to fail because of Hadoop’s complexity and the expertise required to operate numerous engines that run on top of it. Also, the lack of structure applied to data in the data lakes largely made it useless for analytics. As a result, this data never found its way to a real business application in the operating fabric of the enterprise.
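To make the schema-on-read pattern mentioned above concrete, here is a minimal sketch, assuming PySpark and hypothetical lake paths (not any particular implementation): raw JSON events land in the lake untouched, and a schema is applied only when the data is read for analysis.

```python
# Minimal schema-on-read sketch (assumes PySpark; paths and field names are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("schema_on_read_demo").getOrCreate()

# In a schema-on-write warehouse, validation and transformation happen before storage.
# In a schema-on-read lake, raw JSON is stored as-is, and a schema is supplied at read time:
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.read
    .schema(event_schema)                       # schema applied when reading, not when storing
    .json("s3a://example-lake/raw/events/")     # hypothetical raw-zone path
)

events.groupBy("customer_id").sum("amount").show()
```

The convenience cuts both ways: nothing forces the raw events to match the schema analysts later assume, which is exactly how lakes fill with data that is hard to use for analytics.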
Enterprises have realized that stitching together disparate systems to orchestrate business processes through complex engineering projects has resulted in compounding complexity in the data management processes that serve line-of-business users and analysts.
Modernizing The Data Estate: Data Warehousing And Dark Data
On average, 60% of data lives in enterprise data warehouses; the other 40% is dark data.
Data Living With Users
Users without proper access to data take it offline and run analysis locally, removing control and visibility from IT and the rest of the organization. For instance, local spreadsheets for budgets, spend, forecasts, and so on.
Data Living In Legacy Systems Or Governed Environments
Data living in older, inaccessible systems or tightly governed environments typically cannot flow directly to a (cloud) data warehouse.
Legacy data access patterns do not support the downstream systems’ technologies, holding back transformation and business progress. Tech debt must always be paid.
Growth of the Data Warehouse
Rapid Growth of Data
With the rapid growth of data generated and new sources being added frequently, managing data and integrating new sources becomes increasingly challenging, driving people to buy other BI tools or take data offline.
Unified View of Data
The growth of data and data sources makes it more difficult to get a clear overview of the sources, destinations, and processing of the data involved.
The key is in using a cloud integration solution that functions as a data fabric, weaving together fragmented sources and remaining flexible when business users change the context of the data. Dark data can be brought into the light when it is contextualized in the way users value the information.
Organizations understand the need to integrate data to modernize their infrastructure. But they lack the data integration technology and tools to handle the complexity of integrating, governing, and controlling data within their organization, which increases risk, adds time spent on repetitive and manual tasks, and hurts performance.
Challenges With Data Warehouse Projects
According to Forrester, tackling core systems is hard because nearly all companies that are more than 10 years old have amassed meaningful technical, data, and process debt. Although we saw significant efforts in 2019 to dig out of debt, a mountain of work remains. Modernizing core tech — the systems that record transactions, automate business operations, and underpin customer journeys — is one of the biggest challenges for IT execs over the next decade. It’s no small task for CIOs to connect technologies and experiences that engage customers through underlying systems (whether logistics, ERP, claims management, banking, etc.) and harness the associated data to allow new capabilities to drive differentiation through their data products.
The reason cloud migrations often fail to move the needle on business outcomes is that enterprises have essentially replicated the duct tape that existed in their on-premise IT infrastructure into the cloud. To make matters worse, the addition of new workloads, like data science, and extensive resource requirements on their data warehouses have further complicated the enterprise data infrastructure landscape.
To modernize new or existing mission-critical applications with machine learning, businesses are required to duct-tape together multiple pieces of infrastructure — an OLTP database, an OLAP database, and data science algorithms and tools. When companies move their data infrastructure to the cloud, the duct tape doesn’t go away. For example, consider a company interested in building a data infrastructure comprised of an OLTP database, an OLAP engine, and data science tools and algorithms deployed in the AWS Cloud.
This setup would require subscribing to Amazon S3 (storage layer), Redshift or Snowflake (data warehouse), RDS or DynamoDB (OLTP database), and one of at least nine machine learning engine options, like Amazon SageMaker, depending on the particular use case. They would then need to integrate all of this together using Glue, Amazon’s ETL tool, and somewhere between a little and a lot of custom code. This is a complex architecture that is expensive to build, operate, and maintain, and the same is true on every public cloud. Additionally, it requires data movement across platforms, which can result in poor business decisions because insights are drawn from stale data, or in increased costs because the data movement itself is metered and charged.
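As a rough illustration of the stitching involved, the sketch below uses boto3 to provision just the Glue piece of such a pipeline; the job name, role ARN, and script path are placeholders, and comparable setup plus custom code would be needed for each of the other services.

```python
# Illustrative sketch only: provisioning one piece of the "duct tape" (an AWS Glue ETL job
# that moves data from the OLTP store toward the warehouse). Names, ARNs, and paths are
# placeholders; similar work is needed for S3, Redshift/Snowflake, RDS/DynamoDB, and SageMaker.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_job(
    Name="orders-oltp-to-warehouse",                        # hypothetical job name
    Role="arn:aws:iam::123456789012:role/example-glue-role", # placeholder IAM role
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://example-bucket/scripts/orders_etl.py",  # placeholder script
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
    WorkerType="G.1X",
    NumberOfWorkers=2,
)

# Each additional hop (warehouse load, feature extraction for SageMaker, scheduling, retries)
# typically needs its own job plus custom code to keep schemas and schedules aligned.
```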
Most IT managers have created a rigid semantic layer to ensure that everything is properly measured when users request analytical insights, such as revenue by industry or by team. These dimensions have already been aggregated, with calculations defining how insights should be surfaced. If you need to bring in other variants or dimensions to look at the data differently, this can break the calculations in the initial cube or create inaccuracies in the data set.
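A toy example of the problem, using pandas and fabricated opportunity data (not any particular warehouse or cube engine): once a cube has been pre-aggregated along fixed dimensions, questions along any other dimension can no longer be answered from it.

```python
# Toy illustration (pandas, fabricated sample data): a pre-aggregated "cube" locks in its
# dimensions, so new questions require going back to the raw or federated data.
import pandas as pd

raw = pd.DataFrame({
    "industry": ["Retail", "Retail", "Finance", "Finance"],
    "rep":      ["Ana",    "Ben",    "Ana",     "Cara"],
    "region":   ["East",   "West",   "East",    "West"],
    "revenue":  [120_000,  80_000,   200_000,   150_000],
})

# IT pre-builds the cube for the question it anticipated: revenue by industry.
cube = raw.groupby("industry", as_index=False)["revenue"].sum()

# A business user now asks for revenue by rep and region. The cube no longer carries
# those columns, so the question cannot be answered from it; it can, however, be
# answered directly against the raw data.
by_rep_region = raw.pivot_table(index="rep", columns="region",
                                values="revenue", aggfunc="sum", fill_value=0)

print(cube)
print(by_rep_region)
```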
Managing the balance between user requirements and admin governance then becomes a challenge, because this intersection drives how users want to interact with the data. If a business user wants to manipulate or reconfigure the data, IT managers lose control over it. And if business users can extract the data from cubes via export features, IT leads no longer have visibility and don’t know whether those users are sharing it with competitors once they dump it into Excel or another tool.
Enterprises face an ongoing problem: even with tightly integrated applications, they still don’t have technology that allows the organization to integrate all their data while keeping control and governance at scale and speed. The following provides solutions for managing increasing complexity in data ecosystems.
Solutions: Integration For Analytics
While application integration tools focus ETL processes on data persistence while it is in the pipeline, cloud data integration focuses on ease of use, allowing analysts to retrieve their own departmental or dark data that isn’t otherwise accessible or even known.
Acting Before Governance Issues Compound
Companies own a lot of data. They will put as much of it as possible into a data lake. Today, data lakes are inexpensive, and enterprises will place their data into S3 buckets or Azure Blob Storage—or their data engineering teams will create ETL processes around them for custom utilization. But there are limits to data lake and data warehouse configurations, and these limitations compound with company size and organizational complexity. Manual processes, operations, and programming can exacerbate these problems. IT leaders must implement cloud data integration solutions with core data governance systems that connect and transform data across environments through federated data, augmented write-back features, and data pipeline tools.
Implementing Governance Standards Throughout the Data Ecosystem
The problem is that when companies put their data in a data warehouse, most only get 60–70% of their data into that environment. By instilling governance and certification processes and auditing the data trail through data lineage, you will be able to maintain control of your data while providing users the right kind of data access. Even highly unstructured data like social media posts, call center records, and other high-volume, high-velocity unstructured documentation can be processed, transformed, and normalized under a single system. A connected architecture with pre-configured connectors enables this capability. From a governance standpoint, most technical business users should be able to get data out and use it for various purposes, so it takes a flexible system to maintain and govern these practices.
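As a minimal sketch of what auditing the data trail can look like in practice (a hypothetical lineage record, not any particular product’s governance model), each pipeline step can emit a small lineage record alongside its output so data can later be traced and certified.

```python
# Minimal lineage-record sketch (hypothetical schema, not a specific product's API):
# every transformation step records who ran it, what it read, and what it produced.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class LineageRecord:
    step: str        # name of the pipeline step
    inputs: list     # upstream datasets read by the step
    output: str      # dataset produced by the step
    owner: str       # team or service accountable for the step
    ran_at: str      # UTC timestamp of the run

def record_lineage(step, inputs, output, owner):
    rec = LineageRecord(step, inputs, output, owner,
                        datetime.now(timezone.utc).isoformat())
    # In a real system this would be written to a catalog or governance store;
    # here we simply print it as JSON.
    print(json.dumps(asdict(rec)))
    return rec

record_lineage(
    step="normalize_social_posts",
    inputs=["raw/social_media_posts", "raw/call_center_exports"],
    output="curated/customer_interactions",
    owner="data-platform",
)
```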
Replacing Cubes: Leveraging an Integration Cloud Solution
If you are maintaining a rigid data warehouse or a complex OLAP cube configuration, you are unlikely to encourage users to take that data out. But this data is important to business analysts, because they need its context to make informed decisions about operational logistics, customer lifetime value scores, or financial projections based on external sources.
Business users working in marketing organizations need access to campaign, media, marketing, or POS data that inform how they create customer segmentation or business logic for personalized messaging.
Vast volumes of social media data often never get ingested into a data warehouse because of the difficulty of constantly updating it; an integration cloud data platform makes that data usable. Then there’s the data that’s just sitting in cubes. It can take up to six months for some critical data to make it into the data warehouse and finally be delivered via a pipeline and transformed through a microservice for an analyst to consume. An integration cloud solution can accelerate the time to deploy analytics projects and dashboards and to automate insights at scale.
Alleviating the lag and latency challenge enables business users to access the data in real time to make the right critical decisions. Once the data is cleaned and fed into a schema within the data lake environment, the IT department doesn’t want anybody touching it, which is why they create cubes or reporting tables. This process speeds up performance in the short term, but servicing job requests for every business user quickly becomes untenable. Cloud data integration solves this.
When IT managers push data directly from data warehouses into a virtual cloud integration solution, they can keep the data small and don’t have to pay for a lot of data transformation and processing. Pre-aggregated data in cubes fulfills this purpose, but without the same level of flexibility. There are limits to how a user can analyze the data in cubes. If sales leaders used opportunity data from Salesforce, they would want to evaluate revenue by rep, by team, or by industry. If a sales leader had questions about the data set, the IT manager would have to reconfigure the cubes again based on their preferences. Since cubes are preconfigured with specific business logic, it is challenging for users to apply their contextual preferences and filters to that data or add the dimensions that are important to them. Cloud data integration, a connected architecture, and federated data remove the IT bottlenecks while providing new governance standards.
The answer lies in leveraging a cloud data integration system that is economical and elastic, with an open cloud-based data platform that can efficiently run automated SQL-driven workloads on multi-formatted datasets. A system with pre-configured connectors makes it easy to provide data access to any cloud system in just minutes, especially with a robust library of over 1,000 pre-built connectors for the most commonly used cloud systems.
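To make the idea of SQL-driven workloads on multi-formatted datasets concrete, here is a small sketch using DuckDB as a stand-in query engine (not any particular vendor’s platform, and the file names are hypothetical): one SQL statement joins a Parquet file and a CSV file in place, without first loading them into a warehouse.

```python
# Stand-in sketch (DuckDB, hypothetical file names): query differently formatted datasets
# in place with one SQL statement instead of moving them into a warehouse first.
import duckdb

result = duckdb.sql("""
    SELECT o.customer_id,
           c.segment,
           SUM(o.amount) AS total_spend
    FROM 'orders.parquet' AS o              -- e.g., an export from the warehouse
    JOIN 'customers.csv'  AS c              -- e.g., a departmental file
      ON o.customer_id = c.customer_id
    GROUP BY o.customer_id, c.segment
""").df()

print(result.head())
```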
Why integration for analytics rather than system application integration requires different tools but can deliver competitive advantage
The data integration opportunity: A tale of two use cases
Cloud environments are exploding, along with data lakes and data warehouses, but enterprises are not maximizing those investments due to their inability to manage multiple complex environments and data flows. Access to data at the moments when business decisions are critical also poses a significant challenge.
More than ever, enterprises need solutions that provide data access for analysis and decision-making while, at the same time, maximizing their existing data warehouse and data lake investments. The tools they currently use are focused on application integration, not data access for decision-making and execution. Those investments are racking up technical debt and struggling to add value to the organization’s bottom line. Most organizations have tech infrastructure issues that increase complexity, causing disruption and constraints across IT governance, business operations, finance, marketing, and analytics.
Cloud data integration for analytics allows your organization to maintain all of its data in a single environment, so your team has a comprehensive and virtualized view of business operations and customer interactions. Centralizing data and making it accessible promotes widespread data literacy and enables organizations to spot hidden opportunities, improve performance, and spur innovation. These systems also support federated data access when centralizing data is not ideal, whether due to security, latency, or volatility. This access delivers additional speed and efficiency.
While most companies already have systems in place for integrating their applications to align with business operations, they are experiencing long deployment cycles, extended time to value, and high data latency. These systems are centered around process: they integrate systems in which operational process orchestration and the applications themselves are the core asset, and data is a necessary payload within those processes. This requires heavy lifting across multiple IT stakeholder groups.
As enterprises transition from on-premise enterprise applications to cloud computing and now to AI-enabled apps, their focus has been on building brand-new applications. Existing purpose-built applications have been largely overlooked during this transformation, which is ironic because purpose-built apps stand to deliver the most benefit from being made agile and intelligent. Purpose-built applications have been left behind because of the complexity of rewriting and migrating them to newer, more specialized data processing systems that require new, complex architectures, like the Lambda architecture, to power in-the-moment intelligent decisions and actions. This transition is often an expensive, risky, and lengthy proposition.
Integration for analytics and business intelligence
Data integration for analytics proves more nimble and efficient in terms of time to value. Data-centric companies that integrate data into automated pipelines to achieve governance, visualization, control, and actionable decisions within their existing third-party ecosystems see more value and build more successful data products.
IT leaders that fundamentally understand that application integration and system orchestration can be more burdensome to the business than integration for visualization and business use cases will have a significant competitive advantage in building custom data products. Getting the right data in the hands of business users for analytics and visualization often provides the value-added workflows the business needs most when speed and time to value are of the essence.
By focusing on analytics products instead of system orchestration, IT leaders can be more agile, help users accelerate learnings, and extract data from disparate systems in a governed, auditable manner. It also delivers optimal performance to the system without the need to operationalize inefficient business rules, processes, and jobs that orchestrate data flows.
Emphasize performance, cost reduction, and control
Problem: Dark data poses a huge risk. Like shadow IT, dark data increases the risk of data loss and leaks, lowering governance and reducing control of data within the organization.
Solution: Bring dark data to light through ETL data pipelines and orchestration tools.
Problem: Ensuring access rights and proper governance. Organizations lack the control to properly manage access rights and provide users the ability to find and request the data they need.
Solution: Through row-level user access permissions, data lineage, and certification processes, you put power and control in the hands of IT while satisfying insight-savvy business users (see the sketch after this list).
Problem: Manual and time-consuming processes. Processes for managing data importing, cleaning, and preparation for further processing are usually manual and very time consuming, hindering operational excellence and holding back progress for departments within organizations.
Solution: Leverage systems with sub-second performance, massively parallel processing (MPP) columnar architecture, and big data machine learning tools.
Problem: Low performance of data pipelines. Even when automated, the latency and low performance of data pipelines negatively impact the speed of data processing and analysis.
Solution: Use virtual systems. Virtualized data queries on cached systems are much faster than traditional ETL, processing millions of rows in minutes. Dataset views are completely virtual (no processing time) and run on systems with MPP architecture, providing the ability to bring in raw transactions.
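The sketch below illustrates the row-level permission idea from the list above in plain pandas (fabricated data and policy, not any specific product’s security model): each user only sees the rows their entitlement allows, so IT retains control while users still get the data they need.

```python
# Row-level access sketch (pandas, fabricated data and policy): filter a dataset down to
# the rows a given user is entitled to see before it leaves IT's control.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "West", "East", "South"],
    "rep":     ["Ana",  "Ben",  "Cara", "Drew"],
    "revenue": [120_000, 80_000, 200_000, 150_000],
})

# Hypothetical entitlement policy: which regions each user may see.
row_policy = {
    "ana@example.com": {"East"},
    "ben@example.com": {"West", "South"},
}

def rows_for(user: str, df: pd.DataFrame) -> pd.DataFrame:
    allowed_regions = row_policy.get(user, set())
    return df[df["region"].isin(allowed_regions)]

print(rows_for("ana@example.com", sales))   # only East rows
print(rows_for("ben@example.com", sales))   # West and South rows
```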
Conclusion
Building best practices in data integration for analytics can be challenging—especially when access to data is bottlenecked within data warehouses and IT backlogs. Most CIOs today have invested a significant amount in their data warehouses. Still, they find it difficult to access, extract, transform, and orchestrate data to put critical information in the hands of business users. They continue to rack up technical debt—spending countless hours updating analytical cubes and building operational processes while complexity compounds. Furthermore, once technical business users get access to data through semantic abstraction layers coded in rigid business logic, IT loses control of the data and has minimal means of maintaining governance once the intelligence has been extracted.
To properly build superior data products, IT vision leaders need to be able to get their data into the hands of business users—faster. This includes the ability to integrate all your data sources (clouds, applications, servers, existing data warehousing, big data analytics platforms, etc.) to build a unified source of truth at cloud scale, wherever your data resides, without the need to move sensitive or referential sources. It also means being able to give users access to all data in real time for use in any BI or execution tool, while staying in full control of data governance. IT sets policies and user permissions with the flexibility and ease of use required for large collaborative enterprises.
With cloud data integration capabilities, companies can dynamically integrate data from thousands of sources and systems, whether they live in the cloud, applications, or on-premise, and automate data pipeline and transformation/ETL processes to prepare data for further processing and visualization. Organizations can even bring their own BI tool, or use Domo, to ensure one version of truth.
Source: Domo Technologies