Data-driven innovation requires fast and secure access to enterprise data for anyone who needs it. DataOps methodologies provide the framework for improving the speed and usefulness of data for innovation while complying with data security policies and laws. DataOps accelerates the adoption of transformative technologies, such as cloud, artificial intelligence, and machine learning, furthering your ability to lead with innovative products and services.
Digital transformation is a difficult road, one where businesses must continually reinvent themselves just to stay alive. Half of today’s business leaders are unsure what their industry will look like in just three years, and 72% of them say they’re being out-innovated by their competitors. The fast is eating the slow, and companies need to innovate to win.
We need a new approach, one that does for data what DevOps did for infrastructure. That is the promise of DataOps: a new means to connect people to data, empowering them to overcome data friction and achieve the velocity of innovation demanded by the digital economy.
Read on to learn more about the role of DataOps in securing and automating the delivery of data for innovative software development and business analytics.
Yet 84% of companies still fail at digital transformation. Several research studies have examined this alarming statistic, and while technical acumen and capabilities are a necessity, the common issue lies at the intersection of people and technology. A study from GRI reports that of the seven barriers to digital transformation, only one is pure technology; the remaining six are people-related barriers, such as aversion to risk, slow experimentation, and siloed processes.
The area where these organizational barriers are most glaring? Data. Companies have made great strides in adopting Agile, DevOps, and Cloud to build and deliver software at a breakneck pace. But the world is changing, and data has emerged as a new strategic asset. Applications are becoming more intelligent and data-driven, leveraging machine learning to deliver unique consumer experiences and new business capabilities. Data is a source of unique insights into customer behavior and market dynamics, and advanced analytics and Artificial Intelligence (AI) turn those insights into new opportunities for differentiation.
Just when businesses are orienting their DNA around becoming software companies, they’re finding that they need to become data companies to win. But many organizations haven’t evolved their strategies quickly enough to match this new imperative. Even though data is king, people and process challenges continue to stem its flow to those who need it. And for most businesses, these challenges have yet to be addressed through greater automation and a culture dedicated to fueling innovation with data.
The Emergence of Data Friction
For decades, companies have organized themselves around the systems and processes that run their business. They have streamlined their ability to develop applications, manage the systems upon which they run, and deliver value to customers. Data, however, must flow between systems and processes, encountering friction that inhibits the ability to get data into the hands of those who need it. Data Friction has emerged as a key bottleneck in a rapidly changing environment:
- More Demand: As DevOps and cloud tear down the barriers between people and infrastructure, more environments, more automation, and more speed mean more demand for data, faster. But it’s not just speed; new technology and market trends are continually creating new data and data needs: 61% of companies implemented AI systems in 2017 that will likely consume large datasets and 43% already have an IoT strategy that will likely produce massive datasets.
- More Users: It used to be that developers were responsible for feeding data into complex business applications and ETL processes so that business professionals could consume carefully crafted reports and dashboards. But the world is evolving quickly: everyone is becoming a consumer of data and must get the data they need when they need it. The number of data scientists has more than doubled in the last four years, and the number of auditors, analysts, developers, testers, and compliance officers who need data has grown as well. It’s not just the number of users that has proliferated, but also the variety of their backgrounds, requirements, and skills across organizations.
- More Places: The days of a single datacenter are gone, and virtually no enterprise talks about a journey to a single cloud provider. 85% of enterprises have a multi-cloud strategy, with applications running across an average of 5 public and private clouds. And every cloud vendor is desperately trying to deliver unique services to lock in customers, requiring companies to adapt their data strategy to an ever-changing cloud landscape to leverage the best possible technology solutions.
- More Sources: Long gone are the days of enterprise data and application strategies being centered on the Oracle/IBM/Microsoft relational database. The rise of Open Source and Big Data has created a plethora of new options. With a broad selection of “fit-for-purpose” systems at their disposal — from low footprint relational databases to large, distributed NoSQL stores — application developers have greater flexibility to select a solution that solves the data problem at hand.
These forces are driving data to be everywhere, for anyone, in any form. But different forces constrain data, restricting access and availability:
- Increased Cost: Data is doubling in size every two years, requiring storage, network, and compute resources to keep pace. Even when technology improvements keep up and mitigate the cost of that growth, the rising number of data copies creates an exponential effect that is nearly impossible to offset.
- Increased Complexity: The number of popular data sources has quadrupled in the past five years, with more as-a-service and on-premise solutions than ever before. Meanwhile, new technologies like mobile, social, machine learning and IoT have created entirely new data sources for the business. As a result, data is increasingly generated and consumed in silos, and organizations struggle to deliver a consistent level of service for all the stakeholders that need data: already slow, manual processes get even slower as data complexity surges.
- Increased Risk: More than 9 billion personal records have been stolen in the last five years. GDPR has gone into effect, with new regulations such as the California Consumer Privacy Act (CCPA) already being passed. Data privacy and security are now a top imperative for the business, and responding to regulatory pressure is a key hurdle to overcome. But the easiest solution – preventing access – is the opposite of what organizations need.
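The compounding effect described under Increased Cost can be made concrete with a little arithmetic. The sketch below is illustrative only: the two-year doubling period comes from the text above, while the 10 TB starting size and the copy counts are hypothetical numbers chosen for the example.

```python
def total_footprint_tb(source_tb, years, copies, doubling_period_years=2.0):
    """Storage for a source dataset plus `copies` full copies after `years`.

    Assumes the source doubles every `doubling_period_years` and that each
    copy is a full, unshared duplicate (a simplifying assumption).
    """
    grown_tb = source_tb * 2 ** (years / doubling_period_years)
    return grown_tb * (1 + copies)


if __name__ == "__main__":
    # Hypothetical scenario: a 10 TB source whose number of non-production
    # copies also rises over time, from 4 copies today to 10 in year six.
    for year, copies in [(0, 4), (2, 6), (4, 8), (6, 10)]:
        print(year, copies, total_footprint_tb(10.0, year, copies))
```

Because growth in the source and growth in the number of copies multiply, the footprint rises faster than either factor alone, which is why copy sprawl is so hard to offset with cheaper storage.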
Winning requires aligning people, process, and technology around the flow of information to accelerate innovation and drive differentiation.
Overcoming Data Friction with DataOps
Addressing Data Friction is ultimately a people problem – with the right technology platform as an enabler. Companies must change how organizations are structured and how teams communicate to enable the flow of data across the company. This already difficult task is made impossible without fundamental changes to technology to enable data flow and relieve friction. The combined changes to people, processes, and technology that can overcome Data Friction have emerged as a new practice known as DataOps. Gartner defines DataOps as:
A collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and consumers across an organization. The goal of DataOps is to create predictable delivery and change management of data, data models and related artifacts. DataOps uses technology to automate data delivery with the appropriate levels of security, quality and metadata to improve the use and value of data in a dynamic environment.
Just as DevOps brought together two key audiences – development and operations – so does DataOps.
- Data Consumers are those who need data to do their job, driving innovation and outcomes for the business. They include developers, testers, data scientists, and analysts.
- Data Managers are responsible for the operation, preparation, and delivery of data to consumers. They include DBAs,
These groups often point fingers at each other to assign blame. For example, Data Consumers generally hold Data Managers accountable for improperly cleansed data that is delivered late. In turn, Data Managers complain that Data Consumer requirements for data are generally ill-formed and changing, causing constant rework. And these anecdotes are reflected in survey data: less than 37% of business organizations perceive that IT’s digital initiatives are aligned with the business, around 25% view IT as correctly using data, and less than 25% consider IT to be using structured approaches to deliver value to customers. But really, they’re fighting a common enemy: Data Friction. When Data Friction becomes the blocker to innovation, customers leave, competitors win, and businesses spend more time reacting instead of leading.
DataOps is still an emerging movement, so consistently proven practices are hard to find. However, experts generally agree on what form those practices must take. A successful DataOps strategy must start by bringing Data Managers and Data Consumers together to understand what outcomes the business needs, and what information is required by whom to make those outcomes possible. This map of data flows can then be used to determine where constraints exist and how to eliminate them, as well as how to manage and evolve those flows over time. The critical challenges that create Data Friction include moving and delivering data, identifying and mitigating risk within data, cleaning and preparing data, and managing data models and related artifacts – all while undergoing constant change.
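One way to picture the map of data flows is as a simple list of who needs which data and how long delivery takes today. The sketch below is a minimal illustration rather than a prescribed method: the `DataFlow` fields, the ranking rule, and the sample flows are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    source: str            # system where the data lives
    consumer: str          # team or tool that needs it
    lead_time_days: float  # typical time from request to delivery
    manual_steps: int      # hand-offs or tickets along the way


def rank_constraints(flows):
    """Order flows so the slowest, most manual ones come first."""
    return sorted(
        flows,
        key=lambda f: (f.lead_time_days, f.manual_steps),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical flows gathered from Data Managers and Data Consumers.
    flows = [
        DataFlow("ERP database", "data science", 21.0, 6),
        DataFlow("CRM", "analytics dashboards", 1.0, 0),
        DataFlow("core banking", "test environments", 10.0, 4),
    ]
    for flow in rank_constraints(flows):
        print(f"{flow.source} -> {flow.consumer}: "
              f"{flow.lead_time_days} days, {flow.manual_steps} manual steps")
```

Even a rough ranking like this gives both groups a shared starting point for deciding which constraint to remove first.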
DataOps is a practice that affects all aspects of the business, but it plays an increasingly critical role in several domains:
- Software Development: Automated, repeatable, faster provisioning of higher quality test data earlier in the development lifecycle results in increased release velocity and fewer defects.
- Data Privacy and Governance: As data flows into, across, and out of the enterprise, companies must both identify and mitigate risk to meet compliance and privacy constraints associated with various constituencies of Data Consumers.
- Analytics and Data Science: Critical insights require access to timely, relevant, high-quality data, integrated into the pipelines and workflows for data preparation and machine learning.
Adopting DataOps requires a substantial change to people and process that is catalyzed by transformational technology. A DataOps technology practice will need to provide a seamless experience that meets three critical requirements:
- Enterprise Ready Platform: Enterprises do not want a series of point solutions that must be stitched together to implement a data strategy. They prefer a platform that seamlessly integrates and scales with the heterogeneous enterprise IT landscape, works with all relevant data (wherever it exists), and provides unified governance and control.
- Frictionless Data Delivery: Data is not useful unless it is made available to those that need it. This requires capturing data where it exists, moving it to where it needs to be, preparing it for use, and presenting it in a form that consumers can use on a day-to-day basis with minimal overhead or delay.
- Integrated Risk Management: It’s not enough to simply secure access to data. A compelling DataOps solution must proactively identify risk while leveraging de-identification, obfuscation, and other techniques to mitigate risk as data flows across the enterprise.
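Proactively identifying risk usually starts with finding out where sensitive values live. The sketch below shows one simple approach, pattern-based scanning of sample rows; the two patterns and the column names are assumptions for illustration, and real discovery tooling would cover far more data types.

```python
import re

# Illustrative patterns only; real deployments need a broader catalogue.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive_columns(rows):
    """Return {column: {pattern labels}} for columns holding sensitive-looking text."""
    findings = {}
    for row in rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # only text values are scanned in this sketch
            for label, pattern in PATTERNS.items():
                if pattern.search(value):
                    findings.setdefault(column, set()).add(label)
    return findings


if __name__ == "__main__":
    sample = [
        {"id": 1, "contact": "ann@example.com", "note": "SSN 123-45-6789 on file"},
        {"id": 2, "contact": "n/a", "note": "no issues"},
    ]
    print(find_sensitive_columns(sample))
```

The output of a scan like this can then drive which columns are de-identified or obfuscated before data is delivered.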
In successful DataOps practices, Data Consumers will have ready access to, and control over, the right data in the right place for them to innovate at velocity. Data Managers will have the automation, efficiency, oversight, and confidence to support the business at scale.
Impact of DataOps
Software Development
The banking industry is not known for its velocity of adopting new technologies. For more than half of banks, the majority of their banking products cannot be applied for online, leading to a 70-90% abandonment rate when customers fail to open an online account. Winning requires taking advantage of new approaches like virtual banking, machine learning, and AI chatbots – all of which require data to be successful.
In our story, the CEO of a large commercial bank has set the audacious goal to move from the #5 national bank to the #2 national bank in just three years. To do this, they need to disrupt their own business by moving away from brick and mortar and towards a personalized on-demand experience for every customer. The CIO is now faced with the daunting challenge of creating personal data streams from the vast array of data within their institution. She’s hired the best developers and machine learning experts she can find and put them directly with the Agile scrum teams, but all she hears are complaints. They can’t get the data they need, and when they do it’s already hopelessly out of date. The operations team understands their pain but can only move so fast with the resources and regulations they have in place. The project team has turned to synthetic data, but every time they move into system tests with real data everything falls apart and throws the project further behind schedule. They need real data to let them iterate quickly on new ideas, such that issues are identified and fixed earlier in the lifecycle at a lower cost.
With DataOps, operations can focus on establishing, securing, and automating the delivery of the data needed by project teams, without being burdened with the myriad of requests for every environment. Meanwhile, development and test teams get self-service access to fresh, secure data that they can easily refresh, reset, and share earlier in the life cycle, whether that data is masked production copies, synthetic data, subsetted data, or a combination of all three. The result is faster projects, higher quality, and lower costs. And best of all, by tearing down the walls between them, IT and development can start working with each other instead of against each other to drive a best-in-class personal banking experience.
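Of the test data options mentioned above, subsetting is the easiest to get subtly wrong, because child records must still point at parents that exist in the subset. The sketch below illustrates the idea with in-memory records; the table shapes, the `customer_id` key, and the every-Nth sampling rule are hypothetical.

```python
def subset_with_integrity(customers, orders, keep_every=10):
    """Keep every Nth customer and only the orders that reference a kept customer."""
    kept_customers = customers[::keep_every]
    kept_ids = {c["customer_id"] for c in kept_customers}
    kept_orders = [o for o in orders if o["customer_id"] in kept_ids]
    return kept_customers, kept_orders


if __name__ == "__main__":
    # Hypothetical data: 20 customers, 60 orders spread evenly across them.
    customers = [{"customer_id": i} for i in range(20)]
    orders = [{"order_id": n, "customer_id": n % 20} for n in range(60)]
    small_customers, small_orders = subset_with_integrity(customers, orders)
    print(len(small_customers), len(small_orders))
```

Filtering the child table by the kept parent keys means tests that join the two tables behave the same way on the subset as on the full copy.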
Data Privacy and Governance
It was only four years ago that Target CEO Gregg Steinhafel resigned in the wake of a massive data breach at the company. Meanwhile, data breaches have become a nearly daily occurrence, and data security has become a top initiative for nearly every company.
In our story, the ousting of a competitor’s CEO due to a data breach has put the job of the CSO at a major retailer on the line if he can’t control risk to the business. After a detailed audit, his team uncovered a major problem with one of their marquee projects. While they have implemented state-of-the-art data security techniques to encrypt data at rest and in transit with strong identity and access controls, they’re also leveraging a third-party offshore development team. They cannot possibly guarantee the same level of security and personnel controls at an outsourced partner and cannot mask and deliver data at the speed required by the project. They’re faced with a lose-lose situation: limit access to the data, or compromise on data quality, both of which will throw the already-behind initiative further behind schedule. And no one wants to tell the CEO and board that their number one digital transformation project has to be put on hold.
With DataOps, they can complement their data security approach with strong data privacy, managing risk by identifying and eliminating sensitive values in the data they share. The security team can partner with the operations team to de-identify sensitive data to create a secure data timeline that can be continuously delivered to their partner. Meanwhile, the partner gets the same or better control to access the data they need when they need it. The result is that they can meet security objectives without putting the project at risk.
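De-identification for a development partner typically needs to be consistent, so that the same customer maps to the same masked value in every table and every refresh. One common technique, shown here as a sketch and not as the method any particular product uses, is keyed hashing with HMAC from the Python standard library. The field names and the key handling are assumptions; in practice the key would live in a secrets manager and never be shared with the partner.

```python
import hashlib
import hmac


def pseudonymize(value, key, length=16):
    """Deterministically replace a value with a keyed hash (not reversible without the key)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_record(record, sensitive_fields, key):
    """Return a copy of `record` with the named fields pseudonymized."""
    return {
        field: pseudonymize(str(value), key) if field in sensitive_fields else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    key = b"example-key-keep-in-a-secrets-manager"  # hypothetical key
    order = {"customer_id": "C1", "email": "ann@example.com", "total": 42.0}
    print(mask_record(order, {"customer_id", "email"}, key))
```

Because the mapping is deterministic for a given key, joins across masked tables still line up, while the partner never sees the original identifiers.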
Data Science and Analytics
It’s no secret that Artificial Intelligence and Machine Learning are creating a revolution in the world of data science and analytics. Deloitte estimates that enterprise machine learning projects achieve a 2-5x return on investment in just the first year, with a supporting ML-as-a-service market growing from $1B in 2016 to more than $20B by 2025.
The CEO of our retail company is feeling the pressure of Amazon and online shopping every day. They’ve embraced their transformation from brick-and-mortar to an online enterprise, but they’re unable to create the same customer engagement and retention that their competitors have achieved. To help extract value from their data, they’ve brought in a new Chief Data Officer, who knows that machine learning will be key to driving greater consumer engagement. They built out a small data science team, but quickly found they lacked access to the most important data, which was stuck in their legacy ERP system. The team wanted to leverage the latest public cloud machine learning workbenches and frameworks but had no way to get the data out of an on-premise relational database, remove sensitive information, clean the data, and deliver it into a cloud ML environment. They’ve been banging their heads against the wall of IT operations for more than six months, forcing them to find scraps of value elsewhere instead of getting the big wins that the company needs.
With DataOps, the data science and IT operations teams can work together to build a modern agile data pipeline that will provide clean de-identified data into the cloud. They can continuously extract data from their on-premise ERP database, mask the data, and ship it to the cloud where it can be cleansed and integrated into ML tools. With easy access to relevant data for analytics and model training, data scientists can finally deliver the insights required for the business to compete in the modern retail market.
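The extract, mask, clean, and deliver steps in this story can be sketched as a small chain of generator stages. This is a toy illustration under stated assumptions: an in-memory SQLite table stands in for the on-premise ERP database, a CSV stream stands in for the cloud destination, masking is reduced to replacing the value with a fixed token, and the table and column names are invented for the example.

```python
import csv
import io
import sqlite3

SENSITIVE = {"email"}  # hypothetical list of columns to mask


def extract(conn):
    """Read rows from the source database as dictionaries."""
    conn.row_factory = sqlite3.Row
    query = "SELECT customer_id, email, region, spend FROM orders"
    for row in conn.execute(query):
        yield dict(row)


def mask(rows):
    """Replace sensitive values before the data leaves the source environment."""
    for row in rows:
        yield {k: ("***" if k in SENSITIVE else v) for k, v in row.items()}


def clean(rows):
    """Drop incomplete rows and normalize the region code."""
    for row in rows:
        if row["spend"] is None:
            continue
        row["region"] = row["region"].strip().upper()
        yield row


def load(rows, out):
    """Write rows as CSV to `out`; returns the number of rows delivered."""
    writer, count = None, 0
    for row in rows:
        if writer is None:
            writer = csv.DictWriter(out, fieldnames=list(row))
            writer.writeheader()
        writer.writerow(row)
        count += 1
    return count


def run_pipeline(conn, out):
    return load(clean(mask(extract(conn))), out)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id TEXT, email TEXT, region TEXT, spend REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [("C1", "ann@example.com", " west ", 120.5), ("C2", "bob@example.com", "east", None)],
    )
    buffer = io.StringIO()
    print(run_pipeline(conn, buffer), "rows delivered")
    print(buffer.getvalue())
```

Placing the mask stage directly after extraction reflects the order described above: sensitive values are removed before the data is shipped, and cleansing happens on the already de-identified stream.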
The velocity of innovation makes and breaks winners in today’s data-driven economy, and companies must overcome Data Friction to accelerate innovation and create a competitive advantage. Organizations and processes must evolve to better collaborate around the flow of information in the enterprise. But they can’t get there without the technology that provides innovators easy access to any data, anywhere, without compromising data privacy and compliance. DataOps is a collaborative data management practice focused on improving the communication, integration, and automation of data flows between Data Managers and Data Consumers across an organization. With investment in DataOps practice and technology, companies can finally thrive and win in today’s data economy.