They say great ideas are born out of necessity. The data and analytics field is no exception. Although DataOps is seen as a fairly new and disruptive methodology, leaders in data and AI were pioneering more efficient, agile, and simpler ways of designing data pipelines long before the practice coalesced into a defined concept.
Today, data and AI innovation is practically impossible without enabling teams to innovate through access to data, processing resources, and operational capabilities. "Tearing down walls between teams and bringing skills together in a unified process is essential in order to succeed with AI," states Lars Albertsson, Founder of Scling.
Data 2030 Summit attendees will be able to hear Lars present on how lean DataOps principles and practices can be applied for successful data processing. The discussion below is a great opening to his session: Lars talks about some of the most crucial points surrounding DataOps, including how he started working with it, DataOps as a natural evolution of collaboration, its role in modern data management, and his views on the rise of the new Data Mesh paradigm and its implications for data engineering and business.
Hyperight: Hi Lars, as always glad to have you with us. As you are a frequent speaker at our summits, instead of the regular introduction, we’d like to start with your personal experience with the topic of your Data 2030 Summit session – DataOps. How and why did you start working with DataOps?
Lars Albertsson: Hi. It is a pleasure to participate in your events, as usual. I am honoured to be invited.
My first contact with DataOps was at Spotify in 2013. At the time, developing and deploying our data pipelines was slow, difficult, required a lot of tribal knowledge, and it was easy to make operational mistakes. Even though our data lake had rich content, only a few teams were able to work efficiently with the data. We set out to democratise the value of data, with the goal of enabling all engineering teams to easily build and operate data pipelines. We developed a “golden path” – a blessed way of working with data pipelines, and tooling to support the path. The key technical components were strong support for developer testing, continuous integration and continuous deployment pipelines, and production data quality monitoring. At this time, Docker was not yet mature enough to use in production, but we had realised the operational benefits of immutable, self-contained deployment artefacts, so we built a homegrown solution without containers, which was later replaced by Docker and finally Kubernetes.
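The developer-testing component of such a "golden path" can be illustrated with a small sketch: pipeline transforms written as pure functions over small, in-memory datasets, so they can be exercised in continuous integration on every commit. This is a hypothetical illustration, not Spotify's actual tooling; the `dedupe_plays` transform and its record schema are invented for the example.

```python
# A hypothetical batch transform plus the kind of developer test a
# "golden path" makes cheap to write: pure logic over tiny in-memory
# data, runnable locally and in CI with no cluster required.

def dedupe_plays(records):
    """Keep the latest play event per (user, track), ordered by timestamp."""
    latest = {}
    for rec in records:
        key = (rec["user_id"], rec["track_id"])
        if key not in latest or rec["ts"] > latest[key]["ts"]:
            latest[key] = rec
    return sorted(latest.values(), key=lambda r: r["ts"])

def test_dedupe_keeps_latest_event():
    events = [
        {"user_id": 1, "track_id": "a", "ts": 100},
        {"user_id": 1, "track_id": "a", "ts": 200},  # duplicate, newer
        {"user_id": 2, "track_id": "b", "ts": 150},
    ]
    result = dedupe_plays(events)
    assert len(result) == 2
    assert {"user_id": 1, "track_id": "a", "ts": 200} in result

test_dedupe_keeps_latest_event()
```

Because the transform logic is separated from the storage and scheduling layers, the same function can be wired into whichever execution framework the platform standardises on.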
When we rolled out our new solution, we took care to work closely with early adopters, sending out data infrastructure engineers to embed with adopting teams in order to understand and address shortcomings. We also had product team developers temporarily embed within infrastructure teams as a way to transfer knowledge and align technical culture. I believe that this exchange of people was a key factor in our successful adoption of DataOps.
We reached our goals of enabling data inexperienced developers to put new pipelines in production in less than a day, and to fix production problems in under an hour. The main impact, however, was organisational – we cut dependencies on the core data teams for innovating with data. A few months after our first launch, the number of teams working with data grew rapidly. Using data eventually became as natural and low-friction as using a word processor or spreadsheet.
Years later, the type of transformation that we made became known as DataOps.
Hyperight: DataOps is enterprise data management for the AI era. Do you agree with this statement? Why is DataOps essential for every company striving towards being AI-driven?
Lars Albertsson: Data management is one component of DataOps. The most important component is the new way for teams to collaborate around data, different from traditional collaboration patterns.
If we take a historical perspective, DataOps crystallises out as a natural evolution of collaboration. In the beginning, there was the waterfall, and product development activities were performed in distinct phases, by different teams: requirements, analysis, design, coding, testing, and operations. By the late ‘90s, after an increase in failed IT projects, people started to look for better ways of working. One movement sought to speed up the quality assurance cycle by combining the coding and testing phases to be performed by the same team, covering both skill sets, and Extreme Programming was born. Another movement figured out that requirements were often wrong due to insufficient information, and combined requirements, design, and coding to form an iterative requirements process. We call this Agile. One aspect of the Big Data movement is the ability to measure product behaviour in detail so that we can adapt product development based on measured behaviour; i.e. we combine analysis, design, and coding. Big data is not primarily about technology – it is a new way to collaborate with data.
After several revolutions in collaboration patterns, we finally integrated operations, and we gave these movements the “Ops” suffix. With DevOps, we integrated coding, testing, and operations, and brought these skills together in the same team. DataOps is likewise the integration of skills and activities into the same team: analysis, coding, testing, operations. We still need to improve collaboration pattern integrations, and will see further similar movements become mainstream: MLOps, DevSecOps, ComplianceOps, etc.
Integrated, cross-functional data product teams are aligned around product value flows. These teams are more efficient than teams aligned around skill sets, because they understand all parts of data-driven products, and have fewer dependencies on other teams. Since we have eliminated dependencies on central data teams or governance teams, new data management patterns are necessary. Without a centralised governance structure, product-aligned autonomous teams create excessive heterogeneity, which eventually slows down development, prevents data democratisation, and makes compliance challenging. So data management in the age of DataOps is a combination of product team responsibility and central teams that own coordination and governance. Unlike traditional data management bodies, these central teams do not operate through direct actions, but by creating tools and facilitating processes – more carrots than sticks.
Many companies, unfortunately, start their journey towards AI by hiring data scientists and putting them together on one team, with little or no engineering support. In a typical anti-pattern, waterfall scenario, they are given a pile of data, create models, and hand them off to developers. The developers translate them from Python to their favourite language and hand them off to operations, who put them in production. But machine learning products rarely work as expected on the first attempt, and the handoffs between teams make iterations slow, which in practice prevents waterfall machine learning projects from ever yielding expected value. Tearing down walls and bringing these skills together in a unified process is essential in order to succeed with AI. It is also important that data quality and data monitoring are integrated into the iterative work process. Data-fed products are more difficult to understand and operate than traditional products, since product quality depends not only on code, but also on input data. Unless there is sufficient observability in place, product quality degrades over time as input data changes.
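The kind of data observability described here, catching degrading input data before it silently degrades the product, can be sketched as a simple quality gate in front of a pipeline stage. The field names and thresholds below are hypothetical; in practice teams typically use a dedicated data quality framework or monitoring system rather than hand-rolled checks.

```python
# A minimal data quality gate: compute basic profile metrics on an
# input batch and report problems if they drift outside expected
# bounds, instead of letting bad data flow into a model or report.

def profile(batch, field):
    """Return (row_count, null_ratio) for one field of a batch of dicts."""
    nulls = sum(1 for row in batch if row.get(field) is None)
    return len(batch), (nulls / len(batch)) if batch else 0.0

def check_batch(batch, field, min_rows=100, max_null_ratio=0.05):
    """Return a list of problems; an empty list means the batch passes."""
    rows, null_ratio = profile(batch, field)
    problems = []
    if rows < min_rows:
        problems.append(f"too few rows: {rows} < {min_rows}")
    if null_ratio > max_null_ratio:
        problems.append(f"null ratio {null_ratio:.0%} exceeds {max_null_ratio:.0%}")
    return problems

healthy = [{"amount": i} for i in range(200)]
degraded = [{"amount": None}] * 30 + [{"amount": 1}] * 70

assert check_batch(healthy, "amount") == []
assert check_batch(degraded, "amount") != []  # flagged: 30% null values
```

Wiring checks like this into the same deployment pipeline as the code is what makes the monitoring part of the iterative work process, rather than an afterthought.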
AI innovation is limited by what is possible to create and operate with reasonable effort, which depends on the availability of data, established dataflows, and recent technological innovation. These things are known by practitioners, and successful AI cannot be planned from above. It comes as a result of enabling teams to innovate through access to data, processing resources, and operational capabilities. One of the proudest moments of my career was when I heard how Spotify’s Discover Weekly feature was created in a few weeks by a single team that had an idea. In a conference presentation, they said “Discover Weekly wasn’t a great strategic plan and 100 engineers. It was 3 engineers that decided to build something.” It would not have been possible if we had not succeeded with democratising data innovation.
Hyperight: As it is, DataOps is a great way for companies to operationalise their data and analytics pipelines and get valuable insights in a faster and more agile way. But in practice, many companies fail in deploying DataOps. Why do you think that is?
Lars Albertsson: While it is easy to install new technology, it is more difficult to change the way people work and organise. Agile and DevOps transformation processes take many years, with mixed results. Data management and governance have traditionally been done by groups separate from developers, and shifting power and influence away from these groups meets resistance. Similar resistance arises whenever any governance or risk management function is asked to shift from exercising control to a more supportive role. We see this with changes to operations, infrastructure management, security, and compliance.
There are also cultural differences to overcome. Product managers, data scientists, software engineers, quality assurance experts, and operations staff come from different cultures, use different tools, and have different ways of working in their natural habitat. It takes time and effort to find working ways of collaborating in mixed teams.
Hyperight: Parallel to DataOps, there’s another hot topic, or trend taking shape and catching on – Data Mesh. Could you please explain to us what Data Mesh is, what its implications are, what it means for data engineering and for the business?
Lars Albertsson: Data Mesh is one way to scale out large data organisations where data management and governance have become challenging due to the number of teams working in the data platform. A Data Mesh can allow companies to scale further by federating data management and governance to teams that own data sources and data pipelines.
Companies adopting Big Data and DataOps go through several phases of organisational structures around data, over a period of many years. In early phases, there is one team or a few tightly collaborating teams that use a single data platform, where the core component is typically a data lake combined with batch processing pipelines, potentially complemented with stream processing capabilities. Most companies are either in these early stages, or have not yet built their first incarnation of a data platform. More data mature companies have managed to spread data innovation beyond pioneering teams, and democratised data processing capabilities within the company. In order to facilitate data democratisation, and to make governance manageable, the data platform technology and development processes are kept homogeneous. There are usually small variations, but if entropy is not kept under control, excessive friction will prevent data democratisation.
A centralised data platform can scale to large organisations. For companies whose culture makes it difficult to scale centralised services, centralised data governance can be perceived as a bottleneck, and a Data Mesh can be a way to scale further by distributing governance responsibilities. In practice, a Data Mesh incarnation is a data platform and lake that has been split up into multiple ponds and multiple processing environments, controlled and operated by different teams. The size at which a Data Mesh makes sense depends on a company’s capabilities to coordinate a centralised data platform. It is not the only option for scaling to a large number of teams; companies with sufficient capabilities to coordinate data management can keep a centralised data platform and avoid the overhead of decentralisation.
Hyperight: We can say that both DataOps and Data Mesh are trends that are going to disrupt the data and analytics industry in the next decade. But what are your outlooks as to how they will progress and transform companies in the future?
Lars Albertsson: Organisations that have fully adopted DataOps are 10-100 times more efficient working with data, compared to traditional companies. There are no scientific measurements for DataOps, and this is my subjective estimation, based on observing many companies across the maturity spectrum. These numbers may seem high, but they match scientifically measured operational metrics for companies at different levels of DevOps maturity, as presented in the State of DevOps report, and DataOps seems to have similar effects in terms of lead time for new ideas and time to recovery in the event of failures.
This efficiency gap is so significant that it is disruptive. DataOps is in practice a requirement for getting sustainable value out of machine learning technology. Building and operating machine learning applications, keeping them healthy, and iterating to continuously improve them is complex and expensive. Companies that have not achieved a high level of data maturity may strike luck once or twice, but cannot deliver AI innovation in a sustainable and repeatable manner.
Data Mesh is not a disruptive concept. It is a way for very data mature, large companies to scale even further. These companies have already obtained disruptive value from data, and fully adopted machine learning technology. Most of the buzz around Data Mesh is among companies that are not yet at this level of maturity, which concerns me. Early adoption of a Data Mesh is likely to be harmful; if you have not yet established strong, homogeneous conventions and processes in your data platform, decentralising it will introduce heterogeneity, which slows down innovation and prevents effective data democratisation.
For most organisations, adopting a Data Mesh is in practice likely to be a step backwards, reintroducing the data silos that we had before the age of big data. Descriptions of the Data Mesh tend to emphasise the responsibility of teams that own data to publish cleaned, high-quality data as data products, a pattern also known as a Data Hub. Shifting responsibility for data quality improvement to teams that have domain expertise is generally a good idea, and an evolutionary step towards data maturity, but it is imperative that the underlying raw data is also made available. For example, while it may be desirable to cleanse away invalid financial transactions for financial reporting, they might be a gold mine of signals for fraud detection. Hiding raw data in silos is a drawback of the Data Hub, and a significant risk when adopting a Data Mesh unless the company first establishes strong data democracy practices in a centralised platform. Hence, I am concerned that the buzz around Data Meshes will be harmful, and lure less data mature companies into building data silos.
There is no such risk with DataOps. If your company can make people with different skill sets work well together, there are only benefits, and DataOps will accelerate your journey to data maturity.
Featured photo by airfocus