In a previous article, we discussed the new enterprise data architecture spreading like wildfire among the data community: Data Mesh architecture. We mentioned that it presents a paradigm shift in data architecture, with the data industry moving away from massive data teams prioritising centralised, monolithic data lakes and databases, towards an approach that treats data domains and data products as first-class citizens.
In a nutshell, it presents a convergence of distributed domain-driven architecture, self-serve platform design, and product thinking with data. To better understand the concept, we talked to Daniel Tidström, Partner & Management Consultant at Data Edge, who has been working with Data Mesh, at least in parts, for quite some time.
Daniel explained that Data Mesh becomes crucial when a company scales quickly. “With the proliferation of data sources and data consumers, having one central team to manage and own data ingestion, data transformation and serving data to all potential stakeholders will inevitably lead to scaling issues. Given the increasing importance of data in our organisations, designing for scalable teams and scalable platforms is really crucial,” explained Daniel.
An important point he made about Data Mesh is that it starts a discussion about the distribution of data, because data creation is inherently distributed in all companies. With the number of data sources growing every day, many organisations should probably at least consider what their options for scaling are.
For companies wondering whether Data Mesh is a good fit for them, Daniel suggested that if you practise domain-driven development, have started working with microservices, or are doing a cloud migration, that is a good time to consider it.
Barr Moses, Co-Founder and CEO of Monte Carlo, states that domain-oriented data architectures, like Data Meshes, give teams the best of both worlds: a centralised database (or a distributed data lake) with domains (or business areas) responsible for handling their own pipelines. This way, Data Mesh makes data architectures easier to scale by breaking them down into smaller, domain-oriented components.
Does Data Mesh make sense for all types of organisations?
In his interview, Lars Albertsson, Founder of Scling, described Data Mesh as one way to scale out large data organisations where data management and governance have become challenging due to the number of teams working in the data platform.
“A Data Mesh can allow companies to scale further by federating data management and governance to teams that own data sources and data pipelines,” explains Lars.
Companies adopting Big Data and DataOps go through several phases of organisational structures around data, over a period of many years. In early phases, there is one team or a few tightly collaborating teams that use a single data platform, where the core component is typically a data lake combined with batch processing pipelines, potentially complemented with stream processing capabilities.
Most companies are either in these early stages or have not yet built their first incarnation of a data platform. More data-mature companies have managed to spread data innovation beyond pioneering teams, and have democratised data processing capabilities. To facilitate data democratisation and make governance manageable, the data platform technology and development processes are kept homogeneous. There are usually small variations, but if entropy is not kept under control, excessive friction will prevent data democratisation, adds Lars.
A centralised data platform can scale to large organisations. For companies whose culture makes it difficult to scale centralised services, centralised data governance can be perceived as a bottleneck. In this case, a Data Mesh can be a way to scale further by distributing governance responsibilities. In practice, a Data Mesh incarnation is a data platform and lake that has been split up into multiple ponds and multiple processing environments, controlled and operated by different teams. The size at which a Data Mesh makes sense depends on a company’s capabilities to coordinate a centralised data platform, clarifies Lars.
In the end, Lars states that Data Mesh is not the only option for scaling to a large number of teams; companies with sufficient capabilities to coordinate data management can keep a centralised data platform and avoid the overhead of decentralisation.
Yet how do you know if your organisation is really ready to dive into Data Mesh? To help companies make the decision, Barr Moses and her team have created a calculation, in the form of a survey, to determine whether it makes sense for an organisation to invest in a Data Mesh. By answering questions about their data sources, data team, data domains, data engineering bottlenecks, data governance and data observability, companies get a score that helps them decide whether they should go for Data Mesh. You can find the calculation in the Guide on implementing Data Mesh by Barr Moses, CEO of Monte Carlo, and Lior Gavish, CTO of Monte Carlo.
Future outlooks for Data Mesh and DataOps
Both Data Mesh and DataOps are set to disrupt the data and analytics industry in the next decade. But will they progress and transform organisations?
Regarding the above, Lars says that organisations that have fully adopted DataOps are 10-100 times more efficient at working with data than traditional companies. Although these figures are subjective estimates based on observing many companies across the maturity spectrum, and there are no scientific measurements for DataOps, Lars states that they match scientifically measured operational metrics for companies at different levels of DevOps maturity, as presented in the State of DevOps Report. DataOps seems to have similar effects in terms of lead time for new ideas and time to recovery in the event of failures.
“This efficiency gap is so significant that it is disruptive.” DataOps is in practice a requirement for getting sustainable value out of machine learning technology. Building and operating machine learning applications, keeping them healthy, and iterating to continuously improve them is complex and expensive. Companies that have not achieved a high level of data maturity may strike luck once or twice, but cannot deliver AI innovation in a sustainable and repeatable manner, says Lars.
Contrary to this, Lars says that Data Mesh is not a disruptive concept, but it is a way for very data mature, large companies to scale even further. “These companies have already obtained disruptive value from data and fully adopted machine learning technology.”
What Lars sees as concerning is that most of the buzz around Data Mesh is among companies that are not yet at this level of maturity.
“Early adoption of a Data Mesh is likely to be harmful; if you have not yet established strong, homogeneous conventions and processes in your data platform, decentralising it will introduce heterogeneity, which slows down innovation and prevents effective data democratisation.”
Lars believes that for most organisations, adopting a Data Mesh is in practice likely to be a step backwards and reintroduce the data silos that we had before the age of big data. “Descriptions of the Data Mesh tend to emphasise the responsibility of teams that own data to publish cleaned, high-quality data as data products, a pattern also known as a Data Hub.”
Shifting responsibilities of data quality improvement to teams that have domain expertise is generally a good idea, and an evolutionary step towards data maturity, but the underlying raw data must also be made available, he relates.
To clarify, Lars gives the example of invalid financial transactions: it may be desirable to cleanse them away for financial reporting, but they might be a gold mine of signals for fraud detection. “Hiding raw data into silos is a drawback of the data hub, and a significant risk when adopting a Data Mesh unless the company first establishes strong data democracy practices in a centralised platform. Hence, I am concerned that the buzz around Data Meshes will be harmful, and lure less data mature companies to build data silos.”
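The point about invalid transactions can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Transaction` type, the `valid` flag and the sample records are hypothetical and do not come from Lars or from any particular platform. It shows a cleaned dataset being derived for reporting while the raw records stay available, so that a fraud signal can still be computed from them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    account: str
    amount: float
    valid: bool  # hypothetical flag set by an upstream validation step


def cleaned_for_reporting(raw):
    """Derived data product: only valid transactions, for financial reports."""
    return [t for t in raw if t.valid]


def invalid_attempts_per_account(raw):
    """Count invalid transactions per account, a possible fraud signal."""
    return Counter(t.account for t in raw if not t.valid)


# Hypothetical sample data; the raw records are kept, not overwritten.
raw_transactions = [
    Transaction("acc-1", 120.0, True),
    Transaction("acc-2", 50.0, False),
    Transaction("acc-2", 75.0, False),
    Transaction("acc-3", 10.0, True),
]

# Reporting works on the cleaned view.
report_total = sum(t.amount for t in cleaned_for_reporting(raw_transactions))

# Fraud detection works on the raw data.
fraud_signals = invalid_attempts_per_account(raw_transactions)

# If only the cleaned data were published, the signal would be gone.
signals_from_cleaned = invalid_attempts_per_account(
    cleaned_for_reporting(raw_transactions)
)
```

In this sketch, the team that owns the data would publish both `raw_transactions` and the cleaned view, rather than the cleaned view alone.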
As opposed to this, there is no such risk with DataOps. “If your company can make people with different skill sets work well together, there are only benefits, and it will accelerate your journey to data maturity.”
Have you had any experience with implementing Data Mesh? Share your thoughts in the comments below.