All you need to know about data mesh
Data mesh is often called the modern data architecture. Our data engineers explain data mesh principles and architecture, how it works, and how it differs from a data lake. A good, fast read if you are exploring data mesh solutions and want to know how to build a data mesh for your organization.
Suresh
Dec 16, 2024
7 mins
Key Takeaways
1. Data mesh takes the opposite approach to a centralized single source of truth (SSOT) or a data lake: it sets up a decentralized data architecture with federated governance.
2. Every functional unit is responsible for the data it produces. Data mesh trusts teams and data users to access the data they need, while governance and security policies stay centralized.
3. Data is treated as a product, which makes it findable, understandable, and shareable for end users.
4. Data mesh is not cheap to implement: the initial setup can be expensive, although it can become cost effective as data management grows more complex.
5. Even though data mesh is a modern approach, only 18% of companies have the initial setup needed for a successful mesh implementation.
Data mesh: What it is, use cases and benefits
Data mesh is a decentralized approach to data management in which data is treated as a product and each department or functional area handles its own data. This is how data mesh differs from a data lake: a data lake centralizes organizational storage in one location, whereas a mesh decentralizes it and makes the respective teams manage it. For example, the marketing team takes care of website visitor and campaign performance data, while the sales team takes care of lead and customer data.
Why should you go for data mesh?
Teams handle their own data better and can use it whenever required, without hassle.
Data mesh suits large organizations that want to use data efficiently without the complications of high data volumes.
It is more flexible than centralized data management and lets business teams decide how their data is managed.
Operational costs can be lower, as there is no need to own complex central pipelines that connect complicated business structures.
Use cases of data mesh
1. AI & Machine learning models
Teams that build and use AI/ML applications can use data mesh to power them. For example, suppose a company needs to build and run an NLP application. Data mesh domains come in handy here to store, process, and manage the image and text data required, without the team having to struggle through deployments with a central data team.
2. Data monetization
Companies that want to monetize their data by selling it to third-party services can use data mesh. The decentralized architecture allows them to package datasets as products, with clear APIs, governance, and documentation for consumption.
3. Data analytics
A fast-growing company with branches across the world can use data mesh for analytics and AI use cases. Data mesh gives regional teams the freedom to meet both local and global standards while sticking to unified governance. Teams can have customized analytics dashboards and get real-time business insights.
4. Nurturing data-driven culture
Businesses that want to enable self-service analytics can use data mesh to let individual teams and members explore data relevant to their roles.
Core principles of data mesh
Domain oriented data ownership
Each team manages the data it creates, so it is easier for teams to access the data relevant to them whenever they need it. Each data owner is responsible for the quality, accountability, and usability of their data, which makes it easy for the business to align business roles with data ownership. Domain-based data ownership lets the people closest to the data maintain and improve it.
Data as a product
Data mesh considers data as a product, meaning it should be as easy for users to find as a product is for customers to access.
When data is treated like a product, companies shift their focus from just storing it to making it valuable for relevant users. The four key features of data as a product are that data is discoverable, understandable, usable, and trustworthy for its users.
This brings more data awareness and democratization: users know what they are accessing without losing their focus on quality.
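To make the four traits concrete, here is a minimal sketch of a data product descriptor. The class name, its fields, and the way each trait is checked are assumptions made for illustration; they are not part of any data mesh standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for one data product published by a domain."""
    name: str
    owner_domain: str                              # team accountable for the data
    description: str = ""                          # helps users understand the data
    schema: dict = field(default_factory=dict)     # column name -> type
    access_endpoint: str = ""                      # where consumers read the data
    quality_checked: bool = False                  # set once quality checks pass

    def missing_traits(self) -> list:
        """Return which of the four product traits are not yet met."""
        missing = []
        if not self.name or not self.owner_domain:
            missing.append("discoverable")
        if not self.description or not self.schema:
            missing.append("understandable")
        if not self.access_endpoint:
            missing.append("usable")
        if not self.quality_checked:
            missing.append("trustworthy")
        return missing


# Illustrative product owned by the marketing domain.
campaigns = DataProduct(
    name="campaign_performance",
    owner_domain="marketing",
    description="Daily spend and clicks per campaign",
    schema={"campaign_id": "string", "spend": "float", "clicks": "int"},
)
print(campaigns.missing_traits())  # ['usable', 'trustworthy']
```

In this sketch the product is documented but has no access endpoint and no passed quality check, so it is not yet usable or trustworthy. A real platform would usually keep this kind of metadata in a data catalog rather than in application code.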
Self-serve data platform
A key feature of data mesh is its self-serve architecture, which gives users the tools and platforms they need to access data without having to contact anyone. A data mesh provides each team with data pipelines, storage, processing tools, APIs, automation tools, and self-service/BI platforms. This reduces dependency on IT and data teams, promotes fast access to insights, and makes the approach well suited to scaling and to large organizations.
Federated computational governance
Each domain manages its data, but everyone follows shared rules and regulations: that is what federated governance means. In a data mesh, federated governance maintains data consistency, privacy, and interoperability across the organization.
Why federated governance?
To prevent chaos that could arise from decentralization
To set up a balance between flexibility and governance
To let domains work smoothly while the company scales and grows.
A healthcare organization is a good example. The industry must comply with HIPAA and other regulations. Unified governance helps such an organization ensure that each of its departments adheres to centralized regulations and standards.
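The idea of shared rules applied to independently managed domains can be sketched in a few lines. The policy names, metadata keys, and domain names below are hypothetical and only illustrate the pattern; real governance tools express policies in their own formats.

```python
# Central, organization-wide rules (hypothetical examples).
GLOBAL_POLICIES = {
    "required_metadata": {"owner", "description", "classification"},
    "allowed_classifications": {"public", "internal", "restricted"},
}


def check_compliance(product_metadata: dict) -> list:
    """Return a list of policy violations for one domain's data product."""
    violations = []
    missing = GLOBAL_POLICIES["required_metadata"] - set(product_metadata)
    for key in sorted(missing):
        violations.append(f"missing metadata: {key}")
    classification = product_metadata.get("classification")
    allowed = GLOBAL_POLICIES["allowed_classifications"]
    if classification is not None and classification not in allowed:
        violations.append(f"unknown classification: {classification}")
    return violations


# Each domain keeps its own metadata but is checked against the same rules.
domain_products = {
    "cardiology": {
        "owner": "cardiology",
        "description": "Patient vitals",
        "classification": "restricted",
    },
    "billing": {"owner": "billing", "classification": "secret"},
}

report = {domain: check_compliance(meta) for domain, meta in domain_products.items()}
for domain, violations in report.items():
    print(domain, violations or "compliant")
```

The domains stay free to structure their data as they like, while the same central check runs against every product. That split is the balance between flexibility and governance described above.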
Data mesh vs data lake
The difference between a data lake and a data mesh is that a data lake is a centralized repository, whereas data mesh is the concept of establishing decentralized, domain-oriented ownership of data.
Data lakes are designed for storage and scalability, and are mainly used for use cases like big data analytics and machine learning. They can handle petabytes of data in all types of formats. Some examples of data lake storage include AWS S3, Azure Data Lake, and Google Cloud Storage. Data lakes are among the most cost-effective and flexible storage solutions for small and growing companies that need to store large amounts of data without worrying about processing.
Data mesh, on the other hand, divides up storage and handling while centralizing governance. Platforms used to build a data mesh include Microsoft Fabric, Databricks, Snowflake, and the like.
While a data lake suits organizations with low-to-moderate data maturity, data mesh is for organizations with complex functional requirements or domain-specific needs. Although the two are different, many organizations use a combination of data lake and data mesh as they scale and meet their complex data analytics needs.
Benefits of data mesh
1. Cost management
Maintaining a large, centralized data store can be expensive, especially when it needs to scale. Data mesh reduces this burden by decentralizing, allowing each domain to focus on its specific needs. The initial setup of a data mesh can be expensive, since each domain has its own tools and use cases, but it can become a cost-effective choice as data management grows more complex.
2. Data quality
Each domain takes care of its own data. Quality and security get the utmost attention because teams and individuals know how to use and handle their data. As a result, data stays accurate and useful and does not stagnate (stagnation often leads to inconsistencies, errors, and duplication).
3. Interoperability
Cross-team collaboration becomes easier with a data mesh, making data more usable across teams. One team can share datasets and insights with others, making the information more holistic. This interoperability can happen without intervention from IT and technical teams, fostering more inclusive and informed data-driven decisions.
4. Security and governance
Federated governance supports compliance with regulations such as GDPR or CCPA. Teams get the flexibility to operate their own way and manage their specific needs, while the company enforces centralized security policies and governance.
5. Data democratization
Data mesh leads to more empowered teams with self-service capabilities to access their own data. This leads to a sense of accountability among employees and helps them drive meaningful outcomes.
How to build data mesh for your organization?
Getting started with data mesh requires more than planning and tools. It requires cultural alignment across the organization and data democratization. Here is how you can begin building a data mesh.
The steps involved in a data mesh implementation are outlined below.
Initial brainstorming
1. Start with an initial assessment of your data architecture, its workflows, and use cases.
2. Identify the current bottlenecks and pain points users face. Is it inconsistent quality, delayed insights, or silos?
3. Based on these observations, list the goals and objectives for adopting data mesh.
Get down to domain-oriented data ownership
1. Identify and list the domains that need to be established, for example finance, sales, and marketing. It’s okay to start with one team, experiment, and then roll out across the entire organization.
2. Map data sources and set up domain-oriented ownership for the respective team. Some businesses already have domain-based ownership. In that case, this wouldn’t be too much trouble to begin with.
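The mapping step above can start as a simple inventory that records which domain owns each source and flags sources nobody owns yet. This is a minimal sketch: the source names reuse the marketing and sales examples from earlier in the article, while `invoices` and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical source-to-domain mapping.
SOURCE_OWNERS = {
    "website_visits": "marketing",
    "campaign_performance": "marketing",
    "leads": "sales",
    "customers": "sales",
}


def sources_by_domain(owners: dict) -> dict:
    """Group data sources under the domain that owns them."""
    grouped = defaultdict(list)
    for source, domain in owners.items():
        grouped[domain].append(source)
    return dict(grouped)


def unowned_sources(all_sources: list, owners: dict) -> list:
    """Return sources from the inventory that have no owning domain yet."""
    return [source for source in all_sources if source not in owners]


inventory = ["website_visits", "leads", "invoices"]
print(sources_by_domain(SOURCE_OWNERS))
print(unowned_sources(inventory, SOURCE_OWNERS))  # ['invoices']
```

Sources that show up as unowned are a signal that a domain, such as finance in this example, still needs to be established or assigned.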
Self-service architecture
1. Set up self-service architecture that can help teams access the data product more effectively and independently.
2. Start with storage. Choose cloud or hybrid tools (depending on your current data architecture) for storage, pipelines, processing, and data discovery.
3. Ensure that the new infrastructure offers reliable and automated insights on a regular basis.
4. Other tools you will need for data mesh implementation are as follows:
Storage: Snowflake, Microsoft Fabric OneLake, etc.
Governance: Immuta, Apache Atlas, Microsoft Purview, etc.
Data discovery: Microsoft Purview, Alation, etc.
Collaboration: Slack, Confluence, etc.
Federated governance
1. Now that your teams have independent data domains, you should set up centralized policies for governance using tools like Microsoft Purview.
2. Any standard you must establish goes here - from documentation to metadata management to schema design to security and privacy to interoperability.
3. Sensitive data must be protected. Enable role-based and column-level security so that users access only what they are authorized to access.
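As a rough illustration of role-based, column-level access, the sketch below filters each row down to the columns a role may read. The roles, column names, and the `filter_columns` helper are assumptions for this example. In practice this is normally enforced by the storage or governance platform rather than in application code.

```python
# Hypothetical mapping of roles to the columns they may read.
COLUMN_ACCESS = {
    "analyst": {"customer_id", "region", "order_total"},
    "support": {"customer_id", "email", "region"},
}


def filter_columns(rows: list, role: str) -> list:
    """Return rows containing only the columns the role is allowed to see."""
    allowed = COLUMN_ACCESS.get(role, set())  # unknown roles see no columns
    return [
        {column: value for column, value in row.items() if column in allowed}
        for row in rows
    ]


orders = [
    {"customer_id": 1, "email": "a@example.com", "region": "EU", "order_total": 120.0},
]

analyst_view = filter_columns(orders, "analyst")
print(analyst_view)  # [{'customer_id': 1, 'region': 'EU', 'order_total': 120.0}]
```

Denying by default, so that an unknown role sees nothing, is the safer design choice here: a missing role definition hides data instead of exposing it.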
Train the team
1. The data product must be discoverable, usable, and interoperable, and it must meet quality standards.
2. Train and encourage teams to treat data as a product. Teams should be able to use self-service platforms to meet their everyday analytic requirements. Also, train them on how they can promote cross-domain collaboration.
Scale and increment
1. If you have started with one small domain, document your knowledge and findings and repeat the same for other domains.
2. Review the process a few months later and make changes to infrastructure if required.
3. Measure the success of data mesh implementation using metrics like time-to-insights, data quality, cross-domain data availability, etc.
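Two of the metrics above can be tracked with very little code. The sketch below assumes time-to-insight is measured from the moment a request is raised to the moment the answer is delivered, and it uses field completeness as a simple stand-in for data quality. Both definitions, and all the sample values, are illustrative assumptions.

```python
from datetime import datetime
from statistics import mean


def time_to_insight_hours(requested: datetime, delivered: datetime) -> float:
    """Hours between a data request and the delivered insight."""
    return (delivered - requested).total_seconds() / 3600


def completeness(rows: list, required: list) -> float:
    """Share of rows where every required field is present and not None."""
    if not rows:
        return 0.0
    complete = sum(
        all(row.get(name) is not None for name in required) for row in rows
    )
    return complete / len(rows)


# Made-up sample data for illustration only.
requests = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 15)),
    (datetime(2024, 1, 2, 9), datetime(2024, 1, 3, 9)),
]
avg_hours = mean(time_to_insight_hours(start, end) for start, end in requests)

leads = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]
quality = completeness(leads, ["id", "email"])

print(avg_hours, quality)  # 15.0 0.75
```

Comparing these numbers before and after a domain moves to the mesh gives a simple baseline for the review step.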
Final thoughts
Extracting value from a data mesh can be much quicker than from its counterparts such as data lakes, which are known for going stale and turning into data swamps. That is why many organizations are moving from traditional data architecture to data mesh, despite the complexity of setting up a decentralized architecture.
However, Gartner figures from 2021 indicate that only 18% of companies are ready to adopt and shift to data mesh successfully. This means data mesh implementation can be challenging and requires a certain level of data maturity to begin with.
Want to see how data mesh can work for you? We welcome you to a discovery discussion where you can assess your data maturity level and see how your current architecture could support a data mesh migration.