Agile Data Management: Stay Flexible Without Losing Control

Written by

Daniel Quadros
Reading Time

16 minutes read

Agile Data Management Explained and Demystified

Category
Data & AI Strategy
Published Date

November 6, 2024

Taking a page from the world of software development, where “agile” has been a buzzword for decades, today’s data engineers are increasingly talking about agile data management. In theory, agile approaches to managing data enable greater efficiency and reliability, resulting in better outcomes for the engineers tasked with data management and the business seeking to derive value from its data.

In practice, however, agile data management doesn’t always work out as planned, so it’s important to take a realistic, adaptable approach to agile. Agile is a great goal to strive for, but it’s also an area where teams often fall short.

Allow me to explain what agile data management entails, how the reality often differs from the theory, and which practices organizations should adopt to ensure that they get the most out of agile while also controlling for its shortcomings.

Start With the Basics

Agile data management is an approach to collecting, processing, analyzing, and reporting data that emphasizes flexibility and iterative change. Specific takes on what agile means in this context vary, because agile is a high-level concept and there are many ways to translate it into practice. In general, though, managing data in an agile way means breaking complex projects into smaller, more digestible pieces while prioritizing flexibility and adaptability.

For example, imagine you want to analyze sales data to predict future trends. Under an agile approach, you’d treat each part of the process (collecting the data, transforming and restructuring it, analyzing it, and generating reports) as a distinct stage. You might also iterate on each stage multiple times, improving the outcome with each pass.
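As a minimal sketch of that idea, the Python below models the four stages as separate functions and then reruns the pipeline with a revised transform stage. The sample records, the function names, and the "average change" calculation are all illustrative assumptions, not a description of any particular team's tooling.

```python
from statistics import mean

# Hypothetical sample of monthly sales; None marks a missing revenue figure.
RAW_SALES = [
    {"month": 1, "revenue": 100.0},
    {"month": 2, "revenue": None},
    {"month": 3, "revenue": 120.0},
    {"month": 4, "revenue": 130.0},
]


def collect():
    """Stage 1: gather raw records (copied so later stages cannot alter the source)."""
    return [dict(record) for record in RAW_SALES]


def transform(records, fill_missing_with=None):
    """Stage 2: handle missing revenue by filling it in or dropping the record."""
    cleaned = []
    for record in records:
        if record["revenue"] is None:
            if fill_missing_with is None:
                continue  # drop records that have no revenue figure
            record = {**record, "revenue": fill_missing_with}
        cleaned.append(record)
    return cleaned


def analyze(records):
    """Stage 3: average change in revenue between consecutive records."""
    revenues = [record["revenue"] for record in records]
    changes = [later - earlier for earlier, later in zip(revenues, revenues[1:])]
    return mean(changes) if changes else 0.0


def report(trend):
    """Stage 4: format the result for whoever reads the report."""
    return f"Average change per period: {trend:+.1f}"


# Iteration 1: a quick first pass that treats missing revenue as zero.
first_pass = analyze(transform(collect(), fill_missing_with=0.0))

# Iteration 2: revisit only the transform stage and drop incomplete records.
second_pass = analyze(transform(collect()))

print(report(first_pass))   # Average change per period: +10.0
print(report(second_pass))  # Average change per period: +15.0
```

Because each stage is its own function, the second iteration changes only how missing values are handled and leaves collection, analysis, and reporting untouched.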

That’s different from haphazardly processing and analyzing data without having a systematic, coordinated strategy in place from the start — which would be the opposite of an agile approach.

Agile data management practices like these are based closely on the concepts at the heart of agile software development, which breaks software projects into multiple stages while prioritizing flexibility and collaboration.

Time to Tackle Agile Data Management Challenges

In theory, agile data management sounds great. It promises to bring order and efficiency to what might otherwise become a complex, messy, drawn-out project. In practice, however, applying agile principles to data management can be tough. There are two main challenges:

Unpredictable timelines: You can plan a roadmap for your data management project that lays out each task and how long you think it will take. But you can’t guarantee your projected timeline will be your actual one. Tasks like building data infrastructure or configuring business intelligence (BI) reporting are complex, and it’s often impossible to predict accurately how long they’ll take.
Unpredictable data quality: If the data you’re managing is low in quality or difficult to access, you’ll need to invest more time and effort in transforming and analyzing it. But because you can’t determine the quality of the data until the data management process is underway, you can’t know ahead of time how badly data quality issues might derail your attempt to implement a consistent set of agile processes.
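One way to surface the data quality unknown early is a quick profiling pass once the data is accessible. The sketch below is only an illustration: the field names, the sample records, and the choice of "share of missing values" as the quality signal are assumptions made for the example.

```python
def missing_share(records, required_fields):
    """Return, per required field, the fraction of records where it is absent or empty."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in required_fields}
    return {
        field: sum(1 for record in records if record.get(field) in (None, "")) / total
        for field in required_fields
    }


# Hypothetical records with a few gaps in them.
SAMPLE = [
    {"order_id": 1, "revenue": 100.0, "region": "north"},
    {"order_id": 2, "revenue": None, "region": ""},
    {"order_id": 3, "revenue": 120.0, "region": "south"},
    {"order_id": 4, "revenue": 130.0},
]

quality = missing_share(SAMPLE, ["revenue", "region"])
print(quality)  # {'revenue': 0.25, 'region': 0.5}
```

A result like this does not fix anything by itself, but it gives an early, concrete signal of how much cleanup effort to expect.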

The second challenge is especially notable because it fundamentally distinguishes agile data management from agile software development. When you build software, you know from the start what you’re working with, and you can typically control all of the significant variables (like which programming language you use and which types of IT infrastructure you set up).

However, data quality is an unknown variable you often can’t control when analyzing data. Unless you designed and managed the data generation process from the start — which is rarely the case because the data that businesses want to explore was often collected over long periods, starting before they knew exactly what they wanted to do with it — you have to work with the data you have, not the data you wish you had.

Follow a Realistic Approach to Agile Data Management

These challenges don’t mean teams should give up on agile data management. Despite its imperfections, agile is still worth it because it can help make complex projects more efficient and consistent.

They do mean, however, that engineers should expect that their projects won’t always go to plan and should be proactive in addressing the inherent challenges of an agile approach to working with data.

For example, at my company, Indicium, we prioritize agile approaches to data management, but we also practice the following to help manage risk.

Treating roadmaps tentatively: We create roadmaps to structure our projects, but we assume they are very tentative. Just because we’ve scheduled two weeks to set up data infrastructure, for example, doesn’t mean we assume that it will take exactly two weeks. It might be longer or shorter.
Building in time margins: Relatedly, we err on the side of overestimating how long tasks will take, which builds in some extra margin. If we anticipate a task taking two and a half days, for example, we’ll schedule it as three. Taking less time than expected is vastly preferable to running over schedule.
Parallel tasking: When planning projects, we aim to perform as many tasks as possible in parallel. For example, if we can set up data infrastructure while assessing data quality, we’ll do both simultaneously. Some processes, like BI reporting, can’t happen until others are complete, so there are limitations on how much you can do in parallel. But when you perform multiple tasks simultaneously, you speed up the overall project, even if there are delays in some areas.
Clear stakeholder communication: We strive to be clear and transparent with business stakeholders when planning projects. We emphasize that tentative timelines are just that — tentative — and we keep them up to date on changes in the project as it unfolds. The people who depend on data management and analysis to make business decisions need visibility into when analysis will be complete, and leaving them in the dark — or misleading them with timeline promises that you can’t guarantee — is not in anyone’s interest.
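The time-margin and parallel-tasking practices above can be sketched together in a few lines of Python. This is a rough illustration under stated assumptions: the task names, the estimates in days, and the rule of rounding up to the next whole day are hypothetical, and note that rounding up adds no margin to an estimate that is already a whole number.

```python
import math

# Hypothetical tasks: a raw estimate in days and the tasks each one depends on.
TASKS = {
    "set_up_infrastructure": {"estimate": 10.0, "depends_on": []},
    "assess_data_quality": {"estimate": 2.5, "depends_on": []},
    "bi_reporting": {
        "estimate": 4.0,
        "depends_on": ["set_up_infrastructure", "assess_data_quality"],
    },
}


def padded(estimate_days):
    """Round an estimate up to the next whole day to build in a margin."""
    return math.ceil(estimate_days)


def finish_day(name, tasks, cache):
    """Earliest day a task can finish when independent tasks run in parallel."""
    if name not in cache:
        start = max(
            (finish_day(dep, tasks, cache) for dep in tasks[name]["depends_on"]),
            default=0,
        )
        cache[name] = start + padded(tasks[name]["estimate"])
    return cache[name]


def parallel_duration(tasks):
    """Project length when every task starts as soon as its dependencies finish."""
    cache = {}
    return max(finish_day(name, tasks, cache) for name in tasks)


def sequential_duration(tasks):
    """Project length when tasks run strictly one after another."""
    return sum(padded(task["estimate"]) for task in tasks.values())


print(padded(2.5))                 # 3
print(parallel_duration(TASKS))    # 14
print(sequential_duration(TASKS))  # 17
```

In this example the reporting task still waits for both of its prerequisites, but running the two independent tasks side by side shortens the plan from 17 days to 14.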

These practices enable a data management strategy that is both agile and realistic. They allow us to build efficiencies into data management wherever possible while avoiding goals or promises that are not feasible to achieve, at least not every time.

This is what agile data management should ultimately be about: transparency and pragmatism. As I noted above, you simply can’t control your variables in data management to the same extent that you can in software development. As a result, data engineers need to apply agile principles a bit differently than their counterparts in software engineering. Strive for perfection, but plan for the impossibility of achieving it.

This article originally appeared at The New Stack on November 6th, 2024, under the title ‘Agile Data Management Explained and Demystified’.

About Indicium

Indicium is a global leader in data and AI services, built to help enterprises solve what matters now and prepare for what comes next. Backed by a $40 million investment and a team of more than 400 certified professionals, we deliver end-to-end solutions across the full data lifecycle. Our proprietary, AI-enabled IndiMesh framework powers every engagement with collective intelligence, proven expertise, and rigorous quality control. Industry leaders like PepsiCo and Bayer trust Indicium to turn complex data challenges into lasting results.

Daniel Quadros

Daniel Quadros is Indicium’s VP of Consulting. He leverages deep expertise in data strategy and analytics, as well as a strong background in Economics, to help companies enhance their analytical maturity, modernize, and scale for a data-driven future.