Tuesday, March 16, 2021

An Important difference between DevOps and DataOps

Where DevOps is automation, technology, and delivery focused; DataOps is more customer focused. I like these descriptions from Wikipedia for DevOps and DataOps;

  • DevOps is a set of practices that combines software development (Dev) and IT operations (Ops). It aims to shorten the systems development life cycle and provide continuous delivery with high software quality. https://en.wikipedia.org/wiki/DevOps
  • DataOps is an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics. While DataOps began as a set of best practices, it has now matured to become a new and independent approach to data analytics. DataOps applies to the entire data lifecycle from data preparation to reporting, and recognizes the interconnected nature of the data analytics team and information technology operations. https://en.wikipedia.org/wiki/DataOps
The similarities between these two are many, particularly from a process and automation perspective. I see DevOps really focused on delivering quality software, and DataOps focused on delivering visualized data analytics to the customer.

Customer focused DataOps assists with Agility

Having a customer focused data analytics team fits well with an Agile approach. The data analytics team needs very involved customer analysts (or product owners). The customer analyst identifies the KPI's, models, or intelligences that need to be fulfilled. These become part of the backlog, and as new sprints are defined they become focused on the item(s) of analytic. A sprint can be built around a few analytics, then iterate around the items for a DataOps sprint;
  1. Where is the data? How do we get at it?
  2. How do we best move it? How often? What are the security or privacy issues?
  3. What needs to be cleansed or transformed? Is the data at the correct granularity?
  4. Do we already have any related data to improve the intelligence? Is this a new build or do we use / alter an existing pipeline?
  5. What models or analytics do we apply?
  6. How do we best visualize the data?
Not to say that DevOps can't fit well within Agile approaches, it can.... the backlog is more technically focused and fits into the sprint more from a continuous perspective than a customer perspective. (What DevOps features go into a sprint are often negotiated with the product owner). The focus of DataOps is in shipping features that fulfill a visualized analytic or more... The focus of DevOps is in CD / CI...

This approach worked well for us when working on a Business Intelligence project and our nine week sprints usually focused around 4 to 9 KPI's. The organization was in aerospace, they had many legacy data sources with new data sources coming online. As with many organizations, they were in a state of improvement and transformation. Fitting new cubes, representing KPI's, into sprints allowed us to show progress and success. The biggest challenge wasn't in the technical or delivery side of getting the data to the customer. The challenge came in developing a data team where every team member understood the process end-to-end and the efforts required during each step of the DataOps pipeline. Acquiring, cleansing, and transforming data takes as much effort and understanding as visualizing the data for the customer.