With data increasingly recognised as a business asset rather than just an artefact of doing business, realising and extracting its value is something many organisations, private and public, are wrestling with.
One of the major challenges is the trade-off, real and perceived, between agility and control. For decades now, IT has set out its stall around controls, processes and rigour, providing governance and the ability to sleep at night knowing data and systems are controlled and secured.
The friction comes where the business demands, more than ever, that the value of data (not to mention the transformation of processes and products that consume and generate it) be extracted more rapidly. This demand for business agility seems in direct opposition to the ITIL-style processes IT has built to keep the business' assets secure.
So how do we serve these opposing demands? Can it even be done?
The answer is largely yes, but it takes careful consideration, planning and architecture to finely balance the two.
Let's look at data first. Businesses have often invested heavily in a data warehousing approach: rigorously designing and building a highly structured system that cleanses data on the way in, ensures it conforms to pre-defined norms, and then loads it into a model that drives business processes from reporting to budgeting and planning.
With the hype around big data, data science and modern data platforms, there's a fear that the highly controlled and orchestrated world of the data warehouse is slipping away and the data lake 'wild west' is taking over.
In reality we need both worlds and we need to be careful that the lineage between them is clear.
Your data scientists need quick access to raw data and the ability to run experiments over it, but you don’t want your business decision makers needing a PhD just to find out how their quarter is looking.
If you try to hang business processes off your agile reporting solutions, you'll quickly kill your agility: you can't risk iterating rapidly on data models that are used downstream to calculate people's pay cheques.
There are multiple audiences to serve with your modern data platform, from the casual user who wants regular reports and simplified analytics, through analysts, to data scientists. Designing systems that allow tasks to be federated across this wide audience, whilst governing processes and controls for more structured data, is achievable. It typically involves landing all data once, raw and untouched, often in a data lake, and providing access to those who need it.
You can feed your structured data warehouse and reporting solutions from this lake, serving operational masters through rigorous change management and controls that validate quality and govern access. At the same time, you empower your analysts and data scientists to iterate on demand, building and evolving models that hydrate themselves from the untouched data in the lake, cleansing their downstream copies as they go and leaving the source data, and the systems that depend upon it, unaffected.
There’s a fine balance to be found to enable agility and maintain control. Bolting everything to the floor will kill the ability to innovate. Opening everything up could enable chaos, but with a little planning and a well-designed data architecture that understands the multiple audiences it serves you can indeed have both.
Techopedia defines a data lake as a "massive, easily accessible, centralised repository of large volumes of structured and unstructured data" (https://www.techopedia.com/definition/30172/data-lake). This pattern often leverages cloud services such as Azure Data Lake to store, process and serve up data as required (https://azure.microsoft.com/en-us/solutions/data-lake/).