ML pipelines can get incredibly complex to create and manage, because they "glue" together data/datasets and code.
Understandably, there is always a dilemma between speed of execution and quality of engineering. Because of this, a company-wide ML pipeline quickly becomes difficult and costly to change radically.
It is also remarkably easy to incur massive ongoing maintenance costs at the system level when applying ML (the AWS bill, for example).
Also, many people from different verticals and departments are usually involved. They focus on their own area of expertise, not on the ML pipeline as a whole, and become significantly more resistant to changing the pipeline's structure.
My question is the following: What is the right approach to avoid ossification? Do you recommend having multiple "small" pipelines rather than one big one (even though the results may then suffer from a lack of data sources)? Have you faced similar issues in your projects, and how specifically did you solve them?
I have encountered some form of ossification many times - given the factors you enumerated, some ossification is unavoidable. It is not uncommon for separate departments to be responsible for different parts of the pipeline (Conway's Law becomes a factor). Non-public data sources typically require legal contracts and expenses. Note that some of the factors we are talking about are not technical but organizational in nature.
Regarding how to address it - I typically don't recommend multiple smaller pipelines as a first step, because:
A combination of the multiple smaller pipelines (that are then integrated into the larger pipeline) is subject to the same forces.
When an organization is just starting with AI, the main problem with most AI pipelines is typically not the complexity of the pipeline per se - so modularity doesn't help much.
Not to mention that these smaller pipelines would also need to communicate with each other, so the pipeline needs to reach significant complexity before modularity starts helping at all.
Unlike in traditional code, it is generally difficult to achieve low coupling and protected variation in AI systems. D. Sculley et al., "Machine Learning: The High Interest Credit Card of Technical Debt" is worth a read for some of the problems that maintenance of AI systems brings to the table.
Instead, the recommendation is to accept that the pipeline would ossify to some extent and to make sure that you are working with the right pipeline in the early stages of the project. While good technical practices could slow down ossification, it is the nature of the beast that some ossification will happen.
MinMax analysis and the Economize part of the CLUE process (chapters 6 and 7 in my book) talk about how to determine whether you have the right pipeline and in which stage to invest. While I go into much more detail in the book, in short:
Construct the simplest possible implementation of your AI pipeline - duct tape and quick prototyping. This is your min. Can it reach your business goals? If yes, then you know that you have an acceptable AI pipeline.
If the min pipeline is not good enough, what can a "money, expertise, and time are not a factor" version of your pipeline achieve? That is the max analysis.
If the max pipeline wouldn't work even with every stage in it implemented with the best technology/result known to mankind, it is probably not a good structure. You research the best implementation of each stage and use stubs or COTS software to implement them in the prototype. It is not advisable to hope you can improve a pipeline that fails the max analysis.
If min is good enough (or max is not good enough), you know whether to keep (or discard) your AI pipeline. If you are "in the middle" (min is not good enough, max is good enough), then you need to improve the pipeline. That is the subject of sensitivity analysis (the sensitivity of the pipeline's end result to a change in one stage of the pipeline).
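The keep / discard / improve rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure from the book: it assumes the business goal and the pipeline result can be expressed as a single score where higher is better, and the function names, stage names, and the toy scoring function are all hypothetical. The sensitivity part uses one simple reading of the idea: upgrade one stage at a time and measure how much the end result moves.

```python
from typing import Callable, Dict


def min_max_decision(min_score: float, max_score: float, goal: float) -> str:
    """Apply the MinMax rule to a single 'higher is better' score (assumption)."""
    if min_score >= goal:
        return "keep"      # the simplest pipeline already meets the business goal
    if max_score < goal:
        return "discard"   # even the best-case pipeline cannot meet the goal
    return "improve"       # in the middle: worth improving, stage by stage


def stage_sensitivity(
    evaluate: Callable[[Dict[str, float]], float],
    baseline: Dict[str, float],
    upgrades: Dict[str, float],
) -> Dict[str, float]:
    """Gain in the end result when exactly one stage is upgraded at a time."""
    base_score = evaluate(baseline)
    gains = {}
    for stage, better_quality in upgrades.items():
        trial = dict(baseline)          # copy so the baseline stays untouched
        trial[stage] = better_quality   # upgrade only this one stage
        gains[stage] = evaluate(trial) - base_score
    return gains


def toy_evaluate(stage_quality: Dict[str, float]) -> float:
    """Hypothetical stand-in for running the pipeline: product of stage qualities."""
    score = 1.0
    for quality in stage_quality.values():
        score *= quality
    return score


if __name__ == "__main__":
    # Hypothetical stage qualities for the min pipeline and for upgraded stages.
    baseline = {"stage_a": 0.80, "stage_b": 0.90}
    upgrades = {"stage_a": 0.95, "stage_b": 0.99}
    goal = 0.85

    min_score = toy_evaluate(baseline)
    max_score = toy_evaluate(upgrades)
    print(min_max_decision(min_score, max_score, goal))

    gains = stage_sensitivity(toy_evaluate, baseline, upgrades)
    # The stage with the largest gain is the first candidate for investment.
    print(max(gains, key=gains.get))
```

In a real project, `toy_evaluate` would be replaced by an actual end-to-end run of the pipeline, with stubs or COTS components standing in for the upgraded stages.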
Typically, you don't have enough resources to implement multiple AI pipelines. However, if you do, it is possible to draw a timing diagram to choose between multiple pipelines (the latter half of chapter 7).
Oh wow, thank you for a prompt and exhaustive answer Veljko! Your input is very valuable, indeed.
I hear what you're saying and understand your recommendations - start simple, always assume some level of (inevitable) ossification, and put processes in place that allow you to "rinse and repeat" and improve the pipeline. If that is not enough, it is cheaper to start from scratch again.
And yes, I agree - usually we don't have enough resources to implement multiple AI pipelines because there are only a handful of people doing this...