Why SLAs smell of waste

Julian Elve

Jul 12, 2011

Updated Jun 12, 2019


DevOps

There’s quite a lot on the internet now about “devops”: combining development and operations work to increase flow and reduce problems.

Google search for Kanban + Devops
Links I’ve tagged “devops”

One of the key features of this approach is the idea that above a certain threshold of estimated duration, all operations work has to be included in the kanban board for visibility and flow management.

For example, see these meeting notes from a DevOps conference in Ocean View, June 2011, where Lonely Planet shared their experience of this with a threshold of 30 minutes: any service desk issue that takes more than 30 minutes gets moved to Kanban.

I had a brief exchange on Twitter with Dominica DeGrandis, one of the leading lights of DevOps, in which she confirmed that in her experience 1 hour was the most common point at which teams found that the visibility and sharing were worth more than the overhead of putting the work into Kanban. This happens to be the pragmatic transition point adopted by the team I am coaching.
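As a minimal sketch of the threshold rule described above, the routing decision could look like this in Python. The `Ticket` record and `route` function are hypothetical names chosen for illustration and do not come from any particular service desk tool; the default of 60 minutes is simply the "1 hour" figure mentioned above.

```python
from dataclasses import dataclass

# Assumed default: the "1 hour" transition point mentioned in the text.
KANBAN_THRESHOLD_MINUTES = 60


@dataclass
class Ticket:
    """A hypothetical service desk ticket; the fields are illustrative only."""
    title: str
    estimated_minutes: int


def route(ticket: Ticket, threshold: int = KANBAN_THRESHOLD_MINUTES) -> str:
    """Send work estimated above the threshold to the kanban board.

    Anything at or below the threshold stays with the service desk.
    """
    if ticket.estimated_minutes > threshold:
        return "kanban"
    return "service_desk"
```

Passing `threshold=30` would give the Lonely Planet variant of the same rule.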

SLAs are a Problem!

Intuitively, a follow-up step will be to look at how this approach affects traditional service desk SLAs. I’m not a great fan of SLAs, as they tend to focus on failure and what happens after a failure; I’d rather that a service “just worked” and that resources went into keeping it that way.

I’m particularly disenchanted with the idea of SLAs when they are applied between two departments of the same business, not least because they stimulate the wrong sort of behaviour, in particular the chasing of local optima. The SLA-driven approach also encourages managers on the “consumer” side of the agreement to pay no attention whatsoever to systemic problems of their own making, to the ultimate detriment of the overall firm.

Having just read Bob Marshall’s Marshall Model of Organisational Evolution, I can see that the frustrations I am expressing are a product of the transition zone between the Analytic and Synergistic organisational mindsets.

I find John Seddon another source of stimulating ideas, and his piece on “Why do we believe in economies of scale?” has some particular insights into how the SLA mentality creates organisational waste.

I humbly offer a few more thoughts to the debate:

“Incident management” as a process is focused on failure, and on managing the impact of failure rather than removing the causes (and yes, I know there is a whole other ITIL process of “Problem Management”).
SLAs focus resource on local optima (fix this incident for this user) rather than on the “best value for the whole firm at this time”.
Incident management systems tend to accumulate backlogs of failure demand, which represent inherent waste and also clog flow, making the work to address underlying causes less efficient.
Efforts to create a synergistic “OneTeam” approach focused on flow are undermined by too much interrupt-driven work. Integrating the interrupt-driven work into the flow gives a much better sense of how to “add the most value possible right now”.

The way I explain this to colleagues is usually along these lines:

The business has invested a fixed amount into IT development & support

Usually (especially in small / medium business) there is a constraint within the technology team

Therefore the best value to the business from that investment is through:

elevation and exploitation of the constraint
reducing lead time
prioritising based on economic cost of delay

All of which are delivered by a tuned Kanban system.
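One common heuristic for prioritising on economic cost of delay is CD3 (cost of delay divided by duration). The sketch below uses it purely as an illustration, not as the only way to do this. The `WorkItem` fields are hypothetical estimates that each team would have to supply for itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkItem:
    """A hypothetical kanban work item with illustrative estimate fields."""
    name: str
    cost_of_delay_per_week: float  # assumed estimate, in any consistent currency
    duration_weeks: float          # assumed estimate of time spent on the constraint


def cd3(item: WorkItem) -> float:
    """Cost of delay divided by duration: value protected per week of constrained capacity."""
    return item.cost_of_delay_per_week / item.duration_weeks


def prioritise(items: List[WorkItem]) -> List[WorkItem]:
    """Order items so the highest CD3 score is pulled first."""
    return sorted(items, key=cd3, reverse=True)
```

The point of dividing by duration is that the constraint is the scarce resource: a short item with a moderate cost of delay can be worth pulling ahead of a long item with a larger one.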

To make sense of this we need to educate ourselves and our colleagues about the systemic dysfunctions caused by trying to force a system to work faster than its constraints allow. It often takes time: for far too many people with the Analytic mindset, the first response is “they just have to work harder”.

How about you?

I’d love to hear from other people grappling with these issues. Please comment here or tweet me (@Synesthesia).