This is a transcript of episode 257 of the Troubleshooting Agile podcast with Jeffrey Fredrick and Douglas Squirrel.
Jeffrey remarks that he’s working with a software team whose code is “done”, that is, the organisation wants to keep using it but doesn’t want to invest more in changing it. He and Squirrel reflect on when this makes sense and how “software doneness” affects processes and measurements.
Listen to the episode on SoundCloud or Apple Podcasts.
Listen to this section at 00:11
Squirrel: Welcome back to Troubleshooting Agile. Hi there, Jeffrey.
Jeffrey: Hi, Squirrel.
Squirrel: So I hear you’re doing something new that our listeners might be interested in. In particular, something with a new kind of economic, profit-oriented attitude to it, a way of working with software that is new to you and me. So what is that and how could we learn from it?
Jeffrey: Well, I’ve moved into a slightly different role looking at this engineering organization, and one of the ideas we have is around the idea of software being “done,” that we can create software, it solves customer needs, and then we can have it be finished and move the development team into working on new software.
Squirrel: What a radical idea. It would be like building a machine and putting it in the factory and then expecting it to produce widgets for 20 years.
Jeffrey: That’s right. I’m sure for some of our audience they’ll say, “Well, that’s normal. Legacy systems are like that.” But I think there’s an element to which it feels unnatural to a lot of the software developers I talked to, because they’re used to working in their product teams and if issues come up in maintenance, then they just deal with it in the course of their business. I think it’s been different though, this model of, “Once the software is done, it should be high quality, it should work, the system should continue to operate. Any variation from that, if we end up spending unplanned time on this old software, that’s inhibiting the profitability of that software and we have less resources to go and put on to new innovations, new software.” So it’s a slight tweak.
Squirrel: I’d watch out for the negative connotations of some of the language. It would be natural to call this “legacy software”, “old software”, “the previous system”, “version one”, that sort of thing. This isn’t software that you think of that way. You don’t think, “Gosh, this software is what we want to replace. We’re sunsetting it. It’s going to be gone soon.” You’re thinking, “Hey, we can keep turning the crank on this for a long time! We just don’t want to spend a lot of time improving it or changing it.” Is that right?
The Need is Satisfied
Listen to this section at 02:38
Jeffrey: Exactly. Another big difference is how you think in terms of your marketplace and your clients. A lot of people are used to “Oh, yeah, that’s that old system with old clients on it and eventually they’ll go away and that thing will die.” But we’re like, “No, this could be something we just wrote, but it’s done and we’re selling it. And if there’s new clients in the market who don’t have it yet, well, they should buy it. We should be signing new people up.” And that idea of signing new people up for a system that’s not under active development—at least in my background often with startup companies—that kind of breaks people’s minds. The idea that you’d have people signing up and becoming new clients for software to which there’s no active development effort just doesn’t seem to make sense to them.
Squirrel: Another part of the world that probably many of our listeners are in is building software that’s bespoke for a large organization, or that solves a problem for a market of some variety. What you wind up doing in that situation is changing it to adapt to changing circumstances. If you’re providing a service to customers, the customer’s needs change and the economy changes and there are new things that the business decides to do that have nothing to do with computers, but the software has to keep up. But it sounds like to me what you’re deciding is this is a stable area. This is an area that’s not bubbling and roiling and altering all the time. This is an area that’s pretty much going to stay the same. “We’re not expecting something new to happen, so we’re just going to keep turning the crank on this piece of software,” is that right?
Jeffrey: Exactly right. Over the course of many years, things might change and we might come back to it. But at the moment we think it’s done, and that whole idea of looking to build a system and have that system be complete and finished and just able to go and operate without someone looking after it all the time, that is a desirable end state. That’s what had me thinking about this, about the economics of software development from the view not of what we usually talk about, which is experimentation. I think a lot of the time we have talked about running different experiments and elephant carpaccio and having little slices of delivery every day and the economics of learning.
Squirrel: But here we’re not looking to learn. We’re looking to continue to run. But I would claim there’s areas where you do want to learn. You want to learn about aspects of the software and its operation that could help you make a greater profit from it so you don’t have to turn the crank so hard. Is that right?
Jeffrey: Exactly. I want to understand in these systems, how often do we have incidents? How often are we pulled in to prop something up or something breaks? Can we measure then how good we’ve been at creating a system that can run unattended and find areas of improvement? How can we learn not about the market, but learn about our own system in ways that increase its stability and therefore its profitability, its efficiency?
Squirrel: And your thinking might be quite different than in a system that is constantly evolving and in which you have a development team on staff. In those situations, often you’re trying to both learn a lot of things and reduce the large variation as much as you can. Whereas here I think you could tolerate quite a lot of variation if it was predictable. So for instance, if the system runs out of memory once a day and there’s a window where it’s not being used, well, you could turn the crank more easily by just restarting it once a day. Right? And you’d be okay with that, which you wouldn’t in a system that was evolving because that might get worse.
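Squirrel’s restart-in-the-idle-window mitigation can be as simple as a scheduled job. A minimal sketch as a crontab entry, where the 03:30 window and the unit name `done-system.service` are assumptions for illustration, not anything from the episode:

```shell
# Hypothetical crontab entry: restart the service at 03:30 every day,
# inside an assumed window when the system is known to be unused, so
# the once-a-day memory exhaustion never happens during operation.
# "done-system.service" is a made-up systemd unit name.
30 3 * * * systemctl restart done-system.service
```

Because the variation is predictable, a fixed schedule like this is enough; no one has to watch the system or react to it.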
Jeffrey: Yeah, that’s right. That’s a really good point that we would hope that because there’s little change happening in these systems that they would be very stable and we’d be able to characterize their behavior very well and as you say, they become very predictable and have built-in remediations that allow things to operate smoothly. You’re right that that is different than a system that’s actively evolving, where it would be very difficult to characterize over a long period of time because we’re not going to have a stable system in terms of its operating characteristics. I wouldn’t expect the memory utilization profile or disk utilization profile or network or whatever to be that consistent if we’re constantly adding new features and driving new behavior from clients, compared to a system that’s—as you said—relatively stable and should be pretty consistent over time.
Lessons from the Factory
Listen to this section at 07:18
Squirrel: So the thing that comes into my mind when you describe this is an artifact called a control chart. For those who don’t have Google to hand, imagine two horizontal lines above and below an x axis, and you plot points with time on the x axis, these are different events, and a point might be above or below the x axis depending on how far out of bounds it is, how unusual. Right on the x axis is normal operation, and in my example running out of memory once a day has a relatively low impact because all you have to do is turn it off and on again and the frequency isn’t too high so there aren’t too many dots. You can plan for them. There’s a dot once a day and so before that you restart the system. That kind of control thinking that you use in a factory might be helpful in analyzing and mitigating any variation that you find in the “completed” system that you’re working with.
Jeffrey: Yeah, that’s right. When you brought that up I had been thinking just in terms of common measures. I was thinking of the number of incidents and in particular I was thinking of mean time to recovery (MTTR) and what occurred to me is with this control chart approach, if I were measuring each incident on the chart, I plot it and then the y axis becomes time to recovery. Well, two things would happen, and one is I would get the MTTR out of it, but I would also then find what the normal range is, that standard distribution, and I might say, “Oh, a variation within this range is normal.” But then I could also spot outliers and have deeper investigations into what happened if something’s really unusual in terms of its impact. Those are my places for further investigation. So that was the idea that came out from discussing control charts.
Squirrel: Yeah, that’s exactly the sort of thing you do in an industrial machine environment, in a factory, because you’re looking for the machines that need some tuning or need some oil or need to be replaced because they’re creating variation. So you plot them on this kind of control chart and say, “Wait, these dots are outside the bounds. What do they have in common?”
Jeffrey: Yeah, the interesting thing is if I compare this to one of the points made by the person who wrote The Principles of Product Development Flow-
Squirrel: Oh, Reinertsen.
Jeffrey: Reinertsen, yeah. Reinertsen makes the point that the economics of software development are different than the economics of manufacturing. In particular, he points out that in software development, variation is a source of value. It’s where innovation comes from. Variation in a factory is waste. What’s interesting here, what came up for me, is the difference when we’re no longer trying to innovate with the software, when we’re trying to merely operate it: now we’re much more in that “variation as waste” situation as in traditional manufacturing, as opposed to in new software development where variation is a source of innovation.
Squirrel: Absolutely. So if I told you, “Hey, you won’t have to restart the system once a day, it’ll only be once a month, but occasionally the whole thing will be down and you’ll have to get the whole team in to fix it.” I think you’d be much more sad in the second situation given your current economics, whereas previously in a more innovative environment you might say, “Well, I’ll take a big variation, I’ll take a big outage and a big investment if it means that I can learn more and evolve my system.” But at the moment you’re not thinking about that in your current role.
Jeffrey: That’s right.
Squirrel: Great. Thanks, Jeffrey.
Jeffrey: Thanks, Squirrel.