This is a transcript of episode 228 of the Troubleshooting Agile podcast with Jeffrey Fredrick and Douglas Squirrel.
Squirrel advocates the radical view that data consistency is a shibboleth and an inadequate excuse. He and Jeffrey explore this position and wonder why companies are so obsessed with having a single source of truth.
Squirrel: Welcome back to Troubleshooting Agile. Hi there, Jeffrey.
Jeffrey: Hi, Squirrel. I was reading your Twitter feed and you’ve done it once again. You’ve come out with one of the more heterodox opinions that I’ve come across recently. You said something along the lines that multiple sources of truth aren’t so bad. Can you tell us what you mean by that?
Squirrel: Well, I had a client this week who told me, “everybody always tells us in our ecommerce business that we have multiple sources of truth. We need a single source of truth, a single view of the customer. We need to understand what’s happening in our business. We have to know and we have to know right now.” And I said, “No, you don’t.” This knocked them over. So I thought I’d tweet it. And then you wanted to ask me about it.
Jeffrey: Right. So presumably this is coming up in the context of a potential project. Every time I hear people talking about a single source of truth and “we need to get there,” there’s a lot of work being done to try to reconcile things, major data projects to bring stuff together.
Squirrel: And Jeffrey, how many of those projects do you know have been successful? How many companies do you know who have a literal single source of truth? They know what’s happening with every customer, every moment. They have all the right information.
Jeffrey: You know, that’s a good question. I’m sure there are some that get there, but I think many more don’t.
Squirrel: I’m not sure there are any that get there that are successful. I mean, if you had just one customer, I bet you could have a single source of truth about that customer. Those businesses aren’t the successful ones. So the question I asked my client was: “how many sources of truth do you think Jeff Bezos has?” Amazon is going into salons, I read this week; you can get your hair cut by Amazon. I don’t understand this, but apparently that’s the next thing you need from Amazon. The metrics for salon success must be completely different from ebook ordering success or cloud hosting success. And I bet they’re all inconsistent. If I go get my hair cut this week, Bezos will see one piece of information about me that will probably conflict. Maybe I gave him a different address, or I was in a different city than where I live, or who knows. Something will be different about me in one than the other. I think Amazon has still been successful. You could name any giant organization and if it’s big enough, you’re going to be able to see that they can’t possibly have a single source of truth, and that suggests maybe you can be successful without one.
Jeffrey: So you’re going here just empirically, saying you can be successful empirically, but what about theoretically?
Squirrel: I don’t run theoretical businesses. I advise real businesses that are actually operating, and the ones that I observe never have a single source of truth, and the successful ones are successful in spite of it, not because of it.
Jeffrey: Well, I want to know more about this “in spite of,” but one thing before you say that: I think of correct versus useful. You’re saying we don’t need to have a perfectly correct view as long as we have a view that’s useful? Is that the direction you’re thinking here?
Squirrel: Absolutely. No major project that I know of has all the right information at all times. There’s always a fog of war, always clouded areas, bits you don’t understand, and you have to make decisions anyway. So imagine major activities like landing a person on the moon, or this gigantic project called Crossrail, just about to open here in England, to expand the underground line. This is a very successful project. It’s also way, way, way over budget, over time, and every other thing. But it’s actually going to run. I guarantee there were loads and loads of places as they were digging under the earth that they said, “well, we’re not really sure whether we’re on track with this part. We’re not sure whether we’re going to get the right parts. We’re not sure whether we’re a millimetre off or a metre off in this part of the tunnel.” But they went ahead anyway. If they’d stopped and tried to get perfect information, maybe they would have dug the first shovel-full today. The same is true for any other major project: you’re going to have imperfect information. This is, I think, an excuse. This is a shibboleth. This is something that people make up in order to avoid the actual problem. They say, “I haven’t got the right information. Therefore, I can’t go forward. I can’t start the marketing program because I don’t know whether to run it in the north of the country or the south of the country.” Why don’t you run it in both for a week and see where you make more money? That would be a way of finding out quickly without having all the accurate data, without having the data lake normalized or whatever other project you were going to undertake. Get the right information that gives you the opportunity to make the business decision and move forward. That’s what people are doing in successful projects, and that’s what I told my client to do.
Jeffrey: Now, the thing about this that’s a bit ironic for me is I think that you would agree about being data-driven and making data-based decisions.
Squirrel: Sure. With inconsistent data.
Jeffrey: The funny part here is we get people who say they want to be data-driven and who are driving the idea of having a single source of truth. You may have touched on what they are after, which is they want to be able to say, “we’re certain that we’re going to be correct.” They want to be certain because “we had all the data. It was all in alignment. It all agreed. And therefore, when we go do this, make this action, we’re confident it’s going to be the right one because the data tell us.”
Squirrel: Yep. And all the companies I know that have been that certain are dead. Because that level of certainty takes you forever, and it’s almost impossible to achieve. You can imagine Jeff Bezos trying to create it at Amazon. If he had, we wouldn’t have Amazon.
Jeffrey: Implying here there’s a trade off in that we need enough data to make decisions—or do we? Is data not important?
Squirrel: Well, we need some kind. It could be “The data is my gut tells me this is the right thing. I talked to five customers yesterday and they all told me this thing. We have 500. But five is enough for me to be sure that I should try this.” That’s data. So when you ask me “should we be data-driven?” Well, sure, I want to have as much information as is available, as is responsible to consult. But when somebody tells me, “look, we have to make sure that we’ve explained exactly why our marketing numbers are 5% off from our website traffic numbers,” I say “please stop talking to me about this before my brain explodes.”
Jeffrey: So it’s funny in hearing this, I’m getting an anti-Cuckoo’s Egg moment. The Cuckoo’s Egg being a Clifford Stoll book where he discovers KGB hackers in the university system because of a one penny difference between two accounting systems. What if I pull that out, Squirrel? “If Clifford Stoll had had your attitude, those terrible hackers would never have been caught.” What do you say to that?
Squirrel: You can always argue cases and argue the exceptions. “The exception proves the rule” means it tests the rule, by the way, per the old meaning of the word proof. So it is a good test, thank you. I would say that it’s very useful to be attuned to small differences. What I think Clifford did not do (I haven’t actually read the book, so you can tell me) was to set up a complicated data lake with normalization and comparison functions and a whole bunch of carefully orchestrated metrics that pointed out to him that there was a level three anomaly in the billing system between A and B. I think he was looking and he said “these things don’t seem to match up and that bugs me. There’s something here showing me as a human that something is wrong.” Because I predict that there were all kinds of billing systems that were all kinds of wrong because people had recorded the wrong information or gave the wrong address or there were two John H. Smiths, and one of them was John H. Q. Smith and who knows what else. If he had an alert for all of those, he never would have seen the important one. What he did have was access to things that were actually reporting inconsistent data, and he noticed that there was one that was important enough to follow up, and he followed it up and he found something interesting. So what I’m advising is not investing in trying to find every single possible failure, to build a perfect system, because you’ll never get done. Instead, give yourself enough information, including inconsistent information, which is exactly what he had, and use the inconsistencies to tell you something.
Jeffrey: All right. The tradeoff between getting a useful system or a non-useful system is one that resonates with me. I think we’ve talked in the past that often companies spend more time generating metrics than using them. It seems to me that in a sense this “single source of truth” is the ultimate version of that.
Squirrel: “We know exactly how much money we’re losing per customer. We have it really accurate. We know exactly how bad it is. We haven’t done anything about it yet because we wanted to know exactly was it $4 per customer or $40? Now we know it’s $37. Okay. Unfortunately, we don’t have enough money to hire anybody to solve the problem, but we really understand the problem.” I don’t think that’s a useful direction to go.
Jeffrey: All right.
Squirrel: Excellent. Thanks, Jeffrey.
Jeffrey: Thanks, Squirrel.