Monday, 19 October 2015

A Perfect World

I really enjoyed the Fishman paper, “They Write the Right Stuff”, as it was an interesting view of ‘how the other half live’. If we took a tour of the NASA software development center, I expect we would see that they do indeed work to the highest possible standards, and may well follow to the letter some set of standards like the CMMI guidelines which Royce describes in one of this week’s other papers.

A number of years ago I was building a testing team for a company that had previously not really cared about software testing. That company had believed that agile development meant that they did not have to test their code. They thought that they could push rapid fixes to their live environment, and that this negated the need for any formal testing structure. [Note: This may seem almost funny now, but a lot of people and companies really thought that at the time]. In building that team I interviewed a gentleman who worked for a company that wrote the software which tested artificial hearts and lungs. Imagine: my company’s bug will cause us to push out an emergency fix outside office hours. His company’s bug will potentially kill people! It’s a different mindset.

I think that when anyone in “commercial” software development looks at Fishman’s NASA example, they see a very different world. The software at NASA cannot have any errors, plain and simple. They have spent the time planning, and clearly have the people, time and processes in place to ensure that it is perfect. As the article mentions, “It is an exercise in order, detail and methodical reiteration”. I have spent my career in development teams at rapidly growing commercial software companies. Our focus is on getting “this” feature out, at the best quality possible in the time available, and then moving on to the “next” shiny feature. Our resources and timelines are driven by commercial reasons. The resources and timelines at NASA are clearly driven by the need to get it right.

If you go to the next level of information in the article, one line jumped out at me. They speak about how they get to near-zero defect rates: “Packet data and view graphs on a line by line basis”. I read that and I genuinely wonder how ANY complex code can get past that. But I have never worked in a zero-defect environment, and clearly won’t ever! In commercial software development, we fix any critical defects. We have logs, and basically ignore anything below error level; warnings and info notifications don’t even get logged in the defect system. Clearly at NASA, they want the logs empty of anything. I can’t emphasise enough how huge a mindset change this would be for most people.

Finally, there is one last thing to point out. The article quotes someone at NASA as saying, “Don’t fix mistakes, fix what permitted the mistake”. This is one of those things that everyone knows, but virtually no one follows through on and puts into practice. It's like when I lost a lot of weight a few years ago. People are disappointed when I tell them I exercised more, ate smaller portions and made my own food. Folks want a story about a magic pill. They simply don’t want to hear solutions that a 5-year-old could tell you. In this scenario, you have a defect, and any developer could tell you that you don’t just fix the issue, you fix the root cause. But I have done proper root cause analysis only a tiny number of times in a 15-year career around software development [mostly for defects which resulted in LARGE financial loss for the company!]. We all know we should do it, but no one takes the time to do it properly.

2 comments:

  1. Funny, sad in fact, that people think they should release without testing, without rigour. Costly too if you get it wrong, which inevitably happens.

  2. I would think that among the tech companies around the world you would run into attitudes like this. "We don't look back, we only move forward" and the ability to quickly push fixes to live environments make people and teams complacent.
