As programmers, we all need to fix bugs. As experienced programmers, we recognize that sometimes, the ability to fix one bug depends upon first fixing another bug. Managers, on the other hand, don't always get that simple concept.

At the beginning of my career, I worked for Initrode, where I wrote software to run a test station that diagnosed assorted electronic components of jet fighters. Initrode acted as a government supplier of the test station to another government contractor (LUserCorp) that used the station to write the test sequences to diagnose electrical faults. If the test station hardware malfunctioned, or there were bugs in the software that made the electronics tests fail to work properly, then LUserCorp could use that as an excuse for time and cost overruns. If that happened, then the government would penalize Initrode to recoup those costs.

Over time and several releases of the hardware and software, a series of hardware faults and software bugs managed to creep into the system. Since LUserCorp was running behind schedule anyway, they decided that they'd use this as an excuse to hit the government for more time and money. Naturally, the brass at Initrode fought back because they didn't want to take the financial hit.

After lots of political back-and-forth, an official prioritized bug list was created, and it was mandated by LUserCorp that the bugs had to be fixed in the order in which they appeared on the list.

To this end, another junior developer and I were sent to LUserCorp to act as a Tiger Team. Basically, this meant we were in a locked room, alone with the test station. The LUserCorp people were not allowed in the room with us. We brought the source code on our own disk pack, which one of us had to be with at all times. This meant that if we went to lunch, or both had to hit the restroom at the same time, we had to power everything down and take the disk with us.

The list of bugs to be addressed was provided to us only after we were on site. The first and most important bug on the list was in something I had coded: an off-by-one error in a nested loop that only showed up at the end of the third iteration of the outer loop. Since each pass of the inner loop processed and printed the 6-line result of each of 4K tests (to a very slow thermal printer), it took 6 hours to print out 3*4K*6 = 72K lines of test results.
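For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes "4K" means 4096 tests, which is the reading under which the total comes out to exactly 72K; the constant names are made up for illustration.

```python
# Assumption: "4K" is 4 * 1024 = 4096 tests per pass of the inner loop.
TESTS_PER_PASS = 4 * 1024
LINES_PER_TEST = 6   # each test printed a 6-line result
OUTER_PASSES = 3     # the bug showed up at the end of the third outer iteration
PRINT_HOURS = 6      # time the thermal printer needed for the whole run

total_lines = OUTER_PASSES * TESTS_PER_PASS * LINES_PER_TEST
lines_per_second = total_lines / (PRINT_HOURS * 3600)

print(total_lines)                 # 73728, i.e. 72 * 1024
print(round(lines_per_second, 2))  # 3.41
```

In other words, the printer was managing a bit over three lines per second, and every single attempt at reproducing the bug cost the full six hours.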

Our software also had stop-at-test-n functionality. We noticed that bug number 2 on the list was in the stop-at-test-n functionality (it prevented that feature from being used).

In the ideal case where all other functions were working, we'd do:

  • turn off printing
  • set stop-at-test to: 12K-1
  • start running tests (it would finish in a few seconds)
  • turn on printing
  • set stop-at-test to: 12K+2
  • start running tests
  • look at 20 lines of output and see what was wrong
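
The steps above can be sketched with a toy model of the station. Everything here is hypothetical: the `TestStation` class, its fields, and the assumptions that a second run resumes where the first one stopped and that the stop point is exclusive. It only illustrates how little output the ideal workflow would have produced.

```python
class TestStation:
    """Toy stand-in for the test station; all names are hypothetical."""

    LINES_PER_TEST = 6  # each test prints a 6-line result

    def __init__(self):
        self.printing = True
        self.stop_at = 0        # run up to, but not including, this test
        self.next_test = 0      # assumption: runs resume from here
        self.lines_printed = 0

    def run(self):
        # Execute tests from the current position up to stop_at.
        while self.next_test < self.stop_at:
            if self.printing:
                self.lines_printed += self.LINES_PER_TEST
            self.next_test += 1


K = 1024  # assumption: "12K" means 12 * 1024

station = TestStation()
station.printing = False          # turn off printing
station.stop_at = 12 * K - 1      # set stop-at-test to 12K-1
station.run()                     # fast, nothing goes to the printer

station.printing = True           # turn on printing
station.stop_at = 12 * K + 2      # set stop-at-test to 12K+2
station.run()

print(station.lines_printed)      # 18
```

Under those assumptions only three tests get printed, so 18 lines (the "20 lines" above, give or take) would have stood in for the full 72K-line printout.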

Naturally, we called Initrode and asked for permission to fix bug 2 before bug 1. Initrode called LUserCorp, who flat-out said no, even after we explained that it would save oodles of time. NO; the bugs MUST be fixed in the specified order!

So for days, we would make a code change, compile for 5 minutes, launch the test sequence and goof off for 6 hours while waiting for it to print out 1200 pages of garbage before getting to the one test result we needed to see.

Once we figured out and fixed the problem of bug 1, we then spent the rest of our time there fixing the remainder of the bugs on the list, but were under strict orders NOT to divulge that we had fixed anything beyond bug 1.
