Wednesday, November 30, 2011

Incorrect bug fixes

Whenever developers fix a bug there is of course a chance of making a mistake – not fixing the bug correctly or completely, or introducing a regression, an unexpected side effect. This is a serious problem in system maintenance. In Geriatric Issues of Aging Software, Capers Jones says:
"Roughly 7 percent of all defect repairs will contain a new defect that was not there before. For very complex and poorly structured applications, these bad fix injections have topped 20 percent."
An interesting new study, "How do Fixes Become Bugs?", looks at mistakes made by developers trying to fix bugs. The study was done on bug fixes made to large operating system code bases. The findings aren't surprising, but they are interesting:
  • Somewhere between 15% and 25% of bug fixes are found to be incorrect in the field. Almost half of these mistakes are serious (they can cause crashes, hangs, data corruption or security problems).
  • Concurrency bugs are the most difficult to fix correctly: 39% of concurrency bug fixes are wrong, and fixes for data race bugs can easily introduce deadlocks or reveal other bugs that were previously hidden. This is not surprising given that the analysis was of operating system code, but it still highlights the risks in trying to fix concurrent code.
  • The risk of making mistakes is magnified if the person making the fix is not familiar with the code. More than 25% of incorrect fixes are made by developers who had never touched that part of the code before.
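To make the data race point concrete, here is a minimal sketch of how a well-meant fix can trade a race for a deadlock. It is not taken from the study: the Account class and the transfer functions are hypothetical names made up for illustration. The first "fix" locks both accounts in argument order, so two transfers running in opposite directions can each hold one lock and wait forever for the other. The second takes the locks in one fixed order, which is a common way to avoid that particular problem.

```python
import threading


class Account:
    """Hypothetical shared object; each account carries its own lock."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()


def transfer_naive(src, dst, amount):
    # A tempting fix for a data race on the balances: lock both accounts.
    # The locks are taken in argument order, so transfer_naive(a, b, ...)
    # and transfer_naive(b, a, ...) running at the same time can deadlock.
    # It is deliberately not called below, because it may hang.
    with src.lock:
        with dst.lock:
            src.balance -= amount
            dst.balance += amount


def transfer_ordered(src, dst, amount):
    # Same update, but locks are always taken in one global order
    # (by account name here). Assumes src and dst are different accounts
    # with different names.
    first, second = sorted((src, dst), key=lambda acct: acct.name)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def run_opposing_transfers(transfer, rounds=500, workers=4):
    """Run transfers in both directions at once and return final balances."""
    a = Account("a", 1000)
    b = Account("b", 1000)

    def repeat(src, dst):
        for _ in range(rounds):
            transfer(src, dst, 1)

    threads = []
    for _ in range(workers):
        threads.append(threading.Thread(target=repeat, args=(a, b)))
        threads.append(threading.Thread(target=repeat, args=(b, a)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance


if __name__ == "__main__":
    # Equal traffic in both directions, so the balances should end unchanged.
    print(run_opposing_transfers(transfer_ordered))
```

Both versions look equally reasonable in a code review that only asks "is the race gone?", which is exactly the kind of narrow focus that lets a bad fix through.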
The main reasons for incorrect bug fixes:
  • Bug fixes are usually done under tight timelines – bug fixers don’t have the chance to think about potential side-effects and the interaction with the rest of the system, and testers don’t have enough time to thoroughly regression test the fix.
  • Bug fixing has a narrow focus – the developer is focused on understanding and fixing the bug, doesn't bother to understand the wider context of the system, and often doesn't even check for other places where the same fix needs to be made. Testers are also narrowly focused on proving that the fix works, and don't look outside of the specific problem.
  • Lack of understanding of the code base – fixes, especially high-risk fixes (like concurrency changes), should be made by whoever understands the code best.

So: don't let people who don't know the code well try to fix high-risk problems like concurrency bugs. But you knew that already, didn't you?
