Tuesday, September 16, 2014

Can Static Analysis replace Code Reviews?

In my last post, I explained how to do code reviews properly. I recommended taking advantage of static analysis tools like Findbugs, PMD, Klocwork or Fortify to check for common mistakes and bad code before passing the code on to a reviewer, to make the reviewer’s job easier and reviews more effective.

Some readers asked whether static analysis tools can be used instead of manual code reviews. Manual code reviews add delays and costs to development, while static analysis tools keep getting better, faster, and more accurate. So can you automate code reviews, in the same way that many teams automate functional testing? Do you need to do manual reviews too, or can you rely on technology to do the job for you?

Let’s start by understanding what static analysis bug checking tools are good at, and what they aren’t.

What static analysis tools can do – and what they can’t do

In this article, Paul Anderson at GrammaTech does a good job of explaining how static analysis bug finding works, the trade-offs between recall (finding all of the real problems), precision (minimizing false positives) and speed, and the practical limitations of using static analysis tools for finding bugs.

Static analysis tools are very good at catching certain kinds of mistakes, including memory corruption and buffer overflows (for C/C++), memory leaks, illegal and unsafe operations, null pointers, infinite loops, incomplete code, redundant code and dead code.

A static analysis tool knows if you are calling a library incorrectly (as long as it recognizes the function), if you are using the language incorrectly (things that a compiler could find but doesn’t) or inconsistently (indicating that the programmer may have misunderstood something).

And static analysis tools can identify code with maintainability problems, code that doesn't follow good practice or standards, is complex or badly structured and a good candidate for refactoring.

But these tools can’t tell you when you have got the requirements wrong, or when you have forgotten something or missed something important – because the tool doesn't know what the code is supposed to do. A tool can find common off-by-one mistakes and some endless loops, but it won’t catch application logic mistakes like sorting in descending order instead of ascending order, or dividing when you meant to multiply, referring to buyer when it should have been seller, or lessee instead of lessor. These are mistakes that aren't going to be caught in unit testing either, since the same person who wrote the code wrote the tests, and will make the same mistakes.

Tools can’t find missing functions or unimplemented features or checks that should have been made but weren't. They can’t find mistakes or holes in workflows. Or oversights in auditing or logging. Or debugging code left in by accident.

Static analysis tools may be able to find some backdoors or trapdoors – simple ones at least. And they might find some concurrency problems – deadlocks, races and mistakes or inconsistencies in locking. But they will miss a lot of them too.

Static analysis tools like Findbugs can do security checks for you: unsafe calls and operations, use of weak encryption algorithms and weak random numbers, using hard-coded passwords, and at least some cases of XSS, CSRF, and simple SQL injection. More advanced commercial tools that do inter-procedural and data flow analysis (looking at the sources, sinks and paths between) can find other bugs including injection problems that are difficult and time-consuming to trace by hand.

But a tool can’t tell you that you forgot to encrypt an important piece of data, or that you shouldn't be storing some data in the first place. It can’t find logic bugs in critical security features, if sensitive information could be leaked, when you got an access control check wrong, or if the code could fail open instead of closed.

And using one static analysis tool on its own to check code may not be enough. Evaluations of static analysis tools, such as NIST's SAMATE project (a series of comparative studies, where many tools are run against the same code), show almost no overlap between the problems found by different tools (outside of a few common areas like buffer errors) even when the tools are supposed to be doing the same kinds of checks. Which means that to get the most out of static analysis, you will need to run two or more tools against the same code (which is what SonarQube, for example, which integrates its own static analysis results with other tools, including popular free tools, does for you). If you’re paying for commercial tools, this could get very expensive fast.

Tools vs. Manual Reviews

Tools can find cases of bad coding or bad typing – but not bad thinking. These are problems that you will have to find through manual reviews.

A 2005 study Comparing Bug Finding Tools with Reviews and Tests used Open Source bug finding tools (including Findbugs and PMD) on 5 different code bases, comparing what the tools found to what was found through code reviews and functional testing. Static analysis tools found only a small subset of the bugs found in manual reviews, although the tools were more consistent – manual reviewers missed a few cases that the tools picked up.

Just like manual reviews, the tools found more problems with maintainability than real defects (this is partly because one of the tools evaluated – PMD – focuses on code structure and best practices). Testing (black box – including equivalence and boundary testing – and white box functional testing and unit testing) found fewer bugs than reviews. But different bugs. There was no overlap at all between bugs found in testing and the bugs found by the static analysis tools.

Finding problems that could happen - or do happen

Static analysis tools are good at finding problems that “could happen”, but not necessarily problems that “do happen”.

Researchers at Colorado State University ran static analysis tools against several releases of different Open Source projects, and compared what the tools found against the changes and fixes that developers actually made over a period of a few years – to see whether the tools could correctly predict the fixes that needed to be made and what code needed to be refactored.

The tools reported hundreds of problems in the code, but found very few of the serious problems that developers ended up fixing. One simple tool (Jlint) did not find anything that was actually fixed or cleaned up by developers. Of 112 serious bugs that were fixed in one project, only 3 were also found by static analysis tools. In another project, only 4 of 136 bugs that were actually reported and fixed were found by the tools. Many of the bugs that developers did fix were problems like null pointers and incorrect string operations – problems that static analysis tools should be good at catching, but didn’t.

The tools did a much better job of predicting what code should be refactored: developers ended up refactoring and cleaning up more than 70% of the code structure and code clarity issues that the tools reported (PMD, a free code checking tool, was especially good for this).

Ericsson evaluated different commercial static analysis tools against large, well-tested, mature applications. On one C application, a commercial tool found 40 defects – nothing that could cause a crash, but still problems that needed to be fixed. On another large C code base, 1% of the tool’s findings turned out to be bugs serious enough to fix. On the third project, they ran 2 commercial tools against an old version of a C system with known memory leaks. One tool found 32 bugs, another 16: only 3 of the bugs were found by both tools. Surprisingly, neither tool found the already known memory leaks – all of the bugs found were new ones. And on a Java system with known bugs they tried 3 different tools. None of the tools found any of the known bugs, but one of the tools found 19 new bugs that the team agreed to fix.

Ericsson’s experience is that static analysis tools find bugs that are extremely difficult to find otherwise. But it’s rare to find stop-the-world bugs – especially in production code – using static analysis.

This is backed up by another study on the use of static analysis (Findbugs) at Google and on the Sun JDK 1.6.0. Using the tool, engineers found a lot of bugs that were real, but not worth the cost of fixing: deliberate errors, masked errors, infeasible situations, code that was already doomed, errors in test code or logging code, errors in old code that was “going away soon” or other relatively unimportant cases. Only around 10% of medium and high priority correctness errors found by the tool were real bugs that absolutely needed to be fixed.

The Case for Security

So far we've mostly looked at static analysis checking for run-time correctness and general code quality, not security.

Although security builds on code quality – vulnerabilities are just bugs that hackers look for and exploit – checking code for correctness and clarity isn’t enough for a secure app. A lot of investment in static analysis technology over the past 5-10 years has been in finding security problems in code, such as common problems listed in OWASP’s Top 10 or the SANS/CWE Top 25 Most Dangerous Software Errors.

A couple of studies have looked at the effectiveness of static analysis tools compared to manual reviews in finding security vulnerabilities. The first study was on a large application that had 15 known security vulnerabilities found through a structured manual assessment done by security experts. Two different commercial static analysis tools were run across the code. The tools together found less than half of the known security bugs – only the simplest ones, the bugs that didn't require a deep understanding of the code or the design.

And of course the tools reported thousands of other issues that needed to be reviewed and qualified or thrown away as false positives. These other issues including some run-time correctness problems, null pointers and resource leaks, and code quality findings (dead code, unused variables), but no other real security vulnerabilities beyond those already found by the manual security review.

But this assumes that you have a security expert around to review the code. To find security vulnerabilities, a reviewer needs to understand the code (the language and the frameworks), and they also need to understand what kind of security problems to look for.

Another study shows how difficult this is. Thirty developers were hired to do independent security code reviews of a small web app (some security experts, others web developers). They were not allowed to use static analysis tools. The app had 6 known vulnerabilities. 20% of the reviewers did not find any of the known bugs. None of the reviewers found all of the known bugs, although several found a new XSS vulnerability that the researchers hadn’t known about. On average, 10 reviewers would have had only an 80% chance of finding all of the security bugs.

And, not Or

Static analysis tools are especially useful for developers working in unsafe languages like C/C++ (where there is a wide choice of tools to find common mistakes) or dynamically typed scripting languages like Javascript or PHP (where unfortunately the tools aren't that good), and for teams starting off learning a new language and framework. Using static analysis is (or should be) a requirement in highly regulated, safety critical environments like medical devices and avionics. And until more developers get more training and understand more about how to write secure software, we will all need to lean on static analysis (and dynamic analysis) security testing tools to catch vulnerabilities.

But static analysis isn't a substitute for code reviews. Yes, code reviews take extra time and add costs to development, even if you are smart about how you do them – and being smart includes running static analysis checks before you do reviews. If you want to move fast and write good, high-quality and secure code, you still have to do reviews.You can’t rely on static analysis alone.

Wednesday, August 20, 2014

Don’t waste time on Code Reviews

Less than half of development teams do code reviews and the other half are probably not getting as much out of code reviews as they should.

Here’s how to not waste time on code reviews.

Keep it Simple

Many people still think of code reviews as expensive formal code inspection meetings, with lots of prep work required before a room full of reviewers can slowly walk through the code together around a table with the help of a moderator and a secretary. Lots of hassles and delays and paperwork.

But you don’t have to do code reviews this way – and you shouldn’t.

There are several recent studies which prove that setting up and holding formal code review meetings add to development delays and costs without adding value. While it can take weeks to schedule a code review meeting, only 4% of defects are found in the meeting itself – the rest are all found by reviewers looking through code on their own.

At shops like Microsoft and Google, developers don’t attend formal code review meetings. Instead, they take advantage of collaborative code review platforms like Gerrit, CodeFlow, Collaborator, or ReviewBoard or Crucible, or use e-mail to request reviews asynchronously and to exchange information with reviewers.

These light weight reviews (done properly) are just as effective at finding problems in code as inspections, but much less expensive and much easier to schedule and manage. Which means they are done more often.

And these reviews fit much better with iterative, incremental development, providing developers with faster feedback (within a few hours or at most a couple of days, instead of weeks for formal inspections).

Keep the number of reviewers small

Some people believe that if two heads are better than one, then three heads are even better, and four heads even more better and so on…

So why not invite everyone on the team into a code review?

Answer: because it is a tragic waste of time and money.

As with any practice, you will quickly reach a point of diminishing returns as you try to get more people to look at the same code.

On average, one reviewer will find roughly half of the defects on their own. In fact, in a study at Cisco, developers who double-checked their own work found half of the defects without the help of a reviewer at all!

A second reviewer will find ½ as many new problems as the first reviewer. Beyond this point, you are wasting time and money. One study showed no difference in the number of problems found by teams of 3, 4 or 5 individuals, while another showed that 2 reviewers actually did a better job than 4.

This is partly because of overlap and redundancy – more reviewers means more people looking for and finding the same problems (and more people coming up with false positive findings that the author has to sift through). And as Geoff Crain at Atlassian explains, there is a “social loafing” problem: complacency and a false sense of security set in as you add more reviewers. Because each reviewer knows that somebody else is looking at the same code, they are under less pressure to find problems.

This is why at shops like Google and Microsoft where reviews are done successfully, the median number of reviewers is 2 (although there are times when an author may ask for more input, especially when the reviewers don’t agree with each other).

But what’s even more important than getting the right number of reviewers is getting the right people to review your code.

Code Reviews shouldn’t be done by n00bs – but they should be done for n00bs

By reviewing other people’s code a developer will get exposed to more of the code base, and learn some new ideas and tricks. But you can’t rely on new team members to learn how the system works or to really understand the coding conventions and architecture just by reviewing other developers’ code. Asking a new team member to review other people’s code is a lousy way to train people, and a lousy way to do code reviews.

Research backs up what should be obvious: the effectiveness of code reviews depend heavily on the reviewer’s skill and familiarity with the problem domain and with the code. Like other areas in software development, the differences in revew effectiveness can be huge, as much as 10x between best and worst performers. A study on code reviews at Microsoft found that reviewers from outside of the team or who were new to the team and didn’t know the code or the problem area could only do a superficial job of finding formatting issues or simple logic bugs.

This means that your best developers, team leads and technical architects will spend a lot of time reviewing code – and they should. You need reviewers who are good at reading code and good at debugging, and who know the language, framework and problem area well. They will do a much better job of finding problems, and can provide much more valuable feedback, including suggestions on how to solve the problem in a simpler or more efficient way, or how to make better use of the language and frameworks. And they can do all of this much faster.

If you want new developers to learn about the code and coding conventions and architecture, it will be much more effective to pair new developers up with an experienced team member in pair programming or pair debugging.

If you want new, inexperienced developers to do reviews (or if you have no choice), lower your expectations. Get them to review straightforward code changes (which don’t require in depth reviews), or recognize that you will need to depend a lot more on static analysis tools and another reviewer to find real problems.

Substance over Style

Reviewing code against coding standards is a sad way for a developer to spend their valuable time. Fight the religious style wars early, get everyone to use the same coding style templates in their IDEs and use a tool like Checkstyle to ensure that code is formatted consistently. Free up reviewers to focus on the things that matter: helping developers write better code, code that works correctly and that is easy to maintain.

“I’ve seen quite a few code reviews where someone commented on formatting while missing the fact that there were security issues or data model issues.”
Senior developer at Microsoft, from a study on code review practices

Correctness – make sure that the code works, look for bugs that might be hard to find in testing:

  • Functional correctness: does the code do what it is supposed to do – the reviewer needs to know the problem area, requirements and usually something about this part of the code to be effective at finding functional correctness issues
  • Coding errors: low-level coding mistakes like using <= instead of <, off-by-one errors, using the wrong variable (like mixing up lessee and lessor), copy and paste errors, leaving debugging code in by accident
  • Design mistakes: errors of omission, incorrect assumptions, messing up architectural and design patterns like MVC, abuse of trust
  • Safety and defensiveness: data validation, threading and concurrency (time of check/time of use mistakes, deadlocks and race conditions), error handling and exception handling and other corner cases
  • Malicious code: back doors or trap doors, time bombs or logic bombs
  • Security: properly enforcing security and privacy controls (authentication, access control, auditing, encryption)

Maintainability:

  • Clarity: class and method and variable naming, comments, …
  • Consistency: using common routines or language/library features instead of rolling your own, following established conventions and patterns
  • Organization: poor structure, duplicate or unused/dead code
  • Approach: areas where the reviewer can see a simpler or cleaner or more efficient implementation

Where should reviewers spend most of their time?

Research shows that reviewers find far more maintainability issues than defects (a ratio of 75:25) and spend more time on code clarity and understandability problems than correctness issues. There are a few reasons for this.

Finding bugs in code is hard. Finding bugs in someone else’s code is even harder.

In many cases, reviewers don’t know enough to find material bugs or offer meaningful insight on how to solve problems. Or they don’t have time to do a good job. So they cherry pick easy code clarity issues like poor naming or formatting inconsistencies.

But even experienced and serious reviewers can get caught up in what at first seem to be minor issues about naming or formatting, because they need to understand the code before they can find bugs, and code that is unnecessarily hard to read gets in the way and distracts them from more important issues.

This is why programmers at Microsoft will sometimes ask for 2 different reviews: a superficial “code cleanup” review from one reviewer that looks at standards and code clarity and editing issues, followed by a more in depth review to check correctness after the code has been tidied up.

Use static analysis to make reviews more efficient

Take advantage of static analysis tools upfront to make reviews more efficient. There’s no excuse not to at least use free tools like Findbugs and PMD for Java to catch common coding bugs and inconsistencies, and sloppy or messy code and dead code before submitting the code to someone else for review.

This frees the reviewer up from having to look for micro-problems and bad practices, so they can look for higher-level mistakes instead. But remember that static analysis is only a tool to help with code reviews – not a substitute. Static analysis tools can’t find functional correctness problems or design inconsistencies or errors of omission, or help you to find a better or simpler way to solve a problem.

Where’s the risk?

We try to review all code changes. But you can get most of the benefits of code reviews by following the 80:20 rule: focus reviews on high risk code, and high risk changes.

High risk code:

  • Network-facing APIs
  • Plumbing (framework code, security libraries….)
  • Critical business logic and workflows
  • Command and control and root admin functions
  • Safety-critical or performance-critical (especially real-time) sections
  • Code that handles private or sensitive data
  • Old code, code that is complex, code that has been worked on by a lot of different people, code that has had a lot of bugs in the past – error prone code
High risk changes:
  • Code written by a developer who has just joined the team
  • Big changes
  • Large-scale refactoring (redesign disguised as refactoring)

Get the most out of code reviews

Code reviews add to the cost of development, and if you don’t do them right they can destroy productivity and alienate the team. But they are also an important way to find bugs and for developers to help each other to write better code. So do them right.

Don’t waste time on meetings and moderators and paper work. Do reviews early and often. Keep the feedback loops as tight as possible.

Ask everyone to take reviews seriously – developers and reviewers. No rubber stamping, or letting each other off of the hook.

Make reviews simple, but not sloppy. Ask the reviewers to focus on what really matters: correctness issues, and things that make the code harder to understand and harder to maintain. Don’t waste time arguing about formatting or style.

Make sure that you always review high risk code and high risk changes.

Get the best people available to do the job – when it comes to reviewers, quality is much more important than quantity. Remember that code reviews are only one part of a quality program. Instead of asking more people to review code, you will get more value by putting time into design reviews or writing better testing tools or better tests. A code review is a terrible thing to waste.

Wednesday, August 6, 2014

Feature Toggles are one of the worst kinds of Technical Debt

Feature flags or config flags aka feature toggles aka flippers are an important part of Devops practices like dark launching (releasing features immediately and incrementally), A/B testing, and branching in code or branching by abstraction (so that development teams can all work together directly on the code mainline instead of creating separate feature branches).

Feature toggles can be simple Boolean switches or complex decision trees with multiple different paths. Martin Fowler differentiates between release toggles (which are used by development and ops to temporarily hide incomplete or risky features from all or part of the user base) and business toggles to control what features are available to different users (which may have a longer – even permanent – life). He suggests that these different kinds of flags should be managed separately, in different configuration files for example. But the basic idea is the same, to build conditional branches into mainline code in order to make logic available only to some users or to skip or hide logic at run-time, including code that isn’t complete (the case for branching by abstraction).

Using run-time flags like this isn't a new idea, certainly not invented at Flickr or Facebook. Using flags and conditional statements to offer different experiences to different users or to turn on code incrementally is something that many people have been practicing for a long time. And doing this in mainline code to avoid branching is in many ways a step back to the way that people built software 20+ years ago when we didn’t have reliable and easy to use code management systems.

Advantages and Problems of Feature Flags

Still, there are advantages to developers working this way, making merge problems go away, and eliminating the costs of maintaining and supporting long-lived branches. And carefully using feature flags can help you to reduce deployment risk through canary releases or other incremental release strategies, where you make the new code active for only some users or customers, or only on some systems, and closely check before releasing progressively to the rest of the user base – and turn off the new code if you run into problems. All of this makes it easier to get new code out faster for testing and feedback.

But using feature flags creates new problems of its own.

The plumbing and scaffolding logic to support branching in code becomes a nasty form of technical debt, from the moment each feature switch is introduced. Feature flags make the code more fragile and brittle, harder to test, harder to understand and maintain, harder to support, and less secure.

Feature Flags need to be Short Lived

Abhishek Tiwari does a good job of explaining feature toggles and how they should be used. He makes it clear that they should only be a temporary deployment/release management tool, and describes a disciplined lifecycle that all feature toggles need to follow, from when they are created by development, then turned on by operations, updated if any problems or feedback come up, and finally retired and removed when no longer needed.
Feature toggles require a robust engineering process, solid technical design and a mature toggle life-cycle management. Without these 3 key considerations, use of feature toggles can be counter-productive. Remember the main purpose of toggles is to perform release with minimum risk, once release is complete toggles need to be removed.

Feature Flags are Technical Debt – as soon as you add them

Like other sources of technical debt, feature flags are cheap and easy to add in the short term. But the longer that they are left in the code, the more that they will end up costing you.

Release toggles are supposed to make it easier and safer to push code out. You can push code out only to a limited number of users to start, reducing the impact of problems, or dark launch features incrementally, carefully assessing added performance costs as you turn on some of the logic behind the scenes, or run functions in parallel. And you can roll-back quickly by turning off features or optional behaviour if something goes wrong or if the system comes under too much load.

But as you add options, it can get harder to support and debug the system, keeping track of which flags are in which state in production and test can make it harder to understand and duplicate problems.

And there are dangers in releasing code that is not completely implemented, especially if you are following branching by abstraction and checking in work-in-progress code protected by a feature flag. If the scaffolding code isn't implemented correctly you could accidentally expose some of this code at run-time with unpredictable results.

…visible or not, you are still deploying code into production that you know for a fact to be buggy, untested, incomplete and quite possibly incompatible with your live data. Your if statements and configuration settings are themselves code which is subject to bugs – and furthermore can only be tested in production. They are also a lot of effort to maintain, making it all too easy to fat-finger something. Accidental exposure is a massive risk that could all too easily result in security vulnerabilities, data corruption or loss of trade secrets. Your features may not be as isolated from each other as you thought you were, and you may end up deploying bugs to your production environment”
James McKay

The support dangers of using – or misusing – feature flags was illustrated by a recent high-profile business failure at a major financial institution. The team used feature flags to contain operational risk when they introduced a new application feature. Unfortunately, they re-purposed a flag which was used by old code (code left in the system even though it hadn't been used in years).

Due to some operational mistakes in deployment, not all of the servers were successfully updated with the new code, and when the flag was turned on, old code and new code started to run on different computers at the same time doing completely different things with wildly inconsistent and, ultimately business-ending results. By the time that the team figured out what was going wrong, the company had lost millions of $.

As more flags get added, testing of the application becomes harder and more expensive, and can lead to an explosion of combinations: If a is on and b is off and c is on and d is off then… what is supposed to happen? Fowler says that you only need to test the combinations which should reasonably be expected to happen in production, but this demands that everyone involved clearly understand what options could and should be used together – as more flags get added, this gets harder to understand and verify.

And other testing needs to be done to make sure that switches can be turned on and off safely at run-time, and that features are completely and safely encapsulated by the flag settings and that behaviour doesn’t leak out by accident (especially if you are branching in code and releasing work-in-progress code). You also need to test to make sure that the structural changes to introduce the feature toggle do not introduce any regressions, all adding to testing costs and risks.

More feature flags also make it harder to understand how and where to make fixes or changes, especially when you are dealing with long-lived flags and nested options.

And using feature switches can make the system less secure, especially if you are hiding access to features in the UI. Adding a feature can make the attack surface of the application bigger, and hiding features at the UI level (for dark launching) won’t hide these features from bad guys.

Use Feature Flags with Caution

Feature flags are a convenient and flexible way to manage code, and can help you to get changes and fixes out to production more quickly. But if you are going to use flags, do so responsibly:

  • Minimize your use of feature flags for release management, and make the implementation as simple as possible. Martin Fowler explains that it is important to minimize conditional logic to the UI and to entry points in the system. He also emphasises that:
    Release toggles are a useful technique and lots of teams use them. However they should be your last choice when you're dealing with putting features into production.

    Your first choice should be to break the feature down so you can safely introduce parts of the feature into the product. The advantages of doing this are the same ones as any strategy based on small, frequent releases. You reduce the risk of things going wrong and you get valuable feedback on how users actually use the feature that will improve the enhancements you make later.
  • Review flags often, make sure that you know which flags are on and which are supposed to be on and when features are going to be removed. Create dashboards (so that everyone can easily see the configuration) and health checks – run-time assertions – to make sure that important flags are on or off as appropriate.
  • Once a feature is part of mainline, be ruthless about getting it out of the code base as soon as it isn't used or needed any more. This means carefully cleaning up the feature flags and all of the code involved, and testing again to make sure that you didn't break anything when you did this. Don’t leave code in the mainline just in case you might need it again some day. You can always go back and retrieve it from version control if you need to.
  • Recognize and account for the costs of using feature flags, especially long-lived business logic branching in code.

Feature toggles start off simple and easy. They provide you with new options to get changes out faster, and can help reduce the risk of deployment in the short term. But the costs and risks of relying on them too much can add up, especially over the longer term.

Tuesday, July 29, 2014

Devops isn't killing developers – but it is killing development and developer productivity

Devops isn't killing developers – at least not any developers that I know.

But Devops is killing development, or the way that most of us think of how we are supposed to build and deliver software. Agile loaded the gun. Devops is pulling the trigger.

Flow instead of Delivery

A sea change is happening in the way that software is developed and delivered. Large-scale waterfall software development projects gave way to phased delivery and Spiral approaches, and then to smaller teams delivering working code in time boxes using Scrum or other iterative Agile methods. Now people are moving on from Scrum to Kanban, and to One-Piece Continuous Flow with immediate and Continuous Deployment of code to production in Devops.

The scale and focus of development continues to shrink, and so does the time frame for making decisions and getting work done. Phases and milestones and project reviews to sprints and sprint reviews to Lean controls over WIP limits and task-level optimization. The size of deliverables: from what a project team could deliver in a year to what a Scrum team could get done in a month or a week to what an individual developer can get working in production in a couple of days or a couple of hours.

The definition of “Done” and “Working Software” changes from something that is coded and tested and ready to demo to something that is working in production – now (“Done Means Released”).

Continuous Delivery and Continuous Deployment replace Continuous Integration. Rapid deployment to production doesn't leave time for manual testing or for manual testers, which means developers are responsible for catching all of the bugs themselves before code gets to production – or do their testing in production and try to catch problems as they happen (aka “Monitoring as Testing").

Because Devops brings developers much closer to production, operational risks become more important than project risks, and operational metrics become more important than project metrics. System uptime and cycle time to production replace Earned Value or velocity. The stress of hitting deadlines is replaced by the stress of firefighting in production and being on call.

Devops isn't about delivering a project or even delivering features. It’s about minimizing lead time and maximizing flow of work to production, recognizing and eliminating junk work and delays and hand offs, improving system reliability and cutting operational costs, building in feedback loops from production to development, standardizing and automating steps as much as possible. It’s more manufacturing and process control than engineering.

Devops kills Developer Productivity too

Devops also kills developer productivity.

Whether you try to measure developer productivity by LOC or Function Points or Feature Points or Story Points or velocity or some other measure of how much code is written, less coding gets done because developers are spending more time on ops work and dealing with interruptions, and less time writing code.

Time learning about the infrastructure and the platform and understanding how it is setup and making sure that it is setup right. Building Continuous Delivery and Continuous Deployment pipelines and keeping them running. Helping ops to investigate and resolve issues, responding to urgent customer requests and questions, looking into performance problems, monitoring the system to make sure that it is working correctly, helping to run A/B experiments, pushing changes and fixes out… all take time away from development and pre-empt thinking about requirements and designing and coding and testing (the work that developers are trained to do and are good at).

The Impact of Interruptions and Multi-Tasking

You can’t protect developers from interruptions and changes in priorities in Devops, even if you use Kanban with strict WIP limits, even in a tightly run shop – and you don’t want to. Developers need to be responsive to operations and customers, react to feedback from production, jump on problems and help detect and resolve failures as quickly as possible. This means everyone, especially your most talented developers, need to be available for ops most if not all of the time.

Developers join ops on call after hours, which means carrying a pager (or being chased by Pager Duty) after the day’s work is done. And time wasted on support calls for problems that end up not being real problems, and long nights and weekends on fire fighting and tracking down production issues and helping to recover from failures, coming in tired the next day to spend more time on incident dry runs and testing failover and roll-forward and roll-back recovery and participating in post mortems and root cause analysis sessions when something goes wrong and the failover or roll-forward or roll-back doesn’t work.

You can’t plan for interruptions and operational problems, and you can’t plan around them. Which means developers will miss their commitments more often. Then why make commitments at all? Why bother planning or estimating? Use just-in-time prioritization instead to focus in on the most important thing that ops or the customer need at the moment, and deliver it as soon as you can – unless something more important comes up and pre-empts it.

As developers take on more ops and support responsibilities, multi-tasking and task switching – and the interruptions and inefficiency that come with it – increase, fracturing time and destroying concentration. This has an immediate drag on productivity, and a longer term impact on people’s ability to think and to solve problems.

Even the Continuous Deployment feedback loop itself is an interruption to a developer’s flow.

After a developer checks in code, running unit tests in Continuous Integration is supposed to be fast, a few seconds or minutes, so that they can keep moving forward with their work. But to deploy immediately to production means running through a more extensive set of integration tests and systems tests and other checks in Continuous Delivery (more tests and more checks takes more time), then executing the steps through to deployment, and then monitoring production to make sure that everything worked correctly, and jumping in if anything goes wrong. Even if most of the steps are automated and optimized, all of this takes extra time and the developer’s attention away from working on code.

Optimizing the flow of work in and out of operations means sacrificing developer flow, and slowing down development work itself.

Expectations and Metrics and Incentives have to Change

In Devops, the way that developers (and ops) work change, and the way that they need to be managed changes. It’s also critical to change expectations and metrics and incentives for developers.

Devops success is measured by operational IT metrics, not on meeting project delivery goals of scope, schedule and cost, not on meeting release goals or sprint commitments, or even meeting product design goals.

  • How fast can the team respond to important changes and problems: Change Lead Time and Cycle Time to production instead of delivery milestones or velocity
  • How often do they push changes to production (which is still the metric that most people are most excited about – how many times per day or per hour or minute Etsy or Netflix or Amazon deploy changes)
  • How often do they make mistakes - Change / Failure ratio
  • System reliability and uptime – MTBF and especially MTTD and MTTR
  • Cost of change – and overall Operations and Support costs

Devops is more about Ops than Dev

As more software is delivered earlier and more often to production, development turns into maintenance. Project management is replaced by incident management and task management. Planning horizons get much shorter – or planning is replaced by just-in-time queue prioritization and triage.

With Infrastructure as Code Ops become developers, designing and coding infrastructure and infrastructure changes, thinking about reuse and readability and duplication and refactoring, technical debt and testability and building on TDD to implement TDI (Test Driven Infrastructure). They become more agile and more Agile, making smaller changes more often, more time programming and less on paper work.

And developers start to work more like ops. Taking on responsibilities for operations and support, putting operational risks first, caring about the infrastructure, building operations tools, finding ways to balance immediate short-term demands for operational support with longer-term design goals.

None of this will be a surprise to anyone who has been working in an online business for a while. Once you deliver a system and customers start using it, priorities change, everything about the way that you work and plan has to change too.

This way of working isn't better for developers, or worse necessarily. But it is fundamentally different from how many developers think and work today. More frenetic and interrupt-driven. At the same time, more disciplined and more Lean. More transparent. More responsibility and accountability. Less about development and more about release and deployment and operations and support.

Developers – and their managers – will need to get used to being part of the bigger picture of running IT, which is about much more than designing apps and writing and delivering code. This might be the future of software development. But not all developers will like it, or be good at it.

Thursday, July 17, 2014

Trust instead of Threats

According to Dr. Gary McGraw’s ground breaking work on software security, up to half of security mistakes are made in design rather than in coding. So it’s critical to prevent – or at least try to find and fix – security problems in design.

For the last 10 years we’ve been told that we are supposed to do this through threat modeling aka architectural risk analysis – a structured review of the design or architecture of a system from a threat perspective to identify security weaknesses and come up with ways to resolve them.

But outside of a few organizations like Microsoft threat modeling isn’t being done at all, or at best only on an inconsistent basis.

Cigital’s work on the Build Security In Maturity Model (BSIMM), which looks in detail at application security programs in different organizations, has found that threat modeling doesn't scale. Threat modeling is still too heavyweight, too expensive, too waterfally, and requires special knowledge and skills.

The SANS Institute’s latest survey on application security practices and tools asked organizations to rank the application security tools and practices they used the most and found most effective. Threat modeling was second last.

And at the 2014 RSA Conference, Jim Routh at Aetna, who has implemented large-scale secure development programs in 4 different major organizations, admitted that he has not yet succeeded in injecting threat modeling into design anywhere “because designers don’t understand how to make the necessary tradeoff decisions”.

Most developers don’t know what threat modeling is, or how do to it, never mind practice it on a regular basis. With the push to accelerate software delivery, from Agile to One-Piece Continuous Flow and Continuous Deployment to production in Devops, the opportunities to inject threat modeling into software development are disappearing.

What else can we do to include security in application design?

If threat modeling isn’t working, what else can we try?

There are much better ways to deal with security than threat modelling... like not being a tool.
JeffCurless, comment on a blog post about threat modeling

Security people think in terms of threats and risks – at least the good ones do. They are good at exploring negative scenarios and what-ifs, discovering and assessing risks.

Developers don’t think this way. For most of them, walking through possibilities, things that will probably never happen, is a waste of time. They have problems that need to be solved, requirements to understand, features to deliver. They think like engineers, and sometimes they can think like customers, but not like hackers or attackers.

In his new book on Threat Modeling Adam Shostack says that telling developers to “think like an attacker” is like telling someone to think like a professional chef. Most people know something about cooking, but cooking at home and being a professional chef are very different things. The only way to know what it’s like to be a chef and to think like a chef is to work for some time as a chef. Talking to a chef or reading a book about being a chef or sitting in meetings with a chef won’t cut it.

Developrs aren’t good at thinking like attackers, but they constantly make assertions in design, including important assertions about dependencies and trust. This is where security should be injected into design.

Trust instead of Threats

Threats don’t seem real when you are designing a system, and they are hard to quantify, even if you are an expert. But trust assertions and dependencies are real and clear and concrete. Easy to see, easy to understand, easy to verify. You can read the code, or write some tests, or add a run-time check.

Reviewing a design this way starts off the same as a threat modeling exercise, but it is much simpler and less expensive. Look at the design at a system or subsystem-level. Draw trust boundaries between systems or subsystems or layers in the architecture, to see what’s inside and what’s outside of your code, your network, your datacenter:

Trust boundaries are like software firewalls in the system. Data inside a trust boundary is assumed to be valid, commands inside the trust boundary are assumed to have been authorized, users are assumed to be authenticated. Make sure that these assumptions are valid. And make sure to review dependencies on outside code. A lot of security vulnerabilities occur at the boundaries with other systems, or with outside libraries because of misunderstandings or assumptions in contracts.
OWASP Application Threat Modeling

Then, instead of walking through STRIDE or CAPEC or attack trees or some other way of enumerating threats and risks, ask some simple questions about trust:

Are the trust boundaries actually where you think they are, or think they should be?

Can you trust the system or subsystem or service on the other side of the boundary? How can you be sure? Do you know how it works, what controls and limits it enforces? Have you reviewed the code? Is there a well-defined API contract or protocol? Do you have tests that validate the interface semantics and syntax?

What data is being passed to your code? Can you trust this data – has it been validated and safely encoded, or do you need to take care of this in your code? Could the data have been tampered with or altered by someone else or some other system along the way?

Can you trust the code on the other side to protect the integrity and confidentiality of data that you pass to it? How can you be sure? Should you enforce this through a hash or an HMAC or a digital signature or by encrypting the data?

Can you trust the user’s identity? Have they been properly authenticated? Is the session protected?

What happens if an exception or error occurs, or if a remote call hangs or times out – could you lose data or data integrity, or leak data, does the code fail open or fail closed?

Are you relying on protections in the run-time infrastructure or application framework or language to enforce any of your assertions? Are you sure that you are using these functions correctly?

These are all simple, easy-to-answer questions about fundamental security controls: authentication, access control, auditing, encryption and hashing, and especially input data validation and input trust, which Michael Howard at Microsoft has found to be the cause of half of all security bugs.

Secure Design that can actually be done

Looking at dependencies and trust will find – and prevent – important problems in application design.

Developers don’t need to learn security jargon, try to come up with attacker personas or build catalogs of known attacks and risk weighting matrices, or figure out how to use threat modeling tools or know what a cyber kill chain is or understand the relative advantages of asset-centric threat modeling over attacker-centric modeling or software-centric modeling.

They don’t need to build separate models or hold separate formal review meetings. Just look at the existing design, and ask some questions about trust and dependencies. This can be done by developers and architects in-phase as they are working out the design or changes to the design – when it is easiest and cheapest to fix mistakes and oversights.

And like threat modeling, questioning trust doesn’t need to be done all of the time. It’s important when you are in the early stages of defining the architecture or when making a major design change, especially a change that makes the application’s attack surface much bigger (like introducing a new API or transitioning part of the system to the Cloud). Any time that you are doing a “first of”, including working on a part of the system for the first time. The rest of the time, the risks of getting trust assumptions wrong should be much lower.

Just focusing on trust won’t be enough if you are building a proprietary secure protocol. And it won’t be enough for high-risk security features – although you should be trying to leverage the security capabilities of your application framework or a special-purpose security library to do this anyways. There are still cases where threat modeling should be done – and code reviews and pen testing too. But for most application design, making sure that you aren’t misplacing trust should be enough to catch important security problems before it is too late.

Wednesday, July 9, 2014

10 things you can do to as a developer to make your app secure: #10 Design Security In

There’s more to secure design and architecture besides properly implementing Authentication, Access Control and Logging strategies, and choosing (and properly using) a good framework.

You need to consider and deal with security threats and risks at many different points in your design.

Adam Shostack’s new book on Threat Modeling explores how to do this in detail, with lots of exercises and examples on how to look for and plug security holes in software design, and how to think about design risks.

But some important basic ideas in secure design will take you far:

Know your Tools

When deciding on the language(s) and technology stack for the system, make sure that you understand the security constraints and risks that your choices will dictate. If you’re using a new language, take time to learn about how to write code properly and safely in that language. If you’re programming in Java, C or C++ or Perl check out CERT’s secure coding guidelines for those languages. If you're writing code on iOS, read Apple's Secure Coding Guide. For .NET, review OWASP's .NET Security project.

Look for static analysis tools like Findbugs and PMD for Java, JSHint for Javascript, OCLint for C/C++ and Objective-C, Brakeman for Ruby, RIPS for PHP, Microsoft's static analysis tools for .NET, or commercial tools that will help catch common security bugs and logic bugs in coding or Continuous Integration.

And make sure that you (or ops) understand how to lock down or harden the O/S and to safely configure your container and database (or NoSQL data) manager.

Tiering and Trust

Tiering or layering, and trust in design are closely tied together. You must understand and verify trust assumptions at the boundaries of each layer in the architecture and between systems and between components in design, in order to decide what security controls need to be enforced at these boundaries: authentication, access control, data validation and encoding, encryption, logging.

Understand when data or control crosses a trust boundary: to/from code that is outside of your direct control. This could be an outside system, or a browser or mobile client or other type of client, or another layer of the architecture or another component or service.

Thinking about trust is much simpler and more concrete than thinking about threats. And easier to test and verify. Just ask some simple questions:

Where is the data coming from? How can you be sure? Can you trust this data – has it been validated and safely encoded? Can you trust the code on the other side to protect the integrity and confidentiality of data that you pass to it? Do you know what happens if an exception or error occurs – could you lose data or data integrity, or leak data, does the code fail open or fail closed?

Before you make changes to the design, make sure that you understand these assumptions and make sure that the assumptions are correct.

The Application Attack Surface

Finally, it’s important to understand and manage the system’s Attack Surface: all of the ways that attackers can get in, or get data out, of the system, all of the resources that are exposed to attackers. APIs, files, sockets, forms, fields, URLs, parameters, cookies. And the security plumbing that protects these parts of the system.

Your goal should be to try to keep the Attack Surface as small as possible. But this is much easier said than done: each new feature and new integration point expands the Attack Surface. Try to understand the risks that you are introducing, and how serious they are. Are you creating a brand new network-facing API or designing a new business workflow that deals with money or confidential data, or changing your access control model, or swapping out an important part of your platform architecture? Or are you just adding yet another CRUD admin form, or just one more field to an existing form or file. In each case you are changing the Attack Surface, but the risks will be much different, and so will the way that you need to manage these risks.

For small, well-understood changes the risks are usually negligible – just keep coding. If the risks are high enough you’ll need to do some abuse case analysis or threat modeling, or make time for a code review or pen testing.

And of course, once a feature or option or interface is no longer needed, remove it and delete the code. This will reduce the system’s Attack Surface, as well as simplifying your maintenance and testing work.

That’s it. We’re done.

The 10 things you can do as a developer to make your app secure: from thinking about security in architectural layering and technology choices, including security in requirements, taking advantage of other people’s code by using frameworks and libraries carefully, making sure that you implement basic security controls and features like Authentication and Access Control properly, protecting data privacy, logging with security in mind, and dealing with input data and stopping injection attacks, especially SQL injection.

This isn’t an exhaustive list. But understanding and dealing with these important issues in application security – including security when you think about requirements and design and coding and testing, knowing more about your tools and using them properly – is work that all developers can do, and will take you a long, long way towards making your system secure and reliable.

Monday, July 7, 2014

10 things you can do as a developer to make your app secure: #9 Start with Requirements

To build a secure system, you should start thinking about security from the beginning.

Legal and Compliance Constraints

First, make sure that everyone on the team understands the legal and compliance requirements and constraints for the system.

Regulations will drive many of the security controls in your system, including authentication, access control, data confidentiality and integrity (and encryption), and auditing, as well as system availability and reliability.

Agile teams in particular should not depend only on their Product Owner to understand and communicate these requirements. Compliance restrictions can impose important design constraints which may not be clear from a business perspective, as well as assurance requirements that dictate how you need to build and test and deliver the system, and what evidence you need to show that you have done a responsible job.

As developers you should try to understand what all of this means to you as early as possible. As a place to start, Microsoft has a useful and simple guide (Regulatory Compliance Demystified: An Introduction to Compliance for Developers) that explains common business regulations including SOX, HIPAA, and PCI-DSS and what they mean to developers.

Tracking Confidential Data

The fundamental concern in most regulatory frameworks is controlling and protecting data.

Make sure that everyone understands what data is private/confidential/sensitive and therefore needs to be protected. Identify and track this data throughout the system. Who owns the data? What is the chain of custody? Where is the data coming from? Can the source be trusted? Where is the data going to? Can you trust the destination to protect the data? Where is the data stored or displayed? Does it have to be stored or displayed? Who is authorized to create it, see it, change it, and do these actions need to be tracked and reviewed?

The answers to these questions will drive requirements for data validation, data integrity, access control, encryption, and auditing and logging controls in the system.

Application Security Controls

Think through the basic functional application security controls: Authentication, Access Control, Auditing – all of which we’ve covered earlier in this series of posts. Where do these controls need to be added? What security stories need to be written? How will these controls be tested?

Business Logic Abuse Can be Abused

Security also needs to be considered in business logic, especially multi-step application workflows dealing with money or other valuable items, or that handle private or sensitive information, or command and control functions. Features like online shopping carts, online banking account transactions, user password recovery, bidding in online auctions, or online trading and root admin functions are all potential targets for attack.

The user stories or use cases for these features should include exceptions and failure scenarios (what happens if a step or check fails or times out, or if the user tries to cancel or repeat or bypass a step?) and requirements derived from "abuse cases" or “misuse cases”. Abuse cases explore how the application's checks and controls could be subverted by attackers or how the functions could be gamed, looking for common business logic errors including time of check/time of use and other race conditions and timing issues, insufficient entropy in keys or addresses, information leaks, failure to prevent brute forcing, failing to enforce workflow sequencing and approvals, and basic mistakes in input data validation and error/exception handling and limits checking.

This isn’t defence-against-the-dark-arts black hat magic, but getting this stuff wrong can be extremely damaging. For some interesting examples of how bad guys can exploit small and often just plain stupid mistakes in application logic, read Jeremiah Grossman’s classic paper “Seven Business Logic Flaws that put your Website at Risk”.

Make time to walk through important abuse cases when you’re writing up stories or functional requirements, and make sure to review this code carefully and include extra manual testing (especially exploratory testing) as well as pen testing of these features to catch serious business logic problems.

We’re close to the finish line. The final post in this series is coming up: Design and Architect Security In.
Site Meter