Thursday, June 16, 2016

Dev-Sec.io Automated Hardening Framework

Automated configuration management tools like Ansible, Chef and Puppet are changing the way that organizations provision and manage their IT infrastructure. These tools allow engineers to programmatically define how systems are set up, and automatically install and configure software packages. System provisioning and configuration becomes testable, auditable, efficient, scalable and consistent, from tens to hundreds or thousands of hosts.

These tools also change the way that system hardening is done. Instead of following a checklist or a guidebook like one of the CIS Benchmarks, and manually applying or scripting changes, you can automatically enforce hardening policies or audit system configurations against recognized best practices, using pre-defined hardening rules programmed into code.
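
To make the idea concrete, here is a minimal sketch in Python of what a hardening rule expressed as code can look like: it audits a few common SSH settings against the kind of values recommended in the CIS guidance. It is only an illustration of the approach, not part of any of the frameworks discussed below, and the specific rules are examples.

    # Minimal sketch of "hardening rules as code": audit a few SSH daemon settings.
    # The rules below are illustrative examples, not a full CIS benchmark.
    import re
    import sys

    # Each rule: (sshd_config keyword, required value)
    SSH_RULES = [
        ("PermitRootLogin", "no"),
        ("PasswordAuthentication", "no"),
        ("X11Forwarding", "no"),
    ]

    def audit_sshd(path="/etc/ssh/sshd_config"):
        with open(path) as f:
            config = f.read()
        failures = []
        for keyword, expected in SSH_RULES:
            # Find the first non-commented occurrence of the keyword.
            match = re.search(rf"^\s*{keyword}\s+(\S+)", config, re.MULTILINE | re.IGNORECASE)
            actual = match.group(1) if match else "(not set)"
            if actual.lower() != expected.lower():
                failures.append(f"{keyword}: expected '{expected}', found '{actual}'")
        return failures

    if __name__ == "__main__":
        problems = audit_sshd()
        for problem in problems:
            print("FAIL:", problem)
        sys.exit(1 if problems else 0)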

An excellent resource for automated hardening is a set of open source templates originally developed at Deutsche Telekom under the project name "Hardening.io". The authors have recently had to rename the framework to Dev-Sec.io.

It includes Chef recipes and Puppet manifests for hardening base Linux, as well as for SSH, MySQL and PostgreSQL, Apache and Nginx. Ansible support at this time is limited to playbooks for base Linux and SSH. Dev-Sec.io works on Ubuntu, Debian, RHEL, CentOS and Oracle Linux distros.

For container security, the project team have just added an InSpec profile for Chef Compliance against the CIS Docker 1.11.0 benchmark.

Dev-Sec.io is comprehensive and at the same time accessible. And it’s open, actively maintained, and free. You can review the rules, adopt them wholesale, or cherry pick or customize them if needed. It’s definitely worth your time to check it out on GitHub: https://github.com/dev-sec

Thursday, June 2, 2016

DevOpsSec: Using DevOps to Secure DevOps

I finished writing an e-book for O'Reilly on DevOpsSec: Securing Software through Continuous Delivery. It explains how to wire security into Continuous Delivery, and how to use Continuous Delivery, programmable Infrastructure as Code, and other DevOps practices to build and operate more secure systems. It is based on approaches followed by organizations like Etsy, Netflix, LMAX, Amazon, Intuit, Google, and others, including my own firm.

The e-book is available for free download at: http://www.oreilly.com/webops-perf/free/devopssec.csp. I'd appreciate feedback and corrections.

Monday, April 18, 2016

DevOpsDays: Empathy, Scaling, Docker, Dependencies and Secrets

Last week I attended DevOpsDays 2016 in Vancouver. I was impressed to see how strong the DevOps community has grown from the time that I attended my first DevOpsDays event in Mountain View in 2012. There were more than 350 attendees, all of them doing interesting and important work.

Here are the main themes that I followed at this conference:

Empathy – Humanizing Engineering and Ops

There was a strong thread running through the conference on the importance of the human side of engineering and operations, understanding and empathizing with people across the organization. There were two presentations specifically on empathy: one from an engineering perspective by Joyent’s Matthew Smillie, and another excellent presentation on the neuroscience of empathy by Dave Mangot at Librato, which explained how we are all built for empathy and that it is core to our survival. There was also a presentation on gender issues, and several breakout sessions on dealing with people issues and bringing new people into DevOps.

Another side to this was how we use tools to collaborate and build connections between people. More people are depending on – and doing more with – chat systems like HipChat and Slack to do ChatOps: using chat as a general interface to other tools, and leveraging bots like Hubot to automatically trigger and guide actions, such as tracking releases and handling incidents.
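
The underlying pattern is straightforward: the bot watches a chat channel for commands and calls out to other tools on the team's behalf. Here is a hypothetical sketch of that command-dispatch idea in Python (Hubot itself is a CoffeeScript/JavaScript framework); the command names and the actions they trigger are made up for illustration, and a real bot would be wired into HipChat or Slack.

    # Hypothetical ChatOps dispatcher: chat commands trigger operational actions.
    # In a real setup the bot would be connected to HipChat or Slack; here,
    # messages are just strings so the dispatch logic is easy to see.

    def deploy(argument: str) -> str:
        # Placeholder: in practice this would call your CD pipeline's API.
        return f"deploy to {argument} started"

    def incident(argument: str) -> str:
        # Placeholder: in practice this would open a ticket and page on-call.
        return f"incident opened: {argument}"

    COMMANDS = {"deploy": deploy, "incident": incident}

    def handle_message(message: str) -> str:
        """Commands look like '!deploy staging' or '!incident checkout is down'."""
        if not message.startswith("!"):
            return ""  # not addressed to the bot
        command, _, argument = message[1:].partition(" ")
        handler = COMMANDS.get(command)
        if handler is None:
            return f"unknown command: {command}"
        return handler(argument)

    if __name__ == "__main__":
        print(handle_message("!deploy staging"))
        print(handle_message("!incident checkout latency is spiking"))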

In some organizations, standups are being replaced with Chatups, as people continue to find new ways to engage and connect with people working remotely, both inside and outside of their teams.

Scaling DevOps

All kinds of organizations are dealing with scaling problems in DevOps.

Scaling their organizations. Dealing with DevOps at the extremes: making it work in really large organizations, and figuring out how to do it effectively in small teams.

Scaling Continuous Delivery. Everyone is trying to push out more changes, faster and more often in order to reduce risk (by reducing the batch size of changes), increase engagement (for users and developers), and improve the quality of feedback. Some organizations are already reaching the point where they need to manage hundreds or thousands of pipelines, or optimize single pipelines shared by hundreds of engineers, building and shipping out changes (or newly baked containers) several times a day to many different environments.

A common story for CD as organizations scale up goes something like this:

  1. Start out building a CD capability in an ad hoc way, using Jenkins and adding some plugins and writing custom scripts. Keep going until it can’t keep up.
  2. Then buy and install a commercial enterprise CD toolset, transition over and run until it can’t keep up.
  3. Finally, build your own custom CD server and move your build and test fleet to the cloud and keep going until your finance department shouts at you.

Scaling testing. Coming up with effective strategies for test automation where it adds the most value – in unit testing (at the bottom of the test pyramid) and in end-to-end system testing (at the top of the pyramid). Deciding where to invest your time, understanding the tools and how to use them, and working out what kinds of tests are worth writing and worth maintaining.

Scaling architecture. Which means more and more experiments with microservices.

Docker, Docker, Docker

Docker is everywhere. In pilots. In development environments. In test environments especially. And more often now, in production. Working with Docker, problems with Docker, and questions about Docker came up in many presentations, breakouts and hallway discussions.

Docker is creating new problems at the start and end of the CD pipeline.

First, it moves configuration management up front, into the build step. Every change to the application, or to the stack that it is built on and runs on, requires you to “bake a new cake” (Diogenes Rettori at OpenShift) and build and ship out a new container. This places heavy demands on your build environment. You need to find effective and efficient ways to manage all of the layers in your containers, caching dependencies and images so that builds run fast.
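
As a rough sketch of what that per-change “bake” can look like from a build script, here is a short Python example using the Docker SDK for Python (it assumes a recent version of the SDK and a running Docker daemon); the registry address is a placeholder, and a real pipeline would run this step on its build servers:

    # Rough sketch: rebuild ("bake") and publish a new container image for each change.
    # Assumes the Docker SDK for Python (pip install docker) and a local Docker daemon.
    # REGISTRY is a placeholder; the git SHA is used so every image is traceable.
    import subprocess

    import docker

    REGISTRY = "registry.example.com/myapp"  # placeholder repository

    def bake_and_ship(build_context: str = ".") -> str:
        sha = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"], text=True
        ).strip()
        client = docker.from_env()

        # Build a fresh image. Docker's layer cache keeps unchanged layers (OS packages
        # and dependencies installed before the application is copied in) fast to rebuild.
        client.images.build(path=build_context, tag=f"{REGISTRY}:{sha}")

        # Ship the newly baked image out for the rest of the pipeline to deploy.
        client.images.push(REGISTRY, tag=sha)
        return sha

    if __name__ == "__main__":
        print("baked and pushed image tag:", bake_and_ship())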

Docker is also presenting new challenges at the production end. How do you track, manage and monitor clusters of containers as the application scales out? Kubernetes seems to be the tool of choice here.
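
For example, with the official Kubernetes client for Python you can at least see what is running where. A tiny sketch (it assumes a kubeconfig is already set up for the cluster):

    # Tiny sketch: list the containers running across a Kubernetes cluster.
    # Assumes the official Kubernetes Python client (pip install kubernetes)
    # and a kubeconfig (e.g. ~/.kube/config) pointing at the cluster.
    from kubernetes import client, config

    def list_running_containers():
        config.load_kube_config()
        core = client.CoreV1Api()
        for pod in core.list_pod_for_all_namespaces().items:
            for status in pod.status.container_statuses or []:
                state = "ready" if status.ready else "not ready"
                print(f"{pod.metadata.namespace}/{pod.metadata.name} {status.name}: {state}")

    if __name__ == "__main__":
        list_running_containers()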

Depending on Dependencies

More attention is turning to builds and dependency management: managing third-party and open source dependencies, and identifying, streamlining and securing them.

Not just your applications and their direct dependencies – but all of the nested dependencies in all of the layers below (the software that your software depends on, and the software that this software depends on, and so on and so on). Especially for teams working with heavy stacks like Java.
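
To get a feel for how deep the chain goes, here is a small Python sketch that walks every package installed in an environment and prints the dependencies each one declares; following those declarations package by package is exactly the “software that your software depends on” problem:

    # Small sketch: list each installed package and the dependencies it declares.
    # The transitive (nested) dependencies show up as you follow the chain downward.
    from importlib import metadata

    def list_declared_dependencies():
        for dist in sorted(metadata.distributions(), key=lambda d: d.metadata["Name"].lower()):
            print(f"{dist.metadata['Name']} {dist.version}")
            for requirement in dist.requires or []:  # declared dependencies, if any
                print(f"    depends on: {requirement}")

    if __name__ == "__main__":
        list_declared_dependencies()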

There was a lot of discussion on the importance of tracking dependencies and managing your own dependency repositories, using tools like Archiva, Artifactory or Nexus, and private Docker registries. And stripping back unnecessary dependencies to reduce the attack surface and run-time footprint of VMs and containers. One organization does this by continuously cutting down build dependencies and spinning up test environments in Vagrant until things break.

Docker introduces some new challenges, by making dependency management seem simpler and more convenient, and giving developers more control over application dependencies – which is good for them, but not always good for security:

  • Containers are too fat by default: they include generic platform dependencies that you don’t need and, if you leave this up to developers, developer tools that you don’t want to have in production.
  • Containers are shipped with all of the dependencies baked in. Which means that as containers are put together and shipped around, you need to keep track of what versions of what images were built with what versions of what dependencies and when, where they have been shipped to, and what vulnerabilities need to be fixed.
  • Docker makes it easy to pull down pre-built images from public registries. Which means it is also easy to pull images that are out of date or that could contain malware.

You need to find a way to manage these risks without getting in the way and slowing down delivery. Container security tools like Twistlock can scan for vulnerabilities, provide visibility into run-time security risks, and enforce policies.

Keeping Secrets Secret

Docker, CD tooling, automated configuration management tools like Chef, Puppet and Ansible, and other automated tooling create another set of challenges for ops and security: how to keep the credentials, keys and other secrets that these tools need safe. That means keeping them out of code and scripts, out of configuration files, and out of environment variables.

This needs to be handled through code reviews, access control, encryption, auditing, frequent key rotation, and by using a secrets manager like Hashicorp’s Vault.
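
As a sketch of what “read it from a secrets manager instead of a config file” can look like, here is a minimal Python example that fetches a database password from Vault's KV version 2 HTTP API. The Vault address, secret path and field name are placeholders, it assumes the KV v2 engine is mounted at secret/, and how the Vault token itself reaches the process (AppRole login, orchestrator injection, short-lived tokens) is a separate problem that still has to be handled carefully:

    # Minimal sketch: fetch a secret from HashiCorp Vault (KV v2) over its HTTP API
    # instead of baking it into code, config files or environment variables.
    # The address, path and field name below are placeholders for illustration.
    import requests

    VAULT_ADDR = "https://vault.example.com:8200"  # placeholder address
    SECRET_PATH = "secret/data/myapp/database"     # KV v2 paths include "data/"

    def get_db_password(vault_token: str) -> str:
        response = requests.get(
            f"{VAULT_ADDR}/v1/{SECRET_PATH}",
            headers={"X-Vault-Token": vault_token},
            timeout=5,
        )
        response.raise_for_status()
        # KV v2 wraps the key/value pairs in data.data
        return response.json()["data"]["data"]["password"]

    if __name__ == "__main__":
        # Example only: read a token injected at runtime, never one checked into code.
        with open("/run/secrets/vault-token") as f:
            token = f.read().strip()
        print("fetched a password of length", len(get_db_password(token)))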

Passion, Patterns and Problems

I met a lot of interesting, smart people at this conference. I experienced a lot of sincere commitment and passion, excitement and energy. I learned about some cool ideas, new tools to use and patterns to follow (or to avoid).

And new problems that need to be solved.

Wednesday, December 23, 2015

DZone's 2015 Guide to Application Security

DZone recently published a Guide to Application Security. It provides a good overview of effective appsec tools and practices, including my article 10 Steps to Secure Software, which looks at the latest release of OWASP's Proactive Controls project.

Wednesday, December 9, 2015

Help make Software Development Safe and Secure

The OWASP community is working on a new set of secure developer guidelines, called the "OWASP Proactive Controls". The latest draft of these guidelines has been posted in "world edit" mode so that anyone can make direct comments or edits to the document, even anonymously.

You can help make software development safer and more secure by reviewing and contributing to the guidelines at this link:

https://docs.google.com/document/d/1e38W6fGv6PmTEFSAwCr9rOj_ACAeKz1bKYgDj2mCACs/edit?usp=sharing.

Thanks for your help!

Wednesday, November 11, 2015

DevOps for Financial Services

This summer I wrote an e-book for O'Reilly: DevOps for Finance: Reducing Risk through Continuous Delivery. It looks at DevOps and Continuous Delivery from the perspective of improving reliability and reducing operational and technical risk, while improving security and meeting compliance requirements. It includes an analysis of the challenges that financial services organizations face and how to address these challenges, with case studies from LMAX, ING, Capital One, Wealthfront and my own firm.

Thursday, August 20, 2015

How to Prevent Catastrophic Failures in Complex Distributed Systems

In his now famous paper How Complex Systems Fail, Dr. Richard Cook explains how and why failures happen in complex systems:

Some Rules of Failure in Complex Systems

4. Complex systems contain changing mixtures of failures latent within them. The complexity of these systems makes it impossible for them to run without multiple flaws being present. Because these are individually insufficient to cause failure they are regarded as minor factors during operations.

3. Catastrophe requires multiple failures - single point failures are not enough. Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure.

14. Change introduces new forms of failure. The low rate of overt accidents in reliable systems may encourage changes, especially the use of new technology, to decrease the number of low consequence but high frequency failures. These changes maybe actually create opportunities for new, low frequency but high consequence failures. Because these new, high consequence accidents occur at a low rate, multiple system changes may occur before an accident, making it hard to see the contribution of technology to the failure.

The net of this: Complex systems are essentially and unavoidably fragile. We can try, but we can’t stop them from failing – there are too many moving pieces, too many variables and too many combinations to understand and to test. And even the smallest change or mistake can trigger a catastrophic failure.

A New Hope

But new research at the University of Toronto on catastrophic failures in complex distributed systems offers some hope – a potentially simple way to reduce the risk and impact of these failures.

The researchers looked at distributed online systems that had been extensively reviewed and tested, but still failed in spectacular ways.

They found that most catastrophic failures were initially triggered by minor, non-fatal errors: mistakes in configuration, small bugs, hardware failures that should have been tolerated. Then, following rule #3 above, a specific and unusual sequence of events had to occur for the catastrophe to unfold.

The bad news is that this sequence of events can’t be predicted – or tested for – in advance.

The good news is that catastrophic failures in complex, distributed systems may actually be easier to fix than anyone previously thought. Looking closer, the researchers found that almost all (92%) catastrophic failures are the result of incorrect handling of non-fatal errors. These mistakes in error handling caused the system to behave unpredictably, causing other errors, which weren’t always handled correctly or predictably, creating a domino effect.

More than half (58%) of catastrophic failures could be prevented by careful review and testing of error handling code. In 35% of the cases, the faults in error handling code were trivial: the error handler was empty or only logged a failure, or the logic was clearly incomplete. Easy mistakes to find and fix. So easy that the researchers built a freely available static analysis checker for Java byte code, Aspirator, to catch many of these problems.

In another 23% of the cases, the error handling logic of a non-fatal error was so wrong that basic statement coverage testing or careful code reviews would have caught the mistakes.
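
The kinds of trivial mistakes the study describes are easy to recognize. Here is a hypothetical sketch (in Python, rather than the Java systems the researchers studied) of the patterns called out above: an error handler that swallows the failure, one that only logs it, and what slightly more careful handling of a non-fatal error might look like:

    # Hypothetical examples of the error-handling patterns described above.
    # replicate_to_backup() stands in for any non-fatal operation that can fail.
    import logging
    import time

    logger = logging.getLogger(__name__)

    def replicate_to_backup(record):
        raise ConnectionError("backup node unreachable")  # stand-in for a transient failure

    # Pattern 1: the empty handler - the failure silently disappears.
    def save_swallowing_errors(record):
        try:
            replicate_to_backup(record)
        except Exception:
            pass  # nothing is retried, recorded or reported

    # Pattern 2: log-and-continue - the failure is noted but never acted on.
    def save_logging_only(record):
        try:
            replicate_to_backup(record)
        except Exception as err:
            logger.warning("replication failed: %s", err)  # ...and we carry on anyway

    # More careful handling: retry a bounded number of times, then surface the failure
    # so the caller or an operator can decide what to do, instead of failing silently.
    def save_with_care(record, attempts=3, delay=0.5):
        for attempt in range(1, attempts + 1):
            try:
                replicate_to_backup(record)
                return
            except ConnectionError as err:
                logger.warning("replication attempt %d/%d failed: %s", attempt, attempts, err)
                time.sleep(delay)
        raise RuntimeError("replication failed after retries; record was not backed up")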

The next challenge that the researchers encountered was convincing developers to take these mistakes seriously. They had to walk developers through understanding why small bugs in error handling, bugs that “would never realistically happen”, needed to be fixed, and why careful error handling is so important.

This is a challenge that we all need to take up – if we hope to prevent catastrophic failure in complex distributed systems.
