Recently, a thread about a security problem in a piece of open source software got a lot of attention. There was a vulnerability report, a defensive developer, persistent security folks, and of course sideline comments taking one side or the other. This discussion perfectly illustrates why it can be hard to have a civil discussion about security, and why even with the best of intentions and with skilled developers, security problems can persist in a software system.
I want to take the incident in question as an example to illustrate a better way to reason about and to talk about security. I’ll generalize and fictionalize the incident so it’s clear how it applies more broadly, not just to open source discussions between a handful of people, but also in a corporate context:
- Your typical hard-working and dedicated development team gets a report from the security team (or an external security researcher) that there is a pile of security problems with their project.
- They take it a little bit personally: they are worried they might get into trouble. No one told them to care about security, or maybe they are frustrated that the security team is always slowing things down when they have features they need to develop. They wonder why the organization / community is investing time in finding security problems when there’s barely enough budget for the development team to get the features implemented.
- The security team starts suggesting really disruptive fixes before it’s even clear that there’s a problem, and even if there were a problem, maybe it’s not that big of a deal. After all, no one is supposed to use the system for sensitive data, or no external people are allowed to use the system, or there’s the firewall/VPN, or it’s an embedded device, or it’s just a beta test, or users have root on these devices anyway, etc., etc.
- The development team grudgingly patches one of the problems and calls it a day, but the security team comes right back and claims it’s not fixed. They tell a senior manager that the developers didn’t implement their suggested fix, and now there are rumors that the project is completely insecure.
- The senior manager shows up one day and tells the developers that their project protects the company’s critical IP, and that now that these problems have surfaced, security has to be perfect from here on out.
- Now everyone is unhappy: the security team thinks their vulnerabilities will never be fixed, so they write exploits to show how dangerous the problem is; the developers know that the system will never be perfectly secure, because nothing ever is, so they start hiding problems; the managers are terrified that their critical data is at risk, so they start threatening people’s jobs; and the rest of the poor users are worried they’re going to lose a software tool that they rely on to get their job done.
This kind of communication problem, and the resulting security problems, can be avoided by using a disciplined approach to reasoning about and talking about security. This approach can be applied to restructuring large organizations or just having an effective conversation between two people. It goes something like this:
- identify the assets you’re trying to protect, their value, and their nature,
- identify the threats against those assets,
- discover the vulnerabilities that can be exploited by the threats, and
- identify and implement countermeasures that can mitigate those vulnerabilities.
These steps are the foundation of Security Risk Management. You can start with them, or you can invoke them when a conversation about security seems to be going at cross purposes: when one person is talking about vulnerabilities but the other is talking about the firewall, or insisting that no one would ever attack them anyway. Each element has to be addressed; ignoring one of them will probably lead you to the wrong conclusions about where to invest time and money.
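To make the four elements concrete, here is a minimal sketch of how they might be recorded as a simple risk register. The data structures and field names are hypothetical illustrations, not anything from the incident itself; the point is only that each element gets written down and linked to the others.

```python
# A minimal risk-register sketch (hypothetical structure and field names).
# Each of the four elements becomes a record, linked to the others by name.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Asset:
    name: str
    value: str              # e.g. "months of work", "time-to-market advantage"
    nature: str             # e.g. "design documents", "customer database"

@dataclass
class Threat:
    description: str        # who might attack and what they want
    targets: list[str]      # names of the assets this threat puts at risk

@dataclass
class Vulnerability:
    description: str
    exploitable_by: list[str]   # which threats can actually use it
    impact: str                 # "read", "modify", "delete", ...

@dataclass
class Countermeasure:
    description: str
    mitigates: list[str]        # vulnerabilities it addresses
    cost_days: int              # rough fix estimate, in person-days

@dataclass
class RiskRegister:
    assets: list[Asset] = field(default_factory=list)
    threats: list[Threat] = field(default_factory=list)
    vulnerabilities: list[Vulnerability] = field(default_factory=list)
    countermeasures: list[Countermeasure] = field(default_factory=list)
```

Even a register this small makes it obvious when one of the four lists is empty, which is usually where the conversation is going wrong.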
The conversation might go something like this:
- Developer: I understand there might be a vulnerability here, but I thought that we weren’t going to worry about security because there is no critical data in this system.
- Risk manager: Let’s get the person in charge of the data to tell us how critical the data is. (Identify the assets.)
- Senior manager: Yes, this data is completely critical. If it gets leaked, our competition will beat us to market and if it gets deleted, we will lose months of work.
- Risk manager: The software is operated completely inside the firewall; do you ever see advanced attacks that get inside? (Identify the threats against those assets.)
- Security team: No, that never happens, but legal requirements say that insiders should not be allowed to modify the data, even if they can access it.
- Risk manager: Can any of the vulnerabilities be exploited from outside the firewall? (Discover the vulnerabilities.)
- Security team: No, you have to be on the internal network.
- Risk manager: Can any of the vulnerabilities be used by an insider to modify the data? (Discover the vulnerabilities.)
- Security team: Yes, one of the vulnerabilities can be used to modify data if the attacker is already on the internal network.
- Risk manager: Let’s look at the budget and estimate how many of these vulnerabilities we can fix, prioritizing the ones that can be used to modify data. (Implement countermeasures.)
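That last step, prioritizing fixes against a budget, can be expressed as a simple filter-and-sort over the list of vulnerabilities. The vulnerability names and cost figures below are made up for illustration; the sketch only shows the shape of the decision, not a real plan.

```python
# Sketch of the prioritization from the conversation above: fix
# data-modifying vulnerabilities first, then fit what's left into the budget.
# Names and costs are hypothetical.

def plan_fixes(vulnerabilities: list[dict], budget_days: int) -> list[str]:
    """Return the fixes we can afford, data-modifying ones first."""
    ordered = sorted(
        vulnerabilities,
        key=lambda v: (v["impact"] != "modify", v["fix_cost_days"]),
    )
    plan, spent = [], 0
    for vuln in ordered:
        if spent + vuln["fix_cost_days"] <= budget_days:
            plan.append(vuln["name"])
            spent += vuln["fix_cost_days"]
    return plan

vulns = [
    {"name": "internal-only RCE",      "impact": "read",   "fix_cost_days": 8},
    {"name": "data-tampering bug",     "impact": "modify", "fix_cost_days": 5},
    {"name": "verbose error messages", "impact": "read",   "fix_cost_days": 2},
]

print(plan_fixes(vulns, budget_days=10))
# ['data-tampering bug', 'verbose error messages']
```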
Getting back to the incident that started this discussion: if you read the thread carefully, you’ll find that the developer has a particular point of view about the assets (that local root exploits on a single-user machine aren’t a critical problem), but that the entire conversation has been about vulnerabilities, exploits, and mitigations. If the security folks had acted as “risk managers” and surfaced this misunderstanding early on, they might have gotten the developer on their side by explaining the less-obvious threats related to local root exploits.
In the end, the security guys just kept posting exploits until they felt too bored and insulted to bother anymore. The developer unhappily disabled the component, which he felt had been hugely useful to his users, since he didn’t think it could be salvaged. Maybe this was inevitable, but maybe a better outcome for everyone could have been discovered.