
Do automated tools really detect only 45% of all vulnerabilities?


If you’ve dealt with IT security for any length of time, chances are you’ve come across the claim that research has shown automated tools can detect only 45% of vulnerabilities. It is often cited to illustrate the need for human experts to take part in security assessments and penetration tests. But is the claim really true?

You may find it in, among many other places, the latest OWASP Testing Guide.

[Image: excerpt from the OWASP Testing Guide v4 (OTGv4)]

Given this source in particular, one might reasonably expect the claim to be correct… but, as you may have guessed, that is not the case. Or rather, not entirely. Some other sources cite the original research, where the number 45% originated, more or less correctly: tools are capable of detecting 45% of vulnerability types. This version may be found (again, among many other places) in several OWASP presentations.

[Image: slide 42 of an OWASP “Embed within SDLC” presentation]

Many have cited these presentations. For example, Mitnick Security, the company of the world-renowned Kevin Mitnick, cites OWASP almost word for word on its website. However, a closer look at the text of their site shows that even they, despite starting from an exact citation of OWASP, managed in places to interpret the conclusions of the original research to mean that “automated tools detect only 45% of vulnerabilities”.

[Image: citation of OWASP on mitnicksecurity.com]

Leaving the misunderstanding and misquoting of the original conclusions to one side, even when the conclusions are cited correctly, many seem to gloss over the fact that the research which produced the 45% figure was limited to certain types of tools and took place all the way back in 2007, so its results don’t necessarily describe the current state of affairs… But we’re getting ahead of ourselves. First, let’s take a look at where the number actually came from.

The OWASP Testing Guide is one of the few places where we may find an attribution (although the reference in OTG should point to [21], not [22]), and it leads us to a presentation from BlackHat DC 2007 by a team (Robert A. Martin, Sean Barnum and Steve Christey) from MITRE/Cigital.

Unfortunately, the slide from the presentation that is cited in OTG doesn’t give us much information by itself. We may deduce from it that 55% of CWEs were found not to be covered by (presumably) some tested or analyzed tools, leaving the remaining 45% covered, but that is about it.

[Image: slide 30 of the BlackHat DC 2007 presentation]

Since the other slides in the presentation don’t give us any more information regarding the presumed 45% detection rate, we need to dig a bit deeper. After a while of Googling, one might find a couple of articles by the same authors on the MITRE website (one which probably served as the basis for the BlackHat talk and one from CrossTalk magazine), both bearing the same title as the presentation. Unfortunately, neither of them sheds any light on the issue of the detection rate of automated tools.

I have to admit that this was the point where my Google-Fu failed me, as I was unable to find anything more exact with regard to the original research. I was, however, able to find e-mail contacts for all three authors of the original paper/presentation from BlackHat DC 2007, and one of them - Bob Martin - was kind enough to reply to my message and explain what their work was based on. The following paragraphs are the contents of the e-mail I received, unedited except for the use of bold font for what I believe are the most important parts.



The source of that statistic is the MITRE CWE Team's compilation of the knowledge-bases from the static analysis tools that provided us with the details of weaknesses so we could build out CWE.

While we don't have a specific list of those who donated content, the list of organization on the CWE Community page is close, if you limit yourself to tool and researcher organizations.

The 2007 Black Hat talk does a pretty good job covering what went into the creation of CWE.

So when we combined all of these different knowledge-sources we found that there was only a very slight intersections between the tools knowledge-bases.

Now a days, the best way to recreate this would be to use the COVERAGE CLAIMS that most of the CWE Compatible tools and services provide.


Examples of Publicly Available CWE Coverage Claims:
----------------------------------------------------------------------
https://www.synopsys.com/content/dam/synopsys/sig-assets/datasheets/coverity-cwe-sanstop25.pdf
https://www.grammatech.com/software-assurance/certifications-compliance/cwe
https://docs.sonarqube.org/latest/user-guide/security-rules/
https://help.veracode.com/reader/DGHxSJy3Gn3gtuSIN2jkRQ/o5xpvFVymSUGcFJ492HXEg
http://docs.klocwork.com/Insight-10.0/CWE_IDs_mapped_to_Klocwork_C_and_C%2B%2B_checkers
http://docs.klocwork.com/Insight-10.0/2011_CWE-SANS_Top_25_Most_Dangerous_Software_Errors_mapped_to_Klocwork_checkers
http://docs.klocwork.com/Insight-10.0/2010_CWE-SANS_Top_25_Most_Dangerous_Software_Errors_mapped_to_Klocwork_checkers
http://docs.klocwork.com/Insight-10.0/CWE_IDs_mapped_to_Klocwork_Java_checkers
https://access.redhat.com/articles/171613
https://dwheeler.com/flawfinder/flawfinder.pdf
https://vulncat.fortify.com/en/weakness

On the last one, we'd like to get Fortify to offer their coverage indexed by CWE Ids but so far this is what we have from them.

That being said, the statistic we did in 2007 was only for static analysis tools, since at that time they were the only one really documenting the flaws they found and talking about how to fix them.

Similar static analysis studies were done by the Center for Assured Software (CAS) out of NSA. Here's links to their 2010 and 2011 reports:

http://cps-vo.org/file/1152/download/30152
https://samate.nist.gov/docs/CAS_2011_SA_Tool_Method.pdf

On slides 21 & 22 of the first one, and on page 25 of the second one they show the overlap in findings between tools.

Today you'd want to include DAST and Binary analysis, along with any other tool/technique that can uncover weaknesses in software architecture, software design, software code, and the deployment of software into operations.

You would also want include more of the quality issues that only indirectly make it easier to introduce a vulnerability and/or make the vulnerability more difficult to detect or mitigate, like reliability, performance, and maintainability, similar to the expansion undergone by CWE itself in January https://cwe.mitre.org/news/index.html#jan032019_CWE_Version_3.2_Now_Available.


Within CWE these are captured in the CWE-1128 view (CISQ Quality Measures (2016)) and the "quality" slice (CWE-1040, Quality Weaknesses with Indirect Security Impacts), which includes not-automatically-detectable quality issues.

CISQ recently published an update to their work which we are still capturing as a CWE view.

Leveraging the CAS work, NIST has been holding Software Assurance Tool Evaluation (SATE) efforts, where NIST is working with the community, both private industry, academia, and government, to get a better handle on what weaknesses tools find and how well they find them. They have annual workshops and share test programs (Juliet).

Finally, tools can not find many architecture or design weakness https://cwe.mitre.org/data/definitions/1008.html. If the development effort using model-based software engineering tools they theoretically can find some of these - which is the focus of a new MBSE Working Group in the Consortium of Information and Security Quality (CISQ).
___

As we may see - among much other information for which I’m very grateful to Bob Martin - the original research covered only static analysis tools (SAST). Even if the research weren’t as old as it is, this fact alone shows that its results should not be interpreted and presented the way they very often are.
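If you wanted to repeat the exercise today, Bob Martin’s suggestion is to start from the coverage claims that CWE-compatible tools publish (see the links in his e-mail above). The sketch below is one minimal way I could imagine doing that in Python; the file names, the one-ID-per-line layout and the choice of which CWE view counts as “all vulnerability types” are all my own assumptions, not part of the original research or of any CWE tooling.

```python
# Rough sketch: what fraction of a chosen CWE view do tools *claim* to cover?
# Assumptions (hypothetical, not from the original research):
#   - coverage/<tool>.txt holds the CWE IDs a given tool claims, e.g. "CWE-79",
#     copied by hand from its published coverage claim
#   - all_cwes.txt lists every CWE ID in the view you treat as "all types"

from pathlib import Path
import re


def read_cwe_ids(path: Path) -> set[str]:
    """Return the set of CWE IDs (e.g. 'CWE-79') mentioned in a text file."""
    return set(re.findall(r"CWE-\d+", path.read_text()))


all_cwes = read_cwe_ids(Path("all_cwes.txt"))
tool_claims = {p.stem: read_cwe_ids(p) for p in Path("coverage").glob("*.txt")}

# Per-tool coverage of the chosen CWE view.
for tool, claimed in sorted(tool_claims.items()):
    covered = claimed & all_cwes
    print(f"{tool:15s} {len(covered):4d} CWEs ({len(covered) / len(all_cwes):.0%})")

# Union across all tools - the closest analogue of the 2007 "45%" figure.
union = set().union(*tool_claims.values()) & all_cwes
print(f"{'union':15s} {len(union):4d} CWEs ({len(union) / len(all_cwes):.0%})")
```

Running something like this over a handful of tools’ published claims would give a rough, present-day analogue of the 2007 figure - with the obvious caveat that coverage claims are self-reported and say nothing about how reliably a tool actually detects a given CWE.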

Don’t get me wrong - I don’t claim that tools alone can find every type of vulnerability out there. They can’t - automated scanners and other tools are great at finding certain types of vulnerabilities, but for others they are either unable to find them at all or don’t even come close to what an experienced penetration tester, analyst or auditor may discover. I don’t even claim that tools are currently capable of finding more than 45% of all vulnerability types - I don’t know whether they are, and as far as I can tell, no one else does either.

And that is the point - although we might like to have hard numbers to back up why the human factor is indispensable when it comes to finding vulnerabilities, citing the results of a study from 2007 as if they were current, or using a misquoted version of its conclusions in marketing materials to convince customers that they really need our experienced pentesters in order to be secure, is something we should try very hard to avoid.
