Number of Paths

The number of unique decision paths through a function.

Interpretation

The attribute that number of outputs is expected to quantify is testability. A file that is difficult to exhaustively test is likely to have a high value for the number of paths metric.

Evidence

Number of Paths has been empirically-validated to be associated with historical vulnerabilities in software in the following peer-reviewed research studies:

  1. To Fear or Not to Fear That is the Question: Code Characteristics of a Vulnerable Function with an Existing Exploit [2]

The empirical evidence overwhelming supports the notion that a source code file with high number of paths is more likely to contain a security vulnerability.

Implications

The security implication(s) of a file having high number of paths could be one or more of the following:

  • Difficulty in exhaustively testing a file may increase the potential for latent vulnerabilities.

Mitigations

The theoretical mitigation to lowering the number of paths of a file is to have source code files only include simpler constructs with limited use of statements that alter control flow. However, the theoretical mitigation is not practical because modern software is inherently complex. Therefore, the risk of latent vulnerabilities in a file with high number of paths could be mitigated using one or more of the following suggestions:

  • Refactor functions in the file to simplify the constructs used leveraging common design patterns when appropriate.
  • Leverage automated testing to ensure all functions are appropriately tested to a satisfactory level of exhaustiveness.

Implementation

In our implementation of the metric, we use SciTools Understand™ to collect the number of paths metric from functions. The metric is aggregated at the file level by computing the sum of the number of paths of all functions in a file.

The source code of the implementation of the metric will be made available on GitHub. If you need to collect the metric from your project, the implementation will also be made available as a container image on Docker Hub.

Languages

The metric implementation is limited to projects written in C/C++, C#, Ada, Basic, Fortran, Java, Jovial, Pascal, Python, Web.

Example(s)

In this section, we present examples of the metric collected from popular open-source software projects.

Chromium

In this subsection, we present examples of the metric collected from the Chromium, the open-source project behind the Google Chrome web browser.

The metric examples presented here were collected at 6b9bf768231f commit to the master branch of the Chromium source code repository.

Summary

Chromium Number of Paths Distribution
Figure 1.1
Chromium Number of Paths Discriminatory
Figure 1.2

Shown in Figure 1.1 is the distribution of the metric collected from source code files in the Chromium project. Shown in Figure 1.2 is the comparison of the distribution of the metric collected from source code files in the Chromium project that were not historically vulnerable and those that were.

Thresholds

The thresholds of the metric in the Chromium project determined using the approach prescribed by Alves et al. [1] is shown in the table below.

Metric Range value < 472 472 ≤ value < 6,399 6,399 ≤ value < 3,484,322 3,484,322 ≤ value
Risk Level Low Medium High Critical

Risky Files

The thresholds are used to classify source code files into appropriate risk levels. Shown below are the top and bottom three source code files from the Chromium project in each of the three non-trivial risk levels.

Path Number of Paths Percentile
third_party/libwebp/src/demux/demux.c 472 70.0202
extensions/common/url_pattern.h 474 70.0241
third_party/libxml/src/testSAX.c 476 70.0493
...
third_party/libxml/src/buf.c 6,217 79.8921
third_party/tcmalloc/gperftools-2.0/chromium/src/base/sysinfo.cc 6,221 79.9074
third_party/wds/gen/parser.cpp 6,368 79.9675

Path Number of Paths Percentile
third_party/sqlite/patched/ext/rtree/geopoly.c 6,399 80.0578
third_party/sqlite/sqlite-src-3280000/ext/rtree/geopoly.c 6,399 80.0578
third_party/libxml/src/dict.c 6,444 80.1193
...
chrome/browser/metrics/metrics_memory_details.h 2,904,084 89.8764
third_party/blink/renderer/platform/graphics/paint/paint_property_node_test.cc 3,359,250 89.8938
third_party/libxslt/src/libexslt/date.c 3,442,948 89.9616

Path Number of Paths Percentile
third_party/blink/renderer/platform/wtf/vector.h 3,484,322 90.0067
third_party/opus/src/silk/NSQ_del_dec.c 3,697,176 90.0251
third_party/opus/src/src/opus_compare.c 3,741,628 90.0357
...
third_party/afl/src/afl-fuzz.c 3,338,833,311 97.6324
third_party/opus/src/tests/test_opus_api.c 5,000,000,013 97.6808
third_party/sqlite/amalgamation/sqlite3.c 13,196,384,961 100

OpenBSD

In this subsection, we present examples of the metric collected from the UNIX-like operating system developed by the OpenBSD project.

The metric examples presented here were collected at dbdab68da3b commit to the master branch of the OpenBSD source code repository.

Summary

OpenBSD Number of Paths Distribution
Figure 2.1
OpenBSD Number of Paths Discriminatory
Figure 2.2

Shown in Figure 2.1 is the distribution of the metric collected from source code files in the OpenBSD project. Shown in Figure 2.2 is the comparison of the distribution of the metric collected from source code files in the OpenBSD project that were not historically vulnerable and those that were.

Thresholds

The thresholds of the metric in the OpenBSD project determined using the approach prescribed by Alves et al. [1] is shown in the table below.

Metric Range value < 1,293,891 1,293,891 ≤ value < 92,586,509 92,586,509 ≤ value < 1,007,304,635 1,007,304,635 ≤ value
Risk Level Low Medium High Critical

Risky Files

The thresholds are used to classify source code files into appropriate risk levels. Shown below are the top and bottom three source code files from the OpenBSD project in each of the three non-trivial risk levels.

Path Number of Paths Percentile
sys/dev/ic/xl.c 1,293,891 70.0149
gnu/usr.bin/cvs/src/diff.c 1,298,254 70.0244
gnu/llvm/lib/MC/MCParser/AsmParser.cpp 1,307,068 70.0793
...
gnu/llvm/tools/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.h 91,595,999 79.9809
usr.sbin/apmd/apmd.c 91,951,576 79.9876
gnu/llvm/lib/Target/SystemZ/SystemZISelLowering.h 92,038,451 79.9919

Path Number of Paths Percentile
gnu/usr.bin/binutils-2.17/gas/config/tc-v850.c 92,586,509 80.0152
gnu/usr.bin/binutils/gas/config/tc-v850.c 92,586,518 80.0391
usr.sbin/inetd/inetd.c 94,859,146 80.0573
...
gnu/usr.bin/binutils/bfd/elf32-sparc.c 1,006,542,578 89.7886
usr.sbin/tcpdump/print-802_11.c 1,007,179,746 89.8043
gnu/usr.bin/gcc/gcc/java/parse.c 1,007,244,971 89.9873

Path Number of Paths Percentile
gnu/usr.bin/binutils-2.17/bfd/elf32-bfin.c 1,007,304,635 90.0379
gnu/llvm/lib/Target/Hexagon/HexagonISelLowering.h 1,007,871,097 90.0413
gnu/usr.bin/binutils-2.17/bfd/elf-eh-frame.c 1,007,898,579 90.0545
...
sys/nfs/nfs_serv.c 9,022,869,430 99.9378
gnu/llvm/unittests/ADT/APIntTest.cpp 11,317,359,187 99.9608
gnu/llvm/unittests/ADT/APFloatTest.cpp 12,679,924,184 100

Reference(s)

[1] Tiago L. Alves, Christiaan Ypma, and Joost Visser. 2010. Deriving Metric Thresholds From Benchmark Data. In Proceedings of the 26th International Conference on Software Maintenance (ICSM '10). 1-10. https://doi.org/10.1109/ICSM.2010.5609747

[2] Awad Younis, Yashwant Malaiya, Charles Anderson, and Indrajit Ray. 2016. To Fear or Not to Fear That is the Question: Code Characteristics of a Vulnerable Function with an Existing Exploit. In Proceedings of the 6th ACM Conference on Data and Application Security and Privacy (CODASPY '16). New York, NY, USA, 97–104. https://doi.org/10.1145/2857705.2857750