Number of Outputs

The number of functions that a function calls.
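
To make the definition concrete, the sketch below counts the distinct functions called by each top-level function in a small Python snippet, using Python's standard `ast` module. It only illustrates the idea. The names `count_outputs` and `SAMPLE` are hypothetical, and the counting rule (distinct callee names) is an assumption that may differ from how a commercial tool counts.

```python
import ast


def count_outputs(source: str) -> dict[str, int]:
    """Map each top-level function to the number of distinct names it calls."""
    tree = ast.parse(source)
    result = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            callees = set()
            for child in ast.walk(node):
                if isinstance(child, ast.Call):
                    func = child.func
                    if isinstance(func, ast.Name):         # plain call: f(x)
                        callees.add(func.id)
                    elif isinstance(func, ast.Attribute):  # method call: obj.f(x)
                        callees.add(func.attr)
            result[node.name] = len(callees)
    return result


# Hypothetical input: load() calls three functions, noop() calls none.
SAMPLE = '''
def load(path):
    data = read(path)
    check(data)
    return parse(data)

def noop():
    return None
'''

print(count_outputs(SAMPLE))
```

Here `load` has three outputs (`read`, `check`, `parse`) and `noop` has none.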

Interpretation

The attribute that Number of Outputs is expected to quantify is testability: the more functions a function calls, the more interactions its tests must cover. A file that is difficult to test exhaustively is likely to have a high value for the metric.

Evidence

Number of Outputs has been empirically validated to be associated with historical vulnerabilities in software in the following peer-reviewed research study:

  1. To Fear or Not to Fear That is the Question: Code Characteristics of a Vulnerable Function with an Existing Exploit [2]

The empirical evidence overwhelmingly supports the notion that a source code file with a high number of outputs is more likely to contain a security vulnerability.

Implications

The security implication(s) of a file having a high number of outputs could be one or more of the following:

  • Difficulty in exhaustively testing a file may increase the potential for latent vulnerabilities.

Mitigations

The theoretical mitigation for a high number of outputs in a file is to have the functions in that file call, or depend on, fewer other functions. However, this mitigation is not practical: modern software is inherently complex, and functions must call other functions to implement features. Therefore, the risk of latent vulnerabilities in a file with a high number of outputs could be mitigated using one or more of the following suggestions:

  • Refactor functions in the file to call fewer other functions when possible, leveraging common design patterns as appropriate.
  • Leverage automated testing to ensure all functions are appropriately tested to a satisfactory level of exhaustiveness.

Implementation

In our implementation of the metric, we use SciTools Understand™ to collect the number of outputs metric from functions. The metric is aggregated at the file level by computing the sum of the number of outputs of all functions in a file.
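
The file-level aggregation described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name `aggregate_by_file`, the record layout, and the sample paths and values are all hypothetical.

```python
from collections import defaultdict


def aggregate_by_file(function_outputs):
    """Sum per-function number-of-outputs values into one value per file.

    function_outputs: iterable of (file_path, function_name, outputs) tuples.
    """
    totals = defaultdict(int)
    for file_path, _function_name, outputs in function_outputs:
        totals[file_path] += outputs
    return dict(totals)


# Hypothetical per-function values; the paths and numbers are made up.
records = [
    ("src/a.c", "open_conn", 4),
    ("src/a.c", "close_conn", 2),
    ("src/b.c", "main", 7),
]

print(aggregate_by_file(records))
```

Because the file-level value is a sum, a file can score high either by containing a few functions with many outputs or by containing many functions with few outputs each.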

The source code of the implementation of the metric will be made available on GitHub. If you need to collect the metric from your project, the implementation will also be made available as a container image on Docker Hub.

Languages

The metric implementation is limited to projects written in C/C++, C#, Fortran, and Java.

Example(s)

In this section, we present examples of the metric collected from popular open-source software projects.

Chromium

In this subsection, we present examples of the metric collected from Chromium, the open-source project behind the Google Chrome web browser.

The metric examples presented here were collected at commit 6b9bf768231f on the master branch of the Chromium source code repository.

Summary

Figure 1.1: Chromium Number of Outputs Distribution
Figure 1.2: Chromium Number of Outputs Discriminatory

Shown in Figure 1.1 is the distribution of the metric collected from source code files in the Chromium project. Shown in Figure 1.2 is the comparison of the distribution of the metric collected from source code files in the Chromium project that were not historically vulnerable and those that were.

Thresholds

The thresholds of the metric in the Chromium project, determined using the approach prescribed by Alves et al. [1], are shown in the table below.

Metric Range | value < 319 | 319 ≤ value < 554 | 554 ≤ value < 1,222 | 1,222 ≤ value
Risk Level   | Low         | Medium            | High                | Critical
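
As an illustration of how the table is applied, the sketch below maps a file-level Number of Outputs value to a risk level using the Chromium thresholds above. The names `CHROMIUM_BOUNDS`, `RISK_LEVELS`, and `risk_level` are hypothetical and chosen only for this example.

```python
import bisect

# Lower bounds of the Medium, High, and Critical ranges in the Chromium table.
CHROMIUM_BOUNDS = [319, 554, 1222]
RISK_LEVELS = ["Low", "Medium", "High", "Critical"]


def risk_level(value, bounds=CHROMIUM_BOUNDS):
    """Return the risk level whose range contains the given metric value."""
    # bisect_right counts how many lower bounds are <= value,
    # which is exactly the index of the matching risk level.
    return RISK_LEVELS[bisect.bisect_right(bounds, value)]


for value in (318, 319, 553, 554, 1221, 1222):
    print(value, risk_level(value))
```

Passing a different list of bounds applies the same classification to another project's thresholds.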

Risky Files

The thresholds are used to classify source code files into appropriate risk levels. Shown below are the top and bottom three source code files from the Chromium project in each of the three non-trivial risk levels.

Path | Number of Outputs | Percentile
chrome/browser/ui/views/page_info/page_info_bubble_view_browsertest.cc | 319 | 70.0488
third_party/tcmalloc/chromium/src/profiledata.h | 319 | 70.0488
components/sync_sessions/session_store_unittest.cc | 319 | 70.0488
...
third_party/sqlite/sqlite-src-3280000/src/where.c | 552 | 79.9817
chrome/browser/profiles/profile_manager_unittest.cc | 552 | 79.9817
chrome/browser/browsing_data/cookies_tree_model.h | 553 | 79.9912

Path | Number of Outputs | Percentile
third_party/sqlite/patched/src/where.c | 554 | 80.0878
chrome/browser/page_load_metrics/observers/from_gws_page_load_metrics_observer_unittest.cc | 557 | 80.1114
third_party/abseil-cpp/absl/container/internal/raw_hash_set_test.cc | 558 | 80.1619
...
third_party/sqlite/patched/ext/fts5/fts5_index.c | 1,161 | 89.8892
chrome/browser/ui/views/tabs/tab_strip.h | 1,171 | 89.8981
chrome/renderer/net/net_error_helper_core_unittest.cc | 1,195 | 89.9609

Path | Number of Outputs | Percentile
chrome/browser/sync/test/integration/two_client_bookmarks_sync_test.cc | 1,222 | 90.0207
content/browser/webauth/authenticator_impl_unittest.cc | 1,238 | 90.2554
third_party/sqlite/sqlite-src-3280000/ext/fts2/fts2.c | 1,238 | 90.2554
...
third_party/wtl/include/atlwince.h | 8,567 | 96.6272
third_party/libxml/src/testapi.c | 11,115 | 97.6838
third_party/sqlite/amalgamation/sqlite3.c | 14,612 | 100

OpenBSD

In this subsection, we present examples of the metric collected from the UNIX-like operating system developed by the OpenBSD project.

The metric examples presented here were collected at commit dbdab68da3b on the master branch of the OpenBSD source code repository.

Summary

Figure 2.1: OpenBSD Number of Outputs Distribution
Figure 2.2: OpenBSD Number of Outputs Discriminatory

Shown in Figure 2.1 is the distribution of the metric collected from source code files in the OpenBSD project. Shown in Figure 2.2 is the comparison of the distribution of the metric collected from source code files in the OpenBSD project that were not historically vulnerable and those that were.

Thresholds

The thresholds of the metric in the OpenBSD project, determined using the approach prescribed by Alves et al. [1], are shown in the table below.

Metric Range | value < 415 | 415 ≤ value < 636 | 636 ≤ value < 1,103 | 1,103 ≤ value
Risk Level   | Low         | Medium            | High                | Critical
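
For readers unfamiliar with Alves et al. [1], the core idea is to weight each measured entity by its lines of code and to pick the metric values below which 70%, 80%, and 90% of the overall code falls. The sketch below is a simplified, single-project reading of that idea. The function `derive_thresholds` and its input layout are hypothetical, and whether the thresholds in the table were computed exactly this way is an assumption not confirmed here.

```python
def derive_thresholds(entities, percentages=(70, 80, 90)):
    """Return the metric values at which the given percentages of code are covered.

    entities: list of (metric_value, lines_of_code) pairs.
    percentages: ascending integer percentages of total lines of code.
    """
    ordered = sorted(entities)  # ascending by metric value
    total_loc = sum(loc for _value, loc in ordered)
    remaining = list(percentages)
    thresholds = []
    cumulative = 0
    for value, loc in ordered:
        cumulative += loc
        # Integer arithmetic avoids floating-point comparison issues.
        while remaining and cumulative * 100 >= remaining[0] * total_loc:
            thresholds.append(value)
            remaining.pop(0)
    return thresholds


# Hypothetical data: ten entities of equal size with metric values 1 to 10.
uniform = [(value, 10) for value in range(1, 11)]
print(derive_thresholds(uniform))
```

With equal weights, the result reduces to ordinary percentiles. When one large entity dominates the code base, its metric value pulls the thresholds toward itself, which is the purpose of the weighting.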

Risky Files

The thresholds are used to classify source code files into appropriate risk levels. Shown below are the top and bottom three source code files from the OpenBSD project in each of the three non-trivial risk levels.

Path | Number of Outputs | Percentile
gnu/llvm/lib/Transforms/Scalar/LoopRerollPass.cpp | 415 | 70.0482
gnu/gcc/gcc/config/h8300/h8300.c | 415 | 70.0482
gnu/usr.bin/binutils/gdb/s390-tdep.c | 415 | 70.0482
...
usr.sbin/unbound/util/netevent.c | 632 | 79.9835
gnu/gcc/gcc/gcc.c | 632 | 79.9835
gnu/llvm/tools/clang/lib/CodeGen/CGExprConstant.cpp | 632 | 79.9835

Path | Number of Outputs | Percentile
sys/dev/pci/drm/i915/intel_sdvo.c | 636 | 80.0126
usr.sbin/ldapd/btree.c | 639 | 80.0446
gnu/llvm/lib/Analysis/ScalarEvolution.cpp | 641 | 80.2240
...
gnu/llvm/tools/clang/lib/Sema/SemaDecl.cpp | 1,099 | 89.8299
sys/dev/softraid.c | 1,102 | 89.9442
gnu/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 1,102 | 89.9442

Path | Number of Outputs | Percentile
gnu/gcc/gcc/config/s390/s390.c | 1,103 | 90.0289
gnu/gcc/gcc/tree-cfg.c | 1,110 | 90.0798
gnu/usr.bin/binutils/gdb/breakpoint.c | 1,115 | 90.1510
...
gnu/llvm/tools/clang/lib/AST/ExprConstant.cpp | 3,621 | 99.4902
gnu/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 5,081 | 99.6455
gnu/llvm/lib/Target/X86/X86ISelLowering.cpp | 7,128 | 100

Reference(s)

[1] Tiago L. Alves, Christiaan Ypma, and Joost Visser. 2010. Deriving Metric Thresholds From Benchmark Data. In Proceedings of the 26th International Conference on Software Maintenance (ICSM '10). 1–10. https://doi.org/10.1109/ICSM.2010.5609747

[2] Awad Younis, Yashwant Malaiya, Charles Anderson, and Indrajit Ray. 2016. To Fear or Not to Fear That is the Question: Code Characteristics of a Vulnerable Function with an Existing Exploit. In Proceedings of the 6th ACM Conference on Data and Application Security and Privacy (CODASPY '16). New York, NY, USA, 97–104. https://doi.org/10.1145/2857705.2857750