Number of Inputs

The number of inputs that a function uses.

Interpretation

The attribute that the number of inputs metric is expected to quantify is testability. A file that is difficult to test exhaustively is likely to have a high value for the metric.

Evidence

Number of Inputs has been empirically validated to be associated with historical vulnerabilities in software in the following peer-reviewed research study:

To Fear or Not to Fear That is the Question: Code Characteristics of a Vulnerable Function with an Existing Exploit [2]

The empirical evidence overwhelmingly supports the notion that a source code file with a high number of inputs is more likely to contain a security vulnerability.

Implications

The security implication(s) of a file having a high number of inputs could be one or more of the following:

Difficulty in exhaustively testing a file may increase the potential for latent vulnerabilities.

Mitigations

The theoretical mitigation for lowering the number of inputs of a file is to have source code files include only functions with few inputs. However, this mitigation is not practical because modern software is inherently complex, requiring functions to accept many inputs. Therefore, the risk of latent vulnerabilities in a file with a high number of inputs could be mitigated using one or more of the following suggestions:

Refactor functions in the file to accept fewer inputs when possible, leveraging common design patterns as appropriate.
Leverage automated testing to ensure all functions are appropriately tested to a satisfactory level of exhaustiveness.
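
One common pattern for the first suggestion is a parameter object, which groups related inputs into a single value. The sketch below is only an illustration: the function and class names are hypothetical, and it assumes that inputs are counted as function parameters. Whether such a refactoring lowers the value reported by a particular tool depends on how that tool counts inputs.

```python
from dataclasses import dataclass


def format_endpoint(host, port, use_tls, timeout_s, retries):
    """Before: five separate inputs (hypothetical function)."""
    scheme = "https" if use_tls else "http"
    return f"{scheme}://{host}:{port} (timeout={timeout_s}s, retries={retries})"


@dataclass(frozen=True)
class EndpointOptions:
    """Hypothetical parameter object grouping the related inputs."""
    host: str
    port: int
    use_tls: bool = True
    timeout_s: float = 5.0
    retries: int = 3


def format_endpoint_from_options(options: EndpointOptions) -> str:
    """After: one input that carries the same information."""
    scheme = "https" if options.use_tls else "http"
    return (f"{scheme}://{options.host}:{options.port} "
            f"(timeout={options.timeout_s}s, retries={options.retries})")
```

Grouping inputs this way does not shrink the space of values to test, but it gives the defaults and any validation a single place to live, which can make targeted tests easier to write.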

Implementation

In our implementation of the metric, we use SciTools Understand™ to collect the number of inputs metric from functions. The metric is aggregated at the file level by computing the sum of the number of inputs of all functions in a file.
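
The file-level aggregation described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the record layout (file path, function name, number of inputs) and the sample values are assumptions and do not reflect the export format of SciTools Understand.

```python
from collections import defaultdict


def aggregate_inputs_by_file(function_metrics):
    """Sum per-function input counts into a per-file total.

    function_metrics is an iterable of
    (file_path, function_name, number_of_inputs) tuples.
    This layout is hypothetical, chosen only for the example.
    """
    totals = defaultdict(int)
    for file_path, _function_name, number_of_inputs in function_metrics:
        totals[file_path] += number_of_inputs
    return dict(totals)


# Made-up sample data for illustration only.
sample = [
    ("src/parser.c", "parse_header", 4),
    ("src/parser.c", "parse_body", 7),
    ("src/util.c", "trim", 2),
]
file_totals = aggregate_inputs_by_file(sample)
```

Because the aggregate is a sum, a file with many small functions can score as high as a file with a few functions that each take many inputs.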

The source code of the implementation of the metric will be made available on GitHub. If you need to collect the metric from your project, the implementation will also be made available as a container image on Docker Hub.

Languages

The metric implementation is limited to projects written in C/C++, C#, Fortran, and Java.

Example(s)

In this section, we present examples of the metric collected from popular open-source software projects.

Chromium

In this subsection, we present examples of the metric collected from Chromium, the open-source project behind the Google Chrome web browser.

The metric examples presented here were collected at commit 6b9bf768231f on the master branch of the Chromium source code repository.

Summary

Chromium Number of Inputs Distribution — Figure 1.1

Chromium Number of Inputs Discriminatory — Figure 1.2

Figure 1.1 shows the distribution of the metric collected from source code files in the Chromium project. Figure 1.2 compares the distribution of the metric for source code files in the Chromium project that were not historically vulnerable with the distribution for those that were.

Thresholds

The thresholds of the metric in the Chromium project, determined using the approach prescribed by Alves et al. [1], are shown in the table below.

Metric Range	value < 179	179 ≤ value < 326	326 ≤ value < 898	898 ≤ value
Risk Level	Low	Medium	High	Critical
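
As a rough sketch of how the ranges in the table above map a file-level value to a risk level, the lookup can be written as a search over the range boundaries. The function and constant names are hypothetical. The boundaries are copied from the table, and the code does not derive them; deriving them is the subject of Alves et al. [1].

```python
from bisect import bisect_right

# Lower bounds of the Medium, High, and Critical ranges for Chromium,
# copied from the thresholds table.
CHROMIUM_THRESHOLDS = (179, 326, 898)
RISK_LEVELS = ("Low", "Medium", "High", "Critical")


def classify_risk(number_of_inputs, thresholds=CHROMIUM_THRESHOLDS):
    """Return the risk level whose range contains number_of_inputs.

    bisect_right counts how many lower bounds are <= the value, which is
    exactly the index of the matching level in RISK_LEVELS.
    """
    return RISK_LEVELS[bisect_right(thresholds, number_of_inputs)]
```

Passing a different tuple of boundaries, for example one taken from another project's thresholds table, reuses the same lookup.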

Risky Files

The thresholds are used to classify source code files into appropriate risk levels. Shown below, for each of the three non-trivial risk levels (Medium, High, and Critical), are the three source code files from the Chromium project with the lowest metric values and the three with the highest metric values in that level.

Medium Risk

Path	Number of Inputs	Percentile
`third_party/sqlite/patched/ext/misc/vfsstat.c`	179	70.0198
`third_party/libusb/src/libusb/os/openbsd_usb.c`	179	70.0198
`third_party/sqlite/sqlite-src-3280000/ext/misc/vfsstat.c`	179	70.0198
...
`third_party/sqlite/sqlite-src-3280000/src/trigger.c`	323	79.9769
`third_party/sqlite/sqlite-src-3280000/src/resolve.c`	323	79.9769
`third_party/libpng/pngerror.c`	324	79.9923

High Risk

Path	Number of Inputs	Percentile
`third_party/sqlite/patched/test/threadtest3.c`	326	80.1661
`third_party/sqlite/patched/src/wherecode.c`	326	80.1661
`third_party/hunspell/src/hunspell/hunspell.cxx`	326	80.1661
...
`third_party/libxml/src/HTMLparser.c`	885	89.8512
`chrome/browser/ui/browser.h`	885	89.8512
`components/cronet/native/generated/cronet.idl_impl_struct.cc`	892	89.8875

Critical Risk

Path	Number of Inputs	Percentile
`third_party/sqlite/patched/ext/rbu/sqlite3rbu.c`	898	90.0950
`third_party/sqlite/sqlite-src-3280000/ext/rbu/sqlite3rbu.c`	898	90.0950
`third_party/libxml/src/xmlmemory.c`	904	90.1087
...
`tools/clang/traffic_annotation_extractor/tests/dummy_classes.h`	7,376	97.6829
`third_party/sqlite/amalgamation/sqlite3.c`	26,204	99.9990
`tools/clang/rewrite_scoped_refptr/tests/scoped_refptr.h`	30,171	100

OpenBSD

In this subsection, we present examples of the metric collected from the UNIX-like operating system developed by the OpenBSD project.

The metric examples presented here were collected at commit dbdab68da3b on the master branch of the OpenBSD source code repository.

Summary

OpenBSD Number of Inputs Distribution — Figure 2.1

OpenBSD Number of Inputs Discriminatory — Figure 2.2

Figure 2.1 shows the distribution of the metric collected from source code files in the OpenBSD project. Figure 2.2 compares the distribution of the metric for source code files in the OpenBSD project that were not historically vulnerable with the distribution for those that were.

Thresholds

The thresholds of the metric in the OpenBSD project, determined using the approach prescribed by Alves et al. [1], are shown in the table below.

Metric Range	value < 487	487 ≤ value < 777	777 ≤ value < 1,347	1,347 ≤ value
Risk Level	Low	Medium	High	Critical

Risky Files

The thresholds are used to classify source code files into appropriate risk levels. Shown below, for each of the three non-trivial risk levels (Medium, High, and Critical), are the three source code files from the OpenBSD project with the lowest metric values and the three with the highest metric values in that level.

Medium Risk

Path	Number of Inputs	Percentile
`sys/dev/pci/drm/radeon/atombios_encoders.c`	487	70.0289
`usr.sbin/npppd/npppd/npppd.c`	488	70.0764
`gnu/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp`	488	70.0764
...
`gnu/usr.bin/binutils/gdb/symfile.c`	770	79.9228
`sys/dev/acpi/acpi.c`	771	79.9528
`sys/dev/pci/drm/radeon/rv6xx_dpm.c`	775	79.9753

High Risk

Path	Number of Inputs	Percentile
`sys/dev/pci/if_bnx.c`	777	80.0447
`gnu/usr.bin/gcc/gcc/cp/error.c`	777	80.0447
`gnu/llvm/tools/clang/lib/Sema/SemaCodeComplete.cpp`	778	80.1223
...
`gnu/gcc/gcc/cp/call.c`	1,322	89.7571
`gnu/usr.bin/perl/regcomp.c`	1,325	89.8972
`sys/dev/softraid.c`	1,345	89.9447

Critical Risk

Path	Number of Inputs	Percentile
`gnu/usr.bin/perl/toke.c`	1,347	90.0528
`gnu/llvm/include/llvm/MC/MCInst.h`	1,349	90.0547
`gnu/gcc/gcc/config/frv/frv.c`	1,363	90.1341
...
`gnu/gcc/libmudflap/mf-runtime.c`	8,064	99.9905
`sys/kern/kern_malloc.c`	9,853	99.9934
`sys/kern/subr_prf.c`	11,544	100

Reference(s)

[1] Tiago L. Alves, Christiaan Ypma, and Joost Visser. 2010. Deriving Metric Thresholds From Benchmark Data. In Proceedings of the 26th International Conference on Software Maintenance (ICSM '10). 1–10. https://doi.org/10.1109/ICSM.2010.5609747

[2] Awad Younis, Yashwant Malaiya, Charles Anderson, and Indrajit Ray. 2016. To Fear or Not to Fear That is the Question: Code Characteristics of a Vulnerable Function with an Existing Exploit. In Proceedings of the 6th ACM Conference on Data and Application Security and Privacy (CODASPY '16). New York, NY, USA, 97–104. https://doi.org/10.1145/2857705.2857750
