Churn

The total number of lines added, modified, and deleted throughout the history of a file.

Interpretation

The attribute that churn is expected to quantify is change. A file that has undergone a lot of change is likely to have a high value for the churn metric.

Evidence

Churn has been empirically validated to be associated with historical vulnerabilities in software in the following peer-reviewed research study:

Searching for a Needle in a Haystack: Predicting Security Vulnerabilities for Windows Vista [2]

The empirical evidence overwhelmingly supports the notion that a source code file with high churn is more likely to contain a security vulnerability.

Implications

The security implication(s) of a file having high churn could be one or more of the following:

Not all changes to the file may have been reviewed for security concerns as thoroughly as others, increasing the potential for latent vulnerabilities.

Mitigations

In theory, the way to lower the churn of a file is to avoid changing it, but that is not practical. Instead, the risk of latent vulnerabilities in a file with high churn could be mitigated using one or more of the following suggestions:

Subject the file to a security-focused review to ensure the voluminous churn has not inadvertently introduced a vulnerability.

Implementation

As the definition of the metric suggests, the implementation relies on the history of a file. In our implementation of the metric, we use the git log command to collect the metric from the source code repository of a project. As a direct consequence of our implementation approach, the churn metric can be collected for only those projects that use git as their source code repository.
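As an illustration of how churn can be collected with git log, here is a minimal sketch in Python. It is not the implementation described above: the use of the `--numstat` option, the function names `parse_numstat` and `file_churn`, and the choice to count a modified line as one added plus one deleted line (which is how git reports it) are all assumptions made for this example.

```python
import subprocess


def parse_numstat(output: str) -> int:
    """Sum the added and deleted line counts in `git log --numstat` output."""
    churn = 0
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not a numstat row
        added, deleted, _path = fields
        if added == "-" or deleted == "-":
            continue  # git prints "-" instead of counts for binary files
        churn += int(added) + int(deleted)
    return churn


def file_churn(repo: str, path: str) -> int:
    """Return the churn of `path` over its history in the git repository at `repo`."""
    result = subprocess.run(
        # An empty --format suppresses commit headers so only numstat rows remain.
        ["git", "-C", repo, "log", "--numstat", "--format=", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_numstat(result.stdout)
```

Calling `file_churn(".", "README.md")` inside a git working tree would return the churn of that file. The sketch does not follow renames; adding `--follow` to the command is one way to do so if history before a rename should count.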

The source code of the implementation of the metric will be made available on GitHub. To make it easier to collect the metric from your own project, the implementation will also be made available as a container image on Docker Hub.

Languages

The metric implementation is independent of programming language.

Example(s)

In this section, we present examples of the metric collected from popular open-source software projects.

Chromium

In this subsection, we present examples of the metric collected from Chromium, the open-source project behind the Google Chrome web browser.

The metric examples presented here were collected at commit 6b9bf768231f on the master branch of the Chromium source code repository.

Summary

Figure 1.1: Chromium Churn Distribution

Figure 1.2: Chromium Churn Discriminatory

Shown in Figure 1.1 is the distribution of the metric collected from source code files in the Chromium project. Shown in Figure 1.2 is the comparison of the distribution of the metric collected from source code files in the Chromium project that were not historically vulnerable and those that were.

Thresholds

The thresholds of the metric in the Chromium project, determined using the approach prescribed by Alves et al. [1], are shown in the table below.

Metric Range	value < 2,954	2,954 ≤ value < 5,421	5,421 ≤ value < 12,164	12,164 ≤ value
Risk Level	Low	Medium	High	Critical
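To show how the ranges in the table translate into a risk level, here is a small Python sketch. The threshold values come from the table above; the names `CHROMIUM_THRESHOLDS`, `RISK_LEVELS`, and `risk_level` are hypothetical and exist only for this example.

```python
from bisect import bisect_right

# Lower bounds of the Medium, High, and Critical ranges for Chromium,
# taken from the thresholds table.
CHROMIUM_THRESHOLDS = (2954, 5421, 12164)
RISK_LEVELS = ("Low", "Medium", "High", "Critical")


def risk_level(churn: int, thresholds=CHROMIUM_THRESHOLDS) -> str:
    """Map a churn value to a risk level; each lower bound is inclusive."""
    # bisect_right counts how many thresholds are less than or equal to churn.
    return RISK_LEVELS[bisect_right(thresholds, churn)]
```

With these thresholds, a churn of 2,953 is classified as Low, while 2,954 is the first value classified as Medium.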

Risky Files

The thresholds are used to classify source code files into appropriate risk levels. Shown below are the top and bottom three source code files from the Chromium project in each of the three non-trivial risk levels.

Medium Risk

Path	Churn	Percentile
`net/socket/tcp_socket_libevent.cc`	2,954	70.0006
`android_webview/browser/browser_view_renderer_impl.cc`	2,954	70.0006
`chrome/browser/autocomplete/scored_history_match_unittest.cc`	2,954	70.0006
...
`third_party/WebKit/JavaScriptCore/kjs/property_map.cpp`	5,410	79.9832
`content/browser/renderer_host/input/render_widget_host_latency_tracker_unittest.cc`	5,413	79.9935
`media/base/yuv_convert_unittest.cc`	5,413	79.9935

High Risk

Path	Churn	Percentile
`extensions/browser/api/usb/usb_api.cc`	5,421	80.0040
`third_party/npapi/npspy/extern/java/jni.h`	5,430	80.0131
`chrome/browser/ui/views/frame/browser_non_client_frame_view_ash_browsertest.cc`	5,430	80.0131
...
`cc/layers/picture_layer_impl.cc`	12,154	89.9804
`content/common/gpu/media/video_decode_accelerator_unittest.cc`	12,154	89.9804
`chrome/browser/devtools/devtools_window.cc`	12,162	89.9923

Critical Risk

Path	Churn	Percentile
`content/browser/renderer_host/compositor_impl_android.cc`	12,164	90.0011
`third_party/hunspell/src/hunspell/affixmgr.cxx`	12,199	90.0385
`webkit/quota/quota_manager.cc`	12,211	90.0499
...
`third_party/webdriver/atoms.cc`	197,901	99.0468
`third_party/libxml/src/testapi.c`	208,231	99.3454
`third_party/sqlite/amalgamation/sqlite3.c`	989,708	100

OpenBSD

In this subsection, we present examples of the metric collected from the UNIX-like operating system developed by the OpenBSD project.

The metric examples presented here were collected at commit dbdab68da3b on the master branch of the OpenBSD source code repository.

Summary

Figure 2.1: OpenBSD Churn Distribution

Figure 2.2: OpenBSD Churn Discriminatory

Shown in Figure 2.1 is the distribution of the metric collected from source code files in the OpenBSD project. Shown in Figure 2.2 is the comparison of the distribution of the metric collected from source code files in the OpenBSD project that were not historically vulnerable and those that were.

Thresholds

The thresholds of the metric in the OpenBSD project, determined using the approach prescribed by Alves et al. [1], are shown in the table below.

Metric Range	value < 5,604	5,604 ≤ value < 8,907	8,907 ≤ value < 14,786	14,786 ≤ value
Risk Level	Low	Medium	High	Critical
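The lower bounds of the Medium, High, and Critical ranges sit close to the 70th, 80th, and 90th percentiles reported in the Percentile column of the risky files tables. The Python sketch below shows one simplified way such cut-offs could be read off a list of churn values. It is an assumption-laden illustration, not the procedure of Alves et al. [1]: it treats every file equally and ignores any weighting or benchmark aggregation that approach prescribes. The function name `derive_thresholds` is hypothetical.

```python
def derive_thresholds(values, percentiles=(70, 80, 90)):
    """Return, for each percentile p, the smallest observed value v such
    that at least p percent of the values are less than or equal to v."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one value is required")
    thresholds = []
    for p in percentiles:
        # Integer ceiling of p * n / 100 gives the 1-based rank of the cut-off.
        rank = -(-p * n // 100)
        thresholds.append(ordered[max(rank, 1) - 1])
    return thresholds
```

For example, `derive_thresholds(range(1, 11))` picks the 7th, 8th, and 9th smallest of ten values.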

Risky Files

The thresholds are used to classify source code files into appropriate risk levels. Shown below are the top and bottom three source code files from the OpenBSD project in each of the three non-trivial risk levels.

Medium Risk

Path	Churn	Percentile
`sys/arch/amd64/amd64/machdep.c`	5,604	70.0051
`gnu/gcc/gcc/function.c`	5,605	70.0364
`usr.bin/ssh/cipher.c`	5,609	70.0392
...
`sys/dev/pci/drm/i915/i915_drv.h`	8,820	79.8418
`gnu/gcc/gcc/c-typeck.c`	8,823	79.9102
`gnu/usr.bin/gcc/gcc/config/sparc/sparc.c`	8,896	79.9761

High Risk

Path	Churn	Percentile
`gnu/usr.bin/binutils/gas/read.c`	8,907	80.0101
`gnu/gcc/gcc/config/sparc/sparc.c`	8,954	80.0751
`gnu/usr.bin/binutils-2.17/bfd/elf.c`	8,984	80.1422
...
`sys/dev/usb/umass.c`	14,571	89.8390
`gnu/llvm/tools/lldb/source/Plugins/Instruction/ARM/EmulateInstructionARM.cpp`	14,659	89.9408
`sys/dev/pci/drm/radeon/evergreen.c`	14,718	89.9956

Critical Risk

Path	Churn	Percentile
`sbin/iked/ikev2.c`	14,786	90.0494
`sbin/ifconfig/ifconfig.c`	14,892	90.0992
`gnu/usr.bin/cvs/src/rcs.c`	15,007	90.1577
...
`sys/dev/ic/aic7xxx.c`	69,097	99.2720
`gnu/usr.bin/binutils-2.17/opcodes/m32c-opc.c`	80,237	100
`gnu/usr.bin/perl/charclass_invlists.h`	111,266	100

Reference(s)

[1] Tiago L. Alves, Christiaan Ypma, and Joost Visser. 2010. Deriving Metric Thresholds From Benchmark Data. In Proceedings of the 26th International Conference on Software Maintenance (ICSM '10). 1-10. https://doi.org/10.1109/ICSM.2010.5609747

[2] Thomas Zimmermann, Nachiappan Nagappan, and Laurie Williams. 2010. Searching for a Needle in a Haystack: Predicting Security Vulnerabilities for Windows Vista. In Proceedings of the 3rd International Conference on Software Testing, Verification and Validation (ICST '10). 421-428. https://doi.org/10.1109/ICST.2010.32
