Contribution Centrality

Contribution centrality is the node betweenness centrality of the nodes that represent files in a contribution network. A contribution network is a weighted, undirected bipartite graph with two sets of nodes: files and developers. An edge exists between a developer node and a file node if the developer made a change (commit) to the file. The weight of an edge is the number of changes that the developer made to the file.
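The definition above can be illustrated with a minimal sketch that builds a contribution network from a list of change records. The record format, the developer and file names, and the function names are hypothetical and are used only for illustration; they are not taken from the actual implementation.

```python
from collections import Counter, defaultdict

# Hypothetical change records: one (developer, file) pair per commit
# in which the developer changed the file.
changes = [
    ("alice", "net/socket.c"),
    ("alice", "net/socket.c"),
    ("alice", "ui/window.c"),
    ("bob", "ui/window.c"),
    ("bob", "docs/readme.txt"),
]


def build_edges(records):
    """Map each (developer, file) edge to its weight (number of changes)."""
    return Counter(records)


def build_adjacency(edges):
    """Build an undirected adjacency map of the bipartite graph.

    Nodes are tagged tuples, ("dev", name) or ("file", path), so that the
    two node sets stay disjoint even if a name and a path coincide.
    """
    adj = defaultdict(dict)
    for (dev, path), weight in edges.items():
        dev_node, file_node = ("dev", dev), ("file", path)
        adj[dev_node][file_node] = weight
        adj[file_node][dev_node] = weight
    return adj


edges = build_edges(changes)
adj = build_adjacency(edges)
print(edges[("alice", "net/socket.c")])    # weight of the alice/socket.c edge
print(sorted(adj[("file", "ui/window.c")]))  # developers who changed window.c
```

In this sketch every edge joins a developer node to a file node, which is what makes the graph bipartite.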

Interpretation

The attribute that contribution centrality is expected to quantify is unfocused contribution. A file that is modified by a developer who is, in turn, modifying several other files is likely to have a high value for the contribution centrality metric.

Evidence

Contribution centrality has been empirically validated to be associated with historical vulnerabilities in software in the following peer-reviewed research studies:

  1. Secure Open Source Collaboration: An Empirical Study of Linus’ Law [2]

The empirical evidence overwhelmingly supports the notion that a source code file with high contribution centrality is more likely to contain a security vulnerability.

Implications

The security implication(s) of a file having high contribution centrality could be one or more of the following:

  • Unfocused contribution to the file may increase the potential for latent vulnerabilities.

Mitigations

The theoretical mitigation for lowering the contribution centrality of a file is to encourage developers to contribute changes only to a small collection of files that they are likely to be familiar with (i.e., files they have changed in the past). However, this mitigation is not practical because an implementation task may require a developer to change a file that they have not contributed to before. Therefore, the risk of latent vulnerabilities in a file with high contribution centrality could be mitigated using one or more of the following suggestions:

  • Encourage developers to familiarize themselves with the file by participating in its review prior to contributing any changes.

Implementation

As its definition suggests, the contribution centrality metric is computed from the contribution network. In our implementation, we use the git log command to build the contribution network, with developers and files as the two kinds of nodes and an edge between a developer node and a file node if the developer made a change to the file. We use an efficient Python module, graph-tool, to compute the node betweenness centrality of the file nodes. As a direct consequence of this approach, the metric can be collected only for projects that use git as their source code repository.
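The pipeline described above can be sketched in plain Python. This is not the actual implementation: it replaces graph-tool with a small, unweighted version of Brandes' betweenness algorithm so that the example has no dependencies, and it ignores edge weights because the text does not say how they enter the computation. The @@AUTHOR@@ marker, the function names, and the sample log are assumptions made for illustration. The log text is assumed to come from a command such as git log --name-only --pretty=format:"@@AUTHOR@@ %ae".

```python
from collections import defaultdict, deque

MARK = "@@AUTHOR@@"  # hypothetical marker that prefixes each author line


def parse_git_log(text):
    """Build an undirected developer/file adjacency map from log text."""
    adj = defaultdict(set)
    author = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(MARK):
            author = ("dev", line[len(MARK):].strip())
        elif author is not None:
            file_node = ("file", line)
            adj[author].add(file_node)
            adj[file_node].add(author)
    return adj


def betweenness(adj):
    """Unnormalized betweenness of every node (Brandes' algorithm, unweighted)."""
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        preds = {v: [] for v in adj}
        paths = dict.fromkeys(adj, 0)
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # The graph is undirected, so each node pair was counted twice.
    return {v: s / 2 for v, s in score.items()}


def file_centrality(adj):
    """Keep only the scores of the file nodes, keyed by path."""
    return {n[1]: s for n, s in betweenness(adj).items() if n[0] == "file"}


sample_log = """\
@@AUTHOR@@ alice@example.com
a.c
b.c

@@AUTHOR@@ bob@example.com
b.c
c.c
"""

scores = file_centrality(parse_git_log(sample_log))
for path, value in sorted(scores.items()):
    print(path, value)
```

In the sample log, b.c is the only file changed by both developers, so every shortest path between the two developers' neighborhoods passes through it and it receives the only nonzero score.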

The source code of the implementation of the metric will be made available on GitHub. If you need to collect the metric from your project, the implementation will also be made available as a container image on Docker Hub.

Languages

The metric implementation is independent of programming language.

Example(s)

In this section, we present examples of the metric collected from popular open-source software projects.

Chromium

In this subsection, we present examples of the metric collected from Chromium, the open-source project behind the Google Chrome web browser.

The metric examples presented here were collected at commit 6b9bf768231f on the master branch of the Chromium source code repository.

Summary

Figure 1.1: Chromium Contribution Centrality Distribution
Figure 1.2: Chromium Contribution Centrality Discriminatory

Shown in Figure 1.1 is the distribution of the metric collected from source code files in the Chromium project. Shown in Figure 1.2 is the comparison of the distribution of the metric collected from source code files in the Chromium project that were not historically vulnerable and those that were.

Thresholds

The thresholds of the metric in the Chromium project, determined using the approach prescribed by Alves et al. [1], are shown in the table below.

Risk Level | Metric Range
Low | value < 394,024.6612
Medium | 394,024.6612 ≤ value < 778,280.8592
High | 778,280.8592 ≤ value < 2,207,375.7721
Critical | 2,207,375.7721 ≤ value
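The threshold table can be read as a simple lookup from a metric value to a risk level. The following minimal sketch uses the Chromium thresholds listed above; the function and constant names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of the Medium, High, and Critical ranges for Chromium.
CHROMIUM_THRESHOLDS = (394_024.6612, 778_280.8592, 2_207_375.7721)
RISK_LEVELS = ("Low", "Medium", "High", "Critical")


def risk_level(value, thresholds=CHROMIUM_THRESHOLDS):
    """Return the risk level whose range contains the metric value.

    bisect_right counts the thresholds that are <= value, which matches
    the "lower bound <= value < upper bound" ranges in the table.
    """
    return RISK_LEVELS[bisect_right(thresholds, value)]


print(risk_level(778_231.8847))      # services/network/test/test_network_context.h
print(risk_level(185_376_320.1206))  # chrome/browser/about_flags.cc
```

Passing a different tuple of three thresholds applies the same lookup to another project.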

Risky Files

The thresholds are used to classify source code files into risk levels. Shown below are the bottom three and top three source code files from the Chromium project, ordered by metric value, in each of the three non-trivial risk levels (Medium, High, and Critical, in that order).

Path Contribution Centrality Percentile
third_party/libvpx/source/config/linux/x64/vpx_dsp_rtcd.h 394,024.6612 70.1218
third_party/libvpx/source/config/win/x64/vpx_dsp_rtcd.h 394,024.6612 70.1218
third_party/libvpx/source/config/mac/x64/vpx_dsp_rtcd.h 394,024.6612 70.1218
...
chrome/browser/ui/search_engines/keyword_editor_controller_unittest.cc 778,093.1622 79.9871
content/browser/service_worker/embedded_worker_instance_unittest.cc 778,167.7353 79.9930
services/network/test/test_network_context.h 778,231.8847 79.9946

Path Contribution Centrality Percentile
third_party/protobuf/src/google/protobuf/descriptor.cc 778,280.8592 80.0469
base/trace_event/builtin_categories.h 778,734.8852 80.0480
chrome/browser/search/iframe_source.cc 778,970.7759 80.0487
...
net/disk_cache/blockfile/backend_impl.cc 2,204,535.6360 89.9789
content/browser/cache_storage/cache_storage_manager_unittest.cc 2,204,802.4200 89.9956
chrome/browser/sync/sync_ui_util.cc 2,204,947.2458 89.9978

Path Contribution Centrality Percentile
chrome/browser/ui/views/location_bar/icon_label_bubble_view.cc 2,207,375.7721 90.0010
chrome/browser/extensions/activity_log/activity_database.h 2,208,081.9178 90.0015
third_party/blink/renderer/core/frame/local_frame_view.h 2,210,351.5774 90.0071
...
chrome/browser/ui/browser.cc 129,019,395.4290 99.9570
chrome/browser/chrome_content_browser_client.cc 179,111,850.5509 99.9829
chrome/browser/about_flags.cc 185,376,320.1206 100

OpenBSD

In this subsection, we present examples of the metric collected from the UNIX-like operating system developed by the OpenBSD project.

The metric examples presented here were collected at commit dbdab68da3b on the master branch of the OpenBSD source code repository.

Summary

Figure 2.1: OpenBSD Contribution Centrality Distribution
Figure 2.2: OpenBSD Contribution Centrality Discriminatory

Shown in Figure 2.1 is the distribution of the metric collected from source code files in the OpenBSD project. Shown in Figure 2.2 is the comparison of the distribution of the metric collected from source code files in the OpenBSD project that were not historically vulnerable and those that were.

Thresholds

The thresholds of the metric in the OpenBSD project, determined using the approach prescribed by Alves et al. [1], are shown in the table below.

Risk Level | Metric Range
Low | value < 9,400.5027
Medium | 9,400.5027 ≤ value < 39,017.2731
High | 39,017.2731 ≤ value < 142,355.0648
Critical | 142,355.0648 ≤ value

Risky Files

The thresholds are used to classify source code files into risk levels. Shown below are the bottom three and top three source code files from the OpenBSD project, ordered by metric value, in each of the three non-trivial risk levels (Medium, High, and Critical, in that order).

Path Contribution Centrality Percentile
lib/libcrypto/x509v3/pcy_cache.c 9,400.5027 70.0064
lib/libssl/src/crypto/ecdh/ecdh.h 9,400.5027 70.0064
lib/libssl/src/crypto/ec/ec2_smpl.c 9,400.5027 70.0064
...
usr.bin/paste/paste.c 38,940.5872 79.9935
sys/dev/isa/ad1848var.h 38,974.0713 79.9939
sys/arch/loongson/loongson/generic2e_machdep.c 39,009.4321 79.9970

Path Contribution Centrality Percentile
usr.bin/ssh/misc.c 39,017.2731 80.0130
sys/arch/sh/include/spinlock.h 39,059.9255 80.0130
sys/arch/macppc/dev/zs.c 39,069.0008 80.0194
...
lib/libm/src/e_hypot.c 141,958.1969 89.9964
games/monop/misc.c 142,216.8915 89.9983
sys/sys/socketvar.h 142,341.4730 89.9989

Path Contribution Centrality Percentile
sys/kern/kern_synch.c 142,355.0648 90.0035
lib/libc/net/rcmdsh.c 142,802.3236 90.0047
sys/dev/acpi/acpiasus.c 142,817.0455 90.0060
...
gnu/gcc/gcc/config/arm/unwind-arm.h 15,442,307.0017 99.9983
lib/libcxx/include/stdio.h 28,547,350.3944 99.9983
sys/lib/libsa/printf.c 31,121,246.6001 100

Reference(s)

[1] Tiago L. Alves, Christiaan Ypma, and Joost Visser. 2010. Deriving Metric Thresholds From Benchmark Data. In Proceedings of the 26th International Conference on Software Maintenance (ICSM '10). 1-10. https://doi.org/10.1109/ICSM.2010.5609747

[2] Andrew Meneely and Laurie Williams. 2009. Secure Open Source Collaboration: An Empirical Study of Linus’ Law. In Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS '09). New York, NY, USA, 453–462. https://doi.org/10.1145/1653662.1653717