Contribution centrality is the node betweenness centrality of the nodes that represent files in a contribution network. A contribution network is a weighted, undirected bipartite graph with two sets of nodes: files and developers. An edge exists between a developer node and a file node if the developer made a change (commit) to the file. The weight of the edge is the number of changes that developer made to that file.
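The definition above can be illustrated with a minimal sketch that builds the weighted edges of a contribution network from a list of change records. The function name, the record format, and the developer and file names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_contribution_network(changes):
    """Return a mapping {(developer, file): weight}.

    `changes` is an iterable of (developer, file) pairs, one pair per
    change (commit) a developer made to a file. The weight of an edge is
    the number of such changes.
    """
    return Counter((developer, path) for developer, path in changes)


# Hypothetical change records: alice changed net/socket.c twice.
changes = [
    ("alice", "net/socket.c"),
    ("alice", "net/socket.c"),
    ("alice", "ui/window.c"),
    ("bob", "ui/window.c"),
]

network = build_contribution_network(changes)
developers = {developer for developer, _ in network}  # one node set
files = {path for _, path in network}                 # the other node set
```

Because every edge joins a developer to a file, the graph is bipartite by construction; no edge ever connects two developers or two files.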
The attribute that contribution centrality is expected to quantify is unfocused contribution. A file that is modified by a developer who is in turn modifying several other files is likely to have a high contribution centrality.
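To make the intuition concrete, the following sketch computes unnormalized node betweenness centrality with Brandes' algorithm on a tiny hypothetical contribution network. It is a pure-Python illustration that assumes an unweighted graph (edge weights are ignored), so it should not be read as the implementation used to produce the results in this document.

```python
from collections import deque


def betweenness(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adjacency` maps each node to the list of its neighbours. The result
    is unnormalized: each unordered pair of endpoints counts once.
    """
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        stack = []
        predecessors = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        sigma[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adjacency}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Every undirected pair was visited from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}


# Hypothetical network: a.c - alice - shared.c - bob - b.c
adjacency = {
    "alice": ["a.c", "shared.c"],
    "bob": ["shared.c", "b.c"],
    "a.c": ["alice"],
    "shared.c": ["alice", "bob"],
    "b.c": ["bob"],
}
scores = betweenness(adjacency)
```

In this example `shared.c` lies on every shortest path between the two halves of the network, so it receives the highest score among the file nodes, while `a.c` and `b.c`, each touched by a single developer, score zero.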
Contribution centrality has been empirically validated to be associated with historical vulnerabilities in software in the following peer-reviewed research studies:
The empirical evidence overwhelmingly supports the notion that a source code file with high contribution centrality is more likely to contain a security vulnerability.
The security implication(s) of a file having high contribution centrality could be one or more of the following:
The theoretical mitigation for high contribution centrality is to encourage developers to contribute changes only to a small collection of files that they are likely to be familiar with (i.e., files they have changed in the past). However, this mitigation is not practical, because an implementation task may require a developer to change a file that they have never contributed to before. Therefore, the risk of latent vulnerabilities in a file with high contribution centrality could be mitigated using one or more of the following suggestions:
As the definition of the contribution centrality metric suggests, the implementation of the metric relies on the contribution network. In our implementation of the metric, we use the `git log` command to build the contribution network, with developers and files as the two kinds of nodes and an edge between a developer node and a file node if the developer made a change to the file. We use an efficient Python module, graph-tool, to determine the node betweenness centrality of the file nodes. As a direct consequence of this approach, the contribution centrality metric can be collected only for projects that use git as their source code repository.
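As a rough illustration of how `git log` output could be turned into contribution network edges, the sketch below parses text in the shape produced by `git log --name-only --pretty=format:'@@%ae'`. The exact flags, the `@@` marker, and the use of the author email as the developer identity are assumptions made for this example; the document does not state which options the actual implementation uses.

```python
from collections import Counter

# Assumed marker placed before the author email by the format string.
AUTHOR_MARKER = "@@"


def parse_git_log(output):
    """Return {(author, path): number_of_changes} from git log text.

    Each commit is expected as a marker line ("@@" + author email)
    followed by the paths the commit touched, one per line. Blank lines
    are ignored.
    """
    edges = Counter()
    author = None
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(AUTHOR_MARKER):
            author = line[len(AUTHOR_MARKER):]
        elif author is not None:
            edges[(author, line)] += 1
    return edges


# Hypothetical log output for three commits.
sample = """@@alice@example.com
net/socket.c
ui/window.c

@@bob@example.com
ui/window.c

@@alice@example.com
net/socket.c
"""

edges = parse_git_log(sample)
```

In practice the text would come from running the command inside a repository, for example through `subprocess.run(..., capture_output=True, text=True)`. Parsing a string keeps the sketch independent of any particular repository.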
The source code of the implementation of the metric will be made available on GitHub. If you need to collect the metric from your project, the implementation will also be made available as a container image on Docker Hub.
The metric implementation is independent of programming language.
In this section, we present examples of the metric collected from popular open-source software projects.
In this subsection, we present examples of the metric collected from Chromium, the open-source project behind the Google Chrome web browser.
The metric examples presented here were collected at commit 6b9bf768231f on the master branch of the Chromium source code repository.
Shown in Figure 1.1 is the distribution of the metric collected from source code files in the Chromium project. Shown in Figure 1.2 is the comparison of the distribution of the metric collected from source code files in the Chromium project that were not historically vulnerable and those that were.
The thresholds of the metric in the Chromium project, determined using the approach prescribed by Alves et al. [1], are shown in the table below.
Metric Range | value < 394,024.6612 | 394,024.6612 ≤ value < 778,280.8592 | 778,280.8592 ≤ value < 2,207,375.7721 | 2,207,375.7721 ≤ value |
---|---|---|---|---|
Risk Level | Low | Medium | High | Critical |
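The percentiles reported alongside the metric values in this document suggest that the three thresholds sit at roughly the 70th, 80th, and 90th percentiles. The sketch below derives such cut points from a list of metric values. It is a deliberately simplified, unweighted reading of the idea; the approach of Alves et al. [1] additionally weights entities by size, which is omitted here. The function name and the sample values are hypothetical.

```python
def derive_thresholds(values, percentiles=(70, 80, 90)):
    """Return one threshold per (integer) percentile.

    Each threshold is the smallest observed value v such that at least
    p percent of the observations are less than or equal to v.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one metric value is required")
    thresholds = []
    for p in percentiles:
        rank = -(-p * n // 100)  # ceil(p * n / 100) in integer arithmetic
        thresholds.append(ordered[max(rank - 1, 0)])
    return thresholds


# Hypothetical metric values for ten files.
values = [5.0, 1.0, 3.0, 2.0, 4.0, 10.0, 8.0, 6.0, 9.0, 7.0]
medium, high, critical = derive_thresholds(values)
```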
The thresholds are used to classify source code files into appropriate risk levels. Shown below are the three lowest-valued and three highest-valued source code files from the Chromium project in each of the three non-trivial risk levels (Medium, High, and Critical, in that order).
Path | Contribution Centrality | Percentile |
---|---|---|
third_party/libvpx/source/config/linux/x64/vpx_dsp_rtcd.h | 394,024.6612 | 70.1218 |
third_party/libvpx/source/config/win/x64/vpx_dsp_rtcd.h | 394,024.6612 | 70.1218 |
third_party/libvpx/source/config/mac/x64/vpx_dsp_rtcd.h | 394,024.6612 | 70.1218 |
... | ... | ... |
chrome/browser/ui/search_engines/keyword_editor_controller_unittest.cc | 778,093.1622 | 79.9871 |
content/browser/service_worker/embedded_worker_instance_unittest.cc | 778,167.7353 | 79.9930 |
services/network/test/test_network_context.h | 778,231.8847 | 79.9946 |
Path | Contribution Centrality | Percentile |
---|---|---|
third_party/protobuf/src/google/protobuf/descriptor.cc | 778,280.8592 | 80.0469 |
base/trace_event/builtin_categories.h | 778,734.8852 | 80.0480 |
chrome/browser/search/iframe_source.cc | 778,970.7759 | 80.0487 |
... | ... | ... |
net/disk_cache/blockfile/backend_impl.cc | 2,204,535.6360 | 89.9789 |
content/browser/cache_storage/cache_storage_manager_unittest.cc | 2,204,802.4200 | 89.9956 |
chrome/browser/sync/sync_ui_util.cc | 2,204,947.2458 | 89.9978 |
Path | Contribution Centrality | Percentile |
---|---|---|
chrome/browser/ui/views/location_bar/icon_label_bubble_view.cc | 2,207,375.7721 | 90.0010 |
chrome/browser/extensions/activity_log/activity_database.h | 2,208,081.9178 | 90.0015 |
third_party/blink/renderer/core/frame/local_frame_view.h | 2,210,351.5774 | 90.0071 |
... | ... | ... |
chrome/browser/ui/browser.cc | 129,019,395.4290 | 99.9570 |
chrome/browser/chrome_content_browser_client.cc | 179,111,850.5509 | 99.9829 |
chrome/browser/about_flags.cc | 185,376,320.1206 | 100 |
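The classification of a file into a risk level from its metric value can be sketched as a lookup against the Chromium thresholds listed in the table of metric ranges. The constant and function names are hypothetical; the boundary handling follows the ranges in the table, where each threshold is the inclusive lower bound of its level.

```python
from bisect import bisect_right

# Thresholds for the Chromium project, taken from the table of metric ranges.
CHROMIUM_THRESHOLDS = [394_024.6612, 778_280.8592, 2_207_375.7721]
RISK_LEVELS = ["Low", "Medium", "High", "Critical"]


def risk_level(value, thresholds=CHROMIUM_THRESHOLDS):
    """Map a contribution centrality value to its risk level.

    bisect_right counts how many thresholds are less than or equal to
    `value`, which is exactly the index of the matching risk level.
    """
    return RISK_LEVELS[bisect_right(thresholds, value)]


# services/network/test/test_network_context.h
level_medium = risk_level(778_231.8847)
# chrome/browser/about_flags.cc
level_critical = risk_level(185_376_320.1206)
```

Passing a different list of three thresholds, such as those of another project, reuses the same function without modification.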
In this subsection, we present examples of the metric collected from the UNIX-like operating system developed by the OpenBSD project.
The metric examples presented here were collected at commit dbdab68da3b on the master branch of the OpenBSD source code repository.
Shown in Figure 2.1 is the distribution of the metric collected from source code files in the OpenBSD project. Shown in Figure 2.2 is the comparison of the distribution of the metric collected from source code files in the OpenBSD project that were not historically vulnerable and those that were.
The thresholds of the metric in the OpenBSD project, determined using the approach prescribed by Alves et al. [1], are shown in the table below.
Metric Range | value < 9,400.5027 | 9,400.5027 ≤ value < 39,017.2731 | 39,017.2731 ≤ value < 142,355.0648 | 142,355.0648 ≤ value |
---|---|---|---|---|
Risk Level | Low | Medium | High | Critical |
The thresholds are used to classify source code files into appropriate risk levels. Shown below are the three lowest-valued and three highest-valued source code files from the OpenBSD project in each of the three non-trivial risk levels (Medium, High, and Critical, in that order).
Path | Contribution Centrality | Percentile |
---|---|---|
lib/libcrypto/x509v3/pcy_cache.c | 9,400.5027 | 70.0064 |
lib/libssl/src/crypto/ecdh/ecdh.h | 9,400.5027 | 70.0064 |
lib/libssl/src/crypto/ec/ec2_smpl.c | 9,400.5027 | 70.0064 |
... | ... | ... |
usr.bin/paste/paste.c | 38,940.5872 | 79.9935 |
sys/dev/isa/ad1848var.h | 38,974.0713 | 79.9939 |
sys/arch/loongson/loongson/generic2e_machdep.c | 39,009.4321 | 79.9970 |
Path | Contribution Centrality | Percentile |
---|---|---|
usr.bin/ssh/misc.c | 39,017.2731 | 80.0130 |
sys/arch/sh/include/spinlock.h | 39,059.9255 | 80.0130 |
sys/arch/macppc/dev/zs.c | 39,069.0008 | 80.0194 |
... | ... | ... |
lib/libm/src/e_hypot.c | 141,958.1969 | 89.9964 |
games/monop/misc.c | 142,216.8915 | 89.9983 |
sys/sys/socketvar.h | 142,341.4730 | 89.9989 |
Path | Contribution Centrality | Percentile |
---|---|---|
sys/kern/kern_synch.c | 142,355.0648 | 90.0035 |
lib/libc/net/rcmdsh.c | 142,802.3236 | 90.0047 |
sys/dev/acpi/acpiasus.c | 142,817.0455 | 90.0060 |
... | ... | ... |
gnu/gcc/gcc/config/arm/unwind-arm.h | 15,442,307.0017 | 99.9983 |
lib/libcxx/include/stdio.h | 28,547,350.3944 | 99.9983 |
sys/lib/libsa/printf.c | 31,121,246.6001 | 100 |
[1] Tiago L. Alves, Christiaan Ypma, and Joost Visser. 2010. Deriving Metric Thresholds From Benchmark Data. In Proceedings of the 26th International Conference on Software Maintenance (ICSM '10). 1-10. https://doi.org/10.1109/ICSM.2010.5609747
[2] Andrew Meneely and Laurie Williams. 2009. Secure Open Source Collaboration: An Empirical Study of Linus’ Law. In Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS '09). New York, NY, USA, 453–462. https://doi.org/10.1145/1653662.1653717