Collaboration Centrality

The collaboration centrality of a file is the maximum of the edge centralities of the edges that represent the file in a collaboration network. A collaboration network is an unweighted, undirected graph in which nodes represent developers and edges represent files. An edge exists between two developers if both have changed at least one file in common.
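The definition above can be illustrated with a small, self-contained sketch. It makes several assumptions that the definition does not state: edge centrality is taken to be unnormalized edge betweenness (computed here with Brandes' algorithm in plain Python), a single edge between two developers represents every file they both changed, and the developer and file names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_network(file_devs):
    """Return (adjacency, edge_files) for the collaboration network."""
    adjacency = defaultdict(set)
    edge_files = defaultdict(set)  # developer pair -> files that edge represents
    for path, devs in file_devs.items():
        for a, b in combinations(sorted(set(devs)), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
            edge_files[(a, b)].add(path)
    return dict(adjacency), dict(edge_files)


def edge_betweenness(adjacency):
    """Unnormalized edge betweenness for an unweighted, undirected graph."""
    score = defaultdict(float)
    for source in adjacency:
        # Breadth-first search that counts shortest paths from source.
        dist = {source: 0}
        sigma = defaultdict(float)
        sigma[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, farthest nodes first.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[tuple(sorted((v, w)))] += share
                delta[v] += share
    # Every undirected shortest path was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in score.items()}


def collaboration_centrality(file_devs):
    """Map each file to the maximum centrality of the edges representing it."""
    adjacency, edge_files = build_network(file_devs)
    centrality = edge_betweenness(adjacency)
    result = {}
    for edge, files in edge_files.items():
        for path in files:
            result[path] = max(result.get(path, 0.0), centrality[edge])
    return result


# Hypothetical data: two developer pairs joined only through bridge.c.
example = {
    "a.c": ["alice", "bob"],
    "bridge.c": ["bob", "carol"],
    "b.c": ["carol", "dave"],
    "solo.c": ["alice"],
}
print(collaboration_centrality(example))
```

In this toy network, `bridge.c` scores 4.0 while `a.c` and `b.c` score 3.0 each, because every shortest path between the two pairs crosses the bob-carol edge. A file changed by a single developer (`solo.c`) is represented by no edge, so the sketch leaves it out of the result.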

Interpretation

Collaboration centrality is expected to quantify diversity in perspective or, rather, the lack thereof. A file that is modified by multiple clusters of otherwise independent developers is likely to have a high collaboration centrality. Such clusters may arise when developers work in, or are organized into, logical teams.

Evidence

Collaboration centrality has been empirically validated to be associated with historical vulnerabilities in software in the following peer-reviewed research study:

Secure Open Source Collaboration: An Empirical Study of Linus’ Law [2]

The empirical evidence overwhelmingly supports the notion that a source code file with high collaboration centrality is more likely to contain a security vulnerability.

Implications

The security implication(s) of a file having high collaboration centrality could be one or more of the following:

Lack of diversity in perspective in contributions to the file may increase the potential for latent vulnerabilities being overlooked.

Mitigations

In theory, the collaboration centrality of a file can be lowered by encouraging developers from independent clusters to contribute changes to the file. However, this is not practical because, as mentioned earlier, the clusters may arise from developers working in, or being organized into, logical teams. Therefore, the risk of latent vulnerabilities in a file with high collaboration centrality could be mitigated using one or more of the following suggestions:

Encourage developers from the otherwise independent developer clusters to review the file.

Implementation

As the definition of the collaboration centrality metric suggests, the implementation of the metric relies on the collaboration network. In our implementation of the metric, we use the git log command to build the collaboration network with developers as nodes and files as edges. We use an efficient Python module, graph-tool, to compute the edge centralities. The collaboration centrality of a file is then the maximum of the centralities of all edges representing the file. As a direct consequence of this approach, the collaboration centrality metric can be collected only for projects that use git as their source code repository.
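The first step, recovering who changed which file from git log, might look like the sketch below. It is not the implementation described above: the log format, the `@@@` marker, the use of the author email as developer identity, and the function names are all assumptions made for illustration. Renames and paths that git quotes are not handled.

```python
import subprocess
from collections import defaultdict

MARKER = "@@@"  # hypothetical prefix marking an author line in the log output


def read_git_log(repo_path):
    """Run git log and return its output: an author line per commit, then paths."""
    command = [
        "git", "-C", repo_path, "log", "--no-merges", "--name-only",
        "--pretty=format:" + MARKER + "%ae",
    ]
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


def parse_git_log(log_text):
    """Map each file path to the set of developers (author emails) who changed it."""
    file_devs = defaultdict(set)
    author = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate commits
        if line.startswith(MARKER):
            author = line[len(MARKER):]
        elif author is not None:
            file_devs[line].add(author)
    return dict(file_devs)


# Hypothetical log output for two commits.
sample = "@@@alice@example.com\nsrc/a.c\nsrc/b.c\n\n@@@bob@example.com\nsrc/a.c\n"
print(parse_git_log(sample))
```

The resulting mapping from files to developers is what the collaboration network is built from: any two developers who appear in the set for the same file are joined by an edge.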

The source code of the implementation of the metric will be made available on GitHub. If you need to collect the metric from your project, the implementation will also be made available as a container image on Docker Hub.

Languages

The metric implementation is independent of programming language.

Example(s)

In this section, we present examples of the metric collected from popular open-source software projects.

Chromium

In this subsection, we present examples of the metric collected from Chromium, the open-source project behind the Google Chrome web browser.

The metric examples presented here were collected at commit 6b9bf768231f on the master branch of the Chromium source code repository.

Summary

Figure 1.1: Chromium Collaboration Centrality Distribution

Figure 1.2: Chromium Collaboration Centrality Discriminatory

Shown in Figure 1.1 is the distribution of the metric collected from source code files in the Chromium project. Shown in Figure 1.2 is the comparison of the distribution of the metric collected from source code files in the Chromium project that were not historically vulnerable and those that were.

Thresholds

The thresholds of the metric in the Chromium project, determined using the approach prescribed by Alves et al. [1], are shown in the table below.

Metric Range	value < 217.4016	217.4016 ≤ value < 285.9418	285.9418 ≤ value < 438.6604	438.6604 ≤ value
Risk Level	Low	Medium	High	Critical
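As a minimal sketch of how the table can be applied, the function below maps a metric value to a risk level using the Chromium thresholds listed above. The function and constant names are hypothetical; only the threshold values and level names come from the table.

```python
from bisect import bisect_right

# Lower bounds of the Medium, High, and Critical ranges for Chromium.
CHROMIUM_THRESHOLDS = [217.4016, 285.9418, 438.6604]
RISK_LEVELS = ["Low", "Medium", "High", "Critical"]


def risk_level(value, thresholds=CHROMIUM_THRESHOLDS):
    """Return the risk level whose range contains value (lower bounds inclusive)."""
    return RISK_LEVELS[bisect_right(thresholds, value)]


print(risk_level(217.4016))  # Medium
print(risk_level(438.3339))  # High
```

Passing a different list of three thresholds classifies values for another project in the same way.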

Risky Files

The thresholds are used to classify source code files into appropriate risk levels. Shown below are the three lowest-ranked and three highest-ranked source code files from the Chromium project in each of the three non-trivial risk levels (Medium, High, and Critical).

Medium Risk

Path	Collaboration Centrality	Percentile
`chrome/browser/jumplist_win.cc`	217.4016	71.8508
`content/common/gpu/client/gpu_channel_host.cc`	217.4016	71.8508
`chrome/browser/ui/webui/chromeos/keyboard_overlay_ui.cc`	217.4016	71.8508
...
`base/i18n/rtl.cc`	285.8883	79.9950
`media/formats/mp2t/mp2t_stream_parser_unittest.cc`	285.8883	79.9950
`chromecast/app/android/cast_crash_reporter_client_android.h`	285.8883	79.9950

High Risk

Path	Collaboration Centrality	Percentile
`chrome/installer/util/install_util.cc`	285.9418	80.0069
`base/threading/thread_local_storage_unittest.cc`	285.9418	80.0069
`chrome/installer/util/install_util_unittest.cc`	285.9418	80.0069
...
`components/autofill/core/browser/autofill_metrics.cc`	438.2340	89.9882
`third_party/blink/renderer/core/page/scrolling/scrolling_coordinator.h`	438.3339	89.9970
`third_party/blink/renderer/core/page/scrolling/scrolling_coordinator.cc`	438.3339	89.9970

Critical Risk

Path	Collaboration Centrality	Percentile
`chrome/browser/safe_browsing/download_protection/download_protection_service_unittest.cc`	438.6604	90.0182
`components/autofill/core/browser/personal_data_manager_unittest.cc`	439.4133	90.0755
`components/arc/arc_service_manager.h`	439.6580	90.0949
...
`third_party/libwebp/fuzzing/fuzz_advanced_api.cc`	8,236.0000	100.0000
`testing/coverage_util_ios.cc`	8,236.0000	100.0000
`testing/coverage_util_ios.h`	8,236.0000	100.0000

OpenBSD

In this subsection, we present examples of the metric collected from the UNIX-like operating system developed by the OpenBSD project.

The metric examples presented here were collected at commit dbdab68da3b on the master branch of the OpenBSD source code repository.

Summary

Figure 2.1: OpenBSD Collaboration Centrality Distribution

Figure 2.2: OpenBSD Collaboration Centrality Discriminatory

Shown in Figure 2.1 is the distribution of the metric collected from source code files in the OpenBSD project. Shown in Figure 2.2 is the comparison of the distribution of the metric collected from source code files in the OpenBSD project that were not historically vulnerable and those that were.

Thresholds

The thresholds of the metric in the OpenBSD project, determined using the approach prescribed by Alves et al. [1], are shown in the table below.

Metric Range	value < 5.5194	5.5194 ≤ value < 6.7249	6.7249 ≤ value < 8.6409	8.6409 ≤ value
Risk Level	Low	Medium	High	Critical

Risky Files

The thresholds are used to classify source code files into appropriate risk levels. Shown below are the three lowest-ranked and three highest-ranked source code files from the OpenBSD project in each of the three non-trivial risk levels (Medium, High, and Critical).

Medium Risk

Path	Collaboration Centrality	Percentile
`sbin/isakmpd/cookie.h`	5.5194	70.0090
`sbin/isakmpd/hash.c`	5.5194	70.0090
`sbin/isakmpd/math_mp.h`	5.5194	70.0090
...
`sys/net/pf_ruleset.c`	6.7211	79.9225
`sys/netinet/ip_var.h`	6.7211	79.9225
`sys/net/if_loop.c`	6.7211	79.9225

High Risk

Path	Collaboration Centrality	Percentile
`usr.bin/file/magic.c`	6.7249	80.0564
`sys/arch/i386/i386/est.c`	6.7249	80.0564
`sys/arch/arm/xscale/pxa2x0_apm.c`	6.7249	80.0564
...
`sys/arch/mips64/mips64/fp_emulate.c`	8.5269	89.9813
`lib/libexpat/lib/internal.h`	8.5284	89.9813
`sys/arch/sparc64/dev/ce4231var.h`	8.5933	89.9821

Critical Risk

Path	Collaboration Centrality	Percentile
`sys/dev/pci/if_vrreg.h`	8.6409	90.0442
`sys/netinet/in_pcb.h`	8.6409	90.0442
`sys/netinet/in_pcb.c`	8.6409	90.0442
...
`gnu/usr.bin/binutils/gas/config/tc-alpha.h`	83.1874	99.9968
`gnu/usr.bin/binutils/bfd/ecoff.c`	83.1874	99.9968
`libexec/login_skey/login_skey.c`	85.5167	100.0000

Reference(s)

[1] Tiago L. Alves, Christiaan Ypma, and Joost Visser. 2010. Deriving Metric Thresholds From Benchmark Data. In Proceedings of the 26th International Conference on Software Maintenance (ICSM '10). 1–10. https://doi.org/10.1109/ICSM.2010.5609747

[2] Andrew Meneely and Laurie Williams. 2009. Secure Open Source Collaboration: An Empirical Study of Linus’ Law. In Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS '09). New York, NY, USA, 453–462. https://doi.org/10.1145/1653662.1653717
