
Open Source AI in Defense Tech: The Debate, Risks, and Safeguards for Developers

Published: January 28th, 2025

According to a Preliminary Assessment by the Center for Strategic & International Studies (CSIS), open-source software is spread across 96% of US civil and military codebases. It is no industry secret that virtually all tech companies rely heavily on open-source software to build their products.

The rise in defense tech startups, driven by global security challenges, places them between traditional startups with a well-trodden path (build, sell, scale) and heavily regulated suppliers to defense authorities.

Admittedly, seed-stage or even Series A startups don't need to put all their resources into overseeing the security risks of the open-source software (OSS), and open-source AI in particular, that they use in their products. However, their engineers should be ready to answer simple questions (from themselves, stakeholders, and government clients) about the security maturity of the OSS they use and the sources of the datasets used to train the AI, and to provide relevant documentation on request.

The Baseline

Open-source AI appears as a next-level dimension of the traditional OSS that technology companies use in their development cycles. OSS tools, approaches, and instruments are well developed, well studied, and largely undisputed. Until dual-use open foundation models come into play.

As with OSS in general, an open foundation model's underlying architecture is open to the public and can be examined, modified, and applied for various purposes (depending on the license type) by end users, whether individuals or businesses.

An open-source AI model's weights, training data, and the corresponding code to run it can be tuned and adapted to proprietary software products, while the openness allows other players to integrate and maintain them, reducing all types of costs. Sounds like a dream.
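To make that openness tangible, here is a minimal sketch, assuming the Hugging Face transformers library and a hypothetical repository id, of pulling a model's published weights and running them locally. This is precisely the access that closed, API-only models do not grant.

    # Minimal sketch: loading an open model's published weights with the
    # Hugging Face `transformers` library. The repository id below is
    # hypothetical; substitute a model whose license permits your use case.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "example-org/open-model-7b"  # illustrative placeholder
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Because the weights are local, they can be inspected, fine-tuned,
    # and embedded in a proprietary product (license permitting).
    inputs = tokenizer("Open weights can be fine-tuned because", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))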

The recent case of Llama, a family of autoregressive large language models (LLMs) released by Meta AI in 2023, has strengthened every possible concern about the existing mechanisms for preventing unauthorized use of open-source AI. After the first Llama models' weights leaked in February 2023, and after six Chinese researchers later adapted Meta's Llama 13B LLM to produce ChatBIT, an AI tool designed to enhance intelligence gathering, Meta decided it was time to make the models' weights public, not only the inference code. In other words, to encourage democracies to use the Llama foundation model proactively and in authorized ways, while not-so-democratic actors were already using it in many others.

Long before AI models spread, the same "unauthorized" use was happening with open-source software. Everyone is aware of the 2021 Log4Shell vulnerability in Log4j, a logging framework developed by the Apache Software Foundation, which granted hackers total control of devices running unpatched versions of Log4j. However, no one is aware of how many more cases of malicious OSS use have gone undiscovered and untackled, or how many hidden surprises are embedded in the proprietary software we use.

The Debate & Public Discussions

The emerging use of open-source AI models in cybersecurity and defense tech gives rise to new public discussions. During 2023-2024, US governmental bodies and agencies released numerous documents and requests for information regarding the use of OSS and open foundation models for military and cybersecurity purposes.

The general consensus among the security community is that malicious actors "will get their hands on tools whether or not they are open-sourced"1 and that the advantages of open-source security software outweigh the harm that malicious actors might leverage.

However, the US authorities, commanding the largest military budgets, are still actively gathering feedback from the community and businesses: what are the risks of open-source AI, do they outweigh the benefits of using OSS and AI, and how should those risks be mitigated while building OSS solutions into the national defense system?

Open-source software, and AI in particular, is widely used in the US military ecosystem. In its Preliminary Assessment, the CSIS gives examples of Army smartphones, Navy warships, and Space Force missile-warning satellites that run on Linux-derived operating systems, and of AI-powered F-16s that run on open-source orchestration frameworks like Kubernetes.2
The ability to have full access to model weights and the software's underlying architecture is vital to government defense stakeholders. The same CSIS report highlights a case in which the US Department of Defense (DOD), in its nearly $70 billion Future Long-Range Assault Aircraft program, rejected a $3.6 billion lower bid over incomplete Modular Open System Approach (MOSA) requirements, choosing instead a vendor that provided full access to the aircraft technical data packages (TDPs) necessary for maintenance.2

That said, defense tech companies should build their software architecture with complete and accurate documentation by design. Who knows when the startup will be bidding for a DOD contract?

Here are three documents from US government agencies that defense tech startup developers can use as a blueprint:

1. CISA Open Source Software Security Roadmap

In its Open Source Software Security Roadmap of September 2023, the Cybersecurity and Infrastructure Security Agency (CISA) defines the malicious compromise of OSS components, leading to downstream compromises, as one of the OSS security risks. CISA mentions real-world examples like embedding cryptominers in open-source packages, modifying source code with protestware that deletes a user's files, and employing typosquatting attacks that take advantage of developer errors.3

Key takeaways for defense tech developers:

  • Even CISA aims to understand where “the greatest dependencies lie for the federal government and critical infrastructure”.3
  • CISA will identify the OSS libraries that are most used to support critical functions across the federal government and critical infrastructure.
  • The framework will identify various categorizations of OSS components: for example, components that are malicious, which the federal government should stop using, or components that are well supported, which the government may continue using.3

And developers should be among the first to learn what CISA has identified.

2. The Summary of the 2023 Open-Source Software Security Request for Information (RFI) Report

In the original Request for Information on Open-Source Software Security: Areas of Long-Term Focus and Prioritization, dated August 10, 2023,4 the Office of the National Cyber Director (ONCD) highlighted focus areas for securing open-source software foundations and asked the private sector for input on these questions:

  • Which of the potential focus areas should be prioritized for any potential action?
  • What areas of focus are the most time-sensitive or should be developed first?
  • What technical, policy, or economic challenges must the Government consider when implementing these solutions?

Among others, the ONCD defined four important sub-areas in the Secure Open-Source Software Foundations section:

1. Fostering the adoption of memory-safe programming languages;
2. Reducing entire classes of vulnerabilities at scale;
3. Strengthening the software supply chain;
4. Developer education.

A Summary Report on this Request for Information was published by the ONCD on August 9, 2024. However, what the ONCD asked in the initial Request is even more important than the answers, because these are the questions defense tech startups will face at every stage (fundraising, bidding, exit).

Respondents provided 107 public submissions, which are publicly available (you can search for your favorite company's reply; here's GitHub's response).

Key takeaways for defense tech developers:

  • Respondents recommended standardizing SBOMs (Software Bills of Materials; see the example from the npm repository).
  • They noted this can be achieved by supplying agencies with SBOMs and attestation management tools that provide a high level of vulnerability database integration.5
  • The respondents commented that it is preferable to use tools that create SBOMs during the build process, because build-time tools generally have access to more detailed and accurate information than what is available from analyzing a finished artifact.5 A sketch of consuming such an SBOM follows below.
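To illustrate what consuming a build-time SBOM can look like, here is a minimal sketch assuming a CycloneDX JSON file named sbom.json; it lists each declared component so the inventory can be cross-checked against a vulnerability database.

    # Minimal sketch: read a CycloneDX JSON SBOM and list its components,
    # e.g. as input for a vulnerability-database lookup. Assumes a file
    # named "sbom.json" produced by a build-time SBOM tool.
    import json

    with open("sbom.json") as f:
        sbom = json.load(f)

    for component in sbom.get("components", []):
        name = component.get("name", "<unknown>")
        version = component.get("version", "<unknown>")
        purl = component.get("purl", "")  # package URL identifying the origin
        print(f"{name}=={version} {purl}")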

3. Request for Comment from the National Telecommunications and Information Administration (NTIA)

President Biden's Executive Order on "Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence" of October 30, 2023, prompted the US National Telecommunications and Information Administration (NTIA) to issue a Request for Comment on these matters.

Like the two previous requests for information, the one from NTIA also aims to gather industry feedback regarding open-source AI. NTIA states that the recent introduction of large, publicly available models, such as those from Google, Meta, Stability AI, Mistral, the Allen Institute for AI, and EleutherAI, "have fostered an ecosystem of increasingly "open" advanced AI models, allowing developers and others to fine-tune models using widely available computing."6

While confirming the well-known observation that the risks around open foundation models are tempered by their wide availability, the agency aims to look a few steps ahead and tackle the emerging issues of open AI (or at least to attempt to).

Key takeaways for defense tech developers (a little longer than in the previous section but worth reading):

  • NTIA is asking for the community's opinion on when (if ever) entities deploying AI should disclose to users or the general public that they are using open foundation models, with or without widely available weights.

  • NTIA is raising a consequential question: what should the role of model hosting services (e.g., HuggingFace, GitHub, etc.) be in making dual-use models with open weights more or less available? Should hosting services host models that do not meet certain safety standards? By whom should those standards be prescribed? This echoes the DMCA's safe-harbor concept for online service providers, but it is far harder to apply in practice.

  • NTIA is asking which licenses are most prominent in the context of making model weights widely available, and what the tradeoffs associated with each of these licenses are.

  • NTIA is researching whether there are concerns about potential barriers to interoperability stemming from different, incompatible "open" licenses (e.g., licenses with conflicting requirements) applied to AI components. Would standardizing license terms specifically for foundation model weights be beneficial? Are there existing examples that could be useful?6

So be prepared to provide the answers one day and to explain how the open-source license of an open foundation model interacts with the proprietary product you develop (the link between the two will be fairly static, since the AI model will serve as the core of the product).

In other words, it's better to learn in advance from expert US institutions and to implement their insights in cutting-edge defense tech projects built on open AI.

Safeguards for Developers

Principles for Package Repository Security

In February 2024, the Cybersecurity and Infrastructure Security Agency (CISA) together with the Open Source Security Foundation (OpenSSF) published a framework to outline voluntary security maturity levels for package repositories.

They define four levels of security maturity for package repositories:

  • Level 0 is defined as having very little security maturity.
  • Level 1 is defined as having basic security maturity, which includes supporting basic security features like multi-factor authentication (MFA) and allowing security researchers to report vulnerabilities. All package management ecosystems should be working towards at least this level.
  • Level 2 is defined as having moderate security, which includes actions like requiring MFA for critical packages and warning users of known security vulnerabilities.
  • Level 3 is defined as having advanced security, which includes actions like requiring MFA for all maintainers and supporting build provenance for packages.

As an example of safer authentication at package repositories: for Level 1, the package repository should require users to verify their email address. It should also support strong multi-factor authentication (MFA) via, at minimum, time-based one-time passcodes (TOTP). The framework states that solely offering weaker forms of MFA, such as SMS, email, or phone-call-based MFA, would not meet this requirement.7
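To illustrate the TOTP baseline, here is a minimal sketch using the third-party pyotp library; a real deployment would store the secret per user account and handle enrollment (e.g., via a QR code), which this sketch omits.

    # Minimal sketch of TOTP-based MFA, the minimum acceptable MFA for
    # Level 1 per the Principles quoted above. Uses the third-party
    # `pyotp` library (pip install pyotp).
    import pyotp

    # On enrollment: generate a per-user secret and store it server-side;
    # the user adds the same secret to an authenticator app.
    secret = pyotp.random_base32()
    totp = pyotp.TOTP(secret)

    # The authenticator app derives a short-lived 6-digit code from the
    # shared secret; the server verifies it against the current window.
    code = totp.now()          # stand-in for the code the user submits
    print(totp.verify(code))   # True if the code matches the time window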

For Level 2 authentication security, the package repository should detect abandoned email domains, to prevent domain resurrection from enabling account takeover via the recovery process. The authors suggest this may look like doing a WHOIS lookup on all registered email domains and removing the ability to recover an account via an email domain that has been abandoned.
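A minimal sketch of that check, assuming the third-party python-whois package (whose returned fields vary by registry, so treat the details as assumptions rather than a reference implementation):

    # Minimal sketch: flag email domains whose registration appears lapsed,
    # a signal that account recovery via that domain should be disabled.
    # Uses the third-party `python-whois` package (pip install python-whois).
    from datetime import datetime
    import whois

    def domain_looks_abandoned(domain: str) -> bool:
        try:
            record = whois.whois(domain)
        except Exception:
            return True  # no WHOIS record at all is a strong signal
        expiry = record.expiration_date
        if isinstance(expiry, list):   # some registries return several dates
            expiry = min(expiry)
        return expiry is not None and expiry < datetime.now()

    print(domain_looks_abandoned("example.com"))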

Package repositories with Level 3 security should support passwordless authentication, such as passkeys, and require MFA for all maintainers (including phishing-resistant MFA for packages deemed critical, e.g., the top 1% of packages by downloads).

Similar security proposals are laid out for Authorization, General Capabilities, and CLI Tooling across all four levels.

Defense tech developers are strongly encouraged to go over them, to be able to distinguish between secure and not-so-secure package repositories and to understand their impact on storing open-source models.

Package Repositories Actions on OSS Security

On March 7, 2024, CISA concluded a two-day Open Source Software (OSS) Security Summit which gathered OSS community leaders.

In its summary of the event, CISA announced the key actions to help secure the open-source ecosystem. Although the Security Summit did not place special emphasis on the security of open foundation models, we can leverage some of its lessons to equip defense tech developers who use open-source AI as part of their product.

At the Summit, representatives of five package repositories outlined their safeguards for implementing the Principles for Package Repository Security ("Principles"), aiming to improve the safety of the whole open-source ecosystem.

Key takeaways for defense tech developers:

  • The Python Software Foundation is working to provide an API and related tools for quickly reporting and mitigating malware, to increase PyPI's ability to respond to malware in a timely manner without consuming significant resources. It is also adding providers to PyPI for credential-less publishing ("Trusted Publishing"), expanding support beyond GitHub to GitLab, Google Cloud, and ActiveState. The Python ecosystem is finalizing PEP 740 ("Index support for digital attestations") to enable uploading and distributing digitally signed attestations, and the metadata used to verify them, on a Python package repository like PyPI.8

  • The npm repository requires maintainers of high-impact projects to use multi-factor authentication. It has introduced tooling that allows maintainers to automatically generate package provenance and SBOMs, giving consumers of those open-source packages the ability to trace and verify the provenance of dependencies:8

“The npm sbom command generates a Software Bill of Materials (SBOM) listing the dependencies for the current project. SBOMs can be generated in either SPDX or CycloneDX format.”

Side note: verifying the provenance of dependencies is just as fundamental to safeguarding open foundation models. GitHub is doing the same for npm-based projects:

“Starting today, when you build your npm projects on GitHub Actions, you can publish provenance alongside your package by including the --provenance flag. This provenance data gives consumers a verifiable way to link a package back to its source repository and the specific build instructions used to publish it (see example on npmjs.com).”

Until open foundation model maintainers and policymakers develop their own rules for ensuring ecosystem safety, it's worth applying the general OSS safeguard principles:

  • Tracking the models' provenance;
  • Checking the models' SBOMs;
  • Implementing provenance and developing SBOMs for your own products, to be prepared for any inquiries regarding the safety of the software and its components (a sketch of one such safeguard follows below).
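As one minimal, illustrative sketch of provenance tracking: pin the SHA-256 digests of the model weight files you ship in a manifest, and refuse to load weights that drift from it. The file and manifest names here are assumptions made for the example.

    # Minimal sketch: verify downloaded model weight files against a pinned
    # manifest of SHA-256 digests before the model is loaded.
    import hashlib
    import json
    from pathlib import Path

    def sha256_of(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify_weights(model_dir: str, manifest_file: str) -> bool:
        # The manifest maps file names to expected digests, e.g.
        # {"model-00001.safetensors": "ab12..."} (illustrative format).
        manifest = json.loads(Path(manifest_file).read_text())
        return all(
            sha256_of(Path(model_dir) / name) == digest
            for name, digest in manifest.items()
        )

    # Example: refuse to start if the digests drifted from the manifest.
    # assert verify_weights("weights/", "weights.manifest.json")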

Summary

Open foundation models are widely used in every cutting-edge industry, and defense tech is no exception. There are attempts to regulate open AI, and reversals of those attempts (like Donald Trump's recent revocation of Joe Biden's executive order on AI safety).

Such changes of governmental policy (whether under US, EU, or Middle East regulation), however, do not change how open foundation models should be used and incorporated in defense tech software.

Defense tech developers should rely heavily on the early open-source software initiatives to make open AI more secure and to build in security principles by design. They should be prepared and educated on security architecture, because this is not only a matter of commercial relations (government contracts, partnerships, exits, where the company is required to reveal a lot in order to get the deal).

More importantly, defense tech software directly impacts global safety and has highly practical applications, the most common being real-world battlefield use. No founder of a promising startup would want their last-mile drone navigation system, their computer-vision automated turret, or any other unmanned vehicle built on open foundation components to lose control due to malicious interference, for instance because the developers had not verified the security level of the package repository through which the open foundation model was distributed. Yes, this is an artificial example, but it is only a matter of time before real ones surface.

Author – Yuliia Verhun, IT Lawyer

All intellectual property rights to this Article belong to Yuliia Verhun.

Sources:
1. https://www.cisa.gov/news-events/news/open-source-artificial-intelligence-dont-forget-lessons-open-source-software
2. https://www.csis.org/analysis/defense-priorities-open-source-ai-debate
3. https://www.cisa.gov/sites/default/files/2024-02/CISA-Open-Source-Software-Security-Roadmap-508c.pdf
4. https://www.whitehouse.gov/wp-content/uploads/2023/09/OS3I-RFI-Final-09232023.pdf
5. https://www.whitehouse.gov/wp-content/uploads/2024/08/Final-OSSRFI-Summary-Report.pdf
6. https://www.ntia.gov/federal-register-notice/2024/dual-use-foundation-artificial-intelligence-models-widely-available-model-weights
7. https://repos.openssf.org/principles-for-package-repository-security
8. https://www.cisa.gov/news-events/news/cisa-announces-new-efforts-help-secure-open-source-ecosystem
