A Playbook for Securing AI Model Weights

As frontier artificial intelligence (AI) models — that is, models that match or exceed the capabilities of the most advanced models at the time of their development — become more capable, protecting them from theft and misuse becomes more critical. Especially important are a model’s weights — the learnable parameters derived by training the model on massive data sets. Stealing a model’s weights gives attackers the ability to exploit the model for their own use. The requirement to secure AI models also has important national security implications. AI developers and stakeholders across industry, government, and the public need a shared language to assess threats, security postures, and security outcomes.

RAND researchers developed a first-of-its-kind playbook to help AI companies defend against a range of attacker capabilities, up to and including the most sophisticated attacks: Securing AI Model Weights: Preventing Theft and Misuse of Frontier Models.[1] The report also strives to facilitate meaningful dialogue among stakeholders on risk management strategies and the broader impact of AI security.

This brief provides an overview of the report, which

  • identifies 38 meaningfully distinct attack vectors
  • explores a variety of potential attacker capabilities, from opportunistic criminals to highly resourced nation-states
  • estimates how feasible it is for different categories of attackers to execute each attack vector
  • proposes and defines five security levels and recommends preliminary benchmark security systems that roughly achieve them.

Avoiding significant security gaps requires comprehensively implementing a broad set of security practices. However, several recommendations should be urgent priorities for frontier AI organizations:

  • Develop a security plan for a comprehensive threat model.
  • Centralize all copies of weights in access-controlled and monitored systems.
  • Reduce the number of people with access to the weights.
  • Harden interfaces for model access.
  • Employ defense-in-depth for redundancy.
  • Implement insider threat programs.
  • Incorporate confidential computing to secure the weights and reduce the attack surface.

Certain measures are needed to protect against the most sophisticated attackers. These include physical bandwidth limitations between the outside world and devices or networks containing weights; hardware that secures model weights while still providing an interface for inference; and secure, completely isolated networks for training, research, and other interactions. Because such efforts may take significant time (e.g., five years) to implement, it would be wise for organizations to begin now.

Why Focus on Securing AI Systems, Especially Their Model Weights?

Advanced AI models hold the promise of enhancing labor productivity and improving human health. However, the promise comes with the attendant risk of misuse and unintended consequences of deployment.

The need to protect frontier AI models is not merely commercial: Because the risks these models pose may have national security significance, the security and interests of the public also enter the risk calculation.

Potential threats are sophisticated, particularly high-priority operations conducted by nation-states. Organizations develop their own security strategies based on their own assessments of those threats, but the idiosyncratic view of a single security team can have implications that extend well beyond the organization itself. All stakeholders need a shared understanding of how security strategies, whether voluntary or mandated by government, translate into actual security.

The research team’s analysis focused on ways to prevent the theft of model weights, the learnable parameters of AI models. An AI model’s weights represent the culmination of many costly prerequisites for training advanced AI models: significant investment in computing power, large amounts of training data, and years of research by top talent to optimize algorithms. If attackers have a model’s weights, they have complete control over the model.

The research team analyzed a variety of written sources, including academic literature, commercial security reports, official government documents, media reports, and other online sources. They also conducted interviews with nearly three dozen experts, including government national security personnel specializing in information security, prominent information security industry experts, senior information security staff and other senior staff from frontier AI companies, independent AI experts with prior experience at frontier AI organizations, and insider threat experts.

What Are the Potential Avenues of Attack?

The research team identified 38 attack vectors that potential attackers could use to steal model weights. The vectors are not merely theoretical: The vast majority have already been used. Table 1 profiles five common attack vectors and gives examples of their use. The examples illustrate the span of attack capabilities, from the extremely common and easy, such as leaving malicious USB drives in parking lots, to the most sophisticated, such as the development of game-changing cryptanalytic tools that only the most capable actors can achieve.

These examples offer an intuitive profile of the capabilities of different types of actors. The full report provides detailed attack descriptions and hundreds of examples.

[Figure omitted: a pyramid showing the five security levels for AI systems]
Table 1. Five Common Attack Vectors (table not reproduced in this brief)

Securing AI Model Weights describes 167 recommended security measures that make up the security benchmarks. This brief provides two examples — a small sample of the many important and feasible actions that organizations can take to protect their model weights.

Hardening Interfaces for Weight Access

In many leading labs, hundreds or thousands of individuals have full “read” access to frontier model weights. Any one of those individuals can make a copy of the weights, which they could sell or disseminate. Generally, these individuals need to use the weights for their work — but the vast majority do not need the ability to copy them. This recommended security measure ensures that authorized users interact with the weights through a software interface that reduces the risk of the weights being illegitimately copied.

Combining three simple types of access could accommodate varied employee needs while significantly reducing exfiltration risk (an illustrative sketch of the second type follows the list):

  1. Use of predefined code reviewed and vetted by the security team. This is the most logical choice for inference interfaces (both internally and for public application programming interfaces [APIs]).
  2. More flexible access (including execution of custom code) on a server with rate-limited outputs, so that exfiltration of a significant portion of the weights would take too long to be practical. This could be used for most research and development use cases.
  3. Direct work (without constraints on code or output rates) on an air-gapped isolated computer. This may be useful for rare instances where complete flexibility is needed, possibly when conducting interpretability research directly on frontier models.
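
To make the second access type concrete, here is a minimal Python sketch of a server-side gateway that enforces a per-hour output budget, so that copying any meaningful fraction of the weights through the interface would take impractically long. It is an illustration under assumed parameters, not a measure specified in the report; the class name RateLimitedModelGateway, its run_inference method, and the byte budget are all hypothetical.

import time
from dataclasses import dataclass, field


@dataclass
class RateLimitedModelGateway:
    """Mediates all reads of a loaded model; raw weights never leave the server."""

    max_output_bytes_per_hour: int = 10_000_000  # illustrative budget, not a recommended value
    _window_start: float = field(default_factory=time.monotonic)
    _bytes_sent: int = 0

    def _charge(self, nbytes: int) -> None:
        # Reset the accounting window once an hour has elapsed.
        now = time.monotonic()
        if now - self._window_start > 3600:
            self._window_start = now
            self._bytes_sent = 0
        # Refuse the request if it would exceed the hourly budget.
        if self._bytes_sent + nbytes > self.max_output_bytes_per_hour:
            raise PermissionError("Hourly output budget exceeded; request denied.")
        self._bytes_sent += nbytes

    def run_inference(self, prompt: str) -> str:
        # Placeholder for a real model call; only the text response leaves the
        # gateway, never tensors, checkpoints, or weight files.
        response = f"[model output for: {prompt!r}]"
        self._charge(len(response.encode("utf-8")))
        return response


if __name__ == "__main__":
    gateway = RateLimitedModelGateway(max_output_bytes_per_hour=1_000)
    print(gateway.run_inference("summarize the experiment results"))

In practice, the budget and accounting window would need to be chosen so that even sustained querying over months could not reconstruct a significant portion of the weights, and every output path (logs, error messages, debugging tools) would have to be metered by the same mechanism.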

Confidential Computing

Even if model weights (and other sensitive data) are encrypted in transport and in storage, they are decrypted and vulnerable to theft during their use. Many employees, as well as cyber attackers with a minimally persistent presence, can steal the weights once they are decrypted ahead of their intended use. Confidential computing is a technique for ensuring that data remain secure, including during use, by decrypting the data only within a hardware-based trusted execution environment (TEE) that will not run insecure code. Implementing confidential computing to secure AI model weights could significantly reduce the likelihood of the weights being stolen.

However, confidential computing needs to be implemented correctly (a conceptual sketch of the resulting workflow follows this list):

  • The TEE must include protections against physical attacks (current implementations of confidential computing in graphics processing units [GPUs] do not).
  • Model weights must be encrypted by a key generated within the TEE and stored within it.
  • The TEE will run only prespecified, audited, and signed code. That code decrypts the weights, runs inference, and outputs only the model response; it cannot output the weights, the weight-encryption key, or any information beyond what the model directly outputs.
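
Taken together, these requirements imply a workflow in which the key is generated and held inside the TEE, the weights are decrypted only there, and only audited, signed code may touch them. The toy Python sketch below illustrates that flow. It is purely conceptual, not a real TEE or GPU vendor API; the class TrustedExecutionEnvironment, the run_signed_inference method, and the repeating-key XOR "cipher" are hypothetical stand-ins.

import hashlib
import os


class TrustedExecutionEnvironment:
    """Toy stand-in for a hardware TEE: the key and decrypted weights never leave it."""

    def __init__(self, approved_code_hash: str):
        # The weight-encryption key is generated inside the TEE and stored there.
        self._key = os.urandom(32)
        self._approved_code_hash = approved_code_hash

    def _xor(self, data: bytes) -> bytes:
        # Placeholder cipher for illustration only; a real TEE would use
        # hardware-backed authenticated encryption.
        return bytes(b ^ self._key[i % len(self._key)] for i, b in enumerate(data))

    def encrypt_weights(self, plaintext_weights: bytes) -> bytes:
        return self._xor(plaintext_weights)

    def run_signed_inference(self, code_blob: bytes, encrypted_weights: bytes, prompt: str) -> str:
        # Refuse any code that is not the prespecified, audited build.
        if hashlib.sha256(code_blob).hexdigest() != self._approved_code_hash:
            raise PermissionError("Unsigned or modified inference code rejected.")
        weights = self._xor(encrypted_weights)  # decrypted only inside the TEE
        # Only the model's text response is returned; weights and key stay inside.
        return f"[inference over {len(weights)} weight bytes for prompt {prompt!r}]"


if __name__ == "__main__":
    audited_code = b"def run(model, prompt): ..."  # stand-in for the audited inference build
    tee = TrustedExecutionEnvironment(hashlib.sha256(audited_code).hexdigest())
    sealed = tee.encrypt_weights(b"\x00\x01\x02\x03")  # stand-in for real weight bytes
    print(tee.run_signed_inference(audited_code, sealed, "hello"))

A real deployment would additionally rely on remote attestation to prove which code is running, and on tamper-resistant hardware, rather than on anything resembling this stand-in.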

The use of confidential computing in GPUs is still nascent and may not be production-ready for frontier systems. However, there is an overwhelming consensus among experts regarding its importance, and it is expected to be deployed shortly.

How Can Securing AI Model Weights Be Used by Stakeholders?

There is an ongoing, lively debate regarding the extent to which different models need to be secured (if at all). The research team's goal was to systematize knowledge about which security postures achieve which security outcomes, specifically in the context of securing AI systems and their model weights, so that whichever frontier AI models are deemed worth securing can be protected at the desired security level. Securing AI Model Weights supports informed decisionmaking in both the private and public sectors:

  • Executives and security teams at leading AI companies can explore the attack vectors profiled in the report and the many documented instances of their use. Many security experts that the research team interviewed were familiar with some attack vectors but unaware of or skeptical about others. This knowledge gap can leave their systems vulnerable.
  • AI companies should also compare their existing security posture to the five security benchmarks. By identifying which benchmark is closest to their current state, they can better understand which actors they are likely secure against and, more importantly, which actors remain threats. The benchmarks are not prescriptive, and their details will evolve, but they provide a useful calibration tool.
  • Companies can use the security benchmarks to identify next steps in improving their security posture. If a company is missing specific security measures needed to meet a given benchmark (or suitable alternatives that fit its infrastructure), it should focus on those first. Once it has achieved a benchmark, it can look to the next one for further recommendations and next steps.
  • The security benchmarks could also be the foundation for standards and regulation, giving regulators and executives a bedrock of concepts and measures to assess what level of security companies have achieved or have plans to reach.

Source: www.rand.org
