As frontier artificial intelligence (AI) models — that is, models that match or exceed the capabilities of the most advanced models at the time of their development — become more capable, protecting them from theft and misuse becomes more critical. Especially important are a model’s weights — the learnable parameters derived by training the model on massive data sets. Stealing a model’s weights gives attackers the ability to exploit the model for their own use. The requirement to secure AI models also has important national security implications. AI developers and stakeholders across industry, government, and the public need a shared language to assess threats, security postures, and security outcomes.
RAND researchers developed a first-of-its-kind playbook, Securing AI Model Weights: Preventing Theft and Misuse of Frontier Models,[1] to help AI companies defend against a range of attacker capabilities, up to and including the most sophisticated attacks. The report also strives to facilitate meaningful dialogue among stakeholders on risk management strategies and the broader impact of AI security.
This brief provides an overview of the report.
Avoiding significant security gaps requires comprehensively implementing a broad set of security practices. However, several recommendations should be urgent priorities for frontier AI organizations:
Certain measures are needed to protect against the most sophisticated attackers. These include physical bandwidth limitations between devices or networks containing weights and the outside world; hardware to secure model weights while providing an interface for inference; and secure, completely isolated networks for training, research, and other interactions. Because such efforts may take significant time (e.g., five years) to implement, it would be wise for organizations to begin now.
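As a rough illustration of why physical bandwidth limits matter, the minimal Python sketch below estimates how long exfiltrating a full copy of model weights would take through a capped egress link. The weight size and bandwidth figures are assumptions chosen for illustration, not numbers from the report.

```python
# Illustrative back-of-the-envelope calculation: how long would it take to
# exfiltrate a full set of model weights through a bandwidth-limited link?
# The numbers below are assumptions for illustration, not figures from the report.

def exfiltration_time_days(weight_size_tb: float, egress_mbps: float) -> float:
    """Days needed to move `weight_size_tb` terabytes over a link capped
    at `egress_mbps` megabits per second."""
    size_bits = weight_size_tb * 1e12 * 8        # terabytes -> bits
    seconds = size_bits / (egress_mbps * 1e6)    # bits / (bits per second)
    return seconds / 86_400                      # seconds -> days

# Example: ~1 TB of weights through a 10 Mbit/s egress cap.
print(f"{exfiltration_time_days(1.0, 10):.1f} days")   # roughly 9.3 days
```

Even under generous assumptions, a tight egress cap turns a quick copy into a multi-day transfer, giving monitoring and response measures time to act.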
Advanced AI models hold the promise of enhancing labor productivity and improving human health. However, the promise comes with the attendant risk of misuse and unintended consequences of deployment.
The need to protect frontier AI models is not merely commercial: Because the risks posed by AI models may have national security significance, the security and interests of the public also enter the risk calculation.
Potential threats are sophisticated, particularly high-priority operations conducted by nation-states. Organizations develop their own security strategies based on their assessment of threats, but the idiosyncratic view of one organization’s security team can have implications far wider than the organization itself. All stakeholders need a shared understanding of how security strategies, whether voluntary or governmental, translate into actual security.
The research team’s analysis focused on ways to prevent the theft of model weights, the learnable parameters of AI models. An AI model’s weights represent the culmination of many costly prerequisites for training advanced AI models: significant investment in computing power, large amounts of training data, and years of research by top talent to optimize algorithms. If attackers have a model’s weights, they have complete control over the model.
The research team analyzed a variety of written sources from the academic literature, commercial security reports, official government documents, media reports, and other online sources. They also conducted interviews with nearly three dozen experts, including national security government personnel specializing in information security, prominent information security industry experts, senior information security staff from frontier AI companies, other senior staff from frontier AI companies, independent AI experts with prior experience at frontier AI organizations, and insider threat experts.
The research team identified 38 attack vectors that potential attackers could use to steal model weights. The vectors are not merely theoretical: The vast majority have already been used. Table 1 profiles five common attack vectors and gives examples of their use. The examples illustrate the span of attacker capabilities, from the extremely common and easy, such as dropping malicious USB drives in parking lots, to the most sophisticated, such as developing game-changing cryptanalytic tools that only the most capable actors can achieve.
These examples offer an intuitive profile of the capabilities of different types of actors. The full report provides detailed attack descriptions and hundreds of examples.
Securing AI Model Weights describes 167 recommended security measures that make up the security benchmarks. This brief provides two examples — a small sample of the many important and feasible actions that organizations can take to protect their model weights.
In many leading labs, hundreds or thousands of individuals have full “read” access to frontier model weights. Any one of those individuals can make a copy of the weights, which they could sell or disseminate. Generally, these individuals need to use the weights for their work — but the vast majority do not need the ability to copy them. This recommended security measure ensures that authorized users interact with the weights through a software interface that reduces the risk of the weights being illegitimately copied.
Combining three simple types of access, described in the full report, could accommodate varied types of employee access while significantly reducing exfiltration risk.
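To make the idea of interface-mediated access concrete, the sketch below shows one possible shape of such a software interface in Python. All names here (AccessPolicy, WeightAccessGateway, and their methods) are hypothetical illustrations, not drawn from the report or from any real system; the point is simply that users submit requests and receive model outputs, while copying the raw weight files requires a separate, audited path.

```python
# Hypothetical sketch of interface-mediated access to model weights.
# Names (AccessPolicy, WeightAccessGateway, etc.) are illustrative only;
# they do not come from the report or from any real system.

from dataclasses import dataclass

@dataclass
class AccessPolicy:
    can_run_inference: bool = True       # query the model, receive outputs only
    can_export_weights: bool = False     # raw copies require separate, audited approval

class WeightAccessGateway:
    """Mediates all interaction with stored weights so that ordinary users
    never handle the weight files directly."""

    def __init__(self, policy: AccessPolicy):
        self.policy = policy

    def run_inference(self, prompt: str) -> str:
        if not self.policy.can_run_inference:
            raise PermissionError("inference not permitted for this user")
        # In a real deployment the gateway would forward the request to an
        # isolated inference service; here we just return a placeholder.
        return f"<model output for: {prompt!r}>"

    def export_weights(self, justification: str) -> None:
        if not self.policy.can_export_weights:
            raise PermissionError("raw weight export requires audited approval")
        # An approved export would be logged, rate-limited, and reviewed.

# Most employees would hold a policy like this one: able to use the model,
# unable to copy it.
gateway = WeightAccessGateway(AccessPolicy())
print(gateway.run_inference("test prompt"))
```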
Even if model weights (and other sensitive data) are encrypted in transit and in storage, they are decrypted, and therefore vulnerable to theft, during use. Many employees, as well as cyber attackers who have gained even a minimal persistent foothold, can steal the weights once they are decrypted for use. Confidential computing is a technique for ensuring that data remain secure, including during use, by decrypting the data only within a hardware-based trusted execution environment (TEE) that will not run insecure code. Implementing confidential computing to secure AI model weights could significantly reduce the likelihood of the weights being stolen.
However, confidential computing needs to be implemented correctly.
The use of confidential computing in GPUs is still nascent and may not be production-ready for frontier systems. However, there is an overwhelming consensus among experts regarding its importance, and it is expected to be deployed shortly.
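The sketch below gives a highly simplified picture of the control flow that confidential computing implies: the key needed to decrypt the weights is released only after the enclave attests that it is running the expected, measured code. The function names and attestation format are hypothetical; real TEEs use vendor-specific attestation and key-release protocols.

```python
# Highly simplified, hypothetical sketch of attestation-gated key release.
# Real confidential-computing stacks use vendor-specific attestation protocols;
# nothing here corresponds to an actual TEE API.

EXPECTED_MEASUREMENT = "sha256:abc123..."  # hash of the code the enclave should run

def verify_attestation(quote: dict) -> bool:
    """Accept the enclave only if it reports the expected code measurement.
    A real verifier would also check the quote's signature against the
    hardware vendor's root of trust."""
    return quote.get("measurement") == EXPECTED_MEASUREMENT

def release_weight_key(quote: dict) -> bytes:
    """Key service: hand the weight-decryption key only to an attested enclave."""
    if not verify_attestation(quote):
        raise PermissionError("attestation failed; weight key withheld")
    return b"\x00" * 32  # placeholder; a real service would fetch the key from an HSM

# Inside the enclave: weights are decrypted only after attestation succeeds,
# so they never appear in plaintext on an unverified host.
quote = {"measurement": EXPECTED_MEASUREMENT}
key = release_weight_key(quote)
```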
There is an ongoing, lively debate regarding the extent to which different models need to be secured (if at all). The research team’s goal was to improve the ability to secure whichever frontier AI models are deemed worth securing at the desired security level (SL) by systematizing knowledge about which security postures achieve various desirable security outcomes, specifically in the context of securing AI systems and their model weights. Securing AI Model Weights supports informed decisionmaking in both the private and public sectors.