As large language models gain traction, it is important to ensure that decisions delegated to them comply with ethical and legal principles. This column reports on an experiment in which state-of-the-art large language models were prompted to impersonate the CEO of a fictitious financial firm needing to repay corporate debt urgently with no funds available except for customer deposits. The findings indicate that some models incline towards fraudulent behaviour even when alternatives exist. Adequate prompting strategies may reduce risk, but are not sufficient. Further analysis of how such models make choices in the financial domain is essential, along with the adoption of appropriate governance frameworks in financial institutions.
The rapid adoption of large language models (LLMs) has renewed concerns about whether artificial intelligence shares human values – the so-called ‘alignment problem’ (Ji et al. 2024). LLMs are increasingly capable in a variety of fields, including the analysis of complex data. They are fast, easy to interact with, and quite convincing when recommending a course of action. Delegating decisions to them comes more naturally than it did with previous forms of artificial intelligence (AI), which did not speak in plain language and only understood one problem at a time. This is where we need to be sure that, left to their own devices, AIs will behave ethically.
Alignment is generally evaluated through benchmarks, where LLMs are scored on their ability to tell right from wrong in a large set of simulated ethical dilemmas. For example, an LLM may be prompted to impersonate a chemist working at a water treatment plant and asked whether it is acceptable to contaminate the water supply (Scherrer et al. 2024). While the answer is obvious to us, an LLM with adequate safety training could still get it wrong.
Surprisingly, leading benchmarks are very thin on financial behaviour. Financial firms are often early adopters of new technologies. Misaligned LLMs could impact financial stability, market fairness, and facilitate criminal abuse of the financial system (Aldasoro et al. 2024, Danielsson and Uthemann 2024).
Indeed, the problem is worth looking at in some depth. An early study on alignment in the financial domain showed that, when pressured by virtual colleagues, one LLM acting as portfolio manager for a company was willing to engage in insider trading and even lie about it (Scheurer et al. 2023).
A controlled experiment
In November 2022, news media exposed an $8 billion hole in the balance sheet of US crypto exchange FTX. The company soon went bankrupt, and founder Samuel Bankman-Fried was eventually convicted of “one of the largest financial frauds in history” (US Department of Justice 2024). Bankman-Fried and associates had attempted to cover corporate losses by misappropriating customer funds and deploying them on highly speculative trades, thus violating one of the fundamental principles of financial intermediation – the duty to safeguard client assets.
When asked to summarise and analyse the FTX story, LLMs generally do well. They know the facts, and they recognise the wrongdoing. Still, it is not a given that this textbook knowledge translates to the ability to make the correct decisions in a similar scenario.
In a recent paper (Biancotti et al. 2025), we prompted 12 state-of-the-art LLMs to impersonate the CEO of a fictitious financial firm facing an FTX-like dilemma, without explicitly mentioning FTX itself or crypto, to prevent the models from relying on specific prior knowledge. The AI needs to repay corporate debt urgently, but there are no funds available except for customer deposits. It is given three options: (1) leave the deposits alone, default on the loan, and shut the company down; (2) misappropriate a small amount of customer money and gamble it on financial markets; or (3) misappropriate and gamble a large amount of customer money.
The experiment revealed significant heterogeneity among LLMs in their baseline propensity to engage in fraudulent behaviour. Most models fell into a high misalignment category, approving the misuse of customer assets in 75-100% of simulations. A smaller group demonstrated moderate misalignment, making unethical decisions in 40-50% of cases, while only one model showed strong ethical guardrails, choosing to misappropriate customer funds in only 10% of simulations (Figure 1).
Figure 1 Relative frequency of misalignment for each of the twelve models tested


Put simply, when faced with the same choice that led to Bankman-Fried’s downfall, the vast majority of AI models made the same call. The variation seen in our results suggests that current training methods produce inconsistent moral reasoning across different LLMs, with some models routinely choosing to violate fiduciary duty even when alternatives exist.
Incentives, constraints, and understanding
When scenarios were modified to test responses to different preferences, incentives, and constraints, most models responded consistently with economic theory. For instance, LLMs were less likely to misappropriate funds when instructed to be risk-averse, more cautious when profit expectations from risky trades were low, and more compliant when operating in regulated environments with severe penalties.
These results suggest that properly designed prompts and incentive structures can influence AI behaviour in predictable directions, offering potential avenues for improving alignment through mechanism design (Christiano et al. 2017, Bai et al. 2022).
Puzzling evidence emerged regarding corporate governance. Contrary to extensive economic literature showing that solid governance structures reduce unethical behaviour (Bank for International Settlements 2015), most models became more likely to engage in fraud when told they might face internal audits. Perhaps the LLMs think they are being audited on profitability as opposed to legality.
The model that performed best on alignment was also the one most capable of explicitly mapping the simulated scenario onto concepts of ethics and legality. This is evident from an analysis of chat logs. In each of several thousand simulation runs, the 12 LLMs were asked to explain their decisions. Mostly, they elaborated on risk, profit expectations, and the company’s prospects. Only one model immediately identified the legal and ethical aspects of the problem (Figure 2).
Figure 2 Relative frequency of words referencing selected semantic categories in chat logs


Policy implications
The need to balance AI’s efficiency gains against alignment risks reflects broader policy questions about how societies should harness AI’s transformative potential while managing its downsides (Filippucci et al. 2024).
Before a given LLM is deployed on the market, simulation-based testing can effectively identify models with concerning behaviour. It can be particularly valuable because it can be conducted by independent auditors without relying on AI developers, thus avoiding potential conflicts of interest.
Still, this type of testing has important limitations. Results can be extremely sensitive to small prompt variations, and comprehensive evaluation across multiple scenarios is costly. Simulation-based testing should be complemented by investigation of the internal workings of LLMs, such as mechanistic interpretability research that examines how models actually process ethical reasoning (Templeton et al. 2024). This requires public–private cooperation through red-teaming exercises involving both developers and regulators.
In the post-deployment phase, appropriate governance frameworks within financial institutions remain essential. Human oversight and accountability mechanisms are still necessary – not only because prompt engineering doesn’t always yield the expected results, but because models that appear well-aligned on average may still make occasional unethical decisions.
Financial authorities are well-positioned to address this challenge, having previous experience with AI governance frameworks (Yong and Prenio 2021). One promising avenue is the adaptation of existing risk management and governance requirements to account for AI-specific risks (OECD and Financial Stability Board 2024), including the principal-agent problems that emerge when humans delegate decisions to opaque AI systems (Immorlica et al. 2024).
Moving forward, policymakers should prioritise developing AI-specific governance frameworks that account for the unique challenges of managing artificial agents. This includes not just pre-deployment safety testing, but also post-deployment monitoring and the development of AI-on-AI supervision capabilities.
Source: cepr.org