Generative AI tools such as GPT-4o are now widely accessible, but there is still limited experimental evidence on how they affect real knowledge work across diverse organisational roles. While existing studies show that AI can improve performance in narrowly defined tasks – such as writing short memos or generating code – we know less about its impact on complex institutional work involving varied task types, levels of specialisation, and departmental contexts.
To address this, we partnered with the National Bank of Slovakia (NBS) to evaluate the effect of GPT-4o in a structured field experiment. NBS is the country’s central bank and a member of the Eurosystem, and is responsible for monetary policy implementation, financial supervision, and economic analysis. We recruited 101 staff members across departments ranging from research and monetary policy to IT, supervision, and operations. Each participant was asked to complete a two-hour battery of tasks that closely mirrored their day-to-day work and reflected the task-based frameworks of Autor et al. (2003) and Autor (2013).
The experiment consisted of two components:
This experimental design allowed us to cleanly estimate how access to generative AI affects task performance across a broad spectrum of task categories.
Access to GPT-4o led to large average productivity gains across the board. Task quality improved by 33% to 44%, while task completion time fell by 21%. Nearly all participants (94%) produced higher-quality outputs with AI, and a large majority (80%) completed tasks more quickly.
Yet the nature of these gains differed by worker skill level:
To see how these individual gains are distributed across our sample, Figure 1 plots kernel-density estimates of each participant’s AI treatment effect on quality (panel a) and on efficiency (panel b) for generalist tasks, and on quality for domain-specific tasks (panel c).
Figure 1 Distribution of GPT-4o productivity gains
Panel (a) Quality effects on generalist tasks
Panel (b) Time-saving effects on generalist tasks
Panel (c) Quality effects on specialist tasks
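As a rough illustration of how a kernel-density plot like Figure 1 can be built, the sketch below estimates a Gaussian kernel density over per-participant treatment effects. It is a minimal sketch, not the authors' procedure: the `effects` values are made up for illustration, and the bandwidth uses the common normal-reference rule of thumb, which is an assumption since the kernel and bandwidth behind Figure 1 are not stated.

```python
import math

def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> estimated density at x (Gaussian kernel)."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * sd * n^(-1/5).
        # Assumed here; the bandwidth used for Figure 1 is not reported.
        mean = sum(samples) / n
        sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observed effect.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# HYPOTHETICAL per-participant quality effects (AI minus no-AI, as a
# proportional change). These are illustrative values, not study data.
effects = [0.10, 0.25, 0.30, 0.35, 0.40, 0.45, 0.60, -0.05]

kde = gaussian_kde(effects)

# Evaluate the density on a grid; a plotting library could draw these points.
step = 0.01
grid = [i * step for i in range(-100, 201)]
curve = [(x, kde(x)) for x in grid]

# Sanity check: the estimated density should integrate to roughly one.
area = sum(y for _, y in curve) * step

# Share of participants with a positive effect, the statistic behind
# statements such as "94% produced higher-quality outputs with AI".
share_positive = sum(e > 0 for e in effects) / len(effects)
```

A longer right tail, as described for panel (c), would show up here as density mass extending further above the mode than below it.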
While recent research suggests that generative AI disproportionately benefits lower-skill workers (Brynjolfsson et al. 2023, Dell’Acqua et al. 2023, Noy and Zhang 2023), our findings paint a more nuanced picture. We find that lower-skill employees do experience large gains in quality – but higher-skill employees benefit more in terms of speed. Generative AI may thus act as both a quality equaliser and an efficiency multiplier, raising important questions about how organisations structure work, training, and performance evaluation in the AI era.
While panels (a) and (b) of Figure 1 show that nearly everyone benefits, panel (c) reveals a longer right tail on specialist tasks, highlighting that the biggest payoffs occur when AI meets deep domain expertise.
To understand where AI helps most, we classified each task using a framework grounded in Autor’s (2013) taxonomy and subsequent work in task-based economics. Tasks varied along several dimensions:
Our results across these tasks were striking:
These patterns are consistent with the idea that AI is most productive when paired with cognitive complexity and expert context – not when used to automate simple or repetitive tasks.
Table 1 Average performance by task type and treatment condition
While GPT-4o delivered the largest productivity gains on non-routine and specialist tasks, these tasks were not always assigned to the workers who benefited most from AI. For example, employees in more routine job roles experienced the biggest individual quality improvements from using AI, but often worked on tasks that generated relatively modest returns at the task level.
This mismatch between who gains most from AI and where AI is most productive created a matching inefficiency.
To explore this, we simulated a counterfactual reallocation. Keeping staffing and total workload fixed, we reassigned tasks based on each worker’s AI-enhanced comparative advantage – that is, assigning individuals to the task types where their performance improved most with GPT-4o.
The result: aggregate output increased by 7.3%, without changing headcount or effort.
This highlights a key managerial insight: adopting AI is only the first step. To unlock its full value, organisations must also reconsider how tasks are assigned – matching tools, people, and work more intentionally.
Figure 2 Production possibility frontier, with and without generative AI
Put differently, adopting AI tools is not enough: organisations must also reconsider who does what in order to fully capture the technology’s potential.
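The counterfactual reallocation described above can be sketched as a small assignment problem. The example below is a toy version under stated assumptions, not the simulation used in the study: the workers, task types, and `output` scores are hypothetical, "fixed workload" is modelled as keeping the same number of slots per task type as in the baseline, and the objective (total AI-assisted output) is one plausible way to operationalise AI-enhanced comparative advantage.

```python
from itertools import permutations

# HYPOTHETICAL AI-assisted output scores: output[worker][task_type].
# Illustrative numbers only; they are not taken from the experiment.
output = {
    "analyst":    {"routine": 5.0, "non_routine": 9.0},
    "supervisor": {"routine": 6.0, "non_routine": 7.0},
    "operator":   {"routine": 7.0, "non_routine": 6.5},
    "clerk":      {"routine": 6.5, "non_routine": 5.0},
}

# Baseline (status quo) assignment of workers to task types.
baseline = {
    "analyst": "routine",
    "supervisor": "non_routine",
    "operator": "non_routine",
    "clerk": "routine",
}

def total_output(assignment):
    """Aggregate output for a worker -> task_type assignment."""
    return sum(output[worker][task] for worker, task in assignment.items())

def best_reallocation(baseline):
    """Brute-force search over reassignments with staffing and workload fixed.

    The set of workers and the multiset of task slots are taken from the
    baseline, so headcount and the amount of each task type are unchanged.
    Brute force is only practical for small examples like this one.
    """
    workers = list(baseline)
    slots = list(baseline.values())
    best_perm = max(
        permutations(slots),
        key=lambda perm: sum(output[w][t] for w, t in zip(workers, perm)),
    )
    return dict(zip(workers, best_perm))

best = best_reallocation(baseline)
gain_pct = 100 * (total_output(best) - total_output(baseline)) / total_output(baseline)

print(best)
print(f"Aggregate output gain from reallocation: {gain_pct:.1f}%")
```

For realistic numbers of workers and tasks, an assignment solver such as the Hungarian algorithm would replace the brute-force search; the comparison of total output before and after reassignment stays the same.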
Our findings point to several actionable lessons for organisations aiming to integrate generative AI into knowledge work:
As generative AI tools become embedded in everyday work, the challenge shifts from adoption to integration. Realising the full value of generative AI will depend not just on the technology itself, but on how well organisations align it with human capital and task design.
Source: cepr.org