The science of AI-assisted military decision-making (the study of how humans perceive, trust, and defer to AI systems, and the cognitive and organisational consequences that follow) is attracting significant scholarly attention.
That attention is warranted. AI promises to transform military operations by delivering speed, informational superiority, and sharper analysis. However, promise and performance are not the same thing, and the gap between them remains poorly understood.
This piece makes three arguments. First, while AI offers real advantages to military decision-makers, it also introduces underappreciated risks — chief among them the erosion of situational awareness. Second, human judgement remains irreplaceable in military contexts: the more AI is integrated into command and control, the more critical it becomes to understand its limitations. Third, the most consequential gap in AI-enabled warfighting today is evaluative: militaries lack robust frameworks to benchmark AI performance against the realities of operational environments.
To develop these arguments, the piece begins by examining the advantages and risks of AI reliance in military operations. It then considers the limits of machine cognition and the enduring importance of human judgement. Drawing on the US experience, it turns to AI-enabled warfighting in practice. Finally, it identifies the benchmarking gap as the central challenge facing responsible AI adoption in defence.
AI Advantages and the Risks of Over-Reliance
There are real advantages in using AI for military operations. The speed and informational superiority that AI systems can deliver give militaries a structural advantage. AI systems allow military decision-makers to integrate data from all domains — maritime, land, air, cyberspace, and space — and organise it in a way that is accessible and actionable.
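To make the integration point concrete, the sketch below shows the pattern in miniature: heterogeneous, domain-specific reports normalised into one queryable record type. Everything here, the `TrackRecord` schema, its field names, and the `normalise_ais` helper, is a hypothetical illustration of the pattern, not a description of any fielded system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical common schema: every domain-specific report is normalised
# into a single record type, so analysts can query one operational picture.
@dataclass
class TrackRecord:
    domain: str          # "maritime", "land", "air", "cyber", or "space"
    source: str          # originating sensor or feed
    latitude: float
    longitude: float
    observed_at: datetime
    label: str           # e.g. "surface vessel", "armoured vehicle"
    confidence: float    # sensor or model confidence in [0, 1]

def normalise_ais(msg: dict) -> TrackRecord:
    """Map a maritime AIS-style message (hypothetical format) into the schema."""
    return TrackRecord(
        domain="maritime",
        source="ais",
        latitude=msg["lat"],
        longitude=msg["lon"],
        observed_at=datetime.fromtimestamp(msg["ts"], tz=timezone.utc),
        label=msg.get("ship_type", "surface vessel"),
        confidence=0.9,  # assumed fixed confidence for transponder reports
    )

# One fused, queryable picture instead of separate domain stovepipes.
picture = [normalise_ais({"lat": 36.1, "lon": 14.3, "ts": 1_700_000_000})]
maritime_tracks = [t for t in picture if t.domain == "maritime"]
```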
More broadly, using AI allows people and organisations to save time, especially on tasks requiring limited cognitive effort, and to process larger quantities of data than would otherwise be manageable. In time-critical operational environments, these are not trivial gains.
For the same reasons, however, there is also the risk of placing too much trust in AI performance. It has been widely documented, for example, that using AI produces automation bias — the cognitive tendency to favour machine output over human judgement. Relatedly, when using AI, humans tend to lose the situational awareness that usually allows them to catch errors when they occur. Reliance and vigilance trade off against each other.
AI decision-support systems also perform best when the situations they are asked to handle resemble the data on which they were trained. Out of their comfort zone, their performance degrades. In the wake of COVID-19, to take an example from the civilian domain, AI models that had reliably predicted shopping trends, traffic flows, and supply chain behaviour began to fail systematically, precisely because the pandemic upended the patterns on which they had been built. Military environments — defined by adversarial deception, rapid change, and ambiguity — are structurally more likely to produce such disruptions than most civilian ones.
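The mechanism is easy to demonstrate in miniature. The sketch below is a deliberately toy stand-in for something like a demand-forecasting model (assuming only numpy and scikit-learn): it fits a simple regression on one stable regime, then scores it after a COVID-style regime change.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy stand-in for a forecasting model: the target is a stable linear
# function of an input signal under normal ("training") conditions.
x_train = rng.uniform(0, 10, size=(500, 1))
y_train = 3.0 * x_train[:, 0] + rng.normal(0, 1, size=500)
model = LinearRegression().fit(x_train, y_train)

# In-distribution test data: same regime the model was trained on.
x_test = rng.uniform(0, 10, size=(200, 1))
y_test = 3.0 * x_test[:, 0] + rng.normal(0, 1, size=200)

# Out-of-distribution data: a pandemic-like regime change flips the
# relationship between the signal and the target.
y_shifted = -3.0 * x_test[:, 0] + 40 + rng.normal(0, 1, size=200)

print("R^2 in distribution:   ", round(model.score(x_test, y_test), 3))
print("R^2 after regime shift:", round(model.score(x_test, y_shifted), 3))
# The second score collapses (strongly negative): the model is still
# confident and consistent, but the world it learned no longer exists.
```

The point of the second score is not the number itself but its silence: nothing inside the model flags that the world has changed, so the degradation is visible only if someone is measuring it.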
These insights are of particular importance for the military domain. Most AI decision-support systems used by militaries are also used for civilian purposes, and the failure modes emerging in the two domains are broadly comparable. Still, the stakes in military contexts are considerably higher.
Human Judgement and the Limits of Machine Cognition
Scholars of military affairs are converging on a shared concern. In a recent piece in War on the Rocks, Christopher Denzel argued that operational art depends on causal reasoning under uncertainty, a task AI may not be well-equipped for. The weaknesses inherent in the use of AI in military decision-making, in his assessment, are too significant to ignore. Michael Horowitz reinforces this point, suggesting that AI in warfare may actually increase the importance of human judgement rather than diminish it, precisely because military environments are fluid, ambiguous, and organisationally complex in ways that current AI systems are not equipped to navigate alone.
Indeed, humans, unlike machines, can draw on sources of knowledge that cannot be codified, such as intuition, practical experience, and cultural and social norms. These skills are particularly relevant for military leaders. Kenneth Payne has explored how machine cognition differs fundamentally from human reasoning about war, arguing that AI systems lack the embodied, emotional substrate that shapes human strategic judgement. Taken together, these perspectives suggest that the value of AI in military contexts depends considerably on how well it supports the irreducibly human dimensions of strategic thought.
This does not mean that AI has no practical usefulness in decision-making. The capacity to gather, analyse, and integrate large quantities of information is an essential enabler of better decisions. Still, learning when and how to rely on AI is equally important.
AI-Enabled Warfighting in Practice
The US military offers the clearest window into what AI-enabled warfighting looks like in practice. Project Maven, established by the Pentagon in 2017, applies computer vision algorithms to satellite imagery, video, and radar data to locate, identify, and track targets. Maven’s first significant deployment came after Russia’s invasion of Ukraine in 2022, when a version of the system was provided to Ukrainian forces to help identify Russian military vehicles, personnel, and infrastructure. It has since been extended across all US services and combatant commands, and is now reportedly capable of generating 1,000 targeting recommendations per hour.
Maven is integrated into the Maven Smart System, a broader AI-enabled warfighting platform developed by Palantir and powered in part by Anthropic’s Claude. Palantir has publicly demonstrated the platform’s ability to synthesise intelligence, generate courses of action, and support commanders through a conversational AI interface — moving fluidly from strategic framing to targeting decisions within a single workflow. That integration makes such platforms too attractive for militaries to ignore. But their added value also depends on whether they challenge and enrich decision-making processes rather than merely accelerate them, leaving room for forms of knowledge rooted in practical experience, including that of military commanders.
The Benchmarking Gap
From what can be gleaned from public sources, the US Department of Defense has developed substantial guidance on the responsible deployment of AI. Still, as argued by Benjamin Jensen and Yasir Atalan of CSIS, the DoD lacks a systematic framework for benchmarking AI.
Existing benchmarking tools are optimised for commercial use cases and fail to capture the complexity of military decision-making, characterised by uncertainty and the fog of war. Defence benchmarks make up only about two percent of all AI evaluation efforts, and those that do exist tend to concentrate on narrow domains such as cybersecurity and biosecurity. Core military functions — operational decision-making in areas like targeting, logistics, intelligence fusion, and command and control — remain largely outside any robust, shared evaluative framework.
In the absence of such benchmarks, militaries adopting AI struggle to justify investment, assess system reliability, or guard against the risks of accelerated decision-making. Addressing this gap requires evaluation frameworks grounded in metrics that reflect operational realities. This entails developing new methodologies, data infrastructures, and standards tailored to military contexts. These benchmarks must account for both the differences between human and machine cognition and the inherent complexity of warfare, ensuring that AI systems are optimised for reliability as well as performance.
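What might such a framework look like in code? The sketch below is a minimal, hypothetical harness, not an actual DoD framework: every name, regime label, and scenario in it is illustrative. The design choice it encodes is that scenarios carry ground truth plus a regime tag, and a system is scored per regime, so fog-of-war performance cannot hide inside an aggregate average.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical scenario-based harness; all names and regimes are illustrative.
@dataclass
class Scenario:
    description: str
    inputs: dict        # what the system under test sees (possibly degraded)
    ground_truth: str   # the correct call, known to the evaluators
    regime: str         # "nominal", "degraded", or "deceptive"

def evaluate(system: Callable[[dict], str],
             scenarios: list[Scenario]) -> dict[str, float]:
    """Score a decision-support system per regime, not just in aggregate.

    Reporting accuracy separately by regime surfaces exactly what aggregate
    benchmarks hide: a system can look strong on average while failing
    under degraded or deceptive conditions.
    """
    by_regime: dict[str, list[bool]] = {}
    for s in scenarios:
        by_regime.setdefault(s.regime, []).append(
            system(s.inputs) == s.ground_truth
        )
    return {regime: sum(hits) / len(hits) for regime, hits in by_regime.items()}

# A naive system that always recommends "strike" aces nominal scenarios
# built around strikes and collapses on the deceptive one.
scenarios = [
    Scenario("clear ISR feed", {"track": "hostile"}, "strike", "nominal"),
    Scenario("spoofed emitter", {"track": "hostile"}, "hold", "deceptive"),
]
print(evaluate(lambda _: "strike", scenarios))
# -> {'nominal': 1.0, 'deceptive': 0.0}
```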
In war, distinguishing between genuine decision advantage and its illusion is not only a moral but a strategic necessity.


