The Pentagon’s Ambitious AI Plans Look Less and Less Like ChatGPT
The military needs tools that can structure data, deliver insights, and be trusted.
OpenAI’s ChatGPT and its ilk have dominated headlines this year, captivating billionaires, fans, regulators, and doomsayers. But much of the recent coverage has also revealed just why the Pentagon is pursuing quite different approaches to AI: military leaders need tools they can trust.
One big reason ChatGPT and other very large language models are so good at mimicking human writing is the same reason they are given to deceit: the neural network at the heart of the AI is fueled by scraping information from millions of websites. While OpenAI hasn’t disclosed what websites it used to train its tools, a recent Washington Post investigation looked at 15 million websites that researchers have used to train similar models. Unsurprisingly, this large mass of data contains much that isn’t true, and so large language and generative AI models frequently lie.
Even if you were to train large language models on a carefully selected pool of websites, you might still run into “artificial hallucination”: “the phenomenon of a machine, such as a chatbot, generating seemingly realistic sensory experiences that do not correspond to any real-world input.”
So DOD is being very careful about using such tools.
“We are not going to use ChatGPT in its present instantiation. However, large language models have a lot of utility,” Maynard Holliday, DOD’s deputy chief technology officer for critical technologies, said Thursday at the Defense One Tech Summit. “We will use these large language models, these generative AI models, based on our data. And so they will be tailored with Defense Department data, trained on our data, and then also on our compute—either our compute in the cloud and/or on [premises] so that it's encrypted, and we're able to, essentially…analyze its feedback.”
This week, Holliday said, the Defense Department will convene a gathering “to get after, you know, just what the use cases are; just what the state of the art is in the industry, and academia.”
DOD also needs to get better at structuring and sharing data, even two years after a seminal directive on the matter, said Mike Horowitz, who leads the Emerging Capabilities Policy Office in the office of the defense undersecretary for policy.
“You need good data, like data that's applicable to the questions that you want to use AI to answer,” Horowitz said. “You need that data to be cleaned, to be tagged, and that process is time-consuming. And that process has been, I think,…challenging. And it's been challenging because we build all of these sort of pipes of data that were designed to be independent from each other.”
Commanders aren’t going to trust a tool unless they can understand how it was trained and on what data, Holliday said.
“Back in 2015, when I was on the Defense Science Board doing a study on autonomy, when we briefed to our combatant commanders they said, ‘You know, this is great, potentially game changing, but…we're not going to use it unless we can trust it,’” he said.
Building Trust
Anyone can play around with ChatGPT to figure out how much to trust it for a given use, but DOD is taking a more formal route.
“Initial trust can be gained from design and development decisions through soldier touchpoints and basic psychological safety and continuously calibrated trust through evidence of effectiveness and feedback provided by the system during integration and operation. And so challenges exist in measuring the warfighters’ trust, which require additional research and understanding to define what influences that,” Holliday said.
In practice, that looks a lot like some of the exercises that CENTCOM is now undertaking, bringing together operators across services and AI in a widening series of games and evaluations centered on emerging technologies.
Exercises like Scarlet Dragon Oasis and Falcon Oasis are structured differently from the traditional military training game, said Schuyler Moore, U.S. Central Command’s chief technology officer. These new tech-focused CENTCOM exercises occur in quick succession and are geared around innovating the technology based on soldier feedback as much as building operator skills, Moore said at the Tech Summit. Getting operators and builders collaborating as part of the exercises is also a key component.
These are “intended to follow in many ways the best practices of the software community and private sector, which is that: you do this in sprints; you do it iteratively and you repeat these exercises over and over again to improve over time,” she said. “So for the exercise that we're doing right now, there is a muscle memory that we're building, iterating back and forth with a software developer and not saying whatever software capability I've been handed is static…The expectation now is that you can and will poke holes in it, share your feedback, iterate with the team, continue to give your feedback every single time and that–to be frank–has been a cultural mindset shift because exercises previous have never given people the opportunity to experiment with that type of activity.”
Andrew Moore, a CENTCOM advisor on AI, robotics, cloud computing, and data analytics, came to the command from Google, where he worked on a variety of AI-related projects, including Project Maven, which is seen as a model for how the military might develop human-AI teams in the future.
CENTCOM played a key role in launching Maven, as many analysts had the job of sifting through hours and hours of drone data to understand how different people on the ground were behaving and which ones might pose a threat.
The command is working to take that sort of research further, to enable AI engines to make better sense of the objects picked up by drones, Andrew Moore said.
“The next real question is making sure that you're able to do inferences about what's really happening based on finding relationships between all these dots on maps,” he said.
A breakthrough AI application for CENTCOM in the years ahead will likely look less like a flashy, and buggy, text generator and more like a knowledge graph, which Moore worked on at Google. A knowledge graph structures rapidly incoming data according to a rough model of objects, their properties, and the relationships between them. When you go to a social media website and see recommendations for who you might connect with, that’s in part due to a knowledge graph.
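To make the idea concrete, here is a minimal sketch of a knowledge graph in Python. It stores facts as (subject, relation, object) triples and suggests new connections by counting mutual acquaintances. The class, the function, the "knows" relation, and the names in the sample data are all hypothetical illustrations; nothing here describes how Google, any social network, or CENTCOM actually builds such a system.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory store of (subject, relation, object) triples (illustrative only)."""

    def __init__(self):
        # relation -> subject -> set of objects
        self._edges = defaultdict(lambda: defaultdict(set))

    def add(self, subject, relation, obj):
        self._edges[relation][subject].add(obj)

    def objects(self, subject, relation):
        # Return a copy so callers cannot mutate the graph by accident.
        return set(self._edges[relation].get(subject, set()))


def recommend_connections(graph, person):
    """Rank people who share connections with `person` but are not yet connected."""
    direct = graph.objects(person, "knows")
    counts = defaultdict(int)
    for friend in direct:
        for candidate in graph.objects(friend, "knows"):
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    # Most mutual connections first; ties broken alphabetically.
    return sorted(counts, key=lambda name: (-counts[name], name))


# Hypothetical sample data: each pair is a mutual "knows" relationship.
graph = KnowledgeGraph()
pairs = [("ana", "ben"), ("ana", "cy"), ("ben", "dee"), ("cy", "dee"), ("cy", "eli")]
for a, b in pairs:
    graph.add(a, "knows", b)
    graph.add(b, "knows", a)

suggestions = recommend_connections(graph, "ana")
print(suggestions)  # ['dee', 'eli']
```

In this toy data, "dee" ranks first because she shares two connections with "ana", while "eli" shares one. Production systems weigh many more relation types and signals, but the principle of reasoning over stored relationships is the same.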
“Knowledge graphs…in my opinion, are what's creating these trillion-dollar companies that you see on the West Coast of the United States,” he said.
But for CENTCOM, Moore envisions building an engine to understand relationships between objects on a much deeper level, allowing command staff to see connections that will illuminate the battlefield and all the objects in it in ways that adversaries are trying to keep hidden, or may not even be aware of.
“I think that's going to be one of the unifying themes you'll see,” he said. “It's the absolute importance not just being able to ingest large amounts of data, being able to normalize it in a way where we can actually do inferences so that, perhaps, it's not just that this ship on the ocean is taking a strange trajectory, but also…their financing, or perhaps by their ownership, or other secondary or tertiary information like that.”
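The ingest, normalize, and infer sequence Moore describes can be sketched in a few lines of Python. The example below is a rough illustration under stated assumptions: the three feeds, their field names, the vessel, the company, and the bank are all invented, and the name-matching rule is deliberately simplistic. It normalizes names so records from independent sources line up, then walks outward from a ship to collect secondary and tertiary facts.

```python
from collections import deque


def normalize_name(raw):
    """Collapse case, punctuation and extra spaces so records from different feeds match."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in raw.lower())
    return " ".join(cleaned.split())


# Hypothetical records from three independent "pipes" of data.
tracking_feed = [{"vessel": "MV  Example-Star", "flag": "unusual_track"}]
registry_feed = [{"ship_name": "mv example star", "owner": "Example Holdings Ltd."}]
finance_feed = [{"company": "EXAMPLE HOLDINGS LTD", "financed_by": "Sample Bank"}]


def build_triples():
    """Merge all feeds into one set of (subject, relation, object) facts."""
    triples = set()
    for rec in tracking_feed:
        triples.add((normalize_name(rec["vessel"]), "has_flag", rec["flag"]))
    for rec in registry_feed:
        triples.add((normalize_name(rec["ship_name"]), "owned_by", normalize_name(rec["owner"])))
    for rec in finance_feed:
        triples.add((normalize_name(rec["company"]), "financed_by", normalize_name(rec["financed_by"])))
    return triples


def facts_within(triples, start, max_hops):
    """Breadth-first walk returning (hops, subject, relation, object) up to max_hops away."""
    by_subject = {}
    for s, r, o in triples:
        by_subject.setdefault(s, []).append((r, o))
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for r, o in sorted(by_subject.get(node, [])):
            found.append((hops + 1, node, r, o))
            if o not in seen:
                seen.add(o)
                queue.append((o, hops + 1))
    return found


facts = facts_within(build_triples(), normalize_name("MV Example-Star"), max_hops=2)
for fact in facts:
    print(fact)
```

Without the normalization step, the three spellings of the ship and its owner would never join, and the financing link two hops away would stay invisible. Real entity resolution is far harder than lowercasing and stripping punctuation, which is part of why the data-cleaning work described earlier is so time-consuming.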