Trustworthy battlefield software depends on better testing
The Air Force Office of Scientific Research is taking a new approach to software testing in an effort to significantly reduce the number of errors found in software used in theater.
As the military increasingly depends on computerized systems to conduct operations on the net-centric battlefield, software reliability has become paramount.
That’s the theory behind a new approach to software testing that the Air Force Office of Scientific Research is reviewing. AFOSR's goal is to significantly reduce the number of errors found in software used in theater.
One result of this research could be software in which military units can quickly swap various components or replace software to fix faulty systems, much as computers and other electronic hardware can be repaired now.
It also could lead to faster development of military software systems.
The University of Nebraska is conducting the research, which AFOSR and the National Science Foundation are funding.
Generating tests accounts for a large part of the overall time needed to test software, said David Luginbuhl, program manager of systems and software at AFOSR, who is overseeing the research project. The Nebraska project has developed an algorithm that can generate tests 300 times faster than other methods.
The basis of the project, dubbed Just Enough Testing, is to reuse test results across different systems that share similar sets of features, thereby reducing the time needed to test a single system.
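The reuse idea can be illustrated with a minimal sketch. This is not the Just Enough Testing algorithm itself. It assumes, purely for illustration, that a test exercises one pair of features and that its result holds for any product containing that pair. The product names, feature names, and the `check_product` helper are hypothetical.

```python
from itertools import combinations

def pairs(features):
    """Return every unordered pair of features present in one product."""
    return {frozenset(p) for p in combinations(sorted(features), 2)}

def check_product(features, run_test, cache):
    """Test each feature pair in a product, reusing results already in cache.

    Returns the number of tests that actually had to be executed.
    """
    executed = 0
    for pair in pairs(features):
        if pair not in cache:
            cache[pair] = run_test(pair)  # only untested pairs cost time
            executed += 1
    return executed

# Two hypothetical products that share most of their features.
phone_a = {"address_book", "dialer", "text_messaging"}
phone_b = {"address_book", "dialer", "video_messaging"}

cache = {}
always_pass = lambda pair: True  # stand-in for a real test run
first = check_product(phone_a, always_pass, cache)
second = check_product(phone_b, always_pass, cache)
print(first, second)  # prints "3 2"
```

The second product needs only two new tests because the address book and dialer pairing was already checked for the first product.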
The project has been looking specifically at testing software product lines, which are families of large software systems, said Myra Cohen, an assistant professor at the University of Nebraska and the project's lead researcher.
If you consider mobile phones as a particular product, there are many different kinds that have common and not-so-common elements, Cohen said. For example, although all will have an address book and dialing software, there could be many different graphical elements because some will offer text messaging or video messaging.
Likewise, various software products can have common and uncommon elements.
In the world of software development — and in the military — people are looking to build families of software products rather than single products, she said. So her project team's goal has been to build a model that will include elements that are common across all software products, in addition to things that vary.
“Having that model, we can now put together a model that can be used to test many, many systems at one time by building a common set of elements for all of them,” Cohen said. “And what we are looking for [in testing] are specific types of faults that happen when we put these elements together.”
She said that although algorithms already exist that can produce samples for testing, few can handle dependencies among software elements. Developers make assumptions about how those elements interact, and faults occur when elements are put together in a way that breaks those assumptions.
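To make the sampling problem concrete, here is a small sketch of one common technique, greedy pairwise sampling that respects dependencies. It is an illustration under stated assumptions, not the Nebraska team's algorithm. The `FACTORS` table and the rule in `is_valid` (video messaging requires a camera and a color display) are invented for the example.

```python
from itertools import combinations, product

# Hypothetical phone product line: each factor has a set of options.
FACTORS = {
    "messaging": ["none", "text", "video"],
    "camera": ["no", "yes"],
    "display": ["mono", "color"],
}

def is_valid(config):
    """Assumed dependency: video messaging needs a camera and a color display."""
    if config["messaging"] == "video":
        return config["camera"] == "yes" and config["display"] == "color"
    return True

def pairs_of(config):
    """All pairs of (factor, option) settings that occur together in a configuration."""
    return set(combinations(sorted(config.items()), 2))

def pairwise_sample(factors, valid):
    """Greedily pick valid configurations until every feasible pair is covered."""
    names = list(factors)
    configs = [dict(zip(names, values)) for values in product(*factors.values())]
    configs = [c for c in configs if valid(c)]  # drop combinations that break a dependency
    uncovered = set().union(*(pairs_of(c) for c in configs))
    sample = []
    while uncovered:
        # Choose the configuration that covers the most still-uncovered pairs.
        best = max(configs, key=lambda c: len(pairs_of(c) & uncovered))
        sample.append(best)
        uncovered -= pairs_of(best)
    return sample

sample = pairwise_sample(FACTORS, is_valid)
for config in sample:
    print(config)
```

Every pair of options that can legally occur together appears in at least one chosen configuration, so the sample needs fewer tests than running all nine valid configurations. Enumerating every configuration up front only works for toy models like this one; large product lines need a faster search.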
A corollary is that this testing will also produce information about which software components work well with one another. That means it could be easier for technicians in theater to test for faulty software and swap potentially bad software for components that are known to work well together.
“When you switch out a component, you will want to ensure that component will work well with the entire system,” Luginbuhl said. “That ability speaks to the whole concept of this type of testing.”
It’s also possible that at least part of this testing approach could be done early in the software development process, after you have the model for particular software products, Cohen said. If you know the final set of products, you know what to measure, she said.
How quickly this approach will find its way into the mainstream is an open question. The University of Nebraska recently released an open-source version of the test-generating algorithm to stimulate broader input, but Cohen said it might take some time to build confidence in the procedure.
“As [the technique] matures and people become more aware of it, we expect a well-formed demand will appear,” Luginbuhl said. “But we probably also need to be a bit more proactive with it.”