Report Date: 2/2002 Price Tag: $60 Billion Annually
WASHINGTON (COMPUTERWORLD) - Software bugs are costing the U.S. economy an estimated $59.5 billion each year, with more than half of the cost borne by end users and the remainder by developers and vendors, according to a new federal study.
Improvements in testing could reduce this cost by about a third, or $22.5 billion, but it won't eliminate all software errors, the study said. Of the total $59.5 billion cost, users incurred 64% of the cost and developers 36%.
NIST Report [local copy], News Release
Out of curiosity about how the study calculated the cost, I skimmed through the report. The following is a summary of its methodology.
It divided the software development process into stages: Requirements Gathering and Analysis, Architectural Design, Coding, Unit Test, Integration and Component, RAISE System Test, Early Customer Feedback, Beta Test Programs, and Post-product Release.
Bugs are introduced at each stage of the software development process, and the later a bug is discovered, the more costly it is to repair. The study then developed impact estimates relative to two counterfactual scenarios. The first scenario estimates the cost reduction if all bugs and errors could be found in the same development stage in which they are introduced; this is referred to as the cost of an inadequate software testing infrastructure. The second scenario estimates the cost reduction from finding an increased percentage (but not 100 percent) of bugs and errors closer to the development stages where they are introduced; this is referred to as the cost reduction from feasible infrastructure improvements.
The study examined the impact of buggy software in several major industries -- automotive, aerospace and financial services -- and then extrapolated the results to the U.S. economy as a whole. Under the first scenario, it concluded that software bugs cost the U.S. economy an estimated $59.5 billion each year. Under the second scenario, improvements in testing could reduce this cost by about a third, or $22.5 billion.
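The shape of this methodology can be illustrated with a toy model. In the sketch below, the five stage names are a simplified subset of the report's stages, the relative repair costs are made-up numbers (not the report's figures), and "found one stage earlier" is an assumed stand-in for the second scenario's "increased percentage" of earlier detection. Each bug is recorded as the pair (stage introduced, stage found).

```python
# Simplified stage names (a subset of the report's stages).
STAGES = ["requirements", "coding", "integration", "beta", "post-release"]

# Hypothetical relative repair costs by the stage where a bug is FOUND.
# Illustrative only; these are NOT the NIST report's figures.
RELATIVE_COST = {"requirements": 1, "coding": 5, "integration": 10,
                 "beta": 15, "post-release": 30}

Bug = tuple[str, str]  # (stage introduced, stage found)


def repair_cost(bugs: list[Bug]) -> int:
    """Total relative cost of fixing every bug at the stage where it was found."""
    return sum(RELATIVE_COST[found] for _introduced, found in bugs)


def ideal_scenario(bugs: list[Bug]) -> list[Bug]:
    """First counterfactual: every bug is found in the stage that introduced it."""
    return [(introduced, introduced) for introduced, _found in bugs]


def improved_scenario(bugs: list[Bug]) -> list[Bug]:
    """Second counterfactual (assumed simplification): each late bug is found
    one stage earlier, but never before the stage that introduced it."""
    improved = []
    for introduced, found in bugs:
        earliest = STAGES.index(introduced)
        earlier = max(earliest, STAGES.index(found) - 1)
        improved.append((introduced, STAGES[earlier]))
    return improved


bugs = [("requirements", "post-release"), ("coding", "integration"),
        ("coding", "beta"), ("coding", "coding")]
baseline = repair_cost(bugs)                     # 60
ideal = repair_cost(ideal_scenario(bugs))        # 16
improved = repair_cost(improved_scenario(bugs))  # 35
print("cost of inadequate infrastructure:", baseline - ideal)     # 44
print("savings from feasible improvement:", baseline - improved)  # 25
```

The gap between the baseline and the ideal scenario plays the role of the first estimate, and the smaller gap to the improved scenario plays the role of the second.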
The report also includes interesting tables (in Chapters 6 and 7) showing how often errors are found at each stage and the relative cost to repair defects found at different stages.
Incident Date: 12/31/2008 Ironic Factor: ****
(Associated Press) Happy New Year from Microsoft Corp.: Your Zune is dead.
Thousands of Microsoft's Zune media players -- the software company's answer to Apple Inc.'s iPod -- unexpectedly conked out Wednesday and showed users an error message, prompting references to 'Y2K for Zunes'. The problems appeared when people tried to start up their devices.
Article [Local copy]
The software bug behind the freeze was later isolated. It is a dumb programming bug that causes trouble only on the last day of a leap year.
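The widely circulated explanation traces the hang to a loop in the device's clock driver that converts a day count into a year. The Python sketch below re-creates that loop under stated assumptions (epoch year 1980, with day 1 = January 1, 1980; the function names and the iteration guard are hypothetical, not from the driver). On day 366 of a leap year the buggy loop neither subtracts days nor advances the year, so it never exits; the guard only exists so the sketch can report the hang instead of freezing like the device did.

```python
ORIGIN_YEAR = 1980  # assumption: epoch year used by the widely quoted driver code


def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_to_year_buggy(days: int, max_iterations: int = 10_000):
    """Mirrors the reported loop; returns None where the device would hang."""
    year = ORIGIN_YEAR
    iterations = 0
    while days > 365:
        iterations += 1
        if iterations > max_iterations:
            return None  # the real loop had no guard and spun forever
        if is_leap_year(year):
            if days > 366:
                days -= 366
                year += 1
            # BUG: when days == 366, nothing changes and the loop repeats
        else:
            days -= 365
            year += 1
    return year, days


def days_to_year_fixed(days: int):
    """Stops as soon as the remaining days fit inside the current year."""
    year = ORIGIN_YEAR
    while True:
        year_length = 366 if is_leap_year(year) else 365
        if days <= year_length:
            return year, days
        days -= year_length
        year += 1
```

With day 1 as January 1, 1980, December 31, 2008 is day 10593: `days_to_year_buggy(10593)` returns `None`, while `days_to_year_fixed(10593)` returns `(2008, 366)`. One day earlier or later, both versions agree, which is why the bug surfaced only on that single day.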
Incident Date: 9/14/2004 Ironic Factor: *****
(IEEE Spectrum) -- It was an air traffic controller's worst nightmare. Without warning, on Tuesday, 14 September, at about 5 p.m. Pacific daylight time, air traffic controllers lost voice contact with 400 airplanes they were tracking over the southwestern United States. Planes started to head toward one another, something that occurs routinely under careful control of the air traffic controllers, who keep airplanes safely apart. But now the controllers had no way to redirect the planes' courses.
The controllers lost contact with the planes when the main voice communications system shut down unexpectedly. To make matters worse, a backup system that was supposed to take over in such an event crashed within a minute after it was turned on. The outage disrupted about 800 flights across the country.
Inside the control system unit is a countdown timer that ticks off time in milliseconds. The VCSU uses the timer as a pulse to send out periodic queries to the VSCS. It starts out at the highest possible number that the system's server and its software can handle: 2^32. It's a number just over 4 billion milliseconds. When the counter reaches zero, the system runs out of ticks and can no longer time itself. So it shuts down.
Counting down from 2^32 to zero in milliseconds takes just under 50 days. The FAA procedure of having a technician reboot the VSCS every 30 days resets the timer to 2^32 almost three weeks before it runs out of digits.
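The arithmetic behind "just under 50 days" and "almost three weeks" is easy to check. This short sketch follows the article's figure of a countdown starting at 2^32 milliseconds; the function names are hypothetical.

```python
MS_PER_DAY = 1000 * 60 * 60 * 24  # 86,400,000 milliseconds in a day


def countdown_days(bits: int = 32) -> float:
    """Days for a millisecond countdown starting at 2**bits to reach zero."""
    return 2 ** bits / MS_PER_DAY


def margin_days(reboot_interval_days: float, bits: int = 32) -> float:
    """Slack between a scheduled reboot (which resets the timer) and expiry."""
    return countdown_days(bits) - reboot_interval_days


print(f"{countdown_days():.2f} days")  # 49.71 days
print(f"{margin_days(30):.2f} days")   # 19.71 days
```

A 30-day reboot leaves roughly 19.7 days of slack; a single missed reboot consumes it.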
Article [Local copy]
Incident Date: 8/14/2003 Price Tag: $7 - $10 Billion Ironic Factor: **
NEW YORK (AP) - A programming error has been identified as the cause of alarm failures that might have contributed to the scope of last summer's Northeast blackout, industry officials said Thursday.
... The failures occurred when multiple systems trying to access the same information at once got the equivalent of busy signals, he said. The software should have given one system precedence.
With the software not functioning properly at that point, data that should have been deleted were instead retained, slowing performance, he said. Similar troubles affected the backup systems.
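One standard way to give a single caller precedence over shared data is a mutual-exclusion lock: whichever system acquires it first proceeds, and the others wait their turn instead of colliding. The minimal Python sketch below assumes a hypothetical shared alarm counter updated by several monitoring threads; it illustrates the technique only and is not the actual alarm software.

```python
import threading


class SharedAlarmQueue:
    """Hypothetical shared state updated by several monitoring systems."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.pending_alarms = 0

    def post_alarm(self) -> None:
        # The lock serializes this read-modify-write: one system proceeds,
        # the others block until it releases the lock.
        with self._lock:
            current = self.pending_alarms
            self.pending_alarms = current + 1


def simulate(num_systems: int = 4, alarms_each: int = 10_000) -> int:
    """Run several 'systems' concurrently and return the final alarm count."""
    queue = SharedAlarmQueue()

    def system() -> None:
        for _ in range(alarms_each):
            queue.post_alarm()

    threads = [threading.Thread(target=system) for _ in range(num_systems)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return queue.pending_alarms
```

Because every update happens under the lock, `simulate()` always accounts for every posted alarm (4 systems × 10,000 alarms).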
News Release [local copy], Cost Estimate [local copy]
Incident Date: 9/23/1999 Price Tag: $125 million Ironic Factor: ****
WASHINGTON (AP) -- For nine months, the Mars Climate Orbiter was speeding through space and speaking to NASA in metric. But the engineers on the ground were replying in non-metric English.
It was a mathematical mismatch that was not caught until after the $125-million spacecraft, a key part of NASA's Mars exploration program, was sent crashing too low and too fast into the Martian atmosphere. The craft has not been heard from since.
Noel Henners of Lockheed Martin Astronautics, the prime contractor for the Mars craft, said at a news conference it was up to his company's engineers to assure the metric systems used in one computer program were compatible with the English system used in another program. The simple conversion check was not done, he said.
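A cheap defense against this kind of mismatch is to carry units alongside numbers and convert explicitly at program boundaries. The sketch below uses impulse because the later investigation traced the mismatch to thruster impulse data reported in pound-force seconds where newton-seconds were expected; the `Impulse` class, its unit strings, and the sample value are hypothetical. Only the conversion factor is standard (1 lbf = 4.4482216152605 N exactly).

```python
from dataclasses import dataclass

# 1 pound-force = 4.4482216152605 newtons (exact by definition),
# so 1 lbf*s = 4.4482216152605 N*s.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A hypothetical unit-tagged impulse value."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def in_newton_seconds(self) -> float:
        """Convert explicitly; refuse units we do not recognize."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


def consume_raw_number(value: float) -> float:
    """The unchecked path: a bare float is silently assumed to be N*s."""
    return value


reported = Impulse(10.0, "lbf*s")
raw = consume_raw_number(reported.value)  # 10.0, too small by a factor of about 4.45
checked = reported.in_newton_seconds()    # about 44.48
```

Passing a bare float across the boundary discards the unit; passing the tagged value forces the receiver to convert or fail loudly.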
Article [local copy]
Incident Date: 11/1993 - 6/1994 Price Tag: > $200 million Ironic Factor: *
(Scientific American) -- Scheduled for takeoff by last Halloween (1993), the airport's grand opening was postponed until December to allow BAE Automated Systems time to flush the gremlins out of its $193-million system. December yielded to March. March slipped to May. In June the airport's planners, their bond rating demoted to junk and their budget hemorrhaging red ink at the rate of $1.1 million a day in interest and operating costs, conceded that they could not predict when the baggage system would stabilize enough for the airport to open.
Software's Chronic Crisis
Incident Date: 9/1997 Ironic Factor: ****
(Government Computer News) The Navy's systems chief has begun an investigation into the computer failure that left the Aegis cruiser USS Yorktown dead in the water for several hours last fall.
On Sept. 21, 1997, the Yorktown experienced what the Navy called "an engineering LAN casualty" [GCN, July 13, Page 1]. A systems administrator fed bad data into the ship's Remote Database Manager, which caused a buffer overflow when the software tried to divide by zero. The overflow crashed computers on the LAN and caused the Yorktown to lose control of its propulsion system, Navy officials said.
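The Navy's account describes unvalidated input reaching a division. A minimal defensive pattern is to validate operator-entered fields at the point of entry, before they reach any arithmetic. The sketch below is hypothetical: the function and exception names are illustrative and do not come from the ship's software.

```python
class RejectedEntry(ValueError):
    """Raised when an operator-entered field fails validation."""


def parse_nonzero(raw: str) -> float:
    """Parse an operator-entered field that will later be used as a divisor."""
    value = float(raw)  # raises ValueError for non-numeric text
    if value == 0.0:    # also catches "-0", since -0.0 == 0.0
        raise RejectedEntry(f"divisor must be non-zero, got {raw!r}")
    return value


def compute_rate(quantity: float, raw_divisor: str) -> float:
    """Divide only after the input has been validated."""
    return quantity / parse_nonzero(raw_divisor)
```

Without the check, `quantity / float("0")` raises `ZeroDivisionError` deep inside the computation; with it, the bad entry is rejected at the keyboard, where an operator can correct it.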
Article [local copy]
Incident Date: 6/4/1996 Price Tag: $500 million Ironic Factor: ****
(By James Gleick) It took the European Space Agency 10 years and $7 billion to produce Ariane 5, a giant rocket capable of hurling a pair of three-ton satellites into orbit with each launch and intended to give Europe overwhelming supremacy in the commercial space business.
All it took to explode that rocket less than a minute into its maiden voyage last June, scattering fiery rubble across the mangrove swamps of French Guiana, was a small computer program trying to stuff a 64-bit number into a 16-bit space.
This shutdown occurred 36.7 seconds after launch, when the guidance system's own computer tried to convert one piece of data -- the sideways velocity of the rocket -- from a 64-bit format to a 16-bit format. The number was too big, and an overflow error resulted. When the guidance system shut down, it passed control to an identical, redundant unit, which was there to provide backup in case of just such a failure. But the second unit had failed in the identical manner a few milliseconds before. And why not? It was running the same software.
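The failing conversion can be sketched with Python's standard `struct` module, which raises an error rather than silently truncating when a value does not fit in a signed 16-bit field, much as the flight software raised an exception it did not handle. The function names and test values are hypothetical. The sketch also shows why the redundant unit did not help: both units run the same function on the same input.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unchecked(value: float) -> int:
    """Pack into a signed 16-bit field with no range check first.
    Out-of-range values raise struct.error."""
    packed = struct.pack("<h", int(value))
    return struct.unpack("<h", packed)[0]


def to_int16_saturating(value: float) -> int:
    """One possible fix: clamp to the 16-bit range before converting."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


def run_redundant_units(value: float) -> list[str]:
    """Both units run identical code, so both fail on the same input."""
    results = []
    for unit in ("primary unit", "backup unit"):
        try:
            to_int16_unchecked(value)
            results.append(f"{unit}: ok")
        except struct.error:
            results.append(f"{unit}: overflow, shut down")
    return results
```

A value of 50000.0 takes down both units, while 1200.5 passes through both. Clamping is only one design choice; depending on the system, flagging the value as invalid may be safer than silently saturating it.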
Thomas Huckle's Collection of Software Bugs
National Vulnerability Database
Jonathan Jacky's Safety-Critical Computing Page
They Write the Right Stuff
Software [In]security: Software Security Demand Rising
Last updated: 8/25/2009
Author: Gang Tan, Lehigh University
Please send comments to gtan AT cse.lehigh.edu