Software bug


A software bug, or simply a bug, is a problem in a computer program or software system that produces an undesired result. Programs that help detect and eliminate errors in software are called debuggers.

Notable incidents caused by this type of error include the Therac-25 radiotherapy machine accidents in the 1980s, the destruction of the Mariner 1 space probe in 1962, the failure of Ariane 5 Flight 501 in 1996, and the Airbus A400M crash in 2015. Also well known are the AT&T network incidents in 1990 and the Boeing 737 MAX accidents in 2018 and 2019, which led to the aircraft being grounded for months because of failures in the MCAS software.

In 2002, a study commissioned by the US Department of Commerce's National Institute of Standards and Technology found that computer errors cost the US economy $59.5 billion a year, or 0.6% of GDP.

Origins of the term

Photo of the origin of the legend about the first known computer bug.

On September 9, 1947, mathematician and physicist Grace Murray Hopper and others working on the Mark II at Harvard University reported that the computer had suffered a failure in electromagnetic relay #70 on panel F. When they investigated that relay, the team found an electrocuted moth that had caused the relay to stay open. Hopper taped the insect to the logbook with the comment:

‘First actual case of bug being found.’

This incident is erroneously cited as the origin of the use of the term bug to indicate a problem in a device or system. In reality, the term was already part of the English language, at least since Thomas Alva Edison used it in his notes in 1872, in an 1878 letter to refer to mechanical or electrical defects, and in 1889 to refer to interference and malfunctions in a phonograph. Hopper may have been the first to associate it with computing, in this case in connection with an actual insect. During the 1950s, Hopper also used the term debug when talking about removing errors from program code. The first recorded use of that term is in the Journal of the Royal Aeronautical Society in 1945:

‘It ranged from the pre-design development of essential components, through the stage of type test and flight test and ‘debugging’ right through to later development of the engine.’

Common Programming Errors

  • Division by zero
  • Infinite loop
  • Arithmetic problems such as overflow or underflow.
  • Exceeding the bounds of a defined array
  • Using an uninitialized variable
  • Miscellaneous typographical errors, for example confusing the digit "0" with the letter "O", confusing the digit "1" with the letters "I" or "l", or writing "," instead of "." in languages where variables do not have to be declared before they are used. For example, in Fortran IV spaces are not significant, so replacing the comma with a period in the loop statement DO 100 I=1,10 yields DO 100 I=1.10, which is equivalent to the assignment DO100I = 1.10. Because that is a valid statement, the compiler does not report it.
  • Using magic constants, i.e. writing a literal value in the code instead of defining a variable or macro for a parameter that can change. For example, declaring an array float X[10] and using loops for i=0 to 10 in several parts of the program, rather than defining a constant XSize=10; float X[XSize]; and using for i=0 to XSize. The named constant avoids omissions when the dimension of X changes, since it is enough to change XSize=20 instead of searching all the code for every magic constant 10 and changing it to 20.
  • Accessing memory that is not permitted (access violation)
  • Memory leak
  • Stack overflow
  • Buffer overflow
  • Deadlock
  • Inappropriate indexing of tables in databases.
  • Corruption of relational databases because they are not normalized.
  • Overflow of the recursion stack, when too many calls are left pending.
  • Programming by trial and error rather than developing programs systematically.
  • Ignoring warning messages when compiling a program.
  • Errors resulting from the indiscriminate use of global variables.
  • Errors from not declaring the type of variables.
  • Errors that arise because the programmer does not take into account the semantics of the programming language, for example not knowing whether scoping is lexical or dynamic.
  • Syntax errors, for example not ending a statement with a semicolon (;) in a programming language that requires it.
  • Not documenting the program correctly, which makes it hard to understand what it does and so leads to errors.
  • Race condition
  • Incorrect use of APIs.
  • Systems incompatibility.
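
The magic-constant item in the list above can be illustrated with a minimal sketch. Python is used here only for illustration, and the names X_SIZE, total_with_magic_constant, and total_with_named_size are hypothetical; they do not come from any particular program.

```python
# Hypothetical illustration of the "magic constant" problem described above.
X_SIZE = 20  # the array was resized from 10 to 20 by changing only this line


def total_with_magic_constant(values):
    """Sum the first 10 elements; the literal 10 was missed during the resize."""
    total = 0.0
    for i in range(10):  # magic constant: silently ignores elements 10..19
        total += values[i]
    return total


def total_with_named_size(values):
    """Use the named constant, so the loop follows the resize automatically."""
    total = 0.0
    for i in range(X_SIZE):
        total += values[i]
    return total


x = [1.0] * X_SIZE
print(total_with_magic_constant(x))  # 10.0: wrong, yet no error is raised
print(total_with_named_size(x))      # 20.0
```

The version with the literal does not crash; it simply returns a wrong result, which is why this kind of bug is easy to overlook.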

Installation or programming defects

Computer screen showing a software error at Gate 11 of Santiago-Pudahuel Airport
  • Removal or replacement of libraries common to more than one program or system (DLL Hell).
  • Arbitrarily restarting a user's session so that the installation takes effect.
  • Assuming that the user has a permanent internet connection.
  • Using symbolic links as a source for files that can change location.

Programming Language Error Codes

Most programming languages distinguish at least two types of errors, which lets programmers handle program failures efficiently and without being disruptive to the end user: compile-time errors and runtime errors.

Compile-time errors typically prevent the source code from being turned into an executable program, while runtime errors are specific situations in which an event external to the program prevents it from running. A careful programmer should anticipate how to respond to these events so that it is the program, and not the user or the operating system, that resolves the problem. For example, a block without error handling could do the following:

Open the file "myfile" for writing
Start writing data to "myfile"
Close the file

If "myfile" does not exist (or the program or the user does not have sufficient privileges to open it), the operating system returns an error that the program does not catch, and the user gets a message such as 'The file "myfile" cannot be opened for writing' with buttons to retry, cancel, and abort (on the Windows operating system). These buttons do nothing but bring the same message up again, with no way out of the cycle except forcibly terminating the program. Code that catches the error at runtime would be:

Open the file "myfile" for writing
If the operating system allows it
    start writing data to "myfile"
If it does not allow it
    inform the user of what happened
    return to a point where there is no conflict (the main menu, for example)
Continue operating normally

Different programming languages offer different constructs for catching and resolving errors at runtime, such as assert, try, and on error statements.
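
As one possible rendering of the steps above, the following minimal sketch uses Python's try statement. The function name save_data and the temporary directory are assumptions made for the example, and returning False to the caller stands in for "returning to a point where there is no conflict".

```python
import os
import tempfile


def save_data(path, data):
    """Try to write data to path; report a failure instead of crashing.

    Returns True on success and False if the operating system refused.
    """
    try:
        # Open the file for writing; this is where the OS may refuse.
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(data)
    except OSError as error:
        # Inform the user and hand control back to the caller.
        print(f"Could not write to {path!r}: {error}")
        return False
    return True


# Usage: one path that works and one inside a directory that does not exist.
workdir = tempfile.mkdtemp()
print(save_data(os.path.join(workdir, "myfile"), "some data"))
print(save_data(os.path.join(workdir, "missing", "myfile"), "some data"))
```

The first call succeeds; the second reports the problem and returns False, after which the program continues to operate normally.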

Program design flaws

  • Designs with colors unsuitable for people with color blindness
  • Designs that use text that is hard to read because of its size or typeface
  • Designs that force mouse use without offering keyboard alternatives for people with motor impairments
  • Assuming that the machine where the software will be installed has certain features (such as screen resolution, processor speed, amount of memory, or internet connectivity) typical of high-end equipment, instead of designing the software to run on ordinary equipment

Implications

The type and amount of damage that a software bug can produce affect decision-making processes and software quality policy. In applications for crewed space travel and for automobiles, software quality controls must be more stringent.

Notable cases

Y2K

The year 2000 (Y2K) problem could have caused an economic collapse, because many programs would have interpreted the year 2000 as the year 1900. Efforts to adapt and correct programs prevented serious problems.
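
The misinterpretation described above can be sketched as follows, assuming a program that stores years as two digits. The function names are hypothetical, and the "windowing" repair with a pivot of 50 is only one of the remediation techniques that were used, with the pivot value chosen arbitrarily for this example.

```python
def expand_year_naive(two_digit_year):
    """Pre-2000 assumption: every two-digit year belongs to the 1900s."""
    return 1900 + two_digit_year


def expand_year_windowed(two_digit_year, pivot=50):
    """Windowing: values below the pivot are read as 20xx, the rest as 19xx."""
    if two_digit_year < pivot:
        return 2000 + two_digit_year
    return 1900 + two_digit_year


def age_in(current_yy, birth_yy, expand):
    """Age of someone born in birth_yy, computed in year current_yy."""
    return expand(current_yy) - expand(birth_yy)


# Someone born in '65, with the age computed in year '00.
print(age_in(0, 65, expand_year_naive))     # -65
print(age_in(0, 65, expand_year_windowed))  # 35
```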

Knight Capital

Disruption of the New York Stock Exchange in 2012. On August 1, 2012, Knight Capital caused a stock market disruption that cost the company 75% of its market capitalization in two days. Knight Capital used the SMARS software to manage the orders to be executed in the market in an automated, high-speed, algorithmic way. The code still contained the "Power Peg" functionality, which had not been used since 2003; it had never been removed and could still run if called. Since 2005, the count of shares executed for each order had been performed by another part of the code, outside "Power Peg". Starting on July 27, 2012, the new SMARS software was installed on the servers in phases over several days. The new software reused a flag that in the old version activated "Power Peg". A technician forgot to copy the new RLP code to one of the eight SMARS servers that handled automated stock buy and sell orders. On August 1, 2012, the server that had not been updated ran the old "Power Peg" code and sent millions of orders, because the count of shares executed for each order was not reported back to it and so the buying process never stopped. In an attempt to fix the problem, the new RLP code was uninstalled from the seven servers that were working correctly, which made the problem worse. In 45 minutes SMARS executed 4 million trades in 154 stocks, moving 397 million shares, when it should have executed only 212 small orders. Knight Capital lost $460 million and was fined $12 million by the SEC for violating market regulations.

Boeing 737 MAX

Boeing 737 MAX 8

Two accidents involving Boeing 737 MAX aircraft, in 2018 and 2019, led the FAA to ground the Boeing 737 MAX for months, starting on March 13, 2019, because of failures in the MCAS (Maneuvering Characteristics Augmentation System) software. 346 people died in the accidents. The FAA had allowed manufacturers such as Boeing to issue airworthiness certificates for their own planes. In November 2019, the FAA suspended Boeing's ability to issue certificates for MAX aircraft.

MCAS in the MAX was designed to be activated by the signal from just one of the aircraft's two angle-of-attack sensors, which made it susceptible to a single point of failure. When MCAS detects that the aircraft is in manual flight with the flaps up and at a high angle of attack, it adjusts the horizontal stabilizer to lower the nose so that the aircraft does not climb too steeply and stall. Although MCAS could cause unintended dives, it was not mentioned in flight and training manuals, so pilots were unaware of it. In March 2019, the 387 MAX aircraft then in service, which made 8,600 weekly flights for 59 airlines, were grounded. In January 2020, Boeing estimated that it had lost $18.4 billion in 2019 and had 183 MAX orders cancelled. In 2019 the price of a Boeing 737 MAX ranged from 100 to 135 million USD.

Therac-25

Simulation of the input screen of a Therac-25.
Simulation of the input screen of a Therac-25 with error 54.
Overflow of a one-byte (8-bit) variable.

The Therac-25 was a radiotherapy machine produced by AECL, the successor to the Therac-6 and Therac-20 models (the earlier units were produced in association with CGR). The device was involved in at least six accidents between 1985 and 1987, in which several patients received radiation overdoses. Three of the patients died as a direct consequence. These accidents called into question the reliability of software control in safety-critical systems and became a case study in medical informatics and software engineering. The investigative commission concluded that the primary causes of the accidents were poor development practices, poor requirements analysis, and poor software design, rather than isolated errors in the source code. In particular, the Therac-25's software was designed in such a way that it was nearly impossible to find and fix bugs automatically.

The system did not use a standard operating system. Instead, it ran a proprietary operating system written in PDP-11 assembly language on a PDP-11/23 computer with 32K of memory. When the system detected an error and stopped the X-ray beam, it displayed only the message "MALFUNCTION" followed by a number from 1 to 64. The machine's manual neither explained nor listed the error codes, so the operator ended up dismissing the warning and proceeding with the treatment. AECL staff and machine operators initially did not believe the patients' complaints, because of their high confidence in the machine.

The engineers had reused code from the older models (Therac-6 and Therac-20), which did have mechanical safety systems.

The glitch occurred only when a particular key sequence was entered rapidly on the VT100 terminal that controlled the Therac-25's PDP-11 computer. The operator had filled in all the fields and was in the command field when they noticed an error in the beam type field, which contained an X (X-ray) when it should have contained an E (electron beam). To correct it, the operator moved the cursor up to that field, typed an E, moved the cursor back down to the command field, typed a B, and pressed Enter. The entire sequence was E B Enter. If this sequence was performed in less than 8 seconds, the machine produced a radiation dose that could be up to 1000 times the intended one.

This occurred very rarely, and nobody knew that such a race condition existed. A race condition is an error in which the output or state of a process depends on a sequence of events that run in arbitrary order and operate on the same shared resource. An error can occur when these events do not arrive (are not executed) in the order the programmer expected.
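
The dependence on event order can be shown with a small deterministic sketch. It is not the Therac-25 code: it replays the read and write steps of two hypothetical tasks, A and B, that each try to add 1 to a shared counter, and the function name run and the schedule format are assumptions made for the example.

```python
def run(schedule):
    """Replay read/write steps of tasks that each add 1 to a shared counter."""
    counter = 0  # the shared resource
    seen = {}    # value each task has read but not yet written back
    for task, step in schedule:
        if step == "read":
            seen[task] = counter
        elif step == "write":
            counter = seen[task] + 1
    return counter


# Order the programmer expected: A finishes before B starts.
expected_order = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]
# Order that can also happen: both tasks read before either one writes.
racy_order = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

print(run(expected_order))  # 2
print(run(racy_order))      # 1: one of the two updates is lost
```

Both schedules run exactly the same steps; only their order differs, which is why such errors appear rarely and are hard to reproduce.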

The program changed the flag variable "Class3" by incrementing it each time it ran the set-up test routine, instead of assigning it a fixed value. The "Class3" variable was 1 byte long, so its possible values ranged from 0 to 255. When the value was 255 and 1 was added, "Class3" wrapped around to 0. A value of 0 indicated that the electron beam could be fired, and the position of the collimator was not checked.

The set-up test routine was run hundreds of times in each session for a patient. On one out of every 256 executions of the routine, "Class3" was unintentionally set to 0, the collimator was not checked, and any collimator fault went undetected. The overdose occurred if the operator pressed the Set button at the precise moment when "Class3" rolled over from 255 to 0 (overflow). The software then applied the maximum power of 25 MeV without the target in place and without scanning. AECL corrected this problem by assigning "Class3" a nonzero value each time the routine ran, instead of incrementing it.
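
The wraparound can be reproduced with a short simulation. This is a sketch under stated assumptions, not the original PDP-11 assembly code: the one-byte variable is emulated by masking with 0xFF, the starting value of 0 is arbitrary, and the function names are hypothetical.

```python
def buggy_update(class3):
    """Increment a one-byte flag; 255 + 1 wraps around to 0."""
    return (class3 + 1) & 0xFF


def fixed_update(class3):
    """Assign a fixed nonzero value instead of incrementing."""
    return 1


def count_skipped_checks(update, passes):
    """Count the passes after which the flag is 0, so the check is skipped."""
    class3 = 0
    skipped = 0
    for _ in range(passes):
        class3 = update(class3)
        if class3 == 0:
            skipped += 1
    return skipped


print(buggy_update(255))                         # 0
print(count_skipped_checks(buggy_update, 512))   # 2, one per 256 passes
print(count_skipped_checks(fixed_update, 512))   # 0
```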

OpenSSL

Simplified explanation of the "Heartbleed" bug.

A user of the Debian OpenSSL package reported a warning that needed correcting. In the process of fixing it, a programmer broke the random number generator. The faulty patch was released in September 2006 with OpenSSL version 0.9.8c-1. The problem was not discovered until April 2008. All cryptographic keys generated with that version are compromised because the "random" numbers are easily predictable, and data encrypted with them is also vulnerable. This posed a threat to many applications that rely on encryption, such as S/MIME, Tor, connections protected by SSL or TLS, and SSH. It was fixed in OpenSSL version 0.9.8c-4etch3.

Heartbleed is a security hole in the open source OpenSSL library, affecting versions 1.0.1 through 1.0.1f, that allows an attacker to read memory from a server or a client, making it possible, for example, to obtain a server's SSL private keys.

The vulnerable code was adopted and widely used with the release of OpenSSL version 1.0.1 on March 14, 2012. Heartbeat support was enabled by default, causing affected versions to be vulnerable by default.

The Heartbeat Extension defined in RFC 6520 tests secure TLS/DTLS communication links by allowing a computer at one end of a connection to send a "Heartbeat Request", which consists of a payload, typically a text string, together with the length of that payload as a 16-bit integer. The receiving computer must then send the exact same payload back to the sender. Affected versions of OpenSSL allocate a memory buffer for the message to return based on the length field in the request message, without regard to the actual size of the payload in that message. Because of this failure to check bounds, the returned message consists of the payload, possibly followed by whatever else happened to be in the allocated memory buffer.

The Canada Revenue Agency reported the theft of Social Insurance Numbers belonging to 900 taxpayers, stating that they were accessed by exploiting the bug over a six-hour period on April 8, 2014. When the attack was discovered, the agency shut down its website and extended the filing deadline for taxpayers from April 30 to May 5. The first fixed version, 1.0.1g, was released on April 7, 2014.
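
The missing bounds check in the heartbeat handling can be sketched with a simplified simulation. It is not OpenSSL code: the message here is reduced to a 16-bit length field followed by the payload (the real message also has a type byte and padding), process memory is modeled as a single bytes object, and all names and the secret value are hypothetical.

```python
import struct


def build_request(payload, claimed_length):
    """Simplified request: 16-bit big-endian length field, then the payload."""
    return struct.pack(">H", claimed_length) + payload


def vulnerable_reply(memory):
    """Trust the length field and copy that many bytes out of memory."""
    (claimed_length,) = struct.unpack_from(">H", memory, 0)
    return memory[2:2 + claimed_length]


def checked_reply(memory, request_size):
    """Discard the request if it claims more bytes than were received."""
    (claimed_length,) = struct.unpack_from(">H", memory, 0)
    if 2 + claimed_length > request_size:
        return b""
    return memory[2:2 + claimed_length]


# The request carries 4 payload bytes but claims 4 plus the secret's length.
secret = b"PRIVATE-KEY-0123"
request = build_request(b"bird", 4 + len(secret))
memory = request + secret  # the secret happens to sit right after the request

print(vulnerable_reply(memory))             # the payload followed by the secret
print(checked_reply(memory, len(request)))  # b''
```

Comparing the claimed length with the number of bytes actually received is enough to reject the malformed request in this sketch.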

In popular culture

  • In the film Brazil (1985), an error in a surname, caused by a fly falling into a teleprinter, leads to the wrongful arrest of an innocent citizen, with fatal consequences for him; it is a kind of dramatization of the case of the moth in the Mark II computer in 1947.
  • Ellen Ullman's novel The Bug (2004) deals with the hunt for a software bug by a software tester and a programmer. The bug is nicknamed "The Jester" for its tendency to appear at the most inopportune moments, threatening the fate of the company.
  • The Canadian film Control Alt Delete (2008) is a comedy about a programmer who, in late 1999, tries to fix bugs to prevent Y2K problems.
