Assembly language
Assembly language, or assembler (abbreviated asm), is a low-level programming language used to program microprocessors. It implements a symbolic representation of the binary machine codes and other constants needed to program a processor architecture, and is the most direct representation of architecture-specific machine code that is readable by a programmer. Each processor architecture has its own assembly language, usually defined by the hardware manufacturer and based on mnemonics that symbolize processing steps (instructions), processor registers, memory locations, and other language features. An assembly language is therefore specific to a certain physical (or virtual) computer architecture. This is in contrast to most high-level programming languages, which are ideally portable.
A utility program called an assembler is used to translate assembly language statements into the machine code of the target computer. The assembler performs a more or less isomorphic translation (a one-to-one mapping) from mnemonic statements to machine instructions and data. This is in contrast to high-level languages, in which a single statement generally gives rise to many machine instructions.
Many sophisticated assemblers offer additional mechanisms to facilitate program development, control the assembly process, and help debugging. Notably, most modern assemblers include a macro facility (described below), and are called macro assemblers.
Assembly language was mainly used in the early days of software development, when powerful high-level languages were not yet available and resources were limited. Today it is frequently used in academic and research environments, especially when direct manipulation of hardware, high performance, or controlled and reduced use of resources is required. It is also used in the development of device drivers and operating systems, because of the need for direct access to machine instructions. Many programmable devices (such as microcontrollers) still rely on assembly language as the only way to program them.
Features
- Code written in assembly language is hard to understand, because its structure is close to machine language; that is, it is a low-level language.
- Assembly language is hard to port: code written for one microprocessor may need to be modified to run on a different machine, and moving to a machine with a different architecture usually requires rewriting it completely.
- Programs written in assembly language by an expert programmer can run faster and consume fewer system resources (e.g. RAM) than the equivalent program compiled from a high-level language. As both processors and high-level language compilers have evolved, this advantage has become less and less significant: a modern high-level language compiler can generate code almost as efficient as its assembly language equivalent.
- Assembly language gives very precise control over the tasks performed by a microprocessor, so it can be used to write code segments that would be difficult or very inefficient to program in a high-level language, since, among other things, assembly language provides CPU instructions that are generally not available in high-level languages.
Assembler program
Generally, a modern assembler creates object code by translating assembly language mnemonics into opcodes, and by resolving symbolic names for memory locations and other entities. The use of symbolic references is a key feature of assemblers, avoiding tedious manual calculations and address updates after each program modification. Most assemblers also include macro facilities for performing textual substitution, e.g. to generate common short sequences of instructions as inline expansions instead of called subroutines.
Assemblers are generally simpler to write than compilers for high-level languages, and have been around since the 1950s. Modern assemblers, especially for RISC-based architectures such as MIPS, Sun SPARC, and HP PA-RISC, as well as for x86(-64), optimize instruction scheduling to exploit CPU pipelines efficiently.
In many high-level language compilers, the assembler is the last stage before object code is produced.
Number of passes
There are two types of assemblers, based on how many passes through the source are needed to produce the executable program.
- One-pass assemblers go through the source code once and assume that all symbols will be defined before any instruction that references them.
- Two-pass assemblers create a table with all symbols and their values in the first pass, then use the table in a second pass to generate code. The assembler must at least be able to determine the length of each instruction in the first pass so that the addresses of symbols can be calculated.
The advantage of a one-pass assembler is speed, which is not as important as it once was, given advances in computer speed and capabilities. The advantage of the two-pass assembler is that symbols can be defined anywhere in the program source code. This allows programs to be defined in a more logical and meaningful way, making programs written for two-pass assemblers easier to read and maintain.
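The scheme with a symbol-table pass followed by a code-generation pass can be illustrated with a minimal sketch in Python. The machine here is hypothetical: the mnemonics LOAD, ADD, JMP and HALT, their opcode values, and their lengths are assumptions made up for the example, not any real instruction set.

```python
# Toy assembler for a made-up machine (all mnemonics and opcodes are hypothetical).
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03, "HALT": 0xFF}
# Instruction length in bytes: opcode plus one operand byte, except HALT.
LENGTHS = {"LOAD": 2, "ADD": 2, "JMP": 2, "HALT": 1}


def first_pass(lines):
    """Build the symbol table (label -> address) using only instruction lengths."""
    symbols, address = {}, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += LENGTHS[line.split()[0]]
    return symbols


def second_pass(lines, symbols):
    """Generate code, resolving symbolic operands through the symbol table."""
    code = bytearray()
    for line in lines:
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        parts = line.split()
        code.append(OPCODES[parts[0]])
        if LENGTHS[parts[0]] == 2:
            operand = parts[1]
            value = symbols[operand] if operand in symbols else int(operand, 0)
            code.append(value & 0xFF)
    return bytes(code)


def assemble(source):
    lines = source.splitlines()
    return second_pass(lines, first_pass(lines))


# "end" is referenced by JMP before it is defined (a forward reference).
PROGRAM = """
start:
    LOAD 5
    JMP end
    ADD 1
end:
    HALT
"""
MACHINE_CODE = assemble(PROGRAM)
print(MACHINE_CODE.hex())  # 010503060201ff
```

The forward reference to `end` is what a single pass over the source could not resolve at the point where `JMP end` is read; the first pass only needs instruction lengths to assign it address 6.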
High-level assemblers
The most sophisticated high-level assemblers provide language abstractions such as:
- Advanced control structures
- High-level statements and invocations
- High-level abstract data types, including structures/records, unions, classes, and sets
- Sophisticated macro processing (although available in ordinary assemblers since the late 1960s for the IBM S/360, among other machines)
- Object-oriented programming features
Use of the term
Note that, in normal professional usage, the term assembler is often used to refer both to assembly language and to an assembler program (which converts source code written in assembly language into object code that is then linked to produce machine language).
Language
Assembly language directly reflects the architecture and machine language instructions of the CPU, and can be very different from one CPU architecture to another.
Each microprocessor architecture has its own machine language, and consequently its own assembly language, since the latter is closely tied to the structure of the hardware for which it is programmed. Microprocessors differ in the type and number of operations they support; they can also have different numbers of registers and different representations of data types in memory. Although most microprocessors are capable of performing essentially the same functions, the way in which they do so differs, and the respective assembly languages reflect this difference.
CPU Instructions
Most CPUs have more or less the same groups of instructions, although they don't necessarily have all the instructions in every group. The operations that can be performed vary from CPU to CPU. A particular CPU may have instructions that another does not and vice versa.
Early 8-bit microprocessors, for example, had no instructions to multiply or divide numbers, and programmers had to write subroutines to perform those operations. Other CPUs may lack floating-point operations, so libraries that perform them have to be written or obtained.
CPU instructions can be grouped, according to their functionality, into:
Integer operations (8, 16, 32 and 64 bits, depending on the CPU architecture; on very old systems also 12, 18, 24, 36 and 48 bits):
These are operations performed by the CPU's Arithmetic Logic Unit:
- Arithmetic operations, such as addition, subtraction, multiplication, division, modulo, and sign change
- Boolean operations: bitwise logical operations such as AND, OR, XOR, NOT
- Bit operations, such as logical shifts and rotations (to the right or to the left, through the carry bit or without it)
- Comparisons
Data movement operations:
- Between registers and memory:
- Although the instruction is called "move", in the CPU "moving data" actually means copying it from a source to a destination; the data does not disappear from the source.
- Values can be moved:
- From one register to another
- From a register to a memory location
- From a memory location to a register
- From one memory location to another
- An immediate value to a register
- An immediate value to a memory location
- Note: An immediate value is a constant specified in the instruction itself.
- Stack operations:
- PUSH (writes data to the top of the stack)
- POP (reads data from the top of the stack)
- Input/output operations:
- These operations move data between a register and a port, or between memory and a port
- INPUT: reads from an input port
- OUTPUT: writes to an output port
Operations to control the flow of the program:
- Subroutine calls and returns
- Interrupt calls and returns
- Conditional jumps, depending on the result of a comparison
- Unconditional jumps
Floating-point operations:
The standard for floating-point operations on CPUs is IEEE 754.
A CPU may provide floating-point operations on real numbers through a numeric coprocessor (if present), such as the following:
- Arithmetic operations: addition, subtraction, multiplication, division, sign change, absolute value, integer part
- Trigonometric operations: sine, cosine, tangent, arctangent
- Operations with logarithms, powers and roots
- Others
Assembly language has mnemonics for each of the CPU instructions in addition to other mnemonics to be processed by the assembly program (such as macros and other assembly-time statements).
Assembly
The transformation of assembly language into machine code is done by an assembler, and the reverse translation can be done by a disassembler. Unlike high-level languages, here there is usually a one-to-one correspondence between simple assembly instructions and machine language. However, in some cases an assembler can provide "pseudo-instructions" that expand into several machine instructions in order to provide commonly needed functionality and simplify programming. For example, for a machine that lacks a "branch if greater than or equal" instruction, an assembler can provide a pseudo-instruction that expands to the machine's "set if less than" followed by "branch if zero" on the result of the previous instruction. More complete assemblers also provide a rich macro language that is used to generate more complex code and data sequences.
For the same processor and the same CPU instruction set, different assemblers may each have variations and differences in the set of mnemonics or in the syntax of their assembly language. For example, in an assembly language for the x86 architecture, the instruction to move 5 to the AL register could be expressed as MOV AL, 5, whereas another assembler for the same architecture would put the operands the other way around: MOV 5, AL (written movb $5, %al in actual AT&T notation). Both assembly languages would do exactly the same thing, just phrased differently. The former uses the Intel syntax, while the latter uses the AT&T syntax.
Using an assembler does not definitively solve the problem of how to program a microprocessor-based system in a simple way, since using it efficiently requires in-depth knowledge of the microprocessor: its working registers, its memory structure, and many other aspects of its basic operation.
Examples
A program written in assembly language consists of a series of statements that correspond to the stream of instructions executable by a microprocessor.
For example, in assembly language for an x86 processor:
The statement
MOV AL, 61h
assigns the hexadecimal value 61 (97 decimal) to the AL register.
The assembler reads the above statement and produces its binary equivalent in machine language.
- Binary: 10110000 01100001 (hexadecimal: B0 61)
The mnemonic MOV is an operation code, or "opcode". The opcode is followed by a list of arguments or parameters, completing a typical assembly language statement. In the example, AL is an 8-bit processor register, which will be assigned the specified hexadecimal value 61.
The machine code generated by the assembler consists of 2 bytes. The first byte packs together the MOV instruction and the code of the register to which the data is to be moved:
1011 0000 01100001
  |    |      |
  |    |      +---- Number 61h in binary
  |    +----------- AL register
  +---------------- MOV instruction
The second byte specifies the number 61h, written in binary as 01100001, which will be assigned to the AL register, leaving the executable statement as:
10110000 01100001
which can be understood and executed directly by the processor.
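The encoding worked through above can be reproduced with a short Python sketch. It covers only the single x86 form "MOV r8, imm8", whose opcode byte is B0h plus the register code; the function name encode_mov_r8_imm8 is a hypothetical helper introduced for illustration, not part of any real assembler.

```python
# Register codes for the 8-bit x86 registers, as placed in the low three bits
# of the "MOV r8, imm8" opcode byte (B0h + register code).
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}


def encode_mov_r8_imm8(register, value):
    """Return the two machine-code bytes for MOV <register>, <value>."""
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate must fit in one byte")
    opcode = 0xB0 | REG8[register.upper()]
    return bytes([opcode, value])


encoded = encode_mov_r8_imm8("AL", 0x61)
print(encoded.hex())                                # b061
print(" ".join(f"{byte:08b}" for byte in encoded))  # 10110000 01100001
```

Choosing a different register changes only the low bits of the first byte: the same call with "CL" would yield B1 61.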
Language design
Basic Elements
There is a large degree of diversity in the way assembler authors categorize statements and in the nomenclature they use. In particular, some describe anything other than a machine mnemonic or an extended mnemonic as a pseudo-operation (pseudo-op).
A typical assembly language consists of 3 types of instruction statements that are used to define the operations of the program:
- Opcode mnemonics
- Data sections
- Assembly directives
Opcode Mnemonics and Extended Mnemonics
Unlike the instructions (statements) of high-level languages, instructions in assembly language are generally very simple. Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more operands. Most instructions refer to a single value or a pair of values. Operands can be immediate (values encoded in the instruction itself), registers specified in the instruction, implied, or the addresses of data located elsewhere in memory. This is determined by the underlying architecture of the processor; the assembler simply reflects how this architecture works. Extended mnemonics are often used to specify a combination of an opcode with a specific operand; e.g., the System/360 assemblers use B as an extended mnemonic for BC with a mask of 15, and NOP as an extended mnemonic for BC with a mask of 0.
Extended mnemonics are often used to support specialized uses of instructions, often for purposes not obvious from the instruction name. For example, many CPUs do not have an explicit NOP (no operation) instruction, but do have instructions that can be used for that purpose. On the 8086 CPU, the instruction XCHG AX,AX (exchange the AX register with itself) is used for NOP, with NOP being a pseudo-opcode that encodes the instruction XCHG AX,AX. Some disassemblers recognize this and decode the XCHG AX,AX instruction as NOP. Similarly, the IBM assemblers for System/360 use the extended mnemonics NOP and NOPR, with zero masks, for BC and BCR.
Some assemblers also support simple built-in macros that generate two or more machine instructions. For example, with some Z80 assemblers, the instruction
LD HL, BC
generates the instructions
LD L, C
LD H, B
LD HL, BC is a pseudo-opcode that in this case simulates a 16-bit instruction. When expanded, it produces two 8-bit instructions that are equivalent to the simulated 16-bit one.
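How an assembler might perform this kind of expansion can be sketched in Python. This is an illustrative assumption, not the implementation of any particular Z80 assembler: the PAIRS table and the expand function are hypothetical names, and only the register-pair-to-register-pair form of LD is handled.

```python
# Hypothetical table: each 16-bit register pair maps to its (high, low) halves.
PAIRS = {"HL": ("H", "L"), "BC": ("B", "C"), "DE": ("D", "E")}


def expand(statement):
    """Expand a pseudo 'LD pair, pair' into two 8-bit loads; pass others through."""
    mnemonic, _, operands = statement.strip().partition(" ")
    args = [arg.strip().upper() for arg in operands.split(",")]
    if mnemonic.upper() == "LD" and len(args) == 2 and all(a in PAIRS for a in args):
        (dst_hi, dst_lo), (src_hi, src_lo) = PAIRS[args[0]], PAIRS[args[1]]
        return [f"LD {dst_lo}, {src_lo}", f"LD {dst_hi}, {src_hi}"]
    return [statement.strip()]


print(expand("LD HL, BC"))  # ['LD L, C', 'LD H, B']
print(expand("LD A, B"))    # ['LD A, B']
```

Statements that are already real machine instructions, such as LD A, B, are returned unchanged, so the expansion step can run over every line of the source.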
Data Sections
Some instructions are used to define data elements to hold data and variables. They define the type, length, and alignment of the data. These instructions can also define whether the data is available to outside programs (programs assembled separately) or only to the program in which the data section is defined. Some assemblers classify these instructions as pseudo-instructions.
Assembler Directives
Assembler directives, also called pseudo opcodes, pseudo-operations, or pseudo-ops, are instructions that are executed by an assembler at assembly time, not by the CPU at run time. They can make the assembly of the program dependent on parameters specified by the programmer, so that a program can be assembled in different ways, perhaps for different applications. They can also be used to manipulate the presentation of a program to make it easier to read and maintain.
For example, directives can be used to reserve storage areas and, optionally, to set their initial contents. The names of directives often start with a period to distinguish them from machine instructions.
Symbolic assemblers allow programmers to associate arbitrary names (labels or symbols) with memory locations. Usually, each constant and variable has a name so that instructions can refer to those locations by name, thus promoting self-documenting code. In executable code, the name of each subroutine is associated with its entry point, so any call to a subroutine can use its name. Inside subroutines, GOTO destinations are given labels. Some assemblers support local symbols that are lexically distinct from normal symbols (e.g., the use of "10$" as a GOTO destination).
Most assemblers provide flexible symbol handling, allowing programmers to manage different namespaces, automatically calculate offsets within data structures, and assign labels that refer to literal values or to the result of simple calculations performed by the assembler. Labels can also be used to initialize constants and variables with relocatable addresses.
Assembly languages, like most other computer languages, allow comments to be added to the source code, which are ignored by the assembly program. Proper use of comments is even more important with assembly code than with high-level languages, since the meaning and purpose of a sequence of instructions can be more difficult to understand from the code itself.
Wise use of these facilities can significantly simplify the problems of coding and maintaining low-level code. Raw assembly language source code generated by compilers or disassemblers - code without any comments, meaningful symbols, or data definitions - is very difficult to read when changes need to be made.
Macro
Many assemblers support predefined macros, and others support programmer-defined (and repeatedly redefinable) macros that involve sequences of lines of text, in which variables and constants are embedded. This sequence of text lines can include opcodes or directives. Once a macro is defined, its name can be used in place of a mnemonic. When the assembler processes such a statement, it replaces the statement with the lines of text associated with that macro. It then processes them as if they existed in the original source code file (including, in some assemblers, expanding any macros that exist in the replacement text).
Since macros can have short names but expand to several, or indeed many, lines of code, they can be used to make assembly language programs appear much shorter, requiring fewer lines of source code, as with high-level languages. They can also be used to add higher levels of structure to assembly programs, and optionally to introduce embedded debugging code via parameters and other similar features.
Many assemblers have built-in (or predefined) macros for system calls and other special sequences of code, such as data generation and storage performed via advanced bitwise operations and Booleans used in games, security software, data management, and cryptography.
Macro assemblers often allow macros to take parameters. Some assemblers include sophisticated macro languages, incorporating high-level language elements such as optional parameters, symbolic variables, conditionals, string manipulation, and arithmetic operations, all usable during the execution of a given macro, and allowing macros to save context or exchange information. Thus a macro can generate a large number of assembly language instructions or data definitions, based on the macro's arguments. This could be used to generate, for example, record-style data structures or "unrolled" loops, or could generate entire algorithms based on complex parameters. An organization using assembly language that has been heavily extended with such a suite of macros can be considered to be working in a high-level language, since its programmers are not working with the lowest-level conceptual elements of the computer.
Macros were used to customize large-scale software systems for specific customers in the mainframe era. Macros were also used by customers' own staff to meet the needs of their organizations by making specific versions of the manufacturer's operating systems. This was done, for example, by systems programmers working with IBM's Conversational Monitor System / Virtual Machine (CMS/VM) and with IBM's real-time transaction processing add-ons, Customer Information Control System (CICS) and ACP/TPF, the airline/financial system that began in the 1970s and still runs many large computerized reservation systems (CRS) and credit card systems today.
It is also possible to use only the macro processing abilities of an assembler to generate code written in completely different languages, for example to generate a version of a program in COBOL using a pure macro assembler program containing lines of COBOL code inside assembly-time operators that instruct the assembler to generate arbitrary code.
This is because, as was recognized in the 1970s, the concept of "macro processing" is independent of the concept of "assembly", the former being, in modern terms, more text processing than object code generation. The concept of macro processing appeared, and still appears, in the C programming language, which supports "preprocessor instructions" to set variables and make conditional tests on their values. Note that, unlike certain earlier macro processors inside assemblers, the C preprocessor is not Turing-complete, because it lacks the ability to loop or "go to", the latter allowing programs to loop.
Despite the power of macro processing, it fell out of use in many high-level languages (a notable exception is C/C++), but it persisted in assemblers. This was because many programmers were quite confused by macro parameter substitution and did not distinguish the difference between macro processing, assembly, and execution.
Macro parameter substitution is strictly by name: at macro processing time, the value of a parameter is textually substituted for its name. The most famous class of resulting bugs was the use of a parameter that was itself an expression, rather than a simple name, when the macro writer expected a name. In the macro:
- foo: macro a
- load a*b
the intent was that the caller would supply the name of a variable, and the "global" variable or constant b would be used to multiply a. If foo is called with the parameter a-c, the macro expansion load a-c*b occurs. To avoid any possible ambiguity, users of macro processors can wrap formal parameters in parentheses inside macro definitions, or callers can wrap the input parameters in parentheses. Thus the correct macro, with parentheses, would be:
- foo: macro a
- load (a)*b
and its expansion would be load (a-c)*b.
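The substitution hazard described above can be reproduced with a minimal Python sketch of by-name textual substitution. The function expand_macro is a hypothetical stand-in for a macro processor and handles only whole-word replacement of formal parameters.

```python
import re


def expand_macro(body, params, args):
    """Textually replace each formal parameter name in body with its argument."""
    expanded = body
    for name, text in zip(params, args):
        # Match the parameter only as a whole word, so the "a" in "load" is untouched.
        pattern = rf"\b{re.escape(name)}\b"
        expanded = re.sub(pattern, lambda _match, text=text: text, expanded)
    return expanded


unsafe = expand_macro("load a*b", ["a"], ["a-c"])
safe = expand_macro("load (a)*b", ["a"], ["a-c"])
print(unsafe)  # load a-c*b
print(safe)    # load (a-c)*b
```

With a = 10, c = 4 and b = 3, the unparenthesized expansion computes 10 - 4*3 = -2, while the parenthesized one computes (10 - 4)*3 = 18, which is the result the macro writer intended.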
PL/I and C/C++ offer macros, but this facility can only manipulate text. On the other hand, homoiconic languages, such as Lisp, Prolog, and Forth, retain the power of assembly language macros because they can manipulate their own code as data.
Support for structured programming
Some assemblers have incorporated elements of structured programming to encode the flow of execution. The earliest example of this approach was in the Concept-14 macro set, originally proposed by Dr. H.D. Mills (March 1970), and implemented by Marvin Kessler at IBM's Federal Systems Division, which extended the S/360 macro assembler with IF/ELSE/ENDIF flow control blocks and the like. This was a way to reduce or eliminate the use of GOTO operations in assembly language code, one of the main factors causing spaghetti code in assembly language. This approach was widely accepted in the early 1980s (the last days of large-scale assembly language use).
A curious design was A-natural, a "stream-oriented" assembler for 8080/Z80 processors from Whitesmiths Ltd. (developers of the Unix-like Idris operating system, and of what was reported to be the first commercial C compiler). The language was classified as an assembler because it worked with raw machine elements such as opcodes, registers, and memory references, but it incorporated an expression syntax to indicate the order of execution. Parentheses and other special symbols, along with block-oriented structured programming constructs, controlled the sequence of generated instructions. A-natural was built as the object language of a C compiler, rather than for hand coding, but its logical syntax won some supporters.
There has been little apparent demand for more sophisticated assemblers since the decline of large-scale assembly language development. Despite this, they are still being developed and applied in cases where resource constraints or quirks of the target system's architecture prevent the effective use of high-level languages.
Using Assembly Language
Historical perspective
Assembly languages were first developed in the 1950s, when they were referred to as second-generation programming languages. For example, SOAP (Symbolic Optimal Assembly Program) was a 1957 assembly language for the IBM 650 computer. Assembly languages eliminated much of the error-prone and time-consuming first-generation programming that early computers required, freeing programmers from tedium such as remembering numeric codes and calculating addresses. They were once widely used for all kinds of programming. However, by the 1980s (1990s on microcomputers), their use had been largely supplanted by high-level languages, in search of improved programming productivity. Today, although assembly language is almost always handled and generated by compilers, it is still used for direct hardware manipulation, access to specialized processor instructions, or to solve performance-critical problems. Typical uses are device drivers, low-level embedded systems, and real-time systems.
Historically, a large number of programs have been written entirely in assembly language. Operating systems were almost exclusively written in assembly language until the widespread acceptance of the C programming language in the 1970s and early 1980s. Many commercial applications were also written in assembly language, including much of the software written by large corporations for IBM mainframes. The COBOL and FORTRAN languages eventually displaced much of this work, although a number of large organizations retained assembly language application frameworks well into the 1990s.
Most early microcomputers relied on hand-coded assembly language, including most operating systems and large applications. This was because these systems had severe resource limitations, imposed idiosyncratic display and memory architectures, and provided limited and buggy system services. Perhaps more important was the lack of first-class compilers of high-level languages suitable for use on the microcomputer. A psychological factor may also have played a role: the first generation of microcomputer programmers retained a "wires and pliers" hobbyist attitude.
In a more commercial context, the biggest reasons to use assembly language were to make programs with minimum size, minimum overhead, greater speed, and greater reliability.
Typical examples of large assembly language programs of that time are the IBM PC DOS operating systems and early applications such as the Lotus 1-2-3 spreadsheet, and almost all popular games for the Atari 800 family of personal computers. Even in the 1990s, most console video games were written in assembly language, including most games for the Mega Drive/Genesis and the Super Nintendo Entertainment System. According to some industry insiders, assembly language was the best programming language to use to get the best performance from the Sega Saturn, a console for which it was notoriously challenging to develop and program games. The arcade game NBA Jam (1993) is another example. Assembly language was long the primary development language on the Commodore 64 and Atari ST home computers, as well as the ZX Spectrum. This was in large part because the BASIC dialects on these systems offered insufficient execution speed, as well as insufficient features to take full advantage of the available hardware. Some systems, most notably the Amiga, even have IDEs with highly advanced debugging and macro features, such as the freeware ASM-One assembler, comparable to those of Microsoft Visual Studio (ASM-One predates Microsoft Visual Studio).
The assembler for the VIC-20 was written by Don French and published by French Silk. At 1639 bytes long, its author believes it to be the smallest symbolic assembler ever written. The assembler supported the usual symbolic addressing and the definition of character strings or hexadecimal strings. It also allowed address expressions that could be combined with the operations of addition, subtraction, multiplication, division, logical AND, logical OR, and exponentiation.
Current use
There have always been debates about the usefulness and performance of assembly language relative to high-level languages. Assembly language has specific niches where it is important (see below). In general, however, modern optimizing compilers can translate high-level languages into code that runs as fast as hand-written assembly language, despite the counterexamples that can be found. The complexity of modern processors and memory subsystems makes effective optimization increasingly difficult for compilers as well as for assembly programmers. Additionally, and to the dismay of efficiency lovers, increasing processor performance has meant that most CPUs sit idle most of the time, with delays caused by predictable bottlenecks such as input/output operations and memory paging. This has made raw code execution speed a non-issue for many programmers.
There are some situations in which professionals might choose to use assembly language. For example when:
- A stand-alone binary executable is required, i.e. one that must run without recourse to the run-time components or libraries associated with a high-level language; this is perhaps the most common situation. These are embedded programs that have only a small amount of memory available, on devices designed to perform single-purpose tasks. Examples include telephones, automobile fuel and ignition systems, air-conditioning control systems, security systems, and sensors.
- Interacting directly with the hardware, for example in device drivers and interrupt handlers.
- Using processor-specific instructions that the compiler does not exploit or make available. A common example is the bitwise rotation instruction at the core of many encryption algorithms.
- Creating vectorized functions for programs in high-level languages such as C. In the high-level language this is sometimes aided by compiler intrinsic functions that map directly to SIMD mnemonics, but which nevertheless result in a one-to-one assembly conversion for the associated vector processor.
- Extreme optimization is required, e.g. in an inner loop of a processor-intensive algorithm. Game programmers take advantage of the hardware features of their target systems, allowing games to run faster. Large scientific simulations also require highly optimized algorithms, e.g. linear algebra with BLAS or the discrete cosine transform (e.g. the SIMD assembly version in x264, a library for encoding video streams).
- A system with severe resource constraints (e.g. an embedded system) must be hand-coded to maximize the use of its limited resources; this is becoming less common as processor prices fall and performance improves.
- No high-level language exists for a new or specialized processor.
- Writing real-time programs that need precise timing and responses, such as flight navigation systems and medical equipment. For example, in a fly-by-wire system, telemetry must be interpreted and acted upon within strict time constraints. Such systems must eliminate sources of unpredictable delays, which may be created by (some) interpreted languages, automatic garbage collection, paging operations, or preemptive multitasking. However, some high-level languages incorporate run-time components and operating system interfaces that can introduce such delays. Choosing assembly or lower-level languages for such systems gives programmers greater visibility and control over processing details.
- Complete control over the environment is required, in extremely high-security situations where nothing can be taken for granted.
- Computer viruses, boot loaders, certain device drivers, or other items very close to the hardware or the low-level operating system are written.
- Instruction set simulators are written for monitoring, tracing, and debugging, where additional overhead must be kept to a minimum.
- Existing binaries, which may or may not have been written originally in a high-level language, are reverse engineered, for example to crack the copy protection of proprietary software.
- Video games are reverse engineered and modified (also called ROM hacking), which is possible through various methods. The most widely used is altering the program code at the assembly language level.
- Self-modifying code is written (as in polymorphic code), something to which assembly language lends itself well.
- Games and other software are written for graphing calculators.
- Compiler software that generates assembly code is written, so its developers must be assembly language programmers.
- Cryptographic algorithms are written that must always take strictly the same time to execute, preventing timing attacks.
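Two items in the list above can be illustrated from the high-level side. The following is a minimal sketch in Python, not taken from any particular library: rotl32 and constant_time_equal are hypothetical names chosen for illustration. The first function spells out with shifts and masks what a single rotate instruction does on a 32-bit register; the second delegates to the standard library's hmac.compare_digest, which is designed to avoid the early exit that makes an ordinary comparison leak timing information. Timing in an interpreted language is only approximately controllable, which is part of the reason such routines are often written in assembly.

```python
import hmac

MASK32 = 0xFFFFFFFF  # keep results within 32 bits, like a machine register


def rotl32(value, count):
    """Rotate a 32-bit value left by count bits (hypothetical helper)."""
    count %= 32
    value &= MASK32
    # Bits shifted out on the left re-enter on the right.
    return ((value << count) | (value >> (32 - count))) & MASK32


def constant_time_equal(a, b):
    """Compare two byte strings without stopping at the first difference."""
    return hmac.compare_digest(a, b)


print(hex(rotl32(0x12345678, 8)))                # 0x34567812
print(constant_time_equal(b"tag-01", b"tag-01"))  # True
```

On processors with a rotate instruction, the body of rotl32 corresponds to one machine instruction, whereas the high-level version needs two shifts, an OR, and a mask.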
Nevertheless, assembly language is still taught in most computer science and electronic engineering programs. Although few programmers today regularly work with assembly language as a tool, the underlying concepts remain very important. Fundamental topics such as binary arithmetic, memory allocation, stack processing, character set encoding, interrupt processing, and compiler design would be hard to study in detail without understanding how a computer operates at the hardware level. Since the behavior of a computer is fundamentally defined by its instruction set, the logical way to learn such concepts is to study an assembly language. Most modern computers have similar instruction sets. Therefore, studying a single assembly language is enough to: i) learn the basic concepts; ii) recognize situations where the use of assembly language may be appropriate; and iii) see how efficient executable code can be created from high-level languages.
Typical Applications
Hard-coded assembly language is typically used in the system boot ROM (BIOS on IBM PC compatible systems). This low-level code is used, among other things, to initialize and test the system hardware before loading the operating system, and is stored in the ROM. Once a certain level of hardware initialization has taken place, execution is transferred to other code, typically written in high-level languages; but the code running immediately after power is applied is usually written in assembly language. The same is true for boot loaders.
Many compilers translate high-level languages into assembly language first, before full compilation, allowing assembly code to be viewed for debugging and optimization purposes. Relatively low-level languages, like C, often provide special syntax for embedding assembly language on each hardware platform. The system's portable code can then use these processor-specific components through a uniform interface.
Assembly language is also valuable in reverse engineering, since many programs are distributed only in machine code form. Machine code is usually easy to translate into assembly language for careful examination, but it is very difficult to translate into a high-level language. Tools such as the Interactive Disassembler make extensive use of disassembly for this purpose.
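The ease of translating machine code back into assembly language follows from the near one-to-one mapping between the two. The following is a minimal sketch in Python of a table-driven disassembler that recognizes only three 8086 instruction forms (MOV r8, imm8; MOV r16, imm16; INT imm8). The function name disassemble and the sample byte sequence are assumptions made for illustration; a real disassembler handles the full instruction set, prefixes, and addressing modes.

```python
# 8086 register names, indexed by the low three bits of the opcode.
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def disassemble(code):
    """Translate a byte sequence into a list of assembly statements."""
    statements = []
    i = 0
    while i < len(code):
        opcode = code[i]
        if 0xB0 <= opcode <= 0xB7:      # MOV r8, imm8
            statements.append(f"MOV {REG8[opcode - 0xB0]}, {code[i + 1]:02X}h")
            i += 2
        elif 0xB8 <= opcode <= 0xBF:    # MOV r16, imm16 (little-endian)
            value = code[i + 1] | (code[i + 2] << 8)
            statements.append(f"MOV {REG16[opcode - 0xB8]}, {value:04X}h")
            i += 3
        elif opcode == 0xCD:            # INT imm8
            statements.append(f"INT {code[i + 1]:02X}h")
            i += 2
        else:                           # unknown byte: emit it as data
            statements.append(f"DB {opcode:02X}h")
            i += 1
    return statements


# Hypothetical fragment of a DOS .COM program.
machine_code = bytes([0xBA, 0x0C, 0x01, 0xB4, 0x09, 0xCD, 0x21, 0xCD, 0x20])
listing = disassemble(machine_code)
print("\n".join(listing))
# MOV DX, 010Ch
# MOV AH, 09h
# INT 21h
# INT 20h
```

Each statement is recovered from one to three bytes by a table lookup. Recovering the loops, types, and variable names of a high-level program from the same bytes has no comparably direct procedure.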
A niche that makes use of assembly language is the demoscene. Certain competitions require contestants to restrict their creations to a very small size (e.g., 256 bytes, 1 KB, 4 KB, or 64 KB), and assembly language is the preferred language for achieving this goal. When resources are a concern, coding in assembly is a necessity, especially on systems constrained by CPU processing power, such as the early Amiga models and the Commodore 64. Optimized assembly code is written "by hand" by programmers in an attempt to minimize the number of CPU cycles used. The CPU limitations are so great that every cycle counts. Using such methods has enabled systems like the Commodore 64 to produce real-time 3D graphics with advanced effects, a feat that might be considered unlikely or even impossible for a system with a 0.99 MHz processor.[citation needed]
Additional details
For any given personal computer, mainframe, embedded system, or game console, both past and present, at least one, and possibly dozens, of assemblers have been written. For some examples, see the list of assemblers.
On Unix systems, the assembler is traditionally called as, although it is not a single body of code; a new one is typically written for each port. A number of Unix variants use GAS.
Within processor groups, each assembler has its own dialect. Some assemblers can read another's dialect; for example, TASM can read old MASM code, but not the other way around. FASM and NASM have similar syntax, but each supports different macros, which can make code difficult to port from one to the other. The basics are always the same, but the advanced features differ.
Also, assembly language can sometimes be portable across different operating systems on the same type of CPU. Calling conventions between operating systems often differ only slightly or not at all, and with care it is possible to achieve some portability in assembly language, usually by linking with a C library that does not change between operating systems. An instruction set simulator (which would ideally be written in assembly language) can, in theory, process the object/binary code of any assembler to achieve portability even across platforms (with an overhead no greater than that of a typical bytecode interpreter). This is essentially what microcode accomplishes when a hardware platform changes internally.
For example, many things in libc depend on the preprocessor to apply OS-specific or C-specific transformations to the program before it is compiled. In fact, some functions and symbols are not even guaranteed to exist outside of the preprocessor. Worse yet, the size and order of struct fields, as well as the size of certain typedefs such as off_t, are not available in assembly language without the help of a configure script, and differ even between Linux versions, making it impossible to portably call libc functions other than those that take only simple integers or pointers as parameters. To address these problems, the FASMLIB project provides a portable assembly language library for the Win32 and Linux platforms, but it is still very incomplete.
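The dependence of type sizes and field offsets on the platform can be observed directly. The following is a minimal sketch in Python using the standard ctypes module, which follows the C layout rules of the platform it runs on. The Record structure and the describe_layout function are hypothetical and exist only for illustration; they stand in for the kind of libc structure whose layout an assembly programmer would otherwise have to obtain from a configure script.

```python
import ctypes


class Record(ctypes.Structure):
    # Hypothetical C declaration: struct record { char tag; long value; };
    _fields_ = [("tag", ctypes.c_char), ("value", ctypes.c_long)]


def describe_layout():
    """Report sizes and offsets as laid out by the local C ABI."""
    return {
        "sizeof(long)": ctypes.sizeof(ctypes.c_long),
        "sizeof(void *)": ctypes.sizeof(ctypes.c_void_p),
        "offset of value": Record.value.offset,
        "sizeof(struct record)": ctypes.sizeof(Record),
    }


layout = describe_layout()
for name, number in layout.items():
    print(f"{name}: {number}")
```

The printed numbers differ between platforms, for instance in the size of long, so no expected output is given here. An assembly routine that hard-codes one set of values would read the wrong bytes on a platform that uses another.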
Some high-level languages, such as C and Borland Pascal, support inline assembly, where relatively short sections of assembly code can be embedded within the high-level language code. The Forth language commonly contains an assembler used to define CODE words.
Emulators are commonly used to debug assembly language programs.
Assembly Language Examples
Example for x86 architecture
The following is an example of the classic Hello World program written for the x86 processor architecture (under the DOS operating system).
    ; -----------------------------
    ; Program that prints a string on the screen
    ; -----------------------------
            .model small            ; memory model
            .stack                  ; stack segment
            .data                   ; data segment
    Chain1  DB 'Hello World.$'      ; string to print (terminated by $)
            .code                   ; code segment
    ; -----------------------------
    ; Start of the program
    ; -----------------------------
    programme:
            ; Initialize the data segment
            MOV AX, @data           ; load the data segment address into AX
            MOV DS, AX              ; copy it to the segment register DS via AX
            ; Print a string on the screen
            MOV DX, offset Chain1   ; load the address of the string into DX
            MOV AH, 9               ; AH = DOS function 9: print the string at DS:DX
            INT 21h                 ; call MS-DOS to execute the function selected in AH
            ; Terminate the program
            MOV AX, 4C00h           ; AH = 4Ch (terminate), AL = exit code 0;
                                    ; INT 20h needs CS to point at the PSP, which is
                                    ; only the case in .COM programs
            INT 21h                 ; call MS-DOS to end the program
            end programme
Example for Virtual Computer (POCA)
A selection of instructions for a virtual computer, with the corresponding memory addresses where the instructions will be located. These addresses are NOT static. Each instruction is accompanied by the machine code (object code) generated by the assembler, which matches the architecture of the virtual computer, i.e. its instruction set architecture (ISA).
| Address | Label | Instruction | Machine code |
|---|---|---|---|
| | | .begin | |
| | | .org 2048 | |
| | a_start | .equ 3000 | |
| 2048 | | ld [length], %r1 | |
| 2064 | | be done | 00010 10,000 000 000 000 000 000 000 000 |
| 2068 | | addcc %r1,-4,%r1 | 10000010 1000000001111 11111 |
| 2072 | | addcc %r1,%r2,%r4 | 10001000 1000000000000000000000010 |
| 2076 | | ld %r4,%r5 | 110010 00001 00000000000000 |
| 2080 | | ba | 00010000 10111111 11111 11111 |
| 2084 | | addcc %r3,%r5,%r3 | 10000110 10000000000 11000000 00000101 |
| 2088 | done: | jmpl %r15+4,%r0 | 10000001 11000011 11100 00000100 |
| 2092 | length: | 20 | 000 000 000 000 000 000 000 |
| 2096 | address: | a_start | 00000000000000 00001011 10111000 |
| | | .org a_start | |
| 3000 | a: | | |
Example for virtual computer (ARC)
ARC (A RISC Computer) is a subset of the SPARC architecture that retains most of its important features.
| Address | Label | Instruction | Comments |
|---|---|---|---|
| | | .begin | |
| | | .org 2048 | |
| | a_start: | .equ 3000 | ! The array begins at memory address 3000 |
| 2048 | | ld [length], %r1 | ! Load the length of the array into r1 |
| 2052 | | ld [address], %r2 | ! Load the address of the array into r2 |
| 2056 | | andcc %r3, %r0, %r3 | ! Set the partial sum to zero |
| 2060 | loop: | andcc %r1, %r1, %r0 | ! Check whether r1 has elements remaining |
| 2064 | | be done | |
| 2068 | | addcc %r1, -4, %r1 | ! Decrement the size of the array |
| 2072 | | ld %r1 + %r2, %r4 | ! Load the next element into r4 |
| 2076 | | addcc %r3, %r4, %r3 | ! Update the partial sum in r3 |
| 2080 | | ba loop | ! Check the loop condition again |
| 2084 | done: | jmpl %r15 + 4, %r0 | ! Return to the main routine |
| 2088 | length: | 20 | ! Array size of 20 bytes, 5 numbers |
| 2092 | address: | a_start | |
| | | .org a_start | ! Start of the array |
| | a: | 25 | |
| | | 15 | |
| | | -20 | |
| | | -35 | |
| | | 15 | |
| | | .end | ! End of the program |
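The program in the table above sums the five numbers stored from address 3000, walking the array from its last element to its first. The following is a minimal sketch in Python that mirrors the register usage step by step so that the result can be checked. The function name run_array_sum and the dictionary used as memory are assumptions made for illustration, and 32-bit overflow is ignored.

```python
A_START = 3000                       # start address of the array
DATA = [25, 15, -20, -35, 15]        # the five 4-byte numbers


def run_array_sum(memory, address, length_bytes):
    """Follow the table row by row, one Python step per instruction."""
    r1 = length_bytes                # 2048: r1 = length of the array in bytes
    r2 = address                     # 2052: r2 = address of the array
    r3 = 0                           # 2056: partial sum starts at zero
    while r1 != 0:                   # 2060-2064: leave the loop when r1 is zero
        r1 -= 4                      # 2068: step back one 4-byte element
        r4 = memory[r1 + r2]         # 2072: load the element at r1 + r2
        r3 += r4                     # 2076: add it to the partial sum
    return r3                        # 2084: return to the caller with the sum in r3


# Word-addressed memory image: one entry per 4-byte element.
memory = {A_START + 4 * i: value for i, value in enumerate(DATA)}
total = run_array_sum(memory, A_START, 4 * len(DATA))
print(total)  # 0
```

With the data shown, the partial sum in r3 takes the values 15, -20, -40, -25, and finally 0.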
Example for Intel 8051 µC
Assembly language code for Intel 80C51 µC:
            ORG 8030H
            include
    T05SEG:
            SETB TR0
            JNB uSEG,T05SEG         ; this subroutine is used
            CLR TR0                 ; to count
            CPL uSEG                ; 0.5 seconds by means of the
            MOV R1,DPL              ; timer 0 interrupt.
            INVOKE
            MOV R2,DPH
            CJNE R2,#07H,T05SEG
            CJNE R1,#78H,T05SEG
            MOV DPTR,#0
            RET
Example for Microchip PIC16F84
Assembly language code for Microchip's 16F84 microcontroller:
            ORG     0
    Start
            bcf     STATUS,RP0
            clrf    PORTB
            movlw   0xFF
            movwf   PORTA
            bsf     STATUS,RP0
    Principal
            movf    PORTA,W
            movwf   Counter
            movf    Counter,F
            btfsc   STATUS,Z
            goto    PuntoDecimal
            sublw   d'9'
            btfss   STATUS,C
            END