Markup language
A markup language is a way of encoding a document that, along with the text, incorporates tags or marks that contain additional information about the structure of the text or its presentation.
The most widespread markup language is HTML (HyperText Markup Language), the foundation of the World Wide Web.
Markup languages are often confused with programming languages. However, they are not the same, since markup lacks the arithmetic functions and variables that programming languages have. Historically, markup has been used, and still is, in the publishing and communications industries, as well as among authors, publishers, and printers.
An example of how a markup language works can be seen when a document is dictated aloud to a person who transcribes it on a typewriter:
- Use letter style, open quotes, capital letters, Dear John, colon, new paragraph, indent, capitalize the first letter, I am writing you this letter, start bold, very urgently, end bold, since you have not sent me... etc.
Markup language classes
Three classes of markup are usually distinguished, although in practice several classes can be combined in the same document. For example, HTML contains purely presentational tags, such as B for bold, alongside purely descriptive ones (such as the BLOCKQUOTE element and the HREF attribute). HTML also includes the PRE element, which indicates that the text should be rendered exactly as it is written.
Presentation markup
Presentation markup indicates the format of the text. It is useful for laying out a document for reading, but it is insufficient for the automatic processing of information. Presentation markup is easier to create, especially for small amounts of information. However, it is difficult to maintain or modify, so its use in large projects has declined in favor of more structured types of markup.
The structure of such a document can be inferred by looking for clues in the text. For example, a title may be preceded by several blank lines and centered on the page. Various programs can deduce the structure of the text from this kind of clue, although the result is usually quite imperfect. RTF is an example of presentation markup.
Procedural markup
Procedural markup is also oriented toward the presentation of the text, but here the markup is visible to the user editing the text. The program that renders the document must interpret the code in the same order in which it appears. For example, to format a title, a series of directives must appear immediately before the text in question, telling the software to center it, increase the font size, or make it bold. Immediately after the title, further directives must reverse these effects. More advanced systems use macros or stacks to make this work easier.
Examples of procedural markup include nroff, troff, and TeX. This type of markup has been used extensively in professional publishing, where it is handled by skilled typographers, since it can become extremely complex.
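The in-order interpretation described above can be sketched in a few lines of Python. The directive syntax (lines beginning with .push and .pop) is hypothetical, loosely inspired by troff-style requests rather than taken from any real language; the sketch only shows a renderer applying directives strictly in document order and using a stack to undo them.

```python
# A toy procedural-markup interpreter. The ".push NAME" / ".pop" directive
# syntax is hypothetical, loosely inspired by troff requests.
def render_procedural(lines):
    """Apply directives strictly in document order.

    '.push NAME' switches a style on and records it on a stack;
    '.pop' undoes the most recently applied style. Any other line is
    text, emitted together with the styles active at that moment.
    """
    active = []  # stack of styles currently in effect
    output = []
    for line in lines:
        if line.startswith(".push "):
            active.append(line[len(".push "):])
        elif line == ".pop":
            active.pop()
        else:
            output.append((line, tuple(active)))
    return output


doc = [
    ".push center",
    ".push large",
    ".push bold",
    "Annual Report",  # the title: centered, enlarged, bold
    ".pop",
    ".pop",
    ".pop",           # reverse directives undo the three effects
    "Body text follows.",
]

for text, styles in render_procedural(doc):
    print(text, styles)
```

Because the renderer only knows what is active at each point, forgetting one .pop would leave every later line bold, which is the fragility that led to macros and, later, descriptive markup.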
Descriptive markup
Descriptive, or semantic, markup uses tags to describe pieces of text without specifying how, or in what order, they should be represented. The languages expressly designed for descriptive markup are SGML and XML.
Tags can be used to add any kind of metadata to content. For example, the Atom syndication standard provides a way to mark the "updated" time, the publisher-supplied date on which a piece of information was last modified. The standard does not specify how that date should be displayed, or even whether it should be displayed at all. Software can use this data in many ways, including some that the designers of the standard never intended.
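As an illustration of software consuming such metadata, the following Python sketch reads the updated timestamp of each entry in a small Atom feed using the standard library's xml.etree.ElementTree. The feed, its titles and its dates are invented for the example, and it is trimmed down: it omits elements (such as id and author) that a complete Atom document requires.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace in ElementTree's {uri} form

# Illustrative, trimmed-down feed (not a complete, valid Atom document).
feed_xml = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <updated>2024-03-01T12:00:00Z</updated>
  <entry>
    <title>First post</title>
    <updated>2024-02-28T09:30:00Z</updated>
  </entry>
</feed>"""

root = ET.fromstring(feed_xml)
entries = []
for entry in root.findall(ATOM + "entry"):
    title = entry.findtext(ATOM + "title")
    stamp = entry.findtext(ATOM + "updated")
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    entries.append((title, when))

for title, when in entries:
    print(title, "last modified", when.date())
```

The program decides for itself what to do with the date (here, print it); a feed reader might instead sort entries by it, which the standard neither requires nor forbids.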
One of the virtues of descriptive markup is its flexibility: fragments of text are tagged for what they are, not for how they should appear, so they can be put to more uses than originally intended. For example, hyperlinks were originally designed to be followed by a user reading the text, but search engines also use them to discover new pages with related information and to assess the popularity of a particular website.
Descriptive markup also simplifies reformatting text, since the formatting information is kept separate from the content itself. For example, a fragment marked as italics (<i>text</i>) might indicate either emphasis or a word in another language. This ambiguity, common in presentational and procedural markup, can only be resolved by tedious manual revision. If the two cases had been distinguished with different descriptive tags, however, they could easily be rendered differently.
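A minimal sketch of this idea in Python, assuming two descriptive tags, emph for emphasis and foreign for words in another language (similar in spirit to the TEI elements of the same names; the style mappings and the render_descriptive helper are hypothetical). Because the source records what each fragment is, two different style mappings can render the same text differently without touching the content.

```python
import xml.etree.ElementTree as ET

source = ('<p>This is <emph>really</emph> the '
          '<foreign lang="fr">pièce de résistance</foreign>.</p>')

# Two alternative "stylesheets": each maps a descriptive tag to HTML output.
PRINT_STYLE = {"emph": ("<i>", "</i>"), "foreign": ("<i>", "</i>")}
SCREEN_STYLE = {"emph": ("<b>", "</b>"), "foreign": ("<i>", "</i>")}


def render_descriptive(elem, style):
    """Recursively render an element, wrapping each child per the style map."""
    parts = [elem.text or ""]
    for child in elem:
        open_tag, close_tag = style.get(child.tag, ("", ""))
        parts.append(open_tag + render_descriptive(child, style) + close_tag)
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


root = ET.fromstring(source)
print(render_descriptive(root, PRINT_STYLE))   # both cases in italics
print(render_descriptive(root, SCREEN_STYLE))  # emphasis switched to bold
```

Changing how emphasis looks only required editing the mapping; with a plain <i> in the source, the two uses could not have been separated automatically.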
Descriptive markup is evolving into generic markup. Newer descriptive markup systems structure documents as trees, with the possibility of adding cross references. This allows documents to be treated as databases whose storage is aware of their structure, rather than as large undifferentiated blobs as in the past. Because these systems lack a strict schema like that of relational databases, they are often considered semi-structured databases.
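To illustrate treating a tree-structured document as a queryable store, the following sketch uses the limited XPath subset supported by Python's xml.etree.ElementTree. The element and attribute names (library, book, review, id) are invented for the example; the review's book attribute plays the role of a cross reference to a book's id.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a tree of books plus a review that cross-references one.
library = ET.fromstring("""<library>
  <book id="b1"><title>The TeXbook</title><author>Knuth</author></book>
  <book id="b2"><title>The SGML Handbook</title><author>Goldfarb</author></book>
  <review book="b2"><text>A reference work.</text></review>
</library>""")

# Query the tree like a database: titles of books whose author is Goldfarb.
titles = [b.findtext("title") for b in library.findall("book[author='Goldfarb']")]

# Follow the cross reference from review/@book to book/@id.
ref = library.find("review").get("book")
reviewed = library.find(f"book[@id='{ref}']").findtext("title")

print(titles)
print(reviewed)
```

No table layout had to be declared in advance, which is why such stores are described as semi-structured.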
Map of markup languages
This is a list of the main markup languages, ordered by field of application. Note that general-purpose languages can be used for more specific applications, but not the other way around.
Main
- GML → SGML → XML → XML dialects
Documents in general
Descriptive languages | Presentation languages | Lightweight languages | Manual languages |
Internet Technologies
World Wide Web | User interface | Syndication | Web services |
Specialized languages
- 2D graphics: SVG, CGM, VML, InkML.
- 3D graphics: VRML/X3D, STEP.
- Math: MathML and OpenMath.
- Music: LilyPond and MusicXML.
- Taxonomy: DITA
- Financial accounting: eXtensible Business Reporting Language.
- Geomatics: Geography Markup Language.
- Aeronautics: Spacecraft ML.
- Multimedia: Synchronized Multimedia Integration Language.
- Voice: VoiceXML.
- Instant messaging: XMPP.
- Video games: BulletML, COLLADA.
History
Markup languages are named after the traditional practice of marking up manuscripts with printing instructions in the margins. In the age of printing, this task fell to markup editors, who indicated the typeface, style, and size, and marked corrections, so that other people could set the type. This led to the creation of a set of standardized marks. With the introduction of computers, a similar concept carried over into computing.
Origins
The concept of a markup language was first presented by William W. Tunnicliffe in 1967. Its biggest novelty was the separation between the presentation and the structure of the text. Tunnicliffe, who preferred to call this concept generic coding, would later lead the development of a standard called GenCode, intended for the publishing industry. Book designer Stanley Rice also published similar ideas in the late 1960s. Brian Reid, in his 1980 dissertation at Carnegie Mellon University, presented both the theory and a working implementation of descriptive markup.
However, the person generally considered the father of markup languages is Charles Goldfarb, an IBM researcher. Goldfarb was involved in creating GML and later led the committee that developed the SGML standard, the cornerstone of markup languages. In any case, despite the controversy about its origin, it is commonly accepted that the idea arose independently several times during the 1970s and became widespread in the 1980s.
Early languages
The first language to clearly separate structure from presentation was Scribe, developed by Brian Reid and described in his 1980 doctoral thesis. Scribe was revolutionary for several reasons: it separated style from the markup of the document itself, and it also enforced grammatical control over the use of descriptive elements. Scribe influenced the development of later languages.
Another major publishing standard is TeX, created and maintained by Donald Knuth in the 1970s and 1980s. TeX focuses on detailed text layout and font description, primarily for specialized mathematical publications, which led Knuth to spend considerable time studying typography. However, TeX requires extensive knowledge to use, so it has caught on mainly in academic settings, where it is the de facto standard in various scientific disciplines. The most widely used macro package for TeX is LaTeX.
Outside the publishing industry, other initiatives also emerged, such as troff and nroff, languages used for typesetting on UNIX systems. They were limited because they forced users to work by trial and error until the marks inserted in the text produced the desired result. These languages did not catch on in professional environments and were used mainly by occasional users. The arrival of WYSIWYG word processors largely displaced these systems.
The generalization of markup languages
The initiative that laid the foundations of current markup languages came from IBM, which was looking for new ways to manage large numbers of documents. The work was entrusted to Charles F. Goldfarb, who, together with Edward Mosher and Raymond Lorie, designed the Generalized Markup Language, or GML (note that these are also the initials of its creators). GML inherited from the GenCode project the idea that presentation should be separated from content, so its markup focuses on defining the structure of the text rather than its visual appearance.
GML was a great success and soon spread to other areas; its adoption by the United States government created the need to standardize it. In the early 1980s a committee headed by Goldfarb was formed, whose members also included Sharon Adler, Anders Berglund, and James D. Mason. Ideas from many sources were incorporated, and a large number of people took part. After a long process, in 1986 the International Organization for Standardization published the Standard Generalized Markup Language as International Standard ISO 8879.
SGML specifies the syntax for including markup in text, as well as the syntax of a separate declaration that states which tags are allowed and where: the Document Type Definition (DTD), or schema. This allowed authors to use whatever markup they wanted, choosing tag names that made sense for the subject of the document and in their own language. Strictly speaking, then, SGML is a metalanguage from which many specialized languages are derived. Since the late 1980s, new SGML-based languages have appeared, such as TEI and DocBook.
SGML was widely accepted and is still used in fields that require large-scale documentation. Even so, it was cumbersome and difficult to learn, a consequence of its ambitious goals: its great power was both an advantage and a disadvantage. For example, in some cases a tag's start, its end, or even both could be omitted, on the assumption that documents would be typed by hand and this would save keystrokes. Nevertheless, SGML was a key step in the development of current markup languages, since the vast majority derive from it.
Popularization: HTML
In 1991, it seemed that WYSIWYG editors (which store documents in proprietary binary formats) would take over almost all word processing, relegating SGML to very specific professional or industrial uses. However, the situation changed dramatically when Sir Tim Berners-Lee, who had learned about SGML from his CERN colleague Anders Berglund, used SGML syntax to create HTML.
HTML was similar to any other SGML-based language, but it was remarkably simple, so much so that its DTD was not written until later. DeRose argues that the flexibility and scalability of HTML markup was one of the main factors in the success of the World Wide Web, along with URLs and the free distribution of browsers.
HTML is the most widely used document type in the world today. It was so simple that anyone could write documents in it with little computer knowledge. This was one of the reasons for its success, but it also led to some chaos: the exponential growth of the web in the 1990s produced vast numbers of poorly structured documents, a problem aggravated by web designers' and software vendors' disregard for standards.
Maturity: XML
The answer to the problems that arose around HTML came with XML (eXtensible Markup Language). XML is a metalanguage that lets users create tags tailored to their needs (hence the word "extensible"). The standard defines what those tags can look like and what can be done with them. It is also especially strict about what is and is not allowed: every document must be well formed, and a document that additionally conforms to its schema is said to be valid.
XML was developed by the World Wide Web Consortium, through a committee created and led by Jon Bosak. The main objective was to simplify SGML and adapt it to a specific field: documents on the Internet.
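The difference between the two conditions can be seen with Python's standard library: xml.etree.ElementTree rejects any input that is not well formed, but it does not validate against a DTD or schema, which needs a separate validating parser (for example, the third-party lxml package). The is_well_formed helper below is our own, written for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML.

    This checks well-formedness only; validity against a DTD or schema
    is not tested, because this parser does not validate.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<note><to>Ana</to></note>"))    # properly nested
print(is_well_formed("<note><to>Ana</note></to>"))    # overlapping tags
print(is_well_formed("<p>Line one<br>Line two</p>"))  # HTML-style unclosed <br>
```

The last case is acceptable in classic HTML but not in XML, which is one reason XHTML requires <br/>.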
The new language spread rapidly, since every XML document is also an SGML document, so programs and documents created for SGML could be converted to XML almost automatically. XML greatly reduced the complexity of SGML, making the new standard easier to learn and implement. It also solved old problems, such as those arising from internationalization and the impossibility of parsing a document without its DTD. The key to XML's success is that it strikes a balance between simplicity and flexibility.
XML was originally intended for semi-structured content such as texts and publications. One of the clearest examples is XHTML, a reformulation of HTML in XML, with the advantages that this entails. However, it soon became clear that its virtues could be useful in very different fields. XML-based languages have countless applications, such as data exchange between servers, financial reporting, and the description of chemical formulas and reactions.
Trends
Newer approaches go beyond strictly tree-structured documents. Texts of ancient literature typically have a literary structure of prose or verse (verses, paragraphs, and so on), while reference editions organize them into books, chapters, verses, and lines. These hierarchies often overlap, so a single tree cannot represent both. Newer modeling systems address this limitation, such as MECS, designed for the works of Wittgenstein, as well as the TEI Guidelines, LMNL, and CLIX.
The Text Encoding Initiative (TEI) has published extensive guidelines for encoding documents of interest in the humanities and social sciences, developed over years of international collaboration. These guidelines have been used in countless projects cataloging historical documents, academic papers, and other texts.
The Semantic Web
Markup languages are the fundamental tool in the design of the Semantic Web, a web that not only provides access to information but also defines its meaning, so that it can be processed automatically and reused in different applications. This is achieved by adding data to documents through two languages created expressly for the purpose: RDF (Resource Description Framework) and OWL (Web Ontology Language), both of which can be written in XML.
Features
Plain text
One of the main advantages of this type of encoding is that most such documents can be read directly, since they are plain text files. The exception is some presentation formats that store information in binary files, such as the .doc format of Microsoft Word, where only a small part of the information is readable. This is an obvious advantage over binary formats, which always require an intermediary program to work with them. A document written in a markup language can be edited with a simple text editor, although more sophisticated programs can be used to make the work easier.
Because they are plain text, these documents are independent of the platform, operating system, or program with which they were created. This was one of the premises of the creators of GML in the 1970s, so as not to place unnecessary restrictions on the exchange of information. It is one of the fundamental reasons for the wide acceptance that markup languages have enjoyed and for their expected continued use.
Compactness
Markup instructions are interspersed with the content itself in a single file or data stream. The following are examples in different markup languages:
Example | HTML | LaTeX | Wikitext |
---|---|---|---|
Title | <h1>Title</h1> | \section{Title} | == Title == |
List | <ul><li>Point 1</li></ul> | \begin{itemize} \item Point 1 \end{itemize} | * Point 1 |
Bold text | <b>text</b> | \textbf{text} | '''text''' |
Italic text | <i>text</i> | \textit{text} | ''text'' |
Code between angle brackets, such as <ul>, or commands such as \section, are markup instructions, also called tags. These particular tags describe the structure of the document, which can then be presented visually in various ways. The i tag (from italics), by contrast, specifies that the text should be displayed in italics without saying why: it is a presentational tag. The text between these instructions is the actual content of the document.
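One way to see that tags and content share a single stream is to pull them apart again. This sketch uses Python's html.parser.HTMLParser on HTML markup of the kind discussed in this section; the MarkupSplitter class is a hypothetical helper written for the example, and it keeps only tag names (dropping attributes) and non-blank text.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collect markup instructions (tags) and content into separate lists."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.content = []

    def handle_starttag(self, tag, attrs):
        self.tags.append("<" + tag + ">")  # attributes are ignored here

    def handle_endtag(self, tag):
        self.tags.append("</" + tag + ">")

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text between tags
            self.content.append(data.strip())


splitter = MarkupSplitter()
splitter.feed("<h1>Title</h1><ul><li>Point 1</li></ul>"
              "<p><b>bold</b> and <i>italic</i></p>")
splitter.close()
print(splitter.tags)     # the markup instructions
print(splitter.content)  # the actual content
```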
Ease of processing
Standards organizations have developed specialized languages for particular types of documents in particular communities or industries. One of the first was CALS, used by the US military for its technical manuals. Other industries that need large amounts of documentation, such as aeronautics, telecommunications, automotive, and computer hardware, have developed languages adapted to their needs. As a result, their manuals are written once in electronic form, from which printed, online, or CD versions are then produced. A notable example was Sun Microsystems, which chose to write its product documentation in SGML, saving considerable costs. The person responsible for that decision was Jon Bosak, who would later lead the XML committee.
Flexibility
Although markup languages were originally devised for text documents, they are now widely used in areas such as vector graphics, web services, web syndication, and user interfaces. These applications take advantage of the simplicity and power of XML, which also makes it possible to combine several markup languages in a single file, as in XHTML+SMIL and XHTML+MathML+SVG.
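As a rough illustration of several vocabularies coexisting in one file, the sketch below embeds MathML and SVG inside an XHTML document, each identified by its XML namespace, and counts elements per namespace with xml.etree.ElementTree. It is a toy document for illustration, not a complete or validated XHTML+MathML+SVG profile.

```python
import xml.etree.ElementTree as ET

doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Area of a circle:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>A</mi><mo>=</mo><mi>π</mi><msup><mi>r</mi><mn>2</mn></msup>
    </math>
    <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>"""


def split_tag(tag):
    """ElementTree writes namespaced tags as '{uri}local'; split them apart."""
    if not tag.startswith("{"):
        return "", tag
    uri, _, local = tag[1:].partition("}")
    return uri, local


root = ET.fromstring(doc)
counts = {}
for elem in root.iter():  # walks every element in document order
    uri, _ = split_tag(elem.tag)
    counts[uri] = counts.get(uri, 0) + 1

for uri, n in counts.items():
    print(uri, n)
```

The namespace on each element is what lets software hand the MathML part to a formula renderer and the SVG part to a graphics renderer.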