Help:Special characters

Quick help

If you need support for displaying exotic or archaic characters:

  1. Search Project Noto for the Unicode name of the script
  2. Download and install the font

To see which character set a project uses, ask the browser to show the source code of the page and look for:

or the one this page has:

Several characters that are not part of the standard ASCII repertoire are useful, even necessary, for Wiki pages, especially international pages. In principle, if the modern UTF-8 encoding is used, they should not cause problems, and a page can even mix several languages and still be read without trouble. It doesn't matter whether the text contains Arabic or Chinese characters; they are all included. If your operating system is up to date and uses UTF-8, you will be able to write and read them without problems. If you see any old pages that show squares instead of accented letters, help us correct them.

So, if you see this series of accented letters: áéíóú (accented aeiou) instead of little squares or strange letters, you can read anything written in UTF-8. If you need more information, see: UTF-8
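
The "strange letters" mentioned above typically appear when UTF-8 bytes are decoded with a different character set. The following is a minimal Python sketch of that effect, assuming ISO-8859-1 (`latin-1`) as the wrong decoder; the variable names are illustrative only.

```python
# The same bytes read with the right and the wrong character set.
text = "áéíóú"
utf8_bytes = text.encode("utf-8")      # each accented vowel becomes two bytes

good = utf8_bytes.decode("utf-8")      # decoded as intended
garbled = utf8_bytes.decode("latin-1") # every byte read as a separate character

print(good)
print(garbled)
print(len(text), len(utf8_bytes), len(garbled))
```

Each accented vowel turns into two unrelated characters in the garbled version, which is why a page declared with the wrong character set looks twice as long and unreadable.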

This page contains recommendations on which characters are safe to use and how to use them. Apart from using UTF-8, which we recommend, there are three ways to enter a non-ASCII character in a Wiki page:

  • Enter the character directly from the appropriate keyboard, by copying it from a "character map" application and pasting it, or by means of special facilities offered by your operating system or text editor. The web server should then be configured to report which 8-bit character set is in use.
  • Use a well-known named HTML entity reference, such as &agrave;. This is the safest and most compatible method; it is unambiguous even when the server does not announce a particular character set, and even when the character is not displayed properly in some browsers.
  • Use a numeric HTML entity reference, such as &#161;. This is not recommended, as many browsers misinterpret these references as references to the native character set. However, it is the only way to enter Unicode values for which there is no named entity, such as the letters of the Turkish language. Note that because the code positions 128 to 159 are unused in both ISO-8859-1 and Unicode, references in this range, such as &#131;, are invalid and ambiguous, even though they are frequently used by several websites.
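
The second and third methods can be sketched in Python with the standard `html.entities` module, which holds the HTML 4 entity names. The helper name `to_reference` is hypothetical; it prefers a named entity and falls back to a decimal numeric reference when no name exists.

```python
from html.entities import codepoint2name

def to_reference(ch: str) -> str:
    """Return a named HTML 4 entity reference if one exists, else a decimal one."""
    code = ord(ch)
    name = codepoint2name.get(code)
    return f"&{name};" if name else f"&#{code};"

print(to_reference("à"))  # has a named entity
print(to_reference("ğ"))  # Turkish letter without a named entity
```

The Turkish letter falls through to the numeric form, which matches the case described in the third bullet.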

ISO-8859-1 characters

The following characters from the ISO-8859-1 set (one of the extended ASCII encodings) are safe to use on all Wiki pages. The table lists the character itself, its code in hexadecimal and decimal, the name of its HTML entity, and the common name of the character.

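Literal | Hex | Dec | Entity | Character
  | 00A0 | 0160 | &nbsp; | non-breaking space
¡ | 00A1 | 0161 | &iexcl; | inverted (opening) exclamation mark
¢ | 00A2 | 0162 | &cent; | cent sign
£ | 00A3 | 0163 | &pound; | pound sign
¤ | 00A4 | 0164 | &curren; | international currency sign
¥ | 00A5 | 0165 | &yen; | yen sign
§ | 00A7 | 0167 | &sect; | section sign
¨ | 00A8 | 0168 | &uml; | diaeresis
© | 00A9 | 0169 | &copy; | copyright sign
ª | 00AA | 0170 | &ordf; | feminine ordinal indicator
« | 00AB | 0171 | &laquo; | left angle quotation mark
¬ | 00AC | 0172 | &not; | logical not sign
® | 00AE | 0174 | &reg; | registered trademark sign
¯ | 00AF | 0175 | &macr; | macron
° | 00B0 | 0176 | &deg; | degree sign
± | 00B1 | 0177 | &plusmn; | plus-minus sign
´ | 00B4 | 0180 | &acute; | acute accent
µ | 00B5 | 0181 | &micro; | micro sign
¶ | 00B6 | 0182 | &para; | paragraph sign
· | 00B7 | 0183 | &middot; | middle dot (Georgian comma)
¸ | 00B8 | 0184 | &cedil; | cedilla
º | 00BA | 0186 | &ordm; | masculine ordinal indicator
» | 00BB | 0187 | &raquo; | right angle quotation mark
¿ | 00BF | 0191 | &iquest; | inverted (opening) question mark
À | 00C0 | 0192 | &Agrave; | A with grave accent
Á | 00C1 | 0193 | &Aacute; | A with acute accent
Â | 00C2 | 0194 | &Acirc; | A with circumflex accent
Ã | 00C3 | 0195 | &Atilde; | A with tilde
Ä | 00C4 | 0196 | &Auml; | A with diaeresis
Å | 00C5 | 0197 | &Aring; | A with ring
Æ | 00C6 | 0198 | &AElig; | AE ligature
Ç | 00C7 | 0199 | &Ccedil; | C with cedilla
È | 00C8 | 0200 | &Egrave; | E with grave accent
É | 00C9 | 0201 | &Eacute; | E with acute accent
Ê | 00CA | 0202 | &Ecirc; | E with circumflex accent
Ë | 00CB | 0203 | &Euml; | E with diaeresis
Ì | 00CC | 0204 | &Igrave; | I with grave accent
Í | 00CD | 0205 | &Iacute; | I with acute accent
Î | 00CE | 0206 | &Icirc; | I with circumflex accent
Ï | 00CF | 0207 | &Iuml; | I with diaeresis
Ñ | 00D1 | 0209 | &Ntilde; | N with tilde
Ò | 00D2 | 0210 | &Ograve; | O with grave accent
Ó | 00D3 | 0211 | &Oacute; | O with acute accent
Ô | 00D4 | 0212 | &Ocirc; | O with circumflex accent
Õ | 00D5 | 0213 | &Otilde; | O with tilde
Ö | 00D6 | 0214 | &Ouml; | O with diaeresis
Ø | 00D8 | 0216 | &Oslash; | O with stroke
Ù | 00D9 | 0217 | &Ugrave; | U with grave accent
Ú | 00DA | 0218 | &Uacute; | U with acute accent
Û | 00DB | 0219 | &Ucirc; | U with circumflex accent
Ü | 00DC | 0220 | &Uuml; | U with diaeresis
ß | 00DF | 0223 | &szlig; | sharp s (German)
à | 00E0 | 0224 | &agrave; | a with grave accent
á | 00E1 | 0225 | &aacute; | a with acute accent
â | 00E2 | 0226 | &acirc; | a with circumflex accent
ã | 00E3 | 0227 | &atilde; | a with tilde
ä | 00E4 | 0228 | &auml; | a with diaeresis
å | 00E5 | 0229 | &aring; | a with ring
æ | 00E6 | 0230 | &aelig; | ae ligature
ç | 00E7 | 0231 | &ccedil; | c with cedilla
è | 00E8 | 0232 | &egrave; | e with grave accent
é | 00E9 | 0233 | &eacute; | e with acute accent
ê | 00EA | 0234 | &ecirc; | e with circumflex accent
ë | 00EB | 0235 | &euml; | e with diaeresis
ì | 00EC | 0236 | &igrave; | i with grave accent
í | 00ED | 0237 | &iacute; | i with acute accent
î | 00EE | 0238 | &icirc; | i with circumflex accent
ï | 00EF | 0239 | &iuml; | i with diaeresis
ñ | 00F1 | 0241 | &ntilde; | n with tilde
ò | 00F2 | 0242 | &ograve; | o with grave accent
ó | 00F3 | 0243 | &oacute; | o with acute accent
ô | 00F4 | 0244 | &ocirc; | o with circumflex accent
õ | 00F5 | 0245 | &otilde; | o with tilde
ö | 00F6 | 0246 | &ouml; | o with diaeresis
÷ | 00F7 | 0247 | &divide; | division sign
ø | 00F8 | 0248 | &oslash; | o with stroke
ù | 00F9 | 0249 | &ugrave; | u with grave accent
ú | 00FA | 0250 | &uacute; | u with acute accent
û | 00FB | 0251 | &ucirc; | u with circumflex accent
ü | 00FC | 0252 | &uuml; | u with diaeresis
ÿ | 00FF | 0255 | &yuml; | y with diaeresis
₫ | 20AB | 8363 | &dong; | dong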


These characters are a subset of ISO 8859-1, the extended ASCII character set most frequently used on the Internet. Wikipedia pages are identified by the server as pages with ISO-8859-1 text. The characters listed above are a group selected for compatibility with other machines.
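
The hexadecimal code, decimal code, and entity name of such a character are all derivable from the character itself. Here is a small Python sketch, assuming the standard `html.entities` and `unicodedata` modules; the function `table_row` and its pipe-separated layout are illustrative choices, and the last column uses the official Unicode name rather than the informal names used on this page.

```python
import unicodedata
from html.entities import codepoint2name

def table_row(ch: str) -> str:
    """Build one row: literal, 4-digit hex, 4-digit decimal, entity, Unicode name."""
    code = ord(ch)
    entity = f"&{codepoint2name[code]};"  # raises KeyError if HTML 4 has no name
    name = unicodedata.name(ch).lower()
    return f"{ch} | {code:04X} | {code:04d} | {entity} | {name}"

for ch in "¡é÷":
    print(table_row(ch))
```

The decimal column is zero-padded to four digits because that is the form needed when typing the code on a Windows numeric keypad.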

For example, the Apple Macintosh is widely used on the Internet, is not limited to any particular language, and its native character set (which is not ISO-8859-1) contains many of the international characters. Several Macintosh browsers correctly translate ISO text into the native character set, as long as the characters used are available. So the table above is the subset of ISO-8859-1 characters that are also available in the native Macintosh character set. The standard Microsoft Windows code page 1252 is a superset of ISO-8859-1, so these characters are also available on Windows machines. The most common Latin character sets other than ISO-8859-1 are MS-DOS code page 437 (pre-Windows), Macintosh Roman, and other ISO sets such as ISO-8859-2. The number of pre-Windows MS-DOS machines with web browsers is small, and they are often dedicated machines that wouldn't use Wikipedia anyway, so it is reasonably safe to sacrifice compatibility with those machines for the benefit of people who need foreign characters. Pages in other ISO sets are generally intended to be read by browsers using the same character set in the same country, and therefore those pages should use a language-specific character set.
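
The idea of a shared subset can be checked mechanically. The sketch below uses Python's built-in `mac_roman` and `cp1252` codecs as stand-ins for the Macintosh and Windows sets; Python's tables may reflect a different revision of the Macintosh set than the one this page had in mind, so the result may not match the table exactly. The helper `in_charset` is a hypothetical name.

```python
def in_charset(ch: str, codec: str) -> bool:
    """True if the character can be encoded in the given character set."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Upper half of ISO-8859-1: code positions 160 to 255.
latin1_upper = [chr(code) for code in range(0xA0, 0x100)]

shared = [ch for ch in latin1_upper
          if in_charset(ch, "mac_roman") and in_charset(ch, "cp1252")]
missing = [ch for ch in latin1_upper if ch not in shared]

print("available everywhere:", "".join(shared))
print("not in every set:    ", "".join(missing))
```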

These characters can be entered using well-known HTML entity references such as &agrave;, directly via some keyboards, or via whatever facility is available to the Wiki page author who needs to enter them. For example, Wiki authors using Windows machines can enter these characters by holding down the Alt key while typing the 4-digit decimal code of the character on the keyboard's numeric keypad. It is important that all 4 digits (including the leading 0) are entered; using a 3-digit code will cause characters of the obsolete code page 437 to be entered. Wiki authors using Macintosh machines should take care to use special facilities that enter these characters in ISO-8859-1 format rather than in the native character set, or they can use references to well-known HTML entities. Note that some Windows users may experience problems with versions of the Microsoft Internet Explorer browser that use "Alt-Left Arrow" and "Alt-Right Arrow" for moving between pages. These combinations interfere with entering codes containing the digits 4 and 6. In this case, use HTML entity references.
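
The difference between the 4-digit and 3-digit Alt codes can be modelled in a few lines of Python. This is a simplified sketch, not a description of how Windows implements the feature: it assumes a leading zero selects code page 1252 and its absence selects code page 437, as the paragraph above describes. The function name `alt_code` is hypothetical.

```python
def alt_code(digits: str) -> str:
    """Character produced by typing `digits` on the keypad while holding Alt.

    Simplified model: a leading 0 selects code page 1252, otherwise 437.
    """
    value = int(digits)
    codec = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([value]).decode(codec)

print(alt_code("0233"))  # with the leading zero
print(alt_code("233"))   # without it: a different character entirely
```

Typing 233 without the leading zero yields a Greek capital theta instead of the intended accented e.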

The characters in the table above can be used directly as 8-bit characters in all Wiki pages, and are sufficient for all pages written mainly in English, Spanish, French, German and languages that require no special characters beyond those (like Catalan). Despite their general safety, it is currently not possible to use these characters in Wiki page titles on the English Wikipedia, although some of the international Wikipedias are configured to accept them.

Unsafe characters

Notice especially what is missing here from the full ISO-8859-1 character set:

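Literal | Hex | Dec | Entity | Character
¦ | 00A6 | 0166 | &brvbar; | broken vertical bar
­ | 00AD | 0173 | &shy; | soft hyphen
² ³ | 00B2 00B3 | 0178 0179 | &sup2; &sup3; | superscript digits
¼ ½ ¾ | 00BC 00BD 00BE | 0188 0189 0190 | &frac14; &frac12; &frac34; | common fractions
Ð ð Þ þ | 00D0 00F0 00DE 00FE | 0208 0240 0222 0254 | &ETH; &eth; &THORN; &thorn; | Icelandic characters eth and thorn
× | 00D7 | 0215 | &times; | multiplication sign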

These should be considered unsafe (besides, there are suitable substitutes for many of them).

Special care should be taken with characters that exist in the native character set of some popular machines but not in the set mentioned above. They are not safe, even though you may see them correctly when you use them. Characters that are part of Windows code page 1252 but not ISO-8859-1 include the euro sign (€ &euro;), the dagger and double dagger († &dagger;, ‡ &Dagger;), the bullet (• &bull;), the trademark sign (™ &trade;), typographic punctuation (see below), the per-mille sign (‰ &permil;), some Eastern European letters with a caron accent, and the oe ligatures. Characters in the Macintosh Roman character set that are not part of ISO-8859-1 include the dagger and double dagger, the bullet, the trademark sign, a few mathematical symbols such as infinity (∞ &infin;) and not-equal (≠ &ne;), some commonly used Greek letters such as pi (π &pi;), ligatures such as oe and fl, typographic punctuation marks, the per-mille sign, and some accents such as the breve, ogonek, and caron.

The HTML 4.0 markup language defines entities for some Latin characters that are not included in ISO-8859-1 but are used by popular languages, such as the OE ligature (Œ &OElig;, œ &oelig;), the uppercase Y with diaeresis (Ÿ &Yuml;), and some accented Eastern European characters (š &scaron;). These are also unsafe, although when written as HTML entity references they may display correctly on some machines.

In short, don't assume it's safe to use some special character just because it looks good on your machine. Use the characters in the table listed above, and read and understand how to use the others listed below.

Possibly usable non-ISO characters

For many years now, the web has been based on the Unicode character repertoire. Many characters can be taken from this extended set and entered either as HTML entity references or directly. Web browsers recognize and interpret them correctly, using alternate fonts where required. All of these characters should be considered less safe than the ones mentioned above, but only in the sense that they may not display properly on some machines; in the form of HTML entity references they are unambiguous and preserve data integrity.

For many of these, suitable substitutes and workarounds are available. They should be used when making the text suitable for users of older machines and software matters more than good presentation for those with newer software (at the discretion of the author or publisher).

Typographic punctuation marks

Absent from the ISO-8859-1 character set, but frequently used and present in both the Macintosh Roman and Windows code page 1252 sets, and later included in Unicode, are some typographic punctuation marks (curly quotation marks and dashes) used in languages like English and Spanish. These can be entered as entity references, and should display correctly on most machines. Even some text-mode browsers are capable of making appropriate substitutions using plain ASCII quotes and dashes. Many of these references did not exist in older versions of HTML, so they may not be recognized by older software. Since using these characters maintains data integrity even on machines that do not display them correctly, they can be considered safe to use unless proper display in older software is critical. The German "low-9" quotation marks are a similar case, but they are less frequently translated by software, and therefore are not as safe. The following table shows these characters before a capital "O" for better visibility:

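‘O | &lsquo; | left single quote
’O | &rsquo; | right single quote
“O | &ldquo; | left double quote
”O | &rdquo; | right double quote
—O | &mdash; | long (em) dash
–O | &ndash; | short (en) dash
‚O | &sbquo; | single low-9 quote
„O | &bdquo; | double low-9 quote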

Many websites targeting a Windows audience use numeric references taken from code page 1252 for these characters: for example, &#151; for the long dash. This is not an acceptable practice, as the reference actually points to Unicode code 151, which is only a control character. To ensure future data integrity and maximum compatibility, such references should be rewritten as references like &mdash;.
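
One way to carry out this rewrite automatically is sketched below in Python. It assumes that any decimal reference in the range 128 to 159 was meant as a code page 1252 byte, which is a guess about the author's intent rather than something the markup itself states. The function name `fix_cp1252_refs` is hypothetical.

```python
import re
from html.entities import codepoint2name

_NUMERIC_REF = re.compile(r"&#(\d+);")

def fix_cp1252_refs(html_text: str) -> str:
    """Rewrite decimal references 128..159 as proper entity references."""
    def repl(match: re.Match) -> str:
        value = int(match.group(1))
        if not 128 <= value <= 159:
            return match.group(0)
        try:
            ch = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)  # position unassigned in cp1252: leave as-is
        name = codepoint2name.get(ord(ch))
        return f"&{name};" if name else f"&#{ord(ch)};"
    return _NUMERIC_REF.sub(repl, html_text)

print(fix_cp1252_refs("1914&#150;1918 &#151; a war"))
```

References outside the ambiguous range, and positions that code page 1252 leaves unassigned, pass through unchanged.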

Greek letters and mathematical symbols

Traditionally, the Symbol font was commonly used in the Windows environment to represent letters of the Greek alphabet and other mathematical symbols. With the widespread adoption of Unicode this method is not only obsolete but also incorrect. To use these symbols now, use HTML entities, as already mentioned, or simply type them (or copy them, since they are not usually on keyboards).

Uppercase and lowercase Greek letters simply use their full names as entities. These characters should, of course, only be used to represent occasional Greek letters in primarily Latin text. Actual Greek text should be written using a Greek character set to avoid bloated, slow-loading files. Here are some examples:

α | &alpha;
Γ | &Gamma;
β | &beta;
Λ | &Lambda;
γ | &gamma;
Σ | &Sigma;
π | &pi;
Π | &Pi;
σ | &sigma;
Ω | &Omega;
ς | &sigmaf;
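
The naming rule (the capitalisation of the entity name selects the uppercase or lowercase letter) can be verified against Python's copy of the HTML 4 entity list. This is only an illustrative lookup; the helper name `resolve` is hypothetical.

```python
from html.entities import name2codepoint

def resolve(name: str) -> str:
    """Return the character that the HTML 4 entity &name; stands for."""
    return chr(name2codepoint[name])

for name in ("alpha", "Gamma", "beta", "Lambda", "gamma", "Sigma",
             "pi", "Pi", "sigma", "Omega", "sigmaf"):
    print(f"&{name}; -> {resolve(name)}")
```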

Other common math symbols:

< | &lt;
> | &gt;
≠ | &ne;
′ | &prime;
≤ | &le;
″ | &Prime;
≥ | &ge;
∂ | &part;
≡ | &equiv;
∫ | &int;
≈ | &asymp;
∑ | &sum;
∞ | &infin;
∏ | &prod;
√ | &radic;

Many of the symbols in the Windows "Symbol" font that are used to render mathematics (such as extensible bracket segments) are not present on many other machines, and are not even present in Unicode 3.1 or as HTML entities (although they are planned for Unicode 3.2). These are used in products like TtH to render equations. At present it is not possible to use these characters in web pages in a way that is compatible for all readers.

Other common symbols

Some characters such as the bullet, the euro currency sign, and the trademark sign are special cases. It is very likely that many browsers will recognize and render them in some way. Because they are important to international trade, many systems add them to fonts in some non-standard location and render them when requested, or else render them in special ways that do not require them to be present anywhere in the font. Check the table below to see how your browser renders the following symbols:

• | &bull; | bullet
€ | &euro; | euro currency sign
™ | &trade; | trademark sign

Slightly less common symbols include:

† | &dagger; | dagger
‡ | &Dagger; | double dagger
◊ | &loz; | lozenge
‰ | &permil; | per-mille sign
← | &larr; | left arrow
↑ | &uarr; | up arrow
→ | &rarr; | right arrow
↓ | &darr; | down arrow
♠ | &spades; | black spade suit
♣ | &clubs; | black club suit
♥ | &hearts; | black heart suit
♦ | &diams; | black diamond suit
‹ | &lsaquo; | single left angle quote
› | &rsaquo; | single right angle quote

The use of these symbols should be considered unsafe, except perhaps on pages targeted at a specific audience that most likely use fairly up-to-date software on popular machines.

Unicode

The Unicode (UCS-4) repertoire is the official character set of HTML 4.0. Many browsers, however, are only capable of displaying a small subset of the complete UCS-4 repertoire. For example, the codes &#1049; &#1511; &#1605; are displayed in your browser as Й, ק, and م, which should ideally look like the Cyrillic letter "short I", the Hebrew letter "qof", and the Arabic letter "mim", respectively. It is unlikely that your computer has all of these fonts and displays them correctly, although it may display some of them. In any case, because these characters are encoded according to the standard, they should display correctly on any standard-compliant system that has the characters available. Numeric entity references are the only way to enter these characters on a Wiki page at the moment. Note that encoding them in decimal instead of hexadecimal (for example &#1049; instead of &#x419;) will increase the number of browsers in which the reference works.
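
The decimal and hexadecimal forms of a numeric reference are two spellings of the same code point, as this short Python sketch shows. The helper names `decimal_ref` and `hex_ref` are hypothetical; `html.unescape` from the standard library is used only to confirm that both forms decode to the same character.

```python
import html

def decimal_ref(ch: str) -> str:
    """Decimal numeric reference, the more widely supported form."""
    return f"&#{ord(ch)};"

def hex_ref(ch: str) -> str:
    """Hexadecimal numeric reference for the same code point."""
    return f"&#x{ord(ch):X};"

for ch in "Йקم":
    dec, hexa = decimal_ref(ch), hex_ref(ch)
    # Both spellings decode back to the original character.
    assert html.unescape(dec) == html.unescape(hexa) == ch
    print(ch, dec, hexa)
```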

See also Unicode and HTML for tables of character entities.

  • meta:MediaWiki User's Guide: Creating special characters: the most up-to-date version of this article
