Formal language

format_list_bulleted Contenido keyboard_arrow_down

ImprimirCitar

This image shows the relationship between character strings, well-formed formulas and theorems. In some formal systems, however, the set of the theorems coincides with that of well-formed formulas.

In mathematics, logic, and computer science, a formal language is a language whose symbols are primitive and the rules for joining those symbols are formally specified. The set of primitive symbols is he calls it the alphabet (or vocabulary) of the language, and the set of rules is called the formal grammar (or syntax). A string of symbols formed according to the grammar is called a well-formed formula (or word) of the language. Strictly speaking, a formal language is identical to the set of all its well-formed formulas.

For example, an alphabet might be the set {a,b}, and a grammar might define well-formed formulas as having the same number of symbols a than b. So, some well-formed formulas of the language would be: ab, ba, abab, ababba, etc., and the formal language would be the set of all those well-formed formulas.

For some formal languages there is a formal semantics that can interpret and give meaning to the well-formed formulas of the language. However, a formal semantics is not a necessary condition to define a formal language, and that is an essential difference with natural languages.

In some formal languages, the empty word (i.e., the zero-length symbol string) is allowed, frequently noticed by ${displaystyle epsilon ,}$ , $e,$ or $lambda ,$ .

Example of formal languages

The Gödel Numeration {aⁿ: a It's a prime number. n a Gödel number}.
The set of all syntactically valid programs in a particular programming language.
The set of all formulas well formed in the logic of first order.

Specification of formal languages

Formal languages can be specified in a wide variety of ways, such as:

(If the language is regular)

Chains produced by a formal grammar (see Chomsky hierarchy).
Chains described by a regular expression.
Chains accepted by an automaton, such as a Turing machine or finite automaton.

The chains are formed by a set of symbols that belong to the same language, there are two ways to compose a sentence or function with the symbols:

Syntax
Semantic

Operations

Various operations can be used to produce new languages from other data. Suppose L₁ and L₂ are languages over a common alphabet. So:

La concatenation L₁L₂ consists of all those words of form vw where v It's a word of L₁ and w It's a word of L₂
La intersection L₁"L₂ consists of all those words that are contained both in L₁ as in L₂
La union L₁日本語L₂ consists of all those words that are contained either in L₁ or L₂
The Complement ~L₁ consists of all those producing words on the alphabet L₁ that are not already contained in L₁
The quotient L₁/L₂ consists of all those words v for which there is a word w in L₂ such that vw is located in L₁
La star L₁^♪(Also called Kleene's closing of language L₁) consists of all those words that can be written in the form W₁W₂...W_n where everything is W_i is located in L₁ and n ≥ 0. (Note that this definition includes λ in any L*)
Positive Closure L₁⁺ consists of all those words that can be written in the form W₁W₂...W_n where everything is W_i is located in L₁ and n /2005 0. (It differs from the Kleene closure as it contains λ if and only if L₁ contains λ)
La intercalation L₁♪L₂ consists of all those words that can be written in the form v₁w₁v₂w₂...v_nw_n; are such words that concatenation v₁...v_n It's in. L₁and concatenation w₁...w_n It's in. L₂

Be the ${displaystyle L_{i}}$ languages, so for each $i$ then. ${displaystyle L_{i}}$ is formed by all words that may arise from concatenation $i$ language words $L$ . For example, if ${displaystyle L=left{v,wright}}$ , then ${displaystyle L_{3}=left{v.v.v,v.v.w,v.w.v,v.w.w,w.v.v,w.v.w,w.w.v,w.w.wright}}$ Based on the previous concept, the above-mentioned closures can be defined:

The closure of Kleene is formally noted: ${displaystyle L^{*}=bigcup _{iin mathbb {N} _{0}}L_{i}=left{lambda right}cup L_{1}cup L_{2}cup L_{3}cup ldots cup L_{i}}$
The positive closure is formally noted: ${displaystyle L^{+}=bigcup _{iin mathbb {N} }L_{i}=L_{1}cup L_{2}cup L_{3}cup ldots cup L_{i}}$

So it follows that ${displaystyle L^{*}=left{lambda right}cup L^{+}}$

A question typically asked about a given L formal language is how difficult it is to decide whether or not to include a given v word. This topic is in the domain of computability theory and computational complexity theory.

In contrast to the language of living beings and especially human language, considered natural languages, the «artificial» languages of mathematics or computing are called formal languages, artificial languages are called formal languages (including programming languages). However, human language has a characteristic that is not found in programming languages: diversity.

In 1956, Noam Chomsky created Chomsky's hierarchy to organize the different types of formal language.

Truths Concerning Formal Languages

Theorem 1: The group of languages in general (including non-formal) is countless.

Lema 1: The set of languages in an unemployed alphabet is countless.

Affirming that an alphabet is not empty is equivalent to the fact that the alphabet contains at least one symbol, ergo, it is enough to prove that the set of languages in the alphabet

{displaystyle {a},}

It's countless. As we know, L language in

{displaystyle {a},}

is a subset of

{displaystyle {a}^{*},}

, this leads us to the conclusion that, the whole of all languages in

{displaystyle {a},}

That's right.

{displaystyle 2^{{a}^{*}},}

(the set of all subsets or power set

{displaystyle {a}^{*},}

) and it is evident that

{displaystyle {a}^{*},}

is infinite (in fact; accountant), it has also been shown that if

A

is an infinite set (counter or uncountable), then

{displaystyle 2^{A}}

It's bigger than

A

Because

{displaystyle 2^{A}}

becomes an infinite set of orders of infinity, being greater, there will be no bijection between

A

and

{displaystyle 2^{A}}

What he does to

{displaystyle 2^{A}}

a countless infinite set, the test has ended. (see Cantor theorem)

Demonstration of Theorem 1: It can easily be derived that the assertion delineated in Theorem 1 is true, because the set of languages in general is just an infinite union of sets of the type ${displaystyle 2^{A}}$ Where $A$ is an infinitely accounting set.

Theorem 2: Languages are accounting sets.

You know a language

L

in an alphabet

Sigma

is a subset of

Sigma ^{*}

and as already mentioned,

Sigma ^{*}

is infinite countable, therefore,

L

is as much an infinitely accounting set (of the same size as

Sigma ^{*}

The test has ended.

Theorem 3: The set of formal languages is accounting.

As we know a formal language can be generated by a formal grammar (or sentence structure), which implies that any formal language can be accepted by an MT, which in turn implies that a bijection can be defined between the set of formal languages and the set of the MT's (due to the transient property of the relationship "there is bijection between

A

and

B

"). To demonstrate the theorem the concept of MT's encoding will be used which is introduced in the study of the universal MT's, a MT is usually encoded with a function that has precisely as a domain to the MT's set (we will call it

X

) and as co-domain

{displaystyle {0,1}^{*},}

, that function can be a bijection if the codomain becomes Y (a subset of

{displaystyle {0,1}^{*},}

and as

{displaystyle {0,1}^{*},}

is accountant, that subset will also be countable and how there is such bijection (between

X

Y

), the assertion has been demonstrated, proof completed.

Context Free Languages

Formal languages derived from an alphabet consisting of a single letter, known as unary languages, can be considered as sets of natural numbers, and questions of representation of such sets by the basic devices of formal language theory form a special subject of study. Formal unary languages are basically periodic sets (Jez & Okhotin, 2007).

Formal unary languages are characterized by systems of language equations of the general form:

${displaystyle {begin{cases}X_{1}=varphi _{1}{bigl (}X_{1},...,X_{n}{bigr)},\.\.\.\X_{n}=varphi _{n}{bigl (}X_{1},...,X_{n}{bigr)}\end{cases}}}$ (1)

With different and permitted operations on its right side. Context-free or context-free languages are obtained by using system concatenation and union. As Jez & Okhotin (2007). If the concatenation is limited to a single linear side (that is, every concatenation appearing somewhere to the right is of the form w φ times some constant string w), then the solutions exactly represent the regular languages. Lastly, supporting automata, as shown by Okhotin (2004), they can be simulated by equations with union, intersection, and linear concatenation of two faces.

When we talk about languages we must consider the grammar that governs them. Barash & Okhotin (2014) tells us that context-free grammars are best understood as a logic for defining the syntax of languages. In this logic, the definitions are inductive, so the properties of a string are determined by the properties of its substrings. This is how a rule S → aSb asserts that if a string a^n-1bⁿ¹ has the property S, then the strings aⁿbⁿ have the property S as well. In addition to concatenation, the formalism of this logic includes a disjunction operation, represented by having multiple rules for a single symbol.

Contenido relacionado

Más resultados...