Computational linguistics

Computational linguistics is an interdisciplinary field concerned with developing formalisms that describe how natural language works, in such a way that they can be turned into executable computer programs. This work sits between rule-based and statistical modeling of natural language from a computational perspective, and involves linguists, computer scientists specialized in artificial intelligence, cognitive psychologists, and logicians, among others.

Some of the study areas of computational linguistics are:

  • Computer-assisted corpus linguistics.
  • Design of syntactic analyzers (parsers) for natural languages.
  • Design of taggers and lemmatizers, such as POS taggers.
  • Definition of specialized logics that serve as a basis for natural language processing.
  • Study of the possible relationship between formal and natural languages.
  • Machine translation.

Origins

Computational linguistics arose in the United States in the 1950s as an effort to build computers capable of automatically translating foreign-language texts into English, particularly Russian scientific journals. It grew out of the ideas of Warren Weaver, who saw translation as a form of decryption. When artificial intelligence appeared in the 1960s, computational linguistics became a branch of AI, dealing with human-level understanding and production of natural languages.

To translate one language into another, it became clear that one had to understand the grammar of both languages, at least at the level of morphology (the syntax of words) and of entire sentences. To understand the syntax, one must in turn understand the semantics of the vocabulary and the pragmatics of the language. What started as an effort to translate texts became a discipline concerned with understanding how to represent and process natural language using computers.

This type of study is sometimes also referred to as corpus linguistics, since the term "computational" can be confusing.

Branches of study

Computational linguistics has been divided into two branches:

Theoretical Computational Linguistics

It draws its topics from theoretical linguistics and cognitive science. The contributions of cognitive psychology, especially psycholinguistics, are also particularly relevant, and have led to the emergence of a new discipline, computational psycholinguistics.

Its goal is to develop computable linguistic theories, that is, theories that can be implemented on computers. Since existing formal theories do not account for all possible linguistic phenomena, theoretical computational linguistics acts as an incentive for the formal understanding of linguistic processes, as well as a means of demonstrating them in practice. This is done, for example, through the automatic analysis of large linguistic corpora, in order to investigate a linguistic phenomenon or test the validity of a theory.
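As a rough illustration of what automatic corpus analysis can look like, the following Python sketch counts word and word-pair frequencies in a tiny corpus. The corpus text and the function names `tokenize` and `bigram_counts` are invented for this example; a real study would use a much larger corpus and a proper tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bigram_counts(tokens):
    """Count how often each pair of adjacent tokens occurs."""
    return Counter(zip(tokens, tokens[1:]))


# Toy corpus, invented purely for illustration.
corpus = (
    "The dog saw the cat. The cat saw the dog. "
    "A dog chased the cat with the ball."
)

tokens = tokenize(corpus)
word_freq = Counter(tokens)        # frequency of each word form
pair_freq = bigram_counts(tokens)  # frequency of each adjacent word pair

print(word_freq.most_common(3))
print(pair_freq[("the", "cat")], pair_freq[("the", "dog")])
```

Counts of this kind are the simplest form of evidence a corpus can provide. Note that this sketch ignores sentence boundaries, so a pair such as ("cat", "the") is counted even though the two words belong to different sentences.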

However, not all linguistic theories aim to be applied in the computing domain. Cognitive linguistics, for instance, studies a speaker's motivations for expressing themselves in a given way, which is far from something that can be modeled with a computer.

Applied Computational Linguistics

This is the branch of computational linguistics with a clearly technological orientation, which is why it is often referred to as language engineering or human language technology. It focuses on the practical results that can be obtained by simulating linguistic behavior by computer.

Its objective is to create software products that incorporate some component involving language, whether spoken or written. Among them are:

  • Support for computer users in text processing, for example, correcting typing and spelling errors, grammar checking, and conversion into ideograms in Japanese or Chinese.
  • Automatic search in text passages (intelligent information search) by meaning and not only by form (information retrieval and search engines).
  • Support for translating texts into another language (computer-assisted translation), as well as machine translation.
  • Processing of spoken language (speech recognition and speech synthesis), for example, in telephone information services or in reading devices for the blind.
  • Tasks ranging from searching large bibliographies to directly answering questions from large databases (information retrieval, data mining, information extraction).
  • Processing of data that is available in linguistic form, for example, indexing literature, creating indexes and subject lists, and producing abstracts and summaries.
  • Support for authors in drafting texts, for example, in finding the right word or the correct terminology.
  • Interaction between users and computers in natural language, so that computers are also accessible to people who lack sufficient knowledge of specific commands (human-machine interfaces).
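The spelling-correction item in the list above can be illustrated with a minimal Python sketch based on Levenshtein edit distance, one common technique among several and not necessarily what any particular product uses. The word list `LEXICON`, the function names, and the `max_distance` threshold are all assumptions made for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_distance=2):
    """Return lexicon entries within max_distance edits, closest first."""
    scored = [(edit_distance(word, entry), entry) for entry in lexicon]
    return [entry for dist, entry in sorted(scored) if dist <= max_distance]


# Hypothetical mini-lexicon; a real checker would use a full dictionary.
LEXICON = ["language", "linguistics", "translation", "computer", "corpus"]

print(suggest("langauge", LEXICON))
```

A practical spell checker would also weigh word frequency and context, since the closest word by edit distance is not always the intended one.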

Problems in Computational Linguistics

Some of the problems that need to be solved are:

  • Determining semantics. The same word form can have different meanings depending on context (compare homonymy). The meaning relevant to the context must be selected. In addition, formalisms are needed to represent the meanings of words.
  • Resolving syntactic ambiguity. In some cases, a sentence can be analyzed and interpreted in several ways. Choosing the correct one sometimes requires semantic information about the speech act and the speaker's intention, or at the very least prior statistical knowledge about the co-occurrence (joint appearance) of words.

For example: «Pedro saw Maria with the binoculars». Here it is not clear whether Pedro saw Maria, who was holding a pair of binoculars, or whether Pedro saw Maria with the help of binoculars.

  • Recognizing the purpose of a linguistic expression (see pragmatics). Some sentences should not be understood literally. For example, the expected response to the question "Can you pass me the salt, please?" is not "Yes" or "No" but the act of passing the salt.
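The syntactic ambiguity in the binoculars sentence can be made concrete with a small sketch. The Python code below uses a CKY-style chart to count how many parse trees a toy grammar assigns to a sentence. The grammar rules, the lexicon, and the function name `count_parses` are assumptions written for this example only; they cover just this sentence and are not a general grammar of English.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # the PP modifies the verb phrase (seeing with binoculars)
    ("NP", "NP", "PP"),  # the PP modifies the noun phrase (Maria with binoculars)
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Toy lexicon mapping each word to its possible categories.
LEXICON = {
    "pedro": ["NP"],
    "maria": ["NP"],
    "saw": ["V"],
    "with": ["P"],
    "the": ["Det"],
    "binoculars": ["N"],
}


def count_parses(words, start="S"):
    """Count the distinct parse trees for words under the toy grammar."""
    n = len(words)
    # chart[(i, j, symbol)] = number of trees for words[i:j] rooted in symbol
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for symbol in LEXICON.get(word.lower(), []):
            chart[(i, i + 1, symbol)] += 1
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length                  # span end
            for k in range(i + 1, j):       # split point
                for parent, left, right in BINARY_RULES:
                    chart[(i, j, parent)] += (
                        chart[(i, k, left)] * chart[(k, j, right)]
                    )
    return chart[(0, n, start)]


print(count_parses("Pedro saw Maria with the binoculars".split()))
```

Under this grammar the sentence receives two analyses, one for each attachment of "with the binoculars". The grammar alone cannot choose between them, which is why the semantic or statistical knowledge mentioned above is needed.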

Whether and how these problems can be solved automatically depends not only on the state of computing technology but also, to a large extent, on the characteristics of the language in question. The aim is certainly to find procedures that are applicable to all languages; however, the details have to be worked out separately for each one. A program for automatic hyphenation that was designed for English cannot be used for German without adaptation, because the hyphenation principles of the two languages are different. Unlike computer science, which deals with computer programming in general, the field of application of computational linguistics therefore lies in the language-specific part of computer programs.

A science is defined not only by a field of application, but also by a theoretical interest. Computers are automata that manipulate symbols according to defined rules. Just like numbers, languages too are symbol systems, admittedly very complex ones. It is therefore natural to design computer programs that simulate, at least in part, the operations that humans perform with the words of a language. In this way, linguistic hypotheses can be tested with the computer. Computational linguistics is, in this sense, a linguistics in which computational simulation is used as a methodological means to deepen our knowledge of human language.

Ultimately, this approach raises a number of psychological and philosophical questions. The computer is a machine; language is something intellectual. How far can one compute with language? Will computers one day think, or does the human intellect itself function as a symbol-processing machine? The fascination of simulating linguistic behavior computationally lies precisely in probing its limits. One motivation for engaging in computational linguistics is to discover whether and how human communication can be processed by computers and, if limits are found, what they are. Are these limits merely practical, or are they theoretical in nature? This knowledge matters greatly for the place we want to give computers in society.

Applications of Computational Linguistics

  • Syntactic analyzer
  • Morphological analyzer
  • Semantic analyzer
  • Aligner
  • Conjugator
  • Language converter
  • Linguistic corpus
  • Spell checker
  • Functional changer
  • Inflection
  • Lemmatizer
  • Indexing engine
  • Ontologies
