Natural Language Processing

Natural language processing, abbreviated NLP, is a field of computer science, artificial intelligence and linguistics that studies the interactions between computers and human language. It deals with the formulation and investigation of computationally efficient mechanisms for communication between people and machines through natural language, that is, the languages of the world. It is not concerned with communication through natural language in the abstract, but with designing communication mechanisms that are computationally efficient, meaning that they can be realized as programs that carry out or simulate communication. Applied models focus not only on language comprehension per se, but also on general aspects of human cognition and memory organization; in such models, natural language serves only as a means to study these phenomena. Until the 1980s, most NLP systems were based on complex sets of handwritten rules. Beginning in the late 1980s, however, there was a revolution in NLP with the introduction of machine learning algorithms for language processing.

History

The history of NLP dates back to 1950, although earlier work exists. In 1950, Alan Turing published Computing Machinery and Intelligence, in which he proposed what is now called the Turing test as a criterion of intelligence. In 1954, the Georgetown experiment involved the machine translation of more than sixty sentences from Russian into English. The authors claimed that machine translation would be a solved problem within three to five years. Actual progress was much slower, and in 1966 the ALPAC report found that research had fallen short of expectations. After that, machine translation research continued only on a smaller scale until the late 1980s, when the first statistical machine translation systems were developed. The shift toward machine learning in that period was due both to the steady increase in computing power described by Moore's law and to the gradual decline in the dominance of Noam Chomsky's linguistic theories (for example, transformational grammar), whose theoretical underpinnings discouraged the kind of corpus linguistics on which the machine learning approach to language processing is based. Some of the first machine learning algorithms used, such as decision trees, produced systems of if-then rules similar to the existing handwritten rules. A summary of the 50-year history of publications in automatic language processing, drawn from the NLP4NLP project, can be found in a two-part publication in Frontiers in Research Metrics and Analytics.

Difficulties in natural language processing

Ambiguity

Natural languages are inherently ambiguous on different levels:

  • At the lexical level, the same word may have several meanings, and the appropriate one must be inferred from the surrounding context or from background knowledge. Much research in natural language processing has studied methods for resolving lexical ambiguity using dictionaries, grammars, knowledge bases and statistical correlations.
  • At the referential level, resolving anaphora and cataphora means determining the linguistic entity, earlier or later in the text, to which they refer.
  • At the structural level, semantics is needed to decide where prepositional phrases attach, since different attachments produce different syntactic trees. For example, in the sentence "He broke the drawing of a nerve attack" (translated literally from a Spanish example), the phrase "of a nerve attack" can attach to "the drawing" (a drawing that depicts a nervous breakdown) or to the verb (he broke the drawing in a fit of nerves).
  • At the pragmatic level, a sentence often does not mean what it literally says. Elements such as irony play an important role in the interpretation of the message.
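
The dictionary-based approach to lexical ambiguity mentioned in the list can be illustrated with a minimal sketch in the spirit of the simplified Lesk method: pick the sense whose dictionary gloss shares the most words with the sentence. The sense inventory, glosses and stopword list below are invented for illustration and do not come from a real dictionary.

```python
# Toy sense inventory (assumption: glosses are made up for this example).
SENSES = {
    "bank": {
        "financial": "an institution that accepts deposits and lends money",
        "river": "the sloping land beside a river or stream of water",
    }
}

# Small hand-picked stopword list (assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that",
             "to", "in", "on", "at", "by", "she", "he", "we"}


def tokenize(text):
    """Lowercase, strip basic punctuation, and drop stopwords."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - STOPWORDS


def disambiguate(word, sentence):
    """Return the sense whose gloss overlaps most with the sentence.

    Ties are broken in favour of the sense listed first.
    """
    context = tokenize(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bank", "She deposits money at the bank"))
print(disambiguate("bank", "We walked by the river bank"))
```

Word overlap is a crude signal; practical systems typically combine it with statistical evidence or learned representations.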

Resolving these and other ambiguities is the central problem in NLP: natural language input must be translated into an unambiguous internal representation, such as a parse tree.
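
Structural ambiguity can be made concrete by counting how many parse trees a grammar assigns to a sentence. The sketch below is a CKY-style chart that counts trees for a tiny grammar in Chomsky normal form; the grammar, the lexicon and the English example sentence are assumptions chosen to show prepositional phrase attachment, not part of any real parser.

```python
from collections import defaultdict

# Toy lexicon and binary rules (assumption: hand-written for this example).
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]


def count_parses(words, start="S"):
    """Count the distinct parse trees rooted in `start` that cover `words`."""
    n = len(words)
    # chart[i][j][A] = number of trees with root A spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[i][i + 1][tag] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


print(count_parses("I saw the man with the telescope".split()))
```

A count greater than one means the grammar alone cannot choose a reading, so semantic or statistical information is needed to select one tree.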

Word boundary detection

In spoken language, there are usually no pauses between words. Where the boundaries between words fall often depends on which segmentation is most likely to make sense both grammatically and in context. In writing, languages such as Mandarin Chinese likewise do not mark word boundaries with spaces.
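
One common way to frame the choice of the most likely segmentation is dynamic programming over word probabilities. The following is a minimal sketch, assuming a unigram model in which each word has an independent probability; the vocabulary and the probabilities are invented for illustration.

```python
import math

# Hypothetical unigram probabilities (assumption: made-up values).
WORD_PROB = {
    "the": 0.05, "them": 0.01, "me": 0.02, "men": 0.01,
    "en": 0.0001, "ate": 0.01, "a": 0.04, "at": 0.02,
}


def segment(text, max_len=10):
    """Split `text` into the most probable word sequence, or None if impossible."""
    n = len(text)
    # best[i] = (log-probability of the best split of text[:i], start of its last word)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in WORD_PROB and best[start][0] > -math.inf:
                score = best[start][0] + math.log(WORD_PROB[word])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None
    # Walk the back-pointers from the end of the text to recover the words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


print(segment("themenate"))
```

Here "the men ate" beats "them en ate" because "en" is given a very low probability. A unigram model ignores context, so real systems usually add information about neighbouring words.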

Imperfect reception of data

Input may be degraded by foreign accents, regionalisms or difficulties in speech production, by typing errors or ungrammatical expressions, and by errors introduced when texts are read through OCR.
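
For typing errors in particular, a simple and widely used idea is to map an unknown word to the closest known word by edit distance. The sketch below assumes a tiny hand-picked vocabulary and a distance threshold of 2; both are illustrative choices, not recommendations.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical vocabulary (assumption: chosen for this example only).
VOCABULARY = ["language", "processing", "machine", "translation"]


def correct(word, max_distance=2):
    """Return the nearest vocabulary word, or `word` itself if none is close enough."""
    best = min(VOCABULARY, key=lambda v: edit_distance(word, v))
    return best if edit_distance(word, best) <= max_distance else word


print(correct("langauge"))
print(correct("procesing"))
```

The threshold keeps words that are far from everything in the vocabulary unchanged instead of forcing them onto an unrelated entry.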

Components

  • Morphological analysis. The analysis of words to extract roots, inflectional features, compound lexical units and other phenomena.
  • Syntactic analysis. The analysis of the syntactic structure of the sentence according to a grammar of the language in question.
  • Semantic analysis. The extraction of the meaning of the sentence, and the resolution of lexical and structural ambiguities.
  • Pragmatic analysis. The analysis of the text beyond the limits of the sentence, for example, to determine the antecedents of pronouns.
  • Sentence planning. The structuring of each sentence of the text so that it expresses the intended meaning.
  • Sentence generation. The generation of the linear sequence of words from the general structure of the sentence, with the corresponding inflections, agreement and other syntactic and morphological phenomena.
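
As an illustration of the first component in the list, the sketch below performs a very naive morphological analysis of English words by stripping a few suffixes and reporting the inflectional feature each one suggests. The suffix table and the minimum stem length are assumptions made for this example; a realistic analyzer needs a lexicon and rules for spelling changes and irregular forms.

```python
# Toy suffix table (assumption: three English suffixes, checked in this order).
SUFFIX_RULES = [
    ("ing", {"form": "gerund or participle"}),
    ("ed", {"tense": "past"}),
    ("s", {"number": "plural or third person singular"}),
]


def analyze(word, min_stem=3):
    """Return a stem, the stripped suffix, and the feature the suffix suggests."""
    for suffix, features in SUFFIX_RULES:
        stem = word[: -len(suffix)]
        # Require a minimum stem length so short words like "is" are left intact.
        if word.endswith(suffix) and len(stem) >= min_stem:
            return {"stem": stem, "suffix": suffix, **features}
    return {"stem": word, "suffix": ""}


print(analyze("walked"))
print(analyze("books"))
print(analyze("is"))
```

The sketch gets forms such as "running" wrong (it yields the stem "runn"), which is the kind of case that motivates dictionary-backed analysis.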

Natural Language Processing and Natural Language Understanding

Within NLP it is possible to identify a subfield that specializes in semantic and pragmatic relationships, called natural language understanding (NLU). NLU groups together the areas of automatic summarization, paraphrasing, sentiment analysis and question answering. Its main application is chatbots, or conversational bots.

Applications

The main tasks in NLP are:

  • Speech (voice) synthesis
  • Language analysis
  • Language understanding
  • Speech recognition
  • Natural language generation
  • Machine translation
  • Question answering
  • Information retrieval
  • Information extraction
