Database
A database is responsible not only for storing data but also for connecting the data together in a logical unit. In general terms, a database is a set of structured data that belongs to the same context and, in terms of its function, is used to manage large amounts of information electronically. In this sense, a library can be considered a database made up mostly of documents and texts printed on paper and indexed for consultation. Today, thanks to technological development in fields such as information technology and electronics, most databases are in digital format, and a wide range of solutions to the data storage problem has been developed.
There are programs called database management systems (DBMS), which allow data to be stored and later accessed in a fast and structured way. The properties of these DBMS, as well as their use and administration, are studied within the field of computing.
The most common applications are in the management of companies and public institutions; they are also widely used in scientific settings to store experimental information.
Although databases can contain many types of data, some types are protected by the laws of various countries. For example, in Spain, personal data is protected by the Organic Law for the Protection of Personal Data (LOPD), in Mexico by the Federal Law on Transparency and Access to Public Government Information, and in Argentina by the Law for the Protection of Personal Data.
In Argentina, the Penal Code penalizes certain behaviors related to databases: illegally accessing a personal data bank; providing or revealing information registered in a file or personal data bank that one is obliged by law to keep secret; and inserting, or causing to be inserted, data into a personal data file. If the perpetrator is a public official, a penalty of special disqualification also applies.
Database Classification
Databases can be classified in various ways, according to the context being handled, their usefulness, or the needs they satisfy.
Depending on the variability of the database
Static databases
They are read-only databases, used primarily to store historical data that can later be used to study the behavior of a data set over time, make projections, make decisions, and perform data analysis for business intelligence.
Dynamic databases
They are databases where the stored information is modified over time, allowing operations such as updating, deleting and editing data, in addition to the fundamental query operations. An example may be the database used in a supermarket information system.
According to the content
Bibliographic databases
They contain only a surrogate (representative) of the primary source, which allows it to be located. A typical bibliographic database record contains information about the author, date of publication, publisher, title, edition, etc. of a given publication. It can contain a summary or extract of the original publication, but never the full text; otherwise it would be a full-text database (or primary source database, see below).
Numeric databases
As the name indicates, their content is figures or numbers, for example a collection of laboratory analysis results.
Full-text databases
Store primary sources, such as the entire content of all issues of a collection of scientific journals.
Directories
An example is telephone directories in electronic format.
These directories fall into two broad types, depending on whether they are personal or business directories (called white pages and yellow pages respectively).
There are three types of business directories:
- Those that contain the company name and address.
- Those that contain a telephone number and, in the most advanced cases, an email address.
- Those that contain data such as turnover or number of employees, in addition to national codes that help to distinguish companies.
There is only one type of personal directory, since laws such as the LOPD in Spain protect the privacy of the users listed in the directory.
Reverse search (finding the owner of a line from a phone number) is prohibited in personal directories.
Databases or "libraries" of chemical or biological information
They are databases that store different types of information from chemistry, the life sciences or medicine. They can be divided into several subtypes:
- Those that store nucleotide or protein sequences.
- Metabolic pathway databases.
- Structure databases, which include experimental data records on 3D biomolecule structures.
- Clinical databases.
- Bibliographic databases (biological, chemical, medical and other fields): PubChem, Medline, EBSCOhost.
Database Management System (DBMS)
Databases require software that allows them to be administered. These specialized programs serve as an interface so that users can manage how the collected information is structured and optimized. A database management system also supports a large number of administrative operations, such as performance monitoring, tuning, backups and data restoration.
Among the best known database managers or DBMS are Microsoft SQL Server, MySQL, Oracle Database, Microsoft Access, FileMaker and dBASE.
Differences between databases and spreadsheets
Databases and spreadsheets (e.g. the spreadsheets in office suites) are convenient ways to store information. The main differences between the two are:
- The way to manipulate and store the information.
- The amount of data that can be stored.
- Accessibility to these stored data.
Spreadsheets from their inception were designed for one user, and that can be seen in their features. They are great for one or a small number of users who do not need to deal with a large volume of complex data. Databases, on the other hand, were created to store large amounts of organized information, sometimes huge amounts. Databases allow multi-user queries, which allow many users to quickly and securely access and query data at the same time, using highly complex logic and language.
Database models
In addition to classifying the databases by function, they can also be classified according to their data management model.
A data model is basically a "description" of something known as a data container (something in which data is stored), together with the methods for storing and retrieving data from those containers. Data models are not physical things: they are abstractions that allow the implementation of an efficient database system; they usually refer to algorithms and mathematical concepts.
Some models frequently used in databases:
Hierarchical databases
In this model the data is organized in the form of an inverted tree (some say root), where a parent node of information can have several children. The node that does not have parents is called root, and the nodes that do not have children are known as leaves.
Hierarchical databases are especially useful in the case of applications that handle a large volume of information and highly shared data, allowing the creation of stable and high-performance structures.
One of the main limitations of this model is its inability to efficiently represent data redundancy.
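The tree structure described above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular hierarchical DBMS stores its records; the `Node` class, the helper functions and the sample names are all assumptions made for the example.

```python
class Node:
    """One record in a hierarchical model: at most one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def leaves(node):
    """Return the names of all nodes under `node` that have no children."""
    if not node.children:
        return [node.name]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def path_to_root(node):
    """Follow the single parent link upward until the root is reached."""
    names = []
    current = node
    while current is not None:
        names.append(current.name)
        current = current.parent
    return names


# Hypothetical sample data: the root has no parent, the employees are leaves.
root = Node("company")
sales = root.add_child("sales")
engineering = root.add_child("engineering")
alice = sales.add_child("alice")
bob = engineering.add_child("bob")
```

Because every node has a single parent, a record that logically belongs under two branches would have to be stored twice, which is where the redundancy limitation comes from.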
Network database
This model differs slightly from the hierarchical one; the fundamental difference lies in the concept of node: the same node is allowed to have several parents (which the hierarchical model does not allow).
It was a great improvement over the hierarchical model, as it offered an efficient solution to the problem of data redundancy; but, even so, the difficulty of managing the information in a network database has meant that it is a model used mostly by programmers rather than by end users.
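As a rough illustration of a node with several parents, the sketch below stores each record once and keeps the links separately. The record names and the `links` list are hypothetical, and real network databases use their own set and owner constructs rather than Python dictionaries.

```python
from collections import defaultdict

# Hypothetical links: each pair connects a parent record to a child record.
links = [
    ("supplier_a", "part_bolt"),
    ("supplier_b", "part_bolt"),  # the same part now has a second parent
    ("supplier_b", "part_nut"),
]

parents = defaultdict(set)   # child  -> set of parents
children = defaultdict(set)  # parent -> set of children
for parent, child in links:
    parents[child].add(parent)
    children[parent].add(child)
```

Here `part_bolt` is stored once and linked to both suppliers, instead of being duplicated under each of them as a strict tree would require.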
Transactional Databases
They are databases whose sole purpose is to send and receive data at high speed. These databases are very rare and are generally aimed at quality analysis, production and industrial data environments. Because their sole purpose is to collect and retrieve data as quickly as possible, redundancy and duplication of information are not a problem as they are with other databases. In general, to take full advantage of them, they allow some type of connectivity to relational databases.
A common example of a transaction is the transfer of an amount of money between bank accounts. Normally it is done through two different operations, one in which the balance of the origin account is debited and another in which the balance of the destination account is credited. To guarantee the consistency of the system (that is, so that money does not appear or disappear), the two operations must be atomic; that is, the system must guarantee that, under any circumstance (even a system crash), the final result is that either both operations have been carried out or neither has.
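The bank transfer can be sketched with Python's built-in `sqlite3` module. The `account` table, the account names and the `transfer` helper are assumptions made for this example; the point is only that both updates are committed together or rolled back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("origin", 100), ("destination", 50)],
)
conn.commit()


def transfer(conn, source, target, amount):
    """Credit `target` and debit `source` as one atomic unit; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            # If this debit violates the CHECK constraint, the credit above is undone too.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dictionary keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

Calling `transfer(conn, "origin", "destination", 500)` fails on the debit, and the credit that had already been applied is rolled back, so no money appears or disappears.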
Relational Databases
This is the model most commonly used today to represent real problems and manage data dynamically. After its foundations were postulated in 1970 by Edgar Frank Codd, of the IBM laboratories in San Jose, California, it did not take long to establish itself as a new paradigm in database models. Its fundamental idea is the use of "relations". These relations can logically be thought of as sets of data called "tuples". Although this is the theory of relational databases created by Codd, most of the time it is conceptualized in a way that is easier to imagine: each relation is thought of as a table composed of records (the rows of the table), which represent the tuples, and fields (the columns of the table).
In this model, where and how the data is stored is irrelevant (unlike other models such as hierarchical and network models). This has the considerable advantage that it is easier to understand and use for a casual user of the database. Information can be retrieved or stored by "queries" that offer a wide flexibility and power to manage the information.
The most common language for building relational database queries is SQL (Structured Query Language), a standard implemented by the main relational database engines or management systems.
During its design, a relational database goes through a process known as database normalization.
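As a small illustration of relations as tables, and of a query over a normalized design, the sketch below uses Python's built-in `sqlite3` module. The `author` and `book` tables and their contents are hypothetical; author names are stored once and referenced by key rather than repeated in every book row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE book (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(author_id)
    );
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO book VALUES
        (10, 'First title', 1),
        (11, 'Second title', 2),
        (12, 'Third title', 1);
""")

# Each row returned is a tuple: (author name, number of books).
rows = conn.execute("""
    SELECT author.name, COUNT(*) AS books
    FROM book
    JOIN author ON author.author_id = book.author_id
    GROUP BY author.name
    ORDER BY author.name
""").fetchall()
```

The query states what is wanted (books per author) without saying where or how the rows are stored, which is the independence the relational model aims for.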
Multidimensional Databases
They are databases designed for very specific applications, such as the creation of OLAP cubes. They do not differ much from relational databases (a table in a relational database could also be a table in a multidimensional database); the difference is more conceptual. In multidimensional databases, the fields or attributes of a table can be of two types: they represent either dimensions of the table or metrics to be studied.
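The distinction between dimensions and metrics can be sketched as a simple aggregation. The `sales` rows and the `roll_up` helper are assumptions made for illustration and do not reflect how an OLAP engine is actually implemented.

```python
from collections import defaultdict

# Hypothetical fact rows: "region" and "quarter" are dimensions, "amount" is the metric.
sales = [
    {"region": "north", "quarter": "Q1", "amount": 100},
    {"region": "north", "quarter": "Q2", "amount": 150},
    {"region": "south", "quarter": "Q1", "amount": 80},
    {"region": "south", "quarter": "Q2", "amount": 120},
]


def roll_up(rows, dimensions, metric):
    """Sum `metric` for every combination of values of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[metric]
    return dict(totals)


by_region = roll_up(sales, ["region"], "amount")
by_cell = roll_up(sales, ["region", "quarter"], "amount")
```

Choosing fewer dimensions gives a coarser summary of the same metric; choosing all of them gives one value per cell of the cube.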
Object-oriented databases
This fairly recent model, typical of object-oriented computer models, attempts to store complete objects (state and behavior) in the database.
An object-oriented database is a database that incorporates all the important concepts of the object paradigm:
- Encapsulation - Property that allows information to be hidden from other objects, thus preventing incorrect access or conflicts.
- Inheritance - Property through which objects inherit behavior within a class hierarchy.
- Polymorphism - Property of an operation through which it can be applied to different types of objects.
In object-oriented databases, users can define operations on the data as part of the database definition. An operation (called a function) is specified in two parts. The interface (or signature) of an operation includes the name of the operation and the data types of its arguments (or parameters). The implementation (or method) of the operation is specified separately and can be changed without affecting the interface. User application programs can operate on the data by invoking those operations through their names and arguments, regardless of how they have been implemented. This could be called independence between programs and operations.
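The separation between the interface of an operation and its implementation can be illustrated in plain Python. The `Shape`, `Rectangle` and `Square` classes are hypothetical and say nothing about how an object-oriented DBMS persists objects; they only show callers depending on an operation's name and arguments rather than on its implementation.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class Shape(ABC):
    """Interface (signature): the operation's name and types, with no implementation."""

    @abstractmethod
    def area(self) -> float:
        ...


class Rectangle(Shape):
    def __init__(self, width: float, height: float) -> None:
        # Encapsulation: state is kept behind the object's operations.
        self._width = width
        self._height = height

    def area(self) -> float:
        # Implementation (method): can change without affecting callers.
        return self._width * self._height


class Square(Rectangle):
    # Inheritance: Square reuses Rectangle's implementation of area().
    def __init__(self, side: float) -> None:
        super().__init__(side, side)


def total_area(shapes: Iterable[Shape]) -> float:
    # Polymorphism: the same operation is applied to different types of objects.
    return sum(shape.area() for shape in shapes)
```

`total_area` only uses the name `area` and its signature, so the body of `Rectangle.area` could be rewritten without touching it.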
SQL:2003, an extension of the SQL-92 standard, supports object-oriented concepts and maintains compatibility with SQL-92.
Documentary databases
They allow full-text indexing and, in general, more powerful searches; they are used to store large volumes of historical background information. A thesaurus is an index system optimized for this type of database.
Deductive databases
A deductive database system is a database system that allows deductions to be made through inference. It is based mainly on rules and facts that are stored in the database. Deductive databases are also called logical databases, since they are based on mathematical logic. This type of database arose because of the limitations of relational databases in answering recursive queries and deducing indirect relationships from the stored data.
Language
It uses a subset of the Prolog language called Datalog which is declarative and allows the computer to make deductions to answer queries based on the stored facts and rules.
Advantages
- Use of logical rules to express queries.
- It can answer recursive queries.
- It supports stratified negation.
- Ability to obtain new information by inference from what is already stored in the database.
- Use of algorithms that optimize queries.
- It supports complex objects and sets.
- data security and integrity
- quickly unserviceable data (duplicated, unnecessary extra data)
- ease in maintenance
Phases
- Interrogation phase: responsible for searching the database for implicit, deducible information. The rules of this phase are called derivation rules.
- Modification phase: responsible for adding new deducible information to the database. The rules of this phase are called generation rules.
Interpretation
There are two theories of interpretation of deductive databases, in which rules and facts are regarded as axioms. Facts are base axioms that are taken to be true and do not contain variables. Rules are deductive axioms, since they are used to deduce new facts.
- Model theory: an interpretation is called a model when, for a specific set of rules, the rules always hold under that interpretation. It consists of assigning to a predicate all the combinations of values and arguments from a given constant domain of values, and then verifying whether the predicate is true or false.
Mechanisms
There are two inference mechanisms:
- Ascending (bottom-up): starts from the facts and obtains new ones by applying inference rules.
- Descending (top-down): starts from the predicate (the goal of the query being made) and tries to find matches between variables that lead to correct facts stored in the database.
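The ascending mechanism can be sketched as a loop that keeps applying rules to known facts until nothing new is derived. The example below evaluates a recursive "ancestor" rule over hypothetical `parent` facts; it is a naive illustration in Python, not a Datalog engine, and the rule is written in Datalog-style notation only in the docstring.

```python
# Facts: (x, y) means "x is a parent of y". The names are hypothetical.
parent = {("ana", "ben"), ("ben", "cara"), ("cara", "dan")}


def ancestors(parent_facts):
    """Bottom-up evaluation of the rules:

        ancestor(X, Y) :- parent(X, Y).
        ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    """
    derived = set(parent_facts)  # first rule: every parent is an ancestor
    while True:
        # Second rule: join parent(X, Y) with the ancestor(Y, Z) facts known so far.
        new = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in derived
            if y == y2
        }
        if new <= derived:  # fixed point reached: no new facts can be deduced
            return derived
        derived |= new


ancestor = ancestors(parent)
```

The fact that `ana` is an ancestor of `dan` is never stored; it is deduced by applying the recursive rule repeatedly, which is the kind of query the text says plain relational systems handle poorly.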
Distributed Database Management System (DDBMS)
The database and the DBMS software may be distributed across multiple sites connected by a network. There are two types:
1. Homogeneous distributed: the same DBMS is used at multiple sites.
2. Heterogeneous distributed: gives rise to federated DBMS or multi-database systems, in which the participating DBMS have a certain degree of local autonomy and have access to several pre-existing autonomous databases stored in the DBMS. Many of these employ a client-server architecture.
These systems arise because organizations are physically decentralized. Distribution gives them the ability to join the databases of each location and thus access, for example, different universities or store branches.
Graph Oriented Database
A graph-oriented database (BDOG) represents information as the nodes of a graph and relationships as its edges, so that graph theory can be used to traverse the database, since it can describe attributes of both nodes (entities) and edges (relationships).
A BDOG must be fully normalized: each table has only one column and each relationship only two, so that any change in the information structure has only a local effect.
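A minimal sketch of nodes, edges and a graph traversal follows. The node names, the relationship labels and the helper functions are all hypothetical; a real graph database would provide its own storage and query language.

```python
from collections import deque

# Nodes (entities) carry attributes; edges are (source, relationship, target) triples.
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
    "acme": {"kind": "company"},
}
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("carol", "WORKS_AT", "acme"),
]


def neighbors(node, relationship=None):
    """Targets of the outgoing edges of `node`, optionally filtered by relationship."""
    return [
        target
        for (source, rel, target) in edges
        if source == node and (relationship is None or rel == relationship)
    ]


def shortest_path(start, goal):
    """Breadth-first search along directed edges; returns a list of nodes or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Adding a new kind of relationship only means adding more triples to `edges`; existing nodes and edges are untouched, which mirrors the local effect of structural changes mentioned above.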
Other database types
Today there are many types of databases, some of them less common and adapted to financial, scientific and other highly specific functions, depending on how technology advances. Some of them include:
- Cloud databases. A cloud database is a collection of data, structured or unstructured, located on a private, public or hybrid (a combination of both) cloud computing platform. There are two cloud database models: traditional and database as a service (DBaaS). With DBaaS, administrative tasks and maintenance are performed by a service provider.
- Open source. An open source database system is one whose source code is open; it can be an SQL or a NoSQL database.
- Document/JSON databases. Designed to manage document-based information, document databases are a modern way of storing data in JSON format rather than in rows and columns.
- Multimodel databases. They combine different types of database models into a single integrated back end, so that several types of data can live in the same database.
- Autonomous databases. Autonomous databases (also called independent databases) are cloud-based and use machine learning to automate tuning, security, backups, updates and other routine management tasks traditionally performed by database administrators.
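To make the document/JSON entry in the list above more concrete, the sketch below keeps JSON documents in an in-memory list and filters them by field. The `collection` list and the `insert` and `find` helpers are assumptions made for illustration and are not the API of any real document database.

```python
import json

collection = []  # hypothetical in-memory "collection" of JSON documents


def insert(document_json):
    """Parse a JSON string and store the resulting document."""
    collection.append(json.loads(document_json))


def find(**criteria):
    """Return the documents whose top-level fields match all the given values."""
    return [
        doc
        for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Documents in the same collection need not share the same fields.
insert('{"name": "Ana", "city": "Madrid", "phones": ["555-0100", "555-0101"]}')
insert('{"name": "Luis", "city": "Lima"}')
insert('{"name": "Eva", "city": "Madrid", "address": {"street": "Mayor", "number": 1}}')
```

Each document carries its own structure, including nested lists and objects, instead of being split across rows and columns.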
Database query
A query is the method of accessing information in a database. Queries can modify, delete, display and add data in a database; they can also be used as a record source for forms. A query language is used for this purpose.
Queries are made through a data manipulation language; the most widely used database query language is SQL.
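The four kinds of operation mentioned above (add, modify, delete and display) can be sketched in SQL using Python's built-in `sqlite3` module. The `product` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Add data
conn.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [("bread", 1.5), ("milk", 0.9), ("eggs", 2.0)],
)

# Modify data
conn.execute("UPDATE product SET price = ? WHERE name = ?", (1.0, "milk"))

# Delete data
conn.execute("DELETE FROM product WHERE name = ?", ("eggs",))
conn.commit()

# Display data
remaining = conn.execute(
    "SELECT name, price FROM product ORDER BY name"
).fetchall()
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid building queries by string concatenation.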
Research
Database technology has been an active research topic since the 1960s, both in academia and in industry research and development groups (for example, IBM Research). Research activities include theory and prototype development. Notable research topics have included data models, the concept of an atomic transaction, concurrency control techniques, query languages and query optimization methods, RAID, and more. The database research area has several dedicated academic journals (e.g., ACM Transactions on Database Systems, Data and Knowledge Engineering-DKE) and annual conferences (e.g., ACM SIGMOD, ACM PODS, VLDB, IEEE ICDE).