Computer cluster

A cluster at McGill University.
An example of a cluster at NASA (USA).

The term cluster (meaning 'group' or 'bunch') is applied to distributed computing systems made up of computers typically linked by a high-speed network and behaving as if they were a single server. Unlike grid computing, in a computer cluster each node performs the same task, controlled and scheduled by software.

Clustering technology has evolved to support activities ranging from supercomputing and mission-critical software, through web servers and e-commerce, to high-performance databases, among other uses.

Clustered computing arises from the convergence of several current trends: the availability of high-performance, inexpensive microprocessors and high-speed networks, the development of software tools for high-performance distributed computing, and the growing need for computational power in applications that require it.

Simply, a cluster is a group of computers linked by a high-speed network, in such a way that the whole is seen as a single computer.

History

The origin of the term and use of this type of technology is unknown but can be considered to have started in the late 1950s and early 1960s.

The formal engineering basis of cluster computing is generally credited to Gene Amdahl of IBM, who in 1967 published what has come to be regarded as the seminal paper on parallel processing: Amdahl's law, which mathematically describes the speedup that can be expected when an otherwise serial series of tasks is parallelized on a parallel architecture.
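
In its usual form, with p the fraction of the work that can be parallelized and N the number of processors, the law can be written as

    S(N) = 1 / ((1 − p) + p/N)

so that, as N grows without bound, the speedup is limited by 1/(1 − p).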

That article defined the engineering basis for both multiprocessor computing and cluster computing, where the main difference is whether the communication between processors takes place within a local network or over a wide area network.

Consequently, the history of the first clusters of computers is more or less directly tied to the history of the first steps of communication networks, as a necessity to link computing resources and create a cluster of computers. Packet switching networks were conceptually invented by the RAND corporation in 1962.

Using the concept of a packet-switched network, the ARPANET project succeeded in creating, in 1969, what was possibly the first computer cluster network, linking four computing centers (each of which was somewhat similar to a local networked cluster, but not to a cluster over a wide area network as it is understood today).

The ARPANET project grew into what is now the Internet. It can be considered as "the mother of all clusters" (as the union of almost all computing resources, including clusters, that would become connected).

It also established the paradigm for the use of computer clusters in today's world: the use of packet-switched networks to carry out communications between processors located in otherwise disconnected frames.

The development of clusters built by customers and research groups proceeded in tandem with that of networks and the Unix operating system from the early 1970s, as both TCP/IP and the Xerox PARC project created and formalized protocols for network-based communications.

An operating system kernel was built for a cluster of DEC PDP-11 minicomputers called C.mmp at Carnegie Mellon University (CMU) in 1971.

However, it was not until around 1983 that the protocols and tools for remote work that facilitate file distribution and sharing were defined (largely within the context of BSD Unix, as implemented by Sun Microsystems) and thus became commercially available, along with a shared file system.

The first commercial cluster-type product was ARCnet, developed in 1977 by Datapoint, but it was not a commercial success, and clustering did not take off until DEC released VAXcluster for the VAX/VMS operating system in 1984.

ARCnet and VAXcluster were products that not only supported parallel computing but also shared file systems and peripheral devices.

The idea was to provide the advantages of parallel processing while maintaining data reliability and uniqueness. VAXcluster, now VMScluster, is still available on HP OpenVMS systems running on Itanium and Alpha hardware.

Two other notable early commercial clusters were the Tandem Himalaya (circa 1994, a high-availability product) and the IBM S/390 Parallel Sysplex (also circa 1994, aimed primarily at enterprise use).

The history of computer clusters would be incomplete without noting the fundamental role played by the development of the PVM (Parallel Virtual Machine) software.

This open source software, based on TCP/IP communications, allowed the creation of a virtual supercomputer ―an HPC cluster― built from any collection of TCP/IP-connected systems.

Freeform heterogeneous clusters built on this model quickly achieved total throughput in FLOPS that far exceeded what was available from even the most expensive supercomputers.

PVM and the use of low-cost PCs and networks led, in 1993, to a NASA project to build cluster-based supercomputers.

In 1995 came the invention of the Beowulf cluster, a style of computing farm designed around a commodity network with the specific goal of "being a supercomputer" capable of robustly performing parallel HPC computations.

This stimulated the independent development of grid computing as an entity, even though the grid style revolved around the Unix operating system and the Arpanet.

Features

Clusters are usually employed to improve performance or availability beyond that provided by a single computer, while typically being more economical than individual computers of comparable speed and availability.

A cluster is expected to present combinations of the following services:

  1. High performance
  2. High availability
  3. Load balancing
  4. Scalability

Building cluster computers is easier and cheaper because of their flexibility: the nodes can all have the same hardware configuration and operating system (homogeneous cluster), similar architectures and operating systems but different performance (semi-homogeneous cluster), or different hardware and operating systems (heterogeneous cluster).

For a cluster to function as such, it is not enough just to connect the computers together: a cluster management system is needed, in charge of interacting with the user and with the processes running on it in order to optimize operation.

Benefits of cluster technology

Scalable parallel applications require: good performance, low latency, high-bandwidth communications, scalable networks, and fast file access. A cluster can satisfy these requirements by using the resources that are associated with it.

Cluster technology allows organizations to increase their processing capacity using standard technology: hardware and software components that can be purchased at relatively low cost.

Classification of clusters

The term cluster has different connotations for different groups of people. The types of clusters, established according to their use and the services they offer, determine the meaning of the term for the group that uses it. Clusters can be classified according to their characteristics:

  • HPCC (High Performance Computing Clusters: high-performance clusters).
  • HA or HACC (High Availability Computing Clusters: high-availability clusters).
  • HT or HTCC (High Throughput Computing Clusters: high-efficiency clusters).

High performance: These are clusters on which tasks requiring large computational capacity, large amounts of memory, or both are executed. Carrying out these tasks may tie up cluster resources for long periods of time. This type is in turn sub-classified into:

  • Active/active
  • Active/passive

High availability: These are clusters whose design objective is to provide availability and reliability. These clusters try to provide maximum availability of the services they offer. Reliability is provided by software that detects failures and allows recovering from them, while hardware avoids having a single point of failure. In the same way, it is sub-classified:

  • High availability of infrastructure.
  • High availability of application.

High efficiency: These are clusters whose design objective is to execute the largest number of tasks in the shortest possible time. There is data independence between individual tasks. The delay between cluster nodes is not considered a big problem.

Clusters can also be classified into:

  • commercial IT clusters (high availability and high efficiency) and
  • scientific clusters (high performance).

Despite discrepancies at the level of application requirements, many of the characteristics of the hardware and software architectures, which underlie the applications in all of these clusters, are the same. Furthermore, a cluster of a certain type can also present characteristics of the others.

Components of a cluster

In general, a cluster requires various software and hardware components to function:

  • nodes
  • storage
  • operating systems
  • network connections
  • middleware
  • communication and services protocols
  • applications
  • parallel programming environments

Nodes

They can be simple computers, multiprocessor systems or workstations. In computing, in a very general sense, a node is a point of intersection or union of several elements that converge in the same place. Within computing, the word node can refer to different concepts depending on the area:

  • In computer networks each of the machines is a node, and if the network is the Internet, each server is also a node.
  • In dynamic data structures, a node is a record that contains a piece of data of interest and at least one pointer to reference another node. If the node has only one pointer, the only structure that can be built with it is a list; if the node has more than one pointer, more complex structures such as trees or graphs can be built. (A minimal sketch follows this list.)
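
As an illustration of this second meaning, the following minimal Python sketch (the names are illustrative, not taken from any particular library) shows a node with a single pointer, which can only form a list:

    class Node:
        """A record holding a value of interest and one pointer to the next node."""
        def __init__(self, value, next_node=None):
            self.value = value
            self.next = next_node  # a single pointer only allows building a list

    # Building and walking a short list: 1 -> 2 -> 3
    head = Node(1, Node(2, Node(3)))
    while head is not None:
        print(head.value)
        head = head.next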

Nodes can be contained and interconnected in a single cabinet, or, as in many cases, linked via a Local Area Network (LAN).

The cluster nodes can have one or more processors; these machines need a cool environment in order to work properly and avoid overheating.

The cluster can be made up of dedicated nodes or non-dedicated nodes.

In a cluster with dedicated nodes, the nodes do not have a keyboard, mouse or monitor and are used exclusively for cluster-related tasks. In a cluster with non-dedicated nodes, the nodes do have a keyboard, mouse and monitor, and their use is not exclusively dedicated to cluster tasks: the cluster makes use of the clock cycles that the computer's user is not using for their own work.

It should be clarified that when designing a cluster the nodes must have similar characteristics, that is, a certain similarity in architecture and operating system. If a cluster is formed with totally heterogeneous nodes (large differences in processor capacity, memory or hard disk), it will be inefficient, because the middleware will assign all the processes to the node with the highest computing capacity and will only distribute work once that node is saturated with processes; it is therefore advisable to build the cluster from computers that are as similar as possible.

Storage

Storage can consist of a NAS, a SAN, or internal storage on the server. The most commonly used protocol is NFS (Network File System), a file system shared between the server and the nodes. However, there are also cluster-specific file systems such as Lustre (CFS) and PVFS2.

Technologies in hard disk storage support:

  • IDE or ATA: speeds of 33, 66, 100 and 133 MB/s.
  • SATA: speeds of 150, 300 and 600 MB/s.
  • SCSI: speeds of 160, 320 and 640 MB/s. It provides high performance.
  • SAS: combines SATA-II and SCSI. Speeds of 300 and 600 MB/s.
  • Tape drives (DLT) are used for backups because of their low cost.

NAS (Network Attached Storage) is a specific device dedicated to network storage (usually TCP/IP) that makes use of an optimized operating system to provide access through CIFS, NFS, FTP or TFTP protocols.

For its part, DAS (Direct Attached Storage) consists of connecting external SCSI storage units, or a SAN (storage area network), through a Fibre Channel link. These connections are dedicated.

While NAS allows storage sharing, network usage, and is easier to manage, DAS provides higher performance and greater reliability by not sharing the resource.

Operating system

The operating system must be multithreaded and multiuser. Other desirable features are ease of use and access. An operating system is a program or set of programs designed to allow efficient and secure management of a computer's resources. It starts working when the bootloader loads its kernel into memory, and it manages the machine's hardware from the most basic levels while also allowing interaction with the user. It can be found in most electronic devices that use microprocessors, since it is what allows the machine to carry out its functions (mobile phones, DVD players, radios, computers, etc.).

Examples

  • GNU/Linux
    • ABC GNU/Linux
    • OpenMosix
    • Rocks
    • Kerrighed
    • Condor
    • Sun Grid Engine
  • Unix
    • Solaris
    • HP-UX
    • AIX
  • Windows
    • NT
    • 2000 Server
    • 2003 Server
    • 2008 Server
    • 2012 Server
  • Mac OS X
    • Xgrid
  • Solaris
  • FreeBSD

Network Connections

The nodes of a cluster can be connected through a simple Ethernet network with common cards (network adapters or NICs), or use special high-speed technologies such as Fast Ethernet, Gigabit Ethernet, Myrinet, InfiniBand, SCI, etc.

  • Ethernet
    • They are the most widely used networks today due to their relatively low cost. However, their technology limits packet size, they perform excessive error checking, their protocols are not efficient, and their transmission speeds can limit cluster performance. For applications with coarse-grained parallelism they can still be a successful solution.
    • The most commonly used option is Gigabit Ethernet (1 Gbit/s), with the 10 Gigabit Ethernet (10 Gbit/s) solution emerging. The latency of these technologies is around 30 to 100 μs, depending on the communication protocol used.
    • In any case, it is the administration network par excellence, so even if it is not the high-performance network solution for communications, it is the network dedicated to administrative tasks.
  • Myrinet (Myrinet 2000 and Myri-10G).
    • Its latency is 99 to 10 μs, and its bandwidth is 2 to 10 Gbit/s (for Myrinet 2000 and Myri-10G respectively).
    • It is the most widely used low-latency network in clusters and MPPs; it is present in more than half of the TOP500 systems. It has two low-level communication libraries (GM and MX), on top of which MPICH-GM, MPICH-MX, Sockets-GM and Sockets-MX are implemented to take advantage of Myrinet's excellent characteristics. There are also TCP/IP emulations, IPoGM and IPoMX.
  • InfiniBand
    • It is a network that stems from a standard developed specifically for communication in clusters. One of its greatest advantages is that channel aggregation (1x, 4x and 12x) allows very high bandwidths to be obtained. The basic connection is 2 Gbit/s effective, and with quad data rate over a 12x link it reaches 96 Gbit/s. Startup latencies, however, are not very high, sitting at around 10 μs.
    • It defines a connection between a compute node and an I/O node. The connection goes from a Host Channel Adapter (HCA) to a Target Channel Adapter (TCA). It is mainly used to access SAS disk arrays.
  • SCI (scalable coherent interface) IEEE Standard 1596-1992
    • Its theoretical latency is 1.43 μs and its bandwidth is 5333 Mbit/s bidirectional. Since it can be configured with ring (1D), torus (2D) and hypercube (3D) topologies without a switch, it is a suitable network for small and medium-sized clusters.
    • As an extremely low-latency network it has advantages over Myrinet in small clusters, since it has a point-to-point topology and acquiring a switch is not necessary. The software for SCI is less developed than for Myrinet, but the achievable performance is higher, the highlights being SCI Sockets (which achieves startup latencies of 3 μs) and ScaMPI, a high-performance MPI library.
    • In addition, through the preloading mechanism (LD_PRELOAD) it is possible to make all of a system's communications go through SCI Sockets, transparently to the user.

Middleware

Middleware is software that generally sits between the operating system and the applications in order to provide a cluster with the following:

  • A single system access interface, called SSI (Single System Image), which gives the user the sensation of using a single, very powerful computer;
  • Tools for system optimization and maintenance: process migration, checkpoint-restart (freezing one or more processes, moving them to another server and resuming their execution on the new host), load balancing, fault tolerance, etc.;
  • Scalability: it must be able to detect new servers connected to the cluster automatically in order to use them.

There are several types of middleware, such as MOSIX, openMosix, Condor and OpenSSI.

The middleware receives the jobs that arrive at the cluster and redistributes them so that they run faster and no single server is overloaded. It does this according to policies defined in the system (automatically or by an administrator), which tell it where and how to distribute the processes, and with a monitoring system that tracks the load of each CPU and the number of processes on it.
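
As a rough conceptual sketch of such a placement policy (not the actual behaviour of any specific middleware such as MOSIX or Condor; node names and load figures are invented), each incoming job is simply assigned to the node that currently reports the lowest load:

    # Hypothetical monitoring data: node name -> current load
    node_load = {"node01": 0.35, "node02": 0.80, "node03": 0.10}

    def assign_job(job_id, loads):
        """Pick the least-loaded node, mimicking a simple placement policy."""
        target = min(loads, key=loads.get)
        loads[target] += 0.05  # assume each new job adds a small amount of load
        return job_id, target

    for job in ("job-1", "job-2", "job-3"):
        print(assign_job(job, node_load))

A real middleware also weighs memory, process counts and migration costs, but the decision loop follows the same monitor-and-assign pattern described above.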

Middleware should also be able to migrate processes between servers for different purposes:

  • Load balancing: if one server is heavily loaded with processes and another is idle, processes can be transferred to the latter to relieve the first and optimize operation;
  • Server maintenance: if there are processes running on a server that needs maintenance or an upgrade, the processes can be migrated to another server and the first one can then be disconnected from the cluster;
  • Prioritization of jobs: if several processes are running in the cluster but one of them is more important than the others, that process can be migrated to servers with more or better resources in order to speed up its processing.

Parallel Programming Environments

Parallel programming environments allow the implementation of algorithms that make use of shared resources: CPU (central processing unit), memory, data and services.
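
As an example of such an environment, the sketch below uses MPI through the Python mpi4py bindings (assuming an MPI implementation and mpi4py are installed; the file name is illustrative) to split a computation across the processes of a cluster job and combine the partial results:

    # Run with, for example: mpirun -np 4 python sum_squares.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # identifier of this process
    size = comm.Get_size()   # total number of processes in the job

    # Each process sums the squares of its own slice of 0..999999
    local = sum(i * i for i in range(rank, 1_000_000, size))

    # Combine the partial sums on process 0
    total = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print("sum of squares:", total)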

Stages for assembly

For the assembly of a cluster, two stages can be described:

  1. Hardware selection: the computers that will form the cluster are selected; they can be homogeneous (same hardware) or heterogeneous (different hardware).
  2. Software selection: the operating system or middleware to be used by the machines is defined.

Combinations of the two previous ones are possible, such as homogeneity in the hardware and heterogeneity in the software or vice versa.

Implemented cluster systems

Beowulf

In 1994, Donald Becker and Thomas Sterling built the first Beowulf. It was built from 16 personal computers with 100 MHz Intel 486 DX4 processors connected via Ethernet. Its theoretical performance was 3.2 GFlops.

A Beowulf farm.

Berkeley NOW

Berkeley's NOW system consisted of 105 Sun Ultra 170 workstations, connected via a Myrinet network. Each workstation contained a 167 MHz Ultra1 microprocessor, 512 KiB L2 cache, 128 MiB of memory, two 2.3 GB disks, Ethernet and Myrinet network cards. In April 1997, NOW achieved a performance of 10 GFlops.

Google

During 2003, the Google cluster reached more than 1.5 million personal computers. A Google query reads on average hundreds of megabytes of data and consumes a few trillion CPU cycles.[citation needed]

PS2 Cluster

In 2004, at the University of Illinois at Urbana-Champaign (United States), the use of PlayStation 2 (PS2) consoles for scientific computing and high-resolution visualization was explored. A cluster made up of 70 PS2 consoles was built, using the Sony Linux Kit (based on Kondara Linux and Red Hat Linux) and MPI.

System X

Xserve nodes of System X at Virginia Polytechnic Institute and State University

In the November 2004 TOP500 list, System X (System Ten) was ranked the seventh fastest system in the world; by July 2005 it had dropped to fourteenth. System X was built at Virginia Tech in 2003, and its installation was carried out by students of the institute. It is made up of multiple clusters totalling 2,200 Apple G5 processors at 2.3 GHz. It uses two networks: 4x InfiniBand for inter-process communication and Gigabit Ethernet for management. System X has 4 TiB of RAM and 176 TB of hard disk, and its performance is 12.25 TFlops. It is also known as Terascale.

Spanish Supercomputing Network

In 2007 the Spanish Supercomputing Network was created, made up of 7 clusters distributed in different Spanish institutions.

All the clusters (with the exception of the second versions of Magerit and MareNostrum, and the more recent Calendula) are made up of a variable number of nodes with 2.2 GHz PowerPC 970FX processors interconnected by a Myrinet network. The performance of the machines ranges from the almost 65 TeraFLOPS provided by the more than 10,000 cores of MareNostrum, through the almost 16 TeraFLOPS of Magerit (first version) with 2,400 processors, down to the almost 3 TeraFLOPS of the remaining five clusters.

The 2011 upgrade of Magerit kept the cluster architecture because of its versatility, replacing the compute elements with IBM PS702 nodes with 3.0 GHz POWER7 processors and achieving a performance of more than 72 TeraFLOPS, which made it the most powerful supercomputer in Spain. This demonstrates the simplicity and flexibility of the architecture: by upgrading some elements, more powerful systems are obtained without major complications.

Thunder

Thunder was built by Lawrence Livermore National Laboratory (University of California). It is made up of 4,096 Intel Itanium2 Tiger4 processors at 1.4 GHz and uses a network based on Quadrics technology. Its performance is 19.94 TFlops. It was ranked second in the TOP500 in June 2004, fifth in November 2004, and seventh on the July 2005 list.

ASCI Q

ASCI Q was built in 2002 by Los Alamos National Laboratory (United States). It is made up of 8,192 AlphaServer SC45 processors at 1.25 GHz. Its performance is 13.88 TFlops. It was ranked second in the TOP500 during June and November 2003, third in June 2004, sixth in November 2004, and twelfth in July 2005.

Distributed resource management: queue management systems

  • Queue management systems manage a job queue, schedule the execution of tasks and manage resources in order to minimize costs and maximize application performance.
  • Operation (a small scripted example follows this list):
- Users submit jobs with qsub, indicating memory requirements, processor time and disk space.
- The resource manager registers the job.
- As soon as the requested resources become available, the queue manager launches the job that, according to its scheduling, has the highest priority. The queue manager's own scheduler is used in the absence of more advanced schedulers (such as the Maui/Moab cluster suite, which can be integrated into the queue system).
- The status of a job (running, pending or finished) can be consulted with qstat.
- A job can be removed with qdel.
- The queue manager is configured with qconf.
  • Standard output of a job: job.o#job
  • Error output of a job: job.e#job
  • Popular queue management systems: Sun Grid Engine (SGE), PBS, OpenPBS and TORQUE.
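
As an illustration of the workflow above, the following Python sketch drives an SGE-style queue by shelling out to qsub and qstat (the job script name and the resource values are examples only, and the commands must be available on the submit host):

    import subprocess

    # Submit a (hypothetical) job script, requesting memory and run time
    submit = subprocess.run(
        ["qsub", "-l", "h_vmem=2G,h_rt=01:00:00", "job.sh"],
        capture_output=True, text=True, check=True,
    )
    print(submit.stdout.strip())   # the queue manager echoes the assigned job id

    # Query the state of pending and running jobs
    status = subprocess.run(["qstat"], capture_output=True, text=True)
    print(status.stdout)

    # A job could later be removed with: subprocess.run(["qdel", "<job id>"])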

Load Balancers: Linux Virtual Server

  • Linux Virtual Server (LVS, IPVS in 2.6.x kernels) is a highly scalable and highly available network service that performs:
- Load balancing using NAT (Network Address Translation), IP tunneling or direct routing (DR) through a master node that serves FTP and HTTP requests to the nodes of a cluster. This service is provided at the kernel level (support for LVS/IPVS must be compiled in).
  • NAT makes the cluster work with a single public IP, the packets being rewritten by the master node to hide the internal nodes. It is only acceptable for a small number of nodes, because of the overhead it involves.
  • IP tunneling is similar to NAT, but the master node no longer rewrites the packets, so its task is much lighter.
  • Direct routing (DR) is an even lighter system, but it requires all servers to share the same network segment. (A conceptual scheduling sketch follows this list.)
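
The scheduling decision made by such a balancer can be sketched conceptually; the Python fragment below implements a plain round-robin choice among real servers, one of the scheduling policies LVS offers (the addresses are placeholders, and in a real LVS setup the packet rewriting happens in the kernel, not in user code):

    from itertools import cycle

    # Placeholder addresses of the real servers behind the virtual IP
    real_servers = ["192.168.0.11", "192.168.0.12", "192.168.0.13"]
    scheduler = cycle(real_servers)   # round-robin over the real servers

    def dispatch(request_id):
        """Choose the real server that will handle this request."""
        server = next(scheduler)
        print(f"request {request_id} -> {server}")
        return server

    for req in range(5):
        dispatch(req)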

Clusters in scientific applications

  • They are usually characterized by compute-intensive applications.
  • Their resource needs are very significant in storage and especially in memory.
  • They require dedicated nodes and systems, in HPC and HTC environments.
  • Resources are usually controlled by Maui-type schedulers and PBS-type resource managers.
  • They are often legacy codes that are difficult to maintain, since the application domains are usually hard to parallelize.

Examples: simulations (Earth Simulator), computational genomics, weather forecasting (MM5), simulation of sea currents and discharges, and applications in computational chemistry.

Clusters in Enterprise Applications

  • They are usually applications that are not especially compute-intensive, but that demand high availability and immediate response, so the services are kept running continuously and are not controlled by a queue system.
  • It is usual for one system to provide several services. A first approach to distributing the jobs is to separate the services:
  • A web server with the database (DB) on one node, the EJB container on another and the web page server on yet another constitutes a clear example of distribution in a business setting.
  • Another approach is to install a web application on a cluster with squid as a proxy-cache, Apache/Tomcat as the web/application server, memcached as a query cache for the database, and MySQL as the database. These services may be replicated across several cluster nodes.
  • Examples: Flickr, Wikipedia and Google.
