High Availability Cluster


A high availability cluster is a set of two or more machines that maintain a series of shared services and constantly monitor each other.

Do not confuse a high availability cluster with a high performance cluster. The latter is a configuration of machines designed to provide far greater computing capacity than any individual machine could (see, for example, Beowulf cluster systems), while the former is designed to guarantee the uninterrupted operation of certain applications.

Clusters of this type are typically used for load balancers, backup services and failover. To configure them correctly, all servers must have access to the same shared storage, so that if one of them fails, a virtual machine can be launched on another server and take over its tasks without noticeable downtime.

This ability of clusters to restore a service within a few seconds while maintaining data integrity means that in many cases users do not notice that a problem has occurred, whereas the same failure in a system without a cluster could leave them without service for hours. Clusters are useful not only for unscheduled service outages but also for scheduled shutdowns such as hardware maintenance or software updates.

Classes

High availability can be divided into two classes:

High availability of infrastructure: If a hardware failure occurs on one of the cluster machines, the high-availability software automatically starts the affected services on any of the other cluster machines (failover). When the failed machine recovers, the services are migrated back to the original machine (failback). This automatic recovery capability guarantees the high availability of the services offered by the cluster and minimizes the users' perception of the failure.

High availability of application: If the hardware or the applications of any of the cluster machines fail, the high-availability software automatically starts the failed services on any of the other cluster machines. When the failed machine recovers, the services are migrated back to the original machine. This automatic recovery capability guarantees the integrity of the information, since no data is lost, and avoids inconvenience to users, who need not notice that a problem has occurred.
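The failover and failback behaviour described above can be sketched as a small simulation. This is only an illustrative model, not the logic of any particular cluster manager: the names `Node`, `Cluster`, `home`, `placement` and `reconcile` are hypothetical, and node health is set by hand instead of being detected through real monitoring.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class Cluster:
    nodes: list
    # Service name -> name of the machine it normally runs on (assumed "home" node).
    home: dict
    # Service name -> name of the machine currently running it.
    placement: dict = field(default_factory=dict)

    def __post_init__(self):
        # Every service starts on its home machine.
        self.placement = dict(self.home)

    def _node(self, name):
        return next(n for n in self.nodes if n.name == name)

    def reconcile(self):
        """One monitoring pass over all services."""
        for service, home_name in self.home.items():
            current = self._node(self.placement[service])
            if self._node(home_name).healthy:
                # Failback: the original machine is available again.
                self.placement[service] = home_name
            elif not current.healthy:
                # Failover: restart the service on any healthy machine.
                survivor = next((n for n in self.nodes if n.healthy), None)
                if survivor is not None:
                    self.placement[service] = survivor.name


a, b = Node("node-a"), Node("node-b")
cluster = Cluster(nodes=[a, b], home={"web": "node-a"})

a.healthy = False          # simulated hardware failure
cluster.reconcile()
print(cluster.placement["web"])  # node-b

a.healthy = True           # the failed machine recovers
cluster.reconcile()
print(cluster.placement["web"])  # node-a
```

In a real deployment the `reconcile` pass would be driven by the mutual monitoring between machines, and moving a service would also involve reattaching its shared data.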

Availability calculation

In a real system, a component that fails is either repaired or replaced by a new one. If the new component fails, it is replaced in turn, and so on. A repaired component is considered to be in the same state as a new one. During its useful life, a component is always in one of two states: working or under repair. The working state indicates that the component is operational; the under repair state means that it has failed and has not yet been repaired or replaced.

When a defect occurs, the system operates in repair mode, and once the replacement is made it returns to the working state. The system therefore has, over its lifetime, a mean time to failure (MTTF) and a mean time to repair (MTTR). Its lifetime is a succession of working and repair periods as it fails and is repaired. The useful life of the system is the sum of the working periods within the MTTF + MTTR cycles already completed.
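As a minimal sketch of how these two averages could be estimated from observed history, assume a record of completed cycles, each holding the hours spent working and the hours spent under repair. The function name `mean_times` and the sample figures are hypothetical.

```python
from statistics import mean


def mean_times(cycles):
    """Return (MTTF, MTTR) from a list of (hours_working, hours_under_repair) pairs."""
    mttf = mean(up for up, _ in cycles)
    mttr = mean(down for _, down in cycles)
    return mttf, mttr


# Three completed working/repair cycles (made-up values, in hours).
cycles = [(1000.0, 2.0), (1400.0, 4.0), (1200.0, 3.0)]
mttf, mttr = mean_times(cycles)
print(mttf, mttr)  # 1200.0 3.0
```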

In simplified form, the availability of a system is the ratio between the duration of its useful life and its total lifetime. This can be expressed by the following formula:

Availability = MTTF / (MTTF + MTTR)
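The formula can be applied directly, as in the following sketch. The helper names and the input values are assumptions chosen for illustration; the yearly downtime figure simply multiplies the unavailable fraction by the hours in a 365-day year.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mttf_hours, mttr_hours):
    """Availability = MTTF / (MTTF + MTTR), as a fraction between 0 and 1."""
    return mttf_hours / (mttf_hours + mttr_hours)


def yearly_downtime_hours(avail):
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


a = availability(999.0, 1.0)
print(f"{a:.3%}")                                 # 99.900%
print(f"{yearly_downtime_hours(a):.2f} h/year")   # 8.76 h/year
```

Because MTTR appears only in the denominator, shortening repair time raises availability even when the failure rate stays the same.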

When evaluating a high availability solution, it is important to check whether the MTTF measurement counts planned stops as failures.

Reasons to implement an HA cluster:

Increase availability

Improve performance

Scalability
