System reliability

The term system reliability refers to how well a system or procedure performs against certain desired or expected criteria. More precisely, the reliability of a system is the probability that the system performs a given function under fixed conditions for a specified period. A system is a collection of components or subsystems arranged according to a given design in order to perform certain functions with acceptable adequacy and reliability. The type of components, their quantity, their quality, and the way they are arranged all have a direct effect on the reliability of the system.

Reliability is defined as the probability that an item will function properly during a given period under specific operating conditions (for example, pressure, temperature, friction, speed, voltage or electrical waveform, and vibration level).

Introduction

Currently, most of the goods and services are obtained and delivered to their recipients through "production systems", often large in both the number of people who work in them and the size and the value of the facilities and equipment they use.

Throughout its life cycle each system goes through different phases. The first is the construction and commissioning phase, which lasts until the normal operating regime is reached. During the second phase, called operation, which is the only truly productive one, the system is subject to failures that hinder or even interrupt its operation, temporarily or permanently.

The purpose of maintenance is precisely to reduce the negative incidence of these failures, either by reducing their number or mitigating their consequences.

We say that something fails when it stops providing us with the service it was supposed to give us or when undesirable effects appear, according to the design specifications with which the item in question was built or installed.

In general, everything that exists, especially if it has moving parts, deteriorates, breaks, or fails over time, whether in the short term or the very long term. In some goods, the mere passage of time causes an evident decline in their characteristics, qualities, or performance. Reliability is the study of the failures of products, equipment, and systems. In a colloquial sense, we say that someone or something is reliable if we can trust them; we associate reliability with the ability to depend safely on something or someone. Man-made systems are intended to satisfy a certain need, and to do so they must work in a specific way in a certain environment. Sooner or later, all systems reach a point where they can no longer satisfactorily do what they were designed to do. The repercussions of a system failure depend on the type of system, the type of mission it is performing, the moment at which the failure occurs, and the magnitude of the failure. It is desirable that designed systems be reliable, in the sense that the user can operate them without a high risk of failure. The required level of reliability, or assurance of satisfactory operation, depends on the purpose of the system. Achieving a given reliability carries an associated cost and effort, so the reliability requirement for a system must be matched to its objective and significance.

Reliability is clearly an essential factor in the safety of a product. The design phase is the moment at which the objectives of adequate functional performance, limited life-cycle costs, and safety can be influenced most significantly.

Consequently, most reliability studies and methods developed focus on product design.

Reliability engineering is the study of equipment longevity and failure. Scientific and mathematical principles are applied to investigate why devices age and fail. The goal is that a better understanding of device failures will help identify improvements to product designs that increase their life or at least limit the adverse consequences of failures.

The word reliability has a precise technical definition that is not entirely equivalent to what is understood by human reliability. The definition is:

Definition: Reliability is the probability that a device properly performs its intended function over time, when it operates in the environment for which it has been designed.

Note that there are four attributes specific to this definition. These are:

  • (1) probability;
  • (2) proper functioning;
  • (3) rating with respect to the environment;
  • (4) time.

The important thing is that the equipment and systems we design and acquire to satisfy our needs deliver the performance we expect from them, with a high level of assurance and confidence in their proper functioning. How high that level must be depends both on how important the function performed by the equipment or system is to us and on the consequences of the failures that may occur. This is where the discipline of reliability comes into play. For this reason, reliability must be treated as one more discipline in the design of any system, from the analysis of the identified need to the withdrawal of the designed system from service, and in a way that is integrated with the other logistics support disciplines.

History of Maintenance

The word maintenance is used to designate the techniques used to ensure the correct and continuous use of equipment, machinery, facilities and services.

During the Industrial Revolution, maintenance was corrective (urgent). The accidents and losses caused by the first boilers, together with pressure from insurers demanding more and better care, led to the appearance of mechanical workshops.

Starting in 1925, the need to organize maintenance on a scientific basis became clear in American industry. People began to consider the advantages of repairing before wear or breakage occurs, in order to avoid interruptions in the production process, and from this the concept of preventive maintenance arose.

From the 1960s, with the development of the electronics, space, and aeronautical industries, predictive maintenance appeared in the Anglo-Saxon world. Under this approach, maintenance intervention does not always depend on operating time but on the actual state or condition of the equipment or its elements and on the specified reliability of the system.

Currently, maintenance is entering what could be called its third generation, with the availability of highly reliable electronic inspection and control equipment that reveals the real state of the equipment through periodic or continuous measurements of certain variables (temperature, pressure, vibration, resistance, etc.). The application to maintenance of computer-based information systems, which allow empirical experience to be accumulated, and the development of data processing systems are expected to lead to the use of expert systems and artificial intelligence in maintenance.

On the other hand, there are changes in maintenance policies marked by the legislation on Safety and Hygiene at Work and by pressures on environmental issues, such as purification devices, extraction plants, elements for the limitation and attenuation of noise and detection, control and alarm equipment.

Maintenance costs are predicted to increase progressively, which is driving the manufacture of more reliable and easier-to-maintain products.

Maintenance and reliability

The concepts of maintenance and reliability are closely related. Reliability is defined as the probability that equipment will function properly during a given period under specified operating conditions (for example, conditions of pressure, temperature, speed, voltage or electrical waveform, level of vibrations, etc.) and maintenance is the set of techniques used to ensure the correct and continuous use of equipment, machinery, facilities and services in order to avoid their breakage (that is, increase their reliability). Therefore, both terms are analyzed together.

Essentially there are two types of maintenance: preventive and corrective.

In preventive maintenance, the goal is to incur modest expenses in servicing the equipment in order to avoid potentially expensive failures during operation. Typically, the equipment is taken out of service during preventive maintenance, and the physical effect of the maintenance activities is to mitigate the effects of previous operation.

In contrast, corrective maintenance (or repair) is the response to equipment failure in order to return it to a working state.

For both kinds of maintenance, various cost structures and various patterns of equipment behavior can be assumed. Consequently, a large number of different models exist.

It is important to note that modeling and analysis of equipment maintenance procedures often requires consideration of the entire system rather than its individual components.

Preventive maintenance

In most industrial production settings, preventive maintenance is preferable to corrective maintenance, because it avoids disrupting production and the unforeseen events caused by a system breakdown. The reason for replacing a working device is that the cost of doing so is small compared with the cost of responding to a failure that occurs while the device is in operation, that is, a field failure. Historically, two types of preventive maintenance policy have been defined: “age replacement” and “block replacement”.

An age replacement policy involves exchanging a device for a new one whenever the device fails or reaches a preset age.

With block replacement, the working device is replaced at evenly spaced times regardless of its age at those times. The optimal values of the policy times can be determined by analyzing the appropriate cost models.

The distinction between failures and replacements is that replacements include both devices changed due to failure and devices replaced preemptively before the failure.

The motivation for using a preventive maintenance program is that by doing planned replacements (repairs), the frequency of unplanned field failures will be reduced and presumably this will mean cost savings.

Renewal of machinery or devices

A device is used until it fails, at which time it is immediately replaced by a new, identical device that is also used until failure. If this process is repeated indefinitely, the sequence of device operating times constitutes a renewal process. This is an example of the relationship between reliability and maintainability in production systems.

Definition: A renewal process is a sequence of non-negative, independent, and identically distributed random variables, for example T1, T2, ...

The application of this model to individual components of systems is not difficult.
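The renewal model above can be illustrated with a short simulation. This is only a sketch: the function names are hypothetical, and the exponential lifetime distribution used in `mean_renewals` is an assumption chosen because, in that case, the expected number of renewals up to a given time is known to equal the failure rate multiplied by that time.

```python
import random


def count_renewals(draw_lifetime, horizon):
    """Count replacements completed in [0, horizon].

    draw_lifetime is any zero-argument function that returns one positive
    device lifetime (the i.i.d. variables T1, T2, ... of the definition).
    """
    elapsed = 0.0
    renewals = 0
    while True:
        elapsed += draw_lifetime()
        if elapsed > horizon:
            return renewals
        renewals += 1


def mean_renewals(rate, horizon, runs=2000, seed=0):
    """Average renewal count over many runs, assuming exponential lifetimes."""
    rng = random.Random(seed)
    total = sum(
        count_renewals(lambda: rng.expovariate(rate), horizon)
        for _ in range(runs)
    )
    return total / runs


# Deterministic check: lifetimes of 10 h give failures at 10, 20 and 30 h.
print(count_renewals(lambda: 10.0, 35.0))
# Hypothetical rate of 0.01 failures per hour over 1000 h.
print(mean_renewals(0.01, 1000.0))
```

Any other lifetime distribution can be plugged in through `draw_lifetime`, as long as it returns strictly positive values.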

Several concepts are associated with renewal, such as:

  • Increasing Failure Rate (IFR) and Decreasing Failure Rate (DFR),
  • New Better than Used (NBU) and New Worse than Used (NWU),
  • New Better than Used in Expectation (NBUE) and New Worse than Used in Expectation (NWUE).

All these concepts concern the replacement of devices in a functioning system with new ones and the subsequent performance of the system.

Availability

Extending the above analyses to incorporate the maintenance process leads to a new performance measure: availability. There are four measures of availability in continuous use, all related to each other:

  • Definition 1: “The (point) availability A(t) of a device is the probability that it is working at time t”:

$$A(t) = P[x(t) = 1] = E[x(t)]$$

where x(t) = 1 if the device is working at time t and x(t) = 0 otherwise.

  • Definition 2: “The limiting availability A of a device is the limit of A(t)”:

$$A = \lim_{t \to \infty} A(t)$$

  • Definition 3: “The average availability A_av(τ) of a device over an interval [0, τ] is”:

$$A_{av}(\tau) = \frac{1}{\tau} \int_{0}^{\tau} A(t)\,dt$$

  • Definition 4: “The limiting average availability A∞ of a device is the limit of the average availability”:

$$A_{\infty} = \lim_{\tau \to \infty} A_{av}(\tau) = \lim_{\tau \to \infty} \frac{1}{\tau} \int_{0}^{\tau} A(t)\,dt$$

Mathematically, it can be shown from the definitions above that availability reduces to reliability when repair is not possible. However, these results refer only to isolated devices and not to systems. Only in some cases can a system be treated as a single unit, since systems are operated in many different ways.

Therefore, the analysis of the availability of the system must be carried out under a specific mode of operation.
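The four availability measures can be made concrete for one simple case. The sketch below assumes a single device with a constant failure rate `lam` and a constant repair rate `mu` that starts in the working state; neither this model nor these names come from the text. For that model the point availability has the known closed form A(t) = μ/(λ+μ) + λ/(λ+μ)·e^(−(λ+μ)t), and the average availability is obtained here by numerical integration.

```python
import math


def point_availability(t, lam, mu):
    """A(t) for the assumed two-state device (working at t = 0)."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)


def average_availability(tau, lam, mu, steps=10_000):
    """A_av(tau) = (1/tau) * integral of A(t) on [0, tau], trapezoidal rule."""
    h = tau / steps
    area = 0.5 * (point_availability(0.0, lam, mu)
                  + point_availability(tau, lam, mu))
    area += sum(point_availability(i * h, lam, mu) for i in range(1, steps))
    return area * h / tau


def limiting_availability(lam, mu):
    """Limit of A(t) as t grows; requires mu > 0."""
    return mu / (lam + mu)


# Hypothetical rates: one failure per 100 h, one repair per 10 h.
lam, mu = 0.01, 0.1
print(point_availability(50.0, lam, mu))
print(average_availability(1e5, lam, mu))
print(limiting_availability(lam, mu))
```

Setting `mu = 0` (no repair) makes `point_availability` equal to `exp(-lam * t)`, which is the reliability of a device with a constant failure rate, in line with the remark that availability becomes reliability when repair is not possible.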

Types of systems

Mathematical models make it possible to analyze characteristics of the behavior of systems with significantly less effort, cost, and risk than carrying out the same analyses on the system itself. The development of any mathematical model, whether to study the reliability of a system or any other characteristic, starts from a set of hypotheses. It is therefore essential to know exactly what assumptions a model is built on, so that one knows how close it is to reality and, consequently, how trustworthy its results are.

It is generally recognized that there are four generic types of structural relationships between a device and its components. These are: series, parallel, k-of-n, and all the others.

Function structure of a system

The reliability of a system depends both on the individual reliability of each of its components and on the logical way in which these components are connected with respect to whether or not the system functions. It is assumed that the working or failed state of the components determines the working or failed state of the system. This information is captured in the so-called structure function of the system.

We assume that the system is made up of n components and that the state of component i is described by the variable $X_i$, which takes the value 1 if the component works and 0 if it does not. The state of the system, $X_S$, is a function of the variables $X_i$:

$$X_S = \Phi(X_1, \ldots, X_n)$$

where $\Phi$ is the structure function of the system.

We denote the reliability of the system by $R_S$ and the reliability of component i by $R_i$, so that $R_i = P(X_i = 1)$. We write $Q_S = 1 - R_S$ for the probability of system failure and, similarly, $Q_i = 1 - R_i$ for the probability of failure of component i.

Serial systems

In a serial configuration, the failure of any component causes the system to fail. In most cases, when complete systems are considered and broken down to their most basic level, a logical ordering of their components in series is obtained. That is, a serial system is one in which all components must function properly for the system to work. The structure function of the system is:

$$X_S = \Phi(X_1, \ldots, X_n) = X_1 \cdots X_n = \prod_{i=1}^{n} X_i$$

System reliability is the probability that all system components function. If the lifetimes of the components are independent, the reliability of the system is:

$$R_S = P(X_S = 1) = P(X_1 = 1, \ldots, X_n = 1) = P(X_1 = 1) \cdots P(X_n = 1) = R_1 \cdots R_n = \prod_{i=1}^{n} R_i$$

Effect of the reliability of a component on the reliability of the system: in a serial configuration, the least reliable component has the greatest influence on the reliability of the system. It is said that “a chain is only as strong as its weakest link”.

Representation of a serial structure.

Reliability of serial systems

$$\Phi(X, t) = \prod_{i=1}^{n} X_i(t)$$

$$R_S(t) = E\left[\prod_{i=1}^{n} X_i(t)\right] = \prod_{i=1}^{n} E[X_i(t)] = \prod_{i=1}^{n} R_i(t) = \Phi(R, t)$$
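As a minimal sketch of the series formulas, the functions below evaluate the structure function on a vector of 0/1 component states and the system reliability on a vector of component reliabilities, assuming independent components. The function names and the numerical values are illustrative only.

```python
from math import prod


def series_structure(states):
    """Structure function of a series system: 1 only if every state is 1."""
    return prod(states)


def series_reliability(reliabilities):
    """R_S = product of the R_i, valid for independent components."""
    return prod(reliabilities)


# Hypothetical component reliabilities at some fixed time t.
components = [0.99, 0.95, 0.90]
print(series_structure([1, 1, 0]))
print(series_reliability(components))
```

Because every factor is at most 1, the result can never exceed the smallest component reliability, which is the "weakest link" effect.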

Systems in parallel

In a parallel configuration, the system functions as long as at least one component operates. The components are said to be redundant. Redundancy is one of the methods used to improve the reliability of a system. The structure function of the system is:

$$X_S = \Phi(X_1, \ldots, X_n) = \coprod_{i=1}^{n} X_i = 1 - \prod_{i=1}^{n} (1 - X_i)$$

For independent components, the reliability function of the system is:

$$R_S = 1 - Q_S = 1 - P(X_1 = 0, \ldots, X_n = 0) = 1 - (1 - R_1) \cdots (1 - R_n) = \coprod_{i=1}^{n} R_i = 1 - \prod_{i=1}^{n} (1 - R_i)$$

Effect of the reliability of the components on the reliability of the system: in a parallel system, the most important component in terms of reliability is the one with the highest reliability. The characteristic inherent to the parallel model is called redundancy; that is, there is more than one component to perform a given function. Redundancy can be of two kinds:

  • Active redundancy: all redundant elements are active simultaneously during the mission.
  • Sequential redundancy (also called stand-by or passive): the redundant element only comes into play when it is called upon as a result of the failure of the primary element. Until that time the redundant element remains inactive, in reserve, in one of two conditions:
  1. Totally inactive (e.g., a car's spare wheel).
  2. Fully or partly energized (e.g., a generator).

Representation of a parallel structure.

Reliability of parallel systems

$$\Phi(X, t) = 1 - \prod_{i=1}^{n} (1 - X_i(t))$$

$$R_S(t) = 1 - E\left[\prod_{i=1}^{n} (1 - X_i(t))\right] = 1 - \prod_{i=1}^{n} (1 - E[X_i(t)]) = 1 - \prod_{i=1}^{n} (1 - R_i(t)) = \Phi(R, t)$$
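The parallel formulas can be sketched in the same way, again assuming independent components with active redundancy. The function names and example values are illustrative.

```python
from math import prod


def parallel_structure(states):
    """Structure function of a parallel system: 1 if any state is 1."""
    return 1 - prod(1 - x for x in states)


def parallel_reliability(reliabilities):
    """R_S = 1 - product of (1 - R_i), valid for independent components."""
    return 1 - prod(1 - r for r in reliabilities)


# Hypothetical component reliabilities at some fixed time t.
components = [0.9, 0.8, 0.7]
print(parallel_structure([0, 0, 1]))
print(parallel_reliability(components))
```

Here the result is never lower than the largest component reliability, so adding a redundant unit can only help under these assumptions.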

K-out-of-n systems

The k-out-of-n configuration is a generalization of the parallel system in which at least k of the n units must operate for the system to work. For example, an airplane that has four engines but can fly with at least two of them running is a 2-out-of-4 system.

The structure function of a k-out-of-n system is:

$$\Phi(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} x_i \geq k \\ 0 & \text{if } \sum_{i=1}^{n} x_i < k \end{cases}$$

The reliability function, for independent components with identical reliability R, is:

$$R_S(k, n, R) = \sum_{r=k}^{n} \binom{n}{r} R^{r} (1 - R)^{n-r}$$

Reliability of a k-out-of-n system with non-identical components: one calculation method consists of determining all the different possible combinations of working and failed components and calculating the probability of each of them.

Combination of subsystems in series and in parallel: the reliability of the resulting system is calculated by first evaluating the reliability of each subsystem and then combining them appropriately.
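Both calculations for k-out-of-n systems can be sketched briefly. The first function applies the binomial formula for identical components; the second enumerates every combination of working and failed components, as described for the non-identical case. Independence of the components is assumed throughout, and the function names are hypothetical.

```python
from itertools import product
from math import comb


def k_out_of_n_identical(k, n, r):
    """Binomial formula for n independent components of equal reliability r."""
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1))


def k_out_of_n_general(k, reliabilities):
    """Enumerate all 2^n state vectors and add up those with >= k working."""
    total = 0.0
    for state in product((0, 1), repeat=len(reliabilities)):
        if sum(state) >= k:
            probability = 1.0
            for x, r in zip(state, reliabilities):
                probability *= r if x else (1 - r)
            total += probability
    return total


# The 2-out-of-4 airplane example, with a hypothetical engine reliability.
print(k_out_of_n_identical(2, 4, 0.9))
# Same structure with non-identical, hypothetical engine reliabilities.
print(k_out_of_n_general(2, [0.95, 0.9, 0.9, 0.85]))
```

The enumeration grows as 2^n, so it is only practical for small systems; with k = n it reproduces the series result and with k = 1 the parallel result.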

Representation of a structure in k-out-of-n.

Coherent Systems

The definition of reliability presents four elements: probability, time, environment, and correct operation. It is necessary to speak in probabilistic terms when referring to the possible state of the system at a future instant; time is essential, since the probability that the components work depends on the elapsed time, since most of the degradation processes that cause their failures are a function of this; the specification of the operating environment is necessary, since the same component will lose its qualities in different ways in different environments, and consequently its probabilities of survival or correct operation will be different; and finally, what is meant by correct operation must be defined, that is, the dividing line between “operation” and “failure” must be established.

A component i of a system is relevant when there is at least one situation, defined by the state of the remaining components, in which the functioning of the system depends on whether component i works or not; that is, there exists a component state vector x such that $\Phi(1_i, x) > \Phi(0_i, x)$. Conversely, a component is irrelevant when $\Phi(1_i, x) = \Phi(0_i, x)$ for every component state vector x.

  • A system is said to be coherent if all its components are relevant and the structure function is non-decreasing in each argument.
  • For a coherent system, $\Phi(1) = 1$ and $\Phi(0) = 0$. Moreover, if $\Phi(x)$ is the structure function of a coherent system, then for every x:

$$\prod_{i=1}^{n} x_i \leq \Phi(x) \leq 1 - \prod_{i=1}^{n} (1 - x_i)$$
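For small systems, relevance, monotonicity, and the series and parallel bounds can all be checked by brute force over the 2^n state vectors. The sketch below does this for a structure function supplied as an ordinary Python function on tuples of 0/1 values; all names are hypothetical.

```python
from itertools import product
from math import prod


def set_working(x, i):
    """Return a copy of state vector x with component i set to 1."""
    return x[:i] + (1,) + x[i + 1:]


def is_relevant(phi, n, i):
    """True if some state exists where repairing component i repairs the system."""
    return any(phi(set_working(x, i)) > phi(x)
               for x in product((0, 1), repeat=n) if x[i] == 0)


def is_non_decreasing(phi, n):
    """True if setting any failed component to working never lowers phi."""
    return all(phi(set_working(x, i)) >= phi(x)
               for x in product((0, 1), repeat=n)
               for i in range(n) if x[i] == 0)


def is_coherent(phi, n):
    return is_non_decreasing(phi, n) and all(
        is_relevant(phi, n, i) for i in range(n))


def within_bounds(phi, n):
    """Check prod(x_i) <= phi(x) <= 1 - prod(1 - x_i) for every x."""
    return all(prod(x) <= phi(x) <= 1 - prod(1 - xi for xi in x)
               for x in product((0, 1), repeat=n))


def two_out_of_three(x):
    return int(sum(x) >= 2)


def ignores_third(x):
    """Series of components 0 and 1; component 2 has no effect."""
    return x[0] * x[1]


print(is_coherent(two_out_of_three, 3), within_bounds(two_out_of_three, 3))
print(is_coherent(ignores_third, 3))
```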

Representation of structures through cuts and paths

  • A subset of system components whose operation ensures the operation of the system is called a path.
Definition: A path vector is a component state vector x for which Φ(x) = 1. The corresponding path set is P(x) = {i | x_i = 1}.
  • A path is said to be minimal if no proper subset of it is itself a path.
Definition: A minimal path vector is a path vector x for which y < x implies Φ(y) = 0, where y < x means y_i ≤ x_i for every i with strict inequality for at least one i. The corresponding path set, P(x), is called a minimal path set.
  • A subset of system components whose failure causes system failure is called a cut.
Definition: A cut vector is a component state vector x for which Φ(x) = 0. The corresponding cut set is C(x) = {i | x_i = 0}.
  • A cut is said to be minimal if no proper subset of it is itself a cut.
Definition: A minimal cut vector is a cut vector x for which y > x implies Φ(y) = 1. The corresponding cut set, C(x), is called a minimal cut set.
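The definitions above suggest a direct, if inefficient, way to list minimal path sets and minimal cut sets of a small coherent system: test subsets in order of increasing size and skip any subset that already contains a smaller path (or cut). The sketch below assumes a non-decreasing structure function given as a Python function on 0/1 tuples; the names and the example system are hypothetical.

```python
from itertools import combinations


def minimal_sets(phi, n, is_path):
    """Enumerate minimal path sets (is_path=True) or cut sets (is_path=False)."""
    found = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            members = set(subset)
            # A superset of a known minimal set cannot itself be minimal.
            if any(known <= members for known in found):
                continue
            if is_path:
                # Only the subset works; everything else has failed.
                x = tuple(1 if i in members else 0 for i in range(n))
                hit = phi(x) == 1
            else:
                # Only the subset has failed; everything else works.
                x = tuple(0 if i in members else 1 for i in range(n))
                hit = phi(x) == 0
            if hit:
                found.append(members)
    return found


def example_phi(x):
    """Component 0 in series with the parallel pair (1, 2)."""
    return x[0] * (1 - (1 - x[1]) * (1 - x[2]))


print(minimal_sets(example_phi, 3, is_path=True))
print(minimal_sets(example_phi, 3, is_path=False))
```

For the example system the minimal paths are {0, 1} and {0, 2}, and the minimal cuts are {0} and {1, 2}.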

Reliability Block Diagrams (RBDs)

In reliability engineering, a reliability block diagram (RBD) is a graphical representation of the components or subsystems of a system and of how they are related from the point of view of reliability. In some cases, this relationship differs from the physical relationship. For example, a group of resistors that is physically connected in parallel is, from a reliability standpoint, connected in series, since all the resistors are necessary to provide the required resistance. The following graphic shows a simplified RBD of a computer system with a redundant fan. Each reliability block in the diagram could in turn be represented by its own block diagram. For example, in the RBD of a car, the top level of blocks could represent the main systems of the car, and each of these systems could have its own RBD.

RBD of a CPU.

The RBD provides a visual representation of how the blocks are related so that the diagram shows the effect that the operation or failure of a component has on the operation or failure of the system.

The first step in evaluating the reliability of a system is to obtain data about the reliability of each of the blocks. These data will allow the reliability engineer to characterize the life distributions of specific components or blocks.

The reliability of the system is expressed as a function of the reliability of its components. RBDs are often useful when determining this mathematical function.

  • The main objective of a system reliability analysis is to determine the distribution function of the time to failure, although in some cases only the reliability of the system at a given time is wanted or can be determined.
  • There are two methods for determining the reliability of a system: the analytical method and simulation. The analytical method uses probability theory; some analytical methods lead to exact assessments of reliability, while others provide bounds. The simulation method generates failure times for each component and, from them, determines the working or failed state of the system according to the system structure.
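The simulation method can be sketched as follows. Failure times are drawn for each component, the components are failed one by one in time order, and the system failure time is the first instant at which the structure function drops to 0. The exponential lifetimes, the failure rates, and the example structure are assumptions made purely for illustration; the analytical value is computed alongside for comparison.

```python
import math
import random


def system_failure_time(phi, failure_times):
    """Time at which the system fails, given each component's failure time."""
    state = [1] * len(failure_times)
    order = sorted(range(len(failure_times)), key=failure_times.__getitem__)
    for i in order:
        state[i] = 0
        if phi(state) == 0:
            return failure_times[i]
    raise ValueError("system still works with every component failed")


def simulate_reliability(phi, rates, t, runs=20_000, seed=0):
    """Estimate R_S(t) as the fraction of runs in which the system outlives t."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(runs):
        times = [rng.expovariate(rate) for rate in rates]
        if system_failure_time(phi, times) > t:
            survived += 1
    return survived / runs


def example_phi(x):
    """Component 0 in series with the parallel pair (1, 2)."""
    return x[0] * (1 - (1 - x[1]) * (1 - x[2]))


rates = [0.001, 0.002, 0.002]  # hypothetical failure rates, per hour
t = 500.0
estimate = simulate_reliability(example_phi, rates, t)

# Analytical value for the same structure with independent components.
r = [math.exp(-rate * t) for rate in rates]
exact = r[0] * (1 - (1 - r[1]) * (1 - r[2]))
print(estimate, exact)
```

The estimate carries sampling error that shrinks roughly with the square root of the number of runs, so the two printed values should agree only approximately.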

Reliability over time

The definition of reliability given in previous chapters indicates that reliability is the probability of satisfactory operation over time. Therefore, the extension of reliability measures to include time implies the specification of probability distributions, which must be reasonable models of the dispersion of life span.

Reliability function

The random variable T is defined as the life of the item or component; that is, the random variable underlying the concept of reliability is the duration or life of the device. The distribution of lifetimes is described by four functions: the distribution function F(t), the reliability function R(t), the density function f(t), and the hazard function λ(t). It is assumed that T has a cumulative distribution function F(t) given by:

$$F(t) = P(T \leq t)$$

The density function, f(t), is defined as:

$$f(t) = \frac{d}{dt} F(t)$$

so that it quantifies how the probability is spread over the possible lifetimes.

The reliability function R(t), also called the survival function, is defined as:

$$R(t) = P(T > t) = 1 - F(t)$$

In other words, R(t) is the probability that a new component will survive longer than time t. Therefore, F(t) is the probability that a new component will not survive more than time t.

On the other hand, the probability that a component of age t fails between t and t+s (s is a time increment with respect to t) is equal to:

$$P(t < T \leq t+s \mid T > t) = \frac{P(t < T \leq t+s)}{P(T > t)} = \frac{F(t+s) - F(t)}{R(t)}$$

Dividing by s and letting s approach zero:

$$\lambda(t) = \lim_{s \to 0} \frac{1}{s}\,\frac{F(t+s) - F(t)}{R(t)} = \frac{f(t)}{R(t)}$$

λ(t) is the failure rate function, also called the hazard function or instantaneous failure rate, and is a reliability characteristic of the product.

The failure rate function is not itself a probability and has no direct physical interpretation. However, for a sufficiently small time increment dt, λ(t) dt can be interpreted as the probability that the component fails in the interval between t and t + dt, given that it was operational at instant t. The hazard function is a fundamental quantity in reliability analysis, and it is quite common for the failure behavior of devices to be described in terms of their hazard functions.
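The limit that defines λ(t) can be checked numerically by using a small but finite s. The sketch below does so for a Weibull lifetime distribution, which is an assumption introduced here only because its hazard function has a simple closed form, (shape/scale)·(t/scale)^(shape−1), to compare against. The function names are hypothetical.

```python
import math


def weibull_cdf(t, shape, scale):
    """F(t) for the assumed Weibull lifetime distribution."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def hazard_from_cdf(cdf, t, s=1e-6):
    """Approximate lambda(t) = (1/s) * (F(t+s) - F(t)) / R(t) for small s."""
    reliability = 1.0 - cdf(t)
    return (cdf(t + s) - cdf(t)) / (s * reliability)


def weibull_hazard(t, shape, scale):
    """Closed-form hazard of the Weibull distribution, for comparison."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical parameters: shape 2 (wear-out behaviour), scale 100 h.
def cdf(t):
    return weibull_cdf(t, 2.0, 100.0)


print(hazard_from_cdf(cdf, 50.0))
print(weibull_hazard(50.0, 2.0, 100.0))
```

With shape 1 the Weibull distribution reduces to a constant hazard of 1/scale, and the same numerical check returns approximately that constant at every t.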

Evolution of the failure rate over time: the bathtub curve

The bathtub curve forms the conceptual basis for much of the study of reliability. The idea is that the hazard function for a sample of devices evolves as shown in the figure below. Early in the life of the devices, the weaker ones fail at a relatively high rate as a result of an "infant mortality" phenomenon, perhaps due to faulty manufacturing. As early failures remove the weak devices from the sample, the hazard rate decreases. Similarly, at the end of their life, the surviving devices fail as a result of "wear and tear", so the hazard rate increases. In the interval between these two behaviors, the sample of devices exhibits a relatively low and approximately constant hazard. This interval is often referred to as the functional life of the device.

The life of a piece of equipment can be divided into three different periods:

I.- Youth. Infant mortality zone.

The failure occurs immediately or a very short time after commissioning, as a result of:

  • Design mistakes.
  • Defects of manufacture or assembly.
  • Difficult adjustments, which must be revised under actual operating conditions until the desired set-up is reached.

II.- Maturity. Period of useful life.

Period of useful life in which failures of a random nature occur. It is the longest period, in which systems are usually studied, since it is assumed that they are replaced before they reach the aging period.

III.- Aging.

It corresponds to the depletion, after a certain time, of some element that is constantly consumed or deteriorates during operation.

These three periods are clearly distinguished in a graph that plots the system failure rate against time. This graph is called the “Bathtub Curve” or the “Davies Curve”.

Although there are up to six different types of bathtub curve, depending on the type of component in question, a conventional bathtub curve has the shape shown in the following figure:

Graphical representation of the bathtub curve, or Davies curve.

In a conventional bathtub curve, the three zones described above can be seen:

  1. Infant mortality zone: The failure rate decreases over time until it levels off at a constant value and the useful life begins. In this zone the components with manufacturing defects fail, which is why the failure rate decreases over time. To avoid this zone, manufacturers subject their components to an initial "burn-in" and discard the faulty ones. Burn-in consists of subjecting the components to certain extreme conditions that accelerate the failure mechanisms. The components that survive this period are the ones manufacturers sell, already in their useful-life zone.
  2. Useful life zone: The failure rate is approximately constant. It is the longest zone and the one in which systems are usually studied, since they are assumed to be replaced before they reach the aging zone.
  3. Aging zone: The failure rate grows again, because components fail through degradation of their characteristics over time. Even with repairs and maintenance, failure rates increase until maintenance becomes too costly.

Mean Time Between Failures (MTBF)

In practice, reliability is measured as the mean time between maintenance cycles or the mean time between two consecutive failures (Mean Time Between Failures; MTBF).

For example, suppose a product made up of N components operates for a period of time T, and that some components fail during this period (some of them several times). If the i-th component has had n_i failures, then the average number of failures for the product is:

$$\bar{n} = \sum_{i=1}^{N} \frac{n_i}{N}$$

The failure rate (TF) can be expressed in two ways, as a percentage of the units tested or as a number of failures per unit of operating time:

$$TF(\%) = \frac{\text{number of failures}}{\text{number of units tested}} \times 100$$

$$TF(N) = \frac{\text{number of failures}}{\text{total operating time}} \quad [1/h]$$

The MTBF is the quotient of T and $\bar{n}$, that is:

$$MTBF = \frac{T}{\bar{n}}$$

which is the same as:

$$MTBF = \frac{1}{TF(N)} \quad [h]$$

Example

A company bought 30 laptops. 28 of them ran for 2000 h without failing, one failed after 400 h and another failed after 1600 h.

$$TF(\%) = \frac{2}{30} \times 100 = 6.67\%$$
$$TF(N) = \frac{2}{28 \cdot 2000 + 1600 + 400} = \frac{1}{29000} \quad \left[\frac{1}{h}\right]$$
$$MTBF = \frac{1}{\frac{1}{29000}} = 29000 \quad [h]$$
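
The arithmetic of the laptop example can be checked with a short Python sketch. The function names and the representation of each unit as an (hours, failed) pair are illustrative choices, not part of the original text.

```python
# Each unit is recorded as (operating hours, whether it failed).
# The data reproduce the laptop example: 28 survivors and 2 failures.
survivors = [(2000.0, False)] * 28
failures = [(400.0, True), (1600.0, True)]
units = survivors + failures


def count_failures(units):
    """Number of units that failed during the observation period."""
    return sum(1 for _, has_failed in units if has_failed)


def failure_percentage(units):
    """TF(%): failures per 100 units tested."""
    return 100.0 * count_failures(units) / len(units)


def failure_rate_per_hour(units):
    """TF(N): failures per hour of accumulated operating time."""
    total_hours = sum(hours for hours, _ in units)
    return count_failures(units) / total_hours


def mtbf_hours(units):
    """MTBF: accumulated operating time divided by the number of failures."""
    total_hours = sum(hours for hours, _ in units)
    return total_hours / count_failures(units)


print(round(failure_percentage(units), 2))  # 6.67
print(mtbf_hours(units))                    # 29000.0
```

Computing the MTBF as total hours over failures, rather than as the reciprocal of the rounded rate, avoids a small floating-point error in the result.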

Mean Time To Failure (MTTF)

Mean Time To Failure (MTTF) is another parameter used, along with the failure rate λ(t) to specify the quality of a component or system.

For example, suppose N identical elements are tested from instant t=0 and the operating time of each one is measured until a fault occurs. The MTTF is then the average of the measured times $t_i$, that is:

$$MTTF = \frac{\sum_{i=1}^{N} t_i}{N}$$
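
As a minimal sketch, the MTTF of a test batch is just the sample mean of the observed times to failure. The five times below are made-up values for illustration. The failure rate estimate on the last line additionally assumes a constant failure rate, in which case the MTTF equals 1/λ.

```python
from statistics import mean


def mttf(times_to_failure):
    """Mean time to failure: the average of the observed failure times."""
    return mean(times_to_failure)


# Hypothetical times to failure, in hours, for five identical elements.
observed_hours = [820.0, 1130.0, 960.0, 1490.0, 1100.0]

estimate = mttf(observed_hours)
print(estimate)  # 1100.0

# Assumption: constant failure rate, so lambda is the reciprocal of the MTTF.
constant_rate_estimate = 1.0 / estimate
print(constant_rate_estimate)
```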

Mathematical models of probability distribution of failures

In principle, any distribution function can be used to build an equipment lifetime model. In practice, distribution functions with monotonic hazard functions seem more realistic, and within this class there are a few that are considered to provide the most reasonable models of device reliability.

Exponential law of failures: constant failure rate

The distribution function most often used to model reliability is the exponential. The reasons are that:

  • It is simple to treat algebraically.
  • It is considered appropriate for modeling the useful life interval of the device's life cycle.

In fact, the exponential distribution appears when the failure rate is constant, that is, $\lambda(t) = \lambda$. The corresponding reliability function is then

$$R(t) = e^{-\lambda t},$$

the distribution function

$$F(t) = 1 - R(t) = 1 - e^{-\lambda t}$$

and the density function f(t):

$$f(t) = \lambda e^{-\lambda t}.$$

That is, if the failure rate is considered constant, then the distribution function of failures is exponential. From its properties it follows that the probability that a working unit fails in the next instant is independent of how long it has been working. This means that the unit shows no signs of aging: it is just as likely to fail in the next instant when it is new as when it is old.
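
The lack of aging can be illustrated numerically. The sketch below, with hypothetical function names and an arbitrary rate, evaluates the exponential model and compares the probability of surviving a further 1000 h for a new unit and for one that has already run 20000 h.

```python
import math


def reliability(t, lam):
    """R(t) = exp(-lambda * t): probability of surviving beyond time t."""
    return math.exp(-lam * t)


def distribution(t, lam):
    """F(t) = 1 - R(t): probability of having failed by time t."""
    return 1.0 - reliability(t, lam)


def density(t, lam):
    """f(t) = lambda * exp(-lambda * t)."""
    return lam * math.exp(-lam * t)


def conditional_survival(age, extra, lam):
    """P(T > age + extra | T > age) for a unit that has survived to `age`."""
    return reliability(age + extra, lam) / reliability(age, lam)


lam = 1.0 / 29000.0  # hypothetical constant failure rate, in failures per hour

new_unit = conditional_survival(0.0, 1000.0, lam)
old_unit = conditional_survival(20000.0, 1000.0, lam)
print(math.isclose(new_unit, old_unit))  # True
```

The ratio f(t)/R(t), which is the failure rate, also evaluates to the same λ at every t, which is another way of stating the same property.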

Weibull's Law: Increasing and Decreasing Failure Rates

The vast majority of real equipment does not have a constant failure rate: it becomes more likely to fail as it ages, so its failure rate is increasing. It is also possible, however, to find equipment with a decreasing failure rate.

A function that can be used to model increasing or decreasing failure rates is

$$\lambda(t) = \alpha \beta t^{\beta - 1}$$

where $\alpha > 0$ and $\beta > 0$.

This function is increasing when $\beta > 1$, decreasing when $\beta < 1$ and constant when $\beta = 1$.

Graphical representation of power-law curves for increasing and decreasing failure rates.

The associated reliability function R(t) is

$$R(t) = e^{-\alpha t^{\beta}} \quad \text{for all } t \geq 0.$$

Graphical representation of the Weibull curve.

Therefore

$$F(t) = 1 - e^{-\alpha t^{\beta}},$$

which is the Weibull distribution function. The Weibull distribution is frequently used in the development of reliability models. It has the advantage of being flexible enough to model various types of hazard behavior while remaining algebraically manageable. Also, as with any two-parameter distribution, it can describe many real situations quite well.
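
A minimal sketch of the Weibull model in the (α, β) parameterization used here, with hypothetical function names and arbitrary parameter values. It evaluates the failure rate at a few times for three values of β to show the decreasing, constant and increasing cases.

```python
import math


def weibull_hazard(t, alpha, beta):
    """Failure rate lambda(t) = alpha * beta * t**(beta - 1), for t > 0."""
    return alpha * beta * t ** (beta - 1.0)


def weibull_reliability(t, alpha, beta):
    """R(t) = exp(-alpha * t**beta), for t >= 0."""
    return math.exp(-alpha * t ** beta)


def weibull_distribution(t, alpha, beta):
    """F(t) = 1 - R(t)."""
    return 1.0 - weibull_reliability(t, alpha, beta)


times = [0.5, 1.0, 2.0, 4.0]  # arbitrary time points
alpha = 0.1                   # arbitrary scale-related parameter

for beta in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, alpha, beta) for t in times]
    print(beta, [round(rate, 4) for rate in rates])
```

With β = 1 the failure rate reduces to the constant α, so the model coincides with the exponential law with λ = α.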

Lognormal Law

Another popular model is the lognormal distribution, with density function:

$$f(t) = \frac{1}{t\,\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln t - \mu)^{2}}{2\sigma^{2}}}, \quad t > 0$$

where μ and σ are the mean and standard deviation of ln t.

Its hazard function rises to a maximum and then declines, and the distribution is often used to model the reliability of structural and electronic components.

Its disadvantage is that it is quite difficult to treat algebraically, but its advantage is that it arises naturally when a failure results from the product of many small independent random effects, since the logarithm of such a product tends toward a normal distribution. Therefore, it is of considerable practical interest in relation to physical failure processes.
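
Although the lognormal model has no simple closed form for its reliability function, it is easy to evaluate numerically. The sketch below assumes the standard parameterization, in which ln t is normally distributed with mean μ and standard deviation σ. The function names and parameter values are illustrative.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def lognormal_density(t, mu, sigma):
    """Density of a lifetime whose logarithm is Normal(mu, sigma), for t > 0."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def lognormal_reliability(t, mu, sigma):
    """R(t) = 1 - Phi((ln t - mu) / sigma), with Phi the standard normal CDF."""
    return 1.0 - STANDARD_NORMAL.cdf((math.log(t) - mu) / sigma)


def lognormal_hazard(t, mu, sigma):
    """Failure rate f(t) / R(t)."""
    return lognormal_density(t, mu, sigma) / lognormal_reliability(t, mu, sigma)


mu, sigma = 7.0, 0.5  # hypothetical parameters; the median life is exp(mu) hours

for hours in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
    print(hours, lognormal_reliability(hours, mu, sigma), lognormal_hazard(hours, mu, sigma))
```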

Failure processes

The fourth component of the definition of reliability is the environment. Most system failures are caused by forces (energy) that the environment imposes on the system and its components. These forces induce and sustain the progress of various types of deterioration processes, which ultimately result in component failure.

There are two types of component degradation process models: mechanical failure models and electronic failure models.

Mechanical failure models

In mechanics, failure models have been developed from either a mechanical or a chemical-electrical perspective. From the mechanical perspective, the reliability of equipment is usually considered to depend on structural integrity, which is influenced by applied loads and inherent strength. From the chemical-electrical perspective, reliability has usually been considered to depend on the stability of the materials when exposed to hostile chemical reactions such as oxidation.

An early and still popular representation of the reliability of a mechanical device is the "stress-strength interference" model:

$$R = P(X > Y) = \int_{0}^{\infty} h(y)\,(1 - G(y))\,dy = \int_{0}^{\infty} H(x)\,g(x)\,dx$$

where the stress, Y, is modeled by the distribution function H(y) with density h(y), and the random dispersion in the inherent strength of the devices, X, is modeled by the distribution function G(x) with density g(x).
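
The interference integral can be evaluated numerically once distributions are chosen for stress and strength. The sketch below assumes, purely for illustration, that both are independent and normally distributed with made-up parameters. In that special case the result can be compared with the known closed form Φ((μ_X − μ_Y) / √(σ_X² + σ_Y²)).

```python
import math
from statistics import NormalDist


def interference_reliability(stress, strength, lower, upper, steps=20000):
    """Trapezoidal estimate of the integral of h(y) * (1 - G(y)) on [lower, upper].

    `stress` supplies the density h and `strength` the distribution function G.
    The interval must cover practically all of the stress distribution.
    """
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lower + k * width
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * stress.pdf(y) * (1.0 - strength.cdf(y))
    return total * width


# Hypothetical stress Y and strength X, in the same units (for example MPa).
stress = NormalDist(mu=300.0, sigma=30.0)
strength = NormalDist(mu=400.0, sigma=40.0)

numeric = interference_reliability(stress, strength, 0.0, 800.0)

# Closed form, valid only for independent normal stress and strength.
z = (strength.mean - stress.mean) / math.hypot(strength.stdev, stress.stdev)
closed_form = NormalDist().cdf(z)

print(numeric, closed_form)
```

The numerical routine itself does not depend on the normal assumption: any pair of objects exposing `pdf` and `cdf` methods could be passed in.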

Electronic failure models

The reliability models of electrical and electronic devices are based on empirical observations and were developed after the mechanical reliability models.

Most of the models developed are based on the idea that the degradation processes of electronic devices are essentially chemical conversion reactions that take place in the materials that make up the devices. Consequently, many models are based on the Arrhenius reaction rate equation, named after the 19th-century chemist who developed it while studying irreversible reactions such as oxidation. The basic form of the equation gives the reaction rate, ρ, as:

$$\rho = \eta\, e^{-\frac{E_a}{KT}}$$

where η is an electron frequency factor, K is Boltzmann's constant ($8.617 \times 10^{-5}$ eV/K), T is the temperature in kelvins, and $E_a$ is the Gibbs free energy of activation.
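
One way to use the Arrhenius equation is to compare the reaction rate at two temperatures: because η cancels in the ratio, only the activation energy and the temperatures are needed. The sketch below is illustrative. The function names, the activation energy of 0.7 eV and the two temperatures are assumptions, not values from the text.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann's constant in eV/K (rounded)


def reaction_rate(eta, activation_energy_ev, temperature_k):
    """Arrhenius rate: rho = eta * exp(-Ea / (K * T))."""
    return eta * math.exp(-activation_energy_ev / (BOLTZMANN_EV_PER_K * temperature_k))


def acceleration_factor(activation_energy_ev, use_temp_k, stress_temp_k):
    """Ratio of the reaction rate at the stress temperature to that at the use temperature."""
    # eta cancels in the ratio, so any positive value works here.
    stressed = reaction_rate(1.0, activation_energy_ev, stress_temp_k)
    normal = reaction_rate(1.0, activation_energy_ev, use_temp_k)
    return stressed / normal


# Hypothetical case: Ea = 0.7 eV, use at 55 degrees C, stress test at 125 degrees C.
factor = acceleration_factor(0.7, use_temp_k=328.15, stress_temp_k=398.15)
print(factor)
```

A factor greater than 1 means the degradation reaction proceeds faster at the higher temperature, which is the effect that tests under extreme conditions rely on.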

Other aspects of failure processes

Other points to take into account when possible failures appear in systems include aging acceleration (the operating environment can be manipulated to increase the aging rate of a sample of devices) and reliability growth (the belief that the design and development of a new device, together with the evolution of the manufacturing methods for the new design, improve the reliability of a sample of devices).

Reliability Management

Introduction

A truly effective reliability program can only exist in an organization where meeting reliability objectives is recognized as an integral part of corporate strategy. Otherwise, it is one of the first to be cut as soon as there are cost or deadline pressures.

Integrated reliability programs

Since production quality will be the ultimate determinant of reliability, quality control is an integral part of the reliability program. The quality control program must be based on reliability requirements and not be aimed solely at reducing production costs. It will contribute effectively to reliability if its procedures are linked to factors that may influence reliability, and not just to form or function, if QA test data are integrated with other reliability data, and if QA personnel are trained to recognize the reliability relevance of their work and motivated to contribute to it.

Reliability and associated costs

Reaching high reliability objectives is expensive, especially when the product or system is complex. Even so, experience shows that the efforts of a well-managed reliability program pay off, since it is less expensive to discover and correct deficiencies during design and development than to correct the consequences of failures that occur during operation of the product or system. The type of cost involved depends on the nature of the program. If the program concerns designing a product and placing it on the market, the relevant cost is the reliability cost; if it concerns designing a system at the specific request of a client, it is the life cycle cost.

The reliability cost includes all the costs incurred during design, production, warranty, etc., and is based on the customer-user pairing. The life cycle cost is made up of all the costs incurred by the system throughout its life, from conception to its removal at the end of its useful life, and is based on the perspective of a manufacturer with limited liability during the life of the product. The elements that make up each type of cost are represented below:

It should be noted that reliability programs are usually limited by the resources that can be allocated to them during the design and development phases. The allocation of resources to the activities of a reliability program must be based on an assessment of the associated risks, which is a subjective judgment based on experience.

The reliability of a system is related to its design and development cost, as represented below:

Reliability management by customer

Reliability management by the customer covers the responsibilities of the contracting organization regarding the development of the reliability program:

  • Specify reliability requirements
  • Specify the rules and methods to follow
  • Specify reporting requirements
  • Establish the contractual framework
  • Control the performance of the contract

Reliability requirements

Designs aimed at improving reliability are based on requirements that define the need to be satisfied. Reliability requirement specifications must contain the following:

  • A definition of failures related to the functions of the system, including all relevant failure modes.
  • A complete description of the environments in which the product or system will be stored, transported, used or maintained.
  • A clear specification of the reliability requirement.
  • A list of the failure modes (with their effects) that are particularly critical and should have a very low probability of occurrence.

Special care must be taken to define failures unambiguously. Each failure must always be related to a parameter that can be measured or linked to a clear indication free of subjective interpretation. Even so, subjective variations may still appear when failures are validated (usually when the origin of the data is not controlled). The environment specifications must include loads, temperatures, humidity, vibrations and all other parameters that may affect the probability of product or system failure. These requirements must be stated so that they are verifiable and logical, and must be related to the corresponding distributions.

Contracts with incentives

One of the best known is the Reliability Improvement Warranty (RIW). This type of contract requires the contractor to take charge of all maintenance for a given period (usually several years) for a predetermined amount. The contractor is thereby motivated to maximize its profit by improving reliability and thus reducing maintenance costs. The contracting party also benefits: it does not have to monitor the development of the reliability program so closely, nor take care of maintenance during the agreed period. The fundamental aspects of this type of contract are listed below:

  • This type of contract is used for products or systems that carry no high development risk and whose use is stable.
  • The predetermined amount should give the contractor a substantial profit at a reasonable risk.
  • Because practical difficulties can arise, equipment covered by an RIW requires special labelling and handling to ensure that it is not repaired by anyone other than the contractor.
  • The contractor must have wide latitude to modify the equipment covered by the contract in order to improve reliability, although the contracting party may consider it appropriate to retain some control over the changes, since they may affect the performance of the product or system.
  • The contract must stipulate how the product or system will be used and maintained, since these aspects may affect its reliability.
