Method for calculating availability

The work of Tsai and Sang [23] describes a method of analyzing solutions based on the risk of unavailability of their constituent systems.  The systems are interconnected in one of two ways: in series or in parallel.  The systems are weighted according to their risk of unavailability, which allows the availability of the solution as a whole to be assessed, as described in section Availability of a solution.

The availability of a solution is the calculated result of availability of the series and parallel systems that comprise the solution, or the calculated availability of the single constituent component, where appropriate.  This calculation follows from the implementation of the logic layer by the physical layer of the systems themselves.

The present study does not quantify the availability of the logic components of the solution.  The internal availability of the logic components is known in advance, during the design of the solution, and an availability of 1 is therefore used in this analysis.  The current framework highlights ways to increase availability by modifying only the architecture of the solution (the number of logic components, the number of virtual components, along with their dependencies and communication links), without changing the design (the internal implementation of the logic components).

The availability of the virtual components on which a logic component depends directly impacts the system consisting of that logic component and those virtual components.  Since the availability of virtual components is almost always less than 1, as discussed in section Availability of a solution, and since the logic component itself is assumed fully available, the availability of virtual components can only reduce the total availability of the system.

The terminology used in the analysis method of Tsai and Sang [5] is slightly different from that used in this study.  The context of this analysis must not only take into account the availability of logic components, but also the availability of virtual components on which each logic component depends. Logic components and virtual components are considered equivalent to the systems and subsystems of the method of Tsai and Sang [5].

The systems are entities entirely dependent on subsystems, and these subsystems may be in series or in parallel.  In the same sense, a solution is composed of logic components which themselves depend on virtual components that can be duplicated or not.

Therefore, a logic component and the virtual components on which it depends constitute a combination of systems from which the availability of this component can be established.  The figure below illustrates the dependencies between logic and virtual components and the dependencies of a model based on systems.

After identifying the logic components and virtual components, we can then put them in series or in parallel like the systems of Tsai and Sang.

Virtual components on which a logic component depends have an implementation in the physical viewpoint of the architecture.  To calculate the availability of one of these virtual components, one must know the availability of its physical implementation.  Suppliers of physical components usually have statistics from which availability can be extrapolated.

The average recovery time is the time required to restart a solution following the failure of a device.  This average recovery time (Mean Time To Recover, MTTR), combined with the average time between failures (Mean Time Between Failures, MTBF), gives the actual availability of a software component.  The availability of a component is therefore calculated as follows:

$$A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$

For example, based on a sample of one year:

  • Western Digital Velociraptor hard drives have an availability of 99.99942% based on an average recovery time of 8 hours. [25]
  • Intel S500PSL motherboards have an availability of 99.95081% based on an average recovery time of 48 hours. [11]
  • Highpoint Rocket Raid controllers have an availability of 99.99739% based on an average recovery time of 24 hours. [9]

Again according to Tsai and Sang [5], a solution whose systems are weighted by availability has an overall availability that can be calculated; their method, however, does not address how the availability of the individual systems is to be evaluated.  While the availability of a component's dependencies is taken into account, the framework of the present study abstracts away the availability of the component itself and assumes that it is always available.

Virtual components have an availability that we do not know in the virtual layer.  We cannot predict their availability since we do not know the environment in which each component will be deployed.  The purpose of this study being to foreground the most at-risk components, the unknown availability of a virtual component has a major impact on the analysis of the total availability of the solution.  A pessimistic availability should be used.

This availability is determined from a pessimistic assumption of one failure per month over a period of one year.  With 12 failures per period of 365.25 days (8,766 hours), the MTBF is approximately 731 hours.  A second assumption is that 48 hours are needed to recover from each failure.  Based on the availability calculation, 731 / (731 + 48), the pessimistic availability that will be used is 0.9384 (93.84%).
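The pessimistic figure can be checked with a short Python sketch. The function name `availability` and the constant `HOURS_PER_YEAR` are illustrative names introduced here; the formula is the MTBF/MTTR ratio described above, and the inputs are the two assumptions stated in the text.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours, the one-year sample period


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as the share of time a component is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# One failure per month over a year: 8766 / 12 = 730.5 hours between failures.
exact_mtbf = HOURS_PER_YEAR / 12

# The text rounds the MTBF up to 731 hours and assumes 48 hours of recovery.
pessimistic = availability(mtbf_hours=731, mttr_hours=48)
print(round(pessimistic, 4))  # 0.9384
```

Note that Python's built-in `round(730.5)` would give 730 rather than 731, which is why the rounded value from the text is passed in directly.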

Since the incoming dependencies of virtual components come directly from a logic component, the availability of that logic component can be precalculated from the virtual components on which it depends, which simplifies the lowest level of abstraction of the virtual viewpoint.

Although the availability of virtual components is unknown at this stage, some assumptions can be made so that the generic availability is more representative of the target environment.  Software development usually targets a typical physical environment, whose constraints allow such assumptions to be made.

Software could target a specific platform in order to use an API, simplifying the development of the software solution.  This API may be deployable only for certain instruction sets, such as the Intel x86 instruction set, reducing the deployment possibilities of the software.  Virtual processing components can also be based on physical components having restrictions, such as a specific instruction set.  In this case, an assumption may be added to the study: for example, that all virtual processing components have an availability reflecting their implementation, using an availability of 0.95 in place of the assumed 0.9384.

With the aim of simplifying the calculations, the explanation of the methodology of this chapter is based on a pessimistic availability of 0.5, while recognizing that this availability is unrealistic and should not be used in the framework of an analysis.

Availability of systems connected in series can be calculated by multiplying the availabilities of the linked subsystems.  For example, for a solution comprising the two components D1 and D2, the solution would have an availability of A = D1 × D2.  Using the fixed availability of 0.5 from the previous section, the solution would have an availability of A = 0.5 × 0.5, or 0.25.
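The series rule can be sketched in Python as a simple product. The function name `series_availability` is introduced here for illustration, and the inputs use the chapter's deliberately unrealistic value of 0.5.

```python
from math import prod


def series_availability(availabilities: list[float]) -> float:
    """Availability of components in series: every one of them must be up."""
    return prod(availabilities)


# Two dependencies in series, each at the illustrative availability of 0.5.
d1, d2 = 0.5, 0.5
print(series_availability([d1, d2]))  # 0.25
```

Because each factor is at most 1, adding a component in series can never raise the result.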

When there is a duplicate system, all duplicates are connected to the rest of the solution in parallel.  Component availability can then be calculated by applying the function:

$$A = 1 - \prod_{i=1}^{n} (1 - D_i)$$

So for a component with two dependencies in parallel, D1 and D2, the component would have an availability of:

$$A = 1 - (1 - D_1)(1 - D_2)$$
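As a minimal sketch, the usual parallel rule (a set of duplicates is unavailable only when every duplicate is unavailable) can be written in Python as follows. The function name `parallel_availability` is an illustrative choice, and the inputs again use the chapter's value of 0.5.

```python
from math import prod


def parallel_availability(availabilities):
    """Availability of duplicates in parallel: down only if all of them are down."""
    return 1 - prod(1 - a for a in availabilities)


# Two duplicates at 0.5 each, then three, to show the effect of duplication.
print(parallel_availability([0.5, 0.5]))       # 0.75
print(parallel_availability([0.5, 0.5, 0.5]))  # 0.875
```

Each added duplicate raises the availability of the group, which is the architectural lever the framework relies on.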

The availability for the figure above is calculated as follows:

The framework of analysis divides a solution into parallel and series systems.  However, a solution could be more complex and use a combination of parallel and series systems.  In the case of these complex solutions, the solution must be divided by isolating simple systems in order to be able to use the method of analysis based only on parallel and series systems.  For example:

Availability must be calculated by isolating subsystems C2 and C2′, subsystems C3 and C3′, and then calculating the availability of component C1 with the result of the last two calculations using a series system calculation.
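The decomposition can be written out as a worked calculation. This is a sketch under two assumptions that the text does not state for this example: every duplicated subsystem takes the chapter's illustrative availability of 0.5, and C1 is treated as a logic component with an availability of 1. The labels A_2 and A_3 for the intermediate results are introduced here.

```latex
\begin{align*}
A_2 &= 1 - (1 - C_2)(1 - C_2') = 1 - 0.5 \times 0.5 = 0.75 \\
A_3 &= 1 - (1 - C_3)(1 - C_3') = 1 - 0.5 \times 0.5 = 0.75 \\
A   &= C_1 \times A_2 \times A_3 = 1 \times 0.75 \times 0.75 = 0.5625
\end{align*}
```

If C1 had an availability below 1, it would simply enter the final product as a third factor.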

The availability of a system made of the sequence of components

$$C_1, C_2, \ldots, C_n$$

is

$$A = \prod_{i=1}^{n} C_i$$

where Ci is either a constant (component as is), a product of availability (sequence of components), or the result of calculating the availability of the duplicates.
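Since each Ci may itself be a constant, a sequence, or a set of duplicates, the calculation is naturally recursive. The Python sketch below illustrates this with a hypothetical nested-tuple encoding (`"series"` and `"parallel"` tags) that is not part of the method of Tsai and Sang. It is applied to the C1, C2/C2′, C3/C3′ example, assuming an availability of 0.5 for the duplicated subsystems and 1 for C1.

```python
from math import prod

# A node is either a number (a component's availability taken as is),
# ("series", [nodes]) or ("parallel", [nodes]). This encoding is hypothetical.


def evaluate(node):
    """Recursively reduce a series/parallel structure to one availability."""
    if isinstance(node, (int, float)):
        return float(node)
    kind, children = node
    values = [evaluate(child) for child in children]
    if kind == "series":
        return prod(values)
    if kind == "parallel":
        return 1 - prod(1 - v for v in values)
    raise ValueError(f"unknown node kind: {kind!r}")


solution = ("series", [
    1.0,                       # C1, assumed fully available
    ("parallel", [0.5, 0.5]),  # C2 and C2'
    ("parallel", [0.5, 0.5]),  # C3 and C3'
])
print(evaluate(solution))  # 0.5625
```

Isolating the simple systems first, as the text prescribes, corresponds to evaluating the innermost nodes before their parents.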