Design for High Availability

Analysis method for improving availability of software solutions

Software solutions must satisfy a number of quality factors, and availability is one measure for evaluating the quality of a solution. Availability depends as much on the software as on the physical deployment environment. In a distributed solution, the impact of the physical environment on availability becomes more important, since the components of the software solution may depend on multiple physical components. Sometimes the physical deployment environment is unknown, yet the software designer must still try to maximize availability.

Evaluating the availability of a solution requires hardware abstraction because details of the hardware are not always available. A virtual layer between the logic layer and the physical layer makes such abstraction possible. The logic layer represents the logic components of the solution, and the physical layer is the physical implementation of these components in hardware or software. The virtual layer abstracts the physical substrate into categories that represent the virtual components upon which the logic components of the solution depend. Virtual components are grouped into three categories: transition, processing, and immutability.

Within the framework of this work, a software solution comprises logic components and virtual components, each offering a unique service, and the solution must deliver all of the functionality for which it was designed. Duplication is the proposed means of improving availability: when a component is unavailable, the components that depend on it can consume the service of its duplicate instead.
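As a rough illustration of why duplication helps, consider a single component of availability A with one duplicate. Assuming the two copies fail independently (an assumption made here for illustration, not stated in the text) and that either copy can serve the request, the pair is unavailable only when both copies are unavailable. The value 0.95 is a hypothetical figure.

```latex
A_{\text{dup}} = 1 - (1 - A)^2
\qquad\text{e.g.}\qquad
A = 0.95 \;\Rightarrow\; A_{\text{dup}} = 1 - (0.05)^2 = 0.9975
```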

To determine whether duplicating some components improves availability, the methodology of Tsai and Sang [23] is adapted to account for virtualization. It quantifies the availability of a software solution by combining the logic and virtual components in series or in parallel until the total availability of the solution is established. Some assumptions can be made based on the actual availability of the physical components that implement the virtual components. Where no assumption can be made, a generic, pessimistic availability can be used instead in order to identify the risks associated with the use of those components. A search for bottlenecks can then target the component or components whose improved availability will have the greatest impact on the software solution.
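The series and parallel combination and the bottleneck search can be sketched as follows. This is a minimal sketch using the standard reliability formulas for independent components, not necessarily the exact formulation of Tsai and Sang [23]. The component names, the availability values, and the helper functions `series`, `parallel`, `solution_availability`, and `bottleneck` are all hypothetical and chosen only for illustration.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Every component is required, so the availabilities multiply."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Any one component suffices, so the unavailabilities multiply."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical virtual components, one per category, all required (in series).
# The values are assumed; 0.95 stands in for a generic, pessimistic availability.
COMPONENTS = {
    "transition": 0.95,
    "processing": 0.99,
    "immutability": 0.999,
}


def solution_availability(components, duplicated=frozenset()):
    """Total availability when the components named in `duplicated`
    are each backed by one identical, independently failing duplicate."""
    return series(
        *(
            parallel(a, a) if name in duplicated else a
            for name, a in components.items()
        )
    )


def bottleneck(components):
    """Name of the component whose duplication yields the largest gain."""
    base = solution_availability(components)
    return max(
        components,
        key=lambda name: solution_availability(components, {name}) - base,
    )


if __name__ == "__main__":
    print("baseline:", solution_availability(COMPONENTS))
    target = bottleneck(COMPONENTS)
    print("bottleneck:", target)
    print("after duplication:", solution_availability(COMPONENTS, {target}))
```

In this sketch the least available component is also the bottleneck, because every component sits in a single series chain. With a more elaborate topology, the same search would have to evaluate the full series and parallel structure rather than a flat product.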

The methodology is validated against a real case, first without assumptions about the availability of the physical implementation and then with such assumptions. A final scenario, this time deployed using cloud technologies, concludes the assessment of actual cases.

The proposed analysis method proves successful in improving the availability of a software solution. Virtualization adds a level of abstraction for the evaluation of virtual components, and duplication is a way to improve availability. The method could be adapted to include the availability of the logic components in addition to that of the virtual components. The methodology provides a means to improve availability, but improving availability is not limited to creating duplicates. Other considerations must also be taken into account, such as implementation costs or the criticality of certain logic with respect to the other components in the solution.