Reliability in Water and Pumping Systems

Part 1. Determining reliability in pumping plants and water systems.

By Ed Butts, PE, CPI

For years, we have been bombarded with the need to increase the efficiency of our pumping plants to improve performance and lower operating costs, but what about the reliability of the plant?

Increasingly, water system managers and engineers who are responsible for designing systems or providing water to a public or private water system or industrial process are incorporating a reliability factor along with efficiency into their designs and long-range plans.

Given rising operating and capital investment costs, using reliability as a design element is becoming more important for reducing life cycle costs. This month, we will begin a three-part series on defining reliability and how to determine it in pumping plants and water systems.

Defining Reliability

Reliability describes “the ability of a system or component to function under its stated conditions for a specified period of time.” During this operational period, no unexpected repair is required or performed, and the system essentially adheres to the defined performance specifications.

Reliability can best be defined as the relationship between two elements: simplicity and redundancy. Generally, the simpler a design is with at least one level of redundancy, the more reliable it will be.

For example, an ordinary pressurized water system with standard “on-off” control, using two basic snap-action pressure switches in parallel so that either switch can control the pump, provides both simplicity and redundancy and elevates the system’s operational reliability.

However, this does not necessarily increase the overall or system reliability, as the motor, pump, and other ancillary devices in the system will still possess their own individual reliability factors.
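The two-switch example can be sketched numerically. This is a minimal illustration, assuming independent failures and using hypothetical reliability values (not field data) for each switch, the motor, and the pump over the same operating period. Redundant (parallel) components fail only if all of them fail; components in series work only if all of them work.

```python
from math import prod


def parallel(*reliabilities: float) -> float:
    """Redundant components: the group fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def series(*reliabilities: float) -> float:
    """Series components: the system works only if every component works."""
    return prod(reliabilities)


# Hypothetical reliabilities over the same operating period (illustrative only).
SWITCH = 0.90
MOTOR = 0.95
PUMP = 0.90

switch_pair = parallel(SWITCH, SWITCH)      # two pressure switches in parallel
system = series(switch_pair, MOTOR, PUMP)   # switch pair, motor, and pump in series

print(f"Single switch:  {SWITCH:.4f}")
print(f"Two switches:   {switch_pair:.4f}")
print(f"Overall system: {system:.4f}")
```

The redundant switch pair reaches 0.99, but the overall figure stays below the motor-and-pump product of 0.855: redundancy in the controls cannot lift the system above its series components.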

The reliability function is theoretically defined as the probability of success at a time (t), which is denoted as R(t). This probability is estimated from previous data sets or through actual testing.
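As a minimal sketch of estimating R(t) from test data, the fraction of a tested population still running at time t can serve as the estimate. The function name, the 20-unit population, and the failure times below are hypothetical.

```python
def empirical_reliability(failure_times: list[float], population: int, t: float) -> float:
    """Estimate R(t) as the fraction of a tested population still running at time t.

    failure_times: hours at which individual units failed during the test.
    population: total number of units placed on test.
    """
    failed_by_t = sum(1 for ft in failure_times if ft <= t)
    return (population - failed_by_t) / population


# Hypothetical test: 20 identical units, 5 of which failed during the test.
failures = [1_200.0, 3_400.0, 5_100.0, 7_800.0, 9_900.0]
for hours in (2_000, 5_000, 10_000):
    r = empirical_reliability(failures, population=20, t=hours)
    print(f"R({hours:,} h) = {r:.2f}")
```

Real records usually include units removed or still running when the record ends (censored data), which this simple count does not handle.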

The system’s primary equipment, including the pump and driver, has the largest influence on the overall system reliability. In most cases, the present industry standard of care for equipment specifications typically does not include a reliability function or performance requirement.

The closest most equipment specifications come to mandating reliability is by specifying basic component life requirements, such as bearings with an L life (e.g., an L10 life that equals 40,000 hours or an L50 life, which equals 200,000 hours).

In the simplest terms, an L10 life is the number of hours a bearing will last, with 90% reliability, under a given load and speed. This means that at the applied load and speed, about 10% of a population of identical bearings would be expected to suffer a fatigue failure before reaching the L10 life. Thus, if the specified equipment has a reliability requirement above 90%, the present standard of care for equipment specifications is regarded as inadequate.
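The L10 and L50 figures quoted above can be connected with a two-parameter Weibull distribution (one of the prediction methods listed later in this column). This is a sketch under that assumption only: the slope is inferred from the example pair of 40,000 and 200,000 hours, not taken from any bearing catalog, and the function names are hypothetical.

```python
import math

R10, R50 = 0.90, 0.50  # survival probabilities that define the L10 and L50 lives


def weibull_slope(l10: float, l50: float) -> float:
    """Weibull shape (slope) implied by an L10 and an L50 life of the same bearing.

    With R(t) = exp(-(t / eta) ** beta), ln(1/R) = (t / eta) ** beta, so the
    ratio of the two lives eliminates eta and leaves only beta.
    """
    return math.log(math.log(1 / R50) / math.log(1 / R10)) / math.log(l50 / l10)


def bearing_reliability(t: float, l10: float, beta: float) -> float:
    """Probability that a bearing survives t hours, anchored so R(L10) = 0.90."""
    return math.exp(-math.log(1 / R10) * (t / l10) ** beta)


l10_hours, l50_hours = 40_000.0, 200_000.0  # example values from the text
beta = weibull_slope(l10_hours, l50_hours)
print(f"Implied Weibull slope: {beta:.2f}")
for hours in (20_000, 40_000, 100_000, 200_000):
    print(f"R({hours:,} h) = {bearing_reliability(hours, l10_hours, beta):.3f}")
```

Under this assumption the example pair implies a slope of about 1.17, and the curve returns 90% at the L10 life and 50% at the L50 life.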

However, if a system failure would not cause a substantial disruption or impact to water delivery, the system reliability does not necessarily need to be very high. For example, a water system with an adequate supply of gravity water storage or backup water supplies may need a pumping system reliability of only 50% to 75%. In most cases, this type of water system would not require a specified reliability level.

Conversely, if a potential system or component failure could present a substantial disruption to water service, such as with many closed-loop systems without automatic standby or gravity backup sources, the system or component reliability would need to be very high.

As another example, a system reliability of 80% with a 5-year mean time between failure (MTBF) for the pump would probably not be acceptable due to the likelihood of a failure occurring during the five years between pump rebuilds. A reliability level of 90%, 95%, or 99% would typically result in fewer than one probable failure during the five-year period between rebuilds, and one of these levels would likely be selected.
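To see what these targets demand, the constant-failure-rate equation given later in this column, R(t) = e^(−t/MTBF), can be rearranged to MTBF = −t ÷ ln(R). The sketch below assumes that exponential model and a five-year rebuild interval; the function name is hypothetical.

```python
import math


def required_mtbf(target_reliability: float, period: float) -> float:
    """MTBF needed so that R(period) equals the target, assuming a constant failure rate.

    Rearranges R(t) = exp(-t / MTBF) into MTBF = -t / ln(R).
    """
    return -period / math.log(target_reliability)


REBUILD_INTERVAL_YEARS = 5.0
for target in (0.80, 0.90, 0.95, 0.99):
    mtbf = required_mtbf(target, REBUILD_INTERVAL_YEARS)
    print(f"R = {target:.0%}: chance of failure = {1 - target:.0%}, "
          f"MTBF needed = {mtbf:.1f} years")
```

The chance of at least one failure in the interval is simply 1 − R, and each step up in reliability requires a markedly longer MTBF.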

Thus, the selected reliability needs to be based upon the acceptable degree of risk, including the period of loss and the potential monetary cost of failure. Offsetting factors include sanitary and health concerns along with customer impact and satisfaction. These are weighed against the monetary cost and scope of procuring, constructing, operating, and maintaining the higher-reliability components and backup systems.

There are several methods used to predict component, product, or system reliability, including Monte Carlo simulation, Weibull analysis, statistical analysis, the manufacturer’s or vendor’s tested mean, and other more sophisticated modeling methods. However, to keep most calculations simple, this discussion of pumping and water system reliability will be limited to reliability estimates obtained from the actual or predicted failure rate and its related factors.

Even though the information and procedures in this column are potentially helpful to everyone involved or interested in pumping and water system operation, they are primarily directed toward the engineers, managers, planners, and others charged with determining the reliability needed for long-range planning and equipment procurement or replacement decisions.

In addition, although actual, real-world operating, maintenance, and repair data is always preferred over predicted or assumed values, in many instances with new pumps, drivers, or installations, actual or adequate data is unavailable or hasn’t yet been generated. In fact, this needed data is often incomplete or fragmented, even for many older pumping installations with established operating lives.

After a new system has operated for an adequate period of time or enough data is collected from an older pumping system, failure and maintenance data and frequency of failure can be used to pinpoint the failure rate and other parameters for greater accuracy.

In situations without real-world data, reliability prediction is the second-best method. It describes the process used to estimate the constant failure rate (λ) during the presumed useful life (t) of a product or system by using established norms for similar equipment.

However, the prediction method is not always feasible or accurate because predictions generally assume that:

  • The design is ideal (perfect), all stresses are known, and everything is operating within the respective device’s allowable ratings at all times, such that random failures will be the only failures that will occur.
  • Every failure that occurs to any part of the equipment will result in failure to the entire piece of equipment.
  • The actual pumpage (pumped fluid) conditions will match the predicted or ideal fluid conditions.
  • The database is always valid.
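A minimal parts-count sketch shows how the first two assumptions above drive a prediction: each part is given a constant (random) failure rate, and because any part failure is treated as a unit failure, the rates simply add. The component list and every rate below are hypothetical placeholders, not values from any prediction handbook.

```python
# Hypothetical component failure rates in failures per million operating hours.
# Placeholders for illustration, not values from any prediction handbook.
COMPONENT_RATES = {
    "pump bearings": 20.0,
    "mechanical seal": 35.0,
    "motor windings": 10.0,
    "motor bearings": 15.0,
    "motor starter": 5.0,
}


def unit_failure_rate(rates_per_million: dict[str, float]) -> float:
    """Parts-count sum: any component failure is treated as a unit failure."""
    return sum(rates_per_million.values()) / 1e6  # failures per hour


lam = unit_failure_rate(COMPONENT_RATES)
mtbf_hours = 1.0 / lam
print(f"Predicted failure rate: {lam * 1e6:.0f} per million hours")
print(f"Predicted MTBF:         {mtbf_hours:,.0f} hours")
```

If any of those assumptions fails (a part failure that does not stop the unit, or operation outside the part’s ratings), the summed rate no longer represents the real equipment.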

Obviously, one or more of these assumptions is sometimes invalid. The design can be, and almost always is, less than perfect. Not every failure of a part will cause the entire piece of equipment to fail, and the pumped fluid can vary from the predicted fluid in temperature, sand content, viscosity, entrained air, etc. Finally, the accumulated database is likely to be at least 10 to 15 years out of date.

However, none of this matters if the predictions are used to compare different approaches rather than to establish an absolute figure for reliability, since comparison is what predictions were originally designed for.

Some prediction manuals allow vendor reliability data to be substituted for the recommended database values where such data is known and generally reliable. However, such data depends heavily on the specific environment in which it was measured. Thus, any prediction based on such data would need to be adjusted for actual operating conditions and could no longer be relied on for comparison purposes.

The reliability of a pumping or water system differs vastly in context and definition from efficiency, although the two terms are often used interchangeably. Efficiency is defined as the ratio of the useful power produced (output) to the power supplied (input) to perform a specific task. System reliability, by contrast, can be a critical, long-term operational factor that depends heavily on several variables, including adherence to the original design; the relative reliability of the individual system components; the integrity and workmanship of the installation; operating time at the design and variable service conditions such as vibration, misalignment, piping strain, low/high voltage, variation from the BEP and COS, etc.; ongoing maintenance and repair; and the efficiency of system components.
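For contrast with reliability, pump efficiency is a single ratio that can be computed directly. This sketch uses the common U.S. customary form, water horsepower = flow (gpm) × head (ft) × specific gravity ÷ 3,960, divided by brake horsepower; the operating point is hypothetical.

```python
def water_horsepower(flow_gpm: float, head_ft: float, specific_gravity: float = 1.0) -> float:
    """Hydraulic power delivered to the fluid (output), in horsepower."""
    return flow_gpm * head_ft * specific_gravity / 3960.0


def pump_efficiency(flow_gpm: float, head_ft: float, brake_hp: float,
                    specific_gravity: float = 1.0) -> float:
    """Efficiency = output (water) horsepower / input (brake) horsepower."""
    return water_horsepower(flow_gpm, head_ft, specific_gravity) / brake_hp


# Hypothetical operating point.
eff = pump_efficiency(flow_gpm=500.0, head_ft=200.0, brake_hp=33.7)
print(f"Pump efficiency: {eff:.1%}")
```

Nothing in this calculation involves time or probability, which is the essential difference from the reliability function R(t).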

This column initially outlines the various factors associated with component, system, and source reliability and a few of the methods available to improve the reliability of each. Then, the metrics and equations used to determine the different elements of reliability will be outlined along with specific examples in our second column, and we’ll finish with the procedures and equations for determining the individual component and system reliability for a pumping or water system in part three.

To summarize:

  1. Reliability is a probability of event occurrence. A pumping, mechanical, or electrical system failure can statistically occur during the first week of operation or may not occur until after 20 full years of operation.
  2. The probability of a component or system failure can be reduced but can never be eliminated.
  3. The overall system reliability is the cumulative result of the reliability of the system’s equipment and each of the equipment’s individual components.
  4. Although reliability and efficiency are different terms and definitions, they are not mutually exclusive, as equipment with higher efficiency is often designed and constructed for higher reliability and vice versa.
  5. There is a point of diminishing return where the cost to obtain a higher reliability begins to substantially outweigh the actual benefits received.

Component (Pump) Product Development and Reliability

A water system’s component reliability estimate is generally developed by an original equipment manufacturer (OEM) as part of product development and is vested in the many separate steps involved in the design, material selection, production, manufacture, assembly, and shipment of the product.

For most water systems, the two major components are the pump and its driver, which is usually an electric motor or internal combustion engine. Ancillary and support devices such as valves, piping, tanks, electrical equipment, and electric motor starters are also considered integral water system components.

For a reliability analysis, these two major components, although often supplied by different manufacturers, are frequently combined and treated as a single pumping plant or unit, or as a pumping system when multiple units are used.

The component design process is often the same for each product. It begins with market research to determine the actual need and the associated competition. If a need or viable market for the component exists, this is followed by a review of the intended service conditions and boundaries, operating environment, applicable material science, available technology, and manufacturability during a conceptual or pre-design phase.

The next design step usually includes preparing geometric dimensions and patterns and determining unit stresses, followed by material selection, usually with the assistance of computer-aided design (CAD).

Generally, a reduced-scale mathematical, computer, or real-world physical model is created at this juncture and used to evaluate the design as well as enhance the future product testing and evaluation processes generally conducted by system analysts and end-users. A model is a computer-rendered or physically reduced-scale representation of a system, component, or product used to forecast the behavior of the full-scale component or system in some desired condition.

Reduced-scale models are built in a wide range of ratios to provide a realistic estimate of full-scale operation and material stresses. The most common test scales are 1:4, 1:8, 1:12, 1:16, 1:18, 1:24, 1:48, and 1:72.

Models are particularly helpful in fluid mechanics when evaluating large and complex hydraulic structures or machines such as dams, filtration equipment, pumps and turbines, and spillways. Hydraulic modeling of certain structures, such as low-submergence sumps, approach pipes, or intake galleries used for high-capacity axial flow pumps, is often critical to ensure that vortex and cavitation conditions, both of which reduce reliability, do not occur at the pump.

This generally uses a process known as similarity or hydraulic similitude. A model is said to have similarity with the actual case if the two have geometric similarity, kinematic similarity, and dynamic similarity.

The term dynamic similarity implies that geometric and kinematic similarities are satisfied. Sometimes, instead of similarity, similitude can be used. Similitude is basically defined as the functional and dimensional similarities between a model and its prototype in all respects. It suggests that the model and the eventual prototype will have similar properties or that the model and prototype will be completely similar in form and function.

Well-established ratios and equations are available from several engineering and hydraulic references to define the dimensions and performance of a reduced-scale model as opposed to a full-scale prototype. This provides reasonably predictable performance of the prototype by extrapolating data obtained from the test results of the model.

Hydraulic similitude testing of models is conducted on both components and systems to refute or verify the projected performance obtained from the initial design. In many cases the introduction of computer modeling has made the use of physically scaled models unnecessary, except for confirmation purposes with prototypes.

For pumps, this procedure applies geometric similitude through the affinity laws for capacity, head, and horsepower. Another example is the use of similitude to determine media filtration efficiency, efficacy, and filter run times by extrapolating full-scale performance from a reduced-scale simulation, often conducted with a pilot plant.
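For geometrically similar pumps, the affinity laws take their general similarity form: capacity scales with N × D³, head with N² × D², and power with N³ × D⁵ (N = speed, D = impeller diameter). The sketch below assumes the same fluid in model and prototype and ignores scale effects on efficiency; the function name and the 1:4 model test values are hypothetical.

```python
def scale_to_prototype(q_model: float, h_model: float, p_model: float,
                       speed_ratio: float, diameter_ratio: float) -> tuple[float, float, float]:
    """Scale model test results to a geometrically similar prototype.

    speed_ratio    = N_prototype / N_model
    diameter_ratio = D_prototype / D_model
    Assumes the same fluid and ignores scale effects on efficiency.
    """
    q = q_model * speed_ratio * diameter_ratio ** 3        # capacity ~ N * D^3
    h = h_model * speed_ratio ** 2 * diameter_ratio ** 2   # head ~ N^2 * D^2
    p = p_model * speed_ratio ** 3 * diameter_ratio ** 5   # power ~ N^3 * D^5
    return q, h, p


# Hypothetical 1:4 model (gpm, ft, bhp) tested at the prototype speed.
q, h, p = scale_to_prototype(50.0, 10.0, 0.2, speed_ratio=1.0, diameter_ratio=4.0)
print(f"Prototype: {q:,.0f} gpm at {h:.0f} ft, about {p:.1f} bhp")
```

Setting diameter_ratio to 1.0 reduces these relations to the familiar speed-only affinity laws for a single pump.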

Mean Time Between Failure

The mean time between failure (MTBF) is defined as a prediction of the reliability of a product. Strictly speaking, the MTBF is a basic measure of reliability for equipment that can be, or is going to be, repaired and returned to service.

This is a common procedure used for pumps, motors, and many other components of a water system. Product tests and statistical analysis of individual parts and components are used to predict the rate at which a given product will fail. It is one of the most common forms of reliability prediction and is usually based on an established analysis model, often from real-world operating data, statistical analysis, or testing results from prototypes.

Many analytical models exist for various products and choosing one over another must be based on a broad array of factors specific to a product and its application.

In general, MTBF is specified with a duty cycle parameter and serves as an index of reliability. It is calculated by dividing the total operating time (the number of hours or cycles an item or items operated) by the number of failures (i.e., product stoppages or outages) that occurred with the items. It is also expressed as the reciprocal of the failure rate (1/λ).

Thus, simplistically, MTBF = the number of operational hours ÷ the number of failures.

MTBF is the main parameter used in manufacturing industries for the average time observed between two failures (from the start of uptime after the last failure to the start of the next failure). It is predominantly used to determine system reliability and compare different system designs.

Today, MTBF is used in pump manufacturing and other industries for similar types of prediction. The best way to obtain an MTBF value is to group pumps with similar operating characteristics (the same fluid, similar operating times, etc.) and then calculate an MTBF for each group.

The most effective way to improve the MTBF for most centrifugal pumps and their component parts is to operate them as near as possible to their best efficiency point (BEP) and to practice optimum maintenance procedures, particularly bearing lubrication. This minimizes radial thrust while improving efficiency, operational life, and unit reliability. In many instances, MTBF levels of up to 20,000 to 30,000 hours of continuous operation could thus be achieved.

Based on years of data collection from various pump installations, I have calculated a typical MTBF for centrifugal, vertical turbine, and submersible pumps, which are shown in Table 1. Obviously, these values will vary according to the type of pump, service conditions, and actual duty. In words, MTBF = Uptime ÷ Number of system failures. As an equation:

$$\text{MTBF} = \frac{\sum \text{TOT}}{F}$$

Where:

TOT = Operational time of each uptime interval: (start of downtime at the next failure) – (start of uptime after the last failure). Summed over all intervals, ΣTOT is the total operational time.

F = Number of failures
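The MTBF equation can be applied directly to an operating log. This sketch assumes each record is a pair of elapsed times in hours since the unit was placed in service, (start of uptime, start of downtime), with every downtime marking an unscheduled failure; the function name and the three-interval log are hypothetical.

```python
def mtbf_from_log(uptime_periods: list[tuple[float, float]]) -> float:
    """MTBF = sum of operating intervals (TOT) / number of failures (F).

    Each entry is (start_of_uptime, start_of_downtime) in elapsed hours since
    the unit was placed in service; each start_of_downtime is an unscheduled failure.
    """
    total_operating_time = sum(down - up for up, down in uptime_periods)
    failures = len(uptime_periods)
    return total_operating_time / failures


# Hypothetical log for one pump: three run periods, each ending in a failure.
log = [(0.0, 1_800.0), (1_850.0, 4_100.0), (4_200.0, 7_000.0)]
mtbf = mtbf_from_log(log)
print(f"MTBF = {mtbf:,.0f} h, failure rate = {1 / mtbf:.6f} per h")
```

Repair time between intervals (here 50 and 100 hours) is excluded, so only operating time counts toward the MTBF.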

These measures are often helpful in finding a failure rate, which can then become the basis for preventive maintenance in many systems. An extremely high MTBF value generally means that the system is over-designed and too expensive for normal operational use.

An MTBF calculation is based on the types of failures defined above but does not include scheduled maintenance such as inspections, recalibrations, or preventive part replacement.

Reliability against random failures is determined using the following equation:

$$R(t) = e^{-\lambda t} = e^{-t/\text{MTBF}}$$

Where:

R(t) = Reliability at time (t)

e ≈ 2.718 (the base of the natural logarithm)

λ = Product failure rate (failures per unit time)

t = Time period

MTBF = Mean time between failure

When the time (t) equals the MTBF, the reliability becomes:

$$R(t) = e^{-1} \approx 0.3679 = 36.79\%$$

This is usually rounded up to 37% and can be interpreted in several ways:

  1. If multiple units are considered, only about 37% of them will operate longer than the MTBF before failing.
  2. For a single unit, the probability that it will work for as long as its MTBF value is only about 37%.
  3. A unit will operate for as long as its MTBF with only a 37% confidence level.
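The reliability equation is easy to tabulate. This sketch assumes a constant (random) failure rate and a hypothetical MTBF of 20,000 hours; at t = MTBF it returns the 36.79% figure discussed above.

```python
import math


def reliability(t: float, mtbf: float) -> float:
    """R(t) = exp(-t / MTBF), valid for a constant (random) failure rate."""
    return math.exp(-t / mtbf)


MTBF_HOURS = 20_000.0  # hypothetical
for fraction in (0.1, 0.5, 1.0, 2.0):
    t = fraction * MTBF_HOURS
    print(f"t = {t:>8,.0f} h   R(t) = {reliability(t, MTBF_HOURS):.4f}")
```

Because the result depends only on the ratio t ÷ MTBF, the same table applies to any MTBF.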

Let’s work through an example: There are 10 identical rebuildable pumps at a single facility. The pumps each operated for 100 hours over the course of one year, for a total of 1,000 hours. The pumps randomly failed 16 times over the same year. What is the mean time between failure (MTBF)?

MTBF = (Number of pumps × operating hours per pump) ÷ (Number of failures during the same time)

MTBF = (10 pumps × 100 operational hours each) ÷ 16 failures = 1,000 ÷ 16 = 62.5 hours
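The same arithmetic can be checked in Python. As an illustration beyond the original example, the sketch also computes the failure rate (1/MTBF) and, assuming a constant failure rate, the reliability over one pump’s 100 operating hours.

```python
import math

PUMPS = 10
HOURS_PER_PUMP = 100.0
FAILURES = 16

total_hours = PUMPS * HOURS_PER_PUMP   # 1,000 operating hours
mtbf = total_hours / FAILURES          # mean time between failure, hours
failure_rate = 1.0 / mtbf              # lambda, failures per operating hour
r_100h = math.exp(-HOURS_PER_PUMP / mtbf)

print(f"MTBF = {mtbf} h, failure rate = {failure_rate} per h")
print(f"R(100 h) = {r_100h:.3f}")
```

With an MTBF of only 62.5 hours, the chance that a given pump completes its 100 hours without a failure is about 0.20 under this model.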


This concludes this introduction to reliability and the various factors involved in determining this parameter. Next month, we will continue this discussion with the various elements associated with centrifugal pump reliability.

Until then, work safe and smart.

Learn How to Engineer Success for Your Business
Engineering Your Business: A series of articles serving as a guide to the groundwater business is a compilation of works from long-time Water Well Journal columnist Ed Butts, PE, CPI.

Ed Butts, PE, CPI, is the chief engineer at 4B Engineering & Consulting, Salem, Oregon. He has more than 40 years of experience in the water well business, specializing in engineering and business management. He can be reached at