Understanding Prevention and Detection Controls in Failure Modes & Effects Analyses

Controls Don’t Enter FMEA Immediately

When completing an FMEA, controls aren’t developed until rather late in the process. After you’ve determined the functions and associated requirements, deduced failure modes, and determined effects and causes, you are ready to discuss both prevention and detection controls.

Why Controls Are Important

Actually, developing a sound list of controls is one of the main reasons for doing any FMEA. FMEA studies teach you a great deal about product or process requirements, and they can alert you to failure scenarios through “cause-mode-effect” chains that you hadn’t thought about. You can also increase your general understanding of either a product design or a process flow through the development of FMEA studies.

  • Of course, FMEA can also give you a semi-quantitative way to assess risk. But you can’t really understand the true nature of the risks that any project faces without planning and then executing a proper set of control activities. And that’s true for both design activities and for processes.

In short, controls are activities that allow you to recognize or identify the conditions that lead to specific causes or effects of a disrupted function (or a failure mode, to use the terminology of FMEA). There are two types of controls: prevention controls and detection controls. Here is a table that explains the difference between prevention and detection controls for Design FMEA and Process FMEA:

How Does This Work on the FMEA Form?

If you know the cause-mode-effect chain, and you have derived that chain from a properly defined and constructed function statement, then any prevention control is a search for the cause in the chain, while any detection control is a search for the effect in the chain.

In any cause-mode-effect chain, something that happens before a function is disrupted must be a cause. As a result, you can imagine a number of possible causes, and, based on occurrence, you’ve selected one or two of the most likely causes for this chain. A cause may not lead to the failure mode every time it arises, but in some cases it will.

Because you have visualized (or imagined) these causes, you can now think about how you could possibly break the chain from cause to mode. You probably can’t prevent the cause from existing, but you can think about how you can react to the cause-mode link.

A Design Example: Prevention Controls

If you are designing a bolted joint, you will certainly have a function that is something like “bolt creates compression in component A” (and a similar function of “bolt creates compression in component B” to complete the joint). “Excessive compression” might be one of the ways the function could be disrupted. “Incorrect torque specification” is certainly one of the causes. At this point, we have a cause-mode link that says, “Incorrect torque specification leads to excess compression in component A.”

What can we do about this? We could certainly run a physical test, but a physical test won’t look at the torque specification directly. It would reveal that something is wrong with the joint, but quite a few things could go wrong in the test (a defective bolt, an error in the torque reading, or a component A whose strength is too low) and lead you astray.

Besides, we want to prevent failures, including testing failures that cause project disruption, not just see them happen. To do this, you need to create a control that is based on analysis and reasoning, and does NOT depend on a test.

In this case, you could conduct a simple stress calculation based on bolted joint design—or do the same thing with finite element analysis. If the joint design is simple enough, you might even find tabulated values in a reference such as Machinery’s Handbook. Best of all, you can do these analyses at worst-case conditions (limits of specification for all relevant components), which is something that is quite difficult to do in testing.
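As a sketch of what such a calculation looks like, here is a worst-case check built on the short-form torque-preload relation F = T / (K * d). Every number below (torque limit, nut factor, geometry, allowable stress) is invented for illustration, not taken from any real design:

```python
# Hypothetical worst-case check of compressive stress in a bolted joint.
# All numeric values are illustrative assumptions, not real design data.

import math

def bolt_preload(torque_nm, nut_factor, diameter_m):
    """Short-form torque-preload relation: F = T / (K * d)."""
    return torque_nm / (nut_factor * diameter_m)

def bearing_stress(preload_n, outer_d_m, hole_d_m):
    """Compressive stress under the bolt head: preload / annular bearing area."""
    area = math.pi / 4 * (outer_d_m**2 - hole_d_m**2)
    return preload_n / area

# Worst case for over-compression: maximum torque spec with minimum nut factor.
torque_max = 30.0   # N*m, upper torque specification limit (assumed)
k_min = 0.15        # minimum nut factor for a lubricated thread (assumed)
d = 0.008           # 8 mm bolt (assumed)

f_max = bolt_preload(torque_max, k_min, d)        # highest credible preload
stress = bearing_stress(f_max, 0.013, 0.009)      # 13 mm washer OD, 9 mm hole (assumed)

allowable = 180e6   # Pa, compressive limit of component A at its weakest spec (assumed)
print(f"worst-case preload: {f_max:.0f} N, bearing stress: {stress/1e6:.0f} MPa")
print("OK" if stress < allowable else "EXCESSIVE COMPRESSION: revise torque spec")
```

With these assumed numbers the check fails, which is exactly the kind of finding you want on paper rather than in a broken prototype.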

You still may need to test a prototype to confirm the calculation, but you are much more likely to pass the test without trial and error efforts. And trial and error efforts take time and raise product development costs. In a world where faster-better-less cost is critical, this is no small thing.

Now Consider the Same Function & Failure Mode in a Process Example

Again, we want to prevent failures. We don’t want to measure the joint after installing the bolt—not only would that be a detection control, but it would also lead to at least one defective process outcome. How can we avoid excessive torque?

To develop the most effective prevention control, you would certainly need to be very familiar with the production facility where the work would be done. That will lead you to the most likely cause. To show how a prevention control might work, let’s assume the most likely cause is that the torque gun is set too high.

But is that a root cause or a superficial cause? Let’s ask why the gun might be set too high, and let’s further assume that the gun is set high to speed up the operation. (The fact that higher torque might not really do that won’t stop some people from believing it!)

A relatively simple way to overcome this is to use a gun that has a torque limiting clutch. That’s a good prevention control—as long as the gun maintenance is done and the clutch is set to the correct maximum torque. Of course, it’s not a full jidoka control, which will stop production if excess torque is applied, but it’s a decent prevention control.

The fact that the torque-limiting clutch isn’t a jidoka control means that this control won’t rate as well as a full jidoka control would. However, it’s quite a bit better than just putting up a sign that says, “Don’t change the torque setting.”

What About Detection Controls?

On the other side of the coin, detection controls are usually much easier to understand. For the design concern, you might be worried about “excessive bolt deformation” as the most severe effect. A test of prototype parts is certainly a valid detection control. Just remember that there are some statistical limits to what a test can tell you, and that is a result of not being able to do absolute tests at worst-case specification limits.
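Those statistical limits are easy to quantify. Under the success-run theorem, demonstrating reliability R at confidence C with zero test failures requires n = ln(1 − C) / ln(R) samples, a number that grows quickly as the reliability claim gets stronger:

```python
# Success-run (zero-failure) sample size: a standard reliability
# demonstration formula, shown here to illustrate why tests alone
# can make only limited statistical claims.

import math

def success_run_samples(reliability, confidence):
    """Zero-failure sample size: n = ln(1 - C) / ln(R)."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

# Even modest claims take many prototypes:
print(success_run_samples(0.90, 0.90))   # 22 samples for 90% reliability
print(success_run_samples(0.99, 0.90))   # 230 samples for 99% reliability
```

Few prototype programs can afford hundreds of samples, which is why a passed test says less than people usually assume.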

Similarly, you could check a finished joint in production to look for undue joint deformation due to excess torque. The actual effect might be “excess deflection of component A.” You wouldn’t know something had gone wrong until you found the defect in an inspection step, but at least you’d find it—at least as long as the excess deflection of component A was relatively easy to identify.

Summing Up

Both prevention controls and detection controls are important, and they are extremely important in controlling cost: costs in the product development process as well as ongoing production costs. However, prevention controls are often more effective than detection controls, and they offer the added benefit of improving the speed of product development.

Are Stage-Gate Product Development Processes Doomed to Fail?

There’s an interesting article published by Machine Design written by Dr. David Ullman comparing “scrum” methods with stage-gate (or “waterfall”) processes.

Ullman, who is a well-known expert in product development methods, presented data suggesting that scrum methods cut project failure rates in half. Good result, but unless you read carefully, you may not have noticed that this was for software and did not relate to hardware.

As Ullman noted, software is less vulnerable to factors that make hardware development difficult. This relates to tooling and production investments—which are minimal or absent in software projects—but also includes greater difficulties that hardware faces in analysis and testing.

Nevertheless, as I’ve found in 47 years of experience with product development processes, hardware-driven projects can have high success rates using stage-gate processes. I’ve personally participated in re-structuring these processes for a number of firms, and, when the process is properly constructed, the failure rate for projects is usually less than 10%.

What’s the Major Factor in Failure?

One of the challenges in hardware design, whether you use scrum or stage-gate, is starting the project properly. If you don’t start well, the chances for failure rise dramatically.

Unfortunately, too many managers, who learned their craft with methods in use 10-20 years ago, have a strong preference to see hardware in some form as early as possible in the project timeline.

That’s a major problem. It’s all about how you get to verification of the design. In a hardware-first culture, here’s what I have seen time and time again:

Once a basic concept design is generated, there’s a rush to get prototypes that can be looked at, touched, and most importantly, tested. Of course, these early prototypes don’t work properly, so the design is reworked. More samples are tested. They do better, but still don’t meet the goals. This is repeated again and again, usually until time runs out and the design is pushed over to the manufacturing team.

Results of Hardware-First Methods

The net result is the design team learns what doesn’t work, but often doesn’t understand why the design works (assuming final test results actually meet project goals). It’s also next to impossible to control project timing with this method.

The goal becomes simple: build prototypes that will pass all of the required tests.

And, finally, the manufacturing side of the team is slowly losing their minds as engineering changes keep coming until the product is launched. If you’re lucky, changes slow down, but usually keep trickling in for a year or so.

When the product reaches customers, there are often problems. It could be that the tests used didn’t reproduce real-world conditions very well. Often, customers use products in ways the design team didn’t anticipate.

Manufacturing costs and capital budgets are battered and market acceptance can be irreparably damaged.

What You Should Aim For

To avoid this pitfall, managers and executives need to be more patient and deliberate in the early stages. That doesn’t mean using time in a leisurely fashion—far from it. Instead, project teams need to move at breakneck speed, but be more structured and focused.

Instead of a goal that designs need to pass necessary tests, the project goal must be broader. This reflects what Dr. Joseph Juran defined as “quality.” The goal must be to arrive at a properly verified design that is fit for use by targeted customers.

Test criteria can never fully address that. There are dozens of reasons why, but I’ve studied this at length, and nearly every product I’ve seen fail in the market had passed the tests used to verify its design.

How to Do Better

To create a product that is “fit for use,” spend some detailed, intense time trying to determine all of the functional requirements that will satisfy your customer base.

There’s a lot to do to achieve that, but it can be achieved quickly and comprehensively by following a simple outline.

As with the “early hardware” method, start with a concept design. That means a design that is far enough along to create a rough costed bill of materials.

Then follow these five steps:

    1. Create a block diagram for the entire product.
    2. Decide which sections of the block diagram—what Ullman called modules—cause concern. This will usually be subsystems or components that are new, unique (in terms of application), or difficult. These design elements are sometimes called “NUDs.”
    3. Next, create a Parameter diagram, and try to determine what the modules of interest must do to make the system work correctly. If you are supplying something that will be used in a bigger system, this will include your customer’s system. You’ll also need to identify the environmental conditions that the product must withstand, as well as the important user factors that will make your product robust against reasonable and expected misuse.
    4. Use the block diagram and the Parameter diagram to create an interface matrix. The resulting functions that fill the matrix will be a detailed requirements statement that needs to be evaluated to achieve verification.
    5. Use that list of functions to generate a Design Failure Modes & Effects Analysis. The resulting analytical prevention controls and physical detection controls will give you a list of things needed to achieve verification.
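Steps 1 through 4 can be sketched in miniature. The modules and interface descriptions below are hypothetical, chosen to echo the bolted-joint example earlier; the point is only the mechanics of turning filled interface-matrix cells into function statements for the DFMEA:

```python
# Illustrative sketch of steps 1-4: block-diagram modules, an interface
# matrix, and the function statements that fall out of it.
# Module names and interactions are hypothetical.

from itertools import combinations

modules = ["bolt", "component A", "component B"]

# Interface matrix: which module pairs interact and how. Only pairs
# that actually touch (physically or functionally) get an entry.
interfaces = {
    ("bolt", "component A"): "transmits compressive force",
    ("bolt", "component B"): "transmits compressive force",
    ("component A", "component B"): "transfers shear load across the joint face",
}

# Each filled cell becomes a function statement to carry into the DFMEA.
functions = []
for pair in combinations(modules, 2):
    if pair in interfaces:
        functions.append(f"{pair[0]} {interfaces[pair]} to {pair[1]}")

for f in functions:
    print(f)
```

Real interface matrices also distinguish energy, material, information, and physical interfaces, but the bookkeeping idea is the same.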

Carry out the analytical prevention controls before you build any significant test samples. Use analytical techniques to debug and modify the design.

Now, build prototypes and conduct the detection-driven control tests.

If you’ve done this well, the odds are you won’t be endlessly repeating tests with multiple hardware iterations. You’ll be able to stay on schedule, be on budget, and have a design that the manufacturing team won’t have to continuously change as they complete their preparation for production.

The Outcome

If you’ve done the proper analysis outlined above, your process outline will now follow this analysis-first sequence, with prevention controls carried out before prototypes are built and detection tests run afterward.
Of course, there are a lot of details in this process, but the improvement in project cost, project timing, and launch issues is the payoff you can expect from this process. And it fits very well in the familiar outline of a stage-gate process.

Five Reasons Cost Reduction Should Be Part of Every Annual Budget

For the past decade, many businesses focused heavily on market share and assumed cost issues could be deferred until a certain scale or size was achieved. The ready availability of venture capital fed that mania by funding “cash burn,” and many companies followed that model.

However, even in the chaotic world of startups and unicorns, that proposition is slowly running out of steam.

And—what about most of the world? What about businesses that aren’t trying to become the next Tesla, or Uber, or WeWork?

In particular, what about companies that are trying to make things to sell, and not simply trade on ideas?

Cost reduction is always important, no matter what. And there are great reasons why cost reduction should be a major part of any management plan. So, here are five solid factors that should motivate senior leaders to make cost management part of your budget every year.

1. Cost reduction can have a major impact on capital needs, particularly working capital

Working capital (receivables and inventory in particular) has the highest marginal cost of any capital on a balance sheet. Investors and lenders don’t like to put money into a company if that money will be used for receivables or inventory. There’s a simple reason: those items are not particularly good collateral. So, most of the cash (and cash really is king) used for inventory and receivables comes out of ownership’s equity. And the return on that equity isn’t very good. If ownership can’t get a better return than a credit line would offer, why should owners apply capital to this category of investment?

2. A continual focus on cost reduction can be a powerful competitive tool

If margins improve, there are two non-exclusive options that can be pursued to increase market share and drive future success. One is to lower prices to gain share; this choice can be problematic because a price war may result, and all competitors could suffer. The other is to use the increased margin to develop new products, products that, with the learning gained from previous cost reductions, will have still greater margins. These new products can then become the factor that increases market share and leverage for increased earnings.

3. Repetitive efforts to reduce cost are self-energizing

This means that past efforts provide organizational insights that not only drive future cost reductions, but also power improvements in new products and services. In addition, additional margin can be used to provide both employees and ownership greater rewards—which, if structured properly, can lead to still higher returns.

4. Continual cost reduction will put pressure on competitors

Some competitors, especially those that are just a cog in a larger conglomerate, may even exit your markets because parent companies are keenly tuned to levels of return on equity. This can give you yet another wedge to increase market share. You might even find an attractive acquisition opportunity if a competitor feels sufficient distress from cost pressures.

5. Cost reduction can be a significant contributor to sustainability objectives

Most organizations have a great deal of waste built into their product or process delivery systems, including wasted energy and materials. Eliminating these wastes almost always reduces costs—sometimes from fixed costs, sometimes from variable costs, and more than occasionally from both.

Of course, cost reduction doesn’t come without risk, and it takes time and effort to achieve. Being able to differentiate between “cost reduction for the sake of cost reduction” and coherent, balanced cost reduction efforts is critical.

But that’s another story for another day. Failure Modes & Effects Analysis is a powerful tool for managing the risk that inevitably comes with reducing costs.

Dealing With Risk In Product Development

Failure Modes & Effects Analysis is an incredibly valuable tool for assessing and dealing with risk in product development. However, the traditional Risk Priority Number (RPN), taken alone, doesn’t always give clear indications about how you should deal with risk.

What Are Some of the Other Ways to Assess Risk in FMEA?

One way is to simply look at each row on the FMEA form as a self-contained story about one limited aspect of a design or a process. Certainly, developing the FMEA yields a much more useful and meaningful result if you work through the form on a column by column basis—but that doesn’t change the fact that each row tells a tale.

And those line-by-line stories are built around specific cause-mode-effect (C-M-E) chains, in which the occurrence and severity ratings are the major factors that determine risk. Certainly, the ability to detect a cause or an effect prior to design release or process completion has some impact on risk, and detection is not without importance. Detection, though, is less significant because it is always an imperfect element of risk management.

The biggest impact on risk arises from the severity and occurrence ratings.

With this in mind, the simplest way to assess risk is to read each row, and then, without any kind of calculation or formal rules, just sort the rows into three categories:

    • Those that make you worry a great deal—these are red risks
    • Those that you don’t think are really significant or worth much worry—green risks
    • All of the other rows, with a level of concern between the other two categories—yellow risks

I’ve been looking at FMEA studies—both those I’ve worked on personally and those completed without my participation—for many years and I’ve never had trouble sorting every row into one of these three categories. And, while doing that, I’ve rarely had much disagreement about these judgments from others.

Adding More Formality–But Potentially Less Understanding

Of course, there’s a lack of formality in assessing risk this way, and that informal approach makes many people uncomfortable, particularly senior managers and executives who don’t always want to apply the effort and take the time to fully understand the details of risk, and who just want a more recognized and less opinion-based judgment. There’s also a very real concern about bias when risk is assessed purely by high-level judgment.

There are more structured ways to do approximately the same thing, and most of these are based on the concept of criticality—the product of the severity rating and the occurrence rating. The original idea was simple—a higher value means more risk. But, as I explained in the previous Lighthouse post, this still assumes that severity and occurrence have equal significance, and that’s just not true.
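A small sketch makes that point concrete: two chains with identical RPNs can represent very different risks, and criticality (severity times occurrence) still weighs the two ratings as equals. The ratings below are invented for illustration:

```python
# Two C-M-E chains with identical RPNs but very different risk profiles.
# All ratings are illustrative, not from any real study.

def rpn(severity, occurrence, detection):
    return severity * occurrence * detection

def criticality(severity, occurrence):
    return severity * occurrence

chain_a = (10, 2, 3)   # severe but rare: a safety-related failure
chain_b = (3, 10, 2)   # frequent nuisance with mild consequences

# RPN alone cannot tell these chains apart:
assert rpn(*chain_a) == rpn(*chain_b) == 60

# Criticality separates them, but still treats S and O as equally weighted:
print(criticality(10, 2), criticality(3, 10))   # 20 vs 30
```

Note that the nuisance chain actually scores *higher* on criticality than the safety chain, which is exactly the distortion equal weighting can produce.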

So, over the years, various schemes have been proposed to evaluate the concept of criticality. Most are based on these ideas—and detection is not really a factor:

    • Any C-M-E chain with a severity of 9 or 10 is critical. Since human safety and/or serious regulatory issues are likely if a failure mode arises, the impact of occurrence (or detection) does not diminish the fact that this is a potentially serious chain of events.
    • A C-M-E chain with a reasonably high severity (say 5, 6, 7, or 8) in combination with a moderately high occurrence, typically 4 or more, is significant. Moderate severity with a lower occurrence doesn’t meet this test, nor does a high occurrence with a low severity.
    • Anything else is neither critical nor significant.
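The three bullets above translate directly into a short function. This is a minimal sketch: the thresholds come straight from the list, but any real FMEA procedure would define them in its own rating tables.

```python
# Criticality classification per the three rules above:
# severity 9-10 -> critical; severity 5-8 with occurrence >= 4 -> significant;
# anything else -> neither. Thresholds are those stated in the text.

def classify(severity, occurrence):
    if severity >= 9:
        return "critical"      # safety/regulatory: occurrence can't diminish it
    if 5 <= severity <= 8 and occurrence >= 4:
        return "significant"
    return "none"

print(classify(10, 1))   # critical
print(classify(6, 5))    # significant
print(classify(6, 2))    # none: moderate severity, low occurrence
print(classify(3, 9))    # none: high occurrence, low severity
```

The last two cases show the asymmetry the rules are after: neither moderate severity alone nor high occurrence alone is enough to demand a classification.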

In this approach, any row—or C-M-E chain—that is either critical or significant requires a classification designation. This is what that skinny “classification” column is all about on most FMEA forms. However, this does not directly identify critical and significant characteristics; the classification is all about the C-M-E chain of events. Characteristics (or parameters for PFMEA) are a separate consideration, and they don’t necessarily jump out from the FMEA form unless the concept of function and functional requirements has been carefully worked out as part of the FMEA study.

So, if a C-M-E chain is classified as critical, there must be one or more critical product characteristics that drive the risk. In DFMEA, these can be found in the Function/Functional Requirements columns of the DFMEA. In PFMEA, this often requires a cross-reference to the associated DFMEA, as critical characteristics are always features or requirements that relate to the design of the product, and these don’t always show up clearly in a PFMEA study.

Significant classifications can lead to either a significant characteristic (again, a design element) or to a significant process parameter. Significant characteristics should be available directly from the functional section of the relevant DFMEA. Similarly, in Process FMEA, significant process parameters should be evident in the functional section of the PFMEA itself.

The major drawback of this approach is that less understanding of, and less attention to, the underlying risk goes into any product characteristic or process parameter identified this way. That happens because the rules don’t force careful consideration of the risk issues. The other drawback is that this approach still assumes that severity and occurrence are equally significant, so the result can sometimes distort risk.


The Latest Method–Worth Considering

Finally, the 2019 Edition of the Automotive Industry Action Group (AIAG)-Verband der Automobilindustrie (VDA) FMEA handbook takes an approach that may bridge the gap between a simple “red-yellow-green” approach and more formal criticality methods. The AIAG-VDA approach has these features:

    • The results will be organized into three categories, called Action Priorities
    • There will be high, medium, and low categories
    • High issues must be acted on, medium issues should be acted on, and low issues may be considered for action
    • This is not explicitly about risk; instead, this is prioritization of issues that should be undertaken to diminish risk
    • There are guidelines about how to identify high, medium, and low priorities, but they are not completely dissimilar from the ideas that have been used to specifically define criticality using severity and occurrence

Severity is still called severity, but what’s been called occurrence is now called frequency, and detection is now called monitoring. These are basically the same ideas, though there are (as always) new rating tables for the three concepts.
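The handbook defines Action Priority through detailed lookup tables over all three ratings. The sketch below only illustrates the shape of the method, with thresholds I’ve invented for the example; it is not the handbook’s actual table.

```python
# A compressed, illustrative sketch of the Action Priority idea.
# The real AIAG-VDA handbook uses granular lookup tables over severity,
# frequency, and monitoring; these thresholds are invented for illustration.

def action_priority(severity, frequency, monitoring):
    score = severity * frequency            # severity and frequency dominate
    if severity >= 9 and frequency >= 2:
        return "high"                       # must act
    if score >= 24 or (score >= 12 and monitoring >= 5):
        return "medium"                     # should act
    return "low"                            # may consider action

print(action_priority(10, 3, 2))   # high
print(action_priority(6, 5, 3))    # medium
print(action_priority(3, 2, 8))    # low
```

Even in this toy version, notice that monitoring only nudges the outcome at the margin; it never overrides a severe, plausible chain.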

After using the Action Priority approach a few times, I’ve found it to be a useful tool that is about the same as the very simple red, yellow, and green assessments I explained above.

As an example of why “action” is more useful than just “high risk,” consider a safety device that has several C-M-E chains with severities of 10. If the occurrence for each of these chains is 2, and the detection is 2 in each chain (bearing in mind that reaching an occurrence or detection rating of 1 requires extraordinary evidence in most rating tables), is there really anything that can be done about this risk, at least at any practical cost?

In these cases, I’ve always said that the main response is to carry product liability umbrella insurance with a very high coverage limit, because the probability that something will go wrong is very low, and you are very likely to find the problem before it gets to the marketplace. But if it does, and something goes awry, the consequences can be harsh.

Risk Is Never Just a Number

When you really step back and think about it, any way you approach this, you are using less-than-purely-quantitative methods to assess the factors that drive risk, and then using less-than-purely-quantitative rules to decide what you might do to address the risk you’ve found. Risk can never be eliminated, but it can always be managed.

In the end, it’s still judgment, as risk assessment has always been (and may forever be). Nevertheless, intelligent decision-making based on circumstances and risk preferences can be improved significantly by applying these ideas.

3 Actions To Make FMEA Worthwhile

FMEA is a critical tool in Product Development processes, and you should understand why that’s true.

In this article, I’m going to explain why it’s worthwhile to complete an FMEA. Let’s start with a general discussion, without regard to Process FMEA or Design FMEA. The major issues are the same, although the details can (and often do) diverge.

Point 1: FMEA is NOT an End Point

If you complete an FMEA, heave a sigh of relief, and don’t use the results for meaningful and significant actions, don’t waste your time doing FMEA studies. FMEA is all about improving products and processes, and if you think a finished FMEA is useful because it checks off a “to-do” on some list of deliverables in a project, you are missing the point completely. You’ll spend time (and money) and get little or nothing in return. The FMEA form is just a place to keep track of what can and should be done as a result of the study.

Point 2: You Need to Make Changes As A Result of Every FMEA Study

If you’ve done an FMEA study correctly, you are going to find some cause-mode-effect chains (or rows on the FMEA form) that have a risk you aren’t comfortable with. In those cases you have two choices. You can agree, with an eyes-wide-open perspective, to accept the risk as it sits and move on. But you can also devise a plan to take a direct action, choosing a design change (for either DFMEA or PFMEA) or a process change (PFMEA only) that will address the risk.

If you never try to do anything except go forward the way you planned before starting the FMEA, you are making a terrible mistake. Worse, you are devaluing the FMEA process. Once you do that, you will find that each successive FMEA has less and less significance. Eventually, you’ll just be checking the box and filing the form away in some obscure folder, drive, or database.

Point 3: You Need to Implement The Controls

If you’ve done a decent job on the FMEA study, you will have a long list of prevention controls as well as detection controls. Do you need to add every single one to the Design Verification Plan (DFMEA) or the Manufacturing Control Plan (PFMEA)?

In a perfect world, the answer would be yes. And, there are plenty of OEM customers (and engineering managers) who will insist that you reach for a perfect world. However, reality is somewhat different. In a sound FMEA, there will be a number of controls that you would like to apply, but are either too expensive, take too long, or are otherwise unlikely to be carried out.

In those situations, you again have to deal with an eyes-wide-open risk. Either spend the money and/or time to carry out the controls, change the design or process, or accept the risk.

On the other hand, there will also be one, two, or even several controls that you really should add to the list for verification or control plans. They might add a bit of cost, they might add time to a project, or they might intrude on issues that veer into the “we don’t do that here” mentality of your organization.

If you aren’t actually applying some of these controls from every FMEA, you simply aren’t getting the main point. At the same time, if you are skipping or omitting one, two, or more of the controls you’d normally apply to the same kind of product or process, you are, again, missing the boat.

What You Can Do

I’ve been doing FMEA studies since 1978. I’ve learned something about the methodology every time I use it. I’ve been involved, either as an active participant, instructor, or facilitator, in hundreds of DFMEA and PFMEA studies.

When the results are applied, good things happen. When results are not applied, the outcome is worse than “nothing happened.” What almost always occurs is that FMEA becomes devalued, and another group of engineers walks away with more than a bit of loathing for using this valuable technique.

And, while I suspect that many of you think this advice is too obvious to worry about, I will add this: even in the past five years, more than 80% of the FMEAs I have seen resulted in no meaningful action. I would say 95%, but in truth I don’t always know whether action was taken in every case.

Use the results. Make something positive happen with what you learn from each and every FMEA.