Understanding Prevention and Detection Controls in Failure Modes & Effects Analyses

Controls Don’t Enter FMEA Immediately

When completing an FMEA, controls aren’t developed until rather late in the process. After you’ve determined the functions and associated requirements, deduced failure modes, and determined effects and causes, you are ready to discuss both prevention and detection controls.

Why Controls Are Important

Actually, developing a sound list of controls is one of the main reasons for doing any FMEA. FMEA studies teach you a great deal about product or process requirements, and they can alert you to failure scenarios through “cause-mode-effect” chains that you hadn’t thought about. You can also increase your general understanding of either a product design or a process flow through the development of FMEA studies.

Of course, FMEA can also give you a semi-quantitative way to assess risk. But you can’t really understand the true nature of the risks that any project faces without planning and then executing a proper set of control activities. And that’s true for both design activities and for processes.
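To make “semi-quantitative” a little more concrete, here is a minimal sketch of the classic Risk Priority Number (RPN), which multiplies severity, occurrence, and detection ratings on 1 to 10 scales. The article does not say which scoring scheme it has in mind, so treat the RPN as an assumption; the class name and the sample ratings are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class RiskRatings:
    """Hypothetical container for the three classic FMEA ratings (1-10 each)."""
    severity: int    # 1 = negligible effect, 10 = most severe effect
    occurrence: int  # 1 = cause very unlikely, 10 = cause almost certain
    detection: int   # 1 = almost certain to be detected, 10 = cannot be detected

    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        for name, value in vars(self).items():
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")
        return self.severity * self.occurrence * self.detection


# Illustrative ratings only: the same failure chain before and after controls.
before = RiskRatings(severity=8, occurrence=6, detection=7)
after = RiskRatings(severity=8, occurrence=2, detection=4)
print(before.rpn(), after.rpn())  # 336 64
```

Note that the controls change occurrence and detection but leave severity alone, which is why the number only means something once the controls are actually planned and carried out.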

In short, controls are activities that allow you to recognize or identify the conditions that lead to specific causes or effects of a disrupted function (or a failure mode, to use the terminology of FMEA). There are two types of controls: prevention controls and detection controls. Here is a table that explains the different controls for Design FMEA and Process FMEA:

How Does This Work on the FMEA Form?

If you know the cause-mode-effect chain, and you have derived that chain from a properly defined and constructed function statement, then any prevention control is a search for the cause in the chain, while any detection control is a search for the effect in the chain.

In any cause-mode-effect chain, something that happens before a function is disrupted must be a cause. As a result, you can imagine a number of possible causes, and, based on occurrence, you’ve selected one or two of the most likely causes for this chain. Each of these causes may or may not lead to a failure mode every time the cause arises, but the cause might, in some cases, lead to the failure mode at hand.

Because you have visualized (or imagined) these causes, you can now think about how you could possibly break the chain from cause to mode. You probably can’t prevent the cause from existing, but you can think about how you can react to the cause-mode link.
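One way to picture this is as a small data structure. The sketch below is only an illustration: the class and field names are hypothetical and not part of any FMEA standard, and the sample chain borrows the bolted-joint wording used in the design example that follows. The single rule it encodes is the one stated above: a prevention control goes looking for the cause, and a detection control goes looking for the effect.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Control:
    description: str
    kind: str  # "prevention" or "detection"


@dataclass
class Chain:
    """One cause-mode-effect chain derived from a function statement."""
    function: str
    cause: str
    failure_mode: str
    effect: str
    controls: List[Control] = field(default_factory=list)

    def target_of(self, control: Control) -> str:
        """Return the element of the chain that the control searches for."""
        if control.kind == "prevention":
            return self.cause
        if control.kind == "detection":
            return self.effect
        raise ValueError(f"unknown control kind: {control.kind!r}")


chain = Chain(
    function="bolt creates compression in component A",
    cause="incorrect torque specification",
    failure_mode="excessive compression",
    effect="excessive bolt deformation",
    controls=[
        Control("stress calculation at worst-case conditions", "prevention"),
        Control("test of prototype parts", "detection"),
    ],
)

for control in chain.controls:
    print(f"{control.kind}: {control.description} -> looks for '{chain.target_of(control)}'")
```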

A Design Example: Prevention Controls

If you are designing a bolted joint, you will certainly have a function that is something like “bolt creates compression in component A” (and a similar function of “bolt creates compression in component B” to complete the joint). “Excessive compression” might be one of the ways the function could be disrupted. “Incorrect torque specification” is certainly one of the causes. At this point, we have a cause-mode link that says, “Incorrect torque specification leads to excess compression in component A.”

What can we do about this? We could certainly do a physical test, but a physical test won’t actually look at the torque specification directly. It would certainly reveal that something is wrong with the joint, but quite a few things could happen in the test that could lead you astray, such as a defective bolt, an error in the torque reading, or a component A with a strength that is too low.

Besides, we want to prevent any failures—including testing failures which cause project disruption—not just see them happen. To do this, you need to create a control that is based on thinking in some way, and does NOT depend on a test.

In this case, you could conduct a simple stress calculation based on bolted joint design—or do the same thing with finite element analysis. If the joint design is simple enough, you might even find tabulated values in a reference such as Machinery’s Handbook. Best of all, you can do these analyses at worst-case conditions (limits of specification for all relevant components), which is something that is quite difficult to do in testing.
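As a rough illustration of what such a calculation might look like, the sketch below uses the common short-form torque relation T = K × d × F to estimate bolt preload, then divides by the bearing area under the washer to get the compressive stress on component A. Every number here (the M10 bolt, the torque tolerance, the nut factor range, the washer and hole sizes, and the 140 MPa allowable) is a made-up assumption for the example, and a real joint analysis would consider much more than this one check.

```python
import math


def bolt_preload_n(torque_nm: float, nut_factor: float, bolt_dia_m: float) -> float:
    """Preload F (N) from the short-form relation T = K * d * F."""
    return torque_nm / (nut_factor * bolt_dia_m)


def bearing_stress_mpa(preload_n: float, washer_od_mm: float, hole_dia_mm: float) -> float:
    """Compressive stress (MPa) on the annular area under the washer."""
    area_mm2 = math.pi / 4.0 * (washer_od_mm ** 2 - hole_dia_mm ** 2)
    return preload_n / area_mm2  # N/mm^2 is the same as MPa


# All values below are hypothetical, chosen only to illustrate the method.
ALLOWABLE_MPA = 140.0  # assumed minimum compressive strength of component A
BOLT_DIA_M = 0.010     # assumed M10 bolt

# Nominal: mid-range torque, typical nut factor, nominal washer and hole.
nominal_mpa = bearing_stress_mpa(bolt_preload_n(45.0, 0.20, BOLT_DIA_M), 20.0, 11.0)

# Worst case: maximum torque, lowest friction, smallest washer, largest hole.
worst_mpa = bearing_stress_mpa(bolt_preload_n(50.0, 0.15, BOLT_DIA_M), 19.5, 11.3)

for label, stress in (("nominal", nominal_mpa), ("worst case", worst_mpa)):
    verdict = "OK" if stress <= ALLOWABLE_MPA else "EXCESS COMPRESSION"
    print(f"{label}: {stress:.0f} MPa -> {verdict}")
```

With these assumed numbers, the nominal joint passes while the worst-case stack-up does not. That is exactly the kind of finding a handful of prototype parts built near nominal would be unlikely to show you.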

You still may need to test a prototype to confirm the calculation, but you are much more likely to pass the test without trial-and-error efforts. And trial-and-error efforts take time and raise product development costs. In a world where faster-better-less cost is critical, this is no small thing.

Now Consider the Same Function & Failure Mode in a Process Example

Again, we want to prevent failures. We don’t want to measure the joint after installing the bolt—not only would that be a detection control, but it would also lead to at least one defective process outcome. How can we avoid excessive torque?

To develop the most effective prevention control, you would certainly need to be very familiar with the production facility where the work would be done. That will lead you to the most likely cause. To show how a prevention control might work, let’s assume the most likely cause is that the torque gun is set too high.

Of course, we should check whether that is a root cause or a superficial cause. So, let’s ask why the gun might be set too high. Let’s further assume that the gun is set high to speed up the operation. (The fact that higher torque might not really do that won’t stop some people from believing that!)

A relatively simple way to overcome this is to use a gun that has a torque limiting clutch. That’s a good prevention control—as long as the gun maintenance is done and the clutch is set to the correct maximum torque. Of course, it’s not a full jidoka control, which will stop production if excess torque is applied, but it’s a decent prevention control.

The fact that the torque-limiting clutch isn’t a jidoka control means that the detection rating of this control will be lower than that of a full jidoka control. However, it’s quite a bit better than just putting up a sign that says, “Don’t change the torque setting.”

What About Detection Controls?

On the other side of the coin, detection controls are usually much easier to understand. For the design concern, you might be worried about “excessive bolt deformation” as the most severe effect. A test of prototype parts is certainly a valid detection control. Just remember that there are some statistical limits to what a test can tell you, and that is a result of not being able to do absolute tests at worst-case specification limits.
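To put a number on those statistical limits, one widely used rule of thumb is the zero-failure (“success-run”) relation, C = 1 − R^n, which links the number of passing samples n to the reliability R you can claim at confidence C. The article does not name this method, so consider it an assumption brought in for illustration; the function names are hypothetical.

```python
import math


def samples_required(reliability: float, confidence: float) -> int:
    """Passing samples needed (zero failures allowed) to claim `reliability`
    at `confidence`, from C = 1 - R**n."""
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


def demonstrated_reliability(samples: int, confidence: float) -> float:
    """Reliability demonstrated by `samples` passing tests with no failures."""
    if samples < 1 or not 0.0 < confidence < 1.0:
        raise ValueError("need at least one sample and 0 < confidence < 1")
    return (1.0 - confidence) ** (1.0 / samples)


print(samples_required(0.90, 0.90))  # 22
print(samples_required(0.99, 0.90))  # 230
print(round(demonstrated_reliability(5, 0.90), 2))  # 0.63
```

In other words, five passing prototypes demonstrate only about 63% reliability at 90% confidence, and those prototypes were probably not built at worst-case specification limits either.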

Similarly, you could check a finished joint in production to look for undue joint deformation due to excess torque. The actual effect might be “excess deflection of component A.” You wouldn’t know something had gone wrong until you found the defect in an inspection step, but you would find it, as long as the excess deflection of component A was relatively easy to identify.

Summing Up

Both prevention controls and detection controls are important, especially for controlling costs, both product development costs and ongoing production costs. However, prevention controls are often more effective than detection controls, and they offer the added benefit of improving the speed of product development.

Are Stage-Gate Product Development Processes Doomed to Fail?

Machine Design published an interesting article by Dr. David Ullman comparing “scrum” methods with stage-gate (or “waterfall”) processes.

Ullman, who is a well-known expert in product development methods, presented data suggesting that scrum methods cut project failure rates in half. Good result, but unless you read carefully, you may not have noticed that this was for software and did not relate to hardware.

As Ullman noted, software is less vulnerable to factors that make hardware development difficult. This relates to tooling and production investments—which are minimal or absent in software projects—but also includes greater difficulties that hardware faces in analysis and testing.

Nevertheless, as I’ve found in 47 years of experience with product development processes, hardware-driven projects can have high success rates using stage-gate processes. I’ve personally participated in re-structuring these processes for a number of firms, and, when the process is properly constructed, the failure rate for projects is usually less than 10%.

What’s the Major Factor in Failure?

One of the challenges in hardware design, whether you use scrum or stage-gate, is starting the project properly. If you don’t start well, the chances for failure rise dramatically.

Unfortunately, too many managers, who learned their craft with methods in use 10-20 years ago, have a strong preference to see hardware in some form as early as possible in the project timeline.

That’s a major problem. It’s all about how you get to verification of the design. In a hardware-first culture, here’s what I have seen time and time again:

Once a basic concept design is generated, there’s a rush to get prototypes that can be looked at, touched, and most importantly, tested. Of course, these early prototypes don’t work properly, so the design is reworked. More samples are tested. They do better, but still don’t meet the goals. This is repeated again and again, usually until time runs out and the design is pushed over to the manufacturing team.

Results of Hardware-First Methods

The net result is the design team learns what doesn’t work, but often doesn’t understand why the design works (assuming final test results actually meet project goals). It’s also next to impossible to control project timing with this method.

The goal becomes simple: build prototypes that will pass all of the required tests.

And, finally, the manufacturing side of the team is slowly losing their minds as engineering changes keep coming until the product is launched. If you’re lucky, changes slow down, but usually keep trickling in for a year or so.

When the product reaches customers, there are often problems. It could be that the tests used didn’t reproduce real-world conditions very well. Often, customers use products in ways the design team didn’t anticipate.

Manufacturing costs and capital budgets are battered and market acceptance can be irreparably damaged.

What You Should Aim For

To avoid this pitfall, managers and executives need to be more patient and deliberate in the early stages. That doesn’t mean using time in a leisurely fashion—far from it. Instead, project teams need to move at breakneck speed, but be more structured and focused.

Instead of a goal that designs need to pass necessary tests, the project goal must be broader. This reflects what Dr. Joseph Juran defined as “quality.” The goal must be to arrive at a properly verified design that is fit for use by targeted customers.

Test criteria can never fully address that. There are dozens of reasons why, but I’ve studied this at length, and I find almost no products that failed in the market that didn’t pass tests used to verify the design.

How to Do Better

To create a product that is “fit for use,” spend some detailed, intense time trying to determine all of the functional requirements that will satisfy your customer base.

There’s a lot to do to achieve that, but it can be achieved quickly and comprehensively by following a simple outline.

As with the “early hardware” method, start with a concept design. That means a design that is far enough along to create a rough costed bill of materials.

Then follow these five steps:

    1. Create a block diagram for the entire product.
    2. Decide which sections of the block diagram (what Ullman called modules) cause concern. This will usually be subsystems or components that are new, unique (in terms of application), or difficult. These design elements are sometimes called “NUDs.”
    3. Next, create a Parameter diagram, and try to determine what the modules of interest must do to make the system work correctly. If you are supplying something that will be used in a bigger system, this will include your customer’s system. You’ll also need to identify the environmental conditions that the product must withstand, as well as the important user factors that will make your product robust against reasonable and expected misuse.
    4. Use the block diagram and the Parameter diagram to create an interface matrix. The resulting functions that fill the matrix will form a detailed requirements statement that needs to be evaluated to achieve verification.
    5. Use that list of functions to generate a Design Failure Modes & Effects Analysis. The resulting analytical prevention controls and physical detection controls will give you a list of things needed to achieve verification.
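To give a feel for steps 1, 4, and 5, here is a minimal sketch of how a block diagram might feed an interface matrix, and how the matrix yields the function list that seeds the Design FMEA. The three-block bolted joint and its function statements come from the first article above; the helper names and the dictionary layout are hypothetical, and a real interface matrix would also record the type of each interaction.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

# Step 1: blocks from the block diagram (here, the simple bolted joint).
blocks: List[str] = ["bolt", "component A", "component B"]

# Step 4: functions found at each interface, keyed by (source, target).
interface_functions: Dict[Pair, str] = {
    ("bolt", "component A"): "bolt creates compression in component A",
    ("bolt", "component B"): "bolt creates compression in component B",
}


def build_interface_matrix(names: List[str], functions: Dict[Pair, str]) -> Dict[str, Dict[str, str]]:
    """Return a square matrix (dict of dicts) with a function in each used cell."""
    matrix = {row: {col: "" for col in names} for row in names}
    for (source, target), function in functions.items():
        if source not in matrix or target not in matrix:
            raise ValueError(f"unknown block in interface: {source} -> {target}")
        matrix[source][target] = function
    return matrix


def unreviewed_pairs(names: List[str], functions: Dict[Pair, str]) -> List[Pair]:
    """Block pairs with no function in either direction, for the team to review."""
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if (a, b) not in functions and (b, a) not in functions
    ]


matrix = build_interface_matrix(blocks, interface_functions)

# Step 5: the filled cells become the function list for the Design FMEA.
dfmea_functions = list(interface_functions.values())
print(dfmea_functions)
print(unreviewed_pairs(blocks, interface_functions))  # [('component A', 'component B')]
```

The empty cell between component A and component B is the useful part. It prompts the team to decide whether that interface really carries no function or whether a requirement has been missed.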

Carry out the analytical prevention controls before you build any significant test samples. Use analytical techniques to debug and modify the design.

Now, build prototypes and conduct the detection-driven control tests.

If you’ve done this well, the odds are you won’t be endlessly repeating tests with multiple hardware iterations. You’ll be able to stay on schedule, be on budget, and have a design that the manufacturing team won’t have to continuously change as they complete their preparation for production.

The Outcome

If you’ve done the proper analysis outlined above, your process outline will now look like this:

 

Of course, there are a lot of details in this process, but the improvement in project cost, project timing, and launch issues is the payoff you can expect from this process. And it fits very well in the familiar outline of a stage-gate process.