Driving Production Validation With PFMEA

In product development, there are a number of steps that should be carried out in order to launch a new product. The overall objectives for any new product will almost certainly include earning a margin and generating a return on both the product development expenses and the investment in production capability.

To achieve these goals for a manufactured product, it’s important that the start of production be as smooth and trouble-free as possible. It’s also critical that the products that are sold perform as promised. That means no defects during the first year of usage, as well as no safety defects during the entire life of the product.

Getting these kinds of results is never an accident. It requires structure and thoroughness in the development process. What are the key steps?

ISO 15288

ISO/IEC 15288 is an international standard for systems engineering that describes an ideal set of steps to reach these goals. While this standard isn’t intended to be a comprehensive description of everything that must be done in product development, it does explain the general flow of information from initial planning of a product all the way to the end-of-life considerations that are increasingly important. There are four steps in this standard:

    • The first step is to plan the product, including the reason for the product and, most importantly, the generation of a reasonably complete conceptual design.
    • The second step is the overall development of the concept into a production-ready package, including a verified design and a validated system of manufacturing.
    • The third step is ongoing or serial production of items that will be sold.
    • The fourth step includes all of the things that are necessary to support products that are sold, including usage instructions, service and support, and recommendations for end-of-life disposal of the product.

For most manufacturing companies, the second step is the most critical. It’s also the step that has the most impact on the eventual financial results of any development project.

Understanding Verification and Validation

While the terms “verification” and “validation” are often used interchangeably, in quality systems they have very specific and limited meanings.

Verification is the development of information and other forms of evidence that show that the final product design will perform as intended, independent of manufacturing considerations.

Validation is a demonstration that a verified design can be translated into a physical product using the actual production processes that will be used to generate saleable products.

In short, verification is evidence that the design is sound, and validation is proof that the product can be made in the way the design intends.

The Chain of Validation

While verification is a separate subject that won’t be addressed in detail in this article, validation is the final confirmation that a new product will be delivered to customers in a form that is consistent with the latest plans and expectations.

Validating a production system for a new product progresses through two stages. To start, you must develop a plan for production. Then you must put that plan into action in a controlled experiment to see whether it actually works as desired.

Validation Planning

Here are the steps involved in planning for validation. They include the development of a production process as well as any initial trials of machines, tooling, and related items.

The foundation for a successful production outcome is to plan the process, analyze the process for potential flaws before investing in hardware, and then to use that analysis to develop a Control Plan that will be used to keep the process operating properly once saleable product is being made. The analysis tool that underpins this “chain” of events is Process Failure Modes and Effects Analysis, or PFMEA.

In PFMEA, each manufacturing process (fabrication and/or assembly) is analyzed for cause and effect relationships. Controls are then derived, with prevention controls acting against causes and detection controls acting against effects. These controls then form the basis for the Control Plan.
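The following Python sketch shows roughly how controls flow from PFMEA rows into a Control Plan. Each row is modeled as one cause-mode-effect chain. Every prevention control becomes an entry aimed at the row’s cause, and every detection control becomes an entry aimed at its effect. The class names, field names, and the `derive_control_plan` helper are hypothetical and do not follow any standard PFMEA or Control Plan schema.

```python
from dataclasses import dataclass, field


@dataclass
class PfmeaRow:
    """One cause-mode-effect chain for a single process step (hypothetical schema)."""
    process_step: str
    cause: str
    failure_mode: str
    effect: str
    prevention_controls: list[str] = field(default_factory=list)
    detection_controls: list[str] = field(default_factory=list)


@dataclass
class ControlPlanEntry:
    """One line of a simplified Control Plan (hypothetical schema)."""
    process_step: str
    control: str
    control_type: str  # "prevention" or "detection"
    acts_on: str       # the cause (prevention) or the effect (detection) being watched


def derive_control_plan(rows: list[PfmeaRow]) -> list[ControlPlanEntry]:
    """Prevention controls target causes; detection controls target effects."""
    plan: list[ControlPlanEntry] = []
    for row in rows:
        for control in row.prevention_controls:
            plan.append(ControlPlanEntry(row.process_step, control, "prevention", row.cause))
        for control in row.detection_controls:
            plan.append(ControlPlanEntry(row.process_step, control, "detection", row.effect))
    return plan


# Example: a single bolt-tightening chain (illustrative values only).
row = PfmeaRow(
    process_step="Tighten joint bolt",
    cause="Torque gun set too high",
    failure_mode="Excessive compression of component A",
    effect="Excess deflection of component A",
    prevention_controls=["Torque-limiting clutch set to maximum torque"],
    detection_controls=["Inspect joint for deflection"],
)
for entry in derive_control_plan([row]):
    print(entry.control_type, "->", entry.control, "| acts on:", entry.acts_on)
```

A real Control Plan also carries items such as characteristics, specifications, sample sizes, frequencies, and reaction plans. It also usually consolidates controls that appear in several rows. The sketch leaves those out.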

With the Control Plan in hand, it’s then possible to put together first-rate work instructions for factory personnel and to formulate a plan for making sure that all of the measurement tools (or gages) can be validated, a process called Measurement System Analysis, or MSA.

While doing this, flaws or weaknesses in the planned manufacturing process should be revealed, and then changes can be made before finalizing investment decisions.

It’s then possible to try out the entire plan using the actual tooling and equipment, with a serious plant-level review as the intermediate end point. Any necessary corrective action can be carried out at that point.

Completion of Validation

Once plant management has been satisfied that all of the plans, equipment, instructions, and workforce preparedness are complete, it’s time to see if it all works properly under the approximate conditions of ongoing production. This is the final stage of validation.

This starts with a production trial run, which is sometimes called a final pilot run of production. Real products, suitable for sale, are made. To the greatest extent possible, this production run should be done at the planned maximum rate of production, using all of the tools, gages, and equipment that will be used after the product launch begins. It also means that properly trained people, the same people that will work on the product after launch, should be doing the manual elements of the processes.

Enough parts or products should be built so that reasonable statistical analysis can be done. Combined with the run-at-rate condition stated above, this usually requires at least four, and often eight, hours of production.

These parts can then be used to validate the gaging system and to statistically assess the critical product characteristics (which were identified in the Design FMEA process).
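One common way to make the statistical assessment concrete is a process performance index computed from the trial-run measurements of a critical characteristic. The sketch below calculates a Ppk-style index, min(USL − mean, mean − LSL) / (3s), using the overall sample standard deviation. The specification limits, the sample readings, and the 1.67 target are assumptions for illustration. The index only means something once the gage has passed MSA.

```python
import statistics


def ppk(values: list[float], lsl: float, usl: float) -> float:
    """Ppk-style performance index: distance from the mean to the nearer
    specification limit, divided by three overall standard deviations."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # overall (sample) standard deviation
    if s == 0:
        raise ValueError("no measured variation; check gage resolution")
    return min(usl - mean, mean - lsl) / (3 * s)


# Hypothetical trial-run readings for one critical characteristic (mm).
readings = [9.98, 10.01, 10.00, 9.99, 10.02, 10.00, 9.97, 10.03]
index = ppk(readings, lsl=9.85, usl=10.15)
TARGET = 1.67  # assumed acceptance target; use your customer's requirement
print(f"Ppk = {index:.2f}, meets target: {index >= TARGET}")
```

Using the overall standard deviation, rather than a within-subgroup estimate, suits a single trial run. That run may not yet have enough rational subgroups to separate short-term from long-term variation.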

If all of those outcomes are reasonable, then a sampling of the products generated during the Production Trial Run can be subjected to whatever physical tests for performance and durability might be necessary.

If those outcomes are successful, the entire “machine” for production can be formally approved, both at the plant level and by any other important party (usually the industrial customer, for a supplier, or the product planning and marketing staff, for an end-point manufacturer).

PFMEA Is Critical

Process FMEA is vital in this entire set of events. A carefully considered PFMEA will reduce the likelihood that anything will go wrong during the validation process. If something does go astray, the consequences are not trivial. The launch can be delayed, possibly missing key marketing dates and delaying or even reducing the revenue needed to pay for all of the investment in the project. Tooling and equipment purchases can go far over budget. Worst of all, defective product can be sold, and that can damage your company’s reputation in a way that can take years to overcome.

Validation matters.


Understanding Prevention and Detection Controls in Failure Modes & Effects Analyses

Controls Don’t Enter FMEA Immediately

When completing an FMEA, controls aren’t developed until rather late in the process. After you’ve determined the functions and associated requirements, deduced failure modes, and determined effects and causes, you are ready to discuss both prevention and detection controls.

Why Controls Are Important

Actually, developing a sound list of controls is one of the main reasons for doing any FMEA. FMEA studies teach you a great deal about product or process requirements, and they can alert you to failure scenarios through “cause-mode-effect” chains that you hadn’t thought about. You can also increase your general understanding of either a product design or a process flow through the development of FMEA studies.

Of course, FMEA can also give you a semi-quantitative way to assess risk. But you can’t really understand the true nature of the risks that any project faces without planning and then executing a proper set of control activities. And that’s true for both design activities and processes.

In short, controls are activities that allow you to recognize or identify the conditions that lead to specific causes or effects of a disrupted function (or a failure mode, to use the terminology of FMEA). There are two types of controls: prevention controls and detection controls. Here is a table that explains the difference between these controls for Design FMEA and Process FMEA:

[Table: prevention and detection controls in Design FMEA and Process FMEA]

How Does This Work on the FMEA Form?

If you know the cause-mode-effect chain, and you have derived that chain from a properly defined and constructed function statement, then any prevention control is a search for the cause in the chain, while any detection control is a search for the effect in the chain.

In any cause-mode-effect chain, something that happens before a function is disrupted must be a cause. As a result, you can imagine a number of possible causes, and, based on occurrence, you’ve selected one or two of the most likely causes for this chain. A cause may not lead to the failure mode every time it arises, but in some cases it can lead to the failure mode at hand.

Because you have visualized (or imagined) these causes, you can now think about how you could possibly break the chain from cause to mode. You probably can’t prevent the cause from existing, but you can think about how you can react to the cause-mode link.

A Design Example: Prevention Controls

If you are designing a bolted joint, you will certainly have a function that is something like “bolt creates compression in component A” (and a similar function of “bolt creates compression in component B” to complete the joint). “Excessive compression” might be one of the ways the function could be disrupted. “Incorrect torque specification” is certainly one of the causes. At this point, we have a cause-mode link that says, “Incorrect torque specification leads to excess compression in component A.”

What can we do about this? We could certainly do a physical test, but a physical test won’t actually look at the torque specification directly. A test might show that something is wrong with the joint, but quite a few things could happen in the test to lead you astray, such as a defective bolt in the test, an error in the torque reading, or a component A whose strength is too low.

Besides, we want to prevent any failures (including testing failures, which disrupt the project), not just see them happen. To do this, you need to create a control that is based on analysis or reasoning and does NOT depend on a test.

In this case, you could conduct a simple stress calculation based on bolted joint design—or do the same thing with finite element analysis. If the joint design is simple enough, you might even find tabulated values in a reference such as Machinery’s Handbook. Best of all, you can do these analyses at worst-case conditions (limits of specification for all relevant components), which is something that is quite difficult to do in testing.
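The Python sketch below shows what such a worst-case calculation might look like. It uses the widely quoted short-form torque-tension relation T = K·F·d, where T is tightening torque, K is a nut factor, F is the bolt preload, and d is the nominal bolt diameter. With that relation it estimates the compressive stress under the bolt head at every combination of specification limits. All numbers are hypothetical placeholders, not design data. That includes the nut-factor range and the allowable bearing stress for component A.

```python
import math
from itertools import product


def preload_n(torque_nm: float, nut_factor: float, diameter_m: float) -> float:
    """Solve the short-form relation T = K * F * d for preload F (newtons)."""
    return torque_nm / (nut_factor * diameter_m)


def bearing_stress_pa(force_n: float, head_od_m: float, hole_d_m: float) -> float:
    """Compressive stress under the bolt head over an annular contact area."""
    area_m2 = math.pi / 4 * (head_od_m ** 2 - hole_d_m ** 2)
    return force_n / area_m2


def worst_case_stress_pa(torque, nut_factor, diameter_m, head_od, hole_d) -> float:
    """Check every corner of the (min, max) specification ranges; keep the maximum."""
    return max(
        bearing_stress_pa(preload_n(t, k, diameter_m), od, h)
        for t, k, od, h in product(torque, nut_factor, head_od, hole_d)
    )


# Hypothetical M10 joint: specification limits written as (min, max) pairs.
TORQUE_NM = (40.0, 50.0)
NUT_FACTOR = (0.15, 0.25)
HEAD_OD_M = (0.0158, 0.0160)
HOLE_D_M = (0.0110, 0.0112)
ALLOWABLE_PA = 300e6  # assumed minimum bearing strength of component A

nominal = bearing_stress_pa(preload_n(45.0, 0.20, 0.010), 0.0159, 0.0111)
worst = worst_case_stress_pa(TORQUE_NM, NUT_FACTOR, 0.010, HEAD_OD_M, HOLE_D_M)
print(f"nominal {nominal / 1e6:.0f} MPa, worst case {worst / 1e6:.0f} MPa")
print("worst case acceptable:", worst <= ALLOWABLE_PA)
```

With these made-up inputs, the nominal joint passes and the worst-case corner does not. That is exactly the kind of finding that is hard to get from a handful of prototype tests. Checking only the corners works here because the stress rises or falls steadily with each input. A response with a peak inside the ranges would need a finer search.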

You still may need to test a prototype to confirm the calculation, but you are much more likely to pass the test without trial and error efforts. And trial and error efforts take time and raise product development costs. In a world where faster-better-less cost is critical, this is no small thing.

Now Consider the Same Function & Failure Mode in a Process Example

Again, we want to prevent failures. We don’t want to measure the joint after installing the bolt—not only would that be a detection control, but it would also lead to at least one defective process outcome. How can we avoid excessive torque?

To develop the most effective prevention control, you would certainly need to be very familiar with the production facility where the work would be done. That will lead you to the most likely cause. To show how a prevention control might work, let’s assume the most likely cause is that the torque gun is set too high.

But is that a root cause or a superficial cause? Let’s ask why the gun might be set too high, and further assume that it is set high to speed up the operation. (The fact that higher torque might not really do that won’t stop some people from believing it!)

A relatively simple way to overcome this is to use a gun that has a torque-limiting clutch. That’s a good prevention control, as long as the gun is maintained and the clutch is set to the correct maximum torque. Of course, it’s not a full jidoka control, which would stop production if excess torque were applied, but it’s a decent prevention control.

Because the torque-limiting clutch isn’t a jidoka control, it won’t earn as favorable a detection rating as a full jidoka control would. However, it’s quite a bit better than just putting up a sign that says, “Don’t change the torque setting.”
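Here is a toy Python model of one tightening cycle that contrasts these three responses. A sign does nothing to the torque delivered. A clutch caps torque at its own setting. A jidoka-style check stops the line whenever the delivered torque exceeds the maximum. The functions and torque values are hypothetical. The point is that the clutch protects the joint only while its own setting is correct, which is why it falls short of full jidoka.

```python
from dataclasses import dataclass


@dataclass
class Cycle:
    applied_nm: float   # torque actually delivered to the joint
    line_stopped: bool  # True if the control halted production


def with_sign(gun_setting_nm: float) -> Cycle:
    # A sign has no physical effect: the gun delivers whatever it is set to.
    return Cycle(gun_setting_nm, line_stopped=False)


def with_clutch(gun_setting_nm: float, clutch_limit_nm: float) -> Cycle:
    # The clutch slips at its limit, so torque is capped at the clutch setting,
    # which only helps if that setting is itself correct and maintained.
    return Cycle(min(gun_setting_nm, clutch_limit_nm), line_stopped=False)


def with_jidoka(gun_setting_nm: float, max_nm: float) -> Cycle:
    # A jidoka control senses excess torque and stops production.
    return Cycle(gun_setting_nm, line_stopped=gun_setting_nm > max_nm)


MAX_NM = 50.0       # hypothetical maximum allowed torque
gun_setting = 65.0  # operator has turned the gun up to "save time"

print(with_sign(gun_setting))
print(with_clutch(gun_setting, clutch_limit_nm=50.0))  # clutch set correctly
print(with_clutch(gun_setting, clutch_limit_nm=70.0))  # clutch set wrong
print(with_jidoka(gun_setting, MAX_NM))
```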

What About Detection Controls?

On the other side of the coin, detection controls are usually much easier to understand. For the design concern, you might be worried about “excessive bolt deformation” as the most severe effect. A test of prototype parts is certainly a valid detection control. Just remember that there are statistical limits to what a test can tell you, largely because it is so difficult to test at true worst-case specification limits.

Similarly, you could check a finished joint in production for undue joint deformation due to excess torque. The actual effect might be “excess deflection of component A.” You wouldn’t know something had gone wrong until you found the defect in an inspection step, but at least you’d find it, provided the excess deflection of component A was relatively easy to identify.

Summing Up

Both prevention controls and detection controls are extremely important in controlling cost, both in product development and in ongoing production. However, prevention controls are often more effective than detection controls, and they offer the added benefit of speeding up product development.

Are Stage-Gate Product Development Processes Doomed to Fail?

There’s an interesting article by Dr. David Ullman, published in Machine Design, comparing “scrum” methods with stage-gate (or “waterfall”) processes.

Ullman, who is a well-known expert in product development methods, presented data suggesting that scrum methods cut project failure rates in half. That’s a good result, but unless you read carefully, you may not have noticed that the data were for software, not hardware.

As Ullman noted, software is less vulnerable to factors that make hardware development difficult. This relates to tooling and production investments—which are minimal or absent in software projects—but also includes greater difficulties that hardware faces in analysis and testing.

Nevertheless, as I’ve found in 47 years of experience with product development processes, hardware-driven projects can have high success rates using stage-gate processes. I’ve personally participated in re-structuring these processes for a number of firms, and, when the process is properly constructed, the failure rate for projects is usually less than 10%.

What’s the Major Factor in Failure?

One of the challenges in hardware design, whether you use scrum or stage-gate, is starting the project properly. If you don’t start well, the chances for failure rise dramatically.

Unfortunately, too many managers, who learned their craft with methods in use 10-20 years ago, have a strong preference to see hardware in some form as early as possible in the project timeline.

That’s a major problem. It’s all about how you get to verification of the design. In a hardware-first culture, here’s what I have seen time and time again:

Once a basic concept design is generated, there’s a rush to get prototypes that can be looked at, touched, and most importantly, tested. Of course, these early prototypes don’t work properly, so the design is reworked. More samples are tested. They do better, but still don’t meet the goals. This is repeated again and again, usually until time runs out and the design is pushed over to the manufacturing team.

Results of Hardware-First Methods

The net result is that the design team learns what doesn’t work, but often doesn’t understand why the design works (assuming final test results actually meet project goals). It’s also next to impossible to control project timing with this method.

The goal becomes simple: build prototypes that will pass all of the required tests.

And, finally, the manufacturing side of the team slowly loses its mind as engineering changes keep coming until the product is launched. If you’re lucky, changes slow down, but they usually keep trickling in for a year or so.

When the product reaches customers, there are often problems. It could be that the tests used didn’t reproduce real-world conditions very well. Often, customers use products in ways the design team didn’t anticipate.

Manufacturing costs and capital budgets are battered and market acceptance can be irreparably damaged.

What You Should Aim For

To avoid this pitfall, managers and executives need to be more patient and deliberate in the early stages. That doesn’t mean using time in a leisurely fashion—far from it. Instead, project teams need to move at breakneck speed, but be more structured and focused.

Instead of a goal that designs need to pass necessary tests, the project goal must be broader. This reflects what Dr. Joseph Juran defined as “quality.” The goal must be to arrive at a properly verified design that is fit for use by targeted customers.

Test criteria can never fully address that. There are dozens of reasons why, but I’ve studied this at length, and I have found almost no products that failed in the market without having passed the tests used to verify the design.

How to Do Better

To create a product that is “fit for use,” spend some detailed, intense time trying to determine all of the functional requirements that will satisfy your customer base.

There’s a lot to do to achieve that, but it can be achieved quickly and comprehensively by following a simple outline.

As with the “early hardware” method, start with a concept design. That means a design that is far enough along to create a rough costed bill of materials.

Then follow these five steps:

    1. Create a block diagram for the entire product.
    2. Decide which sections of the block diagram (what Ullman called modules) cause concern. These will usually be subsystems or components that are new, unique (in terms of application), or difficult. These design elements are sometimes called “NUDs.”
    3. Next, create a Parameter diagram, and try to determine what the modules of interest must do to make the system work correctly. If you are supplying something that will be used in a bigger system, this will include your customer’s system. You’ll also need to identify the environmental conditions that the product must withstand, as well as the important user factors, including reasonable and expected misuse, that your product must be robust against.
    4. Use the block diagram and the Parameter diagram to create an interface matrix. The resulting functions that fill the matrix will be a detailed requirements statement that needs to be evaluated to achieve verification.
    5. Use that list of functions to generate a Design Failure Modes & Effects Analysis. The resulting analytical prevention controls and physical detection controls will give you a list of things needed to achieve verification.
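Steps 1, 2, and 4 can be sketched in a few lines of Python, with one input borrowed from step 3. The sketch takes the blocks from the block diagram and the interfaces between them. It then drafts requirement stubs only for interfaces that touch the modules of concern (the NUDs). The interface categories (physical, energy, information, material) are a commonly used convention assumed here. The function names and example blocks are hypothetical. In a real study, each stub would become a measurable function statement for the DFMEA.

```python
from itertools import combinations

# Assumed interface categories (a common convention, not mandated by the text).
INTERFACE_TYPES = {"P": "physical", "E": "energy", "I": "information", "M": "material"}


def interface_matrix(blocks, interfaces):
    """Build a symmetric block-by-block matrix; each cell is a set of type codes."""
    matrix = {a: {b: set() for b in blocks if b != a} for a in blocks}
    for a, b, code in interfaces:
        if code not in INTERFACE_TYPES:
            raise ValueError(f"unknown interface type {code!r}")
        matrix[a][b].add(code)
        matrix[b][a].add(code)  # interfaces are two-way
    return matrix


def requirement_stubs(matrix, modules_of_concern):
    """Draft one requirement stub per interface that touches a module of concern."""
    stubs = []
    for a, b in combinations(matrix, 2):
        if a in modules_of_concern or b in modules_of_concern:
            for code in sorted(matrix[a][b]):
                stubs.append(f"{a} / {b}: {INTERFACE_TYPES[code]} interface (define requirement)")
    return stubs


# Hypothetical bolted-joint example; "environment" comes from the Parameter diagram.
blocks = ["bolt", "component A", "component B", "environment"]
interfaces = [
    ("bolt", "component A", "P"),
    ("bolt", "component B", "P"),
    ("component A", "component B", "P"),
    ("environment", "bolt", "M"),  # e.g., moisture reaching the threads
]
matrix = interface_matrix(blocks, interfaces)
for stub in requirement_stubs(matrix, modules_of_concern={"bolt"}):
    print(stub)
```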

Carry out the analytical prevention controls before you build any significant test samples. Use analytical techniques to debug and modify the design.

Now, build prototypes and conduct the detection-driven control tests.

If you’ve done this well, the odds are you won’t be endlessly repeating tests with multiple hardware iterations. You’ll be able to stay on schedule, be on budget, and have a design that the manufacturing team won’t have to continuously change as they complete their preparation for production.

The Outcome

If you’ve done the proper analysis outlined above, your process outline will now look like this:

[Figure: revised product development process outline]

Of course, there are a lot of details in this process, but the improvement in project cost, project timing, and launch issues is the payoff you can expect from this process. And it fits very well in the familiar outline of a stage-gate process.

Dealing With Risk In Product Development

Failure Modes & Effects Analysis is an incredibly valuable tool for assessing and dealing with risk in product development. However, the traditional “Risk Priority Number,” or RPN, taken alone doesn’t always give clear indications about how you should deal with risk.

What Are Some of the Other Ways to Assess Risk in FMEA?

One way is to simply look at each row on the FMEA form as a self-contained story about one limited aspect of a design or a process. Certainly, developing the FMEA yields a much more useful and meaningful result if you work through the form on a column by column basis—but that doesn’t change the fact that each row tells a tale.

And those line-by-line stories are built around specific cause-mode-effect (C-M-E) chains, because the occurrence and severity ratings are the major factors that determine risk. Certainly, the ability to detect a cause or an effect prior to design release or to process completion does have some impact on risk, and detection is not without importance. Detection, though, is less significant because it is always an imperfect element of risk management.

The biggest impact on risk arises from the severity and occurrence ratings.

With this in mind, the simplest way to assess risk is to read each row, and then, without any kind of calculation or formal rules, just sort the rows into three categories:

    • Those that make you worry a great deal: red risks
    • Those that you don’t think are really significant or worth much worry: green risks
    • All of the other rows, with a level of concern between the other two categories: yellow risks

I’ve been looking at FMEA studies—both those I’ve worked on personally and those completed without my participation—for many years and I’ve never had trouble sorting every row into one of these three categories. And, while doing that, I’ve rarely had much disagreement about these judgments from others.

Adding More Formality–But Potentially Less Understanding

Of course, there’s a lack of formality in assessing risk this way, and that informal approach makes many people uncomfortable. That is particularly true of senior managers and executives, who don’t always want to spend the effort and time to fully understand the details of risk and would rather have a more recognized, less opinion-based judgment. There’s also a very real concern about bias when it’s done just by high-level judgment.

There are more structured ways to do approximately the same thing, and most of these are based on the concept of criticality—the product of the severity rating and the occurrence rating. The original idea was simple—a higher value means more risk. But, as I explained in the previous Lighthouse post, this still assumes that severity and occurrence have equal significance, and that’s just not true.
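A small Python check shows why the equal-weighting assumption matters: two very different chains can land on the same criticality. The ratings are illustrative, on the usual 1 to 10 scales.

```python
def criticality(severity: int, occurrence: int) -> int:
    """Criticality as described in the text: severity rating times occurrence rating."""
    for rating in (severity, occurrence):
        if not 1 <= rating <= 10:
            raise ValueError("ratings must be between 1 and 10")
    return severity * occurrence


rare_but_dangerous = criticality(severity=10, occurrence=2)  # possible injury, rarely occurs
frequent_nuisance = criticality(severity=2, occurrence=10)   # minor annoyance, occurs often
print(rare_but_dangerous, frequent_nuisance)  # prints: 20 20
```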

So, over the years, various schemes have been proposed to evaluate the concept of criticality. Most are based on these ideas—and detection is not really a factor:

    • Any C-M-E chain with a severity of 9 or 10 is critical. Since human safety and/or serious regulatory issues are likely if a failure mode arises, the impact of occurrence (or detection) does not diminish the fact that this is a potentially serious chain of events.
    • A C-M-E chain with a reasonably high severity (say 5, 6, 7, or 8) in combination with a moderately high occurrence, typically 4 or more, is significant. Moderate severity with a lower occurrence doesn’t meet this test, nor does a high occurrence with a low severity.
    • Anything else is neither critical nor significant.
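The three rules above can be written as a small Python function. The thresholds are parameters because the text gives them as typical values: severity 9 or 10 for critical, and severity 5 to 8 with occurrence 4 or more for significant. Your organization’s rules may differ. The function name is hypothetical.

```python
def classify_chain(severity: int, occurrence: int, *,
                   critical_severity: int = 9,
                   significant_severity: tuple[int, int] = (5, 8),
                   significant_occurrence: int = 4) -> str:
    """Classify one C-M-E chain; detection plays no part in these rules."""
    if not (1 <= severity <= 10 and 1 <= occurrence <= 10):
        raise ValueError("ratings must be between 1 and 10")
    if severity >= critical_severity:
        return "critical"  # occurrence cannot downgrade a safety/regulatory chain
    low, high = significant_severity
    if low <= severity <= high and occurrence >= significant_occurrence:
        return "significant"
    return "none"


for s, o in [(10, 1), (7, 5), (7, 3), (3, 9)]:
    print(s, o, classify_chain(s, o))
```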

In this approach, any row (or C-M-E chain) that is either critical or significant requires a classification designation. This is what that skinny “classification” column is all about on most FMEA forms. However, this does not directly identify critical and significant characteristics; the classification is all about the C-M-E chain of events. Characteristics (or parameters for PFMEA) are a separate consideration, and they don’t necessarily jump out from the FMEA form unless functions and functional requirements have been carefully worked out as part of the FMEA study.

So, if a C-M-E chain is classified as critical, there must be one or more critical product characteristics that drive the risk. In DFMEA, these can be found in the Function/Functional Requirements columns of the DFMEA. In PFMEA, this often requires a cross-reference to the associated DFMEA, as critical characteristics are always features or requirements that relate to the design of the product, and these don’t always show up clearly in a PFMEA study.

Significant classifications can lead to either a significant characteristic (again, a design element) or to a significant process parameter. Significant characteristics should be available directly from the functional section of the relevant DFMEA. Similarly, in Process FMEA, significant process parameters should be evident in the functional section of the PFMEA itself.

The major drawback with this approach is that less understanding of, and less attention to, the underlying risk goes into any product characteristic or process parameter identified this way. That happens because the rules don’t force careful consideration of the risk issues. The other drawback is that this still assumes that severity and occurrence are equally significant, so the result is sometimes a distorted picture of risk.


The Latest Method–Worth Considering

Finally, the 2019 Edition of the Automotive Industry Action Group (AIAG)-Verband der Automobilindustrie (VDA) FMEA handbook takes an approach that may bridge the gap between a simple “red-yellow-green” approach and more formal criticality methods. The AIAG-VDA approach has these features:

    • The results will be organized into three categories, called Action Priorities
    • There will be high, medium, and low categories
    • High issues must be acted on, medium issues should be acted on, and low issues may be considered for action
    • This is not explicitly about risk; instead, this is prioritization of issues that should be undertaken to diminish risk
    • There are guidelines about how to identify high, medium, and low priorities, but they are not completely dissimilar from the ideas that have been used to specifically define criticality using severity and occurrence

Severity is still called severity. Occurrence and detection keep their names in DFMEA and PFMEA. The supplemental FMEA for Monitoring and System Response (FMEA-MSR) uses frequency and monitoring in their place. Either way, these are basically the same ideas. There are (as always) new rating tables for these three concepts.

After using the Action Priority approach a few times, I’ve found it to be a useful tool that is about the same as the very simple red, yellow, and green assessments I explained above.

As an example of why “action” is more useful than just “high risk,” consider a safety device that has several C-M-E chains with severities of 10. Suppose the occurrence rating for each of these chains is 2, and the detection rating is also 2 in each chain. (Bear in mind that getting to an occurrence rating and a detection rating of 1 requires some extraordinary evidence in most rating tables.) Is there really anything that can be done about this risk, at least at any practical cost?

In these cases, I’ve always said that the main response is to carry product liability umbrella insurance with a very high coverage limit. The probability that something will go wrong is very low, and you are very likely to find the problem before it gets to the marketplace. But if it does get there, and something goes awry, the consequences can be harsh.
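The arithmetic behind this safety-device example is easy to check in Python. The RPN (severity × occurrence × detection) comes out at 40 for each chain. A screen based only on an RPN cut-off would pass it over, while a severity-based rule flags it immediately. Neither number says what practical action is available. The cut-off of 100 is purely hypothetical, chosen only to illustrate an RPN threshold.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Traditional Risk Priority Number: S x O x D."""
    return severity * occurrence * detection


RPN_CUTOFF = 100  # hypothetical screening threshold, for illustration only

severity, occurrence, detection = 10, 2, 2  # the safety-device chain described above
value = rpn(severity, occurrence, detection)
print("RPN:", value)                                   # prints: RPN: 40
print("flagged by RPN cut-off:", value >= RPN_CUTOFF)  # prints: flagged by RPN cut-off: False
print("flagged by severity rule:", severity >= 9)      # prints: flagged by severity rule: True
```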

Risk Is Never Just a Number

When you really step back and think about it, any way you approach this, you are using less-than-purely-quantitative methods to assess the factors that drive risk. You are then using less-than-purely-quantitative rules to decide what you might do to address the risk you’ve found. Risk can never be eliminated, but it can always be managed.

In the end, it’s still judgment, as risk assessment has always been (and may forever be). Nevertheless, intelligent decision-making based on circumstances and risk preferences can be improved significantly by applying these ideas.

3 Actions To Make FMEA Worthwhile

FMEA is a critical tool in product development processes, and you should understand why that’s true.

In this article, I’m going to explain why it’s worthwhile to complete an FMEA. Let’s start with a general discussion, without regard to Process FMEA or Design FMEA. The major issues are the same, although the details can (and often do) diverge.

Point 1: FMEA is NOT an End Point.

If you complete an FMEA, heave a sigh of relief, and don’t use the results for meaningful and significant actions, don’t waste your time doing FMEA studies. FMEA is all about improving products and processes, and if you think a finished FMEA is useful because it checks off a “to-do” on some list of deliverables in a project, you are missing the point completely. You’ll spend time (and money) and get little or nothing in return. The FMEA form is just a place to keep track of what can and should be done as a result of the study.

Point 2: You Need to Make Changes As A Result of Every FMEA Study

If you’ve done an FMEA study correctly, you are going to find some cause-mode-effect chains (or rows on the FMEA form) that have a risk you aren’t comfortable with. In those cases you have two choices. You can agree, with an eyes-wide-open perspective, to accept the risk as it sits and move on. But you can also devise a plan to take a direct action, choosing a design change (for either DFMEA or PFMEA) or a process change (PFMEA only) that will address the risk.

If you never try to do anything except go forward the way you planned before starting the FMEA, you are making a terrible mistake. Worse, you are devaluing the FMEA process—and once you do that, you will find that each successive FMEA has less and less significance. Eventually, you’ll just be checking the box and filing the form away in some obscure folder, drive, or database.

Point 3: You Need to Implement The Controls

If you’ve done a decent job on the FMEA study, you will have a long list of prevention controls as well as detection controls. Do you need to add every single one to the Design Verification Plan (DFMEA) or the Manufacturing Control Plan (PFMEA)?

In a perfect world, the answer would be yes. And, there are plenty of OEM customers (and engineering managers) who will insist that you reach for a perfect world. However, reality is somewhat different. In a sound FMEA, there will be a number of controls that you would like to apply, but are either too expensive, take too long, or are otherwise unlikely to be carried out.

In those situations, you again have to deal with an eyes-wide-open risk. Either spend the money and/or time to carry out the controls, change the design or process, or accept the risk.

On the other hand, there will also be one, two, or even several controls that you really should add to the list for verification or control plans. They might add a bit of cost, they might add time to a project, or they might intrude on issues that veer into the “we don’t do that here” mentality of your organization.

If you aren’t applying some of these from every FMEA, you simply aren’t getting to the main point. At the same time, if you are skipping one, two, or more of the controls you’d normally use for the same kind of product or process, you are, again, missing the boat.

What You Can Do

I’ve been doing FMEA studies since 1978. I’ve learned something about the methodology every time I use it, and I’ve been involved, either as an active participant, instructor, or facilitator, in hundreds of DFMEA and PFMEA studies.

When the results are applied, good things happen. When results are not applied, the outcome is worse than “nothing happened.” What almost always occurs is that FMEA becomes devalued, and another group of engineers walks away with more than a bit of loathing for using this valuable technique.

And, while I suspect that many of you think this advice is too obvious to worry about, I will add this. Even in the past five years, the share of FMEAs I have seen that resulted in no meaningful action exceeds 80%. I would say 95%, but in truth I don’t always know if action has been taken in every case.

Use the results. Make something positive happen with what you learn from each and every FMEA.