Statistical Methods/Control Charts

My question is regarding a threading process.  There is 100% inspection with a go/no go check and about 5% rejection/rework.  The batch size is 5,000 units, and each batch is completed in 3 days of production. Two such batches are produced in a month.

What type of control chart should be used to monitor the process? How should the process capability be calculated in this case?

The type of control chart first depends on what type of data you are measuring.  If you are doing go/no go then you are limited to a “P” chart or a “C” chart.  A “P” chart looks at % good (or bad).  A “C” chart looks at the number of defects found.

If you are measuring thickness or strength (something that can be measured), then you can use an X-bar/R chart or an X-bar/S chart, depending on how many samples are taken.

That is the simple answer; part of this depends on how you are taking samples and how often.  If samples are taken at the start and the finish, then I would probably recommend the “P” chart.

If you can measure throughout the manufacturing process, and you look at the type of defects, then I recommend a “C” chart.

Ideally, if you can get measurement data, you are better off with the X-bar/R or the X-bar/S charts.  These tend to be better predictors and it is easier to calculate capability.

With the capability for the go/no go data, you can get % defective, (or % good) and multiply that by 1,000,000 to get your capability estimate in defects per million.
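As a rough illustration of the go/no go case, the sketch below computes 3-sigma p chart limits and the defects-per-million estimate. The daily reject counts are made up to match the question's figures (5,000 units, about 5% rejected); the function names are only for this example, and the limit formula is the usual p chart formula rather than anything specific to this process.

```python
import math

def p_chart_limits(defectives, sample_sizes):
    """Return (p_bar, limits), where limits holds one (lcl, ucl) pair per subgroup."""
    p_bar = sum(defectives) / sum(sample_sizes)
    limits = []
    for n in sample_sizes:
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        # Proportions cannot fall below 0 or rise above 1, so clip the limits.
        limits.append((max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)))
    return p_bar, limits

def defects_per_million(fraction_defective):
    """Capability estimate for attribute data: fraction defective times 1,000,000."""
    return fraction_defective * 1_000_000

# Hypothetical daily results for one 5,000-unit batch made over 3 days.
rejects = [85, 78, 87]
inspected = [1700, 1600, 1700]

p_bar, limits = p_chart_limits(rejects, inspected)
dpm = defects_per_million(p_bar)

print(f"average fraction rejected: {p_bar:.3f}")
print(f"defects per million: {dpm:,.0f}")
for day, (lcl, ucl) in enumerate(limits, start=1):
    print(f"day {day}: LCL = {lcl:.4f}, UCL = {ucl:.4f}")
```

With only two batches a month, it would take a while to collect enough subgroups for stable limits, so treat early limits as provisional.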

Jim Bossert
SVP Process Design Manager, Process Optimization
Bank of America
Fort Worth, TX



Control Chart to Analyze Customer Satisfaction Data


Q: Let’s assume we have a process that is under control and we want to monitor a number of key quality characteristics expressed through small subjective scales, such as: excellent, very good, good, acceptable, poor and awful. This kind of data is typically available from customer satisfaction surveys, peer reviews, or similar sources.

In my situation, I have full historical data available and the process volume average is approximately 200 deliveries per month, giving me enough data and plenty of freedom to design the control chart I want.

What control chart would you recommend?

I don’t want to reduce my small scale data to pass/fail, since I would lose insight into the underlying data. Ideally, I’d like a chart that both provides control limits for process monitoring and gives insight into the distribution of responses across the scale items (i.e., “poor,” “good,” “excellent”).

A: You can handle this analysis a couple of ways.  The most obvious choice and probably the one that would give you the most information is a Q-chart. This chart is sometimes called a quality score chart.

The Q-chart assigns a weight to each category. Using the criteria presented, values would be:

  • excellent = 6
  • very good =5
  • good =4
  • acceptable =3
  • poor =2
  • awful=1.

You calculate the subgroup score by multiplying the weight of each category by its count and then adding the results together.

If 100 surveys were returned with results of 20 that were excellent, 25 very good, 25 good, 15  acceptable, 12 poor, and 3 awful, the calculation is:

6(20)+5(25)+4(25)+3(15)+2(12)+1(3)= 417

This is your score for this subgroup.   If you have more subgroups, you can calculate a grand mean by adding all the subgroup scores and dividing it by the number of subgroups.

If you had 10 subgroup scores of 417, 520, 395, 470, 250, 389, 530, 440, 420, and 405, the grand mean is simply:

((417+ 520+ 395+ 470+ 250+ 389+ 530+ 440+ 420+ 405)/10) = 4236/10 =423.6

The control limits would be the grand mean +/- 3 √grand mean.  Again, in this example, 423.6 +/-3√423.6 = 423.6 +/-3(20.58).   The lower limit is  361.86 and the upper limit is 485.34. This gives you a chance to see if things are stable or not.  If there is an out of control situation, you need to investigate further to find the cause.
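The arithmetic above can be sketched in a few lines of Python. This simply reproduces the example's numbers; the weights, the dictionary layout, and the function names are illustrative choices, and the limits use the grand mean +/- 3 √grand mean rule exactly as described in the answer.

```python
import math

# Weights from the example; you would choose your own.
WEIGHTS = {"excellent": 6, "very good": 5, "good": 4,
           "acceptable": 3, "poor": 2, "awful": 1}

def subgroup_score(counts, weights):
    """Weighted score: sum of (category weight x number of responses)."""
    return sum(weights[category] * n for category, n in counts.items())

def score_chart_limits(scores):
    """Return (lcl, center, ucl) using grand mean +/- 3 * sqrt(grand mean)."""
    center = sum(scores) / len(scores)
    half_width = 3 * math.sqrt(center)
    return center - half_width, center, center + half_width

counts = {"excellent": 20, "very good": 25, "good": 25,
          "acceptable": 15, "poor": 12, "awful": 3}
score = subgroup_score(counts, WEIGHTS)  # 417

scores = [417, 520, 395, 470, 250, 389, 530, 440, 420, 405]
lcl, center, ucl = score_chart_limits(scores)
out_of_control = [s for s in scores if s < lcl or s > ucl]

print(f"subgroup score: {score}")
print(f"LCL = {lcl:.2f}, center = {center:.1f}, UCL = {ucl:.2f}")
print(f"points outside the limits: {out_of_control}")
```

In this made-up series, the scores of 520, 250, and 530 fall outside the limits, which is the kind of signal you would then investigate.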

The other choice is similar, but the weights have to total to 1. Using the criteria presented, the values would be:

  •  excellent = .3
  • very good = .28
  • good =.25
  • acceptable =.1
  • poor=.05
  • awful = .02.

You would calculate the numbers the same way for each subgroup:

.3(20)+.28(25)+.25(25)+.1(15)+.05(12)+.02(3)= 6+7+6.25+1.5+.6+.06=21.41

If you had 10 subgroup scores of 21.41, 19.3, 20.22, 25.7, 21.3, 17.2, 23.3, 22, 19.23, and 22.45, the grand mean is simply ((21.41+ 19.3+ 20.22+ 25.7+ 21.3+ 17.2+ 23.3+ 22+ 19.23+ 22.45)/10)= 212.11/10 =21.211.

The control limits would be the grand mean +/- 3 √grand mean.  Therefore, the limits would be 21.211+/-3 √21.211= 21.211+/-3(4.606).  The lower limit is  7.39 and the upper limit is 35.03.

The method is up to you.  The weights I used were simply arbitrary for this example. You would have to create your own weights for this analysis to be meaningful in your situation.  In the first example, I have it somewhat equally weighted. In the second example, it is biased to the high side.

I hope this helps.

Jim Bossert
SVP Process Design Manager, Process Optimization
Bank of America
Fort Worth, TX


AQL for Electricity Meter Testing


Q: We have implemented a program to test electricity meters that are already in use. This would target approximately 28,000 electricity meters that have been in operation for more than 15 years. Under this program, we plan to test a sample of meters and come to a conclusion about the whole batch  —  whether replacement is required or not. As per ANSI/ISO/ASQ 2859-1:1999: Sampling procedures for inspection by attributes — Part 1: Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot inspection, we have selected a sample of 315 to be in line with the total number of electricity meters in the batch.

Please advise us on how to select an appropriate acceptable quality level (AQL) value to accurately reflect the requirement of our survey and come to a decision on whether the whole batch should be rejected and replaced. Thank you.

A: One of the least liked phrases uttered by statisticians is “it depends.” Unfortunately, in response to your question, the selection of the AQL depends on a number of factors and considerations.

If one didn’t have to sample from a population to make a decision, meaning we could perform 100% inspection accurately and economically, we wouldn’t need to set an AQL. Likewise, if we were not able to test any units from the population at all, we wouldn’t need the AQL. It’s the sampling and associated uncertainty that it provides that requires some thought in setting an AQL value.

As you may notice, the lower the AQL the more samples are required. Think of it as reflecting the size of a needle. A very large needle (say, the size of a telephone pole) is very easy to find in a haystack. An ordinary needle is proverbially impossible to find. If you desire to determine if all the units are faulty or not (100% would fail the testing if the hypothesis is true), that would be a large needle and only one sample would be necessary. If, on the other hand, you wanted to find if only one unit of the entire population is faulty, that would be a relatively small needle and 100% sampling may be required, as the testing has the possibility of finding all are good except for the very last unit tested in the population.

AQL is not the needle or, in your case, the proportion of faulty fielded units. It is the acceptance quality limit (formerly called the acceptable quality level), which is related to the proportion of bad units. The AQL is tied to the probability that a random sample, drawn from a population whose actual failure rate equals the AQL (say 0.5%), contains few enough failures to be accepted. We set the probability of acceptance relatively high, often 95%. This means that if the population is actually as good as or better than our AQL, we have roughly a 95% chance of pulling a sample that will result in accepting the batch as being good.

The probability of acceptance is built into the sampling plan. Drafting an operating characteristic curve of your sampling plan is helpful in understanding the relationship between AQL, probability of acceptance, and other sampling related values.
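One way to draft an operating characteristic curve is to compute the probability of acceptance over a range of true defective rates. The sketch below uses a binomial model, which is a reasonable approximation when the sample (315) is small relative to the lot (28,000). The sample size comes from the question; the acceptance number of 5 is purely a placeholder for illustration, so substitute the value from whichever plan you actually select.

```python
from math import comb

def prob_accept(n, c, p):
    """Probability of finding c or fewer defectives in a random sample of n,
    when the true fraction defective is p (binomial model)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

SAMPLE_SIZE = 315    # from the question
ACCEPT_NUMBER = 5    # hypothetical acceptance number, for illustration only

# Tabulate points on the operating characteristic curve.
for p in (0.005, 0.01, 0.02, 0.03, 0.05):
    pa = prob_accept(SAMPLE_SIZE, ACCEPT_NUMBER, p)
    print(f"{p:.1%} defective -> probability of acceptance = {pa:.3f}")
```

Plotting these points (defective rate on the horizontal axis, probability of acceptance on the vertical) shows how quickly the plan starts rejecting batches as quality gets worse.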

Now back to the comment of “it depends.” The AQL is the statement that basically says the population is good enough: an acceptably low failure rate. For an electrical meter, the allowable number of out-of-specification units may be defined by contract or agreement with the utility or regulatory body. As an end customer, I would enjoy a meter that under reports my electricity use, as I would pay for less than I received. The utility company would not enjoy this situation, as it would be providing its service at a discount. And you can imagine the reverse situation and consequences. Some calculations and assumptions would permit you to determine the cost to the consumers or to the utility for various proportions of units out of specification, either over or under reporting. Balance the cost of testing against the cost of meter errors and you can find a reasonable sampling plan.

Besides the regulatory or contract requirements for acceptable percent defective, or the balance between costs, you should also consider the legal and publicity ramifications. If you accept 0.5% as the AQL, and there are one million end customers, that is 5,000 customers with possibly faulty meters. What is the cost of bad publicity or legal action? While not likely if the total number of faulty units is small, there does exist the possibility of a very expensive consequence.

Another consideration is the measurement error of the testing of the sampled units. If the measurement is not perfect, which is a reasonable assumption in most cases, then there is some chance that the test results do not represent the actual performance of the units. If the testing itself has repeatability and reproducibility issues, then setting a lower AQL may help to provide a margin to guard against this uncertainty. A good test (accurate, repeatable, reproducible, etc.) should have less of an effect on the AQL setting.

In summary, if the decision based on the sample results is important (major expensive recall, safety or loss of account, for example), then use a relatively lower AQL. If the test result is for an information gathering purpose which is not used for any major decisions, then setting a relatively higher AQL is fine.

If my meter is in the population under consideration, I am not sure I want my meter evaluated. There are three outcomes:

  • The meter is fine and in specification, which is to be expected and nothing changes.
  • The meter is overcharging me and is replaced with a new meter and my utility bill is reduced going forward. I may then pursue the return of past overcharging if the amount is worth the effort.
  • The meter is undercharging me, in which case I wouldn’t want the meter changed nor the back charging bill from the utility (which I doubt they would do unless they found evidence of tampering).

As an engineer and good customer, I would want to be sure my meter is accurate, of course.

Fred Schenkelberg
Voting member of U.S. TAG to ISO/TC 56
Voting member of U.S. TAG to ISO/TC 69
Reliability Engineering and Management Consultant
FMS Reliability



Variation in Continuous and Discrete Measurements

Q: I would appreciate some advice on how I can fairly assess process variation for metrics derived from “discrete” variables over time.

For example, I am looking at “unit iron/unit air” rates for a foundry cupola melt furnace in which the “unit air” rate is derived from the “continuous” air blast, while the unit iron rate is derived from input weights made at “discrete” points in time every 3 to 5 minutes.

The coefficient of variation (CV) for the air rate is exceedingly small (good) due to its “continuous” nature, but the CV for iron rate is quite large because of its “discrete” nature, even when I use moving averages for extended periods of time. Hence, that seemingly large variation for iron rate then carries over when computing the unit iron/unit air rate.

I think the discrete nature of some process variables results in unfairly high assessments of process variation, so I would appreciate some advice on any statistical methods that would more fairly assess process variation for metrics derived from discrete variables.

A: I’m not sure I fully understand the problem, but I do have a few assumptions and possibly a reasonable answer for you. As you know, when making a measurement using a discrete scale (red, blue, green; on/off; or similar), the item being measured is placed into one of the “discrete” buckets. For continuous measurements, we use some theoretically infinite scale to place the unit’s location on that scale. For this latter type of measurement, we are often limited by the accuracy of the equipment, which sets the level of precision the measurement can achieve.

In the question, you mention measurements of air from the “continuous” air blast. The air may be moving without interruption (continuously), yet the measurement is probably recorded periodically unless you are using a continuous chart recorder. Even so, matching up the reading with the unit iron readings every 3 to 5 minutes does create individual readings for the air value. The unit iron reading is a “weights” based reading (I am not sure what is meant by derived, so let’s assume the measurement comes from a weight scale of some sort). Weight, like mass or length, is an infinite scale measurement, limited by the ability of the specific measurement system to differentiate between sufficiently small units.

I think you see where I’m heading with this line of thought. The variability with the unit iron reading may simply reflect the ability of the measurement process. I do not think either air rate or unit iron (weight based) is a discrete measurement, per se. Improve the ability to measure the unit iron and that may reduce some measurement error and subsequent variation. Or, it may confirm that the unit iron is variable to an unacceptable amount.

Another assumption I could make is that the unit iron is measured for the batch that then has unit air rates regularly measured. The issue here may just be the time scales involved. Not being familiar with the particular process involved, I’ll assume some manner of metal forming, where a batch of metal is created then formed over time where the unit air is important. And, furthermore, assume the batch of metal takes an hour for the processing. That means we would have about a dozen or so readings of unit air for the one reading of unit iron.

If you recall, the standard error of an average is the standard deviation divided by the square root of n (the number of samples). In this case, there is about a 10 to 1 difference in n (10 for unit air to one for unit iron). Over many batches of metal, the ratio of readings remains at or about 10 to 1, thus impacting the relative stability of the two coefficients of variation. Get more readings for unit iron or reduce the unit air readings, and it may just even out. Or, again, you may discover the unit iron readings and underlying process are just more variable.
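To see how much the number of readings alone can matter, here is a small sketch of the square root of n effect. The 20% single-reading CV is an invented figure, and the calculation assumes the readings are independent with the same mean and spread, which may not hold for a real furnace.

```python
import math

def cv_of_average(cv_single, n):
    """CV of the average of n independent readings that share the same
    mean and the same single-reading CV (cv_single)."""
    return cv_single / math.sqrt(n)

cv_reading = 0.20  # hypothetical single-reading CV of 20%

print(f"one reading:          CV = {cv_of_average(cv_reading, 1):.3f}")
print(f"average of 10 readings: CV = {cv_of_average(cv_reading, 10):.3f}")
```

Under these assumptions, averaging ten readings cuts the CV to roughly a third of the single-reading value, so two quantities with the same underlying variability can look quite different if one is based on many more readings.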

From the scant information provided, I think this provides two areas to conduct further exploration. Good luck.

Fred Schenkelberg
Voting member of U.S. TAG to ISO/TC 56
Voting member of U.S. TAG to ISO/TC 69
Reliability Engineering and Management Consultant
FMS Reliability

Could Null Hypothesis State Difference?

Q: Does a null hypothesis always state that there is no difference?  Could there be a null hypothesis that claims there is? 

In the U.S. legal system, the null hypothesis is that the accused is assumed innocent until proven guilty.  In another legal system, there might exist the possibility that the accused is assumed guilty until proven innocent.  In our system, a type 1 error would be to find an innocent man guilty.  What would be considered a type 1 error if the null hypothesis was assumed guilt?

A: Sir Ronald Fisher developed this basic principle more than 90 years ago. As you have correctly stated above, the process is assumed innocent until proven guilty. You must have evidence beyond reasonable doubt. An alpha error (type 1) is calling an innocent person guilty. Failure to prove guilt when a person really did commit a crime is a Beta error (type 2).

What can the null hypothesis tell us?  Does the confidence interval include zero (or innocence in the court example)? Instead of asking, “can you assume guilt and prove innocence?”, turn the question around and ask “does the confidence interval include some value that is guilty?”

For example, let’s say a process has an unknown mean and standard deviation, but it has customer specifications from 8-12 millimeters. Your sample measures 14 millimeters. Clearly, your sample is guilty by customer specifications. We need to prove beyond reasonable doubt that the confidence interval of the process, at some risk level (alpha), does not include guilty material. This is done by measuring the process for control.  If it is in control and not meeting customer specifications, either move the distribution, reduce the variation (through Design of Experiments or other methods), or do some combination of both.

If the new confidence interval does not include guilt, the argument would be that you have proven, beyond reasonable doubt, that the confidence interval does not include the out-of-spec material. Under this circumstance, a type 1 error (alpha error) would occur when the process mean is less than the upper specification but the confidence interval still includes the specification.
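A minimal sketch of the check described above: build a confidence interval from sample data and ask whether it stays inside the 8-12 millimeter specifications. The measurements are invented, the interval here is for the process mean, and it uses a normal approximation rather than a t-based interval, so treat it as an illustration only.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, alpha=0.05):
    """Two-sided confidence interval for the mean (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    center = mean(data)
    half_width = z * stdev(data) / math.sqrt(len(data))
    return center - half_width, center + half_width

def interval_within_spec(interval, lower_spec, upper_spec):
    """True when the whole interval lies inside the specification limits."""
    low, high = interval
    return lower_spec <= low and high <= upper_spec

# Hypothetical measurements in millimeters after improving the process.
measurements = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.2]

ci = mean_confidence_interval(measurements, alpha=0.05)
print(f"95% interval for the mean: {ci[0]:.2f} to {ci[1]:.2f} mm")
print("interval inside 8-12 mm spec:", interval_within_spec(ci, 8, 12))
```

Note that an interval for the mean says nothing directly about individual parts; judging individual parts against the specifications would need a capability study or a tolerance interval instead.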

For further reading on this material, refer to the following text:

Testing Statistical Hypotheses, E. L. Lehmann and Joseph P. Romano, 2005.

Bill Hooper
ASQ Certified Six Sigma Master Black Belt
President, William Hooper Consulting Inc.
Naperville, IL