## AQL Clarifications

Question

I am confused about the values used for AQLs. For example in Table II-A the AQL values range from 0.010 to 1000. Where do these values come from and what do they mean?

The table states, “AQLs, in Percent Nonconforming Items and Nonconformities per 100 Items.” At first I thought the values were percentages, but how can you have more than 100, as in 100%, as the values go up to 1000? Also how can there be more than 100 nonconformities per 100 items, unless one part can have multiple nonconformities?

Just looking for clarification on the AQL numbers, what they mean, and how to interpret them.

Answer

Let’s start with the definition of Acceptable Quality Level (AQL). From Z1.4, the AQL is the quality level that is the worst tolerable process average when a continuing series of lots is submitted for acceptance sampling. Although individual lots with quality as bad as the AQL can be accepted with fairly high probability, the designation of an AQL does not suggest that it is necessarily a desirable quality level. The AQL is a parameter of the sampling scheme and should not be confused with the process average, which describes the operating level of a manufacturing process. The product quality level is expected to be better than the AQL (a lower percent nonconforming) to avoid an excessive number of non-accepted lots.

The columns with values greater than 100 cannot be read as percent nonconforming items; they apply only when counting nonconformities per 100 items, since a single item can carry multiple nonconformities. These columns have some statistical relevance in connection with the switching rules, but for the general practitioner, they can be ignored.

Hope this helps.

Steven Walfish

## Z1.4: Selecting the Sample Size

Q: I work for a pharmaceutical company that manufactures soft gel capsules. What is the proper way to select a sample size when using ANSI/ASQ Z1.4-2008: Sampling Procedures and Tables for Inspection by Attributes?

I’ll further illustrate my question with an example.  If one were to have a batch size of 20,000 units, according to General Inspection Level II, Normal, the corresponding letter code is “M.” In the master table for Acceptable Quality Levels (AQLs), the sample size would be 315 units.  If my AQL is 0.010 (with an acceptance/rejection number of 0/1 based on the table), does my sample size change to 1250 units? Or does it remain at 315 units?

A: The simple answer is 1250, not the 315 suggested by sample size code letter M. General Inspection Level II, Normal, shows that for a lot size of 20,000, sample size code letter M corresponds to a sample size of 315. For an AQL of 0.010, however, the arrow in the master table points to a sample size of 1250 (sample size code letter Q) to achieve the required AQL of 0.010.

The calculation of AQL is not dependent on lot size. In other words, a sample size of 315 supports a minimum AQL of 0.040, so a larger sample is required to work to an AQL of 0.010.

Q2: Could you please add another layer to your response? The reason I’m seeking additional clarification is that the first step in determining the sample size is to find the letter code and the corresponding sample size. To me, it feels like the first step should be to determine the AQL.

A2: Let me expand with a more technical explanation.  Attribute sampling is based on the hypergeometric distribution and is estimated using the binomial distribution (which assumes an infinite population size).

The basic formula for the binomial probability of acceptance, for a sample size n, defects allowed x, and fraction defective p, is:

Pa = Σ (d = 0 to x) C(n, d) × p^d × (1 − p)^(n − d)

For x = 0 this reduces to Pa = (1 − p)^n, which can be solved for the AQL and limiting quality (LQ) for a given sample size (n) and defects allowed (x):

AQL = 1 − 0.95^(1/n) (the fraction defective accepted 95% of the time)

LQ = 1 − 0.10^(1/n) (the fraction defective accepted only 10% of the time)

If n = 30 and x = 0, then AQL = 1 − 0.95^(1/30) ≈ 0.17% and LQ = 1 − 0.10^(1/30) ≈ 7.4%.
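These two quantities are easy to verify numerically. The sketch below (a hypothetical helper, not part of the standard) computes the AQL and LQ for a zero-acceptance plan directly from the reduced binomial formula:

```python
def aql_lq_zero_acceptance(n):
    """AQL and LQ for a zero-acceptance (x = 0) plan of sample size n.

    With x = 0, the probability of accepting a lot with fraction
    defective p is Pa = (1 - p)**n, so solving Pa = 0.95 for p gives
    the AQL and solving Pa = 0.10 gives the limiting quality (LQ).
    """
    aql = 1 - 0.95 ** (1 / n)
    lq = 1 - 0.10 ** (1 / n)
    return aql, lq

aql, lq = aql_lq_zero_acceptance(30)
print(f"AQL = {aql:.2%}, LQ = {lq:.2%}")  # AQL ≈ 0.17%, LQ ≈ 7.4%
```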

If you are using Z1.4, your sample size is selected based on your lot size. Then, you would pick the AQL you need based on the risk you are willing to take for the process average of percent defective. If you decide not to use Z1.4, but instead use the binomial directly, then you are correct that you would decide on the AQL and lot tolerance percent defective (LTPD) first, then calculate a sample size for c = 0, c = 1, c = 2, etc.
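As a sketch of that binomial-direct approach, the hypothetical helper below searches for the smallest sample size whose plan (n, c) accepts AQL-quality lots with at least 95% probability and LTPD-quality lots with at most 10% probability:

```python
from math import comb

def accept_prob(n, c, p):
    """Binomial probability of acceptance: P(defects in sample <= c)."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

def sample_size(aql, ltpd, c, alpha=0.05, beta=0.10):
    """Smallest n such that plan (n, c) passes AQL-quality lots with
    probability >= 1 - alpha and LTPD-quality lots with probability <= beta."""
    n = c + 1
    while not (accept_prob(n, c, aql) >= 1 - alpha and
               accept_prob(n, c, ltpd) <= beta):
        n += 1
    return n

# Using the AQL/LQ pair from the n = 30, x = 0 example above:
for c in (0, 1, 2):
    print(c, sample_size(aql=0.0017, ltpd=0.074, c=c))
```

For c = 0 this recovers n = 30, matching the earlier example; larger acceptance numbers require larger samples for the same AQL/LTPD pair.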

Steven Walfish
Secretary, U.S. TAG to ISO/TC 69
ASQ CQE
Principal Statistician, BD
http://statisticaloutsourcingservices.com

Related Content:

Acceptance Sampling With Rectification When Inspection Errors Are Present, Journal of Quality Technology

Zero Defect Sampling, World Conference on Quality and Improvement, open access

Explore ASQ’s website for more case studies, articles, benchmarking reports, and other content about zero defect sampling.

## Z1.4 Split Sampling

Q: I have two questions about Z1.4-2008: Sampling Procedures and Tables for Inspection by Attributes.

1. Does the plan allow one to “split” sampling plans among multiple items, or is only one item per plan intended?

2. The plan states a 95% confidence level, which I understand to mean that the number of defects found in the sample will be statistically consistent with the defect level of the entire inspected lot. So, if we split the sampling, how can you determine what happens to the confidence level?

A: Thank you for submitting your question to ASQ’s Ask the Experts Program. Answers to your inquiries follow.

1. In attempting to answer any given question, one needs to understand the question with respect to its gist and terms used.

Z1.4 uses the term “unit” to represent an individual “product” entity (unit here can represent a discrete fairly simple product, such as a bolt or nut), or it can represent a complex product (such as a computer, or a large piece of machinery, or even a square meter of cloth or other material, a length of wire or other material, etc.).

It is assumed here that the use of the term “item” in the question refers to a “unit.” It might, however, refer to a quality characteristic, and the explanation given here will attempt to explain either case.

Now, units can have a single principal quality characteristic or they can have many different quality characteristics.

Z1.4 allows for some of these quality characteristics to be of greater importance (severity for example, with respect to quality and/or economic effects) than others, whereby separate sampling is applied to each group with different sampling parameters (such as sample size, acceptance number, lot size). Hence, units with a single quality characteristic can be checked by sampling via Z1.4 and units with multiple quality characteristics can be checked by sampling via Z1.4.

In each case, the chosen Acceptable Quality Limit (AQL) and what it stands for applies to whatever is included in the inspection made on each unit. It is also assumed that this separate handling of units and quality characteristics is what the question means with respect to the term “split.”

Furthermore, it should also be understood that sampling inspection can be conducted with respect to two distinctly different statistics. One is the number of nonconforming units found in the sample. These are sometimes referred to as “defectives.” The second is the number (sum) of nonconformities found on all units in the sample, where any given single unit can have multiple nonconformities. These are often referred to as “defects.”

A “nonconforming unit” is defined as a unit with one or more nonconformities (defects), but it is counted as only one “defective” unit. A “nonconformity” is any departure from requirements in any quality characteristic being considered in the inspection of each unit. In Z1.4, one can use either statistic as desired. The choice is largely dependent on the nature of the product units and the reason for doing the sampling inspection — whether it is to control or oversee defective units or to control or oversee defects.

In the tables of Z1.4, note the top line above the range of AQLs: “Acceptance Quality Limits (AQLs), Percent Nonconforming Items and Nonconformities per 100 Items.” It should also be pointed out that Z1.4 is intended to be a sampling scheme or system, not just a selection of a given sampling plan. Please review the standard and any of the excellent books available on sampling inspection covering Z1.4, ISO 2859, etc.

2. If one examines the Z1.4 standard from cover to cover, one will not encounter the term “confidence level.” Z1.4 contains no confidence intervals (or levels) related to any of its features.

Furthermore, the 95% figure is a very general figure associated with the expected “probability of acceptance” at the designated (selected) AQL. This is NOT a confidence level! In fact, the AQL is NOT a statistic!

Setting an AQL is generally an agreement/negotiation process between the customer and supplier. It is more of an index. Essentially, it refers to a level of nonconformity that is generally “acceptable” — a value of 0 being desired of course — but otherwise, a compromise figure.

And it is not by any means a constant, as can be seen by examining the Operating Characteristic (OC) Curves for the various code letters A through R using the same AQL in every table.

For example, for an AQL of 2.5% with the code letter C plan, incoming quality p must be 1.03% for Pa to be 95%, and Pa at 2.5% is less than 90%; for the code letter F plan, p must be 1.80% for Pa to be 95% and Pa at 2.5% is between 90% and 95%, etc.
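These OC-curve figures can be checked with the binomial formula. In the sketch below, the (n, Ac) pairs for code letters C and F at an AQL of 2.5% are assumed from the Z1.4 single sampling, normal inspection master table (n = 5, Ac = 0 and n = 20, Ac = 1):

```python
from math import comb

def accept_prob(n, c, p):
    """P(accepting the lot) = P(no more than c nonconforming in sample of n)."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

def p_at_pa(n, c, target):
    """Incoming quality p at which Pa equals target, found by bisection
    (Pa is strictly decreasing in p)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if accept_prob(n, c, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# (n, Ac) pairs assumed from the Z1.4 master table for AQL = 2.5%:
for letter, (n, c) in {"C": (5, 0), "F": (20, 1)}.items():
    print(letter,
          f"p at Pa=95%: {p_at_pa(n, c, 0.95):.2%}",
          f"Pa at p=2.5%: {accept_prob(n, c, 0.025):.3f}")
```

The output reproduces the figures quoted above: for code letter C, Pa at p = 2.5% is below 90%, and for code letter F it falls between 90% and 95%.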

If confidence intervals at chosen levels are desired for any given sampling plan, one must resort to the theory and methodologies of statistical inference using the information provided by the sample statistics.

Kenneth Stephens
ASQ Fellow
ASQ Quality Press Author

For more on this topic, please visit ASQ’s website.

## Terminology for Inspected Material (GMP, ISO 13485)

Q: There is often confusion with the labeling of purchased materials after they have been “inspected, tested and/or verified” according to good manufacturing practice (GMP) requirements. Once out of quarantine, are purchased materials labeled as accepted, approved, or released? I’ve had auditors and inspectors tell me all three.

A: Any of the three terms (accepted, approved, or released) is appropriate and commonly used. It would appear that the auditors are voicing an opinion and should not be. Neither ISO 13485:2003: Medical devices — Quality management systems — Requirements for regulatory purposes nor FDA’s quality system regulation (QSR) specifies what language is to be used.

ISO 13485:2003, clause 7.5.3.3 status identification, states:

“The organization shall identify the product status with respect to monitoring and measurement requirements.  The identification of product status shall be maintained throughout production, storage, installation and servicing of the product to ensure that only product that has passed the required inspections and test … is dispatched, used or installed.”

FDA 21 CFR 820.86 acceptance status requires:

“Each manufacturer shall identify by suitable means the acceptance status of product, to indicate the conformance or nonconformance of product with acceptance criteria. The identification of acceptance status shall be maintained throughout manufacturing, packaging, labeling, installation, and servicing of the product to ensure that only product which has passed the required acceptance activities is distributed, used, or installed.”

The requirement for purchased materials should be clear: identify them so that only those materials that passed acceptance activities are allowed to be used. Neither the standard nor the regulation states how the material is to be identified. That is up to the manufacturer to define in its operating procedure(s).

My personal recommendation is to use the terms “accept/reject” at receiving and during in-process, then use the terms “release/hold” to mean the final product is or is not to be released for distribution.  But any similar terms are fine as long as they are consistently used throughout the quality system and personnel understand the requirement that they can only use product that passed their acceptance activities.

Jim Werner
Voting member to the U.S. TAG to ISO TC 176 Quality Management and Quality Assurance
Medical Device Quality Compliance (MDQC), LLC.
ASQ Senior Member
ASQ CQE, CQA, RABQSA Lead QMS Assessor

For more on this topic, please visit ASQ’s website.

## Sampling in a Call Center

Q: I work as a quality assessor (QA) and I am assisting with a number of analyses in a call center. I need a little help with sampling. My questions are as follows:

1. How do I sample calls taken by an agent if there are six assessors and 20 call center agents that each make 100 calls per day?

2. I am assessing claims paid and I want to determine the error rate and the root cause. How many of those claims would have to be assessed by the same number of QAs if claims per day, per agent, exceed 100?

3. If there are 35 interventions made by an agent per day, with two QAs assessing 20 agents in this environment, the total completed would amount to between 300 and 500 per month. What would the sample size be in this situation?

A: I may be able to provide some ideas to help solve your problem.

The first question is about sampling calls per day by you and your fellow assessors. It is clear that the six assessors are not able to cover all of the calls handled by the 20 call center agents.

What is missing from the question is what are you measuring — customer satisfaction, correct resolution of issues, whether agents are appropriately following call protocols, or something else? Be very clear on what you are measuring.

For the sake of providing a response, let’s say you are able to judge whether or not the agents are appropriately addressing callers’ issues: a binary response, where each call is considered either good or not (pass/fail). While this may oversimplify your situation, it may be instructive on sampling.

Recalling some basic terms from statistics, remember that a sample is taken from some defined population in order to characterize or understand the population. Here, a sample of calls is assessed and you are interested in what portion of the calls is handled adequately (pass). If you could measure all calls, that would provide the answer. However, a limit on resources requires that we use sampling to estimate the population proportion of adequate calls.

Next, consider how sure you want the results of the sample to reflect the true and unknown population results. For example, if you don’t assess any calls and simply guess at the result, there would be little confidence in that result.

Confidence in sampling represents, in one sense, the likelihood that the true population value lies within a range about the sample’s result. A 90 percent confidence means that if we repeatedly draw samples from the population, the result from the sample would be within the confidence bound (close to the actual and unknown result) 90 percent of the time. That also means that the estimate will be wrong 10 percent of the time due to errors caused by sampling. This error is simply the finite chance that the sample happens to include disproportionately many calls that “pass” or “fail,” so that the sample does not accurately reflect the true population.

Setting the confidence is a reflection on how much risk one is willing to take related to the sample providing an inaccurate result. A higher confidence requires more samples.

Here is a simple sample size formula that may be useful in some situations:

n = ln(1 − C) / ln(pi)

where:

n is the sample size

C is the confidence, where 90% would be expressed as 0.9

pi is the proportion considered passing, in this case good calls

ln is the natural logarithm

If we want 90 percent confidence that at least 90 percent of all calls are judged good (pass), then we need at least 22 monitored calls.

This formula is a special case of the binomial sample size calculation and assumes that there are no failed calls among the calls monitored. It means that if we assess 22 calls and none fail, we have at least 90% confidence that the population has at least 90% good calls. If there is a failed call among the 22 assessments, we have evidence that we have less than 90 percent confidence of at least 90 percent good calls. This doesn’t provide information to estimate the actual proportion, yet it is a way to detect whether the proportion falls below a set level.
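The zero-failure example above can be reproduced directly; the function below is an illustrative sketch of the formula, not part of any standard:

```python
import math

def zero_failure_sample_size(confidence, reliability):
    """Smallest n such that observing zero failures in n trials demonstrates,
    with the given confidence, that the pass proportion is at least
    `reliability`. Derived from reliability**n <= 1 - confidence,
    i.e. n = ln(1 - C) / ln(pi)."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

print(zero_failure_sample_size(0.90, 0.90))  # 22
```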

If the intention is to estimate the population proportion of good vs. bad calls, then we use a slightly more complex formula:

n = (z_alpha/2)² × pi × (1 − pi) / E²

where:

pi is the same as before, the proportion of good calls vs. bad calls.

z_alpha/2 is the value from the standard normal distribution corresponding to alpha/2. For 90 percent confidence, we have 90% = 100% × (1 − alpha), so alpha is 0.1 and the corresponding value from the standard normal distribution is 1.645.

E is related to the accuracy of the result. It defines a range (the estimate ± E) within which the true population value should reside. A higher value of E reduces the number of samples needed, yet the result may be further away from the true value than desired.

The value of E depends on the standard deviation of the population. If that is not known, use an estimate from previous measurements or run a short experiment to determine a reasonable estimate. If the proportion of bad calls is the same from day to day and from agent to agent, then the standard deviation may be relatively small. If, on the other hand, there is agent-to-agent and day-to-day variation, the standard deviation may be relatively large and should be carefully estimated.

The z value is directly related to the confidence and affects the sample size as discussed above.

Notice that pi, the proportion of good calls, is in the formula. Thus, if you are taking the sample in order to estimate an unknown pi, then to determine the sample size, assume pi is 0.5. This generates the largest possible sample size and permits an estimate of pi with confidence of 100 × (1 − alpha) percent and accuracy of E or better. If you know pi from previous estimates, then use it to reduce the sample size slightly.

Let’s do an example and say we want 90 percent confidence. The alpha is 0.1 and the z alpha/2 is 1.645. Let’s assume we do not have an estimate for pi, so we will use 0.5 for pi in the equation. Lastly, we want the final estimate based on the sample to be within 0.1 (estimate of pi +/- 0.1), so E is 0.1.

Running the calculation, we find n = (1.645² × 0.5 × 0.5) / 0.1² ≈ 67.7, so we need to sample 68 calls to meet these constraints of confidence and accuracy. Note that tightening the accuracy quickly drives the sample size up; for example, E = 0.025 at the same confidence would require more than 1,000 calls. Relaxing the accuracy or accepting more sampling risk (higher E or lower C) permits a smaller, more practical sample size.
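The calculation can be reproduced with the standard normal quantile from the Python standard library (the function below is an illustrative sketch):

```python
import math
from statistics import NormalDist

def proportion_sample_size(confidence, pi, E):
    """Normal-approximation sample size for estimating a proportion
    to within +/- E at the given two-sided confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    return math.ceil(z**2 * pi * (1 - pi) / E**2)

print(proportion_sample_size(0.90, 0.5, 0.1))  # 68
```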

It may occur that obtaining a daily sample rate with an acceptable confidence and accuracy is not possible. In that case, sample as many as you can. The results over a few days may provide enough of a sample to provide an estimate.

One consideration with the normal approximation to the binomial distribution, used in the second sample size formula, is that it breaks down when either n × pi or n × (1 − pi) is less than five. If either value is less than five, then the confidence interval is large enough to be of little value. If you are in this situation, use the binomial distribution directly rather than the normal approximation.
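As a sketch of using the binomial directly, the hypothetical helper below finds an exact one-sided lower confidence bound on the proportion by bisection on the binomial tail. Applied to the earlier example of 22 monitored calls with zero failures, it recovers a lower bound of about 90 percent:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(k + 1))

def lower_bound(successes, n, confidence):
    """Exact one-sided lower confidence bound on the true proportion:
    the p at which P(X >= successes | p) equals 1 - confidence,
    located by bisection (the tail probability is increasing in p)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        tail = 1 - binom_cdf(successes - 1, n, mid)  # P(X >= successes)
        if tail < 1 - confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(lower_bound(22, 22, 0.90), 3))  # ~0.901
```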

One last note. In most sampling cases, the overall size of the population doesn’t really matter too much. A population of about 100 is close enough to infinite that we really do not consider the population size. A small population and a need to sample may require special treatment of sampling with or without replacement, plus adjustments to the basic sample size formulas.

Choosing the right sample size depends to a large degree on what you want to know about the population. In part, you need to know the final result to calculate the “right” sample size, so it is often just an estimate. By using the above equations and concepts, you can minimize the risk of an inconclusive result, yet determining the right sample size for each situation will always be an evolving process.

Fred Schenkelberg
Voting member of U.S. TAG to ISO/TC 56
Voting member of U.S. TAG to ISO/TC 69
Reliability Engineering and Management Consultant
FMS Reliability
http://www.fmsreliability.com

## AQL for Electricity Meter Testing

Q: We have implemented a program to test electricity meters that are already in use. This would target approximately 28,000 electricity meters that have been in operation for more than 15 years. Under this program, we plan to test a sample of meters and come to a conclusion about the whole batch  —  whether replacement is required or not. As per ANSI/ISO/ASQ 2859-1:1999: Sampling procedures for inspection by attributes — Part 1: Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot inspection, we have selected a sample of 315 to be in line with the total number of electricity meters in the batch.

Please advise us on how to select an appropriate acceptable quality level (AQL) value that accurately reflects the requirements of our survey, so we can come to a decision on whether the whole batch should be rejected and replaced. Thank you.

A: One of the least liked phrases uttered by statisticians is “it depends.” Unfortunately, in response to your question, the selection of the AQL depends on a number of factors and considerations.

If one didn’t have to sample from a population to make a decision, meaning we could perform 100% inspection accurately and economically, we wouldn’t need to set an AQL. Likewise, if we were not able to test any units from the population at all, we wouldn’t need the AQL. It is the uncertainty introduced by sampling that requires some thought in setting an AQL value.

As you may notice, the lower the AQL the more samples are required. Think of it as reflecting the size of a needle. A very large needle (say, the size of a telephone pole) is very easy to find in a haystack. An ordinary needle is proverbially impossible to find. If you desire to determine if all the units are faulty or not (100% would fail the testing if the hypothesis is true), that would be a large needle and only one sample would be necessary. If, on the other hand, you wanted to find if only one unit of the entire population is faulty, that would be a relatively small needle and 100% sampling may be required, as the testing has the possibility of finding all are good except for the very last unit tested in the population.

AQL is not the needle or, in your case, the proportion of faulty fielded units. It is an index of acceptable quality, which is related to the proportion of bad units. The AQL works together with the plan’s probability of acceptance: if the population’s actual failure rate is at the AQL (say, 0.5%), the sampling plan is constructed so that a random sample leads to acceptance with relatively high probability, often set at 95%. This means that if the population is actually as good as or better than our AQL, we have roughly a 95% chance of pulling a sample that will result in accepting the batch as good.

The probability of acceptance is built into the sampling plan. Drafting an operating characteristic curve of your sampling plan is helpful in understanding the relationship between AQL, probability of acceptance, and other sampling related values.
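A simple way to draft such a curve is to tabulate the probability of acceptance over a range of incoming quality levels. In the sketch below, n = 315 comes from the question, but the acceptance number Ac = 1 is purely illustrative; the actual Ac would come from the 2859-1 tables for your chosen AQL:

```python
from math import comb

def accept_prob(n, c, p):
    """Probability of accepting the lot: P(nonconforming in sample <= c)."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

# Illustrative plan only: n = 315 as in the question, Ac = 1 assumed.
n, c = 315, 1
for p in (0.001, 0.002, 0.005, 0.01, 0.02):
    print(f"p = {p:.1%}  Pa = {accept_prob(n, c, p):.3f}")
```

Plotting these (p, Pa) pairs gives the operating characteristic curve: it shows how quickly the probability of accepting the batch falls as the true proportion of faulty meters rises.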

Now back to the comment of “it depends.” The AQL is the statement that basically says the population is good enough – an acceptably low failure rate. For an electrical meter, the acceptable number of out-of-specification units may be defined by contract or agreement with the utility or regulatory body. As an end customer, I would enjoy a meter that under-reports my electricity use, as I would pay for less than I received. The utility company would not enjoy this situation, as it would be providing its service at a discount. And you can imagine the reverse situation and consequences. Some calculations and assumptions would permit you to determine the cost to the consumers or to the utility for various proportions of units out of specification, either over- or under-reporting. Balance the cost of testing against the cost of meter errors and you can find a reasonable sampling plan.

Besides the regulatory or contract requirements for acceptable percent defective, or the balance between costs, you should also consider the legal and publicity ramifications. If you accept 0.5% as the AQL, and there are one million end customers, that is 5,000 customers with possibly faulty meters. What is the cost of bad publicity or legal action? While not likely if the total number of faulty units is small, there does exist the possibility of a very expensive consequence.

Another consideration is the measurement error in testing the sampled units. If the measurement is not perfect, which is a reasonable assumption in most cases, then the test results have some finite chance of not representing the actual performance of the units. If the testing itself has repeatability and reproducibility issues, then setting a lower AQL may help provide a margin to guard against this uncertainty. A good test (accurate, repeatable, reproducible, etc.) should have less of an effect on the AQL setting.

In summary, if the decision based on the sample results is important (major expensive recall, safety or loss of account, for example), then use a relatively lower AQL. If the test result is for an information gathering purpose which is not used for any major decisions, then setting a relatively higher AQL is fine.

If my meter is in the population under consideration, I am not sure I want my meter evaluated. There are three outcomes:

• The meter is fine and in specification, which is to be expected and nothing changes.
• The meter is overcharging me and is replaced with a new meter and my utility bill is reduced going forward. I may then pursue the return of past overcharging if the amount is worth the effort.
• The meter is undercharging me, in which case I wouldn’t want the meter changed nor the back charging bill from the utility (which I doubt they would do unless they found evidence of tampering).

As an engineer and good customer, I would want to be sure my meter is accurate, of course.

Fred Schenkelberg
Voting member of U.S. TAG to ISO/TC 56
Voting member of U.S. TAG to ISO/TC 69
Reliability Engineering and Management Consultant
FMS Reliability
http://www.fmsreliability.com

For more on this topic, please visit ASQ’s website.