Statistical Methods/Control Charts

Question:
My question is regarding a threading process. There is 100% inspection with a go/no-go check and about 5% rejection/rework. The batch size is 5,000 pieces, and a batch is completed in 3 days of production. Two such batches are produced in a month.

What type of control chart should be used to monitor the process? How should the process capability be calculated in this case?

Response:
The type of control chart first depends on what type of data you are measuring. If you are doing go/no-go inspection, then you are limited to attribute charts such as a “P” chart or a “C” chart. A “P” chart tracks the proportion (%) of units found bad (or good) in each sample. A “C” chart tracks the number of defects found.
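A minimal sketch of the usual 3-sigma p-chart limits, assuming the roughly 5% reject rate from the question and treating each 5,000-piece batch as one subgroup. With only two batches a month, subgrouping by shift or by day would give more points to plot; the function name and inputs are illustrative.

```python
import math


def p_chart_limits(p_bar: float, n: int) -> tuple[float, float, float]:
    """Return (LCL, center line, UCL) for a p chart with subgroup size n.

    Uses the usual 3-sigma limits p_bar +/- 3*sqrt(p_bar*(1-p_bar)/n),
    clipped to [0, 1] because a proportion cannot leave that range.
    """
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
    lcl = max(0.0, p_bar - 3.0 * sigma)
    ucl = min(1.0, p_bar + 3.0 * sigma)
    return lcl, p_bar, ucl


# Assumed inputs: ~5% rejected, one batch of 5,000 pieces per subgroup.
lcl, center, ucl = p_chart_limits(p_bar=0.05, n=5000)
print(f"LCL={lcl:.4f}  CL={center:.4f}  UCL={ucl:.4f}")
```

For these inputs it prints `LCL=0.0408  CL=0.0500  UCL=0.0592`, so a batch whose reject rate falls outside roughly 4.1% to 5.9% would signal a change in the process.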

If you are measuring something such as thickness or strength (something that can be measured on a continuous scale), then you can use an X-bar/R chart or an X-bar/S chart, depending on how many samples are taken in each subgroup.
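For measured data, X-bar/R limits come from the subgroup means and ranges. This sketch uses the standard tabulated constants A2, D3, and D4 for subgroup sizes 2 to 5 only; the thread-diameter readings (in mm) are hypothetical. For larger subgroups, roughly ten or more, the X-bar/S chart is the usual choice.

```python
from statistics import mean

# Standard control chart constants, indexed by subgroup size n (2 to 5 only).
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}
D3 = {2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0}
D4 = {2: 3.267, 3: 2.574, 4: 2.282, 5: 2.114}


def xbar_r_limits(subgroups):
    """Return ((LCL, CL, UCL) for the X-bar chart, (LCL, CL, UCL) for the R chart)."""
    n = len(subgroups[0])
    if n not in A2 or any(len(s) != n for s in subgroups):
        raise ValueError("subgroups must all have the same size, between 2 and 5")
    xbars = [mean(s) for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    grand_mean, r_bar = mean(xbars), mean(ranges)
    xbar_chart = (grand_mean - A2[n] * r_bar, grand_mean, grand_mean + A2[n] * r_bar)
    r_chart = (D3[n] * r_bar, r_bar, D4[n] * r_bar)
    return xbar_chart, r_chart


# Hypothetical thread diameter readings (mm): three subgroups of five parts.
subgroups = [
    [10.02, 10.00, 9.98, 10.01, 9.99],
    [10.03, 10.01, 9.99, 10.00, 10.02],
    [9.97, 9.99, 10.00, 9.98, 10.01],
]
xbar_chart, r_chart = xbar_r_limits(subgroups)
print("X-bar chart:", [round(v, 4) for v in xbar_chart])
print("R chart:    ", [round(v, 4) for v in r_chart])
```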

That is the simple answer; part of this depends on how you are taking samples and how often.  If samples are taken at the start and the finish, then I would probably recommend the “P” chart.

If you can measure throughout the manufacturing process, and you look at the type of defects, then I recommend a “C” chart.
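A c chart works from defect counts per equal-sized inspection unit and assumes the counts are roughly Poisson, so the standard deviation is the square root of the mean. A sketch with hypothetical counts:

```python
import math


def c_chart_limits(defect_counts: list[int]) -> tuple[float, float, float]:
    """Return (LCL, center line, UCL) for a c chart.

    Assumes each count comes from an inspection unit of the same size and that
    counts are roughly Poisson, so sigma = sqrt(c_bar).
    """
    c_bar = sum(defect_counts) / len(defect_counts)
    sigma = math.sqrt(c_bar)
    return max(0.0, c_bar - 3 * sigma), c_bar, c_bar + 3 * sigma


# Hypothetical defect counts from eight checks spread through production.
counts = [12, 9, 15, 11, 8, 14, 10, 13]
lcl, c_bar, ucl = c_chart_limits(counts)
print(f"LCL={lcl:.2f}  CL={c_bar:.2f}  UCL={ucl:.2f}")
out_of_control = [c for c in counts if not lcl <= c <= ucl]
print("points outside limits:", out_of_control)
```

Here it prints `LCL=1.33  CL=11.50  UCL=21.67` and no points outside the limits.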

Ideally, if you can get measurement data, you are better off with the X-bar/R or the X-bar/S charts.  These tend to be better predictors and it is easier to calculate capability.
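As an illustration of why measurement data make capability easier: once a chart shows the process is stable, Cp and Cpk follow directly from the mean, a sigma estimate, and the specification limits. This sketch estimates sigma as R-bar/d2 (d2 = 2.326 for subgroups of 5); the mean, range, and specification limits are hypothetical. Using the overall sample standard deviation instead gives the related Pp and Ppk indices.

```python
def capability_indices(mu, sigma, lsl, usl):
    """Cp compares the spec width to 6 sigma; Cpk also penalizes an off-center mean."""
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk


D2_N5 = 2.326                  # d2 constant for subgroups of 5
sigma_within = 0.04 / D2_N5    # hypothetical R-bar of 0.04 mm
cp, cpk = capability_indices(mu=10.01, sigma=sigma_within, lsl=9.92, usl=10.08)
print(f"Cp={cp:.2f}, Cpk={cpk:.2f}")
```

It prints `Cp=1.55, Cpk=1.36`; Cpk is lower because the assumed mean sits 0.01 mm above the center of the specification.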

For capability with go/no-go data, you can compute the fraction defective (for example, 5% is 0.05) and multiply it by 1,000,000 to express your capability estimate as defective parts per million (ppm).
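A sketch of the go/no-go capability arithmetic for the 5% reject rate in the question (250 rejects out of 5,000 is an assumed example). Converting the defective fraction to a Z value, and using Z/3 as a rough Ppk equivalent, is one common convention for attribute capability and is an addition here, not part of the answer; no 1.5-sigma shift is applied.

```python
from statistics import NormalDist


def attribute_capability(defective: int, inspected: int) -> tuple[float, float]:
    """Return (defective parts per million, equivalent one-sided Z value)."""
    if not 0 < defective < inspected:
        raise ValueError("need at least one defective and at least one good part")
    fraction = defective / inspected          # e.g. 0.05, not 5
    ppm = fraction * 1_000_000
    z = NormalDist().inv_cdf(1 - fraction)    # Z with the same upper-tail area
    return ppm, z


ppm, z = attribute_capability(defective=250, inspected=5000)
print(f"{ppm:,.0f} ppm defective, Z = {z:.2f}, Ppk-equivalent = {z / 3:.2f}")
```

It prints `50,000 ppm defective, Z = 1.64, Ppk-equivalent = 0.55`.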

Jim Bossert
SVP Process Design Manager, Process Optimization
Bank of America
ASQ Fellow, CQE, CQA, CMQ/OE, CSSBB, CSSMBB
Fort Worth, TX


Additional ASQ resources:

ASQ Learn About Quality: Control Charts

The Shewhart p Chart for Comparisons
by Marilyn K. Hart and Robert F. Hart

Z1.4 Split Sampling

Q: I have two questions about Z1.4-2008: Sampling Procedures and Tables for Inspection by Attributes.

1. Does the plan allow one to “split” sampling plans among multiple items, or is only one item per plan intended?

2. The plan states a 95% confidence level, which means the findings of the sampling will statistically show that the findings (or number of defects) will be consistent with the findings of the entire inspected lot. So, if we split the sampling, how can you determine what happens to the confidence level?

A: Thank you for submitting your question to ASQ’s Ask the Experts Program. Answers to your inquiries follow.

1. In attempting to answer any given question, one needs to understand the question with respect to its gist and terms used.

Z1.4 uses the term “unit” for an individual product entity. A unit can be a discrete, fairly simple product, such as a bolt or nut, or a complex product, such as a computer or a large piece of machinery. It can even be a quantity of material, such as a square meter of cloth or a length of wire.

It is assumed here that the use of the term “item” in the question refers to a “unit.” It might, however, refer to a quality characteristic, and the explanation given here will attempt to explain either case.

Now, units can have a single principal quality characteristic or they can have many different quality characteristics.

Z1.4 allows some of these quality characteristics to be treated as more important than others (for example, in severity with respect to quality and/or economic effects), in which case separate sampling is applied to each group with different sampling parameters (such as sample size, acceptance number, lot size). Hence, Z1.4 sampling can be applied both to units with a single quality characteristic and to units with multiple quality characteristics.

In each case, the chosen Acceptance Quality Limit (AQL) and what it stands for applies to whatever is included in the inspection of each unit. It is also assumed that this separate handling of units and quality characteristics is what the question means by the term “split.”

Furthermore, it should also be understood that sampling inspection can be conducted with respect to two distinctly different statistics. One is the number of nonconforming units found in the sample. These are sometimes referred to as “defectives.” The second is the number (sum) of nonconformities found on all units in the sample, where any given single unit can have multiple nonconformities. These are often referred to as “defects.”

A “nonconforming unit” is defined as a unit with one or more nonconformities (defects), but it is counted only as one “defective” unit. A “nonconformity” is any departure from requirements in any quality characteristic being considered in the inspection of each unit. In Z1.4, one can use either statistic as desired. The choice depends largely on the nature of the product units and the reason for doing the sampling inspection: whether it is to control or oversee defective units or to control or oversee defects.
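To make the two statistics concrete, this sketch counts both from the same hypothetical inspection of 10 units and expresses them the two ways the Z1.4 tables are headed: percent nonconforming and nonconformities per 100 units.

```python
# Hypothetical inspection results: nonconformities found on each of 10 units.
nonconformities_per_unit = [0, 2, 0, 0, 1, 0, 3, 0, 0, 0]

n = len(nonconformities_per_unit)
defectives = sum(1 for k in nonconformities_per_unit if k > 0)  # nonconforming units
defects = sum(nonconformities_per_unit)                          # total nonconformities

percent_nonconforming = 100 * defectives / n
nonconformities_per_100 = 100 * defects / n
print(defectives, defects, percent_nonconforming, nonconformities_per_100)
```

It prints `3 6 30.0 60.0`: three defective units carrying six defects in total, so the same sample reads very differently depending on which statistic the plan uses.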

In the tables of Z1.4, note the top line above the range of AQLs: “Acceptance Quality Limits (AQLs), Percent Nonconforming Items and Nonconformities per 100 Items”. It should also be pointed out that Z1.4 is intended to be a sampling scheme or system, not just a selection of a given sampling plan. Please review the standard and any of the many excellent books available on sampling inspection covering Z1.4, ISO 2859, and related standards.

2. If one examines the Z1.4 standard from cover to cover, one will not encounter the term “confidence level.” Z1.4 contains no confidence intervals (or levels) related to any of its features.

Furthermore, the 95% figure is a very general figure associated with the expected “probability of acceptance” at the designated (selected) AQL. This is NOT a confidence level! In fact, the AQL is NOT a statistic!

Setting an AQL is generally an agreement/negotiation process between the customer and supplier. It is more of an index. Essentially, it refers to a level of nonconformity that is generally “acceptable” — a value of 0 being desired of course — but otherwise, a compromise figure.

Nor is the probability of acceptance at the AQL by any means constant, as can be seen by examining the Operating Characteristic (OC) curves for the various code letters A through R using the same AQL in every table.

For example, for an AQL of 2.5% with the code letter C plan, incoming quality p must be 1.03% for Pa to be 95%, and Pa at 2.5% is less than 90%; for the code letter F plan, p must be 1.80% for Pa to be 95% and Pa at 2.5% is between 90% and 95%, etc.
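The figures in this example can be checked with the binomial distribution. This sketch assumes the normal-inspection single-sampling plans from Table II-A at AQL 2.5% (n = 5 with Ac = 0 for code letter C, and n = 20 with Ac = 1 for code letter F) and finds the p giving Pa = 95% by bisection. Its results land within about 0.01 percentage point of the quoted values; small differences can come from rounding or from the distribution used to draw the published curves.

```python
from math import comb


def prob_accept(n: int, c: int, p: float) -> float:
    """Binomial probability of finding at most c nonconforming units in a sample of n."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def p_at_pa(n: int, c: int, target_pa: float, tol: float = 1e-10) -> float:
    """Incoming fraction nonconforming p at which Pa equals target_pa.

    Pa decreases as p increases, so bisection on [0, 1] converges.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_accept(n, c, mid) > target_pa:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for letter, n, c in [("C", 5, 0), ("F", 20, 1)]:
    print(f"code {letter} (n={n}, Ac={c}): "
          f"p for Pa=95% is {p_at_pa(n, c, 0.95):.2%}, "
          f"Pa at p=2.5% is {prob_accept(n, c, 0.025):.1%}")
```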

If confidence intervals at chosen levels are desired for any given sampling plan, one must resort to the theory and methodologies of statistical inference with the available information provided by the sample statistics.
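As one example of such inference, this sketch computes an exact (Clopper-Pearson) one-sided upper confidence bound on the lot fraction nonconforming from a sample result, by bisection on the binomial CDF. It assumes a binomial model (a lot that is large relative to the sample) and a hypothetical result of 1 nonconforming unit in 20. For zero nonconforming units the bound reduces to 1 - (1 - C)^(1/n).

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def upper_confidence_bound(x: int, n: int, confidence: float = 0.95,
                           tol: float = 1e-10) -> float:
    """Exact one-sided upper bound on the fraction nonconforming.

    Finds p such that P(X <= x | n, p) = 1 - confidence, by bisection.
    """
    if x >= n:
        return 1.0
    alpha = 1 - confidence
    lo, hi = 0.0, 1.0  # binom_cdf(x, n, 0) = 1 > alpha; binom_cdf(x, n, 1) = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical sample: 20 units inspected, 1 nonconforming.
print(f"95% upper bound on fraction nonconforming: {upper_confidence_bound(1, 20):.1%}")
```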

Kenneth Stephens
ASQ Fellow
ASQ Quality Press Author

Related Resources:

Browse the free, open access articles below, or find more in the ASQ Knowledge Center.

Acceptance Sampling With Rectification When Inspection Errors Are Present, Journal of Quality Technology

In this paper the authors consider the problem of estimating the number of nonconformances remaining in outgoing lots after acceptance sampling with rectification when inspection errors can occur.

Zero Defect Sampling, World Conference on Quality and Improvement

Zero defect sampling is an alternative method to the obsolete Mil Std 105E sampling scheme previously used to accept or reject products, and the remaining ANSI Z1.4-1993 which is still in use. This paper discusses the development of zero defect sampling and compares it to Mil Std 105E.

Explore the ASQ Knowledge Center for more case studies, articles, benchmarking reports, and more.


Sampling in a Call Center

Q: I work as a quality assessor (QA) and I am assisting with a number of analyses in a call center. I need a little help with sampling. My questions are as follows:

1. How do I sample calls taken by an agent if there are six assessors and 20 call center agents that each make 100 calls per day?

2. I am assessing claims paid and I want to determine the error rate and the root cause. How many of those claims would have to be assessed by the same number of QAs if claims per day, per agent, exceed 100?

3. If there are 35 interventions made by an agent per day, with two QAs assessing 20 agents in this environment, then the total completed would amount to between 300 and 500 per month. What would the sample size be in this situation?

A: I may be able to provide some ideas to help solve your problem.

The first question is about sampling calls per day by you and your fellow assessors. It is clear that the six assessors are not able to cover all of the calls handled by the 20 call center agents.

What is missing from the question is what you are measuring. Is it customer satisfaction, correct resolution of issues, whether agents are appropriately following call protocols, or something else? Be very clear on what you are measuring.

For the sake of providing a response, let’s say you are able to judge whether the agents are appropriately addressing callers’ issues or not. That is a binary response: each call is either considered good or not (pass/fail). While this may oversimplify your situation, it may be instructive for sampling.

Recalling some basic terms from statistics, remember that a sample is taken from some defined population in order to characterize or understand the population. Here, a sample of calls is assessed, and you are interested in what portion of the calls are handled adequately (pass). If you could measure all calls, that would provide the answer. However, limited resources require that we use sampling to estimate the population proportion of adequate calls.

Next, consider how sure you want to be that the sample results reflect the true and unknown population results. For example, if you don’t assess any calls and simply guess at the result, there would be little confidence in that result.

Confidence in sampling can be thought of as the likelihood that a range built around the sample’s result contains the true population value. A 90 percent confidence means that if we repeatedly drew samples from the population, the interval computed from each sample would contain the actual, unknown result 90 percent of the time. That also means the interval will miss 10 percent of the time due to sampling error. This error is simply the finite chance that the sample happens to draw more calls that “pass” (or more that “fail”) than the population contains, so that the sample does not accurately reflect the true population.
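To make the repeated-sampling interpretation concrete, this sketch simulates many samples of 100 calls from a process whose true pass rate is assumed to be 80 percent, builds a 90 percent normal-approximation (Wald) interval from each, and counts how often the interval contains the true rate. The pass rate, sample size, number of trials, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist


def wald_interval(passes, n, confidence=0.90):
    """Normal-approximation interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = passes / n
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


def simulated_coverage(true_pass_rate=0.80, n=100, trials=5000, seed=1):
    """Fraction of simulated samples whose interval contains the true pass rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        passes = sum(rng.random() < true_pass_rate for _ in range(n))
        lo, hi = wald_interval(passes, n)
        hits += lo <= true_pass_rate <= hi
    return hits / trials


print(f"{simulated_coverage():.1%} of intervals contained the true pass rate")
```

It should print a value near 90 percent; for simple Wald intervals the figure typically lands slightly below the nominal level.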

Setting the confidence is a reflection on how much risk one is willing to take related to the sample providing an inaccurate result. A higher confidence requires more samples.

Here is a simple sample size formula that may be useful in some situations:

$$n = \frac{\ln(1 - C)}{\ln(\pi)}$$

where

n is the sample size

C is the confidence, where 90% would be expressed as 0.9

pi is the proportion considered passing, in this case good calls

ln is the natural logarithm

If we want 90 percent confidence that at least 90 percent of all calls are judged good (pass), then we need at least 22 monitored calls.

This formula is a special case of the binomial sample size calculation and assumes that there are no failed calls among the calls monitored: if we assess 22 calls and none fail, we have at least 90% confidence that the population has at least 90% good calls. If there is a failed call out of the 22 assessments, we can no longer claim 90 percent confidence that at least 90 percent of calls are good. This doesn’t provide information to estimate the actual proportion, yet it is a way to detect whether the proportion falls below a set level.
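The zero-failure formula is a one-liner; rounding up gives the whole number of calls. This sketch reproduces the 22 calls for 90 percent confidence in a 90 percent pass rate (the function name is illustrative).

```python
import math


def zero_failure_sample_size(confidence: float, pass_rate: float) -> int:
    """Smallest n that demonstrates pass_rate at the given confidence if no call fails.

    Solves pass_rate**n <= 1 - confidence, i.e. n >= ln(1 - C) / ln(pi).
    """
    return math.ceil(math.log(1 - confidence) / math.log(pass_rate))


print(zero_failure_sample_size(0.90, 0.90))  # 22
print(zero_failure_sample_size(0.95, 0.95))  # 59
```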

If the intention is to estimate the population proportion of good vs. bad calls, then we use a slightly more complex formula:

$$n = \frac{z_{\alpha/2}^{2}\,\pi(1 - \pi)}{E^{2}}$$

where pi is the same as before, the proportion of good calls.

z is the value of the standard normal distribution that leaves an area of alpha/2 in the upper tail. For 90 percent confidence, 90 percent = 100(1 - alpha) percent, so alpha is 0.1, and the z value for alpha/2 = 0.05 is 1.645.

E is the margin of error, which sets the accuracy of the result. It defines a range, on either side of the sample estimate, within which the true population value should lie. A higher value of E reduces the number of samples needed, yet the result may be further away from the true value than desired.

How small an E you can achieve with a given number of samples depends on the variability (standard deviation) of the population. If that is not known, use an estimate from previous measurements or run a short experiment to determine a reasonable estimate. If the proportion of bad calls is the same from day to day and from agent to agent, then the standard deviation may be relatively small. If, on the other hand, there is agent-to-agent and day-to-day variation, the standard deviation may be relatively large and should be carefully estimated.

The z value is directly related to the confidence and affects the sample size as discussed above.

Notice that pi, the proportion of good calls, is in the formula. Thus, if you are taking the sample in order to estimate an unknown pi, assume pi is 0.5 when determining the sample size. This generates the largest possible sample size and permits an estimate of pi with confidence of 100(1 - alpha) percent and accuracy of E or better. If you know pi from previous estimates, use it to help reduce the sample size.

Let’s do an example and say we want 90 percent confidence. The alpha is 0.1 and the z alpha/2 is 1.645. Let’s assume we do not have an estimate for pi, so we will use 0.5 for pi in the equation. Lastly, we want the final estimate based on the sample to be within 0.1 (estimate of pi +/- 0.1), so E is 0.1.

Running the calculation, 1.645² × 0.5 × 0.5 / 0.1² ≈ 67.7, so we need to sample 68 calls to meet the constraints of confidence and accuracy. If that is still more than the assessors can manage, relaxing the accuracy or accepting more sampling risk (a higher E or a lower C) reduces the required sample size.
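A minimal sketch of the second formula, n = z^2 pi(1 - pi)/E^2, rounded up to a whole number of calls, with z taken from Python's statistics.NormalDist. With the worked example's inputs (90 percent confidence, pi = 0.5, E = 0.1) it returns 68; a 95 percent, plus-or-minus 5 percent design returns the familiar 385.

```python
import math
from statistics import NormalDist


def proportion_sample_size(confidence: float, margin: float, pi: float = 0.5) -> int:
    """Normal-approximation sample size for estimating a proportion within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z for alpha/2
    return math.ceil(z**2 * pi * (1 - pi) / margin**2)


print(proportion_sample_size(0.90, 0.10))  # 68
print(proportion_sample_size(0.95, 0.05))  # 385
```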

It may occur that obtaining a daily sample rate with an acceptable confidence and accuracy is not possible. In that case, sample as many as you can. The results over a few days may provide enough of a sample to provide an estimate.

One consideration with the normal approximation of the binomial distribution used in the second sample size formula is that it breaks down when either n·pi or n(1 - pi) is less than five. If either value is less than five, the resulting confidence interval is of little value. If you are in this situation, use the binomial distribution directly rather than the normal approximation.
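A sketch of the breakdown, using a hypothetical 29 passes out of 30 assessed calls: n(1 - pi) is only 1, and the normal-approximation (Wald) interval runs past 100 percent, while the exact binomial (Clopper-Pearson) interval, found here by bisection on the binomial CDF, stays within 0 to 1. Clopper-Pearson is one common way to use the binomial distribution directly; other exact or score-based intervals exist.

```python
from math import comb
from statistics import NormalDist


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); returns 0 when x < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def wald_interval(x, n, confidence=0.90):
    """Normal-approximation interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = x / n
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


def exact_interval(x, n, confidence=0.90, tol=1e-10):
    """Clopper-Pearson interval, found by inverting the binomial CDF at each end."""
    alpha = 1 - confidence

    def solve(k, target):
        # binom_cdf(k, n, p) falls as p rises, so bisect for the p where it equals target.
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if x == 0 else solve(x - 1, 1 - alpha / 2)
    upper = 1.0 if x == n else solve(x, alpha / 2)
    return lower, upper


x, n = 29, 30  # hypothetical: 29 of 30 assessed calls pass
print("n*pi =", x, " n*(1 - pi) =", n - x)  # 1 < 5, so the approximation is suspect
print("Wald: ", [round(v, 3) for v in wald_interval(x, n)])
print("Exact:", [round(v, 3) for v in exact_interval(x, n)])
```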

One last note. In most sampling cases, the overall size of the population doesn’t matter much. A population that is large relative to the sample is close enough to infinite that we do not need to consider the population size. A small population and a need to sample may require special treatment of sampling with or without replacement, plus adjustments to the basic sample size formulas.
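The usual adjustment for a small population is the finite population correction, n = n0 / (1 + (n0 - 1) / N). This sketch starts from n0 = 68 (what z^2 pi(1 - pi)/E^2 gives for 90 percent confidence, pi = 0.5, E = 0.1) and applies it to a few population sizes, including the 300 to 500 monthly total in the third question. Treating that monthly total as the population is an assumption.

```python
import math


def finite_population_adjust(n0: float, population: int) -> int:
    """Finite population correction: n = n0 / (1 + (n0 - 1) / N), rounded up."""
    return math.ceil(n0 / (1 + (n0 - 1) / population))


for N in (100, 300, 500, 2000):
    print(N, finite_population_adjust(68, N))  # 41, 56, 60, 66
```

The correction matters a lot when the sample is a sizable share of the population and fades as the population grows.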

Creating the right sample size depends to a large degree on what you want to know about the population. In part, you need to know the final result to calculate the “right” sample size, so it is often just an estimate. By using the above equations and concepts, you can minimize the risk of an unclear result, yet determining the right sample size for each situation will always be an evolving process.

Fred Schenkelberg
Voting member of U.S. TAG to ISO/TC 56
Voting member of U.S. TAG to ISO/TC 69
Reliability Engineering and Management Consultant
FMS Reliability
www.fmsreliability.com

Related Content:

To obtain more resources on sampling and statistics, explore the open access ASQ journal articles below or browse ASQ Knowledge Center search results.

Rethinking Statistics for Quality Control, Quality Engineering

Setting Appropriate Fill Weight Targets — A Statistical Engineering Case Study, Quality Engineering

Compliance Testing for Random Effects Models With Joint Acceptance Criteria, Technometrics

Z1.4:2008 inspection levels

Q: I am reading ANSI/ASQ Z1.4-2008: Sampling procedures and tables for inspection by attributes, and there is a small section regarding inspection level (clause 9.2). Can I get further explanation of how one would justify that less discrimination is needed?

For example, my lot size is 720 which means, under general inspection level II, the sample size would be 80 (code J). However, we run a variety of tests, including microbial and heavy metal testing. These tests are very costly. We would like to justify that we can abide by level I or even lower if possible. Do you have any advice?

The product is a liquid dietary supplement.

A: Justification of a specific inspection level is the responsibility of the “responsible party.” Rationale for using one of the special levels (S-1, S-2, S-3, S-4) could be based on the cost or time to perform a test. Less discrimination means that the Acceptable Quality Level (AQL) shown in the table underestimates the true AQL, because the sample size has been reduced from the table-suggested sample size. For example, for a lot size of 151 to 280, general inspection level II gives code letter G (Table I), which Table II-A maps to a sample size of 32, while general inspection level I gives code letter E, or 13 samples, for the same lot size.

Justification of a sampling plan is based on risk and a sampling plan can be justified based on the cost of the test, assuming you are willing to take larger sampling risks. If you use one of the special sampling plans based on the cost of the test, it is helpful to calculate the actual AQL and Limiting Quality (LQ) using the following formulas.

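You solve the following equation for the fraction nonconforming p, for a given sample size (n) and number of defects allowed (x). The AQL is the value of p at which the probability of acceptance Pa equals a high value (commonly 0.95), and the LQ is the value of p at which Pa equals a low value (commonly 0.10):

$$P_a = \sum_{i=0}^{x} \binom{n}{i} p^{i} (1 - p)^{n - i}$$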
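A sketch of solving for the AQL and LQ numerically, assuming the binomial model and the common risk points of a 95 percent probability of acceptance at the AQL and 10 percent at the LQ. The lot size of 720 comes from the question; an acceptance number of x = 0 is an illustrative assumption, since the actual acceptance number depends on the AQL chosen. The code letters follow Table I of Z1.4 for a lot of 501 to 1,200 units (level II gives J, n = 80; level I gives G, n = 32).

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """Probability of acceptance: P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def quality_at_pa(n: int, x: int, pa: float, tol: float = 1e-10) -> float:
    """Fraction nonconforming p at which the probability of acceptance equals pa.

    P(accept) falls as p rises, so bisection on [0, 1] converges.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > pa:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


PA_AT_AQL, PA_AT_LQ = 0.95, 0.10  # assumed producer's / consumer's risk points
for label, n in [("level II, code J", 80), ("level I, code G", 32)]:
    aql = quality_at_pa(n, 0, PA_AT_AQL)
    lq = quality_at_pa(n, 0, PA_AT_LQ)
    print(f"{label}: n={n}, x=0 -> AQL {aql:.2%}, LQ {lq:.2%}")
```

Comparing the two rows shows the loss of discrimination directly: the smaller sample accepts lots of noticeably worse quality with the same 10 percent probability.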

Steven Walfish
Secretary, U.S. TAG to ISO/TC 69
ASQ CQE
Principal Statistician, BD
http://statisticaloutsourcingservices.com

Sampling Plan for Pharmaceuticals

Q: We are a U.S. dietary supplements manufacturer operating under c-GMP conditions set by the U.S. Food & Drug Administration (FDA).

As such, we perform analyses of incoming raw materials (finished product ingredients), intermediate products (during manufacturing), and finished products. Analyses include identity testing (incoming raw materials), and other types of analysis (e.g. microbiological, heavy metals, some quantitative assays on specific compounds). These tests would be the attributes we wish to assess.

Basically, we are refining our sampling procedures and need to ascertain an acceptable number of samples to be taken for the various testing purposes outlined above.

The World Health Organization’s (WHO) Technical Report Series No. 929, Annex 4, “WHO Guidelines for sampling of pharmaceutical products and related materials” references ANSI/ISO/ASQ 2859-1:1999 Sampling procedures for inspection by attributes – Part 1: Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot inspection in reference to the selection of a statistically valid number of samples for testing purposes.

I note from your website that there are a number of other sampling standards available. I am seeking some guidance as to the most appropriate standard(s) for our particular purposes.

Any assistance you can offer would be much appreciated.

A: Though many of the sampling plans are similar, many standards organizations have published different interpretations of sampling schemes. Since WHO recommends using ISO 2859-1 as the guidance document, I suggest selecting that standard.

There are similar documents that could be used as an alternative, if necessary:

1. ANSI/ASQ Z1.4-2008: Sampling Procedures and Tables for Inspection by Attributes

2. BS 6001-1:1999/ISO 2859-1:1999+A1:2011 Sampling procedures for inspection by attributes. Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot inspection

3. MIL-STD-105E – Sampling Procedures and Tables for Inspection by Attributes*

4. JIS Z9015-0-1999 Sampling procedures for inspection by attributes — Part 0 Introduction to the JIS Z 9015 attribute sampling system

A few points to consider:

  • Usually for FDA-regulated products, a c=0 sampling plan is appropriate. See H1331 Zero Acceptance Number Sampling Plans, Fifth Edition, by Nicholas L. Squeglia
  • Based on risk, an Acceptable Quality Level (AQL) should be selected
  • Your sample size is usually set based on lot size (larger lots get larger samples, though not in direct proportion). If you are doing testing on bulk raw materials, the sample size will be set based on the variability of the lot as well as the variability of the method.

Steven Walfish
Secretary, U.S. TAG to ISO/TC 69
ASQ CQE
Principal Statistician, BD
http://statisticaloutsourcingservices.com/

Note:

*Military standard, cancelled and superseded by MIL-STD-1916, “DoD Preferred Methods for Acceptance of Product”, or ANSI/ASQ Z1.4:2008, according to the Notice of Cancellation

Explore more in the ASQ Knowledge Center.

Sampling Employee Tasks

Q: We are collecting data on what tasks our employees in various departments do each day. We hope to eventually get a representation of what each employee does all year long.  Randomly, throughout the day, employees record the tasks they are doing.  We are not sure how to calculate an appropriate sample size and we are not sure how many data points to collect.

A: I wish there were a simple answer. We need to consider:

  • Does it make a difference how long an employee has been performing a job?
  • Are the departments equivalent in terms of what they are doing?
  • What is the difference that you want to detect?

The simple rule is that the smaller the difference you want to detect, the larger the sample size. Here, “smaller” means a difference of less than one standard deviation of the data.
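To illustrate the rule, this sketch uses the textbook one-sample z-test approximation, n = ((z_{1-alpha/2} + z_{power}) / d)^2, where d is the difference to detect in standard deviation units. The 5 percent significance level and 80 percent power are assumed defaults, not values from the question.

```python
import math
from statistics import NormalDist


def n_to_detect(shift_in_sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size to detect a mean shift of `shift_in_sd` standard deviations.

    One-sample, two-sided z-test approximation: n = ((z_{1-alpha/2} + z_power) / d)^2.
    """
    z = NormalDist().inv_cdf
    return math.ceil(((z(1 - alpha / 2) + z(power)) / shift_in_sd) ** 2)


for d in (1.0, 0.5, 0.25):
    print(d, n_to_detect(d))  # 8, 32, 126
```

Halving the difference to detect roughly quadruples the sample size.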

Random records are O.K., but really, shouldn’t you want a record for everyone for at least a week? That would give you an idea of what is done across the board, and then, if you are trying to readjust the workloads, the logs give you a basis for it. My concern with the current method is that you may have a lot of extra paperwork to account for everyone for a certain time.

Additional information provided by the questioner:

The goal of this project is to establish a baseline of activities that occur in the department and to answer the question “What does the department do all day?”

The amount of time an employee has been performing a job does not make a difference. The tasks performed in each department are considered equivalent.  We are not accounting for the amount of time it takes to complete a task — we are more interested in how frequently that task is required/requested.

The results will be used to identify enhancement opportunities for our database and to identify improvements to the current (and more frequent) processes. The team will use a system (a form in Metastorm) to capture activities throughout the day. Frequency is approximately 5 entries an hour at random times within the hour.

I have worked with the department’s manager to capture content for the following fields using the form:

  1. Department (network management or dealer relation)
  2. Task (tier 1)
  3. Why (tier 2 – dependent on selection of task)
  4. Lessee/client name
  5. Application
  6. Country
  7. Source of request (department)

We are looking for a reasonable approach to calculate the sample size required for a 90 – 95% confidence level.  The frequency of hourly entries and length of period to capture the data can be adjusted to accommodate the resulting sample size.

A: The additional information helps. Since you have no previous data and you are getting 5 samples an hour from each employee (assuming a 7-hour workday after taking out lunch and two breaks), that will give you approximately 35 samples a day. Assuming a five-day week, that gives you approximately 175 data points per employee. This should give you enough information to estimate what is done in a week.
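A sketch of the arithmetic in this answer, plus the margin of error those 175 entries would give for an estimated task proportion at the 90 and 95 percent confidence levels the questioner asked about. It uses the normal approximation with the worst case p = 0.5 and treats entries as independent random draws; entries from the same person and hour are likely correlated, so the real margin is probably wider.

```python
import math
from statistics import NormalDist

entries_per_hour = 5
hours_per_day = 7   # assumed: workday minus lunch and two breaks
days = 5            # one work week
entries = entries_per_hour * hours_per_day * days  # 175 per employee


def margin_of_error(n: int, confidence: float, p: float = 0.5) -> float:
    """Normal-approximation half-width for an estimated task proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


for conf in (0.90, 0.95):
    print(f"{conf:.0%}: +/- {margin_of_error(entries, conf):.1%}")
```

It prints `90%: +/- 6.2%` and `95%: +/- 7.4%` for one employee-week.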

Now, you will probably want to extend this out another three weeks so that you have an idea of what happens over a month.  If you can assume that the data collected is representative of all months, then you should be O.K.  If you feel that some months are different, then you may want to look at taking another sample during the months where you anticipate different volumes from the one you have. You can use the sample size calculation for discrete data using the information that you have already collected and not look at all employees, but target your average performers.

Jim Bossert
SVP Process Design Manager, Process Optimization
Bank of America
ASQ Fellow, CQE, CQA, CMQ/OE, CSSBB, CMBB
Fort Worth, TX


Could Null Hypothesis State Difference?

Q: Does a null hypothesis always state that there is no difference?  Could there be a null hypothesis that claims there is? 

In the U.S. legal system, the null hypothesis is that the accused is assumed innocent until proven guilty.  In another legal system, there might exist the possibility that the accused is assumed guilty until proven innocent.  In our system, a type 1 error would be to find an innocent man guilty.  What would be considered a type 1 error if the null hypothesis was assumed guilt?

A: Sir Ronald Fisher developed this basic principle more than 90 years ago. As you have correctly stated above, the process is assumed innocent until proven guilty. You must have evidence beyond reasonable doubt. An alpha error (type 1) is calling an innocent person guilty. Failure to prove guilt when a person really did commit a crime is a Beta error (type 2).

What can the null hypothesis tell us? Does the confidence interval include zero (or innocence, in the court example)? Instead of asking, “Can you assume guilt and prove innocence?”, turn the question around and ask, “Does the confidence interval include some value that is guilty?”

For example, let’s say a process has an unknown mean and standard deviation, but it has customer specifications of 8 to 12 millimeters. Your sample measures 14 millimeters. Clearly, your sample is guilty by customer specifications. We need to prove beyond reasonable doubt that the confidence interval of the process, at some risk level (alpha), does not include guilty material. This is done by measuring the process for control. If it is in control and not meeting customer specifications, either move the distribution, reduce the variation (through Design of Experiments or other methods), or do some combination of both.

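If the new confidence interval does not include guilt, the argument would be that you have proven, beyond reasonable doubt, that the confidence interval does not include the out-of-spec material. Under this circumstance, with guilt as the null hypothesis, a type 1 error (alpha error) would be concluding that the confidence interval excludes the specification, and therefore that the process is acceptable, when the process mean is actually at or above the upper specification. In the court analogy, that is finding a guilty person innocent. The opposite case, a process mean that is truly below the upper specification but a confidence interval that still includes the specification, is a type 2 (beta) error.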
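A sketch of the flipped hypothesis from the example in this answer, applied to the process mean only: the null hypothesis is “guilty” (mean at or above the 12 mm upper specification) and is rejected only if a one-sided 95 percent upper confidence bound on the mean falls below 12 mm. The ten measurements are hypothetical, and the t critical value (1.833 for 9 degrees of freedom) is hard-coded from a t table. Here a type 1 error means declaring the mean below 12 mm when it is not, and the 95 percent bound holds that risk to at most 5 percent. Showing that individual parts, not just the mean, stay in specification would need a tolerance interval or a capability study instead.

```python
import math
from statistics import mean, stdev

USL = 12.0        # upper specification, mm (from the example)
T_95_DF9 = 1.833  # one-sided 95% t critical value, 9 degrees of freedom (t table)

# Hypothetical sample of 10 parts measured after the process improvement.
sample = [11.2, 11.5, 11.1, 11.4, 11.3, 11.6, 11.2, 11.4, 11.3, 11.5]

xbar = mean(sample)
se = stdev(sample) / math.sqrt(len(sample))
upper_bound = xbar + T_95_DF9 * se  # one-sided 95% upper confidence bound

# H0 ("guilty"): process mean >= USL.  H1 ("innocent"): process mean < USL.
reject_h0 = upper_bound < USL
print(f"mean={xbar:.3f}, 95% upper bound={upper_bound:.3f}, reject H0: {reject_h0}")
```

It prints `mean=11.350, 95% upper bound=11.442, reject H0: True`.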

For further reading on this material, refer to the following text:

Testing Statistical Hypotheses, E. L. Lehmann and Joseph Romano, January 2005.

 Bill Hooper
ASQ Certified Six Sigma Master Black Belt
President, William Hooper Consulting Inc.
Williamhooperconsulting.com
Naperville, IL

ANOVA for Tailgate Samples

Q: I have a question that is related to comparison studies done on incoming inspections.

My organization has a process for which it receives a “tailgate” sample from a supplier and then compares that data with three samples of the next three shipments to “qualify” them. The reason behind this comparison is to determine if the production process of the vendor has changed significantly from the “tailgate” sample, or if they picked the best of the best for the “tailgate.”

It seems a Student’s t-test for comparing two means might be a simple and quick evaluation, but I believe an ANOVA might be in order for the various characteristics measured (there are multiple).

Can an expert provide some statistical advice to help me move forward in determining an effective solution?

A: Assuming the data is continuous,  ANOVA (or MANOVA for multiple responses) should be employed. Since the tailgate sample is a control, Dunnett’s multiple comparison test should be used if the p-value from ANOVA is less than 0.05.  If the data is discrete (pass/fail), then comparing the lots would require the use of a chi-square test.
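A stdlib-only sketch of the two tests named in the answer, on hypothetical data for one characteristic: a one-way ANOVA F statistic comparing the tailgate sample with three shipments, and a Pearson chi-square statistic for pass/fail counts. To avoid extra dependencies, the 5 percent critical values are hard-coded from F and chi-square tables and are valid only for the shapes used here (3 and 16 degrees of freedom for the F test, 3 degrees of freedom for chi-square). The Dunnett follow-up comparisons against the tailgate are not shown; statistical software handles those (recent SciPy versions, for example, provide scipy.stats.dunnett).

```python
from statistics import mean

# Hypothetical measurements of one characteristic: tailgate sample vs. three shipments.
groups = {
    "tailgate": [10.0, 10.1, 9.9, 10.0, 10.0],
    "shipment 1": [10.1, 10.0, 10.2, 10.1, 10.1],
    "shipment 2": [10.0, 9.9, 10.1, 10.0, 10.0],
    "shipment 3": [10.4, 10.5, 10.3, 10.4, 10.4],
}


def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for s in samples for v in s]
    grand = mean(all_values)
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_b, df_w = len(samples) - 1, len(all_values) - len(samples)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


def chi_square_stat(table):
    """Pearson chi-square statistic for a table of observed counts (rows x columns)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    total = sum(row_totals)
    return sum(
        (obs - rt * ct / total) ** 2 / (rt * ct / total)
        for r, rt in zip(table, row_totals)
        for obs, ct in zip(r, col_totals)
    )


F_CRIT_3_16 = 3.24   # F(0.05; 3, 16) from an F table
CHI2_CRIT_3 = 7.815  # chi-square(0.05; 3 df) from a chi-square table

f, df_b, df_w = one_way_anova_f(list(groups.values()))
print(f"ANOVA: F={f:.2f} on ({df_b}, {df_w}) df, significant: {f > F_CRIT_3_16}")

# Hypothetical pass/fail counts for the same four lots: [pass, fail].
counts = [[95, 5], [94, 6], [96, 4], [85, 15]]
chi2 = chi_square_stat(counts)
print(f"chi-square={chi2:.2f} on 3 df, significant: {chi2 > CHI2_CRIT_3}")
```

Both tests flag a difference here, driven by shipment 3; the Dunnett step would then confirm which shipments differ from the tailgate.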

Steven Walfish
Secretary, U.S. TAG to ISO/TC 69
ASQ CQE
Principal Statistician, BD
http://statisticaloutsourcingservices.com/

Is C=0 in Z1.4?

Q: I have ANSI/ASQ Z1.4-2008 Sampling Procedures and Tables for Inspection by Attributes. I looked through it rapidly, and I still can’t find the C=0 plan directly, so I am a little confused. I thought C=0 is included in Z1.4. Is the C=0 plan spirit/concept contained in Z1.4 or does C=0 need to be calculated from the several tables in Z1.4? (if yes, which tables?).

A: Z1.4:2008 is a general sampling plan for attributes. It is tabled by AQL with varying accept/reject numbers. The standard gives a framework for attribute inspection plans. Though Z1.4 does have some plans where C=0, they are NOT optimal to minimize the Type II error. For C=0 plans specifically, I would recommend purchasing Zero Acceptance Number Sampling Plans, Fifth Edition. The value of the Z1.4 standard is the switching rules used for incoming inspection.

Steven Walfish
Secretary, U.S. TAG to ISO/TC 69
ASQ CQE
Statistician, GE Healthcare
http://statisticaloutsourcingservices.com/

Z1.4 or Z1.9 Sampling Plan for IT Tickets

Q: I need to purchase a sampling standard. However I notice there are a few options for sampling plans, such as attributes vs. variables.  I am not sure which one will best fit my needs.  I need help in determining this.

I need to determine what the best sample size would be for recurring IT operations. For example: If my server team closes 500 tickets a month and I want to pick a sample size to review for quality purposes, what is the best chart to use to find the sample size that industry standards recommend? My understanding is there is a light, normal and heavy chart that can be offered.

Please help.  Thanks!

A: The answer is “it depends.” What it depends on is what she is reviewing for quality purposes. If the inspection is for either “good quality” or “poor quality,” then Z1.4-2008: Sampling Procedures and Tables for Inspection by Attributes, would be appropriate. If she is measuring something, “time to close,” for example, then Z1.9-2008: Sampling Procedures and Tables for Inspection by Variables for Percent Nonconforming, might be appropriate, although Z1.9 is really only good if the data are normally distributed, which waiting times are generally not.

With more information, I could provide a more definitive answer.

Q: Our intention right now is to evaluate closed tickets (or other work processed, which could include facets other than tickets, such as items logged on a log sheet to check service statuses) to determine whether the quality of work performed meets our quality standards. We are determining what “quality” means to us. For example, we want to look at closed tickets to determine whether the ticket was escalated properly from our tier 1 to our tier 2 team AND whether the work log of that ticket had the correct data and the correct amount of data documented. That is, a tech didn’t just write “resolved user issue,” but documented more relevant detail about what they did to resolve the issue. All of the work performed is service delivery in an operations environment, so the evaluations will assess how well our processes are followed and the quality of our resources. The number of tickets closed per month varies slightly up or down. I want to look at a table to determine what our sample size should be.

However, in addition to the above, I am very interested in learning the other plan too, because we do have Service Level Objectives (SLOs and SLAs) in this environment (for example: time to close, first call resolution, call abandonment rate). If I can understand that other table and how to use it, both may be valuable and I may purchase both.

I didn’t understand the comment that “Z1.9 is really only good if the data are normally distributed, which waiting times are generally not.”  What does normally distributed mean?  I would like that explained.
Can your expert answer and provide information on both sampling plans for me?

Thanks again and I look forward to the response.

A: Normally distributed means that the data follow a bell-shaped curve, with most values falling around some average and tailing off in frequency both above and below that average. Many processes in real life follow the normal distribution. Time to close is an exception. It is more likely to follow the exponential distribution, which means that there will be lots of tickets closed at shorter durations, with some tailing out very far into longer durations. Also, a ticket can’t be closed in less than zero time, while the normal distribution extends, in theory, to +/- infinity. Rates (percentages, I’m assuming) can often be approximated using the normal distribution as long as they aren’t too near 0% or 100%. If they are near the edges, a square root transformation often helps to make the data more approximately normal.
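A quick simulation of the point about ticket close times, assuming (arbitrarily) exponentially distributed times with a mean of 8 hours: the mean sits well above the median, and a normal curve fitted to the same data assigns roughly 16 percent of tickets to negative close times, which is impossible.

```python
import random
from statistics import NormalDist, mean, median, stdev

rng = random.Random(42)
# Hypothetical ticket close times (hours): exponential with a mean of 8 hours.
times = [rng.expovariate(1 / 8) for _ in range(10_000)]

m, s = mean(times), stdev(times)
normal_fit = NormalDist(m, s)  # a normal curve with the same mean and SD
print(f"mean={m:.1f} h, median={median(times):.1f} h, sd={s:.1f} h")
print(f"normal model's share of tickets closing before time 0: {normal_fit.cdf(0):.1%}")
```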

Most of the quality characteristics you described are of the pass-fail variety which implies Z1.4 would be appropriate.

I strongly recommend that you take a course and/or read a book on statistical process control or acceptance sampling before attempting this.  There are many potential gotchas that can lead to erroneous analysis and therefore decision making.  ASQ offers some that are quite good.  A comprehensive book would be:

Process Quality Control: Troubleshooting and Interpretation of Data, Fourth Edition
by Ellis R. Ott, Edward G. Schilling, and Dean V. Neubauer.

Brenda Bishop
US Liaison to TC 69/WG3
CQE,CQA,CMQ/OE,CRE,SSBB,CQIA
Belleville, Illinois