**Q:** I work as a quality assessor (QA) and I am assisting with a number of analyses in a call center. I need a little help with sampling. My questions are as follows:

1. How do I sample calls taken by an agent if there are six assessors and 20 call center agents that each make 100 calls per day?

2. I am assessing claims paid and I want to determine the error rate and the root cause. How many of those claims would have to be assessed by the same number of QAs if claims per day, per agent, exceed 100?

3. If there are 35 interventions made by an agent per day, with two QAs assessing 20 agents in this environment, then the total completed would amount to between 300 to 500 per month. What would be the sample size be in this situation?

**A:** I may be able to provide some ideas to help solve your problem.

The first question is about sampling calls per day by you and your fellow assessors. It is clear that the six assessors are not able to cover all of the calls handled by the 20 call center agents.

What is missing from the question is what are you measuring — customer satisfaction, correct resolution of issues, whether agents are appropriately following call protocols, or something else? Be very clear on what you are measuring.

For the sake of providing a response, let’s say you are able to judge whether the agents are appropriately addressing callers’ issues or not. A binary response, or simply a call, is either considered good or not (pass/fail). While this may oversimply your situation, it may be instructive on sampling.

Recalling some basic terms from statistics, remember that a sample is taken from some defined population in order to characterize or understand the population. Here, a sample of calls are assessed and you are interested in what portion of the calls are handled adequately (pass). If you could measure all calls, that would provide the answer. However, a limit on resources requires that we use sampling to estimate the population proportion of adequate calls.

Next, consider how sure you want the results of the sample to reflect the true and unknown population results. For example, if you don’t assess any calls and simply guess at the result, there would be little confidence in that result.

Confidence in sampling in one manner represents the likelihood that the sample is within a range of about the sample’s result. A 90 percent confidence means that if we repeatedly draw samples from the population, then the result from the sample would be within a confidence bound (close to the actual and unknown result) 90 percent of the time. That also means that the estimate will be wrong 10 percent of the time due to errors caused by sampling. This error is simply the finite chance that the sample draws from more calls that “pass” or “fail.” The sample, thus, is not able to accurately reflect the true population.

Setting the confidence is a reflection on how much risk one is willing to take related to the sample providing an inaccurate result. A higher confidence requires more samples.

Here is a simple sample size formula that may be useful in some situations.

n is samples size

C is confidence where 90% would be expressed as 0.9

pi is proportion considered passing, in this case good calls.

ln is the natural logarithm

If we want 90 percent confidence that at least 90 percent of all calls are judged good (pass), then we need at least 22 monitored calls.

This formula is a special case of the binomial sample size calculation and assumes that there are no failed calls in the calls monitored. This assumes that if we assess 22 calls and none fail, that we have at least 90% confidence that the population has at least 90% good calls. If there is a failed call out the 22 assessments, we have evidence that we have less than 90 percent confidence of at least 90 percent good calls. This doesn’t provide information to estimate the actual proportion, yet it is a way to detect if the proportion falls below a set level.

If the intention is to estimate the population proportion of good vs. bad calls, then we use a slightly more complex formula.

pi is the same, the proportion of good calls vs. bad calls

z is the area under a standard normal distribution corresponding to alpha/2 (for 90 percent confidence, we have 90 = 100 percent (1-alpha), thus, in this case alpha is 0.1. The area under the standard normal distribution is 1.645.

E is related to accuracy of the result. It defines a range within which the estimate should reside about the resulting estimate of the population value. A higher value of E reduces the number of samples needed, yet the result may be further away from the true value than desired.

The value of E depends on the standard deviation of the population. If that is not known, just use an estimate from previous measurements or run a short experiment to determine a reasonable estimate. If the proportion of bad calls is the same from day-to-day and from agent-to-agent, then the standard deviation may be relatively small. If, on the other hand, there is agent-to -agent and day-to-day variation, the standard deviation may be relatively large and should be carefully estimated.

The z value is directly related to the confidence and affects the sample size as discussed above.

Notice that pi, the proportion of good calls, is in the formula. Thus if you are taking the sample in order to estimate an unknown pi, then to determine sample size, assume pi is 0.5. This will generate the largest possible sample size and permit an estimate of pi with confidence of 100 percent (1-alpha) and accuracy of E or better. If you know pi from previous estimates, then use it to help reduce the sample size slightly.

Let’s do an example and say we want 90 percent confidence. The alpha is 0.1 and the z alpha/2 is 1.645. Let’s assume we do not have an estimate for pi, so we will use 0.5 for pi in the equation. Lastly, we want the final estimate based on the sample to be within 0.1 (estimate of pi +/- 0.1), so E is 0.1.

Running the calculation, we find that we need to sample 1,178 calls to meet the constraints of confidence and accuracy. Increasing the allowable accuracy or increasing the sampling risk (higher E or higher C) may permit finding a meaningful sample size.

It may occur that obtaining a daily sample rate with an acceptable confidence and accuracy is not possible. In that case, sample as many as you can. The results over a few days may provide enough of a sample to provide an estimate.

One consideration with the normal approximation of a binomial distribution for the second sample size formula is it breaks down when either pi n and n (1-pi) are less than five. If either value is less than five, then the confidence interval is large enough to be of little value. If you are in this situation, use the binomial distribution directly rather than the normal approximation.

One last note. In most sampling cases, the overall size of the population doesn’t really matter too much. A population of about 100 is close enough to infinite that we really do not consider the population size. A small population and a need to sample may require special treatment of sampling with or without replacement, plus adjustments to the basic sample size formulas.

Creating the right sample size to a large degree depends on what you want to know about the population. In part, you need to know the final result to calculate the “right” sample size, so it often just an estimate. By using the above equations and concepts, you can minimize risk of determining an unclear result, yet it will always be an evolving process to determine the right sample size for each situation.

Fred Schenkelberg

Voting member of U.S. TAG to ISO/TC 56

Voting member of U.S. TAG to ISO/TC 69

Reliability Engineering and Management Consultant

FMS Reliability

http://www.fmsreliability.com

**Related Content:**

To obtain more resources on sampling and statistics, explore the open access ASQ journal articles below or browse ASQ Knowledge Center search results.

Rethinking Statistics for Quality Control, *Quality Engineering*

Setting Appropriate Fill Weight Targets — A Statistical Engineering Case Study, *Quality Engineering*

Compliance Testing for Random Effects Models With Joint Acceptance Criteria, *Technometics*