Q: We are collecting data on what tasks our employees in various departments do each day. We hope to eventually get a representation of what each employee does all year long. Randomly, throughout the day, employees record the tasks they are doing. We are not sure how to calculate an appropriate sample size and we are not sure how many data points to collect.
A: I wish there was a simple answer. We need to consider:
- Does it make a difference how long an employee has been performing a job?
- Are the departments equivalent in terms of what they are doing?
- What is the difference that you want to detect?
The simple rule is that the smaller the difference you want to detect, the larger the sample size you need. By "smaller," I mean a difference of less than 1 standard deviation of the data that has been collected.
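To illustrate that rule, here is a rough sketch of how the required sample size grows as the difference shrinks. It assumes a one-sample, two-sided test with a normal approximation, and the 80% power figure is my own placeholder, not something from the question. The function name is made up for this example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_difference(effect_in_sd, confidence=0.95, power=0.80):
    """Approximate n needed to detect a shift of `effect_in_sd` standard
    deviations (one-sample, two-sided z-test, normal approximation).

    The 80% default power is an assumption made for illustration only.
    """
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_beta) / effect_in_sd) ** 2)


# Halving the difference roughly quadruples the required sample size.
for d in (2.0, 1.0, 0.5, 0.25):
    print(f"difference = {d} SD -> n = {sample_size_for_difference(d)}")
```

Under these assumptions, a difference of 1 standard deviation needs only a handful of observations, while a quarter of a standard deviation needs well over a hundred.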
Random records are O.K., but really, wouldn't you want a record for everyone for at least a week? That would give you an idea of what is done across the board and, if you are trying to readjust workloads, the logs would give you some basis for doing so. My concern with the current method is that you may end up with a lot of extra paperwork to account for everyone over a given period.
Additional information provided by the questioner:
The goal of this project is to establish a baseline of activities that occur in the department and to answer the question “What does the department do all day?”
The amount of time an employee has been performing a job does not make a difference. The tasks performed in each department are considered equivalent. We are not accounting for the amount of time it takes to complete a task — we are more interested in how frequently that task is required/requested.
The results will be used to identify enhancement opportunities for our database and improvements to the current (and more frequent) processes. The team will use a system (a form in Metastorm) to capture activities throughout the day. Frequency is approximately 5 entries an hour, at random times within the hour.
I have worked with the department’s manager to capture content for the following fields using the form:
- Department (network management or dealer relation)
- Task (tier 1)
- Why (tier 2 – dependent on selection of task)
- Lessee/client name
- Source of request (department)
We are looking for a reasonable approach to calculate the sample size required for a 90 – 95% confidence level. The frequency of hourly entries and length of period to capture the data can be adjusted to accommodate the resulting sample size.
A: The additional information helps. Since you have no previous data and you are getting 5 samples an hour from each employee, a 7-hour workday (taking out lunch and two breaks) will give you approximately 35 samples a day per employee. Assuming a five-day week, that gives you approximately 175 data points per employee. This should give you enough information to get an estimate of what is done in a week.
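The arithmetic behind those figures can be sketched as follows. The defaults come from the numbers above (5 entries an hour, a 7-hour day, a five-day week); the function name and the `weeks` parameter are only for illustration.

```python
def entries_per_employee(entries_per_hour=5, hours_per_day=7,
                         days_per_week=5, weeks=1):
    """Expected number of logged entries for one employee."""
    return entries_per_hour * hours_per_day * days_per_week * weeks


per_day = entries_per_employee(days_per_week=1)   # 5 * 7
per_week = entries_per_employee()                 # 5 * 7 * 5
print(f"per day: {per_day}, per week: {per_week}")
```

Changing the entry frequency or the length of the collection period in the call shows how quickly the total moves if either one is adjusted.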
Now, you will probably want to extend this out another three weeks so that you have an idea of what happens over a month. If you can assume that the data collected is representative of all months, then you should be O.K. If you feel that some months are different, then you may want to take another sample during the months where you anticipate volumes different from those you have already observed. You can apply the sample size calculation for discrete data to the information you have already collected, and rather than looking at all employees, target your average performers.
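As a rough sketch of one common sample size calculation for discrete (proportion) data, the usual normal-approximation formula is n = z² · p(1 − p) / E², where p is the share of entries for a given task (taken from the data already collected) and E is the margin of error you can live with. The 5-point margin of error and the 20% task share below are assumptions for illustration, not figures from the question, and the formula treats entries as independent observations.

```python
from math import ceil
from statistics import NormalDist


def proportion_sample_size(p, margin, confidence=0.95):
    """Entries needed to estimate a task's share `p` to within +/- `margin`
    at the given confidence level (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


# p = 0.5 is the most conservative choice when nothing is known yet.
for conf in (0.90, 0.95):
    n = proportion_sample_size(p=0.5, margin=0.05, confidence=conf)
    print(f"{conf:.0%} confidence, worst case: n = {n}")

# Hypothetical pilot result: a task accounts for 20% of logged entries.
print("pilot p = 0.20:", proportion_sample_size(p=0.20, margin=0.05))
```

With these assumed inputs, a few hundred entries per department is enough at either confidence level, which sits well within what one or two employees would log over a few weeks at 5 entries an hour.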
SVP Process Design Manager, Process Optimization
Bank of America
ASQ Fellow, CQE, CQA, CMQ/OE, CSSBB, CMBB
Fort Worth, TX