Random Sampling for QC

December 16, 2014

Just started a new project for a lab that manufactures slides from tissue cuttings. As it was explained to me, they need to generate a random sampling of cuts for quality control.
It was mentioned that the failure rate is higher at the beginning or the end of a block than in the middle of the block during a person's shift.

Trying to determine the proper math needed to generate a random sample size for the lot, so that the system can assign a random code to slides from the lot and a QC tech can pull those slides from cold storage and test them.

The goal being that the QC tech isn't the same person that cut the slides - or at least that their slides are blinded behind a random code - which of course resolves back to the lot behind the scenes.

December 16, 2014

What is the goal of the quality study? There is a whole body of work in quality engineering that leverages statistics and sampling inspection to judge the quality of a production run. Seems to me that you should ask some more questions about exactly what they want. Random sample generation for a single batch is easily done by consecutively numbering the members of the lot, choosing a statistically appropriate sample size, and selecting the members at random. I would leverage something like RandomInteger() from Brian Dunning's custom function site to build a CF that does something like RandomSample( lotsize , samplesize ) and returns the list of inspection batch members. I'm pretty sure this would be easy for you. From a selection perspective, if you randomize the sample and also randomize the lot numbering, you may lose any ability to discern when in a run (at the beginning or end of the block vs. the middle of the block during the person's shift) the defects surface.

http://en.wikipedia.org/wiki/Sample_size_determination
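
A minimal sketch of the numbering-and-selection idea, written in Python only for illustration (in FileMaker it would be a custom function). The name `random_sample` simply mirrors the RandomSample( lotsize , samplesize ) idea and is hypothetical. It assumes slides are numbered 1..lotsize in cut order, so a sampled number still tells you where in the block the slide came from.

```python
import random


def random_sample(lot_size, sample_size, rng=None):
    """Return sample_size distinct slide positions drawn from 1..lot_size.

    Positions are the consecutive cut-order numbers, so the location of
    each sampled slide within the block is preserved.
    """
    if not 0 <= sample_size <= lot_size:
        raise ValueError("sample_size must be between 0 and lot_size")
    rng = rng or random.Random()
    # Sampling without replacement: no slide can be picked twice.
    return sorted(rng.sample(range(1, lot_size + 1), sample_size))


if __name__ == "__main__":
    # Example: pull 5 slides from a lot of 100.
    print(random_sample(100, 5))
```

The blind code shown to the QC tech can then be assigned separately, which keeps the cut-order number available for analysis.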


December 16, 2014

Author

Thanks Kris

I think the goal is twofold. First is to label samples so that, just in case the person who cut the samples is the same person doing QC, their results are not biased.

It's a small lab, so there are just not enough techs in rotation to insulate the process. At least the person pulling items from lots can be assigned, and by the time they do QC enough time has passed that, looking at the QC lots, it would be hard to identify which tech did the cuts.

Second is quality of production - measuring the performance of the tech doing cuts. We plan to start recording start / stop times when doing blocks and calculating yields from each lot to determine the proficiency of the tech, weighted by the results of the random samples taken. I think there is some sort of incentive program for the techs based on quality production.

And lastly, with the duration of the process we can calculate the labor cost that goes into the products produced, so that an average can be established along with the other fixed expenses that make up the cost of the product.
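
As a rough sketch of the per-lot arithmetic described above (not the lab's actual formulas): the function name, field names, and the hourly rate are all assumptions made for illustration. How the QC sample results should weight the proficiency score isn't specified yet, so that part is left out.

```python
from datetime import datetime


def lot_metrics(start, stop, slides_cut, slides_usable, hourly_rate):
    """Compute simple yield and labor-cost figures for one lot.

    All parameter names are hypothetical; hourly_rate is the tech's
    labor cost per hour in whatever currency the lab uses.
    """
    hours = (stop - start).total_seconds() / 3600
    if hours <= 0 or slides_cut <= 0:
        raise ValueError("need a positive duration and at least one cut")
    labor_cost = hourly_rate * hours
    return {
        "hours": hours,
        "yield_fraction": slides_usable / slides_cut,
        "slides_per_hour": slides_usable / hours,
        # Undefined when nothing usable came out of the block.
        "labor_cost_per_slide": labor_cost / slides_usable if slides_usable else None,
    }


if __name__ == "__main__":
    m = lot_metrics(
        datetime(2014, 12, 16, 8, 0),
        datetime(2014, 12, 16, 10, 0),
        slides_cut=100,
        slides_usable=80,
        hourly_rate=20.0,
    )
    print(m)
```

Averaging `labor_cost_per_slide` across lots would give the labor component to combine with the fixed expenses.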

If a block yields 80 - 100 slides on average, this would be our lot size. I think I was told that they pull one slide from a lot. So finding a representative sample is as easy as rolling a 100-sided die (a hecatohedron).

December 16, 2014

I'm still trying to wrap my head around exactly what issues you are and aren't dealing with. Which of these are actual issues?

1. Randomization of slide numbers/IDs to act as a "blind" between the slide preparation and QC processes - What constraints are there on the slide ID scheme? Is there a reason you can't use UUIDs, which are already pretty inscrutable?
2. Measuring production quality and quantity - Do they already have formulas for calculating these for you to implement, or are they looking for help figuring this out? Are the different possible actions quantitatively different ($ bonus for good performance) or qualitative (deciding to fire someone or not)? What are the thresholds of measured quality and quantity that lead to qualitatively different responses? How confident do they need to be about the lot quality based on the sampled slides to take those different actions?
3. Determining how large a sample is needed to achieve the desired confidence in the quality measurement - There are well-established techniques for this, and Kris's Wikipedia link is a perfectly good start. Let us know if you need any more guidance on that.
4. Selecting a random sample from a lot - It sounds like you have this under control already. If you do need more than 1 sample from a lot, my go-to approach for this kind of situation is to gather all the IDs from the lot, shuffle them randomly with the ValueShuffle custom function, and select the first <sample size> values from the top of the result.
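
For the sample-size question, here is a sketch of one standard textbook approach: Cochran's formula for estimating a proportion (here, the defect rate), with the finite population correction for a lot of known size. Python is used only for illustration, and the function name, the margin of error, and the confidence level are assumptions; the lab would need to supply its own targets.

```python
import math
from statistics import NormalDist


def sample_size(lot_size, margin, confidence=0.95, expected_rate=0.5):
    """Slides to inspect so the estimated defect rate is within +/- margin.

    expected_rate=0.5 is the most conservative guess when the true
    defect rate is unknown.
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's sample size for an effectively infinite population.
    n0 = z ** 2 * expected_rate * (1 - expected_rate) / margin ** 2
    # Finite population correction for a lot of lot_size slides.
    n = n0 / (1 + (n0 - 1) / lot_size)
    return min(lot_size, math.ceil(n))


if __name__ == "__main__":
    # Lot of 100 slides, +/- 10 percentage points, 95% confidence.
    print(sample_size(100, 0.10))
```

Running this for a 100-slide lot shows how far a single slide per lot is from what a per-lot estimate would need, which may be worth raising when the confidence requirements are discussed.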

December 16, 2014

Leaning on Wikipedia again... here is an excellent article that describes sampling. Deciding on a sampling method (ref. section 4) is not always straightforward, especially where personnel decisions are the outcome of the analysis.

http://en.wikipedia.org/wiki/Sampling_%28statistics%29

December 17, 2014

Author

Thanks for the discussion, just getting into the project further -

The wiki article is a great read - will have to read it again.

To address your questions, Jeremy:

1 - For the back end, UUIDs are fine, but the issue is the physical label size that can fit on the slide - they are tiny. Currently they have a lot number on the sample, so it is more of a procedure issue: the person that pulls samples needs to relabel them with some other number assigned by the database, so that the person doing QC doesn't know where they came from.

2 - Good question; I will have to think that over and probe them for more discussion. I think part of it is for retrospective analysis: it took some number of hours for this person or these people to prepare 4000 slides, spread over some number of days. That way future orders can be staffed accordingly and delivery lead time can be predicted.

3 - It seems that for a lot they only pick ONE slide. However, some products have historically yielded great results in the middle of the block but have a higher failure rate at the ends of the block. So after speaking with someone today, the thought is that they will specify in the product table where to weight the sample from (the beginning, somewhere in the middle, or somewhere at the end), depending on how many slides they were able to yield and on how many slices make up each segment. If you cut 120 slices, perhaps only the first 8 are considered the beginning and the last 12 are considered the end. Since they only take one sample from that lot of cuts, they'd prefer to flip a coin to decide whether to take it from the front or the back, and then select one slide for QC from that population.

Probably not the most scientific approach. But I'll find out whether it is adequate for their needs when I have a chance to sit down with the resident PhD academic.
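
The coin-flip selection described in point 3 can be sketched roughly like this. Python is used only for illustration; the function name and parameters are hypothetical, and the segment sizes (first 8, last 12 of 120) are just the example numbers from above, which would really come from the product table.

```python
import random


def pick_qc_slide(slides_yielded, begin_size, end_size, rng=None):
    """Flip a coin between the beginning and end segments, then pick one slide.

    Returns (segment_name, position), with positions numbered 1..slides_yielded
    in cut order.
    """
    if begin_size < 1 or end_size < 1 or begin_size + end_size > slides_yielded:
        raise ValueError("segments must be non-empty and must not overlap")
    rng = rng or random.Random()
    if rng.random() < 0.5:  # the coin flip
        segment = "beginning"
        positions = range(1, begin_size + 1)
    else:
        segment = "end"
        positions = range(slides_yielded - end_size + 1, slides_yielded + 1)
    # Every slide inside the chosen segment is equally likely.
    return segment, rng.choice(positions)


if __name__ == "__main__":
    print(pick_qc_slide(120, begin_size=8, end_size=12))
```

Two things to raise with the PhD: under this scheme a slide from the middle is never sampled, and with unequal segment sizes a given front slide (1 in 16) is more likely to be pulled than a given end slide (1 in 24).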

December 17, 2014

1 - For the back end, UUIDs are fine, but the issue is the physical label size that can fit on the slide - they are tiny. Currently they have a lot number on the sample, so it is more of a procedure issue: the person that pulls samples needs to relabel them with some other number assigned by the database, so that the person doing QC doesn't know where they came from.

I was hoping for more detail on the constraints. How many characters fit in the physical area available? What character set are you limited to (numbers, alphanumeric)? Do the slide numbers have to be unique for all slides ever produced by the client, or only unique within a lot? These affect what randomization/hash/obfuscation approach I might try first.
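
Until those constraints are known, here is one possible starting point, sketched in Python only for illustration. Every specific in it is an assumption: a 6-character label, an alphabet of digits and uppercase letters with the look-alike characters 0/O and 1/I removed, and codes that must be unique across everything issued so far. The `registry` dictionary stands in for the database table that resolves a code back to its lot.

```python
import secrets

# Assumed character set: 32 symbols, no 0/O or 1/I to avoid misreads on tiny labels.
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"


def assign_blind_code(lot_id, position, registry, length=6):
    """Issue a random code for one slide and record what it resolves to.

    registry maps code -> (lot_id, position) and is a stand-in for the
    back-end table; the QC tech only ever sees the code.
    """
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if code not in registry:  # retry on the rare collision
            registry[code] = (lot_id, position)
            return code


if __name__ == "__main__":
    registry = {}
    code = assign_blind_code("LOT-0042", 7, registry)
    print(code, "->", registry[code])
```

With these assumptions there are 32^6 (about a billion) possible codes; a shorter label or a digits-only character set shrinks that quickly, which is why the answers to the questions above matter.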