Monday, January 6, 2014

Confessions of a Pump Jockey

I admit it.  Early in my career I was a Pump Jockey.  I had received my basic training in air monitoring, and I was quite proud of the fact that I could calibrate a pump and work out all the logistics of air monitoring.  Indeed, it was somewhat magical and heady for me to realize that we can sample the air and actually determine the concentration of specific chemical species within the breathing zone of workers.

Armed with my list of Exposure Limits (both ACGIH TLVs and my company’s internal limits) I was ready to take on the world of Industrial Hygiene.  I was hot stuff!  I understood the basic premise that a ratio of Exposure/Exposure Limit less than one earned a happy face :-) while an exposure above the exposure limit required some action :-(.  Confidence was high and self-doubt and introspection relatively low.  If an exposure limit was 10 ppm and I measured a breathing zone exposure of 20 ppm, I was pretty sure that an overexposure had occurred :-(.  If I took a single measurement of a breathing zone exposure of 2.1 ppm for the same compound, I would tend to declare the situation “safe” :-) and not consider doing any more testing.  If I was the least bit unsure and took another sample (same scenario, same worker, different day) and got 4.2 ppm, I would still tend to think that this average exposure, less than 50% of the OEL, was safe.  If for some crazy reason I took a third sample and got 8.4 ppm, my confidence might be shaken somewhat, but I could still rationalize that the mean and median measured exposures were still below 50% of the OEL, often considered to be the “action level” or the point where you would do something to control the exposure :-(.

Enter statistical analysis and my introduction to reality.  Indeed, I eventually learned that exposures in essentially all workplace environments are quite variable, even for the same worker doing the same job.  I learned that most exposures are well described by either a normal or a lognormal distribution.  The normal distribution is the “bell shaped curve” that assigns a likelihood to every possible exposure value.  The area from the top of the bell to the left (toward negative infinity) contains 50% of the exposure values, and the area to the right toward positive infinity contains the other 50%.  So if the population of exposure numbers is highly scattered or diverse, then the width or spread of the bell is relatively broad.  It should be noted that the numbers never end; they go to negative infinity on the left and positive infinity on the right, so there is always some finite (but often vanishingly small) chance of any exposure in this distribution.  A lognormal distribution is one in which the logarithms of the exposures follow a normal distribution.  The exposures in a lognormal distribution are bounded on the left by zero (just like the real world) and extend toward positive infinity on the right.  It is skewed or asymmetrical, with most values of exposure concentrated toward zero and a long tail stretching to the right (just like the real world).  Indeed, in general, the lognormal distribution does a much better job of describing the distribution of real-world exposures in any homogeneous scenario and should be used by default as long as the data pass a fit test of the lognormal assumption.
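For readers who like to see this rather than take it on faith, here is a minimal sketch in Python (the parameter values are illustrative, not from any real survey): it simulates lognormal “exposures” and confirms that every value is bounded below by zero, that the logs of the values are normally distributed, and that the distribution has a long upper tail (mean above median).

```python
import math
import random
import statistics

random.seed(42)

# Parameters of the underlying normal distribution on the log scale
# (illustrative values only).
mu, sigma = 1.0, 0.6931

# Draw simulated "exposures" from a lognormal distribution.
exposures = [random.lognormvariate(mu, sigma) for _ in range(10_000)]

# Bounded on the left by zero: every simulated exposure is positive.
print(min(exposures) > 0)  # True

# The logs of the exposures are (approximately) normal,
# centered near mu with spread near sigma.
logs = [math.log(x) for x in exposures]
print(round(statistics.mean(logs), 1))   # ~1.0
print(round(statistics.stdev(logs), 1))  # ~0.7

# Right skew: the arithmetic mean exceeds the median.
print(statistics.mean(exposures) > statistics.median(exposures))  # True
```

The same check run against real monitoring data (take the logs, then apply a normality fit test) is essentially what IH STAT does when it tests the lognormal assumption.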

The above is statistical reality, but what we folks in the field need is a user-friendly statistical tool to put this rubber to the road.  There have been a number of candidates over the years, but the latest and, in my opinion, the greatest is IH STAT, developed by Dr. John Mulhausen, Director of Corporate Safety and Industrial Hygiene at 3M Company.  John developed the original spreadsheet program, which has since been modified into its current multilingual version by Daniel Drolet.  You can get it at:   For us English speakers, I suggest downloading the “macro free version” for ease of use.

As an exercise, let’s put our data (2.1, 4.2 and 8.4 ppm) into IH STAT and see what we get.  The program advises that the data fit both the normal and the lognormal distribution but fit the lognormal better.  The error bands around the estimates of the mean are very broad, primarily because we only have three samples.  Statistically, the model is much “happier” with 6 or more samples, but that was frankly unheard of in my pump jockey days.

The statistical lognormal fitted model has a geometric standard deviation (GSD) of 2.0.  This represents the width of the lognormal curve as discussed above, and a value of 2 is pretty typical.  Indeed, it is not until the GSD gets to be greater than 3 that the process is considered to be out of control or the exposure group poorly defined.
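For the curious, the geometric mean and GSD that the spreadsheet reports can be reproduced by hand: take the logs of the samples, compute the mean and (sample) standard deviation on the log scale, and exponentiate back.  A minimal sketch in Python, using only the standard library:

```python
import math
import statistics

samples_ppm = [2.1, 4.2, 8.4]  # the three breathing-zone samples from the text

logs = [math.log(x) for x in samples_ppm]

# Geometric mean: exponential of the mean of the logs.
gm = math.exp(statistics.mean(logs))

# Geometric standard deviation: exponential of the sample std dev of the logs.
gsd = math.exp(statistics.stdev(logs))

print(f"GM  = {gm:.1f} ppm")  # 4.2 ppm
print(f"GSD = {gsd:.1f}")     # 2.0
```

Because each of these samples happens to be exactly double the previous one, the numbers come out clean: a geometric mean of 4.2 ppm and a GSD of exactly 2.0.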

What is most interesting about this analysis is that the lognormal distribution predicts that the OEL will be exceeded more than 10% of the time in this exposure scenario.  That would mean that for more than 25 days in a 250-day working year the exposure in this scenario would be predicted to exceed the exposure limit (OEL).  If I had known this in my heady days as a pump jockey, it would have given me pause.  Indeed, there was advice around even in those days from NIOSH that if the GSD was 2 then the “action level” should be about 10% of the OEL.  Thus, the above data were all above this recommended action level.  Unfortunately, absent wonderful tools like IH STAT, few were doing detailed statistical analysis in those days (the 1970s), and I certainly was not.
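That exceedance prediction can also be checked directly.  Under the fitted lognormal model, the probability of exceeding the OEL is one minus the normal cumulative probability at the standardized log-distance between the OEL and the geometric mean.  A rough check in Python, using the geometric mean of 4.2 ppm and GSD of 2.0 from the three samples and an OEL of 10 ppm:

```python
import math
from statistics import NormalDist

gm, gsd, oel = 4.2, 2.0, 10.0  # values from the three-sample example

# Standardize on the log scale: how many log-GSD units the OEL
# sits above the geometric mean.
z = (math.log(oel) - math.log(gm)) / math.log(gsd)

# Exceedance fraction: probability that a random day's exposure
# exceeds the OEL under the fitted lognormal model.
exceedance = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.2f}")                    # ~1.25
print(f"exceedance = {exceedance:.1%}")  # ~10.5%
```

About 10.5% of days, or more than 25 days in a 250-day working year, are predicted to exceed the limit, matching the IH STAT result quoted above.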

The Pennsylvania Dutch have a wonderful saying: “Too soon old and too late smart.”  It is definitely not too late for you to rise from pump jockey status to that of exposure assessor using this remarkable tool.



  1. Mike

    Well said. As a laboratorian, one familiar with the variability of instrumental measurements, not to mention the variability of exposure you so succinctly described, it was always troubling when a client treated a single result as unequivocal evidence of an exposure, one way or the other. As a commercial laboratory, you can imagine the raised eyebrows when advising clients to take more samples to truly characterize the exposure. Best regards, Bob Lieckfield, Jr., CIH, Bureau Veritas North America.

  2. I have never been in a position that statistical modeling like this was necessary. I've always been curious, though, and I apologize if this is a dumb question: How well validated is the lognormal model? To use the above example, how many times has anyone performed 250 consecutive days of sampling on the same task to document the lognormal exposure distribution and the 10% overexposure prediction? Would such sampling not be likely to reduce the gsd and change the percentage of overexposures predicted?

  3. It is not a dumb question. Indeed, it has been discussed quite recently among folks with more statistical knowledge than myself. It is my understanding that the lognormal distribution does fit large data sets as well as or better than most other distributions. Given the uncertainty from just 3 samples in the example, a sample of 250 days will almost certainly be different from the 10% predicted. This was just the best prediction based on the available data. There does remain, however, a lot of uncertainty, which is also statistically estimated. Indeed, the GSD could go up or down; it was simply the best estimate from only 3 samples.

  4. Hi Mike,

    Thank you very much for a very informative piece.

    I tried downloading the mentioned file but it seems the link is not functioning, is there another portal that can be used?

  5. Dear Anonymous,

    Sorry you could not get the files. Tell me which you want and I will send them to you.

  6. Too bad you were not my teacher in college and or trainer in the work place!

  7. This guy knows what he is talking about.