## Sunday, August 4, 2013

### Excel Regression Analysis for Ventilation Rate

This is a hopefully “bite-sized” lesson in tracer gas ventilation rate analysis.  It gets a little technical but I tried to break it down into little pieces and if you stay with me I think you will find it rewarding.

In last week’s blog we went through a derivation of the first order kinetic model.   The primary equation is:   C = C0 e-kt.   Where C is the concentration at any time (t) and C0 is the starting concentration.   Here k is the rate of loss as proportion per unit time.  For example k = 0.5 means that 50% is lost every hour.  If you take the natural log of both sides of this equation you get:  ln(C) = -kt + ln(C0).    You may recognize this form as the straight line equation y = bx + c.  Here y = the point-in-time concentration (C) at any time (t) and c equals C0 or the starting or t=0 concentration.  The x value in this equation is time (t).  The slope b is equal to –k or the loss rate or Q/V.

Given this straight line form, a plot of ln(C) versus t data points should equal a straight line with negative slope k and y-intercept value equal to C0.

The technique of linear regression is very valuable in that it takes the matched data pairs of values of  (y,x ) or  (ln(C), t) in this case and gives you the best straight-line equation that fits these data.   That is, it gives you k and CO in the equation:  ln(C) = -kt + ln(C0).    Once you have k you have a precise estimate of  the ventilation rate based on our tracer gas data.   The technique also tells you how well your data fit the first order decay model.

So how do we do this mathematical magic?  Doing it by hand can be a real pain but doing it by Excel spreadsheet is relatively easy.   First, you have to turn on “Data Analysis” and the Analysis ToolPak if they are not already activated in your copy of Excel (they are free but NOT activated by default).  You do this in Excel 2010 by going to the File tab then Options then Add-ins.  At that point you will see Analysis ToolPak.   You simply add it to Excel at this point and it shows up in the DATA tab.  If you have a different version of Excel or other problems then go to Google and look for YouTube presentations that show you step by step how to add this important tool.

Open a spreadsheet and let’s use the tracer gas data from the July 22 blog:   Concentration, t pairs:    16 ppm @ 0 hour,  12.5 @ 0.5,  9.7 @ 1, 7.9 @ 1.4 and 4.1 ppm after 2.8 hours.   Make a column in the spreadsheet for all the concentrations.    Thus cell A1 has 16 and A5 has 4.1.    Now go to cell B1 and put in =ln(A1).   This calculates the natural logarithm of 16 ppm or 2.7725.    Copy B1 down to B5 so that B5 now equals the ln(A5) or 1.410987  in this example.   In the C column put in the respective times for these concentrations; that is, 0 for C1 filling the cells in down to 2.8 hours for C5.

Now go to the DATA tab and then click “Data Analysis” to the right.   A small window opens entitled “Data Analysis” with many choices.  Go to “Regression” and hit OK.  Highlight the cells B1 thru B5.   Then go to the “input y range”: click the cursor into this window and then highlight B1 thru B5.   It will show up in the window as \$B\$1:\$B\$5.   You can also put this exact string in manually.  Now click the cursor into the Input X Range window and then highlight C1 thru B5.   It should show up as \$C\$1:\$C\$5.   At this point just hit OK and the magic happens.  A new worksheet opens up (Sheet 4) with all the analysis data.

Remember the straight line equation form we are using in the regression:  y = bx + c .  In this case straight line equation is ln(C)  = kt + ln(C0).      The calculated intercept is the model-predicted ln(C0).   In the regression model calculated by Excel this value is 2.76455 (in cell B17 intercept coefficient).  Taking e to this power is the model-predicted C0 = 15.9 ppm which is very close to the 16 ppm we actually measured at time equal zero.   The value for k is in cell B18 (X variable coefficient) and is -0.49 which is pretty close to the -0.5 that we estimated from looking at the data and assuming a first-order model without doing the regression analysis.    So we now have a reasonably precise analysis of ventilation from our tracer gas data because we now know that Q/V = 0.49/hr.

One of the neatest benefits from running the model is that it tells you how good the model fits the data using the first-order decay assumption.   In this case the R2 value (cell B5) in Sheet4 is 0.9996 which means that 99.96% of the data is explained or accounted for by the model and only 0.04% can be considered random error or noise.

I will be happy to send a copy of this spreadsheet complete with the sheet4 analysis if you email me at mjayjock@gmail.com.   I will be away this week so it may take some time for me to get back to you with the spreadsheet.

I went a bit deep into the technical woods on this one but I thought it might be worth it because the first order decay model is very important for a number of exposure modeling situations.   This example was for ventilation rate but you can use it for modeling the time-dependent rate of an evaporating spill if you have data on the remaining weight or size of spill versus time.