Predicting Snow Depth http://swingleydev.com/blog/p/2011/ <div class="document"> <div class="section" id="introduction"> <h1>Introduction</h1> <p>It’s November 10th in Fairbanks and we have only an inch of snow on the ground. The average depth on this date is 6.1 inches, but that’s a little deceptive because snow depth doesn’t follow a normal distribution; it can never be below zero, and has a long tail toward deeper snow depths. In the 92 years of snow depth data for the Fairbanks Airport, we’ve had an inch of snow or less only six times (6.5%). At the other end of the distribution, there have been seven years with more than 14 inches of snow on November 10th.</p> <p>My question is: what does snow depth on November 10th tell us about how much snow we are going to get later on in the winter? Is there a relationship between depth on November 10th and depths later in the winter, and if there is, how much snow can we expect this winter?</p> </div> <div class="section" id="data"> <h1>Data</h1> <p>We’ll use the 92-year record of snow depth data from the Fairbanks International Airport station that’s in the <a class="reference external" href="https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/global-historical-climatology-network-ghcn">Global Historical Climatology Network</a>.</p> <p>The correlation coefficients (a 1 means a perfect correlation, and a 0 is no correlation) between snow depth on November 10th and the first of the months of December, January and February of that same winter are shown below:</p> <table border="1" class="docutils"> <colgroup> <col width="20%" /> <col width="20%" /> <col width="20%" /> <col width="20%" /> <col width="20%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">&nbsp;</th> <th class="head">nov_10</th> <th class="head">dec_01</th> <th class="head">jan_01</th> <th class="head">feb_01</th> </tr> </thead> <tbody valign="top"> <tr><td>nov_10</td> <td>1.00</td> <td>0.65</td> <td>0.49</td> <td>0.46</td> </tr> <tr><td>dec_01</td> <td>0.65</td> <td>1.00</td> <td>0.60</td> <td>0.39</td> </tr> <tr><td>jan_01</td> <td>0.49</td> <td>0.60</td> <td>1.00</td> <td>0.74</td> </tr> <tr><td>feb_01</td> <td>0.46</td> <td>0.39</td> <td>0.74</td> <td>1.00</td> </tr> </tbody> </table> <p>Looking down the <tt class="docutils literal">nov_10</tt> column, you can see a high correlation between snow depth on November 10th and depth on December 1st, but lower (and similar) correlations with depths in January and February.</p>
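<p>If you want to reproduce this table, here’s a minimal sketch (this calculation isn’t part of the appendix code, which builds the data and fits the models), assuming the <tt class="docutils literal">snow_depths</tt> data frame built by the code in the appendix at the bottom of this post, with one row per winter and the four depth columns in inches:</p> <div class="highlight"><pre>library(tidyverse)

# Pairwise correlations between snow depths on the four dates, dropping
# winters that are missing a value for any of them
snow_depths %&gt;%
    select(nov_10, dec_01, jan_01, feb_01) %&gt;%
    cor(use = &quot;complete.obs&quot;) %&gt;%
    round(2)
</pre></div> <p>This makes sense. 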
In Fairbanks, snow that falls after the second week in October is likely to be around for the rest of the winter, so all the snow on the ground on November 10th, will still be there in December, and throughout the winter.</p> <p>But what can a snow depth of one inch on November 10th tell us about how much snow we will have in December or later on?</p> <p>Here’s the data for those six years with a snow depth of 1 inch on November 10th:</p> <table border="1" class="docutils"> <colgroup> <col width="22%" /> <col width="26%" /> <col width="26%" /> <col width="26%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">wyear</th> <th class="head">dec_01</th> <th class="head">jan_01</th> <th class="head">feb_01</th> </tr> </thead> <tbody valign="top"> <tr><td>1938</td> <td>5</td> <td>11</td> <td>24</td> </tr> <tr><td>1940</td> <td>6</td> <td>8</td> <td>9</td> </tr> <tr><td>1951</td> <td>12</td> <td>22</td> <td>31</td> </tr> <tr><td>1953</td> <td>1</td> <td>5</td> <td>17</td> </tr> <tr><td>1954</td> <td>9</td> <td>15</td> <td>12</td> </tr> <tr><td>1979</td> <td>3</td> <td>8</td> <td>14</td> </tr> </tbody> </table> <p>Not exactly encouraging data for our current situation, although 1951 gives us some hope of a good winter.</p> </div> <div class="section" id="methods"> <h1>Methods</h1> <p>We used Bayesian linear regression to predict snow depth on December 1st, January 1st and February 1st, based on our snow depth data and the current snow depth in Fairbanks. We used the <tt class="docutils literal">rstanarm</tt> R package, which mimics the <tt class="docutils literal">glm</tt> function that’s part of base R.</p> <p>Because of the non-zero, skewed nature of the distribution of snow depths, a log-linked Gamma distribution is appropriate. We used the <tt class="docutils literal">rstanarm</tt> defaults for priors.</p> <p>One of the great things about Bayesian linear regression is that it incorporates our uncertainty about the model coefficients to produce a distribution of predicted values. The more uncertainty there is in our model, the wider the range of predicted values. We examine the distribution of these predicted snow depth values and compare them with the distribution of actual values.</p> <p>The code for the analysis appears at the bottom of the post.</p> </div> <div class="section" id="results"> <h1>Results</h1> <p>The following figure shows a histogram and density function plot for the predicted snow depth on December 1st (top pane) for this year, and the actual December 1st snow depth data in past years (bottom).</p> <div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2018/11/december_comparison.pdf"><img alt="December Snow Depth" class="img-responsive" src="//media.swingleydev.com/img/blog/2018/11/december_comparison.svgz" /></a> </div> <p>The predicted snow depth ranges from zero to almost 27 inches of snow, but the distribution is concentrated around 5 inches. 
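</p> <p>These predicted depths come from the posterior predictive distribution of the fitted model. Here’s a minimal sketch of how they can be generated and summarized, using the same <tt class="docutils literal">rstanarm</tt> calls as the full code in the appendix and assuming the <tt class="docutils literal">snow_depths</tt> data frame built there (the 90% interval shown here is illustrative, not the interval used for the figures):</p> <div class="highlight"><pre>library(rstanarm)

# December model: a Gamma GLM with a log link, since snow depth is
# positive and right-skewed
dec &lt;- stan_glm(dec_01 ~ nov_10, data = snow_depths,
                family = Gamma(link = &quot;log&quot;), iter = 5000)

# Posterior predictive draws of December 1st depth, given 1 inch of snow
# on November 10th
draws &lt;- posterior_predict(dec, newdata = data.frame(nov_10 = 1))[, 1]

# Median and a 90% interval of the predicted depths, in inches
quantile(draws, probs = c(0.05, 0.5, 0.95))
</pre></div> <p>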
The lower plot showing the distribution of actual snow depth on December 1st isn’t as smooth, but it has a similar shape and peaks at 9 inches.</p> <p>If we run the same analysis for January and February, we get a set of frequency distributions that look like the following plot, again with the predicted snow depth distribution on top and the distribution of actual data on the bottom.</p> <div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2018/11/dec_jan_feb.pdf"><img alt="Snow Depth Distribution" class="img-responsive" src="//media.swingleydev.com/img/blog/2018/11/dec_jan_feb.svgz" /></a> </div> <p>The December densities are repeated here, in red, along with the January (green) and February (blue) results. In the top plot, you can clearly see that the shape of the distribution gets more spread out as we get farther from November, indicating our increasing uncertainty in our predictions, although some of that pattern is also from the source data (below), which also gets more spread out in January and February.</p> <p>Despite our increasing uncertainty, it’s clear from comparing the peaks in these curves that our models expect there to be less snow in December, January and February this year, compared with historical values. By my reckoning, we can expect around 5 inches on December 1st, 10 inches on January 1st, and 12 or 13 inches by February. In an average year, these values would be closer to 9, 12, and 15 inches.</p> </div> <div class="section" id="conclusion"> <h1>Conclusion</h1> <p>There is a relationship between snow depth on November 10th and depths later in the winter, but the distributions of predicted values are so spread out that we could easily receive as much or more snow as we have in previous years. Last year on this date we had 5 inches, on December 1st we had 11 inches, 13 inches on New Year’s Day, and 20 inches on February 1st. 
Here’s hoping we quickly reach, and surpass those values in 2018/2019.</p> </div> <div class="section" id="appendix"> <h1>Appendix</h1> <div class="highlight"><pre><span></span><span class="kn">library</span><span class="p">(</span>tidyverse<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>glue<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>ggpubr<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>scales<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>lubridate<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>RPostgres<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>rstanarm<span class="p">)</span> noaa <span class="o">&lt;-</span> dbConnect<span class="p">(</span>Postgres<span class="p">(),</span> dbname <span class="o">=</span> <span class="s">&quot;noaa&quot;</span><span class="p">)</span> ghcnd_stations <span class="o">&lt;-</span> noaa <span class="o">%&gt;%</span> tbl<span class="p">(</span><span class="s">&quot;ghcnd_stations&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span> filter<span class="p">(</span>station_name <span class="o">==</span> <span class="s">&quot;FAIRBANKS INTL AP&quot;</span><span class="p">)</span> ghcnd_variables <span class="o">&lt;-</span> noaa <span class="o">%&gt;%</span> tbl<span class="p">(</span><span class="s">&quot;ghcnd_variables&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span> filter<span class="p">(</span>variable <span class="o">==</span> <span class="s">&quot;SNWD&quot;</span><span class="p">)</span> ghcnd_obs <span class="o">&lt;-</span> noaa <span class="o">%&gt;%</span> tbl<span class="p">(</span><span class="s">&quot;ghcnd_obs&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span> inner_join<span class="p">(</span>ghcnd_stations<span class="p">,</span> by <span class="o">=</span> <span class="s">&quot;station_id&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span> inner_join<span class="p">(</span>ghcnd_variables<span class="p">,</span> by <span class="o">=</span> <span class="s">&quot;variable&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>month <span class="o">=</span> date_part<span class="p">(</span><span class="s">&quot;month&quot;</span><span class="p">,</span> dte<span class="p">),</span> day <span class="o">=</span> date_part<span class="p">(</span><span class="s">&quot;day&quot;</span><span class="p">,</span> dte<span class="p">))</span> <span class="o">%&gt;%</span> filter<span class="p">((</span>month <span class="o">==</span> <span class="m">11</span> <span class="o">&amp;</span> day <span class="o">==</span> <span class="m">10</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span>month <span class="o">==</span> <span class="m">12</span> <span class="o">&amp;</span> day <span class="o">==</span> <span class="m">1</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span>month <span class="o">==</span> <span class="m">1</span> <span class="o">&amp;</span> day <span class="o">==</span> <span class="m">1</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span>month <span class="o">==</span> <span class="m">2</span> <span class="o">&amp;</span> day <span class="o">==</span> <span class="m">1</span><span class="p">),</span> <span class="kp">is.na</span><span 
class="p">(</span>meas_flag<span class="p">)</span> <span class="o">|</span> meas_flag <span class="o">==</span> <span class="s">&quot;&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>value <span class="o">=</span> raw_value <span class="o">*</span> raw_multiplier<span class="p">)</span> <span class="o">%&gt;%</span> select<span class="p">(</span>dte<span class="p">,</span> month<span class="p">,</span> day<span class="p">,</span> variable<span class="p">,</span> value<span class="p">)</span> <span class="o">%&gt;%</span> collect<span class="p">()</span> snow_depths <span class="o">&lt;-</span> ghcnd_obs <span class="o">%&gt;%</span> mutate<span class="p">(</span>wyear <span class="o">=</span> year<span class="p">(</span>dte <span class="o">-</span> days<span class="p">(</span><span class="m">91</span><span class="p">)),</span> mmdd <span class="o">=</span> <span class="kp">factor</span><span class="p">(</span>glue<span class="p">(</span><span class="s">&quot;{str_to_lower(month.abb[month])}&quot;</span><span class="p">,</span> <span class="s">&quot;_{sprintf(&#39;%02d&#39;, day)}&quot;</span><span class="p">),</span> levels <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="s">&quot;nov_10&quot;</span><span class="p">,</span> <span class="s">&quot;dec_01&quot;</span><span class="p">,</span> <span class="s">&quot;jan_01&quot;</span><span class="p">,</span> <span class="s">&quot;feb_01&quot;</span><span class="p">)),</span> value <span class="o">=</span> value <span class="o">/</span> <span class="m">25.4</span><span class="p">)</span> <span class="o">%&gt;%</span> select<span class="p">(</span>wyear<span class="p">,</span> mmdd<span class="p">,</span> value<span class="p">)</span> <span class="o">%&gt;%</span> spread<span class="p">(</span>mmdd<span class="p">,</span> value<span class="p">)</span> <span class="o">%&gt;%</span> filter<span class="p">(</span><span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>nov_10<span class="p">))</span> write_csv<span class="p">(</span>snow_depths<span class="p">,</span> <span class="s">&quot;snow_depths.csv&quot;</span><span class="p">,</span> na <span class="o">=</span> <span class="s">&quot;&quot;</span><span class="p">)</span> dec <span class="o">&lt;-</span> stan_glm<span class="p">(</span>dec_01 <span class="o">~</span> nov_10<span class="p">,</span> data <span class="o">=</span> snow_depths<span class="p">,</span> family <span class="o">=</span> Gamma<span class="p">(</span>link <span class="o">=</span> <span class="s">&quot;log&quot;</span><span class="p">),</span> <span class="c1"># prior = normal(0.7, 3),</span> <span class="c1"># prior_intercept = normal(1, 3),</span> iter <span class="o">=</span> <span class="m">5000</span><span class="p">)</span> <span class="c1"># What does the model day about 2018?</span> dec_prediction_mat <span class="o">&lt;-</span> posterior_predict<span class="p">(</span>dec<span class="p">,</span> newdata <span class="o">=</span> tibble<span class="p">(</span>nov_10 <span class="o">=</span> <span class="m">1</span><span class="p">))</span> dec_prediction <span class="o">&lt;-</span> tibble<span class="p">(</span>pred_dec_01 <span class="o">=</span> dec_prediction_mat<span class="p">[,</span><span class="m">1</span><span class="p">])</span> dec_hist <span class="o">&lt;-</span> ggplot<span class="p">(</span>data <span class="o">=</span> dec_prediction<span class="p">,</span> aes<span class="p">(</span>x <span 
class="o">=</span> pred_dec_01<span class="p">,</span> y <span class="o">=</span> <span class="m">..</span>density..<span class="p">))</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> geom_histogram<span class="p">(</span>binwidth <span class="o">=</span> <span class="m">0.25</span><span class="p">,</span> color <span class="o">=</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> fill <span class="o">=</span> <span class="s">&#39;darkorange&#39;</span><span class="p">)</span> <span class="o">+</span> geom_density<span class="p">()</span> <span class="o">+</span> scale_x_continuous<span class="p">(</span>name <span class="o">=</span> <span class="s">&quot;Snow depth (inches)&quot;</span><span class="p">,</span> limits <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">40</span><span class="p">),</span> breaks <span class="o">=</span> <span class="kp">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">40</span><span class="p">,</span> <span class="m">5</span><span class="p">))</span> <span class="o">+</span> scale_y_continuous<span class="p">(</span>name <span class="o">=</span> <span class="s">&quot;Frequency&quot;</span><span class="p">)</span> <span class="o">+</span> theme<span class="p">(</span>plot.margin <span class="o">=</span> unit<span class="p">(</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0.5</span><span class="p">),</span> <span class="s">&#39;lines&#39;</span><span class="p">))</span> <span class="o">+</span> theme<span class="p">(</span>axis.text.x <span class="o">=</span> element_blank<span class="p">(),</span> axis.title.x <span class="o">=</span> element_blank<span class="p">(),</span> axis.ticks.x <span class="o">=</span> element_blank<span class="p">())</span> <span class="o">+</span> labs<span class="p">(</span>title <span class="o">=</span> <span class="s">&quot;December Snow Depth&quot;</span><span class="p">,</span> subtitle <span class="o">=</span> <span class="s">&quot;Fairbanks Airport Station&quot;</span><span class="p">)</span> actual_december <span class="o">&lt;-</span> ggplot<span class="p">(</span>data <span class="o">=</span> snow_depths<span class="p">,</span> aes<span class="p">(</span>x <span class="o">=</span> dec_01<span class="p">,</span> y <span class="o">=</span> <span class="m">..</span>density..<span class="p">))</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> geom_histogram<span class="p">(</span>binwidth <span class="o">=</span> <span class="m">1</span><span class="p">,</span> color <span class="o">=</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> fill <span class="o">=</span> <span class="s">&#39;darkorange&#39;</span><span class="p">)</span> <span class="o">+</span> geom_density<span class="p">()</span> <span class="o">+</span> scale_x_continuous<span class="p">(</span>name <span class="o">=</span> <span class="s">&quot;Snow depth (inches)&quot;</span><span class="p">,</span> limits <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">40</span><span class="p">),</span> breaks <span class="o">=</span> <span class="kp">seq</span><span 
class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">40</span><span class="p">,</span> <span class="m">5</span><span class="p">))</span> <span class="o">+</span> scale_y_continuous<span class="p">(</span>name <span class="o">=</span> <span class="s">&quot;Frequency&quot;</span><span class="p">)</span> <span class="o">+</span> theme<span class="p">(</span>plot.margin <span class="o">=</span> unit<span class="p">(</span><span class="kt">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">0.5</span><span class="p">,</span> <span class="m">0.5</span><span class="p">),</span> <span class="s">&#39;lines&#39;</span><span class="p">))</span> height <span class="o">&lt;-</span> <span class="m">9</span> width <span class="o">&lt;-</span> <span class="m">16</span> rescale <span class="o">&lt;-</span> <span class="m">0.75</span> heights <span class="o">&lt;-</span> <span class="kt">c</span><span class="p">(</span><span class="m">0.5</span><span class="p">,</span> <span class="m">0.5</span><span class="p">)</span> <span class="o">*</span> height gt <span class="o">&lt;-</span> ggarrange<span class="p">(</span>dec_hist<span class="p">,</span> actual_december<span class="p">,</span> ncol <span class="o">=</span> <span class="m">1</span><span class="p">,</span> nrow <span class="o">=</span> <span class="m">2</span><span class="p">,</span> align <span class="o">=</span> <span class="s">&quot;v&quot;</span><span class="p">,</span> widths <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> heights <span class="o">=</span> heights<span class="p">)</span> svg<span class="p">(</span><span class="s">&#39;december_comparison.svg&#39;</span><span class="p">,</span> width<span class="o">=</span>width<span class="o">*</span>rescale<span class="p">,</span> height<span class="o">=</span>height<span class="o">*</span>rescale<span class="p">)</span> gt dev.off<span class="p">()</span> jan <span class="o">&lt;-</span> stan_glm<span class="p">(</span>jan_01 <span class="o">~</span> nov_10<span class="p">,</span> data <span class="o">=</span> snow_depths<span class="p">,</span> <span class="c1"># family = gaussian(link = &quot;identity&quot;),</span> family <span class="o">=</span> Gamma<span class="p">(</span>link <span class="o">=</span> <span class="s">&quot;log&quot;</span><span class="p">),</span> <span class="c1"># prior = normal(0.7, 3),</span> <span class="c1"># prior_intercept = normal(1, 3),</span> iter <span class="o">=</span> <span class="m">5000</span><span class="p">)</span> jan_prediction_mat <span class="o">&lt;-</span> posterior_predict<span class="p">(</span>jan<span class="p">,</span> newdata <span class="o">=</span> tibble<span class="p">(</span>nov_10 <span class="o">=</span> <span class="m">1</span><span class="p">))</span> jan_prediction <span class="o">&lt;-</span> tibble<span class="p">(</span>pred_jan_01 <span class="o">=</span> jan_prediction_mat<span class="p">[,</span><span class="m">1</span><span class="p">])</span> feb <span class="o">&lt;-</span> stan_glm<span class="p">(</span>feb_01 <span class="o">~</span> nov_10<span class="p">,</span> data <span class="o">=</span> snow_depths<span class="p">,</span> <span class="c1"># family = gaussian(link = &quot;identity&quot;),</span> family <span class="o">=</span> Gamma<span class="p">(</span>link <span 
class="o">=</span> <span class="s">&quot;log&quot;</span><span class="p">),</span> <span class="c1"># family = poisson(link = &quot;identity&quot;),</span> <span class="c1"># prior = normal(0.7, 3),</span> <span class="c1"># prior_intercept = normal(1, 3),</span> iter <span class="o">=</span> <span class="m">5000</span><span class="p">)</span> feb_prediction_mat <span class="o">&lt;-</span> posterior_predict<span class="p">(</span>feb<span class="p">,</span> newdata <span class="o">=</span> tibble<span class="p">(</span>nov_10 <span class="o">=</span> <span class="m">1</span><span class="p">))</span> feb_prediction <span class="o">&lt;-</span> tibble<span class="p">(</span>pred_feb_01 <span class="o">=</span> feb_prediction_mat<span class="p">[,</span><span class="m">1</span><span class="p">])</span> all_predictions <span class="o">&lt;-</span> bind_cols<span class="p">(</span>dec_prediction<span class="p">,</span> jan_prediction<span class="p">,</span> feb_prediction<span class="p">)</span> <span class="o">%&gt;%</span> rename<span class="p">(</span><span class="sb">`Dec 1`</span> <span class="o">=</span> pred_dec_01<span class="p">,</span> <span class="sb">`Jan 1`</span> <span class="o">=</span> pred_jan_01<span class="p">,</span> <span class="sb">`Feb 1`</span> <span class="o">=</span> pred_feb_01<span class="p">)</span> <span class="o">%&gt;%</span> gather<span class="p">(</span>prediction<span class="p">,</span> snow_depth_inches<span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>prediction <span class="o">=</span> <span class="kp">factor</span><span class="p">(</span>prediction<span class="p">,</span> levels <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="s">&quot;Dec 1&quot;</span><span class="p">,</span> <span class="s">&quot;Jan 1&quot;</span><span class="p">,</span> <span class="s">&quot;Feb 1&quot;</span><span class="p">)))</span> pred_density_plot <span class="o">&lt;-</span> ggplot<span class="p">(</span>data <span class="o">=</span> all_predictions<span class="p">,</span> aes<span class="p">(</span>x <span class="o">=</span> snow_depth_inches<span class="p">,</span> colour <span class="o">=</span> prediction<span class="p">))</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> geom_density<span class="p">()</span> <span class="o">+</span> scale_x_continuous<span class="p">(</span>name <span class="o">=</span> <span class="s">&quot;Snow depth (inches)&quot;</span><span class="p">,</span> limits <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">55</span><span class="p">),</span> breaks <span class="o">=</span> pretty_breaks<span class="p">(</span>n <span class="o">=</span> <span class="m">10</span><span class="p">))</span> <span class="o">+</span> theme<span class="p">(</span>axis.text.x <span class="o">=</span> element_blank<span class="p">(),</span> axis.title.x <span class="o">=</span> element_blank<span class="p">(),</span> axis.ticks.x <span class="o">=</span> element_blank<span class="p">())</span> <span class="o">+</span> labs<span class="p">(</span>title <span class="o">=</span> <span class="s">&quot;Predicted and actual snow depths based on November 10 depth&quot;</span><span class="p">,</span> subtitle <span class="o">=</span> <span class="s">&quot;Fairbanks Airport Station&quot;</span><span class="p">)</span> actual_data <span class="o">&lt;-</span> snow_depths <span 
class="o">%&gt;%</span> transmute<span class="p">(</span><span class="sb">`Dec 1`</span> <span class="o">=</span> dec_01<span class="p">,</span> <span class="sb">`Jan 1`</span> <span class="o">=</span> jan_01<span class="p">,</span> <span class="sb">`Feb 1`</span> <span class="o">=</span> feb_01<span class="p">)</span> <span class="o">%&gt;%</span> gather<span class="p">(</span>actual<span class="p">,</span> snow_depth_inches<span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>actual <span class="o">=</span> <span class="kp">factor</span><span class="p">(</span>actual<span class="p">,</span> levels <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="s">&quot;Dec 1&quot;</span><span class="p">,</span> <span class="s">&quot;Jan 1&quot;</span><span class="p">,</span> <span class="s">&quot;Feb 1&quot;</span><span class="p">)))</span> actual_density_plot <span class="o">&lt;-</span> ggplot<span class="p">(</span>data <span class="o">=</span> actual_data<span class="p">,</span> aes<span class="p">(</span>x <span class="o">=</span> snow_depth_inches<span class="p">,</span> colour <span class="o">=</span> actual<span class="p">))</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> geom_density<span class="p">()</span> <span class="o">+</span> scale_x_continuous<span class="p">(</span>name <span class="o">=</span> <span class="s">&quot;Snow depth (inches)&quot;</span><span class="p">,</span> limits <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">55</span><span class="p">),</span> breaks <span class="o">=</span> pretty_breaks<span class="p">(</span>n <span class="o">=</span> <span class="m">10</span><span class="p">))</span> <span class="o">+</span> theme<span class="p">(</span>plot.margin <span class="o">=</span> unit<span class="p">(</span><span class="kt">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">0.5</span><span class="p">,</span> <span class="m">0.5</span><span class="p">),</span> <span class="s">&#39;lines&#39;</span><span class="p">))</span> height <span class="o">&lt;-</span> <span class="m">9</span> width <span class="o">&lt;-</span> <span class="m">16</span> rescale <span class="o">&lt;-</span> <span class="m">0.75</span> heights <span class="o">&lt;-</span> <span class="kt">c</span><span class="p">(</span><span class="m">0.5</span><span class="p">,</span> <span class="m">0.5</span><span class="p">)</span> <span class="o">*</span> height gt <span class="o">&lt;-</span> ggarrange<span class="p">(</span>pred_density_plot<span class="p">,</span> actual_density_plot<span class="p">,</span> ncol <span class="o">=</span> <span class="m">1</span><span class="p">,</span> nrow <span class="o">=</span> <span class="m">2</span><span class="p">,</span> align <span class="o">=</span> <span class="s">&quot;v&quot;</span><span class="p">,</span> widths <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> heights <span class="o">=</span> heights<span class="p">)</span> svg<span class="p">(</span><span class="s">&#39;dec_jan_feb.svg&#39;</span><span class="p">,</span> width<span class="o">=</span>width<span class="o">*</span>rescale<span class="p">,</span> height<span 
class="o">=</span>height<span class="o">*</span>rescale<span class="p">)</span> gt dev.off<span class="p">()</span> </pre></div> </div> </div> Sat, 10 Nov 2018 10:02:21 -0900 http://swingleydev.com/blog/p/2011/ weather snow depth bayesian statistics regression rstanarm R Equinox Marathon Weather, 2018 update http://swingleydev.com/blog/p/2010/ <div class="document"> <div class="section" id="introduction"> <h1>Introduction</h1> <p>A couple years ago I wrote a post about past <a class="reference external" href="https://swingleydev.com/blog/p/1999/">Equinox Marathon weather</a>. Since that post Andrea and I have run the relay twice, and I plan on running the full marathon in a couple days. This post updates the statistics and plots to include two more years of the race.</p> </div> <div class="section" id="methods"> <h1>Methods</h1> <p>Methods and data are the same as in my previous <a class="reference external" href="https://swingleydev.com/blog/p/1999/">post</a>, except the daily data has been updated to include 2016 and 2017. The R code is available at the end of the previous post.</p> </div> <div class="section" id="results"> <h1>Results</h1> <div class="section" id="race-day-weather"> <h2>Race day weather</h2> <p>Temperatures at the airport on race day ranged from 19.9&nbsp;°F in 1972 to 68.0&nbsp;°F in 1969, but the average range is between 34.1 and 53.1&nbsp;°F. Using our model of Ester Dome temperatures, we get an average range of 29.5 to 47.3&nbsp;°F and an overall min / max of 16.1 / 61.3&nbsp;°F. Generally speaking, it will be below freezing on Ester Dome, but possibly only before most of the runners get up there.</p> <p>Precipitation (rain, sleet or snow) has fallen on 16 out of 55 race days, or 29% of the time, and measurable snowfall has been recorded on four of those sixteen. The highest amount fell in 2014 with 0.36 inches of liquid precipitation (no snow was recorded and the temperatures were between 45 and 51&nbsp;°F so it was almost certainly all rain, even on Ester Dome). More than a quarter of an inch of precipitation fell in three of the sixteen years when it rained or snowed (1990, 1993, and 2014), but most rainfall totals are much smaller.</p> <p>Measurable snow fell at the airport in four years, or seven percent of the time: 4.1&nbsp;inches in 1993, 2.1&nbsp;inches in 1985, 1.2&nbsp;inches in 1996, and 0.4&nbsp;inches in 1992. But that’s at the airport station. In five of the 12 years where measurable precipitation fell at the airport and no snow fell, the estimated minimum temperatures on Ester Dome were below freezing. It’s likely that some of the precipitation recorded at the airport in those years was coming down as snow up on Ester Dome. If so, that means snow may have fallen on nine race days, bringing the percentage up to sixteen percent.</p> <p>Wind data from the airport has only been recorded since 1984, but from those years the average wind speed at the airport on race day is 4.8&nbsp;miles per hour. The highest 2-minute wind speed during Equinox race day was 21&nbsp;miles per hour in 2003. Unfortunately, no wind data is available for Ester Dome, but it’s likely to be higher than what is recorded at the airport.</p> </div> <div class="section" id="weather-from-the-week-prior"> <h2>Weather from the week prior</h2> <p>It’s also useful to look at the weather from the week before the race, since excessive pre-race rain or snow can make conditions on race day very different, even if the race day weather is pleasant. 
The year I ran the full marathon (2013), it snowed the week before and much of the trail in the woods before the water stop near Henderson and all of the out and back were covered in snow.</p> <p>The most dramatic example of this was 1992 where 23 inches (!) of snow fell at the airport in the week prior to the race, with much higher totals up on the summit of Ester Dome. Measurable snow has been recorded at the airport in the week prior to six races, but all the weekly totals are under an inch except for the snow year of 1992.</p> <p>Precipitation has fallen in 44 of 55 pre-race weeks (80% of the time). Three years have had more than an inch of precipitation prior to the race: 1.49&nbsp;inches in 2015, 1.26&nbsp;inches in 1992 (most of which fell as snow), and 1.05&nbsp;inches in 2007. On average, just over two tenths of an inch of precipitation falls in the week before the race.</p> </div> </div> <div class="section" id="summary"> <h1>Summary</h1> <p>The following stacked plots shows the weather for all 55 runnings of the Equinox marathon. The top panel shows the range of temperatures on race day from the airport station (wide bars) and estimated on Ester Dome (thin lines below bars). The shaded area at the bottom shows where temperatures are below freezing.</p> <p>The middle panel shows race day liquid precipitation (rain, melted snow). Bars marked with an asterisk indicate years where snow was also recorded at the airport, but remember that five of the other years with liquid precipitation probably experienced snow on Ester Dome (1977, 1986, 1991, 1994, and 2016) because the temperatures were likely to be below freezing at elevation.</p> <p>The bottom panel shows precipitation totals from the week prior to the race. Bars marked with an asterisk indicate weeks where snow was also recorded at the airport.</p> <div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2018/09/equinox_weather_thru_2017.pdf"><img alt="Equinox Marathon Weather" class="img-responsive" src="//media.swingleydev.com/img/blog/2018/09/equinox_weather_thru_2017.svgz" /></a> </div> <p>Here’s a table with most of the data from the analysis. 
A CSV with this data can be downloaded from <a class="reference external" href="//media.swingleydev.com/img/blog/2018/09/all_wx.csv">all_wx.csv</a></p> <table border="1" class="docutils"> <colgroup> <col width="15%" /> <col width="9%" /> <col width="9%" /> <col width="13%" /> <col width="13%" /> <col width="8%" /> <col width="8%" /> <col width="8%" /> <col width="10%" /> <col width="10%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Date</th> <th class="head">min t</th> <th class="head">max t</th> <th class="head">ED min t</th> <th class="head">ED max t</th> <th class="head">awnd</th> <th class="head">prcp</th> <th class="head">snow</th> <th class="head">p prcp</th> <th class="head">p snow</th> </tr> </thead> <tbody valign="top"> <tr><td>1963-09-21</td> <td>32.0</td> <td>54.0</td> <td>27.5</td> <td>48.2</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.01</td> <td>0.0</td> </tr> <tr><td>1964-09-19</td> <td>34.0</td> <td>57.9</td> <td>29.4</td> <td>51.8</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.03</td> <td>0.0</td> </tr> <tr><td>1965-09-25</td> <td>37.9</td> <td>60.1</td> <td>33.1</td> <td>53.9</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.80</td> <td>0.0</td> </tr> <tr><td>1966-09-24</td> <td>36.0</td> <td>62.1</td> <td>31.3</td> <td>55.8</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.01</td> <td>0.0</td> </tr> <tr><td>1967-09-23</td> <td>35.1</td> <td>57.9</td> <td>30.4</td> <td>51.8</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> </tr> <tr><td>1968-09-21</td> <td>23.0</td> <td>44.1</td> <td>19.1</td> <td>38.9</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.04</td> <td>0.0</td> </tr> <tr><td>1969-09-20</td> <td>35.1</td> <td>68.0</td> <td>30.4</td> <td>61.3</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> </tr> <tr><td>1970-09-19</td> <td>24.1</td> <td>39.9</td> <td>20.1</td> <td>34.9</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.42</td> <td>0.0</td> </tr> <tr><td>1971-09-18</td> <td>35.1</td> <td>55.9</td> <td>30.4</td> <td>50.0</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.14</td> <td>0.0</td> </tr> <tr><td>1972-09-23</td> <td>19.9</td> <td>42.1</td> <td>16.1</td> <td>37.0</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.01</td> <td>0.2</td> </tr> <tr><td>1973-09-22</td> <td>30.0</td> <td>44.1</td> <td>25.6</td> <td>38.9</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.05</td> <td>0.0</td> </tr> <tr><td>1974-09-21</td> <td>48.0</td> <td>60.1</td> <td>42.5</td> <td>53.9</td> <td>&nbsp;</td> <td>0.08</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> </tr> <tr><td>1975-09-20</td> <td>37.9</td> <td>55.9</td> <td>33.1</td> <td>50.0</td> <td>&nbsp;</td> <td>0.02</td> <td>0.0</td> <td>0.02</td> <td>0.0</td> </tr> <tr><td>1976-09-18</td> <td>34.0</td> <td>59.0</td> <td>29.4</td> <td>52.9</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.54</td> <td>0.0</td> </tr> <tr><td>1977-09-24</td> <td>36.0</td> <td>48.9</td> <td>31.3</td> <td>43.4</td> <td>&nbsp;</td> <td>0.06</td> <td>0.0</td> <td>0.20</td> <td>0.0</td> </tr> <tr><td>1978-09-23</td> <td>30.0</td> <td>42.1</td> <td>25.6</td> <td>37.0</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.10</td> <td>0.3</td> </tr> <tr><td>1979-09-22</td> <td>35.1</td> <td>62.1</td> <td>30.4</td> <td>55.8</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.17</td> <td>0.0</td> </tr> <tr><td>1980-09-20</td> <td>30.9</td> <td>43.0</td> <td>26.5</td> <td>37.8</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.35</td> 
<td>0.0</td> </tr> <tr><td>1981-09-19</td> <td>37.0</td> <td>43.0</td> <td>32.2</td> <td>37.8</td> <td>&nbsp;</td> <td>0.15</td> <td>0.0</td> <td>0.04</td> <td>0.0</td> </tr> <tr><td>1982-09-18</td> <td>42.1</td> <td>61.0</td> <td>37.0</td> <td>54.8</td> <td>&nbsp;</td> <td>0.02</td> <td>0.0</td> <td>0.22</td> <td>0.0</td> </tr> <tr><td>1983-09-17</td> <td>39.9</td> <td>46.9</td> <td>34.9</td> <td>41.5</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.05</td> <td>0.0</td> </tr> <tr><td>1984-09-22</td> <td>28.9</td> <td>60.1</td> <td>24.6</td> <td>53.9</td> <td>5.8</td> <td>0.00</td> <td>0.0</td> <td>0.08</td> <td>0.0</td> </tr> <tr><td>1985-09-21</td> <td>30.9</td> <td>42.1</td> <td>26.5</td> <td>37.0</td> <td>6.5</td> <td>0.14</td> <td>2.1</td> <td>0.57</td> <td>0.0</td> </tr> <tr><td>1986-09-20</td> <td>36.0</td> <td>52.0</td> <td>31.3</td> <td>46.3</td> <td>8.3</td> <td>0.07</td> <td>0.0</td> <td>0.21</td> <td>0.0</td> </tr> <tr><td>1987-09-19</td> <td>37.9</td> <td>61.0</td> <td>33.1</td> <td>54.8</td> <td>6.3</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> </tr> <tr><td>1988-09-24</td> <td>37.0</td> <td>45.0</td> <td>32.2</td> <td>39.7</td> <td>4.0</td> <td>0.00</td> <td>0.0</td> <td>0.11</td> <td>0.0</td> </tr> <tr><td>1989-09-23</td> <td>36.0</td> <td>61.0</td> <td>31.3</td> <td>54.8</td> <td>8.5</td> <td>0.00</td> <td>0.0</td> <td>0.07</td> <td>0.5</td> </tr> <tr><td>1990-09-22</td> <td>37.9</td> <td>50.0</td> <td>33.1</td> <td>44.4</td> <td>7.8</td> <td>0.26</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> </tr> <tr><td>1991-09-21</td> <td>36.0</td> <td>57.0</td> <td>31.3</td> <td>51.0</td> <td>4.5</td> <td>0.04</td> <td>0.0</td> <td>0.03</td> <td>0.0</td> </tr> <tr><td>1992-09-19</td> <td>24.1</td> <td>33.1</td> <td>20.1</td> <td>28.5</td> <td>6.7</td> <td>0.01</td> <td>0.4</td> <td>1.26</td> <td>23.0</td> </tr> <tr><td>1993-09-18</td> <td>28.0</td> <td>37.0</td> <td>23.8</td> <td>32.2</td> <td>4.9</td> <td>0.29</td> <td>4.1</td> <td>0.37</td> <td>0.3</td> </tr> <tr><td>1994-09-24</td> <td>27.0</td> <td>51.1</td> <td>22.8</td> <td>45.5</td> <td>6.0</td> <td>0.02</td> <td>0.0</td> <td>0.08</td> <td>0.0</td> </tr> <tr><td>1995-09-23</td> <td>43.0</td> <td>66.9</td> <td>37.8</td> <td>60.3</td> <td>4.0</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> </tr> <tr><td>1996-09-21</td> <td>28.9</td> <td>37.9</td> <td>24.6</td> <td>33.1</td> <td>6.9</td> <td>0.06</td> <td>1.2</td> <td>0.26</td> <td>0.0</td> </tr> <tr><td>1997-09-20</td> <td>27.0</td> <td>55.0</td> <td>22.8</td> <td>49.1</td> <td>3.8</td> <td>0.00</td> <td>0.0</td> <td>0.03</td> <td>0.0</td> </tr> <tr><td>1998-09-19</td> <td>42.1</td> <td>60.1</td> <td>37.0</td> <td>53.9</td> <td>4.9</td> <td>0.00</td> <td>0.0</td> <td>0.37</td> <td>0.0</td> </tr> <tr><td>1999-09-18</td> <td>39.0</td> <td>64.9</td> <td>34.1</td> <td>58.4</td> <td>3.8</td> <td>0.00</td> <td>0.0</td> <td>0.26</td> <td>0.0</td> </tr> <tr><td>2000-09-16</td> <td>28.9</td> <td>50.0</td> <td>24.6</td> <td>44.4</td> <td>5.6</td> <td>0.00</td> <td>0.0</td> <td>0.30</td> <td>0.0</td> </tr> <tr><td>2001-09-22</td> <td>33.1</td> <td>57.0</td> <td>28.5</td> <td>51.0</td> <td>1.6</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> </tr> <tr><td>2002-09-21</td> <td>33.1</td> <td>48.9</td> <td>28.5</td> <td>43.4</td> <td>3.8</td> <td>0.00</td> <td>0.0</td> <td>0.03</td> <td>0.0</td> </tr> <tr><td>2003-09-20</td> <td>26.1</td> <td>46.0</td> <td>22.0</td> <td>40.7</td> <td>9.6</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> </tr> 
<tr><td>2004-09-18</td> <td>26.1</td> <td>48.0</td> <td>22.0</td> <td>42.5</td> <td>4.3</td> <td>0.00</td> <td>0.0</td> <td>0.25</td> <td>0.0</td> </tr> <tr><td>2005-09-17</td> <td>37.0</td> <td>63.0</td> <td>32.2</td> <td>56.6</td> <td>0.9</td> <td>0.00</td> <td>0.0</td> <td>0.09</td> <td>0.0</td> </tr> <tr><td>2006-09-16</td> <td>46.0</td> <td>64.0</td> <td>40.7</td> <td>57.6</td> <td>4.3</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> </tr> <tr><td>2007-09-22</td> <td>25.0</td> <td>45.0</td> <td>20.9</td> <td>39.7</td> <td>4.7</td> <td>0.00</td> <td>0.0</td> <td>1.05</td> <td>0.0</td> </tr> <tr><td>2008-09-20</td> <td>34.0</td> <td>51.1</td> <td>29.4</td> <td>45.5</td> <td>4.5</td> <td>0.00</td> <td>0.0</td> <td>0.08</td> <td>0.0</td> </tr> <tr><td>2009-09-19</td> <td>39.0</td> <td>50.0</td> <td>34.1</td> <td>44.4</td> <td>5.8</td> <td>0.00</td> <td>0.0</td> <td>0.25</td> <td>0.0</td> </tr> <tr><td>2010-09-18</td> <td>35.1</td> <td>64.9</td> <td>30.4</td> <td>58.4</td> <td>2.5</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> </tr> <tr><td>2011-09-17</td> <td>39.9</td> <td>57.9</td> <td>34.9</td> <td>51.8</td> <td>1.3</td> <td>0.00</td> <td>0.0</td> <td>0.44</td> <td>0.0</td> </tr> <tr><td>2012-09-22</td> <td>46.9</td> <td>66.9</td> <td>41.5</td> <td>60.3</td> <td>6.0</td> <td>0.00</td> <td>0.0</td> <td>0.33</td> <td>0.0</td> </tr> <tr><td>2013-09-21</td> <td>24.3</td> <td>44.1</td> <td>20.3</td> <td>38.9</td> <td>5.1</td> <td>0.00</td> <td>0.0</td> <td>0.13</td> <td>0.6</td> </tr> <tr><td>2014-09-20</td> <td>45.0</td> <td>51.1</td> <td>39.7</td> <td>45.5</td> <td>1.6</td> <td>0.36</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> </tr> <tr><td>2015-09-19</td> <td>37.9</td> <td>44.1</td> <td>33.1</td> <td>38.9</td> <td>2.9</td> <td>0.01</td> <td>0.0</td> <td>1.49</td> <td>0.0</td> </tr> <tr><td>2016-09-17</td> <td>34.0</td> <td>57.9</td> <td>29.4</td> <td>51.8</td> <td>2.2</td> <td>0.01</td> <td>0.0</td> <td>0.61</td> <td>0.0</td> </tr> <tr><td>2017-09-16</td> <td>33.1</td> <td>66.0</td> <td>28.5</td> <td>59.5</td> <td>3.1</td> <td>0.00</td> <td>0.0</td> <td>0.02</td> <td>0.0</td> </tr> </tbody> </table> </div> </div> Thu, 13 Sep 2018 17:40:09 -0800 http://swingleydev.com/blog/p/2010/ Equinox Marathon R running weather Another Equinox Marathon prediction http://swingleydev.com/blog/p/2009/ <div class="document"> <div class="section" id="introduction"> <h1>Introduction</h1> <p>In previous posts (<a class="reference external" href="https://swingleydev.com/blog/p/2000/">Fairbanks Race Predictor</a>, <a class="reference external" href="https://swingleydev.com/blog/p/1968/">Equinox from Santa Claus</a>, <a class="reference external" href="https://swingleydev.com/blog/p/1967/">Equinox from Gold Discovery</a>) I’ve looked at predicting Equinox Marathon results based on results from earlier races. In all those cases I’ve looked at single race comparisons: how results from Gold Discovery can predict Marathon times, for example. In this post I’ll look at all the <a class="reference external" href="https://www.runningclubnorth.org/usibelli-running-series/">Usibelli Series races</a> I completed this year to see how they can inform my expectations for next Saturday’s Equinox Marathon.</p> </div> <div class="section" id="methods"> <h1>Methods</h1> <p>I’ve been collecting the results from all Usibelli Series races since 2010. 
Using that data, grouped by the name of the person racing and year, I found all runners who completed the same set of Usibelli Series races that I finished in 2018, as well as their Equinox Marathon finish pace. Between 2010 and 2017 there are 160 records that match.</p> <p>The data looks like this. <tt class="docutils literal">crr</tt> is that person’s Chena River Run pace in minutes, <tt class="docutils literal">msr</tt> is Midnight Sun Run pace for the same person and year, <tt class="docutils literal">rotv</tt> is the pace from Run of the Valkyries, <tt class="docutils literal">gdr</tt> is the Gold Discovery Run, and <tt class="docutils literal">em</tt> is Equinox Marathon pace for that same person and year.</p> <table border="1" class="docutils"> <colgroup> <col width="19%" /> <col width="21%" /> <col width="19%" /> <col width="21%" /> <col width="21%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">crr</th> <th class="head">msr</th> <th class="head">rotv</th> <th class="head">gdr</th> <th class="head">em</th> </tr> </thead> <tbody valign="top"> <tr><td>8.1559</td> <td>8.8817</td> <td>8.1833</td> <td>10.2848</td> <td>11.8683</td> </tr> <tr><td>8.7210</td> <td>9.1387</td> <td>9.2120</td> <td>11.0152</td> <td>13.6796</td> </tr> <tr><td>8.7946</td> <td>9.0640</td> <td>9.0077</td> <td>11.3565</td> <td>13.1755</td> </tr> <tr><td>9.4409</td> <td>10.6091</td> <td>9.6250</td> <td>11.2080</td> <td>13.1719</td> </tr> <tr><td>7.3581</td> <td>7.1836</td> <td>7.1310</td> <td>8.0001</td> <td>9.6565</td> </tr> <tr><td>7.4731</td> <td>7.5349</td> <td>7.4700</td> <td>8.2465</td> <td>9.8359</td> </tr> <tr><td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> </tr> </tbody> </table> <p>I will use two methods to predict Equinox Marathon times from these records: multivariate linear regression and Random Forest.</p> <p>The R code for the analysis appears at the end of this post.</p> </div> <div class="section" id="results"> <h1>Results</h1> <div class="section" id="linear-regression"> <h2>Linear regression</h2> <p>We start with linear regression, which isn’t entirely appropriate for this analysis because the independent variables (pre-Equinox race pace times) aren’t really independent of one another. A person who runs a 6-minute pace in the Chena River Run is likely to also be someone who runs Gold Discovery faster than the average runner. This relationship, in fact, is the basis for this analysis.</p> <p>I started with a model that includes all the races I completed in 2018, but pace time for the Midnight Sun Run wasn’t statistically significant so I removed it from the final model, which included Chena River Run, Run of the Valkyries, and Gold Discovery.</p> <p>This model is significant, as are all the coefficients except the intercept, and the model explains nearly 80% of the variation in the data:</p> <pre class="literal-block">
## 
## Call:
## lm(formula = em ~ crr + gdr + rotv, data = input_pivot)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8837 -0.6534 -0.2265  0.3549  5.8273 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(&gt;|t|)    
## (Intercept)   0.6217     0.5692   1.092 0.276420    
## crr          -0.3723     0.1346  -2.765 0.006380 ** 
## gdr           0.8422     0.1169   7.206 2.32e-11 ***
## rotv          0.7607     0.2119   3.591 0.000442 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.278 on 156 degrees of freedom
## Multiple R-squared:  0.786, Adjusted R-squared:  0.7819 
## F-statistic:  191 on 3 and 156 DF,  p-value: &lt; 2.2e-16
</pre> <p>Using this model and my 2018 results, my overall pace and finish times for Equinox are predicted to be 10:45 and 4:41:50. The 95% confidence intervals for these predictions are 10:30–11:01 and 4:35:11–4:48:28.</p> </div> <div class="section" id="random-forest"> <h2>Random Forest</h2> <p>Random Forest is another regression method, but it doesn’t require that the independent variables be independent of one another. Here are the results of building 5,000 random trees from the data:</p> <pre class="literal-block">
## 
## Call:
##  randomForest(formula = em ~ ., data = input_pivot, ntree = 5000) 
##                Type of random forest: regression
##                      Number of trees: 5000
## No. of variables tried at each split: 1
## 
##           Mean of squared residuals: 1.87325
##                     % Var explained: 74.82
##      IncNodePurity
## crr       260.8279
## gdr       321.3691
## msr       268.0936
## rotv      295.4250
</pre> <p>This model, which includes all race results, explains just under 75% of the variation in the data. And you can see from the importance result that Gold Discovery results factor more heavily in the prediction than earlier races in the season like the Chena River Run and the Midnight Sun Run.</p> <p>Using this model, my predicted pace is 10:13 and my finish time is 4:27:46. The 95% confidence intervals are 9:23–11:40 and 4:05:58–5:05:34. You’ll notice that the confidence intervals are wider than with linear regression, probably because there are fewer assumptions with Random Forest and less power.</p> </div> </div> <div class="section" id="conclusion"> <h1>Conclusion</h1> <p>My number one goal for this year’s Equinox Marathon is simply to finish without injuring myself, something I wasn’t able to do the last time I ran the whole race in 2013. I finished in 4:49:28 with an overall pace of 11:02, but the race or my training for it resulted in a torn hip labrum.</p> <p>If I’m able to finish uninjured, I’d like to beat my time from 2013.
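</p> <p>For reference, the finish times quoted in this post follow, to within rounding, from the paces: a marathon is 26.2 miles, so a pace in minutes per mile multiplied by 26.2 gives the finish time in minutes (10.76 min/mi is about 282 minutes, or roughly 4:42). Here’s a minimal sketch of producing both sets of predictions, assuming the <tt class="docutils literal">lm_model</tt>, <tt class="docutils literal">rf</tt>, and <tt class="docutils literal">css_2018_pivot</tt> objects from the appendix code below; the 26.2-mile conversion at the end is my own arithmetic rather than something the appendix computes:</p> <div class="highlight"><pre># Linear model: predicted Equinox pace (minutes per mile) with a 95%
# confidence interval, from my 2018 paces
lm_pred &lt;- predict(lm_model, css_2018_pivot,
                   interval = &quot;confidence&quot;, level = 0.95)

# Random Forest: point prediction plus a 95% interval taken from the
# spread of the individual tree predictions
rf_all &lt;- predict(rf, css_2018_pivot, predict.all = TRUE)
rf_pred &lt;- c(rf_all$aggregate, quantile(rf_all$individual, c(0.025, 0.975)))

# Convert paces (minutes per mile) to finish times in minutes
lm_pred * 26.2
rf_pred * 26.2
</pre></div> <p>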
These results suggest I should have no problem acheiving my second goal and perhaps knowing how much faster these predictions are from my 2013 times, I can race conservatively and still get a personal best time.</p> </div> <div class="section" id="appendix-r-code"> <h1>Appendix - R code</h1> <div class="highlight"><pre><span></span><span class="kn">library</span><span class="p">(</span>tidyverse<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>RPostgres<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>lubridate<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>glue<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>randomForest<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>knitr<span class="p">)</span> races <span class="o">&lt;-</span> dbConnect<span class="p">(</span>Postgres<span class="p">(),</span> host <span class="o">=</span> <span class="s">&quot;localhost&quot;</span><span class="p">,</span> dbname <span class="o">=</span> <span class="s">&quot;races&quot;</span><span class="p">)</span> all_races <span class="o">&lt;-</span> races <span class="o">%&gt;%</span> tbl<span class="p">(</span><span class="s">&quot;all_races&quot;</span><span class="p">)</span> usibelli_races <span class="o">&lt;-</span> tibble<span class="p">(</span>race <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="s">&quot;Chena River Run&quot;</span><span class="p">,</span> <span class="s">&quot;Midnight Sun Run&quot;</span><span class="p">,</span> <span class="s">&quot;Jim Loftus Mile&quot;</span><span class="p">,</span> <span class="s">&quot;Run of the Valkyries&quot;</span><span class="p">,</span> <span class="s">&quot;Gold Discovery Run&quot;</span><span class="p">,</span> <span class="s">&quot;Santa Claus Half Marathon&quot;</span><span class="p">,</span> <span class="s">&quot;Golden Heart Trail Run&quot;</span><span class="p">,</span> <span class="s">&quot;Equinox Marathon&quot;</span><span class="p">))</span> css_2018 <span class="o">&lt;-</span> all_races <span class="o">%&gt;%</span> inner_join<span class="p">(</span>usibelli_races<span class="p">,</span> copy <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span> <span class="o">%&gt;%</span> filter<span class="p">(</span>year <span class="o">==</span> <span class="m">2018</span><span class="p">,</span> name <span class="o">==</span> <span class="s">&quot;Christopher Swingley&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span> collect<span class="p">()</span> candidate_races <span class="o">&lt;-</span> css_2018 <span class="o">%&gt;%</span> select<span class="p">(</span>race<span class="p">)</span> <span class="o">%&gt;%</span> bind_rows<span class="p">(</span>tibble<span class="p">(</span>race <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="s">&quot;Equinox Marathon&quot;</span><span class="p">)))</span> input_data <span class="o">&lt;-</span> all_races <span class="o">%&gt;%</span> inner_join<span class="p">(</span>candidate_races<span class="p">,</span> copy <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span> <span class="o">%&gt;%</span> filter<span class="p">(</span><span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>gender<span class="p">),</span> <span class="o">!</span><span class="kp">is.na</span><span 
class="p">(</span>birth_year<span class="p">))</span> <span class="o">%&gt;%</span> collect<span class="p">()</span> input_pivot <span class="o">&lt;-</span> input_data <span class="o">%&gt;%</span> group_by<span class="p">(</span>race<span class="p">,</span> name<span class="p">,</span> year<span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>n <span class="o">=</span> n<span class="p">())</span> <span class="o">%&gt;%</span> filter<span class="p">(</span>n <span class="o">==</span> <span class="m">1</span><span class="p">)</span> <span class="o">%&gt;%</span> ungroup<span class="p">()</span> <span class="o">%&gt;%</span> select<span class="p">(</span>name<span class="p">,</span> year<span class="p">,</span> race<span class="p">,</span> pace_min<span class="p">)</span> <span class="o">%&gt;%</span> spread<span class="p">(</span>race<span class="p">,</span> pace_min<span class="p">)</span> <span class="o">%&gt;%</span> rename<span class="p">(</span>crr <span class="o">=</span> <span class="sb">`Chena River Run`</span><span class="p">,</span> msr <span class="o">=</span> <span class="sb">`Midnight Sun Run`</span><span class="p">,</span> rotv <span class="o">=</span> <span class="sb">`Run of the Valkyries`</span><span class="p">,</span> gdr <span class="o">=</span> <span class="sb">`Gold Discovery Run`</span><span class="p">,</span> em <span class="o">=</span> <span class="sb">`Equinox Marathon`</span><span class="p">)</span> <span class="o">%&gt;%</span> filter<span class="p">(</span><span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>crr<span class="p">),</span> <span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>msr<span class="p">),</span> <span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>rotv<span class="p">),</span> <span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>gdr<span class="p">),</span> <span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>em<span class="p">))</span> <span class="o">%&gt;%</span> select<span class="p">(</span><span class="o">-</span><span class="kt">c</span><span class="p">(</span>name<span class="p">,</span> year<span class="p">))</span> kable<span class="p">(</span>input_pivot <span class="o">%&gt;%</span> <span class="kp">head</span><span class="p">)</span> css_2018_pivot <span class="o">&lt;-</span> css_2018 <span class="o">%&gt;%</span> select<span class="p">(</span>name<span class="p">,</span> year<span class="p">,</span> race<span class="p">,</span> pace_min<span class="p">)</span> <span class="o">%&gt;%</span> spread<span class="p">(</span>race<span class="p">,</span> pace_min<span class="p">)</span> <span class="o">%&gt;%</span> rename<span class="p">(</span>crr <span class="o">=</span> <span class="sb">`Chena River Run`</span><span class="p">,</span> msr <span class="o">=</span> <span class="sb">`Midnight Sun Run`</span><span class="p">,</span> rotv <span class="o">=</span> <span class="sb">`Run of the Valkyries`</span><span class="p">,</span> gdr <span class="o">=</span> <span class="sb">`Gold Discovery Run`</span><span class="p">)</span> <span class="o">%&gt;%</span> select<span class="p">(</span><span class="o">-</span><span class="kt">c</span><span class="p">(</span>name<span class="p">,</span> year<span class="p">))</span> pace <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span>minutes<span class="p">)</span> <span class="p">{</span> mm <span 
class="o">=</span> <span class="kp">floor</span><span class="p">(</span>minutes<span class="p">)</span> seconds <span class="o">=</span> <span class="p">(</span>minutes <span class="o">-</span> mm<span class="p">)</span> <span class="o">*</span> <span class="m">60</span> glue<span class="p">(</span><span class="s">&#39;{mm}:{sprintf(&quot;%02.0f&quot;, seconds)}&#39;</span><span class="p">)</span> <span class="p">}</span> finish_time <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span>minutes<span class="p">)</span> <span class="p">{</span> hh <span class="o">=</span> <span class="kp">floor</span><span class="p">(</span>minutes <span class="o">/</span> <span class="m">60.0</span><span class="p">)</span> min <span class="o">=</span> minutes <span class="o">-</span> <span class="p">(</span>hh <span class="o">*</span> <span class="m">60</span><span class="p">)</span> mm <span class="o">=</span> <span class="kp">floor</span><span class="p">(</span><span class="kp">min</span><span class="p">)</span> seconds <span class="o">=</span> <span class="p">(</span>min <span class="o">-</span> mm<span class="p">)</span> <span class="o">*</span> <span class="m">60</span> glue<span class="p">(</span><span class="s">&#39;{hh}:{sprintf(&quot;%02d&quot;, mm)}:{sprintf(&quot;%02.0f&quot;, seconds)}&#39;</span><span class="p">)</span> <span class="p">}</span> lm_model <span class="o">&lt;-</span> lm<span class="p">(</span>em <span class="o">~</span> crr <span class="o">+</span> gdr <span class="o">+</span> rotv<span class="p">,</span> data <span class="o">=</span> input_pivot<span class="p">)</span> <span class="kp">summary</span><span class="p">(</span>lm_model<span class="p">)</span> prediction <span class="o">&lt;-</span> predict<span class="p">(</span>lm_model<span class="p">,</span> css_2018_pivot<span class="p">,</span> interval <span class="o">=</span> <span class="s">&quot;confidence&quot;</span><span class="p">,</span> level <span class="o">=</span> <span class="m">0.95</span><span class="p">)</span> prediction rf <span class="o">&lt;-</span> randomForest<span class="p">(</span>em <span class="o">~</span> <span class="m">.</span><span class="p">,</span> data <span class="o">=</span> input_pivot<span class="p">,</span> ntree <span class="o">=</span> <span class="m">5000</span><span class="p">)</span> rf importance<span class="p">(</span>rf<span class="p">)</span> rfp_all <span class="o">&lt;-</span> predict<span class="p">(</span>rf<span class="p">,</span> css_2018_pivot<span class="p">,</span> predict.all <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span> rfp_all<span class="o">$</span>aggregate rf_ci <span class="o">&lt;-</span> quantile<span class="p">(</span>rfp_all<span class="o">$</span>individual<span class="p">,</span> <span class="kt">c</span><span class="p">(</span><span class="m">0.025</span><span class="p">,</span> <span class="m">0.975</span><span class="p">))</span> rf_ci </pre></div> </div> </div> Sun, 09 Sep 2018 10:54:14 -0800 http://swingleydev.com/blog/p/2009/ Equinox Marathon running statistics randomForest Lincoln in the Bardo, George Saunders http://swingleydev.com/blog/p/2007/ <div class="document"> <p>Well that was disappointing. I’ve read some of George Saunders’s short stories and was entertained, but I didn’t much enjoy <a class="reference external" href="https://www.amazon.com/dp/0812995341">Lincoln in the Bardo</a>. 
It’s the story of Abraham Lincoln coming to the graveyard to visit his newly dead son William, told from the perspective of a variety of lost souls that don’t believe they’re dead. There was no plot to speak of, and none of the large cast of characters was appealing. I did enjoy the sections that were fictional quotes from contemporary histories, many of which contradicted each other on the details, and some of the characters told funny stories, but it didn’t hold together as a novel.</p> <p>Widely acclaimed, winner of the Man Booker Prize, on many best-of-2017 lists. Not my cup of tea.</p> <p>Music I listened to while reading this:</p> <ul class="simple"> <li>Carlow Town, Seamus Fogarty</li> <li>You’ve Got Tonight, Wiretree</li> </ul> </div> Wed, 03 Jan 2018 19:32:58 -0900 http://swingleydev.com/blog/p/2007/ books George Saunders All Our Wrong Todays, Elan Mastai http://swingleydev.com/blog/p/2006/ <div class="document"> <p>It’s one day until <a class="reference external" href="https://themorningnews.org/archive/tag/tob18">The Tournament of Books</a> announces the list of books for this year’s competition, and I’ve been reading some of the <a class="reference external" href="https://themorningnews.org/article/the-year-in-fiction-2017">Long List</a>, including the book commented on here, Elan Mastai’s <a class="reference external" href="https://www.amazon.com/dp/1101985135">All Our Wrong Todays</a>. I thoroughly enjoyed it. The writing sparkles, the narrator is hilariously self-deprecating, and because of the premise, there is a lot of insightful commentary about contemporary society.</p> <p>The main plot line is that the protagonist grew up in an alternative timeline where a device that produces free energy was invented in 1965 and put into the public domain. With free energy and fifty-plus years, his world is something of a technological utopia (especially compared with our present). However, for reasons best left unspoiled, he alters the timeline and is stuck here in our timeline with the rest of us.</p> <p>The narrator on waking up for the first time in our timeline:</p> <blockquote> Here, it’s like nobody has considered using even the most rudimentary technology to improve the process. Mattresses don’t subtly vibrate to keep your muscles loose. Targeted steam valves don’t clean your body in slumber. I mean, blankets are made from tufts of plant fiber spun into thread and occasionally stuffed with feathers. Feathers. Like from actual birds.</blockquote> <p>While there are a lot of science-fiction concepts in the story, it’s really more of a love story than it sounds like it’d be. There were a couple of plot points I probably would have written differently, but the book is really funny, touching and thoughtful. I highly recommend it. Best book I’ve read in 2018 so far…</p> <p>A couple of other quotes I found particularly timely:</p> <blockquote> Part of the problem is this world is basically a cesspool of misogyny, male entitlement, and deeply demented gender constructs accepted as casual fact by outrageously large swaths of the human population.</blockquote> <p>and</p> <blockquote> People are despondent about the future because they’re increasingly aware that we, as a species, chased an inspiring dream that led us to ruin. We told ourselves the world is here for us to control, so the better our technology, the better our control, the better our world will be. The fact that for every leap in technology the world gets more sour and chaotic is deeply confusing.
The better things we build keep making it worse. The belief that the world is here for humans to control is the philosophical bedrock of our civilization, but it’s a mistaken belief. Optimism is the pyre on which we’ve been setting ourselves aflame.</blockquote> <p>Music I listened to while reading this book:</p> <ul class="simple"> <li>Jesus Christ, Brand New</li> <li>House of Cards, Radiohead</li> <li>Conundrum, Hak Baker</li> <li>Die Young, Sylvan Esso</li> <li>Feat &amp; Force, Vagabon</li> <li>No War, Cari Cari</li> </ul> </div> Tue, 02 Jan 2018 15:18:40 -0900 http://swingleydev.com/blog/p/2006/ books Elan Mastai