metachronistic http://swingleydev.com/blog/ Latest metachronistic posts en-us Tue, 13 Sep 2016 18:31:28 -0800 Equinox Marathon Weather http://swingleydev.com/blog/p/1999/ <div class="document"> <div class="section" id="introduction"> <h1>Introduction</h1> <p>Andrea and I are running the <a class="reference external" href="https://equinoxmarathon.org">Equinox Marathon</a> relay this Saturday with Norwegian dog musher Halvor Hoveid. He’s running the first leg, I’m running the second, and Andrea finishes the race. I ran the second leg as a training run a couple of weeks ago and feel good about my physical conditioning, but the weather is always a concern this late in the fall, especially up on top of Ester Dome, where it can be dramatically different from the valley floor where the race starts and ends.</p> <p>Andrea ran the full marathon in 2009–2012 and the relay in 2008 and 2013–2015. I ran the full marathon in 2013. There was snow on the trail when I ran it, making the out and back section slippery and treacherous, and the cold temperatures at the start meant my feet were frozen until I got off of the single-track, nine or ten miles into the course. In other years, rain turned the powerline section to sloppy mud, or cold temperatures and freezing rain up on the Dome made it unpleasant for runners and supporters.</p> <p>In this post we will examine the available weather data, looking at the range of conditions we could experience this weekend. The current forecast from the National Weather Service is calling for mostly cloudy skies with highs in the 50s. Low temperatures the night before are predicted to be in the 40s, with rain in the forecast between now and then.</p> </div> <div class="section" id="methods"> <h1>Methods</h1> <p>There is no long-term climate data for Ester Dome, but there are several valley-level stations with data going back to the start of the race in 1963. 
The best data comes from the Fairbanks Airport station and includes daily temperature, precipitation, and snowfall for all years, and wind speed and direction since 1984. I also looked at the data from the College Observatory station (FAOA2) behind the GI on campus and the University Experimental Farm, also on campus, but neither of these stations has a complete record. The daily data is part of the <a class="reference external" href="https://data.noaa.gov/dataset/global-historical-climatology-network-daily-ghcn-daily-version-3">Global Historical Climatology Network - Daily dataset</a>.</p> <p>I also have hourly data from 2008–2013 for both the Fairbanks Airport and a station located on Ester Dome that is no longer operational. We’ll use this to get a sense of what the possible temperatures on Ester Dome might have been based on the Fairbanks Airport data. Hourly data comes from the <a class="reference external" href="https://madis.noaa.gov">Meteorological Assimilation Data Ingest System (MADIS)</a>.</p> <p>The R code used for this post appears at the bottom, and all the data used is available from <a class="reference external" href="//media.swingleydev.com/img/blog/2016/09/equinox_weather.tar.gz">here</a>.</p> </div> <div class="section" id="results"> <h1>Results</h1> <div class="section" id="ester-dome-temperatures"> <h2>Ester Dome temperatures</h2> <p>Since there isn’t a long-running weather station on Ester Dome (at least not one that’s publicly available), we’ll use the September data from an hourly Ester Dome station that was operational until 2014. If we join the Fairbanks Airport station data with this data wherever the observations are within 15 minutes of each other, we can see the relationship between Ester Dome temperature and temperature at the Fairbanks Airport.</p> <p>Here’s what that relationship looks like, including a linear regression line between the two. 
The shaded area in the lower left corner shows the region where the temperatures on Ester Dome are below freezing.</p> <div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2016/09/pafa_fbsa_sept_temps.pdf"><img alt="Ester Dome and Fairbanks Airport temperatures" class="img-responsive" src="//media.swingleydev.com/img/blog/2016/09/pafa_fbsa_sept_temps.svgz" /></a> </div> <p>And the regression:</p> <pre class="literal-block">
##
## Call:
## lm(formula = ester_dome_temp_f ~ pafa_temp_f, data = pafa_fbsa)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -9.649 -3.618 -1.224  2.486 22.138
##
## Coefficients:
##             Estimate Std. Error t value Pr(&gt;|t|)
## (Intercept) -2.69737    0.77993  -3.458 0.000572 ***
## pafa_temp_f  0.94268    0.01696  55.567  &lt; 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.048 on 803 degrees of freedom
## Multiple R-squared:  0.7936, Adjusted R-squared:  0.7934
## F-statistic:  3088 on 1 and 803 DF,  p-value: &lt; 2.2e-16
</pre> <p>The regression model is highly significant, as are both coefficients, and the relationship explains almost 80% of the variation in the data. According to the model, in the month of September, Ester Dome is roughly four to six degrees colder than the airport over the usual range of race day temperatures: the intercept is almost three degrees below zero and the slope is a bit less than one. And whenever temperature at the airport drops below 37 degrees, it’s probably below freezing on the Dome.</p> </div> <div class="section" id="race-day-weather"> <h2>Race day weather</h2> <p>Temperatures at the airport on race day ranged from 19.9&nbsp;°F in 1972 to 68&nbsp;°F in 1969, and the average low and high temperatures are 34.2 and 53&nbsp;°F. Using our model of Ester Dome temperatures, we get an average low and high of 29.5 and 47&nbsp;°F and an overall min / max of 16.1 / 61.4&nbsp;°F. 
In most years the temperature on Ester Dome drops below freezing at some point on race day, though possibly before most of the runners get up there.</p> <p>Precipitation (rain, sleet, or snow) has fallen on 15 out of 53 race days, or 28% of the time, and measurable snowfall has been recorded on four of those fifteen. The highest amount fell in 2014 with 0.36&nbsp;inches of liquid precipitation (no snow was recorded and the temperatures were between 45 and 51&nbsp;°F so it was almost certainly all rain, even on Ester Dome). More than a quarter of an inch of precipitation fell in three of the fifteen years (1990, 1993, and 2014), but most rainfall totals are much smaller.</p> <p>Measurable snow fell at the airport in four years, or about eight percent of the time: 4.1&nbsp;inches in 1993, 2.1&nbsp;inches in 1985, 1.2&nbsp;inches in 1996 and 0.4&nbsp;inches in 1992. But that’s at the airport station. In four of the eleven years when measurable precipitation fell at the airport without any snow, the estimated minimum temperatures on Ester Dome were below freezing. It’s likely that some of the precipitation recorded at the airport in those years was coming down as snow up on Ester Dome. If so, that means snow may have fallen on eight race days, bringing the percentage up to fifteen percent.</p> <p>Wind data from the airport has only been recorded since 1984, but from those years the average wind speed at the airport on race day is 4.9 miles per hour. The peak 2-minute wind on an Equinox race day was 21 miles per hour, in 2003. Unfortunately, no long-term wind data is available for Ester Dome, but wind speeds there are likely to be higher than what is recorded at the airport. 
We do have wind speed data from the hourly Ester Dome station from 2008 through 2013, but the linear relationship between Ester Dome winds and winds at the Fairbanks Airport only explains about a quarter of the variation in the data, and a look at the plot doesn’t give me much confidence in the relationship shown (see below).</p> <div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2016/09/pafa_fbsa_sept_wspds.pdf"><img alt="Ester Dome and Fairbanks Airport wind speeds" class="img-responsive" src="//media.swingleydev.com/img/blog/2016/09/pafa_fbsa_sept_wspds.svgz" /></a> </div> </div> <div class="section" id="weather-from-the-week-prior"> <h2>Weather from the week prior</h2> <p>It’s also useful to look at the weather from the week before the race, since excessive pre-race rain or snow can make conditions on race day very different, even if the race day weather is pleasant. The year I ran the full marathon (2013), it had snowed the week before and much of the trail in the woods before the water stop near Henderson and all of the out and back were covered in snow.</p> <p>The most dramatic example of this was 1992, when 23 inches of snow fell at the airport in the week prior to the race, with much higher totals up on the summit of Ester Dome. Measurable snow has been recorded at the airport in the week prior to six races, but all the weekly totals are under an inch except for the snow year of 1992.</p> <p>Precipitation has fallen in 42 of 53 pre-race weeks (79% of the time). Three years have had more than an inch of precipitation prior to the race: 1.49&nbsp;inches in 2015, 1.26&nbsp;inches in 1992 (which fell as snow), and 1.05&nbsp;inches in 2007. On average, just over two tenths of an inch of precipitation falls in the week before the race.</p> </div> </div> <div class="section" id="summary"> <h1>Summary</h1> <p>The following stacked plots show the weather for all 53 runnings of the Equinox Marathon. 
The top panel shows the range of temperatures on race day from the airport station (wide bars) and estimated on Ester Dome (thin lines below bars). The shaded area at the bottom shows where temperatures are below freezing. Dashed orange horizontal lines represent the average high and low temperature at the airport on race day; solid orange horizontal lines indicate estimated average high and low temperature on Ester Dome.</p> <p>The middle panel shows race day liquid precipitation (rain, melted snow). Bars marked with an asterisk indicate years where snow was also recorded at the airport, but remember that four of the other years with liquid precipitation probably experienced snow on Ester Dome (1977, 1986, 1991, and 1994) because the temperatures were likely to be below freezing at elevation.</p> <p>The bottom panel shows precipitation totals from the week prior to the race. Bars marked with an asterisk indicate weeks where snow was also recorded at the airport.</p> <div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2016/09/equinox_weather_grid.pdf"><img alt="Equinox Marathon Weather" class="img-responsive" src="//media.swingleydev.com/img/blog/2016/09/equinox_weather_grid.svgz" /></a> </div> <p>Here’s a table with most of the data from the analysis. 
Record values for each variable are in bold.</p> <table border="1" class="tosf docutils"> <colgroup> <col width="13%" /> <col width="10%" /> <col width="10%" /> <col width="9%" /> <col width="10%" /> <col width="9%" /> <col width="10%" /> <col width="10%" /> <col width="11%" /> <col width="10%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">&nbsp;</th> <th class="head" colspan="7">Fairbanks Airport Station</th> <th class="head" colspan="2">Ester Dome (estimated)</th> </tr> <tr><th class="head">&nbsp;</th> <th class="head" colspan="5">Race Day</th> <th class="head" colspan="2">Previous Week</th> <th class="head" colspan="2">Race Day</th> </tr> <tr><th class="head">Date</th> <th class="head">min t</th> <th class="head">max t</th> <th class="head">wind</th> <th class="head">prcp</th> <th class="head">snow</th> <th class="head">prcp</th> <th class="head">snow</th> <th class="head">min t</th> <th class="head">max t</th> </tr> </thead> <tbody valign="top"> <tr><td>1963‑09‑21</td> <td>32.0</td> <td>54.0</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.01</td> <td>0.0</td> <td>27.5</td> <td>48.2</td> </tr> <tr><td>1964‑09‑19</td> <td>34.0</td> <td>57.9</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.03</td> <td>0.0</td> <td>29.4</td> <td>51.9</td> </tr> <tr><td>1965‑09‑25</td> <td>37.9</td> <td>60.1</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.80</td> <td>0.0</td> <td>33.0</td> <td>54.0</td> </tr> <tr><td>1966‑09‑24</td> <td>36.0</td> <td>62.1</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.01</td> <td>0.0</td> <td>31.2</td> <td>55.8</td> </tr> <tr><td>1967‑09‑23</td> <td>35.1</td> <td>57.9</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> <td>30.4</td> <td>51.9</td> </tr> <tr><td>1968‑09‑21</td> <td>23.0</td> <td>44.1</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.04</td> <td>0.0</td> <td>19.0</td> <td>38.9</td> </tr> <tr><td>1969‑09‑20</td> <td>35.1</td> <td><strong>68.0</strong></td> <td>&nbsp;</td> 
<td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> <td>30.4</td> <td><strong>61.4</strong></td> </tr> <tr><td>1970‑09‑19</td> <td>24.1</td> <td>39.9</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.42</td> <td>0.0</td> <td>20.0</td> <td>34.9</td> </tr> <tr><td>1971‑09‑18</td> <td>35.1</td> <td>55.9</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.14</td> <td>0.0</td> <td>30.4</td> <td>50.0</td> </tr> <tr><td>1972‑09‑23</td> <td><strong>19.9</strong></td> <td>42.1</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.01</td> <td>0.2</td> <td><strong>16.1</strong></td> <td>38.0</td> </tr> <tr><td>1973‑09‑22</td> <td>30.0</td> <td>44.1</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.05</td> <td>0.0</td> <td>25.6</td> <td>38.9</td> </tr> <tr><td>1974‑09‑21</td> <td>48.0</td> <td>60.1</td> <td>&nbsp;</td> <td>0.08</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> <td>42.6</td> <td>54.0</td> </tr> <tr><td>1975‑09‑20</td> <td>37.9</td> <td>55.9</td> <td>&nbsp;</td> <td>0.02</td> <td>0.0</td> <td>0.02</td> <td>0.0</td> <td>33.0</td> <td>50.0</td> </tr> <tr><td>1976‑09‑18</td> <td>34.0</td> <td>59.0</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.54</td> <td>0.0</td> <td>29.4</td> <td>52.9</td> </tr> <tr><td>1977‑09‑24</td> <td>36.0</td> <td>48.9</td> <td>&nbsp;</td> <td>0.06</td> <td>0.0</td> <td>0.20</td> <td>0.0</td> <td>31.2</td> <td>43.4</td> </tr> <tr><td>1978‑09‑23</td> <td>30.0</td> <td>42.1</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.10</td> <td>0.3</td> <td>25.6</td> <td>37.0</td> </tr> <tr><td>1979‑09‑22</td> <td>35.1</td> <td>62.1</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.17</td> <td>0.0</td> <td>30.4</td> <td>55.8</td> </tr> <tr><td>1980‑09‑20</td> <td>30.9</td> <td>43.0</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.35</td> <td>0.0</td> <td>26.4</td> <td>37.8</td> </tr> <tr><td>1981‑09‑19</td> <td>37.0</td> <td>43.0</td> <td>&nbsp;</td> <td>0.15</td> <td>0.0</td> <td>0.04</td> <td>0.0</td> <td>32.2</td> 
<td>37.8</td> </tr> <tr><td>1982‑09‑18</td> <td>42.1</td> <td>61.0</td> <td>&nbsp;</td> <td>0.02</td> <td>0.0</td> <td>0.22</td> <td>0.0</td> <td>37.0</td> <td>54.8</td> </tr> <tr><td>1983‑09‑17</td> <td>39.9</td> <td>46.9</td> <td>&nbsp;</td> <td>0.00</td> <td>0.0</td> <td>0.05</td> <td>0.0</td> <td>34.9</td> <td>41.5</td> </tr> <tr><td>1984‑09‑22</td> <td>28.9</td> <td>60.1</td> <td>5.8</td> <td>0.00</td> <td>0.0</td> <td>0.08</td> <td>0.0</td> <td>24.5</td> <td>54.0</td> </tr> <tr><td>1985‑09‑21</td> <td>30.9</td> <td>42.1</td> <td>6.5</td> <td>0.14</td> <td>2.1</td> <td>0.57</td> <td>0.0</td> <td>26.4</td> <td>37.0</td> </tr> <tr><td>1986‑09‑20</td> <td>36.0</td> <td>52.0</td> <td>8.3</td> <td>0.07</td> <td>0.0</td> <td>0.21</td> <td>0.0</td> <td>31.2</td> <td>46.3</td> </tr> <tr><td>1987‑09‑19</td> <td>37.9</td> <td>61.0</td> <td>6.3</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> <td>33.0</td> <td>54.8</td> </tr> <tr><td>1988‑09‑24</td> <td>37.0</td> <td>45.0</td> <td>4.0</td> <td>0.00</td> <td>0.0</td> <td>0.11</td> <td>0.0</td> <td>32.2</td> <td>39.7</td> </tr> <tr><td>1989‑09‑23</td> <td>36.0</td> <td>61.0</td> <td>8.5</td> <td>0.00</td> <td>0.0</td> <td>0.07</td> <td>0.5</td> <td>31.2</td> <td>54.8</td> </tr> <tr><td>1990‑09‑22</td> <td>37.9</td> <td>50.0</td> <td>7.8</td> <td>0.26</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> <td>33.0</td> <td>44.4</td> </tr> <tr><td>1991‑09‑21</td> <td>36.0</td> <td>57.0</td> <td>4.5</td> <td>0.04</td> <td>0.0</td> <td>0.03</td> <td>0.0</td> <td>31.2</td> <td>51.0</td> </tr> <tr><td>1992‑09‑19</td> <td>24.1</td> <td>33.1</td> <td>6.7</td> <td>0.01</td> <td>0.4</td> <td>1.26</td> <td><strong>23.0</strong></td> <td>20.0</td> <td>28.5</td> </tr> <tr><td>1993‑09‑18</td> <td>28.0</td> <td>37.0</td> <td>4.9</td> <td>0.29</td> <td><strong>4.1</strong></td> <td>0.37</td> <td>0.3</td> <td>23.7</td> <td>32.2</td> </tr> <tr><td>1994‑09‑24</td> <td>27.0</td> <td>51.1</td> <td>6.0</td> <td>0.02</td> 
<td>0.0</td> <td>0.08</td> <td>0.0</td> <td>22.8</td> <td>45.5</td> </tr> <tr><td>1995‑09‑23</td> <td>43.0</td> <td>66.9</td> <td>4.0</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> <td>37.8</td> <td>60.4</td> </tr> <tr><td>1996‑09‑21</td> <td>28.9</td> <td>37.9</td> <td>6.9</td> <td>0.06</td> <td>1.2</td> <td>0.26</td> <td>0.0</td> <td>24.5</td> <td>33.0</td> </tr> <tr><td>1997‑09‑20</td> <td>27.0</td> <td>55.0</td> <td>3.8</td> <td>0.00</td> <td>0.0</td> <td>0.03</td> <td>0.0</td> <td>22.8</td> <td>49.2</td> </tr> <tr><td>1998‑09‑19</td> <td>42.1</td> <td>60.1</td> <td>4.9</td> <td>0.00</td> <td>0.0</td> <td>0.37</td> <td>0.0</td> <td>37.0</td> <td>54.0</td> </tr> <tr><td>1999‑09‑18</td> <td>39.0</td> <td>64.9</td> <td>3.8</td> <td>0.00</td> <td>0.0</td> <td>0.26</td> <td>0.0</td> <td>34.1</td> <td>58.5</td> </tr> <tr><td>2000‑09‑16</td> <td>28.9</td> <td>50.0</td> <td>5.6</td> <td>0.00</td> <td>0.0</td> <td>0.30</td> <td>0.0</td> <td>24.5</td> <td>44.4</td> </tr> <tr><td>2001‑09‑22</td> <td>33.1</td> <td>57.0</td> <td>1.6</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> <td>28.5</td> <td>51.0</td> </tr> <tr><td>2002‑09‑21</td> <td>33.1</td> <td>48.9</td> <td>3.8</td> <td>0.00</td> <td>0.0</td> <td>0.03</td> <td>0.0</td> <td>28.5</td> <td>43.4</td> </tr> <tr><td>2003‑09‑20</td> <td>26.1</td> <td>46.0</td> <td><strong>9.6</strong></td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> <td>21.9</td> <td>40.7</td> </tr> <tr><td>2004‑09‑18</td> <td>26.1</td> <td>48.0</td> <td>4.3</td> <td>0.00</td> <td>0.0</td> <td>0.25</td> <td>0.0</td> <td>21.9</td> <td>42.6</td> </tr> <tr><td>2005‑09‑17</td> <td>37.0</td> <td>63.0</td> <td>0.9</td> <td>0.00</td> <td>0.0</td> <td>0.09</td> <td>0.0</td> <td>32.2</td> <td>56.7</td> </tr> <tr><td>2006‑09‑16</td> <td>46.0</td> <td>64.0</td> <td>4.3</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> <td>40.7</td> <td>57.6</td> </tr> <tr><td>2007‑09‑22</td> <td>25.0</td> <td>45.0</td> <td>4.7</td> 
<td>0.00</td> <td>0.0</td> <td>1.05</td> <td>0.0</td> <td>20.9</td> <td>39.7</td> </tr> <tr><td>2008‑09‑20</td> <td>34.0</td> <td>51.1</td> <td>4.5</td> <td>0.00</td> <td>0.0</td> <td>0.08</td> <td>0.0</td> <td>29.4</td> <td>45.5</td> </tr> <tr><td>2009‑09‑19</td> <td>39.0</td> <td>50.0</td> <td>5.8</td> <td>0.00</td> <td>0.0</td> <td>0.25</td> <td>0.0</td> <td>34.1</td> <td>44.4</td> </tr> <tr><td>2010‑09‑18</td> <td>35.1</td> <td>64.9</td> <td>2.5</td> <td>0.00</td> <td>0.0</td> <td>0.00</td> <td>0.0</td> <td>30.4</td> <td>58.5</td> </tr> <tr><td>2011‑09‑17</td> <td>39.9</td> <td>57.9</td> <td>1.3</td> <td>0.00</td> <td>0.0</td> <td>0.44</td> <td>0.0</td> <td>34.9</td> <td>51.9</td> </tr> <tr><td>2012‑09‑22</td> <td>46.9</td> <td>66.9</td> <td>6.0</td> <td>0.00</td> <td>0.0</td> <td>0.33</td> <td>0.0</td> <td>41.5</td> <td>60.4</td> </tr> <tr><td>2013‑09‑21</td> <td>24.3</td> <td>44.1</td> <td>5.1</td> <td>0.00</td> <td>0.0</td> <td>0.13</td> <td>0.6</td> <td>20.2</td> <td>38.9</td> </tr> <tr><td>2014‑09‑20</td> <td>45.0</td> <td>51.1</td> <td>1.6</td> <td><strong>0.36</strong></td> <td>0.0</td> <td>0.00</td> <td>0.0</td> <td>39.7</td> <td>45.5</td> </tr> <tr><td>2015‑09‑19</td> <td>37.9</td> <td>44.1</td> <td>2.9</td> <td>0.01</td> <td>0.0</td> <td><strong>1.49</strong></td> <td>0.0</td> <td>33.0</td> <td>38.9</td> </tr> </tbody> </table> </div> <div class="section" id="postscript"> <h1>Postscript</h1> <p>The weather for the 2016 race was just about perfect with temperatures ranging from 34 to 58 °F and no precipitation during the race. 
The airport did record 0.01 inches for the day, but this fell in the evening, after the race had finished.</p> </div> <div class="section" id="appendix-r-code"> <h1>Appendix: R code</h1> <div class="highlight"><pre> <span class="kn">library</span><span class="p">(</span>dplyr<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>readr<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>lubridate<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>ggplot2<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>scales<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>grid<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>gtable<span class="p">)</span> race_dates <span class="o">&lt;-</span> read_fwf<span class="p">(</span><span class="s">&quot;equinox_marathon_dates.rst&quot;</span><span class="p">,</span> skip<span class="o">=</span><span class="m">5</span><span class="p">,</span> n_max<span class="o">=</span><span class="m">54</span><span class="p">,</span> fwf_positions<span class="p">(</span><span class="kt">c</span><span class="p">(</span><span class="m">4</span><span class="p">,</span> <span class="m">6</span><span class="p">),</span> <span class="kt">c</span><span class="p">(</span><span class="m">9</span><span class="p">,</span> <span class="m">19</span><span class="p">),</span> <span class="kt">c</span><span class="p">(</span><span class="s">&quot;number&quot;</span><span class="p">,</span> <span class="s">&quot;race_date&quot;</span><span class="p">)))</span> noaa <span class="o">&lt;-</span> src_postgres<span class="p">(</span>host<span class="o">=</span><span class="s">&quot;localhost&quot;</span><span class="p">,</span> dbname<span class="o">=</span><span class="s">&quot;noaa&quot;</span><span class="p">)</span> <span class="c1"># pivot &lt;- tbl(noaa, build_sql(&quot;SELECT * FROM 
ghcnd_pivot</span> <span class="c1"># WHERE station_name = &#39;UNIVERSITY EXP STN&#39;&quot;))</span> <span class="c1"># pivot &lt;- tbl(noaa, build_sql(&quot;SELECT * FROM ghcnd_pivot</span> <span class="c1"># WHERE station_name = &#39;COLLEGE OBSY&#39;&quot;))</span> pivot <span class="o">&lt;-</span> tbl<span class="p">(</span>noaa<span class="p">,</span> build_sql<span class="p">(</span><span class="s">&quot;SELECT * FROM ghcnd_pivot</span> <span class="s"> WHERE station_name = &#39;FAIRBANKS INTL AP&#39;&quot;</span><span class="p">))</span> race_day_wx <span class="o">&lt;-</span> pivot <span class="o">%&gt;%</span> inner_join<span class="p">(</span>race_dates<span class="p">,</span> by<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="s">&quot;dte&quot;</span><span class="o">=</span><span class="s">&quot;race_date&quot;</span><span class="p">),</span> copy<span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span> <span class="o">%&gt;%</span> collect<span class="p">()</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>tmin_f<span class="o">=</span><span class="kp">round</span><span class="p">((</span>tmin_c<span class="o">*</span><span class="m">9</span><span class="o">/</span><span class="m">5.0</span><span class="p">)</span><span class="m">+32</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> tmax_f<span class="o">=</span><span class="kp">round</span><span class="p">((</span>tmax_c<span class="o">*</span><span class="m">9</span><span class="o">/</span><span class="m">5.0</span><span class="p">)</span><span class="m">+32</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> prcp_in<span class="o">=</span><span class="kp">round</span><span class="p">(</span>prcp_mm<span class="o">/</span><span class="m">25.4</span><span class="p">,</span> <span class="m">2</span><span class="p">),</span> snow_in<span class="o">=</span><span 
class="kp">round</span><span class="p">(</span>snow_mm<span class="o">/</span><span class="m">25.4</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> snwd_in<span class="o">=</span><span class="kp">round</span><span class="p">(</span>snow_mm<span class="o">/</span><span class="m">25.4</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> awnd_mph<span class="o">=</span><span class="kp">round</span><span class="p">(</span>awnd_mps<span class="o">*</span><span class="m">2.2369</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> wsf2_mph<span class="o">=</span><span class="kp">round</span><span class="p">(</span>wsf2_mps<span class="o">*</span><span class="m">2.2369</span><span class="p">),</span> <span class="m">1</span><span class="p">)</span> <span class="o">%&gt;%</span> select<span class="p">(</span>number<span class="p">,</span> race_date<span class="p">,</span> tmin_f<span class="p">,</span> tmax_f<span class="p">,</span> prcp_in<span class="p">,</span> snow_in<span class="p">,</span> snwd_in<span class="p">,</span> awnd_mph<span class="p">,</span> wsf2_mph<span class="p">)</span> week_before_race_day_wx <span class="o">&lt;-</span> pivot <span class="o">%&gt;%</span> mutate<span class="p">(</span>year<span class="o">=</span>date_part<span class="p">(</span><span class="s">&quot;year&quot;</span><span class="p">,</span> dte<span class="p">))</span> <span class="o">%&gt;%</span> inner_join<span class="p">(</span>race_dates <span class="o">%&gt;%</span> mutate<span class="p">(</span>year<span class="o">=</span>year<span class="p">(</span>race_date<span class="p">)),</span> copy<span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span> <span class="o">%&gt;%</span> collect<span class="p">()</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>tmin_f<span class="o">=</span><span class="kp">round</span><span class="p">((</span>tmin_c<span 
class="o">*</span><span class="m">9</span><span class="o">/</span><span class="m">5.0</span><span class="p">)</span><span class="m">+32</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> tmax_f<span class="o">=</span><span class="kp">round</span><span class="p">((</span>tmax_c<span class="o">*</span><span class="m">9</span><span class="o">/</span><span class="m">5.0</span><span class="p">)</span><span class="m">+32</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> prcp_in<span class="o">=</span><span class="kp">round</span><span class="p">(</span>prcp_mm<span class="o">/</span><span class="m">25.4</span><span class="p">,</span> <span class="m">2</span><span class="p">),</span> snow_in<span class="o">=</span><span class="kp">round</span><span class="p">(</span>snow_mm<span class="o">/</span><span class="m">25.4</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> snwd_in<span class="o">=</span><span class="kp">round</span><span class="p">(</span>snow_mm<span class="o">/</span><span class="m">25.4</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> awnd_mph<span class="o">=</span><span class="kp">round</span><span class="p">(</span>awnd_mps<span class="o">*</span><span class="m">2.2369</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> wsf2_mph<span class="o">=</span><span class="kp">round</span><span class="p">(</span>wsf2_mps<span class="o">*</span><span class="m">2.2369</span><span class="p">,</span> <span class="m">1</span><span class="p">))</span> <span class="o">%&gt;%</span> select<span class="p">(</span>number<span class="p">,</span> year<span class="p">,</span> race_date<span class="p">,</span> dte<span class="p">,</span> prcp_in<span class="p">,</span> snow_in<span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>week_before<span class="o">=</span>race_date<span 
class="o">-</span>days<span class="p">(</span><span class="m">7</span><span class="p">))</span> <span class="o">%&gt;%</span> filter<span class="p">(</span>dte<span class="o">&lt;</span>race_date<span class="p">,</span> dte<span class="o">&gt;=</span>week_before<span class="p">)</span> <span class="o">%&gt;%</span> group_by<span class="p">(</span>number<span class="p">,</span> year<span class="p">,</span> race_date<span class="p">)</span> <span class="o">%&gt;%</span> summarize<span class="p">(</span>pweek_prcp_in<span class="o">=</span><span class="kp">sum</span><span class="p">(</span>prcp_in<span class="p">),</span> pweek_snow_in<span class="o">=</span><span class="kp">sum</span><span class="p">(</span>snow_in<span class="p">))</span> all_wx <span class="o">&lt;-</span> race_day_wx <span class="o">%&gt;%</span> inner_join<span class="p">(</span>week_before_race_day_wx<span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>tavg_f<span class="o">=</span><span class="p">(</span>tmin_f<span class="o">+</span>tmax_f<span class="p">)</span><span class="o">/</span><span class="m">2.0</span><span class="p">,</span> snow_label<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span>snow_in<span class="o">&gt;</span><span class="m">0</span><span class="p">,</span> <span class="s">&#39;*&#39;</span><span class="p">,</span> <span class="kc">NA</span><span class="p">),</span> pweek_snow_label<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span>pweek_snow_in<span class="o">&gt;</span><span class="m">0</span><span class="p">,</span> <span class="s">&#39;*&#39;</span><span class="p">,</span> <span class="kc">NA</span><span class="p">))</span> <span class="o">%&gt;%</span> select<span class="p">(</span>number<span class="p">,</span> year<span class="p">,</span> race_date<span class="p">,</span> tmin_f<span class="p">,</span> tmax_f<span class="p">,</span> tavg_f<span class="p">,</span> prcp_in<span 
class="p">,</span> snow_in<span class="p">,</span> snwd_in<span class="p">,</span> awnd_mph<span class="p">,</span> wsf2_mph<span class="p">,</span> pweek_prcp_in<span class="p">,</span> pweek_snow_in<span class="p">,</span> snow_label<span class="p">,</span> pweek_snow_label<span class="p">);</span> write_csv<span class="p">(</span>all_wx<span class="p">,</span> <span class="s">&quot;all_wx.csv&quot;</span><span class="p">)</span> madis <span class="o">&lt;-</span> src_postgres<span class="p">(</span>host<span class="o">=</span><span class="s">&quot;localhost&quot;</span><span class="p">,</span> dbname<span class="o">=</span><span class="s">&quot;madis&quot;</span><span class="p">)</span> pafa_fbsa <span class="o">&lt;-</span> tbl<span class="p">(</span>madis<span class="p">,</span> build_sql<span class="p">(</span><span class="s">&quot;</span> <span class="s"> WITH pafa AS (</span> <span class="s"> SELECT dt_local, temp_f, wspd_mph</span> <span class="s"> FROM observations</span> <span class="s"> WHERE station_id = &#39;PAFA&#39; AND date_part(&#39;month&#39;, dt_local) = 9),</span> <span class="s"> fbsa AS (</span> <span class="s"> SELECT dt_local, temp_f, wspd_mph</span> <span class="s"> FROM observations</span> <span class="s"> WHERE station_id = &#39;FBSA2&#39; AND date_part(&#39;month&#39;, dt_local) = 9)</span> <span class="s"> SELECT pafa.dt_local, pafa.temp_f AS pafa_temp_f, pafa.wspd_mph as pafa_wspd_mph,</span> <span class="s"> fbsa.temp_f AS ester_dome_temp_f, fbsa.wspd_mph as ester_dome_wspd_mph</span> <span class="s"> FROM pafa</span> <span class="s"> INNER JOIN fbsa ON</span> <span class="s"> pafa.dt_local BETWEEN fbsa.dt_local - interval &#39;15 minutes&#39;</span> <span class="s"> AND fbsa.dt_local + interval &#39;15 minutes&#39;&quot;</span><span class="p">))</span> <span class="o">%&gt;%</span> collect<span class="p">()</span> write_csv<span class="p">(</span>pafa_fbsa<span class="p">,</span> <span class="s">&quot;pafa_fbsa.csv&quot;</span><span 
class="p">)</span> ester_dome_temps <span class="o">&lt;-</span> lm<span class="p">(</span>data<span class="o">=</span>pafa_fbsa<span class="p">,</span> ester_dome_temp_f <span class="o">~</span> pafa_temp_f<span class="p">)</span> <span class="kp">summary</span><span class="p">(</span>ester_dome_temps<span class="p">)</span> <span class="c1"># Model and coefficients are significant, r2 = 0.794</span> <span class="c1"># intercept = -2.69737, slope = 0.94268</span> all_wx_with_ed <span class="o">&lt;-</span> all_wx <span class="o">%&gt;%</span> mutate<span class="p">(</span>ed_min_temp_f<span class="o">=</span><span class="kp">round</span><span class="p">(</span>ester_dome_temps<span class="o">$</span>coefficients<span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="o">+</span> tmin_f<span class="o">*</span>ester_dome_temps<span class="o">$</span>coefficients<span class="p">[</span><span class="m">2</span><span class="p">],</span> <span class="m">1</span><span class="p">),</span> ed_max_temp_f<span class="o">=</span><span class="kp">round</span><span class="p">(</span>ester_dome_temps<span class="o">$</span>coefficients<span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="o">+</span> tmax_f<span class="o">*</span>ester_dome_temps<span class="o">$</span>coefficients<span class="p">[</span><span class="m">2</span><span class="p">],</span> <span class="m">1</span><span class="p">))</span> make_gt <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span>outside<span class="p">,</span> instruments<span class="p">,</span> chamber<span class="p">,</span> width<span class="p">,</span> heights<span class="p">)</span> <span class="p">{</span> gt1 <span class="o">&lt;-</span> ggplot_gtable<span class="p">(</span>ggplot_build<span class="p">(</span>outside<span class="p">))</span> gt2 <span class="o">&lt;-</span> ggplot_gtable<span class="p">(</span>ggplot_build<span 
class="p">(</span>instruments<span class="p">))</span> gt3 <span class="o">&lt;-</span> ggplot_gtable<span class="p">(</span>ggplot_build<span class="p">(</span>chamber<span class="p">))</span> max_width <span class="o">&lt;-</span> unit.pmax<span class="p">(</span>gt1<span class="o">$</span>widths<span class="p">[</span><span class="m">2</span><span class="o">:</span><span class="m">3</span><span class="p">],</span> gt2<span class="o">$</span>widths<span class="p">[</span><span class="m">2</span><span class="o">:</span><span class="m">3</span><span class="p">],</span> gt3<span class="o">$</span>widths<span class="p">[</span><span class="m">2</span><span class="o">:</span><span class="m">3</span><span class="p">])</span> gt1<span class="o">$</span>widths<span class="p">[</span><span class="m">2</span><span class="o">:</span><span class="m">3</span><span class="p">]</span> <span class="o">&lt;-</span> max_width gt2<span class="o">$</span>widths<span class="p">[</span><span class="m">2</span><span class="o">:</span><span class="m">3</span><span class="p">]</span> <span class="o">&lt;-</span> max_width gt3<span class="o">$</span>widths<span class="p">[</span><span class="m">2</span><span class="o">:</span><span class="m">3</span><span class="p">]</span> <span class="o">&lt;-</span> max_width gt <span class="o">&lt;-</span> gtable<span class="p">(</span>widths <span class="o">=</span> unit<span class="p">(</span><span class="kt">c</span><span class="p">(</span>width<span class="p">),</span> <span class="s">&quot;in&quot;</span><span class="p">),</span> heights <span class="o">=</span> unit<span class="p">(</span>heights<span class="p">,</span> <span class="s">&quot;in&quot;</span><span class="p">))</span> gt <span class="o">&lt;-</span> gtable_add_grob<span class="p">(</span>gt<span class="p">,</span> gt1<span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">1</span><span class="p">)</span> gt <span class="o">&lt;-</span> 
gtable_add_grob<span class="p">(</span>gt<span class="p">,</span> gt2<span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">1</span><span class="p">)</span> gt <span class="o">&lt;-</span> gtable_add_grob<span class="p">(</span>gt<span class="p">,</span> gt3<span class="p">,</span> <span class="m">3</span><span class="p">,</span> <span class="m">1</span><span class="p">)</span> gt <span class="p">}</span> temps <span class="o">&lt;-</span> ggplot<span class="p">(</span>data<span class="o">=</span>all_wx_with_ed<span class="p">,</span> aes<span class="p">(</span>x<span class="o">=</span>year<span class="p">,</span> ymin<span class="o">=</span>tmin_f<span class="p">,</span> ymax<span class="o">=</span>tmax_f<span class="p">,</span> y<span class="o">=</span>tavg_f<span class="p">))</span> <span class="o">+</span> <span class="c1"># geom_abline(intercept=32, slope=0, color=&quot;blue&quot;, alpha=0.25) +</span> geom_rect<span class="p">(</span>data<span class="o">=</span>all_wx_with_ed <span class="o">%&gt;%</span> <span class="kp">head</span><span class="p">(</span>n<span class="o">=</span><span class="m">1</span><span class="p">),</span> aes<span class="p">(</span>xmin<span class="o">=-</span><span class="kc">Inf</span><span class="p">,</span> xmax<span class="o">=</span><span class="kc">Inf</span><span class="p">,</span> ymin<span class="o">=-</span><span class="kc">Inf</span><span class="p">,</span> ymax<span class="o">=</span><span class="m">32</span><span class="p">),</span> fill<span class="o">=</span><span class="s">&quot;darkcyan&quot;</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">0.25</span><span class="p">)</span> <span class="o">+</span> geom_abline<span class="p">(</span>aes<span class="p">(</span>slope<span class="o">=</span><span class="m">0</span><span class="p">,</span> intercept<span class="o">=</span><span class="kp">mean</span><span class="p">(</span>all_wx_with_ed<span 
class="o">$</span>tmin_f<span class="p">)),</span> color<span class="o">=</span><span class="s">&quot;darkorange&quot;</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">0.50</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">2</span><span class="p">)</span> <span class="o">+</span> geom_abline<span class="p">(</span>aes<span class="p">(</span>slope<span class="o">=</span><span class="m">0</span><span class="p">,</span> intercept<span class="o">=</span><span class="kp">mean</span><span class="p">(</span>all_wx_with_ed<span class="o">$</span>tmax_f<span class="p">)),</span> color<span class="o">=</span><span class="s">&quot;darkorange&quot;</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">0.50</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">2</span><span class="p">)</span> <span class="o">+</span> geom_abline<span class="p">(</span>aes<span class="p">(</span>slope<span class="o">=</span><span class="m">0</span><span class="p">,</span> intercept<span class="o">=</span><span class="kp">mean</span><span class="p">(</span>all_wx_with_ed<span class="o">$</span>ed_min_temp_f<span class="p">)),</span> color<span class="o">=</span><span class="s">&quot;darkorange&quot;</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">0.50</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">)</span> <span class="o">+</span> geom_abline<span class="p">(</span>aes<span class="p">(</span>slope<span class="o">=</span><span class="m">0</span><span class="p">,</span> intercept<span class="o">=</span><span class="kp">mean</span><span class="p">(</span>all_wx_with_ed<span class="o">$</span>ed_max_temp_f<span class="p">)),</span> color<span class="o">=</span><span class="s">&quot;darkorange&quot;</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">0.50</span><span class="p">,</span> 
linetype<span class="o">=</span><span class="m">1</span><span class="p">)</span> <span class="o">+</span> geom_linerange<span class="p">(</span>aes<span class="p">(</span>ymin<span class="o">=</span>ed_min_temp_f<span class="p">,</span> ymax<span class="o">=</span>ed_max_temp_f<span class="p">))</span> <span class="o">+</span> <span class="c1"># geom_smooth(method=&quot;lm&quot;, se=FALSE) +</span> geom_linerange<span class="p">(</span>size<span class="o">=</span><span class="m">3</span><span class="p">,</span> color<span class="o">=</span><span class="s">&quot;grey30&quot;</span><span class="p">)</span> <span class="o">+</span> scale_x_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;&quot;</span><span class="p">,</span> limits<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1963</span><span class="p">,</span> <span class="m">2015</span><span class="p">),</span> breaks<span class="o">=</span><span class="kp">seq</span><span class="p">(</span><span class="m">1963</span><span class="p">,</span> <span class="m">2015</span><span class="p">,</span> <span class="m">2</span><span class="p">))</span> <span class="o">+</span> scale_y_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Temperature (deg F)&quot;</span><span class="p">,</span> breaks<span class="o">=</span>pretty_breaks<span class="p">(</span>n<span class="o">=</span><span class="m">10</span><span class="p">))</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> theme<span class="p">(</span>plot.margin<span class="o">=</span>unit<span class="p">(</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0.5</span><span class="p">),</span> <span class="s">&#39;lines&#39;</span><span class="p">))</span> <span 
class="o">+</span> <span class="c1"># t, r, b, l</span> theme<span class="p">(</span>axis.text.x<span class="o">=</span>element_blank<span class="p">(),</span> axis.title.x<span class="o">=</span>element_blank<span class="p">(),</span> axis.ticks.x<span class="o">=</span>element_blank<span class="p">(),</span> panel.grid.minor.x<span class="o">=</span>element_blank<span class="p">())</span> <span class="o">+</span> ggtitle<span class="p">(</span><span class="s">&quot;Weather during and in the week prior to the Equinox Marathon</span> <span class="s"> Fairbanks Airport Station&quot;</span><span class="p">)</span> prcp <span class="o">&lt;-</span> ggplot<span class="p">(</span>data<span class="o">=</span>all_wx<span class="p">,</span> aes<span class="p">(</span>x<span class="o">=</span>year<span class="p">,</span> y<span class="o">=</span>prcp_in<span class="p">))</span> <span class="o">+</span> geom_bar<span class="p">(</span>stat<span class="o">=</span><span class="s">&quot;identity&quot;</span><span class="p">)</span> <span class="o">+</span> geom_text<span class="p">(</span>aes<span class="p">(</span>y<span class="o">=</span>prcp_in<span class="m">+0.025</span><span class="p">,</span> label<span class="o">=</span>snow_label<span class="p">))</span> <span class="o">+</span> scale_x_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;&quot;</span><span class="p">,</span> limits<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1963</span><span class="p">,</span> <span class="m">2015</span><span class="p">),</span> breaks<span class="o">=</span><span class="kp">seq</span><span class="p">(</span><span class="m">1963</span><span class="p">,</span> <span class="m">2015</span><span class="p">))</span> <span class="o">+</span> scale_y_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Precipitation (inches)&quot;</span><span class="p">,</span> breaks<span 
class="o">=</span>pretty_breaks<span class="p">(</span>n<span class="o">=</span><span class="m">5</span><span class="p">))</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> theme<span class="p">(</span>plot.margin<span class="o">=</span>unit<span class="p">(</span><span class="kt">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0.5</span><span class="p">),</span> <span class="s">&#39;lines&#39;</span><span class="p">))</span> <span class="o">+</span> <span class="c1"># t, r, b, l</span> theme<span class="p">(</span>axis.text.x<span class="o">=</span>element_blank<span class="p">(),</span> axis.title.x<span class="o">=</span>element_blank<span class="p">(),</span> axis.ticks.x<span class="o">=</span>element_blank<span class="p">(),</span> panel.grid.minor.x<span class="o">=</span>element_blank<span class="p">())</span> pweek_prcp <span class="o">&lt;-</span> ggplot<span class="p">(</span>data<span class="o">=</span>all_wx<span class="p">,</span> aes<span class="p">(</span>x<span class="o">=</span>year<span class="p">,</span> y<span class="o">=</span>pweek_prcp_in<span class="p">))</span> <span class="o">+</span> geom_bar<span class="p">(</span>stat<span class="o">=</span><span class="s">&quot;identity&quot;</span><span class="p">)</span> <span class="o">+</span> geom_text<span class="p">(</span>aes<span class="p">(</span>y<span class="o">=</span>pweek_prcp_in<span class="m">+0.1</span><span class="p">,</span> label<span class="o">=</span>pweek_snow_label<span class="p">))</span> <span class="o">+</span> scale_x_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;&quot;</span><span class="p">,</span> limits<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1963</span><span class="p">,</span> <span 
class="m">2015</span><span class="p">),</span> breaks<span class="o">=</span><span class="kp">seq</span><span class="p">(</span><span class="m">1963</span><span class="p">,</span> <span class="m">2015</span><span class="p">))</span> <span class="o">+</span> scale_y_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Pre-week precip (inches)&quot;</span><span class="p">,</span> breaks<span class="o">=</span>pretty_breaks<span class="p">(</span>n<span class="o">=</span><span class="m">5</span><span class="p">))</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> theme<span class="p">(</span>plot.margin<span class="o">=</span>unit<span class="p">(</span><span class="kt">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">0.5</span><span class="p">,</span> <span class="m">0.5</span><span class="p">),</span> <span class="s">&#39;lines&#39;</span><span class="p">),</span> axis.text.x<span class="o">=</span>element_text<span class="p">(</span>angle<span class="o">=</span><span class="m">45</span><span class="p">,</span> hjust<span class="o">=</span><span class="m">1</span><span class="p">,</span> vjust<span class="o">=</span><span class="m">1</span><span class="p">),</span> panel.grid.minor.x<span class="o">=</span>element_blank<span class="p">())</span> rescale <span class="o">&lt;-</span> <span class="m">0.75</span> full_plot <span class="o">&lt;-</span> make_gt<span class="p">(</span>temps<span class="p">,</span> prcp<span class="p">,</span> pweek_prcp<span class="p">,</span> <span class="m">16</span><span class="o">*</span>rescale<span class="p">,</span> <span class="kt">c</span><span class="p">(</span><span class="m">7.5</span><span class="o">*</span>rescale<span class="p">,</span> <span class="m">2.5</span><span class="o">*</span>rescale<span class="p">,</span> <span class="m">3.0</span><span 
class="o">*</span>rescale<span class="p">))</span> pdf<span class="p">(</span><span class="s">&quot;equinox_weather_grid.pdf&quot;</span><span class="p">,</span> height<span class="o">=</span><span class="m">13</span><span class="o">*</span>rescale<span class="p">,</span> width<span class="o">=</span><span class="m">16</span><span class="o">*</span>rescale<span class="p">)</span> grid.newpage<span class="p">()</span> grid.draw<span class="p">(</span>full_plot<span class="p">)</span> dev.off<span class="p">()</span> fai_ed_temps <span class="o">&lt;-</span> ggplot<span class="p">(</span>data<span class="o">=</span>pafa_fbsa<span class="p">,</span> aes<span class="p">(</span>x<span class="o">=</span>pafa_temp_f<span class="p">,</span> y<span class="o">=</span>ester_dome_temp_f<span class="p">))</span> <span class="o">+</span> geom_rect<span class="p">(</span>data<span class="o">=</span>pafa_fbsa <span class="o">%&gt;%</span> <span class="kp">head</span><span class="p">(</span>n<span class="o">=</span><span class="m">1</span><span class="p">),</span> aes<span class="p">(</span>xmin<span class="o">=-</span><span class="kc">Inf</span><span class="p">,</span> ymin<span class="o">=-</span><span class="kc">Inf</span><span class="p">,</span> xmax<span class="o">=</span><span class="p">(</span><span class="m">32+2.69737</span><span class="p">)</span><span class="o">/</span><span class="m">0.94268</span><span class="p">,</span> ymax<span class="o">=</span><span class="m">32</span><span class="p">),</span> color<span class="o">=</span><span class="s">&quot;black&quot;</span><span class="p">,</span> fill<span class="o">=</span><span class="s">&quot;darkcyan&quot;</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">0.25</span><span class="p">)</span> <span class="o">+</span> geom_point<span class="p">(</span>position<span class="o">=</span>position_jitter<span class="p">())</span> <span class="o">+</span> geom_smooth<span class="p">(</span>method<span 
class="o">=</span><span class="s">&quot;lm&quot;</span><span class="p">,</span> se<span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span> <span class="o">+</span> scale_x_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Fairbanks Airport Temperature (degrees F)&quot;</span><span class="p">)</span> <span class="o">+</span> scale_y_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Ester Dome Temperature (degrees F)&quot;</span><span class="p">)</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> ggtitle<span class="p">(</span><span class="s">&quot;Relationship between Fairbanks Airport and Ester Dome Temperatures</span> <span class="s"> September, 2008-2013&quot;</span><span class="p">)</span> pdf<span class="p">(</span><span class="s">&quot;pafa_fbsa_sept_temps.pdf&quot;</span><span class="p">,</span> height<span class="o">=</span><span class="m">10.5</span><span class="p">,</span> width<span class="o">=</span><span class="m">10.5</span><span class="p">)</span> <span class="kp">print</span><span class="p">(</span>fai_ed_temps<span class="p">)</span> dev.off<span class="p">()</span> fai_ed_wspds <span class="o">&lt;-</span> ggplot<span class="p">(</span>data<span class="o">=</span>pafa_fbsa<span class="p">,</span> aes<span class="p">(</span>x<span class="o">=</span>pafa_wspd_mph<span class="p">,</span> y<span class="o">=</span>ester_dome_wspd_mph<span class="p">))</span> <span class="o">+</span> geom_point<span class="p">(</span>position<span class="o">=</span>position_jitter<span class="p">())</span> <span class="o">+</span> geom_smooth<span class="p">(</span>method<span class="o">=</span><span class="s">&quot;lm&quot;</span><span class="p">,</span> se<span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span> <span class="o">+</span> scale_x_continuous<span class="p">(</span>name<span class="o">=</span><span 
class="s">&quot;Fairbanks Airport Wind Speed (MPH)&quot;</span><span class="p">)</span> <span class="o">+</span> scale_y_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Ester Dome Wind (MPH)&quot;</span><span class="p">)</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> ggtitle<span class="p">(</span><span class="s">&quot;Relationship between Fairbanks Airport and Ester Dome Wind Speeds</span> <span class="s"> September, 2008-2013&quot;</span><span class="p">)</span> pdf<span class="p">(</span><span class="s">&quot;pafa_fbsa_sept_wspds.pdf&quot;</span><span class="p">,</span> height<span class="o">=</span><span class="m">10.5</span><span class="p">,</span> width<span class="o">=</span><span class="m">10.5</span><span class="p">)</span> <span class="kp">print</span><span class="p">(</span>fai_ed_wspds<span class="p">)</span> dev.off<span class="p">()</span> </pre></div> </div> </div> Tue, 13 Sep 2016 18:31:28 -0800 http://swingleydev.com/blog/p/1999/ weather running Equinox Marathon Buddy, 2001—2016 http://swingleydev.com/blog/p/1998/ <div class="document"> <div class="figure align-right"> <a class="reference external image-reference" href="//media.swingleydev.com/img/photolog/2016/02/buddy_falling_asleep_on_a_house_2016-02.jpg"><img alt="" src="//media.swingleydev.com/img/blog/2016/09/buddy_on_house.jpg" style="width: 300px; height: 169px;" /></a> <p class="caption">Buddy</p> </div> <p>This morning I came down the stairs to a house without Buddy. He liked sleeping on the rug in front of the heater at the bottom of the stairs and he was always the first dog I saw in the morning.</p> <p>Buddy came to us in August 2003 as a two year old and became Andrea’s mighty lead dog. He had the confidence to lead her teams even in single lead by himself, listened to whomever was driving, and tolerated all manner of misbehavior from whatever dog was next to him. 
He retired from racing after eleven years, but was still enjoying himself and pulling hard up to his last race.</p> <p>Our friend, musher, and writer Carol Kaynor wrote this about him in 2012:</p> <blockquote> <p>But it will be Buddy who will move me nearly to tears. He will drive for 6 full miles. On the very far side of 10 years old, with his eleventh birthday coming up in a month, he will bring us home to fourth place for the day and a respectable time for the distance. I’ll step off that sled as happy as if I’d won.</p> <p>It wasn’t me pushing. I don’t get any credit for a run like that. It was Buddy pushing himself, like the champion he is.</p> </blockquote> <p>Read the whole post here: <a class="reference external" href="https://carolkaynor.wordpress.com/2012/02/25/tribute-to-a-champion/">Tribute to a champion</a>.</p> <p>After he retired, he enjoyed walking on the trails around our house, running around in the dog yard with the younger dogs, but most of all, relaxing in the house on the dog beds. He was a big, sweet, patient dog that took everything in stride and who wanted all the love and attention we could give him. The spot at the bottom of the stairs is empty now, and we will miss him.</p> <div class="figure align-center"> <a class="reference external image-reference" href="//media.swingleydev.com/img/photolog/2012/03/buddy_post_day_1.jpg"><img alt="" src="//media.swingleydev.com/img/blog/2016/09/buddy_post_day_1_600.jpg" style="width: 450px; height: 600px;" /></a> <p class="caption">Buddy in lead in Tok, 2012</p> </div> <div class="figure align-center"> <a class="reference external image-reference" href="//media.swingleydev.com/img/photolog/2012/06/mr_buddy.jpg"><img alt="" src="//media.swingleydev.com/img/blog/2016/09/mr_buddy_600.jpg" style="width: 600px; height: 600px;" /></a> <p class="caption">Mr. 
Buddy</p> </div> </div> Fri, 09 Sep 2016 07:28:42 -0800 http://swingleydev.com/blog/p/1998/ Buddy memorial Earliest 80+ degree daily maximum temperature in Fairbanks http://swingleydev.com/blog/p/1997/ <div class="document"> <p>This morning’s weather forecast:</p> <pre class="literal-block"> SUNNY. HIGHS IN THE UPPER 70S TO LOWER 80S. LIGHT WINDS. </pre> <p>May 13th seems very early in the year to hit 80 degrees in Fairbanks, so I decided to check it out. What I’m doing here is selecting all the dates where the daily maximum temperature is at or above 80°F, then ranking those dates within each year by date, and extracting the “winner” for each year (where <tt class="docutils literal">rank</tt> is 1).</p> <div class="highlight"><pre><span class="k">WITH</span> <span class="n">warm</span> <span class="k">AS</span> <span class="p">(</span> <span class="k">SELECT</span> <span class="k">extract</span><span class="p">(</span><span class="k">year</span> <span class="k">from</span> <span class="n">dte</span><span class="p">)</span> <span class="k">AS</span> <span class="k">year</span><span class="p">,</span> <span class="n">dte</span><span class="p">,</span> <span class="n">c_to_f</span><span class="p">(</span><span class="n">tmax_c</span><span class="p">)</span> <span class="k">AS</span> <span class="n">tmax_f</span> <span class="k">FROM</span> <span class="n">ghcnd_pivot</span> <span class="k">WHERE</span> <span class="n">station_name</span> <span class="o">=</span> <span class="s1">&#39;FAIRBANKS INTL AP&#39;</span> <span class="k">AND</span> <span class="n">c_to_f</span><span class="p">(</span><span class="n">tmax_c</span><span class="p">)</span> <span class="o">&gt;=</span> <span class="mi">80</span><span class="p">.</span><span class="mi">0</span><span class="p">),</span> <span class="n">ranked</span> <span class="k">AS</span> <span class="p">(</span> <span class="k">SELECT</span> <span class="k">year</span><span class="p">,</span> <span class="n">dte</span><span class="p">,</span> <span 
class="n">tmax_f</span><span class="p">,</span> <span class="n">row_number</span><span class="p">()</span> <span class="n">OVER</span> <span class="p">(</span><span class="n">PARTITION</span> <span class="k">BY</span> <span class="k">year</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="n">dte</span><span class="p">)</span> <span class="k">AS</span> <span class="n">rank</span> <span class="k">FROM</span> <span class="n">warm</span><span class="p">)</span> <span class="k">SELECT</span> <span class="n">dte</span><span class="p">,</span> <span class="k">extract</span><span class="p">(</span><span class="n">doy</span> <span class="k">from</span> <span class="n">dte</span><span class="p">)</span> <span class="k">AS</span> <span class="n">doy</span><span class="p">,</span> <span class="n">round</span><span class="p">(</span><span class="n">tmax_f</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="k">as</span> <span class="n">tmax_f</span> <span class="k">FROM</span> <span class="n">ranked</span> <span class="k">WHERE</span> <span class="n">rank</span> <span class="o">=</span> <span class="mi">1</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="n">doy</span><span class="p">;</span> </pre></div> <p>And the results:</p> <table border="1" class="tosf docutils"> <caption>Earliest 80 degree dates, Fairbanks Airport</caption> <colgroup> <col width="27%" /> <col width="27%" /> <col width="47%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Date</th> <th class="head">Day of year</th> <th class="head">High temperature (°F)</th> </tr> </thead> <tbody valign="top"> <tr><td>1995-05-09</td> <td>129</td> <td>80.1</td> </tr> <tr><td>1975-05-11</td> <td>131</td> <td>80.1</td> </tr> <tr><td>1942-05-12</td> <td>132</td> <td>81.0</td> </tr> <tr><td>1915-05-14</td> <td>134</td> <td>80.1</td> </tr> <tr><td>1993-05-16</td> <td>136</td> <td>82.0</td> </tr> <tr><td>2002-05-20</td> 
<td>140</td> <td>80.1</td> </tr> <tr><td>2015-05-22</td> <td>142</td> <td>80.1</td> </tr> <tr><td>1963-05-22</td> <td>142</td> <td>84.0</td> </tr> <tr><td>1960-05-23</td> <td>144</td> <td>80.1</td> </tr> <tr><td>2009-05-24</td> <td>144</td> <td>80.1</td> </tr> <tr><td>…</td> <td>…</td> <td>…</td> </tr> </tbody> </table> <p>If we hit 80°F today, it’ll be the fourth earliest day of year to hit that temperature since records started being kept in 1904.</p> <p><span class="red">Update:</span> We didn’t reach 80°F on the 13th, but got to 82°F on May 14th, tied with that date in 1915 for the fourth earliest 80 degree temperature.</p> </div> Fri, 13 May 2016 06:02:10 -0800 http://swingleydev.com/blog/p/1997/ Fairbanks temperature weather climate Image similarity analysis: color http://swingleydev.com/blog/p/1995/ <div class="document"> <div class="section" id="introduction"> <h1>Introduction</h1> <p>There are now 777 photos in my <a class="reference external" href="https://swingleydev.com/photolog/">photolog</a>, organized in reverse chronological order (or chronologically if you append <tt class="docutils literal">/asc/</tt> to the url). With that much data, it occurred to me that there ought to be a way to organize these photos by color, similar to the way some people organize their <a class="reference external" href="http://www.slate.com/blogs/the_eye/2014/02/11/arranging_your_books_by_color_is_not_a_moral_failure.html">books</a>. I didn’t find a way of doing that, unfortunately, but I did spend some time experimenting with image similarity analysis using color.</p> <p>The basic idea is to generate histograms (counts of the pixels in the image that fit into pre-defined bins) for red, green and blue color combinations in the image. 
Once we have these values for each image, we use the chi-square distance between the histograms as a measure of color similarity between photos.</p> </div> <div class="section" id="code"> <h1>Code</h1> <p>I followed the tutorial <a class="reference external" href="http://www.pyimagesearch.com/2014/01/27/hobbits-and-histograms-a-how-to-guide-to-building-your-first-image-search-engine-in-python/">Building your first image search engine in Python</a>, which uses code like this to generate 3D RGB histograms (all the code from this post is on <a class="reference external" href="https://github.com/cswingle/opencv_3d_rgb_histograms">GitHub</a>):</p> <div class="highlight"><pre><span class="kn">import</span> <span class="nn">os</span> <span class="kn">import</span> <span class="nn">cv2</span> <span class="k">def</span> <span class="nf">get_histogram</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">bins</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot; calculate a 3d RGB histogram from an image &quot;&quot;&quot;</span> <span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">image</span><span class="p">):</span> <span class="n">imgarray</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">imread</span><span class="p">(</span><span class="n">image</span><span class="p">)</span> <span class="n">hist</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">calcHist</span><span class="p">([</span><span class="n">imgarray</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="bp">None</span><span class="p">,</span> <span class="p">[</span><span 
class="n">bins</span><span class="p">,</span> <span class="n">bins</span><span class="p">,</span> <span class="n">bins</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">256</span><span class="p">])</span> <span class="n">hist</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">normalize</span><span class="p">(</span><span class="n">hist</span><span class="p">,</span> <span class="n">hist</span><span class="p">)</span> <span class="k">return</span> <span class="n">hist</span><span class="o">.</span><span class="n">flatten</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="bp">None</span> </pre></div> <p>Once you have the histograms, you need to calculate all the pairwise distances using a function like this:</p> <div class="highlight"><pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> <span class="k">def</span> <span class="nf">chi2_distance</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">eps</span><span class="o">=</span><span class="mf">1e-10</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot; distance between two histograms (a, b) &quot;&quot;&quot;</span> <span class="n">d</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">([((</span><span class="n">x</span> <span class="o">-</span> <span class="n">y</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">x</span> <span 
class="o">+</span> <span class="n">y</span> <span class="o">+</span> <span class="n">eps</span><span class="p">)</span> <span class="k">for</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)])</span> <span class="k">return</span> <span class="n">d</span> </pre></div> <p>Getting histogram data using OpenCV in Python is pretty fast. Even with 32 bins, it only took about 45 minutes for all 777 images. Computing the distances between histograms was a lot slower, depending on how the code was written.</p> <p>With 8 bin histograms, a Python script using the function listed above, took just under 15 minutes to calculate each pairwise comparison (see the <a class="reference external" href="https://github.com/cswingle/opencv_3d_rgb_histograms/blob/master/rgb_histogram.py">rgb_histogram.py</a> script).</p> <p>Since the photos are all in a database so they can be displayed on the Internet, I figured a SQL function to calculate the distances would make the most sense. 
I could use the OpenCV Python code to generate histograms and add them to the database when the photo was inserted, and a SQL function to get the distances.</p> <p>Here’s the function:</p> <div class="highlight"><pre><span class="k">CREATE</span> <span class="k">OR</span> <span class="k">REPLACE</span> <span class="k">FUNCTION</span> <span class="n">chi_square_distance</span><span class="p">(</span><span class="n">a</span> <span class="nb">numeric</span><span class="p">[],</span> <span class="n">b</span> <span class="nb">numeric</span><span class="p">[])</span>
<span class="k">RETURNS</span> <span class="nb">numeric</span> <span class="k">AS</span> <span class="err">$</span><span class="n">_$</span>
    <span class="k">DECLARE</span>
        <span class="k">sum</span> <span class="nb">numeric</span> <span class="p">:</span><span class="o">=</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">;</span>
        <span class="n">i</span> <span class="nb">integer</span><span class="p">;</span>
    <span class="k">BEGIN</span>
        <span class="k">FOR</span> <span class="n">i</span> <span class="k">IN</span> <span class="mi">1</span> <span class="p">..</span> <span class="n">array_upper</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
        <span class="n">LOOP</span>
            <span class="n">IF</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">+</span><span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="k">THEN</span>
                <span class="k">sum</span> <span class="o">=</span> <span class="k">sum</span> <span class="o">+</span> <span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-</span><span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="o">^</span><span class="mi">2</span> <span class="o">/</span> <span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">+</span><span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
            <span class="k">END</span> <span class="n">IF</span><span class="p">;</span>
        <span class="k">END</span> <span class="n">LOOP</span><span class="p">;</span>

        <span class="k">RETURN</span> <span class="k">sum</span><span class="o">/</span><span class="mi">2</span><span class="p">.</span><span class="mi">0</span><span class="p">;</span>
    <span class="k">END</span><span class="p">;</span>
<span class="err">$</span><span class="n">_$</span>
<span class="k">LANGUAGE</span> <span class="n">plpgsql</span><span class="p">;</span>
</pre></div> <p>Unfortunately, this is incredibly slow. Instead of the 15 minutes the Python script took, it took just under two hours to compute the pairwise distances on the 8 bin histograms.</p> <p>When your interpreted code is slow, the solution is often to rewrite it in a compiled language and use that. I found <a class="reference external" href="http://stackoverflow.com/questions/16992339/">some C code</a> on Stack Overflow for writing array functions. 
The PostgreSQL interface isn’t exactly intuitive, but here’s the gist of the code (<a class="reference external" href="https://github.com/cswingle/opencv_3d_rgb_histograms/blob/master/chi_square_distance.c">full code</a>):</p> <div class="highlight"><pre><span class="cp">#include &lt;postgres.h&gt;</span>
<span class="cp">#include &lt;fmgr.h&gt;</span>
<span class="cp">#include &lt;utils/array.h&gt;</span>
<span class="cp">#include &lt;utils/lsyscache.h&gt;</span>

<span class="cm">/* From intarray contrib header */</span>
<span class="cp">#define ARRPTR(x) ( (float8 *) ARR_DATA_PTR(x) )</span>

<span class="n">PG_MODULE_MAGIC</span><span class="p">;</span>

<span class="n">PG_FUNCTION_INFO_V1</span><span class="p">(</span><span class="n">chi_square_distance</span><span class="p">);</span>
<span class="n">Datum</span> <span class="nf">chi_square_distance</span><span class="p">(</span><span class="n">PG_FUNCTION_ARGS</span><span class="p">);</span>

<span class="n">Datum</span> <span class="nf">chi_square_distance</span><span class="p">(</span><span class="n">PG_FUNCTION_ARGS</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">ArrayType</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="o">*</span><span class="n">b</span><span class="p">;</span>
    <span class="n">float8</span> <span class="o">*</span><span class="n">da</span><span class="p">,</span> <span class="o">*</span><span class="n">db</span><span class="p">;</span>

    <span class="n">float8</span> <span class="n">sum</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="n">n</span><span class="p">;</span>

    <span class="cm">/* Assumed argument handling (elided in this excerpt): fetch both arrays and their length */</span>
    <span class="n">a</span> <span class="o">=</span> <span class="n">PG_GETARG_ARRAYTYPE_P</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="n">b</span> <span class="o">=</span> <span class="n">PG_GETARG_ARRAYTYPE_P</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
    <span class="n">n</span> <span class="o">=</span> <span class="p">(</span><span class="n">ARR_DIMS</span><span class="p">(</span><span class="n">a</span><span class="p">))[</span><span class="mi">0</span><span class="p">];</span>

    <span class="n">da</span> <span class="o">=</span> <span class="n">ARRPTR</span><span class="p">(</span><span class="n">a</span><span class="p">);</span>
    <span class="n">db</span> <span class="o">=</span> <span class="n">ARRPTR</span><span class="p">(</span><span class="n">b</span><span class="p">);</span>

    <span class="c1">// Generate the sums.</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="n">da</span> <span class="o">-</span> <span class="o">*</span><span class="n">db</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">sum</span> <span class="o">=</span> <span class="n">sum</span> <span class="o">+</span> <span class="p">((</span><span class="o">*</span><span class="n">da</span> <span class="o">-</span> <span class="o">*</span><span class="n">db</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="o">*</span><span class="n">da</span> <span class="o">-</span> <span class="o">*</span><span class="n">db</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="o">*</span><span class="n">da</span> <span class="o">+</span> <span class="o">*</span><span class="n">db</span><span class="p">));</span>
        <span class="p">}</span>
        <span class="n">da</span><span class="o">++</span><span class="p">;</span>
        <span class="n">db</span><span class="o">++</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">sum</span> <span class="o">=</span> <span class="n">sum</span> <span class="o">/</span> <span class="mf">2.0</span><span class="p">;</span>

    <span class="n">PG_RETURN_FLOAT8</span><span class="p">(</span><span class="n">sum</span><span class="p">);</span>
<span class="p">}</span>
</pre></div> <p>This takes 79 <em>seconds</em> to do all the distance calculations on the 8 bin histograms. 
That kind of improvement is well worth the effort.</p> </div> <div class="section" id="results"> <h1>Results</h1> <p>After all that, the results aren’t as good as I was hoping. For some photos, such as the ones I took while re-raising the bridge across the creek, sorting by the histogram distances does actually identify other photos taken of the same process. For example, these two photos are the closest to each other by 32 bin histogram distance:</p> <div class="figure align-center"> <a class="reference external image-reference" href="//media.swingleydev.com/img/photolog/2014/08/end_of_the_log_raised_to_the_bank_2014-08.jpg"><img alt="//media.swingleydev.com/img/photolog/2014/08/end_of_the_log_raised_to_the_bank_2014-08_600.jpg" class="img-responsive" src="//media.swingleydev.com/img/photolog/2014/08/end_of_the_log_raised_to_the_bank_2014-08_600.jpg" /></a> </div> <br><div class="figure align-center"> <a class="reference external image-reference" href="//media.swingleydev.com/img/photolog/2014/08/moving_heavy_things:_log_edition_2014-08.jpg"><img alt="//media.swingleydev.com/img/photolog/2014/08/moving_heavy_things:_log_edition_2014-08_600.jpg" class="img-responsive" src="//media.swingleydev.com/img/photolog/2014/08/moving_heavy_things:_log_edition_2014-08_600.jpg" /></a> </div> <p>But there are certain images, such as the middle image in the three below, that are very close to many of the photos in the database, even though they’re really not all that similar. I think this is because images with a lot of black in them (or white) wind up being similar to each other because of the large areas without color. 
It may be that performing the same sort of analysis using the HSV color space, but restricting the histogram to regions with high saturation and high value, would yield results that make more sense.</p> <div class="figure align-center"> <a class="reference external image-reference" href="//media.swingleydev.com/img/photolog/2016/01/sunrise_at_abr_2016-01.jpg"><img alt="//media.swingleydev.com/img/photolog/2016/01/sunrise_at_abr_2016-01_600.jpg" class="img-responsive" src="//media.swingleydev.com/img/photolog/2016/01/sunrise_at_abr_2016-01_600.jpg" /></a> </div> <br><div class="figure align-center"> <a class="reference external image-reference" href="//media.swingleydev.com/img/photolog/2013/01/arrival.jpg"><img alt="//media.swingleydev.com/img/photolog/2013/01/arrival_600.jpg" class="img-responsive" src="//media.swingleydev.com/img/photolog/2013/01/arrival_600.jpg" /></a> </div> <br><div class="figure align-center"> <a class="reference external image-reference" href="//media.swingleydev.com/img/photolog/2012/09/chinook_sunrise.jpg"><img alt="//media.swingleydev.com/img/photolog/2012/09/chinook_sunrise_600.jpg" class="img-responsive" src="//media.swingleydev.com/img/photolog/2012/09/chinook_sunrise_600.jpg" /></a> </div> </div> </div> Sun, 13 Mar 2016 08:27:18 -0800 http://swingleydev.com/blog/p/1995/ OpenCV SQL C photolog photos color RGB Getting hops data from YCH Hops website http://swingleydev.com/blog/p/1994/ <div class="document"> <div class="section" id="introduction"> <h1>Introduction</h1> <p>I’ve been <a class="reference external" href="https://swingley.org/brewing/recipe_list.php">brewing beer</a> since the <a class="reference external" href="https://swingley.org/brewing/old_listing.php">early 90s</a>, and since those days the number of hops available to homebrewers has gone from a handful of varieties (Northern Brewer, Goldings, Fuggle, Willamette, Cluster) to well over a hundred. 
Whenever I go to my local brewing store I’m bewildered by the large variety of hops, most of which I’ve never heard of. I’m also not all that fond of super-citrusy hops like Cascade or its variants, so it is a challenge to find flavor and aroma hops that aren’t citrusy among the several dozen new varieties on display.</p> <p>Most of the hops at the store are <a class="reference external" href="http://ychhops.com/">Yakima Chief – Hop Union</a> branded, and they’ve got a great website that lists all their varieties and has information about each hop. As convenient as a website can be, I’d rather have the data in a database where I can search and organize it myself. Since the data is all on the website, we can use a web scraping library to grab it and format it however we like.</p> <p>One note: websites change all the time, so whenever the structure of a site changes, the code to grab the data will need to be updated. I originally wrote the code for this post a couple weeks ago, scraping data from the Hop Union website. This morning, that site had been replaced with an entirely different Yakima Chief – Hop Union site and I had to completely rewrite the code.</p> </div> <div class="section" id="rvest"> <h1><tt class="docutils literal">rvest</tt></h1> <p>I’m using the <a class="reference external" href="https://cran.r-project.org/web/packages/rvest/index.html">rvest</a> package from Hadley Wickham and RStudio to do the work of pulling the data from each page. 
In the Python world, <a class="reference external" href="http://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a> would be the library I’d use, but there’s a fair amount of data manipulation needed here and I felt like <a class="reference external" href="https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html">dplyr</a> would be easier.</p> </div> <div class="section" id="process"> <h1>Process</h1> <p>First, load all the libraries we need.</p> <div class="highlight"><pre><span class="kn">library</span><span class="p">(</span>rvest<span class="p">)</span>       <span class="c1"># scraping data from the web</span>
<span class="kn">library</span><span class="p">(</span>dplyr<span class="p">)</span>       <span class="c1"># manipulation, filtering, grouping into tables</span>
<span class="kn">library</span><span class="p">(</span>stringr<span class="p">)</span>     <span class="c1"># string functions</span>
<span class="kn">library</span><span class="p">(</span>tidyr<span class="p">)</span>       <span class="c1"># creating columns from rows</span>
<span class="kn">library</span><span class="p">(</span>RPostgreSQL<span class="p">)</span> <span class="c1"># dump final result to a PostgreSQL database</span>
</pre></div> <p>Next, we retrieve the data from the main page that has all the varieties listed on it, and extract the list of links to each hop. In the code below, we read the entire document into a variable, <tt class="docutils literal">hop_varieties</tt>, using the <tt class="docutils literal">read_html</tt> function.</p> <p>Once we’ve got the web page, we need to find the HTML nodes that contain links to the page for each individual hop. To do that, we use <tt class="docutils literal">html_nodes()</tt>, passing a <a class="reference external" href="http://www.w3schools.com/cssref/css_selectors.asp">CSS selector</a> to the function. 
In this case, we’re looking for <tt class="docutils literal">a</tt> tags that have a class of <tt class="docutils literal">card__name</tt>. I figured this out by looking at the raw source code from the page using my web browser’s inspection tools. If you right-click on what looks like a link on the page, one of the options in the pop-up menu is “inspect”, and when you choose that, it will show you the HTML for the element you clicked on. It can take a few tries to find the proper combination of tag, class, attribute or id to uniquely identify the things you want.</p> <p>The YCH site is pretty well organized, so this isn’t too difficult. Once we’ve got the nodes, we extract the links by retrieving the <tt class="docutils literal">href</tt> attribute from each one with <tt class="docutils literal">html_attr()</tt>.</p> <div class="highlight"><pre>hop_varieties <span class="o">&lt;-</span> read_html<span class="p">(</span><span class="s">&quot;http://ychhops.com/varieties&quot;</span><span class="p">)</span>

hop_page_links <span class="o">&lt;-</span> hop_varieties <span class="o">%&gt;%</span>
    html_nodes<span class="p">(</span><span class="s">&quot;a.card__name&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span>
    html_attr<span class="p">(</span><span class="s">&quot;href&quot;</span><span class="p">)</span>
</pre></div> <p>Now we have a list of links to all the varieties on the page. It turns out that they made a mistake when they transitioned to the new site and the links all have the wrong host (<tt class="docutils literal">ych.craft.dev</tt>). 
We can fix that by replacing the host in all the links.</p> <div class="highlight"><pre>fixed_links <span class="o">&lt;-</span> <span class="kp">sapply</span><span class="p">(</span>hop_page_links<span class="p">,</span>
                      FUN<span class="o">=</span><span class="kr">function</span><span class="p">(</span>x<span class="p">)</span> <span class="kp">sub</span><span class="p">(</span><span class="s">&#39;ych.craft.dev&#39;</span><span class="p">,</span>
                                          <span class="s">&#39;ychhops.com&#39;</span><span class="p">,</span> x<span class="p">))</span> <span class="o">%&gt;%</span>
    <span class="kp">as.vector</span><span class="p">()</span>
</pre></div> <p>Each page will need to be loaded, the relevant information extracted, and the data formatted into a suitable data structure. I think a data frame is the best format for this, where each row in the data frame represents the data for a single hop and each column is a piece of information from the web page.</p> <p>First we write a function that retrieves the data for a single hop and returns a one-row data frame with that data. Most of the data is pretty simple, with a single value for each hop: name, description, type of hop, etc. Where it gets more complicated is that each hop can have more than one aroma category, and the statistics for each hop vary from one to the next. What we’ve done here is combine the aromas together into a single string, using the at symbol (<tt class="docutils literal">&#64;</tt>) to separate the parts. Since it’s unlikely that symbol will appear in the data, we can split it back apart later. 
We do the same thing for the other parameters, creating <tt class="docutils literal">&#64;</tt>-delimited strings for the items and for their values.</p> <div class="highlight"><pre>get_hop_stats <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span>p<span class="p">)</span> <span class="p">{</span>
    hop_page <span class="o">&lt;-</span> read_html<span class="p">(</span>p<span class="p">)</span>

    hop_name <span class="o">&lt;-</span> hop_page <span class="o">%&gt;%</span>
        html_nodes<span class="p">(</span><span class="s">&#39;h1[itemprop=&quot;name&quot;]&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span>
        html_text<span class="p">()</span>

    type <span class="o">&lt;-</span> hop_page <span class="o">%&gt;%</span>
        html_nodes<span class="p">(</span><span class="s">&#39;div.hop-profile__data div[itemprop=&quot;additionalProperty&quot;]&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span>
        html_text<span class="p">()</span>
    type <span class="o">&lt;-</span> <span class="p">(</span>str_split<span class="p">(</span>type<span class="p">,</span> <span class="s">&#39; &#39;</span><span class="p">))[[</span><span class="m">1</span><span class="p">]][</span><span class="m">2</span><span class="p">]</span>

    region <span class="o">&lt;-</span> hop_page <span class="o">%&gt;%</span>
        html_nodes<span class="p">(</span><span class="s">&#39;div.hop-profile__data h5&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span>
        html_text<span class="p">()</span>

    description <span class="o">&lt;-</span> hop_page <span class="o">%&gt;%</span>
        html_nodes<span class="p">(</span><span class="s">&#39;div.hop-profile__profile p[itemprop=&quot;description&quot;]&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span>
        html_text<span class="p">()</span>

    aroma_profiles <span class="o">&lt;-</span> hop_page <span class="o">%&gt;%</span>
        html_nodes<span class="p">(</span><span class="s">&#39;div.hop-profile__profile h3.headline a[itemprop=&quot;category&quot;]&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span>
        html_text<span class="p">()</span>

    aroma_profiles <span class="o">&lt;-</span> <span class="kp">sapply</span><span class="p">(</span>aroma_profiles<span class="p">,</span>
                             FUN<span class="o">=</span><span class="kr">function</span><span class="p">(</span>x<span class="p">)</span> <span class="kp">sub</span><span class="p">(</span><span class="s">&#39;,$&#39;</span><span class="p">,</span> <span class="s">&#39;&#39;</span><span class="p">,</span> x<span class="p">))</span> <span class="o">%&gt;%</span>
        <span class="kp">as.vector</span><span class="p">()</span> <span class="o">%&gt;%</span>
        <span class="kp">paste</span><span class="p">(</span>collapse<span class="o">=</span><span class="s">&quot;@&quot;</span><span class="p">)</span>

    composition_keys <span class="o">&lt;-</span> hop_page <span class="o">%&gt;%</span>
        html_nodes<span class="p">(</span><span class="s">&#39;div.hop-composition__item&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span>
        html_text<span class="p">()</span>

    composition_keys <span class="o">&lt;-</span> <span class="kp">sapply</span><span class="p">(</span>composition_keys<span class="p">,</span>
                               FUN<span class="o">=</span><span class="kr">function</span><span class="p">(</span>x<span class="p">)</span> <span class="kp">tolower</span><span class="p">(</span><span class="kp">gsub</span><span class="p">(</span><span class="s">&#39;[ -]&#39;</span><span class="p">,</span> <span class="s">&#39;_&#39;</span><span class="p">,</span> x<span class="p">)))</span> <span class="o">%&gt;%</span>
        <span class="kp">as.vector</span><span class="p">()</span> <span class="o">%&gt;%</span>
        <span class="kp">paste</span><span class="p">(</span>collapse<span class="o">=</span><span class="s">&quot;@&quot;</span><span class="p">)</span>

    composition_values <span class="o">&lt;-</span> hop_page <span class="o">%&gt;%</span>
        html_nodes<span class="p">(</span><span class="s">&#39;div.hop-composition__value&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span>
        html_text<span class="p">()</span> <span class="o">%&gt;%</span>
        <span class="kp">paste</span><span class="p">(</span>collapse<span class="o">=</span><span class="s">&quot;@&quot;</span><span class="p">)</span>

    hop <span class="o">&lt;-</span> <span class="kt">data.frame</span><span class="p">(</span><span class="s">&#39;hop&#39;</span><span class="o">=</span>hop_name<span class="p">,</span> <span class="s">&#39;type&#39;</span><span class="o">=</span>type<span class="p">,</span> <span class="s">&#39;region&#39;</span><span class="o">=</span>region<span class="p">,</span>
                      <span class="s">&#39;description&#39;</span><span class="o">=</span>description<span class="p">,</span>
                      <span class="s">&#39;aroma_profiles&#39;</span><span class="o">=</span>aroma_profiles<span class="p">,</span>
                      <span class="s">&#39;composition_keys&#39;</span><span class="o">=</span>composition_keys<span class="p">,</span>
                      <span class="s">&#39;composition_values&#39;</span><span class="o">=</span>composition_values<span class="p">)</span>
<span class="p">}</span>
</pre></div> <p>With a function that takes a URL as input and returns a single-row data frame, we use a common idiom in R to combine everything together. 
The inner-most <tt class="docutils literal">lapply</tt> function will run the function on each of the fixed variety links, and each single-row data frame will then be combined together using <tt class="docutils literal">rbind</tt> within <tt class="docutils literal">do.call</tt>.</p> <div class="highlight"><pre>all_hops <span class="o">&lt;-</span> <span class="kp">do.call</span><span class="p">(</span><span class="kp">rbind</span><span class="p">,</span>
                    <span class="kp">lapply</span><span class="p">(</span>fixed_links<span class="p">,</span> get_hop_stats<span class="p">))</span> <span class="o">%&gt;%</span> tbl_df<span class="p">()</span> <span class="o">%&gt;%</span>
    arrange<span class="p">(</span>hop<span class="p">)</span> <span class="o">%&gt;%</span>
    mutate<span class="p">(</span>id<span class="o">=</span>row_number<span class="p">())</span>
</pre></div> <p>At this stage we’ve retrieved all the data from the website, but some of it has been encoded in a less than useful format.</p> </div> <div class="section" id="data-tidying"> <h1>Data tidying</h1> <p>To tidy the data, we want to extract only a few of the item / value pairs of data from the data frame: alpha acid, beta acid, co-humulone and total oil. We also need to remove carriage returns from the description and remove the aroma column.</p> <p>We split the keys and values back apart again using the <tt class="docutils literal">&#64;</tt> symbol used earlier to combine them, then use <tt class="docutils literal">unnest</tt> to expand them into one row per key / value pair, repeating the other columns in each row. 
<tt class="docutils literal">spread</tt> pivots these up into columns such that the end result has one row per hop with the relevant composition values as columns in the tidy data set.</p> <div class="highlight"><pre>hops <span class="o">&lt;-</span> all_hops <span class="o">%&gt;%</span>
    arrange<span class="p">(</span>hop<span class="p">)</span> <span class="o">%&gt;%</span>
    mutate<span class="p">(</span>description<span class="o">=</span><span class="kp">gsub</span><span class="p">(</span><span class="s">&#39;\\r&#39;</span><span class="p">,</span> <span class="s">&#39;&#39;</span><span class="p">,</span> description<span class="p">),</span>
           keys<span class="o">=</span>str_split<span class="p">(</span>composition_keys<span class="p">,</span> <span class="s">&quot;@&quot;</span><span class="p">),</span>
           values<span class="o">=</span>str_split<span class="p">(</span>composition_values<span class="p">,</span> <span class="s">&quot;@&quot;</span><span class="p">))</span> <span class="o">%&gt;%</span>
    unnest<span class="p">(</span>keys<span class="p">,</span> values<span class="p">)</span> <span class="o">%&gt;%</span>
    spread<span class="p">(</span>keys<span class="p">,</span> values<span class="p">)</span> <span class="o">%&gt;%</span>
    select<span class="p">(</span>id<span class="p">,</span> hop<span class="p">,</span> type<span class="p">,</span> region<span class="p">,</span> alpha_acid<span class="p">,</span> beta_acid<span class="p">,</span> co_humulone<span class="p">,</span> total_oil<span class="p">,</span>
           description<span class="p">)</span>

knitr<span class="o">::</span>kable<span class="p">(</span>hops <span class="o">%&gt;%</span> select<span class="p">(</span>id<span class="p">,</span> hop<span class="p">,</span> type<span class="p">,</span> region<span class="p">,</span> alpha_acid<span class="p">)</span> <span class="o">%&gt;%</span> <span class="kp">head</span><span class="p">())</span>
</pre></div> <table border="1" class="docutils"> <colgroup> <col width="9%" /> <col width="21%" /> <col width="19%" /> <col width="30%" /> <col width="21%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">id</th> <th class="head">hop</th> <th class="head">type</th> <th class="head">region</th> <th class="head">alpha_acid</th> </tr> </thead> <tbody valign="top"> <tr><td>1</td> <td>Admiral</td> <td>Bittering</td> <td>United Kingdom</td> <td>13 - 16%</td> </tr> <tr><td>2</td> <td>Ahtanum™</td> <td>Aroma</td> <td>Pacific Northwest</td> <td>3.5 - 6.5%</td> </tr> <tr><td>3</td> <td>Amarillo®</td> <td>Aroma</td> <td>Pacific Northwest</td> <td>7 - 11%</td> </tr> <tr><td>4</td> <td>Aramis</td> <td>Aroma</td> <td>France</td> <td>7.9 - 8.3%</td> </tr> <tr><td>5</td> <td>Aurora</td> <td>Dual</td> <td>Slovenia</td> <td>7 - 9.5%</td> </tr> <tr><td>6</td> <td>Bitter Gold</td> <td>Dual</td> <td>Pacific Northwest</td> <td>12 - 14.5%</td> </tr> </tbody> </table> <p>For the aromas we have a one-to-many relationship where each hop has one or more aroma categories associated. We could fully normalize this by creating an aroma table and a join table that connects hop and aroma, but this data is simple enough that I just created the join table itself. 
We’re using the same <tt class="docutils literal">str_split</tt> / <tt class="docutils literal">unnest</tt> method we used before, except that in this case we don’t want to turn those into columns; we <em>want</em> a separate row for each hop × aroma combination.</p> <div class="highlight"><pre>hops_aromas <span class="o">&lt;-</span> all_hops <span class="o">%&gt;%</span>
    select<span class="p">(</span>id<span class="p">,</span> hop<span class="p">,</span> aroma_profiles<span class="p">)</span> <span class="o">%&gt;%</span>
    mutate<span class="p">(</span>aroma<span class="o">=</span>str_split<span class="p">(</span>aroma_profiles<span class="p">,</span> <span class="s">&quot;@&quot;</span><span class="p">))</span> <span class="o">%&gt;%</span>
    unnest<span class="p">(</span>aroma<span class="p">)</span> <span class="o">%&gt;%</span>
    select<span class="p">(</span>id<span class="p">,</span> hop<span class="p">,</span> aroma<span class="p">)</span>
</pre></div> </div> <div class="section" id="saving-and-exporting"> <h1>Saving and exporting</h1> <p>Finally, we save the data and export it into a PostgreSQL database.</p> <div class="highlight"><pre><span class="kp">save</span><span class="p">(</span><span class="kt">list</span><span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="s">&quot;hops&quot;</span><span class="p">,</span> <span class="s">&quot;hops_aromas&quot;</span><span class="p">),</span> file<span class="o">=</span><span class="s">&quot;ych_hops.rdata&quot;</span><span class="p">)</span>

beer <span class="o">&lt;-</span> src_postgres<span class="p">(</span>host<span class="o">=</span><span class="s">&quot;dryas.swingleydev.com&quot;</span><span class="p">,</span> dbname<span class="o">=</span><span class="s">&quot;beer&quot;</span><span class="p">,</span>
                     port<span class="o">=</span><span class="m">5434</span><span class="p">,</span> user<span class="o">=</span><span class="s">&quot;cswingle&quot;</span><span class="p">)</span>
dbWriteTable<span class="p">(</span>beer<span class="o">$</span>con<span class="p">,</span> <span class="s">&quot;ych_hops&quot;</span><span class="p">,</span> hops <span class="o">%&gt;%</span> <span class="kt">data.frame</span><span class="p">(),</span> row.names<span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span>
dbWriteTable<span class="p">(</span>beer<span class="o">$</span>con<span class="p">,</span> <span class="s">&quot;ych_hops_aromas&quot;</span><span class="p">,</span> hops_aromas <span class="o">%&gt;%</span> <span class="kt">data.frame</span><span class="p">(),</span> row.names<span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span>
</pre></div> </div> <div class="section" id="usage"> <h1>Usage</h1> <p>I created a view in the database that combines all the aroma categories into a Postgres array type using this query. I also use a pair of regular expressions to convert the alpha acid string into a Postgres numrange.</p> <div class="highlight"><pre><span class="k">CREATE</span> <span class="k">VIEW</span> <span class="n">ych_basic_hop_data</span> <span class="k">AS</span> <span class="k">SELECT</span> <span class="n">ych_hops</span><span class="p">.</span><span class="n">id</span><span class="p">,</span> <span class="n">ych_hops</span><span class="p">.</span><span class="n">hop</span><span class="p">,</span> <span class="n">array_agg</span><span class="p">(</span><span class="n">aroma</span><span class="p">)</span> <span class="k">AS</span> <span class="n">aromas</span><span class="p">,</span> <span class="k">type</span><span class="p">,</span> <span class="n">numrange</span><span class="p">(</span> <span class="n">regexp_replace</span><span class="p">(</span><span class="n">alpha_acid</span><span class="p">,</span> <span class="s1">&#39;([0-9.]+).*&#39;</span><span class="p">,</span> <span class="n">E</span><span class="s1">&#39;\\1&#39;</span><span class="p">)::</span><span class="nb">numeric</span><span 
class="p">,</span> <span class="n">regexp_replace</span><span class="p">(</span><span class="n">alpha_acid</span><span class="p">,</span> <span class="s1">&#39;.*- ([0-9.]+)%&#39;</span><span class="p">,</span> <span class="n">E</span><span class="s1">&#39;\\1&#39;</span><span class="p">)::</span><span class="nb">numeric</span><span class="p">,</span> <span class="s1">&#39;[]&#39;</span><span class="p">)</span> <span class="k">AS</span> <span class="n">alpha_acid_percent</span><span class="p">,</span> <span class="n">description</span> <span class="k">FROM</span> <span class="n">ych_hops</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="n">ych_hops_aromas</span> <span class="k">USING</span><span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="n">ych_hops</span><span class="p">.</span><span class="n">id</span><span class="p">,</span> <span class="n">ych_hops</span><span class="p">.</span><span class="n">hop</span><span class="p">,</span> <span class="k">type</span><span class="p">,</span> <span class="n">alpha_acid</span><span class="p">,</span> <span class="n">description</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="n">hop</span><span class="p">;</span> </pre></div> <p>With this, we can, for example, find US aroma hops that are spicy, but without citrus using the <tt class="docutils literal">ANY()</tt> and <tt class="docutils literal">ALL()</tt> array functions.</p> <div class="highlight"><pre><span class="k">SELECT</span> <span class="n">hop</span><span class="p">,</span> <span class="n">region</span><span class="p">,</span> <span class="k">type</span><span class="p">,</span> <span class="n">aromas</span><span class="p">,</span> <span class="n">alpha_acid_percent</span> <span class="k">FROM</span> <span class="n">ych_basic_hop_data</span> <span class="k">WHERE</span> <span class="k">type</span> <span class="o">=</span> 
<span class="s1">&#39;Aroma&#39;</span> <span class="k">AND</span> <span class="n">region</span> <span class="o">=</span> <span class="s1">&#39;Pacific Northwest&#39;</span> <span class="k">AND</span> <span class="s1">&#39;Spicy&#39;</span> <span class="o">=</span> <span class="k">ANY</span><span class="p">(</span><span class="n">aromas</span><span class="p">)</span> <span class="k">AND</span> <span class="s1">&#39;Citrus&#39;</span> <span class="o">!=</span> <span class="k">ALL</span><span class="p">(</span><span class="n">aromas</span><span class="p">)</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="n">alpha_acid_percent</span><span class="p">;</span> <span class="n">hop</span> <span class="o">|</span> <span class="n">region</span> <span class="o">|</span> <span class="k">type</span> <span class="o">|</span> <span class="n">aromas</span> <span class="o">|</span> <span class="n">alpha_acid_percent</span> <span class="c1">-----------+-------------------+-------+------------------------------+--------------------</span> <span class="n">Crystal</span> <span class="o">|</span> <span class="n">Pacific</span> <span class="n">Northwest</span> <span class="o">|</span> <span class="n">Aroma</span> <span class="o">|</span> <span class="err">{</span><span class="n">Floral</span><span class="p">,</span><span class="n">Spicy</span><span class="err">}</span> <span class="o">|</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span><span class="mi">6</span><span class="p">]</span> <span class="n">Hallertau</span> <span class="o">|</span> <span class="n">Pacific</span> <span class="n">Northwest</span> <span class="o">|</span> <span class="n">Aroma</span> <span class="o">|</span> <span class="err">{</span><span class="n">Floral</span><span class="p">,</span><span class="n">Spicy</span><span class="p">,</span><span class="n">Herbal</span><span class="err">}</span> <span class="o">|</span> <span class="p">[</span><span 
class="mi">3</span><span class="p">.</span><span class="mi">5</span><span class="p">,</span><span class="mi">6</span><span class="p">.</span><span class="mi">5</span><span class="p">]</span> <span class="n">Tettnang</span> <span class="o">|</span> <span class="n">Pacific</span> <span class="n">Northwest</span> <span class="o">|</span> <span class="n">Aroma</span> <span class="o">|</span> <span class="err">{</span><span class="n">Earthy</span><span class="p">,</span><span class="n">Floral</span><span class="p">,</span><span class="n">Spicy</span><span class="p">,</span><span class="n">Herbal</span><span class="err">}</span> <span class="o">|</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span><span class="mi">6</span><span class="p">]</span> <span class="n">Mt</span><span class="p">.</span> <span class="n">Hood</span> <span class="o">|</span> <span class="n">Pacific</span> <span class="n">Northwest</span> <span class="o">|</span> <span class="n">Aroma</span> <span class="o">|</span> <span class="err">{</span><span class="n">Spicy</span><span class="p">,</span><span class="n">Herbal</span><span class="err">}</span> <span class="o">|</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span><span class="mi">6</span><span class="p">.</span><span class="mi">5</span><span class="p">]</span> <span class="n">Santiam</span> <span class="o">|</span> <span class="n">Pacific</span> <span class="n">Northwest</span> <span class="o">|</span> <span class="n">Aroma</span> <span class="o">|</span> <span class="err">{</span><span class="n">Floral</span><span class="p">,</span><span class="n">Spicy</span><span class="p">,</span><span class="n">Herbal</span><span class="err">}</span> <span class="o">|</span> <span class="p">[</span><span class="mi">6</span><span class="p">,</span><span class="mi">8</span><span class="p">.</span><span class="mi">5</span><span class="p">]</span> <span class="n">Ultra</span> <span class="o">|</span> <span 
class="n">Pacific</span> <span class="n">Northwest</span> <span class="o">|</span> <span class="n">Aroma</span> <span class="o">|</span> <span class="err">{</span><span class="n">Floral</span><span class="p">,</span><span class="n">Spicy</span><span class="err">}</span> <span class="o">|</span> <span class="p">[</span><span class="mi">9</span><span class="p">.</span><span class="mi">2</span><span class="p">,</span><span class="mi">9</span><span class="p">.</span><span class="mi">7</span><span class="p">]</span> </pre></div> </div> <div class="section" id="code"> <h1>Code</h1> <p>The RMarkdown version of this post, including the code, can be downloaded from GitHub:</p> <p><a class="reference external" href="https://github.com/cswingle/ych_hops_scraper">https://github.com/cswingle/ych_hops_scraper</a></p> </div> </div> Sun, 28 Feb 2016 09:22:50 -0900 http://swingleydev.com/blog/p/1994/ brewing R rvest web scraping PostgreSQL hops