On Wednesday I reported the results of my analysis examining the average date of first snow recorded at the Fairbanks Airport weather station. It was based on the snow_flag boolean field in the ISD database. In that post I mentioned that examining snow depth data might show the date on which permanent snow (snow that lasts all winter) first falls in Fairbanks. I’m calling this the first “true” snowfall of the season.
For this analysis I looked at the snow depth field in the ISD database for the Fairbanks station. The data is present for 1973 through 1999, but isn't in the database before 1973. I'm not sure why it's missing after 1999, but luckily I've been collecting and archiving the Fairbanks Daily Climate Summary (which includes a snow depth measurement) since late 2000. Combining those two data sets, I've got data for 27 years.
The SQL query I came up with is a good estimate of what we're interested in, but it isn't perfect: it finds the first date of snow that lasts at least a week, not snow that is guaranteed to last all winter. In a place like Fairbanks, where the turn to winter is so rapid and so dependent on the high albedo of snow cover, I think that's close enough to the truth.

Unfortunately, the query is brutally slow because it involves six (!) inner self-joins. The idea is to join the table containing snow depth data against itself, incrementing the date by one day at each join. Before the WHERE clause is applied, each row of the result holds the data for one date plus the data for the six days that follow it. The WHERE clause requires that the snow depth on all seven of those dates be above zero, and that the starting date fall in August or later. This large query is a subquery of the main query, which selects the earliest such date in each year.
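Stated procedurally, the rule the self-joins encode is: for each start date, check that it and the six following days all have snow on the ground, then keep the earliest qualifying start in each year. Here is a minimal Python sketch of that rule, not the code used for this post. It assumes the daily records have already been loaded into a dict keyed by `datetime.date`; that input format and the function name `first_true_snow` are hypothetical. As with the inner joins, a day missing from the data breaks a run.

```python
from datetime import timedelta


def first_true_snow(depths, run_length=7, min_month=8):
    """Earliest start date per year of a run of `run_length` consecutive
    days with snow depth above zero.

    `depths` (hypothetical input format) maps datetime.date -> snow depth.
    Only start dates in `min_month` or later count, mirroring the query's
    `extract(month from a.dt) > 7`. A day missing from `depths`, or with a
    depth of None, breaks the run, like a row missing from an inner join.
    """
    first = {}
    for start in sorted(depths):
        if start.month < min_month or start.year in first:
            continue
        # The start day plus the following run_length - 1 days.
        run = [start + timedelta(days=k) for k in range(run_length)]
        if all((depths.get(d) or 0) > 0 for d in run):
            first[start.year] = start  # sorted order, so earliest for the year
    return first
```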
There must be a better way to deal with conditions like this where we’re interested in the consecutive nature of the phenomenon, but I couldn’t figure out any other way to handle it in SQL, so here it is:
```sql
-- For each year, the earliest date (August onward) that starts a run of
-- seven consecutive days with snow on the ground at Fairbanks Airport.
SELECT year, min(date)
FROM (
    SELECT extract(year FROM a.dt) AS year,
           to_char(a.dt, 'MM-DD') AS date
    FROM isd_daily AS a
    -- b through g are the same station on each of the six following days.
    INNER JOIN isd_daily AS b ON a.isd_id = b.isd_id AND a.dt = b.dt - interval '1 day'
    INNER JOIN isd_daily AS c ON a.isd_id = c.isd_id AND a.dt = c.dt - interval '2 days'
    INNER JOIN isd_daily AS d ON a.isd_id = d.isd_id AND a.dt = d.dt - interval '3 days'
    INNER JOIN isd_daily AS e ON a.isd_id = e.isd_id AND a.dt = e.dt - interval '4 days'
    INNER JOIN isd_daily AS f ON a.isd_id = f.isd_id AND a.dt = f.dt - interval '5 days'
    INNER JOIN isd_daily AS g ON a.isd_id = g.isd_id AND a.dt = g.dt - interval '6 days'
    WHERE a.isd_id = '702610-26411'
      AND a.snow_depth > 0 AND b.snow_depth > 0 AND c.snow_depth > 0
      AND d.snow_depth > 0 AND e.snow_depth > 0 AND f.snow_depth > 0
      AND g.snow_depth > 0
      AND extract(month FROM a.dt) > 7  -- start dates in August or later
) AS snow_depth_conseq
GROUP BY year
ORDER BY year;
```
See what I mean? It’s pretty ugly. Running the result through the same R script as in my previous snowfall post yields this plot:
Between 1973 and 2008 we've gotten snow lasting the whole winter starting as early as September 12th (in the infamous 1992) and as late as the first of November (1976). The median date is October 13th, which matches my impression. Now that the leaves have largely fallen off the trees, I'm hoping we get our first true snowfall on the early end of the distribution. We've still got a few things to take care of (a couple of new dog houses, insulating the repaired septic line, etc.), but once those are done, I'm ready for the Creek to freeze and snow to blanket the trails.
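As for a less ugly way to express "seven consecutive days" in SQL, one option is a window function instead of self-joins. Keep only the snow-covered days, order them by date, and look six rows ahead with `lead()`: if that row is exactly six days later, the seven distinct dates in between must be consecutive (assuming one row per station per day). The sketch below demonstrates the idea with Python's built-in `sqlite3` module rather than the PostgreSQL database used above, and needs SQLite 3.25 or newer for window functions. The table layout reuses the column names from the query; `SCHEMA`, `FIRST_WEEK_OF_SNOW`, and `first_week_of_snow` are hypothetical names.

```python
import sqlite3

# Hypothetical table layout reusing the column names from the query above;
# dt is stored as ISO 'YYYY-MM-DD' text, as is usual in SQLite.
SCHEMA = "CREATE TABLE isd_daily (isd_id TEXT, dt TEXT, snow_depth REAL)"

FIRST_WEEK_OF_SNOW = """
SELECT strftime('%Y', dt) AS year,
       min(strftime('%m-%d', dt)) AS first_snow
FROM (
    -- WHERE runs before the window, so lead() only sees snow-covered
    -- days for one station; dt_plus_6 is the sixth such day after dt.
    SELECT dt, lead(dt, 6) OVER (ORDER BY dt) AS dt_plus_6
    FROM isd_daily
    WHERE isd_id = ? AND snow_depth > 0
) AS snow_days
-- Seven distinct dates spanning exactly six days must be consecutive.
WHERE julianday(dt_plus_6) - julianday(dt) = 6
  AND CAST(strftime('%m', dt) AS INTEGER) > 7  -- August or later
GROUP BY year
ORDER BY year
"""


def first_week_of_snow(conn, isd_id):
    """Return [(year, 'MM-DD'), ...]: the earliest start of a seven-day
    snow-covered run in each year for one station."""
    return conn.execute(FIRST_WEEK_OF_SNOW, (isd_id,)).fetchall()
```

In PostgreSQL the same shape should carry over using `lead(dt, 6) OVER (ORDER BY dt)` compared against `dt + interval '6 days'`, which replaces the six self-joins with a single sort of the station's snow-covered days.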