Introduction
For the past two years I’ve played Yahoo fantasy baseball with a group of friends. It’s a fun addition to watching games because it requires you to pay attention to more than just the players on the teams you root for (especially important if your favorite “team” is the Athletics).
Last year we had a draft party and it was interesting to see how different people approached the draft. Some of us chose players for emotional reasons, like whether they played for the team we rooted for or what country they were from, and some used a very analytical approach. The last two years I’ve tended to be more on the emotional side, preferentially choosing former Oakland Athletics players the first year and current Phillies last year. Some brought computers to track choices and rankings, and some didn’t bring anything at all except their phones and minds.
I’ve been working on my draft strategy for next year, and plan to use a more analytical approach to the draft. I’m working on an app that will have all the players in the draft ranked, and will let me easily mark off who has been selected and who I’ve added to my team in real time as the draft is underway.
One of the important considerations for choosing any player is what positions they can play. Not only do you need to field a complete team with pitchers, catchers, infielders, and outfielders, but some players are capable of playing multiple positions, and those players can be more valuable to a fantasy manager than their pure numbers would suggest because you can plug them into different positions on any given day. Last year I had Alec Bohm on my team, which allowed me to fill either first base (typically manned by Vladimir Guerrero Jr.) or third, depending on what teams were playing or who might be injured or getting a day off. I used Brandon Drury to great effect two years ago because he was eligible for three infield positions.
Positional eligibility for Yahoo fantasy follows these rules:
- Position eligibility – 5 starts or 10 total appearances in a position.
- Pitcher eligibility – 3 starts to be a starter, or 5 relief appearances to qualify as a reliever.
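As a quick illustration, the two rules above can be written as small R helpers. This is only a sketch: the function names `is_position_eligible` and `pitcher_roles` are hypothetical, and treating "total appearances" as starts plus substitute appearances is my reading of the rule.

```r
# Hypothetical helper: TRUE if a player qualifies at a fielding position
# (5 starts, or 10 total appearances counting starts and substitutions).
is_position_eligible <- function(starts, subs) {
  starts >= 5 | (starts + subs) >= 10
}

# Hypothetical helper: the pitcher roles a player qualifies for
# (3 starts for SP, 5 relief appearances for RP).
pitcher_roles <- function(starts, relief) {
  roles <- c(SP = starts >= 3, RP = relief >= 5)
  names(roles)[roles]
}

is_position_eligible(starts = 4, subs = 6)  # TRUE: 10 total appearances
is_position_eligible(starts = 2, subs = 3)  # FALSE: neither threshold met
pitcher_roles(starts = 3, relief = 7)       # "SP" "RP"
```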
In this post I will use Retrosheet event data to determine the positional eligibility for all the players who played in the majors last year. In cases where a player in the draft hasn’t played in the majors but is likely to reach Major League Baseball in 2024, I’ll just use whatever position the projections have him in.
Methods
I’m going to use the retrosheet R package to load the event files for 2023, then determine how many games each player started and substituted at each position, and apply Yahoo’s rules to determine eligibility.
We’ll load some libraries, get the team IDs, and map Retrosheet position IDs to the usual position abbreviations.
library(tidyr)
library(dplyr)
library(purrr)
library(retrosheet)
library(glue)
YEAR <- 2023
team_ids <- getTeamIDs(YEAR)
positions <- tribble(
  ~fieldPos, ~pos,
  "1", "P",
  "2", "C",
  "3", "1B",
  "4", "2B",
  "5", "3B",
  "6", "SS",
  "7", "LF",
  "8", "CF",
  "9", "RF",
  "10", "DH",
  "11", "PH",
  "12", "PR"
)
Next, we write a function to retrieve the data for a single team’s home games and extract the starting and substitution information, which is stored as $start and $sub matrices in the Retrosheet event files. Then we map this function over every team and convert the position IDs to the position abbreviations.
# Load one team's home-game event file and return every starter and substitute
get_pbp <- function(team_id) {
  print(glue("loading {team_id}"))
  pbp <- getRetrosheet("play", YEAR, team_id)

  # Each game keeps its starting lineups in the $start matrix
  starters <- map(
    seq_along(pbp),
    function(game) {
      pbp[[game]]$start |>
        as_tibble()
    }
  ) |>
    list_rbind() |>
    mutate(start_sub = "start")

  # In-game substitutions are in the $sub matrix
  subs <- map(
    seq_along(pbp),
    function(game) {
      pbp[[game]]$sub |>
        as_tibble()
    }
  ) |>
    list_rbind() |>
    mutate(start_sub = "sub")

  bind_rows(starters, subs)
}

# Run for every team, then attach the position abbreviations
pbp_start_sub <- map(
  team_ids,
  get_pbp
) |>
  list_rbind() |>
  inner_join(positions, by = "fieldPos")
That data frame looks like this, with one row for every starting assignment or substitution in every game of the 2023 regular season:
# A tibble: 76,043 × 7
   retroID  name                  team  batPos fieldPos start_sub pos
   <chr>    <chr>                 <chr> <chr>  <chr>    <chr>     <chr>
 1 sprig001 George Springer       0     1      9        start     RF
 2 bichb001 Bo Bichette           0     2      6        start     SS
 3 guerv002 Vladimir Guerrero Jr. 0     3      3        start     1B
 4 chapm001 Matt Chapman          0     4      5        start     3B
 5 merrw001 Whit Merrifield       0     5      7        start     LF
 6 kirka001 Alejandro Kirk        0     6      2        start     C
 7 espis001 Santiago Espinal      0     7      4        start     2B
 8 luplj001 Jordan Luplow         0     8      10       start     DH
 9 kierk001 Kevin Kiermaier       0     9      8        start     CF
10 bassc001 Chris Bassitt         0     0      1        start     P
# ℹ 76,033 more rows
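To make the matrix-to-tibble step concrete without downloading anything, here is a small sketch that uses a hand-built character matrix as a stand-in for a single game's $start element. The matrix contents are copied from the output above; the assumption that the real matrices carry exactly these column names is inferred from that output.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for pbp[[game]]$start: one row per starter
start_matrix <- matrix(
  c("sprig001", "George Springer", "0", "1", "9",
    "bichb001", "Bo Bichette",     "0", "2", "6",
    "bassc001", "Chris Bassitt",   "0", "0", "1"),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, c("retroID", "name", "team", "batPos", "fieldPos"))
)

# A cut-down position lookup, just for this example
positions_toy <- tribble(
  ~fieldPos, ~pos,
  "9", "RF",
  "6", "SS",
  "1", "P"
)

# Same steps as in get_pbp(): matrix -> tibble, tag the rows, join the labels
starters_toy <- start_matrix |>
  as_tibble() |>
  mutate(start_sub = "start") |>
  inner_join(positions_toy, by = "fieldPos")

starters_toy
```

Because both fieldPos columns are character, the join needs no type conversion. Any row whose fieldPos is missing from the lookup would be dropped by inner_join().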
Next, we convert that into appearances by grouping the data by player, whether they were a starter or substitute, and by their position. Since each row in the original data frame is per game, we can use n() to count the games each player started and subbed for each position.
appearances <- pbp_start_sub |>
  group_by(retroID, name, start_sub, pos) |>
  summarize(games = n(), .groups = "drop") |>
  pivot_wider(names_from = start_sub, values_from = games)
That looks like this:
# A tibble: 3,479 × 5
   retroID  name          pos     sub start
   <chr>    <chr>         <chr> <int> <int>
 1 abadf001 Fernando Abad P         6    NA
 2 abboa001 Andrew Abbott P        NA    21
 3 abboc001 Cory Abbott   P        22    NA
 4 abrac001 CJ Abrams     SS        3   148
 5 abrac001 CJ Abrams     PH        2    NA
 6 abrac001 CJ Abrams     PR        1    NA
 7 abrea001 Albert Abreu  P        45    NA
 8 abreb002 Bryan Abreu   P        72    NA
 9 abrej003 Jose Abreu    1B       NA   134
10 abrej003 Jose Abreu    DH       NA     7
# ℹ 3,469 more rows
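The NA values appear wherever a player never started (or never subbed) at a position. As a possible variation, not what the code in this post does, pivot_wider() can fill those gaps with zeros directly through its values_fill argument. The toy rows below are made up for illustration, and count() is used as a shorthand for the group_by() and summarize() pair.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical per-game rows: one row per start or substitution
toy <- tribble(
  ~retroID,   ~pos, ~start_sub,
  "abrac001", "SS", "start",
  "abrac001", "SS", "start",
  "abrac001", "SS", "sub",
  "abrac001", "PH", "sub",
  "abboa001", "P",  "start"
)

toy_appearances <- toy |>
  # count() is equivalent to group_by() followed by summarize(games = n())
  count(retroID, pos, start_sub, name = "games") |>
  # values_fill writes 0 instead of NA for missing combinations
  pivot_wider(names_from = start_sub, values_from = games, values_fill = 0L)

toy_appearances
```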
Finally, we calculate eligibility for each player and position, then group by player and combine all the positions they are eligible for into a single string. There’s a little funny business at the end to remove pitching eligibility from position players who were called into action as pitchers in blowout games, and to strip name suffixes such as Jr. or II, which may or may not be necessary for matching against your projection ranks.
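eligibility <- appearances |>
  # Pinch-hit and pinch-run appearances are not fielding positions
  filter(pos != "PH", pos != "PR") |>
  mutate(
    # A missing count means zero games in that role
    sub = if_else(is.na(sub), 0L, sub),
    start = if_else(is.na(start), 0L, start),
    total = sub + start,
    eligible = case_when(
      pos == "P" & start >= 3 & sub >= 5 ~ "SP,RP",
      pos == "P" & start >= 3 ~ "SP",
      pos == "P" & sub >= 5 ~ "RP",
      pos == "P" ~ "P",
      start >= 5 | total >= 10 ~ pos,
      TRUE ~ NA_character_
    )
  ) |>
  filter(!is.na(eligible)) |>
  # Most-played positions first, so a rarely played "P" tends to sort last
  arrange(retroID, name, desc(total)) |>
  group_by(retroID, name) |>
  summarize(
    eligible = paste(eligible, collapse = ","),
    # Drop a trailing "P" from position players who pitched a few times
    eligible = gsub(",P$", "", eligible),
    .groups = "drop"
  ) |>
  mutate(
    # Strip name suffixes; the dot is escaped and the match anchored to the end
    name = gsub(" (Jr\\.|II|IV)$", "", name)
  )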
Here’s a look at the final results. You can download the full data as a CSV file below.
# A tibble: 1,402 × 3
   retroID  name            eligible
   <chr>    <chr>           <chr>
 1 abadf001 Fernando Abad   RP
 2 abboa001 Andrew Abbott   SP
 3 abboc001 Cory Abbott     RP
 4 abrac001 CJ Abrams       SS
 5 abrea001 Albert Abreu    RP
 6 abreb002 Bryan Abreu     RP
 7 abrej003 Jose Abreu      1B,DH
 8 abrew002 Wilyer Abreu    CF,LF
 9 acevd001 Domingo Acevedo RP
10 actog001 Garrett Acton   RP
# ℹ 1,392 more rows
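The two clean-up steps at the end of the pipeline are easy to check in isolation with base R. The strings below are hypothetical examples, and the suffix pattern shown (dot escaped, anchored to the end of the name) is an assumption about how the names are formatted.

```r
# Hypothetical eligibility strings, most-played position first
eligible_raw <- c("SS,2B,P", "RP", "P", "2B,SS,3B,RP")

# Only a trailing ",P" is removed; "RP" and a lone "P" are left alone
eligible_clean <- gsub(",P$", "", eligible_raw)
eligible_clean

# Hypothetical names with and without suffixes
names_raw <- c("Vladimir Guerrero Jr.", "Michael Harris II", "CJ Abrams")
names_clean <- gsub(" (Jr\\.|II|IV)$", "", names_raw)
names_clean
```

Note that the first step only catches a bare "P". A position player with five or more relief appearances still keeps an "RP" tag, so it is worth scanning the results for cases like that.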
Who is eligible for the most positions? Here's the top 20:
   retroID  name              eligible
   <chr>    <chr>             <chr>
 1 herne001 Enrique Hernandez SS,2B,CF,3B,LF,1B
 2 diaza003 Aledmys Diaz      3B,SS,LF,2B,1B,DH
 3 hampg001 Garrett Hampson   SS,CF,RF,2B,LF
 4 mckiz001 Zach McKinstry    3B,2B,RF,LF,SS
 5 ariag002 Gabriel Arias     SS,1B,RF,3B
 6 bertj001 Jon Berti         SS,3B,LF,2B
 7 biggc002 Cavan Biggio      2B,RF,1B,3B
 8 cabro002 Oswaldo Cabrera   LF,RF,3B,SS
 9 castw003 Willi Castro      LF,CF,3B,2B
10 dubom001 Mauricio Dubon    2B,CF,LF,SS
11 edmat001 Tommy Edman       2B,SS,CF,RF
12 gallj002 Joey Gallo        1B,LF,CF,RF
13 ibana001 Andy Ibanez       2B,3B,LF,RF
14 newmk001 Kevin Newman      3B,SS,2B,1B,DH
15 rengl001 Luis Rengifo      2B,SS,3B,RF
16 senzn001 Nick Senzel       3B,LF,CF,RF
17 shorz001 Zack Short        2B,SS,3B,RP
18 stees001 Spencer Steer     1B,3B,LF,2B,DH
19 vargi001 Ildemaro Vargas   3B,2B,SS,LF
20 vierm001 Matt Vierling     RF,LF,3B,CF
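One way to produce a ranking like this is to count the comma-separated entries in each eligibility string and sort on that count. The sketch below does so in base R on a four-row sample taken from the results above; the column name n_positions is my own, and this is not necessarily how the list above was generated.

```r
# A small sample of the eligibility table
eligibility_toy <- data.frame(
  name = c("CJ Abrams", "Jose Abreu", "Enrique Hernandez", "Jon Berti"),
  eligible = c("SS", "1B,DH", "SS,2B,CF,3B,LF,1B", "SS,3B,LF,2B")
)

# Count the positions in each string (hypothetical column name)
eligibility_toy$n_positions <- lengths(strsplit(eligibility_toy$eligible, ","))

# Sort from most to fewest positions and keep up to 20 rows
most_flexible <- head(
  eligibility_toy[order(-eligibility_toy$n_positions), ],
  20
)

most_flexible
```

This counts every entry in the string, so DH and any pitcher role are included in the total.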
References and Acknowledgements
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.