/r/Sabermetrics
Sabermetrics - The search for objective knowledge about baseball through the analysis of empirical evidence.
Data Sources |
---|
Retro Sheet |
Sean Lahman Database |
DingerDB |
Fangraphs |
Baseball Reference |
Stat Corner |
Baseball Heat Maps |
Pitch F/X |
---|
Brooks Baseball Pitch f/x |
Baseball Savant |
TexasLeaguers |
Related Subreddits |
---|
/r/baseball |
/r/baseballstats |
/r/fantasybaseball |
/r/sultansofstats |
/r/sportsanalytics |
/r/footballstrategy |
/r/nflstatheads |
Is anyone familiar with any attempts to quantify the expected cost of pitch tipping? My group chat sent this tweet:
https://x.com/jomboy_/status/1842062696847393120?s=46&t=WHf4nK-muUXyQhXDAWyXMA
They suggested Devin Williams got rocked because of this, but after watching the video I remained a bit skeptical because it was so subtle. I watched the video in the first comment, by Trevor May, where he walks through David Bednar’s performance and argues that he was tipping his pitches (which I can get on board with, given the more visible changes and the continual steep drop in performance this year).
But for a one-game blowup, it seems unlikely that Williams either didn’t tip his pitches all year or did without teams picking up on it, until the Mets caught it in the postseason.
So I was trying to approximate the likelihood, using Bednar’s year-over-year change in expected ERA to guesstimate the impact on performance and assess the relative likelihoods, but I was wondering if anyone else has done this more quantitatively and systematically.
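One rough way to frame the "relative likelihoods" question is as a likelihood ratio: how much more probable is a given blowup if the pitcher is tipping than if he is not? Here is a minimal sketch, assuming runs allowed in an outing follow a Poisson distribution. The run rates and the prior are made-up placeholders, not estimates for Williams or Bednar, and `posterior_prob_tipping` is a hypothetical helper name.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k runs when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def posterior_prob_tipping(runs, innings, era_normal, era_tipping, prior):
    """Update a prior probability of tipping after one outing.

    era_normal / era_tipping are hypothetical run rates per 9 innings.
    """
    lam_normal = era_normal * innings / 9
    lam_tipping = era_tipping * innings / 9
    likelihood_ratio = poisson_pmf(runs, lam_tipping) / poisson_pmf(runs, lam_normal)
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Placeholder inputs: a 4-run inning, 2.00 vs 5.00 run rates, 5% prior.
p = posterior_prob_tipping(runs=4, innings=1, era_normal=2.0, era_tipping=5.0, prior=0.05)
print(round(p, 3))
```

The answer is only as good as the prior and the assumed "tipping" run rate, which is where something like Bednar's year-over-year change could serve as a guess.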
So, over the summer, as an experiment, I tried to come up with a run prediction formula based solely on XBH. Without getting too technical, I assigned a value for 2B+3B, a value for HR, and a value for HR per 2B+3B. I didn't factor in BB rate or exit velocity. I based my values solely on 2023 league averages.
Once I set this up, I went team by team for 2023 and found that my formula correlated with total runs by about 95.5 percent, almost identical to the "technical" Runs Created formula based on Bill James's work, and was more predictive than OPS. I then tested my formula on every team in 2022, which led to a 97.1% correlation, and every team in 2021, which ended up at 96.2%. While I haven't yet gone team by team prior to 2021, I tested it against league averages for each year from 2010-2019, and this still produced a correlation of 95.5%, so I had hope that I might be on to something.
However, when crunching team-by-team 2024 numbers, the James model resulted in its usual 96%, whereas my model suddenly dropped to 90%. Specifically, it tended to underrate good offenses and overrate bad ones by a much larger degree than in the three previous years. So my question is: what was different about this season that could've led to this result? What would've caused a 96% correlation based on 110 samples to dip to 90% in this year's 30 samples? When searching everything available on FanGraphs, I wasn't noticing anything that seemed obviously different this season.
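Part of the 96%-to-90% question is how much a correlation can move from sampling noise alone when there are only 30 teams. Here is a minimal sketch using the Fisher z-transform. It assumes the percentages quoted are Pearson correlations (not R²) and that the data are roughly bivariate normal; `correlation_ci` is a hypothetical helper name.

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson r via Fisher's z."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

low, high = correlation_ci(r=0.96, n=30)
print(f"n=30:  {low:.3f} to {high:.3f}")
low_big, high_big = correlation_ci(r=0.96, n=110)
print(f"n=110: {low_big:.3f} to {high_big:.3f}")
```

Comparing the two intervals gives a feel for how much wider the plausible range is for a single 30-team season than for the pooled sample.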
As an aside, have any of you tried a similar experiment? And if so, what did you find?
I was talking to a friend about today's Mets-Braves game as compared to Royals-A's in 2014, visually comparing the WPA charts, and I suggested that WPA charts would better show the action if they were on a log scale, since, say, a 3-run homer in a 1-0 game in the third inning would make the chart swing steeply from like 65% to 30% despite not really making for a "crazy" game.
Anyone know where I can find something like that? Or maybe the best way to download a CSV/Excel file of individual games' WPAs so I can do it myself?
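One reading of "log chart" here is to plot win probability in log-odds instead of raw percent, which compresses swings around 50% and stretches swings near 0% or 100%. Here is a minimal sketch of that transform. The win-probability series is invented for illustration, and the clamp value `eps` is an arbitrary choice.

```python
import math

def log_odds(win_prob, eps=1e-6):
    """Map a win probability in (0, 1) to log-odds; clamp to avoid infinities."""
    p = min(max(win_prob, eps), 1 - eps)
    return math.log(p / (1 - p))

# Hypothetical win-probability series for one team, one value per play.
wp_series = [0.50, 0.65, 0.30, 0.28, 0.90, 0.99, 1.00]
transformed = [log_odds(p) for p in wp_series]
for p, value in zip(wp_series, transformed):
    print(f"{p:.2f} -> {value:+.2f}")
```

On this scale the 65%-to-30% swing from the example above covers less vertical distance than a late move from 90% to 99%.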
Are there any sites to search for L/R batting splits for the 1980s? FanGraphs only shows them on a league-wide scale for 21st-century players. BRef shows them for individual players, but I can't find where to search for them on a league-wide scale there either.
Not a specifically sabermetric question, but I assumed this subreddit would be the better one to ask.
Edit: To be more specific, I want to sort through players by splits (similar to how you can on FanGraphs for seasons from the past 20 years).
I was wondering if there was publicly available code to recreate a 3D pitch trajectory plot given Trackman data.
I've seen Scott Powers' work (https://github.com/saberpowers/predictive-pitch-score/blob/main/package/predpitchscore/R/get_quadratic_coef.R) and can create a dataframe from it; I just want to be able to plot the pitches and show their trajectories.
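For the plotting step, here is a minimal sketch of turning per-pitch quadratic coefficients into 3D points, assuming a constant-acceleration model on each axis. The parameter names (`x0`, `vx0`, `ax`, and so on) and the sample values are placeholders, not Trackman's actual column names or real tracking data, and this is not taken from the linked repository.

```python
import math

def pitch_trajectory(x0, y0, z0, vx0, vy0, vz0, ax, ay, az, y_plate=17 / 12, steps=20):
    """Return (x, y, z) points in feet from the tracked start point to the plate.

    Assumes constant acceleration on each axis, with y pointing from home
    plate toward the mound (so vy0 is negative).
    """
    dy = y0 - y_plate
    if ay == 0:
        t_plate = -dy / vy0
    else:
        # Positive root of y0 + vy0*t + 0.5*ay*t^2 = y_plate.
        t_plate = (-vy0 - math.sqrt(vy0 ** 2 - 2 * ay * dy)) / ay
    points = []
    for i in range(steps + 1):
        t = t_plate * i / steps
        points.append((
            x0 + vx0 * t + 0.5 * ax * t ** 2,
            y0 + vy0 * t + 0.5 * ay * t ** 2,
            z0 + vz0 * t + 0.5 * az * t ** 2,
        ))
    return points

# Made-up fastball-like values, not taken from real tracking data.
path = pitch_trajectory(x0=-1.5, y0=50.0, z0=5.8, vx0=5.0, vy0=-135.0, vz0=-6.0,
                        ax=-10.0, ay=28.0, az=-15.0)
print(path[-1])
```

The resulting list can be unzipped into xs, ys, zs and handed to any 3D plotting tool, for example a matplotlib axis created with `projection="3d"`.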
Noob sabermetrics enjoyer here. Let me start by saying I'm in no way bashing Judge; I think he is amazing.
I'm looking at fWAR. I was wondering if someone can point out why Judge's Off value is 96.2, or 16.3 points higher than Ohtani's, which is 79.9. Off is computed by adding Batting Runs + BsR. In the latter, Ohtani crushes Judge (9.2 vs -0.5; Ohtani is the second-best baserunner in MLB), so this means that their Batting Runs values are 70.7 for Ohtani vs 96.7 for Judge! A difference of 26 points.
Now, of course there's a reason for it; it is math. I just want to understand better what counts toward Batting Runs. Is it because of the +4 HR, +14 RBI, and +.016 points of batting average? Or is there something else I'm missing?
PS: Are RBI counted in Off? Or does the computation account for the fact that they strongly depend on teammates getting on base?
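The arithmetic above can be checked directly, and it helps to see what goes into batting runs. As far as I know, FanGraphs builds batting runs from wRAA (plus park and league adjustments), and wRAA depends only on wOBA and plate appearances, so RBI are not an input. Here is a minimal sketch; the hitter in the last line is hypothetical, not Judge or Ohtani.

```python
def wraa(woba, league_woba, woba_scale, pa):
    """Weighted runs above average: ((wOBA - lgwOBA) / wOBA scale) * PA."""
    return (woba - league_woba) / woba_scale * pa

# Check of the arithmetic in the post: Batting = Off - BsR.
judge_batting = 96.2 - (-0.5)
ohtani_batting = 79.9 - 9.2
print(round(judge_batting, 1), round(ohtani_batting, 1),
      round(judge_batting - ohtani_batting, 1))

# Hypothetical hitter: .400 wOBA in a .310 league, scale 1.25, 650 PA.
print(round(wraa(0.400, 0.310, 1.25, 650), 1))
```

Because the gap is multiplied by PA, a hitter with a higher wOBA over a similar number of plate appearances pulls ahead quickly, independent of HR or RBI totals on their own.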
It appears the rolling xwOBA charts for pitchers have been replaced by a "movement profiles" chart. I have been searching for a way to switch back or find the same charts that they used to post. Does anyone know how to find these red/blue xwOBA charts?
What is the one sabermetric stat that most correlates with total runs scored for a team in a season?
At what point in a season do "expected" stats start to correlate with actual numbers? In other words, if an xwOBA-wOBA split is large after the first 30 games, do they usually come close to each other by the 80th game?
Each mlb.com team has an injury and roster moves page (not an article) like this one for the Braves:
https://www.mlb.com/news/braves-injuries-and-roster-moves
All of the teams can be found from links here:
https://www.mlb.com/injury-report
I'd love to find a way to see if any new information has been added to them, or to copy all the text from them to a doc (e.g., Google Docs) so I could search them by date. Any suggestions? Thanks.
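One simple approach to spotting new information is to store a hash of each page's text and compare it on the next visit. Here is a minimal sketch using only the standard library. `fetch_text` and `has_changed` are hypothetical helper names, and the demo feeds in literal strings instead of downloading anything.

```python
import hashlib
import urllib.request

def fetch_text(url):
    """Download a page and return its body as text (not called in the demo)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def has_changed(url, page_text, seen_hashes):
    """Return True if page_text differs from the last version seen for url.

    seen_hashes is a dict of url -> SHA-256 hex digest, updated in place.
    """
    digest = hashlib.sha256(page_text.encode("utf-8")).hexdigest()
    changed = seen_hashes.get(url) != digest
    seen_hashes[url] = digest
    return changed

seen = {}
url = "https://www.mlb.com/news/braves-injuries-and-roster-moves"
print(has_changed(url, "first snapshot", seen))
print(has_changed(url, "first snapshot", seen))
print(has_changed(url, "updated snapshot", seen))
```

Note that the first sighting of a page counts as a change. A raw page may also include markup that varies between loads, so in practice the article text would probably need to be extracted before hashing. Saving each changed snapshot under a dated filename would make the history searchable by date.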
I'm sure we've all heard that pitchers tend to spin it better when they throw harder but it's definitely more nuanced than that.
This covers every pitch in the majors and minors since 2020 that was thrown at least 200 times. Included are the correlation, slope, and intercept of velo and spin rate for each pitch. I also set up a few more columns for perspective: the min, med, and max of velo and spin rate, and the expected spin at the min, med, and max velo and at velocities from 65-105 mph. I added a few pivot tables to help sort through the data. If you just want to use it to see which random minor league guys spin the best breakers, though, go ahead.
It's immediately apparent that there is quite a bit of variance in how spin changes with velocity. Some guys consistently run high correlations while many others have basically none. Most people gain some spin as they throw harder, but some guys gain a ton while some guys actually lose spin.
There's definitely more to investigate here. It could be good for investigating how individual pitchers' stuff will change in varying roles.
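For anyone who wants to reproduce the per-pitch columns, here is a minimal sketch of the correlation, slope, intercept, and expected-spin calculation using ordinary least squares. The velocities and spin rates are invented, and this is my reading of how the columns could be built, not necessarily how the spreadsheet was made.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y on x; returns (slope, intercept, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Made-up velocities (mph) and spin rates (rpm) for one pitch type.
velo = [93.0, 94.5, 95.0, 96.2, 97.1, 98.0]
spin = [2250, 2290, 2310, 2335, 2370, 2385]
slope, intercept, r = fit_line(velo, spin)
expected_spin_at_95 = intercept + slope * 95.0
print(round(slope, 1), round(r, 3), round(expected_spin_at_95))
```

A positive slope means spin is gained with velocity, a negative slope means it is lost, and `r` near zero corresponds to the guys with basically no relationship.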
https://docs.google.com/spreadsheets/d/1hxWx6e81YR4_VeEaIRYPZ_qEG39DVrlJj3ST1J8LEWE/edit?usp=sharing
Are Stuff+ models even worth looking at for evaluating MLB pitchers? Every model I've looked into, whether logistic regression, random forest, or XGBoost (what's used in industry), has an extremely small R^2 value. In fact, I've never seen a model with an R^2 value > 0.1.
This suggests that the models cannot accurately predict changes in run expectancy for a pitch based on its characteristics (velo, spin rate, etc.), and that the conclusions we take away from them, especially about increasing pitchers' velo and spin rates, are not that meaningful.
Adding pitch sequencing, batter statistics, and pitch location is supposed to add a lot more predictive power to these types of pitching models, which is why Pitching+ and Location+ exist as alternatives. However, even adding these variables does not increase the R^2 value significantly.
Are these types of X+ pitching statistics ill-advised?
I've been experimenting with stuff models, pitch classification, and minor league pitch data. I need to do more with tuning and validating, but current performance looks quite good and I will definitely have more to show y'all 'eventually'. Until then, with Jackson Jobe on his way to Detroit, I wanted to look at his MiLB stuff. Some data below for the fellow autists.
He’s been sitting 96-97 mph with the fastball the last two years and is a premium fastball spinner. However, that's slightly stifled by being a short-extension guy with an average release height. He's started cutting his fastball a bit this year; it's giving him better seam effects, but he’s also lost some spin and movement. It should help him against same-handed hitters (SHH), but it looks worse against opposite-handed hitters (OHH).
He's been a 3,000+ rpm breaking ball guy before, but he’s lost a little spin on the breakers in '24 as well. The shape is basically identical, though. A cutter-slider sits around 90 mph, and a big sweeper around 83. A mid-80s changeup seems unremarkable.
His median pitches look 50-65 grade on the 20-80 scale, but his 95th-percentile-and-up pitches look elite, and he is going to be pitching in the bullpen for now. Some control metrics don't love his use of any pitch, but nothing looks particularly bad. His profile honestly looks like a younger, higher-octane Randy Vásquez. Not the most flattering comp, but overall still exciting.
If this stuff interests y'all, leave some more names for me. Minor leaguers must have pitched in AAA or FSL-A.
https://docs.google.com/spreadsheets/d/1JTBAFxldDFENi3iWugQucg5-Jeq53CNkUq4N_gw8MBg/edit?usp=sharing
Are any of you aware of a paper (or other published piece) providing a way to measure reaction time to pitches?
Would the beginning of bat movement be a good estimator for this?
Having a solid estimator for the time it takes for a batter to decide whether to swing or not would be awesome.
Looking forward to any ideas you all have!
Given two players, if all averaged stats are equal (batting avg, walks per 9, SOs per 9, ...) and hit results (singles, doubles, ...) proportional to at-bats are the same, would the player with the higher number of at-bats have a higher WAR?
I want to download every pitch from this season thrown by pitchers who have thrown over 500 pitches. I thought I had this; however, when I downloaded the CSV file, it only gave me 25,000 rows. I was expecting it to be in the hundreds of thousands. How can I do this?
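If the 25,000 rows reflect a per-query cap (an assumption on my part), a common workaround is to split the season into short date windows, download each window separately, concatenate the results, and only then filter by pitch count. Here is a minimal sketch of the windowing and filtering logic. The `"pitcher"` key is an assumed column name, and the download step itself is left out; packages such as pybaseball and baseballr offer date-range queries that could fill it in.

```python
from collections import Counter
from datetime import date, timedelta

def date_windows(start, end, days=7):
    """Split [start, end] into consecutive inclusive windows of at most `days` days."""
    windows = []
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        windows.append((current, window_end))
        current = window_end + timedelta(days=1)
    return windows

def pitchers_over(pitch_rows, min_pitches=500):
    """Return the set of pitcher ids with more than min_pitches rows."""
    counts = Counter(row["pitcher"] for row in pitch_rows)
    return {pid for pid, n in counts.items() if n > min_pitches}

# Hypothetical date range; each window would be downloaded separately
# and the resulting rows concatenated before filtering.
for start, end in date_windows(date(2024, 3, 28), date(2024, 4, 20)):
    print(start.isoformat(), end.isoformat())
```

Filtering after concatenation matters because a pitcher's 500 pitches are spread across many windows.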
Hi, I am looking for data that will have a row for each plate appearance by a batter and the result of that plate appearance, specifically including if an RBI was recorded on that play.
For example, for Marcell Ozuna, I can get his Game Logs anywhere, but when I break it down to a Play Log or Plate Appearance log, I can't find whether an RBI was recorded or not, whether in the FanGraphs Play Log (https://www.fangraphs.com/players/marcell-ozuna/10324/play-log?position=OF) or Savant's Statcast search. Yes, it tells me in a text field whether someone scored or not, but an RBI is not credited every time someone scores. I also could not find a Play Log on Baseball Reference (maybe I am missing it).
Thanks
As the title says, I've been having an issue with scraping Baseball Savant with baseballr. I presume this has to do with the addition of the bat-speed-based columns. If anyone has a workaround or a fix, please let me know.
Question for the older baseball fans who might be in this sub: was there ever a vocal opposition to the metrics invented by Bill James?
James is the originator of game score, range factor, similarity scores, power/speed, and MANY other measures which are now widely accepted and available on virtually any baseball stats resource (whether or not they're all that useful in 2024).
Considering that in modern times there are older, more traditional baseball fans who still haven't even tried to understand WAR, outs above average etc, it's easy to imagine a block of old-heads who fully opposed James' statistical innovations.
It can be frustrating to hear MLB Network analysts reject even the simplest advanced metrics and complain about "launch angle ruining baseball," and I'm curious if fans, broadcasters, and writers shit on Bill James back in the day.
Any response appreciated
I'm writing a paper for school about TJ and the endless pursuit of velocity. I wanted to include a bit about splits versus higher velocities to assert that some of that overthrowing is grounded in analytics, but I can't figure out how to find the league-wide slash line versus different pitch velocities, whether on Savant, Baseball Reference splits, or FanGraphs. Any help would be greatly appreciated.
Is there any public site that tracks a player's changes in WAR on a game-by-game basis? Specifically, I'm interested in seeing how WAR accrues and diminishes throughout the season in a game-log-type format, but WAR isn't included among the statistics on either BBRef's or FanGraphs' game log pages.
I'm not the data scientist that a lot of you in this community seem to be (so I'm not about to do coding to create such a tool myself) but I'm deeply intrigued by statistical analysis of the game nonetheless and this would be helpful in getting a better understanding of how game performance translates to WAR totals. As it stands now, I can only watch a specific player's WAR total fluctuations daily and then surmise how the last game affected it. It would be much more useful if I could look back at the whole season and view the changes.
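For what it's worth, the "watch the daily total and surmise" approach can be turned into a rough game log with very little code: record the cumulative total each day and take differences. Here is a minimal sketch with invented numbers; `war_game_log` is a hypothetical helper name.

```python
def war_game_log(snapshots):
    """Turn cumulative WAR snapshots into per-date changes.

    snapshots is a list of (date_string, cumulative_war) in date order.
    """
    log = []
    previous = 0.0
    for day, total in snapshots:
        log.append((day, round(total - previous, 2)))
        previous = total
    return log

# Made-up daily totals recorded by hand from a leaderboard.
snapshots = [("2024-04-01", 0.1), ("2024-04-02", 0.1),
             ("2024-04-03", 0.3), ("2024-04-04", 0.2)]
for day, change in war_game_log(snapshots):
    print(day, change)
```

One caveat: some day-to-day movement may come from league-wide inputs being recalculated, not from the player's last game, so the differences are only an approximation.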
been getting this error and can't figure out how to fix it
I recently developed a site to show the uncertainty between different WAR implementations: https://clearingthefog.github.io/pages/player_comparisons.html
It combines and permutes the WAR components of Baseball Reference, FanGraphs, and Baseball Prospectus to estimate uncertainty of each player's WAR totals, and lets you compare players head to head.
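To illustrate the general idea of permuting components, here is a minimal sketch that mixes and matches each WAR component across three sources and summarizes the spread of the resulting totals. This is one plausible reading of "combines and permutes", not necessarily the method the site uses. The source names, component names, and values are all invented.

```python
from itertools import product
from statistics import mean, pstdev

def war_spread(components_by_source):
    """Mix and match components across sources and summarize the totals.

    components_by_source maps source -> {component name: value}; every
    source must report the same component names.
    """
    sources = list(components_by_source)
    names = list(components_by_source[sources[0]])
    totals = []
    # One source choice per component, e.g. batting from A, fielding from B.
    for choice in product(sources, repeat=len(names)):
        totals.append(sum(components_by_source[src][name]
                          for src, name in zip(choice, names)))
    return min(totals), mean(totals), max(totals), pstdev(totals)

# Made-up component values in wins; source names are placeholders.
player = {
    "source_a": {"batting": 5.0, "baserunning": 0.3, "fielding": 0.5},
    "source_b": {"batting": 4.6, "baserunning": 0.1, "fielding": -0.2},
    "source_c": {"batting": 4.9, "baserunning": 0.2, "fielding": 0.9},
}
low, avg, high, sd = war_spread(player)
print(round(low, 2), round(avg, 2), round(high, 2), round(sd, 2))
```

With three sources and three components there are 27 combinations, and the gap between the lowest and highest totals gives a crude sense of how much the choice of implementation matters for one player.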
I've included some example figures, but the site has lots more (and accompanying explanatory text). I'd be curious to get some feedback from you sabermetricians before I try to share it with the general public.
Tom Tango approved! https://x.com/tangotiger/status/1832818215338094624