/r/Sabermetrics


Sabermetrics - The search for objective knowledge about baseball through the analysis of empirical evidence.

Sabermetrics Analysis
Baseball Prospectus
Beyond the Box Score
Fangraphs
Hardball Times
High Heat Stats
Tom Tango
Tango Tiger Wiki
Balls and Strikes
Baseball Think Factory
Baseball Analysts
The Physics of Baseball, Alan Nathan
Baseball HQ Research and Analysis
Sabermetrics 101: Introduction to Baseball Analytics
Data Sources
Retro Sheet
Sean Lahman Database
DingerDB
Fangraphs
Baseball Reference
Stat Corner
Baseball Heat Maps
Pitch F/X
Brooks Baseball Pitch f/x
Baseball Savant
TexasLeaguers
Books
The Book: Playing the Percentages in Baseball
The Hidden Game of Baseball
Baseball Between the Numbers
Extra Innings: More Baseball Between the Numbers
The Bill James Historical Baseball Abstract
Curve Ball
The Baseball Economist
The Numbers Game
The Extra 2% - Jonah Keri
Big Data Baseball
Dollar Sign on the Muscle
Analyzing Baseball Data with R
Baseball Hacks: Tips & Tools for Analyzing and Winning with Statistics
The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball
Trading Bases
AL East AL Central AL West
Yankees Tigers Oakland
Orioles WhiteSox Rangers
Rays Royals Angels
Blue Jays Indians Mariners
Red Sox Twins Astros
NL East NL Central NL West
Nationals Reds Giants
Braves Cardinals Dodgers
Phillies Brewers D-Backs
Mets Pirates Padres
Marlins Cubs Rockies
Related Subreddits
/r/baseball
/r/baseballstats
/r/fantasybaseball
/r/sultansofstats
/r/sportsanalytics
/r/footballstrategy
/r/nflstatheads
Misc.
/r/Sabermetrics Weekly Stat Discussions
Reddit Markdown Primer - how to make charts, other stuff in reddit

13,617 Subscribers

2

Fangraphs fielding value?

Hello all, I have a feeling I’m being stupid, but I am at a loss figuring out how FanGraphs calculates the “fielding” component of fWAR.

The original write-up states that it’s UZR, which was replaced with OAA in 2022. If I look at Lindor, for instance, his OAA is 16 and his FRV is 12 (this matches the Statcast leaderboard). Somehow, though, this becomes 10.8 runs in the actual fielding component of his WAR. What accounts for that -1.2 runs?

2 Comments
2024/10/30
23:57 UTC

2

Is there a site or database that has biographical data like height and weight by season? I'm trying to use this for a statistics project

So my current plan is to analyze BMI as an indicator of performance, and also weight and height individually, but it seems like I can only get either the current or last-updated biographical data. Is there anywhere that has records by season? Baseball Reference mentions only maintaining data since 2012, but I can't seem to find historical biographical data.

3 Comments
2024/10/30
00:45 UTC

5

wOBA calculation question

hey, managed to calculate the RE24 table and about to implement calculating wOBA for my project, but one thing doesn't really check out in my head.

Let's say that the bases are loaded with 0 out, and that the RE24 entry for that state is 2.2

the batter hits a grand slam. this counts as 4 runs

bases are now clear with 0 out, the RE24 entry is 0.5

thus, to capture the run value of that particular grand slam, does it add up to 4+(0.5-2.2)=2.3?
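The arithmetic above can be sketched in Python as the runs that scored on the play plus the change in run expectancy. This is a minimal sketch: the function name is made up, and the two run-expectancy values (2.2 and 0.5) are just the example numbers from this post, not entries from a real table.

```python
def play_run_value(runs_scored, re_before, re_after):
    """Run value of one play: runs that scored plus the change in run expectancy."""
    return runs_scored + (re_after - re_before)

# Grand slam with the bases loaded and 0 out, using the example values above
value = play_run_value(runs_scored=4, re_before=2.2, re_after=0.5)
print(round(value, 1))
```

If a play ends the inning, `re_after` would be 0 under this approach, since no further runs can be expected in that half-inning.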

6 Comments
2024/10/28
19:06 UTC

3

Calculating players with gaps between appearances of at least five years.

I am working on a SABR BioProject for a player who had a six-year gap between appearances. I would like to know how rare it is to have a gap of at least five years between appearances, post-1980. Does anyone know if this report could be run on Retrosheet or Stathead?
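If the data can be exported as one row per player per season played, a minimal sketch of the gap search could look like the following. It assumes `(player_id, year)` pairs (for example, from a season-level appearances table), measures a gap as the difference between consecutive seasons played, and applies the year cutoff to the return season. All of those choices, and the sample rows, are assumptions for illustration.

```python
from collections import defaultdict

def find_gaps(appearances, min_gap=5, since=1980):
    """Find players whose consecutive seasons played are at least min_gap years apart.

    appearances: iterable of (player_id, year) pairs, one per season with a game played.
    Returns a dict mapping player_id to a list of (previous_year, return_year) pairs.
    """
    seasons = defaultdict(set)
    for player_id, year in appearances:
        seasons[player_id].add(year)

    gaps = {}
    for player_id, years in seasons.items():
        ordered = sorted(years)
        found = [
            (prev, nxt)
            for prev, nxt in zip(ordered, ordered[1:])
            if nxt - prev >= min_gap and nxt >= since
        ]
        if found:
            gaps[player_id] = found
    return gaps

# Made-up rows for illustration only
rows = [("aaa01", 1995), ("aaa01", 1996), ("aaa01", 2002),
        ("bbb01", 2001), ("bbb01", 2002), ("bbb01", 2004)]
print(find_gaps(rows))
```

Dividing the number of players returned by the number of distinct players in the same period would give a rough rarity estimate.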

2 Comments
2024/10/26
17:00 UTC

8

Mass downloading data from baseball savant for ML project

Hi everyone, I’m currently a statistics master’s student, and for my final project this quarter I’m planning an ML project that uses pose estimation and other contextual data to predict the risk of TJ surgery/UCL injury. I know that Baseball Savant has video of every pitch thrown on its website, and I’ve been downloading videos manually so far. Recently, however, I met with my project mentor, and he’s worried I won’t be able to build a large enough dataset in the time available, so I wanted to ask if there’s any way to mass-download videos of pitches for certain players in certain time frames. I’ve done some digging and can’t find a good way, so I wanted to reach out to this community and see if there are any ideas. I also want to make sure I don’t run afoul of MLB’s policies when doing this, so please let me know if there are considerations there as well. Appreciate any help or advice, thanks!

17 Comments
2024/10/25
19:52 UTC

5

What is the IP equivalent to 650 PA?

I don’t know if this is much of a sabermetrics question but I can’t seem to find the answer anywhere
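One rough way to frame the conversion is to divide plate appearances by the number of batters a pitcher faces per inning. The sketch below assumes a rate of about 4.3 batters per inning, which is only a ballpark figure I picked for illustration; swap in the actual league or pitcher rate.

```python
def pa_to_ip(plate_appearances, batters_per_inning=4.3):
    """Convert batters faced to innings, given an assumed batters-per-inning rate."""
    return plate_appearances / batters_per_inning

# Result is in decimal innings, not baseball's .1/.2 notation
print(round(pa_to_ip(650), 1))
```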

15 Comments
2024/10/24
12:59 UTC

7

A quick look at the payrolls and revenues of past World Series winners

https://preview.redd.it/i76kqw80hfwd1.png?width=1452&format=png&auto=webp&s=1f3bd13ff361cf420149bd81f7d992a513eb935d

With team finance talks surfacing in light of the upcoming Yankees-Dodgers Fall Classic, I figured I would look at past World Series winners' spending habits.

Explanation

The two dimensions of this graph are Payroll+ (x-axis) and Revenue+ (y-axis). Opening day payroll data are widely available (I gathered them from here). Revenue data were estimated based on information from here, which is why I've only gone back to 2003. I've used the "plus" version of each to indicate how they relate to league average. If you're familiar with how stats like wRC+ and ERA+ work, this is the same concept: League average is fixed to 100. So if a team's Payroll+ is 120 for example, that means their payroll was 20% higher than the average team's that season.
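As a minimal sketch of the "plus" scaling described above (the function name and the payroll figures are made up for illustration):

```python
def plus_stat(team_value, league_values):
    """Scale a team's value so that the league average equals 100."""
    league_values = list(league_values)
    league_avg = sum(league_values) / len(league_values)
    return 100 * team_value / league_avg

# Made-up opening day payrolls in millions of dollars
payrolls = {"Team A": 120.0, "Team B": 100.0, "Team C": 80.0}
payroll_plus = {team: plus_stat(value, payrolls.values()) for team, value in payrolls.items()}
print(payroll_plus)
```

Here Team A comes out at 120, meaning a payroll 20% above the average of the three teams. Revenue+ would use the same function with revenue figures.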

Key Takeaways

The clearest conclusion to draw from this graph is how positively correlated payroll and revenue are. This is no surprise, as teams that make more money will have more money to spend on players and win more games. But let's look at the interesting data points:

  • 2003 Florida Marlins: The biggest financial underdog to win the World Series in this time frame, the Marlins were the only team to rank substantially below average in both revenue and payroll (they were bottom third that year). Interestingly, their revenue was pretty much commensurate with their payroll, so it's not like they relatively overspent to contend. Had they fallen short, the Yankees would've snagged yet another title. Speaking of...
  • 2009 New York Yankees: The only World Series winner in this time frame to sport an opening day payroll over twice as large as league average. And hey, they only moderately overspent relative to their revenue, so why not? Just as interesting is the fact that they only won it once despite being top 2 in payroll for all but four of these years.
  • Despite most World Series winners being above average in both payroll and revenue, a little over half of them were within 25% of the average in both. The remaining teams tended to be the big market heavy hitters (Yankees, Dodgers, Red Sox x4). The way World Series champions are determined simply won't allow those large markets to win all the time.
  • The average World Series winner throughout this time period spent 29% more than average on payroll and earned 22% more than average in revenue. The payroll difference being a little larger than the revenue difference tells us that World Series winners have overspent relative to their revenue more often than not. This is also usually what fans want (especially fans of non-big markets that know not to expect extravagant revenues).
    • The most obvious example of this is the Mets, with Cohen spending on payroll with reckless abandon recently--something I'd imagine not many of their fans are unhappy about. If the Mets win a World Series soon, I would anticipate their data point being far closer to the bottom right of this graph than everyone else's. The teams on the opposite end of this spectrum are usually those with owners often derided for being cheap.
  • The World Series winner that overspent the most relative to their revenue was the 2019 Washington Nationals (though that trend has since reversed to how it was for them ~15 years ago). They were the only winner with a Payroll+ above 125 that brought in below-average revenue. Those who also overspent relative to revenue were last year's Rangers and most of those Red Sox teams.
  • The World Series winner that underspent the most relative to their revenue was the 2021 Atlanta Braves. They were the only winner with a Revenue+ above 125 and a Payroll+ below 125, so perhaps they deserve credit for having been such a well-oiled machine. They still had an above-average payroll though, unlike the 2016 Cubs and 2017 Astros, who were also relative underspenders (I wonder why it worked out so well for Houston that year). The Giants of 2010 and 2014 were the other significant relative underspenders, though not their 2012 run oddly enough.

Conclusion

Whoever wins the World Series this year will find their data point on this graph closer to the top right than most. However, that doesn't mean such a guarantee can or should be expected most of the time.

I hope folks find this interesting!

0 Comments
2024/10/23
16:05 UTC

2

Minor League Statcast Pitch Type Classification

Does anyone know if there is a program that classifies AAA and Low-A pitch type data more accurately than the one that currently exists?

1 Comment
2024/10/18
18:29 UTC

1

Minor League Batting+Pitching Data

I'm working on comparing performance at Rookie, A, and A+ ball for players drafted out of various NCAA leagues, but am having a hard time finding minor league batting and pitching data all in the same place. I really don't want to have to spend countless hours gathering data piece-by-piece, and if there's a place I can find it for free, that would be much better.

Any suggestions?

4 Comments
2024/10/17
19:39 UTC

1

Why is BsR not correct (?) on Fangraphs?

By FG's library, BsR = wSB + wGDP + UBR.

But if I look at the leaderboard on FanGraphs and do the sum, BsR is never equal to it. What am I doing wrong?

Example below

https://preview.redd.it/utol2ggmsxud1.jpg?width=728&format=pjpg&auto=webp&s=bd6b04c367ae8096e6b1fea0601c154501d67712

3 Comments
2024/10/15
15:14 UTC

16

The Baseball Cube Data Store

I suppose I'm the dummy for purchasing data from here, but I have to say that this site does a REALLY poor job.

First, I'll give him his props for putting college baseball data all in the same place. Thanks!

Aside from that, nothing else deserves any commendation. I'll list my grievances here:

  1. The item descriptions are misleading - I purchased an item called "College Stats - All", which claimed to have all available college data from all divisions and leagues on site. This turned out to be a complete lie - I was only given the data from 2017 to the present, even though he had more data available. I was able to get this data, but only by purchasing one of the other NCAA data items. I'll assume, charitably, that I was supposed to assume that the "College Stats - All" data was incomplete, but I don't think I should have to.

  2. Communication was painfully slow - When I purchased the data, I got it the next day, as I was expecting. But I could only get about one message per day with him when I was trying to coordinate getting the rest of the data. This cost me a couple of days of work. Not ideal.

  3. The data I received is a COMPLETE MESS - There are so many problems with the data I got:

a) The column names are inconsistent across sheets, and even when they are consistent, the names are not conventional. Some were formatted word1word2, some Word1Word2, others Word1word2, and some word1Word2. Like seriously. Pick a style.

b) Thousands of observations in the sheet had values shifted from one column into the wrong column. I had to delete these from the data altogether. Bad for the stability of my models.

c) Some of the observations were not ASCII encoded, which was a real hassle to deal with.

d) Some of the observations had spaces in the front, which is easy to fix, but still really annoying.

e) Some of the conferences had the same name with different capitalizations (e.g. "ColoJr" vs "ColoJR"), which took nearly an hour to identify and fix.

f) Some of the NJCAA teams shifted back and forth between being identified by their conference (e.g. Mon-Dak conference) and by their region (NJCAA Region 13/9). This will take me hours to fix when I finally get to it.
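For anyone hitting the same problems, here is a minimal standard-library sketch of cleanup steps for issues (a), (c), (d), and (e). The helper names and sample values are made up, and dropping accents to get ASCII is a lossy choice that may not suit every project.

```python
import re
import unicodedata

def normalize_column(name):
    """Collapse 'Word1Word2', 'word1Word2', etc. to one lowercase key."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def clean_text(value):
    """Strip accents and other non-ASCII characters, then trim surrounding spaces."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.strip()

def unify_case(values):
    """Replace each value with the first spelling seen for its lowercase form."""
    canonical = {}
    result = []
    for value in values:
        key = value.lower()
        canonical.setdefault(key, value)
        result.append(canonical[key])
    return result

columns = ["PlayerName", "playername", "Playername", "playerName"]
print({normalize_column(c) for c in columns})
print(clean_text("  José "))
print(unify_case(["ColoJr", "ColoJR", "ColoJr"]))
```

Shifted columns (b) and the conference/region mix-up (f) need knowledge of the data itself, so they are not covered here.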

I purchased this data because I wanted to save myself some time. I didn't end up saving that much time, thanks to poor encoding and data reporting practices. I understand that not everyone can be as based as Sean Lahman, but there are basic standards of conduct that should be upheld, especially when you're selling the data to other people for money. I was really disappointed in the service and products I received from The Baseball Cube. I extend a warning to others who may be interested in their products or services.

2 Comments
2024/10/14
19:04 UTC

4

Ideas for creating a postgame pitch report dashboard to track starting pitcher performance?

I’m learning to use the MLB Stats API to track the Padres' performance.

I’m curious to see if any insight can be gained into why Cease struggled in his two starts against LA.

I made a couple of posts about pitch breakdowns, and could definitely look at a lot more data!

https://www.reddit.com/r/Padres/comments/1g02r5h/dylan_ceases_pitch_breakdown_from_nlds_game_1_im/

https://www.reddit.com/r/Padres/comments/1g1e1dj/darvish_pitch_breakdown_from_nlds_game_2/

0 Comments
2024/10/11
16:27 UTC

3

About pitch counts for starters in the playoffs -- anyone know of any specific research or analysis? EDIT: any *good* research or analysis?

Anyone have any thoughts on how long of a leash Cobb is likely to have today? Either in terms of number of pitches or if he starts to look shaky? So far these playoffs Cleveland has limited their starters to mid-70s pitch counts, but that is a sample size of just two games; is it fair to expect the same from Cobb?

In fact, more generally, does anyone know of anywhere or anyone who has done any kind of analysis on the length of outings or pitch count limits for starting pitchers in playoff situations vs. the regular season? I get the general feeling that pitchers tend to have shorter leashes (maybe on average about 10 pitches fewer than what is typical for them, but that is just a random non-scientific observation), but I would love to know if anyone has done any specific work on this.

3 Comments
2024/10/09
18:22 UTC

34

Baseball Mini-Game using MLBAPI Play by Play Data using Python

https://reddit.com/link/1fzgxpd/video/y3xz97qjzktd1/player

Check out this mini-game I made using play-by-play data from the MLB API.

https://www.moonshotbaseball.io/dugout

You start with a randomly generated lineup of 9 batters, and then you hit through that lineup trying to score as many runs as you can score before all 9 batters get out.

Each play outcome is a randomly selected real-life play from that batter over the last 3 years where the base runner situation matches the state of your game, so whatever happens to the batter and runners in the video shown is what happens to your batter and the runners on base in your game!
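The selection step described above could be sketched roughly like this. It is not the game's actual code: the record fields, the base-state encoding, and the sample plays are all assumptions for illustration.

```python
import random

def pick_play(plays, batter_id, base_state, rng=random):
    """Pick one historical play for this batter that started from the same base state."""
    candidates = [
        p for p in plays
        if p["batter_id"] == batter_id and p["base_state"] == base_state
    ]
    return rng.choice(candidates) if candidates else None

# Made-up plays; base_state is a string such as "1__" for a runner on first only
plays = [
    {"batter_id": 1, "base_state": "___", "event": "single", "runs": 0},
    {"batter_id": 1, "base_state": "1__", "event": "double", "runs": 1},
    {"batter_id": 1, "base_state": "1__", "event": "strikeout", "runs": 0},
]
play = pick_play(plays, batter_id=1, base_state="1__", rng=random.Random(0))
print(play["event"])
```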

1 Comment
2024/10/09
02:09 UTC

3

Thought of an interesting metric

New here. So this thought came to me earlier this morning. I was reading a few articles about the postseason games this past weekend, and one word kept coming up: clutch. Apparently there's no definitive way to measure a player's clutch ability (or so I read). But I may have thought of one, if it's not already in existence. Basically, any time a player gets an RBI while their team is either tied or trailing, they earn "1" clutch factor (CF). Crude, I know, but I can't think of any other way to describe or name it. Does something like this exist? What are everyone's thoughts on this metric?
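To make the idea concrete, here is a minimal sketch of the counting rule. It assumes one CF per plate appearance that produces at least one RBI (rather than one per RBI), with the score checked before the play; the field names and sample rows are made up.

```python
def clutch_factor(plate_appearances):
    """Count RBI-producing plate appearances that began with the batter's team tied or trailing."""
    total = 0
    for pa in plate_appearances:
        tied_or_trailing = pa["bat_score"] <= pa["fld_score"]
        if pa["rbi"] > 0 and tied_or_trailing:
            total += 1
    return total

pas = [
    {"bat_score": 2, "fld_score": 2, "rbi": 1},  # tied: counts
    {"bat_score": 1, "fld_score": 3, "rbi": 2},  # trailing: counts once
    {"bat_score": 5, "fld_score": 1, "rbi": 1},  # leading: does not count
    {"bat_score": 0, "fld_score": 0, "rbi": 0},  # no RBI: does not count
]
print(clutch_factor(pas))
```

Counting each RBI separately would only require adding `pa["rbi"]` instead of 1.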

11 Comments
2024/10/07
17:39 UTC

1

Runs saved by an average player at his position

Hello. I am a sabermetrics enjoyer, but fairly new. I'm just learning a lot of things, mainly with FanGraphs' site and some other sources.

I want to do a calculation for my own curiosity: I want to count all the runs created by hitting and saved by pitching and fielding, then look at the total and see how many runs each part of the game saved or produced. I hope you catch my train of thought. For instance, in the 2024 season, 500 runs were created by hitting, 450 were saved by pitching, and 150 were saved by fielding.

Now, I'm sure something like this can be done because when you do WAR for position players and pitchers your currency is always Runs, that are converted to Wins, but you can absolutely compare all the players.

For hitting, wRC is what I'm looking for. What should I use for fielding and pitching?

UZR, or maybe DRS since it is used for all positions (while UZR excludes catchers) is in Runs, but it is Above Average. So I need to know what league average is (and for each position). But where?

For pitching I have no idea, because FIP is counted like ERA, so Runs Allowed. The pitching side of sabermetrics is something I didn't dig into at all, so I'm definitely short of ideas here.

3 Comments
2024/10/06
20:57 UTC

11

Estimating the cost of pitch tipping?

Is anyone familiar with any attempts to quantify the expected cost of pitch tipping? My group chat sent this tweet

https://x.com/jomboy_/status/1842062696847393120?s=46&t=WHf4nK-muUXyQhXDAWyXMA

And suggested Devin Williams got rocked because of this but after watching the video I remained a bit skeptical because it was so subtle. I watched the video in the first comment by Trevor May and he walks through David Bednar’s performance and thinks he was tipping his pitches (which I can get onboard with given the more visible changes and the continual steep drop in performance this year).

But for a one game blowup it does seem unlikely that Williams didn’t tip his pitches all year (or he did and teams didn’t pick up on it) until the Mets did in the postseason.

So I was trying to approximate the likelihood using Bednar’s change in expected ERA YoY to guesstimate the impact on performance and assess the relative likelihoods, but I was wondering if anyone else has done this more quantitatively and systematically.

9 Comments
2024/10/04
23:19 UTC

9

What Was Different About 2024?

So, over the summer, as an experiment, I tried to come up with a run prediction formula solely based on XBH. Without getting too technical, I assigned a value for 2B+3B, a value for HR, and a value to HR per 2B+3B. I didn't factor BB rate or exit velocity. I based my values solely on 2023 league averages.

Once I set this up, I went team by team for 2023, and found that my formula correlated with total runs by about 95.5 percent, almost identical to the "technical" Runs Created formula based on Bill James's work, and was more predictive than OPS. I then tested my formula on every team in 2022, which led to a 97.1% correlation, and every team in 2021, which ended up at 96.2%. While I haven't yet gone team-by-team prior to 2021, I tested it against league averages each year from 2010-2019, and this still produced a correlation of 95.5%, so I had hope that I might be on to something.

However, when crunching team-by-team 2024 numbers, the James model resulted in its usual 96%, whereas my model suddenly dropped to 90%. Specifically, it tended to underrate good offenses and overrate bad ones by a much larger degree than the three previous years. So my question is: what was different about this season that could've led to this result? What would've caused a 96% correlation based on 110 samples to dip to 90% in this year's 30 samples? When searching everything available on FanGraphs, I wasn't noticing anything that seemed obviously different this season.
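One thing that may be worth checking is how much a correlation can move on 30 teams from sampling noise alone. The sketch below assumes the percentages quoted here are Pearson correlation coefficients (not r-squared) and uses the Fisher z-transformation for a rough 95% interval; treat it as a sanity check, not an explanation of the 2024 result.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fisher_interval(r, n, z_crit=1.96):
    """Approximate 95% interval for a correlation r measured on n samples."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Range of sample correlations plausible for a "true" 0.96 on 30 teams
low, high = fisher_interval(0.96, 30)
print(round(low, 3), round(high, 3))
```

`pearson_r(predicted_runs, actual_runs)` would give the per-season figure; comparing it to the interval shows whether a one-season dip is larger than noise would suggest.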

As an aside, have any of you tried a similar experiment? And if so, what did you find?

5 Comments
2024/10/03
18:49 UTC

4

WPA chart that has a log scale?

I was talking to a friend about today's Mets-Braves as compared to Royals-A's in 2014 and visually comparing the WPA charts, and I suggested that WPA charts would better show the action if they were on a log scale, since, say, a 3-run homer in a 1-0 game in the third inning would make the chart swing steeply from something like 65% to 30% despite not really making for a "crazy" game.
Anyone know where I can find something like that? Or maybe the best way to download a CSV/Excel file of individual games' WPAs so I can do it myself?
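If the win probabilities can be exported, one possible reading of a "log scale" here is the log-odds (logit) transform, which is my substitution rather than something the post specifies. A minimal sketch with made-up win probabilities:

```python
import math

def log_odds(win_prob, eps=1e-4):
    """Map a win probability in [0, 1] to log-odds, clamping to avoid infinities at 0 and 1."""
    p = min(max(win_prob, eps), 1 - eps)
    return math.log(p / (1 - p))

# Made-up win probabilities after successive plays
wp = [0.50, 0.65, 0.30, 0.10, 0.02]
print([round(log_odds(p), 2) for p in wp])
```

On this scale a 50% game sits at 0, and moves between extreme probabilities (say 10% to 2%) are stretched relative to mid-range swings.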

3 Comments
2024/09/30
20:46 UTC

2

Where to find 80's splits?

Any sites to search for L/R batting splits for the 80's? Fangraphs only shows it on a league-wide scale for 21st century players. BRef shows it for individual players, but I can't find where to search for it on a league-wide scale either.

Not a specifically sabermetric question, but I assumed this subreddit would be the better one to ask

Edit: To be more specific. I want to sort through players by splits (similar to how you can on Fangraphs for seasons the past 20 years)

3 Comments
2024/09/29
21:34 UTC

2

3D Pitch Trajectory

I was wondering if there was publicly available code to recreate a 3D pitch trajectory plot given Trackman data.

I've seen Scott Powers' work (https://github.com/saberpowers/predictive-pitch-score/blob/main/package/predpitchscore/R/get_quadratic_coef.R) and creating a dataframe for it, I just want to be able to plot it and have their trajectories.
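As a starting point, here is a minimal Python sketch that samples points along a pitch path under a constant-acceleration (quadratic) model. It is not Scott Powers' code. The inputs are assumed to be an initial position, velocity, and acceleration in feet and seconds, with y pointing from the plate toward the pitcher and the front of the plate at y = 17/12 ft; check those conventions against your Trackman export. The sample numbers are made up.

```python
import math

def pitch_trajectory(p0, v0, a, y_plate=17 / 12, steps=20):
    """Sample (t, x, y, z) points from the initial position to the front of home plate.

    p0, v0, a: (x, y, z) tuples for position (ft), velocity (ft/s) and
    acceleration (ft/s^2); y points from the plate toward the pitcher.
    """
    (x0, y0, z0), (vx, vy, vz), (ax, ay, az) = p0, v0, a

    # Time at which y(t) = y0 + vy*t + 0.5*ay*t^2 reaches y_plate
    if ay == 0:
        t_plate = (y_plate - y0) / vy
    else:
        disc = vy ** 2 - 2 * ay * (y0 - y_plate)
        t_plate = (-vy - math.sqrt(disc)) / ay

    points = []
    for i in range(steps + 1):
        t = t_plate * i / steps
        points.append((
            t,
            x0 + vx * t + 0.5 * ax * t ** 2,
            y0 + vy * t + 0.5 * ay * t ** 2,
            z0 + vz * t + 0.5 * az * t ** 2,
        ))
    return points

# Made-up values for illustration only
path = pitch_trajectory(p0=(-2.0, 50.0, 6.0), v0=(5.0, -130.0, -5.0), a=(-10.0, 25.0, -20.0))
print(len(path), path[-1])
```

The x, y, and z columns of `path` can then be passed to whatever 3D plotting tool you prefer, one line per pitch.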

1 Comment
2024/09/29
20:34 UTC

5

I created a new Stat for Relievers. What do you think of it? The Standard Relief Outing

1 Comment
2024/09/29
03:31 UTC

4

Introducing The PCV. I Created a new pitching stat for starting pitchers.

2 Comments
2024/09/29
01:02 UTC
