
Sabermetrics Analysis
Baseball Prospectus
Beyond the Box Score
Fangraphs
Hardball Times
High Heat Stats
Tom Tango
Tango Tiger Wiki
Balls and Strikes
Baseball Think Factory
Baseball Analysts
The Physics of Baseball, Alan Nathan
Baseball HQ Research and Analysis
Sabermetrics 101: Introduction to Baseball Analytics

Data Sources
Retro Sheet
Sean Lahman Database
DingerDB
Fangraphs
Baseball Reference
Stat Corner
Baseball Heat Maps

Pitch F/X
Brooks Baseball Pitch f/x
Baseball Savant
TexasLeaguers

Books
The Book: Playing the Percentages in Baseball
The Hidden Game of Baseball
Baseball Between the Numbers
Extra Innings: More Baseball Between the Numbers
The Bill James Historical Baseball Abstract
Curve Ball
The Baseball Economist
The Numbers Game
The Extra 2% - Jonah Keri
Big Data Baseball
Dollar Sign on the Muscle
Analyzing Baseball Data with R
Baseball Hacks: Tips & Tools for Analyzing and Winning with Statistics
The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball
Trading Bases

AL East	AL Central	AL West
Yankees	Tigers	Oakland
Orioles	WhiteSox	Rangers
Rays	Royals	Angels
Blue Jays	Indians	Mariners
Red Sox	Twins	Astros

NL East	NL Central	NL West
Nationals	Reds	Giants
Braves	Cardinals	Dodgers
Phillies	Brewers	D-Backs
Mets	Pirates	Padres
Marlins	Cubs	Rockies

Related Subreddits
/r/baseball
/r/baseballstats
/r/fantasybaseball
/r/sultansofstats
/r/sportsanalytics
/r/footballstrategy
/r/nflstatheads

Misc.
/r/Sabermetrics Weekly Stat Discussions
Reddit Markdown Primer - how to make charts, other stuff in reddit

13,841 Subscribers

Past projections

Does anyone know a way to get past ATC and THE BAT (THE BAT X) projections? I can get 2023 and 2024 data on Fangraphs using the Wayback Machine, but before that I can only get the first 30 players, because of the way the old Fangraphs website was structured. Thanks!

2 Comments

2025/01/31
16:46 UTC

2024 Win Estimator Accuracy

Over the past couple of seasons I've been using team xwOBA and xwOBA allowed to generate projected standings and playoff odds. This season, I also kept track of a couple of other win estimators, like Pythagorean expectation, to see how the xwOBA method stacked up. Here are the monthly snapshots based on simulating the remainder of the season 10,000 times. The "contestants" were: Actual Win Percentage, Tango Regressed Win Percentage (+35 wins, +35 losses), Pythagenpat, BaseRuns, and xwOBA. I also included the FanGraphs depth charts projections as a comp. I'm reporting the RMSE in terms of both total wins and winning percentage.

April 30	Total Wins	Win%
Actual	12.23	7.56%
Tango	7.38	4.58%
Pyth	11.21	6.92%
BaseRuns	10.34	6.39%
xwOBA	8.25	5.11%
FanGraphs	6.35	3.94%

May 31	Total Wins	Win%
Actual	8.70	5.37%
Tango	6.83	4.23%
Pyth	8.24	5.08%
BaseRuns	7.23	4.47%
xwOBA	6.18	3.84%
FanGraphs	5.52	3.42%

June 30	Total Wins	Win%
Actual	6.87	4.23%
Tango	5.83	3.60%
Pyth	6.74	4.15%
BaseRuns	6.57	4.06%
xwOBA	6.00	3.71%
FanGraphs	5.12	3.17%

July 31	Total Wins	Win%
Actual	3.91	2.41%
Tango	3.90	2.41%
Pyth	3.66	2.26%
BaseRuns	3.86	2.40%
xwOBA	3.93	2.44%
FanGraphs	3.75	2.32%

August 31	Total Wins	Win%
Actual	2.50	1.54%
Tango	2.36	1.46%
Pyth	2.47	1.52%
BaseRuns	2.50	1.55%
xwOBA	2.43	1.51%
FanGraphs	2.21	1.37%

I feel like this basically unfolds how you'd expect. Actual win percentage is the least accurate, Pythagorean starts out a bit behind BaseRuns but starts to catch up as we get later in the season (maybe teams have some degree of control over timing that BaseRuns doesn't pick up), and the two regression methods (Tango and FanGraphs) are the clear front runners. xwOBA starts in a middle ground between Pyth/BaseRuns on the one hand and Tango/FanGraphs on the other and then, later in the season, ends up at roughly the same level as Pyth and BaseRuns.

Nothing groundbreaking or particularly noteworthy here, but I figured I'd share the results for posterity's sake.
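For anyone who wants to reproduce a comparison like this, here is a minimal sketch of two of the estimators and the RMSE calculation. It is not the poster's code: the function names are made up for illustration, the 0.287 Pythagenpat exponent is the commonly cited value rather than one stated in the post, and the season simulation step is left out entirely.

```python
import math

def pythagenpat_win_pct(runs_scored, runs_allowed, games, exponent_power=0.287):
    """Pythagenpat: the Pythagorean exponent floats with runs per game.

    exponent_power=0.287 is an assumed, commonly cited value.
    """
    x = ((runs_scored + runs_allowed) / games) ** exponent_power
    return runs_scored ** x / (runs_scored ** x + runs_allowed ** x)

def regressed_win_pct(wins, losses, prior_wins=35, prior_losses=35):
    """Regress a record toward .500 by adding prior wins and losses (+35/+35 in the post)."""
    return (wins + prior_wins) / (wins + losses + prior_wins + prior_losses)

def rmse(projected, actual):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(projected, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A 20-10 team, for example, regresses to `regressed_win_pct(20, 10)`, which is 55/100 = 0.55 rather than the raw .667.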

10 Comments

2025/01/31
04:01 UTC

Pybaseball Stats Explained

I am looking for any list of, or reference for, the Statcast statistics returned by the Python package pybaseball.

I am specifically looking to understand the delta_run_exp statistic, with a brief explanation of how it is calculated and how I can use it.

I haven’t been able to find a solid reference for them and was wondering if anyone had a good place to look.
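As a rough sketch of the general idea behind a change-in-run-expectancy stat: take the run expectancy of the state after the event, subtract the run expectancy of the state before it, and add any runs that scored. The table values below are made up for illustration, and the states are simplified to bases and outs; my understanding is that the Statcast version works pitch by pitch and also accounts for the count, so treat this as the concept, not the exact calculation.

```python
# Illustrative, made-up run expectancy values keyed by (bases, outs).
# Real tables are estimated from league play-by-play data.
RUN_EXPECTANCY = {
    ("---", 0): 0.50,  # bases empty, no outs
    ("1--", 0): 0.90,  # runner on first, no outs
    ("---", 1): 0.27,  # bases empty, one out
}

def delta_run_exp(state_before, state_after, runs_scored, table=RUN_EXPECTANCY):
    """Change in run expectancy across one event, plus runs that scored on it."""
    return table[state_after] - table[state_before] + runs_scored

# A leadoff single raises run expectancy; a leadoff out lowers it.
single = delta_run_exp(("---", 0), ("1--", 0), runs_scored=0)
out = delta_run_exp(("---", 0), ("---", 1), runs_scored=0)
```

Summing these values over a hitter's or pitcher's events gives a run-value total, which is the usual way this kind of number gets used.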

5 Comments

2025/01/29
06:23 UTC

Importing Retrosheet into R

Does anyone know how to import a file from Retrosheet into R? I’m trying to create a new statistic to measure a hitter's ability to hit to the situation, so I need base/out states and the result of each at-bat.

1 Comment

2025/01/29
03:39 UTC

Umpire info

I'm working with the R package baseballr and looking for umpire info for each game.

Is there a way to find umpire information using the MLB API with baseballr?

8 Comments

2025/01/28
14:14 UTC

Downloading Data

I am very new to R and just started to use baseballr. I've watched a few videos but have been struggling to get what I need. I am looking for stats from the 2000 through 2024 seasons. I really only need WAR, age, and position for all players (batters and pitchers), but would also like PA and IP for sorting purposes.

If there is already a database out there with these stats or if someone could recommend what to do or code for it, that would be greatly appreciated, thank you!

7 Comments

2025/01/27
19:05 UTC

MLB Stats API documentation

Google is sponsoring an MLB hackathon. The deadline is Feb 4, so there is not much time to create something if you haven't already started, but they have a GitHub repo with MLB Stats API documentation.

https://github.com/MajorLeagueBaseball/google-cloud-mlb-hackathon/tree/main/datasets/mlb-statsapi-docs

1 Comment

2025/01/27
17:15 UTC

Holds calculation

I'm calculating holds from Retrosheet data, and when checking against other data sources I realized there are some discrepancies.

Example: Cade Smith (CLE)

- 28 holds (https://www.baseball-reference.com/players/gl.fcgi?id=smithca06&t=p&year=2024)

- 26 holds (https://www.fangraphs.com/leaders/major-league?pos=all&stats=pit&lg=all&type=0&season=2024&month=0&season1=2024&ind=0&sortcol=11&sortdir=desc&pageitems=100&qual=0)

- 28 holds (https://www.mlb.com/stats/pitching/holds?expanded=true)

I looked at the game log, and FanGraphs didn't count this game as a hold, but the others did: https://www.espn.com/mlb/playbyplay/_/gameId/401568588

It feels like it should be one: a 2-run lead that holds through the rest of the game, with a pitcher of record already in place before Cade Smith enters. Am I missing something?

3 Comments

2025/01/26
03:34 UTC

Who was the most/least consistent player of 2024

I'm testing something. I want to know what y'all thought. What I found might surprise y'all

12 Comments

2025/01/25
00:49 UTC

FanGraphs Exporting Data

Disclaimer: I am very new to the FanGraphs website and just got a subscription. When exporting data to a CSV/Excel file, I'm getting names with accents and other special characters like this. I was wondering if there is any way to fix this when exporting data. Thank you!

8 Comments

2025/01/24
18:34 UTC

Explain expected stats

I see expected stats everywhere, but I'm unsure exactly how they work. I understand xBA, but that's about it. I guess I'm asking what factors are used and what formulas are used when counting expected stats instead of regular counting stats or efficiency stats.

7 Comments

2025/01/21
23:45 UTC

Pitching WAR calculations for FanGraphs vs. Baseball Reference

I just realized that FanGraphs and Baseball Reference must have wildly different ways of calculating WAR for pitchers. For example, BR lists Tanner Scott's 2024 total bWAR across two teams as 4.0, whereas FanGraphs lists his fWAR as 1.6.

What gives? And which approach do you find more meaningful for evaluating pitchers?

13 Comments

2025/01/20
12:00 UTC

I’m at a Crossroads

I hope this isn’t talking into a giant black hole. I just joined this community a few seconds ago but for those that have made it in baseball I am about as lost as you can be.

I am a sophomore Sports Management major and am currently working with a D1 analytics staff, where all we do is basically clip video and run Trackman. I’ve had a great experience working with the staff and have learned a lot more about baseball analytics than I knew before, and am excited for this upcoming season.

Over Christmas break I tried applying for internships on Teamwork Online. After an extensive search, I was only able to muster up four applications, and not one of them has even contacted me regarding an interview. I’m only 19 and have little to no proof of my knowledge of baseball other than my own word in my cover letters. The only side projects I’ve worked on regarding baseball were making a top 1000 players of all time list (took me almost 2 years), seasonal player rankings and predictions, and, most recently, developing a stat to measure a player’s overall hit tool (albeit a rather elementary one).

I realize that if I am going to get anywhere in this field I need to just do more, and I don’t know how. I have 0 clue whatsoever how to code, which I hear is one of the most important skills in the industry. My bigger fear is that I am selling out and betting on myself entirely by chasing this career path. The likelihood I get a job in this field realistically, despite my analytic experience, is slim to none. If I fail at this, I don’t really have anywhere to turn to and will probably just work odd jobs for the rest of my life. Even if I do get a job in this field, the pay will be low (at least that’s what I’ve heard) and will probably struggle to make ends meet. The only reason I chase this crazy dream of mine is because this is something I enjoy and would kill to be able to do for a living.

If you were once in my shoes, what did you do to somehow get a job in baseball analytics? What should I be doing to make myself THE most marketable and qualified guy out there? If you currently are in similar shoes, feel free to comment and share your experience so I know I’m not the only one sitting here at 12:30 at night wondering what the hell I’m even doing.

27 Comments

2025/01/16
05:31 UTC

Searching for a Baseball Reference page

Is there a Baseball Reference page that lists every single plate appearance by a given player? I'm trying to do a rolling average.
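Once the plate appearances are in hand from whatever source, the rolling average itself is straightforward. Here is a minimal sketch assuming each plate appearance has already been reduced to a number (for example 1 for reaching base, 0 otherwise); the function name and the input encoding are hypothetical.

```python
from collections import deque

def rolling_average(values, window):
    """Yield the mean of the most recent `window` values once the window is full."""
    recent = deque(maxlen=window)
    running_total = 0.0
    for value in values:
        if len(recent) == window:
            # The oldest value is about to be evicted by the append below.
            running_total -= recent[0]
        recent.append(value)
        running_total += value
        if len(recent) == window:
            yield running_total / window

# Five plate appearances, 3-PA window: windows are [1,0,0], [0,0,1], [0,1,1].
on_base = [1, 0, 0, 1, 1]
rolling = list(rolling_average(on_base, window=3))
```

Nothing is yielded until the first full window, so a sequence of n values produces n - window + 1 averages.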

6 Comments

2025/01/15
00:47 UTC

Brooks Baseball short-form movement

Hello, I'm a teenager who is getting into sabermetrics and, among a lot of other questions, I have one about Brooks Baseball short-form movement. I was trying to look at pitch movement for a lot of my favorite interesting players who played between 2008 and 2014 (Livan Hernandez, Carlos Zambrano, etc.). However, I noticed the absolute value of each pitch's movement is a lot smaller than in Statcast or other data sources (sorry, I may not have used the correct terminology). I compared players who play today, and they have different numbers as well. Does anybody know why this is? Are there any other places where I can look at pitch movement data for some other pitchers? Sorry, this has likely been well covered already, but I haven't found any info. Thanks

5 Comments

2025/01/14
04:26 UTC

Calculating batting avg/slg against specific pitch type

Can someone explain to me how batting averages and slugging against specific pitches are calculated?

From a Grant Brisbee article on Justin Verlander: "Before his neck injury at the end of May, hitters batted .227 with a .454 slugging percentage against his fastball. After, they hit .382 with a .551 slugging percentage."

What happens if the pitch doesn't result in a hit or an out (i.e. it's strike 1 or 2, or ball 1,2,3,4)? Thanks.
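As far as I understand the usual convention, these splits only count the pitch that ends the plate appearance: a called strike one or ball two on a fastball does not enter the fastball's AVG or SLG at all. A minimal sketch under that assumption follows; the record layout and event names are hypothetical, and less common non-at-bat outcomes such as catcher's interference are ignored.

```python
# Hypothetical pitch records: (pitch_type, event). event is None when the
# pitch did not end the plate appearance (ball, non-final strike, foul).
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}
NON_AT_BAT = {"walk", "hit_by_pitch", "sac_fly", "sac_bunt"}

def avg_slg_against(pitches, pitch_type):
    """AVG and SLG on at-bats that ended on the given pitch type."""
    at_bats = hits = total_bases = 0
    for ptype, event in pitches:
        if ptype != pitch_type or event is None or event in NON_AT_BAT:
            continue  # not this pitch, PA not over, or PA is not an at-bat
        at_bats += 1
        if event in HIT_BASES:
            hits += 1
            total_bases += HIT_BASES[event]
    if at_bats == 0:
        return None, None
    return hits / at_bats, total_bases / at_bats

pitches = [
    ("FF", None), ("FF", "single"), ("FF", "strikeout"), ("SL", "home_run"),
    ("FF", "walk"), ("FF", "home_run"), ("FF", "field_out"),
]
fastball_avg, fastball_slg = avg_slg_against(pitches, "FF")
```

In this toy sample, four at-bats end on the fastball (single, strikeout, home run, field out), so the fastball line is 2-for-4 with 5 total bases.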

3 Comments

2025/01/12
08:38 UTC

Pybaseball statcast queries taking longer with each one

Hello, I have a couple of questions:

I have a loop gathering each baseball game ID by just cycling through all the teams for 3 years using statcast(date range, team). When I started running this, each team's season would take approximately 1 minute in its own separate query. I have the cache enabled, so if this gets messed up I can run it faster next time.

What might be causing the query time to increase by about 7 seconds per iteration?

Can I stop it now in the middle of the loop, then run it again using mostly the cached data and get back down to a 1-minute query time?

Does stopping it mid-loop affect the cache for all of the completed iterations? I’m so far in I don’t want to mess with it and find out.

2 Comments

2025/01/11
22:29 UTC

Student Ticket for Sloan Conference

Hi.

Is anyone willing to sell me their student ticket for the Sloan Sports Analytics Conference? Thanks

9 Comments

2025/01/06
01:02 UTC

What data is used for pitch contour heatmap in baseball Savant?

Is it hit coordinates, so does not include balls or strikes?

1 Comment

2025/01/05
03:31 UTC

Survey on Fantasy Sports/Advanced Analytics

mods please delete if not allowed

A college classmate and I are trying to collect perspectives on current fantasy sports offerings. We’re particularly interested in the alignment between fantasy sports and advanced analytics/sabermetrics. Please consider filling out our survey. Your perspectives would be incredibly valuable to us. Thank you!

Link:

https://docs.google.com/forms/d/e/1FAIpQLSeUZ3TXCzS-Qn8KnaOra5mEG6tUN9I5lIBJYPOL_5rZOpowgw/viewform

0 Comments

2025/01/04
01:45 UTC

Baseball Analytics Discord???

Does anyone know of any Discord servers dedicated to advanced baseball analytics? I found a great Discord for NFL analysis but nothing for MLB. Please let me know.

26 Comments

2025/01/02
18:01 UTC

K per inning vs. K per batter faced?

Is looking at a pitcher's strikeout rate per batter faced better than looking at their strikeout rate per inning? Intuitively I would think K per batter faced is better, but I'm curious about opinions on this.
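A small worked example shows where the two rates diverge. The numbers are hypothetical: two pitchers each strike out 9 over 9 innings, but one allows more baserunners and therefore faces more batters to get the same 27 outs.

```python
def k_per_9(strikeouts, innings_pitched):
    """Strikeouts per nine innings; innings_pitched must be true decimal innings."""
    return 9 * strikeouts / innings_pitched

def k_pct(strikeouts, batters_faced):
    """Strikeouts as a share of batters faced."""
    return strikeouts / batters_faced

# Hypothetical pitchers with identical strikeouts and innings.
pitcher_a = {"so": 9, "ip": 9.0, "bf": 30}  # few baserunners
pitcher_b = {"so": 9, "ip": 9.0, "bf": 40}  # many baserunners

a_k9 = k_per_9(pitcher_a["so"], pitcher_a["ip"])
b_k9 = k_per_9(pitcher_b["so"], pitcher_b["ip"])
a_kpct = k_pct(pitcher_a["so"], pitcher_a["bf"])
b_kpct = k_pct(pitcher_b["so"], pitcher_b["bf"])
```

Both pitchers show 9.0 K/9, but A strikes out 30% of batters faced and B only 22.5%. The per-inning rate gives B extra chances for every batter who reaches base. Note that box-score notation like 6.1 innings means 6 1/3 and has to be converted before using `k_per_9`.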

8 Comments

2025/01/01
18:56 UTC

Highly Considered Hall of Fame Metrics

Hi all. Happy New Year's Eve!

What are all of the "modern" non-position-specific metrics that some of the writers like to consider? I know they lean heavily on WAR/VORP, and I think xwOBA and wRC+, but beyond that I couldn't think of any. I appreciate your insight.

3 Comments

2025/01/01
01:17 UTC

What major should I go into for an MLB front office career?

I'm a junior in high school and I'm starting to look at colleges. Just wondering what major I should go into for an MLB front office career. Everything I'm seeing mainly says sports management, but my dad said most things he read said sports analytics. Any help would be well appreciated. Thanks!!!

55 Comments

2024/12/31
20:22 UTC

WAR for DIII questions

TLDR: Baseruns vs wOBA? Do I need to find DIII wOBA weights? Best way to track baserunning? TZ on team level vs individual when box scores are unreliable? Tweak starter/reliever adjustment? Can I leave out the leverage component?

I'm an athlete at a DIII school, and I've taken it upon myself to have a sort of front office role as well, gathering and tracking the relevant information to better inform decisions. It may not be quite as useful as some of the other metrics I'm utilizing, but I would like to get a WAR model in place for at least our conference (13 teams, 1 DH against each per season for 24 conference games). The problem of course is that there is no retrosheet equivalent for me to use, so I have to build my own chart that would track everything.

Starting with batting WAR, I have everything I need already, but I am not sure which metric to use as my base. I ran team-level numbers on last season for BaseRuns and wOBA, and while I am more satisfied with wOBA for runs above/below average, I had to tweak the formula to PA * (wOBA - lgwOBA) / 0.75 because I found that dividing by 1.25 produced results that were too conservative, underestimating the best teams and overestimating the worst ones. My issue is that I am not sure it is fair of me to use wOBA in the first place, since its weights are of course based on major league data, and I doubt that those weights are truly the same at the DIII level. BaseRuns turned out not to be particularly accurate, which makes me hesitant to use that as well. Some insight as to the best course of action would be appreciated.
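To make the effect of the divisor concrete, here is a minimal sketch of the runs-above-average formula from the paragraph above, with the divisor exposed as a parameter. The function name and the sample inputs are hypothetical; only the formula and the 1.25 and 0.75 divisors come from the post.

```python
def runs_above_average(pa, woba, lg_woba, woba_scale=1.25):
    """PA * (wOBA - lgwOBA) / scale, as written in the post."""
    return pa * (woba - lg_woba) / woba_scale

# Hypothetical hitter: 200 PA, .400 wOBA in a .350 league.
standard = runs_above_average(200, 0.400, 0.350)                  # about 8.0 runs
tweaked = runs_above_average(200, 0.400, 0.350, woba_scale=0.75)  # about 13.3 runs
```

Dropping the divisor from 1.25 to 0.75 stretches every value by the same factor of 1.25 / 0.75, roughly 1.67, so it widens the spread between good and bad teams without changing their order.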

With baserunning, the question turns more to my methodology of data collection. The way I have it set up, each PA will be a new row in a spreadsheet, with the columns being either identifiers (name, venue, game state, etc) or events (PA result, batted ball type, first fielder to touch the ball, etc). With this however, I do not record anywhere who baserunners are, just where they are. I suppose this can be corrected easily enough, but the bigger issue is that I don't have accounting for steals in there, nor am I sure how I would do that. Any suggestions would be appreciated.

For fielding, I obviously cannot use statcast OAA, and I think it would be best to use TZ. Herein lies my second question, since box scores at this level are unreliable, and fielders switch in without necessarily getting reflected in it until they come to the plate (especially problematic for defensive subs at the end of a game). Does it make sense then to only find TZ for each position on a team level? Or is it in my best interest to still attempt to record who fielded the ball?

Pitching I'll be using Fangraphs' formula, and the only questions I have there are whether I'll need to tweak the starter/reliever component, as well as another regarding leverage index. I'm personally not a fan of saying that a given out is more valuable than another, and as such I am considering leaving the leverage component out. I understand why it is included normally, but when research consistently shows that players reduce to themselves regardless of situation, I have a hard time justifying including it.

All in all, I have my work cut out for me to say the least. Any insight, tweaks, or recommendations you all have would be much appreciated.

10 Comments

2024/12/31
16:07 UTC

I am working on this dashboard in Shiny, wanted to ask what more could be or should be added to this for pitch related metrics that could add value

https://preview.redd.it/tuiq2kbv4v9e1.png?width=3046&format=png&auto=webp&s=0de44e638c10ef9868ba28fb99c8f89d84d3e0f4

11 Comments

2024/12/29
22:21 UTC

Is there a way to get historical WAA out of Fangraphs?

They have WAR for players obviously, but they don't seem to have all the pieces to calculate WAA. Any suggestions appreciated. Thank you.

1 Comment

2024/12/26
20:19 UTC

Questions about Josh Hader's SSW 2-seamer

Why don't (or can't) more guys throw an SSW 2-seamer like Josh Hader's? Calvin Faucher seems to be the only other guy throwing something similar to it.

What are the release traits required to throw this pitch?

Are there pitchers who would be a good candidate to change to this fastball?

4 Comments

2024/12/26
15:54 UTC