Lahman
Lahman’s baseball database contains complete batting and pitching statistics from 1871 to 2014, plus fielding statistics, standings, team stats, managerial records, post-season data, and more.
Original source: www.seanlahman.com
Versions
Lahman_2014 (by Jan Motl)
- Added foreign key constraints by removing the rows that violated them
Dataset details
- Associated task: Regression
- Domain: Sport
- Data types:
- Size: 74.1 MB
- Count of tables: 25
- Count of rows: 470,225
- Count of columns: 353
- Missing values: Yes
- Compound keys: No
- Loops: Yes
- Type: Real
- Instance count: 23,111
- Target table: salaries
- Target column: salary
- Target ID: teamID, playerID, lgID
- Target timestamp: yearID
Algorithms
Dataset version | Target | Algorithm | Author text | Measure | Value |
---|---|---|---|---|---|
lahman_2014 | salary | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | R2 | 0.788 |
lahman_2014 | salary | Relboost | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | R2 | 0.8395 |
lahman_2014 | salary | Deep Feature Synthesis | featuretools | R2 | 0.7797 |
lahman_2014 | salary | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | RMSE | 1402960 |
lahman_2014 | salary | Relboost | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | RMSE | 1220382 |
lahman_2014 | salary | Deep Feature Synthesis | featuretools | RMSE | 1431516 |
lahman_2014 | salary | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | MAE | 765292 |
lahman_2014 | salary | Relboost | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | MAE | 666548 |
lahman_2014 | salary | Deep Feature Synthesis | featuretools | MAE | 769939 |
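For readers comparing the measures in the table, the following is a minimal sketch of how R2, RMSE and MAE are commonly computed for a regression target such as salary. It assumes R2 means the coefficient of determination; some tools report the squared Pearson correlation instead, and the page does not say which definition the listed results use. The function name `regression_metrics` and the toy numbers are illustrative only and are not taken from the dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and MAE for two equal-length sequences of numbers.

    Assumption: R2 is the coefficient of determination, 1 - SS_res / SS_tot.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    mean_true = sum(y_true) / n

    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical toy values, not real salaries.
toy_metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

RMSE and MAE are expressed in the units of the target column, which is why their values in the table are so much larger than the unitless R2 scores.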
How to download the dataset
The dataset is publicly available directly from a MariaDB database.
- Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
- Use the following credentials:
  - hostname: relational.fel.cvut.cz
  - port: 3306
  - username: guest
  - password: ctu-relational
- Export the "lahman_2014" database (or another version of the dataset, if available) in your preferred format (e.g. CSV or SQL dump).
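As an alternative to a graphical client, the steps above could be scripted. The sketch below is one possible approach, not an official export tool: it assumes the third-party `pymysql` package is installed (`pip install pymysql`) and that the server accepts connections with the credentials listed above. The helper name `export_table_to_csv` and the output path are hypothetical.

```python
import csv

# Connection details as listed in the steps above.
CONNECTION = {
    "host": "relational.fel.cvut.cz",
    "port": 3306,
    "user": "guest",
    "password": "ctu-relational",
    "database": "lahman_2014",
}


def export_table_to_csv(table, path, connection=CONNECTION):
    """Dump one table to a CSV file with a header row (hypothetical helper)."""
    # Table names cannot be passed as query parameters, so allow only
    # letters, digits and underscores before interpolating the name.
    if not table or not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")

    import pymysql  # third-party dependency, imported only when needed

    conn = pymysql.connect(**connection)
    try:
        with conn.cursor() as cur:
            cur.execute(f"SELECT * FROM `{table}`")
            header = [col[0] for col in cur.description]
            with open(path, "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow(header)
                writer.writerows(cur.fetchall())
    finally:
        conn.close()
```

For example, `export_table_to_csv("salaries", "salaries.csv")` would write the target table to a local file. The whole table is fetched into memory at once, which is simple but may be slow for the largest tables; a server-side cursor or a SQL dump may be preferable there.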