SFScores
The San Francisco Department of Public Health’s database of eateries, inspections of those eateries, and violations found during the inspections. The task is to predict the scores of unscheduled inspections from 2013 to 2016. Scores range from 1 to 100, where 100 means that the establishment meets all required standards. Beware of temporal leakage: violations recorded during the predicted inspection must not be used to predict that inspection's score, as that would make the task too easy.
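One way to respect the leakage rule is to keep only violations recorded strictly before the inspection being predicted. The sketch below is a minimal illustration in plain Python. The column names of the violations table (`business_id`, `date`, `description`) and the helper names are assumptions made for this example, and the rows are toy values, not records from the database.

```python
from datetime import date

# Toy rows for illustration only; not taken from the real database.
# The column names of the violations table are assumed here.
violations = [
    {"business_id": 10, "date": date(2013, 5, 1), "description": "A"},
    {"business_id": 10, "date": date(2014, 3, 7), "description": "B"},
    {"business_id": 10, "date": date(2014, 3, 7), "description": "C"},
    {"business_id": 22, "date": date(2013, 9, 9), "description": "D"},
]


def prior_violations(rows, business_id, inspection_date):
    """Return violations of one business recorded strictly before the inspection."""
    return [
        r for r in rows
        if r["business_id"] == business_id and r["date"] < inspection_date
    ]


def count_prior_violations(rows, business_id, inspection_date):
    """A simple leakage-free feature: number of earlier violations."""
    return len(prior_violations(rows, business_id, inspection_date))


# Predicting the inspection of business 10 on 2014-03-07:
# violations B and C come from that same inspection and are excluded.
history = prior_violations(violations, 10, date(2014, 3, 7))
print([r["description"] for r in history])  # ['A']
```

The strict `<` comparison is the important part: using `<=` would let violations from the predicted inspection into the features.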
Original source: 2016.padjo.org
Versions
SFScores (by Jan Motl)
- Deleted records from frpm without a match in schools
Dataset details
- Associated task:
- Regression
- Domain:
- Government
- Data types:
- Size:
- 10.3 MB
- Count of tables:
- 3
- Count of rows:
- 66,153
- Count of columns:
- 25
- Missing values:
- Yes
- Compound keys:
- No
- Loops:
- No
- Type:
- Real
- Instance count:
- 23,833
- Target table:
- inspections
- Target column:
- score
- Target ID:
- business_id
- Target timestamp:
- date
Algorithms
Dataset version | Target | Algorithm | Author text | Measure | Value |
---|---|---|---|---|---|
SFScores | score | Predictor Factory | Predictor Factory | MAE | 5.221 |
SFScores | score | Predictor Factory | Predictor Factory | RMSE | 6.917 |
SFScores | score | Predictor Factory | Predictor Factory | R2 | 0.289 |
SFScores | score | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | R2 | 0.2793 |
SFScores | score | Deep Feature Synthesis | featuretools | R2 | 0.2618 |
SFScores | score | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | RMSE | 7.1 |
SFScores | score | Deep Feature Synthesis | featuretools | RMSE | 7.18 |
SFScores | score | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | MAE | 5.37 |
SFScores | score | Deep Feature Synthesis | featuretools | MAE | 5.45 |
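For reference, the three measures in the table can be computed as sketched below. These are the standard textbook definitions of MAE, RMSE, and R2; whether each listed result used exactly these definitions (for example, the same train/test split) is an assumption. The score lists are toy values chosen so the arithmetic is easy to check by hand.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy inspection scores, not real data.
y_true = [90, 100, 80, 70]
y_pred = [92, 96, 80, 74]

print(mae(y_true, y_pred))   # 2.5
print(rmse(y_true, y_pred))  # 3.0
print(round(r2(y_true, y_pred), 3))  # 0.928
```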
How to download the dataset
The datasets are publicly available directly from a MariaDB database.
- Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
- Use the following credentials:
- hostname: relational.fel.cvut.cz
- port: 3306
- username: guest
- password: ctu-relational
- Export the "SFScores" database (or another version of the dataset, if available) in your preferred format (e.g. CSV or SQL dump).
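The steps above can also be scripted. The following is a minimal sketch that assembles a `mysqldump` command line from the credentials listed above and, optionally, runs it to produce an SQL dump. The helper names are hypothetical, and the `--skip-lock-tables` flag is included on the assumption that the guest account may not be allowed to lock tables; drop it if it is not needed. Running the export requires network access and a dump client on the PATH.

```python
import shutil
import subprocess

# Connection details as listed on this page.
HOST = "relational.fel.cvut.cz"
PORT = 3306
USER = "guest"
PASSWORD = "ctu-relational"


def build_dump_command(database="SFScores", client="mysqldump"):
    """Build the argument list for an SQL dump of one database."""
    return [
        client,
        f"--host={HOST}",
        f"--port={PORT}",
        f"--user={USER}",
        f"--password={PASSWORD}",
        "--skip-lock-tables",  # assumption: guest may lack LOCK TABLES rights
        database,
    ]


def export_database(path, database="SFScores", client="mysqldump"):
    """Run the dump client and write its output to `path`."""
    cmd = build_dump_command(database, client)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found on PATH")
    with open(path, "w", encoding="utf-8") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Show the command without contacting the server.
print(" ".join(build_dump_command()))
```

Calling `export_database("SFScores.sql")` would then write the dump to a local file. On MariaDB installations where the tool is named `mariadb-dump`, pass `client="mariadb-dump"` instead.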