SFScores

This dataset is the San Francisco Department of Public Health’s database of eateries, inspections of those eateries, and violations found during the inspections. The task is to predict the scores of unscheduled inspections from 2013 to 2016. Scores range from 1 to 100, where 100 means that the establishment meets all required standards. Beware of temporal leakage: you are not permitted to use violations from the predicted inspection to predict its score, as that would make the task too easy.
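One way to respect the leakage rule is to build features only from violations dated strictly before the inspection being predicted. The sketch below uses toy rows; `business_id`, `date` and `score` come from the dataset details, while the presence of a `date` column on violations, the `risk` field and the helper names are assumptions made for illustration.

```python
from datetime import date

# Toy rows. Columns other than business_id, date and score are assumed.
inspections = [
    {"business_id": 10, "date": date(2014, 5, 1), "score": 92},
]
violations = [
    {"business_id": 10, "date": date(2013, 3, 7), "risk": "High"},
    {"business_id": 10, "date": date(2014, 5, 1), "risk": "Low"},  # same inspection: off-limits
    {"business_id": 11, "date": date(2013, 1, 2), "risk": "Low"},  # different business
]


def prior_violations(inspection, all_violations):
    """Return violations of the same business dated strictly before the inspection."""
    return [
        v for v in all_violations
        if v["business_id"] == inspection["business_id"]
        and v["date"] < inspection["date"]  # strict '<' excludes the predicted inspection
    ]


# One leakage-free feature per target inspection: the count of earlier violations.
features = [
    {
        "business_id": i["business_id"],
        "date": i["date"],
        "n_prior_violations": len(prior_violations(i, violations)),
    }
    for i in inspections
]
```

The strict comparison is the important part: a `<=` would let the violations recorded at the predicted inspection slip into the features.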

Original source: 2016.padjo.org

Versions

  • SFScores (by Jan Motl)

    • Deleted records from frpm without a match in schools

Dataset details

Associated task: Regression
Domain: Government
Data types:
Size: 10.3 MB
Count of tables: 3
Count of rows: 66,153
Count of columns: 25
Missing values: Yes
Compound keys: No
Loops: No
Type: Real
Instance count: 23,833
Target table: inspections
Target column: score
Target ID: business_id
Target timestamp: date

Algorithms

Dataset version | Target | Algorithm | Author text | Measure | Value
SFScores | scores | Predictor Factory | Predictor Factory | MAE | 5.221
SFScores | scores | Predictor Factory | Predictor Factory | RMSE | 6.917
SFScores | scores | Predictor Factory | Predictor Factory | R² | 0.289
SFScores | score | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | R² | 0.2793
SFScores | score | Deep Feature Synthesis | featuretools | R² | 0.2618
SFScores | score | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | RMSE | 7.1
SFScores | score | Deep Feature Synthesis | featuretools | RMSE | 7.18
SFScores | score | FastProp | getML: Feature Learning with AutoML to build end-to-end prediction pipelines | MAE | 5.37
SFScores | score | Deep Feature Synthesis | featuretools | MAE | 5.45
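
For readers who want to compare their own model against the results above, the three measures (MAE, RMSE and R², the coefficient of determination) can be computed as in the sketch below. It uses the standard textbook definitions; whether the listed results were produced with exactly these definitions and on which data split is not stated here, so treat that as an assumption. The function names and sample values are illustrative.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up inspection scores and predictions, for illustration only.
y_true = [90, 100, 80, 70]
y_pred = [88, 96, 85, 75]
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

Lower is better for MAE and RMSE, which are expressed in score points; higher is better for R².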

How to download the dataset

The dataset is publicly available directly from a MariaDB database.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
  2. Use the following credentials:
    • hostname: relational.fel.cvut.cz
    • port: 3306
    • username: guest
    • password: ctu-relational
  3. Export the "SFScores" database (or another version of the dataset, if available) in your preferred format (e.g. CSV or an SQL dump).
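
The steps above can also be scripted. The following is a minimal sketch, assuming the third-party PyMySQL package as the client (any DB-API driver for MariaDB should work similarly). The helper names `connect` and `export_table` are hypothetical. Only the `inspections` table name is taken from the dataset details; the names of the other two tables would need to be looked up first, for example with `SHOW TABLES`.

```python
import csv

# Credentials as listed above.
HOST = "relational.fel.cvut.cz"
PORT = 3306
USER = "guest"
PASSWORD = "ctu-relational"
DATABASE = "SFScores"


def connect():
    """Open a connection to the public server (PyMySQL is an assumed client)."""
    import pymysql  # third-party, imported lazily: pip install pymysql
    return pymysql.connect(
        host=HOST, port=PORT, user=USER, password=PASSWORD, database=DATABASE
    )


def export_table(cursor, table, out_file):
    """Write all rows of `table` to `out_file` as CSV with a header row.

    Works with any DB-API cursor. Returns the number of data rows written.
    """
    if not table.isidentifier():  # guard: the table name is interpolated into SQL
        raise ValueError(f"unexpected table name: {table!r}")
    cursor.execute(f"SELECT * FROM `{table}`")
    writer = csv.writer(out_file)
    writer.writerow([col[0] for col in cursor.description])  # column names
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)
```

A typical use would be to call `connect()`, obtain a cursor with `.cursor()`, and pass it to `export_table(cursor, "inspections", f)` with `f` opened via `open("inspections.csv", "w", newline="")`.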