Stats

An anonymized dump of all user-contributed content on Stats Stack Exchange, a site of the Stack Exchange network.

Original source: archive.org

Stats (by Jan Motl)
Stats_CEB (by Jan Motl)
- A simplified version that eliminates all attributes of string type.

Dataset details

Associated task:

Regression

Domain:

Education

Data types:

Size:

658.4 MB

Count of tables:

Count of rows:

1,027,838

Count of columns:

Missing values:

Yes

Compound keys:

Loops:

Yes

Dataset version	Target	Algorithm	Reference	R2	RMSE	MAE
stats	Reputation	FastProp	getML: Feature Learning with AutoML to build end-to-end prediction pipelines	0.9777	0.6533	0.3361
stats	Reputation	Relboost	getML: Feature Learning with AutoML to build end-to-end prediction pipelines	0.9809	0.6076	0.3114
stats	Reputation	Deep Feature Synthesis	featuretools	0.9624	0.8499	0.3487
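The three measures in the table are standard regression metrics: higher R2 is better, and lower RMSE and MAE are better. As a reference sketch only (not the evaluation code used to produce these results), they can be computed as below. RMSE and MAE are expressed in the units of the target, and this page does not state whether Reputation was transformed before scoring, so those two columns should not be read as raw reputation points.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both sequences must be non-empty and aligned element by element.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, with `y_true = [3.0, -0.5, 2.0, 7.0]` and `y_pred = [2.5, 0.0, 2.0, 8.0]`, `mae` returns 0.5, `rmse` returns the square root of 0.375 (about 0.612) and `r2_score` returns about 0.949.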

The datasets are publicly available directly from a MariaDB database.

Open your favourite MariaDB client (MySQL Workbench works, but see the FAQ).
Use the following credentials:
- hostname: relational.fel.cvut.cz
- port: 3306
- username: guest
- password: ctu-relational
Export the "stats" database (or another version of the dataset, if available) in your favourite format (e.g., CSV or an SQL dump).
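The steps above can also be scripted. The following is a minimal sketch, assuming the third-party PyMySQL driver (`pip install pymysql`) as the client; the helper names `connect_to_stats`, `list_tables` and `export_table_to_csv` are hypothetical and not part of any official tooling for this repository. It writes one CSV file per table, with a header row of column names.

```python
import csv
from pathlib import Path

# Credentials as listed above for the public MariaDB server.
HOST = "relational.fel.cvut.cz"
PORT = 3306
USER = "guest"
PASSWORD = "ctu-relational"
DATABASE = "stats"


def connect_to_stats():
    """Open a connection with PyMySQL (an assumed driver choice, not a requirement)."""
    import pymysql  # imported lazily so the CSV helper also works with other DB-API drivers

    return pymysql.connect(
        host=HOST, port=PORT, user=USER, password=PASSWORD, database=DATABASE
    )


def list_tables(conn):
    """Return the table names of the current database (MariaDB/MySQL SHOW TABLES syntax)."""
    cur = conn.cursor()
    try:
        cur.execute("SHOW TABLES")
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.close()


def export_table_to_csv(conn, table, out_dir):
    """Write one table to <out_dir>/<table>.csv and return the file path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{table}.csv"
    cur = conn.cursor()
    try:
        # Backtick-quoted identifiers are MariaDB/MySQL syntax (SQLite accepts them too).
        cur.execute(f"SELECT * FROM `{table}`")
        header = [col[0] for col in cur.description]
        with out_path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            for row in cur:
                # NULL values arrive as None, which csv.writer writes as empty fields.
                writer.writerow(row)
    finally:
        cur.close()
    return out_path
```

For example, `conn = connect_to_stats()` followed by `for t in list_tables(conn): export_table_to_csv(conn, t, "stats_csv")` would export every table into a `stats_csv` directory. PyMySQL's default cursor buffers each full result set in memory; for large tables, passing `cursorclass=pymysql.cursors.SSCursor` to `pymysql.connect` streams rows instead.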