Mutagenesis
The dataset comprises 230 molecules trialed for mutagenicity on Salmonella typhimurium. A subset of 188 molecules is learnable using linear regression; this subset was later termed the "regression friendly" dataset. The remaining 42 molecules form the "regression unfriendly" dataset. Note that authors use this dataset with varying amounts of background knowledge (the number of features in the "molecule" table); consequently, the reported accuracies are not necessarily directly comparable.
Original source: www.cs.ox.ac.uk
Versions
Mutagenesis (by Jan Motl)
Mutagenesis_42 (by Janez Kranjc)
Mutagenesis_188 (by Janez Kranjc)
Dataset details
- Associated task:
- Classification
- Domain:
- Medicine
- Data types:
- Size:
- 900 KB
- Count of tables:
- 3
- Count of rows:
- 10,324
- Count of columns:
- 14
- Missing values:
- No
- Compound keys:
- No
- Loops:
- Yes
- Type:
- Real
- Instance count:
- 188
- Target table:
- molecule
- Target column:
- mutagenic
- Target ID:
- molecule_id
- Target timestamp:
- ?
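The target fields above amount to a simple query: one label per row of the `molecule` table, read from `mutagenic` and keyed by `molecule_id`. The sketch below assumes a DB-API 2.0 connection (for example one opened with the third-party PyMySQL package) and that the column names are exactly as listed; `load_labels` is a hypothetical helper, not part of any published tooling. Labels are returned as stored, without assuming how they are encoded.

```python
import sqlite3
from typing import Any, Dict


def load_labels(conn: Any) -> Dict[Any, Any]:
    """Return {molecule_id: mutagenic} from the target table.

    Assumes `conn` is any DB-API 2.0 connection (e.g. PyMySQL or sqlite3)
    and that the table/columns are named as in the dataset details.
    """
    cur = conn.cursor()
    try:
        # Target table: molecule; target column: mutagenic; target ID: molecule_id.
        cur.execute("SELECT molecule_id, mutagenic FROM molecule")
        return {mol_id: label for mol_id, label in cur.fetchall()}
    finally:
        cur.close()


if __name__ == "__main__":
    # Offline demo on an in-memory stand-in table with placeholder rows;
    # these are NOT real Mutagenesis records.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE molecule (molecule_id TEXT, mutagenic INTEGER)")
    demo.executemany("INSERT INTO molecule VALUES (?, ?)", [("d1", 1), ("d2", 0)])
    print(load_labels(demo))  # prints {'d1': 1, 'd2': 0}
```

Against the real database, the resulting dictionary should hold one entry per instance (188 for this version), assuming `molecule_id` is unique in the target table.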
How to download the dataset
The datasets are publicly available directly from a MariaDB database.
- Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
- Use the following credentials:
- hostname: relational.fel.cvut.cz
- port: 3306
- username: guest
- password: ctu-relational
- Export the "mutagenesis" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
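The export can also be scripted instead of done by hand in a client. The sketch below is one possible approach, not an official tool: it assumes the third-party PyMySQL package (`pip install pymysql`) for the connection and writes each table to `<table>.csv` with Python's standard `csv` module. `connect_guest` and `export_tables` are hypothetical helper names. Table names are passed in explicitly (list them first, e.g. with `SHOW TABLES;` in your client) and are assumed to come from a trusted list, since they are interpolated into the SQL. The export function only relies on the DB-API 2.0 interface, so it can be tried offline against `sqlite3`.

```python
import csv
import os
from typing import Any, Iterable, List


def connect_guest(database: str = "mutagenesis") -> Any:
    """Open a guest connection using the public credentials listed above.

    Assumes PyMySQL is installed; it is imported lazily so the export
    function below can be used without it.
    """
    import pymysql  # third-party: pip install pymysql

    return pymysql.connect(
        host="relational.fel.cvut.cz",
        port=3306,
        user="guest",
        password="ctu-relational",
        database=database,
    )


def export_tables(conn: Any, tables: Iterable[str], out_dir: str) -> List[str]:
    """Write each table to <out_dir>/<table>.csv with a header row.

    Works with any DB-API 2.0 connection; returns the written file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for table in tables:
        cur = conn.cursor()
        try:
            # Backticks quote the identifier in MariaDB/MySQL (SQLite accepts them too).
            cur.execute(f"SELECT * FROM `{table}`")
            # cursor.description holds one tuple per column; item 0 is its name.
            header = [col[0] for col in cur.description]
            path = os.path.join(out_dir, f"{table}.csv")
            with open(path, "w", newline="", encoding="utf-8") as fh:
                writer = csv.writer(fh)
                writer.writerow(header)
                writer.writerows(cur.fetchall())
            paths.append(path)
        finally:
            cur.close()
    return paths
```

A typical run would be `conn = connect_guest()`, then `export_tables(conn, ["molecule"], "mutagenesis_csv")` with the full table list you obtained, then `conn.close()`. Fetching each table in one `fetchall()` call is acceptable here because the whole database is about 900 KB.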