PTC
The Predictive Toxicology Challenge (2000) dataset consists of more than three hundred organic molecules labeled according to their carcinogenicity in male and female mice and rats.
Original source: www.predictive-toxicology.org
Versions
Toxicology (by Jan Motl)
- Unresolved issue for molecule TR499: the Prolog file has different content from the SMILES file. The molecule table contains only binarized labels for male rats (positive if MR is one of {P, CE, SE}, negative if MR is one of {NE, N}). There is a single missing value, which is possibly an error.
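The binarization rule above can be sketched in a few lines of Python. This is only an illustration of the stated mapping: the function name `binarize_mr` and the `True`/`False`/`None` return values are assumptions, and the actual encoding used in the `label` column is not specified here.

```python
# Male-rat (MR) outcome codes grouped as described in the version note.
POSITIVE_CODES = {"P", "CE", "SE"}
NEGATIVE_CODES = {"NE", "N"}


def binarize_mr(outcome):
    """Map a male-rat outcome code to a binary label.

    Returns True for a positive code, False for a negative code, and
    None for anything else (including a missing value). The return
    values are illustrative, not the encoding stored in the database.
    """
    if outcome in POSITIVE_CODES:
        return True
    if outcome in NEGATIVE_CODES:
        return False
    return None
```

Codes outside the two listed sets are deliberately left unlabeled rather than guessed.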
Dataset details
- Associated task: Classification
- Domain: Medicine
- Data types:
- Size: 8.1 MB
- Count of tables: 4
- Count of rows: 49,239
- Count of columns: 11
- Missing values: Yes
- Compound keys: No
- Loops: No
- Type: Real
- Instance count: 343
- Target table: molecule
- Target column: label
- Target ID: molecule_id
- Target timestamp: ?
Algorithms
Dataset version | Target | Algorithm | Author text | Measure | Value |
---|---|---|---|---|---|
Toxicology | label | Predictor Factory | Predictor Factory | Accuracy | 0.5951 |
How to download the dataset
The dataset is publicly available directly from a MariaDB database.
- Open your favourite MariaDB client (MySQL Workbench works, but see FAQ).
- Use the following credentials:
  - hostname: relational.fel.cvut.cz
  - port: 3306
  - username: guest
  - password: ctu-relational
- Export the "Toxicology" database (or another version of the dataset, if available) in your preferred format (e.g. CSV or SQL dump).
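For a scripted export instead of a GUI client, the steps above might look roughly like the following Python sketch. It assumes the third-party `pymysql` package is installed and that the server accepts the guest credentials listed above; the function names `dump_table` and `export_database` and the output directory name are hypothetical. Table names are discovered with `SHOW TABLES` rather than hard-coded.

```python
import csv
from pathlib import Path

# Connection details as listed in the steps above.
HOST = "relational.fel.cvut.cz"
PORT = 3306
USER = "guest"
PASSWORD = "ctu-relational"
DATABASE = "Toxicology"


def dump_table(cursor, table, out_path):
    """Write all rows of `table` to a CSV file with a header row.

    Works with any DB-API 2.0 cursor. Returns the number of data rows.
    NULL values are written as empty fields.
    """
    cursor.execute(f"SELECT * FROM `{table}`")
    header = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


def export_database(out_dir="toxicology_csv"):
    """Export every table of the database to `<out_dir>/<table>.csv`.

    Returns a dict mapping table name to exported row count.
    """
    import pymysql  # assumption: installed separately (pip install pymysql)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    conn = pymysql.connect(
        host=HOST, port=PORT, user=USER, password=PASSWORD, database=DATABASE
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW TABLES")
            tables = [row[0] for row in cur.fetchall()]
            return {t: dump_table(cur, t, out / f"{t}.csv") for t in tables}
    finally:
        conn.close()
```

Calling `export_database()` would then write one CSV file per table into the chosen directory. Nothing runs on import, so the network connection is only opened when the function is called.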