Carcinogenesis
Alternative names: PTE
Used to predict whether a given molecule is carcinogenic. The dataset contains 182 positive and 148 negative carcinogenicity tests.
Original source: kt.ijs.si
Versions
Carcinogenesis (by Janez Kranjc)
- Foreign key constraint violated: table "atom" references a drug, "d115", that is missing from the "canc" table. The dataset contains only 329 instances, whereas 330 are expected.
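To locate rows that break this foreign key yourself, an anti-join between the two tables is enough. This is a minimal sketch, assuming that table "atom" stores its drug reference in a column named drug_id (the same name the target table "canc" uses for its ID) and that you already have a DB-API 2.0 cursor on the database, for example one opened with PyMySQL using the credentials under "How to download the dataset". The helper name find_orphan_drugs is hypothetical.

```python
# Assumption: atom.drug_id is meant to reference canc.drug_id.
ORPHAN_QUERY = """
    SELECT DISTINCT a.drug_id
    FROM atom AS a
    LEFT JOIN canc AS c ON a.drug_id = c.drug_id
    WHERE c.drug_id IS NULL
"""


def find_orphan_drugs(cursor) -> list:
    """Return the sorted drug IDs that appear in `atom` but have no row in `canc`."""
    cursor.execute(ORPHAN_QUERY)
    # Each result row is a 1-tuple holding one unmatched drug_id.
    return sorted(row[0] for row in cursor.fetchall())
```

On this version of the dataset the query would be expected to return only "d115", according to the note above.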
Dataset details
- Associated task: Classification
- Domain: Medicine
- Data types:
- Size: 21 MB
- Count of tables: 6
- Count of rows: 27,570
- Count of columns: 23
- Missing values: No
- Compound keys: No
- Loops: Yes
- Type: Real
- Instance count: 329
- Target table: canc
- Target column: class
- Target ID: drug_id
- Target timestamp: ?
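The target settings above (table canc, column class, ID drug_id) can be checked with a single aggregate query. A minimal sketch, assuming a DB-API 2.0 cursor connected to the database; the helper name target_summary is hypothetical, and the label values are returned exactly as they are stored, since this page does not state their encoding.

```python
def target_summary(cursor) -> dict:
    """Return {label: row count} for the target column `class` of table `canc`."""
    # Backticks quote the column name; both MariaDB and SQLite accept them.
    cursor.execute("SELECT `class`, COUNT(*) FROM canc GROUP BY `class`")
    # fetchall() yields (label, count) pairs, which dict() turns into a mapping.
    return dict(cursor.fetchall())
```

The counts should sum to the instance count listed above.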
References
Algorithms
Dataset version | Target | Algorithm | Source | Measure | Value |
---|---|---|---|---|---|
Carcinogenesis | class | Aleph | Wordification: Propositionalization by unfolding relational data into bags of words | Accuracy | 0.5532 |
Carcinogenesis | class | Predictor Factory | Predictor Factory | Accuracy | 0.6689 |
Carcinogenesis | class | RelF | Wordification: Propositionalization by unfolding relational data into bags of words | Accuracy | 0.6018 |
Carcinogenesis | class | RSD | ClowdFlows | Accuracy | 0.55 |
Carcinogenesis | class | RSD | Wordification: Propositionalization by unfolding relational data into bags of words | Accuracy | 0.6049 |
Carcinogenesis | class | Wordification | ClowdFlows | Accuracy | 0.8 |
Carcinogenesis | class | Wordification | Wordification: Propositionalization by unfolding relational data into bags of words | Accuracy | 0.6231 |
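For context when reading these accuracies, the majority-class baseline (always predicting the positive class) follows from the 182 positive and 148 negative tests given in the description. This assumes accuracy is measured over all 330 tests; on the 329-instance version the figure shifts slightly depending on the label of the missing drug.

```latex
\text{acc}_{\text{majority}} = \frac{182}{182 + 148} = \frac{182}{330} \approx 0.5515
```

Scores near 0.55 are therefore roughly at the level of this trivial predictor.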
How to download the dataset
The datasets are publicly available directly from a MariaDB database server.
- Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
- Use the following credentials:
- hostname: relational.fel.cvut.cz
- port: 3306
- username: guest
- password: ctu-relational
- Export the "Carcinogenesis" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
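The export step can also be scripted instead of using a GUI client. This is a minimal sketch, assuming Python with a DB-API 2.0 driver such as PyMySQL (not mentioned on this page) and assuming the database on the server is named exactly "Carcinogenesis"; CONNECTION_PARAMS mirrors the credentials listed above, and the helper export_table_to_csv is hypothetical.

```python
import csv
from typing import Any, Dict

# Credentials from the list above; the database name is assumed to be "Carcinogenesis".
CONNECTION_PARAMS: Dict[str, Any] = {
    "host": "relational.fel.cvut.cz",
    "port": 3306,
    "user": "guest",
    "password": "ctu-relational",
    "database": "Carcinogenesis",
}


def export_table_to_csv(cursor, table: str, out_path: str) -> int:
    """Write one table to a CSV file (header row first) and return the number of data rows."""
    # The table name is interpolated directly, so pass only trusted names
    # (for example, names returned by SHOW TABLES).
    cursor.execute(f"SELECT * FROM {table}")
    header = [col[0] for col in cursor.description]  # first field of each entry is the column name
    rows = cursor.fetchall()
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

With PyMySQL installed, one possible workflow is to open `conn = pymysql.connect(**CONNECTION_PARAMS)`, run `SHOW TABLES` on `cur = conn.cursor()`, and call `export_table_to_csv(cur, name, name + ".csv")` for each returned table name.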