Atherosclerosis
STULONG is a 20-year longitudinal primary prevention study of middle-aged men. The study aims to identify the prevalence of atherosclerosis risk factors (RFs) in a population generally considered to be the most endangered by possible atherosclerosis com…
Biodegradability
This is an older data set of chemical structures containing 328 compounds labeled by their half-life for aerobic aqueous biodegradation (a regression task).
Bupa
Evaluation of patients for liver disorders.
Carcinogenesis
The task is to predict whether a given molecule is carcinogenic. The dataset contains 182 positive carcinogenicity tests and 148 negative tests.
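As a rough point of reference for the class balance stated above, the accuracy of a classifier that always predicts the majority class can be worked out directly from the two counts. This is only a sketch: the variable names are illustrative, and it assumes each test counts as one example.

```python
# Class counts as stated in the dataset description.
positives = 182
negatives = 148

total = positives + negatives

# Accuracy of always predicting the larger class (assumes one example per test).
majority_baseline = max(positives, negatives) / total

print(f"{total} examples, majority-class accuracy {majority_baseline:.3f}")
```

Any learned model should be compared against this figure rather than against 50%.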
FNHK
Anonymised data from a hospital in Hradec Kralove, Czech Republic, about treatment and medication.
Genes
KDD Cup 2001 prediction of gene/protein function and localization.
Hepatitis
The PKDD'02 Hepatitis dataset describes 206 instances of Hepatitis B, contrasted against 484 cases of Hepatitis C.
Thrombosis
PKDD'99 Medical dataset describes 41 patients with Thrombosis.
Musk
The Musk database describes molecules occurring in different conformations. Each molecule is either musk or non-musk and one of the conformations determines this property. Such a problem is known as a multiple-instance problem, and is modeled by two tables molecule and…
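The multiple-instance rule described above (a molecule is positive if at least one of its conformations is) can be sketched as follows. The row layout, identifiers, and toy values are hypothetical and are not the actual Musk schema. In the real data the per-conformation flag is not observed; it appears here only to illustrate the rule.

```python
from collections import defaultdict

# Hypothetical per-conformation rows: (molecule_id, conformation_id, is_positive).
# The is_positive flag is an illustration only; real data labels molecules, not conformations.
conformations = [
    ("m1", "c1", False),
    ("m1", "c2", True),
    ("m2", "c1", False),
    ("m2", "c2", False),
]

def bag_labels(rows):
    """Label a molecule (bag) positive if any of its conformations (instances) is positive."""
    labels = defaultdict(bool)
    for molecule_id, _conformation_id, positive in rows:
        labels[molecule_id] = labels[molecule_id] or positive
    return dict(labels)

molecule_labels = bag_labels(conformations)
print(molecule_labels)
```

Because only the molecule-level label is known in practice, a learner has to work out which conformation, if any, is responsible for it.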
Mutagenesis
The dataset comprises 230 molecules tested for mutagenicity on Salmonella typhimurium. A subset of 188 molecules is learnable using linear regression. This subset was later termed the "regression friendly" dataset. The remaining subset of 42 molecules is named the …
Pima
The National Institute of Diabetes and Digestive and Kidney Diseases conducted a study on 768 adult female Pima Indians living near Phoenix.
PTE
A database from the Predictive Toxicology Evaluation Challenge (1997). The task is to predict whether a compound is carcinogenic.
Pyrimidine
A pyrimidine QSAR dataset. The goal is to predict the inhibition of dihydrofolate reductase by pyrimidines.
PTC
The Predictive Toxicology Challenge (2000) consists of more than three hundred organic molecules labeled according to their carcinogenicity in male and female mice and rats.
Triazine
A triazine QSAR dataset. The goal is to predict the inhibition of dihydrofolate reductase by triazines.