TPCDS

TPC-DS is a decision support benchmark that models several generally applicable aspects of a decision support system, including queries and data maintenance. Although the underlying business model of TPC-DS is a retail product supplier, the database schema, data population, queries, data maintenance model, and implementation rules have been designed to be broadly representative of modern decision support systems.

Original source: www.tpc.org

Versions

  • Tpcds (by Jan Motl)

Dataset details

Associated task:
Classification
Domain:
Retail
Data types:
Size:
4.8 GB
Count of tables:
24
Count of rows:
21,005,545
Count of columns:
425
Missing values:
Yes
Compound keys:
No
Loops:
Yes
Type:
Synthetic
Instance count:
97,006
Target table:
customer
Target column:
c_preferred_cust_flag
Target ID:
c_customer_sk
Target timestamp:
?

How to download the dataset

The dataset is publicly available directly from a MariaDB database.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
  2. Use the following credentials:
    • hostname: relational.fel.cvut.cz
    • port: 3306
    • username: guest
    • password: ctu-relational
  3. Export the "tpcds" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or an SQL dump).
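
If you prefer a script to a graphical client, the steps above can be sketched in Python roughly as follows. This is a minimal sketch, not an official export tool: it assumes the third-party PyMySQL driver is installed (any DB-API driver should behave similarly), and the names `CONNECTION`, `export_table`, and `main` as well as the output file `customer.csv` are hypothetical choices made for illustration. Only the host, port, credentials, database name, and the `customer` table come from this page.

```python
import csv

# Connection details as listed in the steps above.
CONNECTION = {
    "host": "relational.fel.cvut.cz",
    "port": 3306,
    "user": "guest",
    "password": "ctu-relational",
    "database": "tpcds",
}


def export_table(cursor, table, out):
    """Write every row of `table` to the file-like `out` as CSV.

    `cursor` is any DB-API cursor. Returns the number of data rows written.
    """
    # Table names cannot be passed as query parameters, so accept only
    # plain identifiers (letters, digits, underscores) before interpolating.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    cursor.execute(f"SELECT * FROM `{table}`")

    writer = csv.writer(out)
    # The first element of each description entry is the column name.
    writer.writerow([col[0] for col in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


def main():
    # Assumption: PyMySQL is available (pip install pymysql).
    import pymysql

    conn = pymysql.connect(**CONNECTION)
    try:
        with conn.cursor() as cursor, open("customer.csv", "w", newline="") as f:
            rows = export_table(cursor, "customer", f)
            print(f"{rows} rows written to customer.csv")
    finally:
        conn.close()
```

Calling `main()` exports the target table only. To export the whole database, one could loop over the table names returned by `SHOW TABLES` and call `export_table` once per table; given the dataset size listed above, expect a full export to take a while.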