Data sets
Below are some data sets that may be used with some of the exercises in the book.
Training sets
- Missing data
- Hidden variable
- Structural constraint
- Infected milk (10 000, missing values)
- Infected milk (100 000, no missing values)
The KDD Cup competitions and the UCI Machine Learning Repository are sources of other data sets frequently used in the machine learning and data mining literature.
KDD Cup data
UCI data repository
Last modified: Tue Jun 19 20:13:08 2007