Auto-WEKA: Sample Datasets

 

Below are some sample datasets that have been used with Auto-WEKA. Each zip contains two files, test.arff and train.arff, in WEKA's native ARFF format. To use these zip files with Auto-WEKA, pass them to an InstanceGenerator, which splits the data into subsets for procedures such as cross-validation.
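As a quick sanity check before handing an archive to Auto-WEKA, the layout described above can be inspected with a few lines of Python. This is only a sketch using the standard library, not part of Auto-WEKA: the function names are hypothetical, and the attribute parser is deliberately simplified (it does not handle quoted attribute names that contain spaces).

```python
import io
import posixpath
import zipfile


def has_train_and_test(zip_source):
    """Return True if the archive contains both train.arff and test.arff.

    zip_source may be a file path or a binary file-like object.
    Only base names are compared, so files inside a folder also match.
    """
    with zipfile.ZipFile(zip_source) as archive:
        base_names = {posixpath.basename(name) for name in archive.namelist()}
    return {"train.arff", "test.arff"} <= base_names


def read_arff_attributes(zip_source, member):
    """List the attribute names declared in one ARFF file inside the zip.

    Simplification (assumption): the name is taken as the second
    whitespace-separated token of each @attribute line.
    """
    names = []
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
            for line in text:
                stripped = line.strip()
                lowered = stripped.lower()
                if lowered.startswith("@attribute"):
                    parts = stripped.split(None, 2)
                    if len(parts) >= 2:
                        names.append(parts[1])
                elif lowered.startswith("@data"):
                    # The header ends where the data section begins.
                    break
    return names
```

For example, `has_train_and_test("dexter.zip")` would report whether a downloaded archive has the expected pair of files, assuming it was saved under that name in the working directory.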

To perform 10-fold cross-validation with a specific seed, you can use the following line for your instanceGeneratorArgs that you pass to the ExperimentConstructor:

When running an Auto-WEKA wrapper, you can then use the following 10 lines as an instanceString:

See the manual provided with Auto-WEKA for more details on how to chain InstanceGenerators together.
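To illustrate what seeded 10-fold cross-validation means in general, here is a minimal Python sketch. It is not Auto-WEKA's InstanceGenerator and makes no claim about how Auto-WEKA assigns instances to folds; the function name and parameters are hypothetical, and the split is not stratified by class.

```python
import random


def k_fold_indices(n_instances, num_folds=10, seed=0):
    """Return a list of (training, validation) index lists, one per fold.

    The same seed always produces the same partition, which is what
    makes repeated experiments comparable.
    """
    indices = list(range(n_instances))
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices round-robin into num_folds groups.
    folds = [indices[i::num_folds] for i in range(num_folds)]

    splits = []
    for held_out in range(num_folds):
        validation = sorted(folds[held_out])
        training = sorted(
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        )
        splits.append((training, validation))
    return splits
```

Each instance appears in exactly one validation set and in the training set of the other nine folds, so every fold can be evaluated independently.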

Data Sets

Name/Source       Auto-WEKA Zip                  Size
Dexter            dexter.zip                     311K
German Credit     creditg.zip                    18K
Dorothea          dorothea.zip                   3.3M
Yeast             yeast.zip                      13K
Amazon            amazon.zip                     2.3M
Secom             secom.zip                      2.0M
Semeion           semeion.zip                    51K
Car               car.zip                        5.6K
Madelon           madelon.zip                    1.6M
KR-vs-KP          krvskp.zip                     17K
Abalone           abalone.zip                    57K
Wine Quality      winequalitywhite.zip           74K
Waveform          waveform.zip                   338K
Gisette           gisette.zip                    12M
Convex            convex.zip                     4.2M
CIFAR-10-Small    cifar10small.zip               67M
MNIST             mnist.zip                      23M
Rot. MNIST + BI   mnistrotationbackimagenew.zip  114M
Shuttle           shuttle.zip                    425K
KDD09-Appetency   kddcup09appetency.zip          9.2M
CIFAR-10          cifar10.zip                    199M

Raw SMAC Data

We have been asked to provide the raw state files for the SMAC runs on these datasets. Note that these runs did not use the same version of Auto-WEKA as in the KDD paper.

Raw Data (1.2GB)