Evolutionary Computation for Big Data and Big Learning Workshop Data Mining Competition 2014: Self-deployment track

The competition is closed now.
Competition report

The results of the competition were presented at the ECBDL'14 workshop, part of GECCO-2014 in Vancouver, July 13th, 2014

Dataset

The dataset select for this competition comes from the Protein Structure Prediction field, and it was originally generated to train a predictor for the residue-residue contact prediction track of the CASP9 competiton. The dataset has 32 million instances, 631 attributes, 2 classes, 98% of negative examples and occupies, when uncompressed, about 56GB of disk space. The details of the dataset generation and a learning strategy used to train a method for this problem using evolutionary computation have been published in this Bioinformatics article on contact map prediction. The dataset is available in the ARFF format of the WEKA machine learning package.

training set (1.4 GB): training_set.arff.xz
test set (116 MB): test_set.arff.xz
(updated 2014-07-07 with the actual classes of the test samples, after the competition closed)