Evolutionary Computation for Big Data and Big Learning Workshop Data Mining Competition 2014: Self-deployment track

Competition report

The results of the competition were presented at the ECBDL'14 workshop, held as part of GECCO-2014 in Vancouver on July 13th, 2014.


The dataset selected for this competition comes from the Protein Structure Prediction field. It was originally generated to train a predictor for the residue-residue contact prediction track of the CASP9 competition. The dataset has 32 million instances, 631 attributes, and 2 classes; 98% of the examples are negative, and it occupies about 56GB of disk space when uncompressed. The details of the dataset generation, and of a learning strategy that uses evolutionary computation to train a method for this problem, have been published in this Bioinformatics article on contact map prediction. The dataset is available in the ARFF format of the WEKA machine learning package.
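
Because the uncompressed file is far larger than typical main memory, a first look at the data is easiest in a single streaming pass. The sketch below is one possible way to count the attributes declared in an ARFF header and the class distribution of the data rows without loading the file. It assumes the dense (comma-separated) ARFF layout with the class label as the last value on each row; neither assumption is confirmed by the competition description, and the function names are illustrative only. With 98% negative examples, roughly 2% of the 32 million instances (on the order of 640,000) would be expected to be positive.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_classes(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Stream over ARFF lines; return (declared attribute count, class counts).

    Assumptions (not stated in the competition description): dense ARFF rows,
    class label stored as the last comma-separated value.
    """
    n_attributes = 0
    in_data = False
    counts: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and ARFF comments, which start with '%'.
        if not line or line.startswith("%"):
            continue
        if not in_data:
            # Header keywords are case-insensitive in ARFF.
            keyword = line.split(None, 1)[0].lower()
            if keyword == "@attribute":
                n_attributes += 1
            elif keyword == "@data":
                in_data = True
            continue
        # Only the last field is needed, so avoid splitting the whole row.
        label = line.rsplit(",", 1)[-1].strip()
        counts[label] += 1
    return n_attributes, counts


def imbalance_ratio(counts: Counter) -> float:
    """Ratio of the most frequent class to the least frequent class."""
    return max(counts.values()) / min(counts.values())
```

A file object can be passed directly, since iterating over it yields one line at a time; for example, `with open(path) as f: n, counts = count_classes(f)`, where `path` is wherever the uncompressed ARFF file was saved. Note that the attribute count returned here includes the class attribute if it is declared in the header.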