The dataset selected for this competition comes from the field of Protein Structure Prediction. It was originally generated to train a predictor for the residue-residue contact prediction track of the CASP9 competition. The dataset has 32 million instances, 631 attributes, and 2 classes; 98% of the instances are negative examples, and it occupies about 56GB of disk space when uncompressed. The details of the dataset generation, and of a learning strategy that uses evolutionary computation to train a method for this problem, have been published in this Bioinformatics article on contact map prediction. The dataset is available in the ARFF format of the WEKA machine learning package.
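At about 56GB uncompressed, the file is unlikely to fit in memory on most machines, so a first sanity check of the attribute count and class balance is best done by streaming it line by line. The following is a minimal sketch, not the organizers' tooling. It assumes the file uses the dense (comma-separated) ARFF layout, that the class label is the last value on each data line, and that no value contains a quoted comma. The function name `summarize_arff` and the toy sample are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_arff(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Stream ARFF lines and return (number of @attribute lines, class counts).

    Assumptions (not confirmed by the dataset description): dense ARFF,
    class label is the last comma-separated value, no quoted commas.
    """
    n_attributes = 0
    class_counts = Counter()
    in_data = False
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and ARFF comments, which start with '%'.
        if not line or line.startswith("%"):
            continue
        if not in_data:
            # Header keywords are case-insensitive in ARFF.
            keyword = line.split(None, 1)[0].lower()
            if keyword == "@attribute":
                n_attributes += 1  # this count includes the class attribute
            elif keyword == "@data":
                in_data = True
            continue
        # Data section: only the last field is needed for the class balance.
        label = line.rsplit(",", 1)[-1].strip()
        class_counts[label] += 1
    return n_attributes, class_counts


# Toy stand-in for the real file, which is far too large to inline.
sample = """% toy example
@relation toy
@attribute f1 numeric
@attribute f2 numeric
@attribute class {0,1}
@data
0.1,0.2,0
0.3,0.4,0
0.5,0.6,1
""".splitlines()

n_attributes, class_counts = summarize_arff(sample)
print(n_attributes, dict(class_counts))
```

Because the function accepts any iterable of lines, the same call works on an open file handle (`with open(path) as fh: summarize_arff(fh)`, where `path` is wherever the uncompressed ARFF file was saved), and only one line is held in memory at a time.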