Improving Performances of Suboptimal Greedy Iterative Biclustering Heuristics via Localization


About our research

 

April,2010

Biclustering gene expression data is the problem of extracting submatrices of genes and conditions exhibiting significant correlation across both the rows and the columns of a data matrix of expression values. Even the simplest versions of the problem are computationally hard. Most of the proposed solutions therefore employ greedy iterative heuristics that locally optimize a suitably assigned scoring function. We provide a fast and simple preprocessing algorithm called localization that reorders the rows and columns of the input data matrix in such a way as to group correlated entries in small local neighborhoods within the matrix. The proposed localization algorithm takes its roots from effective use of graph-theoretical methods applied to problems exhibiting a similar structure to that of biclustering. In order to evaluate the effectivenesss of the localization preprocessing algorithm, we focus on three representative greedy iterative heuristic methods. We show how the localization preprocessing can be incorporated into each representative algorithm to improve biclustering performance. Furthermore we propose a simple biclustering algorithm, Random Extraction After Localization (REAL) which randomly extracts submatrices from the localization preprocessed data matrix, eliminates those with low similarity scores, and provides the rest as correlated structures representing biclusters. We compare the proposed localization preprocessing with another preprocessing alternative, non-negative matrix factorization. We show that our fast and simple localization procedure provides similar or even better results than the computationally heavy matrix factorization preprocessing with regards to H-value tests. We next demonstrate that the performances of the three representative greedy iterative heuristic methods improve with localization preprocessing when biological correlations in the form of functional enrichment and PPI verification constitute the main performance criteria. The fact that the random extraction method based on localization REAL performs better than the representative greedy heuristic methods under same criteria also confirms the effectiveness of the suggested preprocessing method.

Previously, we proposed a biclustering visualization tool called Robinviz. You can use it to visualize the results of REAL which is embedded within this visualization tool. We also provide the self executables of REAL.

We provide the results of two localized datasets for Yeast( Saccharomyces Cerevisiae) and Thaliana(Aribidopsis Thaliana).

Additionaly we provide the implementation of our experimental framework and it is free to download.

In the above screenshot localization is applied on the input expression matrix(left) to provide the reordered localized matrix (right).