Creating innovative bio-convergent technologies for better human life

Title: Genomic Data Mining Enhanced by Symbolic Manipulation of Boolean Functions
Date: September 7, 2005, 11:00 a.m.

Venue: Room 219, 정문술빌딩 (Chung Moon Soul Building)

Speaker: Sungroh Yoon (Stanford University)

Abstract

Today, a growing number of large-scale genomic data sets are being produced by various high-throughput technologies, and genomic data mining has never been more important. Clustering is an unsupervised learning technique that has long been popular in data analysis. Although there is a mature statistical literature on clustering, new types of genomic data such as gene expression data have spurred the development of many new methods. In particular, biclustering performs simultaneous clustering of the rows and columns of a data matrix, identifying patterns that appear as (possibly overlapping) submatrices. Although this approach has clear advantages over conventional clustering techniques, developing an efficient biclustering algorithm has been challenging, because the biclustering problem is inherently intractable and hard to approximate.

The first part of this dissertation presents a novel biclustering algorithm based on the symbolic manipulation of Boolean functions. The algorithm uses zero-suppressed binary decision diagrams (ZBDDs) to implicitly represent and manipulate the massive intermediate data that arise during biclustering. With the ZBDDs, the proposed algorithm can find all biclusters that satisfy the specified input parameters. The second part discusses the application of this algorithm to various genomic data mining tasks, such as analyzing gene expression data, linking clinical traits with related genes, and predicting microRNA regulatory modules. The experimental results demonstrate that the proposed method outperforms the alternative techniques tested in terms of response time, the number of biclusters found, and, more importantly, how well the discovered biclusters agree with known biological knowledge.
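To make the notion of a bicluster concrete, here is a minimal brute-force sketch in Python. It assumes the expression matrix has already been discretized to 0/1 values (for example, "up-regulated" versus "not"). It also treats a bicluster as a maximal all-ones submatrix with at least `min_rows` rows and `min_cols` columns; these parameter and function names are hypothetical. The sketch stores every candidate explicitly as Python sets rather than as ZBDDs, so it is not the speaker's algorithm, and its running time grows exponentially with the number of columns. That growth is exactly the kind of intermediate data the abstract's approach represents implicitly with ZBDDs.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

# A bicluster is a (row set, column set) pair; names here are illustrative only.
Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]


def rows_with(matrix: Sequence[Sequence[int]], cols: FrozenSet[int]) -> FrozenSet[int]:
    """Rows that have a 1 in every column of `cols`."""
    return frozenset(i for i, row in enumerate(matrix) if all(row[j] for j in cols))


def cols_shared(matrix: Sequence[Sequence[int]], rows: FrozenSet[int]) -> FrozenSet[int]:
    """Columns that have a 1 in every row of `rows`."""
    return frozenset(j for j in range(len(matrix[0])) if all(matrix[i][j] for i in rows))


def maximal_biclusters(matrix: Sequence[Sequence[int]],
                       min_rows: int = 2, min_cols: int = 2) -> List[Bicluster]:
    """Enumerate maximal all-ones submatrices by brute force over column subsets.

    A pair (R, C) is kept only if R is exactly the set of rows containing C and
    C is exactly the set of columns shared by R, so no bicluster can be extended.
    """
    if min_rows < 1 or min_cols < 1:
        raise ValueError("min_rows and min_cols must be at least 1")
    n_cols = len(matrix[0]) if matrix else 0
    found: List[Bicluster] = []
    for k in range(min_cols, n_cols + 1):
        for combo in combinations(range(n_cols), k):
            cols = frozenset(combo)
            rows = rows_with(matrix, cols)
            if len(rows) < min_rows:
                continue  # too few supporting rows
            if cols_shared(matrix, rows) == cols:
                found.append((rows, cols))  # closed in both directions: maximal
    return found


# Toy 0/1 matrix: rows could be genes, columns could be conditions.
data = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 1],
]
result = maximal_biclusters(data, min_rows=2, min_cols=2)
for rows, cols in result:
    print("rows", sorted(rows), "x cols", sorted(cols))
```

On this toy matrix the sketch reports four biclusters, and several of them share rows and columns. This illustrates the "possibly overlapping submatrices" mentioned in the abstract, and it shows why the result set can grow quickly on real data.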