Type of Document: Dissertation
Author: Zhu, Yitan
Author's Email Address: firstname.lastname@example.org
URN: etd-09092009-214731
Title: Learning Statistical and Geometric Models from Microarray Gene Expression Data
Degree: PhD
Department: Electrical and Computer Engineering
Advisory Committee:
- Wang, Yue J. (Committee Chair)
- Lu, Chang-Tien (Committee Member)
- Wyatt, Christopher L. (Committee Member)
- Xuan, Jianhua Jason (Committee Member)
- Zaghloul, Amir I. (Committee Member)
Keywords:
- Blind Source Separation
- Convex Analysis and Optimization
- Gene Expressions
- Clustering Evaluation
- Data Clustering and Visualization
Date of Defense: 2009-09-02
Availability: unrestricted
Abstract:
In this dissertation, we propose and develop innovative data modeling and analysis methods for extracting meaningful and specific information about disease mechanisms from microarray gene expression data.
To provide a high-level overview of gene expression data that makes the data structure easy to understand, we propose a novel statistical data clustering and visualization algorithm that is effective across multiple clustering tasks and overcomes some major limitations of existing clustering methods. The algorithm performs progressive, divisive hierarchical clustering and visualization to discover cluster structure within complex, high-dimensional gene expression data. It is supported by hierarchical statistical modeling, supervised and unsupervised informative gene (feature) selection, supervised and unsupervised data visualization, and user or prior-knowledge guidance through human-data interaction.
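The divisive (top-down) strategy described above can be illustrated with a deliberately simplified sketch. It is not the dissertation's algorithm: the hierarchical statistical model, gene selection, visualization, and user interaction are all omitted. As assumptions made only for this illustration, each node is bisected with plain 2-means, and a node is split only if the split removes at least a fraction `min_gain` of its within-cluster sum of squares (SSE); the function names and the `min_size`, `max_depth`, and `min_gain` parameters are hypothetical.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points: List[Vector]) -> float:
    """Within-group sum of squared distances to the group centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points: List[Vector], iters: int = 20) -> List[int]:
    """Bisect points with 2-means (Lloyd's algorithm), deterministically seeded."""
    c = centroid(points)
    c0 = list(max(points, key=lambda p: sq_dist(p, c)))   # farthest from centroid
    c1 = list(max(points, key=lambda p: sq_dist(p, c0)))  # farthest from c0
    labels: List[int] = []
    for _ in range(iters):
        new = [0 if sq_dist(p, c0) <= sq_dist(p, c1) else 1 for p in points]
        if new == labels:
            break  # assignments stopped changing
        labels = new
        g0 = [p for p, lab in zip(points, labels) if lab == 0]
        g1 = [p for p, lab in zip(points, labels) if lab == 1]
        if not g0 or not g1:
            break  # degenerate split; the caller rejects it
        c0, c1 = centroid(g0), centroid(g1)
    return labels


def divisive_cluster(points: Sequence[Vector], idx: Optional[List[int]] = None,
                     min_size: int = 2, max_depth: int = 3,
                     min_gain: float = 0.5, depth: int = 0) -> List[List[int]]:
    """Top-down bisection; returns leaf clusters as lists of row indices.

    A node is split only if both children have >= min_size members and the
    split removes at least a fraction min_gain of the node's SSE (a
    hypothetical stopping rule standing in for a statistical model test).
    """
    if idx is None:
        idx = list(range(len(points)))
    if depth >= max_depth or len(idx) < 2 * min_size:
        return [idx]
    sub = [points[i] for i in idx]
    parent = sse(sub)
    if parent == 0.0:
        return [idx]  # all points identical: nothing to split
    labels = two_means(sub)
    kids = [[i for i, lab in zip(idx, labels) if lab == k] for k in (0, 1)]
    if min(len(k) for k in kids) < min_size:
        return [idx]
    children = sum(sse([points[i] for i in k]) for k in kids)
    if (parent - children) / parent < min_gain:
        return [idx]
    return [leaf for k in kids
            for leaf in divisive_cluster(points, k, min_size, max_depth,
                                         min_gain, depth + 1)]
```

Called as `divisive_cluster(points)`, the sketch returns a flat list of leaf clusters in depth-first order; the recursion could equally return a nested tree to drive a hierarchical display, and a model-based criterion would normally replace the SSE-gain threshold.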
To select suitable clustering algorithms for gene expression data analysis, we design an objective and reliable clustering evaluation scheme that assesses the performance of clustering algorithms by comparing their sample clustering outcomes with phenotype categories. Using this evaluation scheme, we compare the performance of our newly developed clustering algorithm with that of several benchmark clustering methods and demonstrate its superior and stable performance.
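The evaluation scheme itself is not specified in this abstract. As one common way to score a sample clustering against known phenotype categories, the sketch below computes the adjusted Rand index (ARI) from the contingency table of cluster labels versus phenotype labels. Using ARI here is an assumption, not necessarily the measure the dissertation adopts, and the function name is hypothetical. A score of 1 means the two partitions agree up to relabeling; values near 0 indicate chance-level agreement.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(clusters: Sequence[Hashable],
                        phenotypes: Sequence[Hashable]) -> float:
    """Adjusted Rand index between a clustering and reference phenotype labels."""
    n = len(clusters)
    if n != len(phenotypes):
        raise ValueError("label sequences must have the same length")
    if n < 2:
        raise ValueError("need at least two samples")
    # Contingency table: number of samples in each (cluster, phenotype) cell.
    cells = Counter(zip(clusters, phenotypes))
    row_sums = Counter(clusters)
    col_sums = Counter(phenotypes)
    # Sample pairs grouped together within cells, clusters, and phenotypes.
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())
    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single group
    return (index - expected) / (maximum - expected)
```

Because ARI ignores how clusters are numbered, `adjusted_rand_index([0, 0, 1, 1], ['a', 'a', 'b', 'b'])` evaluates to 1.0, while clustering six samples as `[0, 0, 1, 1, 2, 2]` against phenotypes `[0, 0, 0, 1, 1, 1]` gives about 0.24.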
To identify the underlying active biological processes that jointly produce the observed biological event, we propose a latent linear mixture model that quantitatively describes how the observed gene expressions are generated by mixing the latent active biological processes. We prove a series of theorems showing that the noise-free model is identifiable. Building on relevant geometric concepts, convex analysis and optimization, gene clustering, and model stability analysis, we develop a robust blind source separation method that fits the model to gene expression data and then identifies the underlying biological processes and their activity levels under different biological conditions.
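A minimal illustration of the geometric idea, under stated assumptions: write the expression profile of gene i across M conditions as x_i = A s_i, where A is a nonnegative M×K mixing matrix and s_i ≥ 0 holds the activities of K latent processes. If, as an assumption made only here, every process has at least one "pure" gene driven by that process alone, then rescaling each x_i to sum to one places all genes inside a simplex whose vertices are the rescaled columns of A, so locating the vertices recovers A up to column scaling. The sketch finds those vertices with the successive projection algorithm (SPA), a standard technique for noise-free separable mixtures that is swapped in for illustration. It is not the dissertation's convex-analysis method and has no noise handling, gene clustering, or stability analysis; the helper names and toy data are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_rows(genes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale each nonnegative gene profile so its entries sum to one."""
    out: List[List[float]] = []
    for g in genes:
        total = sum(g)
        if total <= 0:
            raise ValueError("each gene profile needs a positive sum")
        out.append([v / total for v in g])
    return out


def project_out(r: List[float], u: List[float]) -> List[float]:
    """Remove from r its component along the unit vector u."""
    d = sum(a * b for a, b in zip(r, u))
    return [a - d * b for a, b in zip(r, u)]


def successive_projection(points: List[List[float]], k: int) -> List[int]:
    """Indices of up to k points taken as simplex vertices (noise-free SPA).

    Each round picks the residual with the largest Euclidean norm, then
    projects every residual onto the orthogonal complement of that pick.
    """
    residual = [list(p) for p in points]
    chosen: List[int] = []
    for _ in range(k):
        norms = [sum(v * v for v in r) for r in residual]
        j = max(range(len(residual)), key=lambda i: norms[i])
        if norms[j] <= 1e-12:
            break  # the data span fewer than k independent directions
        chosen.append(j)
        scale = math.sqrt(norms[j])
        u = [v / scale for v in residual[j]]
        residual = [project_out(r, u) for r in residual]
    return chosen


# Hypothetical toy data: rows are genes, columns are M = 3 conditions, built
# from mixing columns (2, 1, 0), (0, 1, 1) and (1, 0, 3). Rows 1, 3 and 5 are
# "pure" genes; the other rows are nonnegative mixtures of those columns.
genes = [[2, 2, 1], [10, 5, 0], [3, 2, 4], [0, 2, 2], [1, 2, 5], [1, 0, 3]]
profiles = normalize_rows(genes)
vertices = successive_projection(profiles, k=3)
estimated_columns = [profiles[j] for j in vertices]  # columns of A, up to scale
```

With noisy data, single-point SPA picks are sensitive to outliers, which is one reason a practical method would add the gene clustering and stability analysis described above.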
Based on experimental results obtained on cancer, muscle regeneration, and muscular dystrophy gene expression data, we believe that the research presented in this dissertation not only contributes to the engineering research areas of machine learning and pattern recognition but also provides novel and effective tools for addressing many biomedical research problems and improving the understanding of disease mechanisms.
File: Zhu_Y_D_2009.pdf (2.93 Mb)