Title page for ETD etd-09092009-214731

Type of Document Dissertation
Author Zhu, Yitan
Author's Email Address yitanzhu@vt.edu
URN etd-09092009-214731
Title Learning Statistical and Geometric Models from Microarray Gene Expression Data
Degree PhD
Department Electrical and Computer Engineering
Advisory Committee
Advisor Name Title
Wang, Yue J. Committee Chair
Lu, Chang-Tien Committee Member
Wyatt, Christopher L. Committee Member
Xuan, Jianhua Jason Committee Member
Zaghloul, Amir I. Committee Member
  • Blind Source Separation
  • Convex Analysis and Optimization
  • Gene Expressions
  • Clustering Evaluation
  • Data Clustering and Visualization
Date of Defense 2009-09-02
Availability unrestricted
In this dissertation, we propose and develop innovative data modeling and analysis methods for extracting meaningful and specific information about disease mechanisms from microarray gene expression data.

To provide a high-level overview of gene expression data for easy and insightful understanding of data structure, we propose a novel statistical data clustering and visualization algorithm that is comprehensively effective for multiple clustering tasks and that overcomes some major limitations of existing clustering methods. The proposed clustering and visualization algorithm performs progressive, divisive hierarchical clustering and visualization, supported by hierarchical statistical modeling, supervised/unsupervised informative gene/feature selection, supervised/unsupervised data visualization, and user/prior knowledge guidance through human-data interactions, to discover cluster structure within complex, high-dimensional gene expression data.

For the purpose of selecting suitable clustering algorithm(s) for gene expression data analysis, we design an objective and reliable clustering evaluation scheme to assess the performance of clustering algorithms by comparing their sample clustering outcome to phenotype categories. Using the proposed evaluation scheme, we compared the performance of our newly developed clustering algorithm with those of several benchmark clustering methods, and demonstrated the superior and stable performance of the proposed clustering algorithm.

To identify the underlying active biological processes that jointly form the observed biological event, we propose a latent linear mixture model that quantitatively describes how the observed gene expressions are generated by a process of mixing the latent active biological processes. We prove a series of theorems to show the identifiability of the noise-free model. Based on relevant geometric concepts, convex analysis and optimization, gene clustering, and model stability analysis, we develop a robust blind source separation method that fits the model to the gene expression data and subsequently identify the underlying biological processes and their activity levels under different biological conditions.

Based on the experimental results obtained on cancer, muscle regeneration, and muscular dystrophy gene expression data, we believe that the research work presented in this dissertation not only contributes to the engineering research areas of machine learning and pattern recognition, but also provides novel and effective solutions to potentially solve many biomedical research problems, for improving the understanding about disease mechanisms.

  Filename       Size       Approximate Download Time (Hours:Minutes:Seconds) 
 28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access 
  Zhu_Y_D_2009.pdf 2.93 Mb 00:13:33 00:06:58 00:06:05 00:03:02 00:00:15

Browse All Available ETDs by ( Author | Department )

dla home
etds imagebase journals news ereserve special collections
virgnia tech home contact dla university libraries

If you have questions or technical problems, please Contact DLA.