Type of Document: Dissertation
Author: Cheng, Hui
Author's Email Address: firstname.lastname@example.org
URN: etd-10292010-151237
Title: Data integration and visualization for systems biology data
Degree: PhD
Department: Genetics, Bioinformatics, and Computational Biology
Advisory Committee:
- Mendes, Pedro J. P. (Committee Chair)
- Hoeschele, Ina (Committee Member)
- Laubenbacher, Reinhard C. (Committee Member)
- Tyler, Brett M. (Committee Member)
Keywords:
- Fast Fourier transform
- phase spectrum
- data fusion
- data visualization
- biplot display
- Systems biology
- data integration
Date of Defense: 2010-10-27
Availability: unrestricted
Abstract:
Systems biology aims to understand cellular behavior in terms of the spatiotemporal interactions among cellular components such as genes, proteins and metabolites. Comprehensive visualization tools for exploring multivariate data are needed to gain insight into the physiological processes reflected in these molecular profiles. Before systems biology can live up to its potential, data fusion methods are also needed to study high-throughput transcriptomics, metabolomics and proteomics data jointly. In this work I explored mathematical and statistical methods and visualization tools that address the main challenges of systems biology data fusion and help extract insight from these comprehensive data sets.
To choose and apply multivariate methods appropriately, it is important to know the distribution of the experimental data. Chi-square Q-Q plots and violin plots applied to all of the M. truncatula and V. vinifera data showed that most distributions are right-skewed (Chapter 2). The biplot display is an effective tool for reducing the dimensionality of systems biology data and for displaying molecules and time points jointly on the same plot. A biplot of the M. truncatula data revealed the overall system behavior, including unidentified compounds of interest and the dynamics of the most highly responsive molecules (Chapter 3). The phase spectrum computed from the fast Fourier transform of time-course data has been found to play a more important role than the amplitude spectrum in signal reconstruction. Phase spectrum analyses of in silico data generated from two artificial biochemical networks, the Claytor model and the AB2 model, showed that the phase spectrum is an effective tool for systems biology data fusion despite the heterogeneity of the data (Chapter 4). The difference between data integration and data fusion is discussed further. Biplot analysis of scaled data was applied to integrate transcriptome, metabolome and proteome data from the V. vinifera project. The phase spectrum combined with k-means clustering was used for integrative analyses of the transcriptome and metabolome of the M. truncatula yeast elicitation data, and of the transcriptome, metabolome and proteome of the V. vinifera salinity stress data. Phase spectrum analysis and the biplot display were then compared as tools for data fusion (Chapter 5). The results suggest that the phase spectrum may perform better than the biplot.
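The phase-spectrum-plus-k-means idea summarized above can be illustrated with a short sketch. This is not the thesis's implementation: it assumes every molecule's profile is sampled at the same evenly spaced time points. The helper names `phase_features` and `kmeans` are hypothetical, and several choices were made only for this sketch: the `rel_tol` cutoff for near-zero frequency bins, the (cos, sin) embedding of angles, and the farthest-point initialization. The sketch also shows why phase suits heterogeneous data. Multiplying a profile by a positive constant leaves every Fourier phase unchanged, and adding a constant offset changes only the zero-frequency bin, which is dropped. As a result, profiles measured on very different scales can be grouped by shape alone.

```python
import numpy as np

# Hypothetical helpers: a sketch under stated assumptions, not the thesis's code.


def phase_features(profiles, n_freqs=None, rel_tol=1e-8):
    """Phase-spectrum features for each row of `profiles` (one molecule per row).

    profiles: array (n_molecules, n_timepoints), evenly sampled in time.
    The DC bin (k = 0) is dropped because it only carries the mean level.
    """
    spectrum = np.fft.rfft(np.asarray(profiles, dtype=float), axis=1)[:, 1:]
    if n_freqs is not None:
        spectrum = spectrum[:, :n_freqs]
    magnitude = np.abs(spectrum)
    # Numerically zero bins have a meaningless phase; set it to 0
    # (an assumption of this sketch).
    peak = magnitude.max(axis=1, keepdims=True)
    phase = np.where(magnitude > rel_tol * peak, np.angle(spectrum), 0.0)
    # Embed each angle on the unit circle so -pi and +pi count as close.
    return np.hstack([np.cos(phase), np.sin(phase)])


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        # Next seed: the point farthest from all seeds chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    t = np.arange(12)
    a = np.sin(2 * np.pi * t / 12)  # shape A
    b = np.cos(2 * np.pi * t / 12)  # shape B: same period, a quarter cycle later
    # Toy "heterogeneous" profiles: two shapes on very different scales and offsets.
    profiles = np.array([a, 50 * a + 200, 0.02 * a + 1,
                         3 * b + 5, 1000 * b + 10, 0.5 * b])
    labels, _ = kmeans(phase_features(profiles), k=2)
    print(labels)  # expected: [0 0 0 1 1 1]
```

In this toy example, the three A profiles differ by factors of up to 2500 in scale and by arbitrary offsets, but clustering on phase still groups them apart from the B profiles. Real biological time courses are short and noisy, so the number of frequency bins kept (`n_freqs`) and the number of clusters would need careful choice.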
This work was funded by the National Science Foundation Plant Genome Program, grant DBI-0109732, and by the Virginia Bioinformatics Institute.