Title page for ETD etd-10292010-151237

Type of Document Dissertation
Author Cheng, Hui
Author's Email Address hcheng@vt.edu
URN etd-10292010-151237
Title Data integration and visualization for systems biology data
Degree PhD
Department Genetics, Bioinformatics, and Computational Biology
Advisory Committee
Advisor Name Title
Mendes, Pedro J. P. Committee Chair
Hoeschele, Ina Committee Member
Laubenbacher, Reinhard C. Committee Member
Tyler, Brett M. Committee Member
Keywords
  • Fast Fourier transform
  • phase spectrum
  • data fusion
  • data visualization
  • biplot display
  • Systems biology
  • data integration
Date of Defense 2010-10-27
Availability unrestricted
Systems biology aims to understand cellular behavior in terms of the spatiotemporal interactions among cellular components, such as genes, proteins and metabolites. Comprehensive visualization tools for exploring multivariate data are needed to gain insight into the physiological processes reflected in these molecular profiles. Data fusion methods are required to study high-throughput transcriptomics, metabolomics and proteomics data in combination before systems biology can live up to its potential. In this work I explored mathematical and statistical methods and visualization tools to address the prominent issues in systems biology data fusion and to gain insight into these comprehensive data.

In order to choose and apply multivariate methods, it is important to know the distribution of the experimental data. Chi-square Q-Q plots and violin plots were applied to all M. truncatula and V. vinifera data and showed that most distributions are right-skewed (Chapter 2). The biplot display provides an effective tool for reducing the dimensionality of systems biology data and displaying the molecules and time points jointly on the same plot. A biplot of the M. truncatula data revealed the overall system behavior, including unidentified compounds of interest and the dynamics of the highly responsive molecules (Chapter 3). The phase spectrum computed from the Fast Fourier transform of the time course data has been found to play a more important role than the amplitude in signal reconstruction. Phase spectrum analyses on in silico data created with two artificial biochemical networks, the Claytor model and the AB2 model, proved that the phase spectrum is indeed an effective tool in systems biology data fusion despite the data heterogeneity (Chapter 4). The differences between data integration and data fusion are further discussed. Biplot analysis of scaled data was applied to integrate transcriptome, metabolome and proteome data from the V. vinifera project. Phase spectrum analysis combined with k-means clustering was used in integrative analyses of the transcriptome and metabolome of the M. truncatula yeast elicitation data and of the transcriptome, metabolome and proteome of the V. vinifera salinity stress data. The phase spectrum analysis was compared with the biplot display as an effective tool for data fusion (Chapter 5). The results suggest that the phase spectrum may perform better than the biplot.
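
The role of the phase spectrum in fusing heterogeneous time courses can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names (`dft`, `phase_at`, `phase_difference`) and the three time courses are hypothetical, and a naive discrete Fourier transform from the Python standard library stands in for a Fast Fourier transform routine such as `numpy.fft.fft`. The sketch assumes evenly spaced time points and compares profiles at a single frequency index.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a sequence of numbers."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def phase_at(signal, k=1):
    """Phase angle in radians of the DFT coefficient at frequency index k."""
    return cmath.phase(dft(signal)[k])


def phase_difference(a, b, k=1):
    """Phase of a minus phase of b at index k, wrapped to [-pi, pi]."""
    d = phase_at(a, k) - phase_at(b, k)
    return math.atan2(math.sin(d), math.cos(d))


# Hypothetical time courses on very different measurement scales.
n = 8
transcript = [500 + 100 * math.sin(2 * math.pi * t / n) for t in range(n)]
metabolite = [0.1 + 0.02 * math.sin(2 * math.pi * t / n) for t in range(n)]
# Same shape as the metabolite profile, but delayed by two time points.
delayed = [0.1 + 0.02 * math.sin(2 * math.pi * (t - 2) / n) for t in range(n)]

same_timing = phase_difference(transcript, metabolite)
quarter_delay = phase_difference(transcript, delayed)
print(same_timing, quarter_delay)
```

In this toy setting the phase difference between the transcript and metabolite profiles is close to zero even though their amplitudes differ by several orders of magnitude, while the two-point delay appears as a difference close to pi/2. This scale independence is one reason phase can be compared across data types that amplitude cannot.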

This work was funded by the National Science Foundation Plant Genome Program, grant DBI-0109732, and by the Virginia Bioinformatics Institute.

Filename                       Size      Approximate Download Time (Hours:Minutes:Seconds)
                                         28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
Cheng_H_D_2010_corrected.pdf   5.04 Mb   00:23:21     00:12:00    00:10:30       00:05:15        00:00:26
