Title page for ETD etd-05032007-223232

Type of Document Dissertation
Author Kumar, Deept
Author's Email Address dkumar@vt.edu
URN etd-05032007-223232
Title Redescription Mining: Algorithms and Applications in Bioinformatics
Degree PhD
Department Computer Science
Advisory Committee
Advisor Name Title
Ramakrishnan, Naren Committee Chair
Helm, Richard Frederick Committee Member
Murali, T. M. Committee Member
North, Christopher L. Committee Member
Potts, Malcolm Committee Member
Keywords
  • bioinformatics
  • storytelling
  • redescription mining
  • redescriptions
Date of Defense 2007-04-19
Availability unrestricted
Scientific data mining aims to extract useful knowledge from massive datasets curated through computational science efforts, e.g., in bioinformatics, cosmology, geographic sciences, and computational chemistry. In the recent past, we have witnessed major transformations of these applied sciences into data-driven endeavors. In particular, scientists are now faced with an overload of vocabularies for describing domain entities. These vocabularies offer alternative and mostly complementary (sometimes even contradictory) ways to organize information, and each vocabulary provides a different perspective on the problem being studied. To further knowledge discovery, computational scientists need tools that help them reason uniformly across vocabularies, integrate multiple forms of characterizing datasets, and situate knowledge gained from one study in terms of others.

This dissertation defines a new pattern class called redescriptions that provides high-level capabilities for reasoning across domain vocabularies. A redescription is a shift of vocabulary, or a different way of communicating the same information; redescription mining finds concerted sets of objects that can be defined in (at least) two ways using given descriptors. We present the CARTwheels algorithm, which mines redescriptions by exploiting equivalences of partitions induced by distinct descriptor classes, as well as applications of CARTwheels to several bioinformatics datasets. We then outline how more complex data mining operations can be built by cascading redescriptions to realize a story, leading to a new data mining capability called storytelling. Besides applying storytelling to characterize gene sets, we showcase its use on other datasets. Finally, we extend the core CARTwheels algorithm by introducing a theoretical framework, based on partitions, to systematically explore redescription space; by generalizing from mining redescriptions (and stories) within a single domain to relating descriptors across different domains, to support complex relational data mining scenarios; and by exploiting the structure of the underlying descriptor space to yield more effective algorithms for specific classes of datasets.
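The abstract does not spell out how CARTwheels works internally, so the following is only a minimal brute-force sketch of the two ideas it names, redescriptions and storytelling, and not the dissertation's algorithm. Every modeling choice here is an assumption: descriptors are represented as sets of objects; a redescription is a pair of expressions, one per vocabulary, whose object sets have a Jaccard coefficient at or above a threshold; the expression language is limited to single descriptors plus pairwise conjunction and disjunction; and a story is found by breadth-first search over expressions, switching vocabularies at every step so that each link is itself an approximate redescription. The toy vocabularies `VOCAB_1`/`VOCAB_2` and the functions `jaccard`, `expressions`, `redescriptions`, and `story` are hypothetical names chosen for illustration. The partition-matching search that the abstract attributes to CARTwheels is not reproduced.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy data (not from the dissertation): each descriptor maps to
# the set of objects it holds for. Two vocabularies describe the same six
# objects, e.g. functional categories vs. expression clusters.
VOCAB_1 = {
    "d1": frozenset({"g1", "g2", "g3"}),
    "d2": frozenset({"g3", "g4", "g5"}),
    "d3": frozenset({"g5", "g6"}),
}
VOCAB_2 = {
    "e1": frozenset({"g2", "g3", "g4"}),
    "e2": frozenset({"g4", "g5", "g6"}),
    "e3": frozenset({"g5"}),
}


def jaccard(a, b):
    """Size of the intersection over size of the union (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def expressions(vocab):
    """Yield (label, objects) for every single descriptor and every pairwise
    conjunction/disjunction: a deliberately tiny, assumed expression language."""
    for name, objs in vocab.items():
        yield name, objs
    for (n1, s1), (n2, s2) in combinations(vocab.items(), 2):
        yield f"{n1} & {n2}", s1 & s2
        yield f"{n1} | {n2}", s1 | s2


def redescriptions(v1, v2, threshold=1.0):
    """Brute force: pair every expression over v1 with every expression over
    v2 and keep the non-empty pairs whose Jaccard score meets the threshold."""
    found = []
    for l1, s1 in expressions(v1):
        for l2, s2 in expressions(v2):
            if s1 and s2 and jaccard(s1, s2) >= threshold:
                found.append((l1, l2))
    return found


def story(start, end, v1, v2, threshold=0.5):
    """Shortest chain start -> ... -> end in which consecutive expressions come
    from different vocabularies and overlap with Jaccard >= threshold, so each
    step is an approximate redescription. Returns None if no chain exists.
    Labels are assumed to be unique across the two vocabularies."""
    nodes = {}
    for side, vocab in enumerate((v1, v2)):
        for label, objs in expressions(vocab):
            if objs:  # skip empty expressions such as disjoint conjunctions
                nodes[label] = (side, objs)
    parent = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == end:
            # Walk the parent links back to the start to recover the chain.
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        side, objs = nodes[cur]
        for label, (other_side, other) in nodes.items():
            if (label not in parent and other_side != side
                    and jaccard(objs, other) >= threshold):
                parent[label] = cur
                queue.append(label)
    return None


if __name__ == "__main__":
    print(redescriptions(VOCAB_1, VOCAB_2))
    print(story("d1", "d3", VOCAB_1, VOCAB_2))
```

Run as a script, it prints two exact redescriptions, `('d2 & d3', 'e3')` and `('d2 & d3', 'e2 & e3')` (the only object satisfying both d2 and d3 is exactly the object satisfying e3), followed by the story `['d1', 'e1', 'd2', 'e2', 'd3']`, in which each consecutive pair overlaps with Jaccard at least 0.5. Exhaustive enumeration grows combinatorially with vocabulary size and expression length, so this sketch is only suitable for toy inputs like these.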

File: deept_redescs.pdf (2.87 MB)