Title page for ETD etd-11182009-172742

Type of Document Master's Thesis
Author Owens, Clifford Conley
Author's Email Address ccowens@vt.edu
URN etd-11182009-172742
Title Mining Truth Tables and Straddling Biclusters in Binary Datasets
Degree Master of Science
Department Computer Science
Advisory Committee
Advisor Name Title
Ramakrishnan, Naren Committee Chair
Murali, T. M. Committee Co-Chair
Brown, Ezra A. Committee Member
Keywords
  • data mining
  • binary datasets
Date of Defense 2009-11-05
Availability unrestricted
As the world swims deeper into a deluge of data, binary datasets relating objects to properties can be found in many different fields. Such datasets abound in practically any area of interest, including biology, politics, entertainment, and education. This explosion calls for the definition of new types of patterns in binary data, as well as algorithms to find these patterns efficiently.

In this work, we introduce truth tables as a new class of patterns to be mined in binary datasets. Truth tables represent a subset of properties which exhibit maximal variability (and hence, suggest independence) in occurrence patterns over the underlying objects. Unlike other measures of independence, truth tables possess anti-monotone features that can be exploited in order to mine them effectively. We present a level-wise algorithm that takes advantage of these features, showing results on real and synthetic data. These results demonstrate the scalability of our algorithm.
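
As a rough illustration of the level-wise idea, the sketch below assumes that a set of k properties forms a truth table when all 2^k combinations of 0/1 values occur among the objects, so that every subset of a truth table is also a truth table. That reading, and the names `is_truth_table` and `mine_truth_tables`, are assumptions made for this example and are not taken from the thesis, whose definition and algorithm may differ.

```python
from itertools import combinations

def is_truth_table(rows, cols):
    """Assumed definition: projecting the 0/1 rows onto cols yields all 2^k combinations."""
    seen = {tuple(row[c] for c in cols) for row in rows}
    return len(seen) == 2 ** len(cols)

def mine_truth_tables(rows):
    """Level-wise (Apriori-style) search over column sets, smallest sets first."""
    n_cols = len(rows[0]) if rows else 0
    # Level 1: single columns that contain both a 0 and a 1.
    level = [(c,) for c in range(n_cols) if is_truth_table(rows, (c,))]
    found = []
    while level:
        found.extend(level)
        current = set(level)
        candidates = set()
        for base in level:
            # Extend each set with a larger column index to avoid duplicates.
            for c in range(base[-1] + 1, n_cols):
                cand = base + (c,)
                # Anti-monotone pruning: every (k-1)-subset must already qualify.
                if all(sub in current for sub in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
        level = sorted(c for c in candidates if is_truth_table(rows, c))
    return found
```

For four objects over three properties, where the third property is always 0, the call `mine_truth_tables([(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)])` reports the column sets `(0,)`, `(1,)`, and `(0, 1)`.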

We also introduce new methods of mining straddling biclusters. Biclusters relate subsets of objects to subsets of properties they share within a single dataset. Straddling biclusters extend biclusters by relating a subset of objects to subsets of properties they share in two datasets. We present two level-wise algorithms, named UnionMiner and TwoMiner, which discover straddling biclusters efficiently by treating multiple datasets as a single dataset. We show results on real and synthetic data, and explore the advantages and limitations of each algorithm. We also develop guidelines that suggest which algorithm is likely to perform better based on features of the datasets.
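
To make the notion of a straddling bicluster concrete, the sketch below merges two datasets over their common objects, tags each property with its source dataset, and runs a simple level-wise search over the merged data. It is not UnionMiner or TwoMiner; the function name `straddling_biclusters`, the dictionary input format, and the `min_objects` threshold (assumed to be at least 1) are all hypothetical choices for this example.

```python
def straddling_biclusters(data_a, data_b, min_objects=2):
    """Return {(objects, props_a, props_b)} where the objects share props in both datasets.

    data_a, data_b: dict mapping each object to the set of properties it has.
    """
    objects = sorted(set(data_a) & set(data_b))
    # Treat the two datasets as one by tagging each property with its source.
    merged = {
        o: frozenset({("A", p) for p in data_a[o]} | {("B", p) for p in data_b[o]})
        for o in objects
    }

    def support(props):
        return frozenset(o for o in objects if props <= merged[o])

    items = set().union(*merged.values())
    level = {frozenset([i]) for i in items if len(support(frozenset([i]))) >= min_objects}
    results = set()
    while level:
        for props in level:
            objs = support(props)
            # Closure: every tagged property shared by all supporting objects.
            closed = frozenset.intersection(*(merged[o] for o in objs))
            props_a = frozenset(p for tag, p in closed if tag == "A")
            props_b = frozenset(p for tag, p in closed if tag == "B")
            if props_a and props_b:  # keep only patterns that straddle both datasets
                results.add((objs, props_a, props_b))
        # Next level: join sets that differ by one property and keep frequent ones.
        level = {
            a | b
            for a in level for b in level
            if len(a | b) == len(a) + 1 and len(support(a | b)) >= min_objects
        }
    return results
```

With `data_a = {"g1": {"x", "y"}, "g2": {"x"}, "g3": {"y"}}` and `data_b = {"g1": {"p"}, "g2": {"p", "q"}, "g3": {"q"}}`, the only straddling bicluster with at least two objects pairs `g1` and `g2` with property `x` in the first dataset and `p` in the second.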

Filename          Size        Approximate Download Time (Hours:Minutes:Seconds)
                              28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
Owens_CA_T_2009   977.28 Kb   00:04:31     00:02:19    00:02:02       00:01:01        00:00:05
