Title page for ETD etd-06152012-070746

Type of Document Dissertation
Author Guo, Sheng
Author's Email Address guos@vt.edu
URN etd-06152012-070746
Title Using Dependency Parses to Augment Feature Construction for Text Mining
Degree PhD
Department Computer Science
Advisory Committee
Advisor Name              Title
Ramakrishnan, Naren       Committee Chair
Fox, Edward Alan          Committee Member
Helm, Richard Frederick   Committee Member
Murali, T. M.             Committee Member
Zaki, Mohammed J.         Committee Member
Keywords
  • dependency parsing
  • text mining
  • linguistic cues
Date of Defense 2012-05-02
Availability unrestricted
Abstract
With the prevalence of large volumes of data stored in the cloud, including unstructured information in the form of text, there is now an increased emphasis on text mining. A broad range of techniques is now used for text mining, including algorithms adapted from machine learning, NLP, computational linguistics, and data mining. Applications are equally varied, including classification, clustering, segmentation, relationship discovery, and practically any task that discovers latent information from written natural language.

Classical mining algorithms have traditionally focused on shallow representations such as bag-of-words and similar feature-based models. With the advent of modern high-performance computing, deep sentence-level linguistic analysis of large-scale text corpora has become practical. In this dissertation, we evaluate the utility of dependency parses as textual features for different text mining applications. Dependency parsing is one form of syntactic parsing, based on the dependency grammar implicit in sentences. While dependency parsing has traditionally been used for text understanding, we investigate its use as a source of features for text mining applications.

We specifically focus on three methods to construct textual features from dependency parses. First, we consider a dependency parse as a general feature akin to a traditional bag-of-words model. Second, we consider the dependency parse as the basis to build a feature graph representation. Finally, we use dependency parses in a supervised collocation mining method for feature selection. To investigate these three methods, several applications are studied, including: (i) movie spoiler detection, (ii) text segmentation, (iii) query expansion, and (iv) recommender systems.
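As a rough illustration of the first method only, the sketch below contrasts a bag-of-words vector with a "bag of dependencies" built from (head, relation, dependent) triples. It is not the dissertation's implementation: the example sentences, the hand-written parses, the relation labels, and the function names are all assumptions made for illustration, and a real pipeline would obtain the triples from a dependency parser.

```python
from collections import Counter

# Hypothetical, hand-written dependency parses given as
# (head, relation, dependent) triples. These stand in for parser output.
SENT_A = "The butler killed the detective"
PARSE_A = [
    ("killed", "nsubj", "butler"),
    ("killed", "dobj", "detective"),
    ("butler", "det", "the"),
    ("detective", "det", "the"),
]

SENT_B = "The detective killed the butler"
PARSE_B = [
    ("killed", "nsubj", "detective"),
    ("killed", "dobj", "butler"),
    ("detective", "det", "the"),
    ("butler", "det", "the"),
]


def bag_of_words(sentence):
    """Count lowercase tokens, ignoring word order and syntax."""
    return Counter(token.lower() for token in sentence.split())


def bag_of_dependencies(triples):
    """Count each dependency triple as a single feature, e.g. 'nsubj(killed,butler)'."""
    return Counter(f"{rel}({head},{dep})" for head, rel, dep in triples)


bow_a, bow_b = bag_of_words(SENT_A), bag_of_words(SENT_B)
dep_a, dep_b = bag_of_dependencies(PARSE_A), bag_of_dependencies(PARSE_B)

# The two sentences share a bag-of-words vector but differ in who did what,
# which only the dependency features capture.
print("same bag-of-words:", bow_a == bow_b)
print("same dependency features:", dep_a == dep_b)
print("features unique to sentence A:", sorted(set(dep_a) - set(dep_b)))
```

The two sentences contain the same words, so a bag-of-words model cannot separate them, while the subject and object relations differ. This is the kind of distinction that could matter in an application such as spoiler detection.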

Filename              Size     Approximate Download Time (Hours:Minutes:Seconds)
                               28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
Guo_Sheng_D_2012.pdf  1.61 Mb  00:07:27    00:03:50   00:03:21      00:01:40       00:00:08
