Title page for ETD etd-07212009-040529

Type of Document Master's Thesis
Author Averboch, Guillermo Andres
URN etd-07212009-040529
Title A system for document analysis, translation, and automatic hypertext linking
Degree Master of Science
Department Computer Science and Applications
Advisory Committee
  • Heath, Lenwood S. (Committee Chair)
  • Arthur, James D. (Committee Member)
  • Fox, Edward Alan (Committee Member)
Keywords
  • computer language
  • database
  • formats
Date of Defense 1995-06-05
Availability unrestricted

A digital library database is a heterogeneous collection of documents. Documents may arrive in different formats (e.g., ASCII, SGML, typesetter languages), and they may have to be translated into the standard document representation scheme used by the digital library.

This work focuses on the design of a framework that can convert text documents in any format into equivalent documents in other formats and, in particular, into SGML (Standard Generalized Markup Language). In addition, the framework must be able to extract information about the analyzed documents, store that information in a permanent database, and construct hypertext links both between documents and the information stored in that database and among the documents themselves. For example, information about the author of a document could be extracted and stored in the database. A link can then be established from the document to the information about its author, and from there to other documents by the same author. These tasks must be performed without any human intervention, even at the risk of making a small number of mistakes.
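The author example above can be illustrated with a minimal sketch. It assumes plain-ASCII documents that carry an `Author:` header line and uses an in-memory dictionary in place of the permanent database; the functions `extract_author`, `build_author_links`, and `related_documents` and the sample documents are hypothetical and are not taken from the thesis or from DELTO.

```python
import re
from collections import defaultdict

# Hypothetical rule: assumes each plain-ASCII document has a line "Author: <name>".
AUTHOR_RE = re.compile(r"^Author:[ \t]*(.+)$", re.MULTILINE)


def extract_author(text):
    """Return the author name found in the document text, or None if absent."""
    match = AUTHOR_RE.search(text)
    return match.group(1).strip() if match else None


def build_author_links(documents):
    """Map each author to the sorted ids of the documents they wrote.

    `documents` maps doc_id -> raw text; the returned dict stands in for
    the permanent database described in the abstract (an assumption).
    """
    by_author = defaultdict(list)
    for doc_id, text in documents.items():
        author = extract_author(text)
        if author is not None:  # documents without a match are skipped, not guessed
            by_author[author].append(doc_id)
    return {author: sorted(ids) for author, ids in by_author.items()}


def related_documents(doc_id, documents, links):
    """Follow the link document -> author -> other documents by that author."""
    author = extract_author(documents[doc_id])
    if author is None:
        return []
    return [other for other in links.get(author, []) if other != doc_id]


docs = {
    "tr-01": "Title: Hashing\nAuthor: A. Author\nBody text.",
    "tr-02": "Title: Graphs\nAuthor: A. Author\nBody text.",
    "tr-03": "Title: Retrieval\nAuthor: B. Writer\nBody text.",
}
links = build_author_links(docs)
print(related_documents("tr-01", docs, links))  # ['tr-02']
```

Skipping documents with no recognizable author mirrors the abstract's trade-off of running without human intervention: an unmatched document simply gets no author link rather than a guessed one.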

To accomplish these goals, we developed a language called DELTO (Description Language for Textual Objects) for describing document formats. Given a description of a particular format, our system can extract information from documents in that format, store part of that information in a permanent database, and use that information to construct an abstract representation of those documents, from which equivalent documents in other formats can be generated.
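The pipeline described above (format description, extraction, abstract representation, output in another format) can be sketched in a few lines. This is not DELTO: a regular-expression rule table stands in for a real format description language, and the rule list `MEMO_FORMAT`, the element names, and the functions `parse`, `to_sgml`, and `to_text` are hypothetical.

```python
import re
from html import escape

# Hypothetical format description for one plain-text format: ordered
# (element, pattern) rules. This only illustrates "a description drives
# extraction"; it is not DELTO syntax.
MEMO_FORMAT = [
    ("title", re.compile(r"^TITLE:\s*(.+)$")),
    ("author", re.compile(r"^AUTHOR:\s*(.+)$")),
]


def parse(text, description):
    """Build an abstract representation: a list of (element, content) pairs.

    A line matching a rule becomes that element; any other non-blank line
    becomes a paragraph ("p"). Blank lines are dropped.
    """
    nodes = []
    for line in text.splitlines():
        for element, pattern in description:
            match = pattern.match(line)
            if match:
                nodes.append((element, match.group(1).strip()))
                break
        else:  # no rule matched this line
            if line.strip():
                nodes.append(("p", line.strip()))
    return nodes


def to_sgml(nodes, doctype="memo"):
    """Render the abstract representation as SGML-style tagged text.

    &, < and > are written as &amp;, &lt; and &gt;; the target DTD is
    assumed to declare these entities.
    """
    body = "\n".join(
        f"<{el}>{escape(content, quote=False)}</{el}>" for el, content in nodes
    )
    return f"<{doctype}>\n{body}\n</{doctype}>"


def to_text(nodes):
    """Render the same abstract representation back to plain text."""
    return "\n".join(content for _, content in nodes)


nodes = parse("TITLE: Sample report\nAUTHOR: A. Author\n\nResults & discussion", MEMO_FORMAT)
print(to_sgml(nodes))
print(to_text(nodes))
```

Because both renderers read the same abstract representation, supporting another output format in this sketch means adding a renderer rather than changing the parser.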

The system that originated from this work is used to construct the database of Envision, a Virginia Tech digital library research project.

File: LD5655.V855_1995.A992.pdf (7.55 MB)