Communications Project

Document Type: Dissertation
Name: Ghaleb Abdulla
Title: Analysis and Modeling of World Wide Web Traffic
Degree: Doctor of Philosophy
Department: Computer Science
Committee Chair: Edward A. Fox
Committee Members: Marc Abrams, Ali Nayfeh, Osman Balci, Dennis Kafura
Keywords: Proxy, Caching, Log analysis, World Wide Web, Scalability, Modeling, Time Series
Date of defense: April 27, 1998
Availability: Release the entire work for Virginia Tech access only. After one year, release worldwide only with written permission of the student and the advisory committee chair.


This dissertation deals with monitoring, collecting, analyzing, and modeling World Wide Web (WWW) traffic and client interactions. The rapid growth of WWW usage has not been accompanied by an overall understanding of models of information resources and their deployment strategies. Consequently, the current Web architecture often faces performance and reliability problems. Scalability, latency, bandwidth, and disconnected operation are some of the important issues to consider when attempting to adjust for the growth in Web usage. The WWW Consortium launched an effort to design a new protocol that will be able to support future demands. Before doing that, however, we need to characterize current users' interactions with the WWW and understand how it is being used.

We focus on proxies since they provide a good medium for caching, information filtering, payment methods, and copyright management. We collected proxy data from our environment over a period of more than two years. We also collected data from other sources such as schools, information service providers, and commercial sites. Sampling times range from days to years. We analyzed the collected data, looking for important characteristics that can help in designing a better HTTP protocol. We developed a modeling approach that considers Web traffic characteristics such as self-similarity and long-range dependence. We developed an algorithm to characterize users' sessions. Finally, we developed a high-level Web traffic model suitable for sensitivity analysis.
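The abstract does not describe the session-characterization algorithm itself. As a rough illustration only, the sketch below uses a common stand-in technique, splitting each client's requests into sessions wherever the gap between consecutive requests exceeds an inactivity timeout. The function name `split_sessions`, the `(client, timestamp)` record shape, and the 30-minute default are all assumptions made for this example and are not taken from the dissertation.

```python
def split_sessions(requests, timeout=1800):
    """Group request timestamps into per-client sessions.

    requests: iterable of (client_id, timestamp_in_seconds) pairs
              (a hypothetical, simplified proxy-log record).
    timeout:  assumed inactivity gap, in seconds, that ends a session.
    Returns a dict mapping client_id to a list of sessions, where each
    session is a list of timestamps in ascending order.
    """
    sessions = {}
    # Sort by client, then by time, so each client's requests are contiguous.
    for client, ts in sorted(requests, key=lambda r: (r[0], r[1])):
        client_sessions = sessions.setdefault(client, [])
        if client_sessions and ts - client_sessions[-1][-1] <= timeout:
            # Gap is within the timeout: continue the current session.
            client_sessions[-1].append(ts)
        else:
            # First request for this client, or the gap is too long.
            client_sessions.append([ts])
    return sessions


log = [("c1", 0), ("c1", 100), ("c2", 50), ("c1", 5000)]
result = split_sessions(log)
```

With the sample `log` above, client `c1` ends up with two sessions because the request at 5000 seconds arrives more than 1800 seconds after the previous one.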

As a result of this work, we developed statistical models of parameters such as arrival times, file sizes, file types, and locality of reference. We describe an approach to modeling long-range dependent Web traffic, and we characterize the activities of users accessing a digital library courseware server or Web search tools.

Temporal and spatial locality of reference within the examined user communities is high, so caching can be an effective tool to help reduce network traffic and to help solve the scalability problem. We recommend using our findings to promote a smart distribution or push model that caches documents when repeat accesses are likely.
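To make the link between temporal locality and caching concrete, here is a minimal sketch, not the dissertation's own measurement method. It computes two simple quantities over a request trace: the fraction of requests that repeat an earlier URL (the hit rate an unbounded cache could reach) and the hit rate of a fixed-size LRU cache. The function names, the sample trace, and the choice of LRU replacement are assumptions made for illustration.

```python
from collections import OrderedDict


def repeat_fraction(urls):
    """Fraction of requests whose URL already appeared earlier in the trace."""
    seen = set()
    repeats = 0
    for url in urls:
        if url in seen:
            repeats += 1
        else:
            seen.add(url)
    return repeats / len(urls) if urls else 0.0


def lru_hit_rate(urls, capacity):
    """Hit rate of an LRU cache holding at most `capacity` distinct URLs."""
    cache = OrderedDict()
    hits = 0
    for url in urls:
        if url in cache:
            hits += 1
            cache.move_to_end(url)  # mark as most recently used
        else:
            cache[url] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(urls) if urls else 0.0


# Hypothetical trace of requested URLs, in arrival order.
trace = ["/a", "/b", "/a", "/c", "/a", "/b"]
unbounded = repeat_fraction(trace)
small_cache = lru_hit_rate(trace, capacity=2)
```

The gap between the two numbers shows how much of the available locality a cache of a given size actually captures; when the repeat fraction is high, even a modest cache can absorb a meaningful share of requests.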

List of Attached Files


At the author's request, all materials (PDF files, images, etc.) associated with this ETD are accessible from the Virginia Tech network only.

The author grants to Virginia Tech or its agents the right to archive and display their thesis or dissertation in whole or in part in the University Libraries in all forms of media, now or hereafter known. The author retains all proprietary rights, such as patent rights. The author also retains the right to use in future works (such as articles or books) all or part of this thesis or dissertation.