|Title:||Analysis and Modeling of World Wide Web Traffic|
|Degree:||Doctor of Philosophy|
|Committee Chair:||Edward A. Fox|
|Committee Members:||Marc Abrams|
|Keywords:||Proxy, Caching, Log analysis, World Wide Web, Scalability, Modeling, Time Series|
|Date of defense:||April 27, 1998|
|Availability:||Release the entire work for Virginia Tech access only. After one year release worldwide only with written permission of the student and the advisory committee chair.|
This dissertation deals with the monitoring, collection, analysis, and modeling of World Wide Web (WWW) traffic and client interactions. The rapid growth of WWW usage has not been accompanied by an overall understanding of models of information resources and their deployment strategies. Consequently, the current Web architecture often faces performance and reliability problems. Scalability, latency, bandwidth, and disconnected operations are some of the important issues that should be considered when attempting to adjust for the growth in Web usage. The WWW Consortium launched an effort to design a new protocol that will be able to support future demands. Before such a protocol is designed, however, we need to characterize current users' interactions with the WWW and understand how it is being used.
We focus on proxies since they provide a good medium for caching, filtering information, payment methods, and copyright management. We collected proxy data from our environment over a period of more than two years. We also collected data from other sources such as schools, information service providers, and commercial sites. Sampling times range from days to years. We analyzed the collected data, looking for important characteristics that can help in designing a better HTTP. We developed a modeling approach that considers Web traffic characteristics such as self-similarity and long-range dependence. We developed an algorithm to characterize users' sessions. Finally, we developed a high-level Web traffic model suitable for sensitivity analysis.
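As an illustration of what characterizing users' sessions from proxy log data can involve, the following is a minimal sketch of one common heuristic: splitting each client's requests into sessions wherever the idle gap exceeds a threshold. It is not the algorithm developed in the dissertation. The function name `split_sessions`, the `(client, timestamp)` input shape, and the 1800-second default timeout are all assumptions made for this example.

```python
from collections import defaultdict


def split_sessions(requests, timeout=1800):
    """Group (client, timestamp) pairs into per-client sessions.

    Hypothetical helper: a new session starts whenever the gap since the
    client's previous request exceeds `timeout` seconds. The threshold is
    an assumed value, not one taken from the dissertation.
    """
    # Collect every timestamp seen for each client.
    by_client = defaultdict(list)
    for client, ts in requests:
        by_client[client].append(ts)

    sessions = {}
    for client, stamps in by_client.items():
        stamps.sort()
        current = [stamps[0]]
        client_sessions = [current]
        # Compare each request with the one immediately before it.
        for prev, ts in zip(stamps, stamps[1:]):
            if ts - prev > timeout:
                current = [ts]            # idle gap too long: open a new session
                client_sessions.append(current)
            else:
                current.append(ts)        # same session continues
        sessions[client] = client_sessions
    return sessions
```

Timestamps here are plain numbers in seconds; a real proxy log would first need its own parsing step, which is omitted.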
As a result of this work, we developed statistical models of parameters such as arrival times, file sizes, file types, and locality of reference. We describe an approach to modeling long-range dependent Web traffic, and we characterize the activities of users accessing a digital library courseware server or Web search tools.
Temporal and spatial locality of reference within the examined user communities is high, so caching can be an effective tool to help reduce network traffic and to help solve the scalability problem. We recommend utilizing our findings to promote a smart distribution or push model that caches documents when repeat accesses are likely.
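To make the link between repeat accesses and caching concrete, here is a minimal sketch that computes the fraction of requests in a trace that refer to a URL already seen earlier in that trace. This equals the hit rate of an idealized cache with unlimited capacity and no expiry, so it is only an upper bound on what a real cache could achieve. The function name `infinite_cache_hit_rate` and the list-of-URLs input are assumptions for this example, not a measure defined in the dissertation.

```python
def infinite_cache_hit_rate(urls):
    """Return the fraction of requests that repeat an earlier URL.

    Hypothetical helper: models a cache that never evicts or expires
    anything, so every repeat access counts as a hit.
    """
    seen = set()
    hits = 0
    for url in urls:
        if url in seen:
            hits += 1          # URL was requested before: would be a cache hit
        else:
            seen.add(url)      # first access: a compulsory miss
    return hits / len(urls) if urls else 0.0
```

For example, `infinite_cache_hit_rate(["/a", "/b", "/a", "/a"])` counts two repeat accesses out of four requests.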
List of Attached Files
|At the author's request, all materials (PDF files, images, etc.) associated with this ETD are accessible from the Virginia Tech network only.|