ABSTRACT
The World Wide Web has experienced phenomenal growth in recent years. This growth of the Web has contributed significantly to the network traffic on the Internet, and motivated much research into improving the performance and scalability of the Web.
In recent years, Web proxy caches have been deployed to reduce network traffic and provide better response time for Web accesses. A Web proxy consists of application-level software that accepts document retrieval requests from a set of clients, forwards these requests to appropriate servers if the requested documents are not already present in the proxy's cache, and sends documents back to the clients. Originally, proxies were designed to allow network administrators to be able to control Internet access from within an intranet [1]. It was recognized, however, that proxies may also serve as repositories for frequently requested documents. This role of proxies has made them very popular. Caching documents at the proxy can save network bandwidth and reduce network latency for document accesses [2].
Most browsers can be configured to use a proxy. Proxies can be deployed almost anywhere in the Internet. A second-level cache, the first level being the browser cache, can be provided by a proxy cache that serves requests from a large community such as a large corporation, a university, or the customers of an Internet service provider. Higher-level proxies have also been deployed. These proxies typically have other second-level caches (i.e., lower-level proxies) as their clients.
Web caches connected in the above described tree-like fashion are said to form a cache hierarchy. Web caches distribute load away from server "hot spots," saving wide area network bandwidth and reducing access latencies. Lower-level caches typically configure a higher-level cache as their parent. The first Web proxy, the CERN httpd server, allowed caches to be arranged hierarchically. In such Web caches, requests that cannot be satisfied from a lower-level cache are forwarded to the higher-level cache, with the expectation that some of these requested documents will be stored there. The Internet Cache Protocol (ICP) [3] was developed as a part of the Harvest caching project [4] to provide a more efficient method of intercache communication. ICP allows lower-level caches to send queries to a higher-level cache to inquire whether or not it has a copy of the requested document. Requests are forwarded to the higher level only if a copy of the document is found; otherwise, the request is forwarded to the origin server.
Understanding Web traffic characteristics is key to the design of techniques that save network bandwidth, reduce latency, and improve response time of Web accesses. This article is concerned with understanding factors that can affect the performance of Web proxy caches. The study builds on previous work done by Arlitt and Williamson [5] on Web server workloads, Cunha et al. [6] on Web client workloads, and Abdulla et al. [7] on Web proxy workloads. The present work differs from that in [7] because workloads from proxies at different levels of a caching hierarchy are considered, permitting a more complete study of how the workload characteristics change as one moves from the client side to the server side of the network.
The rest of the article is organized as follows. We describe the data collection, reduction, and analysis of access logs performed. Next, we present the results of the workload analysis, followed by conclusions.
The Web proxy server at the University of Saskatchewan represents an institution-level Web proxy in the CA*net II caching hierarchy. It serves several hundred users on the University of Saskatchewan campus who have configured their browsers to use the proxy cache. This proxy server is operated by the Department of Computing Services at the University of Saskatchewan. The proxy server uses a Digital AlphaServer 1200 5/400 with two 400 MHz processors running Squid v. 1.2.beta20 with 5 Gbytes disk cache. The proxy is configured to use the CANARIE cache as its parent. That is, cache misses at the University of Saskatchewan cache result in requests to the CANARIE cache for the missing document.
The CANARIE proxy cache is the root of the CA*net II caching hierarchy. This proxy is a SUN Ultra Sparc 180 MHz machine running Squid v. 1.2.beta22. The CANARIE proxy server is physically located at Bell Canada, Toronto, although it is administratively controlled by the CANARIE Advanced Research and Development Network Operations Centre (ARDNOC) in Ottawa. This cache currently functions as a parent for several first-level proxies, including proxies at the University of Saskatchewan, the University of Alberta, Dalhousie University, and McMaster University. There are also a small number of users who configure their browsers to directly use the CANARIE proxy cache. The CANARIE proxy has parent links to two nodes in the NLANR caching hierarchy: the Pittsburgh NLANR node and the NLANR node at UIUC.
The NLANR traces in this study come from the National Center for Supercomputing Applications (NCSA) proxy server at UIUC. This site represents one of several top-level nodes in the NLANR Web caching hierarchy; it receives requests from sibling caches at the top level, as well as from lower-level caches that use it as a parent. The NCSA proxy server is a Digital AlphaServer 1000 266 MHz machine running Squid v. 1.2.beta17 with about 10 Gbytes of disk cache.
The access logs of individual days were concatenated to obtain longer data sets for each site. A 45-day trace was created for the University of Saskatchewan proxy server by concatenating access logs from February 5 to March 30, 1999.1 Similarly, a 45-day CANARIE trace spanning February 2 to March 24, 1999,2 and a 30-day NLANR trace spanning February 3 to March 4, 1999 were created. The University of Saskatchewan proxy server recorded a total of 30,330,149 requests in 45 days of activity. The CANARIE proxy server recorded a total of 20,725,429 requests in 45 days of activity. The NLANR access logs recorded 35,631,782 requests in 30 days of activity. These three data sets will be referred to as USask, CANARIE, and NLANR in the rest of this article. The access logs are presented in this relative order to reflect the progression from an institution-level Web proxy cache to a top-level Web proxy cache.
After removing the ICP queries from the raw data sets, the next step used the HTTP reply codes in the access logs for data reduction. The HTTP reply codes describe how a client's request was serviced. For example, a reply code of 200 (OK) means that a valid document was made available to the client, either directly from the proxy cache, or by retrieving the document from another proxy cache or the origin server. A reply code of 304 (Not Modified) implies that a client had issued a GET If-Modified-Since request to determine whether or not the client's cached copy of a document was up to date, and the server (or some intermediate proxy) replied indicating that the client has a valid copy of the document. An HTTP response code of 206 (Partial Content) implies partial transfer of the document to the client in response to a partial GET request. Partial GETs are used to recover efficiently from partial failed transfers in HTTP/1.1. An HTTP response of 302 (Found) means that the requested document is known to reside in a different location than that specified by the URL.
In the data sets, approximately 90 percent of the HTTP requests made to the proxy result in the client successfully obtaining the document (HTTP 200 and HTTP 304). In this study, we consider all requests that would result in the document being accessed from the origin server in the absence of intermediate proxies. Therefore, only the 200 (OK) and 206 (Partial Content) status codes are considered.
Table 1 summarizes the reduced access logs for the three Web proxy sites. Based on the average number of requests seen per day at each proxy server, the NLANR proxy server has the highest activity, while the CANARIE proxy server has the least activity. This is not surprising since very few institutional caches currently use the CANARIE proxy cache.3
Table 1 also indicates the number of distinct documents, servers, and clients recorded in the access logs. Each proxy handles requests for millions of distinct documents located at thousands of different Web servers. The number of clients (i.e., distinct client IP addresses seen) varies quite significantly in the three traces. This number is not known precisely for the NLANR log, since the client IP addresses in the NLANR access logs are randomized every day (for privacy concerns). However, on any particular day, about 700 clients generate requests to the NLANR cache.
Overall, the mean and median document transfer sizes are quite small, as has been reported in previous studies of Web servers [5, 10] and Web proxy servers [7]. In these data sets, the mean size of the documents transferred ranges from 8 to 13 kbytes, while the median is in the range of 2–3 kbytes. The mean transfer size is larger than the median transfer size because there are some very large documents that skew the mean of the transfer size distribution. There is also high variability in the sizes of documents transferred, as indicated by the coefficient of variation (COV) reported in Table 1. These reduced data sets are used in the analyses in the rest of the article.
The results of this analysis for the three data sets are summarized in Table 2. The table shows that HTML and image documents account for close to 95 percent of the total requests. Similar results were reported for client traces by Cunha et al. [6], and for server traces by Arlitt and Williamson [5].
Unlike Web server workloads, however, Table 2 shows that image documents are consistently the most requested document type (68–78 percent), followed by HTML documents (about 20 percent). Similar observations were made by Abdulla et al. [7] in their characterization of Web proxy traffic. It is also observed that image type documents are responsible for the highest percentage of bytes transferred (40–52 percent), followed by HTML documents (18–23 percent).
A substantial portion of the byte transfer volume is accounted for by audio, video, compressed, and formatted document types, which, despite their relatively infrequent access (typically close to 1 percent of requests), are large enough to generate 20–30 percent of the bytes transferred. This observation is substantiated by the large mean and median transfer sizes indicated for these document types compared to other document types. In general, the COV values within each document type category are much lower than the COV of transfer sizes for the aggregate data set (Table 1). The large variation in the mean transfer sizes across the diverse document types helps explain the high COV reported in the aggregate data sets.
The precise cause of this one-time referencing behavior is not fully understood. Several explanations have been proposed. First, it might indicate the vastness of the Web and the low signal-to-noise ratio for much of its content. Second, it might reflect human nature in browsing habits (e.g., once a site has been visited, there is no need to visit it again). Third, it might reflect the behavior of content providers, who might use date-based URL names, or redesign or modify Web pages on a regular basis to keep them current, while possibly removing or renaming old pages. Fourth, it may reflect the presence of search engines or Web robots that traverse many pages to construct an index. Fifth, the presence of a high percentage of one-timers might indicate that caching hierarchies are actually working well. Finally, it may be the consequence of document prefetching (i.e., prediction) algorithms in some proxies and/or client browsers. All of these hypotheses seem plausible. If any (or all) of them are true, they indicate a challenging workload environment for Web caching algorithms.
Several other explanations are less plausible. For example, attributing this one-time referencing to the presence of dynamic requests (e.g., CGI) is not possible, since dynamic documents typically account for much less than 5 percent of the workload in Web server and Web proxy access logs. Attributing one-time referencing to typographical errors made by clients in requested URL names does not make sense, since the access log analyses usually focus on successful requests, not errors.
Table 3 summarizes the contribution of one-timers with respect to the number of distinct documents, total requests, and total bytes transferred for the three data sets. One-timers account for a substantial portion of the total requests (18–47 percent) and total bytes transferred (27–48 percent). Also, approximately 70 percent of the documents referenced are one-timers. The latter number is significantly higher than that for Web server workloads [5]. Presumably this is because requests made to a proxy are for a much larger population of Web documents than those made to a single server.
The predominance of one-time referencing for Web documents highlights the need for novel Web proxy caching policies that can effectively discriminate against one-timers. For example, frequency-based algorithms, such as Least Frequently Used (LFU), tend to perform better than recency-based algorithms, such as Least Recently Used (LRU) [11]. Similar observations have been made for Web server caching algorithms [5, 12, 13].
Figure 1 shows the cumulative distribution function of the transfer size for each proxy server, using a logarithmic scale on the horizontal axis. Almost all the transfers are in the range from 100 to 100,000 bytes, with very few small transfers (less than 100 bytes) and very few large transfers (more than 100,000 bytes). This distribution is similar to the document size distribution reported for Web clients [6] and Web servers [5, 10, 14, 15].
The transfer size distribution in Fig. 1 is heavy-tailed. In simple terms, a "heavy tail" in a Web document size distribution means that the very large "elephants" (i.e., outliers) in the tail of the distribution, although relatively few in number, are large enough to contribute significantly to the overall traffic volume observed (e.g., skew the mean transfer size distribution, as observed in Table 1).
Figure 2 illustrates the "heavy-tail" transfer size distribution for the USask data set. Figure 2 shows that in the USask data set 75 percent of all the distinct documents requested are for transfers less than 10,000 bytes in size.5 About 80 percent of the references are for documents in this category. However, transfers of documents smaller than 10,000 bytes account for only 27 percent of the total bytes transferred by the proxy to the clients. The tail of the distribution (transfers over 100,000 bytes) accounts for a significant 30 percent of the total bytes transferred. Similar observations can be made for the other data sets [16]. Therefore, caching smaller documents can increase the hit ratio at the cost of more bytes transferred over the network. To reduce the network traffic volume, proxies can cache larger documents, thus sacrificing the cache hit ratio.
The simplest example of a heavy-tailed distribution is the doubly exponential Pareto distribution; its cumulative distribution function is
F(x) = 1 – (k/x)
, k > 0, x
k
The parameter is known as the tail index [15], and k defines the beginning of the "tail" of the distribution (i.e., it
represents the smallest possible value of the random variable in the heavy-tailed distribution). As
decreases, the tail of the distribution becomes heavier [15]. In other words, an arbitrarily large portion of the probability mass may be present in the tail of the
distribution as
decreases.
To estimate the tail index for the transfer size data sets, the approach outlined in [15, 17] is followed. First, a log-log
complementary distribution (LLCD) is plotted for the transfer sizes in the data sets. An LLCD plot graphs log
(x) = log(1 – F(x)) vs. logx, for large x [17]. Heavy-tailed distributions have the following property:
An estimate of the tail index is obtained by determining the slope of the LLCD plot for values of x greater than k, using least-squares linear regression [18].
Figure 3 shows the LLCD plot for the NLANR data set, along with the least-squares regression fit for the heavy tail (k = 10,000 bytes). Table 4 summarizes the estimated value for each data set, along with the coefficient of determination (R2), which assesses the "goodness of fit" for the linear regression.
These results show a very strong fit R2, and values ranging from 1.07 to 1.27.
It can be concluded that proxy transfer sizes are heavy-tailed, just like server transfer sizes (0.93
1.33). Among the three data sets considered, the NLANR
data set exhibited the heaviest tail (
= 1.07), while the USask data set shows the least heavy tail (
= 1.27). One possible explanation for this observation is successful caching of small documents at lower-level proxies, resulting in heavier tails at higher-level caches.
Zipf's Law [21] states that if items are ranked (r) according to their popularity (P) measured by the frequency of reference for individual items, then the popularity of an item is given by
where the exponent ß is often close to unity [22].
In the special case where the exponent ß = 1, the Nth most popular item is exactly twice as popular as the 2Nth item, and so on. The Zipf distribution has been widely used to model many aspects of social and economic behavior [21, 22]. It has also been applied to model memory referencing behavior of computer programs [23, 24] and, most recently, Web referencing behavior [20].
To determine whether or not Web proxy document references follow the Zipf distribution, the documents are first ranked, with the most referenced document being assigned a rank of one, followed by the next most referenced document with a rank of two, and so on.
The frequency vs. rank plot for the USask data set (on a log-log scale) appears in Fig. 4. Visual inspection of the graph suggests that the distribution follows Zipf's Law, albeit not as strictly as has been observed for Web servers. Similar observations can be made for the other two data sets [16]. There appears to be some flattening at the most popular and least popular ends of the plots. This might indicate caching of the frequently requested "hot" documents at browsers and lower-level caches. The middle portion appears to follow the Zipf distribution more accurately. Similar observations are made by Breslau et al. [20].
To determine the best-fit exponent () of the Zipf distribution, the method in [25] is used. This involves grouping all documents that
have been referenced an identical number of times and assigning them the same rank. For example, if three documents are referenced 100 times each and are ranked 1000,
1001, and 1002, the modified data set considers them to be a single data point with a rank of 1001 (i.e., the average of the three ranks) and popularity equal to 100.
After obtaining the modified data set, a least-squares fit over the (log-transformed) modified data set is performed. The calculated values for
and goodness of
fit R2 are shown in Table 5. Figure 4 also shows that the Zipf distribution plot obtained with the calculated
value (the dotted line) is very similar to the corresponding frequency vs. rank plot. These
results suggest that a Zipf-like model can capture the distribution of references seen at a proxy. Note that the exponent
decreases for higher-level proxies.
This is most likely due to filtering of popular documents at the browsers and lower-level proxies. Roadknight et al. came to similar conclusions after a
detailed study of document popularity characteristics.
Another way of characterizing the uneven document access patterns is to determine the extent to which references are skewed toward certain documents. A measure of skewness, referred to as concentration, was applied to memory and file referencing behavior [23, 26], and later to Web server document accesses [5, 10, 27].
The concentration phenomenon, as applied to documents sorted by their reference counts, is illustrated in Fig. 5a. Nonuniform referencing of Web documents is clearly reflected in this figure, with 30 percent of the documents accounting for 60–80 percent of the references. The remaining 70 percent of the documents (primarily the one-timers) account for the remaining requests.
Concentration of references is also observed when documents are sorted according to the volume of bytes transferred in response to requests for them. Figure 5b shows that 10 percent of the documents account for approximately 90 percent of the bytes transferred by the proxy to clients.
Higher-level proxies (i.e., CANARIE, NLANR) receive requests from clients that are typically lower-level proxies. Therefore, the concentration of requests at these proxies might be due to the overlap in requests from different clients, since frequent requests from users generally get captured by the browsers and lower-level proxy caches. Among the three data sets, the USask data set shows the most concentration, while the CANARIE one shows the least. The lower concentration at the CANARIE proxy might be due to its small client population (about 13).
These results suggest that the concentration of references is lower at the Web proxy servers than at the Web servers [5]. This observation makes sense intuitively, since the clients of a Web proxy can effectively access any available document in the Web (i.e., the document set is very large). For Web servers the requests are restricted to a limited set of documents (i.e., the documents present at the Web server).
However, some recent studies have shown that many static documents are never cached [28, 29]. There are three main reasons for these documents not being cached. First, a significant fraction of the responses received from the Web servers contain no Last Modified dates, disabling caching of these requests [30]. Second, many requests in the proxy traces are associated with "cookies,"6 and most proxies do not cache responses that are cookie-specific. Third, some static documents are associated with an HTTP directive ("no-cache" pragma in HTTP/1.0) which informs the clients that the document should not be cached.
A cursory analysis of the access logs revealed that there are many documents never being cached (at the browser and/or proxy). Therefore, part of the concentration of references seen is due to the uncachability of the documents. Since the access logs used in this analysis did not have the necessary information to distinguish clearly between cachable and uncachable documents, no further attempt was made to understand the impact of cachability of documents.
A document is assumed to be modified when the size associated with a reference to the document is different from the size associated with the document when it was last referenced. There is some error in this assumption, since the access logs report the bytes transferred from the proxy to the clients, which includes HTTP headers. Since document size is approximated by bytes transferred, modifications are reported when header size changes occur and when clients abort requests. The change ratio of a document is defined to be the ratio of the number of modifications seen to the number of references made since the document was first referenced.
Figure 6 shows the average change ratio of documents as a function of the midpoints of reference count intervals. The change ratio for all documents with reference counts in the interval [xi, xi+1], where xi+1 = 1.1 * xi, were averaged to produce each point in the figure.
The average change ratio appears to increase initially, then decrease, and finally increase again as the reference count increases. The documents seen in the traces tend to be more frequently modified as one moves higher up in the caching hierarchy because requests from lower-level proxies are forwarded to higher-level proxies when document modifications occur.
The main observations of the study are as follows:
As for the future, the Web is a continually evolving system. Forecasts indicate that by the year 2002, there will be about 320 million Web users, compared to approximately 100 million users in the year 1998 [30]. The number of multimedia services is also increasing, resulting in more audio and video traffic. Therefore, analysis of Web proxy traffic over time needs to be conducted to understand the changes occurring in the traffic characteristics.
References
[1] M. Baentsch et al., "World-Wide Web Caching-The Application Level View of the Internet," IEEE Commun. Mag., vol. 35, no. 6, June 1997.
[2] S. Glassman, "A Caching Relay for the World-Wide Web," Comp. Networks and IDSN Sys., vol. 27, no. 2, Nov. 1994, pp. 165–74.
[3] D. Wessels and K. Claffy, "ICP and the Squid Web Cache," IEEE JSAC, vol. 16, no. 3, Apr. 1998, available at http://ircache.nlanr.net/~wessels/Papers/icp-squid.ps.gz, pp, 345–57.
[4] A. Chankhunthod et al., "A Hierarchical Internet Object Cache," Proc. 1996 USENIX Tech. Conf., San Diego, CA, Jan. 1996, pp. 153-166.
[5] M. Arlitt and C. Williamson, "Internet Web Servers: Workload Characterization and Performance Implications," IEEE/ACM Trans. Net., vol. 5, no. 5, Oct. 1997, pp. 631–45.
[6] C. Cunha, A. Bestavros, and M. Crovella, "Characteristics of WWW Client-Based Traces," Tech. rep. TR-95-010, Dept. of Comp. Sci., Boston Univ., Apr. 1995, available at ftp://cs-ftp.bu.edu/techreports/1995-010-www-client-traces.ps.Z
[7] G. Abdulla et al., "WWW Proxy Traffic Characterization with Application to Caching," Tech. rep. TR-97-03, Comp. Sci. Dt., Virginia Tech., Mar. 1997, available at http://www.cs.vt.edu/~chitra/work.html
[8] CANARIE, CA*net II sanitized access logs, available at http://ardnoc41.canet2.net/cache/squid/rawlogs/
[9] NLANR sanitized access logs, available at ftp://ircache.nlanr.net/Traces
[10] H. Braun and K. Claffy, "Web Traffic Characterization: An Assessment of the Impact of Caching Documents from NCSA's Web Server," Comp. Networks and ISDN Sys., vol. 28, no. 1 & 2, Jan. 1995, pp. 37–51.
[11] M. Arlitt, R. Friedrich, and T. Jin, "Performance Evaluation of Web Proxy Cache Replacement Policies," Tech. rep. HPL-98-97, HP Labs, May 1998, available at http://www.hpl.hp.com/techreports/98/HPL-98-97.html
[12] S. Williams et al., "Removal Policies in Network Caches for World-Wide Web Documents," Proc. 1996 ACM SIGCOMM Conf. Appls., Technologies, Architectures, and Protocols for Comp. Commun., Stanford, CA, Aug. 1996, pp. 293–305.
[13] M. Arlitt and C. Williamson, "Trace-Driven Simulation of Document Caching for Internet Web Servers," Simulation, vol. 68, no. 1, Jan. 1997, pp. 23–33.
[14] P. Barford and M. Crovella, "Generating Representative Web Workloads for Network and Server Performance Evaluation," Proc. 1998 ACM SIGMETRICS Conf. Measurement and Modeling of Comp. Sys., Madison, WI, July 1998, pp. 151–60.
[15] M. Crovella and A. Bestavros, "Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes," IEEE/ACM Trans. Networking, vol. 5, no. 6, Dec. 1997, pp. 835–71.
[16] A. Mahanti, Web Proxy Workload Characterisation and Modelling, M.Sc. thesis, Dept. of Comp. Sci., Univ. of Saskatchewan, Sept. 1999, available at ftp://ftp.cs.usask.ca/pub/discus/thesis-mahanti.ps.Z
[17] P. Barford et al., "Changes in Web Client Access/Patterns: Characteristics and Caching Implications," World Wide Web J., vol. 2, 1999, pp. 15–28, available at http://cs-www.bu.edu/faculty/crovella/paper-archive/traces98.ps
[18] V. Paxson, "Empirically Derived Analytic Models of Wide-Area TCP Connections," IEEE/ACM Trans. Net., vol. 2, no. 4, Aug. 1994, pp. 316–36.
[19] V. Almeida et al., "Characterizing Reference Locality in the WWW," Proc. 1996 Int'l. Conf. Parallel and Dist. Info. Sys., Miami Beach, FL, Dec. 1996, pp. 92–103.
[20] L. Breslau et al., "On the Implications of Zipf's Law for Web Caching," Proc. IEEE INFOCOM '99, New York, Mar. 1999, available at http://www.cs.wisc.edu/~cao/papers/zipf-implications.html
[21] G. Zipf, Human Behavior and the Principle of Least-Effort, Cambridge, MA: Addison-Wesley, 1949.
[22] M. Kendall, "Natural Law in the Social Sciences," J. Royal Stat. Soc. 4, vol. 124, 1961, pp. 1–18.
[23] R. Bunt and J. Murphy, "The Measurement of Locality and the Behaviour of Programs," Comp. J., vol. 27, no. 2, Aug. 1984, pp. 238–45.
[24] J. Peachey, R. Bunt, and C. Colbourn, "Some Empirical Observations on Program Behaviour with Applications to Program Restructuring," IEEE Trans. Software Eng., vol. 11, no. 2, Feb. 1985, pp. 188-93.
[25] C. Roadknight, I. Marshall, and D. Vearer, "File Popularity Characterisation," Proc. 2nd Wksp. Internet Server Perf., Atlanta, GA, May 1999, available at http://www.cc.gatech.edu/fac/Ellen.Zegura/wisp99/papers/roadknight.ps
[26] C. Williamson and R. Bunt, "Characterizing Short Term File Referencing Behaviour," Proc. 5th Annual IEEE Phoenix Conf. Comp. and Commun., Phoenix, AZ, Mar. 1986, pp. 651–60.
[27] T. Kwan, R. McGrath, and D. Reed, "NCSA's World Wide Web Server: Design and Performance," IEEE Comp., vol. 28, no. 11, Nov. 1995, pp. 68–74.
[28] R. Caceres et al., "Web Proxy Caching: The Devil is in the Details," Perf. Eval. Rev., vol. 26, no. 1, Dec. 1998, pp. 11–15.
[29] C. Wills and M. Mikhailov, "Examining the Cacheability of User-Requested Web Resources," Proc. 4th Int'l. Web Caching Wksp., San Diego, CA, Mar./Apr. 1999, available at http://www.cs.wpi.edu/~mikhail
[30] L. Lange, "The Internet," IEEE Spectrum, vol. 36, no. 1, Jan. 1999, pp. 35–40.
Additional Reading
[1] CERN httpd Web Server
Biographies
Anirban Mahanti received a B.E. degree in computer science in 1997 from Birla Institute of Technology, Ranchi, India, and an M.Sc. degree in computer science from the University of Saskatchewan in 1999. He is currently working toward a Ph.D. degree in computer science at the University of Saskatchewan. His research interests are in parallel processing and Web performance.
Carey Williamson [M] is a professor in the Department of Computer Science at the University of Saskatchewan. He received a B.Sc. (Honours) in computer science from the University of Saskatchewan in 1985, and a Ph.D. in computer science from Stanford University in 1991. His general research interests are in computer networks and computer systems performance evaluation. More specific interests include network traffic measurement, workload characterization, network simulation, and Web performance. He is a member of ACM and SCS.
Derek L. Eager received a B.Sc. degree in computer science from the University of Regina in 1979, and M.Sc. and Ph.D. degrees in computer science from the University of Toronto in 1981 and 1984, respectively. Currently he is a professor in the Department of Computer Science at the University of Saskatchewan. His research interests are in the areas of performance evaluation, streaming media delivery, and distributed systems.