ABSTRACT
Proxy cache servers in the World Wide Web deliver information to end users more quickly and efficiently than serving every request directly from the origin server. They can improve response time, reduce network demand, and lighten the load on origin Web servers.
Proxy caches (or simply caches) achieve these benefits by storing copies of recently requested objects, avoiding the need to transfer those objects again. Cached objects are usually served more quickly and do not consume external network or server resources. Cache servers are typically placed close to a group of end users and handle all HTTP requests from those users. Requests for objects that are in the cache can be served to the user without remote communication and wide area data transfer. Requests for objects not in the cache are always resolved externally.
A cached copy of an object may differ from the current copy of that object at the origin server. This happens when a cache holds an object after the origin server changes that object. Currently origin servers do not communicate changes to caches; a cache must ask about them. Cache consistency is explored in [1, 2]. In [3] we propose an invalidation-based protocol motivated by Cao's work [2] to support stronger cache consistency.
When a request arrives for a cached object, the cache must decide whether to serve the object immediately or to validate it with the origin server. The user's request headers and the content headers in cache may instruct the cache to validate the object. Otherwise, the cache will determine whether to validate the object based on the freshness of the object in cache. The cache will serve locally any object it considers to be fresh, but will attempt to check an object's consistency before serving the request if the object is stale. The determination of fresh or stale is made by the cache, not the origin server. Note also that the cache has no way to know when the object actually changes, so fresh does not imply that the object is consistent with the current copy of the object.
This study explores the impact of the consistency decision on the response time of a cache, and finds that making a round-trip to the origin server to validate freshness is the dominant component of the response time of most user requests.
We define the response classes used in this analysis and describe a widely used consistency protocol that is used by a cache to identify whether an object is fresh or stale. Using these definitions, we then analyze data from three cache servers and summarize the findings.
The Alex protocol defines content freshness as follows:
Furthermore, if an object has an Expires header, the cache will consider the object fresh until the time specified in the Expires header, after which it is stale. Note that the object will be served inconsistently from a cache if it changes prior to its expiration time.
This analysis extracted request service time, response size, and response type fields from a set of proxy cache server logs. For each response type we computed the frequency distribution of the response size and request service time metrics. From the frequency distribution we observed the mean, the median, and the 80th and 90th percentiles of the distribution, which led to conclusions about service time as a function of consistency. The mapping of log fields to response type is described in the full version of this article [5].
There are some limitations with such a log-based analysis scheme:
In a Web workload, mean response time (and response size) is often skewed by a relatively small number of long-duration (or large) responses. For this reason the median response time, which indicates what most responses see, is often much less than the mean. Sometimes 80 percent or more of responses are satisfied in less than the mean response time. Since the median is not skewed by a few large (or long) responses, it may be a better indicator of response time than the mean. For this reason this analysis presents the mean as well as the 50th percentile (the median) and the 80th percentile responses.
The analysis also excludes non-HTTP, error, and other response types, so the sum of the percent of responses may be less than 100 percent.
Sixteen months of responses were analyzed, consisting of 3.3 million cachable requests.
Table 2 shows that the mean response time for slow validations is approximately eight times longer than fast validations, and that both transfer about the same amount of data to the client. The median response time for slow validations is about l0 times longer than the median response time for fast validations. The response size is that of the HTTP response headers. No object data is sent or returned to the user for a validation.
The mean response time for slow hits is 3.5 times that for fast hits; the median is 5.5 times. Both of these response classes return object data to the client of about 5 kbytes. Table 2 also shows that consistency misses take approximately nine times longer than fast hits and twice as long as slow hits; the median time for consistency misses is seven times longer than fast hits. In these cases object data is resumed to the user, but fast hits require no remote communication; slow hits receive only a small validation from the origin server; and consistency misses retrieve full object data from the remote server. The data transfer size, wide-area round-trip, and server demand all affect response times.
Note also that consistency misses are considerably larger than fast and slow hits. Most of the consistency misses were for HTML objects. The increased response time is due to both larger mean HTML object size (compared with the mean object size of images, which accounts for most responses; see [5] for a breakdown of object size by type), and the complexity of generating dynamic HTML objects. The log-based analysis could not determine when HTML objects were dynamically generated, but we know from experience that some of them are.
The mean response times by class in Table 2 are closer to the 80th percentile value than to the median value for the class. In some cases the mean is larger than the 80th percentile. This indicates a heavy-tailed distribution where a few very long transfers skew the mean such that it no longer corresponds to the response time of the majority of requests.
Cold misses are slower than consistency misses, but the mean object size is again much larger. This is due to a few very large cold misses. The granite cache did not keep any objects in cache over 4 Mbytes, so each response over 4 Mbytes is necessarily a miss.
The median and 80th percentile object sizes for consistency misses and cold misses are nearly equal, as shown in Table 3; it is only at the 90th percentile and above that cold misses have a distinctly heavy tail relative to other response types. This is likely due to a few large objects being requested through the cache. The 90th percentile value indicates that 10 percent of misses from this server were larger than 27 kbytes.
Figure 2 shows the distribution of response times observed at this server. Figure 3 shows the distribution of response sizes. These graphs present a cumulative distribution function (CDF), which indicates on the y axis the percentage of responses less than or equal to the time (or size) on the x axis. With a CDF the median value is the x value at which a curve crosses 50 percent.
Figure 2 shows that fast validations and fast hits are significantly faster to complete than the other response types, and that misses are the slowest response type. Seventy percent of fast hits and 80 percent of fast validations complete in under l00 ms. Fewer than 10 percent of slow validations complete in under l00 ms.
The response size distribution for fast validations is in a very small range, indicated by a nearly vertical line in the CDF in Fig. 3. The response size is determined by the headers sent by the cache, which are very similar for every response. Slow validation headers come from various origin servers, and show greater variability and smaller overall size. The Squid proxy cache evidently includes more header information than most origin servers do.
The file size distribution also shows that response sizes for fast hits and slow hits are closely matched. This eliminates size variation as a likely cause for the difference in response times.
More detail about response times and sizes for each response type is presented in [5].
Table 4 indicates that fast validations were more than ten times faster than slow validations: about 14 times faster in the mean and median; and about nine times faster at the 80th (and 90th) percentiles. In these cases the reported data size was zero bytes; the Netscape cache did not report header size to the access log.
The response time for slow hits is about six times longer than fast hits at the mean, nine times at the median.
Note the large response size for misses. Some very large objects were served through this cache, which significantly affected the mean transfer size. The response size CDF shows that there were very few of these objects: only 5 percent of objects were over 27 kbytes.
The response time distribution for this server shows that fast validations and fast hits are significantly faster than the other response types. Furthermore, slow validations and slow hits take almost exactly the same amount of time. This indicates that the round-trip to the remote server is the dominant component in cache response time, not the amount of data transferred to clients. Consistency misses and regular misses are slower than slow validations and slow hits, as observed earlier.
The response size distribution shows that fast hits and slow hits return similar amounts of data. Consistency misses and regular misses are also closely matched. Validations (fast and slow) were reported as zero bytes.
The workload of the users at this site during the entire five month log collection period was studied in detail by Arlitt et al. [6]. This report extends that characterization to examine cache response time by cache behavior.
The logs from this site did not contain enough detail to differentiate between consistency, proxy-only, and regular misses. Table 5 summarizes the results from this site.
This data set confirms the LAN findings of fast validations being more than an order of magnitude faster than slow validations. Both the mean and the median response time for slow validations indicate that they take between 15 and 20 times longer to complete than fast validations.
The mean response time for slow hits is about the same as fast hits, but this is evidently due to a few very long fast hits: the median and 80th percentile response times show that slow hits take eight to nine times as long to complete as fast hits. In Table 5 the mean and median response time for misses is about 160 percent longer than for slow hits; the mean response time for misses is 170 percent longer than fast hits. However, the median response time for misses is 20 times longer than fast hits.
Table 6 presents the mean, median, and 80th percentile response size for fast hits, slow hits, and misses.
The response size of fast hits is 10 percent larger on average than slow hits, but slightly smaller at the median and 80th percentiles. The mean response size for misses is 72 percent larger than for slow hits, the median 70 percent larger.
Object size does not appear to be a dominant factor in response time, but object service location (direct from the proxy vs. requiring validation at the origin) does.
In this report we demonstrate that performing a consistency check prior to resuming an object from cache also has a significant impact on object retrieval time. When a validation is returned directly from a cache it is resumed eight to 15 times faster than an object that requires a consistency check with the origin server. This effect is especially significant if the origin server is slow, distant, overloaded, or has failed. Reducing the service time of requests will also allow a cache to support more total user requests.
Our results are similar across two very different proxy cache implementations and user bases, and widely ranging levels of request demand. This indicates that the characteristics observed are likely to be fundamental. This concurs with intuition: requests satisfied nearby are expected to be satisfied faster than requests to more distant servers.
Based on these results we suggest that there is an opportunity to improve user response time from caches by improving cache consistency. Earlier work in this area suggests strong consistency is possible to achieve in the Web at the same cost as weak consistency by using server invalidations [2]. We have proposed such a mechanism and shown via simulation that it provides better object consistency, is faster for end users, consumes slightly less network bandwidth, and reduces origin server load [3].
As faster end systems and residential networks are deployed, content consistency will have more relative impact on the performance of user requests and caches.
References
[1] J. Gwertzman and M. Seltzer, "World-Wide Web Cache Consistency," USENIX 1996 Annual Tech. Conf., Jan. 1996.
[2] C. Liu and P. Cao, "Maintaining Strong Cache Consistency in the World-Wide Web," IEEE Trans. Comp., vol. 47, no. 4, Apr. 1998, pp. 445–57.
[3] J. Dilley et al., "The Distributed Object Consistency Protocol," Tech. rep. HPL1999-109, HP Labs, Sept. 1999.
[4] V. Cate, "Alex -- A Global Filesystem," Proc. USENIX File Sys. Wksp., May 1992, pp. 1–12.
[5] J. Dilley, "The Effect of Consistency on Cache Response Time," Tech. rep. HPL-1999-107, HP Labs, Sept. 1999.
[6] M. Arlitt, R. Friedrich, and T. Jin, "Workload Characterization of a Web Proxy in a Cable Modem Environment," ACM SIGMETRICS Perf. Eval. Rev., vol. 27 no. 2, Aug. 1998, pp. 25–36.
Biographies
John Dilley [M] is a distributed systems architect with Akamai Technologies. His research interests include architecture and design of distributed applications, object location and distributed directory services, and object-oriented application design and development. He received B.S. degrees in mathematics and computer science from Purdue University in 1984 and an M.S. in computer science in 1985. He is a member of ACM.