Your Internet Connection

The Neural System of the World Wide Web

Eddie Rabinovitch

If HTML can be considered the language of the Internet and the World Wide Web (see my February 1998 column), the Hypertext Transfer Protocol (HTTP) is clearly the "neural system" of the Web. Let's take a short look at the history of hyperlinks. Invented more than 30 years ago(!) by Doug Engelbart and his team at the Augmentation Research Center (ARC) of the Stanford Research Institute (SRI), hyperlinks represent the interconnections among units of information in a shared computer environment. Hyperlinks were analogous to the neural connections in the human brain, which collectively represent actionable knowledge. The goal was to augment human intellect by providing a computerized extension of memory and thought. By the 1970s, hyperlinks were embedded in a context, most often based on collaboration. For example, hyperlinks formed an audit trail through document co-authoring, provided online access to references, or created a threaded conversation tracing the formation of ideas, designs, proposals, software, and so on. So, more than 20 years ago, hyperlinks had already created a new means of computer-network-assisted human collaboration.
The original hyperlink evolved into an applet that could be interpreted and run as a filter, search, or string manipulation program. HTTP, for its part, is based on a request/response model: a client sends a request, and the server answers with a response. Actually, the protocol's name is somewhat misleading, since HTTP is not a protocol for transferring hypertext, but rather a protocol for transmitting information with the efficiency necessary for making hypertext jumps. The actual data transferred by HTTP can include plain ASCII text, hypertext, audio, images, video -- in short, any form of information accessible on the Internet. HTTP is an application-level protocol for distributed, collaborative hypermedia information systems. It is a generic, stateless, object-oriented protocol, which can be used for many tasks, such as name servers and distributed object management systems. And because of its stateless nature (i.e., no information is retained by the server between requests, and each transaction is treated independently), HTTP is well suited for a typical Web application, allowing a user to retrieve a sequence of Web pages, documents, and multimedia objects from a number of widely distributed servers.
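To make the request/response model a bit more concrete, here is a minimal Python sketch of what the two kinds of messages look like on the wire. The function names `build_request` and `parse_response`, the host `www.example.com`, and the sample GIF payload are illustrative assumptions, not part of any specification or library; the sketch only formats and parses text and does not touch the network.

```python
def build_request(method, path, headers=None):
    """Format a simple HTTP/1.0 request message as text (hypothetical helper)."""
    lines = [f"{method} {path} HTTP/1.0"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line marks the end of the request headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(raw):
    """Split a raw response into status code, reason, headers, and body bytes."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


request = build_request("GET", "/logo.gif", {"Host": "www.example.com"})

# The body is opaque bytes; the Content-Type header says how to interpret it,
# which is why the same protocol can carry text, images, audio, or video.
raw_response = b"HTTP/1.0 200 OK\r\nContent-Type: image/gif\r\n\r\nGIF89a"
code, reason, headers, body = parse_response(raw_response)
```

Note that nothing in the request refers to an earlier request. Each message carries everything the server needs, which is what "stateless" means in practice.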
However, because the Web became so popular so fast, the depth of the underlying protocols did not evolve as quickly as the breadth of applications. To remedy this situation, various initiatives began within the Internet community to enhance HTTP. Typically, a new TCP client/server connection is created for each transaction and terminated as soon as the transaction completes. Here lies one of the major differences between HTTP 1.1 and its predecessors: with HTTP 0.9 or HTTP 1.0, a separate TCP connection is created for each element downloaded in a Web page. In other words, to download a Web page with N images, HTTP 1.0 creates N + 1 TCP connections, including the connection for the HTML page itself. With HTTP 1.1, by contrast, a single TCP connection can be used to download multiple files of a Web page.
This capability, known as persistent connections, is an important improvement introduced with HTTP 1.1. The initial versions of HTTP (0.9 and 1.0) did not support any notion of a persistent connection between an HTTP client (e.g., a Web browser) and a server. HTTP 1.1 permits the client and server to maintain their connection, exchanging multiple requests and responses until one party explicitly closes the connection.
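The difference in connection counts can be observed with a small experiment. The following Python sketch, built only on the standard library, starts a throwaway server on the local machine, fetches a "page" with three images first over one connection per resource and then over a single persistent connection, and counts how many TCP connections the server accepted. The class and function names (`CountingHandler`, `fetch_separately`, `fetch_persistently`, `count_connections`) and the resource paths are assumptions made up for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    connections = 0                # TCP connections accepted so far

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        super().setup()
        CountingHandler.connections += 1

    def do_GET(self):
        body = b"contents of " + self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch_separately(host, port, paths):
    """HTTP 1.0 style: one TCP connection per resource."""
    for path in paths:
        conn = http.client.HTTPConnection(host, port)
        conn.request("GET", path, headers={"Connection": "close"})
        conn.getresponse().read()
        conn.close()


def fetch_persistently(host, port, paths):
    """HTTP 1.1 style: every resource travels over the same TCP connection."""
    conn = http.client.HTTPConnection(host, port)
    for path in paths:
        conn.request("GET", path)
        conn.getresponse().read()  # finish one response before the next request
    conn.close()


def count_connections(fetch, paths):
    """Run fetch against a fresh local server; return connections accepted."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        fetch(host, port, paths)
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


page = ["/page.html", "/a.gif", "/b.gif", "/c.gif"]  # N = 3 images
separate = count_connections(fetch_separately, page)
persistent = count_connections(fetch_persistently, page)
```

With three images, the first strategy should need N + 1 = 4 connections and the second just one. On a real network each avoided connection also saves a TCP handshake, which this local sketch does not attempt to measure.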
Security, or rather the lack of it (see my March 1997 column), is another issue addressed to some extent by HTTP 1.1. HTTP 1.0 includes a specification for a so-called basic access authentication scheme, in which the user name and password are passed over the network in unencrypted form. The digest access authentication scheme, introduced for HTTP 1.1, addresses the most serious flaws of basic authentication. Like basic access authentication, the digest scheme is based on a simple challenge–response paradigm. The password, however, is never sent in the clear. See RFC 2069 (ftp://ds.internic.net/rfc/rfc2069.txt) for details on the digest scheme.
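A short Python sketch may help show why the digest scheme is the safer of the two. It follows my reading of the calculation described in RFC 2069, where the client answers the server's challenge (a nonce) with an MD5 hash computed over the credentials, the nonce, and the request. The helper names and the sample user, realm, password, and nonce values are assumptions chosen for illustration, and the sketch leaves out everything else a real client must handle, such as header parsing and the optional `opaque` value.

```python
import base64
import hashlib


def md5_hex(text):
    """Return the MD5 hash of a string as 32 hexadecimal digits."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def basic_credentials(user, password):
    """Basic scheme: user name and password are only base64-encoded."""
    return base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")


def digest_response(user, realm, password, nonce, method, uri):
    """Digest scheme (RFC 2069 style): only a hash is sent, never the password."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")  # secret part
    ha2 = md5_hex(f"{method}:{uri}")             # request part
    return md5_hex(f"{ha1}:{nonce}:{ha2}")       # bound to the server's nonce


# Anyone who captures the basic token can decode it without any key.
token = basic_credentials("alice", "secret")
recovered = base64.b64decode(token).decode("utf-8")

# The digest response changes whenever the server issues a new nonce.
first = digest_response("alice", "example", "secret", "nonce-1", "GET", "/index.html")
second = digest_response("alice", "example", "secret", "nonce-2", "GET", "/index.html")
```

Here `recovered` is the original `alice:secret` pair, which is exactly what an eavesdropper would obtain from a basic-authentication header. The two digest responses differ even though the password is the same, so a captured response is of little use once the server moves on to a new nonce.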
And since we mentioned standardization, let's see where HTTP stands with the standards organizations. The World Wide Web Consortium (W3C) is working closely with the Internet Engineering Task Force (IETF) to improve HTTP through continuous evolution, rationalization, and extensions. HTTP/1.1 is currently an IETF Proposed Standard (RFC 2068). W3C recommends addressing a list of issues before moving it to Draft Standard status. See http://www.w3.org/Protocols/HTTP/Issues for the details.
The improvements included in the most recent standard version, HTTP 1.1, have not changed the overall nature of the protocol, and the requirement for backward compatibility with HTTP 1.0 has prevented a real cleanup of its architecture. In summer 1997, W3C began its HTTP Next Generation (HTTP-NG) project. The purpose of this project is to address the shortcomings of the existing HTTP and to design a new architecture based on a distributed object-oriented model. Additional details on the HTTP-NG project can be found at http://www.w3.org/Protocols/HTTP-NG.