Bell Labs Internet Traffic Research

PackMime: Statistical Modeling of Connection Request Variables

The Architecture
The ``net'' is a modeled network of links, network devices, and algorithms whose performance is studied through simulation. The ``load clouds'' are collections of hosts. The net is modeled by a network simulator such as Opnet or NS. TCP transports information across the net from a host in one cloud to a host in another. In NS, for example, TCP is modeled using the source code of an actual TCP implementation from the BSD kernel.

Each cloud has an aggregate load of requests for transfers from the hosts in each of the other clouds. The cloud generates TCP connection requests for applications such as HTTP, SMTP, FTP, and so forth. A request consists of (1) a request time; (2) file sizes (e.g., for HTTP, the size of the request file and the size of the downloaded file); and (3) a packet flight time from each host involved in the transfer to the net. Each request variable is generated stochastically by a statistical model, a time series of values of the variable occurring at the request times.
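
As an illustration only, the three parts of a request described above could be held in a record such as the one sketched below. The field names, the function generate_requests, and the distributions used to fill the fields (exponential gaps, lognormal sizes, uniform flight times) are all placeholder assumptions made for this sketch; they are not the statistical models described on this page.

```python
import random
from dataclasses import dataclass


@dataclass
class ConnectionRequest:
    """One TCP connection request (hypothetical field names)."""
    request_time: float        # (1) request time, in seconds
    request_size: int          # (2) size of the request file, in bytes
    response_size: int         # (2) size of the downloaded file, in bytes
    client_flight_time: float  # (3) packet flight time, client host to net
    server_flight_time: float  # (3) packet flight time, server host to net


def generate_requests(n, rate, seed=0):
    """Generate n requests at roughly `rate` requests per second.

    Every distribution here is a stand-in chosen only to keep the
    sketch runnable; none is taken from the source models.
    """
    rng = random.Random(seed)
    t = 0.0
    requests = []
    for _ in range(n):
        t += rng.expovariate(rate)  # placeholder inter-arrival time
        requests.append(ConnectionRequest(
            request_time=t,
            request_size=int(rng.lognormvariate(6.0, 0.5)) + 1,
            response_size=int(rng.lognormvariate(8.0, 1.5)) + 1,
            client_flight_time=rng.uniform(0.005, 0.1),
            server_flight_time=rng.uniform(0.005, 0.1),
        ))
    return requests
```

Each field of the record is then a time series indexed by the request times, which matches the way the request variables are described above.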

Statistical Models for Connection Request Variables
Connection request variables, or simply connection variables, measure characteristics of the request, such as time and file size, and characteristics of the network at the time of the request, such as round-trip time. For example, for HTTP 1.0, the connection variables are connection-start inter-arrival times, server file sizes, client file sizes, server-side round-trip times, and client-side round-trip times. Each of these time-series variables is defined separately for each application protocol. We have built statistical models for connection request variables. Each request variable is generated stochastically by its statistical model, a time series of values of the variable, each associated with one request time.

Our statistical models of traffic requests incorporate the long-range dependence and the nonstationarity that are pervasive in Internet traffic. The former is well known and much studied. The latter, however, has received much less attention. Traffic on Internet wires is a superposition of traffic sources. The cause of the nonstationarity is a changing number of superposed traffic sources. As the number changes, the statistical properties change. The change is far more profound than a simple increase in rate as the number of sources increases: marginal distributions and autocorrelation change as well.

Our models are being developed through extensive empirical and theoretical study based on an approach to traffic modeling called ``connection-rate superposition.'' The basic assumption of this approach is that the statistical point process that generates the TCP request times for an application when the request rate is kr, where k is a positive integer and r is a base process rate, is the superposition of k independent point processes, each with rate r.
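
The mechanics of the superposition assumption can be sketched as follows: generate k independent streams of request times, each with rate r, and merge them into one time-ordered stream with rate close to kr. The function names and parameters are hypothetical, and the base process is a Poisson process chosen only for simplicity. A Poisson process has no long-range dependence, and superposing Poisson processes changes nothing but the rate, so this sketch shows the merging step only. It does not reproduce the changes in marginal distribution and autocorrelation described above.

```python
import heapq
import random


def arrival_times(rate, horizon, rng):
    """Arrival times on [0, horizon) for one base process.

    Exponential gaps (a Poisson process) are an assumption made for
    this sketch, not the base process of the source models.
    """
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def superpose(k, base_rate, horizon, seed=0):
    """Merge k independent streams of rate base_rate into one stream."""
    rng = random.Random(seed)
    streams = [arrival_times(base_rate, horizon, rng) for _ in range(k)]
    # Each stream is already sorted, so heapq.merge yields the
    # superposed request times in time order.
    return list(heapq.merge(*streams))


if __name__ == "__main__":
    k, r, horizon = 8, 5.0, 200.0
    merged = superpose(k, r, horizon)
    print("target rate kr:", k * r)
    print("empirical rate:", len(merged) / horizon)
```

Running the script prints the target rate kr next to the empirical rate of the merged stream, which should be close to it over a long enough horizon.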

The Break with the Past
In the past we have had network simulators that recreate network devices, topologies, and protocols with stunning detail. But their applicability has been hampered by a lack of request traffic generation. The nature of the request traffic can have a major effect on the packet traffic of the net. For example, queueing behavior depends heavily on request traffic. One of the most impressive simulators, Opnet, has 12 volumes of documentation, each book-length, but only a few pages are devoted to connection requests.

One remedy to the lack of request traffic modeling has been to feed packets into the net using statistical models of packet behavior. But this is not realistic because it is open-loop, not closed-loop; that is, it does not take TCP feedback into account. By modeling requests as we do, at the TCP start level, and running TCP software to generate requests, we achieve closed-loop packet generation. So our queueing studies, for example, have a validity not achieved by open-loop queueing. The properties of connection variables, such as the inter-arrival times of HTTP connections on a wire, have been studied in the literature. However, much of the study has been descriptive, giving only partial characterizations of the statistical behavior of the variables. Our models provide a full description that allows the variables to be stochastically generated so that they mimic live requests on an Internet link.

Another remedy has been to build request models at the user level, that is, to model user behavior. But this is a daunting task, and while it can usefully serve isolated studies of particular applications, it is not practical for the extensive application environment that pervades an Internet wire, which is filled with packets from hundreds of applications. In addition, our modeling at the TCP start level allows traffic at different rates to be generated as a single stream, which greatly speeds up computation and allows large numbers of hosts. Superposition of sources for user-level models, by contrast, involves merging as many streams as there are users, which limits the number of hosts.