An Integrated Experimental Environment
for Distributed Systems and Networks
Brian White Jay Lepreau Leigh Stoller Robert Ricci Shashi Guruprasad
Mac Newbold Mike Hibler Chad Barb Abhijeet Joglekar
School of Computing, University of Utah
www.flux.utah.edu
Abstract
Three experimental environments traditionally support
network and distributed systems research: network emulators, network simulators, and live networks. The continued use of multiple approaches highlights both the
value and inadequacy of each. Netbed, a descendant of
Emulab, provides an experimentation facility that integrates these approaches, allowing researchers to configure and access networks composed of emulated, simulated, and wide-area nodes and links. Netbed’s primary
goals are ease of use, control, and realism, achieved
through consistent use of virtualization and abstraction.
By providing operating system-like services, such as
resource allocation and scheduling, and by virtualizing
heterogeneous resources, Netbed acts as a virtual machine for network experimentation. This paper presents
Netbed’s overall design and implementation and demonstrates its ability to improve experimental automation
and efficiency. These, in turn, lead to new methods of
experimentation, including automated parameter-space
studies within emulation and straightforward comparisons of simulated, emulated, and wide-area scenarios.
1
Introduction
The diverse requirements of network and distributed systems research are not well met by any single experimental environment. Competing approaches remain popular
because each covers a different point in a space defined
by levels of ease of use, control, and realism. Packetlevel discrete event simulation and live network experimentation represent two extremes. Simulation presents
a controlled, repeatable environment. However, its level
of abstraction may be too high to capture low-level effects such as the impact of interrupts under heavy load.
Live networks achieve realism, but surrender repeatability and the ability to modify or even monitor internal router behavior. Emulation [1, 27, 36, 42] is a hybrid approach that subjects real applications, protocols,
and operating systems to a synthetic network environment. While single-node WAN emulators, such as Dummynet [36], introduce artificial delays, losses, and band-
www.netbed.org
width constraints in a controlled manner, they require tedious manual configuration.
Netbed complements existing experimental environments. It spans simulation, emulation, and live network experimentation by integrating them into a common framework. This integration brings the control and
ease of use usually associated with simulation to emulation and live network experimentation without sacrificing
realism. It gives users the individual benefits of simulation, emulation, and live network experimentation, configured and controlled in a consistent manner. Further,
integration facilitates interaction, comparison, and validation across the three domains.
Netbed is a software system that provides a time- and
space-shared platform for research, education, or development in distributed systems and networks. It leverages
local nodes, allocated from clusters and temporarily dedicated to individual users, for emulation; this paper often
refers to these as emulated nodes. Netbed also employs
geographically-distributed nodes that are simultaneously
shared amongst users; this paper frequently refers to such
resources as wide-area nodes. Researchers access these
resources by specifying a virtual topology graphically or
via an ns script [40], causing Netbed to automatically
configure a physical topology. An experiment is defined
by this configuration and any run-time dynamics, such
as traffic generation, specified via the general-purpose ns
interface. When realizing the virtual topology, Netbed
virtualizes host names, IP addresses, links, and nodes.
Virtual nodes may be instantiated from a large set of local
nodes, from a smaller set of distributed nodes, or within
ns simulation. Virtual links may map directly to localarea links, may be matched to similar wide-area links,
or may be emulated by interposing Dummynet nodes to
regulate bandwidth, latency, loss, and queuing behavior.
Netbed’s framework provides integrated abstractions,
services, and name spaces common to all three environments, mapping them into domain-specific mechanisms and internal names. Netbed’s operating systemlike services include node and link allocation and naming, scheduling and idle experiment preemption, experiment “swapping,” and disk image loading.
Given these services, an analogy between an experiment and a Unix process seems natural. This metaphor
illustrates the life cycle of an experiment and Netbed’s
role in automating and controlling the procedure. The ns
specification serves as the “program text,” which Netbed
compiles to synthesize a hardware realization of the virtual topology. The specification is first parsed into an intermediate representation that is stored in a database and
later allocated and loaded onto hardware. During experiment execution, Netbed provides interfaces and tools for
experiment control and interaction. Finally, Netbed may
preempt and swap out an experiment. Because Netbed
gives experimenters run-time control over node and link
characteristics and an ability to interpose traffic-shaping
and monitoring nodes, we view the system as a virtual
machine for heterogeneous node, link, and topology allocation and control. While traditional virtual machines
target an architecture’s instruction set, Netbed instead abstracts the network.
The analogy is not merely cosmetic; experiments derive key benefits from Netbed’s design, namely automation and time- and space-efficiency. Experiment creation
involves a large number of steps including, for example, configuring network interfaces and routing tables,
installing operating systems, exporting file trees, and administering user accounts. Netbed removes the tedium of
manual configuration through automation. Netbed was
designed to make efficient use of physical resources and
to enhance experimenter productivity. It manages the
shared use of physical resources to provide their greatest possible utilization, while ensuring inter-experiment
isolation. Netbed performs experiment creation and termination in a few minutes, enabling an interactive style
of use. Attention to efficiency of disk reloading, resource allocation, and experiment creation maximizes
time spent executing experiments and minimizes effort
expended configuring them.
This paper makes the following contributions:
• It introduces the notion of a virtual machine for controlled network experimentation and shows how it
integrates heterogeneous resources.
• It outlines the key obstacles to virtual machine efficiency and how they were overcome.
• It shows that Netbed’s automation, efficiency, and
services inspire qualitatively new methods of experimentation.
• It provides data validating Netbed’s emulation capabilities.
Section 2 continues by outlining the heterogeneous resources managed by Netbed. Section 3 outlines the life
cycle of an experiment, using the virtual machine analogy to describe the system’s design, and Section 4 shows
the benefits of this approach. Section 5 details the challenges overcome by Netbed’s experiment services, including the mapping of virtual to physical resources and
disk loading, and their efficiency. Section 6 validates the
emulation facilities. Section 7 illustrates unique experimental techniques facilitated by Netbed. Finally, related
work is addressed in Section 8 and Section 9 concludes.
2
Resources
As its original name, “Emulab,” suggests, Netbed was
conceived as an emulation platform. Through its flexible
design, it has evolved to support a diverse set of physical node and link types. These resources are virtualized
in the sense that they may be allocated and controlled
largely independently of their physical realization.
Local-Area Resources: Netbed software currently
controls two clusters: one at the University of Utah comprised of 168 PCs and another at the University of Kentucky containing 50 PCs. The two sites are configured
in a nearly identical fashion. Any of these nodes can
function as an edge node, a traffic generator, or a router.
Each machine has five 100Mb Ethernet interfaces: one
is on a dedicated control and data acquisition network
and the others are for arbitrary use by experiments. At
each node, local memory and disk provide ample room
for computation and logging of monitoring data.
All local nodes are connected using high-end switches
that function as a “programmable patch panel.” To support arbitrary and isolated topologies and to provide security to Netbed users, we employ Virtual LANs. A
VLAN is a switch technology that restricts traffic to the
subnet defined by its members. We have verified empirically that our switches provide inter-VLAN performance
isolation, in the face of both traffic and control operations
(VLAN creation, deletion, and modification).
Netbed’s local nodes and wealth of available bandwidth can be configured into switched LAN topologies.
This, coupled with its rapid and automated configuration
of operating systems, makes Netbed an attractive platform for kernel development and research within localarea networks. Root privileges, remotely accessible consoles, and remote power cycling help make kernel development convenient.
Emulated Resources: Netbed uses Dummynet and
VLANs to emulate wide-area links within the local-area
environment. A Dummynet node is automatically inserted between two physical nodes and enforces queue
and bandwidth limitations, introducing delays and packet
loss. Dummynet nodes act as Ethernet bridges and are
transparent to experimental traffic.
Distributed Resources: Netbed integrates both the
MIT-owned testbed nodes first used for the RON [4] research, as well as nodes contributed by other organizations that run our special CD-based Unix configuration. These resources today provide Netbed with approximately 40 nodes at 30 different sites around the world,
including nodes connected via Internet2, DSL, and cable
modems. These nodes are valuable to experimenters performing Internet measurement or who require the characteristics of a live network. Experimenters may request a
random set of nodes, specific nodes, nodes having a specific class of network connection (e.g., via a cable modem), or nodes connected via specified latencies, bandwidths, and loss rates. In the latter case, Netbed provides
a best-effort mapping of a user-specified virtual topology
onto physical distributed nodes.
Distributed nodes support many of Netbed’s key features, including account establishment and automated
traffic generation, subject to their particular policies and
mechanisms. For example, distributed nodes typically
have only one network interface, so do not have a physically separate control network. Due to their scarcity, by
policy—not limitation of mechanism—distributed nodes
currently are shared; multiple experiments may use a
node simultaneously. Netbed provides some isolation between experiments through the FreeBSD Jail [18] mechanism, which provides a primitive form of virtual machine and restricts root privileges. Our modifications to
Jail provide access to raw sockets, while preventing processes from spoofing IP addresses. Multiplexing is supported by providing a (currently fixed) number of jailed
virtual machines per node. Extending this mechanism to
provide fair sharing of CPU, memory, and network resources is a subject of future work.
Netbed provides flexibility in specifying interconnections between distributed nodes. By default, the nodes
retain full, unmediated access to the Internet. However, if links are specified between the nodes, Netbed
sets up IP tunnels so that distributed nodes can use “private” IP addresses. In conjunction with Netbed’s automated routing setup, this creates an overlay network
configured to the experimenter’s specifications. These
tunnels also allow transparent communication between
distributed nodes and experimental interfaces on local
nodes, so that networks can contain both Internet and
emulated links. Thus, distributed nodes may be treated
the same as local nodes with respect to traffic generation,
routes, and IP addresses.
Simulated Resources: Netbed integrates simulation
through ns’s emulation facility, nse [10], allowing simulated nodes, links, and traffic to interact with application traffic. Though simulation abstracts detail [15, 11],
it can provide scalability beyond the limits of physical
resources; many virtual simulated nodes can be multiplexed on one physical node.
masterhost
usershost
Internet
Web/DB/SNMP
Switch Mgmt
Serial Links
Control Switch / Router
Distributed Nodes
NSE
NSE
NSE
Virtual PC
Virtual PC
Virtual PC
Power Cntl
PC
PC
PC
PC
PC
PC
PC
PC
168
"Programmable patchpanel"
Figure 1: Netbed Architecture
Netbed’s deployment of ns brings a wealth of simulation infrastructure to emulated and distributed experiments, including ns’s rich and diverse protocol suite, varied statistical models, and support for wireless devices.
nse can also be used to simulate a large-scale network
within emulation. The close interaction between simulation and live protocols presents an opportunity to validate
ns’s abstractions.
Planned Extensions: Plans are underway to integrate
additional virtual resource types: we are constructing a
WAN emulator based on the Intel IXP1200 network processor [17] that provides improved features and performance over Dummynet. Second, we plan to control and
configure ModelNet [42] through Netbed’s existing interfaces.
3
Experiment Life Cycle
An experiment is Netbed’s central operational entity. It
represents a network configuration, including links and
VLANs; node state, including operating system images;
and database entries, including event sequences. The intended duration of an experiment ranges from a few minutes, to many days, to months or years on distributed
nodes. This section follows the life cycle of an experiment to illustrate Netbed’s operation and further develop
its role as a virtual machine for network experimentation.
The Netbed virtual machine is architected around interacting state machines, monitored by a state management daemon. A primary state machine represents the
experiment, while subsidiary state machines handle node
allocation, configuration, and disk reloading. The state
daemon catches illegal or tardy state transitions. For example, if a node hangs while rebooting, the state daemon
times out and attempts an alternate reboot mechanism.
This approach copes reasonably well with the reliability
challenges of large-scale distributed systems which are
composed of often unstable commodity hardware, but
further work on reliability is needed.
3.1 Accessing Netbed
To minimize administrative overhead, Netbed employs a
hierarchical structure for authorization: To begin a new
project, a “leader,” e.g., a faculty member or senior student, submits a simple web form. Once the project has
been approved by Netbed staff, accountability and ability to authorize other project members are delegated to
the project leader. The web interface then serves as a
universally-accessible portal to Netbed, through which
an experimenter may create or terminate an experiment,
view the corresponding virtual topology, or configure
node properties.
After experiment creation, experimenters may log directly into their allocated nodes, or in to usershost,
depicted in Figure 1, which serves as a centralized point
of control. This node is also fileserver, which stores
operating system images, exports home and project directories to local nodes via NFS and to distributed nodes
via SFS, the Secure File System [20]. masterhost is a
secure server for many of our critical systems, including
the web server, database, and switch management.
3.2 Specification
Just as program text is the concrete specification of a
run-time process, an ns script written in Tcl configures a
Netbed experiment. This choice facilitates validation and
comparison since ns-specified topologies, traffic generation, and events can be reproduced in an emulated or
wide-area environment. For the large community of researchers familiar with ns, it provides a graceful transition from simulation and an opportunity to leverage existing scripts. Since Tcl is a general-purpose programming language, a researcher is empowered with looping
constructs, conditionals, and arbitrary functions to drive
experiment configuration and execution.
Emulated nodes and links enjoy full implementation
transparency. By default, links specified in the ns experiment file are realized as interposed Dummynet nodes. To
instead incorporate distributed nodes, an experimenter
need only specify an appropriate node type. For example, Figure 2 requests an Internet-connected node
by specifying a pc-inet hardware type. A simulated topology can be embedded within an emulated
topology by wrapping standard ns syntax in a makesimulated block, a Netbed-specific construct.
Any constant bit rate traffic flow identified via standard ns syntax automatically instantiates traffic sources
and sinks using the TG Tool Set [21]. Simulated FTP
and Telnet flows are rendered using ns’s emulation facility, nse. This mechanism injects traffic generated by
models, such as the tcplib telnet distribution, into a live
network. Such cross traffic is important for studying protocol behavior in the face of congestion.
Netbed defines a small set of ns extensions, including
set ns [new Simulator] # Create the simulator
source tb_compat.tcl
# Add Netbed commands
$ns rtproto Static
# Netbed computes routes
set source [$ns node]
set router [$ns node]
set dest
[$ns node]
# define new nodes
# Connect source to router and router to dest
$ns duplex-link $source $router 10Mb 0ms RED
$ns duplex-link $router $dest 1.5Mb 20ms DropTail
tb-set-node-os $source FBSD45-STD # Set OS on local node
tb-set-hardware $dest pc-inet # Request distributed node
$ns run
# "run" on Netbed
Figure 2: An ns file showing a linear topology with routing and a distributed node
procedures to configure a node’s operating system and
to specify its hardware type. These procedures are not
required; Netbed supplies default behavior in their absence. A stub library defines null procedures so that the
same script may be executed on Netbed and within ns.
Program objects are a Netbed-specific ns extension
that provides a rudimentary remote execution facility. A
program object is associated with an ns node in the script
and attaches arbitrary applications to the corresponding
local node. It may be independently controlled during an
experiment’s execution. Program objects are currently
not available on distributed nodes, until we finish securing the distributed event system.
Experimenters unfamiliar with ns syntax may create
topologies graphically via a Java GUI, which generates
an ns configuration file. Alternatively, a standard topology generator such as GT-ITM or BRITE may be used to
generate an ns script. This highlights one of the primary
benefits of integration: application of tools intended for
one experimental domain, in this case simulation, to another.
3.3 Parsing
A traditional compiler is separated into front and back
ends whose interactions are mediated by an intermediate representation. This aids portability since the same
front end can be reused with back ends supporting different hardware architectures. Since Netbed targets multiple, heterogeneous physical resources simultaneously,
it uses an analogous split-phase style of compilation.
A database serves as the shared repository between the
front-end Tcl/ns parser and resource-specific back-end
mechanisms. Thus, a single experiment may incorporate
simulated, emulated, and wide-area links without requiring excessive resource-specific knowledge in the specification language or front-end parser.
Netbed’s parser recognizes the subset of ns relevant to
topology and traffic generation. Written in Tcl, it operates by overriding and interposing on standard ns procedures and Tcl primitives. Netbed executes the experi-
ment configuration script in the context of these new definitions. Unrecognized ns commands output a warning,
while ns syntax configuring links and traffic endpoints
triggers the overloaded procedures. ns-specified event
generation is performed at this time, storing the events
in the database. Therefore, ns-specified events are static
and have a (large) limit on their number.
Both overloaded and Netbed-specific procedures populate the database, which also stores information about
hardware, users, and experiments.
The database
presents a consistent abstraction of heterogeneous resources to higher layers of Netbed and to experimenters.
For example, the front-end database representations of
distributed and emulated nodes differ only in a type tag.
The database provides a single name space for all experimental entities. Thus, in most cases, experimenters can
interact with them using the same commands, tools, and
naming conventions regardless of their implementation.
As an example, nodes of any type may host traffic generators, despite the fact that the traffic may flow over links
simulated by ns, emulated by Dummynet, or across the
Internet between distributed nodes.
3.4 Global Resource Allocation
The global resource allocation phase is responsible for
binding abstractions created during previous stages to
physical entities. It corresponds to the resource allocation performed during back-end compilation and linkerdirected name binding. For overall simplicity, resources
are currently allocated on demand rather than reserved
by experimenters in advance.
Netbed uses general combinatorial optimization techniques to perform resource allocation. The algorithms
map a target configuration, stored in the database, onto
available physical resources. Such a mapping respects
the interconnections of the virtual topology, including
their latency, bandwidth, and loss rates. As further explained in Sections 5.2 and 5.3, we use separate algorithms for local and distributed nodes due to their differing constraints. The mapping program for local nodes,
assign, uses simulated annealing, while the wanassign program uses a genetic algorithm for distributed
resources. Based on the output of assign and wanassign, Netbed reserves nodes and links and updates the
database with resource mappings and user-supplied parameters.
Although within an experiment we follow our principle of conservative resource allocation, we’ve found it
impractical to do so between experiments on local nodes.
We currently have only 2 Gbps inter-switch bandwidth,
much of which is theoretically consumed by single experiments, preventing other experiments from mapping
successfully. However, our traffic monitoring has shown
that, in practice, experiments rarely use their allocated
inter-switch bandwidth. Therefore we have adopted a
policy of over-reserving these bottleneck links while
continuously monitoring them for high bandwidth use.
Thus far, that has never occurred.
Occasionally, there is a need to dynamically change
node membership in an experiment. This can happen,
for example, if a node fails and must be replaced, or if
nodes are no longer needed because of a change in application demands. Netbed supports the dynamic addition
or removal of nodes in any active experiment, and can
graft added nodes into LAN-connected topologies.
To ensure consistent naming across instantiations of
an ns configuration, Netbed virtualizes IP addresses and
host names. This level of indirection is necessary since a
configuration is unlikely to be mapped to the same physical resources upon re-creation. While experimenters are
free to manually assign IP addresses, this task is most often left to Netbed. Netbed deterministically names nodes
and links for consistency across experiment creations.
3.5 Node Self-Configuration
Node configuration is driven by the nodes themselves,
but entirely controlled by state stored centrally in the
database. This is accomplished in a manner reminiscent
of Unix dynamic linking and loading. A traditional dynamic linker is responsible for establishing the proper
context for a process, loading it, and then invoking it.
Netbed applies this strategy at the node level to achieve
distributed self-configuration, which includes obtaining
a host name, loading a disk image, and executing experiment startup scripts.
Intelligent node state management is crucial in realizing our robustness and security goals. Nodes are kept
free of persistent configuration state; their memory and
local disks are considered volatile soft state. This allows
an experiment to be “swapped out” and its resources reclaimed. If experimenters wish to retain local disk modifications, such as kernel revisions, they can easily save
an image of their disk on persistent store. A reference
to the image is stored in the database and becomes hard
state. While an experiment is swapped out, Netbed stores
its virtual topology, host name, and general setup in the
database. “Swap in” reconstitutes this hard state on an
equivalent set of physical resources and brings the node
to a fully-known state.
For local nodes, Netbed ensures that a clean disk image is installed on every node before experiment swap-in
or creation. Then, in parallel, Netbed attempts to reboot
all the nodes using increasingly aggressive techniques.
First, it issues a reboot command via ssh; any nodes
that fail to boot in a timely manner are sent a secure
authenticated “ping of death”; should that fail, they are
power-cycled. Nodes boot using Intel’s PXE [34] network bootstrap protocol. Each node’s PXE BIOS con-
tacts masterhost, which loads a first level kernel as
directed by the database. This first level kernel might be
a fast disk image loader, a memory file system-based operating system, or typically, a larger second level bootstrap program. This second level loader again contacts
the database to determine the next step, either booting
from an on-disk partition or downloading an OSKit [12]
kernel. This multi-phase approach permits flexible configuration and customization of the OS that runs on each
node. The system then waits for the nodes to come
back up. If a node does not come up in a timely manner, one more attempt is made; if it still fails, the entire
experiment swap-in fails. To improve resilience, overallocation of nodes is an obvious avenue for future work.
It is not entirely straightforward, due to topological constraints and heterogeneous node types.
Distributed nodes use an analogous disk loading
mechanism. Each time a distributed node reboots, it does
so from a CD-ROM which then negotiates with masterhost to, if necessary, securely apply software updates or reload the disk over the network. On each distributed node, Netbed instantiates a new Jail in a known
initial state, analogous to the known initial state of a local node after disk loading and booting. In addition, a Jail
can be “powered off” by terminating it or “rebooted” by
restarting it.
Once a node or Jail has booted, our initialization sequence invokes a node configuration script that uses
a program called the Testbed Master Control Client,
TMCC , to securely communicate with a daemon on masterhost that fronts the database. Using this script and
TMCC , a node obtains and initializes its hostname, experimental network IP addresses, routes, software packages, user accounts, and other configuration information. Local nodes NFS-mount the appropriate project
tree and users’ home directories from fileserver; in
the wide-area, SFS is used instead.
3.6 Experiment Control
Traditional operating systems provide signals as a rudimentary form of control over local processes. Whereas
users often start, stop, and resume processes, experimenters want to start, stop, and resume traffic generators
and network monitors. To support dynamic experiment
control, Netbed uses an event system to extend the notion of signals across sets of nodes and links. This facility closely mirrors the style of event schedulers found
in network simulators. Just as simulation allows experimenters to manipulate link characteristics at prescribed
times, so too can experimenters dynamically change latencies, bandwidths, and loss rates on emulated links.
For example, to bring down a link named link0 10.5
seconds after experiment creation, a script would specify: $ns at 10.5 "$link0 down".
Our event system is built on top of Elvin [38], a publish/subscribe system that supports federation. Static
events are extracted from the database and fed into Elvin
at experiment creation time. Dynamic events may be created through library interfaces and a command-line tool.
Current clients of the event system include traffic generators, a WAN emulator control agent, a general remote
execution facility, and Netbed’s own management programs.
The event system is used extensively on local nodes
but sparingly on distributed nodes, due to its current insecure deployment. Well-known solutions exist to secure the system; we are exploring a number of them, including using Elvin’s “security keys,” which limit the exchange of subscriptions and events to specific producers
and consumers.
The event system controls high-level abstractions as
defined in the ns configuration file, including links,
nodes, and program objects. If experimenters were restricted to such high-level interfaces and tools, Netbed
would limit the granularity of their control. Therefore, to
the extent allowed by local policy, Netbed provides lowlevel and open access to resources, including root privileges on local nodes and Jail-restricted root privileges
on distributed nodes. Of course, with such privileges
experimenters can unwittingly corrupt their resources.
Netbed’s ability to quickly restore an experiment’s hard
state from the database and reload disk images makes it
easy to recover from such accidents.
Root access on local nodes has proven to be an especially valued aspect of control, since it enables experiments requiring kernel modifications or access to raw
sockets. To maintain security and isolation in the face
of root access, Netbed prevents MAC and IP spoofing
on local nodes through switch mechanisms. Since privileged access is mediated by Jail on shared, distributed
nodes, these issues are not a concern there: though a process “in jail” can access raw sockets, it can only bind to
its assigned IP address. This gives experimenters access
to tools such as tcpdump and traceroute, without
exposing insecurities.
Since the local nodes currently in use have serial console lines, power controllers, multiple network interfaces, and are dedicated to an experiment, they provide
additional control mechanisms. Each local node is connected to a separate control network, isolated from the
networks that are used for experimental traffic. This separate network provides three important features: more
reliable control, cleaner experimental data, and greater
security. Unless a program requires the use of a display or mouse attached directly to the node, Netbed does
not penalize remote experimenters—with only minor exceptions, remote users have as much control over these
nodes as they do over desk-side machines. For exam-
ple, node consoles are virtualized so that an experimenter
need not be logged into the server that physically hosts
the serial console lines. Instead, all consoles can be securely accessed from any Unix or Windows machine via
a local telnet session, connected through a transparent
application-level SSL tunnel. We find that most kernel
developers, once they have tried it, prefer remote use of
Netbed machines to using desk-side test boxes.
3.7 Preemption and Scheduling
Traditional operating systems preempt and schedule processes for better system throughput and CPU utilization.
Because Netbed manages shared community resources,
efficient utilization is also a priority. Local nodes currently use a conservative allocation policy: each virtual
node is mapped to a separate physical node. Therefore,
Netbed can preempt idle experiments on local nodes to
reacquire physical resources and to satisfy “runnable”
experiments. Distributed nodes typically run each virtual node within a Jail, and are not currently subject to
preemption. This policy is used because an idle distributed virtual node consumes only a single Jail rather
than an entire physical node, and additional OS resource
accounting mechanisms would be needed to accurately
detect idle virtual nodes.
Local nodes are often idle despite being assigned to
experiments. Determining idleness in Netbed is difficult; the indicators used in standard clusters are not sufficiently sensitive, since activity may constitute something
as simple as infrequent network probes. Netbed’s idle
detection system currently monitors three metrics: traffic on the experimental networks, use of pseudo-terminal
devices, and CPU load averages.
To avoid inconveniencing users, we manually confirm idle indications with them before swapping out their
experiments. With recent tuning of the idle detection
heuristics, Netbed has not experienced false positives
and appears to find all truly idle experiments. Since our
current swapping mechanism preserves only hard state,
users with experiments dependent on soft state may manually disable preemption. With planned future work in
disk state saving, Netbed should be able to safely preempt such experiments.
When experimenter interaction is not required, Netbed
can fully automate the experimentation process by
scheduling batch experiments, which execute whenever
resources become available. Batch processing allows an
experimenter to iterate over a large problem space without manual interaction. It also helps accommodate large
experiments that may only find sufficient resources at
low-usage, inconvenient times. Such off-peak scheduling further improves Netbed utilization.
4
Improving Network Experimentation
While Netbed provides most of the benefits of emulation,
simulation, and wide-area experimentation, it is more
than a simple sum of services. Netbed’s common set
of tools and abstractions have important practical benefits for experimentation, including: automated and efficient realization of virtual topologies, efficient use of resources through time- and space-sharing, and increased
fault-tolerance through resource virtualization.
The savings afforded by automated mapping of a virtual topology to physical devices removes a significant
experimentation barrier. Our user experiments show that
after learning and rehearsing the task of manually configuring a 6-node “dumbbell” network, a student with
significant Linux system administration experience took
3.25 hours to accomplish what Netbed accomplished in
less than 3 minutes. This factor of 70 improvement
and the subsequent programmatic control over links and
nodes encourage “what if” experiments that were previously too time- and labor-intensive even to consider.
Efficient use of scarce and expensive infrastructure is
also important and a sophisticated testbed system can
markedly improve utilization. For example, analysis of
12 months of Netbed’s historical logs gave quantitative
estimates of the value of time-sharing (i.e., swapping
out idle experiments) and space-sharing (i.e., isolating
multiple active experiments). Although the behavior of
both users and facility management would change without such features, the estimate is still revealing. Without
Netbed’s ability to time-share its 168 local Utah nodes,
a testbed of 1064 nodes would have been required to
provide equivalent service. Similarly, without spacesharing, 19.1 years, instead of one, would be required.
These are order-of-magnitude improvements.
Netbed virtualizes node names and IP addresses such
that equivalent nodes can be used interchangeably. For
example, when an experiment is swapped in, it need not
execute on the same set of physical nodes. Any nodes
exhibiting the same properties and interconnection characteristics are suitable candidates. The flexibility to allocate from an equivalence class provides fault tolerance.
If a node or link fails, an experimenter need not wait until the node or link partition is available again, but may
instead re-map the experiment to an equivalent set of machines. This feature is valuable wherever node or link
failures are anticipated, such as within large-scale clusters or wide-area networks.
5
Key Services and Evaluation
Much of ns’s popularity and power result from the flexibility it gives experimenters to efficiently change parameters and network scenarios. Netbed aims to bring
a similar level of control and ease of use to emulated and
450
All other steps
Waiting for nodes to boot
Issuing node reboots
Resource reservation
Resource mapping
400
Experiment creation with disk loading
Experiment creation without disk loading
600
350
500
Time (seconds)
Time (seconds)
300
250
200
150
400
300
200
100
100
50
0
0
0
10
20
30
40
50
60
70
80
0
10
20
30
40
50
60
70
80
Number of Nodes
Number of Nodes
Figure 3: Time to create an experiment without disk loading. Times
shown are cumulative, i.e., the difference between adjacent lines represents the time for that step.
Figure 4: Time to create an experiment with disk loading. Time without
disk loading from Figure 3 is also shown for comparison; note that the
y-axis scale is different here.
wide-area experimentation, through automation and efficiency. In this section we describe the main challenges to
Netbed’s efficiency, and evaluate how well Netbed meets
those performance challenges. These challenges include
experiment creation and swapping, disk loading, mapping of virtual resources to local and distributed physical
resources, and multiplexing simulated nodes.
taken by assign to map physical resources. The next
line is for reservation of those resources, which turns out
to be dominated by reassigning serial console lines and
logs. The next line is for issuing reboots to the nodes.
They are rebooted in parallel, with a ten second pause
every eight nodes so as not to over-stress network resources and lose too many control-related UDP packets,
typically manifested by nodes failing to boot.1 Finally,
in the slowest step, Netbed waits for all nodes to come
back up. The PC’s BIOS is the biggest culprit; average
time spent in the BIOS was 55 seconds for the nodes used
in this experiment. Netbed also has 40 nodes that spend
only 20 seconds in the BIOS, but in order to achieve consistency up to large scales, we limited these experiments
to the more numerous nodes.
Figure 4 shows the additional expense of automatic
disk loading, performed when an experimenter requests
a custom disk image. Since our default dual-boot
FreeBSD/Linux disk images prove sufficient for most experimenters, the majority of experiments do not incur
this cost. Though much of the added time comes from
transferring and writing the new disk image, a significant
amount comes from rebooting each node twice (once to
enter the disk loader, and again into the newly-loaded operating system). Although the absolute time for experiment creation is higher when loading disks, it is similarly
scalable; the marginal cost per node is comparable.
5.1 Experiment Creation and Swapping
This subsection quantifies the time spent in experiment
creation, which is comprised of parsing, global resource
allocation, and local self-configuration, as described in
Section 3. These results apply only to local resources;
since distributed nodes are typically shared resources,
Netbed does not routinely reboot them or re-install disk
images on experiment creation. As shown in Figures 3
and 4, disk loading and node rebooting dominate experiment creation time. Therefore, configuration of distributed nodes is lightweight and not examined here.
The top line in Figure 3 shows the total time to create typical experiments. The duration of experiment creation is essentially equal to the swap-in duration, since
the one-time expenses unique to experiment creation are
insignificant compared to the cost of mechanisms shared
by both, such as node rebooting. A single-node experiment takes 135 seconds. The majority of this time is
spent rebooting the node and waiting for it to finish booting. As experiment sizes grow, creation time remains
linear, with a marginal cost per node of approximately
3.4 seconds. Throughout the process, Netbed exploits
parallelism as much as possible. For example, although
it takes non-negligible time, VLAN setup does not contribute to creation time because it occurs in parallel with
the longer node reboot stage.
Figure 3 also breaks out the costs of the most timeconsuming stages of experiment creation, in the order
those steps occur. The bottom line represents the time
5.2 Mapping Local Resources
Netbed’s local assignment phase must not only realize
user-specified node types, features, link characteristics,
and topologies, but must also respect the limitations of
available bandwidth. That is, Netbed ensures that the
physical hardware will support the emulated traffic flows
1 The
PXE ROMs use UDP and a fixed timeout that we cannot
change; hence we are forced to work around the problem.
Nodes
D
C
E
B
A
D
80
70
E
B
F
60
C
Virtual toplogy
Switches
100Mbit links
F
Physcial Topology
Figure 5: A trivial six-node partitioning problem
without introducing any bottlenecks, with their accompanying experimental artifacts. To map the desired virtual
topology of Figure 5 onto the physical topology shown to
its right, Netbed should pick a physical realization which
groups A, B, and C together on one switch, and D, E,
and F on the other switch; any other configuration will
attempt to send excess traffic across the inter-switch link.
This testbed mapping problem problem is trivial in this
six-node example, but in the general case, is NP-hard
(by reduction to the multiway separator problem or the
minimum-degree graph partitioning problem [13]). In
conjunction with aggressive abstraction techniques to reduce the search space, assign uses simulated annealing [16], a randomized heuristic algorithm, to map virtual nodes and links to local nodes and VLANs. In addition to satisfying the individual experiment’s requirements, the algorithm also attempts to minimize the required inter-switch bandwidth and the number of involved switches, in order to promote efficient utilization
of the cluster.
Netbed has kept detailed logs of every experiment submitted since June 2001. We analyzed the following 12
months’ data, covering over 2000 experiments. Figure 6
shows that a reliable indicator of the difficulty of a mapping problem, as measured by the runtime of assign, is
the number of virtual nodes the user requests. We added
a general notion of resource equivalence classes to assign in December 2001; the strikingly bimodal distribution in the figure demonstrates the resulting improvements. Grouping nodes into equivalence classes greatly
reduces the search space since assign need only search
the small number of equivalence classes rather than the
large number of nodes. The new version takes less than
13 seconds on even the largest topologies and less than 5
seconds for most experiments.
5.3 Mapping Distributed Resources
The distributed case has different constraints. First, the
underlying physical nodes are treated as fully connected,
via the Internet. Second, distributed nodes are fairly well
characterized by the nature of their “last-mile” link, e.g.,
cable modem, commodity Internet, or Internet2. Therefore, Netbed assigns corresponding intuitive subtypes
to distributed nodes, e.g., pc-cable, pc-inet, pcinet2. This typing lets experimenters request virtual
Time (seconds)
A
90
50
40
30
20
10
old assign
new assign
0
0
10
20
30
40
50
60
70
80
90
Number of Virtual Nodes
Figure 6: Performance and Scaling of assign
nodes by their type or subtype, rather than specify a particular topology connecting them. Netbed’s generic resource assignment code, identical for both local and distributed resources, handles this common situation.
However, some experimenters may want more
precisely-matched resources or a particular virtual topology. Netbed allows them to request a virtual topology with wide-area links of specific latency, loss, and
bandwidth characteristics. They may assign weights to
each of the three attributes, based on their perceived
importance. Unlike the highly configurable local links
in a Netbed cluster, connections between distributed
nodes traverse the Internet through uncontrollable links.
Therefore, our challenge is to map virtual nodes to physical resources such that the requested links best match
the actual characteristics of the corresponding internode Internet paths. (Netbed’s database is updated frequently with the measured latency and loss on the N xN
paths, and occasionally updated with bandwidth measurements.)
This mapping is a variation of the NP-hard Quadratic
Assignment Problem. To provide an efficient, best-effort
solution, Netbed’s wanassign is implemented as a genetic algorithm [39]. Possible solutions are scored based
on how closely they match desired link characteristics.
For each solution, a normalized sum of errors-squared
is found for latency, loss rate, and bandwidth. A geometric mean of the three errors results in an overall
score. Wanassign evolves its answer by propagating
solutions with the least error.
We conducted two experiments to test wanassign’s
performance. The first mapped a wide variety of virtual
topologies onto a set of 16 physical, distributed nodes.
We varied the number of requested nodes from 4 to 16
and the number of requested links from 4 to 120, examining 48 pairs from this set to present a cross section
of experiment complexities. For each of these pairs, we
ran hundreds of tests on automatically-generated topolo-
6
4 nodes
6 nodes
8 nodes
10 nodes
12 nodes
14 nodes
16 nodes
5
Time (Seconds)
4
3
2
1
0
0
20
40
60
80
100
120
Number of Edges
Figure 7: Average time for wanassign to find a solution for a variety
of experimental topology complexities (node and edge counts).
gies. Figure 7 shows the average time to find a solution
for each complexity. Interestingly, mappings using all
16 nodes were found much faster than mappings using
most, but not all, of the nodes. The results show that for
modestly-sized experiments, the algorithm does not contribute noticeably to the total experiment setup time, nor
is it prohibitively slow for experiments involving most of
the available nodes.
The second experiment explored further scalability,
mapping a range of virtual topologies onto a synthetic
set of 256 distributed nodes. All experiments requesting 32 virtual nodes, as well as all sparse topologies,
mapped in a few minutes. For larger and denser topologies, up to 256 nodes and approximately 40 edges/node,
mapping time ranged from 10 minutes to 2 hours. We
expect to improve these results by an order of magnitude
using the following three techniques: less stringent and
more clever termination conditions; standard optimization techniques, in particular memoizing; and parallelizing the algorithm, which is practical in either a shared
memory multiprocessor or on a cluster [39]. Finally, we
expect major additional improvement to come from “binning” the nodes and links into groups with similar characteristics, dramatically reducing the search space.
5.4 Disk Reloading
An important feature of testbed control is the ability to
reload the contents of node local disks automatically.
This not only ensures node integrity, but also allows custom OS configurations. The two common approaches
for achieving this goal are to load complete disk images [14, 32, 35] or to work through the file system to
incrementally synchronize a target hierarchy with a reference copy (rsync [37], Unison [41]). There are five
reasons for preferring disk imaging. (1) While sometimes more efficient in terms of network bandwidth, on
our images, at least, the synchronization approach is
slower. rsync takes over 50% longer to compare file
timestamps on our typical image (80K inodes, 500MB
data) than Netbed’s disk loader takes to copy all the allocated blocks. Comparing hashes of file contents takes
much longer. (2) Approaches that rely solely on file
timestamps cannot be used for security reasons, as falsified timestamps allow modified files to corrupt the next
experiment. (3) Approaches working through the file
system cannot be used on corrupt target file systems, nor
(4) to install custom OS’s with unknown file systems. (5)
Bulk disk imaging is scalable through multicast-based
approaches. A third approach based on content hashes
of blocks, as in LBFS [23], may be worth investigating.
Policy: The policy for disk reloading presents a tension between the latency of typical experiment creation,
overall Netbed throughput, Netbed system complexity,
node robustness, and experiments’ security. Our policies
have evolved over time, driven by our tools, pressure on
resources, and experience.
Each node in a new experiment requires a clean
disk. However, disk reloading remains the most timeconsuming aspect of experiment creation and swap-in,
even though we have reduced it to less than 100 seconds.
Netbed’s current policy reloads each node’s disk with
the default image containing both FreeBSD and Linux.
This works well since most users request one of these
OSes, and if there are sufficient free nodes, the disks are
reloaded in the background and are immediately available for the next swap-in.
A troubling effect occurs, however, in the common
case of a single experimenter creating and tearing down
very similar experiments, in quick succession; this frequently also happens with the batch queue. The nodes
are not available for the few (typically wasted) minutes
while reloading, during which time the user requests a
similar number of nodes for their next experiment. To
avoid this anomaly we currently pace the reloading of
freed nodes, instead of reloading them all at once. For
security reasons, we allow an un-reloaded node to be assigned only to an experiment in the same project as the
node’s previous experiment. This approach, however,
has robustness vulnerabilities, since the disk’s soft state
will not be reinitialized, and may have been changed by
the previous experiment—though that is rare.
Users can also specify an alternate disk image or partition. In this case, the background disk reloading is
wasted, as the default image is overwritten by the user’s
custom one. Automated analysis of historical and ongoing experiment creation and swap patterns is one promising way to attack this challenge.
Process: The procedure for disk reloading follows
the initial steps described in Section 3.5: the PXE BIOS
loads the initial bootstrap which in turn loads a small,
memory file system-based FreeBSD system used to run
the disk loader client. This client contacts an instance of
the disk loader server, downloading, uncompressing and
writing out the disk image. After completion, the node
reboots from the newly installed image.
We currently provide a small set of images containing
various versions of Linux and FreeBSD; we will soon
add Windows XP. Custom disk images can be used to
boot an unsupported OS, to load a newer (or older) version of a supported OS, or to install a specialized version
of an existing image on multiple nodes.
The Netbed disk loader, termed “Frisbee” (the flying
disk) uses three main techniques to improve performance
from Netbed’s first loader, which took 29 minutes per
image. First, it carefully overlaps block decompression
and device I/O. Second, it uses a domain-specific compression algorithm that uses file system information to
identify which parts of the disk need to be saved; it compresses these portions with standard zlib-based compression. Third, it uses a custom reliable multicast protocol to deliver compressed images to clients, dramatically
reducing the required server bandwidth and improving
scalability. The result is that a standard FreeBSD image requires 88 seconds to load onto a single node. It
also scales well; 80 nodes can be loaded simultaneously
with an average of only 97 seconds per node, and with all
nodes completing in 117 seconds. Frisbee’s performance
also compares favorably to commercial tools; in our initial tests, it was able to load our standard Linux image on
a single node in 77% of the time taken by Norton Ghost.
The compression algorithm exploits the fact that many
disks contain large swap partitions and mostly-empty file
systems, and looks at partition types and file system freeblock lists to find these. For example, one of our standard
FreeBSD images for a 3GB partition is over 80% unused,
and reduces to 156MB using Frisbee image compression, versus 473MB using naive zlib compression. In
addition to saving network bandwidth when transferring
the file, the file system-specific compression enables the
Frisbee decompression program to optionally skip, rather
than zero, the free file system blocks when writing the
disk image. This turned out to be very important: once
we had done standard compression and implemented a
multicast mechanism, writing to the disk became the bottleneck. For the aforementioned FreeBSD disk image,
Frisbee wrote 550MB of actual decompressed data rather
than the full 3GB.
5.5 Scaling of Simulated Resources
Experiments can leverage simulation to multiplex simulated nodes onto a single physical node and to obtain
greater scalability. Since the simulator interacts with the
physical world through nse, it must keep pace with real
time. Its ability to do so is dependent on the rate of
events that need to be processed, rather than the num-
ber of nodes or links per se. Towards achieving greater
scale, we have made several improvements and contributed fixes to nse. We describe here a simple study
that achieves greater scale through simulation.
An instance of nse simulated 2Mb constant bit rate
UDP flows between pairs of nodes on 2Mb links with
50ms latencies. To measure nse’s ability to keep pace
with real time, and thus with live traffic, a similar
link was instantiated inside the same nse simulation, to
forward live TCP traffic between two physical Netbed
nodes, again at a rate of 2Mb. On an 850MHz PC, we
were able to scale the number of simulated flows up
to 150 simulated links and 300 simulated nodes, while
maintaining the full throughput of the live TCP connection. With additional simulated links, the throughput dropped precipitously. We also measured nse’s TCP
model on the simulated links: the performance dropped
after 80 simulated links due to a higher event rate from
the acknowledgment traffic in the return path.
More complex hybrid topologies exposed unanticipated routing behavior. Incorrect routing arises when
an nse simulation, running on a multihomed host, relies on its kernel’s routing tables. The solution required Netbed’s global system perspective; it computes
the overall routes, using Unix policy routing mechanisms
(ipfw and ipchains) to control the packet routes.
6
Validation and Testing
This section validates Netbed’s emulation capabilities
through micro- and macro-benchmarks. Since Netbed
is itself a complex and evolving distributed system, it
requires continual testing and validation. This section
therefore outlines a testing methodology intended to ensure Netbed’s continued accuracy.
6.1 WAN Emulator Validation
There are two concerns with using off-the-shelf PCs and
a general purpose operating system for emulation: first,
machines must be able to keep pace when emulated links
are operating at full speed; second, delays, bandwidths,
and packet loss rates should be emulated accurately.
Emulation nodes in Netbed run a FreeBSD 4.6 kernel
with Dummynet and polling device drivers. We run these
kernels with a clock frequency of 10000HZ to allow submillisecond delay granularity, while the polling drivers
reduce interrupt load and provide improved precision.
As a capacity test, we generated streams of UDP
round-trip traffic between two nodes, with and without
an interposed emulator node. The emulator node showed
no adverse effects on 1518-byte packets; either configuration easily saturated a 100Mb link. With 64-byte packets, the two nodes exchanged 55000 packets (3.5MB) per
second when connected directly versus 37000 packets
delay
(ms)
0
5
10
50
300
packet
size
64
1518
64
1518
64
1518
64
1518
64
1518
observed Dummynet
RTT
stdev
% err
0.177
0.003
N/A
1.225
0.004
N/A
10.183
0.041
1.83
11.187
0.008 11.87
20.190
0.063
0.95
21.185
0.008
5.92
100.185 0.086
0.18
101.169 0.013
1.16
600.126 0.133
0.02
600.953 0.014
0.15
adjusted Dummynet
RTT
% err
N/A
N/A
N/A
N/A
10.006
0.06
9.962
0.38
20.013
0.06
19.960
0.20
100.008
0.00
99.943
0.05
599.949
0.0
599.728
0.04
observed nse
RTT
stdev
0.238
0.004
1.554
0.025
10.251
0.295
11.586
0.067
20.255
0.014
21.675
0.093
100.474 0.029
102.394 3.440
601.690 0.546
602.999 0.093
% err
N/A
N/A
2.51
15.86
1.28
8.38
0.47
2.39
0.28
0.49
adjusted nse
RTT
% err
N/A
N/A
N/A
N/A
10.013
0.13
10.032
0.32
20.017
0.09
20.121
0.61
100.236
0.24
100.840
0.84
601.452
0.24
601.445
0.24
Table 1: Accuracy of Dummynet and nse delay at maximum packet rate as a function of packet size and link delay. The 0ms measurement represents
the base overhead of the link. Adjusted RTT is the observed value minus the base overhead.
bandwidth
(Kbps)
56
384
1544
10000
45000
packet
size
64
1518
64
1518
64
1518
64
1518
1518
observed Dummynet
bw (Kbps)
% err
56.06
0.11
56.67
1.89
384.2
0.05
385.2
0.34
1544.7
0.04
1545.8
0.11
10004
0.04
10005
0.05
45019
0.04
observed nse
bw (Kbps)
% err
55.60
0.71
56.63
1.12
376.3
2.00
382.1
0.49
1444.5
6.44
1531.0
0.84
N/A
N/A
9659.6
3.40
39857
11.43
packet loss
rate (%)
packet
size
0.8
64
1518
64
1518
64
1518
2.5
12
observed Dummynet
loss rate
% err
(%)
0.802
0.2
0.803
0.3
2.51
0.4
2.47
1.1
12.05
0.4
12.09
0.7
observed nse
loss rate
% err
(%)
0.819
2.37
0.820
2.50
2.477
0.92
2.477
0.92
11.88
1.00
11.89
0.91
Table 3: Accuracy of Dummynet and nse packet loss rate as a function
of link loss rate and packet size.
Table 2: Accuracy of Dummynet and nse bandwidth as a function of
link bandwidth and packet size.
rized in Table 2.
(2.4MB) when joined by an emulator node. Since these
are round trip measurements, the packet rates are actually
twice the numbers reported.
To bound the accuracy and precision of emulation
nodes, we performed a series of experiments using a representative range of delay, bandwidth, and packet loss
rate values coupled with high packet rates for both large
and small packets.
After establishing maximum emulation rates for large
and small packets, we ran a series of tests using those
packet rates with various delay, bandwidth, and loss rate
values, measuring both accuracy and precision. The delay results are presented in Table 1. The 0ms rows represent the base overhead associated with interposition of
an emulation node. These results seem to indicate, and
further experimentation confirmed, that emulation node
overhead is proportional to the packet size. As indicated
in the “observed” column, small packets show noticeable
error with delays less than 10ms and large packets suffer
with delays less than 50ms. While both are tolerable for
wide-area emulation, we can improve accuracy by adjusting delays to compensate for emulation overhead. As
a first approximation, we scaled delays by the base overhead shown in the 0ms case. The adjusted results, shown
in the “adjusted” column, are both accurate and precise.
To measure the bandwidth limiting capabilities of an
emulation node, we used one-way traffic. A sender node
sent packets through an emulation node to a consumer
node, which calculated bandwidth. Results are summa-
Finally, using the same setup, we instead measured
packet loss rates as observed by the consumer. Results
are summarized in Table 3.
6.2 nse Validation
This section uses the methodology of Section 6.1 to validate the observed latencies, bandwidths, and loss rates
induced by ns’s emulation facility, nse, against their expected values. nse runs on a FreeBSD 4.5 kernel at
1000HZ. The simulation is configured with two nodes
and a duplex link connecting them. The physical node
running nse interposes two other traffic-generating physical nodes. This setup mimics Section 6.1, differing
only in packet rate. A maximum stable packet rate of
4000 packets per second was determined over a range of
packet rates and link delays using 64-byte and 1518-byte
packets. Note that the actual capacity is twice this value
due to the duplex link. With this capacity, we performed
experiments to measure the delay, bandwidth and loss
rates for representative values. The results are summarized in Tables 1, 2 and 3. Netbed’s integration of nse is
much less mature than its support for Dummynet. This
is reflected in the larger relative error rates of nse bandwidth and loss rates with respect to Dummynet. Integrating nse has already uncovered a number of problems that
have since been solved; as we continue to gain experience with nse, we expect the situation to improve.
Fast
Slow
tics
29
21
Live Internet
stddev
retransmits
0.00
1.10
0.73
1.70
tics
28
21
Emulated
stddev retransmits
0.67
1.10
0.52
2.80
Table 4: Median “tic” rates and packet retransmission counts achieved
by DOOM clients, both on live Internet and emulated links. Numbers
are repeated both for nodes with uniformly fast links and with some
intermixed slower links.
6.3 Validation Against a Wide-Area Network
This section validates Netbed’s emulation mechanisms
against a wide-area network: it compares two macrobenchmarks run on a set of live Internet nodes and then
within a corresponding emulation. The first example
also demonstrates the transparency of Netbed’s heterogeneous resource specification and its ability to provide
a best-fit mapping between requested wide-area links and
live Internet links.
Distributed Multiplayer Game:
This benchmark
evaluates a derivative of DOOM on four network configurations, making at least four repeated runs on each. In
these scenarios, five synthetic clients communicate using
a simple protocol. At a target rate of 30 times per second, each client sends unicast packets to all other clients,
doing so only after receiving all packets from the prior
period. We specified the desired latency and bandwidth
of the ten links comprising a fully-connected graph between the five clients.
The first configuration specified a node type of
pcvremote to obtain wide-area “virtual” nodes. In this
sense, virtual means the nodes may be multiplexed onto
a single physical distributed node. Netbed’s distributed
mapping service, the genetic algorithm described in Section 5.3, found the best-matching fit from among the distributed nodes with available virtual node “slots.”
The second configuration used the same link specification, but instead of mapping to the live Internet, requested emulation on local nodes and links. Making that
switch to an entirely different experimental environment
required changing only one line within a Tcl loop that set
the node type. The third and fourth configurations were
analogous to the first two configurations, but requested a
few substantially slower links.
The results were similar between emulation and the
live Internet, as presented in Table 4. The two key metrics in DOOM are “tic rate” and packet retransmission.
Tic rate in this example is affected primarily by latency,
and represents the rate at which progress is made in
the system—a higher tic rate indicates faster progress.
Packet retransmission rates are governed by bandwidth
and packet loss rate; there are typically only a handful of
retransmitted packets per trial.
Wide-Area Database Replication:
Researchers at
Johns Hopkins University are studying group communication mechanisms for wide-area replication of
databases. In the course of their research, they compared results from the CAIRN wide-area network [7]
to those obtained emulating the observed CAIRN delay and bandwidth characteristics with Netbed. Their
application-level measurements of communication characteristics matched well [3]. Netbed offered two advantages over CAIRN: First, with Netbed’s control, they
were able to study the system-wide effects caused by
varying network characteristics. Second, they were able
to obtain a set of nodes of a consistent type.
6.4 Testing
Netbed presents unusual testing challenges: First, it is inherently coupled to physical artifacts which, unlike software state, can not be cloned. This makes full test and
regression runs impossible. Second, its mission is to provide a public evaluation platform for arbitrary programs.
This mission simultaneously puts a premium on accuracy and precision, while presenting a fundamentally unknowable workload. Combined, these two reasons also
mean that Netbed must run continuously, even as its software radically evolves.
We have countered with the following procedures.
First, we have created a separate 8-node Netbed,
Minibed. As an independent Netbed instance, Minibed
is also important to our future work on federation.
Second, we have integrated support for testing
throughout the Netbed software suite. In addition to the
normal operating mode, all of our software supports a
“test mode” in which any operations that normally affect
hardware are prevented. It allows us to make duplicate
installations of Netbed databases and software, including web interfaces and daemons, and to run tests of the
software without requiring exclusive access to hardware.
We also have incorporated a “full-test mode,” in which
we can reserve hardware in the master Netbed database
and use that hardware in conjunction with the duplicate
database and software. This enables the test environment to affect this hardware, which is ignored by the
“main” Netbed system. This feature is made possible
by database-driven, node-specific redirection to alternate
daemons and databases.
Third, we have developed a comprehensive regression
test suite that is run nightly and optionally at compile
time. However, we currently only systematically test
for software bugs. To monitor Netbed accuracy, we are
adding additional point tests as well as end-to-end tests.
7
New Experimental Techniques
This section showcases the novel experimental opportunities made possible by Netbed. The first case study capitalizes on Netbed’s ns compatibility to automate comparison of emulated and simulated results. Other systems
have leveraged a similar synergy between simulation and
live experimentation [6], but required adoption of a nonstandard programming interface. The second case study
shows the importance of automation.
7.1 TCP Dynamics
Network simulators, such as ns, have proven invaluable
in studying TCP behavioral dynamics [11]. Nevertheless, with its abstractions such as one-way protocols with
simplified window and ACK behavior, simulation should
be validated empirically. Ironically, the potential for
bugs and unspecified design parameters mean that real
implementations do not necessarily define valid behavior, either. Fortunately, the notion of “deviant behavior” [9] allows an experimenter simultaneously to gain
confidence in the validity of simulation and the correctness of implementation. This case study leverages existing simulation experiments to drive emulated scenarios.
This approach makes an existing corpus of test scenarios amenable to live experimentation. Thus, corner cases
with known results can be applied as regression tests to
real network stacks to evaluate their conformance.
The ns maintainers run nightly regression tests [24].
Netbed’s ability to parse ns scripts means these scripts
can instead be used to validate ns behavior against emulation. Further, the tests may drive regression testing
of a kernel implementation or a comparison across several implementations. This section presents preliminary
results that show the feasibility of automating this process. The study of low-level, fine-grained TCP dynamics
shows Netbed’s flexibility in modulating a virtual network at various scales.
Our framework executes a test script within ns and
parses output trace files to determine where to generate traffic, which packets are dropped, and which links
suffer losses. It then configures a network topology
via Netbed’s event system and passes a list of target
drop packets to the correct Dummynet node (we have
extended Dummynet to drop packets by ordinal packet
number). Again via the event system, the framework
starts a program object to record packet traces and finally
invokes the traffic generators.
Figure 8 shows a simple test from the ns validation
suite that drops a single packet in a TCP New Reno
stream. The ns and FreeBSD 4.5 senders detect a Triple
Duplicate ACK and perform a Fast Retransmit immediately. They behave similarly; over 10 experiments
FreeBSD 4.5 achieves a mean throughput of 50232Bps
(standard deviation 4.09) and ns achieved 48090Bps.
By contrast, we discovered that FreeBSD 4.3 does not
retransmit until triggered by a timer expiration, which
greatly degrades throughput. The behavior in FreeBSD
4.3 is caused by an uninitialized variable. A thorough
application of the full suite of TCP tests may well uncover additional subtle bugs that would be exceedingly
difficult to detect and reproduce without Netbed’s finegrained control.
7.2 The Armada I/O Framework
Simulation allows an experimenter to effortlessly explore
a large parameter space. Using Netbed’s programmatic
ns interface to loop over a configuration space and exercising its distributed event system to affect link characteristics, an experimenter has similar power over emulation.
Oldfield and Kotz [30] used these techniques in evaluating Armada [29], a file system for computational grids.
Armada’s performance is highly dependent on link bandwidth, latency, and packet loss rate. The authors used
Netbed’s batch system to evaluate every possible combination of 7 bandwidths, 5 latencies, and 3 application
parameter settings on four different configurations on a
set of 20 nodes, performing a total of 420 different tests
in 30 hours, averaging 4.3 minutes each.
8
Related Efforts
Network Emulation: ModelNet [42] is a new network emulation system focused on scalability. It uses
a small gigabit cluster, running a much extended version of Dummynet, which is able accurately to emulate
an impressively large number of moderate speed links.
This core routes packets between applications running
on additional “edge nodes.” Applications can be multiplexed on edge nodes, without resource isolation. ModelNet shares some of Netbed’s automatic configuration
of physical resources by including tools to take a target
topology specified in a high-level format and map it into
ModelNet mechanisms; it provides the added capability
of optionally distilling the topology to trade accuracy for
scalability.
ModelNet emphasizes scalability through a highperformance implementation of emulated links. This
contrasts with our emphasis on complete accuracy
through conservative resource allocation, exposure of all
resources (including link emulation mechanisms) to manipulation by experimenters, and integration of disparate
techniques into a common framework.
ModelNet’s core contributions are complementary to
Netbed’s; indeed, we intend to work together to integrate
ModelNet into Netbed. This combination should bring
Netbed’s rich user interface and ease of use to ModelNet, while adding a scalable new mechanism to those
available through Netbed’s common abstractions.
180000
160000
160000
140000
120000
100000
80000
60000
40000
20000
120000
Sequence Number (Bytes)
200000
180000
Sequence Number (Bytes)
Sequence Number (Bytes)
200000
140000
120000
100000
80000
60000
40000
100000
80000
60000
40000
20000
20000
0
0
0
0
0.5
1
1.5
2
2.5
Time (Seconds)
3
3.5
4
0
0.5
1
1.5
2
2.5
Time (Seconds)
3
Figure 8: New Reno One Drop Test: (a) ns (b) FreeBSD 4.5
Yet another link emulation technique is trace modulation [27], which recreates observed end-to-end characteristics of a wireless network. Interposing trace modulation instead of Dummynet would bring wireless emulation to Netbed.
There have been a large number of single-node network emulation efforts. These include hitbox [1],
ONE [2], NIST Net [26], and Rice’s support for evaluating their OS optimizations [31]. Another category is represented by the “Orchestra” fault-injection system [8].
With a few exceptions, these single node emulators were
tailored for a specific research application. A few multinode network emulators have been planned or built, but
only for specific projects. One of the earliest and largest
was a particular configuration of 12 workstations at USC
in 1994, used to study TCP Vegas [1]. They cite an emulator effort at Bell Labs [19], which apparently started to
build a more general emulator.
Distributed Network Testbeds: The “Access” vision [5] originated the idea of a set of small testbeds,
distributed over dozens of sites. The Access vision overlapped with Netbed in our shared emphasis on completely replaceable node software and our operational
model of a Web-accessible master control host. However, Access did not intend to provide an emulation facility nor did it intend to offer integration. They did recognize a need we identified only later, for real wide-area
links for some experimenters.
PlanetLab [33] is a new effort that plans to provide
to researchers a large number (1000) of centrally administered, geographically distributed PCs, along with
a modest number of clusters. This testbed, currently in
its initial phase, would be used for arbitrary research,
yet provide a transition avenue to production deployment
of overlay network services. Unlike Netbed, PlanetLab
plans to emphasize the design of APIs and services that
can be shared by higher-level services.
Netbed’s distributed node support is similar to what
is planned for PlanetLab’s next phase. Although with
a different primary goal, PlanetLab’s notion of a “service” across a “slice” of PlanetLab nodes is similar to
Netbed’s “experiment,” since Netbed experiments can be
3.5
4
0
0.5
1
1.5
2
2.5
Time (Seconds)
3
3.5
4
(c) FreeBSD 4.3 (different y-axis scale)
of arbitrary duration. An experiment is richer in that
it contains flexible notions of topology, swapping, hard
state, soft state, and optional shared persistent storage.
Like Netbed, PlanetLab’s current testbed management
is centralized. Their future plans emphasize unbundled
management in order to facilitate research into management; our plans emphasize federation, in order to achieve
greater scalability and another route to overlay service
deployment. In fact, we are jointly exploring providing
access to PlanetLab through Netbed’s interface.
Network Simulators: Network simulators successfully isolate protocol dynamics but may do so at the
expense of accuracy. Therefore, results from simulators may not be valid indicators of deployed performance [11]. Brakmo and Peterson [6] highlight differences between simulated and implemented TCP protocols. Their x-kernel-based simulator avoids inaccuracies
by using actual protocol code, as does recent work integrating Click elements into ns [25]. However, both systems rely on non-standard protocol implementations.
Cluster Management: Through its virtualization
of cluster hardware and software, “Emulab Classic”—
Netbed’s cluster-based emulation portion that has been
in public production use since October 2000—is relevant far beyond network experimentation. In its flexible and efficient allocation of all hardware and software
resources (except shared persistent storage) and ability
to isolate virtual sub-clusters, Emulab overlaps many or
most of the low level facilities in “computing utility” efforts such as IBM’s Océano [28], HP’s Utility Data Centers, and Duke’s Cluster-on-Demand [22]. Netbed has
the flexible interfaces and all the needed mechanisms—
including dynamically adding or removing nodes in an
experiment—to support reconfiguration by Service Level
Agreements or by sub-cluster management systems.
9
Conclusion
Acting as a virtual machine for network experimentation, Netbed virtualizes and integrates simulated, emulated, and distributed nodes and links. Through a rich
user interface, efficiency, and automation, Netbed enables qualitatively new kinds of experimentation across
these mechanisms.
[19] A. M. Lapone, N. F. Maxemchuk, and H. Schulzrinne. The Bell
Laboratories Network Emulator. Technical Report BL0113820930913-64TM, AT&T Bell Labs, Sept. 1993.
References
[20] D. Mazières, M. Kaminsky, M. F. Kaashoek, and E. Witchel. Separating key management from file system security. In Proc. of
SOSP ’99, December 1999.
[1] J. S. Ahn et al. Evaluation of TCP Vegas: Emulation and Experiment. In Proc. of SIGCOMM ’95, pages 185–195, Aug. 1995.
[2] M. Allman, A. Caldwell, and S. Ostermann. ONE: The Ohio
Network Emulator. Technical Report TR–19972, Ohio University Computer Science, Aug. 1997.
[3] Y. Amir, C. Danilov, M. Miskin-Amir, J. Stanton, and C. Tutu.
Practical Wide-Area Database Replication. Technical report,
Johns Hopkins University, 2002.
[4] D. Andersen, H. Balakrishnan, F. Kaashoek, and R. Morris. Resilient Overlay Networks. In Proc. 18th SOSP, Oct. 2001.
[5] T. Anderson. A Case for Access: A High Performance Communication and Computation Environment for Wide Area Distributed
Systems, Networking, and Applications Research.
http://www.cs.washington.edu/homes/tom/access/.
[6] L. S. Brakmo and L. L. Peterson. Experiences with Network
Simulation. In Proc. of ACM SIGMETRICS’96, May 1996.
[7] CAIRN: Collaborative Advanced Internet Research Network.
http://www.isi.edu/CAIRN/.
[8] S. Dawson et al. Testing of Fault-Tolerant and Real-Time Distributed Systems via Protocol Fault Injection. In Proc. FTCS ’96.
[9] D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf. Bugs
as Deviant Behavior: A General Approach to Inferring Errors in
Systems Code. In Proc. 18th SOSP, Oct. 2001.
[21] P. E. McKenney, D. Y. Lee, and B. A. Denny. Traffic Generator
Software Release Notes. SRI International and USC/ISI Postel
Center for Experimental Networking. http://www.postel.org/tg/.
[22] J. Moore and J. Chase. Cluster On Demand. Technical Report
CS-2002-07, Duke University, Dept. of Computer Science, May
2002.
[23] A. Muthitacharoen, B. Chen, and D. Mazières. A Low-bandwidth
Network File System. In Proc. 18th SOSP, Oct. 2001.
[24] The Network Simulator ns-2: Validation Tests. http://www.isi.edu/nsnam/ns/ns-tests.html.
[25] M. Neufeld, A. Jain, and D. Grunwald. Nsclick: Bridging Network Simulation and Deployment. In Proc. MSWiM 2002.
[26] NIST Internetworking Technology Group. NIST Net home page.
http://www.antd.nist.gov/itg/nistnet/.
[27] B. D. Noble et al. Trace-Based Mobile Network Emulation. In
Proc. of SIGCOMM ’97, Sept. 1997.
[28] Océano Project. http://www.research.ibm.com/oceanoproject/.
[29] R. Oldfield and D. Kotz. Armada: A parallel file system for computational grids. In Proc. of IEEE/ACM International Symposium
on Cluster Computing and the Grid, May 2001.
[10] K. Fall. Network Emulation in the Vint/NS Simulator. In Proc.
IEEE ISCC ’99, 1999.
[30] R. Oldfield and D. Kotz. Using the Emulab network testbed
to evaluate the Armada I/O framework for computational grids.
Technical report, Dartmouth, May 2002. ftp://ftp.cs.dartmouth.edu/pub/raoldfi/armada/oldfield:armada-emulab-tr.pdf.
[11] S. Floyd and V. Paxson. Difficulties in simulating the Internet.
IEEE/ACM Transactions on Networking, 9(4), August 2001.
[31] V. S. Pai, P. Druschel, and W. Zwaenepoel. IO-Lite: A Unified
I/O Buffering and Caching System. In Proc. 3rd OSDI, Feb. 1999.
[12] B. Ford, G. Back, G. Benson, J. Lepreau, A. Lin, and O. Shivers.
The Flux OSKit: A Substrate for OS and Language Research. In
Proc. 16th SOSP, pages 38–51, Oct. 1997.
[32] Partition Image. http://www.partimage.org/.
[13] M. R. Garey and D. S. Johnson. Computers and Intractability: A
guide to the theory of NP-completeness. W. H. Freeman, 1979.
[14] Symantec Ghost. http://www.symantec.com/sabu/ghost/.
[15] J. Heidemann et al. Effects of Detail in Wireless Network Simulation. http://www.isi.edu/˜johnh/PAPERS/Heidemann00d.html.
[16] L. Ingber. Very Fast Simulated Re-Annealing. Journal of Mathematical Computer Modelling, 12:967–973, 1989. http://www.ingber.com/asa89 vfsr.ps.gz.
[17] IXP1200.
http://www.intel.com/design/network/products/npfamily/ixp1200.htm.
[18] P.-H. Kamp and R. N. M. Watson. Jails: Confining the omnipotent root. In Proc. 2nd Intl. SANE Conference, May 2000.
Acknowledgments: Many thanks to Chris Alfeld, Dave Andersen,
and Kirk Webb for discussion, design, and code, to Dave for MIT’s
RON testbed nodes, to Russ Christensen, Alastair Reid, Tim Stack, and
Parveen Patel for running experiments, to Eric Eide for editing, to exFluxers who helped build the Emulab cluster, to John Regehr, Robert
Morris, and the anonymous reviewers for comments, to Jim Griffioen
for bravely being the first to bring up another cluster, to Ron Oldfield
for the Armada results, to Nicolas Christin for suggesting validating
TCP dynamics, to Mark Tinguely for clarifications on FreeBSD TCP,
and to our many users. Finally, we are grateful to our many sponsors,
especially NSF under grants ANI-0082493 and ANI-0205702, Cisco
Systems, and DARPA/Air Force under grant F30602-99-1-0503.
[33] L. Peterson, T. Anderson, D. Culler, and T. Roscoe. A Blueprint
for Introducing Disruptive Technology into the Internet. In Proc.
HotNets-I, Princeton, NJ, Oct. 2002.
[34] PXE Preboot Execution Environment Specification Version 2.1.
ftp://download.intel.com/ial/wfm/pxespec.pdf.
[35] Rembo Technology. BpBatch. http://www.bpbatch.org/.
[36] L. Rizzo. Dummynet and Forward Error Correction. In Proc. of
the 1998 USENIX Annual Technical Conf., June 1998.
[37] rsync. http://rsync.samba.org/.
[38] B. Segall, D. Arnold, J. Boot, M. Henderson, and T. Phelps. Content Based Routing with Elvin4. In Proc. AUUG ’00, June 2000.
[39] R. Tanese. The Distributed Genetic Algorithm. In Proc. ICGA
’89. Morgan Kaufmann, 1989.
[40] The VINT Project. The ns Manual, Apr. 2002. http://www.isi.edu/nsnam/ns/ns-documentation.html.
[41] Unison. http://www.cis.upenn.edu/˜bcpierce/unison/.
[42] A. Vahdat, K. Yocum, K. Walsh, P. Mahadevan, D. Kostic,
J. Chase, and D. Becker. Scalability and Accuracy in a LargeScale Network Emulator. In Proc. 5th OSDI, Dec. 2002.
CSCI 606
Assignment # 3
Research
Read the research paper attached about:” An Integrated Experimental Environment for
Distributed Systems and Networks”
Address the two major questions:
1. How is a global resource allocation can be implemented and used efficiently within a
network for a distributed system?
2. How is validation used within a network experimentation and why this phase is
considered very important?
Write your answers using APA style format, give examples to support your answers (Network
Emulation, distributed network Testbeds, cluster management….)
Purchase answer to see full
attachment