Read and give a summary of the papers

read the papers below:

When reading, ask yourself these questions:
* What is the paper about?
* Why are they doing it (the experiment or research)?
* How are they doing it?
* What is the result?
* What future work is proposed?

You may compare the papers with each other and use pictures from the papers if necessary.
Write about half a page to one page per paper, in good academic English, and include references.

Unformatted Attachment Preview

Network Research Workshop, Proceedings of the Asia-Pacific Advanced Network 2014, v. 38, p. 19-22. ISSN 2227-3026

A Flow-based Method to Measure Traffic Statistics in Software Defined Network

Chun-Yu Hsu 1,*, Pang-Wei Tsai 1, Hou-Yi Chou 1, Mon-Yen Luo 2 and Chu-Sing Yang 1
1 Institute of Computer and Communication Engineering, Department of Electrical Engineering, National Cheng Kung University, No. 1, University Rd., East Dist., Tainan City, Taiwan
2 Department of Computer Science & Information Engineering, National Kaohsiung University of Applied Sciences, No. 415, Chien Kung Rd., Sanmin District, Kaohsiung City, Taiwan
* Corresponding author: 92633R, EE Department Building, Tze-Chiang Campus, National Cheng Kung University, No. 1, University Rd., East Dist., Tainan City, Taiwan 701, Taiwan; Tel.: +886-6-2757575-62357; Fax: +886-6-234-5482

Abstract: Since Software Defined Networking (SDN) emerged as a revolution in networking, many new developments and deployments have been put forward. A device operating in an SDN environment needs a controller to store its control policy. Because the flow is one of the basic units in which traffic is represented in SDN, this paper proposes a prototype of a flow-based method to measure traffic statistics in an OpenFlow network. By analyzing the flow table, the active flows managed by the controller can be listed, and the information of each flow can be presented with a module developed on the controller. After gathering port statistics from the OpenFlow switches and the entries in the flow table, the information of each flow is presented on a graphical interface. With this method, the traffic statistics of each flow become more adaptable and intelligible for measurement and observation.

Keywords: flow-based; OpenFlow; traffic statistics.

1. Introduction

Since Software Defined Networking (SDN) emerged as a revolution in networking, many new developments and deployments have been put forward.
The most significant difference between traditional networks and software defined networks is the separation of the control and data planes. This makes a significant difference to network operation issues such as network management and traffic statistics. Hence, this paper proposes a prototype of a flow-based method to measure traffic statistics in an OpenFlow network environment. Using the information collected by the controller, the module developed in this paper makes it more adaptable and intelligible to monitor flow processing in an SDN environment.

2. Related Work

2.1. OpenFlow and OpenFlow Controller

In the past, networks were hard to innovate on and manage because each device had its own control logic and vendor dependencies. OpenFlow [1, 2] was proposed to break this limitation. The core idea of OpenFlow is to decouple the data plane from the control plane. OpenFlow-enabled devices are controlled by controllers through a secure channel; the controllers instruct the devices to forward packets according to instructions, or flow entries. Users can deploy new ideas on an OpenFlow network with a central control logic. We chose the POX [3] controller, which originates from NOX [4], for our deployment.

2.2. Mininet

Mininet [5] is an open source network emulator for prototyping software defined networks. Written in Python, Mininet is simple to use and very flexible. Its lightweight approach, based on OS-level virtualization, allows it to create components cheaply. Its switches are built upon Open vSwitch [6] and act as OpenFlow devices.

3. Design and Development

3.1. Flow Diagnostic Module

By the design of OpenFlow, the forwarding rules (flow entries) of all controlled switches are kept in data structures on their controller, usually in a flow table. The Flow Diagnostic Module analyzes these forwarding rules, so all flows on the entire network controlled by this controller can be listed.
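Conceptually, a flow entry couples match fields with traffic counters. A minimal sketch of such a structure (the field names are illustrative, not the OpenFlow wire format or the authors' code):

```python
from dataclasses import dataclass, field

@dataclass
class FlowEntry:
    """Illustrative flow entry: match fields plus traffic counters."""
    in_port: int
    eth_src: str
    eth_dst: str
    actions: list = field(default_factory=list)  # e.g. ["output:2"]
    packet_count: int = 0
    byte_count: int = 0

# A toy flow table for one switch: simply a list of entries.
flow_table = [
    FlowEntry(in_port=1, eth_src="00:00:00:00:00:01",
              eth_dst="00:00:00:00:00:02", actions=["output:2"]),
]

# Listing all flows managed by a controller means walking
# the flow tables of every connected switch.
for entry in flow_table:
    print(entry.in_port, entry.eth_dst, entry.actions)
```

Listing the entries of every connected switch in this way is what allows the module to enumerate all active flows in the network.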
To distinguish flows coming from different switches, our method attaches a distinctive ID number to each flow. To implement this flow-based measurement method, some modifications were made to the POX controller so that it handles flows from different switches correctly.

3.2. Data Collection Module

Each OpenFlow device keeps several counters that record its packet processing. In our development, we added a data collection function to the controller that queries these counters periodically. Accumulated data such as the packet count, byte count and dropped-packet count over a period of time are stored in a data structure; the polling interval is set by the administrator. The module itemizes data in two forms: per-port and per-flow statistics. Per-port collection is the basic way of gathering traffic statistics, and classic network monitoring tools such as MRTG [7] can easily display the port utilization of each device. However, this may not be applicable to all OpenFlow devices, so the Data Collection Module collects data from the switches through the controller periodically and sends them on for calculation. In addition, using the flow information provided by the Flow Diagnostic Module, a data structure for per-flow statistics is prepared to hold the calculated statistics.

3.3. Statistic Integration Module

The Statistic Integration Module aggregates traffic statistics over ports, flows and other index items; the collected data are sent here and refreshed iteratively. For example, the port status can be presented as received and transmitted bit rates, received and transmitted packet rates, average packet size, dropped-packet count, and so on. For the per-flow status, the packet rate and byte rate of each flow are the crucial statistics in flow-based mode.
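The periodic collection and rate calculation described above can be sketched as follows. This is a self-contained simulation, not the authors' POX code; the counter values and helper names are invented for illustration:

```python
# Sketch of periodic per-flow statistics collection (simulated, not POX code).
# Each poll reads cumulative counters and derives rates from the deltas.

INTERVAL = 5.0  # seconds between polls, as set by the administrator

def poll_counters(tick):
    """Stand-in for querying a switch: cumulative counters per flow ID."""
    return {
        1: {"packets": 100 * tick, "bytes": 150_000 * tick},
        2: {"packets": 40 * tick, "bytes": 60_000 * tick},
    }

previous = poll_counters(0)
stats = {}
for tick in (1, 2, 3):
    current = poll_counters(tick)
    for flow_id, counters in current.items():
        prev = previous[flow_id]
        stats[flow_id] = {
            "packet_rate": (counters["packets"] - prev["packets"]) / INTERVAL,
            "byte_rate": (counters["bytes"] - prev["bytes"]) / INTERVAL,
        }
    previous = current

print(stats)  # per-flow packet and byte rates from the last interval
```

A real implementation would replace `poll_counters` with OpenFlow statistics requests sent by the controller at the administrator-chosen interval.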
4. Experiment and Verification

This section shows the experimental verification of the proposed method. The experiments were configured in Mininet 2.0.0; the controller was configured with OpenFlow 1.1.0 on Linux kernel 2.6.32. We used Mininet to create the simulation environment. In the experiment, two of the hosts send heavy packet flows to another host. Our modules collect the data, analyze the traffic load, and present the result as a weathermap, as shown in Figure 1. The green and orange nodes represent hosts and switches respectively, and the number beside each switch is its switch ID. Port and flow statistics are attached beside the weathermap. Taking Figure 1 as an example, the detailed data for a switch are listed on the web interface, including port statistics such as received packet count, transmitted packet count, dropped packet count and bit rate. In addition, a flow statistics index is provided that identifies the header, actions and traffic statistics of each flow, also presented on the web interface.

Figure 1. Weathermap and information on the web interface.

5. Conclusions

This paper proposed a flow-based method to measure traffic statistics in a software defined network environment. Using information collected in the POX controller, the design provides statistics related to each flow. With this prototype method, the traffic statistics of each flow become more adaptable and intelligible for measurement and observation.

References

1. Nick McKeown; Tom Anderson; Hari Balakrishnan; Guru Parulkar; Larry Peterson; Jennifer Rexford; Scott Shenker; Jonathan Turner. OpenFlow: Enabling Innovation in Campus Networks. 2008.
2. OpenFlow Switch Specification Version 1.1.0 Implemented (Wire Protocol 0x02).
3. POX controller.
4. N. Gude; T. Koponen; J. Pettit; B. Pfaff; Martin Casado; Nick McKeown; Scott Shenker. NOX: towards an operating system for networks. SIGCOMM Comput. Commun. Rev., vol. 38, 2008; pp. 105-110.
5. Mininet.
6. Open vSwitch.
7. MRTG.

© 2014 by the authors; licensee Asia-Pacific Advanced Network. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license.


Procedia Computer Science 110 (2017) 516-523. doi:10.1016/j.procs.2017.06.138
International Workshop on Applications of Software-Defined Networking in Cloud Computing (SDNCC)

SOFTmon - Traffic Monitoring for SDN

Marc Hartung a, Marc Körner b
a FernUniversität in Hagen, Universitätsstraße 11, 58084 Hagen, Germany
b International Computer Science Institute, 1947 Center Street, Suite 600, Berkeley, CA 94704-1198, USA

Abstract

Software Defined Networking (SDN) substrates are a basic enabler for network virtualization. They provide many opportunities but also require new solutions for well-known legacy mechanisms. Thus, in this paper we present an innovative network monitoring tool which is able to operate with the usual available OpenFlow controllers. The presented tool extends the controller's monitoring capabilities by providing utilization charts and statistics down to the flow level. In order to present the feature set, the tool's architecture and implementation are introduced. Further, an evaluation on a virtualized Mininet network using Open vSwitch is presented, as well as an evaluation on our SDN research cluster with a typical data center fat-tree topology composed of NEC IP 8800 switches.

© 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the Conference Program Chairs.

Keywords: Software Defined Networking; OpenFlow; Monitoring; Testbed

1. Introduction

Since the invention of computer networks, their monitoring has always played a central role in performance management. It is necessary to identify important status parameters in order to determine the network's health. Network monitoring basically means frequently measuring and observing several network-related parameters such as the utilized bandwidth or the latency between nodes. Moreover, parameters based on network nodes, like link failures or packet drops, are very important indicators of whether everything in the network is working as expected. Network monitoring helps to avoid congestion and to identify architectural bottlenecks. Furthermore, physical failures due to broken cables or offline compute and network nodes can be identified and resolved promptly. These examples demonstrate clearly how important network monitoring is; especially today's network complexity and scale make it indispensable. In particular, the growing complexity of data centers and networks in general makes it essential to have a contemporary monitoring solution. In addition, virtualization based on the Cloud Computing (CC) and SDN paradigms is redefining the challenges for network monitoring on a daily basis. Thus, we would like to introduce our open source SDN monitoring tool SOFTmon. The tool is designed as a Network Operating System (NOS) independent monitoring solution with extended capabilities: it extends conventional NOS-based monitoring by providing additional graphical transmission charts with utilization information at the switch, port and even flow level. For instance, it can differentiate several IP-related flows and their load in the context of the overall network utilization and capacity. The remainder of the paper is structured as follows. Section 2 gives background information and presents related work. This is followed by the tool's architecture in Section 3 and the prototypical implementation in Section 4. Finally, an evaluation is presented in Section 5, followed by a brief conclusion in Section 7.

2. Background and related work

Monitoring in general is a very important and often underestimated topic. Networks are usually monitored with several different mechanisms, for instance host-based latency measurements via the Internet Control Message Protocol (ICMP) or network node queries via the Simple Network Management Protocol (SNMP). However, these applications need to be configured and tested in a decentralized way, so a centralized monitoring server component, such as Zabbix [1] or Nagios [2], is required. This monitoring server collects, analyzes, and visualizes the frequently obtained information. Although these mechanisms can also be applied to SDN networks, technologies like the OpenFlow protocol provide direct access to the network nodes and a variety of statistics.
Thus, a monitoring solution utilizing SDN is more powerful and provides many new opportunities to gather network information, for instance flow statistics that can be obtained directly from the flow tables of the switches. Moreover, OpenFlow-managed switches typically report any network status change, e.g. a failed link, instantly to the NOS, and they exchange frequent keep-alive messages with the NOS so that the status of the network as a whole can be determined. The NOS, in turn, frequently uses a mechanism similar to the Link Layer Discovery Protocol (LLDP) to obtain the current network topology and its interconnects. Furthermore, the NOS can be triggered to query the network nodes via the OpenFlow protocol in order to obtain the flow tables and flow entries, as well as their counters and statistics. This particular mechanism is what SOFTmon uses to provide a very fine-grained flow-based monitoring solution. Several open source OpenFlow-based NOS are available; OpenDaylight [3] and Floodlight [4], for instance, are very common at the moment. As mentioned, they generally support basic monitoring capabilities such as visualizing the network topology or presenting flow statistics in tabular form. Nevertheless, this presentation can be improved, since it is not really human readable, nor does it provide an appropriate understanding of the current network utilization. Several papers try to address this issue with proposals and approaches for SDN-based network monitoring [5,6,7]; however, almost all of them deal mainly with different measurement approaches and procedures aimed at increasing the measurement accuracy with respect to time. Other papers [8,9] present controller module extensions, which are bound to a particular NOS and interact directly with the packet forwarding process.
Others again describe only early prototypes [10], which are not available for testing or download. In contrast, the SOFTmon tool presented in this paper introduces a method of flow monitoring using the northbound NOS interface. The tool is completely decoupled from other network and software components and acts as an additional utility to observe the network utilization. Furthermore, the prototype implementation, including the Floodlight connector, is available on GitHub [11].

3. Architecture

The key idea of SOFTmon is to implement a NOS-independent traffic monitoring tool which adds monitoring capabilities and provides them with a proper visualization. Thus, SOFTmon implements a traffic measurement method that relies exclusively on the switch, port and flow statistics which are defined by the OpenFlow standard and can be queried and obtained by every common NOS. SOFTmon is a business application on the network application layer of the SDN paradigm, as described in [12] and depicted in Figure 1(a). It interacts with the northbound NOS interface via the platform-specific application programming interface (API).

Fig. 1. Architecture: (a) conceptual SOFTmon software architecture; (b) Mininet-based development environment and topology.

The conceptual architecture of SOFTmon is itself based on the pattern of a layered software architecture. The lowest layer is the data access layer, which includes database, file I/O, and representational state transfer (REST) support. This layer provides the basic functionality required for communicating with the NOS. The majority of open source NOS implementations provide an interface following the REST paradigm as their northbound API [13], giving a programming-language-independent interface for network applications. Thus, this interface was favored as the connector between SOFTmon and the NOS. Since the NOS northbound API has not yet been standardized, using REST seems to be the best way of easily adding connectors for other network controllers. Therefore, SOFTmon's architecture contains an abstraction layer called the REST connector, which defines the methods and the data model that have to be provided by a specific communication module for the respective NOS implementation. The next layer up contains the data model. It computes the performance metrics from the statistics provided by the NOS, which the NOS in turn obtained via OpenFlow from the network nodes. The data model is composed of three main elements. The first is the topology, composed of all network devices (typically switches) and their interconnects. The second is the counters, which keep the statistical data. The last element is the metrics, which are needed to visualize the network performance. The topology and counter object model is predominantly based on the OpenFlow v1.3 specification [14]; the functionality of transforming the data model obtained from the NOS's non-standardized REST API into SOFTmon's data model has to be provided by the particular REST client of the data access layer. The topmost layer contains the graphical user interface for user interaction and data visualization: a set of tabs for measurement options, buttons for starting and stopping the visualization, and a chart component for presenting the performance metrics graphically in soft real time. During the SOFTmon development process, a virtual test environment based on the Mininet [15] network emulator and the Floodlight [4] SDN controller has been used.
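Mininet tree topologies are parameterized by depth and fanout, and the emulated node counts follow directly. A quick sketch (plain arithmetic, not Mininet API code):

```python
# Node counts for a Mininet-style tree topology:
# fanout^depth hosts; the switches form a complete tree of internal nodes.

def tree_counts(depth, fanout):
    hosts = fanout ** depth
    switches = sum(fanout ** level for level in range(depth))  # 1 + f + ... + f^(d-1)
    return hosts, switches

# The development topology uses depth 2 and fanout 3.
print(tree_counts(2, 3))  # (9, 4): nine hosts behind four switches
```

Small, fixed topologies like this make the return values obtained from the NOS easy to reproduce between runs.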
This development environment is composed of two virtual machines (VMs) with an Ubuntu 14.04 Linux guest operating system, nested in a VMware Workstation hypervisor hosted on a Windows 7 Professional system. The first VM contains the Mininet emulator, configured with a tree topology of depth two and fanout three, as depicted in Figure 1(b). Throughout the development process this topology was used recurrently in order to obtain reproducible return values from the NOS. The NOS is encapsulated in the second VM, while development itself was carried out directly on the Windows host using Java and the Eclipse integrated development environment (IDE). Floodlight was chosen for the first implementation because it is widely used in research and development; moreover, it is comparably less complex than e.g. OpenDaylight and is well documented. It further includes a collection of network applications, almost all of which are built as Java modules and compiled directly together with Floodlight; other built-in network applications use the REST API. Thus, it comes with many helpful implementation examples. In particular, the Forwarding and Learning Switch modules were used during SOFTmon development and testing.

4. Implementation

The architecture introduced in Section 3 supports an incremental and modular development of the overall software system. However, the monitoring capabilities of SOFTmon are always limited by the information provided through the RESTful interface of the underlying NOS. In order to integrate a particular NOS, the respective REST client has to be implemented. The SOFTmon prototype works with Floodlight.
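Integrating a NOS amounts to implementing a REST client that maps the controller's JSON replies onto SOFTmon's counter model. A minimal sketch of that mapping step; the JSON layout and field names here are invented for illustration and are not Floodlight's actual (undocumented) schema:

```python
import json

# Illustrative flow-statistics reply. A real controller returns a similar
# but NOS-specific structure that the REST client must translate.
reply = json.loads("""
{"flows": [
  {"table_id": 0,
   "match": {"ipv4_src": "10.0.0.1", "ipv4_dst": "10.0.0.2"},
   "packet_count": 1200, "byte_count": 1800000}
]}
""")

def to_counters(reply):
    """Translate a controller-specific reply into a flat counter model."""
    counters = {}
    for flow in reply["flows"]:
        key = (flow["table_id"],
               flow["match"].get("ipv4_src"), flow["match"].get("ipv4_dst"))
        counters[key] = (flow["packet_count"], flow["byte_count"])
    return counters

counters = to_counters(reply)
print(counters)
```

Keeping this translation inside a per-NOS REST client is what lets the rest of the tool stay controller-independent.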
Although detailed documentation of the uniform resource identifiers (URIs) for the specific Floodlight REST calls [16] exists, no description of the data model of the returned JavaScript Object Notation (JSON) structures is given. Thus, the data model first had to be identified by reverse engineering and probing. For supporting further NOS in the future, the REST client implementation may turn out to be the most complex and time-consuming part; this depends highly on the completeness of the NOS REST API documentation.

Table 1. Metrics and underlying counters

Type           Metric           Unit     Counter                    Unit
Switch Stats.  Flow Count       n        Active Entries (Tables)    n
               Packet Rate      n/s      Received Packets (Flows)   n
               Byte Rate        Byte/s   Received Bytes (Flows)     Byte
Port Stats.    RX Packet Rate   n/s      Received Packets           n
               TX Packet Rate   n/s      Transmitted Packets        n
               RX Byte Rate     Byte/s   Received Bytes             Byte
               TX Byte Rate     Byte/s   Transmitted Bytes          Byte
               RX Port Usage    %        Received Bytes             Byte
               TX Port Usage    %        Transmitted Bytes          Byte
Flow Stats.    Packet Rate      n/s      Received Packets           n
               Byte Rate        Byte/s   Received Bytes             Byte

The calculation of the performance metrics is based on the port and flow statistics defined in the OpenFlow v1.3 standard. Table 1 shows the calculated metrics in relation to the underlying statistics (counters). The switch counters represent aggregated values that are not part of the OpenFlow specification; the aggregation is carried out by the NOS. Since a performance metric m(t) is a time-related value, it can be calculated as the time derivative of the corresponding time-dependent counter c(t):

    m(t) = dc(t)/dt                              (1)

Counter values are available only in time-discrete form.
Thus, the calculation of a metric is approximated using the corresponding time interval Δt:

    m(t) = (c(t) − c(t − Δt)) / Δt               (2)

The OpenFlow specification defines duration counters for port statistics, as well as for flow statistics (since OpenFlow version 1.3), which could be used as a time base for the time-dependent counter values; this would allow measurements with a theoretical accuracy of up to one nanosecond. However, the counter for the nanosecond portion of the duration is marked as optional in the OpenFlow specification, so the maximum guaranteed time resolution is one second. In addition, the port duration counters do not exist in earlier OpenFlow versions. Unfortunately, one second is not sufficient for a fluent visualization in soft real time. The solution to this issue is to create an additional time base by attaching a system time stamp to the counter values, based on the arrival time of the corresponding JSON object received from the NOS. This is also necessary for presenting historical values. The error resulting from the time stamp approach is analyzed in detail in Section 5.

Fig. 2. SOFTmon GUI

Figure 2 gives an overview of SOFTmon's graphical user interface on a Windows 7 OS. The parameters and credentials for the REST connection to the NOS are configured in the upper left area; to the right, the refresh time and the number of values used in the visualization can be adjusted. The tree view on the left allows choosing different measurement points depending on the selected tab. The tree view also reflects the network topology, while the tabs switch the presentation between the ports per switch, the flows per switch, and the switch interconnects.
Further details of the selected sample are displayed on the lower left, while the sample itself is presented as a chart on the right. The OpenFlow-based statistics values are monitored in soft real time. In order to measure the network utilization caused by a particular flow, some effort has to be spent on filtering the flow tables of a given switch and locating the statistics entries of interest. The Floodlight REST interface only allows querying the complete list of all flows in all flow tables of a switch at once; this list is ordered by table ID and by the processing sequence of each table in the switch's matching process. The current SOFTmon prototype only supports flow monitoring at the network layer: to become a selectable item in the SOFTmon GUI, a flow needs valid entries in the IPv4 source and destination address fields as well as the Ethernet source and destination address fields, and its instruction field must contain a valid action. However, flows are installed and deleted dynamically by Floodlight's Learning Switch module, so the flow list obtained from the NOS, and its flow statistics, can differ in length and sequence from one measurement cycle to the next. Therefore, the flow selected for monitoring has to be re-identified in the list through an internal matching process which compares the following fields: flow table ID, IPv4 source and destination address, Ethernet source and destination address, Ethernet type, IP protocol, transport protocol source and destination port, and physical input port. Non-mandatory values (e.g. the transport protocol ports) are substituted by a wildcard in the search. In order not to disrupt a flow's utilization charts when the flow has been deleted, the statistics values of a missing flow are marked as invalid.
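This internal matching can be sketched as a field-by-field comparison in which missing (non-mandatory) values act as wildcards. A self-contained illustration; the field names are chosen for readability, not taken from the SOFTmon source:

```python
# Re-identify a selected flow in a freshly fetched flow list.
# A value of None in the selected flow's fields acts as a wildcard.

FIELDS = ("table_id", "ipv4_src", "ipv4_dst", "eth_src", "eth_dst",
          "eth_type", "ip_proto", "tp_src", "tp_dst", "in_port")

def matches(selected, candidate):
    return all(selected[f] is None or selected[f] == candidate[f]
               for f in FIELDS)

def find_flow(selected, flow_list):
    for flow in flow_list:
        if matches(selected, flow):
            return flow
    return None  # flow is gone: its statistics would be marked invalid

base = dict.fromkeys(FIELDS)
selected = {**base, "table_id": 0, "ipv4_src": "10.0.0.1",
            "ipv4_dst": "10.0.0.2"}  # tp_src/tp_dst left as wildcards
flow_list = [{**base, "table_id": 0, "ipv4_src": "10.0.0.1",
              "ipv4_dst": "10.0.0.2", "tp_src": 5001, "tp_dst": 80}]

print(find_flow(selected, flow_list) is not None)  # True
```

Returning None when no candidate matches corresponds to the invalid marking described above.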
When a flow's statistics are marked invalid, the calculation module returns zero for them, so the graph of the measured metric also drops to zero but continues to be drawn until the selected flow possibly becomes active again. The GUI elements that can be selected for monitoring are switches, switch ports, and flows; they are presented in a tree structure corresponding to the network topology. Since flows can be installed and deleted within short time frames, this selection tree has to be refreshed manually by the user. The visual presentation of a metric is implemented with the JChart2D library [17], which is intended especially for engineering tasks and is therefore optimized for dynamic and precise visualization of data with minimal configuration overhead. The user can configure the duration of a measurement cycle dM, as well as the number of values displayed in a graph NM, via the GUI.

5. Evaluation

In order to determine the error introduced by the system time stamp approach used to label the probes, as described in Section 4, the deviation of a switch port metric m(ΔtS) is measured. This metric is calculated for a time interval ΔtS based on the time stamps, and compared against the metric m(ΔtC), which is calculated for the time interval ΔtC of the time counters. As shown in Table 2, the experimentally determined relative deviation of the time interval increases slightly with decreasing measurement cycle duration dM. In contrast, the mean deviation of the calculated metric stays constantly below 0.005 percent.
Table 2. Empirically determined error with the time stamp approach

dM                          1000 ms     500 ms      250 ms      50 ms
dO                          27.24 ms    21.60 ms    22.39 ms    27.49 ms
dR                          6.28 ms     6.32 ms     7.11 ms     5.13 ms
Mean relative deviation from ΔtC-based values:
ΔtS                         -0.002 %    0.001 %     0.004 %     0.110 %
m(ΔtS)                      0.002 %     0.002 %     0.000 %     -0.001 %

The mean REST call execution time dR of the test system is comparatively constant, between approximately five and seven milliseconds, regardless of the measurement cycle dM. However, there is an offset dO between the time stamp based instant tS and the counter based instant tC, averaging around 25 milliseconds. This means the metrics obtained from the NOS are visualized around 25 milliseconds later than they actually occur, which is negligible for a network monitoring tool. In a nutshell, the obtained results demonstrate that even commodity hardware delivers a sufficient sample rate and resolution for SOFTmon. In addition to the Mininet-based development environment presented in Section 3, SOFTmon was also evaluated intensively on a local SDN research cluster. This cluster, named Asok, has a typical SDN-enabled data center fat-tree network topology; the SDN network is composed of dedicated OpenFlow switches from NEC. Table 3 lists all components and their hardware and software specifications as used in the cluster evaluation.

Table 3. Asok cluster hardware configuration

System         Hardware                                      OS/Firmware
Cluster Node   2x Intel Xeon Quad-Core 2.66 GHz, 32 GB RAM   Ubuntu Server 14.04.3 LTS 64 bit
NOS Node       Pentium Dual-Core E5500 2.80 GHz, 4 GB RAM    Ubuntu Desktop 14.04.3 LTS 64 bit
Monitoring PC  Intel Core i7 2.8 GHz, 8 GB RAM               Windows 7 Professional 64 bit
Switches       NEC IP8800/S3640-48T                          OS-F3L Ver. 11.1.C.Af

In order to evaluate the monitoring performance of SOFTmon, network traffic was generated with the iperf tool [18].
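The byte rates that a monitoring tool should observe for rate-limited iperf traffic follow from simple unit conversion. A quick sketch (the 100 Mbit/s per-client limit and the three-client aggregate are those of the evaluation setup):

```python
# Convert a configured iperf transmission rate into the byte rate
# a monitoring tool should observe on the corresponding switch port.

def mbit_to_mbyte_per_s(rate_mbit):
    return rate_mbit / 8.0  # 8 bits per byte

single_client = mbit_to_mbyte_per_s(100)      # one client limited to 100 Mbit/s
three_clients = mbit_to_mbyte_per_s(3 * 100)  # aggregate of three such clients

print(single_client, three_clients)  # 12.5 37.5 (MByte/s)
```

Agreement between these expected values and the charted rates is a simple sanity check on the measurement pipeline.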
Figure 3(a) depicts the evaluation deployment as well as the iperf server and client configuration.

Marc Hartung et al. / Procedia Computer Science 110 (2017) 516–523

Fig. 3. Evaluation on a SDN cluster: (a) Asok Research Cluster evaluation configuration; (b) port traffic on cluster switches.

The NOS and the SOFTmon application are running on dedicated nodes, which are not directly part of the cluster. They are not connected to the SDN data network, but to the separate management network via 1 Gbps Ethernet. This network carries the communication between the NOS and the switches in both directions. Figure 3(b) shows port traffic probes that were collected with SOFTmon on the cluster; it shows the throughput as byte and packet rate. This particular example was generated with the iperf setup introduced above. The iperf clients were configured with a transmission rate limited to 100 Mbit/s in order to avoid traffic congestion. The graph depicted in figure 3(b) shows the measured and visualized throughput on port 19 of switch nec1-1. This is the incoming (RX) traffic from client asok04, which reaches an average of 12.5 MByte/s. This correlates with the configured 100 Mbit/s transmission rate. Moreover, the graph on the right shows the outgoing (TX) throughput of port 18 of switch nec3-1, which is the sum of the iperf traffic of all three clients (asok04 to asok06) that were started successively. The traffic, limited to 100 Mbit/s per client, reaches an average overall rate of 37.5 MByte/s, which again correlates with the configured 300 Mbit/s transmission rate. For further evaluation of SOFTmon under real traffic conditions, the development environment, as introduced by fig. 1(b), was used for video streaming experiments.
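The correlation between the configured iperf bit rates and the byte rates displayed by the tool is plain unit arithmetic; a minimal check (`mbit_to_mbyte_per_s` is an illustrative helper, not part of SOFTmon):

```python
def mbit_to_mbyte_per_s(rate_mbit: float) -> float:
    """Convert a configured bit rate (Mbit/s) to the byte rate (MByte/s)
    that a throughput monitor would display."""
    return rate_mbit / 8.0

# One client limited to 100 Mbit/s corresponds to 12.5 MByte/s:
assert mbit_to_mbyte_per_s(100) == 12.5
# Three successive clients, 300 Mbit/s aggregate, correspond to 37.5 MByte/s:
assert mbit_to_mbyte_per_s(3 * 100) == 37.5
```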
The charts that are presented in Fig. 4 were collected while retrieving a live video stream with a web browser that was started on a virtual host in the Mininet environment. The curves show data bursts which are typical for video streaming.

Fig. 4. Port and flow metrics with Youtube traffic evaluated on Mininet

All charts that are presented in this paper are screenshots of the current version of the SOFTmon tool. They reflect samples taken during the evaluation and validation process.

6. Disclosure

This work was supported by a fellowship within the FITweltweit programme of the German Academic Exchange Service (DAAD).

7. Conclusion

The introduced monitoring tool presents a new and innovative approach for network monitoring in OpenFlow networks. It extends the topology based monitoring capabilities that are provided by common NOS and can be used to determine any kind of network behavior. The implementation is open source and available on GitHub [11]. Moreover, its reliable software architecture is open for contributions from the community, in order to extend the existing implementation of the Floodlight REST client or to add support for further NOS. Its implementation is based on Java, so it can be used on any operating system. The presented application was successfully evaluated with Mininet and OpenFlow version 1.3. Moreover, it was evaluated on a SDN research cluster with a typical data center network topology and OpenFlow version 1.0. Furthermore, live video streaming was used to evaluate the tool under real network traffic conditions. Thus, SOFTmon has already proven that its capital S does not mean simple in terms of limited capabilities; it means simple in terms of being easy to operate. The high usability of the tool was in fact one of the design requirements.
In order to deploy SDN in productive environments, a simply manageable tool like SOFTmon could be very helpful, e.g. to provide Network Operations Center (NOC) operators with a simple but powerful administrative application. A further benefit is that SOFTmon does not require direct access to the network. Thus, a local admin can, for example, use the tool to debug a network issue while the main control over the network remains with the NOC.

References

1. Zabbix :: The enterprise-class monitoring solution for everyone. 2016. URL:
2. Nagios - the industry standard in IT infrastructure monitoring. 2016. URL:
3. OpenDaylight platform. 2016. URL:
4. Project Floodlight. 2016. URL:
5. Baik, S., Lim, Y., Kim, J., Lee, Y.. Adaptive flow monitoring in SDN architecture. In: Network Operations and Management Symposium (APNOMS), 2015 17th Asia-Pacific. 2015, p. 468–470. doi:10.1109/APNOMS.2015.7275368.
6. Isolani, P.H., Wickboldt, J.A., Both, C.B., Rochol, J., Granville, L.Z.. Interactive monitoring, visualization, and configuration of OpenFlow-based SDN. In: 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM). 2015, p. 207–215. doi:10.1109/INM.2015.7140294.
7. Pajin, D., Vuletić, P.V.. OF2NF: Flow monitoring in OpenFlow environment using NetFlow/IPFIX. In: Network Softwarization (NetSoft), 2015 1st IEEE Conference on. 2015, p. 1–5. doi:10.1109/NETSOFT.2015.7116138.
8. van Adrichem, N.L.M., Doerr, C., Kuipers, F.A.. OpenNetMon: Network monitoring in OpenFlow software-defined networks. In: 2014 IEEE Network Operations and Management Symposium (NOMS). 2014, p. 1–8. doi:10.1109/NOMS.2014.6838228.
9. Grover, N., Agarwal, N., Kataoka, K.. liteFlow: Lightweight and distributed flow monitoring platform for SDN. In: Network Softwarization (NetSoft), 2015 1st IEEE Conference on. 2015, p. 1–9. doi:10.1109/NETSOFT.2015.7116160.
10. Raumer, D., Schwaighofer, L., Carle, G..
MonSamp: A distributed SDN application for QoS monitoring. In: Computer Science and Information Systems (FedCSIS), 2014 Federated Conference on. 2014, p. 961–968. doi:10.15439/2014F175.
11. SOFTmon. 2016. URL:
12. Open Networking Foundation. 2016. URL:
13. Kreutz, D., Ramos, F.M., Esteves Verissimo, P., Esteve Rothenberg, C., Azodolmolky, S., Uhlig, S.. Software-defined networking: A comprehensive survey. Proceedings of the IEEE 2015;103(1):14–76.
14. Open Networking Foundation. OpenFlow switch specification 1.3.0. 2012. URL:
15. Bob Lantz. Mininet VM images. 2016. URL:
16. Project Floodlight. Floodlight controller - REST API. 2016. URL:
17. Achim Westermann. Trace2dltd (JChart2D API documentation, version 3.2.2). 2016. URL:
18. iperf2. 2016. URL:

IEICE TRANS. INF. & SYST., VOL.E94–D, NO.10 OCTOBER 2011, 1917
PAPER: Special Section on Information-Based Induction Sciences and Machine Learning

Adaptive Online Prediction Using Weighted Windows

Shin-ichi YOSHIDA†, Nonmember, Kohei HATANO††a), Eiji TAKIMOTO††, Members, and Masayuki TAKEDA††, Nonmember

SUMMARY We propose online prediction algorithms for data streams whose characteristics might change over time. Our algorithms are applications of online learning with experts. In particular, our algorithms combine base predictors over sliding windows with different lengths as experts. As a result, our algorithms are guaranteed to be competitive with the base predictor with the best fixed-length sliding window in hindsight.
key words: machine learning, data stream, online learning, sliding window

1. Introduction

Data streams arise in many applications. For example, the development of distributed sensor devices enables us to collect data which are generated constantly over time. Also, more and more huge data sets are available, and such huge data can be viewed as data streams as well if we want to deal with them by a "one-pass" scan.
Research on data streams has become popular in various areas of computer science such as databases, algorithms [1], data mining, and machine learning [2]. There are two notable properties of data streams. The first property is that the nature of a data stream might change over time: the underlying distribution which generates the data might change gradually, or change suddenly at some trial (concept drift). So, prediction algorithms for data streams need to adapt to concept drifts. The second property is that the whole data stream is too large to store, since new data arrives endlessly. Therefore, prediction algorithms also need to work with partial data.

A natural approach to dealing with data streams is to use a sliding window. The sliding window keeps only recent data: as a new instance comes in, the oldest instance in the window is discarded and the new one is added. Prediction algorithms then use only the data in the sliding window to make predictions on future data. For time-changing data streams, it is reasonable to assume that recent data is more informative than older data, so sliding window approaches seem to work well for prediction tasks on data streams. However, it is not trivial to determine the size of the sliding window in advance. If the size is too large, the accuracy of prediction might degrade when the nature of the data stream changes, since older data affects the prediction. On the other hand, if the size is too small, the accuracy of prediction might degrade as well when the data is rather stationary. There has been some research on making the size of the sliding window adaptive [3]–[6].

Manuscript received January 7, 2011. Manuscript revised May 4, 2011.
† The author is with NTT West, Osaka-shi, 540–8511 Japan.
†† The authors are with the Department of Informatics, Kyushu University, Fukuoka-shi, 819–0395 Japan.
a) E-mail:
DOI: 10.1587/transinf.E94.D.1917
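The fixed-size sliding window update described above (drop the oldest instance, append the newest) can be sketched as follows; `SlidingWindow` and `capacity` are illustrative names, not from the paper:

```python
from collections import deque

class SlidingWindow:
    """Keep only the `capacity` newest (x, y) examples of a stream."""

    def __init__(self, capacity: int):
        # deque with maxlen discards the oldest element automatically
        self.data = deque(maxlen=capacity)

    def add(self, x, y):
        self.data.append((x, y))

    def examples(self):
        return list(self.data)

w = SlidingWindow(capacity=3)
for t, y in enumerate([0.1, 0.2, 0.3, 0.4]):
    w.add(t, y)
# Only the 3 newest examples remain: [(1, 0.2), (2, 0.3), (3, 0.4)]
```

A base predictor would then be fit on `w.examples()` at every trial.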
These proposed methods heavily depend on the choice of parameters, e.g., a threshold that determines when to discard data in the window. For example, given a single temporal outlier, ADWIN [3] discards all the data in the window even if the data stream is rather stationary, which might degrade accuracy.

In this paper, we take an alternative approach. Instead of choosing a fixed-size window or changing the size of the window adaptively, we combine the predictions of multiple windows with different sizes. More precisely, we employ the approach of online learning with experts [7]–[18]. We consider M "sub-windows", where the k-th sub-window contains the k newest elements of the window (k = 1, . . . , M). We assume a fixed predictor, called the base predictor, which works with a sliding window so that it makes predictions using only the data in the window. Since we have M sliding windows of sizes 1 through M, we get M predictors: the k-th predictor is the base predictor running with the sliding window of size k. Using these M predictors as experts and applying the Weighted Average Algorithm of Kivinen and Warmuth [19], we obtain an online prediction algorithm called the Weighted Window (WW, for short). The WW is guaranteed to perform almost as well as the best expert, i.e., the base predictor with the best fixed-size window. More precisely, we show that the WW has O(ln M) regret, where the regret of a prediction algorithm is defined as the cumulative loss of the algorithm minus that of the predictor with the best fixed-size window. Furthermore, we apply the method of Hazan and Seshadhri [20] to make the prediction algorithm more adaptive. In particular, by combining multiple copies of WWs over different intervals, we obtain a second algorithm called the Weighted Window with follow the leading History (WWH, for short). The WWH is guaranteed to perform almost as well as the best expert for every interval in the data stream.
More precisely, we show that for any interval I in the data stream, the regret of WWH measured over I is bounded above by O(ln M ln T + ln² T), where T is the length of the data stream. Note that our contribution is not to develop new techniques for online learning with experts, but to apply the online-learning-with-experts framework to sequence prediction using sliding windows in order to make it robust. In our experiments on artificial and real time-series data, WW and WWH outperform other previous methods and compete fairly with predictions using the best fixed window.

Copyright © 2011 The Institute of Electronics, Information and Communication Engineers

2. Preliminaries

For a fixed integer N, let X be the domain of interest. A member x of X is called an instance. For integers r and s (r ≤ s), we denote by [r, s] the set consisting of the sequential integers r, . . . , s. In particular, we write [s] for short if r = 1.

2.1 Online Prediction with Experts

We consider the following protocol of online prediction. At each trial t = 1, . . . , T:
1. the adversary gives an instance x_t to the learner,
2. the learner outputs a prediction ŷ_t ∈ [0, 1] for x_t,
3. the adversary gives the true value y_t ∈ [0, 1] to the learner, and
4. the learner incurs loss ℓ(y_t, ŷ_t).

Here the function ℓ : [0, 1] × [0, 1] → R is called the loss function. In the setting of online learning with experts ([7], [8]), the learner can also use the predictions of experts. More precisely, the learner is given M experts in advance. At each trial t, each expert i is given the instance x_t and returns its prediction ŷ_{t,i} ∈ [0, 1]. The goal of the learner is to predict as well as the best expert in hindsight. The precise goals are the following:

• Regret:
    Σ_{t=1}^{T} ℓ(y_t, ŷ_t) − min_{i=1,...,M} Σ_{t=1}^{T} ℓ(y_t, ŷ_{t,i}).

• Adaptive regret [20]:
    sup_{I=[r,s]⊆[T]} { Σ_{t=r}^{s} ℓ(y_t, ŷ_t) − min_{i=1,...,M} Σ_{t=r}^{s} ℓ(y_t, ŷ_{t,i}) }.

A loss function ℓ is called α-exp concave if the function e^{−αℓ(ŷ,y)} is concave w.r.t. ŷ. It is known that some natural loss functions such as the square loss, log loss, relative entropy, and Hellinger loss are α-exp concave for some α (see, e.g., [9]).

Let us discuss the difference between regret and adaptive regret. Regret measures the difference between the cumulative losses of the algorithm and of the best expert over all T trials. However, low regret does not necessarily imply good performance on time-changing data. This is because, for data that changes its tendency over time, the best expert for all T trials might not adapt to the changes and may predict badly over some intervals. On the other hand, if the adaptive regret is bounded, then the regret w.r.t. the best expert for any interval is bounded as well. So, minimizing the adaptive regret is a more challenging goal, especially for time-changing data streams.

2.2 Sliding Window

A sliding window is popular in the task of prediction on data streams which might change over time. A sliding window of size k keeps the k newest instances. More formally, the sliding window W of size k at trial t is the sequence of instances

    W = ∪_{j=1}^{t−1} {(x_j, y_j)}   if t − 1 ≤ k,
    W = ∪_{j=t−k}^{t−1} {(x_j, y_j)}   if t − 1 > k.

We assume a base prediction algorithm associated with a sliding window. The algorithm uses the examples in the sliding window to make predictions. In general, the behavior of a prediction algorithm using a sliding window depends on the size of the window. When the size of the sliding window is large, predictions using the window tend to be insensitive to outliers, so the predictions are robust with respect to temporal noise. However, if the tendency of the data stream changes, the predictions tend to become worse, since the older data in the sliding window causes the prediction algorithm to adapt to the change more slowly.
On the other hand, when the size of the sliding window is small, the predictions are more sensitive to changes in the data stream, so the prediction algorithm can adapt to changes quickly. Its disadvantage is that the predictions become sensitive to temporal noise as well. Therefore, in order to predict adaptively w.r.t. data streams, we need to determine the size of the sliding window appropriately.

2.3 Our Goal

Given a base predictor, our goal is to predict as well as the base predictor using the best fixed-size sliding window. Specifically, we aim to construct online prediction algorithms with small regret or adaptive regret w.r.t. the base predictor with the best fixed sliding window.

3. Algorithms

In this section, we propose two algorithms, which are modifications of existing algorithms having regret and adaptive regret bounds, respectively.

3.1 Weighted Window

The first algorithm, which we call Weighted Window (WW), is the special case of the Weighted Average Algorithm [19] with base predictors over sliding windows as experts. More precisely, WW has a sliding window of size M, and this sliding window induces M sub-windows. Each sub-window, denoted W[i] (i = 1, . . . , M), contains the at most i newest examples. We regard the base predictor with each sub-window W[i] as an expert; that is, each expert predicts using the base predictor and the data in the sub-window W[i].

Algorithm 1: WW (Weighted Window)
1. w_1 = (1/M, . . . , 1/M).
2. For t = 1, . . . , T:
   a. The sliding window W contains the at most M newest examples before trial t:
        W = ∪_{j=1}^{t−1} {(x_j, y_j)}   if t − 1 ≤ M,
        W = ∪_{j=t−M}^{t−1} {(x_j, y_j)}   if t − 1 > M.
      Each sub-window W[i] contains the at most i newest examples (i = 1, . . . , M).
   b. Receive an instance x_t.
   c. Each expert E_i predicts ŷ_{t,i} using the sub-window W[i] (1 ≤ i ≤ M).
   d. Predict ŷ_t = Σ_{i=1}^{M} w_{t,i} ŷ_{t,i}.
   e. Receive the true outcome y_t.
   f. Update the weight vector:
        w_{t+1,i} = w_{t,i} e^{−αℓ(y_t, ŷ_{t,i})} / Σ_{j=1}^{M} w_{t,j} e^{−αℓ(y_t, ŷ_{t,j})}.

Finally, the Weighted Average Algorithm combines the experts' predictions by computing their weighted average. The details of WW are given in Algorithm 1. An advantage of WW is that it predicts adaptively w.r.t. changes of tendency in the data stream. For example, when the tendency of the data changes drastically, the experts corresponding to small sub-windows will have larger weights, and older data no longer affects the predictions of WW. Similarly, if the tendency of the data does not change, the experts corresponding to large sub-windows will have larger weights, and the predictions of WW become resistant to temporal outliers in the data stream. The regret bound of WW follows directly from that of the Weighted Average Algorithm [19].

Theorem 1 (Kivinen & Warmuth [19]). Suppose that the loss function ℓ is α-exp concave. Then the regret of WW is at most (1/α) ln M.

By Theorem 1, WW is guaranteed to perform almost as well as predictions with the best fixed window of size at most M.

3.2 Weighted Window with Follow the Leading History

The second algorithm, Weighted Window with follow the leading History (WWH), is a modification of Follow the Leading History (FLH) [20] with many copies of WW as experts. Specifically, WWH differs from FLH in that the experts of FLH use all the past instances given to them, while those of WWH use only the instances in their sliding windows. This change yields, as we will show later, a practical improvement in changing environments. At each trial i, WWH generates a copy of WW, denoted WW^i, as an expert. Each WW^i has a lifetime lifetime_i and is only active for lifetime_i trials. Each WW^i runs WW on the data given during its lifetime. Each WW has a sliding window of size M and M sub-windows as sub-experts. At each trial, WWH combines the predictions of the experts which are active at that trial. More precisely, an expert WW^i, generated at trial i, is active at trial t if i + lifetime_i ≥ t.
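Steps a–f of Algorithm 1 can be sketched end to end. This is only an illustrative sketch assuming the square loss with α = 1/2 and, for brevity, a windowed-mean base predictor instead of the paper's least-squares one; `WeightedWindow` and its method names are not from the paper:

```python
import math
from collections import deque

class WeightedWindow:
    """Sketch of Algorithm 1 (WW) with M sub-window experts."""

    def __init__(self, M, alpha=0.5):
        self.M, self.alpha = M, alpha
        self.window = deque(maxlen=M)      # at most M newest examples
        self.w = [1.0 / M] * M             # step 1: uniform initial weights
        self.preds = [0.5] * M

    def _expert(self, i):
        # Expert E_i: base predictor on sub-window W[i] (the i newest examples).
        # Assumption: a windowed mean, not the paper's least-squares fit.
        sub = list(self.window)[-i:]
        if not sub:
            return 0.5                     # arbitrary default before any data
        return sum(y for _, y in sub) / len(sub)

    def predict(self, x_t):
        # steps c-d: collect expert predictions, output their weighted average
        self.preds = [self._expert(i) for i in range(1, self.M + 1)]
        return sum(w * p for w, p in zip(self.w, self.preds))

    def update(self, x_t, y_t):
        # step f: multiplicative update with square loss, then slide the window
        factors = [w * math.exp(-self.alpha * (y_t - p) ** 2)
                   for w, p in zip(self.w, self.preds)]
        z = sum(factors)
        self.w = [f / z for f in factors]
        self.window.append((x_t, y_t))

ww = WeightedWindow(M=5)
for t in range(20):
    ww.predict(t)
    ww.update(t, 0.8)
```

On a stationary stream like this one, all experts converge to the same prediction, so the weighted average does too.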
The lifetime_i of expert WW^i is given as follows: if i is represented as i = r2^k, where r is an odd number and k is an integer, we set lifetime_i = 2^{k+2} + 1. Note that r and k are unique for each i. For example, if i is odd, then k = 0 and r = i. Similarly, if i is even, there exist a unique k ≥ 1 and an odd number r satisfying i = r2^k. Let A_t be the set of indices of the active experts at trial t. Then the following lemma holds [20]. Note that the lemma holds not only for FLH but also for WWH.

Lemma 1 (Hazan & Seshadhri [20]).
1. For any s ≤ t, [s, (s + t)/2] ∩ A_t ≠ ∅.
2. For any t, |A_t| = O(log T).
3. For any t, A_{t+1} \ A_t = {t + 1}.

The description of WWH is given in Algorithm 2. At each trial, WWH combines the predictions of the active experts by computing their weighted average. Then WWH generates a new expert. Finally, WWH removes the experts whose lifetimes are over and normalizes the weights of the active experts.

3.3 Analysis

We show a regret bound of WWH. First, we use the following lemma for FLH [20].

Lemma 2 ([20]). Suppose that WW^r is active during an interval I = [r, s]. Then the regret of WWH w.r.t. WW^r over the interval I is at most (2/α)(ln r + ln |I|).

By Lemma 2 and Theorem 1, we have the following lemma.

Lemma 3. Suppose that WW^r is active during the interval I = [r, s]. Then the regret of WWH w.r.t. any sub-window over the interval I is at most (2/α)(ln r + ln |I| + ln M).

Next we analyze the regret of WWH w.r.t. any interval I = [r, s] and any sub-window.

Lemma 4. For any interval I = [r, s], the regret of WWH w.r.t. I is O((1/α)(ln M + ln s) ln |I|).

Proof. By Lemma 1, for the trial s and the interval I = [r, s], there exists i ∈ A_s such that (i) i ∈ [r, (r+s)/2], and (ii) the expert WW^i is generated at trial i and is active at trial s. Therefore, by Lemma 3, the regret of WWH for any sub-window and the interval I is at most (2/α)(ln i + ln |I| + ln M).
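The lifetime rule (write i = r·2^k with r odd, then lifetime_i = 2^{k+2} + 1) is easy to state in code; `lifetime` is an illustrative name for this sketch:

```python
def lifetime(i: int) -> int:
    """Lifetime of expert WW^i: for i = r * 2**k with r odd, 2**(k+2) + 1."""
    assert i >= 1
    k = 0
    while i % 2 == 0:       # factor out the powers of two
        i //= 2
        k += 1
    return 2 ** (k + 2) + 1

# Odd trials get the shortest lifetime (k = 0): 2**2 + 1 = 5.
```

Experts spawned at trials divisible by high powers of two live exponentially longer, which is what makes the active set A_t of size O(log T).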
Similarly, for the interval [r, i], there exists i′ such that i′ ∈ [r, (r+i)/2], and we can evaluate the regret of WWH for any sub-window and the interval [i′, i]. Note that, by this argument, the interval for which the regret is evaluated shrinks to at most half of the original interval, so there are at most log₂ |I| intervals to consider. Thus the regret of WWH for any sub-window and the interval I = [r, s] is at most

    (2/α)(ln s + ln |I| + ln M) · log₂ |I| = O((1/α)(ln M + ln s) ln |I|). □

Algorithm 2: WWH (Weighted Window with follow the leading History)
1. Let A_1 = {1} and w_{1,1} = 1. Generate the expert WW^1 having WW as its prediction algorithm.
2. For t = 1, . . . , T:
   a. Receive an instance x_t.
   b. For each i ∈ A_t, the expert WW^i predicts ŷ_{t,i}.
   c. Predict ŷ_t = Σ_{i∈A_t} w_{t,i} ŷ_{t,i}.
   d. Receive the true outcome y_t.
   e. Update:
        ŵ_{t+1,i} = w_{t,i} e^{−αℓ(y_t, ŷ_{t,i})} / Σ_{j∈A_t} w_{t,j} e^{−αℓ(y_t, ŷ_{t,j})}.
   f. Add the new expert WW^{t+1}:
        w̄_{t+1,i} = 1/(t+1)                 if i = t + 1,
        w̄_{t+1,i} = (1 − 1/(t+1)) ŵ_{t+1,i}   if i ≠ t + 1.
   g. Let A_{t+1} be the set of indices of active experts at trial t + 1. For each i ∈ A_{t+1}, let
        w_{t+1,i} = w̄_{t+1,i} / Σ_{j∈A_{t+1}} w̄_{t+1,j}.

Finally, we prove the adaptive regret bound of WWH.

Theorem 2. The adaptive regret of WWH w.r.t. the best fixed-size window is O(ln M ln T + ln² T).

Proof. By Lemma 4, the regret of WWH for any interval I = [r, s] and the best sub-window is O((1/α)(ln M + ln s) ln |I|). Since s, |I| ≤ T, the claim follows. □

4. Experiments

We evaluate our proposed algorithms and other previous methods on synthetic and real time-series data. The data we deal with has the following form: S = {(x_1, y_1), (x_2, y_2), . . . , (x_T, y_T)}, where each x_t = t and y_t ∈ [0, 1] (1 ≤ t ≤ T). At each trial t, each prediction algorithm predicts ŷ_t ∈ [0, 1], given x_t = t. The loss function we consider here is the square loss, i.e., ℓ(y, ŷ) = (y − ŷ)². It can be shown that the square loss ℓ(y, ŷ) is α-exp concave for α ≤ 1/2 when y, ŷ ∈ [0, 1].
Since a larger α implies a smaller regret (as stated in Theorem 1), we fix α = 1/2 when we use WW in our experiments. We assume that the base prediction algorithm associated with each sub-window performs least squares regression. Given M examples (x_1, y_1), . . . , (x_M, y_M) ∈ (R × [0, 1])^M, the prediction is ŷ = ax + b, where

    a = Σ_{i=1}^{M} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{M} (x_i − x̄)²,   b = ȳ − a x̄.

Here x̄ and ȳ denote the averages of the x_i and the y_i (i = 1, . . . , M), respectively. For WW, a naive implementation takes O(M²) time to make a prediction, but it is possible to reduce the computation to O(M).

The algorithms we evaluate are WW, WWH, FLH [20], ADWIN (ADaptive WINdowing) [3], KAARCh (Kernel Aggregating Algorithm for Regression with Changing dependencies) [17], and the best fixed-size sub-window. Note that all algorithms use least squares regression as the base prediction algorithm. For ADWIN, we set δ = 0.9. For KAARCh, we set a = Y²c²T(T − 1)/2s(T) as suggested in [18].

4.1 Experiments on Artificial Data

We use the following artificial data sets, each consisting of a sequence of 1000 examples:

Radical: a sequence where the values change radically at t = 500 (Fig. 1).
Gradual: a sequence where the values change gradually over t = 300, . . . , 700 (Fig. 2).
Temporal: a sequence where an outlier appears every 200 steps (Fig. 3).
Random: a sequence where the trend changes at random trials and the gradient changes by a randomly determined amount (Fig. 4).

For each artificial data set, we further add random noise at each trial, generated i.i.d. from N(0, 0.05). The cumulative loss of each algorithm on each data set is shown in Figs. 1, 2, 3, and 4, respectively. We set the window size M = 10 and α = 1/2.
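The least-squares base predictor above is a direct transcription of the formulas for a and b; `least_squares_predict` is an illustrative name for this sketch, and the fallback for a degenerate window is our assumption, not the paper's:

```python
def least_squares_predict(examples, x):
    """Fit y = a*x + b to the (x_i, y_i) pairs in a sub-window and
    predict at x. Falls back to the mean when all x_i coincide."""
    n = len(examples)
    xbar = sum(xi for xi, _ in examples) / n
    ybar = sum(yi for _, yi in examples) / n
    sxx = sum((xi - xbar) ** 2 for xi, _ in examples)
    if sxx == 0.0:                      # single point or constant x
        return ybar
    a = sum((xi - xbar) * (yi - ybar) for xi, yi in examples) / sxx
    b = ybar - a * xbar
    return a * x + b

# Perfect linear data y = 0.1 * x extrapolates exactly:
pred = least_squares_predict([(1, 0.1), (2, 0.2), (3, 0.3)], 4)
```

The O(M) speedup mentioned in the text comes from maintaining the running sums (Σx_i, Σy_i, Σx_i², Σx_i y_i) incrementally as the window slides, instead of recomputing them for every sub-window.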
On all artificial data sets except the Gradual data, WW and WWH perform better than the other algorithms. Further, the cumulative losses of WW, WWH, and the best window are close. On the Gradual data, ADWIN performs best among all algorithms other than the best window. Curiously, even though WWH has a stronger theoretical guarantee (i.e., on the adaptive regret), its performance is slightly worse than that of WW. The other algorithms sometimes perform well and sometimes do not. In particular, FLH does not seem to adapt well to changes in the data at later trials. ADWIN shows a good performance on data with gradual changes such as the Gradual data, but behaves badly when temporal noise appears, as in the Temporal data. We omit the plots of KAARCh since its performance is much worse than the others.

4.2 Experiments on Real Data

As real data, we use the Nikkei 225, a stock market index for the Tokyo Stock Exchange. The data consists of 6311 daily average stock prices (the closing prices) of 225 representative Japanese companies, ranging from 1984/1/4 to 2009/8/27 (Fig. 5).

Fig. 1. Radical data (left) and the cumulative losses of the algorithms (right).
Fig. 2. Gradual data (left) and the cumulative losses of the algorithms (right).
Fig. 3. Temporal data (left) and the cumulative losses of the algorithms (right).
Fig. 4. Random data (left) and the cumulative losses of the algorithms (right).
Fig. 5. Nikkei 225 data (left) and the cumulative losses of the algorithms (right).

We set M = 50 for WW and WWH. On the Nikkei 225 data, again, WW and WWH perform best among all algorithms except the best window (we omit the plot of KAARCh since its performance is much worse). Similar to the results on the artificial data, WWH performs slightly worse than WW.

5.
Conclusion

In this paper, we proposed algorithms for prediction on data streams, based on techniques from online learning with experts. The first algorithm, WW, combines sliding windows of different sizes, and we showed that it predicts almost as well as the best sub-window. The second algorithm, WWH, further combines WWs over different intervals, keeping the adaptive regret small.

Acknowledgements

We thank the anonymous reviewers for providing many useful comments and suggestions to improve the initial manuscript. This research was partially supported by MEXT Grant-in-Aid for Young Scientists (B) 21700171.

References

[1] S. Muthukrishnan, "Data streams: Algorithms and applications," Foundations and Trends in Theoretical Computer Science, vol.1, no.2, 2005.
[2] C.C. Aggarwal, ed., Data Streams: Models and Algorithms, Springer, 2007.
[3] A. Bifet and R. Gavaldà, "Learning from time-changing data with adaptive windowing," Proc. 7th SIAM International Conference on Data Mining (SDM'07), pp.443–449, 2007.
[4] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with drift detection," SBIA Brazilian Symposium on Artificial Intelligence, pp.286–295, 2004.
[5] R. Klinkenberg and T. Joachims, "Detecting concept drift with support vector machines," Proc. International Conference on Machine Learning (ICML), 2000.
[6] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Mach. Learn., vol.23, no.1, pp.69–101, 1996.
[7] V. Vovk, "Aggregating strategies," Proc. 3rd Annual Workshop on Computational Learning Theory, pp.371–386, 1990.
[8] N. Littlestone and M.K. Warmuth, "The weighted majority algorithm," Inf. Comput., vol.108, no.2, pp.212–261, 1994.
[9] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, Cambridge University Press, 2006.
[10] M. Herbster and M. Warmuth, "Tracking the best linear predictor," J. Machine Learning Research, vol.1, pp.281–309, 2001.
[11] O.
Bousquet and M.K. Warmuth, "Tracking a small set of experts by mixing past posteriors," J. Machine Learning Research, vol.3, pp.363–396, 2002.
[12] J.Z. Kolter and M.A. Maloof, "Dynamic weighted majority: An ensemble method for drifting concepts," J. Machine Learning Research, vol.8, pp.2755–2790, 2007.
[13] P. Auer, N. Cesa-Bianchi, and C. Gentile, "Adaptive and self-confident on-line learning algorithms," J. Comput. Syst. Sci., vol.64, pp.48–75, 2002.
[14] M. Herbster and M.K. Warmuth, "Tracking the best expert," Mach. Learn., vol.32, no.2, pp.151–178, 1998.
[15] C. Monteleoni and T.S. Jaakkola, "Online learning of non-stationary sequences," Advances in Neural Information Processing Systems 16 (NIPS'03), 2004.
[16] V. Vovk, "Competitive on-line statistics," International Statistical Review, vol.69, no.2, pp.213–248, 2001.
[17] S. Busuttil and Y. Kalnishkan, "Online regression competitive with changing predictors," Proc. 18th Conference on Algorithmic Learning Theory (ALT'07), pp.181–195, 2007.
[18] S. Busuttil and Y. Kalnishkan, "Weighted kernel regression for predicting changing dependencies," Proc. 18th European Conference on Machine Learning, pp.535–542, 2007.
[19] J. Kivinen and M.K. Warmuth, "Averaging expert predictions," Proc. 4th European Conference on Computational Learning Theory (EuroCOLT'99), pp.153–167, 1999.
[20] E. Hazan and C. Seshadhri, "Efficient learning algorithms for changing environments," Proc. 26th Annual International Conference on Machine Learning (ICML'09), 2009.

Shin-ichi Yoshida received B.E. and M.E. degrees from Kyushu University in 2008 and 2010, respectively. He now works for NTT West.

Kohei Hatano received his Ph.D. from Tokyo Institute of Technology in 2005. Currently, he is an assistant professor at the Department of Informatics, Kyushu University. His research interests include boosting, online learning, and their applications.

Eiji Takimoto received his Dr. Eng. degree from Tohoku University in 1991.
Currently, he is a professor at Department of Informatics in Kyushu University. His research interests include computational complexity, computational learning theory, and online learning. Masayuki Takeda received Dr. Eng. degree from Kyushu University in 1996. Currently, he is a professor at Department of Informatics in Kyushu University. His research interests include string algorithms, data compression, and discovery science. Learning from Time-Changing Data with Adaptive Windowing ∗ Albert Bifet Ricard Gavaldà Universitat Politècnica de Catalunya {abifet,gavalda} variable size containing bits or real numbers. The algorithm automatically grows the window when no change is apparent, and shrinks it when data changes. Unlike many related works, we provide rigorous guarantees of its performance, in the form of bounds on the rates of false positives and false negatives. In fact, it is possible to show that for some change structures, ADWIN automatically adjusts its window size to the optimum balance point between reaction time and small variance. Since ADWIN keeps bits or real numbers, it can be put to work together with a learning algorithm in the first way, that is, to monitor the error rate of the current model. The first version of ADWIN is inefficient in time and memory. Using ideas from data-stream algorithmics, we provide another version, ADWIN2, working in low memory and time. In particular, ADWIN2 keeps a window of length W with O(log W ) memory and update time, while keeping essentially the same performance guarantees as ADWIN (in fact, it does slightly better in experiments). Because of this low time and memory requirements, it is thus possible to use ADWIN2 in the 1 Introduction second way: a learning algorithm can create many Dealing with data whose nature changes over time is instances of ADWIN2 to maintain updated the statistics one of the core problems in data mining and machine (counts, averages, entropies, . . . ) from which it builds learning. 
To mine or learn such data, one needs strate- the model. We compare ADWIN2 with a number of fixed-size gies for the following three tasks, at least: 1) detecting when change occurs 2) deciding which examples to keep windows and show, as expected, that it performs about and which ones to forget (or, more in general, keeping as well or only slightly worse than the best window for updated sufficient statistics), and 3) revising the current each rate of change, and performs far better than each windows of any fixed-size W when the change of rate is model(s) when significant change has been detected. Most strategies use variations of the sliding window very different from W . NOTE: Several discussions, technical details, comidea: a window is maintained that keeps the most recently read examples, and from which older examples parison to related work, and results of experiments can are dropped according to some set of rules. For this be found in an extended version, available from the authors’ homepages. three tasks, the content of the window can be used. In this paper, we present a new algorithm (ADWIN, for ADaptive WINdowing) for maintaining a window of 2 Maintaining Updated Windows of Varying Length In this section we describe our algorithms for dynam∗ Partially supported by the 6th Framework Program of ically adjusting the length of a data window, make a EU through the integrated project DELIS (#001907), by the formal claim about its performance, and derive an effiEU PASCAL Network of Excellence, IST-2002-506778, and by cient variation. the DGICYT MOISES-BAR project, TIN2005-08832-C03-03. Abstract We present a new approach for dealing with distribution change and concept drift when learning from data sequences that may vary with time. We use sliding windows whose size, instead of being fixed a priori, is recomputed online according to the rate of change observed from the data in the window itself. 
This delivers the user or programmer from having to guess a time-scale for change. Contrary to many related works, we provide rigorous guarantees of performance, as bounds on the rates of false positives and false negatives. Using ideas from data stream algorithmics, we develop a time- and memory-efficient version of this algorithm, called ADWIN2. We show how to combine ADWIN2 with the Naı̈ve Bayes (NB) predictor, in two ways: one, using it to monitor the error rate of the current model and declare when revision is necessary and, two, putting it inside the NB predictor to maintain up-to-date estimations of conditional probabilities in the data. We test our approach using synthetic and real data streams and compare them to both fixed-size and variable-size window strategies with good results. Keywords: Data Streams, Time-Changing Data, Concept and Distribution Drift, Naı̈ve Bayes Home pages:{abifet, gavalda} 2.1 Setting The inputs to the algorithms are a confidence value δ ∈ (0, 1) and a (possibly infinite) sequence of real values x1 , x2 , x3 , . . . , xt , . . . The value of xt is available only at time t. Each xt is generated according to some distribution Dt , independently for every t. We denote with µt the expected value when it is drawn according to Dt . We assume that xt is always in [0, 1]; by an easy rescaling, we can handle any case in which we know an interval [a, b] such that a ≤ xt ≤ b. Nothing else is known about the sequence of distributions Dt ; in particular, µt is unknown for all t. 2.2 First algorithm ADWIN keeps a sliding window W with the most recently read xi . Let n denote the length of W , µ̂W the (observed) average of the elements in W , and µW the (unknown) average of µt for t ∈ W . Strictly speaking, these quantities should be indexed by t, but in general t will be clear from the context. Algorithm ADWIN is presented in Figure 1. 
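The ADWIN loop of Figure 1 can be rendered directly in Python. The sketch below is our own illustrative version of the first, deliberately inefficient algorithm (the name `adwin_step` is ours); it uses the rigorous Hoeffding-based threshold εcut that the paper defines for this version, computed with prefix sums so every split can be checked in O(1):

```python
import math
from collections import deque

def adwin_step(window, x, delta=0.01):
    """One step of the naive ADWIN loop: add x to the head of W, then keep
    dropping elements from the tail while some split W0 . W1 of the window
    has |mean(W0) - mean(W1)| >= eps_cut. Returns the current estimate."""
    window.append(x)
    shrunk = True
    while shrunk and len(window) > 1:
        shrunk = False
        n = len(window)
        prefix = [0.0]                     # prefix sums give O(1) split means
        for v in window:
            prefix.append(prefix[-1] + v)
        for n0 in range(1, n):             # W0 = oldest n0 items, W1 = the rest
            n1 = n - n0
            mu0 = prefix[n0] / n0
            mu1 = (prefix[n] - prefix[n0]) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)        # "harmonic mean" of n0 and n1
            delta_p = delta / n                     # corrects for testing n splits
            eps_cut = math.sqrt(math.log(4.0 / delta_p) / (2.0 * m))
            if abs(mu0 - mu1) >= eps_cut:
                window.popleft()                    # drop one element from the tail
                shrunk = True
                break
    return sum(window) / len(window)                # output the window average
```

Feeding it a long run of zeros followed by ones shows the intended behavior: the window grows while the stream is stationary and sheds the stale prefix after the jump.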
The idea is simple: whenever two "large enough" subwindows of W exhibit "distinct enough" averages, one can conclude that the corresponding expected values are different, and the older portion of the window is dropped. The value of εcut for a partition W0 · W1 of W is computed as follows. Let n0 and n1 be the lengths of W0 and W1 and n be the length of W, so n = n0 + n1. Let µ̂W0 and µ̂W1 be the averages of the values in W0 and W1, and µW0 and µW1 their expected values. To obtain totally rigorous performance guarantees we define:

  m = 1 / (1/n0 + 1/n1)  (harmonic mean of n0 and n1),
  δ′ = δ/n,  and  εcut = √( (1/(2m)) · ln(4/δ′) ).

Our statistical test for different distributions in W0 and W1 simply checks whether the observed averages in the two subwindows differ by more than the threshold εcut. The role of δ′ is to avoid problems with multiple hypothesis testing (since we will be testing n different possibilities for W0 and W1 and we want the global error below δ). Later we will provide a more sensitive test based on the normal approximation that, although not 100% rigorous, is perfectly valid in practice.

Now we state our main technical result about the performance of ADWIN:

Theorem 2.1. At every time step we have:
1. (False positive rate bound). If µt remains constant within W, the probability that ADWIN shrinks the window at this step is at most δ.
2. (False negative rate bound). Suppose that for some partition of W in two parts W0 W1 (where W1 contains the most recent items) we have |µW0 − µW1| > 2εcut. Then with probability 1 − δ, ADWIN shrinks W to W1, or shorter.

ADWIN: Adaptive Windowing Algorithm
1  Initialize window W
2  for each t > 0
3    do W ← W ∪ {xt} (i.e., add xt to the head of W)
4       repeat drop elements from the tail of W
5       until |µ̂W0 − µ̂W1| < εcut holds
6       for every split of W into W = W0 · W1
7       output µ̂W

Figure 1: Algorithm ADWIN.

In practice, the definition of εcut as above is too conservative.
Indeed, it is based on the Hoeffding bound, which is valid for all distributions but greatly overestimates the probability of large deviations for distributions of small variance; in fact, it is equivalent to always assuming the worst-case variance σ² = 1/4. In practice, one can observe that µ̂W0 − µ̂W1 tends to a normal distribution for large window sizes, and use

  εcut = √( (2/m) · σ²W · ln(2/δ′) ) + (2/(3m)) · ln(2/δ′),   (2.1)

where σ²W is the observed variance of the elements in window W. Thus, the term with the square root is essentially equivalent to setting εcut to k times the standard deviation, for k depending on the desired confidence δ, as is done in [4]. Setting δ′ = δ/(ln n) is enough in this context to protect from the multiple hypothesis testing problem.

Let us consider how ADWIN behaves in two special cases: sudden (but infrequent) changes, and slow gradual changes. Suppose that for a long time µt has remained fixed at a value µ, and that it suddenly jumps to a value µ′ = µ + ε. By part (2) of Theorem 2.1 and Equation 2.1, one can derive that the window will start shrinking after O(µ ln(1/δ)/ε²) steps, and in fact will be shrunk to the point where only O(µ ln(1/δ)/ε²) examples prior to the change are left. From then on, if no further changes occur, no more examples will be dropped so the window will expand unboundedly.

In case of a gradual change with slope α following a long stationary period at µ, the average of W1 after n1 steps is µ + αn1/2; we have ε (= αn1/2) ≥ O(√(µ ln(1/δ)/n1)) iff n1 = O(µ ln(1/δ)/α²)^(1/3). So n1 steps after the change the window will start shrinking, and will remain at approximately size n1 from then on. A dependence on α of the form O(α^(−2/3)) may seem odd at first, but one can show that this window length is actually optimal in this setting: it minimizes the sum of the variance error (due to short windows) and the error due to out-of-date data (due to long windows in the presence of change). Thus, in this setting, ADWIN provably adjusts the window automatically to its optimal length, up to multiplicative constants.

Figures 2 and 3 illustrate these behaviors. [Figure 2: Output of algorithm ADWIN with abrupt change; plots of µt, µ̂W and the window width W against t.] [Figure 3: Output of algorithm ADWIN with slow gradual changes.] In Figure 2, a sudden change from µt−1 = 0.8 to µt = 0.4 occurs at t = 1000. In Figure 3, µt gradually descends from 0.8 to 0.2 in the range t ∈ [1000..2000]. In this case, ADWIN cuts the window sharply at t around 1200, keeps the window length bounded (with some random fluctuations) while the slope lasts, and starts growing it linearly again after that.

2.3 Improving time and memory requirements. Our first version of ADWIN is computationally expensive, because it checks exhaustively all "large enough" subwindows of the current window for possible cuts. Furthermore, the contents of the window are kept explicitly, with the corresponding memory cost as the window grows. To reduce these costs we present a new version, ADWIN2, using ideas developed in data stream algorithmics [1, 7, 2, 3] to find a good cutpoint quickly. We next provide a sketch of how these data structures work.

Our data structure is a variation of exponential histograms [3], a data structure that maintains an approximation of the number of 1's in a sliding window of length W with logarithmic memory and update time. We adapt this data structure in a way that can provide this approximation simultaneously for about O(log W) subwindows whose lengths follow a geometric law, with no memory overhead with respect to keeping the count for a single window. That is, our data structure will be able to give the number of 1s among the most recently t − 1, t − ⌊c⌋, t − ⌊c²⌋, ..., t − ⌊cⁱ⌋, ... read bits, with the same amount of memory required to keep an approximation for the whole W. Note that keeping exact counts for a fixed window size is provably impossible in sublinear memory. We go around this problem by shrinking or enlarging the window strategically so that what would otherwise be an approximate count happens to be exact.

More precisely, to design the algorithm one chooses a parameter M, which controls both 1) the amount of memory used (it will be O(M log(W/M)) words), and 2) the closeness of the cutpoints checked (the basis c of the geometric series above will be about c = 1 + 1/M). Note that the choice of M does not reflect any assumption about the time-scale of change: since points are checked at a geometric rate anyway, this policy is essentially scale-independent. We summarize these main results with the following theorem.

Theorem 2.2. The ADWIN2 algorithm maintains a data structure with the following properties:
• It uses O(M · log(W/M)) memory words (assuming a memory word can contain numbers up to W).
• It can process the arrival of a new element in O(1) amortized time and O(log W) worst-case time.
• It can provide the exact counts of 1's for all the subwindows whose lengths are of the form ⌊(1 + 1/M)ⁱ⌋, in O(1) time per query.

Since ADWIN2 tries O(log W) cutpoints, the total processing time per example is O(log W) (amortized) and O(log² W) (worst-case). In the case of real values, we maintain buckets of two elements: capacity and content. We store at content the sum of the real numbers we want to summarize. We restrict capacity to be a power of two. We use O(log W) buckets, and check O(log W) possible cuts.
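As a rough illustration of the exponential-histogram idea that ADWIN2 builds on, the sketch below keeps at most M buckets per power-of-two capacity, merging the two oldest buckets of a size whenever the limit is exceeded. It is our own simplification for intuition only, not the authors' exact structure (in particular it omits the suffix-count queries of Theorem 2.2):

```python
class ExpHistogram:
    """Minimal sketch of an exponential histogram: summarize a stream of
    length W with O(M * log(W/M)) buckets whose capacities are powers of
    two.  Illustrative only; not ADWIN2's exact data structure."""

    def __init__(self, M=5):
        self.M = M
        self.buckets = []        # (capacity, content) pairs, oldest first
        self.total = 0.0         # sum of all stored values
        self.width = 0           # window length W

    def add(self, x):
        self.buckets.append((1, float(x)))   # newest bucket holds one item
        self.total += float(x)
        self.width += 1
        cap = 1
        # merge the two OLDEST buckets of a size once more than M exist;
        # the merge may cascade upward through larger capacities
        while sum(1 for c, _ in self.buckets if c == cap) > self.M:
            i = next(j for j, (c, _) in enumerate(self.buckets) if c == cap)
            merged = (2 * cap, self.buckets[i][1] + self.buckets[i + 1][1])
            self.buckets[i:i + 2] = [merged]
            cap *= 2

    def drop_oldest_bucket(self):
        """Shrink the window from the tail, as ADWIN2 does after a cut."""
        cap, content = self.buckets.pop(0)
        self.total -= content
        self.width -= cap
```

With M = 5 (the value used in the paper's experiments), a window of 10,000 items is summarized by a few dozen buckets rather than 10,000 stored values.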
The memory requirement for each bucket is log W + R + log log W bits, where R is the number of bits used to store a real number. The difference in approximation power between ADWIN and ADWIN2 is almost negligible, so we use ADWIN2 exclusively in our experiments.

3 Experimental Validation of ADWIN2

We construct the following experiments to test the performance of our algorithms. We use, somewhat arbitrarily, M = 5 for all experiments.

In a first experiment, we investigate the rate of false positives of ADWIN2. This is a very important measure, especially when there is a cost associated with a reported change. To do this, we feed ADWIN2 a data stream of 100,000 bits, generated from a stationary Bernoulli distribution with parameter µ, and different confidence parameters δ. Table 1 shows the ratio of false positives obtained. In all cases, it is below δ as predicted by the theory, and in fact much smaller for small values of µ.

Table 1: Rate of false positives

  µ      δ = 0.05   δ = 0.1   δ = 0.3
  0.01   0.0000     0.0000    0.0000
  0.1    0.0001     0.0002    0.0018
  0.3    0.0008     0.0017    0.0100
  0.5    0.0012     0.0030    0.0128

In a second set of experiments, we want to compare ADWIN2 as an estimator with estimations obtained from fixed-size windows, and from fixed-size windows which are flushed when change is detected. In the last case, we use a pair of windows (X, Y) of a fixed size W. Window X is a reference window that contains the first W elements of the stream that occurred after the last detected change. Window Y is a sliding window that contains the latest W items in the data stream. To detect change, we check whether the difference of the averages of the two windows exceeds threshold εcut. If it does, we copy the content of window Y into reference window X, and empty the sliding window Y. This scheme is as in [6], and we refer to it as "fixed-size windows with flushing".
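The "fixed-size windows with flushing" baseline just described can be sketched as follows. The specific threshold below is a simple Hoeffding-style choice we make for illustration, not necessarily the exact constant used in [6]:

```python
import math
from collections import deque

def flushing_detector(stream, W=50, delta=0.05):
    """Sketch of fixed-size windows with flushing: X is a reference window
    holding the first W items after the last detected change, Y slides over
    the latest W items; a change is flagged when their averages differ by
    more than eps_cut, and then Y is flushed into X."""
    eps_cut = math.sqrt(math.log(4.0 / delta) / W)   # illustrative threshold
    X, Y = [], deque(maxlen=W)
    change_points = []
    for t, x in enumerate(stream):
        if len(X) < W:
            X.append(x)                 # still filling the reference window
        Y.append(x)
        if len(X) == W and len(Y) == W:
            if abs(sum(X) / W - sum(Y) / W) > eps_cut:
                change_points.append(t)
                X = list(Y)             # flush: Y becomes the new reference
                Y = deque(maxlen=W)     # ... and the sliding window restarts
    return change_points
```

On a stream that jumps from 0 to 1 at t = 500, the detector stays quiet during the stationary prefix and fires shortly after the jump.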
We build a framework with a stream of synthetic data, and estimators of each class: an estimator that uses ADWIN2, an array of estimators of fixed-size windows of different sizes, and also an array of fixed-size windows with flushing. Our synthetic data streams consist of some triangular wavelets of different periods, some square wavelets, also of different periods, and a staircase wavelet of different values. We test the estimators' performance over a sample of 10⁶ points, feeding the same synthetic data stream to each one of the estimators tested. We compute the average distance (both L1 and L2) from the true probability generating the data stream to the estimation. Finally, we compare these measures for the different estimators.

The general pattern for the triangular or square wavelets is as follows. For any fixed period P, the best fixed-size estimator is the one whose window size is a certain fraction of P. ADWIN2 sometimes does worse than this best fixed-size window, but only slightly, and often does better than even the best fixed size that we try. Additionally, it does better than any window of fixed size W when P is much larger or much smaller than W, that is, when W is a "wrong" time scale. The explanation is simple: if W is too large the estimator does not react quickly enough to change, and if W is too small the variance within the window implies a bad estimation. One can check that ADWIN2 adjusts its window length to about P/4 when P is small, but keeps it much smaller than P for large P, in order again to minimize the variance / time-sensitivity tradeoff.

In a third type of experiments, we test ADWIN2 as a change detector rather than as an estimator, and compare it to Gama's method [4]. The measures of interest here are the rate of changes detected and the mean time until detection.
To do this, we feed ADWIN2 and Gama's change detector with four data streams of lengths L = 2,000, 10,000, 100,000 and 1,000,000 bits, generated from a Bernoulli distribution of parameter µ. We keep µ = 0.2 stationary during the first L − 1,000 time steps, and then make it increase linearly during the last 1,000 steps. We try different slopes: 0 (no change), 10⁻⁴, 2·10⁻⁴, 3·10⁻⁴, and 4·10⁻⁴. To compare the rates of false negatives on an equal footing, we adjust ADWIN2's confidence parameter δ to have the same rate of false positives as Gama's method.

Table 2 shows the results for two of the data streams. Rows are grouped in parts, corresponding to the values of L that we tested. For each value of L, we give the number of changes detected in the last 1,000 samples (summed over all runs) and the mean and standard deviation of the time until the change is detected, in those runs where there is detection. The first column gives the ratio of false positives. One observation we made is that Gama's method tends to detect many more changes early on (when the window is small) and fewer changes as the window grows. This explains why, in the first column, even if the ratio of false positives is the same, the average time until the first false positive is produced is much smaller for Gama's method. The last four columns describe the results when change does occur, with different slopes. ADWIN2 detects change more often, with the exception of the L = 2,000 experiment. As the number of samples increases, the percentage of changes detected decreases with Gama's method; as discussed earlier, this is to be expected since it takes a long time for Gama's method to overcome the weight of past examples. In contrast, ADWIN2 maintains a good rate of detected changes, largely independent of the number of past samples L − 1,000.
One can observe the same phenomenon as before: even though Gama's method detects fewer changes, the average time until detection (when detection occurs) is smaller.

4 Experiments with Naïve Bayes

4.1 An Incremental Naïve Bayes Predictor. We compare two time-change management strategies. The first one uses a static model to make predictions. This model is rebuilt every time that an external change detector module detects a change. We use Gama's detection method and ADWIN2 as change detectors. Gama's method generates a warning example some time before actually declaring change (see [4] for the details); the examples received between the warning and the change signal are used to rebuild the model. In ADWIN2, we use the examples currently stored in the window to rebuild the static model.

The second one is incremental: we simply create an instance Ai,j,c of ADWIN2 for each count Ni,j,c, and one for each value c of C. When a labelled example is processed, we add a 1 to Ai,j,c if xi = vj ∧ C = c, and a 0 otherwise, and similarly for Nc. When the value of Pr[xi = vj ∧ C = c] is required to make a prediction, we compute it using the estimate of Ni,j,c provided by Ai,j,c. This estimate varies automatically as Pr[xi = vj ∧ C = c] changes in the data.

Note that different Ai,j,c may have windows of different lengths at the same time. This will happen when the distribution is changing at different rates for different attributes and values, and there is no reason to sacrifice accuracy in all of the counts Ni,j,c only because a few of them are changing fast. This is the intuition for why this approach may give better results than one monitoring the global error of the predictor: it has more accurate information on at least some of the statistics that are used for the prediction. Results on synthetic data are omitted in this version.

4.2 Real-world data experiments. We test the performance of our Naïve Bayes predictors using the Electricity Market Dataset described by M.
Harries [5] and used by Gama [4]. This is a real-world dataset where we do not know when drift occurs, or whether there is drift at all; hence it is not possible to build a static model for comparison as we did before. This data was collected from the Australian New South Wales Electricity Market. The prices are not fixed and are affected by demand and supply in the market. The ELEC2 dataset contains 45,312 instances dated from 7 May 1996 to 5 December 1998. Each example of the dataset refers to a period of 30 minutes, i.e. there are 48 instances for each day. Each example in the dataset has 5 fields: the day of week, the time stamp, the NSW electricity demand, the Vic electricity demand, the scheduled electricity transfer between states, and the class label. The class label identifies the change of the price relative to a moving average of the last 24 hours. The class label only reflects deviations of the price on a one-day average and removes the impact of longer-term price trends.

At each time step, we train a static model using the last 48 samples received. We compare this static model with other models, also on the last 48 samples. Table 3 shows accuracy results. In each column (a test), we show in boldface the result for ADWIN2 and the best result. ADWIN2 applied in the incremental time-change model does much better than all the others, with the exception of the shortest fixed-length window, which achieves 86.44% of the static performance compared to ADWIN2's 83.62%. The reason for this anomaly is the nature of this particular dataset: by visual inspection, one can see that it contains a lot of short runs (length 10 to 20) of identical values, and therefore a myopic strategy (i.e., a short window) gives the best results. ADWIN2 behaves accordingly and shortens its window as much as it can, but the formulas involved do not allow windows as short as 10 elements.
In fact, we have tried replicating each instance in the dataset 10 times, and then ADWIN2 becomes the winner again.

We also test the prediction accuracy of these methods. We compare, as before, a static model generated at each time t to the other models, and evaluate them by asking them to predict the instance that will arrive at time t + 1. The static model is computed by training on the last 24 samples. The results are shown in Table 3. Generally, the incremental time-change management model does much better than the static model that refreshes its NB model when change is detected.

Table 2: Change detection experiments.

2·10³ samples, 10³ trials:
  Slope                      0                 10⁻⁴         2·10⁻⁴       3·10⁻⁴       4·10⁻⁴
  Detection time (Gama)      854 ± 462         532 ± 271    368 ± 248    275 ± 206    232 ± 178
  % runs detected (Gama)     10.6              58.6         97.2         100          100
  Detection time (ADWIN2)    975 ± 607         629 ± 247    444 ± 210    306 ± 171    251 ± 141
  % runs detected (ADWIN2)   10.6              39.1         94.6         93           95

10⁵ samples, 100 trials:
  Detection time (Gama)      12,164 ± 17,553   127 ± 254    206 ± 353    440 ± 406    658 ± 422
  % runs detected (Gama)     12                4            7            11           8
  Detection time (ADWIN2)    47,439 ± 32,609   878 ± 102    640 ± 101    501 ± 72     398 ± 69
  % runs detected (ADWIN2)   12                28           89           84           89

Table 3: Naïve Bayes, Electricity data benchmark, testing on the last 48 items and on the next instance.

                                     Testing on last 48 items     Testing on next instance
                                     (Static = 91.62%)            (Static = 94.40%)
  Width                              %Dynamic  %Dynamic/Static    %Dynamic  %Dynamic/Static
  Gama Change Detection              45.94%    50.14%             45.87%    48.59%
  ADWIN2 Change Detection            60.29%    65.81%             46.86%    49.64%
  ADWIN2 for counts                  76.61%    83.62%             72.71%    77.02%
  Fixed-sized Window 32              79.13%    86.44%             71.54%    75.79%
  Fixed-sized Window 128             72.29%    78.97%             68.78%    72.87%
  Fixed-sized Window 512             68.34%    74.65%             67.14%    71.13%
  Fixed-sized Window 2048            65.02%    71.02%             64.25%    68.07%
  Fixed-sized flushing Window 32     78.57%    85.83%             71.62%    75.88%
  Fixed-sized flushing Window 128    73.46%    80.24%             70.12%    74.29%
  Fixed-sized flushing Window 512    69.65%    76.08%             68.02%    72.07%
  Fixed-sized flushing Window 2048   66.54%    72.69%             65.60%    69.50%

5 Conclusions

We have described a new method for dealing with distribution change and concept drift when learning from data sequences that may vary with time. We developed an algorithm, ADWIN, using sliding windows whose size is recomputed online according to the rate of change observed from the data in the window itself. This delivers the user from having to choose any parameter (for example, window size), a step that most often ends up being guesswork. Client algorithms can thus simply assume that ADWIN stores the currently relevant data. We tested on both synthetic and real datasets, and showed that ADWIN2 really adapts its behavior to the characteristics of the problem at hand.

References

[1] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proc. 21st ACM Symposium on Principles of Database Systems, 2002.
[2] B. Babcock, M. Datar, and R. Motwani. Sampling from a moving window over streaming data. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms, 2002.
[3] M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. SIAM Journal on Computing, 14(1):27–45, 2002.
[4] J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. In SBIA Brazilian Symposium on Artificial Intelligence, pages 286–295, 2004.
[5] M. Harries. Splice-2 comparative evaluation: Electricity pricing. Technical report, The University of New South Wales, 1999.
[6] D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. In Proc. 30th VLDB Conf., Toronto, Canada, 2004.
[7] S. Muthukrishnan. Data streams: Algorithms and applications. In Proc. 14th Annual ACM-SIAM Symposium on Discrete Algorithms, 2003.

Explanation & Answer


Surname 1
Student’s Name
The OpenFlow network
The study presents a prototype of a flow-based method for measuring traffic statistics in an
OpenFlow network, which the authors argue is more adaptive and better suited to observing
flow processing in an SDN environment. OpenFlow decouples the data plane from the control
plane: a controller stores the control policy and issues the commands that tell devices how to
forward packets.
The researchers aim to provide a strategy that accumulates data such as packet count, byte
count and drop count, and stores it in a data structure for later retrieval. The module organizes
the data into two types: per-port and per-flow statistics. The controller collects data from the
switches periodically and makes it available for presentation. Per-port collection provides the
basic way of gathering traffic statistics, while port utilization offers easy identification of the
devices involved. A Flow Diagnostic Module regulates how the collected statistics are
processed and preserved.
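The periodic per-flow collection described above can be sketched as a simple polling loop. Everything below is hypothetical: `switch.flow_stats()` is an assumed stand-in for an OpenFlow flow-statistics request (the paper does not expose this exact API), and the field names are illustrative:

```python
import time

def poll_flow_stats(switch, interval=5.0, rounds=3):
    """Hypothetical sketch of a per-flow polling loop: every `interval`
    seconds, read the flow-table statistics and derive per-flow byte rates
    from the difference between consecutive byte counters."""
    last = {}       # flow key -> byte_count at the previous poll
    rates = {}      # flow key -> most recent bytes/sec estimate
    for _ in range(rounds):
        for entry in switch.flow_stats():          # one record per flow entry
            key = entry["match"]                   # header fields identifying the flow
            prev = last.get(key, entry["byte_count"])
            rates[key] = (entry["byte_count"] - prev) / interval
            last[key] = entry["byte_count"]
        time.sleep(interval)
    return rates
```

A real implementation would issue OpenFlow statistics requests through the controller rather than call a `switch` object directly; the sketch only illustrates the rate computation.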
To validate the module and its configuration, the authors use Mininet to create a simulated
OpenFlow environment. The module collects the data, analyzes the traffic load and presents
the results in a weather map. Detailed data from the switches, including port statistics, is
listed on a web interface. Additionally, the method can identify the actions, headers and
traffic statistics of each flow.

The result is a flow-based measurement of traffic statistics in a software-defined environment
that can provide statistics for each individual flow. The authors conclude that the method
makes traffic measurement more adaptive and easier to observe.
Adaptive online prediction
In this paper, the authors propose online forecasting algorithms for data streams. A data
stream has two main properties: it changes over time, and it is too vast to store in its entirety
since new data arrives continuously.
The sliding window is a natural way of dealing with a data stream. A sliding window keeps
only recent data: as new data comes in, the oldest items are discarded from the window and
the newest are added. Prediction algorithms then use the data in the sliding window to predict
forthcoming data.
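The sliding-window idea just described can be sketched in a few lines. This is a toy illustration only; the paper's WW and WWH algorithms weight the windows far more cleverly than a plain average:

```python
from collections import deque

def sliding_window_forecast(stream, w=5):
    """Toy sliding-window predictor: forecast the next value as the
    average of the last w observations."""
    window = deque(maxlen=w)      # maxlen makes the oldest item drop out automatically
    predictions = []
    for x in stream:
        if len(window) == w:
            predictions.append(sum(window) / w)   # forecast made before seeing x
        window.append(x)
    return predictions
```

On the stream 1, 2, 3, 4, 5, 6, 7 with w = 3, the forecasts for the fourth value onward are 2, 3, 4 and 5, each lagging behind the true trend, which is exactly the weakness the weighted-window algorithms try to address.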
Under the assumption that, in a time-changing data stream, newer data is more relevant than
older data, the authors modify the sliding-window approach so that it works well on
prediction tasks over data streams. The goal is to perform as well as the base predictor run
over the best fixed-size sliding window on real time-series data. The research evaluated the
proposed algorithms against other methods on both artificial and real time-series data. The
proposed algorithms are WW (Weighted Window) and WWH (Weighted Window with
follow the leading history). For real data they used the Nikkei 225, a stock market index of
the Tokyo Stock Exchange.
From the experiments on artificial data, which included a sudden change at t = 500, a
continuous (gradual) change from t = 300, a temporal change in which outliers appear after
200 steps, and random sequences that change on every trial, the WW and WWH algorithms
perform better than the other algorithms, and the cumulative losses of the two are close.
On the continuously changing data, ADWIN does better than the other algorithms apart
from the best fixed windows. The experiments also find that the adaptive regret of WWH
is lower than that of WW, in line with its stronger theoretical guarantee. ADWIN showed
good performance under gradual change of the data, while KAARCh performed poorly.
Likewise, the results on the real data show that WW and WWH perform best among the
compared algorithms. On t...
