SDN Test Suite – Methodology

December 27, 2013 by sdn-test-suite

Typical SDN-based Network Virtualization

The architecture rolled out can be one of the following:

  • Pure-overlay: Programmable virtual dataplane elements (vDP) are inserted into edge servers and controlled by the controller cluster. They form overlay networks using tunnels that are routed over the legacy network fabric. The gateway can be physical or virtual, and participates in the overlay networking (a minimal tunnel example follows this list).
  • Pure-underlay: SDN-enabled top-of-rack or leaf physical switches are deployed and controlled by the controller cluster. The hardware may or may not expose an open API (like OpenFlow). The controller integrates with the VM manager to obtain all information essential to create virtual networks that originate from the server.
  • Combined (aka, P+V or Overlay-Underlay): Vendors are converging towards an architecture in which the overlay and underlay work together to provide both flexibility and performance. This architecture aims for end-to-end visibility and seamless control across the virtual and physical layers.
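To make the overlay bullet concrete, here is a minimal sketch of what a pure-overlay vDP set-up amounts to, assuming the vDP is Open vSwitch and the tunnels are VXLAN; the bridge name, peer IP, and tunnel key are placeholders, not part of any particular vendor's solution.

# integration bridge on the edge server; peer hypervisor IP and tunnel key (VNI) are placeholders
$ ovs-vsctl add-br br-int
$ ovs-vsctl add-port br-int vxlan0 -- set interface vxlan0 type=vxlan \
    options:remote_ip=192.168.100.2 options:key=5001

The controller cluster would then program flows on br-int so that each tenant's traffic is mapped to its own tunnel key before it crosses the legacy fabric.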

Testing these solutions irrespective of the architectural choice makes it essential to adopt a black-box testing methodology that leverages components from both the computing world and the networking world.

Classes of Tests

For testing Network Virtualization, we divided our tests into four main categories:

Functionality tests:
Essentially these test the claimed (and unclaimed) features of the network-virtualization solution, including the following.

Setup for functionality and data plane tests

Setup for functionality and data plane tests using bare-metal servers

  • isolation between different virtual networks for unicast, broadcast, and multicast traffic
  • ability to handle overlapping IP ranges and overlapping MAC addresses (a provisioning sketch for this case follows the list)
  • ability to provision QoS configurations on a per-virtual-network basis
  • ability to live-migrate virtual machines across L2 and L3 boundaries
  • ability to connect to the legacy world (L4-L7 appliances, bare-metal servers, the Internet)
  • tolerance of controller faults, and limited impact of data plane faults (vSwitch, ports, VMs)
  • IPv6 support
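As referenced in the overlapping-IP bullet above, here is a hedged provisioning sketch using the OpenStack Neutron CLI (network and subnet names are placeholders); it creates two virtual networks with the same IP range so that isolation and overlap handling can be exercised.

$ neutron net-create tenantA-net
$ neutron subnet-create tenantA-net 10.10.10.0/24 --name tenantA-subnet
$ neutron net-create tenantB-net
$ neutron subnet-create tenantB-net 10.10.10.0/24 --name tenantB-subnet
# boot one test VM on each network, then verify that same-network pings succeed
# while cross-network unicast, broadcast, and multicast traffic stays isolated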

Data plane tests: 
In this series of tests, we compare the throughput, scalability, and other data plane parameters of the SDN solution against a baseline (i.e., a configuration without the solution). In our case, the baseline is a non-virtualized network that uses OpenStack's nova-network for networking and the Linux bridge module for communication between VMs on a host server. The primary tests in this category measure the following metrics for communication between two VMs, along with the corresponding CPU utilization of the host servers.

Each test below lists what is measured, the command run on VM 1 (10.10.10.1), and, where applicable, the command run on VM 2 (10.10.10.2).

  • Average flow setup latency (measured in millisecs)
    VM 1:
    $ for i in `seq 1 20`; do
        ping -c 1 -q 10.10.10.2 >> /tmp/single-packet.ping; sleep 10;
      done

  • Average steady-state latency (millisecs)
    VM 1:
    $ ping -c 50000 -f 10.10.10.2

  • Max TCP unicast throughput under varying MSS (viz., 90 bytes, 1490 bytes, 9000 bytes)
    — Intra-domain (L2 switching between two servers)
    — Inter-domain (L3 routing across the gateway)
    VM 1:
    $ for i in 90 1490 9000; do
        iperf -c 10.10.10.2 -t 60 -m -M $i
      done
    VM 2:
    $ iperf -s

  • Max UDP throughput under a varying number of parallel sessions (viz., 1 and 5)
    — Intra-domain (L2 switching between two servers)
    — Inter-domain (L3 routing across the gateway)
    VM 1:
    $ for i in 1 5; do
        iperf -c 10.10.10.2 -u -t 60 -b 10G -P $i
      done
    VM 2:
    $ iperf -s -u

  • Max multicast/broadcast throughput (in packets/sec)
    VM 1:
    $ ping -c 1000000 -f -b 10.10.10.255

  • Max ARP traffic forwarded (measured in packets/sec)
    VM 1:
    $ time arping -c 1000000 -w 0 10.10.10.2 | tail

  • Max allowed TCP flows between known hosts with zero drops (measured in flows/sec)
    VM 1:
    $ for i in 10 100 1000 10000; do
        hping3 -c `expr $i \* 30` -i u`expr 1000000 / $i` -S -q -I <iface> -p ++1024 10.10.10.2
      done

All tests and measurements need to be conducted between two VMs on 1) the same server, and 2) two different servers.
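One way to capture the corresponding CPU utilization on the host servers during these runs (not necessarily the exact harness we use) is to sample it with the sysstat tools for the duration of the test, e.g. a 60-second iperf run:

# on the hypervisor, sample overall CPU usage once per second for the 60-second test window
$ sar -u 1 60 > /tmp/hypervisor-cpu.log
# or per core, which is useful once the test VM's vCPUs are pinned
$ mpstat -P ALL 1 60 > /tmp/hypervisor-cpu-percore.log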


Setup for control plane tests

Control plane tests:
SDN solutions introduce an additional component, the controller, that must be tested for bottlenecks and scalability issues. In our set of control plane tests, after initiating tests on the compute nodes, we connect to the machine running the controller (or controller cluster) and measure the controller's CPU utilization and other scalability parameters. As shown in the figure, the controller connects to several vDPs (i.e., vSwitches or VM appliances, each running inside a VM); this is how we stress the controller. Tests for the control plane include the following (a measurement sketch follows the list):

Metrics measured:
  • Controller CPU usage (%)
  • Response time (in millisecs)
  • Flow drop rate (in flows/sec)

Tests:
  • Increasing the number of hypervisors, vSwitches, and VNets
  • Increasing the number of VMs (i.e., MAC and IP addresses)
  • Injecting varying levels of illegal traffic
  • Controller outage in a redundant setup
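As a minimal measurement sketch for the first two metrics, the controller's CPU usage and northbound response time can be sampled as shown below; the controller process name and REST URL are assumptions (substitute the vendor's actual values), not part of any specific solution.

# CPU usage of the controller process, sampled once per second for a minute (process name is an assumption)
$ pidstat -p $(pgrep -f sdn-controller) 1 60 > /tmp/controller-cpu.log
# response time of a northbound REST call (endpoint is a placeholder)
$ curl -o /dev/null -s -w "%{time_total}\n" http://controller:8080/networks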

Management plane tests:
These tests have typically been ignored in the past. With the growing need for agility, we now have to pay attention to the overall orchestration and its impact on data plane traffic. The main tests for this plane include the following (a timing sketch follows the list):

  • Onboarding time for REST API- or GUI-based provisioning
  • Onboarding time for CLI-based provisioning
  • Time to propagate REST API, GUI, and CLI changes
  • Stress test of the management plane CPU
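As a rough timing sketch, onboarding time for CLI- and REST-based provisioning can be measured by timing a single network-creation call, assuming the solution plugs into Neutron; the names, endpoint, and token below are placeholders.

# CLI-based onboarding time for one virtual network
$ time neutron net-create onboarding-test-net
# REST-based onboarding time via the Neutron API (endpoint and token are placeholders)
$ time curl -s -X POST http://controller:9696/v2.0/networks \
    -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
    -d '{"network": {"name": "onboarding-test-net"}}'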

Test Suite Design and Architecture

As we mentioned earlier, we need access to the VMs, hypervisors, and the network configuration manager to orchestrate our SDN tests. As you can tell, we are big fans of open source, and we decided to go the OpenStack route as a uniform way to plug into the network and compute configuration pipeline. This decision was also influenced by the fact that most SDN vendors offer a way to tie their solutions into an OpenStack cloud deployment.

The test suite has three main pieces:

  • Virtual Network Configuration Module: Using OpenStack Neutron, this component creates, configures, and manages Virtual Networks (VNs) – the basic unit of networking in a virtualized network.
  • VM Configuration Module: Using the APIs offered by OpenStack Nova, this component is responsible for booting up test VMs on specific hypervisors. Another feature we plan to add to Nova is the ability to pin VMs to specific CPUs on the host, so that we can measure the number of CPU cores used by the vSwitch for processing, especially while it performs CPU-intensive operations such as tunneling.
  • Test Initiation and Measurement: This component logs into the test VMs and test hosts (i.e., the hypervisors of the compute nodes) via SSH, initiates tests, and performs measurements. We currently run commands over SSH on each machine using Python's Fabric toolkit (similar in spirit to Ansible, Chef, or Puppet); a stripped-down sketch of what these tasks amount to follows this list.
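Stripped of the Fabric wrapping, the test initiation and measurement step amounts to something like the following (IPs, users, and file paths are placeholders):

# start the iperf server on VM 2, then run the client from VM 1 and collect the result
$ ssh root@10.10.10.2 'iperf -s -D'
$ ssh root@10.10.10.1 'iperf -c 10.10.10.2 -t 60 -m' > /tmp/intra-domain-tcp.log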

Challenges

  • Hypervisor CPU usage measurement: In virtualized environments, the CPU is used by both the hypervisor and the VMs for processing; at times they even share physical CPUs. In such cases, it is crucial to pin the virtual CPUs of the VMs to specific physical CPUs so that the VMs' utilization can be excluded from the hypervisor CPU usage computation (see the host-level commands after this list).
  • Maximizing TCP throughput: Jumbo frame support is essential on all interfaces and across the network fabric to maximize the TCP throughput achieved for bulk data transfer. It allows the TCP maximum segment size to be as large as 9000 bytes.
  • Emulating multiple hypervisors: Most SDN solutions today are vertically integrated and closed, which makes it difficult to simply use network-namespace approaches (like Mininet) to emulate multiple vSwitches. Since it is not easy to dedicate 1000 servers for a PoC, we emulate 1000 vSwitches by installing each one inside a VM.
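For the first two challenges, the host-level knobs involved look roughly like the following on a KVM/libvirt hypervisor; the domain name, core number, and interface name are placeholders.

# pin vCPU 0 of a test VM to physical core 2, so the VM's cycles can be excluded
# from the hypervisor CPU usage computation (domain name and core are placeholders)
$ virsh vcpupin instance-00000042 0 2
# enable jumbo frames on the hypervisor uplink carrying the tunneled traffic
$ ip link set dev eth1 mtu 9000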


Resources


Here is the CentOS VM image that we used for testing. The image is in QCOW2 format and can be imported into the OpenStack controller for spawning tenant VMs on Red Hat KVM platforms; a typical import command follows the login details below. It comes with several tools pre-installed, such as iperf, hping, arping, and tcpdump.

  • Login: root, Password: centos
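A typical way to import the image into Glance before spawning tenant VMs looks like this (the file and image names are placeholders for the downloaded image):

$ glance image-create --name centos-test-vm --disk-format qcow2 \
    --container-format bare --is-public True --file centos-test-vm.qcow2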
