SDN Test Suite: Results for Floodlight and OpenContrail

We compared the functionality and data plane performance of the following three approaches to virtual networking:

  1. OpenContrail: This overlay-SDN solution uses a BGP-based control plane and MPLS-over-GRE (MPLSoGRE) encapsulation in the data plane (see the encapsulation sketch below). Each virtual dataplane (called a vRouter) is also a full-fledged routing gateway.
    The custom devstack snapshot is from Nov 20, 2013 (commit 4e030aeeedc870364152d49bbaec639e56b06ae0), the controller is from Nov 19, 2013 (commit 89f6aadbe6fa0202b4d4aa45ad39c5b9ead0070c), and the vRouter is from Nov 10, 2013 (commit 47856fbab60f8b4f49de0b5293001f599e46dfb3).
  2. FloodLight + IVS: This overlay-SDN solution uses an OpenFlow-based controller for the control plane. Each virtual dataplane (called the Indigo Virtual Switch, or IVS) uses the Open vSwitch kernel module in combination with a stub that supports extended OpenFlow.
    The IVS snapshot is from Oct 15, 2013 (commit ad6ae82fb4255a6837a5847211698555c4580318) and the FloodLight controller is from Sept 17, 2013 (commit b0597f122f8cb9e66c47c6e93c75fa7da615b504).
  3. Linux bridge: This is the basic virtual networking used by nova-network and it served as our baseline non-SDN solution.

Note that we chose the open-source versions of the first two solutions; they may not be representative of the commercial versions offered by Juniper and Big Switch.
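To make the MPLSoGRE data plane concrete, here is a minimal Scapy sketch of what an MPLS-over-GRE encapsulated packet between two hypervisors looks like. The addresses and the MPLS label are hypothetical, and the sketch only illustrates the general encapsulation format, not the exact headers a vRouter emits.

```python
# Illustrative sketch of MPLS-over-GRE ("layer 3 in layer 2.5") encapsulation.
# All addresses and the MPLS label are hypothetical examples.
from scapy.all import Ether, IP, GRE, ICMP
from scapy.contrib.mpls import MPLS

# Inner packet: VM-to-VM traffic inside a virtual network (the overlay).
inner = IP(src="192.168.1.10", dst="192.168.1.20") / ICMP()

# Outer packet: hypervisor-to-hypervisor transport (the underlay).
# GRE protocol 0x8847 marks the payload as MPLS unicast; the label tells the
# receiving virtual dataplane which routing instance the inner packet belongs to.
outer = (
    Ether()
    / IP(src="10.1.1.1", dst="10.1.1.2")
    / GRE(proto=0x8847)
    / MPLS(label=16, ttl=64)
    / inner
)

outer.show()
```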

Environment details

Each solution was run on the exact same setup as described below.

  • Dataplane components: The dataplane tests used two servers (24-core Xeon X5650 @ 2.67 GHz, 128 GB RAM, running CentOS 6.4) interconnected by a Dell PowerConnect 8024 24-port 10 GbE switch. No SR-IOV was used.
  • Workload VM information: Each server ran KVM. The two VMs spawned for the functionality and dataplane tests each had 4 vCPUs pinned 1:1 to host CPUs (see the pinning sketch after this list), 8 GB of memory, and a 12 GB hard disk in the qcow2 format.
  • Controller server information: Each SDN solution had a controller running on a bare-metal CentOS 6.4 server with an 8-core Xeon X5550 @ 2.67 GHz and 32 GB RAM.
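For reference, a 1:1 vCPU pinning like the one above can be applied along the lines of the following sketch using the libvirt Python bindings; the domain name, the chosen host CPUs, and the use of libvirt directly (rather than libvirt XML or Nova scheduler hints) are all assumptions for illustration.

```python
# Sketch: pin each vCPU of a running KVM guest 1:1 to a host CPU via the
# libvirt Python bindings. The domain name ("test-vm") and the chosen host
# CPUs (0-3) are hypothetical examples.
import libvirt

HOST_CPU_COUNT = 24                  # logical CPUs on the hypervisor
PINNED_HOST_CPUS = [0, 1, 2, 3]      # one dedicated host CPU per vCPU

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("test-vm")

for vcpu, host_cpu in enumerate(PINNED_HOST_CPUS):
    # cpumap has one boolean per host CPU; setting exactly one entry to True
    # restricts this vCPU to that host CPU.
    cpumap = tuple(i == host_cpu for i in range(HOST_CPU_COUNT))
    dom.pinVcpu(vcpu, cpumap)

conn.close()
```

Pinning the vCPUs keeps host-scheduler jitter out of the latency and throughput numbers, which is presumably why it was used here.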

Functionality test results

| Category | Test | OpenContrail | FloodLight |
|---|---|---|---|
| Isolation | Provision 2 VNets (i.e., virtual networks) and 2 VMs with the same sets of IPs (within their own VNets). Ensure reachability within each VNet. | Pass | Pass |
| | Deprovisioning a VNet disallows reachability immediately and stops any ongoing VM traffic in that VNet. | N/A[1] | Fail[2] |
| | VNet A does not see VNet B IP unicast traffic to valid IPs in VNet B. | Pass | Pass |
| | VNet A does not see VNet B IP flood traffic to invalid/broadcast IPs (arp -s should be done on the source). | Pass | Fail[3] |
| Rate limiting | Set a rate limit of 10 Mbps for logical ports and verify compliance through iperf. | N/A | N/A[4] |
| | Set a rate limit of 10 Mbps for select logical ports and verify that it does not affect other logical ports of the same VNet. | N/A | N/A |
| | Set a rate limit of 10 Mbps for select logical ports and verify that it does not affect logical ports of a different VNet. | N/A | N/A |
| Fault tolerance | Start a ping between two VMs in the same VNet, then turn off the controller gracefully. Verify that the ping continues. | Fail-close | Pass |
| | Start a ping between two VMs in the same VNet, then turn off the controller ungracefully. Verify that the ping continues. | Pass | Pass |
| | Start a ping between two VMs in the same VNet, turn off the controller gracefully, and then start a new iperf session between the two VMs. | Fail-close[5] | Pass |
| | Start a ping between two VMs in the same VNet, turn off the controller ungracefully, and then start a new iperf session between the two VMs. | Pass | Pass |
| | Turn off the controller and then start a new ping between two VMs in the same VNet. List whether the behavior is fail-open or fail-close. | Fail-close | Fail-close |
| | Start a ping between two VMs in the same VNet, turn off the controller, stop the ping, then progressively increase the wait time and restart the ping to identify the idle timeout (see the probe sketch after the notes below). | N/A | 5 secs |
| | Controller fault tolerance through multiple controller instances. | Pass (untested) | N/A |
| | Restart the virtual switch and ensure traffic flows. | Pass | Pass |
| VM migration | Start a ping between two VMs in the same VNet. Shut down one VM, migrate it to a different server, and boot it up. Verify that the ping works between the VMs. | N/A | Pass |
| | Start a ping between two VMs in the same VNet, and migrate one VM to a different server. Verify that the ping continues and list the downtime. | N/A | Fail[6] |
| | Start a ping between two VMs in the same VNet, turn off the controller, migrate a VM to a new server, restart the controller, and then ping again to verify that it works. | N/A | Pass |
| Value-added features | Other additional features: MPLS+GRE v3 encapsulation and decapsulation (layer 3 in layer 2.5); link bonding at the virtual switch to increase aggregate throughput; proxy ARP reply from either the controller or the vSwitch; in-built DHCP at the control plane. | Yes | N/A |
  1. The workflow disallows deleting VNets with VMs in them.
  2. Existing cross-VM traffic continues to work.
  3. Broadcast is sent to all VMs rather than being restricted to the tenant.
  4. There is no rate-limiting feature in the basic branch.
  5. All communication within a VNet stops when the controller is turned off.
  6. Once migration is complete, the VMs involved in existing flows cannot communicate; the migrated VM can ping other VMs.
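To make the idle-timeout test concrete, here is a minimal sketch of the probe loop, run from inside the source VM after the controller has been turned off; the peer address and the range of idle gaps are hypothetical.

```python
# Minimal sketch of the idle-timeout probe: after the controller is stopped,
# keep pinging the peer VM, then insert progressively longer idle gaps and
# re-ping. The first gap after which the ping fails approximates the flow
# idle timeout. The peer IP is a hypothetical example.
import subprocess
import time

PEER = "192.168.1.20"   # hypothetical address of the other VM in the VNet

def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Return True if a single ICMP echo to host succeeds."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

# Warm up the flow right after the controller has been turned off.
assert ping_once(PEER), "peer not reachable before the idle-timeout probe"

for idle_gap in range(1, 31):          # try idle gaps of 1..30 seconds
    time.sleep(idle_gap)               # let the flow sit idle
    if not ping_once(PEER):
        print(f"ping failed after an idle gap of ~{idle_gap} s; "
              f"flow idle timeout is roughly that long")
        break
    print(f"ping still works after an idle gap of {idle_gap} s")
else:
    print("no idle timeout observed up to 30 s")
```

In the runs above, FloodLight stopped forwarding once the flow had been idle for roughly 5 seconds, which is the figure reported in the table.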

Data plane test results

| Test | Server | Variable | OpenContrail | Floodlight | Linux bridge |
|---|---|---|---|---|---|
| RTT (ms) for ICMP ping between two VMs | Same server | Single-packet, time-deferred | 0.408 ms +/- 0.035 ms | 1.102 ms +/- 0.594 ms[1] | 0.336 ms +/- 0.237 ms |
| | | Steady-state ICMP flood | 0.129 ms +/- 0.157 ms, 0% loss | 0.122 ms +/- 0.029 ms, 0% loss | 0.119 ms +/- 0.085 ms, 0% loss |
| | Across servers | Single-packet, time-deferred | 0.728 ms +/- 0.406 ms | 1.536 ms +/- 0.941 ms | 0.452 ms +/- 0.355 ms |
| | | Steady-state ICMP flood | 0.112 ms +/- 0.018 ms, 0% loss | 0.162 ms +/- 0.036 ms, 0% loss | 0.150 ms +/- 0.078 ms, 0% loss |
| Unidirectional TCP transfer between two VMs (see the driver sketch after the notes below) | Same server | MSS 68 bytes | 11.4 Gbps, CPU 197%[2] | 9.18 Gbps, CPU 195.0% | 7.48 Gbps, CPU 41.0% |
| | | MSS 1468 bytes | 10.8 Gbps, CPU 203.1% | 11.12 Gbps, CPU 197.0% | 7.70 Gbps, CPU 31.8% |
| | | MSS 8908 bytes | 11.2 Gbps, CPU 197.2% | 8.87 Gbps, CPU 192% | 8.54 Gbps, CPU 57% |
| | Across servers | MSS 68 bytes | Kernel panic on client[3] | 0.82 Gbps, CPU 15.6% | TODO |
| | | MSS 1468 bytes | 3.99 Gbps, CPU 55.4% | 6.67 Gbps, CPU 100.9% | TODO |
| | | MSS 8908 bytes | 6.91 Gbps, CPU 67.1% | 7.88 Gbps, CPU 109.4% | TODO |
| Unidirectional UDP for 60 secs between two VMs | Same server | p=1 sessions | 2.28 Gbps, CPU 176.0%, 33% loss | 1.73 Gbps, CPU 201.0%, 46% loss | 1.73 Gbps, CPU 61.0%, 46% loss |
| | | p=5 sessions | 2.92 Gbps, CPU 204.1%, 26% loss | 3.59 Gbps, CPU 209.0%, 15% loss | 3.59 Gbps, CPU 42.0%, 15% loss |
| | Across servers | p=1 sessions | 1.33 Gbps, CPU 106.0%, 64% loss | 2.84 Gbps, CPU 99.3%, 27% loss | 996 Mbps, CPU 80%, 46% loss |
| | | p=5 sessions | 1.59 Gbps, CPU 114.9%, 51% loss | 3.31 Gbps, CPU 103.7%, 15% loss | 2.01 Gbps, CPU 98.0%, 44% loss |
| Broadcast packets between two VMs | Same server | ARP | 0% loss, CPU 21.7% | 97% loss[4], CPU 165.4% | 11% loss, CPU 23.7% |
| | | ICMP | 0% loss, CPU 120.5% | 0% loss, CPU 192.6% | 0% loss, CPU 19% |
| | Across servers | ARP | 0% loss, CPU 58.9% | 99% loss[4], CPU 170.1% | 82% loss, CPU 22% |
| | | ICMP | 0% loss, CPU 45.4% | 0% loss, CPU 182.6% | 0% loss, CPU 2% |
| Multiple TCP sessions between two VMs with static MAC | Same server | 10 flows/sec | 0% loss, CPU 3% | 0% loss, CPU 3% | 0% loss, CPU TODO |
| | | 100 flows/sec | 0% loss, CPU 12% | 0% loss, CPU 10.7% | 0% loss, CPU TODO |
| | | 1,000 flows/sec | 0% loss, CPU 18.1% | 0% loss, CPU 16.1% | 0% loss, CPU TODO |
| | | 10,000 flows/sec | 2% loss, CPU 34.4% | 2% loss, CPU 75.3% | 0% loss, CPU TODO |
| | Across servers | 10 flows/sec | 0% loss, CPU 3% | 0% loss, CPU 1% | 0% loss, CPU 2% |
| | | 100 flows/sec | 0% loss, CPU 10.9% | 0% loss, CPU 3% | 0% loss, CPU 2% |
| | | 1,000 flows/sec | 0% loss, CPU 15.9% | 0% loss, CPU 10.1% | 0% loss, CPU 32.9% |
| | | 10,000 flows/sec | 2% loss, CPU 25.0% | 2% loss, CPU 31.8% | 0% loss, CPU 38.4% |
  1. The higher latency indicates reactive flow setup.
  2. 197% CPU means that the virtual dataplane (vDP) consumed close to 2 physical CPUs' worth of resources.
  3. The client server crashes with a kernel panic. The problem was fixed in Dec 2013.
  4. Each ARP generates a pkt_in; total pkt_in messages are throttled at 200 pkts/sec.
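As an illustration of how the unidirectional TCP rows above can be driven, the following is a minimal sketch of an iperf (v2) client loop over the three MSS values. The server address and the assumption that "iperf -s" is already running in the destination VM are hypothetical, and vSwitch CPU utilization would be sampled separately on the hypervisor (e.g., with top or pidstat).

```python
# Minimal sketch of the TCP transfer test driver: run an iperf (v2) client
# against a server VM for each MSS value and report the measured throughput.
# Assumes "iperf -s" is already running in the destination VM; the server
# address and the 60-second duration are hypothetical examples.
import subprocess

SERVER = "192.168.1.20"        # hypothetical address of the destination VM
MSS_VALUES = [68, 1468, 8908]  # bytes, matching the table above
DURATION_S = 60

for mss in MSS_VALUES:
    cmd = [
        "iperf",
        "-c", SERVER,           # client mode, connect to the server VM
        "-M", str(mss),         # attempt to set the TCP maximum segment size
        "-t", str(DURATION_S),  # transfer duration in seconds
        "-f", "g",              # report bandwidth in Gbits/sec
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    # The last line of iperf's default output carries the bandwidth summary.
    summary = out.strip().splitlines()[-1]
    print(f"MSS {mss}: {summary}")
```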
