Issue
After creating an LACP bond (mode 4) from two or more NICs (maximum four), all traffic appears to go through a single interface instead of being spread across all bonded interfaces.
Network throughput on the bonded network is lower than expected. This is most often seen when four LACP NICs are used for the OnApp Storage Network.
Troubleshooting
Running iperf with four parallel streams shows that the total throughput is no higher than the bandwidth of a single NIC.
# Executed iperf -c 10.200.4.254 -N -P 4 -M 9230 on the client side
[root@test ~]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37718
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 1.15 GBytes 989 Mbits/sec
[ 5] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37721
[ 4] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37724
[ 6] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37722
[ 7] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37723
[ 6] 0.0-10.0 sec 382 MBytes 319 Mbits/sec
[ 5] 0.0-10.1 sec 368 MBytes 307 Mbits/sec
[ 4] 0.0-10.1 sec 295 MBytes 246 Mbits/sec
[ 7] 0.0-10.1 sec 137 MBytes 114 Mbits/sec
[SUM] 0.0-10.1 sec 1.16 GBytes 987 Mbits/sec
[ 8] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37730
[ 8] 0.0-10.0 sec 1.15 GBytes 989 Mbits/sec
[ 4] local 10.200.1.254 port 5001 connected with 10.200.4.254 port 37735
[ 4] 0.0-10.0 sec 1.15 GBytes 984 Mbits/sec
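To confirm that the traffic really leaves through one interface, the per-slave transmit counters can be compared before and after an iperf run. The function below is a minimal sketch, assuming the standard sysfs layout of the Linux bonding driver (bonding/slaves and statistics/tx_bytes). The function name and the optional second argument are illustrative only; the argument exists so the function can be pointed at a test directory.

```shell
# Print "<slave> <tx_bytes>" for every slave of a bond.
# $1 = bond name, $2 = sysfs network root (optional, defaults to /sys/class/net)
slave_tx_bytes() {
    local bond="$1"
    local sysfs_net="${2:-/sys/class/net}"
    local slave
    # bonding/slaves holds a space-separated list of slave interface names.
    for slave in $(cat "$sysfs_net/$bond/bonding/slaves"); do
        printf '%s %s\n' "$slave" "$(cat "$sysfs_net/$slave/statistics/tx_bytes")"
    done
}

# Usage on the sending host (the iperf client), before and after a test run:
#   slave_tx_bytes onappstorebond
```

If only one counter grows by roughly the amount iperf transferred, every flow is being hashed onto the same slave. The counters describe outgoing traffic only, so they need to be read on the host that sends the data.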
Cause
According to https://www.kernel.org/doc/Documentation/networking/bonding.txt, mode 4 utilizes all slaves in the active aggregator, and the slave for each outgoing packet is selected according to the transmit hash policy:
802.3ad or 4
IEEE 802.3ad Dynamic link aggregation. Creates aggregation groups that share the same speed and duplex settings. Utilizes all slaves in the active aggregator according to the 802.3ad specification.
Slave selection for outgoing traffic is done according to the transmit hash policy, which may be changed from the default simple XOR policy via the xmit_hash_policy option, documented below. Note that not all transmit policies may be 802.3ad compliant, particularly in regards to the packet mis-ordering requirements of section 43.2.4 of the 802.3ad standard. Differing peer implementations will have varying tolerances for noncompliance.
Prerequisites:
1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
2. A switch that supports IEEE 802.3ad Dynamic link aggregation. Most switches will require some type of configuration to enable 802.3ad mode.
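To illustrate why the hash policy matters, the sketch below re-implements two of the policies in shell arithmetic, following the simplified formulas given in older versions of bonding.txt. Newer kernels fold the hash differently, so treat this as an illustration rather than the exact kernel behaviour. The IP addresses and ports come from the iperf run above; the MAC bytes 0x1a and 0x2b and all function names are made up for the example.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    echo "$1" | {
        IFS=. read -r a b c d
        echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
    }
}

# layer2+3: XOR of the IPs (low 16 bits) with the last byte of each MAC.
# args: src_ip dst_ip src_mac_last_byte dst_mac_last_byte slave_count
hash_layer2_3() {
    local sip dip
    sip=$(ip_to_int "$1"); dip=$(ip_to_int "$2")
    echo $(( (((sip ^ dip) & 0xffff) ^ ($3 ^ $4)) % $5 ))
}

# layer3+4: XOR of the ports with the XOR of the IPs (low 16 bits).
# args: src_ip dst_ip src_port dst_port slave_count
hash_layer3_4() {
    local sip dip
    sip=$(ip_to_int "$1"); dip=$(ip_to_int "$2")
    echo $(( (($3 ^ $4) ^ ((sip ^ dip) & 0xffff)) % $5 ))
}

# Four parallel client connections to the iperf server port 5001, four slaves.
for port in 37721 37722 37723 37724; do
    echo "port $port:" \
         "layer2+3 -> slave $(hash_layer2_3 10.200.4.254 10.200.1.254 0x1a 0x2b 4)," \
         "layer3+4 -> slave $(hash_layer3_4 10.200.4.254 10.200.1.254 "$port" 5001 4)"
done
```

With these inputs the layer2+3 result is the same for all four connections, because neither the MAC nor the IP addresses change between them, while the layer3+4 result varies with the source port. That matches the symptom: several streams between the same two hosts share one slave unless the ports are part of the hash.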
Depending on your switch configuration, you will have to:
1. Check that EtherChannel (LACP) is correctly configured on the switch.
2. Try different xmit_hash_policy values.
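The switch side of the first check can be cross-checked from the host. The sketch below assumes the usual layout of the bonding driver's status file, /proc/net/bonding/<bond>, with a "Slave Interface:" line followed later by an "Aggregator ID:" line for each slave. The function name and the optional file argument are illustrative.

```shell
# List each slave of a bond with the 802.3ad aggregator it joined.
# $1 = bond name, $2 = status file (optional, defaults to /proc/net/bonding/<bond>)
slave_aggregators() {
    local bond="$1"
    local status="${2:-/proc/net/bonding/$bond}"
    awk '
        /^Slave Interface:/ { slave = $3 }
        # Only use "Aggregator ID" lines that follow a "Slave Interface" line,
        # which skips the one in the "Active Aggregator Info" block.
        /Aggregator ID:/ && slave != "" { print slave, $NF; slave = "" }
    ' "$status"
}

# Usage:
#   slave_aggregators onappstorebond
```

If the slaves report different aggregator IDs, they have not been grouped into a single aggregator, and only the slaves in the active aggregator carry traffic. In that case the switch port-channel configuration is the first thing to review, before changing the hash policy.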
Resolution
As an example, we changed xmit_hash_policy here from layer2+3 to layer3+4:
[root@test ~]# ifdown onappstorebond
[root@test ~]# echo "layer3+4" > /sys/class/net/onappstorebond/bonding/xmit_hash_policy
[root@test ~]# ifup onappstorebond
[root@test ~]# iperf -s
[root@test1 ~]# iperf -c 10.200.1.254 -P 4 -M 9000
WARNING: attempt to set TCP maximum segment size to 9000, but got 536
WARNING: attempt to set TCP maximum segment size to 9000, but got 536
WARNING: attempt to set TCP maximum segment size to 9000, but got 536
WARNING: attempt to set TCP maximum segment size to 9000, but got 536
------------------------------------------------------------
Client connecting to 10.200.1.254, TCP port 5001
------------------------------------------------------------
[ 4] local 10.200.4.254 port 38032 connected with 10.200.1.254 port 5001
[ 6] local 10.200.4.254 port 38034 connected with 10.200.1.254 port 5001
[ 5] local 10.200.4.254 port 38033 connected with 10.200.1.254 port 5001
[ 3] local 10.200.4.254 port 38031 connected with 10.200.1.254 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 1.15 GBytes 989 Mbits/sec
[ 6] 0.0-10.0 sec 1.15 GBytes 991 Mbits/sec
[ 5] 0.0-10.0 sec 1.15 GBytes 984 Mbits/sec
[ 3] 0.0-10.0 sec 1.15 GBytes 986 Mbits/sec
[SUM] 0.0-10.0 sec 4.60 GBytes 3.95 Gbits/sec
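The sysfs write used above can be wrapped so that the new value is read back and verified. This is a minimal sketch: the function name, the short list of accepted policies, and the optional third argument (a sysfs root, useful for dry runs) are assumptions made for the example.

```shell
# Write a transmit hash policy to a bond and confirm that it was accepted.
# $1 = bond name, $2 = policy, $3 = sysfs network root (optional)
set_xmit_hash_policy() {
    local bond="$1" policy="$2"
    local attr="${3:-/sys/class/net}/$bond/bonding/xmit_hash_policy"
    local current
    case "$policy" in
        layer2|layer2+3|layer3+4) ;;
        *) echo "unsupported policy: $policy" >&2; return 2 ;;
    esac
    echo "$policy" > "$attr" || return 1
    # sysfs reports the name followed by its numeric value, e.g. "layer3+4 1".
    current=$(cut -d' ' -f1 "$attr")
    [ "$current" = "$policy" ]
}

# Usage, mirroring the steps above:
#   ifdown onappstorebond
#   set_xmit_hash_policy onappstorebond layer3+4 && echo "policy applied"
#   ifup onappstorebond
```

A value written through sysfs is a runtime setting. Whether it survives a reboot depends on how the bond is defined; on Red Hat style systems this is typically the BONDING_OPTS line of the bond's ifcfg file, so check how your installation manages that file before relying on the change.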
The layer3+4 policy is not fully 802.3ad compliant and may deliver packets out of order. Check that your application can cope with out-of-order packets before using it.