* bridging: flow control regression
  From: Simon Horman @ 2010-11-01 12:29 UTC
  To: netdev; Cc: Jay Vosburgh, Eric Dumazet, David S. Miller

Hi,

I have observed what appears to be a regression between 2.6.34 and
2.6.35-rc1. The behaviour described below is still present in Linus's
current tree (2.6.36+).

On 2.6.34 and earlier, when sending a UDP stream to a bonded interface
the throughput is approximately equal to the available physical bandwidth.

# netperf -c -4 -t UDP_STREAM -H 172.17.50.253 -l 30 -- -m 1472
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.50.253 (172.17.50.253) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

114688    1472   30.00     2438265      0      957.1     18.09    3.159
109568           30.00     2389980             938.1     -1.00   -1.000

On 2.6.35-rc1 netperf sends ~7 Gbit/s.
Curiously it only consumes 50% CPU; I would expect this to be CPU bound.

# netperf -c -4 -t UDP_STREAM -H 172.17.50.253 -l 30 -- -m 1472
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.50.253 (172.17.50.253) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

116736    1472   30.00    18064360      0     7090.8     50.62    8.665
109568           30.00     2438090             957.0     -1.00   -1.000

In this case the bonding device has a single gigabit slave device
and is running in balance-rr mode. I have observed similar results
with two and three slave devices.

I have bisected the problem and the offending commit appears to be
"net: Introduce skb_orphan_try()". My tired eyes tell me that change
frees skbs earlier than they otherwise would be, unless tx timestamping
is in effect. That does seem to make sense in relation to this problem,
though I have yet to dig into specifically why bonding is adversely affected.
* Re: bridging: flow control regression
  From: Eric Dumazet @ 2010-11-01 12:59 UTC
  To: Simon Horman; Cc: netdev, Jay Vosburgh, David S. Miller

On Monday, 01 November 2010 at 21:29 +0900, Simon Horman wrote:
> Hi,
>
> I have observed what appears to be a regression between 2.6.34 and
> 2.6.35-rc1. The behaviour described below is still present in Linus's
> current tree (2.6.36+).
>
> On 2.6.34 and earlier, when sending a UDP stream to a bonded interface
> the throughput is approximately equal to the available physical bandwidth.
>
> # netperf -c -4 -t UDP_STREAM -H 172.17.50.253 -l 30 -- -m 1472
> UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.50.253 (172.17.50.253) port 0 AF_INET
> Socket  Message  Elapsed      Messages                   CPU      Service
> Size    Size     Time         Okay Errors   Throughput   Util     Demand
> bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
>
> 114688    1472   30.00     2438265      0      957.1     18.09    3.159
> 109568           30.00     2389980             938.1     -1.00   -1.000
>
> On 2.6.35-rc1 netperf sends ~7 Gbit/s.
> Curiously it only consumes 50% CPU; I would expect this to be CPU bound.
>
> # netperf -c -4 -t UDP_STREAM -H 172.17.50.253 -l 30 -- -m 1472
> UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.50.253 (172.17.50.253) port 0 AF_INET
> Socket  Message  Elapsed      Messages                   CPU      Service
> Size    Size     Time         Okay Errors   Throughput   Util     Demand
> bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
>
> 116736    1472   30.00    18064360      0     7090.8     50.62    8.665
> 109568           30.00     2438090             957.0     -1.00   -1.000
>
> In this case the bonding device has a single gigabit slave device
> and is running in balance-rr mode. I have observed similar results
> with two and three slave devices.
>
> I have bisected the problem and the offending commit appears to be
> "net: Introduce skb_orphan_try()". My tired eyes tell me that change
> frees skbs earlier than they otherwise would be, unless tx timestamping
> is in effect. That does seem to make sense in relation to this problem,
> though I have yet to dig into specifically why bonding is adversely affected.

I assume you meant "bonding: flow control regression", i.e. this is not
related to bridging?

One problem with bonding is that its xmit() method always returns
NETDEV_TX_OK, so a flooder cannot know that some of its frames were lost.

So yes, the patch you mention has the effect of allowing UDP to flood the
bonding device, since we orphan the skb before giving it to the device
(bond or ethX).

With a normal device (with a qdisc), we queue the skb and orphan it only
when it leaves the queue. With a not-too-big socket send buffer, that slows
down the sender enough to "send UDP frames at line rate only".
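For readers unfamiliar with the mechanism Eric is describing, the back-pressure
comes from socket send-buffer accounting. The sketch below is a simplified
paraphrase, not the actual kernel source; the real helpers are
skb_set_owner_w(), sock_wfree() and skb_orphan(), and the names
charge_to_sender()/orphan_now() are made up purely for illustration.

/* A datagram built by udp_sendmsg() is charged against the sending
 * socket's send buffer; the charge is normally released only when the
 * skb is freed, i.e. after the qdisc and the NIC are done with it.
 */
static inline void charge_to_sender(struct sk_buff *skb, struct sock *sk)
{
	skb->sk = sk;
	skb->destructor = sock_wfree;	/* credits sk_wmem_alloc on free */
	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
}

/* Orphaning runs the destructor immediately.  Once dev_hard_start_xmit()
 * does this unconditionally ("net: Introduce skb_orphan_try()"), the send
 * buffer is credited back before the packet has left the box, so a UDP
 * sender over a device that never reports congestion (bonding always
 * returns NETDEV_TX_OK) is no longer throttled.
 */
static inline void orphan_now(struct sk_buff *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	skb->destructor = NULL;
	skb->sk = NULL;
}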
* Re: bonding: flow control regression [was Re: bridging: flow control regression]
  From: Simon Horman @ 2010-11-02 2:06 UTC
  To: Eric Dumazet; Cc: netdev, Jay Vosburgh, David S. Miller

On Mon, Nov 01, 2010 at 01:59:32PM +0100, Eric Dumazet wrote:
> On Monday, 01 November 2010 at 21:29 +0900, Simon Horman wrote:
> > Hi,
> >
> > I have observed what appears to be a regression between 2.6.34 and
> > 2.6.35-rc1. The behaviour described below is still present in Linus's
> > current tree (2.6.36+).
> >
> > On 2.6.34 and earlier, when sending a UDP stream to a bonded interface
> > the throughput is approximately equal to the available physical bandwidth.
> >
> > # netperf -c -4 -t UDP_STREAM -H 172.17.50.253 -l 30 -- -m 1472
> > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.50.253 (172.17.50.253) port 0 AF_INET
> > Socket  Message  Elapsed      Messages                   CPU      Service
> > Size    Size     Time         Okay Errors   Throughput   Util     Demand
> > bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
> >
> > 114688    1472   30.00     2438265      0      957.1     18.09    3.159
> > 109568           30.00     2389980             938.1     -1.00   -1.000
> >
> > On 2.6.35-rc1 netperf sends ~7 Gbit/s.
> > Curiously it only consumes 50% CPU; I would expect this to be CPU bound.
> >
> > # netperf -c -4 -t UDP_STREAM -H 172.17.50.253 -l 30 -- -m 1472
> > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.50.253 (172.17.50.253) port 0 AF_INET
> > Socket  Message  Elapsed      Messages                   CPU      Service
> > Size    Size     Time         Okay Errors   Throughput   Util     Demand
> > bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
> >
> > 116736    1472   30.00    18064360      0     7090.8     50.62    8.665
> > 109568           30.00     2438090             957.0     -1.00   -1.000
> >
> > In this case the bonding device has a single gigabit slave device
> > and is running in balance-rr mode. I have observed similar results
> > with two and three slave devices.
> >
> > I have bisected the problem and the offending commit appears to be
> > "net: Introduce skb_orphan_try()". My tired eyes tell me that change
> > frees skbs earlier than they otherwise would be, unless tx timestamping
> > is in effect. That does seem to make sense in relation to this problem,
> > though I have yet to dig into specifically why bonding is adversely affected.
>
> I assume you meant "bonding: flow control regression", i.e. this is not
> related to bridging?

Yes, sorry about that. I meant bonding, not bridging.

> One problem with bonding is that its xmit() method always returns
> NETDEV_TX_OK, so a flooder cannot know that some of its frames were lost.
>
> So yes, the patch you mention has the effect of allowing UDP to flood the
> bonding device, since we orphan the skb before giving it to the device
> (bond or ethX).
>
> With a normal device (with a qdisc), we queue the skb and orphan it only
> when it leaves the queue. With a not-too-big socket send buffer, that slows
> down the sender enough to "send UDP frames at line rate only".

Thanks for the explanation.
I'm not entirely sure how much of a problem this is in practice.
* Re: bonding: flow control regression [was Re: bridging: flow control regression]
  From: Eric Dumazet @ 2010-11-02 4:53 UTC
  To: Simon Horman; Cc: netdev, Jay Vosburgh, David S. Miller

On Tuesday, 02 November 2010 at 11:06 +0900, Simon Horman wrote:
> Thanks for the explanation.
> I'm not entirely sure how much of a problem this is in practice.

Maybe for virtual devices (tunnels, bonding, ...) it would make sense
to delay the orphaning until the skb reaches the real device.

But if the socket send buffer is very large, it would defeat the flow
control anyway...
* Re: bonding: flow control regression [was Re: bridging: flow control regression]
  From: Simon Horman @ 2010-11-02 7:03 UTC
  To: Eric Dumazet; Cc: netdev, Jay Vosburgh, David S. Miller

On Tue, Nov 02, 2010 at 05:53:42AM +0100, Eric Dumazet wrote:
> On Tuesday, 02 November 2010 at 11:06 +0900, Simon Horman wrote:
> > Thanks for the explanation.
> > I'm not entirely sure how much of a problem this is in practice.
>
> Maybe for virtual devices (tunnels, bonding, ...) it would make sense
> to delay the orphaning until the skb reaches the real device.

That was my initial thought. Could you give me some guidance on how that
might be done, so I can try to make a patch to test?

> But if the socket send buffer is very large, it would defeat the flow
> control anyway...

I'm primarily concerned about a situation where UDP packets are sent as
fast as possible, indefinitely. In that scenario, I think it would need
to be a rather large buffer.
* Re: bonding: flow control regression [was Re: bridging: flow control regression]
  From: Eric Dumazet @ 2010-11-02 7:30 UTC
  To: Simon Horman; Cc: netdev, Jay Vosburgh, David S. Miller

On Tuesday, 02 November 2010 at 16:03 +0900, Simon Horman wrote:
> On Tue, Nov 02, 2010 at 05:53:42AM +0100, Eric Dumazet wrote:
> > On Tuesday, 02 November 2010 at 11:06 +0900, Simon Horman wrote:
> > > Thanks for the explanation.
> > > I'm not entirely sure how much of a problem this is in practice.
> >
> > Maybe for virtual devices (tunnels, bonding, ...) it would make sense
> > to delay the orphaning until the skb reaches the real device.
>
> That was my initial thought. Could you give me some guidance on how that
> might be done, so I can try to make a patch to test?
>
> > But if the socket send buffer is very large, it would defeat the flow
> > control anyway...
>
> I'm primarily concerned about a situation where UDP packets are sent as
> fast as possible, indefinitely. In that scenario, I think it would need
> to be a rather large buffer.

Please try the following patch, thanks.

 drivers/net/bonding/bond_main.c |    1 +
 include/linux/if.h              |    3 +++
 net/core/dev.c                  |    5 +++--
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index bdb68a6..325931e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4714,6 +4714,7 @@ static void bond_setup(struct net_device *bond_dev)
 	bond_dev->flags |= IFF_MASTER|IFF_MULTICAST;
 	bond_dev->priv_flags |= IFF_BONDING;
 	bond_dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
+	bond_dev->priv_flags &= ~IFF_EARLY_ORPHAN;
 
 	if (bond->params.arp_interval)
 		bond_dev->priv_flags |= IFF_MASTER_ARPMON;
diff --git a/include/linux/if.h b/include/linux/if.h
index 1239599..7499a99 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -77,6 +77,9 @@
 #define IFF_BRIDGE_PORT	0x8000		/* device used as bridge port */
 #define IFF_OVS_DATAPATH	0x10000	/* device used as Open vSwitch
					 * datapath port */
+#define IFF_EARLY_ORPHAN	0x20000	/* early orphan skbs in
+					 * dev_hard_start_xmit()
+					 */
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
 #define IF_GET_PROTO	0x0002
diff --git a/net/core/dev.c b/net/core/dev.c
index 35dfb83..eabf94d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2005,7 +2005,8 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(skb);
 
-		skb_orphan_try(skb);
+		if (dev->priv_flags & IFF_EARLY_ORPHAN)
+			skb_orphan_try(skb);
 
 		if (vlan_tx_tag_present(skb) &&
 		    !(dev->features & NETIF_F_HW_VLAN_TX)) {
@@ -5590,7 +5591,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->napi_list);
 	INIT_LIST_HEAD(&dev->unreg_list);
 	INIT_LIST_HEAD(&dev->link_watch_list);
-	dev->priv_flags = IFF_XMIT_DST_RELEASE;
+	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_EARLY_ORPHAN;
 	setup(dev);
 	strcpy(dev->name, name);
 	return dev;
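Schematically, what the patch changes for the bonding case is where in the
transmit path the skb loses its socket owner. The call chain below is a
rough sketch for balance-rr, reconstructed from the discussion rather than
verified against the tree, so treat it as an approximation:

/* UDP transmit path over a bond, with the patch applied:
 *
 *   udp_sendmsg()
 *     skb charged to the socket's sk_wmem_alloc
 *   dev_queue_xmit(bond0)                 bond has no qdisc queue of its own
 *     dev_hard_start_xmit(bond0)          IFF_EARLY_ORPHAN cleared by
 *                                         bond_setup() -> no skb_orphan_try()
 *       bond_xmit_roundrobin()
 *         dev_queue_xmit(eth0)            slave's qdisc may queue the skb
 *           dev_hard_start_xmit(eth0)     IFF_EARLY_ORPHAN still set
 *                                         -> skb_orphan_try() runs here
 *
 * The socket keeps its send-buffer charge at least until the skb reaches
 * the physical slave, so a full slave qdisc once again back-pressures the
 * UDP sender instead of being flooded through the bond.
 */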
* Re: bonding: flow control regression [was Re: bridging: flow control regression]
  From: Simon Horman @ 2010-11-02 8:46 UTC
  To: Eric Dumazet; Cc: netdev, Jay Vosburgh, David S. Miller

On Tue, Nov 02, 2010 at 08:30:57AM +0100, Eric Dumazet wrote:
> On Tuesday, 02 November 2010 at 16:03 +0900, Simon Horman wrote:
> > On Tue, Nov 02, 2010 at 05:53:42AM +0100, Eric Dumazet wrote:
> > > On Tuesday, 02 November 2010 at 11:06 +0900, Simon Horman wrote:
> > > > Thanks for the explanation.
> > > > I'm not entirely sure how much of a problem this is in practice.
> > >
> > > Maybe for virtual devices (tunnels, bonding, ...) it would make sense
> > > to delay the orphaning until the skb reaches the real device.
> >
> > That was my initial thought. Could you give me some guidance on how that
> > might be done, so I can try to make a patch to test?
> >
> > > But if the socket send buffer is very large, it would defeat the flow
> > > control anyway...
> >
> > I'm primarily concerned about a situation where UDP packets are sent as
> > fast as possible, indefinitely. In that scenario, I think it would need
> > to be a rather large buffer.
>
> Please try the following patch, thanks.

Thanks Eric, that seems to resolve the problem that I was seeing.

With your patch I see:

No bonding

# netperf -c -4 -t UDP_STREAM -H 172.17.60.216 -l 30 -- -m 1472
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

116736    1472   30.00     2438413      0      957.2      8.52    1.458
129024           30.00     2438413             957.2     -1.00   -1.000

With bonding (one slave, the interface used in the test above)

# netperf -c -4 -t UDP_STREAM -H 172.17.60.216 -l 30 -- -m 1472
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB

116736    1472   30.00     2438390      0      957.1      8.97    1.535
129024           30.00     2438390             957.1     -1.00   -1.000
* Re: bonding: flow control regression [was Re: bridging: flow control regression]
  From: Eric Dumazet @ 2010-11-02 9:29 UTC
  To: Simon Horman; Cc: netdev, Jay Vosburgh, David S. Miller

On Tuesday, 02 November 2010 at 17:46 +0900, Simon Horman wrote:
> Thanks Eric, that seems to resolve the problem that I was seeing.
>
> With your patch I see:
>
> No bonding
>
> # netperf -c -4 -t UDP_STREAM -H 172.17.60.216 -l 30 -- -m 1472
> UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
> Socket  Message  Elapsed      Messages                   CPU      Service
> Size    Size     Time         Okay Errors   Throughput   Util     Demand
> bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
>
> 116736    1472   30.00     2438413      0      957.2      8.52    1.458
> 129024           30.00     2438413             957.2     -1.00   -1.000
>
> With bonding (one slave, the interface used in the test above)
>
> # netperf -c -4 -t UDP_STREAM -H 172.17.60.216 -l 30 -- -m 1472
> UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
> Socket  Message  Elapsed      Messages                   CPU      Service
> Size    Size     Time         Okay Errors   Throughput   Util     Demand
> bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
>
> 116736    1472   30.00     2438390      0      957.1      8.97    1.535
> 129024           30.00     2438390             957.1     -1.00   -1.000

Sure, the patch helps when not too many flows are involved, but it is a hack.

Say the device queue is 1000 packets and you run a workload with 2000
sockets: it won't work...

Or the device queue is 1000 packets, there is one flow, and the socket send
queue size allows for more than 1000 packets to be 'in flight'
(echo 2000000 > /proc/sys/net/core/wmem_default): it won't work with bonding
either. It only works when a qdisc sits on the first device met after the
socket.
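To put rough, illustrative numbers on that second scenario (the per-packet
charge below is an assumption, not a measured value): a 1472-byte UDP
payload typically ends up as an skb whose truesize is on the order of 2 KB
once headers and metadata are accounted for, so:

  per-datagram charge (assumed):   truesize ~= 2048 bytes
  socket send buffer:              wmem_default = 2,000,000 bytes
  datagrams in flight per socket:  2,000,000 / 2048 ~= 975
  typical device tx queue:         ~1000 packets

With the send buffer that large, a single socket can keep roughly as many
datagrams charged-but-unfreed as the whole device queue holds, so the
sender never blocks and the back-pressure disappears even though the
orphaning has been delayed to the real device.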
* Re: bonding: flow control regression [was Re: bridging: flow control regression]
  From: Simon Horman @ 2010-11-06 9:25 UTC
  To: Eric Dumazet; Cc: netdev, Jay Vosburgh, David S. Miller

On Tue, Nov 02, 2010 at 10:29:45AM +0100, Eric Dumazet wrote:
> On Tuesday, 02 November 2010 at 17:46 +0900, Simon Horman wrote:
> > Thanks Eric, that seems to resolve the problem that I was seeing.
> >
> > With your patch I see:
> >
> > No bonding
> >
> > # netperf -c -4 -t UDP_STREAM -H 172.17.60.216 -l 30 -- -m 1472
> > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
> > Socket  Message  Elapsed      Messages                   CPU      Service
> > Size    Size     Time         Okay Errors   Throughput   Util     Demand
> > bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
> >
> > 116736    1472   30.00     2438413      0      957.2      8.52    1.458
> > 129024           30.00     2438413             957.2     -1.00   -1.000
> >
> > With bonding (one slave, the interface used in the test above)
> >
> > # netperf -c -4 -t UDP_STREAM -H 172.17.60.216 -l 30 -- -m 1472
> > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
> > Socket  Message  Elapsed      Messages                   CPU      Service
> > Size    Size     Time         Okay Errors   Throughput   Util     Demand
> > bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
> >
> > 116736    1472   30.00     2438390      0      957.1      8.97    1.535
> > 129024           30.00     2438390             957.1     -1.00   -1.000
>
> Sure, the patch helps when not too many flows are involved, but it is a hack.
>
> Say the device queue is 1000 packets and you run a workload with 2000
> sockets: it won't work...
>
> Or the device queue is 1000 packets, there is one flow, and the socket send
> queue size allows for more than 1000 packets to be 'in flight'
> (echo 2000000 > /proc/sys/net/core/wmem_default): it won't work with bonding
> either. It only works when a qdisc sits on the first device met after the
> socket.

True, thanks for pointing that out.

The scenario that I am actually interested in is virtualisation, and I
believe that your patch helps the vhost-net case (I don't see flow control
problems with bonding + virtio without vhost-net). However, I am unsure
whether there are also easy ways to defeat flow control in the vhost-net
case.
* Re: bonding: flow control regression [was Re: bridging: flow control regression]
  From: Simon Horman @ 2010-12-08 13:22 UTC
  To: Eric Dumazet; Cc: netdev, Jay Vosburgh, David S. Miller

On Sat, Nov 06, 2010 at 06:25:37PM +0900, Simon Horman wrote:
> On Tue, Nov 02, 2010 at 10:29:45AM +0100, Eric Dumazet wrote:
> > On Tuesday, 02 November 2010 at 17:46 +0900, Simon Horman wrote:
> > > Thanks Eric, that seems to resolve the problem that I was seeing.
> > >
> > > With your patch I see:
> > >
> > > No bonding
> > >
> > > # netperf -c -4 -t UDP_STREAM -H 172.17.60.216 -l 30 -- -m 1472
> > > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
> > > Socket  Message  Elapsed      Messages                   CPU      Service
> > > Size    Size     Time         Okay Errors   Throughput   Util     Demand
> > > bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
> > >
> > > 116736    1472   30.00     2438413      0      957.2      8.52    1.458
> > > 129024           30.00     2438413             957.2     -1.00   -1.000
> > >
> > > With bonding (one slave, the interface used in the test above)
> > >
> > > # netperf -c -4 -t UDP_STREAM -H 172.17.60.216 -l 30 -- -m 1472
> > > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
> > > Socket  Message  Elapsed      Messages                   CPU      Service
> > > Size    Size     Time         Okay Errors   Throughput   Util     Demand
> > > bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
> > >
> > > 116736    1472   30.00     2438390      0      957.1      8.97    1.535
> > > 129024           30.00     2438390             957.1     -1.00   -1.000
> >
> > Sure, the patch helps when not too many flows are involved, but it is a hack.
> >
> > Say the device queue is 1000 packets and you run a workload with 2000
> > sockets: it won't work...
> >
> > Or the device queue is 1000 packets, there is one flow, and the socket send
> > queue size allows for more than 1000 packets to be 'in flight'
> > (echo 2000000 > /proc/sys/net/core/wmem_default): it won't work with bonding
> > either. It only works when a qdisc sits on the first device met after the
> > socket.
>
> True, thanks for pointing that out.
>
> The scenario that I am actually interested in is virtualisation, and I
> believe that your patch helps the vhost-net case (I don't see flow control
> problems with bonding + virtio without vhost-net). However, I am unsure
> whether there are also easy ways to defeat flow control in the vhost-net
> case.

Hi Eric,

do you have any thoughts on this?

I measured the performance impact of your patch on 2.6.37-rc1 and I can
see why early orphaning is a win.

The tests are run over a bond with 3 slaves. The bond is in balance-rr
mode. Other parameters of interest are:
  MTU=1500
  client,server: tcp_reordering=3 (default)
  client: GSO=off
  client: TSO=off
  server: GRO=off
  server: rx-usecs=3 (default)

Without your no-early-orphan patch

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      1621.03   16.31    6.48     1.648   2.621

With your no-early-orphan patch

# netperf -C -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      1433.48    9.60    5.45     1.098   2.490

However, in the case of virtualisation I think it is a win to be able to do
flow control on UDP traffic from guests (using virtio). Am I missing
something, and can flow control be bypassed anyway? If not, perhaps making
the change that your patch makes configurable through proc or ethtool is
an option?
* Re: bonding: flow control regression [was Re: bridging: flow control regression]
  From: Eric Dumazet @ 2010-12-08 13:50 UTC
  To: Simon Horman; Cc: netdev, Jay Vosburgh, David S. Miller

On Wednesday, 08 December 2010 at 22:22 +0900, Simon Horman wrote:
> Hi Eric,
>
> do you have any thoughts on this?
>
> I measured the performance impact of your patch on 2.6.37-rc1 and I can
> see why early orphaning is a win.
>
> The tests are run over a bond with 3 slaves. The bond is in balance-rr
> mode. Other parameters of interest are:
>   MTU=1500
>   client,server: tcp_reordering=3 (default)
>   client: GSO=off
>   client: TSO=off
>   server: GRO=off
>   server: rx-usecs=3 (default)
>
> Without your no-early-orphan patch
>
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
>
>  87380  16384  16384    10.00      1621.03   16.31    6.48     1.648   2.621
>
> With your no-early-orphan patch
>
> # netperf -C -c -4 -t TCP_STREAM -H 172.17.60.216
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
>
>  87380  16384  16384    10.00      1433.48    9.60    5.45     1.098   2.490

It seems strange that this makes such a big difference with one flow.

> However, in the case of virtualisation I think it is a win to be able to do
> flow control on UDP traffic from guests (using virtio). Am I missing
> something, and can flow control be bypassed anyway? If not, perhaps making
> the change that your patch makes configurable through proc or ethtool is
> an option?

virtio_net's start_xmit() does one skb_orphan() anyway, so not doing it
some nanoseconds earlier won't change anything.

The real performance problem is when skbs are queued (for example on an
Ethernet driver's TX ring or a qdisc queue) and then freed some micro-
(or milli-) seconds later.

Maybe your ethtool suggestion is the way to go, so that we can remove the
special skb_orphan() calls that are done in some drivers: let the core
network stack decide to skb_orphan() itself, not the driver.
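As an illustration of Eric's point about drivers that orphan in their own
xmit path (the function below is a made-up example, not virtio_net's actual
code): such a driver gains nothing from the core doing the orphan a little
earlier, and it equally provides no back-pressure, because the socket
charge is dropped before the packet sits on any queue.

/* Illustrative example only -- not a real driver. */
static netdev_tx_t example_start_xmit(struct sk_buff *skb,
				      struct net_device *dev)
{
	skb_orphan(skb);	/* socket send-buffer charge released here */

	/* ... hand the skb to the device; it may sit on a TX ring for a
	 * while, but the sending socket no longer sees that backlog, so
	 * nothing slows the sender down.
	 */
	return NETDEV_TX_OK;
}

Eric's suggestion is the opposite arrangement: drivers stop calling
skb_orphan() themselves and the core stack decides when (and whether) to
orphan, which would turn behaviour like the IFF_EARLY_ORPHAN experiment
above into a per-device policy rather than a per-driver accident.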
Thread overview: 11+ messages
2010-11-01 12:29 bridging: flow control regression  Simon Horman
2010-11-01 12:59 ` Eric Dumazet
2010-11-02  2:06   ` bonding: flow control regression [was Re: bridging: flow control regression]  Simon Horman
2010-11-02  4:53     ` Eric Dumazet
2010-11-02  7:03       ` Simon Horman
2010-11-02  7:30         ` Eric Dumazet
2010-11-02  8:46           ` Simon Horman
2010-11-02  9:29             ` Eric Dumazet
2010-11-06  9:25               ` Simon Horman
2010-12-08 13:22                 ` Simon Horman
2010-12-08 13:50                   ` Eric Dumazet