From: Simon Horman <horms@verge.net.au>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, Jay Vosburgh <fubar@us.ibm.com>,
"David S. Miller" <davem@davemloft.net>
Subject: Re: bonding: flow control regression [was Re: bridging: flow control regression]
Date: Wed, 8 Dec 2010 22:22:17 +0900
Message-ID: <20101208132217.GA28040@verge.net.au>
In-Reply-To: <20101106092535.GD5128@verge.net.au>
On Sat, Nov 06, 2010 at 06:25:37PM +0900, Simon Horman wrote:
> On Tue, Nov 02, 2010 at 10:29:45AM +0100, Eric Dumazet wrote:
> > On Tuesday, 02 November 2010 at 17:46 +0900, Simon Horman wrote:
> >
> > > Thanks Eric, that seems to resolve the problem that I was seeing.
> > >
> > > With your patch I see:
> > >
> > > No bonding
> > >
> > > # netperf -c -4 -t UDP_STREAM -H 172.17.60.216 -l 30 -- -m 1472
> > > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
> > > Socket  Message  Elapsed      Messages                   CPU      Service
> > > Size    Size     Time         Okay Errors   Throughput   Util     Demand
> > > bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
> > >
> > > 116736    1472   30.00       2438413      0      957.2     8.52     1.458
> > > 129024           30.00       2438413             957.2    -1.00    -1.000
> > >
> > > With bonding (one slave, the interface used in the test above)
> > >
> > > netperf -c -4 -t UDP_STREAM -H 172.17.60.216 -l 30 -- -m 1472
> > > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.60.216 (172.17.60.216) port 0 AF_INET
> > > Socket  Message  Elapsed      Messages                   CPU      Service
> > > Size    Size     Time         Okay Errors   Throughput   Util     Demand
> > > bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
> > >
> > > 116736    1472   30.00       2438390      0      957.1     8.97     1.535
> > > 129024           30.00       2438390             957.1    -1.00    -1.000
> > >
> >
> >
> > Sure, the patch helps when not too many flows are involved, but this
> > is a hack.
> >
> > Say the device queue is 1000 packets and you run a workload with 2000
> > sockets; it won't work.
> >
> > Or the device queue is 1000 packets, there is one flow, and the socket
> > send queue size allows more than 1000 packets to be 'in flight'
> > (echo 2000000 > /proc/sys/net/core/wmem_default); it won't work with
> > bonding either, only with devices where a qdisc sits on the first
> > device after the socket.
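
To make the second scenario above concrete, here is a rough sketch (the
interface name and the 1000-packet queue length are assumptions, not
values taken from the tests in this thread):

# A typical device TX queue holds on the order of 1000 packets:
ip link show eth0 | grep qlen              # e.g. "qlen 1000"
# Let a single UDP socket queue ~2 MB of unsent data by default:
echo 2000000 > /proc/sys/net/core/wmem_default
# 2,000,000 bytes of send buffer is roughly 2000000 / 1472 =~ 1350
# datagrams' worth of payload (less in practice due to per-skb
# overhead), i.e. more than the device queue can hold.  Without a qdisc
# on the first device after the socket to hold and charge the excess,
# the surplus is dropped and the socket limit alone does not throttle
# the flow.
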
>
> True, thanks for pointing that out.
>
> The scenario that I am actually interested in is virtualisation.
> And I believe that your patch helps the vhostnet case (I don't see
> flow control problems with bonding + virtio without vhostnet). However,
> I am unsure whether there are similarly easy ways to defeat flow
> control in the vhostnet case.

Hi Eric,

do you have any thoughts on this?

I measured the performance impact of your patch on 2.6.37-rc1
and I can see why early orphaning is a throughput win.

The tests were run over a bond with three slaves in balance-rr mode;
a sketch of this setup follows the parameter list below. Other
parameters of interest are:

MTU=1500
client,server: tcp_reordering=3 (default)
client: GSO=off
client: TSO=off
server: GRO=off
server: rx-usecs=3 (default)
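
A sketch of how such a setup might be configured (interface names, the
use of ifenslave, and the miimon value are assumptions; the actual
commands were not recorded in this thread):

# on the client (bonded sender):
modprobe bonding mode=balance-rr miimon=100
ip link set bond0 mtu 1500 up
ifenslave bond0 eth0 eth1 eth2             # three slaves, round-robin
ethtool -K eth0 gso off tso off            # likewise for eth1 and eth2

# on the server (receiver):
ethtool -K eth0 gro off
ethtool -C eth0 rx-usecs 3                 # the default coalescing value
sysctl net.ipv4.tcp_reordering             # left at the default of 3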

Without your no-early-orphan patch:

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      1621.03   16.31    6.48     1.648   2.621

With your no-early-orphan patch:

# netperf -C -c -4 -t TCP_STREAM -H 172.17.60.216
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
172.17.60.216 (172.17.60.216) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      1433.48    9.60    5.45     1.098   2.490

However, in the case of virtualisation I think it is a win to be able to
do flow control on UDP traffic from guests (using virtio). Am I missing
something, and can flow control be bypassed anyway? If not, perhaps making
the change that your patch makes configurable through proc or ethtool
would be an option?
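
For what it is worth, a rough way to see whether send-side flow control
is working from a guest (the device name is an assumption, and this is
only a sketch, not something measured above):

# Compare the send-side and receive-side lines of the UDP_STREAM output:
# with working flow control they stay close; without it the sender can
# report far more than the receiver actually sees.
netperf -4 -t UDP_STREAM -H 172.17.60.216 -l 30 -- -m 1472
# TX drops on the bond or its slaves also suggest a sender that is not
# being throttled:
ip -s link show bond0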