From: "Michael S. Tsirkin" <mst@redhat.com>
To: Rick Jones <rick.jones2@hp.com>
Cc: Simon Horman <horms@verge.net.au>, Jesse Gross <jesse@nicira.com>,
Rusty Russell <rusty@rustcorp.com.au>,
virtualization@lists.linux-foundation.org, dev@openvswitch.org,
virtualization@lists.osdl.org, netdev@vger.kernel.org,
kvm@vger.kernel.org
Subject: Re: Flow Control and Port Mirroring Revisited
Date: Mon, 24 Jan 2011 21:42:24 +0200
Message-ID: <20110124194224.GD29941@redhat.com>
In-Reply-To: <4D3DCC99.5050101@hp.com>
On Mon, Jan 24, 2011 at 11:01:45AM -0800, Rick Jones wrote:
> Michael S. Tsirkin wrote:
> >On Mon, Jan 24, 2011 at 10:27:55AM -0800, Rick Jones wrote:
> >
> >>>Just to block netperf you can send it SIGSTOP :)
> >>>
> >>
> >>Clever :) One could I suppose achieve the same result by making the
> >>remote receive socket buffer size smaller than the UDP message size
> >>and then not worry about having to learn the netserver's PID to send
> >>it the SIGSTOP. I *think* the semantics will be substantially the
> >>same?
> >
> >
> >If you could set it, yes. But at least Linux ignores
> >any value substantially smaller than 1K, and then
> >multiplies that by 2:
> >
> >        case SO_RCVBUF:
> >                /* Don't error on this BSD doesn't and if you think
> >                   about it this is right. Otherwise apps have to
> >                   play 'guess the biggest size' games. RCVBUF/SNDBUF
> >                   are treated in BSD as hints */
> >
> >                if (val > sysctl_rmem_max)
> >                        val = sysctl_rmem_max;
> >set_rcvbuf:
> >                sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
> >
> >                /*
> >                 * We double it on the way in to account for
> >                 * "struct sk_buff" etc. overhead. Applications
> >                 * assume that the SO_RCVBUF setting they make will
> >                 * allow that much actual data to be received on that
> >                 * socket.
> >                 *
> >                 * Applications are unaware that "struct sk_buff" and
> >                 * other overheads allocate from the receive buffer
> >                 * during socket buffer allocation.
> >                 *
> >                 * And after considering the possible alternatives,
> >                 * returning the value we actually used in getsockopt
> >                 * is the most desirable behavior.
> >                 */
> >                if ((val * 2) < SOCK_MIN_RCVBUF)
> >                        sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
> >                else
> >                        sk->sk_rcvbuf = val * 2;
> >
> >and
> >
> >/*
> > * Since sk_rmem_alloc sums skb->truesize, even a small frame might need
> > * sizeof(sk_buff) + MTU + padding, unless net driver perform copybreak
> > */
> >#define SOCK_MIN_RCVBUF (2048 + sizeof(struct sk_buff))
>
> Pity - seems to work back on 2.6.26:

Hmm, that code is there at least as far back as 2.6.12.

> raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1 -m 1024
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> localhost (127.0.0.1) port 0 AF_INET : histogram
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
>
> 124928    1024   10.00     2882334      0    2361.17
>    256           10.00           0              0.00
>
> raj@tardy:~/netperf2_trunk$ uname -a
> Linux tardy 2.6.26-2-amd64 #1 SMP Sun Jun 20 20:16:30 UTC 2010 x86_64 GNU/Linux
>
> Still, even with that (or SIGSTOP) we don't really know where the
> packets were dropped, right? There is no guarantee they weren't
> dropped before they got to the socket buffer.
>
> happy benchmarking,
> rick jones

Right. Better to send to a port with no socket listening on it;
that would drop the packet at an early (if not the earliest
possible) opportunity.

> PS - here is with a -S 1024 option:
>
> raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1024 -m 1024
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> localhost (127.0.0.1) port 0 AF_INET : histogram
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
>
> 124928    1024   10.00     1679269      0    1375.64
>   2048           10.00     1490662           1221.13
>
> showing that there is a decent chance that many of the frames were
> dropped at the socket buffer, but not all - I suppose I could/should
> be checking netstat stats... :)
>
> And just a little more, only because I was curious :)
>
> raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1M -m 257
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> localhost (127.0.0.1) port 0 AF_INET : histogram
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
>
> 124928     257   10.00     1869134      0     384.29
> 262142           10.00     1869134            384.29
>
> raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1 -m 257
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> localhost (127.0.0.1) port 0 AF_INET : histogram
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
>
> 124928     257   10.00     3076363      0     632.49
>    256           10.00           0              0.00