From: "Michael S. Tsirkin" <mst@redhat.com>
To: Rick Jones <rick.jones2@hp.com>
Cc: Simon Horman <horms@verge.net.au>, Jesse Gross <jesse@nicira.com>,
	Rusty Russell <rusty@rustcorp.com.au>,
	virtualization@lists.linux-foundation.org, dev@openvswitch.org,
	virtualization@lists.osdl.org, netdev@vger.kernel.org,
	kvm@vger.kernel.org
Subject: Re: Flow Control and Port Mirroring Revisited
Date: Mon, 24 Jan 2011 21:42:24 +0200
Message-ID: <20110124194224.GD29941@redhat.com>
In-Reply-To: <4D3DCC99.5050101@hp.com>

On Mon, Jan 24, 2011 at 11:01:45AM -0800, Rick Jones wrote:
> Michael S. Tsirkin wrote:
> >On Mon, Jan 24, 2011 at 10:27:55AM -0800, Rick Jones wrote:
> >
> >>>Just to block netperf you can send it SIGSTOP :)
> >>>
> >>
> >>Clever :)  One could, I suppose, achieve the same result by making the
> >>remote receive socket buffer size smaller than the UDP message size
> >>and then not worry about having to learn the netserver's PID to send
> >>it the SIGSTOP.  I *think* the semantics would be substantially the
> >>same?
> >
> >
> >If you could set it, yes. But at least Linux ignores
> >any value substantially smaller than 1K, and then
> >multiplies that by 2:
> >
> >        case SO_RCVBUF:
> >                /* Don't error on this BSD doesn't and if you think
> >                   about it this is right. Otherwise apps have to
> >                   play 'guess the biggest size' games. RCVBUF/SNDBUF
> >                   are treated in BSD as hints */
> >
> >                if (val > sysctl_rmem_max)
> >                        val = sysctl_rmem_max;
> >set_rcvbuf:
> >                sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
> >
> >                /*
> >                 * We double it on the way in to account for
> >                 * "struct sk_buff" etc. overhead.   Applications
> >                 * assume that the SO_RCVBUF setting they make will
> >                 * allow that much actual data to be received on that
> >                 * socket.
> >                 *
> >                 * Applications are unaware that "struct sk_buff" and
> >                 * other overheads allocate from the receive buffer
> >                 * during socket buffer allocation.
> >                 *
> >                 * And after considering the possible alternatives,
> >                 * returning the value we actually used in getsockopt
> >                 * is the most desirable behavior.
> >                 */
> >                if ((val * 2) < SOCK_MIN_RCVBUF)
> >                        sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
> >                else
> >                        sk->sk_rcvbuf = val * 2;
> >
> >and
> >
> >/*
> > * Since sk_rmem_alloc sums skb->truesize, even a small frame might need
> > * sizeof(sk_buff) + MTU + padding, unless net driver perform copybreak
> > */
> >#define SOCK_MIN_RCVBUF (2048 + sizeof(struct sk_buff))
> 
> Pity - seems to work back on 2.6.26:

Hmm, that code is there at least as far back as 2.6.12.
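
For what it's worth, the clamping is easy to see from user space. A minimal
sketch (the exact value getsockopt() reports back depends on
sizeof(struct sk_buff) in the running kernel):

/* Ask for a 1-byte receive buffer and print what the kernel grants.
 * Per the code quoted above, the value is doubled on the way in and
 * then raised to at least SOCK_MIN_RCVBUF. */
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        int val = 1;
        socklen_t len = sizeof(val);

        setsockopt(s, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val));
        getsockopt(s, SOL_SOCKET, SO_RCVBUF, &val, &len);
        printf("SO_RCVBUF granted: %d\n", val);
        return 0;
}

Rather than the 2 bytes a naive doubling of 1 would give, this should print
at least SOCK_MIN_RCVBUF.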

> raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1 -m 1024
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> localhost (127.0.0.1) port 0 AF_INET : histogram
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 
> 124928    1024   10.00     2882334      0    2361.17
>    256           10.00           0              0.00
> 
> raj@tardy:~/netperf2_trunk$ uname -a
> Linux tardy 2.6.26-2-amd64 #1 SMP Sun Jun 20 20:16:30 UTC 2010 x86_64 GNU/Linux
> 
> Still, even with that (or SIGSTOP) we don't really know where the
> packets were dropped, right?  There is no guarantee they weren't
> dropped before they got to the socket buffer.
> 
> happy benchmarking,
> rick jones

Right. Better to send to a port with no socket listening there;
that would drop the packet at an early (if not the earliest
possible) opportunity.
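
For example, a minimal sender along those lines (a sketch only; the address
and port below are placeholders, not anything from this thread):

/* Blast UDP datagrams at a port where nothing is listening.  The
 * receiver drops each datagram during UDP demux (possibly answering
 * with ICMP port-unreachable), i.e. before any socket buffer is
 * involved, so the socket buffer size no longer matters. */
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in dst;
        char buf[1024] = { 0 };

        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9999);                   /* assumed unused port */
        dst.sin_addr.s_addr = inet_addr("192.0.2.1"); /* placeholder host */

        for (;;)
                sendto(s, buf, sizeof(buf), 0,
                       (struct sockaddr *)&dst, sizeof(dst));
}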

> PS - here is with a -S 1024 option:
> 
> raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1024 -m 1024
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> localhost (127.0.0.1) port 0 AF_INET : histogram
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 
> 124928    1024   10.00     1679269      0    1375.64
>   2048           10.00     1490662           1221.13
> 
> showing that there is a decent chance that many of the frames were
> dropped at the socket buffer, but not all - I suppose I could/should
> be checking netstat stats... :)
> 
> And just a little more, only because I was curious :)
> 
> raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1M -m 257
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> localhost (127.0.0.1) port 0 AF_INET : histogram
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 
> 124928     257   10.00     1869134      0     384.29
> 262142           10.00     1869134            384.29
> 
> raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1 -m 257
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> localhost (127.0.0.1) port 0 AF_INET : histogram
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 
> 124928     257   10.00     3076363      0     632.49
>    256           10.00           0              0.00
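
On the netstat point above, the relevant counters can also be read straight
from /proc/net/snmp. A small sketch, equivalent to eyeballing netstat -su
(field names vary a little by kernel version; RcvbufErrors only exists on
newer kernels):

/* Print the kernel's UDP counters.  "NoPorts" counts datagrams sent to
 * a port with no listener; "InErrors" (and "RcvbufErrors" where
 * present) reflects drops at the socket buffer. */
#include <stdio.h>
#include <string.h>

int main(void)
{
        char line[512];
        FILE *f = fopen("/proc/net/snmp", "r");

        if (!f)
                return 1;
        while (fgets(line, sizeof(line), f))
                if (strncmp(line, "Udp:", 4) == 0)
                        fputs(line, stdout);
        fclose(f);
        return 0;
}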


Thread overview: 40+ messages
2011-01-06  9:33 Flow Control and Port Mirroring Revisited Simon Horman
2011-01-06 10:22 ` Eric Dumazet
2011-01-06 12:44   ` Simon Horman
2011-01-06 13:28     ` Eric Dumazet
2011-01-06 22:01       ` Simon Horman
2011-01-06 22:38     ` Jesse Gross
2011-01-07  1:23       ` Simon Horman
2011-01-10  9:31         ` Simon Horman
2011-01-13  6:47           ` Simon Horman
2011-01-13 15:45             ` Jesse Gross
2011-01-13 23:41               ` Simon Horman
2011-01-14  4:58                 ` Michael S. Tsirkin
2011-01-14  6:35                   ` Simon Horman
2011-01-14  6:54                     ` Michael S. Tsirkin
2011-01-16 22:37                       ` Simon Horman
2011-01-16 23:56                         ` Rusty Russell
2011-01-17 10:38                           ` Michael S. Tsirkin
2011-01-17 10:26                         ` Michael S. Tsirkin
2011-01-18 19:41                           ` Rick Jones
2011-01-18 20:13                             ` Michael S. Tsirkin
2011-01-18 21:28                               ` Rick Jones
2011-01-19  9:11                               ` Simon Horman
2011-01-20  8:38                             ` Simon Horman
2011-01-21  2:30                               ` Rick Jones
2011-01-21  9:59                               ` Michael S. Tsirkin
2011-01-21 18:04                                 ` Rick Jones
2011-01-21 23:11                                 ` Simon Horman
2011-01-22 21:57                                   ` Michael S. Tsirkin
2011-01-23  6:38                                     ` Simon Horman
2011-01-23 10:39                                       ` Michael S. Tsirkin
2011-01-23 13:53                                         ` Simon Horman
2011-01-24 18:27                                         ` Rick Jones
2011-01-24 18:36                                           ` Michael S. Tsirkin
2011-01-24 19:01                                             ` Rick Jones
2011-01-24 19:42                                               ` Michael S. Tsirkin [this message]
2011-01-06 10:27 ` Michael S. Tsirkin
2011-01-06 11:30   ` Simon Horman
2011-01-06 12:07     ` Michael S. Tsirkin
2011-01-06 12:29       ` Simon Horman
2011-01-06 12:47         ` Michael S. Tsirkin
