virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
From: Rick Jones <rick.jones2@hp.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Simon Horman <horms@verge.net.au>, Jesse Gross <jesse@nicira.com>,
	Rusty Russell <rusty@rustcorp.com.au>,
	virtualization@lists.linux-foundation.org, dev@openvswitch.org,
	virtualization@lists.osdl.org, netdev@vger.kernel.org,
	kvm@vger.kernel.org
Subject: Re: Flow Control and Port Mirroring Revisited
Date: Mon, 24 Jan 2011 11:01:45 -0800	[thread overview]
Message-ID: <4D3DCC99.5050101@hp.com> (raw)
In-Reply-To: <20110124183626.GB29941@redhat.com>

Michael S. Tsirkin wrote:
> On Mon, Jan 24, 2011 at 10:27:55AM -0800, Rick Jones wrote:
> 
>>>Just to block netperf you can send it SIGSTOP :)
>>>
>>
>>Clever :)  One could I suppose achieve the same result by making the
>>remote receive socket buffer size smaller than the UDP message size
>>and then not worry about having to learn the netserver's PID to send
>>it the SIGSTOP.  I *think* the semantics will be substantially the
>>same?
> 
> 
> If you could set, it, yes. But at least linux ignores
> any value substantially smaller than 1K, and then
> multiplies that by 2:
> 
>         case SO_RCVBUF:
>                 /* Don't error on this BSD doesn't and if you think
>                    about it this is right. Otherwise apps have to
>                    play 'guess the biggest size' games. RCVBUF/SNDBUF
>                    are treated in BSD as hints */
> 
>                 if (val > sysctl_rmem_max)
>                         val = sysctl_rmem_max;
> set_rcvbuf:     
>                 sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
> 
>                 /*
>                  * We double it on the way in to account for
>                  * "struct sk_buff" etc. overhead.   Applications
>                  * assume that the SO_RCVBUF setting they make will
>                  * allow that much actual data to be received on that
>                  * socket.
>                  *
>                  * Applications are unaware that "struct sk_buff" and
>                  * other overheads allocate from the receive buffer
>                  * during socket buffer allocation. 
>                  *
>                  * And after considering the possible alternatives,
>                  * returning the value we actually used in getsockopt
>                  * is the most desirable behavior.
>                  */ 
>                 if ((val * 2) < SOCK_MIN_RCVBUF)
>                         sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
>                 else
>                         sk->sk_rcvbuf = val * 2;
> 
> and
> 
> /*                      
>  * Since sk_rmem_alloc sums skb->truesize, even a small frame might need
>  * sizeof(sk_buff) + MTU + padding, unless net driver perform copybreak
>  */             
> #define SOCK_MIN_RCVBUF (2048 + sizeof(struct sk_buff))

Pity - seems to work back on 2.6.26:

raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1 -m 1024
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost 
(127.0.0.1) port 0 AF_INET : histogram
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

124928    1024   10.00     2882334      0    2361.17
    256           10.00           0              0.00

raj@tardy:~/netperf2_trunk$ uname -a
Linux tardy 2.6.26-2-amd64 #1 SMP Sun Jun 20 20:16:30 UTC 2010 x86_64 GNU/Linux

Still, even with that (or SIGSTOP) we don't really know where the packets were 
dropped right?  There is no guarantee they weren't dropped before they got to 
the socket buffer

happy benchmarking,
rick jones

PS - here is with a -S 1024 option:

raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1024 -m 1024
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost 
(127.0.0.1) port 0 AF_INET : histogram
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

124928    1024   10.00     1679269      0    1375.64
   2048           10.00     1490662           1221.13

showing that there is a decent chance that many of the frames were dropped at 
the socket buffer, but not all - I suppose I could/should be checking netstat 
stats... :)

And just a little more, only because I was curious :)

raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1M -m 257
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost 
(127.0.0.1) port 0 AF_INET : histogram
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

124928     257   10.00     1869134      0     384.29
262142           10.00     1869134            384.29

raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1 -m 257
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost 
(127.0.0.1) port 0 AF_INET : histogram
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

124928     257   10.00     3076363      0     632.49
    256           10.00           0              0.00


  reply	other threads:[~2011-01-24 19:01 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-06  9:33 Flow Control and Port Mirroring Revisited Simon Horman
2011-01-06 10:22 ` Eric Dumazet
2011-01-06 12:44   ` Simon Horman
2011-01-06 13:28     ` Eric Dumazet
2011-01-06 22:01       ` Simon Horman
2011-01-06 22:38     ` Jesse Gross
2011-01-07  1:23       ` Simon Horman
2011-01-10  9:31         ` Simon Horman
2011-01-13  6:47           ` Simon Horman
2011-01-13 15:45             ` Jesse Gross
2011-01-13 23:41               ` Simon Horman
2011-01-14  4:58                 ` Michael S. Tsirkin
2011-01-14  6:35                   ` Simon Horman
2011-01-14  6:54                     ` Michael S. Tsirkin
2011-01-16 22:37                       ` Simon Horman
2011-01-16 23:56                         ` Rusty Russell
2011-01-17 10:38                           ` Michael S. Tsirkin
2011-01-17 10:26                         ` Michael S. Tsirkin
2011-01-18 19:41                           ` Rick Jones
2011-01-18 20:13                             ` Michael S. Tsirkin
2011-01-18 21:28                               ` Rick Jones
2011-01-19  9:11                               ` Simon Horman
2011-01-20  8:38                             ` Simon Horman
2011-01-21  2:30                               ` Rick Jones
2011-01-21  9:59                               ` Michael S. Tsirkin
2011-01-21 18:04                                 ` Rick Jones
2011-01-21 23:11                                 ` Simon Horman
2011-01-22 21:57                                   ` Michael S. Tsirkin
2011-01-23  6:38                                     ` Simon Horman
2011-01-23 10:39                                       ` Michael S. Tsirkin
2011-01-23 13:53                                         ` Simon Horman
2011-01-24 18:27                                         ` Rick Jones
2011-01-24 18:36                                           ` Michael S. Tsirkin
2011-01-24 19:01                                             ` Rick Jones [this message]
2011-01-24 19:42                                               ` Michael S. Tsirkin
2011-01-06 10:27 ` Michael S. Tsirkin
2011-01-06 11:30   ` Simon Horman
2011-01-06 12:07     ` Michael S. Tsirkin
2011-01-06 12:29       ` Simon Horman
2011-01-06 12:47         ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D3DCC99.5050101@hp.com \
    --to=rick.jones2@hp.com \
    --cc=dev@openvswitch.org \
    --cc=horms@verge.net.au \
    --cc=jesse@nicira.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=rusty@rustcorp.com.au \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=virtualization@lists.osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).