public inbox for netdev@vger.kernel.org
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>,
	rick.jones2@hp.com, therbert@google.com, wsommerfeld@google.com,
	daniel.baluta@gmail.com, netdev@vger.kernel.org
Subject: Re: SO_REUSEPORT - can it be done in kernel?
Date: Mon, 28 Feb 2011 15:53:03 +0100
Message-ID: <1298904783.2941.412.camel@edumazet-laptop>
In-Reply-To: <1298899971.2941.281.camel@edumazet-laptop>

On Monday 28 February 2011 at 14:32 +0100, Eric Dumazet wrote:
> On Monday 28 February 2011 at 19:36 +0800, Herbert Xu wrote:
> > On Sun, Feb 27, 2011 at 07:06:14PM +0800, Herbert Xu wrote:
> > > I'm working on this right now.
> > 
> > OK I think I was definitely on the right track.  With the send
> > patch made lockless I now get numbers which are even better than
> > those obtained with running named with multiple sockets.  That's
> > right, a single socket is now faster than what multiple sockets
> > > were without the patch (of course, multiple sockets may still be
> > > faster with the patch vs. a single socket for obvious reasons,
> > but I couldn't measure any significant difference).
> > 
> > Also worthy of note is that prior to the patch all CPUs showed
> > idleness (lazy bastards!), with the patch they're all maxed out.
> > 
> > In retrospect, the idleness was simply the result of the socket
> > lock scheduling away and was an indication of lock contention.
> > 
> 
> Now the input path can run without finding the socket locked by the xmit
> path, so skbs are queued into the receive queue, not the backlog one.
> 
> > Here are the patches I used.  Please don't apply them yet as I intend
> > to clean them up quite a bit.
> > 
> > But please do test them heavily, especially if you have an AMD
> > NUMA machine as that's where scalability problems really show
> > up.  Intel tends to be a lot more forgiving.  My last AMD machine
> > blew up years ago :)
> 
> I am going to test them, thanks!
> 

First "sending only" tests on my 2x4x2 machine (two E5540 @ 2.53GHz, quad
core, hyper-threaded, NUMA kernel):

16 threads, each one sending 100,000 UDP frames using a _shared_ socket.

I use the same destination IP, so the test suffers a bit from dst refcount
contention.

(Frames go to the dummy0 device to avoid contention on the qdisc and device.)
# ip ro get 10.2.2.21
10.2.2.21 dev dummy0  src 10.2.2.2 
    cache 
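
For readers who do not have udpflood at hand, the shape of this test (many threads sending through one shared UDP socket) can be approximated as below. This is only a sketch and not the udpflood source: the function name `flood`, the 32-byte payload and the per-thread counters are assumptions made for illustration, and Python overhead means the absolute numbers will not be comparable to the ones in this mail.

```python
import socket
import threading


def flood(dest, threads=16, count=100000, payload=b"x" * 32):
    """Send `count` datagrams from each of `threads` threads over ONE socket.

    Returns the total number of datagrams handed to the kernel.
    The payload size is an arbitrary choice, not taken from udpflood.
    """
    # A single UDP socket shared by every thread: this is the contended case.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = [0] * threads  # one counter per thread, so no extra locking

    def worker(idx):
        for _ in range(count):
            # sendto() releases the GIL around the system call, so the
            # kernel can see concurrent senders on the same socket.
            sock.sendto(payload, dest)
            sent[idx] += 1

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    sock.close()
    return sum(sent)
```

Calling `flood(("10.2.2.21", 9), threads=16, count=100000)` would mirror the thread and frame counts used here; port 9 is a placeholder, since the mail does not say which port udpflood targets.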

LOCKDEP-enabled kernel

Before:

# time ./udpflood -f -t 16 -l 100000 10.2.2.21

real	0m42.749s
user	0m1.010s
sys	1m38.039s

After:

# time ./udpflood -f -t 16 -l 100000 10.2.2.21

real	0m1.167s
user	0m0.488s
sys	0m17.373s
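
One way to read these two runs is to divide CPU time (user + sys) by wall-clock time, which gives the average number of CPUs kept busy. The sketch below does that arithmetic on the figures above; the helper names `to_seconds` and `busy_cpus` are made up for this example, and the sys times include lockdep overhead, so treat the result as indicative only.

```python
import re


def to_seconds(stamp):
    """Convert a time(1) value such as '1m38.039s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


def busy_cpus(real, user, sys):
    """Average number of CPUs kept busy: (user + sys) / real."""
    return (to_seconds(user) + to_seconds(sys)) / to_seconds(real)


# Figures copied from the two LOCKDEP runs above (16 threads, shared socket).
before = busy_cpus("0m42.749s", "0m1.010s", "1m38.039s")
after = busy_cpus("0m1.167s", "0m0.488s", "0m17.373s")
speedup = to_seconds("0m42.749s") / to_seconds("0m1.167s")

print(f"before: {before:.1f} CPUs busy")
print(f"after:  {after:.1f} CPUs busy")
print(f"wall-clock speedup: {speedup:.1f}x")
```

This works out to roughly 2.3 CPUs busy before and 15.3 after, out of 16 logical CPUs, which fits Herbert's remark that the CPUs were idle before the patch and maxed out with it.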


With one thread only and 16*100000 frames:
# time ./udpflood -f -l 1600000 10.2.2.21

real	0m9.318s
user	0m0.238s
sys	0m9.052s

(We have some false sharing on atomic fields in struct file and socket,
but nothing to worry about.)

With LOCKDEP off:

16 threads:

# time ./udpflood -f -t 16 -l 100000 10.2.2.21

real	0m0.718s
user	0m0.376s
sys	0m10.963s

1 thread:

# time ./udpflood -f -l 1600000 10.2.2.21

real	0m1.514s
user	0m0.153s
sys	0m1.357s
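
Since every run sends 16*100000 = 1,600,000 frames, the "real" times convert directly into frames per second. The following sketch does the conversion; the run labels are my own wording, and the figures are simply the wall-clock times reported above.

```python
FRAMES = 16 * 100000  # every run above sends 1,600,000 frames in total

# Wall-clock ("real") seconds copied from the time(1) output above.
runs = {
    "lockdep, 16 threads, before": 42.749,
    "lockdep, 16 threads, after": 1.167,
    "lockdep, 1 thread": 9.318,
    "no lockdep, 16 threads": 0.718,
    "no lockdep, 1 thread": 1.514,
}

rates = {name: FRAMES / secs for name, secs in runs.items()}
for name, rate in rates.items():
    print(f"{name:30s} {rate / 1e6:6.2f} M frames/s")

# Gain of 16 threads on one shared socket over a single thread (no lockdep).
scaling = runs["no lockdep, 1 thread"] / runs["no lockdep, 16 threads"]
print(f"16-thread vs 1-thread: {scaling:.2f}x")
```

Without lockdep this gives about 2.23 M frames/s for 16 threads against about 1.06 M frames/s for one thread, a factor of roughly 2.1 on a shared socket.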


"perf record/report" results for the 16 threads case (no lockdep)

# Events: 389K cpu-clock-msecs
#
# Overhead      Command        Shared Object                               Symbol
# ........  ...........  ...................  ...................................
#
     9.03%     udpflood  [kernel.kallsyms]    [k] sock_wfree
     8.58%     udpflood  [kernel.kallsyms]    [k] __ip_route_output_key
     8.52%     udpflood  [kernel.kallsyms]    [k] sock_alloc_send_pskb
     7.46%     udpflood  [kernel.kallsyms]    [k] sock_def_write_space
     6.76%     udpflood  [kernel.kallsyms]    [k] __xfrm_lookup
     6.18%      swapper  [kernel.kallsyms]    [k] acpi_idle_enter_bm
     5.66%     udpflood  [kernel.kallsyms]    [k] dst_release
     4.96%     udpflood  [kernel.kallsyms]    [k] udp_sendmsg
     3.48%     udpflood  [kernel.kallsyms]    [k] fget_light
     2.75%     udpflood  [kernel.kallsyms]    [k] sock_tx_timestamp
     2.40%     udpflood  [kernel.kallsyms]    [k] __ip_make_skb
     2.36%     udpflood  [kernel.kallsyms]    [k] fput
     1.87%      swapper  [kernel.kallsyms]    [k] _raw_spin_unlock_irqrestore
     1.81%     udpflood  [kernel.kallsyms]    [k] inet_sendmsg
     1.53%     udpflood  [kernel.kallsyms]    [k] sys_sendto
     1.50%     udpflood  [kernel.kallsyms]    [k] ip_finish_output
     1.31%     udpflood  [kernel.kallsyms]    [k] csum_partial_copy_generic
     1.30%     udpflood  udpflood             [.] do_thread
     1.28%     udpflood  [kernel.kallsyms]    [k] __ip_append_data
     1.08%     udpflood  [kernel.kallsyms]    [k] __memset
     1.05%     udpflood  [kernel.kallsyms]    [k] ip_route_output_flow
     0.91%     udpflood  [kernel.kallsyms]    [k] kfree
     0.88%     udpflood  [vdso]               [.] 0xffffe430
     0.83%     udpflood  [kernel.kallsyms]    [k] copy_user_generic_string
     0.78%     udpflood  libc-2.3.4.so        [.] __GI_memcpy
     0.77%     udpflood  [kernel.kallsyms]    [k] ia32_sysenter_target
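
A report like this is easy to post-process if one wants totals per command rather than per symbol. The sketch below parses the first rows of the table above (copied verbatim) and sums the overhead per command; the regular expression and the names `ROW` and `parse` are assumptions written for this excerpt, not part of perf.

```python
import re
from collections import defaultdict

# First rows of the report above, copied verbatim.
REPORT = """\
     9.03%     udpflood  [kernel.kallsyms]    [k] sock_wfree
     8.58%     udpflood  [kernel.kallsyms]    [k] __ip_route_output_key
     8.52%     udpflood  [kernel.kallsyms]    [k] sock_alloc_send_pskb
     7.46%     udpflood  [kernel.kallsyms]    [k] sock_def_write_space
     6.76%     udpflood  [kernel.kallsyms]    [k] __xfrm_lookup
     6.18%      swapper  [kernel.kallsyms]    [k] acpi_idle_enter_bm
     5.66%     udpflood  [kernel.kallsyms]    [k] dst_release
     4.96%     udpflood  [kernel.kallsyms]    [k] udp_sendmsg
     3.48%     udpflood  [kernel.kallsyms]    [k] fget_light
     2.75%     udpflood  [kernel.kallsyms]    [k] sock_tx_timestamp
     2.40%     udpflood  [kernel.kallsyms]    [k] __ip_make_skb
     2.36%     udpflood  [kernel.kallsyms]    [k] fput
     1.87%      swapper  [kernel.kallsyms]    [k] _raw_spin_unlock_irqrestore
"""

# Columns: overhead%, command, shared object, [k]/[.] marker, symbol.
ROW = re.compile(r"^\s*([\d.]+)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(\S+)\s*$")


def parse(report):
    """Return (percent, command, shared_object, level, symbol) tuples."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            pct, command, dso, level, symbol = m.groups()
            rows.append((float(pct), command, dso, level, symbol))
    return rows


rows = parse(REPORT)
by_command = defaultdict(float)
for pct, command, *_ in rows:
    by_command[command] += pct

for command, pct in sorted(by_command.items()):
    print(f"{command:10s} {pct:6.2f}%")
```

On this excerpt the swapper (idle) rows add up to about 8% of samples, so even in the 16-thread case the CPUs are not entirely busy.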


What would you suggest for performing a bind-based test?




