From: Eric Dumazet <eric.dumazet@gmail.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>,
rick.jones2@hp.com, therbert@google.com, wsommerfeld@google.com,
daniel.baluta@gmail.com, netdev@vger.kernel.org
Subject: Re: SO_REUSEPORT - can it be done in kernel?
Date: Mon, 28 Feb 2011 15:53:03 +0100
Message-ID: <1298904783.2941.412.camel@edumazet-laptop>
In-Reply-To: <1298899971.2941.281.camel@edumazet-laptop>
On Monday 28 February 2011 at 14:32 +0100, Eric Dumazet wrote:
> On Monday 28 February 2011 at 19:36 +0800, Herbert Xu wrote:
> > On Sun, Feb 27, 2011 at 07:06:14PM +0800, Herbert Xu wrote:
> > > I'm working on this right now.
> >
> > OK I think I was definitely on the right track. With the send
> > patch made lockless I now get numbers which are even better than
> > those obtained with running named with multiple sockets. That's
> > right, a single socket is now faster than what multiple sockets
> > were without the patch (of course, multiple sockets may still be
> > faster with the patch vs. a single socket for obvious reasons,
> > but I couldn't measure any significant difference).
> >
> > Also worthy of note is that prior to the patch all CPUs showed
> > idleness (lazy bastards!), with the patch they're all maxed out.
> >
> > In retrospect, the idleness was simply the result of the socket
> > lock scheduling away and was an indication of lock contention.
> >
>
> Now the input path can run without finding the socket locked by the
> xmit path, so skbs are queued into the receive queue, not the backlog one.
>
> > Here are the patches I used. Please don't apply them yet as I intend
> > to clean them up quite a bit.
> >
> > But please do test them heavily, especially if you have an AMD
> > NUMA machine as that's where scalability problems really show
> > up. Intel tends to be a lot more forgiving. My last AMD machine
> > blew up years ago :)
>
> I am going to test them, thanks !
>
First "sending only" tests on my 2x4x2 machine (two E5540@2.53GHz, quad
core, hyper-threaded, NUMA kernel):
16 threads, each one sending 100,000 UDP frames using a _shared_ socket.
I use the same destination IP for every frame, so the test suffers a bit
of dst refcount contention.
(Frames go to the dummy0 device to avoid contention on the qdisc and the
device.)
# ip ro get 10.2.2.21
10.2.2.21 dev dummy0 src 10.2.2.2
cache
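For readers without udpflood at hand, the sending test described above can be approximated by the sketch below: several threads call sendto() on one shared UDP socket, all to the same destination. This is only an illustration under assumptions, not the udpflood source. The names flood_shared_socket, flood_thread and struct flood_arg, the 64-byte payload and the MAX_THREADS limit are all hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_THREADS 64   /* arbitrary limit for this sketch */

/* Per-thread state; every thread points at the same fd and destination. */
struct flood_arg {
    int fd;
    const struct sockaddr_in *dst;
    long frames;   /* frames this thread should send */
    long sent;     /* frames actually accepted by sendto() */
};

static void *flood_thread(void *p)
{
    struct flood_arg *a = p;
    char payload[64];   /* payload size is an assumption */

    memset(payload, 0, sizeof(payload));
    for (long i = 0; i < a->frames; i++) {
        ssize_t n = sendto(a->fd, payload, sizeof(payload), 0,
                           (const struct sockaddr *)a->dst, sizeof(*a->dst));
        if (n == (ssize_t)sizeof(payload))
            a->sent++;
    }
    return NULL;
}

/* Returns the total number of frames sent, or -1 on a setup error. */
long flood_shared_socket(const char *ip, unsigned short port,
                         int nthreads, long frames)
{
    struct sockaddr_in dst;
    struct flood_arg args[MAX_THREADS];
    pthread_t tids[MAX_THREADS];
    long total = 0;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);   /* one socket for all threads */
    if (fd < 0)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct flood_arg){ .fd = fd, .dst = &dst, .frames = frames };
        if (pthread_create(&tids[i], NULL, flood_thread, &args[i]) != 0)
            break;
        started++;
    }
    /* Wait for every started thread and add up what each one sent. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].sent;
    }
    close(fd);
    return total;
}
```

Calling flood_shared_socket("10.2.2.21", 9, 16, 100000) from a main() would roughly mirror the "-t 16 -l 100000" runs (port 9 is an arbitrary choice here). Because every thread uses the same fd, any per-socket lock taken in the send path is contended by all of them, which is the effect these tests measure.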
LOCKDEP enabled kernel
Before :
time ./udpflood -f -t 16 -l 100000 10.2.2.21
real 0m42.749s
user 0m1.010s
sys 1m38.039s
After :
time ./udpflood -f -t 16 -l 100000 10.2.2.21
real 0m1.167s
user 0m0.488s
sys 0m17.373s
With one thread only and 16*100000 frames :
# time ./udpflood -f -l 1600000 10.2.2.21
real 0m9.318s
user 0m0.238s
sys 0m9.052s
(We have some false sharing on atomic fields in struct file and socket,
but nothing to worry about.)
With LOCKDEP OFF :
16 threads :
# time ./udpflood -f -t 16 -l 100000 10.2.2.21
real 0m0.718s
user 0m0.376s
sys 0m10.963s
1 thread :
# time ./udpflood -f -l 1600000 10.2.2.21
real 0m1.514s
user 0m0.153s
sys 0m1.357s
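To make the timings easier to compare, they can be converted to frames per second of wall-clock time and to speedup ratios. The helper below is a small sketch using only the "real" figures quoted in this mail; the function names are hypothetical and are not part of udpflood.

```c
#include <stdio.h>

/* Frames sent per second of wall-clock ("real") time. */
double frames_per_sec(long frames, double real_seconds)
{
    return (double)frames / real_seconds;
}

/* How many times faster the second run finished than the first. */
double speedup(double before_seconds, double after_seconds)
{
    return before_seconds / after_seconds;
}

/* Print derived figures for the runs quoted in this mail. */
void print_summary(void)
{
    const long total = 16L * 100000L;   /* 1,600,000 frames in every run */

    /* LOCKDEP kernel, 16 threads: before vs. after the patches */
    printf("lockdep, 16 threads: %.1fx faster\n", speedup(42.749, 1.167));

    /* LOCKDEP off: 16 threads vs. 1 thread, both with the patches */
    printf("16 threads: %.0f frames/s\n", frames_per_sec(total, 0.718));
    printf(" 1 thread : %.0f frames/s\n", frames_per_sec(total, 1.514));
}
```

With these inputs, the 16-thread LOCKDEP run finishes roughly 36 times faster after the patches, and without LOCKDEP the 16 threads push about 2.2 million frames per second against about 1.06 million for a single thread.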
"perf record/report" results for the 16 threads case (no lockdep)
# Events: 389K cpu-clock-msecs
#
# Overhead  Command   Shared Object      Symbol
# ........  ........  .................  ...................................
#
     9.03%  udpflood  [kernel.kallsyms]  [k] sock_wfree
     8.58%  udpflood  [kernel.kallsyms]  [k] __ip_route_output_key
     8.52%  udpflood  [kernel.kallsyms]  [k] sock_alloc_send_pskb
     7.46%  udpflood  [kernel.kallsyms]  [k] sock_def_write_space
     6.76%  udpflood  [kernel.kallsyms]  [k] __xfrm_lookup
     6.18%  swapper   [kernel.kallsyms]  [k] acpi_idle_enter_bm
     5.66%  udpflood  [kernel.kallsyms]  [k] dst_release
     4.96%  udpflood  [kernel.kallsyms]  [k] udp_sendmsg
     3.48%  udpflood  [kernel.kallsyms]  [k] fget_light
     2.75%  udpflood  [kernel.kallsyms]  [k] sock_tx_timestamp
     2.40%  udpflood  [kernel.kallsyms]  [k] __ip_make_skb
     2.36%  udpflood  [kernel.kallsyms]  [k] fput
     1.87%  swapper   [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     1.81%  udpflood  [kernel.kallsyms]  [k] inet_sendmsg
     1.53%  udpflood  [kernel.kallsyms]  [k] sys_sendto
     1.50%  udpflood  [kernel.kallsyms]  [k] ip_finish_output
     1.31%  udpflood  [kernel.kallsyms]  [k] csum_partial_copy_generic
     1.30%  udpflood  udpflood           [.] do_thread
     1.28%  udpflood  [kernel.kallsyms]  [k] __ip_append_data
     1.08%  udpflood  [kernel.kallsyms]  [k] __memset
     1.05%  udpflood  [kernel.kallsyms]  [k] ip_route_output_flow
     0.91%  udpflood  [kernel.kallsyms]  [k] kfree
     0.88%  udpflood  [vdso]             [.] 0xffffe430
     0.83%  udpflood  [kernel.kallsyms]  [k] copy_user_generic_string
     0.78%  udpflood  libc-2.3.4.so      [.] __GI_memcpy
     0.77%  udpflood  [kernel.kallsyms]  [k] ia32_sysenter_target
What do you suggest for running a bind-based test?