From: Eric Dumazet <eric.dumazet@gmail.com>
To: Thomas Graf <tgraf@infradead.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
David Miller <davem@davemloft.net>,
rick.jones2@hp.com, therbert@google.com, wsommerfeld@google.com,
daniel.baluta@gmail.com, netdev@vger.kernel.org
Subject: Re: SO_REUSEPORT - can it be done in kernel?
Date: Mon, 28 Feb 2011 17:22:54 +0100
Message-ID: <1298910174.2941.585.camel@edumazet-laptop>
In-Reply-To: <20110228141322.GF9763@canuck.infradead.org>
On Monday, 28 February 2011 at 09:13 -0500, Thomas Graf wrote:
> On Mon, Feb 28, 2011 at 07:36:59PM +0800, Herbert Xu wrote:
> > But please do test them heavily, especially if you have an AMD
> > NUMA machine as that's where scalability problems really show
> > up. Intel tends to be a lot more forgiving. My last AMD machine
> > blew up years ago :)
>
> This is just a preliminary test result and not 100% reliable
> because halfway through the testing the machine reported memory
> issues and disabled a DIMM before booting the tested kernels.
>
> Nevertheless, bind 9.7.3:
>
> 2.6.38-rc5+: 62kqps
> 2.6.38-rc5+ w/ Herbert's patch: 442kqps
>
> This is on a 2 NUMA Intel Xeon X5560 @ 2.80GHz with 16 cores
>
> Again, this number is not 100% reliable but clearly shows that
> the concept of the patch is working very well.
>
> Will test Herbert's patch on the machine that did 650kqps with
> SO_REUSEPORT and also on some AMD machines.
> --
I suspect your queryperf input file hits many zones?
With a single zone, my machine is able to deliver about 250 kqps: most of
the time is consumed in the BIND code itself, dealing with rwlocks and
false sharing...
(bind-9.7.2-P3)
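
To make the false sharing point concrete, here is a minimal sketch. It is not taken from BIND. It assumes 64-byte cache lines, and the names (padded_counter, counter_inc, counter_sum) are hypothetical. The idea is that each worker thread bumps a statistics slot that sits alone in its cache line, so one thread's update does not invalidate the line another thread is writing.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */
#define NSLOTS     16   /* assumption: one slot per worker thread */

/* Each counter gets a cache line to itself, so a worker that bumps its
 * own slot never dirties the line holding a neighbour's slot. */
struct padded_counter {
    alignas(CACHE_LINE) uint64_t value;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "each slot must fill exactly one cache line");

static struct padded_counter counters[NSLOTS];

/* Meant to be called only by the worker that owns `slot`, so no lock
 * and no atomic operation is needed. The modulo is just a bounds guard. */
static void counter_inc(unsigned slot)
{
    counters[slot % NSLOTS].value++;
}

/* Approximate while workers are running; exact once they are quiescent. */
static uint64_t counter_sum(void)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < NSLOTS; i++)
        total += counters[i].value;
    return total;
}
```

The cost is memory (64 bytes per 8-byte counter), which is usually acceptable for a handful of hot counters.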
Two remote machines send the queries through a bnx2x adapter with RSS
enabled: two CPUs receive UDP frames for the same socket, so we also
hit false sharing in the kernel receive path.
----------------------------------------------------------------------------------------------------
   PerfTop:  558863 irqs/sec  kernel:40.8%  exact:  0.0% [1000Hz cpu-clock-msecs],  (all, 16 CPUs)
----------------------------------------------------------------------------------------------------
     samples   pcnt  function                       DSO
   _________  _____  _____________________________  ______________________________________
   137175.00  12.4%  acpi_idle_enter_bm             [kernel.kallsyms]
    63784.00   5.8%  _raw_spin_unlock_irqrestore    [kernel.kallsyms]
    54140.00   4.9%  isc_rwlock_lock                /opt/src/bind-9.7.2-P3/bin/named/named
    32682.00   2.9%  isc_rwlock_unlock              /opt/src/bind-9.7.2-P3/bin/named/named
    21823.00   2.0%  dns_rbt_findnode               /opt/src/bind-9.7.2-P3/bin/named/named
    20306.00   1.8%  __ticket_spin_lock             [kernel.kallsyms]
    16881.00   1.5%  finish_task_switch             [kernel.kallsyms]
    15335.00   1.4%  zone_find                      /opt/src/bind-9.7.2-P3/bin/named/named
    14082.00   1.3%  decrement_reference            /opt/src/bind-9.7.2-P3/bin/named/named
    14064.00   1.3%  __pthread_mutex_lock_internal  /lib/tls/libpthread-2.3.4.so
    13519.00   1.2%  isc_stats_increment            /opt/src/bind-9.7.2-P3/bin/named/named
    13027.00   1.2%  __GI_memcpy                    /lib/tls/libc-2.3.4.so
    12516.00   1.1%  dns_name_concatenate           /opt/src/bind-9.7.2-P3/bin/named/named
    12499.00   1.1%  currentversion                 /opt/src/bind-9.7.2-P3/bin/named/named
    11412.00   1.0%  dns_name_fullcompare           /opt/src/bind-9.7.2-P3/bin/named/named
    10814.00   1.0%  new_reference.clone.6          /opt/src/bind-9.7.2-P3/bin/named/named
    10580.00   1.0%  attach                         /opt/src/bind-9.7.2-P3/bin/named/named
     9805.00   0.9%  zone_zonecut_callback          /opt/src/bind-9.7.2-P3/bin/named/named