From: Thomas Graf <tgraf@infradead.org>
To: Rick Jones <rick.jones2@hp.com>
Cc: Tom Herbert <therbert@google.com>,
Bill Sommerfeld <wsommerfeld@google.com>,
Daniel Baluta <daniel.baluta@gmail.com>,
netdev@vger.kernel.org
Subject: Re: SO_REUSEPORT - can it be done in kernel?
Date: Fri, 25 Feb 2011 17:48:46 -0500
Message-ID: <20110225224846.GC9763@canuck.infradead.org>
In-Reply-To: <1298661495.14113.152.camel@tardy>

On Fri, Feb 25, 2011 at 11:18:15AM -0800, Rick Jones wrote:
> I think the idea is goodness, but will ask, was the (first) bottleneck
> actually in the kernel, or was it in bind itself? I've seen
> single-instance, single-byte burst-mode netperf TCP_RR do in excess of
> 300K transactions per second (with TCP_NODELAY set) on an X5560 core.
>
> ftp://ftp.netperf.org/netperf/misc/dl380g6_X5560_rhel54_ad386_cxgb3_1.4.1.2_b2b_to_same_agg_1500mtu_20100513-2.csv
>
> and that was with now ancient RHEL5.4 bits... yes, there is a bit of
> apples, oranges and kumquats but still, I am wondering if this didn't
> also "work around" some internal BIND scaling issues as well.

Yes, it is. We have observed two separate bottlenecks.

The first one we discovered is within BIND. With more than one
worker thread in use, strace showed a ton of futex() system calls
as soon as the number of queries crossed a magic barrier. This
suggested heavy lock contention within BIND.

This BIND lock contention was not visible on all systems with scalability
issues, though. Some machines were not able to deliver enough queries to
BIND for the lock contention to appear.