From: Eric Dumazet <dada1@cosmosbay.com>
To: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>,
Linux Netdev List <netdev@vger.kernel.org>,
Andi Kleen <ak@suse.de>,
Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
Date: Thu, 01 Nov 2007 18:54:24 +0100
Message-ID: <472A12D0.4070401@cosmosbay.com>
In-Reply-To: <20071101091456.26248ce0@freepuppy.rosehill>

Stephen Hemminger wrote:
> On Thu, 01 Nov 2007 11:16:20 +0100
> Eric Dumazet <dada1@cosmosbay.com> wrote:
>
>> As done two years ago on IP route cache table (commit
>> 22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one lock per
>> hash bucket for the huge TCP/DCCP hash tables.
>>
>> On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, for little
>> performance difference. (We hit a different cache line for the rwlock, but
>> then the bucket cache line has a better sharing factor among CPUs, since we
>> dirty it less often.)
>>
>> Using a 'small' table of hashed rwlocks should be more than enough to provide
>> correct SMP concurrency between different buckets, without using too much
>> memory. Sizing of this table depends on NR_CPUS and various CONFIG settings.
>>
>> This patch provides some locking abstraction that may ease future work using
>> a different model for the TCP/DCCP table.
>>
>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>>
>> include/net/inet_hashtables.h | 40 ++++++++++++++++++++++++++++----
>> net/dccp/proto.c | 16 ++++++++++--
>> net/ipv4/inet_diag.c | 9 ++++---
>> net/ipv4/inet_hashtables.c | 7 +++--
>> net/ipv4/inet_timewait_sock.c | 13 +++++-----
>> net/ipv4/tcp.c | 11 +++++++-
>> net/ipv4/tcp_ipv4.c | 11 ++++----
>> net/ipv6/inet6_hashtables.c | 19 ++++++++-------
>> 8 files changed, 89 insertions(+), 37 deletions(-)
>>
>
> Long term, is there any chance of using RCU for this? Seems like
> it could be a big win.
>

This was discussed in the past, and I believe a patch was even proposed,
but some people (including David) objected that RCU is only well suited to
'mostly read' structures.

On some web server workloads, the TCP hash table is constantly accessed in
write mode (socket creation, socket moving to timewait state, socket
deletion...), and RCU added overhead and poor cache reuse (because sockets
must be placed on an RCU queue before reuse).

On these typical workloads, a hash table without RCU is still the best choice.

Long-term changes would rather be based on Robert Olsson's suggestion from
last year (trie-based lookups and a unified IP/TCP cache).

Short-term changes would be to make the TCP hash table resizable (small at
boot, and able to grow if necessary). Its current size on modern machines is
just insane.
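
For reference, the 'small table of hashed rwlocks' idea from the quoted patch
description can be sketched roughly as below. This is a user-space
illustration only, not the patch itself: every name (demo_hashinfo,
demo_lockp, ...) and both table sizes are hypothetical, and the lock type is
a stand-in for rwlock_t. It assumes both sizes are powers of two and that the
lock table is no larger than the bucket table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes for illustration only; not the values used by the patch. */
#define DEMO_BUCKETS 65536u  /* hash buckets, power of two */
#define DEMO_LOCKS   256u    /* shared locks, power of two, much smaller */

/* Stand-in for the kernel's rwlock_t so the sketch builds in user space. */
struct demo_rwlock {
	int writer_held;
};

/* A bucket holds only its chain head; there is no per-bucket lock. */
struct demo_bucket {
	void *chain;
};

struct demo_hashinfo {
	struct demo_bucket buckets[DEMO_BUCKETS];
	struct demo_rwlock locks[DEMO_LOCKS];
};

/* Reduce a full hash value to a bucket index. */
static unsigned int demo_bucket_index(unsigned int hash)
{
	return hash & (DEMO_BUCKETS - 1);
}

/*
 * Return the lock covering the bucket that 'hash' falls into.
 * Because DEMO_LOCKS divides DEMO_BUCKETS, masking the hash gives the
 * same result as masking the bucket index, so a given bucket always
 * maps to the same lock, while many buckets share each lock.
 */
static struct demo_rwlock *demo_lockp(struct demo_hashinfo *h, unsigned int hash)
{
	assert(h != NULL);
	return &h->locks[hash & (DEMO_LOCKS - 1)];
}
```

A caller would take demo_lockp(h, hash) before walking or modifying
h->buckets[demo_bucket_index(hash)]. Two different buckets can contend on the
same lock, which is the trade-off accepted in exchange for the memory saving.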