From: Eric Dumazet <dada1@cosmosbay.com>
To: David Miller <davem@davemloft.net>
Cc: shemminger@vyatta.com, benny+usenet@amorsen.dk, minyard@acm.org,
netdev@vger.kernel.org, paulmck@linux.vnet.ibm.com,
Christoph Lameter <clameter@sgi.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Subject: Re: [PATCH 2/2] udp: RCU handling for Unicast packets.
Date: Tue, 28 Oct 2008 23:45:15 +0100 [thread overview]
Message-ID: <490795FB.2000201@cosmosbay.com> (raw)
In-Reply-To: <49077918.4050706@cosmosbay.com>
Eric Dumazet a écrit :
> RCUification of UDP hash tables
>
> Goals are :
>
> 1) Optimizing handling of incoming Unicast UDP frames, so that no memory
> writes should happen in the fast path. Using an array of rwlocks (one per
> slot for example is not an option in this regard)
>
> Note: Multicasts and broadcasts still will need to take a lock,
> because doing a full lockless lookup in this case is difficult.
>
> 2) No expensive operations in the socket bind/unhash phases :
> - No expensive synchronize_rcu() calls.
>
> - No added rcu_head in socket structure, increasing memory needs,
> but more important, forcing us to use call_rcu() calls,
> that have the bad property of making sockets structure cold.
> (rcu grace period between socket freeing and its potential reuse
> make this socket being cold in CPU cache).
> David did a previous patch using call_rcu() and noticed a 20%
> impact on TCP connection rates.
> Quoting Cristopher Lameter :
> "Right. That results in cacheline cooldown. You'd want to recycle
> the object as they are cache hot on a per cpu basis. That is screwed
> up by the delayed regular rcu processing. We have seen multiple
> regressions due to cacheline cooldown.
> The only choice in cacheline hot sensitive areas is to deal with the
> complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."
>
> - Because udp sockets are allocated from dedicated kmem_cache,
> use of SLAB_DESTROY_BY_RCU can help here.
>
> Theory of operation :
> ---------------------
>
> As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
> special attention must be taken by readers and writers.
>
> Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
> reused, inserted in a different chain or in worst case in the same chain
> while readers could do lookups in the same time.
>
> In order to avoid loops, a reader must check each socket found in a chain
> really belongs to the chain the reader was traversing. If it finds a
> mismatch, lookup must start again at the begining. This *restart* loop
> is the reason we had to use rdlock for the multicast case, because
> we dont want to send same message several times to the same socket.
>
> We use RCU only for fast path. Thus, /proc/net/udp still take rdlocks.
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---
> include/net/sock.h | 37 ++++++++++++++++++++++++++++++++++++-
> net/core/sock.c | 3 ++-
> net/ipv4/udp.c | 35 ++++++++++++++++++++++++++---------
> net/ipv4/udplite.c | 1 +
> net/ipv6/udp.c | 25 ++++++++++++++++++-------
> net/ipv6/udplite.c | 1 +
> 6 files changed, 84 insertions(+), 18 deletions(-)
>
On ipv6 side, I forgot to add a check before compute_score(), like I did on ipv4
+ rcu_read_lock();
+begin:
+ result = NULL;
+ badness = -1;
+ sk_for_each_rcu(sk, node, &hslot->head) {
< BEGIN HERE missing part --->
/*
* lockless reader, and SLAB_DESTROY_BY_RCU items:
* We must check this item was not moved to another chain
*/
if (udp_hashfn(net, sk->sk_hash) != hash)
goto begin;
< END missing part --->
score = compute_score(sk, net, hnum, saddr, sport, daddr, dport, dif);
if (score > badness) {
result = sk;
badness = score;
}
}
- if (result)
- sock_hold(result);
- read_unlock(&hslot->lock);
+ if (result) {
+ if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
+ result = NULL;
+ else if (unlikely(compute_score(result, net, hnum, saddr, sport,
+ daddr, dport, dif) < badness)) {
+ sock_put(result);
+ goto begin;
+ }
+ }
I will submit a new patch serie tomorrow, with :
Patch 1 : spinlocks instead of rwlocks, and bug spotted by Christian Bell
Patch 2 : splited on two parts (2 & 3) , one for IPV4, one for IPV6,
Thanks
next prev parent reply other threads:[~2008-10-28 23:49 UTC|newest]
Thread overview: 134+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-06 18:50 [PATCH 3/3] Convert the UDP hash lock to RCU Corey Minyard
2008-10-06 21:22 ` Eric Dumazet
2008-10-06 21:40 ` David Miller
2008-10-06 23:08 ` Corey Minyard
2008-10-07 8:37 ` Evgeniy Polyakov
2008-10-07 14:16 ` Christoph Lameter
2008-10-07 14:29 ` Evgeniy Polyakov
2008-10-07 14:38 ` Christoph Lameter
2008-10-07 14:33 ` Paul E. McKenney
2008-10-07 14:45 ` Christoph Lameter
2008-10-07 15:07 ` Eric Dumazet
2008-10-07 15:07 ` Paul E. McKenney
2008-10-07 5:24 ` Eric Dumazet
2008-10-07 8:54 ` Benny Amorsen
2008-10-07 12:59 ` Eric Dumazet
2008-10-07 14:07 ` Stephen Hemminger
2008-10-07 20:55 ` David Miller
2008-10-07 21:20 ` Stephen Hemminger
2008-10-08 13:55 ` Eric Dumazet
2008-10-08 18:45 ` David Miller
2008-10-28 20:37 ` [PATCH 1/2] udp: introduce struct udp_table and multiple rwlocks Eric Dumazet
2008-10-28 21:23 ` Christian Bell
2008-10-28 21:31 ` Evgeniy Polyakov
2008-10-28 21:48 ` Eric Dumazet
2008-10-28 21:28 ` Evgeniy Polyakov
2008-10-28 20:42 ` [PATCH 2/2] udp: RCU handling for Unicast packets Eric Dumazet
2008-10-28 22:45 ` Eric Dumazet [this message]
2008-10-29 5:05 ` David Miller
2008-10-29 8:23 ` Eric Dumazet
2008-10-29 8:56 ` David Miller
2008-10-29 10:19 ` Eric Dumazet
2008-10-29 18:19 ` David Miller
2008-10-29 9:04 ` Eric Dumazet
2008-10-29 9:17 ` David Miller
2008-10-29 13:17 ` Corey Minyard
2008-10-29 14:36 ` Eric Dumazet
2008-10-29 15:34 ` Corey Minyard
2008-10-29 16:09 ` Eric Dumazet
2008-10-29 16:37 ` Paul E. McKenney
2008-10-29 17:22 ` Corey Minyard
2008-10-29 17:45 ` Eric Dumazet
2008-10-29 18:28 ` Corey Minyard
2008-10-29 18:52 ` Paul E. McKenney
2008-10-29 20:00 ` Eric Dumazet
2008-10-29 20:17 ` Paul E. McKenney
2008-10-29 21:29 ` Corey Minyard
2008-10-29 21:57 ` Eric Dumazet
2008-10-29 21:58 ` Paul E. McKenney
2008-10-29 22:08 ` Eric Dumazet
2008-10-30 3:22 ` Corey Minyard
2008-10-30 5:50 ` Eric Dumazet
2008-11-02 4:19 ` David Miller
2008-10-30 5:40 ` David Miller
2008-10-30 5:51 ` Eric Dumazet
2008-10-30 7:04 ` Eric Dumazet
2008-10-30 7:05 ` David Miller
2008-10-30 15:40 ` [PATCH] udp: Introduce special NULL pointers for hlist termination Eric Dumazet
2008-10-30 15:51 ` Stephen Hemminger
2008-10-30 16:28 ` Corey Minyard
2008-10-31 14:37 ` Eric Dumazet
2008-10-31 14:55 ` Pavel Emelyanov
2008-11-02 4:22 ` David Miller
2008-10-30 17:12 ` Eric Dumazet
2008-10-31 7:51 ` David Miller
2008-10-30 16:01 ` Peter Zijlstra
2008-10-31 0:14 ` Keith Owens
2008-11-13 13:13 ` [PATCH 0/3] net: RCU lookups for UDP, DCCP and TCP protocol Eric Dumazet
2008-11-13 17:20 ` Andi Kleen
2008-11-17 3:41 ` David Miller
2008-11-19 19:52 ` Christoph Lameter
2008-11-13 13:14 ` [PATCH 1/3] rcu: Introduce hlist_nulls variant of hlist Eric Dumazet
2008-11-13 13:29 ` Peter Zijlstra
2008-11-13 13:44 ` Eric Dumazet
2008-11-13 16:02 ` [PATCH 4/3] rcu: documents rculist_nulls Eric Dumazet
2008-11-14 15:16 ` Peter Zijlstra
2008-11-17 3:36 ` David Miller
2008-11-19 17:07 ` Paul E. McKenney
2008-11-14 15:16 ` [PATCH 1/3] rcu: Introduce hlist_nulls variant of hlist Peter Zijlstra
2008-11-19 17:01 ` Paul E. McKenney
2008-11-19 17:53 ` Eric Dumazet
2008-11-19 18:46 ` Paul E. McKenney
2008-11-19 18:53 ` Arnaldo Carvalho de Melo
2008-11-19 21:17 ` Paul E. McKenney
2008-11-19 20:39 ` Eric Dumazet
2008-11-19 21:21 ` Paul E. McKenney
2008-11-13 13:15 ` [PATCH 2/3] udp: Use hlist_nulls in UDP RCU code Eric Dumazet
2008-11-19 17:29 ` Paul E. McKenney
2008-11-19 17:53 ` Eric Dumazet
2008-11-13 13:15 ` [PATCH 3/3] net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls Eric Dumazet
2008-11-13 13:34 ` Peter Zijlstra
2008-11-13 13:51 ` Eric Dumazet
2008-11-13 14:08 ` Christoph Lameter
2008-11-13 14:22 ` Peter Zijlstra
2008-11-13 14:27 ` Christoph Lameter
2008-11-19 17:53 ` Paul E. McKenney
2008-11-23 9:33 ` [PATCH] net: Convert TCP/DCCP listening hash tables to use RCU Eric Dumazet
2008-11-23 15:59 ` Paul E. McKenney
2008-11-23 18:42 ` Eric Dumazet
2008-11-23 19:17 ` Paul E. McKenney
2008-11-23 20:18 ` Eric Dumazet
2008-11-23 22:33 ` Paul E. McKenney
2008-11-24 1:23 ` David Miller
2008-10-30 11:04 ` [PATCH 2/2] udp: RCU handling for Unicast packets Peter Zijlstra
2008-10-30 11:30 ` Eric Dumazet
2008-10-30 18:25 ` Paul E. McKenney
2008-10-31 16:40 ` Eric Dumazet
2008-11-01 3:10 ` Paul E. McKenney
2008-10-29 17:32 ` Eric Dumazet
2008-10-29 18:11 ` Paul E. McKenney
2008-10-29 18:29 ` David Miller
2008-10-29 18:38 ` Paul E. McKenney
2008-10-29 18:36 ` Eric Dumazet
2008-10-29 18:20 ` David Miller
2008-10-30 11:12 ` Peter Zijlstra
2008-10-30 11:29 ` Eric Dumazet
2008-10-28 20:37 ` [PATCH 0/2] udp: Convert the UDP hash lock to RCU Eric Dumazet
2008-10-28 21:28 ` Stephen Hemminger
2008-10-28 21:50 ` Eric Dumazet
2008-10-07 16:43 ` [PATCH 3/3] " Corey Minyard
2008-10-07 18:26 ` David Miller
2008-10-08 8:35 ` Eric Dumazet
2008-10-08 16:38 ` David Miller
2008-10-07 8:31 ` Peter Zijlstra
2008-10-07 14:36 ` Paul E. McKenney
2008-10-07 18:29 ` David Miller
2008-10-06 22:07 ` Corey Minyard
2008-10-07 8:17 ` Peter Zijlstra
2008-10-07 9:24 ` Eric Dumazet
2008-10-07 14:15 ` Christoph Lameter
2008-10-07 14:38 ` Paul E. McKenney
2008-10-07 14:50 ` Eric Dumazet
2008-10-07 15:05 ` Paul E. McKenney
2008-10-07 15:09 ` Peter Zijlstra
2008-10-07 15:23 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=490795FB.2000201@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=a.p.zijlstra@chello.nl \
--cc=benny+usenet@amorsen.dk \
--cc=clameter@sgi.com \
--cc=davem@davemloft.net \
--cc=johnpol@2ka.mipt.ru \
--cc=minyard@acm.org \
--cc=netdev@vger.kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=shemminger@vyatta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.