From: Eric Dumazet <dada1@cosmosbay.com>
To: Corey Minyard <minyard@acm.org>
Cc: paulmck@linux.vnet.ibm.com, David Miller <davem@davemloft.net>,
shemminger@vyatta.com, benny+usenet@amorsen.dk,
netdev@vger.kernel.org,
Christoph Lameter <cl@linux-foundation.org>,
a.p.zijlstra@chello.nl, johnpol@2ka.mipt.ru,
Christian Bell <christian@myri.com>
Subject: Re: [PATCH 2/2] udp: RCU handling for Unicast packets.
Date: Thu, 30 Oct 2008 06:50:07 +0100 [thread overview]
Message-ID: <49094B0F.2090208@cosmosbay.com> (raw)
In-Reply-To: <49092891.5060603@acm.org>
[-- Attachment #1: Type: text/plain, Size: 4369 bytes --]
Corey Minyard a écrit :
> Eric Dumazet wrote:
>> Paul E. McKenney a écrit :
>>> On Wed, Oct 29, 2008 at 09:00:13PM +0100, Eric Dumazet wrote:
>>>> Hum... Another way of handling all those cases and avoid memory
>>>> barriers
>>>> would be to have different "NULL" pointers.
>>>>
>>>> Each hash chain should have a unique "NULL" pointer (in the case of
>>>> UDP, it
>>>> can be the 128 values : [ (void*)0 .. (void *)127 ]
>>>>
>>>> Then, when performing a lookup, a reader should check the "NULL"
>>>> pointer
>>>> it get at the end of its lookup has is the "hash" value of its chain.
>>>>
>>>> If not -> restart the loop, aka "goto begin;" :)
>>>>
>>>> We could avoid memory barriers then.
>>>>
>>>> In the two cases Corey mentioned, this trick could let us avoid
>>>> memory barriers.
>>>> (existing one in sk_add_node_rcu(sk, &hslot->head); should be enough)
>>>>
>>>> What do you think ?
>>>
>>> Kinky!!! ;-)
>>>
>>> Then the rcu_dereference() would be supplying the needed memory
>>> barriers.
>>>
>>> Hmmm... I guess that the only confusion would be if the element got
>>> removed and then added to the same list. But then if its pointer was
>>> pseudo-NULL, then that would mean that all subsequent elements had been
>>> removed, and all preceding ones added after the scan started.
>>>
>>> Which might well be harmless, but I must defer to you on this one at
>>> the moment.
>>>
>>> If you need a larger hash table, another approach would be to set the
>>> pointer's low-order bit, allowing the upper bits to be a full-sized
>>> index -- or even a pointer to the list header. Just make very sure
>>> to clear the pointer when freeing, or an element on the freelist
>>> could end up looking like a legitimate end of list... Which again
>>> might well be safe, but why inflict this on oneself?
>>
>> Well, for UDP case, hash table will be <= 65536 anyway, we can assume
>> no dynamic kernel memory is in the range [0 .. 65535]
>>
>> Here is a patch (untested yet, its really time for a sleep for me ;) )
>>
>> [PATCH] udp: Introduce special NULL pointers for hlist termination
>>
>> In order to safely detect changes in chains, we would like to have
>> different
>> 'NULL' pointers. Each chain in hash table is terminated by an unique
>> 'NULL'
>> value, so that the lockless readers can detect their lookups evaded from
>> their starting chain.
>>
>> We define 'NULL' values as ((unsigned long)values < UDP_HTABLE_SIZE)
>>
>> This saves memory barriers (a read barrier to fetch 'next' pointers
>> *before* checking key values) we added in commit
>> 96631ed16c514cf8b28fab991a076985ce378c26 (udp: introduce
>> sk_for_each_rcu_safenext())
>> This also saves a missing memory barrier spotted by Corey Minyard (a
>> write one in udp_lib_get_port(), between sk_hash update and ->next
>> update)
> I think you are right, this will certainly perform a lot better without
> the read barrier in the list traversal. I haven't seen any problems
> with this approach, though it's unusual enough to perhaps warrant some
> extra comments in the code.
>
> You do need to modify udp_lib_unhash(), as sk_del_node_init_rcu() will
> do a NULL check on the ->next value, so you will need a special version
> of that as well.
>
Yes, we need many new macros, like sk_next_nulls(), sk_head_nulls(), ...
I have a working patch now, but not yet presentable for lkml :)
This patch need to touch files outside of netdev scope, so will need
really good shape and documentation.
(Probably a new file : include/linux/list_nulls.h ?)
Maybe in the meantime, we can commit a temporary patch doing the smp_wmb()
you suggested ?
Thanks
[PATCH] udp: add a missing smp_wmb() in udp_lib_get_port()
Corey Minyard spotted a missing memory barrier in udp_lib_get_port()
We need to make sure a reader cannot read the new 'sk->sk_next' value
and previous value of 'sk->sk_hash'. Or else, an item could be deleted
from a chain, and inserted into another chain. If new chain was empty
before the move, 'next' pointer is NULL, and lockless reader can
not detect it missed following items in original chain.
This patch is temporary, since we expect an upcoming patch
to introduce another way of handling the problem.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
[-- Attachment #2: missing_barrier.patch --]
[-- Type: text/plain, Size: 521 bytes --]
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c3ecec8..5e605ac 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -189,6 +189,11 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
inet_sk(sk)->num = snum;
sk->sk_hash = snum;
if (sk_unhashed(sk)) {
+ /*
+ * We need that previous write to sk->sk_hash committed
+ * before write to sk->next done in following add_node() variant
+ */
+ smp_wmb();
sk_add_node_rcu(sk, &hslot->head);
sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
}
next prev parent reply other threads:[~2008-10-30 5:51 UTC|newest]
Thread overview: 134+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-06 18:50 [PATCH 3/3] Convert the UDP hash lock to RCU Corey Minyard
2008-10-06 21:22 ` Eric Dumazet
2008-10-06 21:40 ` David Miller
2008-10-06 23:08 ` Corey Minyard
2008-10-07 8:37 ` Evgeniy Polyakov
2008-10-07 14:16 ` Christoph Lameter
2008-10-07 14:29 ` Evgeniy Polyakov
2008-10-07 14:38 ` Christoph Lameter
2008-10-07 14:33 ` Paul E. McKenney
2008-10-07 14:45 ` Christoph Lameter
2008-10-07 15:07 ` Eric Dumazet
2008-10-07 15:07 ` Paul E. McKenney
2008-10-07 5:24 ` Eric Dumazet
2008-10-07 8:54 ` Benny Amorsen
2008-10-07 12:59 ` Eric Dumazet
2008-10-07 14:07 ` Stephen Hemminger
2008-10-07 20:55 ` David Miller
2008-10-07 21:20 ` Stephen Hemminger
2008-10-08 13:55 ` Eric Dumazet
2008-10-08 18:45 ` David Miller
2008-10-28 20:37 ` [PATCH 1/2] udp: introduce struct udp_table and multiple rwlocks Eric Dumazet
2008-10-28 21:23 ` Christian Bell
2008-10-28 21:31 ` Evgeniy Polyakov
2008-10-28 21:48 ` Eric Dumazet
2008-10-28 21:28 ` Evgeniy Polyakov
2008-10-28 20:42 ` [PATCH 2/2] udp: RCU handling for Unicast packets Eric Dumazet
2008-10-28 22:45 ` Eric Dumazet
2008-10-29 5:05 ` David Miller
2008-10-29 8:23 ` Eric Dumazet
2008-10-29 8:56 ` David Miller
2008-10-29 10:19 ` Eric Dumazet
2008-10-29 18:19 ` David Miller
2008-10-29 9:04 ` Eric Dumazet
2008-10-29 9:17 ` David Miller
2008-10-29 13:17 ` Corey Minyard
2008-10-29 14:36 ` Eric Dumazet
2008-10-29 15:34 ` Corey Minyard
2008-10-29 16:09 ` Eric Dumazet
2008-10-29 16:37 ` Paul E. McKenney
2008-10-29 17:22 ` Corey Minyard
2008-10-29 17:45 ` Eric Dumazet
2008-10-29 18:28 ` Corey Minyard
2008-10-29 18:52 ` Paul E. McKenney
2008-10-29 20:00 ` Eric Dumazet
2008-10-29 20:17 ` Paul E. McKenney
2008-10-29 21:29 ` Corey Minyard
2008-10-29 21:57 ` Eric Dumazet
2008-10-29 21:58 ` Paul E. McKenney
2008-10-29 22:08 ` Eric Dumazet
2008-10-30 3:22 ` Corey Minyard
2008-10-30 5:50 ` Eric Dumazet [this message]
2008-11-02 4:19 ` David Miller
2008-10-30 5:40 ` David Miller
2008-10-30 5:51 ` Eric Dumazet
2008-10-30 7:04 ` Eric Dumazet
2008-10-30 7:05 ` David Miller
2008-10-30 15:40 ` [PATCH] udp: Introduce special NULL pointers for hlist termination Eric Dumazet
2008-10-30 15:51 ` Stephen Hemminger
2008-10-30 16:28 ` Corey Minyard
2008-10-31 14:37 ` Eric Dumazet
2008-10-31 14:55 ` Pavel Emelyanov
2008-11-02 4:22 ` David Miller
2008-10-30 17:12 ` Eric Dumazet
2008-10-31 7:51 ` David Miller
2008-10-30 16:01 ` Peter Zijlstra
2008-10-31 0:14 ` Keith Owens
2008-11-13 13:13 ` [PATCH 0/3] net: RCU lookups for UDP, DCCP and TCP protocol Eric Dumazet
2008-11-13 17:20 ` Andi Kleen
2008-11-17 3:41 ` David Miller
2008-11-19 19:52 ` Christoph Lameter
2008-11-13 13:14 ` [PATCH 1/3] rcu: Introduce hlist_nulls variant of hlist Eric Dumazet
2008-11-13 13:29 ` Peter Zijlstra
2008-11-13 13:44 ` Eric Dumazet
2008-11-13 16:02 ` [PATCH 4/3] rcu: documents rculist_nulls Eric Dumazet
2008-11-14 15:16 ` Peter Zijlstra
2008-11-17 3:36 ` David Miller
2008-11-19 17:07 ` Paul E. McKenney
2008-11-14 15:16 ` [PATCH 1/3] rcu: Introduce hlist_nulls variant of hlist Peter Zijlstra
2008-11-19 17:01 ` Paul E. McKenney
2008-11-19 17:53 ` Eric Dumazet
2008-11-19 18:46 ` Paul E. McKenney
2008-11-19 18:53 ` Arnaldo Carvalho de Melo
2008-11-19 21:17 ` Paul E. McKenney
2008-11-19 20:39 ` Eric Dumazet
2008-11-19 21:21 ` Paul E. McKenney
2008-11-13 13:15 ` [PATCH 2/3] udp: Use hlist_nulls in UDP RCU code Eric Dumazet
2008-11-19 17:29 ` Paul E. McKenney
2008-11-19 17:53 ` Eric Dumazet
2008-11-13 13:15 ` [PATCH 3/3] net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls Eric Dumazet
2008-11-13 13:34 ` Peter Zijlstra
2008-11-13 13:51 ` Eric Dumazet
2008-11-13 14:08 ` Christoph Lameter
2008-11-13 14:22 ` Peter Zijlstra
2008-11-13 14:27 ` Christoph Lameter
2008-11-19 17:53 ` Paul E. McKenney
2008-11-23 9:33 ` [PATCH] net: Convert TCP/DCCP listening hash tables to use RCU Eric Dumazet
2008-11-23 15:59 ` Paul E. McKenney
2008-11-23 18:42 ` Eric Dumazet
2008-11-23 19:17 ` Paul E. McKenney
2008-11-23 20:18 ` Eric Dumazet
2008-11-23 22:33 ` Paul E. McKenney
2008-11-24 1:23 ` David Miller
2008-10-30 11:04 ` [PATCH 2/2] udp: RCU handling for Unicast packets Peter Zijlstra
2008-10-30 11:30 ` Eric Dumazet
2008-10-30 18:25 ` Paul E. McKenney
2008-10-31 16:40 ` Eric Dumazet
2008-11-01 3:10 ` Paul E. McKenney
2008-10-29 17:32 ` Eric Dumazet
2008-10-29 18:11 ` Paul E. McKenney
2008-10-29 18:29 ` David Miller
2008-10-29 18:38 ` Paul E. McKenney
2008-10-29 18:36 ` Eric Dumazet
2008-10-29 18:20 ` David Miller
2008-10-30 11:12 ` Peter Zijlstra
2008-10-30 11:29 ` Eric Dumazet
2008-10-28 20:37 ` [PATCH 0/2] udp: Convert the UDP hash lock to RCU Eric Dumazet
2008-10-28 21:28 ` Stephen Hemminger
2008-10-28 21:50 ` Eric Dumazet
2008-10-07 16:43 ` [PATCH 3/3] " Corey Minyard
2008-10-07 18:26 ` David Miller
2008-10-08 8:35 ` Eric Dumazet
2008-10-08 16:38 ` David Miller
2008-10-07 8:31 ` Peter Zijlstra
2008-10-07 14:36 ` Paul E. McKenney
2008-10-07 18:29 ` David Miller
2008-10-06 22:07 ` Corey Minyard
2008-10-07 8:17 ` Peter Zijlstra
2008-10-07 9:24 ` Eric Dumazet
2008-10-07 14:15 ` Christoph Lameter
2008-10-07 14:38 ` Paul E. McKenney
2008-10-07 14:50 ` Eric Dumazet
2008-10-07 15:05 ` Paul E. McKenney
2008-10-07 15:09 ` Peter Zijlstra
2008-10-07 15:23 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49094B0F.2090208@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=a.p.zijlstra@chello.nl \
--cc=benny+usenet@amorsen.dk \
--cc=christian@myri.com \
--cc=cl@linux-foundation.org \
--cc=davem@davemloft.net \
--cc=johnpol@2ka.mipt.ru \
--cc=minyard@acm.org \
--cc=netdev@vger.kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=shemminger@vyatta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).