Re: [PATCH 2/2] udp: RCU handling for Unicast packets.

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Eric Dumazet <dada1@cosmosbay.com>
To: paulmck@linux.vnet.ibm.com
Cc: Corey Minyard <minyard@acm.org>,
	David Miller <davem@davemloft.net>,
	shemminger@vyatta.com, benny+usenet@amorsen.dk,
	netdev@vger.kernel.org,
	Christoph Lameter <cl@linux-foundation.org>,
	a.p.zijlstra@chello.nl, johnpol@2ka.mipt.ru,
	Christian Bell <christian@myri.com>
Subject: Re: [PATCH 2/2] udp: RCU handling for Unicast packets.
Date: Wed, 29 Oct 2008 21:00:13 +0100	[thread overview]
Message-ID: <4908C0CD.5050406@cosmosbay.com> (raw)
In-Reply-To: <20081029185200.GE6732@linux.vnet.ibm.com>

Paul E. McKenney a écrit :
> On Wed, Oct 29, 2008 at 01:28:15PM -0500, Corey Minyard wrote:
>> Eric Dumazet wrote:
>>> Corey Minyard a écrit :
>>>> Paul E. McKenney wrote:
>>>>> On Wed, Oct 29, 2008 at 05:09:53PM +0100, Eric Dumazet wrote:
>>>>>  
>>>>>> Corey Minyard a écrit :
>>>>>>   
>>>>>>> Eric Dumazet wrote:
>>>>>>>     
>>>>>>>> Corey Minyard found a race added in commit 
>>>>>>>> 271b72c7fa82c2c7a795bc16896149933110672d
>>>>>>>> (udp: RCU handling for Unicast packets.)
>>>>>>>>
>>>>>>>> "If the socket is moved from one list to another list in-between the 
>>>>>>>> time  the hash is calculated and the next field is accessed, and the 
>>>>>>>> socket  has moved to the end of the new list, the traversal will not 
>>>>>>>> complete  properly on the list it should have, since the socket will 
>>>>>>>> be on the end  of the new list and there's not a way to tell it's on 
>>>>>>>> a new list and  restart the list traversal.  I think that this can be 
>>>>>>>> solved by  pre-fetching the "next" field (with proper barriers) 
>>>>>>>> before checking the  hash."
>>>>>>>>
>>>>>>>> This patch corrects this problem, introducing a new 
>>>>>>>> sk_for_each_rcu_safenext()
>>>>>>>> macro.
>>>>>>>>         
>>>>>>> You also need the appropriate smp_wmb() in udp_lib_get_port() after 
>>>>>>> sk_hash is set, I think, so the next field is guaranteed to be changed 
>>>>>>> after the hash value is changed.
>>>>>>>       
>>>>>> Not sure about this one Corey.
>>>>>>
>>>>>> If a reader catches previous value of item->sk_hash, two cases are to 
>>>>>> be taken into :
>>>>>>
>>>>>> 1) its udp_hashfn(net, sk->sk_hash) is != hash   -> goto begin : Reader 
>>>>>> will redo its scan
>>>>>>
>>>>>> 2) its udp_hashfn(net, sk->sk_hash) is == hash
>>>>>>  -> next pointer is good enough : it points to next item in same hash 
>>>>>> chain.
>>>>>>     No need to rescan the chain at this point.
>>>>>>     Yes we could miss the fact that a new port was bound and this UDP 
>>>>>> message could be lost.
>>>>>>     
>>>>> 3) its udp_hashfn(net, sk-sk_hash) is == hash, but only because it was
>>>>> removed, freed, reallocated, and then readded with the same hash value,
>>>>> possibly carrying the reader to a new position in the same list.
>>>>>   
>>>> If I understand this, without the smp_wmb(), it is possible that the next 
>>>> field can be written to main memory before the hash value is written.  If 
>>>> that happens, the following can occur:
>>>>
>>>>  CPU1                    CPU2
>>>>  next is set to NULL (end of new list)
>>> Well, if this item is injected to the same chain, next wont be set to 
>>> NULL.
>>>
>>> That would mean previous writers deleted all items from the chain.
>> I put my comment in the wrong place, I wasn't talking about being injected 
>> into the same chain.
>>
>>> In this case, readers can see NULL, it is not a problem at all.
>>> List is/was empty.
>>> An application cannot complain a packet is not
>>> handled if its bind() syscall is not yet completed :)
>>>
>>> If item is injected on another chain, we will detect hash mismatch and 
>>> redo full scan.
>> If the item is injected onto the end of another chain, the next field will 
>> be NULL and you won't detect a hash mismatch.  It's basically the same 
>> issue as the previous race, though a lot more subtle and unlikely.  If you 
>> get (from the previous socket) an old value of "sk_hash" and (from the new 
>> socket) a new value of "next" that is NULL, you will terminate the loop 
>> when you should have restarted it.  I'm pretty sure that can occur without 
>> the write barrier.
> 
> One way of dealing with this is to keep a tail pointer.  Then, if the
> element containing the NULL pointer doesn't match the tail pointer seen
> at the start of the search, or if the tail pointer has changed,
> restart the search.  Memory barriers will be needed.  ;-)
> 

Hum... Another way of handling all those cases and avoid memory barriers
would be to have different "NULL" pointers.

Each hash chain should have a unique "NULL" pointer (in the case of UDP, it
can be the 128 values : [ (void*)0 .. (void *)127 ]

Then, when performing a lookup, a reader should check the "NULL" pointer
it get at the end of its lookup has is the "hash" value of its chain.

If not -> restart the loop, aka "goto begin;" :)

We could avoid memory barriers then.

In the two cases Corey mentioned, this trick could let us avoid memory barriers.
(existing one in sk_add_node_rcu(sk, &hslot->head); should be enough)

What do you think ?

next prev parent reply	other threads:[~2008-10-29 20:01 UTC|newest]

Thread overview: 134+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-06 18:50 [PATCH 3/3] Convert the UDP hash lock to RCU Corey Minyard
2008-10-06 21:22 ` Eric Dumazet
2008-10-06 21:40   ` David Miller
2008-10-06 23:08     ` Corey Minyard
2008-10-07  8:37       ` Evgeniy Polyakov
2008-10-07 14:16         ` Christoph Lameter
2008-10-07 14:29           ` Evgeniy Polyakov
2008-10-07 14:38             ` Christoph Lameter
2008-10-07 14:33           ` Paul E. McKenney
2008-10-07 14:45             ` Christoph Lameter
2008-10-07 15:07               ` Eric Dumazet
2008-10-07 15:07               ` Paul E. McKenney
2008-10-07  5:24     ` Eric Dumazet
2008-10-07  8:54       ` Benny Amorsen
2008-10-07 12:59         ` Eric Dumazet
2008-10-07 14:07           ` Stephen Hemminger
2008-10-07 20:55             ` David Miller
2008-10-07 21:20               ` Stephen Hemminger
2008-10-08 13:55               ` Eric Dumazet
2008-10-08 18:45                 ` David Miller
2008-10-28 20:37                   ` [PATCH 1/2] udp: introduce struct udp_table and multiple rwlocks Eric Dumazet
2008-10-28 21:23                     ` Christian Bell
2008-10-28 21:31                       ` Evgeniy Polyakov
2008-10-28 21:48                       ` Eric Dumazet
2008-10-28 21:28                     ` Evgeniy Polyakov
2008-10-28 20:42                   ` [PATCH 2/2] udp: RCU handling for Unicast packets Eric Dumazet
2008-10-28 22:45                     ` Eric Dumazet
2008-10-29  5:05                       ` David Miller
2008-10-29  8:23                         ` Eric Dumazet
2008-10-29  8:56                           ` David Miller
2008-10-29 10:19                             ` Eric Dumazet
2008-10-29 18:19                               ` David Miller
2008-10-29  9:04                           ` Eric Dumazet
2008-10-29  9:17                             ` David Miller
2008-10-29 13:17                             ` Corey Minyard
2008-10-29 14:36                               ` Eric Dumazet
2008-10-29 15:34                                 ` Corey Minyard
2008-10-29 16:09                                   ` Eric Dumazet
2008-10-29 16:37                                     ` Paul E. McKenney
2008-10-29 17:22                                       ` Corey Minyard
2008-10-29 17:45                                         ` Eric Dumazet
2008-10-29 18:28                                           ` Corey Minyard
2008-10-29 18:52                                             ` Paul E. McKenney
2008-10-29 20:00                                               ` Eric Dumazet [this message]
2008-10-29 20:17                                                 ` Paul E. McKenney
2008-10-29 21:29                                                   ` Corey Minyard
2008-10-29 21:57                                                     ` Eric Dumazet
2008-10-29 21:58                                                     ` Paul E. McKenney
2008-10-29 22:08                                                   ` Eric Dumazet
2008-10-30  3:22                                                     ` Corey Minyard
2008-10-30  5:50                                                       ` Eric Dumazet
2008-11-02  4:19                                                         ` David Miller
2008-10-30  5:40                                                     ` David Miller
2008-10-30  5:51                                                       ` Eric Dumazet
2008-10-30  7:04                                                         ` Eric Dumazet
2008-10-30  7:05                                                           ` David Miller
2008-10-30 15:40                                                     ` [PATCH] udp: Introduce special NULL pointers for hlist termination Eric Dumazet
2008-10-30 15:51                                                       ` Stephen Hemminger
2008-10-30 16:28                                                         ` Corey Minyard
2008-10-31 14:37                                                           ` Eric Dumazet
2008-10-31 14:55                                                             ` Pavel Emelyanov
2008-11-02  4:22                                                               ` David Miller
2008-10-30 17:12                                                         ` Eric Dumazet
2008-10-31  7:51                                                           ` David Miller
2008-10-30 16:01                                                       ` Peter Zijlstra
2008-10-31  0:14                                                       ` Keith Owens
2008-11-13 13:13                                                       ` [PATCH 0/3] net: RCU lookups for UDP, DCCP and TCP protocol Eric Dumazet
2008-11-13 17:20                                                         ` Andi Kleen
2008-11-17  3:41                                                         ` David Miller
2008-11-19 19:52                                                           ` Christoph Lameter
2008-11-13 13:14                                                       ` [PATCH 1/3] rcu: Introduce hlist_nulls variant of hlist Eric Dumazet
2008-11-13 13:29                                                         ` Peter Zijlstra
2008-11-13 13:44                                                           ` Eric Dumazet
2008-11-13 16:02                                                             ` [PATCH 4/3] rcu: documents rculist_nulls Eric Dumazet
2008-11-14 15:16                                                               ` Peter Zijlstra
2008-11-17  3:36                                                                 ` David Miller
2008-11-19 17:07                                                               ` Paul E. McKenney
2008-11-14 15:16                                                         ` [PATCH 1/3] rcu: Introduce hlist_nulls variant of hlist Peter Zijlstra
2008-11-19 17:01                                                         ` Paul E. McKenney
2008-11-19 17:53                                                           ` Eric Dumazet
2008-11-19 18:46                                                             ` Paul E. McKenney
2008-11-19 18:53                                                               ` Arnaldo Carvalho de Melo
2008-11-19 21:17                                                                 ` Paul E. McKenney
2008-11-19 20:39                                                               ` Eric Dumazet
2008-11-19 21:21                                                                 ` Paul E. McKenney
2008-11-13 13:15                                                       ` [PATCH 2/3] udp: Use hlist_nulls in UDP RCU code Eric Dumazet
2008-11-19 17:29                                                         ` Paul E. McKenney
2008-11-19 17:53                                                           ` Eric Dumazet
2008-11-13 13:15                                                       ` [PATCH 3/3] net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls Eric Dumazet
2008-11-13 13:34                                                         ` Peter Zijlstra
2008-11-13 13:51                                                           ` Eric Dumazet
2008-11-13 14:08                                                             ` Christoph Lameter
2008-11-13 14:22                                                             ` Peter Zijlstra
2008-11-13 14:27                                                               ` Christoph Lameter
2008-11-19 17:53                                                         ` Paul E. McKenney
2008-11-23  9:33                                                         ` [PATCH] net: Convert TCP/DCCP listening hash tables to use RCU Eric Dumazet
2008-11-23 15:59                                                           ` Paul E. McKenney
2008-11-23 18:42                                                             ` Eric Dumazet
2008-11-23 19:17                                                               ` Paul E. McKenney
2008-11-23 20:18                                                                 ` Eric Dumazet
2008-11-23 22:33                                                                   ` Paul E. McKenney
2008-11-24  1:23                                                           ` David Miller
2008-10-30 11:04                                                 ` [PATCH 2/2] udp: RCU handling for Unicast packets Peter Zijlstra
2008-10-30 11:30                                                   ` Eric Dumazet
2008-10-30 18:25                                                     ` Paul E. McKenney
2008-10-31 16:40                                                       ` Eric Dumazet
2008-11-01  3:10                                                         ` Paul E. McKenney
2008-10-29 17:32                                       ` Eric Dumazet
2008-10-29 18:11                                         ` Paul E. McKenney
2008-10-29 18:29                                           ` David Miller
2008-10-29 18:38                                             ` Paul E. McKenney
2008-10-29 18:36                                           ` Eric Dumazet
2008-10-29 18:20                                 ` David Miller
2008-10-30 11:12                                 ` Peter Zijlstra
2008-10-30 11:29                                   ` Eric Dumazet
2008-10-28 20:37                 ` [PATCH 0/2] udp: Convert the UDP hash lock to RCU Eric Dumazet
2008-10-28 21:28                   ` Stephen Hemminger
2008-10-28 21:50                     ` Eric Dumazet
2008-10-07 16:43           ` [PATCH 3/3] " Corey Minyard
2008-10-07 18:26       ` David Miller
2008-10-08  8:35         ` Eric Dumazet
2008-10-08 16:38           ` David Miller
2008-10-07  8:31     ` Peter Zijlstra
2008-10-07 14:36       ` Paul E. McKenney
2008-10-07 18:29       ` David Miller
2008-10-06 22:07   ` Corey Minyard
2008-10-07  8:17   ` Peter Zijlstra
2008-10-07  9:24     ` Eric Dumazet
2008-10-07 14:15       ` Christoph Lameter
2008-10-07 14:38         ` Paul E. McKenney
2008-10-07 14:50         ` Eric Dumazet
2008-10-07 15:05           ` Paul E. McKenney
2008-10-07 15:09           ` Peter Zijlstra
2008-10-07 15:23           ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4908C0CD.5050406@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=benny+usenet@amorsen.dk \
    --cc=christian@myri.com \
    --cc=cl@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=johnpol@2ka.mipt.ru \
    --cc=minyard@acm.org \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=shemminger@vyatta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).