netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
[parent not found: <200510011837.j91IbE01012915@rastaban.cs.pdx.edu>]
* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-10-01 18:37 Suzanne Wood
  0 siblings, 0 replies; 24+ messages in thread
From: Suzanne Wood @ 2005-10-01 18:37 UTC (permalink / raw)
  To: herbert; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

Please excuse this restatement of an earlier concern.

  > From suzannew Sat Oct  1 11:00:28 2005

  > With the spotlight leaving in_dev_get, we have the parallel question 
  > of in_dev_put() and __in_dev_put()  both defined with refcnt 
  > decrement, but the preceding underscore may lend itself to an
  > inadvertant pairing and refcnt inaccuracy.

Dave Miller already addressed this question in 
http://www.ussg.iu.edu/hypermail/linux/kernel/0509.3/0757.html 

  > It may not be reasonable to rename __in_dev_put for its parallel definition
  > since its current usage is with __in_dev_get_rtnl() which does not increment 
  > refcnt. 

But the following may be worth considering.

  > It is also probably good to retain the old __in_dev_get()
and deprecate it.

Thank you.

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-10-01 18:00 Suzanne Wood
  0 siblings, 0 replies; 24+ messages in thread
From: Suzanne Wood @ 2005-10-01 18:00 UTC (permalink / raw)
  To: herbert; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

With the spotlight leaving in_dev_get, we have the parallel question 
of in_dev_put() and __in_dev_put()  both defined with refcnt 
decrement, but the preceding underscore may lend itself to an
inadvertant pairing and refcnt inaccuracy.

In the following  cscope of linux-2.6.14-rc2 prior to Herbert's patches 
which remove the occurence in pktgen.c and the others are
paired with __in_dev_get_rtnl(), except xfrm4_policy.c():

C symbol: __in_dev_put
  File           Function              Line
  File           Function              Line
0 inetdevice.h   <global>               172 #define __in_dev_put(idev) atomic_dec(&(idev)->refcnt)
1 pktgen.c       pktgen_setup_inject   1687 __in_dev_put(in_dev);
2 devinet.c      inet_rtm_deladdr       412 __in_dev_put(in_dev);
3 igmp.c         igmp_gq_timer_expire   686 __in_dev_put(in_dev);
4 igmp.c         igmp_ifc_timer_expire  698 __in_dev_put(in_dev);
5 igmp.c         igmp_heard_query       801 __in_dev_put(in_dev);
6 igmp.c         ip_mc_down            1228 __in_dev_put(in_dev);
7 igmp.c         ip_mc_down            1231 __in_dev_put(in_dev);
8 igmp.c         ip_mc_find_dev        1310 __in_dev_put(idev);
9 xfrm4_policy.c xfrm4_dst_ifdown       280 __in_dev_put(loopback_idev);

It may not be reasonable to rename __in_dev_put for its parallel definition
since its current usage is with __in_dev_get_rtnl() which does not increment 
refcnt. 

It is also probably good to retain the old __in_dev_get() 
and __in_dev_put() and deprecate them.

Old business:  this is probably stating the obvious,
with Herbert's patches in place, none of the test patch
I'd submitted 8 Sept should take effect.

Thank you.
Suzanne

----- Original Message ----- 
From: "Herbert Xu" <herbert@gondor.apana.org.au>
Sent: Saturday, October 01, 2005 12:12 AM

> On Fri, Sep 30, 2005 at 11:56:41PM -0700, Suzanne Wood wrote:
>> 
>> The other thing I'd hoped to address in pktgen.c was 
>> removing the __in_dev_put() which decrements refcnt 
>> while __in_dev_get_rcu() does not increment.
> 
> Well spotted.
> 
> Here is a patch on top of the last one to fix this bogus decrement.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

^ permalink raw reply	[flat|nested] 24+ messages in thread
[parent not found: <200510010656.j916ufhT007410@rastaban.cs.pdx.edu>]
* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-10-01  6:56 Suzanne Wood
  0 siblings, 0 replies; 24+ messages in thread
From: Suzanne Wood @ 2005-10-01  6:56 UTC (permalink / raw)
  To: herbert
  Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, suzannew,
	walpole

Many thanks for addressing this issue.

----- Original Message ----- 
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Sat, 1 Oct 2005 11:13:12 +1000
> 
> On Thu, Sep 29, 2005 at 06:06:50PM -0700, Suzanne Wood wrote:
>> 
>> Are there three cases then?  RCU protection with refcnt, RCU without refcnt,
>> and the bare cast of the dereference? 
> 
> Correct.
> 
> The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
> introduces __in_dev_get_rcu() to cover the second case.
> 
> 1) RCU with refcnt should use in_dev_get().
> 2) RCU without refcnt should use __in_dev_get_rcu().
> 3) All others must hold RTNL and use __in_dev_get_rtnl().
> 

The naming to indicate purpose looks good and the leading underscores
in the prior work did seem to imply wrapping to add functionality.

But it is interesting to have discarded what was developed yesterday
to minimize rcu_dereference impact:

>> ----- Original message  -----
>> From: Herbert Xu <herbert@gondor.apana.org.au>
>> Date: Fri, 30 Sep 2005 11:19:07 +1000
>> 
>> On Thu, Sep 29, 2005 at 06:16:03PM -0700, Paul E. McKenney wrote:
>>> 
>>> OK, how about this instead?
>>> 
>>> rcu_read_lock();
>>> in_dev = dev->ip_ptr;
>>> if (in_dev) {
>>> atomic_inc(&rcu_dereference(in_dev)->refcnt);
>>> }
>>> rcu_read_unlock();
>>> return in_dev;
>> 
>> Looks great. 

while adding a function call level by wrapping  __in_dev_get_rcu
with in_dev_get as suggested here.

> -- 
> diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
> --- a/include/linux/inetdevice.h
> +++ b/include/linux/inetdevice.h
> @@ -142,13 +142,21 @@ static __inline__ int bad_mask(u32 mask,
> 
> #define endfor_ifa(in_dev) }
> 
> +static inline struct in_device *__in_dev_get_rcu(const struct net_device *dev)
> +{
> + struct in_device *in_dev = dev->ip_ptr;
> + if (in_dev)
> + in_dev = rcu_dereference(in_dev);
> + return in_dev;
> +}
> +
> static __inline__ struct in_device *
> in_dev_get(const struct net_device *dev)
> {
>  struct in_device *in_dev;
> 
>  rcu_read_lock();
> - in_dev = dev->ip_ptr;
> + in_dev = __in_dev_get_rcu(dev);
>  if (in_dev)
>  atomic_inc(&in_dev->refcnt);
>  rcu_read_unlock();
> @@ -156,7 +164,7 @@ in_dev_get(const struct net_device *dev)
> }
> 
> static __inline__ struct in_device *
> -__in_dev_get(const struct net_device *dev)
> +__in_dev_get_rtnl(const struct net_device *dev)
> {
>  return (struct in_device*)dev->ip_ptr;
> }

The other thing I'd hoped to address in pktgen.c was 
removing the __in_dev_put() which decrements refcnt 
while __in_dev_get_rcu() does not increment.

> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -1667,7 +1667,7 @@ static void pktgen_setup_inject(struct p
>  struct in_device *in_dev; 
> 
>  rcu_read_lock();
> - in_dev = __in_dev_get(pkt_dev->odev);
> + in_dev = __in_dev_get_rcu(pkt_dev->odev);
>  if (in_dev) {
>  if (in_dev->ifa_list) {
>  pkt_dev->saddr_min = in_dev->ifa_list->ifa_address;

     pkt_dev->saddr_max = pkt_dev->saddr_min;
    }
-   __in_dev_put(in_dev); 
   }
   rcu_read_unlock();

Thank you very much again for resolving this by clarifying both the 
RCU and RTNL usages.

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-09-30  1:06 Suzanne Wood
  2005-10-01  1:13 ` Herbert Xu
  0 siblings, 1 reply; 24+ messages in thread
From: Suzanne Wood @ 2005-09-30  1:06 UTC (permalink / raw)
  To: paulmck, suzannew
  Cc: Robert.Olsson, davem, herbert, linux-kernel, netdev, walpole

In reviewing the 44 kernel uses of __in_dev_get and seeing many 
cases in 13 of 20 C code files of insertions of rcu_read_lock with 
and without the rcu_dereference that is indicated, so it does appear 
often to be programmer intent.  

Of the programs using __in_dev_get that don't include rcu, devinet.c 
and igmp.c use an rtnl lock.  Five other programs that use __in_dev_get 
without rcu have rtnl locking in the program source code, but I need 
to actually look further into the call tree to say more.

Are there three cases then?  RCU protection with refcnt, RCU without refcnt,
and the bare cast of the dereference? 

Thank you very much for getting it back on track.

  > From paulmck@us.ibm.com  Thu Sep 29 17:23:18 2005

  > Is there any case where __in_dev_get() might be called without
  > needing to be wrapped with rcu_dereference()?  If so, then I
  > agree (FWIW, given my meagre knowledge of Linux networking).

  > If all __in_dev_get() invocations need to be wrapped in
  > rcu_dereference(), then it seems to me that there would be
  > motivation to bury rcu_dereference() in __in_dev_get().

  > > > Some callers of __in_dev_get() don't need rcu_dereference at all
  > > > because they're protected by the rtnl.
  > > 
  > > > BTW, could you please move the rcu_dereference in in_dev_get()
  > > > into the if clause? The barrier is not needed when ip_ptr is
  > > > NULL.
  > > 
  > > The trouble with that may be that there are three events, the
  > > dereference, the assignment, and the conditional test.  The
  > > rcu_dereference() is meant to assure deferred destruction
  > > throughout.

  > One only needs an rcu_dereference() once on the data-flow path from
  > fetching the RCU-protected pointer to dereferencing that pointer.
  > If the pointer is NULL, there is no way you can dereference it,
  > so, technically, Herbert is quite correct.

  > However, rcu_dereference() only generates a memory barrier on DEC
  > Alpha, so there is normally no penalty for using it in the NULL-pointer
  > case.  So, when using rcu_dereference() unconditionally simplifies
  > the code, it may make sense to "just do it".

  > 							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread
[parent not found: <200509292359.j8TNxuxD019838@rastaban.cs.pdx.edu>]
* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-09-29 23:59 Suzanne Wood
  0 siblings, 0 replies; 24+ messages in thread
From: Suzanne Wood @ 2005-09-29 23:59 UTC (permalink / raw)
  To: herbert; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

Sorry to be thinking on-line, but if you mean this:

  if (in_dev = rcu_dereference(dev->ip_ptr))

I think that's fine.

  > From suzannew Thu Sep 29 16:39:57 2005

  >   > From suzannew Thu Sep 29 16:30:28 2005

  >   > > From: Herbert Xu 30 Sep 2005 07:28

  >   > > BTW, could you please move the rcu_dereference in in_dev_get()
  >   > > into the if clause? The barrier is not needed when ip_ptr is
  >   > > NULL.

  >   > The trouble with that may be that there are three events, the
  >   > dereference, the assignment, and the conditional test.  The
  >   > rcu_dereference() is meant to assure deferred destruction
  >   > throughout.

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-09-29 23:39 Suzanne Wood
  0 siblings, 0 replies; 24+ messages in thread
From: Suzanne Wood @ 2005-09-29 23:39 UTC (permalink / raw)
  To: herbert; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

  > From suzannew Thu Sep 29 16:30:28 2005

  > > From: Herbert Xu 30 Sep 2005 07:28

  > > BTW, could you please move the rcu_dereference in in_dev_get()
  > > into the if clause? The barrier is not needed when ip_ptr is
  > > NULL.

  > The trouble with that may be that there are three events, the
  > dereference, the assignment, and the conditional test.  The
  > rcu_dereference() is meant to assure deferred destruction
  > throughout.

Sorry, I was thinking in terms of the rcu_read_lock, so this is misstated.

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-09-29 23:30 Suzanne Wood
  0 siblings, 0 replies; 24+ messages in thread
From: Suzanne Wood @ 2005-09-29 23:30 UTC (permalink / raw)
  To: herbert; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

> Date: Fri, 30 Sep 2005 07:28:36 +1000
> From: Herbert Xu <herbert@gondor.apana.org.au>

> On Thu, Sep 29, 2005 at 09:02:29AM -0700, Suzanne Wood wrote:
> > 
> > The exchange below suggests that it is equally important 
> > to have the rcu_dereference() in __in_dev_get(), so the 
> > idea of the only difference between in_dev_get and 
> > __in_dev_get being the refcnt may be accepted.

> With __in_dev_get() it's the caller's responsibility to ensure
> that RCU works correctly.  Therefore if any rcu_dereference is
> needed it should be done by the caller.

This sounds reasonable to me.  Does everyone agree? 

> Some callers of __in_dev_get() don't need rcu_dereference at all
> because they're protected by the rtnl.

> BTW, could you please move the rcu_dereference in in_dev_get()
> into the if clause? The barrier is not needed when ip_ptr is
> NULL.

The trouble with that may be that there are three events, the
dereference, the assignment, and the conditional test.  The
rcu_dereference() is meant to assure deferred destruction
throughout.

Thank you very much for your suggestions.

^ permalink raw reply	[flat|nested] 24+ messages in thread
[parent not found: <200509291602.j8TG2TuI015920@rastaban.cs.pdx.edu>]
* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-09-29 16:02 Suzanne Wood
  0 siblings, 0 replies; 24+ messages in thread
From: Suzanne Wood @ 2005-09-29 16:02 UTC (permalink / raw)
  To: paulmck
  Cc: Robert.Olsson, davem, herbert, linux-kernel, netdev, suzannew,
	walpole

The original motivation for this patch was in __in_dev_get 
usage.  I'll try to test a build, but should submittals be 
incremental? first addressing in_dev_get, then 
__in_dev_get?  What seems resolved so far follows.

The exchange below suggests that it is equally important 
to have the rcu_dereference() in __in_dev_get(), so the 
idea of the only difference between in_dev_get and 
__in_dev_get being the refcnt may be accepted.

Correct usage may be a question with the mismatched 
definitions (in terms of refcnt) of __in_dev_get() 
and __in_dev_put() that superficially appear 
paired and this may merit a comment.  If interested, 
examples are mentioned in 
www.uwsg.iu.edu/hypermail/linux/kernel/0509.1/0184.html
and
www.ussg.iu.edu/hypermail/linux/kernel/0509.3/0757.html

But when the refcnt is employed for the DEC Alpha, 
rcu-protection or other locking must be in place for 
multiple CPUs, which apparently affirms the value 
of the marking of an rcu read-side critical section 
done by the calling function which has the vision of the 
extent of use of the protected dereference.

Is this all reasonable to you?
Thank you very much.

----- Original Message ----- 
From: Paul E. McKenney
Sent: Wednesday, September 28, 2005 7:51 AM


> On Wed, Sep 28, 2005 at 12:55:45PM +1000, Herbert Xu wrote:
>> David S. Miller wrote:
>> > 
>> > I agree with the changes to add rcu_dereference() use.
>> > Those were definitely lacking and needed.
>> 
>> Actually I'm not so sure that they are all needed.  I only looked
>> at the > guarantee correct code.  We really need to look at each case
>> individually.
> 
> Yep, these two APIs are only part of the solution.
> 
> The reference-count approach is only guaranteed to work if the kernel
> thread that did the reference-count increment is later referencing that
> same data element.  Otherwise, one has the following possible situation
> on DEC Alpha:
> 
> o CPU 0 initializes and inserts a new element into the data
> structure, using rcu_assign_pointer() to provide any needed
> memory barriers.  (Or, if RCU is not being used, under the
> appropriate update-side lock.)
> 
> o CPU 1 acquires a reference to this new element, presumably
> using either a lock or rcu_read_lock() and rcu_dereference()
> in order to do so safely.  CPU 1 then increments the reference
> count.
> 
> o CPU 2 picks up a pointer to this new element, but in a way
> that relies on the reference count having been incremented,
> without using locking, rcu_read_lock(), rcu_dereference(),
> and so on.
> 
> This CPU can then see the pre-initialized contents of the
> newly inserted data structure (again, but only on DEC Alpha).
> 
> Again, if the same kernel thread that incremented the reference count
> is later accessing it, no problem, even on Alpha.
> 
> Thanx, Paul
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
[parent not found: <20050927.135626.88296134.davem@davemloft.net>]

end of thread, other threads:[~2005-10-01 19:29 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <200509292330.j8TNUSmH019572@rastaban.cs.pdx.edu>
2005-09-30  0:21 ` [RFC][PATCH] identify in_dev_get rcu read-side critical sections Herbert Xu
2005-09-30  0:23 ` Paul E. McKenney
2005-09-30  0:27   ` Herbert Xu
     [not found]   ` <20050930002719.GC21062@gondor.apana.org.au>
2005-09-30  0:36     ` Paul E. McKenney
     [not found]     ` <20050930003642.GQ8177@us.ibm.com>
2005-09-30  1:04       ` Herbert Xu
     [not found]       ` <20050930010404.GA21429@gondor.apana.org.au>
2005-09-30  1:16         ` Paul E. McKenney
2005-09-30  1:19           ` Herbert Xu
     [not found] <200510011837.j91IbE01012915@rastaban.cs.pdx.edu>
2005-10-01 19:29 ` Herbert Xu
2005-10-01 18:37 Suzanne Wood
  -- strict thread matches above, loose matches on Subject: below --
2005-10-01 18:00 Suzanne Wood
     [not found] <200510010656.j916ufhT007410@rastaban.cs.pdx.edu>
2005-10-01  7:12 ` Herbert Xu
     [not found] ` <20051001071248.GA15990@gondor.apana.org.au>
2005-10-01 18:04   ` Paul E. McKenney
2005-10-01  6:56 Suzanne Wood
2005-09-30  1:06 Suzanne Wood
2005-10-01  1:13 ` Herbert Xu
     [not found] <200509292359.j8TNxuxD019838@rastaban.cs.pdx.edu>
2005-09-30  0:23 ` Herbert Xu
2005-09-29 23:59 Suzanne Wood
2005-09-29 23:39 Suzanne Wood
2005-09-29 23:30 Suzanne Wood
     [not found] <200509291602.j8TG2TuI015920@rastaban.cs.pdx.edu>
2005-09-29 21:28 ` Herbert Xu
2005-09-29 16:02 Suzanne Wood
     [not found] <20050927.135626.88296134.davem@davemloft.net>
2005-09-28  2:55 ` Herbert Xu
     [not found] ` <E1EKS6j-0006s4-00@gondolin.me.apana.org.au>
2005-09-28 14:51   ` Paul E. McKenney
     [not found]   ` <20050928145110.GA4925@us.ibm.com>
2005-09-28 22:11     ` Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).