Re: 2.6.34-rc1: rcu lockdep bug?

* Re: 2.6.34-rc1: rcu lockdep bug?
       [not found]   ` <20100311161751.GA3804@hack>
@ 2010-03-12  7:56     ` Américo Wang
  2010-03-12  8:07       ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Américo Wang @ 2010-03-12  7:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Américo Wang, Peter Zijlstra, LKML,
	Linux Kernel Network Developers

(Cc'ing netdev)

On Fri, Mar 12, 2010 at 12:17 AM, Américo Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Mar 11, 2010 at 05:45:56AM -0800, Paul E. McKenney wrote:
>>On Thu, Mar 11, 2010 at 06:05:38PM +0800, Américo Wang wrote:
>>> Hello, Paul and Peter,
>>>
>>> Attached is the lockdep warning that I triggered today.
>>>
>>> I am not sure if this is a bug of rcu lockdep, because I am
>>> testing my patch when this occurred. However, in the backtrace,
>>> there is none of the functions that I touched, weird.
>>>
>>> So, please help to check if this is a bug of rcu lockdep.
>>
>>This sort of thing is caused by acquiring the same lock with softirq
>>(AKA BH) blocked and not, which can result in self-deadlock.
>>
>>There was such a bug in the RCU lockdep stuff in -tip, but it has long
>>since been fixed.  If you were seeing that bug, rcu_do_batch() would
>>be on the stack, which it does not appear to be.
>>
>>So does your patch involve the usbfs_mutex?  Or attempt to manipulate
>>vfs/fs state from within networking softirq/BH context?
>>
>
> Nope, it is a patch for netpoll, nothing related with usb, nor vfs.
>

OK, after decoding the lockdep output, it looks like
netif_receive_skb() should call rcu_read_lock_bh() instead of rcu_read_lock()?
But I don't know whether all callers of netif_receive_skb() are in softirq context.

Paul, what do you think?

Thank you.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-12  7:56     ` 2.6.34-rc1: rcu lockdep bug? Américo Wang
@ 2010-03-12  8:07       ` David Miller
  2010-03-12  8:59         ` Américo Wang
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2010-03-12  8:07 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: paulmck, peterz, linux-kernel, netdev

From: Américo Wang <xiyou.wangcong@gmail.com>
Date: Fri, 12 Mar 2010 15:56:03 +0800

> Ok, after decoding the lockdep output, it looks like that
> netif_receive_skb() should call rcu_read_lock_bh() instead of rcu_read_lock()?
> But I don't know if all callers of netif_receive_skb() are in softirq context.

Normally, netif_receive_skb() is invoked from softirq context.

However, via netpoll it can be invoked essentially from any context.

But, when this happens, the networking receive path makes amends such
that this works fine.  That's what the netpoll_receive_skb() check in
netif_receive_skb() is for.  That check makes it bail out early if the
call to netif_receive_skb() is via a netpoll invocation.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-12  8:07       ` David Miller
@ 2010-03-12  8:59         ` Américo Wang
  2010-03-12 11:11           ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Américo Wang @ 2010-03-12  8:59 UTC (permalink / raw)
  To: David Miller; +Cc: paulmck, peterz, linux-kernel, netdev

On Fri, Mar 12, 2010 at 4:07 PM, David Miller <davem@davemloft.net> wrote:
> From: Américo Wang <xiyou.wangcong@gmail.com>
> Date: Fri, 12 Mar 2010 15:56:03 +0800
>
>> Ok, after decoding the lockdep output, it looks like that
>> netif_receive_skb() should call rcu_read_lock_bh() instead of rcu_read_lock()?
>> But I don't know if all callers of netif_receive_skb() are in softirq context.
>
> Normally, netif_receive_skb() is invoked from softirq context.
>
> However, via netpoll it can be invoked essentially from any context.
>
> But, when this happens, the networking receive path makes amends such
> that this works fine.  That's what the netpoll_receive_skb() check in
> netif_receive_skb() is for.  That check makes it bail out early if the
> call to netif_receive_skb() is via a netpoll invocation.
>

Oh, I see. This means we should call rcu_read_lock_bh() instead.
If Paul has no objections, I will send a patch for this.

Thanks much, David!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-12  8:59         ` Américo Wang
@ 2010-03-12 11:11           ` Eric Dumazet
  2010-03-12 13:11             ` Américo Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2010-03-12 11:11 UTC (permalink / raw)
  To: Américo Wang; +Cc: David Miller, paulmck, peterz, linux-kernel, netdev

Le vendredi 12 mars 2010 à 16:59 +0800, Américo Wang a écrit :
> On Fri, Mar 12, 2010 at 4:07 PM, David Miller <davem@davemloft.net> wrote:
> > From: Américo Wang <xiyou.wangcong@gmail.com>
> > Date: Fri, 12 Mar 2010 15:56:03 +0800
> >
> >> Ok, after decoding the lockdep output, it looks like that
> >> netif_receive_skb() should call rcu_read_lock_bh() instead of rcu_read_lock()?
> >> But I don't know if all callers of netif_receive_skb() are in softirq context.
> >
> > Normally, netif_receive_skb() is invoked from softirq context.
> >
> > However, via netpoll it can be invoked essentially from any context.
> >
> > But, when this happens, the networking receive path makes amends such
> > that this works fine.  That's what the netpoll_receive_skb() check in
> > netif_receive_skb() is for.  That check makes it bail out early if the
> > call to netif_receive_skb() is via a netpoll invocation.
> >
> 
> Oh, I see. This means we should call rcu_read_lock_bh() instead.
> If Paul has no objections, I will send a patch for this.
> 

Nope, it's calling rcu_read_lock() from interrupt context and it should
stay as is (we don't need to disable bh; that has a CPU cost)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-12 11:11           ` Eric Dumazet
@ 2010-03-12 13:11             ` Américo Wang
  2010-03-12 13:37               ` Eric Dumazet
  2010-03-12 22:03               ` Paul E. McKenney
  0 siblings, 2 replies; 17+ messages in thread
From: Américo Wang @ 2010-03-12 13:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, paulmck, peterz, linux-kernel, netdev

On Fri, Mar 12, 2010 at 7:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le vendredi 12 mars 2010 à 16:59 +0800, Américo Wang a écrit :
>> On Fri, Mar 12, 2010 at 4:07 PM, David Miller <davem@davemloft.net> wrote:
>> > From: Américo Wang <xiyou.wangcong@gmail.com>
>> > Date: Fri, 12 Mar 2010 15:56:03 +0800
>> >
>> >> Ok, after decoding the lockdep output, it looks like that
>> >> netif_receive_skb() should call rcu_read_lock_bh() instead of rcu_read_lock()?
>> >> But I don't know if all callers of netif_receive_skb() are in softirq context.
>> >
>> > Normally, netif_receive_skb() is invoked from softirq context.
>> >
>> > However, via netpoll it can be invoked essentially from any context.
>> >
>> > But, when this happens, the networking receive path makes amends such
>> > that this works fine.  That's what the netpoll_receive_skb() check in
>> > netif_receive_skb() is for.  That check makes it bail out early if the
>> > call to netif_receive_skb() is via a netpoll invocation.
>> >
>>
>> Oh, I see. This means we should call rcu_read_lock_bh() instead.
>> If Paul has no objections, I will send a patch for this.
>>
>
> Nope, it's calling rcu_read_lock() from interrupt context and it should
> stay as is (we don't need to disable bh; that has a CPU cost)
>

Oh, but lockdep complains about rcu_read_lock(), it said
rcu_read_lock() can't be used in softirq context.

Am I missing something?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-12 13:11             ` Américo Wang
@ 2010-03-12 13:37               ` Eric Dumazet
  2010-03-13  5:33                 ` Américo Wang
  2010-03-12 22:03               ` Paul E. McKenney
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2010-03-12 13:37 UTC (permalink / raw)
  To: Américo Wang; +Cc: David Miller, paulmck, peterz, linux-kernel, netdev

Le vendredi 12 mars 2010 à 21:11 +0800, Américo Wang a écrit :

> Oh, but lockdep complains about rcu_read_lock(), it said
> rcu_read_lock() can't be used in softirq context.
> 
> Am I missing something?

Well, lockdep might be dumb, I don't know...

I suggest you read the rcu_read_lock_bh() kernel doc:

/**
 * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
 *
 * This is equivalent of rcu_read_lock(), but to be used when updates
 * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
 * consider completion of a softirq handler to be a quiescent state,
 * a process in RCU read-side critical section must be protected by
 * disabling softirqs. Read-side critical sections in interrupt context
 * can use just rcu_read_lock().
 *
 */

The last sentence is the relevant one:

Read-side critical sections in interrupt context
can use just rcu_read_lock().
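
The difference between the two primitives can be sketched in a small
userspace model. This is not kernel code: struct cpu_model, its two
counters, and the model_* functions are hypothetical names invented for
illustration. The only point it makes is that the _bh variant also keeps
softirq handlers from running until the matching unlock, which is the
extra cost mentioned earlier in the thread.

```c
#include <assert.h>

/* Toy per-CPU state; both counters are illustrative, not kernel fields. */
struct cpu_model {
	int rcu_nesting;       /* depth of read-side critical sections */
	int bh_disable_depth;  /* > 0 means softirq handlers may not run */
};

/* Models rcu_read_lock(): enters a read-side section, leaves softirqs alone. */
static void model_rcu_read_lock(struct cpu_model *c)
{
	c->rcu_nesting++;
}

static void model_rcu_read_unlock(struct cpu_model *c)
{
	assert(c->rcu_nesting > 0);
	c->rcu_nesting--;
}

/* Models rcu_read_lock_bh(): also blocks softirqs until the matching unlock. */
static void model_rcu_read_lock_bh(struct cpu_model *c)
{
	c->bh_disable_depth++;
	c->rcu_nesting++;
}

static void model_rcu_read_unlock_bh(struct cpu_model *c)
{
	assert(c->rcu_nesting > 0 && c->bh_disable_depth > 0);
	c->rcu_nesting--;
	c->bh_disable_depth--;
}

/* Nonzero when a softirq handler would be allowed to run on this CPU. */
static int model_softirqs_can_run(const struct cpu_model *c)
{
	return c->bh_disable_depth == 0;
}
```

In this model a reader that is already running inside an interrupt
handler gains nothing from the _bh form, so plain rcu_read_lock() is
enough there.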

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-12 13:11             ` Américo Wang
  2010-03-12 13:37               ` Eric Dumazet
@ 2010-03-12 22:03               ` Paul E. McKenney
  2010-03-13  5:31                 ` Américo Wang
  1 sibling, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2010-03-12 22:03 UTC (permalink / raw)
  To: Américo Wang
  Cc: Eric Dumazet, David Miller, peterz, linux-kernel, netdev

On Fri, Mar 12, 2010 at 09:11:02PM +0800, Américo Wang wrote:
> On Fri, Mar 12, 2010 at 7:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le vendredi 12 mars 2010 à 16:59 +0800, Américo Wang a écrit :
> >> On Fri, Mar 12, 2010 at 4:07 PM, David Miller <davem@davemloft.net> wrote:
> >> > From: Américo Wang <xiyou.wangcong@gmail.com>
> >> > Date: Fri, 12 Mar 2010 15:56:03 +0800
> >> >
> >> >> Ok, after decoding the lockdep output, it looks like that
> >> >> netif_receive_skb() should call rcu_read_lock_bh() instead of rcu_read_lock()?
> >> >> But I don't know if all callers of netif_receive_skb() are in softirq context.
> >> >
> >> > Normally, netif_receive_skb() is invoked from softirq context.
> >> >
> >> > However, via netpoll it can be invoked essentially from any context.
> >> >
> >> > But, when this happens, the networking receive path makes amends such
> >> > that this works fine.  That's what the netpoll_receive_skb() check in
> >> > netif_receive_skb() is for.  That check makes it bail out early if the
> >> > call to netif_receive_skb() is via a netpoll invocation.
> >> >
> >>
> >> Oh, I see. This means we should call rcu_read_lock_bh() instead.
> >> If Paul has no objections, I will send a patch for this.
> >>
> >
> > Nope, it's calling rcu_read_lock() from interrupt context and it should
> > stay as is (we don't need to disable bh; that has a CPU cost)
> >
> 
> Oh, but lockdep complains about rcu_read_lock(), it said
> rcu_read_lock() can't be used in softirq context.
> 
> Am I missing something?

Hmmm...  It is supposed to be OK to use rcu_read_lock() in pretty much
any context, even NMI.  I will take a look.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-12 22:03               ` Paul E. McKenney
@ 2010-03-13  5:31                 ` Américo Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Américo Wang @ 2010-03-13  5:31 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Américo Wang, Eric Dumazet, David Miller, peterz,
	linux-kernel, netdev

On Fri, Mar 12, 2010 at 02:03:19PM -0800, Paul E. McKenney wrote:
>On Fri, Mar 12, 2010 at 09:11:02PM +0800, Américo Wang wrote:
>> On Fri, Mar 12, 2010 at 7:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > Le vendredi 12 mars 2010 à 16:59 +0800, Américo Wang a écrit :
>> >> On Fri, Mar 12, 2010 at 4:07 PM, David Miller <davem@davemloft.net> wrote:
>> >> > From: Américo Wang <xiyou.wangcong@gmail.com>
>> >> > Date: Fri, 12 Mar 2010 15:56:03 +0800
>> >> >
>> >> >> Ok, after decoding the lockdep output, it looks like that
>> >> >> netif_receive_skb() should call rcu_read_lock_bh() instead of rcu_read_lock()?
>> >> >> But I don't know if all callers of netif_receive_skb() are in softirq context.
>> >> >
>> >> > Normally, netif_receive_skb() is invoked from softirq context.
>> >> >
>> >> > However, via netpoll it can be invoked essentially from any context.
>> >> >
>> >> > But, when this happens, the networking receive path makes amends such
>> >> > that this works fine.  That's what the netpoll_receive_skb() check in
>> >> > netif_receive_skb() is for.  That check makes it bail out early if the
>> >> > call to netif_receive_skb() is via a netpoll invocation.
>> >> >
>> >>
>> >> Oh, I see. This means we should call rcu_read_lock_bh() instead.
>> >> If Paul has no objections, I will send a patch for this.
>> >>
>> >
>> > Nope, it's calling rcu_read_lock() from interrupt context and it should
>> > stay as is (we don't need to disable bh; that has a CPU cost)
>> >
>> 
>> Oh, but lockdep complains about rcu_read_lock(), it said
>> rcu_read_lock() can't be used in softirq context.
>> 
>> Am I missing something?
>
>Hmmm...  It is supposed to be OK to use rcu_read_lock() in pretty much
>any context, even NMI.  I will take a look.
>

Thanks! Please let me know if you make any progress.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-12 13:37               ` Eric Dumazet
@ 2010-03-13  5:33                 ` Américo Wang
  2010-03-13 21:58                   ` Paul E. McKenney
  0 siblings, 1 reply; 17+ messages in thread
From: Américo Wang @ 2010-03-13  5:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Américo Wang, David Miller, paulmck, peterz, linux-kernel,
	netdev

On Fri, Mar 12, 2010 at 02:37:38PM +0100, Eric Dumazet wrote:
>Le vendredi 12 mars 2010 à 21:11 +0800, Américo Wang a écrit :
>
>> Oh, but lockdep complains about rcu_read_lock(), it said
>> rcu_read_lock() can't be used in softirq context.
>> 
>> Am I missing something?
>
>Well, lockdep might be dumb, I dont know...
>
>I suggest you read rcu_read_lock_bh kernel doc :
>
>/**
> * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
> *
> * This is equivalent of rcu_read_lock(), but to be used when updates
> * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
> * consider completion of a softirq handler to be a quiescent state,
> * a process in RCU read-side critical section must be protected by
> * disabling softirqs. Read-side critical sections in interrupt context
> * can use just rcu_read_lock().
> *
> */
>
>
>Last sentence being perfect :
>
>Read-side critical sections in interrupt context
>can use just rcu_read_lock().
>

Yeah, right; then it is more likely to be a bug in RCU lockdep.
Paul is looking at it.

Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-13  5:33                 ` Américo Wang
@ 2010-03-13 21:58                   ` Paul E. McKenney
  2010-03-15  1:08                     ` Américo Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2010-03-13 21:58 UTC (permalink / raw)
  To: Américo Wang
  Cc: Eric Dumazet, David Miller, peterz, linux-kernel, netdev

On Sat, Mar 13, 2010 at 01:33:56PM +0800, Américo Wang wrote:
> On Fri, Mar 12, 2010 at 02:37:38PM +0100, Eric Dumazet wrote:
> >Le vendredi 12 mars 2010 à 21:11 +0800, Américo Wang a écrit :
> >
> >> Oh, but lockdep complains about rcu_read_lock(), it said
> >> rcu_read_lock() can't be used in softirq context.
> >> 
> >> Am I missing something?
> >
> >Well, lockdep might be dumb, I dont know...
> >
> >I suggest you read rcu_read_lock_bh kernel doc :
> >
> >/**
> > * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
> > *
> > * This is equivalent of rcu_read_lock(), but to be used when updates
> > * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
> > * consider completion of a softirq handler to be a quiescent state,
> > * a process in RCU read-side critical section must be protected by
> > * disabling softirqs. Read-side critical sections in interrupt context
> > * can use just rcu_read_lock().
> > *
> > */
> >
> >
> >Last sentence being perfect :
> >
> >Read-side critical sections in interrupt context
> >can use just rcu_read_lock().
> >
> 
> Yeah, right, then it is more likely to be a bug of rcu lockdep.
> Paul is looking at it.

Except that it seems to be working correctly for me...

							Thanx, Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-13 21:58                   ` Paul E. McKenney
@ 2010-03-15  1:08                     ` Américo Wang
  2010-03-15  3:10                       ` Américo Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Américo Wang @ 2010-03-15  1:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Américo Wang, Eric Dumazet, David Miller, peterz,
	linux-kernel, netdev

On Sat, Mar 13, 2010 at 01:58:38PM -0800, Paul E. McKenney wrote:
>On Sat, Mar 13, 2010 at 01:33:56PM +0800, Américo Wang wrote:
>> On Fri, Mar 12, 2010 at 02:37:38PM +0100, Eric Dumazet wrote:
>> >Le vendredi 12 mars 2010 à 21:11 +0800, Américo Wang a écrit :
>> >
>> >> Oh, but lockdep complains about rcu_read_lock(), it said
>> >> rcu_read_lock() can't be used in softirq context.
>> >> 
>> >> Am I missing something?
>> >
>> >Well, lockdep might be dumb, I dont know...
>> >
>> >I suggest you read rcu_read_lock_bh kernel doc :
>> >
>> >/**
>> > * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
>> > *
>> > * This is equivalent of rcu_read_lock(), but to be used when updates
>> > * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
>> > * consider completion of a softirq handler to be a quiescent state,
>> > * a process in RCU read-side critical section must be protected by
>> > * disabling softirqs. Read-side critical sections in interrupt context
>> > * can use just rcu_read_lock().
>> > *
>> > */
>> >
>> >
>> >Last sentence being perfect :
>> >
>> >Read-side critical sections in interrupt context
>> >can use just rcu_read_lock().
>> >
>> 
>> Yeah, right, then it is more likely to be a bug of rcu lockdep.
>> Paul is looking at it.
>
>Except that it seems to be working correctly for me...
>

Hmm, then I am confused. The only possibility here is that this is
a lockdep bug...

Thanks for your help!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-15  1:08                     ` Américo Wang
@ 2010-03-15  3:10                       ` Américo Wang
  2010-03-15  9:39                         ` Américo Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Américo Wang @ 2010-03-15  3:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Américo Wang, Eric Dumazet, David Miller, peterz,
	linux-kernel, netdev

2010/3/15 Américo Wang <xiyou.wangcong@gmail.com>:
> On Sat, Mar 13, 2010 at 01:58:38PM -0800, Paul E. McKenney wrote:
>>On Sat, Mar 13, 2010 at 01:33:56PM +0800, Américo Wang wrote:
>>> On Fri, Mar 12, 2010 at 02:37:38PM +0100, Eric Dumazet wrote:
>>> >Le vendredi 12 mars 2010 à 21:11 +0800, Américo Wang a écrit :
>>> >
>>> >> Oh, but lockdep complains about rcu_read_lock(), it said
>>> >> rcu_read_lock() can't be used in softirq context.
>>> >>
>>> >> Am I missing something?
>>> >
>>> >Well, lockdep might be dumb, I dont know...
>>> >
>>> >I suggest you read rcu_read_lock_bh kernel doc :
>>> >
>>> >/**
>>> > * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
>>> > *
>>> > * This is equivalent of rcu_read_lock(), but to be used when updates
>>> > * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
>>> > * consider completion of a softirq handler to be a quiescent state,
>>> > * a process in RCU read-side critical section must be protected by
>>> > * disabling softirqs. Read-side critical sections in interrupt context
>>> > * can use just rcu_read_lock().
>>> > *
>>> > */
>>> >
>>> >
>>> >Last sentence being perfect :
>>> >
>>> >Read-side critical sections in interrupt context
>>> >can use just rcu_read_lock().
>>> >
>>>
>>> Yeah, right, then it is more likely to be a bug of rcu lockdep.
>>> Paul is looking at it.
>>
>>Except that it seems to be working correctly for me...
>>
>
> Hmm, then I am confused. The only possibility here is that this is
> a lockdep bug...
>

I believe so...

Peter, this looks odd:

 kernel:  (usbfs_mutex){+.?...}, at: [<ffffffff8146419f>] netif_receive_skb+0xe7/0x819

netif_receive_skb() never has a chance to take usbfs_mutex. How can this
happen?
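
For reference, the characters in braces after the lock name are lockdep's
usage-state string. Below is a rough userspace sketch of how to read it.
It assumes the six-position layout used by kernels of this era (hardirq,
hardirq-read, softirq, softirq-read, reclaim-fs, reclaim-fs-read) and the
four symbols '.', '-', '+' and '?'; the names usage_slot, usage_meaning()
and usage_first_mixed() are made up for this example and are not part of
lockdep.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed position order of the usage string in kernels of this era. */
static const char *const usage_slot[6] = {
	"hardirq", "hardirq-read", "softirq",
	"softirq-read", "reclaim-fs", "reclaim-fs-read",
};

/* Describe one usage character; returns NULL for an unknown symbol. */
static const char *usage_meaning(char c)
{
	switch (c) {
	case '.': return "not acquired in this context, nor with it enabled";
	case '-': return "acquired in this context";
	case '+': return "acquired with this irq type enabled";
	case '?': return "acquired in this context and with it enabled";
	default:  return NULL;
	}
}

/*
 * Return the index of the first slot marked '?', i.e. a lock class seen
 * both inside a context and with that context enabled; -1 if none.
 * The caller must pass a six-character string such as "+.?...".
 */
static int usage_first_mixed(const char *usage)
{
	assert(usage != NULL && strlen(usage) == 6);
	for (int i = 0; i < 6; i++)
		if (usage[i] == '?')
			return i;
	return -1;
}
```

Read this way, "+.?..." says the class was acquired with hardirqs enabled
and, in the softirq slot, both from softirq context and with softirqs
enabled. That mixed softirq state is the kind of usage lockdep reports as
inconsistent, and it would be a strange thing to see for a mutex.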

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-15  3:10                       ` Américo Wang
@ 2010-03-15  9:39                         ` Américo Wang
  2010-03-15 10:04                           ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Américo Wang @ 2010-03-15  9:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Américo Wang, Eric Dumazet, David Miller, peterz,
	linux-kernel, netdev

2010/3/15 Américo Wang <xiyou.wangcong@gmail.com>:
> 2010/3/15 Américo Wang <xiyou.wangcong@gmail.com>:
>> On Sat, Mar 13, 2010 at 01:58:38PM -0800, Paul E. McKenney wrote:
>>>On Sat, Mar 13, 2010 at 01:33:56PM +0800, Américo Wang wrote:
>>>> On Fri, Mar 12, 2010 at 02:37:38PM +0100, Eric Dumazet wrote:
>>>> >Le vendredi 12 mars 2010 à 21:11 +0800, Américo Wang a écrit :
>>>> >
>>>> >> Oh, but lockdep complains about rcu_read_lock(), it said
>>>> >> rcu_read_lock() can't be used in softirq context.
>>>> >>
>>>> >> Am I missing something?
>>>> >
>>>> >Well, lockdep might be dumb, I dont know...
>>>> >
>>>> >I suggest you read rcu_read_lock_bh kernel doc :
>>>> >
>>>> >/**
>>>> > * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
>>>> > *
>>>> > * This is equivalent of rcu_read_lock(), but to be used when updates
>>>> > * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
>>>> > * consider completion of a softirq handler to be a quiescent state,
>>>> > * a process in RCU read-side critical section must be protected by
>>>> > * disabling softirqs. Read-side critical sections in interrupt context
>>>> > * can use just rcu_read_lock().
>>>> > *
>>>> > */
>>>> >
>>>> >
>>>> >Last sentence being perfect :
>>>> >
>>>> >Read-side critical sections in interrupt context
>>>> >can use just rcu_read_lock().
>>>> >
>>>>
>>>> Yeah, right, then it is more likely to be a bug of rcu lockdep.
>>>> Paul is looking at it.
>>>
>>>Except that it seems to be working correctly for me...
>>>
>>
>> Hmm, then I am confused. The only possibility here is that this is
>> a lockdep bug...
>>
>
> I believe so...
>
> Peter, this looks odd:
>
>  kernel:  (usbfs_mutex){+.?...}, at: [<ffffffff8146419f>] netif_receive_skb+0xe7/0x819
>
> netif_receive_skb() never has a chance to take usbfs_mutex. How can this
> comes out?
>

OK, I think I found what lockdep is really complaining about: we take a
spinlock in netpoll_poll_lock() with hardirqs enabled, and later we take
another spinlock with spin_lock_irqsave() in netpoll_rx(), so lockdep
thinks we broke the locking rules.

I don't know why netpoll_rx() needs irqs disabled; it looks like no one
takes rx_lock in hardirq context. So can we use spin_lock(&rx_lock)
instead? Or am I missing something here? Eric? David?

Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-15  9:39                         ` Américo Wang
@ 2010-03-15 10:04                           ` Eric Dumazet
  2010-03-15 10:12                             ` Américo Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2010-03-15 10:04 UTC (permalink / raw)
  To: Américo Wang
  Cc: Paul E. McKenney, David Miller, peterz, linux-kernel, netdev

Le lundi 15 mars 2010 à 17:39 +0800, Américo Wang a écrit :

> 
> Ok, I think I found what lockdep really complains about, it is that we took
> spin_lock in netpoll_poll_lock() which is in hardirq-enabled environment,
> later, we took another spin_lock with spin_lock_irqsave() in netpoll_rx(),
> so lockdep thought we broke the locking rule.
> 
> I don't know why netpoll_rx() needs irq disabled, it looks like that no one
> takes rx_lock in hardirq context. So can we use spin_lock(&rx_lock)
> instead? Or am I missing something here? Eric? David?

I am a bit lost.

Could you give the complete picture, because I cannot find it in my
netdev archives.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-15 10:04                           ` Eric Dumazet
@ 2010-03-15 10:12                             ` Américo Wang
  2010-03-15 10:41                               ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Américo Wang @ 2010-03-15 10:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Paul E. McKenney, David Miller, peterz, linux-kernel, netdev

On Mon, Mar 15, 2010 at 6:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le lundi 15 mars 2010 à 17:39 +0800, Américo Wang a écrit :
>
>>
>> Ok, I think I found what lockdep really complains about, it is that we took
>> spin_lock in netpoll_poll_lock() which is in hardirq-enabled environment,
>> later, we took another spin_lock with spin_lock_irqsave() in netpoll_rx(),
>> so lockdep thought we broke the locking rule.
>>
>> I don't know why netpoll_rx() needs irq disabled, it looks like that no one
>> takes rx_lock in hardirq context. So can we use spin_lock(&rx_lock)
>> instead? Or am I missing something here? Eric? David?
>
> I am a bit lost.
>
> Could you give the complete picture, because I cannot find it in my
> netdev archives.
>

Sure, sorry for this.

Here is the whole thread:

http://lkml.org/lkml/2010/3/11/100

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-15 10:12                             ` Américo Wang
@ 2010-03-15 10:41                               ` Eric Dumazet
  2010-03-16 10:26                                 ` Américo Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2010-03-15 10:41 UTC (permalink / raw)
  To: Américo Wang
  Cc: Paul E. McKenney, David Miller, peterz, linux-kernel, netdev

Le lundi 15 mars 2010 à 18:12 +0800, Américo Wang a écrit :
> On Mon, Mar 15, 2010 at 6:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le lundi 15 mars 2010 à 17:39 +0800, Américo Wang a écrit :
> >
> >>
> >> Ok, I think I found what lockdep really complains about, it is that we took
> >> spin_lock in netpoll_poll_lock() which is in hardirq-enabled environment,
> >> later, we took another spin_lock with spin_lock_irqsave() in netpoll_rx(),
> >> so lockdep thought we broke the locking rule.
> >>
> >> I don't know why netpoll_rx() needs irq disabled, it looks like that no one
> >> takes rx_lock in hardirq context. So can we use spin_lock(&rx_lock)
> >> instead? Or am I missing something here? Eric? David?
> >
> > I am a bit lost.
> >
> > Could you give the complete picture, because I cannot find it in my
> > netdev archives.
> >
> 
> Sure, sorry for this.
> 
> Here is the whole thread:
> 
> http://lkml.org/lkml/2010/3/11/100

OK thanks

netpoll_rx() can be called from hard irqs (netif_rx()), so rx_lock
definitely needs irq care.

netpoll_poll_lock() does take a spinlock with irqs enabled, but it's not
rx_lock, it's napi->poll_lock.
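
The rule behind "needs irq care" can be sketched with a toy userspace
model of the per-lock-class bookkeeping. This is a simplification for
illustration only: the enum values, struct lock_class_model and
model_acquire() are hypothetical names, and real lockdep tracks many more
states than the two bits used here.

```c
#include <assert.h>
#include <stdbool.h>

/* Usage bits recorded per lock class; names are illustrative only. */
enum {
	USED_IN_HARDIRQ = 1 << 0,  /* acquired from a hardirq handler */
	HARDIRQ_ENABLED = 1 << 1,  /* acquired while hardirqs were enabled */
};

struct lock_class_model {
	unsigned int usage;
};

/*
 * Record one acquisition and report whether the class is still consistent.
 * A class that is both used in hardirq context and acquired with hardirqs
 * enabled can self-deadlock: the interrupt may arrive while the lock is held.
 */
static bool model_acquire(struct lock_class_model *lc,
			  bool in_hardirq, bool hardirqs_enabled)
{
	const unsigned int both = USED_IN_HARDIRQ | HARDIRQ_ENABLED;

	if (in_hardirq)
		lc->usage |= USED_IN_HARDIRQ;
	if (hardirqs_enabled)
		lc->usage |= HARDIRQ_ENABLED;
	return (lc->usage & both) != both;
}
```

In this model a lock that is taken from hardirq context stays consistent
only if every other acquisition uses the irqsave form, while a lock that
is only ever taken with a plain spin_lock() outside hardirq context
(assumed here to be the case for napi->poll_lock) stays consistent too.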

I don't see what could be the problem. Is it reproducible with a vanilla
kernel?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.34-rc1: rcu lockdep bug?
  2010-03-15 10:41                               ` Eric Dumazet
@ 2010-03-16 10:26                                 ` Américo Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Américo Wang @ 2010-03-16 10:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Paul E. McKenney, David Miller, peterz, linux-kernel, netdev

On Mon, Mar 15, 2010 at 6:41 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le lundi 15 mars 2010 à 18:12 +0800, Américo Wang a écrit :
>> On Mon, Mar 15, 2010 at 6:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > Le lundi 15 mars 2010 à 17:39 +0800, Américo Wang a écrit :
>> >
>> >>
>> >> Ok, I think I found what lockdep really complains about, it is that we took
>> >> spin_lock in netpoll_poll_lock() which is in hardirq-enabled environment,
>> >> later, we took another spin_lock with spin_lock_irqsave() in netpoll_rx(),
>> >> so lockdep thought we broke the locking rule.
>> >>
>> >> I don't know why netpoll_rx() needs irq disabled, it looks like that no one
>> >> takes rx_lock in hardirq context. So can we use spin_lock(&rx_lock)
>> >> instead? Or am I missing something here? Eric? David?
>> >
>> > I am a bit lost.
>> >
>> > Could you give the complete picture, because I cannot find it in my
>> > netdev archives.
>> >
>>
>> Sure, sorry for this.
>>
>> Here is the whole thread:
>>
>> http://lkml.org/lkml/2010/3/11/100
>
> OK thanks
>
> netpoll_rx() can be called from hard irqs (netif_rx()), so rx_lock
> definitely needs irq care.
>
> netpoll_poll_lock() does take a spinlock with irqs enabled, but it's not
> rx_lock, it's napi->poll_lock.

Yeah, I know, but besides the RCU locks, these two are the only
locks that can be taken in the call chain. I suspect lockdep got
something wrong.

>
> I don't see what could be the problem. Is it reproducible with a vanilla
> kernel?
>

No. I don't know why; my patch doesn't touch any function in the
call chain.

I have already "fixed" this in another way, so there is no need to worry
about it any more.

Thanks for your help!

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-03-16 10:26 UTC | newest]

Thread overview: 17+ messages
     [not found] <2375c9f91003110205v1d7f00bfk89472cb11bd985d3@mail.gmail.com>
     [not found] ` <20100311134556.GA6344@linux.vnet.ibm.com>
     [not found]   ` <20100311161751.GA3804@hack>
2010-03-12  7:56     ` 2.6.34-rc1: rcu lockdep bug? Américo Wang
2010-03-12  8:07       ` David Miller
2010-03-12  8:59         ` Américo Wang
2010-03-12 11:11           ` Eric Dumazet
2010-03-12 13:11             ` Américo Wang
2010-03-12 13:37               ` Eric Dumazet
2010-03-13  5:33                 ` Américo Wang
2010-03-13 21:58                   ` Paul E. McKenney
2010-03-15  1:08                     ` Américo Wang
2010-03-15  3:10                       ` Américo Wang
2010-03-15  9:39                         ` Américo Wang
2010-03-15 10:04                           ` Eric Dumazet
2010-03-15 10:12                             ` Américo Wang
2010-03-15 10:41                               ` Eric Dumazet
2010-03-16 10:26                                 ` Américo Wang
2010-03-12 22:03               ` Paul E. McKenney
2010-03-13  5:31                 ` Américo Wang
