public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
@ 2008-06-02 23:47 Kevin Winchester
  2008-06-03  7:53 ` Sitsofe Wheeler
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Kevin Winchester @ 2008-06-02 23:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Linux Kernel Mailing List


In next-20080530 and next-20080602 (and possibly earlier - I can't 
remember the linux-next tree before that I tried) I get the following:

[   12.885153] ------------[ cut here ]------------
[   12.885203] WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
[   12.885248] Pid: 4, comm: watchdog/0 Not tainted 
2.6.26-rc4-next-20080602 #13
[   12.885292]
[   12.885293] Call Trace:
[   12.885364]  [<ffffffff8022bbd5>] warn_on_slowpath+0x58/0x8a
[   12.885410]  [<ffffffff804c9cfe>] ? _spin_unlock_irqrestore+0x51/0x6d
[   12.885455]  [<ffffffff8032ff41>] ? debug_locks_off+0x9/0x3c
[   12.885498]  [<ffffffff802582dd>] ? ftrace_record_ip+0x1fa/0x272
[   12.885542]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
[   12.885584]  [<ffffffff8020b2c0>] ? mcount_call+0x5/0x35
[   12.885627]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
[   12.885668]  [<ffffffff80247c80>] check_flags+0x98/0x151
[   12.885710]  [<ffffffff8024ae72>] lock_acquire+0x4a/0xa9
[   12.885753]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
[   12.885795]  [<ffffffff802526c5>] ? watchdog+0x0/0x1ff
[   12.885837]  [<ffffffff804c98da>] _read_lock+0x37/0x43
[   12.885879]  [<ffffffff8025278a>] watchdog+0xc5/0x1ff
[   12.885921]  [<ffffffff802526c5>] ? watchdog+0x0/0x1ff
[   12.885963]  [<ffffffff8023e48b>] kthread+0x4e/0x7b
[   12.886005]  [<ffffffff8020bf18>] child_rip+0xa/0x12
[   12.886046]  [<ffffffff80227d8f>] ? finish_task_switch+0x57/0x92
[   12.886090]  [<ffffffff804c9d55>] ? _spin_unlock_irq+0x3b/0x57
[   12.886133]  [<ffffffff8020bad3>] ? restore_args+0x0/0x30
[   12.886137]  [<ffffffff8023e43d>] ? kthread+0x0/0x7b
[   12.886137]  [<ffffffff8020bf0e>] ? child_rip+0x0/0x12
[   12.886137]
[   12.886137] ---[ end trace 60e7f098a6913839 ]---
[   12.886137] possible reason: unannotated irqs-on.
[   12.886137] irq event stamp: 20
[   12.886137] hardirqs last  enabled at (19): [<ffffffff80249cc3>] 
trace_hardirqs_on+0xd/0xf
[   12.886137] hardirqs last disabled at (20): [<ffffffff80248565>] 
trace_hardirqs_off+0xd/0xf
[   12.886137] softirqs last  enabled at (0): [<ffffffff80229fef>] 
copy_process+0x2da/0x109e
[   12.886137] softirqs last disabled at (0): [<0000000000000000>] 0x0

Do I understand this correctly that something enabled irqs in a way that 
got around lockdep?  I assume the problem is not in watchdog, just that 
the watchdog was the next thing to run that interacted with irqs and 
thus lockdep picked up the situation then?

Is there additional debugging I can do, given some instructions?  If the 
cause is readily apparent to anyone, could they let me know (for my own 
interest) why it is apparent so that I can investigate things like this 
further next time?

This is completely reproducible on every boot - should I try to bisect it?

-- 
Kevin Winchester



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
  2008-06-02 23:47 linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151() Kevin Winchester
@ 2008-06-03  7:53 ` Sitsofe Wheeler
  2008-06-03  8:53 ` Peter Zijlstra
  2008-06-04  7:45 ` linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151() Sitsofe Wheeler
  2 siblings, 0 replies; 9+ messages in thread
From: Sitsofe Wheeler @ 2008-06-03  7:53 UTC (permalink / raw)
  To: linux-kernel

Kevin Winchester wrote:

> In next-20080530 and next-20080602 (and possibly earlier - I can't
> remember the linux-next tree before that I tried) I get the following:
> 
> [   12.885153] ------------[ cut here ]------------
> [   12.885203] WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
> [   12.885248] Pid: 4, comm: watchdog/0 Not tainted
> 2.6.26-rc4-next-20080602 #13

I saw this and mentioned it too - 
http://groups.google.co.uk/group/fa.linux.kernel/browse_thread/thread/77d4cedb7b27fa3b/fc66f2d849b42031?lnk=gst&q=watchdog+sitsofe#fc66f2d849b42031 .

> This is completely reproducible on every boot - should I try to bisect it?

Things that have been bisected seem far more likely to be looked at so if
you have the stomach... If you do start bisection I can confirm that it is
in next-20080529 . I can also mention that on my system the warning seems
to go away if I boot with nmi_watchdog=2 .

-- 
Sitsofe | http://sucs.org/~sits/


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
  2008-06-02 23:47 linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151() Kevin Winchester
  2008-06-03  7:53 ` Sitsofe Wheeler
@ 2008-06-03  8:53 ` Peter Zijlstra
  2008-06-03 12:22   ` Kevin Winchester
  2008-06-04  7:45 ` linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151() Sitsofe Wheeler
  2 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2008-06-03  8:53 UTC (permalink / raw)
  To: Kevin Winchester; +Cc: Ingo Molnar, Linux Kernel Mailing List

On Mon, 2008-06-02 at 20:47 -0300, Kevin Winchester wrote:
> In next-20080530 and next-20080602 (and possibly earlier - I can't 
> remember the linux-next tree before that I tried) I get the following:
> 
> [   12.885153] ------------[ cut here ]------------
> [   12.885203] WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
> [   12.885248] Pid: 4, comm: watchdog/0 Not tainted 
> 2.6.26-rc4-next-20080602 #13
> [   12.885292]
> [   12.885293] Call Trace:
> [   12.885364]  [<ffffffff8022bbd5>] warn_on_slowpath+0x58/0x8a
> [   12.885410]  [<ffffffff804c9cfe>] ? _spin_unlock_irqrestore+0x51/0x6d
> [   12.885455]  [<ffffffff8032ff41>] ? debug_locks_off+0x9/0x3c
> [   12.885498]  [<ffffffff802582dd>] ? ftrace_record_ip+0x1fa/0x272
> [   12.885542]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
> [   12.885584]  [<ffffffff8020b2c0>] ? mcount_call+0x5/0x35
> [   12.885627]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
> [   12.885668]  [<ffffffff80247c80>] check_flags+0x98/0x151
> [   12.885710]  [<ffffffff8024ae72>] lock_acquire+0x4a/0xa9
> [   12.885753]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
> [   12.885795]  [<ffffffff802526c5>] ? watchdog+0x0/0x1ff
> [   12.885837]  [<ffffffff804c98da>] _read_lock+0x37/0x43
> [   12.885879]  [<ffffffff8025278a>] watchdog+0xc5/0x1ff
> [   12.885921]  [<ffffffff802526c5>] ? watchdog+0x0/0x1ff
> [   12.885963]  [<ffffffff8023e48b>] kthread+0x4e/0x7b
> [   12.886005]  [<ffffffff8020bf18>] child_rip+0xa/0x12
> [   12.886046]  [<ffffffff80227d8f>] ? finish_task_switch+0x57/0x92
> [   12.886090]  [<ffffffff804c9d55>] ? _spin_unlock_irq+0x3b/0x57
> [   12.886133]  [<ffffffff8020bad3>] ? restore_args+0x0/0x30
> [   12.886137]  [<ffffffff8023e43d>] ? kthread+0x0/0x7b
> [   12.886137]  [<ffffffff8020bf0e>] ? child_rip+0x0/0x12
> [   12.886137]
> [   12.886137] ---[ end trace 60e7f098a6913839 ]---
> [   12.886137] possible reason: unannotated irqs-on.
> [   12.886137] irq event stamp: 20
> [   12.886137] hardirqs last  enabled at (19): [<ffffffff80249cc3>] 
> trace_hardirqs_on+0xd/0xf
> [   12.886137] hardirqs last disabled at (20): [<ffffffff80248565>] 
> trace_hardirqs_off+0xd/0xf
> [   12.886137] softirqs last  enabled at (0): [<ffffffff80229fef>] 
> copy_process+0x2da/0x109e
> [   12.886137] softirqs last disabled at (0): [<0000000000000000>] 0x0
> 
> Do I understand this correctly that something enabled irqs in a way that 
> got around lockdep?  I assume the problem is not in watchdog, just that 
> the watchdog was the next thing to run that interacted with irqs and 
> thus lockdep picked up the situation then?
> 
> Is there additional debugging I can do, given some instructions?  If the 
> cause is readily apparent to anyone, could they let me know (for my own 
> interest) why it is apparent so that I can investigate things like this 
> further next time?


You are correct - someone did sti but failed to call
trace_hardirqs_on(). Frequently its possible to isolate the code from
knowing the last recorded event:

[   12.886137] irq event stamp: 20
[   12.886137] hardirqs last disabled at (20): [<ffffffff80248565>] trace_hardirqs_off+0xd/0xf

However your compilation seems to have lost the caller IP:

void trace_hardirqs_off(void)
{
        trace_hardirqs_off_caller(CALLER_ADDR0);
}
EXPORT_SYMBOL(trace_hardirqs_off);

So that is of little help here. (Not sure how that happened, nor how you
could fix that - perhaps turn on some debugging knobs like build with
debug info etc..)


> This is completely reproducible on every boot - should I try to bisect it?

That is a possibility yes - if you can find the offending patch it
should be relatively straight forward to find the offending sti.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
  2008-06-03  8:53 ` Peter Zijlstra
@ 2008-06-03 12:22   ` Kevin Winchester
  2008-06-03 16:28     ` Johannes Weiner
  2008-06-04 17:50     ` [X86][BISECTED] WARNING: at kernel/lockdep.c:2680 Sitsofe Wheeler
  0 siblings, 2 replies; 9+ messages in thread
From: Kevin Winchester @ 2008-06-03 12:22 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Linux Kernel Mailing List

On Tue, Jun 3, 2008 at 5:53 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Mon, 2008-06-02 at 20:47 -0300, Kevin Winchester wrote:
>> In next-20080530 and next-20080602 (and possibly earlier - I can't
>> remember the linux-next tree before that I tried) I get the following:
>>
>> [   12.885153] ------------[ cut here ]------------
>> [   12.885203] WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
>> [   12.885248] Pid: 4, comm: watchdog/0 Not tainted
>> 2.6.26-rc4-next-20080602 #13
>> [   12.885292]
>> [   12.885293] Call Trace:
>> [   12.885364]  [<ffffffff8022bbd5>] warn_on_slowpath+0x58/0x8a
>> [   12.885410]  [<ffffffff804c9cfe>] ? _spin_unlock_irqrestore+0x51/0x6d
>> [   12.885455]  [<ffffffff8032ff41>] ? debug_locks_off+0x9/0x3c
>> [   12.885498]  [<ffffffff802582dd>] ? ftrace_record_ip+0x1fa/0x272
>> [   12.885542]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
>> [   12.885584]  [<ffffffff8020b2c0>] ? mcount_call+0x5/0x35
>> [   12.885627]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
>> [   12.885668]  [<ffffffff80247c80>] check_flags+0x98/0x151
>> [   12.885710]  [<ffffffff8024ae72>] lock_acquire+0x4a/0xa9
>> [   12.885753]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
>> [   12.885795]  [<ffffffff802526c5>] ? watchdog+0x0/0x1ff
>> [   12.885837]  [<ffffffff804c98da>] _read_lock+0x37/0x43
>> [   12.885879]  [<ffffffff8025278a>] watchdog+0xc5/0x1ff
>> [   12.885921]  [<ffffffff802526c5>] ? watchdog+0x0/0x1ff
>> [   12.885963]  [<ffffffff8023e48b>] kthread+0x4e/0x7b
>> [   12.886005]  [<ffffffff8020bf18>] child_rip+0xa/0x12
>> [   12.886046]  [<ffffffff80227d8f>] ? finish_task_switch+0x57/0x92
>> [   12.886090]  [<ffffffff804c9d55>] ? _spin_unlock_irq+0x3b/0x57
>> [   12.886133]  [<ffffffff8020bad3>] ? restore_args+0x0/0x30
>> [   12.886137]  [<ffffffff8023e43d>] ? kthread+0x0/0x7b
>> [   12.886137]  [<ffffffff8020bf0e>] ? child_rip+0x0/0x12
>> [   12.886137]
>> [   12.886137] ---[ end trace 60e7f098a6913839 ]---
>> [   12.886137] possible reason: unannotated irqs-on.
>> [   12.886137] irq event stamp: 20
>> [   12.886137] hardirqs last  enabled at (19): [<ffffffff80249cc3>]
>> trace_hardirqs_on+0xd/0xf
>> [   12.886137] hardirqs last disabled at (20): [<ffffffff80248565>]
>> trace_hardirqs_off+0xd/0xf
>> [   12.886137] softirqs last  enabled at (0): [<ffffffff80229fef>]
>> copy_process+0x2da/0x109e
>> [   12.886137] softirqs last disabled at (0): [<0000000000000000>] 0x0
>>
>> Do I understand this correctly that something enabled irqs in a way that
>> got around lockdep?  I assume the problem is not in watchdog, just that
>> the watchdog was the next thing to run that interacted with irqs and
>> thus lockdep picked up the situation then?
>>
>> Is there additional debugging I can do, given some instructions?  If the
>> cause is readily apparent to anyone, could they let me know (for my own
>> interest) why it is apparent so that I can investigate things like this
>> further next time?
>
>
> You are correct - someone did sti but failed to call
> trace_hardirqs_on(). Frequently its possible to isolate the code from
> knowing the last recorded event:
>
> [   12.886137] irq event stamp: 20
> [   12.886137] hardirqs last disabled at (20): [<ffffffff80248565>] trace_hardirqs_off+0xd/0xf
>
> However your compilation seems to have lost the caller IP:
>
> void trace_hardirqs_off(void)
> {
>        trace_hardirqs_off_caller(CALLER_ADDR0);
> }
> EXPORT_SYMBOL(trace_hardirqs_off);
>
> So that is of little help here. (Not sure how that happened, nor how you
> could fix that - perhaps turn on some debugging knobs like build with
> debug info etc..)
>

Could this be related to CONFIG_CC_OPTIMIZE_INLINING (or whatever it
is called now)?  I am pretty sure I have that enabled.  I will check
the rest of my config this evening to see if there is anything else I
can turn on/off to help.

>
>> This is completely reproducible on every boot - should I try to bisect it?
>
> That is a possibility yes - if you can find the offending patch it
> should be relatively straight forward to find the offending sti.
>

Sure - I can try it this evening.  Also, is calling sti the only way
this could have happened?  And is linux-next broken out into a single
patch?  I wouldn't expect there to be too many calls to sti, so a grep
in the patch file might be quicker than bisection.

Thanks,

-- 
Kevin Winchester

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
  2008-06-03 12:22   ` Kevin Winchester
@ 2008-06-03 16:28     ` Johannes Weiner
  2008-06-03 16:34       ` Peter Zijlstra
  2008-06-04 17:50     ` [X86][BISECTED] WARNING: at kernel/lockdep.c:2680 Sitsofe Wheeler
  1 sibling, 1 reply; 9+ messages in thread
From: Johannes Weiner @ 2008-06-03 16:28 UTC (permalink / raw)
  To: Kevin Winchester; +Cc: Peter Zijlstra, Ingo Molnar, Linux Kernel Mailing List

Hi,

"Kevin Winchester" <kjwinchester@gmail.com> writes:

> On Tue, Jun 3, 2008 at 5:53 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>> On Mon, 2008-06-02 at 20:47 -0300, Kevin Winchester wrote:
>>> In next-20080530 and next-20080602 (and possibly earlier - I can't
>>> remember the linux-next tree before that I tried) I get the following:
>>>
>>> [   12.885153] ------------[ cut here ]------------
>>> [   12.885203] WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
>>> [   12.885248] Pid: 4, comm: watchdog/0 Not tainted
>>> 2.6.26-rc4-next-20080602 #13
>>> [   12.885292]
>>> [   12.885293] Call Trace:
>>> [   12.885364]  [<ffffffff8022bbd5>] warn_on_slowpath+0x58/0x8a
>>> [   12.885410]  [<ffffffff804c9cfe>] ? _spin_unlock_irqrestore+0x51/0x6d
>>> [   12.885455]  [<ffffffff8032ff41>] ? debug_locks_off+0x9/0x3c
>>> [   12.885498]  [<ffffffff802582dd>] ? ftrace_record_ip+0x1fa/0x272
>>> [   12.885542]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
>>> [   12.885584]  [<ffffffff8020b2c0>] ? mcount_call+0x5/0x35
>>> [   12.885627]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
>>> [   12.885668]  [<ffffffff80247c80>] check_flags+0x98/0x151
>>> [   12.885710]  [<ffffffff8024ae72>] lock_acquire+0x4a/0xa9
>>> [   12.885753]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
>>> [   12.885795]  [<ffffffff802526c5>] ? watchdog+0x0/0x1ff
>>> [   12.885837]  [<ffffffff804c98da>] _read_lock+0x37/0x43
>>> [   12.885879]  [<ffffffff8025278a>] watchdog+0xc5/0x1ff
>>> [   12.885921]  [<ffffffff802526c5>] ? watchdog+0x0/0x1ff
>>> [   12.885963]  [<ffffffff8023e48b>] kthread+0x4e/0x7b
>>> [   12.886005]  [<ffffffff8020bf18>] child_rip+0xa/0x12
>>> [   12.886046]  [<ffffffff80227d8f>] ? finish_task_switch+0x57/0x92
>>> [   12.886090]  [<ffffffff804c9d55>] ? _spin_unlock_irq+0x3b/0x57
>>> [   12.886133]  [<ffffffff8020bad3>] ? restore_args+0x0/0x30
>>> [   12.886137]  [<ffffffff8023e43d>] ? kthread+0x0/0x7b
>>> [   12.886137]  [<ffffffff8020bf0e>] ? child_rip+0x0/0x12
>>> [   12.886137]
>>> [   12.886137] ---[ end trace 60e7f098a6913839 ]---
>>> [   12.886137] possible reason: unannotated irqs-on.
>>> [   12.886137] irq event stamp: 20
>>> [   12.886137] hardirqs last  enabled at (19): [<ffffffff80249cc3>]
>>> trace_hardirqs_on+0xd/0xf
>>> [   12.886137] hardirqs last disabled at (20): [<ffffffff80248565>]
>>> trace_hardirqs_off+0xd/0xf
>>> [   12.886137] softirqs last  enabled at (0): [<ffffffff80229fef>]
>>> copy_process+0x2da/0x109e
>>> [   12.886137] softirqs last disabled at (0): [<0000000000000000>] 0x0
>>>
>>> Do I understand this correctly that something enabled irqs in a way that
>>> got around lockdep?  I assume the problem is not in watchdog, just that
>>> the watchdog was the next thing to run that interacted with irqs and
>>> thus lockdep picked up the situation then?
>>>
>>> Is there additional debugging I can do, given some instructions?  If the
>>> cause is readily apparent to anyone, could they let me know (for my own
>>> interest) why it is apparent so that I can investigate things like this
>>> further next time?
>>
>>
>> You are correct - someone did sti but failed to call
>> trace_hardirqs_on(). Frequently its possible to isolate the code from
>> knowing the last recorded event:
>>
>> [   12.886137] irq event stamp: 20
>> [   12.886137] hardirqs last disabled at (20): [<ffffffff80248565>] trace_hardirqs_off+0xd/0xf
>>
>> However your compilation seems to have lost the caller IP:
>>
>> void trace_hardirqs_off(void)
>> {
>>        trace_hardirqs_off_caller(CALLER_ADDR0);
>> }
>> EXPORT_SYMBOL(trace_hardirqs_off);
>>
>> So that is of little help here. (Not sure how that happened, nor how you
>> could fix that - perhaps turn on some debugging knobs like build with
>> debug info etc..)
>>
>
> Could this be related to CONFIG_CC_OPTIMIZE_INLINING (or whatever it
> is called now)?  I am pretty sure I have that enabled.  I will check
> the rest of my config this evening to see if there is anything else I
> can turn on/off to help.
>
>>
>>> This is completely reproducible on every boot - should I try to bisect it?
>>
>> That is a possibility yes - if you can find the offending patch it
>> should be relatively straight forward to find the offending sti.
>>
>
> Sure - I can try it this evening.  Also, is calling sti the only way
> this could have happened?  And is linux-next broken out into a single
> patch?  I wouldn't expect there to be too many calls to sti, so a grep
> in the patch file might be quicker than bisection.

git-log -p origin/master..next/master showed nothing interesting.
Grepping for sti() would also not find indirect calls.

	Hannes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
  2008-06-03 16:28     ` Johannes Weiner
@ 2008-06-03 16:34       ` Peter Zijlstra
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2008-06-03 16:34 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Kevin Winchester, Ingo Molnar, Linux Kernel Mailing List

On Tue, 2008-06-03 at 18:28 +0200, Johannes Weiner wrote:
> Hi,
> 
> "Kevin Winchester" <kjwinchester@gmail.com> writes:
> 
> > On Tue, Jun 3, 2008 at 5:53 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >> On Mon, 2008-06-02 at 20:47 -0300, Kevin Winchester wrote:
> >>> In next-20080530 and next-20080602 (and possibly earlier - I can't
> >>> remember the linux-next tree before that I tried) I get the following:
> >>>
> >>> [   12.885153] ------------[ cut here ]------------
> >>> [   12.885203] WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
> >>> [   12.885248] Pid: 4, comm: watchdog/0 Not tainted
> >>> 2.6.26-rc4-next-20080602 #13
> >>> [   12.885292]
> >>> [   12.885293] Call Trace:
> >>> [   12.885364]  [<ffffffff8022bbd5>] warn_on_slowpath+0x58/0x8a
> >>> [   12.885410]  [<ffffffff804c9cfe>] ? _spin_unlock_irqrestore+0x51/0x6d
> >>> [   12.885455]  [<ffffffff8032ff41>] ? debug_locks_off+0x9/0x3c
> >>> [   12.885498]  [<ffffffff802582dd>] ? ftrace_record_ip+0x1fa/0x272
> >>> [   12.885542]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
> >>> [   12.885584]  [<ffffffff8020b2c0>] ? mcount_call+0x5/0x35
> >>> [   12.885627]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
> >>> [   12.885668]  [<ffffffff80247c80>] check_flags+0x98/0x151
> >>> [   12.885710]  [<ffffffff8024ae72>] lock_acquire+0x4a/0xa9
> >>> [   12.885753]  [<ffffffff8025278a>] ? watchdog+0xc5/0x1ff
> >>> [   12.885795]  [<ffffffff802526c5>] ? watchdog+0x0/0x1ff
> >>> [   12.885837]  [<ffffffff804c98da>] _read_lock+0x37/0x43
> >>> [   12.885879]  [<ffffffff8025278a>] watchdog+0xc5/0x1ff
> >>> [   12.885921]  [<ffffffff802526c5>] ? watchdog+0x0/0x1ff
> >>> [   12.885963]  [<ffffffff8023e48b>] kthread+0x4e/0x7b
> >>> [   12.886005]  [<ffffffff8020bf18>] child_rip+0xa/0x12
> >>> [   12.886046]  [<ffffffff80227d8f>] ? finish_task_switch+0x57/0x92
> >>> [   12.886090]  [<ffffffff804c9d55>] ? _spin_unlock_irq+0x3b/0x57
> >>> [   12.886133]  [<ffffffff8020bad3>] ? restore_args+0x0/0x30
> >>> [   12.886137]  [<ffffffff8023e43d>] ? kthread+0x0/0x7b
> >>> [   12.886137]  [<ffffffff8020bf0e>] ? child_rip+0x0/0x12
> >>> [   12.886137]
> >>> [   12.886137] ---[ end trace 60e7f098a6913839 ]---
> >>> [   12.886137] possible reason: unannotated irqs-on.
> >>> [   12.886137] irq event stamp: 20
> >>> [   12.886137] hardirqs last  enabled at (19): [<ffffffff80249cc3>]
> >>> trace_hardirqs_on+0xd/0xf
> >>> [   12.886137] hardirqs last disabled at (20): [<ffffffff80248565>]
> >>> trace_hardirqs_off+0xd/0xf
> >>> [   12.886137] softirqs last  enabled at (0): [<ffffffff80229fef>]
> >>> copy_process+0x2da/0x109e
> >>> [   12.886137] softirqs last disabled at (0): [<0000000000000000>] 0x0
> >>>
> >>> Do I understand this correctly that something enabled irqs in a way that
> >>> got around lockdep?  I assume the problem is not in watchdog, just that
> >>> the watchdog was the next thing to run that interacted with irqs and
> >>> thus lockdep picked up the situation then?
> >>>
> >>> Is there additional debugging I can do, given some instructions?  If the
> >>> cause is readily apparent to anyone, could they let me know (for my own
> >>> interest) why it is apparent so that I can investigate things like this
> >>> further next time?
> >>
> >>
> >> You are correct - someone did sti but failed to call
> >> trace_hardirqs_on(). Frequently its possible to isolate the code from
> >> knowing the last recorded event:
> >>
> >> [   12.886137] irq event stamp: 20
> >> [   12.886137] hardirqs last disabled at (20): [<ffffffff80248565>] trace_hardirqs_off+0xd/0xf
> >>
> >> However your compilation seems to have lost the caller IP:
> >>
> >> void trace_hardirqs_off(void)
> >> {
> >>        trace_hardirqs_off_caller(CALLER_ADDR0);
> >> }
> >> EXPORT_SYMBOL(trace_hardirqs_off);
> >>
> >> So that is of little help here. (Not sure how that happened, nor how you
> >> could fix that - perhaps turn on some debugging knobs like build with
> >> debug info etc..)
> >>
> >
> > Could this be related to CONFIG_CC_OPTIMIZE_INLINING (or whatever it
> > is called now)?  I am pretty sure I have that enabled.  I will check
> > the rest of my config this evening to see if there is anything else I
> > can turn on/off to help.
> >
> >>
> >>> This is completely reproducible on every boot - should I try to bisect it?
> >>
> >> That is a possibility yes - if you can find the offending patch it
> >> should be relatively straight forward to find the offending sti.
> >>
> >
> > Sure - I can try it this evening.  Also, is calling sti the only way
> > this could have happened?  And is linux-next broken out into a single
> > patch?  I wouldn't expect there to be too many calls to sti, so a grep
> > in the patch file might be quicker than bisection.
> 
> git-log -p origin/master..next/master showed nothing interesting.
> Grepping for sti() would also not find indirect calls.

Yeah - except you'd miss all the (inline) asm. Not sure there are many
calls that implicity enable interrupts, but a popf can also enable them.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
  2008-06-02 23:47 linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151() Kevin Winchester
  2008-06-03  7:53 ` Sitsofe Wheeler
  2008-06-03  8:53 ` Peter Zijlstra
@ 2008-06-04  7:45 ` Sitsofe Wheeler
  2 siblings, 0 replies; 9+ messages in thread
From: Sitsofe Wheeler @ 2008-06-04  7:45 UTC (permalink / raw)
  To: linux-kernel

Kevin Winchester wrote:

> In next-20080530 and next-20080602 (and possibly earlier - I can't
> remember the linux-next tree before that I tried) I get the following:
> 
> [   12.885153] ------------[ cut here ]------------
> [   12.885203] WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151()
> [   12.885248] Pid: 4, comm: watchdog/0 Not tainted
> 
> This is completely reproducible on every boot - should I try to bisect it?

If you are going to bisect it seems to me that the problem seems to have
occurred between next-20080526 and next-20080527...

-- 
Sitsofe | http://sucs.org/~sits/


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [X86][BISECTED] WARNING: at kernel/lockdep.c:2680
  2008-06-03 12:22   ` Kevin Winchester
  2008-06-03 16:28     ` Johannes Weiner
@ 2008-06-04 17:50     ` Sitsofe Wheeler
       [not found]       ` <20080605130823.GD6332@elte.hu>
  1 sibling, 1 reply; 9+ messages in thread
From: Sitsofe Wheeler @ 2008-06-04 17:50 UTC (permalink / raw)
  To: linux-kernel

Kevin Winchester wrote:

>> On Mon, 2008-06-02 at 20:47 -0300, Kevin Winchester wrote:
>>> In next-20080530 and next-20080602 (and possibly earlier - I can't
>>> remember the linux-next tree before that I tried) I get the following:
>>>
>>> [   12.885153] ------------[ cut here ]------------
>>> [   12.885203] WARNING: at kernel/lockdep.c:2680
>>> [   check_flags+0x98/0x151() 12.885248] Pid: 4, comm: watchdog/0 Not

OK I've managed to bisect this down to commit
[c6531cce6e6e4b99bcda46b6268d6f2d9e30aea4] (sched: do not trace
sched_clock):

commit c6531cce6e6e4b99bcda46b6268d6f2d9e30aea4
Author: Ingo Molnar <mingo@elte.hu>
Date:   Mon May 12 21:21:14 2008 +0200

    sched: do not trace sched_clock

    The tracer uses sched_clock, so do not trace it.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reverting this made the lockdep warning go away. Here is the bisection log:
# bad: [cb689015ce306647ea9e00c2412b852b1dd1580d] Add linux-next specific
files
# good: [5ffeee1785c30303007b3d2dce85402b0bbf9b00] Add linux-next specific
files
git-bisect start 'next-20080527' 'next-20080526'
# bad: [357e7b54308101878f7e93e04fa8d7f06eea57a8] Merge
commit 'avr32/avr32-arch'
git-bisect bad 357e7b54308101878f7e93e04fa8d7f06eea57a8
# bad: [048f32bfd577d3c0c2c5f2e4b03690a2902be1c2] Merge
branch 'tracing/mmiotrace' into auto-ftrace-next
git-bisect bad 048f32bfd577d3c0c2c5f2e4b03690a2902be1c2
# good: [4d9493c90f8e6e1b164aede3814010a290161abb] ftrace: remove add-hoc
code
git-bisect good 4d9493c90f8e6e1b164aede3814010a290161abb
# good: [37135677e653537ffc6e7def679443272a1c03c3] ftrace: fix mcount export
bug
git-bisect good 37135677e653537ffc6e7def679443272a1c03c3
# good: [801a175bf601f9a9d5e86e92dee9adeeb6625da8] mmiotrace: ftrace fix
git-bisect good 801a175bf601f9a9d5e86e92dee9adeeb6625da8
# good: [e0fd5c2fa188311667267c02a702ae699a9fc2bd] mmiotrace: do not print
bogus pid for maps either
git-bisect good e0fd5c2fa188311667267c02a702ae699a9fc2bd
# good: [790e2a290b499b0400254e6870ec27969065d122] x86 mmiotrace: page level
is unsigned
git-bisect good 790e2a290b499b0400254e6870ec27969065d122
# good: [7b3ce4a410ccdc8ca0be361d4d4fd61c3830f52a] sched: clean up scheduler
markers
git-bisect good 7b3ce4a410ccdc8ca0be361d4d4fd61c3830f52a
# bad: [7fa13a1e8501b96ff11d4a19a5401f61ccd3d98c] Merge
branch 'tracing/ftrace' into auto-ftrace-next
git-bisect bad 7fa13a1e8501b96ff11d4a19a5401f61ccd3d98c
# bad: [c6531cce6e6e4b99bcda46b6268d6f2d9e30aea4] sched: do not trace
sched_clock
git-bisect bad c6531cce6e6e4b99bcda46b6268d6f2d9e30aea4

-- 
Sitsofe | http://sucs.org/~sits/


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [X86][BISECTED] WARNING: at kernel/lockdep.c:2680
       [not found]       ` <20080605130823.GD6332@elte.hu>
@ 2008-06-05 23:53         ` Kevin Winchester
  0 siblings, 0 replies; 9+ messages in thread
From: Kevin Winchester @ 2008-06-05 23:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Sitsofe Wheeler, Thomas Gleixner, Linux Kernel Mailing List

Ingo Molnar wrote:
> * Sitsofe Wheeler <sitsofe@yahoo.com> wrote:
> 
>> OK I've managed to bisect this down to commit 
>> [c6531cce6e6e4b99bcda46b6268d6f2d9e30aea4] (sched: do not trace 
>> sched_clock):
>>
>> commit c6531cce6e6e4b99bcda46b6268d6f2d9e30aea4
>> Author: Ingo Molnar <mingo@elte.hu>
>> Date:   Mon May 12 21:21:14 2008 +0200
>>
>>     sched: do not trace sched_clock
>>
>>     The tracer uses sched_clock, so do not trace it.
>>
>>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>
>> Reverting this made the lockdep warning go away. Here is the bisection log:
> 
> ah, good find! Does the patch below (instead of the full revert) fix it 
> as well? I've queued the fix up into tip/tracing/ftrace.
> 
> 	Ingo
> 
> ---------------->
> Subject: sched: sched_clock() lockdep fix
> From: Ingo Molnar <mingo@elte.hu>
> Date: Thu Jun 05 15:04:17 CEST 2008
> 
> Sitsofe Wheeler reported a lockdep warning and bisected it down to:
> 
>> commit c6531cce6e6e4b99bcda46b6268d6f2d9e30aea4
>> Author: Ingo Molnar <mingo@elte.hu>
>> Date:   Mon May 12 21:21:14 2008 +0200
>>
>>     sched: do not trace sched_clock
> 
> do not use raw irq flags in cpu_clock() as it causes lockdep to lose
> track of the true state of the IRQ flag.
> 
> Reported-and-bisected-by: Sitsofe Wheeler <sitsofe@yahoo.com>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  kernel/sched.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux/kernel/sched.c
> ===================================================================
> --- linux.orig/kernel/sched.c
> +++ linux/kernel/sched.c
> @@ -862,7 +862,7 @@ unsigned long long notrace cpu_clock(int
>  	unsigned long long prev_cpu_time, time, delta_time;
>  	unsigned long flags;
>  
> -	raw_local_irq_save(flags);
> +	local_irq_save(flags);
>  	prev_cpu_time = per_cpu(prev_cpu_time, cpu);
>  	time = __cpu_clock(cpu) + per_cpu(time_offset, cpu);
>  	delta_time = time-prev_cpu_time;
> @@ -871,7 +871,7 @@ unsigned long long notrace cpu_clock(int
>  		time = __sync_cpu_clock(time, cpu);
>  		per_cpu(prev_cpu_time, cpu) = time;
>  	}
> -	raw_local_irq_restore(flags);
> +	local_irq_restore(flags);
>  
>  	return time;
>  }
> 

I didn't manage to get through a bisection run for the warning - thanks 
to Sitsofe for doing it.  The patch does eliminate the warning for me.

Thanks,

-- 
Kevin Winchester


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2008-06-05 23:53 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-02 23:47 linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151() Kevin Winchester
2008-06-03  7:53 ` Sitsofe Wheeler
2008-06-03  8:53 ` Peter Zijlstra
2008-06-03 12:22   ` Kevin Winchester
2008-06-03 16:28     ` Johannes Weiner
2008-06-03 16:34       ` Peter Zijlstra
2008-06-04 17:50     ` [X86][BISECTED] WARNING: at kernel/lockdep.c:2680 Sitsofe Wheeler
     [not found]       ` <20080605130823.GD6332@elte.hu>
2008-06-05 23:53         ` Kevin Winchester
2008-06-04  7:45 ` linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151() Sitsofe Wheeler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox