public inbox for linux-kernel@vger.kernel.org
* lockdep and debug objects together are broken?
@ 2009-01-20  8:55 Nick Piggin
  2009-01-20 21:11 ` Vegard Nossum
  0 siblings, 1 reply; 8+ messages in thread
From: Nick Piggin @ 2009-01-20  8:55 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Vegard Nossum, Peter Zijlstra,
	Linux Kernel Mailing List

Hi,

I've had a problem frustrating my testing because lockdep was silently turning
itself off... I patched out the code that disables lockdep after the first error,
and it started showing weird errors. kernel/fork.c:990 seemed to be the
first to trigger (hard irqs disabled) from a call_usermodehelper call. Later,
the migration thread was reported to try to unlock rq->lock although it was
holding no locks. Then init was reported to return to userspace without
releasing an objectdebug hash lock.

All that went away and everything seemed to work properly with debug objects
configured out.

I didn't get too far in trying to debug the problem. But it should be easy
enough to reproduce (if not, I can post traces or test patches).

Thanks,
Nick


* Re: lockdep and debug objects together are broken?
  2009-01-20  8:55 lockdep and debug objects together are broken? Nick Piggin
@ 2009-01-20 21:11 ` Vegard Nossum
  2009-01-21  7:19   ` Nick Piggin
  0 siblings, 1 reply; 8+ messages in thread
From: Vegard Nossum @ 2009-01-20 21:11 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Linux Kernel Mailing List

On Tue, Jan 20, 2009 at 9:55 AM, Nick Piggin <npiggin@suse.de> wrote:
> Hi,
>
> I've had a problem frustrating my testing because lockdep was silently turning
> itself off... I patched out the code to disable lockdep after the first error,
> and it started showing up weird errors. kernel/fork.c:990 seemed to be the
> first to trigger (hard irqs disabled) from a call_usermodehelper call. Later,
> migration thread was reported to try to unlock rq->lock although it was
> holding no locks. Then init was reported to return to userspace without
> releasing an objectdebug hash lock.
>
> All that went away and everything seemed to work properly with debug objects
> configured out.
>
> I didn't get too far in trying to debug the problem. But it should be easy
> enough to reproduce (if not, I can post traces or test patches).

I just built a kernel with lockdep and debugobjects enabled, and
everything seemed fine. I think you should post your kernel version,
config, and the lockdep patch (if needed -- it didn't seem to turn
itself off here).


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036


* Re: lockdep and debug objects together are broken?
  2009-01-20 21:11 ` Vegard Nossum
@ 2009-01-21  7:19   ` Nick Piggin
  2009-01-21 11:42     ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Nick Piggin @ 2009-01-21  7:19 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Linux Kernel Mailing List

On Tue, Jan 20, 2009 at 10:11:47PM +0100, Vegard Nossum wrote:
> I just built a kernel with lockdep and debugobjects enabled, and
> everything seemed fine. I think you should post your kernel version,
> config, and the lockdep patch (if needed -- it didn't seem to turn
> itself off here).

Are you sure? I.e. does sysrq+D still work properly? In that case, you
wouldn't need the lockdep patch, because all it does is prevent lockdep
from being switched off.

I'll have to dig a bit further, then. The annoying thing is that
lockdep turns itself off at the drop of a hat (and this particular
problem seems to happen without any backtraces), so it invalidates
all your lockdep testing if you don't realise it has turned itself
off.

Is there a way to re-arm lockdep? That would be neat.
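
One way to notice that lockdep has switched itself off is to look at the
debug_locks line in /proc/lockdep_stats. The sketch below is a minimal
userspace illustration, assuming the running kernel exposes that file and
that it contains a line of the form "debug_locks: N". The helper names
parse_debug_locks() and lockdep_armed() are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find the "debug_locks:" line in lockdep_stats-style text.
 * Returns 1 if lockdep is still armed, 0 if it has disarmed itself,
 * and -1 if the line is missing or cannot be parsed.
 */
int parse_debug_locks(const char *text)
{
	const char *p;
	int value;

	assert(text != NULL);

	p = strstr(text, "debug_locks:");
	if (!p)
		return -1;
	/* Whitespace in the format matches any run of spaces in the file. */
	if (sscanf(p, "debug_locks: %d", &value) != 1)
		return -1;
	return value ? 1 : 0;
}

/*
 * Read the stats file at `path` (assumed to be small) and report the
 * state. Returns -1 if the file cannot be opened.
 */
int lockdep_armed(const char *path)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_debug_locks(buf);
}
```

Calling lockdep_armed("/proc/lockdep_stats") before and after a test run
would show whether the results can be trusted: a return value of 0 means
lockdep stopped checking at some point during the run.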



* Re: lockdep and debug objects together are broken?
  2009-01-21  7:19   ` Nick Piggin
@ 2009-01-21 11:42     ` Ingo Molnar
  2009-01-21 11:50       ` Peter Zijlstra
  2009-01-21 11:54       ` Nick Piggin
  0 siblings, 2 replies; 8+ messages in thread
From: Ingo Molnar @ 2009-01-21 11:42 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Vegard Nossum, Thomas Gleixner, Peter Zijlstra,
	Linux Kernel Mailing List


* Nick Piggin <npiggin@suse.de> wrote:

> Are you sure? I.e. does sysrq+D still work properly? In that case, you
> wouldn't need the lockdep patch, because all it does is prevent lockdep
> from being switched off.
> 
> I'll have to dig a bit further, then. The annoying thing is that
> lockdep turns itself off at the drop of a hat (and this particular
> problem seems to happen without any backtraces), so it invalidates
> all your lockdep testing if you don't realise it has turned itself
> off.
> 
> Is there a way to re-arm lockdep? That would be neat.

Not at the moment, and it looks somewhat complicated. All lock state 
freezes the moment lockdep disarms itself. That's very much a key design 
element: you will rarely see a real lockdep-inflicted crash. Even if 
lockdep has a bug, it disables itself and runs for the door very 
efficiently.

So by the time you'd rearm, there's a lot of tasks with no proper locking 
state built up. We might be able to re-arm via stop_machine_run perhaps.

	Ingo


* Re: lockdep and debug objects together are broken?
  2009-01-21 11:42     ` Ingo Molnar
@ 2009-01-21 11:50       ` Peter Zijlstra
  2009-01-21 11:54         ` Ingo Molnar
  2009-01-21 11:54       ` Nick Piggin
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2009-01-21 11:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nick Piggin, Vegard Nossum, Thomas Gleixner,
	Linux Kernel Mailing List

On Wed, 2009-01-21 at 12:42 +0100, Ingo Molnar wrote:

> So by the time you'd rearm, there's a lot of tasks with no proper locking 
> state built up. We might be able to re-arm via stop_machine_run perhaps.

Won't work either, kstopmachine only preempts everybody. We'd require
something stronger.

What we need is a point where there's guaranteed no locks held, for
regular tasks that would be a trip to userspace and back, but for kernel
tasks that's a bit harder -- does the freezer stuff guarantee this?

Supposing we have such a point for all tasks, what you then do is wipe
all lock state and rig a trigger to start tracking lock state once you
passed through the point.





* Re: lockdep and debug objects together are broken?
  2009-01-21 11:42     ` Ingo Molnar
  2009-01-21 11:50       ` Peter Zijlstra
@ 2009-01-21 11:54       ` Nick Piggin
  2009-01-21 11:57         ` Ingo Molnar
  1 sibling, 1 reply; 8+ messages in thread
From: Nick Piggin @ 2009-01-21 11:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Vegard Nossum, Thomas Gleixner, Peter Zijlstra,
	Linux Kernel Mailing List

On Wed, Jan 21, 2009 at 12:42:29PM +0100, Ingo Molnar wrote:
> 
> * Nick Piggin <npiggin@suse.de> wrote:
> 
> > Is there a way to re-arm lockdep? That would be neat.
> 
> Not at the moment, and it looks somewhat complicated. All lock state 
> freezes the moment lockdep disarms itself. That's very much a key design 
> element: you will rarely see a real lockdep-inflicted crash. Even if 
> lockdep has a bug, it disables itself and runs for the door very 
> efficiently.

Lockdep isn't exactly for production systems though, is it? Say you
want to debug some problem, but other code (that you don't have the
knowledge to debug) is switching it off...

Also, I'd guess that most bugs in lockdep would probably fall pretty
neatly into either the "pretty harmless" or "completely take down the
system" categories ;)


* Re: lockdep and debug objects together are broken?
  2009-01-21 11:50       ` Peter Zijlstra
@ 2009-01-21 11:54         ` Ingo Molnar
  0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2009-01-21 11:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nick Piggin, Vegard Nossum, Thomas Gleixner,
	Linux Kernel Mailing List


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Wed, 2009-01-21 at 12:42 +0100, Ingo Molnar wrote:
> 
> > So by the time you'd rearm, there's a lot of tasks with no proper 
> > locking state built up. We might be able to re-arm via 
> > stop_machine_run perhaps.
> 
> Won't work either, kstopmachine only preempts everybody. We'd require 
> something stronger.

Indeed - mutexes won't be covered.

> What we need is a point where there's guaranteed no locks held, for 
> regular tasks that would be a trip to userspace and back, but for kernel 
> tasks that's a bit harder -- does the freezer stuff guarantee this?

Yes, that might be doable.

Sounds very fragile though.

	Ingo


* Re: lockdep and debug objects together are broken?
  2009-01-21 11:54       ` Nick Piggin
@ 2009-01-21 11:57         ` Ingo Molnar
  0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2009-01-21 11:57 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Vegard Nossum, Thomas Gleixner, Peter Zijlstra,
	Linux Kernel Mailing List


* Nick Piggin <npiggin@suse.de> wrote:

> On Wed, Jan 21, 2009 at 12:42:29PM +0100, Ingo Molnar wrote:
> > 
> > * Nick Piggin <npiggin@suse.de> wrote:
> > 
> > > Is there a way to re-arm lockdep? That would be neat.
> > 
> > Not at the moment, and it looks somewhat complicated. All lock state 
> > freezes the moment lockdep disarms itself. That's very much a key design 
> > element: you will rarely see a real lockdep-inflicted crash. Even if 
> > lockdep has a bug, it disables itself and runs for the door very 
> > efficiently.
> 
> Lockdep isn't exactly for production systems though, is it? Say you want 
> to debug some problem, but other code (that you don't have the 
> knowledge to debug) is switching it off...
> 
> Also, I'd guess that most bugs in lockdep would probably fall pretty 
> neatly into either the "pretty harmless" or "completely take down the 
> system" categories ;)

I think lockdep could be expanded into production use via code patching 
techniques.

So in that sense the rearm bit could be useful - it would give us a 
lockdep variant that would run for the first 5 minutes of uptime (where 
90% of all lockdep reports trigger: lockdep maps the dependencies very 
quickly) - and could turn itself off after that, and patch out / disable 
its callbacks.

The memory footprint would still remain, but that is not nearly as much of 
a problem for production systems as runtime overhead.

	Ingo
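
The time-limited variant described above might look roughly like this in a
userspace model. A function-pointer swap stands in for real code patching,
which is a deliberate simplification, and all names (lock_hook,
on_lock_acquire(), tracking_tick(), the 300-second window constant) are
assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*lock_hook_fn)(int lock_id);

/* Length of the initial tracking window: the "first 5 minutes". */
#define TRACKING_WINDOW_SECS 300UL

static unsigned long tracked_events;

/* Active hook: stands in for the dependency-tracking work. */
static void hook_track(int lock_id)
{
	(void)lock_id;
	tracked_events++;
}

/* Disabled hook: what is left after the callbacks are "patched out". */
static void hook_nop(int lock_id)
{
	(void)lock_id;
}

static lock_hook_fn lock_hook = hook_track;

/* Called from every lock acquisition site in this model. */
void on_lock_acquire(int lock_id)
{
	assert(lock_hook != NULL);
	lock_hook(lock_id);
}

/* Called periodically with the current uptime in seconds. */
void tracking_tick(unsigned long uptime_secs)
{
	if (uptime_secs >= TRACKING_WINDOW_SECS)
		lock_hook = hook_nop;	/* switch tracking off for good */
}

unsigned long tracked_event_count(void)
{
	return tracked_events;
}
```

The counter and hook storage stay allocated after the switch, which mirrors
the point above that the memory footprint remains while the runtime
overhead goes away.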
