* lockdep and debug objects together are broken?
From: Nick Piggin @ 2009-01-20 8:55 UTC
To: Ingo Molnar, Thomas Gleixner, Vegard Nossum, Peter Zijlstra, Linux Kernel Mailing List

Hi,

I've had a problem frustrating my testing because lockdep was silently turning
itself off... I patched out the code to disable lockdep after the first error,
and it started showing up weird errors. kernel/fork.c:990 seemed to be the
first to trigger (hard irqs disabled) from a call_usermodehelper call. Later,
migration thread was reported to try to unlock rq->lock although it was
holding no locks. Then init was reported to return to userspace without
releasing an objectdebug hash lock.

All that went away and everything seemed to work properly with debug objects
configured out.

I didn't get too far in trying to debug the problem. But it should be easy
enough to reproduce (if not, I can post traces or test patches).

Thanks,
Nick
* Re: lockdep and debug objects together are broken?
From: Vegard Nossum @ 2009-01-20 21:11 UTC
To: Nick Piggin
Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List

On Tue, Jan 20, 2009 at 9:55 AM, Nick Piggin <npiggin@suse.de> wrote:
> Hi,
>
> I've had a problem frustrating my testing because lockdep was silently turning
> itself off... I patched out the code to disable lockdep after the first error,
> and it started showing up weird errors. kernel/fork.c:990 seemed to be the
> first to trigger (hard irqs disabled) from a call_usermodehelper call. Later,
> migration thread was reported to try to unlock rq->lock although it was
> holding no locks. Then init was reported to return to userspace without
> releasing an objectdebug hash lock.
>
> All that went away and everything seemed to work properly with debug objects
> configured out.
>
> I didn't get too far in trying to debug the problem. But it should be easy
> enough to reproduce (if not, I can post traces or test patches).

I just built a kernel with lockdep and debugobjects enabled, and
everything seemed fine. I think you should post your kernel version,
config, and the lockdep patch (if needed -- it didn't seem to turn
itself off here).

Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
* Re: lockdep and debug objects together are broken?
From: Nick Piggin @ 2009-01-21 7:19 UTC
To: Vegard Nossum
Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List

On Tue, Jan 20, 2009 at 10:11:47PM +0100, Vegard Nossum wrote:
> On Tue, Jan 20, 2009 at 9:55 AM, Nick Piggin <npiggin@suse.de> wrote:
> > Hi,
> >
> > I've had a problem frustrating my testing because lockdep was silently turning
> > itself off... I patched out the code to disable lockdep after the first error,
> > and it started showing up weird errors. kernel/fork.c:990 seemed to be the
> > first to trigger (hard irqs disabled) from a call_usermodehelper call. Later,
> > migration thread was reported to try to unlock rq->lock although it was
> > holding no locks. Then init was reported to return to userspace without
> > releasing an objectdebug hash lock.
> >
> > All that went away and everything seemed to work properly with debug objects
> > configured out.
> >
> > I didn't get too far in trying to debug the problem. But it should be easy
> > enough to reproduce (if not, I can post traces or test patches).
>
> I just built a kernel with lockdep and debugobjects enabled, and
> everything seemed fine. I think you should post your kernel version,
> config, and the lockdep patch (if needed -- it didn't seem to turn
> itself off here).

Are you sure? I.e. sysrq+D still works properly? In that case, you
wouldn't need the lockdep patch because it just prevents the feature from
being switched off.

I'll have to dig a bit further, then. The annoying thing is that
lockdep turns itself off at the drop of a hat (and this particular
problem seems to happen without any backtraces), so it invalidates
all your lockdep testing if you don't realise it has turned itself
off.

Is there a way to re-arm lockdep? That would be neat.
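One way to notice that lockdep has silently disarmed is to poll its state from userspace after a test run. The sketch below is only an illustration: it assumes the kernel exposes a `debug_locks:` line in a stats file such as `/proc/lockdep_stats`, with 1 meaning armed and 0 meaning disarmed. The file path and line format are assumptions that should be checked against the running kernel, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the number on the "debug_locks:" line of lockdep_stats-style
 * text, or -1 if no such line is found. The line format is an assumption.
 */
int parse_debug_locks(const char *text)
{
    const char *key = "debug_locks:";
    const char *p;
    int value;

    assert(text != NULL);
    p = strstr(text, key);
    if (p == NULL)
        return -1;
    if (sscanf(p + strlen(key), "%d", &value) != 1)
        return -1;
    return value;
}

/*
 * Read a stats file (the caller supplies the path) and return the parsed
 * value, or -1 if the file cannot be opened or has no debug_locks line.
 */
int lockdep_still_armed(const char *path)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_debug_locks(buf);
}
```

A test harness could call `lockdep_still_armed("/proc/lockdep_stats")` before and after each run and treat a change from 1 to 0 as "results from this run are not trustworthy".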
* Re: lockdep and debug objects together are broken?
From: Ingo Molnar @ 2009-01-21 11:42 UTC
To: Nick Piggin
Cc: Vegard Nossum, Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List

* Nick Piggin <npiggin@suse.de> wrote:

> On Tue, Jan 20, 2009 at 10:11:47PM +0100, Vegard Nossum wrote:
> > On Tue, Jan 20, 2009 at 9:55 AM, Nick Piggin <npiggin@suse.de> wrote:
> > > Hi,
> > >
> > > I've had a problem frustrating my testing because lockdep was silently turning
> > > itself off... I patched out the code to disable lockdep after the first error,
> > > and it started showing up weird errors. kernel/fork.c:990 seemed to be the
> > > first to trigger (hard irqs disabled) from a call_usermodehelper call. Later,
> > > migration thread was reported to try to unlock rq->lock although it was
> > > holding no locks. Then init was reported to return to userspace without
> > > releasing an objectdebug hash lock.
> > >
> > > All that went away and everything seemed to work properly with debug objects
> > > configured out.
> > >
> > > I didn't get too far in trying to debug the problem. But it should be easy
> > > enough to reproduce (if not, I can post traces or test patches).
> >
> > I just built a kernel with lockdep and debugobjects enabled, and
> > everything seemed fine. I think you should post your kernel version,
> > config, and the lockdep patch (if needed -- it didn't seem to turn
> > itself off here).
>
> Are you sure? Ie. sysrq+D a still works properly? In that case, you
> wouldn't need the lockdep patch because it just prevents the feature from being
> switched off.
>
> I'll have to dig a bit further, then. The annoying thing is that
> lockdep turns itself off at the drop of a hat (and this particular
> problem seems to happen without any backtraces), so it invalidates
> all your lockdep testing if you don't realise it has turned itself
> off.
>
> Is there a way to re-arm lockdep? That would be neat.

Not at the moment, and it looks somewhat complicated. All lock state
freezes the moment lockdep disarms itself. That's very much a key design
element: rarely will you see any real lockdep-inflicted crash - even if it
has a bug, it disables itself and runs for the door very efficiently.

So by the time you'd rearm, there are a lot of tasks with no proper locking
state built up. We might be able to re-arm via stop_machine_run perhaps.

	Ingo
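The self-disabling behaviour described above can be pictured as a one-shot switch: the first hook that sees an inconsistency flips a global flag, and every later hook returns immediately without touching any state. This is a minimal userspace sketch of that pattern using C11 atomics. The names `checker_enabled`, `checker_off` and `checker_hook` are hypothetical and are not lockdep's own code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int checker_enabled = 1;  /* 1 = armed, 0 = disarmed */
static int violations_reported;

/* Disarm the checker; returns 1 only for the caller that actually did it. */
int checker_off(void)
{
    return atomic_exchange(&checker_enabled, 0);
}

/* Called on every lock operation; 'ok' is the result of a consistency check. */
void checker_hook(int ok)
{
    if (!atomic_load(&checker_enabled))
        return;                     /* disarmed: state stays frozen */
    if (!ok && checker_off()) {
        violations_reported++;      /* report once, then stay silent */
        fprintf(stderr, "checker: inconsistency found, turning off\n");
    }
}
```

Because the exchange hands the old value to exactly one caller, only one report is printed even if several hooks fail at the same time. The cost is the one raised in this thread: every check after the first failure is silently skipped.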
* Re: lockdep and debug objects together are broken?
From: Peter Zijlstra @ 2009-01-21 11:50 UTC
To: Ingo Molnar
Cc: Nick Piggin, Vegard Nossum, Thomas Gleixner, Linux Kernel Mailing List

On Wed, 2009-01-21 at 12:42 +0100, Ingo Molnar wrote:

> So by the time you'd rearm, there's a lot of tasks with no proper locking
> state built up. We might be able to re-arm via stop_machine_run perhaps.

Won't work either, kstopmachine only preempts everybody. We'd require
something stronger.

What we need is a point where there are guaranteed to be no locks held.
For regular tasks that would be a trip to userspace and back, but for
kernel tasks that's a bit harder -- does the freezer stuff guarantee this?

Supposing we have such a point for all tasks, what you then do is wipe
all lock state and rig a trigger to start tracking lock state once you
have passed through the point.
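The re-arm scheme proposed above (wipe all lock state, then resume tracking per task only after it passes a point where it holds no locks) can be sketched as a small userspace model. Everything here is hypothetical and only illustrates the idea: `struct task`, `rearm_all` and `quiescent_point` are made-up names, and a real implementation would have to find such a point for kernel threads as well.

```c
#include <assert.h>
#include <stdbool.h>

#define NTASKS 4

struct task {
    int  held;      /* locks the checker believes this task holds */
    bool tracking;  /* false until the task passes a quiescent point */
    int  errors;    /* "unlock while holding no locks" reports */
};

static struct task tasks[NTASKS];

/* Re-arm: wipe all lock state and stop tracking every task. */
void rearm_all(void)
{
    for (int i = 0; i < NTASKS; i++) {
        tasks[i].held = 0;
        tasks[i].tracking = false;
    }
}

/* The task reached a point where it provably holds no locks. */
void quiescent_point(struct task *t)
{
    assert(t->held == 0);
    t->tracking = true;
}

void lock_acquired(struct task *t)
{
    if (t->tracking)
        t->held++;
}

void lock_released(struct task *t)
{
    if (!t->tracking)
        return;          /* lock may predate the re-arm: not an error */
    if (t->held == 0)
        t->errors++;     /* genuine unlock without a matching lock */
    else
        t->held--;
}
```

The point of the per-task trigger is visible in `lock_released`: a task that took a lock before the re-arm can release it afterwards without producing a bogus "unlock while holding no locks" report, because it is not tracked until it has passed its quiescent point.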
* Re: lockdep and debug objects together are broken?
From: Ingo Molnar @ 2009-01-21 11:54 UTC
To: Peter Zijlstra
Cc: Nick Piggin, Vegard Nossum, Thomas Gleixner, Linux Kernel Mailing List

* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Wed, 2009-01-21 at 12:42 +0100, Ingo Molnar wrote:
>
> > So by the time you'd rearm, there's a lot of tasks with no proper
> > locking state built up. We might be able to re-arm via
> > stop_machine_run perhaps.
>
> Won't work either, kstopmachine only preempts everybody. We'd require
> something stronger.

Indeed - mutexes won't be covered.

> What we need is a point where there's guaranteed no locks held, for
> regular tasks that would be a trip to userspace and back, but for kernel
> tasks that's a bit harder -- does the freezer stuff guarantee this?

Yes, that might be doable. Sounds very fragile though.

	Ingo
* Re: lockdep and debug objects together are broken?
From: Nick Piggin @ 2009-01-21 11:54 UTC
To: Ingo Molnar
Cc: Vegard Nossum, Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List

On Wed, Jan 21, 2009 at 12:42:29PM +0100, Ingo Molnar wrote:
>
> * Nick Piggin <npiggin@suse.de> wrote:
>
> > On Tue, Jan 20, 2009 at 10:11:47PM +0100, Vegard Nossum wrote:
> > > On Tue, Jan 20, 2009 at 9:55 AM, Nick Piggin <npiggin@suse.de> wrote:
> > > > Hi,
> > > >
> > > > I've had a problem frustrating my testing because lockdep was silently turning
> > > > itself off... I patched out the code to disable lockdep after the first error,
> > > > and it started showing up weird errors. kernel/fork.c:990 seemed to be the
> > > > first to trigger (hard irqs disabled) from a call_usermodehelper call. Later,
> > > > migration thread was reported to try to unlock rq->lock although it was
> > > > holding no locks. Then init was reported to return to userspace without
> > > > releasing an objectdebug hash lock.
> > > >
> > > > All that went away and everything seemed to work properly with debug objects
> > > > configured out.
> > > >
> > > > I didn't get too far in trying to debug the problem. But it should be easy
> > > > enough to reproduce (if not, I can post traces or test patches).
> > >
> > > I just built a kernel with lockdep and debugobjects enabled, and
> > > everything seemed fine. I think you should post your kernel version,
> > > config, and the lockdep patch (if needed -- it didn't seem to turn
> > > itself off here).
> >
> > Are you sure? Ie. sysrq+D a still works properly? In that case, you
> > wouldn't need the lockdep patch because it just prevents the feature from being
> > switched off.
> >
> > I'll have to dig a bit further, then. The annoying thing is that
> > lockdep turns itself off at the drop of a hat (and this particular
> > problem seems to happen without any backtraces), so it invalidates
> > all your lockdep testing if you don't realise it has turned itself
> > off.
> >
> > Is there a way to re-arm lockdep? That would be neat.
>
> Not at the moment, and it looks somewhat complicated. All lock state
> freezes the moment lockdep disarms itself. That's very much a key design
> element: rarely will you see any real lockdep-inflicted crash - even if it
> has a bug it is self-disabling itself and running for the door very
> efficiently.

Lockdep isn't exactly for production systems though, is it? If you want
to debug some problem but you have other code (that you don't have the
knowledge to debug) that is switching it off...

Also, I'd guess that most bugs in lockdep would probably fall pretty
neatly into either the "pretty harmless" or "completely take down the
system" categories ;)
* Re: lockdep and debug objects together are broken?
From: Ingo Molnar @ 2009-01-21 11:57 UTC
To: Nick Piggin
Cc: Vegard Nossum, Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List

* Nick Piggin <npiggin@suse.de> wrote:

> On Wed, Jan 21, 2009 at 12:42:29PM +0100, Ingo Molnar wrote:
> >
> > * Nick Piggin <npiggin@suse.de> wrote:
> >
> > > On Tue, Jan 20, 2009 at 10:11:47PM +0100, Vegard Nossum wrote:
> > > > On Tue, Jan 20, 2009 at 9:55 AM, Nick Piggin <npiggin@suse.de> wrote:
> > > > > Hi,
> > > > >
> > > > > I've had a problem frustrating my testing because lockdep was silently turning
> > > > > itself off... I patched out the code to disable lockdep after the first error,
> > > > > and it started showing up weird errors. kernel/fork.c:990 seemed to be the
> > > > > first to trigger (hard irqs disabled) from a call_usermodehelper call. Later,
> > > > > migration thread was reported to try to unlock rq->lock although it was
> > > > > holding no locks. Then init was reported to return to userspace without
> > > > > releasing an objectdebug hash lock.
> > > > >
> > > > > All that went away and everything seemed to work properly with debug objects
> > > > > configured out.
> > > > >
> > > > > I didn't get too far in trying to debug the problem. But it should be easy
> > > > > enough to reproduce (if not, I can post traces or test patches).
> > > >
> > > > I just built a kernel with lockdep and debugobjects enabled, and
> > > > everything seemed fine. I think you should post your kernel version,
> > > > config, and the lockdep patch (if needed -- it didn't seem to turn
> > > > itself off here).
> > >
> > > Are you sure? Ie. sysrq+D a still works properly? In that case, you
> > > wouldn't need the lockdep patch because it just prevents the feature from being
> > > switched off.
> > >
> > > I'll have to dig a bit further, then. The annoying thing is that
> > > lockdep turns itself off at the drop of a hat (and this particular
> > > problem seems to happen without any backtraces), so it invalidates
> > > all your lockdep testing if you don't realise it has turned itself
> > > off.
> > >
> > > Is there a way to re-arm lockdep? That would be neat.
> >
> > Not at the moment, and it looks somewhat complicated. All lock state
> > freezes the moment lockdep disarms itself. That's very much a key design
> > element: rarely will you see any real lockdep-inflicted crash - even if it
> > has a bug it is self-disabling itself and running for the door very
> > efficiently.
>
> Lockdep isn't exactly for production systems though, is it? If you want
> to debug some problem but you have other code (that you don't have
> knowledge to debug) is switching it off...
>
> Also, I'd guess that most bugs in lockdep would probably fall pretty
> neatly into either the "pretty harmless" or "completely take down the
> system" categories ;)

I think lockdep could be expanded into production use via code patching
techniques. So in that sense the rearm bit could be useful - it would give
us a lockdep variant that would run for the first 5 minutes of uptime
(where 90% of all lockdep reports trigger: lockdep maps the dependencies
very quickly) - and could turn itself off after that, and patch out /
disable its callbacks.

The memory footprint would still remain, but that is not nearly as much of
a problem for production systems as runtime overhead.

	Ingo
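The time-limited variant suggested above can be modelled in userspace by routing every lock event through a hook that is swapped for a no-op once the warm-up window has passed. This sketch substitutes a function-pointer swap for real code patching, which it does not attempt. All names, and the choice to drive the cut-off from a periodic tick, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define WARMUP_SECONDS 300  /* the "first 5 minutes of uptime" */
#define MAX_DEPS 64

/* Recorded (held, acquired) pairs; this table stays allocated afterwards. */
static int deps[MAX_DEPS][2];
static size_t ndeps;

static void hook_record(int held, int acquired)
{
    if (ndeps < MAX_DEPS) {
        deps[ndeps][0] = held;
        deps[ndeps][1] = acquired;
        ndeps++;
    }
}

static void hook_noop(int held, int acquired)
{
    (void)held;
    (void)acquired;
}

/* Current hook; swapping this pointer stands in for patching the call site. */
static void (*lock_hook)(int held, int acquired) = hook_record;

/* Called periodically with the current uptime in seconds. */
void checker_tick(unsigned long uptime)
{
    if (uptime >= WARMUP_SECONDS)
        lock_hook = hook_noop;
}

/* What a lock-acquire path would call. */
void on_lock_acquire(int held, int acquired)
{
    lock_hook(held, acquired);
}
```

After the window closes, the per-event cost drops to one indirect call to an empty function while the `deps` table remains in memory, which mirrors the trade-off described above: the footprint stays and the runtime overhead mostly goes away.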