Date: Wed, 21 Jan 2009 12:57:44 +0100
From: Ingo Molnar
To: Nick Piggin
Cc: Vegard Nossum, Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List
Subject: Re: lockdep and debug objects together are broken?
Message-ID: <20090121115744.GB22054@elte.hu>
In-Reply-To: <20090121115438.GT24891@wotan.suse.de>

* Nick Piggin wrote:

> On Wed, Jan 21, 2009 at 12:42:29PM +0100, Ingo Molnar wrote:
> >
> > * Nick Piggin wrote:
> >
> > > On Tue, Jan 20, 2009 at 10:11:47PM +0100, Vegard Nossum wrote:
> > > > On Tue, Jan 20, 2009 at 9:55 AM, Nick Piggin wrote:
> > > > > Hi,
> > > > >
> > > > > I've had a problem frustrating my testing because lockdep was silently turning
> > > > > itself off...
> > > > > I patched out the code to disable lockdep after the first error,
> > > > > and it started showing up weird errors. kernel/fork.c:990 seemed to be the
> > > > > first to trigger (hard irqs disabled) from a call_usermodehelper call. Later,
> > > > > migration thread was reported to try to unlock rq->lock although it was
> > > > > holding no locks. Then init was reported to return to userspace without
> > > > > releasing an objectdebug hash lock.
> > > > >
> > > > > All that went away and everything seemed to work properly with debug objects
> > > > > configured out.
> > > > >
> > > > > I didn't get too far in trying to debug the problem. But it should be easy
> > > > > enough to reproduce (if not, I can post traces or test patches).
> > > >
> > > > I just built a kernel with lockdep and debugobjects enabled, and
> > > > everything seemed fine. I think you should post your kernel version,
> > > > config, and the lockdep patch (if needed -- it didn't seem to turn
> > > > itself off here).
> > >
> > > Are you sure? Ie. sysrq+D still works properly? In that case, you
> > > wouldn't need the lockdep patch because it just prevents the feature from being
> > > switched off.
> > >
> > > I'll have to dig a bit further, then. The annoying thing is that
> > > lockdep turns itself off at the drop of a hat (and this particular
> > > problem seems to happen without any backtraces), so it invalidates
> > > all your lockdep testing if you don't realise it has turned itself
> > > off.
> > >
> > > Is there a way to re-arm lockdep? That would be neat.
> >
> > Not at the moment, and it looks somewhat complicated. All lock state
> > freezes the moment lockdep disarms itself. That's very much a key design
> > element: rarely will you see any real lockdep-inflicted crash - even if it
> > has a bug, it is self-disabling and runs for the door very
> > efficiently.
>
> Lockdep isn't exactly for production systems though, is it?
> If you want to debug some problem but other code (that you don't have
> the knowledge to debug) is switching it off...
>
> Also, I'd guess that most bugs in lockdep would probably fall pretty
> neatly into either the "pretty harmless" or "completely take down the
> system" categories ;)

I think lockdep could be expanded into production use via code patching
techniques. So in that sense the rearm bit could be useful - it would give
us a lockdep variant that would run for the first 5 minutes of uptime
(where 90% of all lockdep reports trigger: lockdep maps the dependencies
very quickly) - and could turn itself off after that, and patch out /
disable its callbacks.

The memory footprint would still remain, but that is not nearly as much of
a problem for production systems as runtime overhead.

	Ingo
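
For illustration only, the "run for a window after boot, then disable the callbacks, with an optional re-arm" idea might be sketched in userspace C roughly as follows. Every name here (`validator_arm`, `validator_tick`, `traced_lock`, `hook_check`, `hook_nop`) is hypothetical and is not part of lockdep or any kernel API. A function pointer stands in for real code patching, so the indirect call still costs something that true call-site patching would avoid, and the time source is passed in by the caller rather than read from the system.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch, not the kernel's lockdep interface. */

typedef void (*lock_hook_fn)(const char *lock_name);

static unsigned long checks_run;  /* how many times the real checker ran */

/* Stand-in for the expensive dependency validation. */
static void hook_check(const char *lock_name)
{
    (void)lock_name;
    checks_run++;
}

/* What the hook points at once validation is disabled. */
static void hook_nop(const char *lock_name)
{
    (void)lock_name;
}

static lock_hook_fn lock_hook = hook_nop;
static unsigned long disarm_at;   /* uptime in seconds at which to disarm */

/* Arm (or re-arm) validation for window_secs seconds starting at now. */
static void validator_arm(unsigned long now, unsigned long window_secs)
{
    assert(window_secs > 0);
    disarm_at = now + window_secs;
    lock_hook = hook_check;
}

static bool validator_armed(void)
{
    return lock_hook == hook_check;
}

/* Meant to be called periodically; swaps in the no-op after the window. */
static void validator_tick(unsigned long now)
{
    if (validator_armed() && now >= disarm_at)
        lock_hook = hook_nop;
}

/* Every instrumented lock acquisition goes through the current hook. */
static void traced_lock(const char *lock_name)
{
    lock_hook(lock_name);
    /* the actual lock acquisition would follow here */
}
```

A caller would invoke `validator_arm(0, 300)` at boot for a 5 minute window, call `validator_tick()` from a periodic timer, and call `validator_arm()` again to re-arm later. As in the mail above, any state the checker has built up stays allocated after disarming; only the per-acquisition work goes away.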