Date: Fri, 30 Jan 2009 15:06:20 +0100
From: Ingo Molnar
To: "Rafael J. Wysocki"
Cc: Frédéric Weisbecker, Steven Rostedt, Linus Torvalds, Maciej Rutecki,
    Linux Kernel Mailing List, Andrew Morton, Thomas Gleixner
Subject: Re: [Linux 2.6.29-rc2] BUG: using smp_processor_id() in preemptible
Message-ID: <20090130140620.GD17401@elte.hu>
References: <8db1092f0901170058k325dc6ddtddb42deea1ddd098@mail.gmail.com>
    <200901272218.39608.rjw@sisk.pl> <20090129150701.GE6512@elte.hu>
    <200901292329.59121.rjw@sisk.pl>
In-Reply-To: <200901292329.59121.rjw@sisk.pl>
X-Mailing-List: linux-kernel@vger.kernel.org

* Rafael J. Wysocki wrote:

> On Thursday 29 January 2009, Ingo Molnar wrote:
> >
> > * Rafael J. Wysocki wrote:
> >
> > > On Tuesday 27 January 2009, Ingo Molnar wrote:
> > > >
> > > > * Rafael J. Wysocki wrote:
> > > >
> > > > > > In fact whatever check you put in it's _always_ going to be
> > > > > > fundamentally more fragile than direct instrumentation: you cannot
> > > > > > possibly check all possible places that enable interrupts. (they could
> > > > > > be disabling interrupts as a _restore_irqs() sequence for example)
> > > > >
> > > > > In this particular case, I'm not really interested in that. What I'm
> > > > > interested in is which driver's ->suspend_late() or ->resume_early() (or
> > > > > the equivalents for sysdevs) has enabled interrupts, which is quite easy
> > > > > to check directly.
> > > >
> > > > But this is exactly what it does - without any need for debug checks
> > > > spread around!
> > > >
> > > > You'll get a _full stack dump_ from the very driver that is enabling
> > > > interrupts! You don't get a trace - you get a stack dump of the very place
> > > > that is buggy. It does not get any better than that.
> > >
> > > I'm not going to argue.
> > >
> > > Nevertheless, IMO something like the patch below should be sufficient to catch
> > > these bugs.
> > >
> > > Thanks,
> > > Rafael
> > >
> > >
> > > ---
> > >  drivers/base/power/main.c |   12 ++++++++++++
> > >  drivers/base/sys.c        |   21 ++++++++++++++++-----
> > >  include/linux/pm.h        |   18 ++++++++++++++++++
> > >  3 files changed, 46 insertions(+), 5 deletions(-)
> >
> > hm, so now you sprinkle debug checks all around the code, instead of
> > putting in a single pair of:
> >
> >    force_irqs_off_start();
> >    ...
> >    force_irqs_off_end();
>
> And what debug options exactly would that require to be set to work?

hm, if you worry about that aspect: we could make it seamlessly enabled
if PM_DEBUG is enabled.

> > which would catch everything that your checks would catch - and it
> > would catch more.
>
> Except that the checks trigger in specific places, so if a check
> triggers you know precisely where the bug happened regardless of what
> garbage is in the call trace.

This argument is a 100% mystery to me.

Do you really not see the quality difference between a stack trace
generated _right at the buggy piece of code_ and a warning later on that
might (or might not) trigger?

Especially considering that your approach won't catch such bugs:

   ...
   spin_unlock_irq();
   ...
   spin_lock_irq();
   ...

Or such bugs:

   local_irq_enable();
   ...
   local_irq_disable();

Or such bugs:

   spin_lock_irqsave(&lock1, flags);
   ...
     spin_lock_irqsave(&lock2, flags);
     ...
     spin_unlock_irq(&lock2); /* accidental bug */
   ...
   spin_unlock_irqrestore(&lock1, flags);

Such types of bugs might be especially hard to find in practice, if the
window where irqs are enabled is small.

There is no guarantee at all that accidental irq enabling survives a
critical section - it can be re-disabled in the normal flow of things
very easily.

And even if we are lucky and the irqs stay enabled by the time the
callback returns, what if your warning flags some big and complex driver,
one line of which is buggy?

If you had the choice, what would you prefer - a stack dump done at the
point of incident (pinpointing the driver, the subsystem and the buggy
function with its full callframe), or your "this driver is buggy" generic
warning with no specificity about where that bug might be?

Stacktraces _at the point of incident_, and a _guaranteed_ facility that
_enforces_ that irqs are off during the whole resume cycle are just about
the highest quality debug info and debug protection we can get in such
situations.

I really don't understand your points about this.

	Ingo
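
The difference between the two approaches can be sketched in plain userspace C. This is only an illustration under stated assumptions, not kernel code and not the patch discussed in the thread: the interrupt flag is a simulated boolean, the `sim_*` helpers, `buggy_resume_early()` and `check_after_callback()` are hypothetical names, and `force_irqs_off_start()`/`force_irqs_off_end()` are modelled loosely on the pair proposed above. Where a real facility would dump the stack, the sketch just prints the name of the offending function.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simulated CPU interrupt flag: true means interrupts are enabled. */
static bool irqs_enabled = true;

/* True while the (hypothetical) forced-off window is open. */
static bool force_off_active = false;

/* Number of violations caught at the very point irqs were enabled. */
static int violations_at_site = 0;

static void sim_local_irq_disable(void)
{
    irqs_enabled = false;
}

/* Direct instrumentation: the enable path itself reports the offender. */
static void sim_local_irq_enable(const char *site)
{
    if (force_off_active) {
        violations_at_site++;
        printf("BUG: irqs enabled inside forced-off window at %s\n", site);
    }
    irqs_enabled = true;
}

/* Open the window: irqs go off and any later enable is reported. */
static void force_irqs_off_start(void)
{
    sim_local_irq_disable();
    force_off_active = true;
}

/* Close the window: enabling irqs is legal again. */
static void force_irqs_off_end(void)
{
    force_off_active = false;
}

/* A buggy callback: opens a short irq-enabled window, then closes it. */
static int buggy_resume_early(void)
{
    sim_local_irq_enable(__func__);
    /* ... work that wrongly runs with irqs on ... */
    sim_local_irq_disable();
    return 0;
}

/*
 * After-the-fact check: run the callback, then look at the irq state.
 * Returns 1 if irqs are still enabled when the callback returns, else 0.
 */
static int check_after_callback(int (*cb)(void))
{
    cb();
    return irqs_enabled ? 1 : 0;
}
```

Calling `force_irqs_off_start()`, then `check_after_callback(buggy_resume_early)`, then `force_irqs_off_end()` shows the asymmetry: the after-the-fact check returns 0 because the callback re-disabled irqs before returning, while `violations_at_site` is incremented at the moment of the enable and names the function responsible.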