From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753897AbZBCVMN (ORCPT ); Tue, 3 Feb 2009 16:12:13 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751988AbZBCVL5 (ORCPT ); Tue, 3 Feb 2009 16:11:57 -0500 Received: from gate.crashing.org ([63.228.1.57]:32880 "EHLO gate.crashing.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751508AbZBCVL5 (ORCPT ); Tue, 3 Feb 2009 16:11:57 -0500 Subject: Re: Reworking suspend-resume sequence (was: Re: PCI PM: Restore standard config registers of all devices early) From: Benjamin Herrenschmidt To: Linus Torvalds Cc: Jesse Barnes , "Rafael J. Wysocki" , Linux Kernel Mailing List , Andreas Schwab , Len Brown , Ingo Molnar In-Reply-To: References: <200901261904.n0QJ4Q9c016709@hera.kernel.org> <200902031804.26752.rjw@sisk.pl> <200902031032.26771.jesse.barnes@intel.com> Content-Type: text/plain Date: Wed, 04 Feb 2009 08:11:26 +1100 Message-Id: <1233695486.16867.95.camel@pasglop> Mime-Version: 1.0 X-Mailer: Evolution 2.24.3 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 2009-02-03 at 11:19 -0800, Linus Torvalds wrote: > > On Tue, 3 Feb 2009, Linus Torvalds wrote: > > > > Hmm. The _normal_ simple irq handler does this the way I described, but > > for some reason the "handle_edge_irq()" does not. And the reason is > > actually a buglet: it needs to mask things for the "recursive interrupt" > > case. > > Btw, just to clarify: none of this happens at the actual "irq_disable()" > time: it only happens if you get an interrupt _while_ it's disabled. Which > obviously shouldn't happen in the shutdown/wakeup path anyway for MSI, > since the interrupts aren't shared, but it would be good to just be extra > safe. > > I do suspect we could/should just get rid of the msi masking entirely, but > that may be too scary a step. I agree :-) It's been a source of problem anyway, I remember hearing some device reacting strangely (aka, losing MSIs) when masked, and it's not even implemented by all devices (somebody had the "smart" idea of making an optional feature). So it ends up being a lot of not very useful code... > For the current suspend/resume situation, maybe it's enough to know that > it shouldn't be happening anyway, and even if it _does_ happen on a device > that has been shut down, it's just not going to do anything. Sure, it's > doing that "writel/readl", but if it gets lost, who really cares? Nobody. Note that I still believe that life would be simpler to just keep the current local_irq_save() and just tweak might_sleep() etc... so that ACPI is safe to call. At least it can be done orthogonally to this interrupt change. IE. As I said earlier, that interrupt masking -will- change the exposed semantics of suspend_late() to drivers. In fact, iirc, it's you who advertised initially suspend_late() as being the 'easy' way to write a PCI driver suspend routine specifically because you don't have to bother about being re-entered from anywhere for things like request processing etc... With the change, the kernel is essentially still operating, timers are ticking, we may even be scheduling, so this important assumption is gone. That means a lot more chances for driver to screw up. Cheers, Ben.