From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Rafael J. Wysocki" Subject: Re: Async resume patch (was: Re: [GIT PULL] PM updates for 2.6.33) Date: Tue, 8 Dec 2009 23:03:12 +0100 Message-ID: <200912082303.12758.rjw@sisk.pl> References: <200912082240.42773.rjw@sisk.pl> Mime-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <200912082240.42773.rjw@sisk.pl> Sender: linux-kernel-owner@vger.kernel.org To: Linus Torvalds Cc: Alan Stern , Zhang Rui , LKML , ACPI Devel Maling List , pm list List-Id: linux-acpi@vger.kernel.org On Tuesday 08 December 2009, Rafael J. Wysocki wrote: > On Tuesday 08 December 2009, Linus Torvalds wrote: > > > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote: > > > > > > Anyway, if we use an rwsem, it won't be checkable from interrupt context just > > > as well. > > > > You can't do a lock() from an interrupt, but the unlocks should be > > irq-safe. > > > > > Suppose we use rwsem and during suspend each child uses a down_read() on a > > > parent and then the parent uses down_write() on itself. What if, whatever the > > > reason, the parent is a bit early and does the down_write() before one of the > > > children has a chance to do the down_read()? Aren't we toast? > > > > We're toast, but we're toast for a totally unrealted reason: it means that > > you tried to resume a child before a parent, which would be a major bug to > > begin with. > > > > Look, I even wrote out the comments, so let me repeat the code one more > > time. > > > > - suspend time calling: > > // This won't block, because we suspend nodes before parents > > down_read(node->parent->lock); > > // Do the part that may block asynchronously > > async_schedule(do_usb_node_suspend, node); > > > > - resume time calling: > > // This won't block, because we resume parents before children, > > // and the children will take the read lock. > > down_write(leaf->lock); > > // Do the blocking part asynchronously > > async_schedule(usb_node_resume, leaf); > > > > See? So when we take the parent lock for suspend, we are guaranteed to do > > so _before_ the parent node itself suspends. And conversely, when we take > > the parent lock (asynchronously) for resume, we're guaranteed to do that > > _after_ the parent node has done its own down_write. > > > > And that all depends on just one trivial thing; that the suspend and > > resume is called in the right order (children first vs parent first > > respectively). And that is such a _major_ correctness issue that if that > > isn't correct, your suspend isn't going to work _anyway_. > > Understood (I think). > > Let's try it, then. Below is the resume patch based on my previous one in this > thread (I have only verified that it builds). Ah, I need to check if dev->parent is not NULL before trying to lock it, but apart from this it doesn't break things at least. Rafael