From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>,
Linus Torvalds <torvalds@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Jeremy Fitzhardinge <jeremy@goop.org>,
pm list <linux-pm@lists.linux-foundation.org>,
Len Brown <lenb@kernel.org>,
Jesse Barnes <jbarnes@virtuousgeek.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
Date: Mon, 23 Feb 2009 12:03:07 +0100 [thread overview]
Message-ID: <200902231203.09082.rjw@sisk.pl> (raw)
In-Reply-To: <m1bpst77te.fsf@fess.ebiederm.org>
On Monday 23 February 2009, Eric W. Biederman wrote:
> Ingo Molnar <mingo@elte.hu> writes:
>
> > * Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> >> Ingo Molnar <mingo@elte.hu> writes:
> >>
> >> > I think this aspect has been well-understood during the
> >> > discussion of this topic and it's just a slightly misleading
> >> > changelog.
> >>
> >> As I was a member of that discussion I did not see that.
> >>
> >> It took me several passes through the patches to realize the
> >> goal is to allow drivers to be able to sleep while they are in
> >> their late pm shutdown routines.
> >>
> >> Why we want this I don't know. But it seems simple enough to
> >> implement, and it makes it harder to get the late pm suspend
> >> routines wrong, which is always good.
> >
> > That's not the only goal. The other goal is to further shrink a
> > particular window of suspend fragility: the irqs-disabled stage
> > of the suspend/resume sequence.
> >
> > Since suspend/resume is a mini-reboot sequence, there's a large
> > amount of code executed - and the variety of code is large as
> > well. We had repeat cases of random drivers re-enabling
> > interrupts and thus breaking other drivers - and these are nasty
> > to debug.
> >
> > So this patchset disables device IRQs centrally and serializes
> > with pending work - so there's no races with pending IRQs
> > anymore.
> >
> > The fact that we keep the timer irq running is two-fold: firstly
> > the timer code is special and not really part of the regular
> > suspend/resume sequence.
> >
> > Drivers want to take timestamps, sometimes they even want to do
> > a small usleep(), etc. Ideally the suspend/resume code is pretty
> > much _the same_ as a regular bootup (and shutdown) code - so we
> > want to provide a similar environment to how drivers initialize
> > and deinitialize, and we want to enable them to share code
> > between bootup/shutdown and suspend/resume agressively.
> >
> > So the more generic kernel environment we give these fragile
> > handlers, the better we are off in the end. Since we already had
> > IRQS_TIMER, that was just the natural thing to do.
>
> I am all for sharing code, especially if we can factor if
> we can find common factors that do the same thing.
>
> I don't know how many times I have found drivers doing something
> weird in their shutdown routines that they don't know how
> to get the device out of. The e1000 driver has shown up several
> times because it likes to suspend the device on shutdown.
>
> The fact that the methods exposed to drivers were only defined
> to be usable on the s2ram/hibernate path is something I have
> brought up on more than one occasion as a bad choice.
>
> I'm really not convinced that the rational for separating
> out the shutdown methods from the remove methods has
> been very good. That of we don't need to clean up the in-kernel
> data structures on reboot so why do something extra that can
> introduce instability.
>
> So having been watching a smaller form of this drama on the
> reboot path for several years. Having had a device method
> with fixed semantics, and not the dwm sematics of the historical
> suspend routing. I expect there is still a ways to go before
> it is simple and easy for drivers to figure out what they need
> to implement out of the confusing variety of possible device
> methods.
>
> >> > The new suspend code does not rely on truly disabling IRQs
> >> > on the low level. The purpose is to not get IRQs to drivers
> >> > - which might crash/hang/race/misbehave.
> >>
> >> Reasonable. I expect one of the problems with drivers getting
> >> it wrong is that the interface is too complex for mortal
> >> humans to understand.
> >
> > The suspend/resume state machine certainly used to be a piece of
> > code that makes a seasoned kernel developer weep in fear.
> >
> > That has changed drastically in the past few months. The
> > suspend+hibernation logic got unified (at least as far as driver
> > methods go), and all the flow and ordering has been cleaned up
> > and has been made more robust.
>
> I will have to look again. My impression is that overloading
> a single method is part of what got us into this mess in the
> first place.
>
> No that I don't see things getting better.
>
> > What makes s2ram fragile is not human failure but the
> > combination of a handful of physical property:
> >
> > 1) Psychology: shutting the lid or pushing the suspend button is
> > a deceivingly 'simple' action to the user. But under the
> > hood, a ton of stuff happens: we deinitialize a lot of
> > things, we go through _all hardware state_, and we do so in a
> > serial fashion. If just one piece fails to do the right
> > thing, the box might not resume. Still, the user expects this
> > 'simple' thing to just work, all the time. No excuses
> > accepted.
> >
> > 2) Length of code: To get a successful s2ram sequence the kernel
> > runs through tens of thousands of lines of code. Code which
> > never gets executed on a normal box - only if we s2ram. If
> > just one step fails, we get a hung box.
> >
> > 3) Debuggability: a lot of s2ram code runs with the console off,
> > making any bugs hard to debug. Furthermore we have no
> > meaningful persistent storage either for kernel bug messages.
> > The RTC trick of PM_DEBUG works but is a very narrow channel
> > of information and it takes a lot of time to debug a bug via
> > that method.
>
> Yep that is an issue.
>
> > The combination of these factors really makes up for a perfect
> > storm in terms of kernel technology: we have this
> > very-deceivingly-simple-looking but complex-and-rarely-executed
> > piece of code, which is very hard to debug.
>
> And much of this as you are finding with this piece of code
> is how the software was designed rather then how the software
> needed to be.
>
> > Even just one of these factors would be enough to make an
> > otherwise healthy subsystem fragile - no wonder s2ram has been a
> > problem ever since it existed in the upstream kernel.
> >
> > So now we need just one thing: patience and more of the same
> > good stuff that happened lately.
>
> I think there has been some good progress, and so I am happy
> to be patient. I will still mention on occasion what it
> seems we are doing wrong. Unfortunately I don't have time
> to do a lot more than that.
>
> >> > Still, it might make sense to not just use the ->disable
> >> > sequence but primarily the ->shutdown irqchip method (when
> >> > it's available in the irqchip).
> >>
> >> Disable seems fine to me. This is interesting in the context
> >> of all of the irqs that will when masked show up somewhere
> >> else (think boot interrupts).
> >>
> >> > While we obviously cannot turn off the PIC that delivers
> >> > timer IRQs at this stage - there's no theoretical reason why
> >> > the suspend sequence couldnt power down some secondary PICs
> >> > as well - in some arch code, or maybe even in the generic
> >> > driver suspend sequence if the device tree is structured
> >> > carefully enough so that the PIC gets turned off last.
> >>
> >> If the point is simply to prevent deliver of irqs to the
> >> drivers I don't see the point of anything more than what the
> >> patch does now.
> >
> > ... except for the usecase i described above. Say some PIC sits
> > on a piece of silicon which gets turned off. I'm not talking
> > about x86 but some custom device. We really dont want that IRQ
> > line to send half of an IRQ message (un-ACK-ed) when it gets
> > turned off. So physically 'suspending' all IRQ lines does make a
> > certain level of long-term sense.
>
> Good point. We will loose both level and edge triggered events
> that occur between suspending the irqs and restoring them but
> that is inevitable. So we might as well call shutdown and totally
> turn off the irqs if we can.
>
> I don't know where in the state machine this is getting called but
> I would suggest doing this before we shutdown cpus.
This is the plan. In fact, I'm going to do this in the next patch after the
$subject one has been tested and found acceptable.
Thanks,
Rafael
next prev parent reply other threads:[~2009-02-23 11:04 UTC|newest]
Thread overview: 187+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-02-22 17:37 [RFC][PATCH 0/2] Rework disabling of interrupts during suspend-resume Rafael J. Wysocki
2009-02-22 17:38 ` [RFC][PATCH 1/2] PM: Split up sysdev_[suspend|resume] from device_power_[down|up] Rafael J. Wysocki
2009-02-22 20:56 ` Adrian Bunk
2009-02-22 21:07 ` Linus Torvalds
2009-02-22 21:12 ` Ingo Molnar
2009-02-22 22:42 ` Adrian Bunk
2009-03-05 16:54 ` Pavel Machek
2009-02-22 17:39 ` [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume Rafael J. Wysocki
2009-02-22 18:01 ` Linus Torvalds
2009-02-22 22:42 ` Rafael J. Wysocki
2009-02-22 23:48 ` Rafael J. Wysocki
2009-02-23 0:05 ` Linus Torvalds
2009-02-23 1:23 ` Linus Torvalds
2009-02-23 10:52 ` Rafael J. Wysocki
2009-02-23 3:04 ` Eric W. Biederman
2009-02-23 8:44 ` Ingo Molnar
2009-02-23 9:22 ` Eric W. Biederman
2009-02-23 9:44 ` Ingo Molnar
2009-02-23 10:42 ` Eric W. Biederman
2009-02-23 11:03 ` Rafael J. Wysocki [this message]
2009-02-23 15:28 ` Eric W. Biederman
2009-02-23 21:39 ` Rafael J. Wysocki
2009-02-24 3:30 ` Eric W. Biederman
2009-02-24 22:42 ` Rafael J. Wysocki
2009-02-24 22:51 ` Linus Torvalds
2009-02-24 23:07 ` Rafael J. Wysocki
2009-02-24 23:09 ` Ingo Molnar
2009-02-24 23:29 ` Rafael J. Wysocki
2009-02-25 13:23 ` Ingo Molnar
2009-02-26 1:17 ` Arve Hjønnevåg
2009-02-26 1:27 ` Linus Torvalds
2009-02-26 2:13 ` Arve Hjønnevåg
2009-02-26 2:51 ` Linus Torvalds
2009-02-26 3:00 ` Ingo Molnar
2009-02-26 3:31 ` Arve Hjønnevåg
2009-02-26 3:37 ` Linus Torvalds
2009-02-26 3:50 ` Arve Hjønnevåg
2009-02-26 3:57 ` Linus Torvalds
2009-02-26 4:13 ` Arve Hjønnevåg
2009-02-26 4:20 ` Eric W. Biederman
2009-02-26 4:24 ` Arve Hjønnevåg
2009-02-26 9:50 ` Rafael J. Wysocki
2009-02-26 20:34 ` Arve Hjønnevåg
2009-02-26 20:57 ` Benjamin Herrenschmidt
2009-02-26 21:20 ` Arve Hjønnevåg
2009-02-26 21:49 ` Benjamin Herrenschmidt
2009-02-26 21:58 ` Rafael J. Wysocki
2009-02-26 22:10 ` Linus Torvalds
2009-02-26 22:30 ` Arve Hjønnevåg
2009-02-26 23:10 ` Rafael J. Wysocki
2009-02-27 0:00 ` Arve Hjønnevåg
2009-02-27 0:27 ` Linus Torvalds
2009-02-27 3:20 ` [linux-pm] " Alan Stern
2009-02-27 4:43 ` Linus Torvalds
2009-02-27 14:59 ` Alan Stern
2009-02-27 20:30 ` Linus Torvalds
2009-02-28 3:54 ` Arve Hjønnevåg
2009-02-28 10:06 ` Rafael J. Wysocki
2009-02-28 17:03 ` Linus Torvalds
2009-02-28 22:15 ` Arve Hjønnevåg
2009-02-26 22:30 ` Rafael J. Wysocki
2009-02-25 4:16 ` Eric W. Biederman
2009-02-25 4:26 ` Linus Torvalds
2009-02-25 4:59 ` Eric W. Biederman
2009-02-25 15:32 ` [linux-pm] " Alan Stern
2009-02-25 16:19 ` Linus Torvalds
2009-02-23 11:04 ` Ingo Molnar
2009-02-23 14:45 ` Rafael J. Wysocki
2009-02-23 15:06 ` Ingo Molnar
2009-02-23 21:59 ` Rafael J. Wysocki
2009-02-23 10:13 ` Benjamin Herrenschmidt
2009-02-23 8:36 ` Ingo Molnar
2009-02-23 11:29 ` Rafael J. Wysocki
2009-02-23 12:28 ` Ingo Molnar
2009-02-23 14:48 ` Rafael J. Wysocki
2009-02-23 20:49 ` Benjamin Herrenschmidt
2009-02-23 12:45 ` Ingo Molnar
2009-02-23 15:07 ` Rafael J. Wysocki
2009-02-23 15:52 ` Johannes Berg
2009-02-23 17:16 ` Ingo Molnar
2009-02-23 17:28 ` Linus Torvalds
2009-02-23 22:11 ` Rafael J. Wysocki
2009-02-23 22:11 ` Arve Hjønnevåg
2009-02-23 22:23 ` Rafael J. Wysocki
2009-02-23 22:44 ` Arve Hjønnevåg
2009-02-22 18:13 ` [RFC][PATCH 0/2] Rework disabling " Linus Torvalds
2009-02-22 18:18 ` Ingo Molnar
2009-02-22 18:25 ` Linus Torvalds
2009-02-22 18:35 ` Linus Torvalds
2009-02-22 22:37 ` Eric W. Biederman
2009-02-22 22:56 ` Benjamin Herrenschmidt
2009-02-22 23:02 ` Linus Torvalds
2009-03-01 22:21 ` [RFC][PATCH 0/4] " Rafael J. Wysocki
2009-03-01 22:24 ` [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4) Rafael J. Wysocki
2009-03-02 23:01 ` Arve Hjønnevåg
2009-03-02 23:13 ` Rafael J. Wysocki
2009-03-02 23:18 ` Arve Hjønnevåg
2009-03-02 23:27 ` Rafael J. Wysocki
2009-03-03 22:56 ` Arve Hjønnevåg
2009-03-04 22:03 ` [Update, rev. 5] " Rafael J. Wysocki
2009-03-05 10:35 ` Ingo Molnar
2009-03-02 23:32 ` Linus Torvalds
2009-03-02 23:35 ` Linus Torvalds
2009-03-03 0:08 ` Arve Hjønnevåg
2009-03-03 8:41 ` Arve Hjønnevåg
2009-03-01 22:25 ` [RFC][PATCH 2/4] PM: Change suspend code ordering Rafael J. Wysocki
2009-03-02 20:48 ` Linus Torvalds
2009-03-02 22:02 ` Rafael J. Wysocki
2009-03-01 22:26 ` [RFC][PATCH 3/4] PM: Change hibernation " Rafael J. Wysocki
2009-03-01 22:27 ` [RFC][PATCH 4/4] kexec: Change kexec jump " Rafael J. Wysocki
2009-03-05 23:44 ` [RFC][PATCH 0/4] Rework disabling of interrupts during suspend-resume Linus Torvalds
2009-03-06 6:47 ` Sitsofe Wheeler
2009-03-06 10:19 ` Rafael J. Wysocki
2009-03-07 10:19 ` [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts Rafael J. Wysocki
2009-03-07 10:20 ` [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5) Rafael J. Wysocki
2009-03-07 16:51 ` [linux-pm] " Alan Stern
2009-03-07 17:56 ` Rafael J. Wysocki
2009-03-08 3:53 ` Alan Stern
2009-03-08 10:00 ` Rafael J. Wysocki
2009-03-08 12:37 ` Alan Stern
2009-03-08 17:20 ` Linus Torvalds
2009-03-08 20:40 ` Alan Stern
2009-03-08 21:37 ` Rafael J. Wysocki
2009-03-09 14:59 ` Linus Torvalds
2009-03-09 15:13 ` Alan Stern
2009-03-09 15:40 ` Linus Torvalds
2009-03-07 10:21 ` [RFC][PATCH][2/8] PM: Change suspend code ordering Rafael J. Wysocki
2009-03-07 10:22 ` [RFC][PATCH][3/8] PM: Change hibernation " Rafael J. Wysocki
2009-03-07 10:23 ` [RFC][PATCH][4/8] kexec: Change kexec jump " Rafael J. Wysocki
2009-03-07 10:24 ` [RFC][PATCH][5/8] PCI PM: Consistently use variable name "error" for pm call return values Rafael J. Wysocki
2009-03-07 10:25 ` [RFC][PATCH][6/8] PCI PM: Use pci_set_power_state during early resume Rafael J. Wysocki
2009-03-07 10:26 ` [RFC][PATCH][7/8] PCI PM: Move pci_restore_standard_config to pci-driver.c Rafael J. Wysocki
2009-03-07 10:27 ` [RFC][PATCH][8/8] PCI PM: Put devices into low power states during late suspend Rafael J. Wysocki
2009-03-08 19:28 ` [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts Frans Pop
2009-03-08 20:50 ` Rafael J. Wysocki
2009-03-14 8:44 ` Frans Pop
2009-03-14 11:59 ` Rafael J. Wysocki
2009-03-14 14:11 ` Frans Pop
2009-03-14 22:31 ` Rafael J. Wysocki
2009-03-11 9:30 ` [PATCH 0/10] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated) Rafael J. Wysocki
2009-03-11 9:36 ` [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5) Rafael J. Wysocki
2009-03-11 10:33 ` Thomas Gleixner
2009-03-11 20:59 ` Rafael J. Wysocki
2009-03-11 21:42 ` Thomas Gleixner
2009-03-11 22:01 ` Rafael J. Wysocki
2009-03-11 22:45 ` Thomas Gleixner
2009-03-12 13:36 ` Rafael J. Wysocki
2009-03-12 21:43 ` [update, rev. 6] " Rafael J. Wysocki
2009-03-13 0:39 ` Ingo Molnar
2009-03-13 17:07 ` Rafael J. Wysocki
2009-03-13 7:15 ` Arve Hjønnevåg
2009-03-13 16:53 ` Rafael J. Wysocki
2009-03-13 19:55 ` Thomas Gleixner
2009-03-13 21:56 ` Rafael J. Wysocki
2009-03-14 7:31 ` Thomas Gleixner
2009-03-14 10:01 ` Rafael J. Wysocki
2009-03-14 0:04 ` Rafael J. Wysocki
2009-03-11 21:15 ` Rafael J. Wysocki
2009-03-11 21:35 ` Thomas Gleixner
2009-03-11 21:50 ` Rafael J. Wysocki
2009-03-11 21:53 ` Thomas Gleixner
2009-03-11 22:01 ` Linus Torvalds
2009-03-11 22:13 ` Rafael J. Wysocki
2009-03-11 22:25 ` Thomas Gleixner
2009-03-11 22:07 ` Rafael J. Wysocki
2009-03-11 9:37 ` [PATCH 2/10] PM: Change suspend code ordering Rafael J. Wysocki
2009-03-11 9:38 ` [PATCH 3/10] PM: Change hibernation " Rafael J. Wysocki
2009-03-11 9:39 ` [PATCH 4/10] kexec: Change kexec jump " Rafael J. Wysocki
2009-03-11 9:41 ` [PATCH 5/10] PCI PM: Consistently use variable name "error" for pm call return values Rafael J. Wysocki
2009-03-11 9:42 ` [PATCH 6/10] PCI PM: Use pci_set_power_state during early resume Rafael J. Wysocki
2009-03-11 9:47 ` [PATCH 7/10] PCI PM: Move pci_restore_standard_config to pci-driver.c Rafael J. Wysocki
2009-03-11 9:48 ` [PATCH 8/10] PCI PM: Put devices into low power states during late suspend (rev. 2) Rafael J. Wysocki
2009-03-11 9:55 ` [PATCH 9/10] PCI PM: Make pci_set_power_state() handle devices with no PM support Rafael J. Wysocki
2009-03-11 9:56 ` [PATCH 10/10] PCI PM: Restore config spaces of all devices during early resume Rafael J. Wysocki
2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
2009-03-14 11:26 ` [PATCH 1/11] PM: Introduce functions for suspending and resuming device interrupts Rafael J. Wysocki
2009-03-14 11:27 ` [PATCH 2/11] PM: Rework handling of interrupts during suspend-resume Rafael J. Wysocki
2009-03-14 11:28 ` [PATCH 3/11] PM: Change suspend code ordering Rafael J. Wysocki
2009-03-14 11:28 ` [PATCH 4/11] PM: Change hibernation " Rafael J. Wysocki
2009-03-14 11:29 ` [PATCH 5/11] kexec: Change kexec jump " Rafael J. Wysocki
2009-03-14 11:30 ` [PATCH 6/11] PCI PM: Consistently use variable name "error" for pm call return values Rafael J. Wysocki
2009-03-14 11:31 ` [PATCH 7/11] PCI PM: Use pci_set_power_state during early resume Rafael J. Wysocki
2009-03-14 11:32 ` [PATCH 8/11] PCI PM: Move pci_restore_standard_config to pci-driver.c Rafael J. Wysocki
2009-03-14 11:32 ` [PATCH 9/11] PCI PM: Put devices into low power states during late suspend (rev. 2) Rafael J. Wysocki
2009-03-14 11:33 ` [PATCH 10/11] PCI PM: Make pci_set_power_state() handle devices with no PM support Rafael J. Wysocki
2009-03-14 11:34 ` [PATCH 11/11] PCI PM: Restore config spaces of all devices during early resume Rafael J. Wysocki
2009-03-14 11:43 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200902231203.09082.rjw@sisk.pl \
--to=rjw@sisk.pl \
--cc=benh@kernel.crashing.org \
--cc=ebiederm@xmission.com \
--cc=jbarnes@virtuousgeek.org \
--cc=jeremy@goop.org \
--cc=lenb@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@lists.linux-foundation.org \
--cc=mingo@elte.hu \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox