public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Jeremy Fitzhardinge <jeremy@goop.org>,
	pm list <linux-pm@lists.linux-foundation.org>,
	Len Brown <lenb@kernel.org>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume
Date: Mon, 23 Feb 2009 12:03:07 +0100	[thread overview]
Message-ID: <200902231203.09082.rjw@sisk.pl> (raw)
In-Reply-To: <m1bpst77te.fsf@fess.ebiederm.org>

On Monday 23 February 2009, Eric W. Biederman wrote:
> Ingo Molnar <mingo@elte.hu> writes:
> 
> > * Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> >> Ingo Molnar <mingo@elte.hu> writes:
> >> 
> >> > I think this aspect has been well-understood during the 
> >> > discussion of this topic and it's just a slightly misleading 
> >> > changelog.
> >> 
> >> As I was a member of that discussion I did not see that.
> >> 
> >> It took me several passes through the patches to realize the 
> >> goal is to allow drivers to be able to sleep while they are in 
> >> their late pm shutdown routines.
> >> 
> >> Why we want this I don't know.  But it seems simple enough to 
> >> implement, and it makes it harder to get the late pm suspend 
> >> routines wrong, which is always good.
> >
> > That's not the only goal. The other goal is to further shrink a 
> > particular window of suspend fragility: the irqs-disabled stage 
> > of the suspend/resume sequence.
> >
> > Since suspend/resume is a mini-reboot sequence, there's a large 
> > amount of code executed - and the variety of code is large as 
> > well. We had repeat cases of random drivers re-enabling 
> > interrupts and thus breaking other drivers - and these are nasty 
> > to debug.
> >
> > So this patchset disables device IRQs centrally and serializes 
> > with pending work - so there's no races with pending IRQs 
> > anymore.
> >
> > The fact that we keep the timer irq running is two-fold: firstly 
> > the timer code is special and not really part of the regular 
> > suspend/resume sequence.
> >
> > Drivers want to take timestamps, sometimes they even want to do 
> > a small usleep(), etc. Ideally the suspend/resume code is pretty 
> > much _the same_ as a regular bootup (and shutdown) code - so we 
> > want to provide a similar environment to how drivers initialize 
> > and deinitialize, and we want to enable them to share code 
> > between bootup/shutdown and suspend/resume agressively.
> >
> > So the more generic kernel environment we give these fragile 
> > handlers, the better we are off in the end. Since we already had 
> > IRQS_TIMER, that was just the natural thing to do.
> 
> I am all for sharing code, especially if we can factor if
> we can find common factors that do the same thing.
> 
> I don't know how many times I have found drivers doing something
> weird in their shutdown routines that they don't know how
> to get the device out of.  The e1000 driver has shown up several
> times because it likes to suspend the device on shutdown.
> 
> The fact that the methods exposed to drivers were only defined
> to be usable on the s2ram/hibernate path is something I have
> brought up on more than one occasion as a bad choice.
> 
> I'm really not convinced that the rational for separating
> out the shutdown methods from the remove methods has
> been very good.  That of we don't need to clean up the in-kernel
> data structures on reboot so why do something extra that can
> introduce instability.
> 
> So having been watching a smaller form of this drama on the
> reboot path for several years.  Having had a device method
> with fixed semantics, and not the dwm sematics of the historical
> suspend routing.  I expect there is still a ways to go before
> it is simple and easy for drivers to figure out what they need
> to implement out of the confusing variety of possible device
> methods.
> 
> >> > The new suspend code does not rely on truly disabling IRQs 
> >> > on the low level. The purpose is to not get IRQs to drivers 
> >> > - which might crash/hang/race/misbehave.
> >> 
> >> Reasonable.  I expect one of the problems with drivers getting 
> >> it wrong is that the interface is too complex for mortal 
> >> humans to understand.
> >
> > The suspend/resume state machine certainly used to be a piece of 
> > code that makes a seasoned kernel developer weep in fear.
> >
> > That has changed drastically in the past few months. The 
> > suspend+hibernation logic got unified (at least as far as driver 
> > methods go), and all the flow and ordering has been cleaned up 
> > and has been made more robust.
> 
> I will have to look again.  My impression is that overloading
> a single method is part of what got us into this mess in the
> first place.
> 
> No that I don't see things getting better. 
> 
> > What makes s2ram fragile is not human failure but the 
> > combination of a handful of physical property:
> >
> > 1) Psychology: shutting the lid or pushing the suspend button is 
> >    a deceivingly 'simple' action to the user. But under the 
> >    hood, a ton of stuff happens: we deinitialize a lot of 
> >    things, we go through _all hardware state_, and we do so in a 
> >    serial fashion. If just one piece fails to do the right 
> >    thing, the box might not resume. Still, the user expects this 
> >    'simple' thing to just work, all the time. No excuses 
> >    accepted.
> >
> > 2) Length of code: To get a successful s2ram sequence the kernel
> >    runs through tens of thousands of lines of code. Code which
> >    never gets executed on a normal box - only if we s2ram. If 
> >    just one step fails, we get a hung box.
> >
> > 3) Debuggability: a lot of s2ram code runs with the console off, 
> >    making any bugs hard to debug. Furthermore we have no 
> >    meaningful persistent storage either for kernel bug messages. 
> >    The RTC trick of PM_DEBUG works but is a very narrow channel 
> >    of information and it takes a lot of time to debug a bug via 
> >    that method.
> 
> Yep that is an issue.
> 
> > The combination of these factors really makes up for a perfect 
> > storm in terms of kernel technology: we have this 
> > very-deceivingly-simple-looking but complex-and-rarely-executed 
> > piece of code, which is very hard to debug.
> 
> And much of this as you are finding with this piece of code
> is how the software was designed rather then how the software
> needed to be.
> 
> > Even just one of these factors would be enough to make an 
> > otherwise healthy subsystem fragile - no wonder s2ram has been a 
> > problem ever since it existed in the upstream kernel.
> >
> > So now we need just one thing: patience and more of the same 
> > good stuff that happened lately.
> 
> I think there has been some good progress, and so I am happy
> to be patient.  I will still mention on occasion what it
> seems we are doing wrong.  Unfortunately I don't have time
> to do a lot more than that.
> 
> >> > Still, it might make sense to not just use the ->disable 
> >> > sequence but primarily the ->shutdown irqchip method (when 
> >> > it's available in the irqchip).
> >> 
> >> Disable seems fine to me.  This is interesting in the context 
> >> of all of the irqs that will when masked show up somewhere 
> >> else (think boot interrupts).
> >> 
> >> > While we obviously cannot turn off the PIC that delivers 
> >> > timer IRQs at this stage - there's no theoretical reason why 
> >> > the suspend sequence couldnt power down some secondary PICs 
> >> > as well - in some arch code, or maybe even in the generic 
> >> > driver suspend sequence if the device tree is structured 
> >> > carefully enough so that the PIC gets turned off last.
> >> 
> >> If the point is simply to prevent deliver of irqs to the 
> >> drivers I don't see the point of anything more than what the 
> >> patch does now.
> >
> > ... except for the usecase i described above. Say some PIC sits 
> > on a piece of silicon which gets turned off. I'm not talking 
> > about x86 but some custom device. We really dont want that IRQ 
> > line to send half of an IRQ message (un-ACK-ed) when it gets 
> > turned off. So physically 'suspending' all IRQ lines does make a 
> > certain level of long-term sense.
> 
> Good point.  We will loose both level and edge triggered events
> that occur between suspending the irqs and restoring them but
> that is inevitable.  So we might as well call shutdown and totally
> turn off the irqs if we can.
> 
> I don't know where in the state machine this is getting called but
> I would suggest doing this before we shutdown cpus.

This is the plan.  In fact, I'm going to do this in the next patch after the
$subject one has been tested and found acceptable.

Thanks,
Rafael

  reply	other threads:[~2009-02-23 11:04 UTC|newest]

Thread overview: 187+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-22 17:37 [RFC][PATCH 0/2] Rework disabling of interrupts during suspend-resume Rafael J. Wysocki
2009-02-22 17:38 ` [RFC][PATCH 1/2] PM: Split up sysdev_[suspend|resume] from device_power_[down|up] Rafael J. Wysocki
2009-02-22 20:56   ` Adrian Bunk
2009-02-22 21:07     ` Linus Torvalds
2009-02-22 21:12       ` Ingo Molnar
2009-02-22 22:42       ` Adrian Bunk
2009-03-05 16:54   ` Pavel Machek
2009-02-22 17:39 ` [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume Rafael J. Wysocki
2009-02-22 18:01   ` Linus Torvalds
2009-02-22 22:42     ` Rafael J. Wysocki
2009-02-22 23:48       ` Rafael J. Wysocki
2009-02-23  0:05         ` Linus Torvalds
2009-02-23  1:23           ` Linus Torvalds
2009-02-23 10:52             ` Rafael J. Wysocki
2009-02-23  3:04         ` Eric W. Biederman
2009-02-23  8:44           ` Ingo Molnar
2009-02-23  9:22             ` Eric W. Biederman
2009-02-23  9:44               ` Ingo Molnar
2009-02-23 10:42                 ` Eric W. Biederman
2009-02-23 11:03                   ` Rafael J. Wysocki [this message]
2009-02-23 15:28                     ` Eric W. Biederman
2009-02-23 21:39                       ` Rafael J. Wysocki
2009-02-24  3:30                         ` Eric W. Biederman
2009-02-24 22:42                           ` Rafael J. Wysocki
2009-02-24 22:51                             ` Linus Torvalds
2009-02-24 23:07                               ` Rafael J. Wysocki
2009-02-24 23:09                                 ` Ingo Molnar
2009-02-24 23:29                                   ` Rafael J. Wysocki
2009-02-25 13:23                                     ` Ingo Molnar
2009-02-26  1:17                                     ` Arve Hjønnevåg
2009-02-26  1:27                                       ` Linus Torvalds
2009-02-26  2:13                                         ` Arve Hjønnevåg
2009-02-26  2:51                                           ` Linus Torvalds
2009-02-26  3:00                                             ` Ingo Molnar
2009-02-26  3:31                                               ` Arve Hjønnevåg
2009-02-26  3:37                                                 ` Linus Torvalds
2009-02-26  3:50                                                   ` Arve Hjønnevåg
2009-02-26  3:57                                                     ` Linus Torvalds
2009-02-26  4:13                                                       ` Arve Hjønnevåg
2009-02-26  4:20                                                         ` Eric W. Biederman
2009-02-26  4:24                                                           ` Arve Hjønnevåg
2009-02-26  9:50                                       ` Rafael J. Wysocki
2009-02-26 20:34                                         ` Arve Hjønnevåg
2009-02-26 20:57                                           ` Benjamin Herrenschmidt
2009-02-26 21:20                                             ` Arve Hjønnevåg
2009-02-26 21:49                                               ` Benjamin Herrenschmidt
2009-02-26 21:58                                           ` Rafael J. Wysocki
2009-02-26 22:10                                             ` Linus Torvalds
2009-02-26 22:30                                               ` Arve Hjønnevåg
2009-02-26 23:10                                                 ` Rafael J. Wysocki
2009-02-27  0:00                                                   ` Arve Hjønnevåg
2009-02-27  0:27                                                     ` Linus Torvalds
2009-02-27  3:20                                                       ` [linux-pm] " Alan Stern
2009-02-27  4:43                                                         ` Linus Torvalds
2009-02-27 14:59                                                           ` Alan Stern
2009-02-27 20:30                                                             ` Linus Torvalds
2009-02-28  3:54                                                               ` Arve Hjønnevåg
2009-02-28 10:06                                                                 ` Rafael J. Wysocki
2009-02-28 17:03                                                                   ` Linus Torvalds
2009-02-28 22:15                                                                   ` Arve Hjønnevåg
2009-02-26 22:30                                               ` Rafael J. Wysocki
2009-02-25  4:16                               ` Eric W. Biederman
2009-02-25  4:26                                 ` Linus Torvalds
2009-02-25  4:59                                   ` Eric W. Biederman
2009-02-25 15:32                             ` [linux-pm] " Alan Stern
2009-02-25 16:19                               ` Linus Torvalds
2009-02-23 11:04                   ` Ingo Molnar
2009-02-23 14:45                     ` Rafael J. Wysocki
2009-02-23 15:06                       ` Ingo Molnar
2009-02-23 21:59                         ` Rafael J. Wysocki
2009-02-23 10:13               ` Benjamin Herrenschmidt
2009-02-23  8:36         ` Ingo Molnar
2009-02-23 11:29           ` Rafael J. Wysocki
2009-02-23 12:28             ` Ingo Molnar
2009-02-23 14:48               ` Rafael J. Wysocki
2009-02-23 20:49               ` Benjamin Herrenschmidt
2009-02-23 12:45             ` Ingo Molnar
2009-02-23 15:07               ` Rafael J. Wysocki
2009-02-23 15:52             ` Johannes Berg
2009-02-23 17:16             ` Ingo Molnar
2009-02-23 17:28               ` Linus Torvalds
2009-02-23 22:11                 ` Rafael J. Wysocki
2009-02-23 22:11   ` Arve Hjønnevåg
2009-02-23 22:23     ` Rafael J. Wysocki
2009-02-23 22:44       ` Arve Hjønnevåg
2009-02-22 18:13 ` [RFC][PATCH 0/2] Rework disabling " Linus Torvalds
2009-02-22 18:18   ` Ingo Molnar
2009-02-22 18:25     ` Linus Torvalds
2009-02-22 18:35       ` Linus Torvalds
2009-02-22 22:37 ` Eric W. Biederman
2009-02-22 22:56   ` Benjamin Herrenschmidt
2009-02-22 23:02   ` Linus Torvalds
2009-03-01 22:21 ` [RFC][PATCH 0/4] " Rafael J. Wysocki
2009-03-01 22:24   ` [RFC][PATCH 1/4] PM: Rework handling of interrupts during suspend-resume (rev. 4) Rafael J. Wysocki
2009-03-02 23:01     ` Arve Hjønnevåg
2009-03-02 23:13       ` Rafael J. Wysocki
2009-03-02 23:18         ` Arve Hjønnevåg
2009-03-02 23:27           ` Rafael J. Wysocki
2009-03-03 22:56             ` Arve Hjønnevåg
2009-03-04 22:03               ` [Update, rev. 5] " Rafael J. Wysocki
2009-03-05 10:35                 ` Ingo Molnar
2009-03-02 23:32           ` Linus Torvalds
2009-03-02 23:35             ` Linus Torvalds
2009-03-03  0:08               ` Arve Hjønnevåg
2009-03-03  8:41                 ` Arve Hjønnevåg
2009-03-01 22:25   ` [RFC][PATCH 2/4] PM: Change suspend code ordering Rafael J. Wysocki
2009-03-02 20:48     ` Linus Torvalds
2009-03-02 22:02       ` Rafael J. Wysocki
2009-03-01 22:26   ` [RFC][PATCH 3/4] PM: Change hibernation " Rafael J. Wysocki
2009-03-01 22:27   ` [RFC][PATCH 4/4] kexec: Change kexec jump " Rafael J. Wysocki
2009-03-05 23:44   ` [RFC][PATCH 0/4] Rework disabling of interrupts during suspend-resume Linus Torvalds
2009-03-06  6:47     ` Sitsofe Wheeler
2009-03-06 10:19     ` Rafael J. Wysocki
2009-03-07 10:19 ` [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts Rafael J. Wysocki
2009-03-07 10:20   ` [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5) Rafael J. Wysocki
2009-03-07 16:51     ` [linux-pm] " Alan Stern
2009-03-07 17:56       ` Rafael J. Wysocki
2009-03-08  3:53         ` Alan Stern
2009-03-08 10:00           ` Rafael J. Wysocki
2009-03-08 12:37             ` Alan Stern
2009-03-08 17:20           ` Linus Torvalds
2009-03-08 20:40             ` Alan Stern
2009-03-08 21:37               ` Rafael J. Wysocki
2009-03-09 14:59               ` Linus Torvalds
2009-03-09 15:13                 ` Alan Stern
2009-03-09 15:40                   ` Linus Torvalds
2009-03-07 10:21   ` [RFC][PATCH][2/8] PM: Change suspend code ordering Rafael J. Wysocki
2009-03-07 10:22   ` [RFC][PATCH][3/8] PM: Change hibernation " Rafael J. Wysocki
2009-03-07 10:23   ` [RFC][PATCH][4/8] kexec: Change kexec jump " Rafael J. Wysocki
2009-03-07 10:24   ` [RFC][PATCH][5/8] PCI PM: Consistently use variable name "error" for pm call return values Rafael J. Wysocki
2009-03-07 10:25   ` [RFC][PATCH][6/8] PCI PM: Use pci_set_power_state during early resume Rafael J. Wysocki
2009-03-07 10:26   ` [RFC][PATCH][7/8] PCI PM: Move pci_restore_standard_config to pci-driver.c Rafael J. Wysocki
2009-03-07 10:27   ` [RFC][PATCH][8/8] PCI PM: Put devices into low power states during late suspend Rafael J. Wysocki
2009-03-08 19:28   ` [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts Frans Pop
2009-03-08 20:50     ` Rafael J. Wysocki
2009-03-14  8:44       ` Frans Pop
2009-03-14 11:59         ` Rafael J. Wysocki
2009-03-14 14:11           ` Frans Pop
2009-03-14 22:31             ` Rafael J. Wysocki
2009-03-11  9:30 ` [PATCH 0/10] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated) Rafael J. Wysocki
2009-03-11  9:36   ` [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5) Rafael J. Wysocki
2009-03-11 10:33     ` Thomas Gleixner
2009-03-11 20:59       ` Rafael J. Wysocki
2009-03-11 21:42         ` Thomas Gleixner
2009-03-11 22:01           ` Rafael J. Wysocki
2009-03-11 22:45           ` Thomas Gleixner
2009-03-12 13:36             ` Rafael J. Wysocki
2009-03-12 21:43               ` [update, rev. 6] " Rafael J. Wysocki
2009-03-13  0:39                 ` Ingo Molnar
2009-03-13 17:07                   ` Rafael J. Wysocki
2009-03-13  7:15                 ` Arve Hjønnevåg
2009-03-13 16:53                   ` Rafael J. Wysocki
2009-03-13 19:55                 ` Thomas Gleixner
2009-03-13 21:56                   ` Rafael J. Wysocki
2009-03-14  7:31                     ` Thomas Gleixner
2009-03-14 10:01                       ` Rafael J. Wysocki
2009-03-14  0:04                   ` Rafael J. Wysocki
2009-03-11 21:15       ` Rafael J. Wysocki
2009-03-11 21:35         ` Thomas Gleixner
2009-03-11 21:50           ` Rafael J. Wysocki
2009-03-11 21:53             ` Thomas Gleixner
2009-03-11 22:01               ` Linus Torvalds
2009-03-11 22:13                 ` Rafael J. Wysocki
2009-03-11 22:25                 ` Thomas Gleixner
2009-03-11 22:07               ` Rafael J. Wysocki
2009-03-11  9:37   ` [PATCH 2/10] PM: Change suspend code ordering Rafael J. Wysocki
2009-03-11  9:38   ` [PATCH 3/10] PM: Change hibernation " Rafael J. Wysocki
2009-03-11  9:39   ` [PATCH 4/10] kexec: Change kexec jump " Rafael J. Wysocki
2009-03-11  9:41   ` [PATCH 5/10] PCI PM: Consistently use variable name "error" for pm call return values Rafael J. Wysocki
2009-03-11  9:42   ` [PATCH 6/10] PCI PM: Use pci_set_power_state during early resume Rafael J. Wysocki
2009-03-11  9:47   ` [PATCH 7/10] PCI PM: Move pci_restore_standard_config to pci-driver.c Rafael J. Wysocki
2009-03-11  9:48   ` [PATCH 8/10] PCI PM: Put devices into low power states during late suspend (rev. 2) Rafael J. Wysocki
2009-03-11  9:55   ` [PATCH 9/10] PCI PM: Make pci_set_power_state() handle devices with no PM support Rafael J. Wysocki
2009-03-11  9:56   ` [PATCH 10/10] PCI PM: Restore config spaces of all devices during early resume Rafael J. Wysocki
2009-03-14 11:24 ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Rafael J. Wysocki
2009-03-14 11:26   ` [PATCH 1/11] PM: Introduce functions for suspending and resuming device interrupts Rafael J. Wysocki
2009-03-14 11:27   ` [PATCH 2/11] PM: Rework handling of interrupts during suspend-resume Rafael J. Wysocki
2009-03-14 11:28   ` [PATCH 3/11] PM: Change suspend code ordering Rafael J. Wysocki
2009-03-14 11:28   ` [PATCH 4/11] PM: Change hibernation " Rafael J. Wysocki
2009-03-14 11:29   ` [PATCH 5/11] kexec: Change kexec jump " Rafael J. Wysocki
2009-03-14 11:30   ` [PATCH 6/11] PCI PM: Consistently use variable name "error" for pm call return values Rafael J. Wysocki
2009-03-14 11:31   ` [PATCH 7/11] PCI PM: Use pci_set_power_state during early resume Rafael J. Wysocki
2009-03-14 11:32   ` [PATCH 8/11] PCI PM: Move pci_restore_standard_config to pci-driver.c Rafael J. Wysocki
2009-03-14 11:32   ` [PATCH 9/11] PCI PM: Put devices into low power states during late suspend (rev. 2) Rafael J. Wysocki
2009-03-14 11:33   ` [PATCH 10/11] PCI PM: Make pci_set_power_state() handle devices with no PM support Rafael J. Wysocki
2009-03-14 11:34   ` [PATCH 11/11] PCI PM: Restore config spaces of all devices during early resume Rafael J. Wysocki
2009-03-14 11:43   ` [PATCH 0/11] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated 2x) Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200902231203.09082.rjw@sisk.pl \
    --to=rjw@sisk.pl \
    --cc=benh@kernel.crashing.org \
    --cc=ebiederm@xmission.com \
    --cc=jbarnes@virtuousgeek.org \
    --cc=jeremy@goop.org \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@lists.linux-foundation.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox