public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: David Brownell <david-b@pacbell.net>
Cc: Nigel Cunningham <nigel@suspend2.net>,
	Andrew Morton <akpm@osdl.org>,
	linux-pm@lists.osdl.org, linux-usb-devel@lists.sourceforge.net
Subject: Re: [patch/rft 2.6.17-rc2] swsusp resume must not device_suspend()
Date: Tue, 25 Apr 2006 23:55:28 +0200	[thread overview]
Message-ID: <200604252355.29006.rjw@sisk.pl> (raw)
In-Reply-To: <200604251404.30228.david-b@pacbell.net>

On Tuesday 25 April 2006 23:04, David Brownell wrote:
> On Tuesday 25 April 2006 11:56 am, Rafael J. Wysocki wrote:
> > 
> > > I've begun thinking that calls like pm_should_I_spin_down_drives() would be a
> > > better structural approach than continually redefining this "freeze" thing so
> > > it makes less and less sense to all other drivers ... who nonethless need to
> > > clutter themselves up with a growing list of special cases, to accomodate
> > > rotating media that may not even exist in the target system.
> > 
> > I think we should do something different to device_power_down(PMSG_FREEZE)
> > there, but I'm not sure it should be kernel_restart_prepare(NULL).
> > 
> > Actually spinning down disks during resume is a problem for some users (yes,
> > we've had such bug reports recently), so it's better to avoid this.
> 
> Well, if we had a pm_should_I_spin_down_drives() it would make sense to me
> that it return FALSE during kernel_restart_prepare() too ... surely kexec
> users have the same issues!
> 
> If you currently have users who object to spindown-during-resume, then it'd
> seem that my patch couldn't change anything except maybe details.

Fortunately this particular problem has been fixed in the driver. ;-)

> And that switching over to a call like pm_should_I_spin_down_drives() should
> fix it all.

Agreed, but I have to learn quite a bit to implement such a thing.

> > > > OTOH I think at least some device driver writers assume that .resume() will
> > > > always be called after .suspend() which only is true for non-modular drivers
> > > > (or for modular drivers loaded from an initrd before resume). 
> > > 
> > > Say what?  Of _course_ resume() should only be called after suspend().  If
> > > that's not true in any case, the code wrongly issuing the resume() is buggy.
> > 
> > Well, suppose we have a modular driver that's not loaded before resume.
> 
> That's not the problem case though; it works correctly, since the device
> hardware is already being left in an appropriate (RESET) state.
> 
> 
> > Then it goes like that (approximately):
> > (1) We activate swsusp which calls .suspend() for all devices including our
> > driver (this is a real suspend).
> > (2) swsusp snapshots the system and creates the image.
> > (3) swsusp calls .resume() for all devices in order to be able to save the
> > image (.resume() for our driver is also called which is OK).
> > (4) swsusp turns off the system.
> > (5) (some time later) We start a new kernel and tell it to resume.
> > (6) It activates swsusp which reads the image.
> 
> And assuming this is an x86 PC, at this point every device is in one of three states:
> 
>   - initialized by BIOS.  This is a particular PITA for USB, but one that's
>     handled OK (mostly) except when BIOS bugs kick in.  There's some nasty
>     code that kicks in along with PCI quirk handling, which ensures that by
>     the time Linux-USB  driver could see this state (or the input subsystem
>     needs to care about it), the state has morphed to reset.  Video cards
>     have funky issues here too.
> 
>   - (powerup) reset.  This is the ideal state, in terms of "truth" to convey
>     to the image we're about to restore ... no ambiguity, every driver will
>     need to re-init.  As if there were (thank you!!) no BIOS.
> 
>   - initialized by Linux ... which leads to the case my patch addresses.
> 
> Those first two states are legit for any resume() call, and they apply in
> your scenario restriction.

IIRC, there were some ALSA problems with .resume() called on a reseted
device, but they seem to be fixed now.
 
> The third state is the problem scenario, kicking in when the driver was
> statically linked (or modprobed from initramfs, etc), but not during
> your scenario.

The problem, as I see it, is that too many devices may be initialized at the
kernel startup.  I think we _can_ reset some of them before the image
is restored, but at least some of them need to be treated more carefully.

> > (7) (without your change) swsusp calls .suspend() for all device drivers that
> > are present at that time,
> 
> ... the current troublesome consequence of that third state ...
> 
> > but our driver is not there, so its .suspend() 
> > _won't_ be called.  [Of course with your change .suspend() won't be called
> > for any driver.]
> 
> Right:  the first two "safe" cases kick in.  This is the partial workaround I
> had identified:  dodging the code paths for that third state, where suspend()
> is being used to put the hardware into a broken suspend state.

So perhaps we should just make them enter a state that's not broken?
That may be reset for some devices (eg. USB) and something else for some
others (eg. storage).

> Note that with that third state, there are actually two suspend() calls, but
> only one resume() call.  (Suspend before snapshot, suspend before resume
> snapshot, resume after activating snapshot.)  Such an extra suspend() call is
> a small hint that something's odd, and maybe wrong.

Agreed.

> > (8) swsusp restores the image.
> > (9) swsusp calls .resume() for all devices _including_ our driver, because it
> > was in memory before suspend.  For our driver this .resume() is not
> > called after .suspend(), is it?
> 
> The suspend() was called from the kernel being resumed ... and the device hardware
> is in one of the states (reset) that it's allowed to be in when calling its
> matching resume().  No problem there.
> 
> 
> > You're saying that (9) is wrong, so could you please suggest what to do
> > instead of it?
> 
> The case in which (9) is wrong is the case you excluded:  where the pre-resume
> kernel loaded the driver and used the third state listed above, and then trashed
> the correct device hardware state (reset) and replaced it with a suspend state.
> 
> It may help to think of two distinct types of device hardware suspend states
> (only the first is real, the second is just a software bug):
> 
>  - Correct, with internal state corresponding to what the driver suspend() did;
>    what a normal hardware suspend/resume cycle (not powercycle!) could do.
> 
>  - Broken, with any other internal state (except reset).  This is what swsusp
>    currently forces, by adding **AND HIDING** a reset and reinit cycle, because
>    of the extra suspend() call in (7).

Still there are drivers that have no problems with it, so why we should we
forcibly reset their devices?

> My patch/suggestion just ensures that instead of that broken state, reset is used.
> in all cases ... not just the "driver not initialized before snapshot resume" case.

As I said before I generally agree with this except I think some more fine
grained approach is needed in this case.

Basically it seems we need something like a .prepare_for_resume() routine,
that will be called before restoring the swsusp's image, apart from .suspend()
and .resume() for each driver.  Your patch just assumes that
.prepare_for_resume() should be "reset" for all devices.

Greetings,
Rafael

  parent reply	other threads:[~2006-04-25 21:55 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-04-24 21:29 [patch/rft 2.6.17-rc2] swsusp resume must not device_suspend() David Brownell
2006-04-24 21:47 ` [linux-pm] " Rafael J. Wysocki
2006-04-24 22:47   ` David Brownell
2006-04-25 10:34     ` Nigel Cunningham
2006-04-25 14:41       ` Alan Stern
2006-04-25 17:37         ` [linux-pm] " David Brownell
2006-04-25 20:45           ` Alan Stern
2006-04-26  0:30             ` David Brownell
2006-04-27  8:20               ` Pavel Machek
2006-04-27  8:16             ` Pavel Machek
2006-04-27  8:08         ` Pavel Machek
2006-04-27 14:34           ` [linux-pm] " Alan Stern
2006-04-27 16:55             ` Patrick Mochel
2006-04-27 17:41               ` Alan Stern
2006-04-27 19:21               ` David Brownell
2006-04-27 20:35                 ` Nigel Cunningham
2006-04-27 20:58                   ` Pavel Machek
2006-04-25 16:56       ` David Brownell
2006-04-24 22:31 ` [linux-pm] " Nigel Cunningham
2006-04-25  8:32   ` Rafael J. Wysocki
2006-04-25 16:11     ` David Brownell
2006-04-25 18:56       ` Rafael J. Wysocki
2006-04-25 20:28         ` Nigel Cunningham
2006-04-25 20:53           ` [linux-pm] " Rafael J. Wysocki
2006-04-25 21:03             ` Nigel Cunningham
2006-04-25 22:06               ` Rafael J. Wysocki
2006-04-25 22:18                 ` Nigel Cunningham
2006-04-25 22:34                   ` Rafael J. Wysocki
2006-04-25 23:55                   ` David Brownell
2006-04-26  1:16                     ` Nigel Cunningham
2006-04-26  3:32                       ` [linux-pm] " David Brownell
2006-04-26  3:44                         ` Nigel Cunningham
2006-04-26 14:24           ` Alan Stern
2006-04-26 19:47             ` [linux-pm] " Nigel Cunningham
2006-04-25 21:04         ` David Brownell
2006-04-25 21:41           ` Pavel Machek
2006-04-25 23:13             ` [linux-pm] " David Brownell
2006-04-26  9:07               ` Pavel Machek
2006-04-25 21:55           ` Rafael J. Wysocki [this message]
2006-04-25 22:56             ` David Brownell
2006-04-26 11:26               ` Rafael J. Wysocki
2006-04-26 14:38                 ` [linux-pm] " Alan Stern
2006-04-26 15:26                   ` Rafael J. Wysocki
2006-04-26 15:38                     ` Alan Stern
2006-04-26 16:09                       ` Rafael J. Wysocki
2006-04-26 19:06                         ` Alan Stern
2006-04-26 20:37                           ` Rafael J. Wysocki
2006-04-26 21:31                 ` David Brownell
2006-04-26 22:24                   ` Rafael J. Wysocki
2006-04-27 19:44                     ` David Brownell
2006-04-25 15:56   ` David Brownell
2006-04-27 10:54     ` Pavel Machek
2006-04-25 13:50 ` [linux-usb-devel] " Alan Stern
2006-04-25 15:49   ` David Brownell
2006-04-27  1:22 ` Patrick Mochel
2006-04-27 19:41   ` [linux-pm] " David Brownell
2006-05-02 16:12     ` Patrick Mochel
2006-05-26  3:06       ` David Brownell
2006-05-26 19:50         ` Rafael J. Wysocki
2006-05-26 23:16         ` Pavel Machek
2006-05-27  0:19           ` [linux-usb-devel] " David Brownell
2006-05-27 16:38             ` Pavel Machek
2006-06-05 15:31               ` David Brownell
2006-06-05 16:36               ` [patch/rft 2.6.17-rc5-git 0/6] PM_EVENT_PRETHAW David Brownell
2006-06-05 16:36               ` [patch/rft 2.6.17-rc5-git 1/6] fix broken/dubious driver suspend() methods David Brownell
     [not found]                 ` <20060530191140.GA4017@ucw.cz>
2006-06-07  0:53                   ` David Brownell
2006-06-05 16:37               ` [patch/rft 2.6.17-rc5-git 2/6] add PM_EVENT_PRETHAW David Brownell
2006-05-30 19:17                 ` Pavel Machek
2006-06-07  1:02                   ` David Brownell
2006-06-05 16:38               ` [patch/rft 2.6.17-rc5-git 3/6] PM_EVENT_PRETHAW, handle in IDE and PCI David Brownell
2006-05-30 19:21                 ` Pavel Machek
2006-06-07  0:51                   ` David Brownell
2006-06-05 16:38               ` [patch/rft 2.6.17-rc5-git 4/6] PM_EVENT_PRETHAW for various graphics cards David Brownell
2006-05-30 19:30                 ` Pavel Machek
2006-06-07  1:24                   ` David Brownell
2006-06-07 18:57                     ` PM docs and API? bsmith
2006-06-07 22:58                       ` David Brownell
2006-06-05 16:38               ` [patch/rft 2.6.17-rc5-git 5/6] PM_EVENT_PRETHAW, handle for USB David Brownell
2006-06-05 16:38               ` [patch/rft 2.6.17-rc5-git 6/6] PM_EVENT_PRETHAW, issue from PM core David Brownell
2006-05-30 19:28                 ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200604252355.29006.rjw@sisk.pl \
    --to=rjw@sisk.pl \
    --cc=akpm@osdl.org \
    --cc=david-b@pacbell.net \
    --cc=linux-pm@lists.osdl.org \
    --cc=linux-usb-devel@lists.sourceforge.net \
    --cc=nigel@suspend2.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox