All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nigel Cunningham <ncunningham@linuxmail.org>
To: Andrew Morton <akpm@osdl.org>
Cc: Don Zickus <dzickus@redhat.com>,
	ak@suse.de, shaohua.li@intel.com, miles.lane@gmail.com,
	jeremy@goop.org, linux-kernel@vger.kernel.org
Subject: Re: [2.6.17-rc5-mm2] crash when doing second suspend: BUG in arch/i386/kernel/nmi.c:174
Date: Wed, 7 Jun 2006 09:38:29 +1000	[thread overview]
Message-ID: <200606070938.34927.ncunningham@linuxmail.org> (raw)
In-Reply-To: <20060606162201.f0f9f308.akpm@osdl.org>

[-- Attachment #1: Type: text/plain, Size: 4901 bytes --]

Hi guys.

Back on board after the big shift :)

On Wednesday 07 June 2006 09:22, Andrew Morton wrote:
> On Tue, 6 Jun 2006 19:05:04 -0400
>
> Don Zickus <dzickus@redhat.com> wrote:
> > On Tue, Jun 06, 2006 at 03:15:07PM -0700, Andrew Morton wrote:
> > > On Tue, 6 Jun 2006 17:45:53 -0400
> > >
> > > Don Zickus <dzickus@redhat.com> wrote:
> > > > On Tue, Jun 06, 2006 at 04:18:15PM +0200, Andi Kleen wrote:
> > > > > > Because he is using a i386 machine, the nmi watchdog is disabled
> > > > > > by default.
> > > > >
> > > > > I changed that - it's now on by default on i386 too.
> > > > >
> > > > > -Andi
> > > >
> > > > I am trying to create a patch for this problem and it just dawned on
> > > > me, how does one store the previous state in a suspend/resume path if
> > > > the code hotplugs all the cpus first?  CPU0 is easy because an
> > > > explicit suspend/resume path is called, but it seems to be called
> > > > last after all the other cpus have been removed.  How do I save the
> > > > state?
> > >
> > > I'm really struggling to understand this question.  If you're referring
> > > to some per-cpu state then a CPU hotplug handler would be appropriate?
> >
> > Sorry.  I got ahead of myself.  My concern is how the suspend/resume code
> > works with device drivers on an SMP system.  My initial impression was
> > that the subsystem registers with the suspend/resume layer and upon such
> > actions those registered functions are called.
> >
> > Inside those functions I saved the previous state of the watchdog timer.
> > However, I learned today that my understanding was incorrect.  Instead
> > first the _hotplug_ code is called for every cpu _except_ cpu0.  The
> > _suspend/resume_ functions are only called in the context of _cpu0_.
> >
> > This breaks the design I have because upon resuming the watchdog timers
> > automatically start on all cpus (except cpu0 because I saved the previous
> > state through the handlers), regardless of what the previous state was.
> >
> > So my question is/was what is the proper way to handle processor level
> > subsystems during the suspend/resume path on an SMP system.  I really
> > don't understand the hotplug path nor the suspend/resume path very well.
> >
> > I didn't want to register a hotplug handler because a hotplug event is
> > really different than a suspend event (I want to _save_ info during a
> > suspend event).  The documentation I was reading seemed to suggest that
> > hotplug/suspend/smp was a work-in-progress.
> >
> > Is the typical approach to just hack in an extra parameter to the
> > start/stop functions of the nmi_watchdog letting the function know it is
> > coming through the suspend/resume path?
> >
> > Any tips, code, other docs would be helpful.
>
> OK...  My understanding of how it works is that the cpu hotplug handlers
> are called early in the suspend process to take the CPUs down.  Once all
> the APs are shut down, CPU0 will then proceed to handle the devices.
>
> So if you want to save and restore per-cpu NMI state then doing it in the
> CPU hot-add and hot-remove handlers is appropriate.  It will affect the
> behaviour of _real_ CPU hot-add and hot-remove as well.  But in what
> appears to be a correct fashion.
>
> All the above applies to suspend-to-disk.  I don't know if suspend-to-RAM
> shuts down the APs.

I'm not sure about suspend to ram either, but I can confirm the rest:

* Hotplugging handlers should manage the state of secondary cpus, really 
disabling NMIs on unplug and only enabling NMIs if the boot processor has 
them enabled (assuming consistent behaviour across cpus is desired). At a 
hotplug event, both the hardware state and values of variables may not match 
the state at the end of a previous hotplug event (we may have powered off, or 
may have switched from a boot kernel to a suspended-to-disk one), so even if 
you know the hotplug event is synthesised, you may still need to treat it as 
uninitialised.
* Driver suspend and resume calls should only handle cpu0, and should not 
touch other processors. The same semantics regarding hardware state and 
values of variables apply here.

If you _really_ need to store in a variable what the hardware state was at the 
last suspend call, it is possible to use _nosave variables, but this isn't 
done or recommended at the moment. (I think it's a good idea, but that's just 
my opinion). The value of a _nosave variable will persist across the atomic 
restore of a suspend to disk kernel context, and since a drivers_suspend call 
is made before doing the atomic restore, you'll never get uninitialised 
values. I'm not sure what it would be helpful for, but there may be a case.

Hope this helps.

Regards,

Nigel
-- 
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

  parent reply	other threads:[~2006-06-06 23:37 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-06-02 22:51 [2.6.17-rc5-mm2] crash when doing second suspend: BUG in arch/i386/kernel/nmi.c:174 Jeremy Fitzhardinge
2006-06-04 11:47 ` Rafael J. Wysocki
2006-06-05  7:21   ` Jeremy Fitzhardinge
2006-06-05  7:37 ` Jeremy Fitzhardinge
2006-06-05  7:48   ` Andrew Morton
2006-06-05  7:59     ` Jeremy Fitzhardinge
2006-06-05  8:35     ` Miles Lane
2006-06-06  6:44       ` Shaohua Li
2006-06-06 14:17         ` Don Zickus
2006-06-06 14:18           ` Andi Kleen
2006-06-06 21:45             ` Don Zickus
2006-06-06 22:15               ` Andrew Morton
2006-06-06 23:05                 ` Don Zickus
2006-06-06 23:22                   ` Andrew Morton
2006-06-06 23:27                     ` Jeremy Fitzhardinge
2006-06-06 23:32                       ` Andi Kleen
2006-06-06 23:42                       ` Don Zickus
2006-06-08 20:11                         ` Pavel Machek
2006-06-06 23:38                     ` Nigel Cunningham [this message]
2006-06-07  0:06                       ` Jeremy Fitzhardinge
2006-06-07  0:13                         ` Nigel Cunningham
2006-06-07  0:24                           ` Andrew Morton
2006-06-07  0:29                             ` Jeremy Fitzhardinge
2006-06-07  0:31                             ` Nigel Cunningham
2006-06-07  0:33                             ` Andi Kleen
2006-06-07  0:40                               ` Nigel Cunningham
2006-06-07  0:26                           ` Jeremy Fitzhardinge
2006-06-07  0:33                             ` Nigel Cunningham
2006-06-07  0:56                               ` Jeremy Fitzhardinge
2006-06-08 20:13                         ` Pavel Machek
2006-06-08 12:45                     ` Pavel Machek
2006-06-06 23:34                   ` Andi Kleen
2006-06-06 23:55                     ` Don Zickus
2006-06-07  0:04                       ` Andi Kleen
2006-06-07  0:05                       ` Nigel Cunningham
2006-06-07  0:42                         ` Don Zickus
2006-06-07  0:50                           ` Nigel Cunningham
2006-06-07  3:29                             ` [linux-pm] " David Brownell
2006-06-07  9:55                           ` Rafael J. Wysocki
2006-06-08 20:27                           ` Pavel Machek
2006-06-06 16:23         ` Jeremy Fitzhardinge
2006-06-06 16:51           ` Don Zickus
2006-06-07  2:49           ` Don Zickus
2006-06-07 16:33             ` Andi Kleen
2006-06-07 17:07             ` Jeremy Fitzhardinge
2006-06-07 17:50               ` Don Zickus
2006-06-07 18:53                 ` Jeremy Fitzhardinge

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200606070938.34927.ncunningham@linuxmail.org \
    --to=ncunningham@linuxmail.org \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=dzickus@redhat.com \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miles.lane@gmail.com \
    --cc=shaohua.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.