From: Michael Neuling <mikey@neuling.org>
To: Matt Evans <matt@ozlabs.org>
Cc: linuxppc-dev@ozlabs.org, kexec@lists.infradead.org,
Milton Miller <miltonm@bga.com>
Subject: Re: [PATCH v2] powerpc/kexec: Fix orphaned offline CPUs across kexec
Date: Fri, 30 Jul 2010 13:15:14 +1000
Message-ID: <20694.1280459714@neuling.org>
In-Reply-To: <4C521FD2.4050301@ozlabs.org>
(adding kexec list to CC)
In message <4C521FD2.4050301@ozlabs.org> you wrote:
> Michael Neuling wrote:
> > In message <4C511216.30109@ozlabs.org> you wrote:
> >> When CPU hotplug is used, some CPUs may be offline at the time a kexec is
> >> performed. The subsequent kernel may expect these CPUs to be already running,
> >> and will declare them stuck. On pseries, there's also a soft-offline (cede)
> >> state that CPUs may be in; this can also cause problems as the kexeced kernel
> >> may ask RTAS if they're online -- and RTAS would say they are. Again, stuck.
> >>
> >> This patch kicks each present offline CPU awake before the kexec, so that
> >> none are lost to these assumptions in the subsequent kernel.
> >
> > There are a lot of cleanups in this patch. The change you are making
> > would be a lot clearer without all the additional cleanups in there. I
> > think I'd like to see this as two patches. One for cleanups and one for
> > the addition of wake_offline_cpus().
>
> Okay, I can split this. Typofixy-add-debug in one, wake_offline_cpus
> in another.
Thanks.
>
> > Other than that, I'm not completely convinced this is the functionality
> > we want. Do we really want to online these cpus? Why were they
> > offlined in the first place? I understand the stuck problem, but is the
> > solution to online them, or to change the device tree so that the second
> > kernel doesn't detect them as stuck?
>
> Well... There are two cases. If a CPU is soft-offlined on pseries, it
> must be woken from that cede loop (in
> platforms/pseries/hotplug-cpu.c) as we're replacing code under its
> feet. We could either special-case the wakeup from this cede loop to
> get that CPU to RTAS "stop-self" itself properly. (Kind of like a
> "wake to die".)
Makes sense.
> So that leaves hard-offline CPUs (perhaps including the above): I
> don't know why they might have been offlined. If it's something
> serious, like fire, they'd be removed from the present set too (and
> thus not be considered in this restarting case). We could add a mask
> to the CPU node to show which of the threads (if any) are running, and
> alter the startup code to start everything if this mask doesn't exist
> (non-kexec) or only online currently-running threads if the mask is
> present. That feels a little weird.
>
> My reasoning for restarting everything was: The first time you boot,
> all of your present CPUs are started up. When you reboot, any CPUs
> you offlined for fun are restarted. Kexec is (in this non-crash
> sense) a user-initiated 'quick reboot', so I reasoned that it should
> look the same as a 'hard reboot' and your new invocation would have
> all available CPUs running as is usual.
OK, I like this justification. Would be good to include it in the
checkin comment since we're changing functionality somewhat.
Mikey
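
For readers following the thread, the behaviour being agreed on (bring every
present-but-offline CPU back up before the kexec, so the next kernel sees the
same set a hard reboot would) can be sketched roughly as below. This is a
user-space simulation only, not the patch itself: struct cpu_state,
sim_cpu_up() and the bitmask representation are hypothetical stand-ins for the
kernel's present/online CPU masks and its CPU bring-up path. Only the name
wake_offline_cpus() comes from the discussion above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_CPUS 64

/* Hypothetical stand-in for the kernel's present and online CPU masks. */
struct cpu_state {
	uint64_t present;	/* bit n set: CPU n exists in the system */
	uint64_t online;	/* bit n set: CPU n is currently running */
};

static int cpu_is_present(const struct cpu_state *s, int cpu)
{
	return (int)((s->present >> cpu) & 1);
}

static int cpu_is_online(const struct cpu_state *s, int cpu)
{
	return (int)((s->online >> cpu) & 1);
}

/* Hypothetical stand-in for the real CPU bring-up path. */
static void sim_cpu_up(struct cpu_state *s, int cpu)
{
	s->online |= (uint64_t)1 << cpu;
}

/*
 * Wake every CPU that is present but offline, leaving CPUs that are
 * not present untouched. Returns the number of CPUs woken.
 */
static int wake_offline_cpus(struct cpu_state *s)
{
	int woken = 0;

	assert(s != NULL);

	for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
		if (cpu_is_present(s, cpu) && !cpu_is_online(s, cpu)) {
			printf("kexec: waking offline cpu %d\n", cpu);
			sim_cpu_up(s, cpu);
			woken++;
		}
	}
	return woken;
}
```

CPUs removed from the present set (the "fire" case mentioned above) are never
touched by this loop, which matches the reasoning that only CPUs a normal
reboot would start should be restarted.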