From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: ego@linux.vnet.ibm.com,
Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
matt@ozlabs.org, kexec@lists.infradead.org,
linux-kernel@vger.kernel.org, suzuki@in.ibm.com,
ebiederm@xmission.com, paulus@samba.org,
linuxppc-dev@lists.ozlabs.org, Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode
Date: Sat, 07 Jun 2014 02:46:40 +0530
Message-ID: <53922FB8.6070408@linux.vnet.ibm.com>
In-Reply-To: <5391B413.100@linux.vnet.ibm.com>

On 06/06/2014 05:59 PM, Srivatsa S. Bhat wrote:
> On 06/04/2014 03:39 AM, Benjamin Herrenschmidt wrote:
>> On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
>>> Yep, that makes sense. But unfortunately I don't have enough insight into
>>> why exactly powerpc has to online the CPUs before doing a kexec. I just
>>> know from the commit log and the comment mentioned above (and from my own
>>> experiments) that the CPUs will get stuck if they were offline. Perhaps
>>> somebody more knowledgeable can explain this in detail and suggest a proper
>>> long-term solution.
>>>
>>> Matt, Ben, any thoughts on this?
>>
>> The problem is with our "soft offline" which we do on some platforms. When we
>> offline we don't actually send the CPUs back to firmware or anything like that.
>>
>> We put them into a very low power loop inside Linux.
>>
>> The new kernel has no way to extract them from that loop. So we must re-"online"
>> them before we kexec so they can be passed to the new kernel normally (or returned
>> to firmware like we do on powernv).
>>
>
> Thanks a lot for the explanation Ben!
>
> I thought about this and this is what I think: whether the CPU is in the kernel
> or in the firmware is a hard-boundary. But once we know it is still in the
> kernel, whether it is online or offline is a soft-boundary, something that
> ideally shouldn't make any difference to kexec.
>
> Then I looked at what is that special state that kexec expects the online CPUs
> to be in, before performing kexec, and I found that that state is entered via
> kexec_smp_down().
>
> Which means, if we poke the soft-offline CPUs and make them execute
> kexec_smp_down(), we should be able to do a successful kexec without having to
> actually online them. After all, the core kexec code doesn't mandate that they
> should be online. So if we satisfy powerpc's requirement that all the CPUs are
> in a sane state, that should be good enough. (This would be similar to how the
> subcore code wakes up offline CPUs to perform the split-core procedure).
>
> I know, this is all theory for now since I haven't tested it yet, but I think
> we can make this work.
>
> Below are the 4 preliminary patches I have so far, to implement this.
>
And with the following hunk added (which I had forgotten earlier), it worked just
fine on powernv :-)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 2ef6c58..84e91293 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -243,6 +243,9 @@ static void wake_offline_cpus(void)
 {
 	int cpu = 0;
 
+	if (ppc_md.kexec_wake_prepare)
+		ppc_md.kexec_wake_prepare();
+
 	for_each_present_cpu(cpu) {
 		if (!cpu_online(cpu)) {
 			printk(KERN_INFO "kexec: Waking offline cpu %d.\n",

I tried putting the machine into ST mode and, in a separate experiment, keeping
just CPU 0 online in the first kernel, and then issued a kexec. In both cases the
second kernel booted successfully with all the CPUs.

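For anyone who wants to see the shape of the hunk in isolation, here is a minimal userspace sketch of the same optional-hook pattern: call a platform callback only if the platform registered one, then walk the present CPUs and wake those that are offline. Everything in it is a stand-in and an assumption for illustration: the `machdep_calls` struct, the two boolean maps, `NR_CPUS`, the return value and the "wake" step (just flipping a flag) are not the kernel's definitions, and the real `wake_offline_cpus()` returns void and does considerably more.

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8	/* hypothetical CPU count for the sketch */

/* Hypothetical stand-in for the kernel's machine-description hooks. */
struct machdep_calls {
	void (*kexec_wake_prepare)(void);	/* optional, may be NULL */
};

static struct machdep_calls ppc_md;

/* Hypothetical stand-ins for the present/online CPU masks. */
static bool cpu_present_map[NR_CPUS];
static bool cpu_online_map[NR_CPUS];

/*
 * Wake every CPU that is present but offline. The platform hook runs
 * first, and only when the platform has registered one. Returns the
 * number of CPUs woken (a convenience for this sketch only).
 */
static int wake_offline_cpus(void)
{
	int cpu, woken = 0;

	if (ppc_md.kexec_wake_prepare)
		ppc_md.kexec_wake_prepare();

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!cpu_present_map[cpu] || cpu_online_map[cpu])
			continue;
		printf("kexec: Waking offline cpu %d.\n", cpu);
		cpu_online_map[cpu] = true;	/* placeholder for the real wake-up */
		woken++;
	}

	return woken;
}
```

The NULL check is what lets platforms that need no preparation leave the hook unset, so only a platform that registers `kexec_wake_prepare` pays for the extra step.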
I haven't explored the crashed-kernel case though; it might need some auditing
to check whether the code handles that as well.

Regards,
Srivatsa S. Bhat