public inbox for kexec@lists.infradead.org
 help / color / mirror / Atom feed
From: Stephen Warren <swarren@wwwdotorg.org>
To: Will Deacon <will.deacon@arm.com>, Joseph Lo <josephl@nvidia.com>,
	Peter De Schrijver <pdeschrijver@nvidia.com>
Cc: "linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>,
	"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
	Stephen Warren <swarren@nvidia.com>,
	Eric Biederman <ebiederm@xmission.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH] kexec: disable non-boot CPUs
Date: Thu, 20 Dec 2012 13:15:30 -0700	[thread overview]
Message-ID: <50D371E2.1030807@wwwdotorg.org> (raw)
In-Reply-To: <50D35214.4090008@wwwdotorg.org>

On 12/20/2012 10:59 AM, Stephen Warren wrote:
> On 12/20/2012 10:36 AM, Will Deacon wrote:
>> On Thu, Dec 20, 2012 at 05:21:56PM +0000, Stephen Warren wrote:
>>> On 12/20/2012 03:49 AM, Will Deacon wrote:
>>>> If you do manage to get this merged, please can you follow up with a patch
>>>> to remove the smp_kill_cpus bits from arch/arm/kernel/smp.c please? It only
>>>> exists as a hook to do exactly this and currently nobody is using it afaict.
>>>
>>> I originally implemented this in
>>> arch/arm/kernel/process.c:machine_shutdown(), which currently is:
>>>
>>> void machine_shutdown(void)
>>> {
>>> #ifdef CONFIG_SMP
>>> 	smp_send_stop();
>>> #endif
>>> }
>>>
>>> and I changed it to something like:
>>>
>>> void machine_shutdown(void)
>>> {
>>> #ifdef CONFIG_HOTPLUG_CPU
>>>  	disable_nonboot_cpus();
>>> #elifdef CONFIG_SMP
>>> 	smp_send_stop();
>>> #endif
>>> }
>>>
>>> ... but then figured that moving it up into the core kexec code would be
>>> better, so that everything always worked the same way.
>>
>> Hmmm, isn't this racy: requiring the secondaries to hit idle and notice
>> they're offline and call cpu_die before the primary has replace the kernel
>> image?
> 
> Isn't disable_nonboot_cpus() synchronous? If not, I imagine my original
> patch wasn't any better in this respect, except that the hotunplug
> happened earlier, and hence reduced the likelihood of actually seeing
> any such issues.
> 
>>> Anyway, the change above addresses Eric's concern about isolating the
>>> change to ARM. Does that seem like a reasonable thing for the ARM code
>>> to do?
>>
>> I think you're better off using what we currently have and hanging your code
>> off platform_cpu_kill.
> 
> OK, I'll look into that. Joseph Lo just posted patches to implement
> cpu_kill() on Tegra, which was needed to fix some issues in our hotplug
> code anyway. Perhaps that will remove the need for any other changes...

Will,

I just remembered one other advantage of disable_nonboot_cpus(); it
always makes the kexec happen on the boot CPU. Without this, I believe
it's random whether CPU0 or CPU1 performs the kexec. I suspect it's most
likely to work if we can always kexec on the boot CPU rather than a
random CPU?

Joseph, Peter,

As you know, I've been looking into kexec[2] on Tegra. Here's a summary
of a few tests I did:

linux-next plus nothing in particular, SMP enabled:
* Hangs/crashes during kexec

linux-next + SMP disabled: kexec works

linux-next + hotunplug CPUs other than CPU0 before kexec: kexec works

Will Deacon suggested this was due to a missing implementation of struct
smp_operations .cpu_kill on Tegra, which means that when CPUn are
present, they're simply spinning executing code, and kexec will
eventually overwrite that code, causing all manner of problems. So,
since I noticed that Joseph posted an implementation of .cpu_kill for
Tegra, I tried that out[1]. It certainly doesn't solve the problem, and
in fact actually causes the kernel performing the kexec (rather than the
kexec'd kernel) to hang:

> [   26.291903] Starting new kernel
> [   47.309571] INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=2102 jiffies, g=211, c=210, q=37)
> [   47.323410] Task dump for CPU 1:
> [   47.329763] dd              R running      0   401      1 0x00000000
> [   47.339343] [<c0521690>] (__schedule+0x360/0x600) from [<c002d9b0>] (do_syslog+0x2b4/0x478)
> [   47.350952] [<c002d9b0>] (do_syslog+0x2b4/0x478) from [<ed86bb08>] (0xed86bb08)

Manually hotunplugging the CPUs first still works OK with those patches
applied though.

I /think/ kexec calls .cpu_kill() without having caused the CPU itself
to call .cpu_die() first? Did you allow for that possibility inside the
implementation you posted?

[1]
http://patchwork.ozlabs.org/patch/207601/
http://patchwork.ozlabs.org/patch/207602/

[2]
http://en.wikipedia.org/wiki/Kexec

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

  reply	other threads:[~2012-12-20 20:15 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-19 23:44 [PATCH] kexec: disable non-boot CPUs Stephen Warren
2012-12-19 23:55 ` Eric W. Biederman
2012-12-20 10:49 ` Will Deacon
2012-12-20 17:21   ` Stephen Warren
2012-12-20 17:36     ` Will Deacon
2012-12-20 17:59       ` Stephen Warren
2012-12-20 20:15         ` Stephen Warren [this message]
2012-12-23 11:06           ` Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50D371E2.1030807@wwwdotorg.org \
    --to=swarren@wwwdotorg.org \
    --cc=ebiederm@xmission.com \
    --cc=josephl@nvidia.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-tegra@vger.kernel.org \
    --cc=pdeschrijver@nvidia.com \
    --cc=swarren@nvidia.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox