From: Laurent Vivier <lvivier@redhat.com>
To: David Gibson <dgibson@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
Michael Ellerman <mpe@ellerman.id.au>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
thuth@redhat.com
Subject: Re: [PATCH] powerpc: on crash, kexec'ed kernel needs all CPUs are online
Date: Fri, 16 Oct 2015 09:57:24 +0200
Message-ID: <5620ADE4.9060701@redhat.com>
In-Reply-To: <20151016132943.1386fda6@voom.fritz.box>
On 16/10/2015 04:29, David Gibson wrote:
> On Thu, 15 Oct 2015 21:00:58 +0200 Laurent Vivier
> <lvivier@redhat.com> wrote:
>
>> On kexec, all secondary offline CPUs are onlined before starting
>> the new kernel; this is not done in the case of kdump.
>>
>> If kdump is configured and a kernel crash occurs while some
>> secondary CPUs are offline (SMT=off), the new kernel is not
>> able to start them and displays some "Processor X is stuck." messages.
>>
>> Starting with POWER8, subcore logic relies on all threads of a core
>> being booted. So, on startup, the kernel tries to start all threads
>> and asks OPAL (or RTAS) to start all CPUs (including threads). If
>> a CPU has been offlined by the previous kernel, it has not been
>> returned to OPAL, so OPAL cannot restart it: the CPU has
>> been lost.
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>
> Nice analysis of the problem. But I'm a bit uneasy about this
> approach to fixing it: onlining potentially hundreds of CPU threads
> seems like a risky operation in a kernel that has already crashed.
I agree.
> I don't have a terribly clear idea of what is the best way to
> address this. Here are a few ideas in the right general direction:
>
> * I'm already looking into kdump userspace fixes to stop it
> from attempting to bring up secondary CPUs
>
> * A working kernel option to say "only allow this many online cpus
> ever" which we could pass to the kdump kernel would be nice
>
> * Paulus had an idea about offline threads returning themselves
> directly to OPAL by kicking a flag at kdump/kexec time.
For me, the problem is: as these CPUs are offline, I guess the core has
been switched to 1 thread per core, so the CPUs (1 to 7 for core 0)
don't exist anymore. How can we return them to OPAL?
>
> BenH, Paulus,
>
> OPAL <-> kernel CPU transitions don't seem to work quite how I
> thought they would. IIUC there's a register we can use to directly
> control which threads on a core are active. Given that, I would
> have thought CPU "ownership" (OPAL vs. kernel) would be on a
> per-core rather than a per-thread basis.
>
> Is there some way we can change the CPU onlining / offlining code
> so that if threads aren't in OPAL, we directly enable them, rather
> than just hoping they're in a nap loop somewhere?
>
Laurent