On Wed, 4 Nov 2015 14:54:51 +0100
Laurent Vivier <lvivier@redhat.com> wrote:
On 04/11/2015 13:34, Hari Bathini wrote:
On 10/16/2015 12:30 AM, Laurent Vivier wrote:
On kexec, all offline secondary CPUs are onlined before
starting the new kernel; this is not done in the case of kdump.
If kdump is configured and a kernel crash occurs while
some secondary CPUs are offline (SMT=off),
the new kernel is not able to start them and displays
"Processor X is stuck." messages.
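For illustration only, here is a user-visible analogue of what kexec does (bringing every offline secondary CPU back online before the jump), sketched against the standard CPU-hotplug sysfs files. This is a minimal Python sketch that assumes root privileges and the usual /sys/devices/system/cpu/cpuN/online layout. The function name online_all_cpus and its sysfs_cpu_root parameter are hypothetical, and this is not the in-kernel mechanism the patch touches.

```python
import os
import re

# Matches per-CPU sysfs directories such as "cpu0" or "cpu17",
# but not siblings like "cpufreq" or the top-level "online" file.
_CPU_DIR = re.compile(r"^cpu(\d+)$")


def online_all_cpus(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Write "1" to every per-CPU 'online' file that currently reads "0".

    Returns the sorted list of CPU numbers that were brought online.
    CPUs without an 'online' file (often cpu0, which usually cannot be
    hot-unplugged) are skipped. Needs root when run against real sysfs.
    """
    onlined = []
    for name in os.listdir(sysfs_cpu_root):
        match = _CPU_DIR.match(name)
        if not match:
            continue
        online_path = os.path.join(sysfs_cpu_root, name, "online")
        if not os.path.exists(online_path):
            continue
        with open(online_path) as f:
            state = f.read().strip()
        if state == "0":
            # On real sysfs this write asks the kernel to start the CPU.
            with open(online_path, "w") as f:
                f.write("1")
            onlined.append(int(match.group(1)))
    return sorted(onlined)
```

Pointing sysfs_cpu_root at a scratch directory lets the logic be exercised without touching real hardware.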
Starting with POWER8, subcore logic relies on all threads of a
core being booted. So, on startup, the kernel tries to start all
threads and asks OPAL (or RTAS) to start all CPUs (including
threads). If a CPU has been offlined by the previous kernel,
it has not been returned to OPAL, and thus OPAL cannot restart
it: this CPU has been lost...
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
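As a toy model of which threads end up "stuck", the sketch below groups offline CPUs by core. It assumes that the threads of a core have consecutive logical CPU numbers and that threads_per_core is a power of two (8 for POWER8 in SMT8 mode). parse_cpulist reads the range format used by /sys/devices/system/cpu/online. Both helper names are hypothetical.

```python
def parse_cpulist(text):
    """Parse a sysfs cpulist such as "0-3,8" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def offline_threads_by_core(online_cpus, nr_cpus, threads_per_core=8):
    """Map the first CPU of each core to that core's offline threads.

    Assumes threads of a core are numbered consecutively, so the first
    sibling is found by clearing the low bits of the CPU number.
    """
    if threads_per_core <= 0 or threads_per_core & (threads_per_core - 1):
        raise ValueError("threads_per_core must be a power of two")
    online = set(online_cpus)
    lost = {}
    for cpu in range(nr_cpus):
        if cpu not in online:
            first = cpu & ~(threads_per_core - 1)
            lost.setdefault(first, []).append(cpu)
    return lost
```

With SMT=off on a two-core SMT8 system, the online list reads "0,8". Every other thread is reported offline: threads 1-7 and 9-15 are the ones the patch description says the kdump kernel cannot start.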
Hi Laurent,
Hi Hari,
Sorry for jumping into this so late.
better late than never :)
Are you seeing this issue even with the below patches:
pseries:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c1caae3de46a072d0855729aed6e793e536a4a55
Unfortunately, this is unlikely to be relevant: it fixes a failure
while setting up the kexec. The problem we see occurs once we've
booted the second kernel and it's attempting to bring up secondary CPUs.
opal/powernv:
https://github.com/open-power/skiboot/commit/9ee56b5
Very interesting. Is there a firmware available with this fix?
From Laurent's analysis of the crash, I don't think this will be