* [PATCH] powerpc/pseries: Fix scv instruction crash with kexec
From: Nicholas Piggin @ 2024-06-25 13:40 UTC
To: linuxppc-dev; +Cc: Sourabh Jain, Nicholas Piggin
kexec on pseries disables AIL (reloc_on_exc), required for scv
instruction support, before other CPUs have been shut down. This means
they can execute scv instructions after AIL is disabled, which causes an
interrupt at an unexpected entry location that crashes the kernel.
Change the kexec sequence to disable AIL after other CPUs have been
brought down.
As a refresher, the real-mode scv interrupt vector is 0x17000, and the
fixed-location head code probably couldn't easily deal with implementing
such high addresses so it was just decided not to support that interrupt
at all.
Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kexec/core_64.c | 11 +++++++++++
arch/powerpc/platforms/pseries/kexec.c | 8 --------
arch/powerpc/platforms/pseries/pseries.h | 1 -
arch/powerpc/platforms/pseries/setup.c | 1 -
4 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 85050be08a23..72b12bc10f90 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -27,6 +27,7 @@
#include <asm/paca.h>
#include <asm/mmu.h>
#include <asm/sections.h> /* _end */
+#include <asm/setup.h>
#include <asm/smp.h>
#include <asm/hw_breakpoint.h>
#include <asm/svm.h>
@@ -317,6 +318,16 @@ void default_machine_kexec(struct kimage *image)
if (!kdump_in_progress())
kexec_prepare_cpus();
+#ifdef CONFIG_PPC_PSERIES
+ /*
+ * This must be done after other CPUs have shut down, otherwise they
+ * could execute the 'scv' instruction, which is not supported with
+ * reloc disabled (see configure_exceptions()).
+ */
+ if (firmware_has_feature(FW_FEATURE_SET_MODE))
+ pseries_disable_reloc_on_exc();
+#endif
+
printk("kexec: Starting switchover sequence.\n");
/* switch to a staticly allocated stack. Based on irq stack code.
diff --git a/arch/powerpc/platforms/pseries/kexec.c b/arch/powerpc/platforms/pseries/kexec.c
index 096d09ed89f6..431be156ca9b 100644
--- a/arch/powerpc/platforms/pseries/kexec.c
+++ b/arch/powerpc/platforms/pseries/kexec.c
@@ -61,11 +61,3 @@ void pseries_kexec_cpu_down(int crash_shutdown, int secondary)
} else
xics_kexec_teardown_cpu(secondary);
}
-
-void pseries_machine_kexec(struct kimage *image)
-{
- if (firmware_has_feature(FW_FEATURE_SET_MODE))
- pseries_disable_reloc_on_exc();
-
- default_machine_kexec(image);
-}
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index bba4ad192b0f..3968a6970fa8 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -38,7 +38,6 @@ static inline void smp_init_pseries(void) { }
#endif
extern void pseries_kexec_cpu_down(int crash_shutdown, int secondary);
-void pseries_machine_kexec(struct kimage *image);
extern void pSeries_final_fixup(void);
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 284a6fa04b0c..b44de0f0822f 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -1159,7 +1159,6 @@ define_machine(pseries) {
.machine_check_exception = pSeries_machine_check_exception,
.machine_check_log_err = pSeries_machine_check_log_err,
#ifdef CONFIG_KEXEC_CORE
- .machine_kexec = pseries_machine_kexec,
.kexec_cpu_down = pseries_kexec_cpu_down,
#endif
#ifdef CONFIG_MEMORY_HOTPLUG
--
2.45.1
* Re: [PATCH] powerpc/pseries: Fix scv instruction crash with kexec
From: Michael Ellerman @ 2024-06-26 9:27 UTC
To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin, Sourabh Jain
Nicholas Piggin <npiggin@gmail.com> writes:
> kexec on pseries disables AIL (reloc_on_exc), required for scv
> instruction support, before other CPUs have been shut down. This means
> they can execute scv instructions after AIL is disabled, which causes an
> interrupt at an unexpected entry location that crashes the kernel.
>
> Change the kexec sequence to disable AIL after other CPUs have been
> brought down.
>
> As a refresher, the real-mode scv interrupt vector is 0x17000, and the
> fixed-location head code probably couldn't easily deal with implementing
> such high addresses so it was just decided not to support that interrupt
> at all.
>
> Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Was this reported publicly? I don't remember it.
cheers
* Re: [PATCH] powerpc/pseries: Fix scv instruction crash with kexec
From: Gautam Menghani @ 2024-06-26 9:40 UTC
To: Nicholas Piggin; +Cc: linuxppc-dev, Sourabh Jain
Without this patch, if some CPUs are disabled in the system and we
do a two-stage kexec as follows:
kexec -l vmlinux ....
kexec -e
we hit the following Oops:
[ 2598.923098] kernel BUG at arch/powerpc/kernel/exceptions-64s.S:501!
[ 2598.923103] Oops: Exception in kernel mode, sig: 5 [#1]
[ 2598.923107] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 2598.923111] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bridge stp llc kvm_hv kvm bonding tls rfkill binfmt_misc tg3 vmx_crypto aes_gcm_p10_crypto ibmveth crct10dif_vpmsum pseries_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc fuse loop dm_multipath nfnetlink zram xfs ibmvscsi scsi_transport_srp crc32c_vpmsum pseries_wdt scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables
[ 2598.923167] CPU: 11 PID: 1548 Comm: systemd-journal Not tainted 6.9.0+ #4
[ 2598.923171] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_022) hv:phyp pSeries
[ 2598.923176] NIP: c0000000000089e4 LR: 00007fffaa1427c4 CTR: c0000000000089b0
[ 2598.923180] REGS: c0000008dfe7fd60 TRAP: 0700 Not tainted (6.9.0+)
[ 2598.923184] MSR: 8000000000021031 <SF,ME,IR,DR,LE> CR: 28002413 XER: 00000000
[ 2598.923192] CFAR: c0000000000089dc IRQMASK: 0
[ 2598.923192] GPR00: 0000000000000003 00007ffff40fb110 0000000000000000 0000000000000009
[ 2598.923192] GPR04: 00007ffff40fbcf0 0000000000002000 00007ffff40fdcc0 0000000000000000
[ 2598.923192] GPR08: 00007fffaabc3b80 0000000048002413 00007ffff40fb3e0 0000000000017000
[ 2598.923192] GPR12: 8000000000009003 c0000008dfff2b00 0000000000000000 0000000000000000
[ 2598.923192] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2598.923192] GPR20: 0000000000000000 0000000000000000 0000000000000000 00007fffaabaf448
[ 2598.923192] GPR24: 000000011bc72700 00007ffff40fddf8 0000000132490ea0 00007ffff40fddf0
[ 2598.923192] GPR28: 0000000000000000 00007ffff40fbcf0 0000000000002000 0000000000000009
[ 2598.923238] NIP [c0000000000089e4] data_access_common_virt+0x14/0x220
[ 2598.923245] LR [00007fffaa1427c4] 0x7fffaa1427c4
[ 2598.923251] Call Trace:
[ 2598.923253] Code: 2c0a0000 39400300 408242c0 e94d0020 694a0002 7d400164 60420000 718a4000 7c2a0b78 3821fd30 41c20008 e82d0910 <0981fd30> f9210160 f9610130 f9810138
[ 2598.923269] ---[ end trace 0000000000000000 ]---
[ 2598.926662] pstore: backend (nvram) writing error (-1)
With this patch, the disabled cpus are woken up and kexec goes through
fine.
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
* Re: [PATCH] powerpc/pseries: Fix scv instruction crash with kexec
From: Sourabh Jain @ 2024-06-26 9:46 UTC
To: Michael Ellerman, Nicholas Piggin, linuxppc-dev
Hello Michael,
On 26/06/24 14:57, Michael Ellerman wrote:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> kexec on pseries disables AIL (reloc_on_exc), required for scv
>> instruction support, before other CPUs have been shut down. This means
>> they can execute scv instructions after AIL is disabled, which causes an
>> interrupt at an unexpected entry location that crashes the kernel.
>>
>> Change the kexec sequence to disable AIL after other CPUs have been
>> brought down.
>>
>> As a refresher, the real-mode scv interrupt vector is 0x17000, and the
>> fixed-location head code probably couldn't easily deal with implementing
>> such high addresses so it was just decided not to support that interrupt
>> at all.
>>
>> Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
>
> Was this reported publicly? I don't remember it.
No, I didn't report this issue publicly.
While debugging a kexec issue, git bisect pointed to the commit mentioned
in the patch description, so I contacted Nick directly.
With --smt=off, `kexec -e` makes the first kernel hit an exception when
wake_offline_cpus() -> add_cpu() is called to bring up the offline CPUs.
Console log:
[ 68.824514] restraintd[899]: * Parsing recipe
[ 68.825546] restraintd[899]: * Running recipe
[ 68.825591] restraintd[899]: ** Continuing task: 20291 [/mnt/tests/distribution/reservesys]
[ 68.834095] restraintd[899]: ** Preparing metadata
[ 68.872927] restraintd[899]: ** Refreshing peer role hostnames: Retries 0
[ 68.911107] restraintd[899]: ** Updating env vars
[ 68.911737] restraintd[899]: *** Current Time: Tue May 21 09:09:42 2024 Localwatchdog at: * Disabled! *
[ 68.922803] restraintd[899]: ** Running task: 20291 [/distribution/reservesys]
[ 78.027943] Removing IBM Power 842 compression device
[ 78.093777] XFS (sda2): Block device removal (0x20) detected at xfs_fs_shutdown+0x34/0x50 [xfs] (fs/xfs/xfs_super.c:1179). Shutting down filesystem.
[ 78.093894] XFS (sda2): Please unmount the filesystem and rectify the problem(s)
[ 83.450854] dm-0: writeback error on inode 17086756, offset 569344, sector 11026136
[ 83.450910] dm-0: writeback error on inode 36421601, offset 0, sector 20772504
[ 84.021819] dm-0: writeback error on inode 36382045, offset 0, sector 20772536
[ 84.094348] dm-0: writeback error on inode 18703102, offset 0, sector 11021000
[ 84.601228] dm-0: writeback error on inode 51268015, offset 0, sector 27663152
[ 84.601468] dm-0: writeback error on inode 58225471, offset 0, sector 34636080
[ 85.370996] kexec_core: Starting new kernel
[ 85.391013] kexec: Waking offline cpu 1.
[ 85.391038] ------------[ cut here ]------------
[ 85.391042] kernel BUG at arch/powerpc/kernel/exceptions-64s.S:501!
[ 85.391047] Oops: Exception in kernel mode, sig: 5 [#1]
[ 85.391051] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 85.391056] Modules linked in: bonding tls rfkill pseries_rng vmx_crypto drm fuse drm_panel_orientation_quirks xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[ 85.391086] CPU: 0 PID: 565 Comm: systemd-journal Kdump: loaded Not tainted 6.9.0+ #1
[ 85.391092] Hardware name: IBM,9008-22L POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.A0 (VL950_144) hv:phyp pSeries
[ 85.391096] NIP: c0000000000089a4 LR: 000000000001703c CTR: c000000000008980
[ 85.391101] REGS: c00000000f76fd60 TRAP: 0700 Not tainted (6.9.0+)
[ 85.391106] MSR: 8000000000021031 <SF,ME,IR,DR,LE> CR: 240022d4 XER: 00000000
[ 85.391116] CFAR: c00000000000899c IRQMASK: 0
[ 85.391116] GPR00: 0000000000000003 00007fffc4f783a0 00007fff9f0a7200 0000010014331bb8
[ 85.391116] GPR04: 00007fffc4f7b078 000000000000c4f6 00007fffc4f7b1d0 00000100143469a0
[ 85.391116] GPR08: 00007fff9f489268 00000000440022d4 00007fffc4f78670 00000000000ac588
[ 85.391116] GPR12: 8000000000009003 c000000002f50000 0000000000000000 0000000000000000
[ 85.391116] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 85.391116] GPR20: 0000000000000000 0000000000000000 0000000127117b48 00000001271185b8
[ 85.391116] GPR24: 0000000127117b90 00007fffc4f7b070 0000010014331540 00007fffc4f7b078
[ 85.391116] GPR28: 0000000000000000 00007fffc4f78f80 000000000000c4f6 0000010014331ba0
[ 85.391173] NIP [c0000000000089a4] data_access_common_virt+0x14/0x220
[ 85.391181] LR [000000000001703c] 0x1703c
[ 85.391186] Call Trace:
[ 85.391189] Code: 48024df9 48000000 60000000 e94d0020 694a0002 7d400164 60000000 718a4000 7c2a0b78 3821fd30 41c20008 e82d0910 <0981fd30> f9210160 f9610130 f9810138
[ 85.391208] ---[ end trace 0000000000000000 ]---
[ 85.394302] pstore: backend (nvram) writing error (-1)
[ 85.394306]
[ 86.394309] Kernel panic - not syncing: Fatal exception
[ 86.399970] Rebooting in 10 seconds..
Thanks,
Sourabh Jain
* Re: [PATCH] powerpc/pseries: Fix scv instruction crash with kexec
From: Michael Ellerman @ 2024-06-28 12:01 UTC
To: Sourabh Jain, Nicholas Piggin, linuxppc-dev
Sourabh Jain <sourabhjain@linux.ibm.com> writes:
> On 26/06/24 14:57, Michael Ellerman wrote:
>> Nicholas Piggin <npiggin@gmail.com> writes:
>>> kexec on pseries disables AIL (reloc_on_exc), required for scv
>>> instruction support, before other CPUs have been shut down. This means
>>> they can execute scv instructions after AIL is disabled, which causes an
>>> interrupt at an unexpected entry location that crashes the kernel.
>>>
>>> Change the kexec sequence to disable AIL after other CPUs have been
>>> brought down.
>>>
>>> As a refresher, the real-mode scv interrupt vector is 0x17000, and the
>>> fixed-location head code probably couldn't easily deal with implementing
>>> such high addresses so it was just decided not to support that interrupt
>>> at all.
>>>
>>> Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
>>
>> Was this reported publicly? I don't remember it.
>
> No, I didn't report this issue publicly.
OK. It's always nice to have a public report so if someone else hits it,
either at the same time, or in the future, they can search the archive
and see that it's been reported.
But this now counts as a public report, so I'll just point the link at
this thread :)
cheers
* Re: [PATCH] powerpc/pseries: Fix scv instruction crash with kexec
From: Sourabh Jain @ 2024-07-01 4:16 UTC
To: Gautam Menghani, Nicholas Piggin; +Cc: linuxppc-dev
On 26/06/24 15:10, Gautam Menghani wrote:
> Without this patch, we had an issue where if we have some cpus disabled
> in the system and we try to do a 2 stage kexec as follows:
>
> kexec -l vmlinux ....
> kexec -e
>
> we would hit the following Oops
>
> [ 2598.923098] kernel BUG at arch/powerpc/kernel/exceptions-64s.S:501!
> [ 2598.923103] Oops: Exception in kernel mode, sig: 5 [#1]
> [ 2598.923107] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> [ 2598.923111] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bridge stp llc kvm_hv kvm bonding tls rfkill binfmt_misc tg3 vmx_crypto aes_gcm_p10_crypto ibmveth crct10dif_vpmsum pseries_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc fuse loop dm_multipath nfnetlink zram xfs ibmvscsi scsi_transport_srp crc32c_vpmsum pseries_wdt scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables
> [ 2598.923167] CPU: 11 PID: 1548 Comm: systemd-journal Not tainted 6.9.0+ #4
> [ 2598.923171] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_022) hv:phyp pSeries
> [ 2598.923176] NIP: c0000000000089e4 LR: 00007fffaa1427c4 CTR: c0000000000089b0
> [ 2598.923180] REGS: c0000008dfe7fd60 TRAP: 0700 Not tainted (6.9.0+)
> [ 2598.923184] MSR: 8000000000021031 <SF,ME,IR,DR,LE> CR: 28002413 XER: 00000000
> [ 2598.923192] CFAR: c0000000000089dc IRQMASK: 0
> [ 2598.923192] GPR00: 0000000000000003 00007ffff40fb110 0000000000000000 0000000000000009
> [ 2598.923192] GPR04: 00007ffff40fbcf0 0000000000002000 00007ffff40fdcc0 0000000000000000
> [ 2598.923192] GPR08: 00007fffaabc3b80 0000000048002413 00007ffff40fb3e0 0000000000017000
> [ 2598.923192] GPR12: 8000000000009003 c0000008dfff2b00 0000000000000000 0000000000000000
> [ 2598.923192] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 2598.923192] GPR20: 0000000000000000 0000000000000000 0000000000000000 00007fffaabaf448
> [ 2598.923192] GPR24: 000000011bc72700 00007ffff40fddf8 0000000132490ea0 00007ffff40fddf0
> [ 2598.923192] GPR28: 0000000000000000 00007ffff40fbcf0 0000000000002000 0000000000000009
> [ 2598.923238] NIP [c0000000000089e4] data_access_common_virt+0x14/0x220
> [ 2598.923245] LR [00007fffaa1427c4] 0x7fffaa1427c4
> [ 2598.923251] Call Trace:
> [ 2598.923253] Code: 2c0a0000 39400300 408242c0 e94d0020 694a0002 7d400164 60420000 718a4000 7c2a0b78 3821fd30 41c20008 e82d0910 <0981fd30> f9210160 f9610130 f9810138
> [ 2598.923269] ---[ end trace 0000000000000000 ]---
> [ 2598.926662] pstore: backend (nvram) writing error (-1)
>
>
> With this patch, the disabled cpus are woken up and kexec goes through
> fine.
Verified the same on an LPAR and made a similar observation to the one
Gautam mentioned above.
Thanks for the fix Nick.
Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
- Sourabh
* Re: [PATCH] powerpc/pseries: Fix scv instruction crash with kexec
From: Michael Ellerman @ 2024-07-06 22:49 UTC
To: linuxppc-dev, Nicholas Piggin; +Cc: Sourabh Jain
On Tue, 25 Jun 2024 23:40:47 +1000, Nicholas Piggin wrote:
> kexec on pseries disables AIL (reloc_on_exc), required for scv
> instruction support, before other CPUs have been shut down. This means
> they can execute scv instructions after AIL is disabled, which causes an
> interrupt at an unexpected entry location that crashes the kernel.
>
> Change the kexec sequence to disable AIL after other CPUs have been
> brought down.
>
> [...]
Applied to powerpc/fixes.
[1/1] powerpc/pseries: Fix scv instruction crash with kexec
https://git.kernel.org/powerpc/c/21a741eb75f80397e5f7d3739e24d7d75e619011
cheers
* Re: [PATCH] powerpc/pseries: Fix scv instruction crash with kexec
From: Michal Suchánek @ 2024-07-09 10:53 UTC
To: Nicholas Piggin; +Cc: linuxppc-dev, Sourabh Jain
Hello,
On Tue, Jun 25, 2024 at 11:40:47PM +1000, Nicholas Piggin wrote:
> kexec on pseries disables AIL (reloc_on_exc), required for scv
> instruction support, before other CPUs have been shut down. This means
> they can execute scv instructions after AIL is disabled, which causes an
> interrupt at an unexpected entry location that crashes the kernel.
>
> Change the kexec sequence to disable AIL after other CPUs have been
> brought down.
>
> As a refresher, the real-mode scv interrupt vector is 0x17000, and the
> fixed-location head code probably couldn't easily deal with implementing
> such high addresses so it was just decided not to support that interrupt
> at all.
>
> Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
It looks like this was only broken by
commit 2ab2d5794f14 ("powerpc/kasan: Disable address sanitization in kexec paths")
This change reverts the kexec parts of that commit.
That is, the fix applies to 5.19+, not 5.9+.
Thanks
Michal
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> arch/powerpc/kexec/core_64.c | 11 +++++++++++
> arch/powerpc/platforms/pseries/kexec.c | 8 --------
> arch/powerpc/platforms/pseries/pseries.h | 1 -
> arch/powerpc/platforms/pseries/setup.c | 1 -
> 4 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 85050be08a23..72b12bc10f90 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -27,6 +27,7 @@
> #include <asm/paca.h>
> #include <asm/mmu.h>
> #include <asm/sections.h> /* _end */
> +#include <asm/setup.h>
> #include <asm/smp.h>
> #include <asm/hw_breakpoint.h>
> #include <asm/svm.h>
> @@ -317,6 +318,16 @@ void default_machine_kexec(struct kimage *image)
> if (!kdump_in_progress())
> kexec_prepare_cpus();
>
> +#ifdef CONFIG_PPC_PSERIES
> + /*
> + * This must be done after other CPUs have shut down, otherwise they
> + * could execute the 'scv' instruction, which is not supported with
> + * reloc disabled (see configure_exceptions()).
> + */
> + if (firmware_has_feature(FW_FEATURE_SET_MODE))
> + pseries_disable_reloc_on_exc();
> +#endif
> +
> printk("kexec: Starting switchover sequence.\n");
>
> /* switch to a staticly allocated stack. Based on irq stack code.
> diff --git a/arch/powerpc/platforms/pseries/kexec.c b/arch/powerpc/platforms/pseries/kexec.c
> index 096d09ed89f6..431be156ca9b 100644
> --- a/arch/powerpc/platforms/pseries/kexec.c
> +++ b/arch/powerpc/platforms/pseries/kexec.c
> @@ -61,11 +61,3 @@ void pseries_kexec_cpu_down(int crash_shutdown, int secondary)
> } else
> xics_kexec_teardown_cpu(secondary);
> }
> -
> -void pseries_machine_kexec(struct kimage *image)
> -{
> - if (firmware_has_feature(FW_FEATURE_SET_MODE))
> - pseries_disable_reloc_on_exc();
> -
> - default_machine_kexec(image);
> -}
> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
> index bba4ad192b0f..3968a6970fa8 100644
> --- a/arch/powerpc/platforms/pseries/pseries.h
> +++ b/arch/powerpc/platforms/pseries/pseries.h
> @@ -38,7 +38,6 @@ static inline void smp_init_pseries(void) { }
> #endif
>
> extern void pseries_kexec_cpu_down(int crash_shutdown, int secondary);
> -void pseries_machine_kexec(struct kimage *image);
>
> extern void pSeries_final_fixup(void);
>
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index 284a6fa04b0c..b44de0f0822f 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -1159,7 +1159,6 @@ define_machine(pseries) {
> .machine_check_exception = pSeries_machine_check_exception,
> .machine_check_log_err = pSeries_machine_check_log_err,
> #ifdef CONFIG_KEXEC_CORE
> - .machine_kexec = pseries_machine_kexec,
> .kexec_cpu_down = pseries_kexec_cpu_down,
> #endif
> #ifdef CONFIG_MEMORY_HOTPLUG
> --
> 2.45.1
>
* Re: [PATCH] powerpc/pseries: Fix scv instruction crash with kexec
From: Michael Ellerman @ 2024-07-09 13:03 UTC
To: Michal Suchánek, Nicholas Piggin; +Cc: linuxppc-dev, Sourabh Jain
Michal Suchánek <msuchanek@suse.de> writes:
> Hello,
>
> On Tue, Jun 25, 2024 at 11:40:47PM +1000, Nicholas Piggin wrote:
>> kexec on pseries disables AIL (reloc_on_exc), required for scv
>> instruction support, before other CPUs have been shut down. This means
>> they can execute scv instructions after AIL is disabled, which causes an
>> interrupt at an unexpected entry location that crashes the kernel.
>>
>> Change the kexec sequence to disable AIL after other CPUs have been
>> brought down.
>>
>> As a refresher, the real-mode scv interrupt vector is 0x17000, and the
>> fixed-location head code probably couldn't easily deal with implementing
>> such high addresses so it was just decided not to support that interrupt
>> at all.
>>
>> Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
>> Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
>
> looks like this is only broken by
> commit 2ab2d5794f14 ("powerpc/kasan: Disable address sanitization in kexec paths")
>
> This change reverts the kexec parts done in that commit.
>
> That is the fix is 5.19+, not 5.9+
Commit 2ab2d5794f14 moved the kexec code from one file to another, but
didn't change when the key function (pseries_disable_reloc_on_exc()) was
called.
The old code was:
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index a3dab15b0a2f..c9fcc30a0365 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -421,16 +421,6 @@ void pseries_disable_reloc_on_exc(void)
}
EXPORT_SYMBOL(pseries_disable_reloc_on_exc);
-#ifdef CONFIG_KEXEC_CORE
-static void pSeries_machine_kexec(struct kimage *image)
-{
- if (firmware_has_feature(FW_FEATURE_SET_MODE))
- pseries_disable_reloc_on_exc();
-
- default_machine_kexec(image);
-}
-#endif
-
ie. pseries_disable_reloc_on_exc() (which disables AIL) is called before
default_machine_kexec() where secondary CPUs are collected.
So AFAICS the bug would still have been there prior to 2ab2d5794f14. But
it's late here so I could be reading it wrong.
cheers
* Re: [PATCH] powerpc/pseries: Fix scv instruction crash with kexec
From: Michal Suchánek @ 2024-07-09 13:10 UTC
To: Michael Ellerman; +Cc: linuxppc-dev, Sourabh Jain, Nicholas Piggin
On Tue, Jul 09, 2024 at 11:03:10PM +1000, Michael Ellerman wrote:
> Michal Suchánek <msuchanek@suse.de> writes:
> > Hello,
> >
> > On Tue, Jun 25, 2024 at 11:40:47PM +1000, Nicholas Piggin wrote:
> >> kexec on pseries disables AIL (reloc_on_exc), required for scv
> >> instruction support, before other CPUs have been shut down. This means
> >> they can execute scv instructions after AIL is disabled, which causes an
> >> interrupt at an unexpected entry location that crashes the kernel.
> >>
> >> Change the kexec sequence to disable AIL after other CPUs have been
> >> brought down.
> >>
> >> As a refresher, the real-mode scv interrupt vector is 0x17000, and the
> >> fixed-location head code probably couldn't easily deal with implementing
> >> such high addresses so it was just decided not to support that interrupt
> >> at all.
> >>
> >> Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> >> Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
> >
> > looks like this is only broken by
> > commit 2ab2d5794f14 ("powerpc/kasan: Disable address sanitization in kexec paths")
> >
> > This change reverts the kexec parts done in that commit.
> >
> > That is the fix is 5.19+, not 5.9+
>
> Commit 2ab2d5794f14 moved the kexec code from one file to another, but
> didn't change when the key function (pseries_disable_reloc_on_exc()) was
> called.
>
> The old code was:
>
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index a3dab15b0a2f..c9fcc30a0365 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -421,16 +421,6 @@ void pseries_disable_reloc_on_exc(void)
> }
> EXPORT_SYMBOL(pseries_disable_reloc_on_exc);
>
> -#ifdef CONFIG_KEXEC_CORE
> -static void pSeries_machine_kexec(struct kimage *image)
> -{
> - if (firmware_has_feature(FW_FEATURE_SET_MODE))
> - pseries_disable_reloc_on_exc();
> -
> - default_machine_kexec(image);
> -}
> -#endif
> -
>
> ie. pseries_disable_reloc_on_exc() (which disables AIL) is called before
> default_machine_kexec() where secondary CPUs are collected.
>
> So AFAICS the bug would still have been there prior to 2ab2d5794f14. But
> it's late here so I could be reading it wrong.
Indeed, I missed that the code was only moved.
Thanks for the clarification.
Michal