* KVM CPU hotplug notifier triggers BUG_ON on arm64
@ 2023-07-01 12:50 Kristina Martsenko
From: Kristina Martsenko @ 2023-07-01 12:50 UTC
To: Marc Zyngier, Oliver Upton, isaku.yamahata, seanjc, pbonzini
Cc: kvmarm, kvm, linux-arm-kernel, James Morse
Hi,

When I try to online a CPU on arm64 while a KVM guest is running, I hit a
BUG_ON(preemptible()) (as well as a WARN_ON). See below for the full log.

This is on kvmarm/next, but seems to have been broken since 6.3. Bisecting it
points at commit:

0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")

Thanks,
Kristina
-->8--
/ # /root/lkvm-static run /root/kimgs/Image -c 1 --console virtio -p "earlycon loglevel=9" -d /root/kvm-rootfs/ &
/ # # lkvm run -k /root/kimgs/Image -m 256 -c 1 --name guest-112
/ # echo 0 > /sys/devices/system/cpu/cpu1/online
[ 2060.783263] psci: CPU1 killed (polled 0 ms)
/ # echo 1 > /sys/devices/system/cpu/cpu1/online
[ 2061.070582] Detected PIPT I-cache on CPU1
[ 2061.070800] GICv3: CPU1: found redistributor 100 region 0:0x000000002f120000
[ 2061.070985] CPU1: Booted secondary processor 0x0000000100 [0x410fd0f0]
[ 2061.071167] ------------[ cut here ]------------
[ 2061.071233] WARNING: CPU: 1 PID: 18 at arch/arm64/kernel/cpufeature.c:3228 this_cpu_has_cap+0x14/0x60
[ 2061.071403] Modules linked in:
[ 2061.071478] CPU: 1 PID: 18 Comm: cpuhp/1 Not tainted 6.4.0-rc3-00072-g192df2aa0113 #80
[ 2061.071606] Hardware name: FVP Base RevC (DT)
[ 2061.071678] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2061.071807] pc : this_cpu_has_cap+0x14/0x60
[ 2061.071922] lr : cpu_hyp_init_context+0x100/0x168
[ 2061.072028] sp : ffff800082a4bd10
[ 2061.072091] x29: ffff800082a4bd10 x28: 0000000000000000 x27: 0000000000000000
[ 2061.072270] x26: 0000000000000000 x25: ffff8000801227c8 x24: 0000000000000001
[ 2061.072447] x23: ffff800081c94008 x22: ffff80008234c7d0 x21: 0000000000000001
[ 2061.072626] x20: 0000000000000001 x19: ffff800081c958b0 x18: 0000000000000006
[ 2061.072803] x17: 000000040044ffff x16: 00500075b5503510 x15: ffff800082a0b920
[ 2061.072984] x14: 0000000000000000 x13: ffff800082351aa0 x12: 00000000000004a4
[ 2061.073159] x11: 0000000000000001 x10: 0000000000000a60 x9 : ffff800082a4bce0
[ 2061.073337] x8 : ffff000800240ac0 x7 : ffff00087f768040 x6 : ffff80008234b010
[ 2061.073518] x5 : ffff00087f75f970 x4 : 0000000000000000 x3 : ffff80008012c140
[ 2061.073696] x2 : 0000000000000005 x1 : 0000000000000000 x0 : 0000000000000039
[ 2061.073868] Call trace:
[ 2061.073923] this_cpu_has_cap+0x14/0x60
[ 2061.074038] _kvm_arch_hardware_enable+0x48/0xa0
[ 2061.074148] kvm_arch_hardware_enable+0x2c/0x60
[ 2061.074263] __hardware_enable_nolock+0x40/0x78
[ 2061.074388] kvm_online_cpu+0x4c/0x6c
[ 2061.074507] cpuhp_invoke_callback+0x100/0x1f4
[ 2061.074631] cpuhp_thread_fun+0xac/0x194
[ 2061.074754] smpboot_thread_fn+0x224/0x248
[ 2061.074893] kthread+0x114/0x118
[ 2061.074996] ret_from_fork+0x10/0x20
[ 2061.075104] ---[ end trace 0000000000000000 ]---
[ 2061.075254] ------------[ cut here ]------------
[ 2061.075316] kernel BUG at arch/arm64/kvm/vgic/vgic-init.c:517!
[ 2061.075405] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 2061.075503] Modules linked in:
[ 2061.075580] CPU: 1 PID: 18 Comm: cpuhp/1 Tainted: G W 6.4.0-rc3-00072-g192df2aa0113 #80
[ 2061.075718] Hardware name: FVP Base RevC (DT)
[ 2061.075790] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2061.075922] pc : kvm_vgic_init_cpu_hardware+0x80/0x84
[ 2061.076061] lr : _kvm_arch_hardware_enable+0x94/0xa0
[ 2061.076169] sp : ffff800082a4bd40
[ 2061.076236] x29: ffff800082a4bd40 x28: 0000000000000000 x27: 0000000000000000
[ 2061.076412] x26: 0000000000000000 x25: ffff8000801227c8 x24: 0000000000000001
[ 2061.076588] x23: ffff800081c94008 x22: ffff80008234c7d0 x21: 0000000000000001
[ 2061.076768] x20: 0000000000000001 x19: ffff800081c958b0 x18: 0000000000000006
[ 2061.076944] x17: 000000040044ffff x16: 00500075b5503510 x15: ffff800082a0b920
[ 2061.077126] x14: 0000000000000000 x13: ffff800082351aa0 x12: 00000000000004a4
[ 2061.077303] x11: 0000000000000001 x10: 0000000000000a60 x9 : ffff800082a4bce0
[ 2061.077479] x8 : ffff000800240ac0 x7 : ffff00087f768040 x6 : ffff80008234b010
[ 2061.077660] x5 : ffff00087f75f970 x4 : 0000000000000001 x3 : ffff800081ca1000
[ 2061.077838] x2 : ffff800081c958c0 x1 : 0000000000000008 x0 : 0000000000000000
[ 2061.078013] Call trace:
[ 2061.078068] kvm_vgic_init_cpu_hardware+0x80/0x84
[ 2061.078209] kvm_arch_hardware_enable+0x2c/0x60
[ 2061.078324] __hardware_enable_nolock+0x40/0x78
[ 2061.078450] kvm_online_cpu+0x4c/0x6c
[ 2061.078568] cpuhp_invoke_callback+0x100/0x1f4
[ 2061.078692] cpuhp_thread_fun+0xac/0x194
[ 2061.078816] smpboot_thread_fn+0x224/0x248
[ 2061.078955] kthread+0x114/0x118
[ 2061.079057] ret_from_fork+0x10/0x20
[ 2061.079199] Code: d50323bf d65f03c0 d53b4220 373ffc80 (d4210000)
[ 2061.079294] ---[ end trace 0000000000000000 ]---
[ 2061.288961] pstore: backend (efi_pstore) writing error (-5)
[ 2061.289043] note: cpuhp/1[18] exited with irqs disabled
[ 2061.289151] note: cpuhp/1[18] exited with preempt_count 1
[ 2061.289452] ------------[ cut here ]------------
[ 2061.289516] WARNING: CPU: 1 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.constprop.0+0x98/0xa0
[ 2061.289712] Modules linked in:
[ 2061.289790] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G D W 6.4.0-rc3-00072-g192df2aa0113 #80
[ 2061.289928] Hardware name: FVP Base RevC (DT)
[ 2061.290000] pstate: 200003c5 (nzCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2061.290131] pc : ct_kernel_exit.constprop.0+0x98/0xa0
[ 2061.290275] lr : ct_idle_enter+0x10/0x1c
[ 2061.290407] sp : ffff800082a0bdd0
[ 2061.290473] x29: ffff800082a0bdd0 x28: 0000000000000000 x27: 0000000000000000
[ 2061.290650] x26: 0000000000000000 x25: ffff000800238000 x24: 0000000000000000
[ 2061.290826] x23: 0000000000000000 x22: ffff000800238000 x21: ffff800082339b48
[ 2061.291006] x20: ffff800082339a40 x19: ffff00087f764a18 x18: 0000000000000006
[ 2061.291185] x17: 0000000000000008 x16: ffff800082b7bff0 x15: ffff800082a4b4d0
[ 2061.291365] x14: 0000000000000059 x13: 0000000000000059 x12: 0000000000000001
[ 2061.291539] x11: 0000000000000001 x10: 0000000000000a60 x9 : ffff800082a0bd30
[ 2061.291715] x8 : ffff000800238ac0 x7 : 0000000000000000 x6 : 000000306bbc2709
[ 2061.291892] x5 : 4000000000000002 x4 : ffff8007fdac9000 x3 : ffff800082a0bdd0
[ 2061.292073] x2 : 4000000000000000 x1 : ffff800081c9ba18 x0 : ffff800081c9ba18
[ 2061.292255] Call trace:
[ 2061.292310] ct_kernel_exit.constprop.0+0x98/0xa0
[ 2061.292455] ct_idle_enter+0x10/0x1c
[ 2061.292589] default_idle_call+0x1c/0x3c
[ 2061.292728] do_idle+0x20c/0x264
[ 2061.292840] cpu_startup_entry+0x24/0x2c
[ 2061.292958] secondary_start_kernel+0x130/0x154
[ 2061.293090] __secondary_switched+0xb8/0xbc
[ 2061.293214] ---[ end trace 0000000000000000 ]---
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: KVM CPU hotplug notifier triggers BUG_ON on arm64
From: Oliver Upton @ 2023-07-01 17:42 UTC
To: Kristina Martsenko
Cc: Marc Zyngier, isaku.yamahata, seanjc, pbonzini, kvmarm, kvm, linux-arm-kernel, James Morse

Hi Kristina,

Thanks for the bug report.

On Sat, Jul 01, 2023 at 01:50:52PM +0100, Kristina Martsenko wrote:
> Hi,
>
> When I try to online a CPU on arm64 while a KVM guest is running, I hit a
> BUG_ON(preemptible()) (as well as a WARN_ON). See below for the full log.
>
> This is on kvmarm/next, but seems to have been broken since 6.3. Bisecting it
> points at commit:
>
> 0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")

Makes sense. We were using a spinlock before, which implicitly disables
preemption.

Well, one way to hack around the problem would be to just cram
preempt_{disable,enable}() into kvm_arch_hardware_disable(), but that's
kinda gross in the context of cpuhp which isn't migratable in the first
place. Let me have a look...

--
Thanks,
Oliver
* Re: KVM CPU hotplug notifier triggers BUG_ON on arm64
From: Marc Zyngier @ 2023-07-03 9:45 UTC
To: Oliver Upton, Kristina Martsenko
Cc: isaku.yamahata, seanjc, pbonzini, kvmarm, kvm, linux-arm-kernel, James Morse

On Sat, 01 Jul 2023 18:42:28 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Hi Kristina,
>
> Thanks for the bug report.
>
> On Sat, Jul 01, 2023 at 01:50:52PM +0100, Kristina Martsenko wrote:
> > Hi,
> >
> > When I try to online a CPU on arm64 while a KVM guest is running, I hit a
> > BUG_ON(preemptible()) (as well as a WARN_ON). See below for the full log.
> >
> > This is on kvmarm/next, but seems to have been broken since 6.3. Bisecting it
> > points at commit:
> >
> > 0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
>
> Makes sense. We were using a spinlock before, which implicitly disables
> preemption.
>
> Well, one way to hack around the problem would be to just cram
> preempt_{disable,enable}() into kvm_arch_hardware_disable(), but that's
> kinda gross in the context of cpuhp which isn't migratable in the first
> place. Let me have a look...

An alternative would be to replace the preemptible() checks with one
that looks at the migration state, but I'm not sure that's much better
(it certainly looks more costly).

There is also the fact that most of our per-CPU accessors are already
using preemption disabling, and this code has a bunch of them. So I'm
not sure there is a lot to be gained from not disabling preemption
upfront.

Anyway, as I was able to reproduce the issue under NV, I tested the
hack below. If anything, I expect it to be a reasonable fix for
6.3/6.4, and until we come up with a better approach.

Thanks,

M.
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index aaeae1145359..a28c4ffe4932 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1894,8 +1894,17 @@ static void _kvm_arch_hardware_enable(void *discard)
 
 int kvm_arch_hardware_enable(void)
 {
-	int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
+	int was_enabled;
 
+	/*
+	 * Most calls to this function are made with migration
+	 * disabled, but not with preemption disabled. The former is
+	 * enough to ensure correctness, but most of the helpers
+	 * expect the latter and will throw a tantrum otherwise.
+	 */
+	preempt_disable();
+
+	was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
 	_kvm_arch_hardware_enable(NULL);
 
 	if (!was_enabled) {
@@ -1903,6 +1912,8 @@ int kvm_arch_hardware_enable(void)
 		kvm_timer_cpu_up();
 	}
 
+	preempt_enable();
+
 	return 0;
 }
 

--
Without deviation from the norm, progress is not possible.
* Re: KVM CPU hotplug notifier triggers BUG_ON on arm64
From: Kristina Martsenko @ 2023-07-03 10:36 UTC
To: Marc Zyngier, Oliver Upton
Cc: isaku.yamahata, seanjc, pbonzini, kvmarm, kvm, linux-arm-kernel, James Morse

On 03/07/2023 10:45, Marc Zyngier wrote:
> On Sat, 01 Jul 2023 18:42:28 +0100,
> Oliver Upton <oliver.upton@linux.dev> wrote:
>>
>> Hi Kristina,
>>
>> Thanks for the bug report.
>>
>> On Sat, Jul 01, 2023 at 01:50:52PM +0100, Kristina Martsenko wrote:
>>> Hi,
>>>
>>> When I try to online a CPU on arm64 while a KVM guest is running, I hit a
>>> BUG_ON(preemptible()) (as well as a WARN_ON). See below for the full log.
>>>
>>> This is on kvmarm/next, but seems to have been broken since 6.3. Bisecting it
>>> points at commit:
>>>
>>> 0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
>>
>> Makes sense. We were using a spinlock before, which implicitly disables
>> preemption.
>>
>> Well, one way to hack around the problem would be to just cram
>> preempt_{disable,enable}() into kvm_arch_hardware_disable(), but that's
>> kinda gross in the context of cpuhp which isn't migratable in the first
>> place. Let me have a look...
>
> An alternative would be to replace the preemptible() checks with one
> that looks at the migration state, but I'm not sure that's much better
> (it certainly looks more costly).
>
> There is also the fact that most of our per-CPU accessors are already
> using preemption disabling, and this code has a bunch of them. So I'm
> not sure there is a lot to be gained from not disabling preemption
> upfront.
>
> Anyway, as I was able to reproduce the issue under NV, I tested the
> hack below. If anything, I expect it to be a reasonable fix for
> 6.3/6.4, and until we come up with a better approach.
>
> Thanks,
>
> M.
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index aaeae1145359..a28c4ffe4932 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1894,8 +1894,17 @@ static void _kvm_arch_hardware_enable(void *discard)
>  
>  int kvm_arch_hardware_enable(void)
>  {
> -	int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
> +	int was_enabled;
>  
> +	/*
> +	 * Most calls to this function are made with migration
> +	 * disabled, but not with preemption disabled. The former is
> +	 * enough to ensure correctness, but most of the helpers
> +	 * expect the latter and will throw a tantrum otherwise.
> +	 */
> +	preempt_disable();
> +
> +	was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
>  	_kvm_arch_hardware_enable(NULL);
>  
>  	if (!was_enabled) {
> @@ -1903,6 +1912,8 @@ int kvm_arch_hardware_enable(void)
>  		kvm_timer_cpu_up();
>  	}
>  
> +	preempt_enable();
> +
>  	return 0;
>  }

This fixes the issue for me.

Tested-by: Kristina Martsenko <kristina.martsenko@arm.com>

Thanks,
Kristina
* Re: KVM CPU hotplug notifier triggers BUG_ON on arm64
From: Oliver Upton @ 2023-07-03 16:02 UTC
To: Marc Zyngier
Cc: Kristina Martsenko, isaku.yamahata, seanjc, pbonzini, kvmarm, kvm, linux-arm-kernel, James Morse

Hey Marc,

On Mon, Jul 03, 2023 at 10:45:26AM +0100, Marc Zyngier wrote:
> On Sat, 01 Jul 2023 18:42:28 +0100, Oliver Upton <oliver.upton@linux.dev> wrote:
> > Well, one way to hack around the problem would be to just cram
> > preempt_{disable,enable}() into kvm_arch_hardware_disable(), but that's
> > kinda gross in the context of cpuhp which isn't migratable in the first
> > place. Let me have a look...

Heh, I should've mentioned I'm on holiday until Thursday.

> An alternative would be to replace the preemptible() checks with one
> that looks at the migration state, but I'm not sure that's much better
> (it certainly looks more costly).
>
> There is also the fact that most of our per-CPU accessors are already
> using preemption disabling, and this code has a bunch of them. So I'm
> not sure there is a lot to be gained from not disabling preemption
> upfront.
>
> Anyway, as I was able to reproduce the issue under NV, I tested the
> hack below. If anything, I expect it to be a reasonable fix for
> 6.3/6.4, and until we come up with a better approach.

Yeah, I'm fine with a hack like this. Do you want to send this out as a
patch?
--
Thanks,
Oliver

> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index aaeae1145359..a28c4ffe4932 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1894,8 +1894,17 @@ static void _kvm_arch_hardware_enable(void *discard)
>  
>  int kvm_arch_hardware_enable(void)
>  {
> -	int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
> +	int was_enabled;
>  
> +	/*
> +	 * Most calls to this function are made with migration
> +	 * disabled, but not with preemption disabled. The former is
> +	 * enough to ensure correctness, but most of the helpers
> +	 * expect the latter and will throw a tantrum otherwise.
> +	 */
> +	preempt_disable();
> +
> +	was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
>  	_kvm_arch_hardware_enable(NULL);
>  
>  	if (!was_enabled) {
> @@ -1903,6 +1912,8 @@ int kvm_arch_hardware_enable(void)
>  		kvm_timer_cpu_up();
>  	}
>  
> +	preempt_enable();
> +
>  	return 0;
>  }
>
> --
> Without deviation from the norm, progress is not possible.
* Re: KVM CPU hotplug notifier triggers BUG_ON on arm64
From: Marc Zyngier @ 2023-07-03 16:38 UTC
To: Oliver Upton
Cc: Kristina Martsenko, isaku.yamahata, seanjc, pbonzini, kvmarm, kvm, linux-arm-kernel, James Morse

On Mon, 03 Jul 2023 17:02:30 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Hey Marc,
>
> On Mon, Jul 03, 2023 at 10:45:26AM +0100, Marc Zyngier wrote:
> > On Sat, 01 Jul 2023 18:42:28 +0100, Oliver Upton <oliver.upton@linux.dev> wrote:
> > > Well, one way to hack around the problem would be to just cram
> > > preempt_{disable,enable}() into kvm_arch_hardware_disable(), but that's
> > > kinda gross in the context of cpuhp which isn't migratable in the first
> > > place. Let me have a look...
>
> Heh, I should've mentioned I'm on holiday until Thursday.

No problem, happy to keep an eye on stuff in the meantime.

>
> > An alternative would be to replace the preemptible() checks with one
> > that looks at the migration state, but I'm not sure that's much better
> > (it certainly looks more costly).
> >
> > There is also the fact that most of our per-CPU accessors are already
> > using preemption disabling, and this code has a bunch of them. So I'm
> > not sure there is a lot to be gained from not disabling preemption
> > upfront.
> >
> > Anyway, as I was able to reproduce the issue under NV, I tested the
> > hack below. If anything, I expect it to be a reasonable fix for
> > 6.3/6.4, and until we come up with a better approach.
>
> Yeah, I'm fine with a hack like this. Do you want to send this out as a
> patch?

Now sent as 20230703163548.1498943-1-maz@kernel.org.

Enjoy your time off!

M.

--
Without deviation from the norm, progress is not possible.