* commit de752774f38bb causes fatal error on boot
@ 2024-09-10 11:11 Bert Karwatzki
2024-09-10 13:56 ` Peter Zijlstra
From: Bert Karwatzki @ 2024-09-10 11:11 UTC
To: Peter Zijlstra
Cc: Bert Karwatzki, linux-kernel, linux-next, Thomas Gleixner,
Darrick J . Wong, x86, chandanbabu, willy
When booting linux-next-20240910 on my MSI Alpha 15 laptop running Debian sid (amd64),
I get dropped to a shell and get the following error in dmesg. I bisected this to
commit de752774f38bb ("jump_label: Fix static_key_slow_dec() yet again").
Error message:
[ 5.156254] [ T228] jump_label: Fatal kernel bug, unexpected op at mem_cgroup_sk_alloc+0x4/0xa0 [00000000ec1ab76c] (eb 05 e9 e0 2b != 66 90 0f 1f 00)) size:2 type:1
[ 5.156289] [ T228] ------------[ cut here ]------------
[ 5.156291] [ T228] kernel BUG at arch/x86/kernel/jump_label.c:73!
[ 5.156305] [ T228] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 5.156318] [ T228] CPU: 2 UID: 0 PID: 228 Comm: kworker/2:1 Not tainted 6.11.0-rc7-next-20240910-master #413
[ 5.156335] [ T228] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
[ 5.156351] [ T228] Workqueue: cgroup_destroy css_free_rwork_fn
[ 5.156366] [ T228] RIP: 0010:__jump_label_patch.cold+0x24/0x26
[ 5.156380] [ T228] Code: 84 e9 af 39 73 ff 48 c7 c3 68 34 08 85 41 55 45 89 f1 49 89 d8 4c 89 e1 4c 89 e2 4c 89 e6 48 c7 c7 60 9c ca 84 e8 c5 89 00 00 <0f> 0b 4c 89 ca 48 83 ce ff bf 10 00 00 00 e8 72 bb 01 00 53 48 c7
[ 5.156403] [ T228] RSP: 0018:ffffa6b400623d80 EFLAGS: 00010246
[ 5.156415] [ T228] RAX: 0000000000000090 RBX: ffffffff84a01aa1 RCX: 0000000000000000
[ 5.156428] [ T228] RDX: 0000000000000000 RSI: ffff95bd2e6977c0 RDI: ffff95bd2e6977c0
[ 5.156441] [ T228] RBP: ffffa6b400623db0 R08: 0000000000000000 R09: ffffa6b400623c20
[ 5.156454] [ T228] R10: ffffffff84e81f88 R11: 0000000000000003 R12: ffffffff84091524
[ 5.156467] [ T228] R13: 0000000000000001 R14: 0000000000000002 R15: 0000000000000000
[ 5.156480] [ T228] FS: 0000000000000000(0000) GS:ffff95bd2e680000(0000) knlGS:0000000000000000
[ 5.156495] [ T228] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.156506] [ T228] CR2: 000055c651a32170 CR3: 0000000788018000 CR4: 0000000000750ef0
[ 5.156520] [ T228] PKRU: 55555554
[ 5.156528] [ T228] Call Trace:
[ 5.156538] [ T228] <TASK>
[ 5.156545] [ T228] ? __die+0x51/0x92
[ 5.156556] [ T228] ? die+0x29/0x50
[ 5.156566] [ T228] ? do_trap+0x105/0x110
[ 5.156577] [ T228] ? do_error_trap+0x60/0x80
[ 5.156587] [ T228] ? __jump_label_patch.cold+0x24/0x26
[ 5.156599] [ T228] ? exc_invalid_op+0x4d/0x70
[ 5.156610] [ T228] ? __jump_label_patch.cold+0x24/0x26
[ 5.156622] [ T228] ? asm_exc_invalid_op+0x1a/0x20
[ 5.156634] [ T228] ? mem_cgroup_sk_alloc+0x4/0xa0
[ 5.156646] [ T228] ? __jump_label_patch.cold+0x24/0x26
[ 5.156658] [ T228] ? arch_jump_label_transform_queue+0x32/0x80
[ 5.156671] [ T228] ? __jump_label_update+0x3d/0xf0
[ 5.156683] [ T228] ? __static_key_slow_dec_cpuslocked+0x4c/0x60
[ 5.156694] [ T228] ? static_key_slow_dec+0x1e/0x40
[ 5.156705] [ T228] ? mem_cgroup_css_free+0x9d/0xa0
[ 5.156716] [ T228] ? css_free_rwork_fn+0x45/0x370
[ 5.156728] [ T228] ? process_one_work+0x161/0x270
[ 5.156740] [ T228] ? worker_thread+0x2ea/0x420
[ 5.156751] [ T228] ? rescuer_thread+0x4e0/0x4e0
[ 5.156762] [ T228] ? kthread+0xcd/0x100
[ 5.156772] [ T228] ? kthread_park+0x80/0x80
[ 5.156782] [ T228] ? ret_from_fork+0x2f/0x50
[ 5.156793] [ T228] ? kthread_park+0x80/0x80
[ 5.156803] [ T228] ? ret_from_fork_asm+0x11/0x20
[ 5.156816] [ T228] </TASK>
[ 5.156824] [ T228] Modules linked in: mt7921e(+) mt7921_common mt792x_lib mt76_connac_lib mt76 mac80211 libarc4 cfg80211 rfkill msr nvme_fabrics fuse efi_pstore configfs efivarfs autofs4 ext4 crc32c_generic mbcache jbd2 usbhid amdgpu i2c_algo_bit drm_ttm_helper xhci_pci ttm drm_exec drm_suballoc_helper xhci_hcd amdxcp drm_buddy hid_sensor_hub hid_multitouch usbcore nvme gpu_sched mfd_core i2c_piix4 hid_generic crc32c_intel psmouse drm_display_helper i2c_smbus amd_sfh nvme_core usb_common r8169 crc16 i2c_hid_acpi i2c_hid hid i2c_designware_platform i2c_designware_core
[ 5.156938] [ T228] ---[ end trace 0000000000000000 ]---
[ 5.278455] [ T228] RIP: 0010:__jump_label_patch.cold+0x24/0x26
[ 5.278461] [ T228] Code: 84 e9 af 39 73 ff 48 c7 c3 68 34 08 85 41 55 45 89 f1 49 89 d8 4c 89 e1 4c 89 e2 4c 89 e6 48 c7 c7 60 9c ca 84 e8 c5 89 00 00 <0f> 0b 4c 89 ca 48 83 ce ff bf 10 00 00 00 e8 72 bb 01 00 53 48 c7
[ 5.278462] [ T228] RSP: 0018:ffffa6b400623d80 EFLAGS: 00010246
[ 5.278465] [ T228] RAX: 0000000000000090 RBX: ffffffff84a01aa1 RCX: 0000000000000000
[ 5.278466] [ T228] RDX: 0000000000000000 RSI: ffff95bd2e6977c0 RDI: ffff95bd2e6977c0
[ 5.278467] [ T228] RBP: ffffa6b400623db0 R08: 0000000000000000 R09: ffffa6b400623c20
[ 5.278468] [ T228] R10: ffffffff84e81f88 R11: 0000000000000003 R12: ffffffff84091524
[ 5.278469] [ T228] R13: 0000000000000001 R14: 0000000000000002 R15: 0000000000000000
[ 5.278470] [ T228] FS: 0000000000000000(0000) GS:ffff95bd2e680000(0000) knlGS:0000000000000000
[ 5.278472] [ T228] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.278473] [ T228] CR2: 000055c651a32170 CR3: 00000001445b8000 CR4: 0000000000750ef0
[ 5.278474] [ T228] PKRU: 55555554
Reverting commit de752774f38bb ("jump_label: Fix static_key_slow_dec() yet again")
in next-20240910 fixes the issue for me.
Bert Karwatzki
* Re: commit de752774f38bb causes fatal error on boot
2024-09-10 11:11 commit de752774f38bb causes fatal error on boot Bert Karwatzki
@ 2024-09-10 13:56 ` Peter Zijlstra
2024-09-10 21:10 ` Bert Karwatzki
From: Peter Zijlstra @ 2024-09-10 13:56 UTC
To: Bert Karwatzki
Cc: linux-kernel, linux-next, Thomas Gleixner, Darrick J . Wong, x86,
chandanbabu, willy

On Tue, Sep 10, 2024 at 01:11:09PM +0200, Bert Karwatzki wrote:
> When booting linux-next-20240910 on my MSI alpha 15 Laptop running debian sid (amd64),
> I get dropped to a shell and get the folllowing error in dmesg. I bisected this to
> commit de752774f38bb ("jump_label: Fix static_key_slow_dec() yet again").

I've just replaced that commit with the below -- which should be in
tomorrows tree:

---

commit 1d7f856c2ca449f04a22d876e36b464b7a9d28b6
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Mon Sep 9 12:50:09 2024 +0200

    jump_label: Fix static_key_slow_dec() yet again

    While commit 83ab38ef0a0b ("jump_label: Fix concurrency issues in
    static_key_slow_dec()") fixed one problem, it created yet another,
    notably the following is now possible:

      slow_dec
        if (try_dec) // dec_not_one-ish, false
        // enabled == 1
                                    slow_inc
                                      if (inc_not_disabled) // inc_not_zero-ish
                                      // enabled == 2
                                        return

        guard((mutex)(&jump_label_mutex);
        if (atomic_cmpxchg(1,0)==1) // false, we're 2

                                    slow_dec
                                      if (try-dec) // dec_not_one, true
                                      // enabled == 1
                                        return
        else
          try_dec() // dec_not_one, false
          WARN

    Use dec_and_test instead of cmpxchg(), like it was prior to
    83ab38ef0a0b. Add a few WARNs for the paranoid.

    Fixes: 83ab38ef0a0b ("jump_label: Fix concurrency issues in static_key_slow_dec()")
    Reported-by: "Darrick J. Wong" <djwong@kernel.org>
    Tested-by: Klara Modin <klarasmodin@gmail.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 6dc76b590703..93a822d3c468 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -168,7 +168,7 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
 		jump_label_update(key);
 		/*
 		 * Ensure that when static_key_fast_inc_not_disabled() or
-		 * static_key_slow_try_dec() observe the positive value,
+		 * static_key_dec_not_one() observe the positive value,
 		 * they must also observe all the text changes.
 		 */
 		atomic_set_release(&key->enabled, 1);
@@ -250,7 +250,7 @@ void static_key_disable(struct static_key *key)
 }
 EXPORT_SYMBOL_GPL(static_key_disable);

-static bool static_key_slow_try_dec(struct static_key *key)
+static bool static_key_dec_not_one(struct static_key *key)
 {
 	int v;

@@ -274,6 +274,14 @@ static bool static_key_slow_try_dec(struct static_key *key)
 		 * enabled. This suggests an ordering problem on the user side.
 		 */
 		WARN_ON_ONCE(v < 0);
+
+		/*
+		 * Warn about underflow, and lie about success in an attempt to
+		 * not make things worse.
+		 */
+		if (WARN_ON_ONCE(v == 0))
+			return true;
+
 		if (v <= 1)
 			return false;
 	} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1)));
@@ -284,15 +292,27 @@ static bool static_key_slow_try_dec(struct static_key *key)
 static void __static_key_slow_dec_cpuslocked(struct static_key *key)
 {
 	lockdep_assert_cpus_held();
+	int val;

-	if (static_key_slow_try_dec(key))
+	if (static_key_dec_not_one(key))
 		return;

 	guard(mutex)(&jump_label_mutex);
-	if (atomic_cmpxchg(&key->enabled, 1, 0) == 1)
+	val = atomic_read(&key->enabled);
+	/*
+	 * It should be impossible to observe -1 with jump_label_mutex held,
+	 * see static_key_slow_inc_cpuslocked().
+	 */
+	if (WARN_ON_ONCE(val == -1))
+		return;
+	/*
+	 * Cannot already be 0, something went sideways.
+	 */
+	if (WARN_ON_ONCE(val == 0))
+		return;
+
+	if (atomic_dec_and_test(&key->enabled))
 		jump_label_update(key);
-	else
-		WARN_ON_ONCE(!static_key_slow_try_dec(key));
 }

 static void __static_key_slow_dec(struct static_key *key)
@@ -329,7 +349,7 @@ void __static_key_slow_dec_deferred(struct static_key *key,
 {
 	STATIC_KEY_CHECK_USE(key);

-	if (static_key_slow_try_dec(key))
+	if (static_key_dec_not_one(key))
 		return;

 	schedule_delayed_work(work, timeout);
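The interleaving described in the commit message above can be replayed step by step in userspace. What follows is only a minimal single-threaded sketch using C11 atomics, not kernel code: struct model_key, model_dec_not_one(), model_inc_not_disabled() and replay_interleaving() are hypothetical names made up for illustration, the WARNs and the text patching are left out, and jump_label_mutex is implied by the fixed order of the steps rather than taken for real.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the 'enabled' count of a static key. */
struct model_key {
	atomic_int enabled;
};

/* Decrement only while the count is above 1 (model of dec_not_one). */
static bool model_dec_not_one(struct model_key *key)
{
	int v = atomic_load(&key->enabled);

	do {
		if (v <= 1)
			return false;
	} while (!atomic_compare_exchange_weak(&key->enabled, &v, v - 1));

	return true;
}

/* Increment only while the count is positive (model of inc_not_disabled). */
static bool model_inc_not_disabled(struct model_key *key)
{
	int v = atomic_load(&key->enabled);

	do {
		if (v <= 0)
			return false;
	} while (!atomic_compare_exchange_weak(&key->enabled, &v, v + 1));

	return true;
}

enum variant { VARIANT_CMPXCHG, VARIANT_DEC_AND_TEST };

struct replay_result {
	int final_count;	/* value of 'enabled' after all steps */
	bool warned;		/* true if the old else-branch WARN would fire */
};

/* Replay "A: slow_dec, B: slow_inc, B: slow_dec" in the order from the commit message. */
static struct replay_result replay_interleaving(enum variant variant)
{
	struct replay_result res = { 0, false };
	struct model_key key;
	int expected = 1;
	bool ok;

	atomic_init(&key.enabled, 1);

	/* CPU A, slow_dec: fast path refuses because enabled == 1. */
	ok = model_dec_not_one(&key);
	assert(!ok);

	/* CPU B, slow_inc: fast path bumps enabled from 1 to 2. */
	ok = model_inc_not_disabled(&key);
	assert(ok);

	if (variant == VARIANT_CMPXCHG) {
		/* CPU A, mutex held: cmpxchg(1, 0) fails because enabled is 2. */
		ok = atomic_compare_exchange_strong(&key.enabled, &expected, 0);
		assert(!ok);

		/* CPU B, slow_dec: fast path succeeds, enabled drops to 1. */
		ok = model_dec_not_one(&key);
		assert(ok);

		/* CPU A, else branch: the retry refuses at 1, so the WARN fires. */
		res.warned = !model_dec_not_one(&key);
	} else {
		/* CPU A, mutex held: unconditional decrement, 2 -> 1. */
		atomic_fetch_sub(&key.enabled, 1);

		/* CPU B, slow_dec: fast path refuses at 1, so B falls through
		 * to the locked path and decrements 1 -> 0. */
		if (!model_dec_not_one(&key))
			atomic_fetch_sub(&key.enabled, 1);
	}

	(void)ok;
	res.final_count = atomic_load(&key.enabled);
	return res;
}
```

Called from a small main(), replay_interleaving(VARIANT_CMPXCHG) ends with the count stuck at 1 and warned set, i.e. CPU A's decrement is lost. replay_interleaving(VARIANT_DEC_AND_TEST) ends at 0 with no warning, which is the behaviour the replacement patch restores.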
* Re: commit de752774f38bb causes fatal error on boot
2024-09-10 13:56 ` Peter Zijlstra
@ 2024-09-10 21:10 ` Bert Karwatzki
From: Bert Karwatzki @ 2024-09-10 21:10 UTC
To: Peter Zijlstra
Cc: linux-kernel, linux-next, Thomas Gleixner, Darrick J . Wong, x86,
chandanbabu, willy, spasswolf

On Tuesday, 10.09.2024 at 15:56 +0200, Peter Zijlstra wrote:
> On Tue, Sep 10, 2024 at 01:11:09PM +0200, Bert Karwatzki wrote:
> > When booting linux-next-20240910 on my MSI alpha 15 Laptop running debian sid (amd64),
> > I get dropped to a shell and get the folllowing error in dmesg. I bisected this to
> > commit de752774f38bb ("jump_label: Fix static_key_slow_dec() yet again").
>
> I've just replaced that commit with the below -- which should be in
> tomorrows tree:
>
> ---
>
> commit 1d7f856c2ca449f04a22d876e36b464b7a9d28b6
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Mon Sep 9 12:50:09 2024 +0200
>
>     jump_label: Fix static_key_slow_dec() yet again
>
>     While commit 83ab38ef0a0b ("jump_label: Fix concurrency issues in
>     static_key_slow_dec()") fixed one problem, it created yet another,
>     notably the following is now possible:
>
>       slow_dec
>         if (try_dec) // dec_not_one-ish, false
>         // enabled == 1
>                                     slow_inc
>                                       if (inc_not_disabled) // inc_not_zero-ish
>                                       // enabled == 2
>                                         return
>
>         guard((mutex)(&jump_label_mutex);
>         if (atomic_cmpxchg(1,0)==1) // false, we're 2
>
>                                     slow_dec
>                                       if (try-dec) // dec_not_one, true
>                                       // enabled == 1
>                                         return
>         else
>           try_dec() // dec_not_one, false
>           WARN
>
>     Use dec_and_test instead of cmpxchg(), like it was prior to
>     83ab38ef0a0b. Add a few WARNs for the paranoid.
>
>     Fixes: 83ab38ef0a0b ("jump_label: Fix concurrency issues in static_key_slow_dec()")
>     Reported-by: "Darrick J. Wong" <djwong@kernel.org>
>     Tested-by: Klara Modin <klarasmodin@gmail.com>
>     Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index 6dc76b590703..93a822d3c468 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -168,7 +168,7 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
>  		jump_label_update(key);
>  		/*
>  		 * Ensure that when static_key_fast_inc_not_disabled() or
> -		 * static_key_slow_try_dec() observe the positive value,
> +		 * static_key_dec_not_one() observe the positive value,
>  		 * they must also observe all the text changes.
>  		 */
>  		atomic_set_release(&key->enabled, 1);
> @@ -250,7 +250,7 @@ void static_key_disable(struct static_key *key)
>  }
>  EXPORT_SYMBOL_GPL(static_key_disable);
>
> -static bool static_key_slow_try_dec(struct static_key *key)
> +static bool static_key_dec_not_one(struct static_key *key)
>  {
>  	int v;
>
> @@ -274,6 +274,14 @@ static bool static_key_slow_try_dec(struct static_key *key)
>  		 * enabled. This suggests an ordering problem on the user side.
>  		 */
>  		WARN_ON_ONCE(v < 0);
> +
> +		/*
> +		 * Warn about underflow, and lie about success in an attempt to
> +		 * not make things worse.
> +		 */
> +		if (WARN_ON_ONCE(v == 0))
> +			return true;
> +
>  		if (v <= 1)
>  			return false;
>  	} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1)));
> @@ -284,15 +292,27 @@ static bool static_key_slow_try_dec(struct static_key *key)
>  static void __static_key_slow_dec_cpuslocked(struct static_key *key)
>  {
>  	lockdep_assert_cpus_held();
> +	int val;
>
> -	if (static_key_slow_try_dec(key))
> +	if (static_key_dec_not_one(key))
>  		return;
>
>  	guard(mutex)(&jump_label_mutex);
> -	if (atomic_cmpxchg(&key->enabled, 1, 0) == 1)
> +	val = atomic_read(&key->enabled);
> +	/*
> +	 * It should be impossible to observe -1 with jump_label_mutex held,
> +	 * see static_key_slow_inc_cpuslocked().
> +	 */
> +	if (WARN_ON_ONCE(val == -1))
> +		return;
> +	/*
> +	 * Cannot already be 0, something went sideways.
> +	 */
> +	if (WARN_ON_ONCE(val == 0))
> +		return;
> +
> +	if (atomic_dec_and_test(&key->enabled))
>  		jump_label_update(key);
> -	else
> -		WARN_ON_ONCE(!static_key_slow_try_dec(key));
>  }
>
>  static void __static_key_slow_dec(struct static_key *key)
> @@ -329,7 +349,7 @@ void __static_key_slow_dec_deferred(struct static_key *key,
>  {
>  	STATIC_KEY_CHECK_USE(key);
>
> -	if (static_key_slow_try_dec(key))
> +	if (static_key_dec_not_one(key))
>  		return;
>
>  	schedule_delayed_work(work, timeout);

Just tested the new version, no error here.

Bert Karwatzki