* commit de752774f38bb causes fatal error on boot
From: Bert Karwatzki @ 2024-09-10 11:11 UTC
  To: Peter Zijlstra
  Cc: Bert Karwatzki, linux-kernel, linux-next, Thomas Gleixner,
	Darrick J . Wong, x86, chandanbabu, willy

When booting linux-next-20240910 on my MSI Alpha 15 laptop running Debian sid (amd64),
I get dropped to a shell and see the following error in dmesg. I bisected this to
commit de752774f38bb ("jump_label: Fix static_key_slow_dec() yet again").

Error message:
[    5.156254] [  T228] jump_label: Fatal kernel bug, unexpected op at mem_cgroup_sk_alloc+0x4/0xa0 [00000000ec1ab76c] (eb 05 e9 e0 2b != 66 90 0f 1f 00)) size:2 type:1
[    5.156289] [  T228] ------------[ cut here ]------------
[    5.156291] [  T228] kernel BUG at arch/x86/kernel/jump_label.c:73!
[    5.156305] [  T228] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[    5.156318] [  T228] CPU: 2 UID: 0 PID: 228 Comm: kworker/2:1 Not tainted 6.11.0-rc7-next-20240910-master #413
[    5.156335] [  T228] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
[    5.156351] [  T228] Workqueue: cgroup_destroy css_free_rwork_fn
[    5.156366] [  T228] RIP: 0010:__jump_label_patch.cold+0x24/0x26
[    5.156380] [  T228] Code: 84 e9 af 39 73 ff 48 c7 c3 68 34 08 85 41 55 45 89 f1 49 89 d8 4c 89 e1 4c 89 e2 4c 89 e6 48 c7 c7 60 9c ca 84 e8 c5 89 00 00 <0f> 0b 4c 89 ca 48 83 ce ff bf 10 00 00 00 e8 72 bb 01 00 53 48 c7
[    5.156403] [  T228] RSP: 0018:ffffa6b400623d80 EFLAGS: 00010246
[    5.156415] [  T228] RAX: 0000000000000090 RBX: ffffffff84a01aa1 RCX: 0000000000000000
[    5.156428] [  T228] RDX: 0000000000000000 RSI: ffff95bd2e6977c0 RDI: ffff95bd2e6977c0
[    5.156441] [  T228] RBP: ffffa6b400623db0 R08: 0000000000000000 R09: ffffa6b400623c20
[    5.156454] [  T228] R10: ffffffff84e81f88 R11: 0000000000000003 R12: ffffffff84091524
[    5.156467] [  T228] R13: 0000000000000001 R14: 0000000000000002 R15: 0000000000000000
[    5.156480] [  T228] FS:  0000000000000000(0000) GS:ffff95bd2e680000(0000) knlGS:0000000000000000
[    5.156495] [  T228] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.156506] [  T228] CR2: 000055c651a32170 CR3: 0000000788018000 CR4: 0000000000750ef0
[    5.156520] [  T228] PKRU: 55555554
[    5.156528] [  T228] Call Trace:
[    5.156538] [  T228]  <TASK>
[    5.156545] [  T228]  ? __die+0x51/0x92
[    5.156556] [  T228]  ? die+0x29/0x50
[    5.156566] [  T228]  ? do_trap+0x105/0x110
[    5.156577] [  T228]  ? do_error_trap+0x60/0x80
[    5.156587] [  T228]  ? __jump_label_patch.cold+0x24/0x26
[    5.156599] [  T228]  ? exc_invalid_op+0x4d/0x70
[    5.156610] [  T228]  ? __jump_label_patch.cold+0x24/0x26
[    5.156622] [  T228]  ? asm_exc_invalid_op+0x1a/0x20
[    5.156634] [  T228]  ? mem_cgroup_sk_alloc+0x4/0xa0
[    5.156646] [  T228]  ? __jump_label_patch.cold+0x24/0x26
[    5.156658] [  T228]  ? arch_jump_label_transform_queue+0x32/0x80
[    5.156671] [  T228]  ? __jump_label_update+0x3d/0xf0
[    5.156683] [  T228]  ? __static_key_slow_dec_cpuslocked+0x4c/0x60
[    5.156694] [  T228]  ? static_key_slow_dec+0x1e/0x40
[    5.156705] [  T228]  ? mem_cgroup_css_free+0x9d/0xa0
[    5.156716] [  T228]  ? css_free_rwork_fn+0x45/0x370
[    5.156728] [  T228]  ? process_one_work+0x161/0x270
[    5.156740] [  T228]  ? worker_thread+0x2ea/0x420
[    5.156751] [  T228]  ? rescuer_thread+0x4e0/0x4e0
[    5.156762] [  T228]  ? kthread+0xcd/0x100
[    5.156772] [  T228]  ? kthread_park+0x80/0x80
[    5.156782] [  T228]  ? ret_from_fork+0x2f/0x50
[    5.156793] [  T228]  ? kthread_park+0x80/0x80
[    5.156803] [  T228]  ? ret_from_fork_asm+0x11/0x20
[    5.156816] [  T228]  </TASK>
[    5.156824] [  T228] Modules linked in: mt7921e(+) mt7921_common mt792x_lib mt76_connac_lib mt76 mac80211 libarc4 cfg80211 rfkill msr nvme_fabrics fuse efi_pstore configfs efivarfs autofs4 ext4 crc32c_generic mbcache jbd2 usbhid amdgpu i2c_algo_bit drm_ttm_helper xhci_pci ttm drm_exec drm_suballoc_helper xhci_hcd amdxcp drm_buddy hid_sensor_hub hid_multitouch usbcore nvme gpu_sched mfd_core i2c_piix4 hid_generic crc32c_intel psmouse drm_display_helper i2c_smbus amd_sfh nvme_core usb_common r8169 crc16 i2c_hid_acpi i2c_hid hid i2c_designware_platform i2c_designware_core
[    5.156938] [  T228] ---[ end trace 0000000000000000 ]---
[    5.278455] [  T228] RIP: 0010:__jump_label_patch.cold+0x24/0x26
[    5.278461] [  T228] Code: 84 e9 af 39 73 ff 48 c7 c3 68 34 08 85 41 55 45 89 f1 49 89 d8 4c 89 e1 4c 89 e2 4c 89 e6 48 c7 c7 60 9c ca 84 e8 c5 89 00 00 <0f> 0b 4c 89 ca 48 83 ce ff bf 10 00 00 00 e8 72 bb 01 00 53 48 c7
[    5.278462] [  T228] RSP: 0018:ffffa6b400623d80 EFLAGS: 00010246
[    5.278465] [  T228] RAX: 0000000000000090 RBX: ffffffff84a01aa1 RCX: 0000000000000000
[    5.278466] [  T228] RDX: 0000000000000000 RSI: ffff95bd2e6977c0 RDI: ffff95bd2e6977c0
[    5.278467] [  T228] RBP: ffffa6b400623db0 R08: 0000000000000000 R09: ffffa6b400623c20
[    5.278468] [  T228] R10: ffffffff84e81f88 R11: 0000000000000003 R12: ffffffff84091524
[    5.278469] [  T228] R13: 0000000000000001 R14: 0000000000000002 R15: 0000000000000000
[    5.278470] [  T228] FS:  0000000000000000(0000) GS:ffff95bd2e680000(0000) knlGS:0000000000000000
[    5.278472] [  T228] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.278473] [  T228] CR2: 000055c651a32170 CR3: 00000001445b8000 CR4: 0000000000750ef0
[    5.278474] [  T228] PKRU: 55555554

Reverting commit de752774f38bb ("jump_label: Fix static_key_slow_dec() yet again")
in next-20240910 fixes the issue for me.

Bert Karwatzki
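
For readers decoding the first log line: it reports the bytes found at the patch site (eb 05 ...) next to the bytes that were expected there (66 90 ...), with size:2 type:1. The following is only a rough userspace sketch of that kind of pre-patch check, not the kernel's code. It assumes that type 1 means "patch in a jump" (so a NOP is expected at the site) and that only the first `size` bytes are compared; `site_matches_expected` and the enum names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed meaning of the "type:" field in the log line. */
enum patch_type { PATCH_TO_NOP = 0, PATCH_TO_JMP = 1 };

/* 2-byte x86 NOP (66 90) and the 2-byte short jump from the log (eb 05). */
static const unsigned char nop2[2] = { 0x66, 0x90 };
static const unsigned char jmp8[2] = { 0xeb, 0x05 };

/*
 * Hypothetical helper: before a site is rewritten it must still hold the
 * opposite instruction. Returns true when the first `size` bytes match.
 */
static bool site_matches_expected(const unsigned char *site, size_t size,
                                  enum patch_type type)
{
    const unsigned char *expect = (type == PATCH_TO_JMP) ? nop2 : jmp8;

    assert(size <= sizeof(nop2));   /* this sketch only models 2-byte sites */
    return memcmp(site, expect, size) == 0;
}
```

Under these assumptions, feeding in the bytes from the log with PATCH_TO_JMP fails the check: the site already holds a jump, which suggests the key's counter and the patched text had gotten out of sync.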


* Re: commit de752774f38bb causes fatal error on boot
From: Peter Zijlstra @ 2024-09-10 13:56 UTC
  To: Bert Karwatzki
  Cc: linux-kernel, linux-next, Thomas Gleixner, Darrick J . Wong, x86,
	chandanbabu, willy

On Tue, Sep 10, 2024 at 01:11:09PM +0200, Bert Karwatzki wrote:
> When booting linux-next-20240910 on my MSI Alpha 15 laptop running Debian sid (amd64),
> I get dropped to a shell and see the following error in dmesg. I bisected this to
> commit de752774f38bb ("jump_label: Fix static_key_slow_dec() yet again").

I've just replaced that commit with the below -- which should be in
tomorrow's tree:

---
commit 1d7f856c2ca449f04a22d876e36b464b7a9d28b6
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Mon Sep 9 12:50:09 2024 +0200

    jump_label: Fix static_key_slow_dec() yet again
    
    While commit 83ab38ef0a0b ("jump_label: Fix concurrency issues in
    static_key_slow_dec()") fixed one problem, it created yet another,
    notably the following is now possible:
    
      slow_dec
        if (try_dec) // dec_not_one-ish, false
        // enabled == 1
                                    slow_inc
                                      if (inc_not_disabled) // inc_not_zero-ish
                                      // enabled == 2
                                        return
    
        guard(mutex)(&jump_label_mutex);
        if (atomic_cmpxchg(1,0)==1) // false, we're 2
    
                                    slow_dec
                                      if (try_dec) // dec_not_one, true
                                      // enabled == 1
                                        return
        else
          try_dec() // dec_not_one, false
          WARN
    
    Use dec_and_test instead of cmpxchg(), like it was prior to
    83ab38ef0a0b. Add a few WARNs for the paranoid.
    
    Fixes: 83ab38ef0a0b ("jump_label: Fix concurrency issues in static_key_slow_dec()")
    Reported-by: "Darrick J. Wong" <djwong@kernel.org>
    Tested-by: Klara Modin <klarasmodin@gmail.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 6dc76b590703..93a822d3c468 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -168,7 +168,7 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
 		jump_label_update(key);
 		/*
 		 * Ensure that when static_key_fast_inc_not_disabled() or
-		 * static_key_slow_try_dec() observe the positive value,
+		 * static_key_dec_not_one() observe the positive value,
 		 * they must also observe all the text changes.
 		 */
 		atomic_set_release(&key->enabled, 1);
@@ -250,7 +250,7 @@ void static_key_disable(struct static_key *key)
 }
 EXPORT_SYMBOL_GPL(static_key_disable);
 
-static bool static_key_slow_try_dec(struct static_key *key)
+static bool static_key_dec_not_one(struct static_key *key)
 {
 	int v;
 
@@ -274,6 +274,14 @@ static bool static_key_slow_try_dec(struct static_key *key)
 		 * enabled. This suggests an ordering problem on the user side.
 		 */
 		WARN_ON_ONCE(v < 0);
+
+		/*
+		 * Warn about underflow, and lie about success in an attempt to
+		 * not make things worse.
+		 */
+		if (WARN_ON_ONCE(v == 0))
+			return true;
+
 		if (v <= 1)
 			return false;
 	} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1)));
@@ -284,15 +292,27 @@ static bool static_key_slow_try_dec(struct static_key *key)
 static void __static_key_slow_dec_cpuslocked(struct static_key *key)
 {
 	lockdep_assert_cpus_held();
+	int val;
 
-	if (static_key_slow_try_dec(key))
+	if (static_key_dec_not_one(key))
 		return;
 
 	guard(mutex)(&jump_label_mutex);
-	if (atomic_cmpxchg(&key->enabled, 1, 0) == 1)
+	val = atomic_read(&key->enabled);
+	/*
+	 * It should be impossible to observe -1 with jump_label_mutex held,
+	 * see static_key_slow_inc_cpuslocked().
+	 */
+	if (WARN_ON_ONCE(val == -1))
+		return;
+	/*
+	 * Cannot already be 0, something went sideways.
+	 */
+	if (WARN_ON_ONCE(val == 0))
+		return;
+
+	if (atomic_dec_and_test(&key->enabled))
 		jump_label_update(key);
-	else
-		WARN_ON_ONCE(!static_key_slow_try_dec(key));
 }
 
 static void __static_key_slow_dec(struct static_key *key)
@@ -329,7 +349,7 @@ void __static_key_slow_dec_deferred(struct static_key *key,
 {
 	STATIC_KEY_CHECK_USE(key);
 
-	if (static_key_slow_try_dec(key))
+	if (static_key_dec_not_one(key))
 		return;
 
 	schedule_delayed_work(work, timeout);
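
The interleaving described in the commit message can be replayed deterministically in userspace. The sketch below is an illustration under stated assumptions, not the kernel implementation: it uses C11 `<stdatomic.h>` instead of the kernel's atomic_t, runs the three actors sequentially on one thread, ignores jump_label_mutex, and models slow_inc as a plain increment. The names `old_slow_path`, `new_slow_path`, `replay_old`, `replay_new` and the `updates` counter are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int enabled;  /* stand-in for key->enabled */
static int updates;         /* counts simulated jump_label_update() calls */

/* Lockless fast path: decrement unless the value is 1 or lower. */
static bool dec_not_one(void)
{
    int v = atomic_load(&enabled);

    do {
        if (v <= 1)
            return false;
    } while (!atomic_compare_exchange_weak(&enabled, &v, v - 1));
    return true;
}

/* Old slow path: drops the reference only if the value is exactly 1. */
static bool old_slow_path(void)
{
    int expected = 1;

    if (!atomic_compare_exchange_strong(&enabled, &expected, 0))
        return false;
    updates++;
    return true;
}

/* New slow path: always drops one reference, updates when it hits 0. */
static void new_slow_path(void)
{
    int prev = atomic_fetch_sub(&enabled, 1);

    assert(prev > 0);       /* underflow would be a caller bug */
    if (prev == 1)
        updates++;
}

/* Returns true if the old logic would reach the WARN in this schedule. */
static bool replay_old(void)
{
    atomic_store(&enabled, 1);
    updates = 0;
    (void)dec_not_one();                /* A: fails, enabled == 1 */
    atomic_fetch_add(&enabled, 1);      /* B: slow_inc, enabled == 2 */
    bool a_slow = old_slow_path();      /* A: cmpxchg(1, 0) fails on 2 */
    (void)dec_not_one();                /* C: succeeds, enabled == 1 */
    bool a_retry = dec_not_one();       /* A: fails again, would WARN */
    return !a_slow && !a_retry;
}

/* Same schedule with the new slow path; returns the final counter. */
static int replay_new(void)
{
    atomic_store(&enabled, 1);
    updates = 0;
    (void)dec_not_one();                /* A: fails, enabled == 1 */
    atomic_fetch_add(&enabled, 1);      /* B: slow_inc, enabled == 2 */
    (void)dec_not_one();                /* C: succeeds, enabled == 1 */
    new_slow_path();                    /* A: 1 -> 0, one update */
    return atomic_load(&enabled);
}
```

In this replay the old logic ends with the counter stuck at 1 even though one increment and two decrements were issued on top of the initial reference, whereas the dec_and_test variant reaches 0 and triggers exactly one update.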


* Re: commit de752774f38bb causes fatal error on boot
From: Bert Karwatzki @ 2024-09-10 21:10 UTC
  To: Peter Zijlstra
  Cc: linux-kernel, linux-next, Thomas Gleixner, Darrick J . Wong, x86,
	chandanbabu, willy, spasswolf

On Tuesday, 2024-09-10 at 15:56 +0200, Peter Zijlstra wrote:
> On Tue, Sep 10, 2024 at 01:11:09PM +0200, Bert Karwatzki wrote:
> > When booting linux-next-20240910 on my MSI Alpha 15 laptop running Debian sid (amd64),
> > I get dropped to a shell and see the following error in dmesg. I bisected this to
> > commit de752774f38bb ("jump_label: Fix static_key_slow_dec() yet again").
>
> I've just replaced that commit with the below -- which should be in
> tomorrow's tree:
>
> ---
> commit 1d7f856c2ca449f04a22d876e36b464b7a9d28b6
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Mon Sep 9 12:50:09 2024 +0200
>
>     jump_label: Fix static_key_slow_dec() yet again
>
>     While commit 83ab38ef0a0b ("jump_label: Fix concurrency issues in
>     static_key_slow_dec()") fixed one problem, it created yet another,
>     notably the following is now possible:
>
>       slow_dec
>         if (try_dec) // dec_not_one-ish, false
>         // enabled == 1
>                                     slow_inc
>                                       if (inc_not_disabled) // inc_not_zero-ish
>                                       // enabled == 2
>                                         return
>
>         guard(mutex)(&jump_label_mutex);
>         if (atomic_cmpxchg(1,0)==1) // false, we're 2
>
>                                     slow_dec
>                                       if (try_dec) // dec_not_one, true
>                                       // enabled == 1
>                                         return
>         else
>           try_dec() // dec_not_one, false
>           WARN
>
>     Use dec_and_test instead of cmpxchg(), like it was prior to
>     83ab38ef0a0b. Add a few WARNs for the paranoid.
>
>     Fixes: 83ab38ef0a0b ("jump_label: Fix concurrency issues in static_key_slow_dec()")
>     Reported-by: "Darrick J. Wong" <djwong@kernel.org>
>     Tested-by: Klara Modin <klarasmodin@gmail.com>
>     Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index 6dc76b590703..93a822d3c468 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -168,7 +168,7 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
>  		jump_label_update(key);
>  		/*
>  		 * Ensure that when static_key_fast_inc_not_disabled() or
> -		 * static_key_slow_try_dec() observe the positive value,
> +		 * static_key_dec_not_one() observe the positive value,
>  		 * they must also observe all the text changes.
>  		 */
>  		atomic_set_release(&key->enabled, 1);
> @@ -250,7 +250,7 @@ void static_key_disable(struct static_key *key)
>  }
>  EXPORT_SYMBOL_GPL(static_key_disable);
>
> -static bool static_key_slow_try_dec(struct static_key *key)
> +static bool static_key_dec_not_one(struct static_key *key)
>  {
>  	int v;
>
> @@ -274,6 +274,14 @@ static bool static_key_slow_try_dec(struct static_key *key)
>  		 * enabled. This suggests an ordering problem on the user side.
>  		 */
>  		WARN_ON_ONCE(v < 0);
> +
> +		/*
> +		 * Warn about underflow, and lie about success in an attempt to
> +		 * not make things worse.
> +		 */
> +		if (WARN_ON_ONCE(v == 0))
> +			return true;
> +
>  		if (v <= 1)
>  			return false;
>  	} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1)));
> @@ -284,15 +292,27 @@ static bool static_key_slow_try_dec(struct static_key *key)
>  static void __static_key_slow_dec_cpuslocked(struct static_key *key)
>  {
>  	lockdep_assert_cpus_held();
> +	int val;
>
> -	if (static_key_slow_try_dec(key))
> +	if (static_key_dec_not_one(key))
>  		return;
>
>  	guard(mutex)(&jump_label_mutex);
> -	if (atomic_cmpxchg(&key->enabled, 1, 0) == 1)
> +	val = atomic_read(&key->enabled);
> +	/*
> +	 * It should be impossible to observe -1 with jump_label_mutex held,
> +	 * see static_key_slow_inc_cpuslocked().
> +	 */
> +	if (WARN_ON_ONCE(val == -1))
> +		return;
> +	/*
> +	 * Cannot already be 0, something went sideways.
> +	 */
> +	if (WARN_ON_ONCE(val == 0))
> +		return;
> +
> +	if (atomic_dec_and_test(&key->enabled))
>  		jump_label_update(key);
> -	else
> -		WARN_ON_ONCE(!static_key_slow_try_dec(key));
>  }
>
>  static void __static_key_slow_dec(struct static_key *key)
> @@ -329,7 +349,7 @@ void __static_key_slow_dec_deferred(struct static_key *key,
>  {
>  	STATIC_KEY_CHECK_USE(key);
>
> -	if (static_key_slow_try_dec(key))
> +	if (static_key_dec_not_one(key))
>  		return;
>
>  	schedule_delayed_work(work, timeout);

Just tested the new version, no error here.

Bert Karwatzki

