public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
* Regression on linux-next (next-20260324 )
@ 2026-03-27 13:39 Borah, Chaitanya Kumar
  2026-03-27 16:31 ` Peter Zijlstra
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Borah, Chaitanya Kumar @ 2026-03-27 13:39 UTC (permalink / raw)
  To: willy
  Cc: linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, peterz, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam

Hello Matthew,

Hope you are doing well. I am Chaitanya from the Linux graphics team at 
Intel.

This mail is regarding a regression we are seeing in our CI runs[1] on
the linux-next repository.

Since version next-20260324 [2], we have been seeing the following regression:

`````````````````````````````````````````````````````````````````````````````````
<5>[  157.361977] [IGT] Inactivity timeout exceeded. Killing the current 
test with SIGQUIT.
<6>[  157.362097] sysrq: HELP : loglevel(0-9) reboot(b) crash(c) 
show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f) 
kill-all-tasks(i) thaw-filesystems(j) sak(k) 
show-backtrace-all-active-cpus(l) show-memory-usage(m) 
nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) 
unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) 
show-blocked-tasks(w) dump-ftrace-buffer(z) replay-kernel-logs(R)
<6>[  157.399543] sysrq: Show State
<6>[  157.403061] task:systemd         state:S stack:0     pid:1 
tgid:1     ppid:0      task_flags:0x400100 flags:0x00080000
<6>[  157.403067] Call Trace:
<6>[  157.403069]  <TASK>
<6>[  157.403072]  __schedule+0x5d7/0x1ef0
<6>[  157.403078]  ? lock_acquire+0xc4/0x300
<6>[  157.403084]  ? schedule+0x10e/0x180
<6>[  157.403087]  ? lock_release+0xcd/0x2b0
<6>[  157.403092]  schedule+0x3a/0x180
<6>[  157.403094]  schedule_hrtimeout_range_clock+0x112/0x120
<6>[  157.403097]  ? do_epoll_wait+0x3e4/0x5b0
<6>[  157.403102]  ? lock_release+0xcd/0x2b0
<6>[  157.403104]  ? _raw_spin_unlock_irq+0x27/0x70
<6>[  157.403106]  ? do_epoll_wait+0x3e4/0x5b0
<6>[  157.403110]  schedule_hrtimeout_range+0x13/0x30
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [3].

After bisecting the tree, the following patch [4] seems to be the first 
"bad" commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit 25500ba7e77ce9d3d9b5a1929d41a2ee2e23f6fe
Author: Matthew Wilcox (Oracle) willy@infradead.org
Date:   Thu Mar 5 19:55:43 2026 +0000

     locking/mutex: Remove the list_head from struct mutex
`````````````````````````````````````````````````````````````````````````````````````````````````````````

We could not revert the patch because of a merge conflict, but resetting 
to the parent of the commit seems to fix the issue.

Could you please check why the patch causes this regression and provide 
a fix if necessary?

Thank you.

Regards

Chaitanya

[1]
https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20260324
[3]
https://intel-gfx-ci.01.org/tree/linux-next/next-20260326/bat-arlh-2/dmesg0.txt
[4] 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231130&id=f4acfcd4deb158b96595250cc332901b282d15b0

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-03-27 13:39 Regression on linux-next (next-20260324 ) Borah, Chaitanya Kumar
@ 2026-03-27 16:31 ` Peter Zijlstra
  2026-03-27 16:43   ` Peter Zijlstra
  2026-03-27 16:49 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev2) Patchwork
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-27 16:31 UTC (permalink / raw)
  To: Borah, Chaitanya Kumar
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam

On Fri, Mar 27, 2026 at 07:09:26PM +0530, Borah, Chaitanya Kumar wrote:
> Hello Matthew,
> 
> Hope you are doing well. I am Chaitanya from the linux graphics team in
> Intel.
> 
> This mail is regarding a regression we are seeing in our CI runs[1] on
> linux-next repository.
> 
> Since the version next-20260324 [2], we are seeing the following regression
> 
> `````````````````````````````````````````````````````````````````````````````````
> <5>[  157.361977] [IGT] Inactivity timeout exceeded. Killing the current
> test with SIGQUIT.
> <6>[  157.362097] sysrq: HELP : loglevel(0-9) reboot(b) crash(c)
> show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f)
> kill-all-tasks(i) thaw-filesystems(j) sak(k)
> show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n)
> poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s)
> show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w)
> dump-ftrace-buffer(z) replay-kernel-logs(R)
> <6>[  157.399543] sysrq: Show State
> <6>[  157.403061] task:systemd         state:S stack:0     pid:1 tgid:1
> ppid:0      task_flags:0x400100 flags:0x00080000
> <6>[  157.403067] Call Trace:
> <6>[  157.403069]  <TASK>
> <6>[  157.403072]  __schedule+0x5d7/0x1ef0
> <6>[  157.403078]  ? lock_acquire+0xc4/0x300
> <6>[  157.403084]  ? schedule+0x10e/0x180
> <6>[  157.403087]  ? lock_release+0xcd/0x2b0
> <6>[  157.403092]  schedule+0x3a/0x180
> <6>[  157.403094]  schedule_hrtimeout_range_clock+0x112/0x120
> <6>[  157.403097]  ? do_epoll_wait+0x3e4/0x5b0
> <6>[  157.403102]  ? lock_release+0xcd/0x2b0
> <6>[  157.403104]  ? _raw_spin_unlock_irq+0x27/0x70
> <6>[  157.403106]  ? do_epoll_wait+0x3e4/0x5b0
> <6>[  157.403110]  schedule_hrtimeout_range+0x13/0x30
> `````````````````````````````````````````````````````````````````````````````````
> Details log can be found in [3].
> 
> After bisecting the tree, the following patch [4] seems to be the first
> "bad" commit
> 
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> commit 25500ba7e77ce9d3d9b5a1929d41a2ee2e23f6fe
> Author: Matthew Wilcox (Oracle) willy@infradead.org
> Date:   Thu Mar 5 19:55:43 2026 +0000
> 
>     locking/mutex: Remove the list_head from struct mutex
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> 
> We could not revert the patch because of merge conflict but resetting to the
> parent of the commit seems to fix the issue.
> 
> Could you please check why the patch causes this regression and provide a
> fix if necessary?

Does this help?

---
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -40,10 +40,10 @@ __ww_waiter_last(struct mutex *lock)
 	__must_hold(&lock->wait_lock)
 {
 	struct mutex_waiter *w = lock->first_waiter;
+	if (!w)
+		return NULL;
 
-	if (w)
-		w = list_prev_entry(w, list);
-	return w;
+	return __ww_waiter_prev(lock, w);
 }
 
 static inline void

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-03-27 16:31 ` Peter Zijlstra
@ 2026-03-27 16:43   ` Peter Zijlstra
  2026-03-30  8:26     ` Borah, Chaitanya Kumar
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-27 16:43 UTC (permalink / raw)
  To: Borah, Chaitanya Kumar
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam

On Fri, Mar 27, 2026 at 05:31:00PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 27, 2026 at 07:09:26PM +0530, Borah, Chaitanya Kumar wrote:
> > Hello Matthew,
> > 
> > Hope you are doing well. I am Chaitanya from the linux graphics team in
> > Intel.
> > 
> > This mail is regarding a regression we are seeing in our CI runs[1] on
> > linux-next repository.
> > 
> > Since the version next-20260324 [2], we are seeing the following regression
> > 
> > `````````````````````````````````````````````````````````````````````````````````
> > <5>[  157.361977] [IGT] Inactivity timeout exceeded. Killing the current
> > test with SIGQUIT.
> > <6>[  157.362097] sysrq: HELP : loglevel(0-9) reboot(b) crash(c)
> > show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f)
> > kill-all-tasks(i) thaw-filesystems(j) sak(k)
> > show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n)
> > poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s)
> > show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w)
> > dump-ftrace-buffer(z) replay-kernel-logs(R)
> > <6>[  157.399543] sysrq: Show State
> > <6>[  157.403061] task:systemd         state:S stack:0     pid:1 tgid:1
> > ppid:0      task_flags:0x400100 flags:0x00080000
> > <6>[  157.403067] Call Trace:
> > <6>[  157.403069]  <TASK>
> > <6>[  157.403072]  __schedule+0x5d7/0x1ef0
> > <6>[  157.403078]  ? lock_acquire+0xc4/0x300
> > <6>[  157.403084]  ? schedule+0x10e/0x180
> > <6>[  157.403087]  ? lock_release+0xcd/0x2b0
> > <6>[  157.403092]  schedule+0x3a/0x180
> > <6>[  157.403094]  schedule_hrtimeout_range_clock+0x112/0x120
> > <6>[  157.403097]  ? do_epoll_wait+0x3e4/0x5b0
> > <6>[  157.403102]  ? lock_release+0xcd/0x2b0
> > <6>[  157.403104]  ? _raw_spin_unlock_irq+0x27/0x70
> > <6>[  157.403106]  ? do_epoll_wait+0x3e4/0x5b0
> > <6>[  157.403110]  schedule_hrtimeout_range+0x13/0x30
> > `````````````````````````````````````````````````````````````````````````````````
> > Details log can be found in [3].
> > 
> > After bisecting the tree, the following patch [4] seems to be the first
> > "bad" commit
> > 
> > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > commit 25500ba7e77ce9d3d9b5a1929d41a2ee2e23f6fe
> > Author: Matthew Wilcox (Oracle) willy@infradead.org
> > Date:   Thu Mar 5 19:55:43 2026 +0000
> > 
> >     locking/mutex: Remove the list_head from struct mutex
> > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > 
> > We could not revert the patch because of merge conflict but resetting to the
> > parent of the commit seems to fix the issue.
> > 
> > Could you please check why the patch causes this regression and provide a
> > fix if necessary?
> 
> Does this help?

More tidy version of the same...

---
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index b1834ab7e782..bb8b410779d4 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
 	struct mutex_waiter *w = lock->first_waiter;
 
 	if (w)
-		w = list_prev_entry(w, list);
+		w = __ww_waiter_prev(lock, w);
 	return w;
 }
 

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev2)
  2026-03-27 13:39 Regression on linux-next (next-20260324 ) Borah, Chaitanya Kumar
  2026-03-27 16:31 ` Peter Zijlstra
@ 2026-03-27 16:49 ` Patchwork
  2026-04-20 19:22 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev3) Patchwork
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2026-03-27 16:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: intel-gfx

== Series Details ==

Series: Regression on linux-next (next-20260324 ) (rev2)
URL   : https://patchwork.freedesktop.org/series/164009/
State : failure

== Summary ==

Address 'peterz@infradead.org' is not on the allowlist, which prevents CI from being triggered for this patch.
If you want Intel GFX CI to accept this address, please contact the script maintainers at i915-ci-infra@lists.freedesktop.org.
Exception occurred during validation, bailing out!



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-03-27 16:43   ` Peter Zijlstra
@ 2026-03-30  8:26     ` Borah, Chaitanya Kumar
  2026-03-30 19:50       ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Borah, Chaitanya Kumar @ 2026-03-30  8:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam



On 3/27/2026 10:13 PM, Peter Zijlstra wrote:
> On Fri, Mar 27, 2026 at 05:31:00PM +0100, Peter Zijlstra wrote:
>> On Fri, Mar 27, 2026 at 07:09:26PM +0530, Borah, Chaitanya Kumar wrote:
>>> Hello Matthew,
>>>
>>> Hope you are doing well. I am Chaitanya from the linux graphics team in
>>> Intel.
>>>
>>> This mail is regarding a regression we are seeing in our CI runs[1] on
>>> linux-next repository.
>>>
>>> Since the version next-20260324 [2], we are seeing the following regression
>>>
>>> `````````````````````````````````````````````````````````````````````````````````
>>> <5>[  157.361977] [IGT] Inactivity timeout exceeded. Killing the current
>>> test with SIGQUIT.
>>> <6>[  157.362097] sysrq: HELP : loglevel(0-9) reboot(b) crash(c)
>>> show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f)
>>> kill-all-tasks(i) thaw-filesystems(j) sak(k)
>>> show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n)
>>> poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s)
>>> show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w)
>>> dump-ftrace-buffer(z) replay-kernel-logs(R)
>>> <6>[  157.399543] sysrq: Show State
>>> <6>[  157.403061] task:systemd         state:S stack:0     pid:1 tgid:1
>>> ppid:0      task_flags:0x400100 flags:0x00080000
>>> <6>[  157.403067] Call Trace:
>>> <6>[  157.403069]  <TASK>
>>> <6>[  157.403072]  __schedule+0x5d7/0x1ef0
>>> <6>[  157.403078]  ? lock_acquire+0xc4/0x300
>>> <6>[  157.403084]  ? schedule+0x10e/0x180
>>> <6>[  157.403087]  ? lock_release+0xcd/0x2b0
>>> <6>[  157.403092]  schedule+0x3a/0x180
>>> <6>[  157.403094]  schedule_hrtimeout_range_clock+0x112/0x120
>>> <6>[  157.403097]  ? do_epoll_wait+0x3e4/0x5b0
>>> <6>[  157.403102]  ? lock_release+0xcd/0x2b0
>>> <6>[  157.403104]  ? _raw_spin_unlock_irq+0x27/0x70
>>> <6>[  157.403106]  ? do_epoll_wait+0x3e4/0x5b0
>>> <6>[  157.403110]  schedule_hrtimeout_range+0x13/0x30
>>> `````````````````````````````````````````````````````````````````````````````````
>>> Details log can be found in [3].
>>>
>>> After bisecting the tree, the following patch [4] seems to be the first
>>> "bad" commit
>>>
>>> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>>> commit 25500ba7e77ce9d3d9b5a1929d41a2ee2e23f6fe
>>> Author: Matthew Wilcox (Oracle) willy@infradead.org
>>> Date:   Thu Mar 5 19:55:43 2026 +0000
>>>
>>>      locking/mutex: Remove the list_head from struct mutex
>>> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>>>
>>> We could not revert the patch because of merge conflict but resetting to the
>>> parent of the commit seems to fix the issue.
>>>
>>> Could you please check why the patch causes this regression and provide a
>>> fix if necessary?
>>
>> Does this help?
> 
> More tidy version of the same...
> 
> ---
> diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> index b1834ab7e782..bb8b410779d4 100644
> --- a/kernel/locking/ww_mutex.h
> +++ b/kernel/locking/ww_mutex.h
> @@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
>   	struct mutex_waiter *w = lock->first_waiter;
>   
>   	if (w)
> -		w = list_prev_entry(w, list);
> +		w = __ww_waiter_prev(lock, w);
>   	return w;
>   }
>   
Thank you for the response, Peter. Unfortunately, the issue is still 
seen with this change.

Regards
Chaitanya

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-03-30  8:26     ` Borah, Chaitanya Kumar
@ 2026-03-30 19:50       ` Peter Zijlstra
  2026-04-20 13:03         ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2026-03-30 19:50 UTC (permalink / raw)
  To: Borah, Chaitanya Kumar
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam

On Mon, Mar 30, 2026 at 01:56:33PM +0530, Borah, Chaitanya Kumar wrote:
> > diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> > index b1834ab7e782..bb8b410779d4 100644
> > --- a/kernel/locking/ww_mutex.h
> > +++ b/kernel/locking/ww_mutex.h
> > @@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
> >   	struct mutex_waiter *w = lock->first_waiter;
> >   	if (w)
> > -		w = list_prev_entry(w, list);
> > +		w = __ww_waiter_prev(lock, w);
> >   	return w;
> >   }
> Thank you for the response, Peter. Unfortunately, the issue is still seen
> with this change.

Bah, indeed. Looking at this after the weekend I see that it's actually
wrong.

But I haven't yet had a new idea. I don't suppose there is a relatively
easy way to reproduce this issue outside of your CI robot?

My current working thesis is that since this is graphics, this is
ww_mutex related. I'll go over this code once more...
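For anyone following along, the structure that commit introduces can be modeled in a few lines of plain Python (a toy model, not the kernel code; `Waiter`/`Lock` are made-up stand-ins): waiters sit on a circular doubly linked list with no sentinel node, and the mutex keeps only a first_waiter pointer. Note the prev-direction helper below already uses the check-before-step ordering needed for the backward walk to terminate correctly.

```python
class Waiter:
    def __init__(self, name):
        self.name = name
        self.next = self.prev = self  # singleton circular list

class Lock:
    def __init__(self):
        self.first_waiter = None

def add_waiter(lock, w):
    """Append w at the tail, i.e. just before first_waiter."""
    if lock.first_waiter is None:
        lock.first_waiter = w
        return
    first = lock.first_waiter
    tail = first.prev
    tail.next = w
    w.prev = tail
    w.next = first
    first.prev = w

def waiter_first(lock):
    return lock.first_waiter

def waiter_next(lock, w):
    w = w.next
    return None if w is lock.first_waiter else w  # wrapped: done

def waiter_last(lock):
    first = lock.first_waiter
    return first.prev if first else None  # tail is first->prev

def waiter_prev(lock, w):
    # Check *before* stepping: once at first_waiter, the walk is over.
    return None if w is lock.first_waiter else w.prev

lock = Lock()
for n in "123":
    add_waiter(lock, Waiter(n))

fwd, cur = [], waiter_first(lock)
while cur:
    fwd.append(cur.name)
    cur = waiter_next(lock, cur)

back, cur = [], waiter_last(lock)
while cur:
    back.append(cur.name)
    cur = waiter_prev(lock, cur)

print(fwd, back)  # ['1', '2', '3'] ['3', '2', '1']
```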

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-03-30 19:50       ` Peter Zijlstra
@ 2026-04-20 13:03         ` Peter Zijlstra
  2026-04-21  6:45           ` John Stultz
  2026-04-21 14:31           ` Borah, Chaitanya Kumar
  0 siblings, 2 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-04-20 13:03 UTC (permalink / raw)
  To: Borah, Chaitanya Kumar
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam

On Mon, Mar 30, 2026 at 09:50:37PM +0200, Peter Zijlstra wrote:
> On Mon, Mar 30, 2026 at 01:56:33PM +0530, Borah, Chaitanya Kumar wrote:
> > > diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> > > index b1834ab7e782..bb8b410779d4 100644
> > > --- a/kernel/locking/ww_mutex.h
> > > +++ b/kernel/locking/ww_mutex.h
> > > @@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
> > >   	struct mutex_waiter *w = lock->first_waiter;
> > >   	if (w)
> > > -		w = list_prev_entry(w, list);
> > > +		w = __ww_waiter_prev(lock, w);
> > >   	return w;
> > >   }
> > Thank you for the response, Peter. Unfortunately, the issue is still seen
> > with this change.
> 
> Bah, indeed. Looking at this after the weekend I see that it's actually
> wrong.
> 
> But I haven't yet had a new idea. I don't suppose there is a relatively
> easy way to reproduce this issue outside of your CI robot?
> 
> My current working thesis is that since this is graphics, this is
> ww_mutex related. I'll go over this code once more...

Since you've not provided a reproducer, can I ask you to try the below?

---
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 186b463fe326..a93e57fc53b1 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -229,10 +229,8 @@ __mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter)
 		__mutex_clear_flag(lock, MUTEX_FLAGS);
 		lock->first_waiter = NULL;
 	} else {
-		if (lock->first_waiter == waiter) {
-			lock->first_waiter = list_first_entry(&waiter->list,
-							      struct mutex_waiter, list);
-		}
+		if (lock->first_waiter == waiter)
+			lock->first_waiter = list_next_entry(waiter, list);
 		list_del(&waiter->list);
 	}
 
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 016f0db892a5..875b303511b3 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -6,6 +6,32 @@
 #define MUTEX_WAITER	mutex_waiter
 #define WAIT_LOCK	wait_lock
 
+/*
+ *                +-------+
+ *                |   3   | <+
+ *                +-------+  |
+ *                    ^      |
+ *                    |      |
+ *                    v      |
+ *  +-------+     +-------+  |
+ *  | first | --> |   1   |  |
+ *  +-------+     +-------+  |
+ *                    ^      |
+ *                    |      |
+ *                    v      |
+ *                +-------+  |
+ *                |   2   | <+
+ *                +-------+
+ */
+
+/*
+ * Specifically:
+ *
+ *   for (cur = __ww_waiter_first(); cur; cur = __ww_waiter_next())
+ *     ...
+ *
+ * should iterate like: 1 2 3
+ */
 static inline struct mutex_waiter *
 __ww_waiter_first(struct mutex *lock)
 	__must_hold(&lock->wait_lock)
@@ -18,23 +44,21 @@ __ww_waiter_next(struct mutex *lock, struct mutex_waiter *w)
 	__must_hold(&lock->wait_lock)
 {
 	w = list_next_entry(w, list);
-	if (lock->first_waiter == w)
-		return NULL;
-
-	return w;
-}
-
-static inline struct mutex_waiter *
-__ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
-	__must_hold(&lock->wait_lock)
-{
-	w = list_prev_entry(w, list);
-	if (lock->first_waiter == w)
+	/* We've already seen first, terminate */
+	if (w == __ww_waiter_first(lock))
 		return NULL;
 
 	return w;
 }
 
+/*
+ * Specifically:
+ *
+ *   for (cur = __ww_waiter_last(); cur; cur = __ww_waiter_prev())
+ *     ...
+ *
+ * should iterate like: 3 2 1
+ */
 static inline struct mutex_waiter *
 __ww_waiter_last(struct mutex *lock)
 	__must_hold(&lock->wait_lock)
@@ -46,6 +70,18 @@ __ww_waiter_last(struct mutex *lock)
 	return w;
 }
 
+static inline struct mutex_waiter *
+__ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
+	__must_hold(&lock->wait_lock)
+{
+	w = list_prev_entry(w, list);
+	/* We've already seen last, terminate */
+	if (w == __ww_waiter_last(lock))
+		return NULL;
+
+	return w;
+}
+
 static inline void
 __ww_waiter_add(struct mutex *lock, struct mutex_waiter *waiter, struct mutex_waiter *pos)
 	__must_hold(&lock->wait_lock)

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev3)
  2026-03-27 13:39 Regression on linux-next (next-20260324 ) Borah, Chaitanya Kumar
  2026-03-27 16:31 ` Peter Zijlstra
  2026-03-27 16:49 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev2) Patchwork
@ 2026-04-20 19:22 ` Patchwork
  2026-04-21 15:17 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev4) Patchwork
  2026-04-22  9:54 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev5) Patchwork
  4 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2026-04-20 19:22 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: intel-gfx

== Series Details ==

Series: Regression on linux-next (next-20260324 ) (rev3)
URL   : https://patchwork.freedesktop.org/series/164009/
State : failure

== Summary ==

Address 'peterz@infradead.org' is not on the allowlist, which prevents CI from being triggered for this patch.
If you want Intel GFX CI to accept this address, please contact the script maintainers at i915-ci-infra@lists.freedesktop.org.
Exception occurred during validation, bailing out!



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-04-20 13:03         ` Peter Zijlstra
@ 2026-04-21  6:45           ` John Stultz
  2026-04-21 10:15             ` Peter Zijlstra
  2026-04-21 14:31           ` Borah, Chaitanya Kumar
  1 sibling, 1 reply; 23+ messages in thread
From: John Stultz @ 2026-04-21  6:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borah, Chaitanya Kumar, willy, linux-kernel,
	intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam,
	K Prateek Nayak

On Mon, Apr 20, 2026 at 6:03 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Mar 30, 2026 at 09:50:37PM +0200, Peter Zijlstra wrote:
> > On Mon, Mar 30, 2026 at 01:56:33PM +0530, Borah, Chaitanya Kumar wrote:
> > > > diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> > > > index b1834ab7e782..bb8b410779d4 100644
> > > > --- a/kernel/locking/ww_mutex.h
> > > > +++ b/kernel/locking/ww_mutex.h
> > > > @@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
> > > >           struct mutex_waiter *w = lock->first_waiter;
> > > >           if (w)
> > > > -         w = list_prev_entry(w, list);
> > > > +         w = __ww_waiter_prev(lock, w);
> > > >           return w;
> > > >   }
> > > Thank you for the response, Peter. Unfortunately, the issue is still seen
> > > with this change.
> >
> > Bah, indeed. Looking at this after the weekend I see that it's actually
> > wrong.
> >
> > But I haven't yet had a new idea. I don't suppose there is a relatively
> > easy way to reproduce this issue outside of your CI robot?
> >
> > My current working thesis is that since this is graphics, this is
> > ww_mutex related. I'll go over this code once more...

So I tripped over this in my own testing today prepping proxy patches,
bisecting it down to the same problematic commit 25500ba7e77c
("locking/mutex: Remove the list_head from struct mutex").

Indeed it does seem related to ww_mutexes, as I can pretty easily
reproduce it with defconfig + CONFIG_WW_MUTEX_SELFTEST=y using
qemu-system-x86, where the test will basically hang on bootup.

> Since you've not provided a reproducer, can I ask you to try the below?
>

Unfortunately that patch doesn't seem to sort it (I see the same
behavior). I'm about cooked for tonight, so I'll have to look more
closely tomorrow.

thanks
-john

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-04-21  6:45           ` John Stultz
@ 2026-04-21 10:15             ` Peter Zijlstra
  2026-04-21 12:54               ` K Prateek Nayak
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2026-04-21 10:15 UTC (permalink / raw)
  To: John Stultz
  Cc: Borah, Chaitanya Kumar, willy, linux-kernel,
	intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam,
	K Prateek Nayak

On Mon, Apr 20, 2026 at 11:45:12PM -0700, John Stultz wrote:

> So I tripped over this in my own testing today preping proxy patches,
> bisecting it down to the same problematic commit 25500ba7e77c
> ("locking/mutex: Remove the list_head from struct mutex").
> 
> Inteed it does seem related to ww_mutexes, as I can pretty easily
> reproduce it with defconfig + CONFIG_WW_MUTEX_SELFTEST=y  using
> qemu-system-x86
> 
> Where the test will basically hang on bootup.

*groan* indeed. This of course means no CI is running this thing :-(

Anyway, yay for deterministic reproducer. Let me go prod at this.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-04-21 10:15             ` Peter Zijlstra
@ 2026-04-21 12:54               ` K Prateek Nayak
  2026-04-21 14:37                 ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: K Prateek Nayak @ 2026-04-21 12:54 UTC (permalink / raw)
  To: Peter Zijlstra, John Stultz
  Cc: Borah, Chaitanya Kumar, willy, linux-kernel,
	intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam



On 4/21/2026 3:45 PM, Peter Zijlstra wrote:
> On Mon, Apr 20, 2026 at 11:45:12PM -0700, John Stultz wrote:
> 
>> So I tripped over this in my own testing today preping proxy patches,
>> bisecting it down to the same problematic commit 25500ba7e77c
>> ("locking/mutex: Remove the list_head from struct mutex").
>>
>> Inteed it does seem related to ww_mutexes, as I can pretty easily
>> reproduce it with defconfig + CONFIG_WW_MUTEX_SELFTEST=y  using
>> qemu-system-x86
>>
>> Where the test will basically hang on bootup.
> 
> *groan* indeed. This of course means no CI is running this thing :-(
> 
> Anyway, yay for deterministic reproducer. Let me go prod at this.

So I managed to unblock the ww_mutex selftest with:

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 186b463fe326..623c892c3742 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -209,8 +209,13 @@ __mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 	hung_task_set_blocker(lock, BLOCKER_TYPE_MUTEX);
 	debug_mutex_add_waiter(lock, waiter, current);
 
-	if (!first)
+	if (!first) {
 		first = lock->first_waiter;
+	} else if (first == lock->first_waiter) {
+		list_add_tail(&waiter->list, &first->list);
+		lock->first_waiter = waiter;
+		return;
+	}
 
 	if (first) {
 		list_add_tail(&waiter->list, &first->list);
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 016f0db892a5..2fcd6221fc64 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -28,10 +28,9 @@ static inline struct mutex_waiter *
 __ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
 	__must_hold(&lock->wait_lock)
 {
-	w = list_prev_entry(w, list);
 	if (lock->first_waiter == w)
 		return NULL;
-
+	w = list_prev_entry(w, list);
 	return w;
 }
 
---

The first hunk keeps first_waiter correct when we attach at the tail
of the current first_waiter, which previously would have ended up next
to the list_head.

The second hunk deals with __ww_waiter_prev(): since we are traversing
backwards from w, I guess we must first check whether we are already at
first_waiter before stepping back.

I'll let you stare at it and see whether it is correct.
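FWIW, the ordering difference in that second hunk can be seen with a toy model (plain Python indices standing in for the circular list, not the kernel code; the helper names below are made up):

```python
names = ["1", "2", "3"]
n = len(names)
first = 0  # index of first_waiter

def prev_check_then_step(i):
    # Proposed ordering: bail out if already at first_waiter,
    # only then step backwards.
    if i == first:
        return None
    return (i - 1) % n

def prev_step_then_check(i):
    # Previous ordering: step first, then compare against
    # first_waiter -- this drops the first waiter from the walk.
    i = (i - 1) % n
    if i == first:
        return None
    return i

def walk(prev_fn):
    out, i = [], (first - 1) % n  # __ww_waiter_last == first->prev
    while i is not None:
        out.append(names[i])
        i = prev_fn(i)
    return out

print(walk(prev_check_then_step))  # ['3', '2', '1']
print(walk(prev_step_then_check))  # ['3', '2'] -- waiter 1 never visited
```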

-- 
Thanks and Regards,
Prateek


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-04-20 13:03         ` Peter Zijlstra
  2026-04-21  6:45           ` John Stultz
@ 2026-04-21 14:31           ` Borah, Chaitanya Kumar
  1 sibling, 0 replies; 23+ messages in thread
From: Borah, Chaitanya Kumar @ 2026-04-21 14:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam

Hello Peter,

On 4/20/2026 6:33 PM, Peter Zijlstra wrote:
> On Mon, Mar 30, 2026 at 09:50:37PM +0200, Peter Zijlstra wrote:
>> On Mon, Mar 30, 2026 at 01:56:33PM +0530, Borah, Chaitanya Kumar wrote:
>>>> diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
>>>> index b1834ab7e782..bb8b410779d4 100644
>>>> --- a/kernel/locking/ww_mutex.h
>>>> +++ b/kernel/locking/ww_mutex.h
>>>> @@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
>>>>    	struct mutex_waiter *w = lock->first_waiter;
>>>>    	if (w)
>>>> -		w = list_prev_entry(w, list);
>>>> +		w = __ww_waiter_prev(lock, w);
>>>>    	return w;
>>>>    }
>>> Thank you for the response, Peter. Unfortunately, the issue is still seen
>>> with this change.
>>
>> Bah, indeed. Looking at this after the weekend I see that it's actually
>> wrong.
>>
>> But I haven't yet had a new idea. I don't suppose there is a relatively
>> easy way to reproduce this issue outside of your CI robot?
>>
>> My current working thesis is that since this is graphics, this is
>> ww_mutex related. I'll go over this code once more...
> 
> Since you've not provided a reproducer, can I ask you to try the below?
> 
> ---
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 186b463fe326..a93e57fc53b1 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -229,10 +229,8 @@ __mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter)
>   		__mutex_clear_flag(lock, MUTEX_FLAGS);
>   		lock->first_waiter = NULL;
>   	} else {
> -		if (lock->first_waiter == waiter) {
> -			lock->first_waiter = list_first_entry(&waiter->list,
> -							      struct mutex_waiter, list);
> -		}
> +		if (lock->first_waiter == waiter)
> +			lock->first_waiter = list_next_entry(waiter, list);
>   		list_del(&waiter->list);
>   	}
>   
> diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> index 016f0db892a5..875b303511b3 100644
> --- a/kernel/locking/ww_mutex.h
> +++ b/kernel/locking/ww_mutex.h
> @@ -6,6 +6,32 @@
>   #define MUTEX_WAITER	mutex_waiter
>   #define WAIT_LOCK	wait_lock
>   
> +/*
> + *                +-------+
> + *                |   3   | <+
> + *                +-------+  |
> + *                    ^      |
> + *                    |      |
> + *                    v      |
> + *  +-------+     +-------+  |
> + *  | first | --> |   1   |  |
> + *  +-------+     +-------+  |
> + *                    ^      |
> + *                    |      |
> + *                    v      |
> + *                +-------+  |
> + *                |   2   | <+
> + *                +-------+
> + */
> +
> +/*
> + * Specifically:
> + *
> + *   for (cur = __ww_waiter_first(); cur; cur = __ww_waiter_next())
> + *     ...
> + *
> + * should iterate like: 1 2 3
> + */
>   static inline struct mutex_waiter *
>   __ww_waiter_first(struct mutex *lock)
>   	__must_hold(&lock->wait_lock)
> @@ -18,23 +44,21 @@ __ww_waiter_next(struct mutex *lock, struct mutex_waiter *w)
>   	__must_hold(&lock->wait_lock)
>   {
>   	w = list_next_entry(w, list);
> -	if (lock->first_waiter == w)
> -		return NULL;
> -
> -	return w;
> -}
> -
> -static inline struct mutex_waiter *
> -__ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
> -	__must_hold(&lock->wait_lock)
> -{
> -	w = list_prev_entry(w, list);
> -	if (lock->first_waiter == w)
> +	/* We've already seen first, terminate */
> +	if (w == __ww_waiter_first(lock))
>   		return NULL;
>   
>   	return w;
>   }
>   
> +/*
> + * Specifically:
> + *
> + *   for (cur = __ww_waiter_last(); cur; cur = __ww_waiter_prev())
> + *     ...
> + *
> + * should iterate like: 3 2 1
> + */
>   static inline struct mutex_waiter *
>   __ww_waiter_last(struct mutex *lock)
>   	__must_hold(&lock->wait_lock)
> @@ -46,6 +70,18 @@ __ww_waiter_last(struct mutex *lock)
>   	return w;
>   }
>   
> +static inline struct mutex_waiter *
> +__ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
> +	__must_hold(&lock->wait_lock)
> +{
> +	w = list_prev_entry(w, list);
> +	/* We've already seen last, terminate */
> +	if (w == __ww_waiter_last(lock))
> +		return NULL;
> +
> +	return w;
> +}
> +
>   static inline void
>   __ww_waiter_add(struct mutex *lock, struct mutex_waiter *waiter, struct mutex_waiter *pos)
>   	__must_hold(&lock->wait_lock)

Thank you for the patch.
This seems to fix the issue on our CI machine. The diff turned out to be 
slightly different, pasting it here just in case.

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 186b463fe326..a93e57fc53b1 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -229,10 +229,8 @@ __mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter)
                 __mutex_clear_flag(lock, MUTEX_FLAGS);
                 lock->first_waiter = NULL;
         } else {
-               if (lock->first_waiter == waiter) {
-                       lock->first_waiter = list_first_entry(&waiter->list,
-                                                             struct mutex_waiter, list);
-               }
+               if (lock->first_waiter == waiter)
+                       lock->first_waiter = list_next_entry(waiter, list);
                 list_del(&waiter->list);
         }

diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 016f0db892a5..875b303511b3 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -6,6 +6,32 @@
#define MUTEX_WAITER   mutex_waiter
#define WAIT_LOCK      wait_lock

+/*
+ *                +-------+
+ *                |   3   | <+
+ *                +-------+  |
+ *                    ^      |
+ *                    |      |
+ *                    v      |
+ *  +-------+     +-------+  |
+ *  | first | --> |   1   |  |
+ *  +-------+     +-------+  |
+ *                    ^      |
+ *                    |      |
+ *                    v      |
+ *                +-------+  |
+ *                |   2   | <+
+ *                +-------+
+ */
+
+/*
+ * Specifically:
+ *
+ *   for (cur = __ww_waiter_first(); cur; cur = __ww_waiter_next())
+ *     ...
+ *
+ * should iterate like: 1 2 3
+ */
static inline struct mutex_waiter *
__ww_waiter_first(struct mutex *lock)
         __must_hold(&lock->wait_lock)
@@ -18,31 +44,41 @@ __ww_waiter_next(struct mutex *lock, struct mutex_waiter *w)
         __must_hold(&lock->wait_lock)
{
         w = list_next_entry(w, list);
-       if (lock->first_waiter == w)
+       /* We've already seen first, terminate */
+       if (w == __ww_waiter_first(lock))
                 return NULL;

         return w;
}

+/*
+ * Specifically:
+ *
+ *   for (cur = __ww_waiter_last(); cur; cur = __ww_waiter_prev())
+ *     ...
+ *
+ * should iterate like: 3 2 1
+ */
static inline struct mutex_waiter *
-__ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
+__ww_waiter_last(struct mutex *lock)
         __must_hold(&lock->wait_lock)
{
-       w = list_prev_entry(w, list);
-       if (lock->first_waiter == w)
-               return NULL;
+       struct mutex_waiter *w = lock->first_waiter;

+       if (w)
+               w = list_prev_entry(w, list);
         return w;
}

static inline struct mutex_waiter *
-__ww_waiter_last(struct mutex *lock)
+__ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
         __must_hold(&lock->wait_lock)
{
-       struct mutex_waiter *w = lock->first_waiter;
+       w = list_prev_entry(w, list);
+       /* We've already seen last, terminate */
+       if (w == __ww_waiter_last(lock))
+               return NULL;

-       if (w)
-               w = list_prev_entry(w, list);
         return w;
}

==
Chaitanya



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-04-21 12:54               ` K Prateek Nayak
@ 2026-04-21 14:37                 ` Peter Zijlstra
  2026-04-21 14:45                   ` Matthew Wilcox
  2026-04-21 15:48                   ` K Prateek Nayak
  0 siblings, 2 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-04-21 14:37 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: John Stultz, Borah, Chaitanya Kumar, willy, linux-kernel,
	intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam

On Tue, Apr 21, 2026 at 06:24:20PM +0530, K Prateek Nayak wrote:
> 
> 
> On 4/21/2026 3:45 PM, Peter Zijlstra wrote:
> > On Mon, Apr 20, 2026 at 11:45:12PM -0700, John Stultz wrote:
> > 
> >> So I tripped over this in my own testing today prepping proxy patches,
> >> bisecting it down to the same problematic commit 25500ba7e77c
> >> ("locking/mutex: Remove the list_head from struct mutex").
> >>
> >> Indeed it does seem related to ww_mutexes, as I can pretty easily
> >> reproduce it with defconfig + CONFIG_WW_MUTEX_SELFTEST=y  using
> >> qemu-system-x86
> >>
> >> Where the test will basically hang on bootup.
> > 
> > *groan* indeed. This of course means no CI is running this thing :-(
> > 
> > Anyway, yay for deterministic reproducer. Let me go prod at this.
> 
> So I managed to unblock the ww-mutex selftest with:
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 186b463fe326..623c892c3742 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -209,8 +209,13 @@ __mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
>  	hung_task_set_blocker(lock, BLOCKER_TYPE_MUTEX);
>  	debug_mutex_add_waiter(lock, waiter, current);
>  
> -	if (!first)
> +	if (!first) {
>  		first = lock->first_waiter;
> +	} else if (first == lock->first_waiter) {
> +		list_add_tail(&waiter->list, &first->list);
> +		lock->first_waiter = waiter;
> +		return;
> +	}
>  
>  	if (first) {
>  		list_add_tail(&waiter->list, &first->list);

> First hunk orders the first_waiter if we are attaching to the
> tail of current first_waiter which would have previously ended
> up next to list_head.

This is the case in __ww_mutex_add_waiter() where pos == first, right?

Argh, I see... yes. Perhaps something like the below though?

> diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> index 016f0db892a5..2fcd6221fc64 100644
> --- a/kernel/locking/ww_mutex.h
> +++ b/kernel/locking/ww_mutex.h
> @@ -28,10 +28,9 @@ static inline struct mutex_waiter *
>  __ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
>  	__must_hold(&lock->wait_lock)
>  {
> -	w = list_prev_entry(w, list);
>  	if (lock->first_waiter == w)
>  		return NULL;
> -
> +	w = list_prev_entry(w, list);
>  	return w;
>  }

> The second hunk deals with __ww_waiter_prev() - since we are
> traversing back from w, I guess we must first check if we are
> at the first_waiter already or not.

Yes, that second hunk is what I found yesterday, although my fix was
far more verbose. I like this one better ;-)


---
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 95f1822122a1..7d48d6f49f71 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -200,21 +200,34 @@ static inline void __mutex_clear_flag(struct mutex *lock, unsigned long flag)
  */
 static void
 __mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
-		   struct mutex_waiter *first)
+		   struct mutex_waiter *pos)
 {
+	struct mutex_waiter *first = lock->first_waiter;
+
 	hung_task_set_blocker(lock, BLOCKER_TYPE_MUTEX);
 	debug_mutex_add_waiter(lock, waiter, current);
 
-	if (!first)
-		first = lock->first_waiter;
+	if (pos) {
+		/*
+		 * Insert @waiter before @pos.
+		 */
+		list_add_tail(&waiter->list, &pos->list);
+		/*
+		 * If @pos == @first, then @waiter will be the new first.
+		 */
+		if (pos == first)
+			lock->first_waiter = waiter;
+		return;
+	}
 
 	if (first) {
 		list_add_tail(&waiter->list, &first->list);
-	} else {
-		INIT_LIST_HEAD(&waiter->list);
-		lock->first_waiter = waiter;
-		__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
+		return;
 	}
+
+	INIT_LIST_HEAD(&waiter->list);
+	lock->first_waiter = waiter;
+	__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
 }
 
 static void
@@ -224,10 +237,8 @@ __mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter)
 		__mutex_clear_flag(lock, MUTEX_FLAGS);
 		lock->first_waiter = NULL;
 	} else {
-		if (lock->first_waiter == waiter) {
-			lock->first_waiter = list_first_entry(&waiter->list,
-							      struct mutex_waiter, list);
-		}
+		if (lock->first_waiter == waiter)
+			lock->first_waiter = list_next_entry(waiter, list);
 		list_del(&waiter->list);
 	}
 

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-04-21 14:37                 ` Peter Zijlstra
@ 2026-04-21 14:45                   ` Matthew Wilcox
  2026-04-21 15:03                     ` Peter Zijlstra
  2026-04-21 15:48                   ` K Prateek Nayak
  1 sibling, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2026-04-21 14:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: K Prateek Nayak, John Stultz, Borah, Chaitanya Kumar,
	linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam

On Tue, Apr 21, 2026 at 04:37:52PM +0200, Peter Zijlstra wrote:
> Argh, I see... yes. Perhaps something like the below though?

This is a documentation bug.  I thought it was supposed to add *after*
pos, not before.  So can we clear that up too?

@@ -198,7 +198,7 @@ static inline void __mutex_clear_flag(struct mutex *lock, unsigned long flag)
 }

 /*
- * Add @waiter to a given location in the lock wait_list and set the
+ * Add @waiter before a given location in the lock wait_list and set the
  * FLAG_WAITERS flag if it's the first waiter.
  */
 static void


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-04-21 14:45                   ` Matthew Wilcox
@ 2026-04-21 15:03                     ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-04-21 15:03 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: K Prateek Nayak, John Stultz, Borah, Chaitanya Kumar,
	linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam

On Tue, Apr 21, 2026 at 03:45:02PM +0100, Matthew Wilcox wrote:
> On Tue, Apr 21, 2026 at 04:37:52PM +0200, Peter Zijlstra wrote:
> > Argh, I see... yes. Perhaps something like the below though?
> 
> This is a documentation bug.  I thought it was supposed to add *after*
> pos, not before.  So can we clear that up too?

Sure, that comment is indeed less than clear. The comment in
__ww_mutex_add_waiter() is better.

> 
> @@ -198,7 +198,7 @@ static inline void __mutex_clear_flag(struct mutex *lock, unsigned long flag)
>  }
> 
>  /*
> - * Add @waiter to a given location in the lock wait_list and set the
> + * Add @waiter before a given location in the lock wait_list and set the
>   * FLAG_WAITERS flag if it's the first waiter.
>   */
>  static void
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev4)
  2026-03-27 13:39 Regression on linux-next (next-20260324 ) Borah, Chaitanya Kumar
                   ` (2 preceding siblings ...)
  2026-04-20 19:22 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev3) Patchwork
@ 2026-04-21 15:17 ` Patchwork
  2026-04-22  9:54 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev5) Patchwork
  4 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2026-04-21 15:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: intel-gfx

== Series Details ==

Series: Regression on linux-next (next-20260324 ) (rev4)
URL   : https://patchwork.freedesktop.org/series/164009/
State : failure

== Summary ==

Address 'peterz@infradead.org' is not on the allowlist, which prevents CI from being triggered for this patch.
If you want Intel GFX CI to accept this address, please contact the script maintainers at i915-ci-infra@lists.freedesktop.org.
Exception occurred during validation, bailing out!



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-04-21 14:37                 ` Peter Zijlstra
  2026-04-21 14:45                   ` Matthew Wilcox
@ 2026-04-21 15:48                   ` K Prateek Nayak
  2026-04-21 17:29                     ` John Stultz
  1 sibling, 1 reply; 23+ messages in thread
From: K Prateek Nayak @ 2026-04-21 15:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: John Stultz, Borah, Chaitanya Kumar, willy, linux-kernel,
	intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam

Hello Peter,

On 4/21/2026 8:07 PM, Peter Zijlstra wrote:
>> First hunk orders the first_waiter if we are attaching to the
>> tail of current first_waiter which would have previously ended
>> up next to list_head.
> 
> This is the case in __ww_mutex_add_waiter() where pos == first, right?
> 
> Argh, I see... yes. Perhaps something like the below though?

Neat! Thank you for cleaning it up. Those, along with the changes in
ww_mutex.h, fix the issue of ww-mutex_test hanging in my case. Feel free
to include:

Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

-- 
Thanks and Regards,
Prateek


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-04-21 15:48                   ` K Prateek Nayak
@ 2026-04-21 17:29                     ` John Stultz
  2026-04-21 20:56                       ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: John Stultz @ 2026-04-21 17:29 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Peter Zijlstra, Borah, Chaitanya Kumar, willy, linux-kernel,
	intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam

On Tue, Apr 21, 2026 at 8:48 AM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> On 4/21/2026 8:07 PM, Peter Zijlstra wrote:
> >> First hunk orders the first_waiter if we are attaching to the
> >> tail of current first_waiter which would have previously ended
> >> up next to list_head.
> >
> > This is the case in __ww_mutex_add_waiter() where pos == first, right?
> >
> > Argh, I see... yes. Perhaps something like the below though?
>
> Neat! Thank you for cleaning it up. Those, along with the changes in
> ww_mutex.h fix the issue of ww-mutex_test hanging in my case. Feel free
> to include:
>
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

Same. With Peter's change and K Prateek's ww_mutex.h change it looks
like it's working for me.

Thank you both!
Tested-by: John Stultz <jstultz@google.com>

thanks
-john

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-04-21 17:29                     ` John Stultz
@ 2026-04-21 20:56                       ` Peter Zijlstra
  2026-04-22  9:23                         ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2026-04-21 20:56 UTC (permalink / raw)
  To: John Stultz
  Cc: K Prateek Nayak, Borah, Chaitanya Kumar, willy, linux-kernel,
	intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam

On Tue, Apr 21, 2026 at 10:29:28AM -0700, John Stultz wrote:
> On Tue, Apr 21, 2026 at 8:48 AM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> > On 4/21/2026 8:07 PM, Peter Zijlstra wrote:
> > >> First hunk orders the first_waiter if we are attaching to the
> > >> tail of current first_waiter which would have previously ended
> > >> up next to list_head.
> > >
> > > This is the case in __ww_mutex_add_waiter() where pos == first, right?
> > >
> > > Argh, I see... yes. Perhaps something like the below though?
> >
> > Neat! Thank you for cleaning it up. Those, along with the changes in
> > ww_mutex.h fix the issue of ww-mutex_test hanging in my case. Feel free
> > to include:
> >
> > Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> 
> Same. With Peter's change and K Prateek's ww_mutex.h change it looks
> like it's working for me.
> 
> Thank you both!
> Tested-by: John Stultz <jstultz@google.com>

Excellent, I'll write it up tomorrow.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-04-21 20:56                       ` Peter Zijlstra
@ 2026-04-22  9:23                         ` Peter Zijlstra
  2026-04-22 12:07                           ` K Prateek Nayak
  2026-04-22 15:52                           ` mikhail.v.gavrilov
  0 siblings, 2 replies; 23+ messages in thread
From: Peter Zijlstra @ 2026-04-22  9:23 UTC (permalink / raw)
  To: John Stultz
  Cc: K Prateek Nayak, Borah, Chaitanya Kumar, willy, linux-kernel,
	intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam

On Tue, Apr 21, 2026 at 10:56:48PM +0200, Peter Zijlstra wrote:
> Excellent, I'll write it up tomorrow.

How's this? It 'passes' the ww_mutex selftest thing in so far as that I
get the same:

[    2.312369] Beginning ww (wound) mutex selftests
[    4.853240] stress (stress_inorder_work) failed with -35
[    9.379572] Beginning ww (die) mutex selftests
[   16.435831] All ww mutex selftests passed

before the offending commit and after this patch.

---
Subject: locking/mutex: Fix ww_mutex wait_list operations
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Apr 22 10:38:41 CEST 2026

Chaitanya and John reported commit 25500ba7e77c ("locking/mutex: Remove the
list_head from struct mutex") wrecked ww_mutex.

Specifically there were 2 issues:

 - __ww_waiter_prev() had the termination condition wrong; it would terminate
   when the previous entry was the first, which results in a truncated
   iteration: W3, W2, (no W1).

 - __mutex_add_waiter(@pos != NULL), as used by __ww_waiter_add() /
   __ww_mutex_add_waiter(); this inserts @waiter before @pos (which is what
   list_add_tail() does). But this should then also update lock->first_waiter.

Much thanks to Prateek for spotting the __mutex_add_waiter() issue!

Fixes: 25500ba7e77c ("locking/mutex: Remove the list_head from struct mutex")
Reported-by: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>
Closes: https://lore.kernel.org/r/af005996-05e9-4336-8450-d14ca652ba5d%40intel.com
Reported-by: John Stultz <jstultz@google.com>
Closes: https://lore.kernel.org/r/CANDhNCq%3Doizzud3hH3oqGzTrcjB8OwGeineJ3mwZuGdDWG8fRQ%40mail.gmail.com
Debugged-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/mutex.c    |   40 +++++++++++++++++++++++++++-------------
 kernel/locking/ww_mutex.h |   34 ++++++++++++++++++++++++++++++++--
 2 files changed, 59 insertions(+), 15 deletions(-)

--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -198,27 +198,43 @@ static inline void __mutex_clear_flag(st
 }
 
 /*
- * Add @waiter to a given location in the lock wait_list and set the
- * FLAG_WAITERS flag if it's the first waiter.
+ * Add @waiter to the @lock wait_list and set the FLAG_WAITERS flag if it's
+ * the first waiter.
+ *
+ * When @pos, @waiter is added before the waiter indicated by @pos. Otherwise
+ * @waiter will be added to the tail of the list.
  */
 static void
 __mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
-		   struct mutex_waiter *first)
+		   struct mutex_waiter *pos)
 	__must_hold(&lock->wait_lock)
 {
+	struct mutex_waiter *first = lock->first_waiter;
+
 	hung_task_set_blocker(lock, BLOCKER_TYPE_MUTEX);
 	debug_mutex_add_waiter(lock, waiter, current);
 
-	if (!first)
-		first = lock->first_waiter;
+	if (pos) {
+		/*
+		 * Insert @waiter before @pos.
+		 */
+		list_add_tail(&waiter->list, &pos->list);
+		/*
+		 * If @pos == @first, then @waiter will be the new first.
+		 */
+		if (pos == first)
+			lock->first_waiter = waiter;
+		return;
+	}
 
 	if (first) {
 		list_add_tail(&waiter->list, &first->list);
-	} else {
-		INIT_LIST_HEAD(&waiter->list);
-		lock->first_waiter = waiter;
-		__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
+		return;
 	}
+
+	INIT_LIST_HEAD(&waiter->list);
+	lock->first_waiter = waiter;
+	__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
 }
 
 static void
@@ -229,10 +245,8 @@ __mutex_remove_waiter(struct mutex *lock
 		__mutex_clear_flag(lock, MUTEX_FLAGS);
 		lock->first_waiter = NULL;
 	} else {
-		if (lock->first_waiter == waiter) {
-			lock->first_waiter = list_first_entry(&waiter->list,
-							      struct mutex_waiter, list);
-		}
+		if (lock->first_waiter == waiter)
+			lock->first_waiter = list_next_entry(waiter, list);
 		list_del(&waiter->list);
 	}
 
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -6,6 +6,19 @@
 #define MUTEX_WAITER	mutex_waiter
 #define WAIT_LOCK	wait_lock
 
+/*
+ *           +--------+
+ *           | first  |
+ *           +--------+
+ *                |
+ *                v
+ *  +----+     +----+     +----+
+ *  | W3 | <-> | W1 | <-> | W2 |
+ *  +----+     +----+     +----+
+ *    ^                     ^
+ *    +---------------------+
+ */
+
 static inline struct mutex_waiter *
 __ww_waiter_first(struct mutex *lock)
 	__must_hold(&lock->wait_lock)
@@ -13,26 +26,43 @@ __ww_waiter_first(struct mutex *lock)
 	return lock->first_waiter;
 }
 
+/*
+ * for (cur = __ww_waiter_first(); cur; cur = __ww_waiter_next())
+ *
+ * Should iterate like: W1, W2, W3
+ */
 static inline struct mutex_waiter *
 __ww_waiter_next(struct mutex *lock, struct mutex_waiter *w)
 	__must_hold(&lock->wait_lock)
 {
 	w = list_next_entry(w, list);
+	/*
+	 * Terminate if the next entry is the first again, that has already
+	 * been observed.
+	 */
 	if (lock->first_waiter == w)
 		return NULL;
 
 	return w;
 }
 
+/*
+ * for (cur = __ww_waiter_last(); cur; cur = __ww_waiter_prev())
+ *
+ * Should iterate like: W3, W2, W1
+ */
 static inline struct mutex_waiter *
 __ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
 	__must_hold(&lock->wait_lock)
 {
-	w = list_prev_entry(w, list);
+	/*
+	 * Terminate at the first entry, the previous entry of first is the
+	 * last and that has already been observed.
+	 */
 	if (lock->first_waiter == w)
 		return NULL;
 
-	return w;
+	return list_prev_entry(w, list);
 }
 
 static inline struct mutex_waiter *

^ permalink raw reply	[flat|nested] 23+ messages in thread

* ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev5)
  2026-03-27 13:39 Regression on linux-next (next-20260324 ) Borah, Chaitanya Kumar
                   ` (3 preceding siblings ...)
  2026-04-21 15:17 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev4) Patchwork
@ 2026-04-22  9:54 ` Patchwork
  4 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2026-04-22  9:54 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: intel-gfx

== Series Details ==

Series: Regression on linux-next (next-20260324 ) (rev5)
URL   : https://patchwork.freedesktop.org/series/164009/
State : failure

== Summary ==

Address 'peterz@infradead.org' is not on the allowlist, which prevents CI from being triggered for this patch.
If you want Intel GFX CI to accept this address, please contact the script maintainers at i915-ci-infra@lists.freedesktop.org.
Exception occurred during validation, bailing out!



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-04-22  9:23                         ` Peter Zijlstra
@ 2026-04-22 12:07                           ` K Prateek Nayak
  2026-04-22 15:52                           ` mikhail.v.gavrilov
  1 sibling, 0 replies; 23+ messages in thread
From: K Prateek Nayak @ 2026-04-22 12:07 UTC (permalink / raw)
  To: Peter Zijlstra, John Stultz
  Cc: Borah, Chaitanya Kumar, willy, linux-kernel,
	intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam

Hello Peter,

On 4/22/2026 2:53 PM, Peter Zijlstra wrote:
> On Tue, Apr 21, 2026 at 10:56:48PM +0200, Peter Zijlstra wrote:
>> Excellent, I'll write it up tomorrow.
> 
> How's this? It 'passes' the ww_mutex selftest thing in so far as that I
> get the same:
> 
> [    2.312369] Beginning ww (wound) mutex selftests
> [    4.853240] stress (stress_inorder_work) failed with -35
> [    9.379572] Beginning ww (die) mutex selftests
> [   16.435831] All ww mutex selftests passed
> 
> before the offending commit and after this patch.

Yup, I see pretty much the same. I think 4k ww-mutexes being fought
over by #CPUs threads with a short timeout doesn't sit too well with
stress_inorder_work, but it does make some forward progress until
that timeout hits, like before ;-)

> 
> ---
> Subject: Subject: locking/mutex: Fix ww_mutex wait_list operations
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed Apr 22 10:38:41 CEST 2026
> 
> Chaitanya and John reported commit 25500ba7e77c ("locking/mutex: Remove the
> list_head from struct mutex") wrecked ww_mutex.
> 
> Specifically there were 2 issues:
> 
>  - __ww_waiter_prev() had the termination condition wrong; it would terminate
>    when the previous entry was the first, which results in a truncated
>    iteration: W3, W2, (no W1).
> 
>  - __mutex_add_waiter(@pos != NULL), as used by __ww_waiter_add() /
>    __ww_mutex_add_waiter(); this inserts @waiter before @pos (which is what
>    list_add_tail() does). But this should then also update lock->first_waiter.
> 
> Much thanks to Prateek for spotting the __mutex_add_waiter() issue!
> 
> Fixes: 25500ba7e77c ("locking/mutex: Remove the list_head from struct mutex")
> Reported-by: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>
> Closes: https://lore.kernel.org/r/af005996-05e9-4336-8450-d14ca652ba5d%40intel.com
> Reported-by: John Stultz <jstultz@google.com>
> Closes: https://lore.kernel.org/r/CANDhNCq%3Doizzud3hH3oqGzTrcjB8OwGeineJ3mwZuGdDWG8fRQ%40mail.gmail.com
> Debugged-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Still runs as expected! Feel free to include:

Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

-- 
Thanks and Regards,
Prateek


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Regression on linux-next (next-20260324 )
  2026-04-22  9:23                         ` Peter Zijlstra
  2026-04-22 12:07                           ` K Prateek Nayak
@ 2026-04-22 15:52                           ` mikhail.v.gavrilov
  1 sibling, 0 replies; 23+ messages in thread
From: mikhail.v.gavrilov @ 2026-04-22 15:52 UTC (permalink / raw)
  To: Peter Zijlstra, John Stultz
  Cc: K Prateek Nayak, Borah, Chaitanya Kumar, willy, linux-kernel,
	intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
	Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam

On Wed, 2026-04-22 at 11:23 +0200, Peter Zijlstra wrote:
> 
> How's this? It 'passes' the ww_mutex selftest thing in so far as that I
> get the same:
> 
> [    2.312369] Beginning ww (wound) mutex selftests
> [    4.853240] stress (stress_inorder_work) failed with -35
> [    9.379572] Beginning ww (die) mutex selftests
> [   16.435831] All ww mutex selftests passed
> 
> before the offending commit and after this patch.
> 
> ---
> Subject: Subject: locking/mutex: Fix ww_mutex wait_list operations
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed Apr 22 10:38:41 CEST 2026
> 
> Chaitanya and John reported commit 25500ba7e77c ("locking/mutex: Remove the
> list_head from struct mutex") wrecked ww_mutex.
> 
> Specifically there were 2 issues:
> 
>  - __ww_waiter_prev() had the termination condition wrong; it would terminate
>    when the previous entry was the first, which results in a truncated
>    iteration: W3, W2, (no W1).
> 
>  - __mutex_add_waiter(@pos != NULL), as used by __ww_waiter_add() /
>    __ww_mutex_add_waiter(); this inserts @waiter before @pos (which is what
>    list_add_tail() does). But this should then also update lock->first_waiter.
> 
> Much thanks to Prateek for spotting the __mutex_add_waiter() issue!
> 
> Fixes: 25500ba7e77c ("locking/mutex: Remove the list_head from struct mutex")
> Reported-by: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>
> Closes: https://lore.kernel.org/r/af005996-05e9-4336-8450-d14ca652ba5d%40intel.com
> Reported-by: John Stultz <jstultz@google.com>
> Closes: https://lore.kernel.org/r/CANDhNCq%3Doizzud3hH3oqGzTrcjB8OwGeineJ3mwZuGdDWG8fRQ%40mail.gmail.com
> Debugged-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/locking/mutex.c    |   40 +++++++++++++++++++++++++++-------------
>  kernel/locking/ww_mutex.h |   34 ++++++++++++++++++++++++++++++++--
>  2 files changed, 59 insertions(+), 15 deletions(-)
> 
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -198,27 +198,43 @@ static inline void __mutex_clear_flag(st
>  }
>  
>  /*
> - * Add @waiter to a given location in the lock wait_list and set the
> - * FLAG_WAITERS flag if it's the first waiter.
> + * Add @waiter to the @lock wait_list and set the FLAG_WAITERS flag if it's
> + * the first waiter.
> + *
> + * When @pos, @waiter is added before the waiter indicated by @pos. Otherwise
> + * @waiter will be added to the tail of the list.
>   */
>  static void
>  __mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
> -		   struct mutex_waiter *first)
> +		   struct mutex_waiter *pos)
>  	__must_hold(&lock->wait_lock)
>  {
> +	struct mutex_waiter *first = lock->first_waiter;
> +
>  	hung_task_set_blocker(lock, BLOCKER_TYPE_MUTEX);
>  	debug_mutex_add_waiter(lock, waiter, current);
>  
> -	if (!first)
> -		first = lock->first_waiter;
> +	if (pos) {
> +		/*
> +		 * Insert @waiter before @pos.
> +		 */
> +		list_add_tail(&waiter->list, &pos->list);
> +		/*
> +		 * If @pos == @first, then @waiter will be the new first.
> +		 */
> +		if (pos == first)
> +			lock->first_waiter = waiter;
> +		return;
> +	}
>  
>  	if (first) {
>  		list_add_tail(&waiter->list, &first->list);
> -	} else {
> -		INIT_LIST_HEAD(&waiter->list);
> -		lock->first_waiter = waiter;
> -		__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
> +		return;
>  	}
> +
> +	INIT_LIST_HEAD(&waiter->list);
> +	lock->first_waiter = waiter;
> +	__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
>  }
>  
>  static void
> @@ -229,10 +245,8 @@ __mutex_remove_waiter(struct mutex *lock
>  		__mutex_clear_flag(lock, MUTEX_FLAGS);
>  		lock->first_waiter = NULL;
>  	} else {
> -		if (lock->first_waiter == waiter) {
> -			lock->first_waiter = list_first_entry(&waiter->list,
> -							      struct mutex_waiter, list);
> -		}
> +		if (lock->first_waiter == waiter)
> +			lock->first_waiter = list_next_entry(waiter, list);
>  		list_del(&waiter->list);
>  	}
>  
> --- a/kernel/locking/ww_mutex.h
> +++ b/kernel/locking/ww_mutex.h
> @@ -6,6 +6,19 @@
>  #define MUTEX_WAITER	mutex_waiter
>  #define WAIT_LOCK	wait_lock
>  
> +/*
> + *           +--------+
> + *           | first  |
> + *           +--------+
> + *                |
> + *                v
> + *  +----+     +----+     +----+
> + *  | W3 | <-> | W1 | <-> | W2 |
> + *  +----+     +----+     +----+
> + *    ^                     ^
> + *    +---------------------+
> + */
> +
>  static inline struct mutex_waiter *
>  __ww_waiter_first(struct mutex *lock)
>  	__must_hold(&lock->wait_lock)
> @@ -13,26 +26,43 @@ __ww_waiter_first(struct mutex *lock)
>  	return lock->first_waiter;
>  }
>  
> +/*
> + * for (cur = __ww_waiter_first(); cur; cur = __ww_waiter_next())
> + *
> + * Should iterate like: W1, W2, W3
> + */
>  static inline struct mutex_waiter *
>  __ww_waiter_next(struct mutex *lock, struct mutex_waiter *w)
>  	__must_hold(&lock->wait_lock)
>  {
>  	w = list_next_entry(w, list);
> +	/*
> +	 * Terminate if the next entry is the first again, that has already
> +	 * been observed.
> +	 */
>  	if (lock->first_waiter == w)
>  		return NULL;
>  
>  	return w;
>  }
>  
> +/*
> + * for (cur = __ww_waiter_last(); cur; cur = __ww_waiter_prev())
> + *
> + * Should iterate like: W3, W2, W1
> + */
>  static inline struct mutex_waiter *
>  __ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
>  	__must_hold(&lock->wait_lock)
>  {
> -	w = list_prev_entry(w, list);
> +	/*
> +	 * Terminate at the first entry, the previous entry of first is the
> +	 * last and that has already been observed.
> +	 */
>  	if (lock->first_waiter == w)
>  		return NULL;
>  
> -	return w;
> +	return list_prev_entry(w, list);
>  }
>  
>  static inline struct mutex_waiter *


Confirmed on an independent userspace-visible reproducer: Resident
Evil 2/3/4/9 under Proton on AMD Zen4 + RX 7900 XTX, all of which
hang deterministically during level load on current master (main
thread parked in futex_waitv). With this patch applied on top of
master, both RE2 and RE9 complete a full playthrough with save-resume
on two independent workstations (ASUS and ASRock B650). No hangs, no
splats.

Symptom details and third bisect log are in the separate thread at
https://lore.kernel.org/r/CABXGCsO5fKq2nD9nO8yO1z50ZzgCPWqueNXHANjntaswoOh2Dg@mail.gmail.com

Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>

-- 
Thanks,
Mikhail

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2026-04-24 12:37 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-27 13:39 Regression on linux-next (next-20260324 ) Borah, Chaitanya Kumar
2026-03-27 16:31 ` Peter Zijlstra
2026-03-27 16:43   ` Peter Zijlstra
2026-03-30  8:26     ` Borah, Chaitanya Kumar
2026-03-30 19:50       ` Peter Zijlstra
2026-04-20 13:03         ` Peter Zijlstra
2026-04-21  6:45           ` John Stultz
2026-04-21 10:15             ` Peter Zijlstra
2026-04-21 12:54               ` K Prateek Nayak
2026-04-21 14:37                 ` Peter Zijlstra
2026-04-21 14:45                   ` Matthew Wilcox
2026-04-21 15:03                     ` Peter Zijlstra
2026-04-21 15:48                   ` K Prateek Nayak
2026-04-21 17:29                     ` John Stultz
2026-04-21 20:56                       ` Peter Zijlstra
2026-04-22  9:23                         ` Peter Zijlstra
2026-04-22 12:07                           ` K Prateek Nayak
2026-04-22 15:52                           ` mikhail.v.gavrilov
2026-04-21 14:31           ` Borah, Chaitanya Kumar
2026-03-27 16:49 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev2) Patchwork
2026-04-20 19:22 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev3) Patchwork
2026-04-21 15:17 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev4) Patchwork
2026-04-22  9:54 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev5) Patchwork
