* Regression on linux-next (next-20260324 )
@ 2026-03-27 13:39 Borah, Chaitanya Kumar
  2026-03-27 16:31 ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Borah, Chaitanya Kumar @ 2026-03-27 13:39 UTC (permalink / raw)
  To: willy
  Cc: linux-kernel, intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, peterz, Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam

Hello Matthew,

Hope you are doing well. I am Chaitanya from the linux graphics team in
Intel.

This mail is regarding a regression we are seeing in our CI runs[1] on
linux-next repository.

Since the version next-20260324 [2], we are seeing the following regression

`````````````````````````````````````````````````````````````````````````````````
<5>[ 157.361977] [IGT] Inactivity timeout exceeded. Killing the current
test with SIGQUIT.
<6>[ 157.362097] sysrq: HELP : loglevel(0-9) reboot(b) crash(c)
show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f)
kill-all-tasks(i) thaw-filesystems(j) sak(k)
show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n)
poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s)
show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w)
dump-ftrace-buffer(z) replay-kernel-logs(R)
<6>[ 157.399543] sysrq: Show State
<6>[ 157.403061] task:systemd state:S stack:0 pid:1 tgid:1
ppid:0 task_flags:0x400100 flags:0x00080000
<6>[ 157.403067] Call Trace:
<6>[ 157.403069] <TASK>
<6>[ 157.403072] __schedule+0x5d7/0x1ef0
<6>[ 157.403078] ? lock_acquire+0xc4/0x300
<6>[ 157.403084] ? schedule+0x10e/0x180
<6>[ 157.403087] ? lock_release+0xcd/0x2b0
<6>[ 157.403092] schedule+0x3a/0x180
<6>[ 157.403094] schedule_hrtimeout_range_clock+0x112/0x120
<6>[ 157.403097] ? do_epoll_wait+0x3e4/0x5b0
<6>[ 157.403102] ? lock_release+0xcd/0x2b0
<6>[ 157.403104] ? _raw_spin_unlock_irq+0x27/0x70
<6>[ 157.403106] ? do_epoll_wait+0x3e4/0x5b0
<6>[ 157.403110] schedule_hrtimeout_range+0x13/0x30
`````````````````````````````````````````````````````````````````````````````````
A detailed log can be found in [3].

After bisecting the tree, the following patch [4] seems to be the first
"bad" commit

`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit 25500ba7e77ce9d3d9b5a1929d41a2ee2e23f6fe
Author: Matthew Wilcox (Oracle) willy@infradead.org
Date: Thu Mar 5 19:55:43 2026 +0000

    locking/mutex: Remove the list_head from struct mutex
`````````````````````````````````````````````````````````````````````````````````````````````````````````

We could not revert the patch because of a merge conflict but resetting to the
parent of the commit seems to fix the issue.

Could you please check why the patch causes this regression and provide a
fix if necessary?

Thank you.

Regards

Chaitanya

[1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20260324
[3] https://intel-gfx-ci.01.org/tree/linux-next/next-20260326/bat-arlh-2/dmesg0.txt
[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231130&id=f4acfcd4deb158b96595250cc332901b282d15b0

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Regression on linux-next (next-20260324 )
  2026-03-27 13:39 Regression on linux-next (next-20260324 ) Borah, Chaitanya Kumar
@ 2026-03-27 16:31 ` Peter Zijlstra
  2026-03-27 16:43   ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2026-03-27 16:31 UTC (permalink / raw)
  To: Borah, Chaitanya Kumar
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam

On Fri, Mar 27, 2026 at 07:09:26PM +0530, Borah, Chaitanya Kumar wrote:
> Hello Matthew,
>
> Hope you are doing well. I am Chaitanya from the linux graphics team in
> Intel.
>
> This mail is regarding a regression we are seeing in our CI runs[1] on
> linux-next repository.
>
> Since the version next-20260324 [2], we are seeing the following regression
>
> `````````````````````````````````````````````````````````````````````````````````
> <5>[ 157.361977] [IGT] Inactivity timeout exceeded. Killing the current
> test with SIGQUIT.
> <6>[ 157.362097] sysrq: HELP : loglevel(0-9) reboot(b) crash(c)
> show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f)
> kill-all-tasks(i) thaw-filesystems(j) sak(k)
> show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n)
> poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s)
> show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w)
> dump-ftrace-buffer(z) replay-kernel-logs(R)
> <6>[ 157.399543] sysrq: Show State
> <6>[ 157.403061] task:systemd state:S stack:0 pid:1 tgid:1
> ppid:0 task_flags:0x400100 flags:0x00080000
> <6>[ 157.403067] Call Trace:
> <6>[ 157.403069] <TASK>
> <6>[ 157.403072] __schedule+0x5d7/0x1ef0
> <6>[ 157.403078] ? lock_acquire+0xc4/0x300
> <6>[ 157.403084] ? schedule+0x10e/0x180
> <6>[ 157.403087] ? lock_release+0xcd/0x2b0
> <6>[ 157.403092] schedule+0x3a/0x180
> <6>[ 157.403094] schedule_hrtimeout_range_clock+0x112/0x120
> <6>[ 157.403097] ? do_epoll_wait+0x3e4/0x5b0
> <6>[ 157.403102] ? lock_release+0xcd/0x2b0
> <6>[ 157.403104] ? _raw_spin_unlock_irq+0x27/0x70
> <6>[ 157.403106] ? do_epoll_wait+0x3e4/0x5b0
> <6>[ 157.403110] schedule_hrtimeout_range+0x13/0x30
> `````````````````````````````````````````````````````````````````````````````````
> A detailed log can be found in [3].
>
> After bisecting the tree, the following patch [4] seems to be the first
> "bad" commit
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> commit 25500ba7e77ce9d3d9b5a1929d41a2ee2e23f6fe
> Author: Matthew Wilcox (Oracle) willy@infradead.org
> Date: Thu Mar 5 19:55:43 2026 +0000
>
> locking/mutex: Remove the list_head from struct mutex
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>
> We could not revert the patch because of a merge conflict but resetting to the
> parent of the commit seems to fix the issue.
>
> Could you please check why the patch causes this regression and provide a
> fix if necessary?

Does this help?

---
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -40,10 +40,10 @@ __ww_waiter_last(struct mutex *lock)
 	__must_hold(&lock->wait_lock)
 {
 	struct mutex_waiter *w = lock->first_waiter;
+	if (!w)
+		return NULL;
 
-	if (w)
-		w = list_prev_entry(w, list);
-	return w;
+	return __ww_waiter_prev(lock, w);
 }
 
 static inline void

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Regression on linux-next (next-20260324 )
  2026-03-27 16:31 ` Peter Zijlstra
@ 2026-03-27 16:43   ` Peter Zijlstra
  2026-03-30 8:26     ` Borah, Chaitanya Kumar
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2026-03-27 16:43 UTC (permalink / raw)
  To: Borah, Chaitanya Kumar
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam

On Fri, Mar 27, 2026 at 05:31:00PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 27, 2026 at 07:09:26PM +0530, Borah, Chaitanya Kumar wrote:
> > Hello Matthew,
> >
> > Hope you are doing well. I am Chaitanya from the linux graphics team in
> > Intel.
> >
> > This mail is regarding a regression we are seeing in our CI runs[1] on
> > linux-next repository.
> >
> > Since the version next-20260324 [2], we are seeing the following regression
> >
> > `````````````````````````````````````````````````````````````````````````````````
> > <5>[ 157.361977] [IGT] Inactivity timeout exceeded. Killing the current
> > test with SIGQUIT.
> > <6>[ 157.362097] sysrq: HELP : loglevel(0-9) reboot(b) crash(c)
> > show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f)
> > kill-all-tasks(i) thaw-filesystems(j) sak(k)
> > show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n)
> > poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s)
> > show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w)
> > dump-ftrace-buffer(z) replay-kernel-logs(R)
> > <6>[ 157.399543] sysrq: Show State
> > <6>[ 157.403061] task:systemd state:S stack:0 pid:1 tgid:1
> > ppid:0 task_flags:0x400100 flags:0x00080000
> > <6>[ 157.403067] Call Trace:
> > <6>[ 157.403069] <TASK>
> > <6>[ 157.403072] __schedule+0x5d7/0x1ef0
> > <6>[ 157.403078] ? lock_acquire+0xc4/0x300
> > <6>[ 157.403084] ? schedule+0x10e/0x180
> > <6>[ 157.403087] ? lock_release+0xcd/0x2b0
> > <6>[ 157.403092] schedule+0x3a/0x180
> > <6>[ 157.403094] schedule_hrtimeout_range_clock+0x112/0x120
> > <6>[ 157.403097] ? do_epoll_wait+0x3e4/0x5b0
> > <6>[ 157.403102] ? lock_release+0xcd/0x2b0
> > <6>[ 157.403104] ? _raw_spin_unlock_irq+0x27/0x70
> > <6>[ 157.403106] ? do_epoll_wait+0x3e4/0x5b0
> > <6>[ 157.403110] schedule_hrtimeout_range+0x13/0x30
> > `````````````````````````````````````````````````````````````````````````````````
> > A detailed log can be found in [3].
> >
> > After bisecting the tree, the following patch [4] seems to be the first
> > "bad" commit
> >
> > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > commit 25500ba7e77ce9d3d9b5a1929d41a2ee2e23f6fe
> > Author: Matthew Wilcox (Oracle) willy@infradead.org
> > Date: Thu Mar 5 19:55:43 2026 +0000
> >
> > locking/mutex: Remove the list_head from struct mutex
> > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> >
> > We could not revert the patch because of a merge conflict but resetting to the
> > parent of the commit seems to fix the issue.
> >
> > Could you please check why the patch causes this regression and provide a
> > fix if necessary?
>
> Does this help?

More tidy version of the same...

---
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index b1834ab7e782..bb8b410779d4 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
 	struct mutex_waiter *w = lock->first_waiter;
 
 	if (w)
-		w = list_prev_entry(w, list);
+		w = __ww_waiter_prev(lock, w);
 	return w;
 }
 

^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Regression on linux-next (next-20260324 )
  2026-03-27 16:43   ` Peter Zijlstra
@ 2026-03-30 8:26     ` Borah, Chaitanya Kumar
  2026-03-30 19:50       ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Borah, Chaitanya Kumar @ 2026-03-30 8:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam

On 3/27/2026 10:13 PM, Peter Zijlstra wrote:
> On Fri, Mar 27, 2026 at 05:31:00PM +0100, Peter Zijlstra wrote:
>> On Fri, Mar 27, 2026 at 07:09:26PM +0530, Borah, Chaitanya Kumar wrote:
>>> Hello Matthew,
>>>
>>> Hope you are doing well. I am Chaitanya from the linux graphics team in
>>> Intel.
>>>
>>> This mail is regarding a regression we are seeing in our CI runs[1] on
>>> linux-next repository.
>>>
>>> Since the version next-20260324 [2], we are seeing the following regression
>>>
>>> `````````````````````````````````````````````````````````````````````````````````
>>> <5>[ 157.361977] [IGT] Inactivity timeout exceeded. Killing the current
>>> test with SIGQUIT.
>>> <6>[ 157.362097] sysrq: HELP : loglevel(0-9) reboot(b) crash(c)
>>> show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f)
>>> kill-all-tasks(i) thaw-filesystems(j) sak(k)
>>> show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n)
>>> poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s)
>>> show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w)
>>> dump-ftrace-buffer(z) replay-kernel-logs(R)
>>> <6>[ 157.399543] sysrq: Show State
>>> <6>[ 157.403061] task:systemd state:S stack:0 pid:1 tgid:1
>>> ppid:0 task_flags:0x400100 flags:0x00080000
>>> <6>[ 157.403067] Call Trace:
>>> <6>[ 157.403069] <TASK>
>>> <6>[ 157.403072] __schedule+0x5d7/0x1ef0
>>> <6>[ 157.403078] ? lock_acquire+0xc4/0x300
>>> <6>[ 157.403084] ? schedule+0x10e/0x180
>>> <6>[ 157.403087] ? lock_release+0xcd/0x2b0
>>> <6>[ 157.403092] schedule+0x3a/0x180
>>> <6>[ 157.403094] schedule_hrtimeout_range_clock+0x112/0x120
>>> <6>[ 157.403097] ? do_epoll_wait+0x3e4/0x5b0
>>> <6>[ 157.403102] ? lock_release+0xcd/0x2b0
>>> <6>[ 157.403104] ? _raw_spin_unlock_irq+0x27/0x70
>>> <6>[ 157.403106] ? do_epoll_wait+0x3e4/0x5b0
>>> <6>[ 157.403110] schedule_hrtimeout_range+0x13/0x30
>>> `````````````````````````````````````````````````````````````````````````````````
>>> A detailed log can be found in [3].
>>>
>>> After bisecting the tree, the following patch [4] seems to be the first
>>> "bad" commit
>>>
>>> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>>> commit 25500ba7e77ce9d3d9b5a1929d41a2ee2e23f6fe
>>> Author: Matthew Wilcox (Oracle) willy@infradead.org
>>> Date: Thu Mar 5 19:55:43 2026 +0000
>>>
>>> locking/mutex: Remove the list_head from struct mutex
>>> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>>>
>>> We could not revert the patch because of a merge conflict but resetting to the
>>> parent of the commit seems to fix the issue.
>>>
>>> Could you please check why the patch causes this regression and provide a
>>> fix if necessary?
>>
>> Does this help?
>
> More tidy version of the same...
>
> ---
> diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> index b1834ab7e782..bb8b410779d4 100644
> --- a/kernel/locking/ww_mutex.h
> +++ b/kernel/locking/ww_mutex.h
> @@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
>  	struct mutex_waiter *w = lock->first_waiter;
>
>  	if (w)
> -		w = list_prev_entry(w, list);
> +		w = __ww_waiter_prev(lock, w);
>  	return w;
>  }
>

Thank you for the response, Peter. Unfortunately, the issue is still seen
with this change.

Regards

Chaitanya

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Regression on linux-next (next-20260324 )
  2026-03-30 8:26     ` Borah, Chaitanya Kumar
@ 2026-03-30 19:50       ` Peter Zijlstra
  2026-04-20 13:03         ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2026-03-30 19:50 UTC (permalink / raw)
  To: Borah, Chaitanya Kumar
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam

On Mon, Mar 30, 2026 at 01:56:33PM +0530, Borah, Chaitanya Kumar wrote:

> > diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> > index b1834ab7e782..bb8b410779d4 100644
> > --- a/kernel/locking/ww_mutex.h
> > +++ b/kernel/locking/ww_mutex.h
> > @@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
> >  	struct mutex_waiter *w = lock->first_waiter;
> >  	if (w)
> > -		w = list_prev_entry(w, list);
> > +		w = __ww_waiter_prev(lock, w);
> >  	return w;
> >  }
> Thank you for the response, Peter. Unfortunately, the issue is still seen
> with this change.

Bah, indeed. Looking at this after the weekend I see that it's actually
wrong.

But I haven't yet had a new idea. I don't suppose there is a relatively
easy way to reproduce this issue outside of your CI robot?

My current working thesis is that since this is graphics, this is
ww_mutex related. I'll go over this code once more...

^ permalink raw reply [flat|nested] 8+ messages in thread
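A note for readers on why the "tidy" variant quoted above may be wrong. This is one plausible reading, not Peter's stated reasoning. With __ww_waiter_prev() as it stood at that point (it returns NULL when the step lands on lock->first_waiter), asking for the predecessor of the first waiter yields NULL whenever only one waiter is queued, so __ww_waiter_last() would report an empty queue. The user-space sketch below models that case. struct waiter, struct lock, old_prev(), tidy_last() and ring_last() are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mutex_waiter and struct mutex. */
struct waiter {
	struct waiter *next, *prev;
};

struct lock {
	struct waiter *first;	/* NULL when there are no waiters */
};

/* Models the earlier __ww_waiter_prev(): NULL when the step lands on first. */
static struct waiter *old_prev(struct lock *lock, struct waiter *w)
{
	w = w->prev;
	return w == lock->first ? NULL : w;
}

/* Models the "tidy" __ww_waiter_last() that routes through old_prev(). */
static struct waiter *tidy_last(struct lock *lock)
{
	struct waiter *w = lock->first;

	if (w)
		w = old_prev(lock, w);
	return w;
}

/* For contrast: the plain ring predecessor of the first waiter. */
static struct waiter *ring_last(struct lock *lock)
{
	return lock->first ? lock->first->prev : NULL;
}
```

With a single waiter whose next and prev point at itself, tidy_last() returns NULL while ring_last() returns that waiter; with two or more waiters the two agree.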
* Re: Regression on linux-next (next-20260324 )
  2026-03-30 19:50       ` Peter Zijlstra
@ 2026-04-20 13:03         ` Peter Zijlstra
  2026-04-21 6:45           ` John Stultz
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2026-04-20 13:03 UTC (permalink / raw)
  To: Borah, Chaitanya Kumar
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam

On Mon, Mar 30, 2026 at 09:50:37PM +0200, Peter Zijlstra wrote:
> On Mon, Mar 30, 2026 at 01:56:33PM +0530, Borah, Chaitanya Kumar wrote:
> > > diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> > > index b1834ab7e782..bb8b410779d4 100644
> > > --- a/kernel/locking/ww_mutex.h
> > > +++ b/kernel/locking/ww_mutex.h
> > > @@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
> > >  	struct mutex_waiter *w = lock->first_waiter;
> > >  	if (w)
> > > -		w = list_prev_entry(w, list);
> > > +		w = __ww_waiter_prev(lock, w);
> > >  	return w;
> > >  }
> > Thank you for the response, Peter. Unfortunately, the issue is still seen
> > with this change.
>
> Bah, indeed. Looking at this after the weekend I see that it's actually
> wrong.
>
> But I haven't yet had a new idea. I don't suppose there is a relatively
> easy way to reproduce this issue outside of your CI robot?
>
> My current working thesis is that since this is graphics, this is
> ww_mutex related. I'll go over this code once more...

Since you've not provided a reproducer, can I ask you to try the below?

---
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 186b463fe326..a93e57fc53b1 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -229,10 +229,8 @@ __mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter)
 		__mutex_clear_flag(lock, MUTEX_FLAGS);
 		lock->first_waiter = NULL;
 	} else {
-		if (lock->first_waiter == waiter) {
-			lock->first_waiter = list_first_entry(&waiter->list,
-							      struct mutex_waiter, list);
-		}
+		if (lock->first_waiter == waiter)
+			lock->first_waiter = list_next_entry(waiter, list);
 		list_del(&waiter->list);
 	}
 
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 016f0db892a5..875b303511b3 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -6,6 +6,32 @@
 #define MUTEX_WAITER	mutex_waiter
 #define WAIT_LOCK	wait_lock
 
+/*
+ *               +-------+
+ *               |   3   | <+
+ *               +-------+  |
+ *                 ^        |
+ *                 |        |
+ *                 v        |
+ * +-------+     +-------+  |
+ * | first | --> |   1   |  |
+ * +-------+     +-------+  |
+ *                 ^        |
+ *                 |        |
+ *                 v        |
+ *               +-------+  |
+ *               |   2   | <+
+ *               +-------+
+ */
+
+/*
+ * Specifically:
+ *
+ *   for (cur = __ww_waiter_first(); cur; cur = __ww_waiter_next())
+ *     ...
+ *
+ * should iterate like: 1 2 3
+ */
 static inline struct mutex_waiter *
 __ww_waiter_first(struct mutex *lock)
 	__must_hold(&lock->wait_lock)
@@ -18,23 +44,21 @@ __ww_waiter_next(struct mutex *lock, struct mutex_waiter *w)
 	__must_hold(&lock->wait_lock)
 {
 	w = list_next_entry(w, list);
-	if (lock->first_waiter == w)
-		return NULL;
-
-	return w;
-}
-
-static inline struct mutex_waiter *
-__ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
-	__must_hold(&lock->wait_lock)
-{
-	w = list_prev_entry(w, list);
-	if (lock->first_waiter == w)
+	/* We've already seen first, terminate */
+	if (w == __ww_waiter_first(lock))
 		return NULL;
 
 	return w;
 }
 
+/*
+ * Specifically:
+ *
+ *   for (cur = __ww_waiter_last(); cur; cur = __ww_waiter_prev())
+ *     ...
+ *
+ * should iterate like: 3 2 1
+ */
 static inline struct mutex_waiter *
 __ww_waiter_last(struct mutex *lock)
 	__must_hold(&lock->wait_lock)
@@ -46,6 +70,18 @@ __ww_waiter_last(struct mutex *lock)
 	return w;
 }
 
+static inline struct mutex_waiter *
+__ww_waiter_prev(struct mutex *lock, struct mutex_waiter *w)
+	__must_hold(&lock->wait_lock)
+{
+	w = list_prev_entry(w, list);
+	/* We've already seen last, terminate */
+	if (w == __ww_waiter_last(lock))
+		return NULL;
+
+	return w;
+}
+
 static inline void
 __ww_waiter_add(struct mutex *lock, struct mutex_waiter *waiter, struct mutex_waiter *pos)
 	__must_hold(&lock->wait_lock)

^ permalink raw reply related [flat|nested] 8+ messages in thread
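For readers following the patch above, here is a minimal user-space sketch of the traversal contract its comments describe: waiters sit on a ring with no separate list head, the lock keeps only a pointer to the first waiter, forward iteration stops on wrapping back to first, and backward iteration stops on wrapping back to last. It also mirrors the removal rule from the mutex.c hunk, where the successor becomes first when the first waiter leaves. All names here (struct waiter, struct lock, waiter_*(), collect_*()) are hypothetical stand-ins rather than kernel API. The next message in the thread reports that the patch did not resolve the hang, so this illustrates the intended ordering only, not a fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mutex_waiter: a node on a headless ring. */
struct waiter {
	struct waiter *next, *prev;
	int id;
};

/* Hypothetical stand-in for struct mutex: only a pointer to the first waiter. */
struct lock {
	struct waiter *first;	/* NULL when there are no waiters */
};

/* Queue w at the tail, i.e. immediately before lock->first on the ring. */
static void waiter_add_tail(struct lock *lock, struct waiter *w)
{
	struct waiter *first = lock->first;

	if (!first) {
		w->next = w->prev = w;
		lock->first = w;
		return;
	}
	w->next = first;
	w->prev = first->prev;
	first->prev->next = w;
	first->prev = w;
}

/* Unlink w; if it was the first waiter, its successor becomes first. */
static void waiter_remove(struct lock *lock, struct waiter *w)
{
	if (w->next == w) {		/* w was the only waiter */
		lock->first = NULL;
		return;
	}
	if (lock->first == w)
		lock->first = w->next;
	w->prev->next = w->next;
	w->next->prev = w->prev;
}

static struct waiter *waiter_first(struct lock *lock)
{
	return lock->first;
}

/* The last waiter is the ring predecessor of the first one. */
static struct waiter *waiter_last(struct lock *lock)
{
	return lock->first ? lock->first->prev : NULL;
}

/* Step forward; wrapping back to first means every waiter has been seen. */
static struct waiter *waiter_next(struct lock *lock, struct waiter *w)
{
	w = w->next;
	return w == waiter_first(lock) ? NULL : w;
}

/* Step backward; wrapping back to last means every waiter has been seen. */
static struct waiter *waiter_prev(struct lock *lock, struct waiter *w)
{
	w = w->prev;
	return w == waiter_last(lock) ? NULL : w;
}

/* Copy waiter ids in forward order into out[]; returns how many were stored. */
static int collect_forward(struct lock *lock, int *out, int max)
{
	struct waiter *w;
	int n = 0;

	for (w = waiter_first(lock); w && n < max; w = waiter_next(lock, w))
		out[n++] = w->id;
	return n;
}

/* Copy waiter ids in backward order into out[]; returns how many were stored. */
static int collect_backward(struct lock *lock, int *out, int max)
{
	struct waiter *w;
	int n = 0;

	for (w = waiter_last(lock); w && n < max; w = waiter_prev(lock, w))
		out[n++] = w->id;
	return n;
}
```

Queueing waiters with ids 1, 2 and 3 through waiter_add_tail() and then calling collect_forward() and collect_backward() gives the two orders named in the patch comments. The single-waiter ring is the case worth checking by hand, since first and last are then the same node and both loops must still visit it exactly once.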
* Re: Regression on linux-next (next-20260324 )
  2026-04-20 13:03         ` Peter Zijlstra
@ 2026-04-21 6:45           ` John Stultz
  2026-04-21 10:15             ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: John Stultz @ 2026-04-21 6:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borah, Chaitanya Kumar, willy, linux-kernel, intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam, K Prateek Nayak

On Mon, Apr 20, 2026 at 6:03 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Mar 30, 2026 at 09:50:37PM +0200, Peter Zijlstra wrote:
> > On Mon, Mar 30, 2026 at 01:56:33PM +0530, Borah, Chaitanya Kumar wrote:
> > > > diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> > > > index b1834ab7e782..bb8b410779d4 100644
> > > > --- a/kernel/locking/ww_mutex.h
> > > > +++ b/kernel/locking/ww_mutex.h
> > > > @@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
> > > >  	struct mutex_waiter *w = lock->first_waiter;
> > > >  	if (w)
> > > > -		w = list_prev_entry(w, list);
> > > > +		w = __ww_waiter_prev(lock, w);
> > > >  	return w;
> > > >  }
> > > Thank you for the response, Peter. Unfortunately, the issue is still seen
> > > with this change.
> >
> > Bah, indeed. Looking at this after the weekend I see that it's actually
> > wrong.
> >
> > But I haven't yet had a new idea. I don't suppose there is a relatively
> > easy way to reproduce this issue outside of your CI robot?
> >
> > My current working thesis is that since this is graphics, this is
> > ww_mutex related. I'll go over this code once more...

So I tripped over this in my own testing today prepping proxy patches,
bisecting it down to the same problematic commit 25500ba7e77c
("locking/mutex: Remove the list_head from struct mutex").

Indeed it does seem related to ww_mutexes, as I can pretty easily
reproduce it with defconfig + CONFIG_WW_MUTEX_SELFTEST=y using
qemu-system-x86

Where the test will basically hang on bootup.

> Since you've not provided a reproducer, can I ask you to try the below?
>

Unfortunately that patch doesn't seem to sort it (see the same behavior).

I'm about cooked for tonight so I'll have to look more closely tomorrow.

thanks
-john

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Regression on linux-next (next-20260324 )
  2026-04-21 6:45           ` John Stultz
@ 2026-04-21 10:15             ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2026-04-21 10:15 UTC (permalink / raw)
  To: John Stultz
  Cc: Borah, Chaitanya Kumar, willy, linux-kernel, intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar, Saarinen, Jani, ravitejax.veesam, K Prateek Nayak

On Mon, Apr 20, 2026 at 11:45:12PM -0700, John Stultz wrote:

> So I tripped over this in my own testing today prepping proxy patches,
> bisecting it down to the same problematic commit 25500ba7e77c
> ("locking/mutex: Remove the list_head from struct mutex").
>
> Indeed it does seem related to ww_mutexes, as I can pretty easily
> reproduce it with defconfig + CONFIG_WW_MUTEX_SELFTEST=y using
> qemu-system-x86
>
> Where the test will basically hang on bootup.

*groan* indeed. This of course means no CI is running this thing :-(

Anyway, yay for deterministic reproducer. Let me go prod at this.

^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-04-21 10:15 UTC | newest]

Thread overview: 8+ messages
2026-03-27 13:39 Regression on linux-next (next-20260324 ) Borah, Chaitanya Kumar
2026-03-27 16:31 ` Peter Zijlstra
2026-03-27 16:43   ` Peter Zijlstra
2026-03-30 8:26     ` Borah, Chaitanya Kumar
2026-03-30 19:50       ` Peter Zijlstra
2026-04-20 13:03         ` Peter Zijlstra
2026-04-21 6:45           ` John Stultz
2026-04-21 10:15             ` Peter Zijlstra