public inbox for intel-xe@lists.freedesktop.org
* Regression on linux-next (next-20260324 )
@ 2026-03-27 13:39 Borah, Chaitanya Kumar
  2026-03-27 16:31 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Borah, Chaitanya Kumar @ 2026-03-27 13:39 UTC
  To: willy
  Cc: linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, peterz, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam

Hello Matthew,

Hope you are doing well. I am Chaitanya from the Linux graphics team at
Intel.

This mail is regarding a regression we are seeing in our CI runs [1] on
the linux-next repository.

Since next-20260324 [2], we have been seeing the following regression:

`````````````````````````````````````````````````````````````````````````````````
<5>[  157.361977] [IGT] Inactivity timeout exceeded. Killing the current test with SIGQUIT.
<6>[  157.362097] sysrq: HELP : loglevel(0-9) reboot(b) crash(c) show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z) replay-kernel-logs(R)
<6>[  157.399543] sysrq: Show State
<6>[  157.403061] task:systemd         state:S stack:0     pid:1 tgid:1     ppid:0      task_flags:0x400100 flags:0x00080000
<6>[  157.403067] Call Trace:
<6>[  157.403069]  <TASK>
<6>[  157.403072]  __schedule+0x5d7/0x1ef0
<6>[  157.403078]  ? lock_acquire+0xc4/0x300
<6>[  157.403084]  ? schedule+0x10e/0x180
<6>[  157.403087]  ? lock_release+0xcd/0x2b0
<6>[  157.403092]  schedule+0x3a/0x180
<6>[  157.403094]  schedule_hrtimeout_range_clock+0x112/0x120
<6>[  157.403097]  ? do_epoll_wait+0x3e4/0x5b0
<6>[  157.403102]  ? lock_release+0xcd/0x2b0
<6>[  157.403104]  ? _raw_spin_unlock_irq+0x27/0x70
<6>[  157.403106]  ? do_epoll_wait+0x3e4/0x5b0
<6>[  157.403110]  schedule_hrtimeout_range+0x13/0x30
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [3].

After bisecting the tree, the following patch [4] seems to be the first
"bad" commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit 25500ba7e77ce9d3d9b5a1929d41a2ee2e23f6fe
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu Mar 5 19:55:43 2026 +0000

     locking/mutex: Remove the list_head from struct mutex
`````````````````````````````````````````````````````````````````````````````````````````````````````````

We could not revert the patch because of a merge conflict, but resetting to
the parent of the commit seems to fix the issue.

Could you please check why the patch causes this regression and provide 
a fix if necessary?

Thank you.

Regards

Chaitanya

[1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20260324
[3] https://intel-gfx-ci.01.org/tree/linux-next/next-20260326/bat-arlh-2/dmesg0.txt
[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20231130&id=f4acfcd4deb158b96595250cc332901b282d15b0


* Re: Regression on linux-next (next-20260324 )
  2026-03-27 13:39 Regression on linux-next (next-20260324 ) Borah, Chaitanya Kumar
@ 2026-03-27 16:31 ` Peter Zijlstra
  2026-03-27 16:43   ` Peter Zijlstra
  2026-03-27 16:36 ` ✗ LGCI.VerificationFailed: failure for " Patchwork
  2026-03-27 16:44 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev2) Patchwork
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2026-03-27 16:31 UTC
  To: Borah, Chaitanya Kumar
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam

On Fri, Mar 27, 2026 at 07:09:26PM +0530, Borah, Chaitanya Kumar wrote:
> Could you please check why the patch causes this regression and provide a
> fix if necessary?

Does this help?

---
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -40,10 +40,10 @@ __ww_waiter_last(struct mutex *lock)
 	__must_hold(&lock->wait_lock)
 {
 	struct mutex_waiter *w = lock->first_waiter;
+	if (!w)
+		return NULL;
 
-	if (w)
-		w = list_prev_entry(w, list);
-	return w;
+	return __ww_waiter_prev(lock, w);
 }
 
 static inline void
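
For readers following the diff above: the context lines suggest that the waiters are reachable only through lock->first_waiter, and the removed lines took the entry just before the first waiter as the last one. The following is a minimal userspace sketch of that idea, assuming the waiters form a circular doubly linked ring with no separate list head. The names struct waiter, struct ring_lock, ring_add_tail() and ring_last() are hypothetical and exist only for illustration; this is not the kernel code and says nothing about what __ww_waiter_prev() does.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in; not the kernel's struct mutex_waiter. */
struct waiter {
	int id;
	struct waiter *prev, *next;
};

/* Hypothetical lock that keeps only a pointer to the first waiter. */
struct ring_lock {
	struct waiter *first_waiter;
};

/* Append w at the tail of the ring, i.e. just before the first waiter. */
static inline void ring_add_tail(struct ring_lock *lock, struct waiter *w)
{
	struct waiter *first = lock->first_waiter;

	if (!first) {
		/* Empty ring: the single waiter points at itself. */
		w->prev = w->next = w;
		lock->first_waiter = w;
		return;
	}
	w->next = first;
	w->prev = first->prev;
	first->prev->next = w;
	first->prev = w;
}

/* Last waiter: NULL when the ring is empty, else the predecessor of the first. */
static inline struct waiter *ring_last(const struct ring_lock *lock)
{
	struct waiter *first = lock->first_waiter;

	return first ? first->prev : NULL;
}
```

With a single waiter in such a ring, the first and last waiter are the same entry, so any helper that walks backwards from the first waiter has to decide explicitly how to treat that wrap-around case.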


* ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 )
  2026-03-27 13:39 Regression on linux-next (next-20260324 ) Borah, Chaitanya Kumar
  2026-03-27 16:31 ` Peter Zijlstra
@ 2026-03-27 16:36 ` Patchwork
  2026-03-27 16:44 ` ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev2) Patchwork
  2 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2026-03-27 16:36 UTC
  To: Peter Zijlstra; +Cc: intel-xe

== Series Details ==

Series: Regression on linux-next (next-20260324 )
URL   : https://patchwork.freedesktop.org/series/164008/
State : failure

== Summary ==

Address 'peterz@infradead.org' is not on the allowlist, which prevents CI from being triggered for this patch.
If you want Intel GFX CI to accept this address, please contact the script maintainers at i915-ci-infra@lists.freedesktop.org.
Exception occurred during validation, bailing out!




* Re: Regression on linux-next (next-20260324 )
  2026-03-27 16:31 ` Peter Zijlstra
@ 2026-03-27 16:43   ` Peter Zijlstra
  2026-03-30  8:26     ` Borah, Chaitanya Kumar
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2026-03-27 16:43 UTC
  To: Borah, Chaitanya Kumar
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam

On Fri, Mar 27, 2026 at 05:31:00PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 27, 2026 at 07:09:26PM +0530, Borah, Chaitanya Kumar wrote:
> > Could you please check why the patch causes this regression and provide a
> > fix if necessary?
> 
> Does this help?

More tidy version of the same...

---
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index b1834ab7e782..bb8b410779d4 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
 	struct mutex_waiter *w = lock->first_waiter;
 
 	if (w)
-		w = list_prev_entry(w, list);
+		w = __ww_waiter_prev(lock, w);
 	return w;
 }
 


* ✗ LGCI.VerificationFailed: failure for Regression on linux-next (next-20260324 ) (rev2)
  2026-03-27 13:39 Regression on linux-next (next-20260324 ) Borah, Chaitanya Kumar
  2026-03-27 16:31 ` Peter Zijlstra
  2026-03-27 16:36 ` ✗ LGCI.VerificationFailed: failure for " Patchwork
@ 2026-03-27 16:44 ` Patchwork
  2 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2026-03-27 16:44 UTC
  To: Peter Zijlstra; +Cc: intel-xe

== Series Details ==

Series: Regression on linux-next (next-20260324 ) (rev2)
URL   : https://patchwork.freedesktop.org/series/164008/
State : failure

== Summary ==

Address 'peterz@infradead.org' is not on the allowlist, which prevents CI from being triggered for this patch.
If you want Intel GFX CI to accept this address, please contact the script maintainers at i915-ci-infra@lists.freedesktop.org.
Exception occurred during validation, bailing out!




* Re: Regression on linux-next (next-20260324 )
  2026-03-27 16:43   ` Peter Zijlstra
@ 2026-03-30  8:26     ` Borah, Chaitanya Kumar
  2026-03-30 19:50       ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Borah, Chaitanya Kumar @ 2026-03-30  8:26 UTC
  To: Peter Zijlstra
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam



On 3/27/2026 10:13 PM, Peter Zijlstra wrote:
> On Fri, Mar 27, 2026 at 05:31:00PM +0100, Peter Zijlstra wrote:
>> On Fri, Mar 27, 2026 at 07:09:26PM +0530, Borah, Chaitanya Kumar wrote:
>>> Could you please check why the patch causes this regression and provide a
>>> fix if necessary?
>>
>> Does this help?
> 
> More tidy version of the same...
> 
> ---
> diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> index b1834ab7e782..bb8b410779d4 100644
> --- a/kernel/locking/ww_mutex.h
> +++ b/kernel/locking/ww_mutex.h
> @@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
>   	struct mutex_waiter *w = lock->first_waiter;
>   
>   	if (w)
> -		w = list_prev_entry(w, list);
> +		w = __ww_waiter_prev(lock, w);
>   	return w;
>   }
>   
Thank you for the response, Peter. Unfortunately, the issue is still 
seen with this change.

Regards
Chaitanya


* Re: Regression on linux-next (next-20260324 )
  2026-03-30  8:26     ` Borah, Chaitanya Kumar
@ 2026-03-30 19:50       ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2026-03-30 19:50 UTC
  To: Borah, Chaitanya Kumar
  Cc: willy, linux-kernel, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, Kurmi, Suresh Kumar,
	Saarinen, Jani, ravitejax.veesam

On Mon, Mar 30, 2026 at 01:56:33PM +0530, Borah, Chaitanya Kumar wrote:
> > diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
> > index b1834ab7e782..bb8b410779d4 100644
> > --- a/kernel/locking/ww_mutex.h
> > +++ b/kernel/locking/ww_mutex.h
> > @@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
> >   	struct mutex_waiter *w = lock->first_waiter;
> >   	if (w)
> > -		w = list_prev_entry(w, list);
> > +		w = __ww_waiter_prev(lock, w);
> >   	return w;
> >   }
> Thank you for the response, Peter. Unfortunately, the issue is still seen
> with this change.

Bah, indeed. Looking at this after the weekend I see that it's actually
wrong.

But I haven't yet had a new idea. I don't suppose there is a relatively
easy way to reproduce this issue outside of your CI robot?

My current working thesis is that since this is graphics, this is
ww_mutex related. I'll go over this code once more...

