* REGRESSION on drm-tip
@ 2025-11-27 6:25 Borah, Chaitanya Kumar
2025-11-27 16:01 ` Saarinen, Jani
` (3 more replies)
0 siblings, 4 replies; 22+ messages in thread
From: Borah, Chaitanya Kumar @ 2025-11-27 6:25 UTC (permalink / raw)
To: brauner
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
Saarinen, Jani, Kurmi, Suresh Kumar, Lucas De Marchi
Hello Christian,
This is Chaitanya (again!).
This mail is regarding another regression we are seeing in our CI
runs[1] on drm-tip (with both xe and i915).
`````````````````````````````````````````````````````````````````````````````````
<4> [157.687644] ------------[ cut here ]------------
<4> [157.687768] WARNING: CPU: 5 PID: 2277 at kernel/freezer.c:139
__set_task_frozen+0x7f/0xb0
...
<4> [157.687923] PKRU: 55555554
<4> [157.687924] Call Trace:
<4> [157.687925] <TASK>
<4> [157.687926] ? __pfx___set_task_frozen+0x10/0x10
<4> [157.687929] task_call_func+0x6d/0x120
<4> [157.687932] ? cgroup_freezing+0x89/0x200
<4> [157.687937] freeze_task+0x98/0x100
<4> [157.687940] try_to_freeze_tasks+0xd2/0x440
<4> [157.687946] freeze_processes+0x56/0xd0
<4> [157.687948] hibernate+0x129/0x4a0
<4> [157.687951] state_store+0xd3/0xe0
<4> [157.687954] kobj_attr_store+0x12/0x40
<4> [157.687959] sysfs_kf_write+0x4d/0x80
<4> [157.687963] kernfs_fop_write_iter+0x188/0x240
<4> [157.687967] vfs_write+0x283/0x540
<4> [157.687969] ? free_to_partial_list+0x46d/0x640
<4> [157.687976] ksys_write+0x6f/0xf0
<4> [157.687980] __x64_sys_write+0x19/0x30
<4> [157.687982] x64_sys_call+0x79/0x26a0
<4> [157.687984] do_syscall_64+0x93/0xd60
<4> [157.687987] ? putname+0x65/0x90
<4> [157.687990] ? kmem_cache_free+0x553/0x680
<4> [157.687995] ? putname+0x65/0x90
<4> [157.687997] ? putname+0x65/0x90
<4> [157.687999] ? do_sys_openat2+0x8b/0xd0
<4> [157.688003] ? __x64_sys_openat+0x54/0xa0
<4> [157.688007] ? do_syscall_64+0x1b7/0xd60
<4> [157.688009] ? __fput+0x1bf/0x2f0
<4> [157.688012] ? fput_close_sync+0x3d/0xa0
<4> [157.688015] ? __x64_sys_close+0x3e/0x90
<4> [157.688017] ? do_syscall_64+0x1b7/0xd60
<4> [157.688019] ? putname+0x65/0x90
<4> [157.688021] ? putname+0x65/0x90
<4> [157.688023] ? do_sys_openat2+0x8b/0xd0
<4> [157.688024] ? __fput+0x1bf/0x2f0
<4> [157.688028] ? __x64_sys_openat+0x54/0xa0
<4> [157.688032] ? do_syscall_64+0x1b7/0xd60
<4> [157.688034] ? do_syscall_64+0x1b7/0xd60
<4> [157.688036] ? irqentry_exit+0x77/0xb0
<4> [157.688038] ? exc_page_fault+0xbd/0x2c0
<4> [157.688042] entry_SYSCALL_64_after_hwframe+0x76/0x7e
<4> [157.688044] RIP: 0033:0x72523c91c574
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [2].
After bisecting the tree, the following patch [3] seems to be the first
"bad" commit
`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit a3f8f8662771285511ae26c4c8d3ba1cd22159b9
Author: Christian Brauner <brauner@kernel.org>
Date: Wed Nov 5 14:39:45 2025 +0100
power: always freeze efivarfs
`````````````````````````````````````````````````````````````````````````````````````````````````````````
We also verified that if we revert the patch the issue is not seen.
Could you please check why the patch causes this regression and provide
a fix if necessary?
Thank you.
Regards
Chaitanya
[1]
https://intel-gfx-ci.01.org/tree/drm-tip/index.html?testfilter=suspend
[2]
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_17595/shard-mtlp-6/igt@gem_exec_suspend@basic-s4-devices.html
[3]
https://gitlab.com/freedesktop-mirror/drm-tip/-/commit/a3f8f8662771285511ae26c4c8d3ba1cd22159b9
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: REGRESSION on drm-tip
2025-11-27 6:25 REGRESSION on drm-tip Borah, Chaitanya Kumar
@ 2025-11-27 16:01 ` Saarinen, Jani
2025-11-27 16:06 ` Saarinen, Jani
2025-11-27 23:04 ` Ville Syrjälä
` (2 subsequent siblings)
3 siblings, 1 reply; 22+ messages in thread
From: Saarinen, Jani @ 2025-11-27 16:01 UTC (permalink / raw)
To: Borah, Chaitanya Kumar, brauner@kernel.org
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
Kurmi, Suresh Kumar, De Marchi, Lucas
Hi,
> -----Original Message-----
> From: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
> Sent: Thursday, 27 November 2025 8.26
> To: brauner@kernel.org
> Cc: intel-gfx@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Saarinen, Jani
> <jani.saarinen@intel.com>; Kurmi, Suresh Kumar
> <suresh.kumar.kurmi@intel.com>; De Marchi, Lucas
> <lucas.demarchi@intel.com>
> Subject: REGRESSION on drm-tip
>
> Hello Christian,
>
> This is Chaitanya (again!).
>
> This mail is regarding another regression we are seeing in our CI
> runs[1] on drm-tip (with both xe and i915).
>
> `````````````````````````````````````````````````````````````````````````````````
> <4> [157.687644] ------------[ cut here ]------------
> <4> [157.687768] WARNING: CPU: 5 PID: 2277 at kernel/freezer.c:139
> __set_task_frozen+0x7f/0xb0
> ...
> <4> [157.687923] PKRU: 55555554
> <4> [157.687924] Call Trace:
> <4> [157.687925] <TASK>
> <4> [157.687926] ? __pfx___set_task_frozen+0x10/0x10
> <4> [157.687929] task_call_func+0x6d/0x120
> <4> [157.687932] ? cgroup_freezing+0x89/0x200
> <4> [157.687937] freeze_task+0x98/0x100
> <4> [157.687940] try_to_freeze_tasks+0xd2/0x440
> <4> [157.687946] freeze_processes+0x56/0xd0
> <4> [157.687948] hibernate+0x129/0x4a0
> <4> [157.687951] state_store+0xd3/0xe0
> <4> [157.687954] kobj_attr_store+0x12/0x40
> <4> [157.687959] sysfs_kf_write+0x4d/0x80
> <4> [157.687963] kernfs_fop_write_iter+0x188/0x240
> <4> [157.687967] vfs_write+0x283/0x540
> <4> [157.687969] ? free_to_partial_list+0x46d/0x640
> <4> [157.687976] ksys_write+0x6f/0xf0
> <4> [157.687980] __x64_sys_write+0x19/0x30
> <4> [157.687982] x64_sys_call+0x79/0x26a0
> <4> [157.687984] do_syscall_64+0x93/0xd60
> <4> [157.687987] ? putname+0x65/0x90
> <4> [157.687990] ? kmem_cache_free+0x553/0x680
> <4> [157.687995] ? putname+0x65/0x90
> <4> [157.687997] ? putname+0x65/0x90
> <4> [157.687999] ? do_sys_openat2+0x8b/0xd0
> <4> [157.688003] ? __x64_sys_openat+0x54/0xa0
> <4> [157.688007] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688009] ? __fput+0x1bf/0x2f0
> <4> [157.688012] ? fput_close_sync+0x3d/0xa0
> <4> [157.688015] ? __x64_sys_close+0x3e/0x90
> <4> [157.688017] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688019] ? putname+0x65/0x90
> <4> [157.688021] ? putname+0x65/0x90
> <4> [157.688023] ? do_sys_openat2+0x8b/0xd0
> <4> [157.688024] ? __fput+0x1bf/0x2f0
> <4> [157.688028] ? __x64_sys_openat+0x54/0xa0
> <4> [157.688032] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688034] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688036] ? irqentry_exit+0x77/0xb0
> <4> [157.688038] ? exc_page_fault+0xbd/0x2c0
> <4> [157.688042] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> <4> [157.688044] RIP: 0033:0x72523c91c574
> `````````````````````````````````````````````````````````````````````````````````
> Details log can be found in [2].
>
> After bisecting the tree, the following patch [3] seems to be the first "bad" commit
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> commit a3f8f8662771285511ae26c4c8d3ba1cd22159b9
> Author: Christian Brauner <brauner@kernel.org>
> Date: Wed Nov 5 14:39:45 2025 +0100
>
> power: always freeze efivarfs
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>
> We also verified that if we revert the patch the issue is not seen.
Yes, the revert works, as you can see from https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_158163v2/index.html?testfilter=suspend
For any fix, post it to the intel-gfx and intel-xe lists and you should get results (if BAT passes).
>
> Could you please check why the patch causes this regression and provide a fix if necessary?
>
> Thank you.
>
> Regards
>
> Chaitanya
>
> [1] https://intel-gfx-ci.01.org/tree/drm-tip/index.html?testfilter=suspend
> [2] https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_17595/shard-mtlp-6/igt@gem_exec_suspend@basic-s4-devices.html
> [3] https://gitlab.com/freedesktop-mirror/drm-tip/-/commit/a3f8f8662771285511ae26c4c8d3ba1cd22159b9
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: REGRESSION on drm-tip
2025-11-27 16:01 ` Saarinen, Jani
@ 2025-11-27 16:06 ` Saarinen, Jani
0 siblings, 0 replies; 22+ messages in thread
From: Saarinen, Jani @ 2025-11-27 16:06 UTC (permalink / raw)
To: Borah, Chaitanya Kumar, brauner@kernel.org
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
Kurmi, Suresh Kumar, De Marchi, Lucas
Hi,
One addition below.
> -----Original Message-----
> From: Saarinen, Jani
> Sent: Thursday, 27 November 2025 18.02
> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>;
> brauner@kernel.org
> Cc: intel-gfx@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Kurmi, Suresh
> Kumar <Suresh.Kumar.Kurmi@intel.com>; De Marchi, Lucas
> <lucas.demarchi@intel.com>
> Subject: RE: REGRESSION on drm-tip
>
> Hi,
>
> > -----Original Message-----
> > From: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
> > Sent: Thursday, 27 November 2025 8.26
> > To: brauner@kernel.org
> > Cc: intel-gfx@lists.freedesktop.org; intel-xe@lists.freedesktop.org;
> > Saarinen, Jani <jani.saarinen@intel.com>; Kurmi, Suresh Kumar
> > <suresh.kumar.kurmi@intel.com>; De Marchi, Lucas
> > <lucas.demarchi@intel.com>
> > Subject: REGRESSION on drm-tip
> >
> > Hello Christian,
> >
> > This is Chaitanya (again!).
> >
> > This mail is regarding another regression we are seeing in our CI
> > runs[1] on drm-tip (with both xe and i915).
> >
> > `````````````````````````````````````````````````````````````````````````````````
> > <4> [157.687644] ------------[ cut here ]------------
> > <4> [157.687768] WARNING: CPU: 5 PID: 2277 at kernel/freezer.c:139
> > __set_task_frozen+0x7f/0xb0
> > ...
> > <4> [157.687923] PKRU: 55555554
> > <4> [157.687924] Call Trace:
> > <4> [157.687925] <TASK>
> > <4> [157.687926] ? __pfx___set_task_frozen+0x10/0x10
> > <4> [157.687929] task_call_func+0x6d/0x120
> > <4> [157.687932] ? cgroup_freezing+0x89/0x200
> > <4> [157.687937] freeze_task+0x98/0x100
> > <4> [157.687940] try_to_freeze_tasks+0xd2/0x440
> > <4> [157.687946] freeze_processes+0x56/0xd0
> > <4> [157.687948] hibernate+0x129/0x4a0
> > <4> [157.687951] state_store+0xd3/0xe0
> > <4> [157.687954] kobj_attr_store+0x12/0x40
> > <4> [157.687959] sysfs_kf_write+0x4d/0x80
> > <4> [157.687963] kernfs_fop_write_iter+0x188/0x240
> > <4> [157.687967] vfs_write+0x283/0x540
> > <4> [157.687969] ? free_to_partial_list+0x46d/0x640
> > <4> [157.687976] ksys_write+0x6f/0xf0
> > <4> [157.687980] __x64_sys_write+0x19/0x30
> > <4> [157.687982] x64_sys_call+0x79/0x26a0
> > <4> [157.687984] do_syscall_64+0x93/0xd60
> > <4> [157.687987] ? putname+0x65/0x90
> > <4> [157.687990] ? kmem_cache_free+0x553/0x680
> > <4> [157.687995] ? putname+0x65/0x90
> > <4> [157.687997] ? putname+0x65/0x90
> > <4> [157.687999] ? do_sys_openat2+0x8b/0xd0
> > <4> [157.688003] ? __x64_sys_openat+0x54/0xa0
> > <4> [157.688007] ? do_syscall_64+0x1b7/0xd60
> > <4> [157.688009] ? __fput+0x1bf/0x2f0
> > <4> [157.688012] ? fput_close_sync+0x3d/0xa0
> > <4> [157.688015] ? __x64_sys_close+0x3e/0x90
> > <4> [157.688017] ? do_syscall_64+0x1b7/0xd60
> > <4> [157.688019] ? putname+0x65/0x90
> > <4> [157.688021] ? putname+0x65/0x90
> > <4> [157.688023] ? do_sys_openat2+0x8b/0xd0
> > <4> [157.688024] ? __fput+0x1bf/0x2f0
> > <4> [157.688028] ? __x64_sys_openat+0x54/0xa0
> > <4> [157.688032] ? do_syscall_64+0x1b7/0xd60
> > <4> [157.688034] ? do_syscall_64+0x1b7/0xd60
> > <4> [157.688036] ? irqentry_exit+0x77/0xb0
> > <4> [157.688038] ? exc_page_fault+0xbd/0x2c0
> > <4> [157.688042] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > <4> [157.688044] RIP: 0033:0x72523c91c574
> > `````````````````````````````````````````````````````````````````````````````````
> > Details log can be found in [2].
> >
> > After bisecting the tree, the following patch [3] seems to be the
> > first "bad" commit
> >
> > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > commit a3f8f8662771285511ae26c4c8d3ba1cd22159b9
> > Author: Christian Brauner <brauner@kernel.org>
> > Date: Wed Nov 5 14:39:45 2025 +0100
> >
> > power: always freeze efivarfs
> > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> >
> > We also verified that if we revert the patch the issue is not seen.
>
> Yes, revert works as you see from
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_158163v2/index.html?testfilter=suspend
I missed this earlier: the same holds for Xe, see https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-158162v1/index.html?testfilter=suspend
Br,
Jani
> For any fix, post it to intel-gfx and xe and you should get results (if BAT passes)
> >
> > Could you please check why the patch causes this regression and provide a fix
> > if necessary?
> >
> > Thank you.
> >
> > Regards
> >
> > Chaitanya
> >
> > [1]
> > https://intel-gfx-ci.01.org/tree/drm-tip/index.html?testfilter=suspend
> > [2]
> > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_17595/shard-mtlp-6/igt@gem_exec_suspend@basic-s4-devices.html
> > [3] https://gitlab.com/freedesktop-mirror/drm-tip/-/commit/a3f8f8662771285511ae26c4c8d3ba1cd22159b9
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: REGRESSION on drm-tip
2025-11-27 6:25 REGRESSION on drm-tip Borah, Chaitanya Kumar
2025-11-27 16:01 ` Saarinen, Jani
@ 2025-11-27 23:04 ` Ville Syrjälä
2025-11-28 7:46 ` Borah, Chaitanya Kumar
2025-12-01 16:13 ` Saarinen, Jani
2025-12-03 13:34 ` BISECTED REGRESSION on v6.18 (was: REGRESSION on drm-tip) Jani Nikula
2025-12-05 10:14 ` REGRESSION on drm-tip Christian Brauner
3 siblings, 2 replies; 22+ messages in thread
From: Ville Syrjälä @ 2025-11-27 23:04 UTC (permalink / raw)
To: Borah, Chaitanya Kumar
Cc: brauner, intel-gfx@lists.freedesktop.org,
intel-xe@lists.freedesktop.org, Saarinen, Jani,
Kurmi, Suresh Kumar, Lucas De Marchi
On Thu, Nov 27, 2025 at 11:55:54AM +0530, Borah, Chaitanya Kumar wrote:
> Hello Christian,
>
> This is Chaitanya (again!).
>
> This mail is regarding another regression we are seeing in our CI
> runs[1] on drm-tip (with both xe and i915).
>
> `````````````````````````````````````````````````````````````````````````````````
> <4> [157.687644] ------------[ cut here ]------------
> <4> [157.687768] WARNING: CPU: 5 PID: 2277 at kernel/freezer.c:139
> __set_task_frozen+0x7f/0xb0
> ...
> <4> [157.687923] PKRU: 55555554
> <4> [157.687924] Call Trace:
> <4> [157.687925] <TASK>
> <4> [157.687926] ? __pfx___set_task_frozen+0x10/0x10
> <4> [157.687929] task_call_func+0x6d/0x120
> <4> [157.687932] ? cgroup_freezing+0x89/0x200
> <4> [157.687937] freeze_task+0x98/0x100
> <4> [157.687940] try_to_freeze_tasks+0xd2/0x440
> <4> [157.687946] freeze_processes+0x56/0xd0
> <4> [157.687948] hibernate+0x129/0x4a0
> <4> [157.687951] state_store+0xd3/0xe0
> <4> [157.687954] kobj_attr_store+0x12/0x40
> <4> [157.687959] sysfs_kf_write+0x4d/0x80
> <4> [157.687963] kernfs_fop_write_iter+0x188/0x240
> <4> [157.687967] vfs_write+0x283/0x540
> <4> [157.687969] ? free_to_partial_list+0x46d/0x640
> <4> [157.687976] ksys_write+0x6f/0xf0
> <4> [157.687980] __x64_sys_write+0x19/0x30
> <4> [157.687982] x64_sys_call+0x79/0x26a0
> <4> [157.687984] do_syscall_64+0x93/0xd60
> <4> [157.687987] ? putname+0x65/0x90
> <4> [157.687990] ? kmem_cache_free+0x553/0x680
> <4> [157.687995] ? putname+0x65/0x90
> <4> [157.687997] ? putname+0x65/0x90
> <4> [157.687999] ? do_sys_openat2+0x8b/0xd0
> <4> [157.688003] ? __x64_sys_openat+0x54/0xa0
> <4> [157.688007] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688009] ? __fput+0x1bf/0x2f0
> <4> [157.688012] ? fput_close_sync+0x3d/0xa0
> <4> [157.688015] ? __x64_sys_close+0x3e/0x90
> <4> [157.688017] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688019] ? putname+0x65/0x90
> <4> [157.688021] ? putname+0x65/0x90
> <4> [157.688023] ? do_sys_openat2+0x8b/0xd0
> <4> [157.688024] ? __fput+0x1bf/0x2f0
> <4> [157.688028] ? __x64_sys_openat+0x54/0xa0
> <4> [157.688032] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688034] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688036] ? irqentry_exit+0x77/0xb0
> <4> [157.688038] ? exc_page_fault+0xbd/0x2c0
> <4> [157.688042] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> <4> [157.688044] RIP: 0033:0x72523c91c574
> `````````````````````````````````````````````````````````````````````````````````
> Details log can be found in [2].
>
> After bisecting the tree, the following patch [3] seems to be the first
> "bad" commit
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> commit a3f8f8662771285511ae26c4c8d3ba1cd22159b9
> Author: Christian Brauner <brauner@kernel.org>
> Date: Wed Nov 5 14:39:45 2025 +0100
>
> power: always freeze efivarfs
- if (freeze_all_ptr && !(sb->s_type->fs_flags & FS_POWER_FREEZE))
+ if (!freeze_all_ptr && !(sb->s_type->fs_flags & FS_POWER_FREEZE))
?
--
Ville Syrjälä
Intel
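The suggested one-character change above can be read as a truth table. What follows is only a minimal user-space sketch, not the kernel code: it assumes, going by the name alone, that a non-NULL freeze_all_ptr stands for a request to freeze every filesystem, and that a true condition means the superblock is skipped. MODEL_FS_POWER_FREEZE, skip_before(), skip_after() and check_model() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for FS_POWER_FREEZE; the bit value is illustrative only. */
#define MODEL_FS_POWER_FREEZE (1u << 0)

/* Condition from the '-' line of the quoted diff; true is assumed to mean "skip". */
static bool skip_before(const void *freeze_all_ptr, unsigned int fs_flags)
{
	return freeze_all_ptr && !(fs_flags & MODEL_FS_POWER_FREEZE);
}

/* Condition from the '+' line of the quoted diff. */
static bool skip_after(const void *freeze_all_ptr, unsigned int fs_flags)
{
	return !freeze_all_ptr && !(fs_flags & MODEL_FS_POWER_FREEZE);
}

/* Walk the four input combinations for both variants. */
static void check_model(void)
{
	bool freeze_all = true;
	const void *all = &freeze_all;	/* non-NULL: assumed "freeze everything" */
	const void *flagged_only = NULL;	/* NULL: assumed "flagged filesystems only" */

	/* '-' line: an unflagged filesystem is skipped only when the pointer is set. */
	assert(skip_before(all, 0));
	assert(!skip_before(flagged_only, 0));

	/* '+' line: an unflagged filesystem is skipped only when the pointer is NULL. */
	assert(!skip_after(all, 0));
	assert(skip_after(flagged_only, 0));

	/* A flagged filesystem is never skipped by either variant. */
	assert(!skip_before(all, MODEL_FS_POWER_FREEZE));
	assert(!skip_before(flagged_only, MODEL_FS_POWER_FREEZE));
	assert(!skip_after(all, MODEL_FS_POWER_FREEZE));
	assert(!skip_after(flagged_only, MODEL_FS_POWER_FREEZE));
}
```

Under those assumptions the two variants differ only for filesystems without the flag, where they give opposite answers for each pointer state; calling check_model() from a small main() exercises all eight cases.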
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: REGRESSION on drm-tip
2025-11-27 23:04 ` Ville Syrjälä
@ 2025-11-28 7:46 ` Borah, Chaitanya Kumar
2025-12-05 10:14 ` Christian Brauner
2025-12-01 16:13 ` Saarinen, Jani
1 sibling, 1 reply; 22+ messages in thread
From: Borah, Chaitanya Kumar @ 2025-11-28 7:46 UTC (permalink / raw)
To: Ville Syrjälä
Cc: brauner, intel-gfx@lists.freedesktop.org,
intel-xe@lists.freedesktop.org, Saarinen, Jani,
Kurmi, Suresh Kumar, Lucas De Marchi
On 11/28/2025 4:34 AM, Ville Syrjälä wrote:
>> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>> commit a3f8f8662771285511ae26c4c8d3ba1cd22159b9
>> Author: Christian Brauner<brauner@kernel.org>
>> Date: Wed Nov 5 14:39:45 2025 +0100
>>
>> power: always freeze efivarfs
> - if (freeze_all_ptr && !(sb->s_type->fs_flags & FS_POWER_FREEZE))
> + if (!freeze_all_ptr && !(sb->s_type->fs_flags & FS_POWER_FREEZE))
>
> ?
This change helps.
@Christian do you plan to send out a fix for it?
Thank you Ville for pointing it out.
Regards
Chaitanya
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: REGRESSION on drm-tip
2025-11-28 7:46 ` Borah, Chaitanya Kumar
@ 2025-12-05 10:14 ` Christian Brauner
0 siblings, 0 replies; 22+ messages in thread
From: Christian Brauner @ 2025-12-05 10:14 UTC (permalink / raw)
To: Borah, Chaitanya Kumar
Cc: Ville Syrjälä, intel-gfx@lists.freedesktop.org,
intel-xe@lists.freedesktop.org, Saarinen, Jani,
Kurmi, Suresh Kumar, Lucas De Marchi
On Fri, Nov 28, 2025 at 01:16:44PM +0530, Borah, Chaitanya Kumar wrote:
>
>
> On 11/28/2025 4:34 AM, Ville Syrjälä wrote:
> > > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > > commit a3f8f8662771285511ae26c4c8d3ba1cd22159b9
> > > Author: Christian Brauner<brauner@kernel.org>
> > > Date: Wed Nov 5 14:39:45 2025 +0100
> > >
> > > power: always freeze efivarfs
> > - if (freeze_all_ptr && !(sb->s_type->fs_flags & FS_POWER_FREEZE))
> > + if (!freeze_all_ptr && !(sb->s_type->fs_flags & FS_POWER_FREEZE))
> >
> > ?
>
> This change helps.
> @Christian do you plan to send out a fix for it?
>
> Thank you Ville for pointing it out.
Yes, it's in vfs.fixes. I'll send it today.
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: REGRESSION on drm-tip
2025-11-27 23:04 ` Ville Syrjälä
2025-11-28 7:46 ` Borah, Chaitanya Kumar
@ 2025-12-01 16:13 ` Saarinen, Jani
1 sibling, 0 replies; 22+ messages in thread
From: Saarinen, Jani @ 2025-12-01 16:13 UTC (permalink / raw)
To: Ville Syrjälä, Borah, Chaitanya Kumar,
brauner@kernel.org
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
Kurmi, Suresh Kumar, De Marchi, Lucas
> -----Original Message-----
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Sent: Friday, 28 November 2025 1.04
> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
> Cc: brauner@kernel.org; intel-gfx@lists.freedesktop.org;
> intel-xe@lists.freedesktop.org; Saarinen, Jani <jani.saarinen@intel.com>; Kurmi,
> Suresh Kumar <suresh.kumar.kurmi@intel.com>; De Marchi, Lucas
> <lucas.demarchi@intel.com>
> Subject: Re: REGRESSION on drm-tip
>
> On Thu, Nov 27, 2025 at 11:55:54AM +0530, Borah, Chaitanya Kumar wrote:
> > Hello Christian,
> >
> > This is Chaitanya (again!).
> >
> > This mail is regarding another regression we are seeing in our CI
> > runs[1] on drm-tip (with both xe and i915).
> >
> > `````````````````````````````````````````````````````````````````````````````````
> > <4> [157.687644] ------------[ cut here ]------------
> > <4> [157.687768] WARNING: CPU: 5 PID: 2277 at kernel/freezer.c:139
> > __set_task_frozen+0x7f/0xb0
> > ...
> > <4> [157.687923] PKRU: 55555554
> > <4> [157.687924] Call Trace:
> > <4> [157.687925] <TASK>
> > <4> [157.687926] ? __pfx___set_task_frozen+0x10/0x10
> > <4> [157.687929] task_call_func+0x6d/0x120
> > <4> [157.687932] ? cgroup_freezing+0x89/0x200
> > <4> [157.687937] freeze_task+0x98/0x100
> > <4> [157.687940] try_to_freeze_tasks+0xd2/0x440
> > <4> [157.687946] freeze_processes+0x56/0xd0
> > <4> [157.687948] hibernate+0x129/0x4a0
> > <4> [157.687951] state_store+0xd3/0xe0
> > <4> [157.687954] kobj_attr_store+0x12/0x40
> > <4> [157.687959] sysfs_kf_write+0x4d/0x80
> > <4> [157.687963] kernfs_fop_write_iter+0x188/0x240
> > <4> [157.687967] vfs_write+0x283/0x540
> > <4> [157.687969] ? free_to_partial_list+0x46d/0x640
> > <4> [157.687976] ksys_write+0x6f/0xf0
> > <4> [157.687980] __x64_sys_write+0x19/0x30
> > <4> [157.687982] x64_sys_call+0x79/0x26a0
> > <4> [157.687984] do_syscall_64+0x93/0xd60
> > <4> [157.687987] ? putname+0x65/0x90
> > <4> [157.687990] ? kmem_cache_free+0x553/0x680
> > <4> [157.687995] ? putname+0x65/0x90
> > <4> [157.687997] ? putname+0x65/0x90
> > <4> [157.687999] ? do_sys_openat2+0x8b/0xd0
> > <4> [157.688003] ? __x64_sys_openat+0x54/0xa0
> > <4> [157.688007] ? do_syscall_64+0x1b7/0xd60
> > <4> [157.688009] ? __fput+0x1bf/0x2f0
> > <4> [157.688012] ? fput_close_sync+0x3d/0xa0
> > <4> [157.688015] ? __x64_sys_close+0x3e/0x90
> > <4> [157.688017] ? do_syscall_64+0x1b7/0xd60
> > <4> [157.688019] ? putname+0x65/0x90
> > <4> [157.688021] ? putname+0x65/0x90
> > <4> [157.688023] ? do_sys_openat2+0x8b/0xd0
> > <4> [157.688024] ? __fput+0x1bf/0x2f0
> > <4> [157.688028] ? __x64_sys_openat+0x54/0xa0
> > <4> [157.688032] ? do_syscall_64+0x1b7/0xd60
> > <4> [157.688034] ? do_syscall_64+0x1b7/0xd60
> > <4> [157.688036] ? irqentry_exit+0x77/0xb0
> > <4> [157.688038] ? exc_page_fault+0xbd/0x2c0
> > <4> [157.688042] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > <4> [157.688044] RIP: 0033:0x72523c91c574
> > `````````````````````````````````````````````````````````````````````````````````
> > Details log can be found in [2].
> >
> > After bisecting the tree, the following patch [3] seems to be the
> > first "bad" commit
> >
> > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > commit a3f8f8662771285511ae26c4c8d3ba1cd22159b9
> > Author: Christian Brauner <brauner@kernel.org>
> > Date: Wed Nov 5 14:39:45 2025 +0100
> >
> > power: always freeze efivarfs
>
> - if (freeze_all_ptr && !(sb->s_type->fs_flags & FS_POWER_FREEZE))
> + if (!freeze_all_ptr && !(sb->s_type->fs_flags & FS_POWER_FREEZE))
>
> ?
>
Should we go with the fix or a revert? We should decide, as this is still badly broken.
Here is the i915-only view: https://intel-gfx-ci.01.org/tree/drm-tip/index.html?testfilter=suspend
> --
> Ville Syrjälä
> Intel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BISECTED REGRESSION on v6.18 (was: REGRESSION on drm-tip)
2025-11-27 6:25 REGRESSION on drm-tip Borah, Chaitanya Kumar
2025-11-27 16:01 ` Saarinen, Jani
2025-11-27 23:04 ` Ville Syrjälä
@ 2025-12-03 13:34 ` Jani Nikula
2025-12-03 13:36 ` Jani Nikula
2025-12-05 10:14 ` REGRESSION on drm-tip Christian Brauner
3 siblings, 1 reply; 22+ messages in thread
From: Jani Nikula @ 2025-12-03 13:34 UTC (permalink / raw)
To: Borah, Chaitanya Kumar, brauner
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
Saarinen, Jani, Kurmi, Suresh Kumar, Thorsten Leemhuis,
Ville Syrjälä
On Thu, 27 Nov 2025, "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com> wrote:
> Hello Christian,
>
> This is Chaitanya (again!).
>
> This mail is regarding another regression we are seeing in our CI
> runs[1] on drm-tip (with both xe and i915).
Referring to drm-tip is downplaying the problem. The bisected regression
is in the v6.18 release. It's breaking suspend/resume across a lot of
platforms on two drivers, i915 and xe.
a3f8f8662771 ("power: always freeze efivarfs")
As far as regressions go, it's pretty bad. Please prioritize.
BR,
Jani.
>
> `````````````````````````````````````````````````````````````````````````````````
> <4> [157.687644] ------------[ cut here ]------------
> <4> [157.687768] WARNING: CPU: 5 PID: 2277 at kernel/freezer.c:139
> __set_task_frozen+0x7f/0xb0
> ...
> <4> [157.687923] PKRU: 55555554
> <4> [157.687924] Call Trace:
> <4> [157.687925] <TASK>
> <4> [157.687926] ? __pfx___set_task_frozen+0x10/0x10
> <4> [157.687929] task_call_func+0x6d/0x120
> <4> [157.687932] ? cgroup_freezing+0x89/0x200
> <4> [157.687937] freeze_task+0x98/0x100
> <4> [157.687940] try_to_freeze_tasks+0xd2/0x440
> <4> [157.687946] freeze_processes+0x56/0xd0
> <4> [157.687948] hibernate+0x129/0x4a0
> <4> [157.687951] state_store+0xd3/0xe0
> <4> [157.687954] kobj_attr_store+0x12/0x40
> <4> [157.687959] sysfs_kf_write+0x4d/0x80
> <4> [157.687963] kernfs_fop_write_iter+0x188/0x240
> <4> [157.687967] vfs_write+0x283/0x540
> <4> [157.687969] ? free_to_partial_list+0x46d/0x640
> <4> [157.687976] ksys_write+0x6f/0xf0
> <4> [157.687980] __x64_sys_write+0x19/0x30
> <4> [157.687982] x64_sys_call+0x79/0x26a0
> <4> [157.687984] do_syscall_64+0x93/0xd60
> <4> [157.687987] ? putname+0x65/0x90
> <4> [157.687990] ? kmem_cache_free+0x553/0x680
> <4> [157.687995] ? putname+0x65/0x90
> <4> [157.687997] ? putname+0x65/0x90
> <4> [157.687999] ? do_sys_openat2+0x8b/0xd0
> <4> [157.688003] ? __x64_sys_openat+0x54/0xa0
> <4> [157.688007] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688009] ? __fput+0x1bf/0x2f0
> <4> [157.688012] ? fput_close_sync+0x3d/0xa0
> <4> [157.688015] ? __x64_sys_close+0x3e/0x90
> <4> [157.688017] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688019] ? putname+0x65/0x90
> <4> [157.688021] ? putname+0x65/0x90
> <4> [157.688023] ? do_sys_openat2+0x8b/0xd0
> <4> [157.688024] ? __fput+0x1bf/0x2f0
> <4> [157.688028] ? __x64_sys_openat+0x54/0xa0
> <4> [157.688032] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688034] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688036] ? irqentry_exit+0x77/0xb0
> <4> [157.688038] ? exc_page_fault+0xbd/0x2c0
> <4> [157.688042] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> <4> [157.688044] RIP: 0033:0x72523c91c574
> `````````````````````````````````````````````````````````````````````````````````
> Details log can be found in [2].
>
> After bisecting the tree, the following patch [3] seems to be the first
> "bad" commit
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> commit a3f8f8662771285511ae26c4c8d3ba1cd22159b9
> Author: Christian Brauner <brauner@kernel.org>
> Date: Wed Nov 5 14:39:45 2025 +0100
>
> power: always freeze efivarfs
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>
> We also verified that if we revert the patch the issue is not seen.
>
> Could you please check why the patch causes this regression and provide
> a fix if necessary?
>
> Thank you.
>
> Regards
>
> Chaitanya
>
> [1]
> https://intel-gfx-ci.01.org/tree/drm-tip/index.html?testfilter=suspend
> [2]
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_17595/shard-mtlp-6/igt@gem_exec_suspend@basic-s4-devices.html
> [3]
> https://gitlab.com/freedesktop-mirror/drm-tip/-/commit/a3f8f8662771285511ae26c4c8d3ba1cd22159b9
--
Jani Nikula, Intel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BISECTED REGRESSION on v6.18 (was: REGRESSION on drm-tip)
2025-12-03 13:34 ` BISECTED REGRESSION on v6.18 (was: REGRESSION on drm-tip) Jani Nikula
@ 2025-12-03 13:36 ` Jani Nikula
2025-12-03 13:40 ` Rafael J. Wysocki
0 siblings, 1 reply; 22+ messages in thread
From: Jani Nikula @ 2025-12-03 13:36 UTC (permalink / raw)
To: Borah, Chaitanya Kumar, brauner
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
Saarinen, Jani, Kurmi, Suresh Kumar, Thorsten Leemhuis,
Ville Syrjälä, Rafael J. Wysocki, Pavel Machek,
linux-pm, linux-kernel
On Wed, 03 Dec 2025, Jani Nikula <jani.nikula@linux.intel.com> wrote:
> On Thu, 27 Nov 2025, "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com> wrote:
>> Hello Christian,
>>
>> This is Chaitanya (again!).
>>
>> This mail is regarding another regression we are seeing in our CI
>> runs[1] on drm-tip (with both xe and i915).
>
> Referring to drm-tip is downplaying the problem. The bisected regression
> is in v6.18 release. It's breaking suspend/resume across a lot of
> platforms on two drivers, i915 and xe.
>
> a3f8f8662771 ("power: always freeze efivarfs")
>
> As far as regressions go, it's pretty bad. Please prioritize.
Added some missing Cc's.
>
>
> BR,
> Jani.
>
>
>
>>
>> `````````````````````````````````````````````````````````````````````````````````
>> <4> [157.687644] ------------[ cut here ]------------
>> <4> [157.687768] WARNING: CPU: 5 PID: 2277 at kernel/freezer.c:139
>> __set_task_frozen+0x7f/0xb0
>> ...
>> <4> [157.687923] PKRU: 55555554
>> <4> [157.687924] Call Trace:
>> <4> [157.687925] <TASK>
>> <4> [157.687926] ? __pfx___set_task_frozen+0x10/0x10
>> <4> [157.687929] task_call_func+0x6d/0x120
>> <4> [157.687932] ? cgroup_freezing+0x89/0x200
>> <4> [157.687937] freeze_task+0x98/0x100
>> <4> [157.687940] try_to_freeze_tasks+0xd2/0x440
>> <4> [157.687946] freeze_processes+0x56/0xd0
>> <4> [157.687948] hibernate+0x129/0x4a0
>> <4> [157.687951] state_store+0xd3/0xe0
>> <4> [157.687954] kobj_attr_store+0x12/0x40
>> <4> [157.687959] sysfs_kf_write+0x4d/0x80
>> <4> [157.687963] kernfs_fop_write_iter+0x188/0x240
>> <4> [157.687967] vfs_write+0x283/0x540
>> <4> [157.687969] ? free_to_partial_list+0x46d/0x640
>> <4> [157.687976] ksys_write+0x6f/0xf0
>> <4> [157.687980] __x64_sys_write+0x19/0x30
>> <4> [157.687982] x64_sys_call+0x79/0x26a0
>> <4> [157.687984] do_syscall_64+0x93/0xd60
>> <4> [157.687987] ? putname+0x65/0x90
>> <4> [157.687990] ? kmem_cache_free+0x553/0x680
>> <4> [157.687995] ? putname+0x65/0x90
>> <4> [157.687997] ? putname+0x65/0x90
>> <4> [157.687999] ? do_sys_openat2+0x8b/0xd0
>> <4> [157.688003] ? __x64_sys_openat+0x54/0xa0
>> <4> [157.688007] ? do_syscall_64+0x1b7/0xd60
>> <4> [157.688009] ? __fput+0x1bf/0x2f0
>> <4> [157.688012] ? fput_close_sync+0x3d/0xa0
>> <4> [157.688015] ? __x64_sys_close+0x3e/0x90
>> <4> [157.688017] ? do_syscall_64+0x1b7/0xd60
>> <4> [157.688019] ? putname+0x65/0x90
>> <4> [157.688021] ? putname+0x65/0x90
>> <4> [157.688023] ? do_sys_openat2+0x8b/0xd0
>> <4> [157.688024] ? __fput+0x1bf/0x2f0
>> <4> [157.688028] ? __x64_sys_openat+0x54/0xa0
>> <4> [157.688032] ? do_syscall_64+0x1b7/0xd60
>> <4> [157.688034] ? do_syscall_64+0x1b7/0xd60
>> <4> [157.688036] ? irqentry_exit+0x77/0xb0
>> <4> [157.688038] ? exc_page_fault+0xbd/0x2c0
>> <4> [157.688042] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> <4> [157.688044] RIP: 0033:0x72523c91c574
>> `````````````````````````````````````````````````````````````````````````````````
>> Details log can be found in [2].
>>
>> After bisecting the tree, the following patch [3] seems to be the first
>> "bad" commit
>>
>> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>> commit a3f8f8662771285511ae26c4c8d3ba1cd22159b9
>> Author: Christian Brauner <brauner@kernel.org>
>> Date: Wed Nov 5 14:39:45 2025 +0100
>>
>> power: always freeze efivarfs
>> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>>
>> We also verified that if we revert the patch the issue is not seen.
>>
>> Could you please check why the patch causes this regression and provide
>> a fix if necessary?
>>
>> Thank you.
>>
>> Regards
>>
>> Chaitanya
>>
>> [1]
>> https://intel-gfx-ci.01.org/tree/drm-tip/index.html?testfilter=suspend
>> [2]
>> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_17595/shard-mtlp-6/igt@gem_exec_suspend@basic-s4-devices.html
>> [3]
>> https://gitlab.com/freedesktop-mirror/drm-tip/-/commit/a3f8f8662771285511ae26c4c8d3ba1cd22159b9
--
Jani Nikula, Intel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BISECTED REGRESSION on v6.18 (was: REGRESSION on drm-tip)
2025-12-03 13:36 ` Jani Nikula
@ 2025-12-03 13:40 ` Rafael J. Wysocki
0 siblings, 0 replies; 22+ messages in thread
From: Rafael J. Wysocki @ 2025-12-03 13:40 UTC (permalink / raw)
To: Jani Nikula
Cc: Borah, Chaitanya Kumar, brauner, intel-gfx@lists.freedesktop.org,
intel-xe@lists.freedesktop.org, Saarinen, Jani,
Kurmi, Suresh Kumar, Thorsten Leemhuis, Ville Syrjälä,
Rafael J. Wysocki, Pavel Machek, linux-pm, linux-kernel
On Wed, Dec 3, 2025 at 2:36 PM Jani Nikula <jani.nikula@linux.intel.com> wrote:
>
> On Wed, 03 Dec 2025, Jani Nikula <jani.nikula@linux.intel.com> wrote:
> > On Thu, 27 Nov 2025, "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com> wrote:
> >> Hello Christian,
> >>
> >> This is Chaitanya (again!).
> >>
> >> This mail is regarding another regression we are seeing in our CI
> >> runs[1] on drm-tip (with both xe and i915).
> >
> > Referring to drm-tip is downplaying the problem. The bisected regression
> > is in v6.18 release. It's breaking suspend/resume across a lot of
> > platforms on two drivers, i915 and xe.
> >
> > a3f8f8662771 ("power: always freeze efivarfs")
> >
> > As far as regressions go, it's pretty bad. Please prioritize.
>
> Added some missing Cc's.
This should fix it:
https://lore.kernel.org/linux-pm/12788397.O9o76ZdvQC@rafael.j.wysocki/
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: REGRESSION on drm-tip
2025-11-27 6:25 REGRESSION on drm-tip Borah, Chaitanya Kumar
` (2 preceding siblings ...)
2025-12-03 13:34 ` BISECTED REGRESSION on v6.18 (was: REGRESSION on drm-tip) Jani Nikula
@ 2025-12-05 10:14 ` Christian Brauner
3 siblings, 0 replies; 22+ messages in thread
From: Christian Brauner @ 2025-12-05 10:14 UTC (permalink / raw)
To: Borah, Chaitanya Kumar
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
Saarinen, Jani, Kurmi, Suresh Kumar, Lucas De Marchi
On Thu, Nov 27, 2025 at 11:55:54AM +0530, Borah, Chaitanya Kumar wrote:
> Hello Christian,
>
> This is Chaitanya (again!).
>
> This mail is regarding another regression we are seeing in our CI runs[1] on
> drm-tip (with both xe and i915).
>
> `````````````````````````````````````````````````````````````````````````````````
> <4> [157.687644] ------------[ cut here ]------------
> <4> [157.687768] WARNING: CPU: 5 PID: 2277 at kernel/freezer.c:139
> __set_task_frozen+0x7f/0xb0
> ...
> <4> [157.687923] PKRU: 55555554
> <4> [157.687924] Call Trace:
> <4> [157.687925] <TASK>
> <4> [157.687926] ? __pfx___set_task_frozen+0x10/0x10
> <4> [157.687929] task_call_func+0x6d/0x120
> <4> [157.687932] ? cgroup_freezing+0x89/0x200
> <4> [157.687937] freeze_task+0x98/0x100
> <4> [157.687940] try_to_freeze_tasks+0xd2/0x440
> <4> [157.687946] freeze_processes+0x56/0xd0
> <4> [157.687948] hibernate+0x129/0x4a0
> <4> [157.687951] state_store+0xd3/0xe0
> <4> [157.687954] kobj_attr_store+0x12/0x40
> <4> [157.687959] sysfs_kf_write+0x4d/0x80
> <4> [157.687963] kernfs_fop_write_iter+0x188/0x240
> <4> [157.687967] vfs_write+0x283/0x540
> <4> [157.687969] ? free_to_partial_list+0x46d/0x640
> <4> [157.687976] ksys_write+0x6f/0xf0
> <4> [157.687980] __x64_sys_write+0x19/0x30
> <4> [157.687982] x64_sys_call+0x79/0x26a0
> <4> [157.687984] do_syscall_64+0x93/0xd60
> <4> [157.687987] ? putname+0x65/0x90
> <4> [157.687990] ? kmem_cache_free+0x553/0x680
> <4> [157.687995] ? putname+0x65/0x90
> <4> [157.687997] ? putname+0x65/0x90
> <4> [157.687999] ? do_sys_openat2+0x8b/0xd0
> <4> [157.688003] ? __x64_sys_openat+0x54/0xa0
> <4> [157.688007] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688009] ? __fput+0x1bf/0x2f0
> <4> [157.688012] ? fput_close_sync+0x3d/0xa0
> <4> [157.688015] ? __x64_sys_close+0x3e/0x90
> <4> [157.688017] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688019] ? putname+0x65/0x90
> <4> [157.688021] ? putname+0x65/0x90
> <4> [157.688023] ? do_sys_openat2+0x8b/0xd0
> <4> [157.688024] ? __fput+0x1bf/0x2f0
> <4> [157.688028] ? __x64_sys_openat+0x54/0xa0
> <4> [157.688032] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688034] ? do_syscall_64+0x1b7/0xd60
> <4> [157.688036] ? irqentry_exit+0x77/0xb0
> <4> [157.688038] ? exc_page_fault+0xbd/0x2c0
> <4> [157.688042] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> <4> [157.688044] RIP: 0033:0x72523c91c574
> `````````````````````````````````````````````````````````````````````````````````
> Details log can be found in [2].
>
> After bisecting the tree, the following patch [3] seems to be the first
> "bad" commit
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> commit a3f8f8662771285511ae26c4c8d3ba1cd22159b9
> Author: Christian Brauner <brauner@kernel.org>
> Date: Wed Nov 5 14:39:45 2025 +0100
>
> power: always freeze efivarfs
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>
> We also verified that if we revert the patch the issue is not seen.
>
> Could you please check why the patch causes this regression and provide a
> fix if necessary?
I'm going to send a fix for this to Linus today. Thanks!
Rafael sent me a fix earlier this week.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Regression on drm-tip
@ 2025-04-28 6:02 Borah, Chaitanya Kumar
0 siblings, 0 replies; 22+ messages in thread
From: Borah, Chaitanya Kumar @ 2025-04-28 6:02 UTC (permalink / raw)
To: Hall, Christopher S
Cc: intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
Keller, Jacob E, intel-wired-lan@lists.osuosl.org,
Saarinen, Jani, Kurmi, Suresh Kumar, De Marchi, Lucas
Hello Christopher,
This mail is regarding a regression we are seeing in our CI runs[1] on drm-tip[2] repository.
`````````````````````````````````````````````````````````````````````````````````
<4>[ 7.891028] =============================
<4>[ 7.891293] [ BUG: Invalid wait context ]
<4>[ 7.891526] 6.15.0-rc3-CI_DRM_16443-gdc80d6a10c1c+ #1 Tainted: G W
<4>[ 7.891792] -----------------------------
<4>[ 7.892070] (udev-worker)/286 is trying to lock:
<4>[ 7.892349] ffff88811671bcc8 (&adapter->ptm_lock){....}-{3:3}, at: igc_ptp_reset+0x155/0x320 [igc]
<4>[ 7.892660] other info that might help us debug this:
<4>[ 7.892943] context-{4:4}
<4>[ 7.893226] 2 locks held by (udev-worker)/286:
<4>[ 7.893515] #0: ffff888103bd41b0 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x104/0x220
<4>[ 7.893823] #1: ffff88811671bb70 (&adapter->tmreg_lock){....}-{2:2}, at: igc_ptp_reset+0x53/0x320 [igc]
<4>[ 7.894134] stack backtrace:
<4>[ 7.894439] CPU: 2 UID: 0 PID: 286 Comm: (udev-worker) Tainted: G W 6.15.0-rc3-CI_DRM_16443-gdc80d6a10c1c+ #1 PREEMPT(voluntary)
<4>[ 7.894442] Tainted: [W]=WARN
<4>[ 7.894443] Hardware name: Intel(R) Client Systems NUC11TNHi3/NUC11TNBi3, BIOS TNTGL357.0067.2022.0718.1742 07/18/2022
`````````````````````````````````````````````````````````````````````````````````
Detailed log can be found in [3].
After bisecting the tree, the following patch [4] seems to be the first "bad"
commit
`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit 1a931c4f5e6862e61a4b130cb76b422e1415f644
Author: Christopher S M Hall <christopher.s.hall@intel.com>
Date: Tue Apr 1 16:35:34 2025 -0700
igc: add lock preventing multiple simultaneous PTM transactions
`````````````````````````````````````````````````````````````````````````````````````````````````````````
We also verified that if we revert the patch the issue is not seen.
Could you please check why the patch causes this regression and provide a fix if necessary?
Thank you.
Regards
Chaitanya
[1] https://intel-gfx-ci.01.org/tree/drm-tip/shard-tglu.html
[2] https://cgit.freedesktop.org/drm-tip/tree/
[3] https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_16443/fi-tgl-1115g4/boot0.txt
[4] https://cgit.freedesktop.org/drm-tip/commit/?id=1a931c4f5e6862e61a4b130cb76b422e1415f644
^ permalink raw reply [flat|nested] 22+ messages in thread
* Regression on drm-tip
@ 2025-03-13 8:51 Borah, Chaitanya Kumar
2025-03-13 9:30 ` Baolu Lu
2025-03-13 14:23 ` Baolu Lu
0 siblings, 2 replies; 22+ messages in thread
From: Borah, Chaitanya Kumar @ 2025-03-13 8:51 UTC (permalink / raw)
To: Baolu Lu
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
iommu@lists.linux.dev
Hello Lu,
Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.
This mail is regarding a regression we are seeing in our CI runs[1] on drm-tip repository.
`````````````````````````````````````````````````````````````````````````````````
<4>[ 2.856622] WARNING: possible circular locking dependency detected
<4>[ 2.856631] 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G I
<4>[ 2.856642] ------------------------------------------------------
<4>[ 2.856650] swapper/0/1 is trying to acquire lock:
<4>[ 2.856657] ffffffff8360ecc8 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70
<4>[ 2.856679]
but task is already holding lock:
<4>[ 2.856686] ffff888102ab6fa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xea1/0x1220
`````````````````````````````````````````````````````````````````````````````````
Detailed log can be found in [2].
After bisecting the tree, the following patch [3] seems to be the first "bad" commit
`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
Author: Lu Baolu <baolu.lu@linux.intel.com>
Date: Fri Feb 28 18:27:26 2025 +0800
iommu/vt-d: Fix suspicious RCU usage
`````````````````````````````````````````````````````````````````````````````````````````````````````````
We also verified that if we revert the patch the issue is not seen.
Could you please check why the patch causes this regression and provide a fix if necessary?
Gitlab issue for the regression is [4].
Thank you.
Regards
Chaitanya
[1] https://intel-gfx-ci.01.org/tree/drm-tip/combined-alt.html?
[2] https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_16276/fi-kbl-8809g/boot0.txt
[3] https://cgit.freedesktop.org/drm-tip/commit/?id=b150654f74bf0df8e6a7936d5ec51400d9ec06d8
[4] https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13818
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Regression on drm-tip
2025-03-13 8:51 Borah, Chaitanya Kumar
@ 2025-03-13 9:30 ` Baolu Lu
2025-03-13 14:23 ` Baolu Lu
1 sibling, 0 replies; 22+ messages in thread
From: Baolu Lu @ 2025-03-13 9:30 UTC (permalink / raw)
To: Borah, Chaitanya Kumar
Cc: baolu.lu, intel-gfx@lists.freedesktop.org,
intel-xe@lists.freedesktop.org, iommu@lists.linux.dev
On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
> Hello Lu,
>
> Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.
>
> This mail is regarding a regression we are seeing in our CI runs[1] on drm-tip repository.
>
> `````````````````````````````````````````````````````````````````````````````````
> <4>[ 2.856622] WARNING: possible circular locking dependency detected
> <4>[ 2.856631] 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G I
> <4>[ 2.856642] ------------------------------------------------------
> <4>[ 2.856650] swapper/0/1 is trying to acquire lock:
> <4>[ 2.856657] ffffffff8360ecc8 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70
> <4>[ 2.856679]
> but task is already holding lock:
> <4>[ 2.856686] ffff888102ab6fa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xea1/0x1220
> `````````````````````````````````````````````````````````````````````````````````
> Details log can be found in [2].
>
> After bisecting the tree, the following patch [3] seems to be the first "bad" commit
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
> Author: Lu Baolu <baolu.lu@linux.intel.com>
> Date: Fri Feb 28 18:27:26 2025 +0800
>
> iommu/vt-d: Fix suspicious RCU usage
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>
> We also verified that if we revert the patch the issue is not seen.
>
> Could you please check why the patch causes this regression and provide a fix if necessary?
Sure. I will look into this issue.
Thanks,
baolu
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Regression on drm-tip
2025-03-13 8:51 Borah, Chaitanya Kumar
2025-03-13 9:30 ` Baolu Lu
@ 2025-03-13 14:23 ` Baolu Lu
2025-03-14 9:04 ` Borah, Chaitanya Kumar
1 sibling, 1 reply; 22+ messages in thread
From: Baolu Lu @ 2025-03-13 14:23 UTC (permalink / raw)
To: Borah, Chaitanya Kumar
Cc: baolu.lu, intel-gfx@lists.freedesktop.org,
intel-xe@lists.freedesktop.org, iommu@lists.linux.dev
On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
> Hello Lu,
>
> Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.
>
> This mail is regarding a regression we are seeing in our CI runs[1] on drm-tip repository.
>
> `````````````````````````````````````````````````````````````````````````````````
> <4>[ 2.856622] WARNING: possible circular locking dependency detected
> <4>[ 2.856631] 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G I
> <4>[ 2.856642] ------------------------------------------------------
> <4>[ 2.856650] swapper/0/1 is trying to acquire lock:
> <4>[ 2.856657] ffffffff8360ecc8 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70
> <4>[ 2.856679]
> but task is already holding lock:
> <4>[ 2.856686] ffff888102ab6fa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xea1/0x1220
> `````````````````````````````````````````````````````````````````````````````````
> Details log can be found in [2].
>
> After bisecting the tree, the following patch [3] seems to be the first "bad" commit
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
> Author: Lu Baolu <baolu.lu@linux.intel.com>
> Date: Fri Feb 28 18:27:26 2025 +0800
>
> iommu/vt-d: Fix suspicious RCU usage
>
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
>
> We also verified that if we revert the patch the issue is not seen.
>
> Could you please check why the patch causes this regression and provide a fix if necessary?
Can you please take a quick test to check if the following fix works?
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index e540092d664d..06debeaec643 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int cpu)
if (iommu->irq || iommu->node != cpu_to_node(cpu))
continue;
+ /*
+ * Call dmar_alloc_hwirq() with dmar_global_lock held,
+ * could cause possible lock race condition.
+ */
+ up_read(&dmar_global_lock);
ret = dmar_set_interrupt(iommu);
-
+ down_read(&dmar_global_lock);
if (ret) {
pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
(unsigned long long)drhd->reg_base_addr, ret);
Thanks,
baolu
^ permalink raw reply related [flat|nested] 22+ messages in thread
* RE: Regression on drm-tip
2025-03-13 14:23 ` Baolu Lu
@ 2025-03-14 9:04 ` Borah, Chaitanya Kumar
2025-03-16 2:33 ` Baolu Lu
0 siblings, 1 reply; 22+ messages in thread
From: Borah, Chaitanya Kumar @ 2025-03-14 9:04 UTC (permalink / raw)
To: Baolu Lu
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
iommu@lists.linux.dev
> -----Original Message-----
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Thursday, March 13, 2025 7:53 PM
> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
> Cc: baolu.lu@linux.intel.com; intel-gfx@lists.freedesktop.org; intel-
> xe@lists.freedesktop.org; iommu@lists.linux.dev
> Subject: Re: Regression on drm-tip
>
> On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
> > Hello Lu,
> >
> > Hope you are doing well. I am Chaitanya from the linux graphics team in
> Intel.
> >
> > This mail is regarding a regression we are seeing in our CI runs[1] on drm-tip
> repository.
> >
> > ``````````````````````````````````````````````````````````````````````
> > ``````````` <4>[ 2.856622] WARNING: possible circular locking
> > dependency detected <4>[ 2.856631]
> > 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G I <4>[
> > 2.856642] ------------------------------------------------------
> > <4>[ 2.856650] swapper/0/1 is trying to acquire lock:
> > <4>[ 2.856657] ffffffff8360ecc8
> > (iommu_probe_device_lock){+.+.}-{3:3}, at:
> > iommu_probe_device+0x1d/0x70 <4>[ 2.856679]
> > but task is already holding lock:
> > <4>[ 2.856686] ffff888102ab6fa8
> > (&device->physical_node_lock){+.+.}-{3:3}, at:
> > intel_iommu_init+0xea1/0x1220
> > ``````````````````````````````````````````````````````````````````````
> > ```````````
> > Details log can be found in [2].
> >
> > After bisecting the tree, the following patch [3] seems to be the
> > first "bad" commit
> >
> > ``````````````````````````````````````````````````````````````````````
> > ```````````````````````````````````
> > commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
> > Author: Lu Baolumailto:baolu.lu@linux.intel.com
> > Date: Fri Feb 28 18:27:26 2025 +0800
> >
> > iommu/vt-d: Fix suspicious RCU usage
> >
> > ``````````````````````````````````````````````````````````````````````
> > ```````````````````````````````````
> >
> > We also verified that if we revert the patch the issue is not seen.
> >
> > Could you please check why the patch causes this regression and provide a
> fix if necessary?
>
> Can you please take a quick test to check if the following fix works?
>
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c index
> e540092d664d..06debeaec643 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int cpu)
> if (iommu->irq || iommu->node != cpu_to_node(cpu))
> continue;
>
> + /*
> + * Call dmar_alloc_hwirq() with dmar_global_lock held,
> + * could cause possible lock race condition.
> + */
> + up_read(&dmar_global_lock);
> ret = dmar_set_interrupt(iommu);
> -
> + down_read(&dmar_global_lock);
> if (ret) {
> pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
> (unsigned long long)drhd->reg_base_addr, ret);
>
> Thanks,
> baolu
We still see the issue with this change.
Regards
Chaitanya
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Regression on drm-tip
2025-03-14 9:04 ` Borah, Chaitanya Kumar
@ 2025-03-16 2:33 ` Baolu Lu
2025-03-16 7:27 ` Borah, Chaitanya Kumar
0 siblings, 1 reply; 22+ messages in thread
From: Baolu Lu @ 2025-03-16 2:33 UTC (permalink / raw)
To: Borah, Chaitanya Kumar
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
iommu@lists.linux.dev
On 3/14/25 17:04, Borah, Chaitanya Kumar wrote:
>
>
>> -----Original Message-----
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Thursday, March 13, 2025 7:53 PM
>> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
>> Cc: baolu.lu@linux.intel.com; intel-gfx@lists.freedesktop.org; intel-
>> xe@lists.freedesktop.org; iommu@lists.linux.dev
>> Subject: Re: Regression on drm-tip
>>
>> On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
>>> Hello Lu,
>>>
>>> Hope you are doing well. I am Chaitanya from the linux graphics team in
>> Intel.
>>>
>>> This mail is regarding a regression we are seeing in our CI runs[1] on drm-tip
>> repository.
>>>
>>> ``````````````````````````````````````````````````````````````````````
>>> ``````````` <4>[ 2.856622] WARNING: possible circular locking
>>> dependency detected <4>[ 2.856631]
>>> 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G I <4>[
>>> 2.856642] ------------------------------------------------------
>>> <4>[ 2.856650] swapper/0/1 is trying to acquire lock:
>>> <4>[ 2.856657] ffffffff8360ecc8
>>> (iommu_probe_device_lock){+.+.}-{3:3}, at:
>>> iommu_probe_device+0x1d/0x70 <4>[ 2.856679]
>>> but task is already holding lock:
>>> <4>[ 2.856686] ffff888102ab6fa8
>>> (&device->physical_node_lock){+.+.}-{3:3}, at:
>>> intel_iommu_init+0xea1/0x1220
>>> ``````````````````````````````````````````````````````````````````````
>>> ```````````
>>> Details log can be found in [2].
>>>
>>> After bisecting the tree, the following patch [3] seems to be the
>>> first "bad" commit
>>>
>>> ``````````````````````````````````````````````````````````````````````
>>> ```````````````````````````````````
>>> commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
>>> Author: Lu Baolumailto:baolu.lu@linux.intel.com
>>> Date: Fri Feb 28 18:27:26 2025 +0800
>>>
>>> iommu/vt-d: Fix suspicious RCU usage
>>>
>>> ``````````````````````````````````````````````````````````````````````
>>> ```````````````````````````````````
>>>
>>> We also verified that if we revert the patch the issue is not seen.
>>>
>>> Could you please check why the patch causes this regression and provide a
>> fix if necessary?
>>
>> Can you please take a quick test to check if the following fix works?
>>
>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c index
>> e540092d664d..06debeaec643 100644
>> --- a/drivers/iommu/intel/dmar.c
>> +++ b/drivers/iommu/intel/dmar.c
>> @@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int cpu)
>> if (iommu->irq || iommu->node != cpu_to_node(cpu))
>> continue;
>>
>> + /*
>> + * Call dmar_alloc_hwirq() with dmar_global_lock held,
>> + * could cause possible lock race condition.
>> + */
>> + up_read(&dmar_global_lock);
>> ret = dmar_set_interrupt(iommu);
>> -
>> + down_read(&dmar_global_lock);
>> if (ret) {
>> pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
>> (unsigned long long)drhd->reg_base_addr, ret);
>>
>> Thanks,
>> baolu
>
> We still see the issue with this change.
I am attempting to reproduce this issue with my MTL machine. I pulled
the test branch from:
https://anongit.freedesktop.org/git/drm-tip.git
and built the test kernel image using the configuration file from:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_16217/kconfig.txt
But I did not observe the lockdep splat mentioned above after booting.
Is there anything I might have missed?
Thanks,
baolu
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Regression on drm-tip
2025-03-16 2:33 ` Baolu Lu
@ 2025-03-16 7:27 ` Borah, Chaitanya Kumar
2025-03-16 8:03 ` Baolu Lu
0 siblings, 1 reply; 22+ messages in thread
From: Borah, Chaitanya Kumar @ 2025-03-16 7:27 UTC (permalink / raw)
To: Baolu Lu
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
iommu@lists.linux.dev, Kurmi, Suresh Kumar, Saarinen, Jani,
De Marchi, Lucas
> -----Original Message-----
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Sunday, March 16, 2025 8:04 AM
> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
> Cc: intel-gfx@lists.freedesktop.org; intel-xe@lists.freedesktop.org;
> iommu@lists.linux.dev
> Subject: Re: Regression on drm-tip
>
> On 3/14/25 17:04, Borah, Chaitanya Kumar wrote:
> >
> >
> >> -----Original Message-----
> >> From: Baolu Lu <baolu.lu@linux.intel.com>
> >> Sent: Thursday, March 13, 2025 7:53 PM
> >> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
> >> Cc: baolu.lu@linux.intel.com; intel-gfx@lists.freedesktop.org; intel-
> >> xe@lists.freedesktop.org; iommu@lists.linux.dev
> >> Subject: Re: Regression on drm-tip
> >>
> >> On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
> >>> Hello Lu,
> >>>
> >>> Hope you are doing well. I am Chaitanya from the linux graphics team
> >>> in
> >> Intel.
> >>>
> >>> This mail is regarding a regression we are seeing in our CI runs[1]
> >>> on drm-tip
> >> repository.
> >>>
> >>> ````````````````````````````````````````````````````````````````````
> >>> `` ``````````` <4>[ 2.856622] WARNING: possible circular locking
> >>> dependency detected <4>[ 2.856631]
> >>> 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G I <4>[
> >>> 2.856642] ------------------------------------------------------
> >>> <4>[ 2.856650] swapper/0/1 is trying to acquire lock:
> >>> <4>[ 2.856657] ffffffff8360ecc8
> >>> (iommu_probe_device_lock){+.+.}-{3:3}, at:
> >>> iommu_probe_device+0x1d/0x70 <4>[ 2.856679]
> >>> but task is already holding lock:
> >>> <4>[ 2.856686] ffff888102ab6fa8
> >>> (&device->physical_node_lock){+.+.}-{3:3}, at:
> >>> intel_iommu_init+0xea1/0x1220
> >>> ````````````````````````````````````````````````````````````````````
> >>> ``
> >>> ```````````
> >>> Details log can be found in [2].
> >>>
> >>> After bisecting the tree, the following patch [3] seems to be the
> >>> first "bad" commit
> >>>
> >>> ````````````````````````````````````````````````````````````````````
> >>> ``
> >>> ```````````````````````````````````
> >>> commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
> >>> Author: Lu Baolumailto:baolu.lu@linux.intel.com
> >>> Date: Fri Feb 28 18:27:26 2025 +0800
> >>>
> >>> iommu/vt-d: Fix suspicious RCU usage
> >>>
> >>> ````````````````````````````````````````````````````````````````````
> >>> ``
> >>> ```````````````````````````````````
> >>>
> >>> We also verified that if we revert the patch the issue is not seen.
> >>>
> >>> Could you please check why the patch causes this regression and
> >>> provide a
> >> fix if necessary?
> >>
> >> Can you please take a quick test to check if the following fix works?
> >>
> >> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> >> index
> >> e540092d664d..06debeaec643 100644
> >> --- a/drivers/iommu/intel/dmar.c
> >> +++ b/drivers/iommu/intel/dmar.c
> >> @@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int
> cpu)
> >> if (iommu->irq || iommu->node != cpu_to_node(cpu))
> >> continue;
> >>
> >> + /*
> >> + * Call dmar_alloc_hwirq() with dmar_global_lock held,
> >> + * could cause possible lock race condition.
> >> + */
> >> + up_read(&dmar_global_lock);
> >> ret = dmar_set_interrupt(iommu);
> >> -
> >> + down_read(&dmar_global_lock);
> >> if (ret) {
> >> pr_err("DRHD %Lx: failed to enable fault, interrupt, ret
> %d\n",
> >> (unsigned long
> >> long)drhd->reg_base_addr, ret);
> >>
> >> Thanks,
> >> baolu
> >
> > We still see the issue with this change.
>
> I am attempting to reproduce this issue with my MTL machine. I pulled the
> test branch from:
>
> https://anongit.freedesktop.org/git/drm-tip.git
>
> and built the test kernel image using the configuration file from:
>
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_16217/kconfig.txt
>
> But I did not observe the lockdep splat mentioned above after booting.
>
> Is there anything I might have missed?
>
+Suresh, Jani, Lucas
We are seeing this only on Skylake and Kaby Lake in our CI runs.
https://intel-gfx-ci.01.org/tree/drm-tip/igt@runner@aborted.html
Regards
Chaitanya
> Thanks,
> baolu
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Regression on drm-tip
2025-03-16 7:27 ` Borah, Chaitanya Kumar
@ 2025-03-16 8:03 ` Baolu Lu
2025-03-16 10:01 ` Borah, Chaitanya Kumar
0 siblings, 1 reply; 22+ messages in thread
From: Baolu Lu @ 2025-03-16 8:03 UTC (permalink / raw)
To: Borah, Chaitanya Kumar
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
iommu@lists.linux.dev, Kurmi, Suresh Kumar, Saarinen, Jani,
De Marchi, Lucas
On 3/16/25 15:27, Borah, Chaitanya Kumar wrote:
>
>> -----Original Message-----
>> From: Baolu Lu<baolu.lu@linux.intel.com>
>> Sent: Sunday, March 16, 2025 8:04 AM
>> To: Borah, Chaitanya Kumar<chaitanya.kumar.borah@intel.com>
>> Cc:intel-gfx@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
>> iommu@lists.linux.dev
>> Subject: Re: Regression on drm-tip
>>
>> On 3/14/25 17:04, Borah, Chaitanya Kumar wrote:
>>>
>>>> -----Original Message-----
>>>> From: Baolu Lu<baolu.lu@linux.intel.com>
>>>> Sent: Thursday, March 13, 2025 7:53 PM
>>>> To: Borah, Chaitanya Kumar<chaitanya.kumar.borah@intel.com>
>>>> Cc:baolu.lu@linux.intel.com;intel-gfx@lists.freedesktop.org; intel-
>>>> xe@lists.freedesktop.org;iommu@lists.linux.dev
>>>> Subject: Re: Regression on drm-tip
>>>>
>>>> On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
>>>>> Hello Lu,
>>>>>
>>>>> Hope you are doing well. I am Chaitanya from the linux graphics team
>>>>> in
>>>> Intel.
>>>>> This mail is regarding a regression we are seeing in our CI runs[1]
>>>>> on drm-tip
>>>> repository.
>>>>> ````````````````````````````````````````````````````````````````````
>>>>> `` ``````````` <4>[ 2.856622] WARNING: possible circular locking
>>>>> dependency detected <4>[ 2.856631]
>>>>> 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G I <4>[
>>>>> 2.856642] ------------------------------------------------------
>>>>> <4>[ 2.856650] swapper/0/1 is trying to acquire lock:
>>>>> <4>[ 2.856657] ffffffff8360ecc8
>>>>> (iommu_probe_device_lock){+.+.}-{3:3}, at:
>>>>> iommu_probe_device+0x1d/0x70 <4>[ 2.856679]
>>>>> but task is already holding lock:
>>>>> <4>[ 2.856686] ffff888102ab6fa8
>>>>> (&device->physical_node_lock){+.+.}-{3:3}, at:
>>>>> intel_iommu_init+0xea1/0x1220
>>>>> ````````````````````````````````````````````````````````````````````
>>>>> ``
>>>>> ```````````
>>>>> Details log can be found in [2].
>>>>>
>>>>> After bisecting the tree, the following patch [3] seems to be the
>>>>> first "bad" commit
>>>>>
>>>>> ````````````````````````````````````````````````````````````````````
>>>>> ``
>>>>> ```````````````````````````````````
>>>>> commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
>>>>> Author: LuBaolumailto:baolu.lu@linux.intel.com
>>>>> Date: Fri Feb 28 18:27:26 2025 +0800
>>>>>
>>>>> iommu/vt-d: Fix suspicious RCU usage
>>>>>
>>>>> ````````````````````````````````````````````````````````````````````
>>>>> ``
>>>>> ```````````````````````````````````
>>>>>
>>>>> We also verified that if we revert the patch the issue is not seen.
>>>>>
>>>>> Could you please check why the patch causes this regression and
>>>>> provide a
>>>> fix if necessary?
>>>>
>>>> Can you please take a quick test to check if the following fix works?
>>>>
>>>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>>>> index
>>>> e540092d664d..06debeaec643 100644
>>>> --- a/drivers/iommu/intel/dmar.c
>>>> +++ b/drivers/iommu/intel/dmar.c
>>>> @@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int
>> cpu)
>>>> if (iommu->irq || iommu->node != cpu_to_node(cpu))
>>>> continue;
>>>>
>>>> + /*
>>>> + * Call dmar_alloc_hwirq() with dmar_global_lock held,
>>>> + * could cause possible lock race condition.
>>>> + */
>>>> + up_read(&dmar_global_lock);
>>>> ret = dmar_set_interrupt(iommu);
>>>> -
>>>> + down_read(&dmar_global_lock);
>>>> if (ret) {
>>>> pr_err("DRHD %Lx: failed to enable fault, interrupt, ret
>> %d\n",
>>>> (unsigned long
>>>> long)drhd->reg_base_addr, ret);
>>>>
>>>> Thanks,
>>>> baolu
>>> We still see the issue with this change.
>> I am attempting to reproduce this issue with my MTL machine. I pulled the
>> test branch from:
>>
>> https://anongit.freedesktop.org/git/drm-tip.git
>>
>> and built the test kernel image using the configuration file from:
>>
>> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_16217/kconfig.txt
>>
>> But I did not observe the lockdep splat mentioned above after booting.
>>
>> Is there anything I might have missed?
>>
> +Suresh, Jani, Lucas
>
> We are seeing this only on Skylake and Kaby Lake in our CI runs.
If so, will the change below make any difference?
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 85aa66ef4d61..ec2f385ae25b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3049,6 +3049,7 @@ static int __init probe_acpi_namespace_devices(void)
 			if (dev->bus != &acpi_bus_type)
 				continue;

+			up_read(&dmar_global_lock);
 			adev = to_acpi_device(dev);
 			mutex_lock(&adev->physical_node_lock);
 			list_for_each_entry(pn,
@@ -3058,6 +3059,7 @@ static int __init probe_acpi_namespace_devices(void)
 					break;
 			}
 			mutex_unlock(&adev->physical_node_lock);
+			down_read(&dmar_global_lock);

 			if (ret)
 				return ret;
Thanks,
baolu
* RE: Regression on drm-tip
2025-03-16 8:03 ` Baolu Lu
@ 2025-03-16 10:01 ` Borah, Chaitanya Kumar
2025-03-17 4:04 ` Baolu Lu
0 siblings, 1 reply; 22+ messages in thread
From: Borah, Chaitanya Kumar @ 2025-03-16 10:01 UTC (permalink / raw)
To: Baolu Lu
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
iommu@lists.linux.dev, Kurmi, Suresh Kumar, Saarinen, Jani,
De Marchi, Lucas
> -----Original Message-----
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Sunday, March 16, 2025 1:33 PM
> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
> Cc: intel-gfx@lists.freedesktop.org; intel-xe@lists.freedesktop.org;
> iommu@lists.linux.dev; Kurmi, Suresh Kumar
> <suresh.kumar.kurmi@intel.com>; Saarinen, Jani <jani.saarinen@intel.com>;
> De Marchi, Lucas <lucas.demarchi@intel.com>
> Subject: Re: Regression on drm-tip
>
> On 3/16/25 15:27, Borah, Chaitanya Kumar wrote:
> >
> >> -----Original Message-----
> >> From: Baolu Lu<baolu.lu@linux.intel.com>
> >> Sent: Sunday, March 16, 2025 8:04 AM
> >> To: Borah, Chaitanya Kumar<chaitanya.kumar.borah@intel.com>
> >> Cc:intel-gfx@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
> >> iommu@lists.linux.dev
> >> Subject: Re: Regression on drm-tip
> >>
> >> On 3/14/25 17:04, Borah, Chaitanya Kumar wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Baolu Lu<baolu.lu@linux.intel.com>
> >>>> Sent: Thursday, March 13, 2025 7:53 PM
> >>>> To: Borah, Chaitanya Kumar<chaitanya.kumar.borah@intel.com>
> >>>> Cc:baolu.lu@linux.intel.com;intel-gfx@lists.freedesktop.org; intel-
> >>>> xe@lists.freedesktop.org;iommu@lists.linux.dev
> >>>> Subject: Re: Regression on drm-tip
> >>>>
> >>>> On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
> >>>>> Hello Lu,
> >>>>>
> >>>>> Hope you are doing well. I am Chaitanya from the linux graphics
> >>>>> team in
> >>>> Intel.
> >>>>> This mail is regarding a regression we are seeing in our CI
> >>>>> runs[1] on drm-tip
> >>>> repository.
> >>>>> ``````````````````````````````````````````````````````````````````
> >>>>> <4>[ 2.856622] WARNING: possible circular locking dependency detected
> >>>>> <4>[ 2.856631] 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G I
> >>>>> <4>[ 2.856642] ------------------------------------------------------
> >>>>> <4>[ 2.856650] swapper/0/1 is trying to acquire lock:
> >>>>> <4>[ 2.856657] ffffffff8360ecc8 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70
> >>>>> <4>[ 2.856679]
> >>>>> but task is already holding lock:
> >>>>> <4>[ 2.856686] ffff888102ab6fa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xea1/0x1220
> >>>>> ``````````````````````````````````````````````````````````````````
> >>>>> Details log can be found in [2].
> >>>>>
> >>>>> After bisecting the tree, the following patch [3] seems to be the
> >>>>> first "bad" commit
> >>>>>
> >>>>> ``````````````````````````````````````````````````````````````````
> >>>>> ``
> >>>>> ``
> >>>>> ```````````````````````````````````
> >>>>> commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
> >>>>> Author: Lu Baolu <baolu.lu@linux.intel.com>
> >>>>> Date: Fri Feb 28 18:27:26 2025 +0800
> >>>>>
> >>>>> iommu/vt-d: Fix suspicious RCU usage
> >>>>>
> >>>>> ``````````````````````````````````````````````````````````````````
> >>>>> ``
> >>>>> ``
> >>>>> ```````````````````````````````````
> >>>>>
> >>>>> We also verified that if we revert the patch the issue is not seen.
> >>>>>
> >>>>> Could you please check why the patch causes this regression and
> >>>>> provide a
> >>>> fix if necessary?
> >>>>
> >>>> Can you please take a quick test to check if the following fix works?
> >>>>
> >>>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> >>>> index e540092d664d..06debeaec643 100644
> >>>> --- a/drivers/iommu/intel/dmar.c
> >>>> +++ b/drivers/iommu/intel/dmar.c
> >>>> @@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int cpu)
> >>>> if (iommu->irq || iommu->node != cpu_to_node(cpu))
> >>>> continue;
> >>>>
> >>>> + /*
> >>>> + * Call dmar_alloc_hwirq() with dmar_global_lock held,
> >>>> + * could cause possible lock race condition.
> >>>> + */
> >>>> + up_read(&dmar_global_lock);
> >>>> ret = dmar_set_interrupt(iommu);
> >>>> -
> >>>> + down_read(&dmar_global_lock);
> >>>> if (ret) {
> >>>> pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
> >>>> (unsigned long long)drhd->reg_base_addr, ret);
> >>>>
> >>>> Thanks,
> >>>> baolu
> >>> We still see the issue with this change.
> >> I am attempting to reproduce this issue with my MTL machine. I pulled
> >> the test branch from:
> >>
> >> https://anongit.freedesktop.org/git/drm-tip.git
> >>
> >> and built the test kernel image using the configuration file from:
> >>
> >> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_16217/kconfig.txt
> >>
> >> But I did not observe the lockdep splat mentioned above after booting.
> >>
> >> Is there anything I might have missed?
> >>
> > +Suresh, Jani, Lucas
> >
> > We are seeing this only on Skylake and Kaby Lake in our CI runs.
>
> If so, will below change make any difference?
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 85aa66ef4d61..ec2f385ae25b 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -3049,6 +3049,7 @@ static int __init probe_acpi_namespace_devices(void)
> if (dev->bus != &acpi_bus_type)
> continue;
>
> + up_read(&dmar_global_lock);
> adev = to_acpi_device(dev);
> mutex_lock(&adev->physical_node_lock);
> list_for_each_entry(pn,
> @@ -3058,6 +3059,7 @@ static int __init probe_acpi_namespace_devices(void)
> break;
> }
> mutex_unlock(&adev->physical_node_lock);
> + down_read(&dmar_global_lock);
>
> if (ret)
> return ret;
>
Thank you for the change. This seems to be working. Can we expect a fix patch soon?
Regards
Chaitanya
> Thanks,
> baolu
* Re: Regression on drm-tip
2025-03-16 10:01 ` Borah, Chaitanya Kumar
@ 2025-03-17 4:04 ` Baolu Lu
2025-03-22 20:59 ` Lucas De Marchi
0 siblings, 1 reply; 22+ messages in thread
From: Baolu Lu @ 2025-03-17 4:04 UTC (permalink / raw)
To: Borah, Chaitanya Kumar
Cc: intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org,
iommu@lists.linux.dev, Kurmi, Suresh Kumar, Saarinen, Jani,
De Marchi, Lucas
On 3/16/25 18:01, Borah, Chaitanya Kumar wrote:
>
>> -----Original Message-----
>> From: Baolu Lu<baolu.lu@linux.intel.com>
>> Sent: Sunday, March 16, 2025 1:33 PM
>> To: Borah, Chaitanya Kumar<chaitanya.kumar.borah@intel.com>
>> Cc:intel-gfx@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
>> iommu@lists.linux.dev; Kurmi, Suresh Kumar
>> <suresh.kumar.kurmi@intel.com>; Saarinen, Jani<jani.saarinen@intel.com>;
>> De Marchi, Lucas<lucas.demarchi@intel.com>
>> Subject: Re: Regression on drm-tip
>>
>> On 3/16/25 15:27, Borah, Chaitanya Kumar wrote:
>>>> -----Original Message-----
>>>> From: Baolu Lu<baolu.lu@linux.intel.com>
>>>> Sent: Sunday, March 16, 2025 8:04 AM
>>>> To: Borah, Chaitanya Kumar<chaitanya.kumar.borah@intel.com>
>>>> Cc:intel-gfx@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
>>>> iommu@lists.linux.dev
>>>> Subject: Re: Regression on drm-tip
>>>>
>>>> On 3/14/25 17:04, Borah, Chaitanya Kumar wrote:
>>>>>> -----Original Message-----
>>>>>> From: Baolu Lu<baolu.lu@linux.intel.com>
>>>>>> Sent: Thursday, March 13, 2025 7:53 PM
>>>>>> To: Borah, Chaitanya Kumar<chaitanya.kumar.borah@intel.com>
>>>>>> Cc:baolu.lu@linux.intel.com;intel-gfx@lists.freedesktop.org; intel-
>>>>>> xe@lists.freedesktop.org;iommu@lists.linux.dev
>>>>>> Subject: Re: Regression on drm-tip
>>>>>>
>>>>>> On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
>>>>>>> Hello Lu,
>>>>>>>
>>>>>>> Hope you are doing well. I am Chaitanya from the linux graphics
>>>>>>> team in
>>>>>> Intel.
>>>>>>> This mail is regarding a regression we are seeing in our CI
>>>>>>> runs[1] on drm-tip
>>>>>> repository.
>>>>>>> ``````````````````````````````````````````````````````````````````
>>>>>>> <4>[ 2.856622] WARNING: possible circular locking dependency detected
>>>>>>> <4>[ 2.856631] 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G I
>>>>>>> <4>[ 2.856642] ------------------------------------------------------
>>>>>>> <4>[ 2.856650] swapper/0/1 is trying to acquire lock:
>>>>>>> <4>[ 2.856657] ffffffff8360ecc8 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70
>>>>>>> <4>[ 2.856679]
>>>>>>> but task is already holding lock:
>>>>>>> <4>[ 2.856686] ffff888102ab6fa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xea1/0x1220
>>>>>>> ``````````````````````````````````````````````````````````````````
>>>>>>> Details log can be found in [2].
>>>>>>>
>>>>>>> After bisecting the tree, the following patch [3] seems to be the
>>>>>>> first "bad" commit
>>>>>>>
>>>>>>> ``````````````````````````````````````````````````````````````````
>>>>>>> ``
>>>>>>> ``
>>>>>>> ```````````````````````````````````
>>>>>>> commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
>>>>>>> Author: Lu Baolu <baolu.lu@linux.intel.com>
>>>>>>> Date: Fri Feb 28 18:27:26 2025 +0800
>>>>>>>
>>>>>>> iommu/vt-d: Fix suspicious RCU usage
>>>>>>>
>>>>>>> ``````````````````````````````````````````````````````````````````
>>>>>>> ``
>>>>>>> ``
>>>>>>> ```````````````````````````````````
>>>>>>>
>>>>>>> We also verified that if we revert the patch the issue is not seen.
>>>>>>>
>>>>>>> Could you please check why the patch causes this regression and
>>>>>>> provide a
>>>>>> fix if necessary?
>>>>>>
>>>>>> Can you please take a quick test to check if the following fix works?
>>>>>>
>>>>>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>>>>>> index e540092d664d..06debeaec643 100644
>>>>>> --- a/drivers/iommu/intel/dmar.c
>>>>>> +++ b/drivers/iommu/intel/dmar.c
>>>>>> @@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int cpu)
>>>>>> if (iommu->irq || iommu->node != cpu_to_node(cpu))
>>>>>> continue;
>>>>>>
>>>>>> + /*
>>>>>> + * Call dmar_alloc_hwirq() with dmar_global_lock held,
>>>>>> + * could cause possible lock race condition.
>>>>>> + */
>>>>>> + up_read(&dmar_global_lock);
>>>>>> ret = dmar_set_interrupt(iommu);
>>>>>> -
>>>>>> + down_read(&dmar_global_lock);
>>>>>> if (ret) {
>>>>>> pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
>>>>>> (unsigned long long)drhd->reg_base_addr, ret);
>>>>>>
>>>>>> Thanks,
>>>>>> baolu
>>>>> We still see the issue with this change.
>>>> I am attempting to reproduce this issue with my MTL machine. I pulled
>>>> the test branch from:
>>>>
>>>> https://anongit.freedesktop.org/git/drm-tip.git
>>>>
>>>> and built the test kernel image using the configuration file from:
>>>>
>>>> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_16217/kconfig.txt
>>>>
>>>> But I did not observe the lockdep splat mentioned above after booting.
>>>>
>>>> Is there anything I might have missed?
>>>>
>>> +Suresh, Jani, Lucas
>>>
>>> We are seeing this only on Skylake and Kaby Lake in our CI runs.
>> If so, will below change make any difference?
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 85aa66ef4d61..ec2f385ae25b 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -3049,6 +3049,7 @@ static int __init probe_acpi_namespace_devices(void)
>> if (dev->bus != &acpi_bus_type)
>> continue;
>>
>> + up_read(&dmar_global_lock);
>> adev = to_acpi_device(dev);
>> mutex_lock(&adev->physical_node_lock);
>> list_for_each_entry(pn,
>> @@ -3058,6 +3059,7 @@ static int __init probe_acpi_namespace_devices(void)
>> break;
>> }
>> mutex_unlock(&adev->physical_node_lock);
>> + down_read(&dmar_global_lock);
>>
>> if (ret)
>> return ret;
>>
> Thank you for the change. This seems to be working. Can we expect a fix patch soon?
Sure. I have posted a fix patch here,
https://lore.kernel.org/linux-iommu/20250317035714.1041549-1-baolu.lu@linux.intel.com/
Thanks,
baolu
* Re: Regression on drm-tip
2025-03-17 4:04 ` Baolu Lu
@ 2025-03-22 20:59 ` Lucas De Marchi
0 siblings, 0 replies; 22+ messages in thread
From: Lucas De Marchi @ 2025-03-22 20:59 UTC (permalink / raw)
To: Baolu Lu
Cc: Borah, Chaitanya Kumar, intel-gfx@lists.freedesktop.org,
intel-xe@lists.freedesktop.org, iommu@lists.linux.dev,
Kurmi, Suresh Kumar, Saarinen, Jani
On Mon, Mar 17, 2025 at 12:04:40PM +0800, Baolu Lu wrote:
>On 3/16/25 18:01, Borah, Chaitanya Kumar wrote:
>>
>>>-----Original Message-----
>>>From: Baolu Lu<baolu.lu@linux.intel.com>
>>>Sent: Sunday, March 16, 2025 1:33 PM
>>>To: Borah, Chaitanya Kumar<chaitanya.kumar.borah@intel.com>
>>>Cc:intel-gfx@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
>>>iommu@lists.linux.dev; Kurmi, Suresh Kumar
>>><suresh.kumar.kurmi@intel.com>; Saarinen, Jani<jani.saarinen@intel.com>;
>>>De Marchi, Lucas<lucas.demarchi@intel.com>
>>>Subject: Re: Regression on drm-tip
>>>
>>>On 3/16/25 15:27, Borah, Chaitanya Kumar wrote:
>>>>>-----Original Message-----
>>>>>From: Baolu Lu<baolu.lu@linux.intel.com>
>>>>>Sent: Sunday, March 16, 2025 8:04 AM
>>>>>To: Borah, Chaitanya Kumar<chaitanya.kumar.borah@intel.com>
>>>>>Cc:intel-gfx@lists.freedesktop.org;intel-xe@lists.freedesktop.org;
>>>>>iommu@lists.linux.dev
>>>>>Subject: Re: Regression on drm-tip
>>>>>
>>>>>On 3/14/25 17:04, Borah, Chaitanya Kumar wrote:
>>>>>>>-----Original Message-----
>>>>>>>From: Baolu Lu<baolu.lu@linux.intel.com>
>>>>>>>Sent: Thursday, March 13, 2025 7:53 PM
>>>>>>>To: Borah, Chaitanya Kumar<chaitanya.kumar.borah@intel.com>
>>>>>>>Cc:baolu.lu@linux.intel.com;intel-gfx@lists.freedesktop.org; intel-
>>>>>>>xe@lists.freedesktop.org;iommu@lists.linux.dev
>>>>>>>Subject: Re: Regression on drm-tip
>>>>>>>
>>>>>>>On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
>>>>>>>>Hello Lu,
>>>>>>>>
>>>>>>>>Hope you are doing well. I am Chaitanya from the linux graphics
>>>>>>>>team in
>>>>>>>Intel.
>>>>>>>>This mail is regarding a regression we are seeing in our CI
>>>>>>>>runs[1] on drm-tip
>>>>>>>repository.
>>>>>>>>``````````````````````````````````````````````````````````````````
>>>>>>>><4>[ 2.856622] WARNING: possible circular locking dependency detected
>>>>>>>><4>[ 2.856631] 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G I
>>>>>>>><4>[ 2.856642] ------------------------------------------------------
>>>>>>>><4>[ 2.856650] swapper/0/1 is trying to acquire lock:
>>>>>>>><4>[ 2.856657] ffffffff8360ecc8 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70
>>>>>>>><4>[ 2.856679]
>>>>>>>> but task is already holding lock:
>>>>>>>><4>[ 2.856686] ffff888102ab6fa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xea1/0x1220
>>>>>>>>``````````````````````````````````````````````````````````````````
>>>>>>>>Details log can be found in [2].
>>>>>>>>
>>>>>>>>After bisecting the tree, the following patch [3] seems to be the
>>>>>>>>first "bad" commit
>>>>>>>>
>>>>>>>>``````````````````````````````````````````````````````````````````
>>>>>>>>``
>>>>>>>>``
>>>>>>>>```````````````````````````````````
>>>>>>>>commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
>>>>>>>>Author: Lu Baolu <baolu.lu@linux.intel.com>
>>>>>>>>Date: Fri Feb 28 18:27:26 2025 +0800
>>>>>>>>
>>>>>>>> iommu/vt-d: Fix suspicious RCU usage
>>>>>>>>
>>>>>>>>``````````````````````````````````````````````````````````````````
>>>>>>>>``
>>>>>>>>``
>>>>>>>>```````````````````````````````````
>>>>>>>>
>>>>>>>>We also verified that if we revert the patch the issue is not seen.
>>>>>>>>
>>>>>>>>Could you please check why the patch causes this regression and
>>>>>>>>provide a
>>>>>>>fix if necessary?
>>>>>>>
>>>>>>>Can you please take a quick test to check if the following fix works?
>>>>>>>
>>>>>>>diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>>>>>>>index e540092d664d..06debeaec643 100644
>>>>>>>--- a/drivers/iommu/intel/dmar.c
>>>>>>>+++ b/drivers/iommu/intel/dmar.c
>>>>>>>@@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int cpu)
>>>>>>> if (iommu->irq || iommu->node != cpu_to_node(cpu))
>>>>>>> continue;
>>>>>>>
>>>>>>>+ /*
>>>>>>>+ * Call dmar_alloc_hwirq() with dmar_global_lock held,
>>>>>>>+ * could cause possible lock race condition.
>>>>>>>+ */
>>>>>>>+ up_read(&dmar_global_lock);
>>>>>>> ret = dmar_set_interrupt(iommu);
>>>>>>>-
>>>>>>>+ down_read(&dmar_global_lock);
>>>>>>> if (ret) {
>>>>>>> pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
>>>>>>> (unsigned long long)drhd->reg_base_addr, ret);
>>>>>>>
>>>>>>>Thanks,
>>>>>>>baolu
>>>>>>We still see the issue with this change.
>>>>>I am attempting to reproduce this issue with my MTL machine. I pulled
>>>>>the test branch from:
>>>>>
>>>>>https://anongit.freedesktop.org/git/drm-tip.git
>>>>>
>>>>>and built the test kernel image using the configuration file from:
>>>>>
>>>>>https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_16217/kconfig.txt
>>>>>
>>>>>But I did not observe the lockdep splat mentioned above after booting.
>>>>>
>>>>>Is there anything I might have missed?
>>>>>
>>>>+Suresh, Jani, Lucas
>>>>
>>>>We are seeing this only on Skylake and Kaby Lake in our CI runs.
>>>If so, will below change make any difference?
>>>
>>>diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>>index 85aa66ef4d61..ec2f385ae25b 100644
>>>--- a/drivers/iommu/intel/iommu.c
>>>+++ b/drivers/iommu/intel/iommu.c
>>>@@ -3049,6 +3049,7 @@ static int __init probe_acpi_namespace_devices(void)
>>> if (dev->bus != &acpi_bus_type)
>>> continue;
>>>
>>>+ up_read(&dmar_global_lock);
>>> adev = to_acpi_device(dev);
>>> mutex_lock(&adev->physical_node_lock);
>>> list_for_each_entry(pn,
>>>@@ -3058,6 +3059,7 @@ static int __init probe_acpi_namespace_devices(void)
>>> break;
>>> }
>>> mutex_unlock(&adev->physical_node_lock);
>>>+ down_read(&dmar_global_lock);
>>>
>>> if (ret)
>>> return ret;
>>>
>>Thank you for the change. This seems to be working. Can we expect a fix patch soon?
>
>Sure. I have posted a fix patch here,
>
>https://lore.kernel.org/linux-iommu/20250317035714.1041549-1-baolu.lu@linux.intel.com/
Thanks. FWIW I added this patch to our test branch in CI and the issue
is indeed not reproducing anymore.
Lucas De Marchi
Thread overview: 22+ messages
2025-11-27 6:25 REGRESSION on drm-tip Borah, Chaitanya Kumar
2025-11-27 16:01 ` Saarinen, Jani
2025-11-27 16:06 ` Saarinen, Jani
2025-11-27 23:04 ` Ville Syrjälä
2025-11-28 7:46 ` Borah, Chaitanya Kumar
2025-12-05 10:14 ` Christian Brauner
2025-12-01 16:13 ` Saarinen, Jani
2025-12-03 13:34 ` BISECTED REGRESSION on v6.18 (was: REGRESSION on drm-tip) Jani Nikula
2025-12-03 13:36 ` Jani Nikula
2025-12-03 13:40 ` Rafael J. Wysocki
2025-12-05 10:14 ` REGRESSION on drm-tip Christian Brauner
-- strict thread matches above, loose matches on Subject: below --
2025-04-28 6:02 Regression " Borah, Chaitanya Kumar
2025-03-13 8:51 Borah, Chaitanya Kumar
2025-03-13 9:30 ` Baolu Lu
2025-03-13 14:23 ` Baolu Lu
2025-03-14 9:04 ` Borah, Chaitanya Kumar
2025-03-16 2:33 ` Baolu Lu
2025-03-16 7:27 ` Borah, Chaitanya Kumar
2025-03-16 8:03 ` Baolu Lu
2025-03-16 10:01 ` Borah, Chaitanya Kumar
2025-03-17 4:04 ` Baolu Lu
2025-03-22 20:59 ` Lucas De Marchi