* [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling
@ 2026-06-17 8:21 Tejas Upadhyay
2026-06-17 11:25 ` Matthew Auld
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Tejas Upadhyay @ 2026-06-17 8:21 UTC (permalink / raw)
To: intel-xe; +Cc: Tejas Upadhyay, Jani Nikula, Badal Nilawar, Raag Jadav
A circular locking dependency is reported between xe_pm_block_map and
dev->clientlist_mutex:
&dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map --> &dev->clientlist_mutex
The dependency chain is:
1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then
xe_display_pm_suspend() which calls drm_client_dev_suspend()
acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex)
2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() ->
prelim_xe_eudebug_file_close() acquires discovery_lock.
(clientlist_mutex -> discovery_lock)
3. xe_vm_close_and_put() acquires vm->lock under discovery_lock.
(discovery_lock -> vm->lock)
4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend()
which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map)
Fix this by moving drm_client_dev_suspend() before
xe_pm_block_begin_signalling() in xe_pm_suspend(), and
drm_client_dev_resume() after xe_pm_block_end_signalling() in
xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex
edge in the dependency chain.
The drm_client_dev_suspend/resume calls are safe to move because they
only blank/unblank the fbdev console and don't depend on display power
state or any operations within the block signalling scope.
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Raag Jadav <raag.jadav@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
drivers/gpu/drm/xe/display/xe_display.c | 3 ---
drivers/gpu/drm/xe/xe_pm.c | 5 +++++
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
index 810d93fefcbc..ec65dcbc3fdc 100644
--- a/drivers/gpu/drm/xe/display/xe_display.c
+++ b/drivers/gpu/drm/xe/display/xe_display.c
@@ -304,7 +304,6 @@ void xe_display_pm_suspend(struct xe_device *xe)
* properly.
*/
intel_display_power_disable(display);
- drm_client_dev_suspend(&xe->drm);
if (intel_display_device_present(display)) {
drm_kms_helper_poll_disable(&xe->drm);
@@ -461,8 +460,6 @@ void xe_display_pm_resume(struct xe_device *xe)
intel_opregion_resume(display);
- drm_client_dev_resume(&xe->drm);
-
intel_display_power_enable(display);
}
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 99562f691080..d00c620d0926 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -10,6 +10,7 @@
#include <linux/suspend.h>
#include <linux/dmi.h>
+#include <drm/drm_client_event.h>
#include <drm/drm_managed.h>
#include <drm/ttm/ttm_placement.h>
@@ -177,6 +178,7 @@ int xe_pm_suspend(struct xe_device *xe)
int err;
drm_dbg(&xe->drm, "Suspending device\n");
+ drm_client_dev_suspend(&xe->drm);
xe_pm_block_begin_signalling();
trace_xe_pm_suspend(xe, __builtin_return_address(0));
@@ -219,6 +221,7 @@ int xe_pm_suspend(struct xe_device *xe)
err:
drm_dbg(&xe->drm, "Device suspend failed %d\n", err);
xe_pm_block_end_signalling();
+ drm_client_dev_resume(&xe->drm);
return err;
}
@@ -292,10 +295,12 @@ int xe_pm_resume(struct xe_device *xe)
drm_dbg(&xe->drm, "Device resumed\n");
xe_pm_block_end_signalling();
+ drm_client_dev_resume(&xe->drm);
return 0;
err:
drm_dbg(&xe->drm, "Device resume failed %d\n", err);
xe_pm_block_end_signalling();
+ drm_client_dev_resume(&xe->drm);
return err;
}
--
2.52.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling 2026-06-17 8:21 [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling Tejas Upadhyay @ 2026-06-17 11:25 ` Matthew Auld 2026-06-17 12:11 ` Upadhyay, Tejas 2026-06-17 11:30 ` Jani Nikula 2026-06-17 11:56 ` Raag Jadav 2 siblings, 1 reply; 7+ messages in thread From: Matthew Auld @ 2026-06-17 11:25 UTC (permalink / raw) To: Tejas Upadhyay, intel-xe; +Cc: Jani Nikula, Badal Nilawar, Raag Jadav On 17/06/2026 09:21, Tejas Upadhyay wrote: > A circular locking dependency is reported between xe_pm_block_map and > dev->clientlist_mutex: > > &dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map --> &dev->clientlist_mutex > > The dependency chain is: > 1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then > xe_display_pm_suspend() which calls drm_client_dev_suspend() > acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex) > > 2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() -> > prelim_xe_eudebug_file_close() acquires discovery_lock. > (clientlist_mutex -> discovery_lock) > > 3. xe_vm_close_and_put() acquires vm->lock under discovery_lock. > (discovery_lock -> vm->lock) > > 4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend() > which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map) > > Fix this by moving drm_client_dev_suspend() before > xe_pm_block_begin_signalling() in xe_pm_suspend(), and > drm_client_dev_resume() after xe_pm_block_end_signalling() in > xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex > edge in the dependency chain. Is this not just hiding the issue from lockdep though? The pm_begin/end are not real locks, they are just there as annotations to teach lockdep about the hidden pm dependencies. For example, any locks you grab in rpm suspend()/resume() you are not allowed to hold when waking the device up, otherwise you can potentially deadlock. So here moving stuff outside the annotations probably doesn't really fix anything, it just hides it from lockdep so it doesn't complain. The above looks like it's complaining that clientlist_mutex is held and indirectly ends up waiting on something pm related, but the pm side itself can also grab that lock? That seems like a legit complaint and should be fixed in the upper layers? > > The drm_client_dev_suspend/resume calls are safe to move because they > only blank/unblank the fbdev console and don't depend on display power > state or any operations within the block signalling scope. > > Cc: Jani Nikula <jani.nikula@linux.intel.com> > Cc: Badal Nilawar <badal.nilawar@intel.com> > Cc: Raag Jadav <raag.jadav@intel.com> > Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> > --- > drivers/gpu/drm/xe/display/xe_display.c | 3 --- > drivers/gpu/drm/xe/xe_pm.c | 5 +++++ > 2 files changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c > index 810d93fefcbc..ec65dcbc3fdc 100644 > --- a/drivers/gpu/drm/xe/display/xe_display.c > +++ b/drivers/gpu/drm/xe/display/xe_display.c > @@ -304,7 +304,6 @@ void xe_display_pm_suspend(struct xe_device *xe) > * properly. > */ > intel_display_power_disable(display); > - drm_client_dev_suspend(&xe->drm); > > if (intel_display_device_present(display)) { > drm_kms_helper_poll_disable(&xe->drm); > @@ -461,8 +460,6 @@ void xe_display_pm_resume(struct xe_device *xe) > > intel_opregion_resume(display); > > - drm_client_dev_resume(&xe->drm); > - > intel_display_power_enable(display); > } > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c > index 99562f691080..d00c620d0926 100644 > --- a/drivers/gpu/drm/xe/xe_pm.c > +++ b/drivers/gpu/drm/xe/xe_pm.c > @@ -10,6 +10,7 @@ > #include <linux/suspend.h> > #include <linux/dmi.h> > > +#include <drm/drm_client_event.h> > #include <drm/drm_managed.h> > #include <drm/ttm/ttm_placement.h> > > @@ -177,6 +178,7 @@ int xe_pm_suspend(struct xe_device *xe) > int err; > > drm_dbg(&xe->drm, "Suspending device\n"); > + drm_client_dev_suspend(&xe->drm); > xe_pm_block_begin_signalling(); > trace_xe_pm_suspend(xe, __builtin_return_address(0)); > > @@ -219,6 +221,7 @@ int xe_pm_suspend(struct xe_device *xe) > err: > drm_dbg(&xe->drm, "Device suspend failed %d\n", err); > xe_pm_block_end_signalling(); > + drm_client_dev_resume(&xe->drm); > return err; > } > > @@ -292,10 +295,12 @@ int xe_pm_resume(struct xe_device *xe) > > drm_dbg(&xe->drm, "Device resumed\n"); > xe_pm_block_end_signalling(); > + drm_client_dev_resume(&xe->drm); > return 0; > err: > drm_dbg(&xe->drm, "Device resume failed %d\n", err); > xe_pm_block_end_signalling(); > + drm_client_dev_resume(&xe->drm); > return err; > } > ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling 2026-06-17 11:25 ` Matthew Auld @ 2026-06-17 12:11 ` Upadhyay, Tejas 2026-06-17 14:13 ` Matthew Auld 0 siblings, 1 reply; 7+ messages in thread From: Upadhyay, Tejas @ 2026-06-17 12:11 UTC (permalink / raw) To: Auld, Matthew, intel-xe@lists.freedesktop.org Cc: Jani Nikula, Nilawar, Badal, Jadav, Raag > -----Original Message----- > From: Auld, Matthew <matthew.auld@intel.com> > Sent: 17 June 2026 16:55 > To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; intel- > xe@lists.freedesktop.org > Cc: Jani Nikula <jani.nikula@linux.intel.com>; Nilawar, Badal > <badal.nilawar@intel.com>; Jadav, Raag <raag.jadav@intel.com> > Subject: Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside > pm block signalling > > On 17/06/2026 09:21, Tejas Upadhyay wrote: > > A circular locking dependency is reported between xe_pm_block_map and > > dev->clientlist_mutex: > > > > &dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map --> > > &dev->clientlist_mutex > > > > The dependency chain is: > > 1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then > > xe_display_pm_suspend() which calls drm_client_dev_suspend() > > acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex) > > > > 2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() -> > > prelim_xe_eudebug_file_close() acquires discovery_lock. > > (clientlist_mutex -> discovery_lock) > > > > 3. xe_vm_close_and_put() acquires vm->lock under discovery_lock. > > (discovery_lock -> vm->lock) > > > > 4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend() > > which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map) > > > > Fix this by moving drm_client_dev_suspend() before > > xe_pm_block_begin_signalling() in xe_pm_suspend(), and > > drm_client_dev_resume() after xe_pm_block_end_signalling() in > > xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex > > edge in the dependency chain. > > Is this not just hiding the issue from lockdep though? The pm_begin/end are > not real locks, they are just there as annotations to teach lockdep about the > hidden pm dependencies. For example, any locks you grab in rpm > suspend()/resume() you are not allowed to hold when waking the device up, > otherwise you can potentially deadlock. So here moving stuff outside the > annotations probably doesn't really fix anything, it just hides it from lockdep > so it doesn't complain. > > The above looks like it's complaining that clientlist_mutex is held and indirectly > ends up waiting on something pm related, but the pm side itself can also grab > that lock? That seems like a legit complaint and should be fixed in the upper > layers? Then it touches the eudebug subsystem as it most likely come from eudebug teardown, since drm_client_dev_suspend/resume doesn't functionally need to be inside the pm_block window (it just blanks/unblanks fbdev console), moving it outside is the minimal fix imo. Tejas > > > > > The drm_client_dev_suspend/resume calls are safe to move because they > > only blank/unblank the fbdev console and don't depend on display power > > state or any operations within the block signalling scope. > > > > Cc: Jani Nikula <jani.nikula@linux.intel.com> > > Cc: Badal Nilawar <badal.nilawar@intel.com> > > Cc: Raag Jadav <raag.jadav@intel.com> > > Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> > > --- > > drivers/gpu/drm/xe/display/xe_display.c | 3 --- > > drivers/gpu/drm/xe/xe_pm.c | 5 +++++ > > 2 files changed, 5 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/gpu/drm/xe/display/xe_display.c > b/drivers/gpu/drm/xe/display/xe_display.c > > index 810d93fefcbc..ec65dcbc3fdc 100644 > > --- a/drivers/gpu/drm/xe/display/xe_display.c > > +++ b/drivers/gpu/drm/xe/display/xe_display.c > > @@ -304,7 +304,6 @@ void xe_display_pm_suspend(struct xe_device *xe) > > * properly. > > */ > > intel_display_power_disable(display); > > - drm_client_dev_suspend(&xe->drm); > > > > if (intel_display_device_present(display)) { > > drm_kms_helper_poll_disable(&xe->drm); > > @@ -461,8 +460,6 @@ void xe_display_pm_resume(struct xe_device *xe) > > > > intel_opregion_resume(display); > > > > - drm_client_dev_resume(&xe->drm); > > - > > intel_display_power_enable(display); > > } > > > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c > > index 99562f691080..d00c620d0926 100644 > > --- a/drivers/gpu/drm/xe/xe_pm.c > > +++ b/drivers/gpu/drm/xe/xe_pm.c > > @@ -10,6 +10,7 @@ > > #include <linux/suspend.h> > > #include <linux/dmi.h> > > > > +#include <drm/drm_client_event.h> > > #include <drm/drm_managed.h> > > #include <drm/ttm/ttm_placement.h> > > > > @@ -177,6 +178,7 @@ int xe_pm_suspend(struct xe_device *xe) > > int err; > > > > drm_dbg(&xe->drm, "Suspending device\n"); > > + drm_client_dev_suspend(&xe->drm); > > xe_pm_block_begin_signalling(); > > trace_xe_pm_suspend(xe, __builtin_return_address(0)); > > > > @@ -219,6 +221,7 @@ int xe_pm_suspend(struct xe_device *xe) > > err: > > drm_dbg(&xe->drm, "Device suspend failed %d\n", err); > > xe_pm_block_end_signalling(); > > + drm_client_dev_resume(&xe->drm); > > return err; > > } > > > > @@ -292,10 +295,12 @@ int xe_pm_resume(struct xe_device *xe) > > > > drm_dbg(&xe->drm, "Device resumed\n"); > > xe_pm_block_end_signalling(); > > + drm_client_dev_resume(&xe->drm); > > return 0; > > err: > > drm_dbg(&xe->drm, "Device resume failed %d\n", err); > > xe_pm_block_end_signalling(); > > + drm_client_dev_resume(&xe->drm); > > return err; > > } > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling 2026-06-17 12:11 ` Upadhyay, Tejas @ 2026-06-17 14:13 ` Matthew Auld 0 siblings, 0 replies; 7+ messages in thread From: Matthew Auld @ 2026-06-17 14:13 UTC (permalink / raw) To: Upadhyay, Tejas, intel-xe@lists.freedesktop.org Cc: Jani Nikula, Nilawar, Badal, Jadav, Raag On 17/06/2026 13:11, Upadhyay, Tejas wrote: > > >> -----Original Message----- >> From: Auld, Matthew <matthew.auld@intel.com> >> Sent: 17 June 2026 16:55 >> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; intel- >> xe@lists.freedesktop.org >> Cc: Jani Nikula <jani.nikula@linux.intel.com>; Nilawar, Badal >> <badal.nilawar@intel.com>; Jadav, Raag <raag.jadav@intel.com> >> Subject: Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside >> pm block signalling >> >> On 17/06/2026 09:21, Tejas Upadhyay wrote: >>> A circular locking dependency is reported between xe_pm_block_map and >>> dev->clientlist_mutex: >>> >>> &dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map --> >>> &dev->clientlist_mutex >>> >>> The dependency chain is: >>> 1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then >>> xe_display_pm_suspend() which calls drm_client_dev_suspend() >>> acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex) >>> >>> 2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() -> >>> prelim_xe_eudebug_file_close() acquires discovery_lock. >>> (clientlist_mutex -> discovery_lock) >>> >>> 3. xe_vm_close_and_put() acquires vm->lock under discovery_lock. >>> (discovery_lock -> vm->lock) >>> >>> 4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend() >>> which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map) >>> >>> Fix this by moving drm_client_dev_suspend() before >>> xe_pm_block_begin_signalling() in xe_pm_suspend(), and >>> drm_client_dev_resume() after xe_pm_block_end_signalling() in >>> xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex >>> edge in the dependency chain. >> >> Is this not just hiding the issue from lockdep though? The pm_begin/end are >> not real locks, they are just there as annotations to teach lockdep about the >> hidden pm dependencies. For example, any locks you grab in rpm >> suspend()/resume() you are not allowed to hold when waking the device up, >> otherwise you can potentially deadlock. So here moving stuff outside the >> annotations probably doesn't really fix anything, it just hides it from lockdep >> so it doesn't complain. >> >> The above looks like it's complaining that clientlist_mutex is held and indirectly >> ends up waiting on something pm related, but the pm side itself can also grab >> that lock? That seems like a legit complaint and should be fixed in the upper >> layers? > > Then it touches the eudebug subsystem as it most likely come from eudebug teardown, since drm_client_dev_suspend/resume doesn't functionally need to be inside the pm_block window (it just blanks/unblanks fbdev console), moving it outside is the minimal fix imo. If it's called in resume() or suspend() and it grabs clientlist_mutex then it is already part of the pm flow. Any locks you grab in here are part of the pm flow. The annotation here is just recording that dependency. If in the upper layers you then grab that same lock and then do something that depends on the pm flow, it has the potential to deadlock. The above seems to be saying you have some kind of locking inversion with clientlist_mutex vs pm flow. Moving that lock outside the annotation likely just gets rids of the lockdep splat, and is not really a proper fix AFAICT. For example, you grab lock A in resume(), but in the upper layers there is a path which can grab lock A and then end up indirectly depending on the resume() completing before continuing. That has the potential to deadlock, and the annotation here will help detect it. > > Tejas >> >>> >>> The drm_client_dev_suspend/resume calls are safe to move because they >>> only blank/unblank the fbdev console and don't depend on display power >>> state or any operations within the block signalling scope. >>> >>> Cc: Jani Nikula <jani.nikula@linux.intel.com> >>> Cc: Badal Nilawar <badal.nilawar@intel.com> >>> Cc: Raag Jadav <raag.jadav@intel.com> >>> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> >>> --- >>> drivers/gpu/drm/xe/display/xe_display.c | 3 --- >>> drivers/gpu/drm/xe/xe_pm.c | 5 +++++ >>> 2 files changed, 5 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/xe/display/xe_display.c >> b/drivers/gpu/drm/xe/display/xe_display.c >>> index 810d93fefcbc..ec65dcbc3fdc 100644 >>> --- a/drivers/gpu/drm/xe/display/xe_display.c >>> +++ b/drivers/gpu/drm/xe/display/xe_display.c >>> @@ -304,7 +304,6 @@ void xe_display_pm_suspend(struct xe_device *xe) >>> * properly. >>> */ >>> intel_display_power_disable(display); >>> - drm_client_dev_suspend(&xe->drm); >>> >>> if (intel_display_device_present(display)) { >>> drm_kms_helper_poll_disable(&xe->drm); >>> @@ -461,8 +460,6 @@ void xe_display_pm_resume(struct xe_device *xe) >>> >>> intel_opregion_resume(display); >>> >>> - drm_client_dev_resume(&xe->drm); >>> - >>> intel_display_power_enable(display); >>> } >>> >>> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c >>> index 99562f691080..d00c620d0926 100644 >>> --- a/drivers/gpu/drm/xe/xe_pm.c >>> +++ b/drivers/gpu/drm/xe/xe_pm.c >>> @@ -10,6 +10,7 @@ >>> #include <linux/suspend.h> >>> #include <linux/dmi.h> >>> >>> +#include <drm/drm_client_event.h> >>> #include <drm/drm_managed.h> >>> #include <drm/ttm/ttm_placement.h> >>> >>> @@ -177,6 +178,7 @@ int xe_pm_suspend(struct xe_device *xe) >>> int err; >>> >>> drm_dbg(&xe->drm, "Suspending device\n"); >>> + drm_client_dev_suspend(&xe->drm); >>> xe_pm_block_begin_signalling(); >>> trace_xe_pm_suspend(xe, __builtin_return_address(0)); >>> >>> @@ -219,6 +221,7 @@ int xe_pm_suspend(struct xe_device *xe) >>> err: >>> drm_dbg(&xe->drm, "Device suspend failed %d\n", err); >>> xe_pm_block_end_signalling(); >>> + drm_client_dev_resume(&xe->drm); >>> return err; >>> } >>> >>> @@ -292,10 +295,12 @@ int xe_pm_resume(struct xe_device *xe) >>> >>> drm_dbg(&xe->drm, "Device resumed\n"); >>> xe_pm_block_end_signalling(); >>> + drm_client_dev_resume(&xe->drm); >>> return 0; >>> err: >>> drm_dbg(&xe->drm, "Device resume failed %d\n", err); >>> xe_pm_block_end_signalling(); >>> + drm_client_dev_resume(&xe->drm); >>> return err; >>> } >>> > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling 2026-06-17 8:21 [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling Tejas Upadhyay 2026-06-17 11:25 ` Matthew Auld @ 2026-06-17 11:30 ` Jani Nikula 2026-06-17 11:56 ` Raag Jadav 2 siblings, 0 replies; 7+ messages in thread From: Jani Nikula @ 2026-06-17 11:30 UTC (permalink / raw) To: Tejas Upadhyay, intel-xe; +Cc: Tejas Upadhyay, Badal Nilawar, Raag Jadav On Wed, 17 Jun 2026, Tejas Upadhyay <tejas.upadhyay@intel.com> wrote: > A circular locking dependency is reported between xe_pm_block_map and > dev->clientlist_mutex: > > &dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map --> &dev->clientlist_mutex > > The dependency chain is: > 1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then > xe_display_pm_suspend() which calls drm_client_dev_suspend() > acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex) > > 2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() -> > prelim_xe_eudebug_file_close() acquires discovery_lock. > (clientlist_mutex -> discovery_lock) > > 3. xe_vm_close_and_put() acquires vm->lock under discovery_lock. > (discovery_lock -> vm->lock) > > 4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend() > which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map) > > Fix this by moving drm_client_dev_suspend() before > xe_pm_block_begin_signalling() in xe_pm_suspend(), and > drm_client_dev_resume() after xe_pm_block_end_signalling() in > xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex > edge in the dependency chain. > > The drm_client_dev_suspend/resume calls are safe to move because they > only blank/unblank the fbdev console and don't depend on display power > state or any operations within the block signalling scope. What baseline are you using? See f597d4b2fd0d ("drm/{i915, xe}: move more calls inside intel_display_driver_pm_resume()"), pushed almost two weeks ago. The direction must be to move display related calls to display, and neither i915 nor xe core driver should be messing with anything to do with display directly. Need to find another solution. BR, Jani. > > Cc: Jani Nikula <jani.nikula@linux.intel.com> > Cc: Badal Nilawar <badal.nilawar@intel.com> > Cc: Raag Jadav <raag.jadav@intel.com> > Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> > --- > drivers/gpu/drm/xe/display/xe_display.c | 3 --- > drivers/gpu/drm/xe/xe_pm.c | 5 +++++ > 2 files changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c > index 810d93fefcbc..ec65dcbc3fdc 100644 > --- a/drivers/gpu/drm/xe/display/xe_display.c > +++ b/drivers/gpu/drm/xe/display/xe_display.c > @@ -304,7 +304,6 @@ void xe_display_pm_suspend(struct xe_device *xe) > * properly. > */ > intel_display_power_disable(display); > - drm_client_dev_suspend(&xe->drm); > > if (intel_display_device_present(display)) { > drm_kms_helper_poll_disable(&xe->drm); > @@ -461,8 +460,6 @@ void xe_display_pm_resume(struct xe_device *xe) > > intel_opregion_resume(display); > > - drm_client_dev_resume(&xe->drm); > - > intel_display_power_enable(display); > } > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c > index 99562f691080..d00c620d0926 100644 > --- a/drivers/gpu/drm/xe/xe_pm.c > +++ b/drivers/gpu/drm/xe/xe_pm.c > @@ -10,6 +10,7 @@ > #include <linux/suspend.h> > #include <linux/dmi.h> > > +#include <drm/drm_client_event.h> > #include <drm/drm_managed.h> > #include <drm/ttm/ttm_placement.h> > > @@ -177,6 +178,7 @@ int xe_pm_suspend(struct xe_device *xe) > int err; > > drm_dbg(&xe->drm, "Suspending device\n"); > + drm_client_dev_suspend(&xe->drm); > xe_pm_block_begin_signalling(); > trace_xe_pm_suspend(xe, __builtin_return_address(0)); > > @@ -219,6 +221,7 @@ int xe_pm_suspend(struct xe_device *xe) > err: > drm_dbg(&xe->drm, "Device suspend failed %d\n", err); > xe_pm_block_end_signalling(); > + drm_client_dev_resume(&xe->drm); > return err; > } > > @@ -292,10 +295,12 @@ int xe_pm_resume(struct xe_device *xe) > > drm_dbg(&xe->drm, "Device resumed\n"); > xe_pm_block_end_signalling(); > + drm_client_dev_resume(&xe->drm); > return 0; > err: > drm_dbg(&xe->drm, "Device resume failed %d\n", err); > xe_pm_block_end_signalling(); > + drm_client_dev_resume(&xe->drm); > return err; > } -- Jani Nikula, Intel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling 2026-06-17 8:21 [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling Tejas Upadhyay 2026-06-17 11:25 ` Matthew Auld 2026-06-17 11:30 ` Jani Nikula @ 2026-06-17 11:56 ` Raag Jadav 2026-06-17 12:12 ` Upadhyay, Tejas 2 siblings, 1 reply; 7+ messages in thread From: Raag Jadav @ 2026-06-17 11:56 UTC (permalink / raw) To: Tejas Upadhyay; +Cc: intel-xe, Jani Nikula, Badal Nilawar On Wed, Jun 17, 2026 at 01:51:05PM +0530, Tejas Upadhyay wrote: > A circular locking dependency is reported between xe_pm_block_map and > dev->clientlist_mutex: > > &dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map --> &dev->clientlist_mutex > > The dependency chain is: > 1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then > xe_display_pm_suspend() which calls drm_client_dev_suspend() > acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex) > > 2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() -> > prelim_xe_eudebug_file_close() acquires discovery_lock. > (clientlist_mutex -> discovery_lock) > > 3. xe_vm_close_and_put() acquires vm->lock under discovery_lock. > (discovery_lock -> vm->lock) > > 4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend() > which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map) > > Fix this by moving drm_client_dev_suspend() before > xe_pm_block_begin_signalling() in xe_pm_suspend(), and > drm_client_dev_resume() after xe_pm_block_end_signalling() in > xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex > edge in the dependency chain. > > The drm_client_dev_suspend/resume calls are safe to move because they > only blank/unblank the fbdev console and don't depend on display power > state or any operations within the block signalling scope. Other folks already have interesting comments but just FYI: We'd want to make sure fbdev is not left blank on suspend/resume failure so the user atleast have an opportunity to see the failure and possibly add to the bug report. You can refer to the comment above xe_display_pm_resume(). Raag ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling 2026-06-17 11:56 ` Raag Jadav @ 2026-06-17 12:12 ` Upadhyay, Tejas 0 siblings, 0 replies; 7+ messages in thread From: Upadhyay, Tejas @ 2026-06-17 12:12 UTC (permalink / raw) To: Jadav, Raag; +Cc: intel-xe@lists.freedesktop.org, Jani Nikula, Nilawar, Badal > -----Original Message----- > From: Jadav, Raag <raag.jadav@intel.com> > Sent: 17 June 2026 17:27 > To: Upadhyay, Tejas <tejas.upadhyay@intel.com> > Cc: intel-xe@lists.freedesktop.org; Jani Nikula <jani.nikula@linux.intel.com>; > Nilawar, Badal <badal.nilawar@intel.com> > Subject: Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside > pm block signalling > > On Wed, Jun 17, 2026 at 01:51:05PM +0530, Tejas Upadhyay wrote: > > A circular locking dependency is reported between xe_pm_block_map and > > dev->clientlist_mutex: > > > > &dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map --> > > &dev->clientlist_mutex > > > > The dependency chain is: > > 1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then > > xe_display_pm_suspend() which calls drm_client_dev_suspend() > > acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex) > > > > 2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() -> > > prelim_xe_eudebug_file_close() acquires discovery_lock. > > (clientlist_mutex -> discovery_lock) > > > > 3. xe_vm_close_and_put() acquires vm->lock under discovery_lock. > > (discovery_lock -> vm->lock) > > > > 4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend() > > which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map) > > > > Fix this by moving drm_client_dev_suspend() before > > xe_pm_block_begin_signalling() in xe_pm_suspend(), and > > drm_client_dev_resume() after xe_pm_block_end_signalling() in > > xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex > > edge in the dependency chain. > > > > The drm_client_dev_suspend/resume calls are safe to move because they > > only blank/unblank the fbdev console and don't depend on display power > > state or any operations within the block signalling scope. > > Other folks already have interesting comments but just FYI: > We'd want to make sure fbdev is not left blank on suspend/resume failure so > the user atleast have an opportunity to see the failure and possibly add to the > bug report. > > You can refer to the comment above xe_display_pm_resume(). Thanks for flagging this. The patch already handles this as in the suspend error paths, drm_client_dev_resume() is called at err: (which err_display falls through to), so the fbdev console is always unblanked on failure. Similarly in xe_pm_resume(), both the success and error paths call drm_client_dev_resume(), consistent with the existing pattern of calling xe_display_pm_resume() before bailing on GT resume failure. So the user always gets their screen back on any suspend/resume failure as far as I understand it. Tejas > > Raag ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-06-17 14:13 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-17 8:21 [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling Tejas Upadhyay 2026-06-17 11:25 ` Matthew Auld 2026-06-17 12:11 ` Upadhyay, Tejas 2026-06-17 14:13 ` Matthew Auld 2026-06-17 11:30 ` Jani Nikula 2026-06-17 11:56 ` Raag Jadav 2026-06-17 12:12 ` Upadhyay, Tejas
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.