All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling
@ 2026-06-17  8:21 Tejas Upadhyay
  2026-06-17 11:25 ` Matthew Auld
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Tejas Upadhyay @ 2026-06-17  8:21 UTC (permalink / raw)
  To: intel-xe; +Cc: Tejas Upadhyay, Jani Nikula, Badal Nilawar, Raag Jadav

A circular locking dependency is reported between xe_pm_block_map and
dev->clientlist_mutex:

  &dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map --> &dev->clientlist_mutex

The dependency chain is:
1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then
   xe_display_pm_suspend() which calls drm_client_dev_suspend()
   acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex)

2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() ->
   prelim_xe_eudebug_file_close() acquires discovery_lock.
   (clientlist_mutex -> discovery_lock)

3. xe_vm_close_and_put() acquires vm->lock under discovery_lock.
   (discovery_lock -> vm->lock)

4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend()
   which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map)

Fix this by moving drm_client_dev_suspend() before
xe_pm_block_begin_signalling() in xe_pm_suspend(), and
drm_client_dev_resume() after xe_pm_block_end_signalling() in
xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex
edge in the dependency chain.

The drm_client_dev_suspend/resume calls are safe to move because they
only blank/unblank the fbdev console and don't depend on display power
state or any operations within the block signalling scope.

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Raag Jadav <raag.jadav@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
 drivers/gpu/drm/xe/display/xe_display.c | 3 ---
 drivers/gpu/drm/xe/xe_pm.c              | 5 +++++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
index 810d93fefcbc..ec65dcbc3fdc 100644
--- a/drivers/gpu/drm/xe/display/xe_display.c
+++ b/drivers/gpu/drm/xe/display/xe_display.c
@@ -304,7 +304,6 @@ void xe_display_pm_suspend(struct xe_device *xe)
 	 * properly.
 	 */
 	intel_display_power_disable(display);
-	drm_client_dev_suspend(&xe->drm);
 
 	if (intel_display_device_present(display)) {
 		drm_kms_helper_poll_disable(&xe->drm);
@@ -461,8 +460,6 @@ void xe_display_pm_resume(struct xe_device *xe)
 
 	intel_opregion_resume(display);
 
-	drm_client_dev_resume(&xe->drm);
-
 	intel_display_power_enable(display);
 }
 
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 99562f691080..d00c620d0926 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -10,6 +10,7 @@
 #include <linux/suspend.h>
 #include <linux/dmi.h>
 
+#include <drm/drm_client_event.h>
 #include <drm/drm_managed.h>
 #include <drm/ttm/ttm_placement.h>
 
@@ -177,6 +178,7 @@ int xe_pm_suspend(struct xe_device *xe)
 	int err;
 
 	drm_dbg(&xe->drm, "Suspending device\n");
+	drm_client_dev_suspend(&xe->drm);
 	xe_pm_block_begin_signalling();
 	trace_xe_pm_suspend(xe, __builtin_return_address(0));
 
@@ -219,6 +221,7 @@ int xe_pm_suspend(struct xe_device *xe)
 err:
 	drm_dbg(&xe->drm, "Device suspend failed %d\n", err);
 	xe_pm_block_end_signalling();
+	drm_client_dev_resume(&xe->drm);
 	return err;
 }
 
@@ -292,10 +295,12 @@ int xe_pm_resume(struct xe_device *xe)
 
 	drm_dbg(&xe->drm, "Device resumed\n");
 	xe_pm_block_end_signalling();
+	drm_client_dev_resume(&xe->drm);
 	return 0;
 err:
 	drm_dbg(&xe->drm, "Device resume failed %d\n", err);
 	xe_pm_block_end_signalling();
+	drm_client_dev_resume(&xe->drm);
 	return err;
 }
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling
  2026-06-17  8:21 [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling Tejas Upadhyay
@ 2026-06-17 11:25 ` Matthew Auld
  2026-06-17 12:11   ` Upadhyay, Tejas
  2026-06-17 11:30 ` Jani Nikula
  2026-06-17 11:56 ` Raag Jadav
  2 siblings, 1 reply; 7+ messages in thread
From: Matthew Auld @ 2026-06-17 11:25 UTC (permalink / raw)
  To: Tejas Upadhyay, intel-xe; +Cc: Jani Nikula, Badal Nilawar, Raag Jadav

On 17/06/2026 09:21, Tejas Upadhyay wrote:
> A circular locking dependency is reported between xe_pm_block_map and
> dev->clientlist_mutex:
> 
>    &dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map --> &dev->clientlist_mutex
> 
> The dependency chain is:
> 1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then
>     xe_display_pm_suspend() which calls drm_client_dev_suspend()
>     acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex)
> 
> 2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() ->
>     prelim_xe_eudebug_file_close() acquires discovery_lock.
>     (clientlist_mutex -> discovery_lock)
> 
> 3. xe_vm_close_and_put() acquires vm->lock under discovery_lock.
>     (discovery_lock -> vm->lock)
> 
> 4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend()
>     which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map)
> 
> Fix this by moving drm_client_dev_suspend() before
> xe_pm_block_begin_signalling() in xe_pm_suspend(), and
> drm_client_dev_resume() after xe_pm_block_end_signalling() in
> xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex
> edge in the dependency chain.

Is this not just hiding the issue from lockdep though? The pm_begin/end 
are not real locks, they are just there as annotations to teach lockdep 
about the hidden pm dependencies. For example, any locks you grab in rpm 
suspend()/resume() you are not allowed to hold when waking the device 
up, otherwise you can potentially deadlock. So here moving stuff outside 
the annotations probably doesn't really fix anything, it just hides it 
from lockdep so it doesn't complain.

The above looks like it's complaining that clientlist_mutex is held and 
indirectly ends up waiting on something pm related, but the pm side 
itself can also grab that lock? That seems like a legit complaint and 
should be fixed in the upper layers?

> 
> The drm_client_dev_suspend/resume calls are safe to move because they
> only blank/unblank the fbdev console and don't depend on display power
> state or any operations within the block signalling scope.
> 
> Cc: Jani Nikula <jani.nikula@linux.intel.com>
> Cc: Badal Nilawar <badal.nilawar@intel.com>
> Cc: Raag Jadav <raag.jadav@intel.com>
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> ---
>   drivers/gpu/drm/xe/display/xe_display.c | 3 ---
>   drivers/gpu/drm/xe/xe_pm.c              | 5 +++++
>   2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
> index 810d93fefcbc..ec65dcbc3fdc 100644
> --- a/drivers/gpu/drm/xe/display/xe_display.c
> +++ b/drivers/gpu/drm/xe/display/xe_display.c
> @@ -304,7 +304,6 @@ void xe_display_pm_suspend(struct xe_device *xe)
>   	 * properly.
>   	 */
>   	intel_display_power_disable(display);
> -	drm_client_dev_suspend(&xe->drm);
>   
>   	if (intel_display_device_present(display)) {
>   		drm_kms_helper_poll_disable(&xe->drm);
> @@ -461,8 +460,6 @@ void xe_display_pm_resume(struct xe_device *xe)
>   
>   	intel_opregion_resume(display);
>   
> -	drm_client_dev_resume(&xe->drm);
> -
>   	intel_display_power_enable(display);
>   }
>   
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index 99562f691080..d00c620d0926 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -10,6 +10,7 @@
>   #include <linux/suspend.h>
>   #include <linux/dmi.h>
>   
> +#include <drm/drm_client_event.h>
>   #include <drm/drm_managed.h>
>   #include <drm/ttm/ttm_placement.h>
>   
> @@ -177,6 +178,7 @@ int xe_pm_suspend(struct xe_device *xe)
>   	int err;
>   
>   	drm_dbg(&xe->drm, "Suspending device\n");
> +	drm_client_dev_suspend(&xe->drm);
>   	xe_pm_block_begin_signalling();
>   	trace_xe_pm_suspend(xe, __builtin_return_address(0));
>   
> @@ -219,6 +221,7 @@ int xe_pm_suspend(struct xe_device *xe)
>   err:
>   	drm_dbg(&xe->drm, "Device suspend failed %d\n", err);
>   	xe_pm_block_end_signalling();
> +	drm_client_dev_resume(&xe->drm);
>   	return err;
>   }
>   
> @@ -292,10 +295,12 @@ int xe_pm_resume(struct xe_device *xe)
>   
>   	drm_dbg(&xe->drm, "Device resumed\n");
>   	xe_pm_block_end_signalling();
> +	drm_client_dev_resume(&xe->drm);
>   	return 0;
>   err:
>   	drm_dbg(&xe->drm, "Device resume failed %d\n", err);
>   	xe_pm_block_end_signalling();
> +	drm_client_dev_resume(&xe->drm);
>   	return err;
>   }
>   


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling
  2026-06-17  8:21 [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling Tejas Upadhyay
  2026-06-17 11:25 ` Matthew Auld
@ 2026-06-17 11:30 ` Jani Nikula
  2026-06-17 11:56 ` Raag Jadav
  2 siblings, 0 replies; 7+ messages in thread
From: Jani Nikula @ 2026-06-17 11:30 UTC (permalink / raw)
  To: Tejas Upadhyay, intel-xe; +Cc: Tejas Upadhyay, Badal Nilawar, Raag Jadav

On Wed, 17 Jun 2026, Tejas Upadhyay <tejas.upadhyay@intel.com> wrote:
> A circular locking dependency is reported between xe_pm_block_map and
> dev->clientlist_mutex:
>
>   &dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map --> &dev->clientlist_mutex
>
> The dependency chain is:
> 1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then
>    xe_display_pm_suspend() which calls drm_client_dev_suspend()
>    acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex)
>
> 2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() ->
>    prelim_xe_eudebug_file_close() acquires discovery_lock.
>    (clientlist_mutex -> discovery_lock)
>
> 3. xe_vm_close_and_put() acquires vm->lock under discovery_lock.
>    (discovery_lock -> vm->lock)
>
> 4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend()
>    which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map)
>
> Fix this by moving drm_client_dev_suspend() before
> xe_pm_block_begin_signalling() in xe_pm_suspend(), and
> drm_client_dev_resume() after xe_pm_block_end_signalling() in
> xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex
> edge in the dependency chain.
>
> The drm_client_dev_suspend/resume calls are safe to move because they
> only blank/unblank the fbdev console and don't depend on display power
> state or any operations within the block signalling scope.

What baseline are you using?

See f597d4b2fd0d ("drm/{i915, xe}: move more calls inside
intel_display_driver_pm_resume()"), pushed almost two weeks ago.

The direction must be to move display related calls to display, and
neither i915 nor xe core driver should be messing with anything to do
with display directly.

Need to find another solution.

BR,
Jani.

>
> Cc: Jani Nikula <jani.nikula@linux.intel.com>
> Cc: Badal Nilawar <badal.nilawar@intel.com>
> Cc: Raag Jadav <raag.jadav@intel.com>
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> ---
>  drivers/gpu/drm/xe/display/xe_display.c | 3 ---
>  drivers/gpu/drm/xe/xe_pm.c              | 5 +++++
>  2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
> index 810d93fefcbc..ec65dcbc3fdc 100644
> --- a/drivers/gpu/drm/xe/display/xe_display.c
> +++ b/drivers/gpu/drm/xe/display/xe_display.c
> @@ -304,7 +304,6 @@ void xe_display_pm_suspend(struct xe_device *xe)
>  	 * properly.
>  	 */
>  	intel_display_power_disable(display);
> -	drm_client_dev_suspend(&xe->drm);
>  
>  	if (intel_display_device_present(display)) {
>  		drm_kms_helper_poll_disable(&xe->drm);
> @@ -461,8 +460,6 @@ void xe_display_pm_resume(struct xe_device *xe)
>  
>  	intel_opregion_resume(display);
>  
> -	drm_client_dev_resume(&xe->drm);
> -
>  	intel_display_power_enable(display);
>  }
>  
> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> index 99562f691080..d00c620d0926 100644
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -10,6 +10,7 @@
>  #include <linux/suspend.h>
>  #include <linux/dmi.h>
>  
> +#include <drm/drm_client_event.h>
>  #include <drm/drm_managed.h>
>  #include <drm/ttm/ttm_placement.h>
>  
> @@ -177,6 +178,7 @@ int xe_pm_suspend(struct xe_device *xe)
>  	int err;
>  
>  	drm_dbg(&xe->drm, "Suspending device\n");
> +	drm_client_dev_suspend(&xe->drm);
>  	xe_pm_block_begin_signalling();
>  	trace_xe_pm_suspend(xe, __builtin_return_address(0));
>  
> @@ -219,6 +221,7 @@ int xe_pm_suspend(struct xe_device *xe)
>  err:
>  	drm_dbg(&xe->drm, "Device suspend failed %d\n", err);
>  	xe_pm_block_end_signalling();
> +	drm_client_dev_resume(&xe->drm);
>  	return err;
>  }
>  
> @@ -292,10 +295,12 @@ int xe_pm_resume(struct xe_device *xe)
>  
>  	drm_dbg(&xe->drm, "Device resumed\n");
>  	xe_pm_block_end_signalling();
> +	drm_client_dev_resume(&xe->drm);
>  	return 0;
>  err:
>  	drm_dbg(&xe->drm, "Device resume failed %d\n", err);
>  	xe_pm_block_end_signalling();
> +	drm_client_dev_resume(&xe->drm);
>  	return err;
>  }

-- 
Jani Nikula, Intel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling
  2026-06-17  8:21 [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling Tejas Upadhyay
  2026-06-17 11:25 ` Matthew Auld
  2026-06-17 11:30 ` Jani Nikula
@ 2026-06-17 11:56 ` Raag Jadav
  2026-06-17 12:12   ` Upadhyay, Tejas
  2 siblings, 1 reply; 7+ messages in thread
From: Raag Jadav @ 2026-06-17 11:56 UTC (permalink / raw)
  To: Tejas Upadhyay; +Cc: intel-xe, Jani Nikula, Badal Nilawar

On Wed, Jun 17, 2026 at 01:51:05PM +0530, Tejas Upadhyay wrote:
> A circular locking dependency is reported between xe_pm_block_map and
> dev->clientlist_mutex:
> 
>   &dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map --> &dev->clientlist_mutex
> 
> The dependency chain is:
> 1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then
>    xe_display_pm_suspend() which calls drm_client_dev_suspend()
>    acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex)
> 
> 2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() ->
>    prelim_xe_eudebug_file_close() acquires discovery_lock.
>    (clientlist_mutex -> discovery_lock)
> 
> 3. xe_vm_close_and_put() acquires vm->lock under discovery_lock.
>    (discovery_lock -> vm->lock)
> 
> 4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend()
>    which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map)
> 
> Fix this by moving drm_client_dev_suspend() before
> xe_pm_block_begin_signalling() in xe_pm_suspend(), and
> drm_client_dev_resume() after xe_pm_block_end_signalling() in
> xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex
> edge in the dependency chain.
> 
> The drm_client_dev_suspend/resume calls are safe to move because they
> only blank/unblank the fbdev console and don't depend on display power
> state or any operations within the block signalling scope.

Other folks already have interesting comments but just FYI:
We'd want to make sure fbdev is not left blank on suspend/resume failure
so the user atleast have an opportunity to see the failure and possibly
add to the bug report.

You can refer to the comment above xe_display_pm_resume().

Raag

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling
  2026-06-17 11:25 ` Matthew Auld
@ 2026-06-17 12:11   ` Upadhyay, Tejas
  2026-06-17 14:13     ` Matthew Auld
  0 siblings, 1 reply; 7+ messages in thread
From: Upadhyay, Tejas @ 2026-06-17 12:11 UTC (permalink / raw)
  To: Auld, Matthew, intel-xe@lists.freedesktop.org
  Cc: Jani Nikula, Nilawar, Badal, Jadav, Raag



> -----Original Message-----
> From: Auld, Matthew <matthew.auld@intel.com>
> Sent: 17 June 2026 16:55
> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; intel-
> xe@lists.freedesktop.org
> Cc: Jani Nikula <jani.nikula@linux.intel.com>; Nilawar, Badal
> <badal.nilawar@intel.com>; Jadav, Raag <raag.jadav@intel.com>
> Subject: Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside
> pm block signalling
> 
> On 17/06/2026 09:21, Tejas Upadhyay wrote:
> > A circular locking dependency is reported between xe_pm_block_map and
> > dev->clientlist_mutex:
> >
> >    &dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map -->
> > &dev->clientlist_mutex
> >
> > The dependency chain is:
> > 1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then
> >     xe_display_pm_suspend() which calls drm_client_dev_suspend()
> >     acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex)
> >
> > 2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() ->
> >     prelim_xe_eudebug_file_close() acquires discovery_lock.
> >     (clientlist_mutex -> discovery_lock)
> >
> > 3. xe_vm_close_and_put() acquires vm->lock under discovery_lock.
> >     (discovery_lock -> vm->lock)
> >
> > 4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend()
> >     which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map)
> >
> > Fix this by moving drm_client_dev_suspend() before
> > xe_pm_block_begin_signalling() in xe_pm_suspend(), and
> > drm_client_dev_resume() after xe_pm_block_end_signalling() in
> > xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex
> > edge in the dependency chain.
> 
> Is this not just hiding the issue from lockdep though? The pm_begin/end are
> not real locks, they are just there as annotations to teach lockdep about the
> hidden pm dependencies. For example, any locks you grab in rpm
> suspend()/resume() you are not allowed to hold when waking the device up,
> otherwise you can potentially deadlock. So here moving stuff outside the
> annotations probably doesn't really fix anything, it just hides it from lockdep
> so it doesn't complain.
> 
> The above looks like it's complaining that clientlist_mutex is held and indirectly
> ends up waiting on something pm related, but the pm side itself can also grab
> that lock? That seems like a legit complaint and should be fixed in the upper
> layers?

Then it touches the eudebug subsystem as it most likely come from eudebug teardown, since drm_client_dev_suspend/resume doesn't functionally need to be inside the pm_block window (it just blanks/unblanks fbdev console), moving it outside is the minimal fix imo.

Tejas
> 
> >
> > The drm_client_dev_suspend/resume calls are safe to move because they
> > only blank/unblank the fbdev console and don't depend on display power
> > state or any operations within the block signalling scope.
> >
> > Cc: Jani Nikula <jani.nikula@linux.intel.com>
> > Cc: Badal Nilawar <badal.nilawar@intel.com>
> > Cc: Raag Jadav <raag.jadav@intel.com>
> > Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> > ---
> >   drivers/gpu/drm/xe/display/xe_display.c | 3 ---
> >   drivers/gpu/drm/xe/xe_pm.c              | 5 +++++
> >   2 files changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/display/xe_display.c
> b/drivers/gpu/drm/xe/display/xe_display.c
> > index 810d93fefcbc..ec65dcbc3fdc 100644
> > --- a/drivers/gpu/drm/xe/display/xe_display.c
> > +++ b/drivers/gpu/drm/xe/display/xe_display.c
> > @@ -304,7 +304,6 @@ void xe_display_pm_suspend(struct xe_device *xe)
> >   	 * properly.
> >   	 */
> >   	intel_display_power_disable(display);
> > -	drm_client_dev_suspend(&xe->drm);
> >
> >   	if (intel_display_device_present(display)) {
> >   		drm_kms_helper_poll_disable(&xe->drm);
> > @@ -461,8 +460,6 @@ void xe_display_pm_resume(struct xe_device *xe)
> >
> >   	intel_opregion_resume(display);
> >
> > -	drm_client_dev_resume(&xe->drm);
> > -
> >   	intel_display_power_enable(display);
> >   }
> >
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index 99562f691080..d00c620d0926 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -10,6 +10,7 @@
> >   #include <linux/suspend.h>
> >   #include <linux/dmi.h>
> >
> > +#include <drm/drm_client_event.h>
> >   #include <drm/drm_managed.h>
> >   #include <drm/ttm/ttm_placement.h>
> >
> > @@ -177,6 +178,7 @@ int xe_pm_suspend(struct xe_device *xe)
> >   	int err;
> >
> >   	drm_dbg(&xe->drm, "Suspending device\n");
> > +	drm_client_dev_suspend(&xe->drm);
> >   	xe_pm_block_begin_signalling();
> >   	trace_xe_pm_suspend(xe, __builtin_return_address(0));
> >
> > @@ -219,6 +221,7 @@ int xe_pm_suspend(struct xe_device *xe)
> >   err:
> >   	drm_dbg(&xe->drm, "Device suspend failed %d\n", err);
> >   	xe_pm_block_end_signalling();
> > +	drm_client_dev_resume(&xe->drm);
> >   	return err;
> >   }
> >
> > @@ -292,10 +295,12 @@ int xe_pm_resume(struct xe_device *xe)
> >
> >   	drm_dbg(&xe->drm, "Device resumed\n");
> >   	xe_pm_block_end_signalling();
> > +	drm_client_dev_resume(&xe->drm);
> >   	return 0;
> >   err:
> >   	drm_dbg(&xe->drm, "Device resume failed %d\n", err);
> >   	xe_pm_block_end_signalling();
> > +	drm_client_dev_resume(&xe->drm);
> >   	return err;
> >   }
> >


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling
  2026-06-17 11:56 ` Raag Jadav
@ 2026-06-17 12:12   ` Upadhyay, Tejas
  0 siblings, 0 replies; 7+ messages in thread
From: Upadhyay, Tejas @ 2026-06-17 12:12 UTC (permalink / raw)
  To: Jadav, Raag; +Cc: intel-xe@lists.freedesktop.org, Jani Nikula, Nilawar, Badal



> -----Original Message-----
> From: Jadav, Raag <raag.jadav@intel.com>
> Sent: 17 June 2026 17:27
> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>
> Cc: intel-xe@lists.freedesktop.org; Jani Nikula <jani.nikula@linux.intel.com>;
> Nilawar, Badal <badal.nilawar@intel.com>
> Subject: Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside
> pm block signalling
> 
> On Wed, Jun 17, 2026 at 01:51:05PM +0530, Tejas Upadhyay wrote:
> > A circular locking dependency is reported between xe_pm_block_map and
> > dev->clientlist_mutex:
> >
> >   &dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map -->
> > &dev->clientlist_mutex
> >
> > The dependency chain is:
> > 1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then
> >    xe_display_pm_suspend() which calls drm_client_dev_suspend()
> >    acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex)
> >
> > 2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() ->
> >    prelim_xe_eudebug_file_close() acquires discovery_lock.
> >    (clientlist_mutex -> discovery_lock)
> >
> > 3. xe_vm_close_and_put() acquires vm->lock under discovery_lock.
> >    (discovery_lock -> vm->lock)
> >
> > 4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend()
> >    which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map)
> >
> > Fix this by moving drm_client_dev_suspend() before
> > xe_pm_block_begin_signalling() in xe_pm_suspend(), and
> > drm_client_dev_resume() after xe_pm_block_end_signalling() in
> > xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex
> > edge in the dependency chain.
> >
> > The drm_client_dev_suspend/resume calls are safe to move because they
> > only blank/unblank the fbdev console and don't depend on display power
> > state or any operations within the block signalling scope.
> 
> Other folks already have interesting comments but just FYI:
> We'd want to make sure fbdev is not left blank on suspend/resume failure so
> the user atleast have an opportunity to see the failure and possibly add to the
> bug report.
> 
> You can refer to the comment above xe_display_pm_resume().

Thanks for flagging this. The patch already handles this as in the suspend error paths, drm_client_dev_resume() is called at err: (which err_display falls through to), so the fbdev console is always unblanked on failure. Similarly in xe_pm_resume(), both the success and error paths call drm_client_dev_resume(), consistent with the existing pattern of calling xe_display_pm_resume() before bailing on GT resume failure. So the user always gets their screen back on any suspend/resume failure as far as I understand it.

Tejas

> 
> Raag

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling
  2026-06-17 12:11   ` Upadhyay, Tejas
@ 2026-06-17 14:13     ` Matthew Auld
  0 siblings, 0 replies; 7+ messages in thread
From: Matthew Auld @ 2026-06-17 14:13 UTC (permalink / raw)
  To: Upadhyay, Tejas, intel-xe@lists.freedesktop.org
  Cc: Jani Nikula, Nilawar, Badal, Jadav, Raag

On 17/06/2026 13:11, Upadhyay, Tejas wrote:
> 
> 
>> -----Original Message-----
>> From: Auld, Matthew <matthew.auld@intel.com>
>> Sent: 17 June 2026 16:55
>> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; intel-
>> xe@lists.freedesktop.org
>> Cc: Jani Nikula <jani.nikula@linux.intel.com>; Nilawar, Badal
>> <badal.nilawar@intel.com>; Jadav, Raag <raag.jadav@intel.com>
>> Subject: Re: [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside
>> pm block signalling
>>
>> On 17/06/2026 09:21, Tejas Upadhyay wrote:
>>> A circular locking dependency is reported between xe_pm_block_map and
>>> dev->clientlist_mutex:
>>>
>>>     &dev->clientlist_mutex --> &vm->lock --> xe_pm_block_map -->
>>> &dev->clientlist_mutex
>>>
>>> The dependency chain is:
>>> 1. xe_pm_suspend() calls xe_pm_block_begin_signalling(), then
>>>      xe_display_pm_suspend() which calls drm_client_dev_suspend()
>>>      acquiring clientlist_mutex. (xe_pm_block_map -> clientlist_mutex)
>>>
>>> 2. drm_client_dev_unregister() -> drm_file_free() -> xe_file_close() ->
>>>      prelim_xe_eudebug_file_close() acquires discovery_lock.
>>>      (clientlist_mutex -> discovery_lock)
>>>
>>> 3. xe_vm_close_and_put() acquires vm->lock under discovery_lock.
>>>      (discovery_lock -> vm->lock)
>>>
>>> 4. xe_exec_ioctl() holds vm->lock and calls xe_pm_block_on_suspend()
>>>      which waits on xe_pm_block_map. (vm->lock -> xe_pm_block_map)
>>>
>>> Fix this by moving drm_client_dev_suspend() before
>>> xe_pm_block_begin_signalling() in xe_pm_suspend(), and
>>> drm_client_dev_resume() after xe_pm_block_end_signalling() in
>>> xe_pm_resume(). This breaks the xe_pm_block_map -> clientlist_mutex
>>> edge in the dependency chain.
>>
>> Is this not just hiding the issue from lockdep though? The pm_begin/end are
>> not real locks, they are just there as annotations to teach lockdep about the
>> hidden pm dependencies. For example, any locks you grab in rpm
>> suspend()/resume() you are not allowed to hold when waking the device up,
>> otherwise you can potentially deadlock. So here moving stuff outside the
>> annotations probably doesn't really fix anything, it just hides it from lockdep
>> so it doesn't complain.
>>
>> The above looks like it's complaining that clientlist_mutex is held and indirectly
>> ends up waiting on something pm related, but the pm side itself can also grab
>> that lock? That seems like a legit complaint and should be fixed in the upper
>> layers?
> 
> Then it touches the eudebug subsystem as it most likely come from eudebug teardown, since drm_client_dev_suspend/resume doesn't functionally need to be inside the pm_block window (it just blanks/unblanks fbdev console), moving it outside is the minimal fix imo.

If it's called in resume() or suspend() and it grabs clientlist_mutex 
then it is already part of the pm flow. Any locks you grab in here are 
part of the pm flow. The annotation here is just recording that 
dependency. If in the upper layers you then grab that same lock and then 
do something that depends on the pm flow, it has the potential to deadlock.

The above seems to be saying you have some kind of locking inversion 
with clientlist_mutex vs pm flow. Moving that lock outside the 
annotation likely just gets rids of the lockdep splat, and is not really 
a proper fix AFAICT.

For example, you grab lock A in resume(), but in the upper layers there 
is a path which can grab lock A and then end up indirectly depending on 
the resume() completing before continuing. That has the potential to 
deadlock, and the annotation here will help detect it.

> 
> Tejas
>>
>>>
>>> The drm_client_dev_suspend/resume calls are safe to move because they
>>> only blank/unblank the fbdev console and don't depend on display power
>>> state or any operations within the block signalling scope.
>>>
>>> Cc: Jani Nikula <jani.nikula@linux.intel.com>
>>> Cc: Badal Nilawar <badal.nilawar@intel.com>
>>> Cc: Raag Jadav <raag.jadav@intel.com>
>>> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/display/xe_display.c | 3 ---
>>>    drivers/gpu/drm/xe/xe_pm.c              | 5 +++++
>>>    2 files changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/display/xe_display.c
>> b/drivers/gpu/drm/xe/display/xe_display.c
>>> index 810d93fefcbc..ec65dcbc3fdc 100644
>>> --- a/drivers/gpu/drm/xe/display/xe_display.c
>>> +++ b/drivers/gpu/drm/xe/display/xe_display.c
>>> @@ -304,7 +304,6 @@ void xe_display_pm_suspend(struct xe_device *xe)
>>>    	 * properly.
>>>    	 */
>>>    	intel_display_power_disable(display);
>>> -	drm_client_dev_suspend(&xe->drm);
>>>
>>>    	if (intel_display_device_present(display)) {
>>>    		drm_kms_helper_poll_disable(&xe->drm);
>>> @@ -461,8 +460,6 @@ void xe_display_pm_resume(struct xe_device *xe)
>>>
>>>    	intel_opregion_resume(display);
>>>
>>> -	drm_client_dev_resume(&xe->drm);
>>> -
>>>    	intel_display_power_enable(display);
>>>    }
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
>>> index 99562f691080..d00c620d0926 100644
>>> --- a/drivers/gpu/drm/xe/xe_pm.c
>>> +++ b/drivers/gpu/drm/xe/xe_pm.c
>>> @@ -10,6 +10,7 @@
>>>    #include <linux/suspend.h>
>>>    #include <linux/dmi.h>
>>>
>>> +#include <drm/drm_client_event.h>
>>>    #include <drm/drm_managed.h>
>>>    #include <drm/ttm/ttm_placement.h>
>>>
>>> @@ -177,6 +178,7 @@ int xe_pm_suspend(struct xe_device *xe)
>>>    	int err;
>>>
>>>    	drm_dbg(&xe->drm, "Suspending device\n");
>>> +	drm_client_dev_suspend(&xe->drm);
>>>    	xe_pm_block_begin_signalling();
>>>    	trace_xe_pm_suspend(xe, __builtin_return_address(0));
>>>
>>> @@ -219,6 +221,7 @@ int xe_pm_suspend(struct xe_device *xe)
>>>    err:
>>>    	drm_dbg(&xe->drm, "Device suspend failed %d\n", err);
>>>    	xe_pm_block_end_signalling();
>>> +	drm_client_dev_resume(&xe->drm);
>>>    	return err;
>>>    }
>>>
>>> @@ -292,10 +295,12 @@ int xe_pm_resume(struct xe_device *xe)
>>>
>>>    	drm_dbg(&xe->drm, "Device resumed\n");
>>>    	xe_pm_block_end_signalling();
>>> +	drm_client_dev_resume(&xe->drm);
>>>    	return 0;
>>>    err:
>>>    	drm_dbg(&xe->drm, "Device resume failed %d\n", err);
>>>    	xe_pm_block_end_signalling();
>>> +	drm_client_dev_resume(&xe->drm);
>>>    	return err;
>>>    }
>>>
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-17 14:13 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-17  8:21 [PATCH] drm/xe: Move drm_client_dev_suspend/resume outside pm block signalling Tejas Upadhyay
2026-06-17 11:25 ` Matthew Auld
2026-06-17 12:11   ` Upadhyay, Tejas
2026-06-17 14:13     ` Matthew Auld
2026-06-17 11:30 ` Jani Nikula
2026-06-17 11:56 ` Raag Jadav
2026-06-17 12:12   ` Upadhyay, Tejas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.