Re: [PATCH v1] drm/xe/pm: Handle GT resume failure

From: Raag Jadav <raag.jadav@intel.com>
To: Matt Roper <matthew.d.roper@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>,
	intel-xe@lists.freedesktop.org, matthew.brost@intel.com,
	michal.wajdeczko@intel.com, badal.nilawar@intel.com,
	karthik.poosa@intel.com, dev@lankhorst.se
Subject: Re: [PATCH v1] drm/xe/pm: Handle GT resume failure
Date: Fri, 19 Dec 2025 06:04:42 +0100
Message-ID: <aUTc6kypmsbHNWC1@black.igk.intel.com>
In-Reply-To: <20251218184610.GD1180203@mdroper-desk1.amr.corp.intel.com>

On Thu, Dec 18, 2025 at 10:46:10AM -0800, Matt Roper wrote:
> On Thu, Dec 18, 2025 at 12:12:59PM +0100, Raag Jadav wrote:
> > On Wed, Dec 17, 2025 at 09:38:34AM -0800, Matt Roper wrote:
> > > On Wed, Dec 17, 2025 at 12:25:32PM -0500, Rodrigo Vivi wrote:
> > > > On Wed, Dec 17, 2025 at 06:49:09PM +0530, Raag Jadav wrote:
> > > > > We've been historically ignoring GT resume failure. Since the function
> > > > > can return error, handle it properly.
> > > > 
> > > > I probably had a reason for it, but since I didn't document and
> > > > cannot remember it, let's go forward and make the clean flow.
> > > > 
> > > > Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > 
> > > > > 
> > > > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/xe/xe_pm.c | 14 ++++++++++----
> > > > >  1 file changed, 10 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > > > > index 4390ba69610d..a8b50091d62e 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > > > @@ -260,8 +260,11 @@ int xe_pm_resume(struct xe_device *xe)
> > > > >  
> > > > >  	xe_irq_resume(xe);
> > > > >  
> > > > > -	for_each_gt(gt, xe, id)
> > > > > -		xe_gt_resume(gt);
> > > > > +	for_each_gt(gt, xe, id) {
> > > > > +		err = xe_gt_resume(gt);
> > > > > +		if (err)
> > > > > +			goto err;
> > > 
> > > When we propagate these errors upward, what's the end result / where
> > > does it eventually get handled?  If the device is still [partially]
> > > usable after an error, wouldn't it be better to not bail out of the loop
> > > immediately, but rather at least try to resume the other GTs, the
> > > display, etc. before returning the error at the end to indicate
> > > something failed?  Then you might still have a partially functioning
> > > device and have a better chance of at least having your screen turn back
> > > on to show the relevant error messages?
> > 
> > I had a similar question when I came across xe_device_probe(), but as
> > Lucas mentioned[1], the expectation here is pretty much "all or
> > nothing". Again, not my call, but I think we should be consistent.
> 
> I think device probe is a bit different --- if you can't bring up the
> hardware successfully at the very beginning then something is pretty
> wrong and it's best to just not enable and start using the device at
> all.  But the resume paths are different --- the device is already bound
> and in use, and was working properly previously.  If we intentionally
> don't even try to power up other parts of the device that might still
> work (display, other GTs, etc.), then we're making the situation worse
> and that could be the difference between the user having a functional UI
> that gives them a chance to save their work and shutdown/recover
> gracefully vs having to just power off the machine because their monitor
> is black and they don't have any idea what's going on.  Powering up
> other units like display also makes it more likely that we can get
> useful debugging information out of the machine to figure out what
> actually went wrong.

Fair, but this also means the existing error handling in the resume path
is redundant and should be removed.

Raag

> > [1] https://lore.kernel.org/intel-xe/lliho4ci6gi5spxxelttgqntbh7rxr4utg4dgfevlrdy54phrh@2k4mjuofaqye/
> > 
> > > > > +	}
> > > > >  
> > > > >  	xe_display_pm_resume(xe);
> > > > >  
> > > > > @@ -656,8 +659,11 @@ int xe_pm_runtime_resume(struct xe_device *xe)
> > > > >  
> > > > >  	xe_irq_resume(xe);
> > > > >  
> > > > > -	for_each_gt(gt, xe, id)
> > > > > -		xe->d3cold.allowed ? xe_gt_resume(gt) : xe_gt_runtime_resume(gt);
> > > > > +	for_each_gt(gt, xe, id) {
> > > > > +		err = xe->d3cold.allowed ? xe_gt_resume(gt) : xe_gt_runtime_resume(gt);
> > > > > +		if (err)
> > > > > +			goto out;
> > > > > +	}
> > > > >  
> > > > >  	xe_display_pm_runtime_resume(xe);
> > > > >  
> > > > > -- 
> > > > > 2.43.0
> > > > >

Thread overview: 12+ messages
2025-12-17 13:19 [PATCH v1] drm/xe/pm: Handle GT resume failure Raag Jadav
2025-12-17 15:00 ` ✓ CI.KUnit: success for " Patchwork
2025-12-17 15:37 ` ✓ Xe.CI.BAT: " Patchwork
2025-12-17 17:25 ` [PATCH v1] " Rodrigo Vivi
2025-12-17 17:38   ` Matt Roper
2025-12-18 11:12     ` Raag Jadav
2025-12-18 18:46       ` Matt Roper
2025-12-19  5:04         ` Raag Jadav [this message]
2025-12-19 16:08           ` Rodrigo Vivi
2025-12-19 18:00             ` Raag Jadav
2025-12-19 18:53               ` Rodrigo Vivi
2025-12-18 12:59 ` ✗ Xe.CI.Full: failure for " Patchwork