From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
	Karthik Poosa <karthik.poosa@intel.com>
Subject: Re: [PATCH 1/2] drm/xe/pm: Temporarily disable D3Cold on BMG
Date: Fri, 7 Mar 2025 19:49:03 -0500
Message-ID: <Z8uT_3sWc3p5LYkc@intel.com>
In-Reply-To: <vvxsj5qz5dp2hwpizgduejrvvzqlz4bvjlaljuezpqarjmgvwk@bxnbmh6quo3r>

On Fri, Mar 07, 2025 at 04:15:01PM -0600, Lucas De Marchi wrote:
> On Thu, Mar 06, 2025 at 04:36:14PM -0500, Rodrigo Vivi wrote:
> > Currently, many instability cases related to D3Cold -> D0 transition
> > on BMG are under investigation. Among them some bad cases where
> > the device is lost after 1 to 3 transitions from D3Cold to D0
> > on the runtime pm, with pcieport upstream bridge port link retrain
> > failure.
> > 
> > In other cases, it works fine, but with some sudden random memory
> > corruptions after D3cold, that could be 0xffff missed ack on GT
> > forcewake or GuC reload related failures.
> > 
> > In some other cases though, D3Cold -> D0 works pretty reliably.
> > It looks like it is a combination of GPU cards and Host boards at
> > this point. So, there is no possible/available quirk at this time.
> > 
> > This patch disables the D3Cold by default on BMG by reducing the
> > vram_d3cold_threshold to 0. Users and developers who want to enable
> > it can still do so via
> > $ echo 300 > /sys/bus/pci/devices/<addr>/vram_d3cold_threshold
> > 
> > Fixes: 3adcf970dc7e ("drm/xe/bmg: Drop force_probe requirement")
> > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4037
> > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4395
> > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4396
> 
> are these Link: or should we use Closes: ?

I don't want to close them while the investigation is ongoing.
So it is either Link: or References:, which checkpatch doesn't like.

> 
> > Cc: Karthik Poosa <karthik.poosa@intel.com>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_pm.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > index 12200be7b43d..a9f61a5fc971 100644
> > --- a/drivers/gpu/drm/xe/xe_pm.c
> > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > @@ -287,6 +287,7 @@ ALLOW_ERROR_INJECTION(xe_pm_init_early, ERRNO); /* See xe_pci_probe() */
> >  */
> > int xe_pm_init(struct xe_device *xe)
> > {
> > +	u32 vram_threshold;
> > 	int err;
> > 
> > 	/* For now suspend/resume is only allowed with GuC */
> > @@ -300,7 +301,11 @@ int xe_pm_init(struct xe_device *xe)
> > 		if (err)
> > 			return err;
> > 
> > -		err = xe_pm_set_vram_threshold(xe, DEFAULT_VRAM_THRESHOLD);
> > +		/* FIXME: D3Cold temporarily disabled by default on BMG */
> > +		vram_threshold = xe->info.platform == XE_BATTLEMAGE ? 0 :
> > +				DEFAULT_VRAM_THRESHOLD;
> 
> we usually have to extract this for different values per platform, so
> maybe just go ahead and do that?
> 
> 	u32 vram_threshold_value(struct xe_device *xe)
> 	{
> 		/* FIXME: D3Cold temporarily disabled by default on BMG */
> 		if (xe->info.platform == XE_BATTLEMAGE)
> 			return 0;
> 
> 		return DEFAULT_VRAM_THRESHOLD;
> 	}
> 
> 	xe_pm_init()
> 	{
> 		...
> 		vram_threshold = vram_threshold_value(xe);
> 	}
> 
> Then the second patch simply removes the first 3 lines of that function.

Good idea! I will change. Thank you!
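
For reference, a minimal standalone sketch of the helper suggested above. The
enum, the struct layout, and the value 300 for DEFAULT_VRAM_THRESHOLD are
stand-in assumptions so the snippet compiles outside the kernel tree (300 is
taken from the echo example in the commit message); they are not the driver's
real definitions, and the version that lands in xe_pm.c may differ.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver's types, for illustration only. */
enum xe_platform { XE_PLATFORM_OTHER, XE_BATTLEMAGE };

struct xe_device {
	struct {
		enum xe_platform platform;
	} info;
};

/* Assumed default, in MB; mirrors the "echo 300" in the commit message. */
#define DEFAULT_VRAM_THRESHOLD 300

/* Pick the default VRAM threshold for the given platform. */
static uint32_t vram_threshold_value(const struct xe_device *xe)
{
	/* FIXME: D3Cold temporarily disabled by default on BMG */
	if (xe->info.platform == XE_BATTLEMAGE)
		return 0;

	return DEFAULT_VRAM_THRESHOLD;
}
```

With this shape, xe_pm_init() only passes vram_threshold_value(xe) to
xe_pm_set_vram_threshold(), and the follow-up patch can drop the BMG
special case without touching the caller.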

> Anyway, I agree with the approach to get things working. We can try
> enabling d3cold again when we understand what's going on.
> 
> 
> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
> 
> 
> for both patches.
> 
> thanks
> Lucas De Marchi
> 
> > +
> > +		err = xe_pm_set_vram_threshold(xe, vram_threshold);
> > 		if (err)
> > 			return err;
> > 	}
> > -- 
> > 2.48.1
> > 

Thread overview (13+ messages):
2025-03-06 21:36 [PATCH 0/2] Battlemage D3Cold issues Rodrigo Vivi
2025-03-06 21:36 ` [PATCH 1/2] drm/xe/pm: Temporarily disable D3Cold on BMG Rodrigo Vivi
2025-03-07 22:15   ` Lucas De Marchi
2025-03-08  0:49     ` Rodrigo Vivi [this message]
2025-03-06 21:36 ` [PATCH 2/2] drm/xe/pm: Re-enable D3Cold by default " Rodrigo Vivi
2025-03-06 21:41 ` ✓ CI.Patch_applied: success for Battlemage D3Cold issues Patchwork
2025-03-06 21:42 ` ✓ CI.checkpatch: " Patchwork
2025-03-06 21:43 ` ✓ CI.KUnit: " Patchwork
2025-03-06 21:59 ` ✓ CI.Build: " Patchwork
2025-03-06 22:02 ` ✓ CI.Hooks: " Patchwork
2025-03-06 22:03 ` ✓ CI.checksparse: " Patchwork
2025-03-06 22:23 ` ✓ Xe.CI.BAT: " Patchwork
2025-03-07  5:22 ` ✗ Xe.CI.Full: failure " Patchwork
