From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: "Anirban, Sk" <sk.anirban@intel.com>, <intel-xe@lists.freedesktop.org>
Cc: <anshuman.gupta@intel.com>, <badal.nilawar@intel.com>,
<riana.tauro@intel.com>, <karthik.poosa@intel.com>,
<raag.jadav@intel.com>, <soham.purkait@intel.com>,
<mallesh.koujalagi@intel.com>, <vinay.belgaumkar@intel.com>,
<nishanth.p.reddy@intel.com>, <rodrigo.vivi@intel.com>,
<matthew.d.roper@intel.com>
Subject: Re: [PATCH v2 2/2] drm/xe/guc: Add Wa_14025883347 for GuC DMA failure on reset
Date: Fri, 23 Jan 2026 09:28:00 -0800
Message-ID: <800a6f03-62ea-41c7-ae7b-d25606ccb708@intel.com>
In-Reply-To: <73520ae2-967e-453a-a7ec-dfa0b2d6a080@intel.com>
On 1/23/2026 9:12 AM, Anirban, Sk wrote:
> Hi,
>
> On 23-01-2026 03:12 am, Daniele Ceraolo Spurio wrote:
>>
>>
>> On 1/16/2026 2:34 AM, Sk Anirban wrote:
>>> Prevent GuC firmware DMA failures during GuC-only reset by disabling
>>> idle flow and verifying SRAM handling completion. Without this, reset
>>> can be issued while SRAM handler is copying WOPCM to SRAM,
>>> causing GuC HW to get stuck.
>>>
>>> v2: Modify error message (Badal)
>>> Rename reg bit name (Daniele)
>>> Update WA skip condition (Daniele)
>>> Update SRAM handling logic (Daniele)
>>>
>>> Signed-off-by: Sk Anirban <sk.anirban@intel.com>
>>> ---
>>> drivers/gpu/drm/xe/regs/xe_guc_regs.h | 8 +++++++
>>> drivers/gpu/drm/xe/xe_guc.c | 30
>>> +++++++++++++++++++++++++++
>>> drivers/gpu/drm/xe/xe_wa_oob.rules | 9 ++++++++
>>> 3 files changed, 47 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>>> b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>>> index 87984713dd12..c9cb02f32f5a 100644
>>> --- a/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>>> +++ b/drivers/gpu/drm/xe/regs/xe_guc_regs.h
>>> @@ -40,6 +40,9 @@
>>> #define GS_BOOTROM_JUMP_PASSED REG_FIELD_PREP(GS_BOOTROM_MASK,
>>> 0x76)
>>> #define GS_MIA_IN_RESET REG_BIT(0)
>>> +#define GUC_HASH_BOOT_CHECK XE_REG(0xc010)
>>> +#define GUC_BOOT_UKERNEL_VALID REG_BIT(31)
>>> +
>>> #define GUC_HEADER_INFO XE_REG(0xc014)
>>> #define GUC_WOPCM_SIZE XE_REG(0xc050)
>>> @@ -83,7 +86,12 @@
>>> #define GUC_WOPCM_OFFSET_MASK REG_GENMASK(31,
>>> GUC_WOPCM_OFFSET_SHIFT)
>>> #define HUC_LOADING_AGENT_GUC REG_BIT(1)
>>> #define GUC_WOPCM_OFFSET_VALID REG_BIT(0)
>>> +
>>> +#define GUC_SRAM_STATUS XE_REG(0xc398)
>>> +#define GUC_SRAM_HANDLING_MASK REG_GENMASK(8, 7)
>>> +
>>> #define GUC_MAX_IDLE_COUNT XE_REG(0xc3e4)
>>> +#define GUC_IDLE_FLOW_DISABLE REG_BIT(31)
>>> #define GUC_PMTIMESTAMP_LO XE_REG(0xc3e8)
>>> #define GUC_PMTIMESTAMP_HI XE_REG(0xc3ec)
>>> diff --git a/drivers/gpu/drm/xe/xe_guc.c
>>> b/drivers/gpu/drm/xe/xe_guc.c
>>> index 44360437beeb..42658a409556 100644
>>> --- a/drivers/gpu/drm/xe/xe_guc.c
>>> +++ b/drivers/gpu/drm/xe/xe_guc.c
>>> @@ -900,6 +900,33 @@ int xe_guc_post_load_init(struct xe_guc *guc)
>>> return xe_guc_submit_enable(guc);
>>> }
>>> +/*
>>> + * Wa_14025883347: Prevent GuC firmware DMA failures during
>>> GuC-only reset by ensuring
>>> + * SRAM save/restore operations are complete before reset.
>>> + */
>>> +static void guc_prevent_fw_dma_failure_on_reset(struct xe_guc *guc)
>>> +{
>>> + struct xe_gt *gt = guc_to_gt(guc);
>>> + u32 boot_hash_chk, guc_status, sram_status;
>>> + int ret;
>>> +
>>> + guc_status = xe_mmio_read32(&gt->mmio, GUC_STATUS);
>>> + if (guc_status & GS_MIA_IN_RESET)
>>> + return;
>>> +
>>> + boot_hash_chk = xe_mmio_read32(&gt->mmio, GUC_HASH_BOOT_CHECK);
>>> + if (!(boot_hash_chk & GUC_BOOT_UKERNEL_VALID))
>>> + return;
>>> +
>>> + xe_mmio_rmw32(&gt->mmio, GUC_MAX_IDLE_COUNT, 0,
>>> GUC_IDLE_FLOW_DISABLE);
>>> +
>>
>> The WA says that we also need to wait for the status to be "ready"
>> after setting GUC_IDLE_FLOW_DISABLE.
>>
>> Daniele
>>
> As discussed, a GuC reset can occur without firmware interaction, and
> during RC6 exit the GuC load status may transition, meaning it will
> not always be INTEL_GUC_LOAD_STATUS_READY.
>
> So we’re checking GS_MIA_IN_RESET instead. I just want to confirm that
> this is enough to ensure FW is present before applying the WA.

The GS_MIA_IN_RESET check plus boot_hash_chk are enough to determine
whether the FW is present. However, waiting for the ready state is not
about confirming presence; it is about waiting for the FW initialization
to complete once we've confirmed that the FW is indeed present.
Basically, the WA says that we can't do a GuC reset while GuC init is
still in progress.

Daniele
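
To make the ordering concrete, here is a rough userspace model of the sequence described above: check that the FW is present, set the idle-flow-disable bit, wait for FW init to finish, then wait for SRAM handling to clear. It is only a sketch, not driver code. Every model_* and MODEL_* name is hypothetical, the "ready" condition is reduced to a poll counter instead of a real GUC_STATUS decode, and placing the ready wait before the SRAM wait is an assumption. Only the bit positions are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are xe driver API. */
enum model_reg {
	MODEL_GUC_STATUS,
	MODEL_HASH_BOOT_CHECK,
	MODEL_MAX_IDLE_COUNT,
	MODEL_SRAM_STATUS,
	MODEL_REG_COUNT
};

/* Bit positions mirror the definitions in the quoted patch. */
#define MODEL_MIA_IN_RESET       (1u << 0)   /* GS_MIA_IN_RESET */
#define MODEL_UKERNEL_VALID      (1u << 31)  /* GUC_BOOT_UKERNEL_VALID */
#define MODEL_IDLE_FLOW_DISABLE  (1u << 31)  /* GUC_IDLE_FLOW_DISABLE */
#define MODEL_SRAM_HANDLING_MASK (3u << 7)   /* GUC_SRAM_HANDLING_MASK */

struct model_hw {
	uint32_t regs[MODEL_REG_COUNT];
	int polls_until_ready;     /* FW init completes after this many polls */
	int polls_until_sram_idle; /* SRAM handling completes after this many polls */
};

enum wa_result { WA_SKIPPED_NO_FW, WA_OK, WA_TIMEOUT_READY, WA_TIMEOUT_SRAM };

/* Stand-in for "load status is ready"; a real check would decode GUC_STATUS. */
static bool model_poll_fw_ready(struct model_hw *hw)
{
	if (hw->polls_until_ready > 0) {
		hw->polls_until_ready--;
		return false;
	}
	return true;
}

/* Stand-in for polling GUC_SRAM_STATUS until the handling bits read zero. */
static bool model_poll_sram_idle(struct model_hw *hw)
{
	if (hw->polls_until_sram_idle > 0) {
		hw->polls_until_sram_idle--;
		return false;
	}
	hw->regs[MODEL_SRAM_STATUS] &= ~MODEL_SRAM_HANDLING_MASK;
	return true;
}

static enum wa_result model_prepare_guc_reset(struct model_hw *hw, int max_polls)
{
	int i;

	assert(hw);

	/* 1. Presence: GuC out of reset and a valid ukernel hash. */
	if (hw->regs[MODEL_GUC_STATUS] & MODEL_MIA_IN_RESET)
		return WA_SKIPPED_NO_FW;
	if (!(hw->regs[MODEL_HASH_BOOT_CHECK] & MODEL_UKERNEL_VALID))
		return WA_SKIPPED_NO_FW;

	/* 2. Disable the idle flow. */
	hw->regs[MODEL_MAX_IDLE_COUNT] |= MODEL_IDLE_FLOW_DISABLE;

	/* 3. Presence is not enough: wait for FW init to complete. */
	for (i = 0; i < max_polls; i++)
		if (model_poll_fw_ready(hw))
			break;
	if (i == max_polls)
		return WA_TIMEOUT_READY;

	/* 4. Wait for SRAM handling to finish before the reset is issued. */
	for (i = 0; i < max_polls; i++)
		if (model_poll_sram_idle(hw))
			break;
	if (i == max_polls)
		return WA_TIMEOUT_SRAM;

	return WA_OK;
}
```

The point of the model is that steps 1 and 3 answer different questions: step 1 decides whether the WA applies at all, while step 3 holds off the reset until init is done. What the driver should do when step 3 times out is not settled in this thread, so the model only reports it.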
>
> Thanks,
>
> Anirban
>
>>> + ret = xe_mmio_wait32(&gt->mmio, GUC_SRAM_STATUS,
>>> GUC_SRAM_HANDLING_MASK,
>>> + 0, 5000, &sram_status, false);
>>> + if (ret)
>>> + xe_gt_warn(gt, "SRAM handling not complete
>>> (GUC_SRAM_STATUS: 0x%x)\n",
>>> + sram_status);
>>> +}
>>> +
>>> int xe_guc_reset(struct xe_guc *guc)
>>> {
>>> struct xe_gt *gt = guc_to_gt(guc);
>>> @@ -909,6 +936,9 @@ int xe_guc_reset(struct xe_guc *guc)
>>> xe_force_wake_assert_held(gt_to_fw(gt), XE_FW_GT);
>>> + if (XE_GT_WA(gt, 14025883347))
>>> + guc_prevent_fw_dma_failure_on_reset(guc);
>>> +
>>> if (IS_SRIOV_VF(gt_to_xe(gt)))
>>> return xe_gt_sriov_vf_bootstrap(gt);
>>> diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules
>>> b/drivers/gpu/drm/xe/xe_wa_oob.rules
>>> index 5cd7fa6d2a5c..ff2efc7a68cc 100644
>>> --- a/drivers/gpu/drm/xe/xe_wa_oob.rules
>>> +++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
>>> @@ -73,3 +73,12 @@
>>> 15015404425_disable PLATFORM(PANTHERLAKE), MEDIA_STEP(B0, FOREVER)
>>> 16026007364 MEDIA_VERSION(3000)
>>> 14020316580 MEDIA_VERSION(1301)
>>> +
>>> +14025883347 MEDIA_VERSION(1301)
>>> + MEDIA_VERSION(2000)
>>> + MEDIA_VERSION(3000)
>>> + MEDIA_VERSION(3002)
>>> + MEDIA_VERSION(3500)
>>> + MEDIA_VERSION(3503)
>>> + GRAPHICS_VERSION_RANGE(3000, 3001)
>>> + GRAPHICS_VERSION_RANGE(3003, 3005)
>>
Thread overview: 14+ messages
2026-01-16 10:34 [PATCH v2 0/2] drm/xe/guc: Add Wa_14025883347 for GuC DMA failure on reset Sk Anirban
2026-01-16 10:34 ` [PATCH v2 1/2] drm/xe/rtp: Extend support for max rules/actions per entry Sk Anirban
2026-01-16 10:34 ` [PATCH v2 2/2] drm/xe/guc: Add Wa_14025883347 for GuC DMA failure on reset Sk Anirban
2026-01-20 5:14 ` Nilawar, Badal
2026-01-22 21:42 ` Daniele Ceraolo Spurio
2026-01-23 17:12 ` Anirban, Sk
2026-01-23 17:28 ` Daniele Ceraolo Spurio [this message]
2026-01-23 17:54 ` Anirban, Sk
2026-01-23 17:58 ` Daniele Ceraolo Spurio
2026-01-23 18:15 ` Anirban, Sk
2026-01-16 11:55 ` ✗ CI.checkpatch: warning for drm/xe/guc: Add Wa_14025883347 for GuC DMA failure on reset (rev2) Patchwork
2026-01-16 11:57 ` ✓ CI.KUnit: success " Patchwork
2026-01-16 12:39 ` ✗ Xe.CI.BAT: failure " Patchwork
2026-01-16 16:25 ` ✗ Xe.CI.Full: " Patchwork