From: "Tauro, Riana" <riana.tauro@intel.com>
To: Raag Jadav <raag.jadav@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <anshuman.gupta@intel.com>,
<rodrigo.vivi@intel.com>, <aravind.iddamsetty@linux.intel.com>,
<badal.nilawar@intel.com>, <ravi.kishore.koppuravuri@intel.com>,
<mallesh.koujalagi@intel.com>, <soham.purkait@intel.com>,
Anoop Vijay <anoop.c.vijay@intel.com>,
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Subject: Re: [PATCH v4 12/13] drm/xe/xe_ras: Query errors from system controller on probe
Date: Tue, 5 May 2026 19:20:44 +0530
Message-ID: <3fb3fec8-e44b-49fc-be8f-649bc6bec5d3@intel.com>
In-Reply-To: <afCd_v3fPXpyufAD@black.igk.intel.com>
On 4/28/2026 5:16 PM, Raag Jadav wrote:
> On Fri, Apr 17, 2026 at 02:28:24PM +0530, Riana Tauro wrote:
>> Reorder soc remapper and system controller initialization to
>> early probe to allow querying errors on module load.
> ...
>
>> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
>> index 42ec27c05e9a..7598eeb796f0 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.c
>> +++ b/drivers/gpu/drm/xe/xe_ras.c
>> @@ -479,4 +479,11 @@ void xe_ras_init(struct xe_device *xe)
>> /* Get any pages that need to be offlined from firmware and reserve them */
>> get_offlined_list(xe);
>> get_queued_pages(xe);
> I know it's yet to be merged, but should we also add get_pending_event()?
We could add that.
>
>> + /*
>> + * On init, process and log any errors detected by firmware before driver init.
>> + * Critical errors are handled in xe_pcode_probe_early(), which enters survivability mode
>> + * if required.
>> + */
>> + xe_ras_process_errors(xe);
> What about wedging? Should we continue driver load after declaring wedged?
Wedging applies only to CSC and Punit errors. Those errors are caught
during boot survivability, where the appropriate action is taken. All
other errors only need to be logged.
Thanks
Riana
>
> Raag
>
>> }
>> --
>> 2.47.1
>>
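The ordering argued in the reply above can be modelled with a small userspace sketch. It assumes a simplified set of error sources, and the enum values and the helpers classify(), early_check_needs_survivability(), log_remaining_errors() and probe() are hypothetical stand-ins, not the xe driver's API. The point it illustrates: critical (CSC and Punit) errors divert probe into survivability mode early, as xe_pcode_probe_early() does in the patch, so the later pass that stands in for xe_ras_process_errors() only ever logs and never has to declare the device wedged.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical error sources; not the xe driver's real definitions. */
enum ras_source {
	RAS_SRC_CSC,
	RAS_SRC_PUNIT,
	RAS_SRC_CORE_COMPUTE,
	RAS_SRC_SOC_INTERNAL,
	RAS_SRC_DEV_MEMORY,
};

enum ras_action {
	RAS_ACTION_LOG,			/* record and continue driver load */
	RAS_ACTION_SURVIVABILITY,	/* handled by boot survivability */
};

/* Assumed policy from the reply: only CSC and Punit errors are critical. */
static enum ras_action classify(enum ras_source src)
{
	switch (src) {
	case RAS_SRC_CSC:
	case RAS_SRC_PUNIT:
		return RAS_ACTION_SURVIVABILITY;
	default:
		return RAS_ACTION_LOG;
	}
}

/* Stage 1: stand-in for the early check done during pcode probe. */
static bool early_check_needs_survivability(const enum ras_source *errs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (classify(errs[i]) == RAS_ACTION_SURVIVABILITY)
			return true;
	return false;
}

/* Stage 2: stand-in for processing on RAS init; it only logs. */
static size_t log_remaining_errors(const enum ras_source *errs, size_t n)
{
	size_t logged = 0;

	for (size_t i = 0; i < n; i++) {
		if (classify(errs[i]) == RAS_ACTION_LOG) {
			printf("logged error from source %d\n", (int)errs[i]);
			logged++;
		}
	}
	return logged;
}

/*
 * Returns 1 if probe entered survivability mode (normal init skipped),
 * 0 if the driver loaded normally after logging any remaining errors.
 */
static int probe(const enum ras_source *errs, size_t n, size_t *logged)
{
	*logged = 0;
	if (early_check_needs_survivability(errs, n))
		return 1;
	*logged = log_remaining_errors(errs, n);
	return 0;
}
```

Because the critical cases never reach stage 2, the logging pass needs no wedging branch, which is the reason given in the reply for continuing driver load.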