From: Riana Tauro <riana.tauro@intel.com>
To: Raag Jadav <raag.jadav@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <anshuman.gupta@intel.com>,
<rodrigo.vivi@intel.com>, <aravind.iddamsetty@linux.intel.com>,
<badal.nilawar@intel.com>, <ravi.kishore.koppuravuri@intel.com>,
<mallesh.koujalagi@intel.com>
Subject: Re: [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
Date: Tue, 24 Feb 2026 08:53:06 +0530 [thread overview]
Message-ID: <efce4bdc-32b8-48b2-a5ae-c0aab1c7df8f@intel.com> (raw)
In-Reply-To: <aYhDM-bCiyClOgyL@black.igk.intel.com>
On 2/8/2026 1:32 PM, Raag Jadav wrote:
> On Thu, Jan 22, 2026 at 03:36:14PM +0530, Riana Tauro wrote:
>> Add error_detected, mmio_enabled, slot_reset and resume
>> recovery callbacks to handle PCIe Advanced Error Reporting
>> (AER) errors.
>>
>> For fatal errors, the device is wedged and becomes
>> inaccessible. Return PCI_ERS_RESULT_SLOT_RESET from
>> error_detected to request a Secondary Bus Reset (SBR).
>>
>> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
>> error_detected to trigger the mmio_enabled callback. In this callback,
>> the device is queried to determine the error cause and attempt
>> recovery based on the error type.
>>
>> Once the secondary bus reset(SBR) is completed the slot_reset callback
>> cleanly removes and reprobe the device to restore functionality.
>
> ...
>
>> +static void xe_pci_error_handling(struct pci_dev *pdev)
>> +{
>> + struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> + xe_device_set_in_recovery(xe);
>> + xe_device_declare_wedged(xe);
>
> Is this the correct usage?
>
> Documentation/gpu/drm-uapi.rst +392
>
> "A 'wedged' device is basically a device that is declared dead by the driver
> after exhausting all possible attempts to recover it from driver context."
Can't this be used?
"The only exception to this is WEDGED=none, which signifies that the
device was temporarily ‘wedged’ at some point but was recovered from
driver context using device specific methods like reset. No explicit
recovery is expected from the consumer in this case, but it can still
take additional steps like gathering telemetry information (devcoredump,
syslog). "
If not will replace it with the gt_wedged function and block ioctls
using recovery flag
Thanks
Riana
>
> Raag
>
>> + pci_disable_device(pdev);
>> +}
next prev parent reply other threads:[~2026-02-24 3:23 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-22 10:06 [PATCH 0/8] Introduce Xe Uncorrectable Error Handling Riana Tauro
2026-01-22 9:42 ` ✗ CI.checkpatch: warning for " Patchwork
2026-01-22 9:43 ` ✓ CI.KUnit: success " Patchwork
2026-01-22 10:06 ` [PATCH 1/8] drm/xe/xe_sysctrl: Add System controller patch Riana Tauro
2026-01-22 10:06 ` [PATCH 2/8] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
2026-01-27 22:49 ` Michal Wajdeczko
2026-02-02 9:45 ` Riana Tauro
2026-01-29 9:09 ` Nilawar, Badal
2026-02-02 13:19 ` Nilawar, Badal
2026-02-03 3:46 ` Riana Tauro
2026-02-03 3:41 ` Riana Tauro
2026-02-08 8:02 ` Raag Jadav
2026-02-24 3:23 ` Riana Tauro [this message]
2026-02-24 5:33 ` Raag Jadav
2026-02-16 8:53 ` Mallesh, Koujalagi
2026-02-24 3:26 ` Riana Tauro
2026-01-22 10:06 ` [PATCH 3/8] drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset Riana Tauro
2026-01-27 11:23 ` Mallesh, Koujalagi
2026-02-02 8:46 ` Riana Tauro
2026-01-22 10:06 ` [PATCH 4/8] drm/xe: Skip device access during PCI error recovery Riana Tauro
2026-01-22 10:06 ` [PATCH 5/8] drm/xe/xe_ras: Initialize Uncorrectable AER Registers Riana Tauro
2026-01-27 12:41 ` Mallesh, Koujalagi
2026-02-02 9:34 ` Riana Tauro
2026-02-04 8:38 ` Aravind Iddamsetty
2026-02-16 12:27 ` Mallesh, Koujalagi
2026-02-18 14:48 ` Riana Tauro
2026-01-22 10:06 ` [PATCH 6/8] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors Riana Tauro
2026-02-23 14:19 ` Mallesh, Koujalagi
2026-02-23 14:30 ` Riana Tauro
2026-01-22 10:06 ` [PATCH 7/8] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors Riana Tauro
2026-01-27 11:44 ` Mallesh, Koujalagi
2026-02-02 8:38 ` Riana Tauro
2026-01-27 14:03 ` Mallesh, Koujalagi
2026-02-02 8:54 ` Riana Tauro
2026-02-24 12:17 ` Mallesh, Koujalagi
2026-02-17 14:02 ` Raag Jadav
2026-02-23 14:10 ` Riana Tauro
2026-01-22 10:06 ` [PATCH 8/8] drm/xe/xe_pci_error: Process errors in mmio_enabled Riana Tauro
2026-02-24 12:46 ` Mallesh, Koujalagi
2026-01-22 10:21 ` ✓ Xe.CI.BAT: success for Introduce Xe Uncorrectable Error Handling Patchwork
2026-01-22 20:28 ` ✗ Xe.CI.Full: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=efce4bdc-32b8-48b2-a5ae-c0aab1c7df8f@intel.com \
--to=riana.tauro@intel.com \
--cc=anshuman.gupta@intel.com \
--cc=aravind.iddamsetty@linux.intel.com \
--cc=badal.nilawar@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=mallesh.koujalagi@intel.com \
--cc=raag.jadav@intel.com \
--cc=ravi.kishore.koppuravuri@intel.com \
--cc=rodrigo.vivi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox