From: Raag Jadav <raag.jadav@intel.com>
To: Riana Tauro <riana.tauro@intel.com>
Cc: intel-xe@lists.freedesktop.org, anshuman.gupta@intel.com,
rodrigo.vivi@intel.com, aravind.iddamsetty@linux.intel.com,
badal.nilawar@intel.com, ravi.kishore.koppuravuri@intel.com,
mallesh.koujalagi@intel.com, soham.purkait@intel.com
Subject: Re: [PATCH v3 05/10] drm/xe/xe_ras: Initialize Uncorrectable AER Registers
Date: Tue, 7 Apr 2026 07:50:11 +0200 [thread overview]
Message-ID: <adSbE-2u2fAdJj6q@black.igk.intel.com> (raw)
In-Reply-To: <20260402070131.1603828-17-riana.tauro@intel.com>
On Thu, Apr 02, 2026 at 12:31:36PM +0530, Riana Tauro wrote:
> Uncorrectable errors from different endpoints in the device are steered to
> the USP which is a PCI Advanced Error Reporting (AER) Compliant device.
See below.
> Downgrade all the errors to non-fatal to prevent PCIe bus driver
> from triggering a Secondary Bus Reset (SBR). This allows error
> detection, containment and recovery in the driver.
>
> The Uncorrectable Error Severity Register has the 'Uncorrectable
> Internal Error Severity' set to fatal by default. Set this to
> non-fatal and unmask the error.
...
> +static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
> +{
> + struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> + struct pci_dev *vsp, *usp;
> + u32 aer_uncorr_mask, aer_uncorr_sev, aer_uncorr_status;
> + u16 aer_cap;
> +
> + /* Gfx Device Hierarchy: USP-->VSP-->SGunit */
What are these TLAs and why is everyone expected to know them?
> + vsp = pci_upstream_bridge(pdev);
> + if (!vsp)
> + return;
> +
> + usp = pci_upstream_bridge(vsp);
> + if (!usp)
> + return;
> +
> + aer_cap = usp->aer_cap;
> +
> + if (!aer_cap)
> + return;
> +
> + /*
> + * Clear any stale Uncorrectable Internal Error Status event in Uncorrectable Error
> + * Status Register.
> + */
> + pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, &aer_uncorr_status);
> + if (aer_uncorr_status & PCI_ERR_UNC_INTN)
> + pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, PCI_ERR_UNC_INTN);
> +
> + /*
> + * All errors are steered to USP which is a PCIe AER Compliant device.
Ditto.
Raag
> + * Downgrade all the errors to non-fatal to prevent PCIe bus driver
> + * from triggering a Secondary Bus Reset (SBR). This allows error
> + * detection, containment and recovery in the driver.
> + *
> + * The Uncorrectable Error Severity Register has the 'Uncorrectable
> + * Internal Error Severity' set to fatal by default. Set this to
> + * non-fatal and unmask the error.
> + */
next prev parent reply other threads:[~2026-04-07 5:50 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-02 7:01 [PATCH v3 00/10] Introduce Xe Uncorrectable Error Handling Riana Tauro
2026-04-02 7:01 ` [PATCH v3 01/10] drm/xe/xe_survivability: Decouple survivability info from boot survivability Riana Tauro
2026-04-14 12:15 ` Mallesh, Koujalagi
2026-04-02 7:01 ` [PATCH v3 02/10] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
2026-04-07 4:50 ` Matthew Brost
2026-04-13 9:00 ` Tauro, Riana
2026-04-14 13:29 ` Mallesh, Koujalagi
2026-04-16 4:48 ` Tauro, Riana
2026-04-16 5:33 ` Mallesh, Koujalagi
2026-04-02 7:01 ` [PATCH v3 03/10] drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset Riana Tauro
2026-04-16 13:07 ` Mallesh, Koujalagi
2026-04-02 7:01 ` [PATCH v3 04/10] drm/xe: Skip device access during PCI error recovery Riana Tauro
2026-04-02 7:01 ` [PATCH v3 05/10] drm/xe/xe_ras: Initialize Uncorrectable AER Registers Riana Tauro
2026-04-07 5:50 ` Raag Jadav [this message]
2026-04-02 7:01 ` [PATCH v3 06/10] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors Riana Tauro
2026-04-07 5:59 ` Raag Jadav
2026-04-02 7:01 ` [PATCH v3 07/10] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors Riana Tauro
2026-04-08 11:15 ` Raag Jadav
2026-04-13 8:17 ` Tauro, Riana
2026-04-02 7:01 ` [PATCH v3 08/10] drm/xe/xe_ras: Add structures for SoC Internal errors Riana Tauro
2026-04-08 11:18 ` Raag Jadav
2026-04-13 7:55 ` Tauro, Riana
2026-04-02 7:01 ` [PATCH v3 09/10] drm/xe/xe_ras: Handle Uncorrectable " Riana Tauro
2026-04-02 7:01 ` [PATCH v3 10/10] drm/xe/xe_pci_error: Process errors in mmio_enabled Riana Tauro
2026-04-02 8:01 ` ✗ CI.checkpatch: warning for Introduce Xe Uncorrectable Error Handling (rev3) Patchwork
2026-04-02 8:02 ` ✓ CI.KUnit: success " Patchwork
2026-04-02 8:50 ` ✓ Xe.CI.BAT: " Patchwork
2026-04-02 15:08 ` ✓ Xe.CI.FULL: " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=adSbE-2u2fAdJj6q@black.igk.intel.com \
--to=raag.jadav@intel.com \
--cc=anshuman.gupta@intel.com \
--cc=aravind.iddamsetty@linux.intel.com \
--cc=badal.nilawar@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=mallesh.koujalagi@intel.com \
--cc=ravi.kishore.koppuravuri@intel.com \
--cc=riana.tauro@intel.com \
--cc=rodrigo.vivi@intel.com \
--cc=soham.purkait@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.