From: Lukas Wunner <lukas@wunner.de>
To: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
rodrigo.vivi@intel.com, andrealmeid@igalia.com,
christian.koenig@amd.com, airlied@gmail.com,
simona.vetter@ffwll.ch, mripard@kernel.org,
maarten.lankhorst@linux.intel.com, tzimmermann@suse.de,
anshuman.gupta@intel.com, badal.nilawar@intel.com,
riana.tauro@intel.com, karthik.poosa@intel.com,
sk.anirban@intel.com, raag.jadav@intel.com
Subject: Re: [PATCH v8 5/6] drm/xe: Suppress Surprise Link Down on device
Date: Mon, 22 Jun 2026 05:47:40 +0200
Message-ID: <ajiwXJuY0PrGq7Gj@wunner.de>
In-Reply-To: <20260612080722.26726-13-mallesh.koujalagi@intel.com>
On Fri, Jun 12, 2026 at 01:37:28PM +0530, Mallesh Koujalagi wrote:
> PUNIT errors can only be recovered using a power cycle. The Xe KMD
> sends a uevent to notify userspace to trigger a power cycle.
> On some platforms, the link drop caused by powering the device off and
> back on is reported by hardware as a Surprise Link Down (SLD), which
> AER then escalates as an Uncorrectable Fatal Error. That error fires
> before the device finishes coming back up and defeats the
> very recovery we are attempting.
>
> To keep the expected, recovery-induced link drop from being raised as
> a fatal AER event, mask the Surprise Link Down bit
> (PCI_ERR_UNC_SURPDN) in the upstream port's AER Uncorrectable Error
> Mask register before punit_error_handler() requests the cold reset.

You need to clear the Surprise Down Error Status bit in the
Uncorrectable Error Status Register after the reset.
You should also unmask the error.

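To illustrate the ordering (mask, cold reset, clear status, unmask), here is a rough userspace model, not kernel code. The offsets and the SURPDN bit match include/uapi/linux/pci_regs.h; struct mock_port, mock_read(), mock_write(), mock_cold_reset() and cold_reset_with_sld_suppressed() are hypothetical stand-ins for the upstream port's config space and for the PUNIT-triggered power cycle. It restores the saved mask instead of unconditionally clearing the bit, on the assumption that a bit which was already masked beforehand should stay masked.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets within the AER capability and the Surprise Down bit,
 * as in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNCOR_STATUS	0x04		/* Uncorrectable Error Status (RW1C) */
#define PCI_ERR_UNCOR_MASK	0x08		/* Uncorrectable Error Mask */
#define PCI_ERR_UNC_SURPDN	0x00000020	/* Surprise Down */

/* Hypothetical model of the upstream port's AER registers. */
struct mock_port {
	uint32_t uncor_status;
	uint32_t uncor_mask;
	int fatal_reported;	/* unmasked Surprise Down events seen by "AER" */
};

static uint32_t mock_read(const struct mock_port *p, int pos)
{
	assert(pos == PCI_ERR_UNCOR_STATUS || pos == PCI_ERR_UNCOR_MASK);
	return pos == PCI_ERR_UNCOR_STATUS ? p->uncor_status : p->uncor_mask;
}

static void mock_write(struct mock_port *p, int pos, uint32_t val)
{
	assert(pos == PCI_ERR_UNCOR_STATUS || pos == PCI_ERR_UNCOR_MASK);
	if (pos == PCI_ERR_UNCOR_STATUS)
		p->uncor_status &= ~val;	/* RW1C: writing 1 clears the bit */
	else
		p->uncor_mask = val;
}

/* Hypothetical power cycle: the link drop latches Surprise Down.
 * A masked error still sets its status bit but is not reported. */
static void mock_cold_reset(struct mock_port *p)
{
	p->uncor_status |= PCI_ERR_UNC_SURPDN;
	if (!(p->uncor_mask & PCI_ERR_UNC_SURPDN))
		p->fatal_reported++;
}

static void cold_reset_with_sld_suppressed(struct mock_port *p)
{
	uint32_t saved_mask = mock_read(p, PCI_ERR_UNCOR_MASK);

	/* 1. Mask Surprise Down before requesting the cold reset. */
	mock_write(p, PCI_ERR_UNCOR_MASK, saved_mask | PCI_ERR_UNC_SURPDN);

	mock_cold_reset(p);

	/* 2. Clear the latched status; write only SURPDN so other
	 *    pending error bits are left alone. */
	mock_write(p, PCI_ERR_UNCOR_STATUS, PCI_ERR_UNC_SURPDN);

	/* 3. Unmask again by restoring the mask saved in step 1. */
	mock_write(p, PCI_ERR_UNCOR_MASK, saved_mask);
}
```

In the driver the same steps would go through pci_read_config_dword()/pci_write_config_dword() on usp at aer_cap + PCI_ERR_UNCOR_STATUS and aer_cap + PCI_ERR_UNCOR_MASK.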
> + pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
> + aer_uncorr_mask |= PCI_ERR_UNC_SURPDN;
> + pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);

How about pci_clear_and_set_config_dword() instead?

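The helper folds the read-modify-write into one call: it reads the dword, clears the bits in "clear", sets the bits in "set" and writes the result back. Below is a small userspace model of that behaviour; struct mock_dev and mock_clear_and_set_config_dword() are hypothetical stand-ins, not the PCI core implementation.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_ERR_UNCOR_MASK	0x08		/* as in pci_regs.h */
#define PCI_ERR_UNC_SURPDN	0x00000020	/* Surprise Down */

/* Hypothetical 4 KiB extended config space standing in for a pci_dev. */
struct mock_dev {
	uint32_t cfg[1024];
};

/* Models pci_clear_and_set_config_dword(dev, pos, clear, set). */
static void mock_clear_and_set_config_dword(struct mock_dev *dev, int pos,
					    uint32_t clear, uint32_t set)
{
	uint32_t val;

	assert(pos >= 0 && pos % 4 == 0 && pos < 4096);	/* dword aligned */
	val = dev->cfg[pos / 4];
	val &= ~clear;
	val |= set;
	dev->cfg[pos / 4] = val;
}
```

With the real helper, masking would be pci_clear_and_set_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, 0, PCI_ERR_UNC_SURPDN), and unmasking swaps the last two arguments.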
OSPM is not supposed to fiddle with AER registers unless it has been
granted control of those registers through the ACPI _OSC method.
There's a pcie_aer_is_native() helper to skip access to those registers,
but it's currently only visible to the PCI core.

Thanks,
Lukas