From: Riana Tauro <riana.tauro@intel.com>
To: "Christian König" <christian.koenig@amd.com>,
intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: anshuman.gupta@intel.com, rodrigo.vivi@intel.com,
lucas.demarchi@intel.com, aravind.iddamsetty@linux.intel.com,
raag.jadav@intel.com, umesh.nerlige.ramappa@intel.com,
frank.scarbrough@intel.com,
"André Almeida" <andrealmeid@igalia.com>
Subject: Re: [PATCH v2 1/5] drm: Add a firmware flash method to device wedged uevent
Date: Tue, 24 Jun 2025 19:33:17 +0530
Message-ID: <d057d1e8-8b90-445c-8ccb-8a13e5d41a4c@intel.com>
In-Reply-To: <a2bfb8be-35bc-4db9-9352-02eab1ae0881@amd.com>
Hi Christian
On 6/24/2025 5:56 PM, Christian König wrote:
> On 23.06.25 12:01, Riana Tauro wrote:
>> A device is declared wedged when it is non-recoverable from
>> the driver context.
>
> Well, not quite.
I took this from the document below. Should it be changed?
https://www.kernel.org/doc/html/v6.16-rc3/gpu/drm-uapi.html#device-wedging
>
>> Some firmware errors can also cause
>> the device to enter this state and the only method to recover
>> from this would be to do a firmware flash
>
> What? What exactly do you mean with firmware flash here?
>
> Usually that means updating the firmware, but I don't see how this will bring you out of a wedge state?
It means updating the firmware.
Series: https://patchwork.freedesktop.org/series/149756/
In this xe KMD series, there are a few firmware errors that cause the card
to become non-functional. The device is declared wedged and a firmware-flash
action is sent.
There is a corresponding fwupd PR in progress that uses this uevent to trigger
a firmware flash.
fwupd PR: https://github.com/fwupd/fwupd/pull/8944/
Thanks
Riana
>
> Where is the rest of the series?
>
> Regards,
> Christian.
>
>> v2: modify documentation (Raag, Rodrigo)
>>
>> Cc: André Almeida <andrealmeid@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>> Documentation/gpu/drm-uapi.rst | 6 +++---
>> drivers/gpu/drm/drm_drv.c | 2 ++
>> include/drm/drm_device.h | 1 +
>> 3 files changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
>> index 263e5a97c080..cd2481458755 100644
>> --- a/Documentation/gpu/drm-uapi.rst
>> +++ b/Documentation/gpu/drm-uapi.rst
>> @@ -422,9 +422,8 @@ Current implementation defines three recovery methods, out of which, drivers
>> can use any one, multiple or none. Method(s) of choice will be sent in the
>> uevent environment as ``WEDGED=<method1>[,..,<methodN>]`` in order of less to
>> more side-effects. If driver is unsure about recovery or method is unknown
>> -(like soft/hard system reboot, firmware flashing, physical device replacement
>> -or any other procedure which can't be attempted on the fly), ``WEDGED=unknown``
>> -will be sent instead.
>> +(like soft/hard system reboot, physical device replacement or any other procedure
>> +which can't be attempted on the fly), ``WEDGED=unknown`` will be sent instead.
>>
>> Userspace consumers can parse this event and attempt recovery as per the
>> following expectations.
>> @@ -435,6 +434,7 @@ following expectations.
>> none optional telemetry collection
>> rebind unbind + bind driver
>> bus-reset unbind + bus reset/re-enumeration + bind
>> + firmware-flash firmware flash
>> unknown consumer policy
>> =============== ========================================
>>
>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>> index 02556363e918..5f3bbe01c207 100644
>> --- a/drivers/gpu/drm/drm_drv.c
>> +++ b/drivers/gpu/drm/drm_drv.c
>> @@ -535,6 +535,8 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
>> return "rebind";
>> case DRM_WEDGE_RECOVERY_BUS_RESET:
>> return "bus-reset";
>> + case DRM_WEDGE_RECOVERY_FW_FLASH:
>> + return "firmware-flash";
>> default:
>> return NULL;
>> }
>> diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
>> index 08b3b2467c4c..9d57c8882d93 100644
>> --- a/include/drm/drm_device.h
>> +++ b/include/drm/drm_device.h
>> @@ -30,6 +30,7 @@ struct pci_controller;
>> #define DRM_WEDGE_RECOVERY_NONE BIT(0) /* optional telemetry collection */
>> #define DRM_WEDGE_RECOVERY_REBIND BIT(1) /* unbind + bind driver */
>> #define DRM_WEDGE_RECOVERY_BUS_RESET BIT(2) /* unbind + reset bus device + bind */
>> +#define DRM_WEDGE_RECOVERY_FW_FLASH BIT(3) /* firmware flash */
>>
>> /**
>> * struct drm_wedge_task_info - information about the guilty task of a wedge dev
>