From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Riana Tauro <riana.tauro@intel.com>
Cc: "Scarbrough, Frank" <frank.scarbrough@intel.com>,
<intel-xe@lists.freedesktop.org>, <anshuman.gupta@intel.com>,
<lucas.demarchi@intel.com>, Raag Jadav <raag.jadav@intel.com>
Subject: Re: [PATCH] drm/xe/xe_survivability: Add support for survivability mode v2
Date: Tue, 4 Nov 2025 13:16:33 -0500
Message-ID: <aQpDAc_EclcByX3A@intel.com>
In-Reply-To: <149be14a-1da1-4f43-9b0d-7c38feb0b01a@intel.com>
On Mon, Nov 03, 2025 at 01:35:07PM +0530, Riana Tauro wrote:
> Hi Rodrigo
>
> On 10/22/2025 6:08 PM, Riana Tauro wrote:
> > Hi Rodrigo
> >
> > On 10/17/2025 12:47 AM, Rodrigo Vivi wrote:
> > > On Tue, Oct 14, 2025 at 11:02:58AM +0530, Riana Tauro wrote:
> > > > v2 survivability breadcrumbs introduce a new mode called
> > > > SPI Flash Descriptor Override mode (FDO). This is enabled by
> > > > PCODE when MEI itself fails and firmware cannot be updated via
> > > > MEI using igsc. This mode provides the ability to update
> > > > the firmware directly via the SPI driver.
> > > >
> > > > Xe KMD initializes the nvm aux driver if FDO mode is enabled.
> > > >
> > > > Userspace should check the FDO mode entry in the survivability sysfs
> > > > file before using the SPI driver to update firmware.
> > > >
> > > > v2 also supports survivability mode for critical boot errors.
> > > >
> > > > cat /sys/bus/pci/devices/0000\:03\:00.0/survivability_mode
> > > >
> > > > Capability Info: 0x138320 - 0x2001ae06
> > > > Postcode Info: 0x138324 - 0x0
> > > > Overflow Info: 0x138328 - 0x0
> > > > Auxiliary Info 0: 0x13832c - 0x0
> > >
> > > I am truly sorry here, but although I was the one who designed this,
> > > looking at it now, I realize that it breaks the sysfs rule of one
> > > value per file with no fancy formatting. That is only allowed in
> > > debugfs.
> >
> > I just found the link regarding "one value per file". When I tried
> > this, I found a few sysfs entries using this format, so I used the same.
> >
> > >
> > > We need to change this asap, with help from any tool that
> > > might already be consuming it.
> > >
> > > > FDO Mode: enabled
> > >
> > > After we fix that, we can come back and add this.
> > >
> > > About our options: I don't believe that debugfs is an option
> > > without the drm card, right?
> >
> > No, debugfs will not work here. And changing the path now will break
> > any tool using it.
> >
> > >
> > > Perhaps what we need is to transform survivability_mode into
> > > a directory. Each entry becomes a file in this directory.
> >
> > Since tools check for the presence of the survivability_mode file, we
> > can try this. I will check and respond.
>
>
> The fwupd tool currently reads this file for runtime survivability (though
> in that case the file contains only the type). Changing it would break
> previous versions of fwupd.
>
> https://github.com/fwupd/fwupd/blob/66a0e5cc13b2f9b1391d84186bd6274cc9971299/plugins/intel-gsc/fu-igsc-device.c#L397
>
> @Frank, is it okay if we change the directory structure now?
>
> Or else, can we have two sysfs entries?
>
> 1) survivability_mode - a boolean indicating whether the device is in
> survivability mode, or the type of survivability mode
ack
>
> 2) survivability_info for all the other details?
>
> /sys/bus/pci/devices/0000:03:00.0/survivability_info/
> ├── aux_info
> │   ├── aux_info0
> │   ├── aux_info1
> │   ├── aux_info2
> │   ├── aux_info3
> │   └── aux_info4
> ├── capability_info
> ├── postcode_info
> ├── postcode_overflow_info
ack
That indeed sounds better. We cannot break userspace.
Thank you and again, I'm so sorry for the bad sysfs design.
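A minimal userspace sketch of how a tool might consume the proposed one-value-per-file layout. It assumes each survivability_info file holds a single hexadecimal value followed by a newline (for example `0x2001ae06`); that file format and the helper name `read_info_value()` are hypothetical, not something the driver emits today. The helper rejects the current multi-field format, which is the point of the rule.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: read one survivability_info entry, assuming the
 * file holds exactly one hex value such as "0x2001ae06\n".
 * Returns 0 on success or a negative errno value on failure.
 */
static int read_info_value(const char *path, unsigned long *value)
{
	char buf[64];
	char *end;
	size_t n;
	FILE *f;

	assert(path && value);

	f = fopen(path, "r");
	if (!f)
		return -errno;

	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	if (n == 0)
		return -EIO;
	buf[n] = '\0';

	errno = 0;
	*value = strtoul(buf, &end, 0);	/* base 0 accepts the 0x prefix */
	if (errno || end == buf)
		return -EINVAL;

	/* One value per file: at most a single trailing newline may follow. */
	if (*end != '\0' && strcmp(end, "\n") != 0)
		return -EINVAL;

	return 0;
}
```

A tool could call it as `read_info_value("/sys/bus/pci/devices/0000:03:00.0/survivability_info/capability_info", &cap)`, while the existing presence check on `survivability_mode` stays unchanged, so older tools keep working.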
>
> Thanks
> Riana
> >
> > Thanks
> > Riana
> >
> > >
> > > Sorry,
> > > Rodrigo.
> > >
> > > >
> > > > Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_pcode_api.h | 2 ++
> > > > drivers/gpu/drm/xe/xe_survivability_mode.c | 32 +++++++++++++++++--
> > > > .../gpu/drm/xe/xe_survivability_mode_types.h | 6 ++++
> > > > 3 files changed, 38 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_pcode_api.h b/drivers/gpu/drm/xe/xe_pcode_api.h
> > > > index 92bfcba51e19..d41f07f9194d 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pcode_api.h
> > > > +++ b/drivers/gpu/drm/xe/xe_pcode_api.h
> > > > @@ -77,11 +77,13 @@
> > > > #define PCODE_SCRATCH(x) XE_REG(0x138320 + ((x) * 4))
> > > > /* PCODE_SCRATCH0 */
> > > > +#define BREADCRUMB_VERSION REG_GENMASK(31, 29)
> > > > #define AUXINFO_REG_OFFSET REG_GENMASK(17, 15)
> > > > #define OVERFLOW_REG_OFFSET REG_GENMASK(14, 12)
> > > > #define HISTORY_TRACKING REG_BIT(11)
> > > > #define OVERFLOW_SUPPORT REG_BIT(10)
> > > > #define AUXINFO_SUPPORT REG_BIT(9)
> > > > +#define FDO_MODE REG_BIT(4)
> > > > #define BOOT_STATUS REG_GENMASK(3, 1)
> > > > #define CRITICAL_FAILURE 4
> > > > #define NON_CRITICAL_FAILURE 7
> > > > diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c
> > > > index 1662bfddd4bc..1c9421651548 100644
> > > > --- a/drivers/gpu/drm/xe/xe_survivability_mode.c
> > > > +++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
> > > > @@ -16,6 +16,7 @@
> > > > #include "xe_heci_gsc.h"
> > > > #include "xe_i2c.h"
> > > > #include "xe_mmio.h"
> > > > +#include "xe_nvm.h"
> > > > #include "xe_pcode_api.h"
> > > > #include "xe_vsec.h"
> > > > @@ -61,6 +62,12 @@
> > > > * Provides history of previous failures
> > > > * Auxiliary Information
> > > > * Certain failures may have information in addition to postcode information
> > > > + * FDO Mode
> > > > + * To allow recovery in scenarios where MEI itself fails, a new SPI Flash Descriptor
> > > > + * Override (FDO) mode is added in v2 survivability breadcrumbs. This mode is enabled
> > > > + * by PCODE and provides the ability to directly update the firmware via SPI Driver without
> > > > + * any dependency on MEI.
> > > > + * Xe KMD initializes the nvm aux driver if FDO mode is enabled.
> > > > *
> > > > * Runtime Survivability
> > > > * =====================
> > > > @@ -105,6 +112,11 @@ static void populate_survivability_info(struct xe_device *xe)
> > > > set_survivability_info(mmio, info, id, "Capability Info");
> > > > reg_value = info[id].value;
> > > > + survivability->version = REG_FIELD_GET(BREADCRUMB_VERSION, reg_value);
> > > > + /* FDO mode is exposed only from version 2 */
> > > > + if (survivability->version >= 2)
> > > > + survivability->fdo_mode = REG_FIELD_GET(FDO_MODE, reg_value);
> > > > +
> > > > if (reg_value & HISTORY_TRACKING) {
> > > > id++;
> > > > set_survivability_info(mmio, info, id, "Postcode Info");
> > > > @@ -171,6 +183,9 @@ static ssize_t survivability_mode_show(struct device *dev,
> > > > info[index].reg, info[index].value);
> > > > }
> > > > + if (survivability->version >= 2)
> > > > + count += sysfs_emit_at(buff, count, "FDO Mode: %s\n",
> > > > + str_enabled_disabled(survivability->fdo_mode));
> > > > return count;
> > > > }
> > > > @@ -179,9 +194,13 @@ static DEVICE_ATTR_ADMIN_RO(survivability_mode);
> > > > static void xe_survivability_mode_fini(void *arg)
> > > > {
> > > > struct xe_device *xe = arg;
> > > > + struct xe_survivability *survivability = &xe->survivability;
> > > > struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> > > > struct device *dev = &pdev->dev;
> > > > + if (survivability->fdo_mode)
> > > > + xe_nvm_fini(xe);
> > > > +
> > > > sysfs_remove_file(&dev->kobj, &dev_attr_survivability_mode.attr);
> > > > }
> > > > @@ -230,11 +249,18 @@ static int enable_boot_survivability_mode(struct pci_dev *pdev)
> > > > if (ret)
> > > > goto err;
> > > > + if (survivability->fdo_mode) {
> > > > + ret = xe_nvm_init(xe);
> > > > + if (ret)
> > > > + goto err;
> > > > + }
> > > > +
> > > > dev_err(dev, "In Survivability Mode\n");
> > > > return 0;
> > > > err:
> > > > + dev_err(dev, "Failed to enable Survivability Mode\n");
> > > > survivability->mode = false;
> > > > return ret;
> > > > }
> > > > @@ -365,8 +391,10 @@ int xe_survivability_mode_boot_enable(struct xe_device *xe)
> > > > if (ret)
> > > > return ret;
> > > > - /* Log breadcrumbs but do not enter survivability mode for Critical boot errors */
> > > > - if (survivability->boot_status == CRITICAL_FAILURE) {
> > > > + /*
> > > > + * v2 supports survivability mode for critical errors
> > > > + */
> > > > + if (survivability->version < 2 && survivability->boot_status == CRITICAL_FAILURE) {
> > > > log_survivability_info(pdev);
> > > > return -ENXIO;
> > > > }
> > > > diff --git a/drivers/gpu/drm/xe/xe_survivability_mode_types.h b/drivers/gpu/drm/xe/xe_survivability_mode_types.h
> > > > index cd65a5d167c9..379d90759c28 100644
> > > > --- a/drivers/gpu/drm/xe/xe_survivability_mode_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_survivability_mode_types.h
> > > > @@ -38,6 +38,12 @@ struct xe_survivability {
> > > > /** @type: survivability type */
> > > > enum xe_survivability_type type;
> > > > +
> > > > + /** @fdo_mode: indicates if FDO mode is enabled */
> > > > + bool fdo_mode;
> > > > +
> > > > + /** @version: breadcrumb version of survivability mode */
> > > > + u8 version;
> > > > };
> > > > #endif /* _XE_SURVIVABILITY_MODE_TYPES_H_ */
> > > > --
> > > > 2.47.1
> > > >
> >
>
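For checking the new fields, a small userspace sketch that decodes the Capability Info (PCODE_SCRATCH0) value using the bit positions from the xe_pcode_api.h hunk in the patch, and applies the same version gates the patch uses for FDO mode and for critical boot failures. The names `genmask()`, `field()`, `struct cap_info`, `decode_cap_info()`, `pcode_scratch()` and `v1_critical_bailout()` are hypothetical; `genmask()` and `field()` only mimic REG_GENMASK()/REG_FIELD_GET() for 32-bit values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCODE_SCRATCH_BASE	0x138320u	/* PCODE_SCRATCH(0) */
#define CRITICAL_FAILURE	4		/* BOOT_STATUS value from the patch */

/* Userspace stand-in for REG_GENMASK(high, low) on 32-bit values. */
static uint32_t genmask(unsigned int high, unsigned int low)
{
	assert(high < 32 && low <= high);
	return (UINT32_MAX >> (31 - high)) & (UINT32_MAX << low);
}

/* Extract bits high:low of value, shifted down to bit 0. */
static uint32_t field(uint32_t value, unsigned int high, unsigned int low)
{
	return (value & genmask(high, low)) >> low;
}

struct cap_info {			/* hypothetical decoded view */
	unsigned int version;		/* BREADCRUMB_VERSION, bits 31:29 */
	unsigned int auxinfo_offset;	/* AUXINFO_REG_OFFSET, bits 17:15 */
	unsigned int overflow_offset;	/* OVERFLOW_REG_OFFSET, bits 14:12 */
	bool history_tracking;		/* HISTORY_TRACKING, bit 11 */
	bool overflow_support;		/* OVERFLOW_SUPPORT, bit 10 */
	bool auxinfo_support;		/* AUXINFO_SUPPORT, bit 9 */
	bool fdo_mode;			/* FDO_MODE, bit 4 */
	unsigned int boot_status;	/* BOOT_STATUS, bits 3:1 */
};

static struct cap_info decode_cap_info(uint32_t reg)
{
	struct cap_info c = {
		.version = field(reg, 31, 29),
		.auxinfo_offset = field(reg, 17, 15),
		.overflow_offset = field(reg, 14, 12),
		.history_tracking = field(reg, 11, 11),
		.overflow_support = field(reg, 10, 10),
		.auxinfo_support = field(reg, 9, 9),
		.boot_status = field(reg, 3, 1),
	};

	/* Same gate as the patch: FDO mode is exposed only from version 2. */
	c.fdo_mode = c.version >= 2 && field(reg, 4, 4);
	return c;
}

/* PCODE_SCRATCH(x) is 0x138320 + x * 4. */
static uint32_t pcode_scratch(unsigned int x)
{
	return PCODE_SCRATCH_BASE + x * 4;
}

/*
 * True when the patched xe_survivability_mode_boot_enable() would only
 * log breadcrumbs and return -ENXIO: a critical boot failure with
 * pre-v2 breadcrumbs. From v2 this check no longer bails out.
 */
static bool v1_critical_bailout(const struct cap_info *c)
{
	return c->version < 2 && c->boot_status == CRITICAL_FAILURE;
}
```

For the sample value 0x2001ae06 from the commit message, `decode_cap_info()` yields an overflow register offset of 2 and an auxiliary register offset of 3, so `pcode_scratch()` gives 0x138328 and 0x13832c, the Overflow Info and Auxiliary Info 0 addresses shown in that output.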
Thread overview: 10+ messages
2025-10-14 5:32 [PATCH] drm/xe/xe_survivability: Add support for survivability mode v2 Riana Tauro
2025-10-14 6:10 ` ✓ CI.KUnit: success for " Patchwork
2025-10-14 6:46 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-14 14:09 ` ✗ Xe.CI.Full: failure " Patchwork
2025-10-16 19:17 ` [PATCH] " Rodrigo Vivi
2025-10-19 15:55 ` Raag Jadav
2025-10-26 18:59 ` Raag Jadav
2025-10-22 12:38 ` Riana Tauro
2025-11-03 8:05 ` Riana Tauro
2025-11-04 18:16 ` Rodrigo Vivi [this message]