From: Raag Jadav <raag.jadav@intel.com>
To: "Tauro, Riana" <riana.tauro@intel.com>
Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
netdev@vger.kernel.org, simona.vetter@ffwll.ch,
airlied@gmail.com, kuba@kernel.org, lijo.lazar@amd.com,
Hawking.Zhang@amd.com, davem@davemloft.net, pabeni@redhat.com,
edumazet@google.com, maarten@lankhorst.se,
zachary.mckevitt@oss.qualcomm.com, rodrigo.vivi@intel.com,
michal.wajdeczko@intel.com, matthew.d.roper@intel.com,
umesh.nerlige.ramappa@intel.com, mallesh.koujalagi@intel.com,
soham.purkait@intel.com, anoop.c.vijay@intel.com,
aravind.iddamsetty@linux.intel.com
Subject: Re: [PATCH v1 08/11] drm/xe/ras: Get error threshold support
Date: Tue, 12 May 2026 16:37:43 +0200
Message-ID: <agM7NzEdT5SsOi4f@black.igk.intel.com>
In-Reply-To: <81b9b5f6-6107-467a-879c-c2906e8d1f58@intel.com>

On Mon, May 11, 2026 at 10:40:29PM +0530, Tauro, Riana wrote:
> On 4/18/2026 2:46 AM, Raag Jadav wrote:
> > System controller allows programming per error threshold value, which
> > it uses to raise error events to the driver. Get it using mailbox
> > command so that it can be exposed to the user.
> >
> > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_ras.c | 73 +++++++++++++++++++
> > drivers/gpu/drm/xe/xe_ras.h | 3 +
> > drivers/gpu/drm/xe/xe_ras_types.h | 22 ++++++
> > drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h | 2 +
> > 4 files changed, 100 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> > index 08e91348c459..3e93f838aa4a 100644
> > --- a/drivers/gpu/drm/xe/xe_ras.c
> > +++ b/drivers/gpu/drm/xe/xe_ras.c
> > @@ -3,11 +3,14 @@
> > * Copyright © 2026 Intel Corporation
> > */
> > +#include "xe_pm.h"
> > #include "xe_printk.h"
> > #include "xe_ras.h"
> > #include "xe_ras_types.h"
> > #include "xe_sysctrl.h"
> > #include "xe_sysctrl_event_types.h"
> > +#include "xe_sysctrl_mailbox.h"
> > +#include "xe_sysctrl_mailbox_types.h"
> > /* Severity of detected errors */
> > enum xe_ras_severity {
> > @@ -49,6 +52,23 @@ static const char *const xe_ras_components[] = {
> > };
> > static_assert(ARRAY_SIZE(xe_ras_components) == XE_RAS_COMP_MAX);
> > +/* uAPI mapping */
> > +static const int drm_to_xe_ras_components[] = {
> > + [DRM_XE_RAS_ERR_COMP_CORE_COMPUTE] = XE_RAS_COMP_CORE_COMPUTE,
> > + [DRM_XE_RAS_ERR_COMP_SOC_INTERNAL] = XE_RAS_COMP_SOC_INTERNAL,
> > + [DRM_XE_RAS_ERR_COMP_DEVICE_MEMORY] = XE_RAS_COMP_DEVICE_MEMORY,
> > + [DRM_XE_RAS_ERR_COMP_PCIE] = XE_RAS_COMP_PCIE,
> > + [DRM_XE_RAS_ERR_COMP_FABRIC] = XE_RAS_COMP_FABRIC
> > +};
> > +static_assert(ARRAY_SIZE(drm_to_xe_ras_components) == DRM_XE_RAS_ERR_COMP_MAX);
> > +
> > +/* uAPI mapping */
> > +static const int drm_to_xe_ras_severities[] = {
> > + [DRM_XE_RAS_ERR_SEV_CORRECTABLE] = XE_RAS_SEV_CORRECTABLE,
> > + [DRM_XE_RAS_ERR_SEV_UNCORRECTABLE] = XE_RAS_SEV_UNCORRECTABLE
> > +};
> > +static_assert(ARRAY_SIZE(drm_to_xe_ras_severities) == DRM_XE_RAS_ERR_SEV_MAX);
> > +
> > static inline const char *sev_to_str(u8 sev)
> > {
> > if (sev >= XE_RAS_SEV_MAX)
> > @@ -90,3 +110,56 @@ void xe_ras_counter_threshold_crossed(struct xe_device *xe,
> > comp_to_str(component), sev_to_str(severity));
> > }
> > }
> > +
> > +static void ras_command_prepare(struct xe_sysctrl_mailbox_command *command,
> > + void *request, size_t request_len, void *response,
> > + size_t response_len, u8 hdr_cmd)
> > +{
> > + struct xe_sysctrl_app_msg_hdr header = {};
> > +
> > + header.data = REG_FIELD_PREP(APP_HDR_GROUP_ID_MASK, XE_SYSCTRL_GROUP_GFSP) |
> > + REG_FIELD_PREP(APP_HDR_COMMAND_MASK, hdr_cmd);
> > +
> > + command->header = header;
> > + command->data_in = request;
> > + command->data_in_len = request_len;
> > + command->data_out = response;
> > + command->data_out_len = response_len;
> > +}
> > +
> > +int xe_ras_get_threshold(struct xe_device *xe, u32 severity, u32 component, u32 *threshold)
> > +{
> > + struct xe_ras_get_threshold_response response = {};
> > + struct xe_ras_get_threshold_request request = {};
> > + struct xe_sysctrl_mailbox_command command = {};
> > + struct xe_ras_error_class counter = {};
> > + size_t len;
> > + int ret;
> > +
> > + counter.common.severity = drm_to_xe_ras_severities[severity];
> > + counter.common.component = drm_to_xe_ras_components[component];
>
> I see this is only for correctable errors. We do not have correctable
> memory errors.
> Do we want to return -EOPNOTSUPP for memory errors?
Could be done, but I'd expect that error to come from the firmware, since
the driver is more or less acting as a transport here.
Raag
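
For comparison, a driver-side rejection could look roughly like the sketch
below. It is a standalone userspace model, not code from this series: the
demo_* enums and demo_threshold_check() are hypothetical stand-ins, and the
two "unsupported" rules are simply the assumptions raised in the review
comment above (thresholds for correctable errors only, no correctable
device memory errors).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the uAPI enums; values are illustrative only. */
enum demo_comp {
	DEMO_COMP_CORE_COMPUTE,
	DEMO_COMP_SOC_INTERNAL,
	DEMO_COMP_DEVICE_MEMORY,
	DEMO_COMP_PCIE,
	DEMO_COMP_FABRIC,
	DEMO_COMP_MAX
};

enum demo_sev {
	DEMO_SEV_CORRECTABLE,
	DEMO_SEV_UNCORRECTABLE,
	DEMO_SEV_MAX
};

/*
 * Returns 0 if the query may be forwarded to firmware, -EINVAL for
 * out-of-range input, -EOPNOTSUPP for combinations assumed unsupported.
 */
static int demo_threshold_check(uint32_t severity, uint32_t component)
{
	/* Range-check before the values are used as table indices. */
	if (severity >= DEMO_SEV_MAX || component >= DEMO_COMP_MAX)
		return -EINVAL;

	/* Assumption from the review: thresholds cover correctable errors only. */
	if (severity != DEMO_SEV_CORRECTABLE)
		return -EOPNOTSUPP;

	/* Assumption from the review: no correctable device memory errors. */
	if (component == DEMO_COMP_DEVICE_MEMORY)
		return -EOPNOTSUPP;

	return 0;
}
```

The trade-off is the one discussed above: a check like this duplicates
policy that the firmware may already enforce, whereas leaving it out keeps
the driver a plain transport.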
> > + request.counter = counter;
> > +
> > + ras_command_prepare(&command, &request, sizeof(request), &response,
> > + sizeof(response), XE_SYSCTRL_CMD_GET_THRESHOLD);
> > +
> > + guard(xe_pm_runtime)(xe);
> > + ret = xe_sysctrl_send_command(&xe->sc, &command, &len);
> > + if (ret) {
> > + xe_err(xe, "sysctrl: failed to get threshold %d\n", ret);
> > + return ret;
> > + }
> > +
> > + if (len != sizeof(response)) {
> > + xe_err(xe, "sysctrl: unexpected get threshold response length %zu (expected %zu)\n",
> > + len, sizeof(response));
> > + return -EIO;
> > + }
> > +
> > + counter = response.counter;
>
> Do we expect this to change?
Nope, but it helps with wrapping below.
> > + *threshold = response.threshold;
> > +
> > + xe_dbg(xe, "[RAS]: Get threshold %u for %s %s\n", response.threshold,
> > + comp_to_str(counter.common.component), sev_to_str(counter.common.severity));
>
> Do we need this? It should be visible to the user via netlink.
It's for us, not the user.
> > + return 0;
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
> > index ea90593b62dc..982bbe61461e 100644
> > --- a/drivers/gpu/drm/xe/xe_ras.h
> > +++ b/drivers/gpu/drm/xe/xe_ras.h
> > @@ -6,10 +6,13 @@
> > #ifndef _XE_RAS_H_
> > #define _XE_RAS_H_
> > +#include <linux/types.h>
> > +
> > struct xe_device;
> > struct xe_sysctrl_event_response;
> > void xe_ras_counter_threshold_crossed(struct xe_device *xe,
> > struct xe_sysctrl_event_response *response);
> > +int xe_ras_get_threshold(struct xe_device *xe, u32 severity, u32 component, u32 *threshold);
> > #endif
> > diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
> > index 4e63c67f806a..d5da93d65cf5 100644
> > --- a/drivers/gpu/drm/xe/xe_ras_types.h
> > +++ b/drivers/gpu/drm/xe/xe_ras_types.h
> > @@ -70,4 +70,26 @@ struct xe_ras_threshold_crossed {
> > struct xe_ras_error_class counters[XE_RAS_NUM_COUNTERS];
> > } __packed;
> > +/**
> > + * struct xe_ras_get_threshold_request - Request structure for get threshold
> > + */
> > +struct xe_ras_get_threshold_request {
> > + /** @counter: Counter to get threshold for */
> > + struct xe_ras_error_class counter;
> > + /** @reserved: Reserved for future use */
> > + u32 reserved;
> > +} __packed;
> > +
> > +/**
> > + * struct xe_ras_get_threshold_response - Response structure for get threshold
> > + */
> > +struct xe_ras_get_threshold_response {
> > + /** @counter: Counter id */
>
> Nit: ID
Sure.
Raag
> > + struct xe_ras_error_class counter;
> > + /** @threshold: Threshold value */
> > + u32 threshold;
> > + /** @reserved: Reserved for future use */
> > + u32 reserved[4];
> > +} __packed;
> > +
> > #endif
> > diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
> > index 84d7c647e743..a1b71218deca 100644
> > --- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
> > +++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
> > @@ -22,9 +22,11 @@ enum xe_sysctrl_group {
> > /**
> > * enum xe_sysctrl_gfsp_cmd - Commands supported by GFSP group
> > *
> > + * @XE_SYSCTRL_CMD_GET_THRESHOLD: Retrieve error threshold
> > * @XE_SYSCTRL_CMD_GET_PENDING_EVENT: Retrieve pending event
> > */
> > enum xe_sysctrl_gfsp_cmd {
> > + XE_SYSCTRL_CMD_GET_THRESHOLD = 0x05,
> > XE_SYSCTRL_CMD_GET_PENDING_EVENT = 0x07,
> > };
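
The response length check in xe_ras_get_threshold() can be modelled outside
the kernel as a rough sketch. Everything named demo_* is hypothetical, and
the 4-byte demo_error_class is only a placeholder, since the real layout of
struct xe_ras_error_class is not part of this patch. The packed attribute
assumes GCC or Clang.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder only: the real struct xe_ras_error_class layout is not shown here. */
struct demo_error_class {
	uint32_t raw;
} __attribute__((packed));

/* Mirrors the shape of the request: counter plus one reserved word. */
struct demo_get_threshold_request {
	struct demo_error_class counter;
	uint32_t reserved;
} __attribute__((packed));

/* Mirrors the shape of the response: counter, threshold, four reserved words. */
struct demo_get_threshold_response {
	struct demo_error_class counter;
	uint32_t threshold;
	uint32_t reserved[4];
} __attribute__((packed));

/* Sizes follow from the placeholder layout above, not from the real ABI. */
static_assert(sizeof(struct demo_get_threshold_request) == 8, "request size");
static_assert(sizeof(struct demo_get_threshold_response) == 24, "response size");

/* Accept the response only if the transport returned exactly one full struct. */
static int demo_parse_threshold(const struct demo_get_threshold_response *rsp,
				size_t len, uint32_t *threshold)
{
	if (len != sizeof(*rsp))
		return -EIO;

	*threshold = rsp->threshold;
	return 0;
}
```

A short or oversized reply is rejected with -EIO before any field is read,
which matches the intent of the length comparison in the patch.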