From: "Tauro, Riana" <riana.tauro@intel.com>
To: Raag Jadav <raag.jadav@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <anshuman.gupta@intel.com>,
<rodrigo.vivi@intel.com>, <aravind.iddamsetty@linux.intel.com>,
<badal.nilawar@intel.com>, <ravi.kishore.koppuravuri@intel.com>,
<mallesh.koujalagi@intel.com>, <soham.purkait@intel.com>
Subject: Re: [PATCH v5 2/6] drm/xe/xe_ras: Add support to get error counter in CRI
Date: Tue, 12 May 2026 10:57:50 +0530
Message-ID: <c25d46c5-8c40-4665-9472-d1c84fbaf996@intel.com>
In-Reply-To: <agH1Xch7B6Ube4_5@black.igk.intel.com>
On 5/11/2026 8:57 PM, Raag Jadav wrote:
> On Mon, May 04, 2026 at 12:26:17PM +0530, Riana Tauro wrote:
>> Add request/response structures and helper functions to query system
>> controller to get error counter value.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>> v2: add structures for clear counter
>> move commands to sysctrl file
>> split functions
>> fix commit message (Raag)
>>
>> v3: fix log message
>> squash patches
>> change error code for sysctrl error (Raag)
>>
>> v4: rename function
>> remove unnecessary macro (Raag)
>> add documentation for enum
>>
>> v5: rebase
>> ---
>> drivers/gpu/drm/xe/xe_ras.c | 91 +++++++++++++++++++
>> drivers/gpu/drm/xe/xe_ras.h | 4 +
>> drivers/gpu/drm/xe/xe_ras_types.h | 30 ++++++
>> drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h | 2 +
>> 4 files changed, 127 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
>> index 4cb16b419b0c..47a58ce3b3ca 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.c
>> +++ b/drivers/gpu/drm/xe/xe_ras.c
>> @@ -4,11 +4,14 @@
>> */
>>
>> #include "xe_device.h"
>> +#include "xe_pm.h"
>> #include "xe_printk.h"
>> #include "xe_ras.h"
>> #include "xe_ras_types.h"
>> #include "xe_sysctrl.h"
>> #include "xe_sysctrl_event_types.h"
>> +#include "xe_sysctrl_mailbox.h"
>> +#include "xe_sysctrl_mailbox_types.h"
>>
>> /* Severity of detected errors */
>> enum xe_ras_severity {
>> @@ -50,6 +53,23 @@ static const char *const xe_ras_components[] = {
>> };
>> static_assert(ARRAY_SIZE(xe_ras_components) == XE_RAS_COMP_MAX);
>>
>> +/* Mapping from drm_xe_ras_error_component to xe_ras_component */
>> +static const int drm_to_xe_ras_component[] = {
>> + [DRM_XE_RAS_ERR_COMP_CORE_COMPUTE] = XE_RAS_COMP_CORE_COMPUTE,
>> + [DRM_XE_RAS_ERR_COMP_SOC_INTERNAL] = XE_RAS_COMP_SOC_INTERNAL,
>> + [DRM_XE_RAS_ERR_COMP_DEVICE_MEMORY] = XE_RAS_COMP_DEVICE_MEMORY,
>> + [DRM_XE_RAS_ERR_COMP_PCIE] = XE_RAS_COMP_PCIE,
>> + [DRM_XE_RAS_ERR_COMP_FABRIC] = XE_RAS_COMP_FABRIC
>> +};
>> +static_assert(ARRAY_SIZE(drm_to_xe_ras_component) == DRM_XE_RAS_ERR_COMP_MAX);
>> +
>> +/* Mapping from drm_xe_ras_error_severity to xe_ras_severity */
>> +static const int drm_to_xe_ras_severity[] = {
>> + [DRM_XE_RAS_ERR_SEV_CORRECTABLE] = XE_RAS_SEV_CORRECTABLE,
>> + [DRM_XE_RAS_ERR_SEV_UNCORRECTABLE] = XE_RAS_SEV_UNCORRECTABLE
>> +};
>> +static_assert(ARRAY_SIZE(drm_to_xe_ras_severity) == DRM_XE_RAS_ERR_SEV_MAX);
> So we don't accept new entries unless also added in uapi and vice versa
> which is good, but if you feel the need to have bounds checking just
> switch() instead.
If I move this to a switch(), the severity mapping will be inconsistent
with the component mapping. The component table has multiple entries, so
an array is preferable there.
Bounds checking is also already done in the upper layers, so it is not
needed here.
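For illustration only, a standalone sketch of the array-plus-caller-check
pattern described above. The names (uapi_comp, hw_comp, map_comp) are
hypothetical stand-ins for the uAPI and firmware enums and the values are
arbitrary; this is not the driver code. The static_assert keeps the table
and the uAPI enum the same size, and the range check happens once in the
caller-facing entry point rather than in the lookup.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the uAPI component enum. */
enum uapi_comp {
	UAPI_COMP_CORE_COMPUTE,
	UAPI_COMP_SOC_INTERNAL,
	UAPI_COMP_DEVICE_MEMORY,
	UAPI_COMP_PCIE,
	UAPI_COMP_FABRIC,
	UAPI_COMP_MAX,
};

/* Hypothetical stand-in for the firmware component enum (values arbitrary). */
enum hw_comp {
	HW_COMP_CORE_COMPUTE = 1,
	HW_COMP_SOC_INTERNAL,
	HW_COMP_DEVICE_MEMORY,
	HW_COMP_PCIE,
	HW_COMP_FABRIC,
};

/* One designated initializer per uAPI value. */
static const int uapi_to_hw_comp[] = {
	[UAPI_COMP_CORE_COMPUTE]  = HW_COMP_CORE_COMPUTE,
	[UAPI_COMP_SOC_INTERNAL]  = HW_COMP_SOC_INTERNAL,
	[UAPI_COMP_DEVICE_MEMORY] = HW_COMP_DEVICE_MEMORY,
	[UAPI_COMP_PCIE]          = HW_COMP_PCIE,
	[UAPI_COMP_FABRIC]        = HW_COMP_FABRIC,
};

/* Build fails if the table size (highest index + 1) differs from the enum. */
static_assert(sizeof(uapi_to_hw_comp) / sizeof(uapi_to_hw_comp[0]) == UAPI_COMP_MAX,
	      "component table out of sync with uAPI enum");

/* Lookup only: assumes @comp was already validated by the caller. */
static int map_comp_unchecked(unsigned int comp)
{
	return uapi_to_hw_comp[comp];
}

/* Upper-layer entry point: validates the input once, then looks it up. */
static int map_comp(unsigned int comp, int *out)
{
	if (comp >= UAPI_COMP_MAX)
		return -EINVAL;
	*out = map_comp_unchecked(comp);
	return 0;
}
```

With this split, adding a uAPI entry without a matching table entry breaks
the build, and out-of-range input never reaches the table.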
>
>> static inline const char *sev_to_str(u8 severity)
>> {
>> if (severity >= XE_RAS_SEV_MAX)
>> @@ -66,6 +86,22 @@ static inline const char *comp_to_str(u8 component)
>> return xe_ras_components[component];
>> }
>>
>> +static void prepare_ras_command(struct xe_sysctrl_mailbox_command *command,
> This looks like it should be a sysctrl helper (for non-RAS mailbox users).
We do not have any non-RAS users. We can move it to sysctrl if the need
arises.
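If a non-RAS user does appear, the move could look roughly like this
sketch. It assumes a made-up header layout (group in bits 7:0, command in
bits 15:8) with plain shifts in place of the driver's APP_HDR_* masks and
FIELD_PREP(); every name here, including GROUP_GFSP and struct mbox_cmd,
is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout: group in bits 7:0, command in bits 15:8. */
#define HDR_GROUP_SHIFT   0
#define HDR_COMMAND_SHIFT 8
#define HDR_FIELD_MASK    0xffu

/* The two fields must not overlap. */
static_assert(((HDR_FIELD_MASK << HDR_GROUP_SHIFT) &
	       (HDR_FIELD_MASK << HDR_COMMAND_SHIFT)) == 0,
	      "header fields overlap");

/* Hypothetical group value standing in for XE_SYSCTRL_GROUP_GFSP. */
#define GROUP_GFSP 0x01u

/* Simplified stand-in for struct xe_sysctrl_mailbox_command. */
struct mbox_cmd {
	uint32_t header;
	void *data_in;
	size_t data_in_len;
	void *data_out;
	size_t data_out_len;
};

/* Generic helper: the group is a parameter instead of being hardcoded. */
static void prepare_mbox_cmd(struct mbox_cmd *c, uint32_t group, uint32_t cmd,
			     void *in, size_t in_len, void *out, size_t out_len)
{
	c->header = ((group & HDR_FIELD_MASK) << HDR_GROUP_SHIFT) |
		    ((cmd & HDR_FIELD_MASK) << HDR_COMMAND_SHIFT);
	c->data_in = in;
	c->data_in_len = in_len;
	c->data_out = out;
	c->data_out_len = out_len;
}

/* RAS wrapper: keeps the existing RAS call sites unchanged. */
static void prepare_ras_cmd(struct mbox_cmd *c, uint32_t cmd,
			    void *in, size_t in_len, void *out, size_t out_len)
{
	prepare_mbox_cmd(c, GROUP_GFSP, cmd, in, in_len, out, out_len);
}
```

Until then, keeping the single RAS-specific helper avoids an extra layer
with only one caller.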
>
>> + u32 cmd, void *request, size_t request_len,
>> + void *response, size_t response_len)
>> +{
>> + struct xe_sysctrl_app_msg_hdr header = {0};
>> +
>> + header.data = FIELD_PREP(APP_HDR_GROUP_ID_MASK, XE_SYSCTRL_GROUP_GFSP) |
>> + FIELD_PREP(APP_HDR_COMMAND_MASK, cmd);
>> +
>> + command->header = header;
>> + command->data_in = request;
>> + command->data_in_len = request_len;
>> + command->data_out = response;
>> + command->data_out_len = response_len;
>> +}
>> +
>> void xe_ras_counter_threshold_crossed(struct xe_device *xe,
>> struct xe_sysctrl_event_response *response)
>> {
>> @@ -91,3 +127,58 @@ void xe_ras_counter_threshold_crossed(struct xe_device *xe,
>> comp_to_str(component), sev_to_str(severity));
>> }
>> }
>> +
>> +static int get_counter(struct xe_device *xe, struct xe_ras_error_class *error_class,
> s/error_class/counter
>
> Single word parameters usually help with wrapping but of course it's a
> personal preference.
okay
>
>> + u32 *value)
>> +{
>> + struct xe_ras_get_counter_response response = {0};
>> + struct xe_ras_get_counter_request request = {0};
>> + struct xe_sysctrl_mailbox_command command = {0};
>> + size_t rlen;
>> + int ret;
>> +
>> + request.error_class = *error_class;
>> +
>> + prepare_ras_command(&command, XE_SYSCTRL_CMD_GET_COUNTER, &request, sizeof(request),
>> + &response, sizeof(response));
>> +
>> + ret = xe_sysctrl_send_command(&xe->sc, &command, &rlen);
>> + if (ret) {
>> + xe_err(xe, "sysctrl: failed to get counter %d\n", ret);
>> + return ret;
>> + }
>> +
>> + if (rlen != sizeof(response)) {
>> + xe_err(xe, "sysctrl: unexpected get counter response length %zu (expected %zu)\n",
>> + rlen, sizeof(response));
>> + return -EIO;
>> + }
>> +
>> + *value = response.counter_value;
>> +
>> + return 0;
>> +}
>> +
>> +/**
>> + * xe_ras_get_counter() - Get error counter value
>> + * @xe: xe device instance
>> + * @severity: Error severity level to be queried
>> + * @error_id: Error component to be queried
>> + * @value: Counter value
>> + *
>> + * This function retrieves the value of a specific error counter based on
>> + * the error severity and component.
>> + *
>> + * Return: 0 on success, negative error code on failure.
>> + */
>> +int xe_ras_get_counter(struct xe_device *xe, enum drm_xe_ras_error_severity severity,
> const for consistency.
We do not need const since it's passed by value.
Consistency with what? The upper layer does not use const either.
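For reference, a minimal standalone sketch (hypothetical names) of why
const on a by-value parameter does not matter to callers: C ignores
top-level qualifiers on parameters when checking prototype compatibility,
and the callee only ever modifies its own copy.

```c
#include <assert.h>

/* Hypothetical stand-in for enum drm_xe_ras_error_severity. */
enum sev { SEV_CORRECTABLE, SEV_UNCORRECTABLE, SEV_MAX };

/* Header-style prototype without const... */
int is_uncorrectable(enum sev severity);

/*
 * ...is compatible with a definition that adds top-level const. The
 * qualifier only prevents this body from reassigning its local copy.
 */
int is_uncorrectable(const enum sev severity)
{
	return severity == SEV_UNCORRECTABLE;
}

/* Without const the callee may modify its copy; the caller never sees it. */
static int bump(int v)
{
	v++;
	return v;
}
```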
Thanks
Riana
>
> Raag
>
>> + u32 error_id, u32 *value)
>> +{
>> + struct xe_ras_error_class error_class = {0};
>> +
>> + error_class.common.severity = drm_to_xe_ras_severity[severity];
>> + error_class.common.component = drm_to_xe_ras_component[error_id];
>> +
>> + guard(xe_pm_runtime)(xe);
>> + return get_counter(xe, &error_class, value);
>> +}
>> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
>> index ea90593b62dc..74582c911b02 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.h
>> +++ b/drivers/gpu/drm/xe/xe_ras.h
>> @@ -6,10 +6,14 @@
>> #ifndef _XE_RAS_H_
>> #define _XE_RAS_H_
>>
>> +#include <uapi/drm/xe_drm.h>
>> +
>> struct xe_device;
>> struct xe_sysctrl_event_response;
>>
>> void xe_ras_counter_threshold_crossed(struct xe_device *xe,
>> struct xe_sysctrl_event_response *response);
>> +int xe_ras_get_counter(struct xe_device *xe, enum drm_xe_ras_error_severity severity,
>> + u32 error_id, u32 *value);
>>
>> #endif
>> diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
>> index 4e63c67f806a..74d85875cd63 100644
>> --- a/drivers/gpu/drm/xe/xe_ras_types.h
>> +++ b/drivers/gpu/drm/xe/xe_ras_types.h
>> @@ -70,4 +70,34 @@ struct xe_ras_threshold_crossed {
>> struct xe_ras_error_class counters[XE_RAS_NUM_COUNTERS];
>> } __packed;
>>
>> +/**
>> + * struct xe_ras_get_counter_request - Request for get error counter
>> + */
>> +struct xe_ras_get_counter_request {
>> + /** @error_class: Error class counter to be queried */
>> + struct xe_ras_error_class error_class;
>> + /** @reserved: Reserved for future use */
>> + u32 reserved;
>> +} __packed;
>> +
>> +/**
>> + * struct xe_ras_get_counter_response - Response for get error counter
>> + */
>> +struct xe_ras_get_counter_response {
>> + /** @error_class: Error class counter that was queried */
>> + struct xe_ras_error_class error_class;
>> + /** @counter_value: Current counter value */
>> + u32 counter_value;
>> + /** @timestamp: Timestamp when counter was last updated */
>> + u64 timestamp;
>> + /** @threshold_value: Threshold value for the counter */
>> + u32 threshold_value;
>> + /** @counter_status: Status of the counter */
>> + u32 counter_status:8;
>> + /** @reserved: Reserved for future use */
>> + u32 reserved:24;
>> + /** @reserved1: Reserved for future use */
>> + u32 reserved1[56];
>> +} __packed;
>> +
>> #endif
>> diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
>> index 84d7c647e743..b315847cbf64 100644
>> --- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
>> +++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
>> @@ -22,9 +22,11 @@ enum xe_sysctrl_group {
>> /**
>> * enum xe_sysctrl_gfsp_cmd - Commands supported by GFSP group
>> *
>> + * @XE_SYSCTRL_CMD_GET_COUNTER: Get error counter value
>> * @XE_SYSCTRL_CMD_GET_PENDING_EVENT: Retrieve pending event
>> */
>> enum xe_sysctrl_gfsp_cmd {
>> + XE_SYSCTRL_CMD_GET_COUNTER = 0x03,
>> XE_SYSCTRL_CMD_GET_PENDING_EVENT = 0x07,
>> };
>>
>> --
>> 2.47.1
>>