From: "Mallesh, Koujalagi" <mallesh.koujalagi@intel.com>
To: Riana Tauro <riana.tauro@intel.com>
Cc: <anshuman.gupta@intel.com>, <rodrigo.vivi@intel.com>,
<aravind.iddamsetty@linux.intel.com>, <badal.nilawar@intel.com>,
<raag.jadav@intel.com>, <ravi.kishore.koppuravuri@intel.com>,
<soham.purkait@intel.com>, <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v5 3/6] drm/xe/xe_ras: Add helper to clear error counter
Date: Fri, 8 May 2026 13:20:33 +0530
Message-ID: <c4df83de-8203-4b2d-b0f8-c8076fcadac6@intel.com>
In-Reply-To: <20260504065614.3832331-11-riana.tauro@intel.com>
On 04-05-2026 12:26 pm, Riana Tauro wrote:
> Add structures and helper function to clear error counter value.
>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
> v2: add status codes (Aravind)
> fix log message
> squash structure patch (Raag)
>
> v3: rename function
> add comma to enum members to avoid
> redundant churn
> align with tabs (Raag)
>
> v4: rebase
> ---
> drivers/gpu/drm/xe/xe_ras.c | 76 +++++++++++++++++++
> drivers/gpu/drm/xe/xe_ras.h | 2 +
> drivers/gpu/drm/xe/xe_ras_types.h | 25 ++++++
> drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h | 2 +
> 4 files changed, 105 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 47a58ce3b3ca..07f6837694e7 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -34,6 +34,17 @@ enum xe_ras_component {
> XE_RAS_COMP_MAX
> };
>
> +/* RAS response status codes */
> +enum xe_ras_response_status {
> + XE_RAS_STATUS_SUCCESS = 0,
> + XE_RAS_STATUS_INVALID_PARAM,
> + XE_RAS_STATUS_OP_NOT_SUPPORTED,
> + XE_RAS_STATUS_TIMEOUT,
> + XE_RAS_STATUS_HARDWARE_FAILURE,
> + XE_RAS_STATUS_INSUFFICIENT_RESOURCES,
> + XE_RAS_STATUS_UNKNOWN_ERROR
> +};
Add XE_RAS_STATUS_MAX as the last member here, right? (Same pattern as XE_RAS_COMP_MAX.)
> +
> static const char *const xe_ras_severities[] = {
> [XE_RAS_SEV_NOT_SUPPORTED] = "Not Supported",
> [XE_RAS_SEV_CORRECTABLE] = "Correctable Error",
> @@ -53,6 +64,16 @@ static const char *const xe_ras_components[] = {
> };
> static_assert(ARRAY_SIZE(xe_ras_components) == XE_RAS_COMP_MAX);
>
> +static const int ras_status_to_errno_map[] = {
> + [XE_RAS_STATUS_SUCCESS] = 0,
> + [XE_RAS_STATUS_INVALID_PARAM] = -EINVAL,
> + [XE_RAS_STATUS_OP_NOT_SUPPORTED] = -EOPNOTSUPP,
> + [XE_RAS_STATUS_TIMEOUT] = -ETIMEDOUT,
> + [XE_RAS_STATUS_HARDWARE_FAILURE] = -EIO,
> + [XE_RAS_STATUS_INSUFFICIENT_RESOURCES] = -ENAVAIL,
> + [XE_RAS_STATUS_UNKNOWN_ERROR] = -ENODATA
Are ENAVAIL (No XENIX semaphores available) and ENODATA (No data available) the correct errno choices here?
> +};
> +
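For reference, a small user-space sketch that prints the strerror() text for the two values used in the patch next to two possible alternatives. ENOMEM and EIO are assumptions picked only for discussion; nothing in this thread says they are the agreed replacements. The `candidates` table and `print_candidates()` are hypothetical names.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* One errno candidate and a label saying where it comes from. */
struct errno_candidate {
	const char *name;
	int err;
};

/* The two "alternative" rows are assumptions for discussion only. */
static const struct errno_candidate candidates[] = {
	{ "ENAVAIL (patch)",      ENAVAIL },
	{ "ENODATA (patch)",      ENODATA },
	{ "ENOMEM (alternative)", ENOMEM },
	{ "EIO (alternative)",    EIO },
};

/* Print the message userspace would see for each candidate. */
static void print_candidates(void)
{
	for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
		printf("%-22s -> %s\n", candidates[i].name,
		       strerror(candidates[i].err));
}
```

Calling print_candidates() on a Linux system shows the text a caller would get back from each choice, which makes it easier to judge whether the message matches the firmware status it stands for.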
> /* Mapping from drm_xe_ras_error_component to xe_ras_component */
> static const int drm_to_xe_ras_component[] = {
> [DRM_XE_RAS_ERR_COMP_CORE_COMPUTE] = XE_RAS_COMP_CORE_COMPUTE,
> @@ -70,6 +91,13 @@ static const int drm_to_xe_ras_severity[] = {
> };
> static_assert(ARRAY_SIZE(drm_to_xe_ras_severity) == DRM_XE_RAS_ERR_SEV_MAX);
>
> +static int ras_status_to_errno(enum xe_ras_response_status status)
> +{
> + if (status > XE_RAS_STATUS_UNKNOWN_ERROR)
> + status = XE_RAS_STATUS_UNKNOWN_ERROR;
> +
Check status against XE_RAS_STATUS_MAX instead; if it is out of range, log an
error and return -EIO.
Thanks,
-/Mallesh
> + return ras_status_to_errno_map[status];
> +}
> static inline const char *sev_to_str(u8 severity)
> {
> if (severity >= XE_RAS_SEV_MAX)
> @@ -182,3 +210,51 @@ int xe_ras_get_counter(struct xe_device *xe, enum drm_xe_ras_error_severity seve
> guard(xe_pm_runtime)(xe);
> return get_counter(xe, &error_class, value);
> }
> +
> +/**
> + * xe_ras_clear_counter() - Clear error counter value
> + * @xe: xe device instance
> + * @severity: Error severity level to be cleared
> + * @error_id: Error component to be cleared
> + *
> + * This function clears the value of a specific error counter based on
> + * the error severity and component.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int xe_ras_clear_counter(struct xe_device *xe, enum drm_xe_ras_error_severity severity,
> + u32 error_id)
> +{
> + struct xe_ras_clear_counter_response response = {0};
> + struct xe_ras_clear_counter_request request = {0};
> + struct xe_sysctrl_mailbox_command command = {0};
> + struct xe_ras_error_class *error_class;
> + size_t rlen;
> + int ret;
> +
> + error_class = &request.error_class;
> + error_class->common.severity = drm_to_xe_ras_severity[severity];
> + error_class->common.component = drm_to_xe_ras_component[error_id];
> +
> + prepare_ras_command(&command, XE_SYSCTRL_CMD_CLEAR_COUNTER, &request, sizeof(request),
> + &response, sizeof(response));
> +
> + guard(xe_pm_runtime)(xe);
> + ret = xe_sysctrl_send_command(&xe->sc, &command, &rlen);
> + if (ret) {
> + xe_err(xe, "sysctrl: failed to clear counter %d\n", ret);
> + return ret;
> + }
> +
> + if (rlen != sizeof(response)) {
> + xe_err(xe, "sysctrl: unexpected clear counter response length %zu (expected %zu)\n",
> + rlen, sizeof(response));
> + return -EIO;
> + }
> +
> + ret = ras_status_to_errno(response.status);
> + if (ret)
> + xe_err(xe, "sysctrl: clear counter command failed with status %d\n", ret);
> +
> + return ret;
> +}
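The order of checks in xe_ras_clear_counter() can be summarised in a small pure function. This is only an illustrative user-space sketch: `clear_counter_result()` is a hypothetical name, and the already-translated status is passed in as a plain errno so the sketch does not depend on the mailbox code.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the precedence used in the patch:
 * transport error first, then response length, then firmware status.
 */
static int clear_counter_result(int send_ret, size_t rlen,
				size_t expected_len, int status_errno)
{
	/* Mailbox send failed: the response buffer is not meaningful. */
	if (send_ret)
		return send_ret;

	/* Short or oversized response: do not trust the status field. */
	if (rlen != expected_len)
		return -EIO;

	/* Response is well formed: report the translated status. */
	return status_errno;
}
```

For example, clear_counter_result(0, 16, 32, 0) returns -EIO because the length check runs before the status is looked at, which matches the early return in the patch.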
> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
> index 74582c911b02..bbb9d42bd128 100644
> --- a/drivers/gpu/drm/xe/xe_ras.h
> +++ b/drivers/gpu/drm/xe/xe_ras.h
> @@ -15,5 +15,7 @@ void xe_ras_counter_threshold_crossed(struct xe_device *xe,
> struct xe_sysctrl_event_response *response);
> int xe_ras_get_counter(struct xe_device *xe, enum drm_xe_ras_error_severity severity,
> u32 error_id, u32 *value);
> +int xe_ras_clear_counter(struct xe_device *xe, enum drm_xe_ras_error_severity severity,
> + u32 error_id);
>
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
> index 74d85875cd63..44369fc8ef03 100644
> --- a/drivers/gpu/drm/xe/xe_ras_types.h
> +++ b/drivers/gpu/drm/xe/xe_ras_types.h
> @@ -100,4 +100,29 @@ struct xe_ras_get_counter_response {
> u32 reserved1[56];
> } __packed;
>
> +/**
> + * struct xe_ras_clear_counter_request - Request for clearing an error counter
> + */
> +struct xe_ras_clear_counter_request {
> + /** @error_class: Counter class to be cleared */
> + struct xe_ras_error_class error_class;
> + /** @reserved: Reserved for future use */
> + u32 reserved;
> +} __packed;
> +
> +/**
> + * struct xe_ras_clear_counter_response - Response after clearing an error counter
> + */
> +struct xe_ras_clear_counter_response {
> + /** @error_class: Counter class that was cleared */
> + struct xe_ras_error_class error_class;
> + /** @previous_counter_value: Counter value before clearing */
> + u32 previous_counter_value;
> + /** @clear_timestamp: Timestamp when the counter was cleared */
> + u64 clear_timestamp;
> + /** @status: Status of the clear operation */
> + u32 status;
> + /** @reserved: Reserved for future use */
> + u32 reserved[3];
> +} __packed;
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
> index b315847cbf64..6e3753554510 100644
> --- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
> +++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
> @@ -23,10 +23,12 @@ enum xe_sysctrl_group {
> * enum xe_sysctrl_gfsp_cmd - Commands supported by GFSP group
> *
> * @XE_SYSCTRL_CMD_GET_COUNTER: Get error counter value
> + * @XE_SYSCTRL_CMD_CLEAR_COUNTER: Clear error counter value
> * @XE_SYSCTRL_CMD_GET_PENDING_EVENT: Retrieve pending event
> */
> enum xe_sysctrl_gfsp_cmd {
> XE_SYSCTRL_CMD_GET_COUNTER = 0x03,
> + XE_SYSCTRL_CMD_CLEAR_COUNTER = 0x04,
> XE_SYSCTRL_CMD_GET_PENDING_EVENT = 0x07,
> };
>