* Re: [PATCH v2 1/2] drm/drm_ras: Add clear-error-counter netlink command to drm_ras
2026-04-09 7:33 ` [PATCH v2 1/2] drm/drm_ras: Add clear-error-counter netlink " Riana Tauro
@ 2026-04-09 7:21 ` Tauro, Riana
0 siblings, 0 replies; 4+ messages in thread
From: Tauro, Riana @ 2026-04-09 7:21 UTC (permalink / raw)
To: intel-xe, dri-devel, netdev, rodrigo.vivi, Zack McKevitt,
joonas.lahtinen, aravind.iddamsetty
Cc: anshuman.gupta, simona.vetter, airlied, pratik.bari,
joshua.santosh.ranjan, ashwin.kumar.kulkarni, shubham.kumar,
ravi.kishore.koppuravuri, raag.jadav, anvesh.bakwad,
maarten.lankhorst, Jakub Kicinski, Lijo Lazar, Hawking Zhang,
David S. Miller, Paolo Abeni, Eric Dumazet
Hi Zack
Could you please take a look at this patch if applicable to your
usecase. Please let me know if any
changes are required
@Rodrigo This is already reviewed by Jakub and Raag.
If there are no opens, can this be merged via drm_misc
Thanks
Riana
On 4/9/2026 1:03 PM, Riana Tauro wrote:
> Introduce a new 'clear-error-counter' drm_ras command to reset the counter
> value for a specific error counter of a given node.
>
> The command is a 'do' netlink request with 'node-id' and 'error-id'
> as parameters with no response payload.
>
> Usage:
>
> $ sudo ynl --family drm_ras --do clear-error-counter --json \
> '{"node-id":1, "error-id":1}'
> None
>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>
> Cc: Lijo Lazar <lijo.lazar@amd.com>
> Cc: Hawking Zhang <Hawking.Zhang@amd.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> Reviewed-by: Jakub Kicinski <kuba@kernel.org>
> Reviewed-by: Raag Jadav <raag.jadav@intel.com>
> ---
> Documentation/gpu/drm-ras.rst | 8 +++++
> Documentation/netlink/specs/drm_ras.yaml | 13 ++++++-
> drivers/gpu/drm/drm_ras.c | 43 +++++++++++++++++++++++-
> drivers/gpu/drm/drm_ras_nl.c | 13 +++++++
> drivers/gpu/drm/drm_ras_nl.h | 2 ++
> include/drm/drm_ras.h | 11 ++++++
> include/uapi/drm/drm_ras.h | 1 +
> 7 files changed, 89 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/gpu/drm-ras.rst b/Documentation/gpu/drm-ras.rst
> index 70b246a78fc8..4636e68f5678 100644
> --- a/Documentation/gpu/drm-ras.rst
> +++ b/Documentation/gpu/drm-ras.rst
> @@ -52,6 +52,8 @@ User space tools can:
> as a parameter.
> * Query specific error counter values with the ``get-error-counter`` command, using both
> ``node-id`` and ``error-id`` as parameters.
> +* Clear specific error counters with the ``clear-error-counter`` command, using both
> + ``node-id`` and ``error-id`` as parameters.
>
> YAML-based Interface
> --------------------
> @@ -101,3 +103,9 @@ Example: Query an error counter for a given node
> sudo ynl --family drm_ras --do get-error-counter --json '{"node-id":0, "error-id":1}'
> {'error-id': 1, 'error-name': 'error_name1', 'error-value': 0}
>
> +Example: Clear an error counter for a given node
> +
> +.. code-block:: bash
> +
> + sudo ynl --family drm_ras --do clear-error-counter --json '{"node-id":0, "error-id":1}'
> + None
> diff --git a/Documentation/netlink/specs/drm_ras.yaml b/Documentation/netlink/specs/drm_ras.yaml
> index 79af25dac3c5..e113056f8c01 100644
> --- a/Documentation/netlink/specs/drm_ras.yaml
> +++ b/Documentation/netlink/specs/drm_ras.yaml
> @@ -99,7 +99,7 @@ operations:
> flags: [admin-perm]
> do:
> request:
> - attributes:
> + attributes: &id-attrs
> - node-id
> - error-id
> reply:
> @@ -113,3 +113,14 @@ operations:
> - node-id
> reply:
> attributes: *errorinfo
> + -
> + name: clear-error-counter
> + doc: >-
> + Clear error counter for a given node.
> + The request includes the error-id and node-id of the
> + counter to be cleared.
> + attribute-set: error-counter-attrs
> + flags: [admin-perm]
> + do:
> + request:
> + attributes: *id-attrs
> diff --git a/drivers/gpu/drm/drm_ras.c b/drivers/gpu/drm/drm_ras.c
> index b2fa5ab86d87..d6eab29a1394 100644
> --- a/drivers/gpu/drm/drm_ras.c
> +++ b/drivers/gpu/drm/drm_ras.c
> @@ -26,7 +26,7 @@
> * efficient lookup by ID. Nodes can be registered or unregistered
> * dynamically at runtime.
> *
> - * A Generic Netlink family `drm_ras` exposes two main operations to
> + * A Generic Netlink family `drm_ras` exposes the below operations to
> * userspace:
> *
> * 1. LIST_NODES: Dump all currently registered RAS nodes.
> @@ -37,6 +37,10 @@
> * Returns all counters of a node if only Node ID is provided or specific
> * error counters.
> *
> + * 3. CLEAR_ERROR_COUNTER: Clear error counter of a given node.
> + * Userspace must provide Node ID, Error ID.
> + * Clears specific error counter of a node if supported.
> + *
> * Node registration:
> *
> * - drm_ras_node_register(): Registers a new node and assigns
> @@ -66,6 +70,8 @@
> * operation, fetching all counters from a specific node.
> * - drm_ras_nl_get_error_counter_doit(): Implements the GET_ERROR_COUNTER doit
> * operation, fetching a counter value from a specific node.
> + * - drm_ras_nl_clear_error_counter_doit(): Implements the CLEAR_ERROR_COUNTER doit
> + * operation, clearing a counter value from a specific node.
> */
>
> static DEFINE_XARRAY_ALLOC(drm_ras_xa);
> @@ -314,6 +320,41 @@ int drm_ras_nl_get_error_counter_doit(struct sk_buff *skb,
> return doit_reply_value(info, node_id, error_id);
> }
>
> +/**
> + * drm_ras_nl_clear_error_counter_doit() - Clear an error counter of a node
> + * @skb: Netlink message buffer
> + * @info: Generic Netlink info containing attributes of the request
> + *
> + * Extracts the node ID and error ID from the netlink attributes and
> + * clears the current value.
> + *
> + * Return: 0 on success, or negative errno on failure.
> + */
> +int drm_ras_nl_clear_error_counter_doit(struct sk_buff *skb,
> + struct genl_info *info)
> +{
> + struct drm_ras_node *node;
> + u32 node_id, error_id;
> +
> + if (!info->attrs ||
> + GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID) ||
> + GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID))
> + return -EINVAL;
> +
> + node_id = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID]);
> + error_id = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID]);
> +
> + node = xa_load(&drm_ras_xa, node_id);
> + if (!node || !node->clear_error_counter)
> + return -ENOENT;
> +
> + if (error_id < node->error_counter_range.first ||
> + error_id > node->error_counter_range.last)
> + return -EINVAL;
> +
> + return node->clear_error_counter(node, error_id);
> +}
> +
> /**
> * drm_ras_node_register() - Register a new RAS node
> * @node: Node structure to register
> diff --git a/drivers/gpu/drm/drm_ras_nl.c b/drivers/gpu/drm/drm_ras_nl.c
> index 16803d0c4a44..dea1c1b2494e 100644
> --- a/drivers/gpu/drm/drm_ras_nl.c
> +++ b/drivers/gpu/drm/drm_ras_nl.c
> @@ -22,6 +22,12 @@ static const struct nla_policy drm_ras_get_error_counter_dump_nl_policy[DRM_RAS_
> [DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID] = { .type = NLA_U32, },
> };
>
> +/* DRM_RAS_CMD_CLEAR_ERROR_COUNTER - do */
> +static const struct nla_policy drm_ras_clear_error_counter_nl_policy[DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID + 1] = {
> + [DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID] = { .type = NLA_U32, },
> + [DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID] = { .type = NLA_U32, },
> +};
> +
> /* Ops table for drm_ras */
> static const struct genl_split_ops drm_ras_nl_ops[] = {
> {
> @@ -43,6 +49,13 @@ static const struct genl_split_ops drm_ras_nl_ops[] = {
> .maxattr = DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID,
> .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP,
> },
> + {
> + .cmd = DRM_RAS_CMD_CLEAR_ERROR_COUNTER,
> + .doit = drm_ras_nl_clear_error_counter_doit,
> + .policy = drm_ras_clear_error_counter_nl_policy,
> + .maxattr = DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID,
> + .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
> + },
> };
>
> struct genl_family drm_ras_nl_family __ro_after_init = {
> diff --git a/drivers/gpu/drm/drm_ras_nl.h b/drivers/gpu/drm/drm_ras_nl.h
> index 06ccd9342773..a398643572a5 100644
> --- a/drivers/gpu/drm/drm_ras_nl.h
> +++ b/drivers/gpu/drm/drm_ras_nl.h
> @@ -18,6 +18,8 @@ int drm_ras_nl_get_error_counter_doit(struct sk_buff *skb,
> struct genl_info *info);
> int drm_ras_nl_get_error_counter_dumpit(struct sk_buff *skb,
> struct netlink_callback *cb);
> +int drm_ras_nl_clear_error_counter_doit(struct sk_buff *skb,
> + struct genl_info *info);
>
> extern struct genl_family drm_ras_nl_family;
>
> diff --git a/include/drm/drm_ras.h b/include/drm/drm_ras.h
> index 5d50209e51db..f2a787bc4f64 100644
> --- a/include/drm/drm_ras.h
> +++ b/include/drm/drm_ras.h
> @@ -58,6 +58,17 @@ struct drm_ras_node {
> int (*query_error_counter)(struct drm_ras_node *node, u32 error_id,
> const char **name, u32 *val);
>
> + /**
> + * @clear_error_counter:
> + *
> + * This callback is used by drm_ras to clear a specific error counter.
> + * Driver should implement this callback to support clearing error counters
> + * of a node.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> + int (*clear_error_counter)(struct drm_ras_node *node, u32 error_id);
> +
> /** @priv: Driver private data */
> void *priv;
> };
> diff --git a/include/uapi/drm/drm_ras.h b/include/uapi/drm/drm_ras.h
> index 5f40fa5b869d..218a3ee86805 100644
> --- a/include/uapi/drm/drm_ras.h
> +++ b/include/uapi/drm/drm_ras.h
> @@ -41,6 +41,7 @@ enum {
> enum {
> DRM_RAS_CMD_LIST_NODES = 1,
> DRM_RAS_CMD_GET_ERROR_COUNTER,
> + DRM_RAS_CMD_CLEAR_ERROR_COUNTER,
>
> __DRM_RAS_CMD_MAX,
> DRM_RAS_CMD_MAX = (__DRM_RAS_CMD_MAX - 1)
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 0/2] Add clear-error-counter command to drm_ras
@ 2026-04-09 7:33 Riana Tauro
2026-04-09 7:33 ` [PATCH v2 1/2] drm/drm_ras: Add clear-error-counter netlink " Riana Tauro
2026-04-09 7:33 ` [PATCH v2 2/2] drm/xe/xe_drm_ras: Add support for clear-error-counter in XE drm_ras Riana Tauro
0 siblings, 2 replies; 4+ messages in thread
From: Riana Tauro @ 2026-04-09 7:33 UTC (permalink / raw)
To: intel-xe, dri-devel, netdev
Cc: aravind.iddamsetty, anshuman.gupta, rodrigo.vivi, joonas.lahtinen,
simona.vetter, airlied, pratik.bari, joshua.santosh.ranjan,
ashwin.kumar.kulkarni, shubham.kumar, ravi.kishore.koppuravuri,
raag.jadav, anvesh.bakwad, maarten.lankhorst, Riana Tauro
Add clear-error-counter command to drm_ras to clear a specific error
counter of a node. The request parameters for this command are
node-id and error-id and no response payload.
Implement the callback in XE driver to demonstrate usage.
Usage:
$ sudo ynl --family drm_ras --dump get-error-counter --json '{"node-id":1}'
[{'error-id': 1, 'error-name': 'core-compute', 'error-value': 0},
{'error-id': 2, 'error-name': 'soc-internal', 'error-value': 3}]
$ sudo ynl --family drm_ras --do clear-error-counter --json \
'{"node-id":1, "error-id":2}'
None
$ sudo ynl --family drm_ras --dump get-error-counter --json '{"node-id":1}'
[{'error-id': 1, 'error-name': 'core-compute', 'error-value': 0},
{'error-id': 2, 'error-name': 'soc-internal', 'error-value': 0}]
Rev2: Split patches
Riana Tauro (2):
drm/drm_ras: Add clear-error-counter netlink command to drm_ras
drm/xe/xe_drm_ras: Add support for clear-error-counter in XE drm_ras
Documentation/gpu/drm-ras.rst | 8 +++++
Documentation/netlink/specs/drm_ras.yaml | 13 ++++++-
drivers/gpu/drm/drm_ras.c | 43 +++++++++++++++++++++++-
drivers/gpu/drm/drm_ras_nl.c | 13 +++++++
drivers/gpu/drm/drm_ras_nl.h | 2 ++
drivers/gpu/drm/xe/xe_drm_ras.c | 35 +++++++++++++++++--
include/drm/drm_ras.h | 11 ++++++
include/uapi/drm/drm_ras.h | 1 +
8 files changed, 122 insertions(+), 4 deletions(-)
--
2.47.1
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 1/2] drm/drm_ras: Add clear-error-counter netlink command to drm_ras
2026-04-09 7:33 [PATCH v2 0/2] Add clear-error-counter command to drm_ras Riana Tauro
@ 2026-04-09 7:33 ` Riana Tauro
2026-04-09 7:21 ` Tauro, Riana
2026-04-09 7:33 ` [PATCH v2 2/2] drm/xe/xe_drm_ras: Add support for clear-error-counter in XE drm_ras Riana Tauro
1 sibling, 1 reply; 4+ messages in thread
From: Riana Tauro @ 2026-04-09 7:33 UTC (permalink / raw)
To: intel-xe, dri-devel, netdev
Cc: aravind.iddamsetty, anshuman.gupta, rodrigo.vivi, joonas.lahtinen,
simona.vetter, airlied, pratik.bari, joshua.santosh.ranjan,
ashwin.kumar.kulkarni, shubham.kumar, ravi.kishore.koppuravuri,
raag.jadav, anvesh.bakwad, maarten.lankhorst, Riana Tauro,
Jakub Kicinski, Zack McKevitt, Lijo Lazar, Hawking Zhang,
David S. Miller, Paolo Abeni, Eric Dumazet
Introduce a new 'clear-error-counter' drm_ras command to reset the counter
value for a specific error counter of a given node.
The command is a 'do' netlink request with 'node-id' and 'error-id'
as parameters with no response payload.
Usage:
$ sudo ynl --family drm_ras --do clear-error-counter --json \
'{"node-id":1, "error-id":1}'
None
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
---
Documentation/gpu/drm-ras.rst | 8 +++++
Documentation/netlink/specs/drm_ras.yaml | 13 ++++++-
drivers/gpu/drm/drm_ras.c | 43 +++++++++++++++++++++++-
drivers/gpu/drm/drm_ras_nl.c | 13 +++++++
drivers/gpu/drm/drm_ras_nl.h | 2 ++
include/drm/drm_ras.h | 11 ++++++
include/uapi/drm/drm_ras.h | 1 +
7 files changed, 89 insertions(+), 2 deletions(-)
diff --git a/Documentation/gpu/drm-ras.rst b/Documentation/gpu/drm-ras.rst
index 70b246a78fc8..4636e68f5678 100644
--- a/Documentation/gpu/drm-ras.rst
+++ b/Documentation/gpu/drm-ras.rst
@@ -52,6 +52,8 @@ User space tools can:
as a parameter.
* Query specific error counter values with the ``get-error-counter`` command, using both
``node-id`` and ``error-id`` as parameters.
+* Clear specific error counters with the ``clear-error-counter`` command, using both
+ ``node-id`` and ``error-id`` as parameters.
YAML-based Interface
--------------------
@@ -101,3 +103,9 @@ Example: Query an error counter for a given node
sudo ynl --family drm_ras --do get-error-counter --json '{"node-id":0, "error-id":1}'
{'error-id': 1, 'error-name': 'error_name1', 'error-value': 0}
+Example: Clear an error counter for a given node
+
+.. code-block:: bash
+
+ sudo ynl --family drm_ras --do clear-error-counter --json '{"node-id":0, "error-id":1}'
+ None
diff --git a/Documentation/netlink/specs/drm_ras.yaml b/Documentation/netlink/specs/drm_ras.yaml
index 79af25dac3c5..e113056f8c01 100644
--- a/Documentation/netlink/specs/drm_ras.yaml
+++ b/Documentation/netlink/specs/drm_ras.yaml
@@ -99,7 +99,7 @@ operations:
flags: [admin-perm]
do:
request:
- attributes:
+ attributes: &id-attrs
- node-id
- error-id
reply:
@@ -113,3 +113,14 @@ operations:
- node-id
reply:
attributes: *errorinfo
+ -
+ name: clear-error-counter
+ doc: >-
+ Clear error counter for a given node.
+ The request includes the error-id and node-id of the
+ counter to be cleared.
+ attribute-set: error-counter-attrs
+ flags: [admin-perm]
+ do:
+ request:
+ attributes: *id-attrs
diff --git a/drivers/gpu/drm/drm_ras.c b/drivers/gpu/drm/drm_ras.c
index b2fa5ab86d87..d6eab29a1394 100644
--- a/drivers/gpu/drm/drm_ras.c
+++ b/drivers/gpu/drm/drm_ras.c
@@ -26,7 +26,7 @@
* efficient lookup by ID. Nodes can be registered or unregistered
* dynamically at runtime.
*
- * A Generic Netlink family `drm_ras` exposes two main operations to
+ * A Generic Netlink family `drm_ras` exposes the below operations to
* userspace:
*
* 1. LIST_NODES: Dump all currently registered RAS nodes.
@@ -37,6 +37,10 @@
* Returns all counters of a node if only Node ID is provided or specific
* error counters.
*
+ * 3. CLEAR_ERROR_COUNTER: Clear error counter of a given node.
+ * Userspace must provide Node ID, Error ID.
+ * Clears specific error counter of a node if supported.
+ *
* Node registration:
*
* - drm_ras_node_register(): Registers a new node and assigns
@@ -66,6 +70,8 @@
* operation, fetching all counters from a specific node.
* - drm_ras_nl_get_error_counter_doit(): Implements the GET_ERROR_COUNTER doit
* operation, fetching a counter value from a specific node.
+ * - drm_ras_nl_clear_error_counter_doit(): Implements the CLEAR_ERROR_COUNTER doit
+ * operation, clearing a counter value from a specific node.
*/
static DEFINE_XARRAY_ALLOC(drm_ras_xa);
@@ -314,6 +320,41 @@ int drm_ras_nl_get_error_counter_doit(struct sk_buff *skb,
return doit_reply_value(info, node_id, error_id);
}
+/**
+ * drm_ras_nl_clear_error_counter_doit() - Clear an error counter of a node
+ * @skb: Netlink message buffer
+ * @info: Generic Netlink info containing attributes of the request
+ *
+ * Extracts the node ID and error ID from the netlink attributes and
+ * clears the current value.
+ *
+ * Return: 0 on success, or negative errno on failure.
+ */
+int drm_ras_nl_clear_error_counter_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct drm_ras_node *node;
+ u32 node_id, error_id;
+
+ if (!info->attrs ||
+ GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID) ||
+ GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID))
+ return -EINVAL;
+
+ node_id = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID]);
+ error_id = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID]);
+
+ node = xa_load(&drm_ras_xa, node_id);
+ if (!node || !node->clear_error_counter)
+ return -ENOENT;
+
+ if (error_id < node->error_counter_range.first ||
+ error_id > node->error_counter_range.last)
+ return -EINVAL;
+
+ return node->clear_error_counter(node, error_id);
+}
+
/**
* drm_ras_node_register() - Register a new RAS node
* @node: Node structure to register
diff --git a/drivers/gpu/drm/drm_ras_nl.c b/drivers/gpu/drm/drm_ras_nl.c
index 16803d0c4a44..dea1c1b2494e 100644
--- a/drivers/gpu/drm/drm_ras_nl.c
+++ b/drivers/gpu/drm/drm_ras_nl.c
@@ -22,6 +22,12 @@ static const struct nla_policy drm_ras_get_error_counter_dump_nl_policy[DRM_RAS_
[DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID] = { .type = NLA_U32, },
};
+/* DRM_RAS_CMD_CLEAR_ERROR_COUNTER - do */
+static const struct nla_policy drm_ras_clear_error_counter_nl_policy[DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID + 1] = {
+ [DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID] = { .type = NLA_U32, },
+ [DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID] = { .type = NLA_U32, },
+};
+
/* Ops table for drm_ras */
static const struct genl_split_ops drm_ras_nl_ops[] = {
{
@@ -43,6 +49,13 @@ static const struct genl_split_ops drm_ras_nl_ops[] = {
.maxattr = DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID,
.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP,
},
+ {
+ .cmd = DRM_RAS_CMD_CLEAR_ERROR_COUNTER,
+ .doit = drm_ras_nl_clear_error_counter_doit,
+ .policy = drm_ras_clear_error_counter_nl_policy,
+ .maxattr = DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID,
+ .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+ },
};
struct genl_family drm_ras_nl_family __ro_after_init = {
diff --git a/drivers/gpu/drm/drm_ras_nl.h b/drivers/gpu/drm/drm_ras_nl.h
index 06ccd9342773..a398643572a5 100644
--- a/drivers/gpu/drm/drm_ras_nl.h
+++ b/drivers/gpu/drm/drm_ras_nl.h
@@ -18,6 +18,8 @@ int drm_ras_nl_get_error_counter_doit(struct sk_buff *skb,
struct genl_info *info);
int drm_ras_nl_get_error_counter_dumpit(struct sk_buff *skb,
struct netlink_callback *cb);
+int drm_ras_nl_clear_error_counter_doit(struct sk_buff *skb,
+ struct genl_info *info);
extern struct genl_family drm_ras_nl_family;
diff --git a/include/drm/drm_ras.h b/include/drm/drm_ras.h
index 5d50209e51db..f2a787bc4f64 100644
--- a/include/drm/drm_ras.h
+++ b/include/drm/drm_ras.h
@@ -58,6 +58,17 @@ struct drm_ras_node {
int (*query_error_counter)(struct drm_ras_node *node, u32 error_id,
const char **name, u32 *val);
+ /**
+ * @clear_error_counter:
+ *
+ * This callback is used by drm_ras to clear a specific error counter.
+ * Driver should implement this callback to support clearing error counters
+ * of a node.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+ int (*clear_error_counter)(struct drm_ras_node *node, u32 error_id);
+
/** @priv: Driver private data */
void *priv;
};
diff --git a/include/uapi/drm/drm_ras.h b/include/uapi/drm/drm_ras.h
index 5f40fa5b869d..218a3ee86805 100644
--- a/include/uapi/drm/drm_ras.h
+++ b/include/uapi/drm/drm_ras.h
@@ -41,6 +41,7 @@ enum {
enum {
DRM_RAS_CMD_LIST_NODES = 1,
DRM_RAS_CMD_GET_ERROR_COUNTER,
+ DRM_RAS_CMD_CLEAR_ERROR_COUNTER,
__DRM_RAS_CMD_MAX,
DRM_RAS_CMD_MAX = (__DRM_RAS_CMD_MAX - 1)
--
2.47.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v2 2/2] drm/xe/xe_drm_ras: Add support for clear-error-counter in XE drm_ras
2026-04-09 7:33 [PATCH v2 0/2] Add clear-error-counter command to drm_ras Riana Tauro
2026-04-09 7:33 ` [PATCH v2 1/2] drm/drm_ras: Add clear-error-counter netlink " Riana Tauro
@ 2026-04-09 7:33 ` Riana Tauro
1 sibling, 0 replies; 4+ messages in thread
From: Riana Tauro @ 2026-04-09 7:33 UTC (permalink / raw)
To: intel-xe, dri-devel, netdev
Cc: aravind.iddamsetty, anshuman.gupta, rodrigo.vivi, joonas.lahtinen,
simona.vetter, airlied, pratik.bari, joshua.santosh.ranjan,
ashwin.kumar.kulkarni, shubham.kumar, ravi.kishore.koppuravuri,
raag.jadav, anvesh.bakwad, maarten.lankhorst, Riana Tauro
Add support for clear-error-counter command in XE drm_ras
This resets the counter value.
Usage:
$ sudo ynl --family drm_ras --do clear-error-counter --json \
'{"node-id":1, "error-id":1}'
None
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
---
drivers/gpu/drm/xe/xe_drm_ras.c | 35 +++++++++++++++++++++++++++++++--
1 file changed, 33 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_drm_ras.c b/drivers/gpu/drm/xe/xe_drm_ras.c
index e07dc23a155e..c21c8b428de6 100644
--- a/drivers/gpu/drm/xe/xe_drm_ras.c
+++ b/drivers/gpu/drm/xe/xe_drm_ras.c
@@ -27,6 +27,16 @@ static int hw_query_error_counter(struct xe_drm_ras_counter *info,
return 0;
}
+static int hw_clear_error_counter(struct xe_drm_ras_counter *info, u32 error_id)
+{
+ if (!info || !info[error_id].name)
+ return -ENOENT;
+
+ atomic_set(&info[error_id].counter, 0);
+
+ return 0;
+}
+
static int query_uncorrectable_error_counter(struct drm_ras_node *ep, u32 error_id,
const char **name, u32 *val)
{
@@ -37,6 +47,15 @@ static int query_uncorrectable_error_counter(struct drm_ras_node *ep, u32 error_
return hw_query_error_counter(info, error_id, name, val);
}
+static int clear_uncorrectable_error_counter(struct drm_ras_node *node, u32 error_id)
+{
+ struct xe_device *xe = node->priv;
+ struct xe_drm_ras *ras = &xe->ras;
+ struct xe_drm_ras_counter *info = ras->info[DRM_XE_RAS_ERR_SEV_UNCORRECTABLE];
+
+ return hw_clear_error_counter(info, error_id);
+}
+
static int query_correctable_error_counter(struct drm_ras_node *ep, u32 error_id,
const char **name, u32 *val)
{
@@ -47,6 +66,15 @@ static int query_correctable_error_counter(struct drm_ras_node *ep, u32 error_id
return hw_query_error_counter(info, error_id, name, val);
}
+static int clear_correctable_error_counter(struct drm_ras_node *node, u32 error_id)
+{
+ struct xe_device *xe = node->priv;
+ struct xe_drm_ras *ras = &xe->ras;
+ struct xe_drm_ras_counter *info = ras->info[DRM_XE_RAS_ERR_SEV_CORRECTABLE];
+
+ return hw_clear_error_counter(info, error_id);
+}
+
static struct xe_drm_ras_counter *allocate_and_copy_counters(struct xe_device *xe)
{
struct xe_drm_ras_counter *counter;
@@ -92,10 +120,13 @@ static int assign_node_params(struct xe_device *xe, struct drm_ras_node *node,
if (IS_ERR(ras->info[severity]))
return PTR_ERR(ras->info[severity]);
- if (severity == DRM_XE_RAS_ERR_SEV_CORRECTABLE)
+ if (severity == DRM_XE_RAS_ERR_SEV_CORRECTABLE) {
node->query_error_counter = query_correctable_error_counter;
- else
+ node->clear_error_counter = clear_correctable_error_counter;
+ } else {
node->query_error_counter = query_uncorrectable_error_counter;
+ node->clear_error_counter = clear_uncorrectable_error_counter;
+ }
return 0;
}
--
2.47.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-04-09 7:22 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-09 7:33 [PATCH v2 0/2] Add clear-error-counter command to drm_ras Riana Tauro
2026-04-09 7:33 ` [PATCH v2 1/2] drm/drm_ras: Add clear-error-counter netlink " Riana Tauro
2026-04-09 7:21 ` Tauro, Riana
2026-04-09 7:33 ` [PATCH v2 2/2] drm/xe/xe_drm_ras: Add support for clear-error-counter in XE drm_ras Riana Tauro
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox