From: "Tauro, Riana" <riana.tauro@intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>,
	<maarten.lankhorst@linux.intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>, <netdev@vger.kernel.org>,
	Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>,
	<joonas.lahtinen@linux.intel.com>,
	<aravind.iddamsetty@linux.intel.com>, <anshuman.gupta@intel.com>,
	<simona.vetter@ffwll.ch>, <airlied@gmail.com>,
	<pratik.bari@intel.com>, <joshua.santosh.ranjan@intel.com>,
	<ashwin.kumar.kulkarni@intel.com>, <shubham.kumar@intel.com>,
	<ravi.kishore.koppuravuri@intel.com>, <raag.jadav@intel.com>,
	<anvesh.bakwad@intel.com>, Jakub Kicinski <kuba@kernel.org>,
	Lijo Lazar <lijo.lazar@amd.com>,
	Hawking Zhang <Hawking.Zhang@amd.com>,
	"David S. Miller" <davem@davemloft.net>,
	Paolo Abeni <pabeni@redhat.com>,
	Eric Dumazet <edumazet@google.com>
Subject: Re: [PATCH v2 1/2] drm/drm_ras: Add clear-error-counter netlink command to drm_ras
Date: Fri, 10 Apr 2026 10:51:47 +0530
Message-ID: <ee2681bd-5223-4957-b10e-5dbde0c0e974@intel.com>
In-Reply-To: <aderpXuhLoiUoxcT@intel.com>

Hi Rodrigo

On 4/9/2026 7:07 PM, Rodrigo Vivi wrote:
> On Thu, Apr 09, 2026 at 12:51:44PM +0530, Tauro, Riana wrote:
>> Hi Zack
>>
>> Could you please take a look at this patch and check whether it is
>> applicable to your use case? Please let me know if any changes are
>> required.
>>
>> @Rodrigo This is already reviewed by Jakub and Raag.
>> If there are no open issues, can this be merged via drm-misc?
> If we push this to drm-misc-next, it might take a few weeks to propagate
> back to drm-xe-next. With other work from you and Raag moving at a fast
> pace on drm-xe-next around this area, I'm afraid it could cause some
> conflicts.
>
> It is definitely fine by me, but another option is to get ack from
> drm-misc maintainers to get this through drm-xe-next.
>

Yes, going through drm-xe-next would be better, since the other RAS
patches are close to being merged.

@Maarten Could you please help with an ack if this patch looks good to you?
It has been reviewed by Jakub from netdev and Raag from intel-xe.
There are no other open issues.

Thanks
Riana

>
> so, really okay with drm-misc-next?
>
>> Thanks
>> Riana
>>
>> On 4/9/2026 1:03 PM, Riana Tauro wrote:
>>> Introduce a new 'clear-error-counter' drm_ras command to reset the counter
>>> value for a specific error counter of a given node.
>>>
>>> The command is a 'do' netlink request with 'node-id' and 'error-id'
>>> as parameters with no response payload.
>>>
>>> Usage:
>>>
>>> $ sudo ynl --family drm_ras  --do clear-error-counter --json \
>>> '{"node-id":1, "error-id":1}'
>>> None
>>>
>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>> Cc: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>
>>> Cc: Lijo Lazar <lijo.lazar@amd.com>
>>> Cc: Hawking Zhang <Hawking.Zhang@amd.com>
>>> Cc: David S. Miller <davem@davemloft.net>
>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>>> Reviewed-by: Jakub Kicinski <kuba@kernel.org>
>>> Reviewed-by: Raag Jadav <raag.jadav@intel.com>
>>> ---
>>>    Documentation/gpu/drm-ras.rst            |  8 +++++
>>>    Documentation/netlink/specs/drm_ras.yaml | 13 ++++++-
>>>    drivers/gpu/drm/drm_ras.c                | 43 +++++++++++++++++++++++-
>>>    drivers/gpu/drm/drm_ras_nl.c             | 13 +++++++
>>>    drivers/gpu/drm/drm_ras_nl.h             |  2 ++
>>>    include/drm/drm_ras.h                    | 11 ++++++
>>>    include/uapi/drm/drm_ras.h               |  1 +
>>>    7 files changed, 89 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/Documentation/gpu/drm-ras.rst b/Documentation/gpu/drm-ras.rst
>>> index 70b246a78fc8..4636e68f5678 100644
>>> --- a/Documentation/gpu/drm-ras.rst
>>> +++ b/Documentation/gpu/drm-ras.rst
>>> @@ -52,6 +52,8 @@ User space tools can:
>>>      as a parameter.
>>>    * Query specific error counter values with the ``get-error-counter`` command, using both
>>>      ``node-id`` and ``error-id`` as parameters.
>>> +* Clear specific error counters with the ``clear-error-counter`` command, using both
>>> +  ``node-id`` and ``error-id`` as parameters.
>>>    YAML-based Interface
>>>    --------------------
>>> @@ -101,3 +103,9 @@ Example: Query an error counter for a given node
>>>        sudo ynl --family drm_ras --do get-error-counter --json '{"node-id":0, "error-id":1}'
>>>        {'error-id': 1, 'error-name': 'error_name1', 'error-value': 0}
>>> +Example: Clear an error counter for a given node
>>> +
>>> +.. code-block:: bash
>>> +
>>> +    sudo ynl --family drm_ras --do clear-error-counter --json '{"node-id":0, "error-id":1}'
>>> +    None
>>> diff --git a/Documentation/netlink/specs/drm_ras.yaml b/Documentation/netlink/specs/drm_ras.yaml
>>> index 79af25dac3c5..e113056f8c01 100644
>>> --- a/Documentation/netlink/specs/drm_ras.yaml
>>> +++ b/Documentation/netlink/specs/drm_ras.yaml
>>> @@ -99,7 +99,7 @@ operations:
>>>          flags: [admin-perm]
>>>          do:
>>>            request:
>>> -          attributes:
>>> +          attributes: &id-attrs
>>>                - node-id
>>>                - error-id
>>>            reply:
>>> @@ -113,3 +113,14 @@ operations:
>>>                - node-id
>>>            reply:
>>>              attributes: *errorinfo
>>> +    -
>>> +      name: clear-error-counter
>>> +      doc: >-
>>> +           Clear error counter for a given node.
>>> +           The request includes the error-id and node-id of the
>>> +           counter to be cleared.
>>> +      attribute-set: error-counter-attrs
>>> +      flags: [admin-perm]
>>> +      do:
>>> +        request:
>>> +          attributes: *id-attrs
>>> diff --git a/drivers/gpu/drm/drm_ras.c b/drivers/gpu/drm/drm_ras.c
>>> index b2fa5ab86d87..d6eab29a1394 100644
>>> --- a/drivers/gpu/drm/drm_ras.c
>>> +++ b/drivers/gpu/drm/drm_ras.c
>>> @@ -26,7 +26,7 @@
>>>     * efficient lookup by ID. Nodes can be registered or unregistered
>>>     * dynamically at runtime.
>>>     *
>>> - * A Generic Netlink family `drm_ras` exposes two main operations to
>>> + * A Generic Netlink family `drm_ras` exposes the below operations to
>>>     * userspace:
>>>     *
>>>     * 1. LIST_NODES: Dump all currently registered RAS nodes.
>>> @@ -37,6 +37,10 @@
>>>     *    Returns all counters of a node if only Node ID is provided or specific
>>>     *    error counters.
>>>     *
>>> + * 3. CLEAR_ERROR_COUNTER: Clear error counter of a given node.
>>> + *    Userspace must provide Node ID, Error ID.
>>> + *    Clears specific error counter of a node if supported.
>>> + *
>>>     * Node registration:
>>>     *
>>>     * - drm_ras_node_register(): Registers a new node and assigns
>>> @@ -66,6 +70,8 @@
>>>     *   operation, fetching all counters from a specific node.
>>>     * - drm_ras_nl_get_error_counter_doit(): Implements the GET_ERROR_COUNTER doit
>>>     *   operation, fetching a counter value from a specific node.
>>> + * - drm_ras_nl_clear_error_counter_doit(): Implements the CLEAR_ERROR_COUNTER doit
>>> + *   operation, clearing a counter value from a specific node.
>>>     */
>>>    static DEFINE_XARRAY_ALLOC(drm_ras_xa);
>>> @@ -314,6 +320,41 @@ int drm_ras_nl_get_error_counter_doit(struct sk_buff *skb,
>>>    	return doit_reply_value(info, node_id, error_id);
>>>    }
>>> +/**
>>> + * drm_ras_nl_clear_error_counter_doit() - Clear an error counter of a node
>>> + * @skb: Netlink message buffer
>>> + * @info: Generic Netlink info containing attributes of the request
>>> + *
>>> + * Extracts the node ID and error ID from the netlink attributes and
>>> + * clears the current value.
>>> + *
>>> + * Return: 0 on success, or negative errno on failure.
>>> + */
>>> +int drm_ras_nl_clear_error_counter_doit(struct sk_buff *skb,
>>> +					struct genl_info *info)
>>> +{
>>> +	struct drm_ras_node *node;
>>> +	u32 node_id, error_id;
>>> +
>>> +	if (!info->attrs ||
>>> +	    GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID) ||
>>> +	    GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID))
>>> +		return -EINVAL;
>>> +
>>> +	node_id = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID]);
>>> +	error_id = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID]);
>>> +
>>> +	node = xa_load(&drm_ras_xa, node_id);
>>> +	if (!node || !node->clear_error_counter)
>>> +		return -ENOENT;
>>> +
>>> +	if (error_id < node->error_counter_range.first ||
>>> +	    error_id > node->error_counter_range.last)
>>> +		return -EINVAL;
>>> +
>>> +	return node->clear_error_counter(node, error_id);
>>> +}
>>> +
>>>    /**
>>>     * drm_ras_node_register() - Register a new RAS node
>>>     * @node: Node structure to register
>>> diff --git a/drivers/gpu/drm/drm_ras_nl.c b/drivers/gpu/drm/drm_ras_nl.c
>>> index 16803d0c4a44..dea1c1b2494e 100644
>>> --- a/drivers/gpu/drm/drm_ras_nl.c
>>> +++ b/drivers/gpu/drm/drm_ras_nl.c
>>> @@ -22,6 +22,12 @@ static const struct nla_policy drm_ras_get_error_counter_dump_nl_policy[DRM_RAS_
>>>    	[DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID] = { .type = NLA_U32, },
>>>    };
>>> +/* DRM_RAS_CMD_CLEAR_ERROR_COUNTER - do */
>>> +static const struct nla_policy drm_ras_clear_error_counter_nl_policy[DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID + 1] = {
>>> +	[DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID] = { .type = NLA_U32, },
>>> +	[DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID] = { .type = NLA_U32, },
>>> +};
>>> +
>>>    /* Ops table for drm_ras */
>>>    static const struct genl_split_ops drm_ras_nl_ops[] = {
>>>    	{
>>> @@ -43,6 +49,13 @@ static const struct genl_split_ops drm_ras_nl_ops[] = {
>>>    		.maxattr	= DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID,
>>>    		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP,
>>>    	},
>>> +	{
>>> +		.cmd		= DRM_RAS_CMD_CLEAR_ERROR_COUNTER,
>>> +		.doit		= drm_ras_nl_clear_error_counter_doit,
>>> +		.policy		= drm_ras_clear_error_counter_nl_policy,
>>> +		.maxattr	= DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID,
>>> +		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
>>> +	},
>>>    };
>>>    struct genl_family drm_ras_nl_family __ro_after_init = {
>>> diff --git a/drivers/gpu/drm/drm_ras_nl.h b/drivers/gpu/drm/drm_ras_nl.h
>>> index 06ccd9342773..a398643572a5 100644
>>> --- a/drivers/gpu/drm/drm_ras_nl.h
>>> +++ b/drivers/gpu/drm/drm_ras_nl.h
>>> @@ -18,6 +18,8 @@ int drm_ras_nl_get_error_counter_doit(struct sk_buff *skb,
>>>    				      struct genl_info *info);
>>>    int drm_ras_nl_get_error_counter_dumpit(struct sk_buff *skb,
>>>    					struct netlink_callback *cb);
>>> +int drm_ras_nl_clear_error_counter_doit(struct sk_buff *skb,
>>> +					struct genl_info *info);
>>>    extern struct genl_family drm_ras_nl_family;
>>> diff --git a/include/drm/drm_ras.h b/include/drm/drm_ras.h
>>> index 5d50209e51db..f2a787bc4f64 100644
>>> --- a/include/drm/drm_ras.h
>>> +++ b/include/drm/drm_ras.h
>>> @@ -58,6 +58,17 @@ struct drm_ras_node {
>>>    	int (*query_error_counter)(struct drm_ras_node *node, u32 error_id,
>>>    				   const char **name, u32 *val);
>>> +	/**
>>> +	 * @clear_error_counter:
>>> +	 *
>>> +	 * This callback is used by drm_ras to clear a specific error counter.
>>> +	 * Driver should implement this callback to support clearing error counters
>>> +	 * of a node.
>>> +	 *
>>> +	 * Returns: 0 on success, negative error code on failure.
>>> +	 */
>>> +	int (*clear_error_counter)(struct drm_ras_node *node, u32 error_id);
>>> +
>>>    	/** @priv: Driver private data */
>>>    	void *priv;
>>>    };
>>> diff --git a/include/uapi/drm/drm_ras.h b/include/uapi/drm/drm_ras.h
>>> index 5f40fa5b869d..218a3ee86805 100644
>>> --- a/include/uapi/drm/drm_ras.h
>>> +++ b/include/uapi/drm/drm_ras.h
>>> @@ -41,6 +41,7 @@ enum {
>>>    enum {
>>>    	DRM_RAS_CMD_LIST_NODES = 1,
>>>    	DRM_RAS_CMD_GET_ERROR_COUNTER,
>>> +	DRM_RAS_CMD_CLEAR_ERROR_COUNTER,
>>>    	__DRM_RAS_CMD_MAX,
>>>    	DRM_RAS_CMD_MAX = (__DRM_RAS_CMD_MAX - 1)
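
For anyone following the handler logic in the quoted patch, the checks in
drm_ras_nl_clear_error_counter_doit() can be modelled in plain userspace C.
This is only a rough sketch under stated assumptions, not the kernel code:
struct ras_node, the fixed-size nodes[] table (standing in for the xarray),
demo_clear() and ras_clear_error_counter() are all hypothetical names made up
for illustration. It mirrors the order of checks in the patch: unknown node or
missing callback gives -ENOENT, an out-of-range error-id gives -EINVAL,
otherwise the driver callback decides the result.

```c
/* Userspace model of the clear-error-counter dispatch path; not kernel code. */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES    4	/* hypothetical table size; the kernel uses an xarray */
#define MAX_COUNTERS 8	/* hypothetical per-node counter storage */

/* Simplified, hypothetical stand-in for struct drm_ras_node. */
struct ras_node {
	uint32_t first;	/* lowest valid error-id */
	uint32_t last;	/* highest valid error-id */
	/* Optional driver callback; NULL means clearing is not supported. */
	int (*clear_error_counter)(struct ras_node *node, uint32_t error_id);
	void *priv;	/* driver private data: here, a uint32_t counter array */
};

/* Stand-in for the node registry, indexed by node-id. */
static struct ras_node *nodes[MAX_NODES];

/*
 * Example driver callback: zero one counter.
 * Assumes priv holds at least (last - first + 1) counters and that the
 * caller has already range-checked error_id.
 */
static int demo_clear(struct ras_node *node, uint32_t error_id)
{
	uint32_t *counters = node->priv;

	counters[error_id - node->first] = 0;
	return 0;
}

/* Same order of checks as the doit handler in the quoted patch. */
static int ras_clear_error_counter(uint32_t node_id, uint32_t error_id)
{
	struct ras_node *node;

	if (node_id >= MAX_NODES)
		return -ENOENT;

	node = nodes[node_id];
	if (!node || !node->clear_error_counter)
		return -ENOENT;

	if (error_id < node->first || error_id > node->last)
		return -EINVAL;

	return node->clear_error_counter(node, error_id);
}
```

Because the callback is checked before the range, a node that does not
implement clearing reports -ENOENT regardless of the error-id, which matches
the quoted handler.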
