From: "Tauro, Riana" <riana.tauro@intel.com>
To: Raag Jadav <raag.jadav@intel.com>,
	<intel-xe@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>, <netdev@vger.kernel.org>
Cc: <simona.vetter@ffwll.ch>, <airlied@gmail.com>, <kuba@kernel.org>,
	<lijo.lazar@amd.com>, <Hawking.Zhang@amd.com>,
	<davem@davemloft.net>, <pabeni@redhat.com>, <edumazet@google.com>,
	<maarten@lankhorst.se>, <zachary.mckevitt@oss.qualcomm.com>,
	<rodrigo.vivi@intel.com>, <michal.wajdeczko@intel.com>,
	<matthew.d.roper@intel.com>, <umesh.nerlige.ramappa@intel.com>,
	<mallesh.koujalagi@intel.com>, <soham.purkait@intel.com>,
	<anoop.c.vijay@intel.com>, <aravind.iddamsetty@linux.intel.com>
Subject: Re: [PATCH v1 03/11] drm/ras: Introduce set-error-threshold
Date: Wed, 22 Apr 2026 11:42:12 +0530
Message-ID: <663d0ee8-98d6-4f26-ab0f-b3ab44d0fa23@intel.com>
In-Reply-To: <20260417211730.837345-4-raag.jadav@intel.com>


On 4/18/2026 2:46 AM, Raag Jadav wrote:
> Add set-error-threshold command support which allows setting threshold
> value of the error. Threshold in RAS context means the number of errors
> the hardware is expected to accumulate before it raises them to software.
> This is to have a fine grained control over error notifications that are
> raised by the hardware.
>
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> ---
>   Documentation/gpu/drm-ras.rst            |  9 +++++
>   Documentation/netlink/specs/drm_ras.yaml | 12 ++++++
>   drivers/gpu/drm/drm_ras.c                | 48 ++++++++++++++++++++++++
>   drivers/gpu/drm/drm_ras_nl.c             | 14 +++++++
>   drivers/gpu/drm/drm_ras_nl.h             |  2 +
>   include/drm/drm_ras.h                    | 13 +++++++
>   include/uapi/drm/drm_ras.h               |  1 +
>   7 files changed, 99 insertions(+)
>
> diff --git a/Documentation/gpu/drm-ras.rst b/Documentation/gpu/drm-ras.rst
> index 6443dfd1677f..a819aa150604 100644
> --- a/Documentation/gpu/drm-ras.rst
> +++ b/Documentation/gpu/drm-ras.rst
> @@ -54,6 +54,8 @@ User space tools can:
>     ``node-id`` and ``error-id`` as parameters.
>   * Query specific error threshold value with the ``get-error-threshold`` command, using both
>     ``node-id`` and ``error-id`` as parameters.
> +* Set specific error threshold value with the ``set-error-threshold`` command, using
> +  ``node-id``, ``error-id`` and ``error-threshold`` as parameters.
>   
>   YAML-based Interface
>   --------------------
> @@ -109,3 +111,10 @@ Example: Query threshold value of a given error
>   
>       sudo ynl --family drm_ras --do get-error-threshold --json '{"node-id":0, "error-id":1}'
>       {'error-id': 1, 'error-name': 'error_name1', 'error-threshold': 0}
> +
> +Example: Set threshold value of a given error
> +
> +.. code-block:: bash
> +
> +    sudo ynl --family drm_ras --do set-error-threshold --json '{"node-id":0, "error-id":1, "error-threshold":8}'
> +    None
> diff --git a/Documentation/netlink/specs/drm_ras.yaml b/Documentation/netlink/specs/drm_ras.yaml
> index 95a939fb987d..09824309cdff 100644
> --- a/Documentation/netlink/specs/drm_ras.yaml
> +++ b/Documentation/netlink/specs/drm_ras.yaml
> @@ -150,3 +150,15 @@ operations:
>               - error-id
>               - error-name
>               - error-threshold
> +    -
> +      name: set-error-threshold
> +      doc: >-
> +           Set threshold value of the error.
> +      attribute-set: error-threshold-attrs
> +      flags: [admin-perm]
> +      do:
> +        request:
> +          attributes:
> +            - node-id
> +            - error-id
> +            - error-threshold
> diff --git a/drivers/gpu/drm/drm_ras.c b/drivers/gpu/drm/drm_ras.c
> index d2d853d5d69c..e4ff6d87f824 100644
> --- a/drivers/gpu/drm/drm_ras.c
> +++ b/drivers/gpu/drm/drm_ras.c
> @@ -41,6 +41,9 @@
>    *    Userspace must provide Node ID and Error ID.
>    *    Returns the threshold value of a specific error.
>    *
> + * 4. SET_ERROR_THRESHOLD: Set threshold value of the error.
> + *    Userspace must provide Node ID, Error ID and Threshold value to be set.
> + *
>    * Node registration:
>    *
>    * - drm_ras_node_register(): Registers a new node and assigns
> @@ -72,6 +75,8 @@
>    *   operation, fetching a counter value from a specific node.
>    * - drm_ras_nl_get_error_threshold_doit(): Implements the GET_ERROR_THRESHOLD doit
>    *   operation, fetching the threshold value of a specific error.
> + * - drm_ras_nl_set_error_threshold_doit(): Implements the SET_ERROR_THRESHOLD doit
> + *   operation, setting the threshold value of a specific error.
>    */
>   
>   static DEFINE_XARRAY_ALLOC(drm_ras_xa);
> @@ -184,6 +189,21 @@ static int get_node_error_threshold(u32 node_id, u32 error_id,
>   	return node->query_error_threshold(node, error_id, name, value);
>   }
>   
> +static int set_node_error_threshold(u32 node_id, u32 error_id, u32 value)
> +{
> +	struct drm_ras_node *node;
> +
> +	node = xa_load(&drm_ras_xa, node_id);
> +	if (!node || !node->set_error_threshold)
> +		return -ENOENT;

Use -EOPNOTSUPP when the set_error_threshold callback is absent.
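
To illustrate, here is a minimal userspace sketch of how the two checks could
be split. It assumes -ENOENT is kept for a missing node, which the comment
above does not state explicitly. struct mock_node and mock_set() are
hypothetical stand-ins for struct drm_ras_node and a driver callback; this is
not the actual patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct drm_ras_node; only the fields used here. */
struct mock_node {
	uint32_t first;	/* mirrors error_counter_range.first */
	uint32_t last;	/* mirrors error_counter_range.last */
	int (*set_error_threshold)(struct mock_node *node, uint32_t error_id,
				   uint32_t val);
};

/* Hypothetical driver callback that accepts any value. */
static int mock_set(struct mock_node *node, uint32_t error_id, uint32_t val)
{
	(void)node;
	(void)error_id;
	(void)val;
	return 0;
}

/*
 * Takes the node directly instead of looking it up with xa_load(),
 * so that the sketch builds outside the kernel.
 */
static int set_node_error_threshold(struct mock_node *node, uint32_t error_id,
				    uint32_t value)
{
	/* Assumption: a missing node still reports -ENOENT. */
	if (!node)
		return -ENOENT;

	/* The node exists but the driver did not provide the callback. */
	if (!node->set_error_threshold)
		return -EOPNOTSUPP;

	if (error_id < node->first || error_id > node->last)
		return -EINVAL;

	return node->set_error_threshold(node, error_id, value);
}
```

With this split, userspace can tell "no such node" apart from "this node does
not support setting thresholds".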

> +
> +	if (error_id < node->error_counter_range.first ||
> +	    error_id > node->error_counter_range.last)
> +		return -EINVAL;
> +
> +	return node->set_error_threshold(node, error_id, value);
> +}
> +
>   static int msg_reply_counter_value(struct sk_buff *msg, u32 error_id,
>   				   const char *error_name, u32 value)
>   {
> @@ -417,6 +437,34 @@ int drm_ras_nl_get_error_threshold_doit(struct sk_buff *skb,
>   	return doit_reply_threshold_value(info, node_id, error_id);
>   }
>   
> +/**
> + * drm_ras_nl_set_error_threshold_doit() - Set threshold value of the error
> + * @skb: Netlink message buffer
> + * @info: Generic Netlink info containing attributes of the request
> + *
> + * Extracts the node ID, error ID and threshold value from the netlink attributes
> + * and sets the threshold of the corresponding error.
> + *
> + * Return: 0 on success, or negative errno on failure.
> + */
> +int drm_ras_nl_set_error_threshold_doit(struct sk_buff *skb,
> +				      struct genl_info *info)
> +{
> +	u32 node_id, error_id, value;
> +
> +	if (!info->attrs ||
> +	    GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_THRESHOLD_ATTRS_NODE_ID) ||
> +	    GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_THRESHOLD_ATTRS_ERROR_ID) ||
> +	    GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_THRESHOLD_ATTRS_ERROR_THRESHOLD))
> +		return -EINVAL;
> +
> +	node_id = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_THRESHOLD_ATTRS_NODE_ID]);
> +	error_id = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_THRESHOLD_ATTRS_ERROR_ID]);
> +	value = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_THRESHOLD_ATTRS_ERROR_THRESHOLD]);

Do we need a check for a maximum threshold here, perhaps one configured by
the driver? Or is it up to the driver to validate the value? If it is up to
the driver, please state that in the documentation.
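
If validation is left to the driver, the callback could look roughly like the
userspace sketch below. Everything in it is hypothetical: struct mock_hw,
MOCK_NUM_ERRORS, MOCK_MAX_THRESHOLD and the choice of -EINVAL for an
out-of-range value are assumptions for illustration, not taken from this
series.

```c
#include <errno.h>
#include <stdint.h>

#define MOCK_NUM_ERRORS		4u	/* hypothetical number of error IDs */
#define MOCK_MAX_THRESHOLD	255u	/* hypothetical hardware limit */

/* Hypothetical driver state holding one threshold per error ID. */
struct mock_hw {
	uint32_t threshold[MOCK_NUM_ERRORS];
};

/* Driver-side callback that rejects values the hardware cannot hold. */
static int mock_driver_set_error_threshold(struct mock_hw *hw,
					   uint32_t error_id, uint32_t val)
{
	if (error_id >= MOCK_NUM_ERRORS)
		return -EINVAL;

	/* Assumption: an oversized threshold is reported as -EINVAL. */
	if (val > MOCK_MAX_THRESHOLD)
		return -EINVAL;

	hw->threshold[error_id] = val;
	return 0;
}
```

Whichever layer does the check, the documentation should say where the limit
comes from so that userspace knows which errors to expect.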

Thanks
Riana

> +
> +	return set_node_error_threshold(node_id, error_id, value);
> +}
> +
>   /**
>    * drm_ras_node_register() - Register a new RAS node
>    * @node: Node structure to register
> diff --git a/drivers/gpu/drm/drm_ras_nl.c b/drivers/gpu/drm/drm_ras_nl.c
> index 48e231734f4d..8b202d773dac 100644
> --- a/drivers/gpu/drm/drm_ras_nl.c
> +++ b/drivers/gpu/drm/drm_ras_nl.c
> @@ -28,6 +28,13 @@ static const struct nla_policy drm_ras_get_error_threshold_nl_policy[DRM_RAS_A_E
>   	[DRM_RAS_A_ERROR_THRESHOLD_ATTRS_ERROR_ID] = { .type = NLA_U32, },
>   };
>   
> +/* DRM_RAS_CMD_SET_ERROR_THRESHOLD - do */
> +static const struct nla_policy drm_ras_set_error_threshold_nl_policy[DRM_RAS_A_ERROR_THRESHOLD_ATTRS_ERROR_THRESHOLD + 1] = {
> +	[DRM_RAS_A_ERROR_THRESHOLD_ATTRS_NODE_ID] = { .type = NLA_U32, },
> +	[DRM_RAS_A_ERROR_THRESHOLD_ATTRS_ERROR_ID] = { .type = NLA_U32, },
> +	[DRM_RAS_A_ERROR_THRESHOLD_ATTRS_ERROR_THRESHOLD] = { .type = NLA_U32, },
> +};
> +
>   /* Ops table for drm_ras */
>   static const struct genl_split_ops drm_ras_nl_ops[] = {
>   	{
> @@ -56,6 +63,13 @@ static const struct genl_split_ops drm_ras_nl_ops[] = {
>   		.maxattr	= DRM_RAS_A_ERROR_THRESHOLD_ATTRS_ERROR_ID,
>   		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
>   	},
> +	{
> +		.cmd		= DRM_RAS_CMD_SET_ERROR_THRESHOLD,
> +		.doit		= drm_ras_nl_set_error_threshold_doit,
> +		.policy		= drm_ras_set_error_threshold_nl_policy,
> +		.maxattr	= DRM_RAS_A_ERROR_THRESHOLD_ATTRS_ERROR_THRESHOLD,
> +		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
> +	},
>   };
>   
>   struct genl_family drm_ras_nl_family __ro_after_init = {
> diff --git a/drivers/gpu/drm/drm_ras_nl.h b/drivers/gpu/drm/drm_ras_nl.h
> index 540fe22e2312..9db7f5d00201 100644
> --- a/drivers/gpu/drm/drm_ras_nl.h
> +++ b/drivers/gpu/drm/drm_ras_nl.h
> @@ -20,6 +20,8 @@ int drm_ras_nl_get_error_counter_dumpit(struct sk_buff *skb,
>   					struct netlink_callback *cb);
>   int drm_ras_nl_get_error_threshold_doit(struct sk_buff *skb,
>   					struct genl_info *info);
> +int drm_ras_nl_set_error_threshold_doit(struct sk_buff *skb,
> +					struct genl_info *info);
>   
>   extern struct genl_family drm_ras_nl_family;
>   
> diff --git a/include/drm/drm_ras.h b/include/drm/drm_ras.h
> index 50cee70bd065..7a69821b8b78 100644
> --- a/include/drm/drm_ras.h
> +++ b/include/drm/drm_ras.h
> @@ -71,6 +71,19 @@ struct drm_ras_node {
>   	 */
>   	int (*query_error_threshold)(struct drm_ras_node *node, u32 error_id,
>   				     const char **name, u32 *val);
> +	/**
> +	 * @set_error_threshold:
> +	 *
> +	 * This callback is used by drm-ras to set threshold value of a specific
> +	 * error.
> +	 *
> +	 * Driver should expect set_error_threshold() to be called with error_id
> +	 * from `error_counter_range.first` to `error_counter_range.last`.
> +	 *
> +	 * Returns: 0 on success, negative error code on failure.
> +	 */
> +	int (*set_error_threshold)(struct drm_ras_node *node, u32 error_id,
> +				   u32 val);
>   
>   	/** @priv: Driver private data */
>   	void *priv;
> diff --git a/include/uapi/drm/drm_ras.h b/include/uapi/drm/drm_ras.h
> index 49c5ca497d73..8ff0311d0d63 100644
> --- a/include/uapi/drm/drm_ras.h
> +++ b/include/uapi/drm/drm_ras.h
> @@ -52,6 +52,7 @@ enum {
>   	DRM_RAS_CMD_LIST_NODES = 1,
>   	DRM_RAS_CMD_GET_ERROR_COUNTER,
>   	DRM_RAS_CMD_GET_ERROR_THRESHOLD,
> +	DRM_RAS_CMD_SET_ERROR_THRESHOLD,
>   
>   	__DRM_RAS_CMD_MAX,
>   	DRM_RAS_CMD_MAX = (__DRM_RAS_CMD_MAX - 1)
