All of lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Raag Jadav" <raag.jadav@intel.com>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v4 2/5] drm/ras: Introduce error threshold
Date: Wed, 24 Jun 2026 10:15:30 +0000	[thread overview]
Message-ID: <20260624101531.459DC1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260623101043.255897-3-raag.jadav@intel.com>

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [Medium] The `set-error-threshold` operation modifies state but only replies with a generic ACK instead of returning the finalized object state, violating Netlink uAPI guidelines.

Pre-existing issues:
- [Critical] Lockless lookup and dereference of `drm_ras_node` via `xa_load()` leads to a Use-After-Free if the node is concurrently unregistered.
--

commit 90a4ea5c0139594375be477d76556dae20fa84e0
Author: Raag Jadav <raag.jadav@intel.com>
Subject: drm/ras: Introduce error threshold

Add get-error-threshold and set-error-threshold command support which
allows querying/setting error threshold of the counter. Threshold in RAS
context means the number of errors the hardware is expected to accumulate
before it raises them to software.

> diff --git a/Documentation/netlink/specs/drm_ras.yaml b/Documentation/netlink/specs/drm_ras.yaml
> index e113056f8c016..9cf7f9cde2425 100644
> --- a/Documentation/netlink/specs/drm_ras.yaml
> +++ b/Documentation/netlink/specs/drm_ras.yaml
> @@ -124,3 +128,31 @@ operations:
[ ... ]
> +    -
> +      name: set-error-threshold
> +      doc: >-
> +           Set error threshold of a given counter.
> +      attribute-set: error-counter-attrs
> +      flags: [admin-perm]
> +      do:
> +        request:
> +          attributes:
> +            - node-id
> +            - error-id
> +            - error-threshold

[Severity: Medium]
Does this operation need a reply block to return the finalized threshold?

According to the Generic Netlink subsystem uAPI guidelines, operations that
modify state must reply with a full message carrying identifying information
about the object.

If the driver applies bounds checking and clamps or rounds the threshold,
user space won't know the actual applied value since the command only
returns a generic ACK. Is it possible to include a reply block here?

> diff --git a/drivers/gpu/drm/drm_ras.c b/drivers/gpu/drm/drm_ras.c
> index 467a169026fc1..d60c40ac5427a 100644
> --- a/drivers/gpu/drm/drm_ras.c
> +++ b/drivers/gpu/drm/drm_ras.c
> @@ -168,6 +189,40 @@ static int get_node_error_counter(u32 node_id, u32 error_id,
>  	return node->query_error_counter(node, error_id, name, value);
>  }
>  
> +static int get_node_error_threshold(u32 node_id, u32 error_id, const char **name, u32 *threshold)
> +{
> +	struct drm_ras_node *node;
> +
> +	node = xa_load(&drm_ras_xa, node_id);
> +	if (!node)
> +		return -ENOENT;
> +
> +	if (!node->query_error_threshold)
> +		return -EOPNOTSUPP;
> +
> +	if (error_id < node->error_counter_range.first || error_id > node->error_counter_range.last)
> +		return -EINVAL;
> +
> +	return node->query_error_threshold(node, error_id, name, threshold);

[Severity: Critical]
This is a pre-existing issue, but is it safe to dereference the node pointer
here without RCU protection or a reference count?

The xa_load() function drops the RCU read lock before returning. If a
concurrent thread unregisters the node and frees it, could this result in a
use-after-free when calling node->query_error_threshold()?

> +}
> +
> +static int set_node_error_threshold(u32 node_id, u32 error_id, u32 threshold)
> +{
> +	struct drm_ras_node *node;
> +
> +	node = xa_load(&drm_ras_xa, node_id);
> +	if (!node)
> +		return -ENOENT;
> +
> +	if (!node->set_error_threshold)
> +		return -EOPNOTSUPP;
> +
> +	if (error_id < node->error_counter_range.first || error_id > node->error_counter_range.last)
> +		return -EINVAL;
> +
> +	return node->set_error_threshold(node, error_id, threshold);

[Severity: Critical]
This is a pre-existing issue, but similarly to get_node_error_threshold(),
the node pointer is retrieved locklessly via xa_load() and immediately
dereferenced.

Could a concurrent unregistration of the device free the node while it is
being accessed here?

> +}
[ ... ]
> @@ -358,6 +466,59 @@ int drm_ras_nl_clear_error_counter_doit(struct sk_buff *skb,
>  	return node->clear_error_counter(node, error_id);
>  }
>  
[ ... ]
> +int drm_ras_nl_set_error_threshold_doit(struct sk_buff *skb, struct genl_info *info)
> +{
> +	u32 node_id, error_id, threshold;
> +
> +	if (!info->attrs ||
> +	    GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID) ||
> +	    GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID) ||
> +	    GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_THRESHOLD))
> +		return -EINVAL;
> +
> +	node_id = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID]);
> +	error_id = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID]);
> +	threshold = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_THRESHOLD]);
> +
> +	return set_node_error_threshold(node_id, error_id, threshold);

[Severity: Medium]
This operation completes by returning the success code from
set_node_error_threshold(), which results in a generic ACK being sent to
user space.

Should this prepare and send a netlink response message containing the
actual applied threshold instead of relying on the ACK?

> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623101043.255897-1-raag.jadav@intel.com?part=2

  reply	other threads:[~2026-06-24 10:15 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-23 10:09 [PATCH v4 0/5] Introduce error threshold to drm_ras Raag Jadav
2026-06-23 10:09 ` [PATCH v4 1/5] drm/ras: Cancel and free message on get counter failure Raag Jadav
2026-06-24 10:15   ` sashiko-bot
2026-06-23 10:09 ` [PATCH v4 2/5] drm/ras: Introduce error threshold Raag Jadav
2026-06-24 10:15   ` sashiko-bot [this message]
2026-06-23 10:09 ` [PATCH v4 3/5] drm/xe/ras: Add support for " Raag Jadav
2026-06-24 10:15   ` sashiko-bot
2026-06-23 10:09 ` [PATCH v4 4/5] drm/xe/drm_ras: Wire up error threshold callbacks Raag Jadav
2026-06-24 10:15   ` sashiko-bot
2026-06-23 10:09 ` [PATCH v4 5/5] drm/xe/sysctrl: Reuse xe_sysctrl_create_command() Raag Jadav
2026-06-23 10:56 ` ✗ CI.checkpatch: warning for Introduce error threshold to drm_ras (rev4) Patchwork
2026-06-23 10:57 ` ✓ CI.KUnit: success " Patchwork
2026-06-23 12:24 ` ✓ Xe.CI.BAT: " Patchwork
2026-06-23 14:14 ` ✓ Xe.CI.FULL: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260624101531.459DC1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=raag.jadav@intel.com \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.