From: sashiko-bot@kernel.org
To: "Raag Jadav" <raag.jadav@intel.com>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v4 1/5] drm/ras: Cancel and free message on get counter failure
Date: Wed, 24 Jun 2026 10:15:29 +0000
Message-ID: <20260624101530.56CFB1F000E9@smtp.kernel.org>
In-Reply-To: <20260623101043.255897-2-raag.jadav@intel.com>

Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:

Pre-existing issues:
- [Critical] Lockless access to `drm_ras_node` pointers from the global `drm_ras_xa` XArray leads to Use-After-Free and control-flow hijacking during concurrent device teardown.
- [Medium] Missing generation counter and NLM_F_DUMP_INTR flag for lockless Netlink dumps.
--

commit 17d70fc870ba118f3745064732f976f0856f2d46
Author: Raag Jadav <raag.jadav@intel.com>

drm/ras: Cancel and free message on get counter failure

This commit fixes a resource leak in doit_reply_value() where sk_buff
and the genetlink header were not cleaned up on a get counter failure.
It also consolidates the error handling using a goto pattern.

> diff --git a/drivers/gpu/drm/drm_ras.c b/drivers/gpu/drm/drm_ras.c
> index d6eab29a1394c..467a169026fc1 100644
> --- a/drivers/gpu/drm/drm_ras.c
> +++ b/drivers/gpu/drm/drm_ras.c
> @@ -201,25 +201,28 @@ static int doit_reply_value(struct genl_info *info, u32 node_id,
[ ... ]
>  	ret = get_node_error_counter(node_id, error_id,
>  				     &error_name, &value);

[Severity: Critical]
This is a pre-existing issue, but does lockless access to drm_ras_node
pointers from the global drm_ras_xa XArray lead to a use-after-free
during concurrent device teardown?

The drm_ras netlink operations use .parallel_ops = true, meaning handlers
like get_node_error_counter() run concurrently without holding the global
genl_lock().

When retrieving a drm_ras_node pointer via xa_load(), the XArray API drops
the RCU read lock immediately after the lookup, returning a raw, unprotected
pointer:

drivers/gpu/drm/drm_ras.c:get_node_error_counter() {
	node = xa_load(&drm_ras_xa, node_id);
	if (!node || !node->query_error_counter)
		return -ENOENT;
	...
	return node->query_error_counter(node, error_id, name, value);
}

Concurrently, a driver can unregister its nodes during teardown, which removes
the node from the XArray and synchronously frees the node memory via DRM's
managed cleanup.

Could the netlink handler then dereference this freed pointer, calling function
pointers read from freed memory?

This same pattern appears in drm_ras_nl_list_nodes_dumpit() and
get_node_error_threshold().
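
One common way to close this kind of window is to take a reference on the
node while the lookup is still protected, so that teardown only drops the
registry's reference and the memory is freed by whoever releases the last
one. The following is a minimal userspace sketch of that idea, not the
drm_ras code: struct ras_node, node_get(), node_put() and the mutex-protected
array are all hypothetical stand-ins (in the kernel this would more likely be
a kref taken under RCU or a lock around the XArray lookup).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names exist in drm_ras. */
struct ras_node {
	int refcount;	/* protected by registry_lock */
	int (*query)(struct ras_node *node, unsigned int *value);
	unsigned int counter;
};

#define MAX_NODES 8
static struct ras_node *registry[MAX_NODES];
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static int nodes_freed;	/* instrumentation for the demo only */

static int query_counter(struct ras_node *node, unsigned int *value)
{
	*value = node->counter;
	return 0;
}

/* Drop one reference; the last holder frees the node. */
static void node_put(struct ras_node *node)
{
	int last;

	pthread_mutex_lock(&registry_lock);
	last = (--node->refcount == 0);
	if (last)
		nodes_freed++;
	pthread_mutex_unlock(&registry_lock);
	if (last)
		free(node);
}

static int node_register(unsigned int id, unsigned int counter)
{
	struct ras_node *node;

	if (id >= MAX_NODES)
		return -1;
	node = calloc(1, sizeof(*node));
	if (!node)
		return -1;
	node->refcount = 1;	/* reference owned by the registry */
	node->query = query_counter;
	node->counter = counter;

	pthread_mutex_lock(&registry_lock);
	if (registry[id]) {
		pthread_mutex_unlock(&registry_lock);
		free(node);
		return -1;
	}
	registry[id] = node;
	pthread_mutex_unlock(&registry_lock);
	return 0;
}

/* Lookup and reference grab happen under the same lock. */
static struct ras_node *node_get(unsigned int id)
{
	struct ras_node *node = NULL;

	if (id >= MAX_NODES)
		return NULL;
	pthread_mutex_lock(&registry_lock);
	node = registry[id];
	if (node)
		node->refcount++;
	pthread_mutex_unlock(&registry_lock);
	return node;
}

/* Teardown removes the entry and drops only the registry's reference. */
static void node_unregister(unsigned int id)
{
	struct ras_node *node;

	if (id >= MAX_NODES)
		return;
	pthread_mutex_lock(&registry_lock);
	node = registry[id];
	registry[id] = NULL;
	pthread_mutex_unlock(&registry_lock);
	if (node)
		node_put(node);
}

/* Handler side: the callback runs while a reference is held. */
static int get_counter(unsigned int id, unsigned int *value)
{
	struct ras_node *node = node_get(id);
	int ret;

	if (!node)
		return -1;
	ret = node->query(node, value);
	node_put(node);
	return ret;
}
```

In this model a handler that already holds a reference can still call
node->query() safely after node_unregister() has run, and the node is freed
only when that handler calls node_put().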


[Severity: Medium]
This is also a pre-existing issue, but are we missing a generation counter and
NLM_F_DUMP_INTR flag for lockless Netlink dumps?

The Generic Netlink UAPI rules dictate: "If iteration during a dump may skip or
repeat objects (e.g. due to lockless data structures), set NLM_F_DUMP_INTR on
the affected message(s)."

drm_ras_nl_list_nodes_dumpit() and drm_ras_nl_get_error_counter_dumpit()
iterate over the lockless drm_ras_xa XArray across potentially multiple netlink
continuations:

drivers/gpu/drm/drm_ras.c:drm_ras_nl_list_nodes_dumpit() {
	xa_for_each_start(&drm_ras_xa, id, node, ctx->restart) {
		...
	}
}

Because drm_ras_xa can be modified concurrently by node registration or
unregistration, a multipart dump can return an inconsistent snapshot, for
example by skipping a node registered below the restart index. Should the
implementation maintain a generation counter and set NLM_F_DUMP_INTR on the
affected netlink message to inform userspace that the dump was interrupted?
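
To illustrate the generation-counter idea, here is a small userspace model,
offered as a sketch rather than as the drm_ras implementation. The registry
bumps a counter on every change, the dump context records the counter at the
first continuation, and later continuations flag the dump as interrupted if
the counter has moved. All names (struct registry, struct dump_ctx,
dump_part()) are hypothetical; in the kernel the usual shape is to store the
generation in cb->seq and call genl_dump_check_consistent(), which sets
NLM_F_DUMP_INTR on the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

/* Hypothetical model of a registry that can change between dump parts. */
struct registry {
	bool present[MAX_NODES];
	unsigned int gen;	/* bumped on every insert or removal */
};

struct dump_ctx {
	size_t restart;		/* next index to visit */
	unsigned int seq;	/* generation seen when the dump started */
	bool started;
	bool interrupted;	/* models NLM_F_DUMP_INTR */
};

static void registry_set(struct registry *r, size_t id, bool present)
{
	if (id >= MAX_NODES || r->present[id] == present)
		return;
	r->present[id] = present;
	r->gen++;
}

/*
 * One dump continuation: emit at most @budget ids into @out and
 * return how many were emitted. The generation is checked before
 * emitting, so a change since the dump began marks it interrupted.
 */
static size_t dump_part(const struct registry *r, struct dump_ctx *ctx,
			size_t *out, size_t budget)
{
	size_t n = 0;

	if (!ctx->started) {
		ctx->seq = r->gen;
		ctx->started = true;
	} else if (ctx->seq != r->gen) {
		ctx->interrupted = true;
	}

	while (ctx->restart < MAX_NODES && n < budget) {
		if (r->present[ctx->restart])
			out[n++] = ctx->restart;
		ctx->restart++;
	}
	return n;
}
```

With nodes 1 and 5 registered, a first continuation with a budget of one
emits node 1. If node 0 is then registered, the second continuation still
emits only node 5, but it sets the interrupted flag so the consumer knows the
result may be incomplete and can retry the dump.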

>  	if (ret)
> -		return ret;
> +		goto cancel_msg;
>  
>  	ret = msg_reply_value(msg, error_id, error_name, value);
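
For readers unfamiliar with the goto-based cleanup this hunk switches to, the
pattern can be sketched in userspace as follows. This is an illustrative
model under stated assumptions, not the patched function: struct msg,
msg_alloc(), msg_free(), fetch_counter() and build_reply() are hypothetical
names, and the live_buffers counter exists only to make the leak observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static int live_buffers;	/* instrumentation: buffers not yet freed */

struct msg {
	char *buf;
	size_t cap;
	size_t len;
};

static struct msg *msg_alloc(size_t cap)
{
	struct msg *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->buf = malloc(cap);
	if (!m->buf) {
		free(m);
		return NULL;
	}
	m->cap = cap;
	live_buffers++;
	return m;
}

static void msg_free(struct msg *m)
{
	free(m->buf);
	free(m);
	live_buffers--;
}

/* Hypothetical counter lookup that fails for negative ids. */
static int fetch_counter(int id, unsigned int *value)
{
	if (id < 0)
		return -ENOENT;
	*value = (unsigned int)id * 10;
	return 0;
}

static int msg_put_value(struct msg *m, unsigned int value)
{
	if (m->cap < sizeof(value))
		return -EMSGSIZE;
	memcpy(m->buf, &value, sizeof(value));
	m->len = sizeof(value);
	return 0;
}

/*
 * Every failure after the allocation jumps to one label, so the
 * message is released on all error paths. On success, ownership
 * passes to the caller through @out.
 */
static int build_reply(int id, struct msg **out)
{
	struct msg *m;
	unsigned int value;
	int ret;

	m = msg_alloc(sizeof(value));
	if (!m)
		return -ENOMEM;

	ret = fetch_counter(id, &value);
	if (ret)
		goto free_msg;	/* an early "return ret" here would leak m */

	ret = msg_put_value(m, value);
	if (ret)
		goto free_msg;

	*out = m;
	return 0;

free_msg:
	msg_free(m);
	return ret;
}
```

A failing lookup, such as build_reply(-1, &m), returns -ENOENT and leaves
live_buffers at zero, which is the property the early return in the original
code did not have.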

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623101043.255897-1-raag.jadav@intel.com?part=1

Thread overview: 14+ messages
2026-06-23 10:09 [PATCH v4 0/5] Introduce error threshold to drm_ras Raag Jadav
2026-06-23 10:09 ` [PATCH v4 1/5] drm/ras: Cancel and free message on get counter failure Raag Jadav
2026-06-24 10:15   ` sashiko-bot [this message]
2026-06-23 10:09 ` [PATCH v4 2/5] drm/ras: Introduce error threshold Raag Jadav
2026-06-24 10:15   ` sashiko-bot
2026-06-23 10:09 ` [PATCH v4 3/5] drm/xe/ras: Add support for " Raag Jadav
2026-06-24 10:15   ` sashiko-bot
2026-06-23 10:09 ` [PATCH v4 4/5] drm/xe/drm_ras: Wire up error threshold callbacks Raag Jadav
2026-06-24 10:15   ` sashiko-bot
2026-06-23 10:09 ` [PATCH v4 5/5] drm/xe/sysctrl: Reuse xe_sysctrl_create_command() Raag Jadav
2026-06-23 10:56 ` ✗ CI.checkpatch: warning for Introduce error threshold to drm_ras (rev4) Patchwork
2026-06-23 10:57 ` ✓ CI.KUnit: success " Patchwork
2026-06-23 12:24 ` ✓ Xe.CI.BAT: " Patchwork
2026-06-23 14:14 ` ✓ Xe.CI.FULL: " Patchwork
