From: Alex Deucher <alexander.deucher@amd.com>
To: <amd-gfx@lists.freedesktop.org>
Cc: Alex Deucher <alexander.deucher@amd.com>,
<SRINIVASAN.SHANMUGAM@amd.com>, <vitaly.prosyak@amd.com>,
<christian.koenig@amd.com>,
Matthew Brost <matthew.brost@intel.com>
Subject: [PATCH V2] drm/amdgpu: fix a job->pasid access race in gpu recovery
Date: Wed, 10 Dec 2025 15:23:47 -0500
Message-ID: <20251210202347.63243-1-alexander.deucher@amd.com>
Avoid a possible use-after-free (UAF) in GPU recovery caused by a
race between the sched timeout callback and the tdr work queue.

The GPU recovery function calls drm_sched_stop() and later
drm_sched_start(). drm_sched_start() restarts the tdr queue, which
will eventually free the job. If the tdr queue frees the job before
the timeout callback completes, the later access to job->pasid is a
UAF. Cache the pasid early to avoid it.
Fixes: a72002cb181f ("drm/amdgpu: Make use of drm_wedge_task_info")
Cc: SRINIVASAN.SHANMUGAM@amd.com
Cc: vitaly.prosyak@amd.com
Cc: christian.koenig@amd.com
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
v2: Check the pasid rather than job (Lijo)
Add fixes tag (Christian)
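Illustration only, not part of the commit: a minimal user-space sketch
of the caching pattern used in the diff below. It assumes a hypothetical
struct fake_job and fake_recover() standing in for the amdgpu job and
the recovery path, and it simulates the tdr work queue freeing the job
with a plain free(). None of these names exist in the kernel.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's job; only the field we need. */
struct fake_job {
	int pasid;
};

/*
 * Hypothetical stand-in for the recovery path. The pasid is copied out
 * while the job is still known to be valid, then the job is released to
 * simulate the scheduler freeing it before recovery has finished.
 */
static int fake_recover(struct fake_job *job)
{
	/* Cache the value up front; -EINVAL marks "no job was passed in". */
	int pasid = job ? job->pasid : -EINVAL;

	/* Simulated free by another context; free(NULL) is a no-op. */
	free(job);

	/* From here on only the cached copy is used, never job->pasid. */
	return pasid;
}
```

A caller would treat a negative return as "no pasid available" and skip
the task-info lookup, which mirrors the "pasid >= 0" check in the patch.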
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8a851d7548c00..c6b1dd95c401d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -6634,6 +6634,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	struct amdgpu_hive_info *hive = NULL;
 	int r = 0;
 	bool need_emergency_restart = false;
+	/* save the pasid here as the job may be freed before the end of the reset */
+	int pasid = job ? job->pasid : -EINVAL;
 
 	/*
 	 * If it reaches here because of hang/timeout and a RAS error is
@@ -6734,8 +6736,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	if (!r) {
 		struct amdgpu_task_info *ti = NULL;
 
-		if (job)
-			ti = amdgpu_vm_get_task_info_pasid(adev, job->pasid);
+		/*
+		 * The job may already be freed at this point via the sched tdr workqueue so
+		 * use the cached pasid.
+		 */
+		if (pasid >= 0)
+			ti = amdgpu_vm_get_task_info_pasid(adev, pasid);
 
 		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE,
 				     ti ? &ti->task : NULL);
--
2.52.0