AMD-GFX Archive on lore.kernel.org
From: "Christian König" <christian.koenig@amd.com>
To: "Jesse.Zhang" <Jesse.Zhang@amd.com>, amd-gfx@lists.freedesktop.org
Cc: Alexander.Deucher@amd.com
Subject: Re: [PATCH] drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine
Date: Mon, 29 Sep 2025 14:06:29 +0200
Message-ID: <7d70caf5-e74c-4f28-9e4b-efd2f52810f6@amd.com>
In-Reply-To: <20250929085121.3181721-1-Jesse.Zhang@amd.com>

On 29.09.25 10:51, Jesse.Zhang wrote:
> After GPU reset with VRAM loss, a general protection fault occurs
> during user queue restoration when accessing vm_bo->vm after
> spinlock release in amdgpu_vm_bo_reset_state_machine.
> 
> The root cause is that vm_bo is the cursor of the preceding
> list_for_each_entry loop. Once that loop has run to completion,
> the cursor no longer refers to a valid list entry, so reading
> vm_bo->vm after the spinlock is released dereferences a bogus
> pointer and triggers the fault.
> 
> Crash log shows:
> [  326.981811] Oops: general protection fault, probably for non-canonical address 0x4156415741e58ac8: 0000 [#1] SMP NOPTI
> [  326.981820] CPU: 13 UID: 0 PID: 1035 Comm: kworker/13:3 Tainted: G            E       6.16.0+ #25 PREEMPT(voluntary)
> [  326.981826] Tainted: [E]=UNSIGNED_MODULE
> [  326.981827] Hardware name: Gigabyte Technology Co., Ltd. X870E AORUS PRO ICE/X870E AORUS PRO ICE, BIOS F3i 12/19/2024
> [  326.981831] Workqueue: events amdgpu_userq_restore_worker [amdgpu]
> [  326.981999] RIP: 0010:amdgpu_vm_assert_locked+0x16/0x70 [amdgpu]
> [  326.982094] Code: 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 85 ff 74 45 48 8b 87 80 03 00 00 48 85 c0 74 40 <48> 8b b8 80 01 00 00 48 85 ff 74 3b 8b 05 0c b7 0e f0 85 c0 75 05
> [  326.982098] RSP: 0018:ffffaa91c2a6bc20 EFLAGS: 00010206
> [  326.982100] RAX: 4156415741e58948 RBX: ffff9e8f013e8330 RCX: 0000000000000000
> [  326.982102] RDX: 0000000000000005 RSI: 000000001d254e88 RDI: ffffffffc144814a
> [  326.982104] RBP: ffffaa91c2a6bc68 R08: 0000004c21a25674 R09: 0000000000000001
> [  326.982106] R10: 0000000000000001 R11: dccaf3f2f82863fc R12: ffff9e8f013e8000
> [  326.982108] R13: ffff9e8f013e8000 R14: 0000000000000000 R15: ffff9e8f09980000
> [  326.982110] FS:  0000000000000000(0000) GS:ffff9e9e79995000(0000) knlGS:0000000000000000
> [  326.982112] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  326.982114] CR2: 000055ed6c9caa80 CR3: 0000000797060000 CR4: 0000000000750ef0
> [  326.982116] PKRU: 55555554
> 
> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 563cad9c6cbc..86c8288c665f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -265,7 +265,7 @@ static void amdgpu_vm_bo_reset_state_machine(struct amdgpu_vm *vm)
>  		vm_bo->moved = true;
>  	spin_unlock(&vm->invalidated_lock);
>  
> -	amdgpu_vm_assert_locked(vm_bo->vm);
> +	amdgpu_vm_assert_locked(vm);
>  	list_for_each_entry_safe(vm_bo, tmp, &vm->idle, vm_status) {
>  		struct amdgpu_bo *bo = vm_bo->bo;
>  
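
For readers following along, here is a minimal userspace sketch of why
the loop cursor must not be used after the loop. The list helpers below
are simplified stand-ins written for this example, not the kernel
headers, and struct demo_vm / struct demo_vm_bo are hypothetical types
that only mirror the field names touched by the patch. __typeof__ is a
GCC/Clang extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel list helpers (illustration only). */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_entry(pos, head, member)                           \
	for (pos = container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);                                     \
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical types; only the names echo the amdgpu code in the patch. */
struct demo_vm { struct list_head invalidated; };
struct demo_vm_bo {
	struct demo_vm *vm;
	bool moved;
	struct list_head vm_status;
};

/*
 * Walks a two-entry list to completion, then reports whether the cursor
 * left behind aliases the list head instead of pointing at a real entry.
 */
bool cursor_aliases_head_after_loop(void)
{
	struct demo_vm vm;
	struct demo_vm_bo a = { .vm = &vm }, b = { .vm = &vm };
	struct demo_vm_bo *vm_bo;

	list_init(&vm.invalidated);
	list_add_tail(&a.vm_status, &vm.invalidated);
	list_add_tail(&b.vm_status, &vm.invalidated);

	list_for_each_entry(vm_bo, &vm.invalidated, vm_status)
		vm_bo->moved = true;

	/* Both real entries were visited. */
	assert(a.moved && b.moved);

	/*
	 * Only addresses are compared here; vm_bo is never dereferenced.
	 * The cursor is neither a nor b: its vm_status member sits on top
	 * of the list head, so reading vm_bo->vm would fetch whatever
	 * bytes happen to precede that head in memory.
	 */
	return vm_bo != &a && vm_bo != &b &&
	       &vm_bo->vm_status == &vm.invalidated;
}
```

The function returns true, which is consistent with the one-line fix
above: pass the vm argument the function already holds rather than
going through the stale cursor.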

