From: "Christian König" <christian.koenig@amd.com>
To: "SHANMUGAM, SRINIVASAN" <SRINIVASAN.SHANMUGAM@amd.com>,
"Deucher, Alexander" <Alexander.Deucher@amd.com>
Cc: "amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] drm/amdgpu: Fix PRT VA handling and guard BO access in VA update path
Date: Wed, 25 Mar 2026 14:28:07 +0100
Message-ID: <a554a809-d8ae-473a-a068-a240f66c5ea9@amd.com>
In-Reply-To: <IA0PR12MB820867AB3A8984B75C9B460F9049A@IA0PR12MB8208.namprd12.prod.outlook.com>
On 3/25/26 14:18, SHANMUGAM, SRINIVASAN wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Wednesday, March 25, 2026 5:39 PM
>> To: SHANMUGAM, SRINIVASAN <SRINIVASAN.SHANMUGAM@amd.com>;
>> Deucher, Alexander <Alexander.Deucher@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: Fix PRT VA handling and guard BO access in
>> VA update path
>>
>> On 3/25/26 12:58, Srinivasan Shanmugam wrote:
>>> PRT (Page Request Table) mappings are not backed by a real buffer. In
>>
>> PRT (Partial Resident Texture).
>>
>>> this case, bo_va is valid, but bo_va->bo is NULL, meaning the mapping
>>> exists but does not point to any real buffer object.
>>>
>>> amdgpu_gem_va_ioctl() currently mixes CLEAR and PRT handling, which
>>> can result in incorrect bo_va selection. CLEAR should use bo_va =
>>> NULL, while PRT should use the special fpriv->prt_va mapping.
>>>
>>> Fix this by clearly selecting bo_va:
>>> - use fpriv->prt_va for PRT
>>> - use NULL only for CLEAR
>>> - use amdgpu_vm_bo_find() for normal BO mappings
>>>
>>> Also, amdgpu_gem_va_update_vm() accesses bo_va->base.bo without
>>> checking if it is NULL. This is not valid for PRT mappings.
>>>
>>> This keeps CLEAR, PRT, and normal cases separate and avoids invalid
>>> memory access.
>>>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Suggested-by: Christian König <christian.koenig@amd.com>
>>> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 18 ++++++++++++++----
>>> 1 file changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> index b0ba2bdaf43a..289d6b58b579 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -772,8 +772,10 @@ amdgpu_gem_va_update_vm(struct amdgpu_device
>> *adev,
>>> if (r)
>>> goto error;
>>>
>>> + /* Only do BO-specific handling if this VA is backed by a real BO */
>>> if ((operation == AMDGPU_VA_OP_MAP ||
>>> operation == AMDGPU_VA_OP_REPLACE) &&
>>> + bo_va->base.bo &&
>>
>> That is not correct. This branch here should also be taken for PRT mappings.
>>
>>> !amdgpu_vm_is_bo_always_valid(vm, bo_va->base.bo)) {
>>>
>>> /*
>>> @@ -909,15 +911,23 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev,
>> void *data,
>>> goto error;
>>> }
>>>
>>> - /* Resolve the BO-VA mapping for this VM/BO combination. */
>>> - if (abo) {
>>> + /* Resolve the BO-VA mapping for this VM/BO combination.
>>> + *
>>> + * Depending on the case decide bo_va:
>>> + * - PRT: use special per-file prt_va (bo_va valid, but bo_va->bo == NULL)
>>> + * - CLEAR: no BO involved → bo_va = NULL
>>> + * - Normal BO path: lookup mapping from VM
>>> + */
>>> + if (args->flags & AMDGPU_VM_PAGE_PRT) {
>>> + bo_va = fpriv->prt_va;
>>> + } else if (args->operation == AMDGPU_VA_OP_CLEAR) {
>>> + bo_va = NULL;
>>> + } else if (abo) {
>>> bo_va = amdgpu_vm_bo_find(&fpriv->vm, abo);
>>> if (!bo_va) {
>>> r = -ENOENT;
>>> goto error;
>>> }
>>> - } else if (args->operation != AMDGPU_VA_OP_CLEAR) {
>>> - bo_va = fpriv->prt_va;
>>
>> That code already looks correct to me. I don't think we need to change anything
>> here.
>>
>> Where is your crash actually coming from?
>
> Hi Christian,
>
> The issue was observed in CI during IGT (amd_bo) runs, but I have not
> yet been able to reproduce it locally. Will continue investigating to
> identify the exact failing path.

That is most likely something completely different. As far as I can see the bo_va handling is correct.

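For reference, the pre-patch selection quoted above (the lines the patch removes plus the surrounding context) can be modelled in user space roughly as follows. This is only a sketch: `select_bo_va()`, `struct file_priv`, `struct bo_va`, and the `VA_OP_*` names are hypothetical stand-ins, and the BO lookup is replaced by a pre-computed `lookup_result` argument.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; not the real definitions. */
enum va_op { VA_OP_MAP, VA_OP_UNMAP, VA_OP_CLEAR, VA_OP_REPLACE };

struct bo_va { int id; };

struct file_priv { struct bo_va *prt_va; };

/*
 * Model of the pre-patch selection:
 *  - a BO was supplied: use the lookup result, or fail with -ENOENT
 *  - no BO and not CLEAR: use the per-file PRT mapping
 *  - no BO and CLEAR: no bo_va at all
 */
static int select_bo_va(const struct file_priv *fpriv, bool have_bo,
                        struct bo_va *lookup_result, enum va_op op,
                        struct bo_va **out)
{
    *out = NULL;

    if (have_bo) {
        if (!lookup_result)
            return -ENOENT;
        *out = lookup_result;
    } else if (op != VA_OP_CLEAR) {
        *out = fpriv->prt_va;
    }

    return 0;
}
```

In this model a request without a BO can only be a PRT mapping or a CLEAR, so the three cases are already kept apart without testing the PRT flag first.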
>
> Below is the crash signature for reference:
>
> BUG: KASAN: null-ptr-deref in amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
> Write of size 4 at addr 0000000000000000 by task amd_bo

That sounds a bit like fallout from Pike's patch:

    drm/amdgpu: fix syncobj leak for amdgpu_gem_va_ioctl()

    It requires freeing the syncobj and chain
    allocation resource.

Not sure what exactly goes wrong here.

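The `Code:` line of the oops gives one more hint. Assuming the usual x86-64 encoding, the bytes `b8 ff ff ff ff <f0> 41 0f c1 06` decode to `mov $0xffffffff,%eax` followed by `lock xadd %eax,(%r14)`, and the register dump shows R14 = 0. That is an atomic decrement through a NULL pointer, the shape of a reference-count put on an object whose counter sits at offset 0. It would fit a put in a cleanup path receiving a pointer that was never set, but which object that is cannot be read from the trace alone. The following user-space sketch only models the pattern; `struct obj`, `obj_put()` and `obj_put_safe()` are hypothetical names, not kernel API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object with the counter at offset 0. */
struct obj {
    atomic_int refcount;
    bool released;
};

static void obj_init(struct obj *o)
{
    atomic_init(&o->refcount, 1);
    o->released = false;
}

/*
 * Unguarded put: atomically decrements the counter and releases the
 * object when the old value was 1. Calling this with o == NULL would
 * write to address 0, as in the reported fault.
 */
static void obj_put(struct obj *o)
{
    if (atomic_fetch_sub(&o->refcount, 1) == 1)
        o->released = true;
}

/* Guarded put for cleanup paths where the pointer may never have been set. */
static void obj_put_safe(struct obj *o)
{
    if (o)
        obj_put(o);
}
```

A cleanup path that can be reached before the object was looked up or allocated has to use the guarded form, or skip the put entirely.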
Regards,
Christian.
>
> RIP: amdgpu_gem_va_ioctl+0x385/0x1130 [amdgpu]
> CR2: 0000000000000000
>
> I also tried to map the crash offset using gdb/objdump, but the results
> were not conclusive. The reported amdgpu_gem_va_ioctl+0x380 offset did
> not map cleanly to a single obvious source line
>
> So at this point I can localize the crash to amdgpu_gem_va_ioctl(), but
> still need to identify the exact failing pointer/path.
>
>
> [ 325.779102] ==================================================================
> [ 325.786483] BUG: KASAN: null-ptr-deref in amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
> [ 325.795105] Write of size 4 at addr 0000000000000000 by task amd_bo/7893
> [ 325.801997]
> [ 325.803595] CPU: 12 UID: 0 PID: 7893 Comm: amd_bo Not tainted 6.19.0-1314135.2.zuul.928a0cbbebc74c4f8d5a99a4d0a7ca55 #1 PREEMPT(voluntary)
> [ 325.803602] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019
> [ 325.803606] Call Trace:
> [ 325.803609] <TASK>
> [ 325.803612] dump_stack_lvl+0x64/0x80
> [ 325.803623] kasan_report+0xb8/0xf0
> [ 325.803631] ? amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
> [ 325.804427] kasan_check_range+0x105/0x1b0
> [ 325.804432] amdgpu_gem_va_ioctl+0x380/0x1130 [amdgpu]
> [ 325.805229] ? __pfx_amdgpu_gem_create_ioctl+0x10/0x10 [amdgpu]
> [ 325.806022] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
> [ 325.806815] ? __pfx___drm_dev_dbg+0x10/0x10 [drm]
> [ 325.806894] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
> [ 325.807686] drm_ioctl_kernel+0x13d/0x2b0 [drm]
> [ 325.807767] ? __pfx_file_has_perm+0x10/0x10
> [ 325.807777] ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
> [ 325.807857] drm_ioctl+0x4be/0xae0 [drm]
> [ 325.807936] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
> [ 325.808728] ? __pfx_sock_write_iter+0x10/0x10
> [ 325.808737] ? __pfx_drm_ioctl+0x10/0x10 [drm]
> [ 325.808816] ? ioctl_has_perm.constprop.0.isra.0+0x2ad/0x490
> [ 325.808823] ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
> [ 325.808827] ? _raw_spin_lock_irqsave+0x86/0xd0
> [ 325.808835] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> [ 325.808841] amdgpu_drm_ioctl+0xce/0x180 [amdgpu]
> [ 325.809622] __x64_sys_ioctl+0x139/0x1c0
> [ 325.809630] do_syscall_64+0x64/0x880
> [ 325.809638] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 325.809645] RIP: 0033:0x7f205fd12e1d
> [ 325.809650] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
> [ 325.809654] RSP: 002b:00007ffe9032b510 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 325.809660] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f205fd12e1d
> [ 325.809663] RDX: 00007ffe9032b5b0 RSI: 00000000c0406448 RDI: 0000000000000006
> [ 325.809665] RBP: 00007ffe9032b560 R08: 0000000100000000 R09: 000000000000000e
> [ 325.809668] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c0406448
> [ 325.809670] R13: 0000000000000006 R14: 0000000000001000 R15: 0000000000000001
> [ 325.809675] </TASK>
> [ 325.809678] ==================================================================
> [ 326.029964] Disabling lock debugging due to kernel taint
> [ 326.035486] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 326.042557] #PF: supervisor write access in kernel mode
> [ 326.047887] #PF: error_code(0x0002) - not-present page
> [ 326.053132] PGD 0 P4D 0
> [ 326.055766] Oops: Oops: 0002 [#1] SMP KASAN NOPTI
> [ 326.060577] CPU: 12 UID: 0 PID: 7893 Comm: amd_bo Tainted: G B 6.19.0-1314135.2.zuul.928a0cbbebc74c4f8d5a99a4d0a7ca55 #1 PREEMPT(voluntary)
> [ 326.074815] Tainted: [B]=BAD_PAGE
> [ 326.078233] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019
> [ ...7] RIP: 0010:amdgpu_gem_va_ioctl+0x385/0x1130 [amdgpu]
> [ 326.093279] Code: 00 00 75 aa 85 c0 74 a6 41 89 c7 31 ed 45 31 f6 48 89 ef e8 dd bf 09 ce be 04 00 00 00 4c 89 f7 e8 90 0e 13 ce b8 ff ff ff ff <f0> 41 0f c1 06 83 f8 01 0f 84 3c 05 00 00 85 c0 0f 8e 75 05 00 00
> [ 326.112237] RSP: 0018:ffff88a0d02d7b60 EFLAGS: 00010246
> [ 326.117568] RAX: 00000000ffffffff RBX: ffff88907f0c2848 RCX: ffffffff8f43434a
> [ 326.124813] RDX: fffffbfff2a16c0d RSI: 0000000000000008 RDI: ffffffff950b6060
> [ 326.132056] RBP: 0000000000000000 R08: 0000000000000001 R09: fffffbfff2a16c0c
> [ 326.139303] R10: ffffffff950b6067 R11: 0000000000000001 R12: ffff88b1349d7778
> [ 326.146548] R13: ffff88a0d02d7c00 R14: 0000000000000000 R15: 0000000000000000
> [ 326.153794] FS: 00007f205dbad940(0000) GS:ffff88c00aa09000(0000) knlGS:0000000000000000
> [ 326.162023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 326.167872] CR2: 0000000000000000 CR3: 000000207940e000 CR4: 00000000003506f0
> [ 326.175113] Call Trace:
> [ 326.177661] <TASK>
> [ 326.179861] ? __pfx_amdgpu_gem_create_ioctl+0x10/0x10 [amdgpu]
> [ 326.186637] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
> [ 326.193168] ? __pfx___drm_dev_dbg+0x10/0x10 [drm]
> [ 326.198141] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
> [ 326.204608] drm_ioctl_kernel+0x13d/0x2b0 [drm]
> [ 326.209319] ? __pfx_file_has_perm+0x10/0x10
> [ 326.213696] ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
> [ 326.218934] drm_ioctl+0x4be/0xae0 [drm]
> [ 326.223109] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
> [ 326.229576] ? __pfx_sock_write_iter+0x10/0x10
> [ 326.234130] ? __pfx_drm_ioctl+0x10/0x10 [drm]
> [ 326.238752] ? ioctl_has_perm.constprop.0.isra.0+0x2ad/0x490
> [ 326.244518] ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
> [ 326.250630] ? _raw_spin_lock_irqsave+0x86/0xd0
> [ 326.255268] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> [ 326.260429] amdgpu_drm_ioctl+0xce/0x180 [amdgpu]
> [ 326.266018] __x64_sys_ioctl+0x139/0x1c0
> [ 326.270056] do_syscall_64+0x64/0x880
> [ 326.273827] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 326.278983] RIP: 0033:0x7f205fd12e1d
> [ 326.282660] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
> [ 326.301609] RSP: 002b:00007ffe9032b510 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 326.309316] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f205fd12e1d
> [ 326.316560] RDX: 00007ffe9032b5b0 RSI: 00000000c0406448 RDI: 0000000000000006
> [ 326.323855] RBP: 00007ffe9032b560 R08: 0000000100000000 R09: 000000000000000e
> [ 326.331103] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c0406448
> [ 326.338347] R13: 0000000000000006 R14: 0000000000001000 R15: 0000000000000001
> [ 326.345595] </TASK>
>
> Thanks!
> Srini
>
>>
>> Regards,
>> Christian.
>>
>>> } else {
>>> bo_va = NULL;
>>> }
>
Thread overview: 4+ messages
2026-03-25 11:58 [PATCH] drm/amdgpu: Fix PRT VA handling and guard BO access in VA update path Srinivasan Shanmugam
2026-03-25 12:09 ` Christian König
2026-03-25 13:18 ` SHANMUGAM, SRINIVASAN
2026-03-25 13:28 ` Christian König [this message]