public inbox for stable@vger.kernel.org
From: Donet Tom <donettom@linux.ibm.com>
To: "Christian König" <christian.koenig@amd.com>,
	amd-gfx@lists.freedesktop.org,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Alex Deucher" <alexdeucher@gmail.com>,
	"Philip Yang" <yangp@amd.com>
Cc: David.YatSin@amd.com, Kent.Russell@amd.com,
	Ritesh Harjani <ritesh.list@gmail.com>,
	Vaidyanathan Srinivasan <svaidy@linux.ibm.com>,
	stable@vger.kernel.org, Donet Tom <donettom@linux.ibm.com>
Subject: Re: [RESEND RFC PATCH v3 1/6] drm/amdgpu: Change AMDGPU_VA_RESERVED_TRAP_SIZE to 2 PAGE_SIZE pages
Date: Tue, 24 Mar 2026 23:49:05 +0530
Message-ID: <6171f849-4164-4fd5-b31e-79c08df936c2@linux.ibm.com>
In-Reply-To: <bf255b34-0def-4a0b-a07d-30b9271b0166@amd.com>


On 3/23/26 6:42 PM, Christian König wrote:
> On 3/23/26 12:50, Donet Tom wrote:
>> On 3/23/26 3:41 PM, Christian König wrote:
>>
>> Hi Christian
>>
>>> On 3/23/26 05:28, Donet Tom wrote:
>>>> Currently, AMDGPU_VA_RESERVED_TRAP_SIZE is hardcoded to 8KB, while
>>>> KFD_CWSR_TBA_TMA_SIZE is defined as 2 * PAGE_SIZE. On systems with
>>>> 4K pages, both values match (8KB), so allocation and reserved space
>>>> are consistent.
>>>>
>>>> However, on 64K page-size systems, KFD_CWSR_TBA_TMA_SIZE becomes 128KB,
>>>> while the reserved trap area remains 8KB. This mismatch causes the
>>>> kernel to crash when running rocminfo or rccl unit tests.
>>>>
>>>> Kernel attempted to read user page (2) - exploit attempt? (uid: 1001)
>>>> BUG: Kernel NULL pointer dereference on read at 0x00000002
>>>> Faulting instruction address: 0xc0000000002c8a64
>>>> Oops: Kernel access of bad area, sig: 11 [#1]
>>>> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>>>> CPU: 34 UID: 1001 PID: 9379 Comm: rocminfo Tainted: G E 6.19.0-rc4-amdgpu-00320-gf23176405700 #56 VOLUNTARY
>>>> Tainted: [E]=UNSIGNED_MODULE
>>>> Hardware name: IBM,9105-42A POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.30 (ML1060_896) hv:phyp pSeries
>>>> NIP:  c0000000002c8a64 LR: c00000000125dbc8 CTR: c00000000125e730
>>>> REGS: c0000001e0957580 TRAP: 0300 Tainted: G E
>>>> MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24008268 XER: 00000036
>>>> CFAR: c00000000125dbc4 DAR: 0000000000000002 DSISR: 40000000 IRQMASK: 1
>>>> GPR00: c00000000125d908 c0000001e0957820 c0000000016e8100 c00000013d814540
>>>> GPR04: 0000000000000002 c00000013d814550 0000000000000045 0000000000000000
>>>> GPR08: c00000013444d000 c00000013d814538 c00000013d814538 0000000084002268
>>>> GPR12: c00000000125e730 c000007e2ffd5f00 ffffffffffffffff 0000000000020000
>>>> GPR16: 0000000000000000 0000000000000002 c00000015f653000 0000000000000000
>>>> GPR20: c000000138662400 c00000013d814540 0000000000000000 c00000013d814500
>>>> GPR24: 0000000000000000 0000000000000002 c0000001e0957888 c0000001e0957878
>>>> GPR28: c00000013d814548 0000000000000000 c00000013d814540 c0000001e0957888
>>>> NIP [c0000000002c8a64] __mutex_add_waiter+0x24/0xc0
>>>> LR [c00000000125dbc8] __mutex_lock.constprop.0+0x318/0xd00
>>>> Call Trace:
>>>> 0xc0000001e0957890 (unreliable)
>>>> __mutex_lock.constprop.0+0x58/0xd00
>>>> amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x6fc/0xb60 [amdgpu]
>>>> kfd_process_alloc_gpuvm+0x54/0x1f0 [amdgpu]
>>>> kfd_process_device_init_cwsr_dgpu+0xa4/0x1a0 [amdgpu]
>>>> kfd_process_device_init_vm+0xd8/0x2e0 [amdgpu]
>>>> kfd_ioctl_acquire_vm+0xd0/0x130 [amdgpu]
>>>> kfd_ioctl+0x514/0x670 [amdgpu]
>>>> sys_ioctl+0x134/0x180
>>>> system_call_exception+0x114/0x300
>>>> system_call_vectored_common+0x15c/0x2ec
>>>>
>>>> This patch changes AMDGPU_VA_RESERVED_TRAP_SIZE to 2 * PAGE_SIZE,
>>>> ensuring that the reserved trap area matches the allocation size
>>>> across all page sizes.
>>>>
>>>> cc: stable@vger.kernel.org
>>>> Fixes: 34a1de0f7935 ("drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole")
>>>> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>>>> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> index 139642eacdd0..a5eae49f9471 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> @@ -173,7 +173,7 @@ struct amdgpu_bo_vm;
>>>>   #define AMDGPU_VA_RESERVED_SEQ64_SIZE		(2ULL << 20)
>>>>   #define AMDGPU_VA_RESERVED_SEQ64_START(adev)	(AMDGPU_VA_RESERVED_CSA_START(adev) \
>>>>   						 - AMDGPU_VA_RESERVED_SEQ64_SIZE)
>>>> -#define AMDGPU_VA_RESERVED_TRAP_SIZE		(2ULL << 12)
>>>> +#define AMDGPU_VA_RESERVED_TRAP_SIZE		(2ULL << PAGE_SHIFT)
>>> Well using PAGE_SHIFT in amdgpu_vm.h looks quite broken to me.
>>>
>>> That makes the GPU VA reservation depend on the CPU page size and that is clearly not something we want to have.
>>>
>>> Where is KFD_CWSR_TBA_TMA_SIZE defined?
>>>
>> Thanks Christian for reviewing this patch.
>>
>> It is defined in kfd_priv.h.
>>
>> /*
>>   * Size of the per-process TBA+TMA buffer: 2 pages
>>   *
>>   * The first chunk is the TBA used for the CWSR ISA code. The second
>>   * chunk is used as TMA for user-mode trap handler setup in daisy-chain mode.
>>   */
>> #define KFD_CWSR_TBA_TMA_SIZE (PAGE_SIZE * 2)
>>
>>
>>
>> Could you please suggest the correct way to fix this issue?
> I'm only looking from the POV of the VM code on this, but my educated guess is that KFD_CWSR_TBA_TMA_SIZE should be 8k independent of the CPU page size.
>
> Background is that this is written by the shader trap handler and that byte code doesn't care what CPU architecture you have.
>
> But I think only the engineers working on that trap handler can really answer this. @Felix / @Philip?


Hi @Christian @Felix @Philip

To remove the dependency on CPU page size, could we use the following?

+#define AMDGPU_VA_RESERVED_TRAP_SIZE    (2ULL << 16)

With this, the reservation is always 128KB, while the
allocation still uses 2 * PAGE_SIZE (8KB with 4K pages,
128KB with 64K pages).


-Donet

>
> Regards,
> Christian.
>
>> -Donet
>>
>>> Regards,
>>> Christian.
>>>
>>>>   #define AMDGPU_VA_RESERVED_TRAP_START(adev)	(AMDGPU_VA_RESERVED_SEQ64_START(adev) \
>>>>   						 - AMDGPU_VA_RESERVED_TRAP_SIZE)
>>>>   #define AMDGPU_VA_RESERVED_BOTTOM		(1ULL << 16)

