From: Felix Kuehling <felix.kuehling@amd.com>
To: "Russell, Kent" <Kent.Russell@amd.com>,
"Martin, Andrew" <Andrew.Martin@amd.com>,
"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] drm/amdkfd: FORWARD NULL
Date: Fri, 14 Nov 2025 16:47:43 -0500 [thread overview]
Message-ID: <8f5fead3-2f4e-4352-a4d1-2d316bdf23f7@amd.com> (raw)
In-Reply-To: <PH7PR12MB590242762FBB162CCE923A0385CAA@PH7PR12MB5902.namprd12.prod.outlook.com>
On 2025-11-14 12:02, Russell, Kent wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Andrew
>> Martin
>> Sent: Friday, November 14, 2025 9:41 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Martin, Andrew <Andrew.Martin@amd.com>
>> Subject: [PATCH] drm/amdkfd: FORWARD NULL
>>
>> This patch fixes issues when the code moves forward with a potential
>> NULL pointer, without checking.
>>
>> Signed-off-by: Andrew Martin <andrew.martin@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 2 ++
>> drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 6 +++++-
>> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 3 +++
>> 3 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> index 1ef758ac5076..71b7db5de69f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>> @@ -104,6 +104,8 @@ static const char *amdkfd_fence_get_driver_name(struct
>> dma_fence *f)
>> static const char *amdkfd_fence_get_timeline_name(struct dma_fence *f)
>> {
>> struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>> + if (!fence)
>> + return NULL;
> Felix can hopefully confirm, but we may need to have something here, since the references here expect something. Like in
> #define AMDGPU_JOB_GET_TIMELINE_NAME(job) \
> job->base.s_fence->finished.ops->get_timeline_name(&job->base.s_fence->finished)
For amdgpu Job fences the above makes sense. But KFD fences are our
evictions fences. There is no job associated with them.
I don't think the NULL check is needed here. to_amdgpu_amdkfd_fence
returns NULL if the f is NULL or the fence is not an
amdgpu_amdkfd_fence, based on the fence_ops. But we're in a fence_ops
callback here that should only be called for amdgpu_amdkfd_fences.
That said, if you need a check to satisfy a static checker, I suggest this:
return fence ? fence->timeline_name : NULL;
>
>> return fence->timeline_name;
>> }
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
>> index ba99e0f258ae..42fa137bdb2f 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
>> @@ -517,7 +517,9 @@ int kfd_dbg_trap_set_flags(struct kfd_process *target,
>> uint32_t *flags)
>>
>> for (i = 0; i < target->n_pdds; i++) {
>> struct kfd_topology_device *topo_dev =
>> - kfd_topology_device_by_id(target->pdds[i]->dev->id);
>> + kfd_topology_device_by_id(target->pdds[i]->dev->id);
>> + if (!topo_dev)
>> + continue;
This loop checks the capabilities of all the devices. If a device is not
found, we should assume the worst and return an error, instead of just
assuming that it'll be fine.
>> uint32_t caps = topo_dev->node_props.capability;
>>
>> if (!(caps &
>> HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED)
>> &&
>> @@ -1071,6 +1073,8 @@ int kfd_dbg_trap_device_snapshot(struct kfd_process
>> *target,
>> for (i = 0; i < tmp_num_devices; i++) {
>> struct kfd_process_device *pdd = target->pdds[i];
>> struct kfd_topology_device *topo_dev =
>> kfd_topology_device_by_id(pdd->dev->id);
>> + if (!topo_dev)
>> + continue;
I'd return an error here as well, because we obviously don't have
accurate device info.
>>
>> device_info.gpu_id = pdd->dev->id;
>> device_info.exception_status = pdd->exception_status;
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index f5d173f1ca3b..f40d93f82ede 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -1685,6 +1685,9 @@ int kfd_process_device_init_vm(struct
>> kfd_process_device *pdd,
>> struct kfd_node *dev;
>> int ret;
>>
>> + if (!pdd)
>> + return -EINVAL;
>> +
We generally assume that functions are called with valid parameters.
With that argument, we should probably remove the check for !drm_file
below as well.
Regards,
Felix
>> if (!drm_file)
>> return -EINVAL;
> Probably easier to just combine the !pdd and !drm_file into the same check.
>
> Kent
>> --
>> 2.43.0
next prev parent reply other threads:[~2025-11-14 21:47 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-14 14:40 [PATCH] drm/amdkfd: FORWARD NULL Andrew Martin
2025-11-14 17:02 ` Russell, Kent
2025-11-14 17:37 ` Martin, Andrew
2025-11-14 21:47 ` Felix Kuehling [this message]
2025-11-17 18:13 ` Martin, Andrew
2025-11-18 6:00 ` Kuehling, Felix
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8f5fead3-2f4e-4352-a4d1-2d316bdf23f7@amd.com \
--to=felix.kuehling@amd.com \
--cc=Andrew.Martin@amd.com \
--cc=Kent.Russell@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox