* [PATCH v2 0/3] Make device links between KFD and GPU device
@ 2025-12-10 20:15 Mario Limonciello (AMD)
2025-12-10 20:15 ` [PATCH v2 1/3] amdkfd: Only ignore -ENOENT for KFD init failures Mario Limonciello (AMD)
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Mario Limonciello (AMD) @ 2025-12-10 20:15 UTC (permalink / raw)
To: amd-gfx; +Cc: Mario Limonciello (AMD)
Discovering which KFD device is associated with a GPU is relatively
awkward right now in userspace.
This series creates sysfs links between the devices to simplify it
for userspace.
v2:
* Add tag
* Fix the case where systems with more than one GPU wouldn't show links
because the topology was rebuilt.
Mario Limonciello (AMD) (3):
amdkfd: Only ignore -ENOENT for KFD init failures
amdkfd: Don't rebuild node tree when calling
kfd_topology_update_sysfs()
amdkfd: Add device links between kfd device and amdgpu device
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 8 ++++
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 ++-
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 42 +++++++++++++++++--
.../gpu/drm/amd/include/kgd_kfd_interface.h | 2 +
6 files changed, 57 insertions(+), 6 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 1/3] amdkfd: Only ignore -ENOENT for KFD init failures
2025-12-10 20:15 [PATCH v2 0/3] Make device links between KFD and GPU device Mario Limonciello (AMD)
@ 2025-12-10 20:15 ` Mario Limonciello (AMD)
2025-12-10 20:15 ` [PATCH v2 2/3] amdkfd: Don't rebuild node tree when calling kfd_topology_update_sysfs() Mario Limonciello (AMD)
2025-12-10 20:15 ` [PATCH v2 3/3] amdkfd: Add device links between kfd device and amdgpu device Mario Limonciello (AMD)
2 siblings, 0 replies; 9+ messages in thread
From: Mario Limonciello (AMD) @ 2025-12-10 20:15 UTC (permalink / raw)
To: amd-gfx; +Cc: Mario Limonciello (AMD), Kent Russell
When compiled without CONFIG_HSA_AMD, KFD init will return -ENOENT.
As any other error will cause KFD functionality issues, this is the
only error code that should be ignored at init.
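The rule above can be sketched as a small predicate. The name kfd_init_error_is_fatal() is hypothetical and only illustrates the check; the patch itself open-codes the condition in amdgpu_init().

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a KFD init return code should
 * abort module init. Success (0) and -ENOENT (KFD not built in) are
 * tolerated; every other error is treated as fatal.
 */
bool kfd_init_error_is_fatal(int r)
{
    return r != 0 && r != -ENOENT;
}
```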
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Kent Russell <kent.russell@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 16adeba4d7e68..e804461e5f272 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -3169,8 +3169,10 @@ static int __init amdgpu_init(void)
amdgpu_register_atpx_handler();
amdgpu_acpi_detect();
- /* Ignore KFD init failures. Normal when CONFIG_HSA_AMD is not set. */
- amdgpu_amdkfd_init();
+ /* Ignore KFD init failures when CONFIG_HSA_AMD is not set. */
+ r = amdgpu_amdkfd_init();
+ if (r && r != -ENOENT)
+ goto error_fence;
if (amdgpu_pp_feature_mask & PP_OVERDRIVE_MASK) {
add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v2 2/3] amdkfd: Don't rebuild node tree when calling kfd_topology_update_sysfs()
2025-12-10 20:15 [PATCH v2 0/3] Make device links between KFD and GPU device Mario Limonciello (AMD)
2025-12-10 20:15 ` [PATCH v2 1/3] amdkfd: Only ignore -ENOENT for KFD init failures Mario Limonciello (AMD)
@ 2025-12-10 20:15 ` Mario Limonciello (AMD)
2025-12-10 20:40 ` Russell, Kent
2025-12-16 2:42 ` Kasiviswanathan, Harish
2025-12-10 20:15 ` [PATCH v2 3/3] amdkfd: Add device links between kfd device and amdgpu device Mario Limonciello (AMD)
2 siblings, 2 replies; 9+ messages in thread
From: Mario Limonciello (AMD) @ 2025-12-10 20:15 UTC (permalink / raw)
To: amd-gfx; +Cc: Mario Limonciello (AMD)
There is no need to remove all the nodes and rebuild them. The content
will be the same. Instead, check whether the node has already been
created and, if so, skip the creation.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index a0990dd2378c1..b65f29294e2d6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -650,8 +650,8 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
uint32_t i, num_attrs;
struct attribute **attrs;
- if (WARN_ON(dev->kobj_node))
- return -EEXIST;
+ if (dev->kobj_node)
+ return 0;
/*
* Creating the sysfs folders
@@ -888,8 +888,6 @@ static int kfd_topology_update_sysfs(void)
return ret;
}
- kfd_remove_sysfs_node_tree();
-
return kfd_build_sysfs_node_tree();
}
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v2 3/3] amdkfd: Add device links between kfd device and amdgpu device
2025-12-10 20:15 [PATCH v2 0/3] Make device links between KFD and GPU device Mario Limonciello (AMD)
2025-12-10 20:15 ` [PATCH v2 1/3] amdkfd: Only ignore -ENOENT for KFD init failures Mario Limonciello (AMD)
2025-12-10 20:15 ` [PATCH v2 2/3] amdkfd: Don't rebuild node tree when calling kfd_topology_update_sysfs() Mario Limonciello (AMD)
@ 2025-12-10 20:15 ` Mario Limonciello (AMD)
2025-12-10 20:40 ` Russell, Kent
2025-12-16 2:45 ` Kasiviswanathan, Harish
2 siblings, 2 replies; 9+ messages in thread
From: Mario Limonciello (AMD) @ 2025-12-10 20:15 UTC (permalink / raw)
To: amd-gfx; +Cc: Mario Limonciello (AMD)
Mapping a KFD device to a GPU can be done manually by looking at the
domain and location properties. To make it easier to discover which
KFD device goes with which GPU, add bidirectional links.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 8 +++++
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 36 +++++++++++++++++++
.../gpu/drm/amd/include/kgd_kfd_interface.h | 2 ++
5 files changed, 51 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index a2879d2b7c8ec..5d6cf3adfa7b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -910,3 +910,11 @@ int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev, uint32_t xcp_id,
return r;
}
+
+int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev)
+{
+ if (!adev->kfd.init_complete || !adev->kfd.dev)
+ return 0;
+
+ return kgd2kfd_create_sysfs_links(adev->kfd.dev);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index da4575676335f..fd92b227a674b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -272,6 +272,7 @@ int amdgpu_amdkfd_stop_sched(struct amdgpu_device *adev, uint32_t node_id);
int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev, uint32_t xcp_id,
bool core_override_enable, bool reg_override_enable, bool perfmon_override_enable);
bool amdgpu_amdkfd_compute_active(struct amdgpu_device *adev, uint32_t node_id);
+int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev);
/* Read user wptr from a specified user address space with page fault
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7a0213a07023d..44c9320d72a56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4947,6 +4947,10 @@ int amdgpu_device_init(struct amdgpu_device *adev,
*/
r = amdgpu_device_sys_interface_init(adev);
+ r = amdgpu_amdkfd_create_sysfs_links(adev);
+ if (r)
+ dev_err(adev->dev, "Failed to create KFD sysfs link: %d\n", r);
+
if (IS_ENABLED(CONFIG_PERF_EVENTS))
r = amdgpu_pmu_init(adev);
if (r)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index b65f29294e2d6..796fd411a7dcc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -79,6 +79,37 @@ struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
return device;
}
+int kgd2kfd_create_sysfs_links(struct kfd_dev *kfd)
+{
+ struct kfd_topology_device *top_dev;
+ int ret = -ENODEV;
+
+ if (!kfd)
+ return -EINVAL;
+
+ down_read(&topology_lock);
+
+ list_for_each_entry(top_dev, &topology_device_list, list) {
+ struct kobject *amdgpu_kobj;
+
+ if (!top_dev->gpu || top_dev->gpu->kfd != kfd || !top_dev->kobj_node)
+ continue;
+
+ amdgpu_kobj = &top_dev->gpu->adev->dev->kobj;
+ ret = sysfs_create_link(top_dev->kobj_node, amdgpu_kobj, "device");
+ if (ret)
+ break;
+
+ ret = sysfs_create_link(amdgpu_kobj, top_dev->kobj_node, "kfd");
+ if (ret)
+ sysfs_remove_link(top_dev->kobj_node, "device");
+ break;
+ }
+
+ up_read(&topology_lock);
+ return ret;
+}
+
struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id)
{
struct kfd_topology_device *top_dev = NULL;
@@ -571,6 +602,11 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
struct kfd_mem_properties *mem;
struct kfd_perf_properties *perf;
+ if (dev->gpu) {
+ sysfs_remove_link(dev->kobj_node, "device");
+ sysfs_remove_link(&dev->gpu->adev->dev->kobj, "kfd");
+ }
+
if (dev->kobj_iolink) {
list_for_each_entry(iolink, &dev->io_link_props, list)
if (iolink->kobj) {
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 9aba8596faa7e..f6db1dc634399 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -335,4 +335,6 @@ struct kfd2kgd_calls {
int engine, int queue);
};
+int kgd2kfd_create_sysfs_links(struct kfd_dev *kfd);
+
#endif /* KGD_KFD_INTERFACE_H_INCLUDED */
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* RE: [PATCH v2 2/3] amdkfd: Don't rebuild node tree when calling kfd_topology_update_sysfs()
2025-12-10 20:15 ` [PATCH v2 2/3] amdkfd: Don't rebuild node tree when calling kfd_topology_update_sysfs() Mario Limonciello (AMD)
@ 2025-12-10 20:40 ` Russell, Kent
2025-12-16 2:42 ` Kasiviswanathan, Harish
1 sibling, 0 replies; 9+ messages in thread
From: Russell, Kent @ 2025-12-10 20:40 UTC (permalink / raw)
To: Mario Limonciello (AMD), amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix, Kasiviswanathan, Harish
[Public]
Seems reasonable to me, but I'd want Felix (unlikely due to PTO) or Harish to weigh in to make sure I am not missing something obvious.
From what I can tell, it was called way back when KFD Topology was first written, back when it was APU-only. So they'd do 2 calls to update sysfs, once for the CPU and once for the GPU. I think the logic just never got touched because it still works, even though we've changed things a bit now.
Kent
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Mario
> Limonciello (AMD)
> Sent: Wednesday, December 10, 2025 3:15 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Mario Limonciello (AMD) <superm1@kernel.org>
> Subject: [PATCH v2 2/3] amdkfd: Don't rebuild node tree when calling
> kfd_topology_update_sysfs()
>
> There is no need to remove all the nodes and rebuild them. The content
> will be the same. Instead check whether the node was created and skip
> the creation.
>
> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index a0990dd2378c1..b65f29294e2d6 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -650,8 +650,8 @@ static int kfd_build_sysfs_node_entry(struct
> kfd_topology_device *dev,
> uint32_t i, num_attrs;
> struct attribute **attrs;
>
> - if (WARN_ON(dev->kobj_node))
> - return -EEXIST;
> + if (dev->kobj_node)
> + return 0;
>
> /*
> * Creating the sysfs folders
> @@ -888,8 +888,6 @@ static int kfd_topology_update_sysfs(void)
> return ret;
> }
>
> - kfd_remove_sysfs_node_tree();
> -
> return kfd_build_sysfs_node_tree();
> }
>
> --
> 2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: [PATCH v2 3/3] amdkfd: Add device links between kfd device and amdgpu device
2025-12-10 20:15 ` [PATCH v2 3/3] amdkfd: Add device links between kfd device and amdgpu device Mario Limonciello (AMD)
@ 2025-12-10 20:40 ` Russell, Kent
2025-12-16 2:45 ` Kasiviswanathan, Harish
1 sibling, 0 replies; 9+ messages in thread
From: Russell, Kent @ 2025-12-10 20:40 UTC (permalink / raw)
To: Mario Limonciello (AMD), amd-gfx@lists.freedesktop.org
[Public]
Reviewed-by: Kent Russell <kent.russell@amd.com>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Mario
> Limonciello (AMD)
> Sent: Wednesday, December 10, 2025 3:15 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Mario Limonciello (AMD) <superm1@kernel.org>
> Subject: [PATCH v2 3/3] amdkfd: Add device links between kfd device and amdgpu
> device
>
> Mapping out a KFD device to a GPU can be done manually by looking at the
> domain and location properties. To make it easier to discover which
> KFD device goes with what GPU add bidirectional links.
>
> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 8 +++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++
> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 36 +++++++++++++++++++
> .../gpu/drm/amd/include/kgd_kfd_interface.h | 2 ++
> 5 files changed, 51 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index a2879d2b7c8ec..5d6cf3adfa7b8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -910,3 +910,11 @@ int amdgpu_amdkfd_config_sq_perfmon(struct
> amdgpu_device *adev, uint32_t xcp_id,
>
> return r;
> }
> +
> +int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev)
> +{
> + if (!adev->kfd.init_complete || !adev->kfd.dev)
> + return 0;
> +
> + return kgd2kfd_create_sysfs_links(adev->kfd.dev);
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index da4575676335f..fd92b227a674b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -272,6 +272,7 @@ int amdgpu_amdkfd_stop_sched(struct amdgpu_device
> *adev, uint32_t node_id);
> int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev, uint32_t
> xcp_id,
> bool core_override_enable, bool reg_override_enable, bool
> perfmon_override_enable);
> bool amdgpu_amdkfd_compute_active(struct amdgpu_device *adev, uint32_t
> node_id);
> +int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev);
>
>
> /* Read user wptr from a specified user address space with page fault
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 7a0213a07023d..44c9320d72a56 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4947,6 +4947,10 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> */
> r = amdgpu_device_sys_interface_init(adev);
>
> + r = amdgpu_amdkfd_create_sysfs_links(adev);
> + if (r)
> + dev_err(adev->dev, "Failed to create KFD sysfs link: %d\n", r);
> +
> if (IS_ENABLED(CONFIG_PERF_EVENTS))
> r = amdgpu_pmu_init(adev);
> if (r)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index b65f29294e2d6..796fd411a7dcc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -79,6 +79,37 @@ struct kfd_topology_device
> *kfd_topology_device_by_proximity_domain(
> return device;
> }
>
> +int kgd2kfd_create_sysfs_links(struct kfd_dev *kfd)
> +{
> + struct kfd_topology_device *top_dev;
> + int ret = -ENODEV;
> +
> + if (!kfd)
> + return -EINVAL;
> +
> + down_read(&topology_lock);
> +
> + list_for_each_entry(top_dev, &topology_device_list, list) {
> + struct kobject *amdgpu_kobj;
> +
> + if (!top_dev->gpu || top_dev->gpu->kfd != kfd || !top_dev->kobj_node)
> + continue;
> +
> + amdgpu_kobj = &top_dev->gpu->adev->dev->kobj;
> + ret = sysfs_create_link(top_dev->kobj_node, amdgpu_kobj, "device");
> + if (ret)
> + break;
> +
> + ret = sysfs_create_link(amdgpu_kobj, top_dev->kobj_node, "kfd");
> + if (ret)
> + sysfs_remove_link(top_dev->kobj_node, "device");
> + break;
> + }
> +
> + up_read(&topology_lock);
> + return ret;
> +}
> +
> struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id)
> {
> struct kfd_topology_device *top_dev = NULL;
> @@ -571,6 +602,11 @@ static void kfd_remove_sysfs_node_entry(struct
> kfd_topology_device *dev)
> struct kfd_mem_properties *mem;
> struct kfd_perf_properties *perf;
>
> + if (dev->gpu) {
> + sysfs_remove_link(dev->kobj_node, "device");
> + sysfs_remove_link(&dev->gpu->adev->dev->kobj, "kfd");
> + }
> +
> if (dev->kobj_iolink) {
> list_for_each_entry(iolink, &dev->io_link_props, list)
> if (iolink->kobj) {
> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> index 9aba8596faa7e..f6db1dc634399 100644
> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> @@ -335,4 +335,6 @@ struct kfd2kgd_calls {
> int engine, int queue);
> };
>
> +int kgd2kfd_create_sysfs_links(struct kfd_dev *kfd);
> +
> #endif /* KGD_KFD_INTERFACE_H_INCLUDED */
> --
> 2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/3] amdkfd: Don't rebuild node tree when calling kfd_topology_update_sysfs()
2025-12-10 20:15 ` [PATCH v2 2/3] amdkfd: Don't rebuild node tree when calling kfd_topology_update_sysfs() Mario Limonciello (AMD)
2025-12-10 20:40 ` Russell, Kent
@ 2025-12-16 2:42 ` Kasiviswanathan, Harish
1 sibling, 0 replies; 9+ messages in thread
From: Kasiviswanathan, Harish @ 2025-12-16 2:42 UTC (permalink / raw)
To: Mario Limonciello (AMD), amd-gfx@lists.freedesktop.org
Hi Mario,
This wouldn't work when we change compute and/or memory partitions. After a partition switch, we need to recreate all the nodes. If you have an Instinct machine, you can try changing partitions:
$ rocm-smi --setcomputepartition CPX
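The concern can be illustrated with a toy model. This is only a sketch under assumptions: struct topo_node and its fields are hypothetical stand-ins (published plays the role of a non-NULL kobj_node), and the generation counter is an invented way to mark a partition switch. It is not the kernel's data structure.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

/* Hypothetical stand-in for a topology node and its sysfs state. */
struct topo_node {
    bool present;    /* node exists in the current topology */
    bool published;  /* sysfs entry was created */
    int generation;  /* topology generation captured when published */
};

/* Skip-if-exists build: entries that are already published are kept. */
void build_skip_existing(struct topo_node *nodes, size_t n, int generation)
{
    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].present || nodes[i].published)
            continue;
        nodes[i].published = true;
        nodes[i].generation = generation;
    }
}

/* Remove-then-build: every entry is dropped and recreated. */
void rebuild_all(struct topo_node *nodes, size_t n, int generation)
{
    for (size_t i = 0; i < n; i++)
        nodes[i].published = false;
    build_skip_existing(nodes, n, generation);
}

/* Count published entries that no longer match the current topology. */
size_t count_stale(const struct topo_node *nodes, size_t n, int generation)
{
    size_t stale = 0;

    for (size_t i = 0; i < n; i++)
        if (nodes[i].published &&
            (!nodes[i].present || nodes[i].generation != generation))
            stale++;
    return stale;
}
```

In this model, bumping the generation to represent a partition switch and then calling build_skip_existing() leaves the previously published entry stale, while rebuild_all() does not.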
Best Regards,
Harish
________________________________________
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Mario Limonciello (AMD) <superm1@kernel.org>
Sent: Wednesday, December 10, 2025 3:15 PM
To: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Mario Limonciello (AMD) <superm1@kernel.org>
Subject: [PATCH v2 2/3] amdkfd: Don't rebuild node tree when calling kfd_topology_update_sysfs()
There is no need to remove all the nodes and rebuild them. The content
will be the same. Instead check whether the node was created and skip
the creation.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index a0990dd2378c1..b65f29294e2d6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -650,8 +650,8 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
uint32_t i, num_attrs;
struct attribute **attrs;
- if (WARN_ON(dev->kobj_node))
- return -EEXIST;
+ if (dev->kobj_node)
+ return 0;
/*
* Creating the sysfs folders
@@ -888,8 +888,6 @@ static int kfd_topology_update_sysfs(void)
return ret;
}
- kfd_remove_sysfs_node_tree();
-
return kfd_build_sysfs_node_tree();
}
--
2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 3/3] amdkfd: Add device links between kfd device and amdgpu device
2025-12-10 20:15 ` [PATCH v2 3/3] amdkfd: Add device links between kfd device and amdgpu device Mario Limonciello (AMD)
2025-12-10 20:40 ` Russell, Kent
@ 2025-12-16 2:45 ` Kasiviswanathan, Harish
2025-12-16 12:20 ` Mario Limonciello
1 sibling, 1 reply; 9+ messages in thread
From: Kasiviswanathan, Harish @ 2025-12-16 2:45 UTC (permalink / raw)
To: Mario Limonciello (AMD), amd-gfx@lists.freedesktop.org
Similar comment to the previous patch. Once you do spatial partitioning (like QPX or CPX), there is no one-to-one correspondence between the drm node and the kfd node. Partitions don't change the device node; however, you could have multiple (4 or 8) kfd nodes.
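A minimal sketch of the one-to-many relationship follows. The struct kfd_node_info type and its device_id field are hypothetical and exist only to show the counting; they do not correspond to kernel structures.

```c
#include <stddef.h>

/* Hypothetical record: one KFD topology node and the device that owns it. */
struct kfd_node_info {
    int node_id;    /* KFD topology node index */
    int device_id;  /* illustrative identifier of the owning GPU device */
};

/* Count how many KFD nodes belong to the given device. */
size_t kfd_nodes_for_device(const struct kfd_node_info *nodes, size_t n,
                            int device_id)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (nodes[i].device_id == device_id)
            count++;
    return count;
}
```

When the count for a device is greater than one, a single link named "kfd" in the device directory can point at only one of those nodes.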
Best Regards,
Harish
________________________________________
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Mario Limonciello (AMD) <superm1@kernel.org>
Sent: Wednesday, December 10, 2025 3:15 PM
To: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Mario Limonciello (AMD) <superm1@kernel.org>
Subject: [PATCH v2 3/3] amdkfd: Add device links between kfd device and amdgpu device
Mapping out a KFD device to a GPU can be done manually by looking at the
domain and location properties. To make it easier to discover which
KFD device goes with what GPU add bidirectional links.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 8 +++++
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 36 +++++++++++++++++++
.../gpu/drm/amd/include/kgd_kfd_interface.h | 2 ++
5 files changed, 51 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index a2879d2b7c8ec..5d6cf3adfa7b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -910,3 +910,11 @@ int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev, uint32_t xcp_id,
return r;
}
+
+int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev)
+{
+ if (!adev->kfd.init_complete || !adev->kfd.dev)
+ return 0;
+
+ return kgd2kfd_create_sysfs_links(adev->kfd.dev);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index da4575676335f..fd92b227a674b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -272,6 +272,7 @@ int amdgpu_amdkfd_stop_sched(struct amdgpu_device *adev, uint32_t node_id);
int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev, uint32_t xcp_id,
bool core_override_enable, bool reg_override_enable, bool perfmon_override_enable);
bool amdgpu_amdkfd_compute_active(struct amdgpu_device *adev, uint32_t node_id);
+int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev);
/* Read user wptr from a specified user address space with page fault
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7a0213a07023d..44c9320d72a56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4947,6 +4947,10 @@ int amdgpu_device_init(struct amdgpu_device *adev,
*/
r = amdgpu_device_sys_interface_init(adev);
+ r = amdgpu_amdkfd_create_sysfs_links(adev);
+ if (r)
+ dev_err(adev->dev, "Failed to create KFD sysfs link: %d\n", r);
+
if (IS_ENABLED(CONFIG_PERF_EVENTS))
r = amdgpu_pmu_init(adev);
if (r)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index b65f29294e2d6..796fd411a7dcc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -79,6 +79,37 @@ struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
return device;
}
+int kgd2kfd_create_sysfs_links(struct kfd_dev *kfd)
+{
+ struct kfd_topology_device *top_dev;
+ int ret = -ENODEV;
+
+ if (!kfd)
+ return -EINVAL;
+
+ down_read(&topology_lock);
+
+ list_for_each_entry(top_dev, &topology_device_list, list) {
+ struct kobject *amdgpu_kobj;
+
+ if (!top_dev->gpu || top_dev->gpu->kfd != kfd || !top_dev->kobj_node)
+ continue;
+
+ amdgpu_kobj = &top_dev->gpu->adev->dev->kobj;
+ ret = sysfs_create_link(top_dev->kobj_node, amdgpu_kobj, "device");
+ if (ret)
+ break;
+
+ ret = sysfs_create_link(amdgpu_kobj, top_dev->kobj_node, "kfd");
+ if (ret)
+ sysfs_remove_link(top_dev->kobj_node, "device");
+ break;
+ }
+
+ up_read(&topology_lock);
+ return ret;
+}
+
struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id)
{
struct kfd_topology_device *top_dev = NULL;
@@ -571,6 +602,11 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
struct kfd_mem_properties *mem;
struct kfd_perf_properties *perf;
+ if (dev->gpu) {
+ sysfs_remove_link(dev->kobj_node, "device");
+ sysfs_remove_link(&dev->gpu->adev->dev->kobj, "kfd");
+ }
+
if (dev->kobj_iolink) {
list_for_each_entry(iolink, &dev->io_link_props, list)
if (iolink->kobj) {
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 9aba8596faa7e..f6db1dc634399 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -335,4 +335,6 @@ struct kfd2kgd_calls {
int engine, int queue);
};
+int kgd2kfd_create_sysfs_links(struct kfd_dev *kfd);
+
#endif /* KGD_KFD_INTERFACE_H_INCLUDED */
--
2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 3/3] amdkfd: Add device links between kfd device and amdgpu device
2025-12-16 2:45 ` Kasiviswanathan, Harish
@ 2025-12-16 12:20 ` Mario Limonciello
0 siblings, 0 replies; 9+ messages in thread
From: Mario Limonciello @ 2025-12-16 12:20 UTC (permalink / raw)
To: Kasiviswanathan, Harish, amd-gfx@lists.freedesktop.org
Thanks. I've posted a v3 that hopefully addresses this.
On 12/15/25 8:45 PM, Kasiviswanathan, Harish wrote:
> Similar comment to previous patch. Once you do spatial partitioning (like QPX or CPX), there is no one-to-one correspondence between drm node and kfd node. Partitions don't change device node however, you could have multiple (4 or 8) kfd nodes.
>
> Best Regards,
> Harish
>
>
> ________________________________________
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Mario Limonciello (AMD) <superm1@kernel.org>
> Sent: Wednesday, December 10, 2025 3:15 PM
> To: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
> Cc: Mario Limonciello (AMD) <superm1@kernel.org>
> Subject: [PATCH v2 3/3] amdkfd: Add device links between kfd device and amdgpu device
>
> Mapping out a KFD device to a GPU can be done manually by looking at the
> domain and location properties. To make it easier to discover which
> KFD device goes with what GPU add bidirectional links.
>
> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 8 +++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++
> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 36 +++++++++++++++++++
> .../gpu/drm/amd/include/kgd_kfd_interface.h | 2 ++
> 5 files changed, 51 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index a2879d2b7c8ec..5d6cf3adfa7b8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -910,3 +910,11 @@ int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev, uint32_t xcp_id,
>
> return r;
> }
> +
> +int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev)
> +{
> + if (!adev->kfd.init_complete || !adev->kfd.dev)
> + return 0;
> +
> + return kgd2kfd_create_sysfs_links(adev->kfd.dev);
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index da4575676335f..fd92b227a674b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -272,6 +272,7 @@ int amdgpu_amdkfd_stop_sched(struct amdgpu_device *adev, uint32_t node_id);
> int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev, uint32_t xcp_id,
> bool core_override_enable, bool reg_override_enable, bool perfmon_override_enable);
> bool amdgpu_amdkfd_compute_active(struct amdgpu_device *adev, uint32_t node_id);
> +int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev);
>
>
> /* Read user wptr from a specified user address space with page fault
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 7a0213a07023d..44c9320d72a56 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4947,6 +4947,10 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> */
> r = amdgpu_device_sys_interface_init(adev);
>
> + r = amdgpu_amdkfd_create_sysfs_links(adev);
> + if (r)
> + dev_err(adev->dev, "Failed to create KFD sysfs link: %d\n", r);
> +
> if (IS_ENABLED(CONFIG_PERF_EVENTS))
> r = amdgpu_pmu_init(adev);
> if (r)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index b65f29294e2d6..796fd411a7dcc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -79,6 +79,37 @@ struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
> return device;
> }
>
> +int kgd2kfd_create_sysfs_links(struct kfd_dev *kfd)
> +{
> + struct kfd_topology_device *top_dev;
> + int ret = -ENODEV;
> +
> + if (!kfd)
> + return -EINVAL;
> +
> + down_read(&topology_lock);
> +
> + list_for_each_entry(top_dev, &topology_device_list, list) {
> + struct kobject *amdgpu_kobj;
> +
> + if (!top_dev->gpu || top_dev->gpu->kfd != kfd || !top_dev->kobj_node)
> + continue;
> +
> + amdgpu_kobj = &top_dev->gpu->adev->dev->kobj;
> + ret = sysfs_create_link(top_dev->kobj_node, amdgpu_kobj, "device");
> + if (ret)
> + break;
> +
> + ret = sysfs_create_link(amdgpu_kobj, top_dev->kobj_node, "kfd");
> + if (ret)
> + sysfs_remove_link(top_dev->kobj_node, "device");
> + break;
> + }
> +
> + up_read(&topology_lock);
> + return ret;
> +}
> +
> struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id)
> {
> struct kfd_topology_device *top_dev = NULL;
> @@ -571,6 +602,11 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
> struct kfd_mem_properties *mem;
> struct kfd_perf_properties *perf;
>
> + if (dev->gpu) {
> + sysfs_remove_link(dev->kobj_node, "device");
> + sysfs_remove_link(&dev->gpu->adev->dev->kobj, "kfd");
> + }
> +
> if (dev->kobj_iolink) {
> list_for_each_entry(iolink, &dev->io_link_props, list)
> if (iolink->kobj) {
> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> index 9aba8596faa7e..f6db1dc634399 100644
> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> @@ -335,4 +335,6 @@ struct kfd2kgd_calls {
> int engine, int queue);
> };
>
> +int kgd2kfd_create_sysfs_links(struct kfd_dev *kfd);
> +
> #endif /* KGD_KFD_INTERFACE_H_INCLUDED */
> --
> 2.43.0