* [PATCH v3 0/2] Make device links between KFD and GPU device
@ 2025-12-16 6:00 Mario Limonciello (AMD)
2025-12-16 6:00 ` [PATCH v3 1/2] amdkfd: Only ignore -ENOENT for KFD init failures Mario Limonciello (AMD)
2025-12-16 6:00 ` [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device Mario Limonciello (AMD)
0 siblings, 2 replies; 9+ messages in thread
From: Mario Limonciello (AMD) @ 2025-12-16 6:00 UTC (permalink / raw)
To: amd-gfx; +Cc: Mario Limonciello (AMD)
Discovering which GPU device is associated with a KFD node is
relatively awkward right now in userspace.
This series creates a link from KFD to GPU to simplify it
for userspace.
Mario Limonciello (AMD) (2):
amdkfd: Only ignore -ENOENT for KFD init failures
amdkfd: Add device links between kfd device and amdgpu device
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 8 ++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 ++++--
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 17 ++++++++++++++++-
drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 2 ++
6 files changed, 35 insertions(+), 3 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v3 1/2] amdkfd: Only ignore -ENOENT for KFD init failures
2025-12-16 6:00 [PATCH v3 0/2] Make device links between KFD and GPU device Mario Limonciello (AMD)
@ 2025-12-16 6:00 ` Mario Limonciello (AMD)
2025-12-16 6:00 ` [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device Mario Limonciello (AMD)
1 sibling, 0 replies; 9+ messages in thread
From: Mario Limonciello (AMD) @ 2025-12-16 6:00 UTC (permalink / raw)
To: amd-gfx; +Cc: Mario Limonciello (AMD), Kent Russell
When compiled without CONFIG_HSA_AMD, KFD init returns -ENOENT.
Any other error will cause KFD functionality issues, so -ENOENT is the
only error code that should be ignored at init.
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 18658985a57ce..7eaea3f216fd3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -3177,8 +3177,10 @@ static int __init amdgpu_init(void)
amdgpu_register_atpx_handler();
amdgpu_acpi_detect();
- /* Ignore KFD init failures. Normal when CONFIG_HSA_AMD is not set. */
- amdgpu_amdkfd_init();
+ /* Ignore KFD init failures when CONFIG_HSA_AMD is not set. */
+ r = amdgpu_amdkfd_init();
+ if (r && r != -ENOENT)
+ goto error_fence;
if (amdgpu_pp_feature_mask & PP_OVERDRIVE_MASK) {
add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
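A minimal sketch of the error-handling rule described in the patch above, written as plain userspace C rather than kernel code: the init result of an optional component is ignored only when it is -ENOENT, and every other error is passed up to the caller. The names init_optional, stub_not_built, stub_oom and stub_ok are hypothetical and exist only for illustration; they are not part of amdgpu.

```c
#include <errno.h>

/* Hypothetical stand-in for an optional subsystem's init hook. */
typedef int (*init_fn)(void);

/*
 * Run an optional init hook. -ENOENT means "not built in" and is
 * tolerated; any other non-zero result is a real failure and is
 * returned unchanged so the caller can unwind.
 */
static int init_optional(init_fn init)
{
	int r = init();

	if (r && r != -ENOENT)
		return r;

	return 0;
}

/* Illustrative stubs: feature compiled out, real failure, success. */
static int stub_not_built(void) { return -ENOENT; }
static int stub_oom(void)       { return -ENOMEM; }
static int stub_ok(void)        { return 0; }
```

With these stubs, init_optional(stub_not_built) and init_optional(stub_ok) both return 0, while init_optional(stub_oom) returns -ENOMEM.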
* [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
2025-12-16 6:00 [PATCH v3 0/2] Make device links between KFD and GPU device Mario Limonciello (AMD)
2025-12-16 6:00 ` [PATCH v3 1/2] amdkfd: Only ignore -ENOENT for KFD init failures Mario Limonciello (AMD)
@ 2025-12-16 6:00 ` Mario Limonciello (AMD)
2025-12-16 6:22 ` Lazar, Lijo
1 sibling, 1 reply; 9+ messages in thread
From: Mario Limonciello (AMD) @ 2025-12-16 6:00 UTC (permalink / raw)
To: amd-gfx; +Cc: Mario Limonciello (AMD), Harish.Kasiviswanathan
Mapping a KFD device to a GPU can be done manually by looking at the
domain and location properties. To make it easier to discover which
KFD device goes with which GPU, add a link to the GPU node.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---
Cc: Harish.Kasiviswanathan@amd.com
v3:
* Create link when topology created
* Only call update topology when amdgpu is called
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 8 ++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 17 ++++++++++++++++-
drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 2 ++
5 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 67a01c4f38855..870a727d6e938 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -910,3 +910,11 @@ int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev, uint32_t xcp_id,
return r;
}
+
+int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev)
+{
+ if (!adev->kfd.init_complete || !adev->kfd.dev)
+ return 0;
+
+ return kfd_topology_update_sysfs();
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 8bdfcde2029b5..07aa519b28d45 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -268,6 +268,7 @@ int amdgpu_amdkfd_stop_sched(struct amdgpu_device *adev, uint32_t node_id);
int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev, uint32_t xcp_id,
bool core_override_enable, bool reg_override_enable, bool perfmon_override_enable);
bool amdgpu_amdkfd_compute_active(struct amdgpu_device *adev, uint32_t node_id);
+int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev);
/* Read user wptr from a specified user address space with page fault
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 467326871a81e..d4c8b03b6bf57 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5123,6 +5123,10 @@ int amdgpu_device_init(struct amdgpu_device *adev,
*/
r = amdgpu_device_sys_interface_init(adev);
+ r = amdgpu_amdkfd_create_sysfs_links(adev);
+ if (r)
+ dev_err(adev->dev, "Failed to create KFD sysfs link: %d\n", r);
+
if (IS_ENABLED(CONFIG_PERF_EVENTS))
r = amdgpu_pmu_init(adev);
if (r)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index a95be23fd0397..5f14c66902f9d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -571,6 +571,9 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
struct kfd_mem_properties *mem;
struct kfd_perf_properties *perf;
+ if (dev->gpu)
+ sysfs_remove_link(dev->kobj_node, "device");
+
if (dev->kobj_iolink) {
list_for_each_entry(iolink, &dev->io_link_props, list)
if (iolink->kobj) {
@@ -819,6 +822,18 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
return ret;
}
+ /*
+ * create a link to the GPU node, but don't do a reverse one since it might
+ * not match after spatial partitioning
+ */
+ if (dev->gpu) {
+ struct kobject *amdgpu_kobj = &dev->gpu->adev->dev->kobj;
+
+ ret = sysfs_create_link(dev->kobj_node, amdgpu_kobj, "device");
+ if (ret)
+ return ret;
+ }
+
return 0;
}
@@ -848,7 +863,7 @@ static void kfd_remove_sysfs_node_tree(void)
kfd_remove_sysfs_node_entry(dev);
}
-static int kfd_topology_update_sysfs(void)
+int kfd_topology_update_sysfs(void)
{
int ret;
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 9aba8596faa7e..0ee1a7d3a73f5 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -335,4 +335,6 @@ struct kfd2kgd_calls {
int engine, int queue);
};
+int kfd_topology_update_sysfs(void);
+
#endif /* KGD_KFD_INTERFACE_H_INCLUDED */
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
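For illustration, a userspace sketch of how the proposed link could be consumed. It assumes the link is named "device" (as in the sysfs_create_link() call above) and that each node's kobject directory is /sys/class/kfd/kfd/topology/nodes/<N>/, which is an assumption on my part; the helper names are hypothetical. It builds the link path and resolves it with POSIX realpath().

```c
#define _XOPEN_SOURCE 700	/* for realpath() and PATH_MAX */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed location of the KFD topology nodes in sysfs. */
#define KFD_NODES_DIR "/sys/class/kfd/kfd/topology/nodes"

/* Build ".../nodes/<node>/device" into buf; -1 if it does not fit. */
static int kfd_node_link_path(unsigned int node, char *buf, size_t len)
{
	int n = snprintf(buf, len, KFD_NODES_DIR "/%u/device", node);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Resolve the "device" link of a KFD node to the canonical sysfs path
 * of the GPU it points at. resolved must hold PATH_MAX bytes.
 * Returns 0 on success, -1 if the link is missing or unreadable
 * (for example on a CPU-only node or a kernel without this patch).
 */
static int kfd_node_resolve_gpu(unsigned int node, char resolved[PATH_MAX])
{
	char link[PATH_MAX];

	if (kfd_node_link_path(node, link, sizeof(link)))
		return -1;

	return realpath(link, resolved) ? 0 : -1;
}
```

A caller would typically loop over the node numbers found in the nodes directory and skip any node for which kfd_node_resolve_gpu() returns -1.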
* Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
2025-12-16 6:00 ` [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device Mario Limonciello (AMD)
@ 2025-12-16 6:22 ` Lazar, Lijo
2025-12-16 12:19 ` Mario Limonciello
0 siblings, 1 reply; 9+ messages in thread
From: Lazar, Lijo @ 2025-12-16 6:22 UTC (permalink / raw)
To: Mario Limonciello (AMD), amd-gfx; +Cc: Harish.Kasiviswanathan
On 16-Dec-25 11:30 AM, Mario Limonciello (AMD) wrote:
> Mapping out a KFD device to a GPU can be done manually by looking at the
> domain and location properties. To make it easier to discover which
> KFD device goes with what GPU add a link to the GPU node.
>
Access to the full device is not desirable in container environments
where it is restricted to the particular partition's properties.
Thanks,
Lijo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
2025-12-16 6:22 ` Lazar, Lijo
@ 2025-12-16 12:19 ` Mario Limonciello
2025-12-16 12:40 ` Lazar, Lijo
0 siblings, 1 reply; 9+ messages in thread
From: Mario Limonciello @ 2025-12-16 12:19 UTC (permalink / raw)
To: Lazar, Lijo, amd-gfx; +Cc: Harish.Kasiviswanathan
On 12/16/25 12:22 AM, Lazar, Lijo wrote:
>
>
> On 16-Dec-25 11:30 AM, Mario Limonciello (AMD) wrote:
>> Mapping out a KFD device to a GPU can be done manually by looking at the
>> domain and location properties. To make it easier to discover which
>> KFD device goes with what GPU add a link to the GPU node.
>>
>
> Access to the full device is not desirable in container environments
> where it is restricted to the particular partition's properties.
>
Container environments don't typically bind mount the whole sysfs tree,
do they?
Nonetheless, even if they did, this information is already discoverable;
it's just a PIA to get to.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
2025-12-16 12:19 ` Mario Limonciello
@ 2025-12-16 12:40 ` Lazar, Lijo
2025-12-16 14:01 ` Mario Limonciello
0 siblings, 1 reply; 9+ messages in thread
From: Lazar, Lijo @ 2025-12-16 12:40 UTC (permalink / raw)
To: Mario Limonciello, amd-gfx; +Cc: Harish.Kasiviswanathan
On 16-Dec-25 5:49 PM, Mario Limonciello wrote:
>
>
> On 12/16/25 12:22 AM, Lazar, Lijo wrote:
>>
>>
>> On 16-Dec-25 11:30 AM, Mario Limonciello (AMD) wrote:
>>> Mapping out a KFD device to a GPU can be done manually by looking at the
>>> domain and location properties. To make it easier to discover which
>>> KFD device goes with what GPU add a link to the GPU node.
>>>
>>
>> Access to the full device is not desirable in container environments
>> where it is restricted to the particular partition's properties.
>>
>
> Container environments don't typically bind mount the whole sysfs tree
> do they?
>
AFAIK, only selected ones, and access is restricted through cgroups.
Thanks,
Lijo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
2025-12-16 12:40 ` Lazar, Lijo
@ 2025-12-16 14:01 ` Mario Limonciello
2025-12-16 14:25 ` Kasiviswanathan, Harish
0 siblings, 1 reply; 9+ messages in thread
From: Mario Limonciello @ 2025-12-16 14:01 UTC (permalink / raw)
To: Lazar, Lijo, amd-gfx; +Cc: Harish.Kasiviswanathan
On 12/16/25 6:40 AM, Lazar, Lijo wrote:
>
>
> On 16-Dec-25 5:49 PM, Mario Limonciello wrote:
>>
>>
>> On 12/16/25 12:22 AM, Lazar, Lijo wrote:
>>>
>>>
>>> On 16-Dec-25 11:30 AM, Mario Limonciello (AMD) wrote:
>>>> Mapping out a KFD device to a GPU can be done manually by looking at
>>>> the
>>>> domain and location properties. To make it easier to discover which
>>>> KFD device goes with what GPU add a link to the GPU node.
>>>>
>>>
>>> Access to the full device is not desirable in container environments
>>> where it is restricted to the particular partition's properties.
>>>
>>
>> Container environments don't typically bind mount the whole sysfs tree
>> do they?
>>
>
> AFAIK, only selected ones and access restricted through cgroups.
The information needed to discover this is definitely already exposed.
❯ cat /sys/class/kfd/kfd/topology/nodes/1/properties | grep "domain\|location_id\|vendor_id\|device_id"
vendor_id 4098
device_id 5510
location_id 49664
domain 0
But so what's going to happen with the new symlink then? Would cgroups
export it if the device it points to wasn't exported?
I'm not sure how to try this myself.
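To make the manual mapping concrete, here is a small sketch that turns the domain and location_id properties into a PCI address string. It assumes the low 16 bits of location_id use the usual (bus << 8) | devfn encoding, with devfn split into a 5-bit slot and a 3-bit function; that layout is my reading and is not stated in this thread. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format "dddd:bb:ss.f" from the KFD topology "domain" and
 * "location_id" properties. Only the low 16 bits of location_id are
 * used (assumed layout: bus in bits 15..8, slot in 7..3, function in
 * 2..0). Returns 0 on success, -1 if buf is too small.
 */
static int kfd_location_to_bdf(uint32_t domain, uint32_t location_id,
			       char *buf, size_t len)
{
	unsigned int bus, slot, func;
	int n;

	assert(buf != NULL);

	bus  = (location_id >> 8) & 0xff;
	slot = (location_id >> 3) & 0x1f;
	func = location_id & 0x7;

	n = snprintf(buf, len, "%04x:%02x:%02x.%x",
		     (unsigned int)domain, bus, slot, func);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Under that assumption, the values shown above (domain 0, location_id 49664, which is 0xc200) give 0000:c2:00.0, which can then be looked up under /sys/bus/pci/devices/.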
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
2025-12-16 14:01 ` Mario Limonciello
@ 2025-12-16 14:25 ` Kasiviswanathan, Harish
2025-12-16 14:41 ` Mario Limonciello
0 siblings, 1 reply; 9+ messages in thread
From: Kasiviswanathan, Harish @ 2025-12-16 14:25 UTC (permalink / raw)
To: Mario Limonciello, Lazar, Lijo, amd-gfx@lists.freedesktop.org
To try this scenario, you can do the following:
$ set partition mode to QPX
$ run docker, but instead of /dev/dri pass /dev/dri/renderD128 --> this way the container can access only a single partition of QPX.
# inside docker
## run rocminfo --> you should see only one device
## in /sys/class/kfd/kfd/topology/nodes --> you can see all 4 nodes, but only /sys/class/kfd/kfd/topology/nodes/1/ will be accessible
It was designed this way because hiding the nodes was hard to architect. Hence, from the container the node list is visible but access is based on cgroups. So, with /dev/dri/renderDXXX only one of the /sys/class/kfd/kfd/topology/nodes/X/ entries will be accessible and usable.
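A rough sketch of how the check inside the container could be scripted. It tries to open and read one byte from a given file, such as a node's properties file, and reports the errno on failure. How a restricted node actually fails (at open, at read, or by returning no data) is not stated in this thread, so the helper treats a failure at either step as "not accessible"; the function name is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open path read-only and read a single byte from it.
 * Returns 0 if both steps succeed (an empty file also counts),
 * otherwise the negative errno of the step that failed.
 */
static int probe_readable(const char *path)
{
	char byte;
	ssize_t n;
	int err = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -errno;

	n = read(fd, &byte, 1);
	if (n < 0)
		err = -errno;

	close(fd);
	return err;
}
```

Calling it on /sys/class/kfd/kfd/topology/nodes/<N>/properties for each listed node should, in the setup described above, succeed for only one of them.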
________________________________________
From: Mario Limonciello <superm1@kernel.org>
Sent: Tuesday, December 16, 2025 9:01 AM
To: Lazar, Lijo <Lijo.Lazar@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Kasiviswanathan, Harish <Harish.Kasiviswanathan@amd.com>
Subject: Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
On 12/16/25 6:40 AM, Lazar, Lijo wrote:
>
>
> On 16-Dec-25 5:49 PM, Mario Limonciello wrote:
>>
>>
>> On 12/16/25 12:22 AM, Lazar, Lijo wrote:
>>>
>>>
>>> On 16-Dec-25 11:30 AM, Mario Limonciello (AMD) wrote:
>>>> Mapping out a KFD device to a GPU can be done manually by looking at
>>>> the
>>>> domain and location properties. To make it easier to discover which
>>>> KFD device goes with what GPU add a link to the GPU node.
>>>>
>>>
>>> Access to the full device is not desirable in container environments
>>> where it is restricted to the particular partition's properties.
>>>
>>
>> Container environments don't typically bind mount the whole sysfs tree
>> do they?
>>
>
> AFAIK, only selected ones and access restricted through cgroups.
The information needed to discover is definitely already exposed.
❯ cat /sys/class/kfd/kfd/topology/nodes/1/properties | grep "domain\|location_id\|vendor_id\|device_id"
vendor_id 4098
device_id 5510
location_id 49664
domain 0
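A minimal sketch of that manual mapping, assuming location_id packs the PCI bus number and devfn as (bus << 8) | devfn. That encoding is an assumption here, although it agrees with the values above: 49664 is 0xc200, i.e. bus c2, device 00, function 0. The helper name kfd_node_bdf is hypothetical and only for illustration.

```shell
# Hypothetical helper: print the PCI address (domain:bus:dev.fn) described by
# a KFD topology "properties" file.
# Assumption: location_id = (bus << 8) | devfn, with devfn = (dev << 3) | fn.
kfd_node_bdf() {
    props=$1
    domain=$(awk '$1 == "domain" { print $2 }' "$props")
    loc=$(awk '$1 == "location_id" { print $2 }' "$props")
    # Bail out if either property is missing from the file.
    [ -n "$domain" ] && [ -n "$loc" ] || return 1
    printf '%04x:%02x:%02x.%x\n' \
        "$domain" $(( (loc >> 8) & 0xff )) $(( (loc >> 3) & 0x1f )) $(( loc & 0x7 ))
}
```

Usage would be something like `kfd_node_bdf /sys/class/kfd/kfd/topology/nodes/1/properties`. Note that CPU-only nodes report location_id 0, so a caller would probably want to skip those.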
But so what's going to happen with the new symlink then? Would cgroups
export it if the device it points to wasn't exported?
I'm not sure how to try this myself.
>
> Thanks,
> Lijo
>
>> Nonetheless; even if they did this information is already
>> discoverable, it's just a PIA to get to.
>>
>>> Thanks,
>>> Lijo
>>>
>>>> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
>>>> ---
>>>> Cc: Harish.Kasiviswanathan@amd.com
>>>> v3:
>>>> * Create link when topology created
>>>> * Only call update topology when amdgpu is called
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 8 ++++++++
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 +
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
>>>> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 17 ++++++++++++++++-
>>>> drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 2 ++
>>>> 5 files changed, 31 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> index 67a01c4f38855..870a727d6e938 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> @@ -910,3 +910,11 @@ int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev, uint32_t xcp_id,
>>>> return r;
>>>> }
>>>> +
>>>> +int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev)
>>>> +{
>>>> + if (!adev->kfd.init_complete || !adev->kfd.dev)
>>>> + return 0;
>>>> +
>>>> + return kfd_topology_update_sysfs();
>>>> +}
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> index 8bdfcde2029b5..07aa519b28d45 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> @@ -268,6 +268,7 @@ int amdgpu_amdkfd_stop_sched(struct amdgpu_device *adev, uint32_t node_id);
>>>> int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev, uint32_t xcp_id,
>>>> bool core_override_enable, bool reg_override_enable, bool perfmon_override_enable);
>>>> bool amdgpu_amdkfd_compute_active(struct amdgpu_device *adev, uint32_t node_id);
>>>> +int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev);
>>>> /* Read user wptr from a specified user address space with page fault
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index 467326871a81e..d4c8b03b6bf57 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -5123,6 +5123,10 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>>> */
>>>> r = amdgpu_device_sys_interface_init(adev);
>>>> + r = amdgpu_amdkfd_create_sysfs_links(adev);
>>>> + if (r)
>>>> + dev_err(adev->dev, "Failed to create KFD sysfs link: %d\n", r);
>>>> +
>>>> if (IS_ENABLED(CONFIG_PERF_EVENTS))
>>>> r = amdgpu_pmu_init(adev);
>>>> if (r)
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> index a95be23fd0397..5f14c66902f9d 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> @@ -571,6 +571,9 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>>> struct kfd_mem_properties *mem;
>>>> struct kfd_perf_properties *perf;
>>>> + if (dev->gpu)
>>>> + sysfs_remove_link(dev->kobj_node, "device");
>>>> +
>>>> if (dev->kobj_iolink) {
>>>> list_for_each_entry(iolink, &dev->io_link_props, list)
>>>> if (iolink->kobj) {
>>>> @@ -819,6 +822,18 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>> return ret;
>>>> }
>>>> + /*
>>>> + * create a link to the GPU node, but don't do a reverse one since it might
>>>> + * not match after spatial partitioning
>>>> + */
>>>> + if (dev->gpu) {
>>>> + struct kobject *amdgpu_kobj = &dev->gpu->adev->dev->kobj;
>>>> +
>>>> + ret = sysfs_create_link(dev->kobj_node, amdgpu_kobj, "device");
>>>> + if (ret)
>>>> + return ret;
>>>> + }
>>>> +
>>>> return 0;
>>>> }
>>>> @@ -848,7 +863,7 @@ static void kfd_remove_sysfs_node_tree(void)
>>>> kfd_remove_sysfs_node_entry(dev);
>>>> }
>>>> -static int kfd_topology_update_sysfs(void)
>>>> +int kfd_topology_update_sysfs(void)
>>>> {
>>>> int ret;
>>>> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>> index 9aba8596faa7e..0ee1a7d3a73f5 100644
>>>> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>> @@ -335,4 +335,6 @@ struct kfd2kgd_calls {
>>>> int engine, int queue);
>>>> };
>>>> +int kfd_topology_update_sysfs(void);
>>>> +
>>>> #endif /* KGD_KFD_INTERFACE_H_INCLUDED */
>>>
>>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
2025-12-16 14:25 ` Kasiviswanathan, Harish
@ 2025-12-16 14:41 ` Mario Limonciello
0 siblings, 0 replies; 9+ messages in thread
From: Mario Limonciello @ 2025-12-16 14:41 UTC (permalink / raw)
To: Kasiviswanathan, Harish, Lazar, Lijo,
amd-gfx@lists.freedesktop.org
On 12/16/25 8:25 AM, Kasiviswanathan, Harish wrote:
> To try this scenario, you can do the following.
Ah, I don't have an Instinct on my side to try this, only multiple Radeon
cards.
>
> $ set partition mode to QPX
> $ run docker but instead of /dev/dri use /dev/dri/renderD128 --> this way the container can access only one single partition of QPX.
> # inside docker
> ## run rocminfo --> you should only see one device
> ## In /sys/class/kfd/kfd/topology/nodes --> you can see all 4 nodes, but only /sys/class/kfd/kfd/topology/nodes/1/ will be accessible
>
> It was designed this way because hiding the nodes was hard to architect. Hence, from the container the node list is visible, but access is based on cgroups. So, with /dev/dri/renderDXXX only one of the /sys/class/kfd/kfd/topology/nodes/X/ entries will be accessible and usable.
>
But then shouldn't /sys/class/drm/renderD128 be exposed too?
It has a device sub-link as well, which should match what this change does.
I would expect that link to be exposed too.
❯ ls -alh /sys/class/drm/renderD128/ | grep device
lrwxrwxrwx - root 16 Dec 08:40 device -> ../../../0000:c2:00.0
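For comparison, a rough sketch of reading that existing link from a script. The helper name drm_node_bdf is hypothetical; it only assumes that the node directory contains a "device" symlink whose target's last path component is the PCI address, as in the listing above.

```shell
# Hypothetical helper: print the PCI address that a DRM node's "device"
# symlink resolves to, e.g. for /sys/class/drm/renderD128.
drm_node_bdf() {
    link=$1/device
    # Fail if the node has no "device" symlink (or is not visible here).
    [ -L "$link" ] || return 1
    # Resolve the link and keep only the last path component.
    basename "$(readlink -f "$link")"
}
```

Something like `drm_node_bdf /sys/class/drm/renderD128` would then print the same address that a "device" link under the KFD node is expected to resolve to, if both are visible in the container.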
Thread overview: 9+ messages
2025-12-16 6:00 [PATCH v3 0/2] Make device links between KFD and GPU device Mario Limonciello (AMD)
2025-12-16 6:00 ` [PATCH v3 1/2] amdkfd: Only ignore -ENOENT for KFD init failuires Mario Limonciello (AMD)
2025-12-16 6:00 ` [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device Mario Limonciello (AMD)
2025-12-16 6:22 ` Lazar, Lijo
2025-12-16 12:19 ` Mario Limonciello
2025-12-16 12:40 ` Lazar, Lijo
2025-12-16 14:01 ` Mario Limonciello
2025-12-16 14:25 ` Kasiviswanathan, Harish
2025-12-16 14:41 ` Mario Limonciello