* [PATCH v3 0/2] Make device links between KFD and GPU device
@ 2025-12-16  6:00 Mario Limonciello (AMD)
  2025-12-16  6:00 ` [PATCH v3 1/2] amdkfd: Only ignore -ENOENT for KFD init failuires Mario Limonciello (AMD)
  2025-12-16  6:00 ` [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device Mario Limonciello (AMD)
  0 siblings, 2 replies; 9+ messages in thread
From: Mario Limonciello (AMD) @ 2025-12-16  6:00 UTC (permalink / raw)
  To: amd-gfx; +Cc: Mario Limonciello (AMD)

Discovering which GPU device is associated with a KFD node is relatively
awkward right now in userspace. This series creates a link from KFD
to GPU to simplify it for userspace.

Mario Limonciello (AMD) (2):
  amdkfd: Only ignore -ENOENT for KFD init failuires
  amdkfd: Add device links between kfd device and amdgpu device

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c      |  8 ++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h      |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c      |  4 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c         |  6 ++++--
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c       | 17 ++++++++++++++++-
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  2 ++
 6 files changed, 35 insertions(+), 3 deletions(-)

-- 
2.43.0

* [PATCH v3 1/2] amdkfd: Only ignore -ENOENT for KFD init failuires
From: Mario Limonciello (AMD) @ 2025-12-16  6:00 UTC
  To: amd-gfx; +Cc: Mario Limonciello (AMD), Kent Russell

When compiled without CONFIG_HSA_AMD KFD will return -ENOENT.
As other errors will cause KFD functionality issues this is the
only error code that should be ignored at init.

Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 18658985a57ce..7eaea3f216fd3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -3177,8 +3177,10 @@ static int __init amdgpu_init(void)
 	amdgpu_register_atpx_handler();
 	amdgpu_acpi_detect();
 
-	/* Ignore KFD init failures. Normal when CONFIG_HSA_AMD is not set. */
-	amdgpu_amdkfd_init();
+	/* Ignore KFD init failures when CONFIG_HSA_AMD is not set. */
+	r = amdgpu_amdkfd_init();
+	if (r && r != -ENOENT)
+		goto error_fence;
 
 	if (amdgpu_pp_feature_mask & PP_OVERDRIVE_MASK) {
 		add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
-- 
2.43.0
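
The error filter in the patch above can be tried outside the kernel. The following is a minimal userspace sketch, not code from the series: the helper name kfd_init_error_to_propagate() is hypothetical, and it only mirrors the `r && r != -ENOENT` test that the hunk adds to amdgpu_init().

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper, not part of the patch: decide what a caller such as
 * amdgpu_init() should do with the return value of KFD init.
 * Returns 0 if module init may continue, or the error to propagate.
 */
static int kfd_init_error_to_propagate(int r)
{
	/* Kernel-style return values are 0 or a negative errno. */
	assert(r <= 0);

	/* -ENOENT is what the commit message says a build without CONFIG_HSA_AMD returns. */
	if (r && r != -ENOENT)
		return r;

	return 0;
}
```

With this shape, success and -ENOENT both let initialization continue, while any other negative errno (for example -ENOMEM) is passed back to the caller, which is the behavior the commit message describes.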

* [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
From: Mario Limonciello (AMD) @ 2025-12-16  6:00 UTC
  To: amd-gfx; +Cc: Mario Limonciello (AMD), Harish.Kasiviswanathan

Mapping out a KFD device to a GPU can be done manually by looking at the
domain and location properties. To make it easier to discover which
KFD device goes with what GPU add a link to the GPU node.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---
Cc: Harish.Kasiviswanathan@amd.com
v3:
 * Create link when topology created
 * Only call update topology when amdgpu is called
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c      |  8 ++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h      |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c      |  4 ++++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c       | 17 ++++++++++++++++-
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  2 ++
 5 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 67a01c4f38855..870a727d6e938 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -910,3 +910,11 @@ int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev, uint32_t xcp_id,
 
 	return r;
 }
+
+int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev)
+{
+	if (!adev->kfd.init_complete || !adev->kfd.dev)
+		return 0;
+
+	return kfd_topology_update_sysfs();
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 8bdfcde2029b5..07aa519b28d45 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -268,6 +268,7 @@ int amdgpu_amdkfd_stop_sched(struct amdgpu_device *adev, uint32_t node_id);
 int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev, uint32_t xcp_id,
 	bool core_override_enable, bool reg_override_enable, bool perfmon_override_enable);
 bool amdgpu_amdkfd_compute_active(struct amdgpu_device *adev, uint32_t node_id);
+int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev);
 
 
 /* Read user wptr from a specified user address space with page fault
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 467326871a81e..d4c8b03b6bf57 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5123,6 +5123,10 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	 */
 	r = amdgpu_device_sys_interface_init(adev);
 
+	r = amdgpu_amdkfd_create_sysfs_links(adev);
+	if (r)
+		dev_err(adev->dev, "Failed to create KFD sysfs link: %d\n", r);
+
 	if (IS_ENABLED(CONFIG_PERF_EVENTS))
 		r = amdgpu_pmu_init(adev);
 	if (r)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index a95be23fd0397..5f14c66902f9d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -571,6 +571,9 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
 	struct kfd_mem_properties *mem;
 	struct kfd_perf_properties *perf;
 
+	if (dev->gpu)
+		sysfs_remove_link(dev->kobj_node, "device");
+
 	if (dev->kobj_iolink) {
 		list_for_each_entry(iolink, &dev->io_link_props, list)
 			if (iolink->kobj) {
@@ -819,6 +822,18 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
 			return ret;
 	}
 
+	/*
+	 * create a link to the GPU node, but don't do a reverse one since it might
+	 * not match after spatial partitioning
+	 */
+	if (dev->gpu) {
+		struct kobject *amdgpu_kobj = &dev->gpu->adev->dev->kobj;
+
+		ret = sysfs_create_link(dev->kobj_node, amdgpu_kobj, "device");
+		if (ret)
+			return ret;
+	}
+
 	return 0;
 }
 
@@ -848,7 +863,7 @@ static void kfd_remove_sysfs_node_tree(void)
 		kfd_remove_sysfs_node_entry(dev);
 }
 
-static int kfd_topology_update_sysfs(void)
+int kfd_topology_update_sysfs(void)
 {
 	int ret;
 
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 9aba8596faa7e..0ee1a7d3a73f5 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -335,4 +335,6 @@ struct kfd2kgd_calls {
 			int engine, int queue);
 };
 
+int kfd_topology_update_sysfs(void);
+
 #endif /* KGD_KFD_INTERFACE_H_INCLUDED */
-- 
2.43.0
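
For illustration, this is roughly how userspace could consume the proposed link. It is a sketch under assumptions rather than anything from the series: the helper name kfd_node_gpu_path() is hypothetical, and it assumes the symlink is called `device` and sits directly in the node directory, as the sysfs_create_link() call in the patch suggests.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for realpath() under strict -std= modes */
#endif

#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: resolve <node_dir>/device to the canonical path of
 * the GPU device it points at.
 * Returns 0 on success, -1 if the link is missing or cannot be resolved
 * (for example a CPU-only node or a kernel without this patch).
 */
static int kfd_node_gpu_path(const char *node_dir, char *out, size_t out_len)
{
	char link[PATH_MAX];
	char resolved[PATH_MAX];
	int n;

	assert(node_dir != NULL && out != NULL && out_len > 0);

	n = snprintf(link, sizeof(link), "%s/device", node_dir);
	if (n < 0 || (size_t)n >= sizeof(link))
		return -1;

	/* realpath() follows the symlink and returns an absolute path. */
	if (!realpath(link, resolved))
		return -1;

	if (strlen(resolved) >= out_len)
		return -1;

	strcpy(out, resolved);
	return 0;
}
```

A caller would pass a node directory such as /sys/class/kfd/kfd/topology/nodes/1 and, on success, get back a path under /sys/devices whose last component is the PCI address of the GPU.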

* Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
From: Lazar, Lijo @ 2025-12-16  6:22 UTC
  To: Mario Limonciello (AMD), amd-gfx; +Cc: Harish.Kasiviswanathan

On 16-Dec-25 11:30 AM, Mario Limonciello (AMD) wrote:
> Mapping out a KFD device to a GPU can be done manually by looking at the
> domain and location properties. To make it easier to discover which
> KFD device goes with what GPU add a link to the GPU node.
>

Access to the full device is not desirable in container environments
where it is restricted to the particular partition's properties.

Thanks,
Lijo

* Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
From: Mario Limonciello @ 2025-12-16 12:19 UTC
  To: Lazar, Lijo, amd-gfx; +Cc: Harish.Kasiviswanathan

On 12/16/25 12:22 AM, Lazar, Lijo wrote:
> On 16-Dec-25 11:30 AM, Mario Limonciello (AMD) wrote:
>> Mapping out a KFD device to a GPU can be done manually by looking at the
>> domain and location properties. To make it easier to discover which
>> KFD device goes with what GPU add a link to the GPU node.
>>
>
> Access to the full device is not desirable in container environments
> where it is restricted to the particular partition's properties.
>

Container environments don't typically bind mount the whole sysfs tree
do they?

Nonetheless, even if they did, this information is already discoverable;
it's just a PIA to get to.

* Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
From: Lazar, Lijo @ 2025-12-16 12:40 UTC
  To: Mario Limonciello, amd-gfx; +Cc: Harish.Kasiviswanathan

On 16-Dec-25 5:49 PM, Mario Limonciello wrote:
> On 12/16/25 12:22 AM, Lazar, Lijo wrote:
>> Access to the full device is not desirable in container environments
>> where it is restricted to the particular partition's properties.
>>
>
> Container environments don't typically bind mount the whole sysfs tree
> do they?
>

AFAIK, only selected ones and access restricted through cgroups.

Thanks,
Lijo

* Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
From: Mario Limonciello @ 2025-12-16 14:01 UTC
  To: Lazar, Lijo, amd-gfx; +Cc: Harish.Kasiviswanathan

On 12/16/25 6:40 AM, Lazar, Lijo wrote:
> On 16-Dec-25 5:49 PM, Mario Limonciello wrote:
>> Container environments don't typically bind mount the whole sysfs tree
>> do they?
>>
>
> AFAIK, only selected ones and access restricted through cgroups.

The information needed to discover the GPU is definitely already exposed.

❯ cat /sys/class/kfd/kfd/topology/nodes/1/properties | grep "domain\|location_id\|vendor_id\|device_id"
vendor_id 4098
device_id 5510
location_id 49664
domain 0

So what's going to happen with the new symlink then? Would cgroups
export it if the device it points to wasn't exported? I'm not sure how
to try this myself.
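
To make the "manual" mapping in the message above concrete, here is a minimal sketch of decoding those properties into a PCI address. It rests on an assumption that is not stated in the thread: that location_id packs the PCI bus number in bits 15:8 and the device/function in bits 7:0, the usual bus/devfn layout. On partitioned devices the low bits may mean something else, so treat the result as a hint. The helper names kfd_parse_pci_props() and kfd_format_bdf() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan "key value" lines of a KFD properties file and
 * pull out `domain` and `location_id`.
 * Returns 0 if both keys were found, -1 otherwise.
 */
static int kfd_parse_pci_props(const char *text, unsigned long *domain,
			       unsigned long *location_id)
{
	const char *line = text;
	int found = 0;

	assert(text != NULL && domain != NULL && location_id != NULL);

	while (line && *line) {
		char key[64];
		unsigned long val;

		if (sscanf(line, "%63s %lu", key, &val) == 2) {
			if (strcmp(key, "domain") == 0) {
				*domain = val;
				found |= 1;
			} else if (strcmp(key, "location_id") == 0) {
				*location_id = val;
				found |= 2;
			}
		}

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}

	return found == 3 ? 0 : -1;
}

/*
 * Hypothetical helper: format domain/location_id as "dddd:bb:dd.f".
 * ASSUMPTION: location_id is (bus << 8) | devfn.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int kfd_format_bdf(unsigned long domain, unsigned long location_id,
			  char *out, size_t out_len)
{
	unsigned int bus = (location_id >> 8) & 0xff;
	unsigned int dev = (location_id >> 3) & 0x1f;
	unsigned int fn = location_id & 0x7;
	int n;

	n = snprintf(out, out_len, "%04lx:%02x:%02x.%x", domain, bus, dev, fn);
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Under that assumption, the values quoted above (domain 0, location_id 49664, which is 0xc200) decode to 0000:c2:00.0, the name one would then look up under /sys/bus/pci/devices. The proposed `device` link would replace this arithmetic with a single symlink resolution.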
* Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device 2025-12-16 14:01 ` Mario Limonciello @ 2025-12-16 14:25 ` Kasiviswanathan, Harish 2025-12-16 14:41 ` Mario Limonciello 0 siblings, 1 reply; 9+ messages in thread From: Kasiviswanathan, Harish @ 2025-12-16 14:25 UTC (permalink / raw) To: Mario Limonciello, Lazar, Lijo, amd-gfx@lists.freedesktop.org To try this scenario, you can do the following. $ set partition mode to QPX $ run docker but instead of /dev/dri use /dev/dri/renderD128 --> this way the container can access only one single partition of QPX. # inside docker ## run rocminfo --> you should only see one device ## In /sys/class/kfd/kfd/topology/nodes --> you could see the all 4 nodes but only /sys/class/kfd/kfd/topology/nodes/1/ will be accessible This was designed this way as hiding the nodes was hard to architecture. Hence, from the container the nodes list is visible but access is based on cgroups. So, with /dev/dri/renderDXXX only one of the /sys/class/kfd/kfd/topology/nodes/X/ will be visible and usable. ________________________________________ From: Mario Limonciello <superm1@kernel.org> Sent: Tuesday, December 16, 2025 9:01 AM To: Lazar, Lijo <Lijo.Lazar@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org> Cc: Kasiviswanathan, Harish <Harish.Kasiviswanathan@amd.com> Subject: Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device On 12/16/25 6:40 AM, Lazar, Lijo wrote: > > > On 16-Dec-25 5:49 PM, Mario Limonciello wrote: >> >> >> On 12/16/25 12:22 AM, Lazar, Lijo wrote: >>> >>> >>> On 16-Dec-25 11:30 AM, Mario Limonciello (AMD) wrote: >>>> Mapping out a KFD device to a GPU can be done manually by looking at >>>> the >>>> domain and location properties. To make it easier to discover which >>>> KFD device goes with what GPU add a link to the GPU node. 
>>>> >>> >>> Access to the full device is not desirable in container environments >>> where it is restricted to the particular partition's properties. >>> >> >> Container environments don't typically bind mount the whole sysfs tree >> do they? >> > > AFAIK, only selected ones and access restricted through cgroups. The information needed to discover is definitely already exposed. ❯ cat /sys/class/kfd/kfd/topology/nodes/1/properties | grep "domain\|location_id\|vendor_id\|device_id" vendor_id 4098 device_id 5510 location_id 49664 domain 0 But so what's going to happen with the new symlink then? Would cgroups export it if the device it points to wasn't exported? I'm not sure how to try this myself. > > Thanks, > Lijo > >> Nonetheless; even if they did this information is already >> discoverable, it's just a PIA to get to. >> >>> Thanks, >>> Lijo >>> >>>> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> >>>> --- >>>> Cc: Harish.Kasiviswanathan@amd.com> >>>> v3: >>>> * Create link when topology created >>>> * Only call update topology when amdgpu is called >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 8 ++++++++ >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 + >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++ >>>> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 17 +++++++++++++ >>>> +++- >>>> drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 2 ++ >>>> 5 files changed, 31 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/ >>>> gpu/ drm/amd/amdgpu/amdgpu_amdkfd.c >>>> index 67a01c4f38855..870a727d6e938 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c >>>> @@ -910,3 +910,11 @@ int amdgpu_amdkfd_config_sq_perfmon(struct >>>> amdgpu_device *adev, uint32_t xcp_id, >>>> return r; >>>> } >>>> + >>>> +int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev) >>>> +{ >>>> + if (!adev->kfd.init_complete || !adev->kfd.dev) 
>>>> +        return 0;
>>>> +
>>>> +    return kfd_topology_update_sysfs();
>>>> +}
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/
>>>> gpu/ drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> index 8bdfcde2029b5..07aa519b28d45 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> @@ -268,6 +268,7 @@ int amdgpu_amdkfd_stop_sched(struct
>>>> amdgpu_device *adev, uint32_t node_id);
>>>>   int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev,
>>>> uint32_t xcp_id,
>>>>       bool core_override_enable, bool reg_override_enable, bool
>>>> perfmon_override_enable);
>>>>   bool amdgpu_amdkfd_compute_active(struct amdgpu_device *adev,
>>>> uint32_t node_id);
>>>> +int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev);
>>>>   /* Read user wptr from a specified user address space with page fault
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/
>>>> gpu/ drm/amd/amdgpu/amdgpu_device.c
>>>> index 467326871a81e..d4c8b03b6bf57 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -5123,6 +5123,10 @@ int amdgpu_device_init(struct amdgpu_device
>>>> *adev,
>>>>        */
>>>>       r = amdgpu_device_sys_interface_init(adev);
>>>> +    r = amdgpu_amdkfd_create_sysfs_links(adev);
>>>> +    if (r)
>>>> +        dev_err(adev->dev, "Failed to create KFD sysfs link: %d\n",
>>>> r);
>>>> +
>>>>       if (IS_ENABLED(CONFIG_PERF_EVENTS))
>>>>           r = amdgpu_pmu_init(adev);
>>>>       if (r)
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/
>>>> gpu/ drm/amd/amdkfd/kfd_topology.c
>>>> index a95be23fd0397..5f14c66902f9d 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> @@ -571,6 +571,9 @@ static void kfd_remove_sysfs_node_entry(struct
>>>> kfd_topology_device *dev)
>>>>       struct kfd_mem_properties *mem;
>>>>       struct kfd_perf_properties *perf;
>>>> +    if (dev->gpu)
>>>> +        sysfs_remove_link(dev->kobj_node, "device");
>>>> +
>>>>       if (dev->kobj_iolink) {
>>>>           list_for_each_entry(iolink, &dev->io_link_props, list)
>>>>               if (iolink->kobj) {
>>>> @@ -819,6 +822,18 @@ static int kfd_build_sysfs_node_entry(struct
>>>> kfd_topology_device *dev,
>>>>               return ret;
>>>>       }
>>>> +    /*
>>>> +     * create a link to the GPU node, but don't do a reverse one
>>>> since it might
>>>> +     * not match after spatial partitioning
>>>> +     */
>>>> +    if (dev->gpu) {
>>>> +        struct kobject *amdgpu_kobj = &dev->gpu->adev->dev->kobj;
>>>> +
>>>> +        ret = sysfs_create_link(dev->kobj_node, amdgpu_kobj,
>>>> "device");
>>>> +        if (ret)
>>>> +            return ret;
>>>> +    }
>>>> +
>>>>       return 0;
>>>>   }
>>>> @@ -848,7 +863,7 @@ static void kfd_remove_sysfs_node_tree(void)
>>>>           kfd_remove_sysfs_node_entry(dev);
>>>>   }
>>>> -static int kfd_topology_update_sysfs(void)
>>>> +int kfd_topology_update_sysfs(void)
>>>>   {
>>>>       int ret;
>>>> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/
>>>> drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>> index 9aba8596faa7e..0ee1a7d3a73f5 100644
>>>> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>> @@ -335,4 +335,6 @@ struct kfd2kgd_calls {
>>>>               int engine, int queue);
>>>>   };
>>>> +int kfd_topology_update_sysfs(void);
>>>> +
>>>>   #endif /* KGD_KFD_INTERFACE_H_INCLUDED */
>>>
>>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread
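For readers following along, the "manual" lookup discussed in this message can be sketched in userspace. This is only an illustration: it assumes `location_id` packs the PCI bus in bits 15:8 and the device/function in bits 7:0, which is consistent with the sample values quoted above (49664 is 0xc200) but is not stated anywhere in the thread. On partitioned devices the value may carry additional bits, so treat the result as a hint. The function names are made up for this example.

```python
def parse_properties(text):
    """Parse 'key value' lines of a KFD node properties file into a dict of ints."""
    props = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name integer" pairs; skip anything else.
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            props[parts[0]] = int(parts[1])
    return props


def pci_address(props):
    """Format domain/location_id as DDDD:BB:DD.F.

    Assumed encoding (not confirmed by the thread): bus << 8 | devfn,
    where devfn is device << 3 | function.
    """
    loc = props["location_id"]
    bus = (loc >> 8) & 0xFF
    dev = (loc >> 3) & 0x1F
    fn = loc & 0x7
    return f"{props['domain']:04x}:{bus:02x}:{dev:02x}.{fn}"


# Sample values copied from the properties output quoted in this message.
sample = """vendor_id 4098
device_id 5510
location_id 49664
domain 0
"""
print(pci_address(parse_properties(sample)))
```

On a real system the text would come from reading `/sys/class/kfd/kfd/topology/nodes/<N>/properties`; the proposed `device` link would make this decoding step unnecessary.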
* Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
  2025-12-16 14:25 ` Kasiviswanathan, Harish
@ 2025-12-16 14:41 ` Mario Limonciello
  0 siblings, 0 replies; 9+ messages in thread
From: Mario Limonciello @ 2025-12-16 14:41 UTC (permalink / raw)
  To: Kasiviswanathan, Harish, Lazar, Lijo, amd-gfx@lists.freedesktop.org

On 12/16/25 8:25 AM, Kasiviswanathan, Harish wrote:
> To try this scenario, you can do the following.

Ah, I don't have an Instinct on my side to try this, only multiple Radeon cards.

>
> $ set partition mode to QPX
> $ run docker but instead of /dev/dri use /dev/dri/renderD128 --> this way the container can access only one single partition of QPX.
> # inside docker
> ## run rocminfo --> you should only see one device
> ## In /sys/class/kfd/kfd/topology/nodes --> you could see the all 4 nodes but only /sys/class/kfd/kfd/topology/nodes/1/ will be accessible
>
> This was designed this way as hiding the nodes was hard to architecture. Hence, from the container the nodes list is visible but access is based on cgroups. So, with /dev/dri/renderDXXX only one of the /sys/class/kfd/kfd/topology/nodes/X/ will be visible and usable.
>

But then shouldn't /sys/class/drm/renderD128 be exposed too? It has a device sub-link as well, which should match what this change does. I would expect that link to be exposed too.

❯ ls -alh /sys/class/drm/renderD128/ | grep device
lrwxrwxrwx - root 16 Dec 08:40 device -> ../../../0000:c2:00.0

>
>
>
> ________________________________________
> From: Mario Limonciello <superm1@kernel.org>
> Sent: Tuesday, December 16, 2025 9:01 AM
> To: Lazar, Lijo <Lijo.Lazar@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
> Cc: Kasiviswanathan, Harish <Harish.Kasiviswanathan@amd.com>
> Subject: Re: [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device
>
> > On 12/16/25 6:40 AM, Lazar, Lijo wrote:
> >>
> >>
> >> On 16-Dec-25 5:49 PM, Mario Limonciello wrote:
> >>>
> >>>
> >>> On 12/16/25 12:22 AM, Lazar, Lijo wrote:
> >>>>
> >>>>
> >>>> On 16-Dec-25 11:30 AM, Mario Limonciello (AMD) wrote:
> >>>>> Mapping out a KFD device to a GPU can be done manually by looking at
> >>>>> the
> >>>>> domain and location properties. To make it easier to discover which
> >>>>> KFD device goes with what GPU add a link to the GPU node.
> >>>>>
> >>>>
> >>>> Access to the full device is not desirable in container environments
> >>>> where it is restricted to the particular partition's properties.
> >>>>
> >>>
> >>> Container environments don't typically bind mount the whole sysfs tree
> >>> do they?
> >>>
> >>
> >> AFAIK, only selected ones and access restricted through cgroups.
> >
> > The information needed to discover is definitely already exposed.
> >
> > ❯ cat /sys/class/kfd/kfd/topology/nodes/1/properties | grep
> > "domain\|location_id\|vendor_id\|device_id"
> > vendor_id 4098
> > device_id 5510
> > location_id 49664
> > domain 0
> >
> > But so what's going to happen with the new symlink then? Would cgroups
> > export it if the device it points to wasn't exported?
> >
> > I'm not sure how to try this myself.
> >>
> >> Thanks,
> >> Lijo
> >>
> >>> Nonetheless; even if they did this information is already
> >>> discoverable, it's just a PIA to get to.
> >>>
> >>>> Thanks,
> >>>> Lijo
> >>>>
> >>>>> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
> >>>>> ---
> >>>>> Cc: Harish.Kasiviswanathan@amd.com>
> >>>>> v3:
> >>>>>   * Create link when topology created
> >>>>>   * Only call update topology when amdgpu is called
> >>>>> ---
> >>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 8 ++++++++
> >>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 +
> >>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
> >>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 17 +++++++++++++
> >>>>> +++-
> >>>>>   drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 2 ++
> >>>>>   5 files changed, 31 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/
> >>>>> gpu/ drm/amd/amdgpu/amdgpu_amdkfd.c
> >>>>> index 67a01c4f38855..870a727d6e938 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >>>>> @@ -910,3 +910,11 @@ int amdgpu_amdkfd_config_sq_perfmon(struct
> >>>>> amdgpu_device *adev, uint32_t xcp_id,
> >>>>>       return r;
> >>>>>   }
> >>>>> +
> >>>>> +int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev)
> >>>>> +{
> >>>>> +    if (!adev->kfd.init_complete || !adev->kfd.dev)
> >>>>> +        return 0;
> >>>>> +
> >>>>> +    return kfd_topology_update_sysfs();
> >>>>> +}
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/
> >>>>> gpu/ drm/amd/amdgpu/amdgpu_amdkfd.h
> >>>>> index 8bdfcde2029b5..07aa519b28d45 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> >>>>> @@ -268,6 +268,7 @@ int amdgpu_amdkfd_stop_sched(struct
> >>>>> amdgpu_device *adev, uint32_t node_id);
> >>>>>   int amdgpu_amdkfd_config_sq_perfmon(struct amdgpu_device *adev,
> >>>>> uint32_t xcp_id,
> >>>>>       bool core_override_enable, bool reg_override_enable, bool
> >>>>> perfmon_override_enable);
> >>>>>   bool amdgpu_amdkfd_compute_active(struct amdgpu_device *adev,
> >>>>> uint32_t node_id);
> >>>>> +int amdgpu_amdkfd_create_sysfs_links(struct amdgpu_device *adev);
> >>>>>   /* Read user wptr from a specified user address space with page fault
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/
> >>>>> gpu/ drm/amd/amdgpu/amdgpu_device.c
> >>>>> index 467326871a81e..d4c8b03b6bf57 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>>> @@ -5123,6 +5123,10 @@ int amdgpu_device_init(struct amdgpu_device
> >>>>> *adev,
> >>>>>        */
> >>>>>       r = amdgpu_device_sys_interface_init(adev);
> >>>>> +    r = amdgpu_amdkfd_create_sysfs_links(adev);
> >>>>> +    if (r)
> >>>>> +        dev_err(adev->dev, "Failed to create KFD sysfs link: %d\n",
> >>>>> r);
> >>>>> +
> >>>>>       if (IS_ENABLED(CONFIG_PERF_EVENTS))
> >>>>>           r = amdgpu_pmu_init(adev);
> >>>>>       if (r)
> >>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/
> >>>>> gpu/ drm/amd/amdkfd/kfd_topology.c
> >>>>> index a95be23fd0397..5f14c66902f9d 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>>>> @@ -571,6 +571,9 @@ static void kfd_remove_sysfs_node_entry(struct
> >>>>> kfd_topology_device *dev)
> >>>>>       struct kfd_mem_properties *mem;
> >>>>>       struct kfd_perf_properties *perf;
> >>>>> +    if (dev->gpu)
> >>>>> +        sysfs_remove_link(dev->kobj_node, "device");
> >>>>> +
> >>>>>       if (dev->kobj_iolink) {
> >>>>>           list_for_each_entry(iolink, &dev->io_link_props, list)
> >>>>>               if (iolink->kobj) {
> >>>>> @@ -819,6 +822,18 @@ static int kfd_build_sysfs_node_entry(struct
> >>>>> kfd_topology_device *dev,
> >>>>>               return ret;
> >>>>>       }
> >>>>> +    /*
> >>>>> +     * create a link to the GPU node, but don't do a reverse one
> >>>>> since it might
> >>>>> +     * not match after spatial partitioning
> >>>>> +     */
> >>>>> +    if (dev->gpu) {
> >>>>> +        struct kobject *amdgpu_kobj = &dev->gpu->adev->dev->kobj;
> >>>>> +
> >>>>> +        ret = sysfs_create_link(dev->kobj_node, amdgpu_kobj,
> >>>>> "device");
> >>>>> +        if (ret)
> >>>>> +            return ret;
> >>>>> +    }
> >>>>> +
> >>>>>       return 0;
> >>>>>   }
> >>>>> @@ -848,7 +863,7 @@ static void kfd_remove_sysfs_node_tree(void)
> >>>>>           kfd_remove_sysfs_node_entry(dev);
> >>>>>   }
> >>>>> -static int kfd_topology_update_sysfs(void)
> >>>>> +int kfd_topology_update_sysfs(void)
> >>>>>   {
> >>>>>       int ret;
> >>>>> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/
> >>>>> drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> >>>>> index 9aba8596faa7e..0ee1a7d3a73f5 100644
> >>>>> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> >>>>> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> >>>>> @@ -335,4 +335,6 @@ struct kfd2kgd_calls {
> >>>>>               int engine, int queue);
> >>>>>   };
> >>>>> +int kfd_topology_update_sysfs(void);
> >>>>> +
> >>>>>   #endif /* KGD_KFD_INTERFACE_H_INCLUDED */
> >>>>
> >>>
> >>
> >
>

^ permalink raw reply	[flat|nested] 9+ messages in thread
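The comparison suggested in this message (does the render node's `device` link match the one this change adds under the KFD node?) could be checked from userspace roughly as follows. This is a sketch under assumptions: the KFD-side `device` link exists only on a kernel carrying this patch, the helper name `same_device` is hypothetical, and the example paths are simply the ones used in the thread. Whether the links are reachable from inside a container is exactly the open question above, so a missing link is reported as "no match" rather than as an error.

```python
import os


def same_device(render_node, kfd_node):
    """Return True if both sysfs directories have a 'device' symlink
    that resolves to the same underlying device directory."""
    a = os.path.join(render_node, "device")
    b = os.path.join(kfd_node, "device")
    if not (os.path.islink(a) and os.path.islink(b)):
        # Link missing: kernel without the patch, CPU-only KFD node,
        # or the entry is not visible in this environment.
        return False
    return os.path.realpath(a) == os.path.realpath(b)


if __name__ == "__main__":
    # Paths taken from the thread; adjust the render node and KFD node index.
    print(same_device("/sys/class/drm/renderD128",
                      "/sys/class/kfd/kfd/topology/nodes/1"))
```

Resolving both links with `os.path.realpath` avoids comparing the relative link text, which differs depending on where in sysfs each link lives.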
end of thread, other threads:[~2025-12-16 14:41 UTC | newest]

Thread overview: 9+ messages
2025-12-16  6:00 [PATCH v3 0/2] Make device links between KFD and GPU device Mario Limonciello (AMD)
2025-12-16  6:00 ` [PATCH v3 1/2] amdkfd: Only ignore -ENOENT for KFD init failuires Mario Limonciello (AMD)
2025-12-16  6:00 ` [PATCH v3 2/2] amdkfd: Add device links between kfd device and amdgpu device Mario Limonciello (AMD)
2025-12-16  6:22   ` Lazar, Lijo
2025-12-16 12:19     ` Mario Limonciello
2025-12-16 12:40       ` Lazar, Lijo
2025-12-16 14:01         ` Mario Limonciello
2025-12-16 14:25           ` Kasiviswanathan, Harish
2025-12-16 14:41             ` Mario Limonciello