* [PATCH] drm/amdgpu: reduce the full gpu access time in amdgpu_device_init.
@ 2025-10-21 9:45 chong li
0 siblings, 0 replies; 9+ messages in thread
From: chong li @ 2025-10-21 9:45 UTC (permalink / raw)
To: amd-gfx; +Cc: Horace.Chen, Harish.Kasiviswanathan, chong li
[Why]
The function "pci_p2pdma_add_resource" in "kfd_ais_init" and the
function "devm_memremap_pages" in "kgd2kfd_init_zone_device" can
sometimes take about 4s, but the full GPU access window is 3s,
so gim resets the VF.
[How]
Move the two functions to after full GPU access is released (amdgpu_virt_release_full_gpu).
Signed-off-by: chong li <chongli2@amd.com>
Change-Id: I2db38d905d9dd7fedc4c6a38e325320268c2d84d
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 882fff5a7598..93f66a03ee01 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3331,12 +3331,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
if (adev->mman.buffer_funcs_ring->sched.ready)
amdgpu_ttm_set_buffer_funcs_status(adev, true);
- /* Don't init kfd if whole hive need to be reset during init */
- if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
- kgd2kfd_init_zone_device(adev);
- amdgpu_amdkfd_device_init(adev);
- }
-
amdgpu_fru_get_product_info(adev);
if (!amdgpu_sriov_vf(adev) || amdgpu_sriov_ras_cper_en(adev))
@@ -4926,6 +4920,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
flush_delayed_work(&adev->delayed_init_work);
}
+ /* Don't init kfd if whole hive need to be reset during init */
+ if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
+ kgd2kfd_init_zone_device(adev);
+ amdgpu_amdkfd_device_init(adev);
+ }
+
if (adev->init_lvl->level == AMDGPU_INIT_LEVEL_MINIMAL_XGMI)
amdgpu_xgmi_reset_on_init(adev);
/*
--
2.48.1
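The timing problem described in the [Why] section of this patch can be illustrated with a small, self-contained model. This is only a sketch: `struct init_step`, `full_access_time_ms` and `exceeds_window` are hypothetical names that do not exist in amdgpu, and the per-step costs are placeholders, apart from the roughly 4s worst case and the 3s window quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one init step; not a real amdgpu structure. */
struct init_step {
    const char *name;
    unsigned int cost_ms;     /* assumed worst-case duration */
    bool inside_full_access;  /* true if it runs before the release call */
};

/* Sum the cost of every step that runs while full GPU access is held. */
static unsigned int full_access_time_ms(const struct init_step *steps, size_t n)
{
    unsigned int total = 0;

    for (size_t i = 0; i < n; i++)
        if (steps[i].inside_full_access)
            total += steps[i].cost_ms;
    return total;
}

/* True when the time spent holding full access exceeds the host's window. */
static bool exceeds_window(const struct init_step *steps, size_t n,
                           unsigned int window_ms)
{
    assert(steps != NULL || n == 0);
    return full_access_time_ms(steps, n) > window_ms;
}
```

With a 3000 ms window, a step list that keeps a 4000 ms remap inside full access overruns it, while the same list with that step flagged as running after the release does not.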
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH] drm/amdgpu: reduce the full gpu access time in amdgpu_device_init.
@ 2025-11-07 7:07 chong li
2025-11-07 7:12 ` Deng, Emily
0 siblings, 1 reply; 9+ messages in thread
From: chong li @ 2025-11-07 7:07 UTC (permalink / raw)
To: amd-gfx; +Cc: Emily.Deng, chong li
[Why]
The function "devm_memremap_pages", called from "kgd2kfd_init_zone_device",
sometimes takes too much time.
[How]
Move the call to "kgd2kfd_init_zone_device" to after full GPU access is released (amdgpu_virt_release_full_gpu).
Signed-off-by: chong li <chongli2@amd.com>
Change-Id: I3eebd7272b8f0c85d08fec80acee67a2c9e59e52
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1e8725abcded..9aacf8fdb38a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3314,7 +3314,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
/* Don't init kfd if whole hive need to be reset during init */
if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
- kgd2kfd_init_zone_device(adev);
amdgpu_amdkfd_device_init(adev);
}
@@ -4929,6 +4928,11 @@ int amdgpu_device_init(struct amdgpu_device *adev,
if (adev->init_lvl->level == AMDGPU_INIT_LEVEL_MINIMAL_XGMI)
amdgpu_xgmi_reset_on_init(adev);
+
+ /* Don't init kfd if whole hive need to be reset during init */
+ if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI)
+ kgd2kfd_init_zone_device(adev);
+
/*
* Place those sysfs registering after `late_init`. As some of those
* operations performed in `late_init` might affect the sysfs
--
2.48.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* RE: [PATCH] drm/amdgpu: reduce the full gpu access time in amdgpu_device_init.
2025-11-07 7:07 chong li
@ 2025-11-07 7:12 ` Deng, Emily
0 siblings, 0 replies; 9+ messages in thread
From: Deng, Emily @ 2025-11-07 7:12 UTC (permalink / raw)
To: Li, Chong(Alan), amd-gfx@lists.freedesktop.org
[AMD Official Use Only - AMD Internal Distribution Only]
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Emily Deng
Best Wishes
>-----Original Message-----
>From: Li, Chong(Alan) <Chong.Li@amd.com>
>Sent: Friday, November 7, 2025 3:08 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily <Emily.Deng@amd.com>; Li, Chong(Alan) <Chong.Li@amd.com>
>Subject: [PATCH] drm/amdgpu: reduce the full gpu access time in
>amdgpu_device_init.
>
>[Why]
>The function "devm_memremap_pages", called from "kgd2kfd_init_zone_device",
>sometimes takes too much time.
>
>[How]
>Move the call to "kgd2kfd_init_zone_device" to after full GPU access is
>released (amdgpu_virt_release_full_gpu).
>
>Signed-off-by: chong li <chongli2@amd.com>
>Change-Id: I3eebd7272b8f0c85d08fec80acee67a2c9e59e52
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 1e8725abcded..9aacf8fdb38a 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -3314,7 +3314,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device
>*adev)
>
> /* Don't init kfd if whole hive need to be reset during init */
> if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
>- kgd2kfd_init_zone_device(adev);
> amdgpu_amdkfd_device_init(adev);
> }
>
>@@ -4929,6 +4928,11 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>
> if (adev->init_lvl->level == AMDGPU_INIT_LEVEL_MINIMAL_XGMI)
> amdgpu_xgmi_reset_on_init(adev);
>+
>+ /* Don't init kfd if whole hive need to be reset during init */
>+ if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI)
>+ kgd2kfd_init_zone_device(adev);
>+
> /*
> * Place those sysfs registering after `late_init`. As some of those
> * operations performed in `late_init` might affect the sysfs
>--
>2.48.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] drm/amdgpu: reduce the full gpu access time in amdgpu_device_init.
@ 2025-11-07 9:06 chong li
0 siblings, 0 replies; 9+ messages in thread
From: chong li @ 2025-11-07 9:06 UTC (permalink / raw)
To: amd-gfx; +Cc: Emily.Deng, Harish.Kasiviswanathan, chong li
[Why]
The function "pci_p2pdma_add_resource", called from "kfd_ais_init",
takes too much time.
[How]
Move the call to "kfd_ais_init" out of the full GPU access region.
Signed-off-by: chong li <chongli2@amd.com>
Change-Id: I2db38d905d9dd7fedc4c6a38e325320268c2d84d
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 8 ++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++++++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 1 +
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 3 ---
6 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 39d712e3e692..e6829e5c8801 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -37,6 +37,7 @@
#include "amdgpu_umc.h"
#include "amdgpu_reset.h"
#include "amdgpu_ras_mgr.h"
+#include "kfd_priv.h"
/* Total memory size in system memory and all GPU VRAM. Used to
* estimate worst case amount of memory to reserve for page tables
@@ -234,6 +235,13 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
}
}
+void amdgpu_amdkfd_device_late_init(struct amdgpu_device *adev)
+{
+ kfd_ais_init(adev);
+ adev->kfd.dev->init_complete = true;
+ adev->kfd.init_complete = true;
+}
+
void amdgpu_amdkfd_device_fini_sw(struct amdgpu_device *adev)
{
if (adev->kfd.dev) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 40c46e6c8898..504cf90b84e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -170,6 +170,7 @@ void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev,
const void *ih_ring_entry);
void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev);
void amdgpu_amdkfd_device_init(struct amdgpu_device *adev);
+void amdgpu_amdkfd_device_late_init(struct amdgpu_device *adev);
void amdgpu_amdkfd_device_fini_sw(struct amdgpu_device *adev);
int amdgpu_amdkfd_check_and_lock_kfd(struct amdgpu_device *adev);
void amdgpu_amdkfd_unlock_kfd(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index eca11fbc637a..6c8f0de84727 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4760,6 +4760,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
* completed before the need for a different level is detected.
*/
amdgpu_set_init_level(adev, AMDGPU_INIT_LEVEL_DEFAULT);
+
/* early init functions */
r = amdgpu_device_ip_early_init(adev);
if (r)
@@ -4971,6 +4972,11 @@ int amdgpu_device_init(struct amdgpu_device *adev,
flush_delayed_work(&adev->delayed_init_work);
}
+ /* Don't init kfd if whole hive need to be reset during init */
+ if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
+ amdgpu_amdkfd_device_late_init(adev);
+ }
+
if (adev->init_lvl->level == AMDGPU_INIT_LEVEL_MINIMAL_XGMI)
amdgpu_xgmi_reset_on_init(adev);
/*
@@ -6628,8 +6634,10 @@ static void amdgpu_device_gpu_resume(struct amdgpu_device *adev,
/* kfd_post_reset will do nothing if kfd device is not initialized,
* need to bring up kfd here if it's not be initialized before
*/
- if (!adev->kfd.init_complete)
+ if (!adev->kfd.init_complete) {
amdgpu_amdkfd_device_init(adev);
+ amdgpu_amdkfd_device_late_init(adev);
+ }
if (tmp_adev->pcie_reset_ctx.audio_suspended)
amdgpu_device_resume_display_audio(tmp_adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 4c50530e7c32..423ff3c8502b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -89,6 +89,7 @@ static int amdgpu_reset_xgmi_reset_on_init_restore_hwctxt(
if (!tmp_adev->kfd.init_complete) {
kgd2kfd_init_zone_device(tmp_adev);
amdgpu_amdkfd_device_init(tmp_adev);
+ amdgpu_amdkfd_device_late_init(tmp_adev);
amdgpu_amdkfd_drm_client_create(tmp_adev);
}
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index 6b375665507d..65c3136413f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -686,6 +686,7 @@ int amdgpu_xcp_post_partition_switch(struct amdgpu_xcp_mgr *xcp_mgr, u32 flags)
if (flags & AMDGPU_XCP_OPS_KFD) {
amdgpu_amdkfd_device_probe(xcp_mgr->adev);
amdgpu_amdkfd_device_init(xcp_mgr->adev);
+ amdgpu_amdkfd_device_late_init(xcp_mgr->adev);
/* If KFD init failed, return failure */
if (!xcp_mgr->adev->kfd.init_complete)
ret = -EIO;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 784c28fbadda..a4a91244cbc6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -935,9 +935,6 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
svm_range_set_max_pages(kfd->adev);
- kfd_ais_init(kfd->adev);
-
- kfd->init_complete = true;
dev_info(kfd_device, "added device %x:%x\n", kfd->adev->pdev->vendor,
kfd->adev->pdev->device);
--
2.48.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH] drm/amdgpu: reduce the full gpu access time in amdgpu_device_init.
@ 2025-11-11 7:02 chong li
0 siblings, 0 replies; 9+ messages in thread
From: chong li @ 2025-11-11 7:02 UTC (permalink / raw)
To: amd-gfx; +Cc: Emily.Deng, Victor.Zhao, philip.yang, felix.kuehling, chong li
[Why]
The function "devm_memremap_pages", called from "kgd2kfd_init_zone_device",
sometimes takes too much time.
[How]
Move the call to "kgd2kfd_init_zone_device"
to after full GPU access is released (amdgpu_virt_release_full_gpu).
Signed-off-by: chong li <chongli2@amd.com>
Change-Id: I3eebd7272b8f0c85d08fec80acee67a2c9e59e52
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++-
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 +
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 24 ++++++++++++++++++++++
3 files changed, 34 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 860ac1f9e35d..c293e9a24d48 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -84,6 +84,8 @@
#include <drm/drm_drv.h>
+#include <kfd_priv.h>
+
#if IS_ENABLED(CONFIG_X86)
#include <asm/intel-family.h>
#include <asm/cpu_device_id.h>
@@ -3314,7 +3316,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
/* Don't init kfd if whole hive need to be reset during init */
if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
- kgd2kfd_init_zone_device(adev);
amdgpu_amdkfd_device_init(adev);
}
@@ -4918,6 +4919,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
if (adev->init_lvl->level == AMDGPU_INIT_LEVEL_MINIMAL_XGMI)
amdgpu_xgmi_reset_on_init(adev);
+
+ /* Don't init kfd if whole hive need to be reset during init */
+ if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
+ kgd2kfd_init_zone_device(adev);
+ kfd_update_svm_support_properties(adev);
+ }
+
/*
* Place those sysfs registering after `late_init`. As some of those
* operations performed in `late_init` might affect the sysfs
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 70ef051511bb..bd1c6d1742c8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1358,6 +1358,7 @@ struct process_queue_node {
void kfd_process_dequeue_from_device(struct kfd_process_device *pdd);
void kfd_process_dequeue_from_all_devices(struct kfd_process *p);
+void kfd_update_svm_support_properties(struct amdgpu_device *adev);
int pqm_init(struct process_queue_manager *pqm, struct kfd_process *p);
void pqm_uninit(struct process_queue_manager *pqm);
int pqm_create_queue(struct process_queue_manager *pqm,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 5c98746eb72d..f25d867ec40a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -2401,4 +2401,28 @@ int kfd_debugfs_rls_by_device(struct seq_file *m, void *data)
return r;
}
+void kfd_update_svm_support_properties(struct amdgpu_device *adev)
+{
+ struct kfd_topology_device *dev;
+ int ret;
+ down_write(&topology_lock);
+
+ list_for_each_entry(dev, &topology_device_list, list) {
+ if (dev->gpu && dev->gpu->adev == adev) {
+ if (KFD_IS_SVM_API_SUPPORTED(adev)) {
+ dev->node_props.capability |= HSA_CAP_SVMAPI_SUPPORTED;
+ ret = kfd_topology_update_sysfs();
+ if (!ret)
+ sys_props.generation_count++;
+ else
+ dev_err(adev->dev, "Failed to update SVM support properties. ret=%d\n", ret); }
+ else
+ dev->node_props.capability &= ~HSA_CAP_SVMAPI_SUPPORTED;
+ }
+ }
+
+ up_write(&topology_lock);
+ return ;
+}
+
#endif
--
2.48.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH] drm/amdgpu: reduce the full gpu access time in amdgpu_device_init.
@ 2025-11-17 6:38 chong li
2025-11-17 6:41 ` Li, Chong(Alan)
2026-01-06 20:47 ` Kasiviswanathan, Harish
0 siblings, 2 replies; 9+ messages in thread
From: chong li @ 2025-11-17 6:38 UTC (permalink / raw)
To: amd-gfx; +Cc: emily.deng, victor.zhao, philip.yang, felix.kuehling, chong li
[Why]
The function "devm_memremap_pages", called from "kgd2kfd_init_zone_device",
sometimes takes too much time.
[How]
Move the call to "kgd2kfd_init_zone_device"
to after full GPU access is released (amdgpu_virt_release_full_gpu).
v2:
Improve the coding style.
Signed-off-by: chong li <chongli2@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++++++-
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 23 ++++++++++++++++++++++
drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 6 ++++++
4 files changed, 37 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 40c46e6c8898..6d204ba2c267 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -37,7 +37,7 @@
#include "amdgpu_sync.h"
#include "amdgpu_vm.h"
#include "amdgpu_xcp.h"
-
+#include "kfd_topology.h"
extern uint64_t amdgpu_amdkfd_total_mem_size;
enum TLB_FLUSH_TYPE {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0b40ddcb8ba1..b4e1f258119c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3333,7 +3333,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
/* Don't init kfd if whole hive need to be reset during init */
if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
- kgd2kfd_init_zone_device(adev);
amdgpu_amdkfd_device_init(adev);
}
@@ -4931,6 +4930,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
if (adev->init_lvl->level == AMDGPU_INIT_LEVEL_MINIMAL_XGMI)
amdgpu_xgmi_reset_on_init(adev);
+
+ /* Don't init kfd if whole hive need to be reset during init */
+ if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
+ kgd2kfd_init_zone_device(adev);
+ kfd_update_svm_support_properties(adev);
+ }
+
/*
* Place those sysfs registering after `late_init`. As some of those
* operations performed in `late_init` might affect the sysfs
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 8644039777b8..8511b00a7463 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -2475,3 +2475,26 @@ int kfd_debugfs_rls_by_device(struct seq_file *m, void *data)
}
#endif
+
+void kfd_update_svm_support_properties(struct amdgpu_device *adev)
+{
+ struct kfd_topology_device *dev;
+ int ret;
+
+ down_write(&topology_lock);
+ list_for_each_entry(dev, &topology_device_list, list) {
+ if (!dev->gpu || dev->gpu->adev != adev)
+ continue;
+
+ if (KFD_IS_SVM_API_SUPPORTED(adev)) {
+ dev->node_props.capability |= HSA_CAP_SVMAPI_SUPPORTED;
+ ret = kfd_topology_update_sysfs();
+ if (!ret)
+ sys_props.generation_count++;
+ else
+ dev_err(adev->dev, "Failed to update SVM support properties. ret=%d\n", ret);
+ } else
+ dev->node_props.capability &= ~HSA_CAP_SVMAPI_SUPPORTED;
+ }
+ up_write(&topology_lock);
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index ab7a3bf1bdef..129b447fcf84 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -202,4 +202,10 @@ struct kfd_topology_device *kfd_create_topology_device(
struct list_head *device_list);
void kfd_release_topology_device_list(struct list_head *device_list);
+#if IS_ENABLED(CONFIG_HSA_AMD)
+void kfd_update_svm_support_properties(struct amdgpu_device *adev);
+#else
+static inline void kfd_update_svm_support_properties(struct amdgpu_device *adev) {}
+#endif
+
#endif /* __KFD_TOPOLOGY_H__ */
--
2.48.1
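The capability update that this patch adds in `kfd_update_svm_support_properties` can be modelled outside the kernel as follows. This is a hedged sketch, not the driver code: `CAP_SVM`, `struct node`, `struct topology` and `update_svm_cap` are hypothetical stand-ins, the topology lock taken in the patch is omitted, and the sysfs refresh is assumed to always succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_SVM (1u << 0)  /* stand-in for HSA_CAP_SVMAPI_SUPPORTED */

/* Hypothetical, flattened view of one topology entry. */
struct node {
    int owner_id;          /* stands in for the dev->gpu->adev match */
    uint32_t capability;
};

struct topology {
    struct node *nodes;
    size_t count;
    uint32_t generation;   /* stands in for sys_props.generation_count */
};

/*
 * Set or clear the SVM bit on every node owned by owner_id and return the
 * number of nodes visited. As in the patch, the generation counter is
 * bumped only on the "supported" path, where the patch also refreshes
 * sysfs; that refresh is assumed to succeed here.
 */
static size_t update_svm_cap(struct topology *t, int owner_id, bool svm_supported)
{
    size_t touched = 0;

    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        struct node *n = &t->nodes[i];

        if (n->owner_id != owner_id)
            continue;

        if (svm_supported) {
            n->capability |= CAP_SVM;
            t->generation++;
        } else {
            n->capability &= ~CAP_SVM;
        }
        touched++;
    }
    return touched;
}
```

The point of the sketch is the ordering: the capability bit can only be decided once the zone device setup has run, which is why the patch recomputes it after moving `kgd2kfd_init_zone_device` later in init.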
^ permalink raw reply related [flat|nested] 9+ messages in thread
* RE: [PATCH] drm/amdgpu: reduce the full gpu access time in amdgpu_device_init.
2025-11-17 6:38 chong li
@ 2025-11-17 6:41 ` Li, Chong(Alan)
2026-01-06 20:47 ` Kasiviswanathan, Harish
1 sibling, 0 replies; 9+ messages in thread
From: Li, Chong(Alan) @ 2025-11-17 6:41 UTC (permalink / raw)
To: Yang, Philip, Kuehling, Felix
Cc: Deng, Emily, Zhao, Victor, amd-gfx@lists.freedesktop.org
[AMD Official Use Only - AMD Internal Distribution Only]
Hi @Kuehling, Felix and @Yang, Philip,
Could you help review my patch?
Thanks,
Chong.
-----Original Message-----
From: Li, Chong(Alan) <Chong.Li@amd.com>
Sent: Monday, November 17, 2025 2:38 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily <Emily.Deng@amd.com>; Zhao, Victor <Victor.Zhao@amd.com>; Yang, Philip <Philip.Yang@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Li, Chong(Alan) <Chong.Li@amd.com>
Subject: [PATCH] drm/amdgpu: reduce the full gpu access time in amdgpu_device_init.
[Why]
The function "devm_memremap_pages", called from "kgd2kfd_init_zone_device", sometimes takes too much time.
[How]
Move the call to "kgd2kfd_init_zone_device"
to after full GPU access is released (amdgpu_virt_release_full_gpu).
v2:
Improve the coding style.
Signed-off-by: chong li <chongli2@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++++++-
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 23 ++++++++++++++++++++++
drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 6 ++++++
4 files changed, 37 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 40c46e6c8898..6d204ba2c267 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -37,7 +37,7 @@
#include "amdgpu_sync.h"
#include "amdgpu_vm.h"
#include "amdgpu_xcp.h"
-
+#include "kfd_topology.h"
extern uint64_t amdgpu_amdkfd_total_mem_size;
enum TLB_FLUSH_TYPE {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0b40ddcb8ba1..b4e1f258119c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3333,7 +3333,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
/* Don't init kfd if whole hive need to be reset during init */
if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
- kgd2kfd_init_zone_device(adev);
amdgpu_amdkfd_device_init(adev);
}
@@ -4931,6 +4930,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
if (adev->init_lvl->level == AMDGPU_INIT_LEVEL_MINIMAL_XGMI)
amdgpu_xgmi_reset_on_init(adev);
+
+ /* Don't init kfd if whole hive need to be reset during init */
+ if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
+ kgd2kfd_init_zone_device(adev);
+ kfd_update_svm_support_properties(adev);
+ }
+
/*
* Place those sysfs registering after `late_init`. As some of those
* operations performed in `late_init` might affect the sysfs
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 8644039777b8..8511b00a7463 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -2475,3 +2475,26 @@ int kfd_debugfs_rls_by_device(struct seq_file *m, void *data)
}
#endif
+
+void kfd_update_svm_support_properties(struct amdgpu_device *adev)
+{
+ struct kfd_topology_device *dev;
+ int ret;
+
+ down_write(&topology_lock);
+ list_for_each_entry(dev, &topology_device_list, list) {
+ if (!dev->gpu || dev->gpu->adev != adev)
+ continue;
+
+ if (KFD_IS_SVM_API_SUPPORTED(adev)) {
+ dev->node_props.capability |= HSA_CAP_SVMAPI_SUPPORTED;
+ ret = kfd_topology_update_sysfs();
+ if (!ret)
+ sys_props.generation_count++;
+ else
+ dev_err(adev->dev, "Failed to update SVM support properties. ret=%d\n", ret);
+ } else
+ dev->node_props.capability &= ~HSA_CAP_SVMAPI_SUPPORTED;
+ }
+ up_write(&topology_lock);
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index ab7a3bf1bdef..129b447fcf84 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -202,4 +202,10 @@ struct kfd_topology_device *kfd_create_topology_device(
struct list_head *device_list);
void kfd_release_topology_device_list(struct list_head *device_list);
+#if IS_ENABLED(CONFIG_HSA_AMD)
+void kfd_update_svm_support_properties(struct amdgpu_device *adev);
+#else
+static inline void kfd_update_svm_support_properties(struct amdgpu_device *adev) {}
+#endif
+
#endif /* __KFD_TOPOLOGY_H__ */
--
2.48.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/amdgpu: reduce the full gpu access time in amdgpu_device_init.
2025-11-17 6:38 chong li
2025-11-17 6:41 ` Li, Chong(Alan)
@ 2026-01-06 20:47 ` Kasiviswanathan, Harish
2026-01-09 7:13 ` Deng, Emily
1 sibling, 1 reply; 9+ messages in thread
From: Kasiviswanathan, Harish @ 2026-01-06 20:47 UTC (permalink / raw)
To: Li, Chong(Alan), amd-gfx@lists.freedesktop.org
Cc: Deng, Emily, Zhao, Victor, Yang, Philip, Kuehling, Felix
Hi Alan,
Based on your older patches, I understand that this patch is required because the host (gim) driver assumes the guest driver is available within 3s. I am not sure how the 3s timeout was decided. I feel a better approach would be a more robust handshake between the guest and host drivers. You might be able to get away with rearranging the initialization code temporarily, but that could easily break if some future change causes a delay.
Best Regards,
Harish
On 2025-11-17 01:38, chong li wrote:
> [Why]
> The function "devm_memremap_pages", called from "kgd2kfd_init_zone_device",
> sometimes takes too much time.
>
> [How]
> Move the call to "kgd2kfd_init_zone_device"
> to after full GPU access is released (amdgpu_virt_release_full_gpu).
>
> v2:
> Improve the coding style.
>
> Signed-off-by: chong li <chongli2@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++++++-
> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 23 ++++++++++++++++++++++
> drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 6 ++++++
> 4 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 40c46e6c8898..6d204ba2c267 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -37,7 +37,7 @@
> #include "amdgpu_sync.h"
> #include "amdgpu_vm.h"
> #include "amdgpu_xcp.h"
> -
> +#include "kfd_topology.h"
> extern uint64_t amdgpu_amdkfd_total_mem_size;
>
> enum TLB_FLUSH_TYPE {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 0b40ddcb8ba1..b4e1f258119c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3333,7 +3333,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>
> /* Don't init kfd if whole hive need to be reset during init */
> if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
> - kgd2kfd_init_zone_device(adev);
> amdgpu_amdkfd_device_init(adev);
> }
>
> @@ -4931,6 +4930,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>
> if (adev->init_lvl->level == AMDGPU_INIT_LEVEL_MINIMAL_XGMI)
> amdgpu_xgmi_reset_on_init(adev);
> +
> + /* Don't init kfd if whole hive need to be reset during init */
> + if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
> + kgd2kfd_init_zone_device(adev);
> + kfd_update_svm_support_properties(adev);
> + }
> +
> /*
> * Place those sysfs registering after `late_init`. As some of those
> * operations performed in `late_init` might affect the sysfs
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 8644039777b8..8511b00a7463 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -2475,3 +2475,26 @@ int kfd_debugfs_rls_by_device(struct seq_file *m, void *data)
> }
>
> #endif
> +
> +void kfd_update_svm_support_properties(struct amdgpu_device *adev)
> +{
> + struct kfd_topology_device *dev;
> + int ret;
> +
> + down_write(&topology_lock);
> + list_for_each_entry(dev, &topology_device_list, list) {
> + if (!dev->gpu || dev->gpu->adev != adev)
> + continue;
> +
> + if (KFD_IS_SVM_API_SUPPORTED(adev)) {
> + dev->node_props.capability |= HSA_CAP_SVMAPI_SUPPORTED;
> + ret = kfd_topology_update_sysfs();
> + if (!ret)
> + sys_props.generation_count++;
> + else
> + dev_err(adev->dev, "Failed to update SVM support properties. ret=%d\n", ret);
> + } else
> + dev->node_props.capability &= ~HSA_CAP_SVMAPI_SUPPORTED;
> + }
> + up_write(&topology_lock);
> +}
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> index ab7a3bf1bdef..129b447fcf84 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> @@ -202,4 +202,10 @@ struct kfd_topology_device *kfd_create_topology_device(
> struct list_head *device_list);
> void kfd_release_topology_device_list(struct list_head *device_list);
>
> +#if IS_ENABLED(CONFIG_HSA_AMD)
> +void kfd_update_svm_support_properties(struct amdgpu_device *adev);
> +#else
> +static inline void kfd_update_svm_support_properties(struct amdgpu_device *adev) {}
> +#endif
> +
> #endif /* __KFD_TOPOLOGY_H__ */
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: [PATCH] drm/amdgpu: reduce the full gpu access time in amdgpu_device_init.
2026-01-06 20:47 ` Kasiviswanathan, Harish
@ 2026-01-09 7:13 ` Deng, Emily
0 siblings, 0 replies; 9+ messages in thread
From: Deng, Emily @ 2026-01-09 7:13 UTC (permalink / raw)
To: Kasiviswanathan, Harish, Li, Chong(Alan),
amd-gfx@lists.freedesktop.org
Cc: Zhao, Victor, Yang, Philip, Kuehling, Felix
[AMD Official Use Only - AMD Internal Distribution Only]
Hi Harish,
Operations within full access mode are hardware-related and require exclusive GPU ownership. Software-related operations, particularly those that are time-consuming, must not be placed inside full access mode, as this would impact other VFs by blocking their access to the GPU.
Emily Deng
Best Wishes
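The rule stated above (hardware-related steps inside full access, time-consuming software steps outside it) can be sketched as a two-phase ordering. This is an illustrative model only: `struct step` and `run_init_order` are hypothetical, and the `[` and `]` markers merely stand in for requesting and releasing full GPU access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical init step: a one-character tag plus an exclusivity flag. */
struct step {
    char tag;
    bool needs_exclusive_gpu;
};

/*
 * Record the order in which steps would run: '[' marks the request for
 * full access, ']' marks its release. Steps that need exclusive access
 * are placed between the markers; everything else runs after the release.
 * Returns the length of the string written to out.
 */
static size_t run_init_order(const struct step *steps, size_t n,
                             char *out, size_t out_len)
{
    size_t pos = 0;

    assert(out != NULL && out_len >= n + 3);

    out[pos++] = '[';
    for (size_t i = 0; i < n; i++)
        if (steps[i].needs_exclusive_gpu)
            out[pos++] = steps[i].tag;
    out[pos++] = ']';

    for (size_t i = 0; i < n; i++)
        if (!steps[i].needs_exclusive_gpu)
            out[pos++] = steps[i].tag;

    out[pos] = '\0';
    return pos;
}
```

For steps tagged `h` (exclusive), `m` (not exclusive) and `i` (exclusive), the recorded order is `[hi]m`: the non-exclusive step runs only after the release marker, regardless of where it appeared in the input.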
>-----Original Message-----
>From: Kasiviswanathan, Harish <Harish.Kasiviswanathan@amd.com>
>Sent: Wednesday, January 7, 2026 4:47 AM
>To: Li, Chong(Alan) <Chong.Li@amd.com>; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily <Emily.Deng@amd.com>; Zhao, Victor <Victor.Zhao@amd.com>;
>Yang, Philip <Philip.Yang@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
>Subject: Re: [PATCH] drm/amdgpu: reduce the full gpu access time in
>amdgpu_device_init.
>
>Hi Alan,
>
>Based on your older patches, I understand that this patch is required because the
>host (gim) driver assumes the guest driver is available within 3s. I am not sure how
>the 3s timeout was decided. I feel a better approach would be a more robust
>handshake between the guest and host drivers. You might be able to get away with
>rearranging the initialization code temporarily, but that could easily break if some
>future change causes a delay.
>
>Best Regards,
>Harish
>
>
>On 2025-11-17 01:38, chong li wrote:
>> [Why]
>> The function "devm_memremap_pages", called from "kgd2kfd_init_zone_device",
>> sometimes takes too much time.
>>
>> [How]
>> Move the call to "kgd2kfd_init_zone_device"
>> to after full GPU access is released (amdgpu_virt_release_full_gpu).
>>
>> v2:
>> Improve the coding style.
>>
>> Signed-off-by: chong li <chongli2@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++++++-
>> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 23 ++++++++++++++++++++++
>> drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 6 ++++++
>> 4 files changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> index 40c46e6c8898..6d204ba2c267 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> @@ -37,7 +37,7 @@
>> #include "amdgpu_sync.h"
>> #include "amdgpu_vm.h"
>> #include "amdgpu_xcp.h"
>> -
>> +#include "kfd_topology.h"
>> extern uint64_t amdgpu_amdkfd_total_mem_size;
>>
>> enum TLB_FLUSH_TYPE {
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 0b40ddcb8ba1..b4e1f258119c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3333,7 +3333,6 @@ static int amdgpu_device_ip_init(struct
>> amdgpu_device *adev)
>>
>> /* Don't init kfd if whole hive need to be reset during init */
>> if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
>> - kgd2kfd_init_zone_device(adev);
>> amdgpu_amdkfd_device_init(adev);
>> }
>>
>> @@ -4931,6 +4930,13 @@ int amdgpu_device_init(struct amdgpu_device
>> *adev,
>>
>> if (adev->init_lvl->level == AMDGPU_INIT_LEVEL_MINIMAL_XGMI)
>> amdgpu_xgmi_reset_on_init(adev);
>> +
>> + /* Don't init kfd if whole hive need to be reset during init */
>> + if (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) {
>> + kgd2kfd_init_zone_device(adev);
>> + kfd_update_svm_support_properties(adev);
>> + }
>> +
>> /*
>> * Place those sysfs registering after `late_init`. As some of those
>> * operations performed in `late_init` might affect the sysfs
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> index 8644039777b8..8511b00a7463 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> @@ -2475,3 +2475,26 @@ int kfd_debugfs_rls_by_device(struct seq_file *m, void *data)
>> }
>>
>> #endif
>> +
>> +void kfd_update_svm_support_properties(struct amdgpu_device *adev)
>> +{
>> + struct kfd_topology_device *dev;
>> + int ret;
>> +
>> + down_write(&topology_lock);
>> + list_for_each_entry(dev, &topology_device_list, list) {
>> + if (!dev->gpu || dev->gpu->adev != adev)
>> + continue;
>> +
>> + if (KFD_IS_SVM_API_SUPPORTED(adev)) {
>> + dev->node_props.capability |= HSA_CAP_SVMAPI_SUPPORTED;
>> + ret = kfd_topology_update_sysfs();
>> + if (!ret)
>> + sys_props.generation_count++;
>> + else
>> + dev_err(adev->dev, "Failed to update SVM support properties. ret=%d\n", ret);
>> + } else
>> + dev->node_props.capability &= ~HSA_CAP_SVMAPI_SUPPORTED;
>> + }
>> + up_write(&topology_lock);
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> index ab7a3bf1bdef..129b447fcf84 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> @@ -202,4 +202,10 @@ struct kfd_topology_device *kfd_create_topology_device(
>> struct list_head *device_list);
>> void kfd_release_topology_device_list(struct list_head *device_list);
>>
>> +#if IS_ENABLED(CONFIG_HSA_AMD)
>> +void kfd_update_svm_support_properties(struct amdgpu_device *adev);
>> +#else
>> +static inline void kfd_update_svm_support_properties(struct amdgpu_device *adev) {}
>> +#endif
>> +
>> #endif /* __KFD_TOPOLOGY_H__ */
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-01-09 7:13 UTC | newest]
Thread overview: 9+ messages
2025-11-07 9:06 [PATCH] drm/amdgpu: reduce the full gpu access time in amdgpu_device_init chong li
-- strict thread matches above, loose matches on Subject: below --
2025-11-17 6:38 chong li
2025-11-17 6:41 ` Li, Chong(Alan)
2026-01-06 20:47 ` Kasiviswanathan, Harish
2026-01-09 7:13 ` Deng, Emily
2025-11-11 7:02 chong li
2025-11-07 7:07 chong li
2025-11-07 7:12 ` Deng, Emily
2025-10-21 9:45 chong li