* [PATCH] drm/amdkfd: Initialize dqm earlier
@ 2019-06-06 21:51 Zeng, Oak
From: Zeng, Oak @ 2019-06-06 21:51 UTC
To: amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix, Zeng, Oak
dqm is referenced in function kfd_topology_add_device.
Move dqm initialization up to avoid a NULL pointer dereference.
Change-Id: Id6cb2541af129826b7621ceaa8e06e638c7bb122
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 9d1b026..e7e24fe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -603,6 +603,12 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 	if (kfd->kfd2kgd->get_hive_id)
 		kfd->hive_id = kfd->kfd2kgd->get_hive_id(kfd->kgd);
 
+	kfd->dqm = device_queue_manager_init(kfd);
+	if (!kfd->dqm) {
+		dev_err(kfd_device, "Error initializing queue manager\n");
+		goto device_queue_manager_error;
+	}
+
 	if (kfd_topology_add_device(kfd)) {
 		dev_err(kfd_device, "Error adding device to topology\n");
 		goto kfd_topology_add_device_error;
@@ -613,12 +619,6 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 		goto kfd_interrupt_error;
 	}
 
-	kfd->dqm = device_queue_manager_init(kfd);
-	if (!kfd->dqm) {
-		dev_err(kfd_device, "Error initializing queue manager\n");
-		goto device_queue_manager_error;
-	}
-
 	if (kfd_iommu_device_init(kfd)) {
 		dev_err(kfd_device, "Error initializing iommuv2\n");
 		goto device_iommu_error;
@@ -642,12 +642,12 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 
 kfd_resume_error:
 device_iommu_error:
-	device_queue_manager_uninit(kfd->dqm);
-device_queue_manager_error:
 	kfd_interrupt_exit(kfd);
 kfd_interrupt_error:
 	kfd_topology_remove_device(kfd);
 kfd_topology_add_device_error:
+	device_queue_manager_uninit(kfd->dqm);
+device_queue_manager_error:
 	kfd_doorbell_fini(kfd);
 kfd_doorbell_error:
 	kfd_gtt_sa_fini(kfd);
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
* Re: [PATCH] drm/amdkfd: Initialize dqm earlier
@ 2019-06-06 22:08 Kuehling, Felix
From: Kuehling, Felix @ 2019-06-06 22:08 UTC
To: Zeng, Oak, amd-gfx@lists.freedesktop.org
On 2019-06-06 5:51 p.m., Zeng, Oak wrote:
> dqm is referenced in function kfd_topology_add_device.
> Move dqm initialization up to avoid a NULL pointer dereference.
This addresses a pretty unlikely race condition where someone reads
/sys/kernel/debug/kfd/hqds during device initialization.
We add devices to the topology before their initialization has
completed successfully. If initialization fails, we remove the device
again. Having devices in the topology that are not completely
initialized yet seems to be the real issue. A cleaner solution would
move kfd_topology_add_device to the end of kgd2kfd_device_init, so that
we only add a device to the topology after it is successfully and
completely initialized. Not sure if there are any dependencies in the
init sequence that would be broken by this, though.
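The "publish last" ordering described above can be sketched in isolation. This is only a minimal model, not the amdkfd code: every name in it (demo_dev, demo_dqm_init, demo_iommu_init, demo_publish, the fail_iommu knob) is hypothetical and stands in for the real init steps. It assumes that once a device is published, a reader may dereference dev->dqm immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not the amdkfd API. */
struct demo_dqm { int unused; };

struct demo_dev {
	struct demo_dqm *dqm;	/* NULL until the queue manager is set up */
	bool published;		/* true once readers can find the device */
	bool fail_iommu;	/* test knob: make a late init step fail */
};

static struct demo_dqm demo_dqm_storage;

static struct demo_dqm *demo_dqm_init(struct demo_dev *dev)
{
	(void)dev;
	return &demo_dqm_storage;
}

static void demo_dqm_uninit(struct demo_dev *dev)
{
	dev->dqm = NULL;
}

static int demo_iommu_init(struct demo_dev *dev)
{
	return dev->fail_iommu ? -1 : 0;
}

static void demo_publish(struct demo_dev *dev)
{
	/* A reader that finds the device may dereference dev->dqm at once. */
	assert(dev->dqm != NULL);
	dev->published = true;
}

/* Returns true on success; on failure the device was never published. */
static bool demo_device_init(struct demo_dev *dev)
{
	dev->dqm = demo_dqm_init(dev);
	if (!dev->dqm)
		goto dqm_error;

	if (demo_iommu_init(dev))
		goto iommu_error;

	/* Publish last: no step after this point can fail. */
	demo_publish(dev);
	return true;

iommu_error:
	demo_dqm_uninit(dev);
dqm_error:
	return false;
}
```

With this ordering the error path never has to unpublish anything, so a concurrent reader can only ever see a fully initialized device. Whether the real init sequence allows it depends on which steps need the topology entry to exist already.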
Regards,
Felix
>
> Change-Id: Id6cb2541af129826b7621ceaa8e06e638c7bb122
> Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 9d1b026..e7e24fe 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -603,6 +603,12 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
> if (kfd->kfd2kgd->get_hive_id)
> kfd->hive_id = kfd->kfd2kgd->get_hive_id(kfd->kgd);
>
> + kfd->dqm = device_queue_manager_init(kfd);
> + if (!kfd->dqm) {
> + dev_err(kfd_device, "Error initializing queue manager\n");
> + goto device_queue_manager_error;
> + }
> +
> if (kfd_topology_add_device(kfd)) {
> dev_err(kfd_device, "Error adding device to topology\n");
> goto kfd_topology_add_device_error;
> @@ -613,12 +619,6 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
> goto kfd_interrupt_error;
> }
>
> - kfd->dqm = device_queue_manager_init(kfd);
> - if (!kfd->dqm) {
> - dev_err(kfd_device, "Error initializing queue manager\n");
> - goto device_queue_manager_error;
> - }
> -
> if (kfd_iommu_device_init(kfd)) {
> dev_err(kfd_device, "Error initializing iommuv2\n");
> goto device_iommu_error;
> @@ -642,12 +642,12 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>
> kfd_resume_error:
> device_iommu_error:
> - device_queue_manager_uninit(kfd->dqm);
> -device_queue_manager_error:
> kfd_interrupt_exit(kfd);
> kfd_interrupt_error:
> kfd_topology_remove_device(kfd);
> kfd_topology_add_device_error:
> + device_queue_manager_uninit(kfd->dqm);
> +device_queue_manager_error:
> kfd_doorbell_fini(kfd);
> kfd_doorbell_error:
> kfd_gtt_sa_fini(kfd);
* RE: [PATCH] drm/amdkfd: Initialize dqm earlier
@ 2019-06-10 19:40 Zeng, Oak
From: Zeng, Oak @ 2019-06-10 19:40 UTC
To: Kuehling, Felix, amd-gfx@lists.freedesktop.org
Hi Felix,
kfd_iommu_device_init depends on the kfd topology. I will do as you suggested below. I will also move the setting of HSA_CAP_ATS_PRESENT from kfd_iommu_device_init to kfd_topology_add_device, to remove that dependency.
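A rough sketch of that idea, with the capability bit filled in when the topology node is created so that IOMMU init no longer has to look the node up. All names here are hypothetical (sketch_dev, sketch_topo_node, sketch_topology_add), and SKETCH_CAP_ATS_PRESENT uses a made-up bit value; the real HSA_CAP_ATS_PRESENT is defined in the amdkfd headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up bit value for illustration; the real HSA_CAP_ATS_PRESENT
 * is defined in the amdkfd headers. */
#define SKETCH_CAP_ATS_PRESENT (1u << 0)

/* Hypothetical device: only records whether it uses the IOMMU. */
struct sketch_dev {
	bool uses_iommu;
};

/* Hypothetical topology node with a capability bitmask. */
struct sketch_topo_node {
	const struct sketch_dev *dev;
	uint32_t capability;
};

/* Fill in the node at add time, so IOMMU init never needs to find it. */
static void sketch_topology_add(struct sketch_topo_node *node,
				const struct sketch_dev *dev)
{
	assert(node != NULL && dev != NULL);
	node->dev = dev;
	node->capability = 0;
	if (dev->uses_iommu)
		node->capability |= SKETCH_CAP_ATS_PRESENT;
}
```

The point is only the direction of the dependency: the topology code reads a property of the device, instead of the IOMMU code writing into the topology.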
Regards,
Oak
-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling@amd.com>
Sent: Thursday, June 6, 2019 6:09 PM
To: Zeng, Oak <Oak.Zeng@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdkfd: Initialize dqm earlier
On 2019-06-06 5:51 p.m., Zeng, Oak wrote:
> dqm is referenced in function kfd_topology_add_device.
> Move dqm initialization up to avoid a NULL pointer dereference.
This addresses a pretty unlikely race condition where someone reads /sys/kernel/debug/kfd/hqds during device initialization.
We add devices to the topology before their initialization has completed successfully. If initialization fails, we remove the device again. Having devices in the topology that are not completely initialized yet seems to be the real issue. A cleaner solution would move kfd_topology_add_device to the end of kgd2kfd_device_init, so that we only add a device to the topology after it is successfully and completely initialized. Not sure if there are any dependencies in the init sequence that would be broken by this, though.
Regards,
Felix