From: Huang Rui <ray.huang@amd.com>
To: "Kuehling, Felix" <Felix.Kuehling@amd.com>
Cc: "amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)
Date: Thu, 20 Aug 2020 08:31:25 +0800 [thread overview]
Message-ID: <20200820003125.GD215900@hr-amd> (raw)
In-Reply-To: <e31242f8-78df-9b7f-5e5a-b57c1722d5c3@amd.com>
On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote:
> On 2020-08-19 7:56 p.m., Huang Rui wrote:
> > On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
> >> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
> >>> We still have a few iommu issues which need to address, so force raven
> >>> as "dgpu" path for the moment.
> >>>
> >>> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
> >>> or ACPI CRAT table not correct.
> >>>
> >>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
> >>> v3: Align with existed thunk, don't change the way of raven, only renoir
> >>> will use "dgpu" path by default.
> >>>
> >>> Signed-off-by: Huang Rui <ray.huang@amd.com>
> >>> ---
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++-
> >>> drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 28 ++++++++++++++++++++++-
> >>> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +-
> >>> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-
> >>> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
> >>> 5 files changed, 34 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> index a9a4319c24ae..189f9d7e190d 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
> >>> * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
> >>> * table to get information about AMD APUs. This option can serve as a workaround on
> >>> * systems with a broken CRAT table.
> >>> + *
> >>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
> >>> + * whehter use CRAT)
> >>> */
> >>> int ignore_crat;
> >>> module_param(ignore_crat, int, 0444);
> >>> MODULE_PARM_DESC(ignore_crat,
> >>> - "Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
> >>> + "Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
> >>>
> >>> /**
> >>> * DOC: halt_if_hws_hang (int)
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>> index 59557e3e206a..f8346d4402e2 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>> @@ -22,6 +22,7 @@
> >>>
> >>> #include <linux/pci.h>
> >>> #include <linux/acpi.h>
> >>> +#include <asm/processor.h>
> >>> #include "kfd_crat.h"
> >>> #include "kfd_priv.h"
> >>> #include "kfd_topology.h"
> >>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> >>> return 0;
> >>> }
> >>>
> >>> +
> >>> +#ifdef CONFIG_ACPI
> >>> +static void kfd_setup_ignore_crat_option(void)
> >>> +{
> >>> +
> >>> + if (ignore_crat)
> >>> + return;
> >>> +
> >>> +#ifndef KFD_SUPPORT_IOMMU_V2
> >>> + ignore_crat = 1;
> >>> +#else
> >>> + ignore_crat = 0;
> >>> +#endif
> >>> +
> >>> + /* Renoir use the fallback path to align with existed thunk */
> >> Are you sure you need special code for Renoir here? For Renoir the
> >> dev->device_info already treats it as a dGPU and always has.
> > Renoir also is an APU, in other words, we might have got the correct CRAT
> > table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If
> > we had got CRAT table, the kfd would create an APU node. That's not
> > expected.
>
> kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT
> table because gpu->device_info->needs_iommu_device is False for Renoir.
> So Renoir will always show up in the topology as its own discrete GPU node.
>
> How does this work today? Renoir is already treated as a dGPU. But the
> CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the
> CRAT table still shows GPU cores?
>
Yes, Renoir works well. In fact, I found the problem while I was enabling
the dGPU path for raven before. Even I set needs_iommu_device as false in
raven's device info. The kfd still creates the APU node. (in v1 patch)
Let me rollback to check it again.
Thanks,
Ray
> Regards,
> Felix
>
>
> >
> >> I don't like the whole idea of changing the value of a module parameter,
> >> because it is global and visible to the user through sysfs. Instead, if
> >> you need to override the value of ignore_crat to consider other
> >> conditions, I think kfd_device_use_iommu_v2 and
> >> kfd_create_crat_image_acpi would be the right place to do it.
> >>
> >> To avoid duplicating the conditions, you could add a helper function
> >> bool kfd_ignore_crat(void) that can be called instead of using the
> >> ignore_crat parameter directly. This function, changing the global
> >> module parameter, should be removed.
> > That makes sense. Will update it in next version.
> >
> >>
> >>> + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> >>> + boot_cpu_data.x86 == 0x17 &&
> >>> + boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
> >>> + ignore_crat = 1;
> >>> + }
> >>> +
> >>> + return;
> >>> +}
> >>> +
> >>> /*
> >>> * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
> >>> * copies CRAT from ACPI (if available).
> >>> @@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> >>> *
> >>> * Return 0 if successful else return error code
> >>> */
> >>> -#ifdef CONFIG_ACPI
> >>> int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
> >>> {
> >>> struct acpi_table_header *crat_table;
> >>> @@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
> >>> return -EINVAL;
> >>> }
> >>>
> >>> + kfd_setup_ignore_crat_option();
> >>> +
> >>> if (ignore_crat) {
> >>> pr_info("CRAT table disabled by module option\n");
> >>> return -ENODATA;
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>> index 2c030c2b5b8d..dab44951c4d8 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>> @@ -112,6 +112,7 @@ static const struct kfd_device_info carrizo_device_info = {
> >>> .num_xgmi_sdma_engines = 0,
> >>> .num_sdma_queues_per_engine = 2,
> >>> };
> >>> +#endif
> >>>
> >>> static const struct kfd_device_info raven_device_info = {
> >>> .asic_family = CHIP_RAVEN,
> >>> @@ -130,7 +131,6 @@ static const struct kfd_device_info raven_device_info = {
> >>> .num_xgmi_sdma_engines = 0,
> >>> .num_sdma_queues_per_engine = 2,
> >>> };
> >>> -#endif
> >>>
> >>> static const struct kfd_device_info hawaii_device_info = {
> >>> .asic_family = CHIP_HAWAII,
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>> index 82f955750e75..4b6e7ef7a71c 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>> @@ -1234,7 +1234,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
> >>>
> >>> static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
> >>> {
> >>> - return dev && dev->device_info->needs_iommu_device;
> >>> + return !ignore_crat && dev && dev->device_info->needs_iommu_device;
> >>> }
> >>>
> >>> /* Debugfs */
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>> index 4b29815e9205..b92ce75a4c53 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>> @@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
> >>> COMPUTE_UNIT_CPU, NULL,
> >>> proximity_domain);
> >>> cpu_only_node = 1;
> >>> + ignore_crat = 1;
> >> Don't change the global variable. I think you're doing this here in case
> >> the CRAT table is broken and contains no GPU info. Maybe we need to add
> >> a new flag "use_iommu_v2" into the kfd_dev structure to handle this.
> >>
> > Yes, you're right. Will remove global ignore_crat update. Let me revise it
> > again.
> >
> > Thanks,
> > Ray
> >
> >> Regards,
> >> Felix
> >>
> >>
> >>> if (ret) {
> >>> pr_err("Error creating VCRAT table for CPU\n");
> >>> return ret;
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
next prev parent reply other threads:[~2020-08-20 0:31 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-19 11:06 [PATCH v3 1/3] drm/amdkfd: abstract the iommu device checking with ignore_crat (v2) Huang Rui
2020-08-19 11:06 ` [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3) Huang Rui
2020-08-19 15:38 ` Felix Kuehling
2020-08-19 23:56 ` Huang Rui
2020-08-20 0:18 ` Felix Kuehling
2020-08-20 0:31 ` Huang Rui [this message]
2020-08-20 9:38 ` Huang Rui
2020-08-20 13:22 ` Felix Kuehling
2020-08-20 3:09 ` Huang Rui
2020-08-20 4:05 ` Felix Kuehling
2020-08-20 4:41 ` Huang Rui
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200820003125.GD215900@hr-amd \
--to=ray.huang@amd.com \
--cc=Felix.Kuehling@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox