From: Felix Kuehling <felix.kuehling@amd.com>
To: Huang Rui <ray.huang@amd.com>
Cc: "amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)
Date: Thu, 20 Aug 2020 09:22:09 -0400 [thread overview]
Message-ID: <508ffc35-0bde-90f7-abb2-ea3502013cc4@amd.com> (raw)
In-Reply-To: <20200820093854.GA298091@hr-amd>
Am 2020-08-20 um 5:38 a.m. schrieb Huang Rui:
> On Thu, Aug 20, 2020 at 08:31:25AM +0800, Huang Rui wrote:
>> On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote:
>>> On 2020-08-19 7:56 p.m., Huang Rui wrote:
>>>> On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
>>>>> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
>>>>>> We still have a few iommu issues which need to address, so force raven
>>>>>> as "dgpu" path for the moment.
>>>>>>
>>>>>> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
>>>>>> or ACPI CRAT table not correct.
>>>>>>
>>>>>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
>>>>>> v3: Align with existed thunk, don't change the way of raven, only renoir
>>>>>> will use "dgpu" path by default.
>>>>>>
>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++-
>>>>>> drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 28 ++++++++++++++++++++++-
>>>>>> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +-
>>>>>> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-
>>>>>> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
>>>>>> 5 files changed, 34 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> index a9a4319c24ae..189f9d7e190d 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
>>>>>> * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
>>>>>> * table to get information about AMD APUs. This option can serve as a workaround on
>>>>>> * systems with a broken CRAT table.
>>>>>> + *
>>>>>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
>>>>>> + * whehter use CRAT)
>>>>>> */
>>>>>> int ignore_crat;
>>>>>> module_param(ignore_crat, int, 0444);
>>>>>> MODULE_PARM_DESC(ignore_crat,
>>>>>> - "Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
>>>>>> + "Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
>>>>>>
>>>>>> /**
>>>>>> * DOC: halt_if_hws_hang (int)
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>>> index 59557e3e206a..f8346d4402e2 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>>> @@ -22,6 +22,7 @@
>>>>>>
>>>>>> #include <linux/pci.h>
>>>>>> #include <linux/acpi.h>
>>>>>> +#include <asm/processor.h>
>>>>>> #include "kfd_crat.h"
>>>>>> #include "kfd_priv.h"
>>>>>> #include "kfd_topology.h"
>>>>>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> +
>>>>>> +#ifdef CONFIG_ACPI
>>>>>> +static void kfd_setup_ignore_crat_option(void)
>>>>>> +{
>>>>>> +
>>>>>> + if (ignore_crat)
>>>>>> + return;
>>>>>> +
>>>>>> +#ifndef KFD_SUPPORT_IOMMU_V2
>>>>>> + ignore_crat = 1;
>>>>>> +#else
>>>>>> + ignore_crat = 0;
>>>>>> +#endif
>>>>>> +
>>>>>> + /* Renoir use the fallback path to align with existed thunk */
>>>>> Are you sure you need special code for Renoir here? For Renoir the
>>>>> dev->device_info already treats it as a dGPU and always has.
>>>> Renoir also is an APU, in other words, we might have got the correct CRAT
>>>> table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If
>>>> we had got CRAT table, the kfd would create an APU node. That's not
>>>> expected.
>>> kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT
>>> table because gpu->device_info->needs_iommu_device is False for Renoir.
>>> So Renoir will always show up in the topology as its own discrete GPU node.
>>>
>>> How does this work today? Renoir is already treated as a dGPU. But the
>>> CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the
>>> CRAT table still shows GPU cores?
>>>
>> Yes, Renoir works well. In fact, I found the problem while I was enabling
>> the dGPU path for raven before. Even I set needs_iommu_device as false in
>> raven's device info. The kfd still creates the APU node. (in v1 patch)
>>
>> Let me rollback to check it again.
>>
> Hi Felix,
>
> I double check it again. If Renoir's ACPI CRAT were good, we wouldn't
> create virtual CRAT table for Renoir at this moement. Then the pure CPU
> node is unable to be created. That conflicted with needs_iommu_device flag
> (we hardcode needs_iommu_device as false for renoir). Then will break the
> user mode driver. (please see below rocminfo)
>
> rocminfo: /libhsakmt/src/topology.c:1079: topology_sysfs_get_node_props: Assertion `props->EngineId.ui32.Major' failed.
>
> So we probably would better have a specific handling to make sure Renoir
> with virtual CRAT.
OK. That would be a solution that works for Renoir only.
I'd suggest trying a more general solution in node_show in
kfd_topology.c. If we see an APU node (that has CPU and GPU cores) with
no associated GPU (dev->gpu_id is 0 or dev->gpu is NULL), we can report
it as a pure CPU node to user mode by just reporting the simd_count as 0.
Regards,
Felix
>
> Thanks,
> Ray
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
next prev parent reply other threads:[~2020-08-20 13:22 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-19 11:06 [PATCH v3 1/3] drm/amdkfd: abstract the iommu device checking with ignore_crat (v2) Huang Rui
2020-08-19 11:06 ` [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3) Huang Rui
2020-08-19 15:38 ` Felix Kuehling
2020-08-19 23:56 ` Huang Rui
2020-08-20 0:18 ` Felix Kuehling
2020-08-20 0:31 ` Huang Rui
2020-08-20 9:38 ` Huang Rui
2020-08-20 13:22 ` Felix Kuehling [this message]
2020-08-20 3:09 ` Huang Rui
2020-08-20 4:05 ` Felix Kuehling
2020-08-20 4:41 ` Huang Rui
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=508ffc35-0bde-90f7-abb2-ea3502013cc4@amd.com \
--to=felix.kuehling@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=ray.huang@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox