public inbox for amd-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH v3 1/3] drm/amdkfd: abstract the iommu device checking with ignore_crat (v2)
@ 2020-08-19 11:06 Huang Rui
  2020-08-19 11:06 ` [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3) Huang Rui
  0 siblings, 1 reply; 11+ messages in thread
From: Huang Rui @ 2020-08-19 11:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix Kuehling, Huang Rui

It's better to use inline function to wrap the iommu checking.

v2: rename the function and use another input.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c               |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c                |  4 ++--
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c           |  4 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c                 | 10 +++++-----
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h                  |  5 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c              |  6 +++---
 7 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 1b60e0ed6b5c..ae3e3e7e3b75 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1258,7 +1258,7 @@ bool kfd_dev_is_large_bar(struct kfd_dev *dev)
 		return true;
 	}
 
-	if (dev->device_info->needs_iommu_device)
+	if (kfd_device_use_iommu_v2(dev))
 		return false;
 
 	amdgpu_amdkfd_get_local_mem_info(dev->kgd, &mem_info);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 3e5904f8876a..78830835162e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -309,7 +309,7 @@ static int dbgdev_address_watch_nodiq(struct kfd_dbgdev *dbgdev,
 	for (i = 0; i < adw_info->num_watch_points; i++) {
 		dbgdev_address_watch_set_registers(adw_info, &addrHi, &addrLo,
 				&cntl, i, pdd->qpd.vmid,
-				dbgdev->dev->device_info->needs_iommu_device);
+				kfd_device_use_iommu_v2(dbgdev->dev));
 
 		pr_debug("\t\t%30s\n", "* * * * * * * * * * * * * * * * * *");
 		pr_debug("\t\t%20s %08x\n", "register index :", i);
@@ -399,7 +399,7 @@ static int dbgdev_address_watch_diq(struct kfd_dbgdev *dbgdev,
 	for (i = 0; i < adw_info->num_watch_points; i++) {
 		dbgdev_address_watch_set_registers(adw_info, &addrHi, &addrLo,
 				&cntl, i, vmid,
-				dbgdev->dev->device_info->needs_iommu_device);
+				kfd_device_use_iommu_v2(dbgdev->dev));
 
 		pr_debug("\t\t%30s\n", "* * * * * * * * * * * * * * * * * *");
 		pr_debug("\t\t%20s %08x\n", "register index :", i);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
index 95a82ac455f2..02a3e9888092 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
@@ -62,7 +62,7 @@ static int update_qpd_v9(struct device_queue_manager *dqm,
 				SH_MEM_ALIGNMENT_MODE_UNALIGNED <<
 					SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT;
 		if (amdgpu_noretry &&
-		    !dqm->dev->device_info->needs_iommu_device)
+		    !(kfd_device_use_iommu_v2(dqm->dev)))
 			qpd->sh_mem_config |=
 				1 << SH_MEM_CONFIG__RETRY_DISABLE__SHIFT;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 78714f9a8b11..11c2bb7ba5ee 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -321,7 +321,7 @@ static void kfd_init_apertures_vi(struct kfd_process_device *pdd, uint8_t id)
 	pdd->lds_base = MAKE_LDS_APP_BASE_VI();
 	pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
 
-	if (!pdd->dev->device_info->needs_iommu_device) {
+	if (!kfd_device_use_iommu_v2(pdd->dev)) {
 		/* dGPUs: SVM aperture starting at 0
 		 * with small reserved space for kernel.
 		 * Set them to CANONICAL addresses.
@@ -423,7 +423,7 @@ int kfd_init_apertures(struct kfd_process *process)
 				return -EINVAL;
 			}
 
-			if (!dev->device_info->needs_iommu_device) {
+			if (!kfd_device_use_iommu_v2(dev)) {
 				/* dGPUs: the reserved space for kernel
 				 * before SVM
 				 */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
index 4d3b4188b9a1..94e8354a857d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
@@ -41,7 +41,7 @@ int kfd_iommu_check_device(struct kfd_dev *kfd)
 	struct amd_iommu_device_info iommu_info;
 	int err;
 
-	if (!kfd->device_info->needs_iommu_device)
+	if (!kfd_device_use_iommu_v2(kfd))
 		return -ENODEV;
 
 	iommu_info.flags = 0;
@@ -63,7 +63,7 @@ int kfd_iommu_device_init(struct kfd_dev *kfd)
 	unsigned int pasid_limit;
 	int err;
 
-	if (!kfd->device_info->needs_iommu_device)
+	if (!kfd_device_use_iommu_v2(kfd))
 		return 0;
 
 	iommu_info.flags = 0;
@@ -109,7 +109,7 @@ int kfd_iommu_bind_process_to_device(struct kfd_process_device *pdd)
 	struct kfd_process *p = pdd->process;
 	int err;
 
-	if (!dev->device_info->needs_iommu_device || pdd->bound == PDD_BOUND)
+	if (!kfd_device_use_iommu_v2(dev) || pdd->bound == PDD_BOUND)
 		return 0;
 
 	if (unlikely(pdd->bound == PDD_BOUND_SUSPENDED)) {
@@ -284,7 +284,7 @@ static void kfd_unbind_processes_from_device(struct kfd_dev *kfd)
  */
 void kfd_iommu_suspend(struct kfd_dev *kfd)
 {
-	if (!kfd->device_info->needs_iommu_device)
+	if (!kfd_device_use_iommu_v2(kfd))
 		return;
 
 	kfd_unbind_processes_from_device(kfd);
@@ -304,7 +304,7 @@ int kfd_iommu_resume(struct kfd_dev *kfd)
 	unsigned int pasid_limit;
 	int err;
 
-	if (!kfd->device_info->needs_iommu_device)
+	if (!kfd_device_use_iommu_v2(kfd))
 		return 0;
 
 	pasid_limit = kfd_get_pasid_limit();
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 60243798cce2..82f955750e75 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1232,6 +1232,11 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
 #endif
 }
 
+static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
+{
+	return dev && dev->device_info->needs_iommu_device;
+}
+
 /* Debugfs */
 #if defined(CONFIG_DEBUG_FS)
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index cbb8535abf0c..4b29815e9205 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -545,7 +545,7 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
 		 * If the ASIC is APU except Kaveri, set local memory size
 		 * to 0 to disable local memory support
 		 */
-		if (!dev->gpu->device_info->needs_iommu_device
+		if (!kfd_device_use_iommu_v2(dev->gpu)
 			|| dev->gpu->device_info->asic_family == CHIP_KAVERI) {
 			amdgpu_amdkfd_get_local_mem_info(dev->gpu->kgd,
 				&local_mem_info);
@@ -1197,7 +1197,7 @@ static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
 		/* Discrete GPUs need their own topology device list
 		 * entries. Don't assign them to CPU/APU nodes.
 		 */
-		if (!gpu->device_info->needs_iommu_device &&
+		if (!kfd_device_use_iommu_v2(gpu) &&
 		    dev->node_props.cpu_cores_count)
 			continue;
 
@@ -1452,7 +1452,7 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 	* Overwrite ATS capability according to needs_iommu_device to fix
 	* potential missing corresponding bit in CRAT of BIOS.
 	*/
-	if (dev->gpu->device_info->needs_iommu_device)
+	if (kfd_device_use_iommu_v2(dev->gpu))
 		dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
 	else
 		dev->node_props.capability &= ~HSA_CAP_ATS_PRESENT;
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)
  2020-08-19 11:06 [PATCH v3 1/3] drm/amdkfd: abstract the iommu device checking with ignore_crat (v2) Huang Rui
@ 2020-08-19 11:06 ` Huang Rui
  2020-08-19 15:38   ` Felix Kuehling
  0 siblings, 1 reply; 11+ messages in thread
From: Huang Rui @ 2020-08-19 11:06 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix Kuehling, Huang Rui

We still have a few iommu issues which need to address, so force raven
as "dgpu" path for the moment.

This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
or ACPI CRAT table not correct.

v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
v3: Align with existed thunk, don't change the way of raven, only renoir
    will use "dgpu" path by default.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
 5 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index a9a4319c24ae..189f9d7e190d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
  * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
  * table to get information about AMD APUs. This option can serve as a workaround on
  * systems with a broken CRAT table.
+ *
+ * Default is auto (according to asic type, iommu_v2, and crat table, to decide
+ * whehter use CRAT)
  */
 int ignore_crat;
 module_param(ignore_crat, int, 0444);
 MODULE_PARM_DESC(ignore_crat,
-	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
+	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
 
 /**
  * DOC: halt_if_hws_hang (int)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 59557e3e206a..f8346d4402e2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -22,6 +22,7 @@
 
 #include <linux/pci.h>
 #include <linux/acpi.h>
+#include <asm/processor.h>
 #include "kfd_crat.h"
 #include "kfd_priv.h"
 #include "kfd_topology.h"
@@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
 	return 0;
 }
 
+
+#ifdef CONFIG_ACPI
+static void kfd_setup_ignore_crat_option(void)
+{
+
+	if (ignore_crat)
+		return;
+
+#ifndef KFD_SUPPORT_IOMMU_V2
+	ignore_crat = 1;
+#else
+	ignore_crat = 0;
+#endif
+
+	/* Renoir use the fallback path to align with existed thunk */
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+	    boot_cpu_data.x86 == 0x17 &&
+	    boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
+		ignore_crat = 1;
+	}
+
+	return;
+}
+
 /*
  * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
  * copies CRAT from ACPI (if available).
@@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
  *
  *	Return 0 if successful else return error code
  */
-#ifdef CONFIG_ACPI
 int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
 {
 	struct acpi_table_header *crat_table;
@@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
 		return -EINVAL;
 	}
 
+	kfd_setup_ignore_crat_option();
+
 	if (ignore_crat) {
 		pr_info("CRAT table disabled by module option\n");
 		return -ENODATA;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 2c030c2b5b8d..dab44951c4d8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -112,6 +112,7 @@ static const struct kfd_device_info carrizo_device_info = {
 	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
+#endif
 
 static const struct kfd_device_info raven_device_info = {
 	.asic_family = CHIP_RAVEN,
@@ -130,7 +131,6 @@ static const struct kfd_device_info raven_device_info = {
 	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
-#endif
 
 static const struct kfd_device_info hawaii_device_info = {
 	.asic_family = CHIP_HAWAII,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 82f955750e75..4b6e7ef7a71c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1234,7 +1234,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
 
 static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
 {
-	return dev && dev->device_info->needs_iommu_device;
+	return !ignore_crat && dev && dev->device_info->needs_iommu_device;
 }
 
 /* Debugfs */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 4b29815e9205..b92ce75a4c53 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
 						    COMPUTE_UNIT_CPU, NULL,
 						    proximity_domain);
 		cpu_only_node = 1;
+		ignore_crat = 1;
 		if (ret) {
 			pr_err("Error creating VCRAT table for CPU\n");
 			return ret;
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)
  2020-08-19 11:06 ` [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3) Huang Rui
@ 2020-08-19 15:38   ` Felix Kuehling
  2020-08-19 23:56     ` Huang Rui
  0 siblings, 1 reply; 11+ messages in thread
From: Felix Kuehling @ 2020-08-19 15:38 UTC (permalink / raw)
  To: Huang Rui, amd-gfx

Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
> We still have a few iommu issues which need to address, so force raven
> as "dgpu" path for the moment.
>
> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
> or ACPI CRAT table not correct.
>
> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
> v3: Align with existed thunk, don't change the way of raven, only renoir
>     will use "dgpu" path by default.
>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
>  5 files changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index a9a4319c24ae..189f9d7e190d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
>   * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
>   * table to get information about AMD APUs. This option can serve as a workaround on
>   * systems with a broken CRAT table.
> + *
> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
> + * whehter use CRAT)
>   */
>  int ignore_crat;
>  module_param(ignore_crat, int, 0444);
>  MODULE_PARM_DESC(ignore_crat,
> -	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
> +	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
>  
>  /**
>   * DOC: halt_if_hws_hang (int)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index 59557e3e206a..f8346d4402e2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -22,6 +22,7 @@
>  
>  #include <linux/pci.h>
>  #include <linux/acpi.h>
> +#include <asm/processor.h>
>  #include "kfd_crat.h"
>  #include "kfd_priv.h"
>  #include "kfd_topology.h"
> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>  	return 0;
>  }
>  
> +
> +#ifdef CONFIG_ACPI
> +static void kfd_setup_ignore_crat_option(void)
> +{
> +
> +	if (ignore_crat)
> +		return;
> +
> +#ifndef KFD_SUPPORT_IOMMU_V2
> +	ignore_crat = 1;
> +#else
> +	ignore_crat = 0;
> +#endif
> +
> +	/* Renoir use the fallback path to align with existed thunk */

Are you sure you need special code for Renoir here? For Renoir the
dev->device_info already treats it as a dGPU and always has.

I don't like the whole idea of changing the value of a module parameter,
because it is global and visible to the user through sysfs. Instead, if
you need to override the value of ignore_crat to consider other
conditions, I think kfd_device_use_iommu_v2 and
kfd_create_crat_image_acpi would be the right place to do it.

To avoid duplicating the conditions, you could add a helper function
bool kfd_ignore_crat(void) that can be called instead of using the
ignore_crat parameter directly. This function, changing the global
module parameter, should be removed.


> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> +	    boot_cpu_data.x86 == 0x17 &&
> +	    boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
> +		ignore_crat = 1;
> +	}
> +
> +	return;
> +}
> +
>  /*
>   * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
>   * copies CRAT from ACPI (if available).
> @@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>   *
>   *	Return 0 if successful else return error code
>   */
> -#ifdef CONFIG_ACPI
>  int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>  {
>  	struct acpi_table_header *crat_table;
> @@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>  		return -EINVAL;
>  	}
>  
> +	kfd_setup_ignore_crat_option();
> +
>  	if (ignore_crat) {
>  		pr_info("CRAT table disabled by module option\n");
>  		return -ENODATA;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 2c030c2b5b8d..dab44951c4d8 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -112,6 +112,7 @@ static const struct kfd_device_info carrizo_device_info = {
>  	.num_xgmi_sdma_engines = 0,
>  	.num_sdma_queues_per_engine = 2,
>  };
> +#endif
>  
>  static const struct kfd_device_info raven_device_info = {
>  	.asic_family = CHIP_RAVEN,
> @@ -130,7 +131,6 @@ static const struct kfd_device_info raven_device_info = {
>  	.num_xgmi_sdma_engines = 0,
>  	.num_sdma_queues_per_engine = 2,
>  };
> -#endif
>  
>  static const struct kfd_device_info hawaii_device_info = {
>  	.asic_family = CHIP_HAWAII,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 82f955750e75..4b6e7ef7a71c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1234,7 +1234,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
>  
>  static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
>  {
> -	return dev && dev->device_info->needs_iommu_device;
> +	return !ignore_crat && dev && dev->device_info->needs_iommu_device;
>  }
>  
>  /* Debugfs */
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 4b29815e9205..b92ce75a4c53 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
>  						    COMPUTE_UNIT_CPU, NULL,
>  						    proximity_domain);
>  		cpu_only_node = 1;
> +		ignore_crat = 1;

Don't change the global variable. I think you're doing this here in case
the CRAT table is broken and contains no GPU info. Maybe we need to add
a new flag "use_iommu_v2" into the kfd_dev structure to handle this.

Regards,
  Felix


>  		if (ret) {
>  			pr_err("Error creating VCRAT table for CPU\n");
>  			return ret;
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)
  2020-08-19 15:38   ` Felix Kuehling
@ 2020-08-19 23:56     ` Huang Rui
  2020-08-20  0:18       ` Felix Kuehling
  0 siblings, 1 reply; 11+ messages in thread
From: Huang Rui @ 2020-08-19 23:56 UTC (permalink / raw)
  To: Kuehling, Felix; +Cc: amd-gfx@lists.freedesktop.org

On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
> > We still have a few iommu issues which need to address, so force raven
> > as "dgpu" path for the moment.
> >
> > This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
> > or ACPI CRAT table not correct.
> >
> > v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
> > v3: Align with existed thunk, don't change the way of raven, only renoir
> >     will use "dgpu" path by default.
> >
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
> >  drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
> >  drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
> >  drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
> >  drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
> >  5 files changed, 34 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index a9a4319c24ae..189f9d7e190d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
> >   * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
> >   * table to get information about AMD APUs. This option can serve as a workaround on
> >   * systems with a broken CRAT table.
> > + *
> > + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
> > + * whehter use CRAT)
> >   */
> >  int ignore_crat;
> >  module_param(ignore_crat, int, 0444);
> >  MODULE_PARM_DESC(ignore_crat,
> > -	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
> > +	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
> >  
> >  /**
> >   * DOC: halt_if_hws_hang (int)
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > index 59557e3e206a..f8346d4402e2 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > @@ -22,6 +22,7 @@
> >  
> >  #include <linux/pci.h>
> >  #include <linux/acpi.h>
> > +#include <asm/processor.h>
> >  #include "kfd_crat.h"
> >  #include "kfd_priv.h"
> >  #include "kfd_topology.h"
> > @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> >  	return 0;
> >  }
> >  
> > +
> > +#ifdef CONFIG_ACPI
> > +static void kfd_setup_ignore_crat_option(void)
> > +{
> > +
> > +	if (ignore_crat)
> > +		return;
> > +
> > +#ifndef KFD_SUPPORT_IOMMU_V2
> > +	ignore_crat = 1;
> > +#else
> > +	ignore_crat = 0;
> > +#endif
> > +
> > +	/* Renoir use the fallback path to align with existed thunk */
> 
> Are you sure you need special code for Renoir here? For Renoir the
> dev->device_info already treats it as a dGPU and always has.

Renoir also is an APU, in other words, we might have got the correct CRAT
table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If
we had got CRAT table, the kfd would create an APU node. That's not
expected.

> 
> I don't like the whole idea of changing the value of a module parameter,
> because it is global and visible to the user through sysfs. Instead, if
> you need to override the value of ignore_crat to consider other
> conditions, I think kfd_device_use_iommu_v2 and
> kfd_create_crat_image_acpi would be the right place to do it.
> 
> To avoid duplicating the conditions, you could add a helper function
> bool kfd_ignore_crat(void) that can be called instead of using the
> ignore_crat parameter directly. This function, changing the global
> module parameter, should be removed.

That makes sense. Will update it in next version.

> 
> 
> > +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> > +	    boot_cpu_data.x86 == 0x17 &&
> > +	    boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
> > +		ignore_crat = 1;
> > +	}
> > +
> > +	return;
> > +}
> > +
> >  /*
> >   * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
> >   * copies CRAT from ACPI (if available).
> > @@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> >   *
> >   *	Return 0 if successful else return error code
> >   */
> > -#ifdef CONFIG_ACPI
> >  int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
> >  {
> >  	struct acpi_table_header *crat_table;
> > @@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
> >  		return -EINVAL;
> >  	}
> >  
> > +	kfd_setup_ignore_crat_option();
> > +
> >  	if (ignore_crat) {
> >  		pr_info("CRAT table disabled by module option\n");
> >  		return -ENODATA;
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > index 2c030c2b5b8d..dab44951c4d8 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > @@ -112,6 +112,7 @@ static const struct kfd_device_info carrizo_device_info = {
> >  	.num_xgmi_sdma_engines = 0,
> >  	.num_sdma_queues_per_engine = 2,
> >  };
> > +#endif
> >  
> >  static const struct kfd_device_info raven_device_info = {
> >  	.asic_family = CHIP_RAVEN,
> > @@ -130,7 +131,6 @@ static const struct kfd_device_info raven_device_info = {
> >  	.num_xgmi_sdma_engines = 0,
> >  	.num_sdma_queues_per_engine = 2,
> >  };
> > -#endif
> >  
> >  static const struct kfd_device_info hawaii_device_info = {
> >  	.asic_family = CHIP_HAWAII,
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > index 82f955750e75..4b6e7ef7a71c 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > @@ -1234,7 +1234,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
> >  
> >  static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
> >  {
> > -	return dev && dev->device_info->needs_iommu_device;
> > +	return !ignore_crat && dev && dev->device_info->needs_iommu_device;
> >  }
> >  
> >  /* Debugfs */
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > index 4b29815e9205..b92ce75a4c53 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > @@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
> >  						    COMPUTE_UNIT_CPU, NULL,
> >  						    proximity_domain);
> >  		cpu_only_node = 1;
> > +		ignore_crat = 1;
> 
> Don't change the global variable. I think you're doing this here in case
> the CRAT table is broken and contains no GPU info. Maybe we need to add
> a new flag "use_iommu_v2" into the kfd_dev structure to handle this.
> 

Yes, you're right. Will remove global ignore_crat update. Let me revise it
again.

Thanks,
Ray

> Regards,
>   Felix
> 
> 
> >  		if (ret) {
> >  			pr_err("Error creating VCRAT table for CPU\n");
> >  			return ret;
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)
  2020-08-19 23:56     ` Huang Rui
@ 2020-08-20  0:18       ` Felix Kuehling
  2020-08-20  0:31         ` Huang Rui
  2020-08-20  3:09         ` Huang Rui
  0 siblings, 2 replies; 11+ messages in thread
From: Felix Kuehling @ 2020-08-20  0:18 UTC (permalink / raw)
  To: Huang Rui; +Cc: amd-gfx@lists.freedesktop.org

On 2020-08-19 7:56 p.m., Huang Rui wrote:
> On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
>> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
>>> We still have a few iommu issues which need to address, so force raven
>>> as "dgpu" path for the moment.
>>>
>>> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
>>> or ACPI CRAT table not correct.
>>>
>>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
>>> v3: Align with existed thunk, don't change the way of raven, only renoir
>>>      will use "dgpu" path by default.
>>>
>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
>>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
>>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
>>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
>>>   5 files changed, 34 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> index a9a4319c24ae..189f9d7e190d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
>>>    * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
>>>    * table to get information about AMD APUs. This option can serve as a workaround on
>>>    * systems with a broken CRAT table.
>>> + *
>>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
>>> + * whehter use CRAT)
>>>    */
>>>   int ignore_crat;
>>>   module_param(ignore_crat, int, 0444);
>>>   MODULE_PARM_DESC(ignore_crat,
>>> -	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
>>> +	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
>>>   
>>>   /**
>>>    * DOC: halt_if_hws_hang (int)
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>> index 59557e3e206a..f8346d4402e2 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>> @@ -22,6 +22,7 @@
>>>   
>>>   #include <linux/pci.h>
>>>   #include <linux/acpi.h>
>>> +#include <asm/processor.h>
>>>   #include "kfd_crat.h"
>>>   #include "kfd_priv.h"
>>>   #include "kfd_topology.h"
>>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>   	return 0;
>>>   }
>>>   
>>> +
>>> +#ifdef CONFIG_ACPI
>>> +static void kfd_setup_ignore_crat_option(void)
>>> +{
>>> +
>>> +	if (ignore_crat)
>>> +		return;
>>> +
>>> +#ifndef KFD_SUPPORT_IOMMU_V2
>>> +	ignore_crat = 1;
>>> +#else
>>> +	ignore_crat = 0;
>>> +#endif
>>> +
>>> +	/* Renoir use the fallback path to align with existed thunk */
>> Are you sure you need special code for Renoir here? For Renoir the
>> dev->device_info already treats it as a dGPU and always has.
> Renoir also is an APU, in other words, we might have got the correct CRAT
> table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If
> we had got CRAT table, the kfd would create an APU node. That's not
> expected.

kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT 
table because gpu->device_info->needs_iommu_device is False for Renoir. 
So Renoir will always show up in the topology as its own discrete GPU node.

How does this work today? Renoir is already treated as a dGPU. But the 
CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the 
CRAT table still shows GPU cores?

Regards,
   Felix


>
>> I don't like the whole idea of changing the value of a module parameter,
>> because it is global and visible to the user through sysfs. Instead, if
>> you need to override the value of ignore_crat to consider other
>> conditions, I think kfd_device_use_iommu_v2 and
>> kfd_create_crat_image_acpi would be the right place to do it.
>>
>> To avoid duplicating the conditions, you could add a helper function
>> bool kfd_ignore_crat(void) that can be called instead of using the
>> ignore_crat parameter directly. This function, changing the global
>> module parameter, should be removed.
> That makes sense. Will update it in next version.
>
>>
>>> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
>>> +	    boot_cpu_data.x86 == 0x17 &&
>>> +	    boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
>>> +		ignore_crat = 1;
>>> +	}
>>> +
>>> +	return;
>>> +}
>>> +
>>>   /*
>>>    * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
>>>    * copies CRAT from ACPI (if available).
>>> @@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>    *
>>>    *	Return 0 if successful else return error code
>>>    */
>>> -#ifdef CONFIG_ACPI
>>>   int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>>>   {
>>>   	struct acpi_table_header *crat_table;
>>> @@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>>>   		return -EINVAL;
>>>   	}
>>>   
>>> +	kfd_setup_ignore_crat_option();
>>> +
>>>   	if (ignore_crat) {
>>>   		pr_info("CRAT table disabled by module option\n");
>>>   		return -ENODATA;
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> index 2c030c2b5b8d..dab44951c4d8 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> @@ -112,6 +112,7 @@ static const struct kfd_device_info carrizo_device_info = {
>>>   	.num_xgmi_sdma_engines = 0,
>>>   	.num_sdma_queues_per_engine = 2,
>>>   };
>>> +#endif
>>>   
>>>   static const struct kfd_device_info raven_device_info = {
>>>   	.asic_family = CHIP_RAVEN,
>>> @@ -130,7 +131,6 @@ static const struct kfd_device_info raven_device_info = {
>>>   	.num_xgmi_sdma_engines = 0,
>>>   	.num_sdma_queues_per_engine = 2,
>>>   };
>>> -#endif
>>>   
>>>   static const struct kfd_device_info hawaii_device_info = {
>>>   	.asic_family = CHIP_HAWAII,
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> index 82f955750e75..4b6e7ef7a71c 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> @@ -1234,7 +1234,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
>>>   
>>>   static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
>>>   {
>>> -	return dev && dev->device_info->needs_iommu_device;
>>> +	return !ignore_crat && dev && dev->device_info->needs_iommu_device;
>>>   }
>>>   
>>>   /* Debugfs */
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> index 4b29815e9205..b92ce75a4c53 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> @@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
>>>   						    COMPUTE_UNIT_CPU, NULL,
>>>   						    proximity_domain);
>>>   		cpu_only_node = 1;
>>> +		ignore_crat = 1;
>> Don't change the global variable. I think you're doing this here in case
>> the CRAT table is broken and contains no GPU info. Maybe we need to add
>> a new flag "use_iommu_v2" into the kfd_dev structure to handle this.
>>
> Yes, you're right. Will remove global ignore_crat update. Let me revise it
> again.
>
> Thanks,
> Ray
>
>> Regards,
>>    Felix
>>
>>
>>>   		if (ret) {
>>>   			pr_err("Error creating VCRAT table for CPU\n");
>>>   			return ret;
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)
  2020-08-20  0:18       ` Felix Kuehling
@ 2020-08-20  0:31         ` Huang Rui
  2020-08-20  9:38           ` Huang Rui
  2020-08-20  3:09         ` Huang Rui
  1 sibling, 1 reply; 11+ messages in thread
From: Huang Rui @ 2020-08-20  0:31 UTC (permalink / raw)
  To: Kuehling, Felix; +Cc: amd-gfx@lists.freedesktop.org

On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote:
> On 2020-08-19 7:56 p.m., Huang Rui wrote:
> > On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
> >> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
> >>> We still have a few iommu issues which need to address, so force raven
> >>> as "dgpu" path for the moment.
> >>>
> >>> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
> >>> or ACPI CRAT table not correct.
> >>>
> >>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
> >>> v3: Align with existed thunk, don't change the way of raven, only renoir
> >>>      will use "dgpu" path by default.
> >>>
> >>> Signed-off-by: Huang Rui <ray.huang@amd.com>
> >>> ---
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
> >>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
> >>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
> >>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
> >>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
> >>>   5 files changed, 34 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> index a9a4319c24ae..189f9d7e190d 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
> >>>    * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
> >>>    * table to get information about AMD APUs. This option can serve as a workaround on
> >>>    * systems with a broken CRAT table.
> >>> + *
> >>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
> >>> + * whehter use CRAT)
> >>>    */
> >>>   int ignore_crat;
> >>>   module_param(ignore_crat, int, 0444);
> >>>   MODULE_PARM_DESC(ignore_crat,
> >>> -	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
> >>> +	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
> >>>   
> >>>   /**
> >>>    * DOC: halt_if_hws_hang (int)
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>> index 59557e3e206a..f8346d4402e2 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>> @@ -22,6 +22,7 @@
> >>>   
> >>>   #include <linux/pci.h>
> >>>   #include <linux/acpi.h>
> >>> +#include <asm/processor.h>
> >>>   #include "kfd_crat.h"
> >>>   #include "kfd_priv.h"
> >>>   #include "kfd_topology.h"
> >>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> >>>   	return 0;
> >>>   }
> >>>   
> >>> +
> >>> +#ifdef CONFIG_ACPI
> >>> +static void kfd_setup_ignore_crat_option(void)
> >>> +{
> >>> +
> >>> +	if (ignore_crat)
> >>> +		return;
> >>> +
> >>> +#ifndef KFD_SUPPORT_IOMMU_V2
> >>> +	ignore_crat = 1;
> >>> +#else
> >>> +	ignore_crat = 0;
> >>> +#endif
> >>> +
> >>> +	/* Renoir use the fallback path to align with existed thunk */
> >> Are you sure you need special code for Renoir here? For Renoir the
> >> dev->device_info already treats it as a dGPU and always has.
> > Renoir also is an APU, in other words, we might have got the correct CRAT
> > table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If
> > we had got CRAT table, the kfd would create an APU node. That's not
> > expected.
> 
> kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT 
> table because gpu->device_info->needs_iommu_device is False for Renoir. 
> So Renoir will always show up in the topology as its own discrete GPU node.
> 
> How does this work today? Renoir is already treated as a dGPU. But the 
> CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the 
> CRAT table still shows GPU cores?
> 

Yes, Renoir works well. In fact, I found the problem while I was enabling
the dGPU path for raven before. Even I set needs_iommu_device as false in
raven's device info. The kfd still creates the APU node. (in v1 patch)

Let me rollback to check it again.

Thanks,
Ray

> Regards,
>    Felix
> 
> 
> >
> >> I don't like the whole idea of changing the value of a module parameter,
> >> because it is global and visible to the user through sysfs. Instead, if
> >> you need to override the value of ignore_crat to consider other
> >> conditions, I think kfd_device_use_iommu_v2 and
> >> kfd_create_crat_image_acpi would be the right place to do it.
> >>
> >> To avoid duplicating the conditions, you could add a helper function
> >> bool kfd_ignore_crat(void) that can be called instead of using the
> >> ignore_crat parameter directly. This function, changing the global
> >> module parameter, should be removed.
> > That makes sense. Will update it in next version.
> >
> >>
> >>> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> >>> +	    boot_cpu_data.x86 == 0x17 &&
> >>> +	    boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
> >>> +		ignore_crat = 1;
> >>> +	}
> >>> +
> >>> +	return;
> >>> +}
> >>> +
> >>>   /*
> >>>    * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
> >>>    * copies CRAT from ACPI (if available).
> >>> @@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> >>>    *
> >>>    *	Return 0 if successful else return error code
> >>>    */
> >>> -#ifdef CONFIG_ACPI
> >>>   int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
> >>>   {
> >>>   	struct acpi_table_header *crat_table;
> >>> @@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
> >>>   		return -EINVAL;
> >>>   	}
> >>>   
> >>> +	kfd_setup_ignore_crat_option();
> >>> +
> >>>   	if (ignore_crat) {
> >>>   		pr_info("CRAT table disabled by module option\n");
> >>>   		return -ENODATA;
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>> index 2c030c2b5b8d..dab44951c4d8 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>> @@ -112,6 +112,7 @@ static const struct kfd_device_info carrizo_device_info = {
> >>>   	.num_xgmi_sdma_engines = 0,
> >>>   	.num_sdma_queues_per_engine = 2,
> >>>   };
> >>> +#endif
> >>>   
> >>>   static const struct kfd_device_info raven_device_info = {
> >>>   	.asic_family = CHIP_RAVEN,
> >>> @@ -130,7 +131,6 @@ static const struct kfd_device_info raven_device_info = {
> >>>   	.num_xgmi_sdma_engines = 0,
> >>>   	.num_sdma_queues_per_engine = 2,
> >>>   };
> >>> -#endif
> >>>   
> >>>   static const struct kfd_device_info hawaii_device_info = {
> >>>   	.asic_family = CHIP_HAWAII,
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>> index 82f955750e75..4b6e7ef7a71c 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>> @@ -1234,7 +1234,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
> >>>   
> >>>   static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
> >>>   {
> >>> -	return dev && dev->device_info->needs_iommu_device;
> >>> +	return !ignore_crat && dev && dev->device_info->needs_iommu_device;
> >>>   }
> >>>   
> >>>   /* Debugfs */
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>> index 4b29815e9205..b92ce75a4c53 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>> @@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
> >>>   						    COMPUTE_UNIT_CPU, NULL,
> >>>   						    proximity_domain);
> >>>   		cpu_only_node = 1;
> >>> +		ignore_crat = 1;
> >> Don't change the global variable. I think you're doing this here in case
> >> the CRAT table is broken and contains no GPU info. Maybe we need to add
> >> a new flag "use_iommu_v2" into the kfd_dev structure to handle this.
> >>
> > Yes, you're right. Will remove global ignore_crat update. Let me revise it
> > again.
> >
> > Thanks,
> > Ray
> >
> >> Regards,
> >>    Felix
> >>
> >>
> >>>   		if (ret) {
> >>>   			pr_err("Error creating VCRAT table for CPU\n");
> >>>   			return ret;
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)
  2020-08-20  0:18       ` Felix Kuehling
  2020-08-20  0:31         ` Huang Rui
@ 2020-08-20  3:09         ` Huang Rui
  2020-08-20  4:05           ` Felix Kuehling
  1 sibling, 1 reply; 11+ messages in thread
From: Huang Rui @ 2020-08-20  3:09 UTC (permalink / raw)
  To: Kuehling, Felix; +Cc: amd-gfx@lists.freedesktop.org

On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote:
> On 2020-08-19 7:56 p.m., Huang Rui wrote:
> > On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
> >> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
> >>> We still have a few iommu issues which need to address, so force raven
> >>> as "dgpu" path for the moment.
> >>>
> >>> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
> >>> or ACPI CRAT table not correct.
> >>>
> >>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
> >>> v3: Align with existed thunk, don't change the way of raven, only renoir
> >>>      will use "dgpu" path by default.
> >>>
> >>> Signed-off-by: Huang Rui <ray.huang@amd.com>
> >>> ---
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
> >>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
> >>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
> >>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
> >>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
> >>>   5 files changed, 34 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> index a9a4319c24ae..189f9d7e190d 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
> >>>    * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
> >>>    * table to get information about AMD APUs. This option can serve as a workaround on
> >>>    * systems with a broken CRAT table.
> >>> + *
> >>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
> >>> + * whehter use CRAT)
> >>>    */
> >>>   int ignore_crat;
> >>>   module_param(ignore_crat, int, 0444);
> >>>   MODULE_PARM_DESC(ignore_crat,
> >>> -	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
> >>> +	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
> >>>   
> >>>   /**
> >>>    * DOC: halt_if_hws_hang (int)
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>> index 59557e3e206a..f8346d4402e2 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>> @@ -22,6 +22,7 @@
> >>>   
> >>>   #include <linux/pci.h>
> >>>   #include <linux/acpi.h>
> >>> +#include <asm/processor.h>
> >>>   #include "kfd_crat.h"
> >>>   #include "kfd_priv.h"
> >>>   #include "kfd_topology.h"
> >>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> >>>   	return 0;
> >>>   }
> >>>   
> >>> +
> >>> +#ifdef CONFIG_ACPI
> >>> +static void kfd_setup_ignore_crat_option(void)
> >>> +{
> >>> +
> >>> +	if (ignore_crat)
> >>> +		return;
> >>> +
> >>> +#ifndef KFD_SUPPORT_IOMMU_V2
> >>> +	ignore_crat = 1;
> >>> +#else
> >>> +	ignore_crat = 0;
> >>> +#endif
> >>> +
> >>> +	/* Renoir use the fallback path to align with existed thunk */
> >> Are you sure you need special code for Renoir here? For Renoir the
> >> dev->device_info already treats it as a dGPU and always has.
> > Renoir also is an APU, in other words, we might have got the correct CRAT
> > table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If
> > we had got CRAT table, the kfd would create an APU node. That's not
> > expected.
> 
> kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT 
> table because gpu->device_info->needs_iommu_device is False for Renoir. 
> So Renoir will always show up in the topology as its own discrete GPU node.
> 
> How does this work today? Renoir is already treated as a dGPU. But the 
> CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the 
> CRAT table still shows GPU cores?
> 
> Regards,
>    Felix
> 
> 
> >
> >> I don't like the whole idea of changing the value of a module parameter,
> >> because it is global and visible to the user through sysfs. Instead, if
> >> you need to override the value of ignore_crat to consider other
> >> conditions, I think kfd_device_use_iommu_v2 and
> >> kfd_create_crat_image_acpi would be the right place to do it.
> >>
> >> To avoid duplicating the conditions, you could add a helper function
> >> bool kfd_ignore_crat(void) that can be called instead of using the
> >> ignore_crat parameter directly. This function, changing the global
> >> module parameter, should be removed.
> > That makes sense. Will update it in next version.
> >
> >>
> >>> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> >>> +	    boot_cpu_data.x86 == 0x17 &&
> >>> +	    boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
> >>> +		ignore_crat = 1;
> >>> +	}
> >>> +
> >>> +	return;
> >>> +}
> >>> +
> >>>   /*
> >>>    * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
> >>>    * copies CRAT from ACPI (if available).
> >>> @@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> >>>    *
> >>>    *	Return 0 if successful else return error code
> >>>    */
> >>> -#ifdef CONFIG_ACPI
> >>>   int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
> >>>   {
> >>>   	struct acpi_table_header *crat_table;
> >>> @@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
> >>>   		return -EINVAL;
> >>>   	}
> >>>   
> >>> +	kfd_setup_ignore_crat_option();
> >>> +
> >>>   	if (ignore_crat) {
> >>>   		pr_info("CRAT table disabled by module option\n");
> >>>   		return -ENODATA;
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>> index 2c030c2b5b8d..dab44951c4d8 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>> @@ -112,6 +112,7 @@ static const struct kfd_device_info carrizo_device_info = {
> >>>   	.num_xgmi_sdma_engines = 0,
> >>>   	.num_sdma_queues_per_engine = 2,
> >>>   };
> >>> +#endif
> >>>   
> >>>   static const struct kfd_device_info raven_device_info = {
> >>>   	.asic_family = CHIP_RAVEN,
> >>> @@ -130,7 +131,6 @@ static const struct kfd_device_info raven_device_info = {
> >>>   	.num_xgmi_sdma_engines = 0,
> >>>   	.num_sdma_queues_per_engine = 2,
> >>>   };
> >>> -#endif
> >>>   
> >>>   static const struct kfd_device_info hawaii_device_info = {
> >>>   	.asic_family = CHIP_HAWAII,
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>> index 82f955750e75..4b6e7ef7a71c 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>> @@ -1234,7 +1234,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
> >>>   
> >>>   static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
> >>>   {
> >>> -	return dev && dev->device_info->needs_iommu_device;
> >>> +	return !ignore_crat && dev && dev->device_info->needs_iommu_device;
> >>>   }
> >>>   
> >>>   /* Debugfs */
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>> index 4b29815e9205..b92ce75a4c53 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>> @@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
> >>>   						    COMPUTE_UNIT_CPU, NULL,
> >>>   						    proximity_domain);
> >>>   		cpu_only_node = 1;
> >>> +		ignore_crat = 1;
> >> Don't change the global variable. I think you're doing this here in case
> >> the CRAT table is broken and contains no GPU info. Maybe we need to add
> >> a new flag "use_iommu_v2" into the kfd_dev structure to handle this.
> >>

Find it just now, kfd_dev is not initialized here. So we may be unable to
use flag in kfd_dev.

Thanks,
Ray
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)
  2020-08-20  3:09         ` Huang Rui
@ 2020-08-20  4:05           ` Felix Kuehling
  2020-08-20  4:41             ` Huang Rui
  0 siblings, 1 reply; 11+ messages in thread
From: Felix Kuehling @ 2020-08-20  4:05 UTC (permalink / raw)
  To: Huang Rui; +Cc: amd-gfx@lists.freedesktop.org


Am 2020-08-19 um 11:09 p.m. schrieb Huang Rui:
> On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote:
>> On 2020-08-19 7:56 p.m., Huang Rui wrote:
>>> On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
>>>> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
>>>>> We still have a few iommu issues which need to address, so force raven
>>>>> as "dgpu" path for the moment.
>>>>>
>>>>> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
>>>>> or ACPI CRAT table not correct.
>>>>>
>>>>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
>>>>> v3: Align with existed thunk, don't change the way of raven, only renoir
>>>>>      will use "dgpu" path by default.
>>>>>
>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
>>>>>   5 files changed, 34 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> index a9a4319c24ae..189f9d7e190d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
>>>>>    * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
>>>>>    * table to get information about AMD APUs. This option can serve as a workaround on
>>>>>    * systems with a broken CRAT table.
>>>>> + *
>>>>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
>>>>> + * whehter use CRAT)
>>>>>    */
>>>>>   int ignore_crat;
>>>>>   module_param(ignore_crat, int, 0444);
>>>>>   MODULE_PARM_DESC(ignore_crat,
>>>>> -	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
>>>>> +	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
>>>>>   
>>>>>   /**
>>>>>    * DOC: halt_if_hws_hang (int)
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>> index 59557e3e206a..f8346d4402e2 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>> @@ -22,6 +22,7 @@
>>>>>   
>>>>>   #include <linux/pci.h>
>>>>>   #include <linux/acpi.h>
>>>>> +#include <asm/processor.h>
>>>>>   #include "kfd_crat.h"
>>>>>   #include "kfd_priv.h"
>>>>>   #include "kfd_topology.h"
>>>>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>>>   	return 0;
>>>>>   }
>>>>>   
>>>>> +
>>>>> +#ifdef CONFIG_ACPI
>>>>> +static void kfd_setup_ignore_crat_option(void)
>>>>> +{
>>>>> +
>>>>> +	if (ignore_crat)
>>>>> +		return;
>>>>> +
>>>>> +#ifndef KFD_SUPPORT_IOMMU_V2
>>>>> +	ignore_crat = 1;
>>>>> +#else
>>>>> +	ignore_crat = 0;
>>>>> +#endif
>>>>> +
>>>>> +	/* Renoir use the fallback path to align with existed thunk */
>>>> Are you sure you need special code for Renoir here? For Renoir the
>>>> dev->device_info already treats it as a dGPU and always has.
>>> Renoir also is an APU, in other words, we might have got the correct CRAT
>>> table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If
>>> we had got CRAT table, the kfd would create an APU node. That's not
>>> expected.
>> kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT 
>> table because gpu->device_info->needs_iommu_device is False for Renoir. 
>> So Renoir will always show up in the topology as its own discrete GPU node.
>>
>> How does this work today? Renoir is already treated as a dGPU. But the 
>> CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the 
>> CRAT table still shows GPU cores?
>>
>> Regards,
>>    Felix
>>
>>
>>>> I don't like the whole idea of changing the value of a module parameter,
>>>> because it is global and visible to the user through sysfs. Instead, if
>>>> you need to override the value of ignore_crat to consider other
>>>> conditions, I think kfd_device_use_iommu_v2 and
>>>> kfd_create_crat_image_acpi would be the right place to do it.
>>>>
>>>> To avoid duplicating the conditions, you could add a helper function
>>>> bool kfd_ignore_crat(void) that can be called instead of using the
>>>> ignore_crat parameter directly. This function, changing the global
>>>> module parameter, should be removed.
>>> That makes sense. Will update it in next version.
>>>
>>>>> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
>>>>> +	    boot_cpu_data.x86 == 0x17 &&
>>>>> +	    boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
>>>>> +		ignore_crat = 1;
>>>>> +	}
>>>>> +
>>>>> +	return;
>>>>> +}
>>>>> +
>>>>>   /*
>>>>>    * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
>>>>>    * copies CRAT from ACPI (if available).
>>>>> @@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>>>    *
>>>>>    *	Return 0 if successful else return error code
>>>>>    */
>>>>> -#ifdef CONFIG_ACPI
>>>>>   int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>>>>>   {
>>>>>   	struct acpi_table_header *crat_table;
>>>>> @@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>>>>>   		return -EINVAL;
>>>>>   	}
>>>>>   
>>>>> +	kfd_setup_ignore_crat_option();
>>>>> +
>>>>>   	if (ignore_crat) {
>>>>>   		pr_info("CRAT table disabled by module option\n");
>>>>>   		return -ENODATA;
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>>>> index 2c030c2b5b8d..dab44951c4d8 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>>>> @@ -112,6 +112,7 @@ static const struct kfd_device_info carrizo_device_info = {
>>>>>   	.num_xgmi_sdma_engines = 0,
>>>>>   	.num_sdma_queues_per_engine = 2,
>>>>>   };
>>>>> +#endif
>>>>>   
>>>>>   static const struct kfd_device_info raven_device_info = {
>>>>>   	.asic_family = CHIP_RAVEN,
>>>>> @@ -130,7 +131,6 @@ static const struct kfd_device_info raven_device_info = {
>>>>>   	.num_xgmi_sdma_engines = 0,
>>>>>   	.num_sdma_queues_per_engine = 2,
>>>>>   };
>>>>> -#endif
>>>>>   
>>>>>   static const struct kfd_device_info hawaii_device_info = {
>>>>>   	.asic_family = CHIP_HAWAII,
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>>> index 82f955750e75..4b6e7ef7a71c 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>>> @@ -1234,7 +1234,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
>>>>>   
>>>>>   static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
>>>>>   {
>>>>> -	return dev && dev->device_info->needs_iommu_device;
>>>>> +	return !ignore_crat && dev && dev->device_info->needs_iommu_device;
>>>>>   }
>>>>>   
>>>>>   /* Debugfs */
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> index 4b29815e9205..b92ce75a4c53 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> @@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
>>>>>   						    COMPUTE_UNIT_CPU, NULL,
>>>>>   						    proximity_domain);
>>>>>   		cpu_only_node = 1;
>>>>> +		ignore_crat = 1;
>>>> Don't change the global variable. I think you're doing this here in case
>>>> the CRAT table is broken and contains no GPU info. Maybe we need to add
>>>> a new flag "use_iommu_v2" into the kfd_dev structure to handle this.
>>>>
> Find it just now, kfd_dev is not initialized here. So we may be unable to
> use flag in kfd_dev.

I see. This is very early during module init. When you get here, you
already failed to read the ACPI CRAT table and created a VCRAT for the
CPU with no GPU cores.

If you wanted to add a per device "use_iommu_v2" flag, you could
probably set that in kfd_assign_gpu when it assigns a KFD device to a
node with CPU cores.

Regards,
  Felix


>
> Thanks,
> Ray
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)
  2020-08-20  4:05           ` Felix Kuehling
@ 2020-08-20  4:41             ` Huang Rui
  0 siblings, 0 replies; 11+ messages in thread
From: Huang Rui @ 2020-08-20  4:41 UTC (permalink / raw)
  To: Kuehling, Felix; +Cc: amd-gfx@lists.freedesktop.org

On Thu, Aug 20, 2020 at 12:05:56PM +0800, Kuehling, Felix wrote:
> 
> Am 2020-08-19 um 11:09 p.m. schrieb Huang Rui:
> > On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote:
> >> On 2020-08-19 7:56 p.m., Huang Rui wrote:
> >>> On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
> >>>> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
> >>>>> We still have a few iommu issues which need to address, so force raven
> >>>>> as "dgpu" path for the moment.
> >>>>>
> >>>>> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
> >>>>> or ACPI CRAT table not correct.
> >>>>>
> >>>>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
> >>>>> v3: Align with existed thunk, don't change the way of raven, only renoir
> >>>>>      will use "dgpu" path by default.
> >>>>>
> >>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
> >>>>> ---
> >>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
> >>>>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
> >>>>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
> >>>>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
> >>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
> >>>>>   5 files changed, 34 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>> index a9a4319c24ae..189f9d7e190d 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
> >>>>>    * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
> >>>>>    * table to get information about AMD APUs. This option can serve as a workaround on
> >>>>>    * systems with a broken CRAT table.
> >>>>> + *
> >>>>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
> >>>>> + * whehter use CRAT)
> >>>>>    */
> >>>>>   int ignore_crat;
> >>>>>   module_param(ignore_crat, int, 0444);
> >>>>>   MODULE_PARM_DESC(ignore_crat,
> >>>>> -	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
> >>>>> +	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
> >>>>>   
> >>>>>   /**
> >>>>>    * DOC: halt_if_hws_hang (int)
> >>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>>>> index 59557e3e206a..f8346d4402e2 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>>>> @@ -22,6 +22,7 @@
> >>>>>   
> >>>>>   #include <linux/pci.h>
> >>>>>   #include <linux/acpi.h>
> >>>>> +#include <asm/processor.h>
> >>>>>   #include "kfd_crat.h"
> >>>>>   #include "kfd_priv.h"
> >>>>>   #include "kfd_topology.h"
> >>>>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> >>>>>   	return 0;
> >>>>>   }
> >>>>>   
> >>>>> +
> >>>>> +#ifdef CONFIG_ACPI
> >>>>> +static void kfd_setup_ignore_crat_option(void)
> >>>>> +{
> >>>>> +
> >>>>> +	if (ignore_crat)
> >>>>> +		return;
> >>>>> +
> >>>>> +#ifndef KFD_SUPPORT_IOMMU_V2
> >>>>> +	ignore_crat = 1;
> >>>>> +#else
> >>>>> +	ignore_crat = 0;
> >>>>> +#endif
> >>>>> +
> >>>>> +	/* Renoir use the fallback path to align with existed thunk */
> >>>> Are you sure you need special code for Renoir here? For Renoir the
> >>>> dev->device_info already treats it as a dGPU and always has.
> >>> Renoir also is an APU, in other words, we might have got the correct CRAT
> >>> table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If
> >>> we had got CRAT table, the kfd would create an APU node. That's not
> >>> expected.
> >> kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT 
> >> table because gpu->device_info->needs_iommu_device is False for Renoir. 
> >> So Renoir will always show up in the topology as its own discrete GPU node.
> >>
> >> How does this work today? Renoir is already treated as a dGPU. But the 
> >> CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the 
> >> CRAT table still shows GPU cores?
> >>
> >> Regards,
> >>    Felix
> >>
> >>
> >>>> I don't like the whole idea of changing the value of a module parameter,
> >>>> because it is global and visible to the user through sysfs. Instead, if
> >>>> you need to override the value of ignore_crat to consider other
> >>>> conditions, I think kfd_device_use_iommu_v2 and
> >>>> kfd_create_crat_image_acpi would be the right place to do it.
> >>>>
> >>>> To avoid duplicating the conditions, you could add a helper function
> >>>> bool kfd_ignore_crat(void) that can be called instead of using the
> >>>> ignore_crat parameter directly. This function, changing the global
> >>>> module parameter, should be removed.
> >>> That makes sense. Will update it in next version.
> >>>
> >>>>> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> >>>>> +	    boot_cpu_data.x86 == 0x17 &&
> >>>>> +	    boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
> >>>>> +		ignore_crat = 1;
> >>>>> +	}
> >>>>> +
> >>>>> +	return;
> >>>>> +}
> >>>>> +
> >>>>>   /*
> >>>>>    * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
> >>>>>    * copies CRAT from ACPI (if available).
> >>>>> @@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> >>>>>    *
> >>>>>    *	Return 0 if successful else return error code
> >>>>>    */
> >>>>> -#ifdef CONFIG_ACPI
> >>>>>   int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
> >>>>>   {
> >>>>>   	struct acpi_table_header *crat_table;
> >>>>> @@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
> >>>>>   		return -EINVAL;
> >>>>>   	}
> >>>>>   
> >>>>> +	kfd_setup_ignore_crat_option();
> >>>>> +
> >>>>>   	if (ignore_crat) {
> >>>>>   		pr_info("CRAT table disabled by module option\n");
> >>>>>   		return -ENODATA;
> >>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>>>> index 2c030c2b5b8d..dab44951c4d8 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>>>> @@ -112,6 +112,7 @@ static const struct kfd_device_info carrizo_device_info = {
> >>>>>   	.num_xgmi_sdma_engines = 0,
> >>>>>   	.num_sdma_queues_per_engine = 2,
> >>>>>   };
> >>>>> +#endif
> >>>>>   
> >>>>>   static const struct kfd_device_info raven_device_info = {
> >>>>>   	.asic_family = CHIP_RAVEN,
> >>>>> @@ -130,7 +131,6 @@ static const struct kfd_device_info raven_device_info = {
> >>>>>   	.num_xgmi_sdma_engines = 0,
> >>>>>   	.num_sdma_queues_per_engine = 2,
> >>>>>   };
> >>>>> -#endif
> >>>>>   
> >>>>>   static const struct kfd_device_info hawaii_device_info = {
> >>>>>   	.asic_family = CHIP_HAWAII,
> >>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>>>> index 82f955750e75..4b6e7ef7a71c 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>>>> @@ -1234,7 +1234,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
> >>>>>   
> >>>>>   static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
> >>>>>   {
> >>>>> -	return dev && dev->device_info->needs_iommu_device;
> >>>>> +	return !ignore_crat && dev && dev->device_info->needs_iommu_device;
> >>>>>   }
> >>>>>   
> >>>>>   /* Debugfs */
> >>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>>>> index 4b29815e9205..b92ce75a4c53 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>>>> @@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
> >>>>>   						    COMPUTE_UNIT_CPU, NULL,
> >>>>>   						    proximity_domain);
> >>>>>   		cpu_only_node = 1;
> >>>>> +		ignore_crat = 1;
> >>>> Don't change the global variable. I think you're doing this here in case
> >>>> the CRAT table is broken and contains no GPU info. Maybe we need to add
> >>>> a new flag "use_iommu_v2" into the kfd_dev structure to handle this.
> >>>>
> > Find it just now, kfd_dev is not initialized here. So we may be unable to
> > use flag in kfd_dev.
> 
> I see. This is very early during module init. When you get here, you
> already failed to read the ACPI CRAT table and created a VCRAT for the
> CPU with no GPU cores.
> 
> If you wanted to add a per device "use_iommu_v2" flag, you could
> probably set that in kfd_assign_gpu when it assigns a KFD device to a
> node with CPU cores.
> 

Yes, exactly!

Thanks,
Ray

> Regards,
>   Felix
> 
> 
> >
> > Thanks,
> > Ray
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)
  2020-08-20  0:31         ` Huang Rui
@ 2020-08-20  9:38           ` Huang Rui
  2020-08-20 13:22             ` Felix Kuehling
  0 siblings, 1 reply; 11+ messages in thread
From: Huang Rui @ 2020-08-20  9:38 UTC (permalink / raw)
  To: Kuehling, Felix; +Cc: amd-gfx@lists.freedesktop.org

On Thu, Aug 20, 2020 at 08:31:25AM +0800, Huang Rui wrote:
> On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote:
> > On 2020-08-19 7:56 p.m., Huang Rui wrote:
> > > On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
> > >> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
> > >>> We still have a few iommu issues which need to address, so force raven
> > >>> as "dgpu" path for the moment.
> > >>>
> > >>> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
> > >>> or ACPI CRAT table not correct.
> > >>>
> > >>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
> > >>> v3: Align with existed thunk, don't change the way of raven, only renoir
> > >>>      will use "dgpu" path by default.
> > >>>
> > >>> Signed-off-by: Huang Rui <ray.huang@amd.com>
> > >>> ---
> > >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
> > >>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
> > >>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
> > >>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
> > >>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
> > >>>   5 files changed, 34 insertions(+), 4 deletions(-)
> > >>>
> > >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > >>> index a9a4319c24ae..189f9d7e190d 100644
> > >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > >>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
> > >>>    * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
> > >>>    * table to get information about AMD APUs. This option can serve as a workaround on
> > >>>    * systems with a broken CRAT table.
> > >>> + *
> > >>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
> > >>> + * whehter use CRAT)
> > >>>    */
> > >>>   int ignore_crat;
> > >>>   module_param(ignore_crat, int, 0444);
> > >>>   MODULE_PARM_DESC(ignore_crat,
> > >>> -	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
> > >>> +	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
> > >>>   
> > >>>   /**
> > >>>    * DOC: halt_if_hws_hang (int)
> > >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > >>> index 59557e3e206a..f8346d4402e2 100644
> > >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > >>> @@ -22,6 +22,7 @@
> > >>>   
> > >>>   #include <linux/pci.h>
> > >>>   #include <linux/acpi.h>
> > >>> +#include <asm/processor.h>
> > >>>   #include "kfd_crat.h"
> > >>>   #include "kfd_priv.h"
> > >>>   #include "kfd_topology.h"
> > >>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> > >>>   	return 0;
> > >>>   }
> > >>>   
> > >>> +
> > >>> +#ifdef CONFIG_ACPI
> > >>> +static void kfd_setup_ignore_crat_option(void)
> > >>> +{
> > >>> +
> > >>> +	if (ignore_crat)
> > >>> +		return;
> > >>> +
> > >>> +#ifndef KFD_SUPPORT_IOMMU_V2
> > >>> +	ignore_crat = 1;
> > >>> +#else
> > >>> +	ignore_crat = 0;
> > >>> +#endif
> > >>> +
> > >>> +	/* Renoir use the fallback path to align with existed thunk */
> > >> Are you sure you need special code for Renoir here? For Renoir the
> > >> dev->device_info already treats it as a dGPU and always has.
> > > Renoir also is an APU, in other words, we might have got the correct CRAT
> > > table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If
> > > we had got CRAT table, the kfd would create an APU node. That's not
> > > expected.
> > 
> > kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT 
> > table because gpu->device_info->needs_iommu_device is False for Renoir. 
> > So Renoir will always show up in the topology as its own discrete GPU node.
> > 
> > How does this work today? Renoir is already treated as a dGPU. But the 
> > CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the 
> > CRAT table still shows GPU cores?
> > 
> 
> Yes, Renoir works well. In fact, I found the problem while I was enabling
> the dGPU path for raven before. Even I set needs_iommu_device as false in
> raven's device info. The kfd still creates the APU node. (in v1 patch)
> 
> Let me rollback to check it again.
> 

Hi Felix,

I double check it again. If Renoir's ACPI CRAT were good, we wouldn't
create virtual CRAT table for Renoir at this moement. Then the pure CPU
node is unable to be created. That conflicted with needs_iommu_device flag
(we hardcode needs_iommu_device as false for renoir). Then will break the
user mode driver. (please see below rocminfo)

rocminfo: /libhsakmt/src/topology.c:1079: topology_sysfs_get_node_props: Assertion `props->EngineId.ui32.Major' failed.

So we probably would better have a specific handling to make sure Renoir
with virtual CRAT.

Thanks,
Ray
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)
  2020-08-20  9:38           ` Huang Rui
@ 2020-08-20 13:22             ` Felix Kuehling
  0 siblings, 0 replies; 11+ messages in thread
From: Felix Kuehling @ 2020-08-20 13:22 UTC (permalink / raw)
  To: Huang Rui; +Cc: amd-gfx@lists.freedesktop.org


Am 2020-08-20 um 5:38 a.m. schrieb Huang Rui:
> On Thu, Aug 20, 2020 at 08:31:25AM +0800, Huang Rui wrote:
>> On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote:
>>> On 2020-08-19 7:56 p.m., Huang Rui wrote:
>>>> On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
>>>>> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
>>>>>> We still have a few iommu issues which need to address, so force raven
>>>>>> as "dgpu" path for the moment.
>>>>>>
>>>>>> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
>>>>>> or ACPI CRAT table not correct.
>>>>>>
>>>>>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
>>>>>> v3: Align with existed thunk, don't change the way of raven, only renoir
>>>>>>      will use "dgpu" path by default.
>>>>>>
>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
>>>>>>   5 files changed, 34 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> index a9a4319c24ae..189f9d7e190d 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
>>>>>>    * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
>>>>>>    * table to get information about AMD APUs. This option can serve as a workaround on
>>>>>>    * systems with a broken CRAT table.
>>>>>> + *
>>>>>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
>>>>>> + * whehter use CRAT)
>>>>>>    */
>>>>>>   int ignore_crat;
>>>>>>   module_param(ignore_crat, int, 0444);
>>>>>>   MODULE_PARM_DESC(ignore_crat,
>>>>>> -	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
>>>>>> +	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
>>>>>>   
>>>>>>   /**
>>>>>>    * DOC: halt_if_hws_hang (int)
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>>> index 59557e3e206a..f8346d4402e2 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>>> @@ -22,6 +22,7 @@
>>>>>>   
>>>>>>   #include <linux/pci.h>
>>>>>>   #include <linux/acpi.h>
>>>>>> +#include <asm/processor.h>
>>>>>>   #include "kfd_crat.h"
>>>>>>   #include "kfd_priv.h"
>>>>>>   #include "kfd_topology.h"
>>>>>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>>>>   	return 0;
>>>>>>   }
>>>>>>   
>>>>>> +
>>>>>> +#ifdef CONFIG_ACPI
>>>>>> +static void kfd_setup_ignore_crat_option(void)
>>>>>> +{
>>>>>> +
>>>>>> +	if (ignore_crat)
>>>>>> +		return;
>>>>>> +
>>>>>> +#ifndef KFD_SUPPORT_IOMMU_V2
>>>>>> +	ignore_crat = 1;
>>>>>> +#else
>>>>>> +	ignore_crat = 0;
>>>>>> +#endif
>>>>>> +
>>>>>> +	/* Renoir use the fallback path to align with existed thunk */
>>>>> Are you sure you need special code for Renoir here? For Renoir the
>>>>> dev->device_info already treats it as a dGPU and always has.
>>>> Renoir also is an APU, in other words, we might have got the correct CRAT
>>>> table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If
>>>> we had got CRAT table, the kfd would create an APU node. That's not
>>>> expected.
>>> kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT 
>>> table because gpu->device_info->needs_iommu_device is False for Renoir. 
>>> So Renoir will always show up in the topology as its own discrete GPU node.
>>>
>>> How does this work today? Renoir is already treated as a dGPU. But the 
>>> CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the 
>>> CRAT table still shows GPU cores?
>>>
>> Yes, Renoir works well. In fact, I found the problem while I was enabling
>> the dGPU path for raven before. Even I set needs_iommu_device as false in
>> raven's device info. The kfd still creates the APU node. (in v1 patch)
>>
>> Let me rollback to check it again.
>>
> Hi Felix,
>
> I double check it again. If Renoir's ACPI CRAT were good, we wouldn't
> create virtual CRAT table for Renoir at this moement. Then the pure CPU
> node is unable to be created. That conflicted with needs_iommu_device flag
> (we hardcode needs_iommu_device as false for renoir). Then will break the
> user mode driver. (please see below rocminfo)
>
> rocminfo: /libhsakmt/src/topology.c:1079: topology_sysfs_get_node_props: Assertion `props->EngineId.ui32.Major' failed.
>
> So we probably would better have a specific handling to make sure Renoir
> with virtual CRAT.

OK. That would be a solution that works for Renoir only.

I'd suggest trying a more general solution in node_show in
kfd_topology.c. If we see an APU node (that has CPU and GPU cores) with
no associated GPU (dev->gpu_id is 0 or dev->gpu is NULL), we can report
it as a pure CPU node to user mode by just reporting the simd_count as 0.

Regards,
  Felix


>
> Thanks,
> Ray
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-08-20 13:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-08-19 11:06 [PATCH v3 1/3] drm/amdkfd: abstract the iommu device checking with ignore_crat (v2) Huang Rui
2020-08-19 11:06 ` [PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3) Huang Rui
2020-08-19 15:38   ` Felix Kuehling
2020-08-19 23:56     ` Huang Rui
2020-08-20  0:18       ` Felix Kuehling
2020-08-20  0:31         ` Huang Rui
2020-08-20  9:38           ` Huang Rui
2020-08-20 13:22             ` Felix Kuehling
2020-08-20  3:09         ` Huang Rui
2020-08-20  4:05           ` Felix Kuehling
2020-08-20  4:41             ` Huang Rui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox