* [PATCH 01/17] drm/amdgpu: handle enforce isolation on non-0 gfxhub
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-15 3:00 ` SRINIVASAN SHANMUGAM
2024-08-15 0:04 ` [PATCH 02/17] drm/amdgpu: Add infrastructure for Cleaner Shader feature Alex Deucher
` (16 subsequent siblings)
17 siblings, 1 reply; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher
Some chips have more than one gfxhub so check if we
are a gfxhub rather than just gfxhub 0.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index b6a8bddada4c..6608eeb61e5a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -484,7 +484,7 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
bool amdgpu_vmid_uses_reserved(struct amdgpu_vm *vm, unsigned int vmhub)
{
return vm->reserved_vmid[vmhub] ||
- (enforce_isolation && (vmhub == AMDGPU_GFXHUB(0)));
+ (enforce_isolation && AMDGPU_IS_GFXHUB(vmhub));
}
int amdgpu_vmid_alloc_reserved(struct amdgpu_device *adev,
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [PATCH 01/17] drm/amdgpu: handle enforce isolation on non-0 gfxhub
2024-08-15 0:04 ` [PATCH 01/17] drm/amdgpu: handle enforce isolation on non-0 gfxhub Alex Deucher
@ 2024-08-15 3:00 ` SRINIVASAN SHANMUGAM
0 siblings, 0 replies; 23+ messages in thread
From: SRINIVASAN SHANMUGAM @ 2024-08-15 3:00 UTC (permalink / raw)
To: Alex Deucher, amd-gfx
On 8/15/2024 5:34 AM, Alex Deucher wrote:
> Some chips have more than one gfxhub so check if we
> are a gfxhub rather than just gfxhub 0.
>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> index b6a8bddada4c..6608eeb61e5a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> @@ -484,7 +484,7 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
> bool amdgpu_vmid_uses_reserved(struct amdgpu_vm *vm, unsigned int vmhub)
> {
> return vm->reserved_vmid[vmhub] ||
> - (enforce_isolation && (vmhub == AMDGPU_GFXHUB(0)));
> + (enforce_isolation && AMDGPU_IS_GFXHUB(vmhub));
> }
>
> int amdgpu_vmid_alloc_reserved(struct amdgpu_device *adev,
Modification to check if the vmhub is a gfxhub, rather than just
checking for gfxhub 0, is a necessary for memory management and data
transfer in systems with multiple gfxhubs, It ensures that all gfxhubs
are considered for multiple GPU's.
Based on my this understanding of the changes.
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 02/17] drm/amdgpu: Add infrastructure for Cleaner Shader feature
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
2024-08-15 0:04 ` [PATCH 01/17] drm/amdgpu: handle enforce isolation on non-0 gfxhub Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-22 6:24 ` Christian König
2024-08-15 0:04 ` [PATCH 03/17] drm/amdgpu: Emit cleaner shader at end of IB submission Alex Deucher
` (15 subsequent siblings)
17 siblings, 1 reply; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Srinivasan Shanmugam, Christian König, Alex Deucher
From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
The cleaner shader is used by the CP firmware to clean LDS and GPRs
between processes on the CUs.
This adds an internal API for GFX IP code to allocate and initialize the
cleaner shader.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 35 +++++++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 14 ++++++++++
2 files changed, 49 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 9be8cafdcecc..4ed69fcfe9c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1416,3 +1416,38 @@ void amdgpu_gfx_sysfs_fini(struct amdgpu_device *adev)
device_remove_file(adev->dev, &dev_attr_current_compute_partition);
device_remove_file(adev->dev, &dev_attr_available_compute_partition);
}
+
+int amdgpu_gfx_cleaner_shader_sw_init(struct amdgpu_device *adev,
+ unsigned int cleaner_shader_size)
+{
+ if (!adev->gfx.enable_cleaner_shader)
+ return -EOPNOTSUPP;
+
+ return amdgpu_bo_create_kernel(adev, cleaner_shader_size, PAGE_SIZE,
+ AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT,
+ &adev->gfx.cleaner_shader_obj,
+ &adev->gfx.cleaner_shader_gpu_addr,
+ (void **)&adev->gfx.cleaner_shader_cpu_ptr);
+}
+
+void amdgpu_gfx_cleaner_shader_sw_fini(struct amdgpu_device *adev)
+{
+ if (!adev->gfx.enable_cleaner_shader)
+ return;
+
+ amdgpu_bo_free_kernel(&adev->gfx.cleaner_shader_obj,
+ &adev->gfx.cleaner_shader_gpu_addr,
+ (void **)&adev->gfx.cleaner_shader_cpu_ptr);
+}
+
+void amdgpu_gfx_cleaner_shader_init(struct amdgpu_device *adev,
+ unsigned int cleaner_shader_size,
+ const void *cleaner_shader_ptr)
+{
+ if (!adev->gfx.enable_cleaner_shader)
+ return;
+
+ if (adev->gfx.cleaner_shader_cpu_ptr && cleaner_shader_ptr)
+ memcpy_toio(adev->gfx.cleaner_shader_cpu_ptr, cleaner_shader_ptr,
+ cleaner_shader_size);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 6fe77e483bb7..5ff3ab7d429a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -444,6 +444,14 @@ struct amdgpu_gfx {
uint32_t *ip_dump_core;
uint32_t *ip_dump_compute_queues;
uint32_t *ip_dump_gfx_queues;
+
+ /* cleaner shader */
+ struct amdgpu_bo *cleaner_shader_obj;
+ unsigned int cleaner_shader_size;
+ u64 cleaner_shader_gpu_addr;
+ void *cleaner_shader_cpu_ptr;
+ const void *cleaner_shader_ptr;
+ bool enable_cleaner_shader;
};
struct amdgpu_gfx_ras_reg_entry {
@@ -545,6 +553,12 @@ void amdgpu_gfx_ras_error_func(struct amdgpu_device *adev,
void *ras_error_status,
void (*func)(struct amdgpu_device *adev, void *ras_error_status,
int xcc_id));
+int amdgpu_gfx_cleaner_shader_sw_init(struct amdgpu_device *adev,
+ unsigned int cleaner_shader_size);
+void amdgpu_gfx_cleaner_shader_sw_fini(struct amdgpu_device *adev);
+void amdgpu_gfx_cleaner_shader_init(struct amdgpu_device *adev,
+ unsigned int cleaner_shader_size,
+ const void *cleaner_shader_ptr);
static inline const char *amdgpu_gfx_compute_mode_desc(int mode)
{
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [PATCH 02/17] drm/amdgpu: Add infrastructure for Cleaner Shader feature
2024-08-15 0:04 ` [PATCH 02/17] drm/amdgpu: Add infrastructure for Cleaner Shader feature Alex Deucher
@ 2024-08-22 6:24 ` Christian König
0 siblings, 0 replies; 23+ messages in thread
From: Christian König @ 2024-08-22 6:24 UTC (permalink / raw)
To: Alex Deucher, amd-gfx; +Cc: Srinivasan Shanmugam, Christian König
[-- Attachment #1: Type: text/plain, Size: 4037 bytes --]
Am 15.08.24 um 02:04 schrieb Alex Deucher:
> From: Srinivasan Shanmugam<srinivasan.shanmugam@amd.com>
>
> The cleaner shader is used by the CP firmware to clean LDS and GPRs
> between processes on the CUs.
>
> This adds an internal API for GFX IP code to allocate and initialize the
> cleaner shader.
>
> Cc: Christian König<christian.koenig@amd.com>
> Cc: Alex Deucher<alexander.deucher@amd.com>
> Signed-off-by: Alex Deucher<alexander.deucher@amd.com>
> Signed-off-by: Srinivasan Shanmugam<srinivasan.shanmugam@amd.com>
> Suggested-by: Christian König<christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 35 +++++++++++++++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 14 ++++++++++
> 2 files changed, 49 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 9be8cafdcecc..4ed69fcfe9c1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -1416,3 +1416,38 @@ void amdgpu_gfx_sysfs_fini(struct amdgpu_device *adev)
> device_remove_file(adev->dev, &dev_attr_current_compute_partition);
> device_remove_file(adev->dev, &dev_attr_available_compute_partition);
> }
> +
> +int amdgpu_gfx_cleaner_shader_sw_init(struct amdgpu_device *adev,
> + unsigned int cleaner_shader_size)
> +{
> + if (!adev->gfx.enable_cleaner_shader)
> + return -EOPNOTSUPP;
It's probably better to not call the function in the first place instead
of returning an error.
> +
> + return amdgpu_bo_create_kernel(adev, cleaner_shader_size, PAGE_SIZE,
> + AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT,
> + &adev->gfx.cleaner_shader_obj,
> + &adev->gfx.cleaner_shader_gpu_addr,
> + (void **)&adev->gfx.cleaner_shader_cpu_ptr);
> +}
> +
> +void amdgpu_gfx_cleaner_shader_sw_fini(struct amdgpu_device *adev)
> +{
> + if (!adev->gfx.enable_cleaner_shader)
> + return;
> +
> + amdgpu_bo_free_kernel(&adev->gfx.cleaner_shader_obj,
> + &adev->gfx.cleaner_shader_gpu_addr,
> + (void **)&adev->gfx.cleaner_shader_cpu_ptr);
> +}
> +
> +void amdgpu_gfx_cleaner_shader_init(struct amdgpu_device *adev,
> + unsigned int cleaner_shader_size,
> + const void *cleaner_shader_ptr)
> +{
> + if (!adev->gfx.enable_cleaner_shader)
> + return;
> +
> + if (adev->gfx.cleaner_shader_cpu_ptr && cleaner_shader_ptr)
> + memcpy_toio(adev->gfx.cleaner_shader_cpu_ptr, cleaner_shader_ptr,
> + cleaner_shader_size);
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 6fe77e483bb7..5ff3ab7d429a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -444,6 +444,14 @@ struct amdgpu_gfx {
> uint32_t *ip_dump_core;
> uint32_t *ip_dump_compute_queues;
> uint32_t *ip_dump_gfx_queues;
> +
> + /* cleaner shader */
> + struct amdgpu_bo *cleaner_shader_obj;
> + unsigned int cleaner_shader_size;
This is a parameter now and the member unused.
> + u64 cleaner_shader_gpu_addr;
> + void *cleaner_shader_cpu_ptr;
> + const void *cleaner_shader_ptr;
> + bool enable_cleaner_shader;
Probably better to test if cleaner_shader_obj is allocated or not
instead of having a separate bool.
Regards,
Christian.
> };
>
> struct amdgpu_gfx_ras_reg_entry {
> @@ -545,6 +553,12 @@ void amdgpu_gfx_ras_error_func(struct amdgpu_device *adev,
> void *ras_error_status,
> void (*func)(struct amdgpu_device *adev, void *ras_error_status,
> int xcc_id));
> +int amdgpu_gfx_cleaner_shader_sw_init(struct amdgpu_device *adev,
> + unsigned int cleaner_shader_size);
> +void amdgpu_gfx_cleaner_shader_sw_fini(struct amdgpu_device *adev);
> +void amdgpu_gfx_cleaner_shader_init(struct amdgpu_device *adev,
> + unsigned int cleaner_shader_size,
> + const void *cleaner_shader_ptr);
>
> static inline const char *amdgpu_gfx_compute_mode_desc(int mode)
> {
[-- Attachment #2: Type: text/html, Size: 5764 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 03/17] drm/amdgpu: Emit cleaner shader at end of IB submission
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
2024-08-15 0:04 ` [PATCH 01/17] drm/amdgpu: handle enforce isolation on non-0 gfxhub Alex Deucher
2024-08-15 0:04 ` [PATCH 02/17] drm/amdgpu: Add infrastructure for Cleaner Shader feature Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-15 0:04 ` [PATCH 04/17] drm/amdgpu: Make enforce_isolation setting per GPU Alex Deucher
` (14 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher, Christian König, Srinivasan Shanmugam
This commit introduces the emission of a cleaner shader at the end of
the IB submission process. This is achieved by adding a new function
pointer, `emit_cleaner_shader`, to the `amdgpu_ring_funcs` structure. If
the `emit_cleaner_shader` function is set in the ring functions, it is
called during the VM flush process.
The cleaner shader is only emitted if the `enable_cleaner_shader` flag
is set in the `amdgpu_device` structure. This allows the cleaner shader
emission to be controlled on a per-device basis.
By emitting a cleaner shader at the end of the IB submission, we can
ensure that the VM state is properly cleaned up after each submission.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 5 +++++
2 files changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index c7f15edeb367..f93f51002201 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -236,6 +236,7 @@ struct amdgpu_ring_funcs {
void (*patch_ce)(struct amdgpu_ring *ring, unsigned offset);
void (*patch_de)(struct amdgpu_ring *ring, unsigned offset);
int (*reset)(struct amdgpu_ring *ring, unsigned int vmid);
+ void (*emit_cleaner_shader)(struct amdgpu_ring *ring);
};
struct amdgpu_ring {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index bcb729094521..71ef3308be92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -681,6 +681,10 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
pasid_mapping_needed &= adev->gmc.gmc_funcs->emit_pasid_mapping &&
ring->funcs->emit_wreg;
+ if (adev->gfx.enable_cleaner_shader &&
+ ring->funcs->emit_cleaner_shader)
+ ring->funcs->emit_cleaner_shader(ring);
+
if (!vm_flush_needed && !gds_switch_needed && !need_pipe_sync)
return 0;
@@ -742,6 +746,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
amdgpu_ring_emit_switch_buffer(ring);
amdgpu_ring_emit_switch_buffer(ring);
}
+
amdgpu_ring_ib_end(ring);
return 0;
}
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH 04/17] drm/amdgpu: Make enforce_isolation setting per GPU
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (2 preceding siblings ...)
2024-08-15 0:04 ` [PATCH 03/17] drm/amdgpu: Emit cleaner shader at end of IB submission Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-15 0:04 ` [PATCH 05/17] drm/amdgpu: Enforce isolation as part of the job Alex Deucher
` (13 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Srinivasan Shanmugam, Christian König, Alex Deucher
From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
This commit makes enforce_isolation setting to be per GPU and per
partition by adding the enforce_isolation array to the adev structure.
The adev variable is set based on the global enforce_isolation module
parameter during device initialization.
In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU
is used to determine whether to enforce isolation between graphics and
compute processes on that GPU.
In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU
and partition is used to determine whether to enforce isolation between
graphics and compute processes on that GPU and partition.
This allows the enforce_isolation setting to be controlled individually
for each GPU and each partition, which is useful in a system with
multiple GPUs and partitions where different isolation settings might be
desired for different GPUs and partitions.
v2: fix loop in amdgpu_vmid_mgr_init() (Alex)
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 17 +++++++++++------
drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h | 3 ++-
5 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index f3980b40f2ce..ae3e827da5ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1162,6 +1162,8 @@ struct amdgpu_device {
bool debug_disable_soft_recovery;
bool debug_use_vram_fw_buf;
bool debug_enable_ras_aca;
+
+ bool enforce_isolation[MAX_XCP];
};
static inline uint32_t amdgpu_ip_version(const struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 78b3c067fea7..5d5ba1e3d90f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1110,7 +1110,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
struct drm_gpu_scheduler *sched = entity->rq->sched;
struct amdgpu_ring *ring = to_amdgpu_ring(sched);
- if (amdgpu_vmid_uses_reserved(vm, ring->vm_hub))
+ if (amdgpu_vmid_uses_reserved(adev, vm, ring->vm_hub))
return -EINVAL;
}
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a6b8d0ba4758..95953c028ca5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1916,6 +1916,8 @@ static int amdgpu_device_init_apu_flags(struct amdgpu_device *adev)
*/
static int amdgpu_device_check_arguments(struct amdgpu_device *adev)
{
+ int i;
+
if (amdgpu_sched_jobs < 4) {
dev_warn(adev->dev, "sched jobs (%d) must be at least 4\n",
amdgpu_sched_jobs);
@@ -1970,6 +1972,9 @@ static int amdgpu_device_check_arguments(struct amdgpu_device *adev)
adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
+ for (i = 0; i < MAX_XCP; i++)
+ adev->enforce_isolation[i] = !!enforce_isolation;
+
return 0;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index 6608eeb61e5a..92d27d32de41 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -424,7 +424,7 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
if (r || !idle)
goto error;
- if (amdgpu_vmid_uses_reserved(vm, vmhub)) {
+ if (amdgpu_vmid_uses_reserved(adev, vm, vmhub)) {
r = amdgpu_vmid_grab_reserved(vm, ring, job, &id, fence);
if (r || !id)
goto error;
@@ -476,15 +476,19 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
/*
* amdgpu_vmid_uses_reserved - check if a VM will use a reserved VMID
+ * @adev: amdgpu_device pointer
* @vm: the VM to check
* @vmhub: the VMHUB which will be used
*
* Returns: True if the VM will use a reserved VMID.
*/
-bool amdgpu_vmid_uses_reserved(struct amdgpu_vm *vm, unsigned int vmhub)
+bool amdgpu_vmid_uses_reserved(struct amdgpu_device *adev,
+ struct amdgpu_vm *vm, unsigned int vmhub)
{
return vm->reserved_vmid[vmhub] ||
- (enforce_isolation && AMDGPU_IS_GFXHUB(vmhub));
+ (adev->enforce_isolation[(vm->root.bo->xcp_id != AMDGPU_XCP_NO_PARTITION) ?
+ vm->root.bo->xcp_id : 0] &&
+ AMDGPU_IS_GFXHUB(vmhub));
}
int amdgpu_vmid_alloc_reserved(struct amdgpu_device *adev,
@@ -600,9 +604,10 @@ void amdgpu_vmid_mgr_init(struct amdgpu_device *adev)
}
}
/* alloc a default reserved vmid to enforce isolation */
- if (enforce_isolation)
- amdgpu_vmid_alloc_reserved(adev, AMDGPU_GFXHUB(0));
-
+ for (i = 0; i < (adev->xcp_mgr ? adev->xcp_mgr->num_xcps : 1); i++) {
+ if (adev->enforce_isolation[i])
+ amdgpu_vmid_alloc_reserved(adev, AMDGPU_GFXHUB(i));
+ }
}
/**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
index 240fa6751260..4012fb2dd08a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
@@ -78,7 +78,8 @@ void amdgpu_pasid_free_delayed(struct dma_resv *resv,
bool amdgpu_vmid_had_gpu_reset(struct amdgpu_device *adev,
struct amdgpu_vmid *id);
-bool amdgpu_vmid_uses_reserved(struct amdgpu_vm *vm, unsigned int vmhub);
+bool amdgpu_vmid_uses_reserved(struct amdgpu_device *adev,
+ struct amdgpu_vm *vm, unsigned int vmhub);
int amdgpu_vmid_alloc_reserved(struct amdgpu_device *adev,
unsigned vmhub);
void amdgpu_vmid_free_reserved(struct amdgpu_device *adev,
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH 05/17] drm/amdgpu: Enforce isolation as part of the job
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (3 preceding siblings ...)
2024-08-15 0:04 ` [PATCH 04/17] drm/amdgpu: Make enforce_isolation setting per GPU Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-15 0:04 ` [PATCH 06/17] drm/amdgpu: Add enforce_isolation sysfs attribute Alex Deucher
` (12 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Srinivasan Shanmugam, Christian König, Alex Deucher
From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
This patch adds a new parameter 'enforce_isolation' to the amdgpu_job
structure. This parameter is used to determine whether shader isolation
should be enforced for a job. The enforce_isolation parameter is then
stored in the amdgpu_job structure and used when flushing the VM.
The enforce_isolation field of the amdgpu_job structure is set directly
after the job is allocated
This change allows more fine-grained control over shader isolation,
making it possible to enforce isolation on a per-job basis rather than
globally. This can be useful in scenarios where only certain jobs
require isolation.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 3 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 ++-
3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 5d5ba1e3d90f..1e475eb01417 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -296,6 +296,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
num_ibs[i], &p->jobs[i]);
if (ret)
goto free_all_kdata;
+ p->jobs[i]->enforce_isolation = p->adev->enforce_isolation[fpriv->xcp_id];
}
p->gang_leader = p->jobs[p->gang_leader_idx];
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index a963a25ddd62..ce6b9ba967ff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -76,6 +76,9 @@ struct amdgpu_job {
/* job_run_counter >= 1 means a resubmit job */
uint32_t job_run_counter;
+ /* enforce isolation */
+ bool enforce_isolation;
+
uint32_t num_ibs;
struct amdgpu_ib ibs[];
};
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 71ef3308be92..1468222ea0cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -682,7 +682,8 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
ring->funcs->emit_wreg;
if (adev->gfx.enable_cleaner_shader &&
- ring->funcs->emit_cleaner_shader)
+ ring->funcs->emit_cleaner_shader &&
+ job->enforce_isolation)
ring->funcs->emit_cleaner_shader(ring);
if (!vm_flush_needed && !gds_switch_needed && !need_pipe_sync)
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH 06/17] drm/amdgpu: Add enforce_isolation sysfs attribute
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (4 preceding siblings ...)
2024-08-15 0:04 ` [PATCH 05/17] drm/amdgpu: Enforce isolation as part of the job Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-15 0:04 ` [PATCH 07/17] drm/amdgpu: Add sysfs interface for running cleaner shader Alex Deucher
` (11 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Srinivasan Shanmugam, Christian König, Alex Deucher
From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
This commit adds a new sysfs attribute 'enforce_isolation' to control
the 'enforce_isolation' setting per GPU. The attribute can be read and
written, and accepts values 0 (disabled) and 1 (enabled).
When 'enforce_isolation' is enabled, reserved VMIDs are allocated for
each ring. When it's disabled, the reserved VMIDs are freed.
The set function locks a mutex before changing the 'enforce_isolation'
flag and the VMIDs, and unlocks it afterwards. This ensures that these
operations are atomic and prevents race conditions and other concurrency
issues.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 101 +++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 2 +
4 files changed, 107 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index ae3e827da5ec..fac632986866 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1164,6 +1164,8 @@ struct amdgpu_device {
bool debug_enable_ras_aca;
bool enforce_isolation[MAX_XCP];
+ /* Added this mutex for cleaner shader isolation between GFX and compute processes */
+ struct mutex enforce_isolation_mutex;
};
static inline uint32_t amdgpu_ip_version(const struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 95953c028ca5..a6e714d1fe4d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4064,6 +4064,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
mutex_init(&adev->notifier_lock);
mutex_init(&adev->pm.stable_pstate_ctx_lock);
mutex_init(&adev->benchmark_mutex);
+ /* Initialize the mutex for cleaner shader isolation between GFX and compute processes */
+ mutex_init(&adev->enforce_isolation_mutex);
amdgpu_device_init_apu_flags(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 4ed69fcfe9c1..2e35fc2577f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1391,6 +1391,88 @@ static ssize_t amdgpu_gfx_get_available_compute_partition(struct device *dev,
return sysfs_emit(buf, "%s\n", supported_partition);
}
+static ssize_t amdgpu_gfx_get_enforce_isolation(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct drm_device *ddev = dev_get_drvdata(dev);
+ struct amdgpu_device *adev = drm_to_adev(ddev);
+ int i;
+ ssize_t size = 0;
+
+ if (adev->xcp_mgr) {
+ for (i = 0; i < adev->xcp_mgr->num_xcps; i++) {
+ size += sysfs_emit_at(buf, size, "%u", adev->enforce_isolation[i]);
+ if (i < (adev->xcp_mgr->num_xcps - 1))
+ size += sysfs_emit_at(buf, size, " ");
+ }
+ buf[size++] = '\n';
+ } else {
+ size = sysfs_emit_at(buf, 0, "%u\n", adev->enforce_isolation[0]);
+ }
+
+ return size;
+}
+
+static ssize_t amdgpu_gfx_set_enforce_isolation(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct drm_device *ddev = dev_get_drvdata(dev);
+ struct amdgpu_device *adev = drm_to_adev(ddev);
+ long partition_values[MAX_XCP] = {0};
+ int ret, i, num_partitions;
+ const char *input_buf = buf;
+
+ for (i = 0; i < (adev->xcp_mgr ? adev->xcp_mgr->num_xcps : 1); i++) {
+ ret = sscanf(input_buf, "%ld", &partition_values[i]);
+ if (ret <= 0)
+ break;
+
+ /* Move the pointer to the next value in the string */
+ input_buf = strchr(input_buf, ' ');
+ if (input_buf) {
+ input_buf++;
+ } else {
+ i++;
+ break;
+ }
+ }
+ num_partitions = i;
+
+ if (adev->xcp_mgr && num_partitions != adev->xcp_mgr->num_xcps)
+ return -EINVAL;
+
+ if (!adev->xcp_mgr && num_partitions != 1)
+ return -EINVAL;
+
+ for (i = 0; i < num_partitions; i++) {
+ if (partition_values[i] != 0 && partition_values[i] != 1)
+ return -EINVAL;
+ }
+
+ mutex_lock(&adev->enforce_isolation_mutex);
+
+ for (i = 0; i < num_partitions; i++) {
+ if (adev->enforce_isolation[i] && !partition_values[i]) {
+ /* Going from enabled to disabled */
+ amdgpu_vmid_free_reserved(adev, AMDGPU_GFXHUB(i));
+ } else if (!adev->enforce_isolation[i] && partition_values[i]) {
+ /* Going from disabled to enabled */
+ amdgpu_vmid_alloc_reserved(adev, AMDGPU_GFXHUB(i));
+ }
+ adev->enforce_isolation[i] = partition_values[i];
+ }
+
+ mutex_unlock(&adev->enforce_isolation_mutex);
+
+ return count;
+}
+
+static DEVICE_ATTR(enforce_isolation, 0644,
+ amdgpu_gfx_get_enforce_isolation,
+ amdgpu_gfx_set_enforce_isolation);
+
static DEVICE_ATTR(current_compute_partition, 0644,
amdgpu_gfx_get_current_compute_partition,
amdgpu_gfx_set_compute_partition);
@@ -1417,6 +1499,25 @@ void amdgpu_gfx_sysfs_fini(struct amdgpu_device *adev)
device_remove_file(adev->dev, &dev_attr_available_compute_partition);
}
+int amdgpu_gfx_sysfs_isolation_shader_init(struct amdgpu_device *adev)
+{
+ int r;
+
+ if (!amdgpu_sriov_vf(adev)) {
+ r = device_create_file(adev->dev, &dev_attr_enforce_isolation);
+ if (r)
+ return r;
+ }
+
+ return 0;
+}
+
+void amdgpu_gfx_sysfs_isolation_shader_fini(struct amdgpu_device *adev)
+{
+ if (!amdgpu_sriov_vf(adev))
+ device_remove_file(adev->dev, &dev_attr_enforce_isolation);
+}
+
int amdgpu_gfx_cleaner_shader_sw_init(struct amdgpu_device *adev,
unsigned int cleaner_shader_size)
{
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 5ff3ab7d429a..cb83b66aba89 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -559,6 +559,8 @@ void amdgpu_gfx_cleaner_shader_sw_fini(struct amdgpu_device *adev);
void amdgpu_gfx_cleaner_shader_init(struct amdgpu_device *adev,
unsigned int cleaner_shader_size,
const void *cleaner_shader_ptr);
+int amdgpu_gfx_sysfs_isolation_shader_init(struct amdgpu_device *adev);
+void amdgpu_gfx_sysfs_isolation_shader_fini(struct amdgpu_device *adev);
static inline const char *amdgpu_gfx_compute_mode_desc(int mode)
{
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH 07/17] drm/amdgpu: Add sysfs interface for running cleaner shader
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (5 preceding siblings ...)
2024-08-15 0:04 ` [PATCH 06/17] drm/amdgpu: Add enforce_isolation sysfs attribute Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-22 6:29 ` Christian König
2024-08-15 0:04 ` [PATCH 08/17] drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER for cleaner shader execution Alex Deucher
` (10 subsequent siblings)
17 siblings, 1 reply; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Srinivasan Shanmugam, Christian König, Alex Deucher
From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
This patch adds a new sysfs interface for running the cleaner shader on
AMD GPUs. The cleaner shader is used to clear GPU memory before it's
reused, which can help prevent data leakage between different processes.
The new sysfs file is write-only and is named `run_cleaner_shader`.
Write the number of the partition to this file to trigger the cleaner shader
on that partition. There is only one partition on GPUs which do not
support partitioning.
Changes made in this patch:
- Added `amdgpu_set_run_cleaner_shader` function to handle writes to the
`run_cleaner_shader` sysfs file.
- Added `run_cleaner_shader` to the list of device attributes in
`amdgpu_device_attrs`.
- Updated `default_attr_update` to handle `run_cleaner_shader`.
- Added `AMDGPU_DEVICE_ATTR_WO` macro to create write-only device
attributes.
v2: fix error handling (Alex)
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 134 ++++++++++++++++++++++++
1 file changed, 134 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 2e35fc2577f9..76f77cf562af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -24,10 +24,13 @@
*/
#include <linux/firmware.h>
+#include <linux/pm_runtime.h>
+
#include "amdgpu.h"
#include "amdgpu_gfx.h"
#include "amdgpu_rlc.h"
#include "amdgpu_ras.h"
+#include "amdgpu_reset.h"
#include "amdgpu_xcp.h"
#include "amdgpu_xgmi.h"
@@ -1391,6 +1394,129 @@ static ssize_t amdgpu_gfx_get_available_compute_partition(struct device *dev,
return sysfs_emit(buf, "%s\n", supported_partition);
}
+static int amdgpu_gfx_run_cleaner_shader_job(struct amdgpu_ring *ring)
+{
+ struct amdgpu_device *adev = ring->adev;
+ long timeout = msecs_to_jiffies(1000);
+ struct dma_fence *f = NULL;
+ struct amdgpu_job *job;
+ struct amdgpu_ib *ib;
+ int i, r;
+
+ r = amdgpu_job_alloc_with_ib(adev, NULL, NULL,
+ 64, AMDGPU_IB_POOL_DIRECT,
+ &job);
+ if (r)
+ goto err;
+
+ job->enforce_isolation = true;
+
+ ib = &job->ibs[0];
+ for (i = 0; i <= ring->funcs->align_mask; ++i)
+ ib->ptr[i] = ring->funcs->nop;
+ ib->length_dw = ring->funcs->align_mask + 1;
+
+ r = amdgpu_job_submit_direct(job, ring, &f);
+ if (r)
+ goto err_free;
+
+ r = dma_fence_wait_timeout(f, false, timeout);
+ if (r == 0)
+ r = -ETIMEDOUT;
+ else if (r > 0)
+ r = 0;
+
+ amdgpu_ib_free(adev, ib, f);
+ dma_fence_put(f);
+
+ return 0;
+
+err_free:
+ amdgpu_job_free(job);
+ amdgpu_ib_free(adev, ib, f);
+err:
+ return r;
+}
+
+static int amdgpu_gfx_run_cleaner_shader(struct amdgpu_device *adev, int xcp_id)
+{
+ int num_xcc = NUM_XCC(adev->gfx.xcc_mask);
+ struct amdgpu_ring *ring;
+ int num_xcc_to_clear;
+ int i, r, xcc_id;
+
+ if (adev->gfx.num_xcc_per_xcp)
+ num_xcc_to_clear = adev->gfx.num_xcc_per_xcp;
+ else
+ num_xcc_to_clear = 1;
+
+ for (xcc_id = 0; xcc_id < num_xcc; xcc_id++) {
+ for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+ ring = &adev->gfx.compute_ring[i + xcc_id * adev->gfx.num_compute_rings];
+ if ((ring->xcp_id == xcp_id) && ring->sched.ready) {
+ r = amdgpu_gfx_run_cleaner_shader_job(ring);
+ if (r)
+ return r;
+ num_xcc_to_clear--;
+ break;
+ }
+ }
+ }
+
+ if (num_xcc_to_clear)
+ return -ENOENT;
+
+ return 0;
+}
+
+static ssize_t amdgpu_gfx_set_run_cleaner_shader(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf,
+ size_t count)
+{
+ struct drm_device *ddev = dev_get_drvdata(dev);
+ struct amdgpu_device *adev = drm_to_adev(ddev);
+ int ret;
+ long value;
+
+ if (amdgpu_in_reset(adev))
+ return -EPERM;
+ if (adev->in_suspend && !adev->in_runpm)
+ return -EPERM;
+
+ ret = kstrtol(buf, 0, &value);
+
+ if (ret)
+ return -EINVAL;
+
+ if (value < 0)
+ return -EINVAL;
+
+ if (adev->xcp_mgr) {
+ if (value >= adev->xcp_mgr->num_xcps)
+ return -EINVAL;
+ } else {
+ if (value > 1)
+ return -EINVAL;
+ }
+
+ ret = pm_runtime_get_sync(ddev->dev);
+ if (ret < 0) {
+ pm_runtime_put_autosuspend(ddev->dev);
+ return ret;
+ }
+
+ ret = amdgpu_gfx_run_cleaner_shader(adev, value);
+
+ pm_runtime_mark_last_busy(ddev->dev);
+ pm_runtime_put_autosuspend(ddev->dev);
+
+ if (ret)
+ return ret;
+
+ return count;
+}
+
static ssize_t amdgpu_gfx_get_enforce_isolation(struct device *dev,
struct device_attribute *attr,
char *buf)
@@ -1469,6 +1595,9 @@ static ssize_t amdgpu_gfx_set_enforce_isolation(struct device *dev,
return count;
}
+static DEVICE_ATTR(run_cleaner_shader, 0200,
+ NULL, amdgpu_gfx_set_run_cleaner_shader);
+
static DEVICE_ATTR(enforce_isolation, 0644,
amdgpu_gfx_get_enforce_isolation,
amdgpu_gfx_set_enforce_isolation);
@@ -1509,6 +1638,10 @@ int amdgpu_gfx_sysfs_isolation_shader_init(struct amdgpu_device *adev)
return r;
}
+ r = device_create_file(adev->dev, &dev_attr_run_cleaner_shader);
+ if (r)
+ return r;
+
return 0;
}
@@ -1516,6 +1649,7 @@ void amdgpu_gfx_sysfs_isolation_shader_fini(struct amdgpu_device *adev)
{
if (!amdgpu_sriov_vf(adev))
device_remove_file(adev->dev, &dev_attr_enforce_isolation);
+ device_remove_file(adev->dev, &dev_attr_run_cleaner_shader);
}
int amdgpu_gfx_cleaner_shader_sw_init(struct amdgpu_device *adev,
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [PATCH 07/17] drm/amdgpu: Add sysfs interface for running cleaner shader
2024-08-15 0:04 ` [PATCH 07/17] drm/amdgpu: Add sysfs interface for running cleaner shader Alex Deucher
@ 2024-08-22 6:29 ` Christian König
0 siblings, 0 replies; 23+ messages in thread
From: Christian König @ 2024-08-22 6:29 UTC (permalink / raw)
To: Alex Deucher, amd-gfx; +Cc: Srinivasan Shanmugam, Christian König
Am 15.08.24 um 02:04 schrieb Alex Deucher:
> From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
>
> This patch adds a new sysfs interface for running the cleaner shader on
> AMD GPUs. The cleaner shader is used to clear GPU memory before it's
> reused, which can help prevent data leakage between different processes.
>
> The new sysfs file is write-only and is named `run_cleaner_shader`.
> Write the number of the partition to this file to trigger the cleaner shader
> on that partition. There is only one partition on GPUs which do not
> support partitioning.
>
> Changes made in this patch:
>
> - Added `amdgpu_set_run_cleaner_shader` function to handle writes to the
> `run_cleaner_shader` sysfs file.
> - Added `run_cleaner_shader` to the list of device attributes in
> `amdgpu_device_attrs`.
> - Updated `default_attr_update` to handle `run_cleaner_shader`.
> - Added `AMDGPU_DEVICE_ATTR_WO` macro to create write-only device
> attributes.
>
> v2: fix error handling (Alex)
>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 134 ++++++++++++++++++++++++
> 1 file changed, 134 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 2e35fc2577f9..76f77cf562af 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -24,10 +24,13 @@
> */
>
> #include <linux/firmware.h>
> +#include <linux/pm_runtime.h>
> +
> #include "amdgpu.h"
> #include "amdgpu_gfx.h"
> #include "amdgpu_rlc.h"
> #include "amdgpu_ras.h"
> +#include "amdgpu_reset.h"
> #include "amdgpu_xcp.h"
> #include "amdgpu_xgmi.h"
>
> @@ -1391,6 +1394,129 @@ static ssize_t amdgpu_gfx_get_available_compute_partition(struct device *dev,
> return sysfs_emit(buf, "%s\n", supported_partition);
> }
>
> +static int amdgpu_gfx_run_cleaner_shader_job(struct amdgpu_ring *ring)
> +{
> + struct amdgpu_device *adev = ring->adev;
> + long timeout = msecs_to_jiffies(1000);
> + struct dma_fence *f = NULL;
> + struct amdgpu_job *job;
> + struct amdgpu_ib *ib;
> + int i, r;
> +
> + r = amdgpu_job_alloc_with_ib(adev, NULL, NULL,
> + 64, AMDGPU_IB_POOL_DIRECT,
> + &job);
> + if (r)
> + goto err;
> +
> + job->enforce_isolation = true;
> +
> + ib = &job->ibs[0];
> + for (i = 0; i <= ring->funcs->align_mask; ++i)
> + ib->ptr[i] = ring->funcs->nop;
> + ib->length_dw = ring->funcs->align_mask + 1;
> +
> + r = amdgpu_job_submit_direct(job, ring, &f);
That's a really really bad idea. There is nothing which guarantees the
the scheduler tries to submit something at the same time as well.
You need to allocate a scheduler entity and use that for submission.
Christian.
> + if (r)
> + goto err_free;
> +
> + r = dma_fence_wait_timeout(f, false, timeout);
> + if (r == 0)
> + r = -ETIMEDOUT;
> + else if (r > 0)
> + r = 0;
> +
> + amdgpu_ib_free(adev, ib, f);
> + dma_fence_put(f);
> +
> + return 0;
> +
> +err_free:
> + amdgpu_job_free(job);
> + amdgpu_ib_free(adev, ib, f);
> +err:
> + return r;
> +}
> +
> +static int amdgpu_gfx_run_cleaner_shader(struct amdgpu_device *adev, int xcp_id)
> +{
> + int num_xcc = NUM_XCC(adev->gfx.xcc_mask);
> + struct amdgpu_ring *ring;
> + int num_xcc_to_clear;
> + int i, r, xcc_id;
> +
> + if (adev->gfx.num_xcc_per_xcp)
> + num_xcc_to_clear = adev->gfx.num_xcc_per_xcp;
> + else
> + num_xcc_to_clear = 1;
> +
> + for (xcc_id = 0; xcc_id < num_xcc; xcc_id++) {
> + for (i = 0; i < adev->gfx.num_compute_rings; i++) {
> + ring = &adev->gfx.compute_ring[i + xcc_id * adev->gfx.num_compute_rings];
> + if ((ring->xcp_id == xcp_id) && ring->sched.ready) {
> + r = amdgpu_gfx_run_cleaner_shader_job(ring);
> + if (r)
> + return r;
> + num_xcc_to_clear--;
> + break;
> + }
> + }
> + }
> +
> + if (num_xcc_to_clear)
> + return -ENOENT;
> +
> + return 0;
> +}
> +
> +static ssize_t amdgpu_gfx_set_run_cleaner_shader(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf,
> + size_t count)
> +{
> + struct drm_device *ddev = dev_get_drvdata(dev);
> + struct amdgpu_device *adev = drm_to_adev(ddev);
> + int ret;
> + long value;
> +
> + if (amdgpu_in_reset(adev))
> + return -EPERM;
> + if (adev->in_suspend && !adev->in_runpm)
> + return -EPERM;
> +
> + ret = kstrtol(buf, 0, &value);
> +
> + if (ret)
> + return -EINVAL;
> +
> + if (value < 0)
> + return -EINVAL;
> +
> + if (adev->xcp_mgr) {
> + if (value >= adev->xcp_mgr->num_xcps)
> + return -EINVAL;
> + } else {
> + if (value > 1)
> + return -EINVAL;
> + }
> +
> + ret = pm_runtime_get_sync(ddev->dev);
> + if (ret < 0) {
> + pm_runtime_put_autosuspend(ddev->dev);
> + return ret;
> + }
> +
> + ret = amdgpu_gfx_run_cleaner_shader(adev, value);
> +
> + pm_runtime_mark_last_busy(ddev->dev);
> + pm_runtime_put_autosuspend(ddev->dev);
> +
> + if (ret)
> + return ret;
> +
> + return count;
> +}
> +
> static ssize_t amdgpu_gfx_get_enforce_isolation(struct device *dev,
> struct device_attribute *attr,
> char *buf)
> @@ -1469,6 +1595,9 @@ static ssize_t amdgpu_gfx_set_enforce_isolation(struct device *dev,
> return count;
> }
>
> +static DEVICE_ATTR(run_cleaner_shader, 0200,
> + NULL, amdgpu_gfx_set_run_cleaner_shader);
> +
> static DEVICE_ATTR(enforce_isolation, 0644,
> amdgpu_gfx_get_enforce_isolation,
> amdgpu_gfx_set_enforce_isolation);
> @@ -1509,6 +1638,10 @@ int amdgpu_gfx_sysfs_isolation_shader_init(struct amdgpu_device *adev)
> return r;
> }
>
> + r = device_create_file(adev->dev, &dev_attr_run_cleaner_shader);
> + if (r)
> + return r;
> +
> return 0;
> }
>
> @@ -1516,6 +1649,7 @@ void amdgpu_gfx_sysfs_isolation_shader_fini(struct amdgpu_device *adev)
> {
> if (!amdgpu_sriov_vf(adev))
> device_remove_file(adev->dev, &dev_attr_enforce_isolation);
> + device_remove_file(adev->dev, &dev_attr_run_cleaner_shader);
> }
>
> int amdgpu_gfx_cleaner_shader_sw_init(struct amdgpu_device *adev,
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 08/17] drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER for cleaner shader execution
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (6 preceding siblings ...)
2024-08-15 0:04 ` [PATCH 07/17] drm/amdgpu: Add sysfs interface for running cleaner shader Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-15 0:04 ` [PATCH 09/17] drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware Alex Deucher
` (9 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Srinivasan Shanmugam, Christian König, Alex Deucher
From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
This commit adds the PACKET3_RUN_CLEANER_SHADER definition. This packet
is a command packet used to instruct the GPU to execute the cleaner
shader.
The cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs). Clearing these resources is important for ensuring data
isolation between different workloads running on the GPU.
The PACKET3_RUN_CLEANER_SHADER packet is used to trigger the execution
of the cleaner shader on the GPU. The packet consists of a header
followed by a RESERVED field, which is programmed to zero. When the GPU
receives this packet, it fetches and executes the cleaner shader
instructions from the location specified in the packet.
The cleaner shader feature helps to enhances security and reliability by
preventing data leaks between workloads.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/soc15d.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15d.h b/drivers/gpu/drm/amd/amdgpu/soc15d.h
index e74e1983da53..b9cbeb389edc 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15d.h
+++ b/drivers/gpu/drm/amd/amdgpu/soc15d.h
@@ -413,6 +413,10 @@
# define PACKET3_QUERY_STATUS_DOORBELL_OFFSET(x) ((x) << 2)
# define PACKET3_QUERY_STATUS_ENG_SEL(x) ((x) << 25)
+#define PACKET3_RUN_CLEANER_SHADER 0xD2
+/* 1. header
+ * 2. RESERVED [31:0]
+ */
#define VCE_CMD_NO_OP 0x00000000
#define VCE_CMD_END 0x00000001
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH 09/17] drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (7 preceding siblings ...)
2024-08-15 0:04 ` [PATCH 08/17] drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER for cleaner shader execution Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-15 0:04 ` [PATCH 10/17] drm/amdgpu/gfx9: Implement cleaner shader support for GFX9.4.3 hardware Alex Deucher
` (8 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Srinivasan Shanmugam, Christian König, Alex Deucher
From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
The patch modifies the gfx_v9_0_kiq_set_resources function to write
the cleaner shader's memory controller address to the ring buffer. It
also adds a new function, gfx_v9_0_ring_emit_cleaner_shader, which
emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer.
This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the
gfx_v9_0 module. This packet is used to emit the cleaner shader, which
is used to clear GPU memory before it's reused, helping to prevent data
leakage between different processes.
Finally, the patch updates the ring function structures to include the
new gfx_v9_0_ring_emit_cleaner_shader function. This allows the
cleaner shader to be emitted as part of the ring's operations.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 45 ++++++++++++++++---
.../drm/amd/amdgpu/gfx_v9_0_cleaner_shader.h | 26 +++++++++++
2 files changed, 66 insertions(+), 5 deletions(-)
create mode 100644 drivers/gpu/drm/amd/amdgpu/gfx_v9_0_cleaner_shader.h
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index db21fb951e0e..3045b8b0796d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -50,6 +50,7 @@
#include "amdgpu_ring_mux.h"
#include "gfx_v9_4.h"
#include "gfx_v9_0.h"
+#include "gfx_v9_0_cleaner_shader.h"
#include "gfx_v9_4_2.h"
#include "asic_reg/pwr/pwr_10_0_offset.h"
@@ -899,6 +900,12 @@ static void gfx_v9_0_unset_safe_mode(struct amdgpu_device *adev, int xcc_id);
static void gfx_v9_0_kiq_set_resources(struct amdgpu_ring *kiq_ring,
uint64_t queue_mask)
{
+ struct amdgpu_device *adev = kiq_ring->adev;
+ u64 shader_mc_addr;
+
+ /* Cleaner shader MC address */
+ shader_mc_addr = adev->gfx.cleaner_shader_gpu_addr >> 8;
+
amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_RESOURCES, 6));
amdgpu_ring_write(kiq_ring,
PACKET3_SET_RESOURCES_VMID_MASK(0) |
@@ -908,8 +915,8 @@ static void gfx_v9_0_kiq_set_resources(struct amdgpu_ring *kiq_ring,
lower_32_bits(queue_mask)); /* queue mask lo */
amdgpu_ring_write(kiq_ring,
upper_32_bits(queue_mask)); /* queue mask hi */
- amdgpu_ring_write(kiq_ring, 0); /* gws mask lo */
- amdgpu_ring_write(kiq_ring, 0); /* gws mask hi */
+ amdgpu_ring_write(kiq_ring, lower_32_bits(shader_mc_addr)); /* cleaner shader addr lo */
+ amdgpu_ring_write(kiq_ring, upper_32_bits(shader_mc_addr)); /* cleaner shader addr hi */
amdgpu_ring_write(kiq_ring, 0); /* oac mask */
amdgpu_ring_write(kiq_ring, 0); /* gds heap base:0, gds heap size:0 */
}
@@ -2211,6 +2218,12 @@ static int gfx_v9_0_sw_init(void *handle)
break;
}
+ switch (amdgpu_ip_version(adev, GC_HWIP, 0)) {
+ default:
+ adev->gfx.enable_cleaner_shader = false;
+ break;
+ }
+
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 8;
@@ -2373,6 +2386,10 @@ static int gfx_v9_0_sw_init(void *handle)
gfx_v9_0_alloc_ip_dump(adev);
+ r = amdgpu_gfx_sysfs_isolation_shader_init(adev);
+ if (r)
+ return r;
+
return 0;
}
@@ -2408,6 +2425,8 @@ static int gfx_v9_0_sw_fini(void *handle)
}
gfx_v9_0_free_microcode(adev);
+ amdgpu_gfx_sysfs_isolation_shader_fini(adev);
+
kfree(adev->gfx.ip_dump_core);
kfree(adev->gfx.ip_dump_compute_queues);
@@ -3952,6 +3971,9 @@ static int gfx_v9_0_hw_init(void *handle)
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ amdgpu_gfx_cleaner_shader_init(adev, adev->gfx.cleaner_shader_size,
+ adev->gfx.cleaner_shader_ptr);
+
if (!amdgpu_sriov_vf(adev))
gfx_v9_0_init_golden_registers(adev);
@@ -7368,6 +7390,13 @@ static void gfx_v9_ip_dump(void *handle)
}
+static void gfx_v9_0_ring_emit_cleaner_shader(struct amdgpu_ring *ring)
+{
+ /* Emit the cleaner shader */
+ amdgpu_ring_write(ring, PACKET3(PACKET3_RUN_CLEANER_SHADER, 0));
+ amdgpu_ring_write(ring, 0); /* RESERVED field, programmed to zero */
+}
+
static const struct amd_ip_funcs gfx_v9_0_ip_funcs = {
.name = "gfx_v9_0",
.early_init = gfx_v9_0_early_init,
@@ -7417,7 +7446,8 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
5 + /* HDP_INVL */
8 + 8 + /* FENCE x2 */
2 + /* SWITCH_BUFFER */
- 7, /* gfx_v9_0_emit_mem_sync */
+ 7 + /* gfx_v9_0_emit_mem_sync */
+ 2, /* gfx_v9_0_ring_emit_cleaner_shader */
.emit_ib_size = 4, /* gfx_v9_0_ring_emit_ib_gfx */
.emit_ib = gfx_v9_0_ring_emit_ib_gfx,
.emit_fence = gfx_v9_0_ring_emit_fence,
@@ -7439,6 +7469,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
.soft_recovery = gfx_v9_0_ring_soft_recovery,
.emit_mem_sync = gfx_v9_0_emit_mem_sync,
.reset = gfx_v9_0_reset_kgq,
+ .emit_cleaner_shader = gfx_v9_0_ring_emit_cleaner_shader,
};
static const struct amdgpu_ring_funcs gfx_v9_0_sw_ring_funcs_gfx = {
@@ -7471,7 +7502,8 @@ static const struct amdgpu_ring_funcs gfx_v9_0_sw_ring_funcs_gfx = {
5 + /* HDP_INVL */
8 + 8 + /* FENCE x2 */
2 + /* SWITCH_BUFFER */
- 7, /* gfx_v9_0_emit_mem_sync */
+ 7 + /* gfx_v9_0_emit_mem_sync */
+ 2, /* gfx_v9_0_ring_emit_cleaner_shader */
.emit_ib_size = 4, /* gfx_v9_0_ring_emit_ib_gfx */
.emit_ib = gfx_v9_0_ring_emit_ib_gfx,
.emit_fence = gfx_v9_0_ring_emit_fence,
@@ -7495,6 +7527,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_sw_ring_funcs_gfx = {
.patch_cntl = gfx_v9_0_ring_patch_cntl,
.patch_de = gfx_v9_0_ring_patch_de_meta,
.patch_ce = gfx_v9_0_ring_patch_ce_meta,
+ .emit_cleaner_shader = gfx_v9_0_ring_emit_cleaner_shader,
};
static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
@@ -7515,7 +7548,8 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
8 + 8 + 8 + /* gfx_v9_0_ring_emit_fence x3 for user fence, vm fence */
7 + /* gfx_v9_0_emit_mem_sync */
5 + /* gfx_v9_0_emit_wave_limit for updating mmSPI_WCL_PIPE_PERCENT_GFX register */
- 15, /* for updating 3 mmSPI_WCL_PIPE_PERCENT_CS registers */
+ 15 + /* for updating 3 mmSPI_WCL_PIPE_PERCENT_CS registers */
+ 2, /* gfx_v9_0_ring_emit_cleaner_shader */
.emit_ib_size = 7, /* gfx_v9_0_ring_emit_ib_compute */
.emit_ib = gfx_v9_0_ring_emit_ib_compute,
.emit_fence = gfx_v9_0_ring_emit_fence,
@@ -7534,6 +7568,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
.emit_mem_sync = gfx_v9_0_emit_mem_sync,
.emit_wave_limit = gfx_v9_0_emit_wave_limit,
.reset = gfx_v9_0_reset_kcq,
+ .emit_cleaner_shader = gfx_v9_0_ring_emit_cleaner_shader,
};
static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0_cleaner_shader.h b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0_cleaner_shader.h
new file mode 100644
index 000000000000..36c0292b5110
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0_cleaner_shader.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+/* Define the cleaner shader gfx_9_0 */
+static const u32 __maybe_unused gfx_9_0_cleaner_shader_hex[] = {
+ /* Add the cleaner shader code here */
+};
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH 10/17] drm/amdgpu/gfx9: Implement cleaner shader support for GFX9.4.3 hardware
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (8 preceding siblings ...)
2024-08-15 0:04 ` [PATCH 09/17] drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-15 0:04 ` [PATCH 11/17] drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.3 Alex Deucher
` (7 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Srinivasan Shanmugam, Christian König, Alex Deucher
From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
The patch modifies the gfx_v9_4_3_kiq_set_resources function to write
the cleaner shader's memory controller address to the ring buffer. It
also adds a new function, gfx_v9_4_3_ring_emit_cleaner_shader, which
emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer.
This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the
gfx_v9_4_3 module. This packet is used to emit the cleaner shader, which
is used to clear GPU memory before it's reused, helping to prevent data
leakage between different processes.
Finally, the patch updates the ring function structures to include the
new gfx_v9_4_3_ring_emit_cleaner_shader function. This allows the
cleaner shader to be emitted as part of the ring's operations.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 57 +++++++++++++++++--
.../amd/amdgpu/gfx_v9_4_3_cleaner_shader.h | 26 +++++++++
2 files changed, 78 insertions(+), 5 deletions(-)
create mode 100644 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.h
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index 619ff3ec2c86..28f4212a8db2 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -37,6 +37,7 @@
#include "gc/gc_9_4_3_sh_mask.h"
#include "gfx_v9_4_3.h"
+#include "gfx_v9_4_3_cleaner_shader.h"
#include "amdgpu_xcp.h"
#include "amdgpu_aca.h"
@@ -169,6 +170,12 @@ static void gfx_v9_4_3_xcc_unset_safe_mode(struct amdgpu_device *adev, int xcc_i
static void gfx_v9_4_3_kiq_set_resources(struct amdgpu_ring *kiq_ring,
uint64_t queue_mask)
{
+ struct amdgpu_device *adev = kiq_ring->adev;
+ u64 shader_mc_addr;
+
+ /* Cleaner shader MC address */
+ shader_mc_addr = adev->gfx.cleaner_shader_gpu_addr >> 8;
+
amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_RESOURCES, 6));
amdgpu_ring_write(kiq_ring,
PACKET3_SET_RESOURCES_VMID_MASK(0) |
@@ -178,8 +185,8 @@ static void gfx_v9_4_3_kiq_set_resources(struct amdgpu_ring *kiq_ring,
lower_32_bits(queue_mask)); /* queue mask lo */
amdgpu_ring_write(kiq_ring,
upper_32_bits(queue_mask)); /* queue mask hi */
- amdgpu_ring_write(kiq_ring, 0); /* gws mask lo */
- amdgpu_ring_write(kiq_ring, 0); /* gws mask hi */
+ amdgpu_ring_write(kiq_ring, lower_32_bits(shader_mc_addr)); /* cleaner shader addr lo */
+ amdgpu_ring_write(kiq_ring, upper_32_bits(shader_mc_addr)); /* cleaner shader addr hi */
amdgpu_ring_write(kiq_ring, 0); /* oac mask */
amdgpu_ring_write(kiq_ring, 0); /* gds heap base:0, gds heap size:0 */
}
@@ -1047,6 +1054,24 @@ static int gfx_v9_4_3_sw_init(void *handle)
int i, j, k, r, ring_id, xcc_id, num_xcc;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ switch (amdgpu_ip_version(adev, GC_HWIP, 0)) {
+ case IP_VERSION(9, 4, 3):
+ adev->gfx.cleaner_shader_ptr = gfx_9_4_3_cleaner_shader_hex;
+ adev->gfx.cleaner_shader_size = sizeof(gfx_9_4_3_cleaner_shader_hex);
+ if (adev->gfx.mec_fw_version >= 153) {
+ adev->gfx.enable_cleaner_shader = true;
+ r = amdgpu_gfx_cleaner_shader_sw_init(adev, adev->gfx.cleaner_shader_size);
+ if (r) {
+ adev->gfx.enable_cleaner_shader = false;
+ dev_err(adev->dev, "Failed to initialize cleaner shader\n");
+ }
+ }
+ break;
+ default:
+ adev->gfx.enable_cleaner_shader = false;
+ break;
+ }
+
adev->gfx.mec.num_mec = 2;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 8;
@@ -1140,12 +1165,19 @@ static int gfx_v9_4_3_sw_init(void *handle)
return r;
- if (!amdgpu_sriov_vf(adev))
+ if (!amdgpu_sriov_vf(adev)) {
r = amdgpu_gfx_sysfs_init(adev);
+ if (r)
+ return r;
+ }
gfx_v9_4_3_alloc_ip_dump(adev);
- return r;
+ r = amdgpu_gfx_sysfs_isolation_shader_init(adev);
+ if (r)
+ return r;
+
+ return 0;
}
static int gfx_v9_4_3_sw_fini(void *handle)
@@ -1163,11 +1195,14 @@ static int gfx_v9_4_3_sw_fini(void *handle)
amdgpu_gfx_kiq_fini(adev, i);
}
+ amdgpu_gfx_cleaner_shader_sw_fini(adev);
+
gfx_v9_4_3_mec_fini(adev);
amdgpu_bo_unref(&adev->gfx.rlc.clear_state_obj);
gfx_v9_4_3_free_microcode(adev);
if (!amdgpu_sriov_vf(adev))
amdgpu_gfx_sysfs_fini(adev);
+ amdgpu_gfx_sysfs_isolation_shader_fini(adev);
kfree(adev->gfx.ip_dump_core);
kfree(adev->gfx.ip_dump_compute_queues);
@@ -2308,6 +2343,9 @@ static int gfx_v9_4_3_hw_init(void *handle)
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ amdgpu_gfx_cleaner_shader_init(adev, adev->gfx.cleaner_shader_size,
+ adev->gfx.cleaner_shader_ptr);
+
if (!amdgpu_sriov_vf(adev))
gfx_v9_4_3_init_golden_registers(adev);
@@ -4565,6 +4603,13 @@ static void gfx_v9_4_3_ip_dump(void *handle)
amdgpu_gfx_off_ctrl(adev, true);
}
+static void gfx_v9_4_3_ring_emit_cleaner_shader(struct amdgpu_ring *ring)
+{
+ /* Emit the cleaner shader */
+ amdgpu_ring_write(ring, PACKET3(PACKET3_RUN_CLEANER_SHADER, 0));
+ amdgpu_ring_write(ring, 0); /* RESERVED field, programmed to zero */
+}
+
static const struct amd_ip_funcs gfx_v9_4_3_ip_funcs = {
.name = "gfx_v9_4_3",
.early_init = gfx_v9_4_3_early_init,
@@ -4604,7 +4649,8 @@ static const struct amdgpu_ring_funcs gfx_v9_4_3_ring_funcs_compute = {
8 + 8 + 8 + /* gfx_v9_4_3_ring_emit_fence x3 for user fence, vm fence */
7 + /* gfx_v9_4_3_emit_mem_sync */
5 + /* gfx_v9_4_3_emit_wave_limit for updating regSPI_WCL_PIPE_PERCENT_GFX register */
- 15, /* for updating 3 regSPI_WCL_PIPE_PERCENT_CS registers */
+ 15 + /* for updating 3 regSPI_WCL_PIPE_PERCENT_CS registers */
+ 2, /* gfx_v9_4_3_ring_emit_cleaner_shader */
.emit_ib_size = 7, /* gfx_v9_4_3_ring_emit_ib_compute */
.emit_ib = gfx_v9_4_3_ring_emit_ib_compute,
.emit_fence = gfx_v9_4_3_ring_emit_fence,
@@ -4623,6 +4669,7 @@ static const struct amdgpu_ring_funcs gfx_v9_4_3_ring_funcs_compute = {
.emit_mem_sync = gfx_v9_4_3_emit_mem_sync,
.emit_wave_limit = gfx_v9_4_3_emit_wave_limit,
.reset = gfx_v9_4_3_reset_kcq,
+ .emit_cleaner_shader = gfx_v9_4_3_ring_emit_cleaner_shader,
};
static const struct amdgpu_ring_funcs gfx_v9_4_3_ring_funcs_kiq = {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.h b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.h
new file mode 100644
index 000000000000..042944ac75df
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/* Define the cleaner shader gfx_9_4_3 */
+static const u32 gfx_9_4_3_cleaner_shader_hex[] = {
+};
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH 11/17] drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.3
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (9 preceding siblings ...)
2024-08-15 0:04 ` [PATCH 10/17] drm/amdgpu/gfx9: Implement cleaner shader support for GFX9.4.3 hardware Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-15 0:04 ` [PATCH 12/17] drm/amdgpu/gfx9: Add cleaner shader support for GFX9.4.4 hardware Alex Deucher
` (6 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Srinivasan Shanmugam, Christian König, Alex Deucher
From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
This commit adds the cleaner shader microcode for GFX9.4.3 GPUs. The
cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs).
Clearing these resources is important for ensuring data isolation
between different workloads running on the GPU. Without the cleaner
shader, residual data from a previous workload could potentially be
accessed by a subsequent workload, leading to data leaks and incorrect
computation results.
The cleaner shader microcode is represented as an array of 32-bit words
(`gfx_9_4_3_cleaner_shader_hex`). This array is the binary
representation of the cleaner shader code, which is written in a
low-level GPU instruction set.
When the cleaner shader feature is enabled, the AMDGPU driver loads this
array into a specific location in the GPU memory. The GPU then reads
this memory location to fetch and execute the cleaner shader
instructions.
The cleaner shader is executed automatically by the GPU at the end of
each workload, before the next workload starts. This ensures that all
GPU resources are in a clean state before the start of each workload.
This addition is part of the cleaner shader feature implementation. The
cleaner shader feature helps improve GPU performance and resource
utilization by cleaning up GPU resources after they are used. It also
enhances security and reliability by preventing data leaks between
workloads.
v2: fix copyright date (Alex)
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
.../amd/amdgpu/gfx_v9_4_3_cleaner_shader.asm | 153 ++++++++++++++++++
.../amd/amdgpu/gfx_v9_4_3_cleaner_shader.h | 38 +++++
2 files changed, 191 insertions(+)
create mode 100644 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.asm
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.asm b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.asm
new file mode 100644
index 000000000000..d5325ef80ab0
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.asm
@@ -0,0 +1,153 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+// This shader is to clean LDS, SGPRs and VGPRs. It is first 64 Dwords or 256 bytes of 192 Dwords cleaner shader.
+//To turn this shader program on for complitaion change this to main and lower shader main to main_1
+
+// MI300 : Clear SGPRs, VGPRs and LDS
+// Uses two kernels launched separately:
+// 1. Clean VGPRs, LDS, and lower SGPRs
+// Launches one workgroup per CU, each workgroup with 4x wave64 per SIMD in the CU
+// Waves are "wave64" and have 128 VGPRs each, which uses all 512 VGPRs per SIMD
+// Waves in the workgroup share the 64KB of LDS
+// Each wave clears SGPRs 0 - 95. Because there are 4 waves/SIMD, this is physical SGPRs 0-383
+// Each wave clears 128 VGPRs, so all 512 in the SIMD
+// The first wave of the workgroup clears its 64KB of LDS
+// The shader starts with "S_BARRIER" to ensure SPI has launched all waves of the workgroup
+// before any wave in the workgroup could end. Without this, it is possible not all SGPRs get cleared.
+// 2. Clean remaining SGPRs
+// Launches a workgroup with 24 waves per workgroup, yielding 6 waves per SIMD in each CU
+// Waves are allocating 96 SGPRs
+// CP sets up SPI_RESOURCE_RESERVE_* registers to prevent these waves from allocating SGPRs 0-223.
+// As such, these 6 waves per SIMD are allocated physical SGPRs 224-799
+// Barriers do not work for >16 waves per workgroup, so we cannot start with S_BARRIER
+// Instead, the shader starts with an S_SETHALT 1. Once all waves are launched CP will send unhalt command
+// The shader then clears all SGPRs allocated to it, cleaning out physical SGPRs 224-799
+
+shader main
+ asic(MI300)
+ type(CS)
+ wave_size(64)
+// Note: original source code from SQ team
+
+// (theorhetical fastest = ~512clks vgpr + 1536 lds + ~128 sgpr = 2176 clks)
+
+ s_cmp_eq_u32 s0, 1 // Bit0 is set, sgpr0 is set then clear VGPRS and LDS as FW set COMPUTE_USER_DATA_3
+ s_cbranch_scc0 label_0023 // Clean VGPRs and LDS if sgpr0 of wave is set, scc = (s3 == 1)
+ S_BARRIER
+
+ s_movk_i32 m0, 0x0000
+ s_mov_b32 s2, 0x00000078 // Loop 128/8=16 times (loop unrolled for performance)
+ //
+ // CLEAR VGPRs
+ //
+ s_set_gpr_idx_on s2, 0x8 // enable Dest VGPR indexing
+label_0005:
+ v_mov_b32 v0, 0
+ v_mov_b32 v1, 0
+ v_mov_b32 v2, 0
+ v_mov_b32 v3, 0
+ v_mov_b32 v4, 0
+ v_mov_b32 v5, 0
+ v_mov_b32 v6, 0
+ v_mov_b32 v7, 0
+ s_sub_u32 s2, s2, 8
+ s_set_gpr_idx_idx s2
+ s_cbranch_scc0 label_0005
+ s_set_gpr_idx_off
+
+ //
+ //
+
+ s_mov_b32 s2, 0x80000000 // Bit31 is first_wave
+ s_and_b32 s2, s2, s1 // sgpr0 has tg_size (first_wave) term as in ucode only COMPUTE_PGM_RSRC2.tg_size_en is set
+ s_cbranch_scc0 label_clean_sgpr_1 // Clean LDS if its first wave of ThreadGroup/WorkGroup
+ // CLEAR LDS
+ //
+ s_mov_b32 exec_lo, 0xffffffff
+ s_mov_b32 exec_hi, 0xffffffff
+ v_mbcnt_lo_u32_b32 v1, exec_hi, 0 // Set V1 to thread-ID (0..63)
+ v_mbcnt_hi_u32_b32 v1, exec_lo, v1 // Set V1 to thread-ID (0..63)
+ v_mul_u32_u24 v1, 0x00000008, v1 // * 8, so each thread is a double-dword address (8byte)
+ s_mov_b32 s2, 0x00000003f // 64 loop iteraions
+ s_mov_b32 m0, 0xffffffff
+ // Clear all of LDS space
+ // Each FirstWave of WorkGroup clears 64kbyte block
+
+label_001F:
+ ds_write2_b64 v1, v[2:3], v[2:3] offset1:32
+ ds_write2_b64 v1, v[4:5], v[4:5] offset0:64 offset1:96
+ v_add_co_u32 v1, vcc, 0x00000400, v1
+ s_sub_u32 s2, s2, 1
+ s_cbranch_scc0 label_001F
+ //
+ // CLEAR SGPRs
+ //
+label_clean_sgpr_1:
+ s_mov_b32 m0, 0x0000005c // Loop 96/4=24 times (loop unrolled for performance)
+ s_nop 0
+label_sgpr_loop:
+ s_movreld_b32 s0, 0
+ s_movreld_b32 s1, 0
+ s_movreld_b32 s2, 0
+ s_movreld_b32 s3, 0
+ s_sub_u32 m0, m0, 4
+ s_cbranch_scc0 label_sgpr_loop
+
+ //clear vcc, flat scratch
+ s_mov_b32 flat_scratch_lo, 0 //clear flat scratch lo SGPR
+ s_mov_b32 flat_scratch_hi, 0 //clear flat scratch hi SGPR
+ s_mov_b64 vcc, 0 //clear vcc
+ s_mov_b64 ttmp0, 0 //Clear ttmp0 and ttmp1
+ s_mov_b64 ttmp2, 0 //Clear ttmp2 and ttmp3
+ s_mov_b64 ttmp4, 0 //Clear ttmp4 and ttmp5
+ s_mov_b64 ttmp6, 0 //Clear ttmp6 and ttmp7
+ s_mov_b64 ttmp8, 0 //Clear ttmp8 and ttmp9
+ s_mov_b64 ttmp10, 0 //Clear ttmp10 and ttmp11
+ s_mov_b64 ttmp12, 0 //Clear ttmp12 and ttmp13
+ s_mov_b64 ttmp14, 0 //Clear ttmp14 and ttmp15
+s_endpgm
+
+label_0023:
+
+ s_sethalt 1
+
+ s_mov_b32 m0, 0x0000005c // Loop 96/4=24 times (loop unrolled for performance)
+ s_nop 0
+label_sgpr_loop1:
+
+ s_movreld_b32 s0, 0
+ s_movreld_b32 s1, 0
+ s_movreld_b32 s2, 0
+ s_movreld_b32 s3, 0
+ s_sub_u32 m0, m0, 4
+ s_cbranch_scc0 label_sgpr_loop1
+
+ //clear vcc, flat scratch
+ s_mov_b32 flat_scratch_lo, 0 //clear flat scratch lo SGPR
+ s_mov_b32 flat_scratch_hi, 0 //clear flat scratch hi SGPR
+ s_mov_b64 vcc, 0xee //clear vcc
+
+s_endpgm
+end
+
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.h b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.h
index 042944ac75df..69aa567c6c1d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.h
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.h
@@ -23,4 +23,42 @@
/* Define the cleaner shader gfx_9_4_3 */
static const u32 gfx_9_4_3_cleaner_shader_hex[] = {
+ 0xbf068100, 0xbf84003b,
+ 0xbf8a0000, 0xb07c0000,
+ 0xbe8200ff, 0x00000078,
+ 0xbf110802, 0x7e000280,
+ 0x7e020280, 0x7e040280,
+ 0x7e060280, 0x7e080280,
+ 0x7e0a0280, 0x7e0c0280,
+ 0x7e0e0280, 0x80828802,
+ 0xbe803202, 0xbf84fff5,
+ 0xbf9c0000, 0xbe8200ff,
+ 0x80000000, 0x86020102,
+ 0xbf840011, 0xbefe00c1,
+ 0xbeff00c1, 0xd28c0001,
+ 0x0001007f, 0xd28d0001,
+ 0x0002027e, 0x10020288,
+ 0xbe8200bf, 0xbefc00c1,
+ 0xd89c2000, 0x00020201,
+ 0xd89c6040, 0x00040401,
+ 0x320202ff, 0x00000400,
+ 0x80828102, 0xbf84fff8,
+ 0xbefc00ff, 0x0000005c,
+ 0xbf800000, 0xbe802c80,
+ 0xbe812c80, 0xbe822c80,
+ 0xbe832c80, 0x80fc847c,
+ 0xbf84fffa, 0xbee60080,
+ 0xbee70080, 0xbeea0180,
+ 0xbeec0180, 0xbeee0180,
+ 0xbef00180, 0xbef20180,
+ 0xbef40180, 0xbef60180,
+ 0xbef80180, 0xbefa0180,
+ 0xbf810000, 0xbf8d0001,
+ 0xbefc00ff, 0x0000005c,
+ 0xbf800000, 0xbe802c80,
+ 0xbe812c80, 0xbe822c80,
+ 0xbe832c80, 0x80fc847c,
+ 0xbf84fffa, 0xbee60080,
+ 0xbee70080, 0xbeea01ff,
+ 0x000000ee, 0xbf810000,
};
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH 12/17] drm/amdgpu/gfx9: Add cleaner shader support for GFX9.4.4 hardware
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (10 preceding siblings ...)
2024-08-15 0:04 ` [PATCH 11/17] drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.3 Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-15 0:04 ` [PATCH 13/17] drm/amdkfd: APIs to stop/start KFD scheduling Alex Deucher
` (5 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Srinivasan Shanmugam, Christian König, Alex Deucher
From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
This commit extends the cleaner shader feature to support GFX9.4.4
hardware.
The cleaner shader feature is used to clear or initialize certain GPU
resources, such as Local Data Share (LDS), Vector General Purpose
Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). This
operation needs to be performed in isolation, while no other tasks
should be running on the GPU at the same time.
Previously, the cleaner shader feature was implemented for GFX9.4.3
hardware. This commit adds support for GFX9.4.4 hardware by allowing the
cleaner shader to be used with this hardware version.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index 28f4212a8db2..fa6752585a72 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -1056,6 +1056,7 @@ static int gfx_v9_4_3_sw_init(void *handle)
switch (amdgpu_ip_version(adev, GC_HWIP, 0)) {
case IP_VERSION(9, 4, 3):
+ case IP_VERSION(9, 4, 4):
adev->gfx.cleaner_shader_ptr = gfx_9_4_3_cleaner_shader_hex;
adev->gfx.cleaner_shader_size = sizeof(gfx_9_4_3_cleaner_shader_hex);
if (adev->gfx.mec_fw_version >= 153) {
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH 13/17] drm/amdkfd: APIs to stop/start KFD scheduling
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (11 preceding siblings ...)
2024-08-15 0:04 ` [PATCH 12/17] drm/amdgpu/gfx9: Add cleaner shader support for GFX9.4.4 hardware Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-15 0:04 ` [PATCH 14/17] drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization Alex Deucher
` (4 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Amber Lin, Alex Deucher
From: Amber Lin <Amber.Lin@amd.com>
Provide amdgpu_amdkfd_stop_sched() for amdgpu to stop KFD scheduling
compute work on HIQ. amdgpu_amdkfd_start_sched() resumes the scheduling.
When amdgpu_amdkfd_stop_sched is called, KFD will unmap queues from
runlist. If users send ioctls to KFD to create queues, they'll be added
but those queues won't be mapped to runlist (so not scheduled) until
amdgpu_amdkfd_start_sched is called.
v2: fix build (Alex)
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 18 ++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 14 +++++
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 39 +++++++++++++
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 58 ++++++++++++++++++-
.../drm/amd/amdkfd/kfd_device_queue_manager.h | 9 +++
5 files changed, 137 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index c272461d70a9..64a989cbc301 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -887,3 +887,21 @@ int amdgpu_amdkfd_unmap_hiq(struct amdgpu_device *adev, u32 doorbell_off,
return r;
}
+
+/* Stop scheduling on KFD */
+int amdgpu_amdkfd_stop_sched(struct amdgpu_device *adev, uint32_t node_id)
+{
+ if (!adev->kfd.init_complete)
+ return 0;
+
+ return kgd2kfd_stop_sched(adev->kfd.dev, node_id);
+}
+
+/* Start scheduling on KFD */
+int amdgpu_amdkfd_start_sched(struct amdgpu_device *adev, uint32_t node_id)
+{
+ if (!adev->kfd.init_complete)
+ return 0;
+
+ return kgd2kfd_start_sched(adev->kfd.dev, node_id);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 4ed49265c764..825c7ffe4bc9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -264,6 +264,8 @@ int amdgpu_amdkfd_send_close_event_drain_irq(struct amdgpu_device *adev,
uint32_t *payload);
int amdgpu_amdkfd_unmap_hiq(struct amdgpu_device *adev, u32 doorbell_off,
u32 inst);
+int amdgpu_amdkfd_start_sched(struct amdgpu_device *adev, uint32_t node_id);
+int amdgpu_amdkfd_stop_sched(struct amdgpu_device *adev, uint32_t node_id);
/* Read user wptr from a specified user address space with page fault
* disabled. The memory must be pinned and mapped to the hardware when
@@ -426,6 +428,8 @@ void kgd2kfd_set_sram_ecc_flag(struct kfd_dev *kfd);
void kgd2kfd_smi_event_throttle(struct kfd_dev *kfd, uint64_t throttle_bitmask);
int kgd2kfd_check_and_lock_kfd(void);
void kgd2kfd_unlock_kfd(void);
+int kgd2kfd_start_sched(struct kfd_dev *kfd, uint32_t node_id);
+int kgd2kfd_stop_sched(struct kfd_dev *kfd, uint32_t node_id);
#else
static inline int kgd2kfd_init(void)
{
@@ -496,5 +500,15 @@ static inline int kgd2kfd_check_and_lock_kfd(void)
static inline void kgd2kfd_unlock_kfd(void)
{
}
+
+static inline int kgd2kfd_start_sched(struct kfd_dev *kfd, uint32_t node_id)
+{
+ return 0;
+}
+
+static inline int kgd2kfd_stop_sched(struct kfd_dev *kfd, uint32_t node_id)
+{
+ return 0;
+}
#endif
#endif /* AMDGPU_AMDKFD_H_INCLUDED */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index c2d2598f776c..fad1c8f2bc83 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -1446,6 +1446,45 @@ void kgd2kfd_unlock_kfd(void)
mutex_unlock(&kfd_processes_mutex);
}
+int kgd2kfd_start_sched(struct kfd_dev *kfd, uint32_t node_id)
+{
+ struct kfd_node *node;
+ int ret;
+
+ if (!kfd->init_complete)
+ return 0;
+
+ if (node_id >= kfd->num_nodes) {
+ dev_warn(kfd->adev->dev, "Invalid node ID: %u exceeds %u\n",
+ node_id, kfd->num_nodes - 1);
+ return -EINVAL;
+ }
+ node = kfd->nodes[node_id];
+
+ ret = node->dqm->ops.unhalt(node->dqm);
+ if (ret)
+ dev_err(kfd_device, "Error in starting scheduler\n");
+
+ return ret;
+}
+
+int kgd2kfd_stop_sched(struct kfd_dev *kfd, uint32_t node_id)
+{
+ struct kfd_node *node;
+
+ if (!kfd->init_complete)
+ return 0;
+
+ if (node_id >= kfd->num_nodes) {
+ dev_warn(kfd->adev->dev, "Invalid node ID: %u exceeds %u\n",
+ node_id, kfd->num_nodes - 1);
+ return -EINVAL;
+ }
+
+ node = kfd->nodes[node_id];
+ return node->dqm->ops.halt(node->dqm);
+}
+
#if defined(CONFIG_DEBUG_FS)
/* This function will send a package to HIQ to hang the HWS
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index f6e211070299..d23388ea8181 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1679,6 +1679,60 @@ static int initialize_cpsch(struct device_queue_manager *dqm)
return 0;
}
+/* halt_cpsch:
+ * Unmap queues so the schedule doesn't continue remaining jobs in the queue.
+ * Then set dqm->sched_halt so queues don't map to runlist until unhalt_cpsch
+ * is called.
+ */
+static int halt_cpsch(struct device_queue_manager *dqm)
+{
+ int ret = 0;
+
+ dqm_lock(dqm);
+ if (!dqm->sched_running) {
+ dqm_unlock(dqm);
+ return 0;
+ }
+
+ WARN_ONCE(dqm->sched_halt, "Scheduling is already on halt\n");
+
+ if (!dqm->is_hws_hang) {
+ if (!dqm->dev->kfd->shared_resources.enable_mes)
+ ret = unmap_queues_cpsch(dqm,
+ KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0,
+ USE_DEFAULT_GRACE_PERIOD, false);
+ else
+ ret = remove_all_queues_mes(dqm);
+ }
+ dqm->sched_halt = true;
+ dqm_unlock(dqm);
+
+ return ret;
+}
+
+/* unhalt_cpsch
+ * Unset dqm->sched_halt and map queues back to runlist
+ */
+static int unhalt_cpsch(struct device_queue_manager *dqm)
+{
+ int ret = 0;
+
+ dqm_lock(dqm);
+ if (!dqm->sched_running || !dqm->sched_halt) {
+ WARN_ONCE(!dqm->sched_halt, "Scheduling is not on halt.\n");
+ dqm_unlock(dqm);
+ return 0;
+ }
+ dqm->sched_halt = false;
+ if (!dqm->dev->kfd->shared_resources.enable_mes)
+ ret = execute_queues_cpsch(dqm,
+ KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES,
+ 0, USE_DEFAULT_GRACE_PERIOD);
+ dqm_unlock(dqm);
+
+ return ret;
+}
+
static int start_cpsch(struct device_queue_manager *dqm)
{
struct device *dev = dqm->dev->adev->dev;
@@ -1984,7 +2038,7 @@ static int map_queues_cpsch(struct device_queue_manager *dqm)
struct device *dev = dqm->dev->adev->dev;
int retval;
- if (!dqm->sched_running)
+ if (!dqm->sched_running || dqm->sched_halt)
return 0;
if (dqm->active_queue_count <= 0 || dqm->processes_count <= 0)
return 0;
@@ -2727,6 +2781,8 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_node *dev)
dqm->ops.initialize = initialize_cpsch;
dqm->ops.start = start_cpsch;
dqm->ops.stop = stop_cpsch;
+ dqm->ops.halt = halt_cpsch;
+ dqm->ops.unhalt = unhalt_cpsch;
dqm->ops.destroy_queue = destroy_queue_cpsch;
dqm->ops.update_queue = update_queue;
dqm->ops.register_process = register_process;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index dfb36a246637..08b40826ad1e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -106,6 +106,12 @@ union GRBM_GFX_INDEX_BITS {
* @uninitialize: Destroys all the device queue manager resources allocated in
* initialize routine.
*
+ * @halt: This routine unmaps queues from runlist and set halt status to true
+ * so no more queues will be mapped to runlist until unhalt.
+ *
+ * @unhalt: This routine unset halt status to flase and maps queues back to
+ * runlist.
+ *
* @create_kernel_queue: Creates kernel queue. Used for debug queue.
*
* @destroy_kernel_queue: Destroys kernel queue. Used for debug queue.
@@ -153,6 +159,8 @@ struct device_queue_manager_ops {
int (*start)(struct device_queue_manager *dqm);
int (*stop)(struct device_queue_manager *dqm);
void (*uninitialize)(struct device_queue_manager *dqm);
+ int (*halt)(struct device_queue_manager *dqm);
+ int (*unhalt)(struct device_queue_manager *dqm);
int (*create_kernel_queue)(struct device_queue_manager *dqm,
struct kernel_queue *kq,
struct qcm_process_device *qpd);
@@ -264,6 +272,7 @@ struct device_queue_manager {
struct work_struct hw_exception_work;
struct kfd_mem_obj hiq_sdma_mqd;
bool sched_running;
+ bool sched_halt;
/* used for GFX 9.4.3 only */
uint32_t current_logical_xcc_start;
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH 14/17] drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (12 preceding siblings ...)
2024-08-15 0:04 ` [PATCH 13/17] drm/amdkfd: APIs to stop/start KFD scheduling Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-15 0:04 ` [PATCH 15/17] drm/amdgpu/gfx9: Apply Isolation Enforcement to GFX & Compute rings Alex Deucher
` (3 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Srinivasan Shanmugam, Christian König, Alex Deucher
From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
This commit introduces the Enforce Isolation Handler designed to enforce
shader isolation on AMD GPUs, which helps to prevent data leakage
between different processes.
The handler counts the number of emitted fences for each GFX and compute
ring. If there are any fences, it schedules the `enforce_isolation_work`
to be run after a delay of `GFX_SLICE_PERIOD`. If there are no fences,
it signals the Kernel Fusion Driver (KFD) to resume the runqueue.
The function is synchronized using the `enforce_isolation_mutex`.
This commit also introduces a reference count mechanism
(kfd_sch_req_count) to keep track of the number of requests to enable
the KFD scheduler. When a request to enable the KFD scheduler is made,
the reference count is decremented. When the reference count reaches
zero, a delayed work is scheduled to enforce isolation after a delay of
GFX_SLICE_PERIOD.
When a request to disable the KFD scheduler is made, the function first
checks if the reference count is zero. If it is, it cancels the delayed
work for enforcing isolation and checks if the KFD scheduler is active.
If the KFD scheduler is active, it sends a request to stop the KFD
scheduler and sets the KFD scheduler state to inactive. Then, it
increments the reference count.
The function is synchronized using the kfd_sch_mutex to ensure that the
KFD scheduler state and reference count are updated atomically.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 167 +++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 15 ++
4 files changed, 200 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index fac632986866..a5ef0407bce7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -118,6 +118,8 @@
#define MAX_GPU_INSTANCE 64
+#define GFX_SLICE_PERIOD msecs_to_jiffies(250)
+
struct amdgpu_gpu_instance {
struct amdgpu_device *adev;
int mgpu_fan_enabled;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a6e714d1fe4d..c47122a54dcb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4066,6 +4066,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
mutex_init(&adev->benchmark_mutex);
/* Initialize the mutex for cleaner shader isolation between GFX and compute processes */
mutex_init(&adev->enforce_isolation_mutex);
+ mutex_init(&adev->gfx.kfd_sch_mutex);
amdgpu_device_init_apu_flags(adev);
@@ -4097,6 +4098,21 @@ int amdgpu_device_init(struct amdgpu_device *adev,
amdgpu_device_delayed_init_work_handler);
INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
amdgpu_device_delay_enable_gfx_off);
+ /*
+ * Initialize the enforce_isolation work structures for each XCP
+ * partition. This work handler is responsible for enforcing shader
+ * isolation on AMD GPUs. It counts the number of emitted fences for
+ * each GFX and compute ring. If there are any fences, it schedules
+ * the `enforce_isolation_work` to be run after a delay. If there are
+ * no fences, it signals the Kernel Fusion Driver (KFD) to resume the
+ * runqueue.
+ */
+ for (i = 0; i < MAX_XCP; i++) {
+ INIT_DELAYED_WORK(&adev->gfx.enforce_isolation[i].work,
+ amdgpu_gfx_enforce_isolation_handler);
+ adev->gfx.enforce_isolation[i].adev = adev;
+ adev->gfx.enforce_isolation[i].xcp_id = i;
+ }
INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 76f77cf562af..b4efeef848de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1686,3 +1686,170 @@ void amdgpu_gfx_cleaner_shader_init(struct amdgpu_device *adev,
memcpy_toio(adev->gfx.cleaner_shader_cpu_ptr, cleaner_shader_ptr,
cleaner_shader_size);
}
+
+/**
+ * amdgpu_gfx_kfd_sch_ctrl - Control the KFD scheduler from the KGD (Graphics Driver)
+ * @adev: amdgpu_device pointer
+ * @idx: Index of the scheduler to control
+ * @enable: Whether to enable or disable the KFD scheduler
+ *
+ * This function is used to control the KFD (Kernel Fusion Driver) scheduler
+ * from the KGD. It is part of the cleaner shader feature. This function plays
+ * a key role in enforcing process isolation on the GPU.
+ *
+ * The function uses a reference count mechanism (kfd_sch_req_count) to keep
+ * track of the number of requests to enable the KFD scheduler. When a request
+ * to enable the KFD scheduler is made, the reference count is decremented.
+ * When the reference count reaches zero, a delayed work is scheduled to
+ * enforce isolation after a delay of GFX_SLICE_PERIOD.
+ *
+ * When a request to disable the KFD scheduler is made, the function first
+ * checks if the reference count is zero. If it is, it cancels the delayed work
+ * for enforcing isolation and checks if the KFD scheduler is active. If the
+ * KFD scheduler is active, it sends a request to stop the KFD scheduler and
+ * sets the KFD scheduler state to inactive. Then, it increments the reference
+ * count.
+ *
+ * The function is synchronized using the kfd_sch_mutex to ensure that the KFD
+ * scheduler state and reference count are updated atomically.
+ *
+ * Note: If the reference count is already zero when a request to enable the
+ * KFD scheduler is made, it means there's an imbalance bug somewhere. The
+ * function triggers a warning in this case.
+ */
+static void amdgpu_gfx_kfd_sch_ctrl(struct amdgpu_device *adev, u32 idx,
+ bool enable)
+{
+ mutex_lock(&adev->gfx.kfd_sch_mutex);
+
+ if (enable) {
+ /* If the count is already 0, it means there's an imbalance bug somewhere.
+ * Note that the bug may be in a different caller than the one which triggers the
+ * WARN_ON_ONCE.
+ */
+ if (WARN_ON_ONCE(adev->gfx.kfd_sch_req_count[idx] == 0)) {
+ dev_err(adev->dev, "Attempted to enable KFD scheduler when reference count is already zero\n");
+ goto unlock;
+ }
+
+ adev->gfx.kfd_sch_req_count[idx]--;
+
+ if (adev->gfx.kfd_sch_req_count[idx] == 0 &&
+ adev->gfx.kfd_sch_inactive[idx]) {
+ schedule_delayed_work(&adev->gfx.enforce_isolation[idx].work,
+ GFX_SLICE_PERIOD);
+ }
+ } else {
+ if (adev->gfx.kfd_sch_req_count[idx] == 0) {
+ cancel_delayed_work_sync(&adev->gfx.enforce_isolation[idx].work);
+ if (!adev->gfx.kfd_sch_inactive[idx]) {
+ amdgpu_amdkfd_stop_sched(adev, idx);
+ adev->gfx.kfd_sch_inactive[idx] = true;
+ }
+ }
+
+ adev->gfx.kfd_sch_req_count[idx]++;
+ }
+
+unlock:
+ mutex_unlock(&adev->gfx.kfd_sch_mutex);
+}
+
+/**
+ * amdgpu_gfx_enforce_isolation_handler - work handler for enforcing shader isolation
+ *
+ * @work: work_struct.
+ *
+ * This function is the work handler for enforcing shader isolation on AMD GPUs.
+ * It counts the number of emitted fences for each GFX and compute ring. If there
+ * are any fences, it schedules the `enforce_isolation_work` to be run after a
+ * delay of `GFX_SLICE_PERIOD`. If there are no fences, it signals the Kernel Fusion
+ * Driver (KFD) to resume the runqueue. The function is synchronized using the
+ * `enforce_isolation_mutex`.
+ */
+void amdgpu_gfx_enforce_isolation_handler(struct work_struct *work)
+{
+ struct amdgpu_isolation_work *isolation_work =
+ container_of(work, struct amdgpu_isolation_work, work.work);
+ struct amdgpu_device *adev = isolation_work->adev;
+ u32 i, idx, fences = 0;
+
+ if (isolation_work->xcp_id == AMDGPU_XCP_NO_PARTITION)
+ idx = 0;
+ else
+ idx = isolation_work->xcp_id;
+
+ if (idx >= MAX_XCP)
+ return;
+
+ mutex_lock(&adev->enforce_isolation_mutex);
+ for (i = 0; i < AMDGPU_MAX_GFX_RINGS; ++i) {
+ if (isolation_work->xcp_id == adev->gfx.gfx_ring[i].xcp_id)
+ fences += amdgpu_fence_count_emitted(&adev->gfx.gfx_ring[i]);
+ }
+ for (i = 0; i < (AMDGPU_MAX_COMPUTE_RINGS * AMDGPU_MAX_GC_INSTANCES); ++i) {
+ if (isolation_work->xcp_id == adev->gfx.compute_ring[i].xcp_id)
+ fences += amdgpu_fence_count_emitted(&adev->gfx.compute_ring[i]);
+ }
+ if (fences) {
+ schedule_delayed_work(&adev->gfx.enforce_isolation[idx].work,
+ GFX_SLICE_PERIOD);
+ } else {
+ /* Tell KFD to resume the runqueue */
+ if (adev->kfd.init_complete) {
+ WARN_ON_ONCE(!adev->gfx.kfd_sch_inactive[idx]);
+ WARN_ON_ONCE(adev->gfx.kfd_sch_req_count[idx]);
+ amdgpu_amdkfd_start_sched(adev, idx);
+ adev->gfx.kfd_sch_inactive[idx] = false;
+ }
+ }
+ mutex_unlock(&adev->enforce_isolation_mutex);
+}
+
+void amdgpu_gfx_enforce_isolation_ring_begin_use(struct amdgpu_ring *ring)
+{
+ struct amdgpu_device *adev = ring->adev;
+ u32 idx;
+
+ if (!adev->gfx.enable_cleaner_shader)
+ return;
+
+ if (ring->xcp_id == AMDGPU_XCP_NO_PARTITION)
+ idx = 0;
+ else
+ idx = ring->xcp_id;
+
+ if (idx >= MAX_XCP)
+ return;
+
+ mutex_lock(&adev->enforce_isolation_mutex);
+ if (adev->enforce_isolation[idx]) {
+ if (adev->kfd.init_complete)
+ amdgpu_gfx_kfd_sch_ctrl(adev, idx, false);
+ }
+ mutex_unlock(&adev->enforce_isolation_mutex);
+}
+
+void amdgpu_gfx_enforce_isolation_ring_end_use(struct amdgpu_ring *ring)
+{
+ struct amdgpu_device *adev = ring->adev;
+ u32 idx;
+
+ if (!adev->gfx.enable_cleaner_shader)
+ return;
+
+ if (ring->xcp_id == AMDGPU_XCP_NO_PARTITION)
+ idx = 0;
+ else
+ idx = ring->xcp_id;
+
+ if (idx >= MAX_XCP)
+ return;
+
+ mutex_lock(&adev->enforce_isolation_mutex);
+ if (adev->enforce_isolation[idx]) {
+ if (adev->kfd.init_complete)
+ amdgpu_gfx_kfd_sch_ctrl(adev, idx, true);
+ }
+ mutex_unlock(&adev->enforce_isolation_mutex);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index cb83b66aba89..eb556dee5a43 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -34,6 +34,7 @@
#include "soc15.h"
#include "amdgpu_ras.h"
#include "amdgpu_ring_mux.h"
+#include "amdgpu_xcp.h"
/* GFX current status */
#define AMDGPU_GFX_NORMAL_MODE 0x00000000L
@@ -343,6 +344,12 @@ struct amdgpu_me {
DECLARE_BITMAP(queue_bitmap, AMDGPU_MAX_GFX_QUEUES);
};
+struct amdgpu_isolation_work {
+ struct amdgpu_device *adev;
+ u32 xcp_id;
+ struct delayed_work work;
+};
+
struct amdgpu_gfx {
struct mutex gpu_clock_mutex;
struct amdgpu_gfx_config config;
@@ -452,6 +459,11 @@ struct amdgpu_gfx {
void *cleaner_shader_cpu_ptr;
const void *cleaner_shader_ptr;
bool enable_cleaner_shader;
+ struct amdgpu_isolation_work enforce_isolation[MAX_XCP];
+ /* Mutex for synchronizing KFD scheduler operations */
+ struct mutex kfd_sch_mutex;
+ u64 kfd_sch_req_count[MAX_XCP];
+ bool kfd_sch_inactive[MAX_XCP];
};
struct amdgpu_gfx_ras_reg_entry {
@@ -561,6 +573,9 @@ void amdgpu_gfx_cleaner_shader_init(struct amdgpu_device *adev,
const void *cleaner_shader_ptr);
int amdgpu_gfx_sysfs_isolation_shader_init(struct amdgpu_device *adev);
void amdgpu_gfx_sysfs_isolation_shader_fini(struct amdgpu_device *adev);
+void amdgpu_gfx_enforce_isolation_handler(struct work_struct *work);
+void amdgpu_gfx_enforce_isolation_ring_begin_use(struct amdgpu_ring *ring);
+void amdgpu_gfx_enforce_isolation_ring_end_use(struct amdgpu_ring *ring);
static inline const char *amdgpu_gfx_compute_mode_desc(int mode)
{
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH 15/17] drm/amdgpu/gfx9: Apply Isolation Enforcement to GFX & Compute rings
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (13 preceding siblings ...)
2024-08-15 0:04 ` [PATCH 14/17] drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization Alex Deucher
@ 2024-08-15 0:04 ` Alex Deucher
2024-08-22 6:35 ` Christian König
2024-08-15 0:05 ` [PATCH 16/17] drm/amdgpu/gfx_v9_4_3: " Alex Deucher
` (2 subsequent siblings)
17 siblings, 1 reply; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:04 UTC (permalink / raw)
To: amd-gfx; +Cc: Srinivasan Shanmugam, Christian König, Alex Deucher
From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
This commit applies isolation enforcement to the GFX and Compute rings
in the gfx_v9_0 module.
The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and
`amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be
called when a ring begins and ends its use, respectively.
`amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring
begins its use. This function cancels any scheduled
`enforce_isolation_work` and, if necessary, signals the Kernel Fusion
Driver (KFD) to stop the runqueue.
`amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends
its use. This function schedules `enforce_isolation_work` to be run
after a delay.
These functions are part of the Enforce Isolation Handler, which
enforces shader isolation on AMD GPUs to prevent data leakage between
different processes.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 3045b8b0796d..21089aadbb7b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -7470,6 +7470,8 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
.emit_mem_sync = gfx_v9_0_emit_mem_sync,
.reset = gfx_v9_0_reset_kgq,
.emit_cleaner_shader = gfx_v9_0_ring_emit_cleaner_shader,
+ .begin_use = amdgpu_gfx_enforce_isolation_ring_begin_use,
+ .end_use = amdgpu_gfx_enforce_isolation_ring_end_use,
};
static const struct amdgpu_ring_funcs gfx_v9_0_sw_ring_funcs_gfx = {
@@ -7528,6 +7530,8 @@ static const struct amdgpu_ring_funcs gfx_v9_0_sw_ring_funcs_gfx = {
.patch_de = gfx_v9_0_ring_patch_de_meta,
.patch_ce = gfx_v9_0_ring_patch_ce_meta,
.emit_cleaner_shader = gfx_v9_0_ring_emit_cleaner_shader,
+ .begin_use = amdgpu_gfx_enforce_isolation_ring_begin_use,
+ .end_use = amdgpu_gfx_enforce_isolation_ring_end_use,
};
static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
@@ -7569,6 +7573,8 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
.emit_wave_limit = gfx_v9_0_emit_wave_limit,
.reset = gfx_v9_0_reset_kcq,
.emit_cleaner_shader = gfx_v9_0_ring_emit_cleaner_shader,
+ .begin_use = amdgpu_gfx_enforce_isolation_ring_begin_use,
+ .end_use = amdgpu_gfx_enforce_isolation_ring_end_use,
};
static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = {
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [PATCH 15/17] drm/amdgpu/gfx9: Apply Isolation Enforcement to GFX & Compute rings
2024-08-15 0:04 ` [PATCH 15/17] drm/amdgpu/gfx9: Apply Isolation Enforcement to GFX & Compute rings Alex Deucher
@ 2024-08-22 6:35 ` Christian König
0 siblings, 0 replies; 23+ messages in thread
From: Christian König @ 2024-08-22 6:35 UTC (permalink / raw)
To: Alex Deucher, amd-gfx; +Cc: Srinivasan Shanmugam, Christian König
Am 15.08.24 um 02:04 schrieb Alex Deucher:
> From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
>
> This commit applies isolation enforcement to the GFX and Compute rings
> in the gfx_v9_0 module.
>
> The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and
> `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be
> called when a ring begins and ends its use, respectively.
>
> `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring
> begins its use. This function cancels any scheduled
> `enforce_isolation_work` and, if necessary, signals the Kernel Fusion
> Driver (KFD) to stop the runqueue.
>
> `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends
> its use. This function schedules `enforce_isolation_work` to be run
> after a delay.
>
> These functions are part of the Enforce Isolation Handler, which
> enforces shader isolation on AMD GPUs to prevent data leakage between
> different processes.
>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Suggested-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 3045b8b0796d..21089aadbb7b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -7470,6 +7470,8 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
> .emit_mem_sync = gfx_v9_0_emit_mem_sync,
> .reset = gfx_v9_0_reset_kgq,
> .emit_cleaner_shader = gfx_v9_0_ring_emit_cleaner_shader,
> + .begin_use = amdgpu_gfx_enforce_isolation_ring_begin_use,
> + .end_use = amdgpu_gfx_enforce_isolation_ring_end_use,
Does it make sense to have that for the MCBP ring as well?
Christian.
> };
>
> static const struct amdgpu_ring_funcs gfx_v9_0_sw_ring_funcs_gfx = {
> @@ -7528,6 +7530,8 @@ static const struct amdgpu_ring_funcs gfx_v9_0_sw_ring_funcs_gfx = {
> .patch_de = gfx_v9_0_ring_patch_de_meta,
> .patch_ce = gfx_v9_0_ring_patch_ce_meta,
> .emit_cleaner_shader = gfx_v9_0_ring_emit_cleaner_shader,
> + .begin_use = amdgpu_gfx_enforce_isolation_ring_begin_use,
> + .end_use = amdgpu_gfx_enforce_isolation_ring_end_use,
> };
>
> static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
> @@ -7569,6 +7573,8 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
> .emit_wave_limit = gfx_v9_0_emit_wave_limit,
> .reset = gfx_v9_0_reset_kcq,
> .emit_cleaner_shader = gfx_v9_0_ring_emit_cleaner_shader,
> + .begin_use = amdgpu_gfx_enforce_isolation_ring_begin_use,
> + .end_use = amdgpu_gfx_enforce_isolation_ring_end_use,
> };
>
> static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = {
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 16/17] drm/amdgpu/gfx_v9_4_3: Apply Isolation Enforcement to GFX & Compute rings
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (14 preceding siblings ...)
2024-08-15 0:04 ` [PATCH 15/17] drm/amdgpu/gfx9: Apply Isolation Enforcement to GFX & Compute rings Alex Deucher
@ 2024-08-15 0:05 ` Alex Deucher
2024-08-15 0:05 ` [PATCH 17/17] drm/amdkfd: Enable processes isolation on gfx9 Alex Deucher
2024-08-22 6:36 ` [PATCH 00/17] Process Isolation Support Christian König
17 siblings, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:05 UTC (permalink / raw)
To: amd-gfx; +Cc: Srinivasan Shanmugam, Christian König, Alex Deucher
From: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
This commit applies isolation enforcement to the GFX and Compute rings
in the gfx_v9_4_3 module.
The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and
`amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be
called when a ring begins and ends its use, respectively.
`amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring
begins its use. This function cancels any scheduled
`enforce_isolation_work` and, if necessary, signals the Kernel Fusion
Driver (KFD) to stop the runqueue.
`amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends
its use. This function schedules `enforce_isolation_work` to be run
after a delay.
These functions are part of the Enforce Isolation Handler, which
enforces shader isolation on AMD GPUs to prevent data leakage between
different processes.
The commit also includes a check for the type of the ring. If the type
of the ring is `AMDGPU_RING_TYPE_COMPUTE`, the `xcp_id` of the
`enforce_isolation` structure in the `gfx` structure of the
`amdgpu_device` is set to the `xcp_id` of the ring. This ensures that
the correct `xcp_id` is used when enforcing isolation on compute rings.
The `xcp_id` is an identifier for an XCP partition, and different rings
can be associated with different XCP partitions.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
---
drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 4 ++++
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 ++
2 files changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
index 228fd4dd32f1..26e2188101e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
@@ -75,6 +75,8 @@ static void aqua_vanjaram_set_xcp_id(struct amdgpu_device *adev,
uint32_t inst_mask;
ring->xcp_id = AMDGPU_XCP_NO_PARTITION;
+ if (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE)
+ adev->gfx.enforce_isolation[0].xcp_id = ring->xcp_id;
if (adev->xcp_mgr->mode == AMDGPU_XCP_MODE_NONE)
return;
@@ -103,6 +105,8 @@ static void aqua_vanjaram_set_xcp_id(struct amdgpu_device *adev,
for (xcp_id = 0; xcp_id < adev->xcp_mgr->num_xcps; xcp_id++) {
if (adev->xcp_mgr->xcp[xcp_id].ip[ip_blk].inst_mask & inst_mask) {
ring->xcp_id = xcp_id;
+ if (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE)
+ adev->gfx.enforce_isolation[xcp_id].xcp_id = xcp_id;
break;
}
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index fa6752585a72..2067f26d3a9d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -4671,6 +4671,8 @@ static const struct amdgpu_ring_funcs gfx_v9_4_3_ring_funcs_compute = {
.emit_wave_limit = gfx_v9_4_3_emit_wave_limit,
.reset = gfx_v9_4_3_reset_kcq,
.emit_cleaner_shader = gfx_v9_4_3_ring_emit_cleaner_shader,
+ .begin_use = amdgpu_gfx_enforce_isolation_ring_begin_use,
+ .end_use = amdgpu_gfx_enforce_isolation_ring_end_use,
};
static const struct amdgpu_ring_funcs gfx_v9_4_3_ring_funcs_kiq = {
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH 17/17] drm/amdkfd: Enable processes isolation on gfx9
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (15 preceding siblings ...)
2024-08-15 0:05 ` [PATCH 16/17] drm/amdgpu/gfx_v9_4_3: " Alex Deucher
@ 2024-08-15 0:05 ` Alex Deucher
2024-08-22 6:36 ` [PATCH 00/17] Process Isolation Support Christian König
17 siblings, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2024-08-15 0:05 UTC (permalink / raw)
To: amd-gfx; +Cc: Amber Lin, Alex Deucher
From: Amber Lin <Amber.Lin@amd.com>
When amdgpu enable enforce_isolation, KFD enables single-process mode in
HWS and sets exec_cleaner_shader bit in MAP_PROCESS.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c | 14 +++++++++++++-
drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h | 5 +++--
.../gpu/drm/amd/amdkfd/kfd_pm4_headers_aldebaran.h | 2 +-
3 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
index 00776f08351c..1f9f5bfeaf86 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -37,11 +37,14 @@ static int pm_map_process_v9(struct packet_manager *pm,
struct kfd_node *kfd = pm->dqm->dev;
struct kfd_process_device *pdd =
container_of(qpd, struct kfd_process_device, qpd);
+ struct amdgpu_device *adev = kfd->adev;
packet = (struct pm4_mes_map_process *)buffer;
memset(buffer, 0, sizeof(struct pm4_mes_map_process));
packet->header.u32All = pm_build_pm4_header(IT_MAP_PROCESS,
sizeof(struct pm4_mes_map_process));
+ if (adev->enforce_isolation[kfd->node_id])
+ packet->bitfields2.exec_cleaner_shader = 1;
packet->bitfields2.diq_enable = (qpd->is_debug) ? 1 : 0;
packet->bitfields2.process_quantum = 10;
packet->bitfields2.pasid = qpd->pqm->process->pasid;
@@ -89,14 +92,18 @@ static int pm_map_process_aldebaran(struct packet_manager *pm,
struct pm4_mes_map_process_aldebaran *packet;
uint64_t vm_page_table_base_addr = qpd->page_table_base;
struct kfd_dev *kfd = pm->dqm->dev->kfd;
+ struct kfd_node *knode = pm->dqm->dev;
struct kfd_process_device *pdd =
container_of(qpd, struct kfd_process_device, qpd);
int i;
+ struct amdgpu_device *adev = kfd->adev;
packet = (struct pm4_mes_map_process_aldebaran *)buffer;
memset(buffer, 0, sizeof(struct pm4_mes_map_process_aldebaran));
packet->header.u32All = pm_build_pm4_header(IT_MAP_PROCESS,
sizeof(struct pm4_mes_map_process_aldebaran));
+ if (adev->enforce_isolation[knode->node_id])
+ packet->bitfields2.exec_cleaner_shader = 1;
packet->bitfields2.diq_enable = (qpd->is_debug) ? 1 : 0;
packet->bitfields2.process_quantum = 10;
packet->bitfields2.pasid = qpd->pqm->process->pasid;
@@ -144,17 +151,22 @@ static int pm_runlist_v9(struct packet_manager *pm, uint32_t *buffer,
int concurrent_proc_cnt = 0;
struct kfd_node *kfd = pm->dqm->dev;
+ struct amdgpu_device *adev = kfd->adev;
/* Determine the number of processes to map together to HW:
* it can not exceed the number of VMIDs available to the
* scheduler, and it is determined by the smaller of the number
* of processes in the runlist and kfd module parameter
* hws_max_conc_proc.
+ * However, if enforce_isolation is set (toggle LDS/VGPRs/SGPRs
+ * cleaner between process switch), enable single-process mode
+ * in HWS.
* Note: the arbitration between the number of VMIDs and
* hws_max_conc_proc has been done in
* kgd2kfd_device_init().
*/
- concurrent_proc_cnt = min(pm->dqm->processes_count,
+ concurrent_proc_cnt = adev->enforce_isolation[kfd->node_id] ?
+ 1 : min(pm->dqm->processes_count,
kfd->max_proc_per_quantum);
packet = (struct pm4_mes_runlist *)buffer;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
index 8b6b2bd5c148..cd8611401a66 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
@@ -145,8 +145,9 @@ struct pm4_mes_map_process {
union {
struct {
- uint32_t pasid:16;
- uint32_t reserved1:2;
+ uint32_t pasid:16; /* 0 - 15 */
+ uint32_t reserved1:1; /* 16 */
+ uint32_t exec_cleaner_shader:1; /* 17 */
uint32_t debug_vmid:4;
uint32_t new_debug:1;
uint32_t reserved2:1;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_aldebaran.h b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_aldebaran.h
index 38f5cb6a222a..e0ed62c4ade0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_aldebaran.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_aldebaran.h
@@ -37,7 +37,7 @@ struct pm4_mes_map_process_aldebaran {
struct {
uint32_t pasid:16; /* 0 - 15 */
uint32_t single_memops:1; /* 16 */
- uint32_t reserved1:1; /* 17 */
+ uint32_t exec_cleaner_shader:1; /* 17 */
uint32_t debug_vmid:4; /* 18 - 21 */
uint32_t new_debug:1; /* 22 */
uint32_t tmz:1; /* 23 */
--
2.46.0
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [PATCH 00/17] Process Isolation Support
2024-08-15 0:04 [PATCH 00/17] Process Isolation Support Alex Deucher
` (16 preceding siblings ...)
2024-08-15 0:05 ` [PATCH 17/17] drm/amdkfd: Enable processes isolation on gfx9 Alex Deucher
@ 2024-08-22 6:36 ` Christian König
17 siblings, 0 replies; 23+ messages in thread
From: Christian König @ 2024-08-22 6:36 UTC (permalink / raw)
To: Alex Deucher, amd-gfx, Srinivasan Shanmugam
Reviewed-by: Christian König <christian.koenig@amd.com> for patches #1,
#3-#5,
Acked-by: Christian König <christian.koenig@amd.com> for patches #6,
#8-#10, #12, #13, #14, #17.
Some nit picks in patches #2, use of // for comments in patch #11 and
question on patch #15/#16
Really big bug in patch #7, that needs to be addressed.
Regards,
Christian.
Am 15.08.24 um 02:04 schrieb Alex Deucher:
> This patch set enables process isolation mode which
> serializes access to the graphics block between processes.
> When this mode is active, a cleaner shader is run between
> processes to clear shader LDS (Local Data Store) and GPRs
> (General Purpose Registers). A sysfs interface is
> also available to manually clear LDS and GPRs if you
> for example want to clear LDS and GPRs when a user logs out.
>
> This includes support for GFX 9.4.3 and 9.4.4. Support for
> other GPUs is in progress and will be available when ready.
>
> Alex Deucher (2):
> drm/amdgpu: handle enforce isolation on non-0 gfxhub
> drm/amdgpu: Emit cleaner shader at end of IB submission
>
> Amber Lin (2):
> drm/amdkfd: APIs to stop/start KFD scheduling
> drm/amdkfd: Enable processes isolation on gfx9
>
> Srinivasan Shanmugam (13):
> drm/amdgpu: Add infrastructure for Cleaner Shader feature
> drm/amdgpu: Make enforce_isolation setting per GPU
> drm/amdgpu: Enforce isolation as part of the job
> drm/amdgpu: Add enforce_isolation sysfs attribute
> drm/amdgpu: Add sysfs interface for running cleaner shader
> drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER for cleaner shader
> execution
> drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware
> drm/amdgpu/gfx9: Implement cleaner shader support for GFX9.4.3
> hardware
> drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.3
> drm/amdgpu/gfx9: Add cleaner shader support for GFX9.4.4 hardware
> drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD
> serialization
> drm/amdgpu/gfx9: Apply Isolation Enforcement to GFX & Compute rings
> drm/amdgpu/gfx_v9_4_3: Apply Isolation Enforcement to GFX & Compute
> rings
>
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 18 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 14 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 23 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 437 ++++++++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 31 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 17 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h | 3 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 3 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 +
> drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 4 +
> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 51 +-
> .../drm/amd/amdgpu/gfx_v9_0_cleaner_shader.h | 26 ++
> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 60 ++-
> .../amd/amdgpu/gfx_v9_4_3_cleaner_shader.asm | 153 ++++++
> .../amd/amdgpu/gfx_v9_4_3_cleaner_shader.h | 64 +++
> drivers/gpu/drm/amd/amdgpu/soc15d.h | 4 +
> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 39 ++
> .../drm/amd/amdkfd/kfd_device_queue_manager.c | 58 ++-
> .../drm/amd/amdkfd/kfd_device_queue_manager.h | 9 +
> .../drm/amd/amdkfd/kfd_packet_manager_v9.c | 14 +-
> .../gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h | 5 +-
> .../amd/amdkfd/kfd_pm4_headers_aldebaran.h | 2 +-
> 25 files changed, 1028 insertions(+), 23 deletions(-)
> create mode 100644 drivers/gpu/drm/amd/amdgpu/gfx_v9_0_cleaner_shader.h
> create mode 100644 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.asm
> create mode 100644 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3_cleaner_shader.h
>
^ permalink raw reply [flat|nested] 23+ messages in thread