[PATCH 1/4] drm/amdgpu: drop executable permission from the CSA mapping

* [PATCH 1/4] drm/amdgpu: drop executable permission from the CSA mapping
@ 2026-06-17 12:51 Xiang Liu
  2026-06-17 12:51 ` [PATCH 2/4] drm/amdgpu: return CSA kernel mapping via out parameter Xiang Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Xiang Liu @ 2026-06-17 12:51 UTC (permalink / raw)
  To: amd-gfx; +Cc: Hawking.Zhang, Tao.Zhou1, Stanley.Yang, YiPeng.Chai, Xiang Liu

The Context Save Area only ever holds CP preemption/resume (CE/DE)
metadata; it never contains shader code that the GPU needs to fetch and
execute. Mapping it executable in the process GPUVM is therefore
unnecessary and needlessly widens the attack surface of a GPU-writeable
buffer.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 7b46bffb10ccb..814cb9e903586 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -93,8 +93,7 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	}
 
 	r = amdgpu_vm_bo_map(adev, *bo_va, csa_addr, 0, size,
-			     AMDGPU_PTE_READABLE | AMDGPU_PTE_WRITEABLE |
-			     AMDGPU_PTE_EXECUTABLE);
+			     AMDGPU_PTE_READABLE | AMDGPU_PTE_WRITEABLE);
 
 	if (r) {
 		drm_err(adev_to_drm(adev),
-- 
2.34.1
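
For readers following the thread, here is a minimal sketch of the permission change in patch 1. The SKETCH_PTE_* values and both helper functions are hypothetical names made up for illustration; the real AMDGPU_PTE_* definitions live in the driver headers and may differ. The sketch only shows that a metadata-only buffer mapped readable and writeable is no longer both writeable and executable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only; they stand in for the
 * driver's AMDGPU_PTE_* bits and are not taken from the patch. */
#define SKETCH_PTE_EXECUTABLE (1ULL << 4)
#define SKETCH_PTE_READABLE   (1ULL << 5)
#define SKETCH_PTE_WRITEABLE  (1ULL << 6)

/* Flags for a metadata-only buffer such as the CSA after the patch:
 * readable and writeable, never executable. */
uint64_t sketch_csa_pte_flags(void)
{
	return SKETCH_PTE_READABLE | SKETCH_PTE_WRITEABLE;
}

/* True when a mapping is both writeable and executable. */
bool sketch_is_write_exec(uint64_t flags)
{
	return (flags & SKETCH_PTE_WRITEABLE) &&
	       (flags & SKETCH_PTE_EXECUTABLE);
}
```

A caller could use sketch_is_write_exec() as a sanity check on any flag combination before handing it to a mapping routine.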


* [PATCH 2/4] drm/amdgpu: return CSA kernel mapping via out parameter
  2026-06-17 12:51 [PATCH 1/4] drm/amdgpu: drop executable permission from the CSA mapping Xiang Liu
@ 2026-06-17 12:51 ` Xiang Liu
  2026-06-17 12:51 ` [PATCH 3/4] drm/amdgpu: read back CE/DE preemption state via a per-ring CSA pointer Xiang Liu
  2026-06-17 12:51 ` [PATCH 4/4] drm/amdgpu: allocate a per-process CSA to isolate scheduler state Xiang Liu
  2 siblings, 0 replies; 6+ messages in thread
From: Xiang Liu @ 2026-06-17 12:51 UTC (permalink / raw)
  To: amd-gfx; +Cc: Hawking.Zhang, Tao.Zhou1, Stanley.Yang, YiPeng.Chai, Xiang Liu

amdgpu_allocate_static_csa() stored the kernel CPU mapping of the newly
created CSA directly into adev->virt.csa_cpu_addr. That hard-codes the
single device-global CSA and prevents the function from being reused to
allocate additional CSA buffers.

Return the CPU mapping through an out parameter instead and let the
caller decide where to store it.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c    | 4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.h    | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 814cb9e903586..083a2cf20324b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -40,7 +40,7 @@ uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
 }
 
 int amdgpu_allocate_static_csa(struct amdgpu_device *adev, struct amdgpu_bo **bo,
-				u32 domain, uint32_t size)
+				u32 domain, uint32_t size, void **cpu_ptr)
 {
 	void *ptr;
 
@@ -51,7 +51,7 @@ int amdgpu_allocate_static_csa(struct amdgpu_device *adev, struct amdgpu_bo **bo
 		return -ENOMEM;
 
 	memset(ptr, 0, size);
-	adev->virt.csa_cpu_addr = ptr;
+	*cpu_ptr = ptr;
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.h
index 7dfc1f2012ebf..34b85cda1cc56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.h
@@ -30,7 +30,7 @@
 uint32_t amdgpu_get_total_csa_size(struct amdgpu_device *adev);
 uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev);
 int amdgpu_allocate_static_csa(struct amdgpu_device *adev, struct amdgpu_bo **bo,
-				u32 domain, uint32_t size);
+				u32 domain, uint32_t size, void **cpu_ptr);
 int amdgpu_map_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			  struct amdgpu_bo *bo, struct amdgpu_bo_va **bo_va,
 			  uint64_t csa_addr, uint32_t size);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1253df9b1a9d2..a8e76fae11282 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2422,7 +2422,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 				r = amdgpu_allocate_static_csa(adev, &adev->virt.csa_obj,
 							       AMDGPU_GEM_DOMAIN_VRAM |
 							       AMDGPU_GEM_DOMAIN_GTT,
-							       AMDGPU_CSA_SIZE);
+							       AMDGPU_CSA_SIZE,
+							       &adev->virt.csa_cpu_addr);
 				if (r) {
 					dev_err(adev->dev,
 						"allocate CSA failed %d\n", r);
-- 
2.34.1
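
The out-parameter pattern from patch 2 can be sketched in plain userspace C. Everything below (struct sketch_buf, sketch_allocate_csa(), sketch_free_csa(), and malloc() standing in for the kernel BO allocation) is an assumption made for illustration and is not the driver's API. The point is only that the allocator no longer writes to a fixed global field, so the same function can serve several owners.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a buffer object. */
struct sketch_buf {
	void   *backing;
	size_t  size;
};

/* Allocate and zero a buffer, returning its CPU mapping through an out
 * parameter so the caller decides where to store it. */
int sketch_allocate_csa(struct sketch_buf *buf, size_t size, void **cpu_ptr)
{
	void *ptr = malloc(size);

	if (!ptr)
		return -ENOMEM;

	memset(ptr, 0, size);
	buf->backing = ptr;
	buf->size = size;
	*cpu_ptr = ptr;
	return 0;
}

/* Release the buffer and clear the caller's copy of the CPU mapping. */
void sketch_free_csa(struct sketch_buf *buf, void **cpu_ptr)
{
	free(buf->backing);
	buf->backing = NULL;
	buf->size = 0;
	*cpu_ptr = NULL;
}
```

With this shape, a device-wide owner and a per-client owner can each pass the address of their own pointer field and receive distinct mappings.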


* [PATCH 3/4] drm/amdgpu: read back CE/DE preemption state via a per-ring CSA pointer
  2026-06-17 12:51 [PATCH 1/4] drm/amdgpu: drop executable permission from the CSA mapping Xiang Liu
  2026-06-17 12:51 ` [PATCH 2/4] drm/amdgpu: return CSA kernel mapping via out parameter Xiang Liu
@ 2026-06-17 12:51 ` Xiang Liu
  2026-06-17 12:51 ` [PATCH 4/4] drm/amdgpu: allocate a per-process CSA to isolate scheduler state Xiang Liu
  2 siblings, 0 replies; 6+ messages in thread
From: Xiang Liu @ 2026-06-17 12:51 UTC (permalink / raw)
  To: amd-gfx; +Cc: Hawking.Zhang, Tao.Zhou1, Stanley.Yang, YiPeng.Chai, Xiang Liu

The MCBP preemption resume path reads back the saved CE/DE payload from
the CSA kernel mapping. The gfx9/10/11 emit and patch helpers hard-coded
adev->virt.csa_cpu_addr, which only works while there is a single
device-global CSA shared by all processes.

Introduce a per-ring csa_cpu_addr that records which CSA kernel mapping
the state must be read back from for the jobs currently emitted on that
ring, and set it in amdgpu_ib_schedule() from the job's VM
(vm->csa_cpu_addr) with a fallback to the global mapping. For the gfx9
software-ring mux, carry the value in the saved chunk so the deferred
resubmission reads back the correct CSA. Convert the gfx9/10/11 helpers
to use ring->csa_cpu_addr.

vm->csa_cpu_addr is always NULL for now, so every lookup still resolves
to adev->virt.csa_cpu_addr and there is no functional change. This only
prepares the readback path for per-process CSA buffers.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c       |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h     |  6 ++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h       |  6 ++++++
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c       |  5 ++---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c       |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c        | 11 ++++-------
 8 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 3099379af0b29..bec2fe6b35968 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -150,6 +150,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
 	/* ring tests don't use a job */
 	if (job) {
 		vm = job->vm;
+		ring->csa_cpu_addr = (vm && vm->csa_cpu_addr) ?
+			vm->csa_cpu_addr : adev->virt.csa_cpu_addr;
 		fence_ctx = job->base.s_fence ?
 			job->base.s_fence->finished.context : 0;
 		shadow_va = job->shadow_va;
@@ -166,6 +168,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
 		job->hw_vm_fence->context = fence_ctx;
 	} else {
 		vm = NULL;
+		ring->csa_cpu_addr = adev->virt.csa_cpu_addr;
 		fence_ctx = 0;
 		shadow_va = 0;
 		csa_va = 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 8f28b3bd70106..6d675a08df746 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -424,6 +424,12 @@ struct amdgpu_ring {
 
 	bool            is_sw_ring;
 	unsigned int    entry_index;
+
+	/* CPU mapping of the CSA whose CE/DE preemption state must be read
+	 * back for jobs currently emitted on this ring. Updated per job.
+	 */
+	void		*csa_cpu_addr;
+
 	/* store the cached rptr to restore after reset */
 	uint64_t cached_rptr;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
index 7e7d6c3865bcd..ba8c8ed778d33 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -107,6 +107,7 @@ static void amdgpu_mux_resubmit_chunks(struct amdgpu_ring_mux *mux)
 								    ktime_get());
 				if (chunk->sync_seq ==
 					le32_to_cpu(*(e->ring->fence_drv.cpu_addr + 2))) {
+					e->ring->csa_cpu_addr = chunk->csa_cpu_addr;
 					if (chunk->cntl_offset <= e->ring->buf_mask)
 						amdgpu_ring_patch_cntl(e->ring,
 								       chunk->cntl_offset);
@@ -456,6 +457,7 @@ void amdgpu_ring_mux_start_ib(struct amdgpu_ring_mux *mux, struct amdgpu_ring *r
 	chunk->cntl_offset = ring->buf_mask + 1;
 	chunk->de_offset = ring->buf_mask + 1;
 	chunk->ce_offset = ring->buf_mask + 1;
+	chunk->csa_cpu_addr = ring->csa_cpu_addr;
 	list_add_tail(&chunk->entry, &e->list);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
index d3186b570b82e..e41e49ad7ddc3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
@@ -99,6 +99,7 @@ struct amdgpu_mux_chunk {
 	u64                     cntl_offset;
 	u64                     de_offset;
 	u64                     ce_offset;
+	void			*csa_cpu_addr;
 };
 
 int amdgpu_ring_mux_init(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index a1698fb41c4af..5c85c38588374 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -361,6 +361,12 @@ struct amdgpu_vm {
 	bool			evicting;
 	unsigned int		saved_flags;
 
+	/* Kernel CPU mapping of this VM's private Context Save Area, used
+	 * by the MCBP preemption resume path to read back this VM's saved
+	 * CE/DE state. NULL when the VM has no private CSA.
+	 */
+	void			*csa_cpu_addr;
+
 	/* Memory statistics for this vm, protected by stats_lock */
 	spinlock_t		stats_lock;
 	struct amdgpu_mem_stats stats[__AMDGPU_PL_NUM];
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 07659f039f804..a8a9d54649eb2 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -9011,7 +9011,6 @@ static int gfx_v10_0_ring_preempt_ib(struct amdgpu_ring *ring)
 
 static void gfx_v10_0_ring_emit_ce_meta(struct amdgpu_ring *ring, bool resume)
 {
-	struct amdgpu_device *adev = ring->adev;
 	struct v10_ce_ib_state ce_payload = {0};
 	uint64_t offset, ce_payload_gpu_addr;
 	void *ce_payload_cpu_addr;
@@ -9021,7 +9020,7 @@ static void gfx_v10_0_ring_emit_ce_meta(struct amdgpu_ring *ring, bool resume)
 
 	offset = offsetof(struct v10_gfx_meta_data, ce_payload);
 	ce_payload_gpu_addr = amdgpu_csa_vaddr(ring->adev) + offset;
-	ce_payload_cpu_addr = adev->virt.csa_cpu_addr + offset;
+	ce_payload_cpu_addr = ring->csa_cpu_addr + offset;
 
 	amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, cnt));
 	amdgpu_ring_write(ring, (WRITE_DATA_ENGINE_SEL(2) |
@@ -9049,7 +9048,7 @@ static void gfx_v10_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume)
 
 	offset = offsetof(struct v10_gfx_meta_data, de_payload);
 	de_payload_gpu_addr = amdgpu_csa_vaddr(ring->adev) + offset;
-	de_payload_cpu_addr = adev->virt.csa_cpu_addr + offset;
+	de_payload_cpu_addr = ring->csa_cpu_addr + offset;
 
 	gds_addr = ALIGN(amdgpu_csa_vaddr(ring->adev) +
 			 AMDGPU_CSA_SIZE - adev->gds.gds_size,
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 1941bfbcbfbff..68030391aa17c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -6252,7 +6252,7 @@ static void gfx_v11_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume)
 
 	offset = offsetof(struct v10_gfx_meta_data, de_payload);
 	de_payload_gpu_addr = amdgpu_csa_vaddr(ring->adev) + offset;
-	de_payload_cpu_addr = adev->virt.csa_cpu_addr + offset;
+	de_payload_cpu_addr = ring->csa_cpu_addr + offset;
 
 	gds_addr = ALIGN(amdgpu_csa_vaddr(ring->adev) +
 			 AMDGPU_CSA_SIZE - adev->gds.gds_size,
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 2152830052ef9..f8e9be9383140 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -5556,14 +5556,13 @@ static void gfx_v9_0_ring_patch_cntl(struct amdgpu_ring *ring,
 static void gfx_v9_0_ring_patch_ce_meta(struct amdgpu_ring *ring,
 					unsigned offset)
 {
-	struct amdgpu_device *adev = ring->adev;
 	void *ce_payload_cpu_addr;
 	uint64_t payload_offset, payload_size;
 
 	payload_size = sizeof(struct v9_ce_ib_state);
 
 	payload_offset = offsetof(struct v9_gfx_meta_data, ce_payload);
-	ce_payload_cpu_addr = adev->virt.csa_cpu_addr + payload_offset;
+	ce_payload_cpu_addr = ring->csa_cpu_addr + payload_offset;
 
 	if (offset + (payload_size >> 2) <= ring->buf_mask + 1) {
 		memcpy((void *)&ring->ring[offset], ce_payload_cpu_addr, payload_size);
@@ -5580,14 +5579,13 @@ static void gfx_v9_0_ring_patch_ce_meta(struct amdgpu_ring *ring,
 static void gfx_v9_0_ring_patch_de_meta(struct amdgpu_ring *ring,
 					unsigned offset)
 {
-	struct amdgpu_device *adev = ring->adev;
 	void *de_payload_cpu_addr;
 	uint64_t payload_offset, payload_size;
 
 	payload_size = sizeof(struct v9_de_ib_state);
 
 	payload_offset = offsetof(struct v9_gfx_meta_data, de_payload);
-	de_payload_cpu_addr = adev->virt.csa_cpu_addr + payload_offset;
+	de_payload_cpu_addr = ring->csa_cpu_addr + payload_offset;
 
 	((struct v9_de_ib_state *)de_payload_cpu_addr)->ib_completion_status =
 		IB_COMPLETION_STATUS_PREEMPTED;
@@ -5793,7 +5791,6 @@ static void gfx_v9_ring_emit_sb(struct amdgpu_ring *ring)
 
 static void gfx_v9_0_ring_emit_ce_meta(struct amdgpu_ring *ring, bool resume)
 {
-	struct amdgpu_device *adev = ring->adev;
 	struct v9_ce_ib_state ce_payload = {0};
 	uint64_t offset, ce_payload_gpu_addr;
 	void *ce_payload_cpu_addr;
@@ -5803,7 +5800,7 @@ static void gfx_v9_0_ring_emit_ce_meta(struct amdgpu_ring *ring, bool resume)
 
 	offset = offsetof(struct v9_gfx_meta_data, ce_payload);
 	ce_payload_gpu_addr = amdgpu_csa_vaddr(ring->adev) + offset;
-	ce_payload_cpu_addr = adev->virt.csa_cpu_addr + offset;
+	ce_payload_cpu_addr = ring->csa_cpu_addr + offset;
 
 	amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, cnt));
 	amdgpu_ring_write(ring, (WRITE_DATA_ENGINE_SEL(2) |
@@ -5891,7 +5888,7 @@ static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume, bo
 
 	offset = offsetof(struct v9_gfx_meta_data, de_payload);
 	de_payload_gpu_addr = amdgpu_csa_vaddr(ring->adev) + offset;
-	de_payload_cpu_addr = adev->virt.csa_cpu_addr + offset;
+	de_payload_cpu_addr = ring->csa_cpu_addr + offset;
 
 	gds_addr = ALIGN(amdgpu_csa_vaddr(ring->adev) +
 			 AMDGPU_CSA_SIZE - adev->gds.gds_size,
-- 
2.34.1
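
The selection and save/restore logic described in patch 3 can be modelled outside the kernel roughly as follows. All sketch_* types and functions are hypothetical simplifications; only the fallback expression mirrors the one added to amdgpu_ib_schedule(), and the chunk helpers loosely mirror what the software-ring mux does at IB start and on resubmission.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the driver structures. */
struct sketch_device { void *csa_cpu_addr; };	/* device-global CSA */
struct sketch_vm     { void *csa_cpu_addr; };	/* NULL: no private CSA */
struct sketch_ring   { void *csa_cpu_addr; };	/* updated per job */
struct sketch_chunk  { void *csa_cpu_addr; };	/* saved for resubmission */

/* Pick the CSA mapping for the job being emitted: the VM's private one
 * if present, otherwise the device-global one. A NULL vm models a ring
 * test that has no job. */
void sketch_ring_select_csa(struct sketch_ring *ring,
			    const struct sketch_vm *vm,
			    const struct sketch_device *dev)
{
	ring->csa_cpu_addr = (vm && vm->csa_cpu_addr) ?
		vm->csa_cpu_addr : dev->csa_cpu_addr;
}

/* Record which CSA was active when the chunk was started. */
void sketch_chunk_start(struct sketch_chunk *chunk,
			const struct sketch_ring *ring)
{
	chunk->csa_cpu_addr = ring->csa_cpu_addr;
}

/* Restore the recorded CSA before patching a resubmitted chunk, so the
 * readback does not use whatever a later job left on the ring. */
void sketch_chunk_resubmit(const struct sketch_chunk *chunk,
			   struct sketch_ring *ring)
{
	ring->csa_cpu_addr = chunk->csa_cpu_addr;
}
```

Saving the pointer in the chunk matters because the ring field is overwritten by every later job, while a deferred resubmission must read back the state of the job that originally produced the chunk.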


* [PATCH 4/4] drm/amdgpu: allocate a per-process CSA to isolate scheduler state
  2026-06-17 12:51 [PATCH 1/4] drm/amdgpu: drop executable permission from the CSA mapping Xiang Liu
  2026-06-17 12:51 ` [PATCH 2/4] drm/amdgpu: return CSA kernel mapping via out parameter Xiang Liu
  2026-06-17 12:51 ` [PATCH 3/4] drm/amdgpu: read back CE/DE preemption state via a per-ring CSA pointer Xiang Liu
@ 2026-06-17 12:51 ` Xiang Liu
  2026-06-18  4:24   ` Liu, Xiang(Dean)
  2026-06-18  7:47   ` Christian König
  2 siblings, 2 replies; 6+ messages in thread
From: Xiang Liu @ 2026-06-17 12:51 UTC (permalink / raw)
  To: amd-gfx; +Cc: Hawking.Zhang, Tao.Zhou1, Stanley.Yang, YiPeng.Chai, Xiang Liu

A single device-global CSA (adev->virt.csa_obj) was mapped into every
render client's GPUVM at the same fixed virtual address. The CSA is
GPU-writeable and holds CP preemption/resume (CE/DE) metadata that the
kernel and CP firmware consume to save and restore gfx queue state, so a
shared buffer lets one client overwrite the scheduler state relied upon
for another client's queue. Under SR-IOV this is a cross-tenant
scheduler-state integrity issue.

Allocate a private CSA per amdgpu_fpriv in amdgpu_driver_open_kms() and
map that into the process GPUVM instead of the global object, and free
it in amdgpu_driver_postclose_kms(). Publish its kernel mapping through
vm->csa_cpu_addr so the preemption resume path reads back this process's
own saved state. One client can no longer observe or corrupt another
client's CSA.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 27 ++++++++++++++++++++++---
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 59670aee0fd6f..50ac52f2e0565 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -448,6 +448,8 @@ struct amdgpu_fpriv {
 	struct amdgpu_vm	vm;
 	struct amdgpu_bo_va	*prt_va;
 	struct amdgpu_bo_va	*csa_va;
+	struct amdgpu_bo	*csa_obj;
+	void			*csa_cpu_addr;
 	struct amdgpu_bo_va	*seq64_va;
 	struct mutex		bo_list_lock;
 	struct idr		bo_list_handles;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 2e1284b7887c3..e0fc16bc7ef23 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1529,10 +1529,28 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 	if (adev->gfx.mcbp) {
 		uint64_t csa_addr = amdgpu_csa_vaddr(adev) & AMDGPU_GMC_HOLE_MASK;
 
-		r = amdgpu_map_static_csa(adev, &fpriv->vm, adev->virt.csa_obj,
-						&fpriv->csa_va, csa_addr, AMDGPU_CSA_SIZE);
+		/* Allocate a per-process CSA. The CSA holds CP preemption/resume
+		 * (CE/DE) metadata that the kernel and CP firmware rely on. A
+		 * single device-global CSA mapped writable into every GPUVM would
+		 * let one client corrupt another client's (or the kernel's) saved
+		 * scheduler state, so give each process its own isolated copy.
+		 */
+		r = amdgpu_allocate_static_csa(adev, &fpriv->csa_obj,
+					       AMDGPU_GEM_DOMAIN_VRAM |
+					       AMDGPU_GEM_DOMAIN_GTT,
+					       AMDGPU_CSA_SIZE,
+					       &fpriv->csa_cpu_addr);
 		if (r)
 			goto error_vm;
+
+		r = amdgpu_map_static_csa(adev, &fpriv->vm, fpriv->csa_obj,
+						&fpriv->csa_va, csa_addr, AMDGPU_CSA_SIZE);
+		if (r) {
+			amdgpu_free_static_csa(&fpriv->csa_obj);
+			fpriv->csa_cpu_addr = NULL;
+			goto error_vm;
+		}
+		fpriv->vm.csa_cpu_addr = fpriv->csa_cpu_addr;
 	}
 
 	r = amdgpu_seq64_map(adev, &fpriv->vm, &fpriv->seq64_va);
@@ -1604,9 +1622,12 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 	if (fpriv->csa_va) {
 		uint64_t csa_addr = amdgpu_csa_vaddr(adev) & AMDGPU_GMC_HOLE_MASK;
 
-		WARN_ON(amdgpu_unmap_static_csa(adev, &fpriv->vm, adev->virt.csa_obj,
+		WARN_ON(amdgpu_unmap_static_csa(adev, &fpriv->vm, fpriv->csa_obj,
 						fpriv->csa_va, csa_addr));
 		fpriv->csa_va = NULL;
+		fpriv->vm.csa_cpu_addr = NULL;
+		amdgpu_free_static_csa(&fpriv->csa_obj);
+		fpriv->csa_cpu_addr = NULL;
 	}
 
 	amdgpu_seq64_unmap(adev, fpriv);
-- 
2.34.1
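
The allocate, map, publish sequence proposed in patch 4, together with its error unwinding, can be sketched as below. This is a userspace model only: struct sketch_client, sketch_map(), and the inject_map_failure flag are hypothetical, and calloc() stands in for the BO allocation. It illustrates the ordering of the proposed open and close paths and makes no claim about whether the design itself is acceptable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-client state, loosely modelled on the new fields. */
struct sketch_client {
	void *csa_obj;		/* backing allocation */
	void *csa_cpu_addr;	/* CPU view of the allocation */
	void *vm_csa_cpu_addr;	/* value published to the readback path */
	bool  mapped;		/* models a non-NULL csa_va */
};

/* Stand-in for the mapping step; can be forced to fail. */
int sketch_map(struct sketch_client *c, bool fail)
{
	if (fail)
		return -EINVAL;
	c->mapped = true;
	return 0;
}

/* Open path: allocate, then map, and publish only once both succeed. */
int sketch_client_open(struct sketch_client *c, size_t size,
		       bool inject_map_failure)
{
	int r;

	c->csa_obj = calloc(1, size);
	if (!c->csa_obj)
		return -ENOMEM;
	c->csa_cpu_addr = c->csa_obj;

	r = sketch_map(c, inject_map_failure);
	if (r) {
		/* Undo the allocation so nothing dangles on failure. */
		free(c->csa_obj);
		c->csa_obj = NULL;
		c->csa_cpu_addr = NULL;
		return r;
	}

	c->vm_csa_cpu_addr = c->csa_cpu_addr;
	return 0;
}

/* Close path: unpublish first, then release the allocation. */
void sketch_client_close(struct sketch_client *c)
{
	if (!c->mapped)
		return;
	c->mapped = false;
	c->vm_csa_cpu_addr = NULL;
	free(c->csa_obj);
	c->csa_obj = NULL;
	c->csa_cpu_addr = NULL;
}
```

Publishing the pointer last means the readback path never sees a private mapping that failed to map, and clearing it before the free avoids a window where the published pointer refers to released memory.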


* Re: [PATCH 4/4] drm/amdgpu: allocate a per-process CSA to isolate scheduler state
  2026-06-17 12:51 ` [PATCH 4/4] drm/amdgpu: allocate a per-process CSA to isolate scheduler state Xiang Liu
@ 2026-06-18  4:24   ` Liu, Xiang(Dean)
  2026-06-18  7:47   ` Christian König
  1 sibling, 0 replies; 6+ messages in thread
From: Liu, Xiang(Dean) @ 2026-06-18  4:24 UTC (permalink / raw)
  To: amd-gfx@lists.freedesktop.org, Koenig,  Christian
  Cc: Zhang, Hawking, Zhou1, Tao, Yang, Stanley, Chai, Thomas

Hi Christian,

Could you please take a look at this patch series? Thanks.

Best Regards,

Liu, Xiang


* Re: [PATCH 4/4] drm/amdgpu: allocate a per-process CSA to isolate scheduler state
  2026-06-17 12:51 ` [PATCH 4/4] drm/amdgpu: allocate a per-process CSA to isolate scheduler state Xiang Liu
  2026-06-18  4:24   ` Liu, Xiang(Dean)
@ 2026-06-18  7:47   ` Christian König
  1 sibling, 0 replies; 6+ messages in thread
From: Christian König @ 2026-06-18  7:47 UTC (permalink / raw)
  To: Xiang Liu, amd-gfx; +Cc: Hawking.Zhang, Tao.Zhou1, Stanley.Yang, YiPeng.Chai

On 6/17/26 14:51, Xiang Liu wrote:
> A single device-global CSA (adev->virt.csa_obj) was mapped into every
> render client's GPUVM at the same fixed virtual address. The CSA is
> GPU-writeable and holds CP preemption/resume (CE/DE) metadata that the
> kernel and CP firmware consume to save and restore gfx queue state, so a
> shared buffer lets one client overwrite the scheduler state relied upon
> for another client's queue. Under SR-IOV this is a cross-tenant
> scheduler-state integrity issue.
> 
> Allocate a private CSA per amdgpu_fpriv in amdgpu_driver_open_kms() and
> map that into the process GPUVM instead of the global object, and free
> it in amdgpu_driver_postclose_kms(). Publish its kernel mapping through
> vm->csa_cpu_addr so the preemption resume path reads back this process's
> own saved state. One client can no longer observe or corrupt another
> client's CSA.

Clear NAK to that design. That doesn't even remotely work correctly.

Userspace is responsible for allocating the CSA if the global one isn't used.

Regards,
Christian.

> 
> Signed-off-by: Xiang Liu <xiang.liu@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  2 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 27 ++++++++++++++++++++++---
>  2 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 59670aee0fd6f..50ac52f2e0565 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -448,6 +448,8 @@ struct amdgpu_fpriv {
>  	struct amdgpu_vm	vm;
>  	struct amdgpu_bo_va	*prt_va;
>  	struct amdgpu_bo_va	*csa_va;
> +	struct amdgpu_bo	*csa_obj;
> +	void			*csa_cpu_addr;
>  	struct amdgpu_bo_va	*seq64_va;
>  	struct mutex		bo_list_lock;
>  	struct idr		bo_list_handles;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 2e1284b7887c3..e0fc16bc7ef23 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1529,10 +1529,28 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>  	if (adev->gfx.mcbp) {
>  		uint64_t csa_addr = amdgpu_csa_vaddr(adev) & AMDGPU_GMC_HOLE_MASK;
>  
> -		r = amdgpu_map_static_csa(adev, &fpriv->vm, adev->virt.csa_obj,
> -						&fpriv->csa_va, csa_addr, AMDGPU_CSA_SIZE);
> +		/* Allocate a per-process CSA. The CSA holds CP preemption/resume
> +		 * (CE/DE) metadata that the kernel and CP firmware rely on. A
> +		 * single device-global CSA mapped writable into every GPUVM would
> +		 * let one client corrupt another client's (or the kernel's) saved
> +		 * scheduler state, so give each process its own isolated copy.
> +		 */
> +		r = amdgpu_allocate_static_csa(adev, &fpriv->csa_obj,
> +					       AMDGPU_GEM_DOMAIN_VRAM |
> +					       AMDGPU_GEM_DOMAIN_GTT,
> +					       AMDGPU_CSA_SIZE,
> +					       &fpriv->csa_cpu_addr);
>  		if (r)
>  			goto error_vm;
> +
> +		r = amdgpu_map_static_csa(adev, &fpriv->vm, fpriv->csa_obj,
> +						&fpriv->csa_va, csa_addr, AMDGPU_CSA_SIZE);
> +		if (r) {
> +			amdgpu_free_static_csa(&fpriv->csa_obj);
> +			fpriv->csa_cpu_addr = NULL;
> +			goto error_vm;
> +		}
> +		fpriv->vm.csa_cpu_addr = fpriv->csa_cpu_addr;
>  	}
>  
>  	r = amdgpu_seq64_map(adev, &fpriv->vm, &fpriv->seq64_va);
> @@ -1604,9 +1622,12 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>  	if (fpriv->csa_va) {
>  		uint64_t csa_addr = amdgpu_csa_vaddr(adev) & AMDGPU_GMC_HOLE_MASK;
>  
> -		WARN_ON(amdgpu_unmap_static_csa(adev, &fpriv->vm, adev->virt.csa_obj,
> +		WARN_ON(amdgpu_unmap_static_csa(adev, &fpriv->vm, fpriv->csa_obj,
>  						fpriv->csa_va, csa_addr));
>  		fpriv->csa_va = NULL;
> +		fpriv->vm.csa_cpu_addr = NULL;
> +		amdgpu_free_static_csa(&fpriv->csa_obj);
> +		fpriv->csa_cpu_addr = NULL;
>  	}
>  
>  	amdgpu_seq64_unmap(adev, fpriv);

