All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: Xiang Liu <xiang.liu@amd.com>, amd-gfx@lists.freedesktop.org
Cc: Hawking.Zhang@amd.com, Tao.Zhou1@amd.com, Stanley.Yang@amd.com,
	YiPeng.Chai@amd.com
Subject: Re: [PATCH 4/4] drm/amdgpu: allocate a per-process CSA to isolate scheduler state
Date: Thu, 18 Jun 2026 09:47:15 +0200	[thread overview]
Message-ID: <c312cca5-fa4e-42b4-b0ea-e8c351f30d5e@amd.com> (raw)
In-Reply-To: <20260617125121.1838486-4-xiang.liu@amd.com>

On 6/17/26 14:51, Xiang Liu wrote:
> A single device-global CSA (adev->virt.csa_obj) was mapped into every
> render client's GPUVM at the same fixed virtual address. The CSA is
> GPU-writeable and holds CP preemption/resume (CE/DE) metadata that the
> kernel and CP firmware consume to save and restore gfx queue state, so a
> shared buffer lets one client overwrite the scheduler state relied upon
> for another client's queue. Under SR-IOV this is a cross-tenant
> scheduler-state integrity issue.
> 
> Allocate a private CSA per amdgpu_fpriv in amdgpu_driver_open_kms() and
> map that into the process GPUVM instead of the global object, and free
> it in amdgpu_driver_postclose_kms(). Publish its kernel mapping through
> vm->csa_cpu_addr so the preemption resume path reads back this process's
> own saved state. One client can no longer observe or corrupt another
> client's CSA.

Clear NAK to that design. That doesn't even remotely work correctly.

Userspace is responsible to allocate the CSA if the global one isn't used.

Regards,
Christian.

> 
> Signed-off-by: Xiang Liu <xiang.liu@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  2 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 27 ++++++++++++++++++++++---
>  2 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 59670aee0fd6f..50ac52f2e0565 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -448,6 +448,8 @@ struct amdgpu_fpriv {
>  	struct amdgpu_vm	vm;
>  	struct amdgpu_bo_va	*prt_va;
>  	struct amdgpu_bo_va	*csa_va;
> +	struct amdgpu_bo	*csa_obj;
> +	void			*csa_cpu_addr;
>  	struct amdgpu_bo_va	*seq64_va;
>  	struct mutex		bo_list_lock;
>  	struct idr		bo_list_handles;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 2e1284b7887c3..e0fc16bc7ef23 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1529,10 +1529,28 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>  	if (adev->gfx.mcbp) {
>  		uint64_t csa_addr = amdgpu_csa_vaddr(adev) & AMDGPU_GMC_HOLE_MASK;
>  
> -		r = amdgpu_map_static_csa(adev, &fpriv->vm, adev->virt.csa_obj,
> -						&fpriv->csa_va, csa_addr, AMDGPU_CSA_SIZE);
> +		/* Allocate a per-process CSA. The CSA holds CP preemption/resume
> +		 * (CE/DE) metadata that the kernel and CP firmware rely on. A
> +		 * single device-global CSA mapped writable into every GPUVM would
> +		 * let one client corrupt another client's (or the kernel's) saved
> +		 * scheduler state, so give each process its own isolated copy.
> +		 */
> +		r = amdgpu_allocate_static_csa(adev, &fpriv->csa_obj,
> +					       AMDGPU_GEM_DOMAIN_VRAM |
> +					       AMDGPU_GEM_DOMAIN_GTT,
> +					       AMDGPU_CSA_SIZE,
> +					       &fpriv->csa_cpu_addr);
>  		if (r)
>  			goto error_vm;
> +
> +		r = amdgpu_map_static_csa(adev, &fpriv->vm, fpriv->csa_obj,
> +						&fpriv->csa_va, csa_addr, AMDGPU_CSA_SIZE);
> +		if (r) {
> +			amdgpu_free_static_csa(&fpriv->csa_obj);
> +			fpriv->csa_cpu_addr = NULL;
> +			goto error_vm;
> +		}
> +		fpriv->vm.csa_cpu_addr = fpriv->csa_cpu_addr;
>  	}
>  
>  	r = amdgpu_seq64_map(adev, &fpriv->vm, &fpriv->seq64_va);
> @@ -1604,9 +1622,12 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>  	if (fpriv->csa_va) {
>  		uint64_t csa_addr = amdgpu_csa_vaddr(adev) & AMDGPU_GMC_HOLE_MASK;
>  
> -		WARN_ON(amdgpu_unmap_static_csa(adev, &fpriv->vm, adev->virt.csa_obj,
> +		WARN_ON(amdgpu_unmap_static_csa(adev, &fpriv->vm, fpriv->csa_obj,
>  						fpriv->csa_va, csa_addr));
>  		fpriv->csa_va = NULL;
> +		fpriv->vm.csa_cpu_addr = NULL;
> +		amdgpu_free_static_csa(&fpriv->csa_obj);
> +		fpriv->csa_cpu_addr = NULL;
>  	}
>  
>  	amdgpu_seq64_unmap(adev, fpriv);


      parent reply	other threads:[~2026-06-18  7:47 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-17 12:51 [PATCH 1/4] drm/amdgpu: drop executable permission from the CSA mapping Xiang Liu
2026-06-17 12:51 ` [PATCH 2/4] drm/amdgpu: return CSA kernel mapping via out parameter Xiang Liu
2026-06-17 12:51 ` [PATCH 3/4] drm/amdgpu: read back CE/DE preemption state via a per-ring CSA pointer Xiang Liu
2026-06-17 12:51 ` [PATCH 4/4] drm/amdgpu: allocate a per-process CSA to isolate scheduler state Xiang Liu
2026-06-18  4:24   ` Liu, Xiang(Dean)
2026-06-18  7:47   ` Christian König [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c312cca5-fa4e-42b4-b0ea-e8c351f30d5e@amd.com \
    --to=christian.koenig@amd.com \
    --cc=Hawking.Zhang@amd.com \
    --cc=Stanley.Yang@amd.com \
    --cc=Tao.Zhou1@amd.com \
    --cc=YiPeng.Chai@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=xiang.liu@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.