All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: "Natalie Vock" <natalie.vock@gmx.de>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Tejun Heo" <tj@kernel.org>, "Michal Koutný" <mkoutny@suse.com>,
	cgroups@vger.kernel.org, "Huang Rui" <ray.huang@amd.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Matthew Auld" <matthew.auld@intel.com>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"Simona Vetter" <simona@ffwll.ch>,
	"David Airlie" <airlied@gmail.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 6/6] drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager
Date: Thu, 11 Jun 2026 21:41:28 +0200	[thread overview]
Message-ID: <c4dec5fa460cdbeb3706410c6fb3375ab36012ab.camel@linux.intel.com> (raw)
In-Reply-To: <20260611173301.17473-7-thomas.hellstrom@linux.intel.com>

On Thu, 2026-06-11 at 19:33 +0200, Thomas Hellström wrote:
> Register the VRAM manager with the dmem cgroup reclaim infrastructure
> so that lowering dmem.max below current VRAM usage triggers TTM
> eviction rather than failing with -EBUSY.
> 
> Guard place->flags in amdgpu_ttm_bo_eviction_valuable() against NULL,
> as the TTM reclaim path passes a NULL place in cgroup drain mode.
> 
> v3:
> - Rebased on fix for uninitialized list and buddy allocator on the
>   drmm_cgroup_register_region() error path.
> 
> v5:
> - Rebased on the introduction of struct dmem_cgroup_init.
> - Clear the reclaim callback in amdgpu_vram_mgr_fini() to prevent
>   use-after-free if cgroup reclaim is triggered after driver unbind
>   while userspace holds an open DRM file descriptor. (Sashiko-bot)
> - Switch from drmm_cgroup_register_region() to the raw
>   dmem_cgroup_register_region() and store the region in
>   amdgpu_vram_mgr.cg_region. Call dmem_cgroup_unregister_region()
>   in amdgpu_vram_mgr_fini() after ttm_resource_manager_evict_all()
>   to drain in-flight reclaim callbacks, and clear man->cg afterwards.
>   This is required because amdgpu's vram manager fini is called
>   explicitly during driver unbind, which may precede the DRM device
>   release and thus precede any drmm-based cleanup. (Sashiko-bot)
> 
> v6:
> - Fix mgr->cg_region never being assigned, so
>   dmem_cgroup_unregister_region() in fini silently no-ops on NULL
>   and leaks the region. (Sashiko-bot)
> - Reorder fini to call set_used(false) and evict_all() before
>   dmem_cgroup_unregister_region(), so ttm_resource_free() can
>   uncharge via man->cg during eviction; clear man->cg after
>   unregister. (Sashiko-bot)
> 
> Assisted-by: GitHub_Copilot:claude-sonnet-4.6
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c      |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 31 ++++++++++++++++--
> --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h |  2 ++
>  3 files changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 2740de94e93c..8cbcd33f51a5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1488,7 +1488,7 @@ static bool
> amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
>  	dma_resv_for_each_fence(&resv_cursor, bo->base.resv,
>  				DMA_RESV_USAGE_BOOKKEEP, f) {
>  		if (amdkfd_fence_check_mm(f, current->mm) &&
> -		    !(place->flags & TTM_PL_FLAG_CONTIGUOUS))
> +		    !(place && (place->flags &
> TTM_PL_FLAG_CONTIGUOUS)))
>  			return false;
>  	}
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 08f05c3aed1d..2250bab0970d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -906,6 +906,10 @@ static const struct ttm_resource_manager_func
> amdgpu_vram_mgr_func = {
>  	.debug	= amdgpu_vram_mgr_debug
>  };
>  
> +static const struct dmem_cgroup_ops amdgpu_vram_mgr_dmem_ops = {
> +	.reclaim = ttm_resource_manager_dmem_reclaim,
> +};

Probably might want to block reclaim after device unbind, just like xe.
I'll look at that for v7.

> +
>  /**
>   * amdgpu_vram_mgr_init - init VRAM manager and DRM MM
>   *
> @@ -917,6 +921,7 @@ int amdgpu_vram_mgr_init(struct amdgpu_device
> *adev)
>  {
>  	struct amdgpu_vram_mgr *mgr = &adev->mman.vram_mgr;
>  	struct ttm_resource_manager *man = &mgr->manager;
> +	struct dmem_cgroup_region *cg;
>  	int err;
>  
>  	ttm_resource_manager_init(man, &adev->mman.bdev,
> @@ -933,12 +938,16 @@ int amdgpu_vram_mgr_init(struct amdgpu_device
> *adev)
>  	if (err)
>  		return err;
>  
> -	man->cg = drmm_cgroup_register_region(adev_to_drm(adev),
> "vram",
> -					      &(struct
> dmem_cgroup_init){
> -						.size = adev-
> >gmc.real_vram_size,
> -					      });
> -	if (IS_ERR(man->cg))
> -		return PTR_ERR(man->cg);
> +	cg = dmem_cgroup_register_region(&(struct dmem_cgroup_init){
> +					     .size = adev-
> >gmc.real_vram_size,
> +					     .ops =
> &amdgpu_vram_mgr_dmem_ops,
> +					     .reclaim_priv = man,
> +					 }, "vram");
> +	if (IS_ERR(cg))
> +		return PTR_ERR(cg);
> +
> +	mgr->cg_region = cg;
> +	ttm_resource_manager_set_dmem_region(man, cg);
>  
>  	ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr-
> >manager);
>  	ttm_resource_manager_set_used(man, true);
> @@ -966,6 +975,16 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device
> *adev)
>  	if (ret)
>  		return;
>  
> +	/*
> +	 * Drain any in-flight dmem cgroup reclaim callbacks and
> remove the
> +	 * region from the global list.  This must happen after
> evict_all()
> +	 * so that ttm_resource_free() can still uncharge via man-
> >cg while
> +	 * BOs are being evicted.
> +	 */
> +	dmem_cgroup_unregister_region(mgr->cg_region);
> +	mgr->cg_region = NULL;
> +	man->cg = NULL;
> +
>  	mutex_lock(&mgr->lock);
>  	list_for_each_entry_safe(rsv, temp, &mgr-
> >reservations_pending, blocks)
>  		kfree(rsv);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
> index 429a21a2e9b2..07103cddb335 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
> @@ -36,6 +36,8 @@ struct amdgpu_vram_mgr {
>  	atomic64_t vis_usage;
>  	u64 default_page_size;
>  	struct list_head allocated_vres_list;
> +	/** @cg_region: dmem cgroup region for VRAM; unregistered in
> fini. */
> +	struct dmem_cgroup_region *cg_region;
>  };
>  
>  struct amdgpu_vres_task {

  parent reply	other threads:[~2026-06-11 19:41 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-11 17:32 [PATCH v6 0/6] [PATCH v6 0/6] Add reclaim to the dmem cgroup controller Thomas Hellström
2026-06-11 17:32 ` [PATCH v6 1/6] drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init() Thomas Hellström
2026-06-11 17:45   ` sashiko-bot
2026-06-11 17:32 ` [PATCH v6 2/6] cgroup/dmem: Introduce struct dmem_cgroup_init for region initialization Thomas Hellström
2026-06-11 17:32 ` [PATCH v6 3/6] cgroup/dmem: Add reclaim callback for lowering max below current usage Thomas Hellström
2026-06-11 18:01   ` sashiko-bot
2026-06-11 17:32 ` [PATCH v6 4/6] drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem controller Thomas Hellström
2026-06-11 17:33 ` [PATCH v6 5/6] drm/xe: Wire up dmem cgroup reclaim for VRAM manager Thomas Hellström
2026-06-11 17:33 ` [PATCH v6 6/6] drm/amdgpu: " Thomas Hellström
2026-06-11 18:32   ` sashiko-bot
2026-06-11 19:41   ` Thomas Hellström [this message]
2026-06-11 20:00 ` ✗ CI.checkpatch: warning for Add reclaim to the dmem cgroup controller (rev6) Patchwork
2026-06-11 20:02 ` ✓ CI.KUnit: success " Patchwork
2026-06-11 20:49 ` ✓ Xe.CI.BAT: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c4dec5fa460cdbeb3706410c6fb3375ab36012ab.camel@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=airlied@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=cgroups@vger.kernel.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hannes@cmpxchg.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=matthew.auld@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=mkoutny@suse.com \
    --cc=mripard@kernel.org \
    --cc=natalie.vock@gmx.de \
    --cc=ray.huang@amd.com \
    --cc=rodrigo.vivi@intel.com \
    --cc=simona@ffwll.ch \
    --cc=tj@kernel.org \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.