public inbox for dri-devel@lists.freedesktop.org
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	 intel-xe@lists.freedesktop.org
Cc: "Natalie Vock" <natalie.vock@gmx.de>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Tejun Heo" <tj@kernel.org>, "Michal Koutný" <mkoutny@suse.com>,
	cgroups@vger.kernel.org, "Huang Rui" <ray.huang@amd.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Matthew Auld" <matthew.auld@intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"Simona Vetter" <simona@ffwll.ch>,
	"David Airlie" <airlied@gmail.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	linux-kernel@vger.kernel.org,
	"Thadeu Lima de Souza Cascardo" <cascardo@igalia.com>
Subject: Re: [PATCH 2/5] cgroup/dmem: Add reclaim callback for lowering max below current usage
Date: Wed, 22 Apr 2026 10:42:11 +0200	[thread overview]
Message-ID: <398623a092c65ce4e53d1713112fa39ac0979fd7.camel@linux.intel.com> (raw)
In-Reply-To: <4b647952-0038-4878-b67e-6c7fc7ab27a6@linux.intel.com>

On Wed, 2026-04-22 at 10:31 +0200, Maarten Lankhorst wrote:
> Hey,
> 
> (Adding Thadeu to cc since they've been working on the same issue)
> 
> Den 2026-03-27 kl. 09:15, skrev Thomas Hellström:
> > Add an optional reclaim callback to struct dmem_cgroup_region. When
> > dmem.max is set below current usage, invoke the callback to evict
> > memory and retry setting the limit rather than failing immediately.
> > Signal interruptions propagate back to the write() caller.
> > 
> > RFC:
> > Because we update the max limit only _after_ usage has been
> > sufficiently lowered, this approach is prone to failure if
> > aggressive allocators run in parallel with the reclaim.
> > So can we somehow enforce the new limit while the eviction is
> > happening?
> > 
> > Assisted-by: GitHub Copilot:claude-sonnet-4.6
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  include/linux/cgroup_dmem.h | 11 +++++
> >  kernel/cgroup/dmem.c        | 94 +++++++++++++++++++++++++++++++++----
> >  2 files changed, 96 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
> > index dd4869f1d736..61520a431740 100644
> > --- a/include/linux/cgroup_dmem.h
> > +++ b/include/linux/cgroup_dmem.h
> > @@ -26,6 +26,10 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
> >  				      bool ignore_low, bool *ret_hit_low);
> >  
> >  void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
> > +void dmem_cgroup_region_set_reclaim(struct dmem_cgroup_region *region,
> > +				    int (*reclaim)(struct dmem_cgroup_pool_state *pool,
> > +						   u64 target_bytes, void *priv),
> > +				    void *priv);
> >  #else
> >  static inline __printf(2,3) struct dmem_cgroup_region *
> >  dmem_cgroup_register_region(u64 size, const char *name_fmt, ...)
> > @@ -62,5 +66,12 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
> >  static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
> >  { }
> >  
> > +static inline void
> > +dmem_cgroup_region_set_reclaim(struct dmem_cgroup_region *region,
> > +			       int (*reclaim)(struct dmem_cgroup_pool_state *pool,
> > +					      u64 target_bytes, void *priv),
> > +			       void *priv)
> > +{ }
> > +
> >  #endif
> >  #endif	/* _CGROUP_DMEM_H */
> > diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
> > index 3e6d4c0b26a1..f993fb058b74 100644
> > --- a/kernel/cgroup/dmem.c
> > +++ b/kernel/cgroup/dmem.c
> > @@ -51,6 +51,18 @@ struct dmem_cgroup_region {
> >  	 * No new pools should be added to the region afterwards.
> >  	 */
> >  	bool unregistered;
> > +
> > +	/**
> > +	 * @reclaim: Optional callback invoked when dmem.max is set below the
> > +	 * current usage of a pool. The driver should attempt to free at least
> > +	 * @target_bytes from @pool. May be called multiple times if usage
> > +	 * remains above the limit after returning.
> > +	 */
> > +	int (*reclaim)(struct dmem_cgroup_pool_state *pool, u64 target_bytes,
> > +		       void *priv);
> > +
> > +	/** @reclaim_priv: Private data passed to @reclaim. */
> > +	void *reclaim_priv;
> >  };
> >  
> >  struct dmemcg_state {
> > @@ -145,23 +157,59 @@ static void free_cg_pool(struct dmem_cgroup_pool_state *pool)
> >  }
> >  
> >  static int
> > -set_resource_min(struct dmem_cgroup_pool_state *pool, u64 val)
> > +set_resource_min(struct dmem_cgroup_pool_state *pool, u64 val,
> > +		 struct dmem_cgroup_region *region)
> >  {
> >  	page_counter_set_min(&pool->cnt, val);
> >  	return 0;
> >  }
> >  
> >  static int
> > -set_resource_low(struct dmem_cgroup_pool_state *pool, u64 val)
> > +set_resource_low(struct dmem_cgroup_pool_state *pool, u64 val,
> > +		 struct dmem_cgroup_region *region)
> >  {
> >  	page_counter_set_low(&pool->cnt, val);
> >  	return 0;
> >  }
> >  
> >  static int
> > -set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val)
> > +set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val,
> > +		 struct dmem_cgroup_region *region)
> >  {
> > -	return page_counter_set_max(&pool->cnt, val);
> > +	int err = page_counter_set_max(&pool->cnt, val);
> > +
> > +	if (err != -EBUSY || !region || !region->reclaim)
> > +		return err;
> > +
> > +	/*
> > +	 * The new max is below current usage.  Ask the driver to evict memory
> > +	 * and retry, up to a bounded number of times.  Signal interruptions are
> > +	 * propagated back to the write() caller; other reclaim failures leave
> > +	 * -EBUSY as the result.
> > +	 */
> > +	for (int retries = 5; retries > 0; retries--) {
> > +		u64 usage = page_counter_read(&pool->cnt);
> > +		u64 target = usage > val ? usage - val : 0;
> > +		int reclaim_err;
> > +
> > +		if (!target) {
> > +			err = page_counter_set_max(&pool->cnt, val);
> > +			break;
> > +		}
> > +
> > +		reclaim_err = region->reclaim(pool, target, region->reclaim_priv);
> > +		if (reclaim_err) {
> > +			if (reclaim_err == -EINTR || reclaim_err == -ERESTARTSYS)
> > +				err = reclaim_err;
> > +			break;
> > +		}
> > +
> > +		err = page_counter_set_max(&pool->cnt, val);
> > +		if (err != -EBUSY)
> > +			break;
> > +	}
> > +
> > +	return err;
> >  }
> 
> I mentioned this in chat, but I want to raise it on the mailing list
> for others as well: can we reproduce the behavior of memory_max_write()
> in mm/memcontrol.c?
> 
> 1. First set the new limit through xchg.
> 2. If O_NONBLOCK is set -> do nothing; the next allocation in the
> target region will fail and trigger reclaim.
> 3. If not set -> reclaim until below the new limit or interrupted by a
> signal; return success in all cases here, since the new limit is
> already set.
> 
> 

Yup.

For 3, we also need to consider the case where we fail to reclaim due
to memory being pinned. If it's OK to (usually temporarily) have current
usage above max, that would work.

I have that coded up, and have also added a patch on top that defers
reclaim to a thread if we bail due to a signal or O_NONBLOCK. Perhaps we
could discuss whether that's a good or bad idea in that patch.

Will send out when I've updated the IGT tests accordingly.

Thanks,
Thomas




Thread overview: 13+ messages
2026-03-27  8:15 [PATCH 0/5] Add reclaim to the dmem cgroup controller Thomas Hellström
2026-03-27  8:15 ` [PATCH 1/5] cgroup/dmem: Return error when setting max below current usage Thomas Hellström
2026-03-27  8:15 ` [PATCH 2/5] cgroup/dmem: Add reclaim callback for lowering " Thomas Hellström
2026-04-22  8:31   ` Maarten Lankhorst
2026-04-22  8:42     ` Thomas Hellström [this message]
2026-04-22  9:50       ` Maarten Lankhorst
2026-04-22 10:20         ` Thomas Hellström
2026-04-22 10:29           ` Maarten Lankhorst
2026-04-22 10:36             ` Thomas Hellström
2026-03-27  8:15 ` [PATCH 3/5] drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem controller Thomas Hellström
2026-03-27  8:15 ` [PATCH 4/5] drm/xe: Wire up dmem cgroup reclaim for VRAM manager Thomas Hellström
2026-03-27  8:16 ` [PATCH 5/5] drm/amdgpu: " Thomas Hellström
2026-03-27  8:33 ` [PATCH 0/5] Add reclaim to the dmem cgroup controller Thomas Hellström
