public inbox for cgroups@vger.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tursulin@ursulin.net>
To: "Natalie Vock" <natalie.vock@gmx.de>,
	"Maarten Lankhorst" <dev@lankhorst.se>,
	"Maxime Ripard" <mripard@kernel.org>, "Tejun Heo" <tj@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Christian Koenig" <christian.koenig@amd.com>,
	"Huang Rui" <ray.huang@amd.com>,
	"Matthew Auld" <matthew.auld@intel.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>
Cc: cgroups@vger.kernel.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit
Date: Wed, 25 Feb 2026 14:09:26 +0000	[thread overview]
Message-ID: <5f24e705-0ecb-4915-9f58-e50ba2b353bf@ursulin.net> (raw)
In-Reply-To: <c6a7d7fe-a7df-4790-b6eb-ba2073eaab68@gmx.de>


On 25/02/2026 13:26, Natalie Vock wrote:
> Sorry, already sent out v4 before I saw this.
> 
> On 2/25/26 12:45, Tvrtko Ursulin wrote:
>>
>> On 10/11/2025 12:37, Natalie Vock wrote:
>>> When the cgroup's memory usage is below the low/min limit and allocation
>>> fails, try evicting some unprotected buffers to make space. Otherwise,
>>> application buffers may be forced to go into GTT even though usage is
>>> below the corresponding low/min limit, if other applications filled VRAM
>>> with their allocations first.
>>>
>>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_bo.c       | 75 +++++++++++++++++++++++++++ 
>>> + ++++++----
>>>   drivers/gpu/drm/ttm/ttm_resource.c | 48 +++++++++++++++++-------
>>>   include/drm/ttm/ttm_resource.h     |  6 ++-
>>>   3 files changed, 108 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 829d994798835..bd467c965e1bc 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -490,8 +490,12 @@ int ttm_bo_evict_first(struct ttm_device *bdev, 
>>> struct ttm_resource_manager *man
>>>   }
>>>   struct ttm_bo_alloc_state {
>>> +    /** @charge_pool: The memory pool the resource is charged to */
>>> +    struct dmem_cgroup_pool_state *charge_pool;
>>>       /** @limit_pool: Which pool limit we should test against */
>>>       struct dmem_cgroup_pool_state *limit_pool;
>>> +    /** @only_evict_unprotected: If eviction should be restricted to 
>>> unprotected BOs */
>>> +    bool only_evict_unprotected;
>>>   };
>>>   /**
>>> @@ -546,7 +550,7 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk 
>>> *walk, struct ttm_buffer_object *
>>>       evict_walk->evicted++;
>>>       if (evict_walk->res)
>>>           lret = ttm_resource_alloc(evict_walk->evictor, evict_walk- 
>>> >place,
>>> -                      evict_walk->res, NULL);
>>> +                      evict_walk->res, evict_walk->alloc_state- 
>>> >charge_pool);
>>>       if (lret == 0)
>>>           return 1;
>>>   out:
>>> @@ -589,7 +593,7 @@ static int ttm_bo_evict_alloc(struct ttm_device 
>>> *bdev,
>>>       lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>>>       /* One more attempt if we hit low limit? */
>>> -    if (!lret && evict_walk.hit_low) {
>>> +    if (!lret && evict_walk.hit_low && !state- 
>>> >only_evict_unprotected) {
>>>           evict_walk.try_low = true;
>>>           lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>>>       }
>>> @@ -610,7 +614,8 @@ static int ttm_bo_evict_alloc(struct ttm_device 
>>> *bdev,
>>>       } while (!lret && evict_walk.evicted);
>>>       /* We hit the low limit? Try once more */
>>> -    if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
>>> +    if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
>>> +            !state->only_evict_unprotected) {
>>>           evict_walk.try_low = true;
>>>           goto retry;
>>>       }
>>> @@ -719,20 +724,72 @@ static int ttm_bo_alloc_at_place(struct 
>>> ttm_buffer_object *bo,
>>>                    struct ttm_resource **res,
>>>                    struct ttm_bo_alloc_state *alloc_state)
>>>   {
>>> -    bool may_evict;
>>> +    bool may_evict, below_low = false;
>>
>> No need to init below_low.
> 
> Oops. Oh well, that one goes into v5 then.
> 
>>
>>>       int ret;
>>>       may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
>>> +    ret = ttm_resource_try_charge(bo, place, &alloc_state->charge_pool,
>>> +                      force_space ? &alloc_state->limit_pool : NULL);
>>> +    if (ret) {
>>> +        /*
>>> +         * -EAGAIN means the charge failed, which we treat like an
>>> +         * allocation failure. Therefore, return an error code 
>>> indicating
>>> +         * the allocation failed - either -EBUSY if the allocation 
>>> should
>>> +         * be retried with eviction, or -ENOSPC if there should be 
>>> no second
>>> +         * attempt.
>>> +         */
>>> +        if (ret == -EAGAIN)
>>> +            ret = may_evict ? -EBUSY : -ENOSPC;
>>> +        return ret;
>>> +    }
>>> -    ret = ttm_resource_alloc(bo, place, res,
>>> -                 force_space ? &alloc_state->limit_pool : NULL);
>>> +    /*
>>> +     * cgroup protection plays a special role in eviction.
>>> +     * Conceptually, protection of memory via the dmem cgroup 
>>> controller
>>> +     * entitles the protected cgroup to use a certain amount of memory.
>>> +     * There are two types of protection - the 'low' limit is a
>>> +     * "best-effort" protection, whereas the 'min' limit provides a 
>>> hard
>>> +     * guarantee that memory within the cgroup's allowance will not be
>>> +     * evicted under any circumstance.
>>> +     *
>>> +     * To faithfully model this concept in TTM, we also need to take 
>>> cgroup
>>> +     * protection into account when allocating. When allocation in one
>>> +     * place fails, TTM will default to trying other places first 
>>> before
>>> +     * evicting.
>>> +     * If the allocation is covered by dmem cgroup protection, however,
>>> +     * this prevents the allocation from using the memory it is 
>>> "entitled"
>>> +     * to. To make sure unprotected allocations cannot push new 
>>> protected
>>> +     * allocations out of places they are "entitled" to use, we should
>>> +     * evict buffers not covered by any cgroup protection, if this
>>> +     * allocation is covered by cgroup protection.
>>> +     *
>>> +     * Buffers covered by 'min' protection are a special case - the 
>>> 'min'
>>> +     * limit is a stronger guarantee than 'low', and thus buffers 
>>> protected
>>> +     * by 'low' but not 'min' should also be considered for eviction.
>>> +     * Buffers protected by 'min' will never be considered for eviction
>>> +     * anyway, so the regular eviction path should be triggered here.
>>> +     * Buffers protected by 'low' but not 'min' will take a special
>>> +     * eviction path that only evicts buffers covered by neither 
>>> 'low' or
>>> +     * 'min' protections.
>>> +     */
>>> +    may_evict |= dmem_cgroup_below_min(NULL, alloc_state->charge_pool);
>>> +    below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
>>
>> Are these some magic macros? Couldn't grep for them.
> 
> They're functions added in patch 1.

Doh!

> 
>>
>>> +    alloc_state->only_evict_unprotected = !may_evict && below_low;
>>  > +> +    ret = ttm_resource_alloc(bo, place, res, alloc_state- 
>>  >charge_pool);
>>>       if (ret) {
>>> -        if ((ret == -ENOSPC || ret == -EAGAIN) && may_evict)
>>> +        if ((ret == -ENOSPC || ret == -EAGAIN) &&
>>> +                (may_evict || below_low))
>>>               ret = -EBUSY;
>>>           return ret;
>>
>> Where does the uncharge happen on the failure path?
> 
> The charge is passed to the caller regardless of success or failure, so 
> that the caller can retry allocation (with eviction) using the same 
> charge. The caller is also responsible for uncharging.
> 
> Maybe this is clearer in the split patch in v4.

I don't see it just yet. I see v4 also removes one pair of 
dmem_cgroup_try_charge + dmem_cgroup_uncharge, while adding one 
dmem_cgroup_try_charge. I don't see where the uncharge in the new flow is.

At least there are some returns directly out of ttm_bo_alloc_resource() 
(before or after the eviction attempt) so couldn't those have the charge 
already applied and not uncharged? Or the games with converting error 
codes make that impossible?

Regards,

Tvrtko

>>>       }
>>> +    /*
>>> +     * Ownership of charge_pool has been transferred to the TTM 
>>> resource,
>>> +     * don't make the caller think we still hold a reference to it.
>>> +     */
>>> +    alloc_state->charge_pool = NULL;
>>>       return 0;
>>>   }
>>> @@ -787,6 +844,7 @@ static int ttm_bo_alloc_resource(struct 
>>> ttm_buffer_object *bo,
>>>                   res, &alloc_state);
>>>           if (ret == -ENOSPC) {
>>> +            dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>>>               dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>>               continue;
>>>           } else if (ret == -EBUSY) {
>>> @@ -796,11 +854,14 @@ static int ttm_bo_alloc_resource(struct 
>>> ttm_buffer_object *bo,
>>>               dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>>               if (ret) {
>>> +                dmem_cgroup_pool_state_put(
>>> +                        alloc_state.charge_pool);
>>>                   if (ret != -ENOSPC && ret != -EBUSY)
>>>                       return ret;
>>>                   continue;
>>>               }
>>>           } else if (ret) {
>>> +            dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>>>               dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>>               return ret;
>>>           }
>>> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ 
>>> ttm/ ttm_resource.c
>>> index e2c82ad07eb44..fcfa8b51b0337 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_resource.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
>>> @@ -372,30 +372,52 @@ void ttm_resource_fini(struct 
>>> ttm_resource_manager *man,
>>>   }
>>>   EXPORT_SYMBOL(ttm_resource_fini);
>>> +/**
>>> + * ttm_resource_try_charge - charge a resource manager's cgroup pool
>>> + * @bo: buffer for which an allocation should be charged
>>> + * @place: where the allocation is attempted to be placed
>>> + * @ret_pool: on charge success, the pool that was charged
>>> + * @ret_limit_pool: on charge failure, the pool responsible for the 
>>> failure
>>> + *
>>> + * Should be used to charge cgroups before attempting resource 
>>> allocation.
>>> + * When charging succeeds, the value of ret_pool should be passed to
>>> + * ttm_resource_alloc.
>>> + *
>>> + * Returns: 0 on charge success, negative errno on failure.
>>> + */
>>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>>> +                const struct ttm_place *place,
>>> +                struct dmem_cgroup_pool_state **ret_pool,
>>> +                struct dmem_cgroup_pool_state **ret_limit_pool)
>>> +{
>>> +    struct ttm_resource_manager *man =
>>> +        ttm_manager_type(bo->bdev, place->mem_type);
>>> +
>>> +    if (!man->cg) {
>>> +        *ret_pool = NULL;
>>> +        if (ret_limit_pool)
>>> +            *ret_limit_pool = NULL;
>>> +        return 0;
>>> +    }
>>> +
>>> +    return dmem_cgroup_try_charge(man->cg, bo->base.size, ret_pool,
>>> +                      ret_limit_pool);
>>> +}
>>> +
>>>   int ttm_resource_alloc(struct ttm_buffer_object *bo,
>>>                  const struct ttm_place *place,
>>>                  struct ttm_resource **res_ptr,
>>> -               struct dmem_cgroup_pool_state **ret_limit_pool)
>>> +               struct dmem_cgroup_pool_state *charge_pool)
>>>   {
>>>       struct ttm_resource_manager *man =
>>>           ttm_manager_type(bo->bdev, place->mem_type);
>>> -    struct dmem_cgroup_pool_state *pool = NULL;
>>>       int ret;
>>> -    if (man->cg) {
>>> -        ret = dmem_cgroup_try_charge(man->cg, bo->base.size, &pool, 
>>> ret_limit_pool);
>>> -        if (ret)
>>> -            return ret;
>>> -    }
>>> -
>>>       ret = man->func->alloc(man, bo, place, res_ptr);
>>> -    if (ret) {
>>> -        if (pool)
>>> -            dmem_cgroup_uncharge(pool, bo->base.size);
>>> +    if (ret)
>>>           return ret;
>>> -    }
>>> -    (*res_ptr)->css = pool;
>>> +    (*res_ptr)->css = charge_pool;
>>>       spin_lock(&bo->bdev->lru_lock);
>>>       ttm_resource_add_bulk_move(*res_ptr, bo);
>>> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ 
>>> ttm_resource.h
>>> index e52bba15012f7..3aef7efdd7cfb 100644
>>> --- a/include/drm/ttm/ttm_resource.h
>>> +++ b/include/drm/ttm/ttm_resource.h
>>> @@ -442,10 +442,14 @@ void ttm_resource_init(struct ttm_buffer_object 
>>> *bo,
>>>   void ttm_resource_fini(struct ttm_resource_manager *man,
>>>                  struct ttm_resource *res);
>>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>>> +                const struct ttm_place *place,
>>> +                struct dmem_cgroup_pool_state **ret_pool,
>>> +                struct dmem_cgroup_pool_state **ret_limit_pool);
>>>   int ttm_resource_alloc(struct ttm_buffer_object *bo,
>>>                  const struct ttm_place *place,
>>>                  struct ttm_resource **res,
>>> -               struct dmem_cgroup_pool_state **ret_limit_pool);
>>> +               struct dmem_cgroup_pool_state *charge_pool);
>>>   void ttm_resource_free(struct ttm_buffer_object *bo, struct 
>>> ttm_resource **res);
>>>   bool ttm_resource_intersects(struct ttm_device *bdev,
>>>                    struct ttm_resource *res,
>>>
>>
> 


  reply	other threads:[~2026-02-25 14:09 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-10 12:37 [PATCH v3 0/5] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
2025-11-10 12:37 ` [PATCH v3 1/5] cgroup/dmem: Add queries for protection values Natalie Vock
2025-11-10 12:37 ` [PATCH v3 2/5] cgroup/dmem: Add dmem_cgroup_common_ancestor helper Natalie Vock
2025-11-10 12:37 ` [PATCH v3 3/5] drm/ttm: Make a helper for attempting allocation in a place Natalie Vock
2026-02-24 16:09   ` Tvrtko Ursulin
2026-02-25  9:36     ` Natalie Vock
2025-11-10 12:37 ` [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit Natalie Vock
2026-02-24 16:40   ` Tvrtko Ursulin
2026-02-25  9:49     ` Natalie Vock
2026-02-25 10:12       ` Tvrtko Ursulin
2026-02-25 10:19         ` Natalie Vock
2026-02-25 11:45   ` Tvrtko Ursulin
2026-02-25 13:26     ` Natalie Vock
2026-02-25 14:09       ` Tvrtko Ursulin [this message]
2025-11-10 12:37 ` [PATCH v3 5/5] drm/ttm: Use common ancestor of evictor and evictee as limit pool Natalie Vock

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5f24e705-0ecb-4915-9f58-e50ba2b353bf@ursulin.net \
    --to=tursulin@ursulin.net \
    --cc=airlied@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=christian.koenig@amd.com \
    --cc=dev@lankhorst.se \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hannes@cmpxchg.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=matthew.auld@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=mkoutny@suse.com \
    --cc=mripard@kernel.org \
    --cc=natalie.vock@gmx.de \
    --cc=ray.huang@amd.com \
    --cc=simona@ffwll.ch \
    --cc=tj@kernel.org \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox