public inbox for cgroups@vger.kernel.org
 help / color / mirror / Atom feed
From: Natalie Vock <natalie.vock@gmx.de>
To: "Tvrtko Ursulin" <tursulin@ursulin.net>,
	"Maarten Lankhorst" <dev@lankhorst.se>,
	"Maxime Ripard" <mripard@kernel.org>, "Tejun Heo" <tj@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Christian Koenig" <christian.koenig@amd.com>,
	"Huang Rui" <ray.huang@amd.com>,
	"Matthew Auld" <matthew.auld@intel.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>
Cc: cgroups@vger.kernel.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit
Date: Wed, 25 Feb 2026 11:19:38 +0100	[thread overview]
Message-ID: <cd9a2b39-3e68-42be-a538-9653227fbf4e@gmx.de> (raw)
In-Reply-To: <e097bf9b-0078-4de4-929c-0b9e0b26af8e@ursulin.net>

On 2/25/26 11:12, Tvrtko Ursulin wrote:
> 
> On 25/02/2026 09:49, Natalie Vock wrote:
>> On 2/24/26 17:40, Tvrtko Ursulin wrote:
>>>
>>> On 10/11/2025 12:37, Natalie Vock wrote:
>>>> When the cgroup's memory usage is below the low/min limit and 
>>>> allocation
>>>> fails, try evicting some unprotected buffers to make space. Otherwise,
>>>> application buffers may be forced to go into GTT even though usage is
>>>> below the corresponding low/min limit, if other applications filled 
>>>> VRAM
>>>> with their allocations first.
>>>>
>>>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>>>> ---
>>>>   drivers/gpu/drm/ttm/ttm_bo.c       | 75 ++++++++++++++++++++++++++ 
>>>> + + ++++++----
>>>>   drivers/gpu/drm/ttm/ttm_resource.c | 48 +++++++++++++++++-------
>>>>   include/drm/ttm/ttm_resource.h     |  6 ++-
>>>>   3 files changed, 108 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ 
>>>> ttm_bo.c
>>>> index 829d994798835..bd467c965e1bc 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>> @@ -490,8 +490,12 @@ int ttm_bo_evict_first(struct ttm_device *bdev, 
>>>> struct ttm_resource_manager *man
>>>>   }
>>>>   struct ttm_bo_alloc_state {
>>>> +    /** @charge_pool: The memory pool the resource is charged to */
>>>> +    struct dmem_cgroup_pool_state *charge_pool;
>>>>       /** @limit_pool: Which pool limit we should test against */
>>>>       struct dmem_cgroup_pool_state *limit_pool;
>>>> +    /** @only_evict_unprotected: If eviction should be restricted 
>>>> to unprotected BOs */
>>>> +    bool only_evict_unprotected;
>>>>   };
>>>>   /**
>>>> @@ -546,7 +550,7 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk 
>>>> *walk, struct ttm_buffer_object *
>>>>       evict_walk->evicted++;
>>>>       if (evict_walk->res)
>>>>           lret = ttm_resource_alloc(evict_walk->evictor, evict_walk- 
>>>> >place,
>>>> -                      evict_walk->res, NULL);
>>>> +                      evict_walk->res, evict_walk->alloc_state- 
>>>> >charge_pool);
>>>>       if (lret == 0)
>>>>           return 1;
>>>>   out:
>>>> @@ -589,7 +593,7 @@ static int ttm_bo_evict_alloc(struct ttm_device 
>>>> *bdev,
>>>>       lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
>>>>       /* One more attempt if we hit low limit? */
>>>> -    if (!lret && evict_walk.hit_low) {
>>>> +    if (!lret && evict_walk.hit_low && !state- 
>>>> >only_evict_unprotected) {
>>>
>>> What is unprotected synonymous with? No low watermark set? Should 
>>> dmem_cgroup_state_evict_valuable() even set *hit_low = true for if 
>>> low is not set to begin with?
>>
>> In terms of cgroup usage, a cgroup (and by extension, its BOs) is 
>> protected as long as its usage stays under the low watermark (if not 
>> set, that watermark is zero and any BO is trivially unprotected). If 
>> the usage exceeds the low watermark, the cgroup/its BOs become 
>> unprotected and can be evicted (more easily), until the usage goes 
>> below the watermark again.
> 
> Got it thanks, so either no low set, or usage above low. Makes sense.
> 
>> With only_evict_unprotected, what we're trying to do is evict buffers 
>> from any cgroup that currently exceeds its low (or min) watermark, but 
>> leave alone cgroups that are within their limits. I've elaborated on 
>> the rationale more in the cover letter, but essentially, this is 
>> supposed to make TTM honor the low/min protection better for cgroups 
>> that are allocating and currently within their low/min watermark, by 
>> allowing them to push out BOs from cgroups that exceed their 
>> respective watermarks.
> 
> Yep, I got that part. Just that I will need a second pass to fully grasp 
> the extended logic. Problem being more booleans and passes make things 
> more complex. That is why I made this side question on whether it even 
> makes sense for dmem_cgroup_state_evict_valuable() to set hit_low if the 
> low is not even set. Assuming I got it right it can happen:
> 
>      if (!ignore_low) {
>          low = READ_ONCE(ctest->elow);
>          if (used > low)
>              return true;
> 
>          *ret_hit_low = true;
>          return false;
>      }
> 
> So I was wondering what would be the effect of making that like this:
> 
>      if (!ignore_low) {
>          low = READ_ONCE(ctest->elow);
>          if (used > low)
>              return true;
> 
> +        if (low)
> +            *ret_hit_low = true;
> 
>          return false;
>      }
> 
> 
> Could that somehow simplify the logic, maybe allow for not having to add 
> the additional condition above. Possibly not, it seems more complex than 
> that. But I am just thinking out loud at this point. Again, I need to 
> make a second reading pass.

FWIW, hit_low can never become true if low is not set. If we even call 
dmem_cgroup_eviction_valuable in the first place, that means there has 
to be a buffer object associated with the cgroup. Unless something went 
super duper wrong, that also means the memory usage of that buffer is 
charged to that cgroup, and therefore usage has to be > 0.

If low is unset, i.e. 0, we therefore always exit out of the
"if (used > low)" statement, and never set *ret_hit_low to true.

*ret_hit_low can only be set to true if there is a buffer that has a 
(set) low limit associated with its cgroup, and that cgroup's usage does 
not exceed the low limit.

Thanks,
Natalie

> 
>> I'll add some comments to the only_evict_unprotected docs to explain 
>> what "unprotected" means here.
>>
>>>
>>>>           evict_walk.try_low = true;
>>>>           lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 
>>>> 1);
>>>>       }
>>>> @@ -610,7 +614,8 @@ static int ttm_bo_evict_alloc(struct ttm_device 
>>>> *bdev,
>>>>       } while (!lret && evict_walk.evicted);
>>>>       /* We hit the low limit? Try once more */
>>>> -    if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
>>>> +    if (!lret && evict_walk.hit_low && !evict_walk.try_low &&
>>>> +            !state->only_evict_unprotected) {
>>>>           evict_walk.try_low = true;
>>>>           goto retry;
>>>>       }
>>>> @@ -719,20 +724,72 @@ static int ttm_bo_alloc_at_place(struct 
>>>> ttm_buffer_object *bo,
>>>>                    struct ttm_resource **res,
>>>>                    struct ttm_bo_alloc_state *alloc_state)
>>>>   {
>>>> -    bool may_evict;
>>>> +    bool may_evict, below_low = false;
>>>>       int ret;
>>>>       may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
>>>> +    ret = ttm_resource_try_charge(bo, place, &alloc_state- 
>>>> >charge_pool,
>>>> +                      force_space ? &alloc_state->limit_pool : NULL);
>>>> +    if (ret) {
>>>> +        /*
>>>> +         * -EAGAIN means the charge failed, which we treat like an
>>>> +         * allocation failure. Therefore, return an error code 
>>>> indicating
>>>> +         * the allocation failed - either -EBUSY if the allocation 
>>>> should
>>>> +         * be retried with eviction, or -ENOSPC if there should be 
>>>> no second
>>>> +         * attempt.
>>>> +         */
>>>> +        if (ret == -EAGAIN)
>>>> +            ret = may_evict ? -EBUSY : -ENOSPC;
>>>> +        return ret;
>>>> +    }
>>>> -    ret = ttm_resource_alloc(bo, place, res,
>>>> -                 force_space ? &alloc_state->limit_pool : NULL);
>>>> +    /*
>>>> +     * cgroup protection plays a special role in eviction.
>>>> +     * Conceptually, protection of memory via the dmem cgroup 
>>>> controller
>>>> +     * entitles the protected cgroup to use a certain amount of 
>>>> memory.
>>>> +     * There are two types of protection - the 'low' limit is a
>>>> +     * "best-effort" protection, whereas the 'min' limit provides a 
>>>> hard
>>>> +     * guarantee that memory within the cgroup's allowance will not be
>>>> +     * evicted under any circumstance.
>>>> +     *
>>>> +     * To faithfully model this concept in TTM, we also need to 
>>>> take cgroup
>>>> +     * protection into account when allocating. When allocation in one
>>>> +     * place fails, TTM will default to trying other places first 
>>>> before
>>>> +     * evicting.
>>>> +     * If the allocation is covered by dmem cgroup protection, 
>>>> however,
>>>> +     * this prevents the allocation from using the memory it is 
>>>> "entitled"
>>>> +     * to. To make sure unprotected allocations cannot push new 
>>>> protected
>>>> +     * allocations out of places they are "entitled" to use, we should
>>>> +     * evict buffers not covered by any cgroup protection, if this
>>>> +     * allocation is covered by cgroup protection.
>>>> +     *
>>>> +     * Buffers covered by 'min' protection are a special case - the 
>>>> 'min'
>>>> +     * limit is a stronger guarantee than 'low', and thus buffers 
>>>> protected
>>>> +     * by 'low' but not 'min' should also be considered for eviction.
>>>> +     * Buffers protected by 'min' will never be considered for 
>>>> eviction
>>>> +     * anyway, so the regular eviction path should be triggered here.
>>>> +     * Buffers protected by 'low' but not 'min' will take a special
>>>> +     * eviction path that only evicts buffers covered by neither 
>>>> 'low' or
>>>> +     * 'min' protections.
>>>> +     */
>>>> +    may_evict |= dmem_cgroup_below_min(NULL, alloc_state- 
>>>> >charge_pool);
>>>> +    below_low = dmem_cgroup_below_low(NULL, alloc_state->charge_pool);
>>>> +    alloc_state->only_evict_unprotected = !may_evict && below_low;
>>>> +
>>>> +    ret = ttm_resource_alloc(bo, place, res, alloc_state- 
>>>> >charge_pool);
>>>>       if (ret) {
>>>> -        if ((ret == -ENOSPC || ret == -EAGAIN) && may_evict)
>>>> +        if ((ret == -ENOSPC || ret == -EAGAIN) &&
>>>> +                (may_evict || below_low))
>>>>               ret = -EBUSY;
>>>>           return ret;
>>>>       }
>>>> +    /*
>>>> +     * Ownership of charge_pool has been transferred to the TTM 
>>>> resource,
>>>> +     * don't make the caller think we still hold a reference to it.
>>>> +     */
>>>> +    alloc_state->charge_pool = NULL;
>>>>       return 0;
>>>>   }
>>>> @@ -787,6 +844,7 @@ static int ttm_bo_alloc_resource(struct 
>>>> ttm_buffer_object *bo,
>>>>                   res, &alloc_state);
>>>>           if (ret == -ENOSPC) {
>>>> +            dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>>>>               dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>>>               continue;
>>>>           } else if (ret == -EBUSY) {
>>>> @@ -796,11 +854,14 @@ static int ttm_bo_alloc_resource(struct 
>>>> ttm_buffer_object *bo,
>>>>               dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>>>               if (ret) {
>>>> +                dmem_cgroup_pool_state_put(
>>>> +                        alloc_state.charge_pool);
>>>
>>> Funky line break.
>>
>> Will fix.
>>
>>>
>>>>                   if (ret != -ENOSPC && ret != -EBUSY)
>>>>                       return ret;
>>>>                   continue;
>>>>               }
>>>>           } else if (ret) {
>>>> +            dmem_cgroup_pool_state_put(alloc_state.charge_pool);
>>>>               dmem_cgroup_pool_state_put(alloc_state.limit_pool);
>>>>               return ret;
>>>>           }
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ 
>>>> ttm/ ttm_resource.c
>>>> index e2c82ad07eb44..fcfa8b51b0337 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_resource.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
>>>> @@ -372,30 +372,52 @@ void ttm_resource_fini(struct 
>>>> ttm_resource_manager *man,
>>>>   }
>>>>   EXPORT_SYMBOL(ttm_resource_fini);
>>>> +/**
>>>> + * ttm_resource_try_charge - charge a resource manager's cgroup pool
>>>> + * @bo: buffer for which an allocation should be charged
>>>> + * @place: where the allocation is attempted to be placed
>>>> + * @ret_pool: on charge success, the pool that was charged
>>>> + * @ret_limit_pool: on charge failure, the pool responsible for the 
>>>> failure
>>>> + *
>>>> + * Should be used to charge cgroups before attempting resource 
>>>> allocation.
>>>> + * When charging succeeds, the value of ret_pool should be passed to
>>>> + * ttm_resource_alloc.
>>>> + *
>>>> + * Returns: 0 on charge success, negative errno on failure.
>>>> + */
>>>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>>>> +                const struct ttm_place *place,
>>>> +                struct dmem_cgroup_pool_state **ret_pool,
>>>> +                struct dmem_cgroup_pool_state **ret_limit_pool)
>>>> +{
>>>> +    struct ttm_resource_manager *man =
>>>> +        ttm_manager_type(bo->bdev, place->mem_type);
>>>> +
>>>> +    if (!man->cg) {
>>>> +        *ret_pool = NULL;
>>>> +        if (ret_limit_pool)
>>>> +            *ret_limit_pool = NULL;
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    return dmem_cgroup_try_charge(man->cg, bo->base.size, ret_pool,
>>>> +                      ret_limit_pool);
>>>> +}
>>>> +
>>>>   int ttm_resource_alloc(struct ttm_buffer_object *bo,
>>>>                  const struct ttm_place *place,
>>>>                  struct ttm_resource **res_ptr,
>>>> -               struct dmem_cgroup_pool_state **ret_limit_pool)
>>>> +               struct dmem_cgroup_pool_state *charge_pool)
>>>>   {
>>>>       struct ttm_resource_manager *man =
>>>>           ttm_manager_type(bo->bdev, place->mem_type);
>>>> -    struct dmem_cgroup_pool_state *pool = NULL;
>>>>       int ret;
>>>> -    if (man->cg) {
>>>> -        ret = dmem_cgroup_try_charge(man->cg, bo->base.size, &pool, 
>>>> ret_limit_pool);
>>>> -        if (ret)
>>>> -            return ret;
>>>> -    }
>>>> -
>>>>       ret = man->func->alloc(man, bo, place, res_ptr);
>>>> -    if (ret) {
>>>> -        if (pool)
>>>> -            dmem_cgroup_uncharge(pool, bo->base.size);
>>>> +    if (ret)
>>>>           return ret;
>>>> -    }
>>>> -    (*res_ptr)->css = pool;
>>>> +    (*res_ptr)->css = charge_pool;
>>>
>>> Is it possible to somehow split this patch into two? I mean first a 
>>> patch which changes the prototype of ttm_resource_alloc(), adjusting 
>>> the callers, set out new rules for owning the charge pool, etc, then 
>>> the patch which only adds the cgroup smarts to 
>>> ttm_bo_alloc_at_place(). If that could be made without creating any 
>>> functional difference to the eviction alone I think it could make it 
>>> easier to review.
>>
>> Will try.
> 
> Only if it sounds plausible that it can be sensibly done. Otherwise 
> don't spend too much time if you think it makes no sense. I'll wait for 
> the verdict, or for v4 to appear and then have another go of making 
> sense of the existing vs new eviction logic.
> 
> Regards,
> 
> Tvrtko
> 
> 
>> Thanks,
>> Natalie
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>>       spin_lock(&bo->bdev->lru_lock);
>>>>       ttm_resource_add_bulk_move(*res_ptr, bo);
>>>> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ 
>>>> ttm_resource.h
>>>> index e52bba15012f7..3aef7efdd7cfb 100644
>>>> --- a/include/drm/ttm/ttm_resource.h
>>>> +++ b/include/drm/ttm/ttm_resource.h
>>>> @@ -442,10 +442,14 @@ void ttm_resource_init(struct 
>>>> ttm_buffer_object *bo,
>>>>   void ttm_resource_fini(struct ttm_resource_manager *man,
>>>>                  struct ttm_resource *res);
>>>> +int ttm_resource_try_charge(struct ttm_buffer_object *bo,
>>>> +                const struct ttm_place *place,
>>>> +                struct dmem_cgroup_pool_state **ret_pool,
>>>> +                struct dmem_cgroup_pool_state **ret_limit_pool);
>>>>   int ttm_resource_alloc(struct ttm_buffer_object *bo,
>>>>                  const struct ttm_place *place,
>>>>                  struct ttm_resource **res,
>>>> -               struct dmem_cgroup_pool_state **ret_limit_pool);
>>>> +               struct dmem_cgroup_pool_state *charge_pool);
>>>>   void ttm_resource_free(struct ttm_buffer_object *bo, struct 
>>>> ttm_resource **res);
>>>>   bool ttm_resource_intersects(struct ttm_device *bdev,
>>>>                    struct ttm_resource *res,
>>>>
>>>
>>
> 


  reply	other threads:[~2026-02-25 10:19 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-10 12:37 [PATCH v3 0/5] cgroup/dmem,drm/ttm: Improve protection in contended cases Natalie Vock
2025-11-10 12:37 ` [PATCH v3 1/5] cgroup/dmem: Add queries for protection values Natalie Vock
2025-11-10 12:37 ` [PATCH v3 2/5] cgroup/dmem: Add dmem_cgroup_common_ancestor helper Natalie Vock
2025-11-10 12:37 ` [PATCH v3 3/5] drm/ttm: Make a helper for attempting allocation in a place Natalie Vock
2026-02-24 16:09   ` Tvrtko Ursulin
2026-02-25  9:36     ` Natalie Vock
2025-11-10 12:37 ` [PATCH v3 4/5] drm/ttm: Be more aggressive when allocating below protection limit Natalie Vock
2026-02-24 16:40   ` Tvrtko Ursulin
2026-02-25  9:49     ` Natalie Vock
2026-02-25 10:12       ` Tvrtko Ursulin
2026-02-25 10:19         ` Natalie Vock [this message]
2026-02-25 11:45   ` Tvrtko Ursulin
2026-02-25 13:26     ` Natalie Vock
2026-02-25 14:09       ` Tvrtko Ursulin
2025-11-10 12:37 ` [PATCH v3 5/5] drm/ttm: Use common ancestor of evictor and evictee as limit pool Natalie Vock

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cd9a2b39-3e68-42be-a538-9653227fbf4e@gmx.de \
    --to=natalie.vock@gmx.de \
    --cc=airlied@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=christian.koenig@amd.com \
    --cc=dev@lankhorst.se \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hannes@cmpxchg.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=matthew.auld@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=mkoutny@suse.com \
    --cc=mripard@kernel.org \
    --cc=ray.huang@amd.com \
    --cc=simona@ffwll.ch \
    --cc=tj@kernel.org \
    --cc=tursulin@ursulin.net \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox