Re: [PATCH 07/16] memcg: add support for GPU page counters. (v4)


From: "Christian König" <christian.koenig@amd.com>
To: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org, tj@kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	cgroups@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
	Waiman Long <longman@redhat.com>,
	simona@ffwll.ch
Subject: Re: [PATCH 07/16] memcg: add support for GPU page counters. (v4)
Date: Wed, 25 Feb 2026 10:09:55 +0100
Message-ID: <4fddf319-50c4-40ab-9e36-04d629a8855e@amd.com>
In-Reply-To: <CAPM=9txUuS-qzA+gX2DvTuYR2OZ79RG86FuDA6czkpuJ_SR6KQ@mail.gmail.com>

On 2/24/26 20:28, Dave Airlie wrote:
> On Tue, 24 Feb 2026 at 17:50, Christian König <christian.koenig@amd.com> wrote:
>>
>> On 2/24/26 03:06, Dave Airlie wrote:
>>> From: Dave Airlie <airlied@redhat.com>
>>>
>>> This introduces 2 new statistics and 3 new memcontrol APIs for dealing
>>> with GPU system memory allocations.
>>>
>>> The stats correspond to the same stats in the global vmstat,
>>> for number of active GPU pages, and number of pages in pools that
>>> can be reclaimed.
>>>
>>> The first API charges an order of pages to an objcg, and sets
>>> the objcg on the pages like kmem does, and updates the active/reclaim
>>> statistic.
>>>
>>> The second API uncharges a page from the obj cgroup it is currently charged
>>> to.
>>>
>>> The third API allows moving a page to/from reclaim and between obj cgroups.
>>> When pages are added to the pool lru, this just updates accounting.
>>> When pages are being removed from a pool lru, they can be taken from
>>> the parent objcg so this allows them to be uncharged from there and transferred
>>> to a new child objcg.
>>>
>>> Acked-by: Christian König <christian.koenig@amd.com>
>>
>> I have to take that back.
>>
>> After going over the different use cases I'm now pretty convinced that charging any GPU/TTM allocation to memcg is the wrong approach to the problem.
> 
> You'll need to sell me a bit more on this idea. I don't hate it, but
> it seems, to be honest, kinda half-baked and smells a bit of rearchitecting
> without form, so please start up your writing skills and give me
> something concrete here.
> 
>>
>> Instead TTM should have a dmem_cgroup_pool which can limit the amount of system memory each cgroup can use from GTT.
> 
> This sounds like a static limit though, how would we configure that in
> a sane way?

See the discussion about the dmem controller for CMA with Mathew, T.J., me and a couple of others. It's on dri-devel and I've CCed you on my latest reply.

>>
>> The use case that GTT memory should account to memcg is actually only valid for an extremely small number of HPC customers and for those use cases we have different approaches to solve this issue (udmabuf, system DMA-buf heap, etc...).
> 
> Stop, I have a major use case for this that isn't any of those.
> Integrated GPUs on Intel and AMD accounting the RAM usage to somewhere
> useful, so cgroup mgmt of desktop clients actually work, so when
> firefox uses GPU memory it gets accounted to firefox and when the OOM
> killer comes along it can choose the correct user.

Oh, yes! I have tried multiple times to fix this as well in the last decade or so.

> This has been a pain in the ass for desktop for years, and I'd like to
> fix it; the HPC use case is purely a driver for me doing the work.

Wait a second. How does accounting to cgroups help with that in any way?

The last time I looked into this problem the OOM killer worked based on the per task_struct stats which couldn't be influenced this way.

Others and I have tried that approach multiple times and so far it has never worked.

> Can you give a detailed explanation of how your idea will work in an
> unconfigured cgroup environment to help this case?

It wouldn't, but I also don't see how this patch set here would.

The accounting limits the amount of memory you can allocate per process for each cgroup, but it does not affect the OOM killer score in any way.

If we want to fix the OOM killer score we would need to start using the proportional set size in the OOM killer instead of the resident set size. And that in turn means the changes to the OOM killer and FS layer I already proposed over a decade ago.
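
To make the difference between the two metrics concrete, here is a minimal sketch. It is only an illustration under stated assumptions: the process names, the page counts, and the rss_and_pss() helper are made up, and this is not how the kernel computes the OOM score. RSS counts every mapped page in full for each process, while PSS divides each page by the number of processes mapping it.

```python
from collections import defaultdict


def rss_and_pss(mappings):
    """Return (rss, pss) in pages for each process.

    mappings: dict of process name -> set of page ids that process maps.
    Hypothetical helper for illustration only.
    """
    # Count how many processes map each page.
    sharers = defaultdict(int)
    for pages in mappings.values():
        for page in pages:
            sharers[page] += 1

    # RSS: every mapped page counts in full.
    rss = {name: len(pages) for name, pages in mappings.items()}
    # PSS: every mapped page counts as 1 / (number of sharers).
    pss = {
        name: sum(1.0 / sharers[page] for page in pages)
        for name, pages in mappings.items()
    }
    return rss, pss


# Made-up example: a display server maps 100 pages of a client's buffers,
# and the client additionally has 100 private pages.
shared = set(range(100))
private = set(range(100, 200))
example = {"display_server": shared, "client": shared | private}
rss, pss = rss_and_pss(example)
```

In this made-up example the display server's RSS includes all of the client's shared buffers in full, while its PSS carries only half of them, with the other half attributed to the client.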

Otherwise you can always come up with denial-of-service attacks against centralized services like X or Wayland.

>>
>> What we can do is to say that this dmem_cgroup_pool then also accounts to memcg for selected cgroups. This would not only make it superfluous to have different flags in drivers and TTM to turn this feature on/off, but also allow charging VRAM or other local memory to memcg because they use system memory as fallback for device memory.
>>
>> In other, more high-level words, memcg is actually the swapping space for dmem.
> 
> This is descriptive, but still feels very static, and nothing I've
> seen indicated I want this to be a 50% type limit.

The initial idea was to have something more like a 90% limit by default, so that at least enough memory is left to SSH into the box and kill a runaway process.
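
A rough sketch of what such a default could look like, assuming a simple charge/uncharge model. All names here (default_limit, Pool, try_charge, uncharge) are hypothetical and are not the dmem cgroup API; the only thing taken from the discussion is the 90% figure.

```python
def default_limit(total_ram, percent=90):
    """Default pool limit as a percentage of total system memory.

    Units are whatever total_ram is given in (bytes, pages, ...).
    Hypothetical helper for illustration only.
    """
    return total_ram * percent // 100


class Pool:
    """Toy model of a pool with a hard limit (not the kernel data structure)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def try_charge(self, amount):
        # Refuse the charge if it would push usage above the limit.
        if self.used + amount > self.limit:
            return False
        self.used += amount
        return True

    def uncharge(self, amount):
        # Never let the usage counter go negative.
        self.used = max(0, self.used - amount)


# Made-up numbers: 1000 units of RAM leave 100 units outside the pool.
pool = Pool(default_limit(1000))
```

With these made-up numbers, charges succeed until 900 units are in use; the remaining 100 units stay available for everything else, such as the SSH session.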

Christian.

> 
> Dave.
>>


Thread overview: 35+ messages
2026-02-24  2:06 drm/ttm/memcg/lru: enable memcg tracking for ttm and amdgpu driver (complete series v5) Dave Airlie
2026-02-24  2:06 ` [PATCH 01/16] mm: add gpu active/reclaim per-node stat counters (v2) Dave Airlie
2026-02-24  2:06 ` [PATCH 02/16] drm/ttm: use gpu mm stats to track gpu memory allocations. (v4) Dave Airlie
2026-02-24  2:06 ` [PATCH 03/16] ttm/pool: port to list_lru. (v2) Dave Airlie
2026-02-24  2:06 ` [PATCH 04/16] ttm/pool: drop numa specific pools Dave Airlie
2026-02-24  2:06 ` [PATCH 05/16] ttm/pool: make pool shrinker NUMA aware (v2) Dave Airlie
2026-02-24  2:06 ` [PATCH 06/16] ttm/pool: track allocated_pages per numa node Dave Airlie
2026-02-24  2:06 ` [PATCH 07/16] memcg: add support for GPU page counters. (v4) Dave Airlie
2026-02-24  7:20   ` kernel test robot
2026-02-24  7:50   ` Christian König
2026-02-24 19:28     ` Dave Airlie
2026-02-25  9:09       ` Christian König [this message]
2026-03-02 14:15         ` Shakeel Butt
2026-03-02 14:37           ` Christian König
2026-03-02 15:40             ` Shakeel Butt
2026-03-02 15:51               ` Christian König
2026-03-02 17:16                 ` Shakeel Butt
2026-03-02 19:36                   ` Christian König
2026-03-05  3:23                     ` Dave Airlie
2026-03-02 19:35                 ` T.J. Mercier
2026-03-03  9:29                   ` Christian König
2026-03-03 17:25                     ` T.J. Mercier
2026-03-05  3:19                   ` Dave Airlie
2026-03-05  9:25                     ` Christian König
2026-03-10  1:27                     ` T.J. Mercier
2026-02-24  2:06 ` [PATCH 08/16] ttm: add a memcg accounting flag to the alloc/populate APIs Dave Airlie
2026-02-24  8:42   ` kernel test robot
2026-02-24  2:06 ` [PATCH 09/16] ttm/pool: initialise the shrinker earlier Dave Airlie
2026-02-24  2:06 ` [PATCH 10/16] ttm: add objcg pointer to bo and tt (v2) Dave Airlie
2026-02-24  2:06 ` [PATCH 11/16] ttm/pool: enable memcg tracking and shrinker. (v3) Dave Airlie
2026-02-24  2:06 ` [PATCH 12/16] ttm: hook up memcg placement flags Dave Airlie
2026-02-24  2:06 ` [PATCH 13/16] memcontrol: allow objcg api when memcg is config off Dave Airlie
2026-02-24  2:06 ` [PATCH 14/16] amdgpu: add support for memory cgroups Dave Airlie
2026-02-24  2:06 ` [PATCH 15/16] ttm: add support for a module option to disable memcg integration Dave Airlie
2026-02-24  2:06 ` [PATCH 16/16] xe: create a flag to enable memcg accounting for XE as well Dave Airlie
