dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Waiman Long <llong@redhat.com>
To: "Christian König" <christian.koenig@amd.com>,
	"Dave Airlie" <airlied@gmail.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>
Cc: dri-devel@lists.freedesktop.org, tj@kernel.org,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	cgroups@vger.kernel.org, simona@ffwll.ch
Subject: Re: [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (v2)
Date: Thu, 15 May 2025 11:04:27 -0400	[thread overview]
Message-ID: <4ec2dd5a-3cdb-48cf-a459-4d384a48c671@redhat.com> (raw)
In-Reply-To: <b0953201-8d04-49f3-a116-8ae1936c581c@amd.com>

On 5/15/25 4:55 AM, Christian König wrote:
> On 5/15/25 05:02, Dave Airlie wrote:
>>> I have to admit I'm pretty clueless about the gpu driver internals and
>>> can't really judge how feasible this is. But from a cgroup POV, if you
>>> want proper memory isolation between groups, it seems to me that's the
>>> direction you'd have to take this in.
>> Thanks for this insight, I think you have definitely shown me where
>> things need to go here, and I agree that the goal should be to make
>> the pools and the shrinker memcg aware is the proper answer,
>> unfortunately I think we are long way from that at the moment, but
>> I'll need to do a bit more research. I wonder if we can agree on some
>> compromise points in order to move things forward from where they are
>> now.
>>
>> Right now we have 0 accounting for any system memory allocations done
>> via GPU APIs, never mind the case where we have pools and evictions.
>>
>> I think I sort of see 3 stages:
>> 1. Land some sort of accounting so you can at least see the active GPU
>> memory usage globally, per-node and per-cgroup - this series mostly
>> covers that, modulo any other feedback I get.
>> 2. Work on making the ttm subsystem cgroup aware and achieve the state
>> where we can shrink inside the cgroup first.
>> 3. Work on what to do with evicted memory for VRAM allocations, and
>> how best to integrate with dmem to possibly allow userspace to define
>> policy for this.
>>
>>> Ah, no need to worry about it. The name is just a historical memcgism,
>>> from back when we first started charging "kernel" allocations, as
>>> opposed to the conventional, pageable userspace memory. It's no longer
>>> a super meaningful distinction, tbh.
>>>
>>> You can still add a separate counter for GPU memory.
>> Okay that's interesting, so I guess the only question vs the bespoke
>> ones is whether we use __GFP_ACCOUNT and whether there is benefit in
>> having page->memcg set.
>>
>>> I agree this doesn't need to be a goal in itself. It would just be a
>>> side effect of charging through __GFP_ACCOUNT and uncharging inside
>>> __free_pages(). What's more important is that the charge lifetime is
>>> correlated with the actual memory allocation.
>> How much flexibility to do we have to evolve here, like if we start
>> with where the latest series I posted gets us (maybe with a CONFIG
>> option), then work on memcg aware shrinkers for the pools, then with
>> that in place it might make more sense to account across the complete
>> memory allocation. I think I'm also not sure if passing __GFP_ACCOUNT
>> to the dma allocators is supported, which is also something we need to
>> do, and having the bespoke API allows that to be possible.
> Stop for a second.
>
> As far as I can see the shrinker for the TTM pool should *not* be memcg aware. Background is that pages who enter the pool are considered freed by the application.
>
> The only reason we have the pool is to speed up allocation of uncached and write combined pages as well as work around for performance problems of the coherent DMA API.
>
> The shrinker makes sure that the pages can be given back to the core memory management at any given time.

I think what Johannes is suggesting that we can have a separate cgroup 
just for the memory pool so that if the system is running into global 
memory pressure, the shrinker can force the uncached pool to release 
memory back into the system.

Cheers,
Longman


  reply	other threads:[~2025-05-15 15:04 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-02  3:35 [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (v2) Dave Airlie
2025-05-02  3:36 ` [PATCH 1/5] memcg: add GPU statistic Dave Airlie
2025-05-02  3:36 ` [PATCH 2/5] memcg: add hooks for gpu memcg charging/uncharging Dave Airlie
2025-05-02  3:36 ` [PATCH 3/5] ttm: add initial memcg integration. (v2) Dave Airlie
2025-05-02 12:01   ` Christian König
2025-05-02 14:24   ` kernel test robot
2025-05-03  2:09   ` kernel test robot
2025-05-02  3:36 ` [PATCH 4/5] amdgpu: add support for memcg integration Dave Airlie
2025-05-02 14:01   ` Waiman Long
2025-05-02  3:36 ` [PATCH 5/5] nouveau: add " Dave Airlie
2025-05-06  0:37 ` [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (v2) Shakeel Butt
2025-05-06  0:59   ` Dave Airlie
2025-05-07 17:52 ` Johannes Weiner
2025-05-07 22:03   ` Dave Airlie
2025-05-07 22:11     ` Dave Airlie
2025-05-13  7:54     ` Johannes Weiner
2025-05-15  3:02       ` Dave Airlie
2025-05-15  8:55         ` Christian König
2025-05-15 15:04           ` Waiman Long [this message]
2025-05-15 15:16             ` Christian König
2025-05-15 16:08           ` Johannes Weiner
2025-05-16  6:53             ` Christian König
2025-05-16 14:53               ` Johannes Weiner
2025-05-16 15:35                 ` Christian König
2025-05-16 16:41                   ` Johannes Weiner
2025-05-16 17:42                     ` Christian König
2025-05-16 20:04                       ` Johannes Weiner
2025-05-16 20:25                         ` Dave Airlie
2025-05-18 16:28                           ` Christian König
2025-05-19  6:18                             ` Dave Airlie
2025-05-19  8:26                               ` Christian König
2025-05-22 19:51                           ` Tejun Heo
2025-05-23  7:58                             ` Christian König
2025-05-23 17:06                               ` Tejun Heo
2025-05-26  8:19                                 ` Christian König
2025-05-26 20:13                                   ` Dave Airlie
2025-05-27  8:01                                     ` Christian König
2025-05-16 16:12         ` Johannes Weiner
2025-05-21  2:23       ` Dave Airlie
2025-05-21  7:50         ` Christian König
2025-05-21 14:43         ` Johannes Weiner
2025-05-22  7:03           ` Dave Airlie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ec2dd5a-3cdb-48cf-a459-4d384a48c671@redhat.com \
    --to=llong@redhat.com \
    --cc=airlied@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hannes@cmpxchg.org \
    --cc=mhocko@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=simona@ffwll.ch \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).