From: Dave Airlie <airlied@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: dri-devel@lists.freedesktop.org, tj@kernel.org,
christian.koenig@amd.com, Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
cgroups@vger.kernel.org, Waiman Long <longman@redhat.com>,
simona@ffwll.ch
Subject: Re: [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (v2)
Date: Thu, 8 May 2025 08:11:17 +1000 [thread overview]
Message-ID: <CAPM=9tw--fB7S75THKQQtLav2XPq9REU76keggKy2_jCJe+tYg@mail.gmail.com> (raw)
In-Reply-To: <CAPM=9tw0hn=doXVdH_hxQMvUhyAQvWOp+HT24RVGA7Hi=nhwRA@mail.gmail.com>
On Thu, 8 May 2025 at 08:03, Dave Airlie <airlied@gmail.com> wrote:
>
> On Thu, 8 May 2025 at 03:52, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > Hello Dave,
> >
> > On Fri, May 02, 2025 at 01:35:59PM +1000, Dave Airlie wrote:
> > > Hey all,
> > >
> > > This is my second attempt at adding the initial simple memcg/ttm
> > > integration.
> > >
> > > This varies from the first attempt in two major ways:
> > >
> > > 1. Instead of using __GFP_ACCOUNT and direct calling kmem charges
> > > for pool memory, and directly hitting the GPU statistic, Waiman
> > > suggested I just do what the network socket stuff did, which looks
> > > simpler. So this adds two new memcg apis that wrap accounting.
> > > The pages no longer get assigned the memcg, it's owned by the
> > > larger BO object which makes more sense.
> >
> > Unfortunately, this was bad advice :(
> >
> > Naked-charging like this is quite awkward from the memcg side. It
> > requires consumer-specific charge paths in the memcg code, adds stat
> > counters that are memcg-only with no system-wide equivalent, and it's
> > difficult for the memcg maintainers to keep track of the link between
> > what's in the counter and the actual physical memory that is supposed
> > to be tracked.
> >
> > The network and a few others like it are rather begrudging exceptions
> > because they do not have a suitable page context or otherwise didn't
> > fit the charging scheme. They're not good examples to follow if it can
> > at all be avoided.
>
> I unfortunately feel GPU might fit in this category as well, we aren't
> allocating pages in the traditional manner, so we have a number of
> problems with doing it at the page level.
>
> I think the biggest barrier to doing page level tracking is with the
> page pools we have to keep. When we need CPU uncached pages, we
> allocate a bunch of pages and do the work to fix their cpu mappings to
> be uncached, this work is costly, so when we no longer need the
> uncached pages in an object, we return them to a pool rather than to
> the central allocator. We then use a shrinker to empty the pool and
> undo the cpu mapping change. We also do equivalent pools for dma able
> and 32-bit dma able pages if they are used.
>
> This means that if userspace allocates a large uncached object, and we
> account it to the current memcg with __GFP_ACCOUNT, then when it gets
> handed back to the pool we have to remove it from the memcg
> accounting. This was where I used the memcg kmem charge/uncharges,
> uncharge on adding to pool and charge on reuse. However kmem seems
> like the wrong place to charge, but it's where the initial
> __GFP_ACCOUNT will put those pages. This is why the suggestions to use
> the network solution worked better for me, I can do all the work
> outside the pool code at a slightly higher level (the same level where
> we charge/uncharge dmem), and I don't have to worry about handling the
> different pool types etc. We also don't need page->memcg to be set for
> these pages I don't think as all pages are part of a large object and
> the object is what gets swapped or freed, not parts of it.
With all that said we also manage moving objects to swap and maybe for
proper accounting of
swapping I need the more detailed integrated approach at the lower
levels, so objects that have
been swapped get removed from the gpu counter and added to the normal
swap counters.
Dave.
>
> >
> > __GFP_ACCOUNT and an enum node_stat_item is the much preferred way. I
> > have no objections to exports if you need to charge and account memory
> > from a module.
>
> Now maybe it makes sense to add a node_stat_item for this as well as
> the GPU memcg accounting so we can have it accounted in both places,
> right now mm has no insight statistics wise into this memory at all,
> other than it being pages allocated.
>
> Dave.
next prev parent reply other threads:[~2025-05-07 22:11 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-02 3:35 [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (v2) Dave Airlie
2025-05-02 3:36 ` [PATCH 1/5] memcg: add GPU statistic Dave Airlie
2025-05-02 3:36 ` [PATCH 2/5] memcg: add hooks for gpu memcg charging/uncharging Dave Airlie
2025-05-02 3:36 ` [PATCH 3/5] ttm: add initial memcg integration. (v2) Dave Airlie
2025-05-02 12:01 ` Christian König
2025-05-02 14:24 ` kernel test robot
2025-05-03 2:09 ` kernel test robot
2025-05-02 3:36 ` [PATCH 4/5] amdgpu: add support for memcg integration Dave Airlie
2025-05-02 14:01 ` Waiman Long
2025-05-02 3:36 ` [PATCH 5/5] nouveau: add " Dave Airlie
2025-05-06 0:37 ` [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (v2) Shakeel Butt
2025-05-06 0:59 ` Dave Airlie
2025-05-07 17:52 ` Johannes Weiner
2025-05-07 22:03 ` Dave Airlie
2025-05-07 22:11 ` Dave Airlie [this message]
2025-05-13 7:54 ` Johannes Weiner
2025-05-15 3:02 ` Dave Airlie
2025-05-15 8:55 ` Christian König
2025-05-15 15:04 ` Waiman Long
2025-05-15 15:16 ` Christian König
2025-05-15 16:08 ` Johannes Weiner
2025-05-16 6:53 ` Christian König
2025-05-16 14:53 ` Johannes Weiner
2025-05-16 15:35 ` Christian König
2025-05-16 16:41 ` Johannes Weiner
2025-05-16 17:42 ` Christian König
2025-05-16 20:04 ` Johannes Weiner
2025-05-16 20:25 ` Dave Airlie
2025-05-18 16:28 ` Christian König
2025-05-19 6:18 ` Dave Airlie
2025-05-19 8:26 ` Christian König
2025-05-22 19:51 ` Tejun Heo
2025-05-23 7:58 ` Christian König
2025-05-23 17:06 ` Tejun Heo
2025-05-26 8:19 ` Christian König
2025-05-26 20:13 ` Dave Airlie
2025-05-27 8:01 ` Christian König
2025-05-16 16:12 ` Johannes Weiner
2025-05-21 2:23 ` Dave Airlie
2025-05-21 7:50 ` Christian König
2025-05-21 14:43 ` Johannes Weiner
2025-05-22 7:03 ` Dave Airlie
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAPM=9tw--fB7S75THKQQtLav2XPq9REU76keggKy2_jCJe+tYg@mail.gmail.com' \
--to=airlied@gmail.com \
--cc=cgroups@vger.kernel.org \
--cc=christian.koenig@amd.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=hannes@cmpxchg.org \
--cc=longman@redhat.com \
--cc=mhocko@kernel.org \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=simona@ffwll.ch \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).