cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Airlie <airlied@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: dri-devel@lists.freedesktop.org, tj@kernel.org,
	christian.koenig@amd.com,  Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	 Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	cgroups@vger.kernel.org,  Waiman Long <longman@redhat.com>,
	simona@ffwll.ch
Subject: Re: [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (v2)
Date: Thu, 8 May 2025 08:11:17 +1000	[thread overview]
Message-ID: <CAPM=9tw--fB7S75THKQQtLav2XPq9REU76keggKy2_jCJe+tYg@mail.gmail.com> (raw)
In-Reply-To: <CAPM=9tw0hn=doXVdH_hxQMvUhyAQvWOp+HT24RVGA7Hi=nhwRA@mail.gmail.com>

On Thu, 8 May 2025 at 08:03, Dave Airlie <airlied@gmail.com> wrote:
>
> On Thu, 8 May 2025 at 03:52, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > Hello Dave,
> >
> > On Fri, May 02, 2025 at 01:35:59PM +1000, Dave Airlie wrote:
> > > Hey all,
> > >
> > > This is my second attempt at adding the initial simple memcg/ttm
> > > integration.
> > >
> > > This varies from the first attempt in two major ways:
> > >
> > > 1. Instead of using __GFP_ACCOUNT and direct calling kmem charges
> > > for pool memory, and directly hitting the GPU statistic, Waiman
> > > suggested I just do what the network socket stuff did, which looks
> > > simpler. So this adds two new memcg apis that wrap accounting.
> > > The pages no longer get assigned the memcg, it's owned by the
> > > larger BO object which makes more sense.
> >
> > Unfortunately, this was bad advice :(
> >
> > Naked-charging like this is quite awkward from the memcg side. It
> > requires consumer-specific charge paths in the memcg code, adds stat
> > counters that are memcg-only with no system-wide equivalent, and it's
> > difficult for the memcg maintainers to keep track of the link between
> > what's in the counter and the actual physical memory that is supposed
> > to be tracked.
> >
> > The network and a few others like it are rather begrudging exceptions
> > because they do not have a suitable page context or otherwise didn't
> > fit the charging scheme. They're not good examples to follow if it can
> > at all be avoided.
>
> I unfortunately feel GPU might fit in this category as well, we aren't
> allocating pages in the traditional manner, so we have a number of
> problems with doing it at the page level.
>
> I think the biggest barrier to doing page level tracking is with the
> page pools we have to keep. When we need CPU uncached pages, we
> allocate a bunch of pages and do the work to fix their cpu mappings to
> be uncached, this work is costly, so when we no longer need the
> uncached pages in an object, we return them to a pool rather than to
> the central allocator. We then use a shrinker to empty the pool and
> undo the cpu mapping change. We also do equivalent pools for dma able
> and 32-bit dma able pages if they are used.
>
> This means that if userspace allocates a large uncached object, and we
> account it to the current memcg with __GFP_ACCOUNT, then when it gets
> handed back to the pool we have to remove it from the memcg
> accounting. This was where I used the memcg kmem charge/uncharges,
> uncharge on adding to pool and charge on reuse. However kmem seems
> like the wrong place to charge, but it's where the initial
> __GFP_ACCOUNT will put those pages. This is why the suggestions to use
> the network solution worked better for me, I can do all the work
> outside the pool code at a slightly higher level (the same level where
> we charge/uncharge dmem), and I don't have to worry about handling the
> different pool types etc. We also don't need page->memcg to be set for
> these pages I don't think as all pages are part of a large object and
> the object is what gets swapped or freed, not parts of it.

With all that said we also manage moving objects to swap and maybe for
proper accounting of
swapping I need the more detailed integrated approach at the lower
levels, so objects that have
been swapped get removed from the gpu counter and added to the normal
swap counters.

Dave.

>
> >
> > __GFP_ACCOUNT and an enum node_stat_item is the much preferred way. I
> > have no objections to exports if you need to charge and account memory
> > from a module.
>
> Now maybe it makes sense to add a node_stat_item for this as well as
> the GPU memcg accounting so we can have it accounted in both places,
> right now mm has no insight statistics wise into this memory at all,
> other than it being pages allocated.
>
> Dave.

  reply	other threads:[~2025-05-07 22:11 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-02  3:35 [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (v2) Dave Airlie
2025-05-02  3:36 ` [PATCH 1/5] memcg: add GPU statistic Dave Airlie
2025-05-02  3:36 ` [PATCH 2/5] memcg: add hooks for gpu memcg charging/uncharging Dave Airlie
2025-05-02  3:36 ` [PATCH 3/5] ttm: add initial memcg integration. (v2) Dave Airlie
2025-05-02 12:01   ` Christian König
2025-05-02 14:24   ` kernel test robot
2025-05-03  2:09   ` kernel test robot
2025-05-02  3:36 ` [PATCH 4/5] amdgpu: add support for memcg integration Dave Airlie
2025-05-02 14:01   ` Waiman Long
2025-05-02  3:36 ` [PATCH 5/5] nouveau: add " Dave Airlie
2025-05-06  0:37 ` [rfc] drm/ttm/memcg: simplest initial memcg/ttm integration (v2) Shakeel Butt
2025-05-06  0:59   ` Dave Airlie
2025-05-07 17:52 ` Johannes Weiner
2025-05-07 22:03   ` Dave Airlie
2025-05-07 22:11     ` Dave Airlie [this message]
2025-05-13  7:54     ` Johannes Weiner
2025-05-15  3:02       ` Dave Airlie
2025-05-15  8:55         ` Christian König
2025-05-15 15:04           ` Waiman Long
2025-05-15 15:16             ` Christian König
2025-05-15 16:08           ` Johannes Weiner
2025-05-16  6:53             ` Christian König
2025-05-16 14:53               ` Johannes Weiner
2025-05-16 15:35                 ` Christian König
2025-05-16 16:41                   ` Johannes Weiner
2025-05-16 17:42                     ` Christian König
2025-05-16 20:04                       ` Johannes Weiner
2025-05-16 20:25                         ` Dave Airlie
2025-05-18 16:28                           ` Christian König
2025-05-19  6:18                             ` Dave Airlie
2025-05-19  8:26                               ` Christian König
2025-05-22 19:51                           ` Tejun Heo
2025-05-23  7:58                             ` Christian König
2025-05-23 17:06                               ` Tejun Heo
2025-05-26  8:19                                 ` Christian König
2025-05-26 20:13                                   ` Dave Airlie
2025-05-27  8:01                                     ` Christian König
2025-05-16 16:12         ` Johannes Weiner
2025-05-21  2:23       ` Dave Airlie
2025-05-21  7:50         ` Christian König
2025-05-21 14:43         ` Johannes Weiner
2025-05-22  7:03           ` Dave Airlie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPM=9tw--fB7S75THKQQtLav2XPq9REU76keggKy2_jCJe+tYg@mail.gmail.com' \
    --to=airlied@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hannes@cmpxchg.org \
    --cc=longman@redhat.com \
    --cc=mhocko@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=simona@ffwll.ch \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).