From: Yosry Ahmed <yosryahmed@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeelb@google.com>,
Muchun Song <muchun.song@linux.dev>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Zefan Li <lizefan.x@bytedance.com>, Yu Zhao <yuzhao@google.com>,
Luis Chamberlain <mcgrof@kernel.org>,
Kees Cook <keescook@chromium.org>,
Iurii Zaikin <yzaikin@google.com>,
"T.J. Mercier" <tjmercier@google.com>,
Greg Thelen <gthelen@google.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
cgroups@vger.kernel.org
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs
Date: Fri, 21 Jul 2023 13:59:55 -0700
Message-ID: <CAJD7tkaro0opThQaMTFr_8sAjiFFEsaZK9YzEjBaSiDJ93DOBg@mail.gmail.com>
In-Reply-To: <20230721204408.GA1033322@cmpxchg.org>
On Fri, Jul 21, 2023 at 1:44 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, Jul 21, 2023 at 11:47:49AM -0700, Yosry Ahmed wrote:
> > On Fri, Jul 21, 2023 at 11:26 AM Tejun Heo <tj@kernel.org> wrote:
> > >
> > > Hello,
> > >
> > > On Fri, Jul 21, 2023 at 11:15:21AM -0700, Yosry Ahmed wrote:
> > > > On Thu, Jul 20, 2023 at 3:31 PM Tejun Heo <tj@kernel.org> wrote:
> > > > > memory at least in our case. The sharing across them comes down to things
> > > > > like some common library pages which don't really account for much these
> > > > > days.
> > > >
> > > > Keep in mind that even a single page charged to a memcg and used by
> > > > another memcg is sufficient to result in a zombie memcg.
> > >
> > > I mean, yeah, that's a separate issue or rather a subset which isn't all
> > > that controversial. That can be deterministically solved by reparenting to
> > > the parent like how slab is handled. I think the "deterministic" part is
> > > important here. As you said, even a single page can pin a dying cgroup.
> >
> > There are serious flaws with reparenting that I mentioned above. We do
> > it for kernel memory, but that's because we really have no other
> > choice. Oftentimes the memory is not reclaimable and we cannot find an
> > owner for it. This doesn't mean it's the right answer for user memory.
> >
> > The semantics are new compared to normal charging (as opposed to
> > recharging, as I explain below). There is an extra layer of
> > indirection that we did not (as far as I know) measure the impact of.
> > Parents end up with pages that they never used and we have no
> > observability into where it came from. Most importantly, over time
> > user memory will keep accumulating at the root, reducing the accuracy
> > and usefulness of accounting, effectively an accounting leak and
> > reduction of capacity. Memory that is not attributed to any user, aka
> > system overhead.
>
> Reparenting has been the behavior since the first iteration of cgroups
> in the kernel. The initial implementation would loop over the LRUs and
> reparent pages synchronously during rmdir. This had some locking
> issues, so we switched to the current implementation of just leaving
> the zombie memcg behind but neutralizing its controls.
Thanks for the context.
>
> Thanks to Roman's objcg abstraction, we can now go back to the old
> implementation of directly moving pages up to avoid the zombies.
>
> However, these were pure implementation changes. The user-visible
> semantics never varied: when you delete a cgroup, any leftover
> resources are subject to control by the remaining parent cgroups.
> Don't remove control domains if you still need to control resources.
> But none of this is new or would change in any way!
The problem is that you cannot fully monitor or control all the
resources charged to a control domain. The example of common shared
libraries still stands: the pages are charged on a first-touch basis,
so you cannot easily control or monitor exactly who is charged for what.
Even if you can find out, is the answer to keep the cgroup alive
forever because it is charged for a shared resource?
> Neutralizing
> controls of a zombie cgroup results in the same behavior and
> accounting as linking the pages to the parent cgroup's LRU!
>
> The only thing that's new is the zombie cgroups. We can fix that by
> effectively going back to the earlier implementation, but thanks to
> objcg without the locking problems.
>
> I just wanted to address this, because your description/framing of
> reparenting strikes me as quite wrong.
Thanks for the context, and sorry if my framing was inaccurate. I was
focused more on the in-kernel semantics than on the user-visible
semantics. Nonetheless, with today's behavior or with reparenting, once
the memory is at the root level (whether reparented to the root, or
left in a zombie memcg whose parent is the root), it has effectively
escaped accounting. This is not a new problem that reparenting would
introduce, but it is a problem that recharging tries to fix and
reparenting does not.
As I outlined above, the semantics of recharging are not new: they are
equivalent to reclaiming and refaulting the memory, only done in a
faster and more efficient manner. The nondeterminism in recharging is
very similar to that of reclaiming and refaulting.
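To make the contrast concrete, here is a toy userspace model (not kernel code) of the shared-library case: jobA touches a page first and is charged for it, jobB keeps using it, then jobA is deleted. Under reparenting the charge ends up at the parent (here the root); under recharging it moves to the memcg that accesses the page next, which is where a reclaim followed by a refault would have put it. All names (toy_memcg, toy_fault, toy_reparent, toy_recharge_on_access, toy_scenario) are hypothetical, and the model assumes recharging happens only on access, whereas the RFC series also recharges mapped folios at offline time.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only: a memcg is a name, a parent, an online flag and a counter. */
struct toy_memcg {
    const char *name;
    struct toy_memcg *parent;
    bool online;
    long charged_pages;
};

struct toy_page {
    struct toy_memcg *owner; /* memcg this page is charged to, NULL if none */
};

/* True if the page is currently charged to the memcg called `name`. */
static bool toy_charged_to(const struct toy_page *p, const char *name)
{
    return p->owner && strcmp(p->owner->name, name) == 0;
}

/* First-touch charging: only the first memcg to fault the page pays for it. */
static void toy_fault(struct toy_page *p, struct toy_memcg *m)
{
    if (!p->owner) {
        p->owner = m;
        m->charged_pages++;
    }
}

/* Move an existing charge from the page's current owner to `to`. */
static void toy_move_charge(struct toy_page *p, struct toy_memcg *to)
{
    assert(p->owner && p->owner->charged_pages > 0);
    p->owner->charged_pages--;
    p->owner = to;
    to->charged_pages++;
}

/* Reparenting: a page owned by an offline memcg is charged to its parent. */
static void toy_reparent(struct toy_page *p)
{
    if (!p->owner->online && p->owner->parent)
        toy_move_charge(p, p->owner->parent);
}

/* Recharging: behaves like reclaim + refault, so the charge lands on the
 * memcg that accesses the page next. */
static void toy_recharge_on_access(struct toy_page *p, struct toy_memcg *accessor)
{
    if (!p->owner->online)
        toy_move_charge(p, accessor);
}

/* Returns the name of the memcg left paying for the shared page. */
static const char *toy_scenario(bool use_recharging)
{
    struct toy_memcg root = { "root", NULL, true, 0 };
    struct toy_memcg job_a = { "jobA", &root, true, 0 };
    struct toy_memcg job_b = { "jobB", &root, true, 0 };
    struct toy_page lib_page = { NULL };

    toy_fault(&lib_page, &job_a); /* charged to jobA on first touch */
    toy_fault(&lib_page, &job_b); /* already charged: jobB uses it for free */
    assert(toy_charged_to(&lib_page, "jobA"));

    job_a.online = false;         /* jobA's cgroup is removed */

    if (use_recharging)
        toy_recharge_on_access(&lib_page, &job_b);
    else
        toy_reparent(&lib_page);

    return lib_page.owner->name;  /* names are string literals, still valid */
}
```

In this model toy_scenario(false) leaves the page charged to "root", outside any job's accounting, while toy_scenario(true) charges it to "jobB", the memcg actually using it.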
What do you think?