From: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
To: "Michal Koutný" <mkoutny-IBi9RG/b67k@public.gmane.org>
Cc: Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>,
Roman Gushchin <guro-b10kYP2dOMg@public.gmane.org>,
Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Seth Jennings <sjenning-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Dan Streetman <ddstreet-EkmVulN54Sk@public.gmane.org>,
Minchan Kim <minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
kernel-team-b10kYP2dOMg@public.gmane.org
Subject: Re: [PATCH v2 6/6] zswap: memcg accounting
Date: Fri, 13 May 2022 13:08:13 -0400
Message-ID: <Yn6QfdouzkcrygTR@cmpxchg.org>
In-Reply-To: <20220513151426.GC16096-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
Hello Michal,
On Fri, May 13, 2022 at 05:14:26PM +0200, Michal Koutný wrote:
> On Wed, May 11, 2022 at 03:06:56PM -0400, Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> wrote:
> > Correct. After which the uncompressed page is reclaimed and uncharged.
> > So the zswapout process will reduce the charge bottom line.
>
> A zswap object falling under memory.current was my first thinking, I was
> confused why it's exported as a separate counter memory.zswap.current
> (which IMO suggests disjoint counting) and it doubles a
> memory.stat:zswap entry.
>
> Is the separate memory.zswap.current good for anything? (Except maybe
> avoiding global rstat flush on memory.stat read but that'd be an
> undesired precedent.)
Right, it's accounted as a subset rather than fully disjoint. But it
is a limitable counter of its own, so I exported it as such, with a
current and a max knob. This is comparable to the kmem counter in v1.
From an API POV it would be quite strange to have max for a counter
that has no current. Likewise it would be strange for a major memory
consumer to be missing from memory.stat.
> (Ad the eventually reduced footprint, the transitional excursion above
> memcg's (or ancestor's) limit should be limited by number of parallel
> reclaims running (each one at most a page, right?), so it doesn't seem
> necessary to tackle (now).)
Correct.
> > memory.zswap.* are there to configure zswap policy, within the
> > boundaries of available memory - it's by definition a subset.
>
> I see how the .max works when equal to 0 or "max". The intermediate
> values are more difficult to reason about.
It needs to be configured to the workload's access frequency curve,
which can be done with trial-and-error (reasonable balance between
zswpins and pswpins) or in a more targeted manner using tools such as
page_idle, damon etc.
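To make the trial-and-error variant concrete, here is a minimal userspace sketch. It assumes you sample a "name value" counter file twice (for example /proc/vmstat, which carries pswpin and, with this series, zswpin) and look at what share of swap-ins in that window was served from zswap. The helper names stat_lookup() and zswap_hit_percent() are made up for illustration, and treating zswpin and pswpin as counting separate events is an assumption of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value of "key" in a buffer of "name value\n" lines,
 * or -1 if the key is not present. Hypothetical helper.
 */
long long stat_lookup(const char *buf, const char *key)
{
	assert(buf != NULL && key != NULL);

	size_t klen = strlen(key);
	const char *line = buf;

	while (line && *line) {
		long long val;

		/* Match the whole key, followed by the separating space. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ' &&
		    sscanf(line + klen, "%lld", &val) == 1)
			return val;

		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Percentage of swap-ins between two samples that came from zswap.
 * Returns -1 if a counter is missing or nothing was swapped in.
 * Assumes zswpin and pswpin count disjoint events.
 */
int zswap_hit_percent(const char *before, const char *after)
{
	long long z0 = stat_lookup(before, "zswpin");
	long long z1 = stat_lookup(after, "zswpin");
	long long p0 = stat_lookup(before, "pswpin");
	long long p1 = stat_lookup(after, "pswpin");

	if (z0 < 0 || z1 < 0 || p0 < 0 || p1 < 0)
		return -1;
	if (z1 < z0 || p1 < p0 || (z1 - z0) + (p1 - p0) == 0)
		return -1;

	return (int)(100 * (z1 - z0) / ((z1 - z0) + (p1 - p0)));
}
```

One way to use this: raise or lower memory.zswap.max between sampling windows and watch how the percentage moves. What counts as a "reasonable balance" depends on the workload; the sketch only reports the ratio.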
> Also, I can see that on the global level, zswap is configured relatively
> (/sys/module/zswap/parameters/max_pool_percent).
> You wrote that the actual configured value is workload specific, would
> it be simpler to have also relative zswap limit per memcg?
>
> (Relative wrt memory.max, it'd be rather just a convenience with this
> simple ratio, however, it'd correspond to the top level limit. OTOH, the
> relatives would have counter-intuitive hierarchical behavior. I don't
> mean this should be changed, rather wondering why this variant was
> chosen.)
A percentage isn't a bad way to pick a global default limit for a
kernel feature. But it would have been preferable if zswap had used
the percentage internally and exposed the knob in bytes (like
min_free_kbytes for example).
Because for load tuning, bytes make much more sense. That's how you
measure the workingset, so a percentage is an awkward indirection. At
the cgroup level, it makes even less sense: all memcg tunables are in
bytes, it would be quite weird to introduce a "max" that is 0-100. Add
the confusion of how percentages would propagate down the hierarchy...
> > +bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > +{
> > +	struct mem_cgroup *memcg, *original_memcg;
> > +	bool ret = true;
> > +
> > +	original_memcg = get_mem_cgroup_from_objcg(objcg);
> > +	for (memcg = original_memcg; memcg != root_mem_cgroup;
> > +	     memcg = parent_mem_cgroup(memcg)) {
> > +		unsigned long max = READ_ONCE(memcg->zswap_max);
> > +		unsigned long pages;
> > +
> > +		if (max == PAGE_COUNTER_MAX)
> > +			continue;
> > +		if (max == 0) {
> > +			ret = false;
> > +			break;
> > +		}
> > +
> > +		cgroup_rstat_flush(memcg->css.cgroup);
>
> Here, I think it'd be better not to bypass mem_cgroup_flush_stats() (the
> mechanism is approximate and you traverse all ancestors anyway), i.e.
> mem_cgroup_flush_stats() before the loop instead of this.
I don't traverse all ancestors, I bail on disabled groups and skip
unlimited ones. This saves a lot of flushes in practice right now: our
heaviest swapping cgroups have zswap disabled (max=0) because they're
lowpri and forced to disk. Likewise, the zswap users have their zswap
limit several levels down from the root, and I currently don't ever
flush the higher levels (max=PAGE_COUNTER_MAX).
Flushing unnecessary groups with a ratelimit doesn't sound like an
improvement to me.
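To show which levels get flushed, here is a toy userspace model of the walk. It is a sketch, not the kernel code: struct toy_cgroup, TOY_ZSWAP_UNLIMITED and toy_may_zswap() are hypothetical names, the flush is replaced by a per-level counter, and the final usage-versus-limit comparison is an assumption about what follows the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PAGE_COUNTER_MAX, i.e. a limit of "max". */
#define TOY_ZSWAP_UNLIMITED ((unsigned long)-1)

/* Toy cgroup: parent link, zswap limit and usage in pages. */
struct toy_cgroup {
	struct toy_cgroup *parent;	/* NULL for the root */
	unsigned long zswap_max;
	unsigned long zswap_pages;
	unsigned int flushes;		/* how often this level was "flushed" */
};

/*
 * Walk from cg towards the root (the root itself is not checked),
 * skipping unlimited levels and bailing at the first disabled one.
 * Only levels with an intermediate limit pay for a flush.
 */
bool toy_may_zswap(struct toy_cgroup *cg)
{
	assert(cg != NULL);

	for (; cg->parent != NULL; cg = cg->parent) {
		unsigned long max = cg->zswap_max;

		if (max == TOY_ZSWAP_UNLIMITED)
			continue;		/* no limit: nothing to check */
		if (max == 0)
			return false;		/* disabled: no flush needed */

		cg->flushes++;			/* stands in for cgroup_rstat_flush() */
		if (cg->zswap_pages >= max)	/* assumed limit check */
			return false;
	}
	return true;
}
```

With a hierarchy of root, an unlimited parent, a limited child and an unlimited grandchild, a call on the grandchild bumps only the limited child's counter. A group with max=0 returns false without touching any counter.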
Thanks