From: Yosry Ahmed <yosryahmed@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Nhat Pham <nphamcs@gmail.com>,
	Chengming Zhou <zhouchengming@bytedance.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm: zswap: optimize zswap pool size tracking
Date: Tue, 12 Mar 2024 04:04:33 +0000
Message-ID: <Ze_UUeajWWkKpZJ0@google.com>
In-Reply-To: <20240312023411.GA22705@cmpxchg.org>
On Mon, Mar 11, 2024 at 10:34:11PM -0400, Johannes Weiner wrote:
> On Mon, Mar 11, 2024 at 10:09:35PM +0000, Yosry Ahmed wrote:
> > On Mon, Mar 11, 2024 at 12:12:13PM -0400, Johannes Weiner wrote:
> > > Profiling the munmap() of a zswapped memory region shows 50%(!) of the
> > > total cycles currently going into updating the zswap_pool_total_size.
> >
> > Yikes. I have always hated that size update scheme FWIW.
> >
> > I have also wondered whether it makes sense to just maintain the number
> > of pages in zswap as an atomic, like zswap_stored_pages. I guess your
> > proposed scheme is even cheaper for the load/invalidate paths because we
> > do nothing at all. It could be an option if the aggregation in other
> > paths ever becomes a problem, but we would need to make sure it
> > doesn't regress the load/invalidate paths. Just sharing some thoughts.
>
> Agree with you there. I actually tried doing it that way at first, but
> noticed zram uses zs_get_total_pages() and actually wants a per-pool
> count. I didn't want the backend to have to update two atomics, so I
> settled for this version.
Could be useful to document this context if you send a v2. This version
is a big improvement anyway, so hopefully we don't need to revisit.
>
> > > There are three consumers of this counter:
> > > - store, to enforce the globally configured pool limit
> > > - meminfo & debugfs, to report the size to the user
> > > - shrink, to determine the batch size for each cycle
> > >
> > > Instead of aggregating everytime an entry enters or exits the zswap
> > > pool, aggregate the value from the zpools on-demand:
> > >
> > > - Stores aggregate the counter anyway upon success. Aggregating to
> > > check the limit instead is the same amount of work.
> > >
> > > - Meminfo & debugfs might benefit somewhat from a pre-aggregated
> > > counter, but aren't exactly hotpaths.
> > >
> > > - Shrinking can aggregate once for every cycle instead of doing it for
> > > every freed entry. As the shrinker might work on tens or hundreds of
> > > objects per scan cycle, this is a large reduction in aggregations.
> > >
> > > The paths that benefit dramatically are swapin, swapoff, and
> > > unmaps. There could be millions of pages being processed until
> > > somebody asks for the pool size again. This eliminates the pool size
> > > updates from those paths entirely.
> >
> > This looks like a big win, thanks! I wonder if you have any numbers of
> > perf profiles to share. That would be nice to have, but I think the
> > benefit is clear regardless.
>
> I deleted the perf files already, but can re-run it tomorrow.
Thanks!
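To make the trade-off discussed above concrete, here is a rough userspace
sketch of on-demand aggregation. It is only a model, not the patch itself:
NR_BACKENDS, struct backend_model, and the three helper names are made up
for illustration, and C11 atomics stand in for the kernel's atomic types.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_BACKENDS 4	/* hypothetical count, not taken from the patch */

/* Each backend tracks its own page count, independent of any global total. */
struct backend_model {
	atomic_ulong nr_pages;
};

static struct backend_model backends[NR_BACKENDS];

/* Hot path (store): only the backend's own counter is touched. */
static void backend_pages_add(size_t idx, unsigned long nr)
{
	atomic_fetch_add(&backends[idx].nr_pages, nr);
}

/* Hot path (load/invalidate): again, no global size update at all. */
static void backend_pages_sub(size_t idx, unsigned long nr)
{
	atomic_fetch_sub(&backends[idx].nr_pages, nr);
}

/* Cold path: sum the per-backend counters only when a consumer asks. */
static unsigned long model_total_pages(void)
{
	unsigned long total = 0;

	for (size_t i = 0; i < NR_BACKENDS; i++)
		total += atomic_load(&backends[i].nr_pages);
	return total;
}
```

In this model the swapin/unmap side only ever calls backend_pages_sub(),
while the limit check, meminfo/debugfs, and the shrinker would call
model_total_pages() at the point where they actually need the value.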
>
> > I also like the implicit cleanup when we switch to maintaining the
> > number of pages rather than bytes. The code looks much better with all
> > the shifts and divisions gone :)
> >
> > I have a couple of comments below. With them addressed, feel free to
> > add:
> > Acked-by: Yosry Ahmed <yosryahmed@google.com>
>
> Thanks!
>
> > > @@ -1385,6 +1365,10 @@ static void shrink_worker(struct work_struct *w)
> > >  {
> > >  	struct mem_cgroup *memcg;
> > >  	int ret, failures = 0;
> > > +	unsigned long thr;
> > > +
> > > +	/* Reclaim down to the accept threshold */
> > > +	thr = zswap_max_pages() * zswap_accept_thr_percent / 100;
> >
> > This calculation is repeated twice, so I'd rather keep a helper for it
> > as an alternative to zswap_can_accept(). Perhaps zswap_threshold_page()
> > or zswap_acceptance_pages()?
>
> Sounds good. I went with zswap_accept_thr_pages().
Even better.
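For reference, the helper chain would presumably look something like the
userspace sketch below. Only the accept-threshold formula comes from the
hunk above; the assumption that the pool limit is a percentage of total RAM
pages, the model_* names, the parameters, and the sample values are all
illustrative.

```c
/*
 * Assumed model: the pool limit is a percentage of total RAM, in pages.
 * The real zswap_max_pages() body is not shown in this thread.
 */
static unsigned long model_max_pages(unsigned long total_ram_pages,
				     unsigned int max_pool_percent)
{
	return total_ram_pages * max_pool_percent / 100;
}

/*
 * Mirrors the quoted calculation: the pool limit scaled by the accept
 * threshold percentage, so both callers share one definition.
 */
static unsigned long model_accept_thr_pages(unsigned long total_ram_pages,
					    unsigned int max_pool_percent,
					    unsigned int accept_thr_percent)
{
	return model_max_pages(total_ram_pages, max_pool_percent) *
	       accept_thr_percent / 100;
}
```

With 1000000 RAM pages, a 20% pool limit, and a 90% accept threshold, the
model gives a limit of 200000 pages and a threshold of 180000 pages.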
>
> > > @@ -1711,6 +1700,13 @@ void zswap_swapoff(int type)
> > >
> > > static struct dentry *zswap_debugfs_root;
> > >
> > > +static int debugfs_get_total_size(void *data, u64 *val)
> > > +{
> > > +	*val = zswap_total_pages() * PAGE_SIZE;
> > > +	return 0;
> > > +}
> > > +DEFINE_DEBUGFS_ATTRIBUTE(total_size_fops, debugfs_get_total_size, NULL, "%llu");
> >
> > I think we are missing a newline here to maintain the current format
> > (i.e "%llu\n").
>
> Oops, good catch! I had verified the debugfs file (along with the
> others) with 'grep . *', which hides that this is missing. Fixed up.
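As a quick illustration of the format fix, here is a userspace sketch with
snprintf() standing in for the debugfs attribute formatting.
MODEL_PAGE_SIZE and format_total_size() are hypothetical names, and 4096 is
just an assumed page size.

```c
#include <stdio.h>

#define MODEL_PAGE_SIZE 4096ULL	/* assumed page size, for illustration only */

/*
 * Convert a page count to bytes and format it with a trailing newline,
 * matching the "%llu\n" format suggested above. Returns the number of
 * characters written, as reported by snprintf().
 */
static int format_total_size(char *buf, size_t len, unsigned long long pages)
{
	return snprintf(buf, len, "%llu\n", pages * MODEL_PAGE_SIZE);
}
```

Without the "\n" in the format string, the output of two pages would be
"8192" with nothing terminating the line.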
>
> Thanks for taking a look. The incremental diff is below. I'll run the
> tests and recapture the numbers tomorrow, then send v2.
LGTM. Feel free to carry the Ack forward.