From: Yosry Ahmed <yosryahmed@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>,
Sergey Senozhatsky <senozhatsky@chromium.org>,
Nhat Pham <nphamcs@gmail.com>,
akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, ngupta@vflare.org,
sjenning@redhat.com, ddstreet@ieee.org,
vitaly.wool@konsulko.com
Subject: Re: [PATCH 2/5] zsmalloc: Consolidate zs_pool's migrate_lock and size_class's locks
Date: Thu, 3 Nov 2022 11:10:47 -0700 [thread overview]
Message-ID: <CAJD7tkZ6VDnX3ACVS-iKCJfS4CrLdnZWzLD9T1-yqqDDhMrk_Q@mail.gmail.com> (raw)
In-Reply-To: <Y2QDgSAc7FcIaNBh@cmpxchg.org>
On Thu, Nov 3, 2022 at 11:08 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, Nov 03, 2022 at 08:53:29AM -0700, Minchan Kim wrote:
> > On Thu, Nov 03, 2022 at 11:18:04AM -0400, Johannes Weiner wrote:
> > > On Wed, Nov 02, 2022 at 02:36:35PM -0700, Minchan Kim wrote:
> > > > On Wed, Nov 02, 2022 at 12:28:56PM +0900, Sergey Senozhatsky wrote:
> > > > > On (22/10/26 13:06), Nhat Pham wrote:
> > > > > > struct size_class {
> > > > > > - spinlock_t lock;
> > > > > > struct list_head fullness_list[NR_ZS_FULLNESS];
> > > > > > /*
> > > > > > * Size of objects stored in this class. Must be multiple
> > > > > > @@ -247,8 +245,7 @@ struct zs_pool {
> > > > > > #ifdef CONFIG_COMPACTION
> > > > > > struct work_struct free_work;
> > > > > > #endif
> > > > > > - /* protect page/zspage migration */
> > > > > > - rwlock_t migrate_lock;
> > > > > > + spinlock_t lock;
> > > > > > };
> > > > >
> > > > > I'm not in love with this, to be honest. One big pool lock instead
> > > > > of 255 per-class locks doesn't look attractive, as one big pool lock
> > > > > is going to be hammered quite a lot when zram is used, e.g. as a regular
> > > > > block device with a file system and is under heavy parallel writes/reads.
> > >
> > > TBH the class always struck me as an odd scope to split the lock. Lock
> > > contention depends on how variable the compression rate is of the
> > > hottest incoming data, which is unpredictable from a user POV.
> > >
> > > My understanding is that the primary usecase for zram is swapping, and
> > > the pool lock is the same granularity as the swap locking.
> >
> > People uses the zram to store caching object files in build server.
>
> Oh, interesting. We can try with a kernel build directory on zram.
>
> > > Do you have a particular one in mind? (I'm thinking journaled ones are
> > > not of much interest, since their IO tends to be fairly serialized.)
> > >
> > > btrfs?
> >
> > I am not sure what FSes others are using but at least for me, just
> > plain ext4.
>
> Okay, we can test with both.
>
> > > > I am also worry about that LRU stuff should be part of allocator
> > > > instead of higher level.
> > >
> > > I'm sorry, but that's not a reasonable objection.
> > >
> > > These patches implement a core feature of being a zswap backend, using
> > > standard LRU and locking techniques established by the other backends.
> > >
> > > I don't disagree that it would nicer if zswap had a strong abstraction
> > > for backend pages and a generalized LRU. But that is major surgery on
> > > a codebase of over 6,500 lines. It's not a reasonable ask to change
> > > all that first before implementing a basic feature that's useful now.
> >
> > With same logic, folks added the LRU logic into their allocators
> > without the effort considering moving the LRU into upper layer.
> >
> > And then trend is still going on since I have seen multiple times
> > people are trying to add more allocators. So if it's not a reasonable
> > ask to consier, we couldn't stop the trend in the end.
>
> So there is actually an ongoing effort to do that. Yosry and I have
> spent quite some time on coming up with an LRU design that's
> independent from compression policy over email and at Plumbers.
>
> My concern is more about the order of doing things:
>
> 1. The missing writeback support is a gaping hole in zsmalloc, which
> affects production systems. A generalized LRU list is a good idea,
> but it's a huge task that from a user pov really is not
> critical. Even from a kernel dev / maintainer POV, there are bigger
> fish to fry in the zswap code base and the backends than this.
>
> 2. Refactoring existing functionality is much easier than writing
> generalized code that simultaneously enables new behavior. zsmalloc
> is the most complex of our backends. To make its LRU writeback work
> we had to patch zswap's ->map ordering to accomodate it, e.g. Such
> tricky changes are easier to make and test incrementally.
>
> The generalized LRU project will hugely benefit from already having
> a proven writeback implementation in zsmalloc, because then all the
> requirements in zswap and zsmalloc will be in black and white.
>
> > > I get that your main interest is zram, and so this feature isn't of
> > > interest to you. But zram isn't the only user, nor is it the primary
> >
> > I am interest to the feature but my interest is more of general swap
> > layer to manage the LRU so that it could support any hierarchy among
> > swap devices, not only zswap.
>
> I think we're on the same page about the longer term goals.
>
Yeah. As Johannes said, I was also recently looking into this. This
can also help solve other problems than consolidating implementations.
Currently if zswap rejects a page, it goes into swap, which is
more-or-less a violation of page LRUs since hotter pages that are more
recently reclaimed end up in swap (slow), while colder pages that were
reclaimed before are in zswap. Having a separate layer managing the
LRU of swap pages can also make sure this doesn't happen.
More broadly, making zswap a separate layer from swap enables other
improvements such as using zswap regardless of the presence of a
backend swapfile and not consuming space in swapfiles if a page is in
zswap. Of course, this is a much larger surgery.
I am intending to spend more time looking further into this, but other
things keep popping up :)
next prev parent reply other threads:[~2022-11-03 18:11 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-26 20:06 [PATCH 0/5] Implement writeback for zsmalloc Nhat Pham
2022-10-26 20:06 ` [PATCH 1/5] zswap: fix writeback lock ordering " Nhat Pham
2022-10-26 20:06 ` [PATCH 2/5] zsmalloc: Consolidate zs_pool's migrate_lock and size_class's locks Nhat Pham
2022-10-28 14:46 ` Johannes Weiner
2022-11-02 3:28 ` Sergey Senozhatsky
2022-11-02 21:36 ` Minchan Kim
2022-11-03 15:18 ` Johannes Weiner
2022-11-03 15:53 ` Minchan Kim
2022-11-03 18:08 ` Johannes Weiner
2022-11-03 18:10 ` Yosry Ahmed [this message]
2022-11-03 20:37 ` Minchan Kim
2022-11-03 20:46 ` Yosry Ahmed
2022-11-03 21:15 ` Yu Zhao
2022-11-03 23:19 ` Yosry Ahmed
2022-11-03 21:43 ` Minchan Kim
2022-11-03 23:31 ` Yosry Ahmed
2022-11-03 20:22 ` Minchan Kim
2022-11-04 3:58 ` Sergey Senozhatsky
2022-11-07 21:31 ` Nhat Pham
2022-11-07 22:35 ` Minchan Kim
2022-10-26 20:06 ` [PATCH 3/5] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order Nhat Pham
2022-10-28 14:47 ` Johannes Weiner
2022-10-26 20:06 ` [PATCH 4/5] zsmalloc: Add ops fields to zs_pool to store evict handlers Nhat Pham
2022-10-28 15:18 ` Johannes Weiner
2022-11-02 4:10 ` Sergey Senozhatsky
2022-11-07 21:36 ` Nhat Pham
2022-10-26 20:06 ` [PATCH 5/5] zsmalloc: Implement writeback mechanism for zsmalloc Nhat Pham
2022-10-27 13:53 ` kernel test robot
2022-10-27 18:27 ` [PATCH v2 " Nhat Pham
2022-10-28 15:19 ` Johannes Weiner
2022-11-02 3:42 ` Sergey Senozhatsky
2022-11-03 15:26 ` Johannes Weiner
2022-11-02 3:44 ` Sergey Senozhatsky
2022-11-02 4:13 ` Sergey Senozhatsky
2022-11-03 16:45 ` Johannes Weiner
2022-11-04 4:02 ` Sergey Senozhatsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJD7tkZ6VDnX3ACVS-iKCJfS4CrLdnZWzLD9T1-yqqDDhMrk_Q@mail.gmail.com \
--to=yosryahmed@google.com \
--cc=akpm@linux-foundation.org \
--cc=ddstreet@ieee.org \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=minchan@kernel.org \
--cc=ngupta@vflare.org \
--cc=nphamcs@gmail.com \
--cc=senozhatsky@chromium.org \
--cc=sjenning@redhat.com \
--cc=vitaly.wool@konsulko.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).