From: SeongJae Park <sj@kernel.org>
To: Nhat Pham <nphamcs@gmail.com>
Cc: SeongJae Park <sj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Chengming Zhou <chengming.zhou@linux.dev>,
Johannes Weiner <hannes@cmpxchg.org>,
Takero Funaki <flintglass@gmail.com>,
Yosry Ahmed <yosry.ahmed@linux.dev>,
kernel-team@meta.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [RFC PATCH] mm/zswap: store compression failed page as-is
Date: Thu, 31 Jul 2025 09:43:23 -0700
Message-ID: <20250731164323.15107-1-sj@kernel.org>
In-Reply-To: <CAKEwX=NC65XCkmX1YzivEJtPc+sEJ3pLHUsYhF60QJnk_OtpVw@mail.gmail.com>

Hi Nhat,

On Wed, 30 Jul 2025 17:21:44 -0700 Nhat Pham <nphamcs@gmail.com> wrote:

> On Wed, Jul 30, 2025 at 4:41 PM SeongJae Park <sj@kernel.org> wrote:
> >
> > When zswap writeback is enabled and it fails compressing a given page,
> > zswap lets the page be swapped out to the backing swap device. This
> > behavior breaks the zswap's writeback LRU order, and hence users can
> > experience unexpected latency spikes.
> >
> > Keep the LRU order by storing the original content in zswap as-is. The
> > original content is saved in a dynamically allocated page size buffer,
> > and the pointer to the buffer is kept in zswap_entry, on the space for
> > zswap_entry->pool. Whether the space is used for the original content
> > or zpool is identified by 'zswap_entry->length == PAGE_SIZE'.
[...]
> > ---
> > mm/zswap.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++------
> > 1 file changed, 65 insertions(+), 8 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 7e02c760955f..e021865696c6 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
[...]
> > +/*
> > + * If the compression is failed, try saving the content as is without
> > + * compression, to keep the LRU order. This can increase memory overhead from
> > + * metadata, but in common zswap use cases where there are sufficient amount of
> > + * compressible pages, the overhead should not be critical, and can be
> > + * mitigated by the writeback. Also, the decompression overhead is optimized.
> > + *
> > + * When the writeback is disabled, however, the additional overhead could be
> > + * problematic. For the case, just return the failure. swap_writeout() will
> > + * put the page back to the active LRU list in the case.
> > + */
> > +static int zswap_handle_compression_failure(int comp_ret, struct page *page,
> > + struct zswap_entry *entry)
> > +{
> > + if (!zswap_save_incompressible_pages)
> > + return comp_ret;
> > + if (!mem_cgroup_zswap_writeback_enabled(
> > + folio_memcg(page_folio(page))))
> > + return comp_ret;
> > +
> > + entry->orig_data = kmalloc_node(PAGE_SIZE, GFP_NOWAIT | __GFP_NORETRY |
> > + __GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page));
>
> Hmm, seems like this new buffer is not migratable (for compaction etc.?)
>
> My understanding is that zsmalloc's allocated memory can be migrated
> (which is why zswap only works with a handle - it's a layer of
> indirection that gives zsmalloc the ability to move memory around).
>
> Besides, why should we re-invent the wheel when zsmalloc already
> handles page-sized objects? :)

Makes sense, I will use zpool in the next version.

I saw that both you and Takero did so in your versions, but I didn't realize
the migration benefit of that approach. Thank you for enlightening me. I now
think the migration benefit is important, so the next version will reuse zpool
to keep the stored pages migratable.

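To make the indirection point concrete, here is a minimal userspace sketch of
why a handle keeps working when the backing memory moves. It is only a toy
model, not zsmalloc or zpool code: `toy_pool`, `toy_store()`, `toy_migrate()`
and `toy_map()` are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_OBJS 8	/* arbitrary table size for this sketch */

/* Toy pool: a handle is just an index into a table of pointers. */
struct toy_pool {
	void *slot[TOY_MAX_OBJS];
	size_t size[TOY_MAX_OBJS];
};

/* Copy 'len' bytes into the pool; return a handle, or -1 on failure. */
static int toy_store(struct toy_pool *p, const void *data, size_t len)
{
	for (int h = 0; h < TOY_MAX_OBJS; h++) {
		if (p->slot[h])
			continue;	/* slot already in use */
		p->slot[h] = malloc(len);
		if (!p->slot[h])
			return -1;
		memcpy(p->slot[h], data, len);
		p->size[h] = len;
		return h;
	}
	return -1;	/* table is full */
}

/*
 * "Migrate" the object: move it to freshly allocated memory and update the
 * table. The caller's handle does not change, which is the whole point.
 */
static int toy_migrate(struct toy_pool *p, int h)
{
	void *new_mem;

	assert(h >= 0 && h < TOY_MAX_OBJS && p->slot[h]);
	new_mem = malloc(p->size[h]);
	if (!new_mem)
		return -1;
	memcpy(new_mem, p->slot[h], p->size[h]);
	free(p->slot[h]);
	p->slot[h] = new_mem;
	return 0;
}

/* Resolve a handle to the object's current address. */
static const void *toy_map(const struct toy_pool *p, int h)
{
	return p->slot[h];
}
```

A caller that kept a raw pointer would be left dangling after
`toy_migrate()`, while a caller that kept only the handle and calls
`toy_map()` again still reaches the data.
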
>
> > + if (!entry->orig_data)
> > + return -ENOMEM;
> > + memcpy_from_page(entry->orig_data, page, 0, PAGE_SIZE);
> > + entry->length = PAGE_SIZE;
> > + atomic_long_inc(&zswap_stored_uncompressed_pages);
> > + return 0;
> > +}
> > +
> > static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> > struct zswap_pool *pool)
> > {
> > @@ -976,8 +1023,11 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> > */
> > comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
> > dlen = acomp_ctx->req->dlen;
> > - if (comp_ret)
> > + if (comp_ret) {
> > + comp_ret = zswap_handle_compression_failure(comp_ret, page,
> > + entry);
> > goto unlock;
> > + }
> >
> > zpool = pool->zpool;
> > gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
> > @@ -1009,6 +1059,11 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> > int decomp_ret, dlen;
> > u8 *src, *obj;
> >
> > + if (entry->length == PAGE_SIZE) {
> > + memcpy_to_folio(folio, 0, entry->orig_data, entry->length);
> > + return true;
> > + }
>
> This might not be safe.
>
> It's conceivable that in zswap_compress(), some compression algorithm
> "successfully" compresses a page to the same size (comp_ret == 0). We
> hand that to zsmalloc, which happily stores the page.
>
> When we "decompress" the page again, we will attempt to
> memcpy_to_folio from a bogus address (the handle from zsmalloc).
Makes sense, thank you for catching this.
>
> So, in zswap_compress, you have to treat both comp_ret == 0 and dlen
> == PAGE_SIZE as "compression failure".

I saw your follow-up clarifying that you meant both comp_ret != 0 and dlen ==
PAGE_SIZE, and yes, that makes sense. I will do so in the next version.

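For reference, the condition could be sketched as the small userspace
predicate below. This is an illustration only and not the code of the next
version: `sketch_compression_failed()` and `SKETCH_PAGE_SIZE` are hypothetical
names, and the 4096 value is an assumption standing in for the kernel's
PAGE_SIZE.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's PAGE_SIZE; 4096 is an assumption of this sketch. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: return true when the page should take the
 * "store as-is" path. Both a compressor error (comp_ret != 0) and an output
 * that is exactly page sized (dlen == PAGE_SIZE) count as failure, so that
 * 'length == PAGE_SIZE' always identifies an uncompressed entry.
 */
static bool sketch_compression_failed(int comp_ret, size_t dlen)
{
	return comp_ret != 0 || dlen == SKETCH_PAGE_SIZE;
}
```

With this, an entry whose length equals the page size can only have come from
the as-is path, which is what the decompression-side check relies on.
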
Thanks,
SJ
[...]