From: SeongJae Park <sj@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: SeongJae Park <sj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Chengming Zhou <chengming.zhou@linux.dev>,
Nhat Pham <nphamcs@gmail.com>,
Takero Funaki <flintglass@gmail.com>,
Yosry Ahmed <yosry.ahmed@linux.dev>,
kernel-team@meta.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [RFC PATCH] mm/zswap: store compression failed page as-is
Date: Thu, 31 Jul 2025 10:09:22 -0700 [thread overview]
Message-ID: <20250731170922.15509-1-sj@kernel.org> (raw)
In-Reply-To: <20250731152701.GA1055539@cmpxchg.org>
On Thu, 31 Jul 2025 11:27:01 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> Hi SJ,
>
> On Wed, Jul 30, 2025 at 04:40:59PM -0700, SeongJae Park wrote:
> > When zswap writeback is enabled and it fails compressing a given page,
> > zswap lets the page be swapped out to the backing swap device. This
> > behavior breaks the zswap's writeback LRU order, and hence users can
> > experience unexpected latency spikes.
>
> +1 Thanks for working on this!
My pleasure :)
[...]
> > config 0 1-1 1-2 1-3 2-1 2-2 2-3
> > perf_normalized 1.0000 0.0060 0.0230 0.0310 0.0030 0.0116 0.0003
> > perf_stdev_ratio 0.06 0.04 0.11 0.14 0.03 0.05 0.26
> > zswpin 0 0 1701702 6982188 0 2479848 815742
> > zswpout 0 0 1725260 7048517 0 2535744 931420
> > zswpwb 0 0 0 0 0 0 0
> > pswpin 0 468612 481270 0 476434 535772 0
> > pswpout 0 574634 689625 0 612683 591881 0
>
> zswpwb being zero across the board suggests the zswap shrinker was not
> enabled. Can you double check that?
You're correct, I didn't.
>
> We should only take on incompressible pages to maintain LRU order on
> their way to disk. If we don't try to move them out, then it's better
> to reject them and avoid the metadata overhead.
I agree. I will update the test to explore the writeback effect, and share the
results, by the posting of the next version of this patch.
[...]
> > +/*
> > + * If the compression is failed, try saving the content as is without
> > + * compression, to keep the LRU order. This can increase memory overhead from
> > + * metadata, but in common zswap use cases where there are sufficient amount of
> > + * compressible pages, the overhead should be not ciritical, and can be
> > + * mitigated by the writeback. Also, the decompression overhead is optimized.
> > + *
> > + * When the writeback is disabled, however, the additional overhead could be
> > + * problematic. For the case, just return the failure. swap_writeout() will
> > + * put the page back to the active LRU list in the case.
> > + */
> > +static int zswap_handle_compression_failure(int comp_ret, struct page *page,
> > + struct zswap_entry *entry)
> > +{
> > + if (!zswap_save_incompressible_pages)
> > + return comp_ret;
> > + if (!mem_cgroup_zswap_writeback_enabled(
> > + folio_memcg(page_folio(page))))
> > + return comp_ret;
> > +
> > + entry->orig_data = kmalloc_node(PAGE_SIZE, GFP_NOWAIT | __GFP_NORETRY |
> > + __GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page));
> > + if (!entry->orig_data)
> > + return -ENOMEM;
> > + memcpy_from_page(entry->orig_data, page, 0, PAGE_SIZE);
> > + entry->length = PAGE_SIZE;
> > + atomic_long_inc(&zswap_stored_uncompressed_pages);
> > + return 0;
> > +}
>
> Better to still use the backend (zsmalloc) for storage. It'll give you
> migratability, highmem handling, stats etc.
Nhat also pointed out the migratability. Thank you for further letting me know
even more benefits from zsmalloc reuse. As I also replied to Nhat, I agree
those are important benefits, and I will rework on the next version to use the
backend.
>
> So if compression fails, still do zpool_malloc(), but for PAGE_SIZE
> and copy over the uncompressed page contents.
>
> struct zswap_entry has a hole after bool referenced, so you can add a
> flag to mark those uncompressed entries at no extra cost.
>
> Then you can detect this case in zswap_decompress() and handle the
> uncompressed copy into @folio accordingly.
I think we could still use 'zswap_entry->length == PAGE_SIZE' as the indicator,
As long as we ensure that always means the content is incompressed, following
Nhat's suggestion[1].
Please let me know if I'm missing something.
[1] https://lore.kernel.org/CAKEwX=Py+yvxtR5zt-1DtskhGWWHkRP_h8kneEHSrcQ947=m9Q@mail.gmail.com
Thanks,
SJ
next prev parent reply other threads:[~2025-07-31 17:09 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-30 23:40 [RFC PATCH] mm/zswap: store compression failed page as-is SeongJae Park
2025-07-31 0:21 ` Nhat Pham
2025-07-31 0:22 ` Nhat Pham
2025-07-31 16:43 ` SeongJae Park
2025-07-31 16:43 ` SeongJae Park
2025-07-31 0:48 ` Nhat Pham
2025-07-31 16:56 ` SeongJae Park
2025-07-31 15:27 ` Johannes Weiner
2025-07-31 17:09 ` SeongJae Park [this message]
2025-07-31 18:16 ` Johannes Weiner
2025-07-31 17:20 ` Joshua Hahn
2025-08-01 19:57 ` SeongJae Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250731170922.15509-1-sj@kernel.org \
--to=sj@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=chengming.zhou@linux.dev \
--cc=flintglass@gmail.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nphamcs@gmail.com \
--cc=yosry.ahmed@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).