* [PATCH v5] mm/zswap: store <PAGE_SIZE compression failed page as-is
@ 2025-08-22 19:08 SeongJae Park
From: SeongJae Park @ 2025-08-22 19:08 UTC
  Cc: SeongJae Park, Andrew Morton, Chengming Zhou, Herbert Xu,
	Johannes Weiner, Nhat Pham, Yosry Ahmed, kernel-team,
	linux-kernel, linux-mm, Takero Funaki, David Hildenbrand,
	Baoquan He, Barry Song, Chris Li, Kairui Song

When zswap writeback is enabled and compressing a given page fails, the
page is swapped out to the backing swap device.  This behavior breaks
zswap's writeback LRU order, and hence users can experience unexpected
latency spikes.  If the page is compressed without failure but the result
is PAGE_SIZE in size, the LRU order is kept, but the decompression
overhead for loading the page back on a later access is unnecessary.

Keep the LRU order and avoid the unnecessary decompression overhead in
those cases by storing the original content as-is in the zpool.  The
length field of such a zswap_entry is set to PAGE_SIZE, so whether an
entry is saved as-is (and hence needs no decompression) is identified by
'zswap_entry->length == PAGE_SIZE'.
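
In other words, the load path can skip the decompressor entirely for
such entries.  A minimal sketch of the idea, as realized in the
zswap_decompress() hunk below:

    /* zswap entries of length PAGE_SIZE are stored as-is. */
    if (entry->length == PAGE_SIZE) {
            /* No decompression needed; copy the raw content back. */
            memcpy_to_folio(folio, 0, obj, entry->length);
            return true;
    }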

Because the uncompressed data is saved in the zpool in the same way as
the compressed data, this introduces no change in terms of memory
management, including the movability and migratability of the involved
pages.

This change does not increase the per-entry zswap metadata overhead.  But
as the number of incompressible pages increases, the total zswap metadata
overhead increases proportionally.  The overhead should not be
problematic in usual cases, since the zswap metadata for a single zswap
entry is much smaller than PAGE_SIZE, and common zswap use cases should
have a sufficient amount of compressible pages.  The overhead can also be
mitigated by zswap writeback.
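For a rough sense of scale, a single zswap_entry is on the order of a
few tens of bytes, versus the PAGE_SIZE (typically 4096) bytes of
content that it tracks, so the per-page metadata cost is around 1%.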

When writeback is disabled, the additional overhead could be
problematic.  For that case, keep the current behavior: just return the
failure and let swap_writeout() put the page back on the active LRU
list.

Knowing how many incompressible pages are stored at a given moment will
be useful for future investigations.  Add a new debugfs file called
stored_incompressible_pages for the purpose.
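The counter can be read from zswap's debugfs directory, typically at
/sys/kernel/debug/zswap/stored_incompressible_pages.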

Tests
-----

I tested this patch using a simple self-written microbenchmark that is
available at GitHub[1].  You can reproduce the test I did by executing
run_tests.sh of the repo on your system.  Note that the repo's
documentation is not good as of this writing, so you may need to read and
use the code.

The basic test scenario is simple.  Run a test program that makes
artificial accesses to memory with artificial content under a
memory.high-imposed memory limit, and measure how many accesses were made
in a given time.

The test program repeatedly and randomly accesses three anonymous memory
regions.  The regions are all 500 MiB in size and are accessed with the
same probability.  Two of them are filled with simple content that can
easily be compressed, while the remaining one is filled with content read
from /dev/urandom, which usually fails to compress to a size smaller than
PAGE_SIZE.  The program runs for two minutes and prints out the number of
accesses made every five seconds.
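
For reference, below is the gist of the workload as a minimal sketch,
written for this changelog under the parameters described above.  It is
not the actual eval_zswap code (the fill pattern and the use of
random() are illustrative choices; error handling and cleanup are
omitted for brevity); refer to the repo[1] for the real thing.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    #define NR_REGIONS 3
    #define REGION_SZ  (500UL << 20)    /* 500 MiB per region */

    int main(void)
    {
            char *regions[NR_REGIONS];
            unsigned long nr_accesses = 0;
            time_t start = time(NULL), last_report = start;
            int i, urandom = open("/dev/urandom", O_RDONLY);

            srandom(time(NULL));
            for (i = 0; i < NR_REGIONS; i++) {
                    regions[i] = malloc(REGION_SZ);
                    if (i < 2) {
                            /* content that compresses easily */
                            memset(regions[i], 'x', REGION_SZ);
                    } else {
                            /* content that is hard to compress */
                            size_t off = 0;
                            ssize_t ret;

                            while (off < REGION_SZ) {
                                    ret = read(urandom, regions[i] + off,
                                               REGION_SZ - off);
                                    if (ret <= 0)
                                            break;
                                    off += ret;
                            }
                    }
            }

            /* access the three regions randomly, with equal probability */
            while (time(NULL) - start < 120) {
                    regions[random() % NR_REGIONS][random() % REGION_SZ]++;
                    nr_accesses++;
                    if (time(NULL) - last_report >= 5) {
                            printf("%lu accesses\n", nr_accesses);
                            nr_accesses = 0;
                            last_report = time(NULL);
                    }
            }
            return 0;
    }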

The test script runs the program under the four configurations below.

- 0: memory.high is set to 2 GiB, zswap is disabled.
- 1-1: memory.high is set to 1350 MiB, zswap is disabled.
- 1-2: On 1-1, zswap is enabled without this patch.
- 1-3: On 1-2, this patch is applied.

For all zswap enabled cases, zswap shrinker is enabled.

Configuration '0' is for showing the original memory performance.
Configurations 1-1, 1-2 and 1-3 are for showing the performance of swap,
zswap, and this patch under a level of memory pressure (~10% of working
set).  Configurations 0 and 1-1 are not the main focus of this patch, but
I'm adding those since their results transparently show how far this
microbenchmark test is from the real world.

Because the per-5-seconds performance is not very reliable, I measured
the average of that over the last one minute of the test program run.  I
also measured a few vmstat counters, including zswpin, zswpout, zswpwb,
pswpin and pswpout, during the test runs.
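All of these counters can be read from /proc/vmstat.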

The measurement results are as below.  To save space, I show performance
numbers normalized to that of configuration '0' (no memory pressure).
The averaged accesses per 5 seconds of configuration '0' was
36493417.75.

    config            0       1-1     1-2      1-3
    perf_normalized   1.0000  0.0057  0.0235   0.0367
    perf_stdev_ratio  0.0582  0.0652  0.0167   0.0346
    zswpin            0       0       3548424  1999335
    zswpout           0       0       3588817  2361689
    zswpwb            0       0       10214    340270
    pswpin            0       485806  772038   340967
    pswpout           0       649543  144773   340270

'perf_normalized' is the performance metric, normalized to that of
configuration '0' (no pressure).  'perf_stdev_ratio' is the standard
deviation of the averaged data points, as a ratio to the averaged metric
value.  For example, configuration '0' performance showed a 5.8% stdev.
Configurations 1-1 and 1-3 had about 6.5% and 3.5% stdev.  The results
were also highly variable between multiple runs.  So this result is not
very stable and shows only ballpark figures.  Please keep this in mind
when reading these results.

Under about 10% of working set memory pressure, the performance dropped
to about 0.57% of the no-pressure one when the normal swap was used
(1-1).  Note that ~10% working set pressure is already extreme, at least
on this test setup.  No one would desire a system setup that can degrade
performance to 0.57% of the best case.

By turning zswap on (1-2), the performance improved about 4x, resulting
in about 2.35% of the no-pressure one.  Because of the incompressible
pages in the third memory region, though, a significant amount of
(non-zswap) swap I/O operations were still made.

By applying this patch (1-3), the performance improved by about 56%,
resulting in about 3.67% of the no-pressure one.  The reduced pswpin of
1-3 compared to 1-2 shows where this improvement came from.

Tests without Zswap Shrinker
----------------------------

The zswap shrinker is not enabled by default, so I ran the above test
again after disabling it.  The results are as below.

    config            0       1-1     1-2      1-3
    perf_normalized   1.0000  0.0056  0.0185   0.0260
    perf_stdev_ratio  0.0467  0.0348  0.1832   0.3387
    zswpin            0       0       2506765  6049078
    zswpout           0       0       2534357  6115426
    zswpwb            0       0       0        0
    pswpin            0       463694  472978   0
    pswpout           0       686227  612149   0

The overall normalized performance of the different configs is very
similar to that of the zswap shrinker enabled case.  By adding the
memory pressure, the performance dropped to 0.56% of the original one.
By enabling zswap without the zswap shrinker, the performance increased
to 1.85% of the original one.  By applying this patch on top, the
performance further increased to 2.6% of the original one.

Even though the zswap shrinker is disabled, 1-2 shows high numbers of
pswpin and pswpout because the incompressible pages are directly swapped
out.  In the case of 1-3, pswpin and pswpout are zero since it keeps the
incompressible pages in memory, and it shows higher performance.

Note that the performance of 1-2 and 1-3 varies quite a lot.  The
standard deviation of the performance for 1-2 was about 18.32% of the
performance, while that for 1-3 was about 33.87%.  Because the zswap
shrinker is disabled and the memory pressure is induced by memory.high,
the workload incurred penalty_jiffies sleeps, which destabilized the
performance.

Related Works
-------------

This is not an entirely new attempt.  Nhat Pham and Takero Funaki tried
very similar approaches in October 2023[2] and April 2024[3],
respectively.  The two approaches didn't get merged mainly due to the
metadata overhead concern.  At the beginning of this changelog I
described why I think that shouldn't be a problem for this change, which
is automatically disabled when writeback is disabled.

This patch is not particularly different from those, and is actually
built upon them, though I wrote it from scratch again.  Hence the
Suggested-by tags for them.  Nhat actually first suggested this to me
offlist.

Historically, writeback disabling was introduced partially as a way to
solve the LRU order issue.  Yosry pointed out[4] that this is still
suboptimal when the incompressible pages are cold, since zswap will
repeatedly try to swap the incompressible pages out, burning CPU cycles
on compression attempts that will fail anyway.  One imaginable solution
for the problem is reusing the swapped-out page and its struct page as
the storage in the zswap pool.  But that's out of the scope of this
patch.

[1] https://github.com/sjp38/eval_zswap/blob/master/run.sh
[2] https://lore.kernel.org/20231017003519.1426574-3-nphamcs@gmail.com
[3] https://lore.kernel.org/20240706022523.1104080-6-flintglass@gmail.com
[4] https://lore.kernel.org/CAJD7tkZXS-UJVAFfvxJ0nNgTzWBiqepPYA4hEozi01_qktkitg@mail.gmail.com

Signed-off-by: SeongJae Park <sj@kernel.org>
Suggested-by: Nhat Pham <nphamcs@gmail.com>
Suggested-by: Takero Funaki <flintglass@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Kairui Song <kasong@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Changes from v4
(https://lore.kernel.org/20250819193404.46680-1-sj@kernel.org)
- Mark stored_incompressible_pages as static.
- Restore reject_compress_poor code path.
- Remove crypto_compress_fail debugfs file.

Changes from v3
(https://lore.kernel.org/20250815213020.89327-1-sj@kernel.org)
(discussions for changes from v3 were made on the v2 thread)
- Drop the cumulated compression failure counter (compress_fail).
- Add a cumulated crypto-failure only counter (crypto_compress_fail).
- Add a not cumulated stored incompressible pages counter
  (stored_incompressible_pages).
- Cleanup compression failure handling code for readability.

Changes from v2
(https://lore.kernel.org/20250812170046.56468-1-sj@kernel.org)
- No code change but changelog updates
  - Add zswap shrinker disabled case test results.
  - Fix a typo on changelog.
  - Add a clarification of intention of 0 and 1-1 test configs.

Changes from v1
(https://lore.kernel.org/20250807181616.1895-1-sj@kernel.org)
- Optimize out memcpy() per incompressible page saving, using
  k[un]map_local().
- Add a debugfs file for counting compression failures.
- Use a clear form of a ternary operation.
- Add the history of writeback disabling with a link.
- Wordsmith comments.

Changes from RFC v2
(https://lore.kernel.org/20250805002954.1496-1-sj@kernel.org)
- Fix race conditions at decompressed pages identification.
- Remove the parameter and make saving as-is the default behavior.
- Open-code main changes.
- Clarify that there are no memory management changes in the cover letter.
- Remove 20% pressure case from test results, since it is arguably too
  extreme and only adds confusion.
- Drop RFC tag.

Changes from RFC v1
(https://lore.kernel.org/20250730234059.4603-1-sj@kernel.org)
- Consider PAGE_SIZE compression successes as failures.
- Use zpool for storing incompressible pages.
- Test with zswap shrinker enabled.
- Wordsmith changelog and comments.
- Add documentation of save_incompressible_pages parameter.

 mm/zswap.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 678caa981fc3..f865a930dc88 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -42,8 +42,10 @@
 /*********************************
 * statistics
 **********************************/
-/* The number of compressed pages currently stored in zswap */
+/* The number of pages currently stored in zswap */
 atomic_long_t zswap_stored_pages = ATOMIC_LONG_INIT(0);
+/* The number of incompressible pages currently stored in zswap */
+static atomic_long_t zswap_stored_incompressible_pages = ATOMIC_LONG_INIT(0);
 
 /*
  * The statistics below are not protected from concurrent access for
@@ -811,6 +813,8 @@ static void zswap_entry_free(struct zswap_entry *entry)
 		obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
 		obj_cgroup_put(entry->objcg);
 	}
+	if (entry->length == PAGE_SIZE)
+		atomic_long_dec(&zswap_stored_incompressible_pages);
 	zswap_entry_cache_free(entry);
 	atomic_long_dec(&zswap_stored_pages);
 }
@@ -971,8 +975,26 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	 */
 	comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
 	dlen = acomp_ctx->req->dlen;
-	if (comp_ret)
-		goto unlock;
+
+	/*
+	 * If a page cannot be compressed into a size smaller than PAGE_SIZE,
+	 * save the content as is without a compression, to keep the LRU order
+	 * of writebacks.  If writeback is disabled, reject the page since it
+	 * only adds metadata overhead.  swap_writeout() will put the page back
+	 * to the active LRU list in the case.
+	 */
+	if (comp_ret || !dlen)
+		dlen = PAGE_SIZE;
+	if (dlen >= PAGE_SIZE) {
+		if (!mem_cgroup_zswap_writeback_enabled(
+					folio_memcg(page_folio(page)))) {
+			comp_ret = comp_ret ? comp_ret : -EINVAL;
+			goto unlock;
+		}
+		comp_ret = 0;
+		dlen = PAGE_SIZE;
+		dst = kmap_local_page(page);
+	}
 
 	zpool = pool->zpool;
 	gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
@@ -985,6 +1007,8 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	entry->length = dlen;
 
 unlock:
+	if (dst != acomp_ctx->buffer)
+		kunmap_local(dst);
 	if (comp_ret == -ENOSPC || alloc_ret == -ENOSPC)
 		zswap_reject_compress_poor++;
 	else if (comp_ret)
@@ -1007,6 +1031,14 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
 	obj = zpool_obj_read_begin(zpool, entry->handle, acomp_ctx->buffer);
 
+	/* zswap entries of length PAGE_SIZE are not compressed. */
+	if (entry->length == PAGE_SIZE) {
+		memcpy_to_folio(folio, 0, obj, entry->length);
+		zpool_obj_read_end(zpool, entry->handle, obj);
+		acomp_ctx_put_unlock(acomp_ctx);
+		return true;
+	}
+
 	/*
 	 * zpool_obj_read_begin() might return a kmap address of highmem when
 	 * acomp_ctx->buffer is not used.  However, sg_init_one() does not
@@ -1519,6 +1551,8 @@ static bool zswap_store_page(struct page *page,
 		obj_cgroup_charge_zswap(objcg, entry->length);
 	}
 	atomic_long_inc(&zswap_stored_pages);
+	if (entry->length == PAGE_SIZE)
+		atomic_long_inc(&zswap_stored_incompressible_pages);
 
 	/*
 	 * We finish initializing the entry while it's already in xarray.
@@ -1787,6 +1821,14 @@ static int debugfs_get_stored_pages(void *data, u64 *val)
 }
 DEFINE_DEBUGFS_ATTRIBUTE(stored_pages_fops, debugfs_get_stored_pages, NULL, "%llu\n");
 
+static int debugfs_get_stored_incompressible_pages(void *data, u64 *val)
+{
+	*val = atomic_long_read(&zswap_stored_incompressible_pages);
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(stored_incompressible_pages_fops,
+		debugfs_get_stored_incompressible_pages, NULL, "%llu\n");
+
 static int zswap_debugfs_init(void)
 {
 	if (!debugfs_initialized())
@@ -1814,6 +1856,9 @@ static int zswap_debugfs_init(void)
 			    zswap_debugfs_root, NULL, &total_size_fops);
 	debugfs_create_file("stored_pages", 0444,
 			    zswap_debugfs_root, NULL, &stored_pages_fops);
+	debugfs_create_file("stored_incompressible_pages", 0444,
+			    zswap_debugfs_root, NULL,
+			    &stored_incompressible_pages_fops);
 
 	return 0;
 }

base-commit: 5c2375777435b4378bf05f57ffae72d8a3be011f
-- 
2.39.5



* Re: [PATCH v5] mm/zswap: store <PAGE_SIZE compression failed page as-is
@ 2025-08-26 15:52 ` Chris Li
From: Chris Li @ 2025-08-26 15:52 UTC
  To: SeongJae Park
  Cc: Andrew Morton, Chengming Zhou, Herbert Xu, Johannes Weiner,
	Nhat Pham, Yosry Ahmed, kernel-team, linux-kernel, linux-mm,
	Takero Funaki, David Hildenbrand, Baoquan He, Barry Song,
	Kairui Song

Hi SeongJae,

I did another pass review on it. This time with the editor so I saw
more source code context and had more feedback.
Mostly just nitpicks. See the detailed comments below.

On Fri, Aug 22, 2025 at 12:08 PM SeongJae Park <sj@kernel.org> wrote:
> @@ -971,8 +975,26 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>          */
>         comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
>         dlen = acomp_ctx->req->dlen;
> -       if (comp_ret)
> -               goto unlock;
> +
> +       /*
> +        * If a page cannot be compressed into a size smaller than PAGE_SIZE,
> +        * save the content as is without a compression, to keep the LRU order
> +        * of writebacks.  If writeback is disabled, reject the page since it
> +        * only adds metadata overhead.  swap_writeout() will put the page back
> +        * to the active LRU list in the case.
> +        */
> +       if (comp_ret || !dlen)
> +               dlen = PAGE_SIZE;
> +       if (dlen >= PAGE_SIZE) {

I think these two ifs can be simplified as:

if (comp_ret || !dlen || dlen >= PAGE_SIZE) {
              dlen = PAGE_SIZE;

then you do the following check.
That way when the goto unlock happens, you have dlen == PAGE_SIZE,
related to my other feedback on kunmap_local()

> +               if (!mem_cgroup_zswap_writeback_enabled(
> +                                       folio_memcg(page_folio(page)))) {
> +                       comp_ret = comp_ret ? comp_ret : -EINVAL;
> +                       goto unlock;
> +               }
> +               comp_ret = 0;
> +               dlen = PAGE_SIZE;

Delete this line if you use the above suggestion on: dlen = PAGE_SIZE;

> +               dst = kmap_local_page(page);
> +       }
>
>         zpool = pool->zpool;
>         gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
> @@ -985,6 +1007,8 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>         entry->length = dlen;
>
>  unlock:
> +       if (dst != acomp_ctx->buffer)
> +               kunmap_local(dst);

I think this has a hidden assumption that kmap_local_page() will
return a different value than acomp_ctx->buffer. That might be true. I
haven't checked the internals. Otherwise you are missing a
kunmap_local(). It also looks a bit strange in the sense that
kunmap_local() should be paired with kmap_local_page() in the same
condition. The same condition is not obvious here. How about this to
make it more obvious and get rid of that assumption above:

if (dlen == PAGE_SIZE)
               kunmap_local(dst);

That assumes you also take my suggestion above to assign dlen =
PAGE_SIZE earlier.


>         if (comp_ret == -ENOSPC || alloc_ret == -ENOSPC)
>                 zswap_reject_compress_poor++;
>         else if (comp_ret)
> @@ -1007,6 +1031,14 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>         acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
>         obj = zpool_obj_read_begin(zpool, entry->handle, acomp_ctx->buffer);
>
> +       /* zswap entries of length PAGE_SIZE are not compressed. */
> +       if (entry->length == PAGE_SIZE) {
> +               memcpy_to_folio(folio, 0, obj, entry->length);

The following read_end() followed by acomp unlock() duplicates the
normal decompress ending sequence. It will create complications when
we modify the normal ending sequence in the future, we need to match
it here. How about just goto the ending sequence and share the same
return path as normal:

 +                  goto read_done;

Then insert the read_done label at ending sequence:

        dlen = acomp_ctx->req->dlen;

+ read_done:
        zpool_obj_read_end(zpool, entry->handle, obj);
        acomp_ctx_put_unlock(acomp_ctx);

If you adopt that, you also will need to init decomp_ret to "0"
instead of no init value in the beginning of the function:

        struct crypto_acomp_ctx *acomp_ctx;
-        int decomp_ret, dlen;
+       int decomp_ret = 0, dlen;
        u8 *src, *obj;


> +               zpool_obj_read_end(zpool, entry->handle, obj);
> +               acomp_ctx_put_unlock(acomp_ctx);
> +               return true;

Delete the above 3 lines inside the uncompressed if case.

Looks good otherwise.

Thanks for the good work.

Chris



* Re: [PATCH v5] mm/zswap: store <PAGE_SIZE compression failed page as-is
@ 2025-08-27 16:18 ` SeongJae Park
From: SeongJae Park @ 2025-08-27 16:18 UTC
  To: Chris Li
  Cc: SeongJae Park, Andrew Morton, Chengming Zhou, Herbert Xu,
	Johannes Weiner, Nhat Pham, Yosry Ahmed, kernel-team,
	linux-kernel, linux-mm, Takero Funaki, David Hildenbrand,
	Baoquan He, Barry Song, Kairui Song

On Tue, 26 Aug 2025 08:52:35 -0700 Chris Li <chrisl@kernel.org> wrote:

> Hi SeongJae,
> 
> I did another pass review on it. This time with the editor so I saw
> more source code context and had more feedback.
> Mostly just nitpicks. See the detailed comments below.

Thank you for your review!

> 
> On Fri, Aug 22, 2025 at 12:08 PM SeongJae Park <sj@kernel.org> wrote:
> > @@ -971,8 +975,26 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >          */
> >         comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
> >         dlen = acomp_ctx->req->dlen;
> > -       if (comp_ret)
> > -               goto unlock;
> > +
> > +       /*
> > +        * If a page cannot be compressed into a size smaller than PAGE_SIZE,
> > +        * save the content as is without a compression, to keep the LRU order
> > +        * of writebacks.  If writeback is disabled, reject the page since it
> > +        * only adds metadata overhead.  swap_writeout() will put the page back
> > +        * to the active LRU list in the case.
> > +        */
> > +       if (comp_ret || !dlen)
> > +               dlen = PAGE_SIZE;
> > +       if (dlen >= PAGE_SIZE) {
> 
> I think these two ifs can be simplified as:
> 
> if (comp_ret || !dlen || dlen >= PAGE_SIZE) {
>               dlen = PAGE_SIZE;
> 
> then you do the following check.
> That way when the goto unlock happens, you have dlen == PAGE_SIZE,
> related to my other feedback on kunmap_local()
> 
> > +               if (!mem_cgroup_zswap_writeback_enabled(
> > +                                       folio_memcg(page_folio(page)))) {
> > +                       comp_ret = comp_ret ? comp_ret : -EINVAL;
> > +                       goto unlock;
> > +               }
> > +               comp_ret = 0;
> > +               dlen = PAGE_SIZE;
> 
> Delete this line if you use the above suggestion on: dlen = PAGE_SIZE;

Thank you for the nice suggestion!

> 
> > +               dst = kmap_local_page(page);
> > +       }
> >
> >         zpool = pool->zpool;
> >         gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
> > @@ -985,6 +1007,8 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >         entry->length = dlen;
> >
> >  unlock:
> > +       if (dst != acomp_ctx->buffer)
> > +               kunmap_local(dst);
> 
> I think this has a hidden assumption that kmap_local_page() will
> return a different value than acomp_ctx->buffer. That might be true. I
> haven't checked the internals. Otherwise you are missing a
> kunmap_local(). It also looks a bit strange in the sense that
> kunmap_local() should be paired with kmap_local_page() in the same
> condition. The same condition is not obvious here.

Good point, I agree.

> How about this to
> make it more obvious and get rid of that assumption above:
> 
> if (dlen == PAGE_SIZE)
>                kunmap_local(dst);

However, if the execution reached here because compression failed and writeback
was disabled, kmap_local_page() wouldn't be called, so we could try to unmap
unmapped memory.

What do you think about adding another bool variable for recording
whether kunmap_local() needs to be executed?  For example,

"""
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -952,6 +952,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
        struct zpool *zpool;
        gfp_t gfp;
        u8 *dst;
+       bool dst_need_unmap = false;

        acomp_ctx = acomp_ctx_get_cpu_lock(pool);
        dst = acomp_ctx->buffer;
@@ -994,6 +995,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
                comp_ret = 0;
                dlen = PAGE_SIZE;
                dst = kmap_local_page(page);
+               dst_need_unmap = true;
        }

        zpool = pool->zpool;
@@ -1007,7 +1009,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
        entry->length = dlen;

 unlock:
-       if (dst != acomp_ctx->buffer)
+       if (dst_need_unmap)
                kunmap_local(dst);
        if (comp_ret == -ENOSPC || alloc_ret == -ENOSPC)
                zswap_reject_compress_poor++;
"""

> 
> That assumes you also take my suggestion above to assign dlen = PAGE_SIZE earlier.
> 
> 
> >         if (comp_ret == -ENOSPC || alloc_ret == -ENOSPC)
> >                 zswap_reject_compress_poor++;
> >         else if (comp_ret)
> > @@ -1007,6 +1031,14 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> >         acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
> >         obj = zpool_obj_read_begin(zpool, entry->handle, acomp_ctx->buffer);
> >
> > +       /* zswap entries of length PAGE_SIZE are not compressed. */
> > +       if (entry->length == PAGE_SIZE) {
> > +               memcpy_to_folio(folio, 0, obj, entry->length);
> 
> The following read_end() followed by acomp unlock() duplicates the
> normal decompress ending sequence. It will create complications when
> we modify the normal ending sequence in the future, we need to match
> it here. How about just goto the ending sequence and share the same
> return path as normal:
> 
>  +                  goto read_done;
> 
> Then insert the read_done label at ending sequence:
> 
>         dlen = acomp_ctx->req->dlen;
> 
> + read_done:
>         zpool_obj_read_end(zpool, entry->handle, obj);
>         acomp_ctx_put_unlock(acomp_ctx);

I agree with your concern and this looks good to me :)

> 
> If you adopt that, you also will need to init decomp_ret to "0"
> instead of no init value in the beginning of the function:
> 
>         struct crypto_acomp_ctx *acomp_ctx;
> -        int decomp_ret, dlen;
> +       int decomp_ret = 0, dlen;
>         u8 *src, *obj;

We may also need to initialize 'dlen' to PAGE_SIZE?
> 
> 
> > +               zpool_obj_read_end(zpool, entry->handle, obj);
> > +               acomp_ctx_put_unlock(acomp_ctx);
> > +               return true;
> 
> Delete the above 3 lines inside the uncompressed if case.
> 
> Looks good otherwise.
> 
> Thanks for the good work.

Thank you for your kind review and nice suggestions!  Since the change is
simple, I will post a fixup patch as a reply to this, adopting your
suggestions together with my additional changes (adding a dst_need_unmap
bool variable in zswap_compress(), and initializing dlen in
zswap_decompress()), if you have no objection or different suggestions
for my additional changes.  Please let me know if you have any concerns
or other suggestions about them.


Thanks,
SJ

> 
> Chris
> 



* Re: [PATCH v5] mm/zswap: store <PAGE_SIZE compression failed page as-is
@ 2025-08-27 17:33 ` Chris Li
From: Chris Li @ 2025-08-27 17:33 UTC
  To: SeongJae Park
  Cc: Andrew Morton, Chengming Zhou, Herbert Xu, Johannes Weiner,
	Nhat Pham, Yosry Ahmed, kernel-team, linux-kernel, linux-mm,
	Takero Funaki, David Hildenbrand, Baoquan He, Barry Song,
	Kairui Song

On Wed, Aug 27, 2025 at 9:18 AM SeongJae Park <sj@kernel.org> wrote:
>
> On Tue, 26 Aug 2025 08:52:35 -0700 Chris Li <chrisl@kernel.org> wrote:
>
> > Hi SeongJae,
> >
> > I did another pass review on it. This time with the editor so I saw
> > more source code context and had more feedback.
> > Mostly just nitpicks. See the detailed comments below.
>
> Thank you for your review!

Thank you for the good work.

>
> >
> > On Fri, Aug 22, 2025 at 12:08 PM SeongJae Park <sj@kernel.org> wrote:
> > > @@ -971,8 +975,26 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> > >          */
> > >         comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
> > >         dlen = acomp_ctx->req->dlen;
> > > -       if (comp_ret)
> > > -               goto unlock;
> > > +
> > > +       /*
> > > +        * If a page cannot be compressed into a size smaller than PAGE_SIZE,
> > > +        * save the content as is without a compression, to keep the LRU order
> > > +        * of writebacks.  If writeback is disabled, reject the page since it
> > > +        * only adds metadata overhead.  swap_writeout() will put the page back
> > > +        * to the active LRU list in the case.
> > > +        */
> > > +       if (comp_ret || !dlen)
> > > +               dlen = PAGE_SIZE;
> > > +       if (dlen >= PAGE_SIZE) {
> >
> > I think these two if can be simplify as:
> >
> > if (comp_ret || !dlen || dlen >= PAGE_SIZE) {
> >               dlen = PAGE_SIZE;
> >
> > then you do the following check.
> > That way when the goto unlock happens, you have dlen == PAGE_SIZE,
> > related to my other feedback on kunmap_local()
> >
> > > +               if (!mem_cgroup_zswap_writeback_enabled(
> > > +                                       folio_memcg(page_folio(page)))) {
> > > +                       comp_ret = comp_ret ? comp_ret : -EINVAL;
> > > +                       goto unlock;
> > > +               }
> > > +               comp_ret = 0;
> > > +               dlen = PAGE_SIZE;
> >
> > Delete this line if you use the above suggestion on: dlen = PAGE_SIZE;
>
> Thank you for the nice suggestion!
>
> >
> > > +               dst = kmap_local_page(page);
> > > +       }
> > >
> > >         zpool = pool->zpool;
> > >         gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
> > > @@ -985,6 +1007,8 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> > >         entry->length = dlen;
> > >
> > >  unlock:
> > > +       if (dst != acomp_ctx->buffer)
> > > +               kunmap_local(dst);
> >
> > I think this has a hidden assumption that kmap_local_page() will
> > return a different value than acomp_ctx->buffer. That might be true. I
> > haven't checked the internals. Otherwise you are missing a
> > kunmap_local(). It also looks a bit strange in the sense that
> > kunmap_local() should be paired with kmap_local_page() in the same
> > condition. The same condition is not obvious here.
>
> Good point, I agree.
>
> > How about this to
> > make it more obvious and get rid of that assumption above:
> >
> > if (dlen == PAGE_SIZE)
> >                kunmap_local(dst);
>
> However, if the execution reached here because compression failed and writeback
> was disabled, kmap_local_page() wouldn't be called, so we could try to unmap
> unmapped memory.

Ah, thanks for catching that. That is why, with more reviewers, the bugs
become obvious.

>
> What do you think about adding another bool variable for recording
> whether kunmap_local() needs to be executed?  For example,

Sounds good.

>
> """
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -952,6 +952,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>         struct zpool *zpool;
>         gfp_t gfp;
>         u8 *dst;
> +       bool dst_need_unmap = false;

A bit nitpicky. That variable name is too long as a C local variable.
We want local auto variables to be short and sweet. That is why you
have "dst" rather than  "u8 *destination_compressed_buffer;"
A name that long can hurt readability as well.
Can we have something shorter? e.g. "mapped" or "has_map"

>
>         acomp_ctx = acomp_ctx_get_cpu_lock(pool);
>         dst = acomp_ctx->buffer;
> @@ -994,6 +995,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>                 comp_ret = 0;
>                 dlen = PAGE_SIZE;
>                 dst = kmap_local_page(page);
> +               dst_need_unmap = true;
>         }
>
>         zpool = pool->zpool;
> @@ -1007,7 +1009,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>         entry->length = dlen;
>
>  unlock:
> -       if (dst != acomp_ctx->buffer)
> +       if (dst_need_unmap)
>                 kunmap_local(dst);

Yes, that is good. It makes the kmap and kunmap pairing very obvious.

>         if (comp_ret == -ENOSPC || alloc_ret == -ENOSPC)
>                 zswap_reject_compress_poor++;
> """
>
> >
> > That assumes you also take my suggestion above to assign dlen = PAGE_SIZE earlier.
> >
> >
> > >         if (comp_ret == -ENOSPC || alloc_ret == -ENOSPC)
> > >                 zswap_reject_compress_poor++;
> > >         else if (comp_ret)
> > > @@ -1007,6 +1031,14 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> > >         acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
> > >         obj = zpool_obj_read_begin(zpool, entry->handle, acomp_ctx->buffer);
> > >
> > > +       /* zswap entries of length PAGE_SIZE are not compressed. */
> > > +       if (entry->length == PAGE_SIZE) {
> > > +               memcpy_to_folio(folio, 0, obj, entry->length);
> >
> > The following read_end() followed by acomp unlock() duplicates the
> > normal decompress ending sequence. It will create complications when
> > we modify the normal ending sequence in the future, we need to match
> > it here. How about just goto the ending sequence and share the same
> > return path as normal:
> >
> >  +                  goto read_done;
> >
> > Then insert the read_done label at ending sequence:
> >
> >         dlen = acomp_ctx->req->dlen;
> >
> > + read_done:
> >         zpool_obj_read_end(zpool, entry->handle, obj);
> >         acomp_ctx_put_unlock(acomp_ctx);
>
> I agree with your concern and this looks good to me :)
>
> >
> > If you adopt that, you also will need to init decomp_ret to "0"
> > instead of no init value in the beginning of the function:
> >
> >         struct crypto_acomp_ctx *acomp_ctx;
> > -        int decomp_ret, dlen;
> > +       int decomp_ret = 0, dlen;
> >         u8 *src, *obj;
>
> We may also need to initialize 'dlen' to PAGE_SIZE?

Is there a code path that can lead to dlen being used with an
uninitialized value? If not, then we don't have to assign it.

> >
> >
> > > +               zpool_obj_read_end(zpool, entry->handle, obj);
> > > +               acomp_ctx_put_unlock(acomp_ctx);
> > > +               return true;
> >
> > > Delete the above 3 lines inside the uncompressed if case.
> >
> > Looks good otherwise.
> >
> > Thanks for the good work.
>
> Thank you for your kind review and nice suggestions!  Since the change is
> simple, I will post a fixup patch as a reply to this, adopting your
> suggestions together with my additional changes (adding a dst_need_unmap
> bool variable in zswap_compress(), and initializing dlen in
> zswap_decompress()), if you have no objection or different suggestions
> for my additional changes.  Please let me know if you have any concerns
> or other suggestions about them.

I am fine with a fixup patch or a new version; it does not matter to me
in the slightest. I care more about the final landing code itself than
which vehicle carries the code. Assuming you do all those fixups you
mention above, you can have my Ack in your fixup:

Acked-by: Chris Li <chrisl@kernel.org>

Chris



* Re: [PATCH v5] mm/zswap: store <PAGE_SIZE compression failed page as-is
@ 2025-08-27 17:45 ` SeongJae Park
From: SeongJae Park @ 2025-08-27 17:45 UTC
  To: Chris Li
  Cc: SeongJae Park, Andrew Morton, Chengming Zhou, Herbert Xu,
	Johannes Weiner, Nhat Pham, Yosry Ahmed, kernel-team,
	linux-kernel, linux-mm, Takero Funaki, David Hildenbrand,
	Baoquan He, Barry Song, Kairui Song

On Wed, 27 Aug 2025 10:33:38 -0700 Chris Li <chrisl@kernel.org> wrote:

> On Wed, Aug 27, 2025 at 9:18 AM SeongJae Park <sj@kernel.org> wrote:
> >
> > On Tue, 26 Aug 2025 08:52:35 -0700 Chris Li <chrisl@kernel.org> wrote:
> >
> > > Hi SeongJae,
> > >
> > > I did another pass review on it. This time with the editor so I saw
> > > more source code context and had more feedback.
> > > Mostly just nitpicks. See the detailed comments below.
> >
> > Thank you for your review!
> 
> Thank you for the good work.
> 
> >
> > >
> > > On Fri, Aug 22, 2025 at 12:08 PM SeongJae Park <sj@kernel.org> wrote:
[...]
> > """
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -952,6 +952,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >         struct zpool *zpool;
> >         gfp_t gfp;
> >         u8 *dst;
> > +       bool dst_need_unmap = false;
> 
> A bit nitpicky. That variable name is too long as a C local variable.
> We want local auto variables to be short and sweet. That is why you
> have "dst" rather than  "u8 *destination_compressed_buffer;"
> A name that long can hurt readability as well.
> Can we have something shorter? e.g. "mapped" or "has_map"

I agree with your points, and thank you for the suggestions.  I will take "mapped".

[...]
> > > > @@ -1007,6 +1031,14 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> > > >         acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
> > > >         obj = zpool_obj_read_begin(zpool, entry->handle, acomp_ctx->buffer);
> > > >
> > > > +       /* zswap entries of length PAGE_SIZE are not compressed. */
> > > > +       if (entry->length == PAGE_SIZE) {
> > > > +               memcpy_to_folio(folio, 0, obj, entry->length);
> > >
> > > The following read_end() followed by acomp unlock() duplicates the
> > > normal decompress ending sequence. It will create complications when
> > > we modify the normal ending sequence in the future, we need to match
> > > it here. How about just goto the ending sequence and share the same
> > > return path as normal:
> > >
> > >  +                  goto read_done;
> > >
> > > Then insert the read_done label at ending sequence:
> > >
> > >         dlen = acomp_ctx->req->dlen;
> > >
> > > + read_done:
> > >         zpool_obj_read_end(zpool, entry->handle, obj);
> > >         acomp_ctx_put_unlock(acomp_ctx);
> >
> > I agree your concern and this looks good to me :)
> >
> > >
> > > If you adopt that, you also will need to init decomp_ret to "0"
> > > instead of no init value in the beginning of the function:
> > >
> > >         struct crypto_acomp_ctx *acomp_ctx;
> > > -        int decomp_ret, dlen;
> > > +       int decomp_ret = 0, dlen;
> > >         u8 *src, *obj;
> >
> > We may also need to initialize 'dlen' to PAGE_SIZE?
> 
> If there is a code path can lead to dlen use not initialized value? If
> not then we don't have to assign it.

The success return path of zswap_decompress() checks dlen together with
decomp_ret as below.  So I think we need to initialize dlen, too.  Please let
me know if I'm missing something.

    if (!decomp_ret && dlen == PAGE_SIZE)
            return true;

[...]
> > Thank you for your kind review and nice suggestions!  Since the change is
> > simple, I will post a fixup patch as a reply to this, adopting your
> > suggestions together with my additional changes (adding a dst_need_unmap
> > bool variable in zswap_compress(), and initializing dlen in
> > zswap_decompress()), if you have no objection or different suggestions
> > for my additional changes.  Please let me know if you have any concerns
> > or other suggestions about them.
> 
> I am fine with a fix up patch or a new version. Does not matter to me
> in the slightest. I care more about the final landing code itself more
> than which vehicle to carry the code.  Assume you do all those fix up
> you mention above, you can have my Ack in your fix up:
> 
> Acked-by: Chris Li <chrisl@kernel.org>

Thank you, Chris!

I will post the fixup by tomorrow morning (Pacific time) unless I
hear other opinions or find mistakes in the above plan by tonight.


Thanks,
SJ

[...]



* Re: [PATCH v5] mm/zswap: store <PAGE_SIZE compression failed page as-is
@ 2025-08-27 21:16 ` Chris Li
From: Chris Li @ 2025-08-27 21:16 UTC
  To: SeongJae Park
  Cc: Andrew Morton, Chengming Zhou, Herbert Xu, Johannes Weiner,
	Nhat Pham, Yosry Ahmed, kernel-team, linux-kernel, linux-mm,
	Takero Funaki, David Hildenbrand, Baoquan He, Barry Song,
	Kairui Song

On Wed, Aug 27, 2025 at 10:45 AM SeongJae Park <sj@kernel.org> wrote:
> > > >
> > > > On Fri, Aug 22, 2025 at 12:08 PM SeongJae Park <sj@kernel.org> wrote:
> [...]
> > > """
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -952,6 +952,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> > >         struct zpool *zpool;
> > >         gfp_t gfp;
> > >         u8 *dst;
> > > +       bool dst_need_unmap = false;
> >
> > A bit nitpicky. That variable name is too long as a C local variable.
> > We want local auto variables to be short and sweet. That is why you
> > have "dst" rather than  "u8 *destination_compressed_buffer;"
> > A name that long can hurt readability as well.
> > Can we have something shorter? e.g. "mapped" or "has_map"
>
> I agree with your points, and thank you for the suggestions.  I will take "mapped".

Thank you.

> > > We may also need to initialize 'dlen' to PAGE_SIZE?
> >
> > Is there a code path that can lead to dlen being used with an
> > uninitialized value? If not, then we don't have to assign it.
>
> The success return path of zswap_decompress() checks dlen together with
> decomp_ret as below.  So I think we need to initialize dlen, too.  Please let
> me know if I'm missing something.

I normally don't worry about those; the compiler will complain if I
get it wrong. The compiler might have false positives, but should not
have false negatives, because the compiler can see all the incoming
edges of the basic block. It is a trivial graph reachability problem
if we allow false positive reports.

Anyway, I just opened the editor to check again. Yes, if we goto
read_done, the if condition using dlen can introduce a new incoming
edge where dlen's uninitialized value would be used. Yes, we need to
initialize dlen = PAGE_SIZE, or initialize it at the memcpy of the
page.

> I will post the fixup by tomorrow morning (Pacific time) unless I
> hear other opinions or find mistakes in the above plan by tonight.

You are too humble, that is the normal reviewing process. You can take
your time.

Chris



* Re: [PATCH v5] mm/zswap: store <PAGE_SIZE compression failed page as-is
@ 2025-08-28 16:39 ` SeongJae Park
From: SeongJae Park @ 2025-08-28 16:39 UTC
  To: Chris Li
  Cc: SeongJae Park, Andrew Morton, Chengming Zhou, Herbert Xu,
	Johannes Weiner, Nhat Pham, Yosry Ahmed, kernel-team,
	linux-kernel, linux-mm, Takero Funaki, David Hildenbrand,
	Baoquan He, Barry Song, Kairui Song

On Wed, 27 Aug 2025 14:16:37 -0700 Chris Li <chrisl@kernel.org> wrote:

> On Wed, Aug 27, 2025 at 10:45 AM SeongJae Park <sj@kernel.org> wrote:
[...]
> Anyway, I just opened the editor to check again. Yes, if we goto
> read_done, the if condition using dlen can introduce a new incoming
> edge where dlen's uninitialized value would be used. Yes, we need to
> initialize dlen = PAGE_SIZE, or initialize it at the memcpy of the
> page.

Thank you for confirming.

> 
> > I will post the fixup by tomorrow morning (Pacific time) unless I
> > hear other opinions or find my mistakes on the above plan by tonight.
> 
> You are too humble, that is the normal reviewing process. You can take
> your time.

No worries, I just wanted to give you an expectation of the timeline :)

Andrew, could you please pick up the cleanup patch attached below as a
fixup to the original patch?  Please feel free to let me know if you
would prefer a new version of the patch instead, or any other approach.


Thanks,
SJ

[...]

==== Attachment 0 (0001-mm-zswap-cleanup-incompressible-pages-handling-code.patch) ====
From 867c3789fa80ac163427f1d7804bf2c8667684ca Mon Sep 17 00:00:00 2001
From: SeongJae Park <sj@kernel.org>
Date: Wed, 27 Aug 2025 13:18:38 -0700
Subject: [PATCH] mm/zswap: cleanup incompressible pages handling code

Following Chris Li's suggestions[1], make the code easier to read and
manage.

[1] https://lore.kernel.org/CACePvbWGPApYr7G29FzbmWzRw-BJE39WH7kUHSaHs+Lnw8=-qQ@mail.gmail.com

Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: Chris Li <chrisl@kernel.org>
---
 mm/zswap.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index f865a930dc88..e5e1f5687f5e 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -952,6 +952,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	struct zpool *zpool;
 	gfp_t gfp;
 	u8 *dst;
+	bool mapped = false;
 
 	acomp_ctx = acomp_ctx_get_cpu_lock(pool);
 	dst = acomp_ctx->buffer;
@@ -983,9 +984,8 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	 * only adds metadata overhead.  swap_writeout() will put the page back
 	 * to the active LRU list in the case.
 	 */
-	if (comp_ret || !dlen)
+	if (comp_ret || !dlen || dlen >= PAGE_SIZE) {
 		dlen = PAGE_SIZE;
-	if (dlen >= PAGE_SIZE) {
 		if (!mem_cgroup_zswap_writeback_enabled(
 					folio_memcg(page_folio(page)))) {
 			comp_ret = comp_ret ? comp_ret : -EINVAL;
@@ -994,6 +994,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 		comp_ret = 0;
 		dlen = PAGE_SIZE;
 		dst = kmap_local_page(page);
+		mapped = true;
 	}
 
 	zpool = pool->zpool;
@@ -1007,7 +1008,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	entry->length = dlen;
 
 unlock:
-	if (dst != acomp_ctx->buffer)
+	if (mapped)
 		kunmap_local(dst);
 	if (comp_ret == -ENOSPC || alloc_ret == -ENOSPC)
 		zswap_reject_compress_poor++;
@@ -1025,7 +1026,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	struct zpool *zpool = entry->pool->zpool;
 	struct scatterlist input, output;
 	struct crypto_acomp_ctx *acomp_ctx;
-	int decomp_ret, dlen;
+	int decomp_ret = 0, dlen = PAGE_SIZE;
 	u8 *src, *obj;
 
 	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
@@ -1034,9 +1035,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	/* zswap entries of length PAGE_SIZE are not compressed. */
 	if (entry->length == PAGE_SIZE) {
 		memcpy_to_folio(folio, 0, obj, entry->length);
-		zpool_obj_read_end(zpool, entry->handle, obj);
-		acomp_ctx_put_unlock(acomp_ctx);
-		return true;
+		goto read_done;
 	}
 
 	/*
@@ -1059,6 +1058,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	decomp_ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
 	dlen = acomp_ctx->req->dlen;
 
+read_done:
 	zpool_obj_read_end(zpool, entry->handle, obj);
 	acomp_ctx_put_unlock(acomp_ctx);
 
-- 
2.39.5




* Re: [PATCH v5] mm/zswap: store <PAGE_SIZE compression failed page as-is
@ 2025-08-29 18:14 ` Chris Li
From: Chris Li @ 2025-08-29 18:14 UTC
  To: SeongJae Park
  Cc: Andrew Morton, Chengming Zhou, Herbert Xu, Johannes Weiner,
	Nhat Pham, Yosry Ahmed, kernel-team, linux-kernel, linux-mm,
	Takero Funaki, David Hildenbrand, Baoquan He, Barry Song,
	Kairui Song

Acked-by: Chris Li <chrisl@kernel.org>

Chris

On Thu, Aug 28, 2025 at 9:39 AM SeongJae Park <sj@kernel.org> wrote:
>
> On Wed, 27 Aug 2025 14:16:37 -0700 Chris Li <chrisl@kernel.org> wrote:
>
> > On Wed, Aug 27, 2025 at 10:45 AM SeongJae Park <sj@kernel.org> wrote:
> [...]
> > Anyway, I just opened the editor to check again. Yes, if we goto
> > read_done, the if condition using dlen can introduce a new incoming
> > edge where dlen's uninitialized value would be used. Yes, we need to
> > initialize dlen = PAGE_SIZE, or initialize it at the memcpy of the
> > page.
>
> Thank you for confirming.
>
> >
> > > I will post the fixup by tomorrow morning (Pacific time) unless I
> > > hear other opinions or find mistakes in the above plan by tonight.
> >
> > You are too humble, that is the normal reviewing process. You can take
> > your time.
>
> No worries, I just wanted to give you an expectation of the timeline :)
>
> Andrew, could you please pick up the cleanup patch attached below as a
> fixup to the original patch?  Please feel free to let me know if you
> would prefer a new version of the patch instead, or any other approach.
>
>
> Thanks,
> SJ
>
> [...]
>
> ==== Attachment 0 (0001-mm-zswap-cleanup-incompressible-pages-handling-code.patch) ====
> From 867c3789fa80ac163427f1d7804bf2c8667684ca Mon Sep 17 00:00:00 2001
> From: SeongJae Park <sj@kernel.org>
> Date: Wed, 27 Aug 2025 13:18:38 -0700
> Subject: [PATCH] mm/zswap: cleanup incompressible pages handling code
>
> Following Chris Li's suggestions[1], make the code easier to read and
> manage.
>
> [1] https://lore.kernel.org/CACePvbWGPApYr7G29FzbmWzRw-BJE39WH7kUHSaHs+Lnw8=-qQ@mail.gmail.com
>
> Signed-off-by: SeongJae Park <sj@kernel.org>
> Acked-by: Chris Li <chrisl@kernel.org>
> ---
>  mm/zswap.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index f865a930dc88..e5e1f5687f5e 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -952,6 +952,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>         struct zpool *zpool;
>         gfp_t gfp;
>         u8 *dst;
> +       bool mapped = false;
>
>         acomp_ctx = acomp_ctx_get_cpu_lock(pool);
>         dst = acomp_ctx->buffer;
> @@ -983,9 +984,8 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>          * only adds metadata overhead.  swap_writeout() will put the page back
>          * to the active LRU list in the case.
>          */
> -       if (comp_ret || !dlen)
> +       if (comp_ret || !dlen || dlen >= PAGE_SIZE) {
>                 dlen = PAGE_SIZE;
> -       if (dlen >= PAGE_SIZE) {
>                 if (!mem_cgroup_zswap_writeback_enabled(
>                                         folio_memcg(page_folio(page)))) {
>                         comp_ret = comp_ret ? comp_ret : -EINVAL;
> @@ -994,6 +994,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>                 comp_ret = 0;
>                 dlen = PAGE_SIZE;
>                 dst = kmap_local_page(page);
> +               mapped = true;
>         }
>
>         zpool = pool->zpool;
> @@ -1007,7 +1008,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>         entry->length = dlen;
>
>  unlock:
> -       if (dst != acomp_ctx->buffer)
> +       if (mapped)
>                 kunmap_local(dst);
>         if (comp_ret == -ENOSPC || alloc_ret == -ENOSPC)
>                 zswap_reject_compress_poor++;
> @@ -1025,7 +1026,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>         struct zpool *zpool = entry->pool->zpool;
>         struct scatterlist input, output;
>         struct crypto_acomp_ctx *acomp_ctx;
> -       int decomp_ret, dlen;
> +       int decomp_ret = 0, dlen = PAGE_SIZE;
>         u8 *src, *obj;
>
>         acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
> @@ -1034,9 +1035,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>         /* zswap entries of length PAGE_SIZE are not compressed. */
>         if (entry->length == PAGE_SIZE) {
>                 memcpy_to_folio(folio, 0, obj, entry->length);
> -               zpool_obj_read_end(zpool, entry->handle, obj);
> -               acomp_ctx_put_unlock(acomp_ctx);
> -               return true;
> +               goto read_done;
>         }
>
>         /*
> @@ -1059,6 +1058,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>         decomp_ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
>         dlen = acomp_ctx->req->dlen;
>
> +read_done:
>         zpool_obj_read_end(zpool, entry->handle, obj);
>         acomp_ctx_put_unlock(acomp_ctx);
>
> --
> 2.39.5
>
>


