[RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is

* [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
@ 2025-08-05  0:29 SeongJae Park
  2025-08-05 10:47 ` David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: SeongJae Park @ 2025-08-05  0:29 UTC (permalink / raw)
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, Chengming Zhou,
	David Hildenbrand, Johannes Weiner, Jonathan Corbet,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Nhat Pham,
	Suren Baghdasaryan, Vlastimil Babka, Yosry Ahmed, kernel-team,
	linux-doc, linux-kernel, linux-mm, Takero Funaki

When zswap writeback is enabled and it fails compressing a given page,
the page is swapped out to the backing swap device.  This behavior
breaks the zswap's writeback LRU order, and hence users can experience
unexpected latency spikes.  If the page is compressed without failure,
but results in a size of PAGE_SIZE, the LRU order is kept, but the
decompression overhead for loading the page back on the later access is
unnecessary.

Keep the LRU order and avoid the unnecessary decompression overhead in
both cases, by storing the original content in zpool as-is.  The length
field of zswap_entry is set to PAGE_SIZE in that case.  Hence whether an
entry is saved as-is (and so needs no decompression) is identified by
'zswap_entry->length == PAGE_SIZE'.
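
As a rough illustration of the store-side decision described above, here
is a user-space sketch.  It is not the kernel code: MODEL_PAGE_SIZE,
struct model_entry, enum store_result and model_store_decide() are
hypothetical names made up for this example, and the feature_on and
writeback_on inputs merely stand in for the module parameter and the
per-memcg writeback setting.

```c
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for the length field of struct zswap_entry. */
struct model_entry {
	unsigned int length;
};

enum store_result {
	STORE_COMPRESSED,	/* compressed data goes to the pool */
	STORE_AS_IS,		/* original page content goes to the pool */
	STORE_REJECTED,		/* caller falls back to the backing swap */
};

/*
 * comp_ret: non-zero if the compressor reported a failure.
 * dlen: output length reported by the compressor.
 */
enum store_result model_store_decide(struct model_entry *entry,
		int comp_ret, unsigned int dlen,
		bool feature_on, bool writeback_on)
{
	bool incompressible = comp_ret != 0 || dlen == MODEL_PAGE_SIZE;

	if (feature_on && writeback_on && incompressible) {
		/* Mark the entry so that the load side can skip decompression. */
		entry->length = MODEL_PAGE_SIZE;
		return STORE_AS_IS;
	}
	if (comp_ret != 0)
		return STORE_REJECTED;

	entry->length = dlen;
	return STORE_COMPRESSED;
}
```

In this model, a compression failure with both switches on yields
STORE_AS_IS with length set to MODEL_PAGE_SIZE, while the same failure
with either switch off yields STORE_REJECTED.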

So this change does not increase the per-entry zswap metadata overhead.
But as the number of incompressible pages increases, the total zswap
metadata overhead increases proportionally.  The overhead should not be
problematic in usual cases, since the zswap metadata for a single zswap
entry is much smaller than PAGE_SIZE, and in common zswap use cases
there should be a sufficient amount of compressible pages.  Also it can
be mitigated by the zswap writeback.

When severe memory pressure comes from memcg's memory.high, storing
incompressible pages as-is may reduce the accounted memory footprint
more slowly, since the footprint is reduced only after the zswap
writeback kicks in.  This can incur higher penalty_jiffies and degrade
the performance.  Arguably this is just a wrong setup, but we don't want
to introduce unnecessary surprises.  Add a parameter, namely
'save_incompressible_pages', to turn the feature on/off as users want.
It is turned off by default.

When the writeback is disabled, the additional overhead could be
problematic.  For that case, keep the current behavior, which just
returns the failure and lets swap_writeout() put the page back to the
active LRU list.  It is known to be suboptimal when the incompressible
pages are cold, since the incompressible pages will repeatedly be tried
to be zswapped out, burning CPU cycles for compression attempts that
will fail anyway.  One imaginable solution for the problem is reusing
the swapped-out page and its struct page to store in the zswap pool.
But that's out of the scope of this patch.

Tests
-----

I tested this patch using a simple self-written microbenchmark that is
available at GitHub[1].  You can reproduce the test I did by executing
run_tests.sh of the repo on your system.  Note that the repo's
documentation is not good as of this writing, so you may need to read
and use the code.

The basic test scenario is simple.  Run a test program that makes
artificial accesses to memory having artificial content under a
memory.high-set memory limit, and measure how many accesses were made in
a given time.

The test program repeatedly and randomly accesses three anonymous memory
regions.  The regions are all 500 MiB in size, and are accessed with the
same probability.  Two of those are filled with simple content that can
easily be compressed, while the remaining one is filled with content
read from /dev/urandom, which is likely to fail at compressing to
<PAGE_SIZE size.  The program runs for two minutes and prints out the
number of accesses made every five seconds.

The test script runs the program under the seven configurations below.

- 0: memory.high is set to 2 GiB, zswap is disabled.
- 1-1: memory.high is set to 1350 MiB, zswap is disabled.
- 1-2: Same as 1-1, but zswap is enabled.
- 1-3: Same as 1-2, but save_incompressible_pages is turned on.
- 2-1: memory.high is set to 1200 MiB, zswap is disabled.
- 2-2: Same as 2-1, but zswap is enabled.
- 2-3: Same as 2-2, but save_incompressible_pages is turned on.

For all zswap-enabled cases, the zswap shrinker is enabled.

Configuration '0' is for showing the original memory performance.
Configurations 1-1, 1-2 and 1-3 are for showing the performance of swap,
zswap, and this patch under a level of memory pressure (~10% of working
set).

Configurations 2-1, 2-2 and 2-3 are similar to 1-1, 1-2 and 1-3 but to
show those under a severe level of memory pressure (~20% of the working
set).

Because the per-5 seconds performance is not very reliable, I measured
the average of that for the last one minute period of the test program
run.  I also measured a few vmstat counters including zswpin, zswpout,
zswpwb, pswpin and pswpout during the test runs.

The measurement results are as below.  To save space, I show performance
numbers that are normalized to that of the configuration '0' (no memory
pressure), only.  The averaged accesses per 5 seconds of configuration
'0' was 36493417.75.

    config            0       1-1     1-2      1-3      2-1     2-2      2-3
    perf_normalized   1.0000  0.0057  0.0235   0.0367   0.0031  0.0122   0.0077
    perf_stdev_ratio  0.0582  0.0652  0.0167   0.0346   0.0404  0.0145   0.0613
    zswpin            0       0       3548424  1999335  0       2912972  1612517
    zswpout           0       0       3588817  2361689  0       2996588  2029884
    zswpwb            0       0       10214    340270   0       34625    382117
    pswpin            0       485806  772038   340967   540476  874909   790418
    pswpout           0       649543  144773   340270   692666  275178   382117

'perf_normalized' is the performance metric, normalized to that of
configuration '0' (no pressure).  'perf_stdev_ratio' is the standard
deviation of the averaged data points, as a ratio to the averaged metric
value.  For example, configuration '0' performance showed 5.8% stdev.
Configurations 1-1 and 2-3 showed about 6.5% and 6.1% stdev.  Also the
results were highly variable between multiple runs.  So this result is
not very stable and only shows ballpark figures.  Please keep this in
mind when reading these results.

Under about 10% of working set memory pressure, the performance dropped
to about 0.57% of the no-pressure one when the normal swap is used
(1-1).  Actually ~10% working set pressure is not a mild one, at least
on this test setup.

By turning zswap on (1-2), the performance was improved about 4x,
resulting in about 2.35% of no-pressure one.  Because of the
incompressible pages in the third memory region, a significant amount of
(non-zswap) swap I/O operations were made, though.

By enabling the incompressible pages handling feature that is introduced
by this patch (1-3), about 56% performance improvement was made,
resulting in about 3.67% of the no-pressure one.  The reduced pswpin of
1-3 compared to 1-2 shows where this improvement came from.

Under about 20% of working set memory pressure, which could be extreme,
the performance drops down to 0.31% of no-pressure one when only the
normal swap is used (2-1).  Enabling zswap significantly improves it, up
to 1.22%, though again showing a significant number of (non-zswap) swap
I/O due to incompressible pages.

Enabling the incompressible pages handling feature of this patch (2-3)
didn't reduce non-zswap swap I/O, because the memory pressure is so
severe that nearly all zswap pages, including the incompressible pages,
are written back by the zswap shrinker.  And because the memory usage
does not drop as soon as incompressible pages are swapped out but only
after those are written back by the shrinker, memory.high apparently
applied more penalty_jiffies.  As a result, the performance became about
36.88% worse than 2-2, resulting in 0.77% of the no-pressure one.

20% of working set memory pressure is pretty extreme, but anyway the
incompressible pages handling feature could make it worse in certain
setups.  Hence add the parameter for turning the feature on/off as
needed, and disable it by default.
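
The ratios quoted in this section can be rechecked from the
perf_normalized row of the table.  Below is a small sketch for that.
The helper names (speedup(), relative_change_percent(),
absolute_accesses()) are made up for this example and are not part of
the benchmark repo.

```c
/* Averaged accesses per 5 seconds of configuration '0', from the text. */
#define BASELINE_ACCESSES_PER_5S 36493417.75

/* How many times faster 'to' is than 'from'. */
double speedup(double from, double to)
{
	return to / from;
}

/* Relative change of 'to' against 'from', in percent. */
double relative_change_percent(double from, double to)
{
	return (to - from) / from * 100.0;
}

/* Convert a perf_normalized value back to accesses per 5 seconds. */
double absolute_accesses(double perf_normalized)
{
	return perf_normalized * BASELINE_ACCESSES_PER_5S;
}
```

Feeding the table values, speedup(0.0057, 0.0235) gives about 4.1 (1-1
to 1-2), relative_change_percent(0.0235, 0.0367) gives about +56.2 (1-2
to 1-3), and relative_change_percent(0.0122, 0.0077) gives about -36.9
(2-2 to 2-3).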

Related Works
-------------

This is not an entirely new attempt.  Nhat Pham and Takero Funaki tried
very similar approaches in October 2023[2] and April 2024[3],
respectively.  The two approaches didn't get merged mainly due to the
metadata overhead concern.  I described why I think that shouldn't be a
problem for this change, which is automatically disabled when writeback
is disabled, at the beginning of this changelog.

This patch is not particularly different from those, and is actually
built upon those.  I wrote this from scratch again, though.  Hence
adding Suggested-by tags for them.  Actually Nhat first suggested this
to me offlist.

[1] https://github.com/sjp38/eval_zswap/blob/master/run.sh
[2] https://lore.kernel.org/20231017003519.1426574-3-nphamcs@gmail.com
[3] https://lore.kernel.org/20240706022523.1104080-6-flintglass@gmail.com

Suggested-by: Nhat Pham <nphamcs@gmail.com>
Suggested-by: Takero Funaki <flintglass@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
---
Changes from RFC v1
(https://lore.kernel.org/20250730234059.4603-1-sj@kernel.org)
- Consider PAGE_SIZE-resulting compression successes as failures.
- Use zpool for storing incompressible pages.
- Test with zswap shrinker enabled.
- Wordsmith changelog and comments.
- Add documentation of save_incompressible_pages parameter.

 Documentation/admin-guide/mm/zswap.rst |  9 +++++
 mm/zswap.c                             | 53 +++++++++++++++++++++++++-
 2 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index c2806d051b92..20eae0734491 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -142,6 +142,15 @@ User can enable it as follows::
 This can be enabled at the boot time if ``CONFIG_ZSWAP_SHRINKER_DEFAULT_ON`` is
 selected.

+If a page cannot be compressed into a size smaller than PAGE_SIZE, it can be
+beneficial to save the content as is without compression, to keep the LRU
+order.  Users can enable this behavior, as follows::
+
+  echo Y > /sys/module/zswap/parameters/save_incompressible_pages
+
+This is disabled by default, and doesn't change the behavior when zswap
+writeback is disabled.
+
 A debugfs interface is provided for various statistic about pool size, number
 of pages stored, same-value filled pages and various counters for the reasons
 pages are rejected.
diff --git a/mm/zswap.c b/mm/zswap.c
index 7e02c760955f..6e196c9a4dba 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -129,6 +129,11 @@ static bool zswap_shrinker_enabled = IS_ENABLED(
 		CONFIG_ZSWAP_SHRINKER_DEFAULT_ON);
 module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);

+/* Enable/disable incompressible pages storing */
+static bool zswap_save_incompressible_pages;
+module_param_named(save_incompressible_pages, zswap_save_incompressible_pages,
+		bool, 0644);
+
 bool zswap_is_enabled(void)
 {
 	return zswap_enabled;
@@ -937,6 +942,29 @@ static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
 	mutex_unlock(&acomp_ctx->mutex);
 }

+/*
+ * Determine whether to save given page as-is.
+ *
+ * If a page cannot be compressed into a size smaller than PAGE_SIZE, it can be
+ * beneficial to save the content as is without compression, to keep the LRU
+ * order.  This can increase memory overhead from metadata, but in common zswap
+ * use cases where there is a sufficient amount of compressible pages, the
+ * overhead should not be critical, and can be mitigated by the writeback.
+ * Also, the decompression overhead is avoided.
+ *
+ * When the writeback is disabled, however, the additional overhead could be
+ * problematic.  For that case, just return the failure.  swap_writeout() will
+ * put the page back to the active LRU list in that case.
+ */
+static bool zswap_save_as_is(int comp_ret, unsigned int dlen,
+		struct page *page)
+{
+	return zswap_save_incompressible_pages &&
+			(comp_ret || dlen == PAGE_SIZE) &&
+			mem_cgroup_zswap_writeback_enabled(
+					folio_memcg(page_folio(page)));
+}
+
 static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 			   struct zswap_pool *pool)
 {
@@ -976,8 +1004,13 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	 */
 	comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
 	dlen = acomp_ctx->req->dlen;
-	if (comp_ret)
+	if (zswap_save_as_is(comp_ret, dlen, page)) {
+		comp_ret = 0;
+		dlen = PAGE_SIZE;
+		memcpy_from_page(dst, page, 0, dlen);
+	} else if (comp_ret) {
 		goto unlock;
+	}

 	zpool = pool->zpool;
 	gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
@@ -1001,6 +1034,17 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	return comp_ret == 0 && alloc_ret == 0;
 }

+/*
+ * If save_incompressible_pages is set and writeback is enabled, incompressible
+ * pages are saved as is without compression.  For more details, refer to the
+ * comments of zswap_save_as_is().
+ */
+static bool zswap_saved_as_is(struct zswap_entry *entry, struct folio *folio)
+{
+	return entry->length == PAGE_SIZE && zswap_save_incompressible_pages &&
+		mem_cgroup_zswap_writeback_enabled(folio_memcg(folio));
+}
+
 static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 {
 	struct zpool *zpool = entry->pool->zpool;
@@ -1012,6 +1056,13 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
 	obj = zpool_obj_read_begin(zpool, entry->handle, acomp_ctx->buffer);

+	if (zswap_saved_as_is(entry, folio)) {
+		memcpy_to_folio(folio, 0, obj, entry->length);
+		zpool_obj_read_end(zpool, entry->handle, obj);
+		acomp_ctx_put_unlock(acomp_ctx);
+		return true;
+	}
+
 	/*
 	 * zpool_obj_read_begin() might return a kmap address of highmem when
 	 * acomp_ctx->buffer is not used.  However, sg_init_one() does not

base-commit: d19f69751d55ef3883569c119d4b2ea3d6a0e39f
-- 
2.39.5

* Re: [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
  2025-08-05  0:29 [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is SeongJae Park
@ 2025-08-05 10:47 ` David Hildenbrand
  2025-08-05 16:56   ` Nhat Pham
  2025-08-05 18:43   ` SeongJae Park
  2025-08-05 18:25 ` Nhat Pham
  2025-08-06 16:32 ` Johannes Weiner
  2 siblings, 2 replies; 14+ messages in thread
From: David Hildenbrand @ 2025-08-05 10:47 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Liam R. Howlett, Andrew Morton, Chengming Zhou, Johannes Weiner,
	Jonathan Corbet, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Nhat Pham, Suren Baghdasaryan, Vlastimil Babka, Yosry Ahmed,
	kernel-team, linux-doc, linux-kernel, linux-mm, Takero Funaki

On 05.08.25 02:29, SeongJae Park wrote:
> When zswap writeback is enabled and it fails compressing a given page,
> the page is swapped out to the backing swap device.  This behavior
> breaks the zswap's writeback LRU order, and hence users can experience
> unexpected latency spikes.  If the page is compressed without failure,
> but results in a size of PAGE_SIZE, the LRU order is kept, but the
> decompression overhead for loading the page back on the later access is
> unnecessary.
> 
> Keep the LRU order and optimize unnecessary decompression overheads in
> the cases, by storing the original content in zpool as-is.

Does this have any effect on the movability of the given page? IOW, does 
page migration etc. still work when we store an ordinary page of an 
shmem/anon folio here?

-- 
Cheers,

David / dhildenb


* Re: [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
  2025-08-05 10:47 ` David Hildenbrand
@ 2025-08-05 16:56   ` Nhat Pham
  2025-08-06 20:14     ` David Hildenbrand
  2025-08-05 18:43   ` SeongJae Park
  1 sibling, 1 reply; 14+ messages in thread
From: Nhat Pham @ 2025-08-05 16:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, Chengming Zhou,
	Johannes Weiner, Jonathan Corbet, Lorenzo Stoakes, Michal Hocko,
	Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka, Yosry Ahmed,
	kernel-team, linux-doc, linux-kernel, linux-mm, Takero Funaki

On Tue, Aug 5, 2025 at 3:47 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 05.08.25 02:29, SeongJae Park wrote:
> > When zswap writeback is enabled and it fails compressing a given page,
> > the page is swapped out to the backing swap device.  This behavior
> > breaks the zswap's writeback LRU order, and hence users can experience
> > unexpected latency spikes.  If the page is compressed without failure,
> > but results in a size of PAGE_SIZE, the LRU order is kept, but the
> > decompression overhead for loading the page back on the later access is
> > unnecessary.
> >
> > Keep the LRU order and optimize unnecessary decompression overheads in
> > the cases, by storing the original content in zpool as-is.
>
> Does this have any effect on the movability of the given page? IOW, does
> page migration etc. still work when we store an ordinary page of an
> shmem/anon folio here?

Good question. This depends on the backend allocator of zswap, but the
only backend allocator remaining (zsmalloc) does implement page
migration.

It's why we insisted on using zpool/zsmalloc to handle the
incompressibility case as well:

https://lore.kernel.org/all/CAKEwX=NC65XCkmX1YzivEJtPc+sEJ3pLHUsYhF60QJnk_OtpVw@mail.gmail.com/

>
> --
> Cheers,
>
> David / dhildenb
>

* Re: [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
  2025-08-05  0:29 [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is SeongJae Park
  2025-08-05 10:47 ` David Hildenbrand
@ 2025-08-05 18:25 ` Nhat Pham
  2025-08-05 18:31   ` Nhat Pham
  2025-08-05 18:51   ` SeongJae Park
  2025-08-06 16:32 ` Johannes Weiner
  2 siblings, 2 replies; 14+ messages in thread
From: Nhat Pham @ 2025-08-05 18:25 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Liam R. Howlett, Andrew Morton, Chengming Zhou, David Hildenbrand,
	Johannes Weiner, Jonathan Corbet, Lorenzo Stoakes, Michal Hocko,
	Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka, Yosry Ahmed,
	kernel-team, linux-doc, linux-kernel, linux-mm, Takero Funaki

On Mon, Aug 4, 2025 at 5:30 PM SeongJae Park <sj@kernel.org> wrote:
>
> When zswap writeback is enabled and it fails compressing a given page,
> the page is swapped out to the backing swap device.  This behavior
> breaks the zswap's writeback LRU order, and hence users can experience
> unexpected latency spikes.  If the page is compressed without failure,
> but results in a size of PAGE_SIZE, the LRU order is kept, but the
> decompression overhead for loading the page back on the later access is
> unnecessary.
>
> Keep the LRU order and optimize unnecessary decompression overheads in
> the cases, by storing the original content in zpool as-is.  The length
> field of zswap_entry will be set appropriately, as PAGE_SIZE,  Hence
> whether it is saved as-is or not (whether decompression is unnecessary)
> is identified by 'zswap_entry->length == PAGE_SIZE'.
>
> So this change is not increasing per zswap entry metadata overhead.  But
> as the number of incompressible pages increases, total zswap metadata
> overhead is proportionally increased.  The overhead should not be
> problematic in usual cases, since the zswap metadata for single zswap
> entry is much smaller than PAGE_SIZE, and in common zswap use cases
> there should be a sufficient amount of compressible pages.  Also it can
> be mitigated by the zswap writeback.
>
> When a severe memory pressure comes from memcg's memory.high, storing
> incompressible pages as-is may result in reducing accounted memory
> footprint slower, since the footprint will be reduced only after the
> zswap writeback kicks in.  This can incur higher penalty_jiffies and
> degrade the performance.  Arguably this is just a wrong setup, but we
> don't want to introduce unnecessary surprises.  Add a parameter, namely
> 'save_incompressible_pages', to turn the feature on/off as users want.
> It is turned off by default.
>
> When the writeback is disabled, the additional overhead could be
> problematic.  For the case, keep the current behavior that just returns
> the failure and let swap_writeout() put the page back to the active LRU
> list in the case.  It is known to be suboptimal when the incompressible
> pages are cold, since the incompressible pages will continuously be
> tried to be zswapped out, and burn CPU cycles for compression attempts
> that will anyway fails.  One imaginable solution for the problem is
> reusing the swapped-out page and its struct page to store in the zswap
> pool.  But that's out of the scope of this patch.
>
> Tests
> -----
>
> I tested this patch using a simple self-written microbenchmark that is
> available at GitHub[1].  You can reproduce the test I did by executing
> run_tests.sh of the repo on your system.  Note that the repo's
> documentation is not good as of this writing, so you may need to read
> and use the code.
>
> The basic test scenario is simple.  Run a test program making artificial
> accesses to memory having artificial content under memory.high-set
> memory limit and measure how many accesses were made in given time.
>
> The test program repeatedly and randomly access three anonymous memory
> regions.  The regions are all 500 MiB size, and accessed in the same
> probability.  Two of those are filled up with a simple content that can
> easily be compressed, while the remaining one is filled up with a
> content that read from /dev/urandom, which is easy to fail at
> compressing to <PAGE_SIZE size.  The program runs for two minutes and
> prints out the number of accesses made every five seconds.
>
> The test script runs the program under below seven configurations.
>
> - 0: memory.high is set to 2 GiB, zswap is disabled.
> - 1-1: memory.high is set to 1350 MiB, zswap is disabled.
> - 1-2: Same to 1-1, but zswap is enabled.
> - 1-3: Same to 1-2, but save_incompressible_pages is turned on.
> - 2-1: memory.high is set to 1200 MiB, zswap is disabled.
> - 2-2: Same to 2-1, but zswap is enabled.
> - 2-3: Same to 2-2, but save_incompressible_pages is turned on.
>
> For all zswap enabled case, zswap shrinker is enabled.
>
> Configuration '0' is for showing the original memory performance.
> Configurations 1-1, 1-2 and 1-3 are for showing the performance of swap,
> zswap, and this patch under a level of memory pressure (~10% of working
> set).
>
> Configurations 2-1, 2-2 and 2-3 are similar to 1-1, 1-2 and 1-3 but to
> show those under a severe level of memory pressure (~20% of the working
> set).
>
> Because the per-5 seconds performance is not very reliable, I measured
> the average of that for the last one minute period of the test program
> run.  I also measured a few vmstat counters including zswpin, zswpout,
> zswpwb, pswpin and pswpout during the test runs.
>
> The measurement results are as below.  To save space, I show performance
> numbers that are normalized to that of the configuration '0' (no memory
> pressure), only.  The averaged accesses per 5 seconds of configuration
> '0' was 36493417.75.
>
>     config            0       1-1     1-2      1-3      2-1     2-2      2-3
>     perf_normalized   1.0000  0.0057  0.0235   0.0367   0.0031  0.0122   0.0077
>     perf_stdev_ratio  0.0582  0.0652  0.0167   0.0346   0.0404  0.0145   0.0613
>     zswpin            0       0       3548424  1999335  0       2912972  1612517
>     zswpout           0       0       3588817  2361689  0       2996588  2029884
>     zswpwb            0       0       10214    340270   0       34625    382117
>     pswpin            0       485806  772038   340967   540476  874909   790418
>     pswpout           0       649543  144773   340270   692666  275178   382117
>
> 'perf_normalized' is the performance metric, normalized to that of
> configuration '0' (no pressure).  'perf_stdev_ratio' is the standard
> deviation of the averaged data points, as a ratio to the averaged metric
> value.  For example, configuration '0' performance was showing 5.8%
> stdev.  Configurations 1-1 and 1-3 were having about 6.5% and 6.1%
> stdev.  Also the results were highly variable between multiple runs.  So
> this result is not very stable but just showing ball park figures.
> Please keep this in your mind when reading these results.
>
> Under about 10% of working set memory pressure, the performance was
> dropped to about 0.57% of no-pressure one, when the normal swap is used
> (1-1).  Actually ~10% working set pressure is not a mild one, at least
> on this test setup.
>
> By turning zswap on (1-2), the performance was improved about 4x,
> resulting in about 2.35% of no-pressure one.  Because of the
> incompressible pages in the third memory region, a significant amount of
> (non-zswap) swap I/O operations were made, though.
>
> By enabling the incompressible pages handling feature that is introduced
> by this patch (1-3), about 56% performance improvement was made,
> resulting in about 3.67% of no-pressure one.  Reduced pswpin of 1-3
> compared to 1-2 let us see where this improvement came from.
>
> Under about 20% of working set memory pressure, which could be extreme,
> the performance drops down to 0.31% of no-pressure one when only the
> normal swap is used (2-1).  Enabling zswap significantly improves it, up
> to 1.22%, though again showing a significant number of (non-zswap) swap
> I/O due to incompressible pages.
>
> Enabling the incompressible pages handling feature of this patch (2-3)
> didn't reduce non-zswap swap I/O, because the memory pressure is too
> severe to let nearly all zswap pages including the incompressible pages
> written back by zswap shrinker.  And because the memory usage is not
> dropped as soon as incompressible pages are swapped out but only after
> those are written back by shrinker, memory.high apparently applied more
> penalty_jiffies.  As a result, the performance became even worse than
> 2-2 about 36.88%, resulting in 0.07% of the no-pressure one.
>
> 20% of working set memory pressure is pretty extreme, but anyway the
> incompressible pages handling feature could make it worse in certain
> setups.  Hence add the parameter for turning the feature on/off as
> needed, and disable it by default.
>
> Related Works
> -------------
>
> This is not an entirely new attempt.  Nhat Pham and Takero Funaki tried
> very similar approaches in October 2023[2] and April 2024[3],
> respectively.  The two approaches didn't get merged mainly due to the
> metadata overhead concern.  I described why I think that shouldn't be a
> problem for this change, which is automatically disabled when writeback
> is disabled, at the beginning of this changelog.
>
> This patch is not particularly different from those, and actually built
> upon those.  I wrote this from scratch again, though.  Hence adding
> Suggested-by tags for them.  Actually Nhat first suggested this to me
> offlist.
>
> [1] https://github.com/sjp38/eval_zswap/blob/master/run.sh
> [2] https://lore.kernel.org/20231017003519.1426574-3-nphamcs@gmail.com
> [3] https://lore.kernel.org/20240706022523.1104080-6-flintglass@gmail.com
>
> Suggested-by: Nhat Pham <nphamcs@gmail.com>
> Suggested-by: Takero Funaki <flintglass@gmail.com>
> Signed-off-by: SeongJae Park <sj@kernel.org>
> ---
> Changes from RFC v1
> (https://lore.kernel.org/20250730234059.4603-1-sj@kernel.org)
> - Consider PAGE_SIZE-resulting compression successes as failures.
> - Use zpool for storing incompressible pages.
> - Test with zswap shrinker enabled.
> - Wordsmith changelog and comments.
> - Add documentation of save_incompressible_pages parameter.
>
>  Documentation/admin-guide/mm/zswap.rst |  9 +++++
>  mm/zswap.c                             | 53 +++++++++++++++++++++++++-
>  2 files changed, 61 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
> index c2806d051b92..20eae0734491 100644
> --- a/Documentation/admin-guide/mm/zswap.rst
> +++ b/Documentation/admin-guide/mm/zswap.rst
> @@ -142,6 +142,15 @@ User can enable it as follows::
>  This can be enabled at the boot time if ``CONFIG_ZSWAP_SHRINKER_DEFAULT_ON`` is
>  selected.
>
> +If a page cannot be compressed into a size smaller than PAGE_SIZE, it can be
> +beneficial to save the content as is without compression, to keep the LRU
> +order.  Users can enable this behavior, as follows::
> +
> +  echo Y > /sys/module/zswap/parameters/save_incompressible_pages
> +
> +This is disabled by default, and doesn't change behavior of zswap writeback
> +disabled case.
> +
>  A debugfs interface is provided for various statistic about pool size, number
>  of pages stored, same-value filled pages and various counters for the reasons
>  pages are rejected.
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 7e02c760955f..6e196c9a4dba 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -129,6 +129,11 @@ static bool zswap_shrinker_enabled = IS_ENABLED(
>                 CONFIG_ZSWAP_SHRINKER_DEFAULT_ON);
>  module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);
>
> +/* Enable/disable incompressible pages storing */
> +static bool zswap_save_incompressible_pages;
> +module_param_named(save_incompressible_pages, zswap_save_incompressible_pages,
> +               bool, 0644);
> +
>  bool zswap_is_enabled(void)
>  {
>         return zswap_enabled;
> @@ -937,6 +942,29 @@ static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
>         mutex_unlock(&acomp_ctx->mutex);
>  }
>
> +/*
> + * Determine whether to save given page as-is.
> + *
> + * If a page cannot be compressed into a size smaller than PAGE_SIZE, it can be
> + * beneficial to saving the content as is without compression, to keep the LRU
> + * order.  This can increase memory overhead from metadata, but in common zswap
> + * use cases where there are sufficient amount of compressible pages, the
> + * overhead should be not critical, and can be mitigated by the writeback.
> + * Also, the decompression overhead is optimized.
> + *
> + * When the writeback is disabled, however, the additional overhead could be
> + * problematic.  For the case, just return the failure.  swap_writeout() will
> + * put the page back to the active LRU list in the case.
> + */
> +static bool zswap_save_as_is(int comp_ret, unsigned int dlen,
> +               struct page *page)
> +{
> +       return zswap_save_incompressible_pages &&
> +                       (comp_ret || dlen == PAGE_SIZE) &&
> +                       mem_cgroup_zswap_writeback_enabled(
> +                                       folio_memcg(page_folio(page)));
> +}
> +
>  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>                            struct zswap_pool *pool)
>  {
> @@ -976,8 +1004,13 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>          */
>         comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
>         dlen = acomp_ctx->req->dlen;
> -       if (comp_ret)
> +       if (zswap_save_as_is(comp_ret, dlen, page)) {
> +               comp_ret = 0;
> +               dlen = PAGE_SIZE;
> +               memcpy_from_page(dst, page, 0, dlen);
> +       } else if (comp_ret) {
>                 goto unlock;
> +       }
>
>         zpool = pool->zpool;
>         gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
> @@ -1001,6 +1034,17 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>         return comp_ret == 0 && alloc_ret == 0;
>  }
>
> +/*
> + * If save_incompressible_pages is set and writeback is enabled, incompressible
> + * pages are saved as is without compression.  For more details, refer to the
> + * comments of zswap_save_as_is().
> + */
> +static bool zswap_saved_as_is(struct zswap_entry *entry, struct folio *folio)
> +{
> +       return entry->length == PAGE_SIZE && zswap_save_incompressible_pages &&
> +               mem_cgroup_zswap_writeback_enabled(folio_memcg(folio));
> +}

Actually, this might not be safe either :(

What if we have the following sequence:
1. Initially, the cgroup is writeback enabled. We encounter an
incompressible page, and store it as-is in the zswap pool.
2. Some userspace agent (systemd or whatever) runs, and disables zswap
writeback on the cgroup.
3. At fault time, zswap_saved_as_is() returns false, so we'll treat
the page-sized stored object as compressed, and attempt to decompress
it. This is a memory corruption.

I think you can trigger a similar bug, if you enable
zswap_save_incompressible_pages initially, then disable it later on.
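
To make the sequence above concrete, here is a minimal userspace sketch in C.
It is not kernel code: `struct model_state`, `struct model_entry`,
`MODEL_PAGE_SIZE` and all function names are hypothetical stand-ins, and the
two `v2_*` predicates only mirror the boolean conditions of zswap_save_as_is()
and zswap_saved_as_is() from the quoted patch.  It shows that a load-time check
depending on settings that can change after the store may misclassify an entry
that was stored as-is.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u	/* stand-in for PAGE_SIZE (assumption) */

/* Hypothetical model of the two settings that can change at runtime. */
struct model_state {
	bool save_incompressible_pages;	/* models the module parameter */
	bool writeback_enabled;		/* models the per-cgroup setting */
};

/* Hypothetical model of a stored entry. */
struct model_entry {
	unsigned int length;	/* size of the stored object */
	bool raw;		/* ground truth: stored without compression */
};

/* Store-time condition, mirroring zswap_save_as_is() of the v2 patch. */
static bool v2_save_as_is(const struct model_state *s, int comp_ret,
			  unsigned int dlen)
{
	return s->save_incompressible_pages &&
	       (comp_ret || dlen == MODEL_PAGE_SIZE) &&
	       s->writeback_enabled;
}

/* Load-time condition, mirroring zswap_saved_as_is() of the v2 patch. */
static bool v2_saved_as_is(const struct model_state *s,
			   const struct model_entry *e)
{
	return e->length == MODEL_PAGE_SIZE &&
	       s->save_incompressible_pages &&
	       s->writeback_enabled;
}

/* Load-time condition that looks only at the stored length. */
static bool length_only_saved_as_is(const struct model_entry *e)
{
	return e->length == MODEL_PAGE_SIZE;
}
```

In this model, storing an entry while both settings are on and then clearing
either of them makes v2_saved_as_is() return false for that entry, although its
content is uncompressed.  length_only_saved_as_is() still returns true.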

I think you have to do the following:
1. At store time, if comp_ret or dlen == PAGE_SIZE, treat it as
compression failure. This means: saving as-is when writeback enabled,
and rejecting when writeback disabled. Basically:

if (!comp_ret || dlen == PAGE_SIZE) {
    if (zswap_save_incompressible_pages &&
mem_cgroup_zswap_writeback_enabled(folio_memcg(page_folio(folio)))) {
        /* save as-is */
    } else {
       /* rejects */
    }

}

2. At load time, just check that dlen == PAGE_SIZE. We NEVER store
PAGE_SIZE "compressed" page, so we can safely assume that it is the
original, uncompressed data.
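
The two rules can be sketched as a small userspace model in C, using the
`comp_ret || dlen == PAGE_SIZE` form of the condition.  This is only an
illustration under assumptions: `MODEL_PAGE_SIZE`, `enum store_action`,
`store_decision()` and `load_is_raw()` are hypothetical names, not existing
zswap symbols, and the real store path also has to allocate from zpool and copy
the data.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u	/* stand-in for PAGE_SIZE (assumption) */

/* Hypothetical outcome of the store-time decision. */
enum store_action {
	STORE_COMPRESSED,	/* keep the compressed object */
	STORE_AS_IS,		/* copy the original page content */
	STORE_REJECT,		/* fail the store */
};

/*
 * Rule 1: a compression error and a PAGE_SIZE result are both treated as
 * compression failure.  A failure is saved as-is only when the feature and
 * the writeback are both enabled, and rejected otherwise.
 */
static enum store_action store_decision(int comp_ret, unsigned int dlen,
					bool save_incompressible,
					bool writeback_enabled)
{
	if (comp_ret || dlen == MODEL_PAGE_SIZE) {
		if (save_incompressible && writeback_enabled)
			return STORE_AS_IS;
		return STORE_REJECT;
	}
	return STORE_COMPRESSED;
}

/*
 * Rule 2: because rule 1 never keeps a PAGE_SIZE compressed object, the
 * stored length alone tells whether the content is uncompressed.
 */
static bool load_is_raw(unsigned int length)
{
	return length == MODEL_PAGE_SIZE;
}
```

The point of the design is that load_is_raw() takes no runtime setting as
input, so toggling the parameter or the cgroup writeback setting after the
store cannot change how an existing entry is interpreted.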

* Re: [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
  2025-08-05 18:25 ` Nhat Pham
@ 2025-08-05 18:31   ` Nhat Pham
  2025-08-05 18:51   ` SeongJae Park
  1 sibling, 0 replies; 14+ messages in thread
From: Nhat Pham @ 2025-08-05 18:31 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Liam R. Howlett, Andrew Morton, Chengming Zhou, David Hildenbrand,
	Johannes Weiner, Jonathan Corbet, Lorenzo Stoakes, Michal Hocko,
	Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka, Yosry Ahmed,
	kernel-team, linux-doc, linux-kernel, linux-mm, Takero Funaki

On Tue, Aug 5, 2025 at 11:25 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> 1. At store time, if comp_ret or dlen == PAGE_SIZE, treat it as
> compression failure. This means: saving as-is when writeback enabled,
> and rejecting when writeback disabled. Basically:
>
> if (!comp_ret || dlen == PAGE_SIZE) {

Again, comp_ret || dlen == PAGE_SIZE here. Not sure why I kept making
the same brainfart, lol.

* Re: [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
  2025-08-05 10:47 ` David Hildenbrand
  2025-08-05 16:56   ` Nhat Pham
@ 2025-08-05 18:43   ` SeongJae Park
  1 sibling, 0 replies; 14+ messages in thread
From: SeongJae Park @ 2025-08-05 18:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, Chengming Zhou,
	Johannes Weiner, Jonathan Corbet, Lorenzo Stoakes, Michal Hocko,
	Mike Rapoport, Nhat Pham, Suren Baghdasaryan, Vlastimil Babka,
	Yosry Ahmed, kernel-team, linux-doc, linux-kernel, linux-mm,
	Takero Funaki

On Tue, 5 Aug 2025 12:47:36 +0200 David Hildenbrand <david@redhat.com> wrote:

> On 05.08.25 02:29, SeongJae Park wrote:
> > When zswap writeback is enabled and it fails compressing a given page,
> > the page is swapped out to the backing swap device.  This behavior
> > breaks the zswap's writeback LRU order, and hence users can experience
> > unexpected latency spikes.  If the page is compressed without failure,
> > but results in a size of PAGE_SIZE, the LRU order is kept, but the
> > decompression overhead for loading the page back on the later access is
> > unnecessary.
> > 
> > Keep the LRU order and optimize unnecessary decompression overheads in
> > the cases, by storing the original content in zpool as-is.
> 
> Does this have any effect on the movability of the given page? IOW, does 
> page migration etc. still work when we store an ordinary page of an 
> shmem/anon folio here?

Thank you for the good question.  As Nhat also replied, there is no effect on
the movability.

In more detail, the handling of the given (incompressible) page is nearly the
same as that of compressible pages.  Zswap asks zpool to allocate memory, copies
the content of the page into the newly allocated memory, and lets the page be
marked as zswapped out and hence be freed.  The only difference in the handling
of incompressible pages is that the content is copied into the zpool memory
without compression.  All other properties, including movability, are the same
as for compressible pages, so this patch doesn't introduce a movability
difference.

In the previous version of this patch, I was manually allocating memory without
zpool's help, and other people including Nhat kindly pointed out that it can
introduce a migratability difference.  Hence this version uses zpool.

[1] https://lore.kernel.org/761a2899-6fd9-4bfe-aeaf-23bce0baa0f1@redhat.com

Thanks,
SJ

[...]

* Re: [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
  2025-08-05 18:25 ` Nhat Pham
  2025-08-05 18:31   ` Nhat Pham
@ 2025-08-05 18:51   ` SeongJae Park
  1 sibling, 0 replies; 14+ messages in thread
From: SeongJae Park @ 2025-08-05 18:51 UTC (permalink / raw)
  To: Nhat Pham
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, Chengming Zhou,
	David Hildenbrand, Johannes Weiner, Jonathan Corbet,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
	Vlastimil Babka, Yosry Ahmed, kernel-team, linux-doc,
	linux-kernel, linux-mm, Takero Funaki

On Tue, 5 Aug 2025 11:25:38 -0700 Nhat Pham <nphamcs@gmail.com> wrote:

> On Mon, Aug 4, 2025 at 5:30 PM SeongJae Park <sj@kernel.org> wrote:
> >
> > When zswap writeback is enabled and it fails compressing a given page,
> > the page is swapped out to the backing swap device.  This behavior
> > breaks the zswap's writeback LRU order, and hence users can experience
> > unexpected latency spikes.  If the page is compressed without failure,
> > but results in a size of PAGE_SIZE, the LRU order is kept, but the
> > decompression overhead for loading the page back on the later access is
> > unnecessary.
> >
> > Keep the LRU order and optimize unnecessary decompression overheads in
> > the cases, by storing the original content in zpool as-is.  The length
> > field of zswap_entry will be set appropriately, as PAGE_SIZE.  Hence
> > whether it is saved as-is or not (whether decompression is unnecessary)
> > is identified by 'zswap_entry->length == PAGE_SIZE'.
> >
> > So this change is not increasing per zswap entry metadata overhead.  But
> > as the number of incompressible pages increases, total zswap metadata
> > overhead is proportionally increased.  The overhead should not be
> > problematic in usual cases, since the zswap metadata for single zswap
> > entry is much smaller than PAGE_SIZE, and in common zswap use cases
> > there should be a sufficient amount of compressible pages.  Also it can
> > be mitigated by the zswap writeback.
> >
> > When a severe memory pressure comes from memcg's memory.high, storing
> > incompressible pages as-is may result in reducing accounted memory
> > footprint slower, since the footprint will be reduced only after the
> > zswap writeback kicks in.  This can incur higher penalty_jiffies and
> > degrade the performance.  Arguably this is just a wrong setup, but we
> > don't want to introduce unnecessary surprises.  Add a parameter, namely
> > 'save_incompressible_pages', to turn the feature on/off as users want.
> > It is turned off by default.
> >
> > When the writeback is disabled, the additional overhead could be
> > problematic.  For the case, keep the current behavior that just returns
> > the failure and let swap_writeout() put the page back to the active LRU
> > list in the case.  It is known to be suboptimal when the incompressible
> > pages are cold, since the incompressible pages will continuously be
> > tried to be zswapped out, and burn CPU cycles for compression attempts
> > that will fail anyway.  One imaginable solution for the problem is
> > reusing the swapped-out page and its struct page to store in the zswap
> > pool.  But that's out of the scope of this patch.
> >
> > Tests
> > -----
> >
> > I tested this patch using a simple self-written microbenchmark that is
> > available at GitHub[1].  You can reproduce the test I did by executing
> > run_tests.sh of the repo on your system.  Note that the repo's
> > documentation is not good as of this writing, so you may need to read
> > and use the code.
> >
> > The basic test scenario is simple.  Run a test program making artificial
> > accesses to memory having artificial content under memory.high-set
> > memory limit and measure how many accesses were made in given time.
> >
> > The test program repeatedly and randomly accesses three anonymous memory
> > regions.  The regions are all 500 MiB in size, and are accessed with the
> > same probability.  Two of those are filled up with a simple content that can
> > easily be compressed, while the remaining one is filled up with a
> > content that read from /dev/urandom, which is easy to fail at
> > compressing to <PAGE_SIZE size.  The program runs for two minutes and
> > prints out the number of accesses made every five seconds.
> >
> > The test script runs the program under below seven configurations.
> >
> > - 0: memory.high is set to 2 GiB, zswap is disabled.
> > - 1-1: memory.high is set to 1350 MiB, zswap is disabled.
> > - 1-2: Same to 1-1, but zswap is enabled.
> > - 1-3: Same to 1-2, but save_incompressible_pages is turned on.
> > - 2-1: memory.high is set to 1200 MiB, zswap is disabled.
> > - 2-2: Same to 2-1, but zswap is enabled.
> > - 2-3: Same to 2-2, but save_incompressible_pages is turned on.
> >
> > For all zswap enabled case, zswap shrinker is enabled.
> >
> > Configuration '0' is for showing the original memory performance.
> > Configurations 1-1, 1-2 and 1-3 are for showing the performance of swap,
> > zswap, and this patch under a level of memory pressure (~10% of working
> > set).
> >
> > Configurations 2-1, 2-2 and 2-3 are similar to 1-1, 1-2 and 1-3 but to
> > show those under a severe level of memory pressure (~20% of the working
> > set).
> >
> > Because the per-5 seconds performance is not very reliable, I measured
> > the average of that for the last one minute period of the test program
> > run.  I also measured a few vmstat counters including zswpin, zswpout,
> > zswpwb, pswpin and pswpout during the test runs.
> >
> > The measurement results are as below.  To save space, I show performance
> > numbers that are normalized to that of the configuration '0' (no memory
> > pressure), only.  The averaged accesses per 5 seconds of configuration
> > '0' was 36493417.75.
> >
> >     config            0       1-1     1-2      1-3      2-1     2-2      2-3
> >     perf_normalized   1.0000  0.0057  0.0235   0.0367   0.0031  0.0122   0.0077
> >     perf_stdev_ratio  0.0582  0.0652  0.0167   0.0346   0.0404  0.0145   0.0613
> >     zswpin            0       0       3548424  1999335  0       2912972  1612517
> >     zswpout           0       0       3588817  2361689  0       2996588  2029884
> >     zswpwb            0       0       10214    340270   0       34625    382117
> >     pswpin            0       485806  772038   340967   540476  874909   790418
> >     pswpout           0       649543  144773   340270   692666  275178   382117
> >
> > 'perf_normalized' is the performance metric, normalized to that of
> > configuration '0' (no pressure).  'perf_stdev_ratio' is the standard
> > deviation of the averaged data points, as a ratio to the averaged metric
> > value.  For example, configuration '0' performance was showing 5.8%
> > stdev.  Configurations 1-1 and 2-3 were having about 6.5% and 6.1%
> > stdev.  Also the results were highly variable between multiple runs.  So
> > this result is not very stable but just showing ball park figures.
> > Please keep this in your mind when reading these results.
> >
> > Under about 10% of working set memory pressure, the performance was
> > dropped to about 0.57% of no-pressure one, when the normal swap is used
> > (1-1).  Actually ~10% working set pressure is not a mild one, at least
> > on this test setup.
> >
> > By turning zswap on (1-2), the performance was improved about 4x,
> > resulting in about 2.35% of no-pressure one.  Because of the
> > incompressible pages in the third memory region, a significant amount of
> > (non-zswap) swap I/O operations were made, though.
> >
> > By enabling the incompressible pages handling feature that is introduced
> > by this patch (1-3), about 56% performance improvement was made,
> > resulting in about 3.67% of no-pressure one.  Reduced pswpin of 1-3
> > compared to 1-2 let us see where this improvement came from.
> >
> > Under about 20% of working set memory pressure, which could be extreme,
> > the performance drops down to 0.31% of no-pressure one when only the
> > normal swap is used (2-1).  Enabling zswap significantly improves it, up
> > to 1.22%, though again showing a significant number of (non-zswap) swap
> > I/O due to incompressible pages.
> >
> > Enabling the incompressible pages handling feature of this patch (2-3)
> > didn't reduce non-zswap swap I/O, because the memory pressure is so
> > severe that nearly all zswap pages, including the incompressible pages,
> > were written back by the zswap shrinker.  And because the memory usage is not
> > dropped as soon as incompressible pages are swapped out but only after
> > those are written back by shrinker, memory.high apparently applied more
> > penalty_jiffies.  As a result, the performance became even worse than
> > 2-2 about 36.88%, resulting in 0.77% of the no-pressure one.
> >
> > 20% of working set memory pressure is pretty extreme, but anyway the
> > incompressible pages handling feature could make it worse in certain
> > setups.  Hence add the parameter for turning the feature on/off as
> > needed, and disable it by default.
> >
> > Related Works
> > -------------
> >
> > This is not an entirely new attempt.  Nhat Pham and Takero Funaki tried
> > very similar approaches in October 2023[2] and April 2024[3],
> > respectively.  The two approaches didn't get merged mainly due to the
> > metadata overhead concern.  I described why I think that shouldn't be a
> > problem for this change, which is automatically disabled when writeback
> > is disabled, at the beginning of this changelog.
> >
> > This patch is not particularly different from those, and actually built
> > upon those.  I wrote this from scratch again, though.  Hence adding
> > Suggested-by tags for them.  Actually Nhat first suggested this to me
> > offlist.
> >
> > [1] https://github.com/sjp38/eval_zswap/blob/master/run.sh
> > [2] https://lore.kernel.org/20231017003519.1426574-3-nphamcs@gmail.com
> > [3] https://lore.kernel.org/20240706022523.1104080-6-flintglass@gmail.com
> >
> > Suggested-by: Nhat Pham <nphamcs@gmail.com>
> > Suggested-by: Takero Funaki <flintglass@gmail.com>
> > Signed-off-by: SeongJae Park <sj@kernel.org>
> > ---
> > Changes from RFC v1
> > (https://lore.kernel.org/20250730234059.4603-1-sj@kernel.org)
> > - Consider PAGE_SIZE-resulting compression successes as failures.
> > - Use zpool for storing incompressible pages.
> > - Test with zswap shrinker enabled.
> > - Wordsmith changelog and comments.
> > - Add documentation of save_incompressible_pages parameter.
> >
> >  Documentation/admin-guide/mm/zswap.rst |  9 +++++
> >  mm/zswap.c                             | 53 +++++++++++++++++++++++++-
> >  2 files changed, 61 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
> > index c2806d051b92..20eae0734491 100644
> > --- a/Documentation/admin-guide/mm/zswap.rst
> > +++ b/Documentation/admin-guide/mm/zswap.rst
> > @@ -142,6 +142,15 @@ User can enable it as follows::
> >  This can be enabled at the boot time if ``CONFIG_ZSWAP_SHRINKER_DEFAULT_ON`` is
> >  selected.
> >
> > +If a page cannot be compressed into a size smaller than PAGE_SIZE, it can be
> > +beneficial to save the content as is without compression, to keep the LRU
> > +order.  Users can enable this behavior, as follows::
> > +
> > +  echo Y > /sys/module/zswap/parameters/save_incompressible_pages
> > +
> > +This is disabled by default, and doesn't change behavior of zswap writeback
> > +disabled case.
> > +
> >  A debugfs interface is provided for various statistic about pool size, number
> >  of pages stored, same-value filled pages and various counters for the reasons
> >  pages are rejected.
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 7e02c760955f..6e196c9a4dba 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -129,6 +129,11 @@ static bool zswap_shrinker_enabled = IS_ENABLED(
> >                 CONFIG_ZSWAP_SHRINKER_DEFAULT_ON);
> >  module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);
> >
> > +/* Enable/disable incompressible pages storing */
> > +static bool zswap_save_incompressible_pages;
> > +module_param_named(save_incompressible_pages, zswap_save_incompressible_pages,
> > +               bool, 0644);
> > +
> >  bool zswap_is_enabled(void)
> >  {
> >         return zswap_enabled;
> > @@ -937,6 +942,29 @@ static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
> >         mutex_unlock(&acomp_ctx->mutex);
> >  }
> >
> > +/*
> > + * Determine whether to save given page as-is.
> > + *
> > + * If a page cannot be compressed into a size smaller than PAGE_SIZE, it can be
> > + * beneficial to saving the content as is without compression, to keep the LRU
> > + * order.  This can increase memory overhead from metadata, but in common zswap
> > + * use cases where there are sufficient amount of compressible pages, the
> > + * overhead should be not critical, and can be mitigated by the writeback.
> > + * Also, the decompression overhead is optimized.
> > + *
> > + * When the writeback is disabled, however, the additional overhead could be
> > + * problematic.  For the case, just return the failure.  swap_writeout() will
> > + * put the page back to the active LRU list in the case.
> > + */
> > +static bool zswap_save_as_is(int comp_ret, unsigned int dlen,
> > +               struct page *page)
> > +{
> > +       return zswap_save_incompressible_pages &&
> > +                       (comp_ret || dlen == PAGE_SIZE) &&
> > +                       mem_cgroup_zswap_writeback_enabled(
> > +                                       folio_memcg(page_folio(page)));
> > +}
> > +
> >  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >                            struct zswap_pool *pool)
> >  {
> > @@ -976,8 +1004,13 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >          */
> >         comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
> >         dlen = acomp_ctx->req->dlen;
> > -       if (comp_ret)
> > +       if (zswap_save_as_is(comp_ret, dlen, page)) {
> > +               comp_ret = 0;
> > +               dlen = PAGE_SIZE;
> > +               memcpy_from_page(dst, page, 0, dlen);
> > +       } else if (comp_ret) {
> >                 goto unlock;
> > +       }
> >
> >         zpool = pool->zpool;
> >         gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
> > @@ -1001,6 +1034,17 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >         return comp_ret == 0 && alloc_ret == 0;
> >  }
> >
> > +/*
> > + * If save_incompressible_pages is set and writeback is enabled, incompressible
> > + * pages are saved as is without compression.  For more details, refer to the
> > + * comments of zswap_save_as_is().
> > + */
> > +static bool zswap_saved_as_is(struct zswap_entry *entry, struct folio *folio)
> > +{
> > +       return entry->length == PAGE_SIZE && zswap_save_incompressible_pages &&
> > +               mem_cgroup_zswap_writeback_enabled(folio_memcg(folio));
> > +}
> 
> Actually, this might not be safe either :(
> 
> What if we have the following sequence:
> 1. Initially, the cgroup is writeback enabled. We encounter an
> incompressible page, and store it as-is in the zswap pool.
> 2. Some userspace agent (systemd or whatever) runs, and disables zswap
> writeback on the cgroup.
> 3. At fault time, zswap_saved_as_is() returns false, so we'll treat
> the page-sized stored object as compressed, and attempt to decompress
> it. This is a memory corruption.
> 
> I think you can trigger a similar bug, if you enable
> zswap_save_incompressible_pages initially, then disable it later on.

Nice catch!  Thank you for catching this and for the clear explanation.  I
agree with your points.

> 
> I think you have to do the following:
> 1. At store time, if comp_ret or dlen == PAGE_SIZE, treat it as
> compression failure. This means: saving as-is when writeback enabled,
> and rejecting when writeback disabled. Basically:
> 
> if (!comp_ret || dlen == PAGE_SIZE) {

I saw your reply correcting this to '(comp_ret || dlen == PAGE_SIZE)', and that
makes sense to me.

>     if (zswap_save_incompressible_pages &&
> mem_cgroup_zswap_writeback_enabled(folio_memcg(page_folio(folio)))) {
>         /* save as-is */
>     } else {
>        /* rejects */
>     }
> 
> }
> 
> 2. At load time, just check that dlen == PAGE_SIZE. We NEVER store
> PAGE_SIZE "compressed" page, so we can safely assume that it is the
> original, uncompressed data.

Thank you for this further suggestion.  Again, this makes sense to me.  I will
make this change in the next version.


Thanks,
SJ

* Re: [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
  2025-08-05  0:29 [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is SeongJae Park
  2025-08-05 10:47 ` David Hildenbrand
  2025-08-05 18:25 ` Nhat Pham
@ 2025-08-06 16:32 ` Johannes Weiner
  2025-08-06 16:56   ` SeongJae Park
  2 siblings, 1 reply; 14+ messages in thread
From: Johannes Weiner @ 2025-08-06 16:32 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Liam R. Howlett, Andrew Morton, Chengming Zhou, David Hildenbrand,
	Jonathan Corbet, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Nhat Pham, Suren Baghdasaryan, Vlastimil Babka, Yosry Ahmed,
	kernel-team, linux-doc, linux-kernel, linux-mm, Takero Funaki

Hi SJ,

Overall this looks good to me. On top of the feedback provided by
others, I have a few comments below.

On Mon, Aug 04, 2025 at 05:29:54PM -0700, SeongJae Park wrote:
> @@ -142,6 +142,15 @@ User can enable it as follows::
>  This can be enabled at the boot time if ``CONFIG_ZSWAP_SHRINKER_DEFAULT_ON`` is
>  selected.
>  
> +If a page cannot be compressed into a size smaller than PAGE_SIZE, it can be
> +beneficial to save the content as is without compression, to keep the LRU
> +order.  Users can enable this behavior, as follows::
> +
> +  echo Y > /sys/module/zswap/parameters/save_incompressible_pages
> +
> +This is disabled by default, and doesn't change behavior of zswap writeback
> +disabled case.
> +
>  A debugfs interface is provided for various statistic about pool size, number
>  of pages stored, same-value filled pages and various counters for the reasons
>  pages are rejected.
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 7e02c760955f..6e196c9a4dba 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -129,6 +129,11 @@ static bool zswap_shrinker_enabled = IS_ENABLED(
>  		CONFIG_ZSWAP_SHRINKER_DEFAULT_ON);
>  module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);
>  
> +/* Enable/disable incompressible pages storing */
> +static bool zswap_save_incompressible_pages;
> +module_param_named(save_incompressible_pages, zswap_save_incompressible_pages,
> +		bool, 0644);

Please remove the knob and just make it the default behavior.

With writeback enabled, the current behavior is quite weird:
compressible pages go into zswap, then get written to swap in LRU
order. Incompressible pages get sent to swap directly. This is an
obvious age inversion, and the performance problems this has caused
were a motivating factor for the ability to disable writeback.

I don't think there is much point in keeping that as an officially
supported mode.

> @@ -937,6 +942,29 @@ static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
>  	mutex_unlock(&acomp_ctx->mutex);
>  }
>  
> +/*
> + * Determine whether to save given page as-is.
> + *
> + * If a page cannot be compressed into a size smaller than PAGE_SIZE, it can be
> + * beneficial to saving the content as is without compression, to keep the LRU
> + * order.  This can increase memory overhead from metadata, but in common zswap
> + * use cases where there are sufficient amount of compressible pages, the
> + * overhead should be not critical, and can be mitigated by the writeback.
> + * Also, the decompression overhead is optimized.
> + *
> + * When the writeback is disabled, however, the additional overhead could be
> + * problematic.  For the case, just return the failure.  swap_writeout() will
> + * put the page back to the active LRU list in the case.
> + */
> +static bool zswap_save_as_is(int comp_ret, unsigned int dlen,
> +		struct page *page)
> +{
> +	return zswap_save_incompressible_pages &&
> +			(comp_ret || dlen == PAGE_SIZE) &&
> +			mem_cgroup_zswap_writeback_enabled(
> +					folio_memcg(page_folio(page)));
> +}
> +
>  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  			   struct zswap_pool *pool)
>  {

> @@ -1001,6 +1034,17 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  	return comp_ret == 0 && alloc_ret == 0;
>  }
>  
> +/*
> + * If save_incompressible_pages is set and writeback is enabled, incompressible
> + * pages are saved as is without compression.  For more details, refer to the
> + * comments of zswap_save_as_is().
> + */
> +static bool zswap_saved_as_is(struct zswap_entry *entry, struct folio *folio)
> +{
> +	return entry->length == PAGE_SIZE && zswap_save_incompressible_pages &&
> +		mem_cgroup_zswap_writeback_enabled(folio_memcg(folio));
> +}

I don't think there will be much left of these helpers once you
incorporate Nhat's feedback, but please open-code these in either
case. They're single user, hide what's going on, and the similar names
don't do them any favors.
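
For illustration only, an open-coded shape without the knob could look like
the following userspace sketch in C.  It is not the next version of the patch.
`MODEL_PAGE_SIZE`, `struct model_entry`, `model_store()`, `model_load()` and
the toy compressor are hypothetical; the toy compressor merely stands in for
the crypto API and succeeds only for pages whose bytes are all equal.  The
sketch assumes that the store is still rejected when writeback is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u	/* stand-in for PAGE_SIZE (assumption) */

/* Hypothetical entry that owns its storage, in place of a zpool handle. */
struct model_entry {
	unsigned int length;
	unsigned char data[MODEL_PAGE_SIZE];
};

/* Toy compressor: one byte for a uniform page, an error otherwise. */
static int toy_compress(const unsigned char *src, unsigned char *dst,
			unsigned int *dlen)
{
	for (unsigned int i = 1; i < MODEL_PAGE_SIZE; i++)
		if (src[i] != src[0])
			return -1;
	dst[0] = src[0];
	*dlen = 1;
	return 0;
}

/* Toy decompressor matching toy_compress(). */
static void toy_decompress(const unsigned char *src, unsigned char *dst)
{
	memset(dst, src[0], MODEL_PAGE_SIZE);
}

/* Store path with the as-is decision written inline, no helper and no knob. */
static bool model_store(struct model_entry *e, const unsigned char *page,
			bool writeback_enabled)
{
	unsigned int dlen = 0;
	int comp_ret = toy_compress(page, e->data, &dlen);

	if (comp_ret || dlen == MODEL_PAGE_SIZE) {
		if (!writeback_enabled)
			return false;	/* reject, as before the patch */
		memcpy(e->data, page, MODEL_PAGE_SIZE);
		dlen = MODEL_PAGE_SIZE;
	}
	e->length = dlen;
	return true;
}

/* Load path: the stored length alone selects copy versus decompress. */
static void model_load(const struct model_entry *e, unsigned char *page)
{
	if (e->length == MODEL_PAGE_SIZE)
		memcpy(page, e->data, MODEL_PAGE_SIZE);
	else
		toy_decompress(e->data, page);
}
```

With the conditions written inline, both branches are visible at the single
place they are used, and the load side no longer needs a second, similarly
named predicate.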

* Re: [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
  2025-08-06 16:32 ` Johannes Weiner
@ 2025-08-06 16:56   ` SeongJae Park
  0 siblings, 0 replies; 14+ messages in thread
From: SeongJae Park @ 2025-08-06 16:56 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, Chengming Zhou,
	David Hildenbrand, Jonathan Corbet, Lorenzo Stoakes, Michal Hocko,
	Mike Rapoport, Nhat Pham, Suren Baghdasaryan, Vlastimil Babka,
	Yosry Ahmed, kernel-team, linux-doc, linux-kernel, linux-mm,
	Takero Funaki

On Wed, 6 Aug 2025 12:32:21 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> Hi SJ,
> 
> Overall this looks good to me. On top of the feedback provided by
> others, I have a few comments below.
> 
> On Mon, Aug 04, 2025 at 05:29:54PM -0700, SeongJae Park wrote:
> > @@ -142,6 +142,15 @@ User can enable it as follows::
> >  This can be enabled at the boot time if ``CONFIG_ZSWAP_SHRINKER_DEFAULT_ON`` is
> >  selected.
> >  
> > +If a page cannot be compressed into a size smaller than PAGE_SIZE, it can be
> > +beneficial to save the content as is without compression, to keep the LRU
> > +order.  Users can enable this behavior, as follows::
> > +
> > +  echo Y > /sys/module/zswap/parameters/save_incompressible_pages
> > +
> > +This is disabled by default, and doesn't change behavior of zswap writeback
> > +disabled case.
> > +
> >  A debugfs interface is provided for various statistic about pool size, number
> >  of pages stored, same-value filled pages and various counters for the reasons
> >  pages are rejected.
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 7e02c760955f..6e196c9a4dba 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -129,6 +129,11 @@ static bool zswap_shrinker_enabled = IS_ENABLED(
> >  		CONFIG_ZSWAP_SHRINKER_DEFAULT_ON);
> >  module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);
> >  
> > +/* Enable/disable incompressible pages storing */
> > +static bool zswap_save_incompressible_pages;
> > +module_param_named(save_incompressible_pages, zswap_save_incompressible_pages,
> > +		bool, 0644);
> 
> Please remove the knob and just make it the default behavior.
> 
> With writeback enabled, the current behavior is quite weird:
> compressible pages go into zswap, then get written to swap in LRU
> order. Incompressible pages get sent to swap directly. This is an
> obvious age inversion, and the performance problems this has caused
> were a motivating factor for the ability to disable writeback.
> 
> I don't think there is much point in keeping that as an officially
> supported mode.

Makes sense, I agree!  I will do so in the next version.

> 
> > @@ -937,6 +942,29 @@ static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
> >  	mutex_unlock(&acomp_ctx->mutex);
> >  }
> >  
> > +/*
> > + * Determine whether to save given page as-is.
> > + *
> > + * If a page cannot be compressed into a size smaller than PAGE_SIZE, it can be
> > + * beneficial to saving the content as is without compression, to keep the LRU
> > + * order.  This can increase memory overhead from metadata, but in common zswap
> > + * use cases where there are sufficient amount of compressible pages, the
> > + * overhead should be not critical, and can be mitigated by the writeback.
> > + * Also, the decompression overhead is optimized.
> > + *
> > + * When the writeback is disabled, however, the additional overhead could be
> > + * problematic.  For the case, just return the failure.  swap_writeout() will
> > + * put the page back to the active LRU list in the case.
> > + */
> > +static bool zswap_save_as_is(int comp_ret, unsigned int dlen,
> > +		struct page *page)
> > +{
> > +	return zswap_save_incompressible_pages &&
> > +			(comp_ret || dlen == PAGE_SIZE) &&
> > +			mem_cgroup_zswap_writeback_enabled(
> > +					folio_memcg(page_folio(page)));
> > +}
> > +
> >  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >  			   struct zswap_pool *pool)
> >  {
> 
> > @@ -1001,6 +1034,17 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >  	return comp_ret == 0 && alloc_ret == 0;
> >  }
> >  
> > +/*
> > + * If save_incompressible_pages is set and writeback is enabled, incompressible
> > + * pages are saved as is without compression.  For more details, refer to the
> > + * comments of zswap_save_as_is().
> > + */
> > +static bool zswap_saved_as_is(struct zswap_entry *entry, struct folio *folio)
> > +{
> > +	return entry->length == PAGE_SIZE && zswap_save_incompressible_pages &&
> > +		mem_cgroup_zswap_writeback_enabled(folio_memcg(folio));
> > +}
> 
> I don't think there will be much left of these helpers once you
> incorporate Nhat's feedback, but please open-code these in either
> case. They're single user, hide what's going on, and the similar names
> don't do them any favors.

Agreed, I will open-code these in the next version.
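
To make the direction concrete, below is a minimal userspace sketch of how the
check might look once it is folded into the caller.  It is not the next version
of the patch: struct store_ctx, store_decision() and the 4 KiB PAGE_SIZE are
hypothetical stand-ins for the module parameter, the per-memcg writeback
setting and the tail of zswap_compress().

```c
#include <stdbool.h>

#define PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical stand-ins for the two knobs the helpers consulted. */
struct store_ctx {
	bool save_incompressible_pages;	/* models the module parameter */
	bool writeback_enabled;		/* models the memcg writeback setting */
};

/*
 * Open-coded form of the decision.  Returns true if the store can go on.
 * When the page is saved as is, *dlen is set to PAGE_SIZE so that the
 * length alone identifies the uncompressed case later.
 */
bool store_decision(const struct store_ctx *ctx, int comp_ret,
		    unsigned int *dlen)
{
	if (ctx->save_incompressible_pages && ctx->writeback_enabled &&
	    (comp_ret || *dlen == PAGE_SIZE)) {
		/* Compression failed or gained nothing: keep raw content. */
		*dlen = PAGE_SIZE;
		return true;
	}
	/* Otherwise behave as before: succeed only if compression did. */
	return comp_ret == 0;
}
```

With both knobs on, a compression failure turns into a successful store with
*dlen == PAGE_SIZE.  With writeback disabled, the failure is returned
unchanged, so the page would go back to the active LRU list as today.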


Thanks,
SJ

* Re: [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
  2025-08-05 16:56   ` Nhat Pham
@ 2025-08-06 20:14     ` David Hildenbrand
  2025-08-06 21:28       ` SeongJae Park
  2025-08-06 23:48       ` Shakeel Butt
  0 siblings, 2 replies; 14+ messages in thread
From: David Hildenbrand @ 2025-08-06 20:14 UTC (permalink / raw)
  To: Nhat Pham
  Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, Chengming Zhou,
	Johannes Weiner, Jonathan Corbet, Lorenzo Stoakes, Michal Hocko,
	Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka, Yosry Ahmed,
	kernel-team, linux-doc, linux-kernel, linux-mm, Takero Funaki

On 05.08.25 18:56, Nhat Pham wrote:
> On Tue, Aug 5, 2025 at 3:47 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 05.08.25 02:29, SeongJae Park wrote:
>>> When zswap writeback is enabled and it fails compressing a given page,
>>> the page is swapped out to the backing swap device.  This behavior
>>> breaks the zswap's writeback LRU order, and hence users can experience
>>> unexpected latency spikes.  If the page is compressed without failure,
>>> but results in a size of PAGE_SIZE, the LRU order is kept, but the
>>> decompression overhead for loading the page back on the later access is
>>> unnecessary.
>>>
>>> Keep the LRU order and optimize unnecessary decompression overheads in
>>> the cases, by storing the original content in zpool as-is.
>>
>> Does this have any effect on the movability of the given page? IOW, does
>> page migration etc. still work when we store an ordinary page of an
>> shmem/anon folio here?
> 
> Good question. This depends on the backend allocator of zswap, but the
> only backend allocator remaining (zsmalloc) does implement page
> migration.

Right, but migration of these pages works completely differently from
folio migration.

But I think this is the part I was missing: we are still performing a copy
to another page; we just don't perform any compression.

So I guess *breaking* movability of folios is not a concern.

But yeah, whether these "as is" pages are movable or not is a good 
question as well -- in particular when zsmalloc supports page migration 
and the "as is" pages would not.

Maybe someone familiar with the code could shed some light on that.

-- 
Cheers,

David / dhildenb


* Re: [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
  2025-08-06 20:14     ` David Hildenbrand
@ 2025-08-06 21:28       ` SeongJae Park
  2025-08-06 23:48       ` Shakeel Butt
  1 sibling, 0 replies; 14+ messages in thread
From: SeongJae Park @ 2025-08-06 21:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: SeongJae Park, Nhat Pham, Liam R. Howlett, Andrew Morton,
	Chengming Zhou, Johannes Weiner, Jonathan Corbet, Lorenzo Stoakes,
	Michal Hocko, Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
	Yosry Ahmed, kernel-team, linux-doc, linux-kernel, linux-mm,
	Takero Funaki

On Wed, 6 Aug 2025 22:14:39 +0200 David Hildenbrand <david@redhat.com> wrote:

> On 05.08.25 18:56, Nhat Pham wrote:
> > On Tue, Aug 5, 2025 at 3:47 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 05.08.25 02:29, SeongJae Park wrote:
> >>> When zswap writeback is enabled and it fails compressing a given page,
> >>> the page is swapped out to the backing swap device.  This behavior
> >>> breaks the zswap's writeback LRU order, and hence users can experience
> >>> unexpected latency spikes.  If the page is compressed without failure,
> >>> but results in a size of PAGE_SIZE, the LRU order is kept, but the
> >>> decompression overhead for loading the page back on the later access is
> >>> unnecessary.
> >>>
> >>> Keep the LRU order and optimize unnecessary decompression overheads in
> >>> the cases, by storing the original content in zpool as-is.
> >>
> >> Does this have any effect on the movability of the given page? IOW, does
> >> page migration etc. still work when we store an ordinary page of an
> >> shmem/anon folio here?
> > 
> > Good question. This depends on the backend allocator of zswap, but the
> > only backend allocator remaining (zsmalloc) does implement page
> > migration.
> 
> Right, but migration of these pages works completely different than 
> folio migration.
> 
> But I think the part I was missing: we are still performing a copy to 
> another page, it's just that we don't perform any compression.
> 
> So I guess *breaking* movability of folios is not a concern.
> 
> But yeah, whether these "as is" pages are movable or not is a good 
> question as well -- in particular when zsmalloc supports page migration 
> and the "as is" pages would not.

Maybe I'm missing some of your points.  But there is no difference for "as is"
pages.

Before this patch, zswap asks zpool (backed by zsmalloc) to allocate memory and
store the content of the page in "compressed" form, if the content can be
compressed.  After that, the original page becomes the same as any other page
that has been swapped out.

After this patch, if the content cannot be compressed, the content is saved
"as is" _in_ the zpool, in the same way as in the "compressible" case, except
that the content is not changed.  After the saving is done, the original page
becomes the same as any other page that has been swapped out.

Zsmalloc will support migration of the pages backing its internal contents,
regardless of whether those are compressed or saved "as is".  From perspectives
other than that of zsmalloc, hence, I think no difference is introduced by this
patch.
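
As a rough illustration of the point above, here is a userspace sketch of the
store and load paths.  It is not the kernel code: struct entry, store_page(),
load_page() and the toy run-length codec are all hypothetical, the pool buffer
is modeled as a plain array, and the module parameter and writeback checks are
left out.  The only thing it is meant to show is that both cases land in the
same kind of buffer, and that length == PAGE_SIZE is what tells them apart.

```c
#include <string.h>

#define PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical model of an entry plus the pool memory backing it. */
struct entry {
	unsigned int length;		/* PAGE_SIZE means "saved as is" */
	unsigned char data[PAGE_SIZE];
};

/* Toy run-length encoder emitting (count, byte) pairs; -1 if it won't fit. */
int toy_compress(const unsigned char *src, unsigned char *dst,
		 unsigned int cap, unsigned int *dlen)
{
	unsigned int in = 0, out = 0;

	while (in < PAGE_SIZE) {
		unsigned char byte = src[in];
		unsigned int run = 1;

		while (in + run < PAGE_SIZE && run < 255 &&
		       src[in + run] == byte)
			run++;
		if (out + 2 > cap)
			return -1;
		dst[out++] = (unsigned char)run;
		dst[out++] = byte;
		in += run;
	}
	*dlen = out;
	return 0;
}

/* Expands the (count, byte) pairs produced by toy_compress(). */
void toy_decompress(const unsigned char *src, unsigned int slen,
		    unsigned char *dst)
{
	unsigned int in, out = 0;

	for (in = 0; in + 1 < slen; in += 2) {
		memset(dst + out, src[in + 1], src[in]);
		out += src[in];
	}
}

/* Store: fall back to a raw copy when compression fails or gains nothing. */
void store_page(struct entry *e, const unsigned char *page)
{
	unsigned int dlen = 0;
	int ret = toy_compress(page, e->data, PAGE_SIZE, &dlen);

	if (ret || dlen == PAGE_SIZE) {
		memcpy(e->data, page, PAGE_SIZE);
		dlen = PAGE_SIZE;
	}
	e->length = dlen;
}

/* Load: skip decompression entirely for entries that were saved as is. */
void load_page(const struct entry *e, unsigned char *page)
{
	if (e->length == PAGE_SIZE)
		memcpy(page, e->data, PAGE_SIZE);
	else
		toy_decompress(e->data, e->length, page);
}
```

In this model a page of zeroes is stored compressed, while a page of
alternating bytes defeats the toy encoder and is stored as is.  Either way the
content lives in e->data, so whatever moves that buffer around does not need to
know which case it is holding.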

Again, I'm not sure if I'm really understanding your points.  If I'm not,
please let me know.


Thanks,
SJ

[...]

* Re: [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
  2025-08-06 20:14     ` David Hildenbrand
  2025-08-06 21:28       ` SeongJae Park
@ 2025-08-06 23:48       ` Shakeel Butt
  2025-08-07  5:55         ` David Hildenbrand
  1 sibling, 1 reply; 14+ messages in thread
From: Shakeel Butt @ 2025-08-06 23:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Nhat Pham, SeongJae Park, Liam R. Howlett, Andrew Morton,
	Chengming Zhou, Johannes Weiner, Jonathan Corbet, Lorenzo Stoakes,
	Michal Hocko, Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
	Yosry Ahmed, kernel-team, linux-doc, linux-kernel, linux-mm,
	Takero Funaki

On Wed, Aug 06, 2025 at 10:14:39PM +0200, David Hildenbrand wrote:
> 
> But yeah, whether these "as is" pages are movable or not is a good question
> as well -- in particular when zsmalloc supports page migration and the "as
> is" pages would not.

By "as is" page, do you mean the page which the reclaim code is trying
to reclaim, or the page within zsmalloc to which the content of the
original page is copied as is? Most probably you meant the page which the
reclaim code is trying to reclaim. This page is on its way to being freed
after [z]swapout is completed, and this patch is not changing any behavior
for that path.


* Re: [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
  2025-08-06 23:48       ` Shakeel Butt
@ 2025-08-07  5:55         ` David Hildenbrand
  2025-08-07 16:50           ` SeongJae Park
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2025-08-07  5:55 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Nhat Pham, SeongJae Park, Liam R. Howlett, Andrew Morton,
	Chengming Zhou, Johannes Weiner, Jonathan Corbet, Lorenzo Stoakes,
	Michal Hocko, Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
	Yosry Ahmed, kernel-team, linux-doc, linux-kernel, linux-mm,
	Takero Funaki

On 07.08.25 01:48, Shakeel Butt wrote:
> On Wed, Aug 06, 2025 at 10:14:39PM +0200, David Hildenbrand wrote:
>>
>> But yeah, whether these "as is" pages are movable or not is a good question
>> as well -- in particular when zsmalloc supports page migration and the "as
>> is" pages would not.
> 
> By "as is" page, do you mean the page which the reclaim code is trying
> to reclaim or the page within zsmalloc on which the content of original
> pages are copied as is?

I mean whatever the "dst" is here.

+	if (zswap_save_as_is(comp_ret, dlen, page)) {
+		comp_ret = 0;
+		dlen = PAGE_SIZE;
+		memcpy_from_page(dst, page, 0, dlen);

If I understand SJ correctly, in the case of zsmalloc "dst" is just the same
page that would have stored the compressed data.

If that is the case, nothing should change, really.

Thanks for clarifying, all!

> Most probably you meant the page which the reclaim
> code is trying to reclaim. This page is on its way to get freed after
> [z]swapout is completed and this patch is not changing any behavior for
> that path.

Yeah, that's the "page" in the hunk above I guess.

-- 
Cheers,

David / dhildenb


* Re: [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is
  2025-08-07  5:55         ` David Hildenbrand
@ 2025-08-07 16:50           ` SeongJae Park
  0 siblings, 0 replies; 14+ messages in thread
From: SeongJae Park @ 2025-08-07 16:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: SeongJae Park, Shakeel Butt, Nhat Pham, Liam R. Howlett,
	Andrew Morton, Chengming Zhou, Johannes Weiner, Jonathan Corbet,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
	Vlastimil Babka, Yosry Ahmed, kernel-team, linux-doc,
	linux-kernel, linux-mm, Takero Funaki

On Thu, 7 Aug 2025 07:55:27 +0200 David Hildenbrand <david@redhat.com> wrote:

> On 07.08.25 01:48, Shakeel Butt wrote:
> > On Wed, Aug 06, 2025 at 10:14:39PM +0200, David Hildenbrand wrote:
> >>
> >> But yeah, whether these "as is" pages are movable or not is a good question
> >> as well -- in particular when zsmalloc supports page migration and the "as
> >> is" pages would not.
> > 
> > By "as is" page, do you mean the page which the reclaim code is trying
> > to reclaim or the page within zsmalloc on which the content of original
> > pages are copied as is?
> 
> I mean whatever the "dst" is here.
> 
> +	if (zswap_save_as_is(comp_ret, dlen, page)) {
> +		comp_ret = 0;
> +		dlen = PAGE_SIZE;
> +		memcpy_from_page(dst, page, 0, dlen);
> 
> If I understand SJ correctly, in the case of zsmalloc "dst" is just the same
> page that would have stored the compressed data.

You correctly understood me.

> 
> If that is the case, nothing should change, really.
> 
> Thanks for clarifying, all!

Thank you for asking this important question, too, David! :)


Thanks,
SJ

[...]


Thread overview: 14+ messages
2025-08-05  0:29 [RFC PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is SeongJae Park
2025-08-05 10:47 ` David Hildenbrand
2025-08-05 16:56   ` Nhat Pham
2025-08-06 20:14     ` David Hildenbrand
2025-08-06 21:28       ` SeongJae Park
2025-08-06 23:48       ` Shakeel Butt
2025-08-07  5:55         ` David Hildenbrand
2025-08-07 16:50           ` SeongJae Park
2025-08-05 18:43   ` SeongJae Park
2025-08-05 18:25 ` Nhat Pham
2025-08-05 18:31   ` Nhat Pham
2025-08-05 18:51   ` SeongJae Park
2025-08-06 16:32 ` Johannes Weiner
2025-08-06 16:56   ` SeongJae Park
