linux-mm.kvack.org archive mirror
* [RFC PATCH] mm/zswap: store compression failed page as-is
@ 2025-07-30 23:40 SeongJae Park
  2025-07-31  0:21 ` Nhat Pham
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: SeongJae Park @ 2025-07-30 23:40 UTC (permalink / raw)
  Cc: SeongJae Park, Andrew Morton, Chengming Zhou, Johannes Weiner,
	Nhat Pham, Takero Funaki, Yosry Ahmed, kernel-team, linux-kernel,
	linux-mm

When zswap writeback is enabled and zswap fails to compress a given
page, it lets the page be swapped out to the backing swap device.  This
behavior breaks zswap's writeback LRU order, and hence users can
experience unexpected latency spikes.

Keep the LRU order by storing the original content in zswap as-is.  The
original content is saved in a dynamically allocated page-sized buffer,
and the pointer to the buffer is kept in zswap_entry, reusing the space
of zswap_entry->pool.  Whether the space holds the buffer pointer or the
pool pointer is identified by 'zswap_entry->length == PAGE_SIZE'.
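
For illustration, consumers of the entry can tell the two cases apart
with a check like the below ('zswap_entry_is_incompressible()' is a
hypothetical helper shown only to illustrate the tagging scheme; the
patch open-codes this check):

    static bool zswap_entry_is_incompressible(struct zswap_entry *entry)
    {
            /* a PAGE_SIZE length marks an entry storing the page as-is */
            return entry->length == PAGE_SIZE;
    }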

This avoids increasing the per-entry metadata overhead of zswap.  But as
the number of incompressible pages increases, the total zswap metadata
overhead increases proportionally.  The overhead should not be
problematic in usual cases, since the zswap metadata for a single entry
is much smaller than PAGE_SIZE, and common zswap use cases should have a
sufficient amount of compressible pages.  It can also be mitigated by
zswap writeback.

When the memory pressure comes from memcg's memory.high, and zswap
writeback is set to be triggered for it, however, the penalty_jiffies
sleep could degrade the performance.  Add a module parameter, namely
'save_incompressible_pages', to turn the feature on and off as users
want.  It is turned off by default.
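
The parameter is exposed through the usual module parameters interface,
so it can be flipped at runtime, e.g.:

    # echo Y > /sys/module/zswap/parameters/save_incompressible_pages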

When writeback is just disabled, however, the additional overhead could
be problematic.  For that case, keep the current behavior: return the
failure and let swap_writeout() put the page back on the active LRU
list.  This is known to be suboptimal when the incompressible pages are
cold, since zswap will repeatedly try to compress them, burning CPU
cycles on compression attempts that will fail anyway.  But that is out
of the scope of this patch.

Tests
-----

I tested this patch using a simple, self-written microbenchmark that is
available on GitHub[1].  You can reproduce my test by executing
run_tests.sh from the repo on your system.  Note that the repo's
documentation is sparse as of this writing, so you may need to read and
use the code directly.

The basic test scenario is simple: run a test program that makes
artificial accesses to memory with artificial content under a
memory.high limit, and measure how many accesses were made in a given
time.

The test program repeatedly and randomly accesses three anonymous memory
regions.  The regions are each 500 MiB in size and accessed with equal
probability.  Two of them are filled with simple content that compresses
easily, while the remaining one is filled with content read from
/dev/urandom, which is likely to fail compression.  The program runs for
two minutes and prints out the number of accesses made every five
seconds.
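
The core of the test program is conceptually like the below (a
simplified sketch, not the actual code of the repo; sizes follow the
description above, while the loop structure and error handling are
illustrative only):

    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define NR_REGIONS 3
    #define REGION_SZ (500UL << 20)     /* 500 MiB per region */

    int main(void)
    {
            char *regions[NR_REGIONS];
            int i, fd = open("/dev/urandom", O_RDONLY);

            for (i = 0; i < NR_REGIONS; i++)
                    regions[i] = mmap(NULL, REGION_SZ,
                                    PROT_READ | PROT_WRITE,
                                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            /* two easily compressible regions, one incompressible */
            memset(regions[0], 'a', REGION_SZ);
            memset(regions[1], 'a', REGION_SZ);
            read(fd, regions[2], REGION_SZ);  /* short reads elided */

            for (;;) {  /* the real program counts and reports accesses */
                    char *region = regions[rand() % NR_REGIONS];

                    region[rand() % REGION_SZ]++;  /* touch a random byte */
            }
    }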

The test script runs the program under the seven configurations below.

- 0: memory.high is set to 2 GiB, zswap is disabled.
- 1-1: memory.high is set to 1350 MiB, zswap is disabled.
- 1-2: Same as 1-1, but zswap is enabled.
- 1-3: Same as 1-2, but save_incompressible_pages is turned on.
- 2-1: memory.high is set to 1200 MiB, zswap is disabled.
- 2-2: Same as 2-1, but zswap is enabled.
- 2-3: Same as 2-2, but save_incompressible_pages is turned on.

Configuration '0' is for showing the baseline memory performance.
Configurations 1-1, 1-2 and 1-3 are for showing the performance of swap,
zswap, and this patch under a moderate level of memory pressure (~10% of
the working set).

Configurations 2-1, 2-2 and 2-3 are similar to 1-1, 1-2 and 1-3, but
show those under a severe level of memory pressure (~20% of the working
set).

Because the per-5-seconds numbers are noisy, I averaged them over the
last one-minute period of each test program run.  I also measured a few
vmstat counters, including zswpin, zswpout, zswpwb, pswpin and pswpout,
during the test runs.

The measurement results are below.  To save space, I show only
performance numbers normalized to those of configuration '0' (no memory
pressure).  The average number of accesses per 5 seconds for
configuration '0' was 34612740.66.

    config            0       1-1     1-2      1-3      2-1     2-2      2-3
    perf_normalized   1.0000  0.0060  0.0230   0.0310   0.0030  0.0116   0.0003
    perf_stdev_ratio  0.06    0.04    0.11     0.14     0.03    0.05     0.26
    zswpin            0       0       1701702  6982188  0       2479848  815742
    zswpout           0       0       1725260  7048517  0       2535744  931420
    zswpwb            0       0       0        0        0       0        0
    pswpin            0       468612  481270   0        476434  535772   0
    pswpout           0       574634  689625   0        612683  591881   0

'perf_normalized' is the performance metric, normalized to that of
configuration '0' (no pressure).  'perf_stdev_ratio' is the standard
deviation of the averaged data points, as a ratio to the averaged metric
value.  For example, configuration '0' showed a 6% stdev, while
configurations 1-2 and 1-3 had about 11% and 14% stdev, respectively.
So the measurement is not very stable; please keep this in mind when
reading these results.  They show some rough patterns nonetheless.

Under about 10% working set memory pressure, the performance dropped to
about 0.6% of the no-pressure case when normal swap is used (1-1).
Apparently, ~10% working set pressure is not mild, at least on this test
setup.

Turning zswap on (1-2) improved the performance about 4x, to about 2.3%
of the no-pressure case.  Because of the incompressible pages in the
third memory region, though, a significant amount of (non-zswap) swap
I/O was still made.

Enabling the incompressible pages handling feature introduced by this
patch (1-3) improved the performance by about 34% over 1-2, to about
3.1% of the no-pressure case.  As expected, compression-failed pages
were handled by zswap, and hence no (non-zswap) swap I/O was made in
this configuration.

Under about 20% working set memory pressure, which is rather extreme,
the performance drops to 0.3% of the no-pressure case when only normal
swap is used (2-1).  Enabling zswap (2-2) significantly improves it, to
1.16%, though again with a significant amount of (non-zswap) swap I/O
due to the incompressible pages.

Enabling the incompressible pages handling feature of this patch (2-3)
reduced non-zswap swap I/O as expected, but the performance became
worse, at 0.03% of the no-pressure case.  It turned out this is because
of the memory.high-imposed penalty_jiffies sleep.  After commenting out
the penalty_jiffies sleep code in mem_cgroup_handle_over_high(), the
performance became higher than that of 2-2.

20% working set memory pressure is pretty extreme, but it shows that the
incompressible pages handling feature could make things worse in certain
setups.  Hence this version of the patch adds the parameter for turning
the feature on and off as needed, and disables it by default.

Related Works
-------------

This is not an entirely new attempt; there were a couple of similar
approaches in the past.  Nhat Pham tried virtually the same approach[2]
in October 2023, and Takero Funaki tried a very similar approach[3] in
April 2024.

Neither was merged, mainly due to metadata overhead concerns.  I
described at the beginning of this changelog why I think that shouldn't
be a problem for this change, which is automatically disabled when
writeback is disabled.

This patch is not particularly different from those, and is actually
built upon them, though I wrote it from scratch.  I have no good idea
how to credit the authors of the previous, nearly identical attempts,
and I should be responsible for maintaining this change if it is
upstreamed, so I am taking authorship for now.  Please let me know if
there is a better way to give them the credit they deserve.

[1] https://github.com/sjp38/eval_zswap/blob/master/run.sh
[2] https://lore.kernel.org/20231017003519.1426574-3-nphamcs@gmail.com
[3] https://lore.kernel.org/20240706022523.1104080-6-flintglass@gmail.com

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/zswap.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 65 insertions(+), 8 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 7e02c760955f..e021865696c6 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -45,6 +45,9 @@
 /* The number of compressed pages currently stored in zswap */
 atomic_long_t zswap_stored_pages = ATOMIC_LONG_INIT(0);
 
+/* The number of uncompressed pages currently stored in zswap */
+atomic_long_t zswap_stored_uncompressed_pages = ATOMIC_LONG_INIT(0);
+
 /*
  * The statistics below are not protected from concurrent access for
  * performance reasons so they may not be a 100% accurate.  However,
@@ -129,6 +132,11 @@ static bool zswap_shrinker_enabled = IS_ENABLED(
 		CONFIG_ZSWAP_SHRINKER_DEFAULT_ON);
 module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);
 
+/* Store incompressible pages as-is */
+static bool zswap_save_incompressible_pages;
+module_param_named(save_incompressible_pages, zswap_save_incompressible_pages,
+		bool, 0644);
+
 bool zswap_is_enabled(void)
 {
 	return zswap_enabled;
@@ -190,7 +198,8 @@ static struct shrinker *zswap_shrinker;
  *              writeback logic. The entry is only reclaimed by the writeback
  *              logic if referenced is unset. See comments in the shrinker
  *              section for context.
- * pool - the zswap_pool the entry's data is in
+ * orig_data - uncompressed original data of the page, if length is PAGE_SIZE
+ * pool - the zswap_pool the entry's data is in, if length is not PAGE_SIZE
  * handle - zpool allocation handle that stores the compressed page data
  * objcg - the obj_cgroup that the compressed memory is charged to
  * lru - handle to the pool's lru used to evict pages.
@@ -199,7 +208,10 @@ struct zswap_entry {
 	swp_entry_t swpentry;
 	unsigned int length;
 	bool referenced;
-	struct zswap_pool *pool;
+	union {
+		void *orig_data;
+		struct zswap_pool *pool;
+	};
 	unsigned long handle;
 	struct obj_cgroup *objcg;
 	struct list_head lru;
@@ -500,7 +512,7 @@ unsigned long zswap_total_pages(void)
 		total += zpool_get_total_pages(pool->zpool);
 	rcu_read_unlock();
 
-	return total;
+	return total + atomic_long_read(&zswap_stored_uncompressed_pages);
 }
 
 static bool zswap_check_limits(void)
@@ -805,8 +817,13 @@ static void zswap_entry_cache_free(struct zswap_entry *entry)
 static void zswap_entry_free(struct zswap_entry *entry)
 {
 	zswap_lru_del(&zswap_list_lru, entry);
-	zpool_free(entry->pool->zpool, entry->handle);
-	zswap_pool_put(entry->pool);
+	if (entry->length == PAGE_SIZE) {
+		kfree(entry->orig_data);
+		atomic_long_dec(&zswap_stored_uncompressed_pages);
+	} else {
+		zpool_free(entry->pool->zpool, entry->handle);
+		zswap_pool_put(entry->pool);
+	}
 	if (entry->objcg) {
 		obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
 		obj_cgroup_put(entry->objcg);
@@ -937,6 +954,36 @@ static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
 	mutex_unlock(&acomp_ctx->mutex);
 }
 
+/*
+ * If compression fails, try saving the content as-is, without compression,
+ * to keep the LRU order.  This can increase memory overhead from metadata,
+ * but in common zswap use cases, where there is a sufficient amount of
+ * compressible pages, the overhead should not be critical, and can be
+ * mitigated by the writeback.  Also, decompression of such entries is cheap.
+ *
+ * When the writeback is disabled, however, the additional overhead could be
+ * problematic.  In that case, just return the failure; swap_writeout() will
+ * put the page back on the active LRU list.
+ */
+static int zswap_handle_compression_failure(int comp_ret, struct page *page,
+		struct zswap_entry *entry)
+{
+	if (!zswap_save_incompressible_pages)
+		return comp_ret;
+	if (!mem_cgroup_zswap_writeback_enabled(
+				folio_memcg(page_folio(page))))
+		return comp_ret;
+
+	entry->orig_data = kmalloc_node(PAGE_SIZE, GFP_NOWAIT | __GFP_NORETRY |
+			__GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page));
+	if (!entry->orig_data)
+		return -ENOMEM;
+	memcpy_from_page(entry->orig_data, page, 0, PAGE_SIZE);
+	entry->length = PAGE_SIZE;
+	atomic_long_inc(&zswap_stored_uncompressed_pages);
+	return 0;
+}
+
 static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 			   struct zswap_pool *pool)
 {
@@ -976,8 +1023,11 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	 */
 	comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
 	dlen = acomp_ctx->req->dlen;
-	if (comp_ret)
+	if (comp_ret) {
+		comp_ret = zswap_handle_compression_failure(comp_ret, page,
+				entry);
 		goto unlock;
+	}
 
 	zpool = pool->zpool;
 	gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
@@ -1009,6 +1059,11 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	int decomp_ret, dlen;
 	u8 *src, *obj;
 
+	if (entry->length == PAGE_SIZE) {
+		memcpy_to_folio(folio, 0, entry->orig_data, entry->length);
+		return true;
+	}
+
 	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
 	obj = zpool_obj_read_begin(zpool, entry->handle, acomp_ctx->buffer);
 
@@ -1518,7 +1573,8 @@ static bool zswap_store_page(struct page *page,
 	 * The opposite actions will be performed by zswap_entry_free()
 	 * when the entry is removed from the tree.
 	 */
-	zswap_pool_get(pool);
+	if (entry->length != PAGE_SIZE)
+		zswap_pool_get(pool);
 	if (objcg) {
 		obj_cgroup_get(objcg);
 		obj_cgroup_charge_zswap(objcg, entry->length);
@@ -1535,7 +1591,8 @@ static bool zswap_store_page(struct page *page,
 	 *    The publishing order matters to prevent writeback from seeing
 	 *    an incoherent entry.
 	 */
-	entry->pool = pool;
+	if (entry->length != PAGE_SIZE)
+		entry->pool = pool;
 	entry->swpentry = page_swpentry;
 	entry->objcg = objcg;
 	entry->referenced = true;

base-commit: b771a67861b3538324c3df25d337f4713ee53e03
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] mm/zswap: store compression failed page as-is
  2025-07-30 23:40 [RFC PATCH] mm/zswap: store compression failed page as-is SeongJae Park
@ 2025-07-31  0:21 ` Nhat Pham
  2025-07-31  0:22   ` Nhat Pham
  2025-07-31 16:43   ` SeongJae Park
  2025-07-31  0:48 ` Nhat Pham
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 12+ messages in thread
From: Nhat Pham @ 2025-07-31  0:21 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Andrew Morton, Chengming Zhou, Johannes Weiner, Takero Funaki,
	Yosry Ahmed, kernel-team, linux-kernel, linux-mm

On Wed, Jul 30, 2025 at 4:41 PM SeongJae Park <sj@kernel.org> wrote:
>
> When zswap writeback is enabled and zswap fails to compress a given
> page, it lets the page be swapped out to the backing swap device.  This
> behavior breaks zswap's writeback LRU order, and hence users can
> experience unexpected latency spikes.
[...]
> +static int zswap_handle_compression_failure(int comp_ret, struct page *page,
> +               struct zswap_entry *entry)
> +{
> +       if (!zswap_save_incompressible_pages)
> +               return comp_ret;
> +       if (!mem_cgroup_zswap_writeback_enabled(
> +                               folio_memcg(page_folio(page))))
> +               return comp_ret;
> +
> +       entry->orig_data = kmalloc_node(PAGE_SIZE, GFP_NOWAIT | __GFP_NORETRY |
> +                       __GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page));

Hmm, seems like this new buffer is not migratable (for compaction etc.?)

My understanding is that zsmalloc's allocated memory can be migrated
(which is why zswap only works with a handle - it's a layer of
indirection that gives zsmalloc the ability to move memory around).

Besides, why should we re-invent the wheel when zsmalloc already
handles page-sized objects? :)

> +       if (!entry->orig_data)
> +               return -ENOMEM;
> +       memcpy_from_page(entry->orig_data, page, 0, PAGE_SIZE);
> +       entry->length = PAGE_SIZE;
> +       atomic_long_inc(&zswap_stored_uncompressed_pages);
> +       return 0;
> +}
[...]
> @@ -1009,6 +1059,11 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>         int decomp_ret, dlen;
>         u8 *src, *obj;
>
> +       if (entry->length == PAGE_SIZE) {
> +               memcpy_to_folio(folio, 0, entry->orig_data, entry->length);
> +               return true;
> +       }

This might not be safe.

It's conceivable that in zswap_compress(), some compression algorithm
"successfully" compresses a page to the same size (comp_ret == 0). We
hand that to zsmalloc, which happily stores the page.

When we "decompress" the page again, we will attempt to
memcpy_to_folio from a bogus address (the handle from zsmalloc).

So, in zswap_compress, you have to treat both comp_ret == 0 and dlen
== PAGE_SIZE as "compression failure".

[...]


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] mm/zswap: store compression failed page as-is
  2025-07-31  0:21 ` Nhat Pham
@ 2025-07-31  0:22   ` Nhat Pham
  2025-07-31 16:43     ` SeongJae Park
  2025-07-31 16:43   ` SeongJae Park
  1 sibling, 1 reply; 12+ messages in thread
From: Nhat Pham @ 2025-07-31  0:22 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Andrew Morton, Chengming Zhou, Johannes Weiner, Takero Funaki,
	Yosry Ahmed, kernel-team, linux-kernel, linux-mm

On Wed, Jul 30, 2025 at 5:21 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Wed, Jul 30, 2025 at 4:41 PM SeongJae Park <sj@kernel.org> wrote:
>
> This might not be safe.
>
> It's conceivable that in zswap_compress(), some compression algorithm
> "successfully" compresses a page to the same size (comp_ret == 0). We
> hand that to zsmalloc, which happily stores the page.
>
> When we "decompress" the page again, we will attempt to
> memcpy_to_folio from a bogus address (the handle from zsmalloc).
>
> So, in zswap_compress, you have to treat both comp_ret == 0 and dlen
> == PAGE_SIZE as "compression failure".

Meant to say comp_ret != 0 here... sorry for the confusion...
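
I.e., something along these lines (untested, based on the hunk above):

    	comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
    	dlen = acomp_ctx->req->dlen;
    	/* also treat a "successful" page-sized output as a failure */
    	if (comp_ret || dlen >= PAGE_SIZE) {
    		comp_ret = zswap_handle_compression_failure(comp_ret, page,
    				entry);
    		goto unlock;
    	}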


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] mm/zswap: store compression failed page as-is
  2025-07-30 23:40 [RFC PATCH] mm/zswap: store compression failed page as-is SeongJae Park
  2025-07-31  0:21 ` Nhat Pham
@ 2025-07-31  0:48 ` Nhat Pham
  2025-07-31 16:56   ` SeongJae Park
  2025-07-31 15:27 ` Johannes Weiner
  2025-07-31 17:20 ` Joshua Hahn
  3 siblings, 1 reply; 12+ messages in thread
From: Nhat Pham @ 2025-07-31  0:48 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Andrew Morton, Chengming Zhou, Johannes Weiner, Takero Funaki,
	Yosry Ahmed, kernel-team, linux-kernel, linux-mm

On Wed, Jul 30, 2025 at 4:41 PM SeongJae Park <sj@kernel.org> wrote:

Now, taking a look at the conceptual side of the patch:

>
> When zswap writeback is enabled and zswap fails to compress a given
> page, it lets the page be swapped out to the backing swap device.  This
> behavior breaks zswap's writeback LRU order, and hence users can
> experience unexpected latency spikes.
[...]
> When writeback is just disabled, however, the additional overhead could
> be problematic.  For that case, keep the current behavior: return the
> failure and let swap_writeout() put the page back on the active LRU
> list.  This is known to be suboptimal when the incompressible pages are
> cold, since zswap will repeatedly try to compress them, burning CPU
> cycles on compression attempts that will fail anyway.  But that is out
> of the scope of this patch.

As a follow-up, we can reuse the "swapped out" page (and its struct
page) to store in the zswap pool (and LRU).

[...]
> Enabling the incompressible pages handling feature of this patch (2-3)
> reduced non-zswap swap I/O as expected, but the performance became
> worse, at 0.03% of the no-pressure case.  It turned out this is because
> of the memory.high-imposed penalty_jiffies sleep.  After commenting out
> the penalty_jiffies sleep code in mem_cgroup_handle_over_high(), the
> performance became higher than that of 2-2.

Yeah this is pretty extreme. I suppose you can construct scenarios
where disk swapping delays are still better than memory.high
violations throttling.

That said, I suppose under very hard memory pressure like this, users
must make a choice anyway:

1. If they're OK with disk swapping, consider enabling zswap shrinker
:) That'll evict a bunch of compressed objects from zswap to disk swap
to avoid memory limit violations.

2. If they're not OK with disk swapping, then throttling/OOMs is
unavoidable. In fact, we're speeding up the OOM process, which is
arguably desirable? This is precisely the pathological behavior that
some of our workloads are observing in production - they spin the
wheel for a really long time trying (and failing) to reclaim
incompressible anonymous memory, hogging the CPUs.
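
(For 1., that's just flipping the shrinker_enabled parameter, e.g.
echo Y > /sys/module/zswap/parameters/shrinker_enabled.)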

[...]
> This patch is not particularly different from those, and is actually
> built upon them, though I wrote it from scratch.  I have no good idea
> how to credit the authors of the previous, nearly identical attempts,
> and I should be responsible for maintaining this change if it is
> upstreamed, so I am taking authorship for now.  Please let me know if
> there is a better way to give them the credit they deserve.

I'd take Suggested-by if you feel bad ;)

But, otherwise, no need to add me as author! Unless you copy the OG
patch in future versions lol.

[...]


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] mm/zswap: store compression failed page as-is
  2025-07-30 23:40 [RFC PATCH] mm/zswap: store compression failed page as-is SeongJae Park
  2025-07-31  0:21 ` Nhat Pham
  2025-07-31  0:48 ` Nhat Pham
@ 2025-07-31 15:27 ` Johannes Weiner
  2025-07-31 17:09   ` SeongJae Park
  2025-07-31 17:20 ` Joshua Hahn
  3 siblings, 1 reply; 12+ messages in thread
From: Johannes Weiner @ 2025-07-31 15:27 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Andrew Morton, Chengming Zhou, Nhat Pham, Takero Funaki,
	Yosry Ahmed, kernel-team, linux-kernel, linux-mm

Hi SJ,

On Wed, Jul 30, 2025 at 04:40:59PM -0700, SeongJae Park wrote:
> When zswap writeback is enabled and zswap fails to compress a given
> page, it lets the page be swapped out to the backing swap device.  This
> behavior breaks zswap's writeback LRU order, and hence users can
> experience unexpected latency spikes.

+1 Thanks for working on this!

[...]
>     config            0       1-1     1-2      1-3      2-1     2-2      2-3
>     perf_normalized   1.0000  0.0060  0.0230   0.0310   0.0030  0.0116   0.0003
>     perf_stdev_ratio  0.06    0.04    0.11     0.14     0.03    0.05     0.26
>     zswpin            0       0       1701702  6982188  0       2479848  815742
>     zswpout           0       0       1725260  7048517  0       2535744  931420
>     zswpwb            0       0       0        0        0       0        0
>     pswpin            0       468612  481270   0        476434  535772   0
>     pswpout           0       574634  689625   0        612683  591881   0

zswpwb being zero across the board suggests the zswap shrinker was not
enabled. Can you double check that?

We should only take on incompressible pages to maintain LRU order on
their way to disk. If we don't try to move them out, then it's better
to reject them and avoid the metadata overhead.

> @@ -199,7 +208,10 @@ struct zswap_entry {
>  	swp_entry_t swpentry;
>  	unsigned int length;
>  	bool referenced;
> -	struct zswap_pool *pool;
> +	union {
> +		void *orig_data;
> +		struct zswap_pool *pool;
> +	};
>  	unsigned long handle;
>  	struct obj_cgroup *objcg;
>  	struct list_head lru;
[...]
> +/*
> + * If the compression is failed, try saving the content as is without
> + * compression, to keep the LRU order.  This can increase memory overhead from
> + * metadata, but in common zswap use cases where there are sufficient amount of
> + * compressible pages, the overhead should be not ciritical, and can be
> + * mitigated by the writeback.  Also, the decompression overhead is optimized.
> + *
> + * When the writeback is disabled, however, the additional overhead could be
> + * problematic.  For the case, just return the failure.  swap_writeout() will
> + * put the page back to the active LRU list in the case.
> + */
> +static int zswap_handle_compression_failure(int comp_ret, struct page *page,
> +		struct zswap_entry *entry)
> +{
> +	if (!zswap_save_incompressible_pages)
> +		return comp_ret;
> +	if (!mem_cgroup_zswap_writeback_enabled(
> +				folio_memcg(page_folio(page))))
> +		return comp_ret;
> +
> +	entry->orig_data = kmalloc_node(PAGE_SIZE, GFP_NOWAIT | __GFP_NORETRY |
> +			__GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page));
> +	if (!entry->orig_data)
> +		return -ENOMEM;
> +	memcpy_from_page(entry->orig_data, page, 0, PAGE_SIZE);
> +	entry->length = PAGE_SIZE;
> +	atomic_long_inc(&zswap_stored_uncompressed_pages);
> +	return 0;
> +}

Better to still use the backend (zsmalloc) for storage. It'll give you
migratability, highmem handling, stats etc.

So if compression fails, still do zpool_malloc(), but for PAGE_SIZE
and copy over the uncompressed page contents.

struct zswap_entry has a hole after bool referenced, so you can add a
flag to mark those uncompressed entries at no extra cost.

Then you can detect this case in zswap_decompress() and handle the
uncompressed copy into @folio accordingly.
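
As a rough, untested sketch of the compress-side part (with dst being
the acomp destination buffer in zswap_compress(), and the flag name
made up here):

	if (comp_ret || dlen >= PAGE_SIZE) {
		/* keep the original content, store it via the zpool path */
		memcpy_from_page(dst, page, 0, PAGE_SIZE);
		dlen = PAGE_SIZE;
		entry->uncompressed = true;	/* hypothetical new flag */
		comp_ret = 0;
	}
	/* fall through to the existing zpool_malloc() + store path */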



* Re: [RFC PATCH] mm/zswap: store compression failed page as-is
  2025-07-31  0:21 ` Nhat Pham
  2025-07-31  0:22   ` Nhat Pham
@ 2025-07-31 16:43   ` SeongJae Park
  1 sibling, 0 replies; 12+ messages in thread
From: SeongJae Park @ 2025-07-31 16:43 UTC (permalink / raw)
  To: Nhat Pham
  Cc: SeongJae Park, Andrew Morton, Chengming Zhou, Johannes Weiner,
	Takero Funaki, Yosry Ahmed, kernel-team, linux-kernel, linux-mm

Hi Nhat,

On Wed, 30 Jul 2025 17:21:44 -0700 Nhat Pham <nphamcs@gmail.com> wrote:

> On Wed, Jul 30, 2025 at 4:41 PM SeongJae Park <sj@kernel.org> wrote:
> >
> > When zswap writeback is enabled and it fails compressing a given page,
> > zswap lets the page be swapped out to the backing swap device.  This
> > behavior breaks the zswap's writeback LRU order, and hence users can
> > experience unexpected latency spikes.
> >
> > Keep the LRU order by storing the original content in zswap as-is.  The
> > original content is saved in a dynamically allocated page size buffer,
> > and the pointer to the buffer is kept in zswap_entry, on the space for
> > zswap_entry->pool.  Whether the space is used for the original content
> > or zpool is identified by 'zswap_entry->length == PAGE_SIZE'.
[...]
> > ---
> >  mm/zswap.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 65 insertions(+), 8 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 7e02c760955f..e021865696c6 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
[...]
> > +/*
> > + * If the compression is failed, try saving the content as is without
> > + * compression, to keep the LRU order.  This can increase memory overhead from
> > + * metadata, but in common zswap use cases where there are sufficient amount of
> > + * compressible pages, the overhead should be not ciritical, and can be
> > + * mitigated by the writeback.  Also, the decompression overhead is optimized.
> > + *
> > + * When the writeback is disabled, however, the additional overhead could be
> > + * problematic.  For the case, just return the failure.  swap_writeout() will
> > + * put the page back to the active LRU list in the case.
> > + */
> > +static int zswap_handle_compression_failure(int comp_ret, struct page *page,
> > +               struct zswap_entry *entry)
> > +{
> > +       if (!zswap_save_incompressible_pages)
> > +               return comp_ret;
> > +       if (!mem_cgroup_zswap_writeback_enabled(
> > +                               folio_memcg(page_folio(page))))
> > +               return comp_ret;
> > +
> > +       entry->orig_data = kmalloc_node(PAGE_SIZE, GFP_NOWAIT | __GFP_NORETRY |
> > +                       __GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page));
> 
> Hmm, seems like this new buffer is not migratable (for compaction etc.?)
> 
> My understanding is that zsmalloc's allocated memory can be migrated
> (which is why zswap only works with a handle - it's a layer of
> indirection that gives zsmalloc the ability to move memory around).
> 
> Besides, why should we re-invent the wheel when zsmalloc already
> handles page-sized objects? :)

Makes sense, I will use zpool in the next version.

I actually saw that both you and Takero did so in your versions, but I didn't
realize the migration benefit of the approach.  Thank you for enlightening me;
now I think this migration benefit is important, and I will make the next
version provide that migratability by reusing zpool.

> 
> > +       if (!entry->orig_data)
> > +               return -ENOMEM;
> > +       memcpy_from_page(entry->orig_data, page, 0, PAGE_SIZE);
> > +       entry->length = PAGE_SIZE;
> > +       atomic_long_inc(&zswap_stored_uncompressed_pages);
> > +       return 0;
> > +}
> > +
> >  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >                            struct zswap_pool *pool)
> >  {
> > @@ -976,8 +1023,11 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >          */
> >         comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
> >         dlen = acomp_ctx->req->dlen;
> > -       if (comp_ret)
> > +       if (comp_ret) {
> > +               comp_ret = zswap_handle_compression_failure(comp_ret, page,
> > +                               entry);
> >                 goto unlock;
> > +       }
> >
> >         zpool = pool->zpool;
> >         gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
> > @@ -1009,6 +1059,11 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> >         int decomp_ret, dlen;
> >         u8 *src, *obj;
> >
> > +       if (entry->length == PAGE_SIZE) {
> > +               memcpy_to_folio(folio, 0, entry->orig_data, entry->length);
> > +               return true;
> > +       }
> 
> This might not be safe.
> 
> It's conceivable that in zswap_compress(), some compression algorithm
> "successfully" compresses a page to the same size (comp_ret == 0). We
> hand that to zsmalloc, which happily stores the page.
> 
> When we "decompress" the page again, we will attempt to
> memcpy_to_folio from a bogus address (the handle from zsmalloc).

Makes sense, thank you for catching this.

> 
> So, in zswap_compress, you have to treat both comp_ret == 0 and dlen
> == PAGE_SIZE as "compression failure".

I saw your reply saying you meant both comp_ret != 0 and dlen == PAGE_SIZE,
and yes, this makes sense.  I will do so in the next version.
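
For example, the check in zswap_compress() could become something like
the below (untested; -EINVAL is just a stand-in error code for the
compressed-but-not-below-PAGE_SIZE case):

	if (comp_ret || dlen == PAGE_SIZE) {
		/* a "successful" compression to PAGE_SIZE is still a failure */
		comp_ret = zswap_handle_compression_failure(
				comp_ret ? comp_ret : -EINVAL, page, entry);
		goto unlock;
	}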


Thanks,
SJ

[...]



* Re: [RFC PATCH] mm/zswap: store compression failed page as-is
  2025-07-31  0:22   ` Nhat Pham
@ 2025-07-31 16:43     ` SeongJae Park
  0 siblings, 0 replies; 12+ messages in thread
From: SeongJae Park @ 2025-07-31 16:43 UTC (permalink / raw)
  To: Nhat Pham
  Cc: SeongJae Park, Andrew Morton, Chengming Zhou, Johannes Weiner,
	Takero Funaki, Yosry Ahmed, kernel-team, linux-kernel, linux-mm

On Wed, 30 Jul 2025 17:22:51 -0700 Nhat Pham <nphamcs@gmail.com> wrote:

> On Wed, Jul 30, 2025 at 5:21 PM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> > On Wed, Jul 30, 2025 at 4:41 PM SeongJae Park <sj@kernel.org> wrote:
> >
> > This might not be safe.
> >
> > It's conceivable that in zswap_compress(), some compression algorithm
> > "successfully" compresses a page to the same size (comp_ret == 0). We
> > hand that to zsmalloc, which happily stores the page.
> >
> > When we "decompress" the page again, we will attempt to
> > memcpy_to_folio from a bogus address (the handle from zsmalloc).
> >
> > So, in zswap_compress, you have to treat both comp_ret == 0 and dlen
> > == PAGE_SIZE as "compression failure".
> 
> Meant to say comp_ret != 0 here... sorry for the confusion...

No worries, thank you for clarifying!


Thanks,
SJ



* Re: [RFC PATCH] mm/zswap: store compression failed page as-is
  2025-07-31  0:48 ` Nhat Pham
@ 2025-07-31 16:56   ` SeongJae Park
  0 siblings, 0 replies; 12+ messages in thread
From: SeongJae Park @ 2025-07-31 16:56 UTC (permalink / raw)
  To: Nhat Pham
  Cc: SeongJae Park, Andrew Morton, Chengming Zhou, Johannes Weiner,
	Takero Funaki, Yosry Ahmed, kernel-team, linux-kernel, linux-mm

On Wed, 30 Jul 2025 17:48:09 -0700 Nhat Pham <nphamcs@gmail.com> wrote:

> On Wed, Jul 30, 2025 at 4:41 PM SeongJae Park <sj@kernel.org> wrote:
> 
> Now, taking a look at the conceptual side of the patch:
> 
> >
> > When zswap writeback is enabled and it fails compressing a given page,
> > zswap lets the page be swapped out to the backing swap device.  This
> > behavior breaks the zswap's writeback LRU order, and hence users can
> > experience unexpected latency spikes.
> >
> > Keep the LRU order by storing the original content in zswap as-is.  The
> > original content is saved in a dynamically allocated page size buffer,
> > and the pointer to the buffer is kept in zswap_entry, on the space for
> > zswap_entry->pool.  Whether the space is used for the original content
> > or zpool is identified by 'zswap_entry->length == PAGE_SIZE'.
> >
> > This avoids increasing per zswap entry metadata overhead.  But as the
> > number of incompressible pages increases, zswap metadata overhead is
> > proportionally increased.  The overhead should not be problematic in
> > usual cases, since the zswap metadata for single zswap entry is much
> > smaller than PAGE_SIZE, and in common zswap use cases there should be
> > sufficient amount of compressible pages.  Also it can be mitigated by
> > the zswap writeback.
> >
> > When the memory pressure comes from memcg's memory.high and zswap
> > writeback is set to be triggered for that, however, the penalty_jiffies
> > sleep could degrade the performance.  Add a parameter, namely
> > 'save_incompressible_pages', to turn the feature on/off as users want.
> > It is turned off by default.
> >
> > When the writeback is just disabled, the additional overhead could be
> > problematic.  For the case, keep the behavior that just returns the
> > failure and let swap_writeout() puts the page back to the active LRU
> > list in the case.  It is known to be suboptimal when the incompressible
> > pages are cold, since the incompressible pages will continuously be
> > tried to be zswapped out, and burn CPU cycles for compression attempts
> > that will anyway fails.  But that's out of the scope of this patch.
> 
> As a follow up, we can reuse the "swapped out" page (and its struct
> page) to store in the zswap pool (and LRU).

Thank you for adding this.  I will include this in the changelog of the next
version of this patch.

> 
> >
> > Tests
> > -----
> >
> > I tested this patch using a simple self-written microbenchmark that is
> > available at GitHub[1].  You can reproduce the test I did by executing
> > run_tests.sh of the repo on your system.  Note that the repo's
> > documentation is not good as of this writing, so you may need to read
> > and use the code.
> >
> > The basic test scenario is simple.  Run a test program making artificial
> > accesses to memory having artificial content under memory.high-set
> > memory limit and measure how many accesses were made in given time.
> >
> > The test program repeatedly and randomly access three anonymous memory
> > regions.  The regions are all 500 MiB size, and accessed in same
> > probability.  Two of those are filled up with a simple content that can
> > easily be compressed, while the remaining one is filled up with a
> > content that read from /dev/urandom, which is easy to fail at
> > compressing.  The program runs for two minutes and prints out the number
> > of accesses made every five seconds.
> >
> > The test script runs the program under below seven configurations.
> >
> > - 0: memory.high is set to 2 GiB, zswap is disabled.
> > - 1-1: memory.high is set to 1350 MiB, zswap is disabled.
> > - 1-2: Same to 1-1, but zswap is enabled.
> > - 1-3: Same to 1-2, but save_incompressible_pages is turned on.
> > - 2-1: memory.high is set to 1200 MiB, zswap is disabled.
> > - 2-2: Same to 2-1, but zswap is enabled.
> > - 2-3: Same to 2-2, but save_incompressible_pages is turned on.
> >
> > Configuration '0' is for showing the original memory performance.
> > Configurations 1-1, 1-2 and 1-3 are for showing the performance of swap,
> > zswap, and this patch under a level of memory pressure (~10% of working
> > set).
> >
> > Configurations 2-1, 2-2 and 2-3 are similar to 1-1, 1-2 and 1-3 but to
> > show those under a severe level of memory pressure (~20% of the working
> > set).
> >
> > Because the per-5 seconds performance is not very reliable, I measured
> > the average of that for the last one minute period of the test program
> > run.  I also measured a few vmstat counters including zswpin, zswpout,
> > zswpwb, pswpin and pswpout during the test runs.
> >
> > The measurement results are as below.  To save space, I show performance
> > numbers that are normalized to that of the configuration '0' (no memory
> > pressure), only.  The averaged accesses per 5 seconds of configuration
> > '0' was 34612740.66.
> >
> >     config            0       1-1     1-2      1-3      2-1     2-2      2-3
> >     perf_normalized   1.0000  0.0060  0.0230   0.0310   0.0030  0.0116   0.0003
> >     perf_stdev_ratio  0.06    0.04    0.11     0.14     0.03    0.05     0.26
> >     zswpin            0       0       1701702  6982188  0       2479848  815742
> >     zswpout           0       0       1725260  7048517  0       2535744  931420
> >     zswpwb            0       0       0        0        0       0        0
> >     pswpin            0       468612  481270   0        476434  535772   0
> >     pswpout           0       574634  689625   0        612683  591881   0
> >
> > 'perf_normalized' is the performance metric, normalized to that of
> > configuration '0' (no pressure).  'perf_stdev_ratio' is the standard
> > deviation of the averaged data points, as a ratio to the averaged metric
> > value.  For example, configuration '0' performance was showing 6% stdev.
> > Configurations 1-2 and 1-3 were having about 11% and 14% stdev.  So the
> > measurement is not very stable.  Please keep this in your mind when
> > reading these results.  It shows some rough patterns, though.
> >
> > Under about 10% of working set memory pressure, the performance was
> > dropped to about 0.6% of no-pressure one, when the normal swap is used
> > (1-1).  Actually ~10% working set pressure is not a mild one, at least
> > on this test setup.
> >
> > By turning zswap on (1-2), the performance was improved about 4x,
> > resulting in about 2.3% of no-pressure one.  Because of the
> > incompressible pages in the third memory region, a significant amount of
> > (non-zswap) swap I/O operations were made, though.
> >
> > By enabling the incompressible pages handling feature that is introduced
> > by this patch (1-3), about 34% performance improvement was made,
> > resulting in about 3.1% of no-pressure one.  As expected, compression
> > failed pages were handled by zswap, and hence no (non-zswap) swap I/O
> > was made in this configuration.
> >
> > Under about 20% of working set memory pressure, which could be extreme,
> > the performance drops down to 0.3% of no-pressure one when only the
> > normal swap is used (2-1).  Enabling zswap significantly improves it, up
> > to 1.16%, though again showing a significant number of (non-zswap) swap
> > I/O due to incompressible pages.
> >
> > Enabling the incompressible pages handling feature of this patch (2-3)
> > reduced non-zswap swap I/O as expected, but the performance became
> > worse, 0.03% of no-pressure one.  It turned out this is because of
> > memory.high-imposed penalty_jiffies sleep.  By commenting out the
> > penalty_jiffies sleep code of mem_cgroup_handle_over_high(), the
> > performance became higher than 2-2.
> 
> Yeah this is pretty extreme. I suppose you can construct scenarios
> where disk swapping delays are still better than memory.high
> violations throttling.
> 
> That said, I suppose under very hard memory pressure like this, users
> must make a choice anyway:
> 
> 1. If they're OK with disk swapping, consider enabling zswap shrinker
> :) That'll evict a bunch of compressed objects from zswap to disk swap
> to avoid memory limit violations.

I agree.  And while I'm arguing that this is ok when writeback is enabled, my
test is not exploring that case, since it enables neither the zswap shrinker
nor max_pool_percent.  Hence the test results always show zero zswpwb.  I
will extend the test setup to explore this case.
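
For the record, enabling the shrinker on the test setup should be just
a matter of something like below (sysfs path assumed from the module
parameter definitions in zswap.c):

	echo Y > /sys/module/zswap/parameters/shrinker_enabled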

> 
> 2. If they're not OK with disk swapping, then throttling/OOMs is
> unavoidable. In fact, we're speeding up the OOM process, which is
> arguably desirable? This is precisely the pathological behavior that
> some of our workloads are observing in production - they spin the
> wheel for a really long time trying (and failing) to reclaim
> incompressible anonymous memory, hogging the CPUs.

I agree.  When memory.max is appropriately set, I think the system will work
in the desired way.

> 
> >
> > 20% of working set memory pressure is pretty extreme, but anyway the
> > incompressible pages handling feature could make it worse in certain
> > setups.  Hence this version of the patch adds the parameter for turning
> > the feature on/off as needed, and makes it disabled by default.
> >
> > Related Works
> > -------------
> >
> > This is not an entirely new attempt.  There were a couple of similar
> > approaches in the past.  Nhat Pham tried a very same approach[2] in
> > October 2023.  Takero Funaki also tried a very similar approach[3] in
> > April 2024.
> >
> > The two approaches didn't get merged mainly due to the metadata overhead
> > concern.  I described why I think that shouldn't be a problem for this
> > change, which is automatically disabled when writeback is disabled, at
> > the beginning of this changelog.
> >
> > This patch is not particularly different from those, and actually built
> > upon those.  I wrote this from scratch again, though.  I have no good
> > idea about how I can give credits to the authors of the previously made
> > nearly-same attempts, and I should be responsible to maintain this
> > change if this is upstreamed, so taking the authorship for now.  Please
> > let me know if you know a better way to give them their deserved
> > credits.
> 
> I'd take Suggested-by if you feel bad ;)
> 
> But, otherwise, no need to add me as author! Unless you copy the OG
> patch in future versions lol.

Nice idea, thank you for suggesting Suggested-by: :)

I'll add Suggested-by: tags for you and Takero in the next version.


Thanks,
SJ

[...]



* Re: [RFC PATCH] mm/zswap: store compression failed page as-is
  2025-07-31 15:27 ` Johannes Weiner
@ 2025-07-31 17:09   ` SeongJae Park
  2025-07-31 18:16     ` Johannes Weiner
  0 siblings, 1 reply; 12+ messages in thread
From: SeongJae Park @ 2025-07-31 17:09 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: SeongJae Park, Andrew Morton, Chengming Zhou, Nhat Pham,
	Takero Funaki, Yosry Ahmed, kernel-team, linux-kernel, linux-mm

On Thu, 31 Jul 2025 11:27:01 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> Hi SJ,
> 
> On Wed, Jul 30, 2025 at 04:40:59PM -0700, SeongJae Park wrote:
> > When zswap writeback is enabled and it fails compressing a given page,
> > zswap lets the page be swapped out to the backing swap device.  This
> > behavior breaks the zswap's writeback LRU order, and hence users can
> > experience unexpected latency spikes.
> 
> +1 Thanks for working on this!

My pleasure :)

[...]
> >     config            0       1-1     1-2      1-3      2-1     2-2      2-3
> >     perf_normalized   1.0000  0.0060  0.0230   0.0310   0.0030  0.0116   0.0003
> >     perf_stdev_ratio  0.06    0.04    0.11     0.14     0.03    0.05     0.26
> >     zswpin            0       0       1701702  6982188  0       2479848  815742
> >     zswpout           0       0       1725260  7048517  0       2535744  931420
> >     zswpwb            0       0       0        0        0       0        0
> >     pswpin            0       468612  481270   0        476434  535772   0
> >     pswpout           0       574634  689625   0        612683  591881   0
> 
> zswpwb being zero across the board suggests the zswap shrinker was not
> enabled. Can you double check that?

You're correct, I didn't enable it.

> 
> We should only take on incompressible pages to maintain LRU order on
> their way to disk. If we don't try to move them out, then it's better
> to reject them and avoid the metadata overhead.

I agree.  I will update the test to explore the writeback effect, and share
the results when posting the next version of this patch.

[...]
> > +/*
> > + * If the compression is failed, try saving the content as is without
> > + * compression, to keep the LRU order.  This can increase memory overhead from
> > + * metadata, but in common zswap use cases where there are sufficient amount of
> > + * compressible pages, the overhead should be not ciritical, and can be
> > + * mitigated by the writeback.  Also, the decompression overhead is optimized.
> > + *
> > + * When the writeback is disabled, however, the additional overhead could be
> > + * problematic.  For the case, just return the failure.  swap_writeout() will
> > + * put the page back to the active LRU list in the case.
> > + */
> > +static int zswap_handle_compression_failure(int comp_ret, struct page *page,
> > +		struct zswap_entry *entry)
> > +{
> > +	if (!zswap_save_incompressible_pages)
> > +		return comp_ret;
> > +	if (!mem_cgroup_zswap_writeback_enabled(
> > +				folio_memcg(page_folio(page))))
> > +		return comp_ret;
> > +
> > +	entry->orig_data = kmalloc_node(PAGE_SIZE, GFP_NOWAIT | __GFP_NORETRY |
> > +			__GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page));
> > +	if (!entry->orig_data)
> > +		return -ENOMEM;
> > +	memcpy_from_page(entry->orig_data, page, 0, PAGE_SIZE);
> > +	entry->length = PAGE_SIZE;
> > +	atomic_long_inc(&zswap_stored_uncompressed_pages);
> > +	return 0;
> > +}
> 
> Better to still use the backend (zsmalloc) for storage. It'll give you
> migratability, highmem handling, stats etc.

Nhat also pointed out the migratability.  Thank you for letting me know of
even more benefits of reusing zsmalloc.  As I also replied to Nhat, I agree
those are important benefits, and I will rework the next version to use the
backend.

> 
> So if compression fails, still do zpool_malloc(), but for PAGE_SIZE
> and copy over the uncompressed page contents.
> 
> struct zswap_entry has a hole after bool referenced, so you can add a
> flag to mark those uncompressed entries at no extra cost.
> 
> Then you can detect this case in zswap_decompress() and handle the
> uncompressed copy into @folio accordingly.

I think we could still use 'zswap_entry->length == PAGE_SIZE' as the
indicator, as long as we ensure that always means the content is
uncompressed, following Nhat's suggestion[1].

Please let me know if I'm missing something.

[1] https://lore.kernel.org/CAKEwX=Py+yvxtR5zt-1DtskhGWWHkRP_h8kneEHSrcQ947=m9Q@mail.gmail.com


Thanks,
SJ



* Re: [RFC PATCH] mm/zswap: store compression failed page as-is
  2025-07-30 23:40 [RFC PATCH] mm/zswap: store compression failed page as-is SeongJae Park
                   ` (2 preceding siblings ...)
  2025-07-31 15:27 ` Johannes Weiner
@ 2025-07-31 17:20 ` Joshua Hahn
  2025-08-01 19:57   ` SeongJae Park
  3 siblings, 1 reply; 12+ messages in thread
From: Joshua Hahn @ 2025-07-31 17:20 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Andrew Morton, Chengming Zhou, Johannes Weiner, Nhat Pham,
	Takero Funaki, Yosry Ahmed, kernel-team, linux-kernel, linux-mm

On Wed, 30 Jul 2025 16:40:59 -0700 SeongJae Park <sj@kernel.org> wrote:

Hi SJ, thank you for your patch! This is a really cool idea :-)

> When zswap writeback is enabled and it fails compressing a given page,
> zswap lets the page be swapped out to the backing swap device.  This
> behavior breaks the zswap's writeback LRU order, and hence users can
> experience unexpected latency spikes.
> 
> Keep the LRU order by storing the original content in zswap as-is.  The
> original content is saved in a dynamically allocated page size buffer,
> and the pointer to the buffer is kept in zswap_entry, on the space for
> zswap_entry->pool.  Whether the space is used for the original content
> or zpool is identified by 'zswap_entry->length == PAGE_SIZE'.
> 
> This avoids increasing per zswap entry metadata overhead.  But as the
> number of incompressible pages increases, zswap metadata overhead is
> proportionally increased.  The overhead should not be problematic in
> usual cases, since the zswap metadata for single zswap entry is much
> smaller than PAGE_SIZE, and in common zswap use cases there should be
> sufficient amount of compressible pages.  Also it can be mitigated by
> the zswap writeback.

I think one other benefit that is worth mentioning here is that by moving
the incompressible pages to the zswap LRU, we prevent wasting CPU cycles
on pages that have not changed since their last compression attempt.

This is a real concern that we have seen manifest in production: CPU cycles
wasted in zswap trying to compress the same pages over and over again,
failing each time. Wasted CPU cycles are always bad, but even worse when we
are reclaiming and must free up memory!

> When the memory pressure comes from memcg's memory.high and zswap
> writeback is set to be triggered for that, however, the penalty_jiffies
> sleep could degrade the performance.  Add a parameter, namely
> 'save_incompressible_pages', to turn the feature on/off as users want.
> It is turned off by default.
> 
> When the writeback is just disabled, the additional overhead could be
> problematic.  For the case, keep the behavior that just returns the
> failure and let swap_writeout() puts the page back to the active LRU
> list in the case.  It is known to be suboptimal when the incompressible
> pages are cold, since the incompressible pages will continuously be
> tried to be zswapped out, and burn CPU cycles for compression attempts
> that will anyway fails.  But that's out of the scope of this patch.

Indeed, and your patch fixes this problem, at least for the
writeback-enabled case!

[...snip...]

> Signed-off-by: SeongJae Park <sj@kernel.org>
> ---
>  mm/zswap.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 65 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 7e02c760955f..e021865696c6 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -45,6 +45,9 @@
>  /* The number of compressed pages currently stored in zswap */
>  atomic_long_t zswap_stored_pages = ATOMIC_LONG_INIT(0);
>  
> +/* The number of uncompressed pages currently stored in zswap */
> +atomic_long_t zswap_stored_uncompressed_pages = ATOMIC_LONG_INIT(0);
> +
>  /*
>   * The statistics below are not protected from concurrent access for
>   * performance reasons so they may not be a 100% accurate.  However,
> @@ -129,6 +132,11 @@ static bool zswap_shrinker_enabled = IS_ENABLED(
>  		CONFIG_ZSWAP_SHRINKER_DEFAULT_ON);
>  module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);
>  
> +/* Store incompressible pages as is */
> +static bool zswap_save_incompressible_pages;
> +module_param_named(save_incompressible_pages, zswap_save_incompressible_pages,
> +		bool, 0644);
> +
>  bool zswap_is_enabled(void)
>  {
>  	return zswap_enabled;
> @@ -190,7 +198,8 @@ static struct shrinker *zswap_shrinker;
>   *              writeback logic. The entry is only reclaimed by the writeback
>   *              logic if referenced is unset. See comments in the shrinker
>   *              section for context.
> - * pool - the zswap_pool the entry's data is in
> + * orig_data - uncompressed original data of the page, if length is PAGE_SIZE
> + * pool - the zswap_pool the entry's data is in, if length is not PAGE_SIZE
>   * handle - zpool allocation handle that stores the compressed page data
>   * objcg - the obj_cgroup that the compressed memory is charged to
>   * lru - handle to the pool's lru used to evict pages.
> @@ -199,7 +208,10 @@ struct zswap_entry {
>  	swp_entry_t swpentry;
>  	unsigned int length;
>  	bool referenced;
> -	struct zswap_pool *pool;
> +	union {
> +		void *orig_data;
> +		struct zswap_pool *pool;
> +	};
>  	unsigned long handle;
>  	struct obj_cgroup *objcg;
>  	struct list_head lru;
> @@ -500,7 +512,7 @@ unsigned long zswap_total_pages(void)
>  		total += zpool_get_total_pages(pool->zpool);
>  	rcu_read_unlock();
>  
> -	return total;
> +	return total + atomic_long_read(&zswap_stored_uncompressed_pages);
>  }
>  
>  static bool zswap_check_limits(void)
> @@ -805,8 +817,13 @@ static void zswap_entry_cache_free(struct zswap_entry *entry)
>  static void zswap_entry_free(struct zswap_entry *entry)
>  {
>  	zswap_lru_del(&zswap_list_lru, entry);
> -	zpool_free(entry->pool->zpool, entry->handle);
> -	zswap_pool_put(entry->pool);
> +	if (entry->length == PAGE_SIZE) {
> +		kfree(entry->orig_data);
> +		atomic_long_dec(&zswap_stored_uncompressed_pages);
> +	} else {
> +		zpool_free(entry->pool->zpool, entry->handle);
> +		zswap_pool_put(entry->pool);
> +	}
>  	if (entry->objcg) {
>  		obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
>  		obj_cgroup_put(entry->objcg);
> @@ -937,6 +954,36 @@ static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
>  	mutex_unlock(&acomp_ctx->mutex);
>  }
>  
> +/*
> + * If the compression is failed, try saving the content as is without
> + * compression, to keep the LRU order.  This can increase memory overhead from
> + * metadata, but in common zswap use cases where there are sufficient amount of
> + * compressible pages, the overhead should be not ciritical, and can be

					      NIT: s/ciritical/critical/?

> + * mitigated by the writeback.  Also, the decompression overhead is optimized.
> + *
> + * When the writeback is disabled, however, the additional overhead could be
> + * problematic.  For the case, just return the failure.  swap_writeout() will

					           NIT: s/  / /?

[...snip...]

And I think Nhat and Johannes have covered the other concerns.

Thanks again SJ for working on this, I hope you have a great day!
Joshua

Sent using hkml (https://github.com/sjp38/hackermail)



* Re: [RFC PATCH] mm/zswap: store compression failed page as-is
  2025-07-31 17:09   ` SeongJae Park
@ 2025-07-31 18:16     ` Johannes Weiner
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Weiner @ 2025-07-31 18:16 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Andrew Morton, Chengming Zhou, Nhat Pham, Takero Funaki,
	Yosry Ahmed, kernel-team, linux-kernel, linux-mm

On Thu, Jul 31, 2025 at 10:09:22AM -0700, SeongJae Park wrote:
> On Thu, 31 Jul 2025 11:27:01 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> > So if compression fails, still do zpool_malloc(), but for PAGE_SIZE
> > and copy over the uncompressed page contents.
> > 
> > struct zswap_entry has a hole after bool referenced, so you can add a
> > flag to mark those uncompressed entries at no extra cost.
> > 
> > Then you can detect this case in zswap_decompress() and handle the
> > uncompressed copy into @folio accordingly.
> 
> I think we could still use 'zswap_entry->length == PAGE_SIZE' as the
> indicator, as long as we ensure that always means the content is
> uncompressed, following Nhat's suggestion[1].
> 
> Please let me know if I'm missing something.

Ah, right. So if compression succeeds but the result is still PAGE_SIZE, you
will treat it as a failure and store it uncompressed. Then PAGE_SIZE always
means uncompressed for the readers.

Even better.
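
On the read side this then becomes a simple length check up front,
roughly like the below (untested sketch, locking elided; helper names
as in the current zswap_decompress(), from memory):

	obj = zpool_obj_read_begin(zpool, entry->handle, acomp_ctx->buffer);
	if (entry->length == PAGE_SIZE) {
		/* stored as-is, nothing to decompress */
		memcpy_to_folio(folio, 0, obj, PAGE_SIZE);
		zpool_obj_read_end(zpool, entry->handle, obj);
		return true;
	}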



* Re: [RFC PATCH] mm/zswap: store compression failed page as-is
  2025-07-31 17:20 ` Joshua Hahn
@ 2025-08-01 19:57   ` SeongJae Park
  0 siblings, 0 replies; 12+ messages in thread
From: SeongJae Park @ 2025-08-01 19:57 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: SeongJae Park, Andrew Morton, Chengming Zhou, Johannes Weiner,
	Nhat Pham, Takero Funaki, Yosry Ahmed, kernel-team, linux-kernel,
	linux-mm

On Thu, 31 Jul 2025 10:20:05 -0700 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:

> On Wed, 30 Jul 2025 16:40:59 -0700 SeongJae Park <sj@kernel.org> wrote:
> 
> Hi SJ, thank you for your patch! This is a really cool idea :-)
> 
> > When zswap writeback is enabled and it fails compressing a given page,
> > zswap lets the page be swapped out to the backing swap device.  This
> > behavior breaks the zswap's writeback LRU order, and hence users can
> > experience unexpected latency spikes.
> > 
> > Keep the LRU order by storing the original content in zswap as-is.  The
> > original content is saved in a dynamically allocated page size buffer,
> > and the pointer to the buffer is kept in zswap_entry, on the space for
> > zswap_entry->pool.  Whether the space is used for the original content
> > or zpool is identified by 'zswap_entry->length == PAGE_SIZE'.
> > 
> > This avoids increasing per zswap entry metadata overhead.  But as the
> > number of incompressible pages increases, zswap metadata overhead is
> > proportionally increased.  The overhead should not be problematic in
> > usual cases, since the zswap metadata for single zswap entry is much
> > smaller than PAGE_SIZE, and in common zswap use cases there should be
> > sufficient amount of compressible pages.  Also it can be mitigated by
> > the zswap writeback.
> 
> I think one other benefit that is worth mentioning here is that by moving
> the incompressible pages to the zswap LRU, we prevent wasting CPU cycles
> on pages that have not changed since their last compression attempt.

Thank you for adding this.  But isn't this only for the writeback-disabled
case?

If writeback is enabled and an incompressible page is selected for zswap-out,
its compression is tried and fails, and the page is swapped out to disk.
Compression will be tried again later only if the page is accessed, swapped
in, and then selected for zswap-out once more.

After this patch, compression is tried and fails, and the page is kept in
memory, linked on zswap's LRU list.  Compression will be tried again later if
the page is accessed, zswapped in, and then selected for zswap-out once more.

So in terms of the number of compression attempts, this patch makes no direct
difference, to my understanding.  The correct LRU order _might_ help
indirectly in some complicated scenarios, but I wouldn't dare claim that.

Am I missing something?

> 
> This is a real concern that we have seen manifest in production: CPU cycles
> wasted in zswap trying to compress the same pages over and over again,
> failing each time. Wasted CPU cycles are always bad, but even worse when we
> are reclaiming and must free up memory!
> 
> > When the memory pressure comes from memcg's memory.high and zswap
> > writeback is set to be triggered for that, however, the penalty_jiffies
> > sleep could degrade the performance.  Add a parameter, namely
> > 'save_incompressible_pages', to turn the feature on/off as users want.
> > It is turned off by default.
> > 
> > When the writeback is just disabled, the additional overhead could be
> > problematic.  For the case, keep the behavior that just returns the
> > failure and let swap_writeout() puts the page back to the active LRU
> > list in the case.  It is known to be suboptimal when the incompressible
> > pages are cold, since the incompressible pages will continuously be
> > tried to be zswapped out, and burn CPU cycles for compression attempts
> > that will anyway fails.  But that's out of the scope of this patch.
> 
> Indeed, and your patch fixes this problem, at least for the
> writeback-enabled case!

Again, I don't think this patch is really fixing it unless I'm missing
something.

> 
> [...snip...]
> 
> > Signed-off-by: SeongJae Park <sj@kernel.org>
> > ---
> >  mm/zswap.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 65 insertions(+), 8 deletions(-)
> > 
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 7e02c760955f..e021865696c6 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -45,6 +45,9 @@
> >  /* The number of compressed pages currently stored in zswap */
> >  atomic_long_t zswap_stored_pages = ATOMIC_LONG_INIT(0);
> >  
> > +/* The number of uncompressed pages currently stored in zswap */
> > +atomic_long_t zswap_stored_uncompressed_pages = ATOMIC_LONG_INIT(0);
> > +
> >  /*
> >   * The statistics below are not protected from concurrent access for
> >   * performance reasons so they may not be a 100% accurate.  However,
> > @@ -129,6 +132,11 @@ static bool zswap_shrinker_enabled = IS_ENABLED(
> >  		CONFIG_ZSWAP_SHRINKER_DEFAULT_ON);
> >  module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);
> >  
> > +/* Store incompressible pages as is */
> > +static bool zswap_save_incompressible_pages;
> > +module_param_named(save_incompressible_pages, zswap_save_incompressible_pages,
> > +		bool, 0644);
> > +
> >  bool zswap_is_enabled(void)
> >  {
> >  	return zswap_enabled;
> > @@ -190,7 +198,8 @@ static struct shrinker *zswap_shrinker;
> >   *              writeback logic. The entry is only reclaimed by the writeback
> >   *              logic if referenced is unset. See comments in the shrinker
> >   *              section for context.
> > - * pool - the zswap_pool the entry's data is in
> > + * orig_data - uncompressed original data of the page, if length is PAGE_SIZE
> > + * pool - the zswap_pool the entry's data is in, if length is not PAGE_SIZE
> >   * handle - zpool allocation handle that stores the compressed page data
> >   * objcg - the obj_cgroup that the compressed memory is charged to
> >   * lru - handle to the pool's lru used to evict pages.
> > @@ -199,7 +208,10 @@ struct zswap_entry {
> >  	swp_entry_t swpentry;
> >  	unsigned int length;
> >  	bool referenced;
> > -	struct zswap_pool *pool;
> > +	union {
> > +		void *orig_data;
> > +		struct zswap_pool *pool;
> > +	};
> >  	unsigned long handle;
> >  	struct obj_cgroup *objcg;
> >  	struct list_head lru;
> > @@ -500,7 +512,7 @@ unsigned long zswap_total_pages(void)
> >  		total += zpool_get_total_pages(pool->zpool);
> >  	rcu_read_unlock();
> >  
> > -	return total;
> > +	return total + atomic_long_read(&zswap_stored_uncompressed_pages);
> >  }
> >  
> >  static bool zswap_check_limits(void)
> > @@ -805,8 +817,13 @@ static void zswap_entry_cache_free(struct zswap_entry *entry)
> >  static void zswap_entry_free(struct zswap_entry *entry)
> >  {
> >  	zswap_lru_del(&zswap_list_lru, entry);
> > -	zpool_free(entry->pool->zpool, entry->handle);
> > -	zswap_pool_put(entry->pool);
> > +	if (entry->length == PAGE_SIZE) {
> > +		kfree(entry->orig_data);
> > +		atomic_long_dec(&zswap_stored_uncompressed_pages);
> > +	} else {
> > +		zpool_free(entry->pool->zpool, entry->handle);
> > +		zswap_pool_put(entry->pool);
> > +	}
> >  	if (entry->objcg) {
> >  		obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
> >  		obj_cgroup_put(entry->objcg);
> > @@ -937,6 +954,36 @@ static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
> >  	mutex_unlock(&acomp_ctx->mutex);
> >  }
> >  
> > +/*
> > + * If the compression is failed, try saving the content as is without
> > + * compression, to keep the LRU order.  This can increase memory overhead from
> > + * metadata, but in common zswap use cases where there are sufficient amount of
> > + * compressible pages, the overhead should be not ciritical, and can be
> 
> 					      NIT: s/ciritical/critical/?

Thank you for finding this!

> 
> > + * mitigated by the writeback.  Also, the decompression overhead is optimized.
> > + *
> > + * When the writeback is disabled, however, the additional overhead could be
> > + * problematic.  For the case, just return the failure.  swap_writeout() will
> 
> 					           NIT: s/  / /?

I was intentionally adding two spaces between sentences.  I find a few other
comments in the zswap.c file also using that convention...?

> 
> [...snip...]
> 
> And I think Nhat and Johannes have covered the other concerns.
> 
> Thanks again SJ for working on this, I hope you have a great day!

Thank you, I appreciate your review!  Also, happy Friday!


Thanks,
SJ

[...]



end of thread

Thread overview: 12+ messages
2025-07-30 23:40 [RFC PATCH] mm/zswap: store compression failed page as-is SeongJae Park
2025-07-31  0:21 ` Nhat Pham
2025-07-31  0:22   ` Nhat Pham
2025-07-31 16:43     ` SeongJae Park
2025-07-31 16:43   ` SeongJae Park
2025-07-31  0:48 ` Nhat Pham
2025-07-31 16:56   ` SeongJae Park
2025-07-31 15:27 ` Johannes Weiner
2025-07-31 17:09   ` SeongJae Park
2025-07-31 18:16     ` Johannes Weiner
2025-07-31 17:20 ` Joshua Hahn
2025-08-01 19:57   ` SeongJae Park
