* [RFC PATCH 0/2] mm: swap: allow per-device skipping of zero-filled page check
@ 2026-05-18 7:34 Youngjun Park
2026-05-18 7:34 ` [RFC PATCH 1/2] mm: swap: add SWAP_FLAG_SKIP_ZERO_CHECK to skip " Youngjun Park
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Youngjun Park @ 2026-05-18 7:34 UTC (permalink / raw)
To: linux-mm, akpm
Cc: chrisl, kasong, shikemeng, nphamcs, bhe, baohua, youngjun.park
Currently, the swap layer checks whether a page is entirely zero-filled
before writing it out to the swap device. However, some swap backends,
such as zram and our custom swap device, already perform their own
same-filled page checking internally. As a result, the same page pattern is
checked twice, which wastes CPU cycles.
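To make the redundancy concrete, here is a minimal userspace sketch. It is
not kernel or zram code: it assumes a 4096-byte page, and every sketch_* name
is hypothetical. It only shows that a same-filled check already answers the
zero-filled question, so running both scans the page twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size in machine words; a real kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_WORDS (4096 / sizeof(unsigned long))

/*
 * Return true if every word in the page equals the first word, and
 * report that word through *value. A zero-filled page is the special
 * case where *value == 0.
 */
static bool sketch_page_same_filled(const unsigned long *page,
				    unsigned long *value)
{
	size_t i;

	assert(page != NULL && value != NULL);
	for (i = 1; i < SKETCH_PAGE_WORDS; i++) {
		if (page[i] != page[0])
			return false;
	}
	*value = page[0];
	return true;
}

/* The narrower question the swap layer asks: is the page all zero? */
static bool sketch_page_zero_filled(const unsigned long *page)
{
	unsigned long value;

	return sketch_page_same_filled(page, &value) && value == 0;
}
```

A backend that already runs something like sketch_page_same_filled() gains
nothing from an earlier zero-only scan of the same page, except on devices
where that earlier scan avoids real I/O.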
This patchset introduces a new swapon flag, SWAP_FLAG_SKIP_ZERO_CHECK,
to eliminate this redundancy. I introduce this as a per-device flag
rather than a global setting because traditional swap devices still
benefit from the swap layer's zero page check to avoid unnecessary I/O.
By using this flag, userspace can selectively disable the zero check
only for specific backends.
Furthermore, on certain architectures where the zero map is managed via
a separate bitmap, skipping this check allows bypassing
the bitmap allocation entirely (saving memory).
This modification is based on the previous discussion with Nhat Pham [1].
Additionally, this patchset is built on top of Kairui Song's recent
patchset regarding swap table and zeromap modifications [2].
Tested simply with zram on QEMU to verify zero-filled page handling.
References:
[1] https://lore.kernel.org/linux-mm/acQvNRLpHwnHt7i+@yjaykim-PowerEdge-T330/
[2] https://lore.kernel.org/linux-mm/20260517-swap-table-p4-v5-0-88ae43e064c7@tencent.com/T/#t
Youngjun Park (2):
mm: swap: add SWAP_FLAG_SKIP_ZERO_CHECK to skip zero-filled page check
mm: swap: do not allocate zero_bitmap if zero check is skipped
include/linux/swap.h | 4 +++-
mm/page_io.c | 7 ++++++-
mm/swap.h | 12 ++++++++++++
mm/swapfile.c | 14 ++++++++++----
4 files changed, 31 insertions(+), 6 deletions(-)
--
2.34.1
* [RFC PATCH 1/2] mm: swap: add SWAP_FLAG_SKIP_ZERO_CHECK to skip zero-filled page check
2026-05-18 7:34 [RFC PATCH 0/2] mm: swap: allow per-device skipping of zero-filled page check Youngjun Park
@ 2026-05-18 7:34 ` Youngjun Park
2026-05-19 6:30 ` Christoph Hellwig
2026-05-18 7:34 ` [RFC PATCH 2/2] mm: swap: do not allocate zero_bitmap if zero check is skipped Youngjun Park
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Youngjun Park @ 2026-05-18 7:34 UTC (permalink / raw)
To: linux-mm, akpm
Cc: chrisl, kasong, shikemeng, nphamcs, bhe, baohua, youngjun.park
The swap layer currently checks if a page is zero-filled before writing
it out. However, some swap backends like zram already perform their own
same-filled data checking. This causes redundant CPU cycles spent on
checking zero pages in both the swap layer and the backend.
Introduce a new swapon flag, SWAP_FLAG_SKIP_ZERO_CHECK.
When this flag is passed during swapon, the swap layer bypasses the
zero-filled page check and directly passes the page to the backend.
This feature is implemented as a per-swap device flag because other
devices still benefit from the swap layer's zero check to avoid
unnecessary I/O operations. Userspace can now selectively apply this
flag to devices like zram to reduce CPU overhead.
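The flow described above can be modelled in userspace roughly as follows.
This is a sketch under assumptions: the two flag values are copied from the
diff in this patch, while the struct and all sketch_* names are hypothetical
stand-ins and not the kernel's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values copied from the diff in this patch. */
#define SKETCH_SWAP_FLAG_SKIP_ZERO_CHECK 0x80000     /* swapon(2) flag */
#define SKETCH_SWP_SKIP_ZERO_CHECK       (1u << 14)  /* per-device flag */

/* Hypothetical stand-in for struct swap_info_struct. */
struct sketch_swap_dev {
	unsigned int flags;
};

/* Mirrors the swapon hunk: translate the user flag into a device flag. */
static void sketch_swapon(struct sketch_swap_dev *dev, int swap_flags)
{
	assert(dev != NULL);
	dev->flags = 0;
	if (swap_flags & SKETCH_SWAP_FLAG_SKIP_ZERO_CHECK)
		dev->flags |= SKETCH_SWP_SKIP_ZERO_CHECK;
}

/* Mirrors swap_skip_zero_check(): consulted on the writeout path. */
static bool sketch_skip_zero_check(const struct sketch_swap_dev *dev)
{
	return (dev->flags & SKETCH_SWP_SKIP_ZERO_CHECK) != 0;
}
```

In this model a zram-like device activated with the flag skips the check,
while a device activated without it keeps the existing behaviour.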
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
include/linux/swap.h | 4 +++-
mm/page_io.c | 7 ++++++-
mm/swap.h | 12 ++++++++++++
mm/swapfile.c | 3 +++
4 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 6d72778e6cc3..30282019a758 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -25,10 +25,11 @@ struct bio;
#define SWAP_FLAG_DISCARD 0x10000 /* enable discard for swap */
#define SWAP_FLAG_DISCARD_ONCE 0x20000 /* discard swap area at swapon-time */
#define SWAP_FLAG_DISCARD_PAGES 0x40000 /* discard page-clusters after use */
+#define SWAP_FLAG_SKIP_ZERO_CHECK 0x80000
#define SWAP_FLAGS_VALID (SWAP_FLAG_PRIO_MASK | SWAP_FLAG_PREFER | \
SWAP_FLAG_DISCARD | SWAP_FLAG_DISCARD_ONCE | \
- SWAP_FLAG_DISCARD_PAGES)
+ SWAP_FLAG_DISCARD_PAGES | SWAP_FLAG_SKIP_ZERO_CHECK)
#define SWAP_BATCH 64
static inline int current_is_kswapd(void)
@@ -215,6 +216,7 @@ enum {
SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */
SWP_HIBERNATION = (1 << 13), /* pinned for hibernation */
/* add others here before... */
+ SWP_SKIP_ZERO_CHECK = (1 << 14), /* skip zero-filled page check */
};
#define SWAP_CLUSTER_MAX 32UL
diff --git a/mm/page_io.c b/mm/page_io.c
index f2d8fe7fd057..4b65ea19801e 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -265,6 +265,9 @@ int swap_writeout(struct folio *folio, struct swap_iocb **swap_plug)
goto out_unlock;
}
+ if (swap_skip_zero_check(folio))
+ goto skip_zerocheck;
+
/*
* Use the swap table zero mark to avoid doing IO for zero-filled
* pages. The zero mark is protected by the cluster lock, which is
@@ -282,6 +285,7 @@ int swap_writeout(struct folio *folio, struct swap_iocb **swap_plug)
*/
swap_zeromap_folio_clear(folio);
+skip_zerocheck:
if (zswap_store(folio)) {
count_mthp_stat(folio_order(folio), MTHP_STAT_ZSWPOUT);
goto out_unlock;
@@ -563,7 +567,8 @@ static bool swap_read_folio_zeromap(struct folio *folio)
* that an IO error is emitted (e.g. do_swap_page() will sigbus).
* Folio lock stabilizes the cluster and map, so the check is safe.
*/
- if (WARN_ON_ONCE(swap_zeromap_batch(folio->swap, nr_pages,
+ if (!swap_skip_zero_check(folio) &&
+ WARN_ON_ONCE(swap_zeromap_batch(folio->swap, nr_pages,
&is_zeromap) != nr_pages))
return true;
diff --git a/mm/swap.h b/mm/swap.h
index 81c06aae7ccd..299e1200887f 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -334,6 +334,13 @@ static inline unsigned int folio_swap_flags(struct folio *folio)
return __swap_entry_to_info(folio->swap)->flags;
}
+static inline bool swap_skip_zero_check(struct folio *folio)
+{
+ struct swap_info_struct *si = __swap_entry_to_info(folio->swap);
+
+ return (si->flags & SWP_SKIP_ZERO_CHECK) ? true : false;
+}
+
#else /* CONFIG_SWAP */
struct swap_iocb;
static inline struct swap_cluster_info *swap_cluster_lock(
@@ -471,5 +478,10 @@ static inline unsigned int folio_swap_flags(struct folio *folio)
{
return 0;
}
+
+static inline bool swap_skip_zero_check(struct folio *folio)
+{
+ return false;
+}
#endif /* CONFIG_SWAP */
#endif /* _MM_SWAP_H */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index a9a1e477fec9..35cc33698b78 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3658,6 +3658,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
if (IS_ERR(si))
return PTR_ERR(si);
+ if (swap_flags & SWAP_FLAG_SKIP_ZERO_CHECK)
+ si->flags |= SWP_SKIP_ZERO_CHECK;
+
INIT_WORK(&si->discard_work, swap_discard_work);
INIT_WORK(&si->reclaim_work, swap_reclaim_work);
--
2.34.1
* [RFC PATCH 2/2] mm: swap: do not allocate zero_bitmap if zero check is skipped
2026-05-18 7:34 [RFC PATCH 0/2] mm: swap: allow per-device skipping of zero-filled page check Youngjun Park
2026-05-18 7:34 ` [RFC PATCH 1/2] mm: swap: add SWAP_FLAG_SKIP_ZERO_CHECK to skip " Youngjun Park
@ 2026-05-18 7:34 ` Youngjun Park
2026-05-19 10:16 ` [RFC PATCH 0/2] mm: swap: allow per-device skipping of zero-filled page check Usama Arif
2026-05-31 18:48 ` Kairui Song
3 siblings, 0 replies; 11+ messages in thread
From: Youngjun Park @ 2026-05-18 7:34 UTC (permalink / raw)
To: linux-mm, akpm
Cc: chrisl, kasong, shikemeng, nphamcs, bhe, baohua, youngjun.park
On architectures where !SWAP_TABLE_HAS_ZEROFLAG is true, the swap layer
allocates a separate bitmap to track zero-filled pages within a swap cluster.
If a swap device is activated with the SWAP_FLAG_SKIP_ZERO_CHECK flag,
the swap layer no longer checks or tracks zero-filled pages for that
specific device. Therefore, allocating the zero_bitmap is unnecessary.
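For a rough sense of the memory involved, the sketch below estimates an upper
bound on zero_bitmap usage. It assumes one bit per swap slot rounded up to
whole longs per cluster, which is how bitmap_zalloc() sizes its allocation.
The sketch_* names are hypothetical, slots_per_cluster stands in for
SWAPFILE_CLUSTER (whose value depends on the kernel configuration), and
allocator overhead is ignored.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bytes needed for a bitmap of nbits, rounded up to whole longs. */
static size_t sketch_bitmap_bytes(size_t nbits)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;

	return (nbits + bits_per_long - 1) / bits_per_long *
	       sizeof(unsigned long);
}

/*
 * Upper bound on zero_bitmap memory for one device, reached only if
 * every cluster has had its table allocated.
 */
static size_t sketch_zero_bitmap_total(size_t nr_slots,
				       size_t slots_per_cluster)
{
	size_t nr_clusters;

	assert(slots_per_cluster > 0);
	nr_clusters = (nr_slots + slots_per_cluster - 1) / slots_per_cluster;
	return nr_clusters * sketch_bitmap_bytes(slots_per_cluster);
}
```

As an illustration with assumed numbers, an 8 GiB device with 4 KiB pages has
2097152 slots; with 512 slots per cluster the bound is 262144 bytes, i.e.
256 KiB.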
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
mm/swapfile.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 35cc33698b78..232cc5f9e06e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -446,7 +446,8 @@ static void swap_cluster_free_table(struct swap_cluster_info *ci)
swap_cluster_free_table_folio_rcu_cb);
}
-static int swap_cluster_alloc_table(struct swap_cluster_info *ci, gfp_t gfp)
+static int swap_cluster_alloc_table(struct swap_info_struct *si,
+ struct swap_cluster_info *ci, gfp_t gfp)
{
struct swap_table *table = NULL;
struct folio *folio;
@@ -479,6 +480,8 @@ static int swap_cluster_alloc_table(struct swap_cluster_info *ci, gfp_t gfp)
#endif
#if !SWAP_TABLE_HAS_ZEROFLAG
+ if (si->flags & SWP_SKIP_ZERO_CHECK)
+ return 0;
VM_WARN_ON_ONCE(ci->zero_bitmap);
ci->zero_bitmap = bitmap_zalloc(SWAPFILE_CLUSTER, gfp);
if (!ci->zero_bitmap)
@@ -539,7 +542,7 @@ swap_cluster_populate(struct swap_info_struct *si,
lockdep_assert_held(&si->global_cluster_lock);
lockdep_assert_held(&ci->lock);
- if (!swap_cluster_alloc_table(ci, __GFP_HIGH | __GFP_NOMEMALLOC |
+ if (!swap_cluster_alloc_table(si, ci, __GFP_HIGH | __GFP_NOMEMALLOC |
__GFP_NOWARN))
return ci;
@@ -553,7 +556,7 @@ swap_cluster_populate(struct swap_info_struct *si,
spin_unlock(&si->global_cluster_lock);
local_unlock(&percpu_swap_cluster.lock);
- ret = swap_cluster_alloc_table(ci, __GFP_HIGH | __GFP_NOMEMALLOC |
+ ret = swap_cluster_alloc_table(si, ci, __GFP_HIGH | __GFP_NOMEMALLOC |
GFP_KERNEL);
/*
@@ -813,7 +816,7 @@ static int swap_cluster_setup_bad_slot(struct swap_info_struct *si,
ci = cluster_info + idx;
/* Need to allocate swap table first for initial bad slot marking. */
- if (!ci->count && swap_cluster_alloc_table(ci, GFP_KERNEL))
+ if (!ci->count && swap_cluster_alloc_table(si, ci, GFP_KERNEL))
return -ENOMEM;
spin_lock(&ci->lock);
/* Check for duplicated bad swap slots. */
--
2.34.1
* Re: [RFC PATCH 1/2] mm: swap: add SWAP_FLAG_SKIP_ZERO_CHECK to skip zero-filled page check
2026-05-18 7:34 ` [RFC PATCH 1/2] mm: swap: add SWAP_FLAG_SKIP_ZERO_CHECK to skip " Youngjun Park
@ 2026-05-19 6:30 ` Christoph Hellwig
2026-05-19 7:08 ` YoungJun Park
0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2026-05-19 6:30 UTC (permalink / raw)
To: Youngjun Park
Cc: linux-mm, akpm, chrisl, kasong, shikemeng, nphamcs, bhe, baohua
Nothing in this patch ever sets the new flag. And I'd rather not
add new weird bdev hooks for zram when we really need to consolidate
the compressed swap logic in the core swap code.
* Re: [RFC PATCH 1/2] mm: swap: add SWAP_FLAG_SKIP_ZERO_CHECK to skip zero-filled page check
2026-05-19 6:30 ` Christoph Hellwig
@ 2026-05-19 7:08 ` YoungJun Park
2026-05-19 9:18 ` Christoph Hellwig
0 siblings, 1 reply; 11+ messages in thread
From: YoungJun Park @ 2026-05-19 7:08 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, akpm, chrisl, kasong, shikemeng, nphamcs, bhe, baohua
On Mon, May 18, 2026 at 11:30:52PM -0700, Christoph Hellwig wrote:
Hi Christoph!
> Nothing in this patch ever sets the new flag. And I'd rather not
I defined a new flag that can be set at swapon time. Is your concern
that there is no mention of the user-space part that sets the flag?
> add new weird bdev hooks for zram when we really need to consolidate
> the compressed swap logic in the core swap code.
Do you mean that we should consolidate the same-data-pattern logic
(including compressed swap) into the core swap code?
I am curious about what direction you have in mind. I can think of three
possible approaches.
- Handle it in individual devices
Can be optimized for each device. Currently, the swap layer
handles zero pages but not same data patterns, causing a mismatch. My
patch aims to resolve this. If we prefer this direction, we could even
remove the zero page check from the swap layer and let individual swap
devices handle it.
- Handle it in the swap layer
Add same-data-pattern checks (currently only zero pages are checked) and
integrate the compressed swap logic here, per swap device.
Deduplication across swap devices becomes difficult. Also, a zero page
occupies a swap entry even though it uses no physical swap space, which
is inefficient.
The swap table would also need a few more bits for the same-data-pattern
check.
- Create a higher layer above swap
Deduplication across swap devices becomes possible, and the inefficiency
of occupying unused physical swap space goes away.
However, its relationship with zswap is unclear, it might not satisfy
individual device needs because of its general nature, and it would
probably require extensive discussion.
I wonder which direction you are considering.
Regardless of the above, I would like to solve this issue. How about a
compile-time solution? For example, adding a config like
CONFIG_SKIP_SWAP_CHECK to enable/disable this logic across the entire
swap layer.
This way, vendors and zram users who select this option can solve the
problem mentioned in the patch. (However, if using zram + another swap
device, the other device would also skip the zero page check).
Best regards,
Youngjun Park
* Re: [RFC PATCH 1/2] mm: swap: add SWAP_FLAG_SKIP_ZERO_CHECK to skip zero-filled page check
2026-05-19 7:08 ` YoungJun Park
@ 2026-05-19 9:18 ` Christoph Hellwig
2026-05-21 6:46 ` YoungJun Park
0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2026-05-19 9:18 UTC (permalink / raw)
To: YoungJun Park
Cc: Christoph Hellwig, linux-mm, akpm, chrisl, kasong, shikemeng,
nphamcs, bhe, baohua
On Tue, May 19, 2026 at 04:08:35PM +0900, YoungJun Park wrote:
> On Mon, May 18, 2026 at 11:30:52PM -0700, Christoph Hellwig wrote:
>
> Hi Christoph!
>
> > Nothing in this patch ever sets the new flag. And I'd rather not
>
> I defined a new flag that can be set at swapon time. Is your concern
> that there is no mention of the user-space part that sets the flag?
Heh. My main concern was that I assumed it's a kernel-only flag as
SWAP_FLAG_* aren't defined in a uapi header. But now that I know
that they are user-passed flags, my concern is that this is not something
that should be exposed to the user because it is a deeply internal
implementation detail.
>
> > add new weird bdev hooks for zram when we really need to consolidate
> > the compressed swap logic in the core swap code.
>
> Do you mean that we should consolidate the same-data-pattern logic
> (including compressed swap) into the core swap code?
That is the plan based on my understanding of the LSF/MM session.
> I am curious about what direction you have in mind. I can think of three
> possible approaches.
>
> - Handle it in individual devices
>
> Can be optimized for each device. Currently, the swap layer
> handles zero pages but not same data patterns, causing a mismatch. My
> patch aims to resolve this. If we prefer this direction, we could even
> remove the zero page check from the swap layer and let individual swap
> devices handle it.
I don't think devices should care about this at all. Compression and
dedup are core functionality that does not depend on the underlying
device.
> I wonder which direction you are considering.
Please look at all the ongoing discussions on the linux-mm list.
* Re: [RFC PATCH 0/2] mm: swap: allow per-device skipping of zero-filled page check
2026-05-18 7:34 [RFC PATCH 0/2] mm: swap: allow per-device skipping of zero-filled page check Youngjun Park
2026-05-18 7:34 ` [RFC PATCH 1/2] mm: swap: add SWAP_FLAG_SKIP_ZERO_CHECK to skip " Youngjun Park
2026-05-18 7:34 ` [RFC PATCH 2/2] mm: swap: do not allocate zero_bitmap if zero check is skipped Youngjun Park
@ 2026-05-19 10:16 ` Usama Arif
2026-05-21 6:57 ` YoungJun Park
2026-05-31 18:48 ` Kairui Song
3 siblings, 1 reply; 11+ messages in thread
From: Usama Arif @ 2026-05-19 10:16 UTC (permalink / raw)
To: Youngjun Park
Cc: Usama Arif, linux-mm, akpm, chrisl, kasong, shikemeng, nphamcs,
bhe, baohua
On Mon, 18 May 2026 16:34:53 +0900 Youngjun Park <youngjun.park@lge.com> wrote:
> Currently, the swap layer checks whether a page is entirely zero-filled
> before writing it out to the swap device. However, some swap backends,
> such as zram and our custom swap device, already perform their own
> same-filled page checking internally. As a result, the same page pattern is
> checked twice, which wastes CPU cycles.
Hello!
So IMO, we should go in the other direction and remove same-filled page
support from zram. When we added zero-filled support to swap, we removed
it from zswap, because it's not worth the extra overhead of maintaining
this in zswap.
I did some analysis at that time, and I remember more than 90% of the
same-filled pages on server workloads were zero-filled. I think someone
else did analysis on Android (maybe Barry?) and it was about 85%.
IMO we should just remove same-filled page support from zram, and keep
this centralized in swap, so that everyone benefits.
>
> This patchset introduces a new swapon flag, SWAP_FLAG_SKIP_ZERO_CHECK,
> to eliminate this redundancy. I introduce this as a per-device flag
> rather than a global setting because traditional swap devices still
> benefit from the swap layer's zero page check to avoid unnecessary I/O.
> By using this flag, userspace can selectively disable the zero check
> only for specific backends.
>
> Furthermore, on certain architectures where the zero map is managed via
> a separate bitmap, skipping this check allows bypassing
> the bitmap allocation entirely (saving memory).
>
> This modification is based on the previous discussion with Nhat Pham [1].
> Additionally, this patchset is built on top of Kairui Song's recent
> patchset regarding swap table and zeromap modifications [2].
>
> Tested simply with zram on QEMU to verify zero-filled page handling.
>
> References:
> [1] https://lore.kernel.org/linux-mm/acQvNRLpHwnHt7i+@yjaykim-PowerEdge-T330/
> [2] https://lore.kernel.org/linux-mm/20260517-swap-table-p4-v5-0-88ae43e064c7@tencent.com/T/#t
>
> Youngjun Park (2):
> mm: swap: add SWAP_FLAG_SKIP_ZERO_CHECK to skip zero-filled page check
> mm: swap: do not allocate zero_bitmap if zero check is skipped
>
> include/linux/swap.h | 4 +++-
> mm/page_io.c | 7 ++++++-
> mm/swap.h | 12 ++++++++++++
> mm/swapfile.c | 14 ++++++++++----
> 4 files changed, 31 insertions(+), 6 deletions(-)
>
> --
> 2.34.1
>
>
* Re: [RFC PATCH 1/2] mm: swap: add SWAP_FLAG_SKIP_ZERO_CHECK to skip zero-filled page check
2026-05-19 9:18 ` Christoph Hellwig
@ 2026-05-21 6:46 ` YoungJun Park
0 siblings, 0 replies; 11+ messages in thread
From: YoungJun Park @ 2026-05-21 6:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, akpm, chrisl, kasong, shikemeng, nphamcs, bhe, baohua
On Tue, May 19, 2026 at 02:18:33AM -0700, Christoph Hellwig wrote:
> On Tue, May 19, 2026 at 04:08:35PM +0900, YoungJun Park wrote:
> > On Mon, May 18, 2026 at 11:30:52PM -0700, Christoph Hellwig wrote:
> >
> > Hi Christoph!
> >
> > > Nothing in this patch ever sets the new flag. And I'd rather not
> >
> > I defined a new flag that can be set at swapon time. Is your concern
> > that there is no mention of the user-space part that sets the flag?
>
> Heh. My main concern was that I assumed it's a kernel-only flag as
> SWAP_FLAG_* aren't defined in a uapi header. But now that I know
> that they are user-passed flags, my concern is that this is not something
> that should be exposed to the user because it is a deeply internal
> implementation detail.
>
> >
> > > add new weird bdev hooks for zram when we really need to consolidate
> > > the compressed swap logic in the core swap code.
> >
> > Do you mean that we should consolidate the same-data-pattern logic
> > (including compressed swap) into the core swap code?
>
> That is the plan based on my understanding of the LSF/MM session.
>
> > I am curious about what direction you have in mind. I can think of three
> > possible approaches.
> >
> > - Handle it in individual devices
> >
> > Can be optimized for each device. Currently, the swap layer
> > handles zero pages but not same data patterns, causing a mismatch. My
> > patch aims to resolve this. If we prefer this direction, we could even
> > remove the zero page check from the swap layer and let individual swap
> > devices handle it.
>
> I don't think devices should care about this at all. Compression and
> dedup are core functionality that does not depend on the underlying
> device.
>
> > I wonder which direction you are considering.
>
> Please look at all the ongoing discussions on the linux-mm list.
I submitted this patch from the perspective of optimizing the current state,
but I see that it might not align with the future direction.
Thank you for the reply and clarification.
Best regards,
Youngjun Park
* Re: [RFC PATCH 0/2] mm: swap: allow per-device skipping of zero-filled page check
2026-05-19 10:16 ` [RFC PATCH 0/2] mm: swap: allow per-device skipping of zero-filled page check Usama Arif
@ 2026-05-21 6:57 ` YoungJun Park
0 siblings, 0 replies; 11+ messages in thread
From: YoungJun Park @ 2026-05-21 6:57 UTC (permalink / raw)
To: Usama Arif
Cc: linux-mm, akpm, chrisl, kasong, shikemeng, nphamcs, bhe, baohua
On Tue, May 19, 2026 at 03:16:14AM -0700, Usama Arif wrote:
> On Mon, 18 May 2026 16:34:53 +0900 Youngjun Park <youngjun.park@lge.com> wrote:
>
> > Currently, the swap layer checks whether a page is entirely zero-filled
> > before writing it out to the swap device. However, some swap backends,
> > such as zram and our custom swap device, already perform their own
> > same-filled page checking internally. As a result, the same page pattern is
> > checked twice, which wastes CPU cycles.
>
> Hello!
>
> So IMO, we should go in the other direction and remove same-filled page
> support from zram. When we added zero-filled support to swap, we removed
> it from zswap, because it's not worth the extra overhead of maintaining
> this in zswap.
>
> I did some analysis at that time, and I remember more than 90% of the
> same-filled pages on server workloads were zero-filled. I think someone
> else did analysis on Android (maybe Barry?) and it was about 85%.
>
> IMO we should just remove same-filled page support from zram, and keep
> this centralized in swap, so that everyone benefits.
Hello,
Workloads can vary (like ours), and we still see value in checking for
same-filled pages.
(For reference, there was a similar discussion here:
https://lore.kernel.org/linux-mm/CAKEwX=PBjMVfMvKkNfqbgiw7o10NFyZBSB62ODzsqogv-WDYKQ@mail.gmail.com/)
Anyway, I will review this again to figure out the better approach.
Thank you :)
Best regards,
Youngjun Park.
* Re: [RFC PATCH 0/2] mm: swap: allow per-device skipping of zero-filled page check
2026-05-18 7:34 [RFC PATCH 0/2] mm: swap: allow per-device skipping of zero-filled page check Youngjun Park
` (2 preceding siblings ...)
2026-05-19 10:16 ` [RFC PATCH 0/2] mm: swap: allow per-device skipping of zero-filled page check Usama Arif
@ 2026-05-31 18:48 ` Kairui Song
2026-06-01 2:05 ` YoungJun Park
3 siblings, 1 reply; 11+ messages in thread
From: Kairui Song @ 2026-05-31 18:48 UTC (permalink / raw)
To: Youngjun Park
Cc: linux-mm, akpm, chrisl, kasong, shikemeng, nphamcs, bhe, baohua
On Mon, May 18, 2026 at 04:34:53PM +0800, Youngjun Park wrote:
> Currently, the swap layer checks whether a page is entirely zero-filled
> before writing it out to the swap device. However, some swap backends,
> such as zram and our custom swap device, already perform their own
> same-filled page checking internally. As a result, the same page pattern is
> checked twice, which wastes CPU cycles.
>
> This patchset introduces a new swapon flag, SWAP_FLAG_SKIP_ZERO_CHECK,
> to eliminate this redundancy. I introduce this as a per-device flag
> rather than a global setting because traditional swap devices still
> benefit from the swap layer's zero page check to avoid unnecessary I/O.
> By using this flag, userspace can selectively disable the zero check
> only for specific backends.
>
> Furthermore, on certain architectures where the zero map is managed via
> a separate bitmap, skipping this check allows bypassing
> the bitmap allocation entirely (saving memory).
>
> This modification is based on the previous discussion with Nhat Pham [1].
> Additionally, this patchset is built on top of Kairui Song's recent
> patchset regarding swap table and zeromap modifications [2].
>
> Tested simply with zram on QEMU to verify zero-filled page handling.
>
> References:
> [1] https://lore.kernel.org/linux-mm/acQvNRLpHwnHt7i+@yjaykim-PowerEdge-T330/
> [2] https://lore.kernel.org/linux-mm/20260517-swap-table-p4-v5-0-88ae43e064c7@tencent.com/T/#t
>
> Youngjun Park (2):
> mm: swap: add SWAP_FLAG_SKIP_ZERO_CHECK to skip zero-filled page check
> mm: swap: do not allocate zero_bitmap if zero check is skipped
>
> include/linux/swap.h | 4 +++-
> mm/page_io.c | 7 ++++++-
> mm/swap.h | 12 ++++++++++++
> mm/swapfile.c | 14 ++++++++++----
> 4 files changed, 31 insertions(+), 6 deletions(-)
>
> --
> 2.34.1
>
Hi YoungJun,
I think this idea might be useful. Some workloads with very few zero
folios don't benefit from zero folio detection, and it's more of a
burden than a gain:
For example, one test result with MySQL:
pswpin 129323368
pswpout 131460192
swpin_zero 4641
swpout_zero 248210
Less than 0.3% of the pages are zero, and almost none of the
zero pages are swapped back in.
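The percentages can be checked from the counters above with a small helper.
This is only a worked-arithmetic sketch; sketch_ratio() is a hypothetical
name, and the inputs are the four counter values quoted in this message.

```c
#include <assert.h>

/* Return part/whole as a fraction; whole must be non-zero. */
static double sketch_ratio(unsigned long long part, unsigned long long whole)
{
	assert(whole != 0);
	return (double)part / (double)whole;
}
```

Calling sketch_ratio(248210, 131460192) gives the share of swap-outs that
were zero pages, roughly 0.19%. Calling sketch_ratio(4641, 248210) gives
swpin_zero relative to swpout_zero, roughly 1.9%.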
I think the zero page detection is actually better combined with
compression, e.g. ZRAM & ZSWAP, which will always have to touch the
content of the page. But some devices may be able to just accept
the folio as it is, and the CPU may not want or need to read the content
again before passing the folio to the device. That may save some CPU
time, I think?
How to expose this zero check as an interface is debatable, though.
So I think the users might not be limited to ZRAM.
* Re: [RFC PATCH 0/2] mm: swap: allow per-device skipping of zero-filled page check
2026-05-31 18:48 ` Kairui Song
@ 2026-06-01 2:05 ` YoungJun Park
0 siblings, 0 replies; 11+ messages in thread
From: YoungJun Park @ 2026-06-01 2:05 UTC (permalink / raw)
To: Kairui Song
Cc: linux-mm, akpm, chrisl, kasong, shikemeng, nphamcs, bhe, baohua
On Mon, Jun 01, 2026 at 02:48:44AM +0800, Kairui Song wrote:
> On Mon, May 18, 2026 at 04:34:53PM +0800, Youngjun Park wrote:
> > Currently, the swap layer checks whether a page is entirely zero-filled
> > before writing it out to the swap device. However, some swap backends,
> > such as zram and our custom swap device, already perform their own
> > same-filled page checking internally. As a result, the same page pattern is
> > checked twice, which wastes CPU cycles.
> >
> > This patchset introduces a new swapon flag, SWAP_FLAG_SKIP_ZERO_CHECK,
> > to eliminate this redundancy. I introduce this as a per-device flag
> > rather than a global setting because traditional swap devices still
> > benefit from the swap layer's zero page check to avoid unnecessary I/O.
> > By using this flag, userspace can selectively disable the zero check
> > only for specific backends.
> >
> > Furthermore, on certain architectures where the zero map is managed via
> > a separate bitmap, skipping this check allows bypassing
> > the bitmap allocation entirely (saving memory).
> >
> > This modification is based on the previous discussion with Nhat Pham [1].
> > Additionally, this patchset is built on top of Kairui Song's recent
> > patchset regarding swap table and zeromap modifications [2].
> >
> > Tested simply with zram on QEMU to verify zero-filled page handling.
> >
> > References:
> > [1] https://lore.kernel.org/linux-mm/acQvNRLpHwnHt7i+@yjaykim-PowerEdge-T330/
> > [2] https://lore.kernel.org/linux-mm/20260517-swap-table-p4-v5-0-88ae43e064c7@tencent.com/T/#t
> >
> > Youngjun Park (2):
> > mm: swap: add SWAP_FLAG_SKIP_ZERO_CHECK to skip zero-filled page check
> > mm: swap: do not allocate zero_bitmap if zero check is skipped
> >
> > include/linux/swap.h | 4 +++-
> > mm/page_io.c | 7 ++++++-
> > mm/swap.h | 12 ++++++++++++
> > mm/swapfile.c | 14 ++++++++++----
> > 4 files changed, 31 insertions(+), 6 deletions(-)
> >
> > --
> > 2.34.1
> >
>
> Hi YoungJun,
Hello Kairui,
> I think this idea might be useful. Some workloads with very few zero
> folios don't benefit from zero folio detection, and it's more of a
> burden than a gain:
>
> For example, one test result with MySQL:
>
> pswpin 129323368
> pswpout 131460192
> swpin_zero 4641
> swpout_zero 248210
>
> Less than 0.3% of the pages are zero, and almost none of the
> zero pages are swapped back in.
That is a good point.
So, there are three types of workloads:
1. Zero-filled (the case the current kernel targets)
2. Zero-filled + filled with another value (our target case)
3. Neither
The third case also benefits from this kind of approach.
> I think the zero page detection is actually better combined with
> compression, e.g. ZRAM & ZSWAP, which will always have to touch the
> content of the page. But some devices may be able to just accept
> the folio as it is, and the CPU may not want or need to read the content
> again before passing the folio to the device. That may save some CPU
> time, I think?
> How to expose this zero check as an interface is debatable, though.
>
> So I think the users might not be limited to ZRAM.
So, in my opinion, the handling of same-filled data could be made
controllable in some proper way (like this patch, CONFIG_*, etc.).
I will think about it more.
Thanks
Youngjun Park