From: Wenchao Hao <haowenchao22@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Barry Song <21cnbao@gmail.com>,
Chengming Zhou <chengming.zhou@linux.dev>,
Jens Axboe <axboe@kernel.dk>,
Johannes Weiner <hannes@cmpxchg.org>,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, Minchan Kim <minchan@kernel.org>,
Nhat Pham <nphamcs@gmail.com>,
Sergey Senozhatsky <senozhatsky@chromium.org>,
Yosry Ahmed <yosry@kernel.org>
Cc: Wenchao Hao <haowenchao22@gmail.com>,
Wenchao Hao <haowenchao@xiaomi.com>
Subject: [RFC PATCH v3 0/4] mm/zsmalloc: per-cpu deferred free to accelerate swap entry release
Date: Fri, 8 May 2026 14:07:20 +0800
Message-ID: <20260508060724.3810904-1-haowenchao@xiaomi.com>
Swap freeing can be expensive when unmapping a VMA containing many swap
entries. This has been reported to significantly delay memory reclamation
during Android's low-memory killing, especially when multiple processes
are terminated to free memory, with slot_free() accounting for more than
80% of the total cost of freeing swap entries.
Two earlier attempts, by Lei and Zhiguo, added a new thread in the mm
core to asynchronously collect and free swap entries [1][2], but both
designs are fairly complex.
When anon folios and swap entries are mixed within a process, reclaiming
anon folios from killed processes helps return memory to the system as
quickly as possible, so that newly launched applications can satisfy
their memory demands. It is not ideal for swap freeing to block anon
folio freeing. On the other hand, swap freeing can still return memory
to the system, although at a slower rate due to memory compression.
This series introduces a callback-based deferred free framework in
zsmalloc. Callers (zram, zswap) register push/drain callbacks to
define what gets buffered and how it gets drained. The entire free
path, including caller-side bookkeeping (slot_free, zswap_entry_free),
is deferred to a background worker.
Implementation:
- Each CPU owns a single-page buffer. The hot path writes a value
via the push callback with preemption disabled (no locks).
- When the buffer fills, it is swapped with a fresh page from a
pre-allocated page pool. The full page is queued to a WQ_UNBOUND
worker for drain.
- The drain callback performs the actual expensive work (zs_free,
slot_free, zswap_entry_free, etc.) in batch, off the hot path.
- If no free page is available, the caller falls back to synchronous
processing.
The speedup comes from moving expensive swap slot freeing off the
munmap hot path into a background worker, so that intact anonymous
folios are released back to the system without blocking. The worker
drains at a slower rate since compressed objects are small and freeing
a single handle may not release an entire page until the zspage is
fully empty.
Performance results (Raspberry Pi 4B, ARM64, 8GB RAM):
Test 1: munmap latency for 256MB swap-filled VMA (zram backend)
mode Base Patched Speedup
single 61.82ms 8.62ms 7.17x
multi 2p 94.75ms 54.11ms 1.75x
multi 3p 154.64ms 104.83ms 1.48x
Test 2: munmap latency for different sizes (zram, single process)
Size Base Patched Speedup
64MB 14.11ms 2.18ms 6.47x
128MB 29.45ms 4.48ms 6.57x
192MB 43.85ms 6.62ms 6.62x
256MB 57.01ms 9.08ms 6.28x
512MB 115.13ms 55.58ms 2.07x
1024MB 229.66ms 153.28ms 1.50x
Test 3: munmap latency for 256MB swap-filled VMA (zswap backend)
mode Base Patched Speedup
single 152.14ms 51.26ms 2.97x
multi 2p 186.56ms 105.42ms 1.77x
multi 3p 205.83ms 153.32ms 1.34x
Test 4: munmap latency for different sizes (zswap, single process)
Size Base Patched Speedup
64MB 37.83ms 13.26ms 2.85x
128MB 75.11ms 26.73ms 2.81x
256MB 150.78ms 52.97ms 2.85x
512MB 303.04ms 130.38ms 2.32x
1024MB 599.95ms 287.10ms 2.09x
[1] https://lore.kernel.org/all/20240805153639.1057-1-justinjiang@vivo.com/
[2] https://lore.kernel.org/all/20250909065349.574894-1-liulei.rjpt@vivo.com/
[3] https://lore.kernel.org/linux-mm/20260412060450.15813-1-baohua@kernel.org/
Changes since v2:
- Use per-cpu single-page buffers instead of a global list; the hot
path only writes into the local CPU's buffer with preemption disabled
- Add a page pool for buffer rotation: when the current buffer is full,
swap it with a free page from the pool and queue the full page for
drain
- Introduce push/drain callback ops so that zram and zswap can each
define their own element size and drain logic (zram stores u32 slot
indices, zswap stores unsigned long handles)
- Drop the lock optimization patches; they will be submitted separately
as part of a dedicated zsmalloc lock contention series
- Link to v2: https://lore.kernel.org/all/20260421121616.3298845-1-haowenchao@xiaomi.com/
Barry Song (1):
zram: use zsmalloc deferred free callback for async slot free
Wenchao Hao (3):
mm/zsmalloc: introduce deferred free framework with callback ops
mm/zswap: use zsmalloc deferred free callback for async invalidate
zram: batch clear flags in slot_free with single write
drivers/block/zram/zram_drv.c | 44 ++++++-
drivers/block/zram/zram_drv.h | 6 +
include/linux/zsmalloc.h | 16 +++
mm/zsmalloc.c | 208 +++++++++++++++++++++++++++++++++-
mm/zswap.c | 38 ++++++-
5 files changed, 306 insertions(+), 6 deletions(-)
--
2.34.1