* [RFC PATCH 0/3] mm/zsmalloc: reduce lock contention in zs_free()
From: Wenchao Hao @ 2026-05-08 6:19 UTC
To: Albert Ou, Alexandre Ghiti, Andrew Morton, Barry Song,
linux-kernel, linux-mm, linux-riscv, Minchan Kim, Palmer Dabbelt,
Paul Walmsley, Sergey Senozhatsky
Cc: Wenchao Hao
Swap freeing can be expensive when unmapping a VMA containing many swap
entries. This has been reported to significantly delay memory reclamation
during Android's low-memory killing, especially when multiple processes
are terminated to free memory, with slot_free() accounting for more than
80% of the total cost of freeing swap entries.
Lock contention in zs_free() is a major contributor to this cost:
- pool->lock (rwlock) read-side atomic operations become expensive
under multi-process concurrency due to cacheline bouncing
- class->lock held across zspage page freeing causes zone->lock
contention to propagate back
This series addresses both issues:
Patch 1: Encode class_idx in obj value
On 64-bit systems, OBJ_INDEX_BITS is over-provisioned. We split it
into class_idx + obj_idx so that zs_free() can determine the correct
size_class from the obj value alone, without needing pool->lock.
Patch 2: Remove pool->lock from zs_free()
With class_idx available from the obj encoding, zs_free() acquires
only class->lock (re-reading obj for a stable PFN). This eliminates
rwlock read-side contention between concurrent zs_free() calls and
page migration/compaction.
Patch 3: Drop class->lock before freeing zspage pages
Move the actual page release (free_zspage) outside class->lock. The
bookkeeping is done under the lock, but buddy allocator interaction
(zone->lock) no longer nests inside class->lock.
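Taken together, the 64-bit zs_free() fast path after this series looks
roughly like the following (simplified sketch using the helpers the
patches introduce; stats updates and the deferred-free fallback omitted):

  obj   = handle_to_obj(handle);           /* lockless read            */
  class = pool->size_class[obj_to_class_idx(obj)];
  spin_lock(&class->lock);                 /* excludes page migration  */
  obj = handle_to_obj(handle);             /* re-read: PFN now stable  */
  /* ... obj_free(), fullness bookkeeping ... */
  spin_unlock(&class->lock);
  /* pages of a now-empty zspage go back to the buddy allocator here */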
Performance results:
Test: each process independently mmaps 256MB of anonymous memory, writes
to it, calls madvise(MADV_PAGEOUT) to push the pages out to zram-backed
swap (lzo-rle), then all processes munmap() concurrently.
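Each worker process does roughly the following (illustrative sketch; the
fill pattern and error handling are assumptions, timing harness omitted):

  #include <string.h>
  #include <sys/mman.h>

  #define SZ (256UL << 20)                /* 256MB per process */

  static void worker(void)
  {
          char *buf = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          memset(buf, 0xaa, SZ);          /* populate anonymous pages    */
          madvise(buf, SZ, MADV_PAGEOUT); /* push them out to zram swap  */
          munmap(buf, SZ);                /* measured: frees swap entries */
  }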
Raspberry Pi 4B (4-core ARM64 Cortex-A72):
  mode        Base       Patched    Speedup
  single      59.0ms     56.0ms     1.05x
  multi 2p    94.6ms     66.7ms     1.42x
  multi 4p    202.9ms    110.6ms    1.83x
x86 (20-core Intel i7-12700, 16 concurrent processes):
  mode        Base       Patched    Speedup
  single      11.7ms     9.8ms      1.19x
  multi 2p    24.1ms     17.2ms     1.40x
  multi 4p    63.0ms     45.3ms     1.39x
The single-process case shows a modest improvement. With multiple
processes, every read_lock()/read_unlock() pair atomically updates the
shared rwlock reader count, and the cost of those atomic operations
grows as more CPUs contend for the same cacheline. Eliminating
pool->lock from zs_free() removes this overhead entirely.
Patches 1-2 take effect only on 64-bit systems (gated by
ZS_OBJ_CLASS_IDX); 32-bit systems fall back to the original pool->lock
path. Patch 3 benefits all architectures.
Wenchao Hao (2):
mm/zsmalloc: encode class index in obj value for lockless class lookup
mm/zsmalloc: remove pool->lock from zs_free on 64-bit systems
Xueyuan Chen (1):
mm/zsmalloc: drop class lock before freeing zspage
mm/zsmalloc.c | 146 ++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 131 insertions(+), 15 deletions(-)
--
2.34.1
* [RFC PATCH 1/3] mm/zsmalloc: encode class index in obj value for lockless class lookup
From: Wenchao Hao @ 2026-05-08 6:19 UTC
To: Albert Ou, Alexandre Ghiti, Andrew Morton, Barry Song,
linux-kernel, linux-mm, linux-riscv, Minchan Kim, Palmer Dabbelt,
Paul Walmsley, Sergey Senozhatsky
Cc: Wenchao Hao
Encode the size class index (class_idx) into the obj value so that
zs_free() can determine the correct size_class without dereferencing
the handle->obj->PFN->zpdesc->zspage->class chain under pool->lock.
OBJ_INDEX_BITS is over-provisioned on 64-bit systems. For example on
arm64 with default chain_size=8: OBJ_INDEX_BITS=24 but only 10 bits
are actually needed for obj_idx. We dynamically compute OBJ_CLASS_BITS
as ilog2(ZS_SIZE_CLASSES - 1) + 1 (8 bits for 4K pages, 9 for 64K)
and verify at compile time via static_assert that the three fields
(PFN + class_idx + obj_idx) fit within BITS_PER_LONG.
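To make the arithmetic concrete for the arm64 example above (4K pages,
chain_size=8; taking _PFN_BITS=40, which is what yields OBJ_INDEX_BITS=24):

  ZS_MIN_ALLOC_SIZE_FULL = MAX(32, 8 * 4096 >> 24)                = 32
  ZS_SIZE_CLASSES_FULL   = DIV_ROUND_UP(4096 - 32, 4096 >> 8) + 1 = 255
  OBJ_CLASS_BITS_NEEDED  = ilog2(255 - 1) + 1                     = 8
  ZS_MAX_OBJ_COUNT_FULL  = 8 * 4096 / 32                          = 1024
  OBJ_IDX_BITS_NEEDED    = ilog2(1024 - 1) + 1                    = 10

  40 + 8 + 10 = 58 <= 64  -->  feature enabled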
This encoding is gated by ZS_OBJ_CLASS_IDX, defined only when
BITS_PER_LONG >= 64. On 32-bit systems the bits do not fit, so
the feature is disabled and the original OBJ_INDEX layout is preserved.
Split OBJ_INDEX into class_idx and obj_idx:
  obj:  [    PFN    |   class_idx    |   obj_idx    ]
        [ _PFN_BITS | OBJ_CLASS_BITS | OBJ_IDX_BITS ]
class_idx is invariant across page migration (only PFN changes), so a
lockless read always yields a valid class_idx.
Update obj_to_location(), location_to_obj() and callers accordingly.
Add obj_to_class_idx() helper. Adjust ZS_MIN_ALLOC_SIZE to use
OBJ_IDX_BITS.
Signed-off-by: Wenchao Hao <haowenchao@xiaomi.com>
---
mm/zsmalloc.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 88 insertions(+), 7 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 63128ddb7959..bccadf0a27f2 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -96,11 +96,74 @@
#define CLASS_BITS 8
#define MAGIC_VAL_BITS 8
+/*
+ * Optionally encode the size class index in the obj value so that
+ * zs_free() can look up the correct class without holding pool->lock.
+ *
+ * Rather than fixing a hard CLASS_BITS constant for the class_idx field,
+ * we compute the minimum bits needed from the actual number of size classes
+ * and the actual maximum obj_idx, then check whether they all fit:
+ *
+ * _PFN_BITS + OBJ_CLASS_BITS_NEEDED + OBJ_IDX_BITS_NEEDED <= BITS_PER_LONG
+ *
+ * This naturally handles all architectures and PAGE_SIZE configurations:
+ *
+ * - 32-bit: BITS_PER_LONG=32, sum easily exceeds 32 --> disabled.
+ * - powerpc64 64K pages: ZS_SIZE_CLASSES=257 --> OBJ_CLASS_BITS_NEEDED=9,
+ * but the sum still fits in 64 bits --> enabled.
+ * - riscv64 Sv57: _PFN_BITS=44, tight but still fits --> enabled.
+ *
+ * When enabled, obj layout is:
+ *
+ * 63 0
+ * +-----------+--------------+-------------+
+ * | PFN | class_idx | obj_idx |
+ * | _PFN_BITS |OBJ_CLASS_BITS| OBJ_IDX_BITS|
+ * +-----------+--------------+-------------+
+ *
+ * Migration only rewrites PFN; class_idx and obj_idx are invariant,
+ * so a lockless read of obj always yields a valid class_idx.
+ */
+
+#if BITS_PER_LONG >= 64
+#define ZS_OBJ_CLASS_IDX
+#endif
+
+#ifdef ZS_OBJ_CLASS_IDX
+
+/* ZS_SIZE_CLASSES computed conservatively with original OBJ_INDEX_BITS */
+#define ZS_MIN_ALLOC_SIZE_FULL \
+ MAX(32, (CONFIG_ZSMALLOC_CHAIN_SIZE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+#define ZS_SIZE_CLASSES_FULL \
+ (DIV_ROUND_UP(PAGE_SIZE - ZS_MIN_ALLOC_SIZE_FULL, \
+ PAGE_SIZE >> CLASS_BITS) + 1)
+
+#define ZS_MAX_OBJ_COUNT_FULL \
+ (CONFIG_ZSMALLOC_CHAIN_SIZE * PAGE_SIZE / 32)
+#define OBJ_CLASS_BITS_NEEDED (ilog2(ZS_SIZE_CLASSES_FULL - 1) + 1)
+#define OBJ_IDX_BITS_NEEDED (ilog2(ZS_MAX_OBJ_COUNT_FULL - 1) + 1)
+
+static_assert(_PFN_BITS + OBJ_CLASS_BITS_NEEDED + OBJ_IDX_BITS_NEEDED
+ <= BITS_PER_LONG,
+ "zsmalloc: class_idx + obj_idx + PFN do not fit in obj on this config");
+
+#define OBJ_CLASS_BITS OBJ_CLASS_BITS_NEEDED
+#define OBJ_IDX_BITS (OBJ_INDEX_BITS - OBJ_CLASS_BITS)
+#define OBJ_IDX_MASK ((_AC(1, UL) << OBJ_IDX_BITS) - 1)
+#define OBJ_CLASS_MASK ((_AC(1, UL) << OBJ_CLASS_BITS) - 1)
+
+#else /* !ZS_OBJ_CLASS_IDX */
+
+#define OBJ_IDX_BITS OBJ_INDEX_BITS
+#define OBJ_IDX_MASK OBJ_INDEX_MASK
+
+#endif /* ZS_OBJ_CLASS_IDX */
+
#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(CONFIG_ZSMALLOC_CHAIN_SIZE, UL))
/* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
#define ZS_MIN_ALLOC_SIZE \
- MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+ MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_IDX_BITS))
/* each chunk includes extra space to keep handle */
#define ZS_MAX_ALLOC_SIZE PAGE_SIZE
@@ -722,7 +785,7 @@ static void obj_to_location(unsigned long obj, struct zpdesc **zpdesc,
unsigned int *obj_idx)
{
*zpdesc = pfn_zpdesc(obj >> OBJ_INDEX_BITS);
- *obj_idx = (obj & OBJ_INDEX_MASK);
+ *obj_idx = (obj & OBJ_IDX_MASK);
}
static void obj_to_zpdesc(unsigned long obj, struct zpdesc **zpdesc)
@@ -730,17 +793,29 @@ static void obj_to_zpdesc(unsigned long obj, struct zpdesc **zpdesc)
*zpdesc = pfn_zpdesc(obj >> OBJ_INDEX_BITS);
}
+#ifdef ZS_OBJ_CLASS_IDX
+static unsigned int obj_to_class_idx(unsigned long obj)
+{
+ return (obj >> OBJ_IDX_BITS) & OBJ_CLASS_MASK;
+}
+#endif
+
/**
- * location_to_obj - get obj value encoded from (<zpdesc>, <obj_idx>)
+ * location_to_obj - encode (<zpdesc>, <obj_idx>, <class_idx>) into obj value
* @zpdesc: zpdesc object resides in zspage
* @obj_idx: object index
+ * @class_idx: size class index (used only when ZS_OBJ_CLASS_IDX is defined)
*/
-static unsigned long location_to_obj(struct zpdesc *zpdesc, unsigned int obj_idx)
+static unsigned long location_to_obj(struct zpdesc *zpdesc, unsigned int obj_idx,
+ unsigned int class_idx)
{
unsigned long obj;
obj = zpdesc_pfn(zpdesc) << OBJ_INDEX_BITS;
- obj |= obj_idx & OBJ_INDEX_MASK;
+#ifdef ZS_OBJ_CLASS_IDX
+ obj |= (unsigned long)(class_idx & OBJ_CLASS_MASK) << OBJ_IDX_BITS;
+#endif
+ obj |= obj_idx & OBJ_IDX_MASK;
return obj;
}
@@ -1276,7 +1351,7 @@ static unsigned long obj_malloc(struct zs_pool *pool,
kunmap_local(vaddr);
mod_zspage_inuse(zspage, 1);
- obj = location_to_obj(m_zpdesc, obj);
+ obj = location_to_obj(m_zpdesc, obj, zspage->class);
record_obj(handle, obj);
return obj;
@@ -1762,7 +1837,13 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
old_obj = handle_to_obj(handle);
obj_to_location(old_obj, &dummy, &obj_idx);
- new_obj = (unsigned long)location_to_obj(newzpdesc, obj_idx);
+#ifdef ZS_OBJ_CLASS_IDX
+ new_obj = (unsigned long)location_to_obj(newzpdesc,
+ obj_idx, obj_to_class_idx(old_obj));
+#else
+ new_obj = (unsigned long)location_to_obj(newzpdesc,
+ obj_idx, 0);
+#endif
record_obj(handle, new_obj);
}
}
--
2.34.1
* [RFC PATCH 2/3] mm/zsmalloc: remove pool->lock from zs_free on 64-bit systems
From: Wenchao Hao @ 2026-05-08 6:19 UTC
To: Albert Ou, Alexandre Ghiti, Andrew Morton, Barry Song,
linux-kernel, linux-mm, linux-riscv, Minchan Kim, Palmer Dabbelt,
Paul Walmsley, Sergey Senozhatsky
Cc: Wenchao Hao
With class_idx now encoded in the obj value (ZS_OBJ_CLASS_IDX),
zs_free() no longer needs pool->lock to locate the size class on
64-bit systems.
The class_idx is invariant across page migration (only PFN changes),
and 64-bit aligned reads are atomic, so a lockless read of the handle
always yields a valid class_idx. After acquiring class->lock (which
blocks concurrent migration), the handle is re-read to obtain a stable
PFN for the actual free operation.
This eliminates rwlock read-side contention between zs_free() and page
migration/compaction, improving zs_free() scalability on multi-core
systems.
On 32-bit systems (ZS_OBJ_CLASS_IDX not defined), the original
pool->lock path is preserved.
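The interleaving this relies on can be sketched as follows (illustrative
timeline, not taken from the patch):

  CPU0: zs_free()                          CPU1: zs_page_migrate()
  obj = handle_to_obj(handle)
    /* class_idx valid; PFN may be stale */
  class = size_class[obj_to_class_idx(obj)]
  spin_lock(&class->lock)                  spin_lock(&class->lock) /* blocks */
  obj = handle_to_obj(handle)
    /* PFN stable until we unlock */
  obj_free(), fix_fullness_group()
  spin_unlock(&class->lock)                /* proceeds; may rewrite the PFN */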
Signed-off-by: Wenchao Hao <haowenchao@xiaomi.com>
---
mm/zsmalloc.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index bccadf0a27f2..47ec0414ce9e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -21,6 +21,10 @@
* pool->lock
* class->lock
* zspage->lock
+ *
+ * On 64-bit systems with ZS_OBJ_CLASS_IDX enabled, zs_free() does not
+ * take pool->lock; it extracts class_idx from the obj encoding with a
+ * lockless read, then re-reads obj under class->lock.
*/
#include <linux/module.h>
@@ -1467,10 +1471,24 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
if (IS_ERR_OR_NULL((void *)handle))
return;
+#ifdef ZS_OBJ_CLASS_IDX
+ /*
+ * The class_idx encoded in obj is invariant across migration
+ * (only PFN changes), and the read of *(unsigned long *)handle
+ * is atomic on 64-bit, so we can determine the correct class
+ * without holding pool->lock.
+ */
+ obj = handle_to_obj(handle);
+ class = pool->size_class[obj_to_class_idx(obj)];
+ spin_lock(&class->lock);
/*
- * The pool->lock protects the race with zpage's migration
- * so it's safe to get the page from handle.
+ * Re-read under class->lock: migration also acquires class->lock,
+ * so the obj value is now stable and the PFN is valid.
*/
+ obj = handle_to_obj(handle);
+ obj_to_zpdesc(obj, &f_zpdesc);
+ zspage = get_zspage(f_zpdesc);
+#else
read_lock(&pool->lock);
obj = handle_to_obj(handle);
obj_to_zpdesc(obj, &f_zpdesc);
@@ -1478,6 +1496,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
class = zspage_class(pool, zspage);
spin_lock(&class->lock);
read_unlock(&pool->lock);
+#endif
class_stat_sub(class, ZS_OBJS_INUSE, 1);
obj_free(class->size, obj);
--
2.34.1
* [RFC PATCH 3/3] mm/zsmalloc: drop class lock before freeing zspage
From: Wenchao Hao @ 2026-05-08 6:19 UTC
To: Albert Ou, Alexandre Ghiti, Andrew Morton, Barry Song,
linux-kernel, linux-mm, linux-riscv, Minchan Kim, Palmer Dabbelt,
Paul Walmsley, Sergey Senozhatsky
Cc: Wenchao Hao, Xueyuan Chen
From: Xueyuan Chen <xueyuan.chen21@gmail.com>
Currently in zs_free(), the class->lock is held until the zspage is
completely freed and the counters are updated. However, freeing pages back
to the buddy allocator requires acquiring the zone lock.
Under heavy memory pressure, zone lock contention can be severe. When this
happens, the CPU holding the class->lock will stall waiting for the zone
lock, thereby blocking all other CPUs attempting to acquire the same
class->lock.
This patch shrinks the critical section of the class->lock to reduce lock
contention. By moving the actual page freeing process outside the
class->lock, we can improve the concurrency performance of zs_free().
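Schematically (simplified; stats updates and the trylock/deferred-free
fallback omitted):

  before:  spin_lock(&class->lock)
               remove_zspage()
               __free_zspage()        /* __free_page() takes zone->lock
                                         while class->lock is held */
           spin_unlock(&class->lock)

  after:   spin_lock(&class->lock)
               remove_zspage()        /* bookkeeping only */
           spin_unlock(&class->lock)
           __free_zspage_lockless()   /* zone->lock acquired with
                                         class->lock already dropped */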
Testing on the RADXA O6 platform shows that with 12 CPUs concurrently
performing zs_free() operations, the execution time is reduced by 20%.
Signed-off-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
Signed-off-by: Wenchao Hao <haowenchao@xiaomi.com>
---
mm/zsmalloc.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 47ec0414ce9e..4b01fb215b19 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -880,13 +880,10 @@ static int trylock_zspage(struct zspage *zspage)
return 0;
}
-static void __free_zspage(struct zs_pool *pool, struct size_class *class,
- struct zspage *zspage)
+static inline void __free_zspage_lockless(struct zs_pool *pool, struct zspage *zspage)
{
struct zpdesc *zpdesc, *next;
- assert_spin_locked(&class->lock);
-
VM_BUG_ON(get_zspage_inuse(zspage));
VM_BUG_ON(zspage->fullness != ZS_INUSE_RATIO_0);
@@ -902,7 +899,13 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
} while (zpdesc != NULL);
cache_free_zspage(zspage);
+}
+static void __free_zspage(struct zs_pool *pool, struct size_class *class,
+ struct zspage *zspage)
+{
+ assert_spin_locked(&class->lock);
+ __free_zspage_lockless(pool, zspage);
class_stat_sub(class, ZS_OBJS_ALLOCATED, class->objs_per_zspage);
atomic_long_sub(class->pages_per_zspage, &pool->pages_allocated);
}
@@ -1467,6 +1470,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
unsigned long obj;
struct size_class *class;
int fullness;
+ struct zspage *zspage_to_free = NULL;
if (IS_ERR_OR_NULL((void *)handle))
return;
@@ -1502,10 +1506,22 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
obj_free(class->size, obj);
fullness = fix_fullness_group(class, zspage);
- if (fullness == ZS_INUSE_RATIO_0)
- free_zspage(pool, class, zspage);
+ if (fullness == ZS_INUSE_RATIO_0) {
+ if (trylock_zspage(zspage)) {
+ remove_zspage(class, zspage);
+ class_stat_sub(class, ZS_OBJS_ALLOCATED,
+ class->objs_per_zspage);
+ zspage_to_free = zspage;
+ } else {
+ kick_deferred_free(pool);
+ }
+ }
spin_unlock(&class->lock);
+
+ if (zspage_to_free) {
+ __free_zspage_lockless(pool, zspage_to_free);
+ atomic_long_sub(class->pages_per_zspage, &pool->pages_allocated);
+ }
cache_free_handle(handle);
}
EXPORT_SYMBOL_GPL(zs_free);
--
2.34.1