Linux-mm Archive on lore.kernel.org
* [RFC PATCH 0/3] mm/zsmalloc: reduce lock contention in zs_free()
@ 2026-05-08  6:19 Wenchao Hao
  2026-05-08  6:19 ` [RFC PATCH 1/3] mm/zsmalloc: encode class index in obj value for lockless class lookup Wenchao Hao
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Wenchao Hao @ 2026-05-08  6:19 UTC (permalink / raw)
  To: Albert Ou, Alexandre Ghiti, Andrew Morton, Barry Song,
	linux-kernel, linux-mm, linux-riscv, Minchan Kim, Palmer Dabbelt,
	Paul Walmsley, Sergey Senozhatsky
  Cc: Wenchao Hao, Wenchao Hao

Swap freeing can be expensive when unmapping a VMA containing many swap
entries. This has been reported to significantly delay memory reclamation
during Android's low-memory killing, especially when multiple processes
are terminated to free memory, with slot_free() accounting for more than
80% of the total cost of freeing swap entries.

Lock contention in zs_free() is a major contributor to this cost:
  - pool->lock (rwlock) read-side atomic operations become expensive
    under multi-process concurrency due to cacheline bouncing
  - class->lock held across zspage page freeing causes zone->lock
    contention to propagate back

This series addresses both issues:

Patch 1: Encode class_idx in obj value
  On 64-bit systems, OBJ_INDEX_BITS is over-provisioned. We split it
  into class_idx + obj_idx so that zs_free() can determine the correct
  size_class from the obj value alone, without needing pool->lock.

Patch 2: Remove pool->lock from zs_free()
  With class_idx available from the obj encoding, zs_free() acquires
  only class->lock (re-reading obj for a stable PFN). This eliminates
  rwlock read-side contention between concurrent zs_free() calls and
  page migration/compaction.

Patch 3: Drop class->lock before freeing zspage pages
  Move the actual page release (free_zspage) outside class->lock. The
  bookkeeping is done under the lock, but buddy allocator interaction
  (zone->lock) no longer nests inside class->lock.

Performance results:

Test: each process independently mmaps 256MB, writes data, calls
madvise(MADV_PAGEOUT) to swap it out via zram (lzo-rle), then all
processes munmap() concurrently.

Raspberry Pi 4B (4-core ARM64 Cortex-A72):

  mode        Base       Patched     Speedup
  single      59.0ms     56.0ms      1.05x
  multi 2p    94.6ms     66.7ms      1.42x
  multi 4p    202.9ms    110.6ms     1.83x

x86 (20-core Intel i7-12700):

  mode        Base       Patched     Speedup
  single      11.7ms     9.8ms       1.19x
  multi 2p    24.1ms     17.2ms      1.40x
  multi 4p    63.0ms     45.3ms      1.39x

The single-process case shows only a modest improvement. With multiple
processes, every read_lock()/read_unlock() pair atomically modifies the
shared rwlock reader count, and the cost of those atomic operations
grows as more CPUs contend for the same cacheline. Eliminating
pool->lock removes this overhead entirely.

Patches 1-2 take effect only on 64-bit systems (gated by
ZS_OBJ_CLASS_IDX); 32-bit systems fall back to the original pool->lock
path. Patch 3 benefits all architectures.

Wenchao Hao (2):
  mm/zsmalloc: encode class index in obj value for lockless class lookup
  mm/zsmalloc: remove pool->lock from zs_free on 64-bit systems

Xueyuan Chen (1):
  mm/zsmalloc: drop class lock before freeing zspage

 mm/zsmalloc.c | 146 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 131 insertions(+), 15 deletions(-)

--
2.34.1




* [RFC PATCH 1/3] mm/zsmalloc: encode class index in obj value for lockless class lookup
From: Wenchao Hao @ 2026-05-08  6:19 UTC (permalink / raw)
  To: Albert Ou, Alexandre Ghiti, Andrew Morton, Barry Song,
	linux-kernel, linux-mm, linux-riscv, Minchan Kim, Palmer Dabbelt,
	Paul Walmsley, Sergey Senozhatsky
  Cc: Wenchao Hao, Wenchao Hao

Encode the size class index (class_idx) into the obj value so that
zs_free() can determine the correct size_class without dereferencing
the handle->obj->PFN->zpdesc->zspage->class chain under pool->lock.

OBJ_INDEX_BITS is over-provisioned on 64-bit systems.  For example on
arm64 with default chain_size=8: OBJ_INDEX_BITS=24 but only 10 bits
are actually needed for obj_idx.  We dynamically compute OBJ_CLASS_BITS
as ilog2(ZS_SIZE_CLASSES - 1) + 1 (8 bits for 4K pages, 9 for 64K)
and verify at compile time via static_assert that the three fields
(PFN + class_idx + obj_idx) fit within BITS_PER_LONG.

This encoding is gated by ZS_OBJ_CLASS_IDX, defined only when
BITS_PER_LONG >= 64.  On 32-bit systems the bits do not fit, so
the feature is disabled and the original OBJ_INDEX layout is preserved.

Split OBJ_INDEX into class_idx and obj_idx:

  obj: [PFN | class_idx | obj_idx]
       [_PFN_BITS | OBJ_CLASS_BITS | OBJ_IDX_BITS]

class_idx is invariant across page migration (only PFN changes), so a
lockless read always yields a valid class_idx.

Update obj_to_location(), location_to_obj() and callers accordingly.
Add obj_to_class_idx() helper.  Adjust ZS_MIN_ALLOC_SIZE to use
OBJ_IDX_BITS.

Signed-off-by: Wenchao Hao <haowenchao@xiaomi.com>
---
 mm/zsmalloc.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 88 insertions(+), 7 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 63128ddb7959..bccadf0a27f2 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -96,11 +96,74 @@
 #define CLASS_BITS	8
 #define MAGIC_VAL_BITS	8
 
+/*
+ * Optionally encode the size class index in the obj value so that
+ * zs_free() can look up the correct class without holding pool->lock.
+ *
+ * Rather than fixing a hard CLASS_BITS constant for the class_idx field,
+ * we compute the minimum bits needed from the actual number of size classes
+ * and the actual maximum obj_idx, then check whether they all fit:
+ *
+ *   _PFN_BITS + OBJ_CLASS_BITS_NEEDED + OBJ_IDX_BITS_NEEDED <= BITS_PER_LONG
+ *
+ * This naturally handles all architectures and PAGE_SIZE configurations:
+ *
+ *  - 32-bit: BITS_PER_LONG=32, sum easily exceeds 32 --> disabled.
+ *  - powerpc64 64K pages: ZS_SIZE_CLASSES=257 --> OBJ_CLASS_BITS_NEEDED=9,
+ *    but the sum still fits in 64 bits --> enabled.
+ *  - riscv64 Sv57: _PFN_BITS=44, tight but still fits --> enabled.
+ *
+ * When enabled, obj layout is:
+ *
+ *  63                                       0
+ *  +-----------+--------------+-------------+
+ *  |    PFN    |  class_idx   |   obj_idx   |
+ *  | _PFN_BITS |OBJ_CLASS_BITS| OBJ_IDX_BITS|
+ *  +-----------+--------------+-------------+
+ *
+ * Migration only rewrites PFN; class_idx and obj_idx are invariant,
+ * so a lockless read of obj always yields a valid class_idx.
+ */
+
+#if BITS_PER_LONG >= 64
+#define ZS_OBJ_CLASS_IDX
+#endif
+
+#ifdef ZS_OBJ_CLASS_IDX
+
+/* ZS_SIZE_CLASSES computed conservatively with original OBJ_INDEX_BITS */
+#define ZS_MIN_ALLOC_SIZE_FULL \
+	MAX(32, (CONFIG_ZSMALLOC_CHAIN_SIZE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+#define ZS_SIZE_CLASSES_FULL \
+	(DIV_ROUND_UP(PAGE_SIZE - ZS_MIN_ALLOC_SIZE_FULL, \
+		      PAGE_SIZE >> CLASS_BITS) + 1)
+
+#define ZS_MAX_OBJ_COUNT_FULL \
+	(CONFIG_ZSMALLOC_CHAIN_SIZE * PAGE_SIZE / 32)
+#define OBJ_CLASS_BITS_NEEDED	(ilog2(ZS_SIZE_CLASSES_FULL - 1) + 1)
+#define OBJ_IDX_BITS_NEEDED	(ilog2(ZS_MAX_OBJ_COUNT_FULL - 1) + 1)
+
+static_assert(_PFN_BITS + OBJ_CLASS_BITS_NEEDED + OBJ_IDX_BITS_NEEDED
+	      <= BITS_PER_LONG,
+	"zsmalloc: class_idx + obj_idx + PFN do not fit in obj on this config");
+
+#define OBJ_CLASS_BITS		OBJ_CLASS_BITS_NEEDED
+#define OBJ_IDX_BITS		(OBJ_INDEX_BITS - OBJ_CLASS_BITS)
+#define OBJ_IDX_MASK		((_AC(1, UL) << OBJ_IDX_BITS) - 1)
+#define OBJ_CLASS_MASK		((_AC(1, UL) << OBJ_CLASS_BITS) - 1)
+
+#else /* !ZS_OBJ_CLASS_IDX */
+
+#define OBJ_IDX_BITS		OBJ_INDEX_BITS
+#define OBJ_IDX_MASK		OBJ_INDEX_MASK
+
+#endif /* ZS_OBJ_CLASS_IDX */
+
 #define ZS_MAX_PAGES_PER_ZSPAGE	(_AC(CONFIG_ZSMALLOC_CHAIN_SIZE, UL))
 
 /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
 #define ZS_MIN_ALLOC_SIZE \
-	MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+	MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_IDX_BITS))
 /* each chunk includes extra space to keep handle */
 #define ZS_MAX_ALLOC_SIZE	PAGE_SIZE
 
@@ -722,7 +785,7 @@ static void obj_to_location(unsigned long obj, struct zpdesc **zpdesc,
 				unsigned int *obj_idx)
 {
 	*zpdesc = pfn_zpdesc(obj >> OBJ_INDEX_BITS);
-	*obj_idx = (obj & OBJ_INDEX_MASK);
+	*obj_idx = (obj & OBJ_IDX_MASK);
 }
 
 static void obj_to_zpdesc(unsigned long obj, struct zpdesc **zpdesc)
@@ -730,17 +793,29 @@ static void obj_to_zpdesc(unsigned long obj, struct zpdesc **zpdesc)
 	*zpdesc = pfn_zpdesc(obj >> OBJ_INDEX_BITS);
 }
 
+#ifdef ZS_OBJ_CLASS_IDX
+static unsigned int obj_to_class_idx(unsigned long obj)
+{
+	return (obj >> OBJ_IDX_BITS) & OBJ_CLASS_MASK;
+}
+#endif
+
 /**
- * location_to_obj - get obj value encoded from (<zpdesc>, <obj_idx>)
+ * location_to_obj - encode (<zpdesc>, <obj_idx>, <class_idx>) into obj value
  * @zpdesc: zpdesc object resides in zspage
  * @obj_idx: object index
+ * @class_idx: size class index (used only when ZS_OBJ_CLASS_IDX is defined)
  */
-static unsigned long location_to_obj(struct zpdesc *zpdesc, unsigned int obj_idx)
+static unsigned long location_to_obj(struct zpdesc *zpdesc, unsigned int obj_idx,
+				     unsigned int class_idx)
 {
 	unsigned long obj;
 
 	obj = zpdesc_pfn(zpdesc) << OBJ_INDEX_BITS;
-	obj |= obj_idx & OBJ_INDEX_MASK;
+#ifdef ZS_OBJ_CLASS_IDX
+	obj |= (unsigned long)(class_idx & OBJ_CLASS_MASK) << OBJ_IDX_BITS;
+#endif
+	obj |= obj_idx & OBJ_IDX_MASK;
 
 	return obj;
 }
@@ -1276,7 +1351,7 @@ static unsigned long obj_malloc(struct zs_pool *pool,
 	kunmap_local(vaddr);
 	mod_zspage_inuse(zspage, 1);
 
-	obj = location_to_obj(m_zpdesc, obj);
+	obj = location_to_obj(m_zpdesc, obj, zspage->class);
 	record_obj(handle, obj);
 
 	return obj;
@@ -1762,7 +1837,13 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 
 			old_obj = handle_to_obj(handle);
 			obj_to_location(old_obj, &dummy, &obj_idx);
-			new_obj = (unsigned long)location_to_obj(newzpdesc, obj_idx);
+#ifdef ZS_OBJ_CLASS_IDX
+			new_obj = (unsigned long)location_to_obj(newzpdesc,
+					obj_idx, obj_to_class_idx(old_obj));
+#else
+			new_obj = (unsigned long)location_to_obj(newzpdesc,
+					obj_idx, 0);
+#endif
 			record_obj(handle, new_obj);
 		}
 	}
-- 
2.34.1




* [RFC PATCH 2/3] mm/zsmalloc: remove pool->lock from zs_free on 64-bit systems
From: Wenchao Hao @ 2026-05-08  6:19 UTC (permalink / raw)
  To: Albert Ou, Alexandre Ghiti, Andrew Morton, Barry Song,
	linux-kernel, linux-mm, linux-riscv, Minchan Kim, Palmer Dabbelt,
	Paul Walmsley, Sergey Senozhatsky
  Cc: Wenchao Hao, Wenchao Hao

With class_idx now encoded in the obj value (ZS_OBJ_CLASS_IDX),
zs_free() no longer needs pool->lock to locate the size class on
64-bit systems.

The class_idx is invariant across page migration (only PFN changes),
and 64-bit aligned reads are atomic, so a lockless read of the handle
always yields a valid class_idx.  After acquiring class->lock (which
blocks concurrent migration), the handle is re-read to obtain a stable
PFN for the actual free operation.

This eliminates rwlock read-side contention between zs_free() and page
migration/compaction, improving zs_free() scalability on multi-core
systems.

On 32-bit systems (ZS_OBJ_CLASS_IDX not defined), the original
pool->lock path is preserved.

Signed-off-by: Wenchao Hao <haowenchao@xiaomi.com>
---
 mm/zsmalloc.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index bccadf0a27f2..47ec0414ce9e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -21,6 +21,10 @@
  *	pool->lock
  *	class->lock
  *	zspage->lock
+ *
+ * On 64-bit systems with ZS_OBJ_CLASS_IDX enabled, zs_free() does not
+ * take pool->lock; it extracts class_idx from the obj encoding with a
+ * lockless read, then re-reads obj under class->lock.
  */
 
 #include <linux/module.h>
@@ -1467,10 +1471,24 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	if (IS_ERR_OR_NULL((void *)handle))
 		return;
 
+#ifdef ZS_OBJ_CLASS_IDX
+	/*
+	 * The class_idx encoded in obj is invariant across migration
+	 * (only PFN changes), and the read of *(unsigned long *)handle
+	 * is atomic on 64-bit, so we can determine the correct class
+	 * without holding pool->lock.
+	 */
+	obj = handle_to_obj(handle);
+	class = pool->size_class[obj_to_class_idx(obj)];
+	spin_lock(&class->lock);
 	/*
-	 * The pool->lock protects the race with zpage's migration
-	 * so it's safe to get the page from handle.
+	 * Re-read under class->lock: migration also acquires class->lock,
+	 * so the obj value is now stable and the PFN is valid.
 	 */
+	obj = handle_to_obj(handle);
+	obj_to_zpdesc(obj, &f_zpdesc);
+	zspage = get_zspage(f_zpdesc);
+#else
 	read_lock(&pool->lock);
 	obj = handle_to_obj(handle);
 	obj_to_zpdesc(obj, &f_zpdesc);
@@ -1478,6 +1496,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	class = zspage_class(pool, zspage);
 	spin_lock(&class->lock);
 	read_unlock(&pool->lock);
+#endif
 
 	class_stat_sub(class, ZS_OBJS_INUSE, 1);
 	obj_free(class->size, obj);
-- 
2.34.1




* [RFC PATCH 3/3] mm/zsmalloc: drop class lock before freeing zspage
From: Wenchao Hao @ 2026-05-08  6:19 UTC (permalink / raw)
  To: Albert Ou, Alexandre Ghiti, Andrew Morton, Barry Song,
	linux-kernel, linux-mm, linux-riscv, Minchan Kim, Palmer Dabbelt,
	Paul Walmsley, Sergey Senozhatsky
  Cc: Wenchao Hao, Xueyuan Chen, Wenchao Hao

From: Xueyuan Chen <xueyuan.chen21@gmail.com>

Currently in zs_free(), the class->lock is held until the zspage is
completely freed and the counters are updated. However, freeing pages back
to the buddy allocator requires acquiring the zone lock.

Under heavy memory pressure, zone lock contention can be severe. When this
happens, the CPU holding the class->lock will stall waiting for the zone
lock, thereby blocking all other CPUs attempting to acquire the same
class->lock.

Shrink the class->lock critical section: keep the bookkeeping under the
lock but move the actual page freeing outside it. This reduces lock
contention and improves the concurrency of zs_free().

Testing on the RADXA O6 platform shows that with 12 CPUs concurrently
performing zs_free() operations, the execution time is reduced by 20%.

Signed-off-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
Signed-off-by: Wenchao Hao <haowenchao@xiaomi.com>
---
 mm/zsmalloc.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 47ec0414ce9e..4b01fb215b19 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -880,13 +880,10 @@ static int trylock_zspage(struct zspage *zspage)
 	return 0;
 }
 
-static void __free_zspage(struct zs_pool *pool, struct size_class *class,
-				struct zspage *zspage)
+static inline void __free_zspage_lockless(struct zs_pool *pool, struct zspage *zspage)
 {
 	struct zpdesc *zpdesc, *next;
 
-	assert_spin_locked(&class->lock);
-
 	VM_BUG_ON(get_zspage_inuse(zspage));
 	VM_BUG_ON(zspage->fullness != ZS_INUSE_RATIO_0);
 
@@ -902,7 +899,13 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 	} while (zpdesc != NULL);
 
 	cache_free_zspage(zspage);
+}
 
+static void __free_zspage(struct zs_pool *pool, struct size_class *class,
+				struct zspage *zspage)
+{
+	assert_spin_locked(&class->lock);
+	__free_zspage_lockless(pool, zspage);
 	class_stat_sub(class, ZS_OBJS_ALLOCATED, class->objs_per_zspage);
 	atomic_long_sub(class->pages_per_zspage, &pool->pages_allocated);
 }
@@ -1467,6 +1470,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	unsigned long obj;
 	struct size_class *class;
 	int fullness;
+	struct zspage *zspage_to_free = NULL;
 
 	if (IS_ERR_OR_NULL((void *)handle))
 		return;
@@ -1502,10 +1506,22 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	obj_free(class->size, obj);
 
 	fullness = fix_fullness_group(class, zspage);
-	if (fullness == ZS_INUSE_RATIO_0)
-		free_zspage(pool, class, zspage);
+	if (fullness == ZS_INUSE_RATIO_0) {
+		if (trylock_zspage(zspage)) {
+			remove_zspage(class, zspage);
+			class_stat_sub(class, ZS_OBJS_ALLOCATED,
+				class->objs_per_zspage);
+			zspage_to_free = zspage;
+		} else
+			kick_deferred_free(pool);
+	}
 
 	spin_unlock(&class->lock);
+
+	if (likely(zspage_to_free)) {
+		__free_zspage_lockless(pool, zspage_to_free);
+		atomic_long_sub(class->pages_per_zspage, &pool->pages_allocated);
+	}
 	cache_free_handle(handle);
 }
 EXPORT_SYMBOL_GPL(zs_free);
-- 
2.34.1



