Date: Sat, 27 Jun 2026 21:36:07 -0700
To:
    mm-commits@vger.kernel.org,xueyuan.chen21@gmail.com,senozhatsky@chromium.org,nphamcs@gmail.com,minchan@kernel.org,joshua.hahnjy@gmail.com,baohua@kernel.org,haowenchao@xiaomi.com,akpm@linux-foundation.org
From: Andrew Morton
Subject: + mm-zsmalloc-encode-class-index-in-obj-value-for-lockless-class-lookup.patch added to mm-new branch
Message-Id: <20260628043607.C15561F000E9@smtp.kernel.org>
Precedence: bulk
X-Mailing-List: mm-commits@vger.kernel.org

The patch titled
     Subject: mm/zsmalloc: encode class index in obj value for lockless class lookup
has been added to the -mm mm-new branch.  Its filename is
     mm-zsmalloc-encode-class-index-in-obj-value-for-lockless-class-lookup.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-zsmalloc-encode-class-index-in-obj-value-for-lockless-class-lookup.patch

This patch will later appear in the mm-new branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews.  Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fix up patches in mm-new.
The mm-new branch of mm.git is not included in linux-next.

If a few days of testing in mm-new is successful, the patch will be moved
into mm.git's mm-unstable branch, which is included in linux-next.

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days.

------------------------------------------------------
From: Wenchao Hao
Subject: mm/zsmalloc: encode class index in obj value for lockless class lookup
Date: Fri, 26 Jun 2026 09:50:00 +0800

Patch series "mm/zsmalloc: reduce lock contention in zs_free()", v6.

This series reduces lock contention in zs_free(), which dominates the
unmap path under memory pressure on Android (LMK kills) and on x86
servers running zswap-heavy workloads.

The current zs_free() takes pool->lock (an rwlock, read side) just to
look up the size_class for a handle, then takes class->lock and holds it
across __free_zspage(), which can call into the buddy allocator and
acquire zone->lock.  Two costs follow:

 * pool->lock reader-counter cacheline bouncing among concurrent
   zs_free() callers.
 * class->lock held across folio_put(), so any zone->lock wait fans
   out to every other zs_free() on the same class.

The series tackles both:

 Patch 1: encode the size_class index into obj alongside PFN and
          obj_idx, so zs_free() can locate the class without pool->lock.
 Patch 2: drop pool->lock from zs_free() on 64-bit; 32-bit is unchanged.
 Patch 3: move zspage page-freeing out of class->lock.
 Patch 4: document the three free_zspage helper variants that result
          from the split in patch 3.
Performance results:

Test: each process independently mmaps 256MB, writes data, and calls
madvise(MADV_PAGEOUT) to swap it out via zram (lzo-rle); the processes
then munmap concurrently.

Raspberry Pi 4B (4-core ARM64 Cortex-A72):

  mode        Base       Patched    Speedup
  single      59.0ms     56.0ms     1.05x
  multi 2p    94.6ms     66.7ms     1.42x
  multi 4p    202.9ms    110.6ms    1.83x

x86 (20-core Intel i7-12700, 16 concurrent processes):

  mode        Base       Patched    Speedup
  single      11.7ms     9.8ms      1.19x
  multi 2p    24.1ms     17.2ms     1.40x
  multi 4p    63.0ms     45.3ms     1.39x

This patch (of 4):

Encode the size_class index (class_idx) into the obj value so that
zs_free() can determine the correct size_class without dereferencing the
handle->obj->PFN->zpdesc->zspage->class chain under pool->lock.

class_idx is invariant across page migration (only the PFN is
rewritten), so a lockless read of obj always yields a valid class_idx.

Where obj has more bits below the PFN field than obj_idx alone needs,
split that space into class_idx and obj_idx subfields:

  |<-- _PFN_BITS -->|<-- ZS_OBJ_CLASS_BITS -->|<-- ZS_OBJ_IDX_BITS -->|
  +-----------------+-------------------------+-----------------------+
  |       PFN       |        class_idx        |        obj_idx        |
  +-----------------+-------------------------+-----------------------+
  MSB               ^                                               LSB
                    |
                    +-- ZS_OBJ_PFN_SHIFT

The macro layout changes as follows:

  Before            After               Meaning
  ----------------  ------------------  ----------------------------
  OBJ_INDEX_BITS    ZS_OBJ_IDX_BITS     width of obj_idx subfield
  OBJ_INDEX_MASK    ZS_OBJ_IDX_MASK     mask of obj_idx subfield
  (n/a)             ZS_OBJ_CLASS_BITS   width of class_idx subfield
  (n/a)             ZS_OBJ_CLASS_MASK   mask of class_idx subfield
  (n/a)             ZS_OBJ_PFN_SHIFT    bit offset of PFN in obj

ZS_OBJ_CLASS_BITS folds to 0 (and the layout collapses to [PFN | obj_idx])
when obj has no spare bits, i.e. on 32-bit or on 64-bit fallback paths
where MAX_POSSIBLE_PHYSMEM_BITS == BITS_PER_LONG (e.g. UML); zs_free()
then falls back to pool->lock.
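A minimal user-space sketch of the [PFN | class_idx | obj_idx] packing, and of why a PFN-only rewrite leaves class_idx intact, may help when reviewing the layout above. It is not the zsmalloc code: the sk_* names are hypothetical, uint64_t stands in for a 64-bit unsigned long, and the field widths assume PAGE_SHIFT = 12, MAX_POSSIBLE_PHYSMEM_BITS = 52 and CLASS_BITS = 8, giving a 40-bit PFN, a 9-bit class field and a 15-bit obj_idx field.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed, illustrative configuration (not taken from any real build):
 * 64-bit word, 4 KiB pages, 52 physical-address bits, CLASS_BITS = 8.
 */
#define SK_BITS_PER_LONG 64
#define SK_PAGE_SHIFT    12
#define SK_PHYSMEM_BITS  52
#define SK_PFN_BITS      (SK_PHYSMEM_BITS - SK_PAGE_SHIFT)   /* 40 */
#define SK_PFN_SHIFT     (SK_BITS_PER_LONG - SK_PFN_BITS)    /* 24 */
#define SK_CLASS_BITS    (8 + 1)                             /* CLASS_BITS + 1 */
#define SK_IDX_BITS      (SK_PFN_SHIFT - SK_CLASS_BITS)      /* 15 */
#define SK_CLASS_MASK    ((UINT64_C(1) << SK_CLASS_BITS) - 1)
#define SK_IDX_MASK      ((UINT64_C(1) << SK_IDX_BITS) - 1)

/* Pack (pfn, class_idx, obj_idx) as [PFN | class_idx | obj_idx]. */
uint64_t sk_encode(uint64_t pfn, unsigned int class_idx, unsigned int obj_idx)
{
	/* Each field must fit its slot, or it would spill into a neighbour. */
	assert((pfn >> SK_PFN_BITS) == 0);
	assert(class_idx <= SK_CLASS_MASK && obj_idx <= SK_IDX_MASK);

	return (pfn << SK_PFN_SHIFT) |
	       ((uint64_t)class_idx << SK_IDX_BITS) |
	       (uint64_t)obj_idx;
}

/* Decoders: each field is a shift followed by a mask. */
uint64_t sk_pfn(uint64_t obj)
{
	return obj >> SK_PFN_SHIFT;
}

unsigned int sk_class_idx(uint64_t obj)
{
	return (unsigned int)((obj >> SK_IDX_BITS) & SK_CLASS_MASK);
}

unsigned int sk_obj_idx(uint64_t obj)
{
	return (unsigned int)(obj & SK_IDX_MASK);
}

/*
 * Models the migration step: only the PFN changes, while class_idx and
 * obj_idx are copied from the old value. Whether a reader sees the old
 * or the new obj, it decodes the same class_idx.
 */
uint64_t sk_migrate(uint64_t old_obj, uint64_t new_pfn)
{
	return sk_encode(new_pfn, sk_class_idx(old_obj), sk_obj_idx(old_obj));
}
```

For example, sk_encode(0x12345, 42, 7) decodes back to PFN 0x12345, class 42 and index 7, and sk_migrate() of that value to PFN 0xabcde still decodes class 42 and index 7. In the patch itself, obj_malloc() supplies zspage->class when the value is first built, and zs_page_migrate() passes obj_to_class_idx(old_obj) into location_to_obj().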
Link: https://lore.kernel.org/20260626015003.2965881-1-haowenchao22@gmail.com
Link: https://lore.kernel.org/20260626015003.2965881-2-haowenchao22@gmail.com
Signed-off-by: Wenchao Hao
Reviewed-by: Nhat Pham
Cc: Barry Song
Cc: Joshua Hahn
Cc: Minchan Kim
Cc: Sergey Senozhatsky
Cc: Xueyuan Chen
Signed-off-by: Andrew Morton
---

 mm/zsmalloc.c |  105 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 91 insertions(+), 14 deletions(-)

--- a/mm/zsmalloc.c~mm-zsmalloc-encode-class-index-in-obj-value-for-lockless-class-lookup
+++ a/mm/zsmalloc.c
@@ -67,8 +67,8 @@
 #define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
 #else
 /*
- * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
- * be PAGE_SHIFT
+ * If this definition of MAX_PHYSMEM_BITS is used, ZS_OBJ_PFN_SHIFT will
+ * just be PAGE_SHIFT
  */
 #define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
 #endif
@@ -88,8 +88,23 @@
 #define OBJ_TAG_BITS 1
 #define OBJ_TAG_MASK OBJ_ALLOCATED_TAG
 
-#define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS)
-#define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
+/*
+ * obj is encoded as [PFN | class_idx | obj_idx] within an unsigned long:
+ *
+ * |<-- _PFN_BITS -->|<-- ZS_OBJ_CLASS_BITS -->|<-- ZS_OBJ_IDX_BITS -->|
+ * +-----------------+-------------------------+-----------------------+
+ * |       PFN       |        class_idx        |        obj_idx        |
+ * +-----------------+-------------------------+-----------------------+
+ * MSB               ^                                               LSB
+ *                   |
+ *                   +-- ZS_OBJ_PFN_SHIFT
+ *
+ * Encoding class_idx into obj lets zs_free() locate the size_class
+ * without holding pool->lock; class_idx is invariant across page
+ * migration (only PFN changes), so a lockless read of the obj value
+ * always yields a valid class_idx.
+ */
+#define ZS_OBJ_PFN_SHIFT (BITS_PER_LONG - _PFN_BITS)
 
 #define HUGE_BITS 1
 #define FULLNESS_BITS 4
@@ -98,9 +113,61 @@
 
 #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(CONFIG_ZSMALLOC_CHAIN_SIZE, UL))
 
+/*
+ * Bits to index a page within a zspage = ceil(log2(ZS_MAX_PAGES_PER_ZSPAGE)).
+ * Computed at preprocessor time, for use in #if below. Kconfig
+ * restricts ZSMALLOC_CHAIN_SIZE to [4, 16].
+ */
+#if ZS_MAX_PAGES_PER_ZSPAGE <= 4
+#define ZS_PAGES_PER_ZSPAGE_BITS 2
+#elif ZS_MAX_PAGES_PER_ZSPAGE <= 8
+#define ZS_PAGES_PER_ZSPAGE_BITS 3
+#elif ZS_MAX_PAGES_PER_ZSPAGE <= 16
+#define ZS_PAGES_PER_ZSPAGE_BITS 4
+#else
+#error "ZSMALLOC_CHAIN_SIZE out of expected range [4,16]"
+#endif
+
+/*
+ * Bits to index an object within a single PAGE_SIZE at the smallest
+ * possible object size: log2(PAGE_SIZE / 32) = PAGE_SHIFT - 5.
+ * 32 is the hard floor of ZS_MIN_ALLOC_SIZE.
+ */
+#define ZS_OBJS_PER_PAGE_BITS (PAGE_SHIFT - 5)
+
+/*
+ * Bits to index any object in the densest possible zspage. Below this,
+ * ZS_MIN_ALLOC_SIZE is auto-raised by the MAX(32, ...) formula -- still
+ * correct, but objects are coarser.
+ */
+#define ZS_OBJS_PER_ZSPAGE_BITS \
+	(ZS_PAGES_PER_ZSPAGE_BITS + ZS_OBJS_PER_PAGE_BITS)
+
+/*
+ * Encode class_idx only when obj has spare bits; otherwise
+ * ZS_OBJ_CLASS_BITS folds to 0 (32-bit, or 64-bit UML/fallback).
+ */
+#if BITS_PER_LONG >= 64 && \
+    ZS_OBJ_PFN_SHIFT >= (CLASS_BITS + 1) + ZS_OBJS_PER_ZSPAGE_BITS
+#define ZS_OBJ_CLASS_BITS (CLASS_BITS + 1)
+#else
+#define ZS_OBJ_CLASS_BITS 0
+#endif
+#define ZS_OBJ_CLASS_MASK ((_AC(1, UL) << ZS_OBJ_CLASS_BITS) - 1)
+
+#define ZS_OBJ_IDX_BITS (ZS_OBJ_PFN_SHIFT - ZS_OBJ_CLASS_BITS)
+#define ZS_OBJ_IDX_MASK ((_AC(1, UL) << ZS_OBJ_IDX_BITS) - 1)
+
+/*
+ * Belt-and-suspenders: the #if above already guarantees this when
+ * class_idx is enabled. Catches future tweaks that bypass it.
+ */
+static_assert(ZS_OBJ_IDX_BITS >= ZS_PAGES_PER_ZSPAGE_BITS,
+	      "zsmalloc: ZS_MIN_ALLOC_SIZE would exceed ZS_MAX_ALLOC_SIZE");
+
 /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
 #define ZS_MIN_ALLOC_SIZE \
-	MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+	MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> ZS_OBJ_IDX_BITS))
 /* each chunk includes extra space to keep handle */
 #define ZS_MAX_ALLOC_SIZE PAGE_SIZE
 
@@ -720,26 +787,35 @@ static struct zpdesc *get_next_zpdesc(st
 static void obj_to_location(unsigned long obj, struct zpdesc **zpdesc,
 			    unsigned int *obj_idx)
 {
-	*zpdesc = pfn_zpdesc(obj >> OBJ_INDEX_BITS);
-	*obj_idx = (obj & OBJ_INDEX_MASK);
+	*zpdesc = pfn_zpdesc(obj >> ZS_OBJ_PFN_SHIFT);
+	*obj_idx = (obj & ZS_OBJ_IDX_MASK);
 }
 
 static void obj_to_zpdesc(unsigned long obj, struct zpdesc **zpdesc)
 {
-	*zpdesc = pfn_zpdesc(obj >> OBJ_INDEX_BITS);
+	*zpdesc = pfn_zpdesc(obj >> ZS_OBJ_PFN_SHIFT);
+}
+
+/* Folds to 0 when ZS_OBJ_CLASS_BITS == 0; no ifdef needed at callers.
+ */
+static unsigned int obj_to_class_idx(unsigned long obj)
+{
+	return (obj >> ZS_OBJ_IDX_BITS) & ZS_OBJ_CLASS_MASK;
 }
 
 /**
- * location_to_obj - get obj value encoded from (, )
+ * location_to_obj - encode (, , ) into obj value
  * @zpdesc: zpdesc object resides in zspage
  * @obj_idx: object index
+ * @class_idx: size class index; ignored when ZS_OBJ_CLASS_BITS == 0
  */
-static unsigned long location_to_obj(struct zpdesc *zpdesc, unsigned int obj_idx)
+static unsigned long location_to_obj(struct zpdesc *zpdesc, unsigned int obj_idx,
+				     unsigned int class_idx)
 {
 	unsigned long obj;
 
-	obj = zpdesc_pfn(zpdesc) << OBJ_INDEX_BITS;
-	obj |= obj_idx & OBJ_INDEX_MASK;
+	obj = zpdesc_pfn(zpdesc) << ZS_OBJ_PFN_SHIFT;
+	obj |= (unsigned long)(class_idx & ZS_OBJ_CLASS_MASK) << ZS_OBJ_IDX_BITS;
+	obj |= obj_idx & ZS_OBJ_IDX_MASK;
 
 	return obj;
 }
@@ -1275,7 +1351,7 @@ static unsigned long obj_malloc(struct z
 	kunmap_local(vaddr);
 	mod_zspage_inuse(zspage, 1);
 
-	obj = location_to_obj(m_zpdesc, obj);
+	obj = location_to_obj(m_zpdesc, obj, zspage->class);
 	record_obj(handle, obj);
 
 	return obj;
@@ -1761,7 +1837,8 @@ static int zs_page_migrate(struct page *
 
 			old_obj = handle_to_obj(handle);
 			obj_to_location(old_obj, &dummy, &obj_idx);
-			new_obj = (unsigned long)location_to_obj(newzpdesc, obj_idx);
+			new_obj = location_to_obj(newzpdesc, obj_idx,
+						  obj_to_class_idx(old_obj));
 			record_obj(handle, new_obj);
 		}
 	}
_

Patches currently in -mm which might be from haowenchao@xiaomi.com are

mm-zsmalloc-encode-class-index-in-obj-value-for-lockless-class-lookup.patch
mm-zsmalloc-drop-pool-lock-from-zs_free-on-64-bit-systems.patch
mm-zsmalloc-document-free_zspage-helper-variants.patch
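The #if chain in the patch decides at preprocessing time whether class_idx gets its own field. As a reading aid, the sketch below re-evaluates the same arithmetic as an ordinary C function so it can be tried for a few configurations. struct budget_layout and budget_eval() are hypothetical names, the inputs are assumed values rather than any particular architecture's, and CLASS_BITS is passed in as a parameter (8 in the examples) instead of being taken from mm/zsmalloc.c.

```c
#include <assert.h>

/* Field widths as the patch's macros would compute them (illustrative). */
struct budget_layout {
	int pfn_shift;                /* ZS_OBJ_PFN_SHIFT */
	int class_bits;               /* ZS_OBJ_CLASS_BITS, 0 when folded */
	int idx_bits;                 /* ZS_OBJ_IDX_BITS */
	unsigned long long min_alloc; /* ZS_MIN_ALLOC_SIZE */
	int assert_ok;                /* the static_assert condition */
};

/* ceil(log2(chain_size)) over the Kconfig range [4, 16], as in the #if chain. */
static int budget_pages_bits(int chain_size)
{
	assert(chain_size >= 4 && chain_size <= 16);
	if (chain_size <= 4)
		return 2;
	if (chain_size <= 8)
		return 3;
	return 4;
}

struct budget_layout budget_eval(int bits_per_long, int page_shift,
				 int physmem_bits, int class_bits_base,
				 int chain_size)
{
	struct budget_layout l;
	int pages_bits = budget_pages_bits(chain_size);
	/* ZS_OBJS_PER_ZSPAGE_BITS = pages bits + (PAGE_SHIFT - 5) */
	int objs_per_zspage_bits = pages_bits + (page_shift - 5);
	unsigned long long span;

	/* ZS_OBJ_PFN_SHIFT = BITS_PER_LONG - _PFN_BITS */
	l.pfn_shift = bits_per_long - (physmem_bits - page_shift);

	/* A class field only on 64-bit, and only if enough bits remain. */
	if (bits_per_long >= 64 &&
	    l.pfn_shift >= (class_bits_base + 1) + objs_per_zspage_bits)
		l.class_bits = class_bits_base + 1;
	else
		l.class_bits = 0;

	l.idx_bits = l.pfn_shift - l.class_bits;
	l.assert_ok = l.idx_bits >= pages_bits;

	/* MAX(32, chain_size << PAGE_SHIFT >> ZS_OBJ_IDX_BITS) */
	span = ((unsigned long long)chain_size << page_shift) >> l.idx_bits;
	l.min_alloc = span > 32 ? span : 32;
	return l;
}
```

With 64-bit words, PAGE_SHIFT 12, 52 physical-address bits and a chain size of 8, budget_eval() yields a 24-bit ZS_OBJ_PFN_SHIFT split into 9 class bits and 15 obj_idx bits, with ZS_MIN_ALLOC_SIZE staying at 32. If MAX_POSSIBLE_PHYSMEM_BITS equals BITS_PER_LONG instead, the shift collapses to PAGE_SHIFT (12), as the comment in the first hunk says; no class field is carved out, which is the fallback case where zs_free() keeps using pool->lock.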