From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
Date: Sat, 12 Apr 2025 23:25:16 +0400
From: Igor Belousov <igor.b@beldev.am>
To: Vitaly Wool
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, Nhat Pham, Shakeel Butt, Johannes Weiner
Subject: Re: [PATCH v4] mm: add zblock allocator
In-Reply-To: <20250412154207.2152667-1-vitaly.wool@konsulko.se>
References: <20250412154207.2152667-1-vitaly.wool@konsulko.se>
Message-ID: <7185823ee82bb476cf86ea82a6f70a85@beldev.am>
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit
On 2025-04-12 19:42, Vitaly Wool wrote:
> zblock is a special purpose allocator for storing compressed pages.
> It stores an integer number of same-size objects per block. These
> blocks consist of several physical pages (2^n, i.e. 1/2/4/8).
>
> With zblock, it is possible to densely arrange objects of various sizes,
> resulting in low internal fragmentation. This allocator also tries to
> fill incomplete blocks instead of adding new ones, in many cases
> providing a compression ratio comparable to zsmalloc's.
>
> zblock is also in most cases superior to zsmalloc with regard to
> average performance and worst execution times, thus allowing for better
> response time and real-time characteristics of the whole system.
>
> High memory and page migration are currently not supported by zblock.
>
> Signed-off-by: Vitaly Wool
> Signed-off-by: Igor Belousov

Tested-by: Igor Belousov <igor.b@beldev.am>

> ---
> v3: https://patchwork.kernel.org/project/linux-mm/patch/20250408125211.1611879-1-vitaly.wool@konsulko.se/
> Changes since v3:
> - rebased and tested against the latest -mm tree
> - the block descriptors table was updated for better compression ratio
> - fixed the bug with wrong SLOT_BITS value
> - slot search moved to find_and_claim_block()
>
> Test results (zstd compressor, 8-core Ryzen 9 VM, make bzImage):
> - zblock:
>     real  6m52.621s
>     user  33m41.771s
>     sys   6m28.825s
>     Zswap:    162328 kB
>     Zswapped: 754468 kB
>     zswpin  93851
>     zswpout 542481
>     zswpwb  935
> - zsmalloc:
>     real  7m4.355s
>     user  34m37.538s
>     sys   6m22.086s
>     zswpin  101243
>     zswpout 448217
>     zswpwb  640
>     Zswap:    175704 kB
>     Zswapped: 778692 kB
>
>  Documentation/mm/zblock.rst |  24 ++
>  MAINTAINERS                 |   7 +
>  mm/Kconfig                  |  12 +
>  mm/Makefile                 |   1 +
>  mm/zblock.c                 | 443 ++++++++++++++++++++++++++++++++++++
>  mm/zblock.h                 | 176 ++++++++++++++
>  6 files changed, 663 insertions(+)
>  create mode 100644 Documentation/mm/zblock.rst
>  create mode 100644 mm/zblock.c
>  create mode 100644 mm/zblock.h
>
> diff --git a/Documentation/mm/zblock.rst b/Documentation/mm/zblock.rst
> new file mode 100644
> index 000000000000..9751434d0b76
> --- /dev/null
> +++ b/Documentation/mm/zblock.rst
> @@ -0,0 +1,24 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +======
> +zblock
> +======
> +
> +zblock is a special purpose allocator for storing compressed pages.
> +It stores an integer number of compressed objects per block. These
> +blocks consist of several physical pages (2^n, i.e. 1/2/4/8).
> +
> +With zblock, it is possible to densely arrange objects of various sizes,
> +resulting in low internal fragmentation. This allocator also tries to
> +fill incomplete blocks instead of adding new ones, in many cases
> +providing a compression ratio substantially higher than z3fold's and
> +zbud's (though lower than zsmalloc's).
> +
> +zblock does not require an MMU to operate and is also superior to zsmalloc
> +with regard to average performance and worst execution times, thus
> +allowing for better response time and real-time characteristics of the
> +whole system.
> +
> +E.g. on a series of stress-ng tests run on a Raspberry Pi 5, we get a
> +5-10% higher value for bogo ops/s in a zblock/zsmalloc comparison.
> +
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 96b827049501..46465c986005 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -26640,6 +26640,13 @@
>  F:	Documentation/networking/device_drivers/hamradio/z8530drv.rst
>  F:	drivers/net/hamradio/*scc.c
>  F:	drivers/net/hamradio/z8530.h
>
> +ZBLOCK COMPRESSED SLAB MEMORY ALLOCATOR
> +M:	Vitaly Wool
> +L:	linux-mm@kvack.org
> +S:	Maintained
> +F:	Documentation/mm/zblock.rst
> +F:	mm/zblock.[ch]
> +
>  ZD1211RW WIRELESS DRIVER
>  L:	linux-wireless@vger.kernel.org
>  S:	Orphan
> diff --git a/mm/Kconfig b/mm/Kconfig
> index e113f713b493..5aa1479151ec 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -152,6 +152,18 @@ config ZSWAP_ZPOOL_DEFAULT
>  	default "zsmalloc" if ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
>  	default ""
>
> +config ZBLOCK
> +	tristate "Fast compression allocator with high density"
> +	depends on ZPOOL
> +	help
> +	  A special purpose allocator for storing compressed pages.
> +	  It stores an integer number of same-size compressed objects
> +	  per block. These blocks consist of several physical pages
> +	  (2^n, i.e. 1/2/4/8).
> +
> +	  With zblock, it is possible to densely arrange objects of
> +	  various sizes resulting in low internal fragmentation.
> +
>  config ZSMALLOC
>  	tristate
>  	prompt "N:1 compression allocator (zsmalloc)" if (ZSWAP || ZRAM)
> diff --git a/mm/Makefile b/mm/Makefile
> index e7f6bbf8ae5f..9d7e5b5bb694 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -116,6 +116,7 @@ obj-$(CONFIG_DEBUG_VM_PGTABLE) += debug_vm_pgtable.o
>  obj-$(CONFIG_PAGE_OWNER) += page_owner.o
>  obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
>  obj-$(CONFIG_ZPOOL) += zpool.o
> +obj-$(CONFIG_ZBLOCK) += zblock.o
>  obj-$(CONFIG_ZSMALLOC) += zsmalloc.o
>  obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o
>  obj-$(CONFIG_CMA) += cma.o
> diff --git a/mm/zblock.c b/mm/zblock.c
> new file mode 100644
> index 000000000000..ecc7aeb611af
> --- /dev/null
> +++ b/mm/zblock.c
> @@ -0,0 +1,443 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * zblock.c
> + *
> + * Author: Vitaly Wool
> + * Based on the work from Ananda Badmaev
> + * Copyright (C) 2022-2025, Konsulko AB.
> + *
> + * Zblock is a small object allocator with the intention to serve as a
> + * zpool backend. It operates on page blocks which consist of a number
> + * of physical pages being a power of 2 and stores an integer number of
> + * compressed pages per block, which results in determinism and simplicity.
> + *
> + * zblock doesn't export any API and is meant to be used via the zpool API.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/atomic.h>
> +#include <linux/list.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/preempt.h>
> +#include <linux/rbtree.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/zpool.h>
> +#include "zblock.h"
> +
> +static struct rb_root block_desc_tree = RB_ROOT;
> +
> +/* Encode handle of a particular slot in the pool using metadata */
> +static inline unsigned long metadata_to_handle(struct zblock_block *block,
> +				unsigned int block_type, unsigned int slot)
> +{
> +	return (unsigned long)(block) | (block_type << SLOT_BITS) | slot;
> +}
> +
> +/* Return block, block type and slot in the pool corresponding to handle */
> +static inline struct zblock_block *handle_to_metadata(unsigned long handle,
> +				unsigned int *block_type, unsigned int *slot)
> +{
> +	*block_type = (handle & (PAGE_SIZE - 1)) >> SLOT_BITS;
> +	*slot = handle & SLOT_MASK;
> +	return (struct zblock_block *)(handle & PAGE_MASK);
> +}
> +
> +/*
> + * Find a block with at least one free slot and claim it.
> + * We make sure that the first block, if it exists, will always work.
> + */
> +static inline struct zblock_block *find_and_claim_block(struct block_list *b,
> +				int block_type, unsigned long *handle)
> +{
> +	struct list_head *l = &b->active_list;
> +	unsigned int slot;
> +
> +	if (!list_empty(l)) {
> +		struct zblock_block *z = list_first_entry(l, typeof(*z), link);
> +
> +		if (--z->free_slots == 0)
> +			list_move(&z->link, &b->full_list);
> +		/*
> +		 * There is a slot in the block and we just made sure it would
> +		 * remain.
> +		 * Find that slot and set the busy bit.
> +		 */
> +		for (slot = find_first_zero_bit(z->slot_info,
> +				block_desc[block_type].slots_per_block);
> +		     slot < block_desc[block_type].slots_per_block;
> +		     slot = find_next_zero_bit(z->slot_info,
> +				block_desc[block_type].slots_per_block,
> +				slot)) {
> +			if (!test_and_set_bit(slot, z->slot_info))
> +				break;
> +			barrier();
> +		}
> +
> +		WARN_ON(slot >= block_desc[block_type].slots_per_block);
> +		*handle = metadata_to_handle(z, block_type, slot);
> +		return z;
> +	}
> +	return NULL;
> +}
> +
> +/*
> + * allocate new block and add it to corresponding block list
> + */
> +static struct zblock_block *alloc_block(struct zblock_pool *pool,
> +					int block_type, gfp_t gfp,
> +					unsigned long *handle)
> +{
> +	struct zblock_block *block;
> +	struct block_list *block_list;
> +
> +	block = (void *)__get_free_pages(gfp, block_desc[block_type].order);
> +	if (!block)
> +		return NULL;
> +
> +	block_list = &pool->block_lists[block_type];
> +
> +	/* init block data */
> +	block->free_slots = block_desc[block_type].slots_per_block - 1;
> +	memset(&block->slot_info, 0, sizeof(block->slot_info));
> +	set_bit(0, block->slot_info);
> +	*handle = metadata_to_handle(block, block_type, 0);
> +
> +	spin_lock(&block_list->lock);
> +	list_add(&block->link, &block_list->active_list);
> +	block_list->block_count++;
> +	spin_unlock(&block_list->lock);
> +	return block;
> +}
> +
> +/*****************
> + * API Functions
> + *****************/
> +/**
> + * zblock_create_pool() - create a new zblock pool
> + * @gfp:	gfp flags when allocating the zblock pool structure
> + *
> + * Return: pointer to the new zblock pool or NULL if the metadata allocation
> + * failed.
> + */
> +static struct zblock_pool *zblock_create_pool(gfp_t gfp)
> +{
> +	struct zblock_pool *pool;
> +	struct block_list *block_list;
> +	int i;
> +
> +	pool = kmalloc(sizeof(struct zblock_pool), gfp);
> +	if (!pool)
> +		return NULL;
> +
> +	/* init each block list */
> +	for (i = 0; i < ARRAY_SIZE(block_desc); i++) {
> +		block_list = &pool->block_lists[i];
> +		spin_lock_init(&block_list->lock);
> +		INIT_LIST_HEAD(&block_list->full_list);
> +		INIT_LIST_HEAD(&block_list->active_list);
> +		block_list->block_count = 0;
> +	}
> +	return pool;
> +}
> +
> +/**
> + * zblock_destroy_pool() - destroys an existing zblock pool
> + * @pool:	the zblock pool to be destroyed
> + */
> +static void zblock_destroy_pool(struct zblock_pool *pool)
> +{
> +	kfree(pool);
> +}
> +
> +/**
> + * zblock_alloc() - allocates a slot of appropriate size
> + * @pool:	zblock pool from which to allocate
> + * @size:	size in bytes of the desired allocation
> + * @gfp:	gfp flags used if the pool needs to grow
> + * @handle:	handle of the new allocation
> + *
> + * Return: 0 if success and handle is set, otherwise -EINVAL if the size or
> + * gfp arguments are invalid or -ENOMEM if the pool was unable to allocate
> + * a new slot.
> + */
> +static int zblock_alloc(struct zblock_pool *pool, size_t size, gfp_t gfp,
> +			unsigned long *handle)
> +{
> +	int block_type = -1;
> +	struct zblock_block *block;
> +	struct block_list *block_list;
> +
> +	if (!size)
> +		return -EINVAL;
> +
> +	if (size > PAGE_SIZE)
> +		return -ENOSPC;
> +
> +	/* find basic block type with suitable slot size */
> +	if (size < block_desc[0].slot_size)
> +		block_type = 0;
> +	else {
> +		struct block_desc_node *block_node;
> +		struct rb_node *node = block_desc_tree.rb_node;
> +
> +		while (node) {
> +			block_node = container_of(node, typeof(*block_node), node);
> +			if (size < block_node->this_slot_size)
> +				node = node->rb_left;
> +			else if (size >= block_node->next_slot_size)
> +				node = node->rb_right;
> +			else {
> +				block_type = block_node->block_idx + 1;
> +				break;
> +			}
> +		}
> +	}
> +	if (WARN_ON(block_type < 0))
> +		return -EINVAL;
> +	if (block_type >= ARRAY_SIZE(block_desc))
> +		return -ENOSPC;
> +
> +	block_list = &pool->block_lists[block_type];
> +
> +	spin_lock(&block_list->lock);
> +	block = find_and_claim_block(block_list, block_type, handle);
> +	spin_unlock(&block_list->lock);
> +	if (block)
> +		return 0;
> +
> +	/* no block with free slots found, try to allocate a new empty block */
> +	block = alloc_block(pool, block_type, gfp & ~(__GFP_MOVABLE | __GFP_HIGHMEM), handle);
> +	return block ? 0 : -ENOMEM;
> +}
> +
> +/**
> + * zblock_free() - frees the allocation associated with the given handle
> + * @pool:	pool in which the allocation resided
> + * @handle:	handle associated with the allocation returned by zblock_alloc()
> + */
> +static void zblock_free(struct zblock_pool *pool, unsigned long handle)
> +{
> +	unsigned int slot, block_type;
> +	struct zblock_block *block;
> +	struct block_list *block_list;
> +
> +	block = handle_to_metadata(handle, &block_type, &slot);
> +	block_list = &pool->block_lists[block_type];
> +
> +	spin_lock(&block_list->lock);
> +	/* if all slots in the block are empty, delete the whole block */
> +	if (++block->free_slots == block_desc[block_type].slots_per_block) {
> +		block_list->block_count--;
> +		list_del(&block->link);
> +		spin_unlock(&block_list->lock);
> +		free_pages((unsigned long)block, block_desc[block_type].order);
> +		return;
> +	} else if (block->free_slots == 1)
> +		list_move_tail(&block->link, &block_list->active_list);
> +	clear_bit(slot, block->slot_info);
> +	spin_unlock(&block_list->lock);
> +}
> +
> +/**
> + * zblock_map() - maps the allocation associated with the given handle
> + * @pool:	pool in which the allocation resides
> + * @handle:	handle associated with the allocation to be mapped
> + *
> + * Returns: a pointer to the mapped allocation
> + */
> +static void *zblock_map(struct zblock_pool *pool, unsigned long handle)
> +{
> +	unsigned int block_type, slot;
> +	struct zblock_block *block;
> +	unsigned long offs;
> +	void *p;
> +
> +	block = handle_to_metadata(handle, &block_type, &slot);
> +	offs = ZBLOCK_HEADER_SIZE + slot * block_desc[block_type].slot_size;
> +	p = (void *)block + offs;
> +	return p;
> +}
> +
> +/**
> + * zblock_unmap() - unmaps the allocation associated with the given handle
> + * @pool:	pool in which the allocation resides
> + * @handle:	handle associated with the allocation to be unmapped
> + */
> +static void zblock_unmap(struct zblock_pool *pool, unsigned long handle)
> +{
> +}
> +
> +/**
> + * zblock_write() - write to the memory area defined by handle
> + * @pool:	pool in which the allocation resides
> + * @handle:	handle associated with the allocation
> + * @handle_mem: pointer to source memory block
> + * @mem_len:	length of the memory block to write
> + */
> +static void zblock_write(struct zblock_pool *pool, unsigned long handle,
> +			 void *handle_mem, size_t mem_len)
> +{
> +	unsigned int block_type, slot;
> +	struct zblock_block *block;
> +	unsigned long offs;
> +	void *p;
> +
> +	block = handle_to_metadata(handle, &block_type, &slot);
> +	offs = ZBLOCK_HEADER_SIZE + slot * block_desc[block_type].slot_size;
> +	p = (void *)block + offs;
> +	memcpy(p, handle_mem, mem_len);
> +}
> +
> +/**
> + * zblock_get_total_pages() - gets the zblock pool size in pages
> + * @pool:	pool being queried
> + *
> + * Returns: size in pages of the given pool.
> + */
> +static u64 zblock_get_total_pages(struct zblock_pool *pool)
> +{
> +	u64 total_size;
> +	int i;
> +
> +	total_size = 0;
> +	for (i = 0; i < ARRAY_SIZE(block_desc); i++)
> +		total_size += pool->block_lists[i].block_count << block_desc[i].order;
> +
> +	return total_size;
> +}
> +
> +/*****************
> + * zpool
> + ****************/
> +
> +static void *zblock_zpool_create(const char *name, gfp_t gfp)
> +{
> +	return zblock_create_pool(gfp);
> +}
> +
> +static void zblock_zpool_destroy(void *pool)
> +{
> +	zblock_destroy_pool(pool);
> +}
> +
> +static int zblock_zpool_malloc(void *pool, size_t size, gfp_t gfp,
> +			       unsigned long *handle)
> +{
> +	return zblock_alloc(pool, size, gfp, handle);
> +}
> +
> +static void zblock_zpool_free(void *pool, unsigned long handle)
> +{
> +	zblock_free(pool, handle);
> +}
> +
> +static void *zblock_zpool_read_begin(void *pool, unsigned long handle,
> +				     void *local_copy)
> +{
> +	return zblock_map(pool, handle);
> +}
> +
> +static void zblock_zpool_obj_write(void *pool, unsigned long handle,
> +				   void *handle_mem, size_t mem_len)
> +{
> +	zblock_write(pool, handle, handle_mem, mem_len);
> +}
> +
> +static void zblock_zpool_read_end(void *pool, unsigned long handle,
> +				  void *handle_mem)
> +{
> +	zblock_unmap(pool, handle);
> +}
> +
> +static u64 zblock_zpool_total_pages(void *pool)
> +{
> +	return zblock_get_total_pages(pool);
> +}
> +
> +static struct zpool_driver zblock_zpool_driver = {
> +	.type =			"zblock",
> +	.owner =		THIS_MODULE,
> +	.create =		zblock_zpool_create,
> +	.destroy =		zblock_zpool_destroy,
> +	.malloc =		zblock_zpool_malloc,
> +	.free =			zblock_zpool_free,
> +	.obj_read_begin =	zblock_zpool_read_begin,
> +	.obj_read_end =		zblock_zpool_read_end,
> +	.obj_write =		zblock_zpool_obj_write,
> +	.total_pages =		zblock_zpool_total_pages,
> +};
> +
> +MODULE_ALIAS("zpool-zblock");
> +
> +static void delete_rbtree(void)
> +{
> +	while (!RB_EMPTY_ROOT(&block_desc_tree))
> +		rb_erase(block_desc_tree.rb_node, &block_desc_tree);
> +}
> +
> +static int __init create_rbtree(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(block_desc); i++) {
> +		struct block_desc_node *block_node = kmalloc(sizeof(*block_node),
> +							     GFP_KERNEL);
> +		struct rb_node **new = &block_desc_tree.rb_node, *parent = NULL;
> +
> +		if (!block_node) {
> +			delete_rbtree();
> +			return -ENOMEM;
> +		}
> +		if (i > 0 && block_desc[i].slot_size <= block_desc[i-1].slot_size) {
> +			pr_err("%s: block descriptors not in ascending order\n",
> +			       __func__);
> +			delete_rbtree();
> +			return -EINVAL;
> +		}
> +		block_node->this_slot_size = block_desc[i].slot_size;
> +		block_node->block_idx = i;
> +		if (i == ARRAY_SIZE(block_desc) - 1)
> +			block_node->next_slot_size = PAGE_SIZE;
> +		else
> +			block_node->next_slot_size = block_desc[i+1].slot_size;
> +		while (*new) {
> +			parent = *new;
> +			/* the array is sorted so we will always go to the right */
> +			new = &((*new)->rb_right);
> +		}
> +		rb_link_node(&block_node->node, parent, new);
> +		rb_insert_color(&block_node->node, &block_desc_tree);
> +	}
> +	return 0;
> +}
> +
> +static int __init init_zblock(void)
> +{
> +	int ret = create_rbtree();
> +
> +	if (ret)
> +		return ret;
> +
> +	zpool_register_driver(&zblock_zpool_driver);
> +	return 0;
> +}
> +
> +static void __exit exit_zblock(void)
> +{
> +	zpool_unregister_driver(&zblock_zpool_driver);
> +	delete_rbtree();
> +}
> +
> +module_init(init_zblock);
> +module_exit(exit_zblock);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Vitaly Wool");
> +MODULE_DESCRIPTION("Block allocator for compressed pages");
> diff --git a/mm/zblock.h b/mm/zblock.h
> new file mode 100644
> index 000000000000..fd72961c077a
> --- /dev/null
> +++ b/mm/zblock.h
> @@ -0,0 +1,176 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Author: Vitaly Wool
> + * Copyright (C) 2025, Konsulko AB.
> + */
> +#ifndef __ZBLOCK_H__
> +#define __ZBLOCK_H__
> +
> +#include <linux/rbtree.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +
> +#define SLOT_FREE 0
> +#define BIT_SLOT_OCCUPIED 0
> +#define BIT_SLOT_MAPPED 1
> +
> +#if PAGE_SIZE == 0x1000
> +/* max 128 slots per block, max table size 32 */
> +#define SLOT_BITS 7
> +#elif PAGE_SIZE == 0x4000
> +/* max 256 slots per block, max table size 64 */
> +#define SLOT_BITS 8
> +#else
> +#error Unsupported PAGE_SIZE
> +#endif
> +
> +#define MAX_SLOTS (1 << SLOT_BITS)
> +#define SLOT_MASK ((0x1UL << SLOT_BITS) - 1)
> +
> +#define ZBLOCK_HEADER_SIZE	round_up(sizeof(struct zblock_block), sizeof(long))
> +#define BLOCK_DATA_SIZE(order)	((PAGE_SIZE << order) - ZBLOCK_HEADER_SIZE)
> +#define SLOT_SIZE(nslots, order) (round_down((BLOCK_DATA_SIZE(order) / nslots), sizeof(long)))
> +
> +/**
> + * struct zblock_block - block metadata
> + * Block consists of several (1/2/4/8) pages and contains a fixed
> + * integer number of slots for allocating compressed pages.
> + *
> + * free_slots:	number of free slots in the block
> + * slot_info:	contains data about free/occupied slots
> + */
> +struct zblock_block {
> +	struct list_head link;
> +	DECLARE_BITMAP(slot_info, 1 << SLOT_BITS);
> +	u32 free_slots;
> +};
> +
> +/**
> + * struct block_desc - general metadata for block lists
> + * Each block list stores only blocks of the corresponding type, which means
> + * that all blocks in it have the same number and size of slots.
> + * All slots are aligned to the size of long.
> + *
> + * slot_size:		size of slot for this list
> + * slots_per_block:	number of slots per block for this list
> + * order:		order for __get_free_pages
> + */
> +struct block_desc {
> +	unsigned int slot_size;
> +	unsigned short slots_per_block;
> +	unsigned short order;
> +};
> +
> +struct block_desc_node {
> +	struct rb_node node;
> +	unsigned int this_slot_size;
> +	unsigned int next_slot_size;
> +	unsigned int block_idx;
> +};
> +
> +static const struct block_desc block_desc[] = {
> +#if PAGE_SIZE == 0x1000
> +	{ SLOT_SIZE(63, 0), 63, 0 },
> +	{ SLOT_SIZE(32, 0), 32, 0 },
> +	{ SLOT_SIZE(21, 0), 21, 0 },
> +	{ SLOT_SIZE(15, 0), 15, 0 },
> +	{ SLOT_SIZE(12, 0), 12, 0 },
> +	{ SLOT_SIZE(10, 0), 10, 0 },
> +	{ SLOT_SIZE(9, 0), 9, 0 },
> +	{ SLOT_SIZE(8, 0), 8, 0 },
> +	{ SLOT_SIZE(29, 2), 29, 2 },
> +	{ SLOT_SIZE(13, 1), 13, 1 },
> +	{ SLOT_SIZE(6, 0), 6, 0 },
> +	{ SLOT_SIZE(11, 1), 11, 1 },
> +	{ SLOT_SIZE(5, 0), 5, 0 },
> +	{ SLOT_SIZE(9, 1), 9, 1 },
> +	{ SLOT_SIZE(8, 1), 8, 1 },
> +	{ SLOT_SIZE(29, 3), 29, 3 },
> +	{ SLOT_SIZE(13, 2), 13, 2 },
> +	{ SLOT_SIZE(12, 2), 12, 2 },
> +	{ SLOT_SIZE(11, 2), 11, 2 },
> +	{ SLOT_SIZE(10, 2), 10, 2 },
> +	{ SLOT_SIZE(9, 2), 9, 2 },
> +	{ SLOT_SIZE(17, 3), 17, 3 },
> +	{ SLOT_SIZE(8, 2), 8, 2 },
> +	{ SLOT_SIZE(15, 3), 15, 3 },
> +	{ SLOT_SIZE(14, 3), 14, 3 },
> +	{ SLOT_SIZE(13, 3), 13, 3 },
> +	{ SLOT_SIZE(6, 2), 6, 2 },
> +	{ SLOT_SIZE(11, 3), 11, 3 },
> +	{ SLOT_SIZE(10, 3), 10, 3 },
> +	{ SLOT_SIZE(9, 3), 9, 3 },
> +	{ SLOT_SIZE(4, 2), 4, 2 },
> +#elif PAGE_SIZE == 0x4000
> +	{ SLOT_SIZE(255, 0), 255, 0 },
> +	{ SLOT_SIZE(185, 0), 185, 0 },
> +	{ SLOT_SIZE(145, 0), 145, 0 },
> +	{ SLOT_SIZE(113, 0), 113, 0 },
> +	{ SLOT_SIZE(92, 0), 92, 0 },
> +	{ SLOT_SIZE(75, 0), 75, 0 },
> +	{ SLOT_SIZE(60, 0), 60, 0 },
> +	{ SLOT_SIZE(51, 0), 51, 0 },
> +	{ SLOT_SIZE(43, 0), 43, 0 },
> +	{ SLOT_SIZE(37, 0), 37, 0 },
> +	{ SLOT_SIZE(32, 0), 32, 0 },
> +	{ SLOT_SIZE(27, 0), 27, 0 },
> +	{ SLOT_SIZE(23, 0), 23, 0 },
> +	{ SLOT_SIZE(19, 0), 19, 0 },
> +	{ SLOT_SIZE(17, 0), 17, 0 },
> +	{ SLOT_SIZE(15, 0), 15, 0 },
> +	{ SLOT_SIZE(13, 0), 13, 0 },
> +	{ SLOT_SIZE(11, 0), 11, 0 },
> +	{ SLOT_SIZE(10, 0), 10, 0 },
> +	{ SLOT_SIZE(9, 0), 9, 0 },
> +	{ SLOT_SIZE(8, 0), 8, 0 },
> +	{ SLOT_SIZE(15, 1), 15, 1 },
> +	{ SLOT_SIZE(14, 1), 14, 1 },
> +	{ SLOT_SIZE(13, 1), 13, 1 },
> +	{ SLOT_SIZE(12, 1), 12, 1 },
> +	{ SLOT_SIZE(11, 1), 11, 1 },
> +	{ SLOT_SIZE(10, 1), 10, 1 },
> +	{ SLOT_SIZE(9, 1), 9, 1 },
> +	{ SLOT_SIZE(8, 1), 8, 1 },
> +	{ SLOT_SIZE(15, 2), 15, 2 },
> +	{ SLOT_SIZE(14, 2), 14, 2 },
> +	{ SLOT_SIZE(13, 2), 13, 2 },
> +	{ SLOT_SIZE(12, 2), 12, 2 },
> +	{ SLOT_SIZE(11, 2), 11, 2 },
> +	{ SLOT_SIZE(10, 2), 10, 2 },
> +	{ SLOT_SIZE(9, 2), 9, 2 },
> +	{ SLOT_SIZE(8, 2), 8, 2 },
> +	{ SLOT_SIZE(7, 2), 7, 2 },
> +	{ SLOT_SIZE(6, 2), 6, 2 },
> +	{ SLOT_SIZE(5, 2), 5, 2 },
> +#endif /* PAGE_SIZE */
> +};
> +
> +/**
> + * struct block_list - stores metadata of a particular list
> + * lock:		protects the list of blocks
> + * active_list:	linked list of active (non-full) blocks
> + * full_list:	linked list of full blocks
> + * block_count:	total number of blocks in the list
> + */
> +struct block_list {
> +	spinlock_t lock;
> +	struct list_head active_list;
> +	struct list_head full_list;
> +	unsigned long block_count;
> +};
> +
> +/**
> + * struct zblock_pool - stores metadata for each zblock pool
> + * @block_lists:	array of block lists
> + * @zpool:		zpool driver
> + *
> + * This structure is allocated at pool creation time and maintains metadata
> + * for a particular zblock pool.
> + */
> +struct zblock_pool {
> +	struct block_list block_lists[ARRAY_SIZE(block_desc)];
> +	struct zpool *zpool;
> +};
> +
> +#endif