* [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc
[not found] <20260226192936.3190275-1-joshua.hahnjy@gmail.com>
@ 2026-02-26 19:29 ` Joshua Hahn
2026-02-26 21:37 ` Shakeel Butt
` (2 more replies)
2026-02-26 19:29 ` [PATCH 4/8] mm/zsmalloc: Store obj_cgroup pointer in zpdesc Joshua Hahn
1 sibling, 3 replies; 9+ messages in thread
From: Joshua Hahn @ 2026-02-26 19:29 UTC (permalink / raw)
To: Minchan Kim, Sergey Senozhatsky
Cc: Johannes Weiner, Jens Axboe, Yosry Ahmed, Nhat Pham, Nhat Pham,
Chengming Zhou, Andrew Morton, linux-mm, linux-block,
linux-kernel, kernel-team
Introduce an array of struct obj_cgroup pointers to zpdesc to keep track
of compressed objects' memcg ownership.
The 8 bytes required to add the array pointer to struct zpdesc bring its
size up from 56 bytes to 64 bytes. However, in the current
implementation, struct zpdesc lies on top of struct page[1]. This allows
the increased size to remain invisible to the outside, since 64 bytes
are used for struct zpdesc either way.
The newly added obj_cgroup array pointer overlays page->memcg_data,
which causes problems for functions that perform page charging by
checking whether page->memcg_data is zero. To make sure that the
backing zpdesc's obj_cgroup ** is not misinterpreted as a mem_cgroup *,
follow SLUB's lead and tag the pointer with the MEMCG_DATA_OBJEXTS bit.
Consumers of zsmalloc that do not perform memcg accounting (i.e. zram)
are completely unaffected by this patch, as the array tracking the
obj_cgroup pointers is only allocated in the zswap path.
This patch temporarily increases the memory used by zswap by 8 bytes
per zswap_entry, since the obj_cgroup pointer is duplicated in the
zpdesc and in zswap_entry. In the following patches, we will redirect
memory charging operations to use the zpdesc's obj_cgroup instead, and
remove the pointer from zswap_entry. This will leave no net memory usage
increase for either zram or zswap.
In this patch, allocate / free the objcg pointer array for the zswap
path, and handle partial object migration and full zpdesc migration.
[1] In the (near) future, struct zpdesc may no longer overlay struct
page as we shift towards using memdescs. When this happens, the size
increase of struct zpdesc will no longer be free. With that said, the
difference can be kept minimal.
All the changes being implemented are currently guarded by
CONFIG_MEMCG. We can optionally minimize the impact on zram users by
guarding them behind CONFIG_MEMCG && CONFIG_ZSWAP instead.
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
drivers/block/zram/zram_drv.c | 10 ++---
include/linux/zsmalloc.h | 2 +-
mm/zpdesc.h | 25 +++++++++++-
mm/zsmalloc.c | 74 +++++++++++++++++++++++++++++------
mm/zswap.c | 2 +-
5 files changed, 93 insertions(+), 20 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 61d3e2c74901..60ee85679730 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2220,8 +2220,8 @@ static int write_incompressible_page(struct zram *zram, struct page *page,
* like we do for compressible pages.
*/
handle = zs_malloc(zram->mem_pool, PAGE_SIZE,
- GFP_NOIO | __GFP_NOWARN |
- __GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page));
+ GFP_NOIO | __GFP_NOWARN | __GFP_HIGHMEM |
+ __GFP_MOVABLE, page_to_nid(page), false);
if (IS_ERR_VALUE(handle))
return PTR_ERR((void *)handle);
@@ -2283,8 +2283,8 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
}
handle = zs_malloc(zram->mem_pool, comp_len,
- GFP_NOIO | __GFP_NOWARN |
- __GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page));
+ GFP_NOIO | __GFP_NOWARN | __GFP_HIGHMEM |
+ __GFP_MOVABLE, page_to_nid(page), false);
if (IS_ERR_VALUE(handle)) {
zcomp_stream_put(zstrm);
return PTR_ERR((void *)handle);
@@ -2514,7 +2514,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
handle_new = zs_malloc(zram->mem_pool, comp_len_new,
GFP_NOIO | __GFP_NOWARN |
__GFP_HIGHMEM | __GFP_MOVABLE,
- page_to_nid(page));
+ page_to_nid(page), false);
if (IS_ERR_VALUE(handle_new)) {
zcomp_stream_put(zstrm);
return PTR_ERR((void *)handle_new);
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 478410c880b1..8ef28b964bb0 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -28,7 +28,7 @@ struct zs_pool *zs_create_pool(const char *name);
void zs_destroy_pool(struct zs_pool *pool);
unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t flags,
- const int nid);
+ const int nid, bool objcg);
void zs_free(struct zs_pool *pool, unsigned long obj);
size_t zs_huge_class_size(struct zs_pool *pool);
diff --git a/mm/zpdesc.h b/mm/zpdesc.h
index b8258dc78548..d10a73e4a90e 100644
--- a/mm/zpdesc.h
+++ b/mm/zpdesc.h
@@ -20,10 +20,12 @@
* @zspage: Points to the zspage this zpdesc is a part of.
* @first_obj_offset: First object offset in zsmalloc pool.
* @_refcount: The number of references to this zpdesc.
+ * @objcgs: Array of objcgs pointers that the stored objs
+ * belong to. Overlayed on top of page->memcg_data, and
+ * will always have first bit set if it is a valid pointer.
*
* This struct overlays struct page for now. Do not modify without a good
- * understanding of the issues. In particular, do not expand into the overlap
- * with memcg_data.
+ * understanding of the issues.
*
* Page flags used:
* * PG_private identifies the first component page.
@@ -47,6 +49,9 @@ struct zpdesc {
*/
unsigned int first_obj_offset;
atomic_t _refcount;
+#ifdef CONFIG_MEMCG
+ unsigned long objcgs;
+#endif
};
#define ZPDESC_MATCH(pg, zp) \
static_assert(offsetof(struct page, pg) == offsetof(struct zpdesc, zp))
@@ -59,6 +64,9 @@ ZPDESC_MATCH(__folio_index, handle);
ZPDESC_MATCH(private, zspage);
ZPDESC_MATCH(page_type, first_obj_offset);
ZPDESC_MATCH(_refcount, _refcount);
+#ifdef CONFIG_MEMCG
+ZPDESC_MATCH(memcg_data, objcgs);
+#endif
#undef ZPDESC_MATCH
static_assert(sizeof(struct zpdesc) <= sizeof(struct page));
@@ -171,4 +179,17 @@ static inline bool zpdesc_is_locked(struct zpdesc *zpdesc)
{
return folio_test_locked(zpdesc_folio(zpdesc));
}
+
+#ifdef CONFIG_MEMCG
+static inline struct obj_cgroup **zpdesc_objcgs(struct zpdesc *zpdesc)
+{
+ return (struct obj_cgroup **)(zpdesc->objcgs & ~OBJEXTS_FLAGS_MASK);
+}
+
+static inline void zpdesc_set_objcgs(struct zpdesc *zpdesc,
+ struct obj_cgroup **objcgs)
+{
+ zpdesc->objcgs = (unsigned long)objcgs | MEMCG_DATA_OBJEXTS;
+}
+#endif
#endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 7846f31bcc8b..7d56bb700e11 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -39,6 +39,7 @@
#include <linux/zsmalloc.h>
#include <linux/fs.h>
#include <linux/workqueue.h>
+#include <linux/memcontrol.h>
#include "zpdesc.h"
#define ZSPAGE_MAGIC 0x58
@@ -777,6 +778,10 @@ static void reset_zpdesc(struct zpdesc *zpdesc)
ClearPagePrivate(page);
zpdesc->zspage = NULL;
zpdesc->next = NULL;
+#ifdef CONFIG_MEMCG
+ kfree(zpdesc_objcgs(zpdesc));
+ zpdesc->objcgs = 0;
+#endif
/* PageZsmalloc is sticky until the page is freed to the buddy. */
}
@@ -893,6 +898,43 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
set_freeobj(zspage, 0);
}
+#ifdef CONFIG_MEMCG
+static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp,
+ struct zpdesc *zpdescs[])
+{
+ /*
+ * Add 2 to objcgs_per_zpdesc to account for partial objs that may be
+ * stored at the beginning or end of the zpdesc.
+ */
+ int objcgs_per_zpdesc = (PAGE_SIZE / class->size) + 2;
+ int i;
+ struct obj_cgroup **objcgs;
+
+ for (i = 0; i < class->pages_per_zspage; i++) {
+ objcgs = kcalloc(objcgs_per_zpdesc, sizeof(struct obj_cgroup *),
+ gfp & ~__GFP_HIGHMEM);
+ if (!objcgs) {
+ while (--i >= 0) {
+ kfree(zpdesc_objcgs(zpdescs[i]));
+ zpdescs[i]->objcgs = 0;
+ }
+
+ return false;
+ }
+
+ zpdesc_set_objcgs(zpdescs[i], objcgs);
+ }
+
+ return true;
+}
+#else
+static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp,
+ struct zpdesc *zpdescs[])
+{
+ return true;
+}
+#endif
+
static void create_page_chain(struct size_class *class, struct zspage *zspage,
struct zpdesc *zpdescs[])
{
@@ -931,7 +973,7 @@ static void create_page_chain(struct size_class *class, struct zspage *zspage,
*/
static struct zspage *alloc_zspage(struct zs_pool *pool,
struct size_class *class,
- gfp_t gfp, const int nid)
+ gfp_t gfp, const int nid, bool objcg)
{
int i;
struct zpdesc *zpdescs[ZS_MAX_PAGES_PER_ZSPAGE];
@@ -952,24 +994,29 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
struct zpdesc *zpdesc;
zpdesc = alloc_zpdesc(gfp, nid);
- if (!zpdesc) {
- while (--i >= 0) {
- zpdesc_dec_zone_page_state(zpdescs[i]);
- free_zpdesc(zpdescs[i]);
- }
- cache_free_zspage(zspage);
- return NULL;
- }
+ if (!zpdesc)
+ goto err;
__zpdesc_set_zsmalloc(zpdesc);
zpdesc_inc_zone_page_state(zpdesc);
zpdescs[i] = zpdesc;
}
+ if (objcg && !alloc_zspage_objcgs(class, gfp, zpdescs))
+ goto err;
+
create_page_chain(class, zspage, zpdescs);
init_zspage(class, zspage);
return zspage;
+
+err:
+ while (--i >= 0) {
+ zpdesc_dec_zone_page_state(zpdescs[i]);
+ free_zpdesc(zpdescs[i]);
+ }
+ cache_free_zspage(zspage);
+ return NULL;
}
static struct zspage *find_get_zspage(struct size_class *class)
@@ -1289,13 +1336,14 @@ static unsigned long obj_malloc(struct zs_pool *pool,
* @size: size of block to allocate
* @gfp: gfp flags when allocating object
* @nid: The preferred node id to allocate new zspage (if needed)
+ * @objcg: Whether the zspage should track per-object memory charging.
*
* On success, handle to the allocated object is returned,
* otherwise an ERR_PTR().
* Allocation requests with size > ZS_MAX_ALLOC_SIZE will fail.
*/
unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp,
- const int nid)
+ const int nid, bool objcg)
{
unsigned long handle;
struct size_class *class;
@@ -1330,7 +1378,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp,
spin_unlock(&class->lock);
- zspage = alloc_zspage(pool, class, gfp, nid);
+ zspage = alloc_zspage(pool, class, gfp, nid, objcg);
if (!zspage) {
cache_free_handle(handle);
return (unsigned long)ERR_PTR(-ENOMEM);
@@ -1672,6 +1720,10 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage,
if (unlikely(ZsHugePage(zspage)))
newzpdesc->handle = oldzpdesc->handle;
__zpdesc_set_movable(newzpdesc);
+#ifdef CONFIG_MEMCG
+ zpdesc_set_objcgs(newzpdesc, zpdesc_objcgs(oldzpdesc));
+ oldzpdesc->objcgs = 0;
+#endif
}
static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
diff --git a/mm/zswap.c b/mm/zswap.c
index af3f0fbb0558..dd083110bfa0 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -905,7 +905,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
}
gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
- handle = zs_malloc(pool->zs_pool, dlen, gfp, page_to_nid(page));
+ handle = zs_malloc(pool->zs_pool, dlen, gfp, page_to_nid(page), true);
if (IS_ERR_VALUE(handle)) {
alloc_ret = PTR_ERR((void *)handle);
goto unlock;
--
2.47.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 4/8] mm/zsmalloc: Store obj_cgroup pointer in zpdesc
[not found] <20260226192936.3190275-1-joshua.hahnjy@gmail.com>
2026-02-26 19:29 ` [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc Joshua Hahn
@ 2026-02-26 19:29 ` Joshua Hahn
1 sibling, 0 replies; 9+ messages in thread
From: Joshua Hahn @ 2026-02-26 19:29 UTC (permalink / raw)
To: Minchan Kim, Sergey Senozhatsky
Cc: Johannes Weiner, Jens Axboe, Yosry Ahmed, Nhat Pham, Nhat Pham,
Chengming Zhou, Andrew Morton, linux-mm, linux-block,
linux-kernel, kernel-team
With each zswap-backing zpdesc now having an array of obj_cgroup pointers,
plumb the obj_cgroup pointer from the zswap / zram layer down to
zsmalloc.
Introduce two helper functions, zpdesc_obj_cgroup() and
zpdesc_set_obj_cgroup(), which abstract the conversion of an object's
zspage index to its zpdesc index and the retrieval of the obj_cgroup
pointer from the zpdesc.
From the zswap path, store the obj_cgroup pointer after compression when
writing the object, and clear it when the object is freed. Also handle
the migration of objects across zpdescs.
The lifetime and charging of the obj_cgroup is still handled in the
zswap layer.
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
drivers/block/zram/zram_drv.c | 7 ++--
include/linux/zsmalloc.h | 3 +-
mm/zsmalloc.c | 71 ++++++++++++++++++++++++++++++++++-
mm/zswap.c | 6 +--
4 files changed, 79 insertions(+), 8 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 60ee85679730..209668b14428 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2231,7 +2231,7 @@ static int write_incompressible_page(struct zram *zram, struct page *page,
}
src = kmap_local_page(page);
- zs_obj_write(zram->mem_pool, handle, src, PAGE_SIZE);
+ zs_obj_write(zram->mem_pool, handle, src, PAGE_SIZE, NULL);
kunmap_local(src);
slot_lock(zram, index);
@@ -2296,7 +2296,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
return -ENOMEM;
}
- zs_obj_write(zram->mem_pool, handle, zstrm->buffer, comp_len);
+ zs_obj_write(zram->mem_pool, handle, zstrm->buffer, comp_len, NULL);
zcomp_stream_put(zstrm);
slot_lock(zram, index);
@@ -2520,7 +2520,8 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
return PTR_ERR((void *)handle_new);
}
- zs_obj_write(zram->mem_pool, handle_new, zstrm->buffer, comp_len_new);
+ zs_obj_write(zram->mem_pool, handle_new, zstrm->buffer,
+ comp_len_new, NULL);
zcomp_stream_put(zstrm);
slot_free(zram, index);
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 8ef28b964bb0..22f3baa13f24 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -15,6 +15,7 @@
#define _ZS_MALLOC_H_
#include <linux/types.h>
+#include <linux/memcontrol.h>
struct zs_pool_stats {
/* How many pages were migrated (freed) */
@@ -48,7 +49,7 @@ void zs_obj_read_sg_begin(struct zs_pool *pool, unsigned long handle,
struct scatterlist *sg, size_t mem_len);
void zs_obj_read_sg_end(struct zs_pool *pool, unsigned long handle);
void zs_obj_write(struct zs_pool *pool, unsigned long handle,
- void *handle_mem, size_t mem_len);
+ void *handle_mem, size_t mem_len, struct obj_cgroup *objcg);
extern const struct movable_operations zsmalloc_mops;
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 7d56bb700e11..e5ae9a0fc78a 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -899,6 +899,41 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
}
#ifdef CONFIG_MEMCG
+/* idx is indexed per-zspage, not per-zpdesc. */
+static inline struct obj_cgroup *zpdesc_obj_cgroup(struct zpdesc *zpdesc,
+ unsigned int idx,
+ int size)
+{
+ struct obj_cgroup **objcgs = zpdesc_objcgs(zpdesc);
+ unsigned int off = offset_in_page(size * idx);
+ unsigned int zpdesc_idx = DIV_ROUND_UP(off, size);
+
+ if (!objcgs)
+ return NULL;
+
+ return objcgs[zpdesc_idx];
+}
+
+/* idx is indexed per-zspage, not per-zpdesc. */
+static inline void zpdesc_set_obj_cgroup(struct zpdesc *zpdesc,
+ unsigned int idx, int size,
+ struct obj_cgroup *objcg)
+{
+ struct obj_cgroup **objcgs = zpdesc_objcgs(zpdesc);
+ unsigned int off = offset_in_page(size * idx);
+ unsigned int zpdesc_idx = DIV_ROUND_UP(off, size);
+
+ if (!objcgs)
+ return;
+
+ objcgs[zpdesc_idx] = objcg;
+ if (off + size > PAGE_SIZE) {
+ /* object spans two pages */
+ objcgs = zpdesc_objcgs(get_next_zpdesc(zpdesc));
+ objcgs[0] = objcg;
+ }
+}
+
static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp,
struct zpdesc *zpdescs[])
{
@@ -927,12 +962,40 @@ static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp,
return true;
}
+
+static void migrate_obj_objcg(unsigned long used_obj, unsigned long free_obj,
+ int size)
+{
+ unsigned int s_obj_idx, d_obj_idx;
+ struct zpdesc *s_zpdesc, *d_zpdesc;
+ struct obj_cgroup *objcg;
+
+ obj_to_location(used_obj, &s_zpdesc, &s_obj_idx);
+ obj_to_location(free_obj, &d_zpdesc, &d_obj_idx);
+ objcg = zpdesc_obj_cgroup(s_zpdesc, s_obj_idx, size);
+
+ zpdesc_set_obj_cgroup(d_zpdesc, d_obj_idx, size, objcg);
+ zpdesc_set_obj_cgroup(s_zpdesc, s_obj_idx, size, NULL);
+}
#else
+static inline struct obj_cgroup *zpdesc_obj_cgroup(struct zpdesc *zpdesc,
+ unsigned int offset,
+ int size)
+{
+ return NULL;
+}
+
+static inline void zpdesc_set_obj_cgroup(struct zpdesc *zpdesc,
+ unsigned int offset, int size,
+ struct obj_cgroup *objcg) {}
static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp,
struct zpdesc *zpdescs[])
{
return true;
}
+
+static void migrate_obj_objcg(unsigned long used_obj, unsigned long free_obj,
+ int size) {}
#endif
static void create_page_chain(struct size_class *class, struct zspage *zspage,
@@ -1221,7 +1284,7 @@ void zs_obj_read_sg_end(struct zs_pool *pool, unsigned long handle)
EXPORT_SYMBOL_GPL(zs_obj_read_sg_end);
void zs_obj_write(struct zs_pool *pool, unsigned long handle,
- void *handle_mem, size_t mem_len)
+ void *handle_mem, size_t mem_len, struct obj_cgroup *objcg)
{
struct zspage *zspage;
struct zpdesc *zpdesc;
@@ -1242,6 +1305,9 @@ void zs_obj_write(struct zs_pool *pool, unsigned long handle,
class = zspage_class(pool, zspage);
off = offset_in_page(class->size * obj_idx);
+ if (objcg)
+ zpdesc_set_obj_cgroup(zpdesc, obj_idx, class->size, objcg);
+
if (!ZsHugePage(zspage))
off += ZS_HANDLE_SIZE;
@@ -1415,6 +1481,8 @@ static void obj_free(int class_size, unsigned long obj)
f_offset = offset_in_page(class_size * f_objidx);
zspage = get_zspage(f_zpdesc);
+ zpdesc_set_obj_cgroup(f_zpdesc, f_objidx, class_size, NULL);
+
vaddr = kmap_local_zpdesc(f_zpdesc);
link = (struct link_free *)(vaddr + f_offset);
@@ -1587,6 +1655,7 @@ static void migrate_zspage(struct zs_pool *pool, struct zspage *src_zspage,
used_obj = handle_to_obj(handle);
free_obj = obj_malloc(pool, dst_zspage, handle);
zs_obj_copy(class, free_obj, used_obj);
+ migrate_obj_objcg(used_obj, free_obj, class->size);
obj_idx++;
obj_free(class->size, used_obj);
diff --git a/mm/zswap.c b/mm/zswap.c
index dd083110bfa0..1e2d60f47919 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -851,7 +851,7 @@ static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
}
static bool zswap_compress(struct page *page, struct zswap_entry *entry,
- struct zswap_pool *pool)
+ struct zswap_pool *pool, struct obj_cgroup *objcg)
{
struct crypto_acomp_ctx *acomp_ctx;
struct scatterlist input, output;
@@ -911,7 +911,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
goto unlock;
}
- zs_obj_write(pool->zs_pool, handle, dst, dlen);
+ zs_obj_write(pool->zs_pool, handle, dst, dlen, objcg);
entry->handle = handle;
entry->length = dlen;
@@ -1413,7 +1413,7 @@ static bool zswap_store_page(struct page *page,
return false;
}
- if (!zswap_compress(page, entry, pool))
+ if (!zswap_compress(page, entry, pool, objcg))
goto compress_failed;
old = xa_store(swap_zswap_tree(page_swpentry),
--
2.47.3
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc
2026-02-26 19:29 ` [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc Joshua Hahn
@ 2026-02-26 21:37 ` Shakeel Butt
2026-02-26 21:43 ` Joshua Hahn
2026-03-04 16:58 ` Yosry Ahmed
2026-03-06 3:49 ` Harry Yoo
2 siblings, 1 reply; 9+ messages in thread
From: Shakeel Butt @ 2026-02-26 21:37 UTC (permalink / raw)
To: Joshua Hahn
Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe,
Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton,
linux-mm, linux-block, linux-kernel, kernel-team
On Thu, Feb 26, 2026 at 11:29:26AM -0800, Joshua Hahn wrote:
> --- a/mm/zpdesc.h
> +++ b/mm/zpdesc.h
> @@ -20,10 +20,12 @@
> * @zspage: Points to the zspage this zpdesc is a part of.
> * @first_obj_offset: First object offset in zsmalloc pool.
> * @_refcount: The number of references to this zpdesc.
> + * @objcgs: Array of objcgs pointers that the stored objs
> + * belong to. Overlayed on top of page->memcg_data, and
> + * will always have first bit set if it is a valid pointer.
> *
> * This struct overlays struct page for now. Do not modify without a good
> - * understanding of the issues. In particular, do not expand into the overlap
> - * with memcg_data.
> + * understanding of the issues.
> *
> * Page flags used:
> * * PG_private identifies the first component page.
> @@ -47,6 +49,9 @@ struct zpdesc {
> */
> unsigned int first_obj_offset;
> atomic_t _refcount;
> +#ifdef CONFIG_MEMCG
> + unsigned long objcgs;
Why not just store struct obj_cgroup ** instead of unsigned long? You will not
need to do conversions when storing or accessing.
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc
2026-02-26 21:37 ` Shakeel Butt
@ 2026-02-26 21:43 ` Joshua Hahn
0 siblings, 0 replies; 9+ messages in thread
From: Joshua Hahn @ 2026-02-26 21:43 UTC (permalink / raw)
To: Shakeel Butt
Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe,
Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton,
linux-mm, linux-block, linux-kernel, kernel-team
On Thu, 26 Feb 2026 13:37:42 -0800 Shakeel Butt <shakeel.butt@linux.dev> wrote:
> On Thu, Feb 26, 2026 at 11:29:26AM -0800, Joshua Hahn wrote:
> > --- a/mm/zpdesc.h
> > +++ b/mm/zpdesc.h
> > @@ -20,10 +20,12 @@
> > * @zspage: Points to the zspage this zpdesc is a part of.
> > * @first_obj_offset: First object offset in zsmalloc pool.
> > * @_refcount: The number of references to this zpdesc.
> > + * @objcgs: Array of objcgs pointers that the stored objs
> > + * belong to. Overlayed on top of page->memcg_data, and
> > + * will always have first bit set if it is a valid pointer.
> > *
> > * This struct overlays struct page for now. Do not modify without a good
> > - * understanding of the issues. In particular, do not expand into the overlap
> > - * with memcg_data.
> > + * understanding of the issues.
> > *
> > * Page flags used:
> > * * PG_private identifies the first component page.
> > @@ -47,6 +49,9 @@ struct zpdesc {
> > */
> > unsigned int first_obj_offset;
> > atomic_t _refcount;
> > +#ifdef CONFIG_MEMCG
> > + unsigned long objcgs;
Hello Shakeel, I hope you're doing well!
> Why not just store struct obj_cgroup ** instead of unsigned long? You will not
> need to do conversions when storing or accessing.
Yeah, that makes sense to me : -)
I guess if we're going to be accessing it with the zpdesc_objcgs and
zpdesc_set_objcgs helpers anyway, we can hide the casting to
unsigned long and back to an obj_cgroup pointer from the user.
I was just trying to skip one cast, but I think it won't matter too much
anyways since it's always hidden away. I'll change it to obj_cgroup **
in the next version!!
I hope you have a great day : -)
Joshua
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc
2026-02-26 19:29 ` [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc Joshua Hahn
2026-02-26 21:37 ` Shakeel Butt
@ 2026-03-04 16:58 ` Yosry Ahmed
2026-03-04 18:03 ` Joshua Hahn
2026-03-06 3:49 ` Harry Yoo
2 siblings, 1 reply; 9+ messages in thread
From: Yosry Ahmed @ 2026-03-04 16:58 UTC (permalink / raw)
To: Joshua Hahn
Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe,
Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton,
linux-mm, linux-block, linux-kernel, kernel-team
> static struct zspage *find_get_zspage(struct size_class *class)
> @@ -1289,13 +1336,14 @@ static unsigned long obj_malloc(struct zs_pool *pool,
> * @size: size of block to allocate
> * @gfp: gfp flags when allocating object
> * @nid: The preferred node id to allocate new zspage (if needed)
> + * @objcg: Whether the zspage should track per-object memory charging.
> *
> * On success, handle to the allocated object is returned,
> * otherwise an ERR_PTR().
> * Allocation requests with size > ZS_MAX_ALLOC_SIZE will fail.
> */
> unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp,
> - const int nid)
> + const int nid, bool objcg)
Instead of passing in a boolean here, what if we make it a pool
parameter at creation time? I don't foresee a use case where some
objects are charged and some aren't. This avoids needing to always
pass objcg=true (for zswap) or objcg=false (for zram), and reduces
churn. Also, it allows us to add assertions to zs_obj_write() (and
elsewhere if needed) that an objcg is passed in when the pool should
be charged.
We can even add a zs_obj_write_objcg() variant that takes in an objcg,
and keep the current one as-is. Both would internally call a helper
that takes in an objcg, but that would further minimize churn to zram.
Not sure if that's worth it though. Sergey, WDYT?
On Thu, Feb 26, 2026 at 11:29 AM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> [... full patch quoted above, trimmed ...]
> + return NULL;
> }
>
> static struct zspage *find_get_zspage(struct size_class *class)
> @@ -1289,13 +1336,14 @@ static unsigned long obj_malloc(struct zs_pool *pool,
> * @size: size of block to allocate
> * @gfp: gfp flags when allocating object
> * @nid: The preferred node id to allocate new zspage (if needed)
> + * @objcg: Whether the zspage should track per-object memory charging.
> *
> * On success, handle to the allocated object is returned,
> * otherwise an ERR_PTR().
> * Allocation requests with size > ZS_MAX_ALLOC_SIZE will fail.
> */
> unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp,
> - const int nid)
> + const int nid, bool objcg)
> {
> unsigned long handle;
> struct size_class *class;
> @@ -1330,7 +1378,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp,
>
> spin_unlock(&class->lock);
>
> - zspage = alloc_zspage(pool, class, gfp, nid);
> + zspage = alloc_zspage(pool, class, gfp, nid, objcg);
> if (!zspage) {
> cache_free_handle(handle);
> return (unsigned long)ERR_PTR(-ENOMEM);
> @@ -1672,6 +1720,10 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage,
> if (unlikely(ZsHugePage(zspage)))
> newzpdesc->handle = oldzpdesc->handle;
> __zpdesc_set_movable(newzpdesc);
> +#ifdef CONFIG_MEMCG
> + zpdesc_set_objcgs(newzpdesc, zpdesc_objcgs(oldzpdesc));
> + oldzpdesc->objcgs = 0;
> +#endif
> }
>
> static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
> diff --git a/mm/zswap.c b/mm/zswap.c
> index af3f0fbb0558..dd083110bfa0 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -905,7 +905,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> }
>
> gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
> - handle = zs_malloc(pool->zs_pool, dlen, gfp, page_to_nid(page));
> + handle = zs_malloc(pool->zs_pool, dlen, gfp, page_to_nid(page), true);
> if (IS_ERR_VALUE(handle)) {
> alloc_ret = PTR_ERR((void *)handle);
> goto unlock;
> --
> 2.47.3
>
>
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc
2026-03-04 16:58 ` Yosry Ahmed
@ 2026-03-04 18:03 ` Joshua Hahn
0 siblings, 0 replies; 9+ messages in thread
From: Joshua Hahn @ 2026-03-04 18:03 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe,
Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton,
linux-mm, linux-block, linux-kernel, kernel-team
On Wed, 4 Mar 2026 08:58:44 -0800 Yosry Ahmed <yosry@kernel.org> wrote:
> > static struct zspage *find_get_zspage(struct size_class *class)
> > @@ -1289,13 +1336,14 @@ static unsigned long obj_malloc(struct zs_pool *pool,
> > * @size: size of block to allocate
> > * @gfp: gfp flags when allocating object
> > * @nid: The preferred node id to allocate new zspage (if needed)
> > + * @objcg: Whether the zspage should track per-object memory charging.
> > *
> > * On success, handle to the allocated object is returned,
> > * otherwise an ERR_PTR().
> > * Allocation requests with size > ZS_MAX_ALLOC_SIZE will fail.
> > */
> > unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp,
> > - const int nid)
> > + const int nid, bool objcg)
>
> Instead of passing in a boolean here, what if we make it a pool
> parameter at creation time? I don't foresee I use case where some
> objects are charged and some aren't. This avoids needing to always
> pass objcg=true (for zswap) or objcg=false (for zram), and reduces
> churn. Also, it allows us to add assertions to zs_obj_write() (and
> elsewhere if needed) that an objcg is passed in when the pool should
> be charged.
Hi Yosry,
Thank you for another great idea! Makes sense to me; then I can leave most
of the existing API signatures alone and just make the zpool changes.
Thanks a lot again :-)
Joshua
> We can even add a zs_obj_write_objcg() variant that takes in an objcg,
> and keep the current one as-is. Both would internally call a helper
> that takes in an objcg, but that would further minimize churn to zram.
> Not sure if that's worth it though. Sergey, WDYT?
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc
2026-02-26 19:29 ` [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc Joshua Hahn
2026-02-26 21:37 ` Shakeel Butt
2026-03-04 16:58 ` Yosry Ahmed
@ 2026-03-06 3:49 ` Harry Yoo
2026-03-06 15:48 ` Joshua Hahn
2 siblings, 1 reply; 9+ messages in thread
From: Harry Yoo @ 2026-03-06 3:49 UTC (permalink / raw)
To: Joshua Hahn
Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe,
Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton,
linux-mm, linux-block, linux-kernel, kernel-team
On Thu, Feb 26, 2026 at 11:29:26AM -0800, Joshua Hahn wrote:
> Introduce an array of struct obj_cgroup pointers to zpdesc to keep track
> of compressed objects' memcg ownership.
>
> The 8 bytes required to add the array in struct zpdesc brings its size
> up from 56 bytes to 64 bytes. However, in the current implementation,
> struct zpdesc lies on top of struct page[1]. This allows the increased
> size to remain invisible to the outside, since 64 bytes are used for
> struct zpdesc anyways.
>
> The newly added obj_cgroup array pointer overlays page->memcg_data,
> which causes problems for functions that try to perform page charging by
> checking the zeroness of page->memcg_data. To make sure that the
> backing zpdesc's obj_cgroup ** is not interpreted as a mem_cgroup *,
> follow SLUB's lead and use the MEMCG_DATA_OBJEXTS bit to tag the pointer.
>
> Consumers of zsmalloc that do not perform memcg accounting (i.e. zram)
> are completely unaffected by this patch, as the array to track the
> obj_cgroup pointers is only allocated in the zswap path.
>
> This patch temporarily increases the memory used by zswap by 8 bytes
> per zswap_entry, since the obj_cgroup pointer is duplicated in the
> zpdesc and in zswap_entry. In the following patches, we will redirect
> memory charging operations to use the zpdesc's obj_cgroup instead, and
> remove the pointer from zswap_entry. This will leave no net memory usage
> increase for both zram and zswap.
>
> In this patch, allocate / free the objcg pointer array for the zswap
> path, and handle partial object migration and full zpdesc migration.
>
> [1] In the (near) future, struct zpdesc may no longer overlay struct
> page as we shift towards using memdescs. When this happens, the size
> increase of struct zpdesc will no longer be free. With that said, the
> difference can be kept minimal.
>
> All the changes that are being implemented are currently guarded under
> CONFIG_MEMCG. We can optionally minimize the impact on zram users by
> guarding these changes in CONFIG_MEMCG && CONFIG_ZSWAP as well.
>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> ---
> drivers/block/zram/zram_drv.c | 10 ++---
> include/linux/zsmalloc.h | 2 +-
> mm/zpdesc.h | 25 +++++++++++-
> mm/zsmalloc.c | 74 +++++++++++++++++++++++++++++------
> mm/zswap.c | 2 +-
> 5 files changed, 93 insertions(+), 20 deletions(-)
>
> @@ -893,6 +898,43 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
> set_freeobj(zspage, 0);
> }
>
> +#ifdef CONFIG_MEMCG
> +static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp,
> + struct zpdesc *zpdescs[])
> +{
> + /*
> + * Add 2 to objcgs_per_zpdesc to account for partial objs that may be
> + * stored at the beginning or end of the zpdesc.
> + */
> + int objcgs_per_zpdesc = (PAGE_SIZE / class->size) + 2;
> + int i;
> + struct obj_cgroup **objcgs;
Just wondering, perhaps it makes more sense to have an array of
objcg pointers for each zspage (of size objs_per_zspage)?
> +
> + for (i = 0; i < class->pages_per_zspage; i++) {
> + objcgs = kcalloc(objcgs_per_zpdesc, sizeof(struct obj_cgroup *),
> + gfp & ~__GFP_HIGHMEM);
> + if (!objcgs) {
> + while (--i >= 0) {
> + kfree(zpdesc_objcgs(zpdescs[i]));
> + zpdescs[i]->objcgs = 0;
> + }
> +
> + return false;
> + }
> +
> + zpdesc_set_objcgs(zpdescs[i], objcgs);
> + }
> +
> + return true;
> +}
--
Cheers,
Harry / Hyeonggon
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc
2026-03-06 3:49 ` Harry Yoo
@ 2026-03-06 15:48 ` Joshua Hahn
2026-03-11 2:27 ` Harry Yoo
0 siblings, 1 reply; 9+ messages in thread
From: Joshua Hahn @ 2026-03-06 15:48 UTC (permalink / raw)
To: Harry Yoo
Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe,
Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton,
linux-mm, linux-block, linux-kernel, kernel-team
On Fri, 6 Mar 2026 12:49:44 +0900 Harry Yoo <harry.yoo@oracle.com> wrote:
> On Thu, Feb 26, 2026 at 11:29:26AM -0800, Joshua Hahn wrote:
> > [...]
> >
> > @@ -893,6 +898,43 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
> > set_freeobj(zspage, 0);
> > }
> >
> > +#ifdef CONFIG_MEMCG
> > +static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp,
> > + struct zpdesc *zpdescs[])
> > +{
> > + /*
> > + * Add 2 to objcgs_per_zpdesc to account for partial objs that may be
> > + * stored at the beginning or end of the zpdesc.
> > + */
> > + int objcgs_per_zpdesc = (PAGE_SIZE / class->size) + 2;
> > + int i;
> > + struct obj_cgroup **objcgs;
>
> Just wondering, perhaps it makes more sense to have an array of
> objcg pointers for each zspage (of size objs_per_zspage)?
Hi Harry! I hope you are doing well, thanks for taking a look :-)
Hmm, I think you might be right. For context, one of the first
ideas I had for this patch was to have a per-zspage array, but store
it in the first zpdesc. As you can imagine this was not a good idea...
(head zpdesc page and tail zpdesc page? ;))
But! storing it in the zspage struct makes a lot more sense to me.
And I think we can actually simplify the migration pathways as well.
My immediate response to this was that "subzpdesc swap ins/outs would
be difficult" since right now we can just move the pointer, but
if we have a per-zspage array, we actually don't have to do any
objcgs pointer migration at all.
And I think the cross-boundary cases are handled a lot better by having
a per-zspage array too. We also wouldn't have to convert the per-zspage
obj_idx into a per-zpdesc obj_idx, I think, if we do this...
Let me mull on this for a bit :-) I'll give a shot at implementing
it this way, I think it makes sense!
Thanks again for taking a look, Harry. Have a great day!
Joshua
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc
2026-03-06 15:48 ` Joshua Hahn
@ 2026-03-11 2:27 ` Harry Yoo
0 siblings, 0 replies; 9+ messages in thread
From: Harry Yoo @ 2026-03-11 2:27 UTC (permalink / raw)
To: Joshua Hahn
Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe,
Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton,
linux-mm, linux-block, linux-kernel, kernel-team
On Fri, Mar 06, 2026 at 07:48:06AM -0800, Joshua Hahn wrote:
> On Fri, 6 Mar 2026 12:49:44 +0900 Harry Yoo <harry.yoo@oracle.com> wrote:
>
> > On Thu, Feb 26, 2026 at 11:29:26AM -0800, Joshua Hahn wrote:
> > > [...]
> > >
> > > @@ -893,6 +898,43 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
> > > set_freeobj(zspage, 0);
> > > }
> > >
> > > +#ifdef CONFIG_MEMCG
> > > +static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp,
> > > + struct zpdesc *zpdescs[])
> > > +{
> > > + /*
> > > + * Add 2 to objcgs_per_zpdesc to account for partial objs that may be
> > > + * stored at the beginning or end of the zpdesc.
> > > + */
> > > + int objcgs_per_zpdesc = (PAGE_SIZE / class->size) + 2;
> > > + int i;
> > > + struct obj_cgroup **objcgs;
> >
> > Just wondering, perhaps it makes more sense to have an array of
> > objcg pointers for each zspage (of size objs_per_zspage)?
>
> Hi Harry! I hope you are doing well, thanks for taking a look :-)
>
> Hmm, I think you might be right. For context, one of the first
> ideas I had for this patch was to have a per-zspage array, but store
> it in the first zpdesc. As you can imagine this was not a good idea...
> (head zpdesc page and tail zpdesc page? ;))
Yeah that's not good ;)
> But! storing it in the zspage struct makes a lot more sense to me.
> And I think we can actually simplify the migration pathways as well.
>
> My immediate response to this was that "subzpdesc swap ins/outs would
> be difficult" since right now we can just move the pointer, but
> if we have a per-zspage array, we actually don't have to do any
> objcgs pointer migration at all.
Right.
> And I think the cross-boundary cases are handled a lot better by having
> a per-zspage array too. We also wouldn't have to convert the per-zspage
> obj_idx into a per-zpdesc obj_idx, I think, if we do this...
Right.
> Let me mull on this for a bit :-) I'll give a shot at implementing
> it this way, I think it makes sense!
>
> Thanks again for taking a look, Harry. Have a great day!
Happy to help (hehe, it just looked a bit more natural),
have a good day!
> Joshua
--
Cheers,
Harry / Hyeonggon
end of thread, other threads:[~2026-03-11 2:28 UTC | newest]
Thread overview: 9+ messages
[not found] <20260226192936.3190275-1-joshua.hahnjy@gmail.com>
2026-02-26 19:29 ` [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc Joshua Hahn
2026-02-26 21:37 ` Shakeel Butt
2026-02-26 21:43 ` Joshua Hahn
2026-03-04 16:58 ` Yosry Ahmed
2026-03-04 18:03 ` Joshua Hahn
2026-03-06 3:49 ` Harry Yoo
2026-03-06 15:48 ` Joshua Hahn
2026-03-11 2:27 ` Harry Yoo
2026-02-26 19:29 ` [PATCH 4/8] mm/zsmalloc: Store obj_cgroup pointer in zpdesc Joshua Hahn