* [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc [not found] <20260226192936.3190275-1-joshua.hahnjy@gmail.com> @ 2026-02-26 19:29 ` Joshua Hahn 2026-02-26 21:37 ` Shakeel Butt ` (2 more replies) 2026-02-26 19:29 ` [PATCH 4/8] mm/zsmalloc: Store obj_cgroup pointer in zpdesc Joshua Hahn 1 sibling, 3 replies; 9+ messages in thread From: Joshua Hahn @ 2026-02-26 19:29 UTC (permalink / raw) To: Minchan Kim, Sergey Senozhatsky Cc: Johannes Weiner, Jens Axboe, Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm, linux-block, linux-kernel, kernel-team Introduce an array of struct obj_cgroup pointers to zpdesc to keep track of compressed objects' memcg ownership. The 8 bytes required to add the array in struct zpdesc brings its size up from 56 bytes to 64 bytes. However, in the current implementation, struct zpdesc lies on top of struct page[1]. This allows the increased size to remain invisible to the outside, since 64 bytes are used for struct zpdesc anyway. The newly added obj_cgroup array pointer overlays page->memcg_data, which causes problems for functions that try to perform page charging by checking the zeroness of page->memcg_data. To make sure that the backing zpdesc's obj_cgroup ** is not interpreted as a mem_cgroup *, follow SLUB's lead and use the MEMCG_DATA_OBJEXTS bit to tag the pointer. Consumers of zsmalloc that do not perform memcg accounting (i.e. zram) are completely unaffected by this patch, as the array to track the obj_cgroup pointers is only allocated in the zswap path. This patch temporarily increases the memory used by zswap by 8 bytes per zswap_entry, since the obj_cgroup pointer is duplicated in the zpdesc and in zswap_entry. In the following patches, we will redirect memory charging operations to use the zpdesc's obj_cgroup instead, and remove the pointer from zswap_entry. This will leave no net memory usage increase for both zram and zswap. 
In this patch, allocate / free the objcg pointer array for the zswap path, and handle partial object migration and full zpdesc migration. [1] In the (near) future, struct zpdesc may no longer overlay struct page as we shift towards using memdescs. When this happens, the size increase of struct zpdesc will no longer be free. With that said, the difference can be kept minimal. All the changes that are being implemented are currently guarded under CONFIG_MEMCG. We can optionally minimize the impact on zram users by guarding these changes under CONFIG_MEMCG && CONFIG_ZSWAP as well. Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> --- drivers/block/zram/zram_drv.c | 10 ++--- include/linux/zsmalloc.h | 2 +- mm/zpdesc.h | 25 +++++++++++- mm/zsmalloc.c | 74 +++++++++++++++++++++++++++++------ mm/zswap.c | 2 +- 5 files changed, 93 insertions(+), 20 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 61d3e2c74901..60ee85679730 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -2220,8 +2220,8 @@ static int write_incompressible_page(struct zram *zram, struct page *page, * like we do for compressible pages. 
*/ handle = zs_malloc(zram->mem_pool, PAGE_SIZE, - GFP_NOIO | __GFP_NOWARN | - __GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page)); + GFP_NOIO | __GFP_NOWARN | __GFP_HIGHMEM | + __GFP_MOVABLE, page_to_nid(page), false); if (IS_ERR_VALUE(handle)) return PTR_ERR((void *)handle); @@ -2283,8 +2283,8 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index) } handle = zs_malloc(zram->mem_pool, comp_len, - GFP_NOIO | __GFP_NOWARN | - __GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page)); + GFP_NOIO | __GFP_NOWARN | __GFP_HIGHMEM | + __GFP_MOVABLE, page_to_nid(page), false); if (IS_ERR_VALUE(handle)) { zcomp_stream_put(zstrm); return PTR_ERR((void *)handle); @@ -2514,7 +2514,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page, handle_new = zs_malloc(zram->mem_pool, comp_len_new, GFP_NOIO | __GFP_NOWARN | __GFP_HIGHMEM | __GFP_MOVABLE, - page_to_nid(page)); + page_to_nid(page), false); if (IS_ERR_VALUE(handle_new)) { zcomp_stream_put(zstrm); return PTR_ERR((void *)handle_new); diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h index 478410c880b1..8ef28b964bb0 100644 --- a/include/linux/zsmalloc.h +++ b/include/linux/zsmalloc.h @@ -28,7 +28,7 @@ struct zs_pool *zs_create_pool(const char *name); void zs_destroy_pool(struct zs_pool *pool); unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t flags, - const int nid); + const int nid, bool objcg); void zs_free(struct zs_pool *pool, unsigned long obj); size_t zs_huge_class_size(struct zs_pool *pool); diff --git a/mm/zpdesc.h b/mm/zpdesc.h index b8258dc78548..d10a73e4a90e 100644 --- a/mm/zpdesc.h +++ b/mm/zpdesc.h @@ -20,10 +20,12 @@ * @zspage: Points to the zspage this zpdesc is a part of. * @first_obj_offset: First object offset in zsmalloc pool. * @_refcount: The number of references to this zpdesc. + * @objcgs: Array of objcgs pointers that the stored objs + * belong to. 
Overlayed on top of page->memcg_data, and + * will always have first bit set if it is a valid pointer. * * This struct overlays struct page for now. Do not modify without a good - * understanding of the issues. In particular, do not expand into the overlap - * with memcg_data. + * understanding of the issues. * * Page flags used: * * PG_private identifies the first component page. @@ -47,6 +49,9 @@ struct zpdesc { */ unsigned int first_obj_offset; atomic_t _refcount; +#ifdef CONFIG_MEMCG + unsigned long objcgs; +#endif }; #define ZPDESC_MATCH(pg, zp) \ static_assert(offsetof(struct page, pg) == offsetof(struct zpdesc, zp)) @@ -59,6 +64,9 @@ ZPDESC_MATCH(__folio_index, handle); ZPDESC_MATCH(private, zspage); ZPDESC_MATCH(page_type, first_obj_offset); ZPDESC_MATCH(_refcount, _refcount); +#ifdef CONFIG_MEMCG +ZPDESC_MATCH(memcg_data, objcgs); +#endif #undef ZPDESC_MATCH static_assert(sizeof(struct zpdesc) <= sizeof(struct page)); @@ -171,4 +179,17 @@ static inline bool zpdesc_is_locked(struct zpdesc *zpdesc) { return folio_test_locked(zpdesc_folio(zpdesc)); } + +#ifdef CONFIG_MEMCG +static inline struct obj_cgroup **zpdesc_objcgs(struct zpdesc *zpdesc) +{ + return (struct obj_cgroup **)(zpdesc->objcgs & ~OBJEXTS_FLAGS_MASK); +} + +static inline void zpdesc_set_objcgs(struct zpdesc *zpdesc, + struct obj_cgroup **objcgs) +{ + zpdesc->objcgs = (unsigned long)objcgs | MEMCG_DATA_OBJEXTS; +} +#endif #endif diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 7846f31bcc8b..7d56bb700e11 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -39,6 +39,7 @@ #include <linux/zsmalloc.h> #include <linux/fs.h> #include <linux/workqueue.h> +#include <linux/memcontrol.h> #include "zpdesc.h" #define ZSPAGE_MAGIC 0x58 @@ -777,6 +778,10 @@ static void reset_zpdesc(struct zpdesc *zpdesc) ClearPagePrivate(page); zpdesc->zspage = NULL; zpdesc->next = NULL; +#ifdef CONFIG_MEMCG + kfree(zpdesc_objcgs(zpdesc)); + zpdesc->objcgs = 0; +#endif /* PageZsmalloc is sticky until the page is freed to the 
buddy. */ } @@ -893,6 +898,43 @@ static void init_zspage(struct size_class *class, struct zspage *zspage) set_freeobj(zspage, 0); } +#ifdef CONFIG_MEMCG +static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp, + struct zpdesc *zpdescs[]) +{ + /* + * Add 2 to objcgs_per_zpdesc to account for partial objs that may be + * stored at the beginning or end of the zpdesc. + */ + int objcgs_per_zpdesc = (PAGE_SIZE / class->size) + 2; + int i; + struct obj_cgroup **objcgs; + + for (i = 0; i < class->pages_per_zspage; i++) { + objcgs = kcalloc(objcgs_per_zpdesc, sizeof(struct obj_cgroup *), + gfp & ~__GFP_HIGHMEM); + if (!objcgs) { + while (--i >= 0) { + kfree(zpdesc_objcgs(zpdescs[i])); + zpdescs[i]->objcgs = 0; + } + + return false; + } + + zpdesc_set_objcgs(zpdescs[i], objcgs); + } + + return true; +} +#else +static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp, + struct zpdesc *zpdescs[]) +{ + return true; +} +#endif + static void create_page_chain(struct size_class *class, struct zspage *zspage, struct zpdesc *zpdescs[]) { @@ -931,7 +973,7 @@ static void create_page_chain(struct size_class *class, struct zspage *zspage, */ static struct zspage *alloc_zspage(struct zs_pool *pool, struct size_class *class, - gfp_t gfp, const int nid) + gfp_t gfp, const int nid, bool objcg) { int i; struct zpdesc *zpdescs[ZS_MAX_PAGES_PER_ZSPAGE]; @@ -952,24 +994,29 @@ static struct zspage *alloc_zspage(struct zs_pool *pool, struct zpdesc *zpdesc; zpdesc = alloc_zpdesc(gfp, nid); - if (!zpdesc) { - while (--i >= 0) { - zpdesc_dec_zone_page_state(zpdescs[i]); - free_zpdesc(zpdescs[i]); - } - cache_free_zspage(zspage); - return NULL; - } + if (!zpdesc) + goto err; __zpdesc_set_zsmalloc(zpdesc); zpdesc_inc_zone_page_state(zpdesc); zpdescs[i] = zpdesc; } + if (objcg && !alloc_zspage_objcgs(class, gfp, zpdescs)) + goto err; + create_page_chain(class, zspage, zpdescs); init_zspage(class, zspage); return zspage; + +err: + while (--i >= 0) { + 
zpdesc_dec_zone_page_state(zpdescs[i]); + free_zpdesc(zpdescs[i]); + } + cache_free_zspage(zspage); + return NULL; } static struct zspage *find_get_zspage(struct size_class *class) @@ -1289,13 +1336,14 @@ static unsigned long obj_malloc(struct zs_pool *pool, * @size: size of block to allocate * @gfp: gfp flags when allocating object * @nid: The preferred node id to allocate new zspage (if needed) + * @objcg: Whether the zspage should track per-object memory charging. * * On success, handle to the allocated object is returned, * otherwise an ERR_PTR(). * Allocation requests with size > ZS_MAX_ALLOC_SIZE will fail. */ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp, - const int nid) + const int nid, bool objcg) { unsigned long handle; struct size_class *class; @@ -1330,7 +1378,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp, spin_unlock(&class->lock); - zspage = alloc_zspage(pool, class, gfp, nid); + zspage = alloc_zspage(pool, class, gfp, nid, objcg); if (!zspage) { cache_free_handle(handle); return (unsigned long)ERR_PTR(-ENOMEM); @@ -1672,6 +1720,10 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage, if (unlikely(ZsHugePage(zspage))) newzpdesc->handle = oldzpdesc->handle; __zpdesc_set_movable(newzpdesc); +#ifdef CONFIG_MEMCG + zpdesc_set_objcgs(newzpdesc, zpdesc_objcgs(oldzpdesc)); + oldzpdesc->objcgs = 0; +#endif } static bool zs_page_isolate(struct page *page, isolate_mode_t mode) diff --git a/mm/zswap.c b/mm/zswap.c index af3f0fbb0558..dd083110bfa0 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -905,7 +905,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry, } gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE; - handle = zs_malloc(pool->zs_pool, dlen, gfp, page_to_nid(page)); + handle = zs_malloc(pool->zs_pool, dlen, gfp, page_to_nid(page), true); if (IS_ERR_VALUE(handle)) { alloc_ret = PTR_ERR((void *)handle); goto unlock; -- 2.47.3 ^ permalink raw 
reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc 2026-02-26 19:29 ` [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc Joshua Hahn @ 2026-02-26 21:37 ` Shakeel Butt 2026-02-26 21:43 ` Joshua Hahn 2026-03-04 16:58 ` Yosry Ahmed 2026-03-06 3:49 ` Harry Yoo 2 siblings, 1 reply; 9+ messages in thread From: Shakeel Butt @ 2026-02-26 21:37 UTC (permalink / raw) To: Joshua Hahn Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe, Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm, linux-block, linux-kernel, kernel-team On Thu, Feb 26, 2026 at 11:29:26AM -0800, Joshua Hahn wrote: > --- a/mm/zpdesc.h > +++ b/mm/zpdesc.h > @@ -20,10 +20,12 @@ > * @zspage: Points to the zspage this zpdesc is a part of. > * @first_obj_offset: First object offset in zsmalloc pool. > * @_refcount: The number of references to this zpdesc. > + * @objcgs: Array of objcgs pointers that the stored objs > + * belong to. Overlayed on top of page->memcg_data, and > + * will always have first bit set if it is a valid pointer. > * > * This struct overlays struct page for now. Do not modify without a good > - * understanding of the issues. In particular, do not expand into the overlap > - * with memcg_data. > + * understanding of the issues. > * > * Page flags used: > * * PG_private identifies the first component page. > @@ -47,6 +49,9 @@ struct zpdesc { > */ > unsigned int first_obj_offset; > atomic_t _refcount; > +#ifdef CONFIG_MEMCG > + unsigned long objcgs; Why not just store struct obj_cgroup ** instead of unsigned long? You will not need to do conversions when storing or accessing. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc 2026-02-26 21:37 ` Shakeel Butt @ 2026-02-26 21:43 ` Joshua Hahn 0 siblings, 0 replies; 9+ messages in thread From: Joshua Hahn @ 2026-02-26 21:43 UTC (permalink / raw) To: Shakeel Butt Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe, Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm, linux-block, linux-kernel, kernel-team On Thu, 26 Feb 2026 13:37:42 -0800 Shakeel Butt <shakeel.butt@linux.dev> wrote: > On Thu, Feb 26, 2026 at 11:29:26AM -0800, Joshua Hahn wrote: > > --- a/mm/zpdesc.h > > +++ b/mm/zpdesc.h > > @@ -20,10 +20,12 @@ > > * @zspage: Points to the zspage this zpdesc is a part of. > > * @first_obj_offset: First object offset in zsmalloc pool. > > * @_refcount: The number of references to this zpdesc. > > + * @objcgs: Array of objcgs pointers that the stored objs > > + * belong to. Overlayed on top of page->memcg_data, and > > + * will always have first bit set if it is a valid pointer. > > * > > * This struct overlays struct page for now. Do not modify without a good > > - * understanding of the issues. In particular, do not expand into the overlap > > - * with memcg_data. > > + * understanding of the issues. > > * > > * Page flags used: > > * * PG_private identifies the first component page. > > @@ -47,6 +49,9 @@ struct zpdesc { > > */ > > unsigned int first_obj_offset; > > atomic_t _refcount; > > +#ifdef CONFIG_MEMCG > > + unsigned long objcgs; Hello Shakeel, I hope you're doing well! > Why not just strore struct obj_cgroup ** instead of unsigned long? You will not > need to do conversions when storing or accessing. Yeah, that makes sense to me : -) I guess if we're going to be accessing it with the zpdesc_objcgs and zpdesc_set_objcgs helpers anyways, we can abstract away the casting to unsigned long and re-casting to obj_cgroup pointer away from the user. 
I was just trying to skip one cast, but I think it won't matter too much anyways since it's always hidden away. I'll change it to obj_cgroup ** in the next version!! I hope you have a great day : -) Joshua ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc 2026-02-26 19:29 ` [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc Joshua Hahn 2026-02-26 21:37 ` Shakeel Butt @ 2026-03-04 16:58 ` Yosry Ahmed 2026-03-04 18:03 ` Joshua Hahn 2026-03-06 3:49 ` Harry Yoo 2 siblings, 1 reply; 9+ messages in thread From: Yosry Ahmed @ 2026-03-04 16:58 UTC (permalink / raw) To: Joshua Hahn Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe, Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm, linux-block, linux-kernel, kernel-team > static struct zspage *find_get_zspage(struct size_class *class) > @@ -1289,13 +1336,14 @@ static unsigned long obj_malloc(struct zs_pool *pool, > * @size: size of block to allocate > * @gfp: gfp flags when allocating object > * @nid: The preferred node id to allocate new zspage (if needed) > + * @objcg: Whether the zspage should track per-object memory charging. > * > * On success, handle to the allocated object is returned, > * otherwise an ERR_PTR(). > * Allocation requests with size > ZS_MAX_ALLOC_SIZE will fail. > */ > unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp, > - const int nid) > + const int nid, bool objcg) Instead of passing in a boolean here, what if we make it a pool parameter at creation time? I don't foresee a use case where some objects are charged and some aren't. This avoids needing to always pass objcg=true (for zswap) or objcg=false (for zram), and reduces churn. Also, it allows us to add assertions to zs_obj_write() (and elsewhere if needed) that an objcg is passed in when the pool should be charged. We can even add a zs_obj_write_objcg() variant that takes in an objcg, and keep the current one as-is. Both would internally call a helper that takes in an objcg, but that would further minimize churn to zram. Not sure if that's worth it though. Sergey, WDYT? 
On Thu, Feb 26, 2026 at 11:29 AM Joshua Hahn <joshua.hahnjy@gmail.com> wrote: > > Introduce an array of struct obj_cgroup pointers to zpdesc to keep track > of compressed objects' memcg ownership. > > The 8 bytes required to add the array in struct zpdesc brings its size > up from 56 bytes to 64 bytes. However, in the current implementation, > struct zpdesc lays on top of struct page[1]. This allows the increased > size to remain invisible to the outside, since 64 bytes are used for > struct zpdesc anyways. > > The newly added obj_cgroup array pointer overlays page->memcg_data, > which causes problems for functions that try to perform page charging by > checking the zeroness of page->memcg_data. To make sure that the > backing zpdesc's obj_cgroup ** is not interpreted as a mem_cgroup *, > follow SLUB's lead and use the MEMCG_DATA_OBJEXTS bit to tag the pointer. > > Consumers of zsmalloc that do not perform memcg accounting (i.e. zram) > are completely unaffected by this patch, as the array to track the > obj_cgroup pointers are only allocated in the zswap path. > > This patch temporarily increases the memory used by zswap by 8 bytes > per zswap_entry, since the obj_cgroup pointer is duplicated in the > zpdesc and in zswap_entry. In the following patches, we will redirect > memory charging operations to use the zpdesc's obj_cgroup instead, and > remove the pointer from zswap_entry. This will leave no net memory usage > increase for both zram and zswap. > > In this patch, allocate / free the objcg pointer array for the zswap > path, and handle partial object migration and full zpdesc migration. > > [1] In the (near) future, struct zpdesc may no longer overlay struct > page as we shift towards using memdescs. When this happens, the size > increase of struct zpdesc will no longer free. With that said, the > difference can be kept minimal. > > All the changes that are being implemented are currently guarded under > CONFIG_MEMCG. 
We can optionally minimize the impact on zram users by > guarding these changes in CONFIG_MEMCG && CONFIG_ZSWAP as well. > > Suggested-by: Johannes Weiner <hannes@cmpxchg.org> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> > --- > drivers/block/zram/zram_drv.c | 10 ++--- > include/linux/zsmalloc.h | 2 +- > mm/zpdesc.h | 25 +++++++++++- > mm/zsmalloc.c | 74 +++++++++++++++++++++++++++++------ > mm/zswap.c | 2 +- > 5 files changed, 93 insertions(+), 20 deletions(-) > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > index 61d3e2c74901..60ee85679730 100644 > --- a/drivers/block/zram/zram_drv.c > +++ b/drivers/block/zram/zram_drv.c > @@ -2220,8 +2220,8 @@ static int write_incompressible_page(struct zram *zram, struct page *page, > * like we do for compressible pages. > */ > handle = zs_malloc(zram->mem_pool, PAGE_SIZE, > - GFP_NOIO | __GFP_NOWARN | > - __GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page)); > + GFP_NOIO | __GFP_NOWARN | __GFP_HIGHMEM | > + __GFP_MOVABLE, page_to_nid(page), false); > if (IS_ERR_VALUE(handle)) > return PTR_ERR((void *)handle); > > @@ -2283,8 +2283,8 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index) > } > > handle = zs_malloc(zram->mem_pool, comp_len, > - GFP_NOIO | __GFP_NOWARN | > - __GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page)); > + GFP_NOIO | __GFP_NOWARN | __GFP_HIGHMEM | > + __GFP_MOVABLE, page_to_nid(page), false); > if (IS_ERR_VALUE(handle)) { > zcomp_stream_put(zstrm); > return PTR_ERR((void *)handle); > @@ -2514,7 +2514,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page, > handle_new = zs_malloc(zram->mem_pool, comp_len_new, > GFP_NOIO | __GFP_NOWARN | > __GFP_HIGHMEM | __GFP_MOVABLE, > - page_to_nid(page)); > + page_to_nid(page), false); > if (IS_ERR_VALUE(handle_new)) { > zcomp_stream_put(zstrm); > return PTR_ERR((void *)handle_new); > diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h > index 478410c880b1..8ef28b964bb0 
100644 > --- a/include/linux/zsmalloc.h > +++ b/include/linux/zsmalloc.h > @@ -28,7 +28,7 @@ struct zs_pool *zs_create_pool(const char *name); > void zs_destroy_pool(struct zs_pool *pool); > > unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t flags, > - const int nid); > + const int nid, bool objcg); > void zs_free(struct zs_pool *pool, unsigned long obj); > > size_t zs_huge_class_size(struct zs_pool *pool); > diff --git a/mm/zpdesc.h b/mm/zpdesc.h > index b8258dc78548..d10a73e4a90e 100644 > --- a/mm/zpdesc.h > +++ b/mm/zpdesc.h > @@ -20,10 +20,12 @@ > * @zspage: Points to the zspage this zpdesc is a part of. > * @first_obj_offset: First object offset in zsmalloc pool. > * @_refcount: The number of references to this zpdesc. > + * @objcgs: Array of objcgs pointers that the stored objs > + * belong to. Overlayed on top of page->memcg_data, and > + * will always have first bit set if it is a valid pointer. > * > * This struct overlays struct page for now. Do not modify without a good > - * understanding of the issues. In particular, do not expand into the overlap > - * with memcg_data. > + * understanding of the issues. > * > * Page flags used: > * * PG_private identifies the first component page. 
> @@ -47,6 +49,9 @@ struct zpdesc { > */ > unsigned int first_obj_offset; > atomic_t _refcount; > +#ifdef CONFIG_MEMCG > + unsigned long objcgs; > +#endif > }; > #define ZPDESC_MATCH(pg, zp) \ > static_assert(offsetof(struct page, pg) == offsetof(struct zpdesc, zp)) > @@ -59,6 +64,9 @@ ZPDESC_MATCH(__folio_index, handle); > ZPDESC_MATCH(private, zspage); > ZPDESC_MATCH(page_type, first_obj_offset); > ZPDESC_MATCH(_refcount, _refcount); > +#ifdef CONFIG_MEMCG > +ZPDESC_MATCH(memcg_data, objcgs); > +#endif > #undef ZPDESC_MATCH > static_assert(sizeof(struct zpdesc) <= sizeof(struct page)); > > @@ -171,4 +179,17 @@ static inline bool zpdesc_is_locked(struct zpdesc *zpdesc) > { > return folio_test_locked(zpdesc_folio(zpdesc)); > } > + > +#ifdef CONFIG_MEMCG > +static inline struct obj_cgroup **zpdesc_objcgs(struct zpdesc *zpdesc) > +{ > + return (struct obj_cgroup **)(zpdesc->objcgs & ~OBJEXTS_FLAGS_MASK); > +} > + > +static inline void zpdesc_set_objcgs(struct zpdesc *zpdesc, > + struct obj_cgroup **objcgs) > +{ > + zpdesc->objcgs = (unsigned long)objcgs | MEMCG_DATA_OBJEXTS; > +} > +#endif > #endif > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index 7846f31bcc8b..7d56bb700e11 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -39,6 +39,7 @@ > #include <linux/zsmalloc.h> > #include <linux/fs.h> > #include <linux/workqueue.h> > +#include <linux/memcontrol.h> > #include "zpdesc.h" > > #define ZSPAGE_MAGIC 0x58 > @@ -777,6 +778,10 @@ static void reset_zpdesc(struct zpdesc *zpdesc) > ClearPagePrivate(page); > zpdesc->zspage = NULL; > zpdesc->next = NULL; > +#ifdef CONFIG_MEMCG > + kfree(zpdesc_objcgs(zpdesc)); > + zpdesc->objcgs = 0; > +#endif > /* PageZsmalloc is sticky until the page is freed to the buddy. 
*/ > } > > @@ -893,6 +898,43 @@ static void init_zspage(struct size_class *class, struct zspage *zspage) > set_freeobj(zspage, 0); > } > > +#ifdef CONFIG_MEMCG > +static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp, > + struct zpdesc *zpdescs[]) > +{ > + /* > + * Add 2 to objcgs_per_zpdesc to account for partial objs that may be > + * stored at the beginning or end of the zpdesc. > + */ > + int objcgs_per_zpdesc = (PAGE_SIZE / class->size) + 2; > + int i; > + struct obj_cgroup **objcgs; > + > + for (i = 0; i < class->pages_per_zspage; i++) { > + objcgs = kcalloc(objcgs_per_zpdesc, sizeof(struct obj_cgroup *), > + gfp & ~__GFP_HIGHMEM); > + if (!objcgs) { > + while (--i >= 0) { > + kfree(zpdesc_objcgs(zpdescs[i])); > + zpdescs[i]->objcgs = 0; > + } > + > + return false; > + } > + > + zpdesc_set_objcgs(zpdescs[i], objcgs); > + } > + > + return true; > +} > +#else > +static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp, > + struct zpdesc *zpdescs[]) > +{ > + return true; > +} > +#endif > + > static void create_page_chain(struct size_class *class, struct zspage *zspage, > struct zpdesc *zpdescs[]) > { > @@ -931,7 +973,7 @@ static void create_page_chain(struct size_class *class, struct zspage *zspage, > */ > static struct zspage *alloc_zspage(struct zs_pool *pool, > struct size_class *class, > - gfp_t gfp, const int nid) > + gfp_t gfp, const int nid, bool objcg) > { > int i; > struct zpdesc *zpdescs[ZS_MAX_PAGES_PER_ZSPAGE]; > @@ -952,24 +994,29 @@ static struct zspage *alloc_zspage(struct zs_pool *pool, > struct zpdesc *zpdesc; > > zpdesc = alloc_zpdesc(gfp, nid); > - if (!zpdesc) { > - while (--i >= 0) { > - zpdesc_dec_zone_page_state(zpdescs[i]); > - free_zpdesc(zpdescs[i]); > - } > - cache_free_zspage(zspage); > - return NULL; > - } > + if (!zpdesc) > + goto err; > __zpdesc_set_zsmalloc(zpdesc); > > zpdesc_inc_zone_page_state(zpdesc); > zpdescs[i] = zpdesc; > } > > + if (objcg && !alloc_zspage_objcgs(class, gfp, zpdescs)) > + goto err; 
> + > create_page_chain(class, zspage, zpdescs); > init_zspage(class, zspage); > > return zspage; > + > +err: > + while (--i >= 0) { > + zpdesc_dec_zone_page_state(zpdescs[i]); > + free_zpdesc(zpdescs[i]); > + } > + cache_free_zspage(zspage); > + return NULL; > } > > static struct zspage *find_get_zspage(struct size_class *class) > @@ -1289,13 +1336,14 @@ static unsigned long obj_malloc(struct zs_pool *pool, > * @size: size of block to allocate > * @gfp: gfp flags when allocating object > * @nid: The preferred node id to allocate new zspage (if needed) > + * @objcg: Whether the zspage should track per-object memory charging. > * > * On success, handle to the allocated object is returned, > * otherwise an ERR_PTR(). > * Allocation requests with size > ZS_MAX_ALLOC_SIZE will fail. > */ > unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp, > - const int nid) > + const int nid, bool objcg) > { > unsigned long handle; > struct size_class *class; > @@ -1330,7 +1378,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp, > > spin_unlock(&class->lock); > > - zspage = alloc_zspage(pool, class, gfp, nid); > + zspage = alloc_zspage(pool, class, gfp, nid, objcg); > if (!zspage) { > cache_free_handle(handle); > return (unsigned long)ERR_PTR(-ENOMEM); > @@ -1672,6 +1720,10 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage, > if (unlikely(ZsHugePage(zspage))) > newzpdesc->handle = oldzpdesc->handle; > __zpdesc_set_movable(newzpdesc); > +#ifdef CONFIG_MEMCG > + zpdesc_set_objcgs(newzpdesc, zpdesc_objcgs(oldzpdesc)); > + oldzpdesc->objcgs = 0; > +#endif > } > > static bool zs_page_isolate(struct page *page, isolate_mode_t mode) > diff --git a/mm/zswap.c b/mm/zswap.c > index af3f0fbb0558..dd083110bfa0 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -905,7 +905,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry, > } > > gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE; > - 
handle = zs_malloc(pool->zs_pool, dlen, gfp, page_to_nid(page)); > + handle = zs_malloc(pool->zs_pool, dlen, gfp, page_to_nid(page), true); > if (IS_ERR_VALUE(handle)) { > alloc_ret = PTR_ERR((void *)handle); > goto unlock; > -- > 2.47.3 > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc 2026-03-04 16:58 ` Yosry Ahmed @ 2026-03-04 18:03 ` Joshua Hahn 0 siblings, 0 replies; 9+ messages in thread From: Joshua Hahn @ 2026-03-04 18:03 UTC (permalink / raw) To: Yosry Ahmed Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe, Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm, linux-block, linux-kernel, kernel-team On Wed, 4 Mar 2026 08:58:44 -0800 Yosry Ahmed <yosry@kernel.org> wrote: > > static struct zspage *find_get_zspage(struct size_class *class) > > @@ -1289,13 +1336,14 @@ static unsigned long obj_malloc(struct zs_pool *pool, > > * @size: size of block to allocate > > * @gfp: gfp flags when allocating object > > * @nid: The preferred node id to allocate new zspage (if needed) > > + * @objcg: Whether the zspage should track per-object memory charging. > > * > > * On success, handle to the allocated object is returned, > > * otherwise an ERR_PTR(). > > * Allocation requests with size > ZS_MAX_ALLOC_SIZE will fail. > > */ > > unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp, > > - const int nid) > > + const int nid, bool objcg) > > Instead of passing in a boolean here, what if we make it a pool > parameter at creation time? I don't foresee I use case where some > objects are charged and some aren't. This avoids needing to always > pass objcg=true (for zswap) or objcg=false (for zram), and reduces > churn. Also, it allows us to add assertions to zs_obj_write() (and > elsewhere if needed) that an objcg is passed in when the pool should > be charged. Hi Yosry, Thank you for another great idea! Makes sense to me, then I can leave most of the existing ABI signatures alone and just make the zpool changes. Thanks a lot again : -) Joshua > We can even add a zs_obj_write_objcg() variant that takes in an objcg, > and keep the current one as-is. 
Both would internally call a helper > that takes in an objcg, but that would further minimize churn to zram. > Not sure if that's worth it though. Sergey, WDYT? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc 2026-02-26 19:29 ` [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc Joshua Hahn 2026-02-26 21:37 ` Shakeel Butt 2026-03-04 16:58 ` Yosry Ahmed @ 2026-03-06 3:49 ` Harry Yoo 2026-03-06 15:48 ` Joshua Hahn 2 siblings, 1 reply; 9+ messages in thread From: Harry Yoo @ 2026-03-06 3:49 UTC (permalink / raw) To: Joshua Hahn Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe, Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm, linux-block, linux-kernel, kernel-team On Thu, Feb 26, 2026 at 11:29:26AM -0800, Joshua Hahn wrote: > Introduce an array of struct obj_cgroup pointers to zpdesc to keep track > of compressed objects' memcg ownership. > > The 8 bytes required to add the array in struct zpdesc brings its size > up from 56 bytes to 64 bytes. However, in the current implementation, > struct zpdesc lays on top of struct page[1]. This allows the increased > size to remain invisible to the outside, since 64 bytes are used for > struct zpdesc anyways. > > The newly added obj_cgroup array pointer overlays page->memcg_data, > which causes problems for functions that try to perform page charging by > checking the zeroness of page->memcg_data. To make sure that the > backing zpdesc's obj_cgroup ** is not interpreted as a mem_cgroup *, > follow SLUB's lead and use the MEMCG_DATA_OBJEXTS bit to tag the pointer. > > Consumers of zsmalloc that do not perform memcg accounting (i.e. zram) > are completely unaffected by this patch, as the array to track the > obj_cgroup pointers are only allocated in the zswap path. > > This patch temporarily increases the memory used by zswap by 8 bytes > per zswap_entry, since the obj_cgroup pointer is duplicated in the > zpdesc and in zswap_entry. In the following patches, we will redirect > memory charging operations to use the zpdesc's obj_cgroup instead, and > remove the pointer from zswap_entry. 
This will leave no net memory usage > increase for both zram and zswap. > > In this patch, allocate / free the objcg pointer array for the zswap > path, and handle partial object migration and full zpdesc migration. > > [1] In the (near) future, struct zpdesc may no longer overlay struct > page as we shift towards using memdescs. When this happens, the size > increase of struct zpdesc will no longer free. With that said, the > difference can be kept minimal. > > All the changes that are being implemented are currently guarded under > CONFIG_MEMCG. We can optionally minimize the impact on zram users by > guarding these changes in CONFIG_MEMCG && CONFIG_ZSWAP as well. > > Suggested-by: Johannes Weiner <hannes@cmpxchg.org> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> > --- > drivers/block/zram/zram_drv.c | 10 ++--- > include/linux/zsmalloc.h | 2 +- > mm/zpdesc.h | 25 +++++++++++- > mm/zsmalloc.c | 74 +++++++++++++++++++++++++++++------ > mm/zswap.c | 2 +- > 5 files changed, 93 insertions(+), 20 deletions(-) > > @@ -893,6 +898,43 @@ static void init_zspage(struct size_class *class, struct zspage *zspage) > set_freeobj(zspage, 0); > } > > +#ifdef CONFIG_MEMCG > +static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp, > + struct zpdesc *zpdescs[]) > +{ > + /* > + * Add 2 to objcgs_per_zpdesc to account for partial objs that may be > + * stored at the beginning or end of the zpdesc. > + */ > + int objcgs_per_zpdesc = (PAGE_SIZE / class->size) + 2; > + int i; > + struct obj_cgroup **objcgs; Just wondering, perhaps it makes more sense to have an array of objcg pointers for each zspage (of size objs_per_zspage)? 
> + > + for (i = 0; i < class->pages_per_zspage; i++) { > + objcgs = kcalloc(objcgs_per_zpdesc, sizeof(struct obj_cgroup *), > + gfp & ~__GFP_HIGHMEM); > + if (!objcgs) { > + while (--i >= 0) { > + kfree(zpdesc_objcgs(zpdescs[i])); > + zpdescs[i]->objcgs = 0; > + } > + > + return false; > + } > + > + zpdesc_set_objcgs(zpdescs[i], objcgs); > + } > + > + return true; > +} -- Cheers, Harry / Hyeonggon
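Harry's per-zspage alternative could look roughly like the sketch below. The names (`zs_objcg_array`, `zs_objcg_get`/`zs_objcg_set`) and the userspace `calloc()` stand-in for `kcalloc()` are assumptions for illustration, not the kernel implementation:

```c
#include <stdlib.h>

struct obj_cgroup;	/* opaque, as in the kernel */

/* One objcg slot per object in the zspage, indexed directly by the
 * zspage-relative obj_idx. No per-zpdesc offset math and no "+2"
 * slack slots for objects straddling a page boundary are needed. */
struct zs_objcg_array {
	unsigned int objs_per_zspage;
	struct obj_cgroup **objcgs;
};

static int zs_objcg_array_alloc(struct zs_objcg_array *a,
				unsigned int objs_per_zspage)
{
	a->objcgs = calloc(objs_per_zspage, sizeof(*a->objcgs));
	if (!a->objcgs)
		return -1;	/* -ENOMEM in the kernel */
	a->objs_per_zspage = objs_per_zspage;
	return 0;
}

static struct obj_cgroup *zs_objcg_get(const struct zs_objcg_array *a,
				       unsigned int obj_idx)
{
	return a->objcgs[obj_idx];
}

static void zs_objcg_set(struct zs_objcg_array *a, unsigned int obj_idx,
			 struct obj_cgroup *objcg)
{
	a->objcgs[obj_idx] = objcg;
}
```

One array per zspage also means a single allocation/free pair instead of one per zpdesc.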
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc 2026-03-06 3:49 ` Harry Yoo @ 2026-03-06 15:48 ` Joshua Hahn 2026-03-11 2:27 ` Harry Yoo 0 siblings, 1 reply; 9+ messages in thread From: Joshua Hahn @ 2026-03-06 15:48 UTC (permalink / raw) To: Harry Yoo Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe, Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm, linux-block, linux-kernel, kernel-team On Fri, 6 Mar 2026 12:49:44 +0900 Harry Yoo <harry.yoo@oracle.com> wrote: > On Thu, Feb 26, 2026 at 11:29:26AM -0800, Joshua Hahn wrote: > > Introduce an array of struct obj_cgroup pointers to zpdesc to keep track > > of compressed objects' memcg ownership. > > > > The 8 bytes required to add the array in struct zpdesc brings its size > > up from 56 bytes to 64 bytes. However, in the current implementation, > > struct zpdesc lays on top of struct page[1]. This allows the increased > > size to remain invisible to the outside, since 64 bytes are used for > > struct zpdesc anyways. > > > > The newly added obj_cgroup array pointer overlays page->memcg_data, > > which causes problems for functions that try to perform page charging by > > checking the zeroness of page->memcg_data. To make sure that the > > backing zpdesc's obj_cgroup ** is not interpreted as a mem_cgroup *, > > follow SLUB's lead and use the MEMCG_DATA_OBJEXTS bit to tag the pointer. > > > > Consumers of zsmalloc that do not perform memcg accounting (i.e. zram) > > are completely unaffected by this patch, as the array to track the > > obj_cgroup pointers are only allocated in the zswap path. > > > > This patch temporarily increases the memory used by zswap by 8 bytes > > per zswap_entry, since the obj_cgroup pointer is duplicated in the > > zpdesc and in zswap_entry. In the following patches, we will redirect > > memory charging operations to use the zpdesc's obj_cgroup instead, and > > remove the pointer from zswap_entry. 
This will leave no net memory usage > > increase for both zram and zswap. > > > > In this patch, allocate / free the objcg pointer array for the zswap > > path, and handle partial object migration and full zpdesc migration. > > > > [1] In the (near) future, struct zpdesc may no longer overlay struct > > page as we shift towards using memdescs. When this happens, the size > > increase of struct zpdesc will no longer free. With that said, the > > difference can be kept minimal. > > > > All the changes that are being implemented are currently guarded under > > CONFIG_MEMCG. We can optionally minimize the impact on zram users by > > guarding these changes in CONFIG_MEMCG && CONFIG_ZSWAP as well. > > > > Suggested-by: Johannes Weiner <hannes@cmpxchg.org> > > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> > > --- > > drivers/block/zram/zram_drv.c | 10 ++--- > > include/linux/zsmalloc.h | 2 +- > > mm/zpdesc.h | 25 +++++++++++- > > mm/zsmalloc.c | 74 +++++++++++++++++++++++++++++------ > > mm/zswap.c | 2 +- > > 5 files changed, 93 insertions(+), 20 deletions(-) > > > > @@ -893,6 +898,43 @@ static void init_zspage(struct size_class *class, struct zspage *zspage) > > set_freeobj(zspage, 0); > > } > > > > +#ifdef CONFIG_MEMCG > > +static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp, > > + struct zpdesc *zpdescs[]) > > +{ > > + /* > > + * Add 2 to objcgs_per_zpdesc to account for partial objs that may be > > + * stored at the beginning or end of the zpdesc. > > + */ > > + int objcgs_per_zpdesc = (PAGE_SIZE / class->size) + 2; > > + int i; > > + struct obj_cgroup **objcgs; > > Just wondering, perhaps it makes more sense to have an array of > objcg pointers for each zspage (of size objs_per_zspage)? Hi Harry! I hope you are doing well, thanks for taking a look : -) Hmm, I think you might be right. For context, one of the first ideas I had for this patch was to have a per-zspage array, but store it in the first zpdesc. 
As you can imagine this was not a good idea... (head zpdesc page and tail zpdesc page? ;) ) But! storing it in the zspage struct makes a lot more sense to me. And I think we can actually simplify the migration pathways as well. My immediate response to this was that "subzpdesc swap ins/outs would be difficult" since right now we can just move the pointer, but if we have a per-zspage array, we actually don't have to do any objcgs pointer migration at all. And I think the cross-boundary cases are handled a lot better by having a per-zspage array too. We also wouldn't have to convert the per-zspage obj_idx into a per-zpdesc obj_idx if we do this. Let me mull on this for a bit :-) I'll take a shot at implementing it this way, I think it makes sense! Thanks again for taking a look, Harry. Have a great day! Joshua
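Joshua's point about migration can be made concrete with a small sketch. Assuming a per-zspage `struct obj_cgroup **` array (a hypothetical layout, not the posted patch), compaction reduces to an index-to-index pointer move, and whole-zpdesc migration doesn't touch the objcgs at all:

```c
#include <stddef.h>

struct obj_cgroup;	/* opaque, as in the kernel */

/* Move ownership tracking when compaction relocates an object from
 * src_objcgs[s_idx] (source zspage) to dst_objcgs[d_idx] (destination
 * zspage). There is no page-boundary case: the index is per-zspage,
 * so an object straddling two zpdescs still occupies one slot. */
static void migrate_obj_objcg(struct obj_cgroup **src_objcgs,
			      unsigned int s_idx,
			      struct obj_cgroup **dst_objcgs,
			      unsigned int d_idx)
{
	dst_objcgs[d_idx] = src_objcgs[s_idx];
	src_objcgs[s_idx] = NULL;
}
```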
* Re: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc 2026-03-06 15:48 ` Joshua Hahn @ 2026-03-11 2:27 ` Harry Yoo 0 siblings, 0 replies; 9+ messages in thread From: Harry Yoo @ 2026-03-11 2:27 UTC (permalink / raw) To: Joshua Hahn Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe, Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm, linux-block, linux-kernel, kernel-team On Fri, Mar 06, 2026 at 07:48:06AM -0800, Joshua Hahn wrote: > On Fri, 6 Mar 2026 12:49:44 +0900 Harry Yoo <harry.yoo@oracle.com> wrote: > > > On Thu, Feb 26, 2026 at 11:29:26AM -0800, Joshua Hahn wrote: > > > Introduce an array of struct obj_cgroup pointers to zpdesc to keep track > > > of compressed objects' memcg ownership. > > > > > > The 8 bytes required to add the array in struct zpdesc brings its size > > > up from 56 bytes to 64 bytes. However, in the current implementation, > > > struct zpdesc lays on top of struct page[1]. This allows the increased > > > size to remain invisible to the outside, since 64 bytes are used for > > > struct zpdesc anyways. > > > > > > The newly added obj_cgroup array pointer overlays page->memcg_data, > > > which causes problems for functions that try to perform page charging by > > > checking the zeroness of page->memcg_data. To make sure that the > > > backing zpdesc's obj_cgroup ** is not interpreted as a mem_cgroup *, > > > follow SLUB's lead and use the MEMCG_DATA_OBJEXTS bit to tag the pointer. > > > > > > Consumers of zsmalloc that do not perform memcg accounting (i.e. zram) > > > are completely unaffected by this patch, as the array to track the > > > obj_cgroup pointers are only allocated in the zswap path. > > > > > > This patch temporarily increases the memory used by zswap by 8 bytes > > > per zswap_entry, since the obj_cgroup pointer is duplicated in the > > > zpdesc and in zswap_entry. 
In the following patches, we will redirect > > > memory charging operations to use the zpdesc's obj_cgroup instead, and > > > remove the pointer from zswap_entry. This will leave no net memory usage > > > increase for both zram and zswap. > > > > > > In this patch, allocate / free the objcg pointer array for the zswap > > > path, and handle partial object migration and full zpdesc migration. > > > > > > [1] In the (near) future, struct zpdesc may no longer overlay struct > > > page as we shift towards using memdescs. When this happens, the size > > > increase of struct zpdesc will no longer free. With that said, the > > > difference can be kept minimal. > > > > > > All the changes that are being implemented are currently guarded under > > > CONFIG_MEMCG. We can optionally minimize the impact on zram users by > > > guarding these changes in CONFIG_MEMCG && CONFIG_ZSWAP as well. > > > > > > Suggested-by: Johannes Weiner <hannes@cmpxchg.org> > > > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> > > > --- > > > drivers/block/zram/zram_drv.c | 10 ++--- > > > include/linux/zsmalloc.h | 2 +- > > > mm/zpdesc.h | 25 +++++++++++- > > > mm/zsmalloc.c | 74 +++++++++++++++++++++++++++++------ > > > mm/zswap.c | 2 +- > > > 5 files changed, 93 insertions(+), 20 deletions(-) > > > > > > @@ -893,6 +898,43 @@ static void init_zspage(struct size_class *class, struct zspage *zspage) > > > set_freeobj(zspage, 0); > > > } > > > > > > +#ifdef CONFIG_MEMCG > > > +static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp, > > > + struct zpdesc *zpdescs[]) > > > +{ > > > + /* > > > + * Add 2 to objcgs_per_zpdesc to account for partial objs that may be > > > + * stored at the beginning or end of the zpdesc. > > > + */ > > > + int objcgs_per_zpdesc = (PAGE_SIZE / class->size) + 2; > > > + int i; > > > + struct obj_cgroup **objcgs; > > > > Just wondering, perhaps it makes more sense to have an array of > > objcg pointers for each zspage (of size objs_per_zspage)? > > Hi Harry! 
I hope you are doing well, thanks for taking a look :-) > > Hmm, I think you might be right. For context, one of the first > ideas I had for this patch was to have a per-zspage array, but store > it in the first zpdesc. As you can imagine this was not a good idea... > (head zpdesc page and tail zpdesc page? ;) ) Yeah that's not good ;) > But! storing it in the zspage struct makes a lot more sense to me. > And I think we can actually simplify the migration pathways as well. > > My immediate response to this was that "subzpdesc swap ins/outs would > be difficult" since right now we can just move the pointer, but > if we have a per-zspage array, we actually don't have to do any > objcgs pointer migration at all. Right. > And I think the cross-boundary cases are handled a lot better by having > a per-zspage array too. We also don't have to convert the per-zspage > obj_idx into a per-zpdesc obj_idx as well, I think if we do this... Right. > Let me mull on this for a bit :-) I'll take a shot at implementing > it this way, I think it makes sense! > > Thanks again for taking a look, Harry. Have a great day! Happy to help (hehe, a per-zspage array just looked a bit more natural), have a good day! > Joshua -- Cheers, Harry / Hyeonggon
* [PATCH 4/8] mm/zsmalloc: Store obj_cgroup pointer in zpdesc [not found] <20260226192936.3190275-1-joshua.hahnjy@gmail.com> 2026-02-26 19:29 ` [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc Joshua Hahn @ 2026-02-26 19:29 ` Joshua Hahn 1 sibling, 0 replies; 9+ messages in thread From: Joshua Hahn @ 2026-02-26 19:29 UTC (permalink / raw) To: Minchan Kim, Sergey Senozhatsky Cc: Johannes Weiner, Jens Axboe, Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm, linux-block, linux-kernel, kernel-team With each zswap-backing zpdesc now having an array of obj_cgroup pointers, plumb the obj_cgroup pointer from the zswap / zram layer down to zsmalloc. Introduce two helper functions zpdesc_obj_cgroup and zpdesc_set_obj_cgroup, which abstract the conversion of an object's zspage idx to its zpdesc idx and the retrieval of the obj_cgroup pointer from the zpdesc. From the zswap path, store the obj_cgroup pointer after compression when writing the object and free when the object gets freed. Also handle the migration of an object across zpdescs. The lifetime and charging of the obj_cgroup is still handled in the zswap layer. 
Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> --- drivers/block/zram/zram_drv.c | 7 ++-- include/linux/zsmalloc.h | 3 +- mm/zsmalloc.c | 71 ++++++++++++++++++++++++++++++++++- mm/zswap.c | 6 +-- 4 files changed, 79 insertions(+), 8 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 60ee85679730..209668b14428 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -2231,7 +2231,7 @@ static int write_incompressible_page(struct zram *zram, struct page *page, } src = kmap_local_page(page); - zs_obj_write(zram->mem_pool, handle, src, PAGE_SIZE); + zs_obj_write(zram->mem_pool, handle, src, PAGE_SIZE, NULL); kunmap_local(src); slot_lock(zram, index); @@ -2296,7 +2296,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index) return -ENOMEM; } - zs_obj_write(zram->mem_pool, handle, zstrm->buffer, comp_len); + zs_obj_write(zram->mem_pool, handle, zstrm->buffer, comp_len, NULL); zcomp_stream_put(zstrm); slot_lock(zram, index); @@ -2520,7 +2520,8 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page, return PTR_ERR((void *)handle_new); } - zs_obj_write(zram->mem_pool, handle_new, zstrm->buffer, comp_len_new); + zs_obj_write(zram->mem_pool, handle_new, zstrm->buffer, + comp_len_new, NULL); zcomp_stream_put(zstrm); slot_free(zram, index); diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h index 8ef28b964bb0..22f3baa13f24 100644 --- a/include/linux/zsmalloc.h +++ b/include/linux/zsmalloc.h @@ -15,6 +15,7 @@ #define _ZS_MALLOC_H_ #include <linux/types.h> +#include <linux/memcontrol.h> struct zs_pool_stats { /* How many pages were migrated (freed) */ @@ -48,7 +49,7 @@ void zs_obj_read_sg_begin(struct zs_pool *pool, unsigned long handle, struct scatterlist *sg, size_t mem_len); void zs_obj_read_sg_end(struct zs_pool *pool, unsigned long handle); void zs_obj_write(struct zs_pool *pool, unsigned long 
handle, - void *handle_mem, size_t mem_len); + void *handle_mem, size_t mem_len, struct obj_cgroup *objcg); extern const struct movable_operations zsmalloc_mops; diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 7d56bb700e11..e5ae9a0fc78a 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -899,6 +899,41 @@ static void init_zspage(struct size_class *class, struct zspage *zspage) } #ifdef CONFIG_MEMCG +/* idx is indexed per-zspage, not per-zpdesc. */ +static inline struct obj_cgroup *zpdesc_obj_cgroup(struct zpdesc *zpdesc, + unsigned int idx, + int size) +{ + struct obj_cgroup **objcgs = zpdesc_objcgs(zpdesc); + unsigned int off = offset_in_page(size * idx); + unsigned int zpdesc_idx = DIV_ROUND_UP(off, size); + + if (!objcgs) + return NULL; + + return objcgs[zpdesc_idx]; +} + +/* idx is indexed per-zspage, not per-zpdesc. */ +static inline void zpdesc_set_obj_cgroup(struct zpdesc *zpdesc, + unsigned int idx, int size, + struct obj_cgroup *objcg) +{ + struct obj_cgroup **objcgs = zpdesc_objcgs(zpdesc); + unsigned int off = offset_in_page(size * idx); + unsigned int zpdesc_idx = DIV_ROUND_UP(off, size); + + if (!objcgs) + return; + + objcgs[zpdesc_idx] = objcg; + if (off + size > PAGE_SIZE) { + /* object spans two pages */ + objcgs = zpdesc_objcgs(get_next_zpdesc(zpdesc)); + objcgs[0] = objcg; + } +} + static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp, struct zpdesc *zpdescs[]) { @@ -927,12 +962,40 @@ static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp, return true; } + +static void migrate_obj_objcg(unsigned long used_obj, unsigned long free_obj, + int size) +{ + unsigned int s_obj_idx, d_obj_idx; + struct zpdesc *s_zpdesc, *d_zpdesc; + struct obj_cgroup *objcg; + + obj_to_location(used_obj, &s_zpdesc, &s_obj_idx); + obj_to_location(free_obj, &d_zpdesc, &d_obj_idx); + objcg = zpdesc_obj_cgroup(s_zpdesc, s_obj_idx, size); + + zpdesc_set_obj_cgroup(d_zpdesc, d_obj_idx, size, objcg); + zpdesc_set_obj_cgroup(s_zpdesc, s_obj_idx, size, 
NULL); +} #else +static inline struct obj_cgroup *zpdesc_obj_cgroup(struct zpdesc *zpdesc, + unsigned int offset, + int size) +{ + return NULL; +} + +static inline void zpdesc_set_obj_cgroup(struct zpdesc *zpdesc, + unsigned int offset, int size, + struct obj_cgroup *objcg) {} static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp, struct zpdesc *zpdescs[]) { return true; } + +static void migrate_obj_objcg(unsigned long used_obj, unsigned long free_obj, + int size) {} #endif static void create_page_chain(struct size_class *class, struct zspage *zspage, @@ -1221,7 +1284,7 @@ void zs_obj_read_sg_end(struct zs_pool *pool, unsigned long handle) EXPORT_SYMBOL_GPL(zs_obj_read_sg_end); void zs_obj_write(struct zs_pool *pool, unsigned long handle, - void *handle_mem, size_t mem_len) + void *handle_mem, size_t mem_len, struct obj_cgroup *objcg) { struct zspage *zspage; struct zpdesc *zpdesc; @@ -1242,6 +1305,9 @@ void zs_obj_write(struct zs_pool *pool, unsigned long handle, class = zspage_class(pool, zspage); off = offset_in_page(class->size * obj_idx); + if (objcg) + zpdesc_set_obj_cgroup(zpdesc, obj_idx, class->size, objcg); + if (!ZsHugePage(zspage)) off += ZS_HANDLE_SIZE; @@ -1415,6 +1481,8 @@ static void obj_free(int class_size, unsigned long obj) f_offset = offset_in_page(class_size * f_objidx); zspage = get_zspage(f_zpdesc); + zpdesc_set_obj_cgroup(f_zpdesc, f_objidx, class_size, NULL); + vaddr = kmap_local_zpdesc(f_zpdesc); link = (struct link_free *)(vaddr + f_offset); @@ -1587,6 +1655,7 @@ static void migrate_zspage(struct zs_pool *pool, struct zspage *src_zspage, used_obj = handle_to_obj(handle); free_obj = obj_malloc(pool, dst_zspage, handle); zs_obj_copy(class, free_obj, used_obj); + migrate_obj_objcg(used_obj, free_obj, class->size); obj_idx++; obj_free(class->size, used_obj); diff --git a/mm/zswap.c b/mm/zswap.c index dd083110bfa0..1e2d60f47919 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -851,7 +851,7 @@ static void acomp_ctx_put_unlock(struct 
crypto_acomp_ctx *acomp_ctx) } static bool zswap_compress(struct page *page, struct zswap_entry *entry, - struct zswap_pool *pool) + struct zswap_pool *pool, struct obj_cgroup *objcg) { struct crypto_acomp_ctx *acomp_ctx; struct scatterlist input, output; @@ -911,7 +911,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry, goto unlock; } - zs_obj_write(pool->zs_pool, handle, dst, dlen); + zs_obj_write(pool->zs_pool, handle, dst, dlen, objcg); entry->handle = handle; entry->length = dlen; @@ -1413,7 +1413,7 @@ static bool zswap_store_page(struct page *page, return false; } - if (!zswap_compress(page, entry, pool)) + if (!zswap_compress(page, entry, pool, objcg)) goto compress_failed; old = xa_store(swap_zswap_tree(page_swpentry), -- 2.47.3
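The index conversion inside `zpdesc_obj_cgroup()`/`zpdesc_set_obj_cgroup()` in the patch above is compact, so here is the same math in isolation (`zpdesc_slot` is an illustrative name; `PAGE_SIZE` is fixed at 4096 for the sketch):

```c
#define PAGE_SIZE 4096u
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Per-zpdesc objcg slot for a zspage-relative obj_idx, mirroring the
 * math in zpdesc_obj_cgroup() above. off is the object's start offset
 * within its page; rounding up means an object that begins mid-page
 * (because the previous object spilled over from the prior zpdesc)
 * lands in slot 1 or later, leaving slot 0 for the spilled-over tail. */
static unsigned int zpdesc_slot(unsigned int obj_idx, unsigned int size)
{
	unsigned int off = (size * obj_idx) % PAGE_SIZE; /* offset_in_page() */

	return DIV_ROUND_UP(off, size);
}
```

For example, with a 176-byte class, object 23 starts at offset 4048 and spills 128 bytes into the next page; object 24 then starts at in-page offset 128 of that page and gets slot 1, leaving slot 0 for object 23's tail, which is what the "+2" slack in `alloc_zspage_objcgs()` budgets for.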