public inbox for linux-mm@kvack.org
* [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting
@ 2026-03-11 19:51 Joshua Hahn
  2026-03-11 19:51 ` [PATCH 01/11] mm/zsmalloc: Rename zs_object_copy to zs_obj_copy Joshua Hahn
                   ` (11 more replies)
  0 siblings, 12 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 19:51 UTC (permalink / raw)
  To: Minchan Kim, Sergey Senozhatsky
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Nhat Pham,
	Chengming Zhou, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Harry Yoo, Andrew Morton, cgroups, linux-mm,
	linux-kernel, kernel-team

INTRODUCTION
============
The current design for zswap and zsmalloc leaves a clean divide between
layers of the memory stack. At the higher level, we have zswap, which
interacts directly with memory consumers, compression algorithms, and
handles memory usage accounting via memcg limits. At the lower level,
we have zsmalloc, which handles the page allocation and migration of
physical pages.

While this logical separation simplifies the codebase, it creates
problems for accounting that requires both memory cgroup awareness and
knowledge of physical memory location. To name a few:

 - On tiered systems, it is impossible to understand how much toptier
   memory a cgroup is using, since zswap has no understanding of where
   the compressed memory is physically stored.
   + With SeongJae Park's work to store incompressible pages as-is in
     zswap [1], the size of compressed memory can become non-trivial,
     and easily consume a meaningful portion of memory.

 - cgroups that restrict memory nodes have no control over which nodes
   their zswapped objects live on. This can lead to unexpectedly high
   fault times for workloads, which must absorb the remote-access
   latency of retrieving the compressed object from a remote node.
   + Nhat Pham addressed this issue via a best-effort attempt to place
     compressed objects in the same page as the original page, but this
     cannot guarantee complete isolation [2].

 - On the flip side, zsmalloc's ignorance of cgroups also makes its
   shrinker memcg-unaware, which can lead to ineffective reclaim when
   pressure is localized to a single cgroup.

Until recently, zpool acted as another layer of indirection between
zswap and zsmalloc, which made bridging memcg and physical location
difficult. Now that zsmalloc is the only allocator backend for zswap and
zram [3], it is possible to move memory-cgroup accounting to the
zsmalloc layer.

Introduce a new per-zspage array of objcg pointers to track
per-memcg-lruvec memory usage by zswap, while leaving zram users
mostly unaffected.

In addition, move the accounting of memcg charges from the consumer
layer (zswap, zram) to the zsmalloc layer. Stat indices are
parameterized at pool creation time, meaning future consumers that wish
to account memory statistics can do so using the compressed object
memory accounting infrastructure introduced here.

PERFORMANCE
===========
The experiments were performed across 5 trials on a 2-NUMA machine.

Experiment 1:
Node-bound workload, churning memory by allocating 2GB in a 1GB cgroup.
0.638% regression, standard deviation: +/- 0.603%

Experiment 2:
Writeback with zswap pressure
0.295% gain, standard deviation: +/- 0.456%

Experiment 3:
1 cgroup, 2 workloads each bound to a NUMA node.
2.126% regression, standard deviation: +/- 3.008%

Experiment 4:
Reading memory.stat 10000x
1.464% gain, standard deviation: +/- 2.239%

Experiment 5:
Reading memory.numa_stat 10000x
0.281% gain, standard deviation: +/- 1.878%

The gains and regressions all fall within or close to one standard
deviation, i.e. in the noise. I would like to note that workloads that
span NUMA nodes may see some contention as the zsmalloc migration path
becomes more expensive.

PATCH OUTLINE
=============
Patches 1 and 2 are small cleanups that make the codebase consistent and
easier to digest.

Patch 3 introduces memcg accounting-awareness to struct zs_pool, and
allows consumers to provide the memcg stat item indices that should be
accounted. The awareness is not functional at this point.

Patches 4, 5, and 6 allocate and populate the new zspage->objcgs field
with compressed objects' obj_cgroups. zswap_entry->objcg is removed,
and lookups are redirected to the zspage for memcg information.

Patch 7 moves the charging and lifetime management of obj_cgroups to
the zsmalloc layer, leaving zswap as merely a plumbing layer that hands
cgroup information to zsmalloc at compression time.

Patches 8 and 9 introduce node counters and memcg-lruvec counters for
zswap.

Patches 10 and 11 handle charge migrations for the two types of compressed
object migration in zsmalloc. Special care is taken for compressed
objects that span multiple nodes.

CHANGELOG V1 [4] --> V2
=======================
A lot has changed from v1 to v2, thanks to generous suggestions from
reviewers.
- Harry Yoo's suggestion to make the objcgs array per-zspage instead of
  per-zpdesc simplified much of the code needed to handle boundary
  cases, since most of the index translation (from per-zspage to
  per-zpdesc) goes away. The reverse translation (per-zpdesc to
  per-zspage) is harder now, but the only case where this really
  matters is charge migration in patch 10. Thank you Harry!

- Yosry Ahmed's suggestion to make memcg awareness a per-zspool decision
  has simplified much of the #ifdef casing needed, which makes the code
  a lot easier to follow (and makes changes less invasive for zram).

- Yosry Ahmed's suggestion to parameterize the memcg stat indices as
  zs_pool parameters makes the awkward hardcoding of zswap stat indices
  in zsmalloc code more natural and leaves room for future consumers to
  follow. Thank you Yosry!

- Shakeel Butt's suggestion to turn the objcgs array from an unsigned
  long into a struct obj_cgroup ** pointer made the code much cleaner.
  However, after moving the pointer from zpdesc to zspage, there is no
  longer a need to tag the pointer. Thank you, Shakeel!

- v1 only handled the migration case for single compressed objects.
  Patch 11 in v2 is written to handle the migration case for zpdesc
  replacement.
  + Special-casing compressed objects that live at a zspage boundary
    is a tad harder with per-zspage objcgs, but I felt this difficulty
    was outweighed by the simplification of the typical write/free
    case.

REVIEWERS NOTE
==============
Patches 10 and 11 are a bit hairy, since they have to deal with
special-case scenarios for objects that span pages. I originally
implemented a very simple approach using the existing zs_charge_objcg
functions, but later realized that these migration paths hold spin
locks and therefore cannot tolerate obj_cgroup_charge sleeping.

The workaround is less elegant, but gets the job done. Feedback on these
two commits would be greatly appreciated!

[1] https://lore.kernel.org/linux-mm/20250822190817.49287-1-sj@kernel.org/
[2] https://lore.kernel.org/linux-mm/20250402204416.3435994-1-nphamcs@gmail.com/#t3
[3] https://lore.kernel.org/linux-mm/20250829162212.208258-1-hannes@cmpxchg.org/
[4] https://lore.kernel.org/all/20260226192936.3190275-1-joshua.hahnjy@gmail.com/

Joshua Hahn (11):
  mm/zsmalloc: Rename zs_object_copy to zs_obj_copy
  mm/zsmalloc: Make all obj_idx unsigned ints
  mm/zsmalloc: Introduce conditional memcg awareness to zs_pool
  mm/zsmalloc: Introduce objcgs pointer in struct zspage
  mm/zsmalloc: Store obj_cgroup pointer in zspage
  mm/zsmalloc, zswap: Redirect zswap_entry->objcg to zspage
  mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc
  mm/memcontrol: Track MEMCG_ZSWAPPED in bytes
  mm/vmstat, memcontrol: Track ZSWAP_B, ZSWAPPED_B per-memcg-lruvec
  mm/zsmalloc: Handle single object charge migration in migrate_zspage
  mm/zsmalloc: Handle charge migration in zpdesc substitution

 drivers/block/zram/zram_drv.c |  10 +-
 include/linux/memcontrol.h    |  20 +-
 include/linux/mmzone.h        |   2 +
 include/linux/zsmalloc.h      |   9 +-
 mm/memcontrol.c               |  75 ++-----
 mm/vmstat.c                   |   2 +
 mm/zsmalloc.c                 | 381 ++++++++++++++++++++++++++++++++--
 mm/zswap.c                    |  66 +++---
 8 files changed, 431 insertions(+), 134 deletions(-)

-- 
2.52.0




* [PATCH 01/11] mm/zsmalloc: Rename zs_object_copy to zs_obj_copy
  2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
@ 2026-03-11 19:51 ` Joshua Hahn
  2026-03-11 19:56   ` Yosry Ahmed
  2026-03-11 20:00   ` Nhat Pham
  2026-03-11 19:51 ` [PATCH 02/11] mm/zsmalloc: Make all obj_idx unsigned ints Joshua Hahn
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 19:51 UTC (permalink / raw)
  To: Minchan Kim, Sergey Senozhatsky
  Cc: Nhat Pham, Nhat Pham, Johannes Weiner, Andrew Morton, linux-mm,
	linux-kernel, kernel-team

All the zsmalloc functions that operate on a zsmalloc object (encoded
location values) are named "zs_obj_xxx", except for zs_object_copy.

Rename zs_object_copy to zs_obj_copy to conform to the pattern.
No functional changes intended.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 mm/zsmalloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 2c1430bf8d57..7a9b8f55d529 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1416,7 +1416,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 }
 EXPORT_SYMBOL_GPL(zs_free);
 
-static void zs_object_copy(struct size_class *class, unsigned long dst,
+static void zs_obj_copy(struct size_class *class, unsigned long dst,
 				unsigned long src)
 {
 	struct zpdesc *s_zpdesc, *d_zpdesc;
@@ -1537,7 +1537,7 @@ static void migrate_zspage(struct zs_pool *pool, struct zspage *src_zspage,
 
 		used_obj = handle_to_obj(handle);
 		free_obj = obj_malloc(pool, dst_zspage, handle);
-		zs_object_copy(class, free_obj, used_obj);
+		zs_obj_copy(class, free_obj, used_obj);
 		obj_idx++;
 		obj_free(class->size, used_obj);
 
-- 
2.52.0




* [PATCH 02/11] mm/zsmalloc: Make all obj_idx unsigned ints
  2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
  2026-03-11 19:51 ` [PATCH 01/11] mm/zsmalloc: Rename zs_object_copy to zs_obj_copy Joshua Hahn
@ 2026-03-11 19:51 ` Joshua Hahn
  2026-03-11 19:58   ` Yosry Ahmed
  2026-03-11 20:01   ` Nhat Pham
  2026-03-11 19:51 ` [PATCH 03/11] mm/zsmalloc: Introduce conditional memcg awareness to zs_pool Joshua Hahn
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 19:51 UTC (permalink / raw)
  To: Minchan Kim, Sergey Senozhatsky
  Cc: Nhat Pham, Nhat Pham, Johannes Weiner, Andrew Morton, linux-mm,
	linux-kernel, kernel-team

Object indices, which describe the location of an object in a zspage,
cannot be negative. To reflect this, most helpers calculate and return
these values as unsigned ints.

Convert find_alloced_obj, the only function that calculates obj_idx as
a signed int, to use an unsigned int as well.

No functional change intended.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 mm/zsmalloc.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 7a9b8f55d529..7758486e1d06 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1491,10 +1491,11 @@ static void zs_obj_copy(struct size_class *class, unsigned long dst,
  * return handle.
  */
 static unsigned long find_alloced_obj(struct size_class *class,
-				      struct zpdesc *zpdesc, int *obj_idx)
+				      struct zpdesc *zpdesc,
+				      unsigned int *obj_idx)
 {
 	unsigned int offset;
-	int index = *obj_idx;
+	unsigned int index = *obj_idx;
 	unsigned long handle = 0;
 	void *addr = kmap_local_zpdesc(zpdesc);
 
@@ -1521,7 +1522,7 @@ static void migrate_zspage(struct zs_pool *pool, struct zspage *src_zspage,
 {
 	unsigned long used_obj, free_obj;
 	unsigned long handle;
-	int obj_idx = 0;
+	unsigned int obj_idx = 0;
 	struct zpdesc *s_zpdesc = get_first_zpdesc(src_zspage);
 	struct size_class *class = pool->size_class[src_zspage->class];
 
-- 
2.52.0




* [PATCH 03/11] mm/zsmalloc: Introduce conditional memcg awareness to zs_pool
  2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
  2026-03-11 19:51 ` [PATCH 01/11] mm/zsmalloc: Rename zs_object_copy to zs_obj_copy Joshua Hahn
  2026-03-11 19:51 ` [PATCH 02/11] mm/zsmalloc: Make all obj_idx unsigned ints Joshua Hahn
@ 2026-03-11 19:51 ` Joshua Hahn
  2026-03-11 20:12   ` Nhat Pham
  2026-03-11 20:16   ` Johannes Weiner
  2026-03-11 19:51 ` [PATCH 04/11] mm/zsmalloc: Introduce objcgs pointer in struct zspage Joshua Hahn
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 19:51 UTC (permalink / raw)
  To: Minchan Kim, Sergey Senozhatsky
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Nhat Pham,
	Chengming Zhou, Andrew Morton, linux-mm, linux-block,
	linux-kernel, kernel-team

Introduce 3 new fields to struct zs_pool to allow individual zpools to
be "memcg-aware": memcg_aware, compressed_stat, and uncompressed_stat.

memcg_aware is used in later patches to determine whether memory
should be allocated to keep track of per-compressed-object objcgs.
compressed_stat and uncompressed_stat are enum indices that point into
the memcg (node) stats that zsmalloc will account towards.

In reality, these fields help distinguish between the two users of
zsmalloc, zswap and zram. The enum indices compressed_stat and
uncompressed_stat are parametrized to minimize zswap-specific hardcoding
in zsmalloc.

Suggested-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 drivers/block/zram/zram_drv.c |  3 ++-
 include/linux/zsmalloc.h      |  5 ++++-
 mm/zsmalloc.c                 | 13 ++++++++++++-
 mm/zswap.c                    |  3 ++-
 4 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index bca33403fc8b..d1eae5c20df7 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1980,7 +1980,8 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
 	if (!zram->table)
 		return false;
 
-	zram->mem_pool = zs_create_pool(zram->disk->disk_name);
+	/* zram does not support memcg accounting */
+	zram->mem_pool = zs_create_pool(zram->disk->disk_name, false, 0, 0);
 	if (!zram->mem_pool) {
 		vfree(zram->table);
 		zram->table = NULL;
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 478410c880b1..24fb2e0fdf67 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -23,8 +23,11 @@ struct zs_pool_stats {
 
 struct zs_pool;
 struct scatterlist;
+enum memcg_stat_item;
 
-struct zs_pool *zs_create_pool(const char *name);
+struct zs_pool *zs_create_pool(const char *name, bool memcg_aware,
+			       enum memcg_stat_item compressed_stat,
+			       enum memcg_stat_item uncompressed_stat);
 void zs_destroy_pool(struct zs_pool *pool);
 
 unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t flags,
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 7758486e1d06..3f0f42b78314 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -214,6 +214,9 @@ struct zs_pool {
 #ifdef CONFIG_COMPACTION
 	struct work_struct free_work;
 #endif
+	bool memcg_aware;
+	enum memcg_stat_item compressed_stat;
+	enum memcg_stat_item uncompressed_stat;
 	/* protect zspage migration/compaction */
 	rwlock_t lock;
 	atomic_t compaction_in_progress;
@@ -2050,6 +2053,9 @@ static int calculate_zspage_chain_size(int class_size)
 /**
  * zs_create_pool - Creates an allocation pool to work from.
  * @name: pool name to be created
+ * @memcg_aware: whether the consumer of this pool will account memcg stats
+ * @compressed_stat: compressed memcontrol stat item to account
+ * @uncompressed_stat: uncompressed memcontrol stat item to account
  *
  * This function must be called before anything when using
  * the zsmalloc allocator.
@@ -2057,7 +2063,9 @@ static int calculate_zspage_chain_size(int class_size)
  * On success, a pointer to the newly created pool is returned,
  * otherwise NULL.
  */
-struct zs_pool *zs_create_pool(const char *name)
+struct zs_pool *zs_create_pool(const char *name, bool memcg_aware,
+			       enum memcg_stat_item compressed_stat,
+			       enum memcg_stat_item uncompressed_stat)
 {
 	int i;
 	struct zs_pool *pool;
@@ -2071,6 +2079,9 @@ struct zs_pool *zs_create_pool(const char *name)
 	rwlock_init(&pool->lock);
 	atomic_set(&pool->compaction_in_progress, 0);
 
+	pool->memcg_aware = memcg_aware;
+	pool->compressed_stat = compressed_stat;
+	pool->uncompressed_stat = uncompressed_stat;
 	pool->name = kstrdup(name, GFP_KERNEL);
 	if (!pool->name)
 		goto err;
diff --git a/mm/zswap.c b/mm/zswap.c
index e6ec3295bdb0..ff9abaa8aa38 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -257,7 +257,8 @@ static struct zswap_pool *zswap_pool_create(char *compressor)
 
 	/* unique name for each pool specifically required by zsmalloc */
 	snprintf(name, 38, "zswap%x", atomic_inc_return(&zswap_pools_count));
-	pool->zs_pool = zs_create_pool(name);
+	pool->zs_pool = zs_create_pool(name, true, MEMCG_ZSWAP_B,
+				       MEMCG_ZSWAPPED);
 	if (!pool->zs_pool)
 		goto error;
 
-- 
2.52.0




* [PATCH 04/11] mm/zsmalloc: Introduce objcgs pointer in struct zspage
  2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
                   ` (2 preceding siblings ...)
  2026-03-11 19:51 ` [PATCH 03/11] mm/zsmalloc: Introduce conditional memcg awareness to zs_pool Joshua Hahn
@ 2026-03-11 19:51 ` Joshua Hahn
  2026-03-11 20:17   ` Nhat Pham
  2026-03-11 19:51 ` [PATCH 05/11] mm/zsmalloc: Store obj_cgroup pointer in zspage Joshua Hahn
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 19:51 UTC (permalink / raw)
  To: Minchan Kim, Sergey Senozhatsky
  Cc: Johannes Weiner, Harry Yoo, Yosry Ahmed, Nhat Pham, Nhat Pham,
	Chengming Zhou, Andrew Morton, linux-mm, linux-block,
	linux-kernel, kernel-team

Introduce an array of struct obj_cgroup pointers to zspage to keep track
of compressed objects' memcg ownership, if the zs_pool has been made to
be memcg-aware at creation time.

Move the error path for alloc_zspage to a jump label to simplify the
growing error handling path for a failed zpdesc allocation.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 mm/zsmalloc.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 3f0f42b78314..dcf99516227c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -39,6 +39,7 @@
 #include <linux/zsmalloc.h>
 #include <linux/fs.h>
 #include <linux/workqueue.h>
+#include <linux/memcontrol.h>
 #include "zpdesc.h"
 
 #define ZSPAGE_MAGIC	0x58
@@ -273,6 +274,7 @@ struct zspage {
 	struct zpdesc *first_zpdesc;
 	struct list_head list; /* fullness list */
 	struct zs_pool *pool;
+	struct obj_cgroup **objcgs;
 	struct zspage_lock zsl;
 };
 
@@ -825,6 +827,8 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 		zpdesc = next;
 	} while (zpdesc != NULL);
 
+	if (pool->memcg_aware)
+		kfree(zspage->objcgs);
 	cache_free_zspage(zspage);
 
 	class_stat_sub(class, ZS_OBJS_ALLOCATED, class->objs_per_zspage);
@@ -946,6 +950,16 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 	if (!IS_ENABLED(CONFIG_COMPACTION))
 		gfp &= ~__GFP_MOVABLE;
 
+	if (pool->memcg_aware) {
+		zspage->objcgs = kcalloc(class->objs_per_zspage,
+					 sizeof(struct obj_cgroup *),
+					 gfp & ~__GFP_HIGHMEM);
+		if (!zspage->objcgs) {
+			cache_free_zspage(zspage);
+			return NULL;
+		}
+	}
+
 	zspage->magic = ZSPAGE_MAGIC;
 	zspage->pool = pool;
 	zspage->class = class->index;
@@ -955,14 +969,8 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 		struct zpdesc *zpdesc;
 
 		zpdesc = alloc_zpdesc(gfp, nid);
-		if (!zpdesc) {
-			while (--i >= 0) {
-				zpdesc_dec_zone_page_state(zpdescs[i]);
-				free_zpdesc(zpdescs[i]);
-			}
-			cache_free_zspage(zspage);
-			return NULL;
-		}
+		if (!zpdesc)
+			goto err;
 		__zpdesc_set_zsmalloc(zpdesc);
 
 		zpdesc_inc_zone_page_state(zpdesc);
@@ -973,6 +981,16 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 	init_zspage(class, zspage);
 
 	return zspage;
+
+err:
+	while (--i >= 0) {
+		zpdesc_dec_zone_page_state(zpdescs[i]);
+		free_zpdesc(zpdescs[i]);
+	}
+	if (pool->memcg_aware)
+		kfree(zspage->objcgs);
+	cache_free_zspage(zspage);
+	return NULL;
 }
 
 static struct zspage *find_get_zspage(struct size_class *class)
-- 
2.52.0




* [PATCH 05/11] mm/zsmalloc: Store obj_cgroup pointer in zspage
  2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
                   ` (3 preceding siblings ...)
  2026-03-11 19:51 ` [PATCH 04/11] mm/zsmalloc: Introduce objcgs pointer in struct zspage Joshua Hahn
@ 2026-03-11 19:51 ` Joshua Hahn
  2026-03-11 20:17   ` Yosry Ahmed
  2026-03-11 19:51 ` [PATCH 06/11] mm/zsmalloc, zswap: Redirect zswap_entry->objcg to zspage Joshua Hahn
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 19:51 UTC (permalink / raw)
  To: Minchan Kim, Sergey Senozhatsky
  Cc: Johannes Weiner, Jens Axboe, Yosry Ahmed, Nhat Pham, Nhat Pham,
	Chengming Zhou, Andrew Morton, linux-mm, linux-block,
	linux-kernel, kernel-team

With each zspage now having an array of obj_cgroup pointers, plumb the
obj_cgroup pointer from the zswap / zram layer down to zsmalloc.

zram still sees no visible change from its end. For the zswap path,
store the obj_cgroup pointer after compression when writing the object,
and erase the pointer when the object gets freed.

The lifetime and charging of the obj_cgroup is still handled in the
zswap layer.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 drivers/block/zram/zram_drv.c |  7 ++++---
 include/linux/zsmalloc.h      |  3 ++-
 mm/zsmalloc.c                 | 25 ++++++++++++++++++++++++-
 mm/zswap.c                    |  6 +++---
 4 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index d1eae5c20df7..e68e408992e7 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2232,7 +2232,7 @@ static int write_incompressible_page(struct zram *zram, struct page *page,
 	}
 
 	src = kmap_local_page(page);
-	zs_obj_write(zram->mem_pool, handle, src, PAGE_SIZE);
+	zs_obj_write(zram->mem_pool, handle, src, PAGE_SIZE, NULL);
 	kunmap_local(src);
 
 	slot_lock(zram, index);
@@ -2297,7 +2297,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 		return -ENOMEM;
 	}
 
-	zs_obj_write(zram->mem_pool, handle, zstrm->buffer, comp_len);
+	zs_obj_write(zram->mem_pool, handle, zstrm->buffer, comp_len, NULL);
 	zcomp_stream_put(zstrm);
 
 	slot_lock(zram, index);
@@ -2521,7 +2521,8 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
 		return PTR_ERR((void *)handle_new);
 	}
 
-	zs_obj_write(zram->mem_pool, handle_new, zstrm->buffer, comp_len_new);
+	zs_obj_write(zram->mem_pool, handle_new, zstrm->buffer,
+		     comp_len_new, NULL);
 	zcomp_stream_put(zstrm);
 
 	slot_free(zram, index);
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 24fb2e0fdf67..645957a156c4 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -23,6 +23,7 @@ struct zs_pool_stats {
 
 struct zs_pool;
 struct scatterlist;
+struct obj_cgroup;
 enum memcg_stat_item;
 
 struct zs_pool *zs_create_pool(const char *name, bool memcg_aware,
@@ -51,7 +52,7 @@ void zs_obj_read_sg_begin(struct zs_pool *pool, unsigned long handle,
 			  struct scatterlist *sg, size_t mem_len);
 void zs_obj_read_sg_end(struct zs_pool *pool, unsigned long handle);
 void zs_obj_write(struct zs_pool *pool, unsigned long handle,
-		  void *handle_mem, size_t mem_len);
+		  void *handle_mem, size_t mem_len, struct obj_cgroup *objcg);
 
 extern const struct movable_operations zsmalloc_mops;
 
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index dcf99516227c..d4735451c273 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1195,7 +1195,7 @@ void zs_obj_read_sg_end(struct zs_pool *pool, unsigned long handle)
 EXPORT_SYMBOL_GPL(zs_obj_read_sg_end);
 
 void zs_obj_write(struct zs_pool *pool, unsigned long handle,
-		  void *handle_mem, size_t mem_len)
+		  void *handle_mem, size_t mem_len, struct obj_cgroup *objcg)
 {
 	struct zspage *zspage;
 	struct zpdesc *zpdesc;
@@ -1216,6 +1216,11 @@ void zs_obj_write(struct zs_pool *pool, unsigned long handle,
 	class = zspage_class(pool, zspage);
 	off = offset_in_page(class->size * obj_idx);
 
+	if (objcg) {
+		WARN_ON_ONCE(!pool->memcg_aware);
+		zspage->objcgs[obj_idx] = objcg;
+	}
+
 	if (!ZsHugePage(zspage))
 		off += ZS_HANDLE_SIZE;
 
@@ -1388,6 +1393,9 @@ static void obj_free(int class_size, unsigned long obj)
 	f_offset = offset_in_page(class_size * f_objidx);
 	zspage = get_zspage(f_zpdesc);
 
+	if (zspage->pool->memcg_aware)
+		zspage->objcgs[f_objidx] = NULL;
+
 	vaddr = kmap_local_zpdesc(f_zpdesc);
 	link = (struct link_free *)(vaddr + f_offset);
 
@@ -1538,6 +1546,16 @@ static unsigned long find_alloced_obj(struct size_class *class,
 	return handle;
 }
 
+static void zs_migrate_objcg(struct zspage *s_zspage, struct zspage *d_zspage,
+			     unsigned long used_obj, unsigned long free_obj)
+{
+	unsigned int s_idx = used_obj & OBJ_INDEX_MASK;
+	unsigned int d_idx = free_obj & OBJ_INDEX_MASK;
+
+	d_zspage->objcgs[d_idx] = s_zspage->objcgs[s_idx];
+	s_zspage->objcgs[s_idx] = NULL;
+}
+
 static void migrate_zspage(struct zs_pool *pool, struct zspage *src_zspage,
 			   struct zspage *dst_zspage)
 {
@@ -1560,6 +1578,11 @@ static void migrate_zspage(struct zs_pool *pool, struct zspage *src_zspage,
 		used_obj = handle_to_obj(handle);
 		free_obj = obj_malloc(pool, dst_zspage, handle);
 		zs_obj_copy(class, free_obj, used_obj);
+
+		if (pool->memcg_aware)
+			zs_migrate_objcg(src_zspage, dst_zspage,
+					 used_obj, free_obj);
+
 		obj_idx++;
 		obj_free(class->size, used_obj);
 
diff --git a/mm/zswap.c b/mm/zswap.c
index ff9abaa8aa38..68b87c3cc326 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -852,7 +852,7 @@ static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
 }
 
 static bool zswap_compress(struct page *page, struct zswap_entry *entry,
-			   struct zswap_pool *pool)
+			   struct zswap_pool *pool, struct obj_cgroup *objcg)
 {
 	struct crypto_acomp_ctx *acomp_ctx;
 	struct scatterlist input, output;
@@ -912,7 +912,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 		goto unlock;
 	}
 
-	zs_obj_write(pool->zs_pool, handle, dst, dlen);
+	zs_obj_write(pool->zs_pool, handle, dst, dlen, objcg);
 	entry->handle = handle;
 	entry->length = dlen;
 
@@ -1414,7 +1414,7 @@ static bool zswap_store_page(struct page *page,
 		return false;
 	}
 
-	if (!zswap_compress(page, entry, pool))
+	if (!zswap_compress(page, entry, pool, objcg))
 		goto compress_failed;
 
 	old = xa_store(swap_zswap_tree(page_swpentry),
-- 
2.52.0




* [PATCH 06/11] mm/zsmalloc, zswap: Redirect zswap_entry->objcg to zspage
  2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
                   ` (4 preceding siblings ...)
  2026-03-11 19:51 ` [PATCH 05/11] mm/zsmalloc: Store obj_cgroup pointer in zspage Joshua Hahn
@ 2026-03-11 19:51 ` Joshua Hahn
  2026-03-11 19:51 ` [PATCH 07/11] mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc Joshua Hahn
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 19:51 UTC (permalink / raw)
  To: Minchan Kim, Sergey Senozhatsky
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Nhat Pham,
	Chengming Zhou, Andrew Morton, linux-mm, linux-kernel,
	kernel-team

Now that obj_cgroups are tracked in the zspage, redirect the zswap layer
to use the pointer stored in the zspage and remove the pointer in
struct zswap_entry.

This offsets the temporary memory increase caused by the duplicate
storage of the obj_cgroup pointer and results in a net zero memory
footprint change (aside from the array pointer and flags in zspage).

The lifetime and charging of the obj_cgroup is still handled in the
zswap layer.

Remove mem_cgroup_from_entry, which has no remaining callers.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 include/linux/memcontrol.h |  5 ++++
 include/linux/zsmalloc.h   |  1 +
 mm/zsmalloc.c              | 25 +++++++++++++++++++
 mm/zswap.c                 | 50 +++++++++++++++++---------------------
 4 files changed, 53 insertions(+), 28 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 70b685a85bf4..0652db4ff2d5 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1072,6 +1072,11 @@ static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *ob
 	return NULL;
 }
 
+static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
+{
+	return NULL;
+}
+
 static inline bool folio_memcg_kmem(struct folio *folio)
 {
 	return false;
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 645957a156c4..6010d8dac9ff 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -41,6 +41,7 @@ unsigned long zs_get_total_pages(struct zs_pool *pool);
 unsigned long zs_compact(struct zs_pool *pool);
 
 unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size);
+struct obj_cgroup *zs_lookup_objcg(struct zs_pool *pool, unsigned long handle);
 
 void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
 
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index d4735451c273..a94ca8c26ad9 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1049,6 +1049,31 @@ unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size)
 }
 EXPORT_SYMBOL_GPL(zs_lookup_class_index);
 
+struct obj_cgroup *zs_lookup_objcg(struct zs_pool *pool, unsigned long handle)
+{
+	unsigned long obj;
+	struct zpdesc *zpdesc;
+	struct zspage *zspage;
+	struct obj_cgroup *objcg;
+	unsigned int obj_idx;
+
+	if (!pool->memcg_aware)
+		return NULL;
+
+	read_lock(&pool->lock);
+	obj = handle_to_obj(handle);
+	obj_to_location(obj, &zpdesc, &obj_idx);
+
+	zspage = get_zspage(zpdesc);
+	zspage_read_lock(zspage);
+	read_unlock(&pool->lock);
+
+	objcg = zspage->objcgs[obj_idx];
+	zspage_read_unlock(zspage);
+
+	return objcg;
+}
+
 unsigned long zs_get_total_pages(struct zs_pool *pool)
 {
 	return atomic_long_read(&pool->pages_allocated);
diff --git a/mm/zswap.c b/mm/zswap.c
index 68b87c3cc326..436066965413 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -193,7 +193,6 @@ struct zswap_entry {
 	bool referenced;
 	struct zswap_pool *pool;
 	unsigned long handle;
-	struct obj_cgroup *objcg;
 	struct list_head lru;
 };
 
@@ -602,25 +601,13 @@ static int zswap_enabled_param_set(const char *val,
 * lru functions
 **********************************/
 
-/* should be called under RCU */
-#ifdef CONFIG_MEMCG
-static inline struct mem_cgroup *mem_cgroup_from_entry(struct zswap_entry *entry)
-{
-	return entry->objcg ? obj_cgroup_memcg(entry->objcg) : NULL;
-}
-#else
-static inline struct mem_cgroup *mem_cgroup_from_entry(struct zswap_entry *entry)
-{
-	return NULL;
-}
-#endif
-
 static inline int entry_to_nid(struct zswap_entry *entry)
 {
 	return page_to_nid(virt_to_page(entry));
 }
 
-static void zswap_lru_add(struct list_lru *list_lru, struct zswap_entry *entry)
+static void zswap_lru_add(struct list_lru *list_lru, struct zswap_entry *entry,
+			  struct obj_cgroup *objcg)
 {
 	int nid = entry_to_nid(entry);
 	struct mem_cgroup *memcg;
@@ -637,19 +624,20 @@ static void zswap_lru_add(struct list_lru *list_lru, struct zswap_entry *entry)
 	 * Similar reasoning holds for list_lru_del().
 	 */
 	rcu_read_lock();
-	memcg = mem_cgroup_from_entry(entry);
+	memcg = objcg ? obj_cgroup_memcg(objcg) : NULL;
 	/* will always succeed */
 	list_lru_add(list_lru, &entry->lru, nid, memcg);
 	rcu_read_unlock();
 }
 
-static void zswap_lru_del(struct list_lru *list_lru, struct zswap_entry *entry)
+static void zswap_lru_del(struct list_lru *list_lru, struct zswap_entry *entry,
+			  struct obj_cgroup *objcg)
 {
 	int nid = entry_to_nid(entry);
 	struct mem_cgroup *memcg;
 
 	rcu_read_lock();
-	memcg = mem_cgroup_from_entry(entry);
+	memcg = objcg ? obj_cgroup_memcg(objcg) : NULL;
 	/* will always succeed */
 	list_lru_del(list_lru, &entry->lru, nid, memcg);
 	rcu_read_unlock();
@@ -717,12 +705,15 @@ static void zswap_entry_cache_free(struct zswap_entry *entry)
  */
 static void zswap_entry_free(struct zswap_entry *entry)
 {
-	zswap_lru_del(&zswap_list_lru, entry);
+	struct obj_cgroup *objcg = zs_lookup_objcg(entry->pool->zs_pool,
+						   entry->handle);
+
+	zswap_lru_del(&zswap_list_lru, entry, objcg);
 	zs_free(entry->pool->zs_pool, entry->handle);
 	zswap_pool_put(entry->pool);
-	if (entry->objcg) {
-		obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
-		obj_cgroup_put(entry->objcg);
+	if (objcg) {
+		obj_cgroup_uncharge_zswap(objcg, entry->length);
+		obj_cgroup_put(objcg);
 	}
 	if (entry->length == PAGE_SIZE)
 		atomic_long_dec(&zswap_stored_incompressible_pages);
@@ -995,6 +986,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	struct mempolicy *mpol;
 	bool folio_was_allocated;
 	struct swap_info_struct *si;
+	struct obj_cgroup *objcg;
 	int ret = 0;
 
 	/* try to allocate swap cache folio */
@@ -1044,8 +1036,9 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	xa_erase(tree, offset);
 
 	count_vm_event(ZSWPWB);
-	if (entry->objcg)
-		count_objcg_events(entry->objcg, ZSWPWB, 1);
+	objcg = zs_lookup_objcg(entry->pool->zs_pool, entry->handle);
+	if (objcg)
+		count_objcg_events(objcg, ZSWPWB, 1);
 
 	zswap_entry_free(entry);
 
@@ -1464,11 +1457,10 @@ static bool zswap_store_page(struct page *page,
 	 */
 	entry->pool = pool;
 	entry->swpentry = page_swpentry;
-	entry->objcg = objcg;
 	entry->referenced = true;
 	if (entry->length) {
 		INIT_LIST_HEAD(&entry->lru);
-		zswap_lru_add(&zswap_list_lru, entry);
+		zswap_lru_add(&zswap_list_lru, entry, objcg);
 	}
 
 	return true;
@@ -1593,6 +1585,7 @@ int zswap_load(struct folio *folio)
 	bool swapcache = folio_test_swapcache(folio);
 	struct xarray *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry;
+	struct obj_cgroup *objcg;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 
@@ -1621,8 +1614,9 @@ int zswap_load(struct folio *folio)
 	folio_mark_uptodate(folio);
 
 	count_vm_event(ZSWPIN);
-	if (entry->objcg)
-		count_objcg_events(entry->objcg, ZSWPIN, 1);
+	objcg = zs_lookup_objcg(entry->pool->zs_pool, entry->handle);
+	if (objcg)
+		count_objcg_events(objcg, ZSWPIN, 1);
 
 	/*
 	 * When reading into the swapcache, invalidate our entry. The
-- 
2.52.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 07/11] mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc
  2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
                   ` (5 preceding siblings ...)
  2026-03-11 19:51 ` [PATCH 06/11] mm/zsmalloc, zswap: Redirect zswap_entry->objcg to zspage Joshua Hahn
@ 2026-03-11 19:51 ` Joshua Hahn
  2026-03-12 21:42   ` Johannes Weiner
  2026-03-11 19:51 ` [PATCH 08/11] mm/memcontrol: Track MEMCG_ZSWAPPED in bytes Joshua Hahn
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 19:51 UTC (permalink / raw)
  To: Minchan Kim, Sergey Senozhatsky
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Nhat Pham,
	Chengming Zhou, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, cgroups, linux-mm, linux-kernel,
	kernel-team

Now that zswap_entries no longer directly track their obj_cgroups,
move the lifetime management and charging of these entries into the
zsmalloc layer.

One functional change is that zswap entries are no longer accounted
by the size of the compressed object, but by the size of the
size_class slot they occupy.

This brings charging one step closer to an accurate representation
of the memory consumed in the zpdesc: even if a compressed object
does not fill its entire obj slot, we should account for the whole
slot, since the object makes the remainder unusable.

While at it, remove an unnecessary blank line in obj_free.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 include/linux/memcontrol.h | 10 ------
 mm/memcontrol.c            | 54 ++-----------------------------
 mm/zsmalloc.c              | 65 ++++++++++++++++++++++++++++++++++++--
 mm/zswap.c                 |  8 -----
 4 files changed, 66 insertions(+), 71 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0652db4ff2d5..701d9ab6fef1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1851,22 +1851,12 @@ static inline bool memcg_is_dying(struct mem_cgroup *memcg)
 
 #if defined(CONFIG_MEMCG) && defined(CONFIG_ZSWAP)
 bool obj_cgroup_may_zswap(struct obj_cgroup *objcg);
-void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size);
-void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size);
 bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg);
 #else
 static inline bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
 {
 	return true;
 }
-static inline void obj_cgroup_charge_zswap(struct obj_cgroup *objcg,
-					   size_t size)
-{
-}
-static inline void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg,
-					     size_t size)
-{
-}
 static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
 {
 	/* if zswap is disabled, do not block pages going to the swapping device */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a52da3a5e4fd..68139be66a4f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -716,6 +716,7 @@ void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 
 	put_cpu();
 }
+EXPORT_SYMBOL(mod_memcg_state);
 
 #ifdef CONFIG_MEMCG_V1
 /* idx can be of type enum memcg_stat_item or node_stat_item. */
@@ -3169,11 +3170,13 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 {
 	return obj_cgroup_charge_account(objcg, gfp, size, NULL, 0);
 }
+EXPORT_SYMBOL(obj_cgroup_charge);
 
 void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
 {
 	refill_obj_stock(objcg, size, true, 0, NULL, 0);
 }
+EXPORT_SYMBOL(obj_cgroup_uncharge);
 
 static inline size_t obj_full_size(struct kmem_cache *s)
 {
@@ -5488,57 +5491,6 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
 	return ret;
 }
 
-/**
- * obj_cgroup_charge_zswap - charge compression backend memory
- * @objcg: the object cgroup
- * @size: size of compressed object
- *
- * This forces the charge after obj_cgroup_may_zswap() allowed
- * compression and storage in zswap for this cgroup to go ahead.
- */
-void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size)
-{
-	struct mem_cgroup *memcg;
-
-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
-		return;
-
-	VM_WARN_ON_ONCE(!(current->flags & PF_MEMALLOC));
-
-	/* PF_MEMALLOC context, charging must succeed */
-	if (obj_cgroup_charge(objcg, GFP_KERNEL, size))
-		VM_WARN_ON_ONCE(1);
-
-	rcu_read_lock();
-	memcg = obj_cgroup_memcg(objcg);
-	mod_memcg_state(memcg, MEMCG_ZSWAP_B, size);
-	mod_memcg_state(memcg, MEMCG_ZSWAPPED, 1);
-	rcu_read_unlock();
-}
-
-/**
- * obj_cgroup_uncharge_zswap - uncharge compression backend memory
- * @objcg: the object cgroup
- * @size: size of compressed object
- *
- * Uncharges zswap memory on page in.
- */
-void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
-{
-	struct mem_cgroup *memcg;
-
-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
-		return;
-
-	obj_cgroup_uncharge(objcg, size);
-
-	rcu_read_lock();
-	memcg = obj_cgroup_memcg(objcg);
-	mod_memcg_state(memcg, MEMCG_ZSWAP_B, -size);
-	mod_memcg_state(memcg, MEMCG_ZSWAPPED, -1);
-	rcu_read_unlock();
-}
-
 bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
 {
 	/* if zswap is disabled, do not block pages going to the swapping device */
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a94ca8c26ad9..291194572a09 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1028,6 +1028,59 @@ static bool zspage_empty(struct zspage *zspage)
 	return get_zspage_inuse(zspage) == 0;
 }
 
+#ifdef CONFIG_MEMCG
+static void zs_charge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
+			    int size)
+{
+	struct mem_cgroup *memcg;
+
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+		return;
+
+	VM_WARN_ON_ONCE(!(current->flags & PF_MEMALLOC));
+	WARN_ON_ONCE(!pool->memcg_aware);
+
+	/* PF_MEMALLOC context, charging must succeed */
+	if (obj_cgroup_charge(objcg, GFP_KERNEL, size))
+		VM_WARN_ON_ONCE(1);
+
+	rcu_read_lock();
+	memcg = obj_cgroup_memcg(objcg);
+	mod_memcg_state(memcg, pool->compressed_stat, size);
+	mod_memcg_state(memcg, pool->uncompressed_stat, 1);
+	rcu_read_unlock();
+}
+
+static void zs_uncharge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
+			      int size)
+{
+	struct mem_cgroup *memcg;
+
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+		return;
+
+	WARN_ON_ONCE(!pool->memcg_aware);
+
+	obj_cgroup_uncharge(objcg, size);
+
+	rcu_read_lock();
+	memcg = obj_cgroup_memcg(objcg);
+	mod_memcg_state(memcg, pool->compressed_stat, -size);
+	mod_memcg_state(memcg, pool->uncompressed_stat, -1);
+	rcu_read_unlock();
+}
+#else
+static void zs_charge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
+			    int size)
+{
+}
+
+static void zs_uncharge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
+			      int size)
+{
+}
+#endif
+
 /**
  * zs_lookup_class_index() - Returns index of the zsmalloc &size_class
  * that hold objects of the provided size.
@@ -1244,6 +1297,8 @@ void zs_obj_write(struct zs_pool *pool, unsigned long handle,
 	if (objcg) {
 		WARN_ON_ONCE(!pool->memcg_aware);
 		zspage->objcgs[obj_idx] = objcg;
+		obj_cgroup_get(objcg);
+		zs_charge_objcg(pool, objcg, class->size);
 	}
 
 	if (!ZsHugePage(zspage))
@@ -1409,17 +1464,23 @@ static void obj_free(int class_size, unsigned long obj)
 	struct link_free *link;
 	struct zspage *zspage;
 	struct zpdesc *f_zpdesc;
+	struct zs_pool *pool;
 	unsigned long f_offset;
 	unsigned int f_objidx;
 	void *vaddr;
 
-
 	obj_to_location(obj, &f_zpdesc, &f_objidx);
 	f_offset = offset_in_page(class_size * f_objidx);
 	zspage = get_zspage(f_zpdesc);
+	pool = zspage->pool;
+
+	if (pool->memcg_aware && zspage->objcgs[f_objidx]) {
+		struct obj_cgroup *objcg = zspage->objcgs[f_objidx];
 
-	if (zspage->pool->memcg_aware)
+		zs_uncharge_objcg(pool, objcg, class_size);
+		obj_cgroup_put(objcg);
 		zspage->objcgs[f_objidx] = NULL;
+	}
 
 	vaddr = kmap_local_zpdesc(f_zpdesc);
 	link = (struct link_free *)(vaddr + f_offset);
diff --git a/mm/zswap.c b/mm/zswap.c
index 436066965413..bca29a6e18f3 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -711,10 +711,6 @@ static void zswap_entry_free(struct zswap_entry *entry)
 	zswap_lru_del(&zswap_list_lru, entry, objcg);
 	zs_free(entry->pool->zs_pool, entry->handle);
 	zswap_pool_put(entry->pool);
-	if (objcg) {
-		obj_cgroup_uncharge_zswap(objcg, entry->length);
-		obj_cgroup_put(objcg);
-	}
 	if (entry->length == PAGE_SIZE)
 		atomic_long_dec(&zswap_stored_incompressible_pages);
 	zswap_entry_cache_free(entry);
@@ -1437,10 +1433,6 @@ static bool zswap_store_page(struct page *page,
 	 * when the entry is removed from the tree.
 	 */
 	zswap_pool_get(pool);
-	if (objcg) {
-		obj_cgroup_get(objcg);
-		obj_cgroup_charge_zswap(objcg, entry->length);
-	}
 	atomic_long_inc(&zswap_stored_pages);
 	if (entry->length == PAGE_SIZE)
 		atomic_long_inc(&zswap_stored_incompressible_pages);
-- 
2.52.0




* [PATCH 08/11] mm/memcontrol: Track MEMCG_ZSWAPPED in bytes
  2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
                   ` (6 preceding siblings ...)
  2026-03-11 19:51 ` [PATCH 07/11] mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc Joshua Hahn
@ 2026-03-11 19:51 ` Joshua Hahn
  2026-03-11 20:33   ` Nhat Pham
  2026-03-11 19:51 ` [PATCH 09/11] mm/vmstat, memcontrol: Track ZSWAP_B, ZSWAPPED_B per-memcg-lruvec Joshua Hahn
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 19:51 UTC (permalink / raw)
  To: Minchan Kim, Sergey Senozhatsky
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Nhat Pham,
	Chengming Zhou, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, cgroups, linux-mm, linux-kernel,
	kernel-team

Zswap compresses and decompresses in PAGE_SIZE units, which
simplifies the accounting for how much memory it has compressed.
However, when a compressed object is stored across the boundary of
two zpdescs, accounting at a PAGE_SIZE granularity makes it difficult
to fractionally charge each backing zpdesc with the share of the
compressed object it backs.

To make sub-PAGE_SIZE granularity charging possible for MEMCG_ZSWAPPED,
track the value in bytes and adjust its accounting accordingly.

No functional changes intended.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 include/linux/memcontrol.h | 2 +-
 mm/memcontrol.c            | 5 +++--
 mm/zsmalloc.c              | 4 ++--
 mm/zswap.c                 | 8 +++++---
 4 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 701d9ab6fef1..ce2e598b5963 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -38,7 +38,7 @@ enum memcg_stat_item {
 	MEMCG_VMALLOC,
 	MEMCG_KMEM,
 	MEMCG_ZSWAP_B,
-	MEMCG_ZSWAPPED,
+	MEMCG_ZSWAPPED_B,
 	MEMCG_NR_STAT,
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 68139be66a4f..1cb02d2febe8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -342,7 +342,7 @@ static const unsigned int memcg_stat_items[] = {
 	MEMCG_VMALLOC,
 	MEMCG_KMEM,
 	MEMCG_ZSWAP_B,
-	MEMCG_ZSWAPPED,
+	MEMCG_ZSWAPPED_B,
 };
 
 #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
@@ -1364,7 +1364,7 @@ static const struct memory_stat memory_stats[] = {
 	{ "shmem",			NR_SHMEM			},
 #ifdef CONFIG_ZSWAP
 	{ "zswap",			MEMCG_ZSWAP_B			},
-	{ "zswapped",			MEMCG_ZSWAPPED			},
+	{ "zswapped",			MEMCG_ZSWAPPED_B		},
 #endif
 	{ "file_mapped",		NR_FILE_MAPPED			},
 	{ "file_dirty",			NR_FILE_DIRTY			},
@@ -1412,6 +1412,7 @@ static int memcg_page_state_unit(int item)
 	switch (item) {
 	case MEMCG_PERCPU_B:
 	case MEMCG_ZSWAP_B:
+	case MEMCG_ZSWAPPED_B:
 	case NR_SLAB_RECLAIMABLE_B:
 	case NR_SLAB_UNRECLAIMABLE_B:
 		return 1;
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 291194572a09..24665d7cd4a9 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1047,7 +1047,7 @@ static void zs_charge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
 	rcu_read_lock();
 	memcg = obj_cgroup_memcg(objcg);
 	mod_memcg_state(memcg, pool->compressed_stat, size);
-	mod_memcg_state(memcg, pool->uncompressed_stat, 1);
+	mod_memcg_state(memcg, pool->uncompressed_stat, PAGE_SIZE);
 	rcu_read_unlock();
 }
 
@@ -1066,7 +1066,7 @@ static void zs_uncharge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
 	rcu_read_lock();
 	memcg = obj_cgroup_memcg(objcg);
 	mod_memcg_state(memcg, pool->compressed_stat, -size);
-	mod_memcg_state(memcg, pool->uncompressed_stat, -1);
+	mod_memcg_state(memcg, pool->uncompressed_stat, -(int)PAGE_SIZE);
 	rcu_read_unlock();
 }
 #else
diff --git a/mm/zswap.c b/mm/zswap.c
index bca29a6e18f3..d81e2db4490b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -257,7 +257,7 @@ static struct zswap_pool *zswap_pool_create(char *compressor)
 	/* unique name for each pool specifically required by zsmalloc */
 	snprintf(name, 38, "zswap%x", atomic_inc_return(&zswap_pools_count));
 	pool->zs_pool = zs_create_pool(name, true, MEMCG_ZSWAP_B,
-				       MEMCG_ZSWAPPED);
+				       MEMCG_ZSWAPPED_B);
 	if (!pool->zs_pool)
 		goto error;
 
@@ -1214,8 +1214,10 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
 	 */
 	if (!mem_cgroup_disabled()) {
 		mem_cgroup_flush_stats(memcg);
-		nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT;
-		nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
+		nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B);
+		nr_backing >>= PAGE_SHIFT;
+		nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED_B);
+		nr_stored >>= PAGE_SHIFT;
 	} else {
 		nr_backing = zswap_total_pages();
 		nr_stored = atomic_long_read(&zswap_stored_pages);
-- 
2.52.0




* [PATCH 09/11] mm/vmstat, memcontrol: Track ZSWAP_B, ZSWAPPED_B per-memcg-lruvec
  2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
                   ` (7 preceding siblings ...)
  2026-03-11 19:51 ` [PATCH 08/11] mm/memcontrol: Track MEMCG_ZSWAPPED in bytes Joshua Hahn
@ 2026-03-11 19:51 ` Joshua Hahn
  2026-03-11 19:51 ` [PATCH 10/11] mm/zsmalloc: Handle single object charge migration in migrate_zspage Joshua Hahn
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 19:51 UTC (permalink / raw)
  To: Minchan Kim, Sergey Senozhatsky
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Nhat Pham,
	Chengming Zhou, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Andrew Morton,
	cgroups, linux-mm, linux-kernel, kernel-team

Now that memcg charging happens in the zsmalloc layer, where we have
both objcg and page information, we can specify which node's
memcg-lruvec zswapped memory should be accounted to.

Move MEMCG_ZSWAP_B and MEMCG_ZSWAPPED_B from enum memcg_stat_item to
enum node_stat_item. Rename their prefixes from MEMCG to NR to reflect
this move as well.

In addition, decouple the updates of node stats (vmstat) and
memcg-lruvec stats, since node stats can only track values at a
PAGE_SIZE granularity.

As a result of tracking zswap statistics at a finer granularity, the
charging from zsmalloc also gets more complicated: it must cover the
case where a compressed object spans two zpdescs that live on
different nodes. In this case, the memcg-lruvecs of both node-memcg
combinations are partially charged.

memcg-lruvec stats are now updated precisely and proportionally when
compressed objects are split across pages. Unfortunately for node stats,
only NR_ZSWAP_B can be kept accurate. NR_ZSWAPPED_B works as a good
best-effort value, but cannot proportionally account for compressed
objects split across nodes due to the coarse PAGE_SIZE granularity of
node stats. For such objects, NR_ZSWAPPED_B is accounted to the first
zpdesc's node stats.

Note that this is not a new inaccuracy, but one that these changes
simply do not fix. The small inaccuracy is accepted in place of
invasive changes across all of the vmstat infrastructure to begin
tracking stats at byte granularity.

Finally, note that objcg migrations across zspages (and their
subsequent migrations across nodes) are handled in the next patch.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 include/linux/memcontrol.h |   5 +-
 include/linux/mmzone.h     |   2 +
 include/linux/zsmalloc.h   |   6 +--
 mm/memcontrol.c            |  22 ++++----
 mm/vmstat.c                |   2 +
 mm/zsmalloc.c              | 104 +++++++++++++++++++++++++++----------
 mm/zswap.c                 |   7 ++-
 7 files changed, 102 insertions(+), 46 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ce2e598b5963..b03501e0c09b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -37,8 +37,6 @@ enum memcg_stat_item {
 	MEMCG_PERCPU_B,
 	MEMCG_VMALLOC,
 	MEMCG_KMEM,
-	MEMCG_ZSWAP_B,
-	MEMCG_ZSWAPPED_B,
 	MEMCG_NR_STAT,
 };
 
@@ -927,6 +925,9 @@ struct mem_cgroup *mem_cgroup_get_oom_group(struct task_struct *victim,
 					    struct mem_cgroup *oom_domain);
 void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
 
+void mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
+			    int val);
+
 /* idx can be of type enum memcg_stat_item or node_stat_item */
 void mod_memcg_state(struct mem_cgroup *memcg,
 		     enum memcg_stat_item idx, int val);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3e51190a55e4..ae16a90491ac 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -258,6 +258,8 @@ enum node_stat_item {
 #ifdef CONFIG_HUGETLB_PAGE
 	NR_HUGETLB,
 #endif
+	NR_ZSWAP_B,
+	NR_ZSWAPPED_B,
 	NR_BALLOON_PAGES,
 	NR_KERNEL_FILE_PAGES,
 	NR_VM_NODE_STAT_ITEMS
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 6010d8dac9ff..fd79916c7740 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -24,11 +24,11 @@ struct zs_pool_stats {
 struct zs_pool;
 struct scatterlist;
 struct obj_cgroup;
-enum memcg_stat_item;
+enum node_stat_item;
 
 struct zs_pool *zs_create_pool(const char *name, bool memcg_aware,
-			       enum memcg_stat_item compressed_stat,
-			       enum memcg_stat_item uncompressed_stat);
+			       enum node_stat_item compressed_stat,
+			       enum node_stat_item uncompressed_stat);
 void zs_destroy_pool(struct zs_pool *pool);
 
 unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t flags,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1cb02d2febe8..d87bc4beff16 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -333,6 +333,8 @@ static const unsigned int memcg_node_stat_items[] = {
 #ifdef CONFIG_HUGETLB_PAGE
 	NR_HUGETLB,
 #endif
+	NR_ZSWAP_B,
+	NR_ZSWAPPED_B,
 };
 
 static const unsigned int memcg_stat_items[] = {
@@ -341,8 +343,6 @@ static const unsigned int memcg_stat_items[] = {
 	MEMCG_PERCPU_B,
 	MEMCG_VMALLOC,
 	MEMCG_KMEM,
-	MEMCG_ZSWAP_B,
-	MEMCG_ZSWAPPED_B,
 };
 
 #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
@@ -737,9 +737,8 @@ unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
 }
 #endif
 
-static void mod_memcg_lruvec_state(struct lruvec *lruvec,
-				     enum node_stat_item idx,
-				     int val)
+void mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
+			    int val)
 {
 	struct mem_cgroup_per_node *pn;
 	struct mem_cgroup *memcg;
@@ -766,6 +765,7 @@ static void mod_memcg_lruvec_state(struct lruvec *lruvec,
 
 	put_cpu();
 }
+EXPORT_SYMBOL(mod_memcg_lruvec_state);
 
 /**
  * mod_lruvec_state - update lruvec memory statistics
@@ -1363,8 +1363,8 @@ static const struct memory_stat memory_stats[] = {
 	{ "vmalloc",			MEMCG_VMALLOC			},
 	{ "shmem",			NR_SHMEM			},
 #ifdef CONFIG_ZSWAP
-	{ "zswap",			MEMCG_ZSWAP_B			},
-	{ "zswapped",			MEMCG_ZSWAPPED_B		},
+	{ "zswap",			NR_ZSWAP_B			},
+	{ "zswapped",			NR_ZSWAPPED_B			},
 #endif
 	{ "file_mapped",		NR_FILE_MAPPED			},
 	{ "file_dirty",			NR_FILE_DIRTY			},
@@ -1411,8 +1411,8 @@ static int memcg_page_state_unit(int item)
 {
 	switch (item) {
 	case MEMCG_PERCPU_B:
-	case MEMCG_ZSWAP_B:
-	case MEMCG_ZSWAPPED_B:
+	case NR_ZSWAP_B:
+	case NR_ZSWAPPED_B:
 	case NR_SLAB_RECLAIMABLE_B:
 	case NR_SLAB_UNRECLAIMABLE_B:
 		return 1;
@@ -5482,7 +5482,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
 
 		/* Force flush to get accurate stats for charging */
 		__mem_cgroup_flush_stats(memcg, true);
-		pages = memcg_page_state(memcg, MEMCG_ZSWAP_B) / PAGE_SIZE;
+		pages = memcg_page_state(memcg, NR_ZSWAP_B) / PAGE_SIZE;
 		if (pages < max)
 			continue;
 		ret = false;
@@ -5511,7 +5511,7 @@ static u64 zswap_current_read(struct cgroup_subsys_state *css,
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
 	mem_cgroup_flush_stats(memcg);
-	return memcg_page_state(memcg, MEMCG_ZSWAP_B);
+	return memcg_page_state(memcg, NR_ZSWAP_B);
 }
 
 static int zswap_max_show(struct seq_file *m, void *v)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 86b14b0f77b5..389ff986ceac 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1279,6 +1279,8 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_HUGETLB_PAGE
 	[I(NR_HUGETLB)]				= "nr_hugetlb",
 #endif
+	[I(NR_ZSWAP_B)]				= "zswap",
+	[I(NR_ZSWAPPED_B)]			= "zswapped",
 	[I(NR_BALLOON_PAGES)]			= "nr_balloon_pages",
 	[I(NR_KERNEL_FILE_PAGES)]		= "nr_kernel_file_pages",
 #undef I
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 24665d7cd4a9..ab085961b0e2 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -216,8 +216,8 @@ struct zs_pool {
 	struct work_struct free_work;
 #endif
 	bool memcg_aware;
-	enum memcg_stat_item compressed_stat;
-	enum memcg_stat_item uncompressed_stat;
+	enum node_stat_item compressed_stat;
+	enum node_stat_item uncompressed_stat;
 	/* protect zspage migration/compaction */
 	rwlock_t lock;
 	atomic_t compaction_in_progress;
@@ -823,6 +823,9 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 		reset_zpdesc(zpdesc);
 		zpdesc_unlock(zpdesc);
 		zpdesc_dec_zone_page_state(zpdesc);
+		if (pool->memcg_aware)
+			dec_node_page_state(zpdesc_page(zpdesc),
+					    pool->compressed_stat);
 		zpdesc_put(zpdesc);
 		zpdesc = next;
 	} while (zpdesc != NULL);
@@ -974,6 +977,9 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 		__zpdesc_set_zsmalloc(zpdesc);
 
 		zpdesc_inc_zone_page_state(zpdesc);
+		if (pool->memcg_aware)
+			inc_node_page_state(zpdesc_page(zpdesc),
+					    pool->compressed_stat);
 		zpdescs[i] = zpdesc;
 	}
 
@@ -985,6 +991,9 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 err:
 	while (--i >= 0) {
 		zpdesc_dec_zone_page_state(zpdescs[i]);
+		if (pool->memcg_aware)
+			dec_node_page_state(zpdesc_page(zpdescs[i]),
+					    pool->compressed_stat);
 		free_zpdesc(zpdescs[i]);
 	}
 	if (pool->memcg_aware)
@@ -1029,10 +1038,48 @@ static bool zspage_empty(struct zspage *zspage)
 }
 
 #ifdef CONFIG_MEMCG
-static void zs_charge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
-			    int size)
+static void __zs_mod_memcg_lruvec(struct zs_pool *pool, struct zpdesc *zpdesc,
+				  struct obj_cgroup *objcg, int size,
+				  int sign, unsigned long offset)
 {
 	struct mem_cgroup *memcg;
+	struct lruvec *lruvec;
+	int compressed_size = size, original_size = PAGE_SIZE;
+	int nid = page_to_nid(zpdesc_page(zpdesc));
+	int next_nid = nid;
+
+	if (offset + size > PAGE_SIZE) {
+		struct zpdesc *next_zpdesc = get_next_zpdesc(zpdesc);
+
+		next_nid = page_to_nid(zpdesc_page(next_zpdesc));
+		if (nid != next_nid) {
+			compressed_size = PAGE_SIZE - offset;
+			original_size = (PAGE_SIZE * compressed_size) / size;
+		}
+	}
+
+	rcu_read_lock();
+	memcg = obj_cgroup_memcg(objcg);
+	lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid));
+	mod_memcg_lruvec_state(lruvec, pool->compressed_stat,
+			       sign * compressed_size);
+	mod_memcg_lruvec_state(lruvec, pool->uncompressed_stat,
+			       sign * original_size);
+
+	if (nid != next_nid) {
+		lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(next_nid));
+		mod_memcg_lruvec_state(lruvec, pool->compressed_stat,
+				       sign * (size - compressed_size));
+		mod_memcg_lruvec_state(lruvec, pool->uncompressed_stat,
+				       sign * (PAGE_SIZE - original_size));
+	}
+	rcu_read_unlock();
+}
+
+static void zs_charge_objcg(struct zs_pool *pool, struct zpdesc *zpdesc,
+			    struct obj_cgroup *objcg, int size,
+			    unsigned long offset)
+{
 
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return;
@@ -1044,18 +1091,19 @@ static void zs_charge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
 	if (obj_cgroup_charge(objcg, GFP_KERNEL, size))
 		VM_WARN_ON_ONCE(1);
 
-	rcu_read_lock();
-	memcg = obj_cgroup_memcg(objcg);
-	mod_memcg_state(memcg, pool->compressed_stat, size);
-	mod_memcg_state(memcg, pool->uncompressed_stat, PAGE_SIZE);
-	rcu_read_unlock();
+	__zs_mod_memcg_lruvec(pool, zpdesc, objcg, size, 1, offset);
+
+	/*
+	 * Node-level vmstats are charged in PAGE_SIZE units. As a best-effort,
+	 * always charge the uncompressed stats to the first zpdesc.
+	 */
+	inc_node_page_state(zpdesc_page(zpdesc), pool->uncompressed_stat);
 }
 
-static void zs_uncharge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
-			      int size)
+static void zs_uncharge_objcg(struct zs_pool *pool, struct zpdesc *zpdesc,
+			      struct obj_cgroup *objcg, int size,
+			      unsigned long offset)
 {
-	struct mem_cgroup *memcg;
-
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return;
 
@@ -1063,20 +1111,24 @@ static void zs_uncharge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
 
 	obj_cgroup_uncharge(objcg, size);
 
-	rcu_read_lock();
-	memcg = obj_cgroup_memcg(objcg);
-	mod_memcg_state(memcg, pool->compressed_stat, -size);
-	mod_memcg_state(memcg, pool->uncompressed_stat, -(int)PAGE_SIZE);
-	rcu_read_unlock();
+	__zs_mod_memcg_lruvec(pool, zpdesc, objcg, size, -1, offset);
+
+	/*
+	 * Node-level vmstats are charged in PAGE_SIZE units. As a best-effort,
+	 * always uncharge the uncompressed stats from the first zpdesc.
+	 */
+	dec_node_page_state(zpdesc_page(zpdesc), pool->uncompressed_stat);
 }
 #else
-static void zs_charge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
-			    int size)
+static void zs_charge_objcg(struct zs_pool *pool, struct zpdesc *zpdesc,
+			    struct obj_cgroup *objcg, int size,
+			    unsigned long offset)
 {
 }
 
-static void zs_uncharge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
-			      int size)
+static void zs_uncharge_objcg(struct zs_pool *pool, struct zpdesc *zpdesc,
+			      struct obj_cgroup *objcg, int size,
+			      unsigned long offset)
 {
 }
 #endif
@@ -1298,7 +1350,7 @@ void zs_obj_write(struct zs_pool *pool, unsigned long handle,
 		WARN_ON_ONCE(!pool->memcg_aware);
 		zspage->objcgs[obj_idx] = objcg;
 		obj_cgroup_get(objcg);
-		zs_charge_objcg(pool, objcg, class->size);
+		zs_charge_objcg(pool, zpdesc, objcg, class->size, off);
 	}
 
 	if (!ZsHugePage(zspage))
@@ -1477,7 +1529,7 @@ static void obj_free(int class_size, unsigned long obj)
 	if (pool->memcg_aware && zspage->objcgs[f_objidx]) {
 		struct obj_cgroup *objcg = zspage->objcgs[f_objidx];
 
-		zs_uncharge_objcg(pool, objcg, class_size);
+		zs_uncharge_objcg(pool, f_zpdesc, objcg, class_size, f_offset);
 		obj_cgroup_put(objcg);
 		zspage->objcgs[f_objidx] = NULL;
 	}
@@ -2191,8 +2243,8 @@ static int calculate_zspage_chain_size(int class_size)
  * otherwise NULL.
  */
 struct zs_pool *zs_create_pool(const char *name, bool memcg_aware,
-			       enum memcg_stat_item compressed_stat,
-			       enum memcg_stat_item uncompressed_stat)
+			       enum node_stat_item compressed_stat,
+			       enum node_stat_item uncompressed_stat)
 {
 	int i;
 	struct zs_pool *pool;
diff --git a/mm/zswap.c b/mm/zswap.c
index d81e2db4490b..2e9352b46693 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -256,8 +256,7 @@ static struct zswap_pool *zswap_pool_create(char *compressor)
 
 	/* unique name for each pool specifically required by zsmalloc */
 	snprintf(name, 38, "zswap%x", atomic_inc_return(&zswap_pools_count));
-	pool->zs_pool = zs_create_pool(name, true, MEMCG_ZSWAP_B,
-				       MEMCG_ZSWAPPED_B);
+	pool->zs_pool = zs_create_pool(name, true, NR_ZSWAP_B, NR_ZSWAPPED_B);
 	if (!pool->zs_pool)
 		goto error;
 
@@ -1214,9 +1213,9 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
 	 */
 	if (!mem_cgroup_disabled()) {
 		mem_cgroup_flush_stats(memcg);
-		nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B);
+		nr_backing = memcg_page_state(memcg, NR_ZSWAP_B);
 		nr_backing >>= PAGE_SHIFT;
-		nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED_B);
+		nr_stored = memcg_page_state(memcg, NR_ZSWAPPED_B);
 		nr_stored >>= PAGE_SHIFT;
 	} else {
 		nr_backing = zswap_total_pages();
-- 
2.52.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 10/11] mm/zsmalloc: Handle single object charge migration in migrate_zspage
  2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
                   ` (8 preceding siblings ...)
  2026-03-11 19:51 ` [PATCH 09/11] mm/vmstat, memcontrol: Track ZSWAP_B, ZSWAPPED_B per-memcg-lruvec Joshua Hahn
@ 2026-03-11 19:51 ` Joshua Hahn
  2026-03-12  3:51   ` kernel test robot
  2026-03-12  3:51   ` kernel test robot
  2026-03-11 19:51 ` [PATCH 11/11] mm/zsmalloc: Handle charge migration in zpdesc substitution Joshua Hahn
  2026-03-11 19:54 ` [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
  11 siblings, 2 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 19:51 UTC (permalink / raw)
  To: Minchan Kim, Sergey Senozhatsky
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Nhat Pham, Harry Yoo,
	Andrew Morton, linux-mm, linux-kernel, kernel-team

In zsmalloc, there are two types of migrations: Migrations of single
compressed objects from one zspage to another, and substitutions of
zpdescs from zspages.

In both of these migrations, the memcg association of the compressed
objects does not change. However, the physical location of the
compressed objects may change, which alters their lruvec association.

In this patch, handle the single compressed object migration and
transfer lruvec and node statistics across the affected lruvecs / nodes.

Zsmalloc compressed objects, like slab objects, can span two pages.
When a spanning object is migrated, possibly to another zspage where
it spans two zpdescs, up to 4 nodes can be touched.

Instead of enumerating all possible combinations of node migrations,
simply uncharge entirely from the source (1 or 2 nodes) and charge
entirely to the destination (1 or 2 nodes).

              s_off                         d_off
                 v                           v
         ----------+ +----               -----+ +---------
     ... ooo ooo xx| |x oo ...  -->  ... ooo x| |xx ooo oo ...
         ----------+ +----               -----+ +---------
             pg1      pg2                  pg3      pg4

            s_zspage                        d_zspage

To do this, calculate how much of the compressed object lives on each
page and perform up to 4 uncharge-charges.

Note that these operations cannot call the existing
zs_{charge, uncharge}_objcg functions we introduced, since we are
holding the class spin lock and obj_cgroup_charge can sleep.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 mm/zsmalloc.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 70 insertions(+), 4 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index ab085961b0e2..f3508ff8b3ab 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1684,15 +1684,81 @@ static unsigned long find_alloced_obj(struct size_class *class,
 	return handle;
 }
 
+#ifdef CONFIG_MEMCG
 static void zs_migrate_objcg(struct zspage *s_zspage, struct zspage *d_zspage,
-			     unsigned long used_obj, unsigned long free_obj)
+			     unsigned long used_obj, unsigned long free_obj,
+			     struct zs_pool *pool, int size)
 {
-	unsigned int s_idx = used_obj & OBJ_INDEX_MASK;
-	unsigned int d_idx = free_obj & OBJ_INDEX_MASK;
+	struct zpdesc *s_zpdesc, *d_zpdesc;
+	struct obj_cgroup *objcg;
+	struct mem_cgroup *memcg;
+	struct lruvec *l;
+	unsigned int s_idx, d_idx;
+	unsigned int s_off, d_off;
+	int charges[4], nids[4], partial;
+	int s_bytes_in_page, d_bytes_in_page;
+	int i;
+
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+		goto out;
+
+	obj_to_location(used_obj, &s_zpdesc, &s_idx);
+	obj_to_location(free_obj, &d_zpdesc, &d_idx);
+
+	objcg = s_zspage->objcgs[s_idx];
+	if (!objcg)
+		goto out;
+
+	/*
+	 * The object migration here can touch up to 4 nodes.
+	 * Instead of breaking down all possible combinations of node changes,
+	 * just uncharge entirely from the source and charge entirely to the
+	 * destination, even if there are node overlaps between src and dst.
+	 */
+	s_off = (s_idx * size) % PAGE_SIZE;
+	d_off = (d_idx * size) % PAGE_SIZE;
+	s_bytes_in_page = min_t(int, size, PAGE_SIZE - s_off);
+	d_bytes_in_page = min_t(int, size, PAGE_SIZE - d_off);
+
+	charges[0] = -s_bytes_in_page;
+	nids[0] = page_to_nid(zpdesc_page(s_zpdesc));
+	charges[1] = -(size - s_bytes_in_page); /* 0 if object doesn't span */
+	if (charges[1])
+		nids[1] = page_to_nid(zpdesc_page(get_next_zpdesc(s_zpdesc)));
+
+	charges[2] = d_bytes_in_page;
+	nids[2] = page_to_nid(zpdesc_page(d_zpdesc));
+	charges[3] = size - d_bytes_in_page; /* 0 if object doesn't span */
+	if (charges[3])
+		nids[3] = page_to_nid(zpdesc_page(get_next_zpdesc(d_zpdesc)));
 
+	rcu_read_lock();
+	memcg = obj_cgroup_memcg(objcg);
+	for (i = 0; i < 4; i++) {
+		if (!charges[i])
+			continue;
+
+		l = mem_cgroup_lruvec(memcg, NODE_DATA(nids[i]));
+		partial = (PAGE_SIZE * charges[i]) / size;
+		mod_memcg_lruvec_state(l, pool->compressed_stat, charges[i]);
+		mod_memcg_lruvec_state(l, pool->uncompressed_stat, partial);
+	}
+	rcu_read_unlock();
+
+	dec_node_page_state(zpdesc_page(s_zpdesc), pool->uncompressed_stat);
+	inc_node_page_state(zpdesc_page(d_zpdesc), pool->uncompressed_stat);
+
+out:
 	d_zspage->objcgs[d_idx] = s_zspage->objcgs[s_idx];
 	s_zspage->objcgs[s_idx] = NULL;
 }
+#else
+static void zs_migrate_objcg(struct zspage *s_zspage, struct zspage *d_zspage,
+			     unsigned long used_obj, unsigned long free_obj,
+			     struct zs_pool *pool, int size)
+{
+}
+#endif
 
 static void migrate_zspage(struct zs_pool *pool, struct zspage *src_zspage,
 			   struct zspage *dst_zspage)
@@ -1719,7 +1785,7 @@ static void migrate_zspage(struct zs_pool *pool, struct zspage *src_zspage,
 
 		if (pool->memcg_aware)
 			zs_migrate_objcg(src_zspage, dst_zspage,
-					 used_obj, free_obj);
+					 used_obj, free_obj, pool, class->size);
 
 		obj_idx++;
 		obj_free(class->size, used_obj);
-- 
2.52.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 11/11] mm/zsmalloc: Handle charge migration in zpdesc substitution
  2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
                   ` (9 preceding siblings ...)
  2026-03-11 19:51 ` [PATCH 10/11] mm/zsmalloc: Handle single object charge migration in migrate_zspage Joshua Hahn
@ 2026-03-11 19:51 ` Joshua Hahn
  2026-03-11 19:54 ` [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
  11 siblings, 0 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 19:51 UTC (permalink / raw)
  To: Minchan Kim, Sergey Senozhatsky
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Nhat Pham, Harry Yoo,
	Andrew Morton, linux-mm, linux-kernel, kernel-team

In zsmalloc, there are two types of migrations: Migrations of single
compressed objects from one zspage to another, and substitutions of
zpdescs from zspages.

In both of these migrations, the memcg association of the compressed
objects does not change. However, the physical location of the
compressed objects may change, which alters their lruvec association.

In this patch, handle the substitution of zpdescs from zspages, which
may change the node of all objects present (wholly or partially).

Take special care to address the partial compressed object at the
beginning of the swapped-out zpdesc. "Ownership" of a spanning object
is associated with the zpdesc it begins on. Thus, when handling the
first compressed object, we must iterate through the (up to 4)
zpdescs present in the zspage to find the previous zpdesc, then
retrieve the object's zspage-wide index.

For the same reason, pool->uncompressed_stat, which can only be
accounted at PAGE_SIZE granularity for the node statistics, is
accounted for objects beginning in the zpdesc.

Likewise for the spanning object at the end of the replaced zpdesc,
account only the amount that lives on the zpdesc.

Note that these operations cannot call the existing
zs_{charge, uncharge}_objcg functions we introduced, since we are
holding the class spin lock and obj_cgroup_charge can sleep.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 mm/zsmalloc.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index f3508ff8b3ab..a4c90447d28e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1932,6 +1932,94 @@ static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
 	return page_zpdesc(page)->zspage;
 }
 
+#ifdef CONFIG_MEMCG
+static void zs_migrate_lruvec(struct zs_pool *pool, struct obj_cgroup *objcg,
+			      int old_nid, int new_nid, int charge,
+			      int obj_size)
+{
+	struct mem_cgroup *memcg;
+	struct lruvec *old_lruvec, *new_lruvec;
+	int partial;
+
+	if (old_nid == new_nid || !objcg)
+		return;
+
+	/* Proportional (partial) uncompressed share for this portion */
+	partial = (PAGE_SIZE * charge) / obj_size;
+
+	rcu_read_lock();
+	memcg = obj_cgroup_memcg(objcg);
+	old_lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(old_nid));
+	new_lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(new_nid));
+
+	mod_memcg_lruvec_state(old_lruvec, pool->compressed_stat, -charge);
+	mod_memcg_lruvec_state(new_lruvec, pool->compressed_stat, charge);
+
+	mod_memcg_lruvec_state(old_lruvec, pool->uncompressed_stat, -partial);
+	mod_memcg_lruvec_state(new_lruvec, pool->uncompressed_stat, partial);
+	rcu_read_unlock();
+}
+
+/*
+ * Transfer per-lruvec and node-level stats when a zspage replaces a zpdesc
+ * with one from a different NUMA node. Must be called while old_zpdesc is
+ * still linked to the zspage. memcg-level charges are unchanged.
+ */
+static void zs_page_migrate_lruvec(struct zs_pool *pool, struct zspage *zspage,
+				   struct zpdesc *old_zpdesc,
+				   struct zpdesc *new_zpdesc,
+				   struct size_class *class)
+{
+	int size = class->size;
+	int old_nid = page_to_nid(zpdesc_page(old_zpdesc));
+	int new_nid = page_to_nid(zpdesc_page(new_zpdesc));
+	unsigned int off, first_obj_offset, page_offset = 0;
+	unsigned int idx;
+	struct zpdesc *cursor = zspage->first_zpdesc;
+
+	if (old_nid == new_nid)
+		return;
+
+	while (cursor != old_zpdesc) {
+		cursor = get_next_zpdesc(cursor);
+		page_offset += PAGE_SIZE;
+	}
+
+	first_obj_offset = get_first_obj_offset(old_zpdesc);
+	idx = (page_offset + first_obj_offset) / size;
+
+	/* Boundary object spanning from the previous zpdesc */
+	if (idx > 0 && zspage->objcgs[idx - 1])
+		zs_migrate_lruvec(pool, zspage->objcgs[idx - 1],
+				  old_nid, new_nid, first_obj_offset, size);
+
+	for (off = first_obj_offset;
+			off < PAGE_SIZE && idx < class->objs_per_zspage;
+			idx++, off += size) {
+		struct obj_cgroup *objcg = zspage->objcgs[idx];
+		int bytes_on_page = min_t(int, size, PAGE_SIZE - off);
+
+		if (!objcg)
+			continue;
+
+		zs_migrate_lruvec(pool, objcg, old_nid, new_nid,
+				  bytes_on_page, size);
+
+		dec_node_page_state(zpdesc_page(old_zpdesc),
+				    pool->uncompressed_stat);
+		inc_node_page_state(zpdesc_page(new_zpdesc),
+				    pool->uncompressed_stat);
+	}
+}
+#else
+static void zs_page_migrate_lruvec(struct zs_pool *pool, struct zspage *zspage,
+				   struct zpdesc *old_zpdesc,
+				   struct zpdesc *new_zpdesc,
+				   struct size_class *class)
+{
+}
+#endif
+
 static int zs_page_migrate(struct page *newpage, struct page *page,
 		enum migrate_mode mode)
 {
@@ -2004,6 +2092,10 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	}
 	kunmap_local(s_addr);
 
+	/* Transfer lruvec/node stats while old zpdesc is still linked */
+	if (pool->memcg_aware)
+		zs_page_migrate_lruvec(pool, zspage, zpdesc, newzpdesc, class);
+
 	replace_sub_page(class, zspage, newzpdesc, zpdesc);
 	/*
 	 * Since we complete the data copy and set up new zspage structure,
-- 
2.52.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting
  2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
                   ` (10 preceding siblings ...)
  2026-03-11 19:51 ` [PATCH 11/11] mm/zsmalloc: Handle charge migration in zpdesc substitution Joshua Hahn
@ 2026-03-11 19:54 ` Joshua Hahn
  11 siblings, 0 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 19:54 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Yosry Ahmed,
	Nhat Pham, Nhat Pham, Chengming Zhou, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song, Harry Yoo,
	Andrew Morton, cgroups, linux-mm, linux-kernel, kernel-team

On Wed, 11 Mar 2026 12:51:37 -0700 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:

Ouch, immediately after sending these out I realized that I forgot to
add a "V2" indicator in the subjects of all of these patches.

I apologize for the noise.

> Joshua Hahn (11):
>   mm/zsmalloc: Rename zs_object_copy to zs_obj_copy
>   mm/zsmalloc: Make all obj_idx unsigned ints
>   mm/zsmalloc: Introduce conditional memcg awareness to zs_pool
>   mm/zsmalloc: Introduce objcgs pointer in struct zspage
>   mm/zsmalloc: Store obj_cgroup pointer in zspage
>   mm/zsmalloc, zswap: Redirect zswap_entry->objcg to zspage
>   mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc
>   mm/memcontrol: Track MEMCG_ZSWAPPED in bytes
>   mm/vmstat, memcontrol: Track ZSWAP_B, ZSWAPPED_B per-memcg-lruvec
>   mm/zsmalloc: Handle single object charge migration in migrate_zspage
>   mm/zsmalloc: Handle charge migration in zpdesc substitution
> 
>  drivers/block/zram/zram_drv.c |  10 +-
>  include/linux/memcontrol.h    |  20 +-
>  include/linux/mmzone.h        |   2 +
>  include/linux/zsmalloc.h      |   9 +-
>  mm/memcontrol.c               |  75 ++-----
>  mm/vmstat.c                   |   2 +
>  mm/zsmalloc.c                 | 381 ++++++++++++++++++++++++++++++++--
>  mm/zswap.c                    |  66 +++---
>  8 files changed, 431 insertions(+), 134 deletions(-)
> 
> -- 
> 2.52.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/11] mm/zsmalloc: Rename zs_object_copy to zs_obj_copy
  2026-03-11 19:51 ` [PATCH 01/11] mm/zsmalloc: Rename zs_object_copy to zs_obj_copy Joshua Hahn
@ 2026-03-11 19:56   ` Yosry Ahmed
  2026-03-11 20:00   ` Nhat Pham
  1 sibling, 0 replies; 33+ messages in thread
From: Yosry Ahmed @ 2026-03-11 19:56 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Minchan Kim, Sergey Senozhatsky, Nhat Pham, Nhat Pham,
	Johannes Weiner, Andrew Morton, linux-mm, linux-kernel,
	kernel-team

On Wed, Mar 11, 2026 at 12:52 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> All the zsmalloc functions that operate on a zsmalloc object (encoded
> location values) are named "zs_obj_xxx", except for zs_object_copy.
>
> Rename zs_object_copy to zs_obj_copy to conform to the pattern.
> No functional changes intended.
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

Reviewed-by: Yosry Ahmed <yosry@kernel.org>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/11] mm/zsmalloc: Make all obj_idx unsigned ints
  2026-03-11 19:51 ` [PATCH 02/11] mm/zsmalloc: Make all obj_idx unsigned ints Joshua Hahn
@ 2026-03-11 19:58   ` Yosry Ahmed
  2026-03-11 20:01   ` Nhat Pham
  1 sibling, 0 replies; 33+ messages in thread
From: Yosry Ahmed @ 2026-03-11 19:58 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Minchan Kim, Sergey Senozhatsky, Nhat Pham, Nhat Pham,
	Johannes Weiner, Andrew Morton, linux-mm, linux-kernel,
	kernel-team

On Wed, Mar 11, 2026 at 12:52 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> object indices, which describe the location of an object in a zspage,
> cannot be negative. To reflect this, most helpers calculate and return
> these values as unsigned ints.
>
> Convert find_alloced_obj, the only function that calculates obj_idx as
> a signed int, to use an unsigned int as well.
>
> No functional change intended.
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

Reviewed-by: Yosry Ahmed <yosry@kernel.org>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/11] mm/zsmalloc: Rename zs_object_copy to zs_obj_copy
  2026-03-11 19:51 ` [PATCH 01/11] mm/zsmalloc: Rename zs_object_copy to zs_obj_copy Joshua Hahn
  2026-03-11 19:56   ` Yosry Ahmed
@ 2026-03-11 20:00   ` Nhat Pham
  1 sibling, 0 replies; 33+ messages in thread
From: Nhat Pham @ 2026-03-11 20:00 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Minchan Kim, Sergey Senozhatsky, Nhat Pham, Johannes Weiner,
	Andrew Morton, linux-mm, linux-kernel, kernel-team

On Wed, Mar 11, 2026 at 12:51 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> All the zsmalloc functions that operate on a zsmalloc object (encoded
> location values) are named "zs_obj_xxx", except for zs_object_copy.
>
> Rename zs_object_copy to zs_obj_copy to conform to the pattern.
> No functional changes intended.
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> ---
>  mm/zsmalloc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 2c1430bf8d57..7a9b8f55d529 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1416,7 +1416,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
>  }
>  EXPORT_SYMBOL_GPL(zs_free);
>
> -static void zs_object_copy(struct size_class *class, unsigned long dst,
> +static void zs_obj_copy(struct size_class *class, unsigned long dst,
>                                 unsigned long src)
>  {
>         struct zpdesc *s_zpdesc, *d_zpdesc;
> @@ -1537,7 +1537,7 @@ static void migrate_zspage(struct zs_pool *pool, struct zspage *src_zspage,
>
>                 used_obj = handle_to_obj(handle);
>                 free_obj = obj_malloc(pool, dst_zspage, handle);
> -               zs_object_copy(class, free_obj, used_obj);
> +               zs_obj_copy(class, free_obj, used_obj);
>                 obj_idx++;

Reviewed-by: Nhat Pham <nphamcs@gmail.com>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/11] mm/zsmalloc: Make all obj_idx unsigned ints
  2026-03-11 19:51 ` [PATCH 02/11] mm/zsmalloc: Make all obj_idx unsigned ints Joshua Hahn
  2026-03-11 19:58   ` Yosry Ahmed
@ 2026-03-11 20:01   ` Nhat Pham
  1 sibling, 0 replies; 33+ messages in thread
From: Nhat Pham @ 2026-03-11 20:01 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Minchan Kim, Sergey Senozhatsky, Nhat Pham, Johannes Weiner,
	Andrew Morton, linux-mm, linux-kernel, kernel-team

On Wed, Mar 11, 2026 at 12:51 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> object indices, which describe the location of an object in a zspage,
> cannot be negative. To reflect this, most helpers calculate and return
> these values as unsigned ints.
>
> Convert find_alloced_obj, the only function that calculates obj_idx as
> a signed int, to use an unsigned int as well.
>
> No functional change intended.
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

Reviewed-by: Nhat Pham <nphamcs@gmail.com>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] mm/zsmalloc: Introduce conditional memcg awareness to zs_pool
  2026-03-11 19:51 ` [PATCH 03/11] mm/zsmalloc: Introduce conditional memcg awareness to zs_pool Joshua Hahn
@ 2026-03-11 20:12   ` Nhat Pham
  2026-03-11 20:16   ` Johannes Weiner
  1 sibling, 0 replies; 33+ messages in thread
From: Nhat Pham @ 2026-03-11 20:12 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Yosry Ahmed,
	Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm, linux-block,
	linux-kernel, kernel-team

On Wed, Mar 11, 2026 at 12:51 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> Introduce 3 new fields to struct zs_pool to allow individual zpools to
> be "memcg-aware": memcg_aware, compressed_stat, and uncompressed_stat.
>
> memcg_aware is used in later patches to determine whether memory
> should be allocated to keep track of per-compressed object objcgs.
> compressed_stat and uncompressed_stat are enum indices that point into
> memcg (node) stats that zsmalloc will account towards.
>
> In reality, these fields help distinguish between the two users of
> zsmalloc, zswap and zram. The enum indices compressed_stat and
> uncompressed_stat are parametrized to minimize zswap-specific hardcoding
> in zsmalloc.
>
> Suggested-by: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

Zswap side LGTM :) And for that:

Acked-by: Nhat Pham <nphamcs@gmail.com>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] mm/zsmalloc: Introduce conditional memcg awareness to zs_pool
  2026-03-11 19:51 ` [PATCH 03/11] mm/zsmalloc: Introduce conditional memcg awareness to zs_pool Joshua Hahn
  2026-03-11 20:12   ` Nhat Pham
@ 2026-03-11 20:16   ` Johannes Weiner
  2026-03-11 20:19     ` Yosry Ahmed
  2026-03-11 20:20     ` Joshua Hahn
  1 sibling, 2 replies; 33+ messages in thread
From: Johannes Weiner @ 2026-03-11 20:16 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Minchan Kim, Sergey Senozhatsky, Yosry Ahmed, Nhat Pham,
	Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm, linux-block,
	linux-kernel, kernel-team

On Wed, Mar 11, 2026 at 12:51:40PM -0700, Joshua Hahn wrote:
> Introduce 3 new fields to struct zs_pool to allow individual zpools to
> be "memcg-aware": memcg_aware, compressed_stat, and uncompressed_stat.
> 
> memcg_aware is used in later patches to determine whether memory
> > should be allocated to keep track of per-compressed object objcgs.
> compressed_stat and uncompressed_stat are enum indices that point into
> memcg (node) stats that zsmalloc will account towards.
> 
> In reality, these fields help distinguish between the two users of
> zsmalloc, zswap and zram. The enum indices compressed_stat and
> uncompressed_stat are parametrized to minimize zswap-specific hardcoding
> in zsmalloc.
> 
> Suggested-by: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> ---
>  drivers/block/zram/zram_drv.c |  3 ++-
>  include/linux/zsmalloc.h      |  5 ++++-
>  mm/zsmalloc.c                 | 13 ++++++++++++-
>  mm/zswap.c                    |  3 ++-
>  4 files changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index bca33403fc8b..d1eae5c20df7 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1980,7 +1980,8 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
>  	if (!zram->table)
>  		return false;
>  
> -	zram->mem_pool = zs_create_pool(zram->disk->disk_name);
> +	/* zram does not support memcg accounting */
> +	zram->mem_pool = zs_create_pool(zram->disk->disk_name, false, 0, 0);

It's a bit awkward that 0 is valid (MEMCG_SWAP). Plus you store these
values in every pool, even though they're always the same for all
zswap pools.

How about:

/* zsmalloc.h */
struct zs_memcg_params {
	enum memcg_stat_item compressed;
	enum memcg_stat_item uncompressed;
};
struct zs_pool *zs_create_pool(const char *name, struct zs_memcg_params *memcg_params);

/* zswap.c */
static struct zs_memcg_params zswap_memcg_params = {
	.compressed = MEMCG_ZSWAP_B,
	.uncompressed = MEMCG_ZSWAPPED,
};

then pass &zswap_memcg_params from zswap and NULL from zram.

> @@ -2071,6 +2079,9 @@ struct zs_pool *zs_create_pool(const char *name)
>  	rwlock_init(&pool->lock);
>  	atomic_set(&pool->compaction_in_progress, 0);
>  
> +	pool->memcg_aware = memcg_aware;
> +	pool->compressed_stat = compressed_stat;
> +	pool->uncompressed_stat = uncompressed_stat;

	pool->memcg_params = memcg_params;

And then use if (pool->memcg_params) to gate in zsmalloc.c.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/11] mm/zsmalloc: Introduce objcgs pointer in struct zspage
  2026-03-11 19:51 ` [PATCH 04/11] mm/zsmalloc: Introduce objcgs pointer in struct zspage Joshua Hahn
@ 2026-03-11 20:17   ` Nhat Pham
  2026-03-11 20:22     ` Joshua Hahn
  0 siblings, 1 reply; 33+ messages in thread
From: Nhat Pham @ 2026-03-11 20:17 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Harry Yoo,
	Yosry Ahmed, Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm,
	linux-block, linux-kernel, kernel-team

On Wed, Mar 11, 2026 at 12:52 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> Introduce an array of struct obj_cgroup pointers to zspage to keep track
> of compressed objects' memcg ownership, if the zs_pool has been made to
> be memcg-aware at creation time.
>
> Move the error path for alloc_zspage to a jump label to simplify the
> growing error handling path for a failed zpdesc allocation.
>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Suggested-by: Harry Yoo <harry.yoo@oracle.com>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> ---
>  mm/zsmalloc.c | 34 ++++++++++++++++++++++++++--------
>  1 file changed, 26 insertions(+), 8 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 3f0f42b78314..dcf99516227c 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -39,6 +39,7 @@
>  #include <linux/zsmalloc.h>
>  #include <linux/fs.h>
>  #include <linux/workqueue.h>
> +#include <linux/memcontrol.h>
>  #include "zpdesc.h"
>
>  #define ZSPAGE_MAGIC   0x58
> @@ -273,6 +274,7 @@ struct zspage {
>         struct zpdesc *first_zpdesc;
>         struct list_head list; /* fullness list */
>         struct zs_pool *pool;
> +       struct obj_cgroup **objcgs;
>         struct zspage_lock zsl;
>  };
>
> @@ -825,6 +827,8 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
>                 zpdesc = next;
>         } while (zpdesc != NULL);
>
> +       if (pool->memcg_aware)
> +               kfree(zspage->objcgs);
>         cache_free_zspage(zspage);
>
>         class_stat_sub(class, ZS_OBJS_ALLOCATED, class->objs_per_zspage);
> @@ -946,6 +950,16 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
>         if (!IS_ENABLED(CONFIG_COMPACTION))
>                 gfp &= ~__GFP_MOVABLE;
>
> +       if (pool->memcg_aware) {
> +               zspage->objcgs = kcalloc(class->objs_per_zspage,
> +                                        sizeof(struct obj_cgroup *),
> +                                        gfp & ~__GFP_HIGHMEM);

I remembered asking this, so my apologies if I missed/forgot your
response - but would vmalloc work here? i.e kvcalloc to fallback to
vmalloc etc.?

> +               if (!zspage->objcgs) {
> +                       cache_free_zspage(zspage);
> +                       return NULL;
> +               }
> +       }
> +
>         zspage->magic = ZSPAGE_MAGIC;
>         zspage->pool = pool;
>         zspage->class = class->index;
> @@ -955,14 +969,8 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
>                 struct zpdesc *zpdesc;
>
>                 zpdesc = alloc_zpdesc(gfp, nid);
> -               if (!zpdesc) {
> -                       while (--i >= 0) {
> -                               zpdesc_dec_zone_page_state(zpdescs[i]);
> -                               free_zpdesc(zpdescs[i]);
> -                       }
> -                       cache_free_zspage(zspage);
> -                       return NULL;
> -               }
> +               if (!zpdesc)
> +                       goto err;
>                 __zpdesc_set_zsmalloc(zpdesc);
>
>                 zpdesc_inc_zone_page_state(zpdesc);
> @@ -973,6 +981,16 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
>         init_zspage(class, zspage);
>
>         return zspage;
> +
> +err:
> +       while (--i >= 0) {
> +               zpdesc_dec_zone_page_state(zpdescs[i]);
> +               free_zpdesc(zpdescs[i]);
> +       }
> +       if (pool->memcg_aware)
> +               kfree(zspage->objcgs);
> +       cache_free_zspage(zspage);
> +       return NULL;
>  }
>
>  static struct zspage *find_get_zspage(struct size_class *class)
> --
> 2.52.0
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/11] mm/zsmalloc: Store obj_cgroup pointer in zspage
  2026-03-11 19:51 ` [PATCH 05/11] mm/zsmalloc: Store obj_cgroup pointer in zspage Joshua Hahn
@ 2026-03-11 20:17   ` Yosry Ahmed
  2026-03-11 20:24     ` Joshua Hahn
  0 siblings, 1 reply; 33+ messages in thread
From: Yosry Ahmed @ 2026-03-11 20:17 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe,
	Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton,
	linux-mm, linux-block, linux-kernel, kernel-team

[..]
> @@ -1216,6 +1216,11 @@ void zs_obj_write(struct zs_pool *pool, unsigned long handle,
>         class = zspage_class(pool, zspage);
>         off = offset_in_page(class->size * obj_idx);
>
> +       if (objcg) {
> +               WARN_ON_ONCE(!pool->memcg_aware);
> +               zspage->objcgs[obj_idx] = objcg;
> +       }

If pool->memcg_aware is not set the warning will fire, but the
following line will write to uninitialized memory and probably crash.
We should avoid the write if the warning fires.

Maybe:

if (objcg && !WARN_ON_ONCE(!pool->memcg_aware))
       zspage->objcgs[obj_idx] = objcg;

Not pretty, but the same pattern is followed in many places in the kernel.

> +
>         if (!ZsHugePage(zspage))
>                 off += ZS_HANDLE_SIZE;
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] mm/zsmalloc: Introduce conditional memcg awareness to zs_pool
  2026-03-11 20:16   ` Johannes Weiner
@ 2026-03-11 20:19     ` Yosry Ahmed
  2026-03-11 20:20     ` Joshua Hahn
  1 sibling, 0 replies; 33+ messages in thread
From: Yosry Ahmed @ 2026-03-11 20:19 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Joshua Hahn, Minchan Kim, Sergey Senozhatsky, Yosry Ahmed,
	Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm,
	linux-block, linux-kernel, kernel-team

> It's a bit awkward that 0 is valid (MEMCG_SWAP). Plus you store these
> values in every pool, even though they're always the same for all
> zswap pools.
>
> How about:
>
> /* zsmalloc.h */
> struct zs_memcg_params {
>         enum memcg_stat_item compressed;
>         enum memcg_stat_item uncompressed;
> };
> struct zs_pool *zs_create_pool(const char *name, struct zs_memcg_params *memcg_params);
>
> /* zswap.c */
> static struct zs_memcg_params zswap_memcg_params = {
>         .compressed = MEMCG_ZSWAP_B,
>         .uncompressed = MEMCG_ZSWAPPED,
> };
>
> then pass &zswap_memcg_params from zswap and NULL from zram.
>
> > @@ -2071,6 +2079,9 @@ struct zs_pool *zs_create_pool(const char *name)
> >       rwlock_init(&pool->lock);
> >       atomic_set(&pool->compaction_in_progress, 0);
> >
> > +     pool->memcg_aware = memcg_aware;
> > +     pool->compressed_stat = compressed_stat;
> > +     pool->uncompressed_stat = uncompressed_stat;
>
>         pool->memcg_params = memcg_params;
>
> And then use if (pool->memcg_params) to gate in zsmalloc.c.

I like this.

I also wasn't a fan of the original zs_create_pool() prototype, and
didn't like that compressed_stat and uncompressed_stat did not have
memcg anywhere in their names. I was going to suggest adding a warning
if memcg_aware=false but the stat indices are non-zero, but your
suggestion is so much cleaner.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] mm/zsmalloc: Introduce conditional memcg awareness to zs_pool
  2026-03-11 20:16   ` Johannes Weiner
  2026-03-11 20:19     ` Yosry Ahmed
@ 2026-03-11 20:20     ` Joshua Hahn
  1 sibling, 0 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 20:20 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Minchan Kim, Sergey Senozhatsky, Yosry Ahmed, Nhat Pham,
	Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm, linux-block,
	linux-kernel, kernel-team

On Wed, 11 Mar 2026 16:16:34 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Wed, Mar 11, 2026 at 12:51:40PM -0700, Joshua Hahn wrote:
> > Introduce 3 new fields to struct zs_pool to allow individual zpools to
> > be "memcg-aware": memcg_aware, compressed_stat, and uncompressed_stat.
> > 
> > memcg_aware is used in later patches to determine whether memory
> > should be allocated to keep track of per-compressed object objcgs.
> > compressed_stat and uncompressed_stat are enum indices that point into
> > memcg (node) stats that zsmalloc will account towards.
> > 
> > In reality, these fields help distinguish between the two users of
> > zsmalloc, zswap and zram. The enum indices compressed_stat and
> > uncompressed_stat are parametrized to minimize zswap-specific hardcoding
> > in zsmalloc.
> > 
> > Suggested-by: Yosry Ahmed <yosry@kernel.org>
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> > ---
> >  drivers/block/zram/zram_drv.c |  3 ++-
> >  include/linux/zsmalloc.h      |  5 ++++-
> >  mm/zsmalloc.c                 | 13 ++++++++++++-
> >  mm/zswap.c                    |  3 ++-
> >  4 files changed, 20 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index bca33403fc8b..d1eae5c20df7 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -1980,7 +1980,8 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
> >  	if (!zram->table)
> >  		return false;
> >  
> > -	zram->mem_pool = zs_create_pool(zram->disk->disk_name);
> > +	/* zram does not support memcg accounting */
> > +	zram->mem_pool = zs_create_pool(zram->disk->disk_name, false, 0, 0);

Hello Johannes,

Thank you for your review! I hope you are doing well : -)

> It's a bit awkward that 0 is valid (MEMCG_SWAP). Plus you store these
> values in every pool, even though they're always the same for all
> zswap pools.

Agreed. Originally I thought of removing memcg_aware and doing a bitwise
OR check of the two stats, but that also felt a bit strange (and 0 is
not a valid enum state for memcg_stat_item anyway).

> How about:
> 
> /* zsmalloc.h */
> struct zs_memcg_params {
> 	enum memcg_stat_item compressed;
> 	enum memcg_stat_item uncompressed;
> };
> struct zs_pool *zs_create_pool(const char *name, struct zs_memcg_params *memcg_params);
> 
> /* zswap.c */
> static struct zs_memcg_params zswap_memcg_params = {
> 	.compressed = MEMCG_ZSWAP_B,
> 	.uncompressed = MEMCG_ZSWAPPED,
> };
> 
> then pass &zswap_memcg_params from zswap and NULL from zram.
> 
> > @@ -2071,6 +2079,9 @@ struct zs_pool *zs_create_pool(const char *name)
> >  	rwlock_init(&pool->lock);
> >  	atomic_set(&pool->compaction_in_progress, 0);
> >  
> > +	pool->memcg_aware = memcg_aware;
> > +	pool->compressed_stat = compressed_stat;
> > +	pool->uncompressed_stat = uncompressed_stat;
> 
> 	pool->memcg_params = memcg_params;
> 
> And then use if (pool->memcg_params) to gate in zsmalloc.c.

These definitely look a lot cleaner. Will make these changes in v3!

Thanks again. I hope you have a great day!
Joshua


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/11] mm/zsmalloc: Introduce objcgs pointer in struct zspage
  2026-03-11 20:17   ` Nhat Pham
@ 2026-03-11 20:22     ` Joshua Hahn
  0 siblings, 0 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 20:22 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Harry Yoo,
	Yosry Ahmed, Nhat Pham, Chengming Zhou, Andrew Morton, linux-mm,
	linux-block, linux-kernel, kernel-team

On Wed, 11 Mar 2026 13:17:22 -0700 Nhat Pham <nphamcs@gmail.com> wrote:

> On Wed, Mar 11, 2026 at 12:52 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> >
> > Introduce an array of struct obj_cgroup pointers to zspage to keep track
> > of compressed objects' memcg ownership, if the zs_pool has been made to
> > be memcg-aware at creation time.
> >
> > Move the error path for alloc_zspage to a jump label to simplify the
> > growing error handling path for a failed zpdesc allocation.
> >
> > Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> > Suggested-by: Harry Yoo <harry.yoo@oracle.com>
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> > ---
> >  mm/zsmalloc.c | 34 ++++++++++++++++++++++++++--------
> >  1 file changed, 26 insertions(+), 8 deletions(-)
> >
> > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > index 3f0f42b78314..dcf99516227c 100644
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -39,6 +39,7 @@
> >  #include <linux/zsmalloc.h>
> >  #include <linux/fs.h>
> >  #include <linux/workqueue.h>
> > +#include <linux/memcontrol.h>
> >  #include "zpdesc.h"
> >
> >  #define ZSPAGE_MAGIC   0x58
> > @@ -273,6 +274,7 @@ struct zspage {
> >         struct zpdesc *first_zpdesc;
> >         struct list_head list; /* fullness list */
> >         struct zs_pool *pool;
> > +       struct obj_cgroup **objcgs;
> >         struct zspage_lock zsl;
> >  };
> >
> > @@ -825,6 +827,8 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
> >                 zpdesc = next;
> >         } while (zpdesc != NULL);
> >
> > +       if (pool->memcg_aware)
> > +               kfree(zspage->objcgs);
> >         cache_free_zspage(zspage);
> >
> >         class_stat_sub(class, ZS_OBJS_ALLOCATED, class->objs_per_zspage);
> > @@ -946,6 +950,16 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
> >         if (!IS_ENABLED(CONFIG_COMPACTION))
> >                 gfp &= ~__GFP_MOVABLE;
> >
> > +       if (pool->memcg_aware) {
> > +               zspage->objcgs = kcalloc(class->objs_per_zspage,
> > +                                        sizeof(struct obj_cgroup *),
> > +                                        gfp & ~__GFP_HIGHMEM);
> 
> I remembered asking this, so my apologies if I missed/forgot your
> response - but would vmalloc work here? i.e kvcalloc to fallback to
> vmalloc etc.?

Hello Nhat : -)

Thank you for reviewing, and for your acks on the other parts!

You're right, I missed changing that on my end after v1. No reason
vmalloc shouldn't work here, let me make that change in v3.

Thanks, I hope you have a great day!
Joshua


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/11] mm/zsmalloc: Store obj_cgroup pointer in zspage
  2026-03-11 20:17   ` Yosry Ahmed
@ 2026-03-11 20:24     ` Joshua Hahn
  0 siblings, 0 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-11 20:24 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Jens Axboe,
	Yosry Ahmed, Nhat Pham, Nhat Pham, Chengming Zhou, Andrew Morton,
	linux-mm, linux-block, linux-kernel, kernel-team

On Wed, 11 Mar 2026 13:17:26 -0700 Yosry Ahmed <yosry@kernel.org> wrote:

> [..]
> > @@ -1216,6 +1216,11 @@ void zs_obj_write(struct zs_pool *pool, unsigned long handle,
> >         class = zspage_class(pool, zspage);
> >         off = offset_in_page(class->size * obj_idx);
> >
> > +       if (objcg) {
> > +               WARN_ON_ONCE(!pool->memcg_aware);
> > +               zspage->objcgs[obj_idx] = objcg;
> > +       }

Hello Yosry,

I hope you are doing well. Thank you for reviewing this series! : -)

> If pool->memcg_aware is not set the warning will fire, but the
> following line will write to uninitialized memory and probably crash.
> We should avoid the write if the warning fires.
> 
> Maybe:
> 
> if (objcg && !WARN_ON_ONCE(!pool->memcg_aware))
>        zspage->objcgs[obj_idx] = objcg;

Ack. 

> Not pretty, but the same pattern is followed in many places in the kernel.
> 
> > +
> >         if (!ZsHugePage(zspage))
> >                 off += ZS_HANDLE_SIZE;
> >

Definitely better than writing garbage and crashing : -)
I'll make this change in the next version. I think I should also
sprinkle these WARN_ON_ONCEs in a few other places, and I'll be
mindful of the same pitfall there.

Thank you again Yosry, I hope you have a great day!
Joshua


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 08/11] mm/memcontrol: Track MEMCG_ZSWAPPED in bytes
  2026-03-11 19:51 ` [PATCH 08/11] mm/memcontrol: Track MEMCG_ZSWAPPED in bytes Joshua Hahn
@ 2026-03-11 20:33   ` Nhat Pham
  2026-03-17 19:13     ` Joshua Hahn
  0 siblings, 1 reply; 33+ messages in thread
From: Nhat Pham @ 2026-03-11 20:33 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Yosry Ahmed,
	Nhat Pham, Chengming Zhou, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Muchun Song, Andrew Morton, cgroups, linux-mm,
	linux-kernel, kernel-team

On Wed, Mar 11, 2026 at 12:52 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> Zswap compresses and uncompresses in PAGE_SIZE units, which simplifies
> the accounting for how much memory it has compressed. However, when a
> compressed object is stored at the boundary of two zspages, accounting
> at a PAGE_SIZE granularity makes it difficult to fractionally charge
> each backing zspage with the ratio of memory it backs for the
> compressed object.
>
> To make sub-PAGE_SIZE granularity charging possible for MEMCG_ZSWAPPED,
> track the value in bytes and adjust its accounting accordingly.
>
> No functional changes intended.
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

LGTM.
Reviewed-by: Nhat Pham <nphamcs@gmail.com>

> ---
>  include/linux/memcontrol.h | 2 +-
>  mm/memcontrol.c            | 5 +++--
>  mm/zsmalloc.c              | 4 ++--
>  mm/zswap.c                 | 8 +++++---
>  4 files changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 701d9ab6fef1..ce2e598b5963 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -38,7 +38,7 @@ enum memcg_stat_item {
>         MEMCG_VMALLOC,
>         MEMCG_KMEM,
>         MEMCG_ZSWAP_B,
> -       MEMCG_ZSWAPPED,
> +       MEMCG_ZSWAPPED_B,
>         MEMCG_NR_STAT,
>  };
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 68139be66a4f..1cb02d2febe8 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -342,7 +342,7 @@ static const unsigned int memcg_stat_items[] = {
>         MEMCG_VMALLOC,
>         MEMCG_KMEM,
>         MEMCG_ZSWAP_B,
> -       MEMCG_ZSWAPPED,
> +       MEMCG_ZSWAPPED_B,
>  };
>
>  #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
> @@ -1364,7 +1364,7 @@ static const struct memory_stat memory_stats[] = {
>         { "shmem",                      NR_SHMEM                        },
>  #ifdef CONFIG_ZSWAP
>         { "zswap",                      MEMCG_ZSWAP_B                   },
> -       { "zswapped",                   MEMCG_ZSWAPPED                  },
> +       { "zswapped",                   MEMCG_ZSWAPPED_B                },
>  #endif
>         { "file_mapped",                NR_FILE_MAPPED                  },
>         { "file_dirty",                 NR_FILE_DIRTY                   },
> @@ -1412,6 +1412,7 @@ static int memcg_page_state_unit(int item)
>         switch (item) {
>         case MEMCG_PERCPU_B:
>         case MEMCG_ZSWAP_B:
> +       case MEMCG_ZSWAPPED_B:
>         case NR_SLAB_RECLAIMABLE_B:
>         case NR_SLAB_UNRECLAIMABLE_B:
>                 return 1;
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 291194572a09..24665d7cd4a9 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1047,7 +1047,7 @@ static void zs_charge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
>         rcu_read_lock();
>         memcg = obj_cgroup_memcg(objcg);
>         mod_memcg_state(memcg, pool->compressed_stat, size);
> -       mod_memcg_state(memcg, pool->uncompressed_stat, 1);
> +       mod_memcg_state(memcg, pool->uncompressed_stat, PAGE_SIZE);
>         rcu_read_unlock();
>  }
>
> @@ -1066,7 +1066,7 @@ static void zs_uncharge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
>         rcu_read_lock();
>         memcg = obj_cgroup_memcg(objcg);
>         mod_memcg_state(memcg, pool->compressed_stat, -size);
> -       mod_memcg_state(memcg, pool->uncompressed_stat, -1);
> +       mod_memcg_state(memcg, pool->uncompressed_stat, -(int)PAGE_SIZE);

nit: seems a bit awkward lol?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/11] mm/zsmalloc: Handle single object charge migration in migrate_zspage
  2026-03-11 19:51 ` [PATCH 10/11] mm/zsmalloc: Handle single object charge migration in migrate_zspage Joshua Hahn
@ 2026-03-12  3:51   ` kernel test robot
  2026-03-12  3:51   ` kernel test robot
  1 sibling, 0 replies; 33+ messages in thread
From: kernel test robot @ 2026-03-12  3:51 UTC (permalink / raw)
  To: Joshua Hahn, Minchan Kim, Sergey Senozhatsky
  Cc: oe-kbuild-all, Johannes Weiner, Yosry Ahmed, Nhat Pham, Harry Yoo,
	Andrew Morton, Linux Memory Management List, linux-kernel,
	kernel-team

Hi Joshua,

kernel test robot noticed the following build warnings:

[auto build test WARNING on axboe/for-next]
[also build test WARNING on linus/master v7.0-rc3]
[cannot apply to akpm-mm/mm-everything next-20260311]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Joshua-Hahn/mm-zsmalloc-Rename-zs_object_copy-to-zs_obj_copy/20260312-035531
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git for-next
patch link:    https://lore.kernel.org/r/20260311195153.4013476-11-joshua.hahnjy%40gmail.com
patch subject: [PATCH 10/11] mm/zsmalloc: Handle single object charge migration in migrate_zspage
config: arc-randconfig-001-20260312 (https://download.01.org/0day-ci/archive/20260312/202603121115.dm3Z6KvA-lkp@intel.com/config)
compiler: arc-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260312/202603121115.dm3Z6KvA-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603121115.dm3Z6KvA-lkp@intel.com/

All warnings (new ones prefixed by >>):

   mm/zsmalloc.c: In function 'zs_compact.part.28':
>> mm/zsmalloc.c:1696:15: warning: 's_idx' is used uninitialized in this function [-Wuninitialized]
     unsigned int s_idx, d_idx;
                  ^~~~~


vim +/s_idx +1696 mm/zsmalloc.c

  1686	
  1687	#ifdef CONFIG_MEMCG
  1688	static void zs_migrate_objcg(struct zspage *s_zspage, struct zspage *d_zspage,
  1689				     unsigned long used_obj, unsigned long free_obj,
  1690				     struct zs_pool *pool, int size)
  1691	{
  1692		struct zpdesc *s_zpdesc, *d_zpdesc;
  1693		struct obj_cgroup *objcg;
  1694		struct mem_cgroup *memcg;
  1695		struct lruvec *l;
> 1696		unsigned int s_idx, d_idx;
  1697		unsigned int s_off, d_off;
  1698		int charges[4], nids[4], partial;
  1699		int s_bytes_in_page, d_bytes_in_page;
  1700		int i;
  1701	
  1702		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
  1703			goto out;
  1704	
  1705		obj_to_location(used_obj, &s_zpdesc, &s_idx);
  1706		obj_to_location(free_obj, &d_zpdesc, &d_idx);
  1707	
  1708		objcg = s_zspage->objcgs[s_idx];
  1709		if (!objcg)
  1710			goto out;
  1711	
  1712		/*
  1713		 * The object migration here can touch up to 4 nodes.
  1714		 * Instead of breaking down all possible combinations of node changes,
  1715		 * just uncharge entirely from the source and charge entirely to the
  1716		 * destination, even if there are node overlaps between src and dst.
  1717		 */
  1718		s_off = (s_idx * size) % PAGE_SIZE;
  1719		d_off = (d_idx * size) % PAGE_SIZE;
  1720		s_bytes_in_page = min_t(int, size, PAGE_SIZE - s_off);
  1721		d_bytes_in_page = min_t(int, size, PAGE_SIZE - d_off);
  1722	
  1723		charges[0] = -s_bytes_in_page;
  1724		nids[0] = page_to_nid(zpdesc_page(s_zpdesc));
  1725		charges[1] = -(size - s_bytes_in_page); /* 0 if object doesn't span */
  1726		if (charges[1])
  1727			nids[1] = page_to_nid(zpdesc_page(get_next_zpdesc(s_zpdesc)));
  1728	
  1729		charges[2] = d_bytes_in_page;
  1730		nids[2] = page_to_nid(zpdesc_page(d_zpdesc));
  1731		charges[3] = size - d_bytes_in_page; /* 0 if object doesn't span */
  1732		if (charges[3])
  1733			nids[3] = page_to_nid(zpdesc_page(get_next_zpdesc(d_zpdesc)));
  1734	
  1735		rcu_read_lock();
  1736		memcg = obj_cgroup_memcg(objcg);
  1737		for (i = 0; i < 4; i++) {
  1738			if (!charges[i])
  1739				continue;
  1740	
  1741			l = mem_cgroup_lruvec(memcg, NODE_DATA(nids[i]));
  1742			partial = (PAGE_SIZE * charges[i]) / size;
  1743			mod_memcg_lruvec_state(l, pool->compressed_stat, charges[i]);
  1744			mod_memcg_lruvec_state(l, pool->uncompressed_stat, partial);
  1745		}
  1746		rcu_read_unlock();
  1747	
  1748		dec_node_page_state(zpdesc_page(s_zpdesc), pool->uncompressed_stat);
  1749		inc_node_page_state(zpdesc_page(d_zpdesc), pool->uncompressed_stat);
  1750	
  1751	out:
  1752		d_zspage->objcgs[d_idx] = s_zspage->objcgs[s_idx];
  1753		s_zspage->objcgs[s_idx] = NULL;
  1754	}
  1755	#else
  1756	static void zs_migrate_objcg(struct zspage *s_zspage, struct zspage *d_zspage,
  1757				     unsigned long used_obj, unsigned long free_obj,
  1758				     struct zs_pool *pool, int size)
  1759	{
  1760	}
  1761	#endif
  1762	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/11] mm/zsmalloc: Handle single object charge migration in migrate_zspage
  2026-03-11 19:51 ` [PATCH 10/11] mm/zsmalloc: Handle single object charge migration in migrate_zspage Joshua Hahn
  2026-03-12  3:51   ` kernel test robot
@ 2026-03-12  3:51   ` kernel test robot
  2026-03-12 16:56     ` Joshua Hahn
  1 sibling, 1 reply; 33+ messages in thread
From: kernel test robot @ 2026-03-12  3:51 UTC (permalink / raw)
  To: Joshua Hahn, Minchan Kim, Sergey Senozhatsky
  Cc: llvm, oe-kbuild-all, Johannes Weiner, Yosry Ahmed, Nhat Pham,
	Harry Yoo, Andrew Morton, Linux Memory Management List,
	linux-kernel, kernel-team

Hi Joshua,

kernel test robot noticed the following build warnings:

[auto build test WARNING on axboe/for-next]
[also build test WARNING on linus/master v7.0-rc3]
[cannot apply to akpm-mm/mm-everything next-20260311]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Joshua-Hahn/mm-zsmalloc-Rename-zs_object_copy-to-zs_obj_copy/20260312-035531
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git for-next
patch link:    https://lore.kernel.org/r/20260311195153.4013476-11-joshua.hahnjy%40gmail.com
patch subject: [PATCH 10/11] mm/zsmalloc: Handle single object charge migration in migrate_zspage
config: x86_64-randconfig-001-20260312 (https://download.01.org/0day-ci/archive/20260312/202603121158.g93vlc2U-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260312/202603121158.g93vlc2U-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603121158.g93vlc2U-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> mm/zsmalloc.c:1702:6: warning: variable 's_idx' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
    1702 |         if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
         |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/zsmalloc.c:1752:45: note: uninitialized use occurs here
    1752 |         d_zspage->objcgs[d_idx] = s_zspage->objcgs[s_idx];
         |                                                    ^~~~~
   mm/zsmalloc.c:1702:2: note: remove the 'if' if its condition is always false
    1702 |         if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1703 |                 goto out;
         |                 ~~~~~~~~
   mm/zsmalloc.c:1696:20: note: initialize the variable 's_idx' to silence this warning
    1696 |         unsigned int s_idx, d_idx;
         |                           ^
         |                            = 0
>> mm/zsmalloc.c:1702:6: warning: variable 'd_idx' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
    1702 |         if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
         |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/zsmalloc.c:1752:19: note: uninitialized use occurs here
    1752 |         d_zspage->objcgs[d_idx] = s_zspage->objcgs[s_idx];
         |                          ^~~~~
   mm/zsmalloc.c:1702:2: note: remove the 'if' if its condition is always false
    1702 |         if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1703 |                 goto out;
         |                 ~~~~~~~~
   mm/zsmalloc.c:1696:27: note: initialize the variable 'd_idx' to silence this warning
    1696 |         unsigned int s_idx, d_idx;
         |                                  ^
         |                                   = 0
   2 warnings generated.


vim +1702 mm/zsmalloc.c

  1686	
  1687	#ifdef CONFIG_MEMCG
  1688	static void zs_migrate_objcg(struct zspage *s_zspage, struct zspage *d_zspage,
  1689				     unsigned long used_obj, unsigned long free_obj,
  1690				     struct zs_pool *pool, int size)
  1691	{
  1692		struct zpdesc *s_zpdesc, *d_zpdesc;
  1693		struct obj_cgroup *objcg;
  1694		struct mem_cgroup *memcg;
  1695		struct lruvec *l;
  1696		unsigned int s_idx, d_idx;
  1697		unsigned int s_off, d_off;
  1698		int charges[4], nids[4], partial;
  1699		int s_bytes_in_page, d_bytes_in_page;
  1700		int i;
  1701	
> 1702		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
  1703			goto out;
  1704	
  1705		obj_to_location(used_obj, &s_zpdesc, &s_idx);
  1706		obj_to_location(free_obj, &d_zpdesc, &d_idx);
  1707	
  1708		objcg = s_zspage->objcgs[s_idx];
  1709		if (!objcg)
  1710			goto out;
  1711	
  1712		/*
  1713		 * The object migration here can touch up to 4 nodes.
  1714		 * Instead of breaking down all possible combinations of node changes,
  1715		 * just uncharge entirely from the source and charge entirely to the
  1716		 * destination, even if there are node overlaps between src and dst.
  1717		 */
  1718		s_off = (s_idx * size) % PAGE_SIZE;
  1719		d_off = (d_idx * size) % PAGE_SIZE;
  1720		s_bytes_in_page = min_t(int, size, PAGE_SIZE - s_off);
  1721		d_bytes_in_page = min_t(int, size, PAGE_SIZE - d_off);
  1722	
  1723		charges[0] = -s_bytes_in_page;
  1724		nids[0] = page_to_nid(zpdesc_page(s_zpdesc));
  1725		charges[1] = -(size - s_bytes_in_page); /* 0 if object doesn't span */
  1726		if (charges[1])
  1727			nids[1] = page_to_nid(zpdesc_page(get_next_zpdesc(s_zpdesc)));
  1728	
  1729		charges[2] = d_bytes_in_page;
  1730		nids[2] = page_to_nid(zpdesc_page(d_zpdesc));
  1731		charges[3] = size - d_bytes_in_page; /* 0 if object doesn't span */
  1732		if (charges[3])
  1733			nids[3] = page_to_nid(zpdesc_page(get_next_zpdesc(d_zpdesc)));
  1734	
  1735		rcu_read_lock();
  1736		memcg = obj_cgroup_memcg(objcg);
  1737		for (i = 0; i < 4; i++) {
  1738			if (!charges[i])
  1739				continue;
  1740	
  1741			l = mem_cgroup_lruvec(memcg, NODE_DATA(nids[i]));
  1742			partial = (PAGE_SIZE * charges[i]) / size;
  1743			mod_memcg_lruvec_state(l, pool->compressed_stat, charges[i]);
  1744			mod_memcg_lruvec_state(l, pool->uncompressed_stat, partial);
  1745		}
  1746		rcu_read_unlock();
  1747	
  1748		dec_node_page_state(zpdesc_page(s_zpdesc), pool->uncompressed_stat);
  1749		inc_node_page_state(zpdesc_page(d_zpdesc), pool->uncompressed_stat);
  1750	
  1751	out:
  1752		d_zspage->objcgs[d_idx] = s_zspage->objcgs[s_idx];
  1753		s_zspage->objcgs[s_idx] = NULL;
  1754	}
  1755	#else
  1756	static void zs_migrate_objcg(struct zspage *s_zspage, struct zspage *d_zspage,
  1757				     unsigned long used_obj, unsigned long free_obj,
  1758				     struct zs_pool *pool, int size)
  1759	{
  1760	}
  1761	#endif
  1762	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/11] mm/zsmalloc: Handle single object charge migration in migrate_zspage
  2026-03-12  3:51   ` kernel test robot
@ 2026-03-12 16:56     ` Joshua Hahn
  0 siblings, 0 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-12 16:56 UTC (permalink / raw)
  To: kernel test robot
  Cc: Minchan Kim, Sergey Senozhatsky, llvm, oe-kbuild-all,
	Johannes Weiner, Yosry Ahmed, Nhat Pham, Harry Yoo, Andrew Morton,
	Linux Memory Management List, linux-kernel, kernel-team

On Thu, 12 Mar 2026 11:51:59 +0800 kernel test robot <lkp@intel.com> wrote:

> Hi Joshua,
> 
> kernel test robot noticed the following build warnings:
> 
> [auto build test WARNING on axboe/for-next]
> [also build test WARNING on linus/master v7.0-rc3]
> [cannot apply to akpm-mm/mm-everything next-20260311]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Joshua-Hahn/mm-zsmalloc-Rename-zs_object_copy-to-zs_obj_copy/20260312-035531
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git for-next
> patch link:    https://lore.kernel.org/r/20260311195153.4013476-11-joshua.hahnjy%40gmail.com
> patch subject: [PATCH 10/11] mm/zsmalloc: Handle single object charge migration in migrate_zspage
> config: x86_64-randconfig-001-20260312 (https://download.01.org/0day-ci/archive/20260312/202603121158.g93vlc2U-lkp@intel.com/config)
> compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260312/202603121158.g93vlc2U-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202603121158.g93vlc2U-lkp@intel.com/
> 
> All warnings (new ones prefixed by >>):
> 
> >> mm/zsmalloc.c:1702:6: warning: variable 's_idx' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
>     1702 |         if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
>          |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    mm/zsmalloc.c:1752:45: note: uninitialized use occurs here
>     1752 |         d_zspage->objcgs[d_idx] = s_zspage->objcgs[s_idx];
>          |                                                    ^~~~~
>    mm/zsmalloc.c:1702:2: note: remove the 'if' if its condition is always false
>     1702 |         if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
>          |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     1703 |                 goto out;
>          |                 ~~~~~~~~
>    mm/zsmalloc.c:1696:20: note: initialize the variable 's_idx' to silence this warning
>     1696 |         unsigned int s_idx, d_idx;
>          |                           ^
>          |                            = 0
> >> mm/zsmalloc.c:1702:6: warning: variable 'd_idx' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
>     1702 |         if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
>          |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    mm/zsmalloc.c:1752:19: note: uninitialized use occurs here
>     1752 |         d_zspage->objcgs[d_idx] = s_zspage->objcgs[s_idx];
>          |                          ^~~~~
>    mm/zsmalloc.c:1702:2: note: remove the 'if' if its condition is always false
>     1702 |         if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
>          |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     1703 |                 goto out;
>          |                 ~~~~~~~~
>    mm/zsmalloc.c:1696:27: note: initialize the variable 'd_idx' to silence this warning
>     1696 |         unsigned int s_idx, d_idx;
>          |                                  ^
>          |                                   = 0
>    2 warnings generated.

Hello kernel test robot,

Thank you for catching this issue! Yes, the MEMCG v1 check should be
done after I use obj_to_location to initialize the indices, so that
the objcg pointer swap works at the end.

Will make the change in the next version!
Joshua


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/11] mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc
  2026-03-11 19:51 ` [PATCH 07/11] mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc Joshua Hahn
@ 2026-03-12 21:42   ` Johannes Weiner
  2026-03-13 15:34     ` Joshua Hahn
  0 siblings, 1 reply; 33+ messages in thread
From: Johannes Weiner @ 2026-03-12 21:42 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Minchan Kim, Sergey Senozhatsky, Yosry Ahmed, Nhat Pham,
	Nhat Pham, Chengming Zhou, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Muchun Song, Andrew Morton, cgroups, linux-mm,
	linux-kernel, kernel-team

On Wed, Mar 11, 2026 at 12:51:44PM -0700, Joshua Hahn wrote:
> @@ -1244,6 +1297,8 @@ void zs_obj_write(struct zs_pool *pool, unsigned long handle,
>  	if (objcg) {
>  		WARN_ON_ONCE(!pool->memcg_aware);
>  		zspage->objcgs[obj_idx] = objcg;
> +		obj_cgroup_get(objcg);
> +		zs_charge_objcg(pool, objcg, class->size);
>  	}
>  
>  	if (!ZsHugePage(zspage))

Note that the obj_cgroup_get() reference is for the pointer, not the
charge. I think it all comes out right in the end, but it's a bit
confusing to follow and verify through the series.

IOW, it's better to move that obj_cgroup_get() to where you add and
store zspage->objcgs[]. If zswap still has a reference at that point in
the series, then it's fine for there to be two separate
obj_cgroup_get()s as well, with later patches deleting the zswap one
when its entry->objcg pointer disappears.

> @@ -711,10 +711,6 @@ static void zswap_entry_free(struct zswap_entry *entry)
>  	zswap_lru_del(&zswap_list_lru, entry, objcg);
>  	zs_free(entry->pool->zs_pool, entry->handle);
>  	zswap_pool_put(entry->pool);
> -	if (objcg) {
> -		obj_cgroup_uncharge_zswap(objcg, entry->length);
> -		obj_cgroup_put(objcg);
> -	}
>  	if (entry->length == PAGE_SIZE)
>  		atomic_long_dec(&zswap_stored_incompressible_pages);
>  	zswap_entry_cache_free(entry);

[ I can see that this was misleading. It was really getting a
  reference for the entry->objcg = objcg a few lines down, hitching a
  ride on that existing `if (objcg)`. ]


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/11] mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc
  2026-03-12 21:42   ` Johannes Weiner
@ 2026-03-13 15:34     ` Joshua Hahn
  2026-03-13 16:49       ` Johannes Weiner
  0 siblings, 1 reply; 33+ messages in thread
From: Joshua Hahn @ 2026-03-13 15:34 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Minchan Kim, Sergey Senozhatsky, Yosry Ahmed, Nhat Pham,
	Nhat Pham, Chengming Zhou, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Muchun Song, Andrew Morton, cgroups, linux-mm,
	linux-kernel, kernel-team

On Thu, 12 Mar 2026 17:42:01 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Wed, Mar 11, 2026 at 12:51:44PM -0700, Joshua Hahn wrote:
> > @@ -1244,6 +1297,8 @@ void zs_obj_write(struct zs_pool *pool, unsigned long handle,
> >  	if (objcg) {
> >  		WARN_ON_ONCE(!pool->memcg_aware);
> >  		zspage->objcgs[obj_idx] = objcg;
> > +		obj_cgroup_get(objcg);
> > +		zs_charge_objcg(pool, objcg, class->size);
> >  	}
> >  
> >  	if (!ZsHugePage(zspage))

Hello Johannes, thank you for your review!

> Note that the obj_cgroup_get() reference is for the pointer, not the
> charge. I think it all comes out right in the end, but it's a bit
> confusing to follow and verify through the series.

Thank you for pointing that out. I'll try to make it more explicit via
the placement.

> IOW, it's better to move that obj_cgroup_get() to where you add and
> store zspage->objcgs[]. If zswap still has a reference at that point
> in the series, then it's fine for there to be two separate
> obj_cgroup_get() calls as well, with later patches deleting the zswap
> one when its entry->objcg pointer disappears.

Sounds good to me. For the code block above, maybe I can just move it
one line up so that it happens before the zspage->objcgs[] assignment,
making it more obvious that it's associated with setting the objcg
pointer and not with the charge?

And for the freeing path, putting the obj_cgroup_put() after we set the
pointer to NULL could be more obvious?

> > @@ -711,10 +711,6 @@ static void zswap_entry_free(struct zswap_entry *entry)
> >  	zswap_lru_del(&zswap_list_lru, entry, objcg);
> >  	zs_free(entry->pool->zs_pool, entry->handle);
> >  	zswap_pool_put(entry->pool);
> > -	if (objcg) {
> > -		obj_cgroup_uncharge_zswap(objcg, entry->length);
> > -		obj_cgroup_put(objcg);
> > -	}
> >  	if (entry->length == PAGE_SIZE)
> >  		atomic_long_dec(&zswap_stored_incompressible_pages);
> >  	zswap_entry_cache_free(entry);
> 
> [ I can see that this was misleading. It was really getting a
>   reference for the entry->objcg = objcg a few lines down, hitching a
>   ride on that existing `if (objcg)`. ]

Thank you for the clarification! I hope you have a great day :-)
Joshua


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/11] mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc
  2026-03-13 15:34     ` Joshua Hahn
@ 2026-03-13 16:49       ` Johannes Weiner
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Weiner @ 2026-03-13 16:49 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Minchan Kim, Sergey Senozhatsky, Yosry Ahmed, Nhat Pham,
	Nhat Pham, Chengming Zhou, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Muchun Song, Andrew Morton, cgroups, linux-mm,
	linux-kernel, kernel-team

On Fri, Mar 13, 2026 at 08:34:33AM -0700, Joshua Hahn wrote:
> > IOW, it's better to move that obj_cgroup_get() to where you add and
> > store zspage->objcgs[]. If zswap still has a reference at that point
> > in the series, then it's fine for there to be two separate
> > obj_cgroup_get() calls as well, with later patches deleting the zswap
> > one when its entry->objcg pointer disappears.
> 
> Sounds good to me. For the code block above, maybe I can just move it
> one line up so that it happens before the zspage->objcgs[] assignment,
> making it more obvious that it's associated with setting the objcg
> pointer and not with the charge?
> 
> And for the freeing path, putting the obj_cgroup_put() after we set the
> pointer to NULL could be more obvious?

That makes sense to me!

Thanks


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 08/11] mm/memcontrol: Track MEMCG_ZSWAPPED in bytes
  2026-03-11 20:33   ` Nhat Pham
@ 2026-03-17 19:13     ` Joshua Hahn
  0 siblings, 0 replies; 33+ messages in thread
From: Joshua Hahn @ 2026-03-17 19:13 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Minchan Kim, Sergey Senozhatsky, Johannes Weiner, Yosry Ahmed,
	Nhat Pham, Chengming Zhou, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Muchun Song, Andrew Morton, cgroups, linux-mm,
	linux-kernel, kernel-team

On Wed, 11 Mar 2026 13:33:34 -0700 Nhat Pham <nphamcs@gmail.com> wrote:

> On Wed, Mar 11, 2026 at 12:52 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> >
> > Zswap compresses and uncompresses in PAGE_SIZE units, which simplifies
> > the accounting for how much memory it has compressed. However, when a
> > compressed object is stored at the boundary of two zspages, accounting
> > at a PAGE_SIZE granularity makes it difficult to fractionally charge
> > each backing zspage with the ratio of memory it backs for the
> > compressed object.
> >
> > To make sub-PAGE_SIZE granularity charging possible for MEMCG_ZSWAPPED,
> > track the value in bytes and adjust its accounting accordingly.
> >
> > No functional changes intended.
> >
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> 
> LGTM.
> Reviewed-by: Nhat Pham <nphamcs@gmail.com>

[...snip...]

> > @@ -1066,7 +1066,7 @@ static void zs_uncharge_objcg(struct zs_pool *pool, struct obj_cgroup *objcg,
> >         rcu_read_lock();
> >         memcg = obj_cgroup_memcg(objcg);
> >         mod_memcg_state(memcg, pool->compressed_stat, -size);
> > -       mod_memcg_state(memcg, pool->uncompressed_stat, -1);
> > +       mod_memcg_state(memcg, pool->uncompressed_stat, -(int)PAGE_SIZE);
> 
> nit: seems a bit awkward lol?

Hello Nhat,

I totally just saw the Reviewed-by and moved on and didn't see this nit
here :p sorry!!

But yeah, I agree that it looks very awkward. AFAICT there's no signed
version of PAGE_SIZE or a negative PAGE_SIZE definition, so
unfortunately this cast is needed :-(

mm/zsmalloc.c: In function ‘zs_uncharge_objcg’:
mm/zsmalloc.c:1068:66: warning: overflow in conversion from ‘long unsigned int’ to ‘int’ changes value from ‘18446744073709547520’ to ‘-4096’ [-Woverflow]
 1068 |         mod_memcg_state(memcg, pool->memcg_params->uncompressed, -PAGE_SIZE);
      |                                                                  ^~~~~~~~~~

I will note that this is a temporary cast, we immediately remove this line
in the next patch. I did this because I wanted to show a natural transition
from MEMCG_ZSWAPPED --> MEMCG_ZSWAPPED_B --> NR_ZSWAPPED_B and thought it
would be easier to review, but this does leave some intermediary changes in
this patch that are removed right away. If you would prefer that I squash
this commit and the next into a single patch so that there is less
intermediate code, I would be happy to do that instead!

I hope you have a great day!
Joshua


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2026-03-17 19:13 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
2026-03-11 19:51 ` [PATCH 01/11] mm/zsmalloc: Rename zs_object_copy to zs_obj_copy Joshua Hahn
2026-03-11 19:56   ` Yosry Ahmed
2026-03-11 20:00   ` Nhat Pham
2026-03-11 19:51 ` [PATCH 02/11] mm/zsmalloc: Make all obj_idx unsigned ints Joshua Hahn
2026-03-11 19:58   ` Yosry Ahmed
2026-03-11 20:01   ` Nhat Pham
2026-03-11 19:51 ` [PATCH 03/11] mm/zsmalloc: Introduce conditional memcg awareness to zs_pool Joshua Hahn
2026-03-11 20:12   ` Nhat Pham
2026-03-11 20:16   ` Johannes Weiner
2026-03-11 20:19     ` Yosry Ahmed
2026-03-11 20:20     ` Joshua Hahn
2026-03-11 19:51 ` [PATCH 04/11] mm/zsmalloc: Introduce objcgs pointer in struct zspage Joshua Hahn
2026-03-11 20:17   ` Nhat Pham
2026-03-11 20:22     ` Joshua Hahn
2026-03-11 19:51 ` [PATCH 05/11] mm/zsmalloc: Store obj_cgroup pointer in zspage Joshua Hahn
2026-03-11 20:17   ` Yosry Ahmed
2026-03-11 20:24     ` Joshua Hahn
2026-03-11 19:51 ` [PATCH 06/11] mm/zsmalloc, zswap: Redirect zswap_entry->objcg to zspage Joshua Hahn
2026-03-11 19:51 ` [PATCH 07/11] mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc Joshua Hahn
2026-03-12 21:42   ` Johannes Weiner
2026-03-13 15:34     ` Joshua Hahn
2026-03-13 16:49       ` Johannes Weiner
2026-03-11 19:51 ` [PATCH 08/11] mm/memcontrol: Track MEMCG_ZSWAPPED in bytes Joshua Hahn
2026-03-11 20:33   ` Nhat Pham
2026-03-17 19:13     ` Joshua Hahn
2026-03-11 19:51 ` [PATCH 09/11] mm/vmstat, memcontrol: Track ZSWAP_B, ZSWAPPED_B per-memcg-lruvec Joshua Hahn
2026-03-11 19:51 ` [PATCH 10/11] mm/zsmalloc: Handle single object charge migration in migrate_zspage Joshua Hahn
2026-03-12  3:51   ` kernel test robot
2026-03-12  3:51   ` kernel test robot
2026-03-12 16:56     ` Joshua Hahn
2026-03-11 19:51 ` [PATCH 11/11] mm/zsmalloc: Handle charge migration in zpdesc substitution Joshua Hahn
2026-03-11 19:54 ` [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox