* [PATCH v2 0/9] support ksm_stat showing at cgroup level
@ 2025-05-01 4:08 xu.xin16
2025-05-01 4:11 ` [PATCH v2 1/9] memcontrol: rename mem_cgroup_scan_tasks() xu.xin.sc
` (10 more replies)
0 siblings, 11 replies; 15+ messages in thread
From: xu.xin16 @ 2025-05-01 4:08 UTC (permalink / raw)
To: akpm
Cc: david, linux-kernel, wang.yaxin, linux-mm, linux-fsdevel,
yang.yang29, xu.xin16
From: xu xin <xu.xin16@zte.com.cn>
With the enablement of container-level KSM (e.g., via prctl [1]), there is
a growing demand for container-level observability of KSM behavior. However,
current cgroup implementations lack support for exposing KSM-related
metrics.
This patch introduces a new interface named ksm_stat
at the cgroup hierarchy level, enabling users to monitor KSM merging
statistics specifically for containers where this feature has been
activated, eliminating the need to manually inspect KSM information for
each individual process within the cgroup.
Users can obtain the KSM information of a cgroup just by:
# cat /sys/fs/cgroup/memory.ksm_stat
ksm_rmap_items 76800
ksm_zero_pages 0
ksm_merging_pages 76800
ksm_process_profit 309657600
Current implementation supports both cgroup v2 and cgroup v1.
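For reference, container-level KSM can be enabled from userspace with
prctl(PR_SET_MEMORY_MERGE) (available since v6.4); a minimal sketch, not part
of this series:

    /*
     * Minimal userspace sketch (not from this series) of enabling
     * process-wide KSM via prctl(); PR_SET_MEMORY_MERGE needs v6.4+.
     */
    #include <stdio.h>
    #include <sys/prctl.h>

    #ifndef PR_SET_MEMORY_MERGE
    #define PR_SET_MEMORY_MERGE 67
    #endif

    int main(void)
    {
            /* Opt all current and future VMAs of this process into KSM. */
            if (prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0)) {
                    perror("prctl(PR_SET_MEMORY_MERGE)");
                    return 1;
            }
            /* ... run the container workload; KSM now scans this process ... */
            return 0;
    }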
xu xin (9):
memcontrol: rename mem_cgroup_scan_tasks()
memcontrol: introduce the new mem_cgroup_scan_tasks()
memcontrol: introduce ksm_stat at memcg-v2
memcontrol: add ksm_zero_pages in cgroup/memory.ksm_stat
memcontrol: add ksm_merging_pages in cgroup/memory.ksm_stat
memcontrol: add ksm_profit in cgroup/memory.ksm_stat
memcontrol-v1: add ksm_stat at memcg-v1
Documentation: add ksm_stat description in cgroup-v1/memory.rst
Documentation: add ksm_stat description in cgroup-v2.rst
Documentation/admin-guide/cgroup-v1/memory.rst | 36 +++++++++++
Documentation/admin-guide/cgroup-v2.rst | 12 ++++
include/linux/memcontrol.h | 14 +++++
mm/memcontrol-v1.c | 6 ++
mm/memcontrol.c | 83 +++++++++++++++++++++++++-
mm/oom_kill.c | 6 +-
6 files changed, 152 insertions(+), 5 deletions(-)
--
2.15.2
* [PATCH v2 1/9] memcontrol: rename mem_cgroup_scan_tasks()
2025-05-01 4:08 [PATCH v2 0/9] support ksm_stat showing at cgroup level xu.xin16
@ 2025-05-01 4:11 ` xu.xin.sc
2025-05-01 4:13 ` [PATCH v2 2/9] memcontrol: introduce the new mem_cgroup_scan_tasks() xu.xin.sc
` (9 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: xu.xin.sc @ 2025-05-01 4:11 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, david, linux-fsdevel, linux-kernel, linux-mm, wang.yaxin,
yang.yang29
From: xu xin <xu.xin16@zte.com.cn>
Current Issue:
==============
The function mem_cgroup_scan_tasks() in memcontrol.c has a naming ambiguity.
While its name suggests it only iterates through tasks belonging to the
current memcg, it actually scans tasks in all descendant cgroups of the
subtree rooted at this memcg. This discrepancy can confuse developers who
rely on the semantic meaning of the function name.
Resolution:
===========
We have renamed the original function to mem_cgroup_tree_scan_tasks() to
explicitly reflect its subtree-traversal behavior.
A subsequent patch will introduce a new mem_cgroup_scan_tasks function that
strictly iterates processes only within the current memcgroup, aligning its
behavior with its name.
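For context, the body of the renamed function (unchanged by this patch) walks
every memcg in the subtree; sketched here from the upstream implementation, so
details may differ slightly from the tree this series is based on:

    void mem_cgroup_tree_scan_tasks(struct mem_cgroup *memcg,
                                    int (*fn)(struct task_struct *, void *), void *arg)
    {
            struct mem_cgroup *iter;
            int ret = 0;

            BUG_ON(mem_cgroup_is_root(memcg));

            /* visit the root memcg and every descendant in the subtree */
            for_each_mem_cgroup_tree(iter, memcg) {
                    struct css_task_iter it;
                    struct task_struct *task;

                    /* iterate the tasks attached to this particular memcg */
                    css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
                    while (!ret && (task = css_task_iter_next(&it)))
                            ret = fn(task, arg);
                    css_task_iter_end(&it);
                    if (ret) {
                            mem_cgroup_iter_break(memcg, iter);
                            break;
                    }
            }
    }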
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
include/linux/memcontrol.h | 4 ++--
mm/memcontrol.c | 4 ++--
mm/oom_kill.c | 6 +++---
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5264d148bdd9..1c1ce25fae4c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -795,7 +795,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *,
struct mem_cgroup *,
struct mem_cgroup_reclaim_cookie *);
void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
-void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
+void mem_cgroup_tree_scan_tasks(struct mem_cgroup *memcg,
int (*)(struct task_struct *, void *), void *arg);
static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
@@ -1289,7 +1289,7 @@ static inline void mem_cgroup_iter_break(struct mem_cgroup *root,
{
}
-static inline void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
+static inline void mem_cgroup_tree_scan_tasks(struct mem_cgroup *memcg,
int (*fn)(struct task_struct *, void *), void *arg)
{
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6bc6dade60d8..3baf0a4e0674 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1164,7 +1164,7 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
}
/**
- * mem_cgroup_scan_tasks - iterate over tasks of a memory cgroup hierarchy
+ * mem_cgroup_tree_scan_tasks - iterate over tasks of a memory cgroup hierarchy
* @memcg: hierarchy root
* @fn: function to call for each task
* @arg: argument passed to @fn
@@ -1176,7 +1176,7 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
*
* This function must not be called for the root memory cgroup.
*/
-void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
+void mem_cgroup_tree_scan_tasks(struct mem_cgroup *memcg,
int (*fn)(struct task_struct *, void *), void *arg)
{
struct mem_cgroup *iter;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..af3b8407fb08 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -367,7 +367,7 @@ static void select_bad_process(struct oom_control *oc)
oc->chosen_points = LONG_MIN;
if (is_memcg_oom(oc))
- mem_cgroup_scan_tasks(oc->memcg, oom_evaluate_task, oc);
+ mem_cgroup_tree_scan_tasks(oc->memcg, oom_evaluate_task, oc);
else {
struct task_struct *p;
@@ -428,7 +428,7 @@ static void dump_tasks(struct oom_control *oc)
pr_info("[ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name\n");
if (is_memcg_oom(oc))
- mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
+ mem_cgroup_tree_scan_tasks(oc->memcg, dump_task, oc);
else {
struct task_struct *p;
int i = 0;
@@ -1056,7 +1056,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
if (oom_group) {
memcg_memory_event(oom_group, MEMCG_OOM_GROUP_KILL);
mem_cgroup_print_oom_group(oom_group);
- mem_cgroup_scan_tasks(oom_group, oom_kill_memcg_member,
+ mem_cgroup_tree_scan_tasks(oom_group, oom_kill_memcg_member,
(void *)message);
mem_cgroup_put(oom_group);
}
--
2.15.2
* [PATCH v2 2/9] memcontrol: introduce the new mem_cgroup_scan_tasks()
2025-05-01 4:08 [PATCH v2 0/9] support ksm_stat showing at cgroup level xu.xin16
2025-05-01 4:11 ` [PATCH v2 1/9] memcontrol: rename mem_cgroup_scan_tasks() xu.xin.sc
@ 2025-05-01 4:13 ` xu.xin.sc
2025-05-01 4:14 ` [PATCH v2 3/9] memcontrol: introduce ksm_stat at memcg-v2 xu.xin.sc
` (8 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: xu.xin.sc @ 2025-05-01 4:13 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, david, linux-fsdevel, linux-kernel, linux-mm, wang.yaxin,
yang.yang29
From: xu xin <xu.xin16@zte.com.cn>
Introduce a new mem_cgroup_scan_tasks() that strictly iterates over
processes only within the given memcg itself, aligning its behavior with
its name.
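As a usage sketch (a hypothetical caller, not part of this series), a stats
collector can sum per-task data through the callback:

    /* Hypothetical callback: count the tasks attached directly to @memcg. */
    static int count_one_task(struct task_struct *task, void *arg)
    {
            unsigned long *nr = arg;

            (*nr)++;
            return 0;       /* a non-zero return would stop the iteration */
    }

    static unsigned long memcg_nr_tasks(struct mem_cgroup *memcg)
    {
            unsigned long nr = 0;

            mem_cgroup_scan_tasks(memcg, count_one_task, &nr);
            return nr;
    }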
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
include/linux/memcontrol.h | 7 +++++++
mm/memcontrol.c | 24 ++++++++++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1c1ce25fae4c..f9d663a7ccde 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -795,6 +795,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *,
struct mem_cgroup *,
struct mem_cgroup_reclaim_cookie *);
void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
+void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
+ int (*)(struct task_struct *, void *), void *arg);
void mem_cgroup_tree_scan_tasks(struct mem_cgroup *memcg,
int (*)(struct task_struct *, void *), void *arg);
@@ -1289,6 +1291,11 @@ static inline void mem_cgroup_iter_break(struct mem_cgroup *root,
{
}
+static inline void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
+ int (*fn)(struct task_struct *, void *), void *arg)
+{
+}
+
static inline void mem_cgroup_tree_scan_tasks(struct mem_cgroup *memcg,
int (*fn)(struct task_struct *, void *), void *arg)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3baf0a4e0674..629e2ce2d830 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1163,6 +1163,30 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
dead_memcg);
}
+/**
+ * mem_cgroup_scan_tasks - iterate over tasks of only this memory cgroup
+ * @memcg: the specified memory cgroup
+ * @fn: function to call for each task
+ * @arg: argument passed to @fn
+ *
+ * Unlike mem_cgroup_tree_scan_tasks(), this function iterates only over
+ * the tasks attached to @memcg itself, excluding those of its descendant
+ * memcgs. It may also be called for the root memory cgroup.
+ */
+void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
+ int (*fn)(struct task_struct *, void *), void *arg)
+{
+ int ret = 0;
+ struct css_task_iter it;
+ struct task_struct *task;
+
+ css_task_iter_start(&memcg->css, CSS_TASK_ITER_PROCS, &it);
+ while (!ret && (task = css_task_iter_next(&it)))
+ ret = fn(task, arg);
+
+ css_task_iter_end(&it);
+}
+
/**
* mem_cgroup_tree_scan_tasks - iterate over tasks of a memory cgroup hierarchy
* @memcg: hierarchy root
--
2.15.2
* [PATCH v2 3/9] memcontrol: introduce ksm_stat at memcg-v2
2025-05-01 4:08 [PATCH v2 0/9] support ksm_stat showing at cgroup level xu.xin16
2025-05-01 4:11 ` [PATCH v2 1/9] memcontrol: rename mem_cgroup_scan_tasks() xu.xin.sc
2025-05-01 4:13 ` [PATCH v2 2/9] memcontrol: introduce the new mem_cgroup_scan_tasks() xu.xin.sc
@ 2025-05-01 4:14 ` xu.xin.sc
2025-05-01 4:14 ` [PATCH v2 4/9] memcontrol: add ksm_zero_pages in cgroup/memory.ksm_stat xu.xin.sc
` (7 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: xu.xin.sc @ 2025-05-01 4:14 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, david, linux-fsdevel, linux-kernel, linux-mm, wang.yaxin,
yang.yang29, Haonan Chen
From: xu xin <xu.xin16@zte.com.cn>
With the enablement of container-level KSM (e.g., via prctl), there is a
growing demand for container-level observability of KSM behavior. However,
current cgroup implementations lack support for exposing KSM-related
metrics.
This patch introduces a new interface named ksm_stat
at the cgroup hierarchy level, enabling users to monitor KSM merging
statistics specifically for containers where this feature has been
activated, eliminating the need to manually inspect KSM information for
each individual process within the memcg (cgroup v2).
Users can obtain the KSM information of a cgroup just by:
`cat /sys/fs/cgroup/memory.ksm_stat`
Co-developed-by: Haonan Chen <chen.haonan2@zte.com.cn>
Signed-off-by: Haonan Chen <chen.haonan2@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
mm/memcontrol.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 629e2ce2d830..69521a66639b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4388,6 +4388,40 @@ static int memory_numa_stat_show(struct seq_file *m, void *v)
}
#endif
+#ifdef CONFIG_KSM
+struct memcg_ksm_stat {
+ unsigned long ksm_rmap_items;
+};
+
+static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
+{
+ struct mm_struct *mm;
+ struct memcg_ksm_stat *ksm_stat = arg;
+
+ mm = get_task_mm(task);
+ if (mm) {
+ ksm_stat->ksm_rmap_items += mm->ksm_rmap_items;
+ mmput(mm);
+ }
+
+ return 0;
+}
+
+static int memcg_ksm_stat_show(struct seq_file *m, void *v)
+{
+ struct memcg_ksm_stat ksm_stat;
+ struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+ /* Initialization */
+ ksm_stat.ksm_rmap_items = 0;
+ /* Sum the ksm statistic items of all processes in this memcg */
+ mem_cgroup_scan_tasks(memcg, evaluate_memcg_ksm_stat, &ksm_stat);
+ seq_printf(m, "ksm_rmap_items %lu\n", ksm_stat.ksm_rmap_items);
+
+ return 0;
+}
+#endif
+
static int memory_oom_group_show(struct seq_file *m, void *v)
{
struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
@@ -4554,6 +4588,12 @@ static struct cftype memory_files[] = {
.name = "numa_stat",
.seq_show = memory_numa_stat_show,
},
+#endif
+#ifdef CONFIG_KSM
+ {
+ .name = "ksm_stat",
+ .seq_show = memcg_ksm_stat_show,
+ },
#endif
{
.name = "oom.group",
--
2.15.2
* [PATCH v2 4/9] memcontrol: add ksm_zero_pages in cgroup/memory.ksm_stat
2025-05-01 4:08 [PATCH v2 0/9] support ksm_stat showing at cgroup level xu.xin16
` (2 preceding siblings ...)
2025-05-01 4:14 ` [PATCH v2 3/9] memcontrol: introduce ksm_stat at memcg-v2 xu.xin.sc
@ 2025-05-01 4:14 ` xu.xin.sc
2025-05-01 4:15 ` [PATCH v2 5/9] memcontrol: add ksm_merging_pages " xu.xin.sc
` (6 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: xu.xin.sc @ 2025-05-01 4:14 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, david, linux-fsdevel, linux-kernel, linux-mm, wang.yaxin,
yang.yang29
From: xu xin <xu.xin16@zte.com.cn>
Users can obtain ksm_zero_pages of a cgroup just by:
/ # cat /sys/fs/cgroup/memory.ksm_stat
ksm_rmap_items 76800
ksm_zero_pages 0
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
mm/memcontrol.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 69521a66639b..509098093bbd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -63,6 +63,7 @@
#include <linux/seq_buf.h>
#include <linux/sched/isolation.h>
#include <linux/kmemleak.h>
+#include <linux/ksm.h>
#include "internal.h"
#include <net/sock.h>
#include <net/ip.h>
@@ -4391,6 +4392,7 @@ static int memory_numa_stat_show(struct seq_file *m, void *v)
#ifdef CONFIG_KSM
struct memcg_ksm_stat {
unsigned long ksm_rmap_items;
+ long ksm_zero_pages;
};
static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
@@ -4401,6 +4403,7 @@ static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
mm = get_task_mm(task);
if (mm) {
ksm_stat->ksm_rmap_items += mm->ksm_rmap_items;
+ ksm_stat->ksm_zero_pages += mm_ksm_zero_pages(mm);
mmput(mm);
}
@@ -4414,9 +4417,12 @@ static int memcg_ksm_stat_show(struct seq_file *m, void *v)
/* Initialization */
ksm_stat.ksm_rmap_items = 0;
+ ksm_stat.ksm_zero_pages = 0;
+
/* Sum the ksm statistic items of all processes in this memcg */
mem_cgroup_scan_tasks(memcg, evaluate_memcg_ksm_stat, &ksm_stat);
seq_printf(m, "ksm_rmap_items %lu\n", ksm_stat.ksm_rmap_items);
+ seq_printf(m, "ksm_zero_pages %ld\n", ksm_stat.ksm_zero_pages);
return 0;
}
--
2.15.2
* [PATCH v2 5/9] memcontrol: add ksm_merging_pages in cgroup/memory.ksm_stat
2025-05-01 4:08 [PATCH v2 0/9] support ksm_stat showing at cgroup level xu.xin16
` (3 preceding siblings ...)
2025-05-01 4:14 ` [PATCH v2 4/9] memcontrol: add ksm_zero_pages in cgroup/memory.ksm_stat xu.xin.sc
@ 2025-05-01 4:15 ` xu.xin.sc
2025-05-01 4:15 ` [PATCH v2 6/9] memcontrol: add ksm_profit " xu.xin.sc
` (5 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: xu.xin.sc @ 2025-05-01 4:15 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, david, linux-fsdevel, linux-kernel, linux-mm, wang.yaxin,
yang.yang29
From: xu xin <xu.xin16@zte.com.cn>
Users can obtain ksm_merging_pages of a cgroup just by:
/ # cat /sys/fs/cgroup/memory.ksm_stat
ksm_rmap_items 76800
ksm_zero_pages 0
ksm_merging_pages 1092
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
mm/memcontrol.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 509098093bbd..9569d32944e3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4393,6 +4393,7 @@ static int memory_numa_stat_show(struct seq_file *m, void *v)
struct memcg_ksm_stat {
unsigned long ksm_rmap_items;
long ksm_zero_pages;
+ unsigned long ksm_merging_pages;
};
static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
@@ -4404,6 +4405,7 @@ static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
if (mm) {
ksm_stat->ksm_rmap_items += mm->ksm_rmap_items;
ksm_stat->ksm_zero_pages += mm_ksm_zero_pages(mm);
+ ksm_stat->ksm_merging_pages += mm->ksm_merging_pages;
mmput(mm);
}
@@ -4418,11 +4420,14 @@ static int memcg_ksm_stat_show(struct seq_file *m, void *v)
/* Initialization */
ksm_stat.ksm_rmap_items = 0;
ksm_stat.ksm_zero_pages = 0;
+ ksm_stat.ksm_merging_pages = 0;
/* Sum the ksm statistic items of all processes in this memcg */
mem_cgroup_scan_tasks(memcg, evaluate_memcg_ksm_stat, &ksm_stat);
+
seq_printf(m, "ksm_rmap_items %lu\n", ksm_stat.ksm_rmap_items);
seq_printf(m, "ksm_zero_pages %ld\n", ksm_stat.ksm_zero_pages);
+ seq_printf(m, "ksm_merging_pages %lu\n", ksm_stat.ksm_merging_pages);
return 0;
}
--
2.15.2
* [PATCH v2 6/9] memcontrol: add ksm_profit in cgroup/memory.ksm_stat
2025-05-01 4:08 [PATCH v2 0/9] support ksm_stat showing at cgroup level xu.xin16
` (4 preceding siblings ...)
2025-05-01 4:15 ` [PATCH v2 5/9] memcontrol: add ksm_merging_pages " xu.xin.sc
@ 2025-05-01 4:15 ` xu.xin.sc
2025-05-01 4:16 ` [PATCH v2 7/9] memcontrol-v1: add ksm_stat at memcg-v1 xu.xin.sc
` (4 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: xu.xin.sc @ 2025-05-01 4:15 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, david, linux-fsdevel, linux-kernel, linux-mm, wang.yaxin,
yang.yang29
From: xu xin <xu.xin16@zte.com.cn>
Users can obtain ksm_profit of a cgroup just by:
/ # cat /sys/fs/cgroup/memory.ksm_stat
ksm_rmap_items 76800
ksm_zero_pages 0
ksm_merging_pages 76800
ksm_profit 309657600
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
mm/memcontrol.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9569d32944e3..8ab21420ebb8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4394,6 +4394,7 @@ struct memcg_ksm_stat {
unsigned long ksm_rmap_items;
long ksm_zero_pages;
unsigned long ksm_merging_pages;
+ long ksm_profit;
};
static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
@@ -4406,6 +4407,7 @@ static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
ksm_stat->ksm_rmap_items += mm->ksm_rmap_items;
ksm_stat->ksm_zero_pages += mm_ksm_zero_pages(mm);
ksm_stat->ksm_merging_pages += mm->ksm_merging_pages;
+ ksm_stat->ksm_profit += ksm_process_profit(mm);
mmput(mm);
}
@@ -4421,6 +4423,7 @@ static int memcg_ksm_stat_show(struct seq_file *m, void *v)
ksm_stat.ksm_rmap_items = 0;
ksm_stat.ksm_zero_pages = 0;
ksm_stat.ksm_merging_pages = 0;
+ ksm_stat.ksm_profit = 0;
/* Sum the ksm statistic items of all processes in this memcg */
mem_cgroup_scan_tasks(memcg, evaluate_memcg_ksm_stat, &ksm_stat);
@@ -4428,6 +4431,7 @@ static int memcg_ksm_stat_show(struct seq_file *m, void *v)
seq_printf(m, "ksm_rmap_items %lu\n", ksm_stat.ksm_rmap_items);
seq_printf(m, "ksm_zero_pages %ld\n", ksm_stat.ksm_zero_pages);
seq_printf(m, "ksm_merging_pages %ld\n", ksm_stat.ksm_merging_pages);
+ seq_printf(m, "ksm_profit %ld\n", ksm_stat.ksm_profit);
return 0;
}
--
2.15.2
* [PATCH v2 7/9] memcontrol-v1: add ksm_stat at memcg-v1
2025-05-01 4:08 [PATCH v2 0/9] support ksm_stat showing at cgroup level xu.xin16
` (5 preceding siblings ...)
2025-05-01 4:15 ` [PATCH v2 6/9] memcontrol: add ksm_profit " xu.xin.sc
@ 2025-05-01 4:16 ` xu.xin.sc
2025-05-01 4:17 ` [PATCH v2 8/9] Documentation: add ksm_stat description in cgroup-v1/memory.rst xu.xin.sc
` (3 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: xu.xin.sc @ 2025-05-01 4:16 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, david, linux-fsdevel, linux-kernel, linux-mm, wang.yaxin,
yang.yang29
From: xu xin <xu.xin16@zte.com.cn>
With the enablement of container-level KSM (e.g., via prctl), there is a
growing demand for container-level observability of KSM behavior. However,
current cgroup implementations lack support for exposing KSM-related
metrics.
This patch introduces a new interface named ksm_stat
at the cgroup hierarchy level, enabling users to monitor KSM merging
statistics specifically for containers where this feature has been
activated, eliminating the need to manually inspect KSM information for
each individual process within the cgroup (memcg v1).
Since the implementation of ksm_stat has already been added for memcg-v2,
this patch simply hooks it up for memcg-v1 as well, reusing the same
function that traverses the processes of the memcg.
Users can obtain the KSM information of a cgroup just by:
`cat /sys/fs/cgroup/memory.ksm_stat`
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
include/linux/memcontrol.h | 7 +++++++
mm/memcontrol-v1.c | 6 ++++++
mm/memcontrol.c | 2 +-
3 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f9d663a7ccde..880ed3619f57 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -939,6 +939,8 @@ unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx);
unsigned long lruvec_page_state_local(struct lruvec *lruvec,
enum node_stat_item idx);
+int memcg_ksm_stat_show(struct seq_file *m, void *v);
+
void mem_cgroup_flush_stats(struct mem_cgroup *memcg);
void mem_cgroup_flush_stats_ratelimited(struct mem_cgroup *memcg);
@@ -1415,6 +1417,11 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
return node_page_state(lruvec_pgdat(lruvec), idx);
}
+static inline int memcg_ksm_stat_show(struct seq_file *m, void *v)
+{
+ return 0;
+}
+
static inline void mem_cgroup_flush_stats(struct mem_cgroup *memcg)
{
}
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 4a9cf27a70af..0891ae3dae78 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -2079,6 +2079,12 @@ struct cftype mem_cgroup_legacy_files[] = {
.name = "numa_stat",
.seq_show = memcg_numa_stat_show,
},
+#endif
+#ifdef CONFIG_KSM
+ {
+ .name = "ksm_stat",
+ .seq_show = memcg_ksm_stat_show,
+ },
#endif
{
.name = "kmem.limit_in_bytes",
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8ab21420ebb8..cf4e9d47bb40 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4414,7 +4414,7 @@ static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
return 0;
}
-static int memcg_ksm_stat_show(struct seq_file *m, void *v)
+int memcg_ksm_stat_show(struct seq_file *m, void *v)
{
struct memcg_ksm_stat ksm_stat;
struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
--
2.15.2
* [PATCH v2 8/9] Documentation: add ksm_stat description in cgroup-v1/memory.rst
2025-05-01 4:08 [PATCH v2 0/9] support ksm_stat showing at cgroup level xu.xin16
` (6 preceding siblings ...)
2025-05-01 4:16 ` [PATCH v2 7/9] memcontrol-v1: add ksm_stat at memcg-v1 xu.xin.sc
@ 2025-05-01 4:17 ` xu.xin.sc
2025-05-01 4:17 ` [PATCH v2 9/9] Documentation: add ksm_stat description in cgroup-v2.rst xu.xin.sc
` (2 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: xu.xin.sc @ 2025-05-01 4:17 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, david, linux-fsdevel, linux-kernel, linux-mm, wang.yaxin,
yang.yang29
From: xu xin <xu.xin16@zte.com.cn>
This adds the ksm_stat description to cgroup-v1/memory.rst.
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
Documentation/admin-guide/cgroup-v1/memory.rst | 36 ++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index d6b1db8cc7eb..a67e573c43d2 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -97,6 +97,8 @@ Brief summary of control files.
used.
memory.numa_stat show the number of memory usage per numa
node
+ memory.ksm_stat show statistics about various ksm
+ counters
memory.kmem.limit_in_bytes Deprecated knob to set and read the kernel
memory hard limit. Kernel hard limit is not
supported since 5.16. Writing any value to
@@ -674,6 +676,40 @@ The output format of memory.numa_stat is::
The "total" count is sum of file + anon + unevictable.
+.. _memcg_ksm_stat:
+
+5.7 ksm_stat
+------------
+
+When CONFIG_KSM is enabled, the ksm_stat file can be used to observe the ksm
+merging status of the processes within a memory cgroup.
+
+The output format of memory.ksm_stat is::
+
+ ksm_rmap_items <number>
+ ksm_zero_pages <number>
+ ksm_merging_pages <number>
+ ksm_profit <number>
+
+The "ksm_rmap_items" count specifies the number of ksm_rmap_item structures in
+use. The structureksm_rmap_item stores the reverse mapping information for
+virtual addresses. KSM will generate a ksm_rmap_item for each ksm-scanned page
+of the process.
+
+The "ksm_zero_pages" count specifies represent how many empty pages are merged
+with kernel zero pages by KSM, which is useful when /sys/kernel/mm/ksm/use_zero_pages
+is enabled.
+
+The "ksm_merging_pages" count specifies how many pages of this process are involved
+in KSM merging (not including ksm_zero_pages).
+
+The "ksm_process_profit" count specifies the profit that KSM brings (Saved bytes).
+KSM can save memory by merging identical pages, but also can consume additional
+memory, because it needs to generate a number of rmap_items to save each scanned
+page's brief rmap information. Some of these pages may be merged, but some may not
+be abled to be merged after being checked several times, which are unprofitable
+memory consumed.
+
6. Hierarchy support
====================
--
2.15.2
* [PATCH v2 9/9] Documentation: add ksm_stat description in cgroup-v2.rst
2025-05-01 4:08 [PATCH v2 0/9] support ksm_stat showing at cgroup level xu.xin16
` (7 preceding siblings ...)
2025-05-01 4:17 ` [PATCH v2 8/9] Documentation: add ksm_stat description in cgroup-v1/memory.rst xu.xin.sc
@ 2025-05-01 4:17 ` xu.xin.sc
2025-05-01 20:27 ` [PATCH v2 0/9] support ksm_stat showing at cgroup level Andrew Morton
2025-05-05 21:30 ` Shakeel Butt
10 siblings, 0 replies; 15+ messages in thread
From: xu.xin.sc @ 2025-05-01 4:17 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, david, linux-fsdevel, linux-kernel, linux-mm, wang.yaxin,
yang.yang29
From: xu xin <xu.xin16@zte.com.cn>
This adds the ksm_stat description to cgroup-v2.rst.
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
Documentation/admin-guide/cgroup-v2.rst | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 8fb14ffab7d1..acab4c9c6e25 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1718,6 +1718,18 @@ The following nested keys are defined.
The entries can refer to the memory.stat.
+ memory.ksm_stat
+ A read-only flat-keyed file.
+
+ The file can be used to observe the ksm merging status of the
+ processes within a memory cgroup. There are four items
+ including "ksm_rmap_items", "ksm_zero_pages", "ksm_merging_pages"
+ and "ksm_profit".
+
+ See
+ :ref:`Documentation/admin-guide/cgroup-v1/memory.rst ksm_stat <memcg_ksm_stat>`
+ for details.
+
memory.swap.current
A read-only single value file which exists on non-root
cgroups.
--
2.15.2
* Re: [PATCH v2 0/9] support ksm_stat showing at cgroup level
2025-05-01 4:08 [PATCH v2 0/9] support ksm_stat showing at cgroup level xu.xin16
` (8 preceding siblings ...)
2025-05-01 4:17 ` [PATCH v2 9/9] Documentation: add ksm_stat description in cgroup-v2.rst xu.xin.sc
@ 2025-05-01 20:27 ` Andrew Morton
2025-05-05 21:30 ` Shakeel Butt
10 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2025-05-01 20:27 UTC (permalink / raw)
To: xu.xin16
Cc: david, linux-kernel, wang.yaxin, linux-mm, linux-fsdevel,
yang.yang29, Johannes Weiner, Michal Hocko, Roman Gushchin,
Shakeel Butt, Muchun Song
On Thu, 1 May 2025 12:08:54 +0800 (CST) <xu.xin16@zte.com.cn> wrote:
> With the enablement of container-level KSM (e.g., via prctl [1]), there is
> a growing demand for container-level observability of KSM behavior. However,
> current cgroup implementations lack support for exposing KSM-related
> metrics.
>
> This patch introduces a new interface named ksm_stat
> at the cgroup hierarchy level, enabling users to monitor KSM merging
> statistics specifically for containers where this feature has been
> activated, eliminating the need to manually inspect KSM information for
> each individual process within the cgroup.
Well, you didn't cc any of the memcg maintainers!
The feature seems desirable and the implementation straightforward.
I'll add the patchset into mm.git for some testing, pending review
outcomes, thanks.
* Re: [PATCH v2 0/9] support ksm_stat showing at cgroup level
2025-05-01 4:08 [PATCH v2 0/9] support ksm_stat showing at cgroup level xu.xin16
` (9 preceding siblings ...)
2025-05-01 20:27 ` [PATCH v2 0/9] support ksm_stat showing at cgroup level Andrew Morton
@ 2025-05-05 21:30 ` Shakeel Butt
2025-05-06 5:09 ` xu.xin16
10 siblings, 1 reply; 15+ messages in thread
From: Shakeel Butt @ 2025-05-05 21:30 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, david, linux-kernel, wang.yaxin, linux-mm, linux-fsdevel,
yang.yang29
On Thu, May 01, 2025 at 12:08:54PM +0800, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
>
> With the enablement of container-level KSM (e.g., via prctl [1]), there is
> a growing demand for container-level observability of KSM behavior. However,
> current cgroup implementations lack support for exposing KSM-related
> metrics.
>
> This patch introduces a new interface named ksm_stat
> at the cgroup hierarchy level, enabling users to monitor KSM merging
> statistics specifically for containers where this feature has been
> activated, eliminating the need to manually inspect KSM information for
> each individual process within the cgroup.
>
> Users can obtain the KSM information of a cgroup just by:
>
> # cat /sys/fs/cgroup/memory.ksm_stat
> ksm_rmap_items 76800
> ksm_zero_pages 0
> ksm_merging_pages 76800
> ksm_process_profit 309657600
>
> Current implementation supports both cgroup v2 and cgroup v1.
>
Before adding these stats to memcg, add global stats for them in
enum node_stat_item and then you can expose them in memcg through
memory.stat instead of a new interface.
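For illustration, the suggested direction might look roughly like the sketch
below (hypothetical item names; a sketch only, not from this series or from
this mail):

    /* Hypothetical additions to enum node_stat_item (include/linux/mmzone.h): */
    NR_KSM_MERGING_PAGES,   /* pages deduplicated by KSM */
    NR_KSM_ZERO_PAGES,      /* empty pages merged with the zero page */

    /*
     * Then, wherever KSM merges or unmerges a page, account the change
     * against the folio's lruvec; memory.stat would then aggregate the
     * counter per-memcg through the existing stats machinery:
     */
    lruvec_stat_mod_folio(folio, NR_KSM_MERGING_PAGES, 1);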
* Re: [PATCH v2 0/9] support ksm_stat showing at cgroup level
2025-05-05 21:30 ` Shakeel Butt
@ 2025-05-06 5:09 ` xu.xin16
2025-05-08 18:56 ` Shakeel Butt
0 siblings, 1 reply; 15+ messages in thread
From: xu.xin16 @ 2025-05-06 5:09 UTC (permalink / raw)
To: shakeel.butt
Cc: akpm, david, linux-kernel, wang.yaxin, linux-mm, linux-fsdevel,
yang.yang29
> > Users can obtain the KSM information of a cgroup just by:
> >
> > # cat /sys/fs/cgroup/memory.ksm_stat
> > ksm_rmap_items 76800
> > ksm_zero_pages 0
> > ksm_merging_pages 76800
> > ksm_process_profit 309657600
> >
> > Current implementation supports both cgroup v2 and cgroup v1.
> >
>
> Before adding these stats to memcg, add global stats for them in
> enum node_stat_item and then you can expose them in memcg through
> memory.stat instead of a new interface.
Dear shakeel.butt,
Adding these ksm-related items to enum node_stat_item, and embedding extra
counter-updating code like __lruvec_stat_add_folio() into the KSM procedure,
would add extra CPU overhead whenever normal KSM operations happen.
Alternatively, we can just traverse all processes of
this memcg and sum their ksm counters, as the current patch set does.
Including only a single "KSM merged pages" entry in memory.stat seems reasonable,
as it reflects this memcg's KSM page count. However, adding the other three
KSM-related metrics is less advisable, since they are strongly coupled with KSM
internals and would primarily interest users monitoring KSM-specific behavior.
Last but not least, the rationale for adding a ksm_stat entry to memcg also lies in maintaining
structural consistency with the existing /proc/<pid>/ksm_stat interface.
* Re: [PATCH v2 0/9] support ksm_stat showing at cgroup level
2025-05-06 5:09 ` xu.xin16
@ 2025-05-08 18:56 ` Shakeel Butt
2025-06-02 14:14 ` xu.xin16
0 siblings, 1 reply; 15+ messages in thread
From: Shakeel Butt @ 2025-05-08 18:56 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, david, linux-kernel, wang.yaxin, linux-mm, linux-fsdevel,
yang.yang29
On Tue, May 06, 2025 at 01:09:25PM +0800, xu.xin16@zte.com.cn wrote:
> > > Users can obtain the KSM information of a cgroup just by:
> > >
> > > # cat /sys/fs/cgroup/memory.ksm_stat
> > > ksm_rmap_items 76800
> > > ksm_zero_pages 0
> > > ksm_merging_pages 76800
> > > ksm_process_profit 309657600
> > >
> > > Current implementation supports both cgroup v2 and cgroup v1.
> > >
> >
> > Before adding these stats to memcg, add global stats for them in
> > enum node_stat_item and then you can expose them in memcg through
> > memory.stat instead of a new interface.
>
> Dear shakeel.butt,
>
> Adding these ksm-related items to enum node_stat_item, and embedding extra
> counter-updating code like __lruvec_stat_add_folio() into the KSM procedure,
> would add extra CPU overhead whenever normal KSM operations happen.
How is it more expensive than traversing all processes?
__lruvec_stat_add_folio() and related functions are already called in many
performance critical code paths, so I don't see any issue with calling them in
ksm.
> Alternatively, we can just traverse all processes of
> this memcg and sum their ksm counters, as the current patch set does.
>
> Including only a single "KSM merged pages" entry in memory.stat seems reasonable,
> as it reflects this memcg's KSM page count. However, adding the other three
> KSM-related metrics is less advisable, since they are strongly coupled with KSM
> internals and would primarily interest users monitoring KSM-specific behavior.
We can discuss each individual ksm stat and decide whether it makes sense to
add it to memcg or not.
>
> Last but not least, the rationale for adding a ksm_stat entry to memcg also lies in maintaining
> structural consistency with the existing /proc/<pid>/ksm_stat interface.
Sorry, I don't agree with this rationale. This is a separate interface
and can be different from the existing ksm interface. We can define it
however we think is right for memcg, and yes, there can be stats overlap
with the older interface.
For now I would say start with the ksm metrics that are appropriate to
be exposed globally and then we can see if those are fine for memcg as
well.
* Re: [PATCH v2 0/9] support ksm_stat showing at cgroup level
2025-05-08 18:56 ` Shakeel Butt
@ 2025-06-02 14:14 ` xu.xin16
0 siblings, 0 replies; 15+ messages in thread
From: xu.xin16 @ 2025-06-02 14:14 UTC (permalink / raw)
To: shakeel.butt
Cc: akpm, david, linux-kernel, wang.yaxin, linux-mm, linux-fsdevel,
yang.yang29
> > > > Users can obtain the KSM information of a cgroup just by:
> > > >
> > > > # cat /sys/fs/cgroup/memory.ksm_stat
> > > > ksm_rmap_items 76800
> > > > ksm_zero_pages 0
> > > > ksm_merging_pages 76800
> > > > ksm_process_profit 309657600
> > > >
> > > > Current implementation supports both cgroup v2 and cgroup v1.
> > > >
> > >
> > > Before adding these stats to memcg, add global stats for them in
> > > enum node_stat_item and then you can expose them in memcg through
> > > memory.stat instead of a new interface.
> >
> > Dear shakeel.butt,
> >
> > Adding these ksm-related items to enum node_stat_item, and embedding extra
> > counter-updating code like __lruvec_stat_add_folio() into the KSM procedure,
> > would add extra CPU overhead whenever normal KSM operations happen.
>
> How is it more expensive than traversing all processes?
> __lruvec_stat_add_folio() and related functions are already called in many
> performance critical code paths, so I don't see any issue with calling them in
> ksm.
>
> > Alternatively, we can just traverse all processes of
> > this memcg and sum their ksm counters, as the current patch set does.
> >
> > Including only a single "KSM merged pages" entry in memory.stat seems reasonable,
> > as it reflects this memcg's KSM page count. However, adding the other three
> > KSM-related metrics is less advisable, since they are strongly coupled with KSM
> > internals and would primarily interest users monitoring KSM-specific behavior.
>
> We can discuss each individual ksm stat and decide whether it makes sense to
> add it to memcg or not.
>
> >
> > Last but not least, the rationale for adding a ksm_stat entry to memcg also lies in maintaining
> > structural consistency with the existing /proc/<pid>/ksm_stat interface.
>
> Sorry, I don't agree with this rationale. This is a separate interface
> and can be different from the existing ksm interface. We can define it
> however we think is right for memcg, and yes, there can be stats overlap
> with the older interface.
>
> For now I would say start with the ksm metrics that are appropriate to
> be exposed globally and then we can see if those are fine for memcg as
> well.
Thank you very much for your suggestion, and I'm sorry for the delayed reply;
last month I was exceptionally busy.
Upon further consideration, I agree that adding entries to the existing memory.stat
interface is indeed preferable to arbitrarily creating new interfaces. Therefore, my
next step is to plan adding the following global KSM metrics to memory.stat:
ksm_merged, ksm_unmergeable, ksm_zero, and ksm_profit. (These represent the
total amount of merged pages, unmergeable pages, zero pages merged by KSM, and
the overall profit, respectively.) However, please note that ksm_merging_pages and
ksm_unshared would need to be converted to be represented in bytes.
Thread overview: 15+ messages
2025-05-01 4:08 [PATCH v2 0/9] support ksm_stat showing at cgroup level xu.xin16
2025-05-01 4:11 ` [PATCH v2 1/9] memcontrol: rename mem_cgroup_scan_tasks() xu.xin.sc
2025-05-01 4:13 ` [PATCH v2 2/9] memcontrol: introduce the new mem_cgroup_scan_tasks() xu.xin.sc
2025-05-01 4:14 ` [PATCH v2 3/9] memcontrol: introduce ksm_stat at memcg-v2 xu.xin.sc
2025-05-01 4:14 ` [PATCH v2 4/9] memcontrol: add ksm_zero_pages in cgroup/memory.ksm_stat xu.xin.sc
2025-05-01 4:15 ` [PATCH v2 5/9] memcontrol: add ksm_merging_pages " xu.xin.sc
2025-05-01 4:15 ` [PATCH v2 6/9] memcontrol: add ksm_profit " xu.xin.sc
2025-05-01 4:16 ` [PATCH v2 7/9] memcontrol-v1: add ksm_stat at memcg-v1 xu.xin.sc
2025-05-01 4:17 ` [PATCH v2 8/9] Documentation: add ksm_stat description in cgroup-v1/memory.rst xu.xin.sc
2025-05-01 4:17 ` [PATCH v2 9/9] Documentation: add ksm_stat description in cgroup-v2.rst xu.xin.sc
2025-05-01 20:27 ` [PATCH v2 0/9] support ksm_stat showing at cgroup level Andrew Morton
2025-05-05 21:30 ` Shakeel Butt
2025-05-06 5:09 ` xu.xin16
2025-05-08 18:56 ` Shakeel Butt
2025-06-02 14:14 ` xu.xin16