* [PATCH 0/3] page_counter cleanup and size reduction
@ 2025-02-28 7:58 Shakeel Butt
From: Shakeel Butt @ 2025-02-28 7:58 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
linux-mm, cgroups, linux-kernel, Meta kernel team
Commit c6f53ed8f213a ("mm, memcg: cg2 memory{.swap,}.peak write
handlers") accidentally increased the size of struct page_counter.
This series rearranges the fields to reduce its size and also includes
some cleanups.
Shakeel Butt (3):
memcg: don't call propagate_protected_usage() for v1
page_counter: track failcnt only for legacy cgroups
page_counter: reduce struct page_counter size
include/linux/page_counter.h | 9 ++++++---
mm/hugetlb_cgroup.c | 31 ++++++++++++++-----------------
mm/memcontrol.c | 17 +++++++++++++----
mm/page_counter.c | 4 +++-
4 files changed, 36 insertions(+), 25 deletions(-)
--
2.43.5
* [PATCH 1/3] memcg: don't call propagate_protected_usage() for v1
From: Shakeel Butt @ 2025-02-28 7:58 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
linux-mm, cgroups, linux-kernel, Meta kernel team
Memcg-v1 does not support memory protection (min/low) and thus there is
no need to track protected memory usage for it.
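The gating relies on the protection_support flag that page_counter_init()
already stores; a minimal userspace sketch of the idea (simplified fields
and a stand-in charge function, not the kernel code) looks like:

```c
#include <stdbool.h>

/* Simplified stand-in for struct page_counter: field names follow the
 * kernel, but this is an illustrative sketch, not the real definition. */
struct page_counter {
	long usage;
	bool protection_support;	/* true only for v2 memory counters */
	struct page_counter *parent;
};

static int protection_updates;	/* counts propagate calls, for illustration */

static void propagate_protected_usage(struct page_counter *c, long usage)
{
	(void)c; (void)usage;
	protection_updates++;	/* the real code updates min/low usage here */
}

/* Charge path: the v1 case skips protection bookkeeping entirely. */
static void page_counter_charge_sketch(struct page_counter *counter, long n)
{
	bool protection = counter->protection_support;

	for (struct page_counter *c = counter; c; c = c->parent) {
		c->usage += n;
		if (protection)
			propagate_protected_usage(c, c->usage);
	}
}
```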
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 55b0e9482c00..36b2dfbc86c0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3601,6 +3601,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
{
struct mem_cgroup *parent = mem_cgroup_from_css(parent_css);
struct mem_cgroup *memcg, *old_memcg;
+ bool memcg_on_dfl = cgroup_subsys_on_dfl(memory_cgrp_subsys);
old_memcg = set_active_memcg(parent);
memcg = mem_cgroup_alloc(parent);
@@ -3618,7 +3619,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
if (parent) {
WRITE_ONCE(memcg->swappiness, mem_cgroup_swappiness(parent));
- page_counter_init(&memcg->memory, &parent->memory, true);
+ page_counter_init(&memcg->memory, &parent->memory, memcg_on_dfl);
page_counter_init(&memcg->swap, &parent->swap, false);
#ifdef CONFIG_MEMCG_V1
WRITE_ONCE(memcg->oom_kill_disable, READ_ONCE(parent->oom_kill_disable));
@@ -3638,7 +3639,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
return &memcg->css;
}
- if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
+ if (memcg_on_dfl && !cgroup_memory_nosocket)
static_branch_inc(&memcg_sockets_enabled_key);
if (!cgroup_memory_nobpf)
--
2.43.5
* [PATCH 2/3] page_counter: track failcnt only for legacy cgroups
From: Shakeel Butt @ 2025-02-28 7:58 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
linux-mm, cgroups, linux-kernel, Meta kernel team
Currently page_counter tracks failcnt for counters used by both v1 and
v2 controllers. However, failcnt is only exported for v1 deployments,
so there is no need to maintain it for v2. The OOM report does expose
failcnt for memory and swap in v2, but v2 already maintains the
MEMCG_MAX and MEMCG_SWAP_MAX event counters, which can be used instead.
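The effect on the charge slowpath can be sketched in userspace as
follows (a simplified, single-threaded stand-in for
page_counter_try_charge(); the real code uses atomics and data_race()):

```c
#include <stdbool.h>

/* Simplified stand-in for struct page_counter; not the kernel type. */
struct page_counter {
	long usage, max, failcnt;
	bool track_failcnt;	/* set only for v1 (legacy) hierarchies */
	struct page_counter *parent;
};

/* Returns true on success; on failure bumps failcnt only when the
 * counter was marked as a legacy (v1) counter. */
static bool try_charge_sketch(struct page_counter *counter, long n)
{
	bool track_failcnt = counter->track_failcnt;

	for (struct page_counter *c = counter; c; c = c->parent) {
		long new = c->usage + n;

		if (new > c->max) {
			if (track_failcnt)
				c->failcnt++;
			/* unwind the charges made so far */
			for (struct page_counter *u = counter; u != c; u = u->parent)
				u->usage -= n;
			return false;
		}
		c->usage = new;
	}
	return true;
}
```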
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
include/linux/page_counter.h | 4 +++-
mm/hugetlb_cgroup.c | 31 ++++++++++++++-----------------
mm/memcontrol.c | 12 ++++++++++--
mm/page_counter.c | 4 +++-
4 files changed, 30 insertions(+), 21 deletions(-)
diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index 46406f3fe34d..e4bd8fd427be 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -28,12 +28,13 @@ struct page_counter {
unsigned long watermark;
/* Latest cg2 reset watermark */
unsigned long local_watermark;
- unsigned long failcnt;
+ unsigned long failcnt; /* v1-only field */
/* Keep all the read most fields in a separete cacheline. */
CACHELINE_PADDING(_pad2_);
bool protection_support;
+ bool track_failcnt;
unsigned long min;
unsigned long low;
unsigned long high;
@@ -58,6 +59,7 @@ static inline void page_counter_init(struct page_counter *counter,
counter->max = PAGE_COUNTER_MAX;
counter->parent = parent;
counter->protection_support = protection_support;
+ counter->track_failcnt = false;
}
static inline unsigned long page_counter_read(struct page_counter *counter)
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index bb9578bd99f9..58e895f3899a 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -101,10 +101,9 @@ static void hugetlb_cgroup_init(struct hugetlb_cgroup *h_cgroup,
int idx;
for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
- struct page_counter *fault_parent = NULL;
- struct page_counter *rsvd_parent = NULL;
+ struct page_counter *fault, *fault_parent = NULL;
+ struct page_counter *rsvd, *rsvd_parent = NULL;
unsigned long limit;
- int ret;
if (parent_h_cgroup) {
fault_parent = hugetlb_cgroup_counter_from_cgroup(
@@ -112,24 +111,22 @@ static void hugetlb_cgroup_init(struct hugetlb_cgroup *h_cgroup,
rsvd_parent = hugetlb_cgroup_counter_from_cgroup_rsvd(
parent_h_cgroup, idx);
}
- page_counter_init(hugetlb_cgroup_counter_from_cgroup(h_cgroup,
- idx),
- fault_parent, false);
- page_counter_init(
- hugetlb_cgroup_counter_from_cgroup_rsvd(h_cgroup, idx),
- rsvd_parent, false);
+ fault = hugetlb_cgroup_counter_from_cgroup(h_cgroup, idx);
+ rsvd = hugetlb_cgroup_counter_from_cgroup_rsvd(h_cgroup, idx);
+
+ page_counter_init(fault, fault_parent, false);
+ page_counter_init(rsvd, rsvd_parent, false);
+
+ if (!cgroup_subsys_on_dfl(hugetlb_cgrp_subsys)) {
+ fault->track_failcnt = true;
+ rsvd->track_failcnt = true;
+ }
limit = round_down(PAGE_COUNTER_MAX,
pages_per_huge_page(&hstates[idx]));
- ret = page_counter_set_max(
- hugetlb_cgroup_counter_from_cgroup(h_cgroup, idx),
- limit);
- VM_BUG_ON(ret);
- ret = page_counter_set_max(
- hugetlb_cgroup_counter_from_cgroup_rsvd(h_cgroup, idx),
- limit);
- VM_BUG_ON(ret);
+ VM_BUG_ON(page_counter_set_max(fault, limit));
+ VM_BUG_ON(page_counter_set_max(rsvd, limit));
}
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 36b2dfbc86c0..030fadbd5bf2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1572,16 +1572,23 @@ void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
/* Use static buffer, for the caller is holding oom_lock. */
static char buf[SEQ_BUF_SIZE];
struct seq_buf s;
+ unsigned long memory_failcnt;
lockdep_assert_held(&oom_lock);
+ if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
+ memory_failcnt = atomic_long_read(&memcg->memory_events[MEMCG_MAX]);
+ else
+ memory_failcnt = memcg->memory.failcnt;
+
pr_info("memory: usage %llukB, limit %llukB, failcnt %lu\n",
K((u64)page_counter_read(&memcg->memory)),
- K((u64)READ_ONCE(memcg->memory.max)), memcg->memory.failcnt);
+ K((u64)READ_ONCE(memcg->memory.max)), memory_failcnt);
if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
pr_info("swap: usage %llukB, limit %llukB, failcnt %lu\n",
K((u64)page_counter_read(&memcg->swap)),
- K((u64)READ_ONCE(memcg->swap.max)), memcg->swap.failcnt);
+ K((u64)READ_ONCE(memcg->swap.max)),
+ atomic_long_read(&memcg->memory_events[MEMCG_SWAP_MAX]));
#ifdef CONFIG_MEMCG_V1
else {
pr_info("memory+swap: usage %llukB, limit %llukB, failcnt %lu\n",
@@ -3622,6 +3629,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
page_counter_init(&memcg->memory, &parent->memory, memcg_on_dfl);
page_counter_init(&memcg->swap, &parent->swap, false);
#ifdef CONFIG_MEMCG_V1
+ memcg->memory.track_failcnt = !memcg_on_dfl;
WRITE_ONCE(memcg->oom_kill_disable, READ_ONCE(parent->oom_kill_disable));
page_counter_init(&memcg->kmem, &parent->kmem, false);
page_counter_init(&memcg->tcpmem, &parent->tcpmem, false);
diff --git a/mm/page_counter.c b/mm/page_counter.c
index af23f927611b..661e0f2a5127 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -121,6 +121,7 @@ bool page_counter_try_charge(struct page_counter *counter,
{
struct page_counter *c;
bool protection = track_protection(counter);
+ bool track_failcnt = counter->track_failcnt;
for (c = counter; c; c = c->parent) {
long new;
@@ -146,7 +147,8 @@ bool page_counter_try_charge(struct page_counter *counter,
* inaccuracy in the failcnt which is only used
* to report stats.
*/
- data_race(c->failcnt++);
+ if (track_failcnt)
+ data_race(c->failcnt++);
*fail = c;
goto failed;
}
--
2.43.5
* [PATCH 3/3] page_counter: reduce struct page_counter size
From: Shakeel Butt @ 2025-02-28 7:58 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
linux-mm, cgroups, linux-kernel, Meta kernel team
The struct page_counter has explicit padding for better cache alignment.
Commit c6f53ed8f213a ("mm, memcg: cg2 memory{.swap,}.peak write
handlers") added a field to the struct and accidentally increased its
size. Let's move the failcnt field, which is v1-only, into the same
cacheline as usage to reduce the size of struct page_counter.
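The saving comes from the slack after usage in its dedicated cacheline.
A standalone GNU C sketch (hypothetical field names; a zero-length
aligned array stands in for the kernel's CACHELINE_PADDING() macro)
shows how moving an 8-byte field across the padding drops a whole
cacheline from the struct:

```c
#define CACHELINE 64
/* rough stand-in for the kernel's CACHELINE_PADDING() macro */
#define CL_PAD(name) char name[0] __attribute__((aligned(CACHELINE)))

/* failcnt after the padding: the middle section here is exactly full
 * (64 bytes of longs), so failcnt spills into a cacheline of its own.
 * With 64-byte cachelines this struct is 256 bytes. */
struct counter_before {
	long usage;		/* hot field, alone in cacheline 0 */
	CL_PAD(_pad1_);
	long mid[8];		/* 64 bytes: fills cacheline 1 */
	long failcnt;		/* starts cacheline 2 */
	CL_PAD(_pad2_);
	long read_mostly;	/* cacheline 3 */
};

/* failcnt moved next to usage: cacheline 0 had 56 spare bytes, so the
 * struct shrinks to 192 bytes. v2 never touches failcnt, so usage
 * still effectively has the cacheline to itself on v2. */
struct counter_after {
	long usage;
	long failcnt;		/* tucked into usage's cacheline */
	CL_PAD(_pad1_);
	long mid[8];
	CL_PAD(_pad2_);
	long read_mostly;
};
```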
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
include/linux/page_counter.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index e4bd8fd427be..d649b6bbbc87 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -9,10 +9,12 @@
struct page_counter {
/*
- * Make sure 'usage' does not share cacheline with any other field. The
- * memcg->memory.usage is a hot member of struct mem_cgroup.
+ * Make sure 'usage' does not share cacheline with any other field in
+ * v2. The memcg->memory.usage is a hot member of struct mem_cgroup.
*/
atomic_long_t usage;
+ unsigned long failcnt; /* v1-only field */
+
CACHELINE_PADDING(_pad1_);
/* effective memory.min and memory.min usage tracking */
@@ -28,7 +30,6 @@ struct page_counter {
unsigned long watermark;
/* Latest cg2 reset watermark */
unsigned long local_watermark;
- unsigned long failcnt; /* v1-only field */
/* Keep all the read most fields in a separete cacheline. */
CACHELINE_PADDING(_pad2_);
--
2.43.5