* [RFC PATCH 1/7] memcg: memcg_rstat_updated re-entrant safe against irqs
2025-05-13 3:13 [RFC PATCH 0/7] memcg: make memcg stats irq safe Shakeel Butt
@ 2025-05-13 3:13 ` Shakeel Butt
2025-05-13 10:22 ` Vlastimil Babka
2025-05-13 3:13 ` [RFC PATCH 2/7] memcg: move preempt disable to callers of memcg_rstat_updated Shakeel Butt
` (5 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Shakeel Butt @ 2025-05-13 3:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior,
Harry Yoo, Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team
The function memcg_rstat_updated() is used to track memcg stats updates
for optimizing the flushes. At the moment, it is not re-entrant safe and
the callers disable irqs before calling it. However, to achieve the goal
of updating memcg stats without disabling irqs, memcg_rstat_updated()
needs to be re-entrant safe against irqs.

This patch makes memcg_rstat_updated() re-entrant safe against irqs.
However, it now uses atomic_* ops, which on x86 add a lock prefix to the
instructions. Since this is per-cpu data, the this_cpu_* ops would be
preferred, but the percpu pointer is stored in struct mem_cgroup, and
the upward traversal through struct mem_cgroup may cause two cache
misses compared to traversing through the struct memcg_vmstats_percpu
pointer.

NOTE: explore if there is an atomic_* ops alternative without the lock
prefix.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6cfa3550f300..2c4c095bf26c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -503,7 +503,7 @@ static inline int memcg_events_index(enum vm_event_item idx)
struct memcg_vmstats_percpu {
/* Stats updates since the last flush */
- unsigned int stats_updates;
+ atomic_t stats_updates;
/* Cached pointers for fast iteration in memcg_rstat_updated() */
struct memcg_vmstats_percpu *parent;
@@ -590,12 +590,15 @@ static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
{
struct memcg_vmstats_percpu *statc;
- int cpu = smp_processor_id();
- unsigned int stats_updates;
+ int cpu;
+ int stats_updates;
if (!val)
return;
+ /* Don't assume callers have preemption disabled. */
+ cpu = get_cpu();
+
cgroup_rstat_updated(memcg->css.cgroup, cpu);
statc = this_cpu_ptr(memcg->vmstats_percpu);
for (; statc; statc = statc->parent) {
@@ -607,14 +610,16 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
if (memcg_vmstats_needs_flush(statc->vmstats))
break;
- stats_updates = READ_ONCE(statc->stats_updates) + abs(val);
- WRITE_ONCE(statc->stats_updates, stats_updates);
+ stats_updates = atomic_add_return(abs(val), &statc->stats_updates);
if (stats_updates < MEMCG_CHARGE_BATCH)
continue;
- atomic64_add(stats_updates, &statc->vmstats->stats_updates);
- WRITE_ONCE(statc->stats_updates, 0);
+ stats_updates = atomic_xchg(&statc->stats_updates, 0);
+ if (stats_updates)
+ atomic64_add(stats_updates,
+ &statc->vmstats->stats_updates);
}
+ put_cpu();
}
static void __mem_cgroup_flush_stats(struct mem_cgroup *memcg, bool force)
@@ -4155,7 +4160,7 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
mem_cgroup_stat_aggregate(&ac);
}
- WRITE_ONCE(statc->stats_updates, 0);
+ atomic_set(&statc->stats_updates, 0);
/* We are in a per-cpu loop here, only do the atomic write once */
if (atomic64_read(&memcg->vmstats->stats_updates))
atomic64_set(&memcg->vmstats->stats_updates, 0);
--
2.47.1
* Re: [RFC PATCH 1/7] memcg: memcg_rstat_updated re-entrant safe against irqs
2025-05-13 3:13 ` [RFC PATCH 1/7] memcg: memcg_rstat_updated re-entrant safe against irqs Shakeel Butt
@ 2025-05-13 10:22 ` Vlastimil Babka
2025-05-13 18:09 ` Shakeel Butt
0 siblings, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2025-05-13 10:22 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Alexei Starovoitov, Sebastian Andrzej Siewior, Harry Yoo,
Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team, Mathieu Desnoyers
On 5/13/25 05:13, Shakeel Butt wrote:
> The function memcg_rstat_updated() is used to track memcg stats updates
> for optimizing the flushes. At the moment, it is not re-entrant safe and
> the callers disable irqs before calling it. However, to achieve the goal
> of updating memcg stats without disabling irqs, memcg_rstat_updated()
> needs to be re-entrant safe against irqs.
>
> This patch makes memcg_rstat_updated() re-entrant safe against irqs.
> However, it now uses atomic_* ops, which on x86 add a lock prefix to the
> instructions. Since this is per-cpu data, the this_cpu_* ops would be
> preferred, but the percpu pointer is stored in struct mem_cgroup, and
> the upward traversal through struct mem_cgroup may cause two cache
> misses compared to traversing through the struct memcg_vmstats_percpu
> pointer.
>
> NOTE: explore if there is an atomic_* ops alternative without the lock
> prefix.
local_t might be what you want here
https://docs.kernel.org/core-api/local_ops.html
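A completely untested sketch of the local_t variant, just to illustrate
(field names taken from your patch):

#include <asm/local.h>

struct memcg_vmstats_percpu {
        /* Stats updates since the last flush */
        local_t stats_updates;
        ...
};

        /* in memcg_rstat_updated(), replacing the atomic_* ops */
        stats_updates = local_add_return(abs(val), &statc->stats_updates);
        if (stats_updates < MEMCG_CHARGE_BATCH)
                continue;

        stats_updates = local_xchg(&statc->stats_updates, 0);
        if (stats_updates)
                atomic64_add(stats_updates, &statc->vmstats->stats_updates);

(the flush side would then use local_set(&statc->stats_updates, 0))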
Or maybe just add __percpu to parent like this?
struct memcg_vmstats_percpu {
...
struct memcg_vmstats_percpu __percpu *parent;
...
}
Yes, it means each cpu's struct memcg_vmstats_percpu instance will actually
store the same value (the percpu offset) instead of the cpu-specific parent
pointer, which might seem wasteful. But AFAIK this_cpu_* is optimized enough,
thanks to the segment register usage, that it doesn't matter? It shouldn't
cause the extra cache miss you worry about, IIUC?
With that I think you could refactor that code to use e.g.
this_cpu_add_return() and this_cpu_xchg() on the stats_updates and obtain
the parent "pointer" in a way that's also compatible with these operations.
That is, unless we also want nmi safety, in which case we're back to the
issue of the previous series...
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
> mm/memcontrol.c | 21 +++++++++++++--------
> 1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6cfa3550f300..2c4c095bf26c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -503,7 +503,7 @@ static inline int memcg_events_index(enum vm_event_item idx)
>
> struct memcg_vmstats_percpu {
> /* Stats updates since the last flush */
> - unsigned int stats_updates;
> + atomic_t stats_updates;
>
> /* Cached pointers for fast iteration in memcg_rstat_updated() */
> struct memcg_vmstats_percpu *parent;
> @@ -590,12 +590,15 @@ static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
> static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
> {
> struct memcg_vmstats_percpu *statc;
> - int cpu = smp_processor_id();
> - unsigned int stats_updates;
> + int cpu;
> + int stats_updates;
>
> if (!val)
> return;
>
> + /* Don't assume callers have preemption disabled. */
> + cpu = get_cpu();
> +
> cgroup_rstat_updated(memcg->css.cgroup, cpu);
> statc = this_cpu_ptr(memcg->vmstats_percpu);
> for (; statc; statc = statc->parent) {
> @@ -607,14 +610,16 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
> if (memcg_vmstats_needs_flush(statc->vmstats))
> break;
>
> - stats_updates = READ_ONCE(statc->stats_updates) + abs(val);
> - WRITE_ONCE(statc->stats_updates, stats_updates);
> + stats_updates = atomic_add_return(abs(val), &statc->stats_updates);
> if (stats_updates < MEMCG_CHARGE_BATCH)
> continue;
>
> - atomic64_add(stats_updates, &statc->vmstats->stats_updates);
> - WRITE_ONCE(statc->stats_updates, 0);
> + stats_updates = atomic_xchg(&statc->stats_updates, 0);
> + if (stats_updates)
> + atomic64_add(stats_updates,
> + &statc->vmstats->stats_updates);
> }
> + put_cpu();
> }
>
> static void __mem_cgroup_flush_stats(struct mem_cgroup *memcg, bool force)
> @@ -4155,7 +4160,7 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
> mem_cgroup_stat_aggregate(&ac);
>
> }
> - WRITE_ONCE(statc->stats_updates, 0);
> + atomic_set(&statc->stats_updates, 0);
> /* We are in a per-cpu loop here, only do the atomic write once */
> if (atomic64_read(&memcg->vmstats->stats_updates))
> atomic64_set(&memcg->vmstats->stats_updates, 0);
* Re: [RFC PATCH 1/7] memcg: memcg_rstat_updated re-entrant safe against irqs
2025-05-13 10:22 ` Vlastimil Babka
@ 2025-05-13 18:09 ` Shakeel Butt
0 siblings, 0 replies; 16+ messages in thread
From: Shakeel Butt @ 2025-05-13 18:09 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Alexei Starovoitov, Sebastian Andrzej Siewior,
Harry Yoo, Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team, Mathieu Desnoyers
On Tue, May 13, 2025 at 12:22:28PM +0200, Vlastimil Babka wrote:
> On 5/13/25 05:13, Shakeel Butt wrote:
> > The function memcg_rstat_updated() is used to track memcg stats updates
> > for optimizing the flushes. At the moment, it is not re-entrant safe and
> > the callers disable irqs before calling it. However, to achieve the goal
> > of updating memcg stats without disabling irqs, memcg_rstat_updated()
> > needs to be re-entrant safe against irqs.
> >
> > This patch makes memcg_rstat_updated() re-entrant safe against irqs.
> > However, it now uses atomic_* ops, which on x86 add a lock prefix to the
> > instructions. Since this is per-cpu data, the this_cpu_* ops would be
> > preferred, but the percpu pointer is stored in struct mem_cgroup, and
> > the upward traversal through struct mem_cgroup may cause two cache
> > misses compared to traversing through the struct memcg_vmstats_percpu
> > pointer.
> >
> > NOTE: explore if there is an atomic_* ops alternative without the lock
> > prefix.
>
> local_t might be what you want here
> https://docs.kernel.org/core-api/local_ops.html
>
> Or maybe just add __percpu to parent like this?
>
> struct memcg_vmstats_percpu {
> ...
> struct memcg_vmstats_percpu __percpu *parent;
> ...
> }
>
> Yes, it means on each cpu's struct memcg_vmstats_percpu instance there will
> be actually the same value stored (the percpu offset) instead of the
> cpu-specific parent pointer, which might seem wasteful. But AFAIK this_cpu_*
> is optimized enough thanks to the segment register usage, that it doesn't
> matter? It shouldn't cause any extra cache miss you worry about, IIUC?
>
> With that I think you could refactor that code to use e.g.
> this_cpu_add_return() and this_cpu_xchg() on the stats_updates and obtain
> the parent "pointer" in a way that's also compatible with these operations.
>
Thanks, I will try both of these and see which one looks better.
> That is unless we want also nmi safety, then we're back to the issue of the
> previous series...
Nah, just irq safety for now, and thanks a lot for the quick feedback
and review.
* [RFC PATCH 2/7] memcg: move preempt disable to callers of memcg_rstat_updated
2025-05-13 3:13 [RFC PATCH 0/7] memcg: make memcg stats irq safe Shakeel Butt
2025-05-13 3:13 ` [RFC PATCH 1/7] memcg: memcg_rstat_updated re-entrant safe against irqs Shakeel Butt
@ 2025-05-13 3:13 ` Shakeel Butt
2025-05-13 10:34 ` Vlastimil Babka
2025-05-13 3:13 ` [RFC PATCH 3/7] memcg: make mod_memcg_state re-entrant safe against irqs Shakeel Butt
` (4 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Shakeel Butt @ 2025-05-13 3:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior,
Harry Yoo, Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team
Let's move the explicit preempt disable code to the callers of
memcg_rstat_updated() and also remove memcg_stats_lock() and the related
functions, which ensured that the callers of the stats update functions
had disabled preemption, because the stats update functions now disable
preemption explicitly themselves.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 74 +++++++++++++------------------------------------
1 file changed, 19 insertions(+), 55 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2c4c095bf26c..62450e7991d8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -558,47 +558,21 @@ static u64 flush_last_time;
#define FLUSH_TIME (2UL*HZ)
-/*
- * Accessors to ensure that preemption is disabled on PREEMPT_RT because it can
- * not rely on this as part of an acquired spinlock_t lock. These functions are
- * never used in hardirq context on PREEMPT_RT and therefore disabling preemtion
- * is sufficient.
- */
-static void memcg_stats_lock(void)
-{
- preempt_disable_nested();
- VM_WARN_ON_IRQS_ENABLED();
-}
-
-static void __memcg_stats_lock(void)
-{
- preempt_disable_nested();
-}
-
-static void memcg_stats_unlock(void)
-{
- preempt_enable_nested();
-}
-
-
static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
{
return atomic64_read(&vmstats->stats_updates) >
MEMCG_CHARGE_BATCH * num_online_cpus();
}
-static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
+static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val,
+ int cpu)
{
struct memcg_vmstats_percpu *statc;
- int cpu;
int stats_updates;
if (!val)
return;
- /* Don't assume callers have preemption disabled. */
- cpu = get_cpu();
-
cgroup_rstat_updated(memcg->css.cgroup, cpu);
statc = this_cpu_ptr(memcg->vmstats_percpu);
for (; statc; statc = statc->parent) {
@@ -619,7 +593,6 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
atomic64_add(stats_updates,
&statc->vmstats->stats_updates);
}
- put_cpu();
}
static void __mem_cgroup_flush_stats(struct mem_cgroup *memcg, bool force)
@@ -717,6 +690,7 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
int val)
{
int i = memcg_stats_index(idx);
+ int cpu;
if (mem_cgroup_disabled())
return;
@@ -724,12 +698,14 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
return;
- memcg_stats_lock();
+ cpu = get_cpu();
+
__this_cpu_add(memcg->vmstats_percpu->state[i], val);
val = memcg_state_val_in_pages(idx, val);
- memcg_rstat_updated(memcg, val);
+ memcg_rstat_updated(memcg, val, cpu);
trace_mod_memcg_state(memcg, idx, val);
- memcg_stats_unlock();
+
+ put_cpu();
}
#ifdef CONFIG_MEMCG_V1
@@ -758,6 +734,7 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
struct mem_cgroup_per_node *pn;
struct mem_cgroup *memcg;
int i = memcg_stats_index(idx);
+ int cpu;
if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
return;
@@ -765,24 +742,7 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
memcg = pn->memcg;
- /*
- * The caller from rmap relies on disabled preemption because they never
- * update their counter from in-interrupt context. For these two
- * counters we check that the update is never performed from an
- * interrupt context while other caller need to have disabled interrupt.
- */
- __memcg_stats_lock();
- if (IS_ENABLED(CONFIG_DEBUG_VM)) {
- switch (idx) {
- case NR_ANON_MAPPED:
- case NR_FILE_MAPPED:
- case NR_ANON_THPS:
- WARN_ON_ONCE(!in_task());
- break;
- default:
- VM_WARN_ON_IRQS_ENABLED();
- }
- }
+ cpu = get_cpu();
/* Update memcg */
__this_cpu_add(memcg->vmstats_percpu->state[i], val);
@@ -791,9 +751,10 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
__this_cpu_add(pn->lruvec_stats_percpu->state[i], val);
val = memcg_state_val_in_pages(idx, val);
- memcg_rstat_updated(memcg, val);
+ memcg_rstat_updated(memcg, val, cpu);
trace_mod_memcg_lruvec_state(memcg, idx, val);
- memcg_stats_unlock();
+
+ put_cpu();
}
/**
@@ -873,6 +834,7 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
unsigned long count)
{
int i = memcg_events_index(idx);
+ int cpu;
if (mem_cgroup_disabled())
return;
@@ -880,11 +842,13 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
return;
- memcg_stats_lock();
+ cpu = get_cpu();
+
__this_cpu_add(memcg->vmstats_percpu->events[i], count);
- memcg_rstat_updated(memcg, count);
+ memcg_rstat_updated(memcg, count, cpu);
trace_count_memcg_events(memcg, idx, count);
- memcg_stats_unlock();
+
+ put_cpu();
}
unsigned long memcg_events(struct mem_cgroup *memcg, int event)
--
2.47.1
* Re: [RFC PATCH 2/7] memcg: move preempt disable to callers of memcg_rstat_updated
2025-05-13 3:13 ` [RFC PATCH 2/7] memcg: move preempt disable to callers of memcg_rstat_updated Shakeel Butt
@ 2025-05-13 10:34 ` Vlastimil Babka
0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2025-05-13 10:34 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Alexei Starovoitov, Sebastian Andrzej Siewior, Harry Yoo,
Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team
On 5/13/25 05:13, Shakeel Butt wrote:
> Let's move the explicit preempt disable code to the callers of
> memcg_rstat_updated() and also remove memcg_stats_lock() and the related
> functions, which ensured that the callers of the stats update functions
> had disabled preemption, because the stats update functions now disable
> preemption explicitly themselves.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
A welcome cleanup!
Acked-by: Vlastimil Babka <vbabka@suse.cz>
* [RFC PATCH 3/7] memcg: make mod_memcg_state re-entrant safe against irqs
2025-05-13 3:13 [RFC PATCH 0/7] memcg: make memcg stats irq safe Shakeel Butt
2025-05-13 3:13 ` [RFC PATCH 1/7] memcg: memcg_rstat_updated re-entrant safe against irqs Shakeel Butt
2025-05-13 3:13 ` [RFC PATCH 2/7] memcg: move preempt disable to callers of memcg_rstat_updated Shakeel Butt
@ 2025-05-13 3:13 ` Shakeel Butt
2025-05-13 10:38 ` Vlastimil Babka
2025-05-13 3:13 ` [RFC PATCH 4/7] memcg: make count_memcg_events " Shakeel Butt
` (3 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Shakeel Butt @ 2025-05-13 3:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior,
Harry Yoo, Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team
Let's make mod_memcg_state re-entrant safe against irqs. The only thing
needed is to convert the usage of __this_cpu_add() to this_cpu_add().
In addition, with re-entrant safety, there is no need to disable irqs.

mod_memcg_state() is not safe against nmi, so let's add a warning if
someone tries to call it in nmi context.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
include/linux/memcontrol.h | 20 ++------------------
mm/memcontrol.c | 12 ++++++++----
2 files changed, 10 insertions(+), 22 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ed9acb68652a..84e2cea7e666 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -911,19 +911,9 @@ struct mem_cgroup *mem_cgroup_get_oom_group(struct task_struct *victim,
struct mem_cgroup *oom_domain);
void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
-void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
- int val);
-
/* idx can be of type enum memcg_stat_item or node_stat_item */
-static inline void mod_memcg_state(struct mem_cgroup *memcg,
- enum memcg_stat_item idx, int val)
-{
- unsigned long flags;
-
- local_irq_save(flags);
- __mod_memcg_state(memcg, idx, val);
- local_irq_restore(flags);
-}
+void mod_memcg_state(struct mem_cgroup *memcg,
+ enum memcg_stat_item idx, int val);
static inline void mod_memcg_page_state(struct page *page,
enum memcg_stat_item idx, int val)
@@ -1390,12 +1380,6 @@ static inline void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
{
}
-static inline void __mod_memcg_state(struct mem_cgroup *memcg,
- enum memcg_stat_item idx,
- int nr)
-{
-}
-
static inline void mod_memcg_state(struct mem_cgroup *memcg,
enum memcg_stat_item idx,
int nr)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 62450e7991d8..373d36cae069 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -681,12 +681,12 @@ static int memcg_state_val_in_pages(int idx, int val)
}
/**
- * __mod_memcg_state - update cgroup memory statistics
+ * mod_memcg_state - update cgroup memory statistics
* @memcg: the memory cgroup
* @idx: the stat item - can be enum memcg_stat_item or enum node_stat_item
* @val: delta to add to the counter, can be negative
*/
-void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
+void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
int val)
{
int i = memcg_stats_index(idx);
@@ -698,9 +698,13 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
return;
+ if (WARN_ONCE(in_nmi(), "%s: called in nmi context for stat item %d\n",
+ __func__, idx))
+ return;
+
cpu = get_cpu();
- __this_cpu_add(memcg->vmstats_percpu->state[i], val);
+ this_cpu_add(memcg->vmstats_percpu->state[i], val);
val = memcg_state_val_in_pages(idx, val);
memcg_rstat_updated(memcg, val, cpu);
trace_mod_memcg_state(memcg, idx, val);
@@ -2969,7 +2973,7 @@ static void drain_obj_stock(struct obj_stock_pcp *stock)
memcg = get_mem_cgroup_from_objcg(old);
- __mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
+ mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
memcg1_account_kmem(memcg, -nr_pages);
if (!mem_cgroup_is_root(memcg))
memcg_uncharge(memcg, nr_pages);
--
2.47.1
* Re: [RFC PATCH 3/7] memcg: make mod_memcg_state re-entrant safe against irqs
2025-05-13 3:13 ` [RFC PATCH 3/7] memcg: make mod_memcg_state re-entrant safe against irqs Shakeel Butt
@ 2025-05-13 10:38 ` Vlastimil Babka
0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2025-05-13 10:38 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Alexei Starovoitov, Sebastian Andrzej Siewior, Harry Yoo,
Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team
On 5/13/25 05:13, Shakeel Butt wrote:
> Let's make mod_memcg_state re-entrant safe against irqs. The only thing
> needed is to convert the usage of __this_cpu_add() to this_cpu_add().
> In addition, with re-entrant safety, there is no need to disable irqs.
>
> mod_memcg_state() is not safe against nmi, so let's add a warning if
> someone tries to call it in nmi context.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Also a good cleanup.
Acked-by: Vlastimil Babka <vbabka@suse.cz>
* [RFC PATCH 4/7] memcg: make count_memcg_events re-entrant safe against irqs
2025-05-13 3:13 [RFC PATCH 0/7] memcg: make memcg stats irq safe Shakeel Butt
` (2 preceding siblings ...)
2025-05-13 3:13 ` [RFC PATCH 3/7] memcg: make mod_memcg_state re-entrant safe against irqs Shakeel Butt
@ 2025-05-13 3:13 ` Shakeel Butt
2025-05-13 10:39 ` Vlastimil Babka
2025-05-13 3:13 ` [RFC PATCH 5/7] memcg: make __mod_memcg_lruvec_state " Shakeel Butt
` (2 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Shakeel Butt @ 2025-05-13 3:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior,
Harry Yoo, Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team
Let's make count_memcg_events re-entrant safe against irqs. The only
thing needed is to convert the usage of __this_cpu_add() to
this_cpu_add(). In addition, with re-entrant safety, there is no need
to disable irqs. Also add a warning for in_nmi(), as it is not safe
against nmi context.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
include/linux/memcontrol.h | 21 ++-------------------
mm/memcontrol-v1.c | 6 +++---
mm/memcontrol.c | 10 +++++++---
mm/swap.c | 8 ++++----
mm/vmscan.c | 14 +++++++-------
5 files changed, 23 insertions(+), 36 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 84e2cea7e666..31b9ab93d4e1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -952,19 +952,8 @@ static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
local_irq_restore(flags);
}
-void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
- unsigned long count);
-
-static inline void count_memcg_events(struct mem_cgroup *memcg,
- enum vm_event_item idx,
- unsigned long count)
-{
- unsigned long flags;
-
- local_irq_save(flags);
- __count_memcg_events(memcg, idx, count);
- local_irq_restore(flags);
-}
+void count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
+ unsigned long count);
static inline void count_memcg_folio_events(struct folio *folio,
enum vm_event_item idx, unsigned long nr)
@@ -1438,12 +1427,6 @@ static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
}
static inline void count_memcg_events(struct mem_cgroup *memcg,
- enum vm_event_item idx,
- unsigned long count)
-{
-}
-
-static inline void __count_memcg_events(struct mem_cgroup *memcg,
enum vm_event_item idx,
unsigned long count)
{
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 3852f0713ad2..581c960ba19b 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -512,9 +512,9 @@ static void memcg1_charge_statistics(struct mem_cgroup *memcg, int nr_pages)
{
/* pagein of a big page is an event. So, ignore page size */
if (nr_pages > 0)
- __count_memcg_events(memcg, PGPGIN, 1);
+ count_memcg_events(memcg, PGPGIN, 1);
else {
- __count_memcg_events(memcg, PGPGOUT, 1);
+ count_memcg_events(memcg, PGPGOUT, 1);
nr_pages = -nr_pages; /* for event */
}
@@ -689,7 +689,7 @@ void memcg1_uncharge_batch(struct mem_cgroup *memcg, unsigned long pgpgout,
unsigned long flags;
local_irq_save(flags);
- __count_memcg_events(memcg, PGPGOUT, pgpgout);
+ count_memcg_events(memcg, PGPGOUT, pgpgout);
__this_cpu_add(memcg->events_percpu->nr_page_events, nr_memory);
memcg1_check_events(memcg, nid);
local_irq_restore(flags);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 373d36cae069..9e7dc90cc460 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -829,12 +829,12 @@ void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
}
/**
- * __count_memcg_events - account VM events in a cgroup
+ * count_memcg_events - account VM events in a cgroup
* @memcg: the memory cgroup
* @idx: the event item
* @count: the number of events that occurred
*/
-void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
+void count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
unsigned long count)
{
int i = memcg_events_index(idx);
@@ -846,9 +846,13 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
return;
+ if (WARN_ONCE(in_nmi(), "%s: called in nmi context for stat item %d\n",
+ __func__, idx))
+ return;
+
cpu = get_cpu();
- __this_cpu_add(memcg->vmstats_percpu->events[i], count);
+ this_cpu_add(memcg->vmstats_percpu->events[i], count);
memcg_rstat_updated(memcg, count, cpu);
trace_count_memcg_events(memcg, idx, count);
diff --git a/mm/swap.c b/mm/swap.c
index 77b2d5997873..4fc322f7111a 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -309,7 +309,7 @@ static void lru_activate(struct lruvec *lruvec, struct folio *folio)
trace_mm_lru_activate(folio);
__count_vm_events(PGACTIVATE, nr_pages);
- __count_memcg_events(lruvec_memcg(lruvec), PGACTIVATE, nr_pages);
+ count_memcg_events(lruvec_memcg(lruvec), PGACTIVATE, nr_pages);
}
#ifdef CONFIG_SMP
@@ -581,7 +581,7 @@ static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
if (active) {
__count_vm_events(PGDEACTIVATE, nr_pages);
- __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE,
+ count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE,
nr_pages);
}
}
@@ -599,7 +599,7 @@ static void lru_deactivate(struct lruvec *lruvec, struct folio *folio)
lruvec_add_folio(lruvec, folio);
__count_vm_events(PGDEACTIVATE, nr_pages);
- __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_pages);
+ count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_pages);
}
static void lru_lazyfree(struct lruvec *lruvec, struct folio *folio)
@@ -625,7 +625,7 @@ static void lru_lazyfree(struct lruvec *lruvec, struct folio *folio)
lruvec_add_folio(lruvec, folio);
__count_vm_events(PGLAZYFREE, nr_pages);
- __count_memcg_events(lruvec_memcg(lruvec), PGLAZYFREE, nr_pages);
+ count_memcg_events(lruvec_memcg(lruvec), PGLAZYFREE, nr_pages);
}
/*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5efd939d8c76..f86d264558f5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2028,7 +2028,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
item = PGSCAN_KSWAPD + reclaimer_offset(sc);
if (!cgroup_reclaim(sc))
__count_vm_events(item, nr_scanned);
- __count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
+ count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
__count_vm_events(PGSCAN_ANON + file, nr_scanned);
spin_unlock_irq(&lruvec->lru_lock);
@@ -2048,7 +2048,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
item = PGSTEAL_KSWAPD + reclaimer_offset(sc);
if (!cgroup_reclaim(sc))
__count_vm_events(item, nr_reclaimed);
- __count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
+ count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
__count_vm_events(PGSTEAL_ANON + file, nr_reclaimed);
spin_unlock_irq(&lruvec->lru_lock);
@@ -2138,7 +2138,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
if (!cgroup_reclaim(sc))
__count_vm_events(PGREFILL, nr_scanned);
- __count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned);
+ count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned);
spin_unlock_irq(&lruvec->lru_lock);
@@ -2195,7 +2195,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
nr_deactivate = move_folios_to_lru(lruvec, &l_inactive);
__count_vm_events(PGDEACTIVATE, nr_deactivate);
- __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
+ count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
spin_unlock_irq(&lruvec->lru_lock);
@@ -4616,8 +4616,8 @@ static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
__count_vm_events(item, isolated);
__count_vm_events(PGREFILL, sorted);
}
- __count_memcg_events(memcg, item, isolated);
- __count_memcg_events(memcg, PGREFILL, sorted);
+ count_memcg_events(memcg, item, isolated);
+ count_memcg_events(memcg, PGREFILL, sorted);
__count_vm_events(PGSCAN_ANON + type, isolated);
trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, MAX_LRU_BATCH,
scanned, skipped, isolated,
@@ -4769,7 +4769,7 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
item = PGSTEAL_KSWAPD + reclaimer_offset(sc);
if (!cgroup_reclaim(sc))
__count_vm_events(item, reclaimed);
- __count_memcg_events(memcg, item, reclaimed);
+ count_memcg_events(memcg, item, reclaimed);
__count_vm_events(PGSTEAL_ANON + type, reclaimed);
spin_unlock_irq(&lruvec->lru_lock);
--
2.47.1
* Re: [RFC PATCH 4/7] memcg: make count_memcg_events re-entrant safe against irqs
2025-05-13 3:13 ` [RFC PATCH 4/7] memcg: make count_memcg_events " Shakeel Butt
@ 2025-05-13 10:39 ` Vlastimil Babka
0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2025-05-13 10:39 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Alexei Starovoitov, Sebastian Andrzej Siewior, Harry Yoo,
Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team
On 5/13/25 05:13, Shakeel Butt wrote:
> Let's make count_memcg_events re-entrant safe against irqs. The only
> thing needed is to convert the usage of __this_cpu_add() to
> this_cpu_add(). In addition, with re-entrant safety, there is no need
> to disable irqs. Also add a warning for in_nmi(), as it is not safe
> against nmi context.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
* [RFC PATCH 5/7] memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
2025-05-13 3:13 [RFC PATCH 0/7] memcg: make memcg stats irq safe Shakeel Butt
` (3 preceding siblings ...)
2025-05-13 3:13 ` [RFC PATCH 4/7] memcg: make count_memcg_events " Shakeel Butt
@ 2025-05-13 3:13 ` Shakeel Butt
2025-05-13 10:40 ` Vlastimil Babka
2025-05-13 3:13 ` [RFC PATCH 6/7] memcg: objcg stock trylock without irq disabling Shakeel Butt
2025-05-13 3:13 ` [RFC PATCH 7/7] memcg: no stock lock for cpu hot-unplug Shakeel Butt
6 siblings, 1 reply; 16+ messages in thread
From: Shakeel Butt @ 2025-05-13 3:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior,
Harry Yoo, Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team
Let's make __mod_memcg_lruvec_state re-entrant safe and rename it to
mod_memcg_lruvec_state(). The only thing needed is to convert the usage
of __this_cpu_add() to this_cpu_add(). There are two callers of
mod_memcg_lruvec_state(), and one of them, __mod_objcg_mlstate(), will
be re-entrant safe as well, so rename it to mod_objcg_mlstate(). The
other caller, __mod_lruvec_state(), still calls __mod_node_page_state(),
which is not re-entrant safe yet, so keep it as is.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9e7dc90cc460..adf2f1922118 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -731,7 +731,7 @@ unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
}
#endif
-static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
+static void mod_memcg_lruvec_state(struct lruvec *lruvec,
enum node_stat_item idx,
int val)
{
@@ -743,16 +743,20 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
return;
+ if (WARN_ONCE(in_nmi(), "%s: called in nmi context for stat item %d\n",
+ __func__, idx))
+ return;
+
pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
memcg = pn->memcg;
cpu = get_cpu();
/* Update memcg */
- __this_cpu_add(memcg->vmstats_percpu->state[i], val);
+ this_cpu_add(memcg->vmstats_percpu->state[i], val);
/* Update lruvec */
- __this_cpu_add(pn->lruvec_stats_percpu->state[i], val);
+ this_cpu_add(pn->lruvec_stats_percpu->state[i], val);
val = memcg_state_val_in_pages(idx, val);
memcg_rstat_updated(memcg, val, cpu);
@@ -779,7 +783,7 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
/* Update memcg and lruvec */
if (!mem_cgroup_disabled())
- __mod_memcg_lruvec_state(lruvec, idx, val);
+ mod_memcg_lruvec_state(lruvec, idx, val);
}
void __lruvec_stat_mod_folio(struct folio *folio, enum node_stat_item idx,
@@ -2559,7 +2563,7 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
folio->memcg_data = (unsigned long)memcg;
}
-static inline void __mod_objcg_mlstate(struct obj_cgroup *objcg,
+static inline void mod_objcg_mlstate(struct obj_cgroup *objcg,
struct pglist_data *pgdat,
enum node_stat_item idx, int nr)
{
@@ -2570,7 +2574,7 @@ static inline void __mod_objcg_mlstate(struct obj_cgroup *objcg,
memcg = obj_cgroup_memcg(objcg);
if (likely(!in_nmi())) {
lruvec = mem_cgroup_lruvec(memcg, pgdat);
- __mod_memcg_lruvec_state(lruvec, idx, nr);
+ mod_memcg_lruvec_state(lruvec, idx, nr);
} else {
struct mem_cgroup_per_node *pn = memcg->nodeinfo[pgdat->node_id];
@@ -2901,12 +2905,12 @@ static void __account_obj_stock(struct obj_cgroup *objcg,
struct pglist_data *oldpg = stock->cached_pgdat;
if (stock->nr_slab_reclaimable_b) {
- __mod_objcg_mlstate(objcg, oldpg, NR_SLAB_RECLAIMABLE_B,
+ mod_objcg_mlstate(objcg, oldpg, NR_SLAB_RECLAIMABLE_B,
stock->nr_slab_reclaimable_b);
stock->nr_slab_reclaimable_b = 0;
}
if (stock->nr_slab_unreclaimable_b) {
- __mod_objcg_mlstate(objcg, oldpg, NR_SLAB_UNRECLAIMABLE_B,
+ mod_objcg_mlstate(objcg, oldpg, NR_SLAB_UNRECLAIMABLE_B,
stock->nr_slab_unreclaimable_b);
stock->nr_slab_unreclaimable_b = 0;
}
@@ -2932,7 +2936,7 @@ static void __account_obj_stock(struct obj_cgroup *objcg,
}
}
if (nr)
- __mod_objcg_mlstate(objcg, pgdat, idx, nr);
+ mod_objcg_mlstate(objcg, pgdat, idx, nr);
}
static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
@@ -3004,13 +3008,13 @@ static void drain_obj_stock(struct obj_stock_pcp *stock)
*/
if (stock->nr_slab_reclaimable_b || stock->nr_slab_unreclaimable_b) {
if (stock->nr_slab_reclaimable_b) {
- __mod_objcg_mlstate(old, stock->cached_pgdat,
+ mod_objcg_mlstate(old, stock->cached_pgdat,
NR_SLAB_RECLAIMABLE_B,
stock->nr_slab_reclaimable_b);
stock->nr_slab_reclaimable_b = 0;
}
if (stock->nr_slab_unreclaimable_b) {
- __mod_objcg_mlstate(old, stock->cached_pgdat,
+ mod_objcg_mlstate(old, stock->cached_pgdat,
NR_SLAB_UNRECLAIMABLE_B,
stock->nr_slab_unreclaimable_b);
stock->nr_slab_unreclaimable_b = 0;
@@ -3050,7 +3054,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
if (unlikely(in_nmi())) {
if (pgdat)
- __mod_objcg_mlstate(objcg, pgdat, idx, nr_bytes);
+ mod_objcg_mlstate(objcg, pgdat, idx, nr_bytes);
nr_pages = nr_bytes >> PAGE_SHIFT;
nr_bytes = nr_bytes & (PAGE_SIZE - 1);
atomic_add(nr_bytes, &objcg->nr_charged_bytes);
--
2.47.1
* Re: [RFC PATCH 5/7] memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
2025-05-13 3:13 ` [RFC PATCH 5/7] memcg: make __mod_memcg_lruvec_state " Shakeel Butt
@ 2025-05-13 10:40 ` Vlastimil Babka
0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2025-05-13 10:40 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Alexei Starovoitov, Sebastian Andrzej Siewior, Harry Yoo,
Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team
On 5/13/25 05:13, Shakeel Butt wrote:
> Let's make __mod_memcg_lruvec_state re-entrant safe and rename it to
> mod_memcg_lruvec_state(). The only thing needed is to convert the usage
> of __this_cpu_add() to this_cpu_add(). There are two callers of
> mod_memcg_lruvec_state(), and one of them, __mod_objcg_mlstate(), will
> be re-entrant safe as well, so rename it to mod_objcg_mlstate(). The
> other caller, __mod_lruvec_state(), still calls __mod_node_page_state(),
> which is not re-entrant safe yet, so keep it as is.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
* [RFC PATCH 6/7] memcg: objcg stock trylock without irq disabling
2025-05-13 3:13 [RFC PATCH 0/7] memcg: make memcg stats irq safe Shakeel Butt
` (4 preceding siblings ...)
2025-05-13 3:13 ` [RFC PATCH 5/7] memcg: make __mod_memcg_lruvec_state " Shakeel Butt
@ 2025-05-13 3:13 ` Shakeel Butt
2025-05-13 13:05 ` Vlastimil Babka
2025-05-13 3:13 ` [RFC PATCH 7/7] memcg: no stock lock for cpu hot-unplug Shakeel Butt
6 siblings, 1 reply; 16+ messages in thread
From: Shakeel Butt @ 2025-05-13 3:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior,
Harry Yoo, Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team
There is no need to disable irqs to use objcg per-cpu stock, so let's
just not do that but consume_obj_stock() and refill_obj_stock() will
need to use trylock instead to keep per-cpu stock safe. One consequence
of this change is that the charge request from irq context may take
slowpath more often but it should be rare.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 26 ++++++++++----------------
1 file changed, 10 insertions(+), 16 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index adf2f1922118..af7df675d733 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1918,18 +1918,17 @@ static void drain_local_memcg_stock(struct work_struct *dummy)
static void drain_local_obj_stock(struct work_struct *dummy)
{
struct obj_stock_pcp *stock;
- unsigned long flags;
if (WARN_ONCE(!in_task(), "drain in non-task context"))
return;
- local_lock_irqsave(&obj_stock.lock, flags);
+ local_lock(&obj_stock.lock);
stock = this_cpu_ptr(&obj_stock);
drain_obj_stock(stock);
clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
- local_unlock_irqrestore(&obj_stock.lock, flags);
+ local_unlock(&obj_stock.lock);
}
static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
@@ -2062,14 +2061,13 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
static int memcg_hotplug_cpu_dead(unsigned int cpu)
{
struct obj_stock_pcp *obj_st;
- unsigned long flags;
obj_st = &per_cpu(obj_stock, cpu);
/* drain_obj_stock requires objstock.lock */
- local_lock_irqsave(&obj_stock.lock, flags);
+ local_lock(&obj_stock.lock);
drain_obj_stock(obj_st);
- local_unlock_irqrestore(&obj_stock.lock, flags);
+ local_unlock(&obj_stock.lock);
/* no need for the local lock */
drain_stock_fully(&per_cpu(memcg_stock, cpu));
@@ -2943,14 +2941,12 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
struct pglist_data *pgdat, enum node_stat_item idx)
{
struct obj_stock_pcp *stock;
- unsigned long flags;
bool ret = false;
- if (unlikely(in_nmi()))
+ if (unlikely(in_nmi()) ||
+ !local_trylock(&obj_stock.lock))
return ret;
- local_lock_irqsave(&obj_stock.lock, flags);
-
stock = this_cpu_ptr(&obj_stock);
if (objcg == READ_ONCE(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) {
stock->nr_bytes -= nr_bytes;
@@ -2960,7 +2956,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
__account_obj_stock(objcg, stock, nr_bytes, pgdat, idx);
}
- local_unlock_irqrestore(&obj_stock.lock, flags);
+ local_unlock(&obj_stock.lock);
return ret;
}
@@ -3049,10 +3045,10 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
enum node_stat_item idx)
{
struct obj_stock_pcp *stock;
- unsigned long flags;
unsigned int nr_pages = 0;
- if (unlikely(in_nmi())) {
+ if (unlikely(in_nmi()) ||
+ !local_trylock(&obj_stock.lock)) {
if (pgdat)
mod_objcg_mlstate(objcg, pgdat, idx, nr_bytes);
nr_pages = nr_bytes >> PAGE_SHIFT;
@@ -3061,8 +3057,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
goto out;
}
- local_lock_irqsave(&obj_stock.lock, flags);
-
stock = this_cpu_ptr(&obj_stock);
if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
drain_obj_stock(stock);
@@ -3083,7 +3077,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
stock->nr_bytes &= (PAGE_SIZE - 1);
}
- local_unlock_irqrestore(&obj_stock.lock, flags);
+ local_unlock(&obj_stock.lock);
out:
if (nr_pages)
obj_cgroup_uncharge_pages(objcg, nr_pages);
--
2.47.1
* Re: [RFC PATCH 6/7] memcg: objcg stock trylock without irq disabling
2025-05-13 3:13 ` [RFC PATCH 6/7] memcg: objcg stock trylock without irq disabling Shakeel Butt
@ 2025-05-13 13:05 ` Vlastimil Babka
0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2025-05-13 13:05 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Alexei Starovoitov, Sebastian Andrzej Siewior, Harry Yoo,
Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team
On 5/13/25 05:13, Shakeel Butt wrote:
> There is no need to disable irqs to use objcg per-cpu stock, so let's
> just not do that but consume_obj_stock() and refill_obj_stock() will
> need to use trylock instead to keep per-cpu stock safe. One consequence
I'd rather say "to avoid deadlock".
> of this change is that the charge request from irq context may take
> slowpath more often but it should be rare.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
* [RFC PATCH 7/7] memcg: no stock lock for cpu hot-unplug
2025-05-13 3:13 [RFC PATCH 0/7] memcg: make memcg stats irq safe Shakeel Butt
` (5 preceding siblings ...)
2025-05-13 3:13 ` [RFC PATCH 6/7] memcg: objcg stock trylock without irq disabling Shakeel Butt
@ 2025-05-13 3:13 ` Shakeel Butt
2025-05-13 13:10 ` Vlastimil Babka
6 siblings, 1 reply; 16+ messages in thread
From: Shakeel Butt @ 2025-05-13 3:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior,
Harry Yoo, Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team
Previously on the cpu hot-unplug, the kernel would call
drain_obj_stock() with objcg local lock. However local lock was not
neede as the stock which was accessed belongs to a dead cpu but we kept
it there to disable irqs as drain_obj_stock() may call
mod_objcg_mlstate() which required irqs disabled. However there is no
need to disable irqs now for mod_objcg_mlstate(), so we can remove the
lcoal lock altogether from cpu hot-unplug path.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index af7df675d733..539cd76e1492 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2060,16 +2060,8 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
static int memcg_hotplug_cpu_dead(unsigned int cpu)
{
- struct obj_stock_pcp *obj_st;
-
- obj_st = &per_cpu(obj_stock, cpu);
-
- /* drain_obj_stock requires objstock.lock */
- local_lock(&obj_stock.lock);
- drain_obj_stock(obj_st);
- local_unlock(&obj_stock.lock);
-
/* no need for the local lock */
+ drain_obj_stock(&per_cpu(obj_stock, cpu));
drain_stock_fully(&per_cpu(memcg_stock, cpu));
return 0;
--
2.47.1
* Re: [RFC PATCH 7/7] memcg: no stock lock for cpu hot-unplug
2025-05-13 3:13 ` [RFC PATCH 7/7] memcg: no stock lock for cpu hot-unplug Shakeel Butt
@ 2025-05-13 13:10 ` Vlastimil Babka
0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2025-05-13 13:10 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Alexei Starovoitov, Sebastian Andrzej Siewior, Harry Yoo,
Yosry Ahmed, bpf, linux-mm, cgroups, linux-kernel,
Meta kernel team
On 5/13/25 05:13, Shakeel Butt wrote:
> Previously on the cpu hot-unplug, the kernel would call
> drain_obj_stock() with objcg local lock. However local lock was not
> neede as the stock which was accessed belongs to a dead cpu but we kept
needed
> it there to disable irqs as drain_obj_stock() may call
> mod_objcg_mlstate() which required irqs disabled. However there is no
> need to disable irqs now for mod_objcg_mlstate(), so we can remove the
> lcoal lock altogether from cpu hot-unplug path.
local
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
I think you could switch the ordering of this patch with 6/7 to avoid
changing memcg_hotplug_cpu_dead() twice?
Acked-by: Vlastimil Babka <vbabka@suse.cz>