* [PATCH v2 0/9] memcg: cleanup per-cpu stock
@ 2025-04-04 1:39 Shakeel Butt
2025-04-04 1:39 ` [PATCH v2 1/9] memcg: remove root memcg check from refill_stock Shakeel Butt
` (8 more replies)
0 siblings, 9 replies; 11+ messages in thread
From: Shakeel Butt @ 2025-04-04 1:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
This is a cleanup series that simplifies the memcg per-cpu stock code; in
particular, it removes unnecessary dependencies on the local_lock of the
per-cpu memcg stock. The eighth patch, from Vlastimil, optimizes the
charge path by combining the charging and the accounting.
This series is based on mm-everything-2025-04-03-06-03. The main changes
from v1 are collected acks, a new warning in patch 1, and a rebase on the
latest local_lock changes from Alexei.
Shakeel Butt (8):
memcg: remove root memcg check from refill_stock
memcg: decouple drain_obj_stock from local stock
memcg: introduce memcg_uncharge
memcg: manually inline __refill_stock
memcg: no refilling stock from obj_cgroup_release
memcg: do obj_cgroup_put inside drain_obj_stock
memcg: use __mod_memcg_state in drain_obj_stock
memcg: manually inline replace_stock_objcg
Vlastimil Babka (1):
memcg: combine slab obj stock charging and accounting
mm/memcontrol.c | 196 ++++++++++++++++++++++++------------------------
1 file changed, 96 insertions(+), 100 deletions(-)
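For orientation, the per-cpu stock that these patches rework looks roughly
like the sketch below. This is a simplified layout assumed from the diffs
in this series, not the authoritative definition (which lives in
mm/memcontrol.c):

/*
 * Hedged sketch of the per-cpu stock cache; field set inferred from
 * the diffs below and may not match the kernel exactly.
 */
struct memcg_stock_pcp {
	local_lock_t stock_lock;	/* protects all fields below */

	struct mem_cgroup *cached;	/* memcg owning the page stock */
	unsigned int nr_pages;		/* pre-charged pages */

	struct obj_cgroup *cached_objcg; /* objcg owning the byte stock */
	struct pglist_data *cached_pgdat;
	unsigned int nr_bytes;		/* pre-charged bytes */

	struct work_struct work;	/* async drain work item */
	unsigned long flags;		/* FLUSHING_CACHED_CHARGE */
};
static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock) = {
	.stock_lock = INIT_LOCAL_LOCK(stock_lock),
};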
--
2.47.1
* [PATCH v2 1/9] memcg: remove root memcg check from refill_stock
2025-04-04 1:39 [PATCH v2 0/9] memcg: cleanup per-cpu stock Shakeel Butt
@ 2025-04-04 1:39 ` Shakeel Butt
2025-04-04 1:39 ` [PATCH v2 2/9] memcg: decouple drain_obj_stock from local stock Shakeel Butt
` (7 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Shakeel Butt @ 2025-04-04 1:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
refill_stock() cannot be called with the root memcg, so there is no need
to check for it. Instead, add a warning in case the root memcg is ever
passed to it.
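Note that VM_WARN_ON_ONCE() compiles to a runtime check only under
CONFIG_DEBUG_VM, so production builds pay nothing for the new warning.
Simplified sketch of the gating in include/linux/mmdebug.h:

#ifdef CONFIG_DEBUG_VM
#define VM_WARN_ON_ONCE(cond)	(void)WARN_ON_ONCE(cond)
#else
#define VM_WARN_ON_ONCE(cond)	BUILD_BUG_ON_INVALID(cond)
#endif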
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b16b5b807d7c..ae1e953cead7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1893,13 +1893,13 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
{
unsigned long flags;
+ VM_WARN_ON_ONCE(mem_cgroup_is_root(memcg));
+
if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
/*
* In case of unlikely failure to lock percpu stock_lock
* uncharge memcg directly.
*/
- if (mem_cgroup_is_root(memcg))
- return;
page_counter_uncharge(&memcg->memory, nr_pages);
if (do_memsw_account())
page_counter_uncharge(&memcg->memsw, nr_pages);
--
2.47.1
* [PATCH v2 2/9] memcg: decouple drain_obj_stock from local stock
2025-04-04 1:39 [PATCH v2 0/9] memcg: cleanup per-cpu stock Shakeel Butt
2025-04-04 1:39 ` [PATCH v2 1/9] memcg: remove root memcg check from refill_stock Shakeel Butt
@ 2025-04-04 1:39 ` Shakeel Butt
2025-04-04 1:39 ` [PATCH v2 3/9] memcg: introduce memcg_uncharge Shakeel Butt
` (6 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Shakeel Butt @ 2025-04-04 1:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
Currently drain_obj_stock() can potentially call __refill_stock(), which
accesses the local CPU stock and thus requires the memcg stock's
local_lock. However, looking at the code paths leading to
drain_obj_stock(), there is never a good reason to refill the memcg stock
from it.
At the moment, drain_obj_stock() can be called from reclaim, CPU hotplug
teardown, mod_objcg_state() and refill_obj_stock(). For reclaim and
hotplug there is no need to refill. For the other two paths, the newly
switched objcg will most probably be used in the near future, so there is
no need to refill the stock with the older objcg.
In addition, reaching __refill_stock() from drain_obj_stock() is rare, so
performance is not really a concern. Let's just uncharge directly instead
of refilling; this also decouples drain_obj_stock() from the local CPU
stock and its local_lock requirement. The call paths are summarized in
the sketch below.
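Summarized as a comment block one could place above drain_obj_stock()
(reconstructed from the diffs in this series):

/*
 * Callers of drain_obj_stock():
 *
 *   drain_local_stock()        - reclaim-driven draining (drain_all_stock())
 *   memcg_hotplug_cpu_dead()   - cpu hotplug teardown
 *   replace_stock_objcg()      - objcg switch, reached from
 *                                mod_objcg_state() and refill_obj_stock()
 *
 * None of these wants the drained pages back in the local page stock,
 * hence the direct page_counter_uncharge() instead of __refill_stock().
 */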
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ae1e953cead7..52be78515d70 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2876,7 +2876,12 @@ static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
memcg1_account_kmem(memcg, -nr_pages);
- __refill_stock(memcg, nr_pages);
+ if (!mem_cgroup_is_root(memcg)) {
+ page_counter_uncharge(&memcg->memory, nr_pages);
+ if (do_memsw_account())
+ page_counter_uncharge(&memcg->memsw,
+ nr_pages);
+ }
css_put(&memcg->css);
}
--
2.47.1
* [PATCH v2 3/9] memcg: introduce memcg_uncharge
2025-04-04 1:39 [PATCH v2 0/9] memcg: cleanup per-cpu stock Shakeel Butt
2025-04-04 1:39 ` [PATCH v2 1/9] memcg: remove root memcg check from refill_stock Shakeel Butt
2025-04-04 1:39 ` [PATCH v2 2/9] memcg: decouple drain_obj_stock from local stock Shakeel Butt
@ 2025-04-04 1:39 ` Shakeel Butt
2025-04-04 1:39 ` [PATCH v2 4/9] memcg: manually inline __refill_stock Shakeel Butt
` (5 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Shakeel Butt @ 2025-04-04 1:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
The memory and memsw page counters are uncharged at multiple places in
memcontrol.c, which is error-prone. Let's move that functionality into a
newly introduced memcg_uncharge() and call it from all those places.
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 28 ++++++++++++----------------
1 file changed, 12 insertions(+), 16 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 52be78515d70..dfb3f14c1178 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1822,6 +1822,13 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
return ret;
}
+static void memcg_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+ page_counter_uncharge(&memcg->memory, nr_pages);
+ if (do_memsw_account())
+ page_counter_uncharge(&memcg->memsw, nr_pages);
+}
+
/*
* Returns stocks cached in percpu and reset cached information.
*/
@@ -1834,10 +1841,7 @@ static void drain_stock(struct memcg_stock_pcp *stock)
return;
if (stock_pages) {
- page_counter_uncharge(&old->memory, stock_pages);
- if (do_memsw_account())
- page_counter_uncharge(&old->memsw, stock_pages);
-
+ memcg_uncharge(old, stock_pages);
WRITE_ONCE(stock->nr_pages, 0);
}
@@ -1900,9 +1904,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
* In case of unlikely failure to lock percpu stock_lock
* uncharge memcg directly.
*/
- page_counter_uncharge(&memcg->memory, nr_pages);
- if (do_memsw_account())
- page_counter_uncharge(&memcg->memsw, nr_pages);
+ memcg_uncharge(memcg, nr_pages);
return;
}
__refill_stock(memcg, nr_pages);
@@ -2876,12 +2878,8 @@ static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
memcg1_account_kmem(memcg, -nr_pages);
- if (!mem_cgroup_is_root(memcg)) {
- page_counter_uncharge(&memcg->memory, nr_pages);
- if (do_memsw_account())
- page_counter_uncharge(&memcg->memsw,
- nr_pages);
- }
+ if (!mem_cgroup_is_root(memcg))
+ memcg_uncharge(memcg, nr_pages);
css_put(&memcg->css);
}
@@ -4702,9 +4700,7 @@ static inline void uncharge_gather_clear(struct uncharge_gather *ug)
static void uncharge_batch(const struct uncharge_gather *ug)
{
if (ug->nr_memory) {
- page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
- if (do_memsw_account())
- page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
+ memcg_uncharge(ug->memcg, ug->nr_memory);
if (ug->nr_kmem) {
mod_memcg_state(ug->memcg, MEMCG_KMEM, -ug->nr_kmem);
memcg1_account_kmem(ug->memcg, -ug->nr_kmem);
--
2.47.1
* [PATCH v2 4/9] memcg: manually inline __refill_stock
2025-04-04 1:39 [PATCH v2 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (2 preceding siblings ...)
2025-04-04 1:39 ` [PATCH v2 3/9] memcg: introduce memcg_uncharge Shakeel Butt
@ 2025-04-04 1:39 ` Shakeel Butt
2025-04-04 1:39 ` [PATCH v2 5/9] memcg: no refilling stock from obj_cgroup_release Shakeel Butt
` (4 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Shakeel Butt @ 2025-04-04 1:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
__refill_stock() no longer has multiple callers, so simply inline it into
refill_stock().
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 34 +++++++++++++---------------------
1 file changed, 13 insertions(+), 21 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index dfb3f14c1178..03a2be6d4a67 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1871,14 +1871,22 @@ static void drain_local_stock(struct work_struct *dummy)
obj_cgroup_put(old);
}
-/*
- * Cache charges(val) to local per_cpu area.
- * This will be consumed by consume_stock() function, later.
- */
-static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
{
struct memcg_stock_pcp *stock;
unsigned int stock_pages;
+ unsigned long flags;
+
+ VM_WARN_ON_ONCE(mem_cgroup_is_root(memcg));
+
+ if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+ /*
+ * In case of unlikely failure to lock percpu stock_lock
+ * uncharge memcg directly.
+ */
+ memcg_uncharge(memcg, nr_pages);
+ return;
+ }
stock = this_cpu_ptr(&memcg_stock);
if (READ_ONCE(stock->cached) != memcg) { /* reset if necessary */
@@ -1891,23 +1899,7 @@ static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
if (stock_pages > MEMCG_CHARGE_BATCH)
drain_stock(stock);
-}
-
-static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
- unsigned long flags;
-
- VM_WARN_ON_ONCE(mem_cgroup_is_root(memcg));
- if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
- /*
- * In case of unlikely failure to lock percpu stock_lock
- * uncharge memcg directly.
- */
- memcg_uncharge(memcg, nr_pages);
- return;
- }
- __refill_stock(memcg, nr_pages);
local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
}
--
2.47.1
* [PATCH v2 5/9] memcg: no refilling stock from obj_cgroup_release
2025-04-04 1:39 [PATCH v2 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (3 preceding siblings ...)
2025-04-04 1:39 ` [PATCH v2 4/9] memcg: manually inline __refill_stock Shakeel Butt
@ 2025-04-04 1:39 ` Shakeel Butt
2025-04-04 1:39 ` [PATCH v2 6/9] memcg: do obj_cgroup_put inside drain_obj_stock Shakeel Butt
` (3 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Shakeel Butt @ 2025-04-04 1:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
obj_cgroup_release() is called when all references to the objcg have been
dropped, i.e. no more memory objects are pointing to it. At that point
objcg->memcg will most probably be pointing to some ancestor memcg (see
the sketch below). In obj_cgroup_release(), the kernel calls
obj_cgroup_uncharge_pages(), which refills the local stock.
There is no need to refill the local stock with some ancestor memcg and
flush the local stock's current contents in the process. Let's decouple
obj_cgroup_release() from the local stock by uncharging directly instead
of refilling. An additional benefit of this change is that it removes the
requirement that obj_cgroup_put() only be called outside of the
local_lock.
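The ancestor relationship comes from objcg reparenting: when a memcg is
offlined, its objcgs are re-pointed at the parent, so by the time the last
reference is dropped the objcg usually belongs to an ancestor. An abridged
sketch of that mechanism, based on memcg_reparent_objcgs() in
mm/memcontrol.c (details may differ):

static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
				  struct mem_cgroup *parent)
{
	struct obj_cgroup *objcg, *iter;

	objcg = rcu_replace_pointer(memcg->objcg, NULL, true);

	spin_lock_irq(&objcg_lock);

	/* every objcg of the dying memcg now points at the parent */
	list_add(&objcg->list, &memcg->objcg_list);
	list_for_each_entry(iter, &memcg->objcg_list, list)
		WRITE_ONCE(iter->memcg, parent);
	list_splice(&memcg->objcg_list, &parent->objcg_list);

	spin_unlock_irq(&objcg_lock);

	percpu_ref_kill(&objcg->refcnt);
}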
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 03a2be6d4a67..df52084e90f4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -129,8 +129,7 @@ bool mem_cgroup_kmem_disabled(void)
return cgroup_memory_nokmem;
}
-static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
- unsigned int nr_pages);
+static void memcg_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages);
static void obj_cgroup_release(struct percpu_ref *ref)
{
@@ -163,8 +162,16 @@ static void obj_cgroup_release(struct percpu_ref *ref)
WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
nr_pages = nr_bytes >> PAGE_SHIFT;
- if (nr_pages)
- obj_cgroup_uncharge_pages(objcg, nr_pages);
+ if (nr_pages) {
+ struct mem_cgroup *memcg;
+
+ memcg = get_mem_cgroup_from_objcg(objcg);
+ mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
+ memcg1_account_kmem(memcg, -nr_pages);
+ if (!mem_cgroup_is_root(memcg))
+ memcg_uncharge(memcg, nr_pages);
+ mem_cgroup_put(memcg);
+ }
spin_lock_irqsave(&objcg_lock, flags);
list_del(&objcg->list);
--
2.47.1
* [PATCH v2 6/9] memcg: do obj_cgroup_put inside drain_obj_stock
2025-04-04 1:39 [PATCH v2 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (4 preceding siblings ...)
2025-04-04 1:39 ` [PATCH v2 5/9] memcg: no refilling stock from obj_cgroup_release Shakeel Butt
@ 2025-04-04 1:39 ` Shakeel Butt
2025-04-11 8:34 ` Vlastimil Babka
2025-04-04 1:39 ` [PATCH v2 7/9] memcg: use __mod_memcg_state in drain_obj_stock Shakeel Butt
` (2 subsequent siblings)
8 siblings, 1 reply; 11+ messages in thread
From: Shakeel Butt @ 2025-04-04 1:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
Previously we could not call obj_cgroup_put() inside the local lock
because, on the put of the last reference, the release function
obj_cgroup_release() might try to re-acquire the local lock. However,
that chain has now been broken (see the reconstructed call chain below).
Simply do obj_cgroup_put() inside drain_obj_stock() instead of returning
the old objcg to the caller.
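The old constraint, reconstructed as a call chain (patches 2 and 5
removed the links that made the put unsafe):

/*
 * local_lock_irqsave(&memcg_stock.stock_lock, flags)
 *   drain_obj_stock()
 *     obj_cgroup_put(old)          // if this drops the last reference:
 *       obj_cgroup_release()
 *         obj_cgroup_uncharge_pages()
 *           refill_stock()         // re-entry on the same per-cpu lock
 *
 * obj_cgroup_release() now uncharges the page counters directly, so the
 * final put no longer re-enters stock_lock.
 */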
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 37 +++++++++++--------------------------
1 file changed, 11 insertions(+), 26 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index df52084e90f4..7988a42b29bf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1785,7 +1785,7 @@ static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock) = {
};
static DEFINE_MUTEX(percpu_charge_mutex);
-static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock);
+static void drain_obj_stock(struct memcg_stock_pcp *stock);
static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
struct mem_cgroup *root_memcg);
@@ -1859,7 +1859,6 @@ static void drain_stock(struct memcg_stock_pcp *stock)
static void drain_local_stock(struct work_struct *dummy)
{
struct memcg_stock_pcp *stock;
- struct obj_cgroup *old = NULL;
unsigned long flags;
/*
@@ -1870,12 +1869,11 @@ static void drain_local_stock(struct work_struct *dummy)
local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
- old = drain_obj_stock(stock);
+ drain_obj_stock(stock);
drain_stock(stock);
clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
- obj_cgroup_put(old);
}
static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
@@ -1958,18 +1956,16 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
static int memcg_hotplug_cpu_dead(unsigned int cpu)
{
struct memcg_stock_pcp *stock;
- struct obj_cgroup *old;
unsigned long flags;
stock = &per_cpu(memcg_stock, cpu);
/* drain_obj_stock requires stock_lock */
local_lock_irqsave(&memcg_stock.stock_lock, flags);
- old = drain_obj_stock(stock);
+ drain_obj_stock(stock);
local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
drain_stock(stock);
- obj_cgroup_put(old);
return 0;
}
@@ -2766,24 +2762,20 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
}
/* Replace the stock objcg with objcg, return the old objcg */
-static struct obj_cgroup *replace_stock_objcg(struct memcg_stock_pcp *stock,
- struct obj_cgroup *objcg)
+static void replace_stock_objcg(struct memcg_stock_pcp *stock,
+ struct obj_cgroup *objcg)
{
- struct obj_cgroup *old = NULL;
-
- old = drain_obj_stock(stock);
+ drain_obj_stock(stock);
obj_cgroup_get(objcg);
stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
WRITE_ONCE(stock->cached_objcg, objcg);
- return old;
}
static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
enum node_stat_item idx, int nr)
{
struct memcg_stock_pcp *stock;
- struct obj_cgroup *old = NULL;
unsigned long flags;
int *bytes;
@@ -2796,7 +2788,7 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
* changes.
*/
if (READ_ONCE(stock->cached_objcg) != objcg) {
- old = replace_stock_objcg(stock, objcg);
+ replace_stock_objcg(stock, objcg);
stock->cached_pgdat = pgdat;
} else if (stock->cached_pgdat != pgdat) {
/* Flush the existing cached vmstat data */
@@ -2837,7 +2829,6 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
__mod_objcg_mlstate(objcg, pgdat, idx, nr);
local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
- obj_cgroup_put(old);
}
static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -2859,12 +2850,12 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
return ret;
}
-static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
+static void drain_obj_stock(struct memcg_stock_pcp *stock)
{
struct obj_cgroup *old = READ_ONCE(stock->cached_objcg);
if (!old)
- return NULL;
+ return;
if (stock->nr_bytes) {
unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
@@ -2917,11 +2908,7 @@ static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
}
WRITE_ONCE(stock->cached_objcg, NULL);
- /*
- * The `old' objects needs to be released by the caller via
- * obj_cgroup_put() outside of memcg_stock_pcp::stock_lock.
- */
- return old;
+ obj_cgroup_put(old);
}
static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
@@ -2943,7 +2930,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
bool allow_uncharge)
{
struct memcg_stock_pcp *stock;
- struct obj_cgroup *old = NULL;
unsigned long flags;
unsigned int nr_pages = 0;
@@ -2951,7 +2937,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
stock = this_cpu_ptr(&memcg_stock);
if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
- old = replace_stock_objcg(stock, objcg);
+ replace_stock_objcg(stock, objcg);
allow_uncharge = true; /* Allow uncharge when objcg changes */
}
stock->nr_bytes += nr_bytes;
@@ -2962,7 +2948,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
}
local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
- obj_cgroup_put(old);
if (nr_pages)
obj_cgroup_uncharge_pages(objcg, nr_pages);
--
2.47.1
* [PATCH v2 7/9] memcg: use __mod_memcg_state in drain_obj_stock
2025-04-04 1:39 [PATCH v2 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (5 preceding siblings ...)
2025-04-04 1:39 ` [PATCH v2 6/9] memcg: do obj_cgroup_put inside drain_obj_stock Shakeel Butt
@ 2025-04-04 1:39 ` Shakeel Butt
2025-04-04 1:39 ` [PATCH v2 8/9] memcg: combine slab obj stock charging and accounting Shakeel Butt
2025-04-04 1:39 ` [PATCH v2 9/9] memcg: manually inline replace_stock_objcg Shakeel Butt
8 siblings, 0 replies; 11+ messages in thread
From: Shakeel Butt @ 2025-04-04 1:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
For non-PREEMPT_RT kernels, drain_obj_stock() is always called with irqs
disabled, so we can use __mod_memcg_state() instead of mod_memcg_state().
For PREEMPT_RT, we need to add memcg_stats_[un]lock() in
__mod_memcg_state().
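For context, memcg_stats_[un]lock() amounts to an RT-only preemption
guard plus a debug assertion; roughly, as a simplified sketch of the
helpers in mm/memcontrol.c (preempt_{dis,en}able_nested() are plain
barriers on !PREEMPT_RT and preempt_{dis,en}able() on PREEMPT_RT):

static void memcg_stats_lock(void)
{
	preempt_disable_nested();
	VM_WARN_ON_IRQS_ENABLED();
}

static void memcg_stats_unlock(void)
{
	preempt_enable_nested();
}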
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7988a42b29bf..33aeddfff0ba 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -710,10 +710,12 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
return;
+ memcg_stats_lock();
__this_cpu_add(memcg->vmstats_percpu->state[i], val);
val = memcg_state_val_in_pages(idx, val);
memcg_rstat_updated(memcg, val);
trace_mod_memcg_state(memcg, idx, val);
+ memcg_stats_unlock();
}
#ifdef CONFIG_MEMCG_V1
@@ -2866,7 +2868,7 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
memcg = get_mem_cgroup_from_objcg(old);
- mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
+ __mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
memcg1_account_kmem(memcg, -nr_pages);
if (!mem_cgroup_is_root(memcg))
memcg_uncharge(memcg, nr_pages);
--
2.47.1
* [PATCH v2 8/9] memcg: combine slab obj stock charging and accounting
2025-04-04 1:39 [PATCH v2 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (6 preceding siblings ...)
2025-04-04 1:39 ` [PATCH v2 7/9] memcg: use __mod_memcg_state in drain_obj_stock Shakeel Butt
@ 2025-04-04 1:39 ` Shakeel Butt
2025-04-04 1:39 ` [PATCH v2 9/9] memcg: manually inline replace_stock_objcg Shakeel Butt
8 siblings, 0 replies; 11+ messages in thread
From: Shakeel Butt @ 2025-04-04 1:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
From: Vlastimil Babka <vbabka@suse.cz>
When handling slab objects, we use obj_cgroup_[un]charge() for
(un)charging and mod_objcg_state() to account NR_SLAB_[UN]RECLAIMABLE_B.
All these operations use the percpu stock for performance. However, with
the calls being separate, the stock_lock is taken twice in each case.
By refactoring the code, we can turn mod_objcg_state() into
__account_obj_stock(), which is called on a stock that is already locked
and validated. On the charging side we can call this function from
consume_obj_stock() when it succeeds, and from refill_obj_stock() in the
fallback. We just expand the parameters of these functions as necessary.
The uncharge side from __memcg_slab_free_hook() is then just the call to
refill_obj_stock().
Other callers of obj_cgroup_[un]charge() (i.e. not slab) simply pass the
extra parameters as NULL/zeroes to skip the __account_obj_stock()
operation.
In __memcg_slab_post_alloc_hook() we now charge each object separately,
but that is not a problem, as we already called mod_objcg_state() for
each object separately, and most allocations are non-bulk anyway. This
could be improved by batching all operations until slab_pgdat(slab)
changes. The per-object before/after flow is sketched below.
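Per allocated slab object, the flow changes schematically as follows
(function names from this patch; error handling elided):

/* before: two separate stock_lock critical sections per object */
obj_cgroup_charge(objcg, flags, obj_full_size(s));
mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
		obj_full_size(s));

/* after: one critical section charges and accounts together */
obj_cgroup_charge_account(objcg, flags, obj_full_size(s),
			  slab_pgdat(slab), cache_vmstat_idx(s));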
Some preliminary benchmarking with a kfree(kmalloc()) loop of 10M
iterations with/without __GFP_ACCOUNT:
Before the patch:
kmalloc/kfree !memcg: 581390144 cycles
kmalloc/kfree memcg: 783689984 cycles
After the patch:
kmalloc/kfree memcg: 658723808 cycles
More than half of the overhead of __GFP_ACCOUNT relative to the
non-accounted case seems eliminated: the accounting overhead drops from
roughly 202M to 77M cycles over the 10M iterations, i.e. about 62% of it
is gone.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 77 +++++++++++++++++++++++++++++--------------------
1 file changed, 46 insertions(+), 31 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 33aeddfff0ba..3bb02f672e39 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2774,25 +2774,17 @@ static void replace_stock_objcg(struct memcg_stock_pcp *stock,
WRITE_ONCE(stock->cached_objcg, objcg);
}
-static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
- enum node_stat_item idx, int nr)
+static void __account_obj_stock(struct obj_cgroup *objcg,
+ struct memcg_stock_pcp *stock, int nr,
+ struct pglist_data *pgdat, enum node_stat_item idx)
{
- struct memcg_stock_pcp *stock;
- unsigned long flags;
int *bytes;
- local_lock_irqsave(&memcg_stock.stock_lock, flags);
- stock = this_cpu_ptr(&memcg_stock);
-
/*
* Save vmstat data in stock and skip vmstat array update unless
- * accumulating over a page of vmstat data or when pgdat or idx
- * changes.
+ * accumulating over a page of vmstat data or when pgdat changes.
*/
- if (READ_ONCE(stock->cached_objcg) != objcg) {
- replace_stock_objcg(stock, objcg);
- stock->cached_pgdat = pgdat;
- } else if (stock->cached_pgdat != pgdat) {
+ if (stock->cached_pgdat != pgdat) {
/* Flush the existing cached vmstat data */
struct pglist_data *oldpg = stock->cached_pgdat;
@@ -2829,11 +2821,10 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
}
if (nr)
__mod_objcg_mlstate(objcg, pgdat, idx, nr);
-
- local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
}
-static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
+static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
+ struct pglist_data *pgdat, enum node_stat_item idx)
{
struct memcg_stock_pcp *stock;
unsigned long flags;
@@ -2845,6 +2836,9 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
if (objcg == READ_ONCE(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) {
stock->nr_bytes -= nr_bytes;
ret = true;
+
+ if (pgdat)
+ __account_obj_stock(objcg, stock, nr_bytes, pgdat, idx);
}
local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
@@ -2929,7 +2923,8 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
}
static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
- bool allow_uncharge)
+ bool allow_uncharge, int nr_acct, struct pglist_data *pgdat,
+ enum node_stat_item idx)
{
struct memcg_stock_pcp *stock;
unsigned long flags;
@@ -2944,6 +2939,9 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
}
stock->nr_bytes += nr_bytes;
+ if (pgdat)
+ __account_obj_stock(objcg, stock, nr_acct, pgdat, idx);
+
if (allow_uncharge && (stock->nr_bytes > PAGE_SIZE)) {
nr_pages = stock->nr_bytes >> PAGE_SHIFT;
stock->nr_bytes &= (PAGE_SIZE - 1);
@@ -2955,12 +2953,13 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
obj_cgroup_uncharge_pages(objcg, nr_pages);
}
-int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
+static int obj_cgroup_charge_account(struct obj_cgroup *objcg, gfp_t gfp, size_t size,
+ struct pglist_data *pgdat, enum node_stat_item idx)
{
unsigned int nr_pages, nr_bytes;
int ret;
- if (consume_obj_stock(objcg, size))
+ if (likely(consume_obj_stock(objcg, size, pgdat, idx)))
return 0;
/*
@@ -2993,15 +2992,21 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
nr_pages += 1;
ret = obj_cgroup_charge_pages(objcg, gfp, nr_pages);
- if (!ret && nr_bytes)
- refill_obj_stock(objcg, PAGE_SIZE - nr_bytes, false);
+ if (!ret && (nr_bytes || pgdat))
+ refill_obj_stock(objcg, nr_bytes ? PAGE_SIZE - nr_bytes : 0,
+ false, size, pgdat, idx);
return ret;
}
+int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
+{
+ return obj_cgroup_charge_account(objcg, gfp, size, NULL, 0);
+}
+
void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
{
- refill_obj_stock(objcg, size, true);
+ refill_obj_stock(objcg, size, true, 0, NULL, 0);
}
static inline size_t obj_full_size(struct kmem_cache *s)
@@ -3053,23 +3058,32 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
return false;
}
- if (obj_cgroup_charge(objcg, flags, size * obj_full_size(s)))
- return false;
-
for (i = 0; i < size; i++) {
slab = virt_to_slab(p[i]);
if (!slab_obj_exts(slab) &&
alloc_slab_obj_exts(slab, s, flags, false)) {
- obj_cgroup_uncharge(objcg, obj_full_size(s));
continue;
}
+ /*
+ * if we fail and size is 1, memcg_alloc_abort_single() will
+ * just free the object, which is ok as we have not assigned
+ * objcg to its obj_ext yet
+ *
+ * for larger sizes, kmem_cache_free_bulk() will uncharge
+ * any objects that were already charged and obj_ext assigned
+ *
+ * TODO: we could batch this until slab_pgdat(slab) changes
+ * between iterations, with a more complicated undo
+ */
+ if (obj_cgroup_charge_account(objcg, flags, obj_full_size(s),
+ slab_pgdat(slab), cache_vmstat_idx(s)))
+ return false;
+
off = obj_to_index(s, slab, p[i]);
obj_cgroup_get(objcg);
slab_obj_exts(slab)[off].objcg = objcg;
- mod_objcg_state(objcg, slab_pgdat(slab),
- cache_vmstat_idx(s), obj_full_size(s));
}
return true;
@@ -3078,6 +3092,8 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
void **p, int objects, struct slabobj_ext *obj_exts)
{
+ size_t obj_size = obj_full_size(s);
+
for (int i = 0; i < objects; i++) {
struct obj_cgroup *objcg;
unsigned int off;
@@ -3088,9 +3104,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
continue;
obj_exts[off].objcg = NULL;
- obj_cgroup_uncharge(objcg, obj_full_size(s));
- mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
- -obj_full_size(s));
+ refill_obj_stock(objcg, obj_size, true, -obj_size,
+ slab_pgdat(slab), cache_vmstat_idx(s));
obj_cgroup_put(objcg);
}
}
--
2.47.1
* [PATCH v2 9/9] memcg: manually inline replace_stock_objcg
2025-04-04 1:39 [PATCH v2 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (7 preceding siblings ...)
2025-04-04 1:39 ` [PATCH v2 8/9] memcg: combine slab obj stock charging and accounting Shakeel Butt
@ 2025-04-04 1:39 ` Shakeel Butt
8 siblings, 0 replies; 11+ messages in thread
From: Shakeel Butt @ 2025-04-04 1:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
replace_stock_objcg() is now called only from refill_obj_stock(), so
manually inline it there.
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3bb02f672e39..aebb1f2c8657 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2763,17 +2763,6 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
obj_cgroup_put(objcg);
}
-/* Replace the stock objcg with objcg, return the old objcg */
-static void replace_stock_objcg(struct memcg_stock_pcp *stock,
- struct obj_cgroup *objcg)
-{
- drain_obj_stock(stock);
- obj_cgroup_get(objcg);
- stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
- ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
- WRITE_ONCE(stock->cached_objcg, objcg);
-}
-
static void __account_obj_stock(struct obj_cgroup *objcg,
struct memcg_stock_pcp *stock, int nr,
struct pglist_data *pgdat, enum node_stat_item idx)
@@ -2934,7 +2923,12 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
stock = this_cpu_ptr(&memcg_stock);
if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
- replace_stock_objcg(stock, objcg);
+ drain_obj_stock(stock);
+ obj_cgroup_get(objcg);
+ stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
+ ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
+ WRITE_ONCE(stock->cached_objcg, objcg);
+
allow_uncharge = true; /* Allow uncharge when objcg changes */
}
stock->nr_bytes += nr_bytes;
--
2.47.1
* Re: [PATCH v2 6/9] memcg: do obj_cgroup_put inside drain_obj_stock
2025-04-04 1:39 ` [PATCH v2 6/9] memcg: do obj_cgroup_put inside drain_obj_stock Shakeel Butt
@ 2025-04-11 8:34 ` Vlastimil Babka
0 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2025-04-11 8:34 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
Meta kernel team
On 4/4/25 03:39, Shakeel Butt wrote:
> Previously we could not call obj_cgroup_put() inside the local lock
> because on the put on the last reference, the release function
> obj_cgroup_release() may try to re-acquire the local lock. However that
> chain has been broken. Now simply do obj_cgroup_put() inside
> drain_obj_stock() instead of returning the old objcg.
>
> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>