* [next -v1 0/5] Some cleanup for memcg
@ 2024-12-06 1:35 Chen Ridong
2024-12-06 1:35 ` [next -v1 1/5] memcg: use OFP_PEAK_UNSET instead of -1 Chen Ridong
` (5 more replies)
0 siblings, 6 replies; 21+ messages in thread
From: Chen Ridong @ 2024-12-06 1:35 UTC (permalink / raw)
To: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka
Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
From: Chen Ridong <chenridong@huawei.com>
Chen Ridong (5):
memcg: use OFP_PEAK_UNSET instead of -1
memcg: call the free function when allocation of pn fails
memcg: simplify the mem_cgroup_update_lru_size function
memcg: factor out the __refill_obj_stock function
memcg: factor out stat(event)/stat_local(event_local) reading
functions
mm/memcontrol.c | 140 ++++++++++++++++++++++--------------------------
1 file changed, 63 insertions(+), 77 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 21+ messages in thread
* [next -v1 1/5] memcg: use OFP_PEAK_UNSET instead of -1
2024-12-06 1:35 [next -v1 0/5] Some cleanup for memcg Chen Ridong
@ 2024-12-06 1:35 ` Chen Ridong
2024-12-17 12:27 ` Michal Koutný
2024-12-17 17:47 ` Shakeel Butt
2024-12-06 1:35 ` [next -v1 2/5] memcg: call the free function when allocation of pn fails Chen Ridong
` (4 subsequent siblings)
5 siblings, 2 replies; 21+ messages in thread
From: Chen Ridong @ 2024-12-06 1:35 UTC (permalink / raw)
To: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka
Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
From: Chen Ridong <chenridong@huawei.com>
The 'OFP_PEAK_UNSET' macro is already defined; use it instead of the
magic value '-1'.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b3503d12aaf..fc18633aba5d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4014,7 +4014,7 @@ static ssize_t peak_write(struct kernfs_open_file *of, char *buf, size_t nbytes,
WRITE_ONCE(peer_ctx->value, usage);
/* initial write, register watcher */
- if (ofp->value == -1)
+ if (ofp->value == OFP_PEAK_UNSET)
list_add(&ofp->list, watchers);
WRITE_ONCE(ofp->value, usage);
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [next -v1 2/5] memcg: call the free function when allocation of pn fails
2024-12-06 1:35 [next -v1 0/5] Some cleanup for memcg Chen Ridong
2024-12-06 1:35 ` [next -v1 1/5] memcg: use OFP_PEAK_UNSET instead of -1 Chen Ridong
@ 2024-12-06 1:35 ` Chen Ridong
2024-12-17 12:27 ` Michal Koutný
2024-12-17 17:47 ` Shakeel Butt
2024-12-06 1:35 ` [next -v1 3/5] memcg: simplify the mem_cgroup_update_lru_size function Chen Ridong
` (3 subsequent siblings)
5 siblings, 2 replies; 21+ messages in thread
From: Chen Ridong @ 2024-12-06 1:35 UTC (permalink / raw)
To: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka
Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
From: Chen Ridong <chenridong@huawei.com>
The 'free_mem_cgroup_per_node_info' function frees the
'mem_cgroup_per_node' struct, so taking 'pn' directly as its argument is
much clearer.
Call 'free_mem_cgroup_per_node_info' when 'alloc_mem_cgroup_per_node_info'
fails, to free 'pn' as a whole, which makes the code more cohesive.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
mm/memcontrol.c | 27 ++++++++++++---------------
1 file changed, 12 insertions(+), 15 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fc18633aba5d..da6e4e9bd0fa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3434,6 +3434,16 @@ struct mem_cgroup *mem_cgroup_get_from_ino(unsigned long ino)
}
#endif
+static void free_mem_cgroup_per_node_info(struct mem_cgroup_per_node *pn)
+{
+ if (!pn)
+ return;
+
+ free_percpu(pn->lruvec_stats_percpu);
+ kfree(pn->lruvec_stats);
+ kfree(pn);
+}
+
static bool alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
struct mem_cgroup_per_node *pn;
@@ -3458,23 +3468,10 @@ static bool alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
memcg->nodeinfo[node] = pn;
return true;
fail:
- kfree(pn->lruvec_stats);
- kfree(pn);
+ free_mem_cgroup_per_node_info(pn);
return false;
}
-static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
-{
- struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
-
- if (!pn)
- return;
-
- free_percpu(pn->lruvec_stats_percpu);
- kfree(pn->lruvec_stats);
- kfree(pn);
-}
-
static void __mem_cgroup_free(struct mem_cgroup *memcg)
{
int node;
@@ -3482,7 +3479,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
obj_cgroup_put(memcg->orig_objcg);
for_each_node(node)
- free_mem_cgroup_per_node_info(memcg, node);
+ free_mem_cgroup_per_node_info(memcg->nodeinfo[node]);
memcg1_free_events(memcg);
kfree(memcg->vmstats);
free_percpu(memcg->vmstats_percpu);
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [next -v1 3/5] memcg: simplify the mem_cgroup_update_lru_size function
2024-12-06 1:35 [next -v1 0/5] Some cleanup for memcg Chen Ridong
2024-12-06 1:35 ` [next -v1 1/5] memcg: use OFP_PEAK_UNSET instead of -1 Chen Ridong
2024-12-06 1:35 ` [next -v1 2/5] memcg: call the free function when allocation of pn fails Chen Ridong
@ 2024-12-06 1:35 ` Chen Ridong
2024-12-06 5:33 ` Yu Zhao
2024-12-06 1:35 ` [next -v1 4/5] memcg: factor out the __refill_obj_stock function Chen Ridong
` (2 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Chen Ridong @ 2024-12-06 1:35 UTC (permalink / raw)
To: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka
Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
From: Chen Ridong <chenridong@huawei.com>
In the `mem_cgroup_update_lru_size` function, the `lru_size` should be
updated by adding `nr_pages` regardless of whether `nr_pages` is greater
than 0 or less than 0. To simplify this function, add a check for
`nr_pages` == 0. When `nr_pages` is not equal to 0, perform the same
actions.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
mm/memcontrol.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index da6e4e9bd0fa..f977e0be1c04 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1280,15 +1280,14 @@ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
unsigned long *lru_size;
long size;
- if (mem_cgroup_disabled())
+ if (mem_cgroup_disabled() || !nr_pages)
return;
mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
lru_size = &mz->lru_zone_size[zid][lru];
- if (nr_pages < 0)
- *lru_size += nr_pages;
+ *lru_size += nr_pages;
size = *lru_size;
if (WARN_ONCE(size < 0,
"%s(%p, %d, %d): lru_size %ld\n",
@@ -1296,9 +1295,6 @@ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
VM_BUG_ON(1);
*lru_size = 0;
}
-
- if (nr_pages > 0)
- *lru_size += nr_pages;
}
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [next -v1 4/5] memcg: factor out the __refill_obj_stock function
2024-12-06 1:35 [next -v1 0/5] Some cleanup for memcg Chen Ridong
` (2 preceding siblings ...)
2024-12-06 1:35 ` [next -v1 3/5] memcg: simplify the mem_cgroup_update_lru_size function Chen Ridong
@ 2024-12-06 1:35 ` Chen Ridong
2024-12-17 18:19 ` Shakeel Butt
2024-12-06 1:35 ` [next -v1 5/5] memcg: factor out stat(event)/stat_local(event_local) reading functions Chen Ridong
2024-12-14 1:47 ` [next -v1 0/5] Some cleanup for memcg Chen Ridong
5 siblings, 1 reply; 21+ messages in thread
From: Chen Ridong @ 2024-12-06 1:35 UTC (permalink / raw)
To: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka
Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
From: Chen Ridong <chenridong@huawei.com>
Factor out the '__refill_obj_stock' function to make the code more
cohesive.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
mm/memcontrol.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f977e0be1c04..0c9331d7b606 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2697,6 +2697,21 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
obj_cgroup_put(objcg);
}
+/* If the cached_objcg was refilled, return true; otherwise, return false */
+static bool __refill_obj_stock(struct memcg_stock_pcp *stock,
+ struct obj_cgroup *objcg, struct obj_cgroup **old_objcg)
+{
+ if (READ_ONCE(stock->cached_objcg) != objcg) {
+ *old_objcg = drain_obj_stock(stock);
+ obj_cgroup_get(objcg);
+ stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
+ ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
+ WRITE_ONCE(stock->cached_objcg, objcg);
+ return true;
+ }
+ return false;
+}
+
static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
enum node_stat_item idx, int nr)
{
@@ -2713,12 +2728,7 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
* accumulating over a page of vmstat data or when pgdat or idx
* changes.
*/
- if (READ_ONCE(stock->cached_objcg) != objcg) {
- old = drain_obj_stock(stock);
- obj_cgroup_get(objcg);
- stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
- ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
- WRITE_ONCE(stock->cached_objcg, objcg);
+ if (__refill_obj_stock(stock, objcg, &old)) {
stock->cached_pgdat = pgdat;
} else if (stock->cached_pgdat != pgdat) {
/* Flush the existing cached vmstat data */
@@ -2871,14 +2881,9 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
- if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
- old = drain_obj_stock(stock);
- obj_cgroup_get(objcg);
- WRITE_ONCE(stock->cached_objcg, objcg);
- stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
- ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
+ if (__refill_obj_stock(stock, objcg, &old))
allow_uncharge = true; /* Allow uncharge when objcg changes */
- }
+
stock->nr_bytes += nr_bytes;
if (allow_uncharge && (stock->nr_bytes > PAGE_SIZE)) {
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [next -v1 5/5] memcg: factor out stat(event)/stat_local(event_local) reading functions
2024-12-06 1:35 [next -v1 0/5] Some cleanup for memcg Chen Ridong
` (3 preceding siblings ...)
2024-12-06 1:35 ` [next -v1 4/5] memcg: factor out the __refill_obj_stock function Chen Ridong
@ 2024-12-06 1:35 ` Chen Ridong
2024-12-20 9:51 ` Chen Ridong
2024-12-14 1:47 ` [next -v1 0/5] Some cleanup for memcg Chen Ridong
5 siblings, 1 reply; 21+ messages in thread
From: Chen Ridong @ 2024-12-06 1:35 UTC (permalink / raw)
To: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka
Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
From: Chen Ridong <chenridong@huawei.com>
The only difference between 'lruvec_page_state' and
'lruvec_page_state_local' is that they read 'state' and 'state_local',
respectively. Factor out an inner function to make the code more concise.
Do the same for reading 'memcg_page_state' and 'memcg_events'.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
mm/memcontrol.c | 72 +++++++++++++++++++++----------------------------
1 file changed, 30 insertions(+), 42 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0c9331d7b606..a1b1409a0dce 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -375,7 +375,8 @@ struct lruvec_stats {
long state_pending[NR_MEMCG_NODE_STAT_ITEMS];
};
-unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx)
+static unsigned long __lruvec_page_state(struct lruvec *lruvec,
+ enum node_stat_item idx, bool local)
{
struct mem_cgroup_per_node *pn;
long x;
@@ -389,7 +390,8 @@ unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx)
return 0;
pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
- x = READ_ONCE(pn->lruvec_stats->state[i]);
+ x = local ? READ_ONCE(pn->lruvec_stats->state_local[i]) :
+ READ_ONCE(pn->lruvec_stats->state[i]);
#ifdef CONFIG_SMP
if (x < 0)
x = 0;
@@ -397,27 +399,16 @@ unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx)
return x;
}
+
+unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx)
+{
+ return __lruvec_page_state(lruvec, idx, false);
+}
+
unsigned long lruvec_page_state_local(struct lruvec *lruvec,
enum node_stat_item idx)
{
- struct mem_cgroup_per_node *pn;
- long x;
- int i;
-
- if (mem_cgroup_disabled())
- return node_page_state(lruvec_pgdat(lruvec), idx);
-
- i = memcg_stats_index(idx);
- if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
- return 0;
-
- pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
- x = READ_ONCE(pn->lruvec_stats->state_local[i]);
-#ifdef CONFIG_SMP
- if (x < 0)
- x = 0;
-#endif
- return x;
+ return __lruvec_page_state(lruvec, idx, true);
}
/* Subset of vm_event_item to report for memcg event stats */
@@ -651,7 +642,7 @@ static void flush_memcg_stats_dwork(struct work_struct *w)
queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);
}
-unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
+static unsigned long __memcg_page_state(struct mem_cgroup *memcg, int idx, bool local)
{
long x;
int i = memcg_stats_index(idx);
@@ -659,7 +650,9 @@ unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
return 0;
- x = READ_ONCE(memcg->vmstats->state[i]);
+ x = local ? READ_ONCE(memcg->vmstats->state_local[i]) :
+ READ_ONCE(memcg->vmstats->state[i]);
+
#ifdef CONFIG_SMP
if (x < 0)
x = 0;
@@ -667,6 +660,11 @@ unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
return x;
}
+unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
+{
+ return __memcg_page_state(memcg, idx, false);
+}
+
static int memcg_page_state_unit(int item);
/*
@@ -709,18 +707,7 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
/* idx can be of type enum memcg_stat_item or node_stat_item. */
unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
{
- long x;
- int i = memcg_stats_index(idx);
-
- if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
- return 0;
-
- x = READ_ONCE(memcg->vmstats->state_local[i]);
-#ifdef CONFIG_SMP
- if (x < 0)
- x = 0;
-#endif
- return x;
+ return __memcg_page_state(memcg, idx, true);
}
static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
@@ -859,24 +846,25 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
memcg_stats_unlock();
}
-unsigned long memcg_events(struct mem_cgroup *memcg, int event)
+static unsigned long __memcg_events(struct mem_cgroup *memcg, int event, bool local)
{
int i = memcg_events_index(event);
if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, event))
return 0;
- return READ_ONCE(memcg->vmstats->events[i]);
+ return local ? READ_ONCE(memcg->vmstats->events_local[i]) :
+ READ_ONCE(memcg->vmstats->events[i]);
}
-unsigned long memcg_events_local(struct mem_cgroup *memcg, int event)
+unsigned long memcg_events(struct mem_cgroup *memcg, int event)
{
- int i = memcg_events_index(event);
-
- if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, event))
- return 0;
+ return __memcg_events(memcg, event, false);
+}
- return READ_ONCE(memcg->vmstats->events_local[i]);
+unsigned long memcg_events_local(struct mem_cgroup *memcg, int event)
+{
+ return __memcg_events(memcg, event, true);
}
struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [next -v1 3/5] memcg: simplify the mem_cgroup_update_lru_size function
2024-12-06 1:35 ` [next -v1 3/5] memcg: simplify the mem_cgroup_update_lru_size function Chen Ridong
@ 2024-12-06 5:33 ` Yu Zhao
2024-12-06 6:40 ` Chen Ridong
0 siblings, 1 reply; 21+ messages in thread
From: Yu Zhao @ 2024-12-06 5:33 UTC (permalink / raw)
To: Chen Ridong, Hugh Dickins
Cc: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka, linux-mm, linux-kernel, cgroups,
chenridong, wangweiyang2
On Thu, Dec 5, 2024 at 6:45 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
>
> From: Chen Ridong <chenridong@huawei.com>
>
> In the `mem_cgroup_update_lru_size` function, the `lru_size` should be
> updated by adding `nr_pages` regardless of whether `nr_pages` is greater
> than 0 or less than 0. To simplify this function, add a check for
> `nr_pages` == 0. When `nr_pages` is not equal to 0, perform the same
> actions.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
NAK.
The commit that added that clearly explains why it was done that way.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [next -v1 3/5] memcg: simplify the mem_cgroup_update_lru_size function
2024-12-06 5:33 ` Yu Zhao
@ 2024-12-06 6:40 ` Chen Ridong
2024-12-06 8:24 ` Hugh Dickins
0 siblings, 1 reply; 21+ messages in thread
From: Chen Ridong @ 2024-12-06 6:40 UTC (permalink / raw)
To: Yu Zhao, Hugh Dickins
Cc: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka, linux-mm, linux-kernel, cgroups,
chenridong, wangweiyang2
On 2024/12/6 13:33, Yu Zhao wrote:
> On Thu, Dec 5, 2024 at 6:45 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
>>
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> In the `mem_cgroup_update_lru_size` function, the `lru_size` should be
>> updated by adding `nr_pages` regardless of whether `nr_pages` is greater
>> than 0 or less than 0. To simplify this function, add a check for
>> `nr_pages` == 0. When `nr_pages` is not equal to 0, perform the same
>> actions.
>>
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>
> NAK.
>
> The commit that added that clearly explains why it was done that way.
Thank you for your reply.
I have read the commit message for ca707239e8a7 ("mm: update_lru_size
warn and reset bad lru_size") before sending my patch. However, I did
not quite understand why we need to deal with the difference between
'nr_pages > 0' and 'nr_pages < 0'.
The 'lru_zone_size' can only be modified in the
'mem_cgroup_update_lru_size' function. Only subtracting 'nr_pages' or
adding 'nr_pages' in a way that causes an overflow can make the size < 0.
For case 1, subtracting 'nr_pages', we should issue a warning if the
size goes below 0. For case 2, when adding 'nr_pages' results in an
overflow, there will be no warning and the size won't be reset to 0 the
first time it occurs. It might be that a warning will be issued the
next time 'mem_cgroup_update_lru_size' is called to modify the
'lru_zone_size'. However, as the commit message said, "the first
occurrence is the most informative," and it seems we have missed that
first occurrence.
As the commit message said: "and then the vast unsigned long size draws
page reclaim into a loop of repeatedly", I think that a warning should
be issued and 'lru_zone_size' should be reset whenever 'size < 0' occurs
for the first time, whether from subtracting or adding 'nr_pages'.
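The "vast unsigned long size" quoted above comes from the mixed
signedness in this function: 'lru_zone_size' is an unsigned long, while
'nr_pages' and the local 'size' are signed. A minimal userspace sketch
(hypothetical names, not kernel code) of why a bad decrement shows up as
"size < 0":

```c
#include <assert.h>
#include <stdio.h>

/* Sketch: subtracting more pages than are accounted wraps the unsigned
 * counter to a huge value; reading it back as a signed long exposes the
 * wrap as a negative size that can be warned about and reset. */
static long demo_bad_decrement(void)
{
	unsigned long lru_size = 2;	/* only 2 pages accounted */
	long nr_pages = -5;		/* but 5 pages being removed */

	lru_size += nr_pages;		/* wraps: 2 - 5 modulo 2^64 */
	printf("unsigned view: %lu\n", lru_size);

	/* Two's complement on Linux targets: the wrapped value reads
	 * back as -3, i.e. "size < 0". */
	return (long)lru_size;
}
```

Left unchecked, that wrapped unsigned value is what page reclaim would
see as an enormous list size, hence the reclaim loop the commit message
warns about.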
I would be grateful if you could explain in more detail; this has
confused me for a while. Thank you very much.
Best regards,
Ridong
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [next -v1 3/5] memcg: simplify the mem_cgroup_update_lru_size function
2024-12-06 6:40 ` Chen Ridong
@ 2024-12-06 8:24 ` Hugh Dickins
2024-12-06 10:02 ` Chen Ridong
2024-12-06 21:20 ` Shakeel Butt
0 siblings, 2 replies; 21+ messages in thread
From: Hugh Dickins @ 2024-12-06 8:24 UTC (permalink / raw)
To: Chen Ridong
Cc: Yu Zhao, Hugh Dickins, akpm, mhocko, hannes, yosryahmed,
roman.gushchin, shakeel.butt, muchun.song, davidf, vbabka,
linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
[-- Attachment #1: Type: text/plain, Size: 3430 bytes --]
On Fri, 6 Dec 2024, Chen Ridong wrote:
> On 2024/12/6 13:33, Yu Zhao wrote:
> > On Thu, Dec 5, 2024 at 6:45 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
> >> From: Chen Ridong <chenridong@huawei.com>
> >>
> >> In the `mem_cgroup_update_lru_size` function, the `lru_size` should be
> >> updated by adding `nr_pages` regardless of whether `nr_pages` is greater
> >> than 0 or less than 0. To simplify this function, add a check for
> >> `nr_pages` == 0. When `nr_pages` is not equal to 0, perform the same
> >> actions.
> >>
> >> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> >
> > NAK.
> >
> > The commit that added that clearly explains why it was done that way.
Thanks Yu: I did spot this going by, but was indeed hoping that someone
else would NAK it, with more politeness than I could muster at the time!
>
> Thank you for your reply.
>
> I have read the commit message for ca707239e8a7 ("mm: update_lru_size
> warn and reset bad lru_size") before sending my patch. However, I did
> not quite understand why we need to deal with the difference between
> 'nr_pages > 0' and 'nr_pages < 0'.
>
>
> The 'lru_zone_size' can only be modified in the
> 'mem_cgroup_update_lru_size' function. Only subtracting 'nr_pages' or
> adding 'nr_pages' in a way that causes an overflow can make the size < 0.
>
> For case 1, subtracting 'nr_pages', we should issue a warning if the
> size goes below 0. For case 2, when adding 'nr_pages' results in an
> overflow, there will be no warning and the size won't be reset to 0 the
> first time it occurs. It might be that a warning will be issued the
> next time 'mem_cgroup_update_lru_size' is called to modify the
> 'lru_zone_size'. However, as the commit message said, "the first
> occurrence is the most informative," and it seems we have missed that
> first occurrence.
>
> As the commit message said: "and then the vast unsigned long size draws
> page reclaim into a loop of repeatedly", I think that a warning should
> be issued and 'lru_zone_size' should be reset whenever 'size < 0' occurs
> for the first time, whether from subtracting or adding 'nr_pages'.
One thing, not obvious, but important to understand, is that WARN_ONCE()
only issues a warning the first time its condition is found true, but
it returns the true or false of the condition every time. So lru_size
is forced to 0 each time an inconsistency is detected.
(But IIRC VM_WARN_ON_ONCE() does not behave in that useful way; or maybe
I've got that macro name wrong - we have so many slightly differing.)
Perhaps understanding that will help you to make better sense of the
order of events in this function.
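The behaviour described above can be sketched in userspace (a
hypothetical simplification with made-up names, not the kernel macro
itself): the message fires only once, but the condition's truth value is
returned on every call, so the reset runs each time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the kernel's WARN_ONCE(): warn only on the first true
 * condition, but always return whether the condition held. */
static bool warn_once_sim(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

static long lru_size;

/* Mirrors the reset logic in mem_cgroup_update_lru_size(): even after
 * the one-shot warning has fired, lru_size is still forced back to 0
 * every time an inconsistency is detected, because the return value is
 * checked on each call. */
static void update_lru_size(long nr_pages)
{
	lru_size += nr_pages;
	if (warn_once_sim(lru_size < 0, "bad lru_size"))
		lru_size = 0;	/* reset every time, warning only once */
}
```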
Another thing to understand: it's called before adding folio to list,
but after removing folio from list: when it can usefully compare whether
the emptiness of the list correctly matches lru_size 0. It cannot do so
when adding if you "simplify" it in the way that you did.
You might be focusing too much on the "size < 0" aspect of it, or you
might be worrying more than I did about size growing larger and larger
until it wraps to negative (not likely on 64-bit, unless corrupted).
I hope these remarks help, but you need to think through it again
for yourself.
Hugh
>
> I would be grateful if you could explain in more detail; this has
> confused me for a while. Thank you very much.
>
> Best regards,
> Ridong
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [next -v1 3/5] memcg: simplify the mem_cgroup_update_lru_size function
2024-12-06 8:24 ` Hugh Dickins
@ 2024-12-06 10:02 ` Chen Ridong
2024-12-06 21:20 ` Shakeel Butt
1 sibling, 0 replies; 21+ messages in thread
From: Chen Ridong @ 2024-12-06 10:02 UTC (permalink / raw)
To: Hugh Dickins
Cc: Yu Zhao, akpm, mhocko, hannes, yosryahmed, roman.gushchin,
shakeel.butt, muchun.song, davidf, vbabka, linux-mm, linux-kernel,
cgroups, chenridong, wangweiyang2
On 2024/12/6 16:24, Hugh Dickins wrote:
> On Fri, 6 Dec 2024, Chen Ridong wrote:
>> On 2024/12/6 13:33, Yu Zhao wrote:
>>> On Thu, Dec 5, 2024 at 6:45 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>
>>>> In the `mem_cgroup_update_lru_size` function, the `lru_size` should be
>>>> updated by adding `nr_pages` regardless of whether `nr_pages` is greater
>>>> than 0 or less than 0. To simplify this function, add a check for
>>>> `nr_pages` == 0. When `nr_pages` is not equal to 0, perform the same
>>>> actions.
>>>>
>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>
>>> NAK.
>>>
>>> The commit that added that clearly explains why it was done that way.
>
> Thanks Yu: I did spot this going by, but was indeed hoping that someone
> else would NAK it, with more politeness than I could muster at the time!
>
>>
>> Thank you for your reply.
>>
>> I have read the commit message for ca707239e8a7 ("mm: update_lru_size
>> warn and reset bad lru_size") before sending my patch. However, I did
>> not quite understand why we need to deal with the difference between
>> 'nr_pages > 0' and 'nr_pages < 0'.
>>
>>
>> The 'lru_zone_size' can only be modified in the
>> 'mem_cgroup_update_lru_size' function. Only subtracting 'nr_pages' or
>> adding 'nr_pages' in a way that causes an overflow can make the size < 0.
>>
>> For case 1, subtracting 'nr_pages', we should issue a warning if the
>> size goes below 0. For case 2, when adding 'nr_pages' results in an
>> overflow, there will be no warning and the size won't be reset to 0 the
>> first time it occurs . It might be that a warning will be issued the
>> next time 'mem_cgroup_update_lru_size' is called to modify the
>> 'lru_zone_size'. However, as the commit message said, "the first
>> occurrence is the most informative," and it seems we have missed that
>> first occurrence.
>>
>> As the commit message said: "and then the vast unsigned long size draws
>> page reclaim into a loop of repeatedly", I think that a warning should
>> be issued and 'lru_zone_size' should be reset whenever 'size < 0' occurs
>> for the first time, whether from subtracting or adding 'nr_pages'.
>
> One thing, not obvious, but important to understand, is that WARN_ONCE()
> only issues a warning the first time its condition is found true, but
> it returns the true or false of the condition every time. So lru_size
> is forced to 0 each time an inconsistency is detected.
>
Thank you for your explanation.
My patch does not change this logic.
> (But IIRC VM_WARN_ON_ONCE() does not behave in that useful way; or maybe
> I've got that macro name wrong - we have so many slightly differing.)
>
> Perhaps understanding that will help you to make better sense of the
> order of events in this function.
>
> Another thing to understand: it's called before adding folio to list,
> but after removing folio from list: when it can usefully compare whether
> the emptiness of the list correctly matches lru_size 0. It cannot do so
> when adding if you "simplify" it in the way that you did.
>
The commit b4536f0c829c ("mm, memcg: fix the active list aging for
lowmem requests when memcg is enabled") has removed the emptiness check.
What is meaningful is that we can determine whether the size is smaller
than 0 before adding `nr_pages`. However, as I mentioned, the
`lru_zone_size` can only be modified in the `mem_cgroup_update_lru_size`
function, which means that a warning must have been issued if the size
was smaller than 0 before adding `nr_pages` when `nr_pages` is greater
than 0. In this case, it will not issue a warning again.
Perhaps "when it can usefully compare whether the emptiness of the list
correctly matches lru_size 0" is not useful now.
> You might be focusing too much on the "size < 0" aspect of it, or you
> might be worrying more than I did about size growing larger and larger
> until it wraps to negative (not likely on 64-bit, unless corrupted).
> I hope these remarks help, but you need to think through it again
> for yourself.
>
> Hugh
Thank you very much for your patience.
Best regards,
Ridong
>
>>
>> I would be grateful if you could explain in more detail; this has
>> confused me for a while. Thank you very much.
>>
>> Best regards,
>> Ridong
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [next -v1 3/5] memcg: simplify the mem_cgroup_update_lru_size function
2024-12-06 8:24 ` Hugh Dickins
2024-12-06 10:02 ` Chen Ridong
@ 2024-12-06 21:20 ` Shakeel Butt
2024-12-10 11:35 ` Chen Ridong
1 sibling, 1 reply; 21+ messages in thread
From: Shakeel Butt @ 2024-12-06 21:20 UTC (permalink / raw)
To: Hugh Dickins
Cc: Chen Ridong, Yu Zhao, akpm, mhocko, hannes, yosryahmed,
roman.gushchin, muchun.song, davidf, vbabka, linux-mm,
linux-kernel, cgroups, chenridong, wangweiyang2
On Fri, Dec 06, 2024 at 12:24:54AM -0800, Hugh Dickins wrote:
[...]
> Another thing to understand: it's called before adding folio to list,
> but after removing folio from list: when it can usefully compare whether
> the emptiness of the list correctly matches lru_size 0.
I think one source of confusion might be that this "emptiness" check has
been removed by commit b4536f0c829c because of maintaining the list size
per-zone and actual list is shared between zones of a node.
> It cannot do so
> when adding if you "simplify" it in the way that you did.
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [next -v1 3/5] memcg: simplify the mem_cgroup_update_lru_size function
2024-12-06 21:20 ` Shakeel Butt
@ 2024-12-10 11:35 ` Chen Ridong
0 siblings, 0 replies; 21+ messages in thread
From: Chen Ridong @ 2024-12-10 11:35 UTC (permalink / raw)
To: Shakeel Butt, Hugh Dickins
Cc: Yu Zhao, akpm, mhocko, hannes, yosryahmed, roman.gushchin,
muchun.song, davidf, vbabka, linux-mm, linux-kernel, cgroups,
chenridong, wangweiyang2
On 2024/12/7 5:20, Shakeel Butt wrote:
> On Fri, Dec 06, 2024 at 12:24:54AM -0800, Hugh Dickins wrote:
> [...]
>> Another thing to understand: it's called before adding folio to list,
>> but after removing folio from list: when it can usefully compare whether
>> the emptiness of the list correctly matches lru_size 0.
>
> I think one source of confusion might be that this "emptiness" check has
> been removed by commit b4536f0c829c because of maintaining the list size
> per-zone and actual list is shared between zones of a node.
>
Agree.
Maybe it doesn't have to distinguish between "size > 0" and "size < 0" now?
Thanks,
Ridong
>> It cannot do so
>> when adding if you "simplify" it in the way that you did.
>>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [next -v1 0/5] Some cleanup for memcg
2024-12-06 1:35 [next -v1 0/5] Some cleanup for memcg Chen Ridong
` (4 preceding siblings ...)
2024-12-06 1:35 ` [next -v1 5/5] memcg: factor out stat(event)/stat_local(event_local) reading functions Chen Ridong
@ 2024-12-14 1:47 ` Chen Ridong
5 siblings, 0 replies; 21+ messages in thread
From: Chen Ridong @ 2024-12-14 1:47 UTC (permalink / raw)
To: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka
Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
On 2024/12/6 9:35, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
>
> Chen Ridong (5):
> memcg: use OFP_PEAK_UNSET instead of -1
> memcg: call the free function when allocation of pn fails
> memcg: simplify the mem_cgroup_update_lru_size function
> memcg: factor out the __refill_obj_stock function
> memcg: factor out stat(event)/stat_local(event_local) reading
> functions
>
> mm/memcontrol.c | 140 ++++++++++++++++++++++--------------------------
> 1 file changed, 63 insertions(+), 77 deletions(-)
>
Friendly ping.
I am looking forward to having someone review these patches, with the
exception of patch 3. I will drop patch 3 (NAKed by Yu) in the next version.
Best regards,
Ridong
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [next -v1 1/5] memcg: use OFP_PEAK_UNSET instead of -1
2024-12-06 1:35 ` [next -v1 1/5] memcg: use OFP_PEAK_UNSET instead of -1 Chen Ridong
@ 2024-12-17 12:27 ` Michal Koutný
2024-12-17 16:55 ` David Finkel
2024-12-17 17:47 ` Shakeel Butt
1 sibling, 1 reply; 21+ messages in thread
From: Michal Koutný @ 2024-12-17 12:27 UTC (permalink / raw)
To: Chen Ridong
Cc: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka, linux-mm, linux-kernel, cgroups,
chenridong, wangweiyang2
On Fri, Dec 06, 2024 at 01:35:08AM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> The 'OFP_PEAK_UNSET' has been defined, use it instead of '-1'.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> mm/memcontrol.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Michal Koutný <mkoutny@suse.com>
* Re: [next -v1 2/5] memcg: call the free function when allocation of pn fails
2024-12-06 1:35 ` [next -v1 2/5] memcg: call the free function when allocation of pn fails Chen Ridong
@ 2024-12-17 12:27 ` Michal Koutný
2024-12-17 17:47 ` Shakeel Butt
1 sibling, 0 replies; 21+ messages in thread
From: Michal Koutný @ 2024-12-17 12:27 UTC (permalink / raw)
To: Chen Ridong
Cc: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka, linux-mm, linux-kernel, cgroups,
chenridong, wangweiyang2
On Fri, Dec 06, 2024 at 01:35:09AM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> The 'free_mem_cgroup_per_node_info' function is used to free
> the 'mem_cgroup_per_node' struct. Using 'pn' as the input for the
> free_mem_cgroup_per_node_info function will be much clearer.
> Call 'free_mem_cgroup_per_node_info' when 'alloc_mem_cgroup_per_node_info'
> fails, to free 'pn' as a whole, which makes the code more cohesive.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> mm/memcontrol.c | 27 ++++++++++++---------------
> 1 file changed, 12 insertions(+), 15 deletions(-)
(Little of a judgment call but also)
Reviewed-by: Michal Koutný <mkoutny@suse.com>
* Re: [next -v1 1/5] memcg: use OFP_PEAK_UNSET instead of -1
2024-12-17 12:27 ` Michal Koutný
@ 2024-12-17 16:55 ` David Finkel
0 siblings, 0 replies; 21+ messages in thread
From: David Finkel @ 2024-12-17 16:55 UTC (permalink / raw)
To: Michal Koutný
Cc: Chen Ridong, akpm, mhocko, hannes, yosryahmed, roman.gushchin,
shakeel.butt, muchun.song, vbabka, linux-mm, linux-kernel,
cgroups, chenridong, wangweiyang2
On Tue, Dec 17, 2024 at 7:27 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Fri, Dec 06, 2024 at 01:35:08AM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote:
> > From: Chen Ridong <chenridong@huawei.com>
> >
> > The 'OFP_PEAK_UNSET' has been defined, use it instead of '-1'.
> >
> > Signed-off-by: Chen Ridong <chenridong@huawei.com>
> > ---
> > mm/memcontrol.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Reviewed-by: Michal Koutný <mkoutny@suse.com>
Thanks for cleaning this up!
Acked-By: David Finkel <davidf@vimeo.com>
--
David Finkel
Senior Principal Software Engineer, Core Services
* Re: [next -v1 1/5] memcg: use OFP_PEAK_UNSET instead of -1
2024-12-06 1:35 ` [next -v1 1/5] memcg: use OFP_PEAK_UNSET instead of -1 Chen Ridong
2024-12-17 12:27 ` Michal Koutný
@ 2024-12-17 17:47 ` Shakeel Butt
1 sibling, 0 replies; 21+ messages in thread
From: Shakeel Butt @ 2024-12-17 17:47 UTC (permalink / raw)
To: Chen Ridong
Cc: akpm, mhocko, hannes, yosryahmed, roman.gushchin, muchun.song,
davidf, vbabka, linux-mm, linux-kernel, cgroups, chenridong,
wangweiyang2
On Fri, Dec 06, 2024 at 01:35:08AM +0000, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> The 'OFP_PEAK_UNSET' has been defined, use it instead of '-1'.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
* Re: [next -v1 2/5] memcg: call the free function when allocation of pn fails
2024-12-06 1:35 ` [next -v1 2/5] memcg: call the free function when allocation of pn fails Chen Ridong
2024-12-17 12:27 ` Michal Koutný
@ 2024-12-17 17:47 ` Shakeel Butt
1 sibling, 0 replies; 21+ messages in thread
From: Shakeel Butt @ 2024-12-17 17:47 UTC (permalink / raw)
To: Chen Ridong
Cc: akpm, mhocko, hannes, yosryahmed, roman.gushchin, muchun.song,
davidf, vbabka, linux-mm, linux-kernel, cgroups, chenridong,
wangweiyang2
On Fri, Dec 06, 2024 at 01:35:09AM +0000, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> The 'free_mem_cgroup_per_node_info' function is used to free
> the 'mem_cgroup_per_node' struct. Using 'pn' as the input for the
> free_mem_cgroup_per_node_info function will be much clearer.
> Call 'free_mem_cgroup_per_node_info' when 'alloc_mem_cgroup_per_node_info'
> fails, to free 'pn' as a whole, which makes the code more cohesive.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
* Re: [next -v1 4/5] memcg: factor out the __refill_obj_stock function
2024-12-06 1:35 ` [next -v1 4/5] memcg: factor out the __refill_obj_stock function Chen Ridong
@ 2024-12-17 18:19 ` Shakeel Butt
2024-12-19 1:54 ` Chen Ridong
0 siblings, 1 reply; 21+ messages in thread
From: Shakeel Butt @ 2024-12-17 18:19 UTC (permalink / raw)
To: Chen Ridong
Cc: akpm, mhocko, hannes, yosryahmed, roman.gushchin, muchun.song,
davidf, vbabka, linux-mm, linux-kernel, cgroups, chenridong,
wangweiyang2
On Fri, Dec 06, 2024 at 01:35:11AM +0000, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Factor out the '__refill_obj_stock' function to make the code more
> cohesive.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> mm/memcontrol.c | 31 ++++++++++++++++++-------------
> 1 file changed, 18 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f977e0be1c04..0c9331d7b606 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2697,6 +2697,21 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
> obj_cgroup_put(objcg);
> }
>
> +/* If the cached_objcg was refilled, return true; otherwise, return false */
> +static bool __refill_obj_stock(struct memcg_stock_pcp *stock,
> + struct obj_cgroup *objcg, struct obj_cgroup **old_objcg)
> +{
> + if (READ_ONCE(stock->cached_objcg) != objcg) {
Keep the above check in the calling functions and make this a void
function. Also I think we need a better name.
> + *old_objcg = drain_obj_stock(stock);
> + obj_cgroup_get(objcg);
> + stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
> + ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
> + WRITE_ONCE(stock->cached_objcg, objcg);
> + return true;
> + }
> + return false;
> +}
> +
> static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
> enum node_stat_item idx, int nr)
> {
> @@ -2713,12 +2728,7 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
> * accumulating over a page of vmstat data or when pgdat or idx
> * changes.
> */
> - if (READ_ONCE(stock->cached_objcg) != objcg) {
> - old = drain_obj_stock(stock);
> - obj_cgroup_get(objcg);
> - stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
> - ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
> - WRITE_ONCE(stock->cached_objcg, objcg);
> + if (__refill_obj_stock(stock, objcg, &old)) {
> stock->cached_pgdat = pgdat;
> } else if (stock->cached_pgdat != pgdat) {
> /* Flush the existing cached vmstat data */
> @@ -2871,14 +2881,9 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
> local_lock_irqsave(&memcg_stock.stock_lock, flags);
>
> stock = this_cpu_ptr(&memcg_stock);
> - if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
> - old = drain_obj_stock(stock);
> - obj_cgroup_get(objcg);
> - WRITE_ONCE(stock->cached_objcg, objcg);
> - stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
> - ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
> + if (__refill_obj_stock(stock, objcg, &old))
> allow_uncharge = true; /* Allow uncharge when objcg changes */
> - }
> +
> stock->nr_bytes += nr_bytes;
>
> if (allow_uncharge && (stock->nr_bytes > PAGE_SIZE)) {
> --
> 2.34.1
>
* Re: [next -v1 4/5] memcg: factor out the __refill_obj_stock function
2024-12-17 18:19 ` Shakeel Butt
@ 2024-12-19 1:54 ` Chen Ridong
0 siblings, 0 replies; 21+ messages in thread
From: Chen Ridong @ 2024-12-19 1:54 UTC (permalink / raw)
To: Shakeel Butt
Cc: akpm, mhocko, hannes, yosryahmed, roman.gushchin, muchun.song,
davidf, vbabka, linux-mm, linux-kernel, cgroups, chenridong,
wangweiyang2
On 2024/12/18 2:19, Shakeel Butt wrote:
> On Fri, Dec 06, 2024 at 01:35:11AM +0000, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> Factor out the '__refill_obj_stock' function to make the code more
>> cohesive.
>>
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> mm/memcontrol.c | 31 ++++++++++++++++++-------------
>> 1 file changed, 18 insertions(+), 13 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index f977e0be1c04..0c9331d7b606 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2697,6 +2697,21 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
>> obj_cgroup_put(objcg);
>> }
>>
>> +/* If the cached_objcg was refilled, return true; otherwise, return false */
>> +static bool __refill_obj_stock(struct memcg_stock_pcp *stock,
>> + struct obj_cgroup *objcg, struct obj_cgroup **old_objcg)
>> +{
>> + if (READ_ONCE(stock->cached_objcg) != objcg) {
>
> Keep the above check in the calling functions and make this a void
> function. Also I think we need a better name.
>
Thank you for your review.
How about keeping the check in the calling functions and renaming it like this:
/* Replace the stock objcg with objcg, return the old objcg */
static struct obj_cgroup *replace_stock_objcg(struct memcg_stock_pcp *stock,
					      struct obj_cgroup *objcg)
Best regards,
Ridong
>> + *old_objcg = drain_obj_stock(stock);
>> + obj_cgroup_get(objcg);
>> + stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
>> + ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
>> + WRITE_ONCE(stock->cached_objcg, objcg);
>> + return true;
>> + }
>> + return false;
>> +}
>> +
>> static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
>> enum node_stat_item idx, int nr)
>> {
>> @@ -2713,12 +2728,7 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
>> * accumulating over a page of vmstat data or when pgdat or idx
>> * changes.
>> */
>> - if (READ_ONCE(stock->cached_objcg) != objcg) {
>> - old = drain_obj_stock(stock);
>> - obj_cgroup_get(objcg);
>> - stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
>> - ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
>> - WRITE_ONCE(stock->cached_objcg, objcg);
>> + if (__refill_obj_stock(stock, objcg, &old)) {
>> stock->cached_pgdat = pgdat;
>> } else if (stock->cached_pgdat != pgdat) {
>> /* Flush the existing cached vmstat data */
>> @@ -2871,14 +2881,9 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
>> local_lock_irqsave(&memcg_stock.stock_lock, flags);
>>
>> stock = this_cpu_ptr(&memcg_stock);
>> - if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
>> - old = drain_obj_stock(stock);
>> - obj_cgroup_get(objcg);
>> - WRITE_ONCE(stock->cached_objcg, objcg);
>> - stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
>> - ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
>> + if (__refill_obj_stock(stock, objcg, &old))
>> allow_uncharge = true; /* Allow uncharge when objcg changes */
>> - }
>> +
>> stock->nr_bytes += nr_bytes;
>>
>> if (allow_uncharge && (stock->nr_bytes > PAGE_SIZE)) {
>> --
>> 2.34.1
>>
* Re: [next -v1 5/5] memcg: factor out stat(event)/stat_local(event_local) reading functions
2024-12-06 1:35 ` [next -v1 5/5] memcg: factor out stat(event)/stat_local(event_local) reading functions Chen Ridong
@ 2024-12-20 9:51 ` Chen Ridong
0 siblings, 0 replies; 21+ messages in thread
From: Chen Ridong @ 2024-12-20 9:51 UTC (permalink / raw)
To: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka
Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
On 2024/12/6 9:35, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> The only difference between 'lruvec_page_state' and
> 'lruvec_page_state_local' is that they read 'state' and 'state_local',
> respectively. Factor out inner functions to make the code more concise.
> Do the same for reading 'memcg_page_stat' and 'memcg_events'.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
I hope someone can review this patch.
If there are any comments or suggestions, I will address them together in
the next version of this series.
Best regards
Ridong