From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org,yosryahmed@google.com,roman.gushchin@linux.dev,muchun.song@linux.dev,mhocko@kernel.org,hannes@cmpxchg.org,shakeel.butt@linux.dev,akpm@linux-foundation.org
Subject: + memcg-reduce-memory-for-the-lruvec-and-memcg-stats.patch added to mm-unstable branch
Date: Mon, 29 Apr 2024 10:19:13 -0700 [thread overview]
Message-ID: <20240429171914.8E86AC113CD@smtp.kernel.org> (raw)
The patch titled
Subject: memcg: reduce memory for the lruvec and memcg stats
has been added to the -mm mm-unstable branch. Its filename is
memcg-reduce-memory-for-the-lruvec-and-memcg-stats.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/memcg-reduce-memory-for-the-lruvec-and-memcg-stats.patch
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Shakeel Butt <shakeel.butt@linux.dev>
Subject: memcg: reduce memory for the lruvec and memcg stats
Date: Fri, 26 Apr 2024 17:37:29 -0700
At the moment, the amount of memory allocated for stats related structs in
the mem_cgroup corresponds to the size of enum node_stat_item. However
not all fields in enum node_stat_item has corresponding memcg stats. So,
let's use indirection mechanism similar to the one used for memcg vmstats
management.
For a given x86_64 config, the size of stats with and without patch is:
structs size in bytes w/o with
struct lruvec_stats 1128 648
struct lruvec_stats_percpu 752 432
struct memcg_vmstats 1832 1352
struct memcg_vmstats_percpu 1280 960
The memory savings is further compounded by the fact that these structs
are allocated for each cpu and for each node. To be precise, for each
memcg the memory saved would be:
Memory saved = ((21 * 3 * NR_NODES) + (21 * 2 * NR_NODS * NR_CPUS) +
(21 * 3) + (21 * 2 * NR_CPUS)) * sizeof(long)
Where 21 is the number of fields eliminated.
Link: https://lkml.kernel.org/r/20240427003733.3898961-4-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++--------
1 file changed, 115 insertions(+), 23 deletions(-)
--- a/mm/memcontrol.c~memcg-reduce-memory-for-the-lruvec-and-memcg-stats
+++ a/mm/memcontrol.c
@@ -575,35 +575,105 @@ mem_cgroup_largest_soft_limit_node(struc
return mz;
}
+/* Subset of node_stat_item for memcg stats */
+static const unsigned int memcg_node_stat_items[] = {
+ NR_INACTIVE_ANON,
+ NR_ACTIVE_ANON,
+ NR_INACTIVE_FILE,
+ NR_ACTIVE_FILE,
+ NR_UNEVICTABLE,
+ NR_SLAB_RECLAIMABLE_B,
+ NR_SLAB_UNRECLAIMABLE_B,
+ WORKINGSET_REFAULT_ANON,
+ WORKINGSET_REFAULT_FILE,
+ WORKINGSET_ACTIVATE_ANON,
+ WORKINGSET_ACTIVATE_FILE,
+ WORKINGSET_RESTORE_ANON,
+ WORKINGSET_RESTORE_FILE,
+ WORKINGSET_NODERECLAIM,
+ NR_ANON_MAPPED,
+ NR_FILE_MAPPED,
+ NR_FILE_PAGES,
+ NR_FILE_DIRTY,
+ NR_WRITEBACK,
+ NR_SHMEM,
+ NR_SHMEM_THPS,
+ NR_FILE_THPS,
+ NR_ANON_THPS,
+ NR_KERNEL_STACK_KB,
+ NR_PAGETABLE,
+ NR_SECONDARY_PAGETABLE,
+#ifdef CONFIG_SWAP
+ NR_SWAPCACHE,
+#endif
+};
+
+static const unsigned int memcg_stat_items[] = {
+ MEMCG_SWAP,
+ MEMCG_SOCK,
+ MEMCG_PERCPU_B,
+ MEMCG_VMALLOC,
+ MEMCG_KMEM,
+ MEMCG_ZSWAP_B,
+ MEMCG_ZSWAPPED,
+};
+
+#define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
+#define NR_MEMCG_STATS (NR_MEMCG_NODE_STAT_ITEMS + ARRAY_SIZE(memcg_stat_items))
+static int8_t mem_cgroup_stats_index[MEMCG_NR_STAT] __read_mostly;
+
+static void init_memcg_stats(void)
+{
+ int8_t i, j = 0;
+
+ /* Switch to short once this failure occurs. */
+ BUILD_BUG_ON(NR_MEMCG_STATS >= 127 /* INT8_MAX */);
+
+ for (i = 0; i < NR_MEMCG_NODE_STAT_ITEMS; ++i)
+ mem_cgroup_stats_index[memcg_node_stat_items[i]] = ++j;
+
+ for (i = 0; i < ARRAY_SIZE(memcg_stat_items); ++i)
+ mem_cgroup_stats_index[memcg_stat_items[i]] = ++j;
+}
+
+static inline int memcg_stats_index(int idx)
+{
+ return mem_cgroup_stats_index[idx] - 1;
+}
+
struct lruvec_stats_percpu {
/* Local (CPU and cgroup) state */
- long state[NR_VM_NODE_STAT_ITEMS];
+ long state[NR_MEMCG_NODE_STAT_ITEMS];
/* Delta calculation for lockless upward propagation */
- long state_prev[NR_VM_NODE_STAT_ITEMS];
+ long state_prev[NR_MEMCG_NODE_STAT_ITEMS];
};
struct lruvec_stats {
/* Aggregated (CPU and subtree) state */
- long state[NR_VM_NODE_STAT_ITEMS];
+ long state[NR_MEMCG_NODE_STAT_ITEMS];
/* Non-hierarchical (CPU aggregated) state */
- long state_local[NR_VM_NODE_STAT_ITEMS];
+ long state_local[NR_MEMCG_NODE_STAT_ITEMS];
/* Pending child counts during tree propagation */
- long state_pending[NR_VM_NODE_STAT_ITEMS];
+ long state_pending[NR_MEMCG_NODE_STAT_ITEMS];
};
unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx)
{
struct mem_cgroup_per_node *pn;
- long x;
+ long x = 0;
+ int i;
if (mem_cgroup_disabled())
return node_page_state(lruvec_pgdat(lruvec), idx);
- pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
- x = READ_ONCE(pn->lruvec_stats->state[idx]);
+ i = memcg_stats_index(idx);
+ if (i >= 0) {
+ pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+ x = READ_ONCE(pn->lruvec_stats->state[i]);
+ }
#ifdef CONFIG_SMP
if (x < 0)
x = 0;
@@ -616,12 +686,16 @@ unsigned long lruvec_page_state_local(st
{
struct mem_cgroup_per_node *pn;
long x = 0;
+ int i;
if (mem_cgroup_disabled())
return node_page_state(lruvec_pgdat(lruvec), idx);
- pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
- x = READ_ONCE(pn->lruvec_stats->state_local[idx]);
+ i = memcg_stats_index(idx);
+ if (i >= 0) {
+ pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+ x = READ_ONCE(pn->lruvec_stats->state_local[i]);
+ }
#ifdef CONFIG_SMP
if (x < 0)
x = 0;
@@ -689,11 +763,11 @@ struct memcg_vmstats_percpu {
/* The above should fit a single cacheline for memcg_rstat_updated() */
/* Local (CPU and cgroup) page state & events */
- long state[MEMCG_NR_STAT];
+ long state[NR_MEMCG_STATS];
unsigned long events[NR_MEMCG_EVENTS];
/* Delta calculation for lockless upward propagation */
- long state_prev[MEMCG_NR_STAT];
+ long state_prev[NR_MEMCG_STATS];
unsigned long events_prev[NR_MEMCG_EVENTS];
/* Cgroup1: threshold notifications & softlimit tree updates */
@@ -703,15 +777,15 @@ struct memcg_vmstats_percpu {
struct memcg_vmstats {
/* Aggregated (CPU and subtree) page state & events */
- long state[MEMCG_NR_STAT];
+ long state[NR_MEMCG_STATS];
unsigned long events[NR_MEMCG_EVENTS];
/* Non-hierarchical (CPU aggregated) page state & events */
- long state_local[MEMCG_NR_STAT];
+ long state_local[NR_MEMCG_STATS];
unsigned long events_local[NR_MEMCG_EVENTS];
/* Pending child counts during tree propagation */
- long state_pending[MEMCG_NR_STAT];
+ long state_pending[NR_MEMCG_STATS];
unsigned long events_pending[NR_MEMCG_EVENTS];
/* Stats updates since the last flush */
@@ -844,7 +918,13 @@ static void flush_memcg_stats_dwork(stru
unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
{
- long x = READ_ONCE(memcg->vmstats->state[idx]);
+ long x;
+ int i = memcg_stats_index(idx);
+
+ if (i < 0)
+ return 0;
+
+ x = READ_ONCE(memcg->vmstats->state[i]);
#ifdef CONFIG_SMP
if (x < 0)
x = 0;
@@ -876,18 +956,25 @@ static int memcg_state_val_in_pages(int
*/
void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
{
- if (mem_cgroup_disabled())
+ int i = memcg_stats_index(idx);
+
+ if (mem_cgroup_disabled() || i < 0)
return;
- __this_cpu_add(memcg->vmstats_percpu->state[idx], val);
+ __this_cpu_add(memcg->vmstats_percpu->state[i], val);
memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val));
}
/* idx can be of type enum memcg_stat_item or node_stat_item. */
static unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
{
- long x = READ_ONCE(memcg->vmstats->state_local[idx]);
+ long x;
+ int i = memcg_stats_index(idx);
+
+ if (i < 0)
+ return 0;
+ x = READ_ONCE(memcg->vmstats->state_local[i]);
#ifdef CONFIG_SMP
if (x < 0)
x = 0;
@@ -901,6 +988,10 @@ static void __mod_memcg_lruvec_state(str
{
struct mem_cgroup_per_node *pn;
struct mem_cgroup *memcg;
+ int i = memcg_stats_index(idx);
+
+ if (i < 0)
+ return;
pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
memcg = pn->memcg;
@@ -927,10 +1018,10 @@ static void __mod_memcg_lruvec_state(str
}
/* Update memcg */
- __this_cpu_add(memcg->vmstats_percpu->state[idx], val);
+ __this_cpu_add(memcg->vmstats_percpu->state[i], val);
/* Update lruvec */
- __this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
+ __this_cpu_add(pn->lruvec_stats_percpu->state[i], val);
memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val));
memcg_stats_unlock();
@@ -5697,6 +5788,7 @@ mem_cgroup_css_alloc(struct cgroup_subsy
page_counter_init(&memcg->kmem, &parent->kmem);
page_counter_init(&memcg->tcpmem, &parent->tcpmem);
} else {
+ init_memcg_stats();
init_memcg_events();
page_counter_init(&memcg->memory, NULL);
page_counter_init(&memcg->swap, NULL);
@@ -5868,7 +5960,7 @@ static void mem_cgroup_css_rstat_flush(s
statc = per_cpu_ptr(memcg->vmstats_percpu, cpu);
- for (i = 0; i < MEMCG_NR_STAT; i++) {
+ for (i = 0; i < NR_MEMCG_STATS; i++) {
/*
* Collect the aggregated propagation counts of groups
* below us. We're in a per-cpu loop here and this is
@@ -5932,7 +6024,7 @@ static void mem_cgroup_css_rstat_flush(s
lstatc = per_cpu_ptr(pn->lruvec_stats_percpu, cpu);
- for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
+ for (i = 0; i < NR_MEMCG_NODE_STAT_ITEMS; i++) {
delta = lstats->state_pending[i];
if (delta)
lstats->state_pending[i] = 0;
_
Patches currently in -mm which might be from shakeel.butt@linux.dev are
memcg-simple-cleanup-of-stats-update-functions.patch
memcg-reduce-memory-size-of-mem_cgroup_events_index.patch
memcg-dynamically-allocate-lruvec_stats.patch
memcg-reduce-memory-for-the-lruvec-and-memcg-stats.patch
memcg-cleanup-__mod_memcg_lruvec_state.patch
memcg-pr_warn_once-for-unexpected-events-and-stats.patch
memcg-use-proper-type-for-mod_memcg_state.patch
mm-cleanup-workingset_nodes-in-workingset.patch
next reply other threads:[~2024-04-29 17:19 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-29 17:19 Andrew Morton [this message]
-- strict thread matches above, loose matches on Subject: below --
2024-05-02 15:35 + memcg-reduce-memory-for-the-lruvec-and-memcg-stats.patch added to mm-unstable branch Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240429171914.8E86AC113CD@smtp.kernel.org \
--to=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=mhocko@kernel.org \
--cc=mm-commits@vger.kernel.org \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.