From: "Harry Yoo (Oracle)" <harry@kernel.org>
To: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>, Yosry Ahmed <yosry@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@gentwo.org>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v2] mm/percpu, memcontrol: Per-memcg-lruvec percpu accounting
Date: Wed, 8 Apr 2026 11:40:27 +0900	[thread overview]
Message-ID: <adXAG52R6WVHd0n9@hyeyoo> (raw)
In-Reply-To: <20260404033844.1892595-1-joshua.hahnjy@gmail.com>

On Fri, Apr 03, 2026 at 08:38:43PM -0700, Joshua Hahn wrote:
> enum memcg_stat_item includes memory that is tracked on a per-memcg
> level, but not at a per-node (and per-lruvec) level. Diagnosing
> memory pressure for memcgs in multi-NUMA systems can be difficult,
> since not all of the memory accounted in memcg can be traced back
> to a node. In scenarios where NUMA nodes in a memcg are asymmetrically
> stressed, this imbalance can be invisible to the user.
> 
> Convert MEMCG_PERCPU_B from a memcg_stat_item to a memcg_node_stat_item
> to give visibility into per-node breakdowns for percpu allocations.
> 
> This will get us closer to being able to know the memcg and physical
> association of all memory on the system. Specifically for percpu, this
> granularity will help demonstrate footprint differences on systems with
> asymmetric NUMA nodes.
> 
> Because percpu memory is accounted at a sub-PAGE_SIZE level, we must
> account node level statistics (accounted in PAGE_SIZE units) and
> memcg-lruvec statistics separately. Account node statistics when the pcpu
> pages are allocated, and account memcg-lruvec statistics when pcpu
> objects are handed out.
> 
> To account these separately, expose mod_memcg_lruvec_state for use
> outside of memcontrol.
> 
> The memory overhead of this patch is small; it adds 16 bytes
> per-cgroup-node-cpu. For an example machine with 200 CPUs split across
> 2 nodes and 50 cgroups in the system, we see a 312.5 kB increase. Note
> that this is the same cost as any other item in memcg_node_stat_item.
> 
> Performance impact is also negligible. These are results from a kernel
> module which performs 100k percpu allocations via __alloc_percpu_gfp
> with GFP_KERNEL | __GFP_ACCOUNT in a cgroup, across 20 trials.
> Batched performs 100k allocations followed by 100k frees, while
> interleaved performs allocation --> free --> allocation ...
> 
> +-------------+----------------+--------------+--------------+
> |    Test     | linus-upstream |    patch     |     diff     |
> +-------------+----------------+--------------+--------------+
> | Batched     | 6586 +/- 51    | 6595 +/- 35  | +9 (+0.13%)  |
> | Interleaved | 1053 +/- 126   | 1085 +/- 113 | +32 (+0.85%) |
> +-------------+----------------+--------------+--------------+
> 
> One functional change is that there can be a tiny inconsistency between
> the size used for memcg limit checking and what is charged to each
> lruvec, because fractional charges are dropped when rounding. In
> practice this difference is tiny and always errs on the side of the
> limit check seeing the slightly larger size, so there is no behavioral
> change visible to userspace.
> 
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> ---
>  include/linux/memcontrol.h |  4 +++-
>  include/linux/mmzone.h     |  4 +++-
>  mm/memcontrol.c            | 12 +++++-----
>  mm/percpu-vm.c             | 14 ++++++++++--
>  mm/percpu.c                | 45 ++++++++++++++++++++++++++++++++++----
>  mm/vmstat.c                |  1 +
>  6 files changed, 66 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> index 4f5937090590d..e36b639f521dd 100644
> --- a/mm/percpu-vm.c
> +++ b/mm/percpu-vm.c
> @@ -65,6 +66,10 @@ static void pcpu_free_pages(struct pcpu_chunk *chunk,
>  				__free_page(page);
>  		}
>  	}
> +
> +	for_each_node(nid)
> +		mod_node_page_state(NODE_DATA(nid), NR_PERCPU_B,
> +				-1L * nr_pages * nr_cpus_node(nid) * PAGE_SIZE);

Can this end up with mis-accounting due to CPU hotplug?

>  }
>  
>  /**
> @@ -84,7 +89,8 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
>  			    gfp_t gfp)
>  {
>  	unsigned int cpu, tcpu;
> -	int i;
> +	int nr_pages = page_end - page_start;
> +	int i, nid;
>  
>  	gfp |= __GFP_HIGHMEM;
>  
> @@ -97,6 +103,10 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
>  				goto err;
>  		}
>  	}
> +
> +	for_each_node(nid)
> +		mod_node_page_state(NODE_DATA(nid), NR_PERCPU_B,
> +				    nr_pages * nr_cpus_node(nid) * PAGE_SIZE);
>  	return 0;
>  
>  err:

-- 
Cheers,
Harry / Hyeonggon
