The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: Zhen Ni <zhen.ni@easystack.cn>
Cc: akpm@linux-foundation.org, vbabka@kernel.org, surenb@google.com,
	mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org,
	ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 2/3] mm/page_owner: add NUMA node filter with nodelist support
Date: Mon, 11 May 2026 10:54:39 +0200	[thread overview]
Message-ID: <agGZT0bXygoZiif1@localhost.localdomain> (raw)
In-Reply-To: <20260511033017.747781-3-zhen.ni@easystack.cn>

On Mon, May 11, 2026 at 11:30:16AM +0800, Zhen Ni wrote:
> Add NUMA node filtering functionality to page_owner to allow filtering
> pages by specific NUMA node(s). This is useful for NUMA-aware memory
> allocation analysis and debugging.
> 
> The filter supports flexible nodelist input formats:
> - Single node: echo "0" > nid
> - Multiple nodes: echo "0,2,3" > nid
> - Node range: echo "0-3" > nid
> - Mixed format: echo "0,2-4,7" > nid
> - Clear filter: echo > nid (empty string)
> 
> The implementation uses nodemask_t for efficient multi-node filtering
> and nodelist_parse() for flexible input parsing. Empty input clears
> the filter.
> 
> Note: Access to nid_mask uses plain load/store without locking because
> nodemask_t is too large (128 bytes) for READ_ONCE/WRITE_ONCE. This is
> safe for debug use: low-frequency changes and torn reads would only
> cause temporary inconsistency in debug output.
> 
> Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
> ---
...
> ---
>  mm/page_owner.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 92 insertions(+)
> 
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 27a412c52d41..8a38005539ff 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
...
> @@ -700,6 +708,9 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>  	while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
>  		pfn++;
>  
> +	mask = owner_filter.nid_mask;
> +	filter_by_nid = !nodes_empty(mask);
> +
>  	/* Find an allocated page */
>  	for (; pfn < max_pfn; pfn++) {
>  		/*
> @@ -732,6 +743,14 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>  		if (unlikely(!page_ext))
>  			continue;
>  
> +		/* NUMA node filter using bitmask */
> +		if (filter_by_nid) {

This comment is kinda pointless because it explains something that the code makes it
quite clear.
Either drop it, or just go with "NUMA node filter", but "using bitmask"
does not really add much.


> +			int nid = page_to_nid(page);
> +
> +			if (!node_isset(nid, mask))
> +				goto ext_put_continue;
> +		}
> +
>  		/*
>  		 * Some pages could be missed by concurrent allocation or free,
>  		 * because we don't hold the zone lock.
> @@ -1043,6 +1062,77 @@ static const struct file_operations page_owner_print_mode_fops = {
>  	.llseek = default_llseek,
>  };
>  
> +static ssize_t nid_filter_write(struct file *file,
> +				 const char __user *buf,
> +				 size_t count, loff_t *ppos)
> +{
> +	char *kbuf;
> +	nodemask_t mask;
> +	int ret;
> +
> +	/*
> +	 * Limit input size to handle worst-case nodelist (all nodes).
> +	 * Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes.
> +	 */
> +	if (count > (6 * MAX_NUMNODES))
> +		return -EINVAL;
> +
> +	kbuf = kmalloc_objs(*kbuf, count + 1);
> +	if (!kbuf)
> +		return -ENOMEM;
> +
> +	if (strncpy_from_user(kbuf, buf, count) < 0) {
> +		ret = -EFAULT;
> +		goto out_free;
> +	}
> +	kbuf[count] = '\0';
> +
> +	/* Support nodelist format like "0", "0,2", "0-3", or empty to clear */
> +	if (nodelist_parse(kbuf, mask)) {
> +		ret = -EINVAL;
> +		goto out_free;
> +	}

nodelist_parse() can also return other return values besides EINVAL.
Something like

 ret = nodelist_parse(...)
 if (ret < 0)
    return ret

might be cleaner.

> +
> +	/* Validate that all specified nodes actually exist in the system */
> +	if (!nodes_subset(mask, node_states[N_MEMORY])) {
> +		ret = -EINVAL;
> +		goto out_free;
> +	}

Ok, I get that since you want to filter allocations by numa nodes, you
want to make sure that those nodes have memory.
Although that might change due to concurrent memory-hotplug operations,
but that is a different story.

I do not like the comment though, because we can have other nodes
existing in the system with no memory (e.g: memoryless nodes only having
cpus, or none of them), so I would make that clearer:

"
  /* 
   * We want to filter memory allocations by numa nodes, so make sure
   * that the specified nodes have memory.
   */
"

or something along those lines.


> +
> +	owner_filter.nid_mask = mask;
> +	ret = count;
> +
> +out_free:
> +	kfree(kbuf);
> +	return ret;
> +}
> +
> +static int nid_filter_show(struct seq_file *m, void *v)
> +{
> +	nodemask_t mask = owner_filter.nid_mask;
> +
> +	if (nodes_empty(mask))
> +		seq_puts(m, "\n");
> +	else
> +		seq_printf(m, "%*pbl\n", nodemask_pr_args(&mask));

is not nodemask_pr_args clever enough to not print anything or print "0"
if the nmask is NODE_MASK_NONE?


-- 
Oscar Salvador
SUSE Labs

  reply	other threads:[~2026-05-11  8:54 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-11  3:30 [PATCH v6 0/3] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering Zhen Ni
2026-05-11  3:30 ` [PATCH v6 1/3] mm/page_owner: add print_mode filter Zhen Ni
2026-05-11  8:29   ` Oscar Salvador
2026-05-11 11:54     ` zhen.ni
2026-05-11  3:30 ` [PATCH v6 2/3] mm/page_owner: add NUMA node filter with nodelist support Zhen Ni
2026-05-11  8:54   ` Oscar Salvador [this message]
2026-05-11 12:24     ` zhen.ni
2026-05-11  3:30 ` [PATCH v6 3/3] mm/page_owner: document page_owner filter features Zhen Ni
2026-05-11  8:33   ` Oscar Salvador
2026-05-11 12:23 ` [PATCH v6 0/3] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering Michal Hocko
2026-05-11 12:40   ` zhen.ni
2026-05-11 12:54     ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=agGZT0bXygoZiif1@localhost.localdomain \
    --to=osalvador@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=jackmanb@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    --cc=zhen.ni@easystack.cn \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox