From: "zhen.ni" <zhen.ni@easystack.cn>
To: Oscar Salvador <osalvador@suse.de>
Cc: akpm@linux-foundation.org, vbabka@kernel.org, surenb@google.com,
	mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org,
	ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 2/3] mm/page_owner: add NUMA node filter with nodelist support
Date: Mon, 11 May 2026 20:24:40 +0800
Message-ID: <2a70be08-ea9c-473b-80f2-9852e2f6fc51@easystack.cn>
In-Reply-To: <agGZT0bXygoZiif1@localhost.localdomain>



On 2026/5/11 16:54, Oscar Salvador wrote:
> On Mon, May 11, 2026 at 11:30:16AM +0800, Zhen Ni wrote:
>> Add NUMA node filtering functionality to page_owner to allow filtering
>> pages by specific NUMA node(s). This is useful for NUMA-aware memory
>> allocation analysis and debugging.
>>
>> The filter supports flexible nodelist input formats:
>> - Single node: echo "0" > nid
>> - Multiple nodes: echo "0,2,3" > nid
>> - Node range: echo "0-3" > nid
>> - Mixed format: echo "0,2-4,7" > nid
>> - Clear filter: echo > nid (empty string)
>>
>> The implementation uses nodemask_t for efficient multi-node filtering
>> and nodelist_parse() for flexible input parsing. Empty input clears
>> the filter.
>>
>> Note: Access to nid_mask uses plain load/store without locking because
>> nodemask_t is too large (128 bytes) for READ_ONCE/WRITE_ONCE. This is
>> safe for debug use: low-frequency changes and torn reads would only
>> cause temporary inconsistency in debug output.
>>
>> Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
>> ---
> ...
>> ---
>>   mm/page_owner.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 92 insertions(+)
>>
>> diff --git a/mm/page_owner.c b/mm/page_owner.c
>> index 27a412c52d41..8a38005539ff 100644
>> --- a/mm/page_owner.c
>> +++ b/mm/page_owner.c
> ...
>> @@ -700,6 +708,9 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>>   	while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
>>   		pfn++;
>>   
>> +	mask = owner_filter.nid_mask;
>> +	filter_by_nid = !nodes_empty(mask);
>> +
>>   	/* Find an allocated page */
>>   	for (; pfn < max_pfn; pfn++) {
>>   		/*
>> @@ -732,6 +743,14 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>>   		if (unlikely(!page_ext))
>>   			continue;
>>   
>> +		/* NUMA node filter using bitmask */
>> +		if (filter_by_nid) {
> 
> This comment is kinda pointless because it explains something that the code makes it
> quite clear.
> Either drop it, or just go with "NUMA node filter", but "using bitmask"
> does not really add much.
> 
I'll just remove it entirely.
> 
>> +			int nid = page_to_nid(page);
>> +
>> +			if (!node_isset(nid, mask))
>> +				goto ext_put_continue;
>> +		}
>> +
>>   		/*
>>   		 * Some pages could be missed by concurrent allocation or free,
>>   		 * because we don't hold the zone lock.
>> @@ -1043,6 +1062,77 @@ static const struct file_operations page_owner_print_mode_fops = {
>>   	.llseek = default_llseek,
>>   };
>>   
>> +static ssize_t nid_filter_write(struct file *file,
>> +				 const char __user *buf,
>> +				 size_t count, loff_t *ppos)
>> +{
>> +	char *kbuf;
>> +	nodemask_t mask;
>> +	int ret;
>> +
>> +	/*
>> +	 * Limit input size to handle worst-case nodelist (all nodes).
>> +	 * Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes.
>> +	 */
>> +	if (count > (6 * MAX_NUMNODES))
>> +		return -EINVAL;
>> +
>> +	kbuf = kmalloc_objs(*kbuf, count + 1);
>> +	if (!kbuf)
>> +		return -ENOMEM;
>> +
>> +	if (strncpy_from_user(kbuf, buf, count) < 0) {
>> +		ret = -EFAULT;
>> +		goto out_free;
>> +	}
>> +	kbuf[count] = '\0';
>> +
>> +	/* Support nodelist format like "0", "0,2", "0-3", or empty to clear */
>> +	if (nodelist_parse(kbuf, mask)) {
>> +		ret = -EINVAL;
>> +		goto out_free;
>> +	}
> 
> nodelist_parse() can also return other return values besides EINVAL.
> Something like
> 
>   ret = nodelist_parse(...)
>   if (ret < 0)
>      return ret
> 
> might be cleaner.
> 
I'll change it to pass on the return value of nodelist_parse() instead
of a fixed -EINVAL.
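
A minimal userspace sketch of that change, assuming the existing
out_free cleanup path is kept so the buffer is still freed on error
(a bare "return ret" at that point would skip the kfree()).
parse_nodelist_stub() and nid_filter_write_sketch() are hypothetical
names; the stub only accepts a single decimal node number and is not
the kernel's nodelist_parse().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for nodelist_parse(): accepts one decimal node
 * number (or empty input) and returns 0 or a negative errno.
 */
static int parse_nodelist_stub(const char *s, unsigned long *mask)
{
	char *end;
	unsigned long nid;

	*mask = 0;
	if (*s == '\0' || *s == '\n')
		return 0;		/* empty input clears the mask */

	errno = 0;
	nid = strtoul(s, &end, 10);
	if (end == s || (*end != '\0' && *end != '\n'))
		return -EINVAL;		/* not a plain number */
	if (errno == ERANGE || nid >= 8 * sizeof(*mask))
		return -ERANGE;		/* node number does not fit the mask */

	*mask = 1UL << nid;
	return 0;
}

/* Same shape as the write handler: copy, parse, propagate error, free. */
static long nid_filter_write_sketch(const char *buf, size_t count,
				    unsigned long *out_mask)
{
	unsigned long mask;
	char *kbuf;
	long ret;

	assert(buf != NULL && out_mask != NULL);

	kbuf = malloc(count + 1);
	if (!kbuf)
		return -ENOMEM;
	memcpy(kbuf, buf, count);
	kbuf[count] = '\0';

	ret = parse_nodelist_stub(kbuf, &mask);
	if (ret < 0)
		goto out_free;		/* pass the parser's error code through */

	*out_mask = mask;
	ret = (long)count;

out_free:
	free(kbuf);
	return ret;
}
```

With this shape the caller sees whatever error the parser reported,
and the buffer is released on every path.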
>> +
>> +	/* Validate that all specified nodes actually exist in the system */
>> +	if (!nodes_subset(mask, node_states[N_MEMORY])) {
>> +		ret = -EINVAL;
>> +		goto out_free;
>> +	}
> 
> Ok, I get that since you want to filter allocations by numa nodes, you
> want to make sure that those nodes have memory.
> Although that might change due to concurrent memory-hotplug operations,
> but that is a different story.
> 
> I do not like the comment though, because we can have other nodes
> existing in the system with no memory (e.g: memoryless nodes only having
> cpus, or none of them), so I would make that clearer:
> 
> "
>    /*
>     * We want to filter memory allocations by numa nodes, so make sure
>     * that the specified nodes have memory.
>     */
> "
> 
> or something along those lines.
> 
> 
I'll update the comment to make clear that the check is about nodes
that have memory, not nodes that merely exist.
>> +
>> +	owner_filter.nid_mask = mask;
>> +	ret = count;
>> +
>> +out_free:
>> +	kfree(kbuf);
>> +	return ret;
>> +}
>> +
>> +static int nid_filter_show(struct seq_file *m, void *v)
>> +{
>> +	nodemask_t mask = owner_filter.nid_mask;
>> +
>> +	if (nodes_empty(mask))
>> +		seq_puts(m, "\n");
>> +	else
>> +		seq_printf(m, "%*pbl\n", nodemask_pr_args(&mask));
> 
> is not nodemask_pr_args clever enough to not print anything or print "0"
> if the nmask is NODE_MASK_NONE?
> 
> 
Looking at bitmap_list_string() in lib/vsprintf.c, the %*pbl format
prints nothing when the bitmap is empty (the for_each_set_bitrange()
loop never executes), so only the trailing "\n" is emitted. That matches
the nodes_empty() branch, so I'll drop the nodes_empty() check and keep
just the seq_printf() call.
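
To illustrate the point, here is a rough userspace approximation of
range-list formatting. format_bit_list() is a hypothetical helper
written for this example, not the kernel's bitmap_list_string(), and
it works on a single unsigned long rather than a full nodemask_t.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the set bits of 'mask' as a range list such as "0,2-4,7".
 * Returns the number of characters written, excluding the NUL.
 */
static size_t format_bit_list(unsigned long mask, char *buf, size_t size)
{
	const unsigned int nbits = 8 * sizeof(mask);
	unsigned int bit = 0;
	size_t len = 0;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';

	while (bit < nbits) {
		unsigned int start, end;
		int n;

		/* Skip clear bits until the next run of set bits starts. */
		if (!(mask & (1UL << bit))) {
			bit++;
			continue;
		}
		start = bit;
		while (bit < nbits && (mask & (1UL << bit)))
			bit++;
		end = bit - 1;

		if (start == end)
			n = snprintf(buf + len, size - len, "%s%u",
				     len ? "," : "", start);
		else
			n = snprintf(buf + len, size - len, "%s%u-%u",
				     len ? "," : "", start, end);
		if (n < 0 || (size_t)n >= size - len) {
			buf[len] = '\0';	/* drop the truncated piece */
			break;
		}
		len += (size_t)n;
	}
	return len;
}
```

For an empty mask the loop body never emits anything and the buffer
stays an empty string, so appending "\n" gives the same output as the
separate nodes_empty() branch.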

Thanks for the thorough review!

Best regards,
Zhen
