All of lore.kernel.org
 help / color / mirror / Atom feed
From: "zhen.ni" <zhen.ni@easystack.cn>
To: SeongJae Park <sj@kernel.org>
Cc: akpm@linux-foundation.org, vbabka@kernel.org, surenb@google.com,
	mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org,
	ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 3/4] mm/page_owner: add NUMA node filter with nodelist support
Date: Wed, 29 Apr 2026 17:03:56 +0800	[thread overview]
Message-ID: <f09a287b-87e9-4701-afe0-929363937fd8@easystack.cn> (raw)
In-Reply-To: <20260429012808.88831-1-sj@kernel.org>



在 2026/4/29 09:28, SeongJae Park 写道:
> On Tue, 28 Apr 2026 15:11:11 +0800 Zhen Ni <zhen.ni@easystack.cn> wrote:
> 
>> Add NUMA node filtering functionality to page_owner to allow
>> filtering pages by specific NUMA node(s) using nodelist format.
>>
>> The filter allows users to focus on pages from specific NUMA nodes,
>> which is useful for NUMA-aware memory allocation analysis and debugging.
>>
>> Supported input formats:
>> - Single node: echo "2" > nid
>> - Multiple nodes: echo "0,2,3" > nid
>> - Node range: echo "0-3" > nid
>> - Mixed format: echo "0,2-4,7" > nid
>> - Disable filter: echo "-1" > nid
>>
>> Link: https://lore.kernel.org/linux-mm/20260417154638.22370-4-zhen.ni@easystack.cn/
>> Link: https://lore.kernel.org/linux-mm/20260419155540.376847-4-zhen.ni@easystack.cn/
> 
> Seems the above two links are for v1 and v2 of this patch.  I think putting
> those with the context at commentary area [1] could be useful.
> 
Good suggestion.
>> Suggested-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
>> ---
> [...]
>> diff --git a/mm/page_owner.c b/mm/page_owner.c
>> index 6d87b6948cfa..e674a374669a 100644
>> --- a/mm/page_owner.c
>> +++ b/mm/page_owner.c
>> @@ -685,6 +685,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>>   	struct page_ext *page_ext;
>>   	struct page_owner *page_owner;
>>   	depot_stack_handle_t handle;
>> +	nodemask_t mask;
>>   
>>   	if (!static_branch_unlikely(&page_owner_inited))
>>   		return -EINVAL;
>> @@ -698,6 +699,8 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>>   	while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
>>   		pfn++;
>>   
>> +	mask = owner_filter.nid_mask;
>> +
> 
> READ_ONCE() was used for owner_filter.print_mode.  Should nid_mask also read
> using READ_ONCE()?
> 
The reason is that `owner_filter.nid_mask` is a nodemask_t, which is a
128-byte structure. READ_ONCE() only supports types up to 8 bytes and 
will trigger a compile-time assertion failure for larger structures.

This was actually an issue in v2 - the AI review tool (sashiko.dev) and
Andrew both caught the compilation error with READ_ONCE/WRITE_ONCE on
nodemask_t, so v3 removed them.
>>   	/* Find an allocated page */
>>   	for (; pfn < max_pfn; pfn++) {
>>   		/*
>> @@ -730,6 +733,14 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>>   		if (unlikely(!page_ext))
>>   			continue;
>>   
>> +		/* NUMA node filter using bitmask */
>> +		if (!nodes_empty(mask)) {
>> +			int nid = page_to_nid(page);
>> +
>> +			if (!node_isset(nid, mask))
>> +				goto ext_put_continue;
>> +		}
>> +
>>   		/*
>>   		 * Some pages could be missed by concurrent allocation or free,
>>   		 * because we don't hold the zone lock.
>> @@ -1009,6 +1020,75 @@ DEFINE_SIMPLE_ATTRIBUTE(page_owner_print_mode_fops,
>>   			&page_owner_print_mode_get,
>>   			&page_owner_print_mode_set, "%lld");
>>   
>> +static ssize_t nid_filter_write(struct file *file,
>> +				 const char __user *buf,
>> +				 size_t count, loff_t *ppos)
>> +{
>> +	char *kbuf;
>> +	nodemask_t mask;
>> +	int ret;
>> +	int val;
>> +
>> +	/*
>> +	 * Limit input size to handle worst-case nodelist (all nodes).
>> +	 * Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes.
>> +	 * Formula: 100 bytes overhead + 6 * MAX_NUMNODES
>> +	 */
>> +	if (count > (100 + 6 * MAX_NUMNODES))
>> +		return -EINVAL;
>> +
>> +	kbuf = kmalloc(count + 1, GFP_KERNEL);
>> +	if (!kbuf)
>> +		return -ENOMEM;
>> +
>> +	if (copy_from_user(kbuf, buf, count)) {
>> +		ret = -EFAULT;
>> +		goto out_free;
>> +	}
>> +	kbuf[count] = '\0';
>> +
>> +	/* Support: "-1" to clear, or nodelist format like "0", "0,2", "0-3" */
>> +	if (kstrtoint(kbuf, 10, &val) == 0 && val == -1)
>> +		nodes_clear(mask);
>> +	else if (nodelist_parse(kbuf, mask)) {
>> +		ret = -EINVAL;
>> +		goto out_free;
>> +	}
> 
> Doesn't empty string input to nodelist_parse() clears the mask?  Can't it be
> reused?
> 
Yes, empty input (echo > nid) works because nodelist_parse() handles it
correctly. However, nodelist_parse() - which is implemented via
bitmap_parselist() - cannot handle "-1" as it's not a valid range format
and would return an error. The explicit "-1" check is necessary to 
support `echo "-1" > nid` without returning an error.

So the "-1" check handles a case that nodelist_parse() cannot handle.
>> +
>> +	owner_filter.nid_mask = mask;
>> +	ret = count;
>> +
>> +out_free:
>> +	kfree(kbuf);
>> +	return ret;
>> +}
>> +
>> +static int nid_filter_show(struct seq_file *m, void *v)
>> +{
>> +	nodemask_t mask = owner_filter.nid_mask;
>> +
>> +	if (nodes_empty(mask))
>> +		seq_puts(m, "-1\n");
>> +	else
>> +		seq_printf(m, "%*pbl\n", nodemask_pr_args(&mask));
>> +
>> +	return 0;
>> +}
>> +
>> +static int nid_filter_open(struct inode *inode, struct file *file)
>> +{
>> +	return single_open(file, nid_filter_show, NULL);
>> +}
>> +
>> +static const struct file_operations nid_filter_fops = {
>> +	.owner		= THIS_MODULE,
>> +	.open		= nid_filter_open,
>> +	.read		= seq_read,
>> +	.llseek		= seq_lseek,
>> +	.write		= nid_filter_write,
>> +	.release	= single_release,
>> +};
>> +
>>   
>>   static int __init pageowner_init(void)
>>   {
>> @@ -1024,6 +1104,8 @@ static int __init pageowner_init(void)
>>   	filter_dir = debugfs_create_dir("page_owner_filter", NULL);
>>   	debugfs_create_file("print_mode", 0600, filter_dir, NULL,
>>   			    &page_owner_print_mode_fops);
>> +	debugfs_create_file("nid", 0600, filter_dir, NULL,
>> +			    &nid_filter_fops);
> 
> Why don't you use 'page_owner_' prefix like other fops, for consistency?
> 
For consistency with the other file_operations
in this module (page_owner_fops, page_owner_threshold_fops,
page_owner_print_mode_fops), I'll rename nid_filter_fops to
page_owner_nid_filter_fops.

I'll incorporate these improvements in the next version.

Thanks for the detailed review!
>>   
>>   	dir = debugfs_create_dir("page_owner_stacks", NULL);
>>   	debugfs_create_file("show_stacks", 0400, dir,
>> -- 
>> 2.20.1
> 
> [1] https://docs.kernel.org/process/submitting-patches.html#commentary
> 
> 
> Thanks,
> SJ
> 
> 
Best regards,
Zhen Ni


  reply	other threads:[~2026-04-29  9:05 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-28  7:11 [PATCH v3 0/4] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering Zhen Ni
2026-04-28  7:11 ` [PATCH v3 1/4] mm/page_owner: add filter infrastructure Zhen Ni
2026-04-28  7:11 ` [PATCH v3 2/4] mm/page_owner: add print_mode filter Zhen Ni
2026-04-29  0:57   ` SeongJae Park
2026-04-29  8:19     ` zhen.ni
2026-04-28  7:11 ` [PATCH v3 3/4] mm/page_owner: add NUMA node filter with nodelist support Zhen Ni
2026-04-28 14:16   ` Andrew Morton
2026-04-29  7:30     ` zhen.ni
2026-04-29  1:28   ` SeongJae Park
2026-04-29  9:03     ` zhen.ni [this message]
2026-04-29 14:56       ` SeongJae Park
2026-04-30  3:56         ` zhen.ni
2026-04-30  5:16           ` SeongJae Park
2026-04-30  6:00             ` zhen.ni
2026-04-28  7:11 ` [PATCH v3 4/4] mm/page_owner: document page_owner filter features Zhen Ni
2026-04-29  1:35   ` SeongJae Park
2026-04-29  9:14     ` zhen.ni
2026-04-28 14:15 ` [PATCH v3 0/4] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f09a287b-87e9-4701-afe0-929363937fd8@easystack.cn \
    --to=zhen.ni@easystack.cn \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=jackmanb@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=sj@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.