From: "zhen.ni" <zhen.ni@easystack.cn>
To: SeongJae Park <sj@kernel.org>
Cc: akpm@linux-foundation.org, vbabka@kernel.org, surenb@google.com,
mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org,
ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 3/4] mm/page_owner: add NUMA node filter with nodelist support
Date: Thu, 30 Apr 2026 11:56:33 +0800 [thread overview]
Message-ID: <f2d8be6b-d19d-402d-ab53-b40639c66146@easystack.cn> (raw)
In-Reply-To: <20260429145645.80675-1-sj@kernel.org>
在 2026/4/29 22:56, SeongJae Park 写道:
> On Wed, 29 Apr 2026 17:03:56 +0800 "zhen.ni" <zhen.ni@easystack.cn> wrote:
>
>>
>>
>> 在 2026/4/29 09:28, SeongJae Park 写道:
>>> On Tue, 28 Apr 2026 15:11:11 +0800 Zhen Ni <zhen.ni@easystack.cn> wrote:
> [...]
>>>> @@ -685,6 +685,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>>>> struct page_ext *page_ext;
>>>> struct page_owner *page_owner;
>>>> depot_stack_handle_t handle;
>>>> + nodemask_t mask;
>>>>
>>>> if (!static_branch_unlikely(&page_owner_inited))
>>>> return -EINVAL;
>>>> @@ -698,6 +699,8 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>>>> while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
>>>> pfn++;
>>>>
>>>> + mask = owner_filter.nid_mask;
>>>> +
>>>
>>> READ_ONCE() was used for owner_filter.print_mode. Should nid_mask also read
>>> using READ_ONCE()?
>>>
>> The reason is that `owner_filter.nid_mask` is a nodemask_t, which is a
>> 128-byte structure. READ_ONCE() only supports types up to 8 bytes and
>> will trigger a compile-time assertion failure for larger structures.
>>
>> This was actually an issue in v2 - the AI review tool (sashiko.dev) and
>> Andrew both caught the compilation error with READ_ONCE/WRITE_ONCE on
>> nodemask_t, so v3 removed them.
>
> Thank you for kindly sharing the context. Now I understand why READ_ONCE()
> cannot be used. But, is plain load/store safe enough for nodemask_t?
> Shouldn't it still be protected against races?
>
Concurrency Safety:
I considered spinlock and RCU, but decided against them:
- Spinlock: Adds overhead on every read, overkill for a debug facility
- RCU: Requires dynamic allocation of 128-byte nodemask_t, too complex
- READ_ONCE/WRITE_ONCE: Not possible, exceeds 8-byte limit
Plain load/store is safe here because:
1. page_owner is debug code with low-frequency filter changes
2. Worst case of torn read is temporary inconsistency in debug output
3. Similar debugfs interfaces use the same approach
The overhead of locking doesn't justify the benefit for this debug use case.
Do you think this is acceptable, or would you prefer I add locking?
> [...]
>>>> +static ssize_t nid_filter_write(struct file *file,
>>>> + const char __user *buf,
>>>> + size_t count, loff_t *ppos)
>>>> +{
>>>> + char *kbuf;
>>>> + nodemask_t mask;
>>>> + int ret;
>>>> + int val;
>>>> +
>>>> + /*
>>>> + * Limit input size to handle worst-case nodelist (all nodes).
>>>> + * Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes.
>>>> + * Formula: 100 bytes overhead + 6 * MAX_NUMNODES
>>>> + */
>>>> + if (count > (100 + 6 * MAX_NUMNODES))
>>>> + return -EINVAL;
>>>> +
>>>> + kbuf = kmalloc(count + 1, GFP_KERNEL);
>>>> + if (!kbuf)
>>>> + return -ENOMEM;
>>>> +
>>>> + if (copy_from_user(kbuf, buf, count)) {
>>>> + ret = -EFAULT;
>>>> + goto out_free;
>>>> + }
>>>> + kbuf[count] = '\0';
>>>> +
>>>> + /* Support: "-1" to clear, or nodelist format like "0", "0,2", "0-3" */
>>>> + if (kstrtoint(kbuf, 10, &val) == 0 && val == -1)
>>>> + nodes_clear(mask);
>>>> + else if (nodelist_parse(kbuf, mask)) {
>>>> + ret = -EINVAL;
>>>> + goto out_free;
>>>> + }
>>>
>>> Doesn't empty string input to nodelist_parse() clears the mask? Can't it be
>>> reused?
>>>
>> Yes, empty input (echo > nid) works because nodelist_parse() handles it
>> correctly. However, nodelist_parse() - which is implemented via
>> bitmap_parselist() - cannot handle "-1" as it's not a valid range format
>> and would return an error. The explicit "-1" check is necessary to
>> support `echo "-1" > nid` without returning an error.
>>
>> So the "-1" check handles a case that nodelist_parse() cannot handle.
>
> Thank you for kindly explaining the reason. But, do we really need to support
> "-1" input? Couldn't we just redefine the interface?
>
I chose "-1" to clearly differentiate from valid NUMA node IDs (0, 1, 2,
3...).Since node IDs are non-negative integers, "-1" naturally means
"invalid" or "no filter", which is an intuitive convention in Linux
(e.g., pid -1, signal -1).
Do you have a better suggestion for how to represent "clear filter"?
> [...]
>>>> + debugfs_create_file("nid", 0600, filter_dir, NULL,
>>>> + &nid_filter_fops);
>>>
>>> Why don't you use 'page_owner_' prefix like other fops, for consistency?
>>>
>> For consistency with the other file_operations
>> in this module (page_owner_fops, page_owner_threshold_fops,
>> page_owner_print_mode_fops), I'll rename nid_filter_fops to
>> page_owner_nid_filter_fops.
>>
>> I'll incorporate these improvements in the next version.
>
> Thank you for kindly accepting my humble suggestions.
>
>>
>> Thanks for the detailed review!
>
> Thank you for sharing this nice work, too!
>
>
> Thanks,
> SJ
>
> [...]
>
>
next prev parent reply other threads:[~2026-04-30 3:56 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-28 7:11 [PATCH v3 0/4] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering Zhen Ni
2026-04-28 7:11 ` [PATCH v3 1/4] mm/page_owner: add filter infrastructure Zhen Ni
2026-04-28 7:11 ` [PATCH v3 2/4] mm/page_owner: add print_mode filter Zhen Ni
2026-04-29 0:57 ` SeongJae Park
2026-04-29 8:19 ` zhen.ni
2026-04-28 7:11 ` [PATCH v3 3/4] mm/page_owner: add NUMA node filter with nodelist support Zhen Ni
2026-04-28 14:16 ` Andrew Morton
2026-04-29 7:30 ` zhen.ni
2026-04-29 1:28 ` SeongJae Park
2026-04-29 9:03 ` zhen.ni
2026-04-29 14:56 ` SeongJae Park
2026-04-30 3:56 ` zhen.ni [this message]
2026-04-30 5:16 ` SeongJae Park
2026-04-30 6:00 ` zhen.ni
2026-04-28 7:11 ` [PATCH v3 4/4] mm/page_owner: document page_owner filter features Zhen Ni
2026-04-29 1:35 ` SeongJae Park
2026-04-29 9:14 ` zhen.ni
2026-04-28 14:15 ` [PATCH v3 0/4] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f2d8be6b-d19d-402d-ab53-b40639c66146@easystack.cn \
--to=zhen.ni@easystack.cn \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=jackmanb@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=sj@kernel.org \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox