From: "Zi Yan" <ziy@nvidia.com>
To: "Zhen Ni" <zhen.ni@easystack.cn>, <akpm@linux-foundation.org>,
<vbabka@kernel.org>
Cc: <surenb@google.com>, <mhocko@suse.com>, <jackmanb@google.com>,
<hannes@cmpxchg.org>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v11 2/4] mm/page_owner: add NUMA node filter
Date: Thu, 25 Jun 2026 14:37:00 -0400
Message-ID: <DJICJHL8QY2F.2C6UR95O4RBIT@nvidia.com>
In-Reply-To: <20260625043101.338794-3-zhen.ni@easystack.cn>

On Thu Jun 25, 2026 at 12:30 AM EDT, Zhen Ni wrote:
> Add NUMA node filtering functionality to page_owner to allow filtering
> pages by specific NUMA node(s). This is useful for NUMA-aware memory
> allocation analysis and debugging.
>
> The filter supports flexible input formats:
> - Single node: nid=0
> - Multiple nodes: nid=0,2,3
> - Node range: nid=0-3
> - Mixed format: nid=0,2-4,7
>
> Example usage:
> # Using the page_owner_filter tool (recommended)
> ./page_owner_filter -n 0-3
> ./page_owner_filter -m stack_handle -n 0,2-4,7
>
> The implementation uses per-file-descriptor filter state stored in
> file->private_data, allowing each opener to have independent filter
> configuration. It uses nodemask_t for efficient multi-node filtering and
> nodelist_parse() for flexible input parsing. Node validity is verified
> using nodes_subset() to reject nodes without memory.
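
For readers unfamiliar with the list syntax described above, here is a rough
userspace sketch of how a string such as "0,2-4,7" maps to a node mask and how
a subset check rejects nodes outside an allowed set. This is not the kernel's
nodelist_parse() or nodes_subset(); parse_nodelist(), mask_subset(), MAX_NODES
and the 64-bit mask are hypothetical stand-ins chosen only for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_NODES 64	/* hypothetical limit; the kernel uses MAX_NUMNODES */

/*
 * Parse a node list such as "0,2-4,7" into a 64-bit mask.
 * Returns 0 on success, -1 on malformed, empty, or out-of-range input.
 */
static int parse_nodelist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	if (!s || !*s)
		return -1;	/* reject an empty list */

	while (*s) {
		char *end;
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*s))
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		s = end;

		if (*s == '-') {	/* range form "lo-hi" */
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}
		if (lo > hi || hi >= MAX_NODES)
			return -1;

		for (unsigned long n = lo; n <= hi; n++)
			m |= UINT64_C(1) << n;

		if (*s == ',') {
			s++;
			if (!*s)
				return -1;	/* trailing comma */
		} else if (*s) {
			return -1;	/* unexpected character */
		}
	}

	*mask = m;
	return 0;
}

/* True when every requested node is also set in the allowed mask. */
static bool mask_subset(uint64_t requested, uint64_t allowed)
{
	return (requested & ~allowed) == 0;
}
```

In this sketch, a caller would parse the user string first and then compare
the result against a mask of nodes that have memory, rejecting the write when
mask_subset() returns false.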
>
> Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
> ---
> Changes in v11:
> - Remove 'nid' member from struct page_owner to save memory
> - Read page->flags directly with poison checking
>
> Changes in v10:
> - Add 'nid' member to struct page_owner and record it at allocation time
> - Remove cond_resched() in page iteration loop (unconditional call)
> - Update NUMA filter to use saved nid instead of page_to_nid()
>
> Changes in v9:
> - Add spinlock protection for NUMA filter state access
> - Use memdesc_nid() instead of page_to_nid() to bypass PF_POISONED_CHECK()
>
> Changes in v8:
> - Add cond_resched() in page iteration loop to prevent RCU stalls
> - Reject empty nid list to avoid enabling an empty filter
> - Improve comment: "Commit all filter changes"
>
> Changes in v7:
> - per-file-descriptor implementation
>
> Changes in v6:
> - Add node validity check using nodes_subset
> to reject invalid node numbers that don't exist in the system
> - Move bool filter_by_nid declaration to top of block
> - Use kmalloc_objs instead of kmalloc
> - Remove 100 bytes overhead
>
> Changes in v5:
> - Optimize nodes_empty() check in page iteration loop
> - Add __data_racy qualifier to nid_mask field
>
> Changes in v4:
> - Remove "-1" support, use empty string to clear filter
> - Use strncpy_from_user() instead of copy_from_user()
> - Add concurrency safety documentation for nid_mask access
> - Rename fops to page_owner_nid_filter_fops for consistency
>
> Changes in v3:
> - Remove READ_ONCE/WRITE_ONCE for nodemask_t (fixes compilation errors)
> * nodemask_t is a large structure (128 bytes) that triggers compile-time asserts
> * Direct assignment is safe for this use case
> - Add comment explaining input length calculation formula
> * 6 bytes = ",NNNNN" (comma + 5-digit node number)
> - Simplify "-1" check using kstrtoint() instead of dual strcmp()
> - Move nodemask_t mask read outside PFN iteration loop for performance
> * Avoids 128-byte structure copy on each iteration
>
> Changes in v2:
> - Use nodemask_t instead of int to support multiple nodes
> - Implement nodelist_parse() to support flexible input formats
> * Single node: "0", "2"
> * Multiple nodes: "0,2,3"
> * Ranges: "0-3"
> * Mixed: "0,2-4,7"
> - Use %*pbl format for output (e.g., "0-2", "0,2-4,7")
> - Use dynamic memory allocation (kmalloc) to handle variable-length input
> - Follow cpuset's max_write_len pattern: (100 + 6 * MAX_NUMNODES)
>
> v10: https://lore.kernel.org/linux-mm/20260618035750.3724613-3-zhen.ni@easystack.cn/
> v9: https://lore.kernel.org/linux-mm/20260525081652.2210206-3-zhen.ni@easystack.cn/
> v8: https://lore.kernel.org/linux-mm/20260520075641.1931080-3-zhen.ni@easystack.cn/
> v7: https://lore.kernel.org/linux-mm/20260515091942.1535677-3-zhen.ni@easystack.cn/
> v6: https://lore.kernel.org/linux-mm/20260511033017.747781-3-zhen.ni@easystack.cn/
> v5: https://lore.kernel.org/linux-mm/20260507064643.179187-3-zhen.ni@easystack.cn/
> v4: https://lore.kernel.org/linux-mm/20260430163247.13628-3-zhen.ni@easystack.cn/
> v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-4-zhen.ni@easystack.cn/
> v2: https://lore.kernel.org/linux-mm/20260419155540.376847-4-zhen.ni@easystack.cn/
> v1: https://lore.kernel.org/linux-mm/20260417154638.22370-4-zhen.ni@easystack.cn/
> ---
> mm/page_owner.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 59 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 7595735979bf..cae5abf0ac9a 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -68,6 +68,8 @@ static const char * const page_owner_print_mode_strings[] = {
>
> struct page_owner_filter_state {
> enum page_owner_print_mode print_mode;
> + nodemask_t nid_filter;
> + bool nid_filter_enabled;
I thought about initializing nid_filter to node_states[N_MEMORY] to get
rid of nid_filter_enabled, but that adds unnecessary nid filtering to all
page_owner reads. Your approach is better.
> spinlock_t lock;
> };
>
> @@ -698,6 +700,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
> struct page_owner *page_owner;
> depot_stack_handle_t handle;
> struct page_owner_filter_state *state = file->private_data;
> + unsigned long flags;
>
> if (!static_branch_unlikely(&page_owner_inited))
> return -EINVAL;
> @@ -774,6 +777,27 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>  		if (!handle)
>  			goto ext_put_continue;
>
> +		spin_lock_irqsave(&state->lock, flags);
> +		if (state->nid_filter_enabled) {
> +			int nid;
> +			memdesc_flags_t page_flags = READ_ONCE(page->flags);
> +
> +			/*
> +			 * Bypass PF_POISONED_CHECK() in page_to_nid() to avoid
> +			 * VM_BUG_ON when accessing poisoned pages.
> +			 */
> +			if (page_flags.f == PAGE_POISON_PATTERN) {
> +				spin_unlock_irqrestore(&state->lock, flags);
> +				goto ext_put_continue;
> +			}
> +			nid = memdesc_nid(page_flags);
> +			if (!node_isset(nid, state->nid_filter)) {
> +				spin_unlock_irqrestore(&state->lock, flags);
> +				goto ext_put_continue;
> +			}
> +		}
> +		spin_unlock_irqrestore(&state->lock, flags);
> +
>  		/* Record the next PFN to read in the file offset */
>  		*ppos = pfn + 1;
>
> @@ -783,6 +807,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
> &page_owner_tmp, handle, state);
> ext_put_continue:
> page_ext_put(page_ext);
> + cond_resched();
The changelog above says cond_resched() was removed in v10. Did you miss
this, or is it intentional?

Otherwise, LGTM.

Acked-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi