Linux-mm Archive on lore.kernel.org
From: "Zi Yan" <ziy@nvidia.com>
To: "Zhen Ni" <zhen.ni@easystack.cn>, <akpm@linux-foundation.org>,
	<vbabka@kernel.org>
Cc: <surenb@google.com>, <mhocko@suse.com>, <jackmanb@google.com>,
	<hannes@cmpxchg.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v11 2/4] mm/page_owner: add NUMA node filter
Date: Thu, 25 Jun 2026 14:37:00 -0400
Message-ID: <DJICJHL8QY2F.2C6UR95O4RBIT@nvidia.com>
In-Reply-To: <20260625043101.338794-3-zhen.ni@easystack.cn>

On Thu Jun 25, 2026 at 12:30 AM EDT, Zhen Ni wrote:
> Add NUMA node filtering functionality to page_owner to allow filtering
> pages by specific NUMA node(s). This is useful for NUMA-aware memory
> allocation analysis and debugging.
>
> The filter supports flexible input formats:
> - Single node: nid=0
> - Multiple nodes: nid=0,2,3
> - Node range: nid=0-3
> - Mixed format: nid=0,2-4,7
>
> Example usage:
>   # Using the page_owner_filter tool (recommended)
>   ./page_owner_filter -n 0-3
>   ./page_owner_filter -m stack_handle -n 0,2-4,7
>
> The implementation uses per-file-descriptor filter state stored in
> file->private_data, allowing each opener to have independent filter
> configuration. It uses nodemask_t for efficient multi-node filtering and
> nodelist_parse() for flexible input parsing. Node validity is verified
> using nodes_subset() to reject nodes without memory.
>
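
For readers unfamiliar with the list syntax described above, here is a
rough userspace sketch of how a string such as "0,2-4,7" maps to a node
bitmask and how a subset check rejects unavailable nodes. It is not the
kernel's nodelist_parse() or nodes_subset(); the sketch_* names and the
64-node limit are assumptions made purely for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_NODES 64	/* hypothetical limit, not MAX_NUMNODES */

/*
 * Parse a list such as "0,2-4,7" into a bitmask (bit N set == node N).
 * Returns 0 on success, -EINVAL on malformed, empty or out-of-range input.
 * Illustrative only: this is not the kernel's nodelist_parse().
 */
static int sketch_parse_nodelist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	for (;;) {
		char *end;
		unsigned long lo, hi;

		/* Every item must start with a digit (also rejects ""). */
		if (*s < '0' || *s > '9')
			return -EINVAL;
		lo = strtoul(s, &end, 10);
		hi = lo;
		if (*end == '-') {
			/* Range "lo-hi": parse the upper bound. */
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -EINVAL;
			hi = strtoul(s, &end, 10);
		}
		if (lo > hi || hi >= SKETCH_MAX_NODES)
			return -EINVAL;
		for (unsigned long n = lo; n <= hi; n++)
			m |= UINT64_C(1) << n;
		if (*end == '\0')
			break;
		if (*end != ',')
			return -EINVAL;
		s = end + 1;
	}
	*mask = m;
	return 0;
}

/* True if every node in @mask is also in @allowed (cf. nodes_subset()). */
static bool sketch_mask_is_subset(uint64_t mask, uint64_t allowed)
{
	return (mask & ~allowed) == 0;
}
```

With this sketch, "0,2-4,7" yields the mask 0x9d (bits 0, 2, 3, 4 and 7),
and a request for node 7 on a system whose allowed mask is 0xf fails the
subset check.
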
> Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
> ---
> Changes in v11:
> - Remove 'nid' member from struct page_owner to save memory
> - Read page->flags directly with poison checking
>
> Changes in v10:
> - Add 'nid' member to struct page_owner and record it at allocation time
> - Remove cond_resched() in page iteration loop (unconditional call)
> - Update NUMA filter to use saved nid instead of page_to_nid()
>
> Changes in v9:
> - Add spinlock protection for NUMA filter state access
> - Use memdesc_nid() instead of page_to_nid() to bypass PF_POISONED_CHECK()
>
> Changes in v8:
> - Add cond_resched() in page iteration loop to prevent RCU stalls
> - Reject empty nid list to avoid enabling an empty filter
> - Improve comment: "Commit all filter changes"
>
> Changes in v7:
> - per-file-descriptor implementation
>
> Changes in v6:
> - Add node validity check using nodes_subset
>   to reject invalid node numbers that don't exist in the system
> - Move bool filter_by_nid declaration to top of block
> - Use kmalloc_objs instead of kmalloc
> - Remove 100 bytes overhead
>
> Changes in v5:
> - Optimize nodes_empty() check in page iteration loop
> - Add __data_racy qualifier to nid_mask field
>
> Changes in v4:
> - Remove "-1" support, use empty string to clear filter
> - Use strncpy_from_user() instead of copy_from_user()
> - Add concurrency safety documentation for nid_mask access
> - Rename fops to page_owner_nid_filter_fops for consistency
>
> Changes in v3:
> - Remove READ_ONCE/WRITE_ONCE for nodemask_t (fixes compilation errors)
>   * nodemask_t is a large structure (128 bytes) that triggers compile-time asserts
>   * Direct assignment is safe for this use case
> - Add comment explaining input length calculation formula
>   * 6 bytes = ",NNNNN" (comma + 5-digit node number)
> - Simplify "-1" check using kstrtoint() instead of dual strcmp()
> - Move nodemask_t mask read outside PFN iteration loop for performance
>   * Avoids 128-byte structure copy on each iteration
>
> Changes in v2:
> - Use nodemask_t instead of int to support multiple nodes
> - Implement nodelist_parse() to support flexible input formats
>   * Single node: "0", "2"
>   * Multiple nodes: "0,2,3"
>   * Ranges: "0-3"
>   * Mixed: "0,2-4,7"
> - Use %*pbl format for output (e.g., "0-2", "0,2-4,7")
> - Use dynamic memory allocation (kmalloc) to handle variable-length input
> - Follow cpuset's max_write_len pattern: (100 + 6 * MAX_NUMNODES)
>
> v10: https://lore.kernel.org/linux-mm/20260618035750.3724613-3-zhen.ni@easystack.cn/
> v9: https://lore.kernel.org/linux-mm/20260525081652.2210206-3-zhen.ni@easystack.cn/
> v8: https://lore.kernel.org/linux-mm/20260520075641.1931080-3-zhen.ni@easystack.cn/
> v7: https://lore.kernel.org/linux-mm/20260515091942.1535677-3-zhen.ni@easystack.cn/
> v6: https://lore.kernel.org/linux-mm/20260511033017.747781-3-zhen.ni@easystack.cn/
> v5: https://lore.kernel.org/linux-mm/20260507064643.179187-3-zhen.ni@easystack.cn/
> v4: https://lore.kernel.org/linux-mm/20260430163247.13628-3-zhen.ni@easystack.cn/
> v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-4-zhen.ni@easystack.cn/
> v2: https://lore.kernel.org/linux-mm/20260419155540.376847-4-zhen.ni@easystack.cn/
> v1: https://lore.kernel.org/linux-mm/20260417154638.22370-4-zhen.ni@easystack.cn/
> ---
>  mm/page_owner.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 59 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 7595735979bf..cae5abf0ac9a 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -68,6 +68,8 @@ static const char * const page_owner_print_mode_strings[] = {
>  
>  struct page_owner_filter_state {
>  	enum page_owner_print_mode print_mode;
> +	nodemask_t nid_filter;
> +	bool nid_filter_enabled;

I thought about initializing nid_filter to node_states[N_MEMORY] to get
rid of nid_filter_enabled, but that would add unnecessary nid filtering
to all page_owner reads. Your approach is better.
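
To illustrate the trade-off, here is a minimal userspace sketch (the
sketch_* names and the 64-bit mask are hypothetical, not the patch's
code). With a separate enabled flag, the default unfiltered path returns
before any per-page node test; a mask pre-filled with all memory nodes
would run that test on every page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-fd filter state in the patch. */
struct sketch_filter {
	uint64_t nid_filter;		/* bit N set == node N selected */
	bool nid_filter_enabled;	/* false: no NUMA filtering at all */
};

/* Return true if a page on node @nid should be reported. */
static bool sketch_page_passes(const struct sketch_filter *f, unsigned int nid)
{
	if (!f->nid_filter_enabled)
		return true;	/* default path: skip the per-page node test */
	return nid < 64 && ((f->nid_filter >> nid) & 1);
}
```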

>  	spinlock_t lock;
>  };
>  
> @@ -698,6 +700,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>  	struct page_owner *page_owner;
>  	depot_stack_handle_t handle;
>  	struct page_owner_filter_state *state = file->private_data;
> +	unsigned long flags;
>  
>  	if (!static_branch_unlikely(&page_owner_inited))
>  		return -EINVAL;
> @@ -774,6 +777,27 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>  		if (!handle)
>  			goto ext_put_continue;
>  
> +		spin_lock_irqsave(&state->lock, flags);
> +		if (state->nid_filter_enabled) {
> +			int nid;
> +			memdesc_flags_t page_flags = READ_ONCE(page->flags);
> +
> +			/*
> +			 * Bypass PF_POISONED_CHECK() in page_to_nid() to avoid
> +			 * VM_BUG_ON when accessing poisoned pages.
> +			 */
> +			if (page_flags.f == PAGE_POISON_PATTERN) {
> +				spin_unlock_irqrestore(&state->lock, flags);
> +				goto ext_put_continue;
> +			}
> +			nid = memdesc_nid(page_flags);
> +			if (!node_isset(nid, state->nid_filter)) {
> +				spin_unlock_irqrestore(&state->lock, flags);
> +				goto ext_put_continue;
> +			}
> +		}
> +		spin_unlock_irqrestore(&state->lock, flags);
> +
>  		/* Record the next PFN to read in the file offset */
>  		*ppos = pfn + 1;
>  
> @@ -783,6 +807,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>  				&page_owner_tmp, handle, state);
>  ext_put_continue:
>  		page_ext_put(page_ext);
> +		cond_resched();

The changelog above says cond_resched() was removed in v10, yet it is
still added here. Was it left in by mistake, or is that intended?

Otherwise, LGTM.

Acked-by: Zi Yan <ziy@nvidia.com>

-- 
Best Regards,
Yan, Zi



Thread overview: 15+ messages
2026-06-25  4:30 [PATCH v11 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering Zhen Ni
2026-06-25  4:30 ` [PATCH v11 1/4] mm/page_owner: add print_mode filter Zhen Ni
2026-06-25 18:26   ` Zi Yan
2026-06-25 19:20     ` Andrew Morton
2026-06-25 19:24       ` Zi Yan
2026-06-25  4:30 ` [PATCH v11 2/4] mm/page_owner: add NUMA node filter Zhen Ni
2026-06-25 18:37   ` Zi Yan [this message]
2026-06-25 19:27   ` Zi Yan
2026-06-25 20:04     ` Andrew Morton
2026-06-25  4:31 ` [PATCH v11 3/4] tools/mm: add page_owner_filter userspace tool Zhen Ni
2026-06-25  4:50   ` Andrew Morton
2026-06-25  4:31 ` [PATCH v11 4/4] mm/page_owner: document page_owner filter Zhen Ni
2026-06-25  4:55 ` [PATCH v11 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering Andrew Morton
2026-06-25 12:57   ` zhen.ni
2026-06-25 18:22 ` Zi Yan
