From: Zhen Ni <zhen.ni@easystack.cn>
To: akpm@linux-foundation.org, vbabka@kernel.org
Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com,
hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Zhen Ni <zhen.ni@easystack.cn>
Subject: [PATCH v6 2/3] mm/page_owner: add NUMA node filter with nodelist support
Date: Mon, 11 May 2026 11:30:16 +0800
Message-ID: <20260511033017.747781-3-zhen.ni@easystack.cn>
In-Reply-To: <20260511033017.747781-1-zhen.ni@easystack.cn>
Add a NUMA node filter to page_owner so that its output can be
restricted to pages on specific NUMA node(s). This is useful for
NUMA-aware memory allocation analysis and debugging.
The filter supports flexible nodelist input formats:
- Single node: echo "0" > nid
- Multiple nodes: echo "0,2,3" > nid
- Node range: echo "0-3" > nid
- Mixed format: echo "0,2-4,7" > nid
- Clear filter: echo > nid (empty string)
The implementation uses nodemask_t for efficient multi-node filtering
and nodelist_parse() for flexible input parsing. Empty input clears
the filter. Input naming a node that is not set in
node_states[N_MEMORY] is rejected with -EINVAL.
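
For readers unfamiliar with the nodelist syntax, the accepted formats can be
illustrated with a small userspace sketch. This is not the kernel's
nodelist_parse(); the names sketch_nodelist_parse(), sketch_page_passes() and
SKETCH_MAX_NODES are hypothetical, the mask is limited to 64 nodes, and the
kernel parser may be more lenient (for example about whitespace) than this
one.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_NODES 64	/* hypothetical limit; the kernel uses MAX_NUMNODES */

/*
 * Parse a nodelist such as "0,2-4,7" into a 64-bit mask.
 * Returns 0 on success, -1 on malformed input or an out-of-range node.
 * Parsing stops at the first newline; empty input yields an empty mask.
 */
static int sketch_nodelist_parse(const char *s, uint64_t *mask)
{
	uint64_t out = 0;

	assert(s && mask);

	while (*s && *s != '\n') {
		char *end;
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*s))
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		s = end;

		if (*s == '-') {		/* range "lo-hi" */
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}

		if (lo > hi || hi >= SKETCH_MAX_NODES)
			return -1;
		for (unsigned long n = lo; n <= hi; n++)
			out |= UINT64_C(1) << n;

		if (*s == ',')			/* another item may follow */
			s++;
		else if (*s && *s != '\n')	/* unexpected character */
			return -1;
	}

	*mask = out;
	return 0;
}

/* An empty mask means "no filter": every node passes. */
static bool sketch_page_passes(uint64_t mask, unsigned int nid)
{
	if (mask == 0)
		return true;
	return nid < SKETCH_MAX_NODES && ((mask >> nid) & 1);
}
```

With this sketch, "0,2-4,7" selects nodes 0, 2, 3, 4 and 7, and an empty
string leaves the mask empty so that no filtering takes place.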
Note: Access to nid_mask uses plain loads and stores without locking
because nodemask_t is too large (128 bytes with MAX_NUMNODES = 1024) for
READ_ONCE/WRITE_ONCE. This is acceptable for a debug interface: writes
are infrequent, and a torn read would only cause temporarily
inconsistent debug output.
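
The size argument and the copy-once read pattern can be shown with a
single-threaded userspace sketch. All sketch_* names are hypothetical, and
SKETCH_MAX_NUMNODES = 1024 is an assumption about the configuration. In
userspace C an unsynchronised concurrent access would be a data race, so
this only illustrates taking one snapshot of the mask before the loop, not
the kernel's concurrency behaviour.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_NUMNODES	1024	/* assumed configuration */
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MASK_LONGS	(SKETCH_MAX_NUMNODES / SKETCH_BITS_PER_LONG)

struct sketch_nodemask {
	unsigned long bits[SKETCH_MASK_LONGS];
};

/* 1024 bits / 8 = 128 bytes: far larger than a single machine word. */
static_assert(sizeof(struct sketch_nodemask) == 128,
	      "mask is expected to be 128 bytes");

/* Shared filter; zero-initialised, and an empty mask means "no filter". */
static struct sketch_nodemask sketch_filter;

static bool sketch_mask_empty(const struct sketch_nodemask *m)
{
	for (size_t i = 0; i < SKETCH_MASK_LONGS; i++)
		if (m->bits[i])
			return false;
	return true;
}

static void sketch_node_set(unsigned int nid, struct sketch_nodemask *m)
{
	if (nid < SKETCH_MAX_NUMNODES)
		m->bits[nid / SKETCH_BITS_PER_LONG] |=
			1UL << (nid % SKETCH_BITS_PER_LONG);
}

static bool sketch_node_isset(unsigned int nid, const struct sketch_nodemask *m)
{
	if (nid >= SKETCH_MAX_NUMNODES)
		return false;
	return (m->bits[nid / SKETCH_BITS_PER_LONG] >>
		(nid % SKETCH_BITS_PER_LONG)) & 1;
}

/* Count the entries of page_nids[] that pass the current filter. */
static size_t sketch_count_matching(const unsigned int *page_nids, size_t n)
{
	/* One 128-byte copy per call instead of one per iteration. */
	struct sketch_nodemask mask = sketch_filter;
	bool filter_by_nid = !sketch_mask_empty(&mask);
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		if (filter_by_nid && !sketch_node_isset(page_nids[i], &mask))
			continue;
		hits++;
	}
	return hits;
}
```

Evaluating the emptiness check once, next to the snapshot, keeps the loop
body down to a single bit test.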
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
Changes in v6:
- Add node validity check using nodes_subset() against
node_states[N_MEMORY] to reject nodes that are not present in the system
- Move bool filter_by_nid declaration to top of block
- Use kmalloc_objs instead of kmalloc
- Remove the 100-byte overhead from the input length limit
(now 6 * MAX_NUMNODES)
Changes in v5:
- Optimize nodes_empty() check in page iteration loop
- Add __data_racy qualifier to nid_mask field
Changes in v4:
- Remove "-1" support, use empty string to clear filter
- Use strncpy_from_user() instead of copy_from_user()
- Add concurrency safety documentation for nid_mask access
- Rename fops to page_owner_nid_filter_fops for consistency
Changes in v3:
- Remove READ_ONCE/WRITE_ONCE for nodemask_t (fixes compilation errors)
* nodemask_t is a large structure (128 bytes) that triggers compile-time asserts
* Direct assignment is safe for this use case
- Add comment explaining input length calculation formula
* 6 bytes = ",NNNNN" (comma + 5-digit node number)
- Simplify "-1" check using kstrtoint() instead of dual strcmp()
- Move nodemask_t mask read outside PFN iteration loop for performance
* Avoids 128-byte structure copy on each iteration
Changes in v2:
- Use nodemask_t instead of int to support multiple nodes
- Implement nodelist_parse() to support flexible input formats
* Single node: "0", "2"
* Multiple nodes: "0,2,3"
* Ranges: "0-3"
* Mixed: "0,2-4,7"
- Use %*pbl format for output (e.g., "0-2", "0,2-4,7")
- Use dynamic memory allocation (kmalloc) to handle variable-length input
- Follow cpuset's max_write_len pattern: (100 + 6 * MAX_NUMNODES)
v5: https://lore.kernel.org/linux-mm/20260507064643.179187-3-zhen.ni@easystack.cn/
v4: https://lore.kernel.org/linux-mm/20260430163247.13628-3-zhen.ni@easystack.cn/
v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-4-zhen.ni@easystack.cn/
v2: https://lore.kernel.org/linux-mm/20260419155540.376847-4-zhen.ni@easystack.cn/
v1: https://lore.kernel.org/linux-mm/20260417154638.22370-4-zhen.ni@easystack.cn/
---
mm/page_owner.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 92 insertions(+)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 27a412c52d41..8a38005539ff 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -67,10 +67,16 @@ static const char * const page_owner_print_mode_strings[] = {
struct page_owner_filter {
enum page_owner_print_mode print_mode;
+ /*
+ * Lockless access: nodemask_t exceeds READ_ONCE/WRITE_ONCE size limit.
+ * Torn reads acceptable for debug interface with infrequent writes.
+ */
+ nodemask_t __data_racy nid_mask;
};
static struct page_owner_filter owner_filter = {
.print_mode = PAGE_OWNER_PRINT_FULL_STACK,
+ .nid_mask = NODE_MASK_NONE,
};
static bool page_owner_enabled __initdata;
@@ -687,6 +693,8 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
struct page_ext *page_ext;
struct page_owner *page_owner;
depot_stack_handle_t handle;
+ nodemask_t mask;
+ bool filter_by_nid;
if (!static_branch_unlikely(&page_owner_inited))
return -EINVAL;
@@ -700,6 +708,9 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
pfn++;
+ mask = owner_filter.nid_mask;
+ filter_by_nid = !nodes_empty(mask);
+
/* Find an allocated page */
for (; pfn < max_pfn; pfn++) {
/*
@@ -732,6 +743,14 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
if (unlikely(!page_ext))
continue;
+ /* NUMA node filter using bitmask */
+ if (filter_by_nid) {
+ int nid = page_to_nid(page);
+
+ if (!node_isset(nid, mask))
+ goto ext_put_continue;
+ }
+
/*
* Some pages could be missed by concurrent allocation or free,
* because we don't hold the zone lock.
@@ -1043,6 +1062,77 @@ static const struct file_operations page_owner_print_mode_fops = {
.llseek = default_llseek,
};
+static ssize_t nid_filter_write(struct file *file,
+ const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ char *kbuf;
+ nodemask_t mask;
+ int ret;
+
+ /*
+ * Limit input size to handle worst-case nodelist (all nodes).
+ * Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes.
+ */
+ if (count > (6 * MAX_NUMNODES))
+ return -EINVAL;
+
+ kbuf = kmalloc_objs(*kbuf, count + 1);
+ if (!kbuf)
+ return -ENOMEM;
+
+ if (strncpy_from_user(kbuf, buf, count) < 0) {
+ ret = -EFAULT;
+ goto out_free;
+ }
+ kbuf[count] = '\0';
+
+ /* Support nodelist format like "0", "0,2", "0-3", or empty to clear */
+ if (nodelist_parse(kbuf, mask)) {
+ ret = -EINVAL;
+ goto out_free;
+ }
+
+ /* Reject nodes that are not set in node_states[N_MEMORY] */
+ if (!nodes_subset(mask, node_states[N_MEMORY])) {
+ ret = -EINVAL;
+ goto out_free;
+ }
+
+ owner_filter.nid_mask = mask;
+ ret = count;
+
+out_free:
+ kfree(kbuf);
+ return ret;
+}
+
+static int nid_filter_show(struct seq_file *m, void *v)
+{
+ nodemask_t mask = owner_filter.nid_mask;
+
+ if (nodes_empty(mask))
+ seq_puts(m, "\n");
+ else
+ seq_printf(m, "%*pbl\n", nodemask_pr_args(&mask));
+
+ return 0;
+}
+
+static int nid_filter_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, nid_filter_show, NULL);
+}
+
+static const struct file_operations page_owner_nid_filter_fops = {
+ .owner = THIS_MODULE,
+ .open = nid_filter_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .write = nid_filter_write,
+ .release = single_release,
+};
+
static int __init pageowner_init(void)
{
@@ -1058,6 +1148,8 @@ static int __init pageowner_init(void)
filter_dir = debugfs_create_dir("page_owner_filter", NULL);
debugfs_create_file("print_mode", 0600, filter_dir, NULL,
&page_owner_print_mode_fops);
+ debugfs_create_file("nid", 0600, filter_dir, NULL,
+ &page_owner_nid_filter_fops);
dir = debugfs_create_dir("page_owner_stacks", NULL);
debugfs_create_file("show_stacks", 0400, dir,
--
2.20.1