* [PATCH v2 0/3] mm/page_owner: add filter infrastructure for print_mode and NUMA filtering
From: Zhen Ni @ 2026-04-19 15:55 UTC (permalink / raw)
To: akpm, vbabka
Cc: surenb, mhocko, jackmanb, hannes, ziy, linux-mm, linux-kernel,
Zhen Ni
This patch series introduces filtering capabilities to the page_owner
feature to address storage and performance challenges in production
environments.
Changes from v1:
- Renamed 'compact' to 'print_mode' with enum type for better clarity
* PAGE_OWNER_PRINT_FULL_STACK (0): print full stack traces
* PAGE_OWNER_PRINT_STACK_HANDLE (1): print only stack handles
- Changed NUMA filter from single node to nodelist with bitmask support
* Uses nodelist_parse() to support "0", "0,2", "0-3", "0,2-4,7" formats
* Uses nodemask_t internally for efficient multi-node filtering
* Output uses %*pbl format (e.g., "0-2", "0,2-4,7")
- Improved memory handling in nid_filter_write using dynamic allocation
* Limit: (100 + 6 * MAX_NUMNODES) to handle worst-case input
These changes address feedback from v1 review:
- "compact" was too vague → use descriptive enum (PAGE_OWNER_PRINT_*)
- Single node filter was limiting → use nodelist_parse() for multi-node support
Problem Statement
=================
In production environments with large memory configurations (e.g., 250GB+),
collecting page_owner information often results in files ranging from
several gigabytes to over 10GB. This creates significant challenges:
1. Storage pressure on production systems
2. Difficulty transferring large files from production environments
3. Post-processing overhead with tools/mm/page_owner_sort.c
The primary contributor to file size is redundant stack trace
information. While the kernel already deduplicates stacks via
stackdepot, page_owner retrieves and stores full stack traces for
each page, only to deduplicate them again during post-processing.
Additionally, in NUMA-aware environments (e.g., DPDK-based cloud
deployments where QEMU processes are bound to specific NUMA nodes),
OOM events are often node-specific rather than system-wide.
Currently, page_owner cannot filter by NUMA node, forcing users to
collect and analyze data for all nodes.
Solution
========
This patch series introduces a flexible filter infrastructure with
two initial filters:
1. **Print Mode Filter**: Outputs only stack handles instead of
full stack traces. The handle-to-stack mapping can be retrieved
from the existing show_stacks_handles interface. This dramatically
reduces output size while preserving all allocation metadata.
2. **NUMA Node Filter**: Allows filtering pages by specific NUMA node(s)
using flexible nodelist format, enabling targeted analysis of memory
issues in NUMA-aware deployments.
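As a rough illustration of how print_mode output might be consumed, the
userspace sketch below tallies allocation records per stack handle from
lines of the form "handle: N" (the format shown in this series). The names
parse_handle_line(), tally_handle(), struct handle_count and the fixed
table size are hypothetical. They are not part of these patches or of
tools/mm/page_owner_sort.c.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_HANDLES 1024	/* arbitrary bound chosen for this sketch */

struct handle_count {
	unsigned int handle;	/* stack depot handle as printed */
	unsigned long records;	/* number of records that carry it */
};

/* Parse a "handle: N" line; return 1 and store N on success, 0 otherwise. */
static int parse_handle_line(const char *line, unsigned int *handle)
{
	return sscanf(line, "handle: %u", handle) == 1;
}

/* Count one more record for @handle; return -1 if the table is full. */
static int tally_handle(struct handle_count *tab, size_t *n,
			unsigned int handle)
{
	size_t i;

	for (i = 0; i < *n; i++) {
		if (tab[i].handle == handle) {
			tab[i].records++;
			return 0;
		}
	}
	if (*n == MAX_HANDLES)
		return -1;
	tab[*n].handle = handle;
	tab[*n].records = 1;
	(*n)++;
	return 0;
}
```

A caller would read the saved page_owner output line by line (for example
with fgets()), pass each line to parse_handle_line(), and feed matching
handles to tally_handle(). The resulting table can then be joined against
show_stacks_handles to recover the stack for each handle.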
Implementation
==============
The series is structured as follows:
- Patch 1: Add filter infrastructure (data structures and
debugfs directory)
- Patch 2: Implement print_mode filter
- Patch 3: Implement NUMA node filter with nodelist support
Usage Example
=============
Enable print_mode and filter for NUMA nodes 0,2-3:
# cd /sys/kernel/debug/page_owner_filter/
# echo 1 > print_mode
# echo "0,2-3" > nid
# cat /sys/kernel/debug/page_owner > page_owner.txt
Sample print_mode output (showing handles only):
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper),
ts 0 ns PFN 0x40000 type Unmovable Block 512 type Unmovable
Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Page allocated via order 0, mask 0x252000(__GFP_NOWARN|
__GFP_NORETRY|__GFP_COMP|__GFP_THISNODE), pid 0, tgid 0 (swapper),
ts 0 ns PFN 0x40002 type Unmovable Block 512 type Unmovable
Flags 0x23fffe0000000200(workingset|node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Testing
=======
Tested on a system with multiple NUMA nodes. Verified that:
- Filters work independently and in combination
- Print_mode output correlates correctly with show_stacks_handles
- Default behavior (filters disabled) remains unchanged
- NUMA filter works with single node, multiple nodes, and ranges
Example test session:
# cat print_mode
0
# echo "0,1-2" > nid
# cat nid
0-2
# echo "0,2-3" > nid
# cat nid
0,2-3
# echo 1 > print_mode
# head -n 100 /sys/kernel/debug/page_owner
[Shows print_mode output with handles only]
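The nid round trip in the session above ("0,1-2" is read back as "0-2")
can be modelled in userspace. The sketch below is only an approximation of
what nodelist_parse() and the %*pbl format do. It assumes at most 64 nodes
so that the mask fits in one word, and the names sketch_nodelist_parse()
and sketch_nodelist_format() are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_MAX_NODES 64	/* assumption: mask fits in one 64-bit word */

/* Parse "0,2-4,7" into a bitmask; return 0 on success, -1 on bad input. */
static int sketch_nodelist_parse(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	while (*s && *s != '\n') {
		char *end;
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*s))
			return -1;
		lo = hi = strtoul(s, &end, 10);
		if (*end == '-') {
			s = end + 1;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
		}
		if (lo > hi || hi >= SKETCH_MAX_NODES)
			return -1;
		for (; lo <= hi; lo++)
			m |= UINT64_C(1) << lo;
		s = end;
		if (*s == ',')
			s++;
	}
	*mask = m;
	return 0;
}

/* Format @mask as a range list; return its length, -1 if @buf is too small. */
static int sketch_nodelist_format(uint64_t mask, char *buf, size_t len)
{
	int n = 0, bit = 0;

	if (!len)
		return -1;
	buf[0] = '\0';
	while (bit < SKETCH_MAX_NODES) {
		int start, w;

		if (!((mask >> bit) & 1)) {
			bit++;
			continue;
		}
		/* Extend the run while the next bit is also set. */
		start = bit;
		while (bit + 1 < SKETCH_MAX_NODES && ((mask >> (bit + 1)) & 1))
			bit++;
		if (start == bit)
			w = snprintf(buf + n, len - n, "%s%d",
				     n ? "," : "", start);
		else
			w = snprintf(buf + n, len - n, "%s%d-%d",
				     n ? "," : "", start, bit);
		if (w < 0 || (size_t)w >= len - n)
			return -1;
		n += w;
		bit++;
	}
	return n;
}
```

Adjacent nodes collapse into a single range on output, which is why the
string read back from nid may differ textually from the string written
while describing the same set of nodes.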
Future Enhancements
===================
The filter infrastructure is designed to be extensible. Potential
future filters could include:
- PID/TGID filtering
- Time range filtering (allocation timestamp windows)
- GFP flag filtering
- Migration type filtering
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
Zhen Ni (3):
mm/page_owner: add filter infrastructure
mm/page_owner: add print_mode filter
mm/page_owner: add NUMA node filter with nodelist support
mm/page_owner.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 122 insertions(+), 2 deletions(-)
--
2.20.1
* [PATCH v2 1/3] mm/page_owner: add filter infrastructure
From: Zhen Ni @ 2026-04-19 15:55 UTC (permalink / raw)
To: akpm, vbabka
Cc: surenb, mhocko, jackmanb, hannes, ziy, linux-mm, linux-kernel,
Zhen Ni
Add the data structures for page_owner filtering and create the
debugfs directory for filter controls.
This adds:
- enum page_owner_print_mode with values for full_stack and stack_handle
- struct page_owner_filter with print_mode and nid_mask fields
- Static owner_filter instance initialized with default values
- page_owner_filter debugfs directory
The filter infrastructure will be used to add print_mode and NUMA node
filtering capabilities in subsequent commits.
Link: https://lore.kernel.org/linux-mm/20260417154638.22370-2-zhen.ni@easystack.cn/
Suggested-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
Changes in v2:
- Use enum page_owner_print_mode instead of bool 'compact' for better clarity
- Use nodemask_t instead of int 'nid' to support multi-node filtering
---
mm/page_owner.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 8178e0be557f..5884d883837e 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -54,6 +54,21 @@ struct stack_print_ctx {
u8 flags;
};
+enum page_owner_print_mode {
+ PAGE_OWNER_PRINT_FULL_STACK,
+ PAGE_OWNER_PRINT_STACK_HANDLE,
+};
+
+struct page_owner_filter {
+ enum page_owner_print_mode print_mode;
+ nodemask_t nid_mask;
+};
+
+static struct page_owner_filter owner_filter = {
+ .print_mode = PAGE_OWNER_PRINT_FULL_STACK,
+ .nid_mask = NODE_MASK_NONE,
+};
+
static bool page_owner_enabled __initdata;
DEFINE_STATIC_KEY_FALSE(page_owner_inited);
@@ -973,7 +988,7 @@ DEFINE_SIMPLE_ATTRIBUTE(page_owner_threshold_fops, &page_owner_threshold_get,
static int __init pageowner_init(void)
{
- struct dentry *dir;
+ struct dentry *dir, *filter_dir;
if (!static_branch_unlikely(&page_owner_inited)) {
pr_info("page_owner is disabled\n");
@@ -981,6 +996,9 @@ static int __init pageowner_init(void)
}
debugfs_create_file("page_owner", 0400, NULL, NULL, &page_owner_fops);
+
+ filter_dir = debugfs_create_dir("page_owner_filter", NULL);
+
dir = debugfs_create_dir("page_owner_stacks", NULL);
debugfs_create_file("show_stacks", 0400, dir,
(void *)(STACK_PRINT_FLAG_STACK |
--
2.20.1
* [PATCH v2 2/3] mm/page_owner: add print_mode filter
From: Zhen Ni @ 2026-04-19 15:55 UTC (permalink / raw)
To: akpm, vbabka
Cc: surenb, mhocko, jackmanb, hannes, ziy, linux-mm, linux-kernel,
Zhen Ni
Add print_mode functionality to reduce page_owner output size by
printing only the stack handle instead of the full stack trace.
Example output with print_mode enabled:
Page allocated via order 0, mask 0x42800(GFP_NOWAIT|__GFP_COMP),
pid 1, tgid 1 (systemd), ts 349667370 ns
PFN 0xa00a2 type Unmovable Block 1280 type Unmovable
Flags 0x33fffe0000004124(referenced|lru|active|private|node=3|zone=0|
lastcpupid=0x1ffff)
handle: 17432583
Charged to memcg /
Print mode significantly reduces output size while preserving all
other page allocation information. The correspondence between handles
and stack traces can be obtained through the show_stacks_handles interface.
Link: https://lore.kernel.org/linux-mm/20260417154638.22370-3-zhen.ni@easystack.cn/
Suggested-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
Changes in v2:
- Renamed from 'compact mode' to 'print_mode' for better clarity
- Use enum values (0=full_stack, 1=stack_handle) instead of boolean
- Update debugfs filename from 'compact' to 'print_mode'
---
mm/page_owner.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 5884d883837e..6d87b6948cfa 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -590,7 +590,13 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
migratetype_names[pageblock_mt],
&page->flags);
- ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
+ /* Print mode: full stack or stack handle */
+ if (READ_ONCE(owner_filter.print_mode) == PAGE_OWNER_PRINT_STACK_HANDLE) {
+ ret += scnprintf(kbuf + ret, count - ret,
+ "handle: %u\n", handle);
+ } else {
+ ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
+ }
if (ret >= count)
goto err;
@@ -985,6 +991,24 @@ static int page_owner_threshold_set(void *data, u64 val)
DEFINE_SIMPLE_ATTRIBUTE(page_owner_threshold_fops, &page_owner_threshold_get,
&page_owner_threshold_set, "%llu");
+static int page_owner_print_mode_get(void *data, u64 *val)
+{
+ *val = READ_ONCE(owner_filter.print_mode);
+ return 0;
+}
+
+static int page_owner_print_mode_set(void *data, u64 val)
+{
+ if (val > PAGE_OWNER_PRINT_STACK_HANDLE)
+ return -EINVAL;
+ WRITE_ONCE(owner_filter.print_mode, val);
+ return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(page_owner_print_mode_fops,
+ &page_owner_print_mode_get,
+ &page_owner_print_mode_set, "%lld");
+
static int __init pageowner_init(void)
{
@@ -998,6 +1022,8 @@ static int __init pageowner_init(void)
debugfs_create_file("page_owner", 0400, NULL, NULL, &page_owner_fops);
filter_dir = debugfs_create_dir("page_owner_filter", NULL);
+ debugfs_create_file("print_mode", 0600, filter_dir, NULL,
+ &page_owner_print_mode_fops);
dir = debugfs_create_dir("page_owner_stacks", NULL);
debugfs_create_file("show_stacks", 0400, dir,
--
2.20.1
* [PATCH v2 3/3] mm/page_owner: add NUMA node filter with nodelist support
From: Zhen Ni @ 2026-04-19 15:55 UTC (permalink / raw)
To: akpm, vbabka
Cc: surenb, mhocko, jackmanb, hannes, ziy, linux-mm, linux-kernel,
Zhen Ni
Add NUMA node filtering functionality to page_owner to allow
filtering pages by specific NUMA node(s) using nodelist format.
The filter allows users to focus on pages from specific NUMA nodes,
which is useful for NUMA-aware memory allocation analysis and debugging.
Supported input formats:
- Single node: echo "2" > nid
- Multiple nodes: echo "0,2,3" > nid
- Node range: echo "0-3" > nid
- Mixed format: echo "0,2-4,7" > nid
- Disable filter: echo "-1" > nid
Link: https://lore.kernel.org/linux-mm/20260417154638.22370-4-zhen.ni@easystack.cn/
Suggested-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
Changes in v2:
- Use nodemask_t instead of int to support multiple nodes
- Use nodelist_parse() to support flexible input formats
* Single node: "0", "2"
* Multiple nodes: "0,2,3"
* Ranges: "0-3"
* Mixed: "0,2-4,7"
- Use %*pbl format for output (e.g., "0-2", "0,2-4,7")
- Use dynamic memory allocation (kmalloc) to handle variable-length input
- Follow cpuset's max_write_len pattern: (100 + 6 * MAX_NUMNODES)
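
For reference, the filter semantics in this patch (an empty mask means no
filtering, and writing "-1" clears the mask) can be modelled in userspace
as below. This is a sketch under the assumption of at most 64 nodes. The
helper names sketch_is_clear_request() and sketch_page_passes() are
hypothetical and do not appear in the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if the written string asks to disable the filter ("-1"). */
static bool sketch_is_clear_request(const char *s)
{
	return strcmp(s, "-1") == 0 || strcmp(s, "-1\n") == 0;
}

/*
 * Decide whether a page on node @nid should be reported.
 * An empty mask disables filtering, so every page passes.
 */
static bool sketch_page_passes(uint64_t mask, unsigned int nid)
{
	if (!mask)
		return true;
	return nid < 64 && ((mask >> nid) & 1);
}
```

With mask 0xd (nodes 0,2-3), a page on node 2 is reported and a page on
node 1 is skipped. With an empty mask every page is reported, which
matches the default behavior when the filter is disabled.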
---
mm/page_owner.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 76 insertions(+)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 6d87b6948cfa..8c13bb3798d8 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -707,6 +707,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
* user through copy_to_user() or GFP_KERNEL allocations.
*/
struct page_owner page_owner_tmp;
+ nodemask_t mask;
/*
* If the new page is in a new MAX_ORDER_NR_PAGES area,
@@ -730,6 +731,15 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
if (unlikely(!page_ext))
continue;
+ /* NUMA node filter using bitmask */
+ mask = READ_ONCE(owner_filter.nid_mask);
+ if (!nodes_empty(mask)) {
+ int nid = page_to_nid(page);
+
+ if (!node_isset(nid, mask))
+ goto ext_put_continue;
+ }
+
/*
* Some pages could be missed by concurrent allocation or free,
* because we don't hold the zone lock.
@@ -1009,6 +1019,70 @@ DEFINE_SIMPLE_ATTRIBUTE(page_owner_print_mode_fops,
&page_owner_print_mode_get,
&page_owner_print_mode_set, "%lld");
+static ssize_t nid_filter_write(struct file *file,
+ const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ char *kbuf;
+ nodemask_t mask;
+ int ret;
+
+ /* Limit input size to handle worst-case nodelist (all nodes) */
+ if (count > (100 + 6 * MAX_NUMNODES))
+ return -EINVAL;
+
+ kbuf = kmalloc(count + 1, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ if (copy_from_user(kbuf, buf, count)) {
+ ret = -EFAULT;
+ goto out_free;
+ }
+ kbuf[count] = '\0';
+
+ /* Support: "-1" to clear, or nodelist format like "0", "0,2", "0-3" */
+ if (strcmp(kbuf, "-1\n") == 0 || strcmp(kbuf, "-1") == 0)
+ nodes_clear(mask);
+ else if (nodelist_parse(kbuf, mask)) {
+ ret = -EINVAL;
+ goto out_free;
+ }
+
+ WRITE_ONCE(owner_filter.nid_mask, mask);
+ ret = count;
+
+out_free:
+ kfree(kbuf);
+ return ret;
+}
+
+static int nid_filter_show(struct seq_file *m, void *v)
+{
+ nodemask_t mask = READ_ONCE(owner_filter.nid_mask);
+
+ if (nodes_empty(mask))
+ seq_puts(m, "-1\n");
+ else
+ seq_printf(m, "%*pbl\n", nodemask_pr_args(&mask));
+
+ return 0;
+}
+
+static int nid_filter_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, nid_filter_show, NULL);
+}
+
+static const struct file_operations nid_filter_fops = {
+ .owner = THIS_MODULE,
+ .open = nid_filter_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .write = nid_filter_write,
+ .release = single_release,
+};
+
static int __init pageowner_init(void)
{
@@ -1024,6 +1098,8 @@ static int __init pageowner_init(void)
filter_dir = debugfs_create_dir("page_owner_filter", NULL);
debugfs_create_file("print_mode", 0600, filter_dir, NULL,
&page_owner_print_mode_fops);
+ debugfs_create_file("nid", 0600, filter_dir, NULL,
+ &nid_filter_fops);
dir = debugfs_create_dir("page_owner_stacks", NULL);
debugfs_create_file("show_stacks", 0400, dir,
--
2.20.1