* [PATCH v7 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering
@ 2026-05-15 9:19 Zhen Ni
From: Zhen Ni @ 2026-05-15 9:19 UTC
To: akpm, vbabka
Cc: surenb, mhocko, jackmanb, hannes, ziy, linux-mm, linux-kernel,
Zhen Ni
This patch series introduces per-file-descriptor filtering capabilities to the
page_owner feature.
Changes from v6:
- Reimplement the print_mode and NUMA node filters as per-fd state (patches 1-2)
- Add page_owner_filter userspace tool (patch 3)
- Update documentation for per-fd interface (patch 4)
Changes from v5:
- Address SeongJae Park's review comments for patch 1/3:
* Remove unnecessary braces in if/else statement
* Use stack array instead of kmalloc for input buffer
- Address SeongJae Park's review comments for patch 2/3:
* Add node validity check using nodes_subset() to reject non-existent nodes
* Separate variable declaration and statement
* Use kmalloc_objs() for consistency with kernel patterns
* Remove 100 bytes overhead
- Add lore links to all previous versions
Changes from v4:
- Optimize nodes_empty() check in page iteration loop
- Add __data_racy qualifier to nid_mask field
Changes from v3:
- Change print_mode from numeric (0/1) to string-based interface
* Use "full_stack"/"stack_handle" strings instead of numbers
* Display current mode with bracket notation: "[full_stack] stack_handle"
- Remove "-1" support from NUMA filter
* Use empty string to clear filter (echo > nid)
- Use strncpy_from_user() instead of copy_from_user()
- Rename nid_filter_fops to page_owner_nid_filter_fops for consistency
- Merge patch 1 (infrastructure) and patch 2 (print_mode) from v3
- Update documentation to match new interface
* String-based examples
* Tab indentation in code blocks
Changes from v2:
- Remove READ_ONCE/WRITE_ONCE for nodemask_t (fixes compilation errors)
* nodemask_t is a large structure (128 bytes) that triggers compile-time asserts
* Direct assignment is safe for this use case
- Add comment explaining input length calculation formula
* 6 bytes = ",NNNNN" (comma + 5-digit node number)
- Simplify "-1" check using kstrtoint() instead of dual strcmp()
- Move nodemask_t mask read outside PFN iteration loop for performance
* Avoids 128-byte structure copy on each iteration
- Add documentation for filter features (patch 3/3)
Changes from v1:
- Renamed 'compact' to 'print_mode' with enum type for better clarity
* PAGE_OWNER_PRINT_FULL_STACK (0): print full stack traces
* PAGE_OWNER_PRINT_STACK_HANDLE (1): print only stack handles
- Changed NUMA filter from single node to nodelist with bitmask support
* Uses nodelist_parse() to support "0", "0,2", "0-3", "0,2-4,7" formats
* Uses nodemask_t internally for efficient multi-node filtering
* Output uses %*pbl format (e.g., "0-2", "0,2-4,7")
- Improved memory handling in nid_filter_write using dynamic allocation
* Limit: (100 + 6 * MAX_NUMNODES) to handle worst-case input
Problem Statement
=================
In production environments with large memory configurations (e.g., 250GB+),
collecting page_owner information often results in files ranging from
several gigabytes to over 10GB. This creates significant challenges:
1. Storage pressure on production systems
2. Difficulty transferring large files from production environments
3. Post-processing overhead with tools/mm/page_owner_sort.c
The primary contributor to file size is redundant stack trace
information. While the kernel already deduplicates stacks via
stackdepot, page_owner retrieves and stores full stack traces for
each page, only to deduplicate them again during post-processing.
Additionally, in NUMA-aware environments (e.g., DPDK-based cloud
deployments where QEMU processes are bound to specific NUMA nodes),
OOM events are often node-specific rather than system-wide.
Currently, page_owner cannot filter by NUMA node, forcing users to
collect and analyze data for all nodes.
Solution
========
This patch series introduces a per-file-descriptor filter infrastructure
with two initial filters:
1. **Print Mode Filter**: Selects whether each record prints the full
stack trace, only the stack handle, or both. In handle mode, the
handle-to-stack mapping can be retrieved from the existing
show_stacks_handles interface. This dramatically reduces output size
while preserving all allocation metadata.
2. **NUMA Node Filter**: Allows filtering pages by specific NUMA node(s)
using flexible nodelist format, enabling targeted analysis of memory
issues in NUMA-aware deployments.
The per-fd design lets multiple readers use page_owner concurrently with
different filters: each open file descriptor carries its own filter
state, so users in multi-user production environments do not need to
coordinate a shared setting.
Implementation
==============
The series is structured as follows:
- Patch 1: Implement print_mode filter infrastructure
* Add file->private_data to store per-fd filter state
* Add .open, .release, and .write file operations
* Support "stack", "handle", and "stack_handle" modes via "mode=" write commands
- Patch 2: Implement NUMA node filter infrastructure
* Add nid_filter field to per-fd state
* Support flexible nodelist format via "nid=" write commands (single, multiple, ranges)
* Validate nodes and reject non-existent nodes using nodes_subset()
- Patch 3: Add page_owner_filter userspace tool
* Manages per-fd filters via write() interface
* Provides user-friendly command-line interface
* Includes comprehensive input validation
- Patch 4: Document filter features and usage
Usage Example
=============
Using the page_owner_filter tool with per-fd filters:
# ./page_owner_filter -m stack_handle -n "0,2-3" -o page_owner.txt
The tool opens /sys/kernel/debug/page_owner, sets filters via write(),
then reads the filtered output to the specified file (or stdout).
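The open/write/read sequence described above can be sketched in userspace
C as follows. This is a minimal sketch, not the code from patch 3: the
helper names set_filter() and copy_all() are hypothetical, and sending one
write() per "key=value" command is an assumption about how the tool talks
to the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one "key=value" filter command to fd.
 * Returns 0 on success, -1 on failure with errno set.
 */
int set_filter(int fd, const char *key, const char *value)
{
	char cmd[64];
	int len;
	ssize_t n;

	assert(key && value);
	len = snprintf(cmd, sizeof(cmd), "%s=%s", key, value);
	if (len < 0 || (size_t)len >= sizeof(cmd)) {
		errno = EINVAL;
		return -1;
	}

	/* Assumption: each command is delivered in a single write(). */
	n = write(fd, cmd, (size_t)len);
	if (n != len) {
		if (n >= 0)
			errno = EIO;
		return -1;
	}
	return 0;
}

/* Hypothetical helper: copy everything readable from in_fd to out_fd. */
int copy_all(int in_fd, int out_fd)
{
	char buf[4096];
	ssize_t n;

	while ((n = read(in_fd, buf, sizeof(buf))) > 0) {
		ssize_t off = 0;

		/* write() may be partial, so loop until the chunk is out. */
		while (off < n) {
			ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

			if (w < 0)
				return -1;
			off += w;
		}
	}
	return n < 0 ? -1 : 0;
}
```

A caller would open /sys/kernel/debug/page_owner with O_RDWR, call
set_filter(fd, "mode", "handle") and set_filter(fd, "nid", "0,2-3"), and
then copy_all(fd, STDOUT_FILENO). Because the filter state belongs to the
fd, the writes and the reads must use the same descriptor.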
Sample print_mode output (showing handles only):
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper),
ts 0 ns PFN 0x40000 type Unmovable Block 512 type Unmovable
Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Page allocated via order 0, mask 0x252000(__GFP_NOWARN|
__GFP_NORETRY|__GFP_COMP|__GFP_THISNODE), pid 0, tgid 0 (swapper),
ts 0 ns PFN 0x40002 type Unmovable Block 512 type Unmovable
Flags 0x23fffe0000000200(workingset|node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Testing
=======
Tested on a 4-node NUMA system. The following cases were verified:
1. **Kernel without page_owner enabled**:
Tool properly detects and reports missing page_owner support:
```
$ ./page_owner_filter -m stack
Error: /sys/kernel/debug/page_owner does not exist
Make sure page_owner is enabled in kernel
```
2. **Kernel without per-fd filter support**:
Tool properly detects and reports missing filter support:
```
$ ./page_owner_filter -m stack
Error: Kernel rejected the filter command.
Possible causes:
- Kernel does not support per-fd filtering
- NUMA node has no memory
- Unknown reason
```
3. **Comprehensive userspace tool testing**:
Tested 26 test cases covering:
- Help messages (-h, --help)
- Invalid inputs (mode, nid format, range validation)
- Valid modes (stack, handle, stack_handle)
- Valid nid filters (single node, multiple nodes, ranges)
- Combined mode and nid filters
- Node validity verification (grep-based verification)
- Error handling for out-of-range nodes
Test script (test_page_owner_filter.sh):
```bash
#!/bin/bash
# Test script for page_owner_filter tool
cd "$(dirname "$0")"
echo "========================================="
echo "page_owner_filter Test Suite"
echo "========================================="
echo
echo "Test 1: -h"
echo "./page_owner_filter -h"
./page_owner_filter -h
echo
echo "Test 2: --help"
echo "./page_owner_filter --help"
./page_owner_filter --help
echo
echo "Test 3: Invalid mode"
echo ./page_owner_filter -m invalid
./page_owner_filter -m invalid
echo
echo "Test 4: Invalid nid with letters"
echo ./page_owner_filter -n 0,a,2
./page_owner_filter -n 0,a,2
echo
echo "Test 5: Invalid nid with double comma"
echo ./page_owner_filter -n 0,,2
./page_owner_filter -n 0,,2
echo
echo "Test 6: Invalid nid starting with comma"
echo ./page_owner_filter -n ,0,1
./page_owner_filter -n ,0,1
echo
echo "Test 7: Invalid nid ending with comma"
echo ./page_owner_filter -n "0,1,"
./page_owner_filter -n "0,1,"
echo
echo "Test 8: No filters specified"
echo ./page_owner_filter
./page_owner_filter
echo
echo "Test 9: Invalid nid - node 4 (out of range)"
echo ./page_owner_filter -n 4
./page_owner_filter -n 4
echo
echo "Test 10: Invalid nid - large number"
echo './page_owner_filter -n 65535'
./page_owner_filter -n 65535
echo
echo "Test 11: Invalid mode AND invalid nid"
echo ./page_owner_filter -m wrong -n abc
./page_owner_filter -m wrong -n abc
echo
echo "Test 12: Two invalid modes (try both)"
echo ./page_owner_filter -m wrong1 -m wrong2
./page_owner_filter -m wrong1 -m wrong2
echo
echo "Test 13: Valid mode - stack"
echo './page_owner_filter -m stack | head -20'
./page_owner_filter -m stack | head -20
echo
echo "Test 14: Valid mode - handle"
echo './page_owner_filter -m handle | head -20'
./page_owner_filter -m handle | head -20
echo
echo "Test 15: Valid mode - stack_handle"
echo './page_owner_filter -m stack_handle | head -20'
./page_owner_filter -m stack_handle | head -20
echo
echo "Test 16: All modes"
echo './page_owner_filter -m stack -m handle -m stack_handle | head -20'
./page_owner_filter -m stack -m handle -m stack_handle | head -20
echo
echo "Test 17: Valid nid - single"
echo './page_owner_filter -n 0 | head -20'
./page_owner_filter -n 0 | head -20
echo 'Verify: should have node=0, should NOT have node=1,2,3'
echo './page_owner_filter -n 0 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c'
./page_owner_filter -n 0 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
echo
echo "Test 18: Valid nid - multiple"
echo 'Verify: should have node=0,1,3, should NOT have node=2'
echo './page_owner_filter -n 0,1,3 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c'
./page_owner_filter -n 0,1,3 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
echo
echo "Test 19: Valid nid - range"
echo 'Verify: should have node=2,3, should NOT have node=0,1'
echo './page_owner_filter -n 2-3 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c'
./page_owner_filter -n 2-3 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
echo
echo "Test 20: Valid nid - range"
echo 'Verify: should have node=0,1,2,3'
echo './page_owner_filter -n 2-3,0-1 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c'
./page_owner_filter -n 2-3,0-1 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
echo
echo "Test 21: Valid nid - range"
echo 'Verify: should have node=2, should NOT have node=0,1,3'
echo './page_owner_filter -n 2-2 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c'
./page_owner_filter -n 2-2 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
echo
echo "Test 22: Invalid nid - range start must be <= end"
echo './page_owner_filter -n 3-0'
./page_owner_filter -n 3-0
echo
echo './page_owner_filter -n 1-0,0-1'
./page_owner_filter -n 1-0,0-1
echo
echo './page_owner_filter -n 2-3,1-0,0-1'
./page_owner_filter -n 2-3,1-0,0-1
echo
echo './page_owner_filter -n 3,1-0,1'
./page_owner_filter -n 3,1-0,1
echo
echo "Test 23: Invalid nid - NUMA node 4 and above have no memory"
echo './page_owner_filter -n 0-4'
./page_owner_filter -n 0-4
echo
echo './page_owner_filter -n 1,0-4'
./page_owner_filter -n 1,0-4
echo
echo './page_owner_filter -n 7-8'
./page_owner_filter -n 7-8
echo
echo './page_owner_filter -n 8-1'
./page_owner_filter -n 8-1
echo
echo "Test 24: Valid nid - range and comma mixed"
echo 'Verify: should have node=0,2,3, should NOT have node=1'
echo './page_owner_filter -n 2-3,0| grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c'
./page_owner_filter -n 2-3,0 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
echo
echo "Test 25: Valid nid - range and comma mixed"
echo 'Verify: should have node=1,2,3, should NOT have node=0'
echo './page_owner_filter -n 1,2-3| grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c'
./page_owner_filter -n 1,2-3 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
echo
echo "Test 26: Valid handle mode + nid filter"
echo './page_owner_filter -m handle -n "0,1" | head -20'
./page_owner_filter -m handle -n "0,1" | head -20
echo 'Verify: should show stacks, and only node=0,1 (not 2,3)'
echo './page_owner_filter -m handle -n "0,1" | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c'
./page_owner_filter -m handle -n "0,1" | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
echo
echo "========================================="
echo "Tests completed. Please check output above."
echo "========================================="
```
Test output:
```
=========================================
page_owner_filter Test Suite
=========================================
Test 1: -h
./page_owner_filter -h
Usage: ./page_owner_filter [OPTIONS]
Options:
-m, --mode MODE : print_mode (stack, handle, or stack_handle)
-n, --nid NID_LIST : NUMA node IDs (comma-separated or ranges)
-o, --output FILE : output file (default: stdout)
-h, --help : show this help message
Examples:
./page_owner_filter -m stack
./page_owner_filter -m handle
./page_owner_filter -m stack_handle
./page_owner_filter -m stack -o output.txt
./page_owner_filter -n 0,1,2
./page_owner_filter -m stack -n 0
Test 2: --help
./page_owner_filter --help
Usage: ./page_owner_filter [OPTIONS]
Options:
-m, --mode MODE : print_mode (stack, handle, or stack_handle)
-n, --nid NID_LIST : NUMA node IDs (comma-separated or ranges)
-o, --output FILE : output file (default: stdout)
-h, --help : show this help message
Examples:
./page_owner_filter -m stack
./page_owner_filter -m handle
./page_owner_filter -m stack_handle
./page_owner_filter -m stack -o output.txt
./page_owner_filter -n 0,1,2
./page_owner_filter -m stack -n 0
Test 3: Invalid mode
./page_owner_filter -m invalid
Error: Invalid mode 'invalid'
Valid modes: stack, handle, stack_handle
Test 4: Invalid nid with letters
./page_owner_filter -n 0,a,2
Error: Invalid character 'a' in nid_list
Test 5: Invalid nid with double comma
./page_owner_filter -n 0,,2
Error: Invalid nid_list format
Test 6: Invalid nid starting with comma
./page_owner_filter -n ,0,1
Error: Invalid nid_list format
Test 7: Invalid nid ending with comma
./page_owner_filter -n 0,1,
Error: Invalid nid_list format
Test 8: No filters specified
./page_owner_filter
Error: At least one filter (-m or -n) must be specified
Usage: ./page_owner_filter [OPTIONS]
Options:
-m, --mode MODE : print_mode (stack, handle, or stack_handle)
-n, --nid NID_LIST : NUMA node IDs (comma-separated or ranges)
-o, --output FILE : output file (default: stdout)
-h, --help : show this help message
Examples:
./page_owner_filter -m stack
./page_owner_filter -m handle
./page_owner_filter -m stack_handle
./page_owner_filter -m stack -o output.txt
./page_owner_filter -n 0,1,2
./page_owner_filter -m stack -n 0
Test 9: Invalid nid - node 4 (out of range)
./page_owner_filter -n 4
Error: Kernel rejected the filter command.
Possible causes:
- Kernel does not support per-fd filtering
- NUMA node has no memory
- Unknown reason
Test 10: Invalid nid - large number
./page_owner_filter -n 65535
write filter command: Numerical result out of range
Test 11: Invalid mode AND invalid nid
./page_owner_filter -m wrong -n abc
Error: Invalid mode 'wrong'
Valid modes: stack, handle, stack_handle
Test 12: Two invalid modes (try both)
./page_owner_filter -m wrong1 -m wrong2
Error: Invalid mode 'wrong1'
Valid modes: stack, handle, stack_handle
Test 13: Valid mode - stack
./page_owner_filter -m stack | head -20
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40000 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
register_early_stack+0x2c/0x70
init_page_owner+0x2c/0x460
page_ext_init+0x204/0x298
mm_core_init+0xdc/0x14c
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40001 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
register_early_stack+0x2c/0x70
init_page_owner+0x2c/0x460
page_ext_init+0x204/0x298
mm_core_init+0xdc/0x14c
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40002 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
register_early_stack+0x2c/0x70
init_page_owner+0x2c/0x460
page_ext_init+0x204/0x298
mm_core_init+0xdc/0x14c
Test 14: Valid mode - handle
./page_owner_filter -m handle | head -20
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40000 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40001 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40002 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40003 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40004 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000040(head|node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Test 15: Valid mode - stack_handle
./page_owner_filter -m stack_handle | head -20
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40000 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
register_early_stack+0x2c/0x70
init_page_owner+0x2c/0x460
page_ext_init+0x204/0x298
mm_core_init+0xdc/0x14c
handle: 1048577
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40001 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
register_early_stack+0x2c/0x70
init_page_owner+0x2c/0x460
page_ext_init+0x204/0x298
mm_core_init+0xdc/0x14c
handle: 1048577
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40002 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
register_early_stack+0x2c/0x70
init_page_owner+0x2c/0x460
Test 16: All modes
./page_owner_filter -m stack -m handle -m stack_handle | head -20
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40000 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
register_early_stack+0x2c/0x70
init_page_owner+0x2c/0x460
page_ext_init+0x204/0x298
mm_core_init+0xdc/0x14c
handle: 1048577
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40001 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
register_early_stack+0x2c/0x70
init_page_owner+0x2c/0x460
page_ext_init+0x204/0x298
mm_core_init+0xdc/0x14c
handle: 1048577
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40002 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
register_early_stack+0x2c/0x70
init_page_owner+0x2c/0x460
Test 17: Valid nid - single
./page_owner_filter -n 0 | head -20
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40000 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
register_early_stack+0x2c/0x70
init_page_owner+0x2c/0x460
page_ext_init+0x204/0x298
mm_core_init+0xdc/0x14c
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40001 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
register_early_stack+0x2c/0x70
init_page_owner+0x2c/0x460
page_ext_init+0x204/0x298
mm_core_init+0xdc/0x14c
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40002 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
register_early_stack+0x2c/0x70
init_page_owner+0x2c/0x460
page_ext_init+0x204/0x298
mm_core_init+0xdc/0x14c
Verify: should have node=0, should NOT have node=1,2,3
./page_owner_filter -n 0 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
91327 node=0
Test 18: Valid nid - multiple
Verify: should have node=0,1,3, should NOT have node=2
./page_owner_filter -n 0,1,3 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
91299 node=0
43515 node=1
110404 node=3
Test 19: Valid nid - range
Verify: should have node=2,3, should NOT have node=0,1
./page_owner_filter -n 2-3 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
19391 node=2
110287 node=3
Test 20: Valid nid - range
Verify: should have node=0,1,2,3
./page_owner_filter -n 2-3,0-1 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
91562 node=0
43527 node=1
19495 node=2
110286 node=3
Test 21: Valid nid - range
Verify: should have node=2, should NOT have node=0,1,3
./page_owner_filter -n 2-2 | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
19505 node=2
Test 22: Invalid nid - range start must be <= end
./page_owner_filter -n 3-0
Error: Invalid range 3-0 (start must be <= end)
./page_owner_filter -n 1-0,0-1
Error: Invalid range 1-0 (start must be <= end)
./page_owner_filter -n 2-3,1-0,0-1
Error: Invalid range 1-0 (start must be <= end)
./page_owner_filter -n 3,1-0,1
Error: Invalid range 1-0 (start must be <= end)
Test 23: Invalid nid - NUMA node 4 and above have no memory
./page_owner_filter -n 0-4
Error: Kernel rejected the filter command.
Possible causes:
- Kernel does not support per-fd filtering
- NUMA node has no memory
- Unknown reason
./page_owner_filter -n 1,0-4
Error: Kernel rejected the filter command.
Possible causes:
- Kernel does not support per-fd filtering
- NUMA node has no memory
- Unknown reason
./page_owner_filter -n 7-8
Error: Kernel rejected the filter command.
Possible causes:
- Kernel does not support per-fd filtering
- NUMA node has no memory
- Unknown reason
./page_owner_filter -n 8-1
Error: Invalid range 8-1 (start must be <= end)
Test 24: Valid nid - range and comma mixed
Verify: should have node=0,2,3, should NOT have node=1
./page_owner_filter -n 2-3,0| grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
91741 node=0
19389 node=2
110286 node=3
Test 25: Valid nid - range and comma mixed
Verify: should have node=1,2,3, should NOT have node=0
./page_owner_filter -n 1,2-3| grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
43462 node=1
19402 node=2
110288 node=3
Test 26: Valid handle mode + nid filter
./page_owner_filter -m handle -n "0,1" | head -20
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40000 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40001 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40002 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40003 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000000(node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Page allocated via order 0, mask 0x0(), pid 0, tgid 0 (swapper), ts 0 ns
PFN 0x40004 type Unmovable Block 512 type Unmovable Flags 0x3fffe0000000040(head|node=0|zone=0|lastcpupid=0x1ffff)
handle: 1048577
Verify: should show stacks, and only node=0,1 (not 2,3)
./page_owner_filter -m handle -n "0,1" | grep "PFN" | grep -o "node=[0-9]" | sort | uniq -c
91677 node=0
43458 node=1
=========================================
Tests completed. Please check output above.
=========================================
```
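The userspace checks exercised by tests 4-7 and 22 (invalid characters,
stray commas, reversed ranges) can be sketched as a small validator. This
is an illustrative sketch only, not the code from patch 3; the function
validate_nid_list() and the nid_err values are hypothetical names. Whether
a syntactically valid node actually exists is left to the kernel, as in
tests 9 and 23.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

enum nid_err { NID_OK = 0, NID_BAD_CHAR, NID_BAD_FORMAT, NID_BAD_RANGE };

/* Parse one decimal node number at *p and advance *p past it. */
static int parse_node(const char **p, unsigned long *out)
{
	char *end;

	if (!isdigit((unsigned char)**p))
		return -1;
	errno = 0;
	*out = strtoul(*p, &end, 10);
	if (errno)
		return -1;
	*p = end;
	return 0;
}

/* Check that s is a list such as "0", "0,2", "0-3" or "0,2-4,7". */
enum nid_err validate_nid_list(const char *s)
{
	const char *p;
	unsigned long lo, hi;

	assert(s != NULL);

	/* Only digits, ',' and '-' may appear anywhere in the list. */
	for (p = s; *p; p++)
		if (!isdigit((unsigned char)*p) && *p != ',' && *p != '-')
			return NID_BAD_CHAR;

	p = s;
	for (;;) {
		/* Each element is "N" or "N-M" with N <= M. */
		if (parse_node(&p, &lo))
			return NID_BAD_FORMAT;
		if (*p == '-') {
			p++;
			if (parse_node(&p, &hi))
				return NID_BAD_FORMAT;
			if (lo > hi)
				return NID_BAD_RANGE;
		}
		if (*p == '\0')
			return NID_OK;
		if (*p != ',')
			return NID_BAD_FORMAT;
		p++;	/* a ',' must be followed by another element */
	}
}
```

With this sketch, "0,,2", ",0,1" and "0,1," are rejected as format errors
because a comma must always sit between two elements, while "3-0" is
rejected as a reversed range.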
Future Enhancements
===================
The per-fd filter infrastructure is designed to be extensible. Potential
future filters could include:
- PID/TGID filtering
- Time range filtering (allocation timestamp windows)
- GFP flag filtering
- Migration type filtering
v6: https://lore.kernel.org/linux-mm/20260511024748.183550-1-zhen.ni@easystack.cn/
v5: https://lore.kernel.org/linux-mm/20260507064643.179187-1-zhen.ni@easystack.cn/
v4: https://lore.kernel.org/linux-mm/20260430163247.13628-1-zhen.ni@easystack.cn/
v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-1-zhen.ni@easystack.cn/
v2: https://lore.kernel.org/linux-mm/20260419155540.376847-1-zhen.ni@easystack.cn/
v1: https://lore.kernel.org/linux-mm/20260417154638.22370-1-zhen.ni@easystack.cn/
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
Zhen Ni (4):
mm/page_owner: add print_mode filter
mm/page_owner: add NUMA node filter
tools/mm: add page_owner_filter userspace tool
mm/page_owner: document page_owner filter
Documentation/mm/page_owner.rst | 77 ++++++++-
mm/page_owner.c | 143 ++++++++++++++++-
tools/mm/Makefile | 4 +-
tools/mm/page_owner_filter.c | 277 ++++++++++++++++++++++++++++++++
4 files changed, 493 insertions(+), 8 deletions(-)
create mode 100644 tools/mm/page_owner_filter.c
--
2.20.1
* [PATCH v7 1/4] mm/page_owner: add print_mode filter
@ 2026-05-15 9:19 ` Zhen Ni
From: Zhen Ni @ 2026-05-15 9:19 UTC
To: akpm, vbabka
Cc: surenb, mhocko, jackmanb, hannes, ziy, linux-mm, linux-kernel,
Zhen Ni
Add a print_mode filter to page_owner that allows users to choose between
printing stack traces, stack handles, or both, providing flexibility for
different debugging and analysis scenarios.
The filter supports three modes, selected by writing a command to the
page_owner debugfs file:
- Writing "mode=stack" prints stack traces for each page (default)
- Writing "mode=handle" prints only the handle number
- Writing "mode=stack_handle" prints both stack traces and handles
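The command handling can be modelled in userspace to make the accepted
input concrete. The following is a sketch under stated assumptions, not
the kernel code: parse_filter_cmd() and the MODE_* names are hypothetical,
strtok_r() stands in for the kernel's strsep() loop, and an exact strcmp()
stands in for sysfs_match_string(). It models patch 1 only, so any token
other than "mode=..." is rejected.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

enum print_mode { MODE_STACK, MODE_HANDLE, MODE_STACK_HANDLE };

static const char *const mode_names[] = { "stack", "handle", "stack_handle" };

/*
 * Split cmd on whitespace and apply every "mode=<name>" token in order.
 * Returns 0 on success, -1 on an unknown token or mode name.
 * As in the patch, tokens before a bad one have already taken effect.
 */
int parse_filter_cmd(const char *cmd, enum print_mode *mode)
{
	char buf[64];
	char *save, *tok;
	size_t i;

	assert(cmd && mode);
	if (strlen(cmd) >= sizeof(buf))
		return -1;
	strcpy(buf, cmd);	/* strtok_r() modifies its input */

	for (tok = strtok_r(buf, " \t\n", &save); tok;
	     tok = strtok_r(NULL, " \t\n", &save)) {
		if (strncmp(tok, "mode=", 5) != 0)
			return -1;
		for (i = 0; i < 3; i++)
			if (strcmp(tok + 5, mode_names[i]) == 0)
				break;
		if (i == 3)
			return -1;
		*mode = (enum print_mode)i;
	}
	return 0;
}
```

For example, parse_filter_cmd("mode=handle\n", &m) sets m to MODE_HANDLE,
and when several mode tokens appear in one command the last one wins.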
The default stack mode maintains backward compatibility with existing
usage, displaying complete stack traces for each page allocation.
The handle mode dramatically reduces log size and improves performance by
showing only the handle number instead of the full stack trace. Testing
shows handle mode reduces output size by ~66% (84MB vs 244MB) and
improves read performance by ~4.4x compared to full stack output. The
mapping from handles to actual stack traces can be obtained via the
show_stacks_handles interface.
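As an illustration of how handle-mode output might be post-processed, the
sketch below counts records per handle; the counts can then be joined
against the show_stacks_handles output. It is not part of this series:
struct tally, parse_handle_line(), tally_add() and tally_stream() are
hypothetical names, and the "handle: <number>" line format is taken from
the sample output in this series.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct handle_count {
	unsigned int handle;
	unsigned long records;
};

struct tally {
	struct handle_count *v;
	size_t n, cap;
};

/* Returns 1 and stores the handle if line is a "handle: N" line. */
int parse_handle_line(const char *line, unsigned int *handle)
{
	return sscanf(line, " handle: %u", handle) == 1;
}

/* Increment the record count for handle, adding a new entry if needed. */
int tally_add(struct tally *t, unsigned int handle)
{
	size_t i;

	assert(t != NULL);
	/* Linear search keeps the sketch short; a hash table scales better. */
	for (i = 0; i < t->n; i++) {
		if (t->v[i].handle == handle) {
			t->v[i].records++;
			return 0;
		}
	}
	if (t->n == t->cap) {
		size_t cap = t->cap ? t->cap * 2 : 16;
		struct handle_count *v = realloc(t->v, cap * sizeof(*v));

		if (!v)
			return -1;
		t->v = v;
		t->cap = cap;
	}
	t->v[t->n].handle = handle;
	t->v[t->n].records = 1;
	t->n++;
	return 0;
}

/* Read page_owner output from in and tally every handle line. */
int tally_stream(FILE *in, struct tally *t)
{
	char line[4096];
	unsigned int handle;

	while (fgets(line, sizeof(line), in))
		if (parse_handle_line(line, &handle) && tally_add(t, handle))
			return -1;
	return 0;
}
```

Each entry counts page_owner records, not pages: a record with order N
covers 2^N pages, so a page total would also need the order field.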
The stack_handle mode prints both stack traces and handles, making it
easier to identify pages with the same allocation pattern by comparing
handle numbers instead of comparing large stack traces.
Example usage:
# Using the page_owner_filter tool (recommended)
./page_owner_filter -m stack # Print only stack traces (default)
./page_owner_filter -m handle # Print only handles
./page_owner_filter -m stack_handle # Print both stack and handles
Sample output (handle mode):
Page allocated via order 0, migratetype Unmovable, gfp_mask 0x1100ca,
pid 1, tgid 1 (systemd), ts 123456789 ns
PFN 0x1000 type Unmovable Block 1 type Unmovable
Flags 0x3fffe800000084(referenced|lru|active|private|node=0|zone=1)
handle: 17432583
...
This implementation uses per-file-descriptor filter state stored in
file->private_data, allowing each opener to have independent filter
configuration.
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
Changes in v7:
- per-file-descriptor implementation
Changes in v6:
- Remove unnecessary braces in if/else statement (coding style)
- Use stack array (char kbuf[33]) instead of kmalloc for input buffer
Changes in v5:
- No code changes
Changes in v4:
- Change from numeric (0/1) to string-based interface ("full_stack"/"stack_handle")
- Merge infrastructure patch into this patch
Changes in v3:
- No code changes
Changes in v2:
- Renamed from 'compact mode' to 'print_mode' for better clarity
- Use enum values (0=full_stack, 1=stack_handle) instead of boolean
- Update debugfs filename from 'compact' to 'print_mode'
v6: https://lore.kernel.org/linux-mm/20260511033017.747781-2-zhen.ni@easystack.cn/
v5: https://lore.kernel.org/linux-mm/20260507064643.179187-2-zhen.ni@easystack.cn/
v4: https://lore.kernel.org/linux-mm/20260430163247.13628-2-zhen.ni@easystack.cn/
v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-2-zhen.ni@easystack.cn/
https://lore.kernel.org/linux-mm/20260428071112.1420380-3-zhen.ni@easystack.cn/
v2: https://lore.kernel.org/linux-mm/20260419155540.376847-2-zhen.ni@easystack.cn/
https://lore.kernel.org/linux-mm/20260419155540.376847-3-zhen.ni@easystack.cn/
v1: https://lore.kernel.org/linux-mm/20260417154638.22370-2-zhen.ni@easystack.cn/
https://lore.kernel.org/linux-mm/20260417154638.22370-3-zhen.ni@easystack.cn/
---
mm/page_owner.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 102 insertions(+), 4 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 8178e0be557f..559d9782ac0a 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -54,6 +54,22 @@ struct stack_print_ctx {
u8 flags;
};
+enum page_owner_print_mode {
+ PAGE_OWNER_PRINT_STACK,
+ PAGE_OWNER_PRINT_HANDLE,
+ PAGE_OWNER_PRINT_STACK_HANDLE,
+};
+
+static const char * const page_owner_print_mode_strings[] = {
+ [PAGE_OWNER_PRINT_STACK] = "stack",
+ [PAGE_OWNER_PRINT_HANDLE] = "handle",
+ [PAGE_OWNER_PRINT_STACK_HANDLE] = "stack_handle",
+};
+
+struct page_owner_filter_state {
+ enum page_owner_print_mode print_mode;
+};
+
static bool page_owner_enabled __initdata;
DEFINE_STATIC_KEY_FALSE(page_owner_inited);
@@ -547,7 +563,8 @@ static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
static ssize_t
print_page_owner(char __user *buf, size_t count, unsigned long pfn,
struct page *page, struct page_owner *page_owner,
- depot_stack_handle_t handle)
+ depot_stack_handle_t handle,
+ struct page_owner_filter_state *state)
{
int ret, pageblock_mt, page_mt;
char *kbuf;
@@ -575,7 +592,13 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
migratetype_names[pageblock_mt],
&page->flags);
- ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
+ if (state->print_mode != PAGE_OWNER_PRINT_HANDLE)
+ ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
+
+ if (state->print_mode != PAGE_OWNER_PRINT_STACK)
+ ret += scnprintf(kbuf + ret, count - ret, "handle: %d\n",
+ handle);
+
if (ret >= count)
goto err;
@@ -664,6 +687,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
struct page_ext *page_ext;
struct page_owner *page_owner;
depot_stack_handle_t handle;
+ struct page_owner_filter_state *state = file->private_data;
if (!static_branch_unlikely(&page_owner_inited))
return -EINVAL;
@@ -746,7 +770,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
page_owner_tmp = *page_owner;
page_ext_put(page_ext);
return print_page_owner(buf, count, pfn, page,
- &page_owner_tmp, handle);
+ &page_owner_tmp, handle, state);
ext_put_continue:
page_ext_put(page_ext);
}
@@ -847,7 +871,81 @@ static void init_early_allocated_pages(void)
init_pages_in_zone(zone);
}
+static int page_owner_open(struct inode *inode, struct file *file)
+{
+ struct page_owner_filter_state *state;
+
+ state = kzalloc_obj(*state);
+ if (!state)
+ return -ENOMEM;
+
+ state->print_mode = PAGE_OWNER_PRINT_STACK;
+ file->private_data = state;
+ return 0;
+}
+
+static int page_owner_release(struct inode *inode, struct file *file)
+{
+ kfree(file->private_data);
+ return 0;
+}
+
+static ssize_t page_owner_write(struct file *file,
+ const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ char *kbuf;
+ char *cur, *token;
+ int ret;
+ size_t max_input_len;
+ struct page_owner_filter_state *state = file->private_data;
+
+ /*
+ * Maximum input length for filter commands:
+ * 32: print_mode command max length is 17 ("mode=stack_handle").
+ */
+ max_input_len = 32;
+
+ if (count > max_input_len)
+ return -EINVAL;
+
+ kbuf = kmalloc_objs(*kbuf, count + 1);
+ if (!kbuf)
+ return -ENOMEM;
+ cur = kbuf;	/* strsep() advances cur; kbuf stays valid for kfree() */
+ if (strncpy_from_user(kbuf, buf, count) < 0) {
+ ret = -EFAULT;
+ goto out_free;
+ }
+ kbuf[count] = '\0';	/* the user buffer need not be NUL-terminated */
+ while ((token = strsep(&cur, " \t\n")) != NULL) {
+ if (*token == '\0')
+ continue;
+
+ if (!strncmp(token, "mode=", 5)) {
+ ret = sysfs_match_string(page_owner_print_mode_strings,
+ token + 5);
+ if (ret < 0)
+ goto out_free;
+ state->print_mode = ret;
+ } else {
+ ret = -EINVAL;
+ goto out_free;
+ }
+ }
+
+ ret = count;
+
+out_free:
+ kfree(kbuf);
+ return ret;
+}
+
static const struct file_operations page_owner_fops = {
+ .owner = THIS_MODULE,
+ .open = page_owner_open,
+ .release = page_owner_release,
+ .write = page_owner_write,
.read = read_page_owner,
.llseek = lseek_page_owner,
};
@@ -980,7 +1078,7 @@ static int __init pageowner_init(void)
return 0;
}
- debugfs_create_file("page_owner", 0400, NULL, NULL, &page_owner_fops);
+ debugfs_create_file("page_owner", 0600, NULL, NULL, &page_owner_fops);
dir = debugfs_create_dir("page_owner_stacks", NULL);
debugfs_create_file("show_stacks", 0400, dir,
(void *)(STACK_PRINT_FLAG_STACK |
--
2.20.1
* [PATCH v7 2/4] mm/page_owner: add NUMA node filter
From: Zhen Ni @ 2026-05-15 9:19 UTC (permalink / raw)
To: akpm, vbabka
Cc: surenb, mhocko, jackmanb, hannes, ziy, linux-mm, linux-kernel,
Zhen Ni
Add NUMA node filtering functionality to page_owner to allow filtering
pages by specific NUMA node(s). This is useful for NUMA-aware memory
allocation analysis and debugging.
The filter supports flexible input formats:
- Single node: nid=0
- Multiple nodes: nid=0,2,3
- Node range: nid=0-3
- Mixed format: nid=0,2-4,7
Example usage:
# Using the page_owner_filter tool (recommended)
./page_owner_filter -n 0-3
./page_owner_filter -m stack_handle -n 0,2-4,7
The implementation uses per-file-descriptor filter state stored in
file->private_data, so each opener has an independent filter
configuration. The filter is kept in a nodemask_t and parsed with
nodelist_parse(). The parsed mask is checked against
node_states[N_MEMORY] with nodes_subset(), and the write fails with
-EINVAL if any requested node has no memory.
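As a rough userspace illustration of the parse-then-validate flow described
above (not the kernel code itself), the sketch below parses the four input
forms listed earlier into a 64-bit mask and applies a subset check. The names
parse_nid_list(), nid_mask_subset() and nid_mask_test() are hypothetical, and
the 64-node limit is an assumption made only to keep the example small; the
kernel's nodelist_parse() accepts more syntax than this.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "0", "0,2,3", "0-3" or "0,2-4,7" into a
 * bitmask where bit n stands for node n. Returns 0 on success, -1 on
 * malformed input or on a node number >= 64 (limit assumed for the sketch).
 */
static int parse_nid_list(const char *list, uint64_t *mask)
{
	const char *p = list;

	assert(list != NULL && mask != NULL);
	*mask = 0;

	while (*p) {
		char *end;
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*p))
			return -1;
		lo = strtoul(p, &end, 10);
		hi = lo;
		p = end;

		if (*p == '-') {		/* range "lo-hi" */
			p++;
			if (!isdigit((unsigned char)*p))
				return -1;
			hi = strtoul(p, &end, 10);
			p = end;
		}
		if (lo > hi || hi >= 64)
			return -1;

		for (unsigned long n = lo; n <= hi; n++)
			*mask |= UINT64_C(1) << n;

		if (*p == ',') {
			p++;
			if (*p == '\0')
				return -1;	/* trailing comma */
		} else if (*p != '\0') {
			return -1;		/* unexpected character */
		}
	}
	return 0;
}

/* Analogue of the nodes_subset() check: no requested bit outside allowed. */
static int nid_mask_subset(uint64_t requested, uint64_t allowed)
{
	return (requested & ~allowed) == 0;
}

/* Analogue of the node_isset() test applied to each page's node. */
static int nid_mask_test(unsigned int nid, uint64_t mask)
{
	return nid < 64 && ((mask >> nid) & 1);
}
```

A caller would parse the list once, reject it if nid_mask_subset() fails
against the set of nodes that have memory, and then call nid_mask_test() for
each page while iterating.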
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
Changes in v7:
- per-file-descriptor implementation
Changes in v6:
- Add node validity check using nodes_subset
to reject invalid node numbers that don't exist in the system
- Move bool filter_by_nid declaration to top of block
- Use kmalloc_objs instead of kmalloc
- Remove 100 bytes overhead
Changes in v5:
- Optimize nodes_empty() check in page iteration loop
- Add __data_racy qualifier to nid_mask field
Changes in v4:
- Remove "-1" support, use empty string to clear filter
- Use strncpy_from_user() instead of copy_from_user()
- Add concurrency safety documentation for nid_mask access
- Rename fops to page_owner_nid_filter_fops for consistency
Changes in v3:
- Remove READ_ONCE/WRITE_ONCE for nodemask_t (fixes compilation errors)
* nodemask_t is a large structure (128 bytes) that triggers compile-time asserts
* Direct assignment is safe for this use case
- Add comment explaining input length calculation formula
* 6 bytes = ",NNNNN" (comma + 5-digit node number)
- Simplify "-1" check using kstrtoint() instead of dual strcmp()
- Move nodemask_t mask read outside PFN iteration loop for performance
* Avoids 128-byte structure copy on each iteration
Changes in v2:
- Use nodemask_t instead of int to support multiple nodes
- Implement nodelist_parse() to support flexible input formats
* Single node: "0", "2"
* Multiple nodes: "0,2,3"
* Ranges: "0-3"
* Mixed: "0,2-4,7"
- Use %*pbl format for output (e.g., "0-2", "0,2-4,7")
- Use dynamic memory allocation (kmalloc) to handle variable-length input
- Follow cpuset's max_write_len pattern: (100 + 6 * MAX_NUMNODES)
v6: https://lore.kernel.org/linux-mm/20260511033017.747781-3-zhen.ni@easystack.cn/
v5: https://lore.kernel.org/linux-mm/20260507064643.179187-3-zhen.ni@easystack.cn/
v4: https://lore.kernel.org/linux-mm/20260430163247.13628-3-zhen.ni@easystack.cn/
v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-4-zhen.ni@easystack.cn/
v2: https://lore.kernel.org/linux-mm/20260419155540.376847-4-zhen.ni@easystack.cn/
v1: https://lore.kernel.org/linux-mm/20260417154638.22370-4-zhen.ni@easystack.cn/
---
mm/page_owner.c | 43 ++++++++++++++++++++++++++++++++++++++++---
1 file changed, 40 insertions(+), 3 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 559d9782ac0a..1e5f27cdc177 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -68,6 +68,8 @@ static const char * const page_owner_print_mode_strings[] = {
struct page_owner_filter_state {
enum page_owner_print_mode print_mode;
+ nodemask_t nid_filter;
+ bool nid_filter_enabled;
};
static bool page_owner_enabled __initdata;
@@ -764,6 +766,13 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
if (!handle)
goto ext_put_continue;
+ if (state->nid_filter_enabled) {
+ int page_nid = page_to_nid(page);
+
+ if (!node_isset(page_nid, state->nid_filter))
+ goto ext_put_continue;
+ }
+
/* Record the next PFN to read in the file offset */
*ppos = pfn + 1;
@@ -880,6 +889,8 @@ static int page_owner_open(struct inode *inode, struct file *file)
return -ENOMEM;
state->print_mode = PAGE_OWNER_PRINT_STACK;
+ nodes_clear(state->nid_filter);
+ state->nid_filter_enabled = false;
file->private_data = state;
return 0;
}
@@ -899,12 +910,18 @@ static ssize_t page_owner_write(struct file *file,
int ret;
size_t max_input_len;
struct page_owner_filter_state *state = file->private_data;
+ enum page_owner_print_mode new_print_mode = state->print_mode;
+ nodemask_t new_nid_filter = state->nid_filter;
+ bool new_nid_filter_enabled = state->nid_filter_enabled;
/*
* Maximum input length for filter commands:
- * 32: print_mode command max length is 17 ("mode=stack_handle").
+ * - 32: print_mode command max length is 17 ("mode=stack_handle")
+ * with sufficient buffer
+ * - 6 * MAX_NUMNODES: worst case for nid list
+ * Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes
*/
- max_input_len = 32;
+ max_input_len = 32 + 6 * MAX_NUMNODES;
if (count > max_input_len)
return -EINVAL;
@@ -927,13 +944,33 @@ static ssize_t page_owner_write(struct file *file,
token + 5);
if (ret < 0)
goto out_free;
- state->print_mode = ret;
+ new_print_mode = ret;
+ } else if (!strncmp(token, "nid=", 4)) {
+ ret = nodelist_parse(token + 4, new_nid_filter);
+ if (ret < 0)
+ goto out_free;
+
+ /*
+ * The filter selects pages by NUMA node, so reject any requested
+ * node that has no memory.
+ */
+ if (!nodes_subset(new_nid_filter, node_states[N_MEMORY])) {
+ ret = -EINVAL;
+ goto out_free;
+ }
+
+ new_nid_filter_enabled = true;
} else {
ret = -EINVAL;
goto out_free;
}
}
+ /* Commit the new settings only after every token has parsed cleanly */
+ state->print_mode = new_print_mode;
+ state->nid_filter = new_nid_filter;
+ state->nid_filter_enabled = new_nid_filter_enabled;
+
ret = count;
out_free:
--
2.20.1
* [PATCH v7 3/4] tools/mm: add page_owner_filter userspace tool
From: Zhen Ni @ 2026-05-15 9:19 UTC (permalink / raw)
To: akpm, vbabka
Cc: surenb, mhocko, jackmanb, hannes, ziy, linux-mm, linux-kernel,
Zhen Ni
Add a userspace filtering tool for page_owner that supports per-fd
filtering with print_mode and NUMA node filters.
Features:
- Three print modes: stack (default), handle, stack_handle
- NUMA node filtering with flexible formats (single: 0, multiple: 0,1,2,
range: 0-3, mixed: 0,2-3)
- Per-file-descriptor filter state for independent filtering
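To make the three print modes concrete, here is a minimal userspace sketch of
how a mode string could map to an enum and how each mode decides which parts
of a record are printed. The two "!=" tests mirror the checks in
print_page_owner() from patch 1. The enum names, the table order and the
helper names are assumptions for illustration; the kernel's
page_owner_print_mode_strings table is not reproduced here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum; only the three mode strings come from the series. */
enum print_mode { MODE_STACK, MODE_HANDLE, MODE_STACK_HANDLE, MODE_COUNT };

static const char *const mode_names[MODE_COUNT] = {
	[MODE_STACK]        = "stack",
	[MODE_HANDLE]       = "handle",
	[MODE_STACK_HANDLE] = "stack_handle",
};

/* Exact-match lookup; returns the mode index or -1 if unknown. */
static int match_mode(const char *s)
{
	assert(s != NULL);
	for (int i = 0; i < MODE_COUNT; i++)
		if (strcmp(s, mode_names[i]) == 0)
			return i;
	return -1;
}

/* The stack trace is printed in every mode except "handle". */
static int prints_stack(enum print_mode m)
{
	return m != MODE_HANDLE;
}

/* The handle line is printed in every mode except "stack". */
static int prints_handle(enum print_mode m)
{
	return m != MODE_STACK;
}
```

Expressing the checks as two independent negative tests means the combined
mode needs no special case: it simply passes both.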
Usage examples:
# Filter by print mode
./page_owner_filter -m handle
./page_owner_filter -m stack_handle
# Filter by NUMA node
./page_owner_filter -n 0
./page_owner_filter -n 0-3
# Combined filters
./page_owner_filter -m stack -n 0,1,2
./page_owner_filter -m handle -n 0,2-3
The tool validates inputs before sending commands to the kernel and
provides clear error messages when the kernel does not support
per-fd filtering.
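The command written to the file is a space-separated string such as
"mode=handle nid=0,2-3". One detail worth illustrating when assembling it:
snprintf() returns the length it would have written, so adding that return
value to a running offset without checking it can move the offset past the
end of the buffer. The sketch below shows one bounded way to build the
string. cmd_append() and build_filter_cmd() are hypothetical names and are
not part of the posted tool.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append formatted text at buf + *len.
 * Returns 0 on success, or -1 if the text would not fit, in which case
 * the buffer is restored to its previous contents.
 */
static int cmd_append(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && len != NULL && *len < size);

	va_start(ap, fmt);
	n = vsnprintf(buf + *len, size - *len, fmt, ap);
	va_end(ap);

	if (n < 0 || (size_t)n >= size - *len) {
		buf[*len] = '\0';	/* drop the truncated tail */
		return -1;
	}
	*len += (size_t)n;
	return 0;
}

/* Build "mode=<mode> nid=<nids>"; pass NULL to omit either part. */
static int build_filter_cmd(char *buf, size_t size,
			    const char *mode, const char *nids)
{
	size_t len = 0;

	if (size == 0)
		return -1;
	buf[0] = '\0';

	if (mode && cmd_append(buf, size, &len, "mode=%s", mode) < 0)
		return -1;
	if (nids && cmd_append(buf, size, &len, "%snid=%s",
			       len ? " " : "", nids) < 0)
		return -1;
	return 0;
}
```

With this shape, an over-long node list is reported as an error before
anything is written to the kernel, rather than being silently truncated.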
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
Changes in v7:
- New patch for userspace tool
---
tools/mm/Makefile | 4 +-
tools/mm/page_owner_filter.c | 277 +++++++++++++++++++++++++++++++++++
2 files changed, 279 insertions(+), 2 deletions(-)
create mode 100644 tools/mm/page_owner_filter.c
diff --git a/tools/mm/Makefile b/tools/mm/Makefile
index f5725b5c23aa..858186a6eefd 100644
--- a/tools/mm/Makefile
+++ b/tools/mm/Makefile
@@ -3,7 +3,7 @@
#
include ../scripts/Makefile.include
-BUILD_TARGETS=page-types slabinfo page_owner_sort thp_swap_allocator_test
+BUILD_TARGETS=page-types slabinfo page_owner_sort page_owner_filter thp_swap_allocator_test
INSTALL_TARGETS = $(BUILD_TARGETS) thpmaps
LIB_DIR = ../lib/api
@@ -23,7 +23,7 @@ $(LIBS):
$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
clean:
- $(RM) page-types slabinfo page_owner_sort thp_swap_allocator_test
+ $(RM) page-types slabinfo page_owner_sort page_owner_filter thp_swap_allocator_test
make -C $(LIB_DIR) clean
sbindir ?= /usr/sbin
diff --git a/tools/mm/page_owner_filter.c b/tools/mm/page_owner_filter.c
new file mode 100644
index 000000000000..cea7dacf1245
--- /dev/null
+++ b/tools/mm/page_owner_filter.c
@@ -0,0 +1,277 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * User-space helper to filter page_owner output per-fd
+ *
+ * Example use:
+ * ./page_owner_filter -m handle
+ * ./page_owner_filter -m stack_handle
+ * ./page_owner_filter -n 0,1,2
+ *
+ * See Documentation/mm/page_owner.rst
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <ctype.h>
+#include <getopt.h>
+
+#define MAX_CMD_LEN 512
+
+static void usage(const char *prog)
+{
+ fprintf(stderr, "Usage: %s [OPTIONS]\n", prog);
+ fprintf(stderr, "\nOptions:\n");
+ fprintf(stderr, " -m, --mode MODE : print_mode (stack, handle, or stack_handle)\n");
+ fprintf(stderr, " -n, --nid NID_LIST : NUMA node IDs (comma-separated or ranges)\n");
+ fprintf(stderr, " -o, --output FILE : output file (default: stdout)\n");
+ fprintf(stderr, " -h, --help : show this help message\n");
+ fprintf(stderr, "\nExamples:\n");
+ fprintf(stderr, " %s -m stack\n", prog);
+ fprintf(stderr, " %s -m handle\n", prog);
+ fprintf(stderr, " %s -m stack_handle\n", prog);
+ fprintf(stderr, " %s -m stack -o output.txt\n", prog);
+ fprintf(stderr, " %s -n 0,1,2\n", prog);
+ fprintf(stderr, " %s -m stack -n 0\n", prog);
+}
+
+static int validate_mode(const char *mode)
+{
+ if (strcmp(mode, "stack") == 0 ||
+ strcmp(mode, "handle") == 0 ||
+ strcmp(mode, "stack_handle") == 0)
+ return 0;
+
+ fprintf(stderr, "Error: Invalid mode '%s'\n", mode);
+ fprintf(stderr, "Valid modes: stack, handle, stack_handle\n");
+ return -1;
+}
+
+static int validate_nid_list(const char *nid_list)
+{
+ const char *p;
+ int i = 0;
+ int has_digit = 0;
+ int in_range = 0;
+ int prev_num = 0;
+ int curr_num = 0;
+
+ if (!nid_list || strlen(nid_list) == 0)
+ return 0;
+
+ for (p = nid_list; *p; p++) {
+ if (*p == ',') {
+ if (!has_digit) {
+ fprintf(stderr, "Error: Invalid nid_list format\n");
+ return -1;
+ }
+ if (in_range && prev_num > curr_num) {
+ fprintf(stderr,
+ "Error: Invalid range %d-%d (start must be <= end)\n",
+ prev_num, curr_num);
+ return -1;
+ }
+ i = 0;
+ has_digit = 0;
+ in_range = 0;
+ prev_num = 0;
+ curr_num = 0;
+ continue;
+ }
+
+ if (*p == '-') {
+ if (!has_digit) {
+ fprintf(stderr,
+ "Error: Invalid nid_list format ");
+ fprintf(stderr,
+ "(dash without preceding number)\n");
+ return -1;
+ }
+ prev_num = curr_num;
+ curr_num = 0;
+ i = 0;
+ has_digit = 0;
+ in_range = 1;
+ continue;
+ }
+
+ if (!isdigit(*p)) {
+ fprintf(stderr, "Error: Invalid character '%c' in nid_list\n", *p);
+ return -1;
+ }
+
+ if (i >= 5) {
+ fprintf(stderr, "Error: NID too long (max 5 digits)\n");
+ return -1;
+ }
+ curr_num = curr_num * 10 + (*p - '0');
+ i++;
+ has_digit = 1;
+ }
+
+ if (!has_digit) {
+ fprintf(stderr, "Error: Invalid nid_list format\n");
+ return -1;
+ }
+
+ if (in_range && prev_num > curr_num) {
+ fprintf(stderr,
+ "Error: Invalid range %d-%d (start must be <= end)\n",
+ prev_num, curr_num);
+ return -1;
+ }
+
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ const char *output_file = NULL;
+ char filter_cmd[MAX_CMD_LEN];
+ FILE *output = NULL;
+ int fd = -1;
+ ssize_t ret;
+ char buf[4096];
+ int opt;
+ size_t cmd_len = 0;
+
+ static struct option long_options[] = {
+ {"mode", required_argument, 0, 'm'},
+ {"nid", required_argument, 0, 'n'},
+ {"output", required_argument, 0, 'o'},
+ {"help", no_argument, 0, 'h'},
+ {0, 0, 0, 0}
+ };
+
+ filter_cmd[0] = '\0';
+
+ if (argc > 1) {
+ for (int i = 1; i < argc; i++) {
+ if (strcmp(argv[i], "-h") == 0 || strcmp(argv[i], "--help") == 0) {
+ usage(argv[0]);
+ return 0;
+ }
+ }
+ }
+
+ /* Check that page_owner exists (F_OK does not test readability) */
+ if (access("/sys/kernel/debug/page_owner", F_OK) != 0) {
+ if (errno == ENOENT)
+ fprintf(stderr, "Error: /sys/kernel/debug/page_owner does not exist\n");
+ else
+ perror("Error accessing /sys/kernel/debug/page_owner");
+ fprintf(stderr, "Make sure page_owner is enabled in kernel\n");
+ return 1;
+ }
+
+ while ((opt = getopt_long(argc, argv, "m:n:o:h", long_options, NULL)) != -1) {
+ switch (opt) {
+ case 'm': {
+ const char *mode = optarg;
+
+ if (validate_mode(mode) < 0)
+ return 1;
+ cmd_len += snprintf(filter_cmd + cmd_len, MAX_CMD_LEN - cmd_len,
+ "%smode=%s", cmd_len > 0 ? " " : "", mode);
+ break;
+ }
+ case 'n': {
+ const char *nid_list = optarg;
+
+ if (validate_nid_list(nid_list) < 0)
+ return 1;
+ cmd_len += snprintf(filter_cmd + cmd_len, MAX_CMD_LEN - cmd_len,
+ "%snid=%s", cmd_len > 0 ? " " : "", nid_list);
+ break;
+ }
+ case 'o':
+ output_file = optarg;
+ break;
+ case 'h':
+ /* Already handled above */
+ break;
+ default:
+ usage(argv[0]);
+ return 1;
+ }
+ }
+
+ /* At least one filter must be specified */
+ if (cmd_len == 0) {
+ fprintf(stderr, "Error: At least one filter (-m or -n) must be specified\n\n");
+ usage(argv[0]);
+ return 1;
+ }
+
+ /* Open page_owner for read-write - this will fail if kernel doesn't support write */
+ fd = open("/sys/kernel/debug/page_owner", O_RDWR);
+ if (fd < 0) {
+ if (errno == EACCES || errno == EPERM) {
+ fprintf(stderr, "Error: /sys/kernel/debug/page_owner ");
+ fprintf(stderr, "does not support write access\n");
+ fprintf(stderr, "This kernel does not support ");
+ fprintf(stderr, "per-fd filtering.\n");
+ fprintf(stderr, "Please ensure you have a kernel with ");
+ fprintf(stderr, "per-fd filtering support.\n");
+ } else {
+ perror("Error opening /sys/kernel/debug/page_owner");
+ }
+ return 1;
+ }
+
+ if (output_file) {
+ output = fopen(output_file, "w");
+ if (!output) {
+ perror("open output file");
+ close(fd);
+ return 1;
+ }
+ } else {
+ output = stdout;
+ }
+
+ ret = write(fd, filter_cmd, strlen(filter_cmd));
+
+ if (ret < 0) {
+ if (errno == EINVAL) {
+ fprintf(stderr, "Error: Kernel rejected the filter command.\n");
+ fprintf(stderr, "Possible causes:\n");
+ fprintf(stderr, " - Kernel does not support per-fd filtering\n");
+ fprintf(stderr, " - NUMA node has no memory\n");
+ fprintf(stderr, " - Unknown reason\n");
+ } else {
+ perror("write filter command");
+ }
+ close(fd);
+ if (output != stdout)
+ fclose(output);
+ return 1;
+ }
+
+ if ((size_t)ret != strlen(filter_cmd))
+ fprintf(stderr, "Warning: Partial write (%zd/%zu)\n", ret, strlen(filter_cmd));
+
+ /* Read and display filtered output */
+ while ((ret = read(fd, buf, sizeof(buf) - 1)) > 0) {
+ buf[ret] = '\0';
+ fprintf(output, "%s", buf);
+ fflush(output);
+ }
+
+ if (ret < 0) {
+ perror("read page_owner");
+ close(fd);
+ if (output != stdout)
+ fclose(output);
+ return 1;
+ }
+
+ close(fd);
+ if (output != stdout)
+ fclose(output);
+
+ return 0;
+}
--
2.20.1
* [PATCH v7 4/4] mm/page_owner: document page_owner filter
From: Zhen Ni @ 2026-05-15 9:19 UTC (permalink / raw)
To: akpm, vbabka
Cc: surenb, mhocko, jackmanb, hannes, ziy, linux-mm, linux-kernel,
Zhen Ni
Add documentation for the page_owner_filter userspace tool and
kernel-level filtering features.
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
Changes in v7:
- Document the per-file-descriptor implementation
Changes in v6:
- No code changes
Changes in v5:
- No code changes
Changes in v4:
- Update print_mode documentation to reflect string-based interface
* Change from "0/1" to "full_stack"/"stack_handle"
* Add bracket notation example: "[full_stack] stack_handle"
- Update NUMA filter documentation
* Remove "-1" example
* Add empty string as clear method
- Fix indentation: use tabs instead of spaces in code examples
Changes in v3:
- New patch to document filter features as requested by Andrew Morton
v6: https://lore.kernel.org/linux-mm/20260511033017.747781-4-zhen.ni@easystack.cn/
v5: https://lore.kernel.org/linux-mm/20260507064643.179187-4-zhen.ni@easystack.cn/
v4: https://lore.kernel.org/linux-mm/20260430163247.13628-4-zhen.ni@easystack.cn/
v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-5-zhen.ni@easystack.cn/
---
Documentation/mm/page_owner.rst | 77 ++++++++++++++++++++++++++++++++-
1 file changed, 75 insertions(+), 2 deletions(-)
diff --git a/Documentation/mm/page_owner.rst b/Documentation/mm/page_owner.rst
index 6b12f3b007ec..aef1fe561ade 100644
--- a/Documentation/mm/page_owner.rst
+++ b/Documentation/mm/page_owner.rst
@@ -65,7 +65,14 @@ un-tracking state.
Usage
=====
-1) Build user-space helper::
+1) Build user-space helpers:
+
+To filter page_owner output::
+
+ cd tools/mm
+ make page_owner_filter
+
+To sort and analyze page_owner output::
cd tools/mm
make page_owner_sort
@@ -74,7 +81,11 @@ Usage
3) Do the job that you want to debug.
-4) Analyze information from page owner::
+4) (Optional) Filter page_owner output::
+
+ ./page_owner_filter -m handle -n 0,1,2 > filtered_page_owner.txt
+
+5) Analyze information from page owner::
cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt
cat stacks.txt
@@ -263,3 +274,65 @@ STANDARD FORMAT SPECIFIERS
f free whether the page has been released or not
st stacktrace stack trace of the page allocation
ator allocator memory allocator for pages
+
+Filtering page_owner output
+===========================
+
+page_owner supports filtering output at the kernel level before reading,
+which reduces the amount of data that needs to be processed in userspace.
+
+The page_owner_filter tool provides a convenient interface for this filtering
+capability. It supports two types of filters:
+
+1. **print_mode filter**: Control what information is printed for each page
+ - ``stack``: Print full stack traces (default, compatible with existing usage)
+ - ``handle``: Print only stack handle numbers (much faster, smaller output)
+ - ``stack_handle``: Print both stack traces and handle numbers
+
+ The ``handle`` mode uses numeric identifiers instead of full stack traces.
+ The mapping from handles to actual stack traces can be obtained via the
+ show_stacks_handles interface.
+
+2. **NUMA node filter**: Filter pages by NUMA node ID
+ - Supports single node: ``-n 0``
+ - Multiple nodes: ``-n 0,1,2``
+ - Ranges: ``-n 0-3``
+ - Mixed format: ``-n 0,2-3,5``
+
+Usage examples::
+
+ # Filter by print mode
+ ./page_owner_filter -m handle
+ ./page_owner_filter -m stack_handle
+
+ # Filter by NUMA node
+ ./page_owner_filter -n 0
+ ./page_owner_filter -n 0-3
+
+ # Combined filters
+ ./page_owner_filter -m stack -n 0,1,2
+ ./page_owner_filter -m handle -n 0,2-3
+
+ # Save to file
+ ./page_owner_filter -m handle -o filtered_output.txt
+
+The handle mode is particularly useful for monitoring and performance-critical
+scenarios as it dramatically reduces output size. Testing shows handle mode can
+reduce output size by ~66% (84MB vs 244MB) and improve read performance by ~4.4x
+compared to full stack output.
+
+The NUMA node filter is useful for NUMA-aware memory allocation analysis and debugging.
+
+Behind the scenes, page_owner_filter opens /sys/kernel/debug/page_owner and
+writes filter commands before reading the filtered output. The filtering uses
+per-file-descriptor state, allowing each open() to have independent filter settings.
+
+Because the state is per file descriptor, multiple independent filtering
+operations can run concurrently. For example, different terminals can apply
+different filters at the same time::
+
+ # Terminal 1: Filter node 0
+ ./page_owner_filter -n 0 > node0_output.txt
+
+ # Terminal 2: Filter node 1 (runs concurrently)
+ ./page_owner_filter -n 1 > node1_output.txt
--
2.20.1