Linux-mm Archive on lore.kernel.org
From: Ye Liu <ye.liu@linux.dev>
To: Zhen Ni <zhen.ni@easystack.cn>,
	akpm@linux-foundation.org, vbabka@kernel.org
Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com,
	hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v11 1/4] mm/page_owner: add print_mode filter
Date: Mon, 29 Jun 2026 10:59:00 +0800
Message-ID: <d86d309d-9f5a-4d63-afb0-dc4016fe1ded@linux.dev>
In-Reply-To: <20260625043101.338794-2-zhen.ni@easystack.cn>


On 2026/6/25 12:30, Zhen Ni wrote:
> Add a print_mode filter to page_owner that allows users to choose between
> printing stack traces, stack handles, or both, providing flexibility for
> different debugging and analysis scenarios.
>
> The filter provides three modes via page_owner:
> - Writing "mode=stack" prints stack traces for each page (default)
> - Writing "mode=handle" prints only the handle number
> - Writing "mode=stack_handle" prints both stack traces and handles
>
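
The "mode=" handling can be modelled in userspace roughly as below. This is
only a sketch of the tokenising logic in page_owner_write() from the diff:
parse_filter_cmd(), match_mode() and MAX_INPUT_LEN are hypothetical names,
and a plain strcmp() loop stands in for the kernel's sysfs_match_string().

```c
#define _DEFAULT_SOURCE /* strsep() and strdup() on glibc */
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Userspace copies of the patch's enum and string table. */
enum print_mode { PRINT_STACK, PRINT_HANDLE, PRINT_STACK_HANDLE };

static const char *const mode_strings[] = {
	[PRINT_STACK]        = "stack",
	[PRINT_HANDLE]       = "handle",
	[PRINT_STACK_HANDLE] = "stack_handle",
};

/* Hypothetical mirror of max_input_len in page_owner_write(). */
#define MAX_INPUT_LEN 32

/* Exact-match stand-in for sysfs_match_string(); returns an index or -EINVAL. */
static int match_mode(const char *s)
{
	for (size_t i = 0; i < sizeof(mode_strings) / sizeof(mode_strings[0]); i++)
		if (strcmp(mode_strings[i], s) == 0)
			return (int)i;
	return -EINVAL;
}

/*
 * Parse a whitespace-separated command such as "mode=handle\n".
 * Returns 0 and updates *mode on success; returns a negative errno and
 * leaves *mode untouched if the input is too long or any token is invalid.
 */
static int parse_filter_cmd(const char *input, enum print_mode *mode)
{
	enum print_mode new_mode = *mode;
	char *orig, *cur, *tok;
	int ret = 0;

	if (strlen(input) > MAX_INPUT_LEN)
		return -EINVAL;

	orig = strdup(input);
	if (!orig)
		return -ENOMEM;
	cur = orig;	/* strsep() advances cur, so keep orig for free() */

	while ((tok = strsep(&cur, " \t\n")) != NULL) {
		if (*tok == '\0')
			continue;	/* empty field between separators */
		if (strncmp(tok, "mode=", 5) != 0) {
			ret = -EINVAL;
			break;
		}
		ret = match_mode(tok + 5);
		if (ret < 0)
			break;
		new_mode = (enum print_mode)ret;
		ret = 0;
	}

	if (ret == 0)
		*mode = new_mode;	/* commit only if every token was valid */
	free(orig);
	return ret;
}
```

With this model, writing "mode=bogus" fails with -EINVAL and keeps the
previous mode, while a write containing only whitespace is accepted and
changes nothing.
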
> The default stack mode maintains backward compatibility with existing
> usage, displaying complete stack traces for each page allocation.
>
> The handle mode dramatically reduces log size and improves performance by
> showing only the handle number instead of the full stack trace. Testing
> shows handle mode reduces output size by ~66% (84MB vs 244MB) and
> improves read performance by ~4.4x compared to full stack output. The
> mapping from handles to actual stack traces can be obtained via the
> show_stacks_handles interface.
>
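
For reference, the ~66% figure is consistent with the two quoted sizes:

```latex
1 - \frac{84\,\mathrm{MB}}{244\,\mathrm{MB}} \approx 1 - 0.344 = 0.656 \approx 66\%
```
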
> The stack_handle mode prints both stack traces and handles, making it
> easier to identify pages with the same allocation pattern by comparing
> handle numbers instead of comparing large stack traces.
>
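
Per record, the three modes come down to which of two trailers is appended,
each followed by a truncation check like the `ret >= count` tests in
print_page_owner(). A userspace sketch, assuming snprintf() in place of
stack_depot_snprint()/scnprintf() and an already formatted stack string;
append_trailer() is a hypothetical helper:

```c
#include <stdio.h>
#include <string.h>

enum print_mode { PRINT_STACK, PRINT_HANDLE, PRINT_STACK_HANDLE };

/*
 * Append the trailer selected by @mode at offset @ret of @buf (size @count).
 * snprintf() returns the untruncated length, so ret + n >= count means the
 * record did not fit. Returns the new offset, or -1 on truncation (where
 * print_page_owner() would take its err path).
 */
static int append_trailer(char *buf, size_t count, int ret,
			  enum print_mode mode, const char *stack,
			  unsigned int handle)
{
	int n;

	if (mode != PRINT_HANDLE) {		/* "stack" and "stack_handle" */
		n = snprintf(buf + ret, count - ret, "%s", stack);
		if (n < 0 || (size_t)ret + n >= count)
			return -1;
		ret += n;
	}

	if (mode != PRINT_STACK) {		/* "handle" and "stack_handle" */
		n = snprintf(buf + ret, count - ret, "handle: %u\n", handle);
		if (n < 0 || (size_t)ret + n >= count)
			return -1;
		ret += n;
	}

	return ret;
}
```

In handle mode a record ends with just "handle: 42\n" instead of the stack
lines, which is where the size reduction comes from.
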
> Example usage:
>   # Using the page_owner_filter tool (recommended)
>   ./page_owner_filter -m stack          # Print only stack traces (default)
>   ./page_owner_filter -m handle         # Print only handles
>   ./page_owner_filter -m stack_handle   # Print both stack and handles
>
> Sample output (handle mode):
>   Page allocated via order 0, migratetype Unmovable, gfp_mask 0x1100ca,
>   pid 1, tgid 1 (systemd), ts 123456789 ns
>   PFN 0x1000 type Unmovable Block 1 type Unmovable
>   Flags 0x3fffe800000084(referenced|lru|active|private|node=0|zone=1)
>   handle: 17432583
>   ...
>
> This implementation uses per-file-descriptor filter state stored in
> file->private_data, allowing each opener to have independent filter
> configuration.
>
> Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
> ---
> Changes in v11:
> - No changes
>
> Changes in v10:
> - No changes
>
> Changes in v9:
> - Add spinlock_t lock to struct page_owner_filter_state for concurrent access protection
>
> Changes in v8:
> - Fix buffer overflow by adding bounds check between stack_depot_snprint() and scnprintf()
> - Fix unsafe string handling: use memdup_user_nul() instead of kmalloc_objs + strncpy_from_user()
> - Fix strsep() memory corruption by saving original pointer before strsep() call
> - Change format specifier from %d to %u for depot_stack_handle_t
>
> Changes in v7:
> - per-file-descriptor implementation
>
> Changes in v6:
> - Remove unnecessary braces in if/else statement (coding style)
> - Use stack array (char kbuf[33]) instead of kmalloc for input buffer
>
> Changes in v5:
> - No code changes
>
> Changes in v4:
> - Change from numeric (0/1) to string-based interface ("full_stack"/"stack_handle")
> - Merge infrastructure patch into this patch
>
> Changes in v3:
> - No code changes
>
> Changes in v2:
> - Renamed from 'compact mode' to 'print_mode' for better clarity
> - Use enum values (0=full_stack, 1=stack_handle) instead of boolean
> - Update debugfs filename from 'compact' to 'print_mode'
>
> v10: https://lore.kernel.org/linux-mm/20260618035750.3724613-2-zhen.ni@easystack.cn/
> v9: https://lore.kernel.org/linux-mm/20260525081652.2210206-2-zhen.ni@easystack.cn/
> v8: https://lore.kernel.org/linux-mm/20260520075641.1931080-2-zhen.ni@easystack.cn/
> v7: https://lore.kernel.org/linux-mm/20260515091942.1535677-2-zhen.ni@easystack.cn/
> v6: https://lore.kernel.org/linux-mm/20260511033017.747781-2-zhen.ni@easystack.cn/
> v5: https://lore.kernel.org/linux-mm/20260507064643.179187-2-zhen.ni@easystack.cn/
> v4: https://lore.kernel.org/linux-mm/20260430163247.13628-2-zhen.ni@easystack.cn/
> v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-2-zhen.ni@easystack.cn/
>     https://lore.kernel.org/linux-mm/20260428071112.1420380-3-zhen.ni@easystack.cn/
> v2: https://lore.kernel.org/linux-mm/20260419155540.376847-2-zhen.ni@easystack.cn/
>     https://lore.kernel.org/linux-mm/20260419155540.376847-3-zhen.ni@easystack.cn/
> v1: https://lore.kernel.org/linux-mm/20260417154638.22370-2-zhen.ni@easystack.cn/
>     https://lore.kernel.org/linux-mm/20260417154638.22370-3-zhen.ni@easystack.cn/
> ---
>  mm/page_owner.c | 129 +++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 123 insertions(+), 6 deletions(-)
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 8178e0be557f..7595735979bf 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -54,6 +54,23 @@ struct stack_print_ctx {
>  	u8 flags;
>  };
>  
> +enum page_owner_print_mode {
> +	PAGE_OWNER_PRINT_STACK,
> +	PAGE_OWNER_PRINT_HANDLE,
> +	PAGE_OWNER_PRINT_STACK_HANDLE,
> +};
> +
> +static const char * const page_owner_print_mode_strings[] = {
> +	[PAGE_OWNER_PRINT_STACK]	= "stack",
> +	[PAGE_OWNER_PRINT_HANDLE]	= "handle",
> +	[PAGE_OWNER_PRINT_STACK_HANDLE]	= "stack_handle",
> +};
> +
> +struct page_owner_filter_state {
> +	enum page_owner_print_mode print_mode;
> +	spinlock_t lock;
Hi Zhen,

The spinlock in struct page_owner_filter_state is unnecessary and adds
significant overhead to the read path:

1. Per-fd isolation: the state is allocated per open() and stored in
   file->private_data, so no cross-fd contention is possible.
2. Hot-path cost: across the series, the lock is taken for every page
   handled in read_page_owner() and print_page_owner(). A single read pass
   can traverse millions of pages, each paying for
   spin_lock_irqsave()/spin_unlock_irqrestore() (including disabling
   interrupts) just to read a mode enum or check a nodemask. That is
   measurable overhead for no real benefit.
3. No practical race: nobody writes filter configuration to an fd while
   simultaneously reading from that same fd.

I suggest dropping the lock entirely.

Just my take, though; happy to follow whatever the other reviewers prefer here.
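
If the lock goes away, annotating the lockless accesses with
READ_ONCE()/WRITE_ONCE() may still be worthwhile to document the intent.
A userspace sketch of the resulting per-fd state, using C11 relaxed atomics
as a stand-in for those macros; filter_open(), filter_set_mode(),
filter_get_mode() and filter_release() are hypothetical names:

```c
#include <stdatomic.h>
#include <stdlib.h>

enum print_mode { PRINT_STACK, PRINT_HANDLE, PRINT_STACK_HANDLE };

/* Per-open state with no spinlock: a single word accessed atomically. */
struct filter_state {
	_Atomic int print_mode;	/* holds an enum print_mode value */
};

/* Models page_owner_open(): each open gets its own state, default "stack". */
static struct filter_state *filter_open(void)
{
	struct filter_state *s = malloc(sizeof(*s));

	if (s)
		atomic_init(&s->print_mode, PRINT_STACK);
	return s;
}

/* Models page_owner_release(). */
static void filter_release(struct filter_state *s)
{
	free(s);
}

/* Write side, roughly WRITE_ONCE(state->print_mode, mode). */
static void filter_set_mode(struct filter_state *s, enum print_mode mode)
{
	atomic_store_explicit(&s->print_mode, mode, memory_order_relaxed);
}

/* Read side, once per printed page, roughly READ_ONCE(state->print_mode). */
static enum print_mode filter_get_mode(struct filter_state *s)
{
	return (enum print_mode)atomic_load_explicit(&s->print_mode,
						      memory_order_relaxed);
}
```

Two opens get independent state, so switching one fd to handle mode leaves
the other in the default stack mode, with no interrupt disabling on the
read side.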

> +};
> +
>  static bool page_owner_enabled __initdata;
>  DEFINE_STATIC_KEY_FALSE(page_owner_inited);
>  
> @@ -547,16 +564,23 @@ static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
>  static ssize_t
>  print_page_owner(char __user *buf, size_t count, unsigned long pfn,
>  		struct page *page, struct page_owner *page_owner,
> -		depot_stack_handle_t handle)
> +		depot_stack_handle_t handle,
> +		struct page_owner_filter_state *state)
>  {
>  	int ret, pageblock_mt, page_mt;
>  	char *kbuf;
> +	enum page_owner_print_mode print_mode;
> +	unsigned long flags;
>  
>  	count = min_t(size_t, count, PAGE_SIZE);
>  	kbuf = kmalloc(count, GFP_KERNEL);
>  	if (!kbuf)
>  		return -ENOMEM;
>  
> +	spin_lock_irqsave(&state->lock, flags);
> +	print_mode = state->print_mode;
> +	spin_unlock_irqrestore(&state->lock, flags);
> +
>  	ret = scnprintf(kbuf, count,
>  			"Page allocated via order %u, mask %#x(%pGg), pid %d, tgid %d (%s), ts %llu ns\n",
>  			page_owner->order, page_owner->gfp_mask,
> @@ -575,9 +599,18 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
>  			migratetype_names[pageblock_mt],
>  			&page->flags);
>  
> -	ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
> -	if (ret >= count)
> -		goto err;
> +	if (print_mode != PAGE_OWNER_PRINT_HANDLE) {
> +		ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
> +		if (ret >= count)
> +			goto err;
> +	}
> +
> +	if (print_mode != PAGE_OWNER_PRINT_STACK) {
> +		ret += scnprintf(kbuf + ret, count - ret, "handle: %u\n",
> +				 handle);
> +		if (ret >= count)
> +			goto err;
> +	}
>  
>  	if (page_owner->last_migrate_reason != -1) {
>  		ret += scnprintf(kbuf + ret, count - ret,
> @@ -664,6 +697,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>  	struct page_ext *page_ext;
>  	struct page_owner *page_owner;
>  	depot_stack_handle_t handle;
> +	struct page_owner_filter_state *state = file->private_data;
>  
>  	if (!static_branch_unlikely(&page_owner_inited))
>  		return -EINVAL;
> @@ -746,7 +780,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>  		page_owner_tmp = *page_owner;
>  		page_ext_put(page_ext);
>  		return print_page_owner(buf, count, pfn, page,
> -				&page_owner_tmp, handle);
> +				&page_owner_tmp, handle, state);
>  ext_put_continue:
>  		page_ext_put(page_ext);
>  	}
> @@ -847,7 +881,90 @@ static void init_early_allocated_pages(void)
>  		init_pages_in_zone(zone);
>  }
>  
> +static int page_owner_open(struct inode *inode, struct file *file)
> +{
> +	struct page_owner_filter_state *state;
> +
> +	state = kzalloc_obj(*state);
> +	if (!state)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&state->lock);
> +	state->print_mode = PAGE_OWNER_PRINT_STACK;
> +	file->private_data = state;
> +	return 0;
> +}
> +
> +static int page_owner_release(struct inode *inode, struct file *file)
> +{
> +	kfree(file->private_data);
> +	return 0;
> +}
> +
> +static ssize_t page_owner_write(struct file *file,
> +				 const char __user *buf,
> +				 size_t count, loff_t *ppos)
> +{
> +	char *kbuf;
> +	char *orig;
> +	char *token;
> +	int ret;
> +	size_t max_input_len;
> +	struct page_owner_filter_state *state = file->private_data;
> +	enum page_owner_print_mode new_print_mode;
> +	unsigned long flags;
> +
> +	/*
> +	 * Maximum input length for filter commands:
> +	 * 32: print_mode command max length is 17 ("mode=stack_handle").
> +	 */
> +	max_input_len = 32;
> +
> +	if (count > max_input_len)
> +		return -EINVAL;
> +
> +	kbuf = memdup_user_nul(buf, count);
> +	if (IS_ERR(kbuf))
> +		return PTR_ERR(kbuf);
> +
> +	orig = kbuf;
> +
> +	spin_lock_irqsave(&state->lock, flags);
> +	new_print_mode = state->print_mode;
> +	spin_unlock_irqrestore(&state->lock, flags);
> +
> +	while ((token = strsep(&kbuf, " \t\n")) != NULL) {
> +		if (*token == '\0')
> +			continue;
> +
> +		if (!strncmp(token, "mode=", 5)) {
> +			ret = sysfs_match_string(page_owner_print_mode_strings,
> +						token + 5);
> +			if (ret < 0)
> +				goto out_free;
> +			new_print_mode = ret;
> +		} else {
> +			ret = -EINVAL;
> +			goto out_free;
> +		}
> +	}
> +
> +	spin_lock_irqsave(&state->lock, flags);
> +	state->print_mode = new_print_mode;
> +	spin_unlock_irqrestore(&state->lock, flags);
> +
> +	ret = count;
> +
> +out_free:
> +	kfree(orig);
> +	return ret;
> +}
> +
>  static const struct file_operations page_owner_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= page_owner_open,
> +	.release	= page_owner_release,
> +	.write		= page_owner_write,
>  	.read		= read_page_owner,
>  	.llseek		= lseek_page_owner,
>  };
> @@ -980,7 +1097,7 @@ static int __init pageowner_init(void)
>  		return 0;
>  	}
>  
> -	debugfs_create_file("page_owner", 0400, NULL, NULL, &page_owner_fops);
> +	debugfs_create_file("page_owner", 0600, NULL, NULL, &page_owner_fops);
>  	dir = debugfs_create_dir("page_owner_stacks", NULL);
>  	debugfs_create_file("show_stacks", 0400, dir,
>  			    (void *)(STACK_PRINT_FLAG_STACK |

-- 
Thanks,
Ye Liu




Thread overview: 19+ messages
2026-06-25  4:30 [PATCH v11 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering Zhen Ni
2026-06-25  4:30 ` [PATCH v11 1/4] mm/page_owner: add print_mode filter Zhen Ni
2026-06-25 18:26   ` Zi Yan
2026-06-25 19:20     ` Andrew Morton
2026-06-25 19:24       ` Zi Yan
2026-06-29  2:59   ` Ye Liu [this message]
2026-06-29  9:30     ` Vlastimil Babka (SUSE)
2026-06-25  4:30 ` [PATCH v11 2/4] mm/page_owner: add NUMA node filter Zhen Ni
2026-06-25 18:37   ` Zi Yan
2026-06-26  8:20     ` zhen.ni
2026-06-25 19:27   ` Zi Yan
2026-06-25 20:04     ` Andrew Morton
2026-06-25  4:31 ` [PATCH v11 3/4] tools/mm: add page_owner_filter userspace tool Zhen Ni
2026-06-25  4:50   ` Andrew Morton
2026-06-29  8:31     ` zhen.ni
2026-06-25  4:31 ` [PATCH v11 4/4] mm/page_owner: document page_owner filter Zhen Ni
2026-06-25  4:55 ` [PATCH v11 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering Andrew Morton
2026-06-25 12:57   ` zhen.ni
2026-06-25 18:22 ` Zi Yan
