From: "zhen.ni" <zhen.ni@easystack.cn>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
Ye Liu <ye.liu@linux.dev>,
akpm@linux-foundation.org
Cc: surenb@google.com, mhocko@suse.com, jackmanb@google.com,
hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v11 1/4] mm/page_owner: add print_mode filter
Date: Fri, 3 Jul 2026 16:14:27 +0800 [thread overview]
Message-ID: <f7bd2660-374c-4b2f-a274-8fd46368abea@easystack.cn> (raw)
In-Reply-To: <49526116-ce35-4447-bd98-f5f0ca12d92a@kernel.org>
在 2026/6/29 17:30, Vlastimil Babka (SUSE) 写道:
> On 6/29/26 04:59, Ye Liu wrote:
>>
>> 在 2026/6/25 12:30, Zhen Ni 写道:
>>> Add a print_mode filter to page_owner that allows users to choose between
>>> printing stack traces, stack handles, or both, providing flexibility for
>>> different debugging and analysis scenarios.
>>>
>>> The filter provides three modes via page_owner:
>>> - Writing "mode=stack" prints stack traces for each page (default)
>>> - Writing "mode=handle" prints only the handle number
>>> - Writing "mode=stack_handle" prints both stack traces and handles
>>>
>>> The default stack mode maintains backward compatibility with existing
>>> usage, displaying complete stack traces for each page allocation.
>>>
>>> The handle mode dramatically reduces log size and improves performance by
>>> showing only the handle number instead of the full stack trace. Testing
>>> shows handle mode reduces output size by ~66% (84MB vs 244MB) and
>>> improves read performance by ~4.4x compared to full stack output. The
>>> mapping from handles to actual stack traces can be obtained via the
>>> show_stacks_handles interface.
>>>
>>> The stack_handle mode prints both stack traces and handles, making it
>>> easier to identify pages with the same allocation pattern by comparing
>>> handle numbers instead of comparing large stack traces.
>>>
>>> Example usage:
>>> # Using the page_owner_filter tool (recommended)
>>> ./page_owner_filter -m stack # Print only stack traces (default)
>>> ./page_owner_filter -m handle # Print only handles
>>> ./page_owner_filter -m stack_handle # Print both stack and handles
>>>
>>> Sample output (handle mode):
>>> Page allocated via order 0, migratetype Unmovable, gfp_mask 0x1100ca,
>>> pid 1, tgid 1 (systemd), ts 123456789 ns
>>> PFN 0x1000 type Unmovable Block 1 type Unmovable
>>> Flags 0x3fffe800000084(referenced|lru|active|private|node=0|zone=1)
>>> handle: 17432583
>>> ...
>>>
>>> This implementation uses per-file-descriptor filter state stored in
>>> file->private_data, allowing each opener to have independent filter
>>> configuration.
>>>
>>> Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
>>> ---
>>> Changes in v11:
>>> - No changes
>>>
>>> Changes in v10:
>>> - No changes
>>>
>>> Changes in v9:
>>> - Add spinlock_t lock to struct page_owner_filter_state for concurrent access protection
>>>
>>> Changes in v8:
>>> - Fix buffer overflow by adding bounds check between stack_depot_snprint() and scnprintf()
>>> - Fix unsafe string handling: use memdup_user_nul() instead of kmalloc_objs + strncpy_from_user()
>>> - Fix strsep() memory corruption by saving original pointer before strsep() call
>>> - Change format specifier from %d to %u for depot_stack_handle_t
>>>
>>> Changes in v7:
>>> - per-file-descriptor implementation
>>>
>>> Changes in v6:
>>> - Remove unnecessary braces in if/else statement (coding style)
>>> - Use stack array (char kbuf[33]) instead of kmalloc for input buffer
>>>
>>> Changes in v5:
>>> - No code changes
>>>
>>> Changes in v4:
>>> - Change from numeric (0/1) to string-based interface ("full_stack"/"stack_handle")
>>> - Merge infrastructure patch into this patch
>>>
>>> Changes in v3:
>>> - No code changes
>>>
>>> Changes in v2:
>>> - Renamed from 'compact mode' to 'print_mode' for better clarity
>>> - Use enum values (0=full_stack, 1=stack_handle) instead of boolean
>>> - Update debugfs filename from 'compact' to 'print_mode'
>>>
>>> v10: https://lore.kernel.org/linux-mm/20260618035750.3724613-2-zhen.ni@easystack.cn/
>>> v9: https://lore.kernel.org/linux-mm/20260525081652.2210206-2-zhen.ni@easystack.cn/
>>> v8: https://lore.kernel.org/linux-mm/20260520075641.1931080-2-zhen.ni@easystack.cn/
>>> v7: https://lore.kernel.org/linux-mm/20260515091942.1535677-2-zhen.ni@easystack.cn/
>>> v6: https://lore.kernel.org/linux-mm/20260511033017.747781-2-zhen.ni@easystack.cn/
>>> v5: https://lore.kernel.org/linux-mm/20260507064643.179187-2-zhen.ni@easystack.cn/
>>> v4: https://lore.kernel.org/linux-mm/20260430163247.13628-2-zhen.ni@easystack.cn/
>>> v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-2-zhen.ni@easystack.cn/
>>> https://lore.kernel.org/linux-mm/20260428071112.1420380-3-zhen.ni@easystack.cn/
>>> v2: https://lore.kernel.org/linux-mm/20260419155540.376847-2-zhen.ni@easystack.cn/
>>> https://lore.kernel.org/linux-mm/20260419155540.376847-3-zhen.ni@easystack.cn/
>>> v1: https://lore.kernel.org/linux-mm/20260417154638.22370-2-zhen.ni@easystack.cn/
>>> https://lore.kernel.org/linux-mm/20260417154638.22370-3-zhen.ni@easystack.cn/
>>> ---
>>> mm/page_owner.c | 129 +++++++++++++++++++++++++++++++++++++++++++++---
>>> 1 file changed, 123 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/page_owner.c b/mm/page_owner.c
>>> index 8178e0be557f..7595735979bf 100644
>>> --- a/mm/page_owner.c
>>> +++ b/mm/page_owner.c
>>> @@ -54,6 +54,23 @@ struct stack_print_ctx {
>>> u8 flags;
>>> };
>>>
>>> +enum page_owner_print_mode {
>>> + PAGE_OWNER_PRINT_STACK,
>>> + PAGE_OWNER_PRINT_HANDLE,
>>> + PAGE_OWNER_PRINT_STACK_HANDLE,
>>> +};
>>> +
>>> +static const char * const page_owner_print_mode_strings[] = {
>>> + [PAGE_OWNER_PRINT_STACK] = "stack",
>>> + [PAGE_OWNER_PRINT_HANDLE] = "handle",
>>> + [PAGE_OWNER_PRINT_STACK_HANDLE] = "stack_handle",
>>> +};
>>> +
>>> +struct page_owner_filter_state {
>>> + enum page_owner_print_mode print_mode;
>>> + spinlock_t lock;
>> Hi , Zhen
>> The spinlock in struct page_owner_filter_state is unnecessary and adds significant overhead in the read path.
>>
>> 1. Per-fd isolation: the state is allocated per open() and stored in file->private_data.
>> There is no cross-fd contention possible.
>> 2. Hot path cost: the lock is taken for every single page in read_page_owner() and
>> print_page_owner(). A single read can traverse millions of pages, each paying
>> spin_lock_irqsave/irqrestore — including interrupt disable — just to read a mode
>> enum or check a nodemask. This is measurable overhead for no real benefit.
>> 3. No practical race: nobody writes filter config to an fd while simultaneously reading from it.
>>
>> Suggest dropping the lock entirely.
>>
>> Just my take though — happy to follow whatever the other reviewers prefer here.
>
> I agree. If someone is writing (updating filter) and reading (getting
> page_owner output) at the same time from multiple threads, they might get
> inconsistent results but that's getting what you ask for. Importantly it
> can't cause any crash, AFAICS.
>
>
Hi Vlastimil, Ye,
Thanks for the review. I understand your concerns about the spinlock
overhead in the read path.
The spinlock does have its use case: it prevents race conditions when
multiple threads share the same file descriptor and call read() and
write() concurrently. While we recommend users use the page_owner_filter
tool, we cannot exclude the possibility that some users might directly
share the fd across threads.
That said, I'm open to discussion on whether we need the spinlock. As
Vlastimil noted, the issue isn't severe enough to cause crashes. My v8
version didn't have the spinlock - I added it in response to review
feedback.
So the question is really whether we want to protect multi-threaded fd
sharing or not. Because, the overhead is small in non-contended cases
(single-threaded usage) since there are no competing lock holders.
Thanks,
Zhen
next prev parent reply other threads:[~2026-07-03 8:14 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-25 4:30 [PATCH v11 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering Zhen Ni
2026-06-25 4:30 ` [PATCH v11 1/4] mm/page_owner: add print_mode filter Zhen Ni
2026-06-25 18:26 ` Zi Yan
2026-06-25 19:20 ` Andrew Morton
2026-06-25 19:24 ` Zi Yan
2026-06-29 2:59 ` Ye Liu
2026-06-29 9:30 ` Vlastimil Babka (SUSE)
2026-07-03 8:14 ` zhen.ni [this message]
2026-07-03 8:51 ` Vlastimil Babka (SUSE)
2026-06-25 4:30 ` [PATCH v11 2/4] mm/page_owner: add NUMA node filter Zhen Ni
2026-06-25 18:37 ` Zi Yan
2026-06-26 8:20 ` zhen.ni
2026-06-25 19:27 ` Zi Yan
2026-06-25 20:04 ` Andrew Morton
2026-06-25 4:31 ` [PATCH v11 3/4] tools/mm: add page_owner_filter userspace tool Zhen Ni
2026-06-25 4:50 ` Andrew Morton
2026-06-29 8:31 ` zhen.ni
2026-06-25 4:31 ` [PATCH v11 4/4] mm/page_owner: document page_owner filter Zhen Ni
2026-06-25 4:55 ` [PATCH v11 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering Andrew Morton
2026-06-25 12:57 ` zhen.ni
2026-06-25 18:22 ` Zi Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f7bd2660-374c-4b2f-a274-8fd46368abea@easystack.cn \
--to=zhen.ni@easystack.cn \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=jackmanb@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
--cc=ye.liu@linux.dev \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox