From: Qi Zheng <qi.zheng@linux.dev>
To: "David Hildenbrand (Red Hat)" <david@kernel.org>,
	Andreas Larsson <andreas@gaisler.com>
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-alpha@vger.kernel.org,
	loongarch@lists.linux.dev, linux-mips@vger.kernel.org,
	linux-parisc@vger.kernel.org, linux-um@lists.infradead.org,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	sparclinux <sparclinux@vger.kernel.org>,
	will@kernel.org, peterz@infradead.org, akpm@linux-foundation.org,
	aneesh.kumar@kernel.org, npiggin@gmail.com, dev.jain@arm.com,
	ioworker0@gmail.com, linmag7@gmail.com
Subject: Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
Date: Tue, 27 Jan 2026 19:47:16 +0800
Message-ID: <d0ebcc3d-ba81-49ca-899a-34206f8dd71f@linux.dev>
In-Reply-To: <b95f042f-8fc8-4b6c-b9db-b198efdd0973@kernel.org>



On 1/27/26 7:29 PM, David Hildenbrand (Red Hat) wrote:
> On 1/26/26 07:59, Qi Zheng wrote:
>>
>>
>> On 1/23/26 11:15 PM, Andreas Larsson wrote:
>>> On 2025-12-17 10:45, Qi Zheng wrote:
>>>> From: Qi Zheng <zhengqi.arch@bytedance.com>
>>>>
>>>> The PT_RECLAIM can work on all architectures that support
>>>> MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on
>>>> MMU_GATHER_RCU_TABLE_FREE.
>>>>
>>>> BTW, change PT_RECLAIM to be enabled by default, since nobody should want
>>>> to turn it off.
>>>>
>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>>>> ---
>>>>    arch/x86/Kconfig | 1 -
>>>>    mm/Kconfig       | 9 ++-------
>>>>    2 files changed, 2 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>> index 80527299f859a..0d22da56a71b0 100644
>>>> --- a/arch/x86/Kconfig
>>>> +++ b/arch/x86/Kconfig
>>>> @@ -331,7 +331,6 @@ config X86
>>>>        select FUNCTION_ALIGNMENT_4B
>>>>        imply IMA_SECURE_AND_OR_TRUSTED_BOOT    if EFI
>>>>        select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
>>>> -    select ARCH_SUPPORTS_PT_RECLAIM        if X86_64
>>>>        select ARCH_SUPPORTS_SCHED_SMT        if SMP
>>>>        select SCHED_SMT            if SMP
>>>>        select ARCH_SUPPORTS_SCHED_CLUSTER    if SMP
>>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>>> index bd0ea5454af82..fc00b429b7129 100644
>>>> --- a/mm/Kconfig
>>>> +++ b/mm/Kconfig
>>>> @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK
>>>>          The architecture has hardware support for userspace shadow call
>>>>              stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss).
>>>> -config ARCH_SUPPORTS_PT_RECLAIM
>>>> -    def_bool n
>>>> -
>>>>    config PT_RECLAIM
>>>> -    bool "reclaim empty user page table pages"
>>>> -    default y
>>>> -    depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP
>>>> -    select MMU_GATHER_RCU_TABLE_FREE
>>>> +    def_bool y
>>>> +    depends on MMU_GATHER_RCU_TABLE_FREE
>>>>        help
>>>>          Try to reclaim empty user page table pages in paths other than munmap
>>>>          and exit_mmap path.
>>>
>>> Hi,
>>>
>>> This patch unfortunately results in a WARN_ON_ONCE and unaligned
>>> accesses on sparc64:
>>>
>>> $ stress-ng --mmaphuge 20 -t 60
>>> stress-ng: info:  [559] setting to a 1 min run per stressor
>>> stress-ng: info:  [559] dispatching hogs: 20 mmaphuge
>>> [  560.592569] ------------[ cut here ]------------
>>> [  560.592663] WARNING: kernel/rcu/tree.c:3098 at __call_rcu_common.constprop.0+0x200/0x760, CPU#4: stress-ng-mmaph/568
>>> [  560.592777] CPU: 4 UID: 1000 PID: 568 Comm: stress-ng-mmaph Not tainted 6.19.0-rc5-00127-g62fc9f6ccb97 #8 VOLUNTARY
>>> [  560.592805] Call Trace:
>>> [  560.592812] [<00000000004368b8>] dump_stack+0x8/0x60
>>> [  560.592844] [<0000000000482a60>] __warn+0xe0/0x140
>>> [  560.592878] [<0000000000482b64>] warn_slowpath_fmt+0xa4/0x120
>>> [  560.592901] [<0000000000526a40>] __call_rcu_common.constprop.0+0x200/0x760
>>> [  560.592931] [<0000000000526fd0>] call_rcu+0x10/0x20
>>> [  560.592954] [<0000000000730838>] tlb_remove_table+0x98/0xc0
>>> [  560.592986] [<000000000071bec4>] free_pgd_range+0x224/0x4c0
>>> [  560.593021] [<000000000071c35c>] free_pgtables+0x1fc/0x240
>>> [  560.593042] [<000000000074a6f0>] vms_clear_ptes+0x110/0x140
>>> [  560.593068] [<000000000074c3dc>] vms_complete_munmap_vmas+0x5c/0x280
>>> [  560.593094] [<000000000074de5c>] do_vmi_align_munmap+0x1dc/0x260
>>> [  560.593117] [<000000000074df80>] do_vmi_munmap+0xa0/0x140
>>> [  560.593142] [<000000000074fb2c>] __vm_munmap+0x8c/0x160
>>> [  560.593168] [<000000000072cfd4>] vm_munmap+0x14/0x40
>>> [  560.593190] [<00000000004402a8>] sys_64_munmap+0x88/0xa0
>>> [  560.593221] [<0000000000406274>] linux_sparc_syscall+0x34/0x44
>>> [  560.593274] ---[ end trace 0000000000000000 ]---
>>> [  560.593960] log_unaligned: 209 callbacks suppressed
>>> [  560.593979] Kernel unaligned access at TPC[526a4c] __call_rcu_common.constprop.0+0x20c/0x760
>>> [  560.594121] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760
>>> [  560.594198] Kernel unaligned access at TPC[52b3c4] rcu_segcblist_enqueue+0x24/0x40
>>> [  560.594275] Kernel unaligned access at TPC[526860] __call_rcu_common.constprop.0+0x20/0x760
>>> [  560.594360] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760
>>> [  567.054127] log_unaligned: 1105 callbacks suppressed
>>> [  567.054167] Kernel unaligned access at TPC[526860] __call_rcu_common.constprop.0+0x20/0x760
>>> [  567.054331] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760
>>> [  567.054410] Kernel unaligned access at TPC[52b3c4] rcu_segcblist_enqueue+0x24/0x40
>>
>> Thanks for your report!
>>
>> On sparc64, pmd and pud levels are not of struct page:
> 
> Can you elaborate, I don't understand what you mean :)

On sparc64:

static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table,
				    bool is_page)
{
	unsigned long pgf = (unsigned long)table;
	if (is_page)
		pgf |= 0x1UL;
	tlb_remove_table(tlb, (void *)pgf);
}

static inline void __tlb_remove_table(void *_table)
{
	void *table = (void *)((unsigned long)_table & ~0x1UL);
	bool is_page = false;

	if ((unsigned long)_table & 0x1UL)
		is_page = true;
	pgtable_free(table, is_page);
}

void pgtable_free(void *table, bool is_page)
{
	if (is_page)
		__pte_free(table);
	else
		kmem_cache_free(pgtable_cache, table);
}

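For pmd and pud levels, is_page is false, so the table is an object from
pgtable_cache (freed with kmem_cache_free()) and cannot be treated as a
ptdesc. That means we cannot do the following in __tlb_remove_table_one():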

```
	ptdesc = table;
	call_rcu(&ptdesc->pt_rcu_head, __tlb_remove_table_one_rcu);
```
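
To make the tagging scheme above concrete, here is a minimal user-space
sketch (not kernel code). The helper names tag_table(), untag_table() and
is_pointer_aligned() are made up for illustration; they only mirror the bit
manipulation in pgtable_free_tlb() and __tlb_remove_table() shown above. The
sketch also shows that a pointer with the low bit set is no longer
pointer-aligned, so it cannot be used directly as a structure pointer.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-ins for the sparc64 helpers quoted in
 * this thread. They are an illustration only, not the kernel functions.
 */

/* Encode "is_page" in the low bit of the table pointer. */
void *tag_table(void *table, bool is_page)
{
	uintptr_t pgf = (uintptr_t)table;

	if (is_page)
		pgf |= 0x1UL;
	return (void *)pgf;
}

/* Recover the original pointer and report whether the low bit was set. */
void *untag_table(void *tagged, bool *is_page)
{
	*is_page = ((uintptr_t)tagged & 0x1UL) != 0;
	return (void *)((uintptr_t)tagged & ~(uintptr_t)0x1UL);
}

/* True if the pointer is aligned well enough to hold a pointer-sized member. */
bool is_pointer_aligned(const void *p)
{
	return ((uintptr_t)p & (sizeof(void *) - 1)) == 0;
}
```

With an aligned table, tag_table(table, true) returns an odd address that
fails is_pointer_aligned(), while tag_table(table, false) returns the
pointer unchanged. In both cases untag_table() gives back the original
table pointer.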

> 
> Is it also a problem on architectures like s390x and ppc, where we
> squeeze multiple page tables into a physical page?

For ppc, it's the same as for sparc64.

For s390x, it supports MMU_GATHER_RCU_TABLE_FREE and defines its own
pxx_free_tlb() helpers, but these all call tlb_remove_ptdesc(), so there is
no problem.

> 
>>
>> __pmd_free_tlb/__pud_free_tlb
>> --> pgtable_free_tlb(tlb, pud/pmd, false). <=== is_page == false
>>       --> tlb_remove_table
>>
>> So in __tlb_remove_table_one(), the table cannot be treated as
>> ptdesc because it does not have a pt_rcu_head member.
>>
>> Hi David, it seems we still need to keep ARCH_SUPPORTS_PT_RECLAIM?
> 
> Or we invert it and only disable it for the known-problematic
> architectures?

Yes, the problem lies with those architectures that support
MMU_GATHER_RCU_TABLE_FREE and define their own __tlb_remove_table().

So my plan is as follows:

1. convert __HAVE_ARCH_TLB_REMOVE_TABLE to a
   CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config option
2. make PT_RECLAIM depend on
   MMU_GATHER_RCU_TABLE_FREE && !HAVE_ARCH_TLB_REMOVE_TABLE
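
Read as a predicate, the dependency in step 2 could be modelled roughly as
below. This is only a sketch of the intended logic, not the actual v4 patch:
struct arch_config and pt_reclaim_enabled() are hypothetical names, and the
two booleans stand in for the Kconfig symbols named in the plan.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the proposed dependency. The fields stand in for
 * CONFIG_MMU_GATHER_RCU_TABLE_FREE and CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE;
 * the real Kconfig change may look different.
 */
struct arch_config {
	bool mmu_gather_rcu_table_free;
	bool have_arch_tlb_remove_table;
};

/*
 * PT_RECLAIM would be on only when RCU table freeing is available and
 * the architecture does not provide its own __tlb_remove_table().
 */
bool pt_reclaim_enabled(const struct arch_config *cfg)
{
	return cfg->mmu_gather_rcu_table_free &&
	       !cfg->have_arch_tlb_remove_table;
}
```

Under this model, an architecture like sparc64 (both fields true) ends up
with PT_RECLAIM disabled, while an architecture that uses the generic
__tlb_remove_table() keeps it enabled.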

I'll send v4 soon.

Thanks,
Qi

> 


Thread overview: 23+ messages
2025-12-17  9:45 [PATCH v3 0/7] enable PT_RECLAIM on all 64-bit architectures Qi Zheng
2025-12-17  9:45 ` [PATCH v3 1/7] mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h Qi Zheng
2025-12-17  9:45 ` [PATCH v3 2/7] alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE Qi Zheng
2025-12-17  9:45 ` [PATCH v3 3/7] LoongArch: " Qi Zheng
2025-12-17  9:45 ` [PATCH v3 4/7] mips: " Qi Zheng
2025-12-17  9:45 ` [PATCH v3 5/7] parisc: " Qi Zheng
2025-12-17  9:45 ` [PATCH v3 6/7] um: " Qi Zheng
2025-12-17  9:45 ` [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE Qi Zheng
2025-12-31  9:42   ` Wei Yang
2025-12-31  9:52     ` Qi Zheng
2026-01-01  2:07       ` Wei Yang
2026-01-19 10:18         ` David Hildenbrand (Red Hat)
2026-01-22 14:00           ` Wei Yang
2026-01-23  3:21             ` Qi Zheng
2026-01-24  1:45               ` Wei Yang
2026-01-18 11:23   ` David Hildenbrand (Red Hat)
2026-01-19  3:50     ` Qi Zheng
2026-01-19 10:12       ` David Hildenbrand (Red Hat)
2026-01-19 10:20   ` David Hildenbrand (Red Hat)
2026-01-23 15:15   ` Andreas Larsson
2026-01-26  6:59     ` Qi Zheng
2026-01-27 11:29       ` David Hildenbrand (Red Hat)
2026-01-27 11:47         ` Qi Zheng [this message]
