linux-mm.kvack.org archive mirror
From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: Qi Zheng <qi.zheng@linux.dev>,
	will@kernel.org, aneesh.kumar@kernel.org, npiggin@gmail.com,
	peterz@infradead.org, dev.jain@arm.com,
	akpm@linux-foundation.org, ioworker0@gmail.com
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-alpha@vger.kernel.org,
	linux-snps-arc@lists.infradead.org, loongarch@lists.linux.dev,
	linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
	linux-um@lists.infradead.org,
	Qi Zheng <zhengqi.arch@bytedance.com>
Subject: Re: [PATCH 7/7] mm: make PT_RECLAIM depend on MMU_GATHER_RCU_TABLE_FREE && 64BIT
Date: Wed, 19 Nov 2025 13:24:35 +0100
Message-ID: <7160b6ec-4da5-4273-be91-1339bd00d009@kernel.org>
In-Reply-To: <479b0409-335f-4450-8eb2-5270a5847f5e@linux.dev>

On 19.11.25 13:13, Qi Zheng wrote:
> 
> 
> On 11/19/25 7:35 PM, David Hildenbrand (Red Hat) wrote:
>> On 19.11.25 12:02, Qi Zheng wrote:
>>> Hi David,
>>>
>>> On 11/19/25 6:19 PM, David Hildenbrand (Red Hat) wrote:
>>>> On 18.11.25 13:02, Qi Zheng wrote:
>>>>>
>>>>>
>>>>> On 11/18/25 12:57 AM, David Hildenbrand (Red Hat) wrote:
>>>>>> On 14.11.25 12:11, Qi Zheng wrote:
>>>>>>> From: Qi Zheng <zhengqi.arch@bytedance.com>
>>>>>>
>>>>>> Subject: s/&&/&/
>>>>>
>>>>> will do.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Make PT_RECLAIM depend on MMU_GATHER_RCU_TABLE_FREE so that PT_RECLAIM
>>>>>>> can be enabled by default on all architectures that support
>>>>>>> MMU_GATHER_RCU_TABLE_FREE.
>>>>>>>
>>>>>>> Considering that a large number of PTE page table pages (such as 100GB+)
>>>>>>> can only be caused on a 64-bit system, let PT_RECLAIM also depend on
>>>>>>> 64BIT.
>>>>>>>
>>>>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>>>>>>> ---
>>>>>>>      arch/x86/Kconfig | 1 -
>>>>>>>      mm/Kconfig       | 6 +-----
>>>>>>>      2 files changed, 1 insertion(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>>>>> index eac2e86056902..96bff81fd4787 100644
>>>>>>> --- a/arch/x86/Kconfig
>>>>>>> +++ b/arch/x86/Kconfig
>>>>>>> @@ -330,7 +330,6 @@ config X86
>>>>>>>          select FUNCTION_ALIGNMENT_4B
>>>>>>>          imply IMA_SECURE_AND_OR_TRUSTED_BOOT    if EFI
>>>>>>>          select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
>>>>>>> -    select ARCH_SUPPORTS_PT_RECLAIM        if X86_64
>>>>>>>          select ARCH_SUPPORTS_SCHED_SMT        if SMP
>>>>>>>          select SCHED_SMT            if SMP
>>>>>>>          select ARCH_SUPPORTS_SCHED_CLUSTER    if SMP
>>>>>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>>>>>> index a5a90b169435d..e795fbd69e50c 100644
>>>>>>> --- a/mm/Kconfig
>>>>>>> +++ b/mm/Kconfig
>>>>>>> @@ -1440,14 +1440,10 @@ config ARCH_HAS_USER_SHADOW_STACK
>>>>>>>            The architecture has hardware support for userspace shadow call
>>>>>>>                stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss).
>>>>>>> -config ARCH_SUPPORTS_PT_RECLAIM
>>>>>>> -    def_bool n
>>>>>>> -
>>>>>>>      config PT_RECLAIM
>>>>>>>          bool "reclaim empty user page table pages"
>>>>>>>          default y
>>>>>>> -    depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP
>>>>>>> -    select MMU_GATHER_RCU_TABLE_FREE
>>>>>>> +    depends on MMU_GATHER_RCU_TABLE_FREE && MMU && SMP && 64BIT
>>>>>>
>>>>>> How would we have MMU_GATHER_RCU_TABLE_FREE without MMU? (can we drop
>>>>>> the MMU part)
>>>>>
>>>>> OK.
>>>>>
>>>>>>
>>>>>> Why do we care about SMP in the first place? (can we drop SMP)
>>>>>
>>>>> OK.
>>>>>
>>>>>>
>>>>>> But I also wonder why we need "MMU_GATHER_RCU_TABLE_FREE && 64BIT":
>>>>>>
>>>>>> Would it be harmful on 32bit (sure, we might not reclaim as much, but
>>>>>> still there is memory to be reclaimed?)?
>>>>>
>>>>> This is also fine on 32bit, but the benefits are not significant, so I
>>>>> chose to enable it only on 64-bit.
>>>>
>>>> Right. Address space is smaller, but also memory is smaller. Not that I
>>>> think we strictly *must* support 32bit, I merely wonder why we
>>>> wouldn't just enable it here.
>>>>
>>>> OTOH, if there is a good reason we cannot enable it, we can definitely
>>>> just keep it 64bit only.
>>>
>>> The only difficulty is this:
>>>
>>>>
>>>>>
>>>>> I actually tried enabling MMU_GATHER_RCU_TABLE_FREE on all
>>>>> architectures, and apart from sparc32 being a bit troublesome (because
>>>>> it uses mm->page_table_lock for synchronization within
>>>>> __pte_free_tlb()), the modifications were relatively simple.
>>>
>>> in sparc32:
>>>
>>> void pte_free(struct mm_struct *mm, pgtable_t ptep)
>>> {
>>>            struct page *page;
>>>
>>>            page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
>>>            spin_lock(&mm->page_table_lock);
>>>            if (page_ref_dec_return(page) == 1)
>>>                    pagetable_dtor(page_ptdesc(page));
>>>            spin_unlock(&mm->page_table_lock);
>>>
>>>            srmmu_free_nocache(ptep, SRMMU_PTE_TABLE_SIZE);
>>> }
>>>
>>> #define __pte_free_tlb(tlb, pte, addr)  pte_free((tlb)->mm, pte)
>>>
>>> To enable MMU_GATHER_RCU_TABLE_FREE on sparc32, we need to implement
>>> __tlb_remove_table(), and call the pte_free() above in
>>> __tlb_remove_table().
>>>
>>> However, __tlb_remove_table() does not have an mm parameter:
>>>
>>> void __tlb_remove_table(void *_table)
>>>
>>> so we need to use another lock instead of mm->page_table_lock.
>>>
>>> I have already sent the v2 [1], and perhaps after that I can enable
>>> PT_RECLAIM on all 32-bit architectures as well.
>>>
>>
>> I guess if we just make it depend on MMU_GATHER_RCU_TABLE_FREE that will
>> be fine.
>>
>>> [1].
>>> https://lore.kernel.org/all/cover.1763537007.git.zhengqi.arch@bytedance.com/
>>>
>>>>>
>>>>>>
>>>>>> If all 64BIT support MMU_GATHER_RCU_TABLE_FREE (as you previously
>>>>>> stated), why can't we only check for 64BIT?
>>>>>
>>>>> OK, will do.
>>>>
>>>> This was also more of a question for discussion:
>>>>
>>>> Would it make sense to have
>>>>
>>>> config PT_RECLAIM
>>>>        def_bool y
>>>>        depends on MMU_GATHER_RCU_TABLE_FREE
>>>
>>> Makes sense.
>>>
>>>>
>>>> (a) Would we want to make it configurable (why?)
>>>
>>> No, it was just out of caution before.
>>>
>>>> (b) Do we really care about SMP (why?)
>>>
>>> No. Simply because the following situation is impossible to occur:
>>>
>>> pte_offset_map
>>> traversing the PTE page table
>>>
>>> <preemption or hardirq>
>>>
>>> call madvise(MADV_DONTNEED)
>>>
>>> so there's no need to free PTE page via RCU.
>>>
>>>> (c) Do we want to limit to 64bit (why?)
>>>
>>> No, just because the benefit is greater on 64-bit.
>>
>> I was briefly wondering if on 32bit (but maybe also on 64bit with
>> configurable user page table levels?) we could have the scenario that we
>> only have two page table levels.
>>
>> So reclaiming the PMD level (corresponding to the highest level) would
> 
> Reclaiming the PMD level? PT_RECLAIM only reclaims PTE pages, not PMD
> pages. Am I misunderstanding something?

Sorry, I looked too much into PMD table sharing over the last few days :D

You're right, it would work in any case, even with only 2 levels of page
tables.
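
For reference, the sparc32 difficulty quoted above (the free callback
receives only the table pointer, so mm->page_table_lock is out of reach)
can be modelled in userspace roughly as follows. This is only a sketch
under stated assumptions, not kernel code and not what the v2 series
does: struct pt_page, pt_free_lock, pt_ref_dec_return() and
model_tlb_remove_table() are hypothetical names, and a pthread mutex
stands in for a kernel spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the page backing a sparc32 PTE table. */
struct pt_page {
	int refcount;    /* models the page refcount */
	int dtor_called; /* set once the pagetable_dtor() equivalent ran */
};

/*
 * Hypothetical lock that is not tied to any mm. It replaces
 * mm->page_table_lock, which the callback below cannot reach.
 */
static pthread_mutex_t pt_free_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models page_ref_dec_return(): decrement, then return the new value. */
static int pt_ref_dec_return(struct pt_page *page)
{
	return --page->refcount;
}

/*
 * Same shape as __tlb_remove_table(void *_table): only the table
 * pointer is passed in, there is no mm parameter.
 */
static void model_tlb_remove_table(void *_table)
{
	struct pt_page *page = _table;

	assert(page != NULL);

	pthread_mutex_lock(&pt_free_lock);
	/* Mirrors the quoted pte_free(): run the dtor when the count hits 1. */
	if (pt_ref_dec_return(page) == 1)
		page->dtor_called = 1;
	pthread_mutex_unlock(&pt_free_lock);
}
```

The only point of the model is that the refcount check and the
destructor call stay serialized without needing an mm; whether a global
lock, a per-page lock, or something else is appropriate on real sparc32
is a separate question.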

-- 
Cheers

David


Thread overview: 30+ messages
2025-11-14 11:11 [PATCH 0/7] enable PT_RECLAIM on all 64-bit architectures Qi Zheng
2025-11-14 11:11 ` [PATCH 1/7] alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE Qi Zheng
2025-11-14 19:13   ` Magnus Lindholm
2025-11-15  9:06     ` Qi Zheng
2025-11-14 11:11 ` [PATCH 2/7] arc: " Qi Zheng
2025-11-14 11:20   ` Qi Zheng
2025-11-14 23:10     ` Vineet Gupta
2025-11-15  9:08       ` Qi Zheng
2025-11-14 11:11 ` [PATCH 3/7] loongarch: " Qi Zheng
2025-11-14 14:17   ` Huacai Chen
2025-11-14 15:55     ` Qi Zheng
2025-11-17  6:41     ` Qi Zheng
2025-11-17  6:57       ` Huacai Chen
2025-11-14 11:11 ` [PATCH 4/7] mips: " Qi Zheng
2025-11-14 11:11 ` [PATCH 5/7] parisc: " Qi Zheng
2025-11-14 11:11 ` [PATCH 6/7] um: " Qi Zheng
2025-11-14 11:11 ` [PATCH 7/7] mm: make PT_RECLAIM depend on MMU_GATHER_RCU_TABLE_FREE && 64BIT Qi Zheng
2025-11-15  0:51   ` kernel test robot
2025-11-15  1:12   ` kernel test robot
2025-11-17 16:57   ` David Hildenbrand (Red Hat)
2025-11-18 12:02     ` Qi Zheng
2025-11-19 10:19       ` David Hildenbrand (Red Hat)
2025-11-19 11:02         ` Qi Zheng
2025-11-19 11:35           ` David Hildenbrand (Red Hat)
2025-11-19 12:13             ` Qi Zheng
2025-11-19 12:24               ` David Hildenbrand (Red Hat) [this message]
2025-11-17 16:53 ` [PATCH 0/7] enable PT_RECLAIM on all 64-bit architectures David Hildenbrand (Red Hat)
2025-11-18 11:53   ` Qi Zheng
2025-11-19 10:13     ` David Hildenbrand (Red Hat)
2025-11-19 10:37       ` Qi Zheng
