All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Brodsky <kevin.brodsky@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>, Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1] arm64/mm: Ensure lazy_mmu_mode never nests
Date: Fri, 1 Aug 2025 13:22:09 +0200	[thread overview]
Message-ID: <8d810d87-e6a5-42bf-83c5-ba3aaec1889f@arm.com> (raw)
In-Reply-To: <066fa735-98ad-45f4-9316-b983d2e5a3d3@arm.com>

On 12/06/2025 18:00, Ryan Roberts wrote:
> On 12/06/2025 15:59, Will Deacon wrote:
>> On Tue, Jun 10, 2025 at 05:37:20PM +0100, Ryan Roberts wrote:
>>> On 10/06/2025 16:07, Catalin Marinas wrote:
>>>> On Tue, Jun 10, 2025 at 02:41:01PM +0100, Ryan Roberts wrote:
>>>>> On 10/06/2025 13:00, Catalin Marinas wrote:
>>>>>> On Fri, Jun 06, 2025 at 02:56:52PM +0100, Ryan Roberts wrote:
>>>>>>> Commit 1ef3095b1405 ("arm64/mm: Permit lazy_mmu_mode to be nested")
>>>>>>> provided a quick fix to ensure that lazy_mmu_mode continues to work when
>>>>>>> CONFIG_DEBUG_PAGEALLOC is enabled, which can cause lazy_mmu_mode to
>>>>>>> nest.
>>>>>>>
>>>>>>> The solution in that patch is the make the implementation tolerant to
>>>>>> s/is the make/is to make/
>>>>>>
>>>>>>> nesting; when the inner nest exits lazy_mmu_mode, we exit then the outer
>>>>>>> exit becomes a nop. But this sacrifices the optimization opportunity for
>>>>>>> the remainder of the outer user.
>>>>>> [...]
>>>>>>> I wonder if you might be willing to take this for v6.16? I think its a neater
>>>>>>> solution then my first attempt - Commit 1ef3095b1405 ("arm64/mm: Permit
>>>>>>> lazy_mmu_mode to be nested") - which is already in Linus's master.
>>>>>>>
>>>>>>> To be clear, the current solution is safe, I just think this is much neater.
>>>>>> Maybe better, though I wouldn't say much neater. One concern I have is
>>>>>> about whether we'll get other such nesting in the future and we need to
>>>>>> fix them in generic code. Here we control __kernel_map_pages() but we
>>>>>> may not for other cases.
>>>>>>
>>>>>> Is it the fault of the arch code that uses apply_to_page_range() via
>>>>>> __kernel_map_pages()? It feels like it shouldn't care about the lazy
>>>>>> mode as that's some detail of the apply_to_page_range() implementation.
>>>>>> Maybe this API should just allow nesting.
>>>>> I don't think it is possible to properly support nesting:
>>>>>
>>>>> enter_lazy_mmu
>>>>>     for_each_pte {
>>>>>         read/modify-write pte
>>>>>
>>>>>         alloc_page
>>>>>             enter_lazy_mmu
>>>>>                 make page valid
>>>>>             exit_lazy_mmu
>>>>>
>>>>>         write_to_page
>>>>>     }
>>>>> exit_lazy_mmu
>>>>>
>>>>> This example only works because lazy_mmu doesn't support nesting. The "make page
>>>>> valid" operation is completed by the time of the inner exit_lazy_mmu so that the
>>>>> page can be accessed in write_to_page. If nesting was supported, the inner
>>>>> exit_lazy_mmu would become a nop and write_to_page would explode.
>>>> What I meant is for enter/exit_lazy_mmu to handle a kind of de-nesting
>>>> themselves: enter_lazy_mmu would emit the barriers if already in lazy
>>>> mode, clear pending (well, it doesn't need to do this but it may be
>>>> easier to reason about in terms of flattening). exit_lazy_mmu also needs
>>>> to emit the barriers but leave the lazy mode on if already on when last
>>>> entered. This does need some API modifications to return the old mode on
>>>> enter and get an argument for exit. But the correctness wouldn't be
>>>> affected since exit_lazy_mmu still emits the barriers irrespective of
>>>> the nesting level.
>>> Ahh I see what you mean now; exit always emits barriers but only the last exit
>>> clears TIF_LAZY_MMU.
>>>
>>> I think that's much cleaner, but we are changing the API which needs changes to
>>> all the arches and my attempt at [1] didn't really gain much enthusiasm.
>> To be honest, I don't think the proposal in this series is really
>> improving what we have. Either we support nested lazy mode or we don't;
>> having __kernel_map_pages() mess around with the lazy mmu state because
>> it somehow knows that set_memory_valid() is going to use it is fragile
>> and ugly.
>>
>> So I'm incined to leave the current code as-is, unless we can remove it
>> in favour of teaching the core code how to handle it instead.
> Yeah fair enough. I'm not going to have time to do the proper nesting support
> thing. But I'll see if I can find someone internally that I might be able to
> convince. If not, we'll just leave as is.

I've started working on supporting nesting as Catalin suggested above -
modifying the API so that enter() returns whether the call is nested,
and leave() takes the value returned by enter(). I've got it working
without too much trouble, there's a fair amount of churn at the call
sites but a trivial Coccinelle script can handle most of them.

This new approach will also help with protecting page tables with
kpkeys: lazy_mmu is very useful to write to POR_EL1 less often [1], but
this currently assumes that nesting doesn't occur. In fact the new API
should allow the old value of POR_EL1 to be returned by enter(), meaning
we wouldn't need to store it in a thread-local variable.

- Kevin

[1]
https://lore.kernel.org/linux-mm/20250411091631.954228-19-kevin.brodsky@arm.com/



      parent reply	other threads:[~2025-08-01 11:52 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-06 13:56 [PATCH v1] arm64/mm: Ensure lazy_mmu_mode never nests Ryan Roberts
2025-06-10 12:00 ` Catalin Marinas
2025-06-10 13:41   ` Ryan Roberts
2025-06-10 15:07     ` Catalin Marinas
2025-06-10 16:37       ` Ryan Roberts
2025-06-12 14:59         ` Will Deacon
2025-06-12 16:00           ` Ryan Roberts
2025-06-12 16:21             ` Will Deacon
2025-08-01 11:22             ` Kevin Brodsky [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8d810d87-e6a5-42bf-83c5-ba3aaec1889f@arm.com \
    --to=kevin.brodsky@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.