* [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
@ 2025-05-12 10:22 Ryan Roberts
2025-05-12 11:00 ` Catalin Marinas
From: Ryan Roberts @ 2025-05-12 10:22 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Pasha Tatashin, Andrew Morton,
Uladzislau Rezki, Christoph Hellwig, David Hildenbrand,
Matthew Wilcox (Oracle), Mark Rutland, Anshuman Khandual,
Alexandre Ghiti, Kevin Brodsky
Cc: Ryan Roberts, linux-arm-kernel, linux-mm, linux-kernel,
syzbot+5c0d9392e042f41d45c5
Commit 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel
mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF
flags in order to defer barriers until exiting the mode. At the same
time, it added warnings to check that pte manipulations were never
performed in interrupt context, because the tracking implementation
could not deal with nesting.
But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC)
do manipulate ptes in softirq context, which triggered the warnings.
So let's take the simplest and safest route and disable the batching
optimization in interrupt contexts. This makes these users no worse off
than prior to the optimization. Additionally the known offenders are
debug features that only manipulate a single PTE, so there is no
performance gain anyway.
There may be some obscure case of encrypted/decrypted DMA with the
dma_free_coherent called from an interrupt context, but again, this is
no worse off than prior to the commit.
Some options for supporting nesting were considered, but there is a
difficult to solve problem if any code manipulates ptes within interrupt
context but *outside of* a lazy mmu region. If this case exists, the
code would expect the updates to be immediate, but because the task
context may have already been in lazy mmu mode, the updates would be
deferred, which could cause incorrect behaviour. This problem is avoided
by always ensuring updates within interrupt context are immediate.
Fixes: 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel mappings")
Reported-by: syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-arm-kernel/681f2a09.050a0220.f2294.0006.GAE@google.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
Hi Will,
I've tested before and after with KFENCE enabled and it solves the issue. I've
also run all the mm-selftests which all continue to pass.
Catalin suggested that a Fixes patch targeting the SHA as it is in for-next/mm was
the preferred approach, but shout if you want something different. I'm hoping
that with this fix we can still make it for this cycle, subject to not finding
any more issues.
Thanks,
Ryan
arch/arm64/include/asm/pgtable.h | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index ab4a1b19e596..e65083ec35cb 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -64,7 +64,11 @@ static inline void queue_pte_barriers(void)
{
unsigned long flags;
- VM_WARN_ON(in_interrupt());
+ if (in_interrupt()) {
+ emit_pte_barriers();
+ return;
+ }
+
flags = read_thread_flags();
if (flags & BIT(TIF_LAZY_MMU)) {
@@ -79,7 +83,9 @@ static inline void queue_pte_barriers(void)
#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
static inline void arch_enter_lazy_mmu_mode(void)
{
- VM_WARN_ON(in_interrupt());
+ if (in_interrupt())
+ return;
+
VM_WARN_ON(test_thread_flag(TIF_LAZY_MMU));
set_thread_flag(TIF_LAZY_MMU);
@@ -87,12 +93,18 @@ static inline void arch_enter_lazy_mmu_mode(void)
static inline void arch_flush_lazy_mmu_mode(void)
{
+ if (in_interrupt())
+ return;
+
if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING))
emit_pte_barriers();
}
static inline void arch_leave_lazy_mmu_mode(void)
{
+ if (in_interrupt())
+ return;
+
arch_flush_lazy_mmu_mode();
clear_thread_flag(TIF_LAZY_MMU);
}
--
2.43.0
* Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
2025-05-12 10:22 [PATCH] arm64/mm: Disable barrier batching in interrupt contexts Ryan Roberts
@ 2025-05-12 11:00 ` Catalin Marinas
2025-05-12 11:03 ` Ryan Roberts
2025-05-12 11:07 ` David Hildenbrand
From: Catalin Marinas @ 2025-05-12 11:00 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, Pasha Tatashin, Andrew Morton, Uladzislau Rezki,
Christoph Hellwig, David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Alexandre Ghiti, Kevin Brodsky,
linux-arm-kernel, linux-mm, linux-kernel,
syzbot+5c0d9392e042f41d45c5
On Mon, May 12, 2025 at 11:22:40AM +0100, Ryan Roberts wrote:
> Commit 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel
> mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF
> flags in order to defer barriers until exiting the mode. At the same
> time, it added warnings to check that pte manipulations were never
> performed in interrupt context, because the tracking implementation
> could not deal with nesting.
>
> But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC)
> do manipulate ptes in softirq context, which triggered the warnings.
>
> So let's take the simplest and safest route and disable the batching
> optimization in interrupt contexts. This makes these users no worse off
> than prior to the optimization. Additionally the known offenders are
> debug features that only manipulate a single PTE, so there is no
> performance gain anyway.
>
> There may be some obscure case of encrypted/decrypted DMA with the
> dma_free_coherent called from an interrupt context, but again, this is
> no worse off than prior to the commit.
>
> Some options for supporting nesting were considered, but there is a
> difficult to solve problem if any code manipulates ptes within interrupt
> context but *outside of* a lazy mmu region. If this case exists, the
> code would expect the updates to be immediate, but because the task
> context may have already been in lazy mmu mode, the updates would be
> deferred, which could cause incorrect behaviour. This problem is avoided
> by always ensuring updates within interrupt context are immediate.
>
> Fixes: 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel mappings")
> Reported-by: syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/linux-arm-kernel/681f2a09.050a0220.f2294.0006.GAE@google.com/
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
As per the request in the original report, please also add:
Reported-by: syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com
I'll give it a try as well with my configurations and let you know if
there are any problems. In the meantime:
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
* Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
2025-05-12 11:00 ` Catalin Marinas
@ 2025-05-12 11:03 ` Ryan Roberts
From: Ryan Roberts @ 2025-05-12 11:03 UTC (permalink / raw)
To: Catalin Marinas
Cc: Will Deacon, Pasha Tatashin, Andrew Morton, Uladzislau Rezki,
Christoph Hellwig, David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Alexandre Ghiti, Kevin Brodsky,
linux-arm-kernel, linux-mm, linux-kernel,
syzbot+5c0d9392e042f41d45c5
On 12/05/2025 12:00, Catalin Marinas wrote:
> On Mon, May 12, 2025 at 11:22:40AM +0100, Ryan Roberts wrote:
>> Commit 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel
>> mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF
>> flags in order to defer barriers until exiting the mode. At the same
>> time, it added warnings to check that pte manipulations were never
>> performed in interrupt context, because the tracking implementation
>> could not deal with nesting.
>>
>> But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC)
>> do manipulate ptes in softirq context, which triggered the warnings.
>>
>> So let's take the simplest and safest route and disable the batching
>> optimization in interrupt contexts. This makes these users no worse off
>> than prior to the optimization. Additionally the known offenders are
>> debug features that only manipulate a single PTE, so there is no
>> performance gain anyway.
>>
>> There may be some obscure case of encrypted/decrypted DMA with the
>> dma_free_coherent called from an interrupt context, but again, this is
>> no worse off than prior to the commit.
>>
>> Some options for supporting nesting were considered, but there is a
>> difficult to solve problem if any code manipulates ptes within interrupt
>> context but *outside of* a lazy mmu region. If this case exists, the
>> code would expect the updates to be immediate, but because the task
>> context may have already been in lazy mmu mode, the updates would be
>> deferred, which could cause incorrect behaviour. This problem is avoided
>> by always ensuring updates within interrupt context are immediate.
>>
>> Fixes: 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel mappings")
>> Reported-by: syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/linux-arm-kernel/681f2a09.050a0220.f2294.0006.GAE@google.com/
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>
> As per the request in the original report, please also add:
>
> Reported-by: syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com
I've already added it, 2 lines above your comment...
>
> I'll give it a try as well with my configurations and let you know if
> there are any problems. In the meantime:
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Thanks!
* Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
2025-05-12 10:22 [PATCH] arm64/mm: Disable barrier batching in interrupt contexts Ryan Roberts
2025-05-12 11:00 ` Catalin Marinas
@ 2025-05-12 11:07 ` David Hildenbrand
2025-05-12 12:00 ` Ryan Roberts
2025-05-12 13:14 ` Catalin Marinas
From: David Hildenbrand @ 2025-05-12 11:07 UTC (permalink / raw)
To: Ryan Roberts, Catalin Marinas, Will Deacon, Pasha Tatashin,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
Matthew Wilcox (Oracle), Mark Rutland, Anshuman Khandual,
Alexandre Ghiti, Kevin Brodsky
Cc: linux-arm-kernel, linux-mm, linux-kernel,
syzbot+5c0d9392e042f41d45c5
On 12.05.25 12:22, Ryan Roberts wrote:
> Commit 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel
> mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF
> flags in order to defer barriers until exiting the mode. At the same
> time, it added warnings to check that pte manipulations were never
> performed in interrupt context, because the tracking implementation
> could not deal with nesting.
>
> But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC)
> do manipulate ptes in softirq context, which triggered the warnings.
>
> So let's take the simplest and safest route and disable the batching
> optimization in interrupt contexts. This makes these users no worse off
> than prior to the optimization. Additionally the known offenders are
> debug features that only manipulate a single PTE, so there is no
> performance gain anyway.
>
> There may be some obscure case of encrypted/decrypted DMA with the
> dma_free_coherent called from an interrupt context, but again, this is
> no worse off than prior to the commit.
>
> Some options for supporting nesting were considered, but there is a
> difficult to solve problem if any code manipulates ptes within interrupt
> context but *outside of* a lazy mmu region. If this case exists, the
> code would expect the updates to be immediate, but because the task
> context may have already been in lazy mmu mode, the updates would be
> deferred, which could cause incorrect behaviour. This problem is avoided
> by always ensuring updates within interrupt context are immediate.
>
> Fixes: 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel mappings")
> Reported-by: syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/linux-arm-kernel/681f2a09.050a0220.f2294.0006.GAE@google.com/
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>
> Hi Will,
>
> I've tested before and after with KFENCE enabled and it solves the issue. I've
> also run all the mm-selftests which all continue to pass.
>
> Catalin suggested that a Fixes patch targeting the SHA as it is in for-next/mm was
> the preferred approach, but shout if you want something different. I'm hoping
> that with this fix we can still make it for this cycle, subject to not finding
> any more issues.
>
> Thanks,
> Ryan
>
>
> arch/arm64/include/asm/pgtable.h | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index ab4a1b19e596..e65083ec35cb 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -64,7 +64,11 @@ static inline void queue_pte_barriers(void)
> {
> unsigned long flags;
>
> - VM_WARN_ON(in_interrupt());
> + if (in_interrupt()) {
> + emit_pte_barriers();
> + return;
> + }
> +
> flags = read_thread_flags();
>
> if (flags & BIT(TIF_LAZY_MMU)) {
> @@ -79,7 +83,9 @@ static inline void queue_pte_barriers(void)
> #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
> static inline void arch_enter_lazy_mmu_mode(void)
> {
> - VM_WARN_ON(in_interrupt());
> + if (in_interrupt())
> + return;
> +
> VM_WARN_ON(test_thread_flag(TIF_LAZY_MMU));
>
> set_thread_flag(TIF_LAZY_MMU);
> @@ -87,12 +93,18 @@ static inline void arch_enter_lazy_mmu_mode(void)
>
> static inline void arch_flush_lazy_mmu_mode(void)
> {
> + if (in_interrupt())
> + return;
> +
> if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING))
> emit_pte_barriers();
> }
>
> static inline void arch_leave_lazy_mmu_mode(void)
> {
> + if (in_interrupt())
> + return;
> +
> arch_flush_lazy_mmu_mode();
> clear_thread_flag(TIF_LAZY_MMU);
> }
I guess in all cases we could optimize out the in_interrupt() check on
!debug configs.
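Something along these lines, perhaps (purely illustrative; the helper name and
the config list are guesses, and getting that list right is of course the hard
part):

#define LAZY_MMU_HAVE_IRQ_PTE_USERS \
        (IS_ENABLED(CONFIG_KFENCE) || IS_ENABLED(CONFIG_DEBUG_PAGEALLOC))

static inline void queue_pte_barriers(void)
{
        unsigned long flags;

        /* Compiles away entirely when no offending feature is built in. */
        if (LAZY_MMU_HAVE_IRQ_PTE_USERS && in_interrupt()) {
                emit_pte_barriers();
                return;
        }

        flags = read_thread_flags();

        if (flags & BIT(TIF_LAZY_MMU))
                set_thread_flag(TIF_LAZY_MMU_PENDING);
        else
                emit_pte_barriers();
}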
Hm, maybe there is an elegant way to catch all of these "problematic" users?
--
Cheers,
David / dhildenb
* Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
2025-05-12 11:07 ` David Hildenbrand
@ 2025-05-12 12:00 ` Ryan Roberts
2025-05-12 12:05 ` David Hildenbrand
From: Ryan Roberts @ 2025-05-12 12:00 UTC (permalink / raw)
To: David Hildenbrand, Catalin Marinas, Will Deacon, Pasha Tatashin,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
Matthew Wilcox (Oracle), Mark Rutland, Anshuman Khandual,
Alexandre Ghiti, Kevin Brodsky
Cc: linux-arm-kernel, linux-mm, linux-kernel,
syzbot+5c0d9392e042f41d45c5
On 12/05/2025 12:07, David Hildenbrand wrote:
> On 12.05.25 12:22, Ryan Roberts wrote:
>> Commit 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel
>> mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF
>> flags in order to defer barriers until exiting the mode. At the same
>> time, it added warnings to check that pte manipulations were never
>> performed in interrupt context, because the tracking implementation
>> could not deal with nesting.
>>
>> But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC)
>> do manipulate ptes in softirq context, which triggered the warnings.
>>
>> So let's take the simplest and safest route and disable the batching
>> optimization in interrupt contexts. This makes these users no worse off
>> than prior to the optimization. Additionally the known offenders are
>> debug features that only manipulate a single PTE, so there is no
>> performance gain anyway.
>>
>> There may be some obscure case of encrypted/decrypted DMA with the
>> dma_free_coherent called from an interrupt context, but again, this is
>> no worse off than prior to the commit.
>>
>> Some options for supporting nesting were considered, but there is a
>> difficult to solve problem if any code manipulates ptes within interrupt
>> context but *outside of* a lazy mmu region. If this case exists, the
>> code would expect the updates to be immediate, but because the task
>> context may have already been in lazy mmu mode, the updates would be
>> deferred, which could cause incorrect behaviour. This problem is avoided
>> by always ensuring updates within interrupt context are immediate.
>>
>> Fixes: 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel mappings")
>> Reported-by: syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/linux-arm-kernel/681f2a09.050a0220.f2294.0006.GAE@google.com/
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>
>> Hi Will,
>>
>> I've tested before and after with KFENCE enabled and it solves the issue. I've
>> also run all the mm-selftests which all continue to pass.
>>
>> Catalin suggested that a Fixes patch targeting the SHA as it is in for-next/mm was
>> the preferred approach, but shout if you want something different. I'm hoping
>> that with this fix we can still make it for this cycle, subject to not finding
>> any more issues.
>>
>> Thanks,
>> Ryan
>>
>>
>> arch/arm64/include/asm/pgtable.h | 16 ++++++++++++++--
>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index ab4a1b19e596..e65083ec35cb 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -64,7 +64,11 @@ static inline void queue_pte_barriers(void)
>> {
>> unsigned long flags;
>>
>> - VM_WARN_ON(in_interrupt());
>> + if (in_interrupt()) {
>> + emit_pte_barriers();
>> + return;
>> + }
>> +
>> flags = read_thread_flags();
>>
>> if (flags & BIT(TIF_LAZY_MMU)) {
>> @@ -79,7 +83,9 @@ static inline void queue_pte_barriers(void)
>> #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
>> static inline void arch_enter_lazy_mmu_mode(void)
>> {
>> - VM_WARN_ON(in_interrupt());
>> + if (in_interrupt())
>> + return;
>> +
>> VM_WARN_ON(test_thread_flag(TIF_LAZY_MMU));
>>
>> set_thread_flag(TIF_LAZY_MMU);
>> @@ -87,12 +93,18 @@ static inline void arch_enter_lazy_mmu_mode(void)
>>
>> static inline void arch_flush_lazy_mmu_mode(void)
>> {
>> + if (in_interrupt())
>> + return;
>> +
>> if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING))
>> emit_pte_barriers();
>> }
>>
>> static inline void arch_leave_lazy_mmu_mode(void)
>> {
>> + if (in_interrupt())
>> + return;
>> +
>> arch_flush_lazy_mmu_mode();
>> clear_thread_flag(TIF_LAZY_MMU);
>> }
>
> I guess in all cases we could optimize out the in_interrupt() check on !debug
> configs.
I think that assumes we can easily and accurately identify all configs that
cause this? We've identified 2 but I'm not confident that it's a full list.
Also, KFENCE isn't really a debug config (despite me calling it that in the
commit log) - it's supposed to be something that can be enabled in production
builds.
>
> Hm, maybe there is an elegant way to catch all of these "problematic" users?
I'm all ears if you have any suggestions? :)
It actually looks like x86/XEN tries to solve this problem in a similar way:
enum xen_lazy_mode xen_get_lazy_mode(void)
{
if (in_interrupt())
return XEN_LAZY_NONE;
return this_cpu_read(xen_lazy_mode);
}
Although I'm not convinced it's fully robust. It also has:
static inline void enter_lazy(enum xen_lazy_mode mode)
{
BUG_ON(this_cpu_read(xen_lazy_mode) != XEN_LAZY_NONE);
this_cpu_write(xen_lazy_mode, mode);
}
which is called as part of its arch_enter_lazy_mmu_mode() implementation. If a
task was already in lazy mmu mode when an interrupt comes in and causes the
nested arch_enter_lazy_mmu_mode() that we saw in this bug report, surely that
BUG_ON() should trigger?
Thanks,
Ryan
* Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
2025-05-12 12:00 ` Ryan Roberts
@ 2025-05-12 12:05 ` David Hildenbrand
2025-05-12 12:33 ` Ryan Roberts
From: David Hildenbrand @ 2025-05-12 12:05 UTC (permalink / raw)
To: Ryan Roberts, Catalin Marinas, Will Deacon, Pasha Tatashin,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
Matthew Wilcox (Oracle), Mark Rutland, Anshuman Khandual,
Alexandre Ghiti, Kevin Brodsky
Cc: linux-arm-kernel, linux-mm, linux-kernel,
syzbot+5c0d9392e042f41d45c5
>>> static inline void arch_leave_lazy_mmu_mode(void)
>>> {
>>> + if (in_interrupt())
>>> + return;
>>> +
>>> arch_flush_lazy_mmu_mode();
>>> clear_thread_flag(TIF_LAZY_MMU);
>>> }
>>
>> I guess in all cases we could optimize out the in_interrupt() check on !debug
>> configs.
>
> I think that assumes we can easily and accurately identify all configs that
> cause this? We've identified 2 but I'm not confident that it's a full list.
Agreed. I was wondering if we could convert the ones to use different
pte helpers, whereby these helpers would not be available without
CONFIG_WHATEVER. Then, make these features select CONFIG_WHATEVER.
VM_WARN_ON_* would be used to catch any violations / wrong use of pte
helpers.
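Very rough sketch of what I mean (all names invented, and simplified -- a real
helper would presumably go through the usual set_pte() plumbing):

/* Hypothetical Kconfig symbol, selected by KFENCE, DEBUG_PAGEALLOC, ... */
#ifdef CONFIG_ARCH_HAS_IRQSAFE_KERNEL_PTE
/*
 * Only features that legitimately update kernel ptes from (soft)irq context
 * get this helper; it never looks at the lazy-mmu TIF state and always
 * syncs immediately.
 */
static inline void set_pte_irqsafe(pte_t *ptep, pte_t pte)
{
        WRITE_ONCE(*ptep, pte);
        emit_pte_barriers();
}
#endif

The ordinary helpers would then keep their VM_WARN_ON(in_interrupt()).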
> Also, KFENCE isn't really a debug config (despite me calling it that in the
> commit log) - it's supposed to be something that can be enabled in production
> builds.
Agreed. Even Fedora has it.
>
>>
>> Hm, maybe there is an elegant way to catch all of these "problematic" users?
>
> I'm all ears if you have any suggestions? :)
>
>
> It actually looks like x86/XEN tries to solve this problem in a similar way:
Heh, yes. Good old xen ...
>
> enum xen_lazy_mode xen_get_lazy_mode(void)
> {
> if (in_interrupt())
> return XEN_LAZY_NONE;
>
> return this_cpu_read(xen_lazy_mode);
> }
>
> Although I'm not convinced it's fully robust. It also has:
>
> static inline void enter_lazy(enum xen_lazy_mode mode)
> {
> BUG_ON(this_cpu_read(xen_lazy_mode) != XEN_LAZY_NONE);
>
> this_cpu_write(xen_lazy_mode, mode);
> }
>
> which is called as part of its arch_enter_lazy_mmu_mode() implementation. If a
> task was already in lazy mmu mode when an interrupt comes in and causes the
> nested arch_enter_lazy_mmu_mode() that we saw in this bug report, surely that
> BUG_ON() should trigger?
Hm, good point. But that code is old, so presumably something is preventing
that?
In any case, just a thought on the in_interrupt() check, I think this
commit is good enough as is.
--
Cheers,
David / dhildenb
* Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
2025-05-12 12:05 ` David Hildenbrand
@ 2025-05-12 12:33 ` Ryan Roberts
From: Ryan Roberts @ 2025-05-12 12:33 UTC (permalink / raw)
To: David Hildenbrand, Catalin Marinas, Will Deacon, Pasha Tatashin,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
Matthew Wilcox (Oracle), Mark Rutland, Anshuman Khandual,
Alexandre Ghiti, Kevin Brodsky
Cc: linux-arm-kernel, linux-mm, linux-kernel,
syzbot+5c0d9392e042f41d45c5
On 12/05/2025 13:05, David Hildenbrand wrote:
>
>>>> static inline void arch_leave_lazy_mmu_mode(void)
>>>> {
>>>> + if (in_interrupt())
>>>> + return;
>>>> +
>>>> arch_flush_lazy_mmu_mode();
>>>> clear_thread_flag(TIF_LAZY_MMU);
>>>> }
>>>
>>> I guess in all cases we could optimize out the in_interrupt() check on !debug
>>> configs.
>>
>> I think that assumes we can easily and accurately identify all configs that
>> cause this? We've identified 2 but I'm not confident that it's a full list.
>
> Agreed. I was wondering if we could convert the ones to use different pte
> helpers, whereby these helpers would not be available without CONFIG_WHATEVER.
> Then, make these features select CONFIG_WHATEVER.
Trouble is, in KFENCE's case, there is quite a bit of code between the general
API it calls and the pte manipulations:
arch_enter_lazy_mmu_mode arch/arm64/include/asm/pgtable.h:82 [inline] (P)
apply_to_pte_range mm/memory.c:2936 [inline] (P)
apply_to_pmd_range mm/memory.c:2985 [inline] (P)
apply_to_pud_range mm/memory.c:3021 [inline] (P)
apply_to_p4d_range mm/memory.c:3057 [inline] (P)
__apply_to_page_range+0xdb4/0x13e4 mm/memory.c:3093 (P)
apply_to_page_range+0x4c/0x64 mm/memory.c:3112
__change_memory_common+0xac/0x3f8 arch/arm64/mm/pageattr.c:64
set_memory_valid+0x68/0x7c arch/arm64/mm/pageattr.c:-1
kfence_protect_page arch/arm64/include/asm/kfence.h:17 [inline]
kfence_protect mm/kfence/core.c:247 [inline]
kfence_guarded_free+0x278/0x5a8 mm/kfence/core.c:565
__kfence_free+0x104/0x198 mm/kfence/core.c:1187
kfence_free include/linux/kfence.h:187 [inline]
slab_free_hook mm/slub.c:2318 [inline]
slab_free mm/slub.c:4642 [inline]
kfree+0x268/0x474 mm/slub.c:4841
slab_free_after_rcu_debug+0x78/0x2f4 mm/slub.c:4679
rcu_do_batch kernel/rcu/tree.c:2568 [inline]
rcu_core+0x848/0x17a4 kernel/rcu/tree.c:2824
rcu_core_si+0x10/0x1c kernel/rcu/tree.c:2841
handle_softirqs+0x328/0xc88 kernel/softirq.c:579
__do_softirq+0x14/0x20 kernel/softirq.c:613
>
> VM_WARN_ON_* would be used to catch any violations / wrong use of pte helpers.
>
>> Also, KFENCE isn't really a debug config (despite me calling it that in the
>> commit log) - it's supposed to be something that can be enabled in production
>> builds.
>
> Agreed. Even Fedora has it.
>
>>
>>>
>>> Hm, maybe there is an elegant way to catch all of these "problematic" users?
>>
>> I'm all ears if you have any suggestions? :)
>>
>>
>> It actually looks like x86/XEN tries to solve this problem in a similar way:
>
> Heh, yes. Good old xen ...
>
>>
>> enum xen_lazy_mode xen_get_lazy_mode(void)
>> {
>> if (in_interrupt())
>> return XEN_LAZY_NONE;
>>
>> return this_cpu_read(xen_lazy_mode);
>> }
>>
>> Although I'm not convinced it's fully robust. It also has:
>>
>> static inline void enter_lazy(enum xen_lazy_mode mode)
>> {
>> BUG_ON(this_cpu_read(xen_lazy_mode) != XEN_LAZY_NONE);
>>
>> this_cpu_write(xen_lazy_mode, mode);
>> }
>>
>> which is called as part of its arch_enter_lazy_mmu_mode() implementation. If a
>> task was already in lazy mmu mode when an interrupt comes in and causes the
>> nested arch_enter_lazy_mmu_mode() that we saw in this bug report, surely that
>> BUG_ON() should trigger?
>
> Hm, good point. But that code is old, so presumably something is preventing
> that?
>
>
> In any case, just a thought on the in_interrupt() check, I think this commit is
> good enough as is.
OK thanks. I'll leave it as-is then in the absence of other feedback :)
* Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
2025-05-12 10:22 [PATCH] arm64/mm: Disable barrier batching in interrupt contexts Ryan Roberts
2025-05-12 11:00 ` Catalin Marinas
2025-05-12 11:07 ` David Hildenbrand
@ 2025-05-12 13:14 ` Catalin Marinas
2025-05-12 13:53 ` Ryan Roberts
2025-05-13 20:46 ` Will Deacon
2025-05-14 15:14 ` Will Deacon
From: Catalin Marinas @ 2025-05-12 13:14 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, Pasha Tatashin, Andrew Morton, Uladzislau Rezki,
Christoph Hellwig, David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Alexandre Ghiti, Kevin Brodsky,
linux-arm-kernel, linux-mm, linux-kernel,
syzbot+5c0d9392e042f41d45c5
On Mon, May 12, 2025 at 11:22:40AM +0100, Ryan Roberts wrote:
> @@ -79,7 +83,9 @@ static inline void queue_pte_barriers(void)
> #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
> static inline void arch_enter_lazy_mmu_mode(void)
> {
> - VM_WARN_ON(in_interrupt());
> + if (in_interrupt())
> + return;
> +
> VM_WARN_ON(test_thread_flag(TIF_LAZY_MMU));
I still get this warning trigger with some debugging enabled (more
specifically, CONFIG_DEBUG_PAGEALLOC). Patch applied on top of the arm64
for-kernelci.
Is it because the unmap code uses arch_enter_lazy_mmu_mode() already and
__apply_to_page_range() via __kernel_map_pages() is attempting another
nested call? I think it's still safe, we just drop the optimisation in
the outer code and issue the barriers immediately. So maybe drop this
warning as well but add a comment on how nesting works.
------------[ cut here ]------------
WARNING: CPU: 6 PID: 1 at arch/arm64/include/asm/pgtable.h:89 __apply_to_page_range+0x85c/0x9f8
Modules linked in: ip_tables x_tables ipv6
CPU: 6 UID: 0 PID: 1 Comm: systemd Not tainted 6.15.0-rc5-00075-g676795fe9cf6 #1 PREEMPT
Hardware name: QEMU KVM Virtual Machine, BIOS 2024.08-4 10/25/2024
pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __apply_to_page_range+0x85c/0x9f8
lr : __apply_to_page_range+0x2b4/0x9f8
sp : ffff80008009b3c0
x29: ffff80008009b460 x28: ffff0000c43a3000 x27: ffff0001ff62b108
x26: ffff0000c43a4000 x25: 0000000000000001 x24: 0010000000000001
x23: ffffbf24c9c209c0 x22: ffff80008009b4d0 x21: ffffbf24c74a3b20
x20: ffff0000c43a3000 x19: ffff0001ff609d18 x18: 0000000000000001
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000003
x14: 0000000000000028 x13: ffffbf24c97c1000 x12: ffff0000c43a3fff
x11: ffffbf24cacc9a70 x10: ffff0000c43a3fff x9 : ffff0001fffff018
x8 : 0000000000000012 x7 : ffff0000c43a4000 x6 : ffff0000c43a4000
x5 : ffffbf24c9c209c0 x4 : ffff0000c43a3fff x3 : ffff0001ff609000
x2 : 0000000000000d18 x1 : ffff0000c03e8000 x0 : 0000000080000000
Call trace:
__apply_to_page_range+0x85c/0x9f8 (P)
apply_to_page_range+0x14/0x20
set_memory_valid+0x5c/0xd8
__kernel_map_pages+0x84/0xc0
get_page_from_freelist+0x1110/0x1340
__alloc_frozen_pages_noprof+0x114/0x1178
alloc_pages_mpol+0xb8/0x1d0
alloc_frozen_pages_noprof+0x48/0xc0
alloc_pages_noprof+0x10/0x60
get_free_pages_noprof+0x14/0x90
__tlb_remove_folio_pages_size.isra.0+0xe4/0x140
__tlb_remove_folio_pages+0x10/0x20
unmap_page_range+0xa1c/0x14c0
unmap_single_vma.isra.0+0x48/0x90
unmap_vmas+0xe0/0x200
vms_clear_ptes+0xf4/0x140
vms_complete_munmap_vmas+0x7c/0x208
do_vmi_align_munmap+0x180/0x1a8
do_vmi_munmap+0xac/0x188
__vm_munmap+0xe0/0x1e0
__arm64_sys_munmap+0x20/0x38
invoke_syscall+0x48/0x104
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x4c/0x16c
el0t_64_sync_handler+0x10c/0x140
el0t_64_sync+0x198/0x19c
irq event stamp: 281312
hardirqs last enabled at (281311): [<ffffbf24c780fd04>] bad_range+0x164/0x1c0
hardirqs last disabled at (281312): [<ffffbf24c89c4550>] el1_dbg+0x24/0x98
softirqs last enabled at (281054): [<ffffbf24c752d99c>] handle_softirqs+0x4cc/0x518
softirqs last disabled at (281019): [<ffffbf24c7450694>] __do_softirq+0x14/0x20
---[ end trace 0000000000000000 ]---
--
Catalin
* Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
2025-05-12 13:14 ` Catalin Marinas
@ 2025-05-12 13:53 ` Ryan Roberts
2025-05-12 14:14 ` Ryan Roberts
From: Ryan Roberts @ 2025-05-12 13:53 UTC (permalink / raw)
To: Catalin Marinas
Cc: Will Deacon, Pasha Tatashin, Andrew Morton, Uladzislau Rezki,
Christoph Hellwig, David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Alexandre Ghiti, Kevin Brodsky,
linux-arm-kernel, linux-mm, linux-kernel,
syzbot+5c0d9392e042f41d45c5
On 12/05/2025 14:14, Catalin Marinas wrote:
> On Mon, May 12, 2025 at 11:22:40AM +0100, Ryan Roberts wrote:
>> @@ -79,7 +83,9 @@ static inline void queue_pte_barriers(void)
>> #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
>> static inline void arch_enter_lazy_mmu_mode(void)
>> {
>> - VM_WARN_ON(in_interrupt());
>> + if (in_interrupt())
>> + return;
>> +
>> VM_WARN_ON(test_thread_flag(TIF_LAZY_MMU));
>
> I still get this warning trigger with some debugging enabled (more
> specifically, CONFIG_DEBUG_PAGEALLOC). Patch applied on top of the arm64
> for-kernelci.
Thanks for the report...
I'll admit I didn't explicitly test CONFIG_DEBUG_PAGEALLOC since I thought we
concluded when talking that the failure mode was the same as KFENCE in that it
was due to pte manipulations in the interrupt context.
But that's not what this trace shows...
The warning is basically saying we have nested lazy mmu mode regions, both in
task context, which is completely illegal as far as lazy mmu is concerned.
Looks like the first nest is zap_pte_range(), which is batching with mmu_gather
and that allocates memory in tlb_next_batch(). And when CONFIG_DEBUG_PAGEALLOC
is enabled, it calls into the arch to make the allocated page valid in the
linear map. arm64 does that with apply_to_page_range(), which does a second lazy
mmu nest.
I need to have a think about what the right fix is. Will get back to you shortly.
Thanks,
Ryan
>
> Is it because the unmap code uses arch_enter_lazy_mmu_mode() already and
> __apply_to_page_range() via __kernel_map_pages() is attempting another
> nested call? I think it's still safe, we just drop the optimisation in
> the outer code and issue the barriers immediately. So maybe drop this
> warning as well but add a comment on how nesting works.
>
> ------------[ cut here ]------------
> WARNING: CPU: 6 PID: 1 at arch/arm64/include/asm/pgtable.h:89 __apply_to_page_range+0x85c/0x9f8
> Modules linked in: ip_tables x_tables ipv6
> CPU: 6 UID: 0 PID: 1 Comm: systemd Not tainted 6.15.0-rc5-00075-g676795fe9cf6 #1 PREEMPT
> Hardware name: QEMU KVM Virtual Machine, BIOS 2024.08-4 10/25/2024
> pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : __apply_to_page_range+0x85c/0x9f8
> lr : __apply_to_page_range+0x2b4/0x9f8
> sp : ffff80008009b3c0
> x29: ffff80008009b460 x28: ffff0000c43a3000 x27: ffff0001ff62b108
> x26: ffff0000c43a4000 x25: 0000000000000001 x24: 0010000000000001
> x23: ffffbf24c9c209c0 x22: ffff80008009b4d0 x21: ffffbf24c74a3b20
> x20: ffff0000c43a3000 x19: ffff0001ff609d18 x18: 0000000000000001
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000003
> x14: 0000000000000028 x13: ffffbf24c97c1000 x12: ffff0000c43a3fff
> x11: ffffbf24cacc9a70 x10: ffff0000c43a3fff x9 : ffff0001fffff018
> x8 : 0000000000000012 x7 : ffff0000c43a4000 x6 : ffff0000c43a4000
> x5 : ffffbf24c9c209c0 x4 : ffff0000c43a3fff x3 : ffff0001ff609000
> x2 : 0000000000000d18 x1 : ffff0000c03e8000 x0 : 0000000080000000
> Call trace:
> __apply_to_page_range+0x85c/0x9f8 (P)
> apply_to_page_range+0x14/0x20
> set_memory_valid+0x5c/0xd8
> __kernel_map_pages+0x84/0xc0
> get_page_from_freelist+0x1110/0x1340
> __alloc_frozen_pages_noprof+0x114/0x1178
> alloc_pages_mpol+0xb8/0x1d0
> alloc_frozen_pages_noprof+0x48/0xc0
> alloc_pages_noprof+0x10/0x60
> get_free_pages_noprof+0x14/0x90
> __tlb_remove_folio_pages_size.isra.0+0xe4/0x140
> __tlb_remove_folio_pages+0x10/0x20
> unmap_page_range+0xa1c/0x14c0
> unmap_single_vma.isra.0+0x48/0x90
> unmap_vmas+0xe0/0x200
> vms_clear_ptes+0xf4/0x140
> vms_complete_munmap_vmas+0x7c/0x208
> do_vmi_align_munmap+0x180/0x1a8
> do_vmi_munmap+0xac/0x188
> __vm_munmap+0xe0/0x1e0
> __arm64_sys_munmap+0x20/0x38
> invoke_syscall+0x48/0x104
> el0_svc_common.constprop.0+0x40/0xe0
> do_el0_svc+0x1c/0x28
> el0_svc+0x4c/0x16c
> el0t_64_sync_handler+0x10c/0x140
> el0t_64_sync+0x198/0x19c
> irq event stamp: 281312
> hardirqs last enabled at (281311): [<ffffbf24c780fd04>] bad_range+0x164/0x1c0
> hardirqs last disabled at (281312): [<ffffbf24c89c4550>] el1_dbg+0x24/0x98
> softirqs last enabled at (281054): [<ffffbf24c752d99c>] handle_softirqs+0x4cc/0x518
> softirqs last disabled at (281019): [<ffffbf24c7450694>] __do_softirq+0x14/0x20
> ---[ end trace 0000000000000000 ]---
>
* Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
2025-05-12 13:53 ` Ryan Roberts
@ 2025-05-12 14:14 ` Ryan Roberts
From: Ryan Roberts @ 2025-05-12 14:14 UTC (permalink / raw)
To: Catalin Marinas
Cc: Will Deacon, Pasha Tatashin, Andrew Morton, Uladzislau Rezki,
Christoph Hellwig, David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Alexandre Ghiti, Kevin Brodsky,
linux-arm-kernel, linux-mm, linux-kernel,
syzbot+5c0d9392e042f41d45c5
On 12/05/2025 14:53, Ryan Roberts wrote:
> On 12/05/2025 14:14, Catalin Marinas wrote:
>> On Mon, May 12, 2025 at 11:22:40AM +0100, Ryan Roberts wrote:
>>> @@ -79,7 +83,9 @@ static inline void queue_pte_barriers(void)
>>> #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
>>> static inline void arch_enter_lazy_mmu_mode(void)
>>> {
>>> - VM_WARN_ON(in_interrupt());
>>> + if (in_interrupt())
>>> + return;
>>> +
>>> VM_WARN_ON(test_thread_flag(TIF_LAZY_MMU));
>>
>> I still get this warning trigger with some debugging enabled (more
>> specifically, CONFIG_DEBUG_PAGEALLOC). Patch applied on top of the arm64
>> for-kernelci.
>
> Thanks for the report...
>
> I'll admit I didn't explicitly test CONFIG_DEBUG_PAGEALLOC since I thought we
> concluded when talking that the failure mode was the same as KFENCE in that it
> was due to pte manipulations in the interrupt context.
>
> But that's not what this trace shows...
>
> The warning is basically saying we have nested lazy mmu mode regions, both in
> task context, which is completely illegal as far as lazy mmu is concerned.
>
> Looks like the first nest is zap_pte_range(), which is batching with mmu_gather
> and that allocates memory in tlb_next_batch(). And when CONFIG_DEBUG_PAGEALLOC
> is enabled, it calls into the arch to make the allocated page valid in the
> linear map. arm64 does that with apply_to_page_range(), which does a second lazy
> mmu nest.
>
> I need to have a think about what the right fix is. Will get back to you shortly.
>
> Thanks,
> Ryan
>
>>
>> Is it because the unmap code uses arch_enter_lazy_mmu_mode() already and
>> __apply_to_page_range() via __kernel_map_pages() is attempting another
>> nested call? I think it's still safe, we just drop the optimisation in
>> the outer code and issue the barriers immediately. So maybe drop this
>> warning as well but add a comment on how nesting works.
Sorry Catalin, I completely missed this proposal on first read!
Yes that's what is happening, yes it is still safe, and yes we just drop the
optimization.
But it's an ugly solution IMHO.
The real problem is that arm64 has chosen to use apply_to_page_range() as part
of its implementation of __kernel_map_pages(), and apply_to_page_range()
expects to be able to use lazy mmu.
The lazy mmu spec says "Nesting is not permitted and the mode cannot be used in
interrupt context." Clearly neither of those things is true :)
Looking at the powerpc lazy mmu implementation, I think it will do exactly what
you are suggesting we do here, which is just terminate the optimization early.
So I guess that's the most pragmatic approach. I'll re-send this patch. Sigh.
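i.e. pretty much what you proposed. Roughly this shape, I think (untested
sketch only; the actual resend may differ):

static inline void arch_enter_lazy_mmu_mode(void)
{
        if (in_interrupt())
                return;

        /*
         * Nesting is tolerated rather than warned about: e.g.
         * DEBUG_PAGEALLOC's __kernel_map_pages() can call
         * apply_to_page_range() inside an outer lazy mmu section. The
         * inner arch_leave_lazy_mmu_mode() then flushes any pending
         * barriers and clears TIF_LAZY_MMU, so the remainder of the
         * outer section simply runs unbatched; correct, just not optimal.
         */
        set_thread_flag(TIF_LAZY_MMU);
}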
Thanks,
Ryan
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 6 PID: 1 at arch/arm64/include/asm/pgtable.h:89 __apply_to_page_range+0x85c/0x9f8
>> Modules linked in: ip_tables x_tables ipv6
>> CPU: 6 UID: 0 PID: 1 Comm: systemd Not tainted 6.15.0-rc5-00075-g676795fe9cf6 #1 PREEMPT
>> Hardware name: QEMU KVM Virtual Machine, BIOS 2024.08-4 10/25/2024
>> pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : __apply_to_page_range+0x85c/0x9f8
>> lr : __apply_to_page_range+0x2b4/0x9f8
>> sp : ffff80008009b3c0
>> x29: ffff80008009b460 x28: ffff0000c43a3000 x27: ffff0001ff62b108
>> x26: ffff0000c43a4000 x25: 0000000000000001 x24: 0010000000000001
>> x23: ffffbf24c9c209c0 x22: ffff80008009b4d0 x21: ffffbf24c74a3b20
>> x20: ffff0000c43a3000 x19: ffff0001ff609d18 x18: 0000000000000001
>> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000003
>> x14: 0000000000000028 x13: ffffbf24c97c1000 x12: ffff0000c43a3fff
>> x11: ffffbf24cacc9a70 x10: ffff0000c43a3fff x9 : ffff0001fffff018
>> x8 : 0000000000000012 x7 : ffff0000c43a4000 x6 : ffff0000c43a4000
>> x5 : ffffbf24c9c209c0 x4 : ffff0000c43a3fff x3 : ffff0001ff609000
>> x2 : 0000000000000d18 x1 : ffff0000c03e8000 x0 : 0000000080000000
>> Call trace:
>> __apply_to_page_range+0x85c/0x9f8 (P)
>> apply_to_page_range+0x14/0x20
>> set_memory_valid+0x5c/0xd8
>> __kernel_map_pages+0x84/0xc0
>> get_page_from_freelist+0x1110/0x1340
>> __alloc_frozen_pages_noprof+0x114/0x1178
>> alloc_pages_mpol+0xb8/0x1d0
>> alloc_frozen_pages_noprof+0x48/0xc0
>> alloc_pages_noprof+0x10/0x60
>> get_free_pages_noprof+0x14/0x90
>> __tlb_remove_folio_pages_size.isra.0+0xe4/0x140
>> __tlb_remove_folio_pages+0x10/0x20
>> unmap_page_range+0xa1c/0x14c0
>> unmap_single_vma.isra.0+0x48/0x90
>> unmap_vmas+0xe0/0x200
>> vms_clear_ptes+0xf4/0x140
>> vms_complete_munmap_vmas+0x7c/0x208
>> do_vmi_align_munmap+0x180/0x1a8
>> do_vmi_munmap+0xac/0x188
>> __vm_munmap+0xe0/0x1e0
>> __arm64_sys_munmap+0x20/0x38
>> invoke_syscall+0x48/0x104
>> el0_svc_common.constprop.0+0x40/0xe0
>> do_el0_svc+0x1c/0x28
>> el0_svc+0x4c/0x16c
>> el0t_64_sync_handler+0x10c/0x140
>> el0t_64_sync+0x198/0x19c
>> irq event stamp: 281312
>> hardirqs last enabled at (281311): [<ffffbf24c780fd04>] bad_range+0x164/0x1c0
>> hardirqs last disabled at (281312): [<ffffbf24c89c4550>] el1_dbg+0x24/0x98
>> softirqs last enabled at (281054): [<ffffbf24c752d99c>] handle_softirqs+0x4cc/0x518
>> softirqs last disabled at (281019): [<ffffbf24c7450694>] __do_softirq+0x14/0x20
>> ---[ end trace 0000000000000000 ]---
>>
>
* Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
2025-05-12 10:22 [PATCH] arm64/mm: Disable barrier batching in interrupt contexts Ryan Roberts
2025-05-12 13:14 ` Catalin Marinas
@ 2025-05-13 20:46 ` Will Deacon
2025-05-14 9:29 ` Ryan Roberts
2025-05-14 15:14 ` Will Deacon
From: Will Deacon @ 2025-05-13 20:46 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Pasha Tatashin, Andrew Morton, Uladzislau Rezki,
Christoph Hellwig, David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Alexandre Ghiti, Kevin Brodsky,
linux-arm-kernel, linux-mm, linux-kernel,
syzbot+5c0d9392e042f41d45c5
On Mon, May 12, 2025 at 11:22:40AM +0100, Ryan Roberts wrote:
> Commit 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel
> mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF
> flags in order to defer barriers until exiting the mode. At the same
> time, it added warnings to check that pte manipulations were never
> performed in interrupt context, because the tracking implementation
> could not deal with nesting.
>
> But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC)
> do manipulate ptes in softirq context, which triggered the warnings.
Hmm. Do we also have to worry about the case where a softirq is triggered
off the back of a hardirq *and* that hardirq is taken while we're in the
middle of e.g. queue_pte_barriers()? In that case, I think we can end
up in strange situations, such as having LAZY_MMU_PENDING set when
LAZY_MMU is clear, although it looks like things still work even in that
case.
Will
* Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
2025-05-13 20:46 ` Will Deacon
@ 2025-05-14 9:29 ` Ryan Roberts
2025-05-14 15:13 ` Will Deacon
From: Ryan Roberts @ 2025-05-14 9:29 UTC (permalink / raw)
To: Will Deacon
Cc: Catalin Marinas, Pasha Tatashin, Andrew Morton, Uladzislau Rezki,
Christoph Hellwig, David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Alexandre Ghiti, Kevin Brodsky,
linux-arm-kernel, linux-mm, linux-kernel,
syzbot+5c0d9392e042f41d45c5
On 13/05/2025 21:46, Will Deacon wrote:
> On Mon, May 12, 2025 at 11:22:40AM +0100, Ryan Roberts wrote:
>> Commit 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel
>> mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF
>> flags in order to defer barriers until exiting the mode. At the same
>> time, it added warnings to check that pte manipulations were never
>> performed in interrupt context, because the tracking implementation
>> could not deal with nesting.
>>
>> But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC)
>> do manipulate ptes in softirq context, which triggered the warnings.
>
> Hmm. Do we also have to worry about the case where a softirq is triggered
> off the back of a hardirq *and* that hardirq is taken while we're in the
> middle of e.g. queue_pte_barriers()? In that case, I think we can end
> up in strange situations, such as having LAZY_MMU_PENDING set when
> LAZY_MMU is clear, although it looks like things still work even in that
> case.
I don't see any problem here. This change ensures that we always behave the
"old" way in interrupt context. So the interrupt context will never even look at
those TIF flags, so it doesn't matter that the task context is midway through
changing the flags when the interrupt comes in.
(although somehow I feel like I should be bracing for a zinger :)
Thanks,
Ryan
>
> Will
* Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
2025-05-14 9:29 ` Ryan Roberts
@ 2025-05-14 15:13 ` Will Deacon
From: Will Deacon @ 2025-05-14 15:13 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Pasha Tatashin, Andrew Morton, Uladzislau Rezki,
Christoph Hellwig, David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Alexandre Ghiti, Kevin Brodsky,
linux-arm-kernel, linux-mm, linux-kernel,
syzbot+5c0d9392e042f41d45c5
On Wed, May 14, 2025 at 10:29:17AM +0100, Ryan Roberts wrote:
> On 13/05/2025 21:46, Will Deacon wrote:
> > On Mon, May 12, 2025 at 11:22:40AM +0100, Ryan Roberts wrote:
> >> Commit 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel
> >> mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF
> >> flags in order to defer barriers until exiting the mode. At the same
> >> time, it added warnings to check that pte manipulations were never
> >> performed in interrupt context, because the tracking implementation
> >> could not deal with nesting.
> >>
> >> But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC)
> >> do manipulate ptes in softirq context, which triggered the warnings.
> >
> > Hmm. Do we also have to worry about the case where a softirq is triggered
> > off the back of a hardirq *and* that hardirq is taken while we're in the
> > middle of e.g. queue_pte_barriers()? In that case, I think we can end
> > up in strange situations, such as having LAZY_MMU_PENDING set when
> > LAZY_MMU is clear, although it looks like things still work even in that
> > case.
>
> I don't see any problem here. This change ensures that we always behave the
> "old" way in interrupt context. So the interrupt context will never even look at
> those TIF flags, so it doesn't matter that the task context is midway through
> changing the flags when the interrupt comes in.
>
> (although somehow I feel like I should be bracing for a zinger :)
Ha, for some reason, I was looking at the code _without_ your fix
applied. Although it's quite hard to think about, I couldn't spot any
other issues with nesting beyond the one you call out at the end of the
commit message. Your patch makes all of this a lot simpler, though, so
I'll pick it up (along with the other one).
Thanks,
Will
* Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
2025-05-12 10:22 [PATCH] arm64/mm: Disable barrier batching in interrupt contexts Ryan Roberts
2025-05-13 20:46 ` Will Deacon
@ 2025-05-14 15:14 ` Will Deacon
From: Will Deacon @ 2025-05-14 15:14 UTC (permalink / raw)
To: Catalin Marinas, Pasha Tatashin, Andrew Morton, Uladzislau Rezki,
Christoph Hellwig, David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Alexandre Ghiti, Kevin Brodsky,
Ryan Roberts
Cc: kernel-team, Will Deacon, linux-arm-kernel, linux-mm,
linux-kernel, syzbot+5c0d9392e042f41d45c5
On Mon, 12 May 2025 11:22:40 +0100, Ryan Roberts wrote:
> Commit 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel
> mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF
> flags in order to defer barriers until exiting the mode. At the same
> time, it added warnings to check that pte manipulations were never
> performed in interrupt context, because the tracking implementation
> could not deal with nesting.
>
> [...]
Applied to arm64 (for-next/mm), thanks!
[1/1] arm64/mm: Disable barrier batching in interrupt contexts
https://git.kernel.org/arm64/c/b81c688426a9
Cheers,
--
Will
https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev