* [PATCH 6.1 337/522] arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
From: Greg Kroah-Hartman @ 2026-06-16 14:58 UTC
To: stable
Cc: Greg Kroah-Hartman, patches, Will Deacon, linux-arm-kernel,
linux-kernel, David Hildenbrand (Arm), Ryan Roberts,
Anshuman Khandual, Catalin Marinas, Sasha Levin
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anshuman Khandual <anshuman.khandual@arm.com>
[ Upstream commit 48478b9f791376b4b89018d7afdfd06865498f65 ]
During a memory hot remove operation, both linear and vmemmap mappings for
the memory range being removed, get unmapped via unmap_hotplug_range() but
mapped pages get freed only for vmemmap mapping. This is just a sequential
operation where each table entry gets cleared, followed by a leaf specific
TLB flush, and then followed by memory free operation when applicable.
This approach was simple and uniform both for vmemmap and linear mappings.
But linear mapping might contain CONT marked block memory where it becomes
necessary to first clear out all entries in the range before a TLB flush.
This is as per the architecture requirement. Hence batch all TLB flushes
during the table tear down walk and finally do it in unmap_hotplug_range().
Prior to this fix, it was hypothetically possible for a speculative access
to a higher address in the contiguous block to fill the TLB with shattered
entries for the entire contiguous range after a lower address had already
been cleared and invalidated. Due to the table entries being shattered, the
subsequent TLB invalidation for the higher address would not then clear the
TLB entries for the lower address, meaning stale TLB entries could persist.
Besides it also helps in improving the performance via TLBI range operation
along with reduced synchronization instructions. The time spent executing
unmap_hotplug_range() improved 97% measured over a 2GB memory hot removal
in KVM guest.
This scheme is not applicable during vmemmap mapping tear down where memory
needs to be freed and hence a TLB flush is required after clearing out page
table entry.
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Closes: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[ replaced `__pte_clear()` with `pte_clear()` ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/arm64/mm/mmu.c | 36 ++++++++++++++++++++----------------
1 file changed, 20 insertions(+), 16 deletions(-)
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -925,10 +925,14 @@ static void unmap_hotplug_pte_range(pmd_
WARN_ON(!pte_present(pte));
pte_clear(&init_mm, addr, ptep);
- flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
- if (free_mapped)
+ if (free_mapped) {
+ /* CONT blocks are not supported in the vmemmap */
+ WARN_ON(pte_cont(pte));
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
free_hotplug_page_range(pte_page(pte),
PAGE_SIZE, altmap);
+ }
+ /* unmap_hotplug_range() flushes TLB for !free_mapped */
} while (addr += PAGE_SIZE, addr < end);
}
@@ -949,15 +953,14 @@ static void unmap_hotplug_pmd_range(pud_
WARN_ON(!pmd_present(pmd));
if (pmd_sect(pmd)) {
pmd_clear(pmdp);
-
- /*
- * One TLBI should be sufficient here as the PMD_SIZE
- * range is mapped with a single block entry.
- */
- flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
- if (free_mapped)
+ if (free_mapped) {
+ /* CONT blocks are not supported in the vmemmap */
+ WARN_ON(pmd_cont(pmd));
+ flush_tlb_kernel_range(addr, addr + PMD_SIZE);
free_hotplug_page_range(pmd_page(pmd),
PMD_SIZE, altmap);
+ }
+ /* unmap_hotplug_range() flushes TLB for !free_mapped */
continue;
}
WARN_ON(!pmd_table(pmd));
@@ -982,15 +985,12 @@ static void unmap_hotplug_pud_range(p4d_
WARN_ON(!pud_present(pud));
if (pud_sect(pud)) {
pud_clear(pudp);
-
- /*
- * One TLBI should be sufficient here as the PUD_SIZE
- * range is mapped with a single block entry.
- */
- flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
- if (free_mapped)
+ if (free_mapped) {
+ flush_tlb_kernel_range(addr, addr + PUD_SIZE);
free_hotplug_page_range(pud_page(pud),
PUD_SIZE, altmap);
+ }
+ /* unmap_hotplug_range() flushes TLB for !free_mapped */
continue;
}
WARN_ON(!pud_table(pud));
@@ -1020,6 +1020,7 @@ static void unmap_hotplug_p4d_range(pgd_
static void unmap_hotplug_range(unsigned long addr, unsigned long end,
bool free_mapped, struct vmem_altmap *altmap)
{
+ unsigned long start = addr;
unsigned long next;
pgd_t *pgdp, pgd;
@@ -1041,6 +1042,9 @@ static void unmap_hotplug_range(unsigned
WARN_ON(!pgd_present(pgd));
unmap_hotplug_p4d_range(pgdp, addr, next, free_mapped, altmap);
} while (addr = next, addr < end);
+
+ if (!free_mapped)
+ flush_tlb_kernel_range(start, end);
}
static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
* Re: [PATCH 6.1 337/522] arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
From: Ben Hutchings @ 2026-06-21 15:02 UTC
To: Anshuman Khandual, Catalin Marinas, David Hildenbrand (Arm),
Ryan Roberts
Cc: patches, Will Deacon, linux-arm-kernel, linux-kernel, Sasha Levin,
Greg Kroah-Hartman, stable
On Tue, 2026-06-16 at 20:28 +0530, Greg Kroah-Hartman wrote:
> 6.1-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Anshuman Khandual <anshuman.khandual@arm.com>
>
> [ Upstream commit 48478b9f791376b4b89018d7afdfd06865498f65 ]
[...]
> @@ -949,15 +953,14 @@ static void unmap_hotplug_pmd_range(pud_
> WARN_ON(!pmd_present(pmd));
> if (pmd_sect(pmd)) {
> pmd_clear(pmdp);
> -
> - /*
> - * One TLBI should be sufficient here as the PMD_SIZE
> - * range is mapped with a single block entry.
> - */
> - flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> - if (free_mapped)
> + if (free_mapped) {
> + /* CONT blocks are not supported in the vmemmap */
> + WARN_ON(pmd_cont(pmd));
> + flush_tlb_kernel_range(addr, addr + PMD_SIZE);
It wasn't clear to me from the commit message why this now adds PMD_SIZE
rather than PAGE_SIZE. It seems like this change is fine for Linux
6.13+ with a CPU that supports TLB range flushing, but otherwise results
in unnecessarily executing multiple TLB invalidations at intervals of
the base page size.
> free_hotplug_page_range(pmd_page(pmd),
> PMD_SIZE, altmap);
> + }
> + /* unmap_hotplug_range() flushes TLB for !free_mapped */
> continue;
> }
> WARN_ON(!pmd_table(pmd));
> @@ -982,15 +985,12 @@ static void unmap_hotplug_pud_range(p4d_
> WARN_ON(!pud_present(pud));
> if (pud_sect(pud)) {
> pud_clear(pudp);
> -
> - /*
> - * One TLBI should be sufficient here as the PUD_SIZE
> - * range is mapped with a single block entry.
> - */
> - flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> - if (free_mapped)
> + if (free_mapped) {
> + flush_tlb_kernel_range(addr, addr + PUD_SIZE);
[...]
Similarly here, but this is effectively flush_tlb_all() instead.
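[ The two size remarks above can be checked with a small user-space model. This is a sketch under stated assumptions, not the kernel code: it assumes 4K base pages, no TLB range-flush support, and a per-call cap on invalidations equal to the number of PTEs in one table, above which the whole TLB is flushed. Every MODEL_* name and model_flush_kernel_range() are hypothetical and exist only for this illustration. ]

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed configuration: 4K base pages, 512 entries per table level. */
#define MODEL_PAGE_SHIFT	12UL
#define MODEL_PAGE_SIZE		(1UL << MODEL_PAGE_SHIFT)
#define MODEL_PTRS_PER_PTE	512UL
#define MODEL_PMD_SIZE		(MODEL_PAGE_SIZE * MODEL_PTRS_PER_PTE)	/* 2 MiB */
#define MODEL_PUD_SIZE		(MODEL_PMD_SIZE * MODEL_PTRS_PER_PTE)	/* 1 GiB */

/* Assumption: the cap on per-page invalidations is one table's worth. */
#define MODEL_MAX_TLBI_OPS	MODEL_PTRS_PER_PTE

struct flush_cost {
	bool full_flush;	/* true: fell back to flushing the whole TLB */
	unsigned long tlbi_ops;	/* per-page invalidations otherwise */
};

/*
 * Hypothetical model of a kernel range flush without range-TLBI support:
 * one invalidation per base page, unless the range exceeds the cap.
 */
static struct flush_cost model_flush_kernel_range(unsigned long start,
						  unsigned long end)
{
	struct flush_cost cost = { false, 0 };

	assert(start < end);

	if ((end - start) > MODEL_MAX_TLBI_OPS * MODEL_PAGE_SIZE) {
		cost.full_flush = true;
		return cost;
	}

	/* Round up so a partial trailing page still counts as one op. */
	cost.tlbi_ops = (end - start + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	return cost;
}
```

[ Under these assumptions a PAGE_SIZE range costs one invalidation, a PMD_SIZE range sits exactly at the cap and costs 512, and a PUD_SIZE range exceeds the cap and becomes a full flush. ]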
Ben.
--
Ben Hutchings
No political challenge can be met by shopping. - George Monbiot
* Re: [PATCH 6.1 337/522] arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
From: Will Deacon @ 2026-06-23 14:25 UTC
To: Ben Hutchings
Cc: Anshuman Khandual, Catalin Marinas, David Hildenbrand (Arm),
Ryan Roberts, patches, linux-arm-kernel, linux-kernel,
Sasha Levin, Greg Kroah-Hartman, stable, mark.rutland
On Sun, Jun 21, 2026 at 05:02:27PM +0200, Ben Hutchings wrote:
> On Tue, 2026-06-16 at 20:28 +0530, Greg Kroah-Hartman wrote:
> > 6.1-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Anshuman Khandual <anshuman.khandual@arm.com>
> >
> > [ Upstream commit 48478b9f791376b4b89018d7afdfd06865498f65 ]
> [...]
> > @@ -949,15 +953,14 @@ static void unmap_hotplug_pmd_range(pud_
> > WARN_ON(!pmd_present(pmd));
> > if (pmd_sect(pmd)) {
> > pmd_clear(pmdp);
> > -
> > - /*
> > - * One TLBI should be sufficient here as the PMD_SIZE
> > - * range is mapped with a single block entry.
> > - */
> > - flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> > - if (free_mapped)
> > + if (free_mapped) {
> > + /* CONT blocks are not supported in the vmemmap */
> > + WARN_ON(pmd_cont(pmd));
> > + flush_tlb_kernel_range(addr, addr + PMD_SIZE);
>
> It wasn't clear to me from the commit message why this now adds PMD_SIZE
> rather than PAGE_SIZE. It seems like this change is fine for Linux
> 6.13+ with a CPU that supports TLB range flushing, but otherwise results
> in unnecessarily executing multiple TLB invalidations at intervals of
> the base page size.
Hmm, the commit message also makes very little sense to me and so I don't
understand why this patch has us doing multiple TLB invalidations when
we run into a !cont, block mapping at the PMD level. The old comment
(which this patch removes) should still apply afaict.
Anshuman, Ryan, any ideas what's going on here?
Will
* Re: [PATCH 6.1 337/522] arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
From: Ryan Roberts @ 2026-06-24 15:05 UTC
To: Will Deacon, Ben Hutchings
Cc: Anshuman Khandual, Catalin Marinas, David Hildenbrand (Arm),
patches, linux-arm-kernel, linux-kernel, Sasha Levin,
Greg Kroah-Hartman, stable, mark.rutland
On 23/06/2026 15:25, Will Deacon wrote:
> On Sun, Jun 21, 2026 at 05:02:27PM +0200, Ben Hutchings wrote:
>> On Tue, 2026-06-16 at 20:28 +0530, Greg Kroah-Hartman wrote:
>>> 6.1-stable review patch. If anyone has any objections, please let me know.
>>>
>>> ------------------
>>>
>>> From: Anshuman Khandual <anshuman.khandual@arm.com>
>>>
>>> [ Upstream commit 48478b9f791376b4b89018d7afdfd06865498f65 ]
>> [...]
>>> @@ -949,15 +953,14 @@ static void unmap_hotplug_pmd_range(pud_
>>> WARN_ON(!pmd_present(pmd));
>>> if (pmd_sect(pmd)) {
>>> pmd_clear(pmdp);
>>> -
>>> - /*
>>> - * One TLBI should be sufficient here as the PMD_SIZE
>>> - * range is mapped with a single block entry.
>>> - */
>>> - flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>>> - if (free_mapped)
>>> + if (free_mapped) {
>>> + /* CONT blocks are not supported in the vmemmap */
>>> + WARN_ON(pmd_cont(pmd));
>>> + flush_tlb_kernel_range(addr, addr + PMD_SIZE);
>>
>> It wasn't clear to me from the commit message why this now adds PMD_SIZE
>> rather than PAGE_SIZE. It seems like this change is fine for Linux
>> 6.13+ with a CPU that supports TLB range flushing, but otherwise results
>> in unnecessarily executing multiple TLB invalidations at intervals of
>> the base page size.
>
> Hmm, the commit message also makes very little sense to me and so I don't
> understand why this patch has us doing multiple TLB invalidations when
> we run into a !cont, block mapping at the PMD level. The old comment
> (which this patch removes) should still apply afaict.
>
> Anshuman, Ryan, any ideas what's going on here?
I think this change was probably my fault; Given the API is called
flush_tlb_kernel_range() it seemed like an abuse/hack to pretend we are only
flushing the first PAGE_SIZE of the range. But as I understand it, even if the
HW shatters a block mapping into multiple TLB entries, all of the entries
relating to the block mapping will be invalidated if just one of them intersects
the TLBI range/address. So it should be safe to reapply this hack.
Although ideally I think it would be better if this API took a stride argument;
then intent is clear.
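[ A minimal sketch of what a stride argument would express, assuming 4K base pages and a 2 MiB PMD block. sketch_tlbi_ops() and the SKETCH_* constants are hypothetical names for illustration only, not an existing kernel interface; the function only counts how many invalidations a stride-aware flush would issue. ]

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE	(1UL << 12)			/* assumed 4K base page */
#define SKETCH_PMD_SIZE		(SKETCH_PAGE_SIZE * 512UL)	/* assumed 2 MiB block */

/*
 * Hypothetical helper, not a kernel API: number of invalidations issued
 * when stepping through [start, end) one stride at a time.
 */
static unsigned long sketch_tlbi_ops(unsigned long start, unsigned long end,
				     unsigned long stride)
{
	assert(stride != 0 && start < end);

	/* Round up so a partial trailing stride still gets one op. */
	return (end - start + stride - 1) / stride;
}
```

[ With stride == SKETCH_PMD_SIZE, a single block costs one invalidation while the caller still passes the true range; with stride == SKETCH_PAGE_SIZE, the same range costs 512. ]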
What's the best way to handle this? Submit a patch for mainline that reverts
this part, then get it backported to stable (implying this current patch will
have been applied to stable)?
Thanks,
Ryan
>
> Will
* Re: [PATCH 6.1 337/522] arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
From: Greg Kroah-Hartman @ 2026-06-24 16:29 UTC
To: Ryan Roberts
Cc: Will Deacon, Ben Hutchings, Anshuman Khandual, Catalin Marinas,
David Hildenbrand (Arm), patches, linux-arm-kernel, linux-kernel,
Sasha Levin, stable, mark.rutland
On Wed, Jun 24, 2026 at 04:05:01PM +0100, Ryan Roberts wrote:
> On 23/06/2026 15:25, Will Deacon wrote:
> > On Sun, Jun 21, 2026 at 05:02:27PM +0200, Ben Hutchings wrote:
> >> On Tue, 2026-06-16 at 20:28 +0530, Greg Kroah-Hartman wrote:
> >>> 6.1-stable review patch. If anyone has any objections, please let me know.
> >>>
> >>> ------------------
> >>>
> >>> From: Anshuman Khandual <anshuman.khandual@arm.com>
> >>>
> >>> [ Upstream commit 48478b9f791376b4b89018d7afdfd06865498f65 ]
> >> [...]
> >>> @@ -949,15 +953,14 @@ static void unmap_hotplug_pmd_range(pud_
> >>> WARN_ON(!pmd_present(pmd));
> >>> if (pmd_sect(pmd)) {
> >>> pmd_clear(pmdp);
> >>> -
> >>> - /*
> >>> - * One TLBI should be sufficient here as the PMD_SIZE
> >>> - * range is mapped with a single block entry.
> >>> - */
> >>> - flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> >>> - if (free_mapped)
> >>> + if (free_mapped) {
> >>> + /* CONT blocks are not supported in the vmemmap */
> >>> + WARN_ON(pmd_cont(pmd));
> >>> + flush_tlb_kernel_range(addr, addr + PMD_SIZE);
> >>
> >> It wasn't clear to me from the commit message why this now adds PMD_SIZE
> >> rather than PAGE_SIZE. It seems like this change is fine for Linux
> >> 6.13+ with a CPU that supports TLB range flushing, but otherwise results
> >> in unnecessarily executing multiple TLB invalidations at intervals of
> >> the base page size.
> >
> > Hmm, the commit message also makes very little sense to me and so I don't
> > understand why this patch has us doing multiple TLB invalidations when
> > we run into a !cont, block mapping at the PMD level. The old comment
> > (which this patch removes) should still apply afaict.
> >
> > Anshuman, Ryan, any ideas what's going on here?
>
> I think this change was probably my fault; Given the API is called
> flush_tlb_kernel_range() it seemed like an abuse/hack to pretend we are only
> flushing the first PAGE_SIZE of the range. But as I understand it, even if the
> HW shatters a block mapping into multiple TLB entries, all of the entries
> relating to the block mapping will be invalidated if just one of them intersects
> the TLBI range/address. So it should be safe to reapply this hack.
>
> Although ideally I think it would be better if this API took a stride argument;
> then intent is clear.
>
> What's the best way to handle this? Submit a patch for mainline that reverts
> this part, then get it backported to stable (implying this current patch will
> have been applied to stable)?
yes, that's probably the best way.
thanks,
greg k-h