* [REGRESSION] x86/hugetlb: AMD F15h VA alignment offset breaks MAP_HUGETLB alignment
@ 2026-05-27 14:36 Karsten Desler
2026-05-27 15:53 ` Oscar Salvador (SUSE)
0 siblings, 1 reply; 10+ messages in thread
From: Karsten Desler @ 2026-05-27 14:36 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin
Hi,
I found a reproducible hugetlb regression on an AMD Family 15h system.
On some boots, mmap(MAP_HUGETLB) returns a virtual address that is not aligned
to the hugepage size. The mapping is nevertheless installed as a hugetlb VMA.
When the process exits, the kernel later BUGs in __unmap_hugepage_range().
6.18.33 x86_64, AMD Opteron 6238, 2M hugepages
Example bad mapping captured from /proc/$pid/maps:
7fc67f604000-7fc67f804000 rw-p 00000000 00:0f 12340 /anon_hugepage (deleted)
The address has offset 0x4000 within a 2 MiB hugepage.
smaps confirms it is really hugetlb:
KernelPageSize: 2048 kB
MMUPageSize: 2048 kB
Private_Hugetlb: 2048 kB
VmFlags: rd wr mr mw me de ht
Minimal reproducer:
echo 1000 > /proc/sys/vm/nr_hugepages
mmap(NULL, 1229824, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_POPULATE|MAP_HUGETLB, -1, 0)
On bad boots this returns e.g.:
mmap returned 0x7fc67f604000 aligned=no offset=16384
and exiting the process triggers:
Kernel BUG at __unmap_hugepage_range+0x5ef/0x640
RIP: __unmap_hugepage_range+0x5ef/0x640
Fixing recursive fault but reboot is needed!
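A test program along the following lines should be enough to check a given
boot. This is only a sketch and not the program used for the report: the
function name check_hugetlb_alignment() and the HPAGE_SIZE constant are made up
for illustration, and 2 MiB hugepages are assumed. It performs the same mmap()
call as above and prints a line in the same shape as the sample output.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Assumption: 2 MiB hugepages, as on the system in the report. */
#define HPAGE_SIZE (2UL << 20)

/*
 * Map `len` bytes with MAP_HUGETLB and report how the returned address
 * sits within a hugepage.
 *
 * Returns -1 if mmap() fails (for example when no hugepages are reserved),
 *          0 if the address is hugepage-aligned,
 *          1 if it is not.
 */
int check_hugetlb_alignment(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE | MAP_HUGETLB,
		       -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return -1;
	}

	unsigned long off = (unsigned long)((uintptr_t)p & (HPAGE_SIZE - 1));

	printf("mmap returned %p aligned=%s offset=%lu\n",
	       p, off ? "no" : "yes", off);

	/*
	 * The mapping is deliberately left in place: in the report the BUG
	 * fires when the mapping is torn down at process exit.
	 */
	return off != 0;
}
```

Calling check_hugetlb_alignment(1229824) from main() after reserving hugepages
matches the call shown above. Only run it on a machine that may be rebooted,
since a misaligned result is expected to be followed by the BUG on exit.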
The following is AI work, sorry if that's total BS but at the very least,
I can reproduce the kernelBUG and booting with
align_va_addr=off
works around the issue.
This is boot-dependent. Some boots work, some fail. The reason appears
to be the per-boot AMD F15h VA alignment offset.
The old x86 hugetlb path in arch/x86/mm/hugetlbpage.c only set:
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
It did not add the AMD F15h align offset.
After the v6.13-rc1 hugetlb mmap rework, hugetlb mappings go through
arch_get_unmapped_area*(), and x86 currently does:
if (filp) {
info.align_mask = get_align_mask(filp);
info.align_offset += get_align_bits();
}
For hugetlb, get_align_mask(filp) correctly returns the hugepage alignment
mask, but get_align_bits() can still return the AMD F15h per-boot offset,
e.g. 0x4000. That produces a non-hugepage-aligned hugetlb VMA.
Likely introduced by the v6.13-rc1 series:
1317a5e7f7b1 arch/x86: teach arch_get_unmapped_area_vmflags to handle hugetlb mappings
7bd3f1e1a9ae mm: make hugetlb mappings go through mm_get_unmapped_area_vmflags
cc92882ee218 mm: drop hugetlb_get_unmapped_area{_*} functions
AI suggests passing filp to get_align_bits and doing
if (filp && is_file_hugepages(filp))
return 0;
Best regards,
Karsten
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [REGRESSION] x86/hugetlb: AMD F15h VA alignment offset breaks MAP_HUGETLB alignment
2026-05-27 14:36 [REGRESSION] x86/hugetlb: AMD F15h VA alignment offset breaks MAP_HUGETLB alignment Karsten Desler
@ 2026-05-27 15:53 ` Oscar Salvador (SUSE)
2026-05-27 18:28 ` Oscar Salvador (SUSE)
0 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador (SUSE) @ 2026-05-27 15:53 UTC (permalink / raw)
To: Karsten Desler
Cc: linux-mm, linux-kernel, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin
On Wed, May 27, 2026 at 04:36:43PM +0200, Karsten Desler wrote:
> Hi,
>
> I found a reproducible hugetlb regression on an AMD Family 15h system.
>
> On some boots, mmap(MAP_HUGETLB) returns a virtual address that is not aligned
> to the hugepage size. The mapping is nevertheless installed as a hugetlb VMA.
> When the process exits, the kernel later BUGs in __unmap_hugepage_range().
>
> 6.18.33 x86_64, AMD Opteron 6238, 2M hugepages
Thanks Karsten for reporting this.
Ooops, that would be me.
> Example bad mapping captured from /proc/$pid/maps:
>
> 7fc67f604000-7fc67f804000 rw-p 00000000 00:0f 12340 /anon_hugepage (deleted)
>
> The address has offset 0x4000 within a 2 MiB hugepage.
>
> smaps confirms it is really hugetlb:
>
> KernelPageSize: 2048 kB
> MMUPageSize: 2048 kB
> Private_Hugetlb: 2048 kB
> VmFlags: rd wr mr mw me de ht
>
> Minimal reproducer:
>
> echo 1000 > /proc/sys/vm/nr_hugepages
>
> mmap(NULL, 1229824, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS|MAP_POPULATE|MAP_HUGETLB, -1, 0)
>
> On bad boots this returns e.g.:
>
> mmap returned 0x7fc67f604000 aligned=no offset=16384
>
> and exiting the process triggers:
>
> Kernel BUG at __unmap_hugepage_range+0x5ef/0x640
> RIP: __unmap_hugepage_range+0x5ef/0x640
> Fixing recursive fault but reboot is needed!
>
> The following is AI work, sorry if that's total BS but at the very least,
> I can reproduce the kernelBUG and booting with
> align_va_addr=off
> works around the issue.
>
> This is boot-dependent. Some boots work, some fail. The reason appears
> to be the per-boot AMD F15h VA alignment offset.
I have to confess that I completely overlooked that scenario, so let me
apologize.
> The old x86 hugetlb path in arch/x86/mm/hugetlbpage.c only set:
>
> info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>
> It did not add the AMD F15h align offset.
>
> After the v6.13-rc1 hugetlb mmap rework, hugetlb mappings go through
> arch_get_unmapped_area*(), and x86 currently does:
>
> if (filp) {
> info.align_mask = get_align_mask(filp);
> info.align_offset += get_align_bits();
> }
Ok, I see.
>
> For hugetlb, get_align_mask(filp) correctly returns the hugepage alignment
> mask, but get_align_bits() can still return the AMD F15h per-boot offset,
> e.g. 0x4000. That produces a non-hugepage-aligned hugetlb VMA.
>
> Likely introduced by the v6.13-rc1 series:
>
> 1317a5e7f7b1 arch/x86: teach arch_get_unmapped_area_vmflags to handle hugetlb mappings
> 7bd3f1e1a9ae mm: make hugetlb mappings go through mm_get_unmapped_area_vmflags
> cc92882ee218 mm: drop hugetlb_get_unmapped_area{_*} functions
Yes, that was part of a refactoring I did some time ago.
I will fix it up later today/early tomorrow.
Would you be available for a quick test once I have the patch?
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [REGRESSION] x86/hugetlb: AMD F15h VA alignment offset breaks MAP_HUGETLB alignment
2026-05-27 15:53 ` Oscar Salvador (SUSE)
@ 2026-05-27 18:28 ` Oscar Salvador (SUSE)
2026-05-27 20:39 ` Karsten Desler
2026-05-27 21:04 ` Dave Hansen
0 siblings, 2 replies; 10+ messages in thread
From: Oscar Salvador (SUSE) @ 2026-05-27 18:28 UTC (permalink / raw)
To: Karsten Desler
Cc: linux-mm, linux-kernel, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin
On Wed, May 27, 2026 at 05:53:06PM +0200, Oscar Salvador (SUSE) wrote:
> On Wed, May 27, 2026 at 04:36:43PM +0200, Karsten Desler wrote:
> >
> > For hugetlb, get_align_mask(filp) correctly returns the hugepage alignment
> > mask, but get_align_bits() can still return the AMD F15h per-boot offset,
> > e.g. 0x4000. That produces a non-hugepage-aligned hugetlb VMA.
> >
> > Likely introduced by the v6.13-rc1 series:
> >
> > 1317a5e7f7b1 arch/x86: teach arch_get_unmapped_area_vmflags to handle hugetlb mappings
> > 7bd3f1e1a9ae mm: make hugetlb mappings go through mm_get_unmapped_area_vmflags
> > cc92882ee218 mm: drop hugetlb_get_unmapped_area{_*} functions
>
> Yes, that was part of a refactoring I did some time ago.
>
> I will fix it up later today/early tomorrow.
>
> Would you be available for a quick test once I have the patch?
Maybe this? Can you give it a shot?
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 776ae6fa7f2d..60f876dce8e5 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -157,7 +157,12 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len,
}
if (filp) {
info.align_mask = get_align_mask(filp);
- info.align_offset += get_align_bits();
+ /*
+ * Hugepages must remain hugepage-aligned, so skip adding an offset
+ * in case we enabled 'align_va_addr'.
+ */
+ if (!is_file_hugepages(filp))
+ info.align_offset += get_align_bits();
}
return vm_unmapped_area(&info);
@@ -222,7 +227,12 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr0,
if (filp) {
info.align_mask = get_align_mask(filp);
- info.align_offset += get_align_bits();
+ /*
+ * Hugepages must remain hugepage-aligned, so skip adding an offset
+ * in case we enabled 'align_va_addr'.
+ */
+ if (!is_file_hugepages(filp))
+ info.align_offset += get_align_bits();
}
addr = vm_unmapped_area(&info);
if (!(addr & ~PAGE_MASK))
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [REGRESSION] x86/hugetlb: AMD F15h VA alignment offset breaks MAP_HUGETLB alignment
2026-05-27 18:28 ` Oscar Salvador (SUSE)
@ 2026-05-27 20:39 ` Karsten Desler
2026-05-27 21:04 ` Dave Hansen
1 sibling, 0 replies; 10+ messages in thread
From: Karsten Desler @ 2026-05-27 20:39 UTC (permalink / raw)
To: Oscar Salvador (SUSE)
Cc: linux-mm, linux-kernel, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin
* Oscar Salvador (SUSE) wrote:
> Maybe this? Can you give it a shot?
Thanks, applied and tested across 5 reboots.
All mmaps return properly aligned hugepages, don't cause kernelBUGs
on exit and show the expected munmap behavior.
mmap(NULL, 1229824, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_POPULATE|MAP_HUGETLB, -1, 0) = 0x7f7e49a00000
munmap(0x7f7e49a00000, 1229824) = -1 EINVAL (Invalid argument)
munmap(0x7f7e49a00000, 2097152) = 0
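The EINVAL on the first munmap() is the expected behavior: for a hugetlb
mapping the length passed to munmap() has to be a multiple of the hugepage
size. A small sketch of the rounding, where the helper name hugetlb_round_up()
is hypothetical and the hugepage size is assumed to be a power of two:

```c
#include <stddef.h>

/*
 * Round `len` up to the next multiple of `hpage_size`.
 * Assumption: hpage_size is a power of two (e.g. 2 MiB = 2097152).
 */
size_t hugetlb_round_up(size_t len, size_t hpage_size)
{
	return (len + hpage_size - 1) & ~(hpage_size - 1);
}
```

For the request above, hugetlb_round_up(1229824, 2097152) gives 2097152, which
is the length the second munmap() call uses.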
Best regards,
Karsten
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [REGRESSION] x86/hugetlb: AMD F15h VA alignment offset breaks MAP_HUGETLB alignment
2026-05-27 18:28 ` Oscar Salvador (SUSE)
2026-05-27 20:39 ` Karsten Desler
@ 2026-05-27 21:04 ` Dave Hansen
2026-05-28 5:45 ` Oscar Salvador (SUSE)
1 sibling, 1 reply; 10+ messages in thread
From: Dave Hansen @ 2026-05-27 21:04 UTC (permalink / raw)
To: Oscar Salvador (SUSE), Karsten Desler
Cc: linux-mm, linux-kernel, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin
On 5/27/26 11:28, Oscar Salvador (SUSE) wrote:
> if (filp) {
> info.align_mask = get_align_mask(filp);
> - info.align_offset += get_align_bits();
> + /*
> + * Hugepages must remain hugepage-aligned, so skip adding an offset
> + * in case we enabled 'align_va_addr'.
> + */
> + if (!is_file_hugepages(filp))
> + info.align_offset += get_align_bits();
> }
That's a good hack to show the scope of the problem.
But I'd really rather this be dealt with in the arch-independent code,
not by adding hugetlb hacks to arch code. It isn't even clear to me what
exactly goes wrong when you set a tiny ->align_offset and have a larger
->align_mask. Shouldn't the tiny offset just get masked off?
gap += (info->align_offset - gap) & info->align_mask;
I spent a whole five seconds looking at it, but something seems to be
missing from the problem description.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [REGRESSION] x86/hugetlb: AMD F15h VA alignment offset breaks MAP_HUGETLB alignment
2026-05-27 21:04 ` Dave Hansen
@ 2026-05-28 5:45 ` Oscar Salvador (SUSE)
2026-05-28 12:45 ` Oscar Salvador (SUSE)
0 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador (SUSE) @ 2026-05-28 5:45 UTC (permalink / raw)
To: Dave Hansen
Cc: Karsten Desler, linux-mm, linux-kernel, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin
On Wed, May 27, 2026 at 02:04:10PM -0700, Dave Hansen wrote:
> On 5/27/26 11:28, Oscar Salvador (SUSE) wrote:
> > if (filp) {
> > info.align_mask = get_align_mask(filp);
> > - info.align_offset += get_align_bits();
> > + /*
> > + * Hugepages must remain hugepage-aligned, so skip adding an offset
> > + * in case we enabled 'align_va_addr'.
> > + */
> > + if (!is_file_hugepages(filp))
> > + info.align_offset += get_align_bits();
> > }
>
> That's a good hack to show the scope of the problem.
Haha, do not worry, I myself have 0 interest in spreading hugetlb-specific
code around (on the contrary), but I wanted to prove the point.
>
> But I'd really rather this be dealt with in the arch-independent code,
> not by adding hugetlb hacks to arch code. It isn't even clear to me what
> exactly goes wrong when you set a tiny ->align_offset and have a larger
> ->align_mask. Shouldn't the tiny offset just get masked off?
>
> gap += (info->align_offset - gap) & info->align_mask;
I would assume so, but something is definitely going on with the calculation,
as I could reproduce this if I artificially set align_offset to 0x1000 for
hugetlb mappings.
I will find some time today to have a deep look.
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [REGRESSION] x86/hugetlb: AMD F15h VA alignment offset breaks MAP_HUGETLB alignment
2026-05-28 5:45 ` Oscar Salvador (SUSE)
@ 2026-05-28 12:45 ` Oscar Salvador (SUSE)
2026-05-28 14:03 ` Oscar Salvador (SUSE)
0 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador (SUSE) @ 2026-05-28 12:45 UTC (permalink / raw)
To: Dave Hansen
Cc: Karsten Desler, linux-mm, linux-kernel, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin
On Thu, May 28, 2026 at 07:45:24AM +0200, Oscar Salvador (SUSE) wrote:
> On Wed, May 27, 2026 at 02:04:10PM -0700, Dave Hansen wrote:
> > On 5/27/26 11:28, Oscar Salvador (SUSE) wrote:
> > > if (filp) {
> > > info.align_mask = get_align_mask(filp);
> > > - info.align_offset += get_align_bits();
> > > + /*
> > > + * Hugepages must remain hugepage-aligned, so skip adding an offset
> > > + * in case we enabled 'align_va_addr'.
> > > + */
> > > + if (!is_file_hugepages(filp))
> > > + info.align_offset += get_align_bits();
> > > }
> >
> > That's a good hack to show the scope of the problem.
>
> Haha, do not worry, I myself have 0 interest in spreading hugetlb-specific
> code around (on the contrary), but I wanted to prove the point.
>
> >
> > But I'd really rather this be dealt with in the arch-independent code,
> > not by adding hugetlb hacks to arch code. It isn't even clear to me what
> > exactly goes wrong when you set a tiny ->align_offset and have a larger
> > ->align_mask. Shouldn't the tiny offset just get masked off?
> >
> > gap += (info->align_offset - gap) & info->align_mask;
Ok, I finally got to it.
So, let us assume we ask for a 2MB hugetlb page.
~huge_page_mask = 0x1fffff
huge_page_mask_align = PAGE_MASK & ~huge_page_mask(hstate_file(file)) = 0x1ff000
unmapped_area_topdown()
info->length = 0x200000 (2MB)
info->align_mask = 0x1ff000
/* Adjust search length to account for worst case alignment overhead */
total_gap_length_requested = info->length + info->align_mask = 0x3ff000
We find a gap: 0x7f28cfb10000 - 0x7f28cfd10000 (2MB) and, assuming align_offset = 0:
gap -= (gap - info->align_offset) & info->align_mask
0x7f28cfb10000 -= 0x7f28cfb10000 & 0x1ff000 = 0x7f28cfa00000 (2MB aligned)
IIUC, we mask what we got with align_mask to know how much we need to subtract in
order to be properly aligned (and since we already accounted for extra length before,
we are sure we do not overstep anything below).
Now, I have to acknowledge that I had to look at the code several times, because
it was not clear to me why we were not just masking off 2MB, but I guess if we do we
lose whatever align_offset gives us (if smaller than align_mask).
I mean, code was already like that before my refactoring, just that back
then we did "align_offset = 0" unilaterally for hugetlb mappings.
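The arithmetic can be checked in userspace. The following is a minimal sketch,
assuming the two expressions quoted in this thread are the only adjustment
applied to the candidate address; the function names align_topdown() and
align_bottomup() are made up for illustration and are not kernel functions.
Both expressions clear the align_mask bits of (result - align_offset), so the
result ends up congruent to align_offset within the hugepage rather than
having the offset masked off.

```c
#include <stdint.h>

/*
 * Top-down adjustment, mirroring:
 *   gap -= (gap - info->align_offset) & info->align_mask;
 */
uint64_t align_topdown(uint64_t gap, uint64_t align_offset, uint64_t align_mask)
{
	gap -= (gap - align_offset) & align_mask;
	return gap;
}

/*
 * Bottom-up adjustment, mirroring:
 *   gap += (info->align_offset - gap) & info->align_mask;
 */
uint64_t align_bottomup(uint64_t gap, uint64_t align_offset, uint64_t align_mask)
{
	gap += (align_offset - gap) & align_mask;
	return gap;
}
```

With the gap from the example, align_topdown(0x7f28cfb10000, 0, 0x1ff000)
gives 0x7f28cfa00000, which is 2MB aligned, while
align_topdown(0x7f28cfb10000, 0x4000, 0x1ff000) gives 0x7f28cfa04000, which
sits 0x4000 into a hugepage, the same kind of offset seen in the report. The
bottom-up variant behaves the same way: with align_offset = 0x4000 the low 21
bits of the result are again 0x4000.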
The only thing bugging me is, should not the same happen for THP-file-backed mappings?
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [REGRESSION] x86/hugetlb: AMD F15h VA alignment offset breaks MAP_HUGETLB alignment
2026-05-28 12:45 ` Oscar Salvador (SUSE)
@ 2026-05-28 14:03 ` Oscar Salvador (SUSE)
2026-05-28 15:31 ` Borislav Petkov
0 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador (SUSE) @ 2026-05-28 14:03 UTC (permalink / raw)
To: Dave Hansen
Cc: Karsten Desler, linux-mm, linux-kernel, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin
On Thu, May 28, 2026 at 02:45:01PM +0200, Oscar Salvador (SUSE) wrote:
> The only thing bugging me is, should not the same happen for THP-file-backed mappings?
Aha, no, for THP we do not set align_mask (at least on x86), and the masking off is being
done in __thp_get_unmapped_area().
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [REGRESSION] x86/hugetlb: AMD F15h VA alignment offset breaks MAP_HUGETLB alignment
2026-05-28 14:03 ` Oscar Salvador (SUSE)
@ 2026-05-28 15:31 ` Borislav Petkov
2026-05-28 18:29 ` Oscar Salvador (SUSE)
0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2026-05-28 15:31 UTC (permalink / raw)
To: Oscar Salvador (SUSE)
Cc: Dave Hansen, Karsten Desler, linux-mm, linux-kernel,
Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin
On Thu, May 28, 2026 at 04:03:12PM +0200, Oscar Salvador (SUSE) wrote:
> On Thu, May 28, 2026 at 02:45:01PM +0200, Oscar Salvador (SUSE) wrote:
> > The only thing bugging me is, should not the same happen for THP-file-backed mappings?
>
> Aha, no, for THP we do not set align_mask (at least on x86), and the masking off is being
> done in __thp_get_unmapped_area().
I hope you found this in the process:
dfb09f9b7ab0 ("x86, amd: Avoid cache aliasing penalties on AMD family 15h")
It is from a different century tho and perhaps not too relevant but I thought
I should mention it...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [REGRESSION] x86/hugetlb: AMD F15h VA alignment offset breaks MAP_HUGETLB alignment
2026-05-28 15:31 ` Borislav Petkov
@ 2026-05-28 18:29 ` Oscar Salvador (SUSE)
0 siblings, 0 replies; 10+ messages in thread
From: Oscar Salvador (SUSE) @ 2026-05-28 18:29 UTC (permalink / raw)
To: Borislav Petkov
Cc: Dave Hansen, Karsten Desler, linux-mm, linux-kernel,
Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin
On Thu, May 28, 2026 at 08:31:18AM -0700, Borislav Petkov wrote:
> On Thu, May 28, 2026 at 04:03:12PM +0200, Oscar Salvador (SUSE) wrote:
> > On Thu, May 28, 2026 at 02:45:01PM +0200, Oscar Salvador (SUSE) wrote:
> > > The only thing bugging me is, should not the same happen for THP-file-backed mappings?
> >
> > Aha, no, for THP we do not set align_mask (at least on x86), and the masking off is being
> > done in __thp_get_unmapped_area().
>
> I hope you found this in the process:
>
> dfb09f9b7ab0 ("x86, amd: Avoid cache aliasing penalties on AMD family 15h")
Ei Boris, thanks for pointing it out.
Actually, the other arches that have their own get_align_mask()
for setting the mask (s390 and sparc) both skip info.align_offset
if we are dealing with hugetlb, e.g. s390:
info.align_mask = get_align_mask(filp, flags);
if (!(filp && is_file_hugepages(filp)))
info.align_offset = pgoff << PAGE_SHIFT;
So, maybe for the time being we can do the same in x86 in order to fix the
regression (although the refactoring is 2 years old and the first time we heard
about it was yesterday), and then we can think of a nicer way to handle this
in non-arch code so s390 and sparc would get cleaned up as well.
Thoughts?
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 10+ messages in thread