* [PATCH] arch,x86: Skip setting align_offset for hugetlb mappings
@ 2026-06-01 12:50 Oscar Salvador
2026-06-01 20:02 ` Andrew Morton
2026-06-01 20:25 ` Dave Hansen
0 siblings, 2 replies; 7+ messages in thread
From: Oscar Salvador @ 2026-06-01 12:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Dave Hansen, Borislav Petkov, Karsten Desler, linux-kernel,
linux-mm, Oscar Salvador
On x86, arch_get_unmapped_area{_topdown} set align_offset in order to avoid
cache aliasing on I$ on AMD family 15h when 'align_va_addr' is enabled.
Prior to commit 7bd3f1e1a9ae ("mm: make hugetlb mappings go through mm_get_unmapped_area_vmflags"),
we did not have to worry about that because hugetlb specific code did not set
align_offset, but above commit got rid of hugetlb specific code and started to route
hugetlb mappings through the generic interface.
Doing that has the effect of handing non-aligned hugetlb mappings to userspace,
which is plain wrong.
So, skip setting align_offset if we are dealing with a hugetlb mapping.
Fixes: 7bd3f1e1a9ae ("mm: make hugetlb mappings go through mm_get_unmapped_area_vmflags")
Reported-by: Karsten Desler <kdesler@soohrt.org>
Closes: https://lore.kernel.org/linux-mm/20260527143643.GO31091@soohrt.org/
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
So, let me say two things:
1) Karsten tested the patch below and reported that it works fine for him.
I did not add his Tested-by though, because it was not explicitly provided.
2) This is a hack, I know, and I should probably be flagellated for this but
since this is a regression, I went for the quick/easy-to-apply fix, so it can
be easily backported.
Having said that, I have already made up my mind to fix this in a better way, which
would involve getting rid of the hugetlb-specific code and doing the masking off as we
do for THP, but for that I need to refactor the code, and that would not be so easy
to backport. Just so you understand the reasoning behind it.
---
arch/x86/kernel/sys_x86_64.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 776ae6fa7f2d..60f876dce8e5 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -157,7 +157,12 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len,
}
if (filp) {
info.align_mask = get_align_mask(filp);
- info.align_offset += get_align_bits();
+ /*
+ * Hugepages must remain hugepage-aligned, so skip adding an offset
+ * in case we enabled 'align_va_addr'.
+ */
+ if (!is_file_hugepages(filp))
+ info.align_offset += get_align_bits();
}
return vm_unmapped_area(&info);
@@ -222,7 +227,12 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr0,
if (filp) {
info.align_mask = get_align_mask(filp);
- info.align_offset += get_align_bits();
+ /*
+ * Hugepages must remain hugepage-aligned, so skip adding an offset
+ * in case we enabled 'align_va_addr'.
+ */
+ if (!is_file_hugepages(filp))
+ info.align_offset += get_align_bits();
}
addr = vm_unmapped_area(&info);
if (!(addr & ~PAGE_MASK))
--
2.35.3
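The alignment problem described in the changelog above can be illustrated with a small userspace model. This is only a sketch: `apply_alignment()` and `is_huge_aligned()` are hypothetical helpers, the formula is an assumption modelled on the way the generic gap search applies `align_mask` and `align_offset`, and the 2 MiB huge page size and the `0x5000` offset are example values, not values taken from the report.

```c
#include <assert.h>

#define HPAGE_SIZE (2UL << 20) /* assumption: 2 MiB huge pages */

/*
 * Simplified model of the alignment step: move gap_start up to the
 * next address that equals align_offset modulo (align_mask + 1).
 */
static unsigned long apply_alignment(unsigned long gap_start,
				     unsigned long align_mask,
				     unsigned long align_offset)
{
	/* align_mask must be of the form 2^n - 1 */
	assert((align_mask & (align_mask + 1)) == 0);
	return gap_start + ((align_offset - gap_start) & align_mask);
}

/* Returns 1 if addr sits on a huge page boundary, 0 otherwise. */
static int is_huge_aligned(unsigned long addr)
{
	return (addr & (HPAGE_SIZE - 1)) == 0;
}
```

In this model, a zero `align_offset` rounds the candidate address up to a huge page boundary, while any non-zero page-granular offset moves the result off that boundary. That is the kind of misalignment the patch avoids by skipping the offset for hugetlb files.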
* Re: [PATCH] arch,x86: Skip setting align_offset for hugetlb mappings
2026-06-01 12:50 [PATCH] arch,x86: Skip setting align_offset for hugetlb mappings Oscar Salvador
@ 2026-06-01 20:02 ` Andrew Morton
2026-06-01 20:25 ` Dave Hansen
1 sibling, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2026-06-01 20:02 UTC (permalink / raw)
To: Oscar Salvador
Cc: Dave Hansen, Borislav Petkov, Karsten Desler, linux-kernel,
linux-mm
On Mon, 1 Jun 2026 14:50:15 +0200 Oscar Salvador <osalvador@suse.de> wrote:
> On x86, arch_get_unmapped_area{_topdown} set align_offset in order to avoid
> cache aliasing on I$ on AMD family 15h when 'align_va_addr' is enabled.
> Prior to commit 7bd3f1e1a9ae ("mm: make hugetlb mappings go through mm_get_unmapped_area_vmflags"),
> we did not have to worry about that because hugetlb specific code did not set
> align_offset, but above commit got rid of hugetlb specific code and started to route
> hugetlb mappings through the generic interface.
> Doing that has the effect of handing non-aligned hugetlb mappings to userspace,
> which is plain wrong.
>
> So, skip setting align_offset if we are dealing with a hugetlb mapping.
Cool.
> Fixes: 7bd3f1e1a9ae ("mm: make hugetlb mappings go through mm_get_unmapped_area_vmflags")
> Reported-by: Karsten Desler <kdesler@soohrt.org>
> Closes: https://lore.kernel.org/linux-mm/20260527143643.GO31091@soohrt.org/
Ah, there it is. "the kernel later BUGs". We'd prefer it not do that so I'll cc:stable.
Thanks for including the link. And thanks Karsten for reporting and
testing.
> So, let me say two things:
> 1) Karsten tested the patch below and reported that it works fine for him.
> I did not add his Tested-by though, because it was not explicitly provided.
This often happens - testers (well, confirmers) don't formally add the
tag. Often these people aren't regular kernel developers and simply
don't know the conventions. But if I see that email, I'll add the
Tested-by: because that's what happened.
> 2) This is a hack, I know, and I should probably be flagellated for this but
> since this is a regression, I went for the quick/easy-to-apply fix, so it can
> be easily backported.
That works (the patch, not the flagellation).
> Having said that, I have already made up my mind to fix this in a better way, which
> would involve getting rid of the hugetlb-specific code and doing the masking off as we
> do for THP, but for that I need to refactor the code, and that would not be so easy
> to backport. Just so you understand the reasoning behind it.
Great, thanks, flagellation is excused.
* Re: [PATCH] arch,x86: Skip setting align_offset for hugetlb mappings
2026-06-01 12:50 [PATCH] arch,x86: Skip setting align_offset for hugetlb mappings Oscar Salvador
2026-06-01 20:02 ` Andrew Morton
@ 2026-06-01 20:25 ` Dave Hansen
2026-06-02 5:02 ` Oscar Salvador (SUSE)
2026-06-04 14:51 ` Oscar Salvador (SUSE)
1 sibling, 2 replies; 7+ messages in thread
From: Dave Hansen @ 2026-06-01 20:25 UTC (permalink / raw)
To: Oscar Salvador, Andrew Morton
Cc: Dave Hansen, Borislav Petkov, Karsten Desler, linux-kernel,
linux-mm
On 6/1/26 05:50, Oscar Salvador wrote:
> On x86, arch_get_unmapped_area{_topdown} set align_offset in order to avoid
> cache aliasing on I$ on AMD family 15h when 'align_va_addr' is enabled.
I'm sorry, but this is a big NAK from the x86 side, without at _least_ a
substantially improved changelog and evidence that x86 is both the only
place to reasonably fix this _and_ evidence that no other arch is affected.
Why?
x86 is not the only architecture that uses .align_offset.
The bug was introduced in a generic, non-x86 commit (7bd3f1e1a9ae).
The real fix is almost surely in arch-generic code.
So my fear is that we'll apply the x86 thing. Four other architectures
will come along and have to add their own fixes. Then (if we're lucky)
someone will stick around and fix it properly, then all the
architectures will have to remove their hacks.
Worst-case, we end up with a latent bug, just without any testing
coverage because x86 testing is the most broad.
Oscar, even if the real fix involves a couple of patches, can we please
see that, first? Then, we can go back and make an informed decision
about hacks versus proper fixes.
Second, if there's a hack to be had, it should probably be to back out
the AMD optimization. That's absolutely an x86-only thing.
Third, I have zero issues with distros adding and carrying the x86
align_offset hack. That's what kernel distros are *for*.
* Re: [PATCH] arch,x86: Skip setting align_offset for hugetlb mappings
2026-06-01 20:25 ` Dave Hansen
@ 2026-06-02 5:02 ` Oscar Salvador (SUSE)
2026-06-02 13:26 ` Oscar Salvador (SUSE)
2026-06-04 14:51 ` Oscar Salvador (SUSE)
1 sibling, 1 reply; 7+ messages in thread
From: Oscar Salvador (SUSE) @ 2026-06-02 5:02 UTC (permalink / raw)
To: Dave Hansen
Cc: Oscar Salvador, Andrew Morton, Dave Hansen, Borislav Petkov,
Karsten Desler, linux-kernel, linux-mm
On Mon, Jun 01, 2026 at 01:25:12PM -0700, Dave Hansen wrote:
> On 6/1/26 05:50, Oscar Salvador wrote:
> > On x86, arch_get_unmapped_area{_topdown} set align_offset in order to avoid
> > cache aliasing on I$ on AMD family 15h when 'align_va_addr' is enabled.
>
> I'm sorry, but this is a big NAK from the x86 side, without at _least_ a
> substantially improved changelog and evidence that x86 is both the only
> place to reasonably fix this _and_ evidence that no other arch is affected.
>
> Why?
>
> x86 is not the only architecture that uses .align_offset.
Yes, although some other arches, like sparc and s390, already skip setting it
for hugetlb mappings.
> The bug was introduced in a generic, non-x86 commit (7bd3f1e1a9ae)
>
> The real fix is almost surely in arch-generic code.
>
> So my fear is that we'll apply the x86 thing. Four other architectures
> will come along and have to add their own fixes. Then (if we're lucky)
> someone will stick around and fix it properly, then all the
> architectures will have to remove their hacks.
Yes, I am not happy with the current code either, given that some arches have
to do this dance of skip-offset-if-hugetlb, and it feels _quite_ wrong to do
it at arch level, given that only hugetlb needs that and given that
hugetlb is not the only one needing aligned things (thp?).
> Worst-case, we end up with a latent bug, just without any testing
> coverage because x86 testing is the most broad.
>
> Oscar, even if the real fix involves a couple of patches, can we please
> see that, first? Then, we can go back and make an informed decision
> about hacks versus proper fixes.
Sure, as I already said I was not happy with the fix either, was just an
attempt to address the regression quicker.
Having said that, and proven that you are not happy either, let me take
the long(er) road and come up with a proper fix for everyone, so no more
hacks on arch level are needed.
--
Oscar Salvador
SUSE Labs
* Re: [PATCH] arch,x86: Skip setting align_offset for hugetlb mappings
2026-06-02 5:02 ` Oscar Salvador (SUSE)
@ 2026-06-02 13:26 ` Oscar Salvador (SUSE)
0 siblings, 0 replies; 7+ messages in thread
From: Oscar Salvador (SUSE) @ 2026-06-02 13:26 UTC (permalink / raw)
To: Dave Hansen
Cc: Oscar Salvador, Andrew Morton, Dave Hansen, Borislav Petkov,
Karsten Desler, linux-kernel, linux-mm
On Tue, Jun 02, 2026 at 07:02:22AM +0200, Oscar Salvador (SUSE) wrote:
> Sure, as I already said I was not happy with the fix either, was just an
> attempt to address the regression quicker.
> Having said that, and proven that you are not happy either, let me take
> the long(er) road and come up with a proper fix for everyone, so no more
> hacks on arch level are needed.
Alright, I do have something in the works already; I just need to test
it more carefully and will then post it.
--
Oscar Salvador
SUSE Labs
* Re: [PATCH] arch,x86: Skip setting align_offset for hugetlb mappings
2026-06-01 20:25 ` Dave Hansen
2026-06-02 5:02 ` Oscar Salvador (SUSE)
@ 2026-06-04 14:51 ` Oscar Salvador (SUSE)
2026-06-04 20:38 ` Dave Hansen
1 sibling, 1 reply; 7+ messages in thread
From: Oscar Salvador (SUSE) @ 2026-06-04 14:51 UTC (permalink / raw)
To: Dave Hansen
Cc: Oscar Salvador, Andrew Morton, Dave Hansen, Borislav Petkov,
Karsten Desler, linux-kernel, linux-mm
On Mon, Jun 01, 2026 at 01:25:12PM -0700, Dave Hansen wrote:
> Oscar, even if the real fix involves a couple of patches, can we please
> see that, first? Then, we can go back and make an informed decision
> about hacks versus proper fixes.
Fix sits here [1].
It is quite straightforward, is arch-agnostic, and I can confirm it fixes
the AMD regression.
It is part of an 8-patch series [2] (#1 is the fix and the follow-ups are
cleanups).
I shall send it once I confirm that some more tests pass.
[1] https://github.com/leberus/linux/commit/fd6ac9d2af4d0acd27c6b2a840a6d9e6ad4bc4a6
[2] https://github.com/leberus/linux/ hugetlb-refactor-mask-align
--
Oscar Salvador
SUSE Labs
* Re: [PATCH] arch,x86: Skip setting align_offset for hugetlb mappings
2026-06-04 14:51 ` Oscar Salvador (SUSE)
@ 2026-06-04 20:38 ` Dave Hansen
0 siblings, 0 replies; 7+ messages in thread
From: Dave Hansen @ 2026-06-04 20:38 UTC (permalink / raw)
To: Oscar Salvador (SUSE)
Cc: Oscar Salvador, Andrew Morton, Dave Hansen, Borislav Petkov,
Karsten Desler, linux-kernel, linux-mm
On 6/4/26 07:51, Oscar Salvador (SUSE) wrote:
> On Mon, Jun 01, 2026 at 01:25:12PM -0700, Dave Hansen wrote:
>> Oscar, even if the real fix involves a couple of patches, can we please
>> see that, first? Then, we can go back and make an informed decision
>> about hacks versus proper fixes.
> Fix sits here [1].
> It is quite straightforward, is arch-agnostic, and I can confirm it fixes
> the AMD regression.
> It is part of an 8-patch series [2] (#1 is the fix and the follow-ups are
> cleanups).
So it does appear that the x86 workaround wasn't the best direction.
That patch looks fine. I find it a bit hard to parse. I'd probably say
something more along the lines of:
/*
* hugetlb mappings need to be huge page aligned.
* mm_get_unmapped_area_vmflags() only gives back PAGE_SIZE
* aligned areas.
*
* Extend the requested unmapped area for the worst case
* scenario: the address comes back and needs to be aligned
* up by one small page short of a large page.
*/
len += huge_page_size(h) - PAGE_SIZE;
Also, wouldn't it be clearer to just talk about the length in terms
of page sizes, not masks?
Also, do you even need this if()?
if (addr0 & ~huge_page_mask(h))
So the code looks like this:
pgoff = 0;
addr0 = mm_get_unmapped_area_vmflags(file, addr0,...
/* Align the address to the next huge_page_size boundary */
addr0 = ALIGN(addr0, huge_page_size(h));
If it's aligned, it won't do anything.
But I'm 100% ok with the approach. I *believe* the code you have is OK,
although I'm not feeling super confident that I've read the mask
calculations correctly.
So while I'd like some improvements, I'm OK with this:
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
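The over-allocate-then-align approach suggested in the message above can be modelled in userspace. This is only a sketch, not the code from the linked series: `padded_len()`, `place_huge()` and `ALIGN_UP()` are hypothetical names (`ALIGN_UP()` stands in for the kernel's `ALIGN()`), and the page sizes are example values.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Round x up to a multiple of a, where a is a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/*
 * Length to request from a PAGE_SIZE-granular allocator so that a
 * huge-page-aligned region of len bytes always fits inside the
 * result. Worst case: the returned start is one small page past a
 * huge page boundary.
 */
static unsigned long padded_len(unsigned long len, unsigned long hsize)
{
	return len + hsize - PAGE_SIZE;
}

/*
 * Align the PAGE_SIZE-aligned start of the returned area up to the
 * next huge page boundary. An already aligned start is unchanged,
 * so no conditional is needed.
 */
static unsigned long place_huge(unsigned long raw, unsigned long hsize)
{
	assert((raw & (PAGE_SIZE - 1)) == 0);
	return ALIGN_UP(raw, hsize);
}
```

With a 2 MiB huge page, a start address one small page past a boundary is moved up by `hsize - PAGE_SIZE`, which is exactly the padding added by `padded_len()`, so the aligned region still ends inside the area that was handed back.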