* [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
@ 2025-05-28 17:51 Jann Horn
2025-06-03 3:41 ` Andrew Morton
2025-06-17 8:12 ` kernel test robot
0 siblings, 2 replies; 10+ messages in thread
From: Jann Horn @ 2025-05-28 17:51 UTC (permalink / raw)
To: Muchun Song, Oscar Salvador, linux-mm, Andrew Morton
Cc: Lorenzo Stoakes, Jann Horn
Many distro kernels enable hugetlb support, but most systems running
those kernels never actually allocate hugepages or enable hugetlb
overcommit.
On such systems, hugetlb is unusable for any legitimate usecase, but it
is still possible to exercise a lot of hugetlb-specific code by creating
MAP_HUGETLB|MAP_NORESERVE VMAs - for example, it is still possible to
create page tables shared across processes.
This is exposed through the mmap() syscall, with no privileges required,
so from a security perspective, this is interesting attack surface.
Lock it down by completely denying creation of hugetlb files if no huge
pages for the hstate could be allocated without administratively
changing huge page limits.
hstate_is_enabled() is written based on documentation in
Documentation/admin-guide/sysctl/vm.rst and
Documentation/admin-guide/mm/hugetlbpage.rst , in particular this:
> nr_overcommit_hugepages
> =======================
>
> Change the maximum size of the hugepage pool. The maximum is
> nr_hugepages + nr_overcommit_hugepages.
and this:
> As long as this condition holds--that is, until
> ``nr_hugepages+nr_overcommit_hugepages`` is increased sufficiently, or
> the surplus huge pages go out of use and are freed-- no more surplus
> huge pages will be allowed to be allocated.
Note that, in the userspace API:
- `h->nr_overcommit_huge_pages` is called "nr_overcommit_hugepages"
- `h->max_huge_pages` is called "nr_hugepages"
I am not explicitly marking this for stable backport yet at this point,
but I will want to backport this once it's landed in a point release and
nobody's complained for a while.
Signed-off-by: Jann Horn <jannh@google.com>
---
@akpm: no rush with this one; probably makes sense to wait for an ack
from a hugetlb person before queueing it up, and then send it through
mm-unstable like a feature patch.
@Lorenzo: I'm just CCing you as an FYI in case you're interested, it
doesn't touch any code outside hugetlb
---
fs/hugetlbfs/inode.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index e4de5425838d..fc03dd541b4d 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1517,6 +1517,16 @@ static int get_hstate_idx(int page_size_log)
return hstate_index(h);
}
+static bool hstate_is_enabled(struct hstate *h)
+{
+ bool is_enabled;
+
+ spin_lock_irq(&hugetlb_lock);
+ is_enabled = h->nr_overcommit_huge_pages || h->max_huge_pages;
+ spin_unlock_irq(&hugetlb_lock);
+ return is_enabled;
+}
+
/*
* Note that size should be aligned to proper hugepage size in caller side,
* otherwise hugetlb_reserve_pages reserves one less hugepages than intended.
@@ -1549,6 +1559,15 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
return ERR_PTR(-EPERM);
}
+ /*
+ * If no hugetlb pages of this size are supposed to exist, then don't
+ * even allow creating a hugetlb file (even if the file has size 0 or
+ * userspace requests MAP_NORESERVE).
+ * This limits attack surface for systems that don't use hugetlb.
+ */
+ if (!hstate_is_enabled(HUGETLBFS_SB(mnt->mnt_sb)->hstate))
+ return ERR_PTR(-ENOMEM);
+
file = ERR_PTR(-ENOSPC);
/* hugetlbfs_vfsmount[] mounts do not use idmapped mounts. */
inode = hugetlbfs_get_inode(mnt->mnt_sb, &nop_mnt_idmap, NULL,
---
base-commit: b1456f6dc167f7f101746e495bede2bac3d0e19f
change-id: 20250524-hugetlb-nerf-cc125f7fc187
--
Jann Horn <jannh@google.com>
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
2025-05-28 17:51 [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up Jann Horn
@ 2025-06-03 3:41 ` Andrew Morton
2025-06-03 4:29 ` Jann Horn
2025-06-17 8:12 ` kernel test robot
1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2025-06-03 3:41 UTC (permalink / raw)
To: Jann Horn; +Cc: Muchun Song, Oscar Salvador, linux-mm, Lorenzo Stoakes
On Wed, 28 May 2025 19:51:29 +0200 Jann Horn <jannh@google.com> wrote:
> Many distro kernels enable hugetlb support, but most systems running
> those kernels never actually allocate hugepages or enable hugetlb
> overcommit.
>
> On such systems, hugetlb is unusable for any legitimate usecase, but it
> is still possible to exercise a lot of hugetlb-specific code by creating
> MAP_HUGETLB|MAP_NORESERVE VMAs - for example, it is still possible to
> create page tables shared across processes.
>
> This is exposed through the mmap() syscall, with no privileges required,
> so from a security perspective, this is interesting attack surface.
>
> Lock it down by completely denying creation of hugetlb files if no huge
> pages for the hstate could be allocated without administratively
> changing huge page limits.
So this is a non-backward-compatible change?
If any userspace is affected it's probably either stupid or evil, but I
do wonder if there are legit cases for doing this, such as "I don't
know if there are any hugepages configured, but I'll try this anyway
and figure out what to do later on". And maybe there are other legit
cases!
Clearly any such cases will be obscure but please let's consider the
worst-case effects upon existing userspace?
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -1517,6 +1517,16 @@ static int get_hstate_idx(int page_size_log)
> return hstate_index(h);
> }
>
> +static bool hstate_is_enabled(struct hstate *h)
> +{
> + bool is_enabled;
> +
> + spin_lock_irq(&hugetlb_lock);
> + is_enabled = h->nr_overcommit_huge_pages || h->max_huge_pages;
> + spin_unlock_irq(&hugetlb_lock);
> + return is_enabled;
> +}
> +
> /*
> * Note that size should be aligned to proper hugepage size in caller side,
> * otherwise hugetlb_reserve_pages reserves one less hugepages than intended.
> @@ -1549,6 +1559,15 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
> return ERR_PTR(-EPERM);
> }
>
> + /*
> + * If no hugetlb pages of this size are supposed to exist, then don't
> + * even allow creating a hugetlb file (even if the file has size 0 or
> + * userspace requests MAP_NORESERVE).
> + * This limits attack surface for systems that don't use hugetlb.
> + */
> + if (!hstate_is_enabled(HUGETLBFS_SB(mnt->mnt_sb)->hstate))
> + return ERR_PTR(-ENOMEM);
> +
> file = ERR_PTR(-ENOSPC);
> /* hugetlbfs_vfsmount[] mounts do not use idmapped mounts. */
> inode = hugetlbfs_get_inode(mnt->mnt_sb, &nop_mnt_idmap, NULL,
Yes it's hard to imagine that this will cause damage...
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
2025-06-03 3:41 ` Andrew Morton
@ 2025-06-03 4:29 ` Jann Horn
2025-06-03 5:43 ` Oscar Salvador
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Jann Horn @ 2025-06-03 4:29 UTC (permalink / raw)
To: Andrew Morton; +Cc: Muchun Song, Oscar Salvador, linux-mm, Lorenzo Stoakes
On Tue, Jun 3, 2025 at 5:41 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 28 May 2025 19:51:29 +0200 Jann Horn <jannh@google.com> wrote:
> > Many distro kernels enable hugetlb support, but most systems running
> > those kernels never actually allocate hugepages or enable hugetlb
> > overcommit.
> >
> > On such systems, hugetlb is unusable for any legitimate usecase, but it
> > is still possible to exercise a lot of hugetlb-specific code by creating
> > MAP_HUGETLB|MAP_NORESERVE VMAs - for example, it is still possible to
> > create page tables shared across processes.
> >
> > This is exposed through the mmap() syscall, with no privileges required,
> > so from a security perspective, this is interesting attack surface.
> >
> > Lock it down by completely denying creation of hugetlb files if no huge
> > pages for the hstate could be allocated without administratively
> > changing huge page limits.
>
> So this is a non-backward-compatible change?
Yes, this change changes kernel behavior that is userspace-visible,
and causes syscalls to return errors where they worked before.
> If any userspace is affected it's probably either stupid or evil, but I
> do wonder if there are legit cases for doing this, such as "I don't
> know if there are any hugepages configured, but I'll try this anyway
> and figure out what to do later on". And maybe there are other legit
> cases!
Right. I think an affected case would be if userspace tries to detect
whether the kernel supports hugepages by creating a MAP_NORESERVE
mapping or huge memfd, and if that works, twiddles sysfs knobs to
actually allocate hugepages or shows a specific error message. Such a
program might end up wrongly assuming that the kernel does not support
hugepages. My understanding is that hugepages are normally
administratively configured so that they can be allocated early during
boot without having to worry about RAM fragmentation, in which case
this probably wouldn't happen, but it's not like I actually have a
good understanding of how typical hugetlb users work.
Another affected case would be if userspace confirms that the kernel
supports hugetlb through sysfs or such, then creates a MAP_NORESERVE
hugetlb and asserts that this must work because MAP_NORESERVE more or
less can't fail, and crashes with an assertion failure or such.
My understanding is that the combination of MAP_HUGETLB and
MAP_NORESERVE is somewhat rare in the first place; searching debian
codesearch for both flags on the same line, I basically only get one
hit in the "gridtools" package, though there might well be other cases
where the flags are set on separate lines. memfd_create(MFD_HUGETLB)
seems to be more common.
But yeah, I can't rule out that this would break something, and I sort
of hope that the hugetlb maintainers might have some idea how likely
such a scenario would be. If we think that there's a realistic chance
of breaking something with this, we shouldn't do this and I could try
to cook up a more limited patch that maybe only gates more specific
parts of hugetlb on this check in a less user-visible way (perhaps
bailing out earlier on hugetlb page faults); but I think that would
also reduce the utility of the patch somewhat.
I did think about whether this is the kind of borderline-breaking
change that should include a pr_warn_once() to inform the user that
their system encountered a specific behavioral difference due to a
kernel change, in case it does unexpectedly break something; I decided
against it, but if someone thinks this is sufficiently close to a
breaking change to warrant that, I'll add that.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
2025-06-03 4:29 ` Jann Horn
@ 2025-06-03 5:43 ` Oscar Salvador
2025-06-03 19:14 ` Jann Horn
2025-06-04 2:54 ` Andrew Morton
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador @ 2025-06-03 5:43 UTC (permalink / raw)
To: Jann Horn; +Cc: Andrew Morton, Muchun Song, linux-mm, Lorenzo Stoakes
On Tue, Jun 03, 2025 at 06:29:24AM +0200, Jann Horn wrote:
> Yes, this change changes kernel behavior that is userspace-visible,
> and causes syscalls to return errors where they worked before.
Yes, that is what make me unease about this.
It is true that most of the hugetlb cases out there work on
pre-allocated pages, because the later it gets the harder to get large
pages from the system.
But as you say below, there might be applications out there that tweak
the sys knobs themselves, and with this change those might break.
Now, how valid are those? Heh, hard to anwser.
So I guess it boils down to how hard and effective is to actually exploit
whatever we manage to create by allowing this.
But if we take that route, I think that hinting the user about this behaviour
change is the right thing to do.
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
2025-06-03 5:43 ` Oscar Salvador
@ 2025-06-03 19:14 ` Jann Horn
0 siblings, 0 replies; 10+ messages in thread
From: Jann Horn @ 2025-06-03 19:14 UTC (permalink / raw)
To: Oscar Salvador; +Cc: Andrew Morton, Muchun Song, linux-mm, Lorenzo Stoakes
On Tue, Jun 3, 2025 at 7:43 AM Oscar Salvador <osalvador@suse.de> wrote:
> On Tue, Jun 03, 2025 at 06:29:24AM +0200, Jann Horn wrote:
> > Yes, this change changes kernel behavior that is userspace-visible,
> > and causes syscalls to return errors where they worked before.
>
> Yes, that is what make me unease about this.
>
> It is true that most of the hugetlb cases out there work on
> pre-allocated pages, because the later it gets the harder to get large
> pages from the system.
>
> But as you say below, there might be applications out there that tweak
> the sys knobs themselves, and with this change those might break.
> Now, how valid are those? Heh, hard to anwser.
>
> So I guess it boils down to how hard and effective is to actually exploit
> whatever we manage to create by allowing this.
> But if we take that route, I think that hinting the user about this behaviour
> change is the right thing to do.
I guess the unusual part of hugetlb are really mostly the shared page
tables, which are also the reason why walks of hugetlb page tables
require extra locks. I guess I'll think about this some more and maybe
send a slightly different take on this that only prevents page fault
handling from getting far enough to establish shared PMDs, or
something like that.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
2025-06-03 4:29 ` Jann Horn
2025-06-03 5:43 ` Oscar Salvador
@ 2025-06-04 2:54 ` Andrew Morton
2025-06-16 22:09 ` Mark Brown
2025-06-17 9:13 ` David Hildenbrand
3 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2025-06-04 2:54 UTC (permalink / raw)
To: Jann Horn; +Cc: Muchun Song, Oscar Salvador, linux-mm, Lorenzo Stoakes
On Tue, 3 Jun 2025 06:29:24 +0200 Jann Horn <jannh@google.com> wrote:
> I did think about whether this is the kind of borderline-breaking
> change that should include a pr_warn_once() to inform the user that
> their system encountered a specific behavioral difference due to a
> kernel change, in case it does unexpectedly break something; I decided
> against it, but if someone thinks this is sufficiently close to a
> breaking change to warrant that, I'll add that.
We could not make the change at this time. Instead we emit a loud
"don't do this, we're taking it away" warning and see if anyone reports
it.
Or we could merge this change as-is, along with a warning to get
people's attention then run that in -next for a while.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
2025-06-03 4:29 ` Jann Horn
2025-06-03 5:43 ` Oscar Salvador
2025-06-04 2:54 ` Andrew Morton
@ 2025-06-16 22:09 ` Mark Brown
2025-06-17 9:13 ` David Hildenbrand
3 siblings, 0 replies; 10+ messages in thread
From: Mark Brown @ 2025-06-16 22:09 UTC (permalink / raw)
To: Jann Horn
Cc: Andrew Morton, Muchun Song, Oscar Salvador, linux-mm,
Lorenzo Stoakes, Aishwarya.TCV, Naresh Kamboju
[-- Attachment #1: Type: text/plain, Size: 5103 bytes --]
On Tue, Jun 03, 2025 at 06:29:24AM +0200, Jann Horn wrote:
> On Tue, Jun 3, 2025 at 5:41 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Wed, 28 May 2025 19:51:29 +0200 Jann Horn <jannh@google.com> wrote:
> > > Lock it down by completely denying creation of hugetlb files if no huge
> > > pages for the hstate could be allocated without administratively
> > > changing huge page limits.
> > So this is a non-backward-compatible change?
> Yes, this change changes kernel behavior that is userspace-visible,
> and causes syscalls to return errors where they worked before.
> > If any userspace is affected it's probably either stupid or evil, but I
> > do wonder if there are legit cases for doing this, such as "I don't
> > know if there are any hugepages configured, but I'll try this anyway
> > and figure out what to do later on". And maybe there are other legit
> > cases!
> Right. I think an affected case would be if userspace tries to detect
> whether the kernel supports hugepages by creating a MAP_NORESERVE
> mapping or huge memfd, and if that works, twiddles sysfs knobs to
This does indeed cause at least the LTP memfd_create04 testcase to fall
over on systems with less memory, if it finds hugepages information in
/sys it explicitly creates a MFD_HUGETLB memfd at various sizes and then
just closes them without verifying that there are any actual pages at
any of the supported sizes. I don't think this is a particular problem,
it's pure test code rather than an actual use case, but seems worth
highlighting and people will notice LTP issues if this gets backported
to stable. It seems reasonable to update LTP here (probably with better
hugepages enumeration code?).
tst_test.c:1900: TINFO: LTP version: 20250130-1-g60fe84aaf
tst_test.c:1904: TINFO: Tested kernel: 6.16.0-rc2-next-20250616 #1 SMP PREEMPT Mon Jun 16 07:16:11 UTC 2025 aarch64
tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
tst_test.c:1722: TINFO: Overall timeout per run is 0h 01m 30s
memfd_create04.c:66: TINFO: Attempt to create file using 64kB huge page size
memfd_create04.c:75: TFAIL: memfd_create() failed unexpectedly: ENOMEM (12)
The mm selftests might also be impacted here, some of them do a similar
check for supported huge page sizes by looking in /proc without checking
if there's any actual pages available. I should resurrect the thread
where I asked about that. Our CI only runs the mm selftests with
hugepages explicitly configured so wouldn't notice.
bisect log FWIW:
git bisect start
# status: waiting for both good and bad commits
# bad: [050f8ad7b58d9079455af171ac279c4b9b828c11] Add linux-next specific files for 20250616
git bisect bad 050f8ad7b58d9079455af171ac279c4b9b828c11
# status: waiting for good commit(s), bad commit known
# good: [d39a02e47d1e0fba70992a45ec6a6591268a11a8] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
git bisect good d39a02e47d1e0fba70992a45ec6a6591268a11a8
# bad: [852f31106c2bd9d7ca17ab766fcaf6ac5a96f037] Merge branch 'for-linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git
git bisect bad 852f31106c2bd9d7ca17ab766fcaf6ac5a96f037
# bad: [c9b6136d8ef32de2441225e47dfa7b32342fb67b] Merge branch 'xtensa-for-next' of git://github.com/jcmvbkbc/linux-xtensa.git
git bisect bad c9b6136d8ef32de2441225e47dfa7b32342fb67b
# bad: [496dd87c6e8bdad67563009b6793c59178b117fa] Merge branch 'next' of https://github.com/Broadcom/stblinux.git
git bisect bad 496dd87c6e8bdad67563009b6793c59178b117fa
# bad: [5091252d7c0bb144d359fbfb5838adcea29f5601] Merge branch 'mm-nonmm-unstable' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect bad 5091252d7c0bb144d359fbfb5838adcea29f5601
# bad: [3e546482aa3d3a3bb0ee9bbaaaa396d7d57a2b63] drivers,cxl: use node-notifier instead of memory-notifier
git bisect bad 3e546482aa3d3a3bb0ee9bbaaaa396d7d57a2b63
# bad: [e5437627e74d7f610a296c215fc557e6b5436610] mm: rename CONFIG_PAGE_BLOCK_ORDER to CONFIG_PAGE_BLOCK_MAX_ORDER
git bisect bad e5437627e74d7f610a296c215fc557e6b5436610
# good: [c03ff8cc73ddad14444dde3384c34d3d4dfbc354] mm: ksm: have KSM VMA checks not require a VMA pointer
git bisect good c03ff8cc73ddad14444dde3384c34d3d4dfbc354
# bad: [1b8405a9525fafb31b0feb38e1be73aefaee9ce0] mm: Kconfig: use verb *use* in plural form in description
git bisect bad 1b8405a9525fafb31b0feb38e1be73aefaee9ce0
# good: [42da1e1d4762be8d70db9d8cefea7cd0fb6e3d15] mm/hugetlb: convert hugetlb_change_protection() to folios
git bisect good 42da1e1d4762be8d70db9d8cefea7cd0fb6e3d15
# bad: [326969ed54efc372555acbf726a28ce537c43525] mm/vmstat: make MEMCG select VM_EVENT_COUNTERS
git bisect bad 326969ed54efc372555acbf726a28ce537c43525
# bad: [3b0376d70ab0de716559b10fa8211743c9288529] hugetlb: block hugetlb file creation if hugetlb is not set up
git bisect bad 3b0376d70ab0de716559b10fa8211743c9288529
# first bad commit: [3b0376d70ab0de716559b10fa8211743c9288529] hugetlb: block hugetlb file creation if hugetlb is not set up
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
2025-05-28 17:51 [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up Jann Horn
2025-06-03 3:41 ` Andrew Morton
@ 2025-06-17 8:12 ` kernel test robot
1 sibling, 0 replies; 10+ messages in thread
From: kernel test robot @ 2025-06-17 8:12 UTC (permalink / raw)
To: Jann Horn
Cc: oe-lkp, lkp, linux-mm, ltp, Muchun Song, Oscar Salvador,
Andrew Morton, Lorenzo Stoakes, Jann Horn, oliver.sang
Hello,
kernel test robot noticed "ltp.memfd_create04.fail" on:
commit: d72db81adc14284db5f67ff331d876691b491985 ("[PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up")
url: https://github.com/intel-lab-lkp/linux/commits/Jann-Horn/hugetlb-block-hugetlb-file-creation-if-hugetlb-is-not-set-up/20250529-015217
patch link: https://lore.kernel.org/all/20250528-hugetlb-nerf-v1-1-a404ca33e819@google.com/
patch subject: [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
in testcase: ltp
version: ltp-x86_64-99ebf35b3-1_20250607
with following parameters:
disk: 1HDD
fs: f2fs
test: syscalls-04
config: x86_64-rhel-9.4-ltp
compiler: gcc-12
test machine: 4 threads 1 sockets Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz (Ivy Bridge) with 8G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202506171557.b6192de8-lkp@intel.com
<<<test_start>>>
tag=memfd_create04 stime=1749692234
cmdline="memfd_create04"
contacts=""
analysis=exit
<<<test_output>>>
tst_test.c:1953: TINFO: LTP version: 20250530-27-g99ebf35b3
tst_test.c:1956: TINFO: Tested kernel: 6.15.0-02199-gd72db81adc14 #1 SMP PREEMPT_DYNAMIC Wed Jun 4 12:19:07 CST 2025 x86_64
tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
tst_kconfig.c:676: TINFO: CONFIG_KASAN kernel option detected which might slow the execution
tst_test.c:1774: TINFO: Overall timeout per run is 0h 10m 00s
memfd_create04.c:64: TINFO: Attempt to create file using 64kB huge page size
memfd_create04.c:71: TPASS: Test failed as expected
memfd_create04.c:64: TINFO: Attempt to create file using 512kB huge page size
memfd_create04.c:71: TPASS: Test failed as expected
memfd_create04.c:64: TINFO: Attempt to create file using 2048kB huge page size
memfd_create04.c:73: TFAIL: memfd_create() failed unexpectedly: ENOMEM (12)
Summary:
passed 2
failed 1
broken 0
skipped 0
warnings 0
<<<execution_status>>>
initiation_status="ok"
duration=0 termination_type=exited termination_id=1 corefile=no
cutime=0 cstime=1
<<<test_end>>>
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250617/202506171557.b6192de8-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
2025-06-03 4:29 ` Jann Horn
` (2 preceding siblings ...)
2025-06-16 22:09 ` Mark Brown
@ 2025-06-17 9:13 ` David Hildenbrand
2025-06-17 15:35 ` Jann Horn
3 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-06-17 9:13 UTC (permalink / raw)
To: Jann Horn, Andrew Morton
Cc: Muchun Song, Oscar Salvador, linux-mm, Lorenzo Stoakes
On 03.06.25 06:29, Jann Horn wrote:
> On Tue, Jun 3, 2025 at 5:41 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>> On Wed, 28 May 2025 19:51:29 +0200 Jann Horn <jannh@google.com> wrote:
>>> Many distro kernels enable hugetlb support, but most systems running
>>> those kernels never actually allocate hugepages or enable hugetlb
>>> overcommit.
>>>
>>> On such systems, hugetlb is unusable for any legitimate usecase, but it
>>> is still possible to exercise a lot of hugetlb-specific code by creating
>>> MAP_HUGETLB|MAP_NORESERVE VMAs - for example, it is still possible to
>>> create page tables shared across processes.
>>>
>>> This is exposed through the mmap() syscall, with no privileges required,
>>> so from a security perspective, this is interesting attack surface.
>>>
>>> Lock it down by completely denying creation of hugetlb files if no huge
>>> pages for the hstate could be allocated without administratively
>>> changing huge page limits.
>>
>> So this is a non-backward-compatible change?
>
> Yes, this change changes kernel behavior that is userspace-visible,
> and causes syscalls to return errors where they worked before.
>
>> If any userspace is affected it's probably either stupid or evil, but I
>> do wonder if there are legit cases for doing this, such as "I don't
>> know if there are any hugepages configured, but I'll try this anyway
>> and figure out what to do later on". And maybe there are other legit
>> cases!
>
> Right. I think an affected case would be if userspace tries to detect
> whether the kernel supports hugepages by creating a MAP_NORESERVE
> mapping or huge memfd, and if that works, twiddles sysfs knobs to
> actually allocate hugepages or shows a specific error message. Such a
> program might end up wrongly assuming that the kernel does not support
> hugepages. My understanding is that hugepages are normally
> administratively configured so that they can be allocated early during
> boot without having to worry about RAM fragmentation, in which case
> this probably wouldn't happen, but it's not like I actually have a
> good understanding of how typical hugetlb users work.
>
> Another affected case would be if userspace confirms that the kernel
> supports hugetlb through sysfs or such, then creates a MAP_NORESERVE
> hugetlb and asserts that this must work because MAP_NORESERVE more or
> less can't fail, and crashes with an assertion failure or such.
>
> My understanding is that the combination of MAP_HUGETLB and
> MAP_NORESERVE is somewhat rare in the first place; searching debian
> codesearch for both flags on the same line, I basically only get one
> hit in the "gridtools" package, though there might well be other cases
> where the flags are set on separate lines. memfd_create(MFD_HUGETLB)
> seems to be more common.
QEMU can trigger this, and there might be corner-case use cases where
you setup a virtio-mem device (to hotplug memory later) when staring the
VM, but actually allocate the huge pages only when wanting to provide
them to the VM.
It's not that common, because usually you back all your VM through huge
pages, not just hotplugged memory.
But it's definitely possible ...
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
2025-06-17 9:13 ` David Hildenbrand
@ 2025-06-17 15:35 ` Jann Horn
0 siblings, 0 replies; 10+ messages in thread
From: Jann Horn @ 2025-06-17 15:35 UTC (permalink / raw)
To: David Hildenbrand, Mark Brown, Andrew Morton
Cc: Muchun Song, Oscar Salvador, linux-mm, Lorenzo Stoakes
On Tue, Jun 17, 2025 at 11:13 AM David Hildenbrand <david@redhat.com> wrote:
> On 03.06.25 06:29, Jann Horn wrote:
> > On Tue, Jun 3, 2025 at 5:41 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >> On Wed, 28 May 2025 19:51:29 +0200 Jann Horn <jannh@google.com> wrote:
> >>> Many distro kernels enable hugetlb support, but most systems running
> >>> those kernels never actually allocate hugepages or enable hugetlb
> >>> overcommit.
> >>>
> >>> On such systems, hugetlb is unusable for any legitimate usecase, but it
> >>> is still possible to exercise a lot of hugetlb-specific code by creating
> >>> MAP_HUGETLB|MAP_NORESERVE VMAs - for example, it is still possible to
> >>> create page tables shared across processes.
> >>>
> >>> This is exposed through the mmap() syscall, with no privileges required,
> >>> so from a security perspective, this is interesting attack surface.
> >>>
> >>> Lock it down by completely denying creation of hugetlb files if no huge
> >>> pages for the hstate could be allocated without administratively
> >>> changing huge page limits.
> >>
> >> So this is a non-backward-compatible change?
> >
> > Yes, this change changes kernel behavior that is userspace-visible,
> > and causes syscalls to return errors where they worked before.
> >
> >> If any userspace is affected it's probably either stupid or evil, but I
> >> do wonder if there are legit cases for doing this, such as "I don't
> >> know if there are any hugepages configured, but I'll try this anyway
> >> and figure out what to do later on". And maybe there are other legit
> >> cases!
> >
> > Right. I think an affected case would be if userspace tries to detect
> > whether the kernel supports hugepages by creating a MAP_NORESERVE
> > mapping or huge memfd, and if that works, twiddles sysfs knobs to
> > actually allocate hugepages or shows a specific error message. Such a
> > program might end up wrongly assuming that the kernel does not support
> > hugepages. My understanding is that hugepages are normally
> > administratively configured so that they can be allocated early during
> > boot without having to worry about RAM fragmentation, in which case
> > this probably wouldn't happen, but it's not like I actually have a
> > good understanding of how typical hugetlb users work.
> >
> > Another affected case would be if userspace confirms that the kernel
> > supports hugetlb through sysfs or such, then creates a MAP_NORESERVE
> > hugetlb and asserts that this must work because MAP_NORESERVE more or
> > less can't fail, and crashes with an assertion failure or such.
> >
> > My understanding is that the combination of MAP_HUGETLB and
> > MAP_NORESERVE is somewhat rare in the first place; searching debian
> > codesearch for both flags on the same line, I basically only get one
> > hit in the "gridtools" package, though there might well be other cases
> > where the flags are set on separate lines. memfd_create(MFD_HUGETLB)
> > seems to be more common.
>
> QEMU can trigger this, and there might be corner-case use cases where
> you setup a virtio-mem device (to hotplug memory later) when staring the
> VM, but actually allocate the huge pages only when wanting to provide
> them to the VM.
>
> It's not that common, because usually you back all your VM through huge
> pages, not just hotplugged memory.
>
> But it's definitely possible ...
Okay, yeah, sounds like we should drop this patch for now, and I might
come back with a more targeted mitigation later.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-06-17 15:36 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-28 17:51 [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up Jann Horn
2025-06-03 3:41 ` Andrew Morton
2025-06-03 4:29 ` Jann Horn
2025-06-03 5:43 ` Oscar Salvador
2025-06-03 19:14 ` Jann Horn
2025-06-04 2:54 ` Andrew Morton
2025-06-16 22:09 ` Mark Brown
2025-06-17 9:13 ` David Hildenbrand
2025-06-17 15:35 ` Jann Horn
2025-06-17 8:12 ` kernel test robot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).