* [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr @ 2026-04-07 8:14 Denis M. Karpov 2026-04-08 3:21 ` Harry Yoo (Oracle) 2026-04-08 12:36 ` Usama Arif 0 siblings, 2 replies; 9+ messages in thread From: Denis M. Karpov @ 2026-04-07 8:14 UTC (permalink / raw) To: rppt, akpm, Liam.Howlett, ljs Cc: vbabka, jannh, peterx, pfalcato, brauner, viro, jack, linux-mm, linux-fsdevel, linux-kernel, Denis M . Karpov The current implementation of validate_range() in fs/userfaultfd.c performs a hard check against mmap_min_addr without considering capabilities, but the mmap() syscall uses security_mmap_addr() which allows privileged processes (with CAP_SYS_RAWIO) to map below mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses dac_mmap_min_addr variable which can be changed with /proc/sys/vm/mmap_min_addr. Because userfaultfd uses a different check, UFFDIO_REGISTER may fail with -EINVAL for valid memory areas that were successfully mapped below mmap_min_addr even with appropriate capabilities. This prevents apps like binary compilers from using UFFD for valid memory regions mapped by application. Replace the rigid mmap_min_addr check with security_mmap_addr() to align userfaultfd with the standard kernel memory mapping security policy. Signed-off-by: Denis M. Karpov <komlomal@gmail.com> --- Initial RFC following the discussion on the [BUG] thread. Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/ --- fs/userfaultfd.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index bdc84e521..dbfe5b2a0 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range( return -EINVAL; if (!len) return -EINVAL; - if (start < mmap_min_addr) - return -EINVAL; if (start >= task_size) return -EINVAL; if (len > task_size - start) return -EINVAL; if (start + len <= start) return -EINVAL; - return 0; + return security_mmap_addr(start); } static __always_inline int validate_range(struct mm_struct *mm, -- 2.47.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr 2026-04-07 8:14 [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr Denis M. Karpov @ 2026-04-08 3:21 ` Harry Yoo (Oracle) 2026-04-08 8:09 ` Denis M. Karpov 2026-04-08 12:36 ` Usama Arif 1 sibling, 1 reply; 9+ messages in thread From: Harry Yoo (Oracle) @ 2026-04-08 3:21 UTC (permalink / raw) To: Denis M. Karpov Cc: rppt, akpm, Liam.Howlett, ljs, vbabka, jannh, peterx, pfalcato, brauner, viro, jack, linux-mm, linux-fsdevel, linux-kernel On Tue, Apr 07, 2026 at 11:14:42AM +0300, Denis M. Karpov wrote: > The current implementation of validate_range() in fs/userfaultfd.c > performs a hard check against mmap_min_addr without considering > capabilities, but the mmap() syscall uses security_mmap_addr() > which allows privileged processes (with CAP_SYS_RAWIO) to map below > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses > dac_mmap_min_addr variable which can be changed with > /proc/sys/vm/mmap_min_addr. > > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail > with -EINVAL for valid memory areas that were successfully mapped > below mmap_min_addr even with appropriate capabilities. > > This prevents apps like binary compilers from using UFFD for valid memory > regions mapped by application. > > Replace the rigid mmap_min_addr check with security_mmap_addr() to align > userfaultfd with the standard kernel memory mapping security policy. Perhaps worth adding Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") > Signed-off-by: Denis M. Karpov <komlomal@gmail.com> > > --- > fs/userfaultfd.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > index bdc84e521..dbfe5b2a0 100644 > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range( > return -EINVAL; > if (!len) > return -EINVAL; > - if (start < mmap_min_addr) > - return -EINVAL; > if (start >= task_size) > return -EINVAL; > if (len > task_size - start) > return -EINVAL; > if (start + len <= start) > return -EINVAL; > - return 0; > + return security_mmap_addr(start); Hmm but it looks bit strange to check capability for address that is already mapped by mmap(). Why is this required? > } -- Cheers, Harry / Hyeonggon ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr 2026-04-08 3:21 ` Harry Yoo (Oracle) @ 2026-04-08 8:09 ` Denis M. Karpov 2026-04-09 2:51 ` Harry Yoo (Oracle) 0 siblings, 1 reply; 9+ messages in thread From: Denis M. Karpov @ 2026-04-08 8:09 UTC (permalink / raw) To: Harry Yoo (Oracle), Andrea Arcangeli Cc: rppt, akpm, Liam.Howlett, ljs, vbabka, jannh, peterx, pfalcato, brauner, viro, jack, linux-mm, linux-fsdevel, linux-kernel > Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Thank you, I will add this Fixes tag in the next patch. > Hmm but it looks bit strange to check capability for address that is > already mapped by mmap(). Why is this required? Actually, it's not obvious to me either, but I may miss something. My intent was to replace the current restrictive check with a more flexible one. I think performing this check here allows us to deny invalid requests early, before locks or VMA lookups occur. Removing this check entirely would also allow using UFFD in cases where a task drops privileges after the initial mmap(). This seems reasonable because the VMA already exists, i.e. kernel already allowed this mapping. In the [BUG] thread discussion Andrea Arcangeli also suggested adding a check for FIRST_USER_ADDRESS to handle architectural constraints. Andrea, could you please comment on this? Specifically, would a check against FIRST_USER_ADDRESS sufficient here, or do we still need to check caps? On Wed, Apr 8, 2026 at 6:21 AM Harry Yoo (Oracle) <harry@kernel.org> wrote: > > On Tue, Apr 07, 2026 at 11:14:42AM +0300, Denis M. Karpov wrote: > > The current implementation of validate_range() in fs/userfaultfd.c > > performs a hard check against mmap_min_addr without considering > > capabilities, but the mmap() syscall uses security_mmap_addr() > > which allows privileged processes (with CAP_SYS_RAWIO) to map below > > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses > > dac_mmap_min_addr variable which can be changed with > > /proc/sys/vm/mmap_min_addr. > > > > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail > > with -EINVAL for valid memory areas that were successfully mapped > > below mmap_min_addr even with appropriate capabilities. > > > > This prevents apps like binary compilers from using UFFD for valid memory > > regions mapped by application. > > > > Replace the rigid mmap_min_addr check with security_mmap_addr() to align > > userfaultfd with the standard kernel memory mapping security policy. > > Perhaps worth adding > > Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") > > > Signed-off-by: Denis M. Karpov <komlomal@gmail.com> > > > > --- > > fs/userfaultfd.c | 4 +--- > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > > index bdc84e521..dbfe5b2a0 100644 > > --- a/fs/userfaultfd.c > > +++ b/fs/userfaultfd.c > > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range( > > return -EINVAL; > > if (!len) > > return -EINVAL; > > - if (start < mmap_min_addr) > > - return -EINVAL; > > if (start >= task_size) > > return -EINVAL; > > if (len > task_size - start) > > return -EINVAL; > > if (start + len <= start) > > return -EINVAL; > > - return 0; > > + return security_mmap_addr(start); > > Hmm but it looks bit strange to check capability for address that is > already mapped by mmap(). Why is this required? > > > } > > -- > Cheers, > Harry / Hyeonggon ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr 2026-04-08 8:09 ` Denis M. Karpov @ 2026-04-09 2:51 ` Harry Yoo (Oracle) 2026-04-09 7:58 ` Lorenzo Stoakes 0 siblings, 1 reply; 9+ messages in thread From: Harry Yoo (Oracle) @ 2026-04-09 2:51 UTC (permalink / raw) To: Denis M. Karpov Cc: Andrea Arcangeli, rppt, akpm, Liam.Howlett, ljs, vbabka, jannh, peterx, pfalcato, brauner, viro, jack, linux-mm, linux-fsdevel, linux-kernel On Wed, Apr 08, 2026 at 11:09:00AM +0300, Denis M. Karpov wrote: > > Hmm but it looks bit strange to check capability for address that is > > already mapped by mmap(). Why is this required? > > Actually, it's not obvious to me either, but I may miss something. > My intent was to replace the current restrictive check with a more flexible one. Technically, it's less restrictive only if start < mmap_min_addr (setting aside the discussion of whether this is an appropriate check). Otherwise (start >= mmap_min_addr) it's more restrictive? (now, the process should have the capability when registering an existing VMA to userfaultfd) > I think performing this check here allows us to deny invalid requests early, > before locks or VMA lookups occur. But we're not trying to optimize it and we shouldn't add checks without a proper explanation for the sake of optimization. > Removing this check entirely would also allow using UFFD in cases where a task > drops privileges after the initial mmap(). This seems reasonable because the > VMA already exists, i.e. kernel already allowed this mapping. Yeah, that seems reasonable to me. IOW, I don't think "creating a VMA on a specific address (w/ proper capabilities) is okay but once it is registered to userfaultfd, it becomes a security hole" is a valid argument. And we don't unmap those mappings when the process loses the capability to map them anyway. > In the [BUG] thread discussion Was it a private discussion? I can't find Andrea's emails on the thread. > Andrea Arcangeli also suggested adding a check for > FIRST_USER_ADDRESS to handle architectural constraints. Again, what's the point of checking this on the VMA that is already created? *checks why FIRST_USER_ADDRESS was introduced* commit e2cdef8c847b480529b7e26991926aab4be008e6 Author: Hugh Dickins <hugh@veritas.com> Date: Tue Apr 19 13:29:19 2005 -0700 [PATCH] freepgt: free_pgtables from FIRST_USER_ADDRESS The patches to free_pgtables by vma left problems on any architectures which leave some user address page table entries unencapsulated by vma. Andi has fixed the 32-bit vDSO on x86_64 to use a vma. Now fix arm (and arm26), whose first PAGE_SIZE is reserved (perhaps) for machine vectors. Our calls to free_pgtables must not touch that area, and exit_mmap's BUG_ON(nr_ptes) must allow that arm's get_pgd_slow may (or may not) have allocated an extra page table, which its free_pgd_slow would free later. FIRST_USER_PGD_NR has misled me and others: until all the arches define FIRST_USER_ADDRESS instead, a hack in mmap.c to derive one from t'other. This patch fixes the bugs, the remaining patches just clean it up. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> Oh, ok. there might be a raw mapping without VMA below FIRST_USER_ADDRESS. Adding such a check wouldn't hurt... but if there is no VMA, you can't register the range to userfaultfd anyway? > Andrea, could you please comment on this? Specifically, would a > check against FIRST_USER_ADDRESS sufficient here, or do we still > need to check caps? > > On Wed, Apr 8, 2026 at 6:21 AM Harry Yoo (Oracle) <harry@kernel.org> wrote: > > > > On Tue, Apr 07, 2026 at 11:14:42AM +0300, Denis M. Karpov wrote: > > > The current implementation of validate_range() in fs/userfaultfd.c > > > performs a hard check against mmap_min_addr without considering > > > capabilities, but the mmap() syscall uses security_mmap_addr() > > > which allows privileged processes (with CAP_SYS_RAWIO) to map below > > > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses > > > dac_mmap_min_addr variable which can be changed with > > > /proc/sys/vm/mmap_min_addr. > > > > > > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail > > > with -EINVAL for valid memory areas that were successfully mapped > > > below mmap_min_addr even with appropriate capabilities. > > > > > > This prevents apps like binary compilers from using UFFD for valid memory > > > regions mapped by application. > > > > > > Replace the rigid mmap_min_addr check with security_mmap_addr() to align > > > userfaultfd with the standard kernel memory mapping security policy. > > > > Perhaps worth adding > > > > Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") > > > > > Signed-off-by: Denis M. Karpov <komlomal@gmail.com> > > > > > > --- > > > fs/userfaultfd.c | 4 +--- > > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > > > index bdc84e521..dbfe5b2a0 100644 > > > --- a/fs/userfaultfd.c > > > +++ b/fs/userfaultfd.c > > > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range( > > > return -EINVAL; > > > if (!len) > > > return -EINVAL; > > > - if (start < mmap_min_addr) > > > - return -EINVAL; > > > if (start >= task_size) > > > return -EINVAL; > > > if (len > task_size - start) > > > return -EINVAL; > > > if (start + len <= start) > > > return -EINVAL; > > > - return 0; > > > + return security_mmap_addr(start); > > > > Hmm but it looks bit strange to check capability for address that is > > already mapped by mmap(). Why is this required? -- Cheers, Harry / Hyeonggon ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr 2026-04-09 2:51 ` Harry Yoo (Oracle) @ 2026-04-09 7:58 ` Lorenzo Stoakes 0 siblings, 0 replies; 9+ messages in thread From: Lorenzo Stoakes @ 2026-04-09 7:58 UTC (permalink / raw) To: Harry Yoo (Oracle) Cc: Denis M. Karpov, Andrea Arcangeli, rppt, akpm, Liam.Howlett, vbabka, jannh, peterx, pfalcato, brauner, viro, jack, linux-mm, linux-fsdevel, linux-kernel On Thu, Apr 09, 2026 at 11:51:11AM +0900, Harry Yoo (Oracle) wrote: > On Wed, Apr 08, 2026 at 11:09:00AM +0300, Denis M. Karpov wrote: > > > Hmm but it looks bit strange to check capability for address that is > > > already mapped by mmap(). Why is this required? > > > > Actually, it's not obvious to me either, but I may miss something. > > My intent was to replace the current restrictive check with a more flexible one. > > Technically, it's less restrictive only if start < mmap_min_addr > (setting aside the discussion of whether this is an appropriate check). > > Otherwise (start >= mmap_min_addr) it's more restrictive? (now, the process > should have the capability when registering an existing VMA to userfaultfd) > > > I think performing this check here allows us to deny invalid requests early, > > before locks or VMA lookups occur. > > But we're not trying to optimize it and we shouldn't add checks without > a proper explanation for the sake of optimization. Duplicating this kind of logic in the already horribly duplicative (and more generally, horrible) UFFD implementation is actively buggy and incorrect IMO. I also find it extremely odd that we are validating that a... source address... is... mapped that way (in userfaultfd_copy(), we validate uffdio_copy.src using validate_unaligned_range(), as well as the destination via validate_range()). It just makes no sense to me at all. Let's get rid of it. > > > Removing this check entirely would also allow using UFFD in cases where a task > > drops privileges after the initial mmap(). This seems reasonable because the > > VMA already exists, i.e. kernel already allowed this mapping. > > Yeah, that seems reasonable to me. > > IOW, I don't think "creating a VMA on a specific address (w/ proper > capabilities) is okay but once it is registered to userfaultfd, > it becomes a security hole" is a valid argument. Yes. > > And we don't unmap those mappings when the process loses the capability > to map them anyway. Once it's mapped it's mapped...? > > > In the [BUG] thread discussion > > Was it a private discussion? I can't find Andrea's emails on the thread. > > > Andrea Arcangeli also suggested adding a check for > > FIRST_USER_ADDRESS to handle architectural constraints. > > Again, what's the point of checking this on the VMA that is already created? > *checks why FIRST_USER_ADDRESS was introduced* Yeah this is just the exact same thing with a different thing to compare against no? copy_from_user() will handle this in mfill_copy_folio_locked(), returning an error if a user tried to copy from somewhere they shouldn't have (the same way as if the user tried to copy from somewhere else they shouldn't have). Let's not block on off-list sidebars. > > commit e2cdef8c847b480529b7e26991926aab4be008e6 > Author: Hugh Dickins <hugh@veritas.com> > Date: Tue Apr 19 13:29:19 2005 -0700 > > [PATCH] freepgt: free_pgtables from FIRST_USER_ADDRESS > > The patches to free_pgtables by vma left problems on any architectures which > leave some user address page table entries unencapsulated by vma. Andi has > fixed the 32-bit vDSO on x86_64 to use a vma. Now fix arm (and arm26), whose > first PAGE_SIZE is reserved (perhaps) for machine vectors. > > Our calls to free_pgtables must not touch that area, and exit_mmap's > BUG_ON(nr_ptes) must allow that arm's get_pgd_slow may (or may not) have > allocated an extra page table, which its free_pgd_slow would free later. > > FIRST_USER_PGD_NR has misled me and others: until all the arches define > FIRST_USER_ADDRESS instead, a hack in mmap.c to derive one from t'other. This > patch fixes the bugs, the remaining patches just clean it up. > > Signed-off-by: Hugh Dickins <hugh@veritas.com> > Signed-off-by: Andrew Morton <akpm@osdl.org> > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > Oh, ok. there might be a raw mapping without VMA below FIRST_USER_ADDRESS. > > Adding such a check wouldn't hurt... but if there is no VMA, you can't > register the range to userfaultfd anyway? Exactly... and I don't want to see us randomly do checks that already happened previously. Putting duplicated bitrot-baiting code in what is one of the worst areas of mm is not something I want us to do, and would like us to actively remove anything that already exists like this. And the fact that this is in an fs/ file is even more annoying to me. Really I don't think _any_ meaningful uffd logic belongs there. Especially since we have a bunch of other uffd crap in mm/userfaultfd.c. The fs/userfaultfd.c file should be a bare-bones thing that handles the fs side of uffd _only_. Cheers, Lorenzo ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr 2026-04-07 8:14 [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr Denis M. Karpov 2026-04-08 3:21 ` Harry Yoo (Oracle) @ 2026-04-08 12:36 ` Usama Arif 2026-04-09 8:01 ` Lorenzo Stoakes 1 sibling, 1 reply; 9+ messages in thread From: Usama Arif @ 2026-04-08 12:36 UTC (permalink / raw) To: Denis M. Karpov Cc: Usama Arif, rppt, akpm, Liam.Howlett, ljs, vbabka, jannh, peterx, pfalcato, brauner, viro, jack, linux-mm, linux-fsdevel, linux-kernel On Tue, 7 Apr 2026 11:14:42 +0300 "Denis M. Karpov" <komlomal@gmail.com> wrote: > The current implementation of validate_range() in fs/userfaultfd.c > performs a hard check against mmap_min_addr without considering > capabilities, but the mmap() syscall uses security_mmap_addr() > which allows privileged processes (with CAP_SYS_RAWIO) to map below > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses > dac_mmap_min_addr variable which can be changed with > /proc/sys/vm/mmap_min_addr. > > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail > with -EINVAL for valid memory areas that were successfully mapped > below mmap_min_addr even with appropriate capabilities. > > This prevents apps like binary compilers from using UFFD for valid memory > regions mapped by application. > > Replace the rigid mmap_min_addr check with security_mmap_addr() to align > userfaultfd with the standard kernel memory mapping security policy. > > Signed-off-by: Denis M. Karpov <komlomal@gmail.com> > > --- > Initial RFC following the discussion on the [BUG] thread. > Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/ > --- > fs/userfaultfd.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > index bdc84e521..dbfe5b2a0 100644 > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range( > return -EINVAL; > if (!len) > return -EINVAL; > - if (start < mmap_min_addr) > - return -EINVAL; > if (start >= task_size) > return -EINVAL; > if (len > task_size - start) > return -EINVAL; > if (start + len <= start) > return -EINVAL; > - return 0; > + return security_mmap_addr(start); Is this introducing an ABI change? The old code returned -EINVAL when start was below mmap_min_addr. The new code calls security_mmap_addr() which returns -EPERM when the caller lacks CAP_SYS_RAWIO. Existing userspace callers checking specifically for -EINVAL would see different behavior start is below mmap_min_addr. > } > > static __always_inline int validate_range(struct mm_struct *mm, > -- > 2.47.3 > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr 2026-04-08 12:36 ` Usama Arif @ 2026-04-09 8:01 ` Lorenzo Stoakes 2026-04-09 9:05 ` Denis M. Karpov 2026-04-09 10:52 ` Usama Arif 0 siblings, 2 replies; 9+ messages in thread From: Lorenzo Stoakes @ 2026-04-09 8:01 UTC (permalink / raw) To: Usama Arif Cc: Denis M. Karpov, rppt, akpm, Liam.Howlett, vbabka, jannh, peterx, pfalcato, brauner, viro, jack, linux-mm, linux-fsdevel, linux-kernel On Wed, Apr 08, 2026 at 05:36:59AM -0700, Usama Arif wrote: > On Tue, 7 Apr 2026 11:14:42 +0300 "Denis M. Karpov" <komlomal@gmail.com> wrote: > > > The current implementation of validate_range() in fs/userfaultfd.c > > performs a hard check against mmap_min_addr without considering > > capabilities, but the mmap() syscall uses security_mmap_addr() > > which allows privileged processes (with CAP_SYS_RAWIO) to map below > > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses > > dac_mmap_min_addr variable which can be changed with > > /proc/sys/vm/mmap_min_addr. > > > > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail > > with -EINVAL for valid memory areas that were successfully mapped > > below mmap_min_addr even with appropriate capabilities. > > > > This prevents apps like binary compilers from using UFFD for valid memory > > regions mapped by application. > > > > Replace the rigid mmap_min_addr check with security_mmap_addr() to align > > userfaultfd with the standard kernel memory mapping security policy. > > > > Signed-off-by: Denis M. Karpov <komlomal@gmail.com> > > > > --- > > Initial RFC following the discussion on the [BUG] thread. > > Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/ > > --- > > fs/userfaultfd.c | 4 +--- > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > > index bdc84e521..dbfe5b2a0 100644 > > --- a/fs/userfaultfd.c > > +++ b/fs/userfaultfd.c > > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range( > > return -EINVAL; > > if (!len) > > return -EINVAL; > > - if (start < mmap_min_addr) > > - return -EINVAL; > > if (start >= task_size) > > return -EINVAL; > > if (len > task_size - start) > > return -EINVAL; > > if (start + len <= start) > > return -EINVAL; > > - return 0; > > + return security_mmap_addr(start); > > Is this introducing an ABI change? > > The old code returned -EINVAL when start was below mmap_min_addr. > The new code calls security_mmap_addr() which returns -EPERM when > the caller lacks CAP_SYS_RAWIO. Existing userspace callers checking > specifically for -EINVAL would see different behavior start is > below mmap_min_addr. You mean API change? :) we don't guarantee ABI for kernel stuff anyway. Firstly, as with Harry, I don't believe we should be duplicating checks here anyway. UFFD is duplicative enough as it is. And this is such a silly edge case that I don't think it is valid or reasonable for us to account for whichever totally insane user relies on a pointless re-check being done there and _then_ relies on the error code being... -EINVAL... which is overloaded for a million other possible failures. Let's let it be -EFAULT and remove this silly check altogether. > > > } > > > > static __always_inline int validate_range(struct mm_struct *mm, > > -- > > 2.47.3 > > > > Thanks, Lorenzo ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr 2026-04-09 8:01 ` Lorenzo Stoakes @ 2026-04-09 9:05 ` Denis M. Karpov 2026-04-09 10:52 ` Usama Arif 1 sibling, 0 replies; 9+ messages in thread From: Denis M. Karpov @ 2026-04-09 9:05 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Usama Arif, rppt, akpm, Liam.Howlett, vbabka, jannh, peterx, pfalcato, brauner, viro, jack, linux-mm, linux-fsdevel, linux-kernel, Andrea Arcangeli (to Harry) > Technically, it's less restrictive only if start < mmap_min_addr > (setting aside the discussion of whether this is an appropriate check). > > Otherwise (start >= mmap_min_addr) it's more restrictive? (now, the process > should have the capability when registering an existing VMA to userfaultfd) Hmm, I can't find any checks for addr >= mmap_min_addr in the security subsystem, only if addr < mmap_min_addr. Otherwise, one would need capabilities for regular mmap() calls as well. > Was it a private discussion? I can't find Andrea's emails on the thread. Oh, it seems Andrea accidentally dropped some recipients from the CC list. I have CC'd him here so he can clarify his points if he feels it is necessary. (to Lorenzo) >Duplicating this kind of logic in the already horribly duplicative (and more >generally, horrible) UFFD implementation is actively buggy and incorrect IMO. So, no security_mmap_addr check, no FIRST_USER_ADDRESS check. Thank you both for the review. I'll prepare the patch. On Thu, Apr 9, 2026 at 11:01 AM Lorenzo Stoakes <ljs@kernel.org> wrote: > > On Wed, Apr 08, 2026 at 05:36:59AM -0700, Usama Arif wrote: > > On Tue, 7 Apr 2026 11:14:42 +0300 "Denis M. Karpov" <komlomal@gmail.com> wrote: > > > > > The current implementation of validate_range() in fs/userfaultfd.c > > > performs a hard check against mmap_min_addr without considering > > > capabilities, but the mmap() syscall uses security_mmap_addr() > > > which allows privileged processes (with CAP_SYS_RAWIO) to map below > > > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses > > > dac_mmap_min_addr variable which can be changed with > > > /proc/sys/vm/mmap_min_addr. > > > > > > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail > > > with -EINVAL for valid memory areas that were successfully mapped > > > below mmap_min_addr even with appropriate capabilities. > > > > > > This prevents apps like binary compilers from using UFFD for valid memory > > > regions mapped by application. > > > > > > Replace the rigid mmap_min_addr check with security_mmap_addr() to align > > > userfaultfd with the standard kernel memory mapping security policy. > > > > > > Signed-off-by: Denis M. Karpov <komlomal@gmail.com> > > > > > > --- > > > Initial RFC following the discussion on the [BUG] thread. > > > Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/ > > > --- > > > fs/userfaultfd.c | 4 +--- > > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > > > index bdc84e521..dbfe5b2a0 100644 > > > --- a/fs/userfaultfd.c > > > +++ b/fs/userfaultfd.c > > > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range( > > > return -EINVAL; > > > if (!len) > > > return -EINVAL; > > > - if (start < mmap_min_addr) > > > - return -EINVAL; > > > if (start >= task_size) > > > return -EINVAL; > > > if (len > task_size - start) > > > return -EINVAL; > > > if (start + len <= start) > > > return -EINVAL; > > > - return 0; > > > + return security_mmap_addr(start); > > > > Is this introducing an ABI change? > > > > The old code returned -EINVAL when start was below mmap_min_addr. > > The new code calls security_mmap_addr() which returns -EPERM when > > the caller lacks CAP_SYS_RAWIO. Existing userspace callers checking > > specifically for -EINVAL would see different behavior start is > > below mmap_min_addr. > > You mean API change? :) we don't guarantee ABI for kernel stuff anyway. > > Firstly, as with Harry, I don't believe we should be duplicating checks here > anyway. UFFD is duplicative enough as it is. > > And this is such a silly edge case that I don't think it is valid or reasonable > for us to account for whichever totally insane user relies on a pointless > re-check being done there and _then_ relies on the error code > being... -EINVAL... which is overloaded for a million other possible failures. > > Let's let it be -EFAULT and remove this silly check altogether. > > > > > > } > > > > > > static __always_inline int validate_range(struct mm_struct *mm, > > > -- > > > 2.47.3 > > > > > > > > Thanks, Lorenzo ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr 2026-04-09 8:01 ` Lorenzo Stoakes 2026-04-09 9:05 ` Denis M. Karpov @ 2026-04-09 10:52 ` Usama Arif 1 sibling, 0 replies; 9+ messages in thread From: Usama Arif @ 2026-04-09 10:52 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Denis M. Karpov, rppt, akpm, Liam.Howlett, vbabka, jannh, peterx, pfalcato, brauner, viro, jack, linux-mm, linux-fsdevel, linux-kernel On 09/04/2026 09:01, Lorenzo Stoakes wrote: > On Wed, Apr 08, 2026 at 05:36:59AM -0700, Usama Arif wrote: >> On Tue, 7 Apr 2026 11:14:42 +0300 "Denis M. Karpov" <komlomal@gmail.com> wrote: >> >>> The current implementation of validate_range() in fs/userfaultfd.c >>> performs a hard check against mmap_min_addr without considering >>> capabilities, but the mmap() syscall uses security_mmap_addr() >>> which allows privileged processes (with CAP_SYS_RAWIO) to map below >>> mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses >>> dac_mmap_min_addr variable which can be changed with >>> /proc/sys/vm/mmap_min_addr. >>> >>> Because userfaultfd uses a different check, UFFDIO_REGISTER may fail >>> with -EINVAL for valid memory areas that were successfully mapped >>> below mmap_min_addr even with appropriate capabilities. >>> >>> This prevents apps like binary compilers from using UFFD for valid memory >>> regions mapped by application. >>> >>> Replace the rigid mmap_min_addr check with security_mmap_addr() to align >>> userfaultfd with the standard kernel memory mapping security policy. >>> >>> Signed-off-by: Denis M. Karpov <komlomal@gmail.com> >>> >>> --- >>> Initial RFC following the discussion on the [BUG] thread. >>> Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/ >>> --- >>> fs/userfaultfd.c | 4 +--- >>> 1 file changed, 1 insertion(+), 3 deletions(-) >>> >>> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c >>> index bdc84e521..dbfe5b2a0 100644 >>> --- a/fs/userfaultfd.c >>> +++ b/fs/userfaultfd.c >>> @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range( >>> return -EINVAL; >>> if (!len) >>> return -EINVAL; >>> - if (start < mmap_min_addr) >>> - return -EINVAL; >>> if (start >= task_size) >>> return -EINVAL; >>> if (len > task_size - start) >>> return -EINVAL; >>> if (start + len <= start) >>> return -EINVAL; >>> - return 0; >>> + return security_mmap_addr(start); >> >> Is this introducing an ABI change? >> >> The old code returned -EINVAL when start was below mmap_min_addr. >> The new code calls security_mmap_addr() which returns -EPERM when >> the caller lacks CAP_SYS_RAWIO. Existing userspace callers checking >> specifically for -EINVAL would see different behavior start is >> below mmap_min_addr. > > You mean API change? :) we don't guarantee ABI for kernel stuff anyway. > Ah no, I meant ABI, I hope :) The return value of validate_unaligned_range() flows directly back to the ioctl() return value, which is visible to userspace. The error code a program sees from ioctl(fd, UFFDIO_REGISTER, ...) changes from -EINVAL to -EPERM for the same input, right? Its probably not an issue, but we would need to update https://man7.org/linux/man-pages/man2/ioctl_userfaultfd.2.html right? ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-04-09 10:52 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-07 8:14 [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr Denis M. Karpov 2026-04-08 3:21 ` Harry Yoo (Oracle) 2026-04-08 8:09 ` Denis M. Karpov 2026-04-09 2:51 ` Harry Yoo (Oracle) 2026-04-09 7:58 ` Lorenzo Stoakes 2026-04-08 12:36 ` Usama Arif 2026-04-09 8:01 ` Lorenzo Stoakes 2026-04-09 9:05 ` Denis M. Karpov 2026-04-09 10:52 ` Usama Arif
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox