* [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
From: Jakub Acs @ 2025-10-01  9:03 UTC
  To: linux-mm
  Cc: acsjakub, akpm, david, xu.xin16, chengming.zhou, peterx,
      axelrasmussen, linux-kernel, stable

syzkaller discovered the following crash: (kernel BUG)

[   44.607039] ------------[ cut here ]------------
[   44.607422] kernel BUG at mm/userfaultfd.c:2067!
[   44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[   44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
[   44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460

<snip other registers, drop unreliable trace>

[   44.617726] Call Trace:
[   44.617926]  <TASK>
[   44.619284]  userfaultfd_release+0xef/0x1b0
[   44.620976]  __fput+0x3f9/0xb60
[   44.621240]  fput_close_sync+0x110/0x210
[   44.622222]  __x64_sys_close+0x8f/0x120
[   44.622530]  do_syscall_64+0x5b/0x2f0
[   44.622840]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   44.623244] RIP: 0033:0x7f365bb3f227

The kernel panics because it detects a UFFD inconsistency during
userfaultfd_release_all(). Specifically, it finds a VMA which has a valid
pointer to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.

The inconsistency is caused in ksm_madvise(): when the user calls madvise()
with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
mode, it accidentally clears all flags stored in the upper 32 bits of
vma->vm_flags.

Assuming an x86_64 kernel build, unsigned long is 64 bits wide and
unsigned int and int are 32 bits wide. This setup causes the following
mishap during the &= ~VM_MERGEABLE assignment.

VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
promoted to unsigned long before the & operation. This promotion fills
the upper 32 bits with leading 0s, as we're doing an unsigned conversion
(and even a signed conversion wouldn't help, as the leading bit is 0).
The & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
instead of the intended 0xffff'ffff'7fff'ffff and hence accidentally
clears the upper 32 bits of its value.

Fix it by changing the `VM_MERGEABLE` constant to unsigned long, using
the BIT() macro.

Note: other VM_* flags are not affected:
This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
all constants of type int and after the ~ operation, they end up with a
leading 1 and are thus converted to unsigned long with leading 1s.

Note 2:
After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
no longer a kernel BUG, but a WARNING at the same place:

[   45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067

but the root cause (flag drop) remains the same.

Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Xu Xin <xu.xin16@zte.com.cn>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
---
 include/linux/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ae97a0b8ec7..c6794d0e24eb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -296,7 +296,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_MIXEDMAP	0x10000000	/* Can contain "struct page" and pure PFN pages */
 #define VM_HUGEPAGE	0x20000000	/* MADV_HUGEPAGE marked this vma */
 #define VM_NOHUGEPAGE	0x40000000	/* MADV_NOHUGEPAGE marked this vma */
-#define VM_MERGEABLE	0x80000000	/* KSM may merge identical pages */
+#define VM_MERGEABLE	BIT(31)		/* KSM may merge identical pages */
 
 #ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
 #define VM_HIGH_ARCH_BIT_0	32	/* bit only usable on 64-bit architectures */
-- 
2.47.3

Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
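
The conversion described in the commit message can be reproduced in plain userspace C. The following is only a minimal sketch, not kernel code: uint64_t stands in for the 64-bit unsigned long vm_flags of an x86_64 build, and OLD_VM_MERGEABLE, NEW_VM_MERGEABLE, HIGH_FLAG and the three helper functions are hypothetical names chosen for illustration. It assumes a platform where int is 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions (illustration only). */
#define OLD_VM_MERGEABLE  0x80000000            /* does not fit int: type is unsigned int */
#define NEW_VM_MERGEABLE  (UINT64_C(1) << 31)   /* models BIT(31), i.e. a 64-bit 1UL << 31 */
#define VM_NOHUGEPAGE_INT 0x40000000            /* fits int: type is int */
#define HIGH_FLAG         (UINT64_C(1) << 38)   /* arbitrary flag living above bit 31 */

/* Old definition: ~0x80000000 is 0x7fffffff (unsigned int), which is
 * zero-extended to 0x000000007fffffff, so bits 32..63 are cleared too. */
uint64_t clear_mergeable_old(uint64_t flags)
{
	return flags & ~OLD_VM_MERGEABLE;
}

/* Fixed definition: the constant is already 64 bits wide, so ~ yields
 * 0xffffffff7fffffff and only bit 31 is cleared. */
uint64_t clear_mergeable_new(uint64_t flags)
{
	return flags & ~NEW_VM_MERGEABLE;
}

/* Other flags: ~0x40000000 is a negative int, which is sign-extended to
 * 0xffffffffbfffffff, so the upper bits survive. */
uint64_t clear_nohugepage(uint64_t flags)
{
	return flags & ~VM_NOHUGEPAGE_INT;
}
```

With flags set to HIGH_FLAG | OLD_VM_MERGEABLE | 0x1, clear_mergeable_old() returns 0x1 (the high flag is lost), while clear_mergeable_new() returns HIGH_FLAG | 0x1. clear_nohugepage() leaves HIGH_FLAG intact, which matches the note that the int-typed VM_* constants are not affected.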

* Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
From: David Hildenbrand @ 2025-10-01 14:06 UTC
  To: Jakub Acs, linux-mm
  Cc: akpm, xu.xin16, chengming.zhou, peterx, axelrasmussen,
      linux-kernel, stable

On 01.10.25 11:03, Jakub Acs wrote:

[...]

> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")

Very likely we want to CC stable.

> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Xu Xin <xu.xin16@zte.com.cn>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---

IMHO no need to resend this one if Andrew can just pick this one up.
Then, you can send out patch #2 separately as commented in reply to
patch #2.

Thanks!

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb

* Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
From: SeongJae Park @ 2025-10-01 16:43 UTC
  To: Jakub Acs
  Cc: SeongJae Park, linux-mm, akpm, david, xu.xin16, chengming.zhou,
      peterx, axelrasmussen, linux-kernel, stable

On Wed, 1 Oct 2025 09:03:52 +0000 Jakub Acs <acsjakub@amazon.de> wrote:

[...]

> Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
> BIT() macro.

Nice!

[...]

> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")

Nit. It is recommended [1] to use 12 characters of the SHA-1 ID, but you
are using 13 characters.

> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Xu Xin <xu.xin16@zte.com.cn>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org

Nit. This would be nice to be placed just after the 'Fixes:' tag.

Acked-by: SeongJae Park <sj@kernel.org>

[1] https://docs.kernel.org/process/submitting-patches.html#describe-your-changes


Thanks,
SJ

[...]

* Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
From: Vlastimil Babka @ 2025-11-06 10:39 UTC
  To: Jakub Acs, linux-mm, Hugh Dickins, Jann Horn, Lorenzo Stoakes
  Cc: akpm, david, xu.xin16, chengming.zhou, peterx, axelrasmussen,
      linux-kernel, stable

On 10/1/25 11:03, Jakub Acs wrote:

[...]

> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")

Late to the party, but it seems to me the correct Fixes: should be
f8af4da3b4c1 ("ksm: the mm interface to ksm")
which introduced the flag and the buggy clearing code, no?

Commit 7677f7fd8be76 is just one that notices it, right? But there are
other flags in the >32 bit area, including pkeys etc. It sounds rather
dangerous if they can be cleared using a madvise.

So we can't amend the Fixes: now, but maybe we could advise stable to
backport to versions even older than those based on 7677f7fd8be76?

[...]

* Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
From: David Hildenbrand (Red Hat) @ 2025-11-06 11:16 UTC
  To: Vlastimil Babka, Jakub Acs, linux-mm, Hugh Dickins, Jann Horn,
      Lorenzo Stoakes
  Cc: akpm, xu.xin16, chengming.zhou, peterx, axelrasmussen,
      linux-kernel, stable

On 06.11.25 11:39, Vlastimil Babka wrote:
> On 10/1/25 11:03, Jakub Acs wrote:

[...]

>> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
>
> Late to the party, but it seems to me the correct Fixes: should be
> f8af4da3b4c1 ("ksm: the mm interface to ksm")
> which introduced the flag and the buggy clearing code, no?
>
> Commit 7677f7fd8be76 is just one that notices it, right? But there are other
> flags in >32 bit area, including pkeys etc. Sounds rather dangerous if they
> can be cleared using a madvise.
>
> So we can't amend the Fixes: now but maybe could advise stable to backport
> for even older versions than based on 7677f7fd8be76 ?

Yes, I agree.

-- 
Cheers

David

* Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
From: Jakub Acs @ 2025-11-07  9:49 UTC
  To: Vlastimil Babka
  Cc: linux-mm, Hugh Dickins, Jann Horn, Lorenzo Stoakes, akpm, david,
      xu.xin16, chengming.zhou, peterx, axelrasmussen, linux-kernel,
      stable

On Thu, Nov 06, 2025 at 11:39:28AM +0100, Vlastimil Babka wrote:
> On 10/1/25 11:03, Jakub Acs wrote:

[...]

> > Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
>
> Late to the party, but it seems to me the correct Fixes: should be
> f8af4da3b4c1 ("ksm: the mm interface to ksm")
> which introduced the flag and the buggy clearing code, no?
>
> Commit 7677f7fd8be76 is just one that notices it, right? But there are other
> flags in >32 bit area, including pkeys etc. Sounds rather dangerous if they
> can be cleared using a madvise.
>
> So we can't amend the Fixes: now but maybe could advise stable to backport
> for even older versions than based on 7677f7fd8be76 ?
>

Good point. It was a bit tricky to determine the correct "fixes" tag, as
there were more candidates:

- the commit that initially introduced VM_MERGEABLE as a constant with
  different inferred type to other vm_flags constants

- the commit that first started using upper 32 bits of vm_flags and did
  not make sure the constants are defined safely

- f8af4da3b4c1 indeed, as the one that makes the drop actually possible

- 7677f7fd8be76 that shows us a path where the drop manifests

Looking back, I agree f8af4da3b4c1 is the better option, but as you
said, that won't be changed now.

Nevertheless, I'll send the backports after a round of kselftests,
thanks for pointing this out.

Have a good day,
Jakub

* Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
From: Vlastimil Babka @ 2025-11-10 10:00 UTC
  To: Jakub Acs, linux-mm, Hugh Dickins, Jann Horn, Lorenzo Stoakes,
      Dave Hansen
  Cc: akpm, david, xu.xin16, chengming.zhou, peterx, axelrasmussen,
      linux-kernel, stable

On 11/6/25 11:39, Vlastimil Babka wrote:
> On 10/1/25 11:03, Jakub Acs wrote:

[...]

>> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
>
> Late to the party, but it seems to me the correct Fixes: should be
> f8af4da3b4c1 ("ksm: the mm interface to ksm")
> which introduced the flag and the buggy clearing code, no?

Clarification: flags with bits >31 did not exist at the time of
f8af4da3b4c1 as they were only introduced later with 63c17fb8e5a4
("mm/core, x86/mm/pkeys: Store protection bits in high VMA flags") (v4.6)
so that would have been the most precise Fixes: commit. Sorry, Hugh :)

But that doesn't affect the stable backports efforts where the oldest LTS is
5.4 anyway.

> Commit 7677f7fd8be76 is just one that notices it, right? But there are other
> flags in >32 bit area, including pkeys etc. Sounds rather dangerous if they
> can be cleared using a madvise.
>
> So we can't amend the Fixes: now but maybe could advise stable to backport
> for even older versions than based on 7677f7fd8be76 ?

[...]