From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: Vlastimil Babka <vbabka@suse.cz>, Jakub Acs <acsjakub@amazon.de>,
linux-mm@kvack.org, Hugh Dickins <hughd@google.com>,
Jann Horn <jannh@google.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: akpm@linux-foundation.org, xu.xin16@zte.com.cn,
chengming.zhou@linux.dev, peterx@redhat.com,
axelrasmussen@google.com, linux-kernel@vger.kernel.org,
stable@vger.kernel.org
Subject: Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
Date: Thu, 6 Nov 2025 12:16:36 +0100
Message-ID: <0bc6a1ba-4f4f-4b04-b66c-b5d217faefab@kernel.org>
In-Reply-To: <13c7242e-3a40-469b-9e99-8a65a21449bb@suse.cz>

On 06.11.25 11:39, Vlastimil Babka wrote:
> On 10/1/25 11:03, Jakub Acs wrote:
>> syzkaller discovered the following crash: (kernel BUG)
>>
>> [ 44.607039] ------------[ cut here ]------------
>> [ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
>> [ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>> [ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
>> [ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>> [ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
>>
>> <snip other registers, drop unreliable trace>
>>
>> [ 44.617726] Call Trace:
>> [ 44.617926] <TASK>
>> [ 44.619284] userfaultfd_release+0xef/0x1b0
>> [ 44.620976] __fput+0x3f9/0xb60
>> [ 44.621240] fput_close_sync+0x110/0x210
>> [ 44.622222] __x64_sys_close+0x8f/0x120
>> [ 44.622530] do_syscall_64+0x5b/0x2f0
>> [ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> [ 44.623244] RIP: 0033:0x7f365bb3f227
>>
>> Kernel panics because it detects UFFD inconsistency during
>> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
>> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
>>
>> The inconsistency is caused in ksm_madvise(): when user calls madvise()
>> with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
>> mode, it accidentally clears all flags stored in the upper 32 bits of
>> vma->vm_flags.
>>
>> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
>> and int are 32-bit wide. This setup causes the following mishap during
>> the &= ~VM_MERGEABLE assignment.
>>
>> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
>> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
>> promoted to unsigned long before the & operation. This promotion fills
>> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
>> even for a signed conversion, this wouldn't help as the leading bit is
>> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
>> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
>> the upper 32-bits of its value.
>>
>> Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
>> BIT() macro.
>>
>> Note: other VM_* flags are not affected:
>> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
>> all constants of type int and after ~ operation, they end up with
>> leading 1 and are thus converted to unsigned long with leading 1s.
>>
>> Note 2:
>> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
>> no longer a kernel BUG, but a WARNING at the same place:
>>
>> [ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
>>
>> but the root-cause (flag-drop) remains the same.
>>
>> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
>
> Late to the party, but it seems to me the correct Fixes: should be
> f8af4da3b4c1 ("ksm: the mm interface to ksm")
> which introduced the flag and the buggy clearing code, no?
>
> Commit 7677f7fd8be76 is just one that notices it, right? But there are other
> flags in >32 bit area, including pkeys etc. Sounds rather dangerous if they
> can be cleared using a madvise.
>
> So we can't amend the Fixes: now but maybe could advise stable to backport
> for even older versions than based on 7677f7fd8be76 ?

Yes, I agree.
--
Cheers
David