From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: syzbot <syzbot+d8426b591c36b21c750e@syzkaller.appspotmail.com>,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, pasha.tatashin@soleen.com,
	syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] [mm?] WARNING in __page_table_check_ptes_set
Date: Mon, 22 Apr 2024 09:28:42 -0400
Message-ID: <ZiZmCl3fTFfIYf1t@x1n>
In-Reply-To: <bbeb3704-e4a6-42fa-90e7-28de1e885249@redhat.com>

On Mon, Apr 22, 2024 at 01:46:20PM +0200, David Hildenbrand wrote:
> On 22.04.24 12:38, David Hildenbrand wrote:
> > On 22.04.24 12:07, David Hildenbrand wrote:
> > > On 21.04.24 22:16, syzbot wrote:
> > > > Hello,
> > > > 
> > > > syzbot found the following issue on:
> > > > 
> > > > HEAD commit:    4eab35893071 Add linux-next specific files for 20240417
> > > > git tree:       linux-next
> > > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=1727a61b180000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=27920e47287645ff
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=d8426b591c36b21c750e
> > > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=156da22d180000
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=163dfec7180000
> > > > 
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/9f7d6c097fb4/disk-4eab3589.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/287b16352982/vmlinux-4eab3589.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/23839c65c573/bzImage-4eab3589.xz
> > > > 
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+d8426b591c36b21c750e@syzkaller.appspotmail.com
> > > > 
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_pte mm/page_table_check.c:199 [inline]
> > > > WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_ptes_set+0x1db/0x420
> > > 
> > > I think this is
> > > 
> > > if (pte_present(pte) && pte_uffd_wp(pte))
> > > 	WARN_ON_ONCE(pte_write(pte));
> > > 
> > > mm/page_table_check.c:213
> > > > Modules linked in:
> > > > CPU: 0 PID: 5084 Comm: syz-executor382 Not tainted 6.9.0-rc4-next-20240417-syzkaller #0
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
> > > > RIP: 0010:__page_table_check_pte mm/page_table_check.c:199 [inline]
> > > > RIP: 0010:__page_table_check_ptes_set+0x1db/0x420 mm/page_table_check.c:213
> > > > Code: 48 8b 7c 24 40 48 c7 c6 80 19 46 8e e8 ee df 8e ff 41 83 fc 1d 74 18 41 83 fc 1a 75 1d e8 5d da 8e ff eb 10 e8 56 da 8e ff 90 <0f> 0b 90 eb 10 e8 4b da 8e ff 90 0f 0b 90 eb 05 e8 40 da 8e ff 48
> > > > RSP: 0018:ffffc9000366f740 EFLAGS: 00010293
> > > > RAX: ffffffff8207833a RBX: ffffc9000366f7c0 RCX: ffff888022af3c00
> > > > RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
> > > > RBP: ffffc9000366f830 R08: ffffffff820782af R09: 1ffffd40000a6a10
> > > > R10: dffffc0000000000 R11: fffff940000a6a11 R12: 0000000000000000
> > > > R13: 0000000014d42c67 R14: 0000000000000001 R15: 0000000000000000
> > > > FS:  0000555567f79380(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 000000000066c7e0 CR3: 0000000078cb0000 CR4: 00000000003506f0
> > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > Call Trace:
> > > >     <TASK>
> > > >     page_table_check_ptes_set include/linux/page_table_check.h:74 [inline]
> > > >     set_ptes include/linux/pgtable.h:267 [inline]
> > > >     __ptep_modify_prot_commit include/linux/pgtable.h:1269 [inline]
> > > >     ptep_modify_prot_commit include/linux/pgtable.h:1302 [inline]
> > > >     change_pte_range mm/mprotect.c:194 [inline]
> > > >     change_pmd_range mm/mprotect.c:424 [inline]
> > > >     change_pud_range mm/mprotect.c:457 [inline]
> > > >     change_p4d_range mm/mprotect.c:480 [inline]
> > > >     change_protection_range mm/mprotect.c:508 [inline]
> > > >     change_protection+0x2770/0x3cc0 mm/mprotect.c:542
> > > >     mprotect_fixup+0x740/0xa90 mm/mprotect.c:655
> > > >     do_mprotect_pkey+0x90d/0xe00 mm/mprotect.c:820
> > > >     __do_sys_mprotect mm/mprotect.c:841 [inline]
> > > >     __se_sys_mprotect mm/mprotect.c:838 [inline]
> > > >     __x64_sys_mprotect+0x80/0x90 mm/mprotect.c:838
> > > >     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > > >     do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
> > > >     entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > > RIP: 0033:0x7f45514bf429
> > > > Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > > RSP: 002b:00007ffe52191598 EFLAGS: 00000246 ORIG_RAX: 000000000000000a
> > > > RAX: ffffffffffffffda RBX: 00007ffe52191768 RCX: 00007f45514bf429
> > > > RDX: 000000000000000f RSI: 0000000000004000 RDI: 0000000020ffc000
> > > > RBP: 00007f4551532610 R08: 00007ffe52191768 R09: 00007ffe52191768
> > > > R10: 00007ffe52191768 R11: 0000000000000246 R12: 0000000000000001
> > > > R13: 00007ffe52191758 R14: 0000000000000001 R15: 0000000000000001
> > > >     </TASK>
> > > 
> > > Did we find a real issue that involves mprotect()?
> > > 
> > > At least can_change_pte_writable() should always return "false" for
> > > userfaultfd_pte_wp().
> > > 
> > > Do we maybe have a uffd-wp PTE outside of a UFFD_WP VMA?
> > > 
> > > Or was the PTE already writable and we only detect it now as we call
> > > mprotect()? (Did we fail to detect it earlier?)
> > 
> > Staring at the reproducer, we do
> > 
> > 
> >     syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> >             /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> >             /*offset=*/0ul);
> >     syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
> >             /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
> >             /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> >             /*offset=*/0ul);
> > 
> > -> Writable anonymous memory
> > 
> >     syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> >             /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> >             /*offset=*/0ul);
> >     intptr_t res = 0;
> >     res = syscall(__NR_userfaultfd,
> >                   /*flags=UFFD_USER_MODE_ONLY|O_NONBLOCK*/ 0x801ul);
> >     if (res != -1)
> >       r[0] = res;
> >     *(uint64_t*)0x200004c0 = 0xaa;
> >     *(uint64_t*)0x200004c8 = 0;
> >     *(uint64_t*)0x200004d0 = 0;
> >     syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc018aa3f, /*arg=*/0x200004c0ul);
> > 
> > -> _UFFDIO_API handshake?
> > 
> >     syscall(__NR_mprotect, /*addr=*/0x20ffc000ul, /*len=*/0x3000ul,
> >             /*prot=PROT_SEM|PROT_EXEC*/ 0xcul);
> > 
> > -> Protect target range R/O. I assume: no page populated yet?
> > -> 3 pages starting at 0x20ffc000ul;
> > 
> >     *(uint64_t*)0x20000180 = 0x20ffc000;
> >     *(uint64_t*)0x20000188 = 0x3000;
> >     *(uint64_t*)0x20000190 = 3;
> >     *(uint64_t*)0x20000198 = 0;
> >     syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc020aa00, /*arg=*/0x20000180ul);
> > 
> > -> _UFFDIO_REGISTER (aa00)
> > -> _range = 3 pages starting at 0x20ffc000ul
> > -> _mode = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP (3)
> > 
> >     *(uint64_t*)0x20000000 = 0x20ffd000;
> >     *(uint64_t*)0x20000008 = 0x20ffb000;
> >     *(uint64_t*)0x20000010 = 0x1000;
> >     *(uint64_t*)0x20000018 = 3;
> >     *(uint64_t*)0x20000020 = 0;
> >     syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc028aa03, /*arg=*/0x20000000ul);
> > 
> > -> _UFFDIO_COPY (aa03)
> > -> dst = 0x20ffd000
> > -> src = 0x20ffb000
> > -> len = 0x1000 (single page)
> > -> mode = UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP
> > 
> > -> We are copying into the R/O range. src should be R/W and trigger a page fault
> >      on access where we get a fresh page.
> > 
> >     *(uint16_t*)0x200000c0 = 1;
> >     *(uint64_t*)0x200000c8 = 0x20000040;
> >     *(uint16_t*)0x20000040 = 6;
> >     *(uint8_t*)0x20000042 = 0;
> >     *(uint8_t*)0x20000043 = 0;
> >     *(uint32_t*)0x20000044 = 0x7fffffff;
> >     res = syscall(__NR_seccomp, /*op=*/1ul, /*flags=*/0ul, /*arg=*/0x200000c0ul);
> >     if (res != -1)
> >       r[1] = res;
> >     syscall(__NR_open_tree, /*dfd=*/-1, /*filename=*/0ul, /*flags=*/0ul);
> > 
> > -> No idea what happens here and if it is relevant. If __NR_seccomp failed, we would
> >      not set r[1].
> > 
> >     syscall(__NR_close_range, /*fd=*/r[1], /*max_fd=*/-1, /*flags=*/0ul);
> > 
> > -> Is that closing uffd as well, especially if __NR_seccomp failed?
> > 
> >     syscall(__NR_mprotect, /*addr=*/0x20ffc000ul, /*len=*/0x4000ul,
> >             /*prot=PROT_SEM|PROT_WRITE|PROT_READ|PROT_EXEC*/ 0xful);
> > 
> > -> Restore write permissions. I assume this is what fires the uffd-wp page table check.
> 
> I think the issue is that userfaultfd_release() will clear the VMA UFFD_WP flag,
> but it will not clear PTE uffd-wp bits. So we have leftover PTE uffd-wp bits at
> the time we wr-unprotect.
> 
> I thought we removed that lazy handling, but looks like we didn't consider the
> "close uffd" case in:
> 
> commit f369b07c861435bd812a9d14493f71b34132ed6f
> Author: Peter Xu <peterx@redhat.com>
> Date:   Thu Aug 11 16:13:40 2022 -0400
> 
>     mm/uffd: reset write protection when unregister with wp-mode
> 
> 
> close should behave just like unregister.
> 
> 
> Simplified+readable reproducer:
> 
> #define _GNU_SOURCE
> 
> #include <stdint.h>
> #include <fcntl.h>
> #include <sys/syscall.h>
> #include <sys/mman.h>
> #include <sys/types.h>
> #include <sys/ioctl.h>
> #include <linux/userfaultfd.h>
> #include <unistd.h>
> 
> int main(void)
> {
>         void *src = mmap(0, 4096, PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>         void *dst = mmap(0, 4096, PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>         struct uffdio_register uffdio_register = {};
>         struct uffdio_copy uffdio_copy = {};
>         struct uffdio_api uffdio_api = {};
>         int uffd;
> 
>         uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
>         uffdio_api.api = UFFD_API;
>         ioctl(uffd, UFFDIO_API, &uffdio_api);
> 
>         uffdio_register.range.start = (uintptr_t)dst;
>         uffdio_register.range.len = 4096;
>         uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
>         ioctl(uffd, UFFDIO_REGISTER, &uffdio_register);
> 
>         uffdio_copy.dst = (uintptr_t)dst;
>         uffdio_copy.src = (uintptr_t)src;
>         uffdio_copy.len = 4096;
>         uffdio_copy.mode = UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP;
>         ioctl(uffd, UFFDIO_COPY, &uffdio_copy);
> 
>         close(uffd);
> 
>         mprotect(dst, 4096, PROT_READ|PROT_WRITE);
>         return 0;
> }

Thanks, I'll post a patch.

PS: next time feel free to try "strace ./reproducer"; it will decode the
raw syscall arguments for you, and I find it handy when working with
syzbot reproducers.

-- 
Peter Xu




Thread overview: 6+ messages
2024-04-21 20:16 [syzbot] [mm?] WARNING in __page_table_check_ptes_set syzbot
2024-04-22 10:07 ` David Hildenbrand
2024-04-22 10:38   ` David Hildenbrand
2024-04-22 11:46     ` David Hildenbrand
2024-04-22 13:28       ` Peter Xu [this message]
2024-04-22 15:10         ` David Hildenbrand
