* Re: BUG: unable to handle kernel paging request in fuse_copy_do
[not found] <CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2rmz0HQ@mail.gmail.com>
@ 2024-03-22 13:50 ` Miklos Szeredi
2024-03-22 15:41 ` David Hildenbrand
0 siblings, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2024-03-22 13:50 UTC (permalink / raw)
To: xingwei lee
Cc: linux-fsdevel, linux-kernel, samsun1006219, syzkaller-bugs,
linux-mm, Mike Rapoport
[MM list + secretmem author CC-d]
On Thu, 21 Mar 2024 at 08:52, xingwei lee <xrivendell7@gmail.com> wrote:
>
> Hello I found a bug titled "BUG: unable to handle kernel paging
> request in fuse_copy_do" with modified syzkaller, and maybe it is
> related to fs/fuse.
> I also confirmed in the latest upstream.
>
> If you fix this issue, please add the following tag to the commit:
> Reported-by: xingwei lee <xrivendell7@gmail.com>
> Reported-by: yue sun <samsun1006219@gmail.com>
Thanks for the report. This looks like a secretmem vs get_user_pages issue.
I reduced the syz reproducer to a minimal one that isn't dependent on fuse:
=== repro.c ===
#define _GNU_SOURCE

#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/socket.h>

int main(void)
{
	int fd1, fd2, fd3;
	int pip[2];
	struct iovec iov;
	void *addr;

	fd1 = syscall(__NR_memfd_secret, 0);
	addr = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0);
	ftruncate(fd1, 7);
	fd2 = socket(AF_INET, SOCK_DGRAM, 0);
	getsockopt(fd2, 0, 0, NULL, addr);

	pipe(pip);
	iov.iov_base = addr;
	iov.iov_len = 0x50;
	vmsplice(pip[1], &iov, 1, 0);

	fd3 = open("/tmp/repro-secretmem.test", O_RDWR | O_CREAT, 0600);
	splice(pip[0], NULL, fd3, NULL, 0x50, 0);

	return 0;
}
=======
Thanks,
Miklos
>
> kernel: upstream 23956900041d968f9ad0f30db6dede4daccd7aa9
> kernel config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=9f47e8dfa53b0b11
> with KASAN enabled
> compiler: gcc (Debian 12.2.0-14) 12.2.0
>
> BUG: unable to handle kernel paging request in fuse_copy_do
> UDPLite: UDP-Lite is deprecated and scheduled to be removed in 2025,
> please contact the netdev mailing list
> BUG: unable to handle page fault for address: ffff88802c29c000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 13001067 P4D 13001067 PUD 13002067 PMD 24c8d063 PTE 800fffffd3d63060
> Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
> CPU: 1 PID: 8221 Comm: 1e9 Not tainted 6.8.0-05202-g9187210eee7d-dirty #21
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.16.2-1.fc38 04/01/2014
> RIP: 0010:memcpy+0xc/0x20 arch/x86/lib/memcpy_64.S:38
> Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 90
> 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 48 89 f80
> RSP: 0018:ffffc9001065f9c8 EFLAGS: 00010246
> RAX: ffffc9001065fb10 RBX: ffffc9001065fc78 RCX: 0000000000000010
> RDX: 0000000000000010 RSI: ffff88802c29c000 RDI: ffffc9001065fb10
> RBP: 0000000000000010 R08: ffff88802c29c000 R09: 0000000000000001
> R10: ffffffff8ea82ed7 R11: ffffc9001065fd98 R12: ffffc9001065fac0
> R13: 0000000000000010 R14: ffffc9001065faf0 R15: ffffc9001065fcbc
> FS: 000000000f82d480(0000) GS:ffff88823bc00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff88802c29c000 CR3: 000000002dd7c000 CR4: 0000000000750ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> fuse_copy_do+0x152/0x340 fs/fuse/dev.c:758
> fuse_copy_one fs/fuse/dev.c:1007 [inline]
> fuse_dev_do_write+0x1df/0x26a0 fs/fuse/dev.c:1863
> fuse_dev_write+0x129/0x1b0 fs/fuse/dev.c:1960
> call_write_iter include/linux/fs.h:2108 [inline]
> new_sync_write fs/read_write.c:497 [inline]
> vfs_write+0x62e/0x10a0 fs/read_write.c:590
> ksys_write+0xf6/0x1d0 fs/read_write.c:643
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0x7c/0x1d0 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x6c/0x74
>
> =* repro.c =*
> #define _GNU_SOURCE
>
> #include <dirent.h>
> #include <endian.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <setjmp.h>
> #include <signal.h>
> #include <stdarg.h>
> #include <stdbool.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/prctl.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <time.h>
> #include <unistd.h>
>
> #ifndef __NR_memfd_secret
> #define __NR_memfd_secret 447
> #endif
>
> static __thread int clone_ongoing;
> static __thread int skip_segv;
> static __thread jmp_buf segv_env;
>
> static void segv_handler(int sig, siginfo_t* info, void* ctx) {
> if (__atomic_load_n(&clone_ongoing, __ATOMIC_RELAXED) != 0) {
> exit(sig);
> }
> uintptr_t addr = (uintptr_t)info->si_addr;
> const uintptr_t prog_start = 1 << 20;
> const uintptr_t prog_end = 100 << 20;
> int skip = __atomic_load_n(&skip_segv, __ATOMIC_RELAXED) != 0;
> int valid = addr < prog_start || addr > prog_end;
> if (skip && valid) {
> _longjmp(segv_env, 1);
> }
> exit(sig);
> }
>
> static void install_segv_handler(void) {
> struct sigaction sa;
> memset(&sa, 0, sizeof(sa));
> sa.sa_handler = SIG_IGN;
> syscall(SYS_rt_sigaction, 0x20, &sa, NULL, 8);
> syscall(SYS_rt_sigaction, 0x21, &sa, NULL, 8);
> memset(&sa, 0, sizeof(sa));
> sa.sa_sigaction = segv_handler;
> sa.sa_flags = SA_NODEFER | SA_SIGINFO;
> sigaction(SIGSEGV, &sa, NULL);
> sigaction(SIGBUS, &sa, NULL);
> }
>
> #define NONFAILING(...) \
> ({ \
> int ok = 1; \
> __atomic_fetch_add(&skip_segv, 1, __ATOMIC_SEQ_CST); \
> if (_setjmp(segv_env) == 0) { \
> __VA_ARGS__; \
> } else \
> ok = 0; \
> __atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST); \
> ok; \
> })
>
> static void sleep_ms(uint64_t ms) {
> usleep(ms * 1000);
> }
>
> static uint64_t current_time_ms(void) {
> struct timespec ts;
> if (clock_gettime(CLOCK_MONOTONIC, &ts))
> exit(1);
> return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
> }
>
> static bool write_file(const char* file, const char* what, ...) {
> char buf[1024];
> va_list args;
> va_start(args, what);
> vsnprintf(buf, sizeof(buf), what, args);
> va_end(args);
> buf[sizeof(buf) - 1] = 0;
> int len = strlen(buf);
> int fd = open(file, O_WRONLY | O_CLOEXEC);
> if (fd == -1)
> return false;
> if (write(fd, buf, len) != len) {
> int err = errno;
> close(fd);
> errno = err;
> return false;
> }
> close(fd);
> return true;
> }
>
> static void kill_and_wait(int pid, int* status) {
> kill(-pid, SIGKILL);
> kill(pid, SIGKILL);
> for (int i = 0; i < 100; i++) {
> if (waitpid(-1, status, WNOHANG | __WALL) == pid)
> return;
> usleep(1000);
> }
> DIR* dir = opendir("/sys/fs/fuse/connections");
> if (dir) {
> for (;;) {
> struct dirent* ent = readdir(dir);
> if (!ent)
> break;
> if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
> continue;
> char abort[300];
> snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
> ent->d_name);
> int fd = open(abort, O_WRONLY);
> if (fd == -1) {
> continue;
> }
> if (write(fd, abort, 1) < 0) {
> }
> close(fd);
> }
> closedir(dir);
> } else {
> }
> while (waitpid(-1, status, __WALL) != pid) {
> }
> }
>
> static void setup_test() {
> prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
> setpgrp();
> write_file("/proc/self/oom_score_adj", "1000");
> }
>
> static void execute_one(void);
>
> #define WAIT_FLAGS __WALL
>
> static void loop(void) {
> int iter = 0;
> for (;; iter++) {
> int pid = fork();
> if (pid < 0)
> exit(1);
> if (pid == 0) {
> setup_test();
> execute_one();
> exit(0);
> }
> int status = 0;
> uint64_t start = current_time_ms();
> for (;;) {
> if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
> break;
> sleep_ms(1);
> if (current_time_ms() - start < 5000)
> continue;
> kill_and_wait(pid, &status);
> break;
> }
> }
> }
>
> uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};
>
> void execute_one(void) {
> intptr_t res = 0;
> NONFAILING(memcpy((void*)0x20002040, "./file0\000", 8));
> syscall(__NR_mkdirat, /*fd=*/0xffffff9c, /*path=*/0x20002040ul, /*mode=*/0ul);
> NONFAILING(memcpy((void*)0x20002080, "/dev/fuse\000", 10));
> res = syscall(__NR_openat, /*fd=*/0xffffffffffffff9cul, /*file=*/0x20002080ul,
> /*flags=*/2ul, /*mode=*/0ul);
> if (res != -1)
> r[0] = res;
> NONFAILING(memcpy((void*)0x200020c0, "./file0\000", 8));
> NONFAILING(memcpy((void*)0x20002100, "fuse\000", 5));
> NONFAILING(memcpy((void*)0x20002140, "fd", 2));
> NONFAILING(*(uint8_t*)0x20002142 = 0x3d);
> NONFAILING(sprintf((char*)0x20002143, "0x%016llx", (long long)r[0]));
> NONFAILING(*(uint8_t*)0x20002155 = 0x2c);
> NONFAILING(memcpy((void*)0x20002156, "rootmode", 8));
> NONFAILING(*(uint8_t*)0x2000215e = 0x3d);
> NONFAILING(sprintf((char*)0x2000215f, "%023llo", (long long)0x4000));
> NONFAILING(*(uint8_t*)0x20002176 = 0x2c);
> NONFAILING(memcpy((void*)0x20002177, "user_id", 7));
> NONFAILING(*(uint8_t*)0x2000217e = 0x3d);
> NONFAILING(sprintf((char*)0x2000217f, "%020llu", (long long)0));
> NONFAILING(*(uint8_t*)0x20002193 = 0x2c);
> NONFAILING(memcpy((void*)0x20002194, "group_id", 8));
> NONFAILING(*(uint8_t*)0x2000219c = 0x3d);
> NONFAILING(sprintf((char*)0x2000219d, "%020llu", (long long)0));
> NONFAILING(*(uint8_t*)0x200021b1 = 0x2c);
> NONFAILING(*(uint8_t*)0x200021b2 = 0);
> syscall(__NR_mount, /*src=*/0ul, /*dst=*/0x200020c0ul, /*type=*/0x20002100ul,
> /*flags=*/0ul, /*opts=*/0x20002140ul);
> res = syscall(__NR_memfd_secret, /*flags=*/0ul);
> if (res != -1)
> r[1] = res;
> syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0xb36000ul,
> /*prot=PROT_GROWSUP|PROT_READ*/ 0x2000001ul,
> /*flags=MAP_STACK|MAP_POPULATE|MAP_FIXED|MAP_SHARED*/ 0x28011ul,
> /*fd=*/r[1], /*offset=*/0ul);
> syscall(__NR_ftruncate, /*fd=*/r[1], /*len=*/7ul);
> res = syscall(__NR_socket, /*domain=*/2ul, /*type=*/2ul, /*proto=*/0x88);
> if (res != -1)
> r[2] = res;
> NONFAILING(*(uint32_t*)0x20000280 = 0);
> syscall(__NR_getsockopt, /*fd=*/r[2], /*level=*/1, /*optname=*/0x11,
> /*optval=*/0ul, /*optlen=*/0x20000280ul);
> NONFAILING(*(uint32_t*)0x20000000 = 0x50);
> NONFAILING(*(uint32_t*)0x20000004 = 0);
> NONFAILING(*(uint64_t*)0x20000008 = 0);
> NONFAILING(*(uint32_t*)0x20000010 = 7);
> NONFAILING(*(uint32_t*)0x20000014 = 0x27);
> NONFAILING(*(uint32_t*)0x20000018 = 0);
> NONFAILING(*(uint32_t*)0x2000001c = 0);
> NONFAILING(*(uint16_t*)0x20000020 = 0);
> NONFAILING(*(uint16_t*)0x20000022 = 0);
> NONFAILING(*(uint32_t*)0x20000024 = 0);
> NONFAILING(*(uint32_t*)0x20000028 = 0);
> NONFAILING(*(uint16_t*)0x2000002c = 0);
> NONFAILING(*(uint16_t*)0x2000002e = 0);
> NONFAILING(memset((void*)0x20000030, 0, 32));
> syscall(__NR_write, /*fd=*/r[0], /*arg=*/0x20000000ul, /*len=*/0x50ul);
> }
> int main(void) {
> syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> /*offset=*/0ul);
> syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
> /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> /*offset=*/0ul);
> syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> /*offset=*/0ul);
> install_segv_handler();
> loop();
> return 0;
> }
>
> =* repro.txt =*
> mkdirat(0xffffffffffffff9c, &(0x7f0000002040)='./file0\x00', 0x0)
> r0 = openat$fuse(0xffffffffffffff9c, &(0x7f0000002080), 0x2, 0x0)
> mount$fuse(0x0, &(0x7f00000020c0)='./file0\x00', &(0x7f0000002100),
> 0x0, &(0x7f0000002140)={{'fd', 0x3d, r0}, 0x2c, {'rootmode', 0x3d,
> 0x4000}})
> r1 = memfd_secret(0x0)
> mmap(&(0x7f0000000000/0xb36000)=nil, 0xb36000, 0x2000001, 0x28011, r1, 0x0)
> ftruncate(r1, 0x7)
> r2 = socket$inet_udplite(0x2, 0x2, 0x88)
> getsockopt$sock_cred(r2, 0x1, 0x11, 0x0, &(0x7f0000000280))
> write$FUSE_INIT(r0, &(0x7f0000000000)={0x50}, 0x50)
>
>
> see also https://gist.github.com/xrivendell7/961be96ae091c9671bb56efea902cec4.
>
> I hope it helps.
> best regards.
> xingwei Lee
* Re: BUG: unable to handle kernel paging request in fuse_copy_do
2024-03-22 13:50 ` BUG: unable to handle kernel paging request in fuse_copy_do Miklos Szeredi
@ 2024-03-22 15:41 ` David Hildenbrand
2024-03-22 19:46 ` Miklos Szeredi
0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-03-22 15:41 UTC (permalink / raw)
To: Miklos Szeredi, xingwei lee
Cc: linux-fsdevel, linux-kernel, samsun1006219, syzkaller-bugs,
linux-mm, Mike Rapoport
On 22.03.24 14:50, Miklos Szeredi wrote:
> [MM list + secretmem author CC-d]
>
> On Thu, 21 Mar 2024 at 08:52, xingwei lee <xrivendell7@gmail.com> wrote:
>>
>> Hello I found a bug titled "BUG: unable to handle kernel paging
>> request in fuse_copy_do" with modified syzkaller, and maybe it is
>> related to fs/fuse.
>> I also confirmed in the latest upstream.
>>
>> If you fix this issue, please add the following tag to the commit:
>> Reported-by: xingwei lee <xrivendell7@gmail.com>
>> Reported-by: yue sun <samsun1006219@gmail.com>
>
> Thanks for the report. This looks like a secretmem vs get_user_pages issue.
>
> I reduced the syz reproducer to a minimal one that isn't dependent on fuse:
>
> === repro.c ===
> #define _GNU_SOURCE
>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <sys/syscall.h>
> #include <sys/socket.h>
>
> int main(void)
> {
> int fd1, fd2, fd3;
> int pip[2];
> struct iovec iov;
> void *addr;
>
> fd1 = syscall(__NR_memfd_secret, 0);
> addr = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0);
> ftruncate(fd1, 7);
> fd2 = socket(AF_INET, SOCK_DGRAM, 0);
> getsockopt(fd2, 0, 0, NULL, addr);
>
> pipe(pip);
> iov.iov_base = addr;
> iov.iov_len = 0x50;
> vmsplice(pip[1], &iov, 1, 0);
pip[1] should be the write end. So it will be used as the source.
I assume we go the ITER_SOURCE path in vmsplice, and call
vmsplice_to_pipe(). Then we call iter_to_pipe().
I would expect iov_iter_get_pages2() -> get_user_pages_fast() to fail on
secretmem pages?
But at least the vmsplice() just seems to work. Which is weird, because
GUP-fast should not apply (page not faulted in?) and check_vma_flags()
bails out early on vma_is_secretmem(vma).
So something is not quite right.
>
> fd3 = open("/tmp/repro-secretmem.test", O_RDWR | O_CREAT, 0600);
> splice(pip[0], NULL, fd3, NULL, 0x50, 0);
>
> return 0;
> }
--
Cheers,
David / dhildenb
* Re: BUG: unable to handle kernel paging request in fuse_copy_do
2024-03-22 15:41 ` David Hildenbrand
@ 2024-03-22 19:46 ` Miklos Szeredi
[not found] ` <620f68b0-4fe0-4e3e-856a-dedb4bcdf3a7@redhat.com>
0 siblings, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2024-03-22 19:46 UTC (permalink / raw)
To: David Hildenbrand
Cc: xingwei lee, linux-fsdevel, linux-kernel, samsun1006219,
syzkaller-bugs, linux-mm, Mike Rapoport
On Fri, 22 Mar 2024 at 16:41, David Hildenbrand <david@redhat.com> wrote:
> But at least the vmsplice() just seems to work. Which is weird, because
> GUP-fast should not apply (page not faulted in?)
But it is faulted in, and that indeed seems to be the root cause.
Improved repro:
#define _GNU_SOURCE

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>

int main(void)
{
	int fd1, fd2;
	int pip[2];
	struct iovec iov;
	char *addr;
	int ret;

	fd1 = syscall(__NR_memfd_secret, 0);
	addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd1, 0);
	ftruncate(fd1, 7);
	addr[0] = 1; /* fault in page */

	pipe(pip);
	iov.iov_base = addr;
	iov.iov_len = 0x50;
	ret = vmsplice(pip[1], &iov, 1, 0);
	if (ret == -1 && errno == EFAULT) {
		printf("Success\n");
		return 0;
	}

	fd2 = open("/tmp/repro-secretmem.test", O_RDWR | O_CREAT, 0600);
	splice(pip[0], NULL, fd2, NULL, 0x50, 0);

	return 0;
}
* Re: BUG: unable to handle kernel paging request in fuse_copy_do
[not found] ` <620f68b0-4fe0-4e3e-856a-dedb4bcdf3a7@redhat.com>
@ 2024-03-22 21:13 ` Miklos Szeredi
2024-03-22 21:18 ` David Hildenbrand
0 siblings, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2024-03-22 21:13 UTC (permalink / raw)
To: David Hildenbrand
Cc: xingwei lee, linux-fsdevel, linux-kernel, samsun1006219,
syzkaller-bugs, linux-mm, Mike Rapoport
On Fri, 22 Mar 2024 at 22:08, David Hildenbrand <david@redhat.com> wrote:
>
> On 22.03.24 20:46, Miklos Szeredi wrote:
> > On Fri, 22 Mar 2024 at 16:41, David Hildenbrand <david@redhat.com> wrote:
> >
> >> But at least the vmsplice() just seems to work. Which is weird, because
> >> GUP-fast should not apply (page not faulted in?)
> >
> > But it is faulted in, and that indeed seems to be the root cause.
>
> secretmem mmap() won't populate the page tables. So it's not faulted in yet.
>
> When we GUP via vmsplice, GUP-fast should not find it in the page tables
> and fallback to slow GUP.
>
> There, we seem to pass check_vma_flags(), trigger faultin_page() to
> fault it in, and then find it via follow_page_mask().
>
> ... and I wonder how we manage to skip check_vma_flags(), or otherwise
> managed to GUP it.
>
> vmsplice() should, in theory, never succeed here.
>
> Weird :/
>
> > Improved repro:
> >
> > #define _GNU_SOURCE
> >
> > #include <fcntl.h>
> > #include <unistd.h>
> > #include <stdio.h>
> > #include <errno.h>
> > #include <sys/mman.h>
> > #include <sys/syscall.h>
> >
> > int main(void)
> > {
> > int fd1, fd2;
> > int pip[2];
> > struct iovec iov;
> > char *addr;
> > int ret;
> >
> > fd1 = syscall(__NR_memfd_secret, 0);
> > addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd1, 0);
> > ftruncate(fd1, 7);
> > addr[0] = 1; /* fault in page */
Here the page is faulted in and GUP-fast will find it. It's not in
the kernel page table, but it is in the user page table, which is what
matters for GUP.
Thanks,
Miklos
* Re: BUG: unable to handle kernel paging request in fuse_copy_do
2024-03-22 21:13 ` Miklos Szeredi
@ 2024-03-22 21:18 ` David Hildenbrand
2024-03-22 21:33 ` David Hildenbrand
0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-03-22 21:18 UTC (permalink / raw)
To: Miklos Szeredi
Cc: xingwei lee, linux-fsdevel, linux-kernel, samsun1006219,
syzkaller-bugs, linux-mm, Mike Rapoport
On 22.03.24 22:13, Miklos Szeredi wrote:
> On Fri, 22 Mar 2024 at 22:08, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 22.03.24 20:46, Miklos Szeredi wrote:
>>> On Fri, 22 Mar 2024 at 16:41, David Hildenbrand <david@redhat.com> wrote:
>>>
>>>> But at least the vmsplice() just seems to work. Which is weird, because
>>>> GUP-fast should not apply (page not faulted in?)
>>>
>>> But it is faulted in, and that indeed seems to be the root cause.
>>
>> secretmem mmap() won't populate the page tables. So it's not faulted in yet.
>>
>> When we GUP via vmsplice, GUP-fast should not find it in the page tables
>> and fallback to slow GUP.
>>
>> There, we seem to pass check_vma_flags(), trigger faultin_page() to
>> fault it in, and then find it via follow_page_mask().
>>
>> ... and I wonder how we manage to skip check_vma_flags(), or otherwise
>> managed to GUP it.
>>
>> vmsplice() should, in theory, never succeed here.
>>
>> Weird :/
>>
>>> Improved repro:
>>>
>>> #define _GNU_SOURCE
>>>
>>> #include <fcntl.h>
>>> #include <unistd.h>
>>> #include <stdio.h>
>>> #include <errno.h>
>>> #include <sys/mman.h>
>>> #include <sys/syscall.h>
>>>
>>> int main(void)
>>> {
>>> int fd1, fd2;
>>> int pip[2];
>>> struct iovec iov;
>>> char *addr;
>>> int ret;
>>>
>>> fd1 = syscall(__NR_memfd_secret, 0);
>>> addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd1, 0);
>>> ftruncate(fd1, 7);
>>> addr[0] = 1; /* fault in page */
>
> Here the page is faulted in and GUP-fast will find it. It's not in
> the kernel page table, but it is in the user page table, which is what
> matter for GUP.
Trust me, I know the GUP code very well :P
gup_pte_range -- GUP fast -- contains:
		if (unlikely(folio_is_secretmem(folio))) {
			gup_put_folio(folio, 1, flags);
			goto pte_unmap;
		}
So we "should" be rejecting any secretmem folios and fall back to GUP slow.
... we don't check the same in gup_huge_pmd(), but we shouldn't ever see
THP in secretmem code.
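The expected control flow described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not kernel code: `struct model_page` and the `model_gup*()` helpers are hypothetical names, and the only behaviour borrowed from the thread is that GUP-fast gives up on a secretmem folio and that the slow path's check_vma_flags() bails out on a secretmem VMA (modelled here as -EFAULT, the error the improved reproducer checks for).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state GUP looks at; not a kernel type. */
struct model_page {
	bool mapped_in_user_pt; /* PTE present in the user page table */
	bool secretmem_folio;   /* result of the folio-level secretmem test */
	bool secretmem_vma;     /* result of the VMA-level secretmem test */
};

/* Fast path model: returns 1 if the page was grabbed, 0 to punt to slow GUP. */
static int model_gup_fast(const struct model_page *p)
{
	if (!p->mapped_in_user_pt)
		return 0;       /* nothing to find without a present PTE */
	if (p->secretmem_folio)
		return 0;       /* drop the reference and fall back */
	return 1;
}

/* Slow path model: the VMA check rejects secretmem mappings outright. */
static int model_gup_slow(const struct model_page *p)
{
	if (p->secretmem_vma)
		return -EFAULT;
	return 1;               /* fault in if needed, then grab the page */
}

/* Combined model: try the fast path first, then the slow path. */
static int model_gup(const struct model_page *p)
{
	int ret;

	assert(p != NULL);
	ret = model_gup_fast(p);
	return ret ? ret : model_gup_slow(p);
}
```

In this model a faulted-in secretmem page only ends in -EFAULT if the folio-level test reports true; if that test wrongly reports false, the fast path returns 1 and the VMA check is never reached.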
--
Cheers,
David / dhildenb
* Re: BUG: unable to handle kernel paging request in fuse_copy_do
2024-03-22 21:18 ` David Hildenbrand
@ 2024-03-22 21:33 ` David Hildenbrand
[not found] ` <dd3e28b3-647c-4657-9c3f-9778bb046799@redhat.com>
0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-03-22 21:33 UTC (permalink / raw)
To: Miklos Szeredi
Cc: xingwei lee, linux-fsdevel, linux-kernel, samsun1006219,
syzkaller-bugs, linux-mm, Mike Rapoport
On 22.03.24 22:18, David Hildenbrand wrote:
> On 22.03.24 22:13, Miklos Szeredi wrote:
>> On Fri, 22 Mar 2024 at 22:08, David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 22.03.24 20:46, Miklos Szeredi wrote:
>>>> On Fri, 22 Mar 2024 at 16:41, David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>>> But at least the vmsplice() just seems to work. Which is weird, because
>>>>> GUP-fast should not apply (page not faulted in?)
>>>>
>>>> But it is faulted in, and that indeed seems to be the root cause.
>>>
>>> secretmem mmap() won't populate the page tables. So it's not faulted in yet.
>>>
>>> When we GUP via vmsplice, GUP-fast should not find it in the page tables
>>> and fallback to slow GUP.
>>>
>>> There, we seem to pass check_vma_flags(), trigger faultin_page() to
>>> fault it in, and then find it via follow_page_mask().
>>>
>>> ... and I wonder how we manage to skip check_vma_flags(), or otherwise
>>> managed to GUP it.
>>>
>>> vmsplice() should, in theory, never succeed here.
>>>
>>> Weird :/
>>>
>>>> Improved repro:
>>>>
>>>> #define _GNU_SOURCE
>>>>
>>>> #include <fcntl.h>
>>>> #include <unistd.h>
>>>> #include <stdio.h>
>>>> #include <errno.h>
>>>> #include <sys/mman.h>
>>>> #include <sys/syscall.h>
>>>>
>>>> int main(void)
>>>> {
>>>> int fd1, fd2;
>>>> int pip[2];
>>>> struct iovec iov;
>>>> char *addr;
>>>> int ret;
>>>>
>>>> fd1 = syscall(__NR_memfd_secret, 0);
>>>> addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd1, 0);
>>>> ftruncate(fd1, 7);
>>>> addr[0] = 1; /* fault in page */
>>
>> Here the page is faulted in and GUP-fast will find it. It's not in
>> the kernel page table, but it is in the user page table, which is what
>> matter for GUP.
>
> Trust me, I know the GUP code very well :P
>
> gup_pte_range -- GUP fast -- contains:
>
> if (unlikely(folio_is_secretmem(folio))) {
> gup_put_folio(folio, 1, flags);
> goto pte_unmap;
> }
>
> So we "should" be rejecting any secretmem folios and fallback to GUP slow.
>
>
> ... we don't check the same in gup_huge_pmd(), but we shouldn't ever see
> THP in secretmem code.
>
Ehm:
[ 29.441405] Secretmem fault: PFN: 1096177
[ 29.442092] GUP-fast: PFN: 1096177
... is folio_is_secretmem() broken?
... is it something "obvious" like:
diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index 35f3a4a8ceb1e..6996f1f53f147 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -16,7 +16,7 @@ static inline bool folio_is_secretmem(struct folio *folio)
 	 * We know that secretmem pages are not compound and LRU so we can
 	 * save a couple of cycles here.
 	 */
-	if (folio_test_large(folio) || !folio_test_lru(folio))
+	if (folio_test_large(folio) || folio_test_lru(folio))
 		return false;
 
 	mapping = (struct address_space *)
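To make the effect of the inverted test concrete, here is a minimal userspace model of the early-exit logic. It is a sketch, not the kernel function: `struct model_folio` and both helpers are hypothetical, and the final address_space comparison is reduced to a single boolean. The one property taken from the code comment is that secretmem folios are neither large nor on the LRU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the folio state the predicate inspects. */
struct model_folio {
	bool large;       /* compound (large) folio */
	bool lru;         /* folio is on an LRU list */
	bool secret_aops; /* mapping uses the secretmem address_space ops */
};

/* Pre-patch logic: bails out for every non-LRU folio, secretmem included. */
static bool is_secretmem_buggy(const struct model_folio *f)
{
	assert(f != NULL);
	if (f->large || !f->lru)
		return false;
	return f->secret_aops;
}

/* Post-patch logic: bails out only for large or LRU folios. */
static bool is_secretmem_fixed(const struct model_folio *f)
{
	assert(f != NULL);
	if (f->large || f->lru)
		return false;
	return f->secret_aops;
}
```

Feeding both helpers a folio with `large = false, lru = false, secret_aops = true` shows the difference: the pre-patch variant returns false before it ever looks at the mapping, while the patched variant reaches the mapping check and returns true.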
--
Cheers,
David / dhildenb
* Re: BUG: unable to handle kernel paging request in fuse_copy_do
[not found] ` <b40eb0b7-7362-4d19-95b3-e06435e6e09c@redhat.com>
@ 2024-03-24 10:29 ` Mike Rapoport
2024-03-25 11:21 ` Miklos Szeredi
1 sibling, 0 replies; 8+ messages in thread
From: Mike Rapoport @ 2024-03-24 10:29 UTC (permalink / raw)
To: David Hildenbrand
Cc: Miklos Szeredi, xingwei lee, linux-fsdevel, linux-kernel,
samsun1006219, syzkaller-bugs, linux-mm
On Fri, Mar 22, 2024 at 10:56:08PM +0100, David Hildenbrand wrote:
> On 22.03.24 22:37, David Hildenbrand wrote:
> > On 22.03.24 22:33, David Hildenbrand wrote:
> > > On 22.03.24 22:18, David Hildenbrand wrote:
> > > > On 22.03.24 22:13, Miklos Szeredi wrote:
> > > > > On Fri, 22 Mar 2024 at 22:08, David Hildenbrand <david@redhat.com> wrote:
> > > > > >
> > > > > > On 22.03.24 20:46, Miklos Szeredi wrote:
> > > > > > > On Fri, 22 Mar 2024 at 16:41, David Hildenbrand <david@redhat.com> wrote:
> > > > > > >
> > > > > > > > But at least the vmsplice() just seems to work. Which is weird, because
> > > > > > > > GUP-fast should not apply (page not faulted in?)
> > > > > > >
> > > > > > > But it is faulted in, and that indeed seems to be the root cause.
> > > > > >
> > > > > > secretmem mmap() won't populate the page tables. So it's not faulted in yet.
> > > > > >
> > > > > > When we GUP via vmsplice, GUP-fast should not find it in the page tables
> > > > > > and fallback to slow GUP.
> > > > > >
> > > > > > There, we seem to pass check_vma_flags(), trigger faultin_page() to
> > > > > > fault it in, and then find it via follow_page_mask().
> > > > > >
> > > > > > ... and I wonder how we manage to skip check_vma_flags(), or otherwise
> > > > > > managed to GUP it.
> > > > > >
> > > > > > vmsplice() should, in theory, never succeed here.
> > > > > >
> > > > > > Weird :/
> > > > > >
> > > > > > > Improved repro:
> > > > > > >
> > > > > > > #define _GNU_SOURCE
> > > > > > >
> > > > > > > #include <fcntl.h>
> > > > > > > #include <unistd.h>
> > > > > > > #include <stdio.h>
> > > > > > > #include <errno.h>
> > > > > > > #include <sys/mman.h>
> > > > > > > #include <sys/syscall.h>
> > > > > > >
> > > > > > > int main(void)
> > > > > > > {
> > > > > > > int fd1, fd2;
> > > > > > > int pip[2];
> > > > > > > struct iovec iov;
> > > > > > > char *addr;
> > > > > > > int ret;
> > > > > > >
> > > > > > > fd1 = syscall(__NR_memfd_secret, 0);
> > > > > > > addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd1, 0);
> > > > > > > ftruncate(fd1, 7);
> > > > > > > addr[0] = 1; /* fault in page */
> > > > >
> > > > > Here the page is faulted in and GUP-fast will find it. It's not in
> > > > > the kernel page table, but it is in the user page table, which is what
> > > > > matter for GUP.
> > > >
> > > > Trust me, I know the GUP code very well :P
> > > >
> > > > gup_pte_range -- GUP fast -- contains:
> > > >
> > > > if (unlikely(folio_is_secretmem(folio))) {
> > > > gup_put_folio(folio, 1, flags);
> > > > goto pte_unmap;
> > > > }
> > > >
> > > > So we "should" be rejecting any secretmem folios and fallback to GUP slow.
> > > >
> > > >
> > > > ... we don't check the same in gup_huge_pmd(), but we shouldn't ever see
> > > > THP in secretmem code.
> > > >
> > >
> > > Ehm:
> > >
> > > [ 29.441405] Secretmem fault: PFN: 1096177
> > > [ 29.442092] GUP-fast: PFN: 1096177
> > >
> > >
> > > ... is folio_is_secretmem() broken?
> > >
> > > ... is it something "obvious" like:
> > >
> > > diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
> > > index 35f3a4a8ceb1e..6996f1f53f147 100644
> > > --- a/include/linux/secretmem.h
> > > +++ b/include/linux/secretmem.h
> > > @@ -16,7 +16,7 @@ static inline bool folio_is_secretmem(struct folio *folio)
> > > * We know that secretmem pages are not compound and LRU so we can
> > > * save a couple of cycles here.
> > > */
> > > - if (folio_test_large(folio) || !folio_test_lru(folio))
> > > + if (folio_test_large(folio) || folio_test_lru(folio))
> > > return false;
> > > mapping = (struct address_space *)
> >
> > ... yes, that does the trick!
> >
>
> Proper patch (I might send out again on Monday "officially"). There are
> other improvements we want to do to folio_is_secretmem() in the light of
> folio_fast_pin_allowed(), that I wanted to do a while ago. I might send
> a patch for that as well now that I'm at it.
The most robust but a bit slower solution is to make folio_is_secretmem()
call folio_mapping() rather than open code the check.
What improvements did you have in mind?
> From 85558a46d9f249f26bd77dd3b18d14f248464845 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Fri, 22 Mar 2024 22:45:36 +0100
> Subject: [PATCH] mm/secretmem: fix GUP-fast succeeding on secretmem folios
>
> folio_is_secretmem() states that secretmem folios cannot be LRU folios:
> so we may only exit early if we find an LRU folio. Yet, we exit early if
> we find a folio that is not a secretmem folio.
>
> Consequently, folio_is_secretmem() fails to detect secretmem folios and,
> therefore, we can succeed in grabbing a secretmem folio during GUP-fast,
> crashing the kernel when we later try reading/writing to the folio, because
> the folio has been unmapped from the directmap.
>
> Reported-by: xingwei lee <xrivendell7@gmail.com>
> Reported-by: yue sun <samsun1006219@gmail.com>
> Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
> Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
> ---
> include/linux/secretmem.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
> index 35f3a4a8ceb1..6996f1f53f14 100644
> --- a/include/linux/secretmem.h
> +++ b/include/linux/secretmem.h
> @@ -16,7 +16,7 @@ static inline bool folio_is_secretmem(struct folio *folio)
> * We know that secretmem pages are not compound and LRU so we can
> * save a couple of cycles here.
> */
> - if (folio_test_large(folio) || !folio_test_lru(folio))
> + if (folio_test_large(folio) || folio_test_lru(folio))
> return false;
> mapping = (struct address_space *)
> --
> 2.43.2
>
>
> --
> Cheers,
>
> David / dhildenb
>
--
Sincerely yours,
Mike.
* Re: BUG: unable to handle kernel paging request in fuse_copy_do
[not found] ` <b40eb0b7-7362-4d19-95b3-e06435e6e09c@redhat.com>
2024-03-24 10:29 ` Mike Rapoport
@ 2024-03-25 11:21 ` Miklos Szeredi
1 sibling, 0 replies; 8+ messages in thread
From: Miklos Szeredi @ 2024-03-25 11:21 UTC (permalink / raw)
To: David Hildenbrand
Cc: xingwei lee, linux-fsdevel, linux-kernel, samsun1006219,
syzkaller-bugs, linux-mm, Mike Rapoport
On Fri, 22 Mar 2024 at 22:56, David Hildenbrand <david@redhat.com> wrote:
> From 85558a46d9f249f26bd77dd3b18d14f248464845 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Fri, 22 Mar 2024 22:45:36 +0100
> Subject: [PATCH] mm/secretmem: fix GUP-fast succeeding on secretmem folios
>
> folio_is_secretmem() states that secretmem folios cannot be LRU folios:
> so we may only exit early if we find an LRU folio. Yet, we exit early if
> we find a folio that is not a secretmem folio.
>
> Consequently, folio_is_secretmem() fails to detect secretmem folios and,
> therefore, we can succeed in grabbing a secretmem folio during GUP-fast,
> crashing the kernel when we later try reading/writing to the folio, because
> the folio has been unmapped from the directmap.
>
> Reported-by: xingwei lee <xrivendell7@gmail.com>
> Reported-by: yue sun <samsun1006219@gmail.com>
> Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
> Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Verified that it's no longer crashing with the reproducers.
Tested-by: Miklos Szeredi <mszeredi@redhat.com>