Re: BUG: unable to handle kernel paging request in fuse_copy

linux-mm.kvack.org archive mirror

* Re: BUG: unable to handle kernel paging request in fuse_copy_do
       [not found] <CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2rmz0HQ@mail.gmail.com>
@ 2024-03-22 13:50 ` Miklos Szeredi
  2024-03-22 15:41   ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2024-03-22 13:50 UTC
  To: xingwei lee
  Cc: linux-fsdevel, linux-kernel, samsun1006219, syzkaller-bugs,
	linux-mm, Mike Rapoport

[MM list + secretmem author CC-d]

On Thu, 21 Mar 2024 at 08:52, xingwei lee <xrivendell7@gmail.com> wrote:
>
> Hello, I found a bug titled "BUG: unable to handle kernel paging
> request in fuse_copy_do" with a modified syzkaller; it may be
> related to fs/fuse.
> I also confirmed it on the latest upstream.
>
> If you fix this issue, please add the following tag to the commit:
> Reported-by: xingwei lee <xrivendell7@gmail.com>
> Reported-by: yue sun <samsun1006219@gmail.com>

Thanks for the report.   This looks like a secretmem vs get_user_pages issue.

I reduced the syz reproducer to a minimal one that isn't dependent on fuse:

=== repro.c ===
#define _GNU_SOURCE

#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/socket.h>

int main(void)
{
        int fd1, fd2, fd3;
        int pip[2];
        struct iovec iov;
        void *addr;

        fd1 = syscall(__NR_memfd_secret, 0);
        addr = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0);
        ftruncate(fd1, 7);
        fd2 = socket(AF_INET, SOCK_DGRAM, 0);
        getsockopt(fd2, 0, 0, NULL, addr);

        pipe(pip);
        iov.iov_base = addr;
        iov.iov_len = 0x50;
        vmsplice(pip[1], &iov, 1, 0);

        fd3 = open("/tmp/repro-secretmem.test", O_RDWR | O_CREAT, 0600);
        splice(pip[0], NULL, fd3, NULL, 0x50, 0);

        return 0;
}
=======

Thanks,
Miklos

>
> kernel: upstream 23956900041d968f9ad0f30db6dede4daccd7aa9
> kernel config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=9f47e8dfa53b0b11
> with KASAN enabled
> compiler: gcc (Debian 12.2.0-14) 12.2.0
>
> BUG: unable to handle kernel paging request in fuse_copy_do
> UDPLite: UDP-Lite is deprecated and scheduled to be removed in 2025,
> please contact the netdev mailing list
> BUG: unable to handle page fault for address: ffff88802c29c000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 13001067 P4D 13001067 PUD 13002067 PMD 24c8d063 PTE 800fffffd3d63060
> Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
> CPU: 1 PID: 8221 Comm: 1e9 Not tainted 6.8.0-05202-g9187210eee7d-dirty #21
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.16.2-1.fc38 04/01/2014
> RIP: 0010:memcpy+0xc/0x20 arch/x86/lib/memcpy_64.S:38
> Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 90
> 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 48 89 f80
> RSP: 0018:ffffc9001065f9c8 EFLAGS: 00010246
> RAX: ffffc9001065fb10 RBX: ffffc9001065fc78 RCX: 0000000000000010
> RDX: 0000000000000010 RSI: ffff88802c29c000 RDI: ffffc9001065fb10
> RBP: 0000000000000010 R08: ffff88802c29c000 R09: 0000000000000001
> R10: ffffffff8ea82ed7 R11: ffffc9001065fd98 R12: ffffc9001065fac0
> R13: 0000000000000010 R14: ffffc9001065faf0 R15: ffffc9001065fcbc
> FS: 000000000f82d480(0000) GS:ffff88823bc00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff88802c29c000 CR3: 000000002dd7c000 CR4: 0000000000750ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> fuse_copy_do+0x152/0x340 fs/fuse/dev.c:758
> fuse_copy_one fs/fuse/dev.c:1007 [inline]
> fuse_dev_do_write+0x1df/0x26a0 fs/fuse/dev.c:1863
> fuse_dev_write+0x129/0x1b0 fs/fuse/dev.c:1960
> call_write_iter include/linux/fs.h:2108 [inline]
> new_sync_write fs/read_write.c:497 [inline]
> vfs_write+0x62e/0x10a0 fs/read_write.c:590
> ksys_write+0xf6/0x1d0 fs/read_write.c:643
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0x7c/0x1d0 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x6c/0x74
>
> =* repro.c =*
> #define _GNU_SOURCE
>
> #include <dirent.h>
> #include <endian.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <setjmp.h>
> #include <signal.h>
> #include <stdarg.h>
> #include <stdbool.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/prctl.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <time.h>
> #include <unistd.h>
>
> #ifndef __NR_memfd_secret
> #define __NR_memfd_secret 447
> #endif
>
> static __thread int clone_ongoing;
> static __thread int skip_segv;
> static __thread jmp_buf segv_env;
>
> static void segv_handler(int sig, siginfo_t* info, void* ctx) {
>  if (__atomic_load_n(&clone_ongoing, __ATOMIC_RELAXED) != 0) {
>    exit(sig);
>  }
>  uintptr_t addr = (uintptr_t)info->si_addr;
>  const uintptr_t prog_start = 1 << 20;
>  const uintptr_t prog_end = 100 << 20;
>  int skip = __atomic_load_n(&skip_segv, __ATOMIC_RELAXED) != 0;
>  int valid = addr < prog_start || addr > prog_end;
>  if (skip && valid) {
>    _longjmp(segv_env, 1);
>  }
>  exit(sig);
> }
>
> static void install_segv_handler(void) {
>  struct sigaction sa;
>  memset(&sa, 0, sizeof(sa));
>  sa.sa_handler = SIG_IGN;
>  syscall(SYS_rt_sigaction, 0x20, &sa, NULL, 8);
>  syscall(SYS_rt_sigaction, 0x21, &sa, NULL, 8);
>  memset(&sa, 0, sizeof(sa));
>  sa.sa_sigaction = segv_handler;
>  sa.sa_flags = SA_NODEFER | SA_SIGINFO;
>  sigaction(SIGSEGV, &sa, NULL);
>  sigaction(SIGBUS, &sa, NULL);
> }
>
> #define NONFAILING(...)                                  \
>  ({                                                     \
>    int ok = 1;                                          \
>    __atomic_fetch_add(&skip_segv, 1, __ATOMIC_SEQ_CST); \
>    if (_setjmp(segv_env) == 0) {                        \
>      __VA_ARGS__;                                       \
>    } else                                               \
>      ok = 0;                                            \
>    __atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST); \
>    ok;                                                  \
>  })
>
> static void sleep_ms(uint64_t ms) {
>  usleep(ms * 1000);
> }
>
> static uint64_t current_time_ms(void) {
>  struct timespec ts;
>  if (clock_gettime(CLOCK_MONOTONIC, &ts))
>    exit(1);
>  return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
> }
>
> static bool write_file(const char* file, const char* what, ...) {
>  char buf[1024];
>  va_list args;
>  va_start(args, what);
>  vsnprintf(buf, sizeof(buf), what, args);
>  va_end(args);
>  buf[sizeof(buf) - 1] = 0;
>  int len = strlen(buf);
>  int fd = open(file, O_WRONLY | O_CLOEXEC);
>  if (fd == -1)
>    return false;
>  if (write(fd, buf, len) != len) {
>    int err = errno;
>    close(fd);
>    errno = err;
>    return false;
>  }
>  close(fd);
>  return true;
> }
>
> static void kill_and_wait(int pid, int* status) {
>  kill(-pid, SIGKILL);
>  kill(pid, SIGKILL);
>  for (int i = 0; i < 100; i++) {
>    if (waitpid(-1, status, WNOHANG | __WALL) == pid)
>      return;
>    usleep(1000);
>  }
>  DIR* dir = opendir("/sys/fs/fuse/connections");
>  if (dir) {
>    for (;;) {
>      struct dirent* ent = readdir(dir);
>      if (!ent)
>        break;
>      if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
>        continue;
>      char abort[300];
>      snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
>               ent->d_name);
>      int fd = open(abort, O_WRONLY);
>      if (fd == -1) {
>        continue;
>      }
>      if (write(fd, abort, 1) < 0) {
>      }
>      close(fd);
>    }
>    closedir(dir);
>  } else {
>  }
>  while (waitpid(-1, status, __WALL) != pid) {
>  }
> }
>
> static void setup_test() {
>  prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
>  setpgrp();
>  write_file("/proc/self/oom_score_adj", "1000");
> }
>
> static void execute_one(void);
>
> #define WAIT_FLAGS __WALL
>
> static void loop(void) {
>  int iter = 0;
>  for (;; iter++) {
>    int pid = fork();
>    if (pid < 0)
>      exit(1);
>    if (pid == 0) {
>      setup_test();
>      execute_one();
>      exit(0);
>    }
>    int status = 0;
>    uint64_t start = current_time_ms();
>    for (;;) {
>      if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
>        break;
>      sleep_ms(1);
>      if (current_time_ms() - start < 5000)
>        continue;
>      kill_and_wait(pid, &status);
>      break;
>    }
>  }
> }
>
> uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};
>
> void execute_one(void) {
>  intptr_t res = 0;
>  NONFAILING(memcpy((void*)0x20002040, "./file0\000", 8));
>  syscall(__NR_mkdirat, /*fd=*/0xffffff9c, /*path=*/0x20002040ul, /*mode=*/0ul);
>  NONFAILING(memcpy((void*)0x20002080, "/dev/fuse\000", 10));
>  res = syscall(__NR_openat, /*fd=*/0xffffffffffffff9cul, /*file=*/0x20002080ul,
>                /*flags=*/2ul, /*mode=*/0ul);
>  if (res != -1)
>    r[0] = res;
>  NONFAILING(memcpy((void*)0x200020c0, "./file0\000", 8));
>  NONFAILING(memcpy((void*)0x20002100, "fuse\000", 5));
>  NONFAILING(memcpy((void*)0x20002140, "fd", 2));
>  NONFAILING(*(uint8_t*)0x20002142 = 0x3d);
>  NONFAILING(sprintf((char*)0x20002143, "0x%016llx", (long long)r[0]));
>  NONFAILING(*(uint8_t*)0x20002155 = 0x2c);
>  NONFAILING(memcpy((void*)0x20002156, "rootmode", 8));
>  NONFAILING(*(uint8_t*)0x2000215e = 0x3d);
>  NONFAILING(sprintf((char*)0x2000215f, "%023llo", (long long)0x4000));
>  NONFAILING(*(uint8_t*)0x20002176 = 0x2c);
>  NONFAILING(memcpy((void*)0x20002177, "user_id", 7));
>  NONFAILING(*(uint8_t*)0x2000217e = 0x3d);
>  NONFAILING(sprintf((char*)0x2000217f, "%020llu", (long long)0));
>  NONFAILING(*(uint8_t*)0x20002193 = 0x2c);
>  NONFAILING(memcpy((void*)0x20002194, "group_id", 8));
>  NONFAILING(*(uint8_t*)0x2000219c = 0x3d);
>  NONFAILING(sprintf((char*)0x2000219d, "%020llu", (long long)0));
>  NONFAILING(*(uint8_t*)0x200021b1 = 0x2c);
>  NONFAILING(*(uint8_t*)0x200021b2 = 0);
>  syscall(__NR_mount, /*src=*/0ul, /*dst=*/0x200020c0ul, /*type=*/0x20002100ul,
>          /*flags=*/0ul, /*opts=*/0x20002140ul);
>  res = syscall(__NR_memfd_secret, /*flags=*/0ul);
>  if (res != -1)
>    r[1] = res;
>  syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0xb36000ul,
>          /*prot=PROT_GROWSUP|PROT_READ*/ 0x2000001ul,
>          /*flags=MAP_STACK|MAP_POPULATE|MAP_FIXED|MAP_SHARED*/ 0x28011ul,
>          /*fd=*/r[1], /*offset=*/0ul);
>  syscall(__NR_ftruncate, /*fd=*/r[1], /*len=*/7ul);
>  res = syscall(__NR_socket, /*domain=*/2ul, /*type=*/2ul, /*proto=*/0x88);
>  if (res != -1)
>    r[2] = res;
>  NONFAILING(*(uint32_t*)0x20000280 = 0);
>  syscall(__NR_getsockopt, /*fd=*/r[2], /*level=*/1, /*optname=*/0x11,
>          /*optval=*/0ul, /*optlen=*/0x20000280ul);
>  NONFAILING(*(uint32_t*)0x20000000 = 0x50);
>  NONFAILING(*(uint32_t*)0x20000004 = 0);
>  NONFAILING(*(uint64_t*)0x20000008 = 0);
>  NONFAILING(*(uint32_t*)0x20000010 = 7);
>  NONFAILING(*(uint32_t*)0x20000014 = 0x27);
>  NONFAILING(*(uint32_t*)0x20000018 = 0);
>  NONFAILING(*(uint32_t*)0x2000001c = 0);
>  NONFAILING(*(uint16_t*)0x20000020 = 0);
>  NONFAILING(*(uint16_t*)0x20000022 = 0);
>  NONFAILING(*(uint32_t*)0x20000024 = 0);
>  NONFAILING(*(uint32_t*)0x20000028 = 0);
>  NONFAILING(*(uint16_t*)0x2000002c = 0);
>  NONFAILING(*(uint16_t*)0x2000002e = 0);
>  NONFAILING(memset((void*)0x20000030, 0, 32));
>  syscall(__NR_write, /*fd=*/r[0], /*arg=*/0x20000000ul, /*len=*/0x50ul);
> }
> int main(void) {
>  syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>          /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
>          /*offset=*/0ul);
>  syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
>          /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
>          /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
>          /*offset=*/0ul);
>  syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>          /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
>          /*offset=*/0ul);
>  install_segv_handler();
>  loop();
>  return 0;
> }
>
> =* repro.txt =*
> mkdirat(0xffffffffffffff9c, &(0x7f0000002040)='./file0\x00', 0x0)
> r0 = openat$fuse(0xffffffffffffff9c, &(0x7f0000002080), 0x2, 0x0)
> mount$fuse(0x0, &(0x7f00000020c0)='./file0\x00', &(0x7f0000002100),
> 0x0, &(0x7f0000002140)={{'fd', 0x3d, r0}, 0x2c, {'rootmode', 0x3d,
> 0x4000}})
> r1 = memfd_secret(0x0)
> mmap(&(0x7f0000000000/0xb36000)=nil, 0xb36000, 0x2000001, 0x28011, r1, 0x0)
> ftruncate(r1, 0x7)
> r2 = socket$inet_udplite(0x2, 0x2, 0x88)
> getsockopt$sock_cred(r2, 0x1, 0x11, 0x0, &(0x7f0000000280))
> write$FUSE_INIT(r0, &(0x7f0000000000)={0x50}, 0x50)
>
>
> See also https://gist.github.com/xrivendell7/961be96ae091c9671bb56efea902cec4.
>
> I hope it helps.
> best regards.
> xingwei Lee



* Re: BUG: unable to handle kernel paging request in fuse_copy_do
  2024-03-22 13:50 ` BUG: unable to handle kernel paging request in fuse_copy_do Miklos Szeredi
@ 2024-03-22 15:41   ` David Hildenbrand
  2024-03-22 19:46     ` Miklos Szeredi
  0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-03-22 15:41 UTC
  To: Miklos Szeredi, xingwei lee
  Cc: linux-fsdevel, linux-kernel, samsun1006219, syzkaller-bugs,
	linux-mm, Mike Rapoport

On 22.03.24 14:50, Miklos Szeredi wrote:
> [MM list + secretmem author CC-d]
> 
> On Thu, 21 Mar 2024 at 08:52, xingwei lee <xrivendell7@gmail.com> wrote:
>>
>> Hello, I found a bug titled "BUG: unable to handle kernel paging
>> request in fuse_copy_do" with a modified syzkaller; it may be
>> related to fs/fuse.
>> I also confirmed it on the latest upstream.
>>
>> If you fix this issue, please add the following tag to the commit:
>> Reported-by: xingwei lee <xrivendell7@gmail.com>
>> Reported-by: yue sun <samsun1006219@gmail.com>
> 
> Thanks for the report.   This looks like a secretmem vs get_user_pages issue.
> 
> I reduced the syz reproducer to a minimal one that isn't dependent on fuse:
> 
> === repro.c ===
> #define _GNU_SOURCE
> 
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <sys/syscall.h>
> #include <sys/socket.h>
> 
> int main(void)
> {
>          int fd1, fd2, fd3;
>          int pip[2];
>          struct iovec iov;
>          void *addr;
> 
>          fd1 = syscall(__NR_memfd_secret, 0);
>          addr = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0);
>          ftruncate(fd1, 7);
>          fd2 = socket(AF_INET, SOCK_DGRAM, 0);
>          getsockopt(fd2, 0, 0, NULL, addr);
> 
>          pipe(pip);
>          iov.iov_base = addr;
>          iov.iov_len = 0x50;
>          vmsplice(pip[1], &iov, 1, 0);

pip[1] should be the write end. So it will be used as the source.

I assume we go the ITER_SOURCE path in vmsplice, and call 
vmsplice_to_pipe(). Then we call iter_to_pipe().

I would expect iov_iter_get_pages2() -> get_user_pages_fast() to fail on 
secretmem pages?

But at least the vmsplice() just seems to work. Which is weird, because 
GUP-fast should not apply (page not faulted in?) and check_vma_flags() 
bails out early on vma_is_secretmem(vma).

So something is not quite right.

> 
>          fd3 = open("/tmp/repro-secretmem.test", O_RDWR | O_CREAT, 0600);
>          splice(pip[0], NULL, fd3, NULL, 0x50, 0);
> 
>          return 0;
> }




-- 
Cheers,

David / dhildenb




* Re: BUG: unable to handle kernel paging request in fuse_copy_do
  2024-03-22 15:41   ` David Hildenbrand
@ 2024-03-22 19:46     ` Miklos Szeredi
       [not found]       ` <620f68b0-4fe0-4e3e-856a-dedb4bcdf3a7@redhat.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2024-03-22 19:46 UTC
  To: David Hildenbrand
  Cc: xingwei lee, linux-fsdevel, linux-kernel, samsun1006219,
	syzkaller-bugs, linux-mm, Mike Rapoport

On Fri, 22 Mar 2024 at 16:41, David Hildenbrand <david@redhat.com> wrote:

> But at least the vmsplice() just seems to work. Which is weird, because
> GUP-fast should not apply (page not faulted in?)

But it is faulted in, and that indeed seems to be the root cause.
Improved repro:

#define _GNU_SOURCE

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>

int main(void)
{
        int fd1, fd2;
        int pip[2];
        struct iovec iov;
        char *addr;
        int ret;

        fd1 = syscall(__NR_memfd_secret, 0);
        addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd1, 0);
        ftruncate(fd1, 7);
        addr[0] = 1; /* fault in page */
        pipe(pip);
        iov.iov_base = addr;
        iov.iov_len = 0x50;
        ret = vmsplice(pip[1], &iov, 1, 0);
        if (ret == -1 && errno == EFAULT) {
                printf("Success\n");
                return 0;
        }

        fd2 = open("/tmp/repro-secretmem.test", O_RDWR | O_CREAT, 0600);
        splice(pip[0], NULL, fd2, NULL, 0x50, 0);

        return 0;
}



* Re: BUG: unable to handle kernel paging request in fuse_copy_do
       [not found]       ` <620f68b0-4fe0-4e3e-856a-dedb4bcdf3a7@redhat.com>
@ 2024-03-22 21:13         ` Miklos Szeredi
  2024-03-22 21:18           ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2024-03-22 21:13 UTC
  To: David Hildenbrand
  Cc: xingwei lee, linux-fsdevel, linux-kernel, samsun1006219,
	syzkaller-bugs, linux-mm, Mike Rapoport

On Fri, 22 Mar 2024 at 22:08, David Hildenbrand <david@redhat.com> wrote:
>
> On 22.03.24 20:46, Miklos Szeredi wrote:
> > On Fri, 22 Mar 2024 at 16:41, David Hildenbrand <david@redhat.com> wrote:
> >
> >> But at least the vmsplice() just seems to work. Which is weird, because
> >> GUP-fast should not apply (page not faulted in?)
> >
> > But it is faulted in, and that indeed seems to be the root cause.
>
> secretmem mmap() won't populate the page tables. So it's not faulted in yet.
>
> When we GUP via vmsplice, GUP-fast should not find it in the page tables
> and fallback to slow GUP.
>
> There, we seem to pass check_vma_flags(), trigger faultin_page() to
> fault it in, and then find it via follow_page_mask().
>
> ... and I wonder how we manage to skip check_vma_flags(), or otherwise
> managed to GUP it.
>
> vmsplice() should, in theory, never succeed here.
>
> Weird :/
>
> > Improved repro:
> >
> > #define _GNU_SOURCE
> >
> > #include <fcntl.h>
> > #include <unistd.h>
> > #include <stdio.h>
> > #include <errno.h>
> > #include <sys/mman.h>
> > #include <sys/syscall.h>
> >
> > int main(void)
> > {
> >          int fd1, fd2;
> >          int pip[2];
> >          struct iovec iov;
> >          char *addr;
> >          int ret;
> >
> >          fd1 = syscall(__NR_memfd_secret, 0);
> >          addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd1, 0);
> >          ftruncate(fd1, 7);
> >          addr[0] = 1; /* fault in page */

Here the page is faulted in and GUP-fast will find it.  It's not in
the kernel page table, but it is in the user page table, which is what
matters for GUP.

Thanks,
Miklos



* Re: BUG: unable to handle kernel paging request in fuse_copy_do
  2024-03-22 21:13         ` Miklos Szeredi
@ 2024-03-22 21:18           ` David Hildenbrand
  2024-03-22 21:33             ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-03-22 21:18 UTC
  To: Miklos Szeredi
  Cc: xingwei lee, linux-fsdevel, linux-kernel, samsun1006219,
	syzkaller-bugs, linux-mm, Mike Rapoport

On 22.03.24 22:13, Miklos Szeredi wrote:
> On Fri, 22 Mar 2024 at 22:08, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 22.03.24 20:46, Miklos Szeredi wrote:
>>> On Fri, 22 Mar 2024 at 16:41, David Hildenbrand <david@redhat.com> wrote:
>>>
>>>> But at least the vmsplice() just seems to work. Which is weird, because
>>>> GUP-fast should not apply (page not faulted in?)
>>>
>>> But it is faulted in, and that indeed seems to be the root cause.
>>
>> secretmem mmap() won't populate the page tables. So it's not faulted in yet.
>>
>> When we GUP via vmsplice, GUP-fast should not find it in the page tables
>> and fallback to slow GUP.
>>
>> There, we seem to pass check_vma_flags(), trigger faultin_page() to
>> fault it in, and then find it via follow_page_mask().
>>
>> ... and I wonder how we manage to skip check_vma_flags(), or otherwise
>> managed to GUP it.
>>
>> vmsplice() should, in theory, never succeed here.
>>
>> Weird :/
>>
>>> Improved repro:
>>>
>>> #define _GNU_SOURCE
>>>
>>> #include <fcntl.h>
>>> #include <unistd.h>
>>> #include <stdio.h>
>>> #include <errno.h>
>>> #include <sys/mman.h>
>>> #include <sys/syscall.h>
>>>
>>> int main(void)
>>> {
>>>           int fd1, fd2;
>>>           int pip[2];
>>>           struct iovec iov;
>>>           char *addr;
>>>           int ret;
>>>
>>>           fd1 = syscall(__NR_memfd_secret, 0);
>>>           addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd1, 0);
>>>           ftruncate(fd1, 7);
>>>           addr[0] = 1; /* fault in page */
> 
> Here the page is faulted in and GUP-fast will find it.  It's not in
> the kernel page table, but it is in the user page table, which is what
> matters for GUP.

Trust me, I know the GUP code very well :P

gup_pte_range -- GUP fast -- contains:

if (unlikely(folio_is_secretmem(folio))) {
	gup_put_folio(folio, 1, flags);
	goto pte_unmap;
}

So we "should" be rejecting any secretmem folios and fall back to GUP slow.


... we don't check the same in gup_huge_pmd(), but we shouldn't ever see 
THP in secretmem code.
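
The expected flow described above can be modelled in a few lines of userspace C. This is only a sketch, not kernel code: the toy_* names and the struct layout are made up for illustration, and the -EFAULT return is an assumption taken from the improved reproducer in this thread, which treats EFAULT from vmsplice() as the correct outcome.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy description of one user page; both fields are hypothetical. */
struct toy_page {
	bool present_in_user_pt; /* user PTE populated, i.e. already faulted in */
	bool secretmem;          /* what a working folio_is_secretmem() would report */
};

/* Fast path: returns 1 if the page was pinned, 0 to request the slow path. */
static int toy_gup_fast(const struct toy_page *page)
{
	assert(page != NULL);
	if (!page->present_in_user_pt)
		return 0; /* nothing mapped yet, the slow path has to fault it in */
	if (page->secretmem)
		return 0; /* models the folio_is_secretmem() bail-out in gup_pte_range */
	return 1;
}

/* Slow path: models check_vma_flags() refusing a secretmem VMA. */
static int toy_gup_slow(bool vma_is_secretmem)
{
	return vma_is_secretmem ? -EFAULT : 1;
}

/* Try the fast path first and fall back to the slow path if it pins nothing. */
static int toy_gup(const struct toy_page *page, bool vma_is_secretmem)
{
	int pinned = toy_gup_fast(page);

	return pinned ? pinned : toy_gup_slow(vma_is_secretmem);
}
```

In this model a faulted-in secretmem page never gets pinned: the fast path declines it and the slow path returns -EFAULT. A successful vmsplice() therefore means the fast-path check did not fire.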

-- 
Cheers,

David / dhildenb




* Re: BUG: unable to handle kernel paging request in fuse_copy_do
  2024-03-22 21:18           ` David Hildenbrand
@ 2024-03-22 21:33             ` David Hildenbrand
       [not found]               ` <dd3e28b3-647c-4657-9c3f-9778bb046799@redhat.com>
  0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-03-22 21:33 UTC
  To: Miklos Szeredi
  Cc: xingwei lee, linux-fsdevel, linux-kernel, samsun1006219,
	syzkaller-bugs, linux-mm, Mike Rapoport

On 22.03.24 22:18, David Hildenbrand wrote:
> On 22.03.24 22:13, Miklos Szeredi wrote:
>> On Fri, 22 Mar 2024 at 22:08, David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 22.03.24 20:46, Miklos Szeredi wrote:
>>>> On Fri, 22 Mar 2024 at 16:41, David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>>> But at least the vmsplice() just seems to work. Which is weird, because
>>>>> GUP-fast should not apply (page not faulted in?)
>>>>
>>>> But it is faulted in, and that indeed seems to be the root cause.
>>>
>>> secretmem mmap() won't populate the page tables. So it's not faulted in yet.
>>>
>>> When we GUP via vmsplice, GUP-fast should not find it in the page tables
>>> and fallback to slow GUP.
>>>
>>> There, we seem to pass check_vma_flags(), trigger faultin_page() to
>>> fault it in, and then find it via follow_page_mask().
>>>
>>> ... and I wonder how we manage to skip check_vma_flags(), or otherwise
>>> managed to GUP it.
>>>
>>> vmsplice() should, in theory, never succeed here.
>>>
>>> Weird :/
>>>
>>>> Improved repro:
>>>>
>>>> #define _GNU_SOURCE
>>>>
>>>> #include <fcntl.h>
>>>> #include <unistd.h>
>>>> #include <stdio.h>
>>>> #include <errno.h>
>>>> #include <sys/mman.h>
>>>> #include <sys/syscall.h>
>>>>
>>>> int main(void)
>>>> {
>>>>            int fd1, fd2;
>>>>            int pip[2];
>>>>            struct iovec iov;
>>>>            char *addr;
>>>>            int ret;
>>>>
>>>>            fd1 = syscall(__NR_memfd_secret, 0);
>>>>            addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd1, 0);
>>>>            ftruncate(fd1, 7);
>>>>            addr[0] = 1; /* fault in page */
>>
>> Here the page is faulted in and GUP-fast will find it.  It's not in
>> the kernel page table, but it is in the user page table, which is what
>> matters for GUP.
> 
> Trust me, I know the GUP code very well :P
> 
> gup_pte_range -- GUP fast -- contains:
> 
> if (unlikely(folio_is_secretmem(folio))) {
> 	gup_put_folio(folio, 1, flags);
> 	goto pte_unmap;
> }
> 
> So we "should" be rejecting any secretmem folios and fallback to GUP slow.
> 
> 
> ... we don't check the same in gup_huge_pmd(), but we shouldn't ever see
> THP in secretmem code.
> 

Ehm:

[   29.441405] Secretmem fault: PFN: 1096177
[   29.442092] GUP-fast: PFN: 1096177


... is folio_is_secretmem() broken?

... is it something "obvious" like:

diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index 35f3a4a8ceb1e..6996f1f53f147 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -16,7 +16,7 @@ static inline bool folio_is_secretmem(struct folio *folio)
          * We know that secretmem pages are not compound and LRU so we can
          * save a couple of cycles here.
          */
-       if (folio_test_large(folio) || !folio_test_lru(folio))
+       if (folio_test_large(folio) || folio_test_lru(folio))
                 return false;
  
         mapping = (struct address_space *)
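
The effect of that one-character change can be shown with a toy userspace model. It is a sketch under stated assumptions, not the kernel implementation: struct toy_folio, its fields and toy_secretmem_aops are hypothetical stand-ins, and the scenario assumes the folio backing the secretmem mapping does not have its LRU flag set when GUP-fast looks at it (the debug output above only shows the same PFN reaching both paths).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct folio; all field names are hypothetical. */
struct toy_folio {
	bool large;        /* models folio_test_large() */
	bool lru;          /* models folio_test_lru() */
	const void *a_ops; /* models folio->mapping->a_ops, NULL if unmapped */
};

/* Stand-in for secretmem_aops; only its address is compared. */
static const int toy_secretmem_aops = 0;

/* The check as it reads before the diff: bail out if the LRU flag is clear. */
static bool toy_is_secretmem_before(const struct toy_folio *folio)
{
	assert(folio != NULL);
	if (folio->large || !folio->lru)
		return false;
	return folio->a_ops == &toy_secretmem_aops;
}

/* The check as it reads after the diff: bail out if the LRU flag is set. */
static bool toy_is_secretmem_after(const struct toy_folio *folio)
{
	assert(folio != NULL);
	if (folio->large || folio->lru)
		return false;
	return folio->a_ops == &toy_secretmem_aops;
}
```

For a small folio with the LRU flag clear and a_ops pointing at toy_secretmem_aops, the "before" variant returns false at the early exit and never reaches the a_ops comparison, while the "after" variant reaches it and returns true.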


-- 
Cheers,

David / dhildenb




* Re: BUG: unable to handle kernel paging request in fuse_copy_do
       [not found]                 ` <b40eb0b7-7362-4d19-95b3-e06435e6e09c@redhat.com>
@ 2024-03-24 10:29                   ` Mike Rapoport
  2024-03-25 11:21                   ` Miklos Szeredi
  1 sibling, 0 replies; 8+ messages in thread
From: Mike Rapoport @ 2024-03-24 10:29 UTC
  To: David Hildenbrand
  Cc: Miklos Szeredi, xingwei lee, linux-fsdevel, linux-kernel,
	samsun1006219, syzkaller-bugs, linux-mm

On Fri, Mar 22, 2024 at 10:56:08PM +0100, David Hildenbrand wrote:
> On 22.03.24 22:37, David Hildenbrand wrote:
> > On 22.03.24 22:33, David Hildenbrand wrote:
> > > On 22.03.24 22:18, David Hildenbrand wrote:
> > > > On 22.03.24 22:13, Miklos Szeredi wrote:
> > > > > On Fri, 22 Mar 2024 at 22:08, David Hildenbrand <david@redhat.com> wrote:
> > > > > > 
> > > > > > On 22.03.24 20:46, Miklos Szeredi wrote:
> > > > > > > On Fri, 22 Mar 2024 at 16:41, David Hildenbrand <david@redhat.com> wrote:
> > > > > > > 
> > > > > > > > But at least the vmsplice() just seems to work. Which is weird, because
> > > > > > > > GUP-fast should not apply (page not faulted in?)
> > > > > > > 
> > > > > > > But it is faulted in, and that indeed seems to be the root cause.
> > > > > > 
> > > > > > secretmem mmap() won't populate the page tables. So it's not faulted in yet.
> > > > > > 
> > > > > > When we GUP via vmsplice, GUP-fast should not find it in the page tables
> > > > > > and fallback to slow GUP.
> > > > > > 
> > > > > > There, we seem to pass check_vma_flags(), trigger faultin_page() to
> > > > > > fault it in, and then find it via follow_page_mask().
> > > > > > 
> > > > > > ... and I wonder how we manage to skip check_vma_flags(), or otherwise
> > > > > > managed to GUP it.
> > > > > > 
> > > > > > vmsplice() should, in theory, never succeed here.
> > > > > > 
> > > > > > Weird :/
> > > > > > 
> > > > > > > Improved repro:
> > > > > > > 
> > > > > > > #define _GNU_SOURCE
> > > > > > > 
> > > > > > > #include <fcntl.h>
> > > > > > > #include <unistd.h>
> > > > > > > #include <stdio.h>
> > > > > > > #include <errno.h>
> > > > > > > #include <sys/mman.h>
> > > > > > > #include <sys/syscall.h>
> > > > > > > 
> > > > > > > int main(void)
> > > > > > > {
> > > > > > >              int fd1, fd2;
> > > > > > >              int pip[2];
> > > > > > >              struct iovec iov;
> > > > > > >              char *addr;
> > > > > > >              int ret;
> > > > > > > 
> > > > > > >              fd1 = syscall(__NR_memfd_secret, 0);
> > > > > > >              addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd1, 0);
> > > > > > >              ftruncate(fd1, 7);
> > > > > > >              addr[0] = 1; /* fault in page */
> > > > > 
> > > > > Here the page is faulted in and GUP-fast will find it.  It's not in
> > > > > the kernel page table, but it is in the user page table, which is what
> > > > > > matters for GUP.
> > > > 
> > > > Trust me, I know the GUP code very well :P
> > > > 
> > > > gup_pte_range -- GUP fast -- contains:
> > > > 
> > > > if (unlikely(folio_is_secretmem(folio))) {
> > > > 	gup_put_folio(folio, 1, flags);
> > > > 	goto pte_unmap;
> > > > }
> > > > 
> > > > So we "should" be rejecting any secretmem folios and fallback to GUP slow.
> > > > 
> > > > 
> > > > ... we don't check the same in gup_huge_pmd(), but we shouldn't ever see
> > > > THP in secretmem code.
> > > > 
> > > 
> > > Ehm:
> > > 
> > > [   29.441405] Secretmem fault: PFN: 1096177
> > > [   29.442092] GUP-fast: PFN: 1096177
> > > 
> > > 
> > > ... is folio_is_secretmem() broken?
> > > 
> > > ... is it something "obvious" like:
> > > 
> > > diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
> > > index 35f3a4a8ceb1e..6996f1f53f147 100644
> > > --- a/include/linux/secretmem.h
> > > +++ b/include/linux/secretmem.h
> > > @@ -16,7 +16,7 @@ static inline bool folio_is_secretmem(struct folio *folio)
> > >             * We know that secretmem pages are not compound and LRU so we can
> > >             * save a couple of cycles here.
> > >             */
> > > -       if (folio_test_large(folio) || !folio_test_lru(folio))
> > > +       if (folio_test_large(folio) || folio_test_lru(folio))
> > >                    return false;
> > >            mapping = (struct address_space *)
> > 
> > ... yes, that does the trick!
> > 
> 
> Proper patch (I might send out again on Monday "officially"). There are
> other improvements we want to do to folio_is_secretmem() in the light of
> folio_fast_pin_allowed(), that I wanted to do a while ago. I might send
> a patch for that as well now that I'm at it.
 
The most robust but a bit slower solution is to make folio_is_secretmem()
call folio_mapping() rather than open code the check.

What improvements did you have in mind?
 
> From 85558a46d9f249f26bd77dd3b18d14f248464845 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Fri, 22 Mar 2024 22:45:36 +0100
> Subject: [PATCH] mm/secretmem: fix GUP-fast succeeding on secretmem folios
> 
> folio_is_secretmem() states that secretmem folios cannot be LRU folios:
> so we may only exit early if we find an LRU folio. Yet, we exit early if
> we find a folio that is not a secretmem folio.
>
> Consequently, folio_is_secretmem() fails to detect secretmem folios and,
> therefore, we can succeed in grabbing a secretmem folio during GUP-fast,
> crashing the kernel when we later try reading/writing to the folio, because
> the folio has been unmapped from the directmap.
> 
> Reported-by: xingwei lee <xrivendell7@gmail.com>
> Reported-by: yue sun <samsun1006219@gmail.com>
> Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
> Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  include/linux/secretmem.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
> index 35f3a4a8ceb1..6996f1f53f14 100644
> --- a/include/linux/secretmem.h
> +++ b/include/linux/secretmem.h
> @@ -16,7 +16,7 @@ static inline bool folio_is_secretmem(struct folio *folio)
>  	 * We know that secretmem pages are not compound and LRU so we can
>  	 * save a couple of cycles here.
>  	 */
> -	if (folio_test_large(folio) || !folio_test_lru(folio))
> +	if (folio_test_large(folio) || folio_test_lru(folio))
>  		return false;
>  	mapping = (struct address_space *)
> -- 
> 2.43.2
> 
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 

-- 
Sincerely yours,
Mike.



* Re: BUG: unable to handle kernel paging request in fuse_copy_do
       [not found]                 ` <b40eb0b7-7362-4d19-95b3-e06435e6e09c@redhat.com>
  2024-03-24 10:29                   ` Mike Rapoport
@ 2024-03-25 11:21                   ` Miklos Szeredi
  1 sibling, 0 replies; 8+ messages in thread
From: Miklos Szeredi @ 2024-03-25 11:21 UTC
  To: David Hildenbrand
  Cc: xingwei lee, linux-fsdevel, linux-kernel, samsun1006219,
	syzkaller-bugs, linux-mm, Mike Rapoport

On Fri, 22 Mar 2024 at 22:56, David Hildenbrand <david@redhat.com> wrote:

>  From 85558a46d9f249f26bd77dd3b18d14f248464845 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Fri, 22 Mar 2024 22:45:36 +0100
> Subject: [PATCH] mm/secretmem: fix GUP-fast succeeding on secretmem folios
>
> folio_is_secretmem() states that secretmem folios cannot be LRU folios:
> so we may only exit early if we find an LRU folio. Yet, we exit early if
> we find a folio that is not a secretmem folio.
>
> Consequently, folio_is_secretmem() fails to detect secretmem folios and,
> therefore, we can succeed in grabbing a secretmem folio during GUP-fast,
> crashing the kernel when we later try reading/writing to the folio, because
> the folio has been unmapped from the directmap.
>
> Reported-by: xingwei lee <xrivendell7@gmail.com>
> Reported-by: yue sun <samsun1006219@gmail.com>
> Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
> Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Verified that it's no longer crashing with the reproducers.

Tested-by: Miklos Szeredi <mszeredi@redhat.com>

