* [Virtio-fs] Problem using gdb/lldb on a binary residing on a DAX-enabled volume
@ 2021-08-23 9:31 Sergio Lopez
2021-08-24 5:04 ` Chirantan Ekbote
0 siblings, 1 reply; 6+ messages in thread
From: Sergio Lopez @ 2021-08-23 9:31 UTC (permalink / raw)
To: virtio-fs-list
[-- Attachment #1: Type: text/plain, Size: 4539 bytes --]
Hi,
I've noticed that trying to use gdb/lldb on any binary residing on a
DAX-enabled virtio-fs volume leads to a SIGSEGV in userspace...
[root@fedora ~]# mount -t virtiofs -o dax,context=system_u:object_r:root_t:s0 /dev/root /mnt/virtio-fs
[root@fedora ~]# gdb /mnt/virtio-fs/true
GNU gdb (GDB) Fedora 10.2-3.fc34
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /mnt/virtio-fs/true...
Missing separate debuginfo for /mnt/virtio-fs/true
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/25/47d5e7307ccc0f5a75e5990afdc5e15293ccc1.debug
Reading symbols from .gnu_debugdata for /mnt/virtio-fs/true...
(No debugging symbols found in .gnu_debugdata for /mnt/virtio-fs/true)
(gdb) break main
Breakpoint 1 at 0x14c0
(gdb) r
Starting program: /mnt/virtio-fs/true
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.33-5.fc34.x86_64
Program received signal SIGSEGV, Segmentation fault.
0x0000555555555960 in _start ()
... while the kernel drops a warning:
[ 39.962629] ------------[ cut here ]------------
[ 39.963854] WARNING: CPU: 1 PID: 718 at mm/memory.c:2666 wp_page_copy+0x565/0x5d0
[ 39.965736] Modules linked in: virtio_net net_failover i2c_i801 failover i2c_smbus joydev drm ip_tablew
[ 39.969040] CPU: 1 PID: 718 Comm: gdb Not tainted 5.11.12-300.fc34.x86_64 #1
[ 39.970800] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebui4
[ 39.973535] RIP: 0010:wp_page_copy+0x565/0x5d0
[ 39.974675] Code: 89 7b 48 48 8b 43 30 49 39 07 75 76 48 8b 7c 24 08 48 8b 74 24 10 ba 00 10 00 00 e8 5
[ 39.979165] RSP: 0018:ffffa3aa40cb7bc0 EFLAGS: 00010206
[ 39.980483] RAX: 0000000000001000 RBX: ffffa3aa40cb7c60 RCX: 0000000000001000
[ 39.982244] RDX: 0000000000001000 RSI: 0000555555555000 RDI: ffff898117a9f000
[ 39.984015] RBP: ffffe934045ea7c0 R08: ffffe934041c7f68 R09: 0000000000000001
[ 39.985780] R10: 0000000000117a9f R11: ffff89810825dd20 R12: ffff898105829770
[ 39.987552] R13: 0000000000000000 R14: ffff898106c11dc0 R15: 0000000000000001
[ 39.989324] FS: 00007f9691a91080(0000) GS:ffff898763a40000(0000) knlGS:0000000000000000
[ 39.991319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 39.992760] CR2: 0000555555555000 CR3: 000000010422a000 CR4: 00000000000006e0
[ 39.994535] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 39.996312] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 39.998086] Call Trace:
[ 39.998766] handle_mm_fault+0x113f/0x1970
[ 39.999828] __get_user_pages+0x28c/0x7f0
[ 40.000868] __get_user_pages_remote+0xd4/0x320
[ 40.002028] __access_remote_vm+0x106/0x370
[ 40.003099] ptrace_access_vm+0x9d/0xd0
[ 40.004100] ptrace_request+0x560/0x620
[ 40.005097] arch_ptrace+0x13d/0x3b0
[ 40.006039] __x64_sys_ptrace+0x82/0x120
[ 40.007050] do_syscall_64+0x33/0x40
[ 40.008002] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 40.009283] RIP: 0033:0x7f9694ce2aae
[ 40.010220] Code: 70 41 83 f8 03 c7 44 24 10 08 00 00 00 48 89 44 24 18 48 8d 44 24 30 8b 70 08 4c 0f 4
[ 40.014714] RSP: 002b:00007ffe0cffb4e0 EFLAGS: 00000202 ORIG_RAX: 0000000000000065
[ 40.016577] RAX: ffffffffffffffda RBX: 00005555555554c0 RCX: 00007f9694ce2aae
[ 40.018331] RDX: 00005555555554c0 RSI: 00000000000002ef RDI: 0000000000000005
[ 40.020092] RBP: 0000000000000001 R08: 0000000000000004 R09: 00005555555554c0
[ 40.021855] R10: 55415641fa1e0fcc R11: 0000000000000202 R12: 0000000000000001
[ 40.023612] R13: 00007f9691a90848 R14: 00000000000002ef R15: 0000000000000000
[ 40.025364] ---[ end trace 2273d62ee1978eb4 ]---
Seems like DAX breaks something in the ptrace_access_vm path. On a
volume without DAX works fine.
The guest is a Fedora with a 5.11.12-300.fc34.x86_64 kernel.
Sergio.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [Virtio-fs] Problem using gdb/lldb on a binary residing on a DAX-enabled volume 2021-08-23 9:31 [Virtio-fs] Problem using gdb/lldb on a binary residing on a DAX-enabled volume Sergio Lopez @ 2021-08-24 5:04 ` Chirantan Ekbote 2021-08-24 13:22 ` Sergio Lopez 2021-08-24 21:11 ` Vivek Goyal 0 siblings, 2 replies; 6+ messages in thread From: Chirantan Ekbote @ 2021-08-24 5:04 UTC (permalink / raw) To: Sergio Lopez; +Cc: virtio-fs-list, David Riley, Sergey Senozhatsky, Yu Zhao Hi Sergio, On Mon, Aug 23, 2021 at 6:31 PM Sergio Lopez <slp@redhat.com> wrote: > > Hi, > > I've noticed that trying to use gdb/lldb on any binary residing on a > DAX-enabled virtio-fs volume leads to a SIGSEGV in userspace... > > Seems like DAX breaks something in the ptrace_access_vm path. On a > volume without DAX works fine. > We've seen this as well and unfortunately it doesn't appear to be limited to virtio-fs. Using DAX on a ext4 formatted virtio-pmem disk image has the same problem. We've actually disabled DAX everywhere because of this. Unfortunately most of the details are in an internal bug report but I'll try to extract the relevant bits here. This is well outside my depth so I've CC'd some of the people who have looked at this. The initial bug report was for virtio-pmem+ext4 so some of the details are specific to pmem but I suspect something similar is happening for virtio-fs as well. The issue is that process_vm_readv() corrupts the memory of files that are mmap()'d when DAX is enabled. 1. A filesystem is mounted with DAX enabled. pmem_attach_disk() sets pfn_flags to PFN_DEV|PFN_MAP. In the fuse case, this appears to happen here [1]. 2. When the (strace/gdb/etc) process does its initial read of the mmap()'d region, the pfn flags for the page are inherited from the pmem structure set to PFN_DEV|PFN_MAP in step 1. During a call to insert_pfn(), pte_mkdevmap is called to mark the pte as devmap. 3. If you follow the ftrace of the process_vm_readv(), it eventually reaches do_wp_page(). If the target process had not previously read the page in, this would not call do_wp_page() and instead just fault in the page normally through the ext4/dax logic. 4. do_wp_page() calls vm_normal_page() which returns NULL due to the remote pte being marked special and devmap (from above). If we just ignore the devmap check and return the page that has been found and allow the normal copy to occur, then no problem occurs. However, that can't be safely done in normal dax cases. Due to vm_normal_page() returning NULL, wp_page_copy() is called (first call site) with a null vmf->address. If the mmap()d file is originally from a non-dax filesystem (eg tmpfs), the second wp_page_copy() ends up being called with a valid vmf->address. 5. cow_user_page() ends up in this unexpected case since src==vmf->address is NULL, delimited with the following comment: /* * If the source page was a PFN mapping, we don't have * a "struct page" for it. We do a best-effort copy by * just copying from the original user address. If that * fails, we just zero-fill it. Live with it. */ The end effect of this is that there is the __copy_from_user_inatomic() call with an invalid uaddr because the uaddr is from the remote address space. This results in another page fault because that remote address isn't valid in the process calling process_vm_readv(). It seems that there's a few issues here, a) that it's trying to read from the remote address space as if it were local, and b) that the failure here is corrupting the remote processes memory and not just returning an empty page which would be less broken. In the good case of the mmap()d file being from tmpfs, src==vmf->address is non-NULL and copy_user_highpage can properly do the copy of the page. At that point, the caller is able to copy data from that page to its own local buffer and return data successfully, as well as avoid corrupting the remote process. We've also found that reverting "17839856fd58: gup: document and work around "COW can break either way" issue" seems to make the problem go away. Chirantan [1]: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/+/d5ae8d7f85b7f6f6e60f1af8ff4be52b0926fde1/fs/fuse/virtio_fs.c#741 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Virtio-fs] Problem using gdb/lldb on a binary residing on a DAX-enabled volume 2021-08-24 5:04 ` Chirantan Ekbote @ 2021-08-24 13:22 ` Sergio Lopez 2021-08-24 18:09 ` Yu Zhao 2021-08-24 21:11 ` Vivek Goyal 1 sibling, 1 reply; 6+ messages in thread From: Sergio Lopez @ 2021-08-24 13:22 UTC (permalink / raw) To: Chirantan Ekbote; +Cc: virtio-fs-list, David Riley, Sergey Senozhatsky, Yu Zhao [-- Attachment #1: Type: text/plain, Size: 4081 bytes --] On Tue, Aug 24, 2021 at 02:04:45PM +0900, Chirantan Ekbote wrote: > Hi Sergio, > > On Mon, Aug 23, 2021 at 6:31 PM Sergio Lopez <slp@redhat.com> wrote: > > > > Hi, > > > > I've noticed that trying to use gdb/lldb on any binary residing on a > > DAX-enabled virtio-fs volume leads to a SIGSEGV in userspace... > > > > Seems like DAX breaks something in the ptrace_access_vm path. On a > > volume without DAX works fine. > > > > We've seen this as well and unfortunately it doesn't appear to be > limited to virtio-fs. Using DAX on a ext4 formatted virtio-pmem disk > image has the same problem. We've actually disabled DAX everywhere > because of this. > > Unfortunately most of the details are in an internal bug report but > I'll try to extract the relevant bits here. This is well outside my > depth so I've CC'd some of the people who have looked at this. The > initial bug report was for virtio-pmem+ext4 so some of the details are > specific to pmem but I suspect something similar is happening for > virtio-fs as well. > > The issue is that process_vm_readv() corrupts the memory of files that > are mmap()'d when DAX is enabled. > > 1. A filesystem is mounted with DAX enabled. pmem_attach_disk() sets > pfn_flags to PFN_DEV|PFN_MAP. In the fuse case, this appears to > happen here [1]. > 2. When the (strace/gdb/etc) process does its initial read of the > mmap()'d region, the pfn flags for the page are inherited from the > pmem structure set to PFN_DEV|PFN_MAP in step 1. During a call to > insert_pfn(), pte_mkdevmap is called to mark the pte as devmap. > 3. If you follow the ftrace of the process_vm_readv(), it eventually > reaches do_wp_page(). If the target process had not previously read > the page in, this would not call do_wp_page() and instead just fault > in the page normally through the ext4/dax logic. > 4. do_wp_page() calls vm_normal_page() which returns NULL due to the > remote pte being marked special and devmap (from above). If we just > ignore the devmap check and return the page that has been found and > allow the normal copy to occur, then no problem occurs. However, that > can't be safely done in normal dax cases. Due to vm_normal_page() > returning NULL, wp_page_copy() is called (first call site) with a null > vmf->address. If the mmap()d file is originally from a non-dax > filesystem (eg tmpfs), the second wp_page_copy() ends up being called > with a valid vmf->address. > 5. cow_user_page() ends up in this unexpected case since > src==vmf->address is NULL, delimited with the following comment: > > /* > * If the source page was a PFN mapping, we don't have > * a "struct page" for it. We do a best-effort copy by > * just copying from the original user address. If that > * fails, we just zero-fill it. Live with it. > */ > > The end effect of this is that there is the > __copy_from_user_inatomic() call with an invalid uaddr because the > uaddr is from the remote address space. This results in another page > fault because that remote address isn't valid in the process calling > process_vm_readv(). It seems that there's a few issues here, a) that > it's trying to read from the remote address space as if it were local, > and b) that the failure here is corrupting the remote processes memory > and not just returning an empty page which would be less broken. > > In the good case of the mmap()d file being from tmpfs, > src==vmf->address is non-NULL and copy_user_highpage can properly do > the copy of the page. At that point, the caller is able to copy data > from that page to its own local buffer and return data successfully, > as well as avoid corrupting the remote process. > > We've also found that reverting "17839856fd58: gup: document and work > around "COW can break either way" issue" seems to make the problem go > away. Thanks a lot for the super-detailed write-up. Do you know if someone is already working on a fix that can be upstreamed? Sergio. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Virtio-fs] Problem using gdb/lldb on a binary residing on a DAX-enabled volume 2021-08-24 13:22 ` Sergio Lopez @ 2021-08-24 18:09 ` Yu Zhao 2021-08-24 23:27 ` Andrea Arcangeli 0 siblings, 1 reply; 6+ messages in thread From: Yu Zhao @ 2021-08-24 18:09 UTC (permalink / raw) To: Sergio Lopez Cc: virtio-fs-list, David Riley, Sergey Senozhatsky, Andrea Arcangeli On Tue, Aug 24, 2021 at 7:22 AM Sergio Lopez <slp@redhat.com> wrote: > > On Tue, Aug 24, 2021 at 02:04:45PM +0900, Chirantan Ekbote wrote: > > Hi Sergio, > > > > On Mon, Aug 23, 2021 at 6:31 PM Sergio Lopez <slp@redhat.com> wrote: > > > > > > Hi, > > > > > > I've noticed that trying to use gdb/lldb on any binary residing on a > > > DAX-enabled virtio-fs volume leads to a SIGSEGV in userspace... > > > > > > Seems like DAX breaks something in the ptrace_access_vm path. On a > > > volume without DAX works fine. > > > > > > > We've seen this as well and unfortunately it doesn't appear to be > > limited to virtio-fs. Using DAX on a ext4 formatted virtio-pmem disk > > image has the same problem. We've actually disabled DAX everywhere > > because of this. > > > > Unfortunately most of the details are in an internal bug report but > > I'll try to extract the relevant bits here. This is well outside my > > depth so I've CC'd some of the people who have looked at this. The > > initial bug report was for virtio-pmem+ext4 so some of the details are > > specific to pmem but I suspect something similar is happening for > > virtio-fs as well. > > > > The issue is that process_vm_readv() corrupts the memory of files that > > are mmap()'d when DAX is enabled. > > > > 1. A filesystem is mounted with DAX enabled. pmem_attach_disk() sets > > pfn_flags to PFN_DEV|PFN_MAP. In the fuse case, this appears to > > happen here [1]. > > 2. When the (strace/gdb/etc) process does its initial read of the > > mmap()'d region, the pfn flags for the page are inherited from the > > pmem structure set to PFN_DEV|PFN_MAP in step 1. During a call to > > insert_pfn(), pte_mkdevmap is called to mark the pte as devmap. > > 3. If you follow the ftrace of the process_vm_readv(), it eventually > > reaches do_wp_page(). If the target process had not previously read > > the page in, this would not call do_wp_page() and instead just fault > > in the page normally through the ext4/dax logic. > > 4. do_wp_page() calls vm_normal_page() which returns NULL due to the > > remote pte being marked special and devmap (from above). If we just > > ignore the devmap check and return the page that has been found and > > allow the normal copy to occur, then no problem occurs. However, that > > can't be safely done in normal dax cases. Due to vm_normal_page() > > returning NULL, wp_page_copy() is called (first call site) with a null > > vmf->address. If the mmap()d file is originally from a non-dax > > filesystem (eg tmpfs), the second wp_page_copy() ends up being called > > with a valid vmf->address. > > 5. cow_user_page() ends up in this unexpected case since > > src==vmf->address is NULL, delimited with the following comment: > > > > /* > > * If the source page was a PFN mapping, we don't have > > * a "struct page" for it. We do a best-effort copy by > > * just copying from the original user address. If that > > * fails, we just zero-fill it. Live with it. > > */ > > > > The end effect of this is that there is the > > __copy_from_user_inatomic() call with an invalid uaddr because the > > uaddr is from the remote address space. This results in another page > > fault because that remote address isn't valid in the process calling > > process_vm_readv(). It seems that there's a few issues here, a) that > > it's trying to read from the remote address space as if it were local, > > and b) that the failure here is corrupting the remote processes memory > > and not just returning an empty page which would be less broken. > > > > In the good case of the mmap()d file being from tmpfs, > > src==vmf->address is non-NULL and copy_user_highpage can properly do > > the copy of the page. At that point, the caller is able to copy data > > from that page to its own local buffer and return data successfully, > > as well as avoid corrupting the remote process. > > > > We've also found that reverting "17839856fd58: gup: document and work > > around "COW can break either way" issue" seems to make the problem go > > away. > > Thanks a lot for the super-detailed write-up. Do you know if someone > is already working on a fix that can be upstreamed? Hi Sergio, I cc'ed Andrea from RH -- he might have a fix queued in his tree. Thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Virtio-fs] Problem using gdb/lldb on a binary residing on a DAX-enabled volume 2021-08-24 18:09 ` Yu Zhao @ 2021-08-24 23:27 ` Andrea Arcangeli 0 siblings, 0 replies; 6+ messages in thread From: Andrea Arcangeli @ 2021-08-24 23:27 UTC (permalink / raw) To: Yu Zhao; +Cc: virtio-fs-list, David Riley, Sergey Senozhatsky Hello everyone, On Tue, Aug 24, 2021 at 12:09:37PM -0600, Yu Zhao wrote: > On Tue, Aug 24, 2021 at 7:22 AM Sergio Lopez <slp@redhat.com> wrote: > > > > On Tue, Aug 24, 2021 at 02:04:45PM +0900, Chirantan Ekbote wrote: > > > Hi Sergio, > > > > > > On Mon, Aug 23, 2021 at 6:31 PM Sergio Lopez <slp@redhat.com> wrote: > > > > > > > > Hi, > > > > > > > > I've noticed that trying to use gdb/lldb on any binary residing on a > > > > DAX-enabled virtio-fs volume leads to a SIGSEGV in userspace... > > > > > > > > Seems like DAX breaks something in the ptrace_access_vm path. On a > > > > volume without DAX works fine. > > > > > > > > > > We've seen this as well and unfortunately it doesn't appear to be > > > limited to virtio-fs. Using DAX on a ext4 formatted virtio-pmem disk > > > image has the same problem. We've actually disabled DAX everywhere > > > because of this. > > > > > > Unfortunately most of the details are in an internal bug report but > > > I'll try to extract the relevant bits here. This is well outside my > > > depth so I've CC'd some of the people who have looked at this. The > > > initial bug report was for virtio-pmem+ext4 so some of the details are > > > specific to pmem but I suspect something similar is happening for > > > virtio-fs as well. > > > > > > The issue is that process_vm_readv() corrupts the memory of files that > > > are mmap()'d when DAX is enabled. > > > > > > 1. A filesystem is mounted with DAX enabled. pmem_attach_disk() sets > > > pfn_flags to PFN_DEV|PFN_MAP. In the fuse case, this appears to > > > happen here [1]. > > > 2. When the (strace/gdb/etc) process does its initial read of the > > > mmap()'d region, the pfn flags for the page are inherited from the > > > pmem structure set to PFN_DEV|PFN_MAP in step 1. During a call to > > > insert_pfn(), pte_mkdevmap is called to mark the pte as devmap. > > > 3. If you follow the ftrace of the process_vm_readv(), it eventually > > > reaches do_wp_page(). If the target process had not previously read > > > the page in, this would not call do_wp_page() and instead just fault > > > in the page normally through the ext4/dax logic. > > > 4. do_wp_page() calls vm_normal_page() which returns NULL due to the > > > remote pte being marked special and devmap (from above). If we just > > > ignore the devmap check and return the page that has been found and > > > allow the normal copy to occur, then no problem occurs. However, that > > > can't be safely done in normal dax cases. Due to vm_normal_page() > > > returning NULL, wp_page_copy() is called (first call site) with a null > > > vmf->address. If the mmap()d file is originally from a non-dax > > > filesystem (eg tmpfs), the second wp_page_copy() ends up being called > > > with a valid vmf->address. > > > 5. cow_user_page() ends up in this unexpected case since > > > src==vmf->address is NULL, delimited with the following comment: > > > > > > /* > > > * If the source page was a PFN mapping, we don't have > > > * a "struct page" for it. We do a best-effort copy by > > > * just copying from the original user address. If that > > > * fails, we just zero-fill it. Live with it. > > > */ > > > > > > The end effect of this is that there is the > > > __copy_from_user_inatomic() call with an invalid uaddr because the > > > uaddr is from the remote address space. This results in another page > > > fault because that remote address isn't valid in the process calling > > > process_vm_readv(). It seems that there's a few issues here, a) that > > > it's trying to read from the remote address space as if it were local, > > > and b) that the failure here is corrupting the remote processes memory > > > and not just returning an empty page which would be less broken. > > > > > > In the good case of the mmap()d file being from tmpfs, > > > src==vmf->address is non-NULL and copy_user_highpage can properly do > > > the copy of the page. At that point, the caller is able to copy data > > > from that page to its own local buffer and return data successfully, > > > as well as avoid corrupting the remote process. > > > > > > We've also found that reverting "17839856fd58: gup: document and work > > > around "COW can break either way" issue" seems to make the problem go > > > away. > > > > Thanks a lot for the super-detailed write-up. Do you know if someone > > is already working on a fix that can be upstreamed? Hmm, it appears you're not actually using latest upstream, if you were, you couldn't backout 178 clean. If you were to use latest upstream, I suppose your problem would be already solved, however upstream cannot be backported to production because upstream reopened CVE-2020-29374 with a slightly modified reproducer: https://github.com/aagit/kernel-testcases-for-v5.11/blob/main/vmsplice-v5.11.c Upstream if backported would also cause all kind of ABI breaks to clear_refs (obvious ones like RDMA being disabled from clear_refs tracking, see 9348b73c2e1bfea74ccd4a44fb4ccc7276ab9623) but also some that won't require any RDMA device like this ABI regression: https://github.com/aagit/kernel-testcases-for-v5.11/blob/main/page_count_do_wp_page.c Upstream also triggers false positive COWs after a swapin followed by a memory write, if there's still a swapcache pin on the anon memory as an example of how inaccurate it become. So production has to stick to 178 since breaking ptrace on pmem is preferable than dealing with the above fallout. Note that 178 has other defect not just ptrace on pmem, it breaks KSM density and it also broke uffd-wp (which I have to assume isn't included yet in your current kernel version, so that shouldn't be a concern). > Hi Sergio, > > I cc'ed Andrea from RH -- he might have a fix queued in his tree. That's right thanks for the CC. 17839856fd58 was in the right direction because the only bug there was, was in GUP and so it's correct to fix it in GUP itself. To resolve the defect of 17839856fd58, whenever a GUP pin is being taken for reading on a shared page with mapcount elevated, the mapcount unshare logic triggers a new fault called COR fault, which copies the page. Once done it leaves the pagetables like if a read fault happened. That resolves all defects caused by 17839856fd58 which instead acted as a write fault despite it was a GUP "read" access. The patchset continues and leverages the COR fault to add full MM coherency by the POSIX specs to FOLL_LONGTERM GUP pins no matter which vma type. It's now as coherent as the long term pin was under the MMU notifier it was impossible to achieve without the COR fault. https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=mapcount_deshare&id=9f2f36029b5c8d4b3390c93e90a3b9254b7ec730 If you're curious to test it, this is the latest version: https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/log/?h=mapcount_deshare I notice the branch is already super old, I didn't rebase it since mid Jun sorry... There's also a v5.10-LTS -stable version. I'll rebase it again soon as time permits. Thanks, Andrea ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Virtio-fs] Problem using gdb/lldb on a binary residing on a DAX-enabled volume 2021-08-24 5:04 ` Chirantan Ekbote 2021-08-24 13:22 ` Sergio Lopez @ 2021-08-24 21:11 ` Vivek Goyal 1 sibling, 0 replies; 6+ messages in thread From: Vivek Goyal @ 2021-08-24 21:11 UTC (permalink / raw) To: Chirantan Ekbote Cc: Andrea Arcangeli, Yu Zhao, virtio-fs-list, David Riley, Sergey Senozhatsky, Dan Williams On Tue, Aug 24, 2021 at 02:04:45PM +0900, Chirantan Ekbote wrote: > Hi Sergio, > > On Mon, Aug 23, 2021 at 6:31 PM Sergio Lopez <slp@redhat.com> wrote: > > > > Hi, > > > > I've noticed that trying to use gdb/lldb on any binary residing on a > > DAX-enabled virtio-fs volume leads to a SIGSEGV in userspace... > > > > Seems like DAX breaks something in the ptrace_access_vm path. On a > > volume without DAX works fine. > > > > We've seen this as well and unfortunately it doesn't appear to be > limited to virtio-fs. Using DAX on a ext4 formatted virtio-pmem disk > image has the same problem. We've actually disabled DAX everywhere > because of this. Thanks Chirantan for confirming that this probably a generic DAX issue (and not limited to virtiofs) and providing all the details. Copying Dan Williams. He might have an idea. Vivek > > Unfortunately most of the details are in an internal bug report but > I'll try to extract the relevant bits here. This is well outside my > depth so I've CC'd some of the people who have looked at this. The > initial bug report was for virtio-pmem+ext4 so some of the details are > specific to pmem but I suspect something similar is happening for > virtio-fs as well. > > The issue is that process_vm_readv() corrupts the memory of files that > are mmap()'d when DAX is enabled. > > 1. A filesystem is mounted with DAX enabled. pmem_attach_disk() sets > pfn_flags to PFN_DEV|PFN_MAP. In the fuse case, this appears to > happen here [1]. > 2. When the (strace/gdb/etc) process does its initial read of the > mmap()'d region, the pfn flags for the page are inherited from the > pmem structure set to PFN_DEV|PFN_MAP in step 1. During a call to > insert_pfn(), pte_mkdevmap is called to mark the pte as devmap. > 3. If you follow the ftrace of the process_vm_readv(), it eventually > reaches do_wp_page(). If the target process had not previously read > the page in, this would not call do_wp_page() and instead just fault > in the page normally through the ext4/dax logic. > 4. do_wp_page() calls vm_normal_page() which returns NULL due to the > remote pte being marked special and devmap (from above). If we just > ignore the devmap check and return the page that has been found and > allow the normal copy to occur, then no problem occurs. However, that > can't be safely done in normal dax cases. Due to vm_normal_page() > returning NULL, wp_page_copy() is called (first call site) with a null > vmf->address. If the mmap()d file is originally from a non-dax > filesystem (eg tmpfs), the second wp_page_copy() ends up being called > with a valid vmf->address. > 5. cow_user_page() ends up in this unexpected case since > src==vmf->address is NULL, delimited with the following comment: > > /* > * If the source page was a PFN mapping, we don't have > * a "struct page" for it. We do a best-effort copy by > * just copying from the original user address. If that > * fails, we just zero-fill it. Live with it. > */ > > The end effect of this is that there is the > __copy_from_user_inatomic() call with an invalid uaddr because the > uaddr is from the remote address space. This results in another page > fault because that remote address isn't valid in the process calling > process_vm_readv(). It seems that there's a few issues here, a) that > it's trying to read from the remote address space as if it were local, > and b) that the failure here is corrupting the remote processes memory > and not just returning an empty page which would be less broken. > > In the good case of the mmap()d file being from tmpfs, > src==vmf->address is non-NULL and copy_user_highpage can properly do > the copy of the page. At that point, the caller is able to copy data > from that page to its own local buffer and return data successfully, > as well as avoid corrupting the remote process. > > We've also found that reverting "17839856fd58: gup: document and work > around "COW can break either way" issue" seems to make the problem go > away. > > Chirantan > > [1]: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/+/d5ae8d7f85b7f6f6e60f1af8ff4be52b0926fde1/fs/fuse/virtio_fs.c#741 > > _______________________________________________ > Virtio-fs mailing list > Virtio-fs@redhat.com > https://listman.redhat.com/mailman/listinfo/virtio-fs > ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-08-24 23:27 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-08-23 9:31 [Virtio-fs] Problem using gdb/lldb on a binary residing on a DAX-enabled volume Sergio Lopez 2021-08-24 5:04 ` Chirantan Ekbote 2021-08-24 13:22 ` Sergio Lopez 2021-08-24 18:09 ` Yu Zhao 2021-08-24 23:27 ` Andrea Arcangeli 2021-08-24 21:11 ` Vivek Goyal
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.