* [Virtio-fs] [BUG] virtio-fs: Corruption when running binaries from virtiofsd-backed fs
From: Erik Schilling @ 2023-08-30 8:20 UTC
To: virtualization, linux-fsdevel, virtio-fs
Cc: Vivek Goyal, Stefan Hajnoczi, Miklos Szeredi, Alex Bennée,
Manos Pitsidianakis, Viresh Kumar, linux-kernel
Hi all!
Some days ago I posted to #virtiofs:matrix.org, describing that I am
observing what looks like a corruption when executing programs from a
virtiofs-based filesystem.
Over the last few days I spent more time drilling into the problem.
This is an attempt at summarizing my findings in order to see what other
people think about this.
When running binaries mounted from virtiofs, they may fail with a
segfault, fail with badaddr, get stuck or - sometimes - succeed.
Environment:
Host: Fedora 38 running 6.4.11-200.fc38.x86_64
Guest: Yocto-based image: 6.4.9-yocto-standard, aarch64
virtiofsd: latest main + some debug prints [1]
QEMU: built from recent git [2]
virtiofsd invocation:
RUST_LOG="debug" ./virtiofsd --seccomp=none --sandbox=none \
--socket-path "fs.sock0" --shared-dir $PWD/share-dir/ --cache=never
QEMU invocation:
~/projects/qemu/build/qemu-system-aarch64 -kernel Image -machine virt \
-cpu cortex-a57 \
-serial mon:stdio \
-device virtio-net-pci,netdev=net0 \
-netdev user,id=net0,hostfwd=tcp::2223-:22 \
-display none -m 2048 -smp 4 \
-object memory-backend-memfd,id=mem,size=2048M,share=on \
-numa node,memdev=mem \
-hda trs-overlay-guest.qcow2 \
-chardev socket,id=char0,path="fs.sock0" \
-device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=/dev/root \
-append 'root=/dev/vda2 ro log_buf_len=8M'
I figured that launching virtiofsd with --cache=always masks the
problem. Therefore, I set --cache=never, but I think I observed no
difference compared to the default setting (auto).
Adding logging to virtiofsd and the kernel _felt_ like it made the
problem harder to reproduce - leaving me with the impression that some
race is happening somewhere.
Trying to rule out that virtiofsd is returning corrupted data, I added
some logging and hash calculation hacks to it [1]. The hashes check out
across multiple accesses, and the order and kind of queued messages are
exactly the same in both the good and the bad case. fio was also unable
to find any errors with a naive job description [3].
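(For illustration: the fio job essentially writes a file through mmap
and verifies the contents afterwards. A standalone checker in the same
spirit - just a sketch of the idea, not the actual job file [3] - could
look like this:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Hash a file via mmap with FNV-1a; run it repeatedly on the same
     * read-only file - a changing hash means the pages changed. */
    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }

        struct stat st;
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0 || fstat(fd, &st) < 0) { perror(argv[1]); return 1; }

        uint8_t *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        uint64_t h = 0xcbf29ce484222325ULL;
        for (off_t i = 0; i < st.st_size; i++)
            h = (h ^ p[i]) * 0x100000001b3ULL;

        printf("%s: fnv1a=%016llx\n", argv[1], (unsigned long long)h);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }
)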
Next, I tried to capture info on the guest side. This became a bit
tricky since the crashes became pretty rare once I followed a fixed
pattern of starting log capture, running perf and trying to reproduce
the problem. Ultimately, I had the most consistent results with
immediately running a program twice:
/mnt/ld-linux-aarch64.so.1 /mnt/ls.coreutils /; \
/mnt/ld-linux-aarch64.so.1 /mnt/ls.coreutils /
(/mnt being the virtiofs mount)
For collecting logs, I hacked the guest kernel to dump the page content
after receiving the virtiofs responses [4]. Reproducing the problem with
this leaves me with logs suggesting that virtiofsd returns identical
content, while the guest kernel receives differing pages:
good-kernel [5]:
kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x312 nodeid 0x1 in.len 56 out.len 104
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 1 unique 0x314 nodeid 0x1 in.len 53 out.len 128
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x316 nodeid 0x29 in.len 56 out.len 104
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 14 unique 0x318 nodeid 0x29 in.len 48 out.len 16
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 15 unique 0x31a nodeid 0x29 in.len 80 out.len 832
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs: page: 000000006996d520
kernel: virtio_fs: to: 00000000de590c14
kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[...]
bad-kernel [6]:
kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x162 nodeid 0x1 in.len 56 out.len 104
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 1 unique 0x164 nodeid 0x1 in.len 53 out.len 128
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x166 nodeid 0x16 in.len 56 out.len 104
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 14 unique 0x168 nodeid 0x16 in.len 48 out.len 16
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 15 unique 0x16a nodeid 0x16 in.len 80 out.len 832
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs: page: 000000006ce9a559
kernel: virtio_fs: to: 000000007ae8b946
kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: virtio_fs rsp:80 40 de c8 ff ff 00 00 cc 2b 62 ae ff ff 00 00 .@.......+b.....
kernel: virtio_fs rsp:02 4e de c8 ff ff 00 00 00 00 00 00 00 00 00 00 .N..............
[...]
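(For context: the dump hack [4] essentially print_hex_dump()s every
page right after the device has filled it. Roughly - from memory, the
exact hook point and struct field names in the real patch may differ -
it amounts to something like this in fs/fuse/virtio_fs.c:

    /* Dump each page the virtqueue response was copied into; the
     * "page:" values in the logs are %p-hashed struct page pointers. */
    static void virtio_fs_dump_pages(struct fuse_args_pages *ap)
    {
            unsigned int i;

            for (i = 0; i < ap->num_pages; i++) {
                    void *buf = kmap_local_page(ap->pages[i]);

                    pr_info("virtio_fs: page: %p\n", ap->pages[i]);
                    print_hex_dump(KERN_INFO, "virtio_fs rsp:",
                                   DUMP_PREFIX_NONE, 16, 1, buf,
                                   PAGE_SIZE, true);
                    kunmap_local(buf);
            }
    }
)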
When looking at the corresponding output from virtiofsd, it claims to
have returned identical data:
good-virtiofsd [7]:
[DEBUG virtiofsd::server] Received request: opcode=Read (15), inode=41, unique=794, pid=481
[src/server.rs:618] r.read_obj().map_err(Error::DecodeMessage)? = ReadIn {
fh: 31,
offset: 0,
size: 832,
read_flags: 2,
lock_owner: 6838554705639967244,
flags: 131072,
padding: 0,
}
[src/file_traits.rs:161] hash = 2308490450751364994
[DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 848, error: 0, unique: 794 }
bad-virtiofsd [8]:
[DEBUG virtiofsd::server] Received request: opcode=Read (15), inode=22, unique=362, pid=406
[src/server.rs:618] r.read_obj().map_err(Error::DecodeMessage)? = ReadIn {
fh: 12,
offset: 0,
size: 832,
read_flags: 2,
lock_owner: 6181120926258395554,
flags: 131072,
padding: 0,
}
[src/file_traits.rs:161] hash = 2308490450751364994
[DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 848, error: 0, unique: 362 }
The "corruption" only seems to happen in this one page, all other pages
are identical between runs (except that the bad run terminates earlier).
What do the experts think here? To me it feels a bit like some kind of
corruption is going on. Or am I misinterpreting things here?
Which further analysis steps would you suggest?
Further notes:
After collecting the above results, I realized that running the guest
with -smp 1 makes the problem trigger far more often. So that may be a
better choice when trying to reproduce it.
Repo with my scripts is available at:
https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/
The scripts are just quick and dirty implementations and are not
particularly portable.
- Erik
[1] https://gitlab.com/ablu/virtiofsd/-/commit/18fd0c1849e15bc55fbdd6e1f169801b2b03da1f
[2] https://gitlab.com/qemu-project/qemu/-/commit/50e7a40af372ee5931c99ef7390f5d3d6fbf6ec4
[3] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/397a6310dea35973025e3d61f46090bf0c092762/share-dir/write-and-verify-mmap.fio
[4] https://github.com/Ablu/linux/commit/3880b9f8affb01aeabb0a04fe76ad7701dc0bb95
[5] Line 12923: https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/good-drop-bad-1.txt
[6] Line 12923: https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/good-bad-1.txt
[7] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/virtiofsd.txt#L2538-2549
[8] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/virtiofsd.txt#L1052-1063
* Re: [Virtio-fs] [BUG] virtio-fs: Corruption when running binaries from virtiofsd-backed fs
From: Erik Schilling @ 2023-09-01 10:37 UTC
To: Erik Schilling, virtualization, linux-fsdevel, virtio-fs
Cc: Vivek Goyal, Stefan Hajnoczi, Miklos Szeredi, Alex Bennée,
Manos Pitsidianakis, Viresh Kumar, linux-kernel
On Wed Aug 30, 2023 at 10:20 AM CEST, Erik Schilling wrote:
> Hi all!
>
> Some days ago I posted to #virtiofs:matrix.org, describing that I am
> observing what looks like a corruption when executing programs from a
> virtiofs-based filesystem.
>
> [...]
>
> Which further analysis steps would you suggest?
Summary of my testing during the last few days:
Testing with KCSAN revealed a few cases that look like missing READ_ONCE
annotations (I will send patches separately), but none of them were
related to the immediate problem. I tested instrument_read() and did
another round of logging with a delay added to
virtio_fs_request_complete(). It looks like the buffer gets corrupted
before entering that function; neither KCSAN nor manual sleeps + prints
showed any corruption while inside it.
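(For reference, the pattern those KCSAN reports point at is a plain
racy read; the fix is an annotated one. A generic illustration with
made-up names, not the actual virtiofs code:

    /* One context writes the flag, another polls it. A plain read may
     * be torn, fused or hoisted out of the loop by the compiler. */
    while (!vq->ready)
            cpu_relax();

    /* Annotated form: READ_ONCE() forces a fresh, untorn load on every
     * iteration and documents the intentional data race for KCSAN. */
    while (!READ_ONCE(vq->ready))
            cpu_relax();
)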
KASAN did not report any issues.
Patching virtiofsd to do an additional copy and going through rust-vmm's
.copy_to() function did not change the behaviour.
I will mostly be off next week and will continue the analysis
afterwards. Happy to hear suggestions for other things to try :).
Good weekend,
- Erik
* Re: [Virtio-fs] [BUG] virtio-fs: Corruption when running binaries from virtiofsd-backed fs
From: Erik Schilling @ 2023-09-13 6:18 UTC
To: Erik Schilling, virtualization, linux-fsdevel, virtio-fs
Cc: Vivek Goyal, Stefan Hajnoczi, Miklos Szeredi, Alex Bennée,
Manos Pitsidianakis, Viresh Kumar, linux-kernel, qemu-devel
On Fri Sep 1, 2023 at 12:37 PM CEST, Erik Schilling wrote:
> On Wed Aug 30, 2023 at 10:20 AM CEST, Erik Schilling wrote:
> > [...]
>
> [...]
>
> I will mostly be off next week and will continue the analysis
> afterwards. Happy to hear suggestions for other things to try :).
Back from a week of vacation...
Summary of what was discussed on #virtiofs:matrix.org:
The issue only seems to happen in QEMU TCG scenarios (I tested aarch64
and x86_64 guests on an x86_64 host; wizzard on Matrix tested arm32).
CCing qemu-devel. Maybe someone has some hints on where to focus the
debugging efforts?
I am trying to build a complex monster script that traces the relevant
addresses in order to figure out whether the guest or the host does the
writes. But I am happy to hear about more clever ideas :).
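(The rough idea is a TCG plugin that logs every guest store into the
affected page - if the corruption appears while the plugin never fires,
the write must have come from the host side. A sketch of what I have in
mind; API usage is from memory and the argument plumbing is simplified:

    #include <inttypes.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <qemu-plugin.h>

    QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

    static uint64_t watch_page;

    static void vcpu_mem(unsigned int vcpu, qemu_plugin_meminfo_t info,
                         uint64_t vaddr, void *udata)
    {
        if (qemu_plugin_mem_is_store(info) &&
            (vaddr & ~0xfffULL) == watch_page) {
            char msg[80];

            snprintf(msg, sizeof(msg),
                     "guest store into watched page: 0x%" PRIx64 "\n",
                     vaddr);
            qemu_plugin_outs(msg);
        }
    }

    static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
    {
        size_t i, n = qemu_plugin_tb_n_insns(tb);

        /* Instrument all stores; the address filter runs in the callback. */
        for (i = 0; i < n; i++) {
            qemu_plugin_register_vcpu_mem_cb(qemu_plugin_tb_get_insn(tb, i),
                                             vcpu_mem, QEMU_PLUGIN_CB_NO_REGS,
                                             QEMU_PLUGIN_MEM_W, NULL);
        }
    }

    QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                               const qemu_info_t *info,
                                               int argc, char **argv)
    {
        /* e.g. -plugin ./libwatch.so,arg=0xffff000012345000 */
        uint64_t a = 0;

        if (argc > 0) {
            const char *s = strrchr(argv[0], '=');
            a = strtoull(s ? s + 1 : argv[0], NULL, 16);
        }
        watch_page = a & ~0xfffULL;
        qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
        return 0;
    }
)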
- Erik
* Re: [Virtio-fs] [BUG] virtio-fs: Corruption when running binaries from virtiofsd-backed fs
From: Erik Schilling @ 2023-09-13 14:00 UTC
To: Erik Schilling, virtualization, linux-fsdevel, virtio-fs
Cc: Richard Henderson, Weiwei Li, Vivek Goyal, Stefan Hajnoczi,
Miklos Szeredi, Alex Bennée, Manos Pitsidianakis,
Viresh Kumar, linux-kernel, qemu-devel, German Maglione,
Hanna Czenczek
CCing a few more people as suggested by stefanha on #qemu.
On Wed Sep 13, 2023 at 8:18 AM CEST, Erik Schilling wrote:
> On Fri Sep 1, 2023 at 12:37 PM CEST, Erik Schilling wrote:
> > On Wed Aug 30, 2023 at 10:20 AM CEST, Erik Schilling wrote:
> > > [...]
> >
> > [...]
>
> [...]
>
> I am trying to build a complex monster script that traces the relevant
> addresses in order to figure out whether the guest or the host does the
> writes. But I am happy to hear about more clever ideas :).
After hearing about investigations of bugs in other virtio scenarios
that seem to be caused by QEMU [9], I tested some older QEMU versions.
Indeed, a882b5712373171d3bd53cd82ddab4453ddef468 did not show the buggy
behaviour. So I did a bisect:
git bisect start
# status: waiting for both good and bad commits
# good: [a882b5712373171d3bd53cd82ddab4453ddef468] Revert "virtio: introduce macro IRTIO_CONFIG_IRQ_IDX"
git bisect good a882b5712373171d3bd53cd82ddab4453ddef468
# status: waiting for bad commit, 1 good commit known
# bad: [9ef497755afc252fb8e060c9ea6b0987abfd20b6] Merge tag 'pull-vfio-20230911' of https://github.com/legoater/qemu into staging
git bisect bad 9ef497755afc252fb8e060c9ea6b0987abfd20b6
# skip: [3ba5fe46ea4456a16e2f47ab8e75943b54879c4e] Merge tag 'mips-20221108' of https://github.com/philmd/qemu into staging
git bisect skip 3ba5fe46ea4456a16e2f47ab8e75943b54879c4e
# skip: [ade760a2f63804b7ab1839fbc3e5ddbf30538718] Merge tag 'pull-request-2022-11-08' of https://gitlab.com/thuth/qemu into staging
git bisect skip ade760a2f63804b7ab1839fbc3e5ddbf30538718
# good: [ad2ca2e3f762b0cb98eb976002569795b270aef1] target/xtensa: Drop tcg_temp_free
git bisect good ad2ca2e3f762b0cb98eb976002569795b270aef1
# bad: [19a720b74fde7e859d19f12c66a72e545947a657] Merge tag 'tracing-pull-request' of https://gitlab.com/stefanha/qemu into staging
git bisect bad 19a720b74fde7e859d19f12c66a72e545947a657
# bad: [29d9efca16080211f107b540f04d1ed3c12c63b0] arm/Kconfig: Do not build TCG-only boards on a KVM-only build
git bisect bad 29d9efca16080211f107b540f04d1ed3c12c63b0
# good: [9636e513255362c4a329e3e5fb2c97dab3c5ce47] Merge tag 'misc-next-pull-request' of https://gitlab.com/berrange/qemu into staging
git bisect good 9636e513255362c4a329e3e5fb2c97dab3c5ce47
# bad: [45608654aa63ca2b311d6cb761e1522f2128e00e] Merge tag 'pull-tpm-2023-04-20-1' of https://github.com/stefanberger/qemu-tpm into staging
git bisect bad 45608654aa63ca2b311d6cb761e1522f2128e00e
# good: [1ff4a81bd3efb207992f1da267886fe0c4df764f] tcg: use QTree instead of GTree
git bisect good 1ff4a81bd3efb207992f1da267886fe0c4df764f
# bad: [9ed98cae151368cc89c4bb77c9f325f7185e8f09] block-backend: ignore inserted state in blk_co_nb_sectors
git bisect bad 9ed98cae151368cc89c4bb77c9f325f7185e8f09
# good: [c8cb603293fd329f2a62ade76ec9de3f462fc5c3] tests/avocado: Test Xen guest support under KVM
git bisect good c8cb603293fd329f2a62ade76ec9de3f462fc5c3
# bad: [64f1c63d87208e28e8e38c4ab514ada1728960ef] Merge tag 'pull_error_handle_fix_use_after_free.v1' of https://github.com/stefanberger/qemu-tpm into staging
git bisect bad 64f1c63d87208e28e8e38c4ab514ada1728960ef
# good: [8a712df4d4d736b7fe6441626677bfd271d95b15] Merge tag 'pull-for-8.0-040423-2' of https://gitlab.com/stsquad/qemu into staging
git bisect good 8a712df4d4d736b7fe6441626677bfd271d95b15
# bad: [7d0334e49111787ae19fbc8d29ff6e7347f0605e] Merge tag 'pull-tcg-20230404' of https://gitlab.com/rth7680/qemu into staging
git bisect bad 7d0334e49111787ae19fbc8d29ff6e7347f0605e
# bad: [3371802fba3f7be4465f8a5e5777d43d556676ef] accel/tcg: Fix jump cache set in cpu_exec_loop
git bisect bad 3371802fba3f7be4465f8a5e5777d43d556676ef
# good: [6cda41daa2162b8e1048124655ba02a8c2b762b4] Revert "linux-user/arm: Take more care allocating commpage"
git bisect good 6cda41daa2162b8e1048124655ba02a8c2b762b4
# skip: [c83574392e0af108a643347712564f6749906413] accel/tcg: Fix overwrite problems of tcg_cflags
git bisect skip c83574392e0af108a643347712564f6749906413
# only skipped commits left to test
# possible first bad commit: [3371802fba3f7be4465f8a5e5777d43d556676ef] accel/tcg: Fix jump cache set in cpu_exec_loop
# possible first bad commit: [c83574392e0af108a643347712564f6749906413] accel/tcg: Fix overwrite problems of tcg_cflags
The end result is inconclusive: with c83574392e checked out I was unable
to start the VM at all, so I could not test that commit.
Whether one of these commits contains a bug, or whether new QEMU
behaviour merely revealed a bug somewhere else, is of course still to be
figured out.
[9] https://gitlab.com/qemu-project/qemu/-/issues/1866
- Erik
* Re: [Virtio-fs] [BUG] virtio-fs: Corruption when running binaries from virtiofsd-backed fs
From: Alex Bennée @ 2023-09-13 15:38 UTC
To: Erik Schilling
Cc: virtualization, linux-fsdevel, virtio-fs, Richard Henderson,
Weiwei Li, Vivek Goyal, Stefan Hajnoczi, Miklos Szeredi,
Manos Pitsidianakis, Viresh Kumar, linux-kernel, qemu-devel,
German Maglione, Hanna Czenczek, Philippe Mathieu-Daudé
"Erik Schilling" <erik.schilling@linaro.org> writes:
> CCing a few more people as suggested by stefanha on #qemu.
(Adding philmd, who's tracking the MIPS failure)
> On Wed Sep 13, 2023 at 8:18 AM CEST, Erik Schilling wrote:
>> On Fri Sep 1, 2023 at 12:37 PM CEST, Erik Schilling wrote:
>> > On Wed Aug 30, 2023 at 10:20 AM CEST, Erik Schilling wrote:
>> > > Hi all!
>> > >
>> > > [...]
>> > >
>> > > bad-kernel [6]:
>> > > [...]
>> > > kernel: virtio_fs: page: 000000006ce9a559
>> > > kernel: virtio_fs: to: 000000007ae8b946
Are these the copies from virtio into the buffer cache? I would assume
they trigger -d trace:memory_notdirty_write_access if so.
>> > > kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>> > > kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>> > > kernel: virtio_fs rsp:80 40 de c8 ff ff 00 00 cc 2b 62 ae ff ff 00 00 .@.......+b.....
>> > > kernel: virtio_fs rsp:02 4e de c8 ff ff 00 00 00 00 00 00 00 00 00 00 .N..............
>> > > [...]
>
> [...]
>
> git bisect start
> [...]
> # possible first bad commit:
> [3371802fba3f7be4465f8a5e5777d43d556676ef] accel/tcg: Fix jump cache
> set in cpu_exec_loop
It should be possible to rule out the jump cache in HEAD by patching the
two

    if (likely(tb &&
               tb->pc == pc &&
               tb->cs_base == cs_base &&
               tb->flags == flags &&
               tb_cflags(tb) == cflags)) {
        return tb;
    }

checks in tb_lookup() to if (0) { return tb; }. That would rule a
corrupted jump cache in or out (at the cost of running a bit slower).

Apropos of earlier: tb_jmp_cache_inval_tb() is responsible for removing
entries for TBs that are invalidated for whatever reason from the cache.
> # possible first bad commit:
> [c83574392e0af108a643347712564f6749906413] accel/tcg: Fix overwrite
> problems of tcg_cflags
>
> The end result is inconclusive: with c83574392e checked out I was
> unable to start the VM at all, so I could not test that commit.
It's interesting that the PCREL stuff is involved. This is a relatively
recent optimisation that allows us to re-use translations for physical
pages that are mapped into virtual memory multiple times. To do that,
the target translator has to support CF_PCREL and translate knowing the
PC only as an offset into the page.
>
> Whether one of these commits contains a bug, or whether new QEMU
> behaviour merely revealed a bug somewhere else, is of course still to
> be figured out.
>
> [9] https://gitlab.com/qemu-project/qemu/-/issues/1866
This is a follow-up on #1826, #1834 and #1846, which are all broadly to
do with how QEMU deals with updates to its memory maps (e.g. when PCI
devices are remapped/reconfigured). We are unsure whether the MIPS
architecture should be executing some sort of barrier around
reconfiguration that ensures we exit the block, or whether we have to
detect mid-block IO accesses icount-style.
--
Alex Bennée
Virtualisation Tech Lead @ Linaro