All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dominique MARTINET <dominique.martinet@atmark-techno.com>
To: io-uring@vger.kernel.org, linux-btrfs@vger.kernel.org
Subject: read corruption with qemu master io_uring engine / linux master / btrfs(?)
Date: Tue, 28 Jun 2022 18:08:40 +0900	[thread overview]
Message-ID: <YrrFGO4A1jS0GI0G@atmark-techno.com> (raw)

I don't have any good reproducer so it's a bit difficult to specify,
let's start with what I have...

I've got this one VM which has various segfaults all over the place when
starting it with aio=io_uring for its disk as follow:

  qemu-system-x86_64 -drive file=qemu/atde-test,if=none,id=hd0,format=raw,cache=none,aio=io_uring \
      -device virtio-blk-pci,drive=hd0 -m 8G -smp 4 -serial mon:stdio -enable-kvm

It also happens with virtio-scsi-blk:
  -device virtio-scsi-pci,id=scsihw0 \
  -drive file=qemu/atde-test,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring \
  -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100

It also happened when the disk I was using was a qcow file backing up a
vmdk image (this VM's original disk is for vmware), so while I assume
qemu reading code and qemu-img convert code are similar I'll pretend
image format doesn't matter at this point...
It's happened with two such images, but I haven't been able to reproduce
with any other VMs yet.

I can also reproduce this on a second host machine with a completely
different ssd (WD sata in one vs. samsung nvme), so probably not a
firmware bug.

scrub sees no problem with my filesystems on the host.

I've confirmed it happens with at least debian testing's 5.16.0-4-amd64
and 5.17.0-1-amd64 kernels, as well as 5.19.0-rc4.
It also happens with both debian's 7.0.0 and the master branch
(v7.0.0-2031-g40d522490714)


These factors aside, anything else I tried changing made this bug no
longer reproduce:
 - I'm not sure what the rule is but it sometimes doesn't happen when
running the VM twice in a row, sometimes it happens again. Making a
fresh copy with `cp --reflink=always` of my source image seems to be
reliable.
 - it stops happening without io_uring
 - it stops happening if I copy the disk image with --reflink=never
 - it stops happening if I copy the disk image to another btrfs
partition, created in the same lv, so something about my partition
history matters?...
(I have ssd > GPT partitions > luks > lvm > btrfs with a single disk as
metadata DUP data single)
 - I was unable to reproduce on xfs (with a reflink copy) either but I
also was only able to try on a new fs...
 - I've never been able to reproduce on other VMs


If you'd like to give it a try, my reproducer source image is
---
curl -O https://download.atmark-techno.com/atde/atde9-amd64-20220624.tar.xz
tar xf atde9-amd64-20220624.tar.xz
qemu-img convert -O raw atde9-amd64-20220624/atde9-amd64.vmdk atde-test
cp --reflink=always atde-test atde-test2
---
and using 'atde-test'.
For further attempts I've removed atde-test and copied back from
atde-test2 with cp --reflink=always.
This VM graphical output is borked, but ssh listens so something like
`-netdev user,id=net0,hostfwd=tcp::2227-:22 -device virtio-net-pci,netdev=net0`
and 'ssh -p 2227 -l atmark localhost' should allow login with password
'atmark' or you can change vt on the console (root password 'root')

I also had similar problems with atde9-amd64-20211201.tar.xz .


When reproducing I've had either segfaults in the initrd and complete
boot failures, or boot working and login failures but ssh working
without login shell (ssh ... -tt localhost sh)
that allowed me to dump content of a couple of corrupted files.
When I looked:
- /usr/lib/x86_64-linux-gnu/libc-2.31.so had zeroes instead of data from
offset 0xb6000 to 0xb7fff; rest of file was identical.
- /usr/bin/dmesg had garbadge from 0x05000 until 0x149d8 (end of file).
I was lucky and could match the garbage quickly: it is identical to the
content from 0x1000-0x109d8 within the disk itself.

I've rebooted a few times and it looks like the corruption is identical
everytime for this machine as long as I keep using the same source file;
running from qemu-img convert again seems to change things a bit?
but whatever it is that is specific to these files is stable, even
through host reboots.



I'm sorry I haven't been able to make a better reproducer, I'll keep
trying a bit more tomorrow but maybe someone has an idea with what I've
had so far :/

Perhaps at this point it might be simpler to just try to take qemu out
of the equation and issue many parallel reads to different offsets
(overlapping?) of a large file in a similar way qemu io_uring engine
does and check their contents?


Thanks, and I'll probably follow up a bit tomorrow even if no-one has
any idea, but even ideas of where to look would be appreciated.
-- 
Dominique

             reply	other threads:[~2022-06-28  9:15 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-28  9:08 Dominique MARTINET [this message]
2022-06-28 19:03 ` read corruption with qemu master io_uring engine / linux master / btrfs(?) Nikolay Borisov
2022-06-29  0:35   ` Dominique MARTINET
2022-06-29  5:14     ` Dominique MARTINET
2022-06-29 15:37       ` Filipe Manana
2022-06-30  0:41         ` Dominique MARTINET
2022-06-30  7:56           ` Dominique MARTINET
2022-06-30 10:45             ` Filipe Manana
2022-06-30 11:09               ` Filipe Manana
2022-06-30 11:27               ` Dominique MARTINET
2022-06-30 12:51                 ` Filipe Manana
2022-06-30 13:08                   ` Dominique MARTINET
2022-06-30 15:10                     ` Filipe Manana
2022-07-01  1:25                       ` Dominique MARTINET
2022-07-01  8:45                         ` Filipe Manana
2022-07-01 10:33                           ` Filipe Manana
2022-07-01 23:48                             ` Dominique MARTINET
2022-06-28 19:05 ` Nikolay Borisov
2022-06-28 19:12 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YrrFGO4A1jS0GI0G@atmark-techno.com \
    --to=dominique.martinet@atmark-techno.com \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.