* Potential regression in 'qemu-img convert' to LVM
From: Stefan Reiter @ 2020-09-14 12:25 UTC
To: qemu-block; +Cc: qemu-devel

Hi list,

following command fails since 5.1 (tested on kernel 5.4.60):

# qemu-img convert -p -f raw -O raw /dev/zvol/pool/disk-1 /dev/vg/disk-1
qemu-img: error while writing at byte 2157968896: Device or resource busy

(source is ZFS here, but doesn't matter in practice, it always fails the
same; offset changes slightly but consistently hovers around 2^31)

strace shows the following:
fallocate(13, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2157968896, 4608) = -1 EBUSY (Device or resource busy)

Other fallocate calls leading up to this work fine.

This happens since commit edafc70c0c "qemu-img convert: Don't pre-zero
images", before that all fallocates happened at the start. Reverting the
commit and calling qemu-img exactly the same way on the same data works
fine.

Simply retrying the syscall on EBUSY (like EINTR) does *not* work,
once it fails it keeps failing with the same error.

I couldn't find anything related to EBUSY on fallocate, and it only
happens on LVM targets... Any idea or pointers where to look?

~ Stefan
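As a side note on the numbers in the strace line, here is a small worked calculation of where the failing range sits relative to 2^31 and how it is aligned. This is only a sketch: the function and key names are illustrative and nothing here is taken from the QEMU source. The 512 and 4096 byte granularities are common sector and page sizes picked for illustration, not values reported in the thread.

```python
# Offset and length copied from the strace output in the report.
OFFSET = 2157968896
LENGTH = 4608

TWO_GIB = 2**31
MIB = 2**20


def describe_range(offset: int, length: int) -> dict:
    """Return basic facts about a byte range: distance from 2^31 and alignment."""
    end = offset + length
    return {
        "bytes_past_2gib": offset - TWO_GIB,
        "end_bytes_past_2gib": end - TWO_GIB,
        "offset_mod_512": offset % 512,      # 512 is an assumed sector size
        "offset_mod_4096": offset % 4096,    # 4096 is an assumed page/block size
        "length_in_sectors": length // 512,
        "length_mod_512": length % 512,
    }


facts = describe_range(OFFSET, LENGTH)
for name, value in facts.items():
    print(f"{name}: {value}")
```

The request starts 512 bytes below the 2 GiB + 10 MiB mark and ends 4096 bytes past it. It is 512-byte aligned but not 4096-byte aligned. Whether that alignment has anything to do with the EBUSY is not established in this thread.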
* Re: Potential regression in 'qemu-img convert' to LVM
From: Nir Soffer @ 2020-09-15 9:08 UTC
To: Stefan Reiter; +Cc: QEMU Developers, qemu-block

On Mon, Sep 14, 2020 at 3:25 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
>
> Hi list,
>
> following command fails since 5.1 (tested on kernel 5.4.60):
>
> # qemu-img convert -p -f raw -O raw /dev/zvol/pool/disk-1 /dev/vg/disk-1
> qemu-img: error while writing at byte 2157968896: Device or resource busy
>
> (source is ZFS here, but doesn't matter in practice, it always fails the
> same; offset changes slightly but consistently hovers around 2^31)
>
> strace shows the following:
> fallocate(13, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2157968896,
> 4608) = -1 EBUSY (Device or resource busy)

What is the size of the LV?

Does it happen if you change sparse minimum size (-S)?

For example: -S 64k

qemu-img convert -p -f raw -O raw -S 64k /dev/zvol/pool/disk-1 /dev/vg/disk-1

> Other fallocate calls leading up to this work fine.
>
> This happens since commit edafc70c0c "qemu-img convert: Don't pre-zero
> images", before that all fallocates happened at the start. Reverting the
> commit and calling qemu-img exactly the same way on the same data works
> fine.

But slowly, doing up to 100% more work for fully allocated images.

> Simply retrying the syscall on EBUSY (like EINTR) does *not* work,
> once it fails it keeps failing with the same error.
>
> I couldn't find anything related to EBUSY on fallocate, and it only
> happens on LVM targets... Any idea or pointers where to look?

Is this thin LV?

This works for us using regular LVs.

Which kernel? which distro?

Nir
* Re: Potential regression in 'qemu-img convert' to LVM
From: Stefan Reiter @ 2020-09-15 11:51 UTC
To: Nir Soffer; +Cc: QEMU Developers, qemu-block

On 9/15/20 11:08 AM, Nir Soffer wrote:
> On Mon, Sep 14, 2020 at 3:25 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
>>
>> Hi list,
>>
>> following command fails since 5.1 (tested on kernel 5.4.60):
>>
>> # qemu-img convert -p -f raw -O raw /dev/zvol/pool/disk-1 /dev/vg/disk-1
>> qemu-img: error while writing at byte 2157968896: Device or resource busy
>>
>> (source is ZFS here, but doesn't matter in practice, it always fails the
>> same; offset changes slightly but consistently hovers around 2^31)
>>
>> strace shows the following:
>> fallocate(13, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2157968896,
>> 4608) = -1 EBUSY (Device or resource busy)
>
> What is the size of the LV?
>

Same as the source, 5GB in my test case. Created with:

# lvcreate -ay --size 5242880k --name disk-1 vg

> Does it happen if you change sparse minimum size (-S)?
>
> For example: -S 64k
>
> qemu-img convert -p -f raw -O raw -S 64k /dev/zvol/pool/disk-1
> /dev/vg/disk-1
>

Tried a few different values, always the same result: EBUSY at byte
2157968896.

>> Other fallocate calls leading up to this work fine.
>>
>> This happens since commit edafc70c0c "qemu-img convert: Don't pre-zero
>> images", before that all fallocates happened at the start. Reverting the
>> commit and calling qemu-img exactly the same way on the same data works
>> fine.
>
> But slowly, doing up to 100% more work for fully allocated images.
>

Of course, I'm not saying the patch is wrong, reverting it just avoids
triggering the bug.

>> Simply retrying the syscall on EBUSY (like EINTR) does *not* work,
>> once it fails it keeps failing with the same error.
>>
>> I couldn't find anything related to EBUSY on fallocate, and it only
>> happens on LVM targets... Any idea or pointers where to look?
>
> Is this thin LV?
>

No, regular LV. See command above.

> This works for us using regular LVs.
>
> Which kernel? which distro?
>

Reproducible on:
* PVE w/ kernel 5.4.60 (Ubuntu based)
* Manjaro w/ kernel 5.8.6

I found that it does not happen with all images, I suppose there must be
a certain number of smaller holes for it to happen. I am using a VM
image with a bare-bones Alpine Linux installation, but it's not an
isolated case, we've had two people report the issue on our bug tracker:
https://bugzilla.proxmox.com/show_bug.cgi?id=3002

Thanks,
Stefan

> Nir
>
>
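To put the reported LV size next to the failing offset, a quick sanity check follows. It is a sketch under one assumption: that the `k` suffix given to `lvcreate --size` means KiB (1024 bytes). The variable names are illustrative only.

```python
# Values quoted in the thread.
LV_SIZE_KIB = 5242880      # from: lvcreate -ay --size 5242880k (assumed to be KiB)
FAIL_OFFSET = 2157968896   # from the qemu-img error message
FAIL_LENGTH = 4608         # from the strace fallocate() call

lv_size_bytes = LV_SIZE_KIB * 1024
fail_end = FAIL_OFFSET + FAIL_LENGTH

# Does the punched range lie entirely inside the device?
range_inside_lv = fail_end <= lv_size_bytes

# How far into the device does the failure occur, as a fraction?
position = FAIL_OFFSET / lv_size_bytes

print("LV size in bytes:", lv_size_bytes)
print("LV size in GiB:", lv_size_bytes / 2**30)
print("range ends inside LV:", range_inside_lv)
print("position of failure: {:.1%} of the LV".format(position))
```

Under that assumption the LV is exactly 5 GiB and the failing range lies well inside it, roughly 40% of the way in, so the error does not look like a write past the end of the device. This only rules out one simple explanation and says nothing about the actual cause.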
* Re: Potential regression in 'qemu-img convert' to LVM
From: Nir Soffer @ 2021-01-07 20:03 UTC
To: Stefan Reiter; +Cc: QEMU Developers, qemu-block, Maxim Levitsky

On Tue, Sep 15, 2020 at 2:51 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
>
> On 9/15/20 11:08 AM, Nir Soffer wrote:
> > On Mon, Sep 14, 2020 at 3:25 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
> >>
> >> Hi list,
> >>
> >> following command fails since 5.1 (tested on kernel 5.4.60):
> >>
> >> # qemu-img convert -p -f raw -O raw /dev/zvol/pool/disk-1 /dev/vg/disk-1
> >> qemu-img: error while writing at byte 2157968896: Device or resource busy
> >>
> >> (source is ZFS here, but doesn't matter in practice, it always fails the
> >> same; offset changes slightly but consistently hovers around 2^31)
> >>
> >> strace shows the following:
> >> fallocate(13, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2157968896,
> >> 4608) = -1 EBUSY (Device or resource busy)
> >
> > What is the size of the LV?
> >
>
> Same as the source, 5GB in my test case. Created with:
>
> # lvcreate -ay --size 5242880k --name disk-1 vg
>
> > Does it happen if you change sparse minimum size (-S)?
> >
> > For example: -S 64k
> >
> > qemu-img convert -p -f raw -O raw -S 64k /dev/zvol/pool/disk-1
> > /dev/vg/disk-1
> >
>
> Tried a few different values, always the same result: EBUSY at byte
> 2157968896.
>
> >> Other fallocate calls leading up to this work fine.
> >>
> >> This happens since commit edafc70c0c "qemu-img convert: Don't pre-zero
> >> images", before that all fallocates happened at the start. Reverting the
> >> commit and calling qemu-img exactly the same way on the same data works
> >> fine.
> >
> > But slowly, doing up to 100% more work for fully allocated images.
> >
>
> Of course, I'm not saying the patch is wrong, reverting it just avoids
> triggering the bug.
>
> >> Simply retrying the syscall on EBUSY (like EINTR) does *not* work,
> >> once it fails it keeps failing with the same error.
> >>
> >> I couldn't find anything related to EBUSY on fallocate, and it only
> >> happens on LVM targets... Any idea or pointers where to look?
> >
> > Is this thin LV?
> >
>
> No, regular LV. See command above.
>
> > This works for us using regular LVs.
> >
> > Which kernel? which distro?
> >
>
> Reproducible on:
> * PVE w/ kernel 5.4.60 (Ubuntu based)
> * Manjaro w/ kernel 5.8.6
>
> I found that it does not happen with all images, I suppose there must be
> a certain number of smaller holes for it to happen. I am using a VM
> image with a bare-bones Alpine Linux installation, but it's not an
> isolated case, we've had two people report the issue on our bug tracker:
> https://bugzilla.proxmox.com/show_bug.cgi?id=3002

I think that this issue may be fixed by
https://lists.nongnu.org/archive/html/qemu-block/2020-11/msg00358.html

Nir
* Re: Potential regression in 'qemu-img convert' to LVM
From: Stefan Reiter @ 2021-03-04 16:07 UTC
To: Nir Soffer; +Cc: QEMU Developers, qemu-block, Maxim Levitsky

On 07/01/2021 21:03, Nir Soffer wrote:
> On Tue, Sep 15, 2020 at 2:51 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
>>
>> On 9/15/20 11:08 AM, Nir Soffer wrote:
>>> On Mon, Sep 14, 2020 at 3:25 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
>>>>
>>>> Hi list,
>>>>
>>>> following command fails since 5.1 (tested on kernel 5.4.60):
>>>>
>>>> # qemu-img convert -p -f raw -O raw /dev/zvol/pool/disk-1 /dev/vg/disk-1
>>>> qemu-img: error while writing at byte 2157968896: Device or resource busy
>>>>
>>>> (source is ZFS here, but doesn't matter in practice, it always fails the
>>>> same; offset changes slightly but consistently hovers around 2^31)
>>>>
>>>> strace shows the following:
>>>> fallocate(13, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2157968896,
>>>> 4608) = -1 EBUSY (Device or resource busy)
>>>
>>> What is the size of the LV?
>>>
>>
>> Same as the source, 5GB in my test case. Created with:
>>
>> # lvcreate -ay --size 5242880k --name disk-1 vg
>>
>>> Does it happen if you change sparse minimum size (-S)?
>>>
>>> For example: -S 64k
>>>
>>> qemu-img convert -p -f raw -O raw -S 64k /dev/zvol/pool/disk-1
>>> /dev/vg/disk-1
>>>
>>
>> Tried a few different values, always the same result: EBUSY at byte
>> 2157968896.
>>
>>>> Other fallocate calls leading up to this work fine.
>>>>
>>>> This happens since commit edafc70c0c "qemu-img convert: Don't pre-zero
>>>> images", before that all fallocates happened at the start. Reverting the
>>>> commit and calling qemu-img exactly the same way on the same data works
>>>> fine.
>>>
>>> But slowly, doing up to 100% more work for fully allocated images.
>>>
>>
>> Of course, I'm not saying the patch is wrong, reverting it just avoids
>> triggering the bug.
>>
>>>> Simply retrying the syscall on EBUSY (like EINTR) does *not* work,
>>>> once it fails it keeps failing with the same error.
>>>>
>>>> I couldn't find anything related to EBUSY on fallocate, and it only
>>>> happens on LVM targets... Any idea or pointers where to look?
>>>
>>> Is this thin LV?
>>>
>>
>> No, regular LV. See command above.
>>
>>> This works for us using regular LVs.
>>>
>>> Which kernel? which distro?
>>>
>>
>> Reproducible on:
>> * PVE w/ kernel 5.4.60 (Ubuntu based)
>> * Manjaro w/ kernel 5.8.6
>>
>> I found that it does not happen with all images, I suppose there must be
>> a certain number of smaller holes for it to happen. I am using a VM
>> image with a bare-bones Alpine Linux installation, but it's not an
>> isolated case, we've had two people report the issue on our bug tracker:
>> https://bugzilla.proxmox.com/show_bug.cgi?id=3002
>
> I think that this issue may be fixed by
> https://lists.nongnu.org/archive/html/qemu-block/2020-11/msg00358.html
>
> Nir
>
>

Sorry for the late reply, but yes, I can confirm this fixes the issue.

~