Re: [Qemu-devel] [Qemu-discuss] Converting qcow2 image to raw thin lv

From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Nir Soffer <nirsof@gmail.com>, Max Reitz <mreitz@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org,
	qemu-discuss@nongnu.org
Subject: Re: [Qemu-devel] [Qemu-discuss] Converting qcow2 image to raw thin lv
Date: Mon, 13 Feb 2017 13:11:51 +0100
Message-ID: <20170213121151.GA31520@olga.wb>
In-Reply-To: <20170213100430.GA4011@noname.redhat.com>

On Mon, Feb 13, 2017 at 11:04:30AM +0100, Kevin Wolf wrote:
> Am 12.02.2017 um 01:58 hat Nir Soffer geschrieben:
> > On Sat, Feb 11, 2017 at 12:23 AM, Nir Soffer <nirsof@gmail.com> wrote:
> > > Hi all,
> > >
> > > I'm trying to convert images (mostly qcow2) to raw format on thin lv,
> > > hoping to write only the allocated blocks on the thin lv, but
> > > it seems that qemu-img cannot write sparse image on a block
> > > device.
> > >
> > > (...)
> > 
> > So it seems that qemu-img is trying to write a sparse image.
> > 
> > I tested again with empty file:
> > 
> >     truncate -s 20m empty
> > 
> > Using strace, qemu-img checks the device discard_zeroes_data:
> > 
> >     ioctl(11, BLKDISCARDZEROES, 0)          = 0
> > 
> > Then it finds that the source is empty:
> > 
> >     lseek(10, 0, SEEK_DATA)                 = -1 ENXIO (No such device or address)
> > 
> > Then it issues one call:
> > 
> >     [pid 10041] ioctl(11, BLKZEROOUT, 0x7f6049c82ba0) = 0
> > 
> > Then it fsyncs and closes the destination.
> > 
> > # grep -s "" /sys/block/dm-57/queue/discard_*
> > /sys/block/dm-57/queue/discard_granularity:65536
> > /sys/block/dm-57/queue/discard_max_bytes:17179869184
> > /sys/block/dm-57/queue/discard_zeroes_data:0
> > 
> > I wonder why discard_zeroes_data is 0, while discarding
> > blocks seems to zero them.
> > 
> > This seems to be this bug:
> > https://bugzilla.redhat.com/835622
> > 
> > A thin lv does promise (by default) to zero newly allocated blocks,
> > and it does return zeros when reading unallocated data, like
> > a sparse file.
> > 
> > Since qemu does not know that the thin lv is not allocated, it cannot
> > skip empty blocks safely.
> > 
> > It would be useful if it had a flag to force sparseness when the
> > user knows that this operation is safe, or maybe we need a thin lvm
> > driver?
> 
> Yes, I think your analysis is correct, I seem to remember that I've seen
> this happen before.
> 
> The Right Thing (TM) to do, however, seems to be fixing the kernel so
> that BLKDISCARDZEROES correctly returns that discard does in fact zero
> out blocks on this device. As soon as this ioctl works correctly,
> qemu-img should just automatically do what you want.
> 
> Now if it turns out it is important to support older kernels without the
> fix, we can think about a driver-specific option for the 'file' driver
> that overrides the kernel's value. But I really want to make sure that
> we use such workarounds only in addition to, not instead of, doing the
> proper root cause fix in the kernel.
> 
> So can you please bring it up with the LVM people?

I'm not sure it's that easy. The discard granularity of LVM thin volumes
is not equal to their reported block/sector sizes, but to the size of
the chunks they allocate.

  # blockdev --getss /dev/dm-9
  512
  # blockdev --getbsz /dev/dm-9
  4096
  # blockdev --getpbsz /dev/dm-9
  4096
  # cat /sys/block/dm-9/queue/discard_granularity
  131072
  #

I currently don't see qemu using the discard_granularity property for
this purpose. IIRC the code for write_zeroes(), for example, simply
checks the discard_zeroes flag but not the size of the range it is
trying to zero out/discard.
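To illustrate why the granularity matters, here is a minimal sketch (not
qemu code). It assumes that a discard can only be relied on to zero whole,
granularity-aligned chunks, so any unaligned head or tail of a request would
still have to be written as explicit zeros. The function name
split_for_discard is hypothetical; 131072 is the discard_granularity shown
above.

```python
def split_for_discard(offset: int, length: int, granularity: int):
    """Split the byte range [offset, offset + length) into three
    (start, length) pieces: head, middle, tail.

    Only the middle piece is aligned to whole chunks of `granularity`
    bytes; in this sketch it is the only part that may be discarded.
    Head and tail would need explicit zero writes.
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    end = offset + length
    # Round the start up and the end down to chunk boundaries.
    aligned_start = -(-offset // granularity) * granularity
    aligned_end = (end // granularity) * granularity
    if aligned_start >= aligned_end:
        # The request does not cover a single whole chunk.
        return (offset, length), (end, 0), (end, 0)
    head = (offset, aligned_start - offset)
    middle = (aligned_start, aligned_end - aligned_start)
    tail = (aligned_end, end - aligned_end)
    return head, middle, tail


if __name__ == "__main__":
    # A 256 KiB request starting 4 KiB into the device.
    print(split_for_discard(4096, 262144, 131072))
```

With a 4096-byte request the middle piece is empty, which is the case where
checking only the discard_zeroes flag would not be enough.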

We have an experimental, semi-complete "can-do-footshooting" 'zeroinit'
filter for this purpose. It explicitly sets the "has_zero_init" flag and
drops "write_zeroes()" calls to blocks at an address greater than the
highest one written up to that point.
It should use a dirty bitmap instead and is sort of dangerous this way,
which is why it's not on the qemu-devel list. But if this approach is at
all acceptable (despite being a hack), I could improve it and send it to
the list.
https://github.com/Blub/qemu/commit/6f6f38d2ef8f22a12f72e4d60f8a1fa978ac569a
(you'd just prefix the destination with `zeroinit:` in the qemu-img
command)
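
The idea behind the filter can be modelled in a few lines. This is only a
toy model of the behaviour described above, not the code from the linked
commit; the class and attribute names are made up for illustration, and it
tracks the end offset of the highest write rather than a dirty bitmap.

```python
class ZeroInitModel:
    """Toy model of a 'zeroinit'-style filter (hypothetical names)."""

    # The filter claims the device starts out reading as zeros.
    has_zero_init = True

    def __init__(self):
        self.highest_written = 0  # end offset of the highest data write so far
        self.forwarded = []       # requests passed on to the underlying device

    def write(self, offset: int, data: bytes) -> None:
        # Data writes are always forwarded and move the high-water mark.
        self.forwarded.append(("write", offset, len(data)))
        self.highest_written = max(self.highest_written, offset + len(data))

    def write_zeroes(self, offset: int, length: int) -> None:
        # Zero writes past everything written so far are dropped, on the
        # assumption that this area still reads as zeros.
        if offset >= self.highest_written:
            return
        self.forwarded.append(("write_zeroes", offset, length))


if __name__ == "__main__":
    dev = ZeroInitModel()
    dev.write_zeroes(0, 4096)      # dropped, nothing written yet
    dev.write(0, b"x" * 512)       # forwarded
    dev.write_zeroes(4096, 4096)   # dropped, beyond the highest write
    dev.write_zeroes(0, 512)       # forwarded, overlaps written data
    print(dev.forwarded)
```

The danger is visible in the model: if the destination does not really start
out zeroed, the dropped requests leave stale data behind.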

Additionally, I'm currently still playing with the details and quirks of
various storage backends (lvm/dm thin, rbd, zvols) in an attempt to
create a tool to convert between them. (I did some successful tests
converting disk images between these backends and qcow2, together with
their snapshots, in a COW-aware way...) I'm planning on releasing some
experimental code soon-ish (there's still some polishing to do on the
documentation, the library's API and the format, and the qcow2 support
is a patch for qemu-img to use the library).

My adventures into dm-thin metadata allow me to answer this one though:

> > or maybe we need a thin lvm driver?

Probably not. It does not support SEEK_DATA/SEEK_HOLE and, to my
knowledge, also has no other sane way to query metadata. You'd have
to read the metadata device instead. To do this properly you have to
reserve a metadata snapshot, and there can only ever be one of those per
pool, which means you could only have one such disk in total running on
a system, and no other dm-thin metadata-aware tool could be used during
that time (otherwise the reserve operations will fail with an error and
qemu would have to wait and retry a lot...).
