From: Kevin Wolf <kwolf@redhat.com>
To: Nir Soffer <nsoffer@redhat.com>
Cc: QEMU Developers <qemu-devel@nongnu.org>,
    qemu-block <qemu-block@nongnu.org>, Max Reitz <mreitz@redhat.com>
Subject: Re: [PATCH 0/2] qcow2: Force preallocation with data-file-raw
Date: Tue, 23 Jun 2020 12:18:14 +0200
Message-ID: <20200623101814.GD5853@linux.fritz.box>
In-Reply-To: <CAMRbyyu+tkhZLJXKiuDRxRixZqsXgzQ3GzgcnP0pXN2-r6Xagw@mail.gmail.com>

On 22.06.2020 at 17:50, Nir Soffer wrote:
> On Mon, Jun 22, 2020 at 12:47 PM Max Reitz <mreitz@redhat.com> wrote:
> >
> > On 22.06.20 00:25, Nir Soffer wrote:
> > > On Fri, Jun 19, 2020 at 1:40 PM Max Reitz <mreitz@redhat.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> As discussed here:
> > >>
> > >> https://lists.nongnu.org/archive/html/qemu-block/2020-02/msg00644.html
> > >> https://lists.nongnu.org/archive/html/qemu-block/2020-04/msg00329.html
> > >> https://lists.nongnu.org/archive/html/qemu-block/2020-06/msg00240.html
> > >>
> > >> I think that qcow2 images with data-file-raw should always have
> > >> preallocated 1:1 L1/L2 tables, so that the image always looks the same
> > >> whether you respect or ignore the qcow2 metadata.
> > >
> > > I don't know the internals of qcow2 data_file, but are we really using
> > > qcow2 metadata when accessing the data file?
> >
> > Yes.
> >
> > > This may have unwanted performance consequences.
> >
> > I don’t think so, because in practice normal lookups of L1/L2 mappings
> > generally don’t cost that much performance.
> >
> > > If I understand correctly, qcow2 metadata is needed only for keeping
> > > bitmaps (or maybe future extensions) for the raw data file, and reads
> > > from the qcow2 image should go directly to the raw file without any
> > > extra work.
> > >
> > > Writing to the data file should also bypass the qcow2 metadata, since the bitmap
> > > is updated in memory.
> >
> > Well, with this series, writing would no longer update the metadata at
> > least, because it would always be preallocated already.
> >
> > >> The easiest way to
> > >> achieve that is to enforce at least metadata preallocation whenever
> > >> data-file-raw is given.
> > >
> > > But preallocation is not free; even on file systems it can be
> > > slow (NFS < 4.2).
> >
> > Metadata preallocation with an external data file should be the same
> > speed on every file system. We only need to create the metadata
> > structures, which, with the default cluster size (64k) take up a bit
> > more than 1/8192 of the full image size.
> >
> > Sure, it’s not free. But if we decide we should indeed fully ignore the
> > L1/L2 tables for data-file-raw images, the qcow2 spec must be amended.
> > As I can read it, it currently doesn’t say so.
> >
> > (By the way, this is not a trivial change. Right now, data-file-raw is
> > an autoclear flag: If a version of qemu that doesn’t support it accesses
> > the image, it will automatically clear the flag, but the image stays
> > valid. If we decide to completely ignore the L1/L2 tables (i.e. not
> > even create them), then this can no longer be an autoclear flag. We’d
> > need a new incompatible flag. (Because without L1/L2 tables, the image
> > becomes useless to older qemu versions.))
> >
> > > With block storage this means you need to allocate the entire image size on
> > > storage for writing the metadata.
> > >
> > > While oVirt does not use qcow2 with data_file, having preallocated qcow2
> > > will make this very hard to use, for example for 500 GiB disk we will have to
> > > allocate 500 GiB disk for the raw data file and 500 GiB disk for the qcow2
> > > metadata disk which will be 99% unused.
> >
> > I don’t understand this. When you use an external data file, the qcow2
> > file will only contain the metadata:
> >
> > $ qemu-img create -f qcow2 \
> > -o data_file=foo.data,data_file_raw=on,preallocation=metadata \
> > foo.qcow2 8G
> > Formatting 'foo.qcow2', fmt=qcow2 size=8589934592 data_file=foo.data
> > data_file_raw=on cluster_size=65536 preallocation=metadata
> > lazy_refcounts=off refcount_bits=16
> > $ ls -l foo.qcow2
> > ... 1310720 ... foo.qcow2
> > $ ls -l foo.data
> > ... 8589934592 ... foo.data
>
> When allocating metadata in a regular qcow2 image, we need to allocate the
> entire device (+ extra space for metadata overhead):
>
> # qemu-img create -f qcow2 -o preallocation=metadata foo.qcow2 500g
> Formatting 'foo.qcow2', fmt=qcow2 size=536870912000 cluster_size=65536
> preallocation=metadata lazy_refcounts=off refcount_bits=16
>
> # qemu-img check foo.qcow2
> No errors were found on the image.
> 8192000/8192000 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters
> Image end offset: 536953094144

I think we shouldn't really call this "allocating" because we don't
actually reserve space for it yet. On a filesystem, you get a large file
size, but it's almost completely sparse. On block devices, it depends on
whether the storage has thin provisioning.

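To illustrate the sparse-file point, here is a minimal sketch. It assumes
GNU coreutils (truncate, stat) and a filesystem that supports sparse files;
the file name sparse.img is made up for the example and has nothing to do
with the images above.

```shell
# Create a 1 GiB file without writing any data (hypothetical file name).
truncate -s 1G sparse.img

# Apparent size in bytes, i.e. what ls -l reports.
apparent=$(stat -c %s sparse.img)

# Space actually allocated: number of blocks times the block unit stat uses.
allocated=$(( $(stat -c %b sparse.img) * $(stat -c %B sparse.img) ))

echo "apparent=$apparent allocated=$allocated"

# Clean up the example file.
rm -f sparse.img
```

On a filesystem with sparse-file support, the allocated figure should be far
below the apparent size, which is the same effect that makes the "Image end
offset" of a metadata-preallocated image look larger than what is reserved.
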
> But I see that with metadata file we allocate much less:
>
> # qemu-img create -f qcow2 -o
> data_file=foo.data,data_file_raw=on,preallocation=metadata foo.qcow2
> 500g
> Formatting 'foo.qcow2', fmt=qcow2 size=536870912000 data_file=foo.data
> data_file_raw=on cluster_size=65536 preallocation=metadata
> lazy_refcounts=off refcount_bits=16
>
> # qemu-img check foo.qcow2
> No errors were found on the image.
> 8192000/8192000 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters
> Image end offset: 65798144

Actually, this is not much less, it's just split across two places. You
still have the 500 GiB data file. The metadata is small, but it was already
small before:

    536953094144 - 536870912000 = 82182144 bytes (~78 MiB)

Not exactly sure why it's more than the roughly 63 MiB (65798144 bytes) you
get for an external data file, maybe some alignment thing, but not
significant anyway.

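The subtraction above can be checked with plain shell arithmetic. A minimal
sketch: the numbers are copied from the two qemu-img check outputs quoted
earlier, and the variable names are only illustrative.

```shell
# "Image end offset" values copied from the two qemu-img check outputs.
single_file_end=536953094144   # regular qcow2, preallocation=metadata
external_end=65798144          # qcow2 with an external raw data file
virtual_size=536870912000      # 500 GiB virtual disk size

# Metadata overhead when data and metadata share one file.
single_meta=$(( single_file_end - virtual_size ))

echo "single-file metadata:   $single_meta bytes ($(( single_meta / 1048576 )) MiB, rounded down)"
echo "external-data metadata: $external_end bytes ($(( external_end / 1048576 )) MiB, rounded down)"
echo "difference:             $(( single_meta - external_end )) bytes"
```

The difference comes out as 16384000 bytes, which is 2 bytes for each of the
8192000 clusters reported by qemu-img check. With refcount_bits=16 that would
be consistent with refcount entries for the data clusters, but this is only
a guess and has not been verified against the code.
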
Kevin