From: phoeagon <phoeagon@gmail.com>
To: Stefan Weil <sw@weilnetz.de>, Kevin Wolf <kwolf@redhat.com>,
	Max Reitz <mreitz@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v4] block/vdi: Use bdrv_flush after metadata updates
Date: Sat, 09 May 2015 03:54:45 +0000
Message-ID: <CAKYApDBqmPq2idjVne6zQveJ6mxSw+Bcp_9vzzvOXEfqy_q65Q@mail.gmail.com>
In-Reply-To: <554D2A03.3080201@weilnetz.de>

Thanks. Dbench does not allocate new disk space continuously, because it's
a filesystem-level benchmark that creates files and deletes them. The
allocation pattern therefore also depends on the guest FS: a btrfs guest,
for example, allocates about 1.8x the space of an ext4 guest due to its
copy-on-write nature. I'd estimate the FS allocates new space during
roughly a third of the test duration. But that does not dilute the effect
much, because a FS often writes in strides rather than consecutively,
which causes write amplification at allocation time.
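
For reference, a typical in-guest run looks something like this (mount
point and client count here are illustrative, not my exact setup; flags
as in dbench(1)):

    # 4 dbench clients for 60 seconds on the filesystem under test
    dbench -t 60 -D /mnt/test 4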

So I tested it with qemu-img convert from a 400M raw file:
zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -t unsafe -O vdi /run/shm/rand 1.vdi

real 0m0.402s
user 0m0.206s
sys 0m0.202s

zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -t writeback -O vdi /run/shm/rand 1.vdi

real 0m8.678s
user 0m0.169s
sys 0m0.500s

zheq-PC sdb # time qemu-img convert -f raw -t writeback -O vdi /run/shm/rand 1.vdi

real 0m4.320s
user 0m0.148s
sys 0m0.471s

zheq-PC sdb # time qemu-img convert -f raw -t unsafe -O vdi /run/shm/rand 1.vdi

real 0m0.489s
user 0m0.173s
sys 0m0.325s

zheq-PC sdb # time qemu-img convert -f raw -O vdi /run/shm/rand 1.vdi

real 0m0.515s
user 0m0.168s
sys 0m0.357s

zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -O vdi /run/shm/rand 1.vdi

real 0m0.431s
user 0m0.192s
sys 0m0.248s
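
The 400M source file itself was just random data in tmpfs; something
like the following (illustrative, not my exact command) produces it:

    dd if=/dev/urandom of=/run/shm/rand bs=1M count=400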

Although 400M is not a giant file, it does show the trend. As you can see,
when nearly every write allocates new space and there is no extra
buffering from a virtualization layer in between, throughput drops by
about 50% (8.7s vs. 4.3s with -t writeback, patched vs. stock). But, as
predicted, it has no effect on "unsafe" mode. I also believe that wanting
to use a half-converted image is a rare use case, while host crashes and
power loss are not so unimaginable.
It looks like qemu-img convert uses "unsafe" as its default cache mode
anyway, so even novice "qemu-img convert" users are unlikely to notice any
performance degradation.

I have not yet tried a guest OS installation on top, but I suspect a new
flag just for a one-time faster OS installation would see little use, and
"cache=unsafe" already does the trick.


On Sat, May 9, 2015 at 5:26 AM Stefan Weil <sw@weilnetz.de> wrote:

> Am 08.05.2015 um 15:55 schrieb Kevin Wolf:
> > Am 08.05.2015 um 15:14 hat Max Reitz geschrieben:
> >> On 07.05.2015 17:16, Zhe Qiu wrote:
> >>> In reference to b0ad5a45...078a458e, metadata writes to
> >>> qcow2/cow/qcow/vpc/vmdk are all synced before subsequent writes.
> >>>
> >>> bdrv_flush is called only when the write is successful.
> >>>
> >>> Signed-off-by: Zhe Qiu <phoeagon@gmail.com>
> >>> ---
> >>>   block/vdi.c | 3 +++
> >>>   1 file changed, 3 insertions(+)
> >> I missed Kevin's arguments before, but I think that adding this is
> >> more correct than not having it; and when thinking about speed, this
> >> is vdi, a format supported for compatibility.
> > If you use it only as a convert target, you probably care more about
> > speed than about leaks in case of a host crash.
> >
> >> So if we wanted to optimize it, we'd probably have to cache multiple
> >> allocations, do them at once and then flush afterwards (like the
> >> metadata cache we have in qcow2?)
> > That would defeat the purpose of this patch which aims at having
> > metadata and data written out almost at the same time. On the other
> > hand, fully avoiding the problem instead of just making the window
> > smaller would require a journal, which VDI just doesn't have.
> >
> > I'm not convinced of this patch, but I'll defer to Stefan Weil as the
> > VDI maintainer.
> >
> > Kevin
>
> Thanks for asking. I share your concerns regarding reduced performance
> caused by bdrv_flush. Conversions to VDI will take longer (how much?),
> and also installation of an OS on a new VDI disk image will be slower,
> because those are the typical scenarios where disk usage grows.
>
> @phoeagon: Did the benchmark which you used allocate additional disk
> storage? If not or if it only allocated once and then spent some time
> on already allocated blocks, that benchmark was not valid for this case.
>
> On the other hand I don't see a need for the flushing because the kind
> of failures (power failure) and their consequences seem to be acceptable
> for typical VDI usage, namely either image conversion or tests with
> existing images.
>
> That's why I'd prefer not to use bdrv_flush here. Could we make
> bdrv_flush optional (either generally or for cases like this one), so
> that both the people who prefer speed and the people who want
> bdrv_flush to decrease the likelihood of inconsistencies can be
> satisfied?
>
> Stefan
>
>
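
For anyone skimming: the three-line change in question just flushes right
after the block-map update in block/vdi.c. Roughly (a sketch of the idea,
not the verbatim patch; variable names assumed from the vdi_co_write code
of that era):

    /* One or more new blocks were allocated: write the updated
     * block map entries back to the image file. */
    ret = bdrv_write(bs->file, offset, base, n_sectors);
    if (ret >= 0) {
        /* Flush so the metadata reaches stable storage, but only
         * after the block-map write itself succeeded. */
        ret = bdrv_flush(bs->file);
    }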

