Re: [Qemu-devel] [Qemu-block] Request for clarification on qemu-img convert behavior zeroing target host_device


From: Eric Blake <eblake@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: "De Backer, Fred (Nokia - BE/Antwerp)" <fred.de_backer@nokia.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Nir Soffer <nsoffer@redhat.com>,
	qemu block <qemu-block@nongnu.org>,
	"Richard W.M. Jones" <rjones@redhat.com>,
	"Aamir T, Owais (Nokia - IN/Chennai)" <owais.aamir_t@nokia.com>
Subject: Re: [Qemu-devel] [Qemu-block] Request for clarification on qemu-img convert behavior zeroing target host_device
Date: Thu, 13 Dec 2018 09:05:43 -0600
Message-ID: <db4f1763-e28f-387c-2652-ba71c29e97ff@redhat.com>
In-Reply-To: <20181213144914.GH5427@linux.fritz.box>

On 12/13/18 8:49 AM, Kevin Wolf wrote:

>>> We observe that in Fedora 29, qemu-img fully zeroes the disk before
>>> imaging it. Given the disk size, the whole process now takes 35
>>> minutes instead of 50 seconds, which causes the ironic-python-agent
>>> operation to time out. The Fedora 27 qemu-img doesn't do that.
>>
>> Known issue; Nir and Rich have posted a previous thread on the topic, and
>> the conclusion is that we need to make qemu-img smarter about NOT requesting
>> pre-zeroing of devices where that is more expensive than just zeroing as we
>> go.
>> https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg01182.html
> 
> Yes, we should be careful to avoid the fallback in this case.
> 
> However, how could this ever go from 50 seconds for writing the whole
> image to 35 minutes?! Even if you end up writing the whole image twice
> because you write zeros first and then overwrite them everywhere with
> data, shouldn't the maximum be doubling the time, i.e. 100 seconds?
> 
> Why is the write_zeroes fallback _that_ slow? It will also hit guests
> that request write_zeroes, so I feel this is worth investigating a bit
> more nevertheless.
> 
> Can you check with strace which operation actually succeeds writing
> zeros to /dev/sda? The first thing we try is fallocate with
> FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE. This should always be fast,
> so I suppose this fails in your case. The next thing is BLKZEROOUT,
> which I think can do a fallback in the kernel. Does this return success?
> Otherwise we have another fallback mechanism inside of QEMU, which would
> use normal pwrite calls with a zeroed buffer.
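
For readers following along, the fallback order Kevin describes above can
be sketched roughly as follows. This is an illustration only, not QEMU's
actual code: the helper names (zero_range, _punch_hole) are hypothetical,
the constants are copied from the Linux UAPI headers, and the ctypes call
assumes a 64-bit Linux host where off_t is 64 bits wide.

```python
import ctypes
import fcntl
import os
import struct

# Values from <linux/falloc.h> and <linux/fs.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02
BLKZEROOUT = 0x127F  # _IO(0x12, 127)

_libc = ctypes.CDLL(None, use_errno=True)


def _punch_hole(fd, offset, length):
    """Call fallocate(2) with PUNCH_HOLE | KEEP_SIZE (assumes 64-bit off_t)."""
    fallocate = _libc.fallocate
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def zero_range(fd, offset, length, chunk=1 << 20):
    """Zero [offset, offset + length) and report which mechanism succeeded."""
    # 1. Try to punch a hole; expected to be fast where it is supported.
    try:
        _punch_hole(fd, offset, length)
        return "fallocate"
    except (OSError, AttributeError):
        pass
    # 2. Ask the kernel to zero the range; only valid on block devices.
    try:
        fcntl.ioctl(fd, BLKZEROOUT, struct.pack("QQ", offset, length))
        return "BLKZEROOUT"
    except OSError:
        pass
    # 3. Last resort: write a zeroed buffer with ordinary pwrite calls.
    zeros = bytes(chunk)
    done = 0
    while done < length:
        done += os.pwrite(fd, zeros[:min(chunk, length - done)], offset + done)
    return "pwrite"
```

The return value plays the role of the strace output requested above: it
tells you which of the three mechanisms actually did the zeroing.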

It may also be a case of poor lseek(SEEK_HOLE) performance on the source 
(a known issue with at least some versions of tmpfs). The way qemu-img 
queries for block status, it ends up repeatedly hammering on lseek(), 
and if lseek() is already O(n) instead of O(1) in behavior, that 
explodes into some O(n^2) scaling because qemu-img isn't caching the 
answers it got previously.
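
To make the caching point concrete, here is a minimal sketch of walking a
file's data/hole layout once and answering later queries from the saved
result. It is not how qemu-img is implemented; map_extents and is_data are
hypothetical names, and it assumes a Linux host where os.SEEK_DATA and
os.SEEK_HOLE are available.

```python
import bisect
import errno
import os


def map_extents(fd, size):
    """Walk the file once; return contiguous (start, end, is_data) tuples."""
    extents = []
    offset = 0
    while offset < size:
        try:
            data = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno != errno.ENXIO:
                raise
            data = size  # no data at or after offset: the rest is a hole
        data = min(data, size)
        if data > offset:
            # [offset, data) is a hole; continue from the start of the data.
            extents.append((offset, data, False))
            offset = data
            continue
        # offset is inside data; find where the next hole (or EOF) begins.
        hole = min(os.lseek(fd, offset, os.SEEK_HOLE), size)
        extents.append((offset, hole, True))
        offset = hole
    return extents


def is_data(extents, offset):
    """Answer a block-status query from the cached map, without lseek()."""
    starts = [start for start, _end, _data in extents]
    index = bisect.bisect_right(starts, offset) - 1
    return extents[index][2]
```

With a map like this, lseek() is called a bounded number of times per
extent during the single walk, and later status queries never reach the
filesystem at all.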

> 
> Once we know which mechanism is used, we can look into why it is so
> abysmally slow.

Indeed, performance traces are important for issues like this.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
