From: Eric Blake
Date: Thu, 13 Dec 2018 09:05:43 -0600
Subject: Re: [Qemu-devel] [Qemu-block] Request for clarification on qemu-img convert behavior zeroing target host_device
To: Kevin Wolf
Cc: "De Backer, Fred (Nokia - BE/Antwerp)", "qemu-devel@nongnu.org", Nir Soffer, qemu block, "Richard W.M. Jones", "Aamir T, Owais (Nokia - IN/Chennai)"
References: <14b32e39-a9f6-1a83-87e6-6e150954ddb7@redhat.com> <20181213144914.GH5427@linux.fritz.box>
In-Reply-To: <20181213144914.GH5427@linux.fritz.box>

On 12/13/18 8:49 AM, Kevin Wolf wrote:
>>> We observe that in Fedora 29 the qemu-img, before imaging the disk,
>>> it fully zeroes it. Taking into account the disk size, the whole
>>> process now takes 35 minutes instead of 50 seconds. This causes the
>>> ironic-python-agent operation to time-out. The Fedora 27 qemu-img
>>> doesn't do that.
>>
>> Known issue; Nir and Rich have posted a previous thread on the topic, and
>> the conclusion is that we need to make qemu-img smarter about NOT requesting
>> pre-zeroing of devices where that is more expensive than just zeroing as we
>> go.
>> https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg01182.html
>
> Yes, we should be careful to avoid the fallback in this case.
>
> However, how could this ever go from 50 seconds for writing the whole
> image to 35 minutes?! Even if you end up writing the whole image twice
> because you write zeros first and then overwrite them everywhere with
> data, shouldn't the maximum be doubling the time, i.e. 100 seconds?
>
> Why is the write_zeroes fallback _that_ slow? It will also hit guests
> that request write_zeroes, so I feel this is worth investigating a bit
> more nevertheless.
>
> Can you check with strace which operation actually succeeds writing
> zeros to /dev/sda? The first thing we try is fallocate with
> FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE. This should always be fast,
> so I suppose this fails in your case. The next thing is BLKZEROOUT,
> which I think can do a fallback in the kernel. Does this return success?
> Otherwise we have another fallback mechanism inside of QEMU, which would
> use normal pwrite calls with a zeroed buffer.

It may also be a case of poor lseek(SEEK_HOLE) performance on the source
(a known issue with at least some versions of tmpfs). The way qemu-img
queries for block status, it ends up repeatedly hammering on lseek(),
and if lseek() is already O(n) instead of O(1) in behavior, that
explodes into some O(n^2) scaling because qemu-img isn't caching the
answers it got previously.

>
> Once we know which mechanism is used, we can look into why it is so
> abysmally slow.

Indeed, performance traces are important for issues like this.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
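
P.S. To illustrate the O(n^2) concern above, here is a toy model in Python. It is only a sketch and not qemu-img code: `linear_lookup`, `scan_uncached` and `scan_cached` are hypothetical names, and the extent list stands in for a filesystem whose lseek() has to walk its extents from the start. The "cost" is simply the number of list entries visited.

```python
def linear_lookup(extents, offset):
    """Toy stand-in for an O(n) lseek(): walk the extent list from the start.

    Returns the (start, end) extent containing offset and the number of
    entries visited, which serves as the cost of the call.
    """
    for steps, (start, end) in enumerate(extents, start=1):
        if start <= offset < end:
            return (start, end), steps
    raise ValueError("offset is past the last extent")


def scan_uncached(extents, chunk):
    """Query every chunk separately, as a caller with no cache would."""
    total_steps = 0
    for offset in range(0, extents[-1][1], chunk):
        _, steps = linear_lookup(extents, offset)
        total_steps += steps
    return total_steps


def scan_cached(extents, chunk):
    """Reuse the last answer while the offset is still inside that extent."""
    total_steps = 0
    cached = None
    for offset in range(0, extents[-1][1], chunk):
        if cached is None or not (cached[0] <= offset < cached[1]):
            cached, steps = linear_lookup(extents, offset)
            total_steps += steps
    return total_steps


if __name__ == "__main__":
    # Hypothetical layout: 4 extents of 4 blocks each, queried 1 block at a time.
    extents = [(i * 4, (i + 1) * 4) for i in range(4)]
    print(scan_uncached(extents, 1), scan_cached(extents, 1))
```

In this model, remembering the last answer removes the repeated queries inside one extent. Each remaining lookup is still linear, so it shows only why the number of lseek() calls matters, not how a real filesystem behaves.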