From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:42418) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ds7mj-00055A-WB for qemu-devel@nongnu.org; Wed, 13 Sep 2017 09:33:01 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ds7mf-0002Tp-Lh for qemu-devel@nongnu.org; Wed, 13 Sep 2017 09:32:57 -0400 Message-ID: <59B93376.1070108@oracle.com> Date: Wed, 13 Sep 2017 14:32:38 +0100 From: Darren Kenny MIME-Version: 1.0 References: <59B91B00.20208@oracle.com> <59B91DCA.5080405@oracle.com> <20170913122005.GB5319@localhost.localdomain> In-Reply-To: <20170913122005.GB5319@localhost.localdomain> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [Qemu-block] Q: Report of leaked clusters with qcow2 when disk is resized with a live VM List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org Hi Kevin, Thanks for getting back to me so quickly. Kevin Wolf wrote: > Am 13.09.2017 um 14:00 hat Darren Kenny geschrieben: >> [Cross-posted from qemu-devel, meant to send here first] > > Just keep both lists in the CC for the same email. Will do. > There is an issue here, which is that you are accessing the image at > the same time from two separate processes. qemu is using writeback > caches in order to improve performance, so only after the guest has > issued a flush command to its disk or after you shut down or at least > pause qemu, the changes are fully written to the image file. In qemu > 2.10, you would probably see this instead: $ qemu-img check > ./test.qcow2 qemu-img: Could not open './test.qcow2': Failed to get > shared "write" lock Is another process using the image? This lock can > be overridden, but at least it shows clearly that you are doing > something that you probably shouldn't be doing. Hmm, I've just updated to the HEAD of the Git repo, and I didn't see this locking behaviour, it still did the same thing as before. Does the disk need to be formatted/mounted before it's seen as locked? Or even a configure option? The version that have is: $ qemu-img --version qemu-img version 2.10.50 (v2.10.0-476-g04ef330) Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers $ qemu-system-x86_64 --version QEMU emulator version 2.10.50 (v2.10.0-476-g04ef330) Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers The last commit I have is (as in the version string): 04ef330 tcg/tci: do not use ldst label (never implemented) >> What I observed, then was that if I powered down the VM, or even just >> quit the VM, that the subsequent check of the disk would say that >> everything was just fine, and there no longer were leaked clusters. > > Yes, quitting the VM writes out all of the cached data, so the on-disk > state and the in-memory state become fully consistent again. OK >> In looking at the code in qcow2_truncate() it would appear that in the case >> where prealloc has the value PREALLOC_MODE_OFF, that we don't flush the >> metadata to disk - which seems to be the case here. >> >> If I ignore the if test, and always execute the block in block/qcow2.c, >> lines 3250 to 3258: >> >> if (prealloc != PREALLOC_MODE_OFF) { >> /* Flush metadata before actually changing the image size */ >> ret = bdrv_flush(bs); >> if (ret< 0) { >> error_setg_errno(errp, -ret, >> "Failed to flush the preallocated area to disk"); >> return ret; >> } >> } >> >> causing the flush to always be done, then the check will succeed when >> the VM is still running. >> >> While I know that this resolves the issue, I can only imagine that >> there was some reason that this check for !PREALLOC_MODE_OFF was being >> done in the first place. >> >> So, I'm hoping that someone here might be able to explain to me why >> that check is needed, but also why it might be wrong to do the flush >> regardless of the value of prealloc here. > > Doing a flush here wouldn't be wrong, but it's also unnecessary and > would slow down the operation a bit. Sure, but how often does a resize/truncate get done? Would seem like a small impact to do it - but I agree w.r.t. the single-process access as a better solution. > For the preallocation case, the goal that is achieved with the flush is > that the header update with the new image size is only written if the > preallocation succeeded, so that in error cases you don't end up with an > image that seems to be resized, but isn't actually preallocated as > requested. Understood, thanks. >> If it is wrong to do that flush here, then would anyone have >> suggestions as to an alternative solution to this issue? > > Don't call 'qemu-img check' (or basically any other operation on the > image) while a VM is using the image. Heh, sure - if the locking was working as you describe then I think I would be fine with that :) Thanks, Darren.