Message-ID: <59B91B00.20208@oracle.com>
Date: Wed, 13 Sep 2017 12:48:16 +0100
From: Darren Kenny
Subject: [Qemu-devel] Q: Report of leaked clusters with qcow2 when disk is resized with a live VM
To: qemu-devel@nongnu.org

Hi,

During some testing of QEMU 2.9, we observed that if a qcow2 block device is resized while the VM is running, a subsequent qemu-img check reports leaked clusters.
The steps to reproduce are:

- First, create the test image:

    # /usr/bin/qemu-img create -f qcow2 test.qcow2 10G
    Formatting 'test.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
    # qemu-img check test.qcow2
    No errors were found on the image.

- Now run a VM, based here on Oracle Linux 7 (the distro really isn't important, since the test disk is not even mounted in the VM at this point):

    # /usr/bin/qemu-kvm \
        -name 'test-vm' \
        -monitor stdio \
        -drive id=drive_image1,if=none,snapshot=on,format=qcow2,file=./ol73-64.qcow2 \
        -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
        -drive id=drive_test,if=none,format=qcow2,file=./test.qcow2 \
        -device virtio-blk-pci,id=stg,drive=drive_test,bootindex=1,serial=TARGET_DISK0,bus=pci.0,addr=0x5 \
        -net bridge,br=br1 -net nic,model=virtio,macaddr=52:54:00:90:91:92 \
        -m 4096 \
        -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
        -vnc :0

- Resize the image to 15G from the QEMU monitor (on stdio, after the above command):

    QEMU 2.5.0 monitor - type 'help' for more information
    (qemu) block_resize drive_test 15360

- Now, in a separate terminal, while leaving the VM running, check the image again from the host side:

    # qemu-img check ./test.qcow2
    Leaked cluster 3 refcount=1 reference=0

    1 leaked clusters were found on the image.
    This means waste of disk space, but no harm to data.
    Image end offset: 327680

As the output says, this is not really corruption, but it is a bit misleading and could make people think there is a problem here (hence the reason I've been asked to find a fix).

What I then observed was that if I powered down the VM, or even just quit it, a subsequent check of the disk reported that everything was fine and there were no longer any leaked clusters.
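To make the "Leaked cluster 3 refcount=1 reference=0" line above concrete, here is a minimal C sketch of how a refcount consistency check can classify clusters. "refcount" is the value recorded in the image's refcount blocks, and "reference" is how many times the cluster is actually pointed to by the image's metadata. The function count_leaked_clusters() and its array inputs are hypothetical; this is not how qemu-img check is implemented, and it skips the real work of walking the metadata to compute the references.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified illustration (not QEMU code).
 * refcount[i]:  refcount stored in the image for cluster i
 * reference[i]: number of metadata references found for cluster i */
static int count_leaked_clusters(const unsigned *refcount,
                                 const unsigned *reference,
                                 size_t nb_clusters)
{
    int leaks = 0;

    assert(refcount != NULL && reference != NULL);
    for (size_t i = 0; i < nb_clusters; i++) {
        if (refcount[i] > reference[i]) {
            /* Allocated but unreferenced: wastes space, data is unaffected. */
            printf("Leaked cluster %zu refcount=%u reference=%u\n",
                   i, refcount[i], reference[i]);
            leaks++;
        }
        /* refcount < reference would indicate real corruption (a cluster
         * in use that could be handed out again); it is not counted here. */
    }
    return leaks;
}
```

For example, with stored refcounts {1, 1, 1, 1} and computed references {1, 1, 1, 0}, the function prints "Leaked cluster 3 refcount=1 reference=0" and returns 1, the same shape as the report above.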
Looking at the code in qcow2_truncate(), it appears that when prealloc has the value PREALLOC_MODE_OFF, we don't flush the metadata to disk, which seems to be the case here.

If I ignore the if test and always execute the block in block/qcow2.c, lines 3250 to 3258:

    if (prealloc != PREALLOC_MODE_OFF) {
        /* Flush metadata before actually changing the image size */
        ret = bdrv_flush(bs);
        if (ret < 0) {
            error_setg_errno(errp, -ret,
                             "Failed to flush the preallocated area to disk");
            return ret;
        }
    }

so that the flush is always done, then the check succeeds while the VM is still running.

While I know that this resolves the issue, I imagine there was some reason that this check for !PREALLOC_MODE_OFF was being done in the first place.

So I'm hoping that someone here might be able to explain why that check is needed, and also why it might be wrong to do the flush regardless of the value of prealloc.

If it is wrong to do that flush here, would anyone have suggestions for an alternative solution to this issue?

Thanks,

Darren.
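One hypothesis consistent with the observations (the leak disappears once the VM is quit, and an unconditional flush in qcow2_truncate() also makes it go away) is that a refcount update made during the resize is still held in an in-memory write-back cache when the external qemu-img check reads the file. The C sketch below models that idea under stated assumptions: the struct model type, the model_* functions, and the choice of clusters 3 and 4 are all hypothetical, and the assumed ordering (new allocation written through, old cluster freed only in the cache) is an illustration, not a description of QEMU's actual code path.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model (not QEMU code): refcount updates go to an
 * in-memory write-back cache and only reach the file on flush. */
#define NB_CLUSTERS 8

struct model {
    unsigned disk_refcount[NB_CLUSTERS];  /* refcounts as stored in the file */
    unsigned cache_refcount[NB_CLUSTERS]; /* refcounts as seen by the running VM */
    unsigned disk_reference[NB_CLUSTERS]; /* references from on-disk metadata */
};

/* Clusters 0..used-1 start allocated, referenced once, and clean. */
static void model_init(struct model *m, int used)
{
    memset(m, 0, sizeof(*m));
    for (int i = 0; i < used; i++) {
        m->disk_refcount[i] = m->cache_refcount[i] = 1;
        m->disk_reference[i] = 1;
    }
}

/* Write every cached refcount back to the "file". */
static void model_flush(struct model *m)
{
    memcpy(m->disk_refcount, m->cache_refcount, sizeof(m->disk_refcount));
}

/* Hypothetical resize that relocates one metadata table from old_c to
 * new_c. Assumption: the new allocation and the pointer to it are made
 * durable immediately, while freeing old_c only updates the cache. The
 * final flush is optional, standing in for the flush guarded by
 * prealloc != PREALLOC_MODE_OFF. */
static void model_resize(struct model *m, int old_c, int new_c, bool flush)
{
    m->cache_refcount[new_c] = 1;
    model_flush(m);                /* new allocation reaches the file */
    m->disk_reference[new_c] = 1;  /* on-disk metadata now points at new_c */
    m->disk_reference[old_c] = 0;  /* ...and no longer at old_c */
    m->cache_refcount[old_c] = 0;  /* free: cached only, not yet on disk */
    if (flush) {
        model_flush(m);
    }
}

/* What an external checker sees: an on-disk refcount above the number
 * of on-disk references is reported as a leaked cluster. */
static int model_count_leaks(const struct model *m)
{
    int leaks = 0;
    for (int i = 0; i < NB_CLUSTERS; i++) {
        if (m->disk_refcount[i] > m->disk_reference[i]) {
            leaks++;
        }
    }
    return leaks;
}
```

With model_init(&m, 4) followed by model_resize(&m, 3, 4, false), model_count_leaks() returns 1 (cluster 3 has an on-disk refcount of 1 but no references). Calling model_flush(), standing in for the flush on shutdown, brings it back to 0, and model_resize(&m, 3, 4, true) never shows the leak. The model only illustrates why a flush hides the symptom; it says nothing about whether flushing unconditionally in qcow2_truncate() is the right fix.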