Re: [Qemu-devel] [Qemu-block] Q: Report of leaked clusters with qcow2 when disk is resized with a live VM

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Darren Kenny <darren.kenny@oracle.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [Qemu-block] Q: Report of leaked clusters with qcow2 when disk is resized with a live VM
Date: Wed, 13 Sep 2017 14:32:38 +0100	[thread overview]
Message-ID: <59B93376.1070108@oracle.com> (raw)
In-Reply-To: <20170913122005.GB5319@localhost.localdomain>

Hi Kevin,

Thanks for getting back to me so quickly.

Kevin Wolf wrote:
> Am 13.09.2017 um 14:00 hat Darren Kenny geschrieben:
>> [Cross-posted from qemu-devel, meant to send here first]
>
> Just keep both lists in the CC for the same email.

Will do.
> There is an issue here, which is that you are accessing the image at 
> the same time from two separate processes. qemu is using writeback 
> caches in order to improve performance, so only after the guest has 
> issued a flush command to its disk or after you shut down or at least 
> pause qemu, the changes are fully written to the image file. In qemu 
> 2.10, you would probably see this instead: $ qemu-img check 
> ./test.qcow2 qemu-img: Could not open './test.qcow2': Failed to get 
> shared "write" lock Is another process using the image? This lock can 
> be overridden, but at least it shows clearly that you are doing 
> something that you probably shouldn't be doing.

Hmm, I've just updated to the HEAD of the Git repo, and I didn't see this
locking behaviour, it still did the same thing as before.

Does the disk need to be formatted/mounted before it's seen as locked?
Or even a configure option?

The version that have is:

     $ qemu-img --version
     qemu-img version 2.10.50 (v2.10.0-476-g04ef330)
     Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

     $ qemu-system-x86_64 --version
     QEMU emulator version 2.10.50 (v2.10.0-476-g04ef330)
     Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

The last commit I have is (as in the version string):

     04ef330 tcg/tci: do not use ldst label (never implemented)

>> What I observed, then was that if I powered down the VM, or even just
>> quit the VM, that the subsequent check of the disk would say that
>> everything was just fine, and there no longer were leaked clusters.
>
> Yes, quitting the VM writes out all of the cached data, so the on-disk
> state and the in-memory state become fully consistent again.

OK

>> In looking at the code in qcow2_truncate() it would appear that in the case
>> where prealloc has the value PREALLOC_MODE_OFF, that we don't flush the
>> metadata to disk - which seems to be the case here.
>>
>> If I ignore the if test, and always execute the block in block/qcow2.c,
>> lines 3250 to 3258:
>>
>>    if (prealloc != PREALLOC_MODE_OFF) {
>>        /* Flush metadata before actually changing the image size */
>>        ret = bdrv_flush(bs);
>>        if (ret<  0) {
>>            error_setg_errno(errp, -ret,
>>                             "Failed to flush the preallocated area to disk");
>>            return ret;
>>        }
>>    }
>>
>> causing the flush to always be done, then the check will succeed when
>> the VM is still running.
>>
>> While I know that this resolves the issue, I can only imagine that
>> there was some reason that this check for !PREALLOC_MODE_OFF was being
>> done in the first place.
>>
>> So, I'm hoping that someone here might be able to explain to me why
>> that check is needed, but also why it might be wrong to do the flush
>> regardless of the value of prealloc here.
>
> Doing a flush here wouldn't be wrong, but it's also unnecessary and
> would slow down the operation a bit.
Sure, but how often does a resize/truncate get done? Would seem like a
small impact to do it - but I agree w.r.t. the single-process access
as a better solution.
> For the preallocation case, the goal that is achieved with the flush is
> that the header update with the new image size is only written if the
> preallocation succeeded, so that in error cases you don't end up with an
> image that seems to be resized, but isn't actually preallocated as
> requested.
Understood, thanks.
>> If it is wrong to do that flush here, then would anyone have
>> suggestions as to an alternative solution to this issue?
>
> Don't call 'qemu-img check' (or basically any other operation on the
> image) while a VM is using the image.
Heh, sure - if the locking was working as you describe then I think
I would be fine with that :)

Thanks,

Darren.

next prev parent reply	other threads:[~2017-09-13 13:33 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-13 11:48 [Qemu-devel] Q: Report of leaked clusters with qcow2 when disk is resized with a live VM Darren Kenny
     [not found] ` <59B91DCA.5080405@oracle.com>
2017-09-13 12:20   ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2017-09-13 13:32     ` Darren Kenny [this message]
2017-09-13 14:07       ` Kevin Wolf
2017-09-13 15:33         ` Darren Kenny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=59B93376.1070108@oracle.com \
    --to=darren.kenny@oracle.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).