Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Jeff Cody <jcody@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Zhi Hui Li <zhihuili@linux.vnet.ibm.com>,
	Taisuke Yamada <tai@rakugaki.org>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
Date: Fri, 08 Jun 2012 10:32:49 -0400	[thread overview]
Message-ID: <4FD20D11.1080603@redhat.com> (raw)
In-Reply-To: <CAJSP0QVtvYxCOJX1okgfJH1vKG8nWdmPE5Ew3Dwvjm51-rUCzA@mail.gmail.com>

On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>> On Thu, Jun 7, 2012 at 3:14 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>> On 06/07/2012 02:19 AM, Taisuke Yamada wrote:
>>> We want to commit snap1.qcow2 down into vm001.img while the guest is running:
>>>
>>> vm001.img <-- snap2.qcow2
>>>
>>> This means copying allocated blocks from snap1.qcow2 and writing them
>>> into vm001.img.  Once this process is complete it is safe to delete
>>> snap1.qcow2 since all data is now in vm001.img.
>>
>> Yes, this is the same as what we are wanting to accomplish.  The trick
>> here is open vm001.img r/w, in a safe manner (by safe, I mean able to
>> abort in case of error while keeping the guest running live).
>>
>> My thoughts on this has revolved around something similar to what was
>> done in bdrv_append(), where a duplicate BDS is created, a new file-open
>> performed with the appropriate access mode flags, and if successful
>> swapped out for the originally opened BDS for vm001.img.  If there is an
>> error, the new BDS is abandoned without modifying the BDS list.
> 
> Yes, there needs to be an atomic way to try opening the image read/write.
> 
>>>
>>> As a result we have made the backing file chain shorter.  This is
>>> improtant because otherwise incremental backup would grow the backing
>>> file chain forever - each time it takes a new snapshot the chain
>>> becomes longer and I/O accesses can become slower!
>>>
>>> The task is to add a new block job type called "commit".  It is like
>>> the qemu-img commit command except it works while the guest is
>>> running.
>>>
>>> The new QMP command should look like this:
>>>
>>> { 'command': 'block-commit', 'data': { 'device': 'str', 'image':
>>> 'str', 'base': 'str', '*speed': 'int' }
>>
>> This is very similar to what I was thinking as well - I think the only
>> difference in the command is that I what you called 'image' I called
>> 'top', and the argument order was after base.
>>
>> Here is what I had for the command:
>>
>> { 'command': 'block-commit', 'data': { 'device': 'str', '*base': 'str',
>>                                       '*top': 'str', '*speed': 'int' } }
>>
>> I don't think I have a strong preference for either of our proposed
>> commands - they are essentially the same.
> 
> Yes, they are basically the same.  I don't mind which one we use.
> 
>>>
>>> Note that block-commit cannot work on the top-most image since the
>>> guest is still writing to that image and we might never be able to
>>> copy all the data into the base image (the guest could write new data
>>> as quickly as we copy it to the base).  The command should check for
>>> this and reject the top-most image.
>>
>> By this you mean that you would like to disallow committing the
>> top-level image to the base?  Perhaps there is a way to attempt to
>> converge, and adaptively give more time to the co-routine if we are able
>> to detect divergence.  This may require violating the 'speed' parameter,
>> however, and make the commit less 'live'.
> 
> Yes, I think we should disallow merging down the topmost image.  The
> convergence problem is the same issue that live migration has.  It's a
> hard problem and not something that is essential for block-commit.  It
> would add complexity into the block-commit implementation - the only
> benefit would be that you can merge down the last COW snapshot.  We
> already have an implementation that does this: the "commit" command
> which stops the guest :).

OK.  And, it is better to get this implemented now, and if desired, I
suppose we can always revisit the convergence at a later date (or not at
all, if there is no pressing case for it).

> 
> If we take Taisuke's scenario, just create a snapshot on the SSD
> immediately before merging down:
> 
>   /big-slow-cheap-disk/master.img <- /small-fast-expensive-ssd/cow.qcow2
> 
> (qemu) snapshot_blkdev ...
> 
>   /big-slow-cheap-disk/master.img <-
> /small-fast-expensive-ssd/cow.qcow2 <-
> /small-fast-expensive-ssd/newcow.qcow2
> 
> (qemu) block-commit cow.qcow2
> 
>   /big-slow-cheap-disk/master.img <- /small-fast-expensive-ssd/newcow.qcow2
> 
>>>
>>> Let's figure out how to specify block-commit so we're all happy, that
>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>
>>
>> I think we are almost completely on the same page - devil is in the
>> details, of course (for instance, on how to convert the destination base
>> from r/o to r/w).
> 
> Great.  The atomic r/o -> r/w transition and the commit coroutine can
> be implemented on in parallel.  Are you happy to implement the atomic
> r/o -> r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
> assume that part already works and focus on implementing the commit
> coroutine in the meantime.  I'm just suggesting a way to split up the
> work, please let me know if you think this is good.

I am happy to do it that way.  I'll shift my focus to the atomic image
reopen in r/w mode.  I'll go ahead and post my diagrams and other info
for block-commit on the wiki, because I don't think it conflicts with we
discussed above (although I will modify my diagrams to not show commit
from the top-level image).  Of course, feel free to change it as
necessary.

Thanks,
Jeff

next prev parent reply	other threads:[~2012-06-08 14:33 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-07  6:19 [Qemu-devel] CoW image commit+shrink(= make_empty) support Taisuke Yamada
2012-06-07 14:14 ` Jeff Cody
2012-06-08 12:42   ` Stefan Hajnoczi
2012-06-08 13:19     ` Jeff Cody
2012-06-08 13:53       ` Stefan Hajnoczi
2012-06-08 14:32         ` Jeff Cody [this message]
2012-06-08 16:11           ` Kevin Wolf
2012-06-08 17:46             ` Jeff Cody
2012-06-08 17:57               ` Kevin Wolf
2012-06-08 18:33                 ` Jeff Cody
2012-06-08 21:08                   ` Kevin Wolf
2012-06-09 16:52                     ` Jeff Cody
2012-06-11  7:57                       ` Kevin Wolf
2012-06-10 16:10                 ` Paolo Bonzini
2012-06-11  7:59                   ` Kevin Wolf
2012-06-11  8:01                     ` Paolo Bonzini
2012-06-11 12:09               ` Stefan Hajnoczi
2012-06-11 12:50                 ` Kevin Wolf
2012-06-11 14:24                   ` Stefan Hajnoczi
2012-06-11 15:37                     ` Jeff Cody
2012-06-11 19:12                       ` Paolo Bonzini
2012-06-12  7:27                         ` Zhi Hui Li
2012-06-12 10:56                       ` Stefan Hajnoczi
2012-06-13 10:56                         ` Supriya Kannery
2012-06-14 14:23                       ` Zhi Hui Li
2012-06-14 14:29                         ` Jeff Cody
2012-06-14 18:28                           ` Supriya Kannery
2012-06-15 21:01                             ` Supriya Kannery
2012-06-10 16:06       ` Paolo Bonzini
2012-06-08 10:39 ` Kevin Wolf
2012-06-09 11:21   ` Taisuke Yamada

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FD20D11.1080603@redhat.com \
    --to=jcody@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    --cc=tai@rakugaki.org \
    --cc=zhihuili@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).