From: Jeff Cody <jcody@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Zhi Hui Li <zhihuili@linux.vnet.ibm.com>,
	Taisuke Yamada <tai@rakugaki.org>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
Date: Fri, 08 Jun 2012 10:32:49 -0400
Message-ID: <4FD20D11.1080603@redhat.com>
In-Reply-To: <CAJSP0QVtvYxCOJX1okgfJH1vKG8nWdmPE5Ew3Dwvjm51-rUCzA@mail.gmail.com>

On 06/08/2012 09:53 AM, Stefan Hajnoczi wrote:
> On Fri, Jun 8, 2012 at 2:19 PM, Jeff Cody <jcody@redhat.com> wrote:
>> On 06/08/2012 08:42 AM, Stefan Hajnoczi wrote:
>>> On Thu, Jun 7, 2012 at 3:14 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>> On 06/07/2012 02:19 AM, Taisuke Yamada wrote:
>>> We want to commit snap1.qcow2 down into vm001.img while the guest is running:
>>>
>>> vm001.img <-- snap2.qcow2
>>>
>>> This means copying allocated blocks from snap1.qcow2 and writing them
>>> into vm001.img.  Once this process is complete it is safe to delete
>>> snap1.qcow2 since all data is now in vm001.img.
>>
>> Yes, this is the same as what we are wanting to accomplish.  The trick
>> here is to open vm001.img r/w in a safe manner (by safe, I mean able to
>> abort in case of error while keeping the guest running live).
>>
>> My thoughts on this have revolved around something similar to what was
>> done in bdrv_append(), where a duplicate BDS is created, a new file-open
>> performed with the appropriate access mode flags, and if successful
>> swapped out for the originally opened BDS for vm001.img.  If there is an
>> error, the new BDS is abandoned without modifying the BDS list.
> 
> Yes, there needs to be an atomic way to try opening the image read/write.
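
As a rough illustration of the try-then-swap idea described above, here
is a small Python toy.  It is not QEMU code: Image and reopen_rw are
made-up names, a plain list stands in for the BDS list, and POSIX file
semantics are assumed.  The point is only the ordering: do the second
open first, and touch the list only once that open has succeeded.

```python
import os


class Image:
    """Hypothetical stand-in for a block driver state (BDS); not QEMU's API."""

    def __init__(self, path, writable=False):
        self.path = path
        self.writable = writable
        # os.open may raise OSError; nothing else is modified if it does.
        self.fd = os.open(path, os.O_RDWR if writable else os.O_RDONLY)

    def close(self):
        os.close(self.fd)


def reopen_rw(chain, index):
    """Replace chain[index] with a read/write handle, or leave chain untouched."""
    old = chain[index]
    try:
        new = Image(old.path, writable=True)  # second, independent open
    except OSError:
        return False      # abandon the duplicate; the old handle stays in place
    chain[index] = new    # swap only after the new open has succeeded
    old.close()
    return True
```

If the r/w open fails, the caller still holds the original read-only
handle, so the guest can keep running on it.
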
> 
>>>
>>> As a result we have made the backing file chain shorter.  This is
>>> important because otherwise incremental backup would grow the backing
>>> file chain forever - each time it takes a new snapshot the chain
>>> becomes longer and I/O accesses can become slower!
>>>
>>> The task is to add a new block job type called "commit".  It is like
>>> the qemu-img commit command except it works while the guest is
>>> running.
>>>
>>> The new QMP command should look like this:
>>>
>>> { 'command': 'block-commit', 'data': { 'device': 'str', 'image':
>>> 'str', 'base': 'str', '*speed': 'int' } }
>>
>> This is very similar to what I was thinking as well - I think the only
>> difference in the command is that what you called 'image' I called
>> 'top', and the argument order was after base.
>>
>> Here is what I had for the command:
>>
>> { 'command': 'block-commit', 'data': { 'device': 'str', '*base': 'str',
>>                                       '*top': 'str', '*speed': 'int' } }
>>
>> I don't think I have a strong preference for either of our proposed
>> commands - they are essentially the same.
> 
> Yes, they are basically the same.  I don't mind which one we use.
> 
>>>
>>> Note that block-commit cannot work on the top-most image since the
>>> guest is still writing to that image and we might never be able to
>>> copy all the data into the base image (the guest could write new data
>>> as quickly as we copy it to the base).  The command should check for
>>> this and reject the top-most image.
>>
>> By this you mean that you would like to disallow committing the
>> top-level image to the base?  Perhaps there is a way to attempt to
>> converge, and adaptively give more time to the co-routine if we are able
>> to detect divergence.  This may require violating the 'speed' parameter,
>> however, and make the commit less 'live'.
> 
> Yes, I think we should disallow merging down the topmost image.  The
> convergence problem is the same issue that live migration has.  It's a
> hard problem and not something that is essential for block-commit.  It
> would add complexity into the block-commit implementation - the only
> benefit would be that you can merge down the last COW snapshot.  We
> already have an implementation that does this: the "commit" command
> which stops the guest :).

OK.  And, it is better to get this implemented now, and if desired, I
suppose we can always revisit the convergence at a later date (or not at
all, if there is no pressing case for it).
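
To make the convergence concern concrete, here is a toy model in Python.
It is only a back-of-the-envelope sketch, not anything QEMU does: the
function name, the rates, and the threshold are all assumptions chosen
for illustration.  Each pass copies the current backlog, and the guest
dirties new data for as long as that pass takes.

```python
def passes_to_converge(dirty_mb, copy_rate, write_rate,
                       threshold_mb=1.0, max_passes=50):
    """Return how many copy passes it takes for the backlog to drop below
    threshold_mb, or None if it does not happen within max_passes.

    dirty_mb   -- data still to copy, in MB (hypothetical starting backlog)
    copy_rate  -- MB/s the commit job can copy
    write_rate -- MB/s the guest writes to the image being committed
    """
    for n in range(1, max_passes + 1):
        seconds = dirty_mb / copy_rate    # time to copy the current backlog
        dirty_mb = seconds * write_rate   # data the guest wrote in the meantime
        if dirty_mb < threshold_mb:
            return n
    return None
```

In this model the backlog shrinks by write_rate/copy_rate per pass, so
it converges only when the guest writes slower than the job copies; a
capped 'speed' makes that harder to guarantee.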

> 
> If we take Taisuke's scenario, just create a snapshot on the SSD
> immediately before merging down:
> 
>   /big-slow-cheap-disk/master.img <- /small-fast-expensive-ssd/cow.qcow2
> 
> (qemu) snapshot_blkdev ...
> 
>   /big-slow-cheap-disk/master.img
>     <- /small-fast-expensive-ssd/cow.qcow2
>     <- /small-fast-expensive-ssd/newcow.qcow2
> 
> (qemu) block-commit cow.qcow2
> 
>   /big-slow-cheap-disk/master.img <- /small-fast-expensive-ssd/newcow.qcow2
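
The snapshot-then-commit sequence above can be sketched as a Python toy.
This is a simplified model under stated assumptions, not the proposed
implementation: images are plain dicts, the block_commit function here
is hypothetical, and it only commits an image into its immediate backing
file (the proposed command also takes a 'base').

```python
def block_commit(chain, top):
    """Commit the image named `top` into its backing image.

    chain -- list of images ordered base first, active (guest-written)
             image last; each image is {'name': str, 'blocks': dict}
             where 'blocks' holds only the blocks allocated in that layer.
    """
    names = [img['name'] for img in chain]
    idx = names.index(top)               # ValueError if top is not in the chain
    if idx == len(chain) - 1:
        # The guest is still writing here, so the copy might never finish.
        raise ValueError("cannot commit the top-most (active) image")
    if idx == 0:
        raise ValueError("image has no backing file to commit into")
    base = chain[idx - 1]
    base['blocks'].update(chain[idx]['blocks'])  # copy allocated blocks down
    del chain[idx]                       # the image above now backs onto base
    return chain


chain = [
    {'name': 'master.img',   'blocks': {0: 'a', 1: 'b'}},
    {'name': 'cow.qcow2',    'blocks': {1: 'B'}},
    {'name': 'newcow.qcow2', 'blocks': {2: 'C'}},   # created by the snapshot
]
block_commit(chain, 'cow.qcow2')
```

After the call the chain is master.img <- newcow.qcow2, and master.img
holds the blocks that were allocated in cow.qcow2.
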
> 
>>>
>>> Let's figure out how to specify block-commit so we're all happy, that
>>> way we can avoid duplicating work.  Any comments on my notes above?
>>>
>>
>> I think we are almost completely on the same page - devil is in the
>> details, of course (for instance, on how to convert the destination base
>> from r/o to r/w).
> 
> Great.  The atomic r/o -> r/w transition and the commit coroutine can
> be implemented in parallel.  Are you happy to implement the atomic
> r/o -> r/w transition since you wrote bdrv_append()?  Then Zhi Hui can
> assume that part already works and focus on implementing the commit
> coroutine in the meantime.  I'm just suggesting a way to split up the
> work, please let me know if you think this is good.

I am happy to do it that way.  I'll shift my focus to the atomic image
reopen in r/w mode.  I'll go ahead and post my diagrams and other info
for block-commit on the wiki, because I don't think it conflicts with what we
discussed above (although I will modify my diagrams to not show commit
from the top-level image).  Of course, feel free to change it as
necessary.

Thanks,
Jeff
