From: Marcelo Tosatti <mtosatti@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	quintela@redhat.com, KVM devel mailing list <kvm@vger.kernel.org>,
	qemu-devel@nongnu.org, Chris Wright <chrisw@redhat.com>,
	Dor Laor <dlaor@redhat.com>, Avi Kivity <avi@redhat.com>
Subject: Re: KVM call agenda for June 28
Date: Thu, 30 Jun 2011 11:36:20 -0300
Message-ID: <20110630143620.GA4366@amt.cnet>
In-Reply-To: <BANLkTin-7hkUnMHJN9jUY87m8Y=fHS_GYA@mail.gmail.com>

On Thu, Jun 30, 2011 at 01:54:09PM +0100, Stefan Hajnoczi wrote:
> On Wed, Jun 29, 2011 at 4:41 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Wed, Jun 29, 2011 at 11:08:23AM +0100, Stefan Hajnoczi wrote:
> >>  This can be used to merge data from an intermediate image without
> >> merging the base image.  When streaming completes the backing file
> >> will be set to the base image.  The backing file relationship would
> >> typically look like this:
> >>
> >> 1. Before block_stream -a -b base.img ide0-hd completion:
> >>
> >> base.img <- sn1 <- ... <- ide0-hd.qed
> >>
> >> 2. After streaming completes:
> >>
> >> base.img <- ide0-hd.qed
> >>
> >> This describes the image streaming use cases that I, Adam, and Anthony
> >> propose to support.  In the course of the discussion we've sometimes
> >> been distracted with the internals of what a unified live block
> >> copy/image streaming implementation should do.  I wanted to post this
> >> summary of image streaming to refocus us on the use case and the APIs
> >> that users will see.
> >>
> >> Stefan
> >
> > OK, with an external COW file for formats that do not support it the
> > interface can be similar. Also there is no need to mirror writes,
> > no switch operation, always use destination image.
> 
> Marcelo, does this mean you are happy with how management deals with
> power failure/crash during streaming?

Yep.

> Are we settled on the approach where the destination file always has
> the source file as its backing file?

Yep.

> Here are the components that I can identify:
> 
> 1. blkmirror - used by live block copy to keep source and destination
> in sync.  Already implemented as a block driver by Marcelo.

No need for it anymore: you now switch to the destination before the
operation starts, and always use the destination from there on.

> 2. External COW overlay - can be used to add backing file (COW)
> support on top of any image, including raw.  Currently unimplemented,
> needs to be a block driver.  Kevin, do you want to write this?
> 
> 3. Unified background copy - image format-independent mechanism for
> copying the contents of a backing file chain into the image file (with
> the exception of backing files chained below base).  Needs to play nice
> with blkmirror.  Stefan can write this.

Note that the background copy itself simply reads from 0...END. The bulk
of the work is in the block driver.

> 4. Live block copy API and high-level control - the main code that
> adds the live block copy feature.  Existing patches by Marcelo, can be
> restructured to use common core by Marcelo.

Can use your proposed block_stream interface, with a "block_switch"
command on top, so:

1) management creates copy.img with backing file current.img, allows
access
2) management issues "block_switch dev copy.img"
3) management issues "block_stream dev base"
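
To make the effect of those three steps on the backing chain concrete,
here is a minimal sketch in C. It only models the chain bookkeeping, not
any data copying. The Image and Device structs and chain_depth() are
hypothetical names made up for illustration; block_switch and
block_stream stand in for the monitor commands above.

```c
#include <stddef.h>

/* Hypothetical model of an image and its backing file link. */
typedef struct Image {
    const char *name;
    struct Image *backing;   /* NULL at the bottom of the chain */
} Image;

/* Hypothetical model of a guest device and the image it uses. */
typedef struct {
    Image *active;
} Device;

/* Step 2: "block_switch dev copy.img". The guest uses new_top from now on. */
static void block_switch(Device *dev, Image *new_top)
{
    dev->active = new_top;
}

/*
 * Step 3: "block_stream dev base". Once everything above base has been
 * copied into the active image, base becomes its direct backing file.
 * Only this final state is modelled here. base may be NULL to copy all.
 */
static void block_stream(Device *dev, Image *base)
{
    dev->active->backing = base;
}

/* Number of images in the chain starting at img. */
static size_t chain_depth(const Image *img)
{
    size_t depth = 0;

    for (; img != NULL; img = img->backing) {
        depth++;
    }
    return depth;
}
```

Step 1 corresponds to creating an Image for copy.img whose backing
pointer refers to current.img before calling block_switch().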

> 5. Image streaming API and high-level control - the main code that
> adds the image streaming feature.  Existing patches by Stefan, Adam,
> Anthony, can be restructured to use common core by Stefan.
> 
> I previously posted a proposed API for the unified background copy
> mechanism.  I'm thinking that background copy is not the best name
> since it is limited to copying the backing file into the image file.
> 
> /**
>  * Start a background copy operation
>  *
>  * Unallocated clusters in the image will be populated with data
>  * from its backing file.  This operation runs in the background and a
>  * completion function is invoked when it is finished.
>  */
> BackgroundCopy *background_copy_start(
>    BlockDriverState *bs,
> 
>    /**
>     * Note: Kevin suggests we migrate this into BlockDriverState
>     *       in order to enable copy-on-read.
>     *
>     * Base image that both source and destination have as a
>     * backing file ancestor.  Data will not be copied from base
>     * since both source and destination will have access to base
>     * image.  This may be NULL to copy all data.
>     */
>    BlockDriverState *base,
> 
>    BlockDriverCompletionFunc *cb, void *opaque);
> 
> /**
>  * Cancel a background copy operation
>  *
>  * This function marks the background copy operation for cancellation and the
>  * completion function is invoked once the operation has been cancelled.
>  */
> void background_copy_cancel(BackgroundCopy *bgc,
>                             BlockDriverCompletionFunc *cb, void *opaque);
> 
> /**
>  * Get progress of a running background copy operation
>  */
> void background_copy_get_status(BackgroundCopy *bgc,
>                                 BackgroundCopyStatus *status);
> 
> Stefan

I thought of implementing the "block_stream" command by reopening the
device with

blkstream:imagename.img

Then:

AIO_READ:
- for each cluster in request:
    - if allocated-or-in-final-base, read.
    - otherwise:
        - check write queue, if present wait on it, if not, add "copy"
          entry to write queue.
        - issue cluster sized read from source.
        - on completion:
            - copy data to original read buffer, complete it.
            - if not cancelled, write cluster to destination.

AIO_WRITE:
- for each cluster in request:
    - check write queue, cancel/wait for "copy" entry.
    - add "guest" entry to write queue.
    - issue write to destination.
    - on completion:
        - remove write queue entry.
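
The two paths above can be sketched as a small in-memory model in C.
This is not QEMU code: StreamState, PendingRead and the stream_*
functions are hypothetical names, requests are whole clusters only, and
the "in final base" check and the "wait on existing entry" case are left
out. The read is split into a submit half and a completion half so that
a guest write can land in between and cancel the pending copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NB_CLUSTERS  4
#define CLUSTER_SIZE 8

/* Per-cluster write queue entry, as in the outline above. */
typedef enum { ENTRY_NONE, ENTRY_COPY, ENTRY_GUEST } QueueEntry;

typedef struct {
    unsigned char source[NB_CLUSTERS][CLUSTER_SIZE]; /* backing file data */
    unsigned char dest[NB_CLUSTERS][CLUSTER_SIZE];   /* destination image */
    bool allocated[NB_CLUSTERS];      /* cluster present in destination */
    QueueEntry queue[NB_CLUSTERS];    /* at most one entry per cluster */
    bool copy_cancelled[NB_CLUSTERS]; /* a guest write beat the copy */
} StreamState;

/* A read that has been submitted but not yet completed. */
typedef struct {
    int cluster;
    bool from_source;
    unsigned char data[CLUSTER_SIZE];
} PendingRead;

/* AIO_READ, submission: pick the data source, queue a "copy" if needed. */
static PendingRead stream_read_submit(StreamState *s, int cluster)
{
    PendingRead r = { .cluster = cluster, .from_source = false };

    assert(cluster >= 0 && cluster < NB_CLUSTERS);
    if (s->allocated[cluster]) {
        memcpy(r.data, s->dest[cluster], CLUSTER_SIZE);
        return r;
    }
    /* Assumption: no entry is queued yet, so there is nothing to wait on. */
    s->queue[cluster] = ENTRY_COPY;
    s->copy_cancelled[cluster] = false;
    memcpy(r.data, s->source[cluster], CLUSTER_SIZE);
    r.from_source = true;
    return r;
}

/* AIO_READ, completion: hand data to the guest, then copy if not cancelled. */
static void stream_read_complete(StreamState *s, const PendingRead *r,
                                 unsigned char *buf)
{
    memcpy(buf, r->data, CLUSTER_SIZE);
    if (!r->from_source) {
        return;
    }
    if (!s->copy_cancelled[r->cluster]) {
        memcpy(s->dest[r->cluster], r->data, CLUSTER_SIZE);
        s->allocated[r->cluster] = true;
        s->queue[r->cluster] = ENTRY_NONE;
    }
    s->copy_cancelled[r->cluster] = false;
}

/* AIO_WRITE: cancel a pending "copy", then write guest data to destination. */
static void stream_write(StreamState *s, int cluster, const unsigned char *buf)
{
    assert(cluster >= 0 && cluster < NB_CLUSTERS);
    if (s->queue[cluster] == ENTRY_COPY) {
        s->copy_cancelled[cluster] = true;
    }
    s->queue[cluster] = ENTRY_GUEST;
    memcpy(s->dest[cluster], buf, CLUSTER_SIZE);
    s->allocated[cluster] = true;
    s->queue[cluster] = ENTRY_NONE; /* on completion: remove the entry */
}
```

The point of the cancel step is visible when a write arrives between
stream_read_submit() and stream_read_complete(): the reader still gets
the source data it asked for, but the stale cluster is not written over
the newer guest data in the destination.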


Once the 0...END background read completes, write the final base file
as the image's backing file.

So block_stream/block_stream_cancel/block_stream_status commands, the
background read and the rebase -u update can be separate from the block
driver.
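
A rough sketch in C of that driver-independent part: a loop that reads
0...END through a callback and can be cancelled and queried, then
records the new backing file at the end. StreamJob, the stream_job_*
functions and the budget parameter are hypothetical, and the
backing_file assignment only stands in for the rebase -u style update.
In this model the copy-on-read work would happen inside the callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Issues a read of one cluster; the block driver does the actual copy. */
typedef void ReadClusterFunc(void *opaque, size_t cluster);

typedef struct {
    size_t nb_clusters;        /* END, in clusters */
    size_t next_cluster;       /* progress of the 0...END read */
    bool cancelled;
    const char *backing_file;  /* backing file recorded for the image */
} StreamJob;

static void stream_job_init(StreamJob *job, size_t nb_clusters,
                            const char *backing_file)
{
    job->nb_clusters = nb_clusters;
    job->next_cluster = 0;
    job->cancelled = false;
    job->backing_file = backing_file;
}

/* block_stream_cancel: stop before the next cluster is read. */
static void stream_job_cancel(StreamJob *job)
{
    job->cancelled = true;
}

/* block_stream_status: number of clusters read so far. */
static size_t stream_job_status(const StreamJob *job)
{
    return job->next_cluster;
}

/*
 * Read up to budget clusters. Returns true once the whole range has been
 * read, at which point the backing file is switched to base.
 */
static bool stream_job_run(StreamJob *job, size_t budget,
                           ReadClusterFunc *read_cluster, void *opaque,
                           const char *base)
{
    assert(read_cluster != NULL);
    for (size_t i = 0; i < budget && !job->cancelled &&
                       job->next_cluster < job->nb_clusters; i++) {
        read_cluster(opaque, job->next_cluster++);
    }
    if (job->cancelled || job->next_cluster < job->nb_clusters) {
        return false;
    }
    job->backing_file = base;  /* metadata-only update, like rebase -u */
    return true;
}

/* Stand-in callback for illustration: counts the reads it is asked for. */
static void count_read(void *opaque, size_t cluster)
{
    (void)cluster;
    (*(size_t *)opaque)++;
}
```

A cancelled job leaves the backing file untouched, so the image stays
valid with its original chain.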

