All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>,
	Dor Laor <dlaor@redhat.com>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>, Avi Kivity <avi@redhat.com>,
	Adam Litke <agl@us.ibm.com>
Subject: Re: [Qemu-devel] live block copy/stream/snapshot discussion
Date: Tue, 12 Jul 2011 18:10:26 +0200	[thread overview]
Message-ID: <4E1C71F2.4030507@redhat.com> (raw)
In-Reply-To: <CAJSP0QXwEY7wKYD=vQf1H4nLKzccemNSbNauaMSK8GQgoHrbfw@mail.gmail.com>

Am 12.07.2011 17:45, schrieb Stefan Hajnoczi:
>>>> Image streaming API
>>>> ===================
>>>>
>>>> For leaf images with copy-on-read semantics, the stream commands allow the user
>>>> to populate local blocks by manually streaming them from the backing image.
>>>> Once all blocks have been streamed, the dependency on the original backing
>>>> image can be removed.  Therefore, stream commands can be used to implement
>>>> post-copy live block migration and rapid deployment.
>>>>
>>>> The block_stream command can be used to stream a single cluster, to
>>>> start streaming the entire device, and to cancel an active stream.  It
>>>> is easiest to allow the block_stream command to manage streaming for the
>>>> entire device but a managent tool could use single cluster mode to
>>>> throttle the I/O rate.
>>
>> As discussed earlier, having the management send requests for each
>> single cluster doesn't make any sense at all. It wouldn't only throttle
>> the I/O rate but bring it down to a level that makes it unusable. What
>> you really want is to allow the management to give us a range (offset +
>> length) that qemu should stream.
> 
> I feel that an iteration interface is problematic whether the
> management tool or QEMU decide what to stream.  Let's have just the
> background streaming operation.
> 
> The problem with byte ranges is two-fold.  The management tool doesn't
> know which regions of the image are allocated so it may do a lot of
> nop calls to already-allocated regions with no intelligence as to
> where the next sensible offset for streaming is.  Secondly, because
> the progress and performance of image streaming depend largely on
> whether or not clusters are allocated (it is very fast when a cluster
> is already allocated and we have no work to do), offsets are bad
> indicators of progress to the user.  I think it's best not to expose
> these details to the management tool at all.
> 
> The only reason for the iteration interface was to punt I/O throttling
> to the management tool.  I think it would be easier to just throttle
> inside the streaming function.
> 
> Kevin: Are you happy with dropping the iteration interface?
> Adam: Is there a libvirt requirement for iteration or could we support
> background copy only?

Okay, works for me.

>>>> The command synopses are as follows:
>>>>
>>>> block_stream
>>>> ------------
>>>>
>>>> Copy data from a backing file into a block device.
>>>>
>>>> If the optional 'all' argument is true, this operation is performed in the
>>>> background until the entire backing file has been copied.  The status of
>>>> ongoing block_stream operations can be checked with query-block-stream.
>>
>> Not sure if it's a good idea to use a bool argument to turn a command
>> into its opposite. I think having a separate command for stopping would
>> be cleaner. Something for the QMP folks to decide, though.
> 
> git branch new_branch
> git branch -D new_branch
> 
> Makes sense to me :)

I don't think you should compare a command line option to a programming
interface. Having a git_create_branch(const char *name, bool delete)
would really look strange. Anyway, probably a matter of taste.

A hint that separate commands would make sense is that the stop command
won't need the other arguments that the start command gets ('all' and
'base').

>>>> Arguments:
>>>>
>>>> - all:    copy entire device (json-bool, optional)
>>>> - stop:   stop copying to device (json-bool, optional)
>>>> - device: device name (json-string)
>>>
>>> It must be possible to specify backing file that will be
>>> active after streaming finishes (data from that file will not
>>> be streamed into active file, of course).
>>
>> Yes, I think the common base image belongs here.
> 
> Right.  We need to specify it by filename:
> 
>   - base: filename of base file (json-string, optional)
> 
>   Sectors are not copied from the base file and its backing file
>   chain.  The following describes this feature:
>     Before: base <- sn1 <- sn2 <- sn3 <- vm.img
>     After:  base <- vm.img

Does this imply that a rebase -u happens always after completion?

>> With all = false, where does the streaming begin?
> 
> Streaming begins at the start of the image.
> 
>> Do you have something like the "current streaming offset" in the state of each BlockDriverState?
> 
> Yes, there is a StreamState for each block device that has an
> in-progress operation.  The progress is saved between block_stream
> (without -a) invocations so the caller does not need to specify the
> streaming offset as an argument.
> 
> Thanks for pointing out these weaknesses in the documentation.  It
> should really be explained fully.

I think we also need to describe error cases. For example, what happens
if you try to start streaming while it's already in progress?

>>>> Return:
>>>>
>>>> - device: device name (json-string)
>>>> - len:    size of the device, in bytes (json-int)
>>>> - offset: ending offset of the completed I/O, in bytes (json-int)
>>
>> So you only get the reply when the request has completed? With the
>> current monitor, this means that QMP is blocked while we stream, doesn't
>> it? How are you supposed to send the stop command then?
> 
> Incomplete documentation again, sorry.  The block_stream command
> behaves as follows:
> 
> 1. block_stream all returns immediately and the BLOCK_STREAM_COMPLETED
> event is raised when streaming completes either successfully or with
> an error.
> 
> 2. block_stream stop returns when the in-progress streaming operation
> has been safely stopped.
> 
> 3. block_stream returns when one iteration of streaming has completed.
> 
>> Two of three examples below have an empty return value instead, so they
>> are not compliant to this specification.
> 
> I will update the documentation, the non-all invocations do not return anything.

Okay, then I don't understand what the 'offset' return value means. The
text says "offset of the completed I/O". If all=true immediately
returns, shouldn't it always be 0?

>> I find it rather disturbing that a command like 'change' has made it
>> into QMP... Anyway, I don't think this is really what we need.
>>
>> We have two switches to do. The first one happens before starting the
>> copy: Creating the copy, with the source as its backing file, and
>> switching to that. The monitor command to achieve this is snapshot_blkdev.
> 
> I don't think that creating image files in QEMU is going to work when
> running KVM with libvirt (SELinux).  The QEMU process does not have
> the ability to create new image files.  It needs at least a file
> descriptor to an empty file or maybe a file that has been created
> using qemu-img like I showed above.

Independent problem. We're really creating an external snapshot here, so
we should use the function for external snapshots. libvirt can
pre-create an empty image file, so that qemu will write the image format
data into it, but we have discussed this before.

Kevin

  reply	other threads:[~2011-07-12 16:07 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-05 14:17 [Qemu-devel] live block copy/stream/snapshot discussion Dor Laor
2011-07-11 12:54 ` Stefan Hajnoczi
2011-07-11 14:47   ` Stefan Hajnoczi
2011-07-11 16:32     ` Marcelo Tosatti
2011-07-12  8:06       ` Kevin Wolf
2011-07-12 15:45         ` Stefan Hajnoczi
2011-07-12 16:10           ` Kevin Wolf [this message]
2011-07-13  9:51             ` Stefan Hajnoczi
2011-07-14  9:39               ` Stefan Hajnoczi
2011-07-14  9:55                 ` Kevin Wolf
2011-07-14 10:00                   ` Stefan Hajnoczi
2011-07-14 10:07                     ` Kevin Wolf
2011-07-12 17:47           ` Adam Litke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E1C71F2.4030507@redhat.com \
    --to=kwolf@redhat.com \
    --cc=agl@us.ibm.com \
    --cc=aliguori@us.ibm.com \
    --cc=avi@redhat.com \
    --cc=dlaor@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.