From: Anthony Liguori <anthony@codemonkey.ws>
To: Avi Kivity <avi@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Stefan Hajnoczi <stefanha@gmail.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	"libvir-list@redhat.com" <libvir-list@redhat.com>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
Date: Sun, 12 Sep 2010 08:25:47 -0500
Message-ID: <4C8CD4DB.9020905@codemonkey.ws>
In-Reply-To: <4C8CCA91.4060001@redhat.com>

On 09/12/2010 07:41 AM, Avi Kivity wrote:
>  On 09/07/2010 05:57 PM, Anthony Liguori wrote:
>>> I agree that streaming should be generic, like block migration.  The
>>> trivial generic implementation is:
>>>
>>> void bdrv_stream(BlockDriverState* bs)
>>> {
>>>      for (sector = 0; sector < bdrv_getlength(bs); sector += n) {
>>>          if (!bdrv_is_allocated(bs, sector, &n)) {
>>
>> Three problems here.  First problem is that bdrv_is_allocated is 
>> synchronous. 
>
> Put the whole thing in a thread.

It doesn't fix anything.  You don't want streaming to serialize all I/O 
operations.

>> The second problem is that streaming makes the most sense when it's 
>> the smallest useful piece of work whereas bdrv_is_allocated() may 
>> return a very large range.
>>
>> You could cap it here but you then need to make sure that cap is at 
>> least cluster_size to avoid a lot of unnecessary I/O.
>
> That seems like a nice solution.  You probably want a multiple of the 
> cluster size to retain efficiency.

What you basically do is:

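stream_step_three():
    # the read of this range has completed
    complete()

stream_step_two(offset, length):
    # read the range the format reported; continue in step three
    bdrv_aio_readv(offset, length, buffer, stream_step_three)

bdrv_aio_stream():
    # ask the format for the next unallocated cluster; continue in step two
    bdrv_aio_find_free_cluster(stream_step_two)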

And that's exactly what the current code looks like.  The only change 
this makes to the patch is turning some of QED's internals into block 
layer interfaces.
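For anyone less used to this callback style, here is a minimal C sketch of the three-step chain above. It is not QEMU or QED code: mock_aio_find_free_cluster() and mock_aio_readv() are hypothetical stand-ins (bdrv_aio_find_free_cluster() is only the interface being proposed here), they invoke their callbacks immediately instead of from an event loop, and the tiny 8 x 4-byte image is an assumption to keep it self-contained. Having the mock read copy the cluster in from the backing file is also an assumption about where the copy happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy image geometry (assumption); real images are far larger. */
#define CLUSTER_SIZE 4
#define NUM_CLUSTERS 8

typedef struct Image {
    uint8_t backing[NUM_CLUSTERS * CLUSTER_SIZE]; /* backing file contents */
    uint8_t data[NUM_CLUSTERS * CLUSTER_SIZE];    /* local image contents */
    bool allocated[NUM_CLUSTERS];                 /* local allocation map */
    uint8_t buffer[CLUSTER_SIZE];                 /* bounce buffer for one step */
    int steps;                                    /* clusters streamed so far */
    bool done;
} Image;

typedef void FindCB(Image *img, int64_t offset, size_t len);
typedef void ReadCB(Image *img);

/* Hypothetical stand-in for bdrv_aio_find_free_cluster(): reports the first
 * unallocated cluster, or len == 0 when everything is allocated.  A real
 * implementation would invoke cb later, from the event loop. */
static void mock_aio_find_free_cluster(Image *img, FindCB *cb)
{
    for (int i = 0; i < NUM_CLUSTERS; i++) {
        if (!img->allocated[i]) {
            cb(img, (int64_t)i * CLUSTER_SIZE, CLUSTER_SIZE);
            return;
        }
    }
    cb(img, 0, 0);
}

/* Hypothetical stand-in for bdrv_aio_readv(): reading an unallocated cluster
 * copies it in from the backing file and marks it allocated. */
static void mock_aio_readv(Image *img, int64_t offset, size_t len,
                           uint8_t *buf, ReadCB *cb)
{
    int cluster = (int)(offset / CLUSTER_SIZE);
    if (!img->allocated[cluster]) {
        memcpy(&img->data[offset], &img->backing[offset], len);
        img->allocated[cluster] = true;
    }
    memcpy(buf, &img->data[offset], len);
    cb(img);
}

static void stream_start(Image *img);

/* Step three: one cluster is done; kick off the next step. */
static void stream_step_three(Image *img)
{
    img->steps++;
    stream_start(img);
}

/* Step two: read the unallocated range reported by step one. */
static void stream_step_two(Image *img, int64_t offset, size_t len)
{
    if (len == 0) {              /* nothing left to stream */
        img->done = true;
        return;
    }
    assert(len <= sizeof(img->buffer));
    mock_aio_readv(img, offset, len, img->buffer, stream_step_three);
}

/* Step one: ask the format for the next unallocated cluster. */
static void stream_start(Image *img)
{
    mock_aio_find_free_cluster(img, stream_step_two);
}
```

Because each step ends by scheduling the next one instead of looping, guest I/O can interleave between steps once the callbacks are genuinely asynchronous.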

Stefan has mentioned that a lot of the QED code could be reused by other 
formats.  Today every format implements things like CoW on its own, but 
if you exposed interfaces like bdrv_aio_find_free_cluster(), you could 
implement a lot more of that in the generic block layer.
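A rough sketch of what exposing such a hook to the generic layer could look like. BlockDriverSketch, its find_free_cluster member, generic_find_free_cluster() and the toy bitmap format are all hypothetical names for illustration, not QEMU's BlockDriver API, and the hook is synchronous here only for brevity (a real one would take a completion callback). The point is just that generic code can drive any format that fills in the hook and fail cleanly for formats that don't.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-format operations table. */
typedef struct BlockDriverSketch {
    const char *format_name;
    /* Optional hook: store the first unallocated cluster at or after 'from'
     * in *offset; return 0 on success or -ENOENT if none remain. */
    int (*find_free_cluster)(void *opaque, int64_t from, int64_t *offset);
} BlockDriverSketch;

/* Generic-layer entry point: formats implementing the hook can be streamed
 * by common code; others report -ENOTSUP instead of reimplementing it. */
static int generic_find_free_cluster(const BlockDriverSketch *drv, void *opaque,
                                     int64_t from, int64_t *offset)
{
    assert(drv != NULL && offset != NULL);
    if (!drv->find_free_cluster) {
        return -ENOTSUP;
    }
    return drv->find_free_cluster(opaque, from, offset);
}

/* Toy format: one allocation flag per cluster. */
typedef struct ToyImage {
    int64_t cluster_size;
    int64_t nb_clusters;
    const unsigned char *allocated;
} ToyImage;

static int toy_find_free_cluster(void *opaque, int64_t from, int64_t *offset)
{
    const ToyImage *img = opaque;
    for (int64_t i = from / img->cluster_size; i < img->nb_clusters; i++) {
        if (!img->allocated[i]) {
            *offset = i * img->cluster_size;
            return 0;
        }
    }
    return -ENOENT;
}
```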

So, I agree with you in principle that this all should be common code.  
I think it's a larger effort though.

>>
>> The QED streaming implementation is 140 LOCs too so you quickly end 
>> up adding more code to the block formats to support these new 
>> interfaces than it takes to just implement it in the block format.
>
> bdrv_is_allocated() already exists (and is needed for commit), what 
> else is needed?  cluster size?

A synchronous implementation can't be reused to build anything 
asynchronous.  You also need the code to be cluster-aware.
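As a sketch of the cluster-aware capping discussed earlier in the thread (limiting the range from bdrv_is_allocated() to a multiple of the cluster size), here is one possible helper. stream_next_step() and its parameters are hypothetical, and the rounding policy is an assumption rather than what the QED patch does. Widening to whole clusters matters because a partial write to an unallocated cluster in a CoW format forces the rest of the cluster to be copied anyway.

```c
#include <assert.h>
#include <stdint.h>

/* One streaming step: copy len bytes starting at start. */
typedef struct StreamStep {
    int64_t start;
    int64_t len;
} StreamStep;

/* Turn an unallocated range [offset, offset + len) into the next streaming
 * step: widen it to whole clusters, cap it at max_clusters clusters so each
 * step stays small, and clamp it to the end of the disk. */
static StreamStep stream_next_step(int64_t offset, int64_t len,
                                   int64_t cluster_size, int64_t max_clusters,
                                   int64_t disk_size)
{
    assert(cluster_size > 0 && max_clusters > 0);
    assert(len > 0 && offset >= 0 && offset < disk_size);

    int64_t start = offset - offset % cluster_size;               /* round down */
    int64_t end = offset + len;
    end = (end + cluster_size - 1) / cluster_size * cluster_size; /* round up */

    int64_t cap = max_clusters * cluster_size;
    if (end - start > cap) {
        end = start + cap;
    }
    if (end > disk_size) {
        end = disk_size;
    }

    StreamStep step = { start, end - start };
    return step;
}
```

A caller would stream the returned range and then ask again from step.start + step.len, so the part of a large range cut off by the cap is picked up on the next step.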

>> Third problem is that streaming really requires being able to do 
>> zero write detection in a meaningful way.  You don't want to always 
>> do zero write detection so you need another interface to mark a 
>> specific write as a write that should be checked for zeros.
>
> You can do that in bdrv_stream(), above, before the actual write, and 
> call bdrv_unmap() if you detect zeros.

My QED branch now does that, FWIW.  At the moment, it only detects 
zero-filled reads of unallocated clusters and writes a special 
zero-cluster marker instead.  However, the detection code is in the 
generic path, so once the fsck() logic is working, we can implement a 
free list in QED.
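A minimal sketch of the zero-detection decision described above. The ClusterState table and stream_commit_cluster() are hypothetical stand-ins for QED's metadata and write path; only the idea of recording a zero-cluster marker instead of writing zeros comes from this thread. The byte-by-byte check is the simplest correct version, not an optimized one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if every byte of buf is zero. */
static bool buf_is_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical per-cluster metadata state. */
typedef enum { CLUSTER_UNALLOCATED, CLUSTER_DATA, CLUSTER_ZERO } ClusterState;

/* Commit the data read for one unallocated cluster during streaming:
 * all-zero data only updates metadata with a zero-cluster marker; anything
 * else needs a real data write (modelled here by counting writes). */
static ClusterState stream_commit_cluster(ClusterState *table, size_t idx,
                                          const uint8_t *buf, size_t len,
                                          int *data_writes)
{
    assert(table[idx] == CLUSTER_UNALLOCATED);
    if (buf_is_zero(buf, len)) {
        table[idx] = CLUSTER_ZERO;   /* metadata-only update */
    } else {
        table[idx] = CLUSTER_DATA;
        (*data_writes)++;            /* real code would issue the write here */
    }
    return table[idx];
}
```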

In QED, the detection code needs a lot of knowledge about cluster 
boundaries and the image's on-disk format.  In principle this should be 
common code, but it isn't, for the same reason copy-on-write isn't 
common code today.

Regards,

Anthony Liguori
