All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anthony Liguori <anthony@codemonkey.ws>
To: Avi Kivity <avi@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Stefan Hajnoczi <stefanha@gmail.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	"libvir-list@redhat.com" <libvir-list@redhat.com>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
Date: Sun, 12 Sep 2010 10:23:40 -0500	[thread overview]
Message-ID: <4C8CF07C.5040509@codemonkey.ws> (raw)
In-Reply-To: <4C8CD847.8030804@redhat.com>

On 09/12/2010 08:40 AM, Avi Kivity wrote:
> Why would it serialize all I/O operations?  It's just like another 
> vcpu issuing reads.

Because the block layer isn't re-entrant.

>> What you basically do is:
>>
>> stream_step_three():
>>    complete()
>>
>> stream_step_two(offset, length):
>>    bdrv_aio_readv(offset, length, buffer, stream_step_three)
>>
>> bdrv_aio_stream():
>>     bdrv_aio_find_free_cluster(stream_step_two)
>
> Isn't there a write() missing somewhere?

Streaming relies on copy-on-read to do the writing.

>>
>> And that's exactly what the current code looks like.  The only change 
>> to the patch that this does is make some of qed's internals be block 
>> layer interfaces.
>
> Why do you need find_free_cluster()?  That's a physical offset thing.  
> Just write to the same logical offset.
>
> IOW:
>
>     bdrv_aio_stream():
>         bdrv_aio_read(offset, stream_2)

It's an optimization.  If you've got a fully missing L1 entry, then 
you're going to memset() 2GB worth of zeros.  That's just wasted work.  
With a 1TB image with a 1GB allocation, it's a huge amount of wasted work.

>     stream_2():
>         if all zeros:
>             increment offset
>             if more:
>                 bdrv_aio_stream()
>        bdrv_aio_write(offset, stream_3)
>
>     stream_3():
>         bdrv_aio_write(offset, stream_4)

I don't understand why stream_3() is needed.

>     stream_4():
>         increment offset
>         if more:
>              bdrv_aio_stream()
>
>
> Of course, need to serialize wrt guest writes, which adds a bit more 
> complexity.  I'll leave it to you to code the state machine for that.

http://repo.or.cz/w/qemu/aliguori.git/commitdiff/d44ea43be084cc879cd1a33e1a04a105f4cb7637?hp=34ed425e7dd39c511bc247d1ab900e19b8c74a5d

>>
>>>> Third problem is that  streaming really requires being able to do 
>>>> zero write detection in a meaningful way.  You don't want to always 
>>>> do zero write detection so you need another interface to mark a 
>>>> specific write as a write that should be checked for zeros.
>>>
>>> You can do that in bdrv_stream(), above, before the actual write, 
>>> and call bdrv_unmap() if you detect zeros.
>>
>> My QED branch now does that FWIW.  At the moment, it only detects 
>> zero reads to unallocated clusters and writes a special zero cluster 
>> marker.  However, the detection code is in the generic path so once 
>> the fsck() logic is working, we can implement a free list in QED.
>>
>> In QED, the detection code needs to have a lot of knowledge about 
>> cluster boundaries and the format of the device.  In principle, this 
>> should be common code but it's not for the same reason copy-on-write 
>> is not common code today.
>
> Parts of it are: commit.  Of course, that's horribly synchronous.

If you've got AIO internally, making commit work is pretty easy.  Doing 
asynchronous commit at a generic layer is not easy though unless you 
expose lots of details.

Generally, I think the block layer makes more sense if the interface to 
the formats are high level and code sharing is achieved not by mandating 
a world view but rather but making libraries of common functionality.   
This is more akin to how the FS layer works in Linux.

So IMHO, we ought to add a bdrv_aio_commit function, turn the current 
code into a generic_aio_commit, implement a qed_aio_commit, then somehow 
do qcow2_aio_commit, and look at what we can refactor into common code.

Regards,

Anthony Liguori

  reply	other threads:[~2010-09-12 15:24 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-07 13:41 [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration Anthony Liguori
2010-09-07 14:01 ` Alexander Graf
2010-09-07 14:31   ` Anthony Liguori
2010-09-07 14:33 ` Stefan Hajnoczi
2010-09-07 14:51   ` Anthony Liguori
2010-09-07 14:55     ` Stefan Hajnoczi
2010-09-07 15:00       ` Anthony Liguori
2010-09-07 15:09         ` Stefan Hajnoczi
2010-09-07 15:20           ` Anthony Liguori
2010-09-08  8:26           ` Kevin Wolf
2010-09-07 14:34 ` Kevin Wolf
2010-09-07 14:49   ` Stefan Hajnoczi
2010-09-07 14:57     ` Anthony Liguori
2010-09-07 15:05       ` Stefan Hajnoczi
2010-09-07 15:23         ` Anthony Liguori
2010-09-12 12:41       ` Avi Kivity
2010-09-12 13:25         ` Anthony Liguori
2010-09-12 13:40           ` Avi Kivity
2010-09-12 15:23             ` Anthony Liguori [this message]
2010-09-12 16:45               ` Avi Kivity
2010-09-12 17:19                 ` Anthony Liguori
2010-09-12 17:31                   ` Avi Kivity
2010-09-07 14:49   ` Anthony Liguori
2010-09-07 15:02     ` Kevin Wolf
2010-09-07 15:11       ` Anthony Liguori
2010-09-07 15:20         ` Kevin Wolf
2010-09-07 15:30           ` Anthony Liguori
2010-09-07 15:39             ` Kevin Wolf
2010-09-07 16:00               ` Anthony Liguori
2010-09-07 15:03 ` [Qemu-devel] " Daniel P. Berrange
2010-09-07 15:16   ` Anthony Liguori
2010-09-12 10:55 ` [Qemu-devel] " Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C8CF07C.5040509@codemonkey.ws \
    --to=anthony@codemonkey.ws \
    --cc=avi@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=libvir-list@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.