[Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
@ 2013-05-13 21:21 Wolfgang Richter
  2013-05-14  8:40 ` Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Wolfgang Richter @ 2013-05-13 21:21 UTC (permalink / raw)
  To: qemu-devel, stefanha, Paolo Bonzini

I'm working on a new patch series which will add a new QMP command,
block-trace, which turns on tracing of writes for a specified block device and
sends the stream unmodified to another block device.  The 'trace' is meant to
be precise, meaning that no writes are lost, which differentiates this command
from others.  It can be turned on and off depending on when it is needed.

How is this different from block-backup or drive-mirror?
--------------------------------------------------------

block-backup is designed to create point-in-time snapshots and not clone the
entire write stream of a VM to a particular device.  It implements
copy-on-write to create a snapshot.  Thus whenever a write occurs, block-backup
is designed to send the original data and not the contents of the new write.

drive-mirror is designed to mirror a disk to another location.  It operates by
periodically scanning a dirty bitmap and cloning blocks when dirtied.  This is
efficient as it allows for batching of writes, but it does not maintain the
order in which guest writes occurred and it can miss intermediate writes when
they go to the same location on disk.
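
To make the difference concrete, here is a minimal sketch in C of a toy disk
that records guest writes in two ways: a per-block dirty bitmap (roughly what a
passive mirror keeps) and an ordered log of every write (what block-trace is
meant to deliver).  All names (ToyDisk, guest_write, passive_copy) are
hypothetical; this is not QEMU code, and each block holds a single byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NBLOCKS one-byte "blocks", not real QEMU structures. */
#define NBLOCKS 8
#define MAX_LOG 64

typedef struct {
    size_t block;
    unsigned char data;
} WriteRecord;

typedef struct {
    unsigned char disk[NBLOCKS];
    bool dirty[NBLOCKS];        /* passive mirror: only "this block changed" */
    WriteRecord log[MAX_LOG];   /* trace: every write, in guest order */
    size_t log_len;
} ToyDisk;

static void guest_write(ToyDisk *d, size_t block, unsigned char data)
{
    assert(block < NBLOCKS && d->log_len < MAX_LOG);
    d->disk[block] = data;
    d->dirty[block] = true;
    d->log[d->log_len++] = (WriteRecord){ block, data };
}

/* One passive pass: copies the *current* contents of each dirty block,
 * so a block written several times is copied once, with its final value. */
static size_t passive_copy(ToyDisk *d, unsigned char *target)
{
    size_t copied = 0;
    for (size_t i = 0; i < NBLOCKS; i++) {
        if (d->dirty[i]) {
            target[i] = d->disk[i];
            d->dirty[i] = false;
            copied++;
        }
    }
    return copied;
}
```

Writing block 3 twice and block 5 once, then running passive_copy(), copies
only two blocks and never sees the first value written to block 3, while the
log still holds all three writes in the order the guest issued them.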

How can block-trace be used?
----------------------------

(1) Disk introspection - systems which analyze the writes going to a disk for
introspection require a perfect clone of the write stream to an original disk
to stay in sync with updates to guest file systems.

(2) Replicated block device - two block devices could be maintained as exact
copies of each other up to a point in the disk write stream that has
successfully been written to the destination block device.

--
Wolf

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-13 21:21 [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection Wolfgang Richter
@ 2013-05-14  8:40 ` Stefan Hajnoczi
  2013-05-14 15:42   ` Wolfgang Richter
  2013-05-14  8:50 ` Kevin Wolf
  2013-05-16 13:44 ` Richard W.M. Jones
  2 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2013-05-14  8:40 UTC (permalink / raw)
  To: Wolfgang Richter; +Cc: Paolo Bonzini, qemu-devel

On Mon, May 13, 2013 at 05:21:54PM -0400, Wolfgang Richter wrote:
> I'm working on a new patch series which will add a new QMP command,
> block-trace, which turns on tracing of writes for a specified block device and
> sends the stream unmodified to another block device.  The 'trace' is meant to
> be precise meaning that writes are not lost, which differentiates this command
> from others.  It can be turned on and off depending on when it is needed.
>
> How is this different from block-backup or drive-mirror?
> --------------------------------------------------------
>
> block-backup is designed to create point-in-time snapshots and not clone the
> entire write stream of a VM to a particular device.  It implements
> copy-on-write to create a snapshot.  Thus whenever a write occurs, block-backup
> is designed to send the original data and not the contents of the new write.
>
> drive-mirror is designed to mirror a disk to another location.  It operates by
> periodically scanning a dirty bitmap and cloning blocks when dirtied.  This is
> efficient as it allows for batching of writes, but it does not maintain the
> order in which guest writes occurred and it can miss intermediate writes when
> they go to the same location on disk.
>
> How can block-trace be used?
> ----------------------------
>
> (1) Disk introspection - systems which analyze the writes going to a disk for
> introspection require a perfect clone of the write stream to an original disk
> to stay in-sync with updates to guest file systems.
>
> (2) Replicated block device - two block devices could be maintained as exact
> copies of each other up to a point in the disk write stream that has
> successfully been written to the destination block device.

CCed Benoit Canet, who implemented the quorum block driver to mirror I/O
to multiple images and verify data integrity.

QEMU is accumulating many different approaches to snapshots and
mirroring.  They all have their pros and cons so it's not possible to
support only one approach for all use cases.

The suggested approach is writing a BlockDriver which mirrors I/O to two
BlockDriverStates.  There has been discussion around breaking
BlockDriver into smaller interfaces, including a BlockFilter for
intercepting I/O, but this has not been implemented.  blkverify is an
example of a BlockDriver that manages two child BlockDriverStates and
may be a good starting point.
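
A minimal sketch of the shape such a driver might take, in the spirit of
blkverify: a filter that owns two children and forwards every write to both,
in the same order.  ToyDriver and ToyBDS are hypothetical, heavily simplified
stand-ins for BlockDriver and BlockDriverState; the real coroutine callbacks,
AIO and error reporting are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for BlockDriver / BlockDriverState. */
typedef struct ToyBDS ToyBDS;

typedef struct {
    const char *name;
    int (*write)(ToyBDS *bs, size_t offset, const void *buf, size_t len);
} ToyDriver;

struct ToyBDS {
    const ToyDriver *drv;
    void *opaque;                   /* driver-private state */
};

static int toy_write(ToyBDS *bs, size_t offset, const void *buf, size_t len)
{
    return bs->drv->write(bs, offset, buf, len);
}

/* Leaf driver: a small in-memory image. */
typedef struct { unsigned char data[4096]; } MemImage;

static int mem_write(ToyBDS *bs, size_t offset, const void *buf, size_t len)
{
    MemImage *img = bs->opaque;
    if (offset > sizeof(img->data) || len > sizeof(img->data) - offset) {
        return -1;                  /* out of range */
    }
    memcpy(img->data + offset, buf, len);
    return 0;
}

static const ToyDriver mem_driver = { "mem", mem_write };

/* Filter driver with two children: every write goes to both. */
typedef struct { ToyBDS *primary; ToyBDS *secondary; } MirrorFilter;

static int mirror_filter_write(ToyBDS *bs, size_t offset, const void *buf, size_t len)
{
    MirrorFilter *f = bs->opaque;
    int ret = toy_write(f->primary, offset, buf, len);
    if (ret < 0) {
        return ret;                 /* primary failed: do not mirror */
    }
    return toy_write(f->secondary, offset, buf, len);
}

static const ToyDriver mirror_filter_driver = { "mirror-filter", mirror_filter_write };
```

Writing sequentially to the primary and then the secondary is the simplest
choice; a real driver would more likely issue both requests concurrently and
complete the guest request when both have finished.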

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-13 21:21 [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection Wolfgang Richter
  2013-05-14  8:40 ` Stefan Hajnoczi
@ 2013-05-14  8:50 ` Kevin Wolf
  2013-05-14 10:04   ` Paolo Bonzini
  2013-05-14 15:45   ` Wolfgang Richter
  2013-05-16 13:44 ` Richard W.M. Jones
  2 siblings, 2 replies; 27+ messages in thread
From: Kevin Wolf @ 2013-05-14  8:50 UTC (permalink / raw)
  To: Wolfgang Richter; +Cc: Paolo Bonzini, qemu-devel, stefanha

On 13.05.2013 at 23:21, Wolfgang Richter wrote:
> I'm working on a new patch series which will add a new QMP command,
> block-trace, which turns on tracing of writes for a specified block device and
> sends the stream unmodified to another block device.  The 'trace' is meant to
> be precise meaning that writes are not lost, which differentiates this command
> from others.  It can be turned on and off depending on when it is needed.
> 
> 
> 
> How is this different from block-backup or drive-mirror?
> --------------------------------------------------------
> 
> block-backup is designed to create point-in-time snapshots and not clone the
> entire write stream of a VM to a particular device.  It implements
> copy-on-write to create a snapshot.  Thus whenever a write occurs, block-backup
> is designed to send the original data and not the contents of the new write.
> 
> drive-mirror is designed to mirror a disk to another location.  It operates by
> periodically scanning a dirty bitmap and cloning blocks when dirtied.  This is
> efficient as it allows for batching of writes, but it does not maintain the
> order in which guest writes occurred and it can miss intermediate writes when
> they go to the same location on disk.

Or, to translate it into our existing terminology, drive-mirror
implements a passive mirror, you're proposing an active one (which we
do want to have).

With an active mirror, we'll want to have another choice: The mirror can
be synchronous (guest writes only complete after the mirrored write has
completed) or asynchronous (completion is based only on the original
image). It should be easy enough to support both once an active mirror
exists.
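
The synchronous/asynchronous choice boils down to when the guest request may
be completed.  A minimal sketch of that rule, where MirrorMode and ActiveWrite
are hypothetical names (not an existing QEMU or QMP option):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode flag for an active mirror. */
typedef enum { MIRROR_SYNC, MIRROR_ASYNC } MirrorMode;

/* State of one guest write that is being mirrored actively. */
typedef struct {
    MirrorMode mode;
    bool source_done;   /* write to the original image has completed */
    bool mirror_done;   /* write to the mirror target has completed */
} ActiveWrite;

/* Returns true once the guest request may be completed. */
static bool guest_may_complete(const ActiveWrite *w)
{
    switch (w->mode) {
    case MIRROR_SYNC:
        return w->source_done && w->mirror_done;
    case MIRROR_ASYNC:
        return w->source_done;   /* the mirrored write may still be in flight */
    }
    assert(!"unknown mirror mode");
    return false;
}
```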

> How can block-trace be used?
> ----------------------------
> 
> (1) Disk introspection - systems which analyze the writes going to a disk for
> introspection require a perfect clone of the write stream to an original disk
> to stay in-sync with updates to guest file systems.
> 
> (2) Replicated block device - two block devices could be maintained as exact
> copies of each other up to a point in the disk write stream that has
> successfully been written to the destination block device.

You're leaving out the most interesting section: How should block-trace
be implemented?

The first question is what the API should look like, on the QMP level. I
think originally the idea was to use drive-mirror for all kinds of
mirrors, but maybe it makes more sense indeed to keep the active mirror
separate. I don't particularly like the name block-trace for a separate
command, but let's save the bikeshedding for later.

The other question is how to implement it internally. I don't think
adding specific code for each new block job into bdrv_co_do_writev() is
acceptable. We really need a generic way to intercept I/O operations.
The keyword from earlier discussions is "block filters". Essentially the
idea is that the block job temporarily adds a BlockDriverState on top of
the format driver and becomes able to implement all callbacks it likes
to intercept. The bad news is that the infrastructure isn't there yet
to actually make this happen in a sane way.
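
A toy model of the "block filter" idea, assuming a node either implements a
callback itself or forwards it to the node below; the job inserts its filter
on top of the format node and removes it afterwards.  Node, NodeOps,
filter_insert and filter_remove are hypothetical names, not QEMU's API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

/* A NULL callback means "not intercepted: forward to the node below". */
typedef struct {
    int (*co_writev)(Node *n, long sector, int nb_sectors);
    int (*co_flush)(Node *n);
} NodeOps;

struct Node {
    const NodeOps *ops;
    Node *backing;      /* next node down the chain */
    int writes_seen;    /* counters for the example only */
    int flushes;
};

static int node_writev(Node *n, long sector, int nb_sectors)
{
    while (n->ops->co_writev == NULL) {
        assert(n->backing != NULL);
        n = n->backing;
    }
    return n->ops->co_writev(n, sector, nb_sectors);
}

static int node_flush(Node *n)
{
    while (n->ops->co_flush == NULL) {
        assert(n->backing != NULL);
        n = n->backing;
    }
    return n->ops->co_flush(n);
}

/* Stand-in for the format driver at the bottom of this toy chain. */
static int format_writev(Node *n, long sector, int nb_sectors)
{
    (void)sector;
    (void)nb_sectors;
    n->writes_seen++;
    return 0;
}

static int format_flush(Node *n)
{
    n->flushes++;
    return 0;
}

/* The job's filter intercepts writes only; flush falls through. */
static int filter_writev(Node *n, long sector, int nb_sectors)
{
    n->writes_seen++;   /* a real job would act on the write here */
    return node_writev(n->backing, sector, nb_sectors);
}

static const NodeOps format_ops = { format_writev, format_flush };
static const NodeOps filter_ops = { filter_writev, NULL };

/* Insert the filter on top of whatever the device currently points at. */
static void filter_insert(Node **root, Node *filter)
{
    filter->backing = *root;
    *root = filter;
}

/* Remove it again when the job ends (assumes it is still on top). */
static void filter_remove(Node **root)
{
    Node *filter = *root;
    *root = filter->backing;
    filter->backing = NULL;
}
```

The point of the design is that bdrv_co_do_writev() would need no per-job
code: the job only supplies the callbacks it wants to intercept.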

Kevin

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-14  8:50 ` Kevin Wolf
@ 2013-05-14 10:04   ` Paolo Bonzini
  2013-05-14 15:48     ` Wolfgang Richter
  2013-05-14 15:45   ` Wolfgang Richter
  1 sibling, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2013-05-14 10:04 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Wolfgang Richter, qemu-devel, stefanha

On 14/05/2013 10:50, Kevin Wolf wrote:
> Or, to translate it into our existing terminology, drive-mirror
> implements a passive mirror, you're proposing an active one (which we
> do want to have).
> 
> With an active mirror, we'll want to have another choice: The mirror can
> be synchronous (guest writes only complete after the mirrored write has
> completed) or asynchronous (completion is based only on the original
> image). It should be easy enough to support both once an active mirror
> exists.

Right, I'm waiting for Stefan's block-backup to give me the "right"
hooks for the active mirror.

The bulk phase will always be passive, but an active-asynchronous mirror
has some interesting properties and it makes sense to implement it.

Paolo

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-14  8:40 ` Stefan Hajnoczi
@ 2013-05-14 15:42   ` Wolfgang Richter
  0 siblings, 0 replies; 27+ messages in thread
From: Wolfgang Richter @ 2013-05-14 15:42 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, qemu-devel

On Tue, May 14, 2013 at 4:40 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:

> QEMU is accumulating many different approaches to snapshots and
> mirroring.  They all have their pros and cons so it's not possible to
> support only one approach for all use cases.
>
> The suggested approach is writing a BlockDriver which mirrors I/O to two
> BlockDriverStates.  There has been discussion around breaking
> BlockDriver into smaller interfaces, including a BlockFilter for
> intercepting I/O, but this has not been implemented.  blkverify is an
> example of a BlockDriver that manages two child BlockDriverStates and
> may be a good starting point.
>

BlockFilter sounds interesting.  The main reason I proposed 'block-trace'
is because that is almost identical to what I currently have implemented
with the tracing framework---I just didn't have a nice QMP command.

-- 
Wolf

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-14  8:50 ` Kevin Wolf
  2013-05-14 10:04   ` Paolo Bonzini
@ 2013-05-14 15:45   ` Wolfgang Richter
  1 sibling, 0 replies; 27+ messages in thread
From: Wolfgang Richter @ 2013-05-14 15:45 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, qemu-devel, stefanha

On Tue, May 14, 2013 at 4:50 AM, Kevin Wolf <kwolf@redhat.com> wrote:

> Or, to translate it into our existing terminology, drive-mirror
> implements a passive mirror, you're proposing an active one (which we
> do want to have).
>
> With an active mirror, we'll want to have another choice: The mirror can
> be synchronous (guest writes only complete after the mirrored write has
> completed) or asynchronous (completion is based only on the original
> image). It should be easy enough to support both once an active mirror
> exists.
>

Yes! Active mirroring is precisely what is needed to implement block-level
introspection.


> You're leaving out the most interesting section: How should block-trace
> be implemented?
>

Noted, although maybe folding it into 'drive-mirror' as an 'active' option
might be best, now that Paolo has spoken up.


> The other question is how to implement it internally. I don't think
> adding specific code for each new block job into bdrv_co_do_writev() is
> acceptable. We really need a generic way to intercept I/O operations.
> The keyword from earlier discussions is "block filters". Essentially the
> idea is that the block job temporarily adds a BlockDriverState on top of
> the format driver and becomes able to implement all callbacks it likes
> to intercept. The bad news is that the infrastructure isn't there yet
> to actually make this happen in a sane way.


Yeah, I'd also really love block filters and probably would have used them
instead of the tracing subsystem originally if they had existed.  It would
make implementing all kinds of 'block-level' features much, much easier.

-- 
Wolf

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-14 10:04   ` Paolo Bonzini
@ 2013-05-14 15:48     ` Wolfgang Richter
  2013-05-14 16:45       ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Wolfgang Richter @ 2013-05-14 15:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-devel, stefanha

On Tue, May 14, 2013 at 6:04 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 14/05/2013 10:50, Kevin Wolf wrote:
> > Or, to translate it into our existing terminology, drive-mirror
> > implements a passive mirror, you're proposing an active one (which we
> > do want to have).
> >
> > With an active mirror, we'll want to have another choice: The mirror can
> > be synchronous (guest writes only complete after the mirrored write has
> > completed) or asynchronous (completion is based only on the original
> > image). It should be easy enough to support both once an active mirror
> > exists.
>
> Right, I'm waiting for Stefan's block-backup to give me the "right"
> hooks for the active mirror.
>
> The bulk phase will always be passive, but an active-asynchronous mirror
> has some interesting properties and it makes sense to implement it.
>

Do you mean you'd model the 'active' mode after 'block-backup,' or actually
call functions provided by 'block-backup'?  If I knew more about what you
had in mind, I wouldn't mind trying to add this 'active' mode to
'drive-mirror' and test it with my use case.  I want to avoid duplicate work,
so if you want to implement it yourself I can defer this.

-- 
Wolf

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-14 15:48     ` Wolfgang Richter
@ 2013-05-14 16:45       ` Paolo Bonzini
  2013-05-14 19:30         ` Wolfgang Richter
  2013-05-15  7:59         ` Kevin Wolf
  0 siblings, 2 replies; 27+ messages in thread
From: Paolo Bonzini @ 2013-05-14 16:45 UTC (permalink / raw)
  To: Wolfgang Richter; +Cc: Kevin Wolf, qemu-devel, stefanha

On 14/05/2013 17:48, Wolfgang Richter wrote:
> On Tue, May 14, 2013 at 6:04 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>     On 14/05/2013 10:50, Kevin Wolf wrote:
>     > Or, to translate it into our existing terminology, drive-mirror
>     > implements a passive mirror, you're proposing an active one (which we
>     > do want to have).
>     >
>     > With an active mirror, we'll want to have another choice: The mirror can
>     > be synchronous (guest writes only complete after the mirrored write has
>     > completed) or asynchronous (completion is based only on the original
>     > image). It should be easy enough to support both once an active mirror
>     > exists.
> 
>     Right, I'm waiting for Stefan's block-backup to give me the "right"
>     hooks for the active mirror.
> 
>     The bulk phase will always be passive, but an active-asynchronous mirror
>     has some interesting properties and it makes sense to implement it.
> 
> 
> Do you mean you'd model the 'active' mode after 'block-backup,' or actually
> call functions provided by 'block-backup'?

No, I'll just reuse the same hooks within block/mirror.c (almost... it
looks like I need after_write too, not just before_write :( that's a
pity).  Basically:

1) before the write, if there is space in the job's buffers, allocate a
MirrorOp and a data buffer for the write.  Also record whether the block
was dirty before;

2) after the write, do nothing if there was no room to allocate the data
buffer.  Else clear the block from the dirty bitmap.  If the block was
dirty, read the whole cluster from the source as in passive mirroring.
If it wasn't, copy the data from guest memory to the preallocated buffer
and write it to the destination;
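
A single-threaded sketch of the two steps above, under simplifying
assumptions: each guest write stays within one cluster, everything completes
synchronously, and MirrorJob/MirrorOp are hypothetical stand-ins rather than
the structures in block/mirror.c.  The before hook hands its MirrorOp to the
after hook through a linked list that the after hook scans; a write that found
no room simply leaves its cluster dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CLUSTER   16
#define NCLUSTERS 4
#define DISK_SIZE (CLUSTER * NCLUSTERS)

/* Hypothetical stand-ins; not the structures in block/mirror.c. */
typedef struct MirrorOp {
    size_t offset;
    size_t len;
    bool was_dirty;            /* dirty bit of the cluster before the write */
    unsigned char *buf;        /* preallocated data buffer */
    struct MirrorOp *next;
} MirrorOp;

typedef struct {
    unsigned char source[DISK_SIZE];
    unsigned char target[DISK_SIZE];
    bool dirty[NCLUSTERS];
    size_t buf_free;           /* room left in the job's buffers, in bytes */
    MirrorOp *ops;             /* in-flight ops, found again by a list scan */
} MirrorJob;

/* Step 1: reserve an op and a buffer if there is room; note the dirty bit. */
static void before_write(MirrorJob *job, size_t offset, size_t len)
{
    if (len > job->buf_free) {
        return;                              /* no room: stay passive */
    }
    MirrorOp *op = malloc(sizeof(*op));
    unsigned char *buf = malloc(len);
    if (op == NULL || buf == NULL) {
        free(op);
        free(buf);
        return;
    }
    *op = (MirrorOp){ offset, len, job->dirty[offset / CLUSTER], buf, job->ops };
    job->ops = op;
    job->buf_free -= len;
}

/* Step 2: mirror the write if step 1 reserved an op for it. */
static void after_write(MirrorJob *job, size_t offset,
                        const unsigned char *guest_buf, size_t len)
{
    MirrorOp **p = &job->ops;
    while (*p != NULL && ((*p)->offset != offset || (*p)->len != len)) {
        p = &(*p)->next;
    }
    if (*p == NULL) {
        return;                              /* no op: cluster stays dirty */
    }
    MirrorOp *op = *p;
    *p = op->next;                           /* unlink from the list */

    size_t c = offset / CLUSTER;
    job->dirty[c] = false;
    if (op->was_dirty) {
        /* Target is stale elsewhere in this cluster: copy all of it. */
        memcpy(job->target + c * CLUSTER, job->source + c * CLUSTER, CLUSTER);
    } else {
        /* Target matched the source: the new bytes alone suffice. */
        memcpy(op->buf, guest_buf, len);
        memcpy(job->target + offset, op->buf, len);
    }
    job->buf_free += op->len;
    free(op->buf);
    free(op);
}

/* A guest write, with the dirty bitmap updated as dirty tracking would. */
static void guest_write(MirrorJob *job, size_t offset, const void *buf, size_t len)
{
    assert(len > 0 && offset + len <= DISK_SIZE);
    assert(offset / CLUSTER == (offset + len - 1) / CLUSTER);  /* one cluster */
    before_write(job, offset, len);
    memcpy(job->source + offset, buf, len);
    job->dirty[offset / CLUSTER] = true;
    after_write(job, offset, buf, len);
}
```

Matching ops by (offset, len) is enough in this sequential model; with
concurrent identical requests a real implementation would need a less
ambiguous key.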

> If I knew more about what you
> had in mind, I wouldn't mind trying to add this 'active' mode to
> 'drive-mirror'
> and test it with my use case.  I want to avoid duplicate work, so if you
> want to implement it yourself I can defer this.

Also the other way round.  If you want to give it a shot based on the
above spec just tell me.

It should require no changes to block.c except for adding after_write.

Paolo

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-14 16:45       ` Paolo Bonzini
@ 2013-05-14 19:30         ` Wolfgang Richter
  2013-05-15  7:59         ` Kevin Wolf
  1 sibling, 0 replies; 27+ messages in thread
From: Wolfgang Richter @ 2013-05-14 19:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-devel, stefanha

On Tue, May 14, 2013 at 12:45 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> No, I'll just reuse the same hooks within block/mirror.c (almost... it
> looks like I need after_write too, not just before_write :( that's a
> pity).  Basically:
>
> 1) before the write, if there is space in the job's buffers, allocate a
> MirrorOp and a data buffer for the write.  Also record whether the block
> was dirty before;
>
> 2) after the write, do nothing if there was no room to allocate the data
> buffer.  Else clear the block from the dirty bitmap.  If the block was
> dirty, read the whole cluster from the source as in passive mirroring.
> If it wasn't, copy the data from guest memory to the preallocated buffer
> and write it to the destination;
>
> > If I knew more about what you
> > had in mind, I wouldn't mind trying to add this 'active' mode to
> > 'drive-mirror'
> > and test it with my use case.  I want to avoid duplicate work, so if you
> > want to implement it yourself I can defer this.
>
> Also the other way round.  If you want to give it a shot based on the
> above spec just tell me.


Talked with my group here as well.  I think I'd like to give it a shot based
on the above spec rather than refactor my code into a new command.  This way
it will hopefully reduce duplicated effort and provide extra testing for the
"active mirroring" code.

I'll take a pass through the mirror code to make sure I understand it better
than I currently do.

Would you like to coordinate off-list until we have a patch?

-- 
Wolf

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-14 16:45       ` Paolo Bonzini
  2013-05-14 19:30         ` Wolfgang Richter
@ 2013-05-15  7:59         ` Kevin Wolf
  2013-05-15  8:25           ` Paolo Bonzini
  1 sibling, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2013-05-15  7:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Wolfgang Richter, qemu-devel, stefanha

On 14.05.2013 at 18:45, Paolo Bonzini wrote:
> On 14/05/2013 17:48, Wolfgang Richter wrote:
> > On Tue, May 14, 2013 at 6:04 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> >     On 14/05/2013 10:50, Kevin Wolf wrote:
> >     > Or, to translate it into our existing terminology, drive-mirror
> >     > implements a passive mirror, you're proposing an active one (which we
> >     > do want to have).
> >     >
> >     > With an active mirror, we'll want to have another choice: The mirror can
> >     > be synchronous (guest writes only complete after the mirrored write has
> >     > completed) or asynchronous (completion is based only on the original
> >     > image). It should be easy enough to support both once an active mirror
> >     > exists.
> > 
> >     Right, I'm waiting for Stefan's block-backup to give me the "right"
> >     hooks for the active mirror.
> > 
> >     The bulk phase will always be passive, but an active-asynchronous mirror
> >     has some interesting properties and it makes sense to implement it.
> > 
> > 
> > Do you mean you'd model the 'active' mode after 'block-backup,' or actually
> > call functions provided by 'block-backup'?
> 
> No, I'll just reuse the same hooks within block/mirror.c (almost... it
> looks like I need after_write too, not just before_write :( that's a
> pity).

Makes me wonder if using a real BlockDriver for the filter from the
beginning wouldn't be better than accumulating more and more hooks and
having to find ways to pass data from 'before' to 'after' hooks...

> Basically:
> 
> 1) before the write, if there is space in the job's buffers, allocate a
> MirrorOp and a data buffer for the write.  Also record whether the block
> was dirty before;
> 
> 2) after the write, do nothing if there was no room to allocate the data
> buffer.  Else clear the block from the dirty bitmap.  If the block was
> dirty, read the whole cluster from the source as in passive mirroring.
> If it wasn't, copy the data from guest memory to the preallocated buffer
> and write it to the destination;

Does the "if there was no room" part mean that the mirror is active only
sometimes?

And why even bother with a dirty bitmap for an active mirror? The
background job that sequentially processes the whole image only needs a
counter, no bitmap.

At which point it looks like implementing it separate from mirror.c
could make more sense.
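
A minimal sequential sketch of the counter-only background job, assuming the
active part mirrors every single guest write (no "no room" fallback) and that
a background copy never races with a guest write to the same cluster.
CounterMirror, cm_guest_write and cm_step are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CM_CLUSTER   16
#define CM_NCLUSTERS 4

typedef struct {
    unsigned char source[CM_CLUSTER * CM_NCLUSTERS];
    unsigned char target[CM_CLUSTER * CM_NCLUSTERS];
    size_t next_cluster;      /* the only progress state: a counter */
} CounterMirror;

/* Active part: assumed to mirror every write immediately. */
static void cm_guest_write(CounterMirror *m, size_t offset, unsigned char byte)
{
    assert(offset < sizeof(m->source));
    m->source[offset] = byte;
    m->target[offset] = byte;
}

/* Background part: copy the next cluster; false once the image is done. */
static bool cm_step(CounterMirror *m)
{
    if (m->next_cluster == CM_NCLUSTERS) {
        return false;
    }
    size_t start = m->next_cluster * CM_CLUSTER;
    memcpy(m->target + start, m->source + start, CM_CLUSTER);
    m->next_cluster++;
    return true;
}
```

Writes behind the counter are covered by the active part and writes ahead of
it by the sequential copy, so no bitmap is needed under these assumptions; the
counter alone does not say which clusters differ if the job is interrupted.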

Kevin

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-15  7:59         ` Kevin Wolf
@ 2013-05-15  8:25           ` Paolo Bonzini
  2013-05-15  8:53             ` Kevin Wolf
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2013-05-15  8:25 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Wolfgang Richter, qemu-devel, stefanha

On 15/05/2013 09:59, Kevin Wolf wrote:
>>> Do you mean you'd model the 'active' mode after 'block-backup,' or actually
>>> call functions provided by 'block-backup'?
>>
>> No, I'll just reuse the same hooks within block/mirror.c (almost... it
>> looks like I need after_write too, not just before_write :( that's a
>> pity).
> 
> Makes me wonder if using a real BlockDriver for the filter from the
> beginning wouldn't be better than accumulating more and more hooks and
> having to find ways to pass data from 'before' to 'after' hooks...

We don't need a way to pass data from before to after hooks, a simple
scan of a linked list will do.

>> Basically:
>>
>> 1) before the write, if there is space in the job's buffers, allocate a
>> MirrorOp and a data buffer for the write.  Also record whether the block
>> was dirty before;
>>
>> 2) after the write, do nothing if there was no room to allocate the data
>> buffer.  Else clear the block from the dirty bitmap.  If the block was
>> dirty, read the whole cluster from the source as in passive mirroring.
>> If it wasn't, copy the data from guest memory to the preallocated buffer
>> and write it to the destination;
> 
> Does the "if there was no room" part mean that the mirror is active only
> sometimes?

Yes, otherwise the guest can allocate arbitrary amounts of memory in the
host just by starting a few very large I/O operations.

> And why even bother with a dirty bitmap for an active mirror? The
> background job that sequentially processes the whole image only needs a
> counter, no bitmap.

That's not enough for the case when the host crashes and you have to
restart the mirroring or complete it offline.

Paolo

> At which point it looks like implementing it separate from mirror.c
> could make more sense.
> 
> Kevin
> 

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-15  8:25           ` Paolo Bonzini
@ 2013-05-15  8:53             ` Kevin Wolf
  2013-05-15  9:16               ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2013-05-15  8:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Wolfgang Richter, qemu-devel, stefanha

On 15.05.2013 at 10:25, Paolo Bonzini wrote:
> On 15/05/2013 09:59, Kevin Wolf wrote:
> >>> Do you mean you'd model the 'active' mode after 'block-backup,' or actually
> >>> call functions provided by 'block-backup'?
> >>
> >> No, I'll just reuse the same hooks within block/mirror.c (almost... it
> >> looks like I need after_write too, not just before_write :( that's a
> >> pity).
> > 
> > Makes me wonder if using a real BlockDriver for the filter from the
> > beginning wouldn't be better than accumulating more and more hooks and
> > having to find ways to pass data from 'before' to 'after' hooks...
> 
> We don't need a way to pass data from before to after hooks, a simple
> scan of a linked list will do.

So in this case the linked list is the way.

> >> Basically:
> >>
> >> 1) before the write, if there is space in the job's buffers, allocate a
> >> MirrorOp and a data buffer for the write.  Also record whether the block
> >> was dirty before;
> >>
> >> 2) after the write, do nothing if there was no room to allocate the data
> >> buffer.  Else clear the block from the dirty bitmap.  If the block was
> >> dirty, read the whole cluster from the source as in passive mirroring.
> >> If it wasn't, copy the data from guest memory to the preallocated buffer
> >> and write it to the destination;
> > 
> > Does the "if there was no room" part mean that the mirror is active only
> > sometimes?
> 
> Yes, otherwise the guest can allocate arbitrary amounts of memory in the
> host just by starting a few very large I/O operations.

I think I would rather throttle I/O in this case, i.e. requests wait
until they can get the space. At least for a synchronous mirror we
have to do something like this.
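
A minimal sketch of the throttling alternative: a request waits until its data
fits in the job's buffer instead of skipping the active path.  POSIX threads
stand in here for QEMU's coroutines, and BufferBudget with its functions is a
hypothetical name; requests larger than the whole buffer are assumed to be
split beforehand.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t space_freed;
    size_t capacity;
    size_t free_bytes;
} BufferBudget;

static void budget_init(BufferBudget *b, size_t capacity)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->space_freed, NULL);
    b->capacity = capacity;
    b->free_bytes = capacity;
}

/* Block the request until its data fits, rather than skipping it. */
static void budget_acquire(BufferBudget *b, size_t len)
{
    assert(len <= b->capacity);   /* a larger request could never be satisfied */
    pthread_mutex_lock(&b->lock);
    while (b->free_bytes < len) {
        pthread_cond_wait(&b->space_freed, &b->lock);
    }
    b->free_bytes -= len;
    pthread_mutex_unlock(&b->lock);
}

/* Called when a mirrored write completes and its buffer is freed. */
static void budget_release(BufferBudget *b, size_t len)
{
    pthread_mutex_lock(&b->lock);
    b->free_bytes += len;
    assert(b->free_bytes <= b->capacity);
    pthread_cond_broadcast(&b->space_freed);
    pthread_mutex_unlock(&b->lock);
}
```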

> > And why even bother with a dirty bitmap for an active mirror? The
> > background job that sequentially processes the whole image only needs a
> > counter, no bitmap.
> 
> That's not enough for the case when the host crashes and you have to
> restart the mirroring or complete it offline.

You're thinking of a persistent bitmap here? Makes sense then, I didn't
think about that.

Kevin

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-15  8:53             ` Kevin Wolf
@ 2013-05-15  9:16               ` Paolo Bonzini
  2013-05-15  9:46                 ` Kevin Wolf
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2013-05-15  9:16 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Wolfgang Richter, qemu-devel, stefanha


> > We don't need a way to pass data from before to after hooks, a simple
> > scan of a linked list will do.
> 
> So in this case the linked list is the way.

Point taken. :)

> > > Does the "if there was no room" part mean that the mirror is active only
> > > sometimes?
> > 
> > Yes, otherwise the guest can allocate arbitrary amounts of memory in the
> > host just by starting a few very large I/O operations.
> 
> I think I would rather throttle I/O in this case, i.e. requests wait
> until they can get the space. At least for a synchronous mirror we
> have to do something like this.

Yes, but this is still asynchronous.  The active part is just an optimization
to avoid write amplification (where small random writes require I/O of an entire
block as big as the bitmap granularity).
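
The amplification is simple arithmetic: every granule a write touches is
copied in full.  A small helper, with example values assumed for illustration
(a 64 KiB granularity and 4 KiB guest writes, not figures from this thread):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes a granularity-based copy transfers to mirror one guest write:
 * every granule the write touches is copied in full. */
static uint64_t granule_bytes_copied(uint64_t offset, uint64_t len, uint64_t granularity)
{
    assert(len > 0 && granularity > 0);
    uint64_t first = offset / granularity;
    uint64_t last = (offset + len - 1) / granularity;
    return (last - first + 1) * granularity;
}
```

With these values, an aligned 4 KiB write costs 64 KiB of copying (16x), and
a 4 KiB write at offset 62 KiB straddles a granule boundary and costs 128 KiB.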

> > > And why even bother with a dirty bitmap for an active mirror? The
> > > background job that sequentially processes the whole image only needs a
> > > counter, no bitmap.
> > 
> > That's not enough for the case when the host crashes and you have to
> > restart the mirroring or complete it offline.
> 
> You're thinking of a persistent bitmap here? Makes sense then, I didn't
> think about that.

Yes.

Paolo

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-15  9:16               ` Paolo Bonzini
@ 2013-05-15  9:46                 ` Kevin Wolf
  2013-05-15 11:54                   ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2013-05-15  9:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Wolfgang Richter, qemu-devel, stefanha

On 15.05.2013 at 11:16, Paolo Bonzini wrote:
> > > > Does the "if there was no room" part mean that the mirror is active only
> > > > sometimes?
> > > 
> > > Yes, otherwise the guest can allocate arbitrary amounts of memory in the
> > > host just by starting a few very large I/O operations.

On second thought, can't you do zero copy anyway for full cluster
writes? This means that at most two clusters per request must be
allocated, no matter how large it is, and you can probably reuse the
same one-cluster buffer for both.
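
A small sketch of why at most two clusters need buffering: a request splits
into an unaligned head, some whole clusters and an unaligned tail.  WriteSplit,
split_write and bounce_buffers_needed are hypothetical names; this only
computes the split and says nothing about when zero copy is safe.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t head_len;       /* bytes in a partial first cluster (0 if none) */
    size_t full_clusters;  /* whole clusters that could be mirrored zero-copy */
    size_t tail_len;       /* bytes in a partial last cluster (0 if none) */
} WriteSplit;

static WriteSplit split_write(size_t offset, size_t len, size_t cluster)
{
    assert(len > 0 && cluster > 0);
    size_t end = offset + len;
    size_t first_full = (offset + cluster - 1) / cluster * cluster;  /* round up */
    size_t last_full = end / cluster * cluster;                      /* round down */
    WriteSplit s = { 0, 0, 0 };

    if (first_full > last_full) {
        s.head_len = len;          /* request lies inside a single cluster */
        return s;
    }
    s.head_len = first_full - offset;
    s.full_clusters = (last_full - first_full) / cluster;
    s.tail_len = end - last_full;
    return s;
}

/* Only partial clusters need a bounce buffer: at most two per request. */
static int bounce_buffers_needed(WriteSplit s)
{
    return (s.head_len > 0) + (s.tail_len > 0);
}
```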

> > I think I would rather throttle I/O in this case, i.e. requests wait
> > until they can get the space. At least for a synchronous mirror we
> > have to do something like this.
> 
> Yes, but this is still asynchronous.  The active part is just an optimization
> to avoid write amplification (where small random writes require I/O of an entire
> block as big as the bitmap granularity).

Yes, that sounds like a good use case.

But does this really cover all use cases a real synchronous active
mirror would provide? I understood that Wolf wants to get every single
guest request exposed e.g. on an NBD connection.

Kevin

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-15  9:46                 ` Kevin Wolf
@ 2013-05-15 11:54                   ` Paolo Bonzini
  2013-05-22 15:46                     ` Wolfgang Richter
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2013-05-15 11:54 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Wolfgang Richter, qemu-devel, stefanha

On 15/05/2013 11:46, Kevin Wolf wrote:
> On 15.05.2013 at 11:16, Paolo Bonzini wrote:
>>>>> Does the "if there was no room" part mean that the mirror is active only
>>>>> sometimes?
>>>>
>>>> Yes, otherwise the guest can allocate arbitrary amounts of memory in the
>>>> host just by starting a few very large I/O operations.
> 
> On second thought, can't you do zero copy anyway for full cluster
> writes? This means that at most two clusters per request must be
> allocated, no matter how large it is, and you can probably reuse the
> same one-cluster buffer for both.

Only for a synchronous mirror.  For an asynchronous mirror, there's no
guarantee that the mirror finishes writing before the source.  When it
doesn't, the guest can touch the memory and the mirror diverges from the
source.
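The following toy Python model illustrates the hazard. It is not QEMU code, and `simulate_async_mirror` is a hypothetical name. Suppose the queued mirror write only references the guest's buffer and the source write completes first. The guest may then reuse that memory before the mirror write runs.

```python
def simulate_async_mirror(zero_copy: bool):
    """Return (source_data, mirror_data) after the guest reuses its buffer."""
    guest_buf = bytearray(b"AAAA")  # data the guest submitted for writing
    source = bytes(guest_buf)       # the source write completes first
    # The mirror write is still queued: it holds either a view of the guest
    # buffer (zero copy) or a private copy of its contents.
    pending = memoryview(guest_buf) if zero_copy else bytes(guest_buf)
    # In an asynchronous mirror the request completes with the source write,
    # so the guest may now reuse the buffer.
    guest_buf[:] = b"BBBB"
    mirror = bytes(pending)         # the queued mirror write finally runs
    return source, mirror


print(simulate_async_mirror(zero_copy=True))   # prints: (b'AAAA', b'BBBB')
print(simulate_async_mirror(zero_copy=False))  # prints: (b'AAAA', b'AAAA')
```

The private copy keeps the mirror consistent. However, it costs memory for every in-flight request, so the space set aside for those copies has to be bounded.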

>>> I think I would rather throttle I/O in this case, i.e. requests wait
>>> until they can get the space. At least for a synchronous mirror we
>>> have to do something like this.
>>
>> Yes, but this is still asynchronous.  The active part is just an optimization
>> to avoid write amplification (where small random writes require I/O of an entire
>> block as big as the bitmap granularity).
> 
> Yes, that sounds like a good use case.
> 
> But does this really cover all use cases a real synchronous active
> mirror would provide? I understood that Wolf wants to get every single
> guest request exposed e.g. on an NBD connection.

He can use throttling to limit the guest's I/O speed to the size of the
asynchronous mirror's buffer.
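One way to picture this throttling is as a byte budget for the asynchronous mirror's private copies. A request waits until its data fits, and the space is returned when the mirror write completes. This is a hypothetical Python sketch built on `threading.Condition`. It is not how QEMU's block layer implements throttling, and the class name is made up.

```python
import threading


class MirrorBufferBudget:
    """Byte budget for an asynchronous mirror's private copies of guest data.

    A request reserves space before its data is copied into the mirror's
    buffer and releases it when the mirror write completes.  When the budget
    is exhausted, reserve() blocks, so the guest is throttled to whatever
    rate the mirror target can absorb.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0
        self._cond = threading.Condition()

    def reserve(self, nbytes: int, timeout=None) -> bool:
        """Wait until nbytes fit; return False if the timeout expires first."""
        if nbytes > self.capacity:
            raise ValueError("request larger than the whole buffer")
        with self._cond:
            ok = self._cond.wait_for(
                lambda: self.used + nbytes <= self.capacity, timeout)
            if ok:
                self.used += nbytes
            return ok

    def release(self, nbytes: int) -> None:
        """Called when a mirror write finishes; wakes up waiting requests."""
        with self._cond:
            self.used -= nbytes
            self._cond.notify_all()


budget = MirrorBufferBudget(capacity=8)
budget.reserve(6)                               # first request fits
print(budget.reserve(4, timeout=0.05))          # prints: False (would wait)
threading.Timer(0.05, budget.release, args=(6,)).start()  # mirror write done
print(budget.reserve(4, timeout=2))             # prints: True once space frees
```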

Paolo

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-13 21:21 [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection Wolfgang Richter
  2013-05-14  8:40 ` Stefan Hajnoczi
  2013-05-14  8:50 ` Kevin Wolf
@ 2013-05-16 13:44 ` Richard W.M. Jones
  2013-05-22 15:51   ` Wolfgang Richter
  2 siblings, 1 reply; 27+ messages in thread
From: Richard W.M. Jones @ 2013-05-16 13:44 UTC
  To: Wolfgang Richter; +Cc: Paolo Bonzini, qemu-devel, stefanha

[...]

From my point of view, what I'm missing here is how I would use it.

Ideally I'd like to issue some QMP commands which would set up the
point-in-time snapshot, and then connect to this snapshot over (eg)
NBD, then when I'm done, send some more QMP commands to tear down the
snapshot.

I think this document would be better with one or more examples
showing how this would be used.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-15 11:54                   ` Paolo Bonzini
@ 2013-05-22 15:46                     ` Wolfgang Richter
  0 siblings, 0 replies; 27+ messages in thread
From: Wolfgang Richter @ 2013-05-22 15:46 UTC
  To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-devel, stefanha

On Wed, May 15, 2013 at 7:54 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> > But does this really cover all use cases a real synchronous active
> > mirror would provide? I understood that Wolf wants to get every single
> > guest request exposed e.g. on an NBD connection.
>
> He can use throttling to limit the guest's I/O speed to the size of the
> asynchronous mirror's buffer.
>

Throttling is fine for me, and actually what I do today (this is the
highest source of overhead for a system that wants to see everything),
just with the tracing framework.

-- 
Wolf

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-16 13:44 ` Richard W.M. Jones
@ 2013-05-22 15:51   ` Wolfgang Richter
  2013-05-22 16:11     ` Paolo Bonzini
  2013-05-22 16:42     ` Richard W.M. Jones
  0 siblings, 2 replies; 27+ messages in thread
From: Wolfgang Richter @ 2013-05-22 15:51 UTC
  To: Richard W.M. Jones; +Cc: Paolo Bonzini, qemu-devel, stefanha

On Thu, May 16, 2013 at 9:44 AM, Richard W.M. Jones <rjones@redhat.com> wrote:

> Ideally I'd like to issue some QMP commands which would set up the
> point-in-time snapshot, and then connect to this snapshot over (eg)
> NBD, then when I'm done, send some more QMP commands to tear down the
> snapshot.
>

This is actually interesting.  Does the QEMU nbd server support multiple
readers?

Essentially, if you're RWMJ (not me), and you're keeping a full mirror,
it's clear that the mirror write stream goes to an nbd server, but is it
possible to attach a reader to that same nbd server and read things back
(read-only)?  I know it's possible to name the volumes you attach to, so
I think conceptually with the nbd protocol this should work.

> I think this document would be better with one or more examples
> showing how this would be used.
>

I think the thread now has me looking at making the mirror command
'active' :-) rather than adding a new QMP command.

-- 
Wolf

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-22 15:51   ` Wolfgang Richter
@ 2013-05-22 16:11     ` Paolo Bonzini
  2013-05-22 16:29       ` Wolfgang Richter
  2013-05-22 16:42     ` Richard W.M. Jones
  1 sibling, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2013-05-22 16:11 UTC
  To: Wolfgang Richter; +Cc: Richard W.M. Jones, stefanha, qemu-devel

Il 22/05/2013 17:51, Wolfgang Richter ha scritto:
> On Thu, May 16, 2013 at 9:44 AM, Richard W.M. Jones <rjones@redhat.com> wrote:
> 
>     Ideally I'd like to issue some QMP commands which would set up the
>     point-in-time snapshot, and then connect to this snapshot over (eg)
>     NBD, then when I'm done, send some more QMP commands to tear down the
>     snapshot.
> 
> 
> This is actually interesting.  Does the QEMU nbd server support multiple
> readers?

Yes.

> Essentially, if you're RWMJ (not me), and you're keeping a full
> mirror, it's clear that the mirror write stream goes to an nbd server,
> but is it possible to attach a reader to that same nbd server and read
> things back (read-only)?

Yes, it can be done with both qemu-nbd and the QEMU nbd server commands.

Paolo

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-22 16:11     ` Paolo Bonzini
@ 2013-05-22 16:29       ` Wolfgang Richter
  0 siblings, 0 replies; 27+ messages in thread
From: Wolfgang Richter @ 2013-05-22 16:29 UTC
  To: Paolo Bonzini; +Cc: Richard W.M. Jones, stefanha, qemu-devel

On Wed, May 22, 2013 at 12:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> > Essentially, if you're RWMJ (not me), and you're keeping a full
> > mirror, it's clear that the mirror write stream goes to an nbd server,
> > but is it possible to attach a reader to that same nbd server and read
> > things back (read-only)?
>
> Yes, it can be done with both qemu-nbd and the QEMU nbd server commands.


Then this means that, if there was an active mirror (or a snapshot being
created), it would be easy to attach an nbd client as a reader to it even
as it is being synchronized (perhaps dangerous?).

-- 
Wolf

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-22 15:51   ` Wolfgang Richter
  2013-05-22 16:11     ` Paolo Bonzini
@ 2013-05-22 16:42     ` Richard W.M. Jones
  2013-05-22 18:32       ` Wolfgang Richter
  1 sibling, 1 reply; 27+ messages in thread
From: Richard W.M. Jones @ 2013-05-22 16:42 UTC
  To: Wolfgang Richter; +Cc: Paolo Bonzini, qemu-devel, stefanha

On Wed, May 22, 2013 at 11:51:16AM -0400, Wolfgang Richter wrote:
> This is actually interesting.  Does the QEMU nbd server support multiple
> readers?

Yes.  qemu-nbd has a -e/--shared=<N> option which appears to do
exactly what it says in the man page.

$ guestfish -N fs exit
$ ls -lh test1.img 
-rw-rw-r--. 1 rjones rjones 100M May 22 17:37 test1.img
$ qemu-nbd -e 3 -r -t test1.img

>From another shell:

$ guestfish --format=raw -a nbd://localhost

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

><fs> run
><fs> list-filesystems 
/dev/sda1: ext2

Run up to two extra guestfish instances, with the same result.  The
fourth guestfish instance hangs at the 'run' command until one of the
first three is told to exit.

Rich.

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-22 16:42     ` Richard W.M. Jones
@ 2013-05-22 18:32       ` Wolfgang Richter
  2013-05-22 19:26         ` Richard W.M. Jones
  0 siblings, 1 reply; 27+ messages in thread
From: Wolfgang Richter @ 2013-05-22 18:32 UTC
  To: Richard W.M. Jones; +Cc: Paolo Bonzini, qemu-devel, stefanha

On Wed, May 22, 2013 at 12:42 PM, Richard W.M. Jones <rjones@redhat.com> wrote:

> Run up to two extra guestfish instances, with the same result.  The
> fourth guestfish instance hangs at the 'run' command until one of the
> first three is told to exit.


Are you interested in being notified when a snapshot is "safe" to read
from?  Or is it valuable to try reading immediately?

-- 
Wolf

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-22 18:32       ` Wolfgang Richter
@ 2013-05-22 19:26         ` Richard W.M. Jones
  2013-05-22 19:38           ` Wolfgang Richter
  0 siblings, 1 reply; 27+ messages in thread
From: Richard W.M. Jones @ 2013-05-22 19:26 UTC
  To: Wolfgang Richter; +Cc: Paolo Bonzini, qemu-devel, stefanha

On Wed, May 22, 2013 at 02:32:37PM -0400, Wolfgang Richter wrote:
> On Wed, May 22, 2013 at 12:42 PM, Richard W.M. Jones <rjones@redhat.com> wrote:
> 
> > Run up to two extra guestfish instances, with the same result.  The
> > fourth guestfish instance hangs at the 'run' command until one of the
> > first three is told to exit.
> 
> 
> Are you interested in being notified when a snapshot is "safe" to read
> from?  Or is it valuable to try reading immediately?

I'm not sure I understand the question.

I assumed (maybe wrongly) that if we had an NBD address (ie. Unix
socket or IP:port) then we'd just connect to that and go.

Rich.

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-22 19:26         ` Richard W.M. Jones
@ 2013-05-22 19:38           ` Wolfgang Richter
  2013-05-22 20:47             ` Richard W.M. Jones
  0 siblings, 1 reply; 27+ messages in thread
From: Wolfgang Richter @ 2013-05-22 19:38 UTC
  To: Richard W.M. Jones; +Cc: Paolo Bonzini, qemu-devel, stefanha

On Wed, May 22, 2013 at 3:26 PM, Richard W.M. Jones <rjones@redhat.com> wrote:

> On Wed, May 22, 2013 at 02:32:37PM -0400, Wolfgang Richter wrote:
> > On Wed, May 22, 2013 at 12:42 PM, Richard W.M. Jones <rjones@redhat.com> wrote:
> >
> > > Run up to two extra guestfish instances, with the same result.  The
> > > fourth guestfish instance hangs at the 'run' command until one of the
> > > first three is told to exit.
> >
> >
> > Are you interested in being notified when a snapshot is "safe" to read
> > from?  Or is it valuable to try reading immediately?
>
> I'm not sure I understand the question.
>
> I assumed (maybe wrongly) that if we had an NBD address (ie. Unix
> socket or IP:port) then we'd just connect to that and go.


I meant: is there interest in reading from a disk that isn't fully
synchronized (yet) to the original disk (it might have old blocks)?  Or
would you only want to connect once a (complete) snapshot is available
(synchronized completely to some point in time)?

-- 
Wolf

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-22 19:38           ` Wolfgang Richter
@ 2013-05-22 20:47             ` Richard W.M. Jones
  2013-05-22 21:46               ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Richard W.M. Jones @ 2013-05-22 20:47 UTC
  To: Wolfgang Richter; +Cc: Paolo Bonzini, qemu-devel, stefanha

On Wed, May 22, 2013 at 03:38:33PM -0400, Wolfgang Richter wrote:
> On Wed, May 22, 2013 at 3:26 PM, Richard W.M. Jones <rjones@redhat.com> wrote:
> 
> > On Wed, May 22, 2013 at 02:32:37PM -0400, Wolfgang Richter wrote:
> > > On Wed, May 22, 2013 at 12:42 PM, Richard W.M. Jones <rjones@redhat.com> wrote:
> > >
> > > > Run up to two extra guestfish instances, with the same result.  The
> > > > fourth guestfish instance hangs at the 'run' command until one of the
> > > > first three is told to exit.
> > >
> > >
> > > Are you interested in being notified when a snapshot is "safe" to read
> > > from?  Or is it valuable to try reading immediately?
> >
> > I'm not sure I understand the question.
> >
> > I assumed (maybe wrongly) that if we had an NBD address (ie. Unix
> > socket or IP:port) then we'd just connect to that and go.
> 
> 
> I meant: is there interest in reading from a disk that isn't fully
> synchronized (yet) to the original disk (it might have old blocks)?  Or
> would you only want to connect once a (complete) snapshot is available
> (synchronized completely to some point in time)?

IIUC a disk which wasn't fully synchronized wouldn't necessarily be
interpretable by libguestfs, so I guess we would need the complete
snapshot.

Rich.

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-22 20:47             ` Richard W.M. Jones
@ 2013-05-22 21:46               ` Paolo Bonzini
  2013-05-23  7:50                 ` Stefan Hajnoczi
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2013-05-22 21:46 UTC
  To: Richard W.M. Jones; +Cc: Wolfgang Richter, qemu-devel, stefanha

Il 22/05/2013 22:47, Richard W.M. Jones ha scritto:
>> > 
>> > I meant: is there interest in reading from a disk that isn't fully
>> > synchronized (yet) to the original disk (it might have old blocks)?  Or
>> > would you only want to connect once a (complete) snapshot is available
>> > (synchronized completely to some point in time)?
> IIUC a disk which wasn't fully synchronized wouldn't necessarily be
> interpretable by libguestfs, so I guess we would need the complete
> snapshot.

In the case of point-in-time backups (Stefan's block-backup) the plan is
to have the snapshot complete from the beginning.

Paolo

* Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
  2013-05-22 21:46               ` Paolo Bonzini
@ 2013-05-23  7:50                 ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2013-05-23  7:50 UTC
  To: Paolo Bonzini; +Cc: Wolfgang Richter, Richard W.M. Jones, stefanha, qemu-devel

On Wed, May 22, 2013 at 11:46:15PM +0200, Paolo Bonzini wrote:
> Il 22/05/2013 22:47, Richard W.M. Jones ha scritto:
> >> > 
> >> > I meant: is there interest in reading from a disk that isn't fully
> >> > synchronized (yet) to the original disk (it might have old blocks)?  Or
> >> > would you only want to connect once a (complete) snapshot is available
> >> > (synchronized completely to some point in time)?
> > IIUC a disk which wasn't fully synchronized wouldn't necessarily be
> > interpretable by libguestfs, so I guess we would need the complete
> > snapshot.
> 
> In the case of point-in-time backups (Stefan's block-backup) the plan is
> to have the snapshot complete from the beginning.

The way it will work is that the drive-backup target is a qcow2 image
with the guest's disk as its backing file.  When the guest writes to the
disk, drive-backup copies the original data to the qcow2 image.

The qcow2 image is exported over NBD so a client can connect to access
the read-only point-in-time snapshot.  It is not necessary to populate
the qcow2 file since it uses the guest disk as its backing file - all
reads to unpopulated clusters go to the backing file.
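The copy-before-write scheme described above can be modelled in a few lines of Python. This is a toy sketch under simplifying assumptions. An in-memory `bytearray` stands in for the guest disk, and a dict stands in for the qcow2 target. I/O is synchronous, and all names are hypothetical. It is not the drive-backup implementation.

```python
class PointInTimeSnapshot:
    """Toy copy-before-write snapshot over an in-memory guest disk.

    `disk` stands in for the live guest image; `overlay` stands in for the
    qcow2 backup target, which only holds clusters copied before the guest
    overwrote them.  Everything else is read from the live disk, playing the
    role of the qcow2 backing file.
    """

    def __init__(self, disk: bytearray, cluster_size: int):
        self.disk = disk
        self.cluster_size = cluster_size
        self.overlay = {}  # cluster index -> original cluster contents

    def _read_disk_cluster(self, idx: int) -> bytes:
        start = idx * self.cluster_size
        return bytes(self.disk[start:start + self.cluster_size])

    def guest_write(self, offset: int, data: bytes) -> None:
        """Save not-yet-copied original clusters, then apply the guest write."""
        if not data:
            return
        first = offset // self.cluster_size
        last = (offset + len(data) - 1) // self.cluster_size
        for idx in range(first, last + 1):
            if idx not in self.overlay:
                self.overlay[idx] = self._read_disk_cluster(idx)
        self.disk[offset:offset + len(data)] = data

    def snapshot_read(self, idx: int) -> bytes:
        """Populated clusters come from the overlay, the rest from the backing disk."""
        if idx in self.overlay:
            return self.overlay[idx]
        return self._read_disk_cluster(idx)


disk = bytearray(b"aaaabbbbcccc")
snap = PointInTimeSnapshot(disk, cluster_size=4)
snap.guest_write(2, b"XXXX")                       # touches clusters 0 and 1
print(disk)                                        # prints: bytearray(b'aaXXXXbbcccc')
print([snap.snapshot_read(i) for i in range(3)])   # prints: [b'aaaa', b'bbbb', b'cccc']
```

Unpopulated clusters fall through to the live disk. As a result, the snapshot is readable as soon as it is created, which is the "complete from the beginning" property Paolo mentions.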

Stefan

end of thread, other threads:[~2013-05-23  7:50 UTC | newest]

Thread overview: 27+ messages
2013-05-13 21:21 [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection Wolfgang Richter
2013-05-14  8:40 ` Stefan Hajnoczi
2013-05-14 15:42   ` Wolfgang Richter
2013-05-14  8:50 ` Kevin Wolf
2013-05-14 10:04   ` Paolo Bonzini
2013-05-14 15:48     ` Wolfgang Richter
2013-05-14 16:45       ` Paolo Bonzini
2013-05-14 19:30         ` Wolfgang Richter
2013-05-15  7:59         ` Kevin Wolf
2013-05-15  8:25           ` Paolo Bonzini
2013-05-15  8:53             ` Kevin Wolf
2013-05-15  9:16               ` Paolo Bonzini
2013-05-15  9:46                 ` Kevin Wolf
2013-05-15 11:54                   ` Paolo Bonzini
2013-05-22 15:46                     ` Wolfgang Richter
2013-05-14 15:45   ` Wolfgang Richter
2013-05-16 13:44 ` Richard W.M. Jones
2013-05-22 15:51   ` Wolfgang Richter
2013-05-22 16:11     ` Paolo Bonzini
2013-05-22 16:29       ` Wolfgang Richter
2013-05-22 16:42     ` Richard W.M. Jones
2013-05-22 18:32       ` Wolfgang Richter
2013-05-22 19:26         ` Richard W.M. Jones
2013-05-22 19:38           ` Wolfgang Richter
2013-05-22 20:47             ` Richard W.M. Jones
2013-05-22 21:46               ` Paolo Bonzini
2013-05-23  7:50                 ` Stefan Hajnoczi