Re: [Qemu-devel] [Nbd] [PATCH v2] doc: Add NBD_CMD_BLOCK_STATUS extension

qemu-devel.nongnu.org archive mirror

From: "Denis V. Lunev" <den@openvz.org>
To: Eric Blake <eblake@redhat.com>, Alex Bligh <alex@alex.org.uk>
Cc: "nbd-general@lists.sourceforge.net"
	<nbd-general@lists.sourceforge.net>,
	Kevin Wolf <kwolf@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Pavel Borzenkov <pborzenkov@virtuozzo.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Wouter Verhelst <w@uter.be>, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [Nbd] [PATCH v2] doc: Add NBD_CMD_BLOCK_STATUS extension
Date: Mon, 4 Apr 2016 22:54:02 +0300
Message-ID: <5702C65A.7040101@openvz.org>
In-Reply-To: <5702C1AB.8020601@redhat.com>

On 04/04/2016 10:34 PM, Eric Blake wrote:
> On 04/04/2016 12:06 PM, Alex Bligh wrote:
>> On 4 Apr 2016, at 17:39, Eric Blake <eblake@redhat.com> wrote:
>>
>>> +    This command is meant to operate in tandem with other (non-NBD)
>>> +    channels to the server.  Generally, a "dirty" block is a block
>>> +    that has been written to by someone, but the exact meaning of "has
>>> +    been written" is left to the implementation.  For example, a
>>> +    virtual machine monitor could provide a (non-NBD) command to start
>>> +    tracking blocks written by the virtual machine.  A backup client
>>> +    can then connect to an NBD server provided by the virtual machine
>>> +    monitor and use `NBD_CMD_BLOCK_STATUS` with the
>>> +    `NBD_FLAG_STATUS_DIRTY` bit set in order to read only the dirty
>>> +    blocks that the virtual machine has changed.
>>> +
>>> +    An implementation that doesn't track the "dirtiness" state of
>>> +    blocks MUST either fail this command with `EINVAL`, or mark all
>>> +    blocks as dirty in the descriptor that it returns.  Upon receiving
>>> +    an `NBD_CMD_BLOCK_STATUS` command with the flag
>>> +    `NBD_FLAG_STATUS_DIRTY` set, the server MUST return the dirtiness
>>> +    status of the device, where the status field of each descriptor is
>>> +    determined by the following bit:
>>> +
>>> +      - `NBD_STATE_CLEAN` (bit 2); if set, the block represents a
>>> +        portion of the file that is still clean because it has not
>>> +        been written; if clear, the block represents a portion of the
>>> +        file that is dirty, or where the server could not otherwise
>>> +        determine its status.
>> A couple of questions:
>>
>> 1. I am not sure that the block dirtiness and the zero/allocation/hole thing
>>     always have the same natural blocksize. It's pretty easy to imagine
>>     a server whose natural blocksize is a disk sector (and can therefore
>>     report presence of zeroes to that resolution) but where 'dirtiness'
>>     was maintained independently at a less fine-grained level. Maybe
>>     that suggests 2 commands would be useful.
> In fact, qemu does just that with qcow2 images - the user can request a
> dirtiness granularity that is much larger than cluster granularity
> (where clusters are the current limitation on reporting holes, but where
> Kevin Wolf has an idea about a potential qcow2 extension that would even
> let us report holes at a sector granularity).
>
> Nothing requires the two uses to report at the same granularity.  The
> NBD_REPLY_TYPE_BLOCK_STATUS allows the server to divide into descriptors
> as it sees fit (so it could report holes at a 4k granularity, but
> dirtiness only at a 64k granularity) - all that matters is that when all
> the descriptors have been sent, they total up to the length of the
> original client request.  So by itself, granularity does not require
> another command.
Exactly!
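Eric's point about descriptors can be made concrete with a minimal sketch. It assumes a server that tracks dirtiness in a bitmap with one bit per 64 KiB chunk (the 64 KiB figure is only the example granularity mentioned above). Descriptors are modeled as hypothetical `(length, status)` tuples, not any particular wire encoding; only the `NBD_STATE_CLEAN` bit (bit 2) comes from the proposed text. The function name `dirty_descriptors` is illustrative. Adjacent chunks in the same state are merged, the first and last descriptors are clipped to the request, and a server that does not track dirtiness returns a single all-dirty descriptor.

```python
NBD_STATE_CLEAN = 1 << 2  # bit 2, as in the proposed spec text


def dirty_descriptors(offset, length, dirty_bits, granularity=64 * 1024):
    """Return hypothetical (length, status) descriptors for [offset, offset+length).

    dirty_bits: sequence of bools, one per `granularity`-sized chunk,
    or None if the server does not track dirtiness at all.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if dirty_bits is None:
        # No tracking: report the whole request as dirty (CLEAN bit clear).
        return [(length, 0)]
    descriptors = []
    pos, end = offset, offset + length
    while pos < end:
        chunk = pos // granularity
        # Clip the chunk to the end of the request.
        chunk_end = min((chunk + 1) * granularity, end)
        status = 0 if dirty_bits[chunk] else NBD_STATE_CLEAN
        run = chunk_end - pos
        if descriptors and descriptors[-1][1] == status:
            # Same state as the previous descriptor: extend it.
            prev_len, _ = descriptors[-1]
            descriptors[-1] = (prev_len + run, status)
        else:
            descriptors.append((run, status))
        pos = chunk_end
    return descriptors
```

For a request starting at 4096 and ending at 204800, with only the second and third 64 KiB chunks dirty, this yields a clipped clean descriptor, one merged dirty descriptor, and a clipped clean tail; their lengths add up to the request length regardless of the 64 KiB tracking granularity.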


>> 2. Given the communication is out of band, how is it realistically
>>     possible to sync this backup? You'll ask for all the dirty blocks,
>>     but whilst the command is being executed (as well as immediately
>>     after the reply) further blocks may be dirtied. So your reply
>>     always overestimates what is clean (probably the wrong way around).
>>     Furthermore, the next time you do a 'backup', you don't know whether
>>     the blocks were dirty as they were dirty on the previous backup,
>>     or because they were dirty on this backup.
> You are correct that as a one-way operation, querying dirtiness is not
> very useful if there is not a way to mark something clean, or if
> something else can be dirtying things in parallel.  But that doesn't
> mean the command is not useful - if the NBD server is exporting a file
> as read-only, where nothing else can be dirtying it in parallel, then a
> single pass over the dirty information is sufficient to learn what
> portions of the file to copy out.
>
> At this point, I was just trying to rebase the proposal as originally
> made by Denis and Pavel; perhaps they will have more insight on how they
> envisioned using the command, or on whether we should try harder to make
> this more of a two-way protocol (where the client can tell the server
> when to mark something as clean, or when to start tracking whether
> something is dirty).
For now, and for QEMU, we want this to expose the accumulated dirtiness
of the block device, as collected by the server. Yes, this requires
external coordination. Maybe this COULD be part of the protocol,
but QEMU will not use that part of the protocol.
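On the client side, a single pass over such a reply is enough when nothing else writes to the export in parallel (the read-only case Eric describes). The following is a minimal sketch that assumes the same hypothetical `(length, status)` descriptor tuples. The function name `ranges_to_copy` is illustrative. It walks the descriptors from the request offset and returns the byte ranges whose `NBD_STATE_CLEAN` bit is clear, which are the ranges a backup client would read and copy. If the server fell back to reporting everything as dirty, the result degenerates into a full copy.

```python
NBD_STATE_CLEAN = 1 << 2  # bit 2, as in the proposed spec text


def ranges_to_copy(offset, length, descriptors):
    """Return (start, size) ranges whose CLEAN bit is clear.

    Raises ValueError if the descriptors do not add up to the request length.
    """
    ranges = []
    pos = offset
    for size, status in descriptors:
        if not status & NBD_STATE_CLEAN:
            # Dirty (or status unknown): merge with a directly preceding range.
            if ranges and ranges[-1][0] + ranges[-1][1] == pos:
                start, prev = ranges[-1]
                ranges[-1] = (start, prev + size)
            else:
                ranges.append((pos, size))
        pos += size
    if pos != offset + length:
        raise ValueError("descriptors do not cover the requested range")
    return ranges
```

Checking the total against the request length catches a malformed reply before the client relies on it to skip data.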

Speaking of dirtiness, we would soon run into the fact that there
can be several dirtiness states, tied to different lines of
incremental backups. This complexity is hidden inside QEMU, and it
would be very difficult to publish and reuse it.
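The following toy model shows why several dirtiness states appear. It is not QEMU's actual implementation. Each incremental-backup line owns its own dirty set, and every write marks the touched chunks dirty in all of them. Taking a backup on one line reports and then clears only that line's set, so the lines drift apart. The class name `DirtyTracker`, its methods, and the 64 KiB chunk size are hypothetical.

```python
class DirtyTracker:
    """Toy model: one dirty-chunk set per incremental-backup line."""

    def __init__(self, size, granularity=64 * 1024):
        self.granularity = granularity
        self.nchunks = (size + granularity - 1) // granularity
        self.bitmaps = {}  # backup-line name -> set of dirty chunk indexes

    def add_line(self, name):
        # A new line starts fully dirty, so its first backup is a full copy.
        self.bitmaps[name] = set(range(self.nchunks))

    def write(self, offset, length):
        # A write dirties the touched chunks in every line's set.
        first = offset // self.granularity
        last = (offset + length - 1) // self.granularity
        for dirty in self.bitmaps.values():
            dirty.update(range(first, last + 1))

    def take_backup(self, name):
        # Report this line's dirty chunks, then reset only this line.
        dirty = sorted(self.bitmaps[name])
        self.bitmaps[name] = set()
        return dirty
```

After a "daily" and a "weekly" line have each taken a full backup, a later daily backup only clears the daily set. The weekly line therefore still sees every write since its own last backup, and a single dirtiness bit exported over NBD cannot describe both lines at once.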


>> If I was designing a backup protocol (off the top of my head) I'd
>> make all commands return a monotonic 64 bit counter of the number of
>> writes to the disk since some arbitrary time, and provide a 'GETDIRTY'
>> command that returned all blocks with a monotonic counter greater than that.
>> That way I could precisely get the writes that were executed since
>> any particular read. You'd allow it to be 'slack' and include things
>> in that list that might not have changed (i.e. false positives) but
>> not false negatives.
> Yes, that might work as an implementation - but there's the question of
> whether other implementations would also work.  We want the protocol to
> describe the concept, and not be too heavily tied to one particular
> implementation.
>
> The documentation is also trying to be very straightforward that asking
> about dirtiness requires out-of-band coordination, and that a server can
> just blindly report everything as dirty if there is no better thing to
> report.  So anyone actually making use of this command already has to be
> aware of the out-of-band coordination needed to make it useful.
>
Yes, and this approach is perfect. If there is no information about
dirtiness, we should report everything as dirty. That said, this
information could be type-specific.
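Alex's alternative quoted above, a monotonic write counter combined with a `GETDIRTY`-style query, could look roughly like the following model. It is hypothetical and not part of the proposed extension. The server stamps each chunk with the counter value of its latest write, and a query returns every chunk written after a given counter value. Tracking at chunk rather than sector granularity makes the answer "slack", with false positives within a chunk, but it never misses a write. The names `WriteCounterServer` and `get_dirty` are illustrative.

```python
class WriteCounterServer:
    """Hypothetical model of a monotonic-write-counter dirtiness query."""

    def __init__(self, granularity=64 * 1024):
        self.granularity = granularity
        self.counter = 0      # monotonic write counter (64-bit in Alex's description)
        self.last_write = {}  # chunk index -> counter value of its latest write

    def write(self, offset, length):
        self.counter += 1
        first = offset // self.granularity
        last = (offset + length - 1) // self.granularity
        for chunk in range(first, last + 1):
            self.last_write[chunk] = self.counter
        # In Alex's scheme every reply would carry the current counter value.
        return self.counter

    def get_dirty(self, since):
        """Chunks written after counter value `since` (may over-report)."""
        return sorted(c for c, n in self.last_write.items() if n > since)
```

A client that remembers the counter value seen at its previous read can later ask for exactly the chunks written since then, without any out-of-band "mark clean" step.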


Thread overview: 59+ messages
2016-04-04 16:39 [Qemu-devel] [PATCH v2] doc: Add NBD_CMD_BLOCK_STATUS extension Eric Blake
2016-04-04 18:06 ` [Qemu-devel] [Nbd] " Alex Bligh
2016-04-04 19:34   ` Eric Blake
2016-04-04 19:54     ` Denis V. Lunev [this message]
2016-04-04 20:03       ` Alex Bligh
2016-04-04 20:08         ` Denis V. Lunev
2016-04-04 20:34           ` Eric Blake
2016-04-04 21:06             ` Denis V. Lunev
2016-04-04 21:12             ` Alex Bligh
2016-04-05 14:15         ` Paolo Bonzini
2016-04-05 15:01           ` Alex Bligh
2016-04-05 15:23             ` Paolo Bonzini
2016-04-05 15:27               ` Alex Bligh
2016-04-05 15:31                 ` Paolo Bonzini
2016-04-04 23:08       ` Wouter Verhelst
2016-04-04 23:32         ` Eric Blake
2016-04-05  7:16           ` Wouter Verhelst
2016-04-05 21:44           ` Wouter Verhelst
2016-04-05  7:13         ` Alex Bligh
2016-04-04 19:58     ` Alex Bligh
2016-04-04 20:04       ` Denis V. Lunev
2016-04-04 20:08         ` Alex Bligh
2016-04-04 20:13           ` Denis V. Lunev
2016-04-04 20:15             ` Alex Bligh
2016-04-04 20:27               ` Denis V. Lunev
2016-04-04 20:45                 ` Eric Blake
2016-04-04 21:04                   ` Denis V. Lunev
2016-04-04 21:12                     ` Alex Bligh
2016-04-04 21:17                     ` Eric Blake
2016-04-04 21:27                       ` Denis V. Lunev
2016-04-04 20:26           ` Eric Blake
2016-04-04 21:07             ` Alex Bligh
2016-04-04 21:25               ` Eric Blake
2016-04-04 22:06                 ` Alex Bligh
2016-04-04 20:22       ` Eric Blake
2016-04-05 13:38     ` Paolo Bonzini
2016-04-04 22:40 ` Wouter Verhelst
2016-04-04 23:03   ` Eric Blake
2016-04-05 13:41     ` Paolo Bonzini
2016-04-06  5:57     ` Denis V. Lunev
2016-04-06 14:08       ` Eric Blake
2016-04-05  4:05 ` [Qemu-devel] " Kevin Wolf
2016-04-05 13:43   ` Paolo Bonzini
2016-04-07 10:38     ` Vladimir Sementsov-Ogievskiy
2016-04-07 16:10       ` Eric Blake
2016-04-07 16:21         ` [Qemu-devel] [Nbd] " Alex Bligh
2016-04-08 11:35         ` [Qemu-devel] " Kevin Wolf
2016-04-09  9:08         ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-04-13 12:38         ` [Qemu-devel] " Pavel Borzenkov
2016-04-13 14:40           ` Eric Blake
2016-04-07 15:35     ` Pavel Borzenkov
2016-04-07 15:43       ` Paolo Bonzini
2016-04-05  8:51 ` Stefan Hajnoczi
2016-04-05  9:24 ` [Qemu-devel] [Nbd] " Markus Pargmann
2016-04-05 13:50   ` Paolo Bonzini
2016-04-11  5:58     ` Markus Pargmann
2016-04-05 14:14   ` Eric Blake
2016-04-05 20:50     ` Wouter Verhelst
2016-04-11  6:07       ` Markus Pargmann
