qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: Eric Blake <eblake@redhat.com>, John Snow <jsnow@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org, armbru@redhat.com,
	famz@redhat.com, den@openvz.org
Subject: Re: [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing
Date: Wed, 4 Jul 2018 16:07:07 +0200	[thread overview]
Message-ID: <480692b8-cbe1-3499-89c8-0d3f2cf99eb4@redhat.com> (raw)
In-Reply-To: <20180703111538.GB3812@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 5766 bytes --]

On 2018-07-03 13:15, Kevin Wolf wrote:
> Am 02.07.2018 um 14:09 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 29.06.2018 20:40, Eric Blake wrote:
>>> On 06/29/2018 12:30 PM, John Snow wrote:
>>>>
>>>>
>>>> On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> We need to synchronize backup job with reading from fleecing image
>>>>> like it was done in block/replication.c.
>>>>>
>>>>> Otherwise, the following situation is theoretically possible:
>>>>>
>>>>> 1. client start reading
>>>>> 2. client understand, that there is no corresponding cluster in
>>>>>     fleecing image
>>>>
>>>> I don't think the client refocuses the read, but QEMU does. (the client
>>>> has no idea if it is reading from the fleecing node or the backing
>>>> file.)
>>>>
>>>> ... but understood:
>>>>
>>>>> 3. client is going to read from backing file (i.e. active image)
>>>>> 4. guest writes to active image
>>>>
>>>> My question here is if QEMU will allow reads and writes to interleave in
>>>> general. If the client is reading from the backing file, the active
>>>> image, will QEMU allow a write to modify that data before we're done
>>>> getting that data?
>>>>
>>>>> 5. this write is stopped by backup(sync=none) and cluster is copied to
>>>>>     fleecing image
>>>>> 6. guest write continues...
>>>>> 7. and client reads _new_ (or partly new) date from active image
>>>>>
>>>>
>>>> I can't answer this for myself one way or the other right now, but I
>>>> imagine you observed a failure in practice in your test setups, which
>>>> motivated this patch?
>>>>
>>>> A test or some proof would help justification for this patch. It would
>>>> also help prove that it solves what it claims to!
>>>
>>> In fact, do we really need a new filter, or do we just need to make the
>>> "sync":"none" blockdev-backup job take the appropriate synchronization
>>> locks?
>>>
>>
>> How? We'll need additional mechanism like serializing requests.. Or a way to
>> reuse serializing requests. Using backup internal synchronization looks
>> simpler, and it is already used in block replication.
> 
> But it also just an ugly hack that fixes one special case and leaves
> everything else broken. replication is usually not a good example for
> anything. It always gives me bad surprises when I have to look at it.
> 
> We'll have to figure out where to fix this problem (or what it really
> is, once you look more than just at fleecing), but I think requiring the
> user to add a filter driver to work around missing serialisation in
> other code, and corrupting their image if they forget to, is not a
> reasonable solution.
> 
> I see at least two things wrong in this context:
> 
> * The permissions don't seem to match reality. The NBD server
>   unconditionally shares PERM_WRITE, which is wrong in this case. The
>   client wants to see a point-in-time snapshot that never changes. This
>   should become an option so that it can be properly reflected in the
>   permissions used.
> 
> * Once we have proper permissions, the fleecing setup breaks down
>   because the guest needs PERM_WRITE on the backing file, but the
>   fleecing overlay allows that only if the NBD client allows it (which
>   it doesn't for fleecing).
> 
>   Now we can implement an exception right into backup that installs a
>   backup filter driver between source and target if the source is the
>   backing file of the target. The filter driver would be similar to the
>   commit filter driver in that it simply promises !PERM_WRITE to its
>   parents, but allows PERM_WRITE on the source because it has installed
>   the before_write_notifier that guarantees this condition.

I don't quite understand what you mean "between source and target",
because *the* backup filter driver would need to be on top of both.
Unless you continue to use before-write notifiers, but I don't see why
when a driver that sits on top of source would intercept all requests
anyway.

And if we want to unify mirror and backup at any time, I think we have
to get away from before-write anyway.

>   All writes to the target that are made by the backup job in this setup
>   (including before_write_notifier writes) need to be marked as
>   serialising so that any concurrent reads are completed first.
> 
> And if we decide to add a target filter to backup, we should probably at
> the same time use a filter driver for intercepting source writes instead
> of using before_write_notifier.
> 
> Max, I think you intended to make both source and target children of the
> same block job node (or at least for mirror). But wouldn't that create
> loops in a setup like this? I think we may need two filters that are
> only connected through the block job, but not with block graph edges.

A node can only provide one data view to all of its parents.  It cannot
distinguish based on the parent.  Therefore, the main backup driver on
top of source and target would necessarily present the current disk
(changing) state, just like the mirror filter does.

Fleecing wants another view, though.  It wants an unchanging view.  It
therefore seems clear to me that we'd need to separate nodes: One
between guest and source (which performs the backup by intercepting
write requests), and one between the fleecing user (e.g. NBD) and the
target.  The former presents the current state, the latter presents the
"unchanging" point-in-time snapshot.

Sure, we can get away without the former (we do so today), but I don't
see why we would want to.  blockdev-copy (as an extension of
blockdev-mirror) would have such a node anyway, and I can see no reason
why we wouldn't attach both source and target to it.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  parent reply	other threads:[~2018-07-04 14:07 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-29 15:15 [Qemu-devel] [PATCH v2 0/3] image fleecing Vladimir Sementsov-Ogievskiy
2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 1/3] blockdev-backup: enable non-root nodes for backup source Vladimir Sementsov-Ogievskiy
2018-06-29 17:13   ` Eric Blake
2018-06-29 17:31   ` John Snow
2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing Vladimir Sementsov-Ogievskiy
2018-06-29 17:24   ` Eric Blake
2018-07-02  6:35     ` Fam Zheng
2018-07-02 11:27       ` Vladimir Sementsov-Ogievskiy
2018-07-02 11:47     ` Vladimir Sementsov-Ogievskiy
2018-06-29 17:30   ` John Snow
2018-06-29 17:40     ` Eric Blake
2018-07-02 12:09       ` Vladimir Sementsov-Ogievskiy
2018-07-03 11:15         ` Kevin Wolf
2018-07-03 11:52           ` Vladimir Sementsov-Ogievskiy
2018-07-03 16:11           ` Vladimir Sementsov-Ogievskiy
2018-07-03 18:02             ` Kevin Wolf
2018-07-04 14:07           ` Max Reitz [this message]
2018-07-02 11:57     ` Vladimir Sementsov-Ogievskiy
2018-07-03 11:22       ` Kevin Wolf
2018-06-29 15:15 ` [Qemu-devel] [PATCH v2 3/3] qemu-iotests: Image fleecing test case 222 Vladimir Sementsov-Ogievskiy
2018-06-29 15:31   ` Vladimir Sementsov-Ogievskiy
2018-06-29 17:58   ` Eric Blake
2018-06-29 21:04     ` John Snow
2018-07-02  6:45       ` Fam Zheng
2018-07-02 12:58       ` Vladimir Sementsov-Ogievskiy
2018-06-29 16:38 ` [Qemu-devel] [PATCH v2 0/3] image fleecing John Snow
2018-06-29 17:36   ` Vladimir Sementsov-Ogievskiy
2018-06-29 17:52     ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=480692b8-cbe1-3499-89c8-0d3f2cf99eb4@redhat.com \
    --to=mreitz@redhat.com \
    --cc=armbru@redhat.com \
    --cc=den@openvz.org \
    --cc=eblake@redhat.com \
    --cc=famz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).