From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
Date: Thu, 6 Feb 2020 18:06:45 +0100
Message-ID: <a3dfe3d1-542c-d12f-bcb5-05ab1cf26f6b@redhat.com>
In-Reply-To: <20200206165700.GH4926@linux.fritz.box>
On 06.02.20 17:57, Kevin Wolf wrote:
> Am 06.02.2020 um 17:43 hat Max Reitz geschrieben:
>> On 06.02.20 16:51, Kevin Wolf wrote:
>>> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
>>>> On 06.02.20 15:58, Kevin Wolf wrote:
>>>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
>>>>>> On 05.02.20 16:38, Kevin Wolf wrote:
>>>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>>>>>>>> We will need this to verify that Quorum can let one of its children be
>>>>>>>> replaced without breaking anything else.
>>>>>>>>
>>>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>>>> ---
>>>>>>>> block/quorum.c | 25 +++++++++++++++++++++++++
>>>>>>>> 1 file changed, 25 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>>>>>> index 59cd524502..6a7224c9e4 100644
>>>>>>>> --- a/block/quorum.c
>>>>>>>> +++ b/block/quorum.c
>>>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>>>>>>>
>>>>>>>>  typedef struct QuorumChild {
>>>>>>>>      BdrvChild *child;
>>>>>>>> +
>>>>>>>> +    /*
>>>>>>>> +     * If set, check whether this node can be replaced without any
>>>>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>>>>>>>> +     * WRITE permission.
>>>>>>>> +     */
>>>>>>>> +    bool to_be_replaced;
>>>>>>>
>>>>>>> I don't understand these permission changes. How does (preparing for)
>>>>>>> detaching a node from quorum make its content invalid?
>>>>>>
>>>>>> It doesn’t, of course. What we are preparing for is to replace it by
>>>>>> some other node with some other content.
>>>>>>
>>>>>>> And why do we
>>>>>>> suddenly need WRITE permissions even if the quorum node is only used
>>>>>>> read-only?
>>>>>>>
>>>>>>> The comment is a bit unclear, too. "check whether" implies that both
>>>>>>> outcomes could be true, but it doesn't say what happens in either case.
>>>>>>> Is this really "make sure that"?
>>>>>>
>>>>>> I think the comment is not only unclear, it is the problem. (Well,
>>>>>> maybe the code is also.)
>>>>>>
>>>>>> This series is about fixing at least some things about replacing nodes
>>>>>> by mirroring. The original use case this was introduced for was to fix
>>>>>> broken quorum children: The other children are still intact, so you read
>>>>>> from the quorum node and replace the broken child (which maybe shows
>>>>>> invalid data, or maybe just EIO) by the fixed mirror result.
>>>>>>
>>>>>> Replacing that broken node by the fixed one changes the data that’s
>>>>>> visible on that node.
>>>>>
>>>>> Hm, yes, that's true. But I wonder if this is really something that the
>>>>> permission system must catch. Like other graph manipulations, it's
>>>>> essentially the user saying "trust me, I know what I'm doing, this node
>>>>> makes sense in this place".
>>>>>
>>>>> Because if you assume that the user could add a node with unsuitable
>>>>> content and you want to prevent this, where do we stop?
>>>>> blockdev-snapshot can insert a non-empty overlay, which would result in
>>>>> visible data change. Should we therefore only allow snapshots when
>>>>> shared writes are allowed? This doesn't work obviously.
>>>>>
>>>>> So I'm inclined to say that this is the user's responsibility and we
>>>>> don't have to jump through hoops to prevent every possible way that the
>>>>> user could mess up. (Which often also results in preventing legitimate
>>>>> cases like here a quorum of read-only nodes.)
>>>>
>>>> Well, if you ask the question “where do we stop”, we also have to ask
>>>> the question “where do we start”. If we say the user knows what they’re
>>>> doing, we might as well drop the whole can_replace infrastructure
>>>> altogether and just assume that you can replace any node by anything.
>>>
>>> Well, I don't actually know if that would be completely unreasonable.
>>> The idea was obviously to keep graph changes restricted to very specific
>>> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
>>> have quite a few more operations that allow changing the graph.
>>>
>>> So if preventing some cases gives us headaches and is probably more work
>>> than dealing with any bugs they might reveal, maybe preventing them is
>>> wrong.
>>>
>>> I'm just afraid that we might be overengineering this and waste time on
>>> things that we don't actually get much use from.
>>
>> That’s why I’m asking.
>
> Did I answer your question sufficiently then?

No, because “I’m afraid” is a sentiment I fully share, but it doesn’t
answer the question of whether we are indeed overengineering this or not. :-)

I suppose my stance now is “This is probably overengineered, but now we
might as well roll with it”.

>>>> If the WRITE permission is the problem, then I suppose we can drop that.
>>>> Unsharing CONSISTENT_READ is bad enough that it effectively deters all
>>>> other parents anyway.
>>>
>>> WRITE is probably the more practical problem, though it's technically
>>> the correct one to take.
>>>
>>> CONSISTENT_READ is already a problem in theory because replacing a child
>>> node with different content doesn't even match its definition:
>>>
>>>     /**
>>>      * A user that has the "permission" of consistent reads is guaranteed that
>>>      * their view of the contents of the block device is complete and
>>>      * self-consistent, representing the contents of a disk at a specific
>>>      * point.
>>>      *
>>>      * For most block devices (including their backing files) this is true, but
>>>      * the property cannot be maintained in a few situations like for
>>>      * intermediate nodes of a commit block job.
>>>      */
>>>     BLK_PERM_CONSISTENT_READ = 0x01,
>>>
>>> Replacing an image with a different image means that the node represents
>>> the content of a different disk now, but it's probably still complete
>>> and self-consistent.
>>
>> At any point in time, yes, but not over the time span of the change. The
>> definition doesn’t say that the node represents the contents of a disk
>> at a specific point, but the view from the parent.
>>
>> I argue that that view is always over some period of time, so if you
>> suddenly switch out the whole disk, then it isn’t a self-consistent view.
>
> I think your theory that it's over some period of time conflicts with
> the documentation that says "at a specific point".

I’d rather not get into a deeper discussion on what CONSISTENT_READ
means again... :-/

I always feel like if you really take only a single point in time, then
anything could be some hypothetical disk.

So to me, unsharing CONSISTENT_READ effectively just means “Don’t touch
this, you don’t want to”.

>> Alternatively, we could of course also just forego the permission system
>> here altogether and just check that there are no other parents at all.
>> (Which is effectively the same as unsharing CONSISTENT_READ.)
>
> This would sidestep all of the artificial permission twiddling, which
> sounds good.
>
> It would probably also needlessly restrict the allowed use cases,

Only in theory, though, because in practice basically everything useful
takes CONSISTENT_READ anyway.

> but
> then, who cares about nodes with multiple parents, one of which is a
> quorum node?
>
> So I guess I would be fine with either checking that there are no
> parents or maybe even just dropping the check completely.

OK, I’ll check the parent list then. (Except that there must be exactly
one parent, namely Quorum.)

Max