qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
Date: Thu, 6 Feb 2020 11:11:05 +0100	[thread overview]
Message-ID: <92b951da-5e12-e08f-f8f7-a862790b51ac@redhat.com> (raw)
In-Reply-To: <20200205153859.GE5768@dhcp-200-226.str.redhat.com>


[-- Attachment #1.1: Type: text/plain, Size: 3513 bytes --]

On 05.02.20 16:38, Kevin Wolf wrote:
> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>> We will need this to verify that Quorum can let one of its children be
>> replaced without breaking anything else.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/quorum.c | 25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/block/quorum.c b/block/quorum.c
>> index 59cd524502..6a7224c9e4 100644
>> --- a/block/quorum.c
>> +++ b/block/quorum.c
>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>  
>>  typedef struct QuorumChild {
>>      BdrvChild *child;
>> +
>> +    /*
>> +     * If set, check whether this node can be replaced without any
>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>> +     * WRITE permission.
>> +     */
>> +    bool to_be_replaced;
> 
> I don't understand these permission changes. How does (preparing for)
> detaching a node from quorum make its content invalid?

It doesn’t, of course.  What we are preparing for is to replace it by
some other node with some other content.

> And why do we
> suddenly need WRITE permissions even if the quorum node is only used
> read-only?
> 
> The comment is a bit unclear, too. "check whether" implies that both
> outcomes could be true, but it doesn't say what happens in either case.
> Is this really "make sure that"?

I think the comment is not only unclear, it is the problem.  (Well,
maybe the code is also.)

This series is about fixing at least some things about replacing nodes
by mirroring.  The original use cases this was introduced for was to fix
broken quorum children: The other children are still intact, so you read
from the quorum node and replace the broken child (which maybe shows
invalid data, or maybe just EIO) by the fixed mirror result.

Replacing that broken node by the fixed one changes the data that’s
visible on that node.

That’s fine with quorum because that one child never influenced its
result anyway.  In fact, we know that the new child agrees with the
quorum, so it actually reduces conflict.

But if there are other parents on the node, they may disagree.  So we
need to warn them that we will disrupt consistency by replacing the node
with a potentially completely different one.  If those other parents
don’t care about consistency (CONSISTENT_READ) and don’t mind data
changes (other WRITE users), then we can freely replace the node still.


Now (assuming that this reasoning makes sense) it may seem as if the
general block layer should handle such cases; like it should unshare
CONSISTENT_READ and take WRITE.  But that isn’t true, because it calls
bdrv_recurse_can_replace() specifically to check that the some node can
be replaced by the new node without changing any visible data.  So
usually there are no such sudden data changes.

Quorum is the only node that can tolerate such data changes on its
children without changing its own visible data.  Hence it’s responsible
for ensuring that there’s noone else that minds if one of its child
nodes is replaced by something else.

(Note that this isn’t about replacing a BdrvChild, but about replacing a
BDS.  It isn’t like only quorum’s BdrvChild will be made to point
somewhere else; all BdrvChild pointers to the old node will be made to
point to the new one.)


Again assuming this makes sense, I wonder how we could condense that
into a reasonable comment.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2020-02-06 10:14 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-30 21:44 [PATCH v3 00/21] block: Fix check_to_replace_node() Max Reitz
2020-01-30 21:44 ` [PATCH v3 01/21] blockdev: Allow external snapshots everywhere Max Reitz
2020-01-30 21:44 ` [PATCH v3 02/21] blockdev: Allow resizing everywhere Max Reitz
2020-01-30 21:44 ` [PATCH v3 03/21] block: Drop bdrv_is_first_non_filter() Max Reitz
2020-01-30 21:44 ` [PATCH v3 04/21] iotests: Let 041 use -blockdev for quorum children Max Reitz
2020-01-30 21:44 ` [PATCH v3 05/21] quorum: Fix child permissions Max Reitz
2020-01-30 21:44 ` [PATCH v3 06/21] block: Add bdrv_recurse_can_replace() Max Reitz
2020-01-30 21:44 ` [PATCH v3 07/21] blkverify: Implement .bdrv_recurse_can_replace() Max Reitz
2020-01-30 21:44 ` [PATCH v3 08/21] quorum: Store children in own structure Max Reitz
2020-01-30 21:44 ` [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced Max Reitz
2020-02-04  9:33   ` Vladimir Sementsov-Ogievskiy
2020-02-05 15:38   ` Kevin Wolf
2020-02-06 10:11     ` Max Reitz [this message]
2020-02-06 14:58       ` Kevin Wolf
2020-02-06 15:21         ` Max Reitz
2020-02-06 15:51           ` Kevin Wolf
2020-02-06 16:43             ` Max Reitz
2020-02-06 16:47               ` Max Reitz
2020-02-06 16:58                 ` Kevin Wolf
2020-02-06 16:57               ` Kevin Wolf
2020-02-06 17:06                 ` Max Reitz
2020-02-06 17:41                   ` Kevin Wolf
2020-01-30 21:44 ` [PATCH v3 10/21] quorum: Implement .bdrv_recurse_can_replace() Max Reitz
2020-02-04  9:37   ` Vladimir Sementsov-Ogievskiy
2020-01-30 21:44 ` [PATCH v3 11/21] block: Use bdrv_recurse_can_replace() Max Reitz
2020-01-30 21:44 ` [PATCH v3 12/21] block: Remove bdrv_recurse_is_first_non_filter() Max Reitz
2020-02-04  9:45   ` Vladimir Sementsov-Ogievskiy
2020-01-30 21:44 ` [PATCH v3 13/21] mirror: Double-check immediately before replacing Max Reitz
2020-01-30 21:44 ` [PATCH v3 14/21] quorum: Stop marking it as a filter Max Reitz
2020-01-30 21:44 ` [PATCH v3 15/21] iotests: Use complete_and_wait() in 155 Max Reitz
2020-01-30 21:44 ` [PATCH v3 16/21] iotests: Add VM.assert_block_path() Max Reitz
2020-02-04 10:33   ` Vladimir Sementsov-Ogievskiy
2020-02-04 14:09   ` Vladimir Sementsov-Ogievskiy
2020-01-30 21:44 ` [PATCH v3 17/21] iotests/041: Drop superfluous shutdowns Max Reitz
2020-02-04 11:40   ` Vladimir Sementsov-Ogievskiy
2020-01-30 21:44 ` [PATCH v3 18/21] iotests: Resolve TODOs in 041 Max Reitz
2020-02-04  9:54   ` Vladimir Sementsov-Ogievskiy
2020-01-30 21:44 ` [PATCH v3 19/21] iotests: Use self.image_len in TestRepairQuorum Max Reitz
2020-01-30 21:44 ` [PATCH v3 20/21] iotests: Add tests for invalid Quorum @replaces Max Reitz
2020-01-30 21:44 ` [PATCH v3 21/21] iotests: Check that @replaces can replace filters Max Reitz
2020-02-04  9:56   ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=92b951da-5e12-e08f-f8f7-a862790b51ac@redhat.com \
    --to=mreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).