qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
To: Kevin Wolf <kwolf@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Alberto Garcia <berto@igalia.com>,
	qemu devel <qemu-devel@nongnu.org>, Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 1/1] quorum: Change vote rules for 64 bits hash
Date: Tue, 23 Feb 2016 10:55:33 +0800	[thread overview]
Message-ID: <56CBCA25.7090009@cn.fujitsu.com> (raw)
In-Reply-To: <20160222103455.GC5387@noname.str.redhat.com>

On 02/22/2016 06:34 PM, Kevin Wolf wrote:
> Am 22.02.2016 um 10:02 hat Dr. David Alan Gilbert geschrieben:
>> * Changlong Xie (xiecl.fnst@cn.fujitsu.com) wrote:
>>> On 02/20/2016 10:28 PM, Max Reitz wrote:
>>>> On 19.02.2016 12:24, Alberto Garcia wrote:
>>>>> On Fri 19 Feb 2016 09:26:53 AM CET, Wen Congyang <wency@cn.fujitsu.com> wrote:
>>>>>
>>>>>>>> If quorum has two children(A, B). A do flush sucessfully, but B
>>>>>>>> flush failed.  We MUST choice A as winner rather than just pick
>>>>>>>> anyone of them. Otherwise the filesystem of guest will become
>>>>>>>> read-only with following errors:
>>>>>>>>
>>>>>>>> end_request: I/O error, dev vda, sector 11159960
>>>>>>>> Aborting journal on device vda3-8
>>>>>>>> EXT4-fs error (device vda3): ext4_journal_start_sb:327: Detected abort journal
>>>>>>>> EXT4-fs (vda3): Remounting filesystem read-only
>>>>>>>
>>>>>>> Hi Xie,
>>>>>>>
>>>>>>> Let's see if I'm getting this right:
>>>>>>>
>>>>>>> - When Quorum flushes to disk, there's a vote among the return values of
>>>>>>>    the flush operations of its members, and the one that wins is the one
>>>>>>>    that Quorum returns.
>>>>>>>
>>>>>>> - If there's a tie then Quorum choses the first result from the list of
>>>>>>>    winners.
>>>>>>>
>>>>>>> - With your patch you want to give priority to the vote with result == 0
>>>>>>>    if there's any, so Quorum would return 0 (and succeed).
>>>>>>>
>>>>>>> This seems to me like an ad-hoc fix for a particular use case. What
>>>>>>> if you have 3 members and two of them fail with the same error code?
>>>>>>> Would you still return 0 or the error code from the other two?
>>>>>>
>>>>>> For example:
>>>>>> children.0 returns 0
>>>>>> children.1 returns -EIO
>>>>>> children.2 returns -EPIPE
>>>>>>
>>>>>> In this case, quorum returns -EPIPE now(without this patch).
>>>>>>
>>>>>> For example:
>>>>>> children.0 returns -EPIPE
>>>>>> children.1 returns -EIO
>>>>>> children.2 returns 0
>>>>>> In this case, quorum returns 0 now.
>>>>>
>>>>> My question is: what's the rationale for returning 0 in case a) but not
>>>>> in case b)?
>>>>>
>>>>>    a)
>>>>>      children.0 returns -EPIPE
>>>>>      children.1 returns -EIO
>>>>>      children.2 returns 0
>>>>>
>>>>>    b)
>>>>>      children.0 returns -EIO
>>>>>      children.1 returns -EIO
>>>>>      children.2 returns 0
>>>>>
>>>>> In both cases you have one successful flush and two errors. You want to
>>>>> return always 0 in case a) and always -EIO in case b). But the only
>>>>> difference is that in case b) the errors happen to be the same, so why
>>>>> does that matter?
>>>>>
>>>>> That said, I'm not very convinced of the current logics of the Quorum
>>>>> flush code either, so it's not even a problem with your patch... it
>>>>> seems to me that the code should follow the same logics as in the
>>>>> read/write case: if the number of correct flushes >= threshold then
>>>>> return 0, else select the most common error code.
>>>>
>>>> I'm not convinced of the logic either, which is why I waited for you to
>>>> respond to this patch. :-)
>>>>
>>>> Intuitively, I'd expect Quorum to return an error if flushing failed for
>>>> any of the children, because, well, flushing failed. I somehow feel like
>>>> flushing is different from a read or write operation and therefore
>>>> ignoring the threshold would be fine here. However, maybe my intuition
>>>> is just off.
>>>>
>>>> Anyway, regardless of that, if we do take the threshold into account, we
>>>> should not use the exact error value for voting but just whether an
>>>> error occurred or not. If all but one children fail to flush (all for
>>>> different reasons), I find it totally wrong to return success. We should
>>>> then just return -EIO or something.
>>>>
>>> Hi Berto & Max
>>>
>>> Thanks for your comments, i'd like to have a summary here. For flush cases:
>>>
>>> 1) if flush successfully(result >= 0), result = 0; else if result < 0,
>>> result = -EIO. then invoke quorum_count_vote
>>> 2) if correct flushes >= threshold, mark correct flushes as winner directly.
>
> Please try to return one of the real error codes instead of -EIO.
> Essentially we should use the same logic as for writes (like Berto
> suggested above).

Yes, i correct myself. It seems everyone reaches an agreement in this 
thread, and will use the same logic as writes in next version.

Thanks
	-Xie
>
>> I find it difficult to understand how this corresponds to the behaviour needed
>> in COLO, where we have the NBD and the real storage on the primary; in that
>> case the failure of the real storage should give an error to the guest, but the
>> failure of the NBD shouldn't produce a guest visible failure.
>
> That's probably because you're abusing quorum as an active mirroring
> filter, which it really isn't. This is okay for now so you have
> something to work with, but I expect that eventually you'll need a
> different driver. (Well, or maybe I'm mistaken and you actually do need
> to read back data from NBD to compare it to the real storage - do you?)
>
> Anyway, I'm curious how you would handle a failed write/flush request to
> the NBD target. Simply ignoring it doesn't feel right; in case of a
> failover, wouldn't you switch to a potentially corrupted image then?
>
> Kevin
>
>
> .
>

  parent reply	other threads:[~2016-02-23  2:54 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-16  2:15 [Qemu-devel] [PATCH v2 0/1] change quorum vote rules for 64-bits hash Changlong Xie
2016-02-16  2:15 ` [Qemu-devel] [PATCH v2 1/1] quorum: Change vote rules for 64 bits hash Changlong Xie
2016-02-18 10:00   ` Dr. David Alan Gilbert
2016-02-18 15:16   ` Alberto Garcia
2016-02-19  8:26     ` Wen Congyang
2016-02-19 11:24       ` Alberto Garcia
2016-02-20 14:28         ` Max Reitz
2016-02-22  3:17           ` Changlong Xie
2016-02-22  9:02             ` Dr. David Alan Gilbert
2016-02-22  9:52               ` Changlong Xie
2016-02-22  9:59                 ` Dr. David Alan Gilbert
2016-02-22 10:34               ` Kevin Wolf
2016-02-22 10:39                 ` Dr. David Alan Gilbert
2016-02-23  2:55                 ` Changlong Xie [this message]
2016-02-22 13:31           ` Alberto Garcia
2016-02-22 13:43             ` Alberto Garcia
2016-02-22 16:37             ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56CBCA25.7090009@cn.fujitsu.com \
    --to=xiecl.fnst@cn.fujitsu.com \
    --cc=berto@igalia.com \
    --cc=dgilbert@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).