From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47210) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aXmOW-0008Gc-9f for qemu-devel@nongnu.org; Mon, 22 Feb 2016 04:03:05 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aXmOT-00033T-41 for qemu-devel@nongnu.org; Mon, 22 Feb 2016 04:03:04 -0500 Received: from mx1.redhat.com ([209.132.183.28]:49674) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aXmOS-00033M-SN for qemu-devel@nongnu.org; Mon, 22 Feb 2016 04:03:01 -0500 Date: Mon, 22 Feb 2016 09:02:55 +0000 From: "Dr. David Alan Gilbert" Message-ID: <20160222090254.GA2940@work-vm> References: <1455588944-29799-1-git-send-email-xiecl.fnst@cn.fujitsu.com> <1455588944-29799-2-git-send-email-xiecl.fnst@cn.fujitsu.com> <56C6D1CD.7010800@cn.fujitsu.com> <56C877F3.3070401@redhat.com> <56CA7DD8.9000403@cn.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <56CA7DD8.9000403@cn.fujitsu.com> Subject: Re: [Qemu-devel] [PATCH v2 1/1] quorum: Change vote rules for 64 bits hash List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Changlong Xie Cc: Kevin Wolf , Alberto Garcia , qemu devel , Max Reitz * Changlong Xie (xiecl.fnst@cn.fujitsu.com) wrote: > On 02/20/2016 10:28 PM, Max Reitz wrote: > >On 19.02.2016 12:24, Alberto Garcia wrote: > >>On Fri 19 Feb 2016 09:26:53 AM CET, Wen Congyang wrote: > >> > >>>>>If quorum has two children(A, B). A do flush sucessfully, but B > >>>>>flush failed. We MUST choice A as winner rather than just pick > >>>>>anyone of them. Otherwise the filesystem of guest will become > >>>>>read-only with following errors: > >>>>> > >>>>>end_request: I/O error, dev vda, sector 11159960 > >>>>>Aborting journal on device vda3-8 > >>>>>EXT4-fs error (device vda3): ext4_journal_start_sb:327: Detected abort journal > >>>>>EXT4-fs (vda3): Remounting filesystem read-only > >>>> > >>>>Hi Xie, > >>>> > >>>>Let's see if I'm getting this right: > >>>> > >>>>- When Quorum flushes to disk, there's a vote among the return values of > >>>> the flush operations of its members, and the one that wins is the one > >>>> that Quorum returns. > >>>> > >>>>- If there's a tie then Quorum choses the first result from the list of > >>>> winners. > >>>> > >>>>- With your patch you want to give priority to the vote with result == 0 > >>>> if there's any, so Quorum would return 0 (and succeed). > >>>> > >>>>This seems to me like an ad-hoc fix for a particular use case. What > >>>>if you have 3 members and two of them fail with the same error code? > >>>>Would you still return 0 or the error code from the other two? > >>> > >>>For example: > >>>children.0 returns 0 > >>>children.1 returns -EIO > >>>children.2 returns -EPIPE > >>> > >>>In this case, quorum returns -EPIPE now(without this patch). > >>> > >>>For example: > >>>children.0 returns -EPIPE > >>>children.1 returns -EIO > >>>children.2 returns 0 > >>>In this case, quorum returns 0 now. > >> > >>My question is: what's the rationale for returning 0 in case a) but not > >>in case b)? > >> > >> a) > >> children.0 returns -EPIPE > >> children.1 returns -EIO > >> children.2 returns 0 > >> > >> b) > >> children.0 returns -EIO > >> children.1 returns -EIO > >> children.2 returns 0 > >> > >>In both cases you have one successful flush and two errors. You want to > >>return always 0 in case a) and always -EIO in case b). But the only > >>difference is that in case b) the errors happen to be the same, so why > >>does that matter? > >> > >>That said, I'm not very convinced of the current logics of the Quorum > >>flush code either, so it's not even a problem with your patch... it > >>seems to me that the code should follow the same logics as in the > >>read/write case: if the number of correct flushes >= threshold then > >>return 0, else select the most common error code. > > > >I'm not convinced of the logic either, which is why I waited for you to > >respond to this patch. :-) > > > >Intuitively, I'd expect Quorum to return an error if flushing failed for > >any of the children, because, well, flushing failed. I somehow feel like > >flushing is different from a read or write operation and therefore > >ignoring the threshold would be fine here. However, maybe my intuition > >is just off. > > > >Anyway, regardless of that, if we do take the threshold into account, we > >should not use the exact error value for voting but just whether an > >error occurred or not. If all but one children fail to flush (all for > >different reasons), I find it totally wrong to return success. We should > >then just return -EIO or something. > > > Hi Berto & Max > > Thanks for your comments, i'd like to have a summary here. For flush cases: > > 1) if flush successfully(result >= 0), result = 0; else if result < 0, > result = -EIO. then invoke quorum_count_vote > 2) if correct flushes >= threshold, mark correct flushes as winner directly. I find it difficult to understand how this corresponds to the behaviour needed in COLO, where we have the NBD and the real storage on the primary; in that case the failure of the real storage should give an error to the guest, but the failure of the NBD shouldn't produce a guest visible failure. Dave > Will fix in next version. > > Thanks > -Xie > >Max > > > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK