From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60539) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aXnHS-0003VE-7X for qemu-devel@nongnu.org; Mon, 22 Feb 2016 04:59:51 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aXnHP-0007s9-07 for qemu-devel@nongnu.org; Mon, 22 Feb 2016 04:59:50 -0500 Received: from mx1.redhat.com ([209.132.183.28]:34421) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aXnHO-0007s3-P3 for qemu-devel@nongnu.org; Mon, 22 Feb 2016 04:59:46 -0500 Date: Mon, 22 Feb 2016 09:59:41 +0000 From: "Dr. David Alan Gilbert" Message-ID: <20160222095940.GD2940@work-vm> References: <1455588944-29799-1-git-send-email-xiecl.fnst@cn.fujitsu.com> <1455588944-29799-2-git-send-email-xiecl.fnst@cn.fujitsu.com> <56C6D1CD.7010800@cn.fujitsu.com> <56C877F3.3070401@redhat.com> <56CA7DD8.9000403@cn.fujitsu.com> <20160222090254.GA2940@work-vm> <56CADA72.7040604@cn.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <56CADA72.7040604@cn.fujitsu.com> Subject: Re: [Qemu-devel] [PATCH v2 1/1] quorum: Change vote rules for 64 bits hash List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Changlong Xie Cc: Kevin Wolf , Alberto Garcia , qemu devel , Max Reitz * Changlong Xie (xiecl.fnst@cn.fujitsu.com) wrote: > On 02/22/2016 05:02 PM, Dr. David Alan Gilbert wrote: > >* Changlong Xie (xiecl.fnst@cn.fujitsu.com) wrote: > >>On 02/20/2016 10:28 PM, Max Reitz wrote: > >>>On 19.02.2016 12:24, Alberto Garcia wrote: > >>>>On Fri 19 Feb 2016 09:26:53 AM CET, Wen Congyang wrote: > >>>> > >>>>>>>If quorum has two children(A, B). A do flush sucessfully, but B > >>>>>>>flush failed. We MUST choice A as winner rather than just pick > >>>>>>>anyone of them. Otherwise the filesystem of guest will become > >>>>>>>read-only with following errors: > >>>>>>> > >>>>>>>end_request: I/O error, dev vda, sector 11159960 > >>>>>>>Aborting journal on device vda3-8 > >>>>>>>EXT4-fs error (device vda3): ext4_journal_start_sb:327: Detected abort journal > >>>>>>>EXT4-fs (vda3): Remounting filesystem read-only > >>>>>> > >>>>>>Hi Xie, > >>>>>> > >>>>>>Let's see if I'm getting this right: > >>>>>> > >>>>>>- When Quorum flushes to disk, there's a vote among the return values of > >>>>>> the flush operations of its members, and the one that wins is the one > >>>>>> that Quorum returns. > >>>>>> > >>>>>>- If there's a tie then Quorum choses the first result from the list of > >>>>>> winners. > >>>>>> > >>>>>>- With your patch you want to give priority to the vote with result == 0 > >>>>>> if there's any, so Quorum would return 0 (and succeed). > >>>>>> > >>>>>>This seems to me like an ad-hoc fix for a particular use case. What > >>>>>>if you have 3 members and two of them fail with the same error code? > >>>>>>Would you still return 0 or the error code from the other two? > >>>>> > >>>>>For example: > >>>>>children.0 returns 0 > >>>>>children.1 returns -EIO > >>>>>children.2 returns -EPIPE > >>>>> > >>>>>In this case, quorum returns -EPIPE now(without this patch). > >>>>> > >>>>>For example: > >>>>>children.0 returns -EPIPE > >>>>>children.1 returns -EIO > >>>>>children.2 returns 0 > >>>>>In this case, quorum returns 0 now. > >>>> > >>>>My question is: what's the rationale for returning 0 in case a) but not > >>>>in case b)? > >>>> > >>>> a) > >>>> children.0 returns -EPIPE > >>>> children.1 returns -EIO > >>>> children.2 returns 0 > >>>> > >>>> b) > >>>> children.0 returns -EIO > >>>> children.1 returns -EIO > >>>> children.2 returns 0 > >>>> > >>>>In both cases you have one successful flush and two errors. You want to > >>>>return always 0 in case a) and always -EIO in case b). But the only > >>>>difference is that in case b) the errors happen to be the same, so why > >>>>does that matter? > >>>> > >>>>That said, I'm not very convinced of the current logics of the Quorum > >>>>flush code either, so it's not even a problem with your patch... it > >>>>seems to me that the code should follow the same logics as in the > >>>>read/write case: if the number of correct flushes >= threshold then > >>>>return 0, else select the most common error code. > >>> > >>>I'm not convinced of the logic either, which is why I waited for you to > >>>respond to this patch. :-) > >>> > >>>Intuitively, I'd expect Quorum to return an error if flushing failed for > >>>any of the children, because, well, flushing failed. I somehow feel like > >>>flushing is different from a read or write operation and therefore > >>>ignoring the threshold would be fine here. However, maybe my intuition > >>>is just off. > >>> > >>>Anyway, regardless of that, if we do take the threshold into account, we > >>>should not use the exact error value for voting but just whether an > >>>error occurred or not. If all but one children fail to flush (all for > >>>different reasons), I find it totally wrong to return success. We should > >>>then just return -EIO or something. > >>> > >>Hi Berto & Max > >> > >>Thanks for your comments, i'd like to have a summary here. For flush cases: > >> > >>1) if flush successfully(result >= 0), result = 0; else if result < 0, > >>result = -EIO. then invoke quorum_count_vote > >>2) if correct flushes >= threshold, mark correct flushes as winner directly. > > > >I find it difficult to understand how this corresponds to the behaviour needed > >in COLO, where we have the NBD and the real storage on the primary; in that > >case the failure of the real storage should give an error to the guest, but the > >failure of the NBD shouldn't produce a guest visible failure. > > > Hi Dave > > "in that case the failure of the real storage should give an error to the > guest, but the failure of the NBD shouldn't produce a guest visible > failure." > This is just what i think :), but there is a restricted condition > . > 1) If the guest *Must* fetch the return code for flush operation? This is > prerequisite. > 2) If no. Since Colo and Quorum are independent of each other, so quorum > don't know if we are in colo mode. I think the only way to implement your > idea is:"pass some parameters such as flush= on/off to each quorum child". I'm not sure why flush is special; but either way I think for Quorum the answer is to have a flag per-child to say whether a failure on that child should be visible to the guest. Dave > BTW, i've just sent out V3, and thought it make scense. > > Thanks > -Xie > >Dave > > > >>Will fix in next version. > >> > >>Thanks > >> -Xie > >>>Max > >>> > >> > >> > >-- > >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > > > > >. > > > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK