From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39215) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZYzrw-0006gd-DU for qemu-devel@nongnu.org; Mon, 07 Sep 2015 13:06:13 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZYzrv-0005dC-4L for qemu-devel@nongnu.org; Mon, 07 Sep 2015 13:06:12 -0400 Date: Mon, 7 Sep 2015 17:56:39 +0100 From: "Dr. David Alan Gilbert" Message-ID: <20150907165638.GJ2337@work-vm> References: <1441183880-26993-1-git-send-email-wency@cn.fujitsu.com> <1441183880-26993-7-git-send-email-wency@cn.fujitsu.com> <55E72422.30301@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <55E72422.30301@redhat.com> Subject: Re: [Qemu-devel] [PATCH 06/16] quorum: allow ignoring child errors List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake Cc: Kevin Wolf , Fam Zheng , qemu block , Jiang Yunhong , Dong Eddie , qemu devel , "Michael R. Hines" , Max Reitz , Alberto Garcia , Gonglei , Stefan Hajnoczi , Paolo Bonzini , Yang Hongyang , zhanghailiang * Eric Blake (eblake@redhat.com) wrote: > On 09/02/2015 02:51 AM, Wen Congyang wrote: > > If the child is not ready, read/write/getlength/flush will > > return -errno. It is not critical error, and can be ignored: > > 1. read/write: > > Just not report the error event. > > What happens if all the children report an error? Or is the threshold > at play here? I think it's interesting because in the COLO case the intention isn't really about a threshold (in the way you might use for RAID or mirroring), it's that one of the stores is local (and not expected to error) and one is somewhere over a network, so if it fails you don't want to stop the local VM working. However, if it fails we do need to know about it; if any write to the secondary stops then the fault-tolerance has failed (at least for that drive); so we should do *something* - I'm not sure what though. Dave > For example, if you have a threshold of 3/5, then I'm assuming that if > up to two children return an errno, then it is okay to ignore; but if > three or more return an errno, you haven't met threshold, so the I/O > must fail. > > Are you ignoring ALL errors (including things like EACCES), or just EIO > errors? > > > > 2. getlength: > > just ignore it. If all children's getlength return -errno, > > and be ignored, return -EIO. > > 3. flush: > > Just ignore it. If all children's getlength return -errno, > > s/getlength/flush/ > > > and be ignored, return 0. > > Yuck - claiming success when all of the children fail feels dangerous. > > > > > Usage: children.x.ignore-errors=true > > > > Signed-off-by: Wen Congyang > > Signed-off-by: zhanghailiang > > Signed-off-by: Gonglei > > Cc: Alberto Garcia > > --- > > block/quorum.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++---- > > Interface review only: > > > +++ b/qapi/block-core.json > > @@ -1411,6 +1411,8 @@ > > # @allow-write-backing-file: #optional whether the backing file is opened in > > # read-write mode. It is only for backing file > > # (Since 2.5 default: false) > > +# @ignore-errors: #options whether the child's I/O error should be ignored. > > s/options/optional/ > s/error/errors/ > > > +# it is only for quorum's child.(Since 2.5 default: false) > > Space after '.' in English sentences. > > The fact that you are documenting that this option can only be specified > for quorum children makes me wonder if it belongs better as an option in > BlockdevOptionsQuorum rather than BlockdevOptionsBase. > > Semantically, it sounds like you are trying to allow for a per-child > decision of whether this particular child's errors matter to the overall > quorum. So, if we have a 3/5 quorum, we can decide that for children A, > B, C, and D, errors cannot be ignored, but for child E, errors are not a > problem. > > As written, you are tying the semantics to each child BDS, and requiring > special code to double-check that the property is only ever set if the > BDS is used as the child of a quorum. Furthermore, if the property is > set, you are then changing what the child does in response to various > operations. > > What if you instead create a list property in the quorum parent? Maybe > along the lines of: > > # @child-errors-okay: #optional an array of named-node children where > errors will be ignored (Since 2.5, default empty) > > { 'struct': 'BlockdevOptionsQuorum', > 'data': { '*blkverify': 'bool', > 'children': [ 'BlockdevRef' ], > 'vote-threshold': 'int', > '*rewrite-corrupted': 'bool', > '*read-pattern': 'QuorumReadPattern', > '*child-errors-okay': ['str'] } } > > The above example of a 3/5 quorum, where only child E can ignore errors, > would then be represented as: > > { "children": [ "A", "B", "C", "D", "E" ], 'vote-threshold':3, > 'child-errors-okay': [ "E" ] } > > The code to ignore the errors is then done in the quorum itself (the BDS > for E does not have to care about a special ignore-errors property, but > just always returns the error as usual; and then the quorum is deciding > how to handle the error), and you are not polluting the BDS state for > something that is quorum-specific, because it is now the quorum itself > that tracks the special casing. > > Finally, why can't hot-plug/unplug of quorum members work? If you are > going to always ignore errors from a particular child, then why is that > child even part of the quorum? Isn't a better design to just not add > the child to the quorum until it is ready and won't be reporting errors? > > -- > Eric Blake eblake redhat com +1-919-301-3266 > Libvirt virtualization library http://libvirt.org > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK