All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <famz@redhat.com>,
	qemu block <qemu-block@nongnu.org>,
	Jiang Yunhong <yunhong.jiang@intel.com>,
	Dong Eddie <eddie.dong@intel.com>,
	qemu devel <qemu-devel@nongnu.org>,
	"Michael R. Hines" <mrhines@linux.vnet.ibm.com>,
	Max Reitz <mreitz@redhat.com>, Alberto Garcia <berto@igalia.com>,
	Gonglei <arei.gonglei@huawei.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Yang Hongyang <yanghy@cn.fujitsu.com>,
	zhanghailiang <zhang.zhanghailiang@huawei.com>
Subject: Re: [Qemu-devel] [PATCH 06/16] quorum: allow ignoring child errors
Date: Mon, 7 Sep 2015 17:56:39 +0100	[thread overview]
Message-ID: <20150907165638.GJ2337@work-vm> (raw)
In-Reply-To: <55E72422.30301@redhat.com>

* Eric Blake (eblake@redhat.com) wrote:
> On 09/02/2015 02:51 AM, Wen Congyang wrote:
> > If the child is not ready, read/write/getlength/flush will
> > return -errno. It is not critical error, and can be ignored:
> > 1. read/write:
> >    Just not report the error event.
> 
> What happens if all the children report an error?  Or is the threshold
> at play here?

I think it's interesting because in the COLO case the intention isn't
really about a threshold (in the way you might use for RAID or mirroring),
it's that one of the stores is local (and not expected to error) and one
is somewhere over a network, so if it fails you don't want to stop
the local VM working.

However, if it fails we do need to know about it; if any write to
the secondary stops then the fault-tolerance has failed (at least for
that drive); so we should do *something* - I'm not sure what though.

Dave


> For example, if you have a threshold of 3/5, then I'm assuming that if
> up to two children return an errno, then it is okay to ignore; but if
> three or more return an errno, you haven't met threshold, so the I/O
> must fail.
> 
> Are you ignoring ALL errors (including things like EACCES), or just EIO
> errors?
> 
> 
> > 2. getlength:
> >    just ignore it. If all children's getlength return -errno,
> >    and be ignored, return -EIO.
> > 3. flush:
> >    Just ignore it. If all children's getlength return -errno,
> 
> s/getlength/flush/
> 
> >    and be ignored, return 0.
> 
> Yuck - claiming success when all of the children fail feels dangerous.
> 
> > 
> > Usage: children.x.ignore-errors=true
> > 
> > Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> > Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> > Cc: Alberto Garcia <berto@igalia.com>
> > ---
> >  block/quorum.c       | 94 ++++++++++++++++++++++++++++++++++++++++++++++++----
> 
> Interface review only:
> 
> > +++ b/qapi/block-core.json
> > @@ -1411,6 +1411,8 @@
> >  # @allow-write-backing-file: #optional whether the backing file is opened in
> >  #                            read-write mode. It is only for backing file
> >  #                            (Since 2.5 default: false)
> > +# @ignore-errors: #options whether the child's I/O error should be ignored.
> 
> s/options/optional/
> s/error/errors/
> 
> > +#                 it is only for quorum's child.(Since 2.5 default: false)
> 
> Space after '.' in English sentences.
> 
> The fact that you are documenting that this option can only be specified
> for quorum children makes me wonder if it belongs better as an option in
> BlockdevOptionsQuorum rather than BlockdevOptionsBase.
> 
> Semantically, it sounds like you are trying to allow for a per-child
> decision of whether this particular child's errors matter to the overall
> quorum.  So, if we have a 3/5 quorum, we can decide that for children A,
> B, C, and D, errors cannot be ignored, but for child E, errors are not a
> problem.
> 
> As written, you are tying the semantics to each child BDS, and requiring
> special code to double-check that the property is only ever set if the
> BDS is used as the child of a quorum.  Furthermore, if the property is
> set, you are then changing what the child does in response to various
> operations.
> 
> What if you instead create a list property in the quorum parent?  Maybe
> along the lines of:
> 
> # @child-errors-okay: #optional an array of named-node children where
> errors will be ignored (Since 2.5, default empty)
> 
> { 'struct': 'BlockdevOptionsQuorum',
>   'data': { '*blkverify': 'bool',
>             'children': [ 'BlockdevRef' ],
>             'vote-threshold': 'int',
>             '*rewrite-corrupted': 'bool',
>             '*read-pattern': 'QuorumReadPattern',
>             '*child-errors-okay': ['str'] } }
> 
> The above example of a 3/5 quorum, where only child E can ignore errors,
> would then be represented as:
> 
> { "children": [ "A", "B", "C", "D", "E" ], 'vote-threshold':3,
> 'child-errors-okay': [ "E" ] }
> 
> The code to ignore the errors is then done in the quorum itself (the BDS
> for E does not have to care about a special ignore-errors property, but
> just always returns the error as usual; and then the quorum is deciding
> how to handle the error), and you are not polluting the BDS state for
> something that is quorum-specific, because it is now the quorum itself
> that tracks the special casing.
> 
> Finally, why can't hot-plug/unplug of quorum members work?  If you are
> going to always ignore errors from a particular child, then why is that
> child even part of the quorum?  Isn't a better design to just not add
> the child to the quorum until it is ready and won't be reporting errors?
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

  parent reply	other threads:[~2015-09-07 17:06 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-02  8:51 [Qemu-devel] [PATCH 00/16] Block replication for continuous checkpoints Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 01/16] introduce a new API to enable/disable attach device model Wen Congyang
2015-09-02 15:37   ` Eric Blake
2015-09-07  1:27     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 02/16] introduce a new API to check if blk is attached Wen Congyang
2015-09-02 15:40   ` Eric Blake
2015-09-02  8:51 ` [Qemu-devel] [PATCH 03/16] allow writing to the backing file Wen Congyang
2015-09-02 16:06   ` Eric Blake
2015-09-09  9:19     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 04/16] block: Allow references for backing files Wen Congyang
2015-09-02 18:50   ` Eric Blake
2015-09-09  8:51     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 05/16] introduce a new API qemu_opts_absorb_qdict_by_index() Wen Congyang
2015-09-02 19:01   ` Eric Blake
2015-09-07  2:18     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 06/16] quorum: allow ignoring child errors Wen Congyang
2015-09-02 16:30   ` Eric Blake
2015-09-07  3:40     ` Wen Congyang
2015-09-07 16:56     ` Dr. David Alan Gilbert [this message]
2015-09-08  0:46       ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 07/16] Backup: clear all bitmap when doing block checkpoint Wen Congyang
2015-09-02 14:10   ` Jeff Cody
2015-09-02  8:51 ` [Qemu-devel] [PATCH 08/16] block: make bdrv_put_ref_bh_schedule() as a public API Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 09/16] Allow creating backup jobs when opening BDS Wen Congyang
2015-09-02 14:12   ` Jeff Cody
2015-09-02  8:51 ` [Qemu-devel] [PATCH 10/16] docs: block replication's description Wen Congyang
2015-09-02 20:41   ` Eric Blake
2015-09-09  8:22     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 11/16] Add new block driver interfaces to control block replication Wen Congyang
2015-09-02 16:33   ` Eric Blake
2015-09-09  9:24     ` Wen Congyang
2015-09-25  6:14     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 12/16] skip nbd_target when starting " Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 13/16] quorum: implement block driver interfaces for " Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 14/16] Implement new driver " Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 15/16] support replication driver in blockdev-add Wen Congyang
2015-09-02 16:36   ` Eric Blake
2015-09-09  8:27     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 16/16] Add a new API to start/stop replication, do checkpoint to all BDSes Wen Congyang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150907165638.GJ2337@work-vm \
    --to=dgilbert@redhat.com \
    --cc=arei.gonglei@huawei.com \
    --cc=berto@igalia.com \
    --cc=eblake@redhat.com \
    --cc=eddie.dong@intel.com \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=mrhines@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=yanghy@cn.fujitsu.com \
    --cc=yunhong.jiang@intel.com \
    --cc=zhang.zhanghailiang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.