qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Wen Congyang <wency@cn.fujitsu.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Eric Blake <eblake@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <famz@redhat.com>,
	zhanghailiang <zhang.zhanghailiang@huawei.com>,
	qemu block <qemu-block@nongnu.org>,
	Jiang Yunhong <yunhong.jiang@intel.com>,
	Dong Eddie <eddie.dong@intel.com>,
	qemu devel <qemu-devel@nongnu.org>,
	"Michael R. Hines" <mrhines@linux.vnet.ibm.com>,
	Max Reitz <mreitz@redhat.com>, Alberto Garcia <berto@igalia.com>,
	Gonglei <arei.gonglei@huawei.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Yang Hongyang <yanghy@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH 06/16] quorum: allow ignoring child errors
Date: Tue, 8 Sep 2015 08:46:36 +0800	[thread overview]
Message-ID: <55EE2FEC.8080408@cn.fujitsu.com> (raw)
In-Reply-To: <20150907165638.GJ2337@work-vm>

On 09/08/2015 12:56 AM, Dr. David Alan Gilbert wrote:
> * Eric Blake (eblake@redhat.com) wrote:
>> On 09/02/2015 02:51 AM, Wen Congyang wrote:
>>> If the child is not ready, read/write/getlength/flush will
>>> return -errno. It is not critical error, and can be ignored:
>>> 1. read/write:
>>>    Just not report the error event.
>>
>> What happens if all the children report an error?  Or is the threshold
>> at play here?
> 
> I think it's interesting because in the COLO case the intention isn't
> really about a threshold (in the way you might use for RAID or mirroring),
> it's that one of the stores is local (and not expected to error) and one
> is somewhere over a network, so if it fails you don't want to stop
> the local VM working.
> 
> However, if it fails we do need to know about it; if any write to
> the secondary stops then the fault-tolerance has failed (at least for
> that drive); so we should do *something* - I'm not sure what though.

The filter driver replication catches this error, and the COLO framework
will get this error when doing checkpoint.

Thanks
Wen Congyang

> 
> Dave
> 
> 
>> For example, if you have a threshold of 3/5, then I'm assuming that if
>> up to two children return an errno, then it is okay to ignore; but if
>> three or more return an errno, you haven't met threshold, so the I/O
>> must fail.
>>
>> Are you ignoring ALL errors (including things like EACCES), or just EIO
>> errors?
>>
>>
>>> 2. getlength:
>>>    just ignore it. If all children's getlength return -errno,
>>>    and be ignored, return -EIO.
>>> 3. flush:
>>>    Just ignore it. If all children's getlength return -errno,
>>
>> s/getlength/flush/
>>
>>>    and be ignored, return 0.
>>
>> Yuck - claiming success when all of the children fail feels dangerous.
>>
>>>
>>> Usage: children.x.ignore-errors=true
>>>
>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>>> Cc: Alberto Garcia <berto@igalia.com>
>>> ---
>>>  block/quorum.c       | 94 ++++++++++++++++++++++++++++++++++++++++++++++++----
>>
>> Interface review only:
>>
>>> +++ b/qapi/block-core.json
>>> @@ -1411,6 +1411,8 @@
>>>  # @allow-write-backing-file: #optional whether the backing file is opened in
>>>  #                            read-write mode. It is only for backing file
>>>  #                            (Since 2.5 default: false)
>>> +# @ignore-errors: #options whether the child's I/O error should be ignored.
>>
>> s/options/optional/
>> s/error/errors/
>>
>>> +#                 it is only for quorum's child.(Since 2.5 default: false)
>>
>> Space after '.' in English sentences.
>>
>> The fact that you are documenting that this option can only be specified
>> for quorum children makes me wonder if it belongs better as an option in
>> BlockdevOptionsQuorum rather than BlockdevOptionsBase.
>>
>> Semantically, it sounds like you are trying to allow for a per-child
>> decision of whether this particular child's errors matter to the overall
>> quorum.  So, if we have a 3/5 quorum, we can decide that for children A,
>> B, C, and D, errors cannot be ignored, but for child E, errors are not a
>> problem.
>>
>> As written, you are tying the semantics to each child BDS, and requiring
>> special code to double-check that the property is only ever set if the
>> BDS is used as the child of a quorum.  Furthermore, if the property is
>> set, you are then changing what the child does in response to various
>> operations.
>>
>> What if you instead create a list property in the quorum parent?  Maybe
>> along the lines of:
>>
>> # @child-errors-okay: #optional an array of named-node children where
>> errors will be ignored (Since 2.5, default empty)
>>
>> { 'struct': 'BlockdevOptionsQuorum',
>>   'data': { '*blkverify': 'bool',
>>             'children': [ 'BlockdevRef' ],
>>             'vote-threshold': 'int',
>>             '*rewrite-corrupted': 'bool',
>>             '*read-pattern': 'QuorumReadPattern',
>>             '*child-errors-okay': ['str'] } }
>>
>> The above example of a 3/5 quorum, where only child E can ignore errors,
>> would then be represented as:
>>
>> { "children": [ "A", "B", "C", "D", "E" ], 'vote-threshold':3,
>> 'child-errors-okay': [ "E" ] }
>>
>> The code to ignore the errors is then done in the quorum itself (the BDS
>> for E does not have to care about a special ignore-errors property, but
>> just always returns the error as usual; and then the quorum is deciding
>> how to handle the error), and you are not polluting the BDS state for
>> something that is quorum-specific, because it is now the quorum itself
>> that tracks the special casing.
>>
>> Finally, why can't hot-plug/unplug of quorum members work?  If you are
>> going to always ignore errors from a particular child, then why is that
>> child even part of the quorum?  Isn't a better design to just not add
>> the child to the quorum until it is ready and won't be reporting errors?
>>
>> -- 
>> Eric Blake   eblake redhat com    +1-919-301-3266
>> Libvirt virtualization library http://libvirt.org
>>
> 
> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 

  reply	other threads:[~2015-09-08  0:47 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-02  8:51 [Qemu-devel] [PATCH 00/16] Block replication for continuous checkpoints Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 01/16] introduce a new API to enable/disable attach device model Wen Congyang
2015-09-02 15:37   ` Eric Blake
2015-09-07  1:27     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 02/16] introduce a new API to check if blk is attached Wen Congyang
2015-09-02 15:40   ` Eric Blake
2015-09-02  8:51 ` [Qemu-devel] [PATCH 03/16] allow writing to the backing file Wen Congyang
2015-09-02 16:06   ` Eric Blake
2015-09-09  9:19     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 04/16] block: Allow references for backing files Wen Congyang
2015-09-02 18:50   ` Eric Blake
2015-09-09  8:51     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 05/16] introduce a new API qemu_opts_absorb_qdict_by_index() Wen Congyang
2015-09-02 19:01   ` Eric Blake
2015-09-07  2:18     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 06/16] quorum: allow ignoring child errors Wen Congyang
2015-09-02 16:30   ` Eric Blake
2015-09-07  3:40     ` Wen Congyang
2015-09-07 16:56     ` Dr. David Alan Gilbert
2015-09-08  0:46       ` Wen Congyang [this message]
2015-09-02  8:51 ` [Qemu-devel] [PATCH 07/16] Backup: clear all bitmap when doing block checkpoint Wen Congyang
2015-09-02 14:10   ` Jeff Cody
2015-09-02  8:51 ` [Qemu-devel] [PATCH 08/16] block: make bdrv_put_ref_bh_schedule() as a public API Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 09/16] Allow creating backup jobs when opening BDS Wen Congyang
2015-09-02 14:12   ` Jeff Cody
2015-09-02  8:51 ` [Qemu-devel] [PATCH 10/16] docs: block replication's description Wen Congyang
2015-09-02 20:41   ` Eric Blake
2015-09-09  8:22     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 11/16] Add new block driver interfaces to control block replication Wen Congyang
2015-09-02 16:33   ` Eric Blake
2015-09-09  9:24     ` Wen Congyang
2015-09-25  6:14     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 12/16] skip nbd_target when starting " Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 13/16] quorum: implement block driver interfaces for " Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 14/16] Implement new driver " Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 15/16] support replication driver in blockdev-add Wen Congyang
2015-09-02 16:36   ` Eric Blake
2015-09-09  8:27     ` Wen Congyang
2015-09-02  8:51 ` [Qemu-devel] [PATCH 16/16] Add a new API to start/stop replication, do checkpoint to all BDSes Wen Congyang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55EE2FEC.8080408@cn.fujitsu.com \
    --to=wency@cn.fujitsu.com \
    --cc=arei.gonglei@huawei.com \
    --cc=berto@igalia.com \
    --cc=dgilbert@redhat.com \
    --cc=eblake@redhat.com \
    --cc=eddie.dong@intel.com \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=mrhines@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=yanghy@cn.fujitsu.com \
    --cc=yunhong.jiang@intel.com \
    --cc=zhang.zhanghailiang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).