From: "Dr. David Alan Gilbert"
Date: Tue, 29 Mar 2016 18:33:26 +0100
Subject: Re: [Qemu-devel] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()
To: Max Reitz
Cc: Kevin Wolf, Changlong Xie, Alberto Garcia, qemu block, Jiang Yunhong,
 Dong Eddie, qemu devel, Markus Armbruster, Gonglei, Stefan Hajnoczi,
 zhanghailiang
Message-ID: <20160329173326.GK2240@work-vm>
In-Reply-To: <56FAA8A3.6010604@redhat.com>
References: <20160317094831.GA2504@work-vm> <56EA7F39.9060504@cn.fujitsu.com>
 <56FAA168.9090304@redhat.com> <56FAA2C4.3000002@redhat.com>
 <20160329155024.GH2240@work-vm> <56FAA4BB.3080300@redhat.com>
 <20160329155426.GI2240@work-vm> <56FAA65C.3080107@redhat.com>
 <20160329160309.GJ2240@work-vm> <56FAA8A3.6010604@redhat.com>

* Max Reitz (mreitz@redhat.com) wrote:
> On 29.03.2016 18:03, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mreitz@redhat.com) wrote:
> >> On 29.03.2016 17:54, Dr. David Alan Gilbert wrote:
> >>> * Max Reitz (mreitz@redhat.com) wrote:
> >>>> On 29.03.2016 17:50, Dr. David Alan Gilbert wrote:
> >>>>> * Eric Blake (eblake@redhat.com) wrote:
> >>>>>> On 03/29/2016 09:38 AM, Max Reitz wrote:
> >>>>>>> On 17.03.2016 10:56, Wen Congyang wrote:
> >>>>>>>> On 03/17/2016 05:48 PM, Dr. David Alan Gilbert wrote:
> >>>>>>>
> >>>>>>> [...]
> >>>>>>>
> >>>>>>>>> The children.0 notation is really confusing in the way that Berto
> >>>>>>>>> describes; I hit this a couple of months ago and it really doesn't
> >>>>>>>>> make sense.
> >>>>>>>>
> >>>>>>>> Do you mean: read from children.1 first, and then read from children.0 in
> >>>>>>>> fifo mode? Yes, the behavior is very strange.
> >>>>>>>
> >>>>>>> So is this intended or is it not? In
> >>>>>>> http://lists.nongnu.org/archive/html/qemu-block/2016-03/msg00526.html
> >>>>>>> you said that it is.
> >>>>>>>
> >>>>>>> I myself would indeed say it is very strange. If I were a user, I would
> >>>>>>> not expect this behavior. And as a developer, I think that how a BDS's
> >>>>>>> child is used by its parent should solely depend on its role (e.g.
> >>>>>>> whether it is "children.0" or "children.1").
> >>>>>>
> >>>>>> It sounds like the argument here, and in Max's thread on
> >>>>>> query-block-node-tree, is that we DO have cases where order matters, and
> >>>>>> so we need a way for the hot-add operation to explicitly specify where
> >>>>>> in the list a child is inserted (whether it is being inserted as the new
> >>>>>> primary image, or explicitly as the last resort, or somewhere in the
> >>>>>> middle). An optional parameter, that defaults to appending, may be ok,
> >>>>>> but we definitely need to consider how the order of children is affected
> >>>>>> by hot-add.
> >>>>>
> >>>>> Certainly in the COLO case the two children are not identical; and IMHO we need
> >>>>> to get away from thinking about ordering and start thinking about functional
> >>>>> naming - children.0/children.1 doesn't suggest the fact they behave
> >>>>> differently.
> >>>>
> >>>> To me it does. If quorum is operating in a mode called "FIFO" I would
> >>>> expect some order on the child nodes, and if the child nodes are
> >>>> actually numbered in an ascending order, that is an obvious order.
> >>>
> >>> I don't understand why it's called 'FIFO'.
> >>
> >> Because in that mode quorum successively reads from all of its children
> >> and returns the first successful result. So the First successful Input
> >> is the one that becomes quorum's Output (there isn't much of a
> >> successive output, so it doesn't make much sense to call that the First
> >> Output, though...).
> >>
> >> I didn't name it, though. *waves hands defensively* :-)
> >
> > But that description doesn't make sense for what COLO uses it for.
> >
> > They have, on the primary host:
> >    0) Local disk
> >    1) an NBD connection to the secondary
> >
> > So in theory a read should always happen from (0) and writes should
> > go to both.
>
> Well that's the way it works, isn't it?
>
> I didn't mention what happens with writes, but those are indeed
> distributed to all of quorum's children. And as long as the local disk
> doesn't fail, data is always read from it alone.

I guess so, but it seems to be odd to name something after an ordering
when you never expect it to actually perform the read from anything other
than the first; and certainly for fault tolerance stuff I think it's
important to define the failure modes.

Dave

> All you need to do is make sure that the local disk is the first node in
> whatever order FIFO is supposed to use.
>
> Max
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
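
The FIFO behaviour described in this thread (reads are tried child by
child in order and the first success wins; writes are sent to every
child) can be illustrated with a toy model. This is only a sketch in
Python under stated assumptions, not QEMU's quorum code: the names
Child, FifoQuorum, fail_reads and ChildReadError are invented for
illustration, and error handling for writes is left out entirely.

```python
class ChildReadError(Exception):
    """Raised when a (hypothetical) child cannot serve a read."""


class Child:
    """Toy stand-in for one quorum child, backed by a bytearray."""

    def __init__(self, name, size):
        self.name = name
        self.data = bytearray(size)
        self.fail_reads = False  # illustrative switch to simulate a failed disk

    def read(self, offset, length):
        if self.fail_reads:
            raise ChildReadError(f"{self.name}: read failed")
        return bytes(self.data[offset:offset + length])

    def write(self, offset, buf):
        self.data[offset:offset + len(buf)] = buf


class FifoQuorum:
    """Toy model of the FIFO read pattern discussed in the thread."""

    def __init__(self, children):
        # List order is the read order: index 0 is tried first.
        self.children = list(children)

    def read(self, offset, length):
        last_error = None
        for child in self.children:
            try:
                # First successful read becomes the result.
                return child.name, child.read(offset, length)
            except ChildReadError as err:
                last_error = err
        raise ChildReadError("all children failed") from last_error

    def write(self, offset, buf):
        # Writes are distributed to every child.
        for child in self.children:
            child.write(offset, buf)


# COLO-like layout from the thread: local disk first, NBD link second.
local = Child("local-disk", 16)
nbd = Child("nbd-secondary", 16)
quorum = FifoQuorum([local, nbd])
quorum.write(0, b"abcd")
source, payload = quorum.read(0, 4)  # served by the local disk while it is healthy
```

In this model the second child is only consulted once the first one
fails, so which child sits at index 0 decides where reads normally come
from. That is why the position at which a hot-added child is inserted
matters for this mode.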