From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:54857) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1agUin-0002KD-N1 for qemu-devel@nongnu.org; Thu, 17 Mar 2016 06:00:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1agUim-0002um-Fj for qemu-devel@nongnu.org; Thu, 17 Mar 2016 06:00:01 -0400 Date: Thu, 17 Mar 2016 09:59:48 +0000 From: "Dr. David Alan Gilbert" Message-ID: <20160317095947.GA5966@work-vm> References: <1457578181-27111-1-git-send-email-xiecl.fnst@cn.fujitsu.com> <1457578181-27111-3-git-send-email-xiecl.fnst@cn.fujitsu.com> <56E653E0.9030808@cn.fujitsu.com> <56EA06E0.7000409@cn.fujitsu.com> <56EA7C62.3090000@cn.fujitsu.com> <20160317094831.GA2504@work-vm> <56EA7F39.9060504@cn.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <56EA7F39.9060504@cn.fujitsu.com> Subject: Re: [Qemu-devel] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child() List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Wen Congyang Cc: Kevin Wolf , Changlong Xie , Alberto Garcia , zhanghailiang , qemu block , Markus Armbruster , Jiang Yunhong , Dong Eddie , qemu devel , Max Reitz , Gonglei , Stefan Hajnoczi * Wen Congyang (wency@cn.fujitsu.com) wrote: > On 03/17/2016 05:48 PM, Dr. David Alan Gilbert wrote: > > * Wen Congyang (wency@cn.fujitsu.com) wrote: > >> On 03/17/2016 05:10 PM, Alberto Garcia wrote: > >>> On Thu 17 Mar 2016 02:22:40 AM CET, Wen Congyang wrote: > >>>>>>>> @@ -81,6 +82,8 @@ typedef struct BDRVQuorumState { > >>>>>>>> bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted > >>>>>>>> * block if Quorum is reached. > >>>>>>>> */ > >>>>>>>> + unsigned long *index_bitmap; > >>>>>> > >>>>>> Hi Berto > >>>>>> > >>>>>> *NOTE*, In the old version, we just used "bs->node_name", but in the > >>>>>> lastest one, as Kevin suggested we introduce > >>>>>> "child->child_name"(formart as "children.xxx"), this is the key cause > >>>>>> why we need this two functions here. > >>>>> > >>>>> I'm sorry I missed this discussion earlier. Your code seems technically > >>>>> correct but I have several questions: > >>>>> > >>>>> - I read that one of the reasons for this change is that "In theory, the > >>>>> same node could be attached twice to the same parent in different > >>>>> roles.". Is there any example of that? What's the use case? > >>>> > >>>> Kevin may know the case. > >>> > >>> Kevin, do you have an example? > >>> > >>>>> - How do you obtain the child name? > >>>> > >>>> IIRC, the answer is no now. I think we can improve 'info block' output > >>> > >>> Okay, but then we should extend that first, otherwise this API cannot be > >>> used. > >>> > >>>>> - I see that if you have children.0 and children.1 (let's say hd0.qcow2 > >>>>> and hd1.qcow2), then you remove children.0 and add it again, it will > >>>>> keep the 'children.0' name (that's what the bitmap is for if I'm > >>>>> understanding it correctly). However the position in the s->children > >>>>> array will change because you do memmove() when you remove children.0 > >>>>> and then add it again to the end of the array. > >>>>> > >>>>> Initial status: > >>>>> > >>>>> s->children[0] <--> "children.0" (hd0.qcow2) > >>>>> s->children[1] <--> "children.1" (hd1.qcow2) > >>>>> > >>>>> children.0 (hd0.qcow2) is removed: > >>>>> > >>>>> s->children[0] <--> "children.1" (hd1.qcow2) > >>>>> > >>>>> children.0 (hd0.qcow2) is added again: > >>>>> > >>>>> s->children[0] <--> "children.1" (hd1.qcow2) > >>>>> s->children[1] <--> "children.0" (hd0.qcow2) > >>>> > >>>> Yes, it is correct. > >>>> > >>>>> > >>>>> Is this correct? Is this the indented behavior? Since you are reading > >>>>> in FIFO mode, now hd1.qcow2 will always be read first, so if > >>>>> children.1 was the secondary disk, it has just become the primary. > >>>> > >>>> Yes. > >>> > >>> And don't you need a way to control the order in which the disks must be > >>> read for COLO? > >> > >> I think in fifo mode, we should read the disk first that is added earlier. > >> > >> We don't need a way to control the order now. > > > > Can you document fully how it's used in COLO then? > > Do you mean document it in docs/block-replication.txt? That would be OK. > > We should have the failure modes documented, and how you'll use > > it after failover etc Without that it's really difficult to tell > > if this naming is right. > > For COLO, children.0 is the real disk, children.1 is replication driver. > After failure, children.1 will be removed by the user. If we want to > continue do COLO, we need add a new children.1 again. So you need to document how to do that. > > The children.0 notation is really confusing in the way that Berto > > describes; I hit this a couple of months ago and it really doesn't > > make sense. > > Do you mean: read from children.1 first, and then read from children.0 in > fifo mode? Yes, the behavior is very strange. I mean the 'children.0' 'children.1' naming is just very confusing. Also because the order in the array is important it's even more confusing since the 'children.1' isn't necessarily the children[1]. Dave > > Thanks > Wen Congyang > > > > > Dave > > > >> > >> Thanks > >> Wen Congyang > >> > >>> > >>> Berto > >>> > >>> > >>> . > >>> > >> > >> > >> > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > > > > > . > > > > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK