From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53863) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1agUdo-00051Q-Hz for qemu-devel@nongnu.org; Thu, 17 Mar 2016 05:54:53 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1agUdn-0000ST-FI for qemu-devel@nongnu.org; Thu, 17 Mar 2016 05:54:52 -0400 References: <1457578181-27111-1-git-send-email-xiecl.fnst@cn.fujitsu.com> <1457578181-27111-3-git-send-email-xiecl.fnst@cn.fujitsu.com> <56E653E0.9030808@cn.fujitsu.com> <56EA06E0.7000409@cn.fujitsu.com> <56EA7C62.3090000@cn.fujitsu.com> <20160317094831.GA2504@work-vm> From: Wen Congyang Message-ID: <56EA7F39.9060504@cn.fujitsu.com> Date: Thu, 17 Mar 2016 17:56:09 +0800 MIME-Version: 1.0 In-Reply-To: <20160317094831.GA2504@work-vm> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child() List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Dr. David Alan Gilbert" Cc: Kevin Wolf , Changlong Xie , Alberto Garcia , zhanghailiang , qemu block , Markus Armbruster , Jiang Yunhong , Dong Eddie , qemu devel , Max Reitz , Gonglei , Stefan Hajnoczi On 03/17/2016 05:48 PM, Dr. David Alan Gilbert wrote: > * Wen Congyang (wency@cn.fujitsu.com) wrote: >> On 03/17/2016 05:10 PM, Alberto Garcia wrote: >>> On Thu 17 Mar 2016 02:22:40 AM CET, Wen Congyang wrote: >>>>>>>> @@ -81,6 +82,8 @@ typedef struct BDRVQuorumState { >>>>>>>> bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted >>>>>>>> * block if Quorum is reached. >>>>>>>> */ >>>>>>>> + unsigned long *index_bitmap; >>>>>> >>>>>> Hi Berto >>>>>> >>>>>> *NOTE*, In the old version, we just used "bs->node_name", but in the >>>>>> lastest one, as Kevin suggested we introduce >>>>>> "child->child_name"(formart as "children.xxx"), this is the key cause >>>>>> why we need this two functions here. >>>>> >>>>> I'm sorry I missed this discussion earlier. Your code seems technically >>>>> correct but I have several questions: >>>>> >>>>> - I read that one of the reasons for this change is that "In theory, the >>>>> same node could be attached twice to the same parent in different >>>>> roles.". Is there any example of that? What's the use case? >>>> >>>> Kevin may know the case. >>> >>> Kevin, do you have an example? >>> >>>>> - How do you obtain the child name? >>>> >>>> IIRC, the answer is no now. I think we can improve 'info block' output >>> >>> Okay, but then we should extend that first, otherwise this API cannot be >>> used. >>> >>>>> - I see that if you have children.0 and children.1 (let's say hd0.qcow2 >>>>> and hd1.qcow2), then you remove children.0 and add it again, it will >>>>> keep the 'children.0' name (that's what the bitmap is for if I'm >>>>> understanding it correctly). However the position in the s->children >>>>> array will change because you do memmove() when you remove children.0 >>>>> and then add it again to the end of the array. >>>>> >>>>> Initial status: >>>>> >>>>> s->children[0] <--> "children.0" (hd0.qcow2) >>>>> s->children[1] <--> "children.1" (hd1.qcow2) >>>>> >>>>> children.0 (hd0.qcow2) is removed: >>>>> >>>>> s->children[0] <--> "children.1" (hd1.qcow2) >>>>> >>>>> children.0 (hd0.qcow2) is added again: >>>>> >>>>> s->children[0] <--> "children.1" (hd1.qcow2) >>>>> s->children[1] <--> "children.0" (hd0.qcow2) >>>> >>>> Yes, it is correct. >>>> >>>>> >>>>> Is this correct? Is this the indented behavior? Since you are reading >>>>> in FIFO mode, now hd1.qcow2 will always be read first, so if >>>>> children.1 was the secondary disk, it has just become the primary. >>>> >>>> Yes. >>> >>> And don't you need a way to control the order in which the disks must be >>> read for COLO? >> >> I think in fifo mode, we should read the disk first that is added earlier. >> >> We don't need a way to control the order now. > > Can you document fully how it's used in COLO then? Do you mean document it in docs/block-replication.txt? > We should have the failure modes documented, and how you'll use > it after failover etc Without that it's really difficult to tell > if this naming is right. For COLO, children.0 is the real disk, children.1 is replication driver. After failure, children.1 will be removed by the user. If we want to continue do COLO, we need add a new children.1 again. > The children.0 notation is really confusing in the way that Berto > describes; I hit this a couple of months ago and it really doesn't > make sense. Do you mean: read from children.1 first, and then read from children.0 in fifo mode? Yes, the behavior is very strange. Thanks Wen Congyang > > Dave > >> >> Thanks >> Wen Congyang >> >>> >>> Berto >>> >>> >>> . >>> >> >> >> > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > > . >