From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52140) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1apUCq-0006Jk-G8 for qemu-devel@nongnu.org; Mon, 11 Apr 2016 01:16:13 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1apUCo-0007zC-Rd for qemu-devel@nongnu.org; Mon, 11 Apr 2016 01:16:12 -0400 Message-ID: <570B33AA.7000902@cn.fujitsu.com> Date: Mon, 11 Apr 2016 13:18:34 +0800 From: Changlong Xie MIME-Version: 1.0 References: <1457578181-27111-1-git-send-email-xiecl.fnst@cn.fujitsu.com> <1457578181-27111-3-git-send-email-xiecl.fnst@cn.fujitsu.com> <56E653E0.9030808@cn.fujitsu.com> <56EA06E0.7000409@cn.fujitsu.com> <56EA7C62.3090000@cn.fujitsu.com> <20160317094831.GA2504@work-vm> <56EA7F39.9060504@cn.fujitsu.com> <56FAA168.9090304@redhat.com> <56FAA2C4.3000002@redhat.com> <56FAA47A.2020801@redhat.com> <56FBEBA3.9070305@redhat.com> In-Reply-To: <56FBEBA3.9070305@redhat.com> Content-Type: text/plain; charset="iso-8859-15"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child() List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Max Reitz , Alberto Garcia , Eric Blake , Wen Congyang , "Dr. David Alan Gilbert" Cc: qemu devel , Kevin Wolf , Stefan Hajnoczi , Markus Armbruster , Dong Eddie , Jiang Yunhong , qemu block , zhanghailiang , Gonglei On 03/30/2016 11:07 PM, Max Reitz wrote: > On 30.03.2016 13:39, Alberto Garcia wrote: >> On Tue 29 Mar 2016 05:51:22 PM CEST, Max Reitz wrote: >>>> It sounds like the argument here, and in Max's thread on >>>> query-block-node-tree, is that we DO have cases where order matters, and >>>> so we need a way for the hot-add operation to explicitly specify where >>>> in the list a child is inserted (whether it is being inserted as the new >>>> primary image, or explicitly as the last resort, or somewhere in the >>>> middle). An optional parameter, that defaults to appending, may be ok, >>>> but we definitely need to consider how the order of children is affected >>>> by hot-add. >>> >>> However, the order should be queriable after the fact, and there are >>> three ways I see to accomplish this: >>> >>> (1) Make this information queriable as driver-specific BDS information. >>> I personally don't like it very much, but it would be fine. >>> (2) Implement query-block-node-tree, make the order of child nodes >>> significant and thus represent the FIFO order there. I don't like >>> this because it would mean returning two orders through that child >>> node list: One is the numeric order (children.0, children.1, ...) >>> and another is the FIFO order, which are not necessarily equal. >>> (3) Fix FIFO order to the child name (its role). I'm very much in favor >>> of this. >>> >>> While I don't have good arguments against (1), I think I have good >>> arguments for (3) instead: It just doesn't make sense to have a numeric >>> order of children if this order doesn't mean anything; especially if you >>> suddenly do need the list of child nodes to be ordered. To me, it >>> doesn't make any sense to introduce a new hidden order which takes >>> precedence over this obvious user-visible order. >> >> I'm not sure if I understand correctly what you mean in (3). The >> user-visible FIFO order is the one specified when the Quorum is created: >> >> children.0.file.filename=hd0.qcow2, >> children.1.file.filename=hd1.qcow2, >> ... >> >> Would you then call those BDS children.0, children.1, etc > > They are already called that way; it's not their node name but their > "child role" name. > >> and make those >> names be the ones that actually define how they are ordered internally? > > Yes, that's what I meant. > Hi Max I think you just mean what i draw in below chart: 1) Insert 4 children. 0 1 2 3 +---------------------------------------------------- |children.0|children.1|children.2|children.3| +---------------------------------------------------- 2) Remove the 2th child (s->children[1]) { "execute": "x-blockdev-change", "arguments": { "parent": "xxx", "child": "children.1" } } 0 1 2 3 +---------------------------------------------------- |children.0|children.1|children.2|children.3| +------------------------+------------+-------------- | | +------+ +--------+ 0 1 | 2 | +----------------v----------v------------------------ |children.0|children.1|children.2| +---------------------------------------------------- Remove children.1 and shorten the array, then rename children.{2,3} to children.{1.2} 3) Insert a new child 0 1 2 3 +---------------------------------------------------- |children.0|children.1|children.2|children.3| +---------------------------------------------------- But as Wen said: http://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg00898.html Everytime we try to remove a children.i (i < n-1, so it's not the last element of the array[n]), we have to rename children.{i+1, n-1} to children.{i, n-2}. Thanks -Xie >> I also have another (not directly related) question: why not simply use >> the node name when removing children? I understood that the idea was >> that it's possible to have the same node attached twice to the same >> Quorum, but can you actually do that? And what's the use case? > > What I like about using the child role name is that it automatically > prevents you from specifying a node that is not a child of the given parent. > > Which makes me notice that it might be a good idea to require the user > to specify the child's role when adding a new child. In this version of > this series (where only quorum is supported), the children are just > inserted in numerical order (first free slot is taken first), but maybe > the user wants to insert them in a different order. > > For quorum, this is basically irrelevant if the order doesn't mean > anything (which I don't like), but it may be relevant for other block > drivers. > > And the "filling up quorum's children" topic then makes me notice that > (x-)blockdev-change should probably be transaction-able (so you can > restructure the whole BDS graph in a single transaction), but that's > something we can care about later on. > > Max >