Date: Sat, 21 Jun 2014 17:40:28 +0200
From: Benoît Canet
To: Benoît Canet
Cc: kwolf@redhat.com, jcody@redhat.com, Fam Zheng, qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH] block: Make op blocker recursive
Message-ID: <20140621154028.GC2108@irqsave.net>
In-Reply-To: <20140621153911.GB2108@irqsave.net>
References: <1403208081-18247-1-git-send-email-benoit.canet@irqsave.net>
 <1403208081-18247-2-git-send-email-benoit.canet@irqsave.net>
 <53A34460.8010302@redhat.com>
 <20140619202043.GA18306@irqsave.net>
 <20140620050106.GB15938@T430.redhat.com>
 <53A453AF.3@redhat.com>
 <20140621085358.GA11607@T430.redhat.com>
 <20140621104551.GA986@irqsave.net>
 <20140621151519.GA14173@T430>
 <20140621153911.GB2108@irqsave.net>

The Saturday 21 Jun 2014 à 17:39:11 (+0200), Benoît Canet wrote :
> The Saturday 21 Jun 2014 à 23:15:19 (+0800), Fam Zheng wrote :
> > On Sat, 06/21 12:45, Benoît Canet wrote:
> > > The Saturday 21 Jun 2014 à 16:53:58 (+0800), Fam Zheng wrote :
> > > > On Fri, 06/20 09:30, Eric Blake wrote:
> > > > > On 06/19/2014 11:01 PM, Fam Zheng wrote:
> > > > > > On Thu, 06/19 22:20, Benoît Canet wrote:
> > > > > >> The Thursday 19 Jun 2014 à 14:13:20 (-0600), Eric Blake wrote :
> > > > > >>> On 06/19/2014 02:01 PM, Benoît Canet wrote:
> > > > > >>>> As the code will start to operate on arbitratry nodes we need the op blocker
> > > > > >>>
> > > > > >>> s/arbitratry/arbitrary/
> > > > > >>>
> > > > > >>>> to recursively block or unblock whole BDS subtrees.
> > > > > >
> > > > > > I don't get the reason, can you elaborate?
> > > > >
> > > > > Consider what happens if I have:
> > > > >
> > > > > base <- snap1 <- active
> > > > >
> > > > > then I start a fleecing NBD server on the state as it was at snap1:
> > > > >
> > > > > base <- snap1 <- active
> > > > >               \- fleecing
> > > > >
> > > > > then I do a blockpull into active:
> > > > >
> > > > > base <- snap1 <- fleecing
> > > > > active
> > > > >
> > > > > at this point, base and snap1 are no longer tied to active, but they
> > > > > STILL must be protected from operations that would modify their contents
> > > > > in a way that would break the fleecing operation.  The solution we are
> > > > > looking at is making BDS blockers recursive to every element of the
> > > > > chain, not just the top-level device.
> > > >
> > > > This would already have been protected by backing blocker of fleecing target.
> > > >
> > > > >
> > > > > Another example: consider:
> > > > >
> > > > > base <- snap1 <- active
> > > > >
> > > > > then someone uses Jeff's proposed new change-backing-file QMP command to
> > > > > rewrite the snap1 metadata to point to base via a relative name instead
> > > > > of an absolute name.  It shouldn't matter whether active is blocked, but
> > > > > only whether snap1 is blocked.  But to know if snap1 is blocked, we have
> > > > > to propagate the blockers of active down recursively to its backing files.
> > > >
> > > > Why do we need to block changing of metadata? I think this operation is safe
> > > > in most cases.
> > > >
> > > > Correct me if I'm missing anything, but even if snap1 _is_ blocked, it would be
> > > > because snap1 is serving as backing of active. In this case, the actual blocker
> > > > should be active->backing_blocker.
> > > >
> > > > >
> > > > > >> What would be a cleaner solution ?
> > > > > >
> > > > > > What is the question to solve?
> > > > >
> > > > > I think Jeff's idea is on target - rather than blocking by operation, we
> > > > > should instead be blocking on access patterns (various operations
> > > > > trigger several access patterns):
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg04752.html
> > > > >
> > > > > Jeff's initial list included:
> > > > >
> > > > > > So if I think of operations that are done on block devices from a
> > > > > > block job, and chuck them into categories, I think we have:
> > > > > >
> > > > > > 1) Read of guest-visible data
> > > > > > 2) Write of guest-visible data
> > > > > > 3) Read of host-visible data (e.g. image file metadata)
> > > > > > 4) Write of host-visible data (e.g. image file metadata, such as
> > > > > >    the backing-file)
> > > > > > 5) Block chain manipulations (e.g. movement of a BDS, change to r/w
> > > > > >    instead of r/o, etc..)
> > > > > > 6) I/O attribute changes (e.g. throttling, etc..)
> > > >
> > > > Most operations look safe to me, given the way IOThreads and coroutines
> > > > work now. It's only the chain manipulations in long running block jobs that are
> > > > exclusive, and by nature it should be checked per chain.  Can we set some op
> > > > blockers on the bottom BDS and check it each time, to prevent the user from
> > > > starting a second chain manipulator?
> > >
> > > I don't know if bottom BDS locking is any good because some drivers like quorum
> > > have multiple children.
> > > Locking the root (top) of the tree every time seems a feasible solution indeed.
> >
> > Quorum doesn't change the conventions of backing chain, so each child belongs
> > to its own backing chain, and that chain has a deterministic top and bottom.
> >
> > Blocking flag on bottom saves us from adding reverse lookup (->overlay
> > pointer), because we already have the ->backing_hd pointer in BDS.
>
> I like the consequence that when a loop is formed like in commit's drive-mirror
> run code and must be unlocked the bottom locked BDS will act as a guard to prevent
> unlock loop cycling.
>
> We still have the issue of unlocking the bottom BDS when a subtree is detached
> from the graphs by a swap. (It does happen in my drive-mirror arbitrary node
> replacement series).
>
> From my understanding the unlocking of the root BDS is done by drive_mirror_complete
> while the mirror code tries to unref the orphaned subtree _before_ drive_mirror_complete
> is called.

One fix to my sentence: s/drive_mirror_complete/block_job_complete/

>
> So the bottom BDS would be unrefed before being unlocked.
>
> Best regards
>
> Benoît
>
> >
> > Fam
> >
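
A minimal sketch in C of the "blocking flag on the bottom BDS" idea discussed in the thread. Only the backing_hd pointer name comes from the thread. Node, chain_blocked, chain_bottom(), chain_try_block() and chain_unblock() are hypothetical names used for illustration and are not QEMU's actual op blocker API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BlockDriverState. Only backing_hd is a
 * name taken from the thread; the other fields are assumptions. */
typedef struct Node {
    const char *name;
    struct Node *backing_hd;   /* next image down the backing chain */
    bool chain_blocked;        /* only meaningful on the bottom node */
} Node;

/* Follow backing_hd until reaching the node with no backing file.
 * Assumes the chain is acyclic. */
static Node *chain_bottom(Node *n)
{
    assert(n != NULL);
    while (n->backing_hd != NULL) {
        n = n->backing_hd;
    }
    return n;
}

/* Try to reserve the whole chain for one chain manipulator.
 * Returns false if another manipulator already holds it. */
static bool chain_try_block(Node *n)
{
    Node *bottom = chain_bottom(n);

    if (bottom->chain_blocked) {
        return false;
    }
    bottom->chain_blocked = true;
    return true;
}

/* Release the reservation, starting from any node of the chain. */
static void chain_unblock(Node *n)
{
    chain_bottom(n)->chain_blocked = false;
}
```

Every node of one backing chain reaches the same bottom node, so in this sketch a second chain manipulator is refused whichever node it starts from, and no reverse (->overlay) pointer is needed. The sketch does not cover the detached-subtree case raised in the message, where a swap orphans part of the chain and the old bottom may be unrefed while its flag is still set.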