From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51518) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1asb5G-0007hE-GT for qemu-devel@nongnu.org; Tue, 19 Apr 2016 15:13:15 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1asb5F-0002TA-52 for qemu-devel@nongnu.org; Tue, 19 Apr 2016 15:13:14 -0400 References: <1460690887-32751-1-git-send-email-famz@redhat.com> <1460690887-32751-5-git-send-email-famz@redhat.com> <5712CAD6.9040707@redhat.com> <20160418013334.GC18893@ad-mail.usersys.redhat.com> From: Max Reitz Message-ID: <5716833F.5070404@redhat.com> Date: Tue, 19 Apr 2016 21:13:03 +0200 MIME-Version: 1.0 In-Reply-To: <20160418013334.GC18893@ad-mail.usersys.redhat.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="MIJU1LbPVR9m38PJDM06oXS1tLuJG1JMV" Subject: Re: [Qemu-devel] [PATCH for-2.7 v2 04/17] block: Introduce image file locking List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Fam Zheng Cc: qemu-devel@nongnu.org, Kevin Wolf , Jeff Cody , Markus Armbruster , Eric Blake , John Snow , qemu-block@nongnu.org, berrange@redhat.com, pbonzini@redhat.com, den@openvz.org This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --MIJU1LbPVR9m38PJDM06oXS1tLuJG1JMV Content-Type: multipart/mixed; boundary="xUWtEajp0VuqhtDjEODQdpLg0cRAnkoLt" From: Max Reitz To: Fam Zheng Cc: qemu-devel@nongnu.org, Kevin Wolf , Jeff Cody , Markus Armbruster , Eric Blake , John Snow , qemu-block@nongnu.org, berrange@redhat.com, pbonzini@redhat.com, den@openvz.org Message-ID: <5716833F.5070404@redhat.com> Subject: Re: [PATCH for-2.7 v2 04/17] block: Introduce image file locking References: <1460690887-32751-1-git-send-email-famz@redhat.com> <1460690887-32751-5-git-send-email-famz@redhat.com> <5712CAD6.9040707@redhat.com> <20160418013334.GC18893@ad-mail.usersys.redhat.com> In-Reply-To: <20160418013334.GC18893@ad-mail.usersys.redhat.com> --xUWtEajp0VuqhtDjEODQdpLg0cRAnkoLt Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 18.04.2016 03:33, Fam Zheng wrote: > On Sun, 04/17 01:29, Max Reitz wrote: >> On 15.04.2016 05:27, Fam Zheng wrote: >>> Block drivers can implement this new operation .bdrv_lockf to actuall= y lock the >>> image in the protocol specific way. >>> >>> Signed-off-by: Fam Zheng >>> --- >>> block.c | 42 +++++++++++++++++++++++++++++++++++++= +++++ >>> include/block/block_int.h | 12 ++++++++++++ >>> 2 files changed, 54 insertions(+) >> >> I'm prepared for everyone hating this idea, but I'm just bringing it u= p >> so I can always say I did bring it up. >> >> Heads up: This will be about qcow2 locking again. >> >> Relax, though, it won't be about how much better qcow2 locking is bett= er >> than protocol locking. >> >> Now that you know this feel free to drop out. >> >> This patch implements locking by just trying to lock every single BDS >> that is being opened. While it may fulfill its purpose, I don't think >> that is what we actually want. >> >> What we want is the following: qemu has a BDS graph. It is basically a= >> forest of trees. It may be a bit more complicated (DAGs instead of >> trees), but let's just assume it is. >> >> What we want to protect are leaves in this tree. Every leaf basically >> corresponds to a physical resource such as a file or an NBD connection= =2E >> Every leaf is driven by a protocol block driver. We want to protect >> these physical resources from concurrent access. >> >> Ideally, we can just protect the physical resource itself. This works >> for raw-posix, this works for gluster, this works for raw-win32, and >> probably some other protocols, too. But I guess it won't work for all >> protocols, and even if it does, it would need to be implemented. >> >> But we can protect leaves in the BDS forest by locking non-leaves also= : >> If you lock a qcow2 node, all of its "file" subtree will be protected;= >> normally, that's just a single leaf. >> >> Therefore, I think the ideal approach would be for each BDS tree that = is >> to be created we try to lock all of its leaves, and if that does not >> work for some, we walk up the tree and try to lock inner nodes (e.g. >> format BDSs which then use format locking) so that the leaves are stil= l >> protected even if their protocol does not support that. >> >> This could be implemented like this: Whenever a leaf BDS is created, t= ry >> to lock it. If we can't, leave some information to the parent node tha= t >> its child could not be locked. Then, the parent will evaluate this >> information and try to act upon it. This then recurses up the tree. Or= , >> well, down the tree, considering that in most natural trees the root i= s >> at the bottom. >> >> >> We could just implement qcow2 locking on top of this series as it is, >> but this would result in qcow2 files being locked even if their files'= >> protocol nodes have been successfully locked. That would be superfluou= s >> and we'd have all the issues with force-unlocking qcow2 files we have >> discussed before. >> >> >> So what am I saying? I think that it makes sense to consider format >> locking as a backup alternative to protocol locking in case the latter= >> is not possible. I think it is possible to implement both using the sa= me >> framework. >> >> I don't think we need to worry about the actual implementation of form= at >> locking now. But I do think having a framework which supports both >> format and protocol locking is possible and would be nice to have. >> >> Such a framework would require more effort, however, than the basicall= y >> brute-force "just lock everything" method presented in this patch. Don= 't >> get me wrong, this method here works for what it's supposed to do (I >> haven't reviewed it yet, though), and it's very reasonable if protocol= >> locking is all we intend to have. I'm just suggesting that maybe we do= >> want to have more than that. >> >> >> All in all, I won't object if the locking framework introduced by this= >> series is not supposed to and does not work with format locking. It ca= n >> always be added later if I really like it so much, and I can definitel= y >> understand if it appears to be too much effort for basically no gain >> right now. >> >> As I said above, I just brought this up so I brought it up. :-) >=20 > I don't hate this idea, but it is not necessarily much more effort. We= can > always check the underlying file in qcow2's locking implementation, can= 't we? >=20 > int qcow2_lockf(BlockDriverState *bs, int cmd) > { > if ((cmd !=3D BDRV_LOCKF_UNLOCK) && !bdrv_is_locked(bs->file)) = { > return 0; > } > ... > } Good point. I like that. > The problem with doing this generically in block layer is the chicken-a= nd-egg > problem: it's not safe to just have format probling code or qcow2 drive= r to > read the image or even writing to the header field for opening it, anot= her > process could be writing to the image already. A challenge with format > locking is the lack of file level atomic operations (cmpxchg on the ima= ge > header). Well, I'm not sure whether we'd need to format-lock an image for probing, but with the above, we'd circumvent the whole issue anyway. Max --xUWtEajp0VuqhtDjEODQdpLg0cRAnkoLt-- --MIJU1LbPVR9m38PJDM06oXS1tLuJG1JMV Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAEBCAAGBQJXFoM/AAoJEDuxQgLoOKytVO8H/1NKzkVZ4xnbzfu2DZqmNECk kctFCEN6uITEP9Kx5ZzczEiyqoSm0ttP4o1NyZMwZBC7TOzEhrUC0JXWkapxk20m nDOTcS6EeTNubyQcfibdvZNKMA9sAwUbTZpa/MvGO03WAiJquaFvARrwojqPp+CC 4idQxEnprBGjqXb9xZV9lmh/4ZQNxSqTpqMpq4pRP3MypvpRTFOPOonwh986HE84 7NKVAfV3UJIXauGNm1tbqo9JfaSTOm7V7RbLwO/vd0Ib/H91Gh1xplcadlwrDgri OaW5oXvN158RZ2oAdLuquP5RVNkEBpSA03bxQX9QzZcK3w3/H1DsT33dTcCUs24= =O3XY -----END PGP SIGNATURE----- --MIJU1LbPVR9m38PJDM06oXS1tLuJG1JMV--