From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:53964)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1bycDz-0000c8-RU
	for qemu-devel@nongnu.org; Mon, 24 Oct 2016 06:11:25 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1bycDy-00035k-Fn
	for qemu-devel@nongnu.org; Mon, 24 Oct 2016 06:11:23 -0400
Date: Mon, 24 Oct 2016 12:11:11 +0200
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20161024101111.GB4374@noname.redhat.com>
References: <1475237406-26917-1-git-send-email-famz@redhat.com>
	<e6c3e63c-13bd-b21d-eb78-f3516c599989@redhat.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="v9Ux+11Zm5mwPlX6"
Content-Disposition: inline
In-Reply-To: <e6c3e63c-13bd-b21d-eb78-f3516c599989@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v8 00/36] block: Image locking series
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Max Reitz <mreitz@redhat.com>
Cc: Fam Zheng <famz@redhat.com>, qemu-devel@nongnu.org, berrange@redhat.com, John Snow <jsnow@redhat.com>, qemu-block@nongnu.org, rjones@redhat.com, Jeff Cody <jcody@redhat.com>, Markus Armbruster <armbru@redhat.com>, stefanha@redhat.com, den@openvz.org, pbonzini@redhat.com, eblake@redhat.com


--v9Ux+11Zm5mwPlX6
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Am 22.10.2016 um 03:00 hat Max Reitz geschrieben:
> <parenthesis>
>=20
> I personally still don't like making locking a qdev property very much
> because it doesn't make sense to me*. But I remember Kevin had his
> reasons (even though I can no longer remember them) for asking for it,
> and I don't have any besides "It doesn't make sense to me". After having
> though a bit about it (=3D having written three paragraphs and deleted
> them again), I guess I can make some sense of it, though it seems to be
> a rather esoteric use case still; it appears to me that a guest could
> use it to signal that it's fine with some block device being shared;
> then we could use a shared lock or none at all or I don't know.
> Otherwise, we should get an exclusive lock for write access and a shared
> lock for read access.

The reason is pretty simple if you think about this question: Why do we
need user input in the first place? It's just because we don't know the
requiremens of the guest, and unfortunately the guest won't tell us
(because things like shared IDE disks simply don't exist on real
hardware). So we need ask the user, and we should do this in a layer as
close to the guest (which is the origin of the requirement) as possible.
This point is the device emulation.

Everything that isn't a guest device is under the control of qemu, so
we don't need user input there.

> (Although, fun question (that's the reason for the "I don't know"): What
> if guest A supports such a volatile block device, but guest B doesn't?
> Then imagine we run A with write access and B with read-only access. A
> will claim no lock or a shared lock, while B will probably claim a
> shared lock, too, expecting writing users to try to grab an exclusive
> lock. So suddenly both can open the image file, but B isn't actually
> prepared for that. Fun!)

Hmm, that's a good point actually.

And in fact, it reminds me how I suggested that the problem at hand is
really similar to what the new op blockers are supposed to do. And the
system I proposed for those already handles this situation just fine.

The relevant part here is how the proposal had two bit masks per user,
one describing which operatings the user needs to be able to perform
itself, and the other one for operations that other users are still
allowed perform. This means that for every operation you have four
possible states: Exclusively used; shared use; unused, but blocking
other users; unused, but allowed for other users.

If we interpret the locking flag as the operation "writes to the image",
in your example, A would have the mode "shared use" and B would have
"unused, but blocking". These two modes are in conflict.

Now, the big question is how to translate this into file locking. This
could become a little tricky. I had a few thoughts involving another
lock on byte 2, but none of them actually worked out so far, because
what we want is essentially a lock that can be shared by readers, that
can also be shared by writers, but not by readers and writers at the
same time.

> So as far as I understand it, those qdev properties should eventually be
> changeable by the guest. And if that is so, the user probably shouldn't
> touch them at all because the guest OS really knows whether it can cope
> with volatile block devices or not, and it will tell qemu if that is so.

This is an interesting thought for PV devices, but it hasn't been part
of the plan so far. I'm not sure if the guest OS block device drivers
even have this information generally, because that's more a matter of
whether a cluster file system is used on the block device or not.

> That may be the reason why the qdev property doesn't make sense to me at
> the first glance: It does make sense for the guest OS to set this
> property at that level, but it doesn't make sense for the user to do so.
> Or maybe it does, but the user is really asking for things breaking then
> ("YES, I AM SURE THAT MY GUEST WILL BE COMPLETELY FINE WITH FLOPPY DISKS
> JUST CHANGING RANDOMLY" -- but I guess we've all been at that point
> ourselves...).
>=20
> So after having convinced myself twice now, having the qdev property is
> probably correct.

Essentially the whole reason is that we need the information and the
guest doesn't tell us, so we need to ask someone who supposedly knows
enough about the guest - which is the user.

> </parenthesis>
>=20
> But that's where the flags come in again: The guest may be fine with a
> shared lock or none at all. But what about the block layer? Say you're
> using a qcow2 image; that will not like sharing at all, independently of
> what the guest things. So the block layer needs to have internal
> mechanisms to elevate the locks proposed by the guest devices to e.g.
> exclusive ones. I don't think that is something addressed by this series
> in this version. Maybe you'll still need the flags for that.

I'm not completely sure about the mechanism and whether it should
involve flags, but yes, you need something to propagate the required
locking mode down recursively (just like op blockers will need to be
done - maybe we should really use the same infrastructure and only
do file locking as its implementation of the lowest layer).

What you don't need, however, is user input. We already know that qcow2
doesn't like concurrent writes.

> Final thing: Say you have a user who's crazy. Maybe they want to debug
> something. Anyway, they have some qcow2 image attached to some guest
> device, and they want to access that image simultaneously from some
> other process; as I said, crazy, but there may be legitimate uses for
> that so we should allow it.
>=20
> Now first that user of course has to set the device's lock_mode to
> shared or none, thus promising qemu that the guest will be fine with
> volatile data. But of course qcow2 will override that lock mode, because
> qcow2 doesn't care about the guest: It primarily cares about its own
> metadata integrity. So at this point the user will need some way to
> override qcow2's choice, too. Therefore, I think we also need a per-node
> flag to override the locking mode.
>=20
> qcow2 would probably make this flag default to exclusive for its
> underlying node, and the user would then have to override that flag for
> exactly that underlying node.
>=20
> Therefore I think we still need a block layer property for the lock
> mode. While I now agree that we do need a qdev property, I think that
> alone won't be sufficient.

Attaching a qcow2 image to two processes simply doesn't work unless both
are read-only, so I disagree that we should allow it. Or can you really
come up with a specific reasonable use case that requires this?

Users crazy enough to want something like this patch qemu. Both of us
have patched qemu for debugging guests before and we didn't do things as
crazy as writing to the same qcow2 image from multiple processes.

Kevin

--v9Ux+11Zm5mwPlX6
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBAgAGBQJYDd4/AAoJEH8JsnLIjy/WPpkP+wU6dtAquhnLW4D3cFmULN7t
436h/DwKCPORDcw+en7zBOLz/KWtldeNU4uk2xjKWIxRtXNDjVeP5cDqPWidO1Vl
d9UmNqxGlkfZWSJGI4jk0GzIOB4VzjaGN8VZmavGFotbLSyWKSKhcbMTLz6wx35e
QpAm4MBDUIyuKOQNEB6kOz7VkDwJi2kYCTeaoKHF+JypLq0SwVeVwEfspbm2S8sK
a1FUk1eWeBLNIYpKUktGPP/3RA/do2QqAxUePH0xSkWwTprNjbnKrig9/FVknhqM
m0yY0VpHCbgWhiVqLA1booOJol+LBFmhT3ikaG2n3am+cJtnV0Ub4EibS7zTYP/V
DZBi0aGwWInEaxe/2PXtX+W6UWWh7nMq7qonHoYlSKbuu5ko6vk2D0TfG65B5XmF
GK61W2BN3Kv5Ipc3dUxZ8IAVp0EvQmnWRRKQgd1guo2xKkeQgUGl+PYDTe6aUczp
fWi4zFCYoJ6WHLD8AoCj4vC+z/dyFGD0CE9mvrzk+qPJOpjxPeRVzUhVtig/pEEk
4y1uYj5NwUx6xTKjTwCKEHPRaqwkqaYY2pQywu4URVn7N6HGERaT9pEY37cTQVaf
WnNhBLR7Y3pR9t5M/qXeBnb2JYk7QKSeFe88qunUXhDGnBDxwmLgc5InuWfNJYJx
dNIOGrIABlWJdIlqlUoX
=sKmY
-----END PGP SIGNATURE-----

--v9Ux+11Zm5mwPlX6--