From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:45113)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <den@virtuozzo.com>) id 1asnmF-00064X-E3
	for qemu-devel@nongnu.org; Wed, 20 Apr 2016 04:46:28 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <den@virtuozzo.com>) id 1asnmD-0003tS-Sd
	for qemu-devel@nongnu.org; Wed, 20 Apr 2016 04:46:27 -0400
References: <1460690887-32751-1-git-send-email-famz@redhat.com>
	<1460690887-32751-5-git-send-email-famz@redhat.com>
	<5712CAD6.9040707@redhat.com>
	<20160418013334.GC18893@ad-mail.usersys.redhat.com>
	<571471E5.5010907@openvz.org> <5716839C.8040808@redhat.com>
From: "Denis V. Lunev" <den@openvz.org>
Message-ID: <571741D0.10900@openvz.org>
Date: Wed, 20 Apr 2016 11:46:08 +0300
MIME-Version: 1.0
In-Reply-To: <5716839C.8040808@redhat.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH for-2.7 v2 04/17] block: Introduce image
 file locking
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Max Reitz <mreitz@redhat.com>, Fam Zheng <famz@redhat.com>
Cc: qemu-devel@nongnu.org, Kevin Wolf <kwolf@redhat.com>, Jeff Cody <jcody@redhat.com>, Markus Armbruster <armbru@redhat.com>, Eric Blake <eblake@redhat.com>, John Snow <jsnow@redhat.com>, qemu-block@nongnu.org, berrange@redhat.com, pbonzini@redhat.com

On 04/19/2016 10:14 PM, Max Reitz wrote:
> On 18.04.2016 07:34, Denis V. Lunev wrote:
>> On 04/18/2016 04:33 AM, Fam Zheng wrote:
>>> On Sun, 04/17 01:29, Max Reitz wrote:
>>>> On 15.04.2016 05:27, Fam Zheng wrote:
>>>>> Block drivers can implement this new operation .bdrv_lockf to
>>>>> actually lock the
>>>>> image in the protocol specific way.
>>>>>
>>>>> Signed-off-by: Fam Zheng <famz@redhat.com>
>>>>> ---
>>>>>    block.c                   | 42
>>>>> ++++++++++++++++++++++++++++++++++++++++++
>>>>>    include/block/block_int.h | 12 ++++++++++++
>>>>>    2 files changed, 54 insertions(+)
>>>> I'm prepared for everyone hating this idea, but I'm just bringing it up
>>>> so I can always say I did bring it up.
>>>>
>>>> Heads up: This will be about qcow2 locking again.
>>>>
>>>> Relax, though, it won't be about how much better qcow2 locking is better
>>>> than protocol locking.
>>>>
>>>> Now that you know this feel free to drop out.
>>>>
>>>> This patch implements locking by just trying to lock every single BDS
>>>> that is being opened. While it may fulfill its purpose, I don't think
>>>> that is what we actually want.
>>>>
>>>> What we want is the following: qemu has a BDS graph. It is basically a
>>>> forest of trees. It may be a bit more complicated (DAGs instead of
>>>> trees), but let's just assume it is.
>>>>
>>>> What we want to protect are leaves in this tree. Every leaf basically
>>>> corresponds to a physical resource such as a file or an NBD connection.
>>>> Every leaf is driven by a protocol block driver. We want to protect
>>>> these physical resources from concurrent access.
>>>>
>>>> Ideally, we can just protect the physical resource itself. This works
>>>> for raw-posix, this works for gluster, this works for raw-win32, and
>>>> probably some other protocols, too. But I guess it won't work for all
>>>> protocols, and even if it does, it would need to be implemented.
>>>>
>>>> But we can protect leaves in the BDS forest by locking non-leaves also:
>>>> If you lock a qcow2 node, all of its "file" subtree will be protected;
>>>> normally, that's just a single leaf.
>>>>
>>>> Therefore, I think the ideal approach would be for each BDS tree that is
>>>> to be created we try to lock all of its leaves, and if that does not
>>>> work for some, we walk up the tree and try to lock inner nodes (e.g.
>>>> format BDSs which then use format locking) so that the leaves are still
>>>> protected even if their protocol does not support that.
>>>>
>>>> This could be implemented like this: Whenever a leaf BDS is created, try
>>>> to lock it. If we can't, leave some information to the parent node that
>>>> its child could not be locked. Then, the parent will evaluate this
>>>> information and try to act upon it. This then recurses up the tree. Or,
>>>> well, down the tree, considering that in most natural trees the root is
>>>> at the bottom.
>>>>
>>>>
>>>> We could just implement qcow2 locking on top of this series as it is,
>>>> but this would result in qcow2 files being locked even if their files'
>>>> protocol nodes have been successfully locked. That would be superfluous
>>>> and we'd have all the issues with force-unlocking qcow2 files we have
>>>> discussed before.
>>>>
>>>>
>>>> So what am I saying? I think that it makes sense to consider format
>>>> locking as a backup alternative to protocol locking in case the latter
>>>> is not possible. I think it is possible to implement both using the same
>>>> framework.
>>>>
>>>> I don't think we need to worry about the actual implementation of format
>>>> locking now. But I do think having a framework which supports both
>>>> format and protocol locking is possible and would be nice to have.
>>>>
>>>> Such a framework would require more effort, however, than the basically
>>>> brute-force "just lock everything" method presented in this patch. Don't
>>>> get me wrong, this method here works for what it's supposed to do (I
>>>> haven't reviewed it yet, though), and it's very reasonable if protocol
>>>> locking is all we intend to have. I'm just suggesting that maybe we do
>>>> want to have more than that.
>>>>
>>>>
>>>> All in all, I won't object if the locking framework introduced by this
>>>> series is not supposed to and does not work with format locking. It can
>>>> always be added later if I really like it so much, and I can definitely
>>>> understand if it appears to be too much effort for basically no gain
>>>> right now.
>>>>
>>>> As I said above, I just brought this up so I brought it up. :-)
>>> I don't hate this idea, but it is not necessarily much more effort.
>>> We can
>>> always check the underlying file in qcow2's locking implementation,
>>> can't we?
>>>
>>>       int qcow2_lockf(BlockDriverState *bs, int cmd)
>>>       {
>>>           if ((cmd != BDRV_LOCKF_UNLOCK) && !bdrv_is_locked(bs->file)) {
>>>               return 0;
>>>           }
>>>           ...
>>>       }
>>>
>>> The problem with doing this generically in block layer is the
>>> chicken-and-egg
>>> problem: it's not safe to just have format probling code or qcow2
>>> driver to
>>> read the image or even writing to the header field for opening it,
>>> another
>>> process could be writing to the image already. A challenge with format
>>> locking is the lack of file level atomic operations (cmpxchg on the image
>>> header).
>>>
>>> Fam
>> We should not touch images!
>>
>> If QEMU will die, f.e. on assert or by power off, we will suffer
>> a REAL pain to guess whether the lock is taken or not.
>> flock() and friends is a perfect mechanics for the purpose
>> as the kernel will drop corresponding locks by magic.
>>
>> Header changes are good for detecting of unclean stops and
>> running consistency checks after that, f.e. if we have lazy
>> ref-counters on.
>>
>> Pls do not do this inside the image. We have eaten that
>> stuff in our older products and this is really BAD way to
>> follow.
> My suggestion was to use format locking only when protocol locking is
> not available. And you can always switch it off manually.
>
> Also, the bulk of my argument was not to implement format locking right
> now, but to keep in mind that maybe we do want to implement it later.
>
> Max
>
OK. this sounds sane and fair if we could start with the
current approach and extend it later on with format
locking.

Den