From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:33757) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZtddP-0007zH-1b for qemu-devel@nongnu.org; Tue, 03 Nov 2015 10:36:31 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZtddL-0000FM-S7 for qemu-devel@nongnu.org; Tue, 03 Nov 2015 10:36:30 -0500 References: <1446044465-19312-1-git-send-email-den@openvz.org> <1446044465-19312-5-git-send-email-den@openvz.org> <20151030155253.GD16864@stefanha-x1.localdomain> <87fv0nxhp4.fsf@emacs.mitica> <20151103153047.GG15414@stefanha-x1.localdomain> From: "Denis V. Lunev" Message-ID: <5638D46D.90900@openvz.org> Date: Tue, 3 Nov 2015 18:36:13 +0300 MIME-Version: 1.0 In-Reply-To: <20151103153047.GG15414@stefanha-x1.localdomain> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 4/4] migration: add missed aio_context_acquire into HMP snapshot code List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi , Juan Quintela Cc: Amit Shah , Paolo Bonzini , qemu-devel@nongnu.org, qemu-stable@nongnu.org On 11/03/2015 06:30 PM, Stefan Hajnoczi wrote: > On Tue, Nov 03, 2015 at 03:48:07PM +0100, Juan Quintela wrote: >> Stefan Hajnoczi wrote: >>> On Wed, Oct 28, 2015 at 06:01:05PM +0300, Denis V. Lunev wrote: >>>> diff --git a/block/snapshot.c b/block/snapshot.c >>>> index 89500f2..f6fa17a 100644 >>>> --- a/block/snapshot.c >>>> +++ b/block/snapshot.c >>>> @@ -259,6 +259,9 @@ void bdrv_snapshot_delete_by_id_or_name(BlockDriverState *bs, >>>> { >>>> int ret; >>>> Error *local_err = NULL; >>>> + AioContext *aio_context = bdrv_get_aio_context(bs); >>>> + >>>> + aio_context_acquire(aio_context); >>>> >>>> ret = bdrv_snapshot_delete(bs, id_or_name, NULL, &local_err); >>>> if (ret == -ENOENT || ret == -EINVAL) { >>>> @@ -267,6 +270,8 @@ void bdrv_snapshot_delete_by_id_or_name(BlockDriverState *bs, >>>> ret = bdrv_snapshot_delete(bs, NULL, id_or_name, &local_err); >>>> } >>>> >>>> + aio_context_release(aio_context); >>>> + >>>> if (ret < 0) { >>>> error_propagate(errp, local_err); >>>> } >>> Please make the caller acquire the AioContext instead of modifying >>> bdrv_snapshot_delete_id_or_name() because no other functions in this >>> file acquire AioContext and the API should be consistent. >> That is wrong (TM). No other functions in migration/* know what an >> aiocontext is, and they are fine, thanks O:-) > To clarify my comment: > > APIs should have a consistent locking strategy. Either all of the the > block/snapshot.c public functions should take the lock or none of them > should. > > With an inconsistent locking strategy it's really hard to review code > and ensure it is correct because you need to look up for each function > whether or not it takes the lock internally. > >> So, I guess we would have to get some other function exported from the >> block layer, with the aiocontext taken? >> >> Code ends being like this: >> >> >> while ((bs = bdrv_next(bs))) { >> if (bdrv_can_snapshot(bs) && >> bdrv_snapshot_find(bs, snapshot, name) >= 0) { >> AioContext *ctx = bdrv_get_aio_context(bs); >> >> aio_context_acquire(ctx); >> bdrv_snapshot_delete_by_id_or_name(bs, name, &err); >> aio_context_release(ctx); >> .... some error handling here ... >> } >> >> >> As discussed on irc, we need to get some function exported from the >> block layer that does this. >> >> I am sure that I don't understand the differences between hmp_devlvm() >> and del_existing_snapshots(). > On IRC I commented when you posted this code because there's a bug: > > bdrv_can_snapshot() and bdrv_snapshot_find() must be called with > AioContext acquired. So the function should actually be: > > while ((bs = bdrv_next(bs))) { > AioContext *ctx = bdrv_get_aio_context(ctx); > > if (bdrv_can_snapshot(bs) && > ... > > aio_context_release(ctx); > } this is not that necessary after patch 6 which adds guard into direct AIO submission code. Anyway, I would tend to agree that locking MUST be consistent and it MUST be ENFORCED in proper places. Truly speaking we must place assert(aio_context_is_owner) at the beginning of each externally visible function in block layer. Though this is too late in 2.5. Den