From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51112)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from )
	id 1Zr8u7-0000d0-Hs
	for qemu-devel@nongnu.org; Tue, 27 Oct 2015 14:23:28 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from )
	id 1Zr8u3-0000B1-Uf
	for qemu-devel@nongnu.org; Tue, 27 Oct 2015 14:23:27 -0400
References: <1445954986-13005-1-git-send-email-den@openvz.org>
	<1445954986-13005-5-git-send-email-den@openvz.org>
	<562FBE8F.7040309@redhat.com>
From: "Denis V. Lunev"
Message-ID: <562FC10D.7040404@openvz.org>
Date: Tue, 27 Oct 2015 21:23:09 +0300
MIME-Version: 1.0
In-Reply-To: <562FBE8F.7040309@redhat.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 4/5] migration: add missed aio_context_acquire into hmp_savevm/hmp_delvm
To: Paolo Bonzini
Cc: Amit Shah, qemu-stable@nongnu.org, qemu-devel@nongnu.org, Stefan Hajnoczi, Juan Quintela

On 10/27/2015 09:12 PM, Paolo Bonzini wrote:
>
> On 27/10/2015 15:09, Denis V. Lunev wrote:
>> aio_context should be locked in the similar way as was done in QMP
>> snapshot creation in the other case there are a lot of possible
>> troubles if native AIO mode is enabled for disk.
>>
>> - the command can hang (HMP thread) with missed wakeup (the operation is
>>   actually complete)
>>     io_submit
>>     ioq_submit
>>     laio_submit
>>     raw_aio_submit
>>     raw_aio_readv
>>     bdrv_co_io_em
>>     bdrv_co_readv_em
>>     bdrv_aligned_preadv
>>     bdrv_co_do_preadv
>>     bdrv_co_do_readv
>>     bdrv_co_readv
>>     qcow2_co_readv
>>     bdrv_aligned_preadv
>>     bdrv_co_do_pwritev
>>     bdrv_rw_co_entry
>>
>> - QEMU can assert in coroutine re-enter
>>     __GI_abort
>>     qemu_coroutine_enter
>>     bdrv_co_io_em_complete
>>     qemu_laio_process_completion
>>     qemu_laio_completion_bh
>>     aio_bh_poll
>>     aio_dispatch
>>     aio_poll
>>     iothread_run
>>
>> AioContext lock is recursive. Thus nested locking should not be a problem.
>>
>> Signed-off-by: Denis V. Lunev
>> CC: Stefan Hajnoczi
>> CC: Paolo Bonzini
>> CC: Juan Quintela
>> CC: Amit Shah
>> ---
>>  block/snapshot.c   | 5 +++++
>>  migration/savevm.c | 7 +++++++
>>  2 files changed, 12 insertions(+)
>>
>> diff --git a/block/snapshot.c b/block/snapshot.c
>> index 89500f2..f6fa17a 100644
>> --- a/block/snapshot.c
>> +++ b/block/snapshot.c
>> @@ -259,6 +259,9 @@ void bdrv_snapshot_delete_by_id_or_name(BlockDriverState *bs,
>>  {
>>      int ret;
>>      Error *local_err = NULL;
>> +    AioContext *aio_context = bdrv_get_aio_context(bs);
>> +
>> +    aio_context_acquire(aio_context);
>>
>>      ret = bdrv_snapshot_delete(bs, id_or_name, NULL, &local_err);
>>      if (ret == -ENOENT || ret == -EINVAL) {
>> @@ -267,6 +270,8 @@ void bdrv_snapshot_delete_by_id_or_name(BlockDriverState *bs,
>>          ret = bdrv_snapshot_delete(bs, NULL, id_or_name, &local_err);
>>      }
>>
>> +    aio_context_release(aio_context);
> Why here and not in hmp_delvm, for consistency?
>
> The call from hmp_savevm is already protected.
>
> Thanks for fixing the bug!
>
> Paolo

The situation is more complicated. There are several disks in the VM.
One disk is used for saving the state (protected in savevm), and several
other disks are touched via

static int del_existing_snapshots(Monitor *mon, const char *name)
    while ((bs = bdrv_next(bs))) {
        if (bdrv_can_snapshot(bs) &&
            bdrv_snapshot_find(bs, snapshot, name) >= 0) {
            bdrv_snapshot_delete_by_id_or_name(bs, name, &err);
        }
    }

in savevm. delvm has similar-looking code, with a similar loop
implemented differently.

To me this patchset looks like the minimal change that kludges the
situation well enough. The true fix would be to drop this code in favour
of blockdev transactions. At least this is my opinion.

I can not do that at this stage, though; it would take a lot of time.

Den
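
To illustrate why the lock goes into the per-disk helper rather than into
each caller, here is a rough standalone sketch. It is not QEMU code: the
names ToyContext, ToyDisk and the toy_* functions are made up for
illustration, and a recursive pthread mutex stands in for the AioContext
lock. It assumes the caller may already hold the context of the state disk
while the loop walks all disks, so the nested acquire must not deadlock.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an AioContext: a recursive lock plus a
 * counter of how many times the owner currently holds it. */
typedef struct {
    pthread_mutex_t lock;
    int depth;
} ToyContext;

/* Hypothetical stand-in for a disk bound to one context. */
typedef struct {
    ToyContext *ctx;
    int has_snapshot;   /* 1 if a snapshot with the requested name exists */
} ToyDisk;

static void toy_context_init(ToyContext *ctx)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&ctx->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    ctx->depth = 0;
}

static void toy_context_acquire(ToyContext *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    ctx->depth++;
}

static void toy_context_release(ToyContext *ctx)
{
    ctx->depth--;
    pthread_mutex_unlock(&ctx->lock);
}

/* Delete the snapshot on one disk under that disk's own context lock.
 * Returns 1 if a snapshot was deleted, 0 otherwise. */
static int toy_snapshot_delete(ToyDisk *disk)
{
    int deleted = 0;

    toy_context_acquire(disk->ctx);
    if (disk->has_snapshot) {
        disk->has_snapshot = 0;
        deleted = 1;
    }
    toy_context_release(disk->ctx);
    return deleted;
}

/* Walk all disks. The caller may already hold some of the contexts;
 * the lock is recursive, so the nested acquire is fine. */
static int toy_del_existing_snapshots(ToyDisk *disks, size_t n)
{
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        count += toy_snapshot_delete(&disks[i]);
    }
    return count;
}
```

With the lock taken inside the helper, every loop that reaches it is
covered without touching the callers one by one. On older toolchains the
sketch may need to be built with -pthread.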