From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36514) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Zr9ZT-0004Z8-RZ for qemu-devel@nongnu.org; Tue, 27 Oct 2015 15:06:12 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Zr9ZQ-0004Ih-Ki for qemu-devel@nongnu.org; Tue, 27 Oct 2015 15:06:11 -0400 References: <1445954986-13005-1-git-send-email-den@openvz.org> <562FC546.9040709@redhat.com> From: "Denis V. Lunev" Message-ID: <562FCB13.50505@openvz.org> Date: Tue, 27 Oct 2015 22:05:55 +0300 MIME-Version: 1.0 In-Reply-To: <562FC546.9040709@redhat.com> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v2 0/5] dataplane snapshot fixes List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: qemu-devel@nongnu.org, Stefan Hajnoczi , qemu-stable@nongnu.org On 10/27/2015 09:41 PM, Paolo Bonzini wrote: > On 27/10/2015 15:09, Denis V. Lunev wrote: >> The following test >> while /bin/true ; do >> virsh snapshot-create rhel7 >> sleep 10 >> virsh snapshot-delete rhel7 --current >> done >> with enabled iothreads on a running VM leads to a lot of troubles: hangs, >> asserts, errors. >> >> Though (in general) HMP snapshot code is terrible. I think it should be >> dropped at once and replaced with blkdev transactions code. Though is >> could not fit to QEMU 2.5/stable at all. >> >> Anyway, I think that the construction like >> assert(aio_context_is_locked(aio_context)); >> should be widely used to ensure proper locking. >> >> Changes from v1: >> - aio-context locking added >> - comment is rewritten >> >> Signed-off-by: Denis V. Lunev >> CC: Stefan Hajnoczi >> CC: Paolo Bonzini > For patches 4-5: > > Reviewed-by: Paolo Bonzini > > For patches 1-3 I'm not sure, because we will remove RFifoLock > relatively soon and regular pthread recursive mutexes do not have an > equivalent of rfifolock_is_locked. > > Paolo This does not break any future. Yes, FifoLock will go away, but aio_context_is_locked will survive like it stays in the kernel code. We can either have plain pthread_mutex_try_lock/unlock at first or we can have additional stubs for linux with checks like this (gdb) p *(pthread_mutex_t*)0x6015a0 $3 = { __data = { __lock = 2, __count = 0, __owner = 12276, <== LWP12276 is Thread 3 __nusers = 1, __kind = 0, <== non-recursive __spins = 0, __list = { __prev = 0x0, __next = 0x0 } }, __size = "\002\000\000\000\000\000\000\000\364/\000\000\001",'\000' , __align = 2 } in debug mode. Yes, they relays on internal representation, but they are useful. This assert was VERY useful for me. I presume that there are a LOT of similar places in the code with different functions where aio_context lock was not acquired and there was no way to ensure consistency. Den