From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53995) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZrDZo-0005ti-CS for qemu-devel@nongnu.org; Tue, 27 Oct 2015 19:22:49 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZrDZj-0006yX-F4 for qemu-devel@nongnu.org; Tue, 27 Oct 2015 19:22:48 -0400 References: <1445954986-13005-1-git-send-email-den@openvz.org> <562FC546.9040709@redhat.com> <562FCB13.50505@openvz.org> From: "Denis V. Lunev" Message-ID: <56300734.3080201@openvz.org> Date: Wed, 28 Oct 2015 02:22:28 +0300 MIME-Version: 1.0 In-Reply-To: <562FCB13.50505@openvz.org> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v2 0/5] dataplane snapshot fixes List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: qemu-devel@nongnu.org, Stefan Hajnoczi , qemu-stable@nongnu.org On 10/27/2015 10:05 PM, Denis V. Lunev wrote: > On 10/27/2015 09:41 PM, Paolo Bonzini wrote: >> On 27/10/2015 15:09, Denis V. Lunev wrote: >>> The following test >>> while /bin/true ; do >>> virsh snapshot-create rhel7 >>> sleep 10 >>> virsh snapshot-delete rhel7 --current >>> done >>> with enabled iothreads on a running VM leads to a lot of troubles: >>> hangs, >>> asserts, errors. >>> >>> Though (in general) HMP snapshot code is terrible. I think it should be >>> dropped at once and replaced with blkdev transactions code. Though is >>> could not fit to QEMU 2.5/stable at all. >>> >>> Anyway, I think that the construction like >>> assert(aio_context_is_locked(aio_context)); >>> should be widely used to ensure proper locking. >>> >>> Changes from v1: >>> - aio-context locking added >>> - comment is rewritten >>> >>> Signed-off-by: Denis V. Lunev >>> CC: Stefan Hajnoczi >>> CC: Paolo Bonzini >> For patches 4-5: >> >> Reviewed-by: Paolo Bonzini >> >> For patches 1-3 I'm not sure, because we will remove RFifoLock >> relatively soon and regular pthread recursive mutexes do not have an >> equivalent of rfifolock_is_locked. >> >> Paolo > > This does not break any future. > > Yes, FifoLock will go away, but aio_context_is_locked will > survive like it stays in the kernel code. We can either have > plain pthread_mutex_try_lock/unlock at first or we can > have additional stubs for linux with checks like this > > (gdb) p *(pthread_mutex_t*)0x6015a0 > $3 = { > __data = { > __lock = 2, > __count = 0, > __owner = 12276, <== LWP12276 is Thread 3 > __nusers = 1, > __kind = 0, <== non-recursive > __spins = 0, > __list = { > __prev = 0x0, > __next = 0x0 > } > }, > __size = "\002\000\000\000\000\000\000\000\364/\000\000\001",'\000' > , > __align = 2 > } > > in debug mode. Yes, they relays on internal representation, > but they are useful. > > This assert was VERY useful for me. I presume that there are > a LOT of similar places in the code with different functions > where aio_context lock was not acquired and there was no > way to ensure consistency. > > Den there is a sane fast crossplatform approach :) struct AioContextLock { pthread_t owner; pthread_mutex_t mutex; int count; }; void aio_context_lock(struct AioContextLock *lock) { pthread_mutex_lock(&lock->mutex); lock->owner = pthread_self(); lock->count++; } void aio_context_unlock(struct AioContextLock *lock) { if (--lock->count == 0) { lock->owner = NULL; } pthread_mutex_lock(&lock->mutex); } int aio_context_is_locked(struct AioContextLock *lock) { if (pthread_mutex_trylock(&lock->mutex)) { int own_by_us = lock->owner == pthread_self(); pthread_mutex_unlock(&lock->mutex); return own_by_us ? 1 : 0; } /* the lock was taken by other thread */ return 0; } Den