From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <55C0A88D.1010800@kamp.de>
Date: Tue, 04 Aug 2015 13:57:01 +0200
From: Peter Lieven
References: <55BB2DF7.8010808@kamp.de> <55BB302D.50108@redhat.com> <55BB335A.1010009@kamp.de> <55BB3FE7.3000106@redhat.com> <55C08461.1040308@kamp.de> <55C0A7AA.70609@redhat.com>
In-Reply-To: <55C0A7AA.70609@redhat.com>
Subject: Re: [Qemu-devel] [Qemu-stable] Recent patches for 2.4
To: Paolo Bonzini, Stefan Hajnoczi
Cc: "qemu-devel@nongnu.org", qemu-stable@nongnu.org

On 04.08.2015 at 13:53, Paolo Bonzini wrote:
>
> On 04/08/2015 11:22, Peter Lieven wrote:
>>>>>> edec47c main-loop: fix qemu_notify_event for aio_notify optimization
>>>>> Part of the above AioContext series.
>>>> So either the whole series or none of them I guess?
>>> It's a separate bug, and theoretically it's there in 2.3.1 as well, but
>>> no one ever reproduced it (it would hang in make check) so not
>>> worthwhile.
>> Can you give me a pointer to what the symptoms were?
> If a thread tries to wake up the main thread using qemu_notify_event(),
> the main thread will never wake up. This for example could happen if
> the first thread calls qemu_set_fd_handler() or timer_mod().
>
>> I have a qemu-img convert job on x86_64 that reproducibly hangs on
>> bdrv_drain_all at the end of the convert process.
>> I convert from nfs:// to local storage here. I try to figure out which BS
>> reports busy. Qemu here is still 2.2.1.
> qemu-img does not use main-loop, so this cannot be the cause.
>
> The AioContext bugs only happen when you have a thread executing the
> main loop and one thread executing aio_poll, so they can also be
> excluded as the cause of qemu-img problems.

Okay, what I found out is that in aio_poll I get revents = POLLIN for
the NFS file descriptor, but there is no data available on the socket.
As a consequence, progress is true and we loop here forever.

I have seen that it is a common bug in Linux to return POLLIN on an fd
even though there is no data available. I don't have this problem in
general (otherwise no qemu-img or qemu process would ever terminate when
NFS is involved), but in this special case it happens reproducibly.

Peter
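
The observation above (POLLIN reported, yet nothing to read) can be checked outside of QEMU with a small helper. This is only a minimal sketch, under the assumption that the descriptor is a socket that supports the FIONREAD ioctl; the names classify_pollin and enum pollin_state are hypothetical and are not part of QEMU or libnfs.

```c
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result type for this sketch; not a QEMU or libnfs type. */
enum pollin_state {
    POLLIN_NONE,      /* poll() did not report POLLIN */
    POLLIN_WITH_DATA, /* POLLIN reported and bytes are queued */
    POLLIN_NO_DATA,   /* POLLIN reported, but zero bytes are queued */
    POLLIN_ERROR      /* poll() or ioctl() failed */
};

/*
 * Poll fd once without blocking and, if POLLIN is set, ask the kernel
 * how many bytes are actually queued for reading.
 */
enum pollin_state classify_pollin(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int queued = 0;
    int rc = poll(&pfd, 1, 0); /* timeout 0: return immediately */

    if (rc < 0) {
        return POLLIN_ERROR;
    }
    if (rc == 0 || !(pfd.revents & POLLIN)) {
        return POLLIN_NONE;
    }
    if (ioctl(fd, FIONREAD, &queued) < 0) {
        return POLLIN_ERROR;
    }
    /*
     * Zero queued bytes with POLLIN set means either end-of-file
     * (the peer closed or shut down the connection) or a readiness
     * report that is not backed by data.
     */
    return queued > 0 ? POLLIN_WITH_DATA : POLLIN_NO_DATA;
}
```

A result of POLLIN_NO_DATA on the NFS socket would match the behaviour described above. It does not by itself tell end-of-file apart from a spurious wakeup, so a non-blocking read on the descriptor would still be needed to distinguish the two.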