From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:57119)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1YdiG0-0005rd-G5 for qemu-devel@nongnu.org;
	Thu, 02 Apr 2015 12:46:17 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1YdiFx-0007Xd-2j for qemu-devel@nongnu.org;
	Thu, 02 Apr 2015 12:46:16 -0400
Received: from mx1.redhat.com ([209.132.183.28]:48807)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1YdiFw-0007XS-S0 for qemu-devel@nongnu.org;
	Thu, 02 Apr 2015 12:46:13 -0400
Message-ID: <551D7250.2010200@redhat.com>
Date: Thu, 02 Apr 2015 12:46:08 -0400
From: John Snow
MIME-Version: 1.0
References: <551D71BF.6050601@redhat.com>
In-Reply-To: <551D71BF.6050601@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] thread-pool.c race condition?
To: Paolo Bonzini, Stefan Hajnoczi
Cc: Laszlo Ersek, qemu-devel

On 04/02/2015 12:43 PM, Paolo Bonzini wrote:
>
>
> On 02/04/2015 18:26, Stefan Hajnoczi wrote:
>> John Snow has reported that qemu-io can hang when the host is under
>> heavy load.  He made the following observations in gdb:
>>
>> 1. The program is sitting in aio_poll() (called by bdrv_prwv_co())
>>    waiting for request completion.
>>
>> 2. The thread pool has a ThreadPoolElement with ->state == THREAD_DONE.
>>
>> The ThreadPoolElement should have been reaped by
>> thread_pool_completion_bh() and its callback invoked.  For some reason
>> this didn't happen and the program is blocked in poll(2) waiting.
>>
>> This suggests a race condition in thread-pool.c or qemu_bh_schedule()
>> (used to complete ThreadPoolElement from a QEMU event loop).
>>
>> I don't have a good theory why this happens yet.  Just wanted to share
>> in case someone else hits this problem.
>
> Laszlo hit something very similar fairly easily with virtio-scsi (but
> not virtio-blk!) on aarch64 hosts.  Any attempt to debug it (ranging
> from compilation with -O0 to tracing) made it disappear.  A reliable
> reproducer with qemu-io would be a dream...
>
> Paolo
>

Unfortunately for you, I hit it by running qemu-iotests on my laptop
overnight and I suspect it's triggered by my screensavers hogging CPU
when I am AFK...

I hit it pretty reliably (100% of the time I tried to run tests while AFK
-- three independent screensavers running on three monitors) two weeks
ago, but haven't seen it recently.

I'll keep you posted...