From: tu bo
Date: Mon, 14 Mar 2016 17:18:54 +0800
Message-ID: <56E681FE.9030806@linux.vnet.ibm.com>
In-Reply-To: <56E29DB1.6000200@redhat.com>
References: <1455470231-5223-1-git-send-email-pbonzini@redhat.com>
 <1455470231-5223-6-git-send-email-pbonzini@redhat.com>
 <56E01544.6060305@de.ibm.com> <56E01D3F.1060204@redhat.com>
 <56E03333.5020601@de.ibm.com> <56E04C9B.7070801@redhat.com>
 <20160310015154.GD23632@ad.usersys.redhat.com>
 <56E13849.3060409@de.ibm.com> <56E14101.4030405@de.ibm.com>
 <56E29DB1.6000200@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 5/8] virtio-blk: fix "disabled data plane" mode
To: Paolo Bonzini, Christian Borntraeger, Fam Zheng
Cc: qemu-devel@nongnu.org

Using the latest qemu from master, I got a new qemu crash, shown below:

(gdb) bt
#0  0x000003ffabb3b650 in raise () from /lib64/libc.so.6
#1  0x000003ffabb3ced8 in abort () from /lib64/libc.so.6
#2  0x0000000010384c30 in qemu_coroutine_enter (co=0x10a2ed40, opaque=0x0) at util/qemu-coroutine.c:112
#3  0x00000000102fd5c2 in bdrv_co_io_em_complete (opaque=0x3ff22beb518, ret=0) at block/io.c:2311
#4  0x00000000102f1428 in qemu_laio_process_completion (s=0x10a25e30, laiocb=0x3ffa400a2a0) at block/linux-aio.c:92
#5  0x00000000102f15e8 in qemu_laio_completion_bh (opaque=0x10a25e30) at block/linux-aio.c:139
#6  0x0000000010281d70 in aio_bh_call (bh=0x109e3580) at async.c:65
#7  0x0000000010281eb8 in aio_bh_poll (ctx=0x109efe10) at async.c:93
#8  0x000000001029538e in aio_dispatch (ctx=0x109efe10) at aio-posix.c:306
#9  0x0000000010295da6 in aio_poll (ctx=0x109efe10, blocking=false) at aio-posix.c:475
#10 0x000000001014662e in iothread_run (opaque=0x109ef8d0) at iothread.c:46
#11 0x000003ffabd084c6 in start_thread () from /lib64/libpthread.so.0
#12 0x000003ffabc02ec2 in thread_start () from /lib64/libc.so.6
(gdb) frame 2
#2  0x0000000010384c30 in qemu_coroutine_enter (co=0x10a2ed40, opaque=0x0) at util/qemu-coroutine.c:112
112         abort();
(gdb) list
107
108     trace_qemu_coroutine_enter(self, co, opaque);
109
110     if (co->caller) {
111         fprintf(stderr, "Co-routine re-entered recursively\n");
112         abort();
113     }
114
115     co->caller = self;
116     co->entry_arg = opaque;

The log file under "/var/log/libvirt/qemu/" contains the following messages:

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -name rt_vm2 -S -machine s390-ccw-virtio-2.6,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1
-object iothread,id=iothread1 -uuid 80cfa525-b35b-4341-aa20-a581bb528fbf -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-rt_vm2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -drive file=/dev/mapper/36005076305ffc1ae0000000000008036,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native -device virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:e7:24:dc:dc:11,devno=fe.0.0000 -netdev tap,fd=30,id=hostnet1,vhost=on,vhostfd=31 -device virtio-net-ccw,netdev=hostnet1,id=net1,mac=52:54:00:e3:0a:44,devno=fe.0.0002 -chardev pty,id=charconsole0 -device sclpconsole,chardev=charconsole0,id=console0 -device virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -msg timestamp=on
char device redirected to /dev/pts/6 (label charconsole0)
Co-routine re-entered recursively
2016-03-14 09:05:37.075+0000: shutting down

On 03/11/2016 06:28 PM, Paolo Bonzini wrote:
>
>
> On 10/03/2016 10:40, Christian Borntraeger wrote:
>> On 03/10/2016 10:03 AM, Christian Borntraeger wrote:
>>> On 03/10/2016 02:51 AM, Fam Zheng wrote:
>>> [...]
>>>> The aio_poll() inside "blk_set_aio_context(s->conf->conf.blk, s->ctx)" looks
>>>> suspicious:
>>>>
>>>>     main thread                                  iothread
>>>> ----------------------------------------------------------------------------
>>>>     virtio_blk_handle_output()
>>>>      virtio_blk_data_plane_start()
>>>>       vblk->dataplane_started = true;
>>>>       blk_set_aio_context()
>>>>        bdrv_set_aio_context()
>>>>         bdrv_drain()
>>>>          aio_poll()
>>>>
>>>>            virtio_blk_handle_output()
>>>>             /* s->dataplane_started is true */
>>>> !!!   ->    virtio_blk_handle_request()
>>>>       event_notifier_set(ioeventfd)
>>>>                                                  aio_poll()
>>>>                                                   virtio_blk_handle_request()
>>>>
>>>> Christian, could you try the following patch?
>>>> The aio_poll above is replaced
>>>> with a "limited aio_poll" that doesn't dispatch ioeventfd.
>>>>
>>>> (Note: perhaps moving "vblk->dataplane_started = true;" after
>>>> blk_set_aio_context() also *works around* this.)
>>>>
>>>> ---
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index ba24b8e..e37e8f7 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -4093,7 +4093,9 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
>>>>
>>>>  void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
>>>>  {
>>>> -    bdrv_drain(bs); /* ensure there are no in-flight requests */
>>>> +    /* ensure there are no in-flight requests */
>>>> +    bdrv_drained_begin(bs);
>>>> +    bdrv_drained_end(bs);
>>>>
>>>>      bdrv_detach_aio_context(bs);
>>>>
>>>
>>> That seems to do the trick.
>>
>> Or not. Crashed again :-(
>
> I would put bdrv_drained_end just before aio_context_release.
>
> But secondarily, I'm thinking of making the logic simpler to understand
> in two ways:
>
> 1) adding a mutex around virtio_blk_data_plane_start/stop.
>
> 2) moving
>
>     event_notifier_set(virtio_queue_get_host_notifier(s->vq));
>     virtio_queue_aio_set_host_notifier_handler(s->vq, s->ctx, true, true);
>
> to a bottom half (created with aio_bh_new in s->ctx).  The bottom half
> takes the mutex, checks again "if (vblk->dataplane_started)" and if it's
> true starts the processing.
>
> Thanks,
>
> Paolo
>
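
For reference, the two suggestions quoted above (a mutex around start/stop,
plus a bottom half that takes the mutex and re-checks the started flag) could
look roughly like the sketch below. This is only an illustration, not QEMU
code: the DataPlane struct and the dataplane_* functions are hypothetical
stand-ins, a pthread mutex is assumed in place of whatever lock QEMU would
use, and the bottom half is modelled as a plain function call rather than
something scheduled with aio_bh_new.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the dataplane state (not a QEMU type). */
typedef struct {
    pthread_mutex_t lock;  /* serializes start/stop and the deferred handler */
    bool started;          /* models "dataplane_started" */
    int processed;         /* how many times processing was kicked off */
} DataPlane;

static void dataplane_init(DataPlane *s)
{
    pthread_mutex_init(&s->lock, NULL);
    s->started = false;
    s->processed = 0;
}

/* Models virtio_blk_data_plane_start(): set the flag under the mutex. */
static void dataplane_start(DataPlane *s)
{
    pthread_mutex_lock(&s->lock);
    s->started = true;
    pthread_mutex_unlock(&s->lock);
}

/* Models virtio_blk_data_plane_stop(): clear the flag under the mutex. */
static void dataplane_stop(DataPlane *s)
{
    pthread_mutex_lock(&s->lock);
    s->started = false;
    pthread_mutex_unlock(&s->lock);
}

/*
 * Models the bottom half that would run in the iothread: take the mutex,
 * check the flag again, and only start processing if it is still set.
 * Returns true if processing was started.
 */
static bool dataplane_start_bh(DataPlane *s)
{
    bool ran = false;

    pthread_mutex_lock(&s->lock);
    if (s->started) {
        s->processed++;    /* placeholder for "start the processing" */
        ran = true;
    }
    pthread_mutex_unlock(&s->lock);
    return ran;
}
```

The point of the re-check is that a bottom half scheduled before a stop (or
before start has finished) does nothing if the flag is not set by the time
it actually runs.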