From: tu bo
Date: Wed, 16 Mar 2016 13:21:54 +0800
Subject: Re: [Qemu-devel] [PATCH 5/8] virtio-blk: fix "disabled data plane" mode
To: Fam Zheng, Paolo Bonzini
Cc: Christian Borntraeger, qemu-devel@nongnu.org
Message-ID: <56E8ED72.4030004@linux.vnet.ibm.com>
In-Reply-To: <20160315124530.GA10718@ad.usersys.redhat.com>

On 03/15/2016 08:45 PM, Fam Zheng wrote:
> On Fri, 03/11 11:28, Paolo Bonzini wrote:
>>
>> On 10/03/2016 10:40, Christian Borntraeger wrote:
>>> On 03/10/2016 10:03 AM, Christian Borntraeger wrote:
>>>> On 03/10/2016 02:51 AM, Fam Zheng wrote:
>>>> [...]
>>>>> The aio_poll() inside "blk_set_aio_context(s->conf->conf.blk, s->ctx)"
>>>>> looks suspicious:
>>>>>
>>>>> main thread                                    iothread
>>>>> ----------------------------------------------------------------------------
>>>>> virtio_blk_handle_output()
>>>>>   virtio_blk_data_plane_start()
>>>>>     vblk->dataplane_started = true;
>>>>>     blk_set_aio_context()
>>>>>       bdrv_set_aio_context()
>>>>>         bdrv_drain()
>>>>>           aio_poll()
>>>>>             virtio_blk_handle_output()
>>>>>               /* s->dataplane_started is true */
>>>>> !!! ->        virtio_blk_handle_request()
>>>>>     event_notifier_set(ioeventfd)
>>>>>                                                aio_poll()
>>>>>                                                  virtio_blk_handle_request()
>>>>>
>>>>> Christian, could you try the following patch? The aio_poll() above is
>>>>> replaced with a "limited aio_poll" that doesn't dispatch ioeventfd.
>>>>>
>>>>> (Note: perhaps moving "vblk->dataplane_started = true;" after
>>>>> blk_set_aio_context() also *works around* this.)
>>>>>
>>>>> ---
>>>>>
>>>>> diff --git a/block.c b/block.c
>>>>> index ba24b8e..e37e8f7 100644
>>>>> --- a/block.c
>>>>> +++ b/block.c
>>>>> @@ -4093,7 +4093,9 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
>>>>>
>>>>>  void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
>>>>>  {
>>>>> -    bdrv_drain(bs); /* ensure there are no in-flight requests */
>>>>> +    /* ensure there are no in-flight requests */
>>>>> +    bdrv_drained_begin(bs);
>>>>> +    bdrv_drained_end(bs);
>>>>>
>>>>>      bdrv_detach_aio_context(bs);
>>>>>
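[Editor's sketch, not code from the thread: what the empty drained section buys over a bare bdrv_drain(). It assumes the QEMU 2.6-era semantics, where ioeventfd handlers are registered as "external" aio handlers and bdrv_drained_begin() suspends them via aio_disable_external().]

    /* A bare bdrv_drain() polls with every handler enabled, so a pending
     * guest kick can call back into virtio_blk_handle_output() from inside
     * aio_poll() while dataplane start is only half finished. */
    bdrv_drain(bs);

    /* An empty drained section also waits for in-flight requests, but the
     * ioeventfd handler stays disabled while it polls, so the request
     * handler cannot be re-entered until the section ends. */
    bdrv_drained_begin(bs);    /* aio_disable_external() + drain */
    bdrv_drained_end(bs);      /* aio_enable_external() */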
>>>>
>>>> That seems to do the trick.
>>>
>>> Or not. Crashed again :-(
>>
>> I would put bdrv_drained_end just before aio_context_release.
>
> This won't work. bdrv_drained_end must be called with the same ctx as
> bdrv_drained_begin, which is only true before bdrv_detach_aio_context().
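[Editor's sketch of the pairing rule just stated, not code from any of the patches; the function name is illustrative:]

    void bdrv_set_aio_context_sketch(BlockDriverState *bs, AioContext *new_context)
    {
        bdrv_drained_begin(bs);   /* begins in the context bs is attached to now */
        bdrv_drained_end(bs);     /* OK: bs is still attached to that context */

        bdrv_detach_aio_context(bs);
        bdrv_attach_aio_context(bs, new_context);

        /* Ending the drained section down here instead would pair the end
         * with new_context, not the context in which the drain began. */
    }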
>
>> But secondarily, I'm thinking of making the logic simpler to understand
>> in two ways:
>>
>> 1) adding a mutex around virtio_blk_data_plane_start/stop.
>>
>> 2) moving
>>
>>     event_notifier_set(virtio_queue_get_host_notifier(s->vq));
>>     virtio_queue_aio_set_host_notifier_handler(s->vq, s->ctx, true, true);
>>
>> to a bottom half (created with aio_bh_new in s->ctx). The bottom half
>> takes the mutex, checks again "if (vblk->dataplane_started)" and if it's
>> true starts the processing.
>
> Like this? If it captures your idea, could Bo or Christian help test?
>
With this patch, I can still get a qemu crash as before:

(gdb) bt
#0  bdrv_co_do_rw (opaque=0x0) at block/io.c:2172
#1  0x000002aa17f5a4a6 in coroutine_trampoline (i0=<optimized out>,
    i1=-1677707808) at util/coroutine-ucontext.c:79
#2  0x000003ffac25150a in __makecontext_ret () from /lib64/libc.so.6

Good news is that the frequency of the qemu crash is much lower than before.

---

> From b5b8866693828d498ee184fc7d4e13d8c06cdf39 Mon Sep 17 00:00:00 2001
> From: Fam Zheng
> Date: Thu, 10 Mar 2016 10:26:36 +0800
> Subject: [PATCH] virtio-blk dataplane start crash fix
>
> Suggested-by: Paolo Bonzini
> Signed-off-by: Fam Zheng
> ---
>  block.c                         |  4 +++-
>  hw/block/dataplane/virtio-blk.c | 39 ++++++++++++++++++++++++++++++++-------
>  2 files changed, 35 insertions(+), 8 deletions(-)
>
> diff --git a/block.c b/block.c
> index ba24b8e..e37e8f7 100644
> --- a/block.c
> +++ b/block.c
> @@ -4093,7 +4093,9 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
>
>  void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
>  {
> -    bdrv_drain(bs); /* ensure there are no in-flight requests */
> +    /* ensure there are no in-flight requests */
> +    bdrv_drained_begin(bs);
> +    bdrv_drained_end(bs);
>
>      bdrv_detach_aio_context(bs);
>
> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index 36f3d2b..6db5c22 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -49,6 +49,8 @@ struct VirtIOBlockDataPlane {
>
>      /* Operation blocker on BDS */
>      Error *blocker;
> +
> +    QemuMutex start_lock;
>  };
>
>  /* Raise an interrupt to signal guest, if necessary */
> @@ -150,6 +152,7 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
>      s = g_new0(VirtIOBlockDataPlane, 1);
>      s->vdev = vdev;
>      s->conf = conf;
> +    qemu_mutex_init(&s->start_lock);
>
>      if (conf->iothread) {
>          s->iothread = conf->iothread;
> @@ -184,15 +187,38 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
>      g_free(s);
>  }
>
> +typedef struct {
> +    VirtIOBlockDataPlane *s;
> +    QEMUBH *bh;
> +} VirtIOBlockStartData;
> +
> +static void virtio_blk_data_plane_start_bh_cb(void *opaque)
> +{
> +    VirtIOBlockStartData *data = opaque;
> +    VirtIOBlockDataPlane *s = data->s;
> +
> +    /* Kick right away to begin processing requests already in vring */
> +    event_notifier_set(virtio_queue_get_host_notifier(s->vq));
> +
> +    /* Get this show started by hooking up our callbacks */
> +    virtio_queue_aio_set_host_notifier_handler(s->vq, s->ctx, true, true);
> +
> +    qemu_bh_delete(data->bh);
> +    g_free(data);
> +}
> +
>  /* Context: QEMU global mutex held */
>  void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
>  {
>      BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s->vdev)));
>      VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>      VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);
> +    VirtIOBlockStartData *data;
>      int r;
>
> +    qemu_mutex_lock(&s->start_lock);
>      if (vblk->dataplane_started || s->starting) {
> +        qemu_mutex_unlock(&s->start_lock);
>          return;
>      }
>
> @@ -221,13 +247,11 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
>
>      blk_set_aio_context(s->conf->conf.blk, s->ctx);
>
> -    /* Kick right away to begin processing requests already in vring */
> -    event_notifier_set(virtio_queue_get_host_notifier(s->vq));
> -
> -    /* Get this show started by hooking up our callbacks */
> -    aio_context_acquire(s->ctx);
> -    virtio_queue_aio_set_host_notifier_handler(s->vq, s->ctx, true, true);
> -    aio_context_release(s->ctx);
> +    data = g_new(VirtIOBlockStartData, 1);
> +    data->s = s;
> +    data->bh = aio_bh_new(s->ctx, virtio_blk_data_plane_start_bh_cb, data);
> +    qemu_bh_schedule(data->bh);
> +    qemu_mutex_unlock(&s->start_lock);
>      return;
>
>  fail_host_notifier:
> @@ -236,6 +260,7 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
>      s->disabled = true;
>      s->starting = false;
>      vblk->dataplane_started = true;
> +    qemu_mutex_unlock(&s->start_lock);
>  }
>
>  /* Context: QEMU global mutex held */
>
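[Editor's note on the patch as posted: Paolo's description above has the bottom half take the mutex and re-check vblk->dataplane_started, but the BH in the patch does neither. A version of the callback that follows that description, hypothetical and reusing the patch's own names, would be:]

    static void virtio_blk_data_plane_start_bh_cb(void *opaque)
    {
        VirtIOBlockStartData *data = opaque;
        VirtIOBlockDataPlane *s = data->s;
        VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);

        /* Re-check under the lock: a concurrent stop may have run between
         * qemu_bh_schedule() and this callback. */
        qemu_mutex_lock(&s->start_lock);
        if (vblk->dataplane_started) {
            /* Kick right away to begin processing requests already in vring */
            event_notifier_set(virtio_queue_get_host_notifier(s->vq));

            /* Get this show started by hooking up our callbacks */
            virtio_queue_aio_set_host_notifier_handler(s->vq, s->ctx, true, true);
        }
        qemu_mutex_unlock(&s->start_lock);

        qemu_bh_delete(data->bh);
        g_free(data);
    }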