From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:34538) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b2gRD-0006Er-80 for qemu-devel@nongnu.org; Tue, 17 May 2016 10:57:36 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1b2gR8-0002Gu-UG for qemu-devel@nongnu.org; Tue, 17 May 2016 10:57:34 -0400 Received: from mail-wm0-x241.google.com ([2a00:1450:400c:c09::241]:35970) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b2gR8-0002Go-OH for qemu-devel@nongnu.org; Tue, 17 May 2016 10:57:30 -0400 Received: by mail-wm0-x241.google.com with SMTP id w143so5313997wmw.3 for ; Tue, 17 May 2016 07:57:30 -0700 (PDT) Sender: Paolo Bonzini References: <1460719926-12950-1-git-send-email-pbonzini@redhat.com> <1460719926-12950-5-git-send-email-pbonzini@redhat.com> <20160419143149.GI16312@stefanha-x1.localdomain> From: Paolo Bonzini Message-ID: <4dafcfa5-54f9-3ab1-afcf-5ac7270a3951@redhat.com> Date: Tue, 17 May 2016 16:57:28 +0200 MIME-Version: 1.0 In-Reply-To: <20160419143149.GI16312@stefanha-x1.localdomain> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Subject: Re: [Qemu-devel] [PATCH 04/11] aio: introduce aio_co_schedule List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: qemu-devel@nongnu.org, kwolf@redhat.com, berto@igalia.com, famz@redhat.com, stefanha@redhat.com On 19/04/2016 16:31, Stefan Hajnoczi wrote: > On Fri, Apr 15, 2016 at 01:31:59PM +0200, Paolo Bonzini wrote: >> @@ -255,6 +257,8 @@ aio_ctx_finalize(GSource *source) >> } >> #endif >> >> + qemu_bh_delete(ctx->schedule_bh); > > Please include an assertion that the scheduled coroutines list is empty. Good idea. >> + while (!QSLIST_EMPTY(&straight)) { >> + Coroutine *co = QSLIST_FIRST(&straight); >> + QSLIST_REMOVE_HEAD(&straight, co_scheduled_next); >> + trace_aio_schedule_bh_cb(ctx, co); >> + qemu_coroutine_enter(co, NULL); FWIW, this should be wrapped with aio_context_acquire/aio_context_release. >> + } >> +} > > This construct brings to mind the use-after-free case when a scheduled > coroutine terminates before it is entered by this loop: > > There are two scheduled Coroutines: A and B. During > qemu_coroutine_enter(A) we enter B. B then terminates by returning from > its main function. Once A yields or terminates we still try to enter > the freed B coroutine. > > Unfortunately I don't think we have good debugging or an assertion for > this bug. I'm sure it will occur at some point... aio_co_schedule (and qemu_coroutine_wake which wraps it later in the series) is quite a low-level interface, so I do not expect many users. That said, there is at least another case where it will be used. In the dataplane branch, where AIO callbacks take the AioContext mutex themselves, we have: static void bdrv_co_io_em_complete(void *opaque, int ret) { CoroutineIOCompletion *co = opaque; co->ret = ret; aio_context_acquire(co->ctx); qemu_coroutine_enter(co->coroutine, NULL); aio_context_release(co->ctx); } ... acb = bs->drv->bdrv_aio_readv(bs, sector_num, iov, nb_sectors, bdrv_co_io_em_complete, &co); qemu_coroutine_yield(); bdrv_co_io_em_complete here can be called before the coroutine has yielded. To prepare for the replacement of the AioContext mutex with fine-grained mutexes, I think bdrv_co_io_em_complete should do something like if (ctx != qemu_get_current_aio_context()) { aio_co_schedule(ctx, co->coroutine); return; } aio_context_acquire(ctx); qemu_coroutine_enter(co->coroutine, NULL); aio_context_release(ctx); > Please document > that the coroutine must not be entered by anyone else while > aio_co_schedule() is active. Sure. Paolo