Date: Tue, 28 Nov 2017 11:42:10 -0500
From: Jeff Cody
Message-ID: <20171128164210.GG25110@localhost.localdomain>
References: <20171128154350.21504-1-kwolf@redhat.com> <20171128154350.21504-4-kwolf@redhat.com> <9bd2921d-64c6-724c-0d80-b43e33f41817@redhat.com> <20171128162850.GF3703@localhost.localdomain>
In-Reply-To: <20171128162850.GF3703@localhost.localdomain>
Subject: Re: [Qemu-devel] [PATCH for-2.11 3/4] coroutine: Cancel aio_co_schedule() on direct entry
To: Kevin Wolf
Cc: Paolo Bonzini, qemu-block@nongnu.org, mreitz@redhat.com, stefanha@redhat.com, famz@redhat.com, qemu-devel@nongnu.org

On Tue, Nov 28, 2017 at 05:28:50PM +0100, Kevin Wolf wrote:
> On 28.11.2017 at 17:14, Paolo Bonzini wrote:
> > On 28/11/2017 16:43, Kevin Wolf wrote:
> > > +    /* Make sure that a coroutine that can alternatively be reentered from
> > > +     * two different sources isn't reentered more than once when the first
> > > +     * caller uses aio_co_schedule() and the other one enters the coroutine
> > > +     * directly. This is achieved by cancelling the pending
> > > +     * aio_co_schedule().
> > > +     *
> > > +     * The other way round, if aio_co_schedule() were called after this
> > > +     * point, that would be a problem, too, but in practice it doesn't
> > > +     * happen because we're holding the AioContext lock here and
> > > +     * aio_co_schedule() callers must do the same.
> >
> > No, this is not true. aio_co_schedule is thread-safe.
>
> Hm... With the reproducer we were specifically looking at
> qmp_block_job_cancel(), which does take the AioContext locks. But it
> might not be as universal as I thought.
>
> To be honest, I just wasn't sure what to do with this case anyway. It
> means that the coroutine is already running when someone else schedules
> it. We don't really know whether we have to enter it a second time or
> not.
>
> So if it can indeed happen in practice, we need to think a bit more
> about this.

It would be nice if, on coroutine termination, we could unschedule all
pending executions for that coroutine. I think use-after-free is the
main concern when someone else calls aio_co_schedule() while the
coroutine is currently running.

> > > +     * This means that the coroutine just needs to
> > > +     * prevent other callers from calling aio_co_schedule() before it
> > > +     * yields (e.g. block job coroutines by setting job->busy = true).
> > > +     *
> > > +     * We still want to ensure that the second case doesn't happen, so
> > > +     * reset co->scheduled only after setting co->caller to make the
> > > +     * above check effective for the co_schedule_bh_cb() case. */
> > > +    atomic_set(&co->scheduled, NULL);
> >
> > This doesn't work. The coroutine is still in the list, and if someone
> > calls aio_co_schedule again now, any coroutines linked from "co" via
> > co_scheduled_next are lost.
>
> Why would they be? We still iterate the whole list in co_schedule_bh_cb(),
> we just skip the single qemu_coroutine_enter().
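
To make that concrete, here is a minimal standalone model of the scheme
being discussed. The names mirror QEMU's (Coroutine, aio_co_schedule(),
co_schedule_bh_cb()), but this is a sketch of the idea, not the actual
util/async.c code; the list handling in particular is simplified:

#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

typedef struct Coroutine {
    /* Non-NULL while an aio_co_schedule() is pending; QEMU stores the
     * scheduling function's name here for debugging. */
    _Atomic(const char *) scheduled;
    struct Coroutine *co_scheduled_next; /* singly linked schedule list */
    const char *name;
} Coroutine;

/* Stands in for ctx->scheduled_coroutines. */
static Coroutine *_Atomic scheduled_list;

static void coroutine_enter(Coroutine *co)
{
    printf("entering %s\n", co->name);
}

/* Queue co for later entry by the bottom half (push to list head). */
static void aio_co_schedule(Coroutine *co)
{
    atomic_store(&co->scheduled, "aio_co_schedule");
    co->co_scheduled_next = atomic_exchange(&scheduled_list, co);
}

/* Direct entry: cancel the pending schedule so the BH won't enter co a
 * second time. co stays on the list; only the entry is suppressed. */
static void coroutine_enter_direct(Coroutine *co)
{
    atomic_store(&co->scheduled, NULL);
    coroutine_enter(co);
}

/* Bottom-half callback: iterate the whole list, but skip the single
 * enter for a cancelled coroutine. Coroutines linked after it via
 * co_scheduled_next are still reached, which is the point above. */
static void co_schedule_bh_cb(void)
{
    Coroutine *co = atomic_exchange(&scheduled_list, NULL);
    while (co) {
        Coroutine *next = co->co_scheduled_next;
        if (atomic_exchange(&co->scheduled, NULL)) {
            coroutine_enter(co);
        } /* else: cancelled by a direct entry; skip, keep walking */
        co = next;
    }
}

int main(void)
{
    Coroutine a = { .name = "a" }, b = { .name = "b" };
    aio_co_schedule(&a);
    aio_co_schedule(&b);
    coroutine_enter_direct(&b); /* cancels b's pending schedule */
    co_schedule_bh_cb();        /* enters only a; b is skipped   */
    return 0;
}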
> > (Also, the AioContext lock is by design not protecting any state in
> > AioContext itself; the AioContext lock is only protecting things that
> > run in an AioContext but do not have their own lock.)
>
> Such as the coroutine we want to enter, no?
>
> Kevin
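
For what it's worth, the open race here, aio_co_schedule() running
concurrently with a direct entry, can be modelled standalone too. In the
sketch below a plain mutex stands in for the AioContext lock, and the
scheduling path deliberately takes no lock, mirroring the fact that
aio_co_schedule() is thread-safe by design. Again the names are borrowed
from QEMU for readability; this is not QEMU code (build with cc -pthread):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Models the AioContext lock: serializes direct entry, nothing else. */
static pthread_mutex_t aio_context_lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic(const char *) co_scheduled; /* models co->scheduled */

/* Direct entry path, like qmp_block_job_cancel(): holds the lock. */
static void enter_directly(void)
{
    pthread_mutex_lock(&aio_context_lock);
    atomic_store(&co_scheduled, NULL); /* cancel any pending schedule */
    printf("entered directly\n");
    pthread_mutex_unlock(&aio_context_lock);
}

/* Scheduling path: takes no lock, so it can mark the coroutine
 * scheduled before, during or after enter_directly(). */
static void *schedule_thread(void *arg)
{
    (void)arg;
    atomic_store(&co_scheduled, "aio_co_schedule");
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, schedule_thread, NULL);
    enter_directly();
    pthread_join(&t, NULL);
    /* Depending on the interleaving, the schedule may survive the
     * cancellation -- the case the thread hasn't resolved yet. */
    printf("still scheduled: %s\n",
           atomic_load(&co_scheduled) ? "yes" : "no");
    return 0;
}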