From: Halil Pasic
Date: Fri, 3 Mar 2017 20:43:58 +0100
Subject: Re: [Qemu-devel] [PATCH 1/1] virtio-blk: fix race on guest notifiers
To: Paolo Bonzini, Cornelia Huck
Cc: qemu-stable@nongnu.org, qemu-devel@nongnu.org, Stefan Hajnoczi,
 "Michael S. Tsirkin"
Message-Id: <42da742f-a4fa-b592-904e-5e6f07bfe990@linux.vnet.ibm.com>
In-Reply-To: <7b6ff676-bf57-d9ab-def7-fe894f821f03@redhat.com>
References: <20170302130422.81380-1-pasic@linux.vnet.ibm.com>
 <20170302154935.7b8ec875.cornelia.huck@de.ibm.com>
 <9a67d492-5bda-4b2a-0732-4034176d0d28@linux.vnet.ibm.com>
 <7b6ff676-bf57-d9ab-def7-fe894f821f03@redhat.com>

On 03/02/2017 05:21 PM, Paolo Bonzini wrote:
> 
> 
> On 02/03/2017 16:55, Halil Pasic wrote:
>>>>> blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
>> I'm wondering if synchronization is needed for batch_notify_vqs. I think
>> the set_bit can be from the iothread, but the notify_guest_bh below is
>> main event loop. Is it OK like this (could we miss bits set in other
>> thread)? Does blk_set_aio_context include a barrier?
> 
> Another good question---yes, it does (via bdrv_set_aio_context's call to
> aio_context_acquire).
> 
> But it's something to remember for when I get round to removing
> aio_context_acquire!

Uh, this is complicated. I'm not out of questions, but I fear taking too
much of your precious time. I will ask again nevertheless, but please just
cut the conversation with -EBUSY if it gets too expensive.

What I understand is that batch_notify_vqs is effectively protected by
blk_get_aio_context(s->conf.conf.blk)->lock, which may or may not be the
same as qemu_get_aio_context()->lock. If it is not the same, that means we
are running with an iothread.
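Just to make sure I am reading the above right, here is the ordering I have
in mind, as a small standalone sketch. This is plain pthreads, not QEMU
code; batch_notify_vqs and notify_guest_bh_model only mirror the virtio-blk
names, and the plain mutex stands in for the AioContext lock whose
acquire/release you say provides the barrier:

/*
 * Model of the pattern in question: one thread sets bits in a bitmap
 * (like set_bit() on batch_notify_vqs in the iothread), another thread
 * reads and clears them (like notify_guest_bh() in the main loop).  The
 * mutex plays the role of the AioContext lock: the unlock in the setter
 * and the lock in the reader are what make the bits visible.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER; /* ~ AioContext lock */
static uint64_t batch_notify_vqs;                            /* ~ s->batch_notify_vqs */

/* ~ the iothread side deferring a guest notification */
static void *io_thread_fn(void *arg)
{
    pthread_mutex_lock(&ctx_lock);
    batch_notify_vqs |= 1ull << 3;      /* ~ set_bit(3, ...) */
    pthread_mutex_unlock(&ctx_lock);    /* release: publishes the bit */
    return NULL;
}

/* ~ what notify_guest_bh() does in the main loop */
static void notify_guest_bh_model(void)
{
    uint64_t pending;
    int vq;

    pthread_mutex_lock(&ctx_lock);      /* acquire: observes the published bit */
    pending = batch_notify_vqs;
    batch_notify_vqs = 0;
    pthread_mutex_unlock(&ctx_lock);

    for (vq = 0; vq < 64; vq++) {
        if (pending & (1ull << vq)) {
            printf("would notify vq %d\n", vq);
        }
    }
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, io_thread_fn, NULL);
    pthread_join(&t, NULL);
    notify_guest_bh_model();
    return 0;
}

If the aio_context_acquire buried in blk_set_aio_context gives me the
equivalent of that unlock/lock pair, then I see why no bits can be lost.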
Now let us have a look at the drain. AFAIU, in

#define BDRV_POLL_WHILE(bs, cond) ({                       \
    bool waited_ = false;                                  \
    BlockDriverState *bs_ = (bs);                          \
    AioContext *ctx_ = bdrv_get_aio_context(bs_);          \
    if (aio_context_in_iothread(ctx_)) {                   \
        while ((cond)) {                                   \
            aio_poll(ctx_, true);                          \
            waited_ = true;                                \
        }                                                  \
    } else {                                               \
        assert(qemu_get_current_aio_context() ==           \
               qemu_get_aio_context());                    \
        /* Ask bdrv_dec_in_flight to wake up the main      \
         * QEMU AioContext.  Extra I/O threads never take  \
         * other I/O threads' AioContexts (see for example \
         * block_job_defer_to_main_loop for how to do it). \
         */                                                \
        assert(!bs_->wakeup);                              \
        bs_->wakeup = true;                                \
        while ((cond)) {                                   \
            aio_context_release(ctx_);                     \
            aio_poll(qemu_get_aio_context(), true);        \
            aio_context_acquire(ctx_);                     \
            waited_ = true;                                \
        }                                                  \
        bs_->wakeup = false;                               \
    }                                                      \
    waited_; })

the interesting case for me (I am running with 2 iothreads, one of them
assigned to my block device) is the "else" branch: we will end up releasing
the ctx belonging to the iothread, acquiring it again, and basically
waiting until the requests are done. Since virtio_blk_rw_complete acquires
and releases the same ctx, this makes a lot of sense: if we did not release
the ctx, we would end up in a deadlock.

What I do not entirely understand is the atomic_read/write on
bs->in_flight, which tells us when we are done waiting. In

static void blk_aio_complete(BlkAioEmAIOCB *acb)
{
    if (acb->has_returned) {
        bdrv_dec_in_flight(acb->common.bs);
        acb->common.cb(acb->common.opaque, acb->rwco.ret);
        qemu_aio_unref(acb);
    }
}

the atomic decrement happens before the callback, that is, before
virtio_blk_rw_complete is called. My guess is that this does not matter,
because we are holding the ctx anyway, so it is ensured that we will not
observe 0 until the last virtio_blk_rw_complete has completed (the mutex is
recursive). But then I do not understand what the point of acquiring the
mutex in virtio_blk_rw_complete is.

TL;DR: Is patch b9e413dd required for the correctness of this patch? What
is the role of the aio_context_acquire/release introduced by b9e413dd in
virtio_blk_rw_complete?

Regards,
Halil
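P.S. To make my guess about the in_flight accounting concrete, here is a
second standalone sketch (plain pthreads again, nothing below is actual
QEMU code, the names only mirror the originals) of what I assume the "else"
branch above relies on: the decrement and the callback both run while the
(recursive) ctx lock is held, and the waiter only looks at in_flight while
holding the same lock, so it can never see 0 before the callback has run.

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static pthread_mutex_t ctx;   /* ~ the iothread's AioContext lock (recursive) */
static int in_flight = 1;     /* ~ bs->in_flight, one request still pending */
static int callback_done;

static void virtio_blk_rw_complete_model(void)
{
    pthread_mutex_lock(&ctx);     /* ~ the acquire/release added by b9e413dd */
    callback_done = 1;            /* guest-visible completion work */
    pthread_mutex_unlock(&ctx);
}

static void blk_aio_complete_model(void)
{
    pthread_mutex_lock(&ctx);          /* assumption: completion runs with ctx held */
    in_flight--;                       /* ~ bdrv_dec_in_flight(), before the cb */
    virtio_blk_rw_complete_model();    /* ~ acb->common.cb(...) */
    pthread_mutex_unlock(&ctx);
}

static void *iothread_fn(void *arg)
{
    blk_aio_complete_model();
    return NULL;
}

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_t t;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&ctx, &attr);

    pthread_mutex_lock(&ctx);          /* ~ main loop holding ctx around the drain */
    pthread_create(&t, NULL, iothread_fn, NULL);

    /* ~ the "else" branch of BDRV_POLL_WHILE: drop ctx, poll, re-take ctx,
     * and only then look at in_flight again. */
    while (in_flight > 0) {
        pthread_mutex_unlock(&ctx);
        sched_yield();                 /* stand-in for aio_poll(main ctx, true) */
        pthread_mutex_lock(&ctx);
    }
    printf("in_flight == 0, callback_done == %d\n", callback_done);
    pthread_mutex_unlock(&ctx);

    pthread_join(&t, NULL);
    pthread_mutex_destroy(&ctx);
    return 0;
}

If that picture is wrong, then that is exactly the part I am missing.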