Date: Mon, 14 Sep 2015 10:40:35 +0200
From: Kevin Wolf
To: Fam Zheng
Cc: pbonzini@redhat.com, stefanha@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v2 00/11] aio: Introduce handler type to fix nested aio_poll for dataplane
Message-ID: <20150914084035.GA3550@noname.str.redhat.com>
In-Reply-To: <20150914072730.GB31803@ad.nay.redhat.com>
References: <1438144934-23619-1-git-send-email-famz@redhat.com>
 <20150911103926.GA5164@noname.redhat.com>
 <20150911114653.GA11386@ad.nay.redhat.com>
 <20150911122214.GB5164@noname.redhat.com>
 <20150914072730.GB31803@ad.nay.redhat.com>

Am 14.09.2015 um 09:27 hat Fam Zheng geschrieben:
> On Fri, 09/11 14:22, Kevin Wolf wrote:
> > Am 11.09.2015 um 13:46 hat Fam Zheng geschrieben:
> > > On Fri, 09/11 12:39, Kevin Wolf wrote:
> > > > Am 29.07.2015 um 06:42 hat Fam Zheng geschrieben:
> > > > > v2: Switch to disable/enable model. [Paolo]
> > > > >
> > > > > Most existing nested aio_poll()'s in the block layer are inconsiderate of
> > > > > dispatching potential new r/w requests from ioeventfds and nbd exports, which
> > > > > might result in responsiveness issues (e.g. bdrv_drain_all will not return when
> > > > > new requests keep coming), or even wrong semantics (e.g. qmp_transaction cannot
> > > > > enforce atomicity due to aio_poll in bdrv_drain_all).
> > > > >
> > > > > Previous attempts to address this issue include a new op blocker [1], bdrv_lock [2]
> > > > > and nested AioContexts (patches not posted to qemu-devel).
> > > > >
> > > > > This approach is based on the idea proposed by Paolo Bonzini. The original idea
> > > > > is introducing "aio_context_disable_client / aio_context_enable_client to
> > > > > filter AioContext handlers according to the "client", e.g.
> > > > > AIO_CLIENT_DATAPLANE (ioeventfd), AIO_CLIENT_PROTOCOL, AIO_CLIENT_NBD_SERVER,
> > > > > AIO_CLIENT_CONTEXT, ... Extend aio_set_{event_notifier,fd}_handler to pass a
> > > > > client (type)."
> > > > >
> > > > > After this series, block layer aio_poll() will only process those "protocol"
> > > > > fds that are used in block I/O, plus the ctx->notifier for aio_notify(); other
> > > > > aio_poll()'s remain unchanged.
> > > > >
> > > > > The biggest advantage over approaches [1] and [2] is that no change is needed in
> > > > > virtio-{blk,scsi}-dataplane code, and this doesn't depend on converting QMP to
> > > > > coroutines.
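
As a rough illustration of the interface the cover letter describes, it could
look something like the sketch below. Only the AIO_CLIENT_* constants and the
aio_context_disable_client/aio_context_enable_client names come from the text
above; the exact signatures are guesses for illustration, not code from the
series.

/* Illustrative sketch only -- approximations of the idea described in the
 * cover letter, not the actual patches. */

#include "block/aio.h"

typedef enum {
    AIO_CLIENT_PROTOCOL   = (1 << 0),  /* fds used for block I/O */
    AIO_CLIENT_DATAPLANE  = (1 << 1),  /* virtio ioeventfds */
    AIO_CLIENT_NBD_SERVER = (1 << 2),  /* NBD export connections */
    AIO_CLIENT_CONTEXT    = (1 << 3),  /* ctx->notifier for aio_notify() */
} AioClientType;

/* Handlers would be registered together with a client type... */
void aio_set_fd_handler(AioContext *ctx, int fd, AioClientType type,
                        IOHandler *io_read, IOHandler *io_write,
                        void *opaque);

/* ...so that a nested aio_poll() can temporarily ignore whole classes of
 * handlers, e.g. everything except AIO_CLIENT_PROTOCOL and
 * AIO_CLIENT_CONTEXT while bdrv_drain_all() runs. */
void aio_context_disable_client(AioContext *ctx, AioClientType type);
void aio_context_enable_client(AioContext *ctx, AioClientType type);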
> > > >
> > > > It seems that I haven't replied on the mailing list yet, even though I
> > > > think I already said this in person at KVM Forum: This series fixes only
> > > > a special case of the real problem, which is that bdrv_drain/all at a
> > > > single point doesn't make a lot of sense, but needs to be replaced by a
> > > > whole section with exclusive access, like a bdrv_drained_begin/end pair.
> > > >
> > > > To be clear: Anything that works with types of users instead of
> > > > individual users is bound to fall short of being a complete solution. I
> > > > don't prefer partial solutions when we know there is a bigger problem.
> > > >
> > > > This series addresses your immediate need of protecting against new data
> > > > plane requests, which it arguably achieves. The second case I always
> > > > have in mind is Berto's case where he has multiple streaming block jobs
> > > > in the same backing file chain [1].
> > > >
> > > > This involves a bdrv_reopen() of the target BDS to make it writable, and
> > > > bdrv_reopen() uses bdrv_drain_all() so drivers don't have to cope with
> > > > running requests while reopening themselves. It can however involve
> > > > nested event loops for synchronous operations like bdrv_flush(), and if
> > > > those can process completions of block jobs, which can respond by doing
> > > > anything to the respective node, things can go wrong.
> > >
> > > Just to get a better idea of bdrv_drained_begin/end, could you explain how to
> > > use the pair to fix the above problem?
> >
> > How to use it is the easy part: In bdrv_reopen_multiple(), you would replace
> > the existing bdrv_drain_all() with begin and you would add the
> > corresponding end right before the return statement.
>
> OK, so that the completion of other jobs won't happen because we only complete
> the relevant requests?
>
> Does block_job_pause() work here?

If you find all block jobs that do something with the BDS,
block_job_pause() would work, yes. But if you go further down that road,
you end up not with a design that considers the problem, but with
special-casing every single user. That doesn't feel very sustainable.

> > > > You don't solve this by adding client types (then the problematic requests
> > > > would be PROTOCOL in your proposal and you can never exclude that), but
> > > > you really need to have bdrv_drained_begin/end pairs, where only
> > > > requests issued in between are processed and everything else waits.
> > >
> > > What do you mean by "only requests issued in between are processed"? Where are
> > > the requests from?
> >
> > Generally speaking, you would have code that looks like this:
> >
> > bdrv_drain_begin()
> > ...
> > bdrv_something_synchronous()
> > ...
> > bdrv_drain_end()
> >
> > You want to process everything that is necessary for completing
> > bdrv_something_synchronous(), but nothing else.
> >
> > The trickier question is how to implement this. I know that it's much
> > easier to say that your series doesn't work than actually proposing
> > something else that works...
>
> I see the basic idea, but that is still way too obscure. How would
> bdrv_drained_begin/end know what is necessary for completing
> bdrv_something_synchronous()?

That's not about the interface, but the implementation part, which I
tried to answer below.
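
To make the usage described above concrete, here is a sketch of what the
change in block.c:bdrv_reopen_multiple() could look like. The function body is
reduced to comments, and bdrv_drained_begin_all()/bdrv_drained_end_all() are
placeholder names for whatever the begin/end interface ends up being called;
this is not a real patch.

/* Sketch only: simplified body, placeholder names for the proposed
 * begin/end interface. */
int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
{
    int ret;

    /* Today there is a single bdrv_drain_all() call here; after it
     * returns, nested aio_poll()s (e.g. inside bdrv_flush()) can still
     * dispatch new requests and block job completions. */
    bdrv_drained_begin_all();    /* placeholder for "begin" */

    /*
     * ... unchanged: run bdrv_reopen_prepare() for every entry in
     * bs_queue, then commit all prepared states or abort them on
     * error, setting ret accordingly ...
     */
    ret = 0;

    /* The drained section now covers the whole reopen, including any
     * nested event loops run by synchronous operations. */
    bdrv_drained_end_all();      /* placeholder for "end" */
    return ret;
}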

> > One relatively obvious answer we found when we discussed this a while
> > back was some kind of a recursive CoRwLock (reader = in-flight request;
> > writer = drained section), but that obviously requires that you're
> > running in a coroutine if you want to do something with a drained
> > request queue.
> >
> > I'm also not totally happy with the requirement of taking a reader lock
> > more or less everywhere. But I'm not sure yet if there is a good
> > alternative that can achieve the same.
>
> We're basically trying to prevent new requests from being submitted, but
> couldn't this blocking be a complication itself? A CoRwLock, if any, would be
> implemented with something like a CoQueue, but considering the existing CoQueue
> in the BDS - throttled_reqs - aren't those what we want to keep *empty* between
> the bdrv_drain_begin/end pair? Now we're talking about queueing requests to
> another CoQueue, which feels contradictory to me.

You always queue requests _somewhere_. If they aren't queued in the
block layer, they are in the virtio ring or elsewhere. I think the point
of bdrv_drain() is that the block drivers don't have any pending
requests; they can be queued elsewhere without hurting anyone.

Kevin
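
As an illustration of the recursive CoRwLock idea quoted above, a minimal
sketch follows. It reuses QEMU's existing CoRwlock primitives for readability,
the BlockDrainLock type and blk_* wrapper names are made up for this sketch,
and it deliberately ignores the recursion problem (a request issued from
inside the drained section would deadlock on the reader lock):

/* Sketch of "reader = in-flight request, writer = drained section".
 * CoRwlock and qemu_co_rwlock_* are QEMU coroutine primitives; the
 * wrapper names and the BlockDrainLock type are made up here. */

#include "qemu/coroutine.h"   /* "block/coroutine.h" in older trees */

typedef struct BlockDrainLock {
    CoRwlock lock;
} BlockDrainLock;

static void blk_drain_lock_init(BlockDrainLock *dl)
{
    qemu_co_rwlock_init(&dl->lock);
}

/* Every request coroutine holds the lock shared for its whole lifetime. */
static void coroutine_fn blk_request_begin(BlockDrainLock *dl)
{
    qemu_co_rwlock_rdlock(&dl->lock);
}

static void coroutine_fn blk_request_end(BlockDrainLock *dl)
{
    qemu_co_rwlock_unlock(&dl->lock);
}

/* A drained section takes the lock exclusively: it only gets it once all
 * in-flight requests have dropped their reader side, and new requests
 * queue up on the lock until the section ends. */
static void coroutine_fn blk_drained_begin(BlockDrainLock *dl)
{
    qemu_co_rwlock_wrlock(&dl->lock);
}

static void coroutine_fn blk_drained_end(BlockDrainLock *dl)
{
    qemu_co_rwlock_unlock(&dl->lock);
}

A request path would wrap its work in blk_request_begin/end, while a drained
section would put blk_drained_begin/end around the synchronous operation, so
requests submitted in the meantime simply wait on the lock's CoQueue.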