Date: Wed, 19 Oct 2016 08:55:24 +0800
From: Fam Zheng
To: Paolo Bonzini
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, kwolf@redhat.com, stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH v2 00/20] dataplane: remove RFifoLock
Message-ID: <20161019005524.GB9394@lemon>
In-Reply-To: <1476712470-11660-1-git-send-email-pbonzini@redhat.com>
References: <1476712470-11660-1-git-send-email-pbonzini@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, 10/17 15:54, Paolo Bonzini wrote:
> This patch reorganizes aio_poll callers to establish new rules for
> dataplane locking. The idea is that I/O operations on a dataplane
> BDS (i.e. one where the AioContext is not the main one) do not call
> aio_poll anymore. Instead, they wait for the operation to end in the
> other I/O thread, at which point the other I/O thread calls bdrv_wakeup
> to wake up the main thread.
>
> With this change, only one thread runs aio_poll for an AioContext.
> While aio_context_acquire/release is still needed to protect the BDSes,
> it need not interrupt the other thread's event loop anymore, and therefore
> it does not need contention callbacks anymore. Thus the patch can remove
> RFifoLock. This fixes possible hangs in bdrv_drain_all, reproducible (for
> example) by unplugging a virtio-scsi-dataplane device while there is I/O
> going on for a virtio-blk-dataplane on the same I/O thread.
>
> Patch 1 is a bugfix that I already posted.
>
> Patch 2 makes blockjobs independent of aio_poll, the reason for which
> should be apparent from the explanation above.
>
> Patch 3 is an independent mirror bugfix, that I wanted to submit separately
> but happens to fix a hang in COLO replication. Like patch 1 I believe
> it's pre-existing and merely exposed by these patches.
>
> Patches 4 to 10 introduce the infrastructure to wake up the main thread
> while bdrv_drain or other synchronous operations are running. Patches 11
> to 16 do other changes to prepare for this. Notably bdrv_drain_all
> needs to be called without holding any AioContext lock, so bdrv_reopen
> releases the lock temporarily (and callers of bdrv_reopen need fixing).
>
> Patch 17 then does the big change, after which there are just some
> cleanups left to do.
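
The wait/wakeup idea is easy to picture with a toy example. This is only an
illustrative sketch, not the actual BDRV_POLL_WHILE/bdrv_wakeup code; names
like fake_bdrv_wakeup and request_done are invented for the illustration:

    /*
     * Toy model of the new rule: only the I/O thread polls its
     * AioContext; the main thread sleeps until the I/O thread signals
     * completion (the role bdrv_wakeup plays in this series).
     */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static bool request_done;

    /* Stand-in for bdrv_wakeup(): run by the I/O thread when the request
     * it owns has completed. */
    static void fake_bdrv_wakeup(void)
    {
        pthread_mutex_lock(&lock);
        request_done = true;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }

    /* The I/O thread completes the request in its own event loop. */
    static void *iothread_fn(void *opaque)
    {
        (void)opaque;
        printf("iothread: request complete, waking main thread\n");
        fake_bdrv_wakeup();
        return NULL;
    }

    int main(void)
    {
        pthread_t iothread;
        pthread_create(&iothread, NULL, iothread_fn, NULL);

        /* Main thread: instead of calling aio_poll() on the dataplane
         * AioContext (the old behaviour), it just waits to be woken. */
        pthread_mutex_lock(&lock);
        while (!request_done) {
            pthread_cond_wait(&cond, &lock);
        }
        pthread_mutex_unlock(&lock);

        pthread_join(iothread, NULL);
        printf("main: done, only the iothread polled its AioContext\n");
        return 0;
    }
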
>
> Paolo
>
> Fam Zheng (1):
>   qed: Implement .bdrv_drain
>
> Paolo Bonzini (19):
>   replication: interrupt failover if the main device is closed
>   blockjob: introduce .drain callback for jobs
>   mirror: use bdrv_drained_begin/bdrv_drained_end
>   block: add BDS field to count in-flight requests
>   block: change drain to look only at one child at a time
>   block: introduce BDRV_POLL_WHILE
>   nfs: move nfs_set_events out of the while loops
>   nfs: use BDRV_POLL_WHILE
>   sheepdog: use BDRV_POLL_WHILE
>   aio: introduce qemu_get_current_aio_context
>   iothread: detach all block devices before stopping them
>   replication: pass BlockDriverState to reopen_backing_file
>   block: prepare bdrv_reopen_multiple to release AioContext
>   qemu-io: acquire AioContext
>   qemu-img: call aio_context_acquire/release around block job
>   block: only call aio_poll on the current thread's AioContext
>   iothread: release AioContext around aio_poll
>   qemu-thread: introduce QemuRecMutex
>   aio: convert from RFifoLock to QemuRecMutex

Modulo the one question, series:

Reviewed-by: Fam Zheng