* [Qemu-devel] Multiqueue block layer
From: Stefan Hajnoczi @ 2018-02-18 18:20 UTC
To: qemu-devel, qemu block; +Cc: Paolo Bonzini, Kevin Wolf, Fam Zheng
Paolo's patches have been getting us closer to multiqueue block layer
support but there is a final set of changes required that has become
clearer to me just recently. I'm curious if this matches Paolo's
vision and whether anyone else has comments.
Multiqueue block layer means that I/O requests for a single disk image
can be processed by multiple threads safely. Requests will be
processed simultaneously where possible, but in some cases
synchronization is necessary to protect shared metadata.
Imagine a virtio-blk device with multiple virtqueues, each with an
ioeventfd that is handled by a different IOThread. Each IOThread
should be able to process I/O requests and invoke completion functions
in the AioContext that submitted the request.
Paolo has made key parts of AioContext and coroutine locks (e.g.
CoQueue) thread-safe. Coroutine code can therefore safely execute in
multiple IOThreads and locking works correctly.
That's not to say that block layer code and block drivers are
thread-safe today. They are not, because some code still relies on the
fact that coroutines only execute in one AioContext. That code relies
on the AioContext acquire/release lock for thread safety.
We need to push the AioContext lock down into BlockDriverState so that
thread-safety is not tied to a single AioContext but to the
BlockDriverState itself. We also need to audit block layer code to
identify places that assume everything is run from a single
AioContext.
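To make that concrete, here is a rough before/after sketch (the per-BDS
lock name below is made up, not an existing field):

    /* Today: thread safety comes from acquiring the AioContext that the
     * BlockDriverState is attached to. */
    AioContext *ctx = bdrv_get_aio_context(bs);
    aio_context_acquire(ctx);
    /* ... operate on bs ... */
    aio_context_release(ctx);

    /* Goal: the lock lives in the BlockDriverState itself, so it no
     * longer matters which AioContext the caller runs in ("bds_lock" is
     * an invented field name). */
    qemu_co_mutex_lock(&bs->bds_lock);
    /* ... operate on shared bs state ... */
    qemu_co_mutex_unlock(&bs->bds_lock);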
After this is done the final piece is to eliminate
bdrv_set_aio_context(). BlockDriverStates should not be associated
with an AioContext. Instead they should use whichever AioContext they
are invoked under. The current thread's AioContext can be fetched
using qemu_get_current_aio_context(). This is either the main loop
AioContext or an IOThread AioContext.
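For example, a request completion could be scheduled in whichever
context the caller happens to be running in (the callback and request
names below are invented):

    /* Complete the request in the submitting thread's AioContext instead
     * of one remembered in the BlockDriverState. */
    AioContext *ctx = qemu_get_current_aio_context();
    aio_bh_schedule_oneshot(ctx, request_completion_cb, req);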
The .bdrv_attach/detach_aio_context() callbacks will no longer be
necessary in a world where block driver code is thread-safe and any
AioContext can be used.
bdrv_drain_all() and friends do not require extensive modifications
because the bdrv_wakeup() mechanism already works properly when there
are multiple IOThreads involved.
Block jobs no longer need to be in the same AioContext as the
BlockDriverState. For simplicity we may choose to always run them in
the main loop AioContext by default. This may have a performance
impact on tight loops like bdrv_is_allocated() and the initial
mirroring phase, but maybe not.
The upshot of all this is that bdrv_set_aio_context() goes away while
all block driver code needs to be more aware of thread-safety. It can
no longer assume that everything is called from one AioContext.
We should optimize file-posix.c and qcow2.c for maximum parallelism
using fine-grained locks and other techniques. The remaining block
drivers can use one CoMutex per BlockDriverState.
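For such a driver the pattern could be as simple as the following
sketch ("foo" is a placeholder driver; the CoMutex would be initialized
in its .bdrv_open implementation):

typedef struct BDRVFooState {
    CoMutex lock;        /* one coarse-grained lock per BlockDriverState */
    /* ... driver metadata ... */
} BDRVFooState;

static int coroutine_fn foo_co_preadv(BlockDriverState *bs, uint64_t offset,
                                      uint64_t bytes, QEMUIOVector *qiov,
                                      int flags)
{
    BDRVFooState *s = bs->opaque;
    int ret;

    /* Serializes requests, but is correct no matter which IOThread the
     * coroutine runs in. */
    qemu_co_mutex_lock(&s->lock);
    /* ... translate offset using driver metadata, then read ... */
    ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
    qemu_co_mutex_unlock(&s->lock);
    return ret;
}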
I'm excited that we're relatively close to multiqueue now. I don't
want to jinx it by saying 2018 is the year of the multiqueue block
layer, but I'll say it anyway :).
Thoughts?
Stefan
* Re: [Qemu-devel] Multiqueue block layer
From: Paolo Bonzini @ 2018-02-19 18:03 UTC
To: Stefan Hajnoczi, qemu-devel, qemu block; +Cc: Kevin Wolf, Fam Zheng
On 18/02/2018 19:20, Stefan Hajnoczi wrote:
> Paolo's patches have been getting us closer to multiqueue block layer
> support but there is a final set of changes required that has become
> clearer to me just recently. I'm curious if this matches Paolo's
> vision and whether anyone else has comments.
>
> We need to push the AioContext lock down into BlockDriverState so that
> thread-safety is not tied to a single AioContext but to the
> BlockDriverState itself. We also need to audit block layer code to
> identify places that assume everything is run from a single
> AioContext.
This is mostly done already. Within BlockDriverState,
dirty_bitmap_mutex, reqs_lock and the BQL are good enough in many cases.
Drivers already have their own mutex.
> After this is done the final piece is to eliminate
> bdrv_set_aio_context(). BlockDriverStates should not be associated
> with an AioContext. Instead they should use whichever AioContext they
> are invoked under. The current thread's AioContext can be fetched
> using qemu_get_current_aio_context(). This is either the main loop
> AioContext or an IOThread AioContext.
>
> The .bdrv_attach/detach_aio_context() callbacks will no longer be
> necessary in a world where block driver code is thread-safe and any
> AioContext can be used.
This is not entirely possible. In particular, network drivers still
have a "home context" which is where the file descriptor callbacks are
attached to. They could still dispatch I/O from any thread in a
multiqueue setup. This is the remaining intermediate step between "no
AioContext lock" and "multiqueue".
> bdrv_drain_all() and friends do not require extensive modifications
> because the bdrv_wakeup() mechanism already works properly when there
> are multiple IOThreads involved.
Yes, this is already done indeed.
> Block jobs no longer need to be in the same AioContext as the
> BlockDriverState. For simplicity we may choose to always run them in
> the main loop AioContext by default. This may have a performance
> impact on tight loops like bdrv_is_allocated() and the initial
> mirroring phase, but maybe not.
>
> The upshot of all this is that bdrv_set_aio_context() goes away while
> all block driver code needs to be more aware of thread-safety. It can
> no longer assume that everything is called from one AioContext.
Correct.
> We should optimize file-posix.c and qcow2.c for maximum parallelism
> using fine-grained locks and other techniques. The remaining block
> drivers can use one CoMutex per BlockDriverState.
Even better: there is one thread pool and linux-aio context per I/O
thread, so file-posix.c should just submit I/O to the current thread
with no locking whatsoever. There is still reqs_lock, but that can be
optimized easily (see
http://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03323.html; now
that we have QemuLockable, reqs_lock could also just become a QemuSpin).
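As a sketch, if reqs_lock were turned into a QemuSpin, the
tracked-request insertion would simply be:

    /* reqs_lock only guards short operations such as linking a request
     * into bs->tracked_requests, so spinning is fine. */
    qemu_spin_lock(&bs->reqs_lock);
    QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
    qemu_spin_unlock(&bs->reqs_lock);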
qcow2.c could be adjusted to use rwlocks.
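Sketch of the rwlock idea (the field name is invented; today qcow2 has
a single CoMutex s->lock):

    /* Readers (cluster lookups) run concurrently; writers (cluster
     * allocation, L2/refcount updates) take the lock exclusively. */
    qemu_co_rwlock_rdlock(&s->metadata_lock);
    /* ... qcow2_get_cluster_offset()-style lookup ... */
    qemu_co_rwlock_unlock(&s->metadata_lock);

    qemu_co_rwlock_wrlock(&s->metadata_lock);
    /* ... allocate clusters and update the L2 table ... */
    qemu_co_rwlock_unlock(&s->metadata_lock);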
> I'm excited that we're relatively close to multiqueue now. I don't
> want to jinx it by saying 2018 is the year of the multiqueue block
> layer, but I'll say it anyway :).
Heh. I have stopped pushing my patches (and scratched a few itches with
patchew instead) because I'm still a bit burned out from recent KVM
stuff, but this may be the injection of enthusiasm that I needed. :)
Actually, I'd be content with removing the AioContext lock in the first
half of 2018. 1/3rd of that is gone already---doh! But we're actually
pretty close, thanks to you and all the others who have helped reviewing
the past 100 or so patches!
Paolo
* Re: [Qemu-devel] Multiqueue block layer
From: Stefan Hajnoczi @ 2018-02-19 18:38 UTC
To: Paolo Bonzini; +Cc: qemu-devel, qemu block, Kevin Wolf, Fam Zheng
On Mon, Feb 19, 2018 at 6:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 18/02/2018 19:20, Stefan Hajnoczi wrote:
>> Paolo's patches have been getting us closer to multiqueue block layer
>> support but there is a final set of changes required that has become
>> clearer to me just recently. I'm curious if this matches Paolo's
>> vision and whether anyone else has comments.
>>
>> We need to push the AioContext lock down into BlockDriverState so that
>> thread-safety is not tied to a single AioContext but to the
>> BlockDriverState itself. We also need to audit block layer code to
>> identify places that assume everything is run from a single
>> AioContext.
>
> This is mostly done already. Within BlockDriverState,
> dirty_bitmap_mutex, reqs_lock and the BQL are good enough in many cases.
> Drivers already have their own mutex.
In the block/iscsi.c case I noticed iscsilun->mutex isn't being used
consistently by all entry points to the driver. I added it to the
cancellation path recently but wonder if there are more cases where
block drivers are not yet thread-safe. This is what I was thinking
about here.
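The invariant I'd like to audit for is roughly that every path into
libiscsi, including the cancel path, holds the same per-LUN mutex:

    /* Every path that touches the libiscsi session, including request
     * cancellation, should hold the same per-LUN lock. */
    qemu_mutex_lock(&iscsilun->mutex);
    /* ... issue or cancel a SCSI task ... */
    qemu_mutex_unlock(&iscsilun->mutex);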
>> After this is done the final piece is to eliminate
>> bdrv_set_aio_context(). BlockDriverStates should not be associated
>> with an AioContext. Instead they should use whichever AioContext they
>> are invoked under. The current thread's AioContext can be fetched
>> using qemu_get_current_aio_context(). This is either the main loop
>> AioContext or an IOThread AioContext.
>>
>> The .bdrv_attach/detach_aio_context() callbacks will no longer be
>> necessary in a world where block driver code is thread-safe and any
>> AioContext can be used.
>
> This is not entirely possible. In particular, network drivers still
> have a "home context" which is where the file descriptor callbacks are
> attached to. They could still dispatch I/O from any thread in a
> multiqueue setup. This is the remaining intermediate step between "no
> AioContext lock" and "multiqueue".
The iSCSI and NBD protocols support multiple network connections but
we haven't taken advantage of this yet in QEMU. iSCSI explicitly
models a "session" that consists of "connections" (there can be more
than one). The Linux NBD client driver got support for multiple TCP
connections a little while ago, too. This is an extra step that we
can implement in the future; it is not essential to QEMU block layer
multiqueue support.
>> bdrv_drain_all() and friends do not require extensive modifications
>> because the bdrv_wakeup() mechanism already works properly when there
>> are multiple IOThreads involved.
>
> Yes, this is already done indeed.
>
>> Block jobs no longer need to be in the same AioContext as the
>> BlockDriverState. For simplicity we may choose to always run them in
>> the main loop AioContext by default. This may have a performance
>> impact on tight loops like bdrv_is_allocated() and the initial
>> mirroring phase, but maybe not.
>>
>> The upshot of all this is that bdrv_set_aio_context() goes away while
>> all block driver code needs to be more aware of thread-safety. It can
>> no longer assume that everything is called from one AioContext.
>
> Correct.
>
>> We should optimize file-posix.c and qcow2.c for maximum parallelism
>> using fine-grained locks and other techniques. The remaining block
>> drivers can use one CoMutex per BlockDriverState.
>
> Even better: there is one thread pool and linux-aio context per I/O
> thread, so file-posix.c should just submit I/O to the current thread
> with no locking whatsoever. There is still reqs_lock, but that can be
> optimized easily (see
> http://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03323.html; now
> that we have QemuLockable, reqs_lock could also just become a QemuSpin).
>
> qcow2.c could be adjusted to use rwlocks.
>
>> I'm excited that we're relatively close to multiqueue now. I don't
>> want to jinx it by saying 2018 is the year of the multiqueue block
>> layer, but I'll say it anyway :).
>
> Heh. I have stopped pushing my patches (and scratched a few itches with
> patchew instead) because I'm still a bit burned out from recent KVM
> stuff, but this may be the injection of enthusiasm that I needed. :)
>
> Actually, I'd be content with removing the AioContext lock in the first
> half of 2018. 1/3rd of that is gone already---doh! But we're actually
> pretty close, thanks to you and all the others who have helped reviewing
> the past 100 or so patches!
I look forward to reviewing patches you have queued!
Stefan
* Re: [Qemu-devel] Multiqueue block layer
From: Stefan Hajnoczi @ 2018-02-19 18:41 UTC
To: Paolo Bonzini; +Cc: qemu-devel, qemu block, Kevin Wolf, Fam Zheng
By the way, one thing that makes me nervous is that we lack an
explicit "request queue" or "per-queue" concept. State can either be
per-BlockDriverState (shared state) or per-request, but not per-queue.
So far there hasn't been a need for per-queue state but there may be
opportunities to eliminate locking if we can replace shared state with
per-queue state.
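Purely as a hypothetical sketch (nothing like this exists in the tree
today), per-queue state could be a third kind of struct next to
BlockDriverState and the per-request state:

/* Hypothetical sketch only -- not an existing QEMU structure. */
typedef struct BdrvQueueState {
    unsigned int index;   /* which submission queue (e.g. virtqueue) */
    AioContext *ctx;      /* the IOThread this queue is processed in */
    /* queue-local caches, statistics, etc. could live here lock-free */
} BdrvQueueState;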
Stefan
* Re: [Qemu-devel] Multiqueue block layer
From: Alexandre DERUMIER @ 2018-02-19 18:59 UTC
To: pbonzini; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block, Kevin Wolf, Fam Zheng
>> Heh. I have stopped pushing my patches (and scratched a few itches with
>> patchew instead) because I'm still a bit burned out from recent KVM
>> stuff, but this may be the injection of enthusiasm that I needed. :)
Thanks, Paolo, for your great work on multiqueue; it has been a lot of
work over the past few years!
I'm crossing my fingers for 2018, and waiting for a multiqueue
implementation in the ceph rbd block driver :)
Regards,
Alexandre