From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
To: "open list:Block layer core" <qemu-block@nongnu.org>,
	Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Cc: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Alberto Faria <afaria@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>
Subject: Re: AioContext lock removal: help needed
Date: Fri, 8 Jul 2022 11:42:06 +0200	[thread overview]
Message-ID: <7ca3224d-102f-f45c-f765-9ea79110d127@redhat.com> (raw)
In-Reply-To: <8904fc80-4ecf-23f5-ab7b-7d016db78aa5@redhat.com>



On 08/07/2022 at 10:42, Emanuele Giuseppe Esposito wrote:
> Hello everyone,
> 
> As you all know, I am trying to find a way to replace the well-known
> AioContext lock with something else that makes sense and provides the
> same (or even better) guarantees as this lock.
> 
> The reasons for this change have been explained over and over, and I don't
> really want to repeat them. Please read the various series I posted in
> the past [1] for more information.
> 
> The end goal is to get rid of the AioContext lock and have fine-grained
> locks in the various components, to make the whole block layer more
> multi-thread friendly and eventually be able to assign multiple virtual
> queues to a single iothread.
> 
> The AioContext lock is used everywhere, to protect a huge variety of data.
> This greatly limits the level of multithreading that iothreads can achieve.
> 
> Before digging into the problem itself and possible solutions, I would
> also like to add that we are holding a weekly (or bi-weekly, we'll see)
> public meeting where we plan to discuss this project. Anyone
> interested is very welcome to join. The event invitation is here:
> 
> https://calendar.google.com/event?action=TEMPLATE&tmeid=NTdja2VwMDFyYm9nNjNyc25pdXU5bm8wb3FfMjAyMjA3MTRUMDgwMDAwWiBlZXNwb3NpdEByZWRoYXQuY29t&tmsrc=eesposit%40redhat.com&scp=ALL
> 
> One huge blocker we are facing is removing the AioContext lock from the
> block API (bdrv_* and friends).
> I identified two initial and main candidates that need to lose the
> AioContext protection:
> - bdrv_replace_child_noperm
> - bdrv_try_set_aio_context
> 
> Once these two functions can safely run without the AioContext lock, we
> will have gotten rid of the majority of its usage.
> The main issue is: what can we use as a replacement?
> 
> Let's analyze bdrv_replace_child_noperm (probably the tougher of the
> two): this function performs a graph modification, removing a child from
> a bs and putting it under another. It modifies the bs's ->parents and
> ->children lists, and it definitely needs protection because these
> lists are also read from iothreads in parallel.
> 
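To make the problem concrete, here is a rough sketch of the kind of race
that needs protection (field and function names are simplified, this is
not the exact QEMU code):

/* Iothread, somewhere in an I/O or drain path, reading the graph: */
static void reader_example(BlockDriverState *bs)
{
    BdrvChild *c;

    QLIST_FOREACH(c, &bs->parents, next_parent) {
        /* notify the parent, look at c->bs, recurse, ... */
    }
}

/* Main loop, bdrv_replace_child_noperm()-style graph modification: */
static void writer_example(BdrvChild *child, BlockDriverState *new_bs)
{
    /*
     * Without any protection this can run concurrently with the
     * QLIST_FOREACH above and leave the reader on a dangling entry.
     */
    QLIST_REMOVE(child, next_parent);                        /* old ->parents */
    child->bs = new_bs;
    QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent); /* new ->parents */
}
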
> Possible candidates to use as replacement:
> 
> - rwlock. With the help of Paolo, I implemented an rwlock optimized for
> many fast readers and few writers. Ideal for
> bdrv_replace_child_noperm. However, the problem here is that when a
> writer has to wait for other readers to finish (since it needs exclusive
> access), it has to run a nested event loop to allow others (readers
> included) to progress.
> And this brings us into serious complications, because polling with the
> wlock taken is prone to a lot of deadlocks, including the fact that the
> AioContext lock is still needed in AIO_WAIT_WHILE. The solution would be
> to run everything, readers included, in coroutines. However, this is not
> easy either: long story short, switching BlockDriverState callbacks to
> coroutines is a big problem, as the AioContext lock is still being taken
> in many of the callbacks' callers, and therefore calling them from a
> coroutine creates a mixture of held locks that simply results in deadlocks.
> Ideally we would first get rid of the AioContext lock and then switch to
> coroutines, but getting rid of that lock is the whole point of the rwlock.
> More on this here:
> https://patchew.org/QEMU/20220426085114.199647-1-eesposit@redhat.com/#cc5e12d1-d25f-d338-bff2-0d3f5cc0def7@redhat.com

This is also very useful (on the same thread as above):
https://patchew.org/QEMU/20220426085114.199647-1-eesposit@redhat.com/#6fc3e40e-7682-b9dc-f789-3ca95e0430db@redhat.com
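
Just to make the above more concrete, the usage pattern we have in mind
looks roughly like this (the graph lock function names and
graph_has_readers() are hypothetical placeholders, not existing API; the
sketch only shows where the nested event loop bites):

/* Reader, running in a coroutine in any iothread: */
static void coroutine_fn co_reader_example(BlockDriverState *bs)
{
    BdrvChild *child;

    bdrv_graph_co_rdlock();     /* cheap in the fast path, no contention */
    QLIST_FOREACH(child, &bs->children, next) {
        /* ... walk the graph safely ... */
    }
    bdrv_graph_co_rdunlock();
}

/* Writer, main loop only: */
static void writer_example(void)
{
    bdrv_graph_wrlock();        /* stop new readers from entering */

    /*
     * Wait for in-flight readers to finish.  This is the problematic
     * part: AIO_WAIT_WHILE is a nested event loop, and callers further
     * up the stack may still hold the AioContext lock, so polling here
     * with the wrlock taken easily deadlocks (see the links above).
     * graph_has_readers() is a hypothetical helper.
     */
    AIO_WAIT_WHILE(NULL, graph_has_readers());

    /* ... modify ->children / ->parents ... */
    bdrv_graph_wrunlock();
}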

> 
> But I would say this is not an ideal candidate to replace the AioContext
> lock. At least not in the immediate future.
> 
> - drains. This was the initial and still main lead. Using
> bdrv_drained_begin/end we are sure that a node and all its parents will
> be paused (jobs included), no further I/O will come in since it is
> temporarily disabled, and all in-flight requests are guaranteed to have
> finished by the time bdrv_drained_begin returns.
> Even better than bdrv_drained_begin, I proposed using
> bdrv_subtree_drained_begin, which also stops and protects the children
> of a node.
> I think the major drawback of this is that we need to be sure there are
> no cases where drain is not enough. Together with Kevin and Stefan we
> identified that we need to prevent drain from being called in
> coroutines, regardless of which AioContext they run in. That's because
> they could allow parallel drain/graph reads to happen: for example
> (thinking about the general case), a coroutine that yields after calling
> drain_begin, in the middle of a graph modification, could allow another
> coroutine to drain or read the graph.
> Note that draining itself also involves reading the graph.
> 
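For reference, the drain-based protection of a graph change would look
roughly like this (parameter names are just illustrative; the comment
marks exactly where a yielding coroutine breaks the scheme):

static void graph_change_under_drain(BlockDriverState *old_parent,
                                     BlockDriverState *new_parent,
                                     BdrvChild *child)
{
    /* Stop I/O, jobs and anything else touching the two subtrees */
    bdrv_subtree_drained_begin(old_parent);
    bdrv_subtree_drained_begin(new_parent);

    /*
     * Safe to touch ->children / ->parents here... unless we are in a
     * coroutine that yields at this point, which is exactly the problem
     * described above: another coroutine could drain or read the graph
     * while the modification is half done.
     */
    /* bdrv_replace_child_noperm(...); */

    bdrv_subtree_drained_end(new_parent);
    bdrv_subtree_drained_end(old_parent);
}
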
> We thought the only place where drain is called from a coroutine was the
> mirror run() callback. However, that is just the tip of the iceberg.
> Other functions, like the .bdrv_open callbacks (qcow2_open for instance),
> create coroutines to execute part of their logic, for valid performance
> reasons (we don't want to block when we could simply yield and allow
> something else to run).
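
For context, the pattern used by qcow2_open() looks roughly like this
(simplified from memory, details in qcow2.c may differ slightly):

static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
                      Error **errp)
{
    QCow2OpenCo qoc = {
        .bs = bs,
        .options = options,
        .flags = flags,
        .errp = errp,
        .ret = -EINPROGRESS,
    };

    if (qemu_in_coroutine()) {
        /* Already in coroutine context, e.g. from bdrv_co_create. */
        qcow2_open_entry(&qoc);
    } else {
        /*
         * Run the bulk of the open logic in a coroutine so it can yield
         * instead of blocking, and poll until it finishes.  Code running
         * inside that coroutine may in turn end up draining or reading
         * the graph, which is the concern discussed above.
         */
        qemu_coroutine_enter(qemu_coroutine_create(qcow2_open_entry, &qoc));
        BDRV_POLL_WHILE(bs, qoc.ret == -EINPROGRESS);
    }
    return qoc.ret;
}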
> 
> So another question is: what could we do to solve this coroutine issue?
> Ideas?
> 
> Main drain series:
> https://patchew.org/QEMU/20220118162738.1366281-1-eesposit@redhat.com/
> [1]
> 
> 
> 
> [1] = https://patchew.org/QEMU/20220301142113.163174-1-eesposit@redhat.com/
> 
> Thank you,
> Emanuele
> 
> 


