From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
	qemu-block@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
	John Snow <jsnow@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	Fam Zheng <fam@euphon.net>,
	qemu-devel@nongnu.org
Subject: Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock
Date: Wed, 25 May 2022 10:27:34 +0200	[thread overview]
Message-ID: <98027e36-a8a2-e679-4018-c01e08cf124d@redhat.com> (raw)
In-Reply-To: <YozLHPif/jCmOfei@redhat.com>



On 24/05/2022 at 14:10, Kevin Wolf wrote:
> On 18.05.2022 at 14:28, Emanuele Giuseppe Esposito wrote:
>> label: // read till the end to see why I wrote this here
>>
>> I was hoping someone from the "No" party would answer your question,
>> because, as you know, we reached this same conclusion together.
>>
>> We thought about dropping the drain for various reasons, the main one
>> (at least as far as I understood) is that we are not sure if something
>> can still happen in between drain_begin/end, and it is a little bit
>> confusing to use the same mechanism to block I/O and protect the graph.
>>
>> We then thought about implementing a rwlock. A rdlock would clarify what
>> we are protecting and who is using the lock. I had a rwlock draft
>> implementation sent in this thread, but this also led to additional
>> problems.
>> The main problem was that this new lock would introduce nested event
>> loops, which together with such locking would just create deadlocks.
>> If readers are in coroutines and writers are not (because graph
>> operations do not run in coroutines), we get a lot of deadlocks.
>> If a writer has to take the lock, it must wait for all other readers to
>> finish. It does so by internally calling AIO_WAIT_WHILE, creating a
>> nested event loop. We don't know what could execute while polling for
>> events, and for example another writer could be resumed.
> 
> Why is this a problem? Your AIO_WAIT_WHILE() condition would be that
> there are neither readers nor writers, so you would just keep waiting
> until the other writer is done.

Yes, but when we get to the AIO_WAIT_WHILE() condition, the wrlock has
already been taken by the current writer.
I think that's also what you mean below.
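
To make the scenario concrete, the writer side of my draft worked roughly
like this (heavily simplified sketch; "graph_lock", "readers" and "writers"
are illustrative names, not the actual fields from the patch in this
thread):

    void bdrv_graph_wrlock(void)
    {
        /* Announce the writer so that new readers back off. */
        graph_lock.writers++;

        /*
         * Wait for in-flight readers to finish.  AIO_WAIT_WHILE() runs a
         * nested event loop, so anything can be dispatched while it polls:
         * a bottom half can resume another writer, a callback can try to
         * take this very lock again, and so on.  That nesting is where the
         * deadlocks come from.
         */
        AIO_WAIT_WHILE(qemu_get_aio_context(), graph_lock.readers > 0);
    }

The point is that the poll happens while we are already committed to
taking the write lock.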

> 
>> Ideally, we want writers in coroutines too.
>>
>> Additionally, many readers run in what we call "mixed" functions:
>> usually implemented automatically with the generated_co_wrapper tag,
>> they let a function (usually a bdrv callback) always run in a
>> coroutine, creating one if necessary. For example, bdrv_flush() makes
>> sure that bs->bdrv_co_flush() is always run in a coroutine.
>> Such mixed functions are used in other callbacks too, making it really
>> difficult to understand whether we are in a coroutine or not, and most
>> importantly making rwlock usage very difficult.
> 
> How do they make rwlock usage difficult?
> 
> *goes back to old IRC discussions*
> 
> Ah, the problem was not the AIO_WAIT_WHILE() while taking the lock, but
> taking the lock first and then running an AIO_WAIT_WHILE() inside the
> locked section. This is forbidden because the callbacks that run during
> AIO_WAIT_WHILE() may in turn wait for the lock that you own, causing a
> deadlock.
> 

Yes
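
Just to spell the pattern out (purely illustrative; graph_rdlock() and
graph_rdunlock() are placeholders, not functions from this series):

    void reader_with_nested_poll(BlockDriverState *bs)
    {
        graph_rdlock();     /* take the graph lock ... */

        /*
         * ... and then poll inside the locked section.
         * bdrv_drained_begin() uses AIO_WAIT_WHILE()/aio_poll()
         * internally, so the callbacks it dispatches may themselves wait
         * for the graph lock we are still holding -> deadlock.
         */
        bdrv_drained_begin(bs);
        /* ... operate on the drained subtree ... */
        bdrv_drained_end(bs);

        graph_rdunlock();
    }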

> This is indeed a problem that running in coroutines would avoid because
> the inner waiter would just yield and the outer one could complete its
> job as soon as it's its turn.
> 
> My conclusion in the IRC discussion was that maybe we need to take the
> graph locks when we're entering coroutine context, i.e. the "mixed"
> functions would rdlock the graph when called from non-coroutine context
> and would assume that it's already locked when called from coroutine
> context.

Yes, and that's what I tried to do.
But the first step was to transform all callbacks into coroutines. I
think you also agree with this, correct?

Therefore the easiest step was to convert all callbacks into
generated_co_wrapper functions, so that afterwards we could split them
between coroutine and non-coroutine logic, as discussed on IRC.
Once split, we add the lock in the way you suggested.
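
Concretely, after the split I would expect a "mixed" wrapper to end up
looking something like this (sketch only; graph_rdlock()/graph_rdunlock(),
the helper struct and the bdrv_something* names are placeholders that
mirror the g_c_w pattern, not real API):

    typedef struct BdrvSomethingCo {
        BlockDriverState *bs;
        bool done;
    } BdrvSomethingCo;

    void bdrv_something(BlockDriverState *bs)
    {
        if (qemu_in_coroutine()) {
            /* Coroutine context: assume the rdlock is already held. */
            bdrv_co_something(bs);
        } else {
            BdrvSomethingCo s = { .bs = bs, .done = false };
            Coroutine *co;

            /*
             * Non-coroutine context: take the rdlock around the whole
             * operation, as you suggested.
             */
            graph_rdlock();
            co = qemu_coroutine_create(bdrv_something_co_entry, &s);
            bdrv_coroutine_enter(bs, co);
            BDRV_POLL_WHILE(bs, !s.done);
            graph_rdunlock();
        }
    }

The coroutine branch then simply relies on its caller having taken the
rdlock when it entered coroutine context.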

However, I didn't even get to that first step, because tests were
deadlocking after transforming just 2-3 callbacks.

See Paolo's thread for a nice explanation of why they deadlock and why
converting these callbacks is difficult.

> 
>> Which led us to step back once more and try to convert all
>> BlockDriverState callbacks into coroutines. This would greatly simplify
>> rwlock usage, because we could make the rwlock coroutine-friendly
>> (without any AIO_WAIT_WHILE, allowing a writer to wait for readers by
>> just yielding and queuing itself in coroutine queues).
>>
>> The first step was then to convert all callbacks into coroutines, using
>> generated_coroutine_wrapper (g_c_w).
>> A typical g_c_w is implemented in this way:
>> 	if (qemu_in_coroutine()) {
>> 		callback();
>> 	} else { // much simplified
>> 		co = qemu_coroutine_create(callback);
>> 		bdrv_coroutine_enter(bs, co);
>> 		BDRV_POLL_WHILE(bs, coroutine_in_progress);
>> 	}
>> Once all callbacks are implemented using g_c_w, we can start splitting
>> the two sides of the if statement so that we only create a coroutine
>> when we are called from outside a bdrv callback.
>>
>> However, we immediately found a problem when starting to convert the
>> first callbacks: the AioContext lock is taken around some non-coroutine
>> callbacks! For example, bs->bdrv_open() is always called with the
>> AioContext lock taken. In addition, callbacks like bdrv_open are
>> graph-modifying functions, which is probably why we take the
>> AioContext lock, and they do not like to run in coroutines.
>> Anyway, the real problem comes when we create a coroutine in a place
>> where the AioContext lock is taken and we have a graph-modifying
>> function.
>>
>> bdrv_coroutine_enter() calls aio_co_enter(), which in turn first checks
>> whether the coroutine is entering a different context from the current
>> one (which is not the case for open) and whether we are already in a
>> coroutine (we certainly are not). Therefore it resorts to the following
>> calls:
>> 	aio_context_acquire(ctx);
>> 	qemu_aio_coroutine_enter(ctx, co);
>> 	aio_context_release(ctx);
>> This is clearly a problem, because we are taking the lock twice: once
>> in the original caller of the callback, and once here because of the
>> coroutine. This creates a lot of deadlock situations.
> 
> What are the deadlock situations that are created by locking twice?

Scratch this, and refer to Paolo's thread.

> 
> The only problem I'm aware of is AIO_WAIT_WHILE(), which wants to
> temporarily unlock the AioContext. It calls aio_context_release() once to
> achieve this, which obviously isn't enough when the context was locked
> twice.
> 
> But AIO_WAIT_WHILE() isn't allowed in coroutines anyway. So how are we
> running into deadlocks here?
> 
> Note that we're probably already doing this inside the .bdrv_open
> implementations: They will usually read something from the image file,
> calling bdrv_preadv() which is already a generated_coroutine_wrapper
> today and creates a coroutine internally with the same locking pattern
> applied that you describe as problematic here.
> 
> Making .bdrv_open itself a generated_coroutine_wrapper wouldn't really
> change anything fundamental, it would just pull the existing mechanism
> one function higher in the call stack.
> 
>> For example, all callers of bdrv_open() always take the AioContext lock.
>> Often it is taken very high in the call stack, but it's always taken.
>>
>> Getting rid of the lock around qemu_aio_coroutine_enter() is difficult
>> too, because coroutines expect to have the lock taken. For example, if
>> we want to drain from a coroutine, bdrv_co_yield_to_drain releases the
>> lock for us.
> 
> It's not difficult at all in your case where you know that you're
> already in the right thread and the lock is taken: You can call
> qemu_aio_coroutine_enter() directly instead of bdrv_coroutine_enter() in
> this case.
> 
> But as I said, I'm not sure why we need to get rid of it at all.
> 
> Kevin
> 



Thread overview: 67+ messages
2022-04-26  8:51 [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock Emanuele Giuseppe Esposito
2022-04-26  8:51 ` [RFC PATCH v2 1/8] aio_wait_kick: add missing memory barrier Emanuele Giuseppe Esposito
2022-04-28 11:09   ` Stefan Hajnoczi
2022-04-29  8:06     ` Emanuele Giuseppe Esposito
2022-04-30  5:21       ` Stefan Hajnoczi
2022-04-29  8:12   ` Paolo Bonzini
2022-04-26  8:51 ` [RFC PATCH v2 2/8] coroutine-lock: release lock when restarting all coroutines Emanuele Giuseppe Esposito
2022-04-26 14:59   ` Paolo Bonzini
2022-04-28 11:21   ` Stefan Hajnoczi
2022-04-28 22:14     ` Paolo Bonzini
2022-04-29  9:35       ` Emanuele Giuseppe Esposito
2022-04-26  8:51 ` [RFC PATCH v2 3/8] block: introduce a lock to protect graph operations Emanuele Giuseppe Esposito
2022-04-26 15:00   ` Paolo Bonzini
2022-04-28 13:45   ` Stefan Hajnoczi
2022-04-29  8:37     ` Emanuele Giuseppe Esposito
2022-04-30  5:48       ` Stefan Hajnoczi
2022-05-02  7:54         ` Emanuele Giuseppe Esposito
2022-05-03 10:50           ` Stefan Hajnoczi
2022-04-26  8:51 ` [RFC PATCH v2 4/8] async: register/unregister aiocontext in graph lock list Emanuele Giuseppe Esposito
2022-04-28 13:46   ` Stefan Hajnoczi
2022-04-28 22:19     ` Paolo Bonzini
2022-04-29  8:37       ` Emanuele Giuseppe Esposito
2022-04-26  8:51 ` [RFC PATCH v2 5/8] block.c: wrlock in bdrv_replace_child_noperm Emanuele Giuseppe Esposito
2022-04-26 15:07   ` Paolo Bonzini
2022-04-28 13:55   ` Stefan Hajnoczi
2022-04-29  8:41     ` Emanuele Giuseppe Esposito
2022-04-26  8:51 ` [RFC PATCH v2 6/8] block: assert that graph read and writes are performed correctly Emanuele Giuseppe Esposito
2022-04-28 14:43   ` Stefan Hajnoczi
2022-04-26  8:51 ` [RFC PATCH v2 7/8] graph-lock: implement WITH_GRAPH_RDLOCK_GUARD and GRAPH_RDLOCK_GUARD macros Emanuele Giuseppe Esposito
2022-04-28 15:00   ` Stefan Hajnoczi
2022-04-26  8:51 ` [RFC PATCH v2 8/8] mirror: protect drains in coroutine with rdlock Emanuele Giuseppe Esposito
2022-04-27  6:55 ` [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock Emanuele Giuseppe Esposito
2022-04-28 10:45   ` Stefan Hajnoczi
2022-04-28 21:56     ` Emanuele Giuseppe Esposito
2022-04-30  5:17       ` Stefan Hajnoczi
2022-05-02  8:02         ` Emanuele Giuseppe Esposito
2022-05-02 13:15           ` Paolo Bonzini
2022-05-03  8:24           ` Kevin Wolf
2022-05-03 11:04           ` Stefan Hajnoczi
2022-04-28 10:34 ` Stefan Hajnoczi
2022-04-29  8:06   ` Emanuele Giuseppe Esposito
2022-05-04 13:39 ` Stefan Hajnoczi
2022-05-17 10:59   ` Stefan Hajnoczi
2022-05-18 12:28     ` Emanuele Giuseppe Esposito
2022-05-18 12:43       ` Paolo Bonzini
2022-05-18 14:57         ` Stefan Hajnoczi
2022-05-18 16:14         ` Kevin Wolf
2022-05-19 11:27           ` Stefan Hajnoczi
2022-05-19 12:52             ` Kevin Wolf
2022-05-22 15:06           ` Stefan Hajnoczi
2022-05-23  8:48             ` Emanuele Giuseppe Esposito
2022-05-23 13:15               ` Stefan Hajnoczi
2022-05-23 13:54                 ` Emanuele Giuseppe Esposito
2022-05-23 13:02             ` Kevin Wolf
2022-05-23 15:13               ` Stefan Hajnoczi
2022-05-23 16:04                 ` Kevin Wolf
2022-05-23 16:45                   ` Stefan Hajnoczi
2022-05-24  7:55             ` Paolo Bonzini
2022-05-24  8:08               ` Stefan Hajnoczi
2022-05-24  9:17                 ` Paolo Bonzini
2022-05-24 10:20                   ` Stefan Hajnoczi
2022-05-24 17:25                     ` Paolo Bonzini
2022-05-24 10:36         ` Kevin Wolf
2022-05-25  7:41           ` Paolo Bonzini
2022-05-18 14:27       ` Stefan Hajnoczi
2022-05-24 12:10       ` Kevin Wolf
2022-05-25  8:27         ` Emanuele Giuseppe Esposito [this message]
