From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Fam Zheng <fam@euphon.net>,
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-block@nongnu.org, qemu-devel@nongnu.org,
Hanna Reitz <hreitz@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept
Date: Wed, 13 Apr 2022 17:14:04 +0200 [thread overview]
Message-ID: <5d34e709-fe59-70df-2723-49f252aaed78@redhat.com> (raw)
In-Reply-To: <Ylbjd3kzEsBZmgJQ@redhat.com>
On 13/04/2022 at 16:51, Kevin Wolf wrote:
> On 13.04.2022 at 15:43, Emanuele Giuseppe Esposito wrote:
>> So this is a more concrete and up-to-date header.
>>
>> A few things to notice:
>> - we have a list of AioContexts. Each AioContext is registered in the
>> list when it is created, and removed when it is destroyed.
>> This list is helpful because each AioContext can only modify its own
>> number of readers, avoiding unnecessary cacheline bouncing.
>>
>> - if a coroutine changes AioContext, that is fine with regard to the
>> per-AioContext reader counter. As long as the sum is correct, there's no
>> issue. The problem comes only once the original AioContext is deleted;
>> at that point we need to move the count it held to a shared global
>> variable, otherwise we risk losing track of readers.
>
> So the idea is that we can do bdrv_graph_co_rdlock() in one thread and
> the corresponding bdrv_graph_co_rdunlock() in a different thread?
>
> Would the unlock somehow remember the original thread, or do you use the
> "sum is correct" argument and allow negative counter values, so you can
> end up having count +1 in A and -1 in B to represent "no active
> readers"? If this happens, it's likely to happen many times, so do we
> have to take integer overflows into account then?
>
>> - All synchronization between the flags explained in this header is of
>> course handled in the implementation. But for now it would be nice to
>> have feedback on the idea/API.
>>
>> So in short we need:
>> - per-AioContext counter
>> - global list of AioContexts
>> - global additional reader counter (in case an AioContext is deleted)
>> - global CoQueue
>> - global has_writer flag
>> - global QemuMutex to protect the list access
>>
>> Emanuele
>>
>> #ifndef BLOCK_LOCK_H
>> #define BLOCK_LOCK_H
>>
>> #include "qemu/osdep.h"
>>
>> /*
>> * register_aiocontext:
>> * Add AioContext @ctx to the list of AioContexts.
>> * This list is used to obtain the total number of readers
>> * currently accessing the graph.
>> */
>> void register_aiocontext(AioContext *ctx);
>>
>> /*
>> * unregister_aiocontext:
>> * Removes AioContext @ctx from the list of AioContexts.
>> */
>> void unregister_aiocontext(AioContext *ctx);
>>
>> /*
>> * bdrv_graph_wrlock:
>> * Modify the graph. Nobody else is allowed to access the graph.
>> * Sets the global has_writer to 1, so that subsequent readers will
>> * wait in a coroutine queue until the writer is done.
>> * Then tracks the running readers by summing all per-AioContext
>> * reader counters, and waits with AIO_WAIT_WHILE until they all
>> * finish.
>> */
>> void bdrv_graph_wrlock(void);
>
> Do we need a coroutine version that yields instead of using
> AIO_WAIT_WHILE() or are we sure this will only ever be called from
> non-coroutine contexts?
Writes (graph modifications) are always done under the BQL in the main
loop. Except for a unit test, I don't think a coroutine ever does that.
>
>> /*
>> * bdrv_graph_wrunlock:
>> * Write finished: reset the global has_writer to 0 and restart
>> * all readers that are waiting.
>> */
>> void bdrv_graph_wrunlock(void);
>>
>> /*
>> * bdrv_graph_co_rdlock:
>> * Read the bs graph. Increases the reader counter of the current
>> * AioContext. If has_writer is set, the writer is modifying
>> * the graph, so wait in a coroutine queue.
>> * The writer will then wake this coroutine once it is done.
>> *
>> * This lock cannot be taken recursively.
>> */
>> void coroutine_fn bdrv_graph_co_rdlock(void);
>
> What prevents it from being taken recursively when it's just a counter?
> (I do see however, that you can't take a reader lock while you have the
> writer lock or vice versa because it would deadlock.)
>
I actually didn't add the assertion to prevent it from being recursive
yet, but I think it simplifies everything if it's not recursive.
> Does this being a coroutine_fn mean that we would have to convert QMP
> command handlers to coroutines so that they can take the rdlock while
> they don't expect the graph to change? Or should we have a non-coroutine
> version, too, that works with AIO_WAIT_WHILE()?
Why convert the QMP command handlers? coroutine_fn was just to signal
that it can also be called from coroutines, like the ones created by the
blk_* API.
A reader does not have to be a coroutine. AIO_WAIT_WHILE is not
mandatory to allow it to finish; it helps to ensure progress in case
some reader is waiting for something, but other than that it is not
necessary IMO.
> Or should this only be taken for very small pieces of code directly
> accessing the BdrvChild objects, and high-level users like QMP commands
> shouldn't even consider themselves readers?
>
No, I think if we focus on small pieces of code we end up with a
million lock/unlock pairs.
>> /*
>> * bdrv_graph_rdunlock:
>> * Read terminated: decrease the reader count of the current AioContext.
>> * If the writer is waiting for reads to finish (has_writer == 1), signal
>> * the writer via aio_wait_kick() to let it continue.
>> */
>> void coroutine_fn bdrv_graph_co_rdunlock(void);
>>
>> #endif /* BLOCK_LOCK_H */
>
> I expect that in the final version, we might want to have some sugar
> like a WITH_BDRV_GRAPH_RDLOCK_GUARD() macro, but obviously that doesn't
> affect the fundamental design.
Yeah, I will ping you once I get to that point ;)
Emanuele
>
> Kevin
>
Thread overview: 49+ messages
2022-03-01 14:21 [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept Emanuele Giuseppe Esposito
2022-03-01 14:21 ` [RFC PATCH 1/5] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
2022-03-02 16:21 ` Stefan Hajnoczi
2022-03-01 14:21 ` [RFC PATCH 2/5] introduce BDRV_POLL_WHILE_UNLOCKED Emanuele Giuseppe Esposito
2022-03-02 16:22 ` Stefan Hajnoczi
2022-03-09 13:49 ` Eric Blake
2022-03-01 14:21 ` [RFC PATCH 3/5] block/io.c: introduce bdrv_subtree_drained_{begin/end}_unlocked Emanuele Giuseppe Esposito
2022-03-02 16:25 ` Stefan Hajnoczi
2022-03-01 14:21 ` [RFC PATCH 4/5] child_job_drained_poll: override polling condition only when in home thread Emanuele Giuseppe Esposito
2022-03-02 16:37 ` Stefan Hajnoczi
2022-03-01 14:21 ` [RFC PATCH 5/5] test-bdrv-drain: ensure draining from main loop stops iothreads Emanuele Giuseppe Esposito
2022-03-01 14:26 ` [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept Emanuele Giuseppe Esposito
2022-03-02 9:47 ` Stefan Hajnoczi
2022-03-09 13:26 ` Emanuele Giuseppe Esposito
2022-03-10 15:54 ` Stefan Hajnoczi
2022-03-17 16:23 ` Emanuele Giuseppe Esposito
2022-03-30 10:53 ` Hanna Reitz
2022-03-30 11:55 ` Emanuele Giuseppe Esposito
2022-03-30 14:12 ` Hanna Reitz
2022-03-30 16:02 ` Paolo Bonzini
2022-03-31 9:59 ` Paolo Bonzini
2022-03-31 13:51 ` Emanuele Giuseppe Esposito
2022-03-31 16:40 ` Paolo Bonzini
2022-04-01 8:05 ` Emanuele Giuseppe Esposito
2022-04-01 11:01 ` Paolo Bonzini
2022-04-04 9:25 ` Stefan Hajnoczi
2022-04-04 9:41 ` Paolo Bonzini
2022-04-04 9:51 ` Emanuele Giuseppe Esposito
2022-04-04 10:07 ` Paolo Bonzini
2022-04-05 9:39 ` Stefan Hajnoczi
2022-04-05 10:43 ` Kevin Wolf
2022-04-13 13:43 ` Emanuele Giuseppe Esposito
2022-04-13 14:51 ` Kevin Wolf
2022-04-13 15:14 ` Emanuele Giuseppe Esposito [this message]
2022-04-13 15:22 ` Emanuele Giuseppe Esposito
2022-04-13 16:29 ` Kevin Wolf
2022-04-13 20:43 ` Paolo Bonzini
2022-04-13 20:46 ` Paolo Bonzini
2022-03-02 11:07 ` Vladimir Sementsov-Ogievskiy
2022-03-02 16:20 ` Stefan Hajnoczi
2022-03-09 13:26 ` Emanuele Giuseppe Esposito
2022-03-16 21:55 ` Emanuele Giuseppe Esposito
2022-03-21 12:22 ` Vladimir Sementsov-Ogievskiy
2022-03-21 15:24 ` Vladimir Sementsov-Ogievskiy
2022-03-21 15:44 ` Vladimir Sementsov-Ogievskiy
2022-03-30 9:09 ` Emanuele Giuseppe Esposito
2022-03-30 9:52 ` Vladimir Sementsov-Ogievskiy
2022-03-30 9:58 ` Emanuele Giuseppe Esposito
2022-04-05 10:55 ` Kevin Wolf