From: Kevin Wolf <kwolf@redhat.com>
To: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Cc: qemu-block@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>,
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
Wen Congyang <wencongyang2@huawei.com>,
Xie Changlong <xiechanglong.d@gmail.com>,
Markus Armbruster <armbru@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>,
qemu-devel@nongnu.org
Subject: Re: [PATCH v10 11/21] jobs: group together API calls under the same job lock
Date: Wed, 17 Aug 2022 10:46:54 +0200 [thread overview]
Message-ID: <Yvyq/jhJ0B0W6mtF@redhat.com> (raw)
In-Reply-To: <1ed3c1c5-8393-2dc8-c930-606b73778a6b@redhat.com>
Am 16.08.2022 um 16:54 hat Emanuele Giuseppe Esposito geschrieben:
> Am 04/08/2022 um 19:10 schrieb Kevin Wolf:
> > Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben:
> >> Now that the API offers also _locked() functions, take advantage
> >> of it and give also the caller control to take the lock and call
> >> _locked functions.
> >>
> >> This makes sense especially when we have for loops, because it
> >> makes no sense to have:
> >>
> >> for(job = job_next(); ...)
> >>
> >> where each job_next() takes the lock internally.
> >> Instead we want
> >>
> >> JOB_LOCK_GUARD();
> >> for(job = job_next_locked(); ...)
> >>
> >> In addition, protect also direct field accesses, by either creating a
> >> new critical section or widening the existing ones.
> >
> > "In addition" sounds like it should be a separate patch. I was indeed
> > surprised when after a few for loops where you just pulled the existing
> > locking up a bit, I saw some hunks that add completely new locking.
>
> Would it be okay if we don't split it in two? There would be two
> microscopical patches.
If it would be just a hunk or two, fair enough.
> >> Note: at this stage, job_{lock/unlock} and job lock guard macros
> >> are *nop*.
> >>
> >> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> >> ---
> >> block.c | 17 ++++++++++-------
> >> blockdev.c | 12 +++++++++---
> >> blockjob.c | 35 ++++++++++++++++++++++-------------
> >> job-qmp.c | 4 +++-
> >> job.c | 7 +++++--
> >> monitor/qmp-cmds.c | 7 +++++--
> >> qemu-img.c | 37 +++++++++++++++++++++----------------
> >> 7 files changed, 75 insertions(+), 44 deletions(-)
> >>
> >> diff --git a/block.c b/block.c
> >> index 2c00dddd80..7559965dbc 100644
> >> --- a/block.c
> >> +++ b/block.c
> >> @@ -4978,8 +4978,8 @@ static void bdrv_close(BlockDriverState *bs)
> >>
> >> void bdrv_close_all(void)
> >> {
> >> - assert(job_next(NULL) == NULL);
> >> GLOBAL_STATE_CODE();
> >> + assert(job_next(NULL) == NULL);
> >>
> >> /* Drop references from requests still in flight, such as canceled block
> >> * jobs whose AIO context has not been polled yet */
> >> @@ -6165,13 +6165,16 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp)
> >> }
> >> }
> >>
> >> - for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> >> - GSList *el;
> >> + WITH_JOB_LOCK_GUARD() {
> >> + for (job = block_job_next_locked(NULL); job;
> >> + job = block_job_next_locked(job)) {
> >> + GSList *el;
> >>
> >> - xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
> >> - job->job.id);
> >> - for (el = job->nodes; el; el = el->next) {
> >> - xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
> >> + xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
> >> + job->job.id);
> >> + for (el = job->nodes; el; el = el->next) {
> >> + xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
> >> + }
> >> }
> >> }
> >>
> >> diff --git a/blockdev.c b/blockdev.c
> >> index 71f793c4ab..5b79093155 100644
> >> --- a/blockdev.c
> >> +++ b/blockdev.c
> >> @@ -150,12 +150,15 @@ void blockdev_mark_auto_del(BlockBackend *blk)
> >> return;
> >> }
> >>
> >> - for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> >> + JOB_LOCK_GUARD();
> >> +
> >> + for (job = block_job_next_locked(NULL); job;
> >> + job = block_job_next_locked(job)) {
> >> if (block_job_has_bdrv(job, blk_bs(blk))) {
> >
> > Should this be renamed to block_job_has_bdrv_locked() now?
> >
> > It looks to me like it does need the locking. (Which actually makes
> > this patch a fix and not just an optimisation as the commit message
> > suggests.)
>
> Nope, as GSList *nodes; is always read and written under BQL.
Ah, right. I wonder if we should later take the lock anyway even for
fields where it's not strictly necessary to simplify the locking rules.
Having to check the rules for each field separately is kind of hard. But
let's do only the necessary things in this series.
> >
> >> AioContext *aio_context = job->job.aio_context;
> >> aio_context_acquire(aio_context);
> >>
> >> - job_cancel(&job->job, false);
> >> + job_cancel_locked(&job->job, false);
> >>
> >> aio_context_release(aio_context);
> >> }
> >> @@ -3745,7 +3748,10 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
> >> BlockJobInfoList *head = NULL, **tail = &head;
> >> BlockJob *job;
> >>
> >> - for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> >> + JOB_LOCK_GUARD();
> >> +
> >> + for (job = block_job_next_locked(NULL); job;
> >> + job = block_job_next_locked(job)) {
> >> BlockJobInfo *value;
> >> AioContext *aio_context;
> >
> > More context:
> >
> > BlockJobInfo *value;
> > AioContext *aio_context;
> >
> > if (block_job_is_internal(job)) {
> > continue;
> > }
> > aio_context = block_job_get_aio_context(job);
> > aio_context_acquire(aio_context);
> > value = block_job_query(job, errp);
> > aio_context_release(aio_context);
> >
> > This should become block_job_query_locked(). (You do that in patch 18,
> > but it looks a bit out of place there - which is precisely because it
> > really belongs in this one.)
>
> Ok
> >
> >> diff --git a/blockjob.c b/blockjob.c
> >> index 0d59aba439..96fb9d9f73 100644
> >> --- a/blockjob.c
> >> +++ b/blockjob.c
> >> @@ -111,8 +111,10 @@ static bool child_job_drained_poll(BdrvChild *c)
> >> /* An inactive or completed job doesn't have any pending requests. Jobs
> >> * with !job->busy are either already paused or have a pause point after
> >> * being reentered, so no job driver code will run before they pause. */
> >> - if (!job->busy || job_is_completed(job)) {
> >> - return false;
> >> + WITH_JOB_LOCK_GUARD() {
> >> + if (!job->busy || job_is_completed_locked(job)) {
> >> + return false;
> >> + }
> >> }
> >>
> >> /* Otherwise, assume that it isn't fully stopped yet, but allow the job to
> >
> > Assuming that the job status can actually change, don't we need the
> > locking for the rest of the function, too? Otherwise we might call
> > drv->drained_poll() for a job that has already paused or completed.
> >
> > Of course, this goes against the assumption that all callbacks are
> > called without holding the job lock. Maybe it's not a good assumption.
> >
> >> @@ -475,13 +477,15 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
> >> job->ready_notifier.notify = block_job_event_ready;
> >> job->idle_notifier.notify = block_job_on_idle;
> >>
> >> - notifier_list_add(&job->job.on_finalize_cancelled,
> >> - &job->finalize_cancelled_notifier);
> >> - notifier_list_add(&job->job.on_finalize_completed,
> >> - &job->finalize_completed_notifier);
> >> - notifier_list_add(&job->job.on_pending, &job->pending_notifier);
> >> - notifier_list_add(&job->job.on_ready, &job->ready_notifier);
> >> - notifier_list_add(&job->job.on_idle, &job->idle_notifier);
> >> + WITH_JOB_LOCK_GUARD() {
> >> + notifier_list_add(&job->job.on_finalize_cancelled,
> >> + &job->finalize_cancelled_notifier);
> >> + notifier_list_add(&job->job.on_finalize_completed,
> >> + &job->finalize_completed_notifier);
> >> + notifier_list_add(&job->job.on_pending, &job->pending_notifier);
> >> + notifier_list_add(&job->job.on_ready, &job->ready_notifier);
> >> + notifier_list_add(&job->job.on_idle, &job->idle_notifier);
> >> + }
> >>
> >> error_setg(&job->blocker, "block device is in use by block job: %s",
> >> job_type_str(&job->job));
> >
> > Why is this the right scope for the lock? It looks very arbitrary to
> > lock only here, but not for the assignments above or the function calls
> > below.
> >
> > Given that job_create() already puts the job in the job_list so it
> > becomes visible for other code, should we not keep the job lock from the
> > moment that we create the job until it is fully initialised?
>
> I try to protect only what needs protection, nothing more. Otherwise
> then it is not clear what are we protecting and why. According to the
> split I made in job.h, things like job_type_str and whatever I did not
> protect are not protected because they don't need the lock.
I think the last paragraph above explains what it would protect?
Having a half-initialised job in the job list without holding the lock
sounds dangerous to me. Maybe it's actually okay in practice because
this is GLOBAL_STATE_CODE() and we can assume that code accessing
the job list outside of the main thread probably skips over the
half-initialised job, but it's another case where relying on the BQL is
confusing when there would be a more specific lock for it.
> >
> >> @@ -558,10 +562,15 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
> >> action);
> >> }
> >> if (action == BLOCK_ERROR_ACTION_STOP) {
> >> - if (!job->job.user_paused) {
> >> - job_pause(&job->job);
> >> - /* make the pause user visible, which will be resumed from QMP. */
> >> - job->job.user_paused = true;
> >> + WITH_JOB_LOCK_GUARD() {
> >> + if (!job->job.user_paused) {
> >> + job_pause_locked(&job->job);
> >> + /*
> >> + * make the pause user visible, which will be
> >> + * resumed from QMP.
> >> + */
> >> + job->job.user_paused = true;
> >> + }
> >> }
> >> block_job_iostatus_set_err(job, error);
> >
> > Why is this call not in the critical section? It accesses job->iostatus.
>
> But the blockjob is not yet "classified". Comes after.
Ok.
Kevin
next prev parent reply other threads:[~2022-08-17 8:53 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-25 7:38 [PATCH v10 00/21] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 01/21] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 02/21] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
2022-07-29 12:30 ` Kevin Wolf
2022-08-16 18:28 ` Stefan Hajnoczi
2022-07-25 7:38 ` [PATCH v10 03/21] job.c: API functions not used outside should be static Emanuele Giuseppe Esposito
2022-07-29 12:30 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 04/21] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
2022-07-29 12:33 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 05/21] job.c: add job_lock/unlock while keeping job.h intact Emanuele Giuseppe Esposito
2022-07-27 10:45 ` Vladimir Sementsov-Ogievskiy
2022-07-29 13:33 ` Kevin Wolf
2022-08-16 18:31 ` Stefan Hajnoczi
2022-07-25 7:38 ` [PATCH v10 06/21] job: move and update comments from blockjob.c Emanuele Giuseppe Esposito
2022-08-03 15:47 ` Kevin Wolf
2022-08-16 18:32 ` Stefan Hajnoczi
2022-07-25 7:38 ` [PATCH v10 07/21] blockjob: introduce block_job _locked() APIs Emanuele Giuseppe Esposito
2022-08-03 15:52 ` Kevin Wolf
2022-08-16 18:33 ` Stefan Hajnoczi
2022-07-25 7:38 ` [PATCH v10 08/21] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
2022-08-04 11:47 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 09/21] jobs: use job locks also in the unit tests Emanuele Giuseppe Esposito
2022-07-27 14:29 ` Vladimir Sementsov-Ogievskiy
2022-08-04 11:56 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 10/21] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
2022-08-04 16:35 ` Kevin Wolf
2022-08-16 14:23 ` Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 11/21] jobs: group together API calls under the same job lock Emanuele Giuseppe Esposito
2022-07-27 14:50 ` Vladimir Sementsov-Ogievskiy
2022-08-04 17:10 ` Kevin Wolf
2022-08-16 14:54 ` Emanuele Giuseppe Esposito
2022-08-17 8:46 ` Kevin Wolf [this message]
2022-08-17 9:35 ` Emanuele Giuseppe Esposito
2022-08-17 9:59 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 12/21] commit and mirror: create new nodes using bdrv_get_aio_context, and not the job aiocontext Emanuele Giuseppe Esposito
2022-08-05 8:14 ` Kevin Wolf
2022-08-16 14:57 ` Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 13/21] job: detect change of aiocontext within job coroutine Emanuele Giuseppe Esposito
2022-08-05 8:37 ` Kevin Wolf
2022-08-16 15:09 ` Emanuele Giuseppe Esposito
2022-08-17 8:34 ` Kevin Wolf
2022-08-17 11:16 ` Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 14/21] jobs: protect job.aio_context with BQL and job_mutex Emanuele Giuseppe Esposito
2022-07-27 15:22 ` Vladimir Sementsov-Ogievskiy
2022-08-05 9:12 ` Kevin Wolf
2022-08-17 8:04 ` Emanuele Giuseppe Esposito
2022-08-17 13:10 ` Emanuele Giuseppe Esposito
2022-08-18 8:48 ` Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 15/21] blockjob.h: categorize fields in struct BlockJob Emanuele Giuseppe Esposito
2022-08-05 9:21 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 16/21] blockjob: rename notifier callbacks as _locked Emanuele Giuseppe Esposito
2022-08-05 9:25 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 17/21] blockjob: protect iostatus field in BlockJob struct Emanuele Giuseppe Esposito
2022-07-27 15:29 ` Vladimir Sementsov-Ogievskiy
2022-08-16 12:39 ` Emanuele Giuseppe Esposito
2022-08-05 10:55 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 18/21] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
2022-07-27 15:53 ` Vladimir Sementsov-Ogievskiy
2022-08-16 12:52 ` Emanuele Giuseppe Esposito
2022-08-17 18:54 ` Vladimir Sementsov-Ogievskiy
2022-08-18 7:46 ` Emanuele Giuseppe Esposito
2022-08-19 15:49 ` Vladimir Sementsov-Ogievskiy
2022-08-16 12:53 ` Emanuele Giuseppe Esposito
2022-08-05 13:01 ` Kevin Wolf
2022-08-17 12:45 ` Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 19/21] block_job_query: remove atomic read Emanuele Giuseppe Esposito
2022-08-05 13:01 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 20/21] blockjob: remove unused functions Emanuele Giuseppe Esposito
2022-08-05 13:05 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 21/21] job: " Emanuele Giuseppe Esposito
2022-08-05 13:09 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yvyq/jhJ0B0W6mtF@redhat.com \
--to=kwolf@redhat.com \
--cc=armbru@redhat.com \
--cc=eesposit@redhat.com \
--cc=fam@euphon.net \
--cc=hreitz@redhat.com \
--cc=jsnow@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
--cc=vsementsov@virtuozzo.com \
--cc=wencongyang2@huawei.com \
--cc=xiechanglong.d@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).