From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-block@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>,
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
Wen Congyang <wencongyang2@huawei.com>,
Xie Changlong <xiechanglong.d@gmail.com>,
Markus Armbruster <armbru@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>,
qemu-devel@nongnu.org
Subject: Re: [PATCH v10 11/21] jobs: group together API calls under the same job lock
Date: Tue, 16 Aug 2022 16:54:01 +0200 [thread overview]
Message-ID: <1ed3c1c5-8393-2dc8-c930-606b73778a6b@redhat.com> (raw)
In-Reply-To: <Yuv9cKJotWg0NEno@redhat.com>
Am 04/08/2022 um 19:10 schrieb Kevin Wolf:
> Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben:
>> Now that the API offers also _locked() functions, take advantage
>> of it and give also the caller control to take the lock and call
>> _locked functions.
>>
>> This makes sense especially when we have for loops, because it
>> makes no sense to have:
>>
>> for(job = job_next(); ...)
>>
>> where each job_next() takes the lock internally.
>> Instead we want
>>
>> JOB_LOCK_GUARD();
>> for(job = job_next_locked(); ...)
>>
>> In addition, protect also direct field accesses, by either creating a
>> new critical section or widening the existing ones.
>
> "In addition" sounds like it should be a separate patch. I was indeed
> surprised when after a few for loops where you just pulled the existing
> locking up a bit, I saw some hunks that add completely new locking.
Would it be okay if we don't split it in two? There would be two
microscopical patches.
>
>> Note: at this stage, job_{lock/unlock} and job lock guard macros
>> are *nop*.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>> block.c | 17 ++++++++++-------
>> blockdev.c | 12 +++++++++---
>> blockjob.c | 35 ++++++++++++++++++++++-------------
>> job-qmp.c | 4 +++-
>> job.c | 7 +++++--
>> monitor/qmp-cmds.c | 7 +++++--
>> qemu-img.c | 37 +++++++++++++++++++++----------------
>> 7 files changed, 75 insertions(+), 44 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 2c00dddd80..7559965dbc 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -4978,8 +4978,8 @@ static void bdrv_close(BlockDriverState *bs)
>>
>> void bdrv_close_all(void)
>> {
>> - assert(job_next(NULL) == NULL);
>> GLOBAL_STATE_CODE();
>> + assert(job_next(NULL) == NULL);
>>
>> /* Drop references from requests still in flight, such as canceled block
>> * jobs whose AIO context has not been polled yet */
>> @@ -6165,13 +6165,16 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp)
>> }
>> }
>>
>> - for (job = block_job_next(NULL); job; job = block_job_next(job)) {
>> - GSList *el;
>> + WITH_JOB_LOCK_GUARD() {
>> + for (job = block_job_next_locked(NULL); job;
>> + job = block_job_next_locked(job)) {
>> + GSList *el;
>>
>> - xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
>> - job->job.id);
>> - for (el = job->nodes; el; el = el->next) {
>> - xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
>> + xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
>> + job->job.id);
>> + for (el = job->nodes; el; el = el->next) {
>> + xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
>> + }
>> }
>> }
>>
>> diff --git a/blockdev.c b/blockdev.c
>> index 71f793c4ab..5b79093155 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -150,12 +150,15 @@ void blockdev_mark_auto_del(BlockBackend *blk)
>> return;
>> }
>>
>> - for (job = block_job_next(NULL); job; job = block_job_next(job)) {
>> + JOB_LOCK_GUARD();
>> +
>> + for (job = block_job_next_locked(NULL); job;
>> + job = block_job_next_locked(job)) {
>> if (block_job_has_bdrv(job, blk_bs(blk))) {
>
> Should this be renamed to block_job_has_bdrv_locked() now?
>
> It looks to me like it does need the locking. (Which actually makes
> this patch a fix and not just an optimisation as the commit message
> suggests.)
Nope, as GSList *nodes; is always read and written under BQL.
>
>> AioContext *aio_context = job->job.aio_context;
>> aio_context_acquire(aio_context);
>>
>> - job_cancel(&job->job, false);
>> + job_cancel_locked(&job->job, false);
>>
>> aio_context_release(aio_context);
>> }
>> @@ -3745,7 +3748,10 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
>> BlockJobInfoList *head = NULL, **tail = &head;
>> BlockJob *job;
>>
>> - for (job = block_job_next(NULL); job; job = block_job_next(job)) {
>> + JOB_LOCK_GUARD();
>> +
>> + for (job = block_job_next_locked(NULL); job;
>> + job = block_job_next_locked(job)) {
>> BlockJobInfo *value;
>> AioContext *aio_context;
>
> More context:
>
> BlockJobInfo *value;
> AioContext *aio_context;
>
> if (block_job_is_internal(job)) {
> continue;
> }
> aio_context = block_job_get_aio_context(job);
> aio_context_acquire(aio_context);
> value = block_job_query(job, errp);
> aio_context_release(aio_context);
>
> This should become block_job_query_locked(). (You do that in patch 18,
> but it looks a bit out of place there - which is precisely because it
> really belongs in this one.)
Ok
>
>> diff --git a/blockjob.c b/blockjob.c
>> index 0d59aba439..96fb9d9f73 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -111,8 +111,10 @@ static bool child_job_drained_poll(BdrvChild *c)
>> /* An inactive or completed job doesn't have any pending requests. Jobs
>> * with !job->busy are either already paused or have a pause point after
>> * being reentered, so no job driver code will run before they pause. */
>> - if (!job->busy || job_is_completed(job)) {
>> - return false;
>> + WITH_JOB_LOCK_GUARD() {
>> + if (!job->busy || job_is_completed_locked(job)) {
>> + return false;
>> + }
>> }
>>
>> /* Otherwise, assume that it isn't fully stopped yet, but allow the job to
>
> Assuming that the job status can actually change, don't we need the
> locking for the rest of the function, too? Otherwise we might call
> drv->drained_poll() for a job that has already paused or completed.
>
> Of course, this goes against the assumption that all callbacks are
> called without holding the job lock. Maybe it's not a good assumption.
>
>> @@ -475,13 +477,15 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
>> job->ready_notifier.notify = block_job_event_ready;
>> job->idle_notifier.notify = block_job_on_idle;
>>
>> - notifier_list_add(&job->job.on_finalize_cancelled,
>> - &job->finalize_cancelled_notifier);
>> - notifier_list_add(&job->job.on_finalize_completed,
>> - &job->finalize_completed_notifier);
>> - notifier_list_add(&job->job.on_pending, &job->pending_notifier);
>> - notifier_list_add(&job->job.on_ready, &job->ready_notifier);
>> - notifier_list_add(&job->job.on_idle, &job->idle_notifier);
>> + WITH_JOB_LOCK_GUARD() {
>> + notifier_list_add(&job->job.on_finalize_cancelled,
>> + &job->finalize_cancelled_notifier);
>> + notifier_list_add(&job->job.on_finalize_completed,
>> + &job->finalize_completed_notifier);
>> + notifier_list_add(&job->job.on_pending, &job->pending_notifier);
>> + notifier_list_add(&job->job.on_ready, &job->ready_notifier);
>> + notifier_list_add(&job->job.on_idle, &job->idle_notifier);
>> + }
>>
>> error_setg(&job->blocker, "block device is in use by block job: %s",
>> job_type_str(&job->job));
>
> Why is this the right scope for the lock? It looks very arbitrary to
> lock only here, but not for the assignments above or the function calls
> below.
>
> Given that job_create() already puts the job in the job_list so it
> becomes visible for other code, should we not keep the job lock from the
> moment that we create the job until it is fully initialised?
I try to protect only what needs protection, nothing more. Otherwise
then it is not clear what are we protecting and why. According to the
split I made in job.h, things like job_type_str and whatever I did not
protect are not protected because they don't need the lock.
>
>> @@ -558,10 +562,15 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
>> action);
>> }
>> if (action == BLOCK_ERROR_ACTION_STOP) {
>> - if (!job->job.user_paused) {
>> - job_pause(&job->job);
>> - /* make the pause user visible, which will be resumed from QMP. */
>> - job->job.user_paused = true;
>> + WITH_JOB_LOCK_GUARD() {
>> + if (!job->job.user_paused) {
>> + job_pause_locked(&job->job);
>> + /*
>> + * make the pause user visible, which will be
>> + * resumed from QMP.
>> + */
>> + job->job.user_paused = true;
>> + }
>> }
>> block_job_iostatus_set_err(job, error);
>
> Why is this call not in the critical section? It accesses job->iostatus.
But the blockjob is not yet "classified". Comes after.
>
>> }
>> diff --git a/job-qmp.c b/job-qmp.c
>> index ac11a6c23c..cfaf34ffb7 100644
>> --- a/job-qmp.c
>> +++ b/job-qmp.c
>> @@ -194,7 +194,9 @@ JobInfoList *qmp_query_jobs(Error **errp)
>> JobInfoList *head = NULL, **tail = &head;
>> Job *job;
>>
>> - for (job = job_next(NULL); job; job = job_next(job)) {
>> + JOB_LOCK_GUARD();
>> +
>> + for (job = job_next_locked(NULL); job; job = job_next_locked(job)) {
>> JobInfo *value;
>> AioContext *aio_context;
>
> Should job_query_single() be renamed to job_query_single_locked()?
Makes sense
>
>> diff --git a/job.c b/job.c
>> index ebaa4e585b..b0729e2bb2 100644
>> --- a/job.c
>> +++ b/job.c
>> @@ -668,7 +668,7 @@ void coroutine_fn job_pause_point(Job *job)
>> job_pause_point_locked(job);
>> }
>>
>> -void job_yield_locked(Job *job)
>> +static void job_yield_locked(Job *job)
>> {
>> assert(job->busy);
>
> It was already unused outside of job.c before this patch. Should it have
> been static from the start?
>
>> @@ -1041,11 +1041,14 @@ static void job_completed_txn_abort_locked(Job *job)
>> /* Called with job_mutex held, but releases it temporarily */
>> static int job_prepare_locked(Job *job)
>> {
>> + int ret;
>> +
>> GLOBAL_STATE_CODE();
>> if (job->ret == 0 && job->driver->prepare) {
>> job_unlock();
>> - job->ret = job->driver->prepare(job);
>> + ret = job->driver->prepare(job);
>> job_lock();
>> + job->ret = ret;
>> job_update_rc_locked(job);
>> }
>> return job->ret;
>> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
>> index 1ebb89f46c..1897ed7a13 100644
>> --- a/monitor/qmp-cmds.c
>> +++ b/monitor/qmp-cmds.c
>> @@ -133,8 +133,11 @@ void qmp_cont(Error **errp)
>> blk_iostatus_reset(blk);
>> }
>>
>> - for (job = block_job_next(NULL); job; job = block_job_next(job)) {
>> - block_job_iostatus_reset(job);
>> + WITH_JOB_LOCK_GUARD() {
>> + for (job = block_job_next_locked(NULL); job;
>> + job = block_job_next_locked(job)) {
>> + block_job_iostatus_reset_locked(job);
>> + }
>> }
>>
>> /* Continuing after completed migration. Images have been inactivated to
>> diff --git a/qemu-img.c b/qemu-img.c
>> index 4cf4d2423d..98c7662b0f 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -912,25 +912,30 @@ static void run_block_job(BlockJob *job, Error **errp)
>> int ret = 0;
>>
>> aio_context_acquire(aio_context);
>> - job_ref(&job->job);
>> - do {
>> - float progress = 0.0f;
>> - aio_poll(aio_context, true);
>> + WITH_JOB_LOCK_GUARD() {
>> + job_ref_locked(&job->job);
>> + do {
>> + float progress = 0.0f;
>> + job_unlock();
>
> This might be more a question of style, but mixing WITH_JOB_LOCK_GUARD()
> with manual job_unlock()/job_lock() feels dangerous. What if someone
> added a break between the unlock/lock pair? The lock guard would try to
> unlock a mutex that is already unlocked, which probably means an
> assertion failure.
>
> I feel we should just use manual job_lock()/job_unlock() for everything
> in this function.
As you wish, ok.
>
>> + aio_poll(aio_context, true);
>> +
>> + progress_get_snapshot(&job->job.progress, &progress_current,
>> + &progress_total);
>> + if (progress_total) {
>> + progress = (float)progress_current / progress_total * 100.f;
>> + }
>> + qemu_progress_print(progress, 0);
>> + job_lock();
>> + } while (!job_is_ready_locked(&job->job) &&
>> + !job_is_completed_locked(&job->job));
>>
>> - progress_get_snapshot(&job->job.progress, &progress_current,
>> - &progress_total);
>> - if (progress_total) {
>> - progress = (float)progress_current / progress_total * 100.f;
>> + if (!job_is_completed_locked(&job->job)) {
>> + ret = job_complete_sync_locked(&job->job, errp);
>> + } else {
>> + ret = job->job.ret;
>> }
>> - qemu_progress_print(progress, 0);
>> - } while (!job_is_ready(&job->job) && !job_is_completed(&job->job));
>> -
>> - if (!job_is_completed(&job->job)) {
>> - ret = job_complete_sync(&job->job, errp);
>> - } else {
>> - ret = job->job.ret;
>> + job_unref_locked(&job->job);
>> }
>> - job_unref(&job->job);
>> aio_context_release(aio_context);
>>
>> /* publish completion progress only when success */
>
Emanuele
next prev parent reply other threads:[~2022-08-16 14:56 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-25 7:38 [PATCH v10 00/21] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 01/21] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 02/21] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
2022-07-29 12:30 ` Kevin Wolf
2022-08-16 18:28 ` Stefan Hajnoczi
2022-07-25 7:38 ` [PATCH v10 03/21] job.c: API functions not used outside should be static Emanuele Giuseppe Esposito
2022-07-29 12:30 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 04/21] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
2022-07-29 12:33 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 05/21] job.c: add job_lock/unlock while keeping job.h intact Emanuele Giuseppe Esposito
2022-07-27 10:45 ` Vladimir Sementsov-Ogievskiy
2022-07-29 13:33 ` Kevin Wolf
2022-08-16 18:31 ` Stefan Hajnoczi
2022-07-25 7:38 ` [PATCH v10 06/21] job: move and update comments from blockjob.c Emanuele Giuseppe Esposito
2022-08-03 15:47 ` Kevin Wolf
2022-08-16 18:32 ` Stefan Hajnoczi
2022-07-25 7:38 ` [PATCH v10 07/21] blockjob: introduce block_job _locked() APIs Emanuele Giuseppe Esposito
2022-08-03 15:52 ` Kevin Wolf
2022-08-16 18:33 ` Stefan Hajnoczi
2022-07-25 7:38 ` [PATCH v10 08/21] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
2022-08-04 11:47 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 09/21] jobs: use job locks also in the unit tests Emanuele Giuseppe Esposito
2022-07-27 14:29 ` Vladimir Sementsov-Ogievskiy
2022-08-04 11:56 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 10/21] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
2022-08-04 16:35 ` Kevin Wolf
2022-08-16 14:23 ` Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 11/21] jobs: group together API calls under the same job lock Emanuele Giuseppe Esposito
2022-07-27 14:50 ` Vladimir Sementsov-Ogievskiy
2022-08-04 17:10 ` Kevin Wolf
2022-08-16 14:54 ` Emanuele Giuseppe Esposito [this message]
2022-08-17 8:46 ` Kevin Wolf
2022-08-17 9:35 ` Emanuele Giuseppe Esposito
2022-08-17 9:59 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 12/21] commit and mirror: create new nodes using bdrv_get_aio_context, and not the job aiocontext Emanuele Giuseppe Esposito
2022-08-05 8:14 ` Kevin Wolf
2022-08-16 14:57 ` Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 13/21] job: detect change of aiocontext within job coroutine Emanuele Giuseppe Esposito
2022-08-05 8:37 ` Kevin Wolf
2022-08-16 15:09 ` Emanuele Giuseppe Esposito
2022-08-17 8:34 ` Kevin Wolf
2022-08-17 11:16 ` Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 14/21] jobs: protect job.aio_context with BQL and job_mutex Emanuele Giuseppe Esposito
2022-07-27 15:22 ` Vladimir Sementsov-Ogievskiy
2022-08-05 9:12 ` Kevin Wolf
2022-08-17 8:04 ` Emanuele Giuseppe Esposito
2022-08-17 13:10 ` Emanuele Giuseppe Esposito
2022-08-18 8:48 ` Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 15/21] blockjob.h: categorize fields in struct BlockJob Emanuele Giuseppe Esposito
2022-08-05 9:21 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 16/21] blockjob: rename notifier callbacks as _locked Emanuele Giuseppe Esposito
2022-08-05 9:25 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 17/21] blockjob: protect iostatus field in BlockJob struct Emanuele Giuseppe Esposito
2022-07-27 15:29 ` Vladimir Sementsov-Ogievskiy
2022-08-16 12:39 ` Emanuele Giuseppe Esposito
2022-08-05 10:55 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 18/21] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
2022-07-27 15:53 ` Vladimir Sementsov-Ogievskiy
2022-08-16 12:52 ` Emanuele Giuseppe Esposito
2022-08-17 18:54 ` Vladimir Sementsov-Ogievskiy
2022-08-18 7:46 ` Emanuele Giuseppe Esposito
2022-08-19 15:49 ` Vladimir Sementsov-Ogievskiy
2022-08-16 12:53 ` Emanuele Giuseppe Esposito
2022-08-05 13:01 ` Kevin Wolf
2022-08-17 12:45 ` Emanuele Giuseppe Esposito
2022-07-25 7:38 ` [PATCH v10 19/21] block_job_query: remove atomic read Emanuele Giuseppe Esposito
2022-08-05 13:01 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 20/21] blockjob: remove unused functions Emanuele Giuseppe Esposito
2022-08-05 13:05 ` Kevin Wolf
2022-07-25 7:38 ` [PATCH v10 21/21] job: " Emanuele Giuseppe Esposito
2022-08-05 13:09 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1ed3c1c5-8393-2dc8-c930-606b73778a6b@redhat.com \
--to=eesposit@redhat.com \
--cc=armbru@redhat.com \
--cc=fam@euphon.net \
--cc=hreitz@redhat.com \
--cc=jsnow@redhat.com \
--cc=kwolf@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
--cc=vsementsov@virtuozzo.com \
--cc=wencongyang2@huawei.com \
--cc=xiechanglong.d@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).