All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Cc: qemu-block@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	Wen Congyang <wencongyang2@huawei.com>,
	Xie Changlong <xiechanglong.d@gmail.com>,
	Markus Armbruster <armbru@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>,
	qemu-devel@nongnu.org
Subject: Re: [PATCH v10 11/21] jobs: group together API calls under the same job lock
Date: Thu, 4 Aug 2022 19:10:08 +0200	[thread overview]
Message-ID: <Yuv9cKJotWg0NEno@redhat.com> (raw)
In-Reply-To: <20220725073855.76049-12-eesposit@redhat.com>

Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben:
> Now that the API offers also _locked() functions, take advantage
> of it and give also the caller control to take the lock and call
> _locked functions.
> 
> This makes sense especially when we have for loops, because it
> makes no sense to have:
> 
> for(job = job_next(); ...)
> 
> where each job_next() takes the lock internally.
> Instead we want
> 
> JOB_LOCK_GUARD();
> for(job = job_next_locked(); ...)
> 
> In addition, protect also direct field accesses, by either creating a
> new critical section or widening the existing ones.

"In addition" sounds like it should be a separate patch. I was indeed
surprised when after a few for loops where you just pulled the existing
locking up a bit, I saw some hunks that add completely new locking.

> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  block.c            | 17 ++++++++++-------
>  blockdev.c         | 12 +++++++++---
>  blockjob.c         | 35 ++++++++++++++++++++++-------------
>  job-qmp.c          |  4 +++-
>  job.c              |  7 +++++--
>  monitor/qmp-cmds.c |  7 +++++--
>  qemu-img.c         | 37 +++++++++++++++++++++----------------
>  7 files changed, 75 insertions(+), 44 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 2c00dddd80..7559965dbc 100644
> --- a/block.c
> +++ b/block.c
> @@ -4978,8 +4978,8 @@ static void bdrv_close(BlockDriverState *bs)
>  
>  void bdrv_close_all(void)
>  {
> -    assert(job_next(NULL) == NULL);
>      GLOBAL_STATE_CODE();
> +    assert(job_next(NULL) == NULL);
>  
>      /* Drop references from requests still in flight, such as canceled block
>       * jobs whose AIO context has not been polled yet */
> @@ -6165,13 +6165,16 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp)
>          }
>      }
>  
> -    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> -        GSList *el;
> +    WITH_JOB_LOCK_GUARD() {
> +        for (job = block_job_next_locked(NULL); job;
> +             job = block_job_next_locked(job)) {
> +            GSList *el;
>  
> -        xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
> -                           job->job.id);
> -        for (el = job->nodes; el; el = el->next) {
> -            xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
> +            xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
> +                                job->job.id);
> +            for (el = job->nodes; el; el = el->next) {
> +                xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
> +            }
>          }
>      }
>  
> diff --git a/blockdev.c b/blockdev.c
> index 71f793c4ab..5b79093155 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -150,12 +150,15 @@ void blockdev_mark_auto_del(BlockBackend *blk)
>          return;
>      }
>  
> -    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> +    JOB_LOCK_GUARD();
> +
> +    for (job = block_job_next_locked(NULL); job;
> +         job = block_job_next_locked(job)) {
>          if (block_job_has_bdrv(job, blk_bs(blk))) {

Should this be renamed to block_job_has_bdrv_locked() now?

It looks to me like it does need the locking. (Which actually makes
this patch a fix and not just an optimisation as the commit message
suggests.)

>              AioContext *aio_context = job->job.aio_context;
>              aio_context_acquire(aio_context);
>  
> -            job_cancel(&job->job, false);
> +            job_cancel_locked(&job->job, false);
>  
>              aio_context_release(aio_context);
>          }
> @@ -3745,7 +3748,10 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
>      BlockJobInfoList *head = NULL, **tail = &head;
>      BlockJob *job;
>  
> -    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> +    JOB_LOCK_GUARD();
> +
> +    for (job = block_job_next_locked(NULL); job;
> +         job = block_job_next_locked(job)) {
>          BlockJobInfo *value;
>          AioContext *aio_context;

More context:

        BlockJobInfo *value;
        AioContext *aio_context;

        if (block_job_is_internal(job)) {
            continue;
        }
        aio_context = block_job_get_aio_context(job);
        aio_context_acquire(aio_context);
        value = block_job_query(job, errp);
        aio_context_release(aio_context);

This should become block_job_query_locked(). (You do that in patch 18,
but it looks a bit out of place there - which is precisely because it
really belongs in this one.)

> diff --git a/blockjob.c b/blockjob.c
> index 0d59aba439..96fb9d9f73 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -111,8 +111,10 @@ static bool child_job_drained_poll(BdrvChild *c)
>      /* An inactive or completed job doesn't have any pending requests. Jobs
>       * with !job->busy are either already paused or have a pause point after
>       * being reentered, so no job driver code will run before they pause. */
> -    if (!job->busy || job_is_completed(job)) {
> -        return false;
> +    WITH_JOB_LOCK_GUARD() {
> +        if (!job->busy || job_is_completed_locked(job)) {
> +            return false;
> +        }
>      }
>  
>      /* Otherwise, assume that it isn't fully stopped yet, but allow the job to

Assuming that the job status can actually change, don't we need the
locking for the rest of the function, too? Otherwise we might call
drv->drained_poll() for a job that has already paused or completed.

Of course, this goes against the assumption that all callbacks are
called without holding the job lock. Maybe it's not a good assumption.

> @@ -475,13 +477,15 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
>      job->ready_notifier.notify = block_job_event_ready;
>      job->idle_notifier.notify = block_job_on_idle;
>  
> -    notifier_list_add(&job->job.on_finalize_cancelled,
> -                      &job->finalize_cancelled_notifier);
> -    notifier_list_add(&job->job.on_finalize_completed,
> -                      &job->finalize_completed_notifier);
> -    notifier_list_add(&job->job.on_pending, &job->pending_notifier);
> -    notifier_list_add(&job->job.on_ready, &job->ready_notifier);
> -    notifier_list_add(&job->job.on_idle, &job->idle_notifier);
> +    WITH_JOB_LOCK_GUARD() {
> +        notifier_list_add(&job->job.on_finalize_cancelled,
> +                          &job->finalize_cancelled_notifier);
> +        notifier_list_add(&job->job.on_finalize_completed,
> +                          &job->finalize_completed_notifier);
> +        notifier_list_add(&job->job.on_pending, &job->pending_notifier);
> +        notifier_list_add(&job->job.on_ready, &job->ready_notifier);
> +        notifier_list_add(&job->job.on_idle, &job->idle_notifier);
> +    }
>  
>      error_setg(&job->blocker, "block device is in use by block job: %s",
>                 job_type_str(&job->job));

Why is this the right scope for the lock? It looks very arbitrary to
lock only here, but not for the assignments above or the function calls
below.

Given that job_create() already puts the job in the job_list so it
becomes visible for other code, should we not keep the job lock from the
moment that we create the job until it is fully initialised?

> @@ -558,10 +562,15 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
>                                          action);
>      }
>      if (action == BLOCK_ERROR_ACTION_STOP) {
> -        if (!job->job.user_paused) {
> -            job_pause(&job->job);
> -            /* make the pause user visible, which will be resumed from QMP. */
> -            job->job.user_paused = true;
> +        WITH_JOB_LOCK_GUARD() {
> +            if (!job->job.user_paused) {
> +                job_pause_locked(&job->job);
> +                /*
> +                 * make the pause user visible, which will be
> +                 * resumed from QMP.
> +                 */
> +                job->job.user_paused = true;
> +            }
>          }
>          block_job_iostatus_set_err(job, error);

Why is this call not in the critical section? It accesses job->iostatus.

>      }
> diff --git a/job-qmp.c b/job-qmp.c
> index ac11a6c23c..cfaf34ffb7 100644
> --- a/job-qmp.c
> +++ b/job-qmp.c
> @@ -194,7 +194,9 @@ JobInfoList *qmp_query_jobs(Error **errp)
>      JobInfoList *head = NULL, **tail = &head;
>      Job *job;
>  
> -    for (job = job_next(NULL); job; job = job_next(job)) {
> +    JOB_LOCK_GUARD();
> +
> +    for (job = job_next_locked(NULL); job; job = job_next_locked(job)) {
>          JobInfo *value;
>          AioContext *aio_context;

Should job_query_single() be renamed to job_query_single_locked()?

> diff --git a/job.c b/job.c
> index ebaa4e585b..b0729e2bb2 100644
> --- a/job.c
> +++ b/job.c
> @@ -668,7 +668,7 @@ void coroutine_fn job_pause_point(Job *job)
>      job_pause_point_locked(job);
>  }
>  
> -void job_yield_locked(Job *job)
> +static void job_yield_locked(Job *job)
>  {
>      assert(job->busy);

It was already unused outside of job.c before this patch. Should it have
been static from the start?

> @@ -1041,11 +1041,14 @@ static void job_completed_txn_abort_locked(Job *job)
>  /* Called with job_mutex held, but releases it temporarily */
>  static int job_prepare_locked(Job *job)
>  {
> +    int ret;
> +
>      GLOBAL_STATE_CODE();
>      if (job->ret == 0 && job->driver->prepare) {
>          job_unlock();
> -        job->ret = job->driver->prepare(job);
> +        ret = job->driver->prepare(job);
>          job_lock();
> +        job->ret = ret;
>          job_update_rc_locked(job);
>      }
>      return job->ret;
> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> index 1ebb89f46c..1897ed7a13 100644
> --- a/monitor/qmp-cmds.c
> +++ b/monitor/qmp-cmds.c
> @@ -133,8 +133,11 @@ void qmp_cont(Error **errp)
>          blk_iostatus_reset(blk);
>      }
>  
> -    for (job = block_job_next(NULL); job; job = block_job_next(job)) {
> -        block_job_iostatus_reset(job);
> +    WITH_JOB_LOCK_GUARD() {
> +        for (job = block_job_next_locked(NULL); job;
> +             job = block_job_next_locked(job)) {
> +            block_job_iostatus_reset_locked(job);
> +        }
>      }
>  
>      /* Continuing after completed migration. Images have been inactivated to
> diff --git a/qemu-img.c b/qemu-img.c
> index 4cf4d2423d..98c7662b0f 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -912,25 +912,30 @@ static void run_block_job(BlockJob *job, Error **errp)
>      int ret = 0;
>  
>      aio_context_acquire(aio_context);
> -    job_ref(&job->job);
> -    do {
> -        float progress = 0.0f;
> -        aio_poll(aio_context, true);
> +    WITH_JOB_LOCK_GUARD() {
> +        job_ref_locked(&job->job);
> +        do {
> +            float progress = 0.0f;
> +            job_unlock();

This might be more a question of style, but mixing WITH_JOB_LOCK_GUARD()
with manual job_unlock()/job_lock() feels dangerous. What if someone
added a break between the unlock/lock pair? The lock guard would try to
unlock a mutex that is already unlocked, which probably means an
assertion failure.

I feel we should just use manual job_lock()/job_unlock() for everything
in this function.

> +            aio_poll(aio_context, true);
> +
> +            progress_get_snapshot(&job->job.progress, &progress_current,
> +                                &progress_total);
> +            if (progress_total) {
> +                progress = (float)progress_current / progress_total * 100.f;
> +            }
> +            qemu_progress_print(progress, 0);
> +            job_lock();
> +        } while (!job_is_ready_locked(&job->job) &&
> +                 !job_is_completed_locked(&job->job));
>  
> -        progress_get_snapshot(&job->job.progress, &progress_current,
> -                              &progress_total);
> -        if (progress_total) {
> -            progress = (float)progress_current / progress_total * 100.f;
> +        if (!job_is_completed_locked(&job->job)) {
> +            ret = job_complete_sync_locked(&job->job, errp);
> +        } else {
> +            ret = job->job.ret;
>          }
> -        qemu_progress_print(progress, 0);
> -    } while (!job_is_ready(&job->job) && !job_is_completed(&job->job));
> -
> -    if (!job_is_completed(&job->job)) {
> -        ret = job_complete_sync(&job->job, errp);
> -    } else {
> -        ret = job->job.ret;
> +        job_unref_locked(&job->job);
>      }
> -    job_unref(&job->job);
>      aio_context_release(aio_context);
>  
>      /* publish completion progress only when success */

Kevin



  parent reply	other threads:[~2022-08-04 17:21 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-25  7:38 [PATCH v10 00/21] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
2022-07-25  7:38 ` [PATCH v10 01/21] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
2022-07-25  7:38 ` [PATCH v10 02/21] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
2022-07-29 12:30   ` Kevin Wolf
2022-08-16 18:28   ` Stefan Hajnoczi
2022-07-25  7:38 ` [PATCH v10 03/21] job.c: API functions not used outside should be static Emanuele Giuseppe Esposito
2022-07-29 12:30   ` Kevin Wolf
2022-07-25  7:38 ` [PATCH v10 04/21] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
2022-07-29 12:33   ` Kevin Wolf
2022-07-25  7:38 ` [PATCH v10 05/21] job.c: add job_lock/unlock while keeping job.h intact Emanuele Giuseppe Esposito
2022-07-27 10:45   ` Vladimir Sementsov-Ogievskiy
2022-07-29 13:33   ` Kevin Wolf
2022-08-16 18:31   ` Stefan Hajnoczi
2022-07-25  7:38 ` [PATCH v10 06/21] job: move and update comments from blockjob.c Emanuele Giuseppe Esposito
2022-08-03 15:47   ` Kevin Wolf
2022-08-16 18:32   ` Stefan Hajnoczi
2022-07-25  7:38 ` [PATCH v10 07/21] blockjob: introduce block_job _locked() APIs Emanuele Giuseppe Esposito
2022-08-03 15:52   ` Kevin Wolf
2022-08-16 18:33   ` Stefan Hajnoczi
2022-07-25  7:38 ` [PATCH v10 08/21] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
2022-08-04 11:47   ` Kevin Wolf
2022-07-25  7:38 ` [PATCH v10 09/21] jobs: use job locks also in the unit tests Emanuele Giuseppe Esposito
2022-07-27 14:29   ` Vladimir Sementsov-Ogievskiy
2022-08-04 11:56   ` Kevin Wolf
2022-07-25  7:38 ` [PATCH v10 10/21] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
2022-08-04 16:35   ` Kevin Wolf
2022-08-16 14:23     ` Emanuele Giuseppe Esposito
2022-07-25  7:38 ` [PATCH v10 11/21] jobs: group together API calls under the same job lock Emanuele Giuseppe Esposito
2022-07-27 14:50   ` Vladimir Sementsov-Ogievskiy
2022-08-04 17:10   ` Kevin Wolf [this message]
2022-08-16 14:54     ` Emanuele Giuseppe Esposito
2022-08-17  8:46       ` Kevin Wolf
2022-08-17  9:35         ` Emanuele Giuseppe Esposito
2022-08-17  9:59           ` Kevin Wolf
2022-07-25  7:38 ` [PATCH v10 12/21] commit and mirror: create new nodes using bdrv_get_aio_context, and not the job aiocontext Emanuele Giuseppe Esposito
2022-08-05  8:14   ` Kevin Wolf
2022-08-16 14:57     ` Emanuele Giuseppe Esposito
2022-07-25  7:38 ` [PATCH v10 13/21] job: detect change of aiocontext within job coroutine Emanuele Giuseppe Esposito
2022-08-05  8:37   ` Kevin Wolf
2022-08-16 15:09     ` Emanuele Giuseppe Esposito
2022-08-17  8:34       ` Kevin Wolf
2022-08-17 11:16         ` Emanuele Giuseppe Esposito
2022-07-25  7:38 ` [PATCH v10 14/21] jobs: protect job.aio_context with BQL and job_mutex Emanuele Giuseppe Esposito
2022-07-27 15:22   ` Vladimir Sementsov-Ogievskiy
2022-08-05  9:12   ` Kevin Wolf
2022-08-17  8:04     ` Emanuele Giuseppe Esposito
2022-08-17 13:10       ` Emanuele Giuseppe Esposito
2022-08-18  8:48         ` Emanuele Giuseppe Esposito
2022-07-25  7:38 ` [PATCH v10 15/21] blockjob.h: categorize fields in struct BlockJob Emanuele Giuseppe Esposito
2022-08-05  9:21   ` Kevin Wolf
2022-07-25  7:38 ` [PATCH v10 16/21] blockjob: rename notifier callbacks as _locked Emanuele Giuseppe Esposito
2022-08-05  9:25   ` Kevin Wolf
2022-07-25  7:38 ` [PATCH v10 17/21] blockjob: protect iostatus field in BlockJob struct Emanuele Giuseppe Esposito
2022-07-27 15:29   ` Vladimir Sementsov-Ogievskiy
2022-08-16 12:39     ` Emanuele Giuseppe Esposito
2022-08-05 10:55   ` Kevin Wolf
2022-07-25  7:38 ` [PATCH v10 18/21] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
2022-07-27 15:53   ` Vladimir Sementsov-Ogievskiy
2022-08-16 12:52     ` Emanuele Giuseppe Esposito
2022-08-17 18:54       ` Vladimir Sementsov-Ogievskiy
2022-08-18  7:46         ` Emanuele Giuseppe Esposito
2022-08-19 15:49           ` Vladimir Sementsov-Ogievskiy
2022-08-16 12:53     ` Emanuele Giuseppe Esposito
2022-08-05 13:01   ` Kevin Wolf
2022-08-17 12:45     ` Emanuele Giuseppe Esposito
2022-07-25  7:38 ` [PATCH v10 19/21] block_job_query: remove atomic read Emanuele Giuseppe Esposito
2022-08-05 13:01   ` Kevin Wolf
2022-07-25  7:38 ` [PATCH v10 20/21] blockjob: remove unused functions Emanuele Giuseppe Esposito
2022-08-05 13:05   ` Kevin Wolf
2022-07-25  7:38 ` [PATCH v10 21/21] job: " Emanuele Giuseppe Esposito
2022-08-05 13:09   ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yuv9cKJotWg0NEno@redhat.com \
    --to=kwolf@redhat.com \
    --cc=armbru@redhat.com \
    --cc=eesposit@redhat.com \
    --cc=fam@euphon.net \
    --cc=hreitz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@virtuozzo.com \
    --cc=wencongyang2@huawei.com \
    --cc=xiechanglong.d@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.