qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Max Reitz <mreitz@redhat.com>, qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, Kevin Wolf <kwolf@redhat.com>,
	John Snow <jsnow@redhat.com>
Subject: Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
Date: Thu, 8 Apr 2021 19:58:56 +0300	[thread overview]
Message-ID: <505ba75a-996b-0c65-0c49-add50e55e3ce@virtuozzo.com> (raw)
In-Reply-To: <20210408162039.242670-2-mreitz@redhat.com>

08.04.2021 19:20, Max Reitz wrote:
> block-job-complete can only be applied when the job is READY, not when
> it is on STANDBY (ready, but paused).  Draining a job technically pauses
> it (which makes a READY job enter STANDBY), and ending the drained
> section does not synchronously resume it, but only schedules the job,
> which will then be resumed.  So attempting to complete a job immediately
> after a drained section may sometimes fail.
> 
> That is bad at least because users cannot really work nicely around
> this: A job may be paused and resumed at any time, so waiting for the
> job to be in the READY state and then issuing a block-job-complete poses
> a TOCTTOU problem.  The only way around it would be to issue
> block-job-complete until it no longer fails due to the job being in the
> STANDBY state, but that would not be nice.
> 
> We can solve the problem by allowing block-job-complete to be invoked on
> jobs that are on STANDBY, if that status is the result of a drained
> section (not because the user has paused the job), and that section has
> ended.  That is, if the job is on STANDBY, but scheduled to be resumed.
> 
> Perhaps we could actually just directly allow this, seeing that mirror
> is the only user of ready/complete, and that mirror_complete() could
> probably work under the given circumstances, but there may be many side
> effects to consider.
> 
> It is simpler to add a function job_wait_unpaused() that waits for the
> job to be resumed (under said circumstances), and to make
> qmp_block_job_complete() use it to delay job_complete() until then.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/qemu/job.h | 15 +++++++++++++++
>   blockdev.c         |  3 +++
>   job.c              | 42 ++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 60 insertions(+)
> 
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index efc6fa7544..cf3082b6d7 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -563,4 +563,19 @@ void job_dismiss(Job **job, Error **errp);
>    */
>   int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
>   
> +/**
> + * If the job has been paused because of a drained section, and that
> + * section has ended, wait until the job is resumed.
> + *
> + * Return 0 if the job is not paused, or if it has been successfully
> + * resumed.
> + * Return an error if the job has been paused in such a way that
> + * waiting will not resume it, i.e. if it has been paused by the user,
> + * or if it is still drained.
> + *
> + * Callers must be in the home AioContext and hold the AioContext lock
> + * of job->aio_context.
> + */
> +int job_wait_unpaused(Job *job, Error **errp);
> +
>   #endif
> diff --git a/blockdev.c b/blockdev.c
> index a57590aae4..c0cc2fa364 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error **errp)
>           return;
>       }
>   
> +    if (job_wait_unpaused(&job->job, errp) < 0) {
> +        return;
> +    }
>       trace_qmp_block_job_complete(job);
>       job_complete(&job->job, errp);
>       aio_context_release(aio_context);
> diff --git a/job.c b/job.c
> index 289edee143..1ea30fd294 100644
> --- a/job.c
> +++ b/job.c
> @@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
>       job_unref(job);
>       return ret;
>   }
> +
> +int job_wait_unpaused(Job *job, Error **errp)
> +{
> +    /*
> +     * Only run this function from the main context, because this is
> +     * what we need, and this way we do not have to think about what
> +     * happens if the user concurrently pauses the job from the main
> +     * monitor.
> +     */
> +    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> +
> +    /*
> +     * Quick path (e.g. so we do not get an error if pause_count > 0
> +     * but the job is not even paused)
> +     */
> +    if (!job->paused) {
> +        return 0;
> +    }
> +
> +    /* If the user has paused the job, waiting will not help */
> +    if (job->user_paused) {
> +        error_setg(errp, "Job '%s' has been paused by the user", job->id);
> +        return -EBUSY;
> +    }
> +
> +    /* Similarly, if the job is still drained, waiting will not help either */
> +    if (job->pause_count > 0) {
> +        error_setg(errp, "Job '%s' is blocked and cannot be unpaused", job->id);
> +        return -EBUSY;
> +    }
> +
> +    /*
> +     * This function is specifically for waiting for a job to be
> +     * resumed after a drained section.  Ending the drained section
> +     * includes a job_enter(), which schedules the job loop to be run,
> +     * and once it does, job->paused will be cleared.  Therefore, we
> +     * do not need to invoke job_enter() here.
> +     */
> +    AIO_WAIT_WHILE(job->aio_context, job->paused);
> +
> +    return 0;
> +}
> 

Hmm.. It seems that when job->pause_count becomes 0, job_enter is called, and the period when pause_count is 0 but paused is still true should be relatively shot. And patch doesn't help if user call job-complete during drained section. So it looks like the patch will help relatively seldom.. Or I'm missing something?

job-complete command is async. Can we instead just add a boolean like job->completion_requested, and set it if job-complete called in STANDBY state, and on job_resume job_complete will be called automatically if this boolean is true?

-- 
Best regards,
Vladimir


  parent reply	other threads:[~2021-04-08 17:01 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-08 16:20 [PATCH for-6.0? 0/3] job: Add job_wait_unpaused() for block-job-complete Max Reitz
2021-04-08 16:20 ` [PATCH for-6.0? 1/3] " Max Reitz
2021-04-08 16:55   ` John Snow
2021-04-09  9:31     ` Max Reitz
2021-04-09 10:17       ` Kevin Wolf
2021-04-09  9:44     ` Kevin Wolf
2021-04-09  9:57       ` Max Reitz
2021-04-09 16:54         ` John Snow
2021-04-08 16:58   ` Vladimir Sementsov-Ogievskiy [this message]
2021-04-08 17:04     ` John Snow
2021-04-08 17:26       ` Vladimir Sementsov-Ogievskiy
2021-04-09  9:51         ` Max Reitz
2021-04-09 10:07           ` Vladimir Sementsov-Ogievskiy
2021-04-09 10:18             ` Max Reitz
2021-04-09  9:38     ` Max Reitz
2021-04-08 16:20 ` [PATCH for-6.0? 2/3] test-blockjob: Test job_wait_unpaused() Max Reitz
2021-04-08 16:20 ` [PATCH for-6.0? 3/3] iotests/041: block-job-complete on user-paused job Max Reitz
2021-04-08 17:09 ` [PATCH for-6.0? 0/3] job: Add job_wait_unpaused() for block-job-complete John Snow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=505ba75a-996b-0c65-0c49-add50e55e3ce@virtuozzo.com \
    --to=vsementsov@virtuozzo.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).