From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Max Reitz <mreitz@redhat.com>, qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, Kevin Wolf <kwolf@redhat.com>,
John Snow <jsnow@redhat.com>
Subject: Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
Date: Thu, 8 Apr 2021 19:58:56 +0300
Message-ID: <505ba75a-996b-0c65-0c49-add50e55e3ce@virtuozzo.com>
In-Reply-To: <20210408162039.242670-2-mreitz@redhat.com>
08.04.2021 19:20, Max Reitz wrote:
> block-job-complete can only be applied when the job is READY, not when
> it is on STANDBY (ready, but paused). Draining a job technically pauses
> it (which makes a READY job enter STANDBY), and ending the drained
> section does not synchronously resume it, but only schedules the job,
> which will then be resumed. So attempting to complete a job immediately
> after a drained section may sometimes fail.
>
> That is bad at least because users cannot really work nicely around
> this: A job may be paused and resumed at any time, so waiting for the
> job to be in the READY state and then issuing a block-job-complete poses
> a TOCTTOU problem. The only way around it would be to issue
> block-job-complete until it no longer fails due to the job being in the
> STANDBY state, but that would not be nice.
>
> We can solve the problem by allowing block-job-complete to be invoked on
> jobs that are on STANDBY, if that status is the result of a drained
> section (not because the user has paused the job), and that section has
> ended. That is, if the job is on STANDBY, but scheduled to be resumed.
>
> Perhaps we could actually just directly allow this, seeing that mirror
> is the only user of ready/complete, and that mirror_complete() could
> probably work under the given circumstances, but there may be many side
> effects to consider.
>
> It is simpler to add a function job_wait_unpaused() that waits for the
> job to be resumed (under said circumstances), and to make
> qmp_block_job_complete() use it to delay job_complete() until then.
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
> include/qemu/job.h | 15 +++++++++++++++
> blockdev.c | 3 +++
> job.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 60 insertions(+)
>
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index efc6fa7544..cf3082b6d7 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -563,4 +563,19 @@ void job_dismiss(Job **job, Error **errp);
> */
> int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
>
> +/**
> + * If the job has been paused because of a drained section, and that
> + * section has ended, wait until the job is resumed.
> + *
> + * Return 0 if the job is not paused, or if it has been successfully
> + * resumed.
> + * Return an error if the job has been paused in such a way that
> + * waiting will not resume it, i.e. if it has been paused by the user,
> + * or if it is still drained.
> + *
> + * Callers must be in the home AioContext and hold the AioContext lock
> + * of job->aio_context.
> + */
> +int job_wait_unpaused(Job *job, Error **errp);
> +
> #endif
> diff --git a/blockdev.c b/blockdev.c
> index a57590aae4..c0cc2fa364 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error **errp)
> return;
> }
>
> + if (job_wait_unpaused(&job->job, errp) < 0) {
> + return;
> + }
> trace_qmp_block_job_complete(job);
> job_complete(&job->job, errp);
> aio_context_release(aio_context);
> diff --git a/job.c b/job.c
> index 289edee143..1ea30fd294 100644
> --- a/job.c
> +++ b/job.c
> @@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
> job_unref(job);
> return ret;
> }
> +
> +int job_wait_unpaused(Job *job, Error **errp)
> +{
> + /*
> + * Only run this function from the main context, because this is
> + * what we need, and this way we do not have to think about what
> + * happens if the user concurrently pauses the job from the main
> + * monitor.
> + */
> + assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> +
> + /*
> + * Quick path (e.g. so we do not get an error if pause_count > 0
> + * but the job is not even paused)
> + */
> + if (!job->paused) {
> + return 0;
> + }
> +
> + /* If the user has paused the job, waiting will not help */
> + if (job->user_paused) {
> + error_setg(errp, "Job '%s' has been paused by the user", job->id);
> + return -EBUSY;
> + }
> +
> + /* Similarly, if the job is still drained, waiting will not help either */
> + if (job->pause_count > 0) {
> + error_setg(errp, "Job '%s' is blocked and cannot be unpaused", job->id);
> + return -EBUSY;
> + }
> +
> + /*
> + * This function is specifically for waiting for a job to be
> + * resumed after a drained section. Ending the drained section
> + * includes a job_enter(), which schedules the job loop to be run,
> + * and once it does, job->paused will be cleared. Therefore, we
> + * do not need to invoke job_enter() here.
> + */
> + AIO_WAIT_WHILE(job->aio_context, job->paused);
> +
> + return 0;
> +}
>
Hmm. It seems that job_enter() is called as soon as job->pause_count becomes 0, so the period during which pause_count is 0 but paused is still true should be relatively short. And the patch doesn't help if the user calls job-complete during the drained section. So it looks like the patch will help only rarely. Or am I missing something?
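To make the cases concrete, here is a minimal toy model of the order of checks in the quoted job_wait_unpaused(). ToyJob, ToyWaitOutcome and toy_classify() are hypothetical names invented for this sketch, not QEMU's API; only the three field names are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; this is NOT QEMU's real Job struct. */
typedef struct ToyJob {
    bool paused;       /* the job has actually stopped at a pause point */
    bool user_paused;  /* the pause was requested by the user */
    int pause_count;   /* outstanding pause requests (user or drain) */
} ToyJob;

typedef enum {
    TOY_NOT_PAUSED,      /* quick path: nothing to wait for */
    TOY_USER_PAUSED,     /* error: waiting will not help */
    TOY_STILL_DRAINED,   /* error: waiting will not help */
    TOY_WAIT_FOR_RESUME  /* the only state in which waiting helps */
} ToyWaitOutcome;

/* Same order of checks as in the quoted job_wait_unpaused(). */
static ToyWaitOutcome toy_classify(const ToyJob *job)
{
    assert(job->pause_count >= 0);

    if (!job->paused) {
        return TOY_NOT_PAUSED;
    }
    if (job->user_paused) {
        return TOY_USER_PAUSED;
    }
    if (job->pause_count > 0) {
        return TOY_STILL_DRAINED;
    }
    return TOY_WAIT_FOR_RESUME;
}
```

In this model, a job inside a drained section (paused, pause_count > 0) lands in TOY_STILL_DRAINED, and only the window after the drained section ends but before the job has run again reaches TOY_WAIT_FOR_RESUME.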
The job-complete command is async. Could we instead just add a boolean like job->completion_requested, set it if job-complete is called in the STANDBY state, and have job_resume call job_complete automatically if this boolean is true?
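A rough sketch of that idea, as a self-contained toy model rather than a patch against job.c. ToyJob and the toy_*() functions are hypothetical; completion_requested is the proposed field and does not exist in QEMU, and the completing flag merely stands in for whatever job_complete() would do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; this is NOT QEMU's real Job struct. */
typedef struct ToyJob {
    bool ready;                 /* the job has reached the ready phase */
    bool paused;                /* ready + paused corresponds to STANDBY */
    bool completion_requested;  /* proposed flag: complete once resumed */
    bool completing;            /* stands in for the effect of job_complete() */
} ToyJob;

/* Stand-in for job_complete(): only valid on a ready, running job. */
static void toy_do_complete(ToyJob *job)
{
    assert(job->ready && !job->paused);
    job->completing = true;
}

/*
 * Stand-in for the job-complete command.
 * Returns 0 if completion was started or deferred, -1 if the job is
 * not ready at all.
 */
static int toy_request_complete(ToyJob *job)
{
    if (!job->ready) {
        return -1;
    }
    if (job->paused) {
        /* STANDBY: remember the request instead of failing. */
        job->completion_requested = true;
        return 0;
    }
    toy_do_complete(job);
    return 0;
}

/* Stand-in for the resume path: honor a deferred completion request. */
static void toy_resume(ToyJob *job)
{
    job->paused = false;
    if (job->completion_requested) {
        job->completion_requested = false;
        toy_do_complete(job);
    }
}
```

Note that this sketch defers completion for any paused job, including a user-paused one; whether the request should be accepted or rejected in that case is a separate decision that the toy model does not settle.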
--
Best regards,
Vladimir