qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>, John Snow <jsnow@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
Date: Fri, 9 Apr 2021 11:57:38 +0200	[thread overview]
Message-ID: <d8c87c27-f116-c0a4-5557-62fcfc8de44f@redhat.com> (raw)
In-Reply-To: <YHAiCyl6x0kZutXg@merkur.fritz.box>

On 09.04.21 11:44, Kevin Wolf wrote:
> Am 08.04.2021 um 18:55 hat John Snow geschrieben:
>> On 4/8/21 12:20 PM, Max Reitz wrote:
>>> block-job-complete can only be applied when the job is READY, not when
>>> it is on STANDBY (ready, but paused).  Draining a job technically pauses
>>> it (which makes a READY job enter STANDBY), and ending the drained
>>> section does not synchronously resume it, but only schedules the job,
>>> which will then be resumed.  So attempting to complete a job immediately
>>> after a drained section may sometimes fail.
>>>
>>> That is bad at least because users cannot really work nicely around
>>> this: A job may be paused and resumed at any time, so waiting for the
>>> job to be in the READY state and then issuing a block-job-complete poses
>>> a TOCTTOU problem.  The only way around it would be to issue
>>> block-job-complete until it no longer fails due to the job being in the
>>> STANDBY state, but that would not be nice.
>>>
>>> We can solve the problem by allowing block-job-complete to be invoked on
>>> jobs that are on STANDBY, if that status is the result of a drained
>>> section (not because the user has paused the job), and that section has
>>> ended.  That is, if the job is on STANDBY, but scheduled to be resumed.
>>>
>>> Perhaps we could actually just directly allow this, seeing that mirror
>>> is the only user of ready/complete, and that mirror_complete() could
>>> probably work under the given circumstances, but there may be many side
>>> effects to consider.
>>>
>>> It is simpler to add a function job_wait_unpaused() that waits for the
>>> job to be resumed (under said circumstances), and to make
>>> qmp_block_job_complete() use it to delay job_complete() until then.
>>>
>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>    include/qemu/job.h | 15 +++++++++++++++
>>>    blockdev.c         |  3 +++
>>>    job.c              | 42 ++++++++++++++++++++++++++++++++++++++++++
>>>    3 files changed, 60 insertions(+)
>>>
>>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>>> index efc6fa7544..cf3082b6d7 100644
>>> --- a/include/qemu/job.h
>>> +++ b/include/qemu/job.h
>>> @@ -563,4 +563,19 @@ void job_dismiss(Job **job, Error **errp);
>>>     */
>>>    int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
>>> +/**
>>> + * If the job has been paused because of a drained section, and that
>>> + * section has ended, wait until the job is resumed.
>>> + *
>>> + * Return 0 if the job is not paused, or if it has been successfully
>>> + * resumed.
>>> + * Return an error if the job has been paused in such a way that
>>> + * waiting will not resume it, i.e. if it has been paused by the user,
>>> + * or if it is still drained.
>>> + *
>>> + * Callers must be in the home AioContext and hold the AioContext lock
>>> + * of job->aio_context.
>>> + */
>>> +int job_wait_unpaused(Job *job, Error **errp);
>>> +
>>>    #endif
>>> diff --git a/blockdev.c b/blockdev.c
>>> index a57590aae4..c0cc2fa364 100644
>>> --- a/blockdev.c
>>> +++ b/blockdev.c
>>> @@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error **errp)
>>>            return;
>>>        }
>>> +    if (job_wait_unpaused(&job->job, errp) < 0) {
>>> +        return;
>>> +    }
>>
>> After which point, we assume we've transitioned back to either RUNNING or
>> READY, and
>>
>>>        trace_qmp_block_job_complete(job);
>>>        job_complete(&job->job, errp);
>>
>> This function checks the usual state table for permission to deliver/perform
>> the verb.
>>
>>>        aio_context_release(aio_context);
>>> diff --git a/job.c b/job.c
>>> index 289edee143..1ea30fd294 100644
>>> --- a/job.c
>>> +++ b/job.c
>>> @@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
>>>        job_unref(job);
>>>        return ret;
>>>    }
>>> +
>>> +int job_wait_unpaused(Job *job, Error **errp)
>>> +{
>>> +    /*
>>> +     * Only run this function from the main context, because this is
>>> +     * what we need, and this way we do not have to think about what
>>> +     * happens if the user concurrently pauses the job from the main
>>> +     * monitor.
>>> +     */
>>> +    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>>> +
>>> +    /*
>>> +     * Quick path (e.g. so we do not get an error if pause_count > 0
>>> +     * but the job is not even paused)
>>> +     */
>>> +    if (!job->paused) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    /* If the user has paused the job, waiting will not help */
>>> +    if (job->user_paused) {
>>> +        error_setg(errp, "Job '%s' has been paused by the user", job->id);
>>> +        return -EBUSY;
>>> +    }
>>> +
>>
>> Or the job has encountered an error if that error policy is set. It is maybe
>> more accurate to say that the job is currently paused/halted (for some
>> reason) and is awaiting the explicit unpause instruction.
>>
>> "Job '%s' has been paused and needs to be explicitly resumed with
>> job-resume", maybe?
> 
> Sounds good to me.
> 
>> Job '%s' has been paused and needs to be [explicitly] resumed
>> [by the user] [with job-resume]
>>
>> Some combo of those runes.
>>
>>> +    /* Similarly, if the job is still drained, waiting will not help either */
>>> +    if (job->pause_count > 0) {
>>> +        error_setg(errp, "Job '%s' is blocked and cannot be unpaused", job->id);
>>> +        return -EBUSY;
>>> +    }
>>> +
>>
>> This leaks an internal state detail out to the caller. In which
>> circumstances does this happen? Do we expect it to?
>>
>> As the user: Why is it blocked? Can I unblock it? Do I wait?
> 
> I don't even understand why we need to fail here (which is still an
> unexpected case for management tools, so we would not fix the bug fully,
> but just restrict it to a more unlikely condition).
> 
> We know that we're running in a QMP command handler here, so it's not
> the caller that drained the job, but some iothread. The iothread can
> still make progress, so why doesn't waiting help?

Yes, probably the best would be to drop this block.

>>> +    /*
>>> +     * This function is specifically for waiting for a job to be
>>> +     * resumed after a drained section.  Ending the drained section
>>> +     * includes a job_enter(), which schedules the job loop to be run,
>>> +     * and once it does, job->paused will be cleared.  Therefore, we
>>> +     * do not need to invoke job_enter() here.
>>> +     */
>>> +    AIO_WAIT_WHILE(job->aio_context, job->paused);
> 
> Do we need a aio_wait_kick() in job_pause_point()?

Oh, yes.

>>> +    return 0;
>>> +}
>>>
>>
>> Looks about right to me, but you'll want Kevin's look-see for the finer
>> details, of course.
>>
>> My concern is that this adds a wait of an indefinite period to the
>> job_complete command. We mitigate this by checking for some other internal
>> state criteria first, and then by process of elimination deduce that it's
>> safe to wait, as it will (likely) be very quick.
>>
>> Do we open the door for ourselves to get into trouble here, either by a
>> state we are forgetting to rule out (You'd have added it if you know the
>> answer to this) or a hypothetical future change where we forget to update
>> this function?
>>
>> Not necessarily a blocker, I think, and this does solve a real problem
>> fairly inexpensively.
> 
> I think the only assumption this function really makes (in particular
> after dropping the job->pause_count check) is that an automatically
> paused job (not counting I/O errors which are treated as user paused)
> will be resumed automatically eventually. Which is an entirely
> reasonable assumption because otherwise the job would be stuck with no
> way to resume it.
> 
> The part how long it will block is less clear, but if it isn't very
> short, wouldn't that be a sign that a backend is hanging and we're in
> trouble anyway?

Just as a PS, in a reply to one of Vladimir’s mails 
(da048f58-43a6-6811-6ad2-0d7899737a23@redhat.com) I was wondering 
whether it even makes sense for mirror to do all the stuff it does in 
mirror_complete() to do it there.  Aren’t all of those things that 
should really be done in job-finalize (i.e. mirror_exit_common())?

Max



  reply	other threads:[~2021-04-09  9:59 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-08 16:20 [PATCH for-6.0? 0/3] job: Add job_wait_unpaused() for block-job-complete Max Reitz
2021-04-08 16:20 ` [PATCH for-6.0? 1/3] " Max Reitz
2021-04-08 16:55   ` John Snow
2021-04-09  9:31     ` Max Reitz
2021-04-09 10:17       ` Kevin Wolf
2021-04-09  9:44     ` Kevin Wolf
2021-04-09  9:57       ` Max Reitz [this message]
2021-04-09 16:54         ` John Snow
2021-04-08 16:58   ` Vladimir Sementsov-Ogievskiy
2021-04-08 17:04     ` John Snow
2021-04-08 17:26       ` Vladimir Sementsov-Ogievskiy
2021-04-09  9:51         ` Max Reitz
2021-04-09 10:07           ` Vladimir Sementsov-Ogievskiy
2021-04-09 10:18             ` Max Reitz
2021-04-09  9:38     ` Max Reitz
2021-04-08 16:20 ` [PATCH for-6.0? 2/3] test-blockjob: Test job_wait_unpaused() Max Reitz
2021-04-08 16:20 ` [PATCH for-6.0? 3/3] iotests/041: block-job-complete on user-paused job Max Reitz
2021-04-08 17:09 ` [PATCH for-6.0? 0/3] job: Add job_wait_unpaused() for block-job-complete John Snow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d8c87c27-f116-c0a4-5557-62fcfc8de44f@redhat.com \
    --to=mreitz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).