All of lore.kernel.org
 help / color / mirror / Atom feed
From: Liang Li <liliang.opensource@gmail.com>
To: John Snow <jsnow@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>,
	eblake@redhat.com, Paolo Bonzini <pbonzini@redhat.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Liang Li <liliangleo@didichuxing.com>,
	Huaitong Han <huanhuaitong@didichuxing.com>
Subject: Re: [Qemu-devel] [PATCH] block/mirror: change the semantic of 'force' of block-job-cancel
Date: Tue, 6 Feb 2018 11:11:49 +0800	[thread overview]
Message-ID: <20180206031148.GA75587@localhost> (raw)
In-Reply-To: <af357239-5670-fca5-d9ec-ee0a10f4e00e@redhat.com>

On Mon, Feb 05, 2018 at 02:28:55PM -0500, John Snow wrote:
> 
> 
> On 01/31/2018 09:19 PM, Liang Li wrote:
> > On Tue, Jan 30, 2018 at 03:18:31PM -0500, John Snow wrote:
> >>
> >>
> >> On 01/30/2018 03:38 AM, Liang Li wrote:
> >>> When doing drive mirror to a low speed shared storage, if there was heavy
> >>> BLK IO write workload in VM after the 'ready' event, drive mirror block job
> >>> can't be canceled immediately, it would keep running until the heavy BLK IO
> >>> workload stopped in the VM.
> >>>
> >>> Because libvirt depends on block-job-cancel for block live migration, the
> >>> current block-job-cancel has the semantic to make sure data is in sync after
> >>> the 'ready' event.  This semantic can't meet some requirement, for example,
> >>> people may use drive mirror for realtime backup while need the ability of
> >>> block live migration. If drive mirror can't not be cancelled immediately,
> >>> it means block live migration need to wait, because libvirt make use drive
> >>> mirror to implement block live migration and only one drive mirror block
> >>> job is allowed at the same time for a give block dev.
> >>>
> >>> We need a new interface for 'force cancel', which could quit block job
> >>> immediately if don't care about whether data is in sync or not.
> >>>
> >>> 'force' is not used by libvirt currently, to make things simple, change
> >>> it's semantic slightly, hope it will not break some use case which need its
> >>> original semantic.
> >>>
> >>> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >>> Cc: Jeff Cody <jcody@redhat.com>
> >>> Cc: Kevin Wolf <kwolf@redhat.com>
> >>> Cc: Max Reitz <mreitz@redhat.com>
> >>> Cc: Eric Blake <eblake@redhat.com>
> >>> Cc: John Snow <jsnow@redhat.com>
> >>> Reported-by: Huaitong Han <huanhuaitong@didichuxing.com>
> >>> Signed-off-by: Huaitong Han <huanhuaitong@didichuxing.com>
> >>> Signed-off-by: Liang Li <liliangleo@didichuxing.com>
> >>> ---
> >>> block/mirror.c            |  9 +++------
> >>> blockdev.c                |  4 ++--
> >>> blockjob.c                | 11 ++++++-----
> >>> hmp-commands.hx           |  3 ++-
> >>> include/block/blockjob.h  |  9 ++++++++-
> >>> qapi/block-core.json      |  6 ++++--
> >>> tests/test-blockjob-txn.c |  8 ++++----
> >>> 7 files changed, 29 insertions(+), 21 deletions(-)
> >>>
> >>> diff --git a/block/mirror.c b/block/mirror.c
> >>> index c9badc1..c22dff9 100644
> >>> --- a/block/mirror.c
> >>> +++ b/block/mirror.c
> >>> @@ -869,11 +869,8 @@ static void coroutine_fn mirror_run(void *opaque)
> >>>
> >>>         ret = 0;
> >>>         trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
> >>> -        if (!s->synced) {
> >>> -            block_job_sleep_ns(&s->common, delay_ns);
> >>> -            if (block_job_is_cancelled(&s->common)) {
> >>> -                break;
> >>> -            }
> >>> +        if (block_job_is_cancelled(&s->common) && s->common.force) {
> >>> +            break;
> >>
> >> what's the justification for removing the sleep in the case that
> >> !s->synced && !block_job_is_cancelled(...) ?
> >>
> > if !block_job_is_cancelled() satisfied, the code in 'if (!should_complete) {}'
> > will execute, there is a block_job_sleep_ns there.
> > 
> > block_job_sleep_ns is for rate throttling, if there is no more data to sync, 
> > sleep is not needed, right?
> > 
> >>>         } else if (!should_complete) {
> >>>             delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
> >>>             block_job_sleep_ns(&s->common, delay_ns);
> >>> @@ -887,7 +884,7 @@ immediate_exit:
> >>>          * or it was cancelled prematurely so that we do not guarantee that
> >>>          * the target is a copy of the source.
> >>>          */
> >>> -        assert(ret < 0 || (!s->synced && block_job_is_cancelled(&s->common)));
> >>> +        assert(ret < 0 || block_job_is_cancelled(&s->common));
> >>
> >> This assertion gets weaker in the case where force isn't provided, is
> >> that desired?
> >>
> > yes. if force quit is used, the following condition can be true
> > 
> > (ret >= 0) && (s->synced) && (block_job_is_cancelled(&s->common)) 
> > 
> > so the above assert should be changed, or it will be failed.
> > 
> 
> What I mean is that when we *don't* use force quit, the assertion here
> is weakened. You can work the force conditional in here:
> 
> ret < 0 || (block_job_is_cancelled(job) && (job->force || !s->synced))
> 
> which preserves the stricter assertion when force isn't provided.
> 

you are right, will change.
> >>>         assert(need_drain);
> >>>         mirror_wait_for_all_io(s);
> >>>     }
> >>> diff --git a/blockdev.c b/blockdev.c
> >>> index 8e977ee..039f156 100644
> >>> --- a/blockdev.c
> >>> +++ b/blockdev.c
> >>> @@ -145,7 +145,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
> >>>         aio_context_acquire(aio_context);
> >>>
> >>>         if (bs->job) {
> >>> -            block_job_cancel(bs->job);
> >>> +            block_job_cancel(bs->job, false);
> >>>         }
> >>>
> >>>         aio_context_release(aio_context);
> >>> @@ -3802,7 +3802,7 @@ void qmp_block_job_cancel(const char *device,
> >>>     }
> >>>
> >>>     trace_qmp_block_job_cancel(job);
> >>> -    block_job_cancel(job);
> >>> +    block_job_cancel(job, force);
> >>> out:
> >>>     aio_context_release(aio_context);
> >>> }
> >>> diff --git a/blockjob.c b/blockjob.c
> >>> index f5cea84..0aacb50 100644
> >>> --- a/blockjob.c
> >>> +++ b/blockjob.c
> >>> @@ -365,7 +365,7 @@ static void block_job_completed_single(BlockJob *job)
> >>>     block_job_unref(job);
> >>> }
> >>>
> >>> -static void block_job_cancel_async(BlockJob *job)
> >>> +static void block_job_cancel_async(BlockJob *job, bool force)
> >>> {
> >>>     if (job->iostatus != BLOCK_DEVICE_IO_STATUS_OK) {
> >>>         block_job_iostatus_reset(job);
> >>> @@ -376,6 +376,7 @@ static void block_job_cancel_async(BlockJob *job)
> >>>         job->pause_count--;
> >>>     }
> >>>     job->cancelled = true;
> >>> +    job->force = force;
> >>> }
> >>>
> >>
> >> I suppose this is okay; we're setting a permanent property of the job as
> >> part of a limited operation.
> >>
> >> Granted, the job should disappear afterwards, so it should never
> >> conflict, but it made me wonder for a moment.
> >>
> >> What happens if we cancel with force = true and then immediately cancel
> >> again with force = false? What happens? Can it cause issues?
> >>
> > 
> > Indeed. It can be fixed by:
> > 
> > if (!job->force)
> >    job->force = force
> > 
> > it's that ok ?
> > 
> 
> I think so, yes. or job->force |= force, or your preferred equivalent
> for only allowing false-->true here.
> 
> >>> static int block_job_finish_sync(BlockJob *job,
> >>> @@ -437,7 +438,7 @@ static void block_job_completed_txn_abort(BlockJob *job)
> >>>      * on the caller, so leave it. */
> >>>     QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
> >>>         if (other_job != job) {
> >>> -            block_job_cancel_async(other_job);
> >>> +            block_job_cancel_async(other_job, true);
> >>
> >> What's the rationale for deciding that transactional cancellations are
> >> always force=true?
> >>
> > 
> > no particular reason, just try to speed up the abort process. :)
> > 
> 
> I'd prefer we not change the semantics of how transactions fail without
> a stronger justification than wanting to make it faster!
> 

OK.  I will send the v2, thanks!


Liang
> >> Granted, this doesn't matter currently because mirror isn't a
> >> transactional job, but that makes the decision all the more curious.
> >>
> >>>         }
> >>>     }
> > 
> > Thanks for your comments.
> > 
> > Liang
> > 

  reply	other threads:[~2018-02-06  3:12 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-01  2:19 [Qemu-devel] [PATCH] block/mirror: change the semantic of 'force' of block-job-cancel Liang Li
2018-02-05 19:28 ` John Snow
2018-02-06  3:11   ` Liang Li [this message]
  -- strict thread matches above, loose matches on Subject: below --
2018-02-01  2:14 Liang Li
2018-01-30  8:38 Liang Li
2018-01-30 14:20 ` Eric Blake
2018-01-30 20:08 ` John Snow
2018-01-30 20:18 ` John Snow
2018-01-30 20:39   ` John Snow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180206031148.GA75587@localhost \
    --to=liliang.opensource@gmail.com \
    --cc=armbru@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eblake@redhat.com \
    --cc=huanhuaitong@didichuxing.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=liliangleo@didichuxing.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.