Re: [PATCH 0/3] migration: Downtime tracepoints

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Xu <peterx@redhat.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH 0/3] migration: Downtime tracepoints
Date: Mon, 30 Oct 2023 11:13:55 -0400	[thread overview]
Message-ID: <ZT/IM4gz6s6PfKg8@x1n> (raw)
In-Reply-To: <e6be9230-2f63-4c48-9db4-5eff2c4399eb@oracle.com>

On Fri, Oct 27, 2023 at 11:17:41PM +0100, Joao Martins wrote:
> On 27/10/2023 15:41, Peter Xu wrote:
> > On Fri, Oct 27, 2023 at 09:58:03AM +0100, Joao Martins wrote:
> >> On 26/10/2023 21:07, Peter Xu wrote:
> >>> On Thu, Oct 26, 2023 at 08:33:13PM +0100, Joao Martins wrote:
> >>>> Sure. For the fourth patch, feel free to add Suggested-by and/or a Link,
> >>>> considering it started on the other patches (if you also agree it is right). The
> >>>> patches ofc are enterily different, but at least I like to believe the ideas
> >>>> initially presented and then subsequently improved are what lead to the downtime
> >>>> observability improvements in this series.
> >>>
> >>> Sure, I'll add that.
> >>>
> >>> If you like, I would be definitely happy to have Co-developed-by: with you,
> >>> if you agree. 
> >>
> >> Oh, that's great, thanks!
> > 
> > Great!  I apologize on not asking already before a formal patch is post.
> > 
> >>
> >>> I just don't know whether that addressed all your need, and
> >>> I need some patch like that for our builds.
> >>
> >> I think it achieves the same as the other series. Or rather it re-implements it
> >> but with less compromise on QAPI and made the tracepoints more 'generic' to even
> >> other usecases and less specific to the 'checkpoint breakdown'. Which makes the
> >> implementation simpler (like we don't need that array storing the checkpoint
> >> timestamps) given that it's just tracing and not for QAPI.
> > 
> > Yes.  Please also feel free to have a closer look on the exact checkpoints
> > in that patch.  I just want to make sure that'll be able to service the
> > same as the patch you proposed, but with tracepoints, and I don't miss
> > anything important.
> > 
> > The dest checkpoints are all new, I hope I nailed them all right as we
> > would expect.
> > 
> Yeah, it looks like so; but I am no master of postcopy so I don't have quite the
> cirurgical eye there.
> 
> Perhaps for the loadvm side, 'precopy-bh-enter' suggests some ambiguity (given
> that precopy happens on both source and destination?). Perhaps being explicit in
> either incoming-bh-enter? or even prefix checkpoint names by 'source' on source
> side for example to distinguish?

I don't think src has a bottom half; if not considering cleanup_bh as part
of migration at all.  Destination's BH is a major part of migration.

In all cases, let me prefix them with "src"/"dst" then, hopefully even clearer.

> 
> > For src checkpoints, IIRC your patch explicitly tracked return path closing
> > while patch 4 only made it just part of final enclosure; the 4th checkpoint
> > is after non-iterable sent, until 5th to be the last "downtime-end". It can
> > cover more than "return path close":
> > 
> >     qemu_savevm_state_complete_precopy_non_iterable <--- 4th checkpoint
> >     qemu_fflush (after non-iterable sent)
> >     close_return_path_on_source
> >     cpu_throttle_stop
> >     qemu_mutex_lock_iothread
> >     migration_downtime_end                          <---- 5th checkpoint
> > 
> > If you see fit or necessary, I can, for example, also add one more
> > checkpoint right after close return path.  I just didn't know whether it's
> > your intention to explicitly check that point.  Just let me know if so.
> > 
> It was put there because it was a simple catch-all from the source PoV on how
> much the destination is taking. Of course bearing in mind that without
> return-path then such metric/tracepoint is not available. On the other hand, with
> migration_downtime_end I think we have the equivalent in that resume checkpoint.
> So we just need to make the diff between that and the non-iterable and I think
> we have the same things as I was looking for (even more I would say).
> 
> > Also on whether you prefer to keep a timestamp in the tracepoint itself;
> > I only use either "log" or "dtrace"+qemu-trace-stap for tracing: both of
> > them contain timestamps already.  But I can also add the timestamps
> > (offseted by downtime_start) if you prefer.
> > 
> Perhaps it is easy to wrap the checkpoint tracepoint in its own function to
> allow extension of something else e.g. add the timestamp (or any other data into
> the checkpoints) or do something in a particular checkpoint of the migration.
> The timing function from qemu would give qemu own idea of time (directly
> correlable with the downtime metric that applications consume) which would be
> more consistent? though I am at too minds on this whether to rely on tracing
> stamps or align with what's provided to applications.

Yes it should be more consistent.  Let me add them into the checkpoints.

Thanks,

-- 
Peter Xu

next prev parent reply	other threads:[~2023-10-30 15:15 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-26 15:53 [PATCH 0/3] migration: Downtime tracepoints Peter Xu
2023-10-26 15:53 ` [PATCH 1/3] migration: Set downtime_start even for postcopy Peter Xu
2023-10-26 17:05   ` Fabiano Rosas
2023-10-26 15:53 ` [PATCH 2/3] migration: Add migration_downtime_start|end() helpers Peter Xu
2023-10-26 17:11   ` Fabiano Rosas
2023-10-26 15:53 ` [PATCH 3/3] migration: Add per vmstate downtime tracepoints Peter Xu
2023-10-26 16:06 ` [PATCH 0/3] migration: Downtime tracepoints Joao Martins
2023-10-26 17:03   ` Peter Xu
2023-10-26 18:18     ` Peter Xu
2023-10-26 19:33       ` Joao Martins
2023-10-26 20:07         ` Peter Xu
2023-10-27  8:58           ` Joao Martins
2023-10-27 14:41             ` Peter Xu
2023-10-27 22:17               ` Joao Martins
2023-10-30 15:13                 ` Peter Xu [this message]
2023-10-30 16:09                   ` Peter Xu
2023-10-30 16:11                     ` Joao Martins
2023-10-26 19:01 ` [PATCH 4/3] migration: Add tracepoints for downtime checkpoints Peter Xu
2023-10-26 19:43   ` Joao Martins
2023-10-26 20:08     ` Peter Xu
2023-10-26 20:14       ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZT/IM4gz6s6PfKg8@x1n \
    --to=peterx@redhat.com \
    --cc=farosas@suse.de \
    --cc=joao.m.martins@oracle.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.