From: Joao Martins <joao.m.martins@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH 0/3] migration: Downtime tracepoints
Date: Thu, 26 Oct 2023 20:33:13 +0100
Message-ID: <778f6c92-221c-41d2-b0ae-4f5f4a208f65@oracle.com>
In-Reply-To: <ZTqtieZo/VaSscp5@x1n>

On 26/10/2023 19:18, Peter Xu wrote:
> On Thu, Oct 26, 2023 at 01:03:57PM -0400, Peter Xu wrote:
>> On Thu, Oct 26, 2023 at 05:06:37PM +0100, Joao Martins wrote:
>>> On 26/10/2023 16:53, Peter Xu wrote:
>>>> This small series (actually only the last patch; the first two are cleanups)
>>>> wants to improve QEMU's ability to analyze downtime, similarly to what Joao
>>>> used to propose here:
>>>>
>>>>   https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com
>>>>
>>> Thanks for following up on the idea; It's been hard to have enough bandwidth for
>>> everything on the past set of weeks :(
>>
>> Yeah, totally understood.  I think our QE team pushed me towards some
>> series like this, while my plan was waiting for your new version. :)
>>

On my end, it was similar (though not driven by QE/QA), with folks drawing a
blank when they see a bigger downtime.

Having an explainer/breakdown makes it much easier to pinpoint where the
problems are.

>> Then when I started I decided to go into per-device.  I was thinking of
>> also persisting that information, but then I remembered some ppc guests can
>> have ~40,000 vmstates..  and memory to maintain that may or may not regress
>> a ppc user.  So I figured I should first keep it simple with tracepoints.
>>

Yeah, I should have removed that last patch for QAPI.

The per-vmstate output was something I wasn't quite happy with how it looked,
but I think you squared it into a relatively clean shape in that last patch.
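
For reference, here's a minimal sketch of what such per-vmstate timing can
look like on the save side; the trace event name and exact placement are
illustrative, not necessarily what the patch does:

    /* migration/savevm.c (sketch): time one vmstate save and trace it.
     * trace_vmstate_downtime_save() is a hypothetical trace event taking
     * the vmstate idstr, instance id and elapsed microseconds. */
    int64_t start_us = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
    ret = vmstate_save(f, se, vmdesc);
    trace_vmstate_downtime_save(se->idstr, se->instance_id,
                                qemu_clock_get_us(QEMU_CLOCK_REALTIME) - start_us);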

>>>
>>>> But with a few differences:
>>>>
>>>>   - Nothing exported yet to qapi, all tracepoints so far
>>>>
>>>>   - Instead of major checkpoints (stop, iterable, non-iterable, resume-rp),
>>>>     finer granularity by providing downtime measurements for each vmstate (I
>>>>     made microseconds the unit, for accuracy).  So far it seems
>>>>     iterable / non-iterable is the core of the problem, and I want to nail
>>>>     it to per-device.
>>>>
>>>>   - Trace dest QEMU too
>>>>
>>>> For the last bullet: consider the case where a device save() can be super
>>>> fast, while load() can actually be super slow.  Both of them will
>>>> contribute to the ultimate downtime, but not a simple summary: when src
>>>> QEMU is save()ing on device1, dst QEMU can be load()ing on device2.  So
>>>> they can run in parallel.  However the only way to figure all components of
>>>> the downtime is to record both.
>>>>
>>>> Please have a look, thanks.
>>>>
>>>
>>> I like your series, as it allows a user to pinpoint one particular bad device,
>>> while covering the load side too. The checkpoints of migration on the other hand
>>> were useful -- while also a bit ugly -- for the sort of big picture of how
>>> downtime breaks down. Perhaps we could add that /also/ as tracepoints without
>>> specifically committing to exposing it in QAPI.
>>>
>>> More fundamentally, how can one capture the 'stop' part? There's also time spent
>>> there, e.g. quiescing/stopping vhost-net workers, or suspending the VF
>>> device. All of it likely as costly as the device-state/RAM related parts those
>>> tracepoints cover (the iterable and non-iterable portions).
>>
>> Yeah that's a good point.  I didn't cover "stop" yet because I think it's
>> just more tricky and I didn't think it all through, yet.
>>

It could follow your previous line of thought where you do it per vmstate.

But the catch is that VM state change handlers are nameless, so the tracepoints
wouldn't be able to tell which handler each chunk of time is spent in.
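
To illustrate, here's a rough sketch of timing each handler inside
vm_state_notify(); since qemu_add_vm_change_state_handler() only takes a
callback and an opaque pointer, the best a (hypothetical) trace event can
report is the callback address:

    /* Simplified from vm_state_notify(); trace_vm_state_handler_time() is
     * a hypothetical event.  With no name attached to the entry, the
     * callback pointer is the only identifier available. */
    VMChangeStateEntry *e;
    QTAILQ_FOREACH(e, &vm_change_state_head, entries) {
        int64_t start_us = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
        e->cb(e->opaque, running, state);
        trace_vm_state_handler_time((void *)e->cb,
            qemu_clock_get_us(QEMU_CLOCK_REALTIME) - start_us);
    }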

>> The first question is, when stopping some backends, the vCPUs are still
>> running, so it's not 100% clear to me which of that should be counted as
>> part of the real downtime.
> 
> I was wrong.. we always stop vcpus first.
> 

I was about to say this, but I guess you figured it out. Even if the vCPUs
weren't stopped first, the external I/O threads (QEMU or kernel) would stop
servicing the guest's own I/O, which is already a portion of the outage.

> If you won't mind, I can add some tracepoints for all those spots in this
> series to cover your other series.  I'll also make sure I do that for both
> sides.
> 
Sure. For the fourth patch, feel free to add Suggested-by and/or a Link,
considering it started in the other series (if you also agree that's right). The
patches are of course entirely different, but I like to believe the ideas
initially presented, and then subsequently improved, are what led to the
downtime observability improvements in this series.
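
As a sketch of what those checkpoint tracepoints could look like (the names
here are illustrative, not necessarily the actual ones from [PATCH 4/3]):

    # migration/trace-events (hypothetical entry)
    migration_downtime_checkpoint(const char *checkpoint) "%s"

    /* Usage sketch at the major milestones on the source side */
    trace_migration_downtime_checkpoint("src-downtime-start");
    trace_migration_downtime_checkpoint("src-vm-stopped");
    trace_migration_downtime_checkpoint("src-iterable-saved");
    trace_migration_downtime_checkpoint("src-non-iterable-saved");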

	Joao


Thread overview: 21+ messages
2023-10-26 15:53 [PATCH 0/3] migration: Downtime tracepoints Peter Xu
2023-10-26 15:53 ` [PATCH 1/3] migration: Set downtime_start even for postcopy Peter Xu
2023-10-26 17:05   ` Fabiano Rosas
2023-10-26 15:53 ` [PATCH 2/3] migration: Add migration_downtime_start|end() helpers Peter Xu
2023-10-26 17:11   ` Fabiano Rosas
2023-10-26 15:53 ` [PATCH 3/3] migration: Add per vmstate downtime tracepoints Peter Xu
2023-10-26 16:06 ` [PATCH 0/3] migration: Downtime tracepoints Joao Martins
2023-10-26 17:03   ` Peter Xu
2023-10-26 18:18     ` Peter Xu
2023-10-26 19:33       ` Joao Martins [this message]
2023-10-26 20:07         ` Peter Xu
2023-10-27  8:58           ` Joao Martins
2023-10-27 14:41             ` Peter Xu
2023-10-27 22:17               ` Joao Martins
2023-10-30 15:13                 ` Peter Xu
2023-10-30 16:09                   ` Peter Xu
2023-10-30 16:11                     ` Joao Martins
2023-10-26 19:01 ` [PATCH 4/3] migration: Add tracepoints for downtime checkpoints Peter Xu
2023-10-26 19:43   ` Joao Martins
2023-10-26 20:08     ` Peter Xu
2023-10-26 20:14       ` Peter Xu
