All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Cédric Le Goater" <clg@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Avihai Horon" <avihaih@nvidia.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	qemu-devel@nongnu.org
Subject: Re: [PATCH RFC 00/26] Multifd 🔀 device state transfer support with VFIO consumer
Date: Mon, 29 Apr 2024 16:34:00 -0400	[thread overview]
Message-ID: <ZjAEOB9TbeK_7F9z@x1n> (raw)
In-Reply-To: <ad4c7663-6a12-48e5-b082-d919a535fcc2@maciej.szmigiero.name>

On Fri, Apr 26, 2024 at 07:35:36PM +0200, Maciej S. Szmigiero wrote:
> On 24.04.2024 00:27, Peter Xu wrote:
> > On Tue, Apr 23, 2024 at 06:14:18PM +0200, Maciej S. Szmigiero wrote:
> > > We don't lose any genericity since by default the transfer is done via
> > > mixed RAM / device state multifd channels from a shared pool.
> > > 
> > > It's only when x-multifd-channels-device-state is set to value > 0 then
> > > the requested multifd channel counts gets dedicated to device state.
> > > 
> > > It could be seen as a fine-tuning option for cases where tests show that
> > > it provides some benefits to the particular workload - just like many
> > > other existing migration options are.
> > > 
> > > 14% downtime improvement is too much to waste - I'm not sure that's only
> > > due to avoiding RAM syncs, it's possible that there are other subtle
> > > performance interactions too.
> > > 
> > > For even more genericity this option could be named like
> > > x-multifd-channels-map and contain an array of channel settings like
> > > "ram,ram,ram,device-state,device-state".
> > > Then a possible future other uses of multifd channels wouldn't even need
> > > a new dedicated option.
> > 
> > Yeah I understand such option would only provide more options.
> > 
> > However as long as such option got introduced, user will start to do their
> > own "optimizations" on how to provision the multifd channels, and IMHO
> > it'll be great if we as developer can be crystal clear on why it needs to
> > be introduced in the first place, rather than making all channels open to
> > all purposes.
> > 
> > So I don't think I'm strongly against such parameter, but I want to double
> > check we really understand what's behind this to justify such parameter.
> > Meanwhile I'd be always be pretty caucious on introducing any migration
> > parameters, due to the compatibility nightmares.  The less parameter the
> > better..
> 
> Ack, I am also curious why dedicated device state multifd channels bring
> such downtime improvement.
> 
> > > 
> > > > > 
> > > > > I think one of the reasons for these results is that mixed (RAM + device
> > > > > state) multifd channels participate in the RAM sync process
> > > > > (MULTIFD_FLAG_SYNC) whereas device state dedicated channels don't.
> > > > 
> > > > Firstly, I'm wondering whether we can have better names for these new
> > > > hooks.  Currently (only comment on the async* stuff):
> > > > 
> > > >     - complete_precopy_async
> > > >     - complete_precopy
> > > >     - complete_precopy_async_wait
> > > > 
> > > > But perhaps better:
> > > > 
> > > >     - complete_precopy_begin
> > > >     - complete_precopy
> > > >     - complete_precopy_end
> > > > 
> > > > ?
> > > > 
> > > > As I don't see why the device must do something with async in such hook.
> > > > To me it's more like you're splitting one process into multiple, then
> > > > begin/end sounds more generic.
> > > 
> > > Ack, I will rename these hooks to begin/end.
> > > 
> > > > Then, if with that in mind, IIUC we can already split ram_save_complete()
> > > > into >1 phases too. For example, I would be curious whether the performance
> > > > will go back to normal if we offloading multifd_send_sync_main() into the
> > > > complete_precopy_end(), because we really only need one shot of that, and I
> > > > am quite surprised it already greatly affects VFIO dumping its own things.
> > > 
> > > AFAIK there's already just one multifd_send_sync_main() during downtime -
> > > the one called from save_live_complete_precopy SaveVMHandler.
> > > 
> > > In order to truly never interfere with device state transfer the sync would
> > > need to be ordered after the device state transfer is complete - that is,
> > > after VFIO complete_precopy_end (complete_precopy_async_wait) handler
> > > returns.
> > 
> > Do you think it'll be worthwhile give it a shot, even if we can't decide
> > yet on the order of end()s to be called?
> 
> Upon a closer inspection it looks like that there are, in fact, *two*
> RAM syncs done during the downtime - besides the one at the end of
> ram_save_complete() there's another on in find_dirty_block(). This function
> is called earlier from ram_save_complete() -> ram_find_and_save_block().

Fabiano and I used to discuss this when he's working on the mapped-ram
feature, and afaiu the flush in complete() is not needed when the other one
existed.

I tried to remove it and at least the qtests run all well:

@@ -3415,10 +3415,6 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         }
     }
 
-    if (migrate_multifd() && !migrate_multifd_flush_after_each_section() &&
-        !migrate_mapped_ram()) {
-        qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
-    }
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     return qemu_fflush(f);
 }

> 
> Unfortunately, skipping that intermediate sync in find_dirty_block() and
> moving the one from the end of ram_save_complete() to another SaveVMHandler
> that's called only after VFIO device state transfer doesn't actually
> improve downtime (at least not on its own).
> 
> > It'll be great if we could look into these issues instead of workarounds,
> > and figure out what affected the performance behind, and also whether that
> > can be fixed without such parameter.
> 
> I've been looking at this and added some measurements around device state
> queuing for submission in multifd_queue_device_state().
> 
> To my surprise, the mixed RAM / device state config of 15/0 has *much*
> lower total queuing time of around 100 msec compared to the dedicated
> device state channels 15/4 config with total queuing time of around
> 300 msec.

Did it account device only, or device+RAM?

I'd expect RAM enqueue time grows in 15/0 due to the sharing with device
threads.

However even if so it may not be that fair a comparison, as the cpu
resources aren't equal.  It's fairer if we compare 15/0 (mixed) v.s. 10/5
(dedicated), for example.

> 
> Despite that, the 15/4 config still has significantly lower overall
> downtime.
> 
> This means that any reason for the downtime difference is probably on
> the receive / load side of the migration rather than on the save /
> send side.
> 
> I guess the reason for the lower device state queuing time in the 15/0
> case is that this data could be sent via any of the 15 multifd channels
> rather than just the 4 dedicated ones in the 15/4 case.

Agree.

> 
> Nevertheless, I will continue to look at this problem to at least find
> some explanation for the difference in downtime that dedicated device
> state multifd channels bring.

Thanks for looking at this.

-- 
Peter Xu



  reply	other threads:[~2024-04-29 20:35 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-16 14:42 [PATCH RFC 00/26] Multifd 🔀 device state transfer support with VFIO consumer Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 01/26] migration: Add x-channel-header pseudo-capability Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 02/26] migration: Add migration channel header send/receive Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 03/26] migration: Add send/receive header for main channel Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 04/26] multifd: change multifd_new_send_channel_create() param type Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 05/26] migration: Add a DestroyNotify parameter to socket_send_channel_create() Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 06/26] multifd: pass MFDSendChannelConnectData when connecting sending socket Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 07/26] migration/postcopy: pass PostcopyPChannelConnectData when connecting sending preempt socket Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 08/26] migration: Allow passing migration header in migration channel creation Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 09/26] migration: Add send/receive header for postcopy preempt channel Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 10/26] migration: Add send/receive header for multifd channel Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 11/26] migration/options: Mapped-ram is not channel header compatible Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 12/26] migration: Enable x-channel-header pseudo-capability Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 13/26] vfio/migration: Add save_{iterate, complete_precopy}_started trace events Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 14/26] migration/ram: Add load start trace event Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 15/26] migration/multifd: Zero p->flags before starting filling a packet Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 16/26] migration: Add save_live_complete_precopy_async{, wait} handlers Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 17/26] migration: Add qemu_loadvm_load_state_buffer() and its handler Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 18/26] migration: Add load_finish handler and associated functions Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 19/26] migration: Add x-multifd-channels-device-state parameter Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 20/26] migration: Add MULTIFD_DEVICE_STATE migration channel type Maciej S. Szmigiero
2024-04-16 14:43 ` [PATCH RFC 21/26] migration/multifd: Device state transfer support - receive side Maciej S. Szmigiero
2024-04-16 14:43 ` [PATCH RFC 22/26] migration/multifd: Convert multifd_send_pages::next_channel to atomic Maciej S. Szmigiero
2024-04-16 14:43 ` [PATCH RFC 23/26] migration/multifd: Device state transfer support - send side Maciej S. Szmigiero
2024-04-29 20:04   ` Peter Xu
2024-05-06 16:25     ` Maciej S. Szmigiero
2024-04-16 14:43 ` [PATCH RFC 24/26] migration/multifd: Add migration_has_device_state_support() Maciej S. Szmigiero
2024-04-16 14:43 ` [PATCH RFC 25/26] vfio/migration: Multifd device state transfer support - receive side Maciej S. Szmigiero
2024-04-16 14:43 ` [PATCH RFC 26/26] vfio/migration: Multifd device state transfer support - send side Maciej S. Szmigiero
2024-04-17  8:36 ` [PATCH RFC 00/26] Multifd 🔀 device state transfer support with VFIO consumer Daniel P. Berrangé
2024-04-17 12:11   ` Maciej S. Szmigiero
2024-04-17 16:35     ` Daniel P. Berrangé
2024-04-18  9:50       ` Maciej S. Szmigiero
2024-04-18 10:39         ` Daniel P. Berrangé
2024-04-18 18:14           ` Maciej S. Szmigiero
2024-04-18 20:02             ` Peter Xu
2024-04-19 10:07               ` Daniel P. Berrangé
2024-04-19 15:31                 ` Peter Xu
2024-04-23 16:15                   ` Maciej S. Szmigiero
2024-04-23 22:20                     ` Peter Xu
2024-04-23 22:25                       ` Maciej S. Szmigiero
2024-04-23 22:35                         ` Peter Xu
2024-04-26 17:34                           ` Maciej S. Szmigiero
2024-04-29 15:09                             ` Peter Xu
2024-05-06 16:26                               ` Maciej S. Szmigiero
2024-05-06 17:56                                 ` Peter Xu
2024-05-07  8:41                                   ` Avihai Horon
2024-05-07 16:13                                     ` Peter Xu
2024-05-07 17:23                                       ` Avihai Horon
2024-04-23 16:14               ` Maciej S. Szmigiero
2024-04-23 22:27                 ` Peter Xu
2024-04-26 17:35                   ` Maciej S. Szmigiero
2024-04-29 20:34                     ` Peter Xu [this message]
2024-04-19 10:20             ` Daniel P. Berrangé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZjAEOB9TbeK_7F9z@x1n \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=avihaih@nvidia.com \
    --cc=berrange@redhat.com \
    --cc=clg@redhat.com \
    --cc=eblake@redhat.com \
    --cc=farosas@suse.de \
    --cc=joao.m.martins@oracle.com \
    --cc=mail@maciej.szmigiero.name \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.