qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>,
	"Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
	"Cédric Le Goater" <clg@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Avihai Horon" <avihaih@nvidia.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	qemu-devel@nongnu.org
Subject: Re: [PATCH v1 00/13] Multifd 🔀 device state transfer support with VFIO consumer
Date: Wed, 17 Jul 2024 17:19:15 -0300	[thread overview]
Message-ID: <87msmf22m4.fsf@suse.de> (raw)
In-Reply-To: <ZpgSTCAGbKwWi_o8@x1n>

Peter Xu <peterx@redhat.com> writes:

> On Tue, Jul 16, 2024 at 10:10:12PM +0200, Maciej S. Szmigiero wrote:
>> On 27.06.2024 16:56, Peter Xu wrote:
>> > On Thu, Jun 27, 2024 at 11:14:28AM +0200, Maciej S. Szmigiero wrote:
>> > > On 26.06.2024 18:23, Peter Xu wrote:
>> > > > On Wed, Jun 26, 2024 at 05:47:34PM +0200, Maciej S. Szmigiero wrote:
>> > > > > On 26.06.2024 03:51, Peter Xu wrote:
>> > > > > > On Wed, Jun 26, 2024 at 12:44:29AM +0200, Maciej S. Szmigiero wrote:
>> > > > > > > On 25.06.2024 19:25, Peter Xu wrote:
>> > > > > > > > On Mon, Jun 24, 2024 at 09:51:18PM +0200, Maciej S. Szmigiero wrote:
>> > > > > > > > > Hi Peter,
>> > > > > > > > 
>> > > > > > > > Hi, Maciej,
>> > > > > > > > 
>> > > > > > > > > 
>> > > > > > > > > On 23.06.2024 22:27, Peter Xu wrote:
>> > > > > > > > > > On Tue, Jun 18, 2024 at 06:12:18PM +0200, Maciej S. Szmigiero wrote:
>> > > > > > > > > > > From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>> > > > > > > > > > > 
>> > > > > > > > > > > This is an updated v1 patch series of the RFC (v0) series located here:
>> > > > > > > > > > > https://lore.kernel.org/qemu-devel/cover.1713269378.git.maciej.szmigiero@oracle.com/
>> > > > > > > > > > 
>> > > > > > > > > > OK I took some hours thinking about this today, and here's some high level
>> > > > > > > > > > comments for this series.  I'll start with which are more relevant to what
>> > > > > > > > > > Fabiano has already suggested in the other thread, then I'll add some more.
>> > > > > > > > > > 
>> > > > > > > > > > https://lore.kernel.org/r/20240620212111.29319-1-farosas@suse.de
>> > > > > > > > > 
>> > > > > > > > > That's a long list, thanks for these comments.
>> > > > > > > > > 
>> > > > > > > > > I have responded to them inline below.
>> > > > > > > > > (..)
>> > > > > > > 
>> > > > > > > 2) Submit this operation to the thread pool and wait for it to complete,
>> > > > > > 
>> > > > > > VFIO doesn't need to have its own code waiting.  If this pool is for
>> > > > > > migration purpose in general, qemu migration framework will need to wait at
>> > > > > > some point for all jobs to finish before moving on.  Perhaps it should be
>> > > > > > at the end of the non-iterative session.
>> > > > > 
>> > > > > So essentially, instead of calling save_live_complete_precopy_end handlers
>> > > > > from the migration code you would like to hard-code its current VFIO
>> > > > > implementation of calling vfio_save_complete_precopy_async_thread_thread_terminate().
>> > > > > 
>> > > > > Only it wouldn't be then called VFIO precopy async thread terminate but some
>> > > > > generic device state async precopy thread terminate function.
>> > > > 
>> > > > I don't understand what did you mean by "hard code".
>> > > 
>> > > "Hard code" wasn't maybe the best expression here.
>> > > 
>> > > I meant the move of the functionality that's provided by
>> > > vfio_save_complete_precopy_async_thread_thread_terminate() in this patch set
>> > > to the common migration code.
>> > 
>> > I see.  That function only does a thread_join() so far.
>> > 
>> > So can I understand it as below [1] should work for us, and it'll be clean
>> > too (with nothing to hard-code)?
>> 
>> It will need some signal to the worker threads pool to terminate before
>> waiting for them to finish (as the code in [1] just waits).
>> 
>> In the case of current vfio_save_complete_precopy_async_thread() implementation,
>> this signal isn't necessary as this thread simply terminates when it has read
>> all the date it needs from the device.
>> 
>> In a worker threads pool case there will be some threads waiting for
>> jobs to be queued to them and so they will need to be somehow signaled
>> to exit.
>
> Right.  We may need something like multifd_send_should_exit() +
> MultiFDSendParams.sem.  It'll be nicer if we can generalize that part so
> multifd threads can also rebase to that thread model, but maybe I'm asking
> too much.
>
>> 
>> > The time to join() the worker threads can be even later, until
>> > migrate_fd_cleanup() on sender side.  You may have a better idea on when
>> > would be the best place to do it when start working on it.
>> > 
>> > > 
>> > > > What I was saying is if we target the worker thread pool to be used for
>> > > > "concurrently dump vmstates", then it'll make sense to make sure all the
>> > > > jobs there were flushed after qemu dumps all non-iterables (because this
>> > > > should be the last step of the switchover).
>> > > > 
>> > > > I expect it looks like this:
>> > > > 
>> > > >     while (pool->active_threads) {
>> > > >         qemu_sem_wait(&pool->job_done);
>> > > >     }
>> > 
>> > [1]
>> > 
>> (..)
>> > > I think that with this thread pool introduction we'll unfortunately almost certainly
>> > > need to target this patch set at 9.2, since these overall changes (and Fabiano
>> > > patches too) will need good testing, might uncover some performance regressions
>> > > (for example related to the number of buffers limit or Fabiano multifd changes),
>> > > bring some review comments from other people, etc.
>> > > 
>> > > In addition to that, we are in the middle of holiday season and a lot of people
>> > > aren't available - like Fabiano said he will be available only in a few weeks.
>> > 
>> > Right, that's unfortunate.  Let's see, but still I really hope we can also
>> > get some feedback from Fabiano before it lands, even with that we have
>> > chance for 9.1 but it's just challenging, it's the same condition I
>> > mentioned since the 1st email.  And before Fabiano's back (he's the active
>> > maintainer for this release), I'm personally happy if you can propose
>> > something that can land earlier in this release partly.  E.g., if you want
>> > we can at least upstream Fabiano's idea first, or some more on top.
>> > 
>> > For that, also feel to have a look at my comment today:
>> > 
>> > https://lore.kernel.org/r/Zn15y693g0AkDbYD@x1n
>> > 
>> > Feel free to comment there too.  There's a tiny uncertainty there so far on
>> > specifying "max size for a device state" if do what I suggested, as multifd
>> > setup will need to allocate an enum buffer suitable for both ram + device.
>> > But I think that's not an issue and you'll tackle that properly when
>> > working on it.  It's more about whether you agree on what I said as a
>> > general concept.
>> > 
>> 
>> Since it seems that the discussion on Fabiano's patch set has subsided I think
>> I will start by basing my updated patch set on top of his RFC and then if
>> Fabiano wants to submit v1/v2 of his patch set then I will rebase mine on top
>> of it.
>> 
>> Otherwise, you can wait until I have a v2 ready and then we can work with that.
>
> Oh I thought you already started modifying his patchset.
>
> In this case, AFAIR Fabiano has plan to rework that RFC series, so maybe
> you want to double check with him, and can also wait for his new version if
> that's easier, because I do expect there'll be major changes.
>
> Fabiano?

Don't wait on me. I think I can make the changes Peter suggested without
affecting too much the interfaces used by this series. If it comes to
it, I can rebase this series "under" Maciej's.


  reply	other threads:[~2024-07-17 20:19 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-18 16:12 [PATCH v1 00/13] Multifd 🔀 device state transfer support with VFIO consumer Maciej S. Szmigiero
2024-06-18 16:12 ` [PATCH v1 01/13] vfio/migration: Add save_{iterate, complete_precopy}_started trace events Maciej S. Szmigiero
2024-06-18 16:12 ` [PATCH v1 02/13] migration/ram: Add load start trace event Maciej S. Szmigiero
2024-06-18 16:12 ` [PATCH v1 03/13] migration/multifd: Zero p->flags before starting filling a packet Maciej S. Szmigiero
2024-06-18 16:12 ` [PATCH v1 04/13] migration: Add save_live_complete_precopy_{begin, end} handlers Maciej S. Szmigiero
2024-06-18 16:12 ` [PATCH v1 05/13] migration: Add qemu_loadvm_load_state_buffer() and its handler Maciej S. Szmigiero
2024-06-18 16:12 ` [PATCH v1 06/13] migration: Add load_finish handler and associated functions Maciej S. Szmigiero
2024-06-18 16:12 ` [PATCH v1 07/13] migration/multifd: Device state transfer support - receive side Maciej S. Szmigiero
2024-06-18 16:12 ` [PATCH v1 08/13] migration/multifd: Convert multifd_send_pages::next_channel to atomic Maciej S. Szmigiero
2024-06-18 16:12 ` [PATCH v1 09/13] migration/multifd: Device state transfer support - send side Maciej S. Szmigiero
2024-06-18 16:12 ` [PATCH v1 10/13] migration/multifd: Add migration_has_device_state_support() Maciej S. Szmigiero
2024-06-18 16:12 ` [PATCH v1 11/13] vfio/migration: Multifd device state transfer support - receive side Maciej S. Szmigiero
2024-06-18 16:12 ` [PATCH v1 12/13] vfio/migration: Add x-migration-multifd-transfer VFIO property Maciej S. Szmigiero
2024-06-18 16:12 ` [PATCH v1 13/13] vfio/migration: Multifd device state transfer support - send side Maciej S. Szmigiero
2024-06-23 20:27 ` [PATCH v1 00/13] Multifd 🔀 device state transfer support with VFIO consumer Peter Xu
2024-06-24 19:51   ` Maciej S. Szmigiero
2024-06-25 17:25     ` Peter Xu
2024-06-25 22:44       ` Maciej S. Szmigiero
2024-06-26  1:51         ` Peter Xu
2024-06-26 15:47           ` Maciej S. Szmigiero
2024-06-26 16:23             ` Peter Xu
2024-06-27  9:14               ` Maciej S. Szmigiero
2024-06-27 14:56                 ` Peter Xu
2024-07-16 20:10                   ` Maciej S. Szmigiero
2024-07-17 18:49                     ` Peter Xu
2024-07-17 20:19                       ` Fabiano Rosas [this message]
2024-07-17 21:07                         ` Maciej S. Szmigiero
2024-07-17 21:21                           ` Peter Xu
2024-06-27 15:09                 ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87msmf22m4.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=avihaih@nvidia.com \
    --cc=berrange@redhat.com \
    --cc=clg@redhat.com \
    --cc=eblake@redhat.com \
    --cc=joao.m.martins@oracle.com \
    --cc=mail@maciej.szmigiero.name \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).