All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Avihai Horon <avihaih@nvidia.com>
Cc: qemu-devel@nongnu.org, "Alex Williamson" <alex@shazbot.org>,
	"Cédric Le Goater" <clg@redhat.com>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Pierrick Bouvier" <pierrick.bouvier@oss.qualcomm.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Zhao Liu" <zhao1.liu@intel.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Maor Gottlieb" <maorg@nvidia.com>
Subject: Re: [PATCH 09/14] vfio/migration: Re-query precopy size before sending VFIO_MIG_FLAG_DEV_INIT_DATA_SENT
Date: Mon, 25 May 2026 11:19:50 -0400	[thread overview]
Message-ID: <ahRoln21IZAO_iOA@x1.local> (raw)
In-Reply-To: <5af18c64-267e-4948-98c9-20d94b4db4e9@nvidia.com>

On Sun, May 24, 2026 at 09:45:33AM +0300, Avihai Horon wrote:
> 
> On 5/21/2026 6:04 PM, Peter Xu wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Thu, May 21, 2026 at 04:46:31PM +0300, Avihai Horon wrote:
> > > On 5/19/2026 10:58 PM, Peter Xu wrote:
> > > > External email: Use caution opening links or attachments
> > > > 
> > > > 
> > > > On Tue, May 05, 2026 at 11:14:18AM +0300, Avihai Horon wrote:
> > > > > When precopy initial_bytes reaches zero VFIO_MIG_FLAG_DEV_INIT_DATA_SENT
> > > > > flag is sent to the destination to indicate that initial data has been
> > > > > sent, so destination can indicate back to source when it finished
> > > > > loading it.
> > > > > 
> > > > > To get a more accurate estimation of initial_bytes, re-query precopy
> > > > > size before sending the flag. Extract the flag sending logic from
> > > > > vfio_save_iterate() to a new helper for clarity.
> > > > > 
> > > > > This may prevent premature sending of VFIO_MIG_FLAG_DEV_INIT_DATA_SENT
> > > > > flag if, for example, the previously queried initial_bytes was lower
> > > > > than actually is. Additionally, it prevents sending the flag if
> > > > > vfio_query_precopy_size() failed.
> > > > > 
> > > > > Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> > > > > ---
> > > > >    hw/vfio/migration.c  | 37 ++++++++++++++++++++++++++++++++-----
> > > > >    hw/vfio/trace-events |  1 +
> > > > >    2 files changed, 33 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> > > > > index 2911583ee1..243624b5fe 100644
> > > > > --- a/hw/vfio/migration.c
> > > > > +++ b/hw/vfio/migration.c
> > > > > @@ -456,6 +456,37 @@ static void vfio_update_estimated_pending_data(VFIOMigration *migration,
> > > > >                                             data_size);
> > > > >    }
> > > > > 
> > > > > +/* Returns true if the init data flag was sent, false otherwise */
> > > > > +static bool vfio_send_init_data_flag(QEMUFile *f, VFIOMigration *migration)
> > > > > +{
> > > > > +    VFIODevice *vbasedev = migration->vbasedev;
> > > > > +    int ret;
> > > > > +
> > > > > +    if (!migrate_switchover_ack()) {
> > > > > +        return false;
> > > > > +    }
> > > > > +
> > > > > +    if (migration->precopy_init_size || migration->initial_data_sent) {
> > > > > +        return false;
> > > > > +    }
> > [1]
> > 
> > > > > +
> > > > > +    /*
> > > > > +     * precopy_init_size holds an estimation of the initial data size, re-query
> > > > > +     * precopy size to ensure it's really zero before sending init data flag.
> > > > > +     * Don't send the flag if query fails.
> > > > > +     */
> > > > > +    ret = vfio_query_precopy_size(migration);
> > > > > +    if (ret || migration->precopy_init_size) {
> > > > > +        return false;
> > > > > +    }
> > > > IIUC this chunk isn't necessary?  If we don't expect REINIT to happen that
> > > > much (when NIC reconfigures?), then we can still rely on the window where
> > > > the "new switchover ack" will be requested later on during the exact sync.
> > > > 
> > > > Relying on that seems slightly cleaner.
> > > Not sure I follow.
> > > 
> > > New switchover ack is requested in exact sync if we see new init_bytes > 0
> > > (REINIT flag).
> > > This flow happens only after the new switchover ack is requested in exact
> > > sync, when init_bytes = 0 again.
> > > 
> > > So this chunk just makes sure we send the VFIO_MIG_FLAG_DEV_INIT_DATA_SENT
> > > flag at the right time.
> > AFAIU, what this chunk does is, we may save one switchover-ack if REINIT
> > got here.  It doesn't provide much functional difference in reality.
> > 
> > With this code there, when it happens to see REINIT, instead of sending an
> > immediate VFIO_MIG_FLAG_DEV_INIT_DATA_SENT message, it falls back to send
> > init data in the next iteration loop, saving that flag, and saving a
> > "request switchover-ack" on src QEMU too.
> > 
> > If above code removed, IIUC VFIO will send VFIO_MIG_FLAG_DEV_INIT_DATA_SENT
> > immediately causing dest sends ACK.  vfio_query_precopy_size() will be
> > postponed until the next sync query (which must happen at some point before
> > final switchover), then it will be collected there, VFIO src will request
> > for switchover-ack, then another VFIO_MIG_FLAG_DEV_INIT_DATA_SENT is
> > expected.
> > 
> > Both should work, but what I meant is, I think we don't need this random
> > check, because it's optimistic, it's not functionally necessary, IIUC.
> > 
> > IOW, see the current code and how it can still race with a REINIT anyway:
> > 
> >               migration thread                                  some vfio driver thread
> > 
> >      ret = vfio_query_precopy_size(migration);
> >      if (ret || migration->precopy_init_size) {
> >          return false;
> >      }
> >                                                                     got reconfigured,
> >                                                                     set REINIT
> > 
> >      qemu_put_be64(f, VFIO_MIG_FLAG_DEV_INIT_DATA_SENT);
> >      migration->initial_data_sent = true;
> >      trace_vfio_send_init_data_flag(vbasedev->name);
> > 
> > It's the same to me if e.g. we try to vfio_query_precopy_size() in VFIO's
> > iterative loops from time to time, it'll also work, it'll make sync more
> > frequent, but it's not needed.
> 
> I see what you mean now.
> However, the purpose of this chunk is not to check for another REINIT, but
> rather to ensure VFIO_MIG_FLAG_DEV_INIT_DATA_SENT is sent at the right time
> -- when init_bytes is truly 0.

IIUC, that part should normally be guaranteed by this line you added prior
to it:

+    if (migration->precopy_init_size || migration->initial_data_sent) {
+        return false;
+    }

Hence when reaching vfio_query_precopy_size(), precopy_init_size==0.

IOW, I think sending INIT_DATA_SENT is fine once it's guarateed the next
vfio_query_precopy_size() will revoke it by the newly populated REINIT
data.  Essentially, the race window is pretty small here.

If you still want to keep it there, I'm still fine; having this extra check
isn't making this incorrect.  I just want to make sure we're on the same
page.

Thanks,

> 
> If the flag is sent before init_bytes = 0 then dest may ack too early,
> before it did the time-consuming load and we risk having that load during
> downtime (see my next response to the cover letter).
> 
> At first I thought mlx5 may report init_bytes estimate which is a bit lower
> than the actual value but then I realized it can't happen. However, I chose
> to keep this patch because it's aligned with the uapi that defines
> init_bytes as an estimate, and because it covers the case where precopy info
> ioctl fails.
> 
> Granted, if initial_bytes is so important I'd expect drivers to report a
> pretty accurate value, but still, it's aligned with the uapi definition and
> doesn't hurt.
> 
> Thanks.
> 

-- 
Peter Xu



  reply	other threads:[~2026-05-25 15:20 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-05  8:14 [PATCH 00/14] Make switchover-ack re-usable and add VFIO precopy REINIT feature Avihai Horon
2026-05-05  8:14 ` [PATCH 01/14] scripts/update-linux-headers: Add typelimits.h Avihai Horon
2026-05-07  7:54   ` Cédric Le Goater
2026-05-07  9:07     ` gaosong
2026-05-05  8:14 ` [PATCH 02/14] linux-headers: Update to Linux v7.1-rc1 Avihai Horon
2026-05-07  9:16   ` Cédric Le Goater
2026-05-05  8:14 ` [PATCH 03/14] migration: Propagate errors in migration_completion_precopy() Avihai Horon
2026-05-07  8:03   ` Cédric Le Goater
2026-05-08 13:01     ` Avihai Horon
2026-05-15 15:20       ` Peter Xu
2026-05-18 12:02         ` Avihai Horon
2026-05-05  8:14 ` [PATCH 04/14] migration: Log the approver in qemu_loadvm_approve_switchover() Avihai Horon
2026-05-07  8:09   ` Cédric Le Goater
2026-05-08 13:07     ` Avihai Horon
2026-05-15 15:24   ` Peter Xu
2026-05-05  8:14 ` [PATCH 05/14] migration: Replace switchover_ack_needed SaveVMHandler Avihai Horon
2026-05-15 15:27   ` Peter Xu
2026-05-05  8:14 ` [PATCH 06/14] migration: Rename switchover-ack code to legacy Avihai Horon
2026-05-05  8:14 ` [PATCH 07/14] migration: Make switchover-ack re-usable Avihai Horon
2026-05-07 14:10   ` Fabiano Rosas
2026-05-19 19:48   ` Peter Xu
2026-05-21 13:39     ` Avihai Horon
2026-05-21 14:54       ` Peter Xu
2026-05-24  6:34         ` Avihai Horon
2026-05-25 15:01           ` Peter Xu
2026-05-26  9:08             ` Avihai Horon
2026-05-26 16:23               ` Peter Xu
2026-05-27  8:17                 ` Avihai Horon
2026-05-27 15:38                   ` Peter Xu
2026-05-28  9:10                     ` Avihai Horon
2026-05-28 13:21                       ` Peter Xu
2026-05-05  8:14 ` [PATCH 08/14] migration: Check switchover-ack during switchover phase Avihai Horon
2026-05-05  8:14 ` [PATCH 09/14] vfio/migration: Re-query precopy size before sending VFIO_MIG_FLAG_DEV_INIT_DATA_SENT Avihai Horon
2026-05-07  8:24   ` Cédric Le Goater
2026-05-08 13:10     ` Avihai Horon
2026-05-19 19:58   ` Peter Xu
2026-05-21 13:46     ` Avihai Horon
2026-05-21 15:04       ` Peter Xu
2026-05-24  6:45         ` Avihai Horon
2026-05-25 15:19           ` Peter Xu [this message]
2026-05-26  9:17             ` Avihai Horon
2026-05-26 17:38               ` Peter Xu
2026-05-05  8:14 ` [PATCH 10/14] vfio/migration: Add Error ** parameter to vfio_migration_init() Avihai Horon
2026-05-07  7:59   ` Cédric Le Goater
2026-05-08 13:18     ` Avihai Horon
2026-05-08 13:21       ` Avihai Horon
2026-05-05  8:14 ` [PATCH 11/14] vfio/migration: Add new switchover-ack mechanism Avihai Horon
2026-05-05  8:14 ` [PATCH 12/14] vfio/migration: Implement VFIO_PRECOPY_INFO_REINIT feature Avihai Horon
2026-05-05  8:14 ` [PATCH 13/14] vfio/migration: Check VFIO_PRECOPY_INFO_REINIT during switchover Avihai Horon
2026-05-05  8:14 ` [PATCH 14/14] migration: Enable new switchover-ack Avihai Horon
2026-05-19 20:11   ` Peter Xu
2026-05-21 14:00     ` Avihai Horon
2026-05-21 15:05       ` Peter Xu
2026-05-19 20:09 ` [PATCH 00/14] Make switchover-ack re-usable and add VFIO precopy REINIT feature Peter Xu
2026-05-21 13:53   ` Avihai Horon
2026-05-21 15:20     ` Peter Xu
2026-05-24  6:58       ` Avihai Horon
2026-05-25 15:15         ` Peter Xu
2026-05-26  9:10           ` Avihai Horon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahRoln21IZAO_iOA@x1.local \
    --to=peterx@redhat.com \
    --cc=alex@shazbot.org \
    --cc=avihaih@nvidia.com \
    --cc=clg@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=farosas@suse.de \
    --cc=maorg@nvidia.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@linaro.org \
    --cc=pierrick.bouvier@oss.qualcomm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=zhao1.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.