From: Peter Xu <peterx@redhat.com>
To: Avihai Horon <avihaih@nvidia.com>
Cc: qemu-devel@nongnu.org, "Juraj Marcin" <jmarcin@redhat.com>,
	"Kirti Wankhede" <kwankhede@nvidia.com>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	"Daniel P . Berrangé" <berrange@redhat.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	"Alex Williamson" <alex@shazbot.org>,
	"Yishai Hadas" <yishaih@nvidia.com>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Pranav Tyagi" <prtyagi@redhat.com>,
	"Zhiyi Guo" <zhguo@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Cédric Le Goater" <clg@redhat.com>
Subject: Re: [PATCH RFC 08/12] vfio/migration: Fix incorrect reporting for VFIO pending data
Date: Thu, 2 Apr 2026 11:28:09 -0400
Message-ID: <ac6LCWmVHl8fq81J@x1.local>
In-Reply-To: <e6a9aaf2-e2ab-4b25-a01d-b6093922668e@nvidia.com>

On Wed, Mar 25, 2026 at 07:32:12PM +0200, Avihai Horon wrote:
> 
> On 3/20/2026 1:12 AM, Peter Xu wrote:
> > VFIO used to report different things in its fast/slow version of query
> > pending results.  It was likely because it wants to make sure precopy data
> > can reach 0 hence trigger sync queries.
> 
> It was to guarantee precopy data can reach 0 and trigger dirty sync queries,
> since not all VFIO device data may be precopy-able.

Thanks for confirming.

> 
> > 
> > Now with stopcopy size reporting facility it doesn't need this hack
> > anymore.  Fix this.
> 
> Looks good, now the fast/slow path are consistent, plus we get the benefit
> that if VFIO device has much stopcopy data, migration will try to reduce
> RAM/other iterative devices to the minimum continuously instead of jumping
> from fast to slow path as it currently does.

Yep.

> 
> I still want to test this later with a few workloads, just to make sure we
> don't miss anything here.

Thanks a lot.  Let me know if there's any update.

> 
> > 
> > Copy stable might be too much; just skip it and skip the Fixes.
> > 
> > Cc: Avihai Horon <avihaih@nvidia.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   hw/vfio/migration.c | 11 +++++++----
> >   1 file changed, 7 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> > index c054c749b0..9dbe5ad9e9 100644
> > --- a/hw/vfio/migration.c
> > +++ b/hw/vfio/migration.c
> > @@ -591,6 +591,10 @@ static void vfio_state_pending_sync(VFIODevice *vbasedev)
> >                             __func__, migration->precopy_init_size,
> >                             migration->precopy_dirty_size,
> >                             migration->stopcopy_size);
> > +        migration->stopcopy_size = 0;
> > +    } else {
> > +        migration->stopcopy_size -=
> > +            (migration->precopy_init_size + migration->precopy_dirty_size);
> 
> Query of stopcopy and precopy are not atomic, so we better use MIN() here:
> 
> migration->stopcopy_size -= MIN(migration->stopcopy_size,
> migration->precopy_init_size + migration->precopy_dirty_size);

Yes.  I'll revisit all these places in this series for atomicity and
possible overflow issues once I have a better understanding of the
"data shrink" possibilities.
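
For reference, a minimal standalone sketch of the clamped subtraction
suggested above.  The helper name sub_clamped() is made up for
illustration, and the MIN() definition is only a local stand-in for the
one QEMU already provides.  The point is just that the unsigned
subtraction cannot wrap when the precopy figure momentarily exceeds the
stopcopy figure, which can happen because the two queries are not atomic:

```c
#include <stdint.h>

/* Local stand-in for illustration; QEMU code would use its own MIN(). */
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/*
 * Hypothetical helper: remove the precopy part from a total that was
 * queried separately, clamping at 0 instead of wrapping around.
 */
static uint64_t sub_clamped(uint64_t total, uint64_t precopy)
{
    return total - MIN(total, precopy);
}
```

With total=30 and precopy=100 this yields 0, where a plain subtraction
would wrap to a huge uint64_t value.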

> 
> >       }
> >   }
> > 
> > @@ -598,19 +602,18 @@ static void vfio_state_pending(void *opaque, MigPendingData *pending)
> >   {
> >       VFIODevice *vbasedev = opaque;
> >       VFIOMigration *migration = vbasedev->migration;
> > -    uint64_t remain;
> > 
> >       if (pending->fastpath) {
> >           if (!vfio_device_state_is_precopy(vbasedev)) {
> >               return;
> >           }
> > -        remain = migration->precopy_init_size + migration->precopy_dirty_size;
> >       } else {
> >           vfio_state_pending_sync(vbasedev);
> > -        remain = migration->stopcopy_size;
> >       }
> > 
> > -    pending->precopy_bytes += remain;
> > +    pending->precopy_bytes +=
> > +        migration->precopy_init_size + migration->precopy_dirty_size;
> > +    pending->stopcopy_bytes += migration->stopcopy_size;
> 
> Now that migration->stopcopy_size holds only the stopcopy size (and not
> stopcopy+precopy), we should remove these lines from
> vfio_update_estimated_pending_data():
> 
>     /* The total size remaining requires separate accounting */
>     migration->stopcopy_size -= data_size;

Good point.  I missed this part when changing the definition; I'll fix it.
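
To spell out the intended accounting, here is a toy model, not the real
code.  struct toy_sizes, struct toy_pending, toy_account_read() and
toy_report() are all hypothetical names standing in for the cached sizes
in VFIOMigration, the two counters in MigPendingData, and the
update/report paths.  It assumes stopcopy already excludes the precopy
part, so reading precopy data only reduces the precopy counters:

```c
#include <stdint.h>

/* Toy stand-in for the three cached sizes; not the real VFIOMigration. */
struct toy_sizes {
    uint64_t precopy_init;
    uint64_t precopy_dirty;
    uint64_t stopcopy;      /* stopcopy only, precopy already excluded */
};

/* Toy stand-in for the two counters reported to the migration core. */
struct toy_pending {
    uint64_t precopy_bytes;
    uint64_t stopcopy_bytes;
};

/* Account for data_size bytes of precopy data having been read. */
static void toy_account_read(struct toy_sizes *s, uint64_t data_size)
{
    /* Consume the initial precopy data first, clamped at what is left. */
    uint64_t n = data_size < s->precopy_init ? data_size : s->precopy_init;

    s->precopy_init -= n;
    data_size -= n;

    /* Whatever remains is taken from the dirty precopy data, also clamped. */
    n = data_size < s->precopy_dirty ? data_size : s->precopy_dirty;
    s->precopy_dirty -= n;

    /* s->stopcopy is deliberately left untouched here. */
}

/* Report precopy and stopcopy separately, same for fast and slow path. */
static void toy_report(const struct toy_sizes *s, struct toy_pending *p)
{
    p->precopy_bytes += s->precopy_init + s->precopy_dirty;
    p->stopcopy_bytes += s->stopcopy;
}
```

If toy_account_read() also subtracted data_size from stopcopy, the
precopy bytes would be removed from a figure that no longer contains
them.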

Thanks,

> 
> Thanks.
> 
> > 
> >       trace_vfio_state_pending(vbasedev->name, migration->stopcopy_size,
> >                                migration->precopy_init_size,
> > --
> > 2.50.1
> > 
> 

-- 
Peter Xu


