Re: [PATCH RFC 03/12] vfio/migration: Throttle vfio_save_block() on data size to read

From: Avihai Horon <avihaih@nvidia.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, "Juraj Marcin" <jmarcin@redhat.com>,
	"Kirti Wankhede" <kwankhede@nvidia.com>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	"Alex Williamson" <alex@shazbot.org>,
	"Yishai Hadas" <yishaih@nvidia.com>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Pranav Tyagi" <prtyagi@redhat.com>,
	"Zhiyi Guo" <zhguo@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Cédric Le Goater" <clg@redhat.com>
Subject: Re: [PATCH RFC 03/12] vfio/migration: Throttle vfio_save_block() on data size to read
Date: Mon, 6 Apr 2026 14:21:40 +0300
Message-ID: <dc9e5884-c532-43e6-813d-e302a3cd970b@nvidia.com>
In-Reply-To: <ac2B3VF8CSu3ytEE@x1.local>


On 4/1/2026 11:36 PM, Peter Xu wrote:
> On Wed, Mar 25, 2026 at 04:10:14PM +0200, Avihai Horon wrote:
>> Hi Peter,
> Avihai,
>
>> Thanks for sending this series.
> Thanks for taking a look.
>
>> On 3/20/2026 1:12 AM, Peter Xu wrote:
>>> During precopy phase, VFIO maintains two counters for init/dirty data
>>> tracking for query estimations.
>>>
>>> VFIO fetches data during precopy by reading from the VFIO fd, after
>>> fetching it'll deduct the read size.
>>>
>>> Here since the fd's size can dynamically change, I think it means VFIO may
>>> read more than what it "thought" were there for fetching.
>>>
>>> I highly suspect it's also relevant to a weird case in the function of
>>> vfio_update_estimated_pending_data(), where when VFIO reads 0 from the FD
>>> it will _reset_ the two counters, instead of asserting both of them being
>>> zeros, which looks pretty hackish.
>>>
>>> Just guarantee it from userspace level that VFIO won't read more than what
>>> it expects for now.
>> The VFIO_MIG_GET_PRECOPY_INFO ioctl returns an estimation of the data size
>> currently available for reading. So, even if the ioctl returns X bytes, it
>> may be that there are more than X bytes to read or less than X bytes.
>> The code was written in a flexible way to handle such discrepancies.
>>
>> Because we are dealing with an estimation, I don't think we can assert that
>> the counters are zero, and I don't think reading only up to the cached size
>> gives us any benefit:
>> If the estimation is lower than actual available data, we are just deferring
>> sending of the remaining data to a later stage.
> Since we'll introduce cached size, having the read() only happen with the
> size reported still makes sense to me.
>
> We're not deferring to later that much, when dirty data reaches zero, we'll
> re-sync with everything including VFIO's VFIO_MIG_GET_PRECOPY_INFO.  So
> it's just splitting one last-phase read() into two smaller read()s.  To me,
> it sounds still OK if with that we can make sure the counter won't overflow.

The counter doesn't overflow today either (thanks to the MIN calculation).
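
To illustrate the point, here is a minimal sketch of a MIN-clamped
deduction of this kind. It is not the actual QEMU code: the struct and
function names are hypothetical, and the reset-on-zero branch only
mirrors the behaviour described earlier in this thread. Because each
subtraction is clamped to the counter's current value, a read() that
returns more than the cached estimate cannot make a counter wrap below
zero.

```c
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for the two cached precopy estimates. */
struct precopy_counters {
    uint64_t init_size;
    uint64_t dirty_size;
};

/*
 * Deduct data_size bytes (what read() just returned) from the cached
 * estimates: initial data first, then dirty data.
 */
static void update_estimates(struct precopy_counters *c, uint64_t data_size)
{
    uint64_t from_init;

    if (data_size == 0) {
        /* Nothing left to read for now: drop both estimates. */
        c->init_size = 0;
        c->dirty_size = 0;
        return;
    }

    /* Clamp so the unsigned subtraction can never underflow. */
    from_init = MIN(c->init_size, data_size);
    c->init_size -= from_init;
    data_size -= from_init;

    c->dirty_size -= MIN(c->dirty_size, data_size);
}
```

With estimates of 100/50 and a read of 120 bytes, init drops to 0 and
dirty to 30; a later read larger than the remaining estimate just
leaves both counters at 0.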

>
>> If the estimation is higher than actual available data, we may still read()
>> zero when the cached values are not zero.
>>
>> I think we should keep the code as is.
>>
>> Does that make sense?
> I can understand what got reported in VFIO_MIG_GET_PRECOPY_INFO may not be
> the total size of dirty data, but what the userapp can read.  That part is
> fine.
>
> Now, do you mean the size reported could shrink as well?

Yes.

>    Could you explain
> why, and when, dirtied data size can shrink?

First of all, the sizes reported by VFIO_MIG_GET_PRECOPY_INFO are
defined as estimates, so anything can be reported there.

But specifically for mlx5, VFIO_MIG_GET_PRECOPY_INFO involves two steps,
query and save: first, the driver queries the device for the
*expected* amount of data (that's the returned init/dirty sizes), and
then the driver asynchronously saves the data.
The query returns the *expected* amount of data, but the actual data
returned by the save may be smaller.

Regarding VFIO_DEVICE_FEATURE_MIG_DATA_SIZE, I don't have a concrete
example of data "shrink".
But I can think of a device whose data is all precopy-able, where
VFIO_DEVICE_FEATURE_MIG_DATA_SIZE may return X bytes remaining while
the actual data is smaller than X, simply because the uAPI defines that
it's an estimate.

What I am trying to say is that the VFIO_MIG_GET_PRECOPY_INFO and
VFIO_DEVICE_FEATURE_MIG_DATA_SIZE sizes are estimates which are good
enough for telling QEMU how much data is remaining, but shouldn't be
used to make precise calculations on how much to read(). At least I
don't see the benefit in it.
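
A toy model of the two strategies, as a sketch under stated
assumptions: fake_dev, fake_read(), drain() and drain_capped() are
hypothetical names, and fake_read() only stands in for read() on the
migration fd. Reading until the device reports nothing left retrieves
exactly what is there whatever the estimate said, while capping at the
estimate either leaves the excess for a later pass (estimate too low)
or still hits a zero-length read early (estimate too high).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model: 'actual' is what can really be read now. */
struct fake_dev {
    uint64_t actual;
};

/* Stand-in for read(): returns up to len bytes, 0 once drained. */
static size_t fake_read(struct fake_dev *d, size_t len)
{
    size_t n = d->actual < len ? (size_t)d->actual : len;

    d->actual -= n;
    return n;
}

/* Read fixed-size chunks until the device has nothing left. */
static uint64_t drain(struct fake_dev *d, size_t chunk)
{
    uint64_t total = 0;
    size_t n;

    while ((n = fake_read(d, chunk)) > 0) {
        total += n;
    }
    return total;
}

/* Same loop, but never read more than the (possibly wrong) estimate. */
static uint64_t drain_capped(struct fake_dev *d, size_t chunk,
                             uint64_t estimate)
{
    uint64_t total = 0;

    while (total < estimate) {
        uint64_t left = estimate - total;
        size_t want = left < chunk ? (size_t)left : chunk;
        size_t n = fake_read(d, want);

        if (n == 0) {
            break; /* estimate was too high: device is already empty */
        }
        total += n;
    }
    return total;
}
```

For a device holding 2500 bytes, drain() returns 2500; drain_capped()
with an estimate of 2000 returns 2000 and leaves 500 behind, and with
an estimate of 4000 it returns 2500 after an early zero-length read.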

Thanks.
