qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Avihai Horon <avihaih@nvidia.com>
Cc: "Cédric Le Goater" <clg@redhat.com>,
	qemu-devel@nongnu.org,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	"Yishai Hadas" <yishaih@nvidia.com>,
	"Maor Gottlieb" <maorg@nvidia.com>,
	"Kirti Wankhede" <kwankhede@nvidia.com>,
	"Tarun Gupta" <targupta@nvidia.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Peter Xu" <peterx@redhat.com>
Subject: Re: [PATCH for-8.2 v3 1/6] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup()
Date: Tue, 8 Aug 2023 09:46:52 -0300	[thread overview]
Message-ID: <ZNI5PG1gtcdGzQI/@nvidia.com> (raw)
In-Reply-To: <d870e5d4-dd34-6cdc-7586-971928927869@nvidia.com>

On Tue, Aug 08, 2023 at 09:23:09AM +0300, Avihai Horon wrote:
> 
> On 07/08/2023 18:53, Cédric Le Goater wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > [ Adding Juan and Peter for their awareness ]
> > 
> > On 8/2/23 10:14, Avihai Horon wrote:
> > > Changing the device state from STOP_COPY to STOP can take time as the
> > > device may need to free resources and do other operations as part of the
> > > transition. Currently, this is done in vfio_save_complete_precopy() and
> > > therefore it is counted in the migration downtime.
> > > 
> > > To avoid this, change the device state from STOP_COPY to STOP in
> > > vfio_save_cleanup(), which is called after migration has completed and
> > > thus is not part of migration downtime.
> > 
> > What bothers me is that this looks like a device specific optimization
> 
> True, currently it helps mlx5, but this change is based on the assumption
> that, in general, VFIO devices are likely to free resources when
> transitioning from STOP_COPY to STOP.
> So I think this is a good change to have in any case.

Yes, I agree with this idea. Kernel drivers should be designed to take
advantage of things like this and defer time consuming work to the
stop arc.

> > and we are loosing the error part.
> 
> I don't think we lose the error part.
> AFAIU, the crucial part is transitioning to STOP_COPY and sending the final
> data.
> If that's done successfully, then migration is successful.

Yes.

> The STOP_COPY->STOP transition is done as part of the cleanup flow, after
> the migration is completed -- i.e., failure in it does not affect the
> success of migration.
> Further more, if there is an error in the STOP_COPY->STOP transition, then
> it's reported by vfio_migration_set_state().

If STOP_COPY->STOP fails then the migration should succeed anyhow and
qemu has to FLR the VFIO device to recover it. This probably only
matters if the migration is aborted for some other reason and qemu has
to resume the VM. It will not be able to restore the device to running
until it has been FLRd.

However, qemu can probably ignore this error and eventually if it
tries to go to RUNNING the kernel will report failure and it can do
the FLR at that point. Otherwise on the migration success path qemu
should simply close the VFIO device.

Jason


  reply	other threads:[~2023-08-08 12:52 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-02  8:14 [PATCH for-8.2 v3 0/6] vfio/migration: Add P2P support for VFIO migration Avihai Horon
2023-08-02  8:14 ` [PATCH for-8.2 v3 1/6] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Avihai Horon
2023-08-07 15:53   ` Cédric Le Goater
2023-08-08  6:23     ` Avihai Horon
2023-08-08 12:46       ` Jason Gunthorpe [this message]
2023-08-11 10:29       ` Cédric Le Goater
2023-08-13  5:35         ` Avihai Horon
2023-08-02  8:14 ` [PATCH for-8.2 v3 2/6] sysemu: Add prepare callback to struct VMChangeStateEntry Avihai Horon
2023-08-02  8:14 ` [PATCH for-8.2 v3 3/6] qdev: Add qdev_add_vm_change_state_handler_full() Avihai Horon
2023-08-02  8:14 ` [PATCH for-8.2 v3 4/6] vfio/migration: Refactor PRE_COPY and RUNNING state checks Avihai Horon
2023-08-02  8:14 ` [PATCH for-8.2 v3 5/6] vfio/migration: Add P2P support for VFIO migration Avihai Horon
2023-08-02  8:43   ` Cédric Le Goater
2023-08-28  7:35     ` YangHang Liu
2023-08-02  8:14 ` [PATCH for-8.2 v3 6/6] vfio/migration: Allow migration of multiple P2P supporting devices Avihai Horon
2023-08-28  7:50 ` [PATCH for-8.2 v3 0/6] vfio/migration: Add P2P support for VFIO migration YangHang Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZNI5PG1gtcdGzQI/@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=avihaih@nvidia.com \
    --cc=clg@redhat.com \
    --cc=joao.m.martins@oracle.com \
    --cc=kwankhede@nvidia.com \
    --cc=maorg@nvidia.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=targupta@nvidia.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).