netdev.vger.kernel.org archive mirror
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Max Gurtovoy <mgurtovoy@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Leon Romanovsky <leon@kernel.org>,
	Doug Ledford <dledford@redhat.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org,
	netdev@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
	Cornelia Huck <cohuck@redhat.com>
Subject: Re: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity
Date: Wed, 29 Sep 2021 20:21:09 -0300
Message-ID: <20210929232109.GC3544071@ziepe.ca>
In-Reply-To: <29835bf4-d094-ae6d-1a32-08e65847b52c@nvidia.com>

On Thu, Sep 30, 2021 at 12:48:55AM +0300, Max Gurtovoy wrote:
> 
> On 9/29/2021 7:14 PM, Jason Gunthorpe wrote:
> > On Wed, Sep 29, 2021 at 06:28:44PM +0300, Max Gurtovoy wrote:
> > 
> > > > So you have a device that's actively modifying its internal state,
> > > > performing I/O, including DMA (thereby dirtying VM memory), all while
> > > > in the _STOP state?  And you don't see this as a problem?
> > > I don't see how is it different from vfio-pci situation.
> > vfio-pci provides no way to observe the migration state. It isn't
> > "000b"
> 
> Alex said that there is a problem of compatibility.

Yes: when a vfio_device is first opened it must be running, i.e., able
to do DMA and otherwise operational.

When we add the migration extension, this cannot change, so after
open_device() the device should be operational.

The reported state in the migration region should accurately reflect
what the device is currently doing. If the device is operational then
it must report running, not stopped.

Thus a driver cannot just zero-initialize the migration "registers";
they have to be accurate.

> > > Maybe we need to rename STOP state. We can call it READY or LIVE or
> > > NON_MIGRATION_STATE.
> > It was a poor choice to use 000b as stop, but it doesn't really
> > matter. The mlx5 driver should just pre-init this readable to running.
> 
> I guess we can do it for this reason. There is no functional problem nor
> compatibility issue here as was mentioned.
> 
> But still we need the kernel to track transitions. We don't want to allow
> moving from RESUMING to SAVING state for example. How this transition can be
> allowed ?

It seems semantically fine to me; as per Alex's note, what will happen
is defined:

The driver will see RESUMING toggle off, so it will trigger a
de-serialization.

The driver will see SAVING toggle on, so it will serialize the new
state (either the pre-copy state or the post-copy state, depending on
the running bit).

Depending on the running bit the device may or may not be woken up.

If de-serialization fails then the state goes to error and SAVING is
ignored.

The driver logic probably looks something like this:

// Running toggles off
if ((oldstate & RUNNING) != (newstate & RUNNING) && (oldstate & RUNNING))
    quiesce
    freeze

// Resuming toggles off
if ((oldstate & RESUMING) != (newstate & RESUMING) && (oldstate & RESUMING))
    deserialize (on failure: go to the error state, SAVING is ignored)

// Saving toggles on
if ((oldstate & SAVING) != (newstate & SAVING) && (newstate & SAVING))
    if (!(newstate & RUNNING))
        serialize post copy
    else
        serialize pre copy

// Running toggles on
if ((oldstate & RUNNING) != (newstate & RUNNING) && (newstate & RUNNING))
    unfreeze
    unquiesce

I'd have to check that carefully against the state chart from my last
email, though.

I also need to check how the "Stop Active Transactions" bit fits in
there.

Jason

Thread overview: 58+ messages
2021-09-22 10:38 [PATCH mlx5-next 0/7] Add mlx5 live migration driver Leon Romanovsky
2021-09-22 10:38 ` [PATCH mlx5-next 1/7] PCI/IOV: Provide internal VF index Leon Romanovsky
2021-09-22 21:59   ` Bjorn Helgaas
2021-09-23  6:35     ` Leon Romanovsky
2021-09-24 13:08       ` Bjorn Helgaas
2021-09-25 10:10         ` Leon Romanovsky
2021-09-25 17:41           ` Bjorn Helgaas
2021-09-26  6:36             ` Leon Romanovsky
2021-09-26 20:23               ` Bjorn Helgaas
2021-09-27 11:55                 ` Leon Romanovsky
2021-09-27 14:47                   ` Bjorn Helgaas
2021-09-22 10:38 ` [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity Leon Romanovsky
2021-09-23 10:33   ` Shameerali Kolothum Thodi
2021-09-23 11:17     ` Leon Romanovsky
2021-09-23 13:55       ` Max Gurtovoy
2021-09-24  7:44         ` Shameerali Kolothum Thodi
2021-09-24  9:37           ` Kirti Wankhede
2021-09-26  9:09           ` Max Gurtovoy
2021-09-26 16:17             ` Shameerali Kolothum Thodi
2021-09-27 18:24               ` Max Gurtovoy
2021-09-27 18:29                 ` Shameerali Kolothum Thodi
2021-09-27 22:46   ` Alex Williamson
2021-09-27 23:12     ` Jason Gunthorpe
2021-09-28 19:19       ` Alex Williamson
2021-09-28 19:35         ` Jason Gunthorpe
2021-09-28 20:18           ` Alex Williamson
2021-09-29 16:16             ` Jason Gunthorpe
2021-09-29 18:06               ` Alex Williamson
2021-09-29 18:26                 ` Jason Gunthorpe
2021-09-29 10:57         ` Max Gurtovoy
2021-09-29 10:44       ` Max Gurtovoy
2021-09-29 12:35         ` Alex Williamson
2021-09-29 13:26           ` Max Gurtovoy
2021-09-29 13:50             ` Alex Williamson
2021-09-29 14:36               ` Max Gurtovoy
2021-09-29 15:17                 ` Alex Williamson
2021-09-29 15:28                   ` Max Gurtovoy
2021-09-29 16:14                     ` Jason Gunthorpe
2021-09-29 21:48                       ` Max Gurtovoy
2021-09-29 22:44                         ` Alex Williamson
2021-09-30  9:25                           ` Max Gurtovoy
2021-09-30 12:41                             ` Alex Williamson
2021-09-29 23:21                         ` Jason Gunthorpe [this message]
2021-09-30  9:34                           ` Max Gurtovoy
2021-09-30 14:47                             ` Jason Gunthorpe
2021-09-30 15:32                               ` Max Gurtovoy
2021-09-30 16:24                                 ` Jason Gunthorpe
2021-09-30 16:51                                   ` Max Gurtovoy
2021-09-30 17:01                                     ` Jason Gunthorpe
2021-09-22 10:38 ` [PATCH mlx5-next 3/7] vfio/pci_core: Make the region->release() function optional Leon Romanovsky
2021-09-23 13:57   ` Max Gurtovoy
2021-09-22 10:38 ` [PATCH mlx5-next 4/7] net/mlx5: Introduce migration bits and structures Leon Romanovsky
2021-09-24  5:48   ` Mark Zhang
2021-09-22 10:38 ` [PATCH mlx5-next 5/7] net/mlx5: Expose APIs to get/put the mlx5 core device Leon Romanovsky
2021-09-22 10:38 ` [PATCH mlx5-next 6/7] mlx5_vfio_pci: Expose migration commands over mlx5 device Leon Romanovsky
2021-09-28 20:22   ` Alex Williamson
2021-09-29  5:36     ` Leon Romanovsky
2021-09-22 10:38 ` [PATCH mlx5-next 7/7] mlx5_vfio_pci: Implement vfio_pci driver for mlx5 devices Leon Romanovsky
