From: Alex Williamson <alex.williamson@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Yishai Hadas <yishaih@nvidia.com>,
bhelgaas@google.com, saeedm@nvidia.com,
linux-pci@vger.kernel.org, kvm@vger.kernel.org,
netdev@vger.kernel.org, kuba@kernel.org, leonro@nvidia.com,
kwankhede@nvidia.com, mgurtovoy@nvidia.com, maorg@nvidia.com
Subject: Re: [PATCH V6 mlx5-next 09/15] vfio: Extend the device migration protocol with RUNNING_P2P
Date: Wed, 2 Feb 2022 16:54:44 -0700 [thread overview]
Message-ID: <20220202165444.44816642.alex.williamson@redhat.com> (raw)
In-Reply-To: <20220201195003.GN1786498@nvidia.com>
On Tue, 1 Feb 2022 15:50:03 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Tue, Feb 01, 2022 at 12:13:22PM -0700, Alex Williamson wrote:
> > On Tue, 1 Feb 2022 14:53:21 -0400
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > > On Tue, Feb 01, 2022 at 11:31:44AM -0700, Alex Williamson wrote:
> > > > > + bool have_p2p = device->migration_flags & VFIO_MIGRATION_P2P;
> > > > > +
> > > > > if (cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
> > > > > new_fsm >= ARRAY_SIZE(vfio_from_fsm_table))
> > > > > return VFIO_DEVICE_STATE_ERROR;
> > > > >
> > > > > - return vfio_from_fsm_table[cur_fsm][new_fsm];
> > > > > + if (!have_p2p && (new_fsm == VFIO_DEVICE_STATE_RUNNING_P2P ||
> > > > > + cur_fsm == VFIO_DEVICE_STATE_RUNNING_P2P))
> > > > > + return VFIO_DEVICE_STATE_ERROR;
> > > >
> > > > new_fsm is provided by the user, we pass set_state.device_state
> > > > directly to .migration_set_state. We should do bounds checking and
> > > > compatibility testing on the end state in the core so that we can
> > >
> > > This is the core :)
> >
> > But this is the wrong place, we need to do it earlier rather than when
> > we're already iterating next states. I only mention core to avoid that
> > I'm suggesting a per driver responsibility.
>
> Only the first vfio_mig_get_next_state() can return ERROR, once it
> succeeds the subsequent ones must also succeed.
Yes, I see that.
> This is the earliest can be. It is done directly after taking the lock
> that allows us to read the current state to call this function to
> determine if the requested transition is acceptable.
I think the argument here is that there's no value to validating or
bounds checking the end state, which could be done in the core ioctl
before calling the driver if the first iteration will already fail for
both the end state and the full path validation.
> > > Userspace can never put the device into error. As the function comment
> > > says VFIO_DEVICE_STATE_ERROR is returned to indicate the arc is not
> > > permitted. The driver is required to reflect that back as an errno
> > > like mlx5 shows:
> > >
> > > + next_state = vfio_mig_get_next_state(vdev, mvdev->mig_state,
> > > + new_state);
> > > + if (next_state == VFIO_DEVICE_STATE_ERROR) {
> > > + res = ERR_PTR(-EINVAL);
> > > + break;
> > > + }
> > >
> > > We never get the driver into error, userspaces gets an EINVAL and no
> > > change to the device state.
> >
> > Hmm, subtle. I'd argue that if we do a bounds and support check of the
> > end state in vfio_ioctl_mig_set_state() before calling
> > .migration_set_state() then we could remove ERROR from
> > vfio_from_fsm_table[] altogether and simply begin
> > vfio_mig_get_next_state() with:
>
> Then we can't reject blocked arcs like STOP_COPY -> PRE_COPY.
Right, I hadn't made it through to 15/, which helps to clarify how the
cur_fsm + new_fsm validate the full arc.
> It is setup this way to allow the core code to assert all policy, not
> just a simple validation of the next_fsm.
>
> > Then we only get to ERROR by the driver placing us in ERROR and things
> > feel a bit more sane to me.
>
> This is already true.
>
> Perhaps it is confusing using ERROR to indicate that
> vfio_mig_get_next_state() failed. Would you be happier with a -errno
> return?
Yes, it's confusing to me that next_state() returns states that don't
become the device_state. Stuffing the next step back into cur_fsm and
using an errno for a bounds/validity/blocked-arc test would be a better
API. Thanks,
Alex
next prev parent reply other threads:[~2022-02-02 23:54 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-30 16:08 [PATCH V6 mlx5-next 00/15] Add mlx5 live migration driver and v2 migration protocol Yishai Hadas
2022-01-30 16:08 ` [PATCH V6 mlx5-next 01/15] PCI/IOV: Add pci_iov_vf_id() to get VF index Yishai Hadas
2022-01-30 16:08 ` [PATCH V6 mlx5-next 02/15] net/mlx5: Reuse exported virtfn index function call Yishai Hadas
2022-01-30 16:08 ` [PATCH V6 mlx5-next 03/15] net/mlx5: Disable SRIOV before PF removal Yishai Hadas
2022-01-30 16:08 ` [PATCH V6 mlx5-next 04/15] PCI/IOV: Add pci_iov_get_pf_drvdata() to allow VF reaching the drvdata of a PF Yishai Hadas
2022-01-30 16:08 ` [PATCH V6 mlx5-next 05/15] net/mlx5: Expose APIs to get/put the mlx5 core device Yishai Hadas
2022-01-30 16:08 ` [PATCH V6 mlx5-next 06/15] net/mlx5: Introduce migration bits and structures Yishai Hadas
2022-01-30 16:08 ` [PATCH V6 mlx5-next 07/15] vfio: Have the core code decode the VFIO_DEVICE_FEATURE ioctl Yishai Hadas
2022-01-31 23:41 ` Alex Williamson
2022-02-01 0:11 ` Jason Gunthorpe
2022-02-01 15:47 ` Alex Williamson
2022-02-01 15:49 ` Jason Gunthorpe
2022-01-30 16:08 ` [PATCH V6 mlx5-next 08/15] vfio: Define device migration protocol v2 Yishai Hadas
2022-01-31 23:43 ` Alex Williamson
2022-02-01 0:31 ` Jason Gunthorpe
2022-02-01 17:04 ` Alex Williamson
2022-02-01 18:36 ` Jason Gunthorpe
2022-02-01 21:49 ` Alex Williamson
2022-02-02 0:24 ` Jason Gunthorpe
2022-02-02 23:36 ` Alex Williamson
2022-02-03 14:17 ` Jason Gunthorpe
2022-02-04 12:12 ` Cornelia Huck
2022-02-03 15:51 ` Tarun Gupta (SW-GPU)
2022-02-01 12:06 ` Cornelia Huck
2022-02-01 12:10 ` Jason Gunthorpe
2022-02-01 12:18 ` Cornelia Huck
2022-02-01 12:27 ` Jason Gunthorpe
2022-01-30 16:08 ` [PATCH V6 mlx5-next 09/15] vfio: Extend the device migration protocol with RUNNING_P2P Yishai Hadas
2022-02-01 11:54 ` Cornelia Huck
2022-02-01 12:13 ` Jason Gunthorpe
2022-02-01 18:31 ` Alex Williamson
2022-02-01 18:53 ` Jason Gunthorpe
2022-02-01 19:13 ` Alex Williamson
2022-02-01 19:50 ` Jason Gunthorpe
2022-02-02 23:54 ` Alex Williamson [this message]
2022-02-03 14:22 ` Jason Gunthorpe
2022-01-30 16:08 ` [PATCH V6 mlx5-next 10/15] vfio: Remove migration protocol v1 Yishai Hadas
2022-02-01 11:23 ` Cornelia Huck
2022-02-01 12:13 ` Jason Gunthorpe
2022-02-01 12:39 ` Cornelia Huck
2022-02-01 12:54 ` Jason Gunthorpe
2022-02-01 13:26 ` Cornelia Huck
2022-02-01 13:52 ` Jason Gunthorpe
2022-02-01 14:19 ` Cornelia Huck
2022-02-01 14:29 ` Jason Gunthorpe
2022-02-02 11:34 ` Cornelia Huck
2022-02-02 12:22 ` Jason Gunthorpe
2022-02-01 23:01 ` Alex Williamson
2022-02-02 0:28 ` Jason Gunthorpe
2022-02-02 11:38 ` Cornelia Huck
2022-01-30 16:08 ` [PATCH V6 mlx5-next 11/15] vfio/mlx5: Expose migration commands over mlx5 device Yishai Hadas
2022-01-30 16:08 ` [PATCH V6 mlx5-next 12/15] vfio/mlx5: Implement vfio_pci driver for mlx5 devices Yishai Hadas
2022-01-30 16:08 ` [PATCH V6 mlx5-next 13/15] vfio/pci: Expose vfio_pci_core_aer_err_detected() Yishai Hadas
2022-01-30 16:08 ` [PATCH V6 mlx5-next 14/15] vfio/mlx5: Use its own PCI reset_done error handler Yishai Hadas
2022-01-30 16:08 ` [PATCH V6 mlx5-next 15/15] vfio: Extend the device migration protocol with PRE_COPY Yishai Hadas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220202165444.44816642.alex.williamson@redhat.com \
--to=alex.williamson@redhat.com \
--cc=bhelgaas@google.com \
--cc=jgg@nvidia.com \
--cc=kuba@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=kwankhede@nvidia.com \
--cc=leonro@nvidia.com \
--cc=linux-pci@vger.kernel.org \
--cc=maorg@nvidia.com \
--cc=mgurtovoy@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=saeedm@nvidia.com \
--cc=yishaih@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).