From: Cornelia Huck <cohuck@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
Jonathan Corbet <corbet@lwn.net>,
linux-doc@vger.kernel.org, kvm@vger.kernel.org,
Kirti Wankhede <kwankhede@nvidia.com>,
Max Gurtovoy <mgurtovoy@nvidia.com>,
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>,
Yishai Hadas <yishaih@nvidia.com>
Subject: Re: [PATCH RFC] vfio: Documentation for the migration region
Date: Fri, 26 Nov 2021 16:01:49 +0100
Message-ID: <87mtlrhsf6.fsf@redhat.com>
In-Reply-To: <20211126130608.GR4670@nvidia.com>

On Fri, Nov 26 2021, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Fri, Nov 26, 2021 at 01:56:26PM +0100, Cornelia Huck wrote:
>> On Thu, Nov 25 2021, Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>> > On Thu, Nov 25, 2021 at 01:27:12PM +0100, Cornelia Huck wrote:
>> >> On Wed, Nov 24 2021, Jason Gunthorpe <jgg@nvidia.com> wrote:
>> >>
>> >> > On Wed, Nov 24, 2021 at 05:55:49PM +0100, Cornelia Huck wrote:
>> >>
>> >> >> What I meant to say: If we give userspace the flexibility to operate
>> >> >> this, we also must give different device types some flexibility. While
>> >> >> subchannels will follow the general flow, they'll probably condense/omit
>> >> >> some steps, as I/O is quite different to PCI there.
>> >> >
>> >> > I would say no - migration is general, no device type should get to
>> >> > violate this spec. Did you have something specific in mind? There is
>> >> > very little PCI specific here already
>> >>
>> >> I'm not really thinking about violating the spec, but more omitting
>> >> things that do not really apply to the hardware. For example, it is
>> >> really easy to shut up a subchannel, we don't really need to wait until
>> >> nothing happens anymore, and it doesn't even have MMIO.
>> >
>> > I've never really looked closely at the s390 mdev drivers..
>> >
>> > What does something like AP even do anyhow? The ioctl handler doesn't
>> > do anything, there is no mmap hook, how does the VFIO userspace
>> > interact with this thing?
>>
>> For AP, the magic is in the hardware/firmware; the vfio parts are needed
>> to configure what is exposed to a given guest, not for operation. Once
>> it is up, the hardware will handle any instructions directly, the
>> hypervisor will not see them. (Unfortunately, none of the details have
>> public documentation.) I have no idea how this would play with migration.
>
> That is kind of what I thought..
>
> VFIO is all about exposing a device to userspace control, sounds like
> the S390 drivers skipped that step.

Note that what I wrote above is about AP; CCW does indeed trigger
operations like start subchannel from userspace and relays interrupts
back to userspace. AP is just very dissimilar to basically anything
else.

>
> KVM is all about taking what userspace can already control and giving
> it to a guest, in an accelerated way.
>
> Making a bypass where a KVM guest has more capability than the user
> process because VFIO and KVM have been directly coupled completely
> upends the whole logical model.
>
> As we talked with Intel's wbinvd stuff you should have a mental model
> where the VFIO userspace process can do anything the KVM guest can do
> via ioctls on the mdev. KVM is just an accelerated way to do that same
> stuff. Maybe S390 doesn't implement those ioctls, but they are
> logically part of the model.

FWIW, AP had been a pain to model in a way that we could hand the
devices to the guest; if we are supposed to use vfio for this purpose,
the current design is probably the best we can get. At least nobody has
been able to come up with a better way to interact with the interfaces
that we have.

CCW needs a kernel part for translations, as it doesn't have an IOMMU,
and the I/O instructions are of course privileged (but so are the
instructions for s390 PCI); I think it is quite close to other devices
in other respects, only that it has a more transaction-based model.

> So, for the migration doc, imagine some non-accelerated KVM that was
> intercepting the guest operations and calling the logical ioctls on
> the mdev instead. When we talk about MMIO/PIO/etc it also includes
> mdev operation ioctls too, and by extension any ioctl accelerated
> inside KVM.

I think only AP is the really odd one out here; CCW will likely differ
in some details... I just wanted to make sure that this will not run
counter to the documentation.