From: Niklas Schnelle <schnelle@linux.ibm.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Pierre Morel <pmorel@linux.ibm.com>,
Matthew Rosato <mjrosato@linux.ibm.com>,
david@redhat.com, cohuck@redhat.com, qemu-devel@nongnu.org,
pasic@linux.ibm.com, borntraeger@de.ibm.com,
qemu-s390x@nongnu.org, rth@twiddle.net
Subject: Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
Date: Tue, 28 Jul 2020 16:13:57 +0200 [thread overview]
Message-ID: <2f3cd133-e0bc-8923-3016-db2414acef90@linux.ibm.com> (raw)
In-Reply-To: <20200728065215.21a7f5af@x1.home>
On 7/28/20 2:52 PM, Alex Williamson wrote:
> On Tue, 28 Jul 2020 11:33:55 +0200
> Niklas Schnelle <schnelle@linux.ibm.com> wrote:
>
>> On 7/27/20 6:47 PM, Alex Williamson wrote:
>>> On Mon, 27 Jul 2020 17:40:39 +0200
>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>
>>>> On 2020-07-23 18:29, Alex Williamson wrote:
>>>>> On Thu, 23 Jul 2020 11:13:55 -0400
>>>>> Matthew Rosato <mjrosato@linux.ibm.com> wrote:
>>>>>
>>>>>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>>>>>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>>>>>> fails spectacularly, with errors in qemu like:
>> ... snip ...
>>>>
>>>> Alex, Matt,
>>>>
>>>> in s390 we have the possibility to assign a virtual function to a
>>>> logical partition as function 0.
>>>> In this case it can not be treated as a virtual function but must be
>>>> treated as a physical function.
>>>> This is currently working very well.
>>>> However, these functions do not set PCI_COMMAND_MEMORY as we need.
>>>
>>> Where is the vendor and device ID virtualization done for these
>>> devices, we can't have a PF with IDs 0000:0000.
>> Pierre doesn't mean the Device/Vendor IDs he means it has devfn == 0
>> so it is the mandatory function zero on it's PCI bus, where until recently
>> we always had only one function per bus but with the recent multi-function
>> support it can act more like on other platforms with several PCI functions
>> sharing the same Bus e.g. a PF and the VFs created through sriov_numvfs.
>> That's why I'm saying that having devfn == 0 should not be very special for a VF
>> passed to a VM and I really don't see where it would not act like a VF passed
>> from any other Hypervisor.
>
> My question is relative to other registers on VFs that are not
> implemented in hardware, not the device address. In addition to the
> memory bit of the command register, SR-IOV VFs do not implement the
> vendor and device IDs, these are to be virtualized from the values
> provided in the PF SR-IOV capability. It wouldn't make much sense to
> expose a device with no PCI vendor or device ID, so I assume the z/VM
> hypervisor is virtualizing these, but not the memory bit.
Ahh, yes I see. On Z these are actually already virtualized at the LPAR
level as part of the firmware based scanning I mentioned that is the
reason for pdev->no_vf_scan. This is true even for VFs created
through sriov_numvfs in the host which is why I did not realize these
don't come from hardware, even though looking at drivers/pci/iov.c it's
pretty obvious and I did touch that code before. Sorry for the confusion.
>
>> The only really tricky part in my opinion is where during the "probing"
>> we do set is_virtfn so it happens both for VFs passed-through from z/VM
>> or LPAR and VFs created through sriov_numvfs which unlike on other platforms
>> are also scanned by Firmware (pdev->no_vf_scan disables the Linux side scanning).
>> With the fix I'm currently testing I had to do this in pcibios_enable_device()
>> because I also create sysfs links between VFs and their parent PFs and those
>> need the sysfs entries to be already created, which makes the more apropriately
>> sound pcibios_bus_add_device() too early.
>
>
> I don't think it would be wise to set is_virtfn without a physfn being
> present in the OS view. I believe pci_dev.is_virtfn implies
> pci_dev.physfn points to the PF device. Thanks>
> Alex
Thank you a lot for that hint, you're absolutely right, while the
drivers do work with is_virtfn == 1 && physffn == NULL
vfio-pci gets very confused. And sorry Pierre for doubting
your is_virtfn_detached idea, this thing really is confusing.
>
>>>> Shouldn't we fix this inside the kernel, to keep older QMEU working?
>>>>
>>>> Then would it be OK to add a new bit/boolean inside the
>>>> pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during
>>>> enumeration and test inside __vfio_pci_memory_enabled() to return true?
>>>
>>> Probably each instance of is_virtfn in vfio-pci should be looked at to
>>> see if it applies to s390. If we're going to recognize this as a VF,
>>> I'd rather we complete the emulation that the lower level hypervisor
>>> has missed. If we can enable all the is_virtfn code on s390, then we
>>> should probably cache is_virtfn on the vfio_pci_device object and allow
>>> s390 a place to set it once at probe or enable time.
>>>
>>>> In the enumeration we have the possibility to know if the function is a
>>>> HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.
>>>>
>>>> It seems an easy fix without side effects.
>>>>
>>>> What do you think?
>>>
>>> It sure seems preferable to recognize that it is a VF in the kernel
>>> than to require userspace to have arch specific hacks. Thanks,
>>>
>>> Alex
>>>
>>
>
next prev parent reply other threads:[~2020-07-28 14:15 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-23 15:13 [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement Matthew Rosato
2020-07-23 15:13 ` [RFC PATCH] s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci Matthew Rosato
2020-07-23 16:29 ` [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement Alex Williamson
2020-07-23 18:12 ` Matthew Rosato
2020-07-27 15:40 ` Pierre Morel
2020-07-27 16:37 ` Matthew Rosato
2020-07-27 16:47 ` Alex Williamson
2020-07-28 9:33 ` Niklas Schnelle
2020-07-28 12:52 ` Alex Williamson
2020-07-28 14:13 ` Niklas Schnelle [this message]
2020-07-28 8:59 ` Niklas Schnelle
2020-07-24 9:46 ` Niklas Schnelle
2020-07-24 9:53 ` Niklas Schnelle
2020-07-24 14:15 ` Niklas Schnelle
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2f3cd133-e0bc-8923-3016-db2414acef90@linux.ibm.com \
--to=schnelle@linux.ibm.com \
--cc=alex.williamson@redhat.com \
--cc=borntraeger@de.ibm.com \
--cc=cohuck@redhat.com \
--cc=david@redhat.com \
--cc=mjrosato@linux.ibm.com \
--cc=pasic@linux.ibm.com \
--cc=pmorel@linux.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-s390x@nongnu.org \
--cc=rth@twiddle.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).