From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <540975EF.8090504@suse.de>
Date: Fri, 05 Sep 2014 10:35:59 +0200
From: Alexander Graf <agraf@suse.de>
MIME-Version: 1.0
References: <20140904105223.336503578@de.ibm.com>
	<1409836584.3804.268.camel@ul30vt.home>
	<20140905074651.GA43812@tuxmaker.boeblingen.de.ibm.com>
In-Reply-To: <20140905074651.GA43812@tuxmaker.boeblingen.de.ibm.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [RFC][patch 0/6] pci pass-through support for
	qemu/KVM on s390
List-Id: <qemu-devel.nongnu.org>
To: Frank Blaschka <blaschka@linux.vnet.ibm.com>, Alex Williamson <alex.williamson@redhat.com>
Cc: linux-s390@vger.kernel.org, frank.blaschka@de.ibm.com, kvm@vger.kernel.org, aik@ozlabs.ru, qemu-devel@nongnu.org, pbonzini@redhat.com


On 05.09.14 09:46, Frank Blaschka wrote:
> On Thu, Sep 04, 2014 at 07:16:24AM -0600, Alex Williamson wrote:
>> On Thu, 2014-09-04 at 12:52 +0200, frank.blaschka@de.ibm.com wrote:
>>> This set of patches implements pci pass-through support for qemu/KVM on s390.
>>> PCI support on s390 is very different from other platforms.
>>> Major differences are:
>>>
>>> 1) all PCI operations are driven by special s390 instructions
>>
>> Generating config cycles is always arch specific.
>>
>>> 2) all s390 PCI instructions are privileged
>>
>> While the operations to generate config cycles on x86 are not
>> privileged, they must be arbitrated between accesses, so in a sense
>> they're privileged.
>>
>>> 3) PCI config and memory spaces can not be mmap'ed
>>
>> VFIO has mapping flags that allow any region to specify mmap support.
>>
>
> Hi Alex,
>
> thx for your reply.
>
> Let me elaborate a little bit more on 1 - 3. Config and memory space can not
> be accessed via memory operations. You have to use special s390 instructions.
> These instructions can not be executed in user space. So there is no other
> way than executing these instructions in kernel. Yes vfio does support a
> slow path via ioctl we could use, but this seems suboptimal from performance
> point of view.

Ah, I missed the "memory spaces" part ;). I agree that it's "suboptimal"
to call into the kernel for every PCI access, but I still think that
VFIO provides the correct abstraction layer for us to use. If nothing
else, it would at least give us identical configuration to x86 and nice
debuggability on par with the other platforms.
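
To make the two access paths concrete, here is a minimal sketch (an
illustration only, not code from the patch set) of how a VFIO user could
check whether a region may be mmap'ed and otherwise fall back to the
read/write slow path. The struct and flag names come from <linux/vfio.h>;
region_can_mmap() and region_read32() are hypothetical helper names, and the
region info is assumed to have been filled in earlier by
VFIO_DEVICE_GET_REGION_INFO.

```c
#define _POSIX_C_SOURCE 200809L   /* for pread() */
#define _FILE_OFFSET_BITS 64      /* VFIO region offsets do not fit in 32 bits */

#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Returns non-zero if the region advertises mmap support. */
int region_can_mmap(const struct vfio_region_info *info)
{
    assert(info != NULL);
    return (info->flags & VFIO_REGION_INFO_FLAG_MMAP) != 0;
}

/*
 * Slow path: read one 32-bit value at byte offset `reg` inside the region by
 * calling pread() on the VFIO device fd. Every access is a system call, which
 * is the cost discussed above. Returns 0 on success, -1 on failure.
 */
int region_read32(int device_fd, const struct vfio_region_info *info,
                  uint64_t reg, uint32_t *val)
{
    assert(info != NULL && val != NULL);

    if (!(info->flags & VFIO_REGION_INFO_FLAG_READ))
        return -1;
    /* Bounds check written so that it cannot overflow. */
    if (reg > info->size || info->size - reg < sizeof(*val))
        return -1;

    ssize_t n = pread(device_fd, val, sizeof(*val),
                      (off_t)(info->offset + reg));
    return n == (ssize_t)sizeof(*val) ? 0 : -1;
}
```

A platform that cannot mmap its PCI spaces would simply never set
VFIO_REGION_INFO_FLAG_MMAP, so a caller like this always ends up in
region_read32() and the kernel executes the privileged instructions on its
behalf.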

>
>>> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
>>>    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
>>
>> VFIO delivers interrupts as eventfds regardless of the underlying
>> platform mechanism.
>>
>
> yes that's right, but then we have to do platform specific stuff to present
> the irq to the guest. I do not say this is impossible but we have to add s390
> specific code to vfio.

Not at all - interrupt delivery is completely transparent to VFIO.
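
As a rough illustration of why the consumer does not care, the sketch below
mimics the eventfd contract in plain user space. In real VFIO the eventfd is
registered with VFIO_DEVICE_SET_IRQS and signalled by the kernel; here
signal_irq() and consume_irq() are hypothetical helpers that only stand in
for the two ends of that channel.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Producer side. Stands in for whatever platform mechanism noticed the
 * device interrupt (MSI-X on x86, an adapter interrupt on s390).
 * Returns 0 on success, -1 on failure.
 */
int signal_irq(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consumer side. Returns the number of events signalled since the last
 * read (the eventfd counter is reset by the read), or -1 on failure.
 * Nothing here depends on how the interrupt was raised.
 */
int64_t consume_irq(int efd)
{
    uint64_t count = 0;
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

The platform specific part is how the consumer then injects the event into
the guest, and that lives on the KVM/QEMU side rather than in VFIO.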

>
>>> 5) For DMA access there is always an IOMMU required.
>>
>> x86 requires the same.
>>
>>>  s390 pci implementation
>>>    does not support a complete memory to iommu mapping, dma mappings are
>>>    created on request.
>>
>> Sounds like POWER.
>
> Don't know the details from power, maybe it is similar but not the same.
> We might be able to extend vfio to have a new interface allowing
> us to do DMA mappings on request.

We already have that.
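
For reference, a minimal sketch of what a mapping on request looks like from
user space, using the type1 IOMMU interface from <linux/vfio.h> as the
example. Whether s390 would reuse type1 or bring its own IOMMU backend is an
open assumption here; fill_dma_map() and map_dma_on_request() are
hypothetical helper names, and the container fd is assumed to be fully set
up already.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Describe one mapping: make `size` bytes at process address `vaddr`
 * visible to the device at I/O virtual address `iova`, readable and
 * writable by the device.
 */
void fill_dma_map(struct vfio_iommu_type1_dma_map *map,
                  void *vaddr, uint64_t iova, uint64_t size)
{
    assert(map != NULL);
    memset(map, 0, sizeof(*map));
    map->argsz = sizeof(*map);
    map->flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map->vaddr = (uintptr_t)vaddr;
    map->iova  = iova;
    map->size  = size;
}

/*
 * Issue the request on a VFIO container fd. Returns the ioctl() result:
 * 0 on success, -1 with errno set on failure.
 */
int map_dma_on_request(int container_fd, void *vaddr,
                       uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map;

    fill_dma_map(&map, vaddr, iova, size);
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
```

Each call creates exactly one mapping, so a guest that sets up its DMA
translations one at a time maps onto one such call per translation.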

>
>>
>>> 6) The OS does not get any information about the physical layout
>>>    of the PCI bus.
>>
>> If that means that every device is isolated (seems unlikely for
>> multifunction devices) then that makes IOMMU group support really easy.
>>
>
> OK
>
>>> 7) To take advantage of system z specific virtualization features
>>>    we need to access the SIE control block residing in the kernel KVM
>>
>> The KVM-VFIO device allows interaction between VFIO devices and KVM.
>>
>>> 8) To enable system z specific virtualization features we have to manipulate
>>>    the zpci device in kernel.
>>
>> VFIO supports different device backends, currently pci_dev and working
>> towards platform devices.  zpci might just be an extension to standard
>> pci.
>>
>
> 7 - 8 At least this is not as straightforward as the pure kernel approach, but
> I have to dig into that in more detail if we could only agree on a vfio solution.

Please do so, yes :).

>
>>> For these reasons I decided to implement a kernel based approach similar
>>> to x86 device assignment. There is a new qemu device (s390-pci) representing a
>>> pass through device on the host. Here is a sample qemu device configuration:
>>>
>>> -device s390-pci,host=0000:00:00.0
>>>
>>> The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a proxy instance
>>> in the kernel KVM and connect this instance to the host pci device.
>>>
>>> kernel patches apply to linux-kvm
>>>
>>> s390: cio: chsc function to register GIB
>>> s390: pci: export pci functions for pass-through usage
>>> KVM: s390: Add GISA support
>>> KVM: s390: Add PCI pass-through support
>>>
>>> qemu patches apply to qemu-master
>>>
>>> s390: Add PCI bus support
>>> s390: Add PCI pass-through device support
>>>
>>> Feedback and discussion is highly welcome ...
>>
>> KVM-based device assignment needs to go away.  It's a horrible model for
>> devices, it offers very little protection to the kernel, assumes every
>> device is fully isolated and visible to the IOMMU, relies on smattering
>> of sysfs files to operate, etc.  x86, POWER, and ARM are all moving to
>> VFIO-based device assignment.  Why is s390 special enough to repeat all
>> the mistakes that x86 did?  Thanks,
>>
>
> Is this your personal opinion or was this a strategic decision of the
> QEMU/KVM community? Can anybody give us direction about this?
>
> Actually I can understand your point. In the last weeks I did some development
> and testing regarding the use of vfio too. But the in kernel solution seems to
> offer the best performance and most straightforward implementation for our
> platform.

I don't see why there should be any difference in performance between
the two approaches if done right. However, we'd get a lot of benefits.
Most notably the fact that s390 is not different from everyone else.

I think you'll see that it's pretty straightforward to do things VFIO
style once you get the hang of it :).


Alex