From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexey Kardashevskiy
Subject: Re: [RFC PATCH] PCI: Introduce INTx check & mask API
Date: Fri, 25 May 2012 12:47:35 +1000
Message-ID: <4FBEF2C7.4000708@ozlabs.ru>
References: <4FBDE6D6.80700@ozlabs.ru> <4FBE2349.6040800@siemens.com> <4FBEDDF3.20108@ozlabs.ru> <4FBEEEA4.2060504@web.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Graf , Alex Williamson , David Gibson
To: Jan Kiszka
Return-path:
Received: from mail-ob0-f174.google.com ([209.85.214.174]:64631 "EHLO mail-ob0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754393Ab2EYCrq (ORCPT ); Thu, 24 May 2012 22:47:46 -0400
Received: by obbtb18 with SMTP id tb18so593411obb.19 for ; Thu, 24 May 2012 19:47:45 -0700 (PDT)
In-Reply-To: <4FBEEEA4.2060504@web.de>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 25/05/12 12:29, Jan Kiszka wrote:
> On 2012-05-24 22:18, Alexey Kardashevskiy wrote:
>> On 24/05/12 22:02, Jan Kiszka wrote:
>>> On 2012-05-24 04:44, Alexey Kardashevskiy wrote:
>>>> [Found while debugging VFIO on POWER but it is platform independent]
>>>>
>>>> There is a feature in PCI (>=2.3?) to mask/unmask INTx via PCI_COMMAND and
>>>> PCI_STATUS registers.
>>>
>>> Yes, 2.3 introduced this. Masking is done via command register, checking
>>> if the source was the PCI in question via the status register. The
>>> latter is important for supporting IRQ sharing - and that's why we
>>> introduced this masking API to the PCI layer.
>>
>>
>> Is not it just a quite small optimization to not to disable interrupts on all devices which share
>> the same IRQ but just on those who fired an interrupt? If so, do PCI devices really often share
>> IRQs? Does not supporting this mean real slowdown on such devices?
>>
>> As far as I understand, everyone who cares about performance uses MSI/MSIX, no?
>
> Not everyone is blessed with MSI-only PCI devices. From my notebook:
>
> # cat /proc/interrupts
> [...]
>  22: [...] IO-APIC-fasteoi ehci_hcd:usb1, ehci_hcd:usb2
>
> So, if I want to assign one EHCI controller to a guest, I have to
> disable the other as well. The same can happen quickly if you attach a
> few legacy PCI adapters to a system and want to pass them through.

Why? vfio-pci receives the interrupt, disables it, handles it, and enables it again.
Yes, handling takes a bit longer, since it includes passing the interrupt to QEMU and
then to the guest (this can be optimized to avoid QEMU) and waiting for the EOI
notification, but that is the only difference.

Does the current kernel use the INTx bit for your USB controllers now, without any KVM, etc.?

So, is it just an optimization, or is it something bigger that I missed?

>>>> And there is some API to support that (commit a2e27787f893621c5a6b865acf6b7766f8671328).
>>>>
>>>> I have a network adapter:
>>>> 0001:00:01.0 Ethernet controller: Chelsio Communications Inc T310 10GbE Single Port Adapter
>>>> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
>>>>
>>>> pci_intx_mask_supported() reports that the feature is supported for this adapter
>>>> BUT the adapter does not set PCI_STATUS_INTERRUPT so pci_check_and_set_intx_mask()
>>>> never changes PCI_COMMAND and INTx does not work on it when we use it as VFIO-PCI device.
>>>>
>>>> If I remove the check of this bit, it works fine as it is called from an interrupt handler and
>>>> Status bit check is redundant.
>>>>
>>>> Opened a spec:
>>>> PCI LOCAL BUS SPECIFICATION, REV. 3.0, Table 6-2: Status Register Bits
>>>> ===
>>>> 3 This read-only bit reflects the state of the interrupt in the
>>>> device/function.
>>>> Only when the Interrupt Disable bit in the command
>>>> register is a 0 and this Interrupt Status bit is a 1, will the
>>>> device’s/function’s INTx# signal be asserted. Setting the Interrupt
>>>> Disable bit to a 1 has no effect on the state of this bit.
>>>> ===
>>>> With this adapter, INTx# is asserted but Status bit is still 0.
>>>>
>>>> Is it mandatory for a device to set Status bit if it supports INTx masking?
>>>>
>>>> 2 Alex: if it is mandatory, then we need to be able to disable pci_2_3 in VFIO-PCI
>>>> somehow.
>>>
>>> Since PCI 2.3, this bit is mandatory, and it should be independent of
>>> the masking bit. The question is, if your device is supposed to support
>>> 2.3, thus is just buggy, or if our detection algorithm is unreliable. It
>>> basically builds on the assumption that, if we can flip the mask bit,
>>> the feature should be present. I guess that is the best we can do. Maybe
>>> we can augment this with a blacklist of devices that "support" flipping
>>> without actually providing the feature.
>>
>> It is a good moment to start :)
>> Not sure where - in VFIO or along with that PCI INTx API.
>
> At PCI level as the API is VFIO agnostic (it was introduced for
> "classic" KVM device assignment, in fact).

>> Here is that broken device:
>> aik@vpl2:~$ lspci -s 1:1:0.0
>> 0001:01:00.0 Ethernet controller: Chelsio Communications Inc T310 10GbE Single Port Adapter
>> aik@vpl2:~$ lspci -ns 1:1:0.0
>> 0001:01:00.0 0200: 1425:0030
>
> A patch to add the infrastructure as well would be even more welcome. :)
> You could have a look at drivers/pci/quirks.c for patterns how to do this.

I am not sure yet that we need this feature at all ;) I would prefer to have some way
to disable it in VFIO rather than add yet another quirk for a feature which nobody
uses at the moment.

Really, this device supports MSI/MSIX, and in real life nobody is going to use INTx on it.
The only use for it is testing.

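To make the failure mode discussed in this thread concrete, here is a minimal
userspace sketch of the check-and-mask logic. It is a simplified model under
stated assumptions, not the kernel's pci_check_and_set_intx_mask(): struct
model_dev and all model_* functions are hypothetical names invented for
illustration. Only the two bit values are real (Command register bit 10,
Status register bit 3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions from the PCI spec (same values as in Linux's pci_regs.h). */
#define PCI_COMMAND_INTX_DISABLE 0x400 /* Command register, bit 10 */
#define PCI_STATUS_INTERRUPT     0x08  /* Status register, bit 3 */

/* Hypothetical stand-in for a device's config space; not a kernel type. */
struct model_dev {
    uint16_t command;
    uint16_t status;
    bool intx_asserted;   /* state of the INTx# line */
    bool reports_status;  /* false models an adapter that never sets Status bit 3 */
};

/* The device raises INTx#; a compliant device also sets the Status bit. */
static void model_raise_intx(struct model_dev *dev)
{
    assert(dev != NULL);
    dev->intx_asserted = true;
    if (dev->reports_status)
        dev->status |= PCI_STATUS_INTERRUPT;
}

/*
 * Simplified check-and-mask: mask INTx only if the Status register says
 * this device is the interrupt source. Returns true if it masked.
 */
static bool model_check_and_mask_intx(struct model_dev *dev)
{
    assert(dev != NULL);
    if (!(dev->status & PCI_STATUS_INTERRUPT))
        return false;  /* treated as "not ours": Command is left untouched */
    dev->command |= PCI_COMMAND_INTX_DISABLE;
    dev->intx_asserted = false;  /* a masked device stops driving INTx# */
    return true;
}

/* Unconditional mask, i.e. the same operation without the Status check. */
static void model_mask_intx(struct model_dev *dev)
{
    assert(dev != NULL);
    dev->command |= PCI_COMMAND_INTX_DISABLE;
    dev->intx_asserted = false;
}
```

With reports_status set to false, model_check_and_mask_intx() returns false and
leaves the Command register untouched while the line stays asserted, which is
the behaviour described for the T310 above. model_mask_intx() corresponds to
removing the Status check: it works for a device with an exclusive IRQ, but it
cannot tell which device fired on a shared line.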

-- 
Alexey