From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:49260) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Wn3ri-0007n3-UB for qemu-devel@nongnu.org; Wed, 21 May 2014 06:35:25 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Wn3rc-0007cH-Hp for qemu-devel@nongnu.org; Wed, 21 May 2014 06:35:18 -0400 Message-ID: <537C815D.20409@suse.de> Date: Wed, 21 May 2014 12:35:09 +0200 From: Alexander Graf MIME-Version: 1.0 References: <1400147999-4793-1-git-send-email-aik@ozlabs.ru> <1400147999-4793-9-git-send-email-aik@ozlabs.ru> <537C6680.20601@suse.de> <537C6964.2050708@ozlabs.ru> <537C6C81.7040702@suse.de> <20140521091104.GD19109@redhat.com> <537C6E35.9090607@suse.de> <537C72F0.3080107@ozlabs.ru> <537C76D1.1080208@suse.de> <537C7C3E.2050501@ozlabs.ru> In-Reply-To: <537C7C3E.2050501@ozlabs.ru> Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy , "Michael S. Tsirkin" Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org On 21.05.14 12:13, Alexey Kardashevskiy wrote: > On 05/21/2014 07:50 PM, Alexander Graf wrote: >> On 21.05.14 11:33, Alexey Kardashevskiy wrote: >>> On 05/21/2014 07:13 PM, Alexander Graf wrote: >>>> On 21.05.14 11:11, Michael S. Tsirkin wrote: >>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote: >>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote: >>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote: >>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote: >>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as >>>>>>>>> XICS used to be unable to reuse interrupts which becomes a problem for >>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver >>>>>>>>> reload or >>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices >>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good >>>>>>>>> reason >>>>>>>>> for that. >>>>>>>>> >>>>>>>>> This makes use of new XICS ability to reuse interrupts. >>>>>>>>> >>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first IRQ >>>>>>>>> number >>>>>>>>> of a device is stored in MSI/MSIX config space so there is no need to >>>>>>>>> store >>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling >>>>>>>>> what >>>>>>>>> type >>>>>>>>> of interrupt for which device it has configured in order to return >>>>>>>>> error if >>>>>>>>> (for example) MSIX was enabled and the guest is trying to disable MSI >>>>>>>>> which >>>>>>>>> it has not enabled. >>>>>>>>> >>>>>>>>> This removes a limit for the maximum number of MSIX-enabled devices >>>>>>>>> per PHB, >>>>>>>>> now XICS and PCI bus capacity are the only limitation. >>>>>>>>> >>>>>>>>> This changes migration stream as it fixes vmstate_spapr_pci_msi::name >>>>>>>>> which was >>>>>>>>> wrong since the beginning. >>>>>>>>> >>>>>>>>> This fixed traces to be more informative. >>>>>>>>> >>>>>>>>> Signed-off-by: Alexey Kardashevskiy >>>>>>>>> --- >>>>>>>>> >>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could >>>>>>>>> remove >>>>>>>>> msi/msix >>>>>>>>> bitmaps from this patch, would it make sense? >>>>>>>> Is this a hard requirement? Does a device have to choose between >>>>>>>> MSIX and >>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI >>>>>>>> limitation, >>>>>>>> a PAPR/XICS limitation or just a limitation of your implementation? >>>>>>> My implementation does not have this limitation, I asked if I can >>>>>>> simplify >>>>>>> code by introducing one :) >>>>>>> >>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled >>>>>>> but >>>>>>> it does not seem to be used by anyone => cannot debug and confirm. >>>>>>> >>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is >>>>>>> already >>>>>>> enabled, this is a "change", not enabling both types. But it also >>>>>>> says MSI >>>>>>> and MSIX vector numbers are not shared. Hm. >>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd >>>>>> rather not impose one. >>>>>> >>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at >>>>>> the same time? >>>>>> >>>>>> >>>>>> Alex >>>>> No, and the PCI spec says: >>>>> A function is permitted to implement both MSI and MSI-X, but system >>>>> software is >>>>> prohibited from enabling both at the same time. If system software >>>>> enables both at the same time, the result is undefined. >>>> Ah, cool. So yes Alexey, feel free to impose it :). >>> Heh. This solves just half of the problem - I still have to keep track of >>> what device got MSI/MSIX configured via that ibm,change-msi interface. I >>> was hoping I can store such flag somewhere in a device PCI config space but >>> MSI/MSIX enable bit is not good as it is not set when those calls are made. >>> And I cannot rely on address/data fields much as the guest can change them >>> (I already use them to store IRQ numbers and btw it is missing checks when >>> I read them back for disposal, I'll fix in next round). >>> >>> Or on "enable" event I could put IRQ numbers to .data of MSI config space >>> and on "disable" check if it is not zero, then configuration took place, >>> then I can remove my msi[]/msix[] flag arrays. If the guest did any change >>> to MSI/MSIX config space (it does not on SPAPR except weird selftest >>> cases), I compare .data with what ICS can possibly have and either reject >>> "disable" or handle it and if it breaks XICS - that's too bad for the >>> stupid guest. Would that be acceptable? >> Can't you prohibit the guest from writing to the MSI configuration >> registers itself? Then you don't need to do sanity checks. > > I could for emulated devices but VFIO uses the same code. For example, > there is an IBM SCSI IPR card which does a "self test". For that, it saves > MSIX BAR content, does reboot via some backdoor interface and restores MSIX > BAR. It has been solved for VFIO in the host kernel by restoring MSIX data > from cached values when guest is trying to restore it with what it thinks > is actual MSIX data (it is virtualized because of x86). But there is cache We already have a cache because we don't access the real PCI registers with msi_set_message(), no? Alex