From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44064) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WnQhD-00016O-CJ for qemu-devel@nongnu.org; Thu, 22 May 2014 06:58:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WnQh6-00069u-Us for qemu-devel@nongnu.org; Thu, 22 May 2014 06:57:59 -0400 Message-ID: <537DD82F.4020402@suse.de> Date: Thu, 22 May 2014 12:57:51 +0200 From: Alexander Graf MIME-Version: 1.0 References: <1400756006-26297-1-git-send-email-aik@ozlabs.ru> In-Reply-To: <1400756006-26297-1-git-send-email-aik@ozlabs.ru> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy , qemu-devel@nongnu.org Cc: qemu-ppc@nongnu.org, Juan Quintela On 22.05.14 12:53, Alexey Kardashevskiy wrote: > On 05/22/2014 05:16 PM, Alexander Graf wrote:> >>> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy : >>> >>>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote: >>>>> On 05/21/2014 08:35 PM, Alexander Graf wrote: >>>>> >>>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote: >>>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote: >>>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote: >>>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote: >>>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote: >>>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote: >>>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote: >>>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote: >>>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote: >>>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX >>>>>>>>>>>>>> interrupt as >>>>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a >>>>>>>>>>>>>> problem for >>>>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver >>>>>>>>>>>>>> reload or >>>>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices >>>>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good >>>>>>>>>>>>>> reason >>>>>>>>>>>>>> for that. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first >>>>>>>>>>>>>> IRQ >>>>>>>>>>>>>> number >>>>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no >>>>>>>>>>>>>> need to >>>>>>>>>>>>>> store >>>>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling >>>>>>>>>>>>>> what >>>>>>>>>>>>>> type >>>>>>>>>>>>>> of interrupt for which device it has configured in order to return >>>>>>>>>>>>>> error if >>>>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to disable >>>>>>>>>>>>>> MSI >>>>>>>>>>>>>> which >>>>>>>>>>>>>> it has not enabled. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled devices >>>>>>>>>>>>>> per PHB, >>>>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This changes migration stream as it fixes >>>>>>>>>>>>>> vmstate_spapr_pci_msi::name >>>>>>>>>>>>>> which was >>>>>>>>>>>>>> wrong since the beginning. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This fixed traces to be more informative. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy >>>>>>>>>>>>>> --- >>>>>>>>>>>>>> >>>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could >>>>>>>>>>>>>> remove >>>>>>>>>>>>>> msi/msix >>>>>>>>>>>>>> bitmaps from this patch, would it make sense? >>>>>>>>>>>>> Is this a hard requirement? Does a device have to choose between >>>>>>>>>>>>> MSIX and >>>>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI >>>>>>>>>>>>> limitation, >>>>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your implementation? >>>>>>>>>>>> My implementation does not have this limitation, I asked if I can >>>>>>>>>>>> simplify >>>>>>>>>>>> code by introducing one :) >>>>>>>>>>>> >>>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled >>>>>>>>>>>> but >>>>>>>>>>>> it does not seem to be used by anyone => cannot debug and confirm. >>>>>>>>>>>> >>>>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is >>>>>>>>>>>> already >>>>>>>>>>>> enabled, this is a "change", not enabling both types. But it also >>>>>>>>>>>> says MSI >>>>>>>>>>>> and MSIX vector numbers are not shared. Hm. >>>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd >>>>>>>>>>> rather not impose one. >>>>>>>>>>> >>>>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at >>>>>>>>>>> the same time? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Alex >>>>>>>>>> No, and the PCI spec says: >>>>>>>>>> A function is permitted to implement both MSI and MSI-X, but >>>>>>>>>> system >>>>>>>>>> software is >>>>>>>>>> prohibited from enabling both at the same time. If system software >>>>>>>>>> enables both at the same time, the result is undefined. >>>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :). >>>>>>>> Heh. This solves just half of the problem - I still have to keep track of >>>>>>>> what device got MSI/MSIX configured via that ibm,change-msi interface. I >>>>>>>> was hoping I can store such flag somewhere in a device PCI config space >>>>>>>> but >>>>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are >>>>>>>> made. >>>>>>>> And I cannot rely on address/data fields much as the guest can change them >>>>>>>> (I already use them to store IRQ numbers and btw it is missing checks when >>>>>>>> I read them back for disposal, I'll fix in next round). >>>>>>>> >>>>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI config space >>>>>>>> and on "disable" check if it is not zero, then configuration took place, >>>>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did any change >>>>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest >>>>>>>> cases), I compare .data with what ICS can possibly have and either reject >>>>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the >>>>>>>> stupid guest. Would that be acceptable? >>>>>>> Can't you prohibit the guest from writing to the MSI configuration >>>>>>> registers itself? Then you don't need to do sanity checks. >>>>>> I could for emulated devices but VFIO uses the same code. For example, >>>>>> there is an IBM SCSI IPR card which does a "self test". For that, it saves >>>>>> MSIX BAR content, does reboot via some backdoor interface and restores MSIX >>>>>> BAR. It has been solved for VFIO in the host kernel by restoring MSIX data >>>>>> from cached values when guest is trying to restore it with what it thinks >>>>>> is actual MSIX data (it is virtualized because of x86). But there is cache >>>>> We already have a cache because we don't access the real PCI registers with >>>>> msi_set_message(), no? >>>> >>>> For emulated devices there is no cache. And in any case the guest is >>>> allowed to write to it... Who knows what AIX does? I do not. >>> >>> Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok but >>> how to migrate such thing? Temporary cache somewhere and then unpack it? Or >>> use old style migration callbacks? >> Could you try to introduce a new vmstate type that just serializes and >> deserializes hash tables? Maybe there is already a serialization >> function for it in glib? > I have not found any (most likely I do not know how to search there), > I added mine, here are VMSTATE_HASH + its use for SPAPR. > > > Is this a movement to right direction? I need to put key/value sizes > into VMSTATE definition somehow but do not really want to touch > VMStateField. I'm not sure. Juan? Alex