From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <53481512.5020503@suse.de>
Date: Fri, 11 Apr 2014 18:15:14 +0200
From: Alexander Graf
MIME-Version: 1.0
References: <1394770689-29039-1-git-send-email-aik@ozlabs.ru>
 <1394770689-29039-7-git-send-email-aik@ozlabs.ru>
 <534693E4.1050204@suse.de> <53469B86.7020400@ozlabs.ru>
 <53469BF3.7000406@suse.de> <5346AE11.1050206@ozlabs.ru>
 <5347B4D3.9060508@suse.de> <5347E263.7050605@ozlabs.ru>
 <53480131.4070203@ozlabs.ru> <53480331.1080106@suse.de>
 <534809DC.5050303@ozlabs.ru> <53480C5E.6060908@suse.de>
 <534811E6.2050006@ozlabs.ru>
In-Reply-To: <534811E6.2050006@ozlabs.ru>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
To: Alexey Kardashevskiy
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Andreas Färber

On 11.04.14 18:01, Alexey Kardashevskiy wrote:
> On 04/12/2014 01:38 AM, Alexander Graf wrote:
>> On 11.04.14 17:27, Alexey Kardashevskiy wrote:
>>> On 04/12/2014 12:58 AM, Alexander Graf wrote:
>>>> On 11.04.14 16:50, Alexey Kardashevskiy wrote:
>>>>> On 04/11/2014 11:58 PM, Alexander Graf wrote:
>>>>>> On 11.04.2014, at 14:38, Alexey Kardashevskiy wrote:
>>>>>>
>>>>>>> On 04/11/2014 07:24 PM, Alexander Graf wrote:
>>>>>>>> On 10.04.14 16:43, Alexey Kardashevskiy wrote:
>>>>>>>>> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>>>>>>>>>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>>>>>>>>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>>>>>>>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>> The current allocator returns IRQ numbers from a pool and does not
>>>>>>>>>>>>> support IRQs reuse in any form as it did not keep track of what it
>>>>>>>>>>>>> previously returned, it only had the last returned IRQ.
>>>>>>>>>>>>> However migration may change interrupts for devices depending on
>>>>>>>>>>>>> their order in the command line.
>>>>>>>>>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't
>>>>>>>>>>>> change anything.
>>>>>>>>>>> I put wrong commit message. By change I meant that the default state
>>>>>>>>>>> before the destination guest started accepting migration is different
>>>>>>>>>>> from what the destination guest became after migration finished. And
>>>>>>>>>>> migration cannot avoid changing this default state.
>>>>>>>>>> Ok, why is the IRQ configuration different?
>>>>>>>>> Because QEMU creates devices in the order as in the command line, and
>>>>>>>>> libvirt changes this order - the XML used to create the guest and the
>>>>>>>>> XML which it sends during migration are different. libvirt thinks it
>>>>>>>>> is ok while it keeps @reg property for (for example) spapr-vscsi
>>>>>>>>> devices but it is not because since the order is different, devices
>>>>>>>>> call IRQ allocator in different order and get different IRQs.
>>>>>>>> So your patch migrates the current IRQ configuration, but once you
>>>>>>>> restart the virtual machine on the destination host it will have
>>>>>>>> different IRQ numbering again, right?
>>>>>>> No, why? IRQs are assigned at init time from realize() callbacks (and
>>>>>>> survive reset) or as a part of ibm,change-msi rtas call which happens in
>>>>>>> the same order as it only depends on pci addresses and we do not change
>>>>>>> this either.
>>>>>> Ok, let me rephrase.
>>>>>> If I shut the machine down because I'm doing on-disk hibernate and then
>>>>>> boot it back up, will the guest find the same configuration?
>>>>> I do not understand what you mean by this. Hibernation by the guest OS
>>>>> itself or by QEMU? If this involves QEMU exit and QEMU start - then yes,
>>>> by the guest OS. The host will only see a genuine "shutdown" event. The
>>>> guest OS will expect the machine to look *the exact same* as before the
>>>> shutdown.
>>> Ok. So. I have to implement "irq" property everywhere (PHB is missing
>>> INTA/B/C/D now) and check if they did not change during migration via those
>> Hrm. Not sure. Maybe it'd make sense to join next week's call on platform
>> device creation. The problem seems pretty closely related.
> What are those platform devices and what are you going to discuss exactly?

Devices that don't have a unified interrupt routing scheme like PCI
where you just link lines A/B/C/D to your controller and you're good to go.

>
>
>>> VMSTATE.*EQUAL. Correct?
>> Why would you need this? I think we already said a couple dozen times that
>> configuration matching is a bigger problem, no?
> For debug! It is not needed in general, yes.
>
>
>>> If so (more or less), I still would like to keep patches 1..7.
>>> In fact, the first one is independent and we need it anyway.
>>> Yes/no?
>> Why?
> IOMMUs do not migrate correctly - they only have a class name and
> instance_id, and this instance_id depends on command line arguments order.
> The #1 patch makes it classname + liobn.

Why do we need a bus for that?

>
>
>>>
>>>>> config may be different. If it is "migrate to file" and then "migrate from
>>>>> file" (do not know what you call it when migration goes to a pipe which is
>>>>> "tar") - then config will be the same.
>>>>>
>>>>>
>>>>>>>> I'm not sure that's a good solution to the problem. I guess we should
>>>>>>>> rather aim to make sure that we can make IRQ allocation explicit.
>>>>>>>> Fundamentally the problem sounds very similar to the PCI slot
>>>>>>>> allocation which eventually got solved by libvirt specifying the
>>>>>>>> slots manually.
>>>>>>> We can do that too. Who decides? :)
>>>>>> The better solution wins :)
>>>>> We both know who decides ;) I posted series, I need heads up if it is
>>>>> going the right way or not.
>>>> It's not :). If a guest may not have different IRQ allocation after
>>>> migration, it also must not have different IRQ allocation after shutdown +
>>>> restart.
>>> Ok. That's good answer, thanks. How does x86 work then? IRQs are hardcoded
>>> (some are for sure but I do not know about MSI)? Or in order to support
>> Non-PCI IRQs are hardcoded, yes. PCI IRQs are mapped to one of the 4 PCI
>> interrupts which again are hardcoded to IOAPIC interrupt lines after some
>> PCI line swizzling.
> This is what I meant - I need to have a way to tell PHB IRQ numbers for
> INTA/B/C/D.

Yes, just like platform devices ;).

Alex
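
For illustration only, the ordering problem discussed in this thread can be
sketched in a few lines of Python. This is not QEMU code. It assumes a
hypothetical allocator that only remembers the next free IRQ, a made-up base
IRQ of 16, and made-up device names. It compares order-dependent allocation
with an explicit per-device assignment, roughly the way PCI slots are pinned
by the management layer.

```python
class SequentialAllocator:
    """Hypothetical allocator: hands out the next free IRQ, nothing more."""

    def __init__(self, base):
        self.next_irq = base

    def alloc(self):
        irq = self.next_irq
        self.next_irq += 1
        return irq


def assign_sequential(devices, base=16):
    """Each device grabs an IRQ in creation order (base 16 is an assumption)."""
    allocator = SequentialAllocator(base)
    return {name: allocator.alloc() for name in devices}


def assign_explicit(devices, irq_map):
    """Each device takes the IRQ configured for it, whatever the order."""
    return {name: irq_map[name] for name in devices}


# Same devices, created in a different order on source and destination.
# The device names are invented for this sketch.
source_order = ["vscsi@2000", "vlan@1000", "vty@3000"]
dest_order = ["vlan@1000", "vscsi@2000", "vty@3000"]

seq_src = assign_sequential(source_order)
seq_dst = assign_sequential(dest_order)

explicit = {"vscsi@2000": 16, "vlan@1000": 17, "vty@3000": 18}
exp_src = assign_explicit(source_order, explicit)
exp_dst = assign_explicit(dest_order, explicit)

print("sequential allocation matches:", seq_src == seq_dst)
print("explicit allocation matches:  ", exp_src == exp_dst)
```

With explicit assignment the mapping depends only on the configuration, so it
stays the same across migration and across a guest shutdown and restart, which
is the property asked for above.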