From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50194)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1WYdtu-0003wL-OX for qemu-devel@nongnu.org;
	Fri, 11 Apr 2014 12:02:05 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1WYdtk-0002c5-H5 for qemu-devel@nongnu.org;
	Fri, 11 Apr 2014 12:01:58 -0400
Received: from mail-pb0-f54.google.com ([209.85.160.54]:55976)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1WYdtk-0002bq-9P for qemu-devel@nongnu.org;
	Fri, 11 Apr 2014 12:01:48 -0400
Received: by mail-pb0-f54.google.com with SMTP id ma3so5590098pbc.13
	for ; Fri, 11 Apr 2014 09:01:47 -0700 (PDT)
Message-ID: <534811E6.2050006@ozlabs.ru>
Date: Sat, 12 Apr 2014 02:01:42 +1000
From: Alexey Kardashevskiy
MIME-Version: 1.0
References: <1394770689-29039-1-git-send-email-aik@ozlabs.ru>
	<1394770689-29039-7-git-send-email-aik@ozlabs.ru>
	<534693E4.1050204@suse.de> <53469B86.7020400@ozlabs.ru>
	<53469BF3.7000406@suse.de> <5346AE11.1050206@ozlabs.ru>
	<5347B4D3.9060508@suse.de> <5347E263.7050605@ozlabs.ru>
	<53480131.4070203@ozlabs.ru> <53480331.1080106@suse.de>
	<534809DC.5050303@ozlabs.ru> <53480C5E.6060908@suse.de>
In-Reply-To: <53480C5E.6060908@suse.de>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
To: Alexander Graf
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Andreas Färber

On 04/12/2014 01:38 AM, Alexander Graf wrote:
>
> On 11.04.14 17:27, Alexey Kardashevskiy wrote:
>> On 04/12/2014 12:58 AM, Alexander Graf wrote:
>>> On 11.04.14 16:50, Alexey Kardashevskiy wrote:
>>>> On 04/11/2014 11:58 PM, Alexander Graf wrote:
>>>>> On 11.04.2014, at 14:38, Alexey Kardashevskiy wrote:
>>>>>
>>>>>> On 04/11/2014 07:24 PM, Alexander Graf wrote:
>>>>>>> On 10.04.14 16:43, Alexey Kardashevskiy wrote:
>>>>>>>> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>>>>>>>>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>>>>>>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> The current allocator returns IRQ numbers from a pool and does not
>>>>>>>>>>>> support IRQs reuse in any form as it did not keep track of what it
>>>>>>>>>>>> previously returned, it only had the last returned IRQ.
>>>>>>>>>>>> However migration may change interrupts for devices depending on
>>>>>>>>>>>> their order in the command line.
>>>>>>>>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration
>>>>>>>>>>> shouldn't change anything.
>>>>>>>>>> I put wrong commit message. By change I meant that the default
>>>>>>>>>> state before the destination guest started accepting migration is
>>>>>>>>>> different from what the destination guest became after migration
>>>>>>>>>> finished. And migration cannot avoid changing this default state.
>>>>>>>>> Ok, why is the IRQ configuration different?
>>>>>>>> Because QEMU creates devices in the order as in the command line, and
>>>>>>>> libvirt changes this order - the XML used to create the guest and the
>>>>>>>> XML which is sends during migration are different. libvirt thinks it
>>>>>>>> is ok while it keeps @reg property for (for example) spapr-vscsi
>>>>>>>> devices but it is not because since the order is different, devices
>>>>>>>> call IRQ allocator in different order and get different IRQs.
>>>>>>> So your patch migrates the current IRQ configuration, but once you
>>>>>>> restart the virtual machine on the destination host it will have
>>>>>>> different IRQ numbering again, right?
>>>>>> No, why? IRQs are assigned at init time from realize() callbacks (and
>>>>>> survive reset) or as a part of ibm,change-msi rtas call which happens in
>>>>>> the same order as it only depends on pci addresses and we do not change
>>>>>> this either.
>>>>> Ok, let me rephrase. If I shut the machine down because I'm doing
>>>>> on-disk hibernate and then boot it back up, will the guest find the same
>>>>> configuration?
>>>> I do not understand what you mean by this. Hibernation by the guest OS
>>>> itself or by QEMU? If this involves QEMU exit and QEMU start - then yes,
>>> by the guest OS. The host will only see a genuine "shutdown" event. The
>>> guest OS will expect the machine to look *the exact same* as before the
>>> shutdown.
>> Ok. So. I have to implement "irq" property everywhere (PHB is missing
>> INTA/B/C/D now) and check if they did not change during migration via those
>
> Hrm. Not sure. Maybe it'd make sense to join next week's call on platform
> device creation. The problem seems pretty closely related.

What are those platform devices and what are you going to discuss exactly?

>> VMSTATE.*EQUAL. Correct?
>
> Why would you need this? I think we already said a couple dozen times that
> configuration matching is a bigger problem, no?

For debug! It is not needed in general, yes.

>> If so (more or less), I still would like to keep patches 1..7.
>> In fact, the first one is independent and we need it anyway.
>> Yes/no?
>
> Why?

IOMMUs do not migrate correctly - they only have a class name and an
instance_id, and this instance_id depends on the order of the command line
arguments. The #1 patch makes it classname + liobn.

>
>>
>>
>>>> config may be different. If it is "migrate to file" and then "migrate from
>>>> file" (do not know what you call it when migration goes to a pipe which is
>>>> "tar") - then config will be the same.
>>>>
>>>>
>>>>>>> I'm not sure that's a good solution to the problem. I guess we should
>>>>>>> rather aim to make sure that we can make IRQ allocation explicit.
>>>>>>> Fundamentally the problem sounds very similar to the PCI slot
>>>>>>> allocation which eventually got solved by libvirt specifying the
>>>>>>> slots manually.
>>>>>> We can do that too. Who decides? :)
>>>>> The better solution wins :)
>>>> We both know who decides ;) I posted series, I need heads up if it is
>>>> going the right way or not.
>>> It's not :). If a guest may not have different IRQ allocation after
>>> migration, it also must not have different IRQ allocation after shutdown +
>>> restart.
>> Ok. That's good answer, thanks. How does x86 work then? IRQs are hardcoded
>> (some are for sure but I do not know about MSI)? Or in order to support
>
> Non-PCI IRQs are hardcoded, yes. PCI IRQs are mapped to one of the 4 PCI
> interrupts which again are hardcoded to IOAPIC interrupt lines after some
> PCI line swizzling.

This is what I meant - I need to have a way to tell PHB IRQ numbers for
INTA/B/C/D.

> MSI gets configured by the guest, so it has to make sure MSIs are set up
> identically again after hibernation.
>
>
> Alex
>> migration, the user has to specify IRQs for the devices which may get
>> different IRQs depending on things like command line parameters order?
>

-- 
Alexey
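
The ordering problem discussed in this thread can be illustrated with a small
sketch. It is not QEMU code: SequentialIrqAllocator, build_machine, the base
IRQ number 16 and the device names are all hypothetical, chosen only to show
why a "next free number" allocator can give the same devices different IRQs
when their creation order changes, and why an explicit per-device IRQ setting
avoids that.

```python
# Illustrative only: none of these names or numbers come from QEMU.


class SequentialIrqAllocator:
    """Hands out the next unused IRQ number and remembers only that counter."""

    def __init__(self, base=16):  # the base value is an arbitrary assumption
        self.next_irq = base

    def alloc(self):
        irq = self.next_irq
        self.next_irq += 1
        return irq


def build_machine(device_order, explicit_irqs=None):
    """Create devices in the given order and return a {device: irq} mapping.

    If explicit_irqs maps a device name to a number, that number is used
    instead of asking the allocator.
    """
    explicit_irqs = explicit_irqs or {}
    allocator = SequentialIrqAllocator()
    assignment = {}
    for name in device_order:
        if name in explicit_irqs:
            assignment[name] = explicit_irqs[name]
        else:
            assignment[name] = allocator.alloc()
    return assignment


# Same set of devices, but the second machine creates them in another order,
# as happens when the command line is regenerated with a different ordering.
source = build_machine(["vscsi0", "vlan0", "vty0"])
dest = build_machine(["vty0", "vscsi0", "vlan0"])
print(source == dest)

# With IRQs given explicitly per device, creation order no longer matters.
pinned = {"vscsi0": 16, "vlan0": 17, "vty0": 18}
source_pinned = build_machine(["vscsi0", "vlan0", "vty0"], pinned)
dest_pinned = build_machine(["vty0", "vscsi0", "vlan0"], pinned)
print(source_pinned == dest_pinned)
```

In this sketch the first comparison prints False and the second prints True.
The same reasoning applies to a plain shutdown and restart with a reordered
command line, which is the case raised above for guest-initiated hibernation.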