From: David Gibson <david@gibson.dropbear.id.au>
To: "Cédric Le Goater" <clg@kaod.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
Greg Kurz <groug@kaod.org>
Subject: Re: [Qemu-devel] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE interrupt controller
Date: Thu, 12 Apr 2018 15:15:03 +1000
Message-ID: <20180412051503.GN9425@umbus.fritz.box>
In-Reply-To: <2226ed5e-f666-8060-2edd-998d2f5107ed@kaod.org>
On Wed, Jan 17, 2018 at 03:39:46PM +0100, Cédric Le Goater wrote:
> On 01/17/2018 12:10 PM, Benjamin Herrenschmidt wrote:
> > On Wed, 2018-01-17 at 10:18 +0100, Cédric Le Goater wrote:
> >>>>> Also, have we decided how the process of switching between XICS and
> >>>>> XIVE will work vs. CAS ?
> >>>>
> >>>> That's how it is described in the architecture. The current choice is
> >>>> to create both XICS and XIVE objects and choose at CAS which one to
> >>>> use. It relies today on the capability of the pseries machine to
> >>>> allocate IRQ numbers for both interrupt controller backends. These
> >>>> patches have been merged in QEMU.
> >>>>
> >>>> A change of interrupt mode results in a reset. The device tree is
> >>>> populated accordingly and the ICPs are switched for the model in
> >>>> use.
> >>>
> >>> For KVM we need to only instantiate one of them though.
> >>
> >> Hmm,
> >>
> >> How would we handle a guest rebooting on a kernel without XIVE support ?
> >
> > It will do CAS again and we can change the devices.
>
> So, we would destroy the previous QEMU ICS object and create a new one
> in the CAS hcall. That would probably work. There might be some issues
> in creating and destroying the ICS KVM device, but that can be studied
> without XIVE.

Adding and removing devices at runtime based on guest requests like
this will get really hairy in qemu.

As I've said before, for the first cut I think we want to select just
one as a machine option to avoid this confusion.

Looking further ahead, I think we'll be better off having both the
XIVE and XICS models always present (at least minimally) in qemu, but
with only one "active" at any given time.

Note that having the inactive one destroy and clean up the
corresponding KVM devices is fine, as is deallocating as much of its
runtime state as we can without changing the notional QOM tree.

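
To make the "both present, only one active" idea a bit more concrete,
here is a rough sketch in C. It is only an illustration under stated
assumptions: every name in it (Machine, IrqBackend,
machine_set_irq_mode() and so on) is made up for this example and is
not an existing qemu or KVM interface, and the kvm_device flag merely
stands in for a real KVM device fd. The points it tries to show are
that both backend objects exist for the machine's whole lifetime, that
a mode switch only tears down and brings up per-backend KVM state, and
that IRQ numbers live in a machine-level allocator so they survive the
switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these are qemu or KVM APIs. */
typedef enum { IRQ_MODE_XICS, IRQ_MODE_XIVE, IRQ_MODE_COUNT } IrqMode;

typedef struct {
    bool active;      /* is this the model the guest negotiated? */
    bool kvm_device;  /* stands in for an open KVM device fd */
} IrqBackend;

#define MACHINE_NR_IRQS 64

typedef struct {
    IrqBackend backends[IRQ_MODE_COUNT];  /* both always exist */
    IrqMode mode;                         /* which one is active */
    bool irq_claimed[MACHINE_NR_IRQS];    /* machine-level allocator */
} Machine;

/* Bring a backend up: this is where a KVM device would be created. */
static void backend_activate(IrqBackend *b)
{
    b->kvm_device = true;
    b->active = true;
}

/* Take a backend down: release the KVM device and runtime state,
 * but leave the object itself in place. */
static void backend_deactivate(IrqBackend *b)
{
    b->kvm_device = false;
    b->active = false;
}

/* Both backends are created up front; XICS is the default mode. */
void machine_init(Machine *m)
{
    assert(m != NULL);
    *m = (Machine){0};
    m->mode = IRQ_MODE_XICS;
    backend_activate(&m->backends[IRQ_MODE_XICS]);
}

/* Called when the guest negotiates a mode (for instance at CAS).
 * No object is created or destroyed, only activated or deactivated. */
void machine_set_irq_mode(Machine *m, IrqMode mode)
{
    assert(m != NULL && mode < IRQ_MODE_COUNT);
    if (mode == m->mode) {
        return;
    }
    backend_deactivate(&m->backends[m->mode]);
    backend_activate(&m->backends[mode]);
    m->mode = mode;
}

/* Claim the lowest free IRQ number, or return -1 if none is left.
 * The allocation belongs to the machine, not to either backend. */
int machine_irq_alloc(Machine *m)
{
    assert(m != NULL);
    for (int i = 0; i < MACHINE_NR_IRQS; i++) {
        if (!m->irq_claimed[i]) {
            m->irq_claimed[i] = true;
            return i;
        }
    }
    return -1;
}
```

In this sketch a caller would run machine_init() once, allocate IRQs
with machine_irq_alloc(), and call machine_set_irq_mode() whenever the
negotiated mode changes; IRQs claimed before the switch remain claimed
afterwards because the allocator never looks at the backends.
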
>
> It used to be considered ugly to create a QEMU device at reset time, so
> I wonder if this is still the case, because when the machine reaches CAS,
> we really are beyond reset.
>
> If this is OK, then the next "issue" is to keep in sync the allocated
> IRQ numbers. The IRQ allocator is now merged at the machine level, so
> the synchronization is obvious to do when both backend QEMU objects
> are available. That's the path I took. If both QEMU objects are not
> available, then we need to scan the IRQ number space in the current
> interrupt mode and allocate the same IRQs in the newly negotiated mode.
> Probably OK. I don't see major problems with the current code.
>
> Migration is a problem. We will need both backend QEMU objects to be
> available anyhow if we want to migrate. So we are back to the current
> solution creating both QEMU objects but we can try to defer some of the
> KVM inits and create the KVM device on demand at CAS time.
>
> The next problem is the ICP object that currently needs the KVM device
> fd to connect the vcpus ... So, we will need to change that also.
> That is probably the biggest problem today. We need a way to disconnect
> the vcpu from the KVM device and see how we can defer the connection.
> I need to make sure this is possible, I can check that without XIVE
> I think.
>
> >> Are you suggesting to create the XICS or XIVE device in the CAS negotiation
> >> process ? So, the machine would not have any interrupt controller before
> >> CAS. That seems really late to me. grub uses the console for instance.
> >
> > We start with XICS by default.
>
> yes.
>
> >> I think it should prepare for both options, start in XIVE legacy mode,
> >> which is XICS, then possibly switch to XIVE exploitation mode.
> >>
> >>>>> And how that will interact with KVM ?
> >>>>
> >>>> I expect we will do the same, which is to create two KVM devices to
> >>>> be able to handle both interrupt controller backends depending on the
> >>>> mode negotiated by the guest.
> >>>
> >>> That will be an ungodly mess, I'd rather we only instantiate the right
> >>> one.
> >>
> >> It's rather transparent currently in the emulated version. There are two
> >> sets of objects in QEMU, switching is done in CAS. KVM support should not
> >> change anything in that area.
> >>
> >> I expect the 'xive-kvm' object to get/set states for migration, just like
> >> for XICS and to set up the ESB+TIMA memory regions, which is new.
> >
> > But both XICS and XIVE are completely different kernel KVM devices that will
> > need to "hook" into the same set of internal hooks for things like interrupts
> > being passed through, RTAS calls etc...
> >
> > How does KVM know which one to "activate" ?
>
> Can't we add an extra IRQ type and use vcpu->arch.irq_type for that ?
> I haven't studied all the low level details though.
>
> > I don't think the kernel should have both.
>
> I hear that. From a QEMU perspective, it is much easier to put everything
> in place for both interrupt modes and let the guest decide what it wants
> to use.
>
> If we choose not to, we will need to find a solution to defer the KVM inits
> and to disconnect/reconnect the vcpus. For the latter, we could add a
> KVM_DISABLE_CAP ioctl or maybe better add a new capability like
> KVM_CAP_IRQ_XIVE to perform the switch.
>
>
> C.
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson