qemu-devel.nongnu.org archive mirror
From: David Gibson <david@gibson.dropbear.id.au>
To: "Cédric Le Goater" <clg@kaod.org>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Greg Kurz <groug@kaod.org>
Subject: Re: [Qemu-devel] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE interrupt controller
Date: Thu, 3 May 2018 12:29:51 +1000
Message-ID: <20180503022951.GK13229@umbus.fritz.box>
In-Reply-To: <6ccceb5a-e2fc-6fa3-cc40-c701e5047590@kaod.org>

On Thu, Apr 26, 2018 at 10:17:13AM +0200, Cédric Le Goater wrote:
> On 04/26/2018 07:36 AM, David Gibson wrote:
> > On Thu, Apr 19, 2018 at 07:40:09PM +0200, Cédric Le Goater wrote:
> >> On 04/16/2018 06:26 AM, David Gibson wrote:
> >>> On Thu, Apr 12, 2018 at 10:18:11AM +0200, Cédric Le Goater wrote:
> >>>> On 04/12/2018 07:07 AM, David Gibson wrote:
> >>>>> On Wed, Dec 20, 2017 at 08:38:41AM +0100, Cédric Le Goater wrote:
> >>>>>> On 12/20/2017 06:09 AM, David Gibson wrote:
> >>>>>>> On Sat, Dec 09, 2017 at 09:43:21AM +0100, Cédric Le Goater wrote:
> > [snip]
> >>>> The XIVE tables are :
> >>>>
> >>>> * IVT
> >>>>
> >>>>   associate an interrupt source number with an event queue. the data
> >>>>   to be pushed in the queue is stored there also.
> >>>
> >>> Ok, so there would be one of these tables for each IVRE, 
> >>
> >> yes. one for each XIVE interrupt controller. That is one per processor 
> >> or socket.
> > 
> > Ah.. so there can be more than one in a multi-socket system.
> >>> with one entry for each source managed by that IVSE, yes?
> >>
> >> yes. The table is simply indexed by the interrupt number in the
> >> global IRQ number space of the machine.
> > 
> > How does that work on a multi-chip machine?  Does each chip just have
> > a table for a slice of the global irq number space?
> 
> yes. IRQ Allocation is done relative to the chip, each chip having 
> a range depending on its block id. XIVE has a concept of block,
> which is used in skiboot in a one-to-one relationship with the chip.

Ok.  I'm assuming this block id forms the high(ish) bits of the global
irq number, yes?
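
A minimal sketch of the layout being asked about, assuming a 32-bit
global irq number whose top 4 bits carry the block id and whose low 28
bits index into that block's slice of the table.  The field widths and
every name below (XIVE_BLOCK_SHIFT, xive_make_lisn, and so on) are
hypothetical and chosen for illustration only; they are not taken from
skiboot or from this patch series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed split: top 4 bits = block id, low 28 bits = per-block index. */
#define XIVE_BLOCK_SHIFT 28
#define XIVE_BLOCK_MAX   (1u << (32 - XIVE_BLOCK_SHIFT))
#define XIVE_INDEX_MASK  ((1u << XIVE_BLOCK_SHIFT) - 1)

/* Compose a global irq number from a block id and a per-block index. */
static inline uint32_t xive_make_lisn(uint32_t block, uint32_t index)
{
    assert(block < XIVE_BLOCK_MAX);
    assert(index <= XIVE_INDEX_MASK);
    return (block << XIVE_BLOCK_SHIFT) | index;
}

/* Recover the block (chip) that owns a global irq number. */
static inline uint32_t xive_lisn_block(uint32_t lisn)
{
    return lisn >> XIVE_BLOCK_SHIFT;
}

/* Recover the index into that block's slice of the table. */
static inline uint32_t xive_lisn_index(uint32_t lisn)
{
    return lisn & XIVE_INDEX_MASK;
}
```

Under these assumptions each chip only needs a table covering its own
index range, and a lookup first selects the chip with xive_lisn_block()
and then indexes its table with xive_lisn_index().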

> >>> Do the XIVE IPIs have entries here, or do they bypass this?
> >>
> >> no. The IPIs have entries also in this table.
> >>
> >>>> * EQDT:
> >>>>
> >>>>   describes the queues in the OS RAM, also contains a set of flags,
> >>>>   a virtual target, etc.
> >>>
> >>> So on real hardware this would be global, yes?  And it would be
> >>> consulted by the IVRE?
> >>
> >> yes. Exactly. The XIVE routing routine :
> >>
> >> 	https://github.com/legoater/qemu/blob/xive/hw/intc/xive.c#L706
> >>
> >> gives a good overview of the usage of the tables.
> >>
> >>> For guests, we'd expect one table per-guest?  
> >>
> >> yes but only in emulation mode. 
> > 
> > I'm not sure what you mean by this.
> 
> I meant the sPAPR QEMU emulation mode. Linux/KVM relies on the overall 
> table allocated in OPAL for the system. 

Right.. I'm thinking of this from the point of view of the guest
and/or qemu, rather than from the implementation.  Even if the actual
storage of the entries is distributed across the host's global table,
we still logically have a table per guest, right?

> >>> How would those be integrated with the host table?
> >>
> >> Under KVM, this is handled by the host table (setup done in skiboot) 
> >> and we are only interested in the state of the EQs for migration.
> > 
> > This doesn't make sense to me; the guest is able to alter the IVT
> > entries, so that configuration must be migrated somehow.
> 
> yes. The IVE needs to be migrated. We use get/set KVM ioctls to save 
> and restore the value which is cached in the KVM irq state struct 
> (server, prio, eq data). no OPAL calls are needed though.

Right.  Again, at this stage I don't particularly care what the
backend details are - whether the host calls OPAL or whatever.  I'm
more concerned with the logical model.

> >> This state is set with the H_INT_SET_QUEUE_CONFIG hcall,
> > 
> > "This state" here meaning IVT entries?
> 
> no. The H_INT_SET_QUEUE_CONFIG sets the event queue OS page for a 
> server/priority couple. That is where the event queue data is
> pushed.

Ah.  Doesn't that mean the guest *does* effectively have an EQD table,
updated by this call?  We'd need to migrate that data as well, and
it's not part of the IVT, right?

> H_INT_SET_SOURCE_CONFIG does the targeting : irq, server, priority,
> and the eq data to be pushed in case of an event.

Ok - that's the IVT entries, yes?
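
To pin down the logical model under discussion, here is a rough sketch
of the per-guest state the two hcalls would update, and of how routing
would consult it.  This is an assumption-laden illustration, not the
code from the series: the type names (GuestIVE, GuestEQ,
GuestXiveState), the table sizes and the helper functions are all
made up, and the queue handling ignores details such as how the OS
detects new entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes, for illustration only. */
#define NR_IRQS    8
#define NR_SERVERS 2
#define NR_PRIOS   8

/* IVT entry: the targeting set by H_INT_SET_SOURCE_CONFIG. */
typedef struct {
    bool     valid;
    uint32_t server;
    uint8_t  prio;
    uint32_t eq_data;    /* data pushed in the queue on an event */
} GuestIVE;

/* EQ descriptor: set by H_INT_SET_QUEUE_CONFIG for a server/priority couple. */
typedef struct {
    bool      enabled;
    uint32_t *page;      /* stands in for the event queue OS page */
    uint32_t  nr_entries;
    uint32_t  next;      /* next slot to write */
} GuestEQ;

/* Logical per-guest state: one IVT plus one EQ per server/priority. */
typedef struct {
    GuestIVE ivt[NR_IRQS];
    GuestEQ  eqdt[NR_SERVERS][NR_PRIOS];
} GuestXiveState;

static void set_queue_config(GuestXiveState *s, uint32_t server, uint8_t prio,
                             uint32_t *page, uint32_t nr_entries)
{
    assert(server < NR_SERVERS && prio < NR_PRIOS && nr_entries > 0);
    s->eqdt[server][prio] = (GuestEQ){ true, page, nr_entries, 0 };
}

static void set_source_config(GuestXiveState *s, uint32_t lisn, uint32_t server,
                              uint8_t prio, uint32_t eq_data)
{
    assert(lisn < NR_IRQS && server < NR_SERVERS && prio < NR_PRIOS);
    s->ivt[lisn] = (GuestIVE){ true, server, prio, eq_data };
}

/* Route one event: IVT lookup, then EQ lookup, then push the EQ data. */
static bool route(GuestXiveState *s, uint32_t lisn)
{
    if (lisn >= NR_IRQS || !s->ivt[lisn].valid) {
        return false;
    }
    GuestIVE *ive = &s->ivt[lisn];
    GuestEQ *eq = &s->eqdt[ive->server][ive->prio];
    if (!eq->enabled) {
        return false;
    }
    eq->page[eq->next] = ive->eq_data;
    eq->next = (eq->next + 1) % eq->nr_entries;
    return true;
}
```

In this sketch both arrays are guest-visible configuration, so both
would have to be part of the migrated state, wherever the entries are
physically stored on the host.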

>  
> >> followed
> >> by an OPAL call and then a HW update. It defines the EQ page in which
> >> to push event notification for the couple server/priority. 
> >>
> >>>> * VPDT:
> >>>>
> >>>>   describe the virtual targets, which can have different natures,
> >>>>   a lpar, a cpu. This is for powernv, spapr does not have this 
> >>>>   concept.
> >>>
> >>> Ok.  On hardware that would also be global and consulted by the IVRE,
> >>> yes?
> >>
> >> yes.
> > 
> > Except.. is it actually global, or is there one per-chip/socket?
> 
> There is a global VP allocator splitting the ids depending on the
> block/chip, but, to be honest, I have not dug into the details.
> 
> > [snip]
> >>>>    In the current version I am working on, the XiveFabric interface is
> >>>>    more complex :
> >>>>
> >>>> 	typedef struct XiveFabricClass {
> >>>> 	    InterfaceClass parent;
> >>>> 	    XiveIVE *(*get_ive)(XiveFabric *xf, uint32_t lisn);
> >>>
> >>> This does an IVT lookup, I take it?
> >>
> >> yes. It is an interface for the underlying storage, which is different
> >> in sPAPR and PowerNV. The goal is to make the routing generic.
> > 
> > Right.  So, yes, we definitely want a method *somewhere* to do an IVT
> > lookup.  I'm not entirely sure where it belongs yet.
> 
> Me either. I have stuffed the XiveFabric with all the abstraction 
> needed for the moment. 
> 
> I am starting to think that there should be an interface to forward 
> events and another one to route them. The router being a special case 
> of the forwarder, the last one. The "simple" devices, like PSI, should 
> only be forwarders for the sources they own but the interrupt controllers 
> should be forwarders (they have sources) and also routers.

I'm not really clear what you mean by "forward" here.

> 
> >>>> 	    XiveNVT *(*get_nvt)(XiveFabric *xf, uint32_t server);
> >>>
> >>> This one a VPDT lookup, yes?
> >>
> >> yes.
> >>
> >>>> 	    XiveEQ  *(*get_eq)(XiveFabric *xf, uint32_t eq_idx);
> >>>
> >>> And this one an EQDT lookup?
> >>
> >> yes.
> >>
> >>>> 	} XiveFabricClass;
> >>>>
> >>>>    It helps in making the routing algorithm independent of the model. 
> >>>>    I hope to make powernv converge and use it.
> >>>>
> >>>>  - a set of MMIOs for the TIMA. They model the presenter engine. 
> >>>>    current_cpu is used to retrieve the NVT object, which holds the 
> >>>>    registers for interrupt management.  
> >>>
> >>> Right.  Now the TIMA is local to a target/server not an EQ, right?
> >>
> >> The TIMA is the MMIO giving access to the registers which are per CPU. 
> >> The EQ are for routing. They are under the CPU object because it is 
> >> convenient.
> >>  
> >>> I guess we need at least one of these per-vcpu.  
> >>
> >> yes.
> >>
> >>> Do we also need an lpar-global, or other special ones?
> >>
> >> That would be for the host. AFAICT KVM does not use such special
> >> VPs.
> > 
> > Um.. "does not use".. don't we get to decide that?
> 
> Well, that part in the specs is still a little obscure for me and 
> I am not sure it will fit very well in the Linux/KVM model. It should 
> be hidden to the guest anyway and can come in later.
> 
> >>>> The EQs are stored under the NVT. This saves us an unnecessary EQDT 
> >>>> table. But we could add one under the XIVE device model.
> >>>
> >>> I'm not sure of the distinction you're drawing between the NVT and the
> >>> XIVE device model.
> >>
> >> we could add a new table under the XIVE interrupt device model 
> >> sPAPRXive to store the EQs and index them like skiboot does. 
> >> But it seems unnecessary to me as we can use the object below 
> >> 'cpu->intc', which is the XiveNVT object.  
> > 
> > So, basically assuming a fixed set of EQs (one per priority?)
> 
> yes. It's easier to capture the state and dump information from
> the monitor.
> 
> > per CPU for a PAPR guest?  
> 
> yes, that's how it works.
> 
> > That makes sense (assuming PAPR doesn't provide guest interfaces to 
> > ask for something else).
> 
> Yes. All hcalls take prio/server parameters and the reserved prio range 
> for the platform is in the device tree. 0xFF is a special case to reset 
> targeting. 
> 
> Thanks,
> 
> C.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson