kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Scott Wood <scottwood@freescale.com>
To: Paul Mackerras <paulus@samba.org>
Cc: Alexander Graf <agraf@suse.de>, <kvm@vger.kernel.org>,
	<kvm-ppc@vger.kernel.org>
Subject: Re: [PATCH 3/9] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller
Date: Fri, 15 Feb 2013 17:59:11 -0600	[thread overview]
Message-ID: <1360972751.6960.7@snotra> (raw)
In-Reply-To: <20130215231831.GA21288@iris.ozlabs.ibm.com> (from paulus@samba.org on Fri Feb 15 17:18:31 2013)

On 02/15/2013 05:18:31 PM, Paul Mackerras wrote:
> On Fri, Feb 15, 2013 at 02:05:41PM -0600, Scott Wood wrote:
> > On 02/14/2013 06:01:08 PM, Paul Mackerras wrote:
> > >From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > >
> > >This adds in-kernel emulation of the XICS (eXternal Interrupt
> > >Controller Specification) interrupt controller specified by PAPR,  
> for
> > >both HV and PR KVM guests.
> > >
> > >This adds a new KVM_CREATE_IRQCHIP_ARGS ioctl, which is like
> > >KVM_CREATE_IRQCHIP in that it indicates that the virtual machine
> > >should use in-kernel interrupt controller emulation, but also  
> takes an
> > >argument struct that contains the type of interrupt controller
> > >architecture and an optional parameter.  Currently only one type  
> value
> > >is defined, that which indicates the XICS architecture.
> >
> > Would the device config API I posted a couple days ago work for you?
> 
> I suppose it could be made to work.  It doesn't feel like a natural
> fit though, because your API seems to assume (AFAICT) that a device is
> manipulated via registers at specific physical addresses, so I would
> have to invent an artificial set of registers with addresses and bit
> layouts, that aren't otherwise required.  The XICS is operated from
> the guest side via hcalls, not via emulated MMIO.

I don't think it makes such an assumption.  The MPIC device has  
physical registers, so it exposes them, but it also exposes things that  
are not physical registers (e.g. the per-IRQ input state).  The generic  
device control layer leaves interpretation of attributes up to the  
device.

I think it would be easier to fit XICS into the device control api  
model than to fit MPIC into this model, not to mention what would  
happen if we later want to emulate some other type of device -- x86  
already has at least one non-irqchip emulated device (i8254).

> Part of the reason I went with the API that I did is that that was
> what was agreed on at KVM Forum (as far as I can tell, not having been
> at the meeting).  Your device API seems to be quite different to that.

I wasn't there either.  It's fine to have discussions at such events,  
but it should not preclude others from having input, or better ideas  
from being considered afterward.

> > >The XICS emulation supports up to 1048560 interrupt sources.
> > >Interrupt source numbers below 16 are reserved; 0 is used to mean  
> no
> > >interrupt and 2 is used for IPIs.  Internally these are  
> represented in
> > >blocks of 1024, called ICS (interrupt controller source) entities,  
> but
> > >that is not visible to userspace.
> > >
> > >Two other new ioctls allow userspace to control the interrupt
> > >sources.  The KVM_IRQCHIP_SET_SOURCES ioctl sets the priority,
> > >destination cpu, level/edge sensitivity and pending state of a  
> range
> > >of interrupt sources, creating them if they don't already exist.   
> The
> > >KVM_IRQCHIP_GET_SOURCES ioctl returns that information for a range  
> of
> > >interrupt sources (they are required to already exist).
> >
> > Why is it userspace's job to control these?  If you use  
> KVM_IRQ_PENDING
> 
> These are primarily there to support live migration.  For live
> migration, userspace needs to be able to extract the entire state of
> the virtual machine from the old guest, and then set the new guest to
> that exact same state.

In that case, the state it describes is insufficient for MPIC.

> We have live migration working in qemu for
> pSeries guests with in-kernel XICS emulation using this interface.
> If you're not doing live migration,

We don't yet, but would prefer not to assume that it'll never happen.

> > for interrupt injection, what if there's a race with the user  
> changing
> > other flags via MMIO?  Maybe this isn't an issue with XICS, but  
> this is
> > being presented as a generic API.
> 
> They're not used while the guest is running, as I said, but even if
> they were, there is appropriate locking in there to handle any races.

OK, KVM_IRQ_LINE is still used for interrupt injection.  I was hoping  
to avoid going through a standardized interface that forces a global  
interrupt numberspace.

How do MSIs get injected?

BTW, do you have any plans regarding irqfd?

> > >+struct kvm_irq_sources {
> > >+	__u32 irq;
> > >+	__u32 nr_irqs;
> > >+	__u64 __user *irqbuf;
> > >+};
> >
> > Please no pointers in UAPI -- this would require a compat wrapper  
> with
> > 32-bit user and 64-bit kernel.
> 
> Hmmm, you're right.  I suppose it will have to be a fixed-size buffer,

It doesn't need to be a fixed size buffer.  You can still have  
pointers, but they need to be represented as a plain "__u64" with users  
casting the pointer.

> > >+/* irqbuf entries are laid out like this: */
> > >+#define KVM_IRQ_SERVER_SHIFT	0
> > >+#define KVM_IRQ_SERVER_MASK	0xffffffffULL
> > >+#define KVM_IRQ_PRIORITY_SHIFT	32
> > >+#define KVM_IRQ_PRIORITY_MASK	0xff
> > >+#define KVM_IRQ_LEVEL_SENSITIVE	(1ULL << 40)
> > >+#define KVM_IRQ_MASKED		(1ULL << 41)
> > >+#define KVM_IRQ_PENDING		(1ULL << 42)
> >
> > What does "server" mean?  Do you mean "laid out like this for XICS"?
> 
> Sorry, I should have made that "destination" rather than "server".
> You're right, "server" is confusing, but it just means "where the
> interrupt is sent to be handled".  It has nothing particularly to do
> with "server" computers.

Right, I was aware of the IBM terminology here -- just thought it  
wasn't appropriate in a generic interface.

What about interrupt controllers that allow multiple destinations?   
More than 256 priorities?  Different "levels" of output (normal,  
critical, machine check)?  Programmable vector numbers?  Active  
high/low control?

I just don't think irqchip state can be sanely made generic.

> > Let's please have a clean separation between what is generic and  
> what is
> > implementation-specific.
> 
> I believe that the interface is pretty cleanly generic - the model is
> a set of interrupt sources and some per-vcpu state, with priorities to
> decide which interrupts get delivered when.  That describes the basics
> of just about any SMP-capable interrupt controller, including MPIC.

The per-vcpu state isn't even part of this AFAICT.  It's an  
XICS-specific ONE_REG -- which is fine, but all that's left of the  
"generic" API is the get/set sources which is an imperfect match to our  
per-IRQ state and it's not clear how an implementation should extend it.

It would be straightforward to map it to the device control api by  
having XICS declare an attribute group that corresponds to IRQ sources  
(like KVM_DEV_MPIC_GRP_IRQ_ACTIVE) and the attribute data be what  
you've defined for kvm_irq_sources.  Or if you want to preserve the  
"set multiple IRQs at once" approach, you could have an attribute that  
takes exactly a kvm_irq_sources struct, though it might be better to  
have a generic batched attribute set/get ioctl (as has been proposed  
for ONE_REG if the need arises).

> MPIC would still need an extra interface for userspace to save and
> restore things like the timer registers at live migration time, and
> for userspace to configure the base address, MPIC version, etc., of
> course.

Yes, which pretty much means we need the device control api anyway --  
or MPIC-specific ioctls, which I wanted to avoid.

-Scott

  reply	other threads:[~2013-02-15 23:59 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-14 23:59 [PATCH 0/9] In-kernel XICS interrupt controller emulation Paul Mackerras
2013-02-14 23:59 ` [PATCH 1/9] KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls Paul Mackerras
2013-03-21  8:52   ` Alexander Graf
2013-04-04  5:37     ` Paul Mackerras
2013-04-04  9:49       ` Alexander Graf
2013-04-04 22:38         ` Paul Mackerras
2013-04-19 15:16           ` Alexander Graf
2013-02-15  0:00 ` [PATCH 2/9] KVM: PPC: Remove unused argument to kvmppc_core_dequeue_external Paul Mackerras
2013-03-21  8:58   ` Alexander Graf
2013-02-15  0:01 ` [PATCH 3/9] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller Paul Mackerras
2013-02-15 20:05   ` Scott Wood
2013-02-15 23:18     ` Paul Mackerras
2013-02-15 23:59       ` Scott Wood [this message]
2013-02-16  2:56         ` Paul Mackerras
2013-02-16  3:57           ` Scott Wood
2013-02-16  4:51             ` Paul Mackerras
2013-02-18 22:43               ` Scott Wood
2013-02-20  0:41                 ` Paul Mackerras
2013-02-20  1:01                   ` Scott Wood
2013-02-20 19:58           ` Marcelo Tosatti
2013-02-21  0:20             ` Scott Wood
2013-02-21  1:09               ` Marcelo Tosatti
2013-02-21  1:45                 ` Scott Wood
2013-02-24 14:08               ` Gleb Natapov
2013-02-25  0:59             ` Paul Mackerras
2013-03-21  9:20               ` Alexander Graf
2013-02-15  0:01 ` [PATCH 4/9] KVM: PPC: Book3S: Generalize interfaces to interrupt controller emulation Paul Mackerras
2013-02-15  0:02 ` [PATCH 5/9] KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state Paul Mackerras
2013-02-15  0:02 ` [PATCH 6/9] KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM Paul Mackerras
2013-02-15  0:03 ` [PATCH 7/9] KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation Paul Mackerras
2013-02-15  0:03 ` [PATCH 8/9] KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts Paul Mackerras
2013-02-15  0:04 ` [PATCH 9/9] KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls Paul Mackerras

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1360972751.6960.7@snotra \
    --to=scottwood@freescale.com \
    --cc=agraf@suse.de \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).