From: David Gibson <david@gibson.dropbear.id.au>
To: Greg Kurz <groug@kaod.org>
Cc: qemu-ppc@nongnu.org, "Cédric Le Goater" <clg@kaod.org>,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] spapr: Don't use the "dual" interrupt controller mode with an old hypervisor
Date: Wed, 12 Jun 2019 11:29:36 +1000 [thread overview]
Message-ID: <20190612012935.GF3998@umbus.fritz.box> (raw)
In-Reply-To: <20190607114950.3247af33@bahia.lab.toulouse-stg.fr.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 6939 bytes --]
On Fri, Jun 07, 2019 at 11:49:50AM +0200, Greg Kurz wrote:
65;5603;1c> On Fri, 7 Jun 2019 10:17:58 +0200
> Cédric Le Goater <clg@kaod.org> wrote:
>
> > On 07/06/2019 02:19, David Gibson wrote:
> > > On Thu, Jun 06, 2019 at 07:08:59PM +0200, Greg Kurz wrote:
> > >> If KVM is too old to support XIVE native exploitation mode, we might end
> > >> up using the emulated XIVE after CAS. This is sub-optimal if KVM in-kernel
> > >> XICS is available, which is the case most of the time.
> > >
> > > This is intentional. A predictable guest environment trumps performance.
> >
> > I don't agree.
> >
> > If the user does not specify any specific interrupt mode, we should favor
> > the faster one.
> >
>
> This all boils down to the semantics of "dual"... "XICS AND XIVE" or "any
> of XICS or XIVE" ?
It means xics and xive are available, guest gets to choose.
> We already have a precedent when using pre-power9 compat mode. If the
> user passes ic-mode=dual,max-compat-cpu=power8, we internally convert
> "dual" to act as "xics". The rationale is that a guest running in
> power8 architected mode isn't supposed to know about XIVE at all.
>
> Should we forbid this config and exit QEMU instead ?
Uh.. yes, that would make sense, actually. However
max-compat-cpu=power8 without an explicit ic-mode=xive or ic-mode=dual
should act as ic-mode=xics. The difference here is that it's an
explicitly user specified constraint, rather than a host implied
constraint.
> > Here is the current matrix (with this patch) for guests running on an
> > old KVM, that is without KVM XIVE support. Let's discuss on what we
> > want.
> >
> > kernel_irqchip
> >
> > (default)
> > ic-mode allowed off on
> >
> > dual XICS KVM XICS emul.(3) XICS KVM (default mode)
> > xics XICS KVM XICS emul. XICS KVM
> > xive XIVE emul.(1) XIVE emul. QEMU failure (2)
> >
> >
> > (1) QEMU warns with "warning: kernel_irqchip requested but unavailable:
> > IRQ_XIVE capability must be present for KVM"
> > (2) QEMU fails with "kernel_irqchip requested but unavailable:
> > IRQ_XIVE capability must be present for KVM"
> > (3) That is wrong I think, we should get XIVE emulated.
> >
>
> This is the logical consequence of turning "dual" into "xics".
>
> >
> > what you would want is XIVE emulation when ic-mode=dual and
> > kernel_irqchip=allowed, which is the behavior with this patch (but there
>
> I guess you mean s/with/without
>
> > are reboot bugs)
> >
>
> Yeah. If the semantics of "dual" is to always advertise XICS AND XIVE, and we
> keep the current fallback behavior, then we need each implementation to have
> proper init/teardown support as well as proper rollback on error paths.
>
> This is definitely not the case: rollback is missing in both in-kernel
> XICS and XIVE code and the emulated XICS and XIVE don't have teardown.
>
> This can generate a variety of bugs, including QEMU crashes... the old KVM case
> is just one of them, but there might be others.
>
> >
> > >> Also, an old KVM may not allow to destroy and re-create the KVM XICS, which
> > >> is precisely what "dual" does during machine reset. This causes QEMU to try
> > >> to switch to emulated XICS and to crash because RTAS call de-registration
> > >> isn't handled correctly. We could possibly fix that, but again we would
> > >> still end up with an emulated XICS or XIVE.
> > >
> > > Ugh, that's a problem.
> >
> > Yes. It's another problem around the way we cleanup the allocated resources.
> > It should be another patch.
> >
>
> Yeah, probably many other patches...
>
> > >
> > >> "dual" is definitely not a good choice with older KVMs. Internally force
> > >> XICS when we detect this.
> > >
> > > But this is not an acceptable solution. Silently changing the guest
> > > visible environment based on host capabilities is never ok.
> >
> > If the host (KVM) doesn't have a capability, what is the point of trying
> > to use it if we can do better. I know you are considering KVM/QEMU as a
> > whole but who would run with kernel_irqchip=off ?
> >
>
> The real problem is when you don't pass kernel_irqchip at all. Combined
> with "dual", this gives a fair number of cases. Do we want to support
> all possible transitions ?
>
> > > We must
> > > either give the guest environment that the user has requested, or fail
> > > outright.
> >
> > 'dual' mode means both and the user is not requesting XIVE. We are changing
> > the priority of choices from :
> >
> > 1. KVM XIVE
> > 2. QEMU XIVE
> > 3. KVM XICS
> > 4. QEMU XICS
> >
> > to:
> >
> > 1. KVM XIVE
> > 2. KVM XICS
> > 3. QEMU XIVE
> > 4. QEMU XICS
> >
> > which is better I think.
> >
>
> Using KVM XICS is indeed better than QEMU XIVE... but IIUC what David
> is saying, kernel-irqchip semantics are:
>
> - on: user favors performance, at the expense of a reduced spectrum
> of usable hosts
>
> - absent: user favors being able to start the VM on a wider spectrum
> of hosts, at the possible expense of performance
Correct.
> - off: user wants the VM to start on any host, doesn't care for
> performance at all
Well, TBH, the main use of kernel-irqchip=off is for debugging and testing.
>
> So the only way to have 1. KVM XIVE 2.KVM XICS would be to pass "on".
>
> > C.
> >
> >
> > >
> > >>
> > >> Signed-off-by: Greg Kurz <groug@kaod.org>
> > >> ---
> > >> hw/ppc/spapr_irq.c | 10 ++++++++++
> > >> 1 file changed, 10 insertions(+)
> > >>
> > >> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> > >> index 3156daf09381..d788bd662a7a 100644
> > >> --- a/hw/ppc/spapr_irq.c
> > >> +++ b/hw/ppc/spapr_irq.c
> > >> @@ -18,6 +18,7 @@
> > >> #include "hw/ppc/xics_spapr.h"
> > >> #include "cpu-models.h"
> > >> #include "sysemu/kvm.h"
> > >> +#include "kvm_ppc.h"
> > >>
> > >> #include "trace.h"
> > >>
> > >> @@ -668,6 +669,15 @@ static void spapr_irq_check(SpaprMachineState *spapr, Error **errp)
> > >> return;
> > >> }
> > >> }
> > >> +
> > >> + /*
> > >> + * KVM may be too old to support XIVE, in which case we'd rather try
> > >> + * to use the in-kernel XICS instead of the emulated XIVE.
> > >> + */
> > >> + if (kvm_enabled() && !kvmppc_has_cap_xive() &&
> > >> + spapr->irq == &spapr_irq_dual) {
> > >> + spapr->irq = &spapr_irq_xics;
> > >> + }
> > >> }
> > >>
> > >> /*
> > >>
> > >
> >
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2019-06-12 2:07 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-06 17:08 [Qemu-devel] [PATCH] spapr: Don't use the "dual" interrupt controller mode with an old hypervisor Greg Kurz
2019-06-06 17:38 ` Cédric Le Goater
2019-06-07 0:19 ` David Gibson
2019-06-07 8:17 ` Cédric Le Goater
2019-06-07 8:27 ` Cédric Le Goater
2019-06-07 9:49 ` Greg Kurz
2019-06-12 1:29 ` David Gibson [this message]
2019-06-11 5:26 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190612012935.GF3998@umbus.fritz.box \
--to=david@gibson.dropbear.id.au \
--cc=clg@kaod.org \
--cc=groug@kaod.org \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).