qemu-devel.nongnu.org archive mirror
From: David Gibson <david@gibson.dropbear.id.au>
To: "Cédric Le Goater" <clg@kaod.org>
Cc: qemu-ppc@nongnu.org, Greg Kurz <groug@kaod.org>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] spapr: Don't use the "dual" interrupt controller mode with an old hypervisor
Date: Tue, 11 Jun 2019 15:26:07 +1000
Message-ID: <20190611052607.GC3998@umbus.fritz.box>
In-Reply-To: <2d02c8d1-d9f4-efd0-5059-6ca24e622107@kaod.org>

On Fri, Jun 07, 2019 at 10:17:58AM +0200, Cédric Le Goater wrote:
> On 07/06/2019 02:19, David Gibson wrote:
> > On Thu, Jun 06, 2019 at 07:08:59PM +0200, Greg Kurz wrote:
> >> If KVM is too old to support XIVE native exploitation mode, we might end
> >> up using the emulated XIVE after CAS. This is sub-optimal if KVM in-kernel
> >> XICS is available, which is the case most of the time.
> > 
> > This is intentional.  A predictable guest environment trumps performance.
> 
> I don't agree. 
> 
> If the user does not specify any specific interrupt mode, we should favor 
> the faster one.

In principle that sounds good, but it doesn't work that way, and can't
with anything resembling the current model.  The user can't specify a
specific interrupt mode to be used, they can only specify what modes
will be available to the guest, and "dual" means *both* are available.

Otherwise we will get inconsistent behaviour - potentially triggering
different guest bugs - on what's allegedly the same qemu
configuration.  It gets worse if you consider edge cases like migrating
between the initial machine setup and CAS.

> Here is the current matrix (with this patch) for guests running on an 
> old KVM, that is without KVM XIVE support. Let's discuss on what we
> want. 
> 
>                         kernel_irqchip
> 
>            (default) 
> ic-mode     allowed           off            on 
> 
> dual        XICS KVM       XICS emul.(3)   XICS KVM         (default mode)

This should be as per "xive" - emul, emul, error (assuming POWER9 and
a xive-capable guest).  Otherwise we're presenting different
environments to the guest based on something that's not visible in the
qemu parameters.

We've done that a bunch of times in the past, and many of them have
bitten us in the arse later on.
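
To make the row above concrete, here is a minimal sketch in Python (it
is not QEMU code) of the backend a guest that has negotiated XIVE would
get, whether through ic-mode=xive or through ic-mode=dual.  The function
name and the "XIVE KVM" result for a XIVE-capable KVM are assumptions
made for illustration; the old-KVM results follow the "xive" row of the
matrix.

```python
def xive_backend(kernel_irqchip: str, kvm_has_xive: bool) -> str:
    """Backend for a guest using XIVE (hypothetical helper, not a QEMU API)."""
    if kernel_irqchip not in ("allowed", "off", "on"):
        raise ValueError(f"unknown kernel_irqchip value: {kernel_irqchip}")
    if kernel_irqchip == "off":
        return "XIVE emul."
    if kvm_has_xive:
        # Assumption: a XIVE-capable KVM is used whenever it is permitted.
        return "XIVE KVM"
    if kernel_irqchip == "on":
        # The user insisted on an in-kernel irqchip that does not exist.
        raise RuntimeError("kernel_irqchip requested but unavailable")
    # "allowed": fall back to emulation (QEMU prints a warning here).
    return "XIVE emul."


# Old KVM without XIVE support: emul, emul, error.
for irqchip in ("allowed", "off", "on"):
    try:
        print(irqchip, "->", xive_backend(irqchip, kvm_has_xive=False))
    except RuntimeError as err:
        print(irqchip, "-> error:", err)
```

The point of the sketch is that the interrupt model the guest sees is
fixed by the qemu parameters; the host only decides whether that model
is accelerated, emulated, or refused.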

> xics        XICS KVM       XICS emul.      XICS KVM    
> xive        XIVE emul.(1)  XIVE emul.     QEMU failure (2)
> 
> 
> (1) QEMU warns with "warning: kernel_irqchip requested but unavailable: 
>     IRQ_XIVE capability must be present for KVM" 
> (2) QEMU fails with "kernel_irqchip requested but unavailable: 
>     IRQ_XIVE capability must be present for KVM" 
> (3) That is wrong I think, we should get XIVE emulated.
> 
> 
> what you would want is XIVE emulation when ic-mode=dual and 
> kernel_irqchip=allowed, which is the behavior with this patch (but there
> are reboot bugs)
> 
>  
> >> Also, an old KVM may not allow to destroy and re-create the KVM XICS, which
> >> is precisely what "dual" does during machine reset. This causes QEMU to try
> >> to switch to emulated XICS and to crash because RTAS call de-registration
> >> isn't handled correctly. We could possibly fix that, but again we would
> >> still end up with an emulated XICS or XIVE.
> > 
> > Ugh, that's a problem.
> 
> Yes. It's another problem around the way we cleanup the allocated resources.
> It should be another patch.

Agreed.

> >> "dual" is definitely not a good choice with older KVMs. Internally force
> >> XICS when we detect this.
> > 
> > But this is not an acceptable solution.  Silently changing the guest
> > visible environment based on host capabilities is never ok. 
> 
> If the host (KVM) doesn't have a capability, what is the point of trying 
> to use it if we can do better. I know you are considering KVM/QEMU as a
> whole but who would run with kernel_irqchip=off ?

PAPR option negotiation via CAS is already confusing, with just the
guest and qemu as participants.  If the host kernel is also a
participant, it's just asking for hard-to-reproduce bugs.

> > We must
> > either give the guest environment that the user has requested, or fail
> > outright.
> 
> 'dual' mode means both and the user is not requesting XIVE. We are changing 
> the priority of choices from :
> 
>  1. KVM XIVE
>  2. QEMU XIVE
>  3. KVM XICS
>  4. QEMU XICS
> 
> to:
> 
>  1. KVM XIVE
>  2. KVM XICS
>  3. QEMU XIVE
>  4. QEMU XICS
> 
> which is better I think.

If we were designing the whole environment from scratch, maybe.  But
the way things work at the moment, negotiation between qemu and the
host kernel and qemu and the guest kernel must be separate phases.
That makes the priority you describe impossible.
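
A rough sketch of the two separate phases, in Python rather than QEMU
code, may help show why.  Every name here is hypothetical, and the
assumptions are that kernel_irqchip=allowed and that a XIVE-capable
guest picks XIVE whenever it is offered.

```python
# Interrupt models each ic-mode makes available to the guest.
OFFERED = {"xics": ("XICS",), "xive": ("XIVE",), "dual": ("XICS", "XIVE")}


def host_phase(offered, host_kvm_models):
    """qemu <-> host kernel: pick a backend for every offered model."""
    return {m: ("KVM" if m in host_kvm_models else "QEMU") for m in offered}


def guest_phase(offered, guest_supports_xive):
    """qemu <-> guest (CAS): the guest picks a model, blind to the backends."""
    if "XIVE" in offered and guest_supports_xive:
        return "XIVE"
    if "XICS" in offered:
        return "XICS"
    raise RuntimeError("guest cannot use any offered interrupt model")


def negotiate(ic_mode, guest_supports_xive, host_kvm_models):
    offered = OFFERED[ic_mode]
    backends = host_phase(offered, host_kvm_models)
    model = guest_phase(offered, guest_supports_xive)
    return model, backends[model]


# Same qemu configuration and guest, old versus new host kernel.
print(negotiate("dual", True, {"XICS"}))          # old KVM
print(negotiate("dual", True, {"XICS", "XIVE"}))  # new KVM
```

In this sketch guest_phase() never sees the host's capabilities, so the
model chosen is the same on both hosts and only the backend differs.
Ranking "KVM XICS" ahead of "QEMU XIVE" would require passing the
backends into the guest phase, which is exactly the coupling between
the two negotiations that the current design does not have.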

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
