qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@linux.ibm.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: aneesh.kumar@linux.ibm.com, danielhb413@gmail.com,
	qemu-devel@nongnu.org, npiggin@gmail.com, qemu-ppc@nongnu.org,
	clg@kaod.org
Subject: Re: [RFC PATCH 1/2] spapr: Report correct GTSE support via ov5
Date: Mon, 14 Mar 2022 19:10:10 -0300	[thread overview]
Message-ID: <87ee346v99.fsf@linux.ibm.com> (raw)
In-Reply-To: <YixlR+rLNZCsAA50@yekko>

David Gibson <david@gibson.dropbear.id.au> writes:

> On Tue, Mar 08, 2022 at 10:23:59PM -0300, Fabiano Rosas wrote:
>> QEMU reports MMU support to the guest via the ibm,architecture-vec-5
>> property of the /chosen node. Byte number 26 specifies Radix Table
>> Expansions, currently only GTSE (Guest Translation Shootdown
>> Enable). This feature determines whether the tlbie instruction (and
>> others) are HV privileged.
>> 
>> Up until now, we always reported GTSE=1 to guests. Even after the
>> support for GTSE=0 was added. As part of that support, a kernel
>> command line radix_hcall_invalidate=on was introduced that overrides
>> the GTSE value received via CAS. So a guest can run with GTSE=0 and
>> use the H_RPT_INVALIDATE hcall instead of tlbie.
>> 
>> In this scenario, having GTSE always set to 1 by QEMU leads to a crash
>> when running nested KVM guests because KVM does not allow a nested
>> hypervisor to set GTSE support for its nested guests. So a nested
>> guest always uses the same value for LPCR_GTSE as its HV. Since the
>> nested HV disabled GTSE, but the L2 QEMU always reports GTSE=1, we run
>> into a crash when:
>> 
>> L1 LPCR_GTSE=0
>> L2 LPCR_GTSE=0
>> L2 CAS GTSE=1
>> 
>> The nested guest will run 'tlbie' and crash because the HW looks at
>> LPCR_GTSE, which is clear.
>> 
>> Having GTSE disabled in the L1 and enabled in the L2 is not an option
>> because the whole purpose of GTSE is to disallow access to tlbie and
>> we cannot allow L1 to spawn L2s that can access features that L1
>> itself cannot.
>> 
>> We also cannot have the guest check the LPCR bit, because LPCR is
>> HV-privileged.
>> 
>> So this patch goes through the most intuitive route which is to have
>> QEMU ask KVM about GTSE support and advertise the correct value to the
>> guest. A new KVM_CAP_PPC_GTSE capability is being added.
>> 
>> TCG continues to always enable GTSE.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
>> ---
>>  hw/ppc/spapr.c       | 38 +++++++++++++++++++++++++++++++-------
>>  target/ppc/kvm.c     |  8 ++++++++
>>  target/ppc/kvm_ppc.h |  6 ++++++
>>  3 files changed, 45 insertions(+), 7 deletions(-)
>> 
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 4cc204f90d..3e95a1831f 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -971,7 +971,7 @@ static void spapr_dt_ov5_platform_support(SpaprMachineState *spapr, void *fdt,
>>          23, 0x00, /* XICS / XIVE mode */
>>          24, 0x00, /* Hash/Radix, filled in below. */
>>          25, 0x00, /* Hash options: Segment Tables == no, GTSE == no. */
>> -        26, 0x40, /* Radix options: GTSE == yes. */
>> +        26, 0x00, /* Radix options, filled in below. */
>>      };
>>  
>>      if (spapr->irq->xics && spapr->irq->xive) {
>> @@ -1000,10 +1000,16 @@ static void spapr_dt_ov5_platform_support(SpaprMachineState *spapr, void *fdt,
>>          } else {
>>              val[3] = 0x00; /* Hash */
>>          }
>> +
>> +        if (kvmppc_has_cap_gtse()) {
>> +            val[7] = 0x40 /* OV5_MMU_RADIX_GTSE */;
>> +        }
>>      } else {
>>          /* V3 MMU supports both hash and radix in tcg (with dynamic switching) */
>>          val[3] = 0xC0;
>> +        val[7] = 0x40 /* OV5_MMU_RADIX_GTSE */;
>>      }
>> +
>>      _FDT(fdt_setprop(fdt, chosen, "ibm,arch-vec-5-platform-support",
>>                       val, sizeof(val)));
>>  }
>> @@ -2824,14 +2830,32 @@ static void spapr_machine_init(MachineState *machine)
>>      /* Init numa_assoc_array */
>>      spapr_numa_associativity_init(spapr, machine);
>>  
>> -    if ((!kvm_enabled() || kvmppc_has_cap_mmu_radix()) &&
>> -        ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00, 0,
>> +    if (ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00, 0,
>>                                spapr->max_compat_pvr)) {
>> -        spapr_ovec_set(spapr->ov5, OV5_MMU_RADIX_300);
>> -        /* KVM and TCG always allow GTSE with radix... */
>> -        spapr_ovec_set(spapr->ov5, OV5_MMU_RADIX_GTSE);
>> +
>> +        /* TCG always supports Radix w/ GTSE */
>> +        if (!kvm_enabled()) {
>> +            spapr_ovec_set(spapr->ov5, OV5_MMU_RADIX_300);
>> +            spapr_ovec_set(spapr->ov5, OV5_MMU_RADIX_GTSE);
>
> Yeah, this is no good.  It's never ok to change guest visible
> behaviour depending on host properties (in this case whether it's KVM
> or not).  It messes up the invariants we need for migration, which
> require that the guest visible state depend only on the user
> configuration.
>
> The usual way to handle this is with a new capability, you can then
> change the default with the next machine version.

This particular problem is tricky. TCG cannot disable GTSE because it
does not support H_RPT_INVALIDATE. And older kernels that don't know
about the feature require GTSE.

KVM can afford to disable GTSE because we have a compatibility mechanism
(although a bit crooked): We can invert the logic for the KVM_CAP so
that the presence of KVM_CAP_PPC_GTSE_DISABLE would mean QEMU is allowed
to disable GTSE. Then:
  - older KVM + new QEMU would keep GTSE enabled;

  - older L1 guests are not affected because the host would report
    GTSE=1 with the KVM capability. By the time we decide to disable
    GTSE for L1 guests hopefully all older kernels will be out of use;

  - older nested guests:
    - if L1 runs with GTSE=1, are not affected;

    - if L1 disabled GTSE via kernel cmdline, are already broken (this
      bug). But they would go from crashing to being aborted* by QEMU
      (the guest asks for HPT in the lack of GTSE; nested KVM is radix
      only);

      * there are other broken cases which are fixed completely.

To satisfy TCG we could keep a spapr capability as ON and usually the
guest would pass cap-gtse=off when running with KVM. However this
doesn't work because this crash happens precisely because the nested
guest doesn't know that it needs to use cap-rpt-invalidate=on. Another
cap wouldn't help.

So I think the only way to have a spapr capability for this is if TCG
always defaults to ON and KVM always defaults to OFF. But then we would
be changing guest visible behaviour depending on host properties.



  reply	other threads:[~2022-03-14 22:11 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-09  1:23 [RFC PATCH 1/2] spapr: Report correct GTSE support via ov5 Fabiano Rosas
2022-03-09  1:24 ` [RFC PATCH 2/2] linux-headers: Add KVM_CAP_PPC_GTSE Fabiano Rosas
2022-03-09 19:09 ` [RFC PATCH 1/2] spapr: Report correct GTSE support via ov5 Daniel Henrique Barboza
2022-03-11 14:45   ` Fabiano Rosas
2022-03-10 19:51 ` Fabiano Rosas
2022-03-11  0:30   ` Daniel Henrique Barboza
2022-03-12  9:17 ` David Gibson
2022-03-14 22:10   ` Fabiano Rosas [this message]
2022-03-31  4:29     ` David Gibson
2022-04-01  7:01       ` Aneesh Kumar K.V
2022-04-01 15:50         ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ee346v99.fsf@linux.ibm.com \
    --to=farosas@linux.ibm.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=clg@kaod.org \
    --cc=danielhb413@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=npiggin@gmail.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).