qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ppc: spapr: Fix device tree entries in absence of XIVE native mode
@ 2023-06-30  5:30 Gautam Menghani
  2023-06-30  5:59 ` Cédric Le Goater
  2023-06-30  6:51 ` Greg Kurz
  0 siblings, 2 replies; 4+ messages in thread
From: Gautam Menghani @ 2023-06-30  5:30 UTC (permalink / raw)
  To: danielhb413, harshpb, clg, david, groug
  Cc: Gautam Menghani, qemu-ppc, qemu-devel, fbarrat

Currently, XIVE native exploitation mode is not supported in nested
guests. When we boot up a nested guest on PowerNV platform, we observe 
the following entries in the device tree of nested guest:

```
device_type = "power-ivpe";
compatible = "ibm,power-ivpe";
```

But as per LoPAR section B.5.9[1], these entries should only be present
when XIVE native exploitation mode is being used. Presently, there is no 
support for nested virtualization in the context of XIVE, and hence, DT 
shouldn't advertise support for XIVE interrupt controller to a nested guest. 

Also, according to the present behaviour, when we boot a nested KVM
guest, the following QEMU warnings are reported	:
```
Calling ibm,client-architecture-support...qemu-system-ppc64: warning: 
kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present 
for KVM
Falling back to kernel-irqchip=off
.
.
.
[    0.000000][    T0] xive: Using IRQ range [0-0]
[    0.000000][    T0] xive: Interrupt handling initialized with spapr backend
[    0.000000][    T0] xive: Using priority 6 for all interrupts
[    0.000000][    T0] xive: Using 64kB queues
```

With this patch, the above warnings are no longer observed in nested guest's 
dmesg and also the device tree contains the following entries:
```
device_type = "PowerPC-External-Interrupt-Presentation";
compatible = "IBM,ppc-xicp";
```

Also add an additional check to handle the scenarios where
ic-mode=<mode> is explicitly specified by user - make the code error out
when XIVE native capability is not there and user specifies
ic-mode=xive.

Testing:
1. This patch has been tested on a P9 PowerNV machine by spinning up both a
KVM guest and nested KVM guest. The guest can use XIVE native mode just fine 
with correct DT entries and for nested guest, interrupt emulation is being used 
and the DT contains correct entries.

2. This patch also has been tested on KVM on PowerVM platform. In this
scenario, we can boot up a KVM guest on top of a Power Hypervisor guest.
Kernel patches - lore.kernel.org/linuxppc-dev/20230605064848.12319-1-jpn@linux.vnet.ibm.com
QEMU tree to test - github.com/mikey/qemu/tree/kvm-papr

[1] : https://files.openpower.foundation/s/ZmtZyCGiJ2oJHim

Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
---
 hw/ppc/spapr.c     |  2 +-
 hw/ppc/spapr_irq.c | 14 +++++++++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 54dbfd7fe9..6434742369 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2840,7 +2840,7 @@ static void spapr_machine_init(MachineState *machine)
     spapr_ovec_set(spapr->ov5, OV5_DRMEM_V2);
 
     /* advertise XIVE on POWER9 machines */
-    if (spapr->irq->xive) {
+    if (kvmppc_has_cap_xive() && spapr->irq->xive) {
         spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
     }
 
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index a0d1e1298e..856bba042a 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -20,6 +20,7 @@
 #include "hw/qdev-properties.h"
 #include "cpu-models.h"
 #include "sysemu/kvm.h"
+#include "kvm_ppc.h"
 
 #include "trace.h"
 
@@ -294,6 +295,7 @@ uint32_t spapr_irq_nr_msis(SpaprMachineState *spapr)
 void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
 {
     SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+    bool cap_xive = kvmppc_has_cap_xive();
 
     if (kvm_enabled() && kvm_kernel_irqchip_split()) {
         error_setg(errp, "kernel_irqchip split mode not supported on pseries");
@@ -304,6 +306,16 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
         return;
     }
 
+    /*
+     * Check for valid ic-mode - XIVE native won't work if hypervisor doesn't
+     * have support
+     */
+    if (!cap_xive && !spapr->irq->xics) {
+        error_setg(errp,
+            "XIVE native mode not available, don't use ic-mode=xive");
+        return;
+    }
+
     /* Initialize the MSI IRQ allocator. */
     spapr_irq_msi_init(spapr);
 
@@ -323,7 +335,7 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
         spapr->ics = ICS_SPAPR(obj);
     }
 
-    if (spapr->irq->xive) {
+    if (cap_xive && spapr->irq->xive) {
         uint32_t nr_servers = spapr_max_server_number(spapr);
         DeviceState *dev;
         int i;
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] ppc: spapr: Fix device tree entries in absence of XIVE native mode
  2023-06-30  5:30 [PATCH] ppc: spapr: Fix device tree entries in absence of XIVE native mode Gautam Menghani
@ 2023-06-30  5:59 ` Cédric Le Goater
  2023-06-30  6:51 ` Greg Kurz
  1 sibling, 0 replies; 4+ messages in thread
From: Cédric Le Goater @ 2023-06-30  5:59 UTC (permalink / raw)
  To: Gautam Menghani, danielhb413, harshpb, david, groug
  Cc: qemu-ppc, qemu-devel, fbarrat

On 6/30/23 07:30, Gautam Menghani wrote:
> Currently, XIVE native exploitation mode is not supported in nested
> guests. When we boot up a nested guest on PowerNV platform, we observe
> the following entries in the device tree of nested guest:
> 
> ```
> device_type = "power-ivpe";
> compatible = "ibm,power-ivpe";
> ```
> 
> But as per LoPAR section B.5.9[1], these entries should only be present
> when XIVE native exploitation mode is being used. Presently, there is no
> support for nested virtualization in the context of XIVE, and hence, DT
> shouldn't advertise support for XIVE interrupt controller to a nested guest.
> 
> Also, according to the present behaviour, when we boot a nested KVM
> guest, the following QEMU warnings are reported	:
> ```
> Calling ibm,client-architecture-support...qemu-system-ppc64: warning:
> kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present
> for KVM
> Falling back to kernel-irqchip=off

Yes. A QEMU/KVM principle is to be closest as possible to HW using
emulation if necessary, this to be compatible with a platform which
would have the required HW feature. The reason behind is migration.

This is the case on L1s running on Boston systems which have an
old FW not supporting XIVE migration. pseries machines runs with
and emulated XIVE IC. It allowed to migrate such a guest from a
Boston to a Witherspoon where XIVE HW was supported in guests.
KVM-on-pseries is just another platform where XIVE can not use
the KVM backend and needs to run under partially emulation.

I see no reason to change.

Thanks,

C.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] ppc: spapr: Fix device tree entries in absence of XIVE native mode
  2023-06-30  5:30 [PATCH] ppc: spapr: Fix device tree entries in absence of XIVE native mode Gautam Menghani
  2023-06-30  5:59 ` Cédric Le Goater
@ 2023-06-30  6:51 ` Greg Kurz
  2023-07-25 11:40   ` Gautam Menghani
  1 sibling, 1 reply; 4+ messages in thread
From: Greg Kurz @ 2023-06-30  6:51 UTC (permalink / raw)
  To: Gautam Menghani
  Cc: danielhb413, harshpb, clg, david, qemu-ppc, qemu-devel, fbarrat

On Fri, 30 Jun 2023 11:00:56 +0530
Gautam Menghani <gautam@linux.ibm.com> wrote:

> Currently, XIVE native exploitation mode is not supported in nested
> guests. When we boot up a nested guest on PowerNV platform, we observe 
> the following entries in the device tree of nested guest:
> 
> ```
> device_type = "power-ivpe";
> compatible = "ibm,power-ivpe";
> ```
> 
> But as per LoPAR section B.5.9[1], these entries should only be present
> when XIVE native exploitation mode is being used. Presently, there is no 
> support for nested virtualization in the context of XIVE, and hence, DT 
> shouldn't advertise support for XIVE interrupt controller to a nested guest. 
> 
> Also, according to the present behaviour, when we boot a nested KVM
> guest, the following QEMU warnings are reported	:
> ```
> Calling ibm,client-architecture-support...qemu-system-ppc64: warning: 
> kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present 
> for KVM
> Falling back to kernel-irqchip=off

This is expected since the XIVE native mode is only available
on bare metal... arguably the warning could be silenced but
it was deemed informative

> .
> .
> .
> [    0.000000][    T0] xive: Using IRQ range [0-0]
> [    0.000000][    T0] xive: Interrupt handling initialized with spapr backend
> [    0.000000][    T0] xive: Using priority 6 for all interrupts
> [    0.000000][    T0] xive: Using 64kB queues
> ```
> 
> With this patch, the above warnings are no longer observed in nested guest's 
> dmesg and also the device tree contains the following entries:
> ```
> device_type = "PowerPC-External-Interrupt-Presentation";
> compatible = "IBM,ppc-xicp";
> ```
> 

Hmm... this is a behavior change : same QEMU invocation that would run XIVE
before will now run XICS... :-\

> Also add an additional check to handle the scenarios where
> ic-mode=<mode> is explicitly specified by user - make the code error out
> when XIVE native capability is not there and user specifies
> ic-mode=xive.
> 
> Testing:
> 1. This patch has been tested on a P9 PowerNV machine by spinning up both a
> KVM guest and nested KVM guest. The guest can use XIVE native mode just fine 
> with correct DT entries and for nested guest, interrupt emulation is being used 
> and the DT contains correct entries.
> 
> 2. This patch also has been tested on KVM on PowerVM platform. In this
> scenario, we can boot up a KVM guest on top of a Power Hypervisor guest.
> Kernel patches - lore.kernel.org/linuxppc-dev/20230605064848.12319-1-jpn@linux.vnet.ibm.com
> QEMU tree to test - github.com/mikey/qemu/tree/kvm-papr
> 
> [1] : https://files.openpower.foundation/s/ZmtZyCGiJ2oJHim
> 
> Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
> ---
>  hw/ppc/spapr.c     |  2 +-
>  hw/ppc/spapr_irq.c | 14 +++++++++++++-
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 54dbfd7fe9..6434742369 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2840,7 +2840,7 @@ static void spapr_machine_init(MachineState *machine)
>      spapr_ovec_set(spapr->ov5, OV5_DRMEM_V2);
>  
>      /* advertise XIVE on POWER9 machines */
> -    if (spapr->irq->xive) {
> +    if (kvmppc_has_cap_xive() && spapr->irq->xive) {

Nak. The behavior of the machine is only dictated by the
command line arguments passed to QEMU, not by the host
capabilities. This is to guarantee migrability.

If the machine is expected to expose XIVE but the host
doesn't support it, e.g. boston P9 machines, then QEMU
falls back on emulating it.

>          spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
>      }
>  
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index a0d1e1298e..856bba042a 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -20,6 +20,7 @@
>  #include "hw/qdev-properties.h"
>  #include "cpu-models.h"
>  #include "sysemu/kvm.h"
> +#include "kvm_ppc.h"
>  
>  #include "trace.h"
>  
> @@ -294,6 +295,7 @@ uint32_t spapr_irq_nr_msis(SpaprMachineState *spapr)
>  void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
>  {
>      SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +    bool cap_xive = kvmppc_has_cap_xive();
>  
>      if (kvm_enabled() && kvm_kernel_irqchip_split()) {
>          error_setg(errp, "kernel_irqchip split mode not supported on pseries");
> @@ -304,6 +306,16 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
>          return;
>      }
>  
> +    /*
> +     * Check for valid ic-mode - XIVE native won't work if hypervisor doesn't
> +     * have support
> +     */
> +    if (!cap_xive && !spapr->irq->xics) {
> +        error_setg(errp,
> +            "XIVE native mode not available, don't use ic-mode=xive");
> +        return;
> +    }
> +
>      /* Initialize the MSI IRQ allocator. */
>      spapr_irq_msi_init(spapr);
>  
> @@ -323,7 +335,7 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
>          spapr->ics = ICS_SPAPR(obj);
>      }
>  
> -    if (spapr->irq->xive) {
> +    if (cap_xive && spapr->irq->xive) {
>          uint32_t nr_servers = spapr_max_server_number(spapr);
>          DeviceState *dev;
>          int i;



-- 
Greg


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] ppc: spapr: Fix device tree entries in absence of XIVE native mode
  2023-06-30  6:51 ` Greg Kurz
@ 2023-07-25 11:40   ` Gautam Menghani
  0 siblings, 0 replies; 4+ messages in thread
From: Gautam Menghani @ 2023-07-25 11:40 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Gautam Menghani, danielhb413, harshpb, clg, david, qemu-ppc,
	qemu-devel, fbarrat

Ok noted, thanks for the feedback Greg and Cedric.

Thanks,
Gautam


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-07-25 13:19 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-30  5:30 [PATCH] ppc: spapr: Fix device tree entries in absence of XIVE native mode Gautam Menghani
2023-06-30  5:59 ` Cédric Le Goater
2023-06-30  6:51 ` Greg Kurz
2023-07-25 11:40   ` Gautam Menghani

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).