qemu-devel.nongnu.org archive mirror
* [PATCH 00/15] Nested PAPR API (KVM on PowerVM)
@ 2023-09-06  4:33 Harsh Prateek Bora
  2023-09-06  4:33 ` [PATCH RESEND 01/15] ppc: spapr: Introduce Nested PAPR API related macros Harsh Prateek Bora
                   ` (14 more replies)
  0 siblings, 15 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

There is an existing Nested-HV API to enable nested guests on powernv
machines. However, that API is not supported on pseries/PowerVM LPARs.
This patch series implements the hcall interfaces required to enable
nested guests with KVM on PowerVM.
Unlike Nested-HV, with this API the entire L2 state is retained by L0
across guest entry/exit, and a pre-defined Guest State Buffer (GSB)
format is used to communicate guest state between L1 and L2 via L0.

L0 here refers to PHYP/PowerVM, or a QEMU TCG L0 launched with the
newly introduced option cap-nested-papr=true (refer to patch 5/15).
L1 refers to the LPAR host on PowerVM, or Linux booted on QEMU TCG with
the above-mentioned option cap-nested-papr=true.
L2 refers to the nested guest running on top of L1 using KVM.
No software changes are needed for QEMU running in L1 Linux, nor for
the L2 kernel.

There is a Linux kernel patch series that enables support for Nested
PAPR in L1; it can be found at the URL below:

Linux Kernel RFC PATCH v4:
- https://lore.kernel.org/linuxppc-dev/20230905034658.82835-1-jniethe5@gmail.com/

For more details, refer to the documentation in either patch series.

There are scripts available to assist in setting up an environment for
testing nested guests at https://github.com/mikey/kvm-powervm-test

Thanks to Michael Neuling, Shivaprasad Bhat, Kautuk Consul, Vaibhav Jain
and Jordan Niethe.

PS: This is a resend of the patch series after rebasing onto upstream master.

Harsh Prateek Bora (15):
  ppc: spapr: Introduce Nested PAPR API related macros
  ppc: spapr: Add new/extend structs to support Nested PAPR API
  ppc: spapr: Use SpaprMachineStateNested's ptcr instead of nested_ptcr
  ppc: spapr: Start using nested.api for nested kvm-hv api
  ppc: spapr: Introduce cap-nested-papr for nested PAPR API
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_GET_CAPABILITIES
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_SET_CAPABILITIES
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE_VCPU
  ppc: spapr: Initialize the GSB Elements lookup table.
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_[GET|SET]_STATE
  ppc: spapr: Use correct source for parttbl info for nested PAPR API.
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_RUN_VCPU
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_DELETE
  ppc: spapr: Document Nested PAPR API

 docs/devel/nested-papr.txt      |  500 ++++++++++
 hw/ppc/spapr.c                  |   28 +-
 hw/ppc/spapr_caps.c             |   50 +
 hw/ppc/spapr_hcall.c            |    1 +
 hw/ppc/spapr_nested.c           | 1504 +++++++++++++++++++++++++++++--
 include/hw/ppc/ppc.h            |    2 +
 include/hw/ppc/spapr.h          |   35 +-
 include/hw/ppc/spapr_cpu_core.h |    7 +-
 include/hw/ppc/spapr_nested.h   |  378 ++++++++
 target/ppc/cpu.h                |    2 +
 10 files changed, 2433 insertions(+), 74 deletions(-)
 create mode 100644 docs/devel/nested-papr.txt

-- 
2.39.3



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH RESEND 01/15] ppc: spapr: Introduce Nested PAPR API related macros
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-06 23:48   ` Nicholas Piggin
  2023-09-06  4:33 ` [PATCH RESEND 02/15] ppc: spapr: Add new/extend structs to support Nested PAPR API Harsh Prateek Bora
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

Add new macros for the new hypercall opcodes, their return codes,
Guest State Buffer (GSB) element IDs, and a few registers, which shall
be used in subsequent patches to support the Nested PAPR API.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 include/hw/ppc/spapr.h        |  23 ++++-
 include/hw/ppc/spapr_nested.h | 186 ++++++++++++++++++++++++++++++++++
 target/ppc/cpu.h              |   2 +
 3 files changed, 209 insertions(+), 2 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 538b2dfb89..3990fed1d9 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -367,6 +367,16 @@ struct SpaprMachineState {
 #define H_NOOP            -63
 #define H_UNSUPPORTED     -67
 #define H_OVERLAP         -68
+#define H_STATE           -75
+#define H_INVALID_ELEMENT_ID               -79
+#define H_INVALID_ELEMENT_SIZE             -80
+#define H_INVALID_ELEMENT_VALUE            -81
+#define H_INPUT_BUFFER_NOT_DEFINED         -82
+#define H_INPUT_BUFFER_TOO_SMALL           -83
+#define H_OUTPUT_BUFFER_NOT_DEFINED        -84
+#define H_OUTPUT_BUFFER_TOO_SMALL          -85
+#define H_PARTITION_PAGE_TABLE_NOT_DEFINED -86
+#define H_GUEST_VCPU_STATE_NOT_HV_OWNED    -87
 #define H_UNSUPPORTED_FLAG -256
 #define H_MULTI_THREADS_ACTIVE -9005
 
@@ -586,8 +596,17 @@ struct SpaprMachineState {
 #define H_RPT_INVALIDATE        0x448
 #define H_SCM_FLUSH             0x44C
 #define H_WATCHDOG              0x45C
-
-#define MAX_HCALL_OPCODE        H_WATCHDOG
+#define H_GUEST_GET_CAPABILITIES 0x460
+#define H_GUEST_SET_CAPABILITIES 0x464
+#define H_GUEST_CREATE           0x470
+#define H_GUEST_CREATE_VCPU      0x474
+#define H_GUEST_GET_STATE        0x478
+#define H_GUEST_SET_STATE        0x47C
+#define H_GUEST_RUN_VCPU         0x480
+#define H_GUEST_COPY_MEMORY      0x484
+#define H_GUEST_DELETE           0x488
+
+#define MAX_HCALL_OPCODE        H_GUEST_DELETE
 
 /* The hcalls above are standardized in PAPR and implemented by pHyp
  * as well.
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index d383486476..5cb668dd53 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -4,6 +4,192 @@
 #include "qemu/osdep.h"
 #include "target/ppc/cpu.h"
 
+/* Guest State Buffer Element IDs */
+#define GSB_HV_VCPU_IGNORED_ID  0x0000 /* An element whose value is ignored */
+#define GSB_HV_VCPU_STATE_SIZE  0x0001 /* HV internal format VCPU state size */
+#define GSB_VCPU_OUT_BUF_MIN_SZ 0x0002 /* Min size of the Run VCPU o/p buffer */
+#define GSB_VCPU_LPVR           0x0003 /* Logical PVR */
+#define GSB_TB_OFFSET           0x0004 /* Timebase Offset */
+#define GSB_PART_SCOPED_PAGETBL 0x0005 /* Partition Scoped Page Table */
+#define GSB_PROCESS_TBL         0x0006 /* Process Table */
+                    /* RESERVED 0x0007 - 0x0BFF */
+#define GSB_VCPU_IN_BUFFER      0x0C00 /* Run VCPU Input Buffer */
+#define GSB_VCPU_OUT_BUFFER     0x0C01 /* Run VCPU Out Buffer */
+#define GSB_VCPU_VPA            0x0C02 /* HRA to Guest VCPU VPA */
+                    /* RESERVED 0x0C03 - 0x0FFF */
+#define GSB_VCPU_GPR0           0x1000
+#define GSB_VCPU_GPR1           0x1001
+#define GSB_VCPU_GPR2           0x1002
+#define GSB_VCPU_GPR3           0x1003
+#define GSB_VCPU_GPR4           0x1004
+#define GSB_VCPU_GPR5           0x1005
+#define GSB_VCPU_GPR6           0x1006
+#define GSB_VCPU_GPR7           0x1007
+#define GSB_VCPU_GPR8           0x1008
+#define GSB_VCPU_GPR9           0x1009
+#define GSB_VCPU_GPR10          0x100A
+#define GSB_VCPU_GPR11          0x100B
+#define GSB_VCPU_GPR12          0x100C
+#define GSB_VCPU_GPR13          0x100D
+#define GSB_VCPU_GPR14          0x100E
+#define GSB_VCPU_GPR15          0x100F
+#define GSB_VCPU_GPR16          0x1010
+#define GSB_VCPU_GPR17          0x1011
+#define GSB_VCPU_GPR18          0x1012
+#define GSB_VCPU_GPR19          0x1013
+#define GSB_VCPU_GPR20          0x1014
+#define GSB_VCPU_GPR21          0x1015
+#define GSB_VCPU_GPR22          0x1016
+#define GSB_VCPU_GPR23          0x1017
+#define GSB_VCPU_GPR24          0x1018
+#define GSB_VCPU_GPR25          0x1019
+#define GSB_VCPU_GPR26          0x101A
+#define GSB_VCPU_GPR27          0x101B
+#define GSB_VCPU_GPR28          0x101C
+#define GSB_VCPU_GPR29          0x101D
+#define GSB_VCPU_GPR30          0x101E
+#define GSB_VCPU_GPR31          0x101F
+#define GSB_VCPU_HDEC_EXPIRY_TB 0x1020
+#define GSB_VCPU_SPR_NIA        0x1021
+#define GSB_VCPU_SPR_MSR        0x1022
+#define GSB_VCPU_SPR_LR         0x1023
+#define GSB_VCPU_SPR_XER        0x1024
+#define GSB_VCPU_SPR_CTR        0x1025
+#define GSB_VCPU_SPR_CFAR       0x1026
+#define GSB_VCPU_SPR_SRR0       0x1027
+#define GSB_VCPU_SPR_SRR1       0x1028
+#define GSB_VCPU_SPR_DAR        0x1029
+#define GSB_VCPU_DEC_EXPIRE_TB  0x102A
+#define GSB_VCPU_SPR_VTB        0x102B
+#define GSB_VCPU_SPR_LPCR       0x102C
+#define GSB_VCPU_SPR_HFSCR      0x102D
+#define GSB_VCPU_SPR_FSCR       0x102E
+#define GSB_VCPU_SPR_FPSCR      0x102F
+#define GSB_VCPU_SPR_DAWR0      0x1030
+#define GSB_VCPU_SPR_DAWR1      0x1031
+#define GSB_VCPU_SPR_CIABR      0x1032
+#define GSB_VCPU_SPR_PURR       0x1033
+#define GSB_VCPU_SPR_SPURR      0x1034
+#define GSB_VCPU_SPR_IC         0x1035
+#define GSB_VCPU_SPR_SPRG0      0x1036
+#define GSB_VCPU_SPR_SPRG1      0x1037
+#define GSB_VCPU_SPR_SPRG2      0x1038
+#define GSB_VCPU_SPR_SPRG3      0x1039
+#define GSB_VCPU_SPR_PPR        0x103A
+#define GSB_VCPU_SPR_MMCR0      0x103B
+#define GSB_VCPU_SPR_MMCR1      0x103C
+#define GSB_VCPU_SPR_MMCR2      0x103D
+#define GSB_VCPU_SPR_MMCR3      0x103E
+#define GSB_VCPU_SPR_MMCRA      0x103F
+#define GSB_VCPU_SPR_SIER       0x1040
+#define GSB_VCPU_SPR_SIER2      0x1041
+#define GSB_VCPU_SPR_SIER3      0x1042
+#define GSB_VCPU_SPR_BESCR      0x1043
+#define GSB_VCPU_SPR_EBBHR      0x1044
+#define GSB_VCPU_SPR_EBBRR      0x1045
+#define GSB_VCPU_SPR_AMR        0x1046
+#define GSB_VCPU_SPR_IAMR       0x1047
+#define GSB_VCPU_SPR_AMOR       0x1048
+#define GSB_VCPU_SPR_UAMOR      0x1049
+#define GSB_VCPU_SPR_SDAR       0x104A
+#define GSB_VCPU_SPR_SIAR       0x104B
+#define GSB_VCPU_SPR_DSCR       0x104C
+#define GSB_VCPU_SPR_TAR        0x104D
+#define GSB_VCPU_SPR_DEXCR      0x104E
+#define GSB_VCPU_SPR_HDEXCR     0x104F
+#define GSB_VCPU_SPR_HASHKEYR   0x1050
+#define GSB_VCPU_SPR_HASHPKEYR  0x1051
+#define GSB_VCPU_SPR_CTRL       0x1052
+                    /* RESERVED 0x1053 - 0x1FFF */
+#define GSB_VCPU_SPR_CR         0x2000
+#define GSB_VCPU_SPR_PIDR       0x2001
+#define GSB_VCPU_SPR_DSISR      0x2002
+#define GSB_VCPU_SPR_VSCR       0x2003
+#define GSB_VCPU_SPR_VRSAVE     0x2004
+#define GSB_VCPU_SPR_DAWRX0     0x2005
+#define GSB_VCPU_SPR_DAWRX1     0x2006
+#define GSB_VCPU_SPR_PMC1       0x2007
+#define GSB_VCPU_SPR_PMC2       0x2008
+#define GSB_VCPU_SPR_PMC3       0x2009
+#define GSB_VCPU_SPR_PMC4       0x200A
+#define GSB_VCPU_SPR_PMC5       0x200B
+#define GSB_VCPU_SPR_PMC6       0x200C
+#define GSB_VCPU_SPR_WORT       0x200D
+#define GSB_VCPU_SPR_PSPB       0x200E
+                    /* RESERVED 0x200F - 0x2FFF */
+#define GSB_VCPU_SPR_VSR0       0x3000
+#define GSB_VCPU_SPR_VSR1       0x3001
+#define GSB_VCPU_SPR_VSR2       0x3002
+#define GSB_VCPU_SPR_VSR3       0x3003
+#define GSB_VCPU_SPR_VSR4       0x3004
+#define GSB_VCPU_SPR_VSR5       0x3005
+#define GSB_VCPU_SPR_VSR6       0x3006
+#define GSB_VCPU_SPR_VSR7       0x3007
+#define GSB_VCPU_SPR_VSR8       0x3008
+#define GSB_VCPU_SPR_VSR9       0x3009
+#define GSB_VCPU_SPR_VSR10      0x300A
+#define GSB_VCPU_SPR_VSR11      0x300B
+#define GSB_VCPU_SPR_VSR12      0x300C
+#define GSB_VCPU_SPR_VSR13      0x300D
+#define GSB_VCPU_SPR_VSR14      0x300E
+#define GSB_VCPU_SPR_VSR15      0x300F
+#define GSB_VCPU_SPR_VSR16      0x3010
+#define GSB_VCPU_SPR_VSR17      0x3011
+#define GSB_VCPU_SPR_VSR18      0x3012
+#define GSB_VCPU_SPR_VSR19      0x3013
+#define GSB_VCPU_SPR_VSR20      0x3014
+#define GSB_VCPU_SPR_VSR21      0x3015
+#define GSB_VCPU_SPR_VSR22      0x3016
+#define GSB_VCPU_SPR_VSR23      0x3017
+#define GSB_VCPU_SPR_VSR24      0x3018
+#define GSB_VCPU_SPR_VSR25      0x3019
+#define GSB_VCPU_SPR_VSR26      0x301A
+#define GSB_VCPU_SPR_VSR27      0x301B
+#define GSB_VCPU_SPR_VSR28      0x301C
+#define GSB_VCPU_SPR_VSR29      0x301D
+#define GSB_VCPU_SPR_VSR30      0x301E
+#define GSB_VCPU_SPR_VSR31      0x301F
+#define GSB_VCPU_SPR_VSR32      0x3020
+#define GSB_VCPU_SPR_VSR33      0x3021
+#define GSB_VCPU_SPR_VSR34      0x3022
+#define GSB_VCPU_SPR_VSR35      0x3023
+#define GSB_VCPU_SPR_VSR36      0x3024
+#define GSB_VCPU_SPR_VSR37      0x3025
+#define GSB_VCPU_SPR_VSR38      0x3026
+#define GSB_VCPU_SPR_VSR39      0x3027
+#define GSB_VCPU_SPR_VSR40      0x3028
+#define GSB_VCPU_SPR_VSR41      0x3029
+#define GSB_VCPU_SPR_VSR42      0x302A
+#define GSB_VCPU_SPR_VSR43      0x302B
+#define GSB_VCPU_SPR_VSR44      0x302C
+#define GSB_VCPU_SPR_VSR45      0x302D
+#define GSB_VCPU_SPR_VSR46      0x302E
+#define GSB_VCPU_SPR_VSR47      0x302F
+#define GSB_VCPU_SPR_VSR48      0x3030
+#define GSB_VCPU_SPR_VSR49      0x3031
+#define GSB_VCPU_SPR_VSR50      0x3032
+#define GSB_VCPU_SPR_VSR51      0x3033
+#define GSB_VCPU_SPR_VSR52      0x3034
+#define GSB_VCPU_SPR_VSR53      0x3035
+#define GSB_VCPU_SPR_VSR54      0x3036
+#define GSB_VCPU_SPR_VSR55      0x3037
+#define GSB_VCPU_SPR_VSR56      0x3038
+#define GSB_VCPU_SPR_VSR57      0x3039
+#define GSB_VCPU_SPR_VSR58      0x303A
+#define GSB_VCPU_SPR_VSR59      0x303B
+#define GSB_VCPU_SPR_VSR60      0x303C
+#define GSB_VCPU_SPR_VSR61      0x303D
+#define GSB_VCPU_SPR_VSR62      0x303E
+#define GSB_VCPU_SPR_VSR63      0x303F
+                    /* RESERVED 0x3040 - 0xEFFF */
+#define GSB_VCPU_SPR_HDAR       0xF000
+#define GSB_VCPU_SPR_HDSISR     0xF001
+#define GSB_VCPU_SPR_HEIR       0xF002
+#define GSB_VCPU_SPR_ASDR       0xF003
+/* End of list of Guest State Buffer Element IDs */
+#define GSB_LAST                GSB_VCPU_SPR_ASDR
+
+
 /*
  * Register state for entering a nested guest with H_ENTER_NESTED.
  * New member must be added at the end.
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 25fac9577a..6f7f9b9d58 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1587,9 +1587,11 @@ void ppc_compat_add_property(Object *obj, const char *name,
 #define SPR_PSPB              (0x09F)
 #define SPR_DPDES             (0x0B0)
 #define SPR_DAWR0             (0x0B4)
+#define SPR_DAWR1             (0x0B5)
 #define SPR_RPR               (0x0BA)
 #define SPR_CIABR             (0x0BB)
 #define SPR_DAWRX0            (0x0BC)
+#define SPR_DAWRX1            (0x0BD)
 #define SPR_HFSCR             (0x0BE)
 #define SPR_VRSAVE            (0x100)
 #define SPR_USPRG0            (0x100)
-- 
2.39.3




* [PATCH RESEND 02/15] ppc: spapr: Add new/extend structs to support Nested PAPR API
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
  2023-09-06  4:33 ` [PATCH RESEND 01/15] ppc: spapr: Introduce Nested PAPR API related macros Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-07  1:06   ` Nicholas Piggin
  2023-09-06  4:33 ` [PATCH RESEND 03/15] ppc: spapr: Use SpaprMachineStateNested's ptcr instead of nested_ptcr Harsh Prateek Bora
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

This patch introduces new data structures to be used with the Nested
PAPR API. It also extends kvmppc_hv_guest_state with an additional set
of registers supported by the Nested PAPR API.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 include/hw/ppc/spapr_nested.h | 48 +++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 5cb668dd53..f8db31075b 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -189,6 +189,39 @@
 /* End of list of Guest State Buffer Element IDs */
 #define GSB_LAST                GSB_VCPU_SPR_ASDR
 
+typedef struct SpaprMachineStateNestedGuest {
+    unsigned long vcpus;
+    struct SpaprMachineStateNestedGuestVcpu *vcpu;
+    uint64_t parttbl[2];
+    uint32_t pvr_logical;
+    uint64_t tb_offset;
+} SpaprMachineStateNestedGuest;
+
+struct SpaprMachineStateNested {
+
+    uint8_t api;
+#define NESTED_API_KVM_HV  1
+#define NESTED_API_PAPR    2
+    uint64_t ptcr;
+    uint32_t lpid_max;
+    uint32_t pvr_base;
+    bool capabilities_set;
+    GHashTable *guests;
+};
+
+struct SpaprMachineStateNestedGuestVcpuRunBuf {
+    uint64_t addr;
+    uint64_t size;
+};
+
+typedef struct SpaprMachineStateNestedGuestVcpu {
+    bool enabled;
+    struct SpaprMachineStateNestedGuestVcpuRunBuf runbufin;
+    struct SpaprMachineStateNestedGuestVcpuRunBuf runbufout;
+    CPUPPCState env;
+    int64_t tb_offset;
+    int64_t dec_expiry_tb;
+} SpaprMachineStateNestedGuestVcpu;
 
 /*
  * Register state for entering a nested guest with H_ENTER_NESTED.
@@ -228,6 +261,21 @@ struct kvmppc_hv_guest_state {
     uint64_t dawr1;
     uint64_t dawrx1;
     /* Version 2 ends here */
+    uint64_t dec;
+    uint64_t fscr;
+    uint64_t fpscr;
+    uint64_t bescr;
+    uint64_t ebbhr;
+    uint64_t ebbrr;
+    uint64_t tar;
+    uint64_t dexcr;
+    uint64_t hdexcr;
+    uint64_t hashkeyr;
+    uint64_t hashpkeyr;
+    uint64_t ctrl;
+    uint64_t vscr;
+    uint64_t vrsave;
+    ppc_vsr_t vsr[64];
 };
 
 /* Latest version of hv_guest_state structure */
-- 
2.39.3




* [PATCH RESEND 03/15] ppc: spapr: Use SpaprMachineStateNested's ptcr instead of nested_ptcr
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
  2023-09-06  4:33 ` [PATCH RESEND 01/15] ppc: spapr: Introduce Nested PAPR API related macros Harsh Prateek Bora
  2023-09-06  4:33 ` [PATCH RESEND 02/15] ppc: spapr: Add new/extend structs to support Nested PAPR API Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-07  1:13   ` Nicholas Piggin
  2023-09-06  4:33 ` [PATCH RESEND 04/15] ppc: spapr: Start using nested.api for nested kvm-hv api Harsh Prateek Bora
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

Use the nested-guest-state-specific struct (SpaprMachineStateNested)
for storing related info.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/spapr.c         | 4 ++--
 hw/ppc/spapr_nested.c  | 4 ++--
 include/hw/ppc/spapr.h | 3 ++-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 07e91e3800..e44686b04d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1340,8 +1340,8 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
 
         assert(lpid != 0);
 
-        patb = spapr->nested_ptcr & PTCR_PATB;
-        pats = spapr->nested_ptcr & PTCR_PATS;
+        patb = spapr->nested.ptcr & PTCR_PATB;
+        pats = spapr->nested.ptcr & PTCR_PATS;
 
         /* Check if partition table is properly aligned */
         if (patb & MAKE_64BIT_MASK(0, pats + 12)) {
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 121aa96ddc..a669470f1a 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -25,7 +25,7 @@ static target_ulong h_set_ptbl(PowerPCCPU *cpu,
         return H_PARAMETER;
     }
 
-    spapr->nested_ptcr = ptcr; /* Save new partition table */
+    spapr->nested.ptcr = ptcr; /* Save new partition table */
 
     return H_SUCCESS;
 }
@@ -157,7 +157,7 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
     struct kvmppc_pt_regs *regs;
     hwaddr len;
 
-    if (spapr->nested_ptcr == 0) {
+    if (spapr->nested.ptcr == 0) {
         return H_NOT_AVAILABLE;
     }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 3990fed1d9..c8b42af430 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -12,6 +12,7 @@
 #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
 #include "hw/ppc/xics.h"        /* For ICSState */
 #include "hw/ppc/spapr_tpm_proxy.h"
+#include "hw/ppc/spapr_nested.h" /* for SpaprMachineStateNested */
 
 struct SpaprVioBus;
 struct SpaprPhbState;
@@ -216,7 +217,7 @@ struct SpaprMachineState {
     uint32_t vsmt;       /* Virtual SMT mode (KVM's "core stride") */
 
     /* Nested HV support (TCG only) */
-    uint64_t nested_ptcr;
+    struct SpaprMachineStateNested nested;
 
     Notifier epow_notifier;
     QTAILQ_HEAD(, SpaprEventLogEntry) pending_events;
-- 
2.39.3




* [PATCH RESEND 04/15] ppc: spapr: Start using nested.api for nested kvm-hv api
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
                   ` (2 preceding siblings ...)
  2023-09-06  4:33 ` [PATCH RESEND 03/15] ppc: spapr: Use SpaprMachineStateNested's ptcr instead of nested_ptcr Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-07  1:35   ` Nicholas Piggin
  2023-09-06  4:33 ` [PATCH RESEND 05/15] ppc: spapr: Introduce cap-nested-papr for nested PAPR API Harsh Prateek Bora
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

With this patch, the kvm-hv nested API code is isolated so that it is
executed only when cap-nested-hv is set. This helps keep API-specific
logic mutually exclusive.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/spapr.c      | 7 ++++++-
 hw/ppc/spapr_caps.c | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e44686b04d..0aa9f21516 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1334,8 +1334,11 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
         /* Copy PATE1:GR into PATE0:HR */
         entry->dw0 = spapr->patb_entry & PATE0_HR;
         entry->dw1 = spapr->patb_entry;
+        return true;
+    }
+    assert(spapr->nested.api);
 
-    } else {
+    if (spapr->nested.api == NESTED_API_KVM_HV) {
         uint64_t patb, pats;
 
         assert(lpid != 0);
@@ -3437,6 +3440,8 @@ static void spapr_instance_init(Object *obj)
         spapr_get_host_serial, spapr_set_host_serial);
     object_property_set_description(obj, "host-serial",
         "Host serial number to advertise in guest device tree");
+    /* Nested */
+    spapr->nested.api = 0;
 }
 
 static void spapr_machine_finalizefn(Object *obj)
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 5a0755d34f..a3a790b026 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -454,6 +454,7 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
         return;
     }
 
+    spapr->nested.api = NESTED_API_KVM_HV;
     if (kvm_enabled()) {
         if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00, 0,
                               spapr->max_compat_pvr)) {
-- 
2.39.3




* [PATCH RESEND 05/15] ppc: spapr: Introduce cap-nested-papr for nested PAPR API
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
                   ` (3 preceding siblings ...)
  2023-09-06  4:33 ` [PATCH RESEND 04/15] ppc: spapr: Start using nested.api for nested kvm-hv api Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-07  1:49   ` Nicholas Piggin
  2023-09-07  1:52   ` Nicholas Piggin
  2023-09-06  4:33 ` [PATCH RESEND 06/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_GET_CAPABILITIES Harsh Prateek Bora
                   ` (9 subsequent siblings)
  14 siblings, 2 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

This patch introduces a new command-line option, cap-nested-papr, to
enable support for the nested PAPR API by setting the nested.api version
accordingly. It requires the user to launch the L0 QEMU in TCG mode;
L1 Linux can then launch the nested guest in KVM mode. Unlike
cap-nested-hv, this is meant for a nested guest on pseries (PowerVM),
where L0 retains the whole state of the nested guest. The two APIs are
thus mutually exclusive.
Support for the related hcalls is added in the next set of patches.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/spapr.c         |  2 ++
 hw/ppc/spapr_caps.c    | 48 ++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |  5 ++++-
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0aa9f21516..cbab7a825f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2092,6 +2092,7 @@ static const VMStateDescription vmstate_spapr = {
         &vmstate_spapr_cap_fwnmi,
         &vmstate_spapr_fwnmi,
         &vmstate_spapr_cap_rpt_invalidate,
+        &vmstate_spapr_cap_nested_papr,
         NULL
     }
 };
@@ -4685,6 +4686,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_WORKAROUND;
     smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
     smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
+    smc->default_caps.caps[SPAPR_CAP_NESTED_PAPR] = SPAPR_CAP_OFF;
     smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
     smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
     smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index a3a790b026..d3b9f107aa 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -491,6 +491,44 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
     }
 }
 
+static void cap_nested_papr_apply(SpaprMachineState *spapr,
+                                    uint8_t val, Error **errp)
+{
+    ERRP_GUARD();
+    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+    CPUPPCState *env = &cpu->env;
+
+    if (!val) {
+        /* capability disabled by default */
+        return;
+    }
+
+    if (tcg_enabled()) {
+        if (!(env->insns_flags2 & PPC2_ISA300)) {
+            error_setg(errp, "Nested-PAPR only supported on POWER9 and later");
+            error_append_hint(errp,
+                              "Try appending -machine cap-nested-papr=off\n");
+            return;
+        }
+        spapr->nested.api = NESTED_API_PAPR;
+    } else if (kvm_enabled()) {
+        /*
+         * this gets executed in L1 qemu when L2 is launched,
+         * needs kvm-hv support in L1 kernel.
+         */
+        if (!kvmppc_has_cap_nested_kvm_hv()) {
+            error_setg(errp,
+                       "KVM implementation does not support Nested-HV");
+            error_append_hint(errp,
+                              "Try appending -machine cap-nested-hv=off\n");
+        } else if (kvmppc_set_cap_nested_kvm_hv(val) < 0) {
+            error_setg(errp, "Error enabling cap-nested-hv with KVM");
+            error_append_hint(errp,
+                              "Try appending -machine cap-nested-hv=off\n");
+        }
+    }
+}
+
 static void cap_large_decr_apply(SpaprMachineState *spapr,
                                  uint8_t val, Error **errp)
 {
@@ -736,6 +774,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
         .type = "bool",
         .apply = cap_nested_kvm_hv_apply,
     },
+    [SPAPR_CAP_NESTED_PAPR] = {
+        .name = "nested-papr",
+        .description = "Allow Nested PAPR (Phyp)",
+        .index = SPAPR_CAP_NESTED_PAPR,
+        .get = spapr_cap_get_bool,
+        .set = spapr_cap_set_bool,
+        .type = "bool",
+        .apply = cap_nested_papr_apply,
+    },
     [SPAPR_CAP_LARGE_DECREMENTER] = {
         .name = "large-decr",
         .description = "Allow Large Decrementer",
@@ -920,6 +967,7 @@ SPAPR_CAP_MIG_STATE(sbbc, SPAPR_CAP_SBBC);
 SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
 SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
 SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
+SPAPR_CAP_MIG_STATE(nested_papr, SPAPR_CAP_NESTED_PAPR);
 SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
 SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
 SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index c8b42af430..8a6e9ce929 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -81,8 +81,10 @@ typedef enum {
 #define SPAPR_CAP_RPT_INVALIDATE        0x0B
 /* Support for AIL modes */
 #define SPAPR_CAP_AIL_MODE_3            0x0C
+/* Nested PAPR */
+#define SPAPR_CAP_NESTED_PAPR           0x0D
 /* Num Caps */
-#define SPAPR_CAP_NUM                   (SPAPR_CAP_AIL_MODE_3 + 1)
+#define SPAPR_CAP_NUM                   (SPAPR_CAP_NESTED_PAPR + 1)
 
 /*
  * Capability Values
@@ -1005,6 +1007,7 @@ extern const VMStateDescription vmstate_spapr_cap_sbbc;
 extern const VMStateDescription vmstate_spapr_cap_ibs;
 extern const VMStateDescription vmstate_spapr_cap_hpt_maxpagesize;
 extern const VMStateDescription vmstate_spapr_cap_nested_kvm_hv;
+extern const VMStateDescription vmstate_spapr_cap_nested_papr;
 extern const VMStateDescription vmstate_spapr_cap_large_decr;
 extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
 extern const VMStateDescription vmstate_spapr_cap_fwnmi;
-- 
2.39.3




* [PATCH RESEND 06/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_GET_CAPABILITIES
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
                   ` (4 preceding siblings ...)
  2023-09-06  4:33 ` [PATCH RESEND 05/15] ppc: spapr: Introduce cap-nested-papr for nested PAPR API Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-07  2:02   ` Nicholas Piggin
  2023-09-06  4:33 ` [PATCH RESEND 07/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_SET_CAPABILITIES Harsh Prateek Bora
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

This patch implements the nested PAPR hcall H_GUEST_GET_CAPABILITIES
and also enables registration of nested PAPR hcalls whenever an L0 is
launched with cap-nested-papr=true. The common registration routine
shall be used by future patches as support for related hcalls is added.
This hcall is used by the L1 kernel to get the set of guest capabilities
that are supported by L0 (QEMU TCG).

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/spapr_caps.c           |  1 +
 hw/ppc/spapr_nested.c         | 35 +++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_nested.h |  6 ++++++
 3 files changed, 42 insertions(+)

diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index d3b9f107aa..cbe53a79ec 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -511,6 +511,7 @@ static void cap_nested_papr_apply(SpaprMachineState *spapr,
             return;
         }
         spapr->nested.api = NESTED_API_PAPR;
+        spapr_register_nested_phyp();
     } else if (kvm_enabled()) {
         /*
          * this gets executed in L1 qemu when L2 is launched,
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index a669470f1a..37f3a49be2 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -6,6 +6,7 @@
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_cpu_core.h"
 #include "hw/ppc/spapr_nested.h"
+#include "cpu-models.h"
 
 #ifdef CONFIG_TCG
 #define PRTS_MASK      0x1f
@@ -375,6 +376,29 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
     address_space_unmap(CPU(cpu)->as, regs, len, len, true);
 }
 
+static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
+                                             SpaprMachineState *spapr,
+                                             target_ulong opcode,
+                                             target_ulong *args)
+{
+    CPUPPCState *env = &cpu->env;
+    target_ulong flags = args[0];
+
+    if (flags) { /* don't handle any flags capabilities for now */
+        return H_PARAMETER;
+    }
+
+    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
+        (CPU_POWERPC_POWER9_BASE))
+        env->gpr[4] = H_GUEST_CAPABILITIES_P9_MODE;
+
+    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
+        (CPU_POWERPC_POWER10_BASE))
+        env->gpr[4] = H_GUEST_CAPABILITIES_P10_MODE;
+
+    return H_SUCCESS;
+}
+
 void spapr_register_nested(void)
 {
     spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
@@ -382,6 +406,12 @@ void spapr_register_nested(void)
     spapr_register_hypercall(KVMPPC_H_TLB_INVALIDATE, h_tlb_invalidate);
     spapr_register_hypercall(KVMPPC_H_COPY_TOFROM_GUEST, h_copy_tofrom_guest);
 }
+
+void spapr_register_nested_phyp(void)
+{
+    spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
+}
+
 #else
 void spapr_exit_nested(PowerPCCPU *cpu, int excp)
 {
@@ -392,4 +422,9 @@ void spapr_register_nested(void)
 {
     /* DO NOTHING */
 }
+
+void spapr_register_nested_phyp(void)
+{
+    /* DO NOTHING */
+}
 #endif
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index f8db31075b..ce198e9f70 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -189,6 +189,11 @@
 /* End of list of Guest State Buffer Element IDs */
 #define GSB_LAST                GSB_VCPU_SPR_ASDR
 
+/* Bit masks to be used in nested PAPR API */
+#define H_GUEST_CAPABILITIES_COPY_MEM 0x8000000000000000
+#define H_GUEST_CAPABILITIES_P9_MODE  0x4000000000000000
+#define H_GUEST_CAPABILITIES_P10_MODE 0x2000000000000000
+
 typedef struct SpaprMachineStateNestedGuest {
     unsigned long vcpus;
     struct SpaprMachineStateNestedGuestVcpu *vcpu;
@@ -331,6 +336,7 @@ struct nested_ppc_state {
 };
 
 void spapr_register_nested(void);
+void spapr_register_nested_phyp(void);
 void spapr_exit_nested(PowerPCCPU *cpu, int excp);
 
 #endif /* HW_SPAPR_NESTED_H */
-- 
2.39.3




* [PATCH RESEND 07/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_SET_CAPABILITIES
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
                   ` (5 preceding siblings ...)
  2023-09-06  4:33 ` [PATCH RESEND 06/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_GET_CAPABILITIES Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-07  2:09   ` Nicholas Piggin
  2023-09-06  4:33 ` [PATCH RESEND 08/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE Harsh Prateek Bora
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

This patch implements the nested PAPR hcall H_GUEST_SET_CAPABILITIES.
It is used by the L1 to set the capabilities of the nested guest being
created. The capabilities being set must be a subset of the
capabilities returned by a prior call to the H_GUEST_GET_CAPABILITIES
hcall. Currently, only the P9/P10 capability check via the PVR is
supported.
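
The subset requirement amounts to a simple bitmask check — every
requested bit must have been advertised by H_GUEST_GET_CAPABILITIES. A
minimal sketch (capability constants copied from the patch; the helper
name is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability bits as defined in spapr_nested.h by this series */
#define CAP_COPY_MEM 0x8000000000000000ull
#define CAP_P9_MODE  0x4000000000000000ull
#define CAP_P10_MODE 0x2000000000000000ull

/*
 * A set request is valid only when every requested bit was previously
 * advertised; any extra bit must be rejected (the hcall returns H_P2
 * and flags the offending capability in gpr[4]).
 */
static bool caps_are_subset(uint64_t requested, uint64_t supported)
{
    return (requested & ~supported) == 0;
}
```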

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/spapr.c                |  1 +
 hw/ppc/spapr_nested.c         | 46 +++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_nested.h |  3 +++
 3 files changed, 50 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index cbab7a825f..7c6f6ee25d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3443,6 +3443,7 @@ static void spapr_instance_init(Object *obj)
         "Host serial number to advertise in guest device tree");
     /* Nested */
     spapr->nested.api = 0;
+    spapr->nested.capabilities_set = false;
 }
 
 static void spapr_machine_finalizefn(Object *obj)
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 37f3a49be2..9af65f257f 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -399,6 +399,51 @@ static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
     return H_SUCCESS;
 }
 
+static target_ulong h_guest_set_capabilities(PowerPCCPU *cpu,
+                                             SpaprMachineState *spapr,
+                                             target_ulong opcode,
+                                             target_ulong *args)
+{
+    CPUPPCState *env = &cpu->env;
+    target_ulong flags = args[0];
+    target_ulong capabilities = args[1];
+
+    if (flags) { /* don't handle any flags capabilities for now */
+        return H_PARAMETER;
+    }
+
+
+    /* isn't supported */
+    if (capabilities & H_GUEST_CAPABILITIES_COPY_MEM) {
+        env->gpr[4] = 0;
+        return H_P2;
+    }
+
+    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
+        (CPU_POWERPC_POWER9_BASE)) {
+        /* We are a P9 */
+        if (!(capabilities & H_GUEST_CAPABILITIES_P9_MODE)) {
+            env->gpr[4] = 1;
+            return H_P2;
+        }
+    }
+
+    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
+        (CPU_POWERPC_POWER10_BASE)) {
+        /* We are a P10 */
+        if (!(capabilities & H_GUEST_CAPABILITIES_P10_MODE)) {
+            env->gpr[4] = 2;
+            return H_P2;
+        }
+    }
+
+    spapr->nested.capabilities_set = true;
+
+    spapr->nested.pvr_base = env->spr[SPR_PVR];
+
+    return H_SUCCESS;
+}
+
 void spapr_register_nested(void)
 {
     spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
@@ -410,6 +455,7 @@ void spapr_register_nested(void)
 void spapr_register_nested_phyp(void)
 {
     spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
+    spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
 }
 
 #else
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index ce198e9f70..a7996251cb 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -193,6 +193,9 @@
 #define H_GUEST_CAPABILITIES_COPY_MEM 0x8000000000000000
 #define H_GUEST_CAPABILITIES_P9_MODE  0x4000000000000000
 #define H_GUEST_CAPABILITIES_P10_MODE 0x2000000000000000
+#define H_GUEST_CAP_COPY_MEM_BMAP   0
+#define H_GUEST_CAP_P9_MODE_BMAP    1
+#define H_GUEST_CAP_P10_MODE_BMAP   2
 
 typedef struct SpaprMachineStateNestedGuest {
     unsigned long vcpus;
-- 
2.39.3




* [PATCH RESEND 08/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
                   ` (6 preceding siblings ...)
  2023-09-06  4:33 ` [PATCH RESEND 07/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_SET_CAPABILITIES Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-07  2:28   ` Nicholas Piggin
  2023-09-06  4:33 ` [PATCH RESEND 09/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE_VCPU Harsh Prateek Bora
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

This hcall is used by the L1 to indicate to the L0 that a new nested
guest needs to be created, and that the necessary resource allocations
shall therefore be made. The L0 uses a hash table for nested guest
specific resource management. This data structure is further used by
other hcalls to operate on related members during the entire life
cycle of the nested guest.
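
The lpid allocation below scans the guest hash table for the lowest
free id, starting at 1. The same scan can be illustrated with a plain
array standing in for the GHashTable (a sketch only, not the GLib-based
implementation in the patch):

```c
#include <assert.h>
#include <stdbool.h>

#define LPID_MAX 4096   /* mirrors NESTED_GUEST_MAX in the patch */

/*
 * Find the lowest unused lpid in [1, LPID_MAX); returns 0 when the
 * table is full.  lpid 0 is reserved, which is why the patch's loop
 * starts scanning at 1; a full table makes the hcall return H_NO_MEM.
 */
static unsigned find_free_lpid(const bool in_use[LPID_MAX])
{
    for (unsigned lpid = 1; lpid < LPID_MAX; lpid++) {
        if (!in_use[lpid]) {
            return lpid;
        }
    }
    return 0;
}
```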

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/spapr_nested.c         | 75 +++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_nested.h |  3 ++
 2 files changed, 78 insertions(+)

diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 9af65f257f..09bbbfb341 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -444,6 +444,80 @@ static target_ulong h_guest_set_capabilities(PowerPCCPU *cpu,
     return H_SUCCESS;
 }
 
+static void
+destroy_guest_helper(gpointer value)
+{
+    struct SpaprMachineStateNestedGuest *guest = value;
+    g_free(guest);
+}
+
+static target_ulong h_guest_create(PowerPCCPU *cpu,
+                                   SpaprMachineState *spapr,
+                                   target_ulong opcode,
+                                   target_ulong *args)
+{
+    CPUPPCState *env = &cpu->env;
+    target_ulong flags = args[0];
+    target_ulong continue_token = args[1];
+    uint64_t lpid;
+    int nguests = 0;
+    struct SpaprMachineStateNestedGuest *guest;
+
+    if (flags) { /* don't handle any flags for now */
+        return H_UNSUPPORTED_FLAG;
+    }
+
+    if (continue_token != -1) {
+        return H_P2;
+    }
+
+    if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_PAPR)) {
+        return H_FUNCTION;
+    }
+
+    if (!spapr->nested.capabilities_set) {
+        return H_STATE;
+    }
+
+    if (!spapr->nested.guests) {
+        spapr->nested.lpid_max = NESTED_GUEST_MAX;
+        spapr->nested.guests = g_hash_table_new_full(NULL,
+                                                     NULL,
+                                                     NULL,
+                                                     destroy_guest_helper);
+    }
+
+    nguests = g_hash_table_size(spapr->nested.guests);
+
+    if (nguests == spapr->nested.lpid_max) {
+        return H_NO_MEM;
+    }
+
+    /* Lookup for available lpid */
+    for (lpid = 1; lpid < spapr->nested.lpid_max; lpid++) {
+        if (!(g_hash_table_lookup(spapr->nested.guests,
+                                  GINT_TO_POINTER(lpid)))) {
+            break;
+        }
+    }
+    if (lpid == spapr->nested.lpid_max) {
+        return H_NO_MEM;
+    }
+
+    guest = g_try_new0(struct SpaprMachineStateNestedGuest, 1);
+    if (!guest) {
+        return H_NO_MEM;
+    }
+
+    guest->pvr_logical = spapr->nested.pvr_base;
+
+    g_hash_table_insert(spapr->nested.guests, GINT_TO_POINTER(lpid), guest);
+    printf("%s: lpid: %lu (MAX: %i)\n", __func__, lpid, spapr->nested.lpid_max);
+
+    env->gpr[4] = lpid;
+    return H_SUCCESS;
+}
+
 void spapr_register_nested(void)
 {
     spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
@@ -456,6 +530,7 @@ void spapr_register_nested_phyp(void)
 {
     spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
     spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
+    spapr_register_hypercall(H_GUEST_CREATE          , h_guest_create);
 }
 
 #else
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index a7996251cb..7841027df8 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -197,6 +197,9 @@
 #define H_GUEST_CAP_P9_MODE_BMAP    1
 #define H_GUEST_CAP_P10_MODE_BMAP   2
 
+/* Nested PAPR API macros */
+#define NESTED_GUEST_MAX 4096
+
 typedef struct SpaprMachineStateNestedGuest {
     unsigned long vcpus;
     struct SpaprMachineStateNestedGuestVcpu *vcpu;
-- 
2.39.3




* [PATCH RESEND 09/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE_VCPU
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
                   ` (7 preceding siblings ...)
  2023-09-06  4:33 ` [PATCH RESEND 08/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-07  2:49   ` Nicholas Piggin
  2023-09-06  4:33 ` [PATCH RESEND 10/15] ppc: spapr: Initialize the GSB Elements lookup table Harsh Prateek Bora
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

This patch implements support for the hcall H_GUEST_CREATE_VCPU, which
is used to instantiate a new VCPU for a previously created nested
guest. The L1 provides the guest-id (returned by the L0 during the
call to H_GUEST_CREATE) and an associated unique vcpu-id to refer to
this instance in future calls. It is assumed that vcpu-ids are
allocated sequentially, with a maximum vcpu limit of 2048.
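
Sequential allocation means the only acceptable new vcpu-id is the
current vcpu count: a smaller id is already in use (H_IN_USE), a
larger one would leave a hole in the array. A sketch of that check
(the helper name is hypothetical; the patch open-codes it):

```c
#include <assert.h>
#include <stdbool.h>

#define VCPU_MAX 2048   /* mirrors NESTED_GUEST_VCPU_MAX in the patch */

/*
 * With linear allocation, vcpuid must equal the number of vcpus
 * created so far, and must stay below the per-guest limit.
 */
static bool vcpu_id_ok(unsigned long nvcpus, unsigned long vcpuid)
{
    return vcpuid == nvcpus && vcpuid < VCPU_MAX;
}
```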

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/spapr_nested.c         | 110 ++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h        |   1 +
 include/hw/ppc/spapr_nested.h |   1 +
 3 files changed, 112 insertions(+)

diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 09bbbfb341..e7956685af 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -376,6 +376,47 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
     address_space_unmap(CPU(cpu)->as, regs, len, len, true);
 }
 
+static
+SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
+                                                     target_ulong lpid)
+{
+    SpaprMachineStateNestedGuest *guest;
+
+    guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(lpid));
+    return guest;
+}
+
+static bool vcpu_check(SpaprMachineStateNestedGuest *guest,
+                       target_ulong vcpuid,
+                       bool inoutbuf)
+{
+    struct SpaprMachineStateNestedGuestVcpu *vcpu;
+
+    if (vcpuid >= NESTED_GUEST_VCPU_MAX) {
+        return false;
+    }
+
+    if (!(vcpuid < guest->vcpus)) {
+        return false;
+    }
+
+    vcpu = &guest->vcpu[vcpuid];
+    if (!vcpu->enabled) {
+        return false;
+    }
+
+    if (!inoutbuf) {
+        return true;
+    }
+
+    /* Check to see if the in/out buffers are registered */
+    if (vcpu->runbufin.addr && vcpu->runbufout.addr) {
+        return true;
+    }
+
+    return false;
+}
+
 static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
                                              SpaprMachineState *spapr,
                                              target_ulong opcode,
@@ -448,6 +489,11 @@ static void
 destroy_guest_helper(gpointer value)
 {
     struct SpaprMachineStateNestedGuest *guest = value;
+    int i = 0;
+    for (i = 0; i < guest->vcpus; i++) {
+        cpu_ppc_tb_free(&guest->vcpu[i].env);
+    }
+    g_free(guest->vcpu);
     g_free(guest);
 }
 
@@ -518,6 +564,69 @@ static target_ulong h_guest_create(PowerPCCPU *cpu,
     return H_SUCCESS;
 }
 
+static target_ulong h_guest_create_vcpu(PowerPCCPU *cpu,
+                                        SpaprMachineState *spapr,
+                                        target_ulong opcode,
+                                        target_ulong *args)
+{
+    CPUPPCState *env = &cpu->env, *l2env;
+    target_ulong flags = args[0];
+    target_ulong lpid = args[1];
+    target_ulong vcpuid = args[2];
+    SpaprMachineStateNestedGuest *guest;
+
+    if (flags) { /* don't handle any flags for now */
+        return H_UNSUPPORTED_FLAG;
+    }
+
+    guest = spapr_get_nested_guest(spapr, lpid);
+    if (!guest) {
+        return H_P2;
+    }
+
+    if (vcpuid < guest->vcpus) {
+        return H_IN_USE;
+    }
+
+    if (guest->vcpus >= NESTED_GUEST_VCPU_MAX) {
+        return H_P3;
+    }
+
+    if (guest->vcpus) {
+        struct SpaprMachineStateNestedGuestVcpu *vcpus;
+        vcpus = g_try_renew(struct SpaprMachineStateNestedGuestVcpu,
+                            guest->vcpu,
+                            guest->vcpus + 1);
+        if (!vcpus) {
+            return H_NO_MEM;
+        }
+        memset(&vcpus[guest->vcpus], 0,
+               sizeof(struct SpaprMachineStateNestedGuestVcpu));
+        guest->vcpu = vcpus;
+        l2env = &vcpus[guest->vcpus].env;
+    } else {
+        guest->vcpu = g_try_new0(struct SpaprMachineStateNestedGuestVcpu, 1);
+        if (guest->vcpu == NULL) {
+            return H_NO_MEM;
+        }
+        l2env = &guest->vcpu->env;
+    }
+    /* need to memset to zero otherwise we leak L1 state to L2 */
+    memset(l2env, 0, sizeof(CPUPPCState));
+    /* Copy L1 PVR to L2 */
+    l2env->spr[SPR_PVR] = env->spr[SPR_PVR];
+    cpu_ppc_tb_init(l2env, SPAPR_TIMEBASE_FREQ);
+
+    guest->vcpus++;
+    assert(vcpuid < guest->vcpus); /* linear vcpuid allocation only */
+    guest->vcpu[vcpuid].enabled = true;
+
+    if (!vcpu_check(guest, vcpuid, false)) {
+        return H_PARAMETER;
+    }
+    return H_SUCCESS;
+}
+
 void spapr_register_nested(void)
 {
     spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
@@ -531,6 +640,7 @@ void spapr_register_nested_phyp(void)
     spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
     spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
     spapr_register_hypercall(H_GUEST_CREATE          , h_guest_create);
+    spapr_register_hypercall(H_GUEST_CREATE_VCPU     , h_guest_create_vcpu);
 }
 
 #else
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 8a6e9ce929..c9f9682a46 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -371,6 +371,7 @@ struct SpaprMachineState {
 #define H_UNSUPPORTED     -67
 #define H_OVERLAP         -68
 #define H_STATE           -75
+#define H_IN_USE          -77
 #define H_INVALID_ELEMENT_ID               -79
 #define H_INVALID_ELEMENT_SIZE             -80
 #define H_INVALID_ELEMENT_VALUE            -81
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 7841027df8..2e8c6ba1ca 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -199,6 +199,7 @@
 
 /* Nested PAPR API macros */
 #define NESTED_GUEST_MAX 4096
+#define NESTED_GUEST_VCPU_MAX 2048
 
 typedef struct SpaprMachineStateNestedGuest {
     unsigned long vcpus;
-- 
2.39.3




* [PATCH RESEND 10/15] ppc: spapr: Initialize the GSB Elements lookup table.
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
                   ` (8 preceding siblings ...)
  2023-09-06  4:33 ` [PATCH RESEND 09/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE_VCPU Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-07  3:01   ` Nicholas Piggin
  2023-09-06  4:33 ` [PATCH RESEND 11/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_[GET|SET]_STATE Harsh Prateek Bora
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

This is a first step towards enabling support for the nested PAPR
hcalls that provide get/set of the various Guest State Buffer (GSB)
elements via the h_guest_[g|s]et_state hcalls. It enables identifying
the correct get/set callback for each of the elements supported via
the h_guest_[g|s]et_state hcalls, support for which is added in the
next patch.
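
The lookup table maps each GSB element ID to a size and a copy
callback that serves both directions (get: host state -> buffer, set:
buffer -> host state). A pared-down sketch of the idea, with
illustrative IDs and without the byte-swapping the real callbacks do:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef void (*copy_fn)(void *host, void *buf, _Bool set);

static void copy_8to8(void *host, void *buf, _Bool set)
{
    /* GSB elements are big endian on the wire; the patch byte-swaps
     * here with be64_to_cpu()/cpu_to_be64(), omitted in this sketch. */
    if (set) {
        memcpy(host, buf, 8);
    } else {
        memcpy(buf, host, 8);
    }
}

struct gsb_elem_type {
    uint16_t id;
    uint16_t size;
    copy_fn copy;
};

/* Illustrative IDs only; the real table covers GPRs, SPRs, VSRs, ... */
static const struct gsb_elem_type gsb_types[] = {
    { 0x1000, 8, copy_8to8 },
    { 0x1001, 8, copy_8to8 },
};

static const struct gsb_elem_type *gsb_lookup(uint16_t id)
{
    for (size_t i = 0; i < sizeof(gsb_types) / sizeof(gsb_types[0]); i++) {
        if (gsb_types[i].id == id) {
            return &gsb_types[i];
        }
    }
    return NULL; /* unknown element -> H_INVALID_ELEMENT_ID */
}
```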

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/spapr_hcall.c          |   1 +
 hw/ppc/spapr_nested.c         | 487 ++++++++++++++++++++++++++++++++++
 include/hw/ppc/ppc.h          |   2 +
 include/hw/ppc/spapr_nested.h | 102 +++++++
 4 files changed, 592 insertions(+)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 9b1f225d4a..ca609cb5a4 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1580,6 +1580,7 @@ static void hypercall_register_types(void)
     spapr_register_hypercall(KVMPPC_H_UPDATE_DT, h_update_dt);
 
     spapr_register_nested();
+    init_nested();
 }
 
 type_init(hypercall_register_types)
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index e7956685af..6fbb1bcb02 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -7,6 +7,7 @@
 #include "hw/ppc/spapr_cpu_core.h"
 #include "hw/ppc/spapr_nested.h"
 #include "cpu-models.h"
+#include "mmu-book3s-v3.h"
 
 #ifdef CONFIG_TCG
 #define PRTS_MASK      0x1f
@@ -417,6 +418,486 @@ static bool vcpu_check(SpaprMachineStateNestedGuest *guest,
     return false;
 }
 
+static void *get_vcpu_env_ptr(SpaprMachineStateNestedGuest *guest,
+                              target_ulong vcpuid)
+{
+    assert(vcpu_check(guest, vcpuid, false));
+    return &guest->vcpu[vcpuid].env;
+}
+
+static void *get_vcpu_ptr(SpaprMachineStateNestedGuest *guest,
+                                   target_ulong vcpuid)
+{
+    assert(vcpu_check(guest, vcpuid, false));
+    return &guest->vcpu[vcpuid];
+}
+
+static void *get_guest_ptr(SpaprMachineStateNestedGuest *guest,
+                           target_ulong vcpuid)
+{
+    return guest;
+}
+
+/*
+ * set=1 means the L1 is trying to set some state
+ * set=0 means the L1 is trying to get some state
+ */
+static void copy_state_8to8(void *a, void *b, bool set)
+{
+    /* set takes from the Big endian element_buf and sets internal buffer */
+
+    if (set) {
+        *(uint64_t *)a = be64_to_cpu(*(uint64_t *)b);
+    } else {
+        *(uint64_t *)b = cpu_to_be64(*(uint64_t *)a);
+    }
+}
+
+static void copy_state_16to16(void *a, void *b, bool set)
+{
+    uint64_t *src, *dst;
+
+    if (set) {
+        src = b;
+        dst = a;
+
+        dst[1] = be64_to_cpu(src[0]);
+        dst[0] = be64_to_cpu(src[1]);
+    } else {
+        src = a;
+        dst = b;
+
+        dst[1] = cpu_to_be64(src[0]);
+        dst[0] = cpu_to_be64(src[1]);
+    }
+}
+
+static void copy_state_4to8(void *a, void *b, bool set)
+{
+    if (set) {
+        *(uint64_t *)a  = (uint64_t) be32_to_cpu(*(uint32_t *)b);
+    } else {
+        *(uint32_t *)b = cpu_to_be32((uint32_t) (*((uint64_t *)a)));
+    }
+}
+
+static void copy_state_pagetbl(void *a, void *b, bool set)
+{
+    uint64_t *pagetbl;
+    uint64_t *buf; /* 3 double words */
+    uint64_t rts;
+
+    assert(set);
+
+    pagetbl = a;
+    buf = b;
+
+    *pagetbl = be64_to_cpu(buf[0]);
+    /* as per ISA section 6.7.6.1 */
+    *pagetbl |= PATE0_HR; /* Host Radix bit is 1 */
+
+    /* RTS */
+    rts = be64_to_cpu(buf[1]);
+    assert(rts == 52);
+    rts = rts - 31; /* since radix tree size = 2^(RTS+31) */
+    *pagetbl |=  ((rts & 0x7) << 5); /* RTS2 is bit 56:58 */
+    *pagetbl |=  (((rts >> 3) & 0x3) << 61); /* RTS1 is bit 1:2 */
+
+    /* RPDS {Size = 2^(RPDS+3) , RPDS >=5} */
+    *pagetbl |= 63 - clz64(be64_to_cpu(buf[2])) - 3;
+}
+
+static void copy_state_proctbl(void *a, void *b, bool set)
+{
+    uint64_t *proctbl;
+    uint64_t *buf; /* 2 double words */
+
+    assert(set);
+
+    proctbl = a;
+    buf = b;
+    /* PRTB: Process Table Base */
+    *proctbl = be64_to_cpu(buf[0]);
+    /* PRTS: Process Table Size = 2^(12+PRTS) */
+    if (be64_to_cpu(buf[1]) == (1ULL << 12)) {
+            *proctbl |= 0;
+    } else if (be64_to_cpu(buf[1]) == (1ULL << 24)) {
+            *proctbl |= 12;
+    } else {
+        g_assert_not_reached();
+    }
+}
+
+static void copy_state_runbuf(void *a, void *b, bool set)
+{
+    uint64_t *buf; /* 2 double words */
+    struct SpaprMachineStateNestedGuestVcpuRunBuf *runbuf;
+
+    assert(set);
+
+    runbuf = a;
+    buf = b;
+
+    runbuf->addr = be64_to_cpu(buf[0]);
+    assert(runbuf->addr);
+
+    /* per spec */
+    assert(be64_to_cpu(buf[1]) <= 16384);
+
+    /*
+     * This will also hit in the input buffer but should be fine for
+     * now. If not we can split this function.
+     */
+    assert(be64_to_cpu(buf[1]) >= VCPU_OUT_BUF_MIN_SZ);
+
+    runbuf->size = be64_to_cpu(buf[1]);
+}
+
+/* tell the L1 how big we want the output vcpu run buffer */
+static void out_buf_min_size(void *a, void *b, bool set)
+{
+    uint64_t *buf; /* 1 double word */
+
+    assert(!set);
+
+    buf = b;
+
+    buf[0] = cpu_to_be64(VCPU_OUT_BUF_MIN_SZ);
+}
+
+static void copy_logical_pvr(void *a, void *b, bool set)
+{
+    uint32_t *buf; /* 1 word */
+    uint32_t *pvr_logical_ptr;
+    uint32_t pvr_logical;
+
+    pvr_logical_ptr = a;
+    buf = b;
+
+    if (!set) {
+        buf[0] = cpu_to_be32(*pvr_logical_ptr);
+        return;
+    }
+
+    pvr_logical = be32_to_cpu(buf[0]);
+    /* don't change the major version */
+    assert((pvr_logical & CPU_POWERPC_POWER_SERVER_MASK) ==
+           (*pvr_logical_ptr & CPU_POWERPC_POWER_SERVER_MASK));
+
+    *pvr_logical_ptr = pvr_logical;
+}
+
+static void copy_tb_offset(void *a, void *b, bool set)
+{
+    SpaprMachineStateNestedGuest *guest;
+    uint64_t *buf; /* 1 double word */
+    uint64_t *tb_offset_ptr;
+    uint64_t tb_offset;
+
+    tb_offset_ptr = a;
+    buf = b;
+
+    if (!set) {
+        buf[0] = cpu_to_be64(*tb_offset_ptr);
+        return;
+    }
+
+    tb_offset = be64_to_cpu(buf[0]);
+    /* need to copy this to the individual tb_offset for each vcpu */
+    guest = container_of(tb_offset_ptr,
+                         struct SpaprMachineStateNestedGuest,
+                         tb_offset);
+    for (int i = 0; i < guest->vcpus; i++) {
+        guest->vcpu[i].tb_offset = tb_offset;
+    }
+}
+
+static void copy_state_dec_expire_tb(void *a, void *b, bool set)
+{
+    int64_t *dec_expiry_tb;
+    uint64_t *buf; /* 1 double word */
+
+    dec_expiry_tb = a;
+    buf = b;
+
+    if (!set) {
+        buf[0] = cpu_to_be64(*dec_expiry_tb);
+        return;
+    }
+
+    *dec_expiry_tb = be64_to_cpu(buf[0]);
+}
+
+static void copy_state_hdecr(void *a, void *b, bool set)
+{
+    uint64_t *buf; /* 1 double word */
+    CPUPPCState *env;
+
+    env = a;
+    buf = b;
+
+    if (!set) {
+        buf[0] = cpu_to_be64(env->tb_env->hdecr_expiry_tb);
+        return;
+    }
+
+    env->tb_env->hdecr_expiry_tb = be64_to_cpu(buf[0]);
+}
+
+static void copy_state_vscr(void *a, void *b, bool set)
+{
+    uint32_t *buf; /* 1 word */
+    CPUPPCState *env;
+
+    env = a;
+    buf = b;
+
+    if (!set) {
+        buf[0] = cpu_to_be32(ppc_get_vscr(env));
+        return;
+    }
+
+    ppc_store_vscr(env, be32_to_cpu(buf[0]));
+}
+
+static void copy_state_fpscr(void *a, void *b, bool set)
+{
+    uint64_t *buf; /* 1 double word */
+    CPUPPCState *env;
+
+    env = a;
+    buf = b;
+
+    if (!set) {
+        buf[0] = cpu_to_be64(env->fpscr);
+        return;
+    }
+
+    ppc_store_fpscr(env, be64_to_cpu(buf[0]));
+}
+
+static void copy_state_cr(void *a, void *b, bool set)
+{
+    uint32_t *buf; /* 1 word */
+    CPUPPCState *env;
+    uint64_t cr; /* API v1 uses uint64_t but PAPR API v2 mentions 4 bytes */
+    env = a;
+    buf = b;
+
+    if (!set) {
+        buf[0] = cpu_to_be32((uint32_t)ppc_get_cr(env));
+        return;
+    }
+    cr = be32_to_cpu(buf[0]);
+    ppc_set_cr(env, cr);
+}
+
+struct guest_state_element_type guest_state_element_types[] = {
+    GUEST_STATE_ELEMENT_NOP(GSB_HV_VCPU_IGNORED_ID, 0),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR0,  gpr[0]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR1,  gpr[1]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR2,  gpr[2]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR3,  gpr[3]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR4,  gpr[4]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR5,  gpr[5]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR6,  gpr[6]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR7,  gpr[7]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR8,  gpr[8]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR9,  gpr[9]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR10, gpr[10]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR11, gpr[11]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR12, gpr[12]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR13, gpr[13]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR14, gpr[14]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR15, gpr[15]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR16, gpr[16]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR17, gpr[17]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR18, gpr[18]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR19, gpr[19]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR20, gpr[20]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR21, gpr[21]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR22, gpr[22]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR23, gpr[23]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR24, gpr[24]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR25, gpr[25]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR26, gpr[26]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR27, gpr[27]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR28, gpr[28]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR29, gpr[29]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR30, gpr[30]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_GPR31, gpr[31]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_NIA, nip),
+    GSE_ENV_DWM(GSB_VCPU_SPR_MSR, msr, HVMASK_MSR),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_CTR, ctr),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_LR, lr),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_XER, xer),
+    GUEST_STATE_ELEMENT_ENV_BASE(GSB_VCPU_SPR_CR, 4, copy_state_cr),
+    GUEST_STATE_ELEMENT_NOP_DW(GSB_VCPU_SPR_MMCR3),
+    GUEST_STATE_ELEMENT_NOP_DW(GSB_VCPU_SPR_SIER2),
+    GUEST_STATE_ELEMENT_NOP_DW(GSB_VCPU_SPR_SIER3),
+    GUEST_STATE_ELEMENT_NOP_W(GSB_VCPU_SPR_WORT),
+    GSE_ENV_DWM(GSB_VCPU_SPR_LPCR, spr[SPR_LPCR], HVMASK_LPCR),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_AMOR, spr[SPR_AMOR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_HFSCR, spr[SPR_HFSCR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_DAWR0, spr[SPR_DAWR0]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_DAWRX0, spr[SPR_DAWRX0]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_CIABR, spr[SPR_CIABR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_PURR,  spr[SPR_PURR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_SPURR, spr[SPR_SPURR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_IC,    spr[SPR_IC]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_VTB,   spr[SPR_VTB]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_HDAR,  spr[SPR_HDAR]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_HDSISR, spr[SPR_HDSISR]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_HEIR,   spr[SPR_HEIR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_ASDR,  spr[SPR_ASDR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_SRR0, spr[SPR_SRR0]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_SRR1, spr[SPR_SRR1]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_SPRG0, spr[SPR_SPRG0]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_SPRG1, spr[SPR_SPRG1]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_SPRG2, spr[SPR_SPRG2]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_SPRG3, spr[SPR_SPRG3]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_PIDR,   spr[SPR_BOOKS_PID]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_CFAR, cfar),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_PPR, spr[SPR_PPR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_DAWR1, spr[SPR_DAWR1]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_DAWRX1, spr[SPR_DAWRX1]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_DEXCR, spr[SPR_DEXCR]),
+    GSE_ENV_DWM(GSB_VCPU_SPR_HDEXCR, spr[SPR_HDEXCR], HVMASK_HDEXCR),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_HASHKEYR,  spr[SPR_HASHKEYR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_HASHPKEYR, spr[SPR_HASHPKEYR]),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR0, 16, vsr[0], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR1, 16, vsr[1], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR2, 16, vsr[2], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR3, 16, vsr[3], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR4, 16, vsr[4], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR5, 16, vsr[5], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR6, 16, vsr[6], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR7, 16, vsr[7], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR8, 16, vsr[8], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR9, 16, vsr[9], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR10, 16, vsr[10], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR11, 16, vsr[11], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR12, 16, vsr[12], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR13, 16, vsr[13], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR14, 16, vsr[14], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR15, 16, vsr[15], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR16, 16, vsr[16], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR17, 16, vsr[17], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR18, 16, vsr[18], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR19, 16, vsr[19], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR20, 16, vsr[20], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR21, 16, vsr[21], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR22, 16, vsr[22], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR23, 16, vsr[23], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR24, 16, vsr[24], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR25, 16, vsr[25], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR26, 16, vsr[26], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR27, 16, vsr[27], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR28, 16, vsr[28], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR29, 16, vsr[29], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR30, 16, vsr[30], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR31, 16, vsr[31], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR32, 16, vsr[32], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR33, 16, vsr[33], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR34, 16, vsr[34], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR35, 16, vsr[35], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR36, 16, vsr[36], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR37, 16, vsr[37], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR38, 16, vsr[38], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR39, 16, vsr[39], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR40, 16, vsr[40], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR41, 16, vsr[41], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR42, 16, vsr[42], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR43, 16, vsr[43], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR44, 16, vsr[44], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR45, 16, vsr[45], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR46, 16, vsr[46], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR47, 16, vsr[47], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR48, 16, vsr[48], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR49, 16, vsr[49], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR50, 16, vsr[50], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR51, 16, vsr[51], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR52, 16, vsr[52], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR53, 16, vsr[53], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR54, 16, vsr[54], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR55, 16, vsr[55], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR56, 16, vsr[56], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR57, 16, vsr[57], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR58, 16, vsr[58], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR59, 16, vsr[59], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR60, 16, vsr[60], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR61, 16, vsr[61], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR62, 16, vsr[62], copy_state_16to16),
+    GUEST_STATE_ELEMENT_ENV(GSB_VCPU_SPR_VSR63, 16, vsr[63], copy_state_16to16),
+    GSBE_NESTED(GSB_PART_SCOPED_PAGETBL, 0x18, parttbl[0],  copy_state_pagetbl),
+    GSBE_NESTED(GSB_PROCESS_TBL,         0x10, parttbl[1],  copy_state_proctbl),
+    GSBE_NESTED(GSB_VCPU_LPVR,           0x4,  pvr_logical, copy_logical_pvr),
+    GSBE_NESTED_MSK(GSB_TB_OFFSET, 0x8, tb_offset, copy_tb_offset,
+                    HVMASK_TB_OFFSET),
+    GSBE_NESTED_VCPU(GSB_VCPU_IN_BUFFER, 0x10, runbufin,    copy_state_runbuf),
+    GSBE_NESTED_VCPU(GSB_VCPU_OUT_BUFFER, 0x10, runbufout,   copy_state_runbuf),
+    GSBE_NESTED_VCPU(GSB_VCPU_OUT_BUF_MIN_SZ, 0x8, runbufout, out_buf_min_size),
+    GSBE_NESTED_VCPU(GSB_VCPU_DEC_EXPIRE_TB, 0x8, dec_expiry_tb,
+                     copy_state_dec_expire_tb),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_EBBHR, spr[SPR_EBBHR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_TAR,   spr[SPR_TAR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_EBBRR, spr[SPR_EBBRR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_BESCR, spr[SPR_BESCR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_IAMR,  spr[SPR_IAMR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_AMR,   spr[SPR_AMR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_UAMOR, spr[SPR_UAMOR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_DSCR,  spr[SPR_DSCR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_FSCR,  spr[SPR_FSCR]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_PSPB,   spr[SPR_PSPB]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_CTRL,  spr[SPR_CTRL]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_VRSAVE, spr[SPR_VRSAVE]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_DAR,   spr[SPR_DAR]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_DSISR,  spr[SPR_DSISR]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_PMC1, spr[SPR_POWER_PMC1]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_PMC2, spr[SPR_POWER_PMC2]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_PMC3, spr[SPR_POWER_PMC3]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_PMC4, spr[SPR_POWER_PMC4]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_PMC5, spr[SPR_POWER_PMC5]),
+    GUEST_STATE_ELEMENT_ENV_W(GSB_VCPU_SPR_PMC6, spr[SPR_POWER_PMC6]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_MMCR0, spr[SPR_POWER_MMCR0]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_MMCR1, spr[SPR_POWER_MMCR1]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_MMCR2, spr[SPR_POWER_MMCR2]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_MMCRA, spr[SPR_POWER_MMCRA]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_SDAR,  spr[SPR_POWER_SDAR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_SIAR,  spr[SPR_POWER_SIAR]),
+    GUEST_STATE_ELEMENT_ENV_DW(GSB_VCPU_SPR_SIER,  spr[SPR_POWER_SIER]),
+    GUEST_STATE_ELEMENT_ENV_BASE(GSB_VCPU_HDEC_EXPIRY_TB, 8, copy_state_hdecr),
+    GUEST_STATE_ELEMENT_ENV_BASE(GSB_VCPU_SPR_VSCR,  4, copy_state_vscr),
+    GUEST_STATE_ELEMENT_ENV_BASE(GSB_VCPU_SPR_FPSCR, 8, copy_state_fpscr)
+};
+
+void init_nested(void)
+{
+    struct guest_state_element_type *type;
+    int i;
+
+    /* Init the guest state elements lookup table, flags for now */
+    for (i = 0; i < ARRAY_SIZE(guest_state_element_types); i++) {
+        type = &guest_state_element_types[i];
+
+        assert(type->id <= GSB_LAST);
+        if (type->id >= GSB_VCPU_SPR_HDAR) {
+            /* 0xf000 - 0xf005 Thread + RO */
+            type->flags = GUEST_STATE_ELEMENT_TYPE_FLAG_READ_ONLY;
+        } else if (type->id >= GSB_VCPU_IN_BUFFER) {
+            /* 0x0c00 - 0xf000 Thread + RW */
+            type->flags = 0;
+        } else if (type->id >= GSB_VCPU_LPVR) {
+            /* 0x0003 - 0x0bff Guest + RW */
+            type->flags = GUEST_STATE_ELEMENT_TYPE_FLAG_GUEST_WIDE;
+        } else if (type->id >= GSB_HV_VCPU_STATE_SIZE) {
+            /* 0x0001 - 0x0002 Guest + RO */
+            type->flags = GUEST_STATE_ELEMENT_TYPE_FLAG_READ_ONLY |
+                          GUEST_STATE_ELEMENT_TYPE_FLAG_GUEST_WIDE;
+        }
+    }
+}
+
+
 static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
                                              SpaprMachineState *spapr,
                                              target_ulong opcode,
@@ -658,4 +1139,10 @@ void spapr_register_nested_phyp(void)
 {
     /* DO NOTHING */
 }
+
+void init_nested(void)
+{
+    /* DO NOTHING */
+}
+
 #endif
diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
index e095c002dc..d7acc28d17 100644
--- a/include/hw/ppc/ppc.h
+++ b/include/hw/ppc/ppc.h
@@ -33,6 +33,8 @@ struct ppc_tb_t {
     QEMUTimer *decr_timer;
     /* Hypervisor decrementer management */
     uint64_t hdecr_next;    /* Tick for next hdecr interrupt  */
+    /* TB value at which HDEC fires, returning control to the host */
+    uint64_t hdecr_expiry_tb;
     QEMUTimer *hdecr_timer;
     int64_t purr_offset;
     void *opaque;
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 2e8c6ba1ca..3c0d6a486e 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -200,6 +200,95 @@
 /* Nested PAPR API macros */
 #define NESTED_GUEST_MAX 4096
 #define NESTED_GUEST_VCPU_MAX 2048
+#define VCPU_OUT_BUF_MIN_SZ   0x80ULL
+#define HVMASK_DEFAULT        0xffffffffffffffff
+#define HVMASK_LPCR           0x0070000003820800 /* BE format */
+#define HVMASK_MSR            0xEBFFFFFFFFBFEFFF
+#define HVMASK_HDEXCR         0x00000000FFFFFFFF
+#define HVMASK_TB_OFFSET      0x000000FFFFFFFFFF
+
+#define GUEST_STATE_ELEMENT(i, sz, s, f, ptr, c) { \
+    .id = (i),                                     \
+    .size = (sz),                                  \
+    .location = ptr,                               \
+    .offset = offsetof(struct s, f),               \
+    .copy = (c)                                    \
+}
+
+#define GSBE_NESTED(i, sz, f, c) {                             \
+    .id = (i),                                                 \
+    .size = (sz),                                              \
+    .location = get_guest_ptr,                                 \
+    .offset = offsetof(struct SpaprMachineStateNestedGuest, f),\
+    .copy = (c),                                               \
+    .mask = HVMASK_DEFAULT                                     \
+}
+
+#define GSBE_NESTED_MSK(i, sz, f, c, m) {                      \
+    .id = (i),                                                 \
+    .size = (sz),                                              \
+    .location = get_guest_ptr,                                 \
+    .offset = offsetof(struct SpaprMachineStateNestedGuest, f),\
+    .copy = (c),                                               \
+    .mask = (m)                                                \
+}
+
+#define GSBE_NESTED_VCPU(i, sz, f, c) {                            \
+    .id = (i),                                                     \
+    .size = (sz),                                                  \
+    .location = get_vcpu_ptr,                                      \
+    .offset = offsetof(struct SpaprMachineStateNestedGuestVcpu, f),\
+    .copy = (c),                                                   \
+    .mask = HVMASK_DEFAULT                                         \
+}
+
+#define GUEST_STATE_ELEMENT_NOP(i, sz) { \
+    .id = (i),                             \
+    .size = (sz),                          \
+    .location = NULL,                      \
+    .offset = 0,                           \
+    .copy = NULL,                          \
+    .mask = HVMASK_DEFAULT                 \
+}
+
+#define GUEST_STATE_ELEMENT_NOP_DW(i)   \
+        GUEST_STATE_ELEMENT_NOP(i, 8)
+#define GUEST_STATE_ELEMENT_NOP_W(i) \
+        GUEST_STATE_ELEMENT_NOP(i, 4)
+
+#define GUEST_STATE_ELEMENT_ENV_BASE(i, s, c) {  \
+            .id = (i),                           \
+            .size = (s),                         \
+            .location = get_vcpu_env_ptr,        \
+            .offset = 0,                         \
+            .copy = (c),                         \
+            .mask = HVMASK_DEFAULT               \
+    }
+
+#define GUEST_STATE_ELEMENT_ENV(i, s, f, c) {    \
+            .id = (i),                           \
+            .size = (s),                         \
+            .location = get_vcpu_env_ptr,        \
+            .offset = offsetof(CPUPPCState, f),  \
+            .copy = (c),                         \
+            .mask = HVMASK_DEFAULT               \
+    }
+
+#define GUEST_STATE_ELEMENT_MSK(i, s, f, c, m) { \
+            .id = (i),                           \
+            .size = (s),                         \
+            .location = get_vcpu_env_ptr,        \
+            .offset = offsetof(CPUPPCState, f),  \
+            .copy = (c),                         \
+            .mask = (m)                          \
+    }
+
+#define GUEST_STATE_ELEMENT_ENV_DW(i, f) \
+    GUEST_STATE_ELEMENT_ENV(i, 8, f, copy_state_8to8)
+#define GUEST_STATE_ELEMENT_ENV_W(i, f) \
+    GUEST_STATE_ELEMENT_ENV(i, 4, f, copy_state_4to8)
+#define GSE_ENV_DWM(i, f, m) \
+    GUEST_STATE_ELEMENT_MSK(i, 8, f, copy_state_8to8, m)
 
 typedef struct SpaprMachineStateNestedGuest {
     unsigned long vcpus;
@@ -235,6 +324,18 @@ typedef struct SpaprMachineStateNestedGuestVcpu {
     int64_t dec_expiry_tb;
 } SpaprMachineStateNestedGuestVcpu;
 
+struct guest_state_element_type {
+    uint16_t id;
+    int size;
+#define GUEST_STATE_ELEMENT_TYPE_FLAG_GUEST_WIDE 0x1
+#define GUEST_STATE_ELEMENT_TYPE_FLAG_READ_ONLY  0x2
    uint16_t flags;
+    void *(*location)(SpaprMachineStateNestedGuest *, target_ulong);
+    size_t offset;
+    void (*copy)(void *, void *, bool);
+    uint64_t mask;
+};
+
 /*
  * Register state for entering a nested guest with H_ENTER_NESTED.
  * New member must be added at the end.
@@ -345,5 +446,6 @@ struct nested_ppc_state {
 void spapr_register_nested(void);
 void spapr_register_nested_phyp(void);
 void spapr_exit_nested(PowerPCCPU *cpu, int excp);
+void init_nested(void);
 
 #endif /* HW_SPAPR_NESTED_H */
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH RESEND 11/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_[GET|SET]_STATE
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
                   ` (9 preceding siblings ...)
  2023-09-06  4:33 ` [PATCH RESEND 10/15] ppc: spapr: Initialize the GSB Elements lookup table Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-07  3:30   ` Nicholas Piggin
  2023-09-06  4:33 ` [PATCH RESEND 12/15] ppc: spapr: Use correct source for parttbl info for nested PAPR API Harsh Prateek Bora
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

L1 can request to get or set the state of any of the supported Guest
State Buffer (GSB) elements using the h_guest_[get|set]_state hcalls.
These hcalls need to perform the necessary validation checks for each
get/set request based on the flags passed and the operation requested.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/spapr_nested.c         | 267 ++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_nested.h |  22 +++
 2 files changed, 289 insertions(+)

diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 6fbb1bcb02..498e7286fa 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -897,6 +897,138 @@ void init_nested(void)
     }
 }
 
+static struct guest_state_element *guest_state_element_next(
+    struct guest_state_element *element,
+    int64_t *len,
+    int64_t *num_elements)
+{
+    uint16_t size;
+
+    /* size is of element->value[] only. Not whole guest_state_element */
+    size = be16_to_cpu(element->size);
+
+    if (len) {
+        *len -= size + offsetof(struct guest_state_element, value);
+    }
+
+    if (num_elements) {
+        *num_elements -= 1;
+    }
+
+    return (struct guest_state_element *)(element->value + size);
+}
+
+static
+struct guest_state_element_type *guest_state_element_type_find(uint16_t id)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(guest_state_element_types); i++) {
+        if (id == guest_state_element_types[i].id) {
+            return &guest_state_element_types[i];
+        }
+    }
+
+    return NULL;
+}
+
+static void print_element(struct guest_state_element *element,
+                          struct guest_state_request *gsr)
+{
+    printf("id:0x%04x size:0x%04x %s ",
+           be16_to_cpu(element->id), be16_to_cpu(element->size),
+           gsr->flags & GUEST_STATE_REQUEST_SET ? "set" : "get");
+    printf("buf:0x%016lx ...\n", be64_to_cpu(*(uint64_t *)element->value));
+}
+
+static bool guest_state_request_check(struct guest_state_request *gsr)
+{
+    int64_t num_elements, len = gsr->len;
+    struct guest_state_buffer *gsb = gsr->gsb;
+    struct guest_state_element *element;
+    struct guest_state_element_type *type;
+    uint16_t id, size;
+
+    /* Even with num_elements = 0, the buffer holds a 32-bit count */
+    assert(len >= 4);
+
+    num_elements = be32_to_cpu(gsb->num_elements);
+    element = gsb->elements;
+    len -= sizeof(gsb->num_elements);
+
+    /* Walk the buffer to validate the length */
+    while (num_elements) {
+
+        id = be16_to_cpu(element->id);
+        size = be16_to_cpu(element->size);
+
+        if (false) {
+            print_element(element, gsr);
+        }
+        /* buffer size too small */
+        if (len < 0) {
+            return false;
+        }
+
+        type = guest_state_element_type_find(id);
+        if (!type) {
+            printf("%s: Element ID %04x unknown\n", __func__, id);
+            print_element(element, gsr);
+            return false;
+        }
+
+        if (id == GSB_HV_VCPU_IGNORED_ID) {
+            goto next_element;
+        }
+
+        if (size != type->size) {
+            printf("%s: Size mismatch. Element ID:%04x. Size Exp:%i Got:%i\n",
+                   __func__, id, type->size, size);
+            print_element(element, gsr);
+            return false;
+        }
+
+        if ((type->flags & GUEST_STATE_ELEMENT_TYPE_FLAG_READ_ONLY) &&
+            (gsr->flags & GUEST_STATE_REQUEST_SET)) {
+            printf("%s: trying to set a read-only Element ID:%04x.\n",
+                   __func__, id);
+            return false;
+        }
+
+        if (type->flags & GUEST_STATE_ELEMENT_TYPE_FLAG_GUEST_WIDE) {
+            /* guest wide element type */
+            if (!(gsr->flags & GUEST_STATE_REQUEST_GUEST_WIDE)) {
+                printf("%s: trying to set a guest wide Element ID:%04x.\n",
+                       __func__, id);
+                return false;
+            }
+        } else {
+            /* thread wide element type */
+            if (gsr->flags & GUEST_STATE_REQUEST_GUEST_WIDE) {
+                printf("%s: trying to set a thread wide Element ID:%04x.\n",
+                       __func__, id);
+                return false;
+            }
+        }
+next_element:
+        element = guest_state_element_next(element, &len, &num_elements);
+
+    }
+    return true;
+}
+
+static bool is_gsr_invalid(struct guest_state_request *gsr,
+                                   struct guest_state_element *element,
+                                   struct guest_state_element_type *type)
+{
+    if ((gsr->flags & GUEST_STATE_REQUEST_SET) &&
+        (*(uint64_t *)(element->value) & ~(type->mask))) {
+        print_element(element, gsr);
+        printf("L1 can't set reserved bits (allowed mask: 0x%08lx)\n",
+               type->mask);
+        return true;
+    }
+    return false;
+}
 
 static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
                                              SpaprMachineState *spapr,
@@ -1108,6 +1240,139 @@ static target_ulong h_guest_create_vcpu(PowerPCCPU *cpu,
     return H_SUCCESS;
 }
 
+static target_ulong getset_state(SpaprMachineStateNestedGuest *guest,
+                                 uint64_t vcpuid,
+                                 struct guest_state_request *gsr)
+{
+    void *ptr;
+    uint16_t id;
+    struct guest_state_element *element;
+    struct guest_state_element_type *type;
+    int64_t lenleft, num_elements;
+
+    lenleft = gsr->len;
+
+    if (!guest_state_request_check(gsr)) {
+        return H_P3;
+    }
+
+    num_elements = be32_to_cpu(gsr->gsb->num_elements);
+    element = gsr->gsb->elements;
+    /* Process the elements */
+    while (num_elements) {
+        type = NULL;
+        /* Debug print before doing anything */
+        if (false) {
+            print_element(element, gsr);
+        }
+
+        id = be16_to_cpu(element->id);
+        if (id == GSB_HV_VCPU_IGNORED_ID) {
+            goto next_element;
+        }
+
+        type = guest_state_element_type_find(id);
+        assert(type);
+
+        /* Get pointer to guest data to get/set */
+        if (type->location && type->copy) {
+            ptr = type->location(guest, vcpuid);
+            assert(ptr);
+            if (~(type->mask) && is_gsr_invalid(gsr, element, type)) {
+                return H_INVALID_ELEMENT_VALUE;
+            }
+            type->copy(ptr + type->offset, element->value,
+                       gsr->flags & GUEST_STATE_REQUEST_SET ? true : false);
+        }
+
+next_element:
+        element = guest_state_element_next(element, &lenleft, &num_elements);
+    }
+
+    return H_SUCCESS;
+}
+
+static target_ulong map_and_getset_state(PowerPCCPU *cpu,
+                                         SpaprMachineStateNestedGuest *guest,
+                                         uint64_t vcpuid,
+                                         struct guest_state_request *gsr)
+{
+    target_ulong rc;
+    int64_t lenleft, len;
+    bool is_write;
+
+    assert(gsr->len < (1024 * 1024)); /* sanity check */
+
+    lenleft = len = gsr->len;
+    gsr->gsb = address_space_map(CPU(cpu)->as, gsr->buf, (uint64_t *)&len,
+                                 false, MEMTXATTRS_UNSPECIFIED);
+    if (!gsr->gsb) {
+        rc = H_P3;
+        goto out1;
+    }
+
+    if (len != lenleft) {
+        rc = H_P3;
+        goto out1;
+    }
+
+    rc = getset_state(guest, vcpuid, gsr);
+
+out1:
+    is_write = (rc == H_SUCCESS) ? len : 0;
+    address_space_unmap(CPU(cpu)->as, gsr->gsb, len, is_write, false);
+    return rc;
+}
+
+static target_ulong h_guest_getset_state(PowerPCCPU *cpu,
+                                         SpaprMachineState *spapr,
+                                         target_ulong *args,
+                                         bool set)
+{
+    target_ulong flags = args[0];
+    target_ulong lpid = args[1];
+    target_ulong vcpuid = args[2];
+    target_ulong buf = args[3];
+    target_ulong buflen = args[4];
+    struct guest_state_request gsr;
+    SpaprMachineStateNestedGuest *guest;
+
+    guest = spapr_get_nested_guest(spapr, lpid);
+    if (!guest) {
+        return H_P2;
+    }
+    gsr.buf = buf;
+    gsr.len = buflen;
+    gsr.flags = 0;
+    if (flags & H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE) {
+        gsr.flags |= GUEST_STATE_REQUEST_GUEST_WIDE;
+    }
+    if (flags & ~H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE) {
+        return H_PARAMETER; /* flag not supported yet */
+    }
+
+    if (set) {
+        gsr.flags |= GUEST_STATE_REQUEST_SET;
+    }
+    return map_and_getset_state(cpu, guest, vcpuid, &gsr);
+}
+
+static target_ulong h_guest_set_state(PowerPCCPU *cpu,
+                                      SpaprMachineState *spapr,
+                                      target_ulong opcode,
+                                      target_ulong *args)
+{
+    return h_guest_getset_state(cpu, spapr, args, true);
+}
+
+static target_ulong h_guest_get_state(PowerPCCPU *cpu,
+                                      SpaprMachineState *spapr,
+                                      target_ulong opcode,
+                                      target_ulong *args)
+{
+    return h_guest_getset_state(cpu, spapr, args, false);
+}
+
 void spapr_register_nested(void)
 {
     spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
@@ -1122,6 +1387,8 @@ void spapr_register_nested_phyp(void)
     spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
     spapr_register_hypercall(H_GUEST_CREATE          , h_guest_create);
     spapr_register_hypercall(H_GUEST_CREATE_VCPU     , h_guest_create_vcpu);
+    spapr_register_hypercall(H_GUEST_SET_STATE       , h_guest_set_state);
+    spapr_register_hypercall(H_GUEST_GET_STATE       , h_guest_get_state);
 }
 
 #else
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 3c0d6a486e..eaee624b87 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -206,6 +206,9 @@
 #define HVMASK_MSR            0xEBFFFFFFFFBFEFFF
 #define HVMASK_HDEXCR         0x00000000FFFFFFFF
 #define HVMASK_TB_OFFSET      0x000000FFFFFFFFFF
+#define H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE 0x8000000000000000 /* BE in GSB */
+#define GUEST_STATE_REQUEST_GUEST_WIDE       0x1
+#define GUEST_STATE_REQUEST_SET              0x2
 
 #define GUEST_STATE_ELEMENT(i, sz, s, f, ptr, c) { \
     .id = (i),                                     \
@@ -336,6 +339,25 @@ struct guest_state_element_type {
     uint64_t mask;
 };
 
+struct guest_state_element {
+    uint16_t id;   /* Big Endian */
+    uint16_t size; /* Big Endian */
+    uint8_t value[]; /* Big Endian (based on size above) */
+} QEMU_PACKED;
+
+struct guest_state_buffer {
+    uint32_t num_elements; /* Big Endian */
+    struct guest_state_element elements[];
+} QEMU_PACKED;
+
+/* Actual buffer plus some metadata about the request */
+struct guest_state_request {
+    struct guest_state_buffer *gsb;
+    int64_t buf;
+    int64_t len;
+    uint16_t flags;
+};
+
 /*
  * Register state for entering a nested guest with H_ENTER_NESTED.
  * New member must be added at the end.
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH RESEND 12/15] ppc: spapr: Use correct source for parttbl info for nested PAPR API.
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
                   ` (10 preceding siblings ...)
  2023-09-06  4:33 ` [PATCH RESEND 11/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_[GET|SET]_STATE Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-06  4:33 ` [PATCH RESEND 13/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_RUN_VCPU Harsh Prateek Bora
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

For the nested PAPR API, we use the SpaprMachineStateNestedGuest struct
to store the partition table info. Therefore, use the same in
spapr_get_pate() as well.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/spapr.c         | 14 ++++++++++++++
 hw/ppc/spapr_nested.c  |  1 -
 include/hw/ppc/spapr.h |  3 +++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7c6f6ee25d..ee4b073d19 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1361,9 +1361,23 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
         patb += 16 * lpid;
         entry->dw0 = ldq_phys(CPU(cpu)->as, patb);
         entry->dw1 = ldq_phys(CPU(cpu)->as, patb + 8);
+        return true;
     }
 
+#ifdef CONFIG_TCG
+    /* Nested PAPR API */
+    SpaprMachineStateNestedGuest *guest;
+    assert(lpid != 0);
+    guest = spapr_get_nested_guest(spapr, lpid);
+    assert(guest != NULL);
+
+    entry->dw0 = guest->parttbl[0];
+    entry->dw1 = guest->parttbl[1];
+
     return true;
+#else
+    return false;
+#endif
 }
 
 #define HPTE(_table, _i)   (void *)(((uint64_t *)(_table)) + ((_i) * 2))
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 498e7286fa..67e389a762 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -377,7 +377,6 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
     address_space_unmap(CPU(cpu)->as, regs, len, len, true);
 }
 
-static
 SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
                                                      target_ulong lpid)
 {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index c9f9682a46..cdc256f057 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -1052,5 +1052,8 @@ void spapr_vof_client_dt_finalize(SpaprMachineState *spapr, void *fdt);
 
 /* H_WATCHDOG */
 void spapr_watchdog_init(SpaprMachineState *spapr);
+/* Nested PAPR */
+SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
+                                                     target_ulong lpid);
 
 #endif /* HW_SPAPR_H */
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH RESEND 13/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_RUN_VCPU
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
                   ` (11 preceding siblings ...)
  2023-09-06  4:33 ` [PATCH RESEND 12/15] ppc: spapr: Use correct source for parttbl info for nested PAPR API Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-07  3:55   ` Nicholas Piggin
  2023-09-06  4:33 ` [PATCH RESEND 14/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_DELETE Harsh Prateek Bora
  2023-09-06  4:33 ` [PATCH RESEND 15/15] ppc: spapr: Document Nested PAPR API Harsh Prateek Bora
  14 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

Once the L1 has created a nested guest and its associated VCPU, it can
request execution of the nested guest by setting its initial state,
either via h_guest_set_state() or via the input buffer passed along
with the call to h_guest_run_vcpu(). On guest exit, the L0 uses output
buffers to convey the exit cause to the L1. The L0 takes care of
switching context from L1 to L2 on guest entry and restores the L1
context on guest exit.

Unlike nested-hv, in nested-papr the entire state of the L2 (nested)
guest is retained by the L0 after guest exit and restored on the next entry.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/spapr_nested.c           | 471 +++++++++++++++++++++++++++-----
 include/hw/ppc/spapr_cpu_core.h |   7 +-
 include/hw/ppc/spapr_nested.h   |   6 +
 3 files changed, 408 insertions(+), 76 deletions(-)

diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 67e389a762..3605f27115 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -12,6 +12,17 @@
 #ifdef CONFIG_TCG
 #define PRTS_MASK      0x1f
 
+static void exit_nested_restore_vcpu(PowerPCCPU *cpu, int excp,
+                                     SpaprMachineStateNestedGuestVcpu *vcpu);
+static void exit_process_output_buffer(PowerPCCPU *cpu,
+                                      SpaprMachineStateNestedGuest *guest,
+                                      target_ulong vcpuid,
+                                      target_ulong *r3);
+static void restore_common_regs(CPUPPCState *dst, CPUPPCState *src);
+static bool vcpu_check(SpaprMachineStateNestedGuest *guest,
+                       target_ulong vcpuid,
+                       bool inoutbuf);
+
 static target_ulong h_set_ptbl(PowerPCCPU *cpu,
                                SpaprMachineState *spapr,
                                target_ulong opcode,
@@ -187,21 +198,21 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
         return H_PARAMETER;
     }
 
-    spapr_cpu->nested_host_state = g_try_new(struct nested_ppc_state, 1);
-    if (!spapr_cpu->nested_host_state) {
+    spapr_cpu->nested_hv_host = g_try_new(struct nested_ppc_state, 1);
+    if (!spapr_cpu->nested_hv_host) {
         return H_NO_MEM;
     }
 
     assert(env->spr[SPR_LPIDR] == 0);
     assert(env->spr[SPR_DPDES] == 0);
-    nested_save_state(spapr_cpu->nested_host_state, cpu);
+    nested_save_state(spapr_cpu->nested_hv_host, cpu);
 
     len = sizeof(*regs);
     regs = address_space_map(CPU(cpu)->as, regs_ptr, &len, false,
                                 MEMTXATTRS_UNSPECIFIED);
     if (!regs || len != sizeof(*regs)) {
         address_space_unmap(CPU(cpu)->as, regs, len, 0, false);
-        g_free(spapr_cpu->nested_host_state);
+        g_free(spapr_cpu->nested_hv_host);
         return H_P2;
     }
 
@@ -276,105 +287,146 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
 
 void spapr_exit_nested(PowerPCCPU *cpu, int excp)
 {
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    CPUState *cs = CPU(cpu);
     CPUPPCState *env = &cpu->env;
     SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
+    target_ulong r3_return = env->excp_vectors[excp]; /* hcall return value */
     struct nested_ppc_state l2_state;
-    target_ulong hv_ptr = spapr_cpu->nested_host_state->gpr[4];
-    target_ulong regs_ptr = spapr_cpu->nested_host_state->gpr[5];
-    target_ulong hsrr0, hsrr1, hdar, asdr, hdsisr;
+    target_ulong hv_ptr, regs_ptr;
+    target_ulong hsrr0 = 0, hsrr1 = 0, hdar = 0, asdr = 0, hdsisr = 0;
     struct kvmppc_hv_guest_state *hvstate;
     struct kvmppc_pt_regs *regs;
     hwaddr len;
+    target_ulong lpid = 0, vcpuid = 0;
+    struct SpaprMachineStateNestedGuestVcpu *vcpu = NULL;
+    struct SpaprMachineStateNestedGuest *guest = NULL;
 
     assert(spapr_cpu->in_nested);
-
-    nested_save_state(&l2_state, cpu);
-    hsrr0 = env->spr[SPR_HSRR0];
-    hsrr1 = env->spr[SPR_HSRR1];
-    hdar = env->spr[SPR_HDAR];
-    hdsisr = env->spr[SPR_HDSISR];
-    asdr = env->spr[SPR_ASDR];
+    if (spapr->nested.api == NESTED_API_KVM_HV) {
+        nested_save_state(&l2_state, cpu);
+        hsrr0 = env->spr[SPR_HSRR0];
+        hsrr1 = env->spr[SPR_HSRR1];
+        hdar = env->spr[SPR_HDAR];
+        hdsisr = env->spr[SPR_HDSISR];
+        asdr = env->spr[SPR_ASDR];
+    } else if (spapr->nested.api == NESTED_API_PAPR) {
+        lpid = spapr_cpu->nested_papr_host->gpr[5];
+        vcpuid = spapr_cpu->nested_papr_host->gpr[6];
+        guest = spapr_get_nested_guest(spapr, lpid);
+        assert(guest);
+        vcpu_check(guest, vcpuid, false);
+        vcpu = &guest->vcpu[vcpuid];
+
+        exit_nested_restore_vcpu(cpu, excp, vcpu);
+        /* do the output buffer for run_vcpu*/
+        exit_process_output_buffer(cpu, guest, vcpuid, &r3_return);
+    } else
+        g_assert_not_reached();
 
     /*
      * Switch back to the host environment (including for any error).
      */
     assert(env->spr[SPR_LPIDR] != 0);
-    nested_load_state(cpu, spapr_cpu->nested_host_state);
-    env->gpr[3] = env->excp_vectors[excp]; /* hcall return value */
 
-    cpu_ppc_hdecr_exit(env);
+    if (spapr->nested.api == NESTED_API_KVM_HV) {
+        nested_load_state(cpu, spapr_cpu->nested_hv_host);
+        env->gpr[3] = r3_return;
+    } else if (spapr->nested.api == NESTED_API_PAPR) {
+        restore_common_regs(env, spapr_cpu->nested_papr_host);
+        env->tb_env->tb_offset -= vcpu->tb_offset;
+        env->gpr[3] = H_SUCCESS;
+        env->gpr[4] = r3_return;
+        hreg_compute_hflags(env);
+        ppc_maybe_interrupt(env);
+        tlb_flush(cs);
+        env->reserve_addr = -1; /* Reset the reservation */
+    }
 
-    spapr_cpu->in_nested = false;
+    cpu_ppc_hdecr_exit(env);
 
-    g_free(spapr_cpu->nested_host_state);
-    spapr_cpu->nested_host_state = NULL;
+    if (spapr->nested.api == NESTED_API_KVM_HV) {
+        hv_ptr = spapr_cpu->nested_hv_host->gpr[4];
+        regs_ptr = spapr_cpu->nested_hv_host->gpr[5];
+
+        len = sizeof(*hvstate);
+        hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
+                                    MEMTXATTRS_UNSPECIFIED);
+        if (len != sizeof(*hvstate)) {
+            address_space_unmap(CPU(cpu)->as, hvstate, len, 0, true);
+            env->gpr[3] = H_PARAMETER;
+            return;
+        }
 
-    len = sizeof(*hvstate);
-    hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
-                                MEMTXATTRS_UNSPECIFIED);
-    if (len != sizeof(*hvstate)) {
-        address_space_unmap(CPU(cpu)->as, hvstate, len, 0, true);
-        env->gpr[3] = H_PARAMETER;
-        return;
-    }
+        hvstate->cfar = l2_state.cfar;
+        hvstate->lpcr = l2_state.lpcr;
+        hvstate->pcr = l2_state.pcr;
+        hvstate->dpdes = l2_state.dpdes;
+        hvstate->hfscr = l2_state.hfscr;
+
+        if (excp == POWERPC_EXCP_HDSI) {
+            hvstate->hdar = hdar;
+            hvstate->hdsisr = hdsisr;
+            hvstate->asdr = asdr;
+        } else if (excp == POWERPC_EXCP_HISI) {
+            hvstate->asdr = asdr;
+        }
 
-    hvstate->cfar = l2_state.cfar;
-    hvstate->lpcr = l2_state.lpcr;
-    hvstate->pcr = l2_state.pcr;
-    hvstate->dpdes = l2_state.dpdes;
-    hvstate->hfscr = l2_state.hfscr;
+        /* HEIR should be implemented for HV mode and saved here. */
+        hvstate->srr0 = l2_state.srr0;
+        hvstate->srr1 = l2_state.srr1;
+        hvstate->sprg[0] = l2_state.sprg0;
+        hvstate->sprg[1] = l2_state.sprg1;
+        hvstate->sprg[2] = l2_state.sprg2;
+        hvstate->sprg[3] = l2_state.sprg3;
+        hvstate->pidr = l2_state.pidr;
+        hvstate->ppr = l2_state.ppr;
+
+        /* Is it okay to specify write len larger than actual data written? */
+        address_space_unmap(CPU(cpu)->as, hvstate, len, len, true);
+
+        len = sizeof(*regs);
+        regs = address_space_map(CPU(cpu)->as, regs_ptr, &len, true,
+                                    MEMTXATTRS_UNSPECIFIED);
+        if (!regs || len != sizeof(*regs)) {
+            address_space_unmap(CPU(cpu)->as, regs, len, 0, true);
+            env->gpr[3] = H_P2;
+            return;
+        }
 
-    if (excp == POWERPC_EXCP_HDSI) {
-        hvstate->hdar = hdar;
-        hvstate->hdsisr = hdsisr;
-        hvstate->asdr = asdr;
-    } else if (excp == POWERPC_EXCP_HISI) {
-        hvstate->asdr = asdr;
-    }
+        len = sizeof(env->gpr);
+        assert(len == sizeof(regs->gpr));
+        memcpy(regs->gpr, l2_state.gpr, len);
 
-    /* HEIR should be implemented for HV mode and saved here. */
-    hvstate->srr0 = l2_state.srr0;
-    hvstate->srr1 = l2_state.srr1;
-    hvstate->sprg[0] = l2_state.sprg0;
-    hvstate->sprg[1] = l2_state.sprg1;
-    hvstate->sprg[2] = l2_state.sprg2;
-    hvstate->sprg[3] = l2_state.sprg3;
-    hvstate->pidr = l2_state.pidr;
-    hvstate->ppr = l2_state.ppr;
+        regs->link = l2_state.lr;
+        regs->ctr = l2_state.ctr;
+        regs->xer = l2_state.xer;
+        regs->ccr = l2_state.cr;
 
-    /* Is it okay to specify write length larger than actual data written? */
-    address_space_unmap(CPU(cpu)->as, hvstate, len, len, true);
+        if (excp == POWERPC_EXCP_MCHECK ||
+            excp == POWERPC_EXCP_RESET ||
+            excp == POWERPC_EXCP_SYSCALL) {
+            regs->nip = l2_state.srr0;
+            regs->msr = l2_state.srr1 & env->msr_mask;
+        } else {
+            regs->nip = hsrr0;
+            regs->msr = hsrr1 & env->msr_mask;
+        }
 
-    len = sizeof(*regs);
-    regs = address_space_map(CPU(cpu)->as, regs_ptr, &len, true,
-                                MEMTXATTRS_UNSPECIFIED);
-    if (!regs || len != sizeof(*regs)) {
-        address_space_unmap(CPU(cpu)->as, regs, len, 0, true);
-        env->gpr[3] = H_P2;
-        return;
+        /* Is it okay to specify write len larger than actual data written? */
+        address_space_unmap(CPU(cpu)->as, regs, len, len, true);
     }
 
-    len = sizeof(env->gpr);
-    assert(len == sizeof(regs->gpr));
-    memcpy(regs->gpr, l2_state.gpr, len);
-
-    regs->link = l2_state.lr;
-    regs->ctr = l2_state.ctr;
-    regs->xer = l2_state.xer;
-    regs->ccr = l2_state.cr;
+    spapr_cpu->in_nested = false;
 
-    if (excp == POWERPC_EXCP_MCHECK ||
-        excp == POWERPC_EXCP_RESET ||
-        excp == POWERPC_EXCP_SYSCALL) {
-        regs->nip = l2_state.srr0;
-        regs->msr = l2_state.srr1 & env->msr_mask;
+    if (spapr->nested.api == NESTED_API_KVM_HV) {
+        g_free(spapr_cpu->nested_hv_host);
+        spapr_cpu->nested_hv_host = NULL;
     } else {
-        regs->nip = hsrr0;
-        regs->msr = hsrr1 & env->msr_mask;
+        g_free(spapr_cpu->nested_papr_host);
+        spapr_cpu->nested_papr_host = NULL;
     }
 
-    /* Is it okay to specify write length larger than actual data written? */
-    address_space_unmap(CPU(cpu)->as, regs, len, len, true);
 }
 
 SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
@@ -1372,6 +1424,274 @@ static target_ulong h_guest_get_state(PowerPCCPU *cpu,
     return h_guest_getset_state(cpu, spapr, args, false);
 }
 
+static void restore_common_regs(CPUPPCState *dst, CPUPPCState *src)
+{
+    memcpy(dst->gpr, src->gpr, sizeof(dst->gpr));
+    memcpy(dst->crf, src->crf, sizeof(dst->crf));
+    memcpy(dst->vsr, src->vsr, sizeof(dst->vsr));
+    dst->nip = src->nip;
+    dst->msr = src->msr;
+    dst->lr  = src->lr;
+    dst->ctr = src->ctr;
+    dst->cfar = src->cfar;
+    cpu_write_xer(dst, src->xer);
+    ppc_store_vscr(dst, ppc_get_vscr(src));
+    ppc_store_fpscr(dst, src->fpscr);
+    memcpy(dst->spr, src->spr, sizeof(dst->spr));
+}
+
+static void restore_l2_state(PowerPCCPU *cpu,
+                             CPUPPCState *env,
+                             struct SpaprMachineStateNestedGuestVcpu *vcpu,
+                             target_ulong now)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+    target_ulong lpcr, lpcr_mask, hdec;
+    lpcr_mask = LPCR_DPFD | LPCR_ILE | LPCR_AIL | LPCR_LD | LPCR_MER;
+
+    if (spapr->nested.api == NESTED_API_PAPR) {
+        assert(vcpu);
+        assert(sizeof(env->gpr) == sizeof(vcpu->env.gpr));
+        restore_common_regs(env, &vcpu->env);
+        lpcr = (env->spr[SPR_LPCR] & ~lpcr_mask) |
+               (vcpu->env.spr[SPR_LPCR] & lpcr_mask);
+        lpcr |= LPCR_HR | LPCR_UPRT | LPCR_GTSE | LPCR_HVICE | LPCR_HDICE;
+        lpcr &= ~LPCR_LPES0;
+        env->spr[SPR_LPCR] = lpcr & pcc->lpcr_mask;
+
+        hdec = vcpu->env.tb_env->hdecr_expiry_tb - now;
+        cpu_ppc_store_decr(env, vcpu->dec_expiry_tb - now);
+        cpu_ppc_hdecr_init(env);
+        cpu_ppc_store_hdecr(env, hdec);
+
+        env->tb_env->tb_offset += vcpu->tb_offset;
+    }
+}
+
+static void enter_nested(PowerPCCPU *cpu,
+                         uint64_t lpid,
+                         struct SpaprMachineStateNestedGuestVcpu *vcpu)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    CPUState *cs = CPU(cpu);
+    CPUPPCState *env = &cpu->env;
+    SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
+    target_ulong now = cpu_ppc_load_tbl(env);
+
+    assert(env->spr[SPR_LPIDR] == 0);
+    assert(spapr->nested.api); /* ensure API version is initialized */
+    spapr_cpu->nested_papr_host = g_try_new(CPUPPCState, 1);
+    assert(spapr_cpu->nested_papr_host);
+    memcpy(spapr_cpu->nested_papr_host, env, sizeof(CPUPPCState));
+
+    restore_l2_state(cpu, env, vcpu, now);
+    env->spr[SPR_LPIDR] = lpid; /* post restore_l2_state */
+
+    spapr_cpu->in_nested = true;
+
+    hreg_compute_hflags(env);
+    ppc_maybe_interrupt(env);
+    tlb_flush(cs);
+    env->reserve_addr = -1; /* Reset the reservation */
+
+}
+
+static target_ulong h_guest_run_vcpu(PowerPCCPU *cpu,
+                                     SpaprMachineState *spapr,
+                                     target_ulong opcode,
+                                     target_ulong *args)
+{
+    CPUPPCState *env = &cpu->env;
+    target_ulong flags = args[0];
+    target_ulong lpid = args[1];
+    target_ulong vcpuid = args[2];
+    struct SpaprMachineStateNestedGuestVcpu *vcpu;
+    struct guest_state_request gsr;
+    SpaprMachineStateNestedGuest *guest;
+
+    if (flags) /* don't handle any flags for now */
+        return H_PARAMETER;
+
+    guest = spapr_get_nested_guest(spapr, lpid);
+    if (!guest) {
+        return H_P2;
+    }
+    if (!vcpu_check(guest, vcpuid, true)) {
+        return H_P3;
+    }
+
+    if (guest->parttbl[0] == 0) {
+        /* At least need a partition scoped radix tree */
+        return H_NOT_AVAILABLE;
+    }
+
+    vcpu = &guest->vcpu[vcpuid];
+
+    /* Read run_vcpu input buffer to update state */
+    gsr.buf = vcpu->runbufin.addr;
+    gsr.len = vcpu->runbufin.size;
+    gsr.flags = GUEST_STATE_REQUEST_SET; /* Thread wide + writing */
+    if (!map_and_getset_state(cpu, guest, vcpuid, &gsr)) {
+        enter_nested(cpu, lpid, vcpu);
+    }
+
+    return env->gpr[3];
+}
+
+struct run_vcpu_exit_cause run_vcpu_exit_causes[] = {
+    { .nia = 0x980,
+      .count = 0,
+    },
+    { .nia = 0xc00,
+      .count = 10,
+      .ids = {
+          GSB_VCPU_GPR3,
+          GSB_VCPU_GPR4,
+          GSB_VCPU_GPR5,
+          GSB_VCPU_GPR6,
+          GSB_VCPU_GPR7,
+          GSB_VCPU_GPR8,
+          GSB_VCPU_GPR9,
+          GSB_VCPU_GPR10,
+          GSB_VCPU_GPR11,
+          GSB_VCPU_GPR12,
+      },
+    },
+    { .nia = 0xe00,
+      .count = 5,
+      .ids = {
+          GSB_VCPU_SPR_HDAR,
+          GSB_VCPU_SPR_HDSISR,
+          GSB_VCPU_SPR_ASDR,
+          GSB_VCPU_SPR_NIA,
+          GSB_VCPU_SPR_MSR,
+      },
+    },
+    { .nia = 0xe20,
+      .count = 4,
+      .ids = {
+          GSB_VCPU_SPR_HDAR,
+          GSB_VCPU_SPR_ASDR,
+          GSB_VCPU_SPR_NIA,
+          GSB_VCPU_SPR_MSR,
+      },
+    },
+    { .nia = 0xe40,
+      .count = 3,
+      .ids = {
+          GSB_VCPU_SPR_HEIR,
+          GSB_VCPU_SPR_NIA,
+          GSB_VCPU_SPR_MSR,
+      },
+    },
+    { .nia = 0xea0,
+      .count = 0,
+    },
+    { .nia = 0xf80,
+      .count = 3,
+      .ids = {
+          GSB_VCPU_SPR_HFSCR,
+          GSB_VCPU_SPR_NIA,
+          GSB_VCPU_SPR_MSR,
+      },
+    },
+};
+
+static struct run_vcpu_exit_cause *find_exit_cause(uint64_t srr0)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(run_vcpu_exit_causes); i++)
+        if (srr0 == run_vcpu_exit_causes[i].nia) {
+            return &run_vcpu_exit_causes[i];
+        }
+
+    printf("%s: srr0:0x%016lx\n", __func__, srr0);
+    return NULL;
+}
+
+static void exit_nested_restore_vcpu(PowerPCCPU *cpu, int excp,
+                                     SpaprMachineStateNestedGuestVcpu *vcpu)
+{
+    CPUPPCState *env = &cpu->env;
+    target_ulong now, hdar, hdsisr, asdr;
+
+    assert(sizeof(env->gpr) == sizeof(vcpu->env.gpr)); /* sanity check */
+
+    now = cpu_ppc_load_tbl(env); /* L2 timebase */
+    now -= vcpu->tb_offset; /* L1 timebase */
+    vcpu->dec_expiry_tb = now - cpu_ppc_load_decr(env);
+    /* backup hdar, hdsisr, asdr if reqd later below */
+    hdar   = vcpu->env.spr[SPR_HDAR];
+    hdsisr = vcpu->env.spr[SPR_HDSISR];
+    asdr   = vcpu->env.spr[SPR_ASDR];
+
+    restore_common_regs(&vcpu->env, env);
+
+    if (excp == POWERPC_EXCP_MCHECK ||
+        excp == POWERPC_EXCP_RESET ||
+        excp == POWERPC_EXCP_SYSCALL) {
+        vcpu->env.nip = env->spr[SPR_SRR0];
+        vcpu->env.msr = env->spr[SPR_SRR1] & env->msr_mask;
+    } else {
+        vcpu->env.nip = env->spr[SPR_HSRR0];
+        vcpu->env.msr = env->spr[SPR_HSRR1] & env->msr_mask;
+    }
+
+    /* hdar, hdsisr, asdr should be retained unless certain exceptions */
+    if ((excp != POWERPC_EXCP_HDSI) && (excp != POWERPC_EXCP_HISI)) {
+        vcpu->env.spr[SPR_ASDR] = asdr;
+    } else if (excp != POWERPC_EXCP_HDSI) {
+        vcpu->env.spr[SPR_HDAR]   = hdar;
+        vcpu->env.spr[SPR_HDSISR] = hdsisr;
+    }
+}
+
+static void exit_process_output_buffer(PowerPCCPU *cpu,
+                                      SpaprMachineStateNestedGuest *guest,
+                                      target_ulong vcpuid,
+                                      target_ulong *r3)
+{
+    SpaprMachineStateNestedGuestVcpu *vcpu = &guest->vcpu[vcpuid];
+    struct guest_state_request gsr;
+    struct guest_state_buffer *gsb;
+    struct guest_state_element *element;
+    struct guest_state_element_type *type;
+    struct run_vcpu_exit_cause *exit_cause;
+    hwaddr len;
+    int i;
+
+    len = vcpu->runbufout.size;
+    gsb = address_space_map(CPU(cpu)->as, vcpu->runbufout.addr, &len, true,
+                            MEMTXATTRS_UNSPECIFIED);
+    if (!gsb || len != vcpu->runbufout.size) {
+        address_space_unmap(CPU(cpu)->as, gsb, len, 0, true);
+        *r3 = H_P2;
+        return;
+    }
+
+    exit_cause = find_exit_cause(*r3);
+
+    /* Create a buffer of elements to send back */
+    gsb->num_elements = cpu_to_be32(exit_cause->count);
+    element = gsb->elements;
+    for (i = 0; i < exit_cause->count; i++) {
+        type = guest_state_element_type_find(exit_cause->ids[i]);
+        assert(type);
+        element->id = cpu_to_be16(exit_cause->ids[i]);
+        element->size = cpu_to_be16(type->size);
+        element = guest_state_element_next(element, NULL, NULL);
+    }
+    gsr.gsb = gsb;
+    gsr.len = VCPU_OUT_BUF_MIN_SZ;
+    gsr.flags = 0; /* get + never guest wide */
+    getset_state(guest, vcpuid, &gsr);
+
+    address_space_unmap(CPU(cpu)->as, gsb, len, len, true);
+    return;
+}
+
 void spapr_register_nested(void)
 {
     spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
@@ -1388,6 +1708,7 @@ void spapr_register_nested_phyp(void)
     spapr_register_hypercall(H_GUEST_CREATE_VCPU     , h_guest_create_vcpu);
     spapr_register_hypercall(H_GUEST_SET_STATE       , h_guest_set_state);
     spapr_register_hypercall(H_GUEST_GET_STATE       , h_guest_get_state);
+    spapr_register_hypercall(H_GUEST_RUN_VCPU        , h_guest_run_vcpu);
 }
 
 #else
diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
index 69a52e39b8..09855f69aa 100644
--- a/include/hw/ppc/spapr_cpu_core.h
+++ b/include/hw/ppc/spapr_cpu_core.h
@@ -53,7 +53,12 @@ typedef struct SpaprCpuState {
 
     /* Fields for nested-HV support */
     bool in_nested; /* true while the L2 is executing */
-    struct nested_ppc_state *nested_host_state; /* holds the L1 state while L2 executes */
+    union {
+        /* nested-hv needs minimal set of regs as L1 stores L2 state */
+        struct nested_ppc_state *nested_hv_host;
+        /* In nested-papr, L0 retains entire L2 state, so keep it all safe. */
+        CPUPPCState *nested_papr_host;
+    };
 } SpaprCpuState;
 
 static inline SpaprCpuState *spapr_cpu_state(PowerPCCPU *cpu)
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index eaee624b87..ca5d28c06e 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -358,6 +358,12 @@ struct guest_state_request {
     uint16_t flags;
 };
 
+struct run_vcpu_exit_cause {
+    uint64_t nia;
+    uint64_t count;
+    uint16_t ids[10]; /* max ids supported by run_vcpu_exit_causes */
+};
+
 /*
  * Register state for entering a nested guest with H_ENTER_NESTED.
  * New member must be added at the end.
-- 
2.39.3




* [PATCH RESEND 14/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_DELETE
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
                   ` (12 preceding siblings ...)
  2023-09-06  4:33 ` [PATCH RESEND 13/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_RUN_VCPU Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-07  2:31   ` Nicholas Piggin
  2023-09-06  4:33 ` [PATCH RESEND 15/15] ppc: spapr: Document Nested PAPR API Harsh Prateek Bora
  14 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

This hcall is used by the L1 to delete a guest entry in the L0, or to
delete all guests if needed (usually in shutdown scenarios).

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/spapr_nested.c         | 32 ++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_nested.h |  1 +
 2 files changed, 33 insertions(+)

diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 3605f27115..5afdad4990 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -1692,6 +1692,37 @@ static void exit_process_output_buffer(PowerPCCPU *cpu,
     return;
 }
 
+static target_ulong h_guest_delete(PowerPCCPU *cpu,
+                                   SpaprMachineState *spapr,
+                                   target_ulong opcode,
+                                   target_ulong *args)
+{
+    target_ulong flags = args[0];
+    target_ulong lpid = args[1];
+    struct SpaprMachineStateNestedGuest *guest;
+
+    if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_PAPR)) {
+        return H_FUNCTION;
+    }
+
+    /* handle flag deleteAllGuests, remaining bits reserved */
+    if (flags & ~H_GUEST_DELETE_ALL_MASK) {
+        return H_UNSUPPORTED_FLAG;
+    } else if (flags & H_GUEST_DELETE_ALL_MASK) {
+        g_hash_table_destroy(spapr->nested.guests);
+        return H_SUCCESS;
+    }
+
+    guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(lpid));
+    if (!guest) {
+        return H_P2;
+    }
+
+    g_hash_table_remove(spapr->nested.guests, GINT_TO_POINTER(lpid));
+
+    return H_SUCCESS;
+}
+
 void spapr_register_nested(void)
 {
     spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
@@ -1709,6 +1740,7 @@ void spapr_register_nested_phyp(void)
     spapr_register_hypercall(H_GUEST_SET_STATE       , h_guest_set_state);
     spapr_register_hypercall(H_GUEST_GET_STATE       , h_guest_get_state);
     spapr_register_hypercall(H_GUEST_RUN_VCPU        , h_guest_run_vcpu);
+    spapr_register_hypercall(H_GUEST_DELETE          , h_guest_delete);
 }
 
 #else
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index ca5d28c06e..9eb43778ad 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -209,6 +209,7 @@
 #define H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE 0x8000000000000000 /* BE in GSB */
 #define GUEST_STATE_REQUEST_GUEST_WIDE       0x1
 #define GUEST_STATE_REQUEST_SET              0x2
+#define H_GUEST_DELETE_ALL_MASK              0x8000000000000000ULL
 
 #define GUEST_STATE_ELEMENT(i, sz, s, f, ptr, c) { \
     .id = (i),                                     \
-- 
2.39.3




* [PATCH RESEND 15/15] ppc: spapr: Document Nested PAPR API
  2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
                   ` (13 preceding siblings ...)
  2023-09-06  4:33 ` [PATCH RESEND 14/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_DELETE Harsh Prateek Bora
@ 2023-09-06  4:33 ` Harsh Prateek Bora
  2023-09-07  3:56   ` Nicholas Piggin
  14 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-06  4:33 UTC (permalink / raw)
  To: npiggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

Adding initial documentation about the Nested PAPR API, describing the
set of hcalls and their usage. It also covers the Guest State Buffer
elements and the format used between L0/L1 to communicate L2 state.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 docs/devel/nested-papr.txt | 500 +++++++++++++++++++++++++++++++++++++
 1 file changed, 500 insertions(+)
 create mode 100644 docs/devel/nested-papr.txt

diff --git a/docs/devel/nested-papr.txt b/docs/devel/nested-papr.txt
new file mode 100644
index 0000000000..c5c2ba7e50
--- /dev/null
+++ b/docs/devel/nested-papr.txt
@@ -0,0 +1,500 @@
+Nested PAPR API (aka KVM on PowerVM)
+====================================
+
+This API provides support for nested virtualization with KVM on
+PowerVM. The existing support for nested KVM on PowerNV was introduced
+with the cap-nested-hv option; enabling it on papr/pseries involves a
+slight design change, so a new cap-nested-papr option is added, e.g.:
+  qemu-system-ppc64 -cpu POWER10 -machine pseries,cap-nested-papr=true ...
+
+Work by:
+    Michael Neuling <mikey@neuling.org>
+    Vaibhav Jain <vaibhav@linux.ibm.com>
+    Jordan Niethe <jniethe5@gmail.com>
+    Harsh Prateek Bora <harshpb@linux.ibm.com>
+    Shivaprasad G Bhat <sbhat@linux.ibm.com>
+    Kautuk Consul <kconsul@linux.vnet.ibm.com>
+
+Below is taken from the kernel documentation:
+
+Introduction
+============
+
+This document explains how a guest operating system can act as a
+hypervisor and run nested guests through the use of hypercalls, if the
+hypervisor has implemented them. The terms L0, L1, and L2 are used to
+refer to different software entities. L0 is the hypervisor mode entity
+that would normally be called the "host" or "hypervisor". L1 is a
+guest virtual machine that is directly run under L0 and is initiated
+and controlled by L0. L2 is a guest virtual machine that is initiated
+and controlled by L1 acting as a hypervisor. A significant design
+change with respect to the existing API is that the entire L2 state is
+now maintained within the L0.
+
+Existing Nested-HV API
+======================
+
+Linux/KVM has had support for nesting as an L0 or L1 since 2018.
+
+The L0 code was added::
+
+   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
+   Author: Paul Mackerras <paulus@ozlabs.org>
+   Date:   Mon Oct 8 16:31:03 2018 +1100
+   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization
+
+The L1 code was added::
+
+   commit 360cae313702cdd0b90f82c261a8302fecef030a
+   Author: Paul Mackerras <paulus@ozlabs.org>
+   Date:   Mon Oct 8 16:31:04 2018 +1100
+   KVM: PPC: Book3S HV: Nested guest entry via hypercall
+
+This API works primarily using a single hcall, h_enter_nested(). This
+call is made by the L1 to tell the L0 to start an L2 vCPU with the given
+state. The L0 then starts this L2 and runs until an L2 exit condition
+is reached. Once the L2 exits, the state of the L2 is given back to
+the L1 by the L0. The full L2 vCPU state is always transferred from
+and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
+vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
+-> L1 exit).
+
+The only state kept by the L0 is the partition table. The L1 registers
+its partition table using the h_set_partition_table() hcall. All
+other state held by the L0 about the L2s is cached state (such as
+shadow page tables).
+
+The L1 may run any L2 or vCPU without first informing the L0. It
+simply starts the vCPU using h_enter_nested(). The creation of L2s and
+vCPUs is done implicitly whenever h_enter_nested() is called.
+
+In this document, we call this existing API the v1 API.
+
+New PAPR API
+============
+
+The new PAPR API differs from the v1 API in that creating the L2 and
+its associated vCPUs is explicit. In this document, we call this the v2
+API.
+
+h_enter_nested() is replaced with H_GUEST_RUN_VCPU(). Before this can
+be called, the L1 must explicitly create the L2 using H_GUEST_CREATE()
+and any associated vCPUs with H_GUEST_CREATE_VCPU(). Getting and
+setting vCPU state can also be performed using the
+H_GUEST_{GET,SET}_STATE() hcalls.
+
+The basic execution flow for an L1 to create an L2, run it, and
+delete it is:
+
+- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
+  (normally at L1 boot time).
+
+- L1 requests the L0 to create an L2 with H_GUEST_CREATE() and receives a token
+
+- L1 requests the L0 to create an L2 vCPU with H_GUEST_CREATE_VCPU()
+
+- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET() hcall
+
+- L1 requests the L0 to run the vCPU using H_GUEST_RUN_VCPU() hcall
+
+- L1 deletes L2 with H_GUEST_DELETE()
+
+More details of the individual hcalls follow:
+
+HCALL Details
+=============
+
+This documentation is provided to give an overall understanding of the
+API. It doesn't aim to provide the full details required to implement
+an L1 or L0. Refer to the latest PAPR specification for more details.
+
+All these HCALLs are made by the L1 to the L0.
+
+H_GUEST_GET_CAPABILITIES()
+--------------------------
+
+This is called to get the capabilities of the L0 nested
+hypervisor. This includes capabilities such as the CPU versions (e.g.
+POWER9, POWER10) that are supported as L2s.
+
+H_GUEST_SET_CAPABILITIES()
+--------------------------
+
+This is called to inform the L0 of the capabilities of the L1
+hypervisor. The set of flags passed here is the same as for
+H_GUEST_GET_CAPABILITIES().
+
+Typically, GET will be called first and then SET will be called with a
+subset of the flags returned from GET. This process allows the L0 and
+L1 to negotiate an agreed set of capabilities.
+
+H_GUEST_CREATE()
+----------------
+
+This is called to create an L2. Returned is the ID of the created L2
+(similar to an LPID), which can be used on subsequent hcalls to
+identify the L2.
+
+H_GUEST_CREATE_VCPU()
+---------------------
+
+This is called to create a vCPU associated with an L2. The L2 ID
+(returned from H_GUEST_CREATE()) should be passed in. Also passed in
+is a vCPU ID unique within this L2. This vCPU ID is allocated by the
+L1.
+
+H_GUEST_SET_STATE()
+-------------------
+
+This is called to set L2-wide or vCPU-specific state. This info is
+passed via the Guest State Buffer (GSB), described below.
+
+This can set either L2-wide or vCPU-specific information. Examples of
+L2-wide state are the timebase offset or process-scoped page table
+info. Examples of vCPU-specific state are GPRs or VSRs. A bit in the
+flags parameter specifies whether this call is L2 wide or vCPU
+specific, and the IDs in the GSB must match this.
+
+The L1 provides a pointer to the GSB as a parameter to this call. Also
+provided is the L2 and vCPU IDs associated with the state to set.
+
+The L1 writes all values in the GSB and the L0 only reads the GSB for
+this call.
+
+H_GUEST_GET_STATE()
+-------------------
+
+This is called to get state associated with an L2 or an L2 vCPU. This
+info is passed via the GSB (details below).
+
+This can get either L2 wide or vCPU specific information. Examples of
+L2 wide state are the timebase offset or process scoped page table
+info. Examples of vCPU specific state are GPRs or VSRs. A bit in the
+flags parameter specifies whether this call is L2 wide or vCPU
+specific and the IDs in the GSB must match this.
+
+The L1 provides a pointer to the GSB as a parameter to this call. Also
+provided is the L2 and vCPU IDs associated with the state to get.
+
+The L1 writes only the IDs and sizes in the GSB. The L0 writes the
+associated values for each ID in the GSB.
+
+H_GUEST_RUN_VCPU()
+------------------
+
+This is called to run an L2 vCPU. The L2 and vCPU IDs are passed in as
+parameters. The vCPU runs with the state set previously using
+H_GUEST_SET_STATE(). When the L2 exits, the L1 will resume from this
+hcall.
+
+This hcall also has associated input and output GSBs. Unlike
+H_GUEST_{S,G}ET_STATE(), these GSB pointers are not passed in as
+parameters to the hcall (this was done in the interest of
+performance). The locations of these GSBs must be preregistered using
+the H_GUEST_SET_STATE() call with IDs 0x0c00 and 0x0c01 (see the
+table below).
+
+The input GSB may contain only VCPU wide elements to be set. This GSB
+may also contain zero elements (ie 0 in the first 4 bytes of the GSB)
+if nothing needs to be set.
+
+On exit from the hcall, the output buffer is filled with elements
+determined by the L0. The reason for the exit is contained in GPR4 (ie
+NIP is put in GPR4).  The elements returned depend on the exit
+type. For example, if the exit reason is the L2 doing a hcall (GPR4 =
+0xc00), then GPR3-12 are provided in the output GSB as this is the
+state likely needed to service the hcall. If additional state is
+needed, H_GUEST_GET_STATE() may be called by the L1.
+
+To synthesize interrupts in the L2, when calling H_GUEST_RUN_VCPU()
+the L1 may set a flag (as a hcall parameter) and the L0 will
+synthesize the interrupt in the L2. Alternatively, the L1 may
+synthesize the interrupt itself using H_GUEST_SET_STATE() or the
+H_GUEST_RUN_VCPU() input GSB to set the state appropriately.
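As a rough sketch, an L1 would dispatch on the exit reason delivered in
GPR4. Only the hcall case (0xc00) is named in the text above; the other
vectors shown here are standard Power interrupt vectors included purely as
plausible examples, and the enum names are invented:

```c
#include <stdint.h>

/* Hypothetical L1-side classification of an H_GUEST_RUN_VCPU() exit */
enum l2_exit_kind {
    EXIT_HCALL,     /* L2 made an hcall; GPR3-12 arrive in the output GSB */
    EXIT_EXTERNAL,  /* external interrupt for the L1 to handle */
    EXIT_HDSI,      /* hypervisor data storage; may need H_GUEST_GET_STATE() */
    EXIT_OTHER,
};

/* Map the interrupt vector returned in GPR4 to a handler category */
static enum l2_exit_kind classify_exit(uint64_t gpr4)
{
    switch (gpr4) {
    case 0xc00:
        return EXIT_HCALL;
    case 0x500:
        return EXIT_EXTERNAL;
    case 0xe00:
        return EXIT_HDSI;
    default:
        return EXIT_OTHER;
    }
}
```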
+
+H_GUEST_DELETE()
+----------------
+
+This is called to delete an L2. All associated vCPUs are also
+deleted. No specific vCPU delete call is provided.
+
+A flag may be provided to delete all guests. This is used to reset the
+L0 in the case of kdump/kexec.
+
+Guest State Buffer (GSB)
+========================
+
+The Guest State Buffer (GSB) is the main method of communicating state
+about the L2 between the L1 and L0 via H_GUEST_{G,S}ET_STATE() and
+H_GUEST_RUN_VCPU() calls.
+
+State may be associated with a whole L2 (eg timebase offset) or a
+specific L2 vCPU (eg GPR state). Only L2 vCPU state may be set by
+H_GUEST_RUN_VCPU().
+
+All data in the GSB is big endian (as is standard in PAPR).
+
+The Guest state buffer has a header which gives the number of
+elements, followed by the GSB elements themselves.
+
+GSB header:
+
++----------+----------+-------------------------------------------+
+|  Offset  |  Size    |  Purpose                                  |
+|  Bytes   |  Bytes   |                                           |
++==========+==========+===========================================+
+|    0     |    4     |  Number of elements                       |
++----------+----------+-------------------------------------------+
+|    4     |          |  Guest state buffer elements              |
++----------+----------+-------------------------------------------+
+
+GSB element:
+
++----------+----------+-------------------------------------------+
+|  Offset  |  Size    |  Purpose                                  |
+|  Bytes   |  Bytes   |                                           |
++==========+==========+===========================================+
+|    0     |    2     |  ID                                       |
++----------+----------+-------------------------------------------+
+|    2     |    2     |  Size of Value                            |
++----------+----------+-------------------------------------------+
+|    4     | As above |  Value                                    |
++----------+----------+-------------------------------------------+
+
+The ID in the GSB element specifies what is to be set. This includes
+architected state like GPRs, VSRs and SPRs, as well as some metadata
+about the partition, like the timebase offset and partition scoped
+page table information.
+
++--------+-------+----+--------+----------------------------------+
+|   ID   | Size  | RW | Thread | Details                          |
+|        | Bytes |    | Guest  |                                  |
+|        |       |    | Scope  |                                  |
++========+=======+====+========+==================================+
+| 0x0000 |       | RW |   TG   | NOP element                      |
++--------+-------+----+--------+----------------------------------+
+| 0x0001 | 0x08  | R  |   G    | Size of L0 vCPU state            |
++--------+-------+----+--------+----------------------------------+
+| 0x0002 | 0x08  | R  |   G    | Size Run vCPU out buffer         |
++--------+-------+----+--------+----------------------------------+
+| 0x0003 | 0x04  | RW |   G    | Logical PVR                      |
++--------+-------+----+--------+----------------------------------+
+| 0x0004 | 0x08  | RW |   G    | TB Offset (L1 relative)          |
++--------+-------+----+--------+----------------------------------+
+| 0x0005 | 0x18  | RW |   G    |Partition scoped page tbl info:   |
+|        |       |    |        |                                  |
+|        |       |    |        |- 0x00 Addr part scope table      |
+|        |       |    |        |- 0x08 Num addr bits              |
+|        |       |    |        |- 0x10 Size root dir              |
++--------+-------+----+--------+----------------------------------+
+| 0x0006 | 0x10  | RW |   G    |Process Table Information:        |
+|        |       |    |        |                                  |
+|        |       |    |        |- 0x0 Addr proc scope table       |
+|        |       |    |        |- 0x8 Table size.                 |
++--------+-------+----+--------+----------------------------------+
+| 0x0007-|       |    |        | Reserved                         |
+| 0x0BFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x0C00 | 0x10  | RW |   T    |Run vCPU Input Buffer:            |
+|        |       |    |        |                                  |
+|        |       |    |        |- 0x0 Addr of buffer              |
+|        |       |    |        |- 0x8 Buffer Size.                |
++--------+-------+----+--------+----------------------------------+
+| 0x0C01 | 0x10  | RW |   T    |Run vCPU Output Buffer:           |
+|        |       |    |        |                                  |
+|        |       |    |        |- 0x0 Addr of buffer              |
+|        |       |    |        |- 0x8 Buffer Size.                |
++--------+-------+----+--------+----------------------------------+
+| 0x0C02 | 0x08  | RW |   T    | vCPU VPA Address                 |
++--------+-------+----+--------+----------------------------------+
+| 0x0C03-|       |    |        | Reserved                         |
+| 0x0FFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x1000-| 0x08  | RW |   T    | GPR 0-31                         |
+| 0x101F |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x1020 |  0x08 | T  |   T    | HDEC expiry TB                   |
++--------+-------+----+--------+----------------------------------+
+| 0x1021 | 0x08  | RW |   T    | NIA                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1022 | 0x08  | RW |   T    | MSR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1023 | 0x08  | RW |   T    | LR                               |
++--------+-------+----+--------+----------------------------------+
+| 0x1024 | 0x08  | RW |   T    | XER                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1025 | 0x08  | RW |   T    | CTR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1026 | 0x08  | RW |   T    | CFAR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1027 | 0x08  | RW |   T    | SRR0                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1028 | 0x08  | RW |   T    | SRR1                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1029 | 0x08  | RW |   T    | DAR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x102A | 0x08  | RW |   T    | DEC expiry TB                    |
++--------+-------+----+--------+----------------------------------+
+| 0x102B | 0x08  | RW |   T    | VTB                              |
++--------+-------+----+--------+----------------------------------+
+| 0x102C | 0x08  | RW |   T    | LPCR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x102D | 0x08  | RW |   T    | HFSCR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x102E | 0x08  | RW |   T    | FSCR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x102F | 0x08  | RW |   T    | FPSCR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1030 | 0x08  | RW |   T    | DAWR0                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1031 | 0x08  | RW |   T    | DAWR1                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1032 | 0x08  | RW |   T    | CIABR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1033 | 0x08  | RW |   T    | PURR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1034 | 0x08  | RW |   T    | SPURR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1035 | 0x08  | RW |   T    | IC                               |
++--------+-------+----+--------+----------------------------------+
+| 0x1036-| 0x08  | RW |   T    | SPRG 0-3                         |
+| 0x1039 |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x103A | 0x08  | W  |   T    | PPR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x103B-| 0x08  | RW |   T    | MMCR 0-3                         |
+| 0x103E |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x103F | 0x08  | RW |   T    | MMCRA                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1040 | 0x08  | RW |   T    | SIER                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1041 | 0x08  | RW |   T    | SIER 2                           |
++--------+-------+----+--------+----------------------------------+
+| 0x1042 | 0x08  | RW |   T    | SIER 3                           |
++--------+-------+----+--------+----------------------------------+
+| 0x1043 | 0x08  | RW |   T    | BESCR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1044 | 0x08  | RW |   T    | EBBHR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1045 | 0x08  | RW |   T    | EBBRR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x1046 | 0x08  | RW |   T    | AMR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x1047 | 0x08  | RW |   T    | IAMR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1048 | 0x08  | RW |   T    | AMOR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1049 | 0x08  | RW |   T    | UAMOR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x104A | 0x08  | RW |   T    | SDAR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x104B | 0x08  | RW |   T    | SIAR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x104C | 0x08  | RW |   T    | DSCR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x104D | 0x08  | RW |   T    | TAR                              |
++--------+-------+----+--------+----------------------------------+
+| 0x104E | 0x08  | RW |   T    | DEXCR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x104F | 0x08  | RW |   T    | HDEXCR                           |
++--------+-------+----+--------+----------------------------------+
+| 0x1050 | 0x08  | RW |   T    | HASHKEYR                         |
++--------+-------+----+--------+----------------------------------+
+| 0x1051 | 0x08  | RW |   T    | HASHPKEYR                        |
++--------+-------+----+--------+----------------------------------+
+| 0x1052 | 0x08  | RW |   T    | CTRL                             |
++--------+-------+----+--------+----------------------------------+
+| 0x1053-|       |    |        | Reserved                         |
+| 0x1FFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x2000 | 0x04  | RW |   T    | CR                               |
++--------+-------+----+--------+----------------------------------+
+| 0x2001 | 0x04  | RW |   T    | PIDR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x2002 | 0x04  | RW |   T    | DSISR                            |
++--------+-------+----+--------+----------------------------------+
+| 0x2003 | 0x04  | RW |   T    | VSCR                             |
++--------+-------+----+--------+----------------------------------+
+| 0x2004 | 0x04  | RW |   T    | VRSAVE                           |
++--------+-------+----+--------+----------------------------------+
+| 0x2005 | 0x04  | RW |   T    | DAWRX0                           |
++--------+-------+----+--------+----------------------------------+
+| 0x2006 | 0x04  | RW |   T    | DAWRX1                           |
++--------+-------+----+--------+----------------------------------+
+| 0x2007-| 0x04  | RW |   T    | PMC 1-6                          |
+| 0x200c |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x200D | 0x04  | RW |   T    | WORT                             |
++--------+-------+----+--------+----------------------------------+
+| 0x200E | 0x04  | RW |   T    | PSPB                             |
++--------+-------+----+--------+----------------------------------+
+| 0x200F-|       |    |        | Reserved                         |
+| 0x2FFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x3000-| 0x10  | RW |   T    | VSR 0-63                         |
+| 0x303F |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0x3040-|       |    |        | Reserved                         |
+| 0xEFFF |       |    |        |                                  |
++--------+-------+----+--------+----------------------------------+
+| 0xF000 | 0x08  | R  |   T    | HDAR                             |
++--------+-------+----+--------+----------------------------------+
+| 0xF001 | 0x04  | R  |   T    | HDSISR                           |
++--------+-------+----+--------+----------------------------------+
+| 0xF002 | 0x04  | R  |   T    | HEIR                             |
++--------+-------+----+--------+----------------------------------+
+| 0xF003 | 0x08  | R  |   T    | ASDR                             |
++--------+-------+----+--------+----------------------------------+
+
+Miscellaneous info
+==================
+
+State not in ptregs/hvregs
+--------------------------
+
+In the v1 API, some state is not in the ptregs/hvstate. This includes
+the vector registers and some SPRs. For the L1 to set this state for
+the L2, the L1 loads up these hardware registers before the
+h_enter_nested() call and the L0 ensures they end up as the L2 state
+(by not touching them).
+
+The v2 API removes this and explicitly sets this state via the GSB.
+
+L1 Implementation details: Caching state
+----------------------------------------
+
+In the v1 API, all state is sent from the L1 to the L0 and vice versa
+on every h_enter_nested() hcall. If the L0 is not currently running
+any L2s, the L0 has no state information about them. The only
+exception to this is the location of the partition table, registered
+via h_set_partition_table().
+
+The v2 API changes this so that the L0 retains the L2 state even when
+its vCPUs are no longer running. This means that the L1 only needs to
+communicate with the L0 about L2 state when it needs to modify the L2
+state, or when its value is out of date. This provides an opportunity
+for performance optimisation.
+
+When a vCPU exits from a H_GUEST_RUN_VCPU() call, the L1 internally
+marks all L2 state as invalid. This means that if the L1 wants to know
+the L2 state (say via a kvm_get_one_reg() call), it needs to call
+H_GUEST_GET_STATE() to get that state. Once it's read, it's marked as
+valid in the L1 until the L2 is run again.
+
+Also, when an L1 modifies L2 vCPU state, it doesn't need to write it
+to the L0 until that L2 vCPU runs again. Hence when the L1 updates
+state (say via a kvm_set_one_reg() call), it writes to an internal L1
+copy and only flushes this copy to the L0 when the L2 runs again via
+the H_GUEST_RUN_VCPU() input buffer.
+
+This lazy updating of state by the L1 avoids unnecessary
+H_GUEST_{G|S}ET_STATE() calls.
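The caching scheme described above amounts to a simple three-state flag per
cached element of L2 state. A minimal sketch, with names invented for
illustration (they are not taken from the actual KVM implementation):

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-element cache state tracked by the L1 */
enum gsb_cache_state {
    GSB_INVALID,   /* L0 holds the authoritative value */
    GSB_VALID,     /* L1 copy matches the L0's */
    GSB_DIRTY,     /* L1 copy modified; flush before the next vCPU run */
};

struct l1_cached_reg {
    uint64_t value;
    enum gsb_cache_state state;
};

/* kvm_set_one_reg() path: update the local copy only, flush lazily */
static void l1_set_reg(struct l1_cached_reg *r, uint64_t v)
{
    r->value = v;
    r->state = GSB_DIRTY;   /* goes out via the run-vCPU input GSB */
}

/* On exit from H_GUEST_RUN_VCPU(), cached L1 copies become stale */
static void l1_invalidate_reg(struct l1_cached_reg *r)
{
    r->state = GSB_INVALID;
}

/* kvm_get_one_reg() path: only call H_GUEST_GET_STATE() when stale */
static bool l1_needs_h_guest_get_state(const struct l1_cached_reg *r)
{
    return r->state == GSB_INVALID;
}
```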
+
+References
+==========
+
+For more details, please refer to:
+
+[1] Kernel documentation (currently v4 on mailing list):
+    - https://lore.kernel.org/linuxppc-dev/20230905034658.82835-1-jniethe5@gmail.com/
-- 
2.39.3




* Re: [PATCH RESEND 01/15] ppc: spapr: Introduce Nested PAPR API related macros
  2023-09-06  4:33 ` [PATCH RESEND 01/15] ppc: spapr: Introduce Nested PAPR API related macros Harsh Prateek Bora
@ 2023-09-06 23:48   ` Nicholas Piggin
  2023-09-11  6:21     ` Harsh Prateek Bora
  0 siblings, 1 reply; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-06 23:48 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> Adding new macros for the new hypercall op-codes, their return codes,
> Guest State Buffer (GSB) element IDs and few registers which shall be
> used in following patches to support Nested PAPR API.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  include/hw/ppc/spapr.h        |  23 ++++-
>  include/hw/ppc/spapr_nested.h | 186 ++++++++++++++++++++++++++++++++++
>  target/ppc/cpu.h              |   2 +
>  3 files changed, 209 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 538b2dfb89..3990fed1d9 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -367,6 +367,16 @@ struct SpaprMachineState {
>  #define H_NOOP            -63
>  #define H_UNSUPPORTED     -67
>  #define H_OVERLAP         -68
> +#define H_STATE           -75

[snip]

I didn't go through to make sure all the numbers are correct, but
generally looks okay. Are these just copied from KVM sources (or
vice versa)?

> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 25fac9577a..6f7f9b9d58 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1587,9 +1587,11 @@ void ppc_compat_add_property(Object *obj, const char *name,
>  #define SPR_PSPB              (0x09F)
>  #define SPR_DPDES             (0x0B0)
>  #define SPR_DAWR0             (0x0B4)
> +#define SPR_DAWR1             (0x0B5)
>  #define SPR_RPR               (0x0BA)
>  #define SPR_CIABR             (0x0BB)
>  #define SPR_DAWRX0            (0x0BC)
> +#define SPR_DAWRX1            (0x0BD)
>  #define SPR_HFSCR             (0x0BE)
>  #define SPR_VRSAVE            (0x100)
>  #define SPR_USPRG0            (0x100)

Stray change? Should be in 2nd DAWR patch, presumably.

Thanks,
Nick



* Re: [PATCH RESEND 02/15] ppc: spapr: Add new/extend structs to support Nested PAPR API
  2023-09-06  4:33 ` [PATCH RESEND 02/15] ppc: spapr: Add new/extend structs to support Nested PAPR API Harsh Prateek Bora
@ 2023-09-07  1:06   ` Nicholas Piggin
  2023-09-11  6:47     ` Harsh Prateek Bora
  0 siblings, 1 reply; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  1:06 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This patch introduces new data structures to be used with Nested PAPR
> API. Also extends kvmppc_hv_guest_state with additional set of registers
> supported with nested PAPR API.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  include/hw/ppc/spapr_nested.h | 48 +++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index 5cb668dd53..f8db31075b 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h
> @@ -189,6 +189,39 @@
>  /* End of list of Guest State Buffer Element IDs */
>  #define GSB_LAST                GSB_VCPU_SPR_ASDR
>  
> +typedef struct SpaprMachineStateNestedGuest {
> +    unsigned long vcpus;
> +    struct SpaprMachineStateNestedGuestVcpu *vcpu;
> +    uint64_t parttbl[2];
> +    uint32_t pvr_logical;
> +    uint64_t tb_offset;
> +} SpaprMachineStateNestedGuest;
> +
> +struct SpaprMachineStateNested {
> +
> +    uint8_t api;
> +#define NESTED_API_KVM_HV  1
> +#define NESTED_API_PAPR    2
> +    uint64_t ptcr;
> +    uint32_t lpid_max;
> +    uint32_t pvr_base;
> +    bool capabilities_set;
> +    GHashTable *guests;
> +};
> +
> +struct SpaprMachineStateNestedGuestVcpuRunBuf {
> +    uint64_t addr;
> +    uint64_t size;
> +};
> +
> +typedef struct SpaprMachineStateNestedGuestVcpu {
> +    bool enabled;
> +    struct SpaprMachineStateNestedGuestVcpuRunBuf runbufin;
> +    struct SpaprMachineStateNestedGuestVcpuRunBuf runbufout;
> +    CPUPPCState env;
> +    int64_t tb_offset;
> +    int64_t dec_expiry_tb;
> +} SpaprMachineStateNestedGuestVcpu;
>  
>  /*
>   * Register state for entering a nested guest with H_ENTER_NESTED.
> @@ -228,6 +261,21 @@ struct kvmppc_hv_guest_state {
>      uint64_t dawr1;
>      uint64_t dawrx1;
>      /* Version 2 ends here */
> +    uint64_t dec;
> +    uint64_t fscr;
> +    uint64_t fpscr;
> +    uint64_t bescr;
> +    uint64_t ebbhr;
> +    uint64_t ebbrr;
> +    uint64_t tar;
> +    uint64_t dexcr;
> +    uint64_t hdexcr;
> +    uint64_t hashkeyr;
> +    uint64_t hashpkeyr;
> +    uint64_t ctrl;
> +    uint64_t vscr;
> +    uint64_t vrsave;
> +    ppc_vsr_t vsr[64];
>  };

Why? I can't see where it's used... This is API for the original HV
hcalls which is possibly now broken because the code uses sizeof()
when mapping it.

In general I'm not a fan of splitting patches by the type of code they
add. Definitions for external APIs okay. But for things like internal
structures I prefer added where they are introduced.

It's actually harder to review a patch if related / dependent changes
aren't in it, IMO. What should be split is unrelated or independent
changes and logical steps. Same goes for hcalls too actually. Take a
look at the series that introduced nested HV. 120f738a467 adds all the
hcalls, all the structures, etc. 

So I would also think about squashing at least the get/set capabilities
hcalls together, and guest create/delete, and probably vcpu create/run.

Thanks,
Nick



* Re: [PATCH RESEND 03/15] ppc: spapr: Use SpaprMachineStateNested's ptcr instead of nested_ptcr
  2023-09-06  4:33 ` [PATCH RESEND 03/15] ppc: spapr: Use SpaprMachineStateNested's ptcr instead of nested_ptcr Harsh Prateek Bora
@ 2023-09-07  1:13   ` Nicholas Piggin
  2023-09-11  7:24     ` Harsh Prateek Bora
  0 siblings, 1 reply; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  1:13 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> Use nested guest state specific struct for storing related info.

So this is the patch I would introduce the SpaprMachineStateNested
struct, with just the .ptrc member. Add other members to it as they
are used in later patches.

>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  hw/ppc/spapr.c         | 4 ++--
>  hw/ppc/spapr_nested.c  | 4 ++--
>  include/hw/ppc/spapr.h | 3 ++-
>  3 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 07e91e3800..e44686b04d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1340,8 +1340,8 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
>  
>          assert(lpid != 0);
>  
> -        patb = spapr->nested_ptcr & PTCR_PATB;
> -        pats = spapr->nested_ptcr & PTCR_PATS;
> +        patb = spapr->nested.ptcr & PTCR_PATB;
> +        pats = spapr->nested.ptcr & PTCR_PATS;
>  
>          /* Check if partition table is properly aligned */
>          if (patb & MAKE_64BIT_MASK(0, pats + 12)) {

At this point I wonder if we should first move the nested part of
spapr_get_pate into nested code. It's a bit of a wart to have it
here when most of the other nested cases are abstracted from non
nested code quite well.

> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 121aa96ddc..a669470f1a 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -25,7 +25,7 @@ static target_ulong h_set_ptbl(PowerPCCPU *cpu,
>          return H_PARAMETER;
>      }
>  
> -    spapr->nested_ptcr = ptcr; /* Save new partition table */
> +    spapr->nested.ptcr = ptcr; /* Save new partition table */
>  
>      return H_SUCCESS;
>  }
> @@ -157,7 +157,7 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
>      struct kvmppc_pt_regs *regs;
>      hwaddr len;
>  
> -    if (spapr->nested_ptcr == 0) {
> +    if (spapr->nested.ptcr == 0) {
>          return H_NOT_AVAILABLE;
>      }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 3990fed1d9..c8b42af430 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -12,6 +12,7 @@
>  #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
>  #include "hw/ppc/xics.h"        /* For ICSState */
>  #include "hw/ppc/spapr_tpm_proxy.h"
> +#include "hw/ppc/spapr_nested.h" /* for SpaprMachineStateNested */
>  
>  struct SpaprVioBus;
>  struct SpaprPhbState;
> @@ -216,7 +217,7 @@ struct SpaprMachineState {
>      uint32_t vsmt;       /* Virtual SMT mode (KVM's "core stride") */
>  
>      /* Nested HV support (TCG only) */
> -    uint64_t nested_ptcr;
> +    struct SpaprMachineStateNested nested;

I think convention says to use the typedef for these?

Thanks,
Nick

>  
>      Notifier epow_notifier;
>      QTAILQ_HEAD(, SpaprEventLogEntry) pending_events;




* Re: [PATCH RESEND 04/15] ppc: spapr: Start using nested.api for nested kvm-hv api
  2023-09-06  4:33 ` [PATCH RESEND 04/15] ppc: spapr: Start using nested.api for nested kvm-hv api Harsh Prateek Bora
@ 2023-09-07  1:35   ` Nicholas Piggin
  2023-09-11  8:18     ` Harsh Prateek Bora
  0 siblings, 1 reply; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  1:35 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> With this patch, isolating kvm-hv nested api code to be executed only
> when cap-nested-hv is set. This helps keeping api specific logic
> mutually exclusive.

Changelog needs a bit of improvement. Emphasis on "why" for changelogs.
If you take a changeset that makes a single logical change to the code,
you should be able to understand why that is done. You could make some
assumptions about the bigger series when it comes to details so don't
have to explain from first principles. But if it's easy to explain why
the high level, you could.

Why are we adding this fundamentally? So that the spapr nested code can
be extended to support a second API.

This patch should add the api field to the struct, and also the
NESTED_API_KVM_HV definition.

Thanks,
Nick

>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  hw/ppc/spapr.c      | 7 ++++++-
>  hw/ppc/spapr_caps.c | 1 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e44686b04d..0aa9f21516 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1334,8 +1334,11 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
>          /* Copy PATE1:GR into PATE0:HR */
>          entry->dw0 = spapr->patb_entry & PATE0_HR;
>          entry->dw1 = spapr->patb_entry;
> +        return true;
> +    }
> +    assert(spapr->nested.api);
>  
> -    } else {
> +    if (spapr->nested.api == NESTED_API_KVM_HV) {
>          uint64_t patb, pats;
>  
>          assert(lpid != 0);
> @@ -3437,6 +3440,8 @@ static void spapr_instance_init(Object *obj)
>          spapr_get_host_serial, spapr_set_host_serial);
>      object_property_set_description(obj, "host-serial",
>          "Host serial number to advertise in guest device tree");
> +    /* Nested */
> +    spapr->nested.api = 0;
>  }
>  
>  static void spapr_machine_finalizefn(Object *obj)
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index 5a0755d34f..a3a790b026 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -454,6 +454,7 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
>          return;
>      }
>  
> +    spapr->nested.api = NESTED_API_KVM_HV;
>      if (kvm_enabled()) {
>          if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00, 0,
>                                spapr->max_compat_pvr)) {



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND 05/15] ppc: spapr: Introduce cap-nested-papr for nested PAPR API
  2023-09-06  4:33 ` [PATCH RESEND 05/15] ppc: spapr: Introduce cap-nested-papr for nested PAPR API Harsh Prateek Bora
@ 2023-09-07  1:49   ` Nicholas Piggin
  2023-09-19  9:49     ` Harsh Prateek Bora
  2023-09-07  1:52   ` Nicholas Piggin
  1 sibling, 1 reply; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  1:49 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This patch introduces a new cmd line option cap-nested-papr to enable
> support for nested PAPR API by setting the nested.api version accordingly.
> It requires the user to launch the L0 Qemu in TCG mode and then L1 Linux
> can then launch the nested guest in KVM mode. Unlike cap-nested-hv,
> this is meant for nested guest on pseries (PowerVM) where L0 retains
> whole state of the nested guest. Both APIs are thus mutually exclusive.
> Support for related hcalls is being added in next set of patches.

This changelog could use some work too.

"Introduce a SPAPR capability cap-nested-papr which provides a nested
HV facility to the guest. This is similar to cap-nested-hv, but uses
a different (incompatible) API and so they are mutually exclusive."

You could add some documentation to say recent Linux pseries guests
support both, and explain more about KVM and PowerVM support there too,
if it is relevant.

>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  hw/ppc/spapr.c         |  2 ++
>  hw/ppc/spapr_caps.c    | 48 ++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |  5 ++++-
>  3 files changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0aa9f21516..cbab7a825f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2092,6 +2092,7 @@ static const VMStateDescription vmstate_spapr = {
>          &vmstate_spapr_cap_fwnmi,
>          &vmstate_spapr_fwnmi,
>          &vmstate_spapr_cap_rpt_invalidate,
> +        &vmstate_spapr_cap_nested_papr,
>          NULL
>      }
>  };
> @@ -4685,6 +4686,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_WORKAROUND;
>      smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> +    smc->default_caps.caps[SPAPR_CAP_NESTED_PAPR] = SPAPR_CAP_OFF;
>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
>      smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index a3a790b026..d3b9f107aa 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -491,6 +491,44 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
>      }
>  }
>  
> +static void cap_nested_papr_apply(SpaprMachineState *spapr,
> +                                    uint8_t val, Error **errp)
> +{
> +    ERRP_GUARD();
> +    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> +    CPUPPCState *env = &cpu->env;
> +
> +    if (!val) {
> +        /* capability disabled by default */
> +        return;
> +    }
> +
> +    if (tcg_enabled()) {
> +        if (!(env->insns_flags2 & PPC2_ISA300)) {
> +            error_setg(errp, "Nested-PAPR only supported on POWER9 and later");
> +            error_append_hint(errp,
> +                              "Try appending -machine cap-nested-papr=off\n");
> +            return;
> +        }
> +        spapr->nested.api = NESTED_API_PAPR;

I'm not seeing any mutual exclusion with the other cap here. What if
you enable them both? Lucky dip?

It would actually be nice to enable both even if you just choose the
mode after the first hcall is made. I think you could actually support
both (even concurrently) quite easily.

For now this is probably okay if you fix the mutual exclusion.


> +    } else if (kvm_enabled()) {
> +        /*
> +         * this gets executed in L1 qemu when L2 is launched,
> +         * needs kvm-hv support in L1 kernel.
> +         */
> +        if (!kvmppc_has_cap_nested_kvm_hv()) {
> +            error_setg(errp,
> +                       "KVM implementation does not support Nested-HV");
> +            error_append_hint(errp,
> +                              "Try appending -machine cap-nested-hv=off\n");
> +        } else if (kvmppc_set_cap_nested_kvm_hv(val) < 0) {
> +            error_setg(errp, "Error enabling cap-nested-hv with KVM");
> +            error_append_hint(errp,
> +                              "Try appending -machine cap-nested-hv=off\n");
> +        }

This is just copy and pasted from the other cap, isn't it?

> +    }
> +}
> +
>  static void cap_large_decr_apply(SpaprMachineState *spapr,
>                                   uint8_t val, Error **errp)
>  {
> @@ -736,6 +774,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>          .type = "bool",
>          .apply = cap_nested_kvm_hv_apply,
>      },
> +    [SPAPR_CAP_NESTED_PAPR] = {
> +        .name = "nested-papr",
> +        .description = "Allow Nested PAPR (Phyp)",
> +        .index = SPAPR_CAP_NESTED_PAPR,
> +        .get = spapr_cap_get_bool,
> +        .set = spapr_cap_set_bool,
> +        .type = "bool",
> +        .apply = cap_nested_papr_apply,
> +    },

Should scrub "Phyp". "Phyp" and PowerVM also don't mean anything for
us really. We both implement PAPR. "Nested PAPR" is gibberish for a user
as well -- "Allow Nested KVM-HV (PAPR API)" or similar might be a bit
better.

Thanks,
Nick

>      [SPAPR_CAP_LARGE_DECREMENTER] = {
>          .name = "large-decr",
>          .description = "Allow Large Decrementer",
> @@ -920,6 +967,7 @@ SPAPR_CAP_MIG_STATE(sbbc, SPAPR_CAP_SBBC);
>  SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
>  SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
> +SPAPR_CAP_MIG_STATE(nested_papr, SPAPR_CAP_NESTED_PAPR);
>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>  SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI);
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index c8b42af430..8a6e9ce929 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -81,8 +81,10 @@ typedef enum {
>  #define SPAPR_CAP_RPT_INVALIDATE        0x0B
>  /* Support for AIL modes */
>  #define SPAPR_CAP_AIL_MODE_3            0x0C
> +/* Nested PAPR */
> +#define SPAPR_CAP_NESTED_PAPR           0x0D
>  /* Num Caps */
> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_AIL_MODE_3 + 1)
> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_NESTED_PAPR + 1)
>  
>  /*
>   * Capability Values
> @@ -1005,6 +1007,7 @@ extern const VMStateDescription vmstate_spapr_cap_sbbc;
>  extern const VMStateDescription vmstate_spapr_cap_ibs;
>  extern const VMStateDescription vmstate_spapr_cap_hpt_maxpagesize;
>  extern const VMStateDescription vmstate_spapr_cap_nested_kvm_hv;
> +extern const VMStateDescription vmstate_spapr_cap_nested_papr;
>  extern const VMStateDescription vmstate_spapr_cap_large_decr;
>  extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
>  extern const VMStateDescription vmstate_spapr_cap_fwnmi;




* Re: [PATCH RESEND 05/15] ppc: spapr: Introduce cap-nested-papr for nested PAPR API
  2023-09-06  4:33 ` [PATCH RESEND 05/15] ppc: spapr: Introduce cap-nested-papr for nested PAPR API Harsh Prateek Bora
  2023-09-07  1:49   ` Nicholas Piggin
@ 2023-09-07  1:52   ` Nicholas Piggin
  1 sibling, 0 replies; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  1:52 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This patch introduces a new cmd line option cap-nested-papr to enable
> support for nested PAPR API by setting the nested.api version accordingly.
> It requires the user to launch the L0 Qemu in TCG mode and then L1 Linux
> can then launch the nested guest in KVM mode. Unlike cap-nested-hv,
> this is meant for nested guest on pseries (PowerVM) where L0 retains
> whole state of the nested guest. Both APIs are thus mutually exclusive.
> Support for related hcalls is being added in next set of patches.

Oh, this should also come at about the final patch of the series, once
you have built the code to actually support said capability.

Thanks,
Nick

>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  hw/ppc/spapr.c         |  2 ++
>  hw/ppc/spapr_caps.c    | 48 ++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |  5 ++++-
>  3 files changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0aa9f21516..cbab7a825f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2092,6 +2092,7 @@ static const VMStateDescription vmstate_spapr = {
>          &vmstate_spapr_cap_fwnmi,
>          &vmstate_spapr_fwnmi,
>          &vmstate_spapr_cap_rpt_invalidate,
> +        &vmstate_spapr_cap_nested_papr,
>          NULL
>      }
>  };
> @@ -4685,6 +4686,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_WORKAROUND;
>      smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> +    smc->default_caps.caps[SPAPR_CAP_NESTED_PAPR] = SPAPR_CAP_OFF;
>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
>      smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index a3a790b026..d3b9f107aa 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -491,6 +491,44 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
>      }
>  }
>  
> +static void cap_nested_papr_apply(SpaprMachineState *spapr,
> +                                    uint8_t val, Error **errp)
> +{
> +    ERRP_GUARD();
> +    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> +    CPUPPCState *env = &cpu->env;
> +
> +    if (!val) {
> +        /* capability disabled by default */
> +        return;
> +    }
> +
> +    if (tcg_enabled()) {
> +        if (!(env->insns_flags2 & PPC2_ISA300)) {
> +            error_setg(errp, "Nested-PAPR only supported on POWER9 and later");
> +            error_append_hint(errp,
> +                              "Try appending -machine cap-nested-papr=off\n");
> +            return;
> +        }
> +        spapr->nested.api = NESTED_API_PAPR;
> +    } else if (kvm_enabled()) {
> +        /*
> +         * this gets executed in L1 qemu when L2 is launched,
> +         * needs kvm-hv support in L1 kernel.
> +         */
> +        if (!kvmppc_has_cap_nested_kvm_hv()) {
> +            error_setg(errp,
> +                       "KVM implementation does not support Nested-HV");
> +            error_append_hint(errp,
> +                              "Try appending -machine cap-nested-hv=off\n");
> +        } else if (kvmppc_set_cap_nested_kvm_hv(val) < 0) {
> +            error_setg(errp, "Error enabling cap-nested-hv with KVM");
> +            error_append_hint(errp,
> +                              "Try appending -machine cap-nested-hv=off\n");
> +        }
> +    }
> +}
> +
>  static void cap_large_decr_apply(SpaprMachineState *spapr,
>                                   uint8_t val, Error **errp)
>  {
> @@ -736,6 +774,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>          .type = "bool",
>          .apply = cap_nested_kvm_hv_apply,
>      },
> +    [SPAPR_CAP_NESTED_PAPR] = {
> +        .name = "nested-papr",
> +        .description = "Allow Nested PAPR (Phyp)",
> +        .index = SPAPR_CAP_NESTED_PAPR,
> +        .get = spapr_cap_get_bool,
> +        .set = spapr_cap_set_bool,
> +        .type = "bool",
> +        .apply = cap_nested_papr_apply,
> +    },
>      [SPAPR_CAP_LARGE_DECREMENTER] = {
>          .name = "large-decr",
>          .description = "Allow Large Decrementer",
> @@ -920,6 +967,7 @@ SPAPR_CAP_MIG_STATE(sbbc, SPAPR_CAP_SBBC);
>  SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
>  SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
> +SPAPR_CAP_MIG_STATE(nested_papr, SPAPR_CAP_NESTED_PAPR);
>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>  SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI);
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index c8b42af430..8a6e9ce929 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -81,8 +81,10 @@ typedef enum {
>  #define SPAPR_CAP_RPT_INVALIDATE        0x0B
>  /* Support for AIL modes */
>  #define SPAPR_CAP_AIL_MODE_3            0x0C
> +/* Nested PAPR */
> +#define SPAPR_CAP_NESTED_PAPR           0x0D
>  /* Num Caps */
> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_AIL_MODE_3 + 1)
> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_NESTED_PAPR + 1)
>  
>  /*
>   * Capability Values
> @@ -1005,6 +1007,7 @@ extern const VMStateDescription vmstate_spapr_cap_sbbc;
>  extern const VMStateDescription vmstate_spapr_cap_ibs;
>  extern const VMStateDescription vmstate_spapr_cap_hpt_maxpagesize;
>  extern const VMStateDescription vmstate_spapr_cap_nested_kvm_hv;
> +extern const VMStateDescription vmstate_spapr_cap_nested_papr;
>  extern const VMStateDescription vmstate_spapr_cap_large_decr;
>  extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
>  extern const VMStateDescription vmstate_spapr_cap_fwnmi;




* Re: [PATCH RESEND 06/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_GET_CAPABILITIES
  2023-09-06  4:33 ` [PATCH RESEND 06/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_GET_CAPABILITIES Harsh Prateek Bora
@ 2023-09-07  2:02   ` Nicholas Piggin
  2023-09-19 10:48     ` Harsh Prateek Bora
  2023-10-03  8:10     ` Cédric Le Goater
  0 siblings, 2 replies; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  2:02 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This patch implements nested PAPR hcall H_GUEST_GET_CAPABILITIES and
> also enables registration of nested PAPR hcalls whenever an L0 is
> launched with cap-nested-papr=true. The common registration routine
> shall be used by future patches for registration of related hcall
> support
> being added. This hcall is used by L1 kernel to get the set of guest
> capabilities that are supported by L0 (Qemu TCG).

Changelog can drop "This patch". Probably don't have to be so
detailed here either -- we already established that PAPR hcalls can
be used with cap-nested-papr in the last patch, we know that L1
kernels make the hcalls to the vhyp, etc.

"Introduce the nested PAPR hcall H_GUEST_GET_CAPABILITIES which
is used to query the capabilities of the API and the L2 guests
it provides."

I would squash this with set.

>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  hw/ppc/spapr_caps.c           |  1 +
>  hw/ppc/spapr_nested.c         | 35 +++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_nested.h |  6 ++++++
>  3 files changed, 42 insertions(+)
>
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index d3b9f107aa..cbe53a79ec 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -511,6 +511,7 @@ static void cap_nested_papr_apply(SpaprMachineState *spapr,
>              return;
>          }
>          spapr->nested.api = NESTED_API_PAPR;
> +        spapr_register_nested_phyp();
>      } else if (kvm_enabled()) {
>          /*
>           * this gets executed in L1 qemu when L2 is launched,

Hmm, this doesn't match nested HV registration. If you want to register
the hcalls in the cap apply, can you move spapr_register_nested()
there first? It may make more sense to go in as a dummy function with
the cap patch first, since you don't introduce all hcalls together.

Also phyp->papr. Scrub for phyp please.

> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index a669470f1a..37f3a49be2 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -6,6 +6,7 @@
>  #include "hw/ppc/spapr.h"
>  #include "hw/ppc/spapr_cpu_core.h"
>  #include "hw/ppc/spapr_nested.h"
> +#include "cpu-models.h"
>  
>  #ifdef CONFIG_TCG
>  #define PRTS_MASK      0x1f
> @@ -375,6 +376,29 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>      address_space_unmap(CPU(cpu)->as, regs, len, len, true);
>  }
>  
> +static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
> +                                             SpaprMachineState *spapr,
> +                                             target_ulong opcode,
> +                                             target_ulong *args)
> +{
> +    CPUPPCState *env = &cpu->env;
> +    target_ulong flags = args[0];
> +
> +    if (flags) { /* don't handle any flags capabilities for now */
> +        return H_PARAMETER;
> +    }
> +
> +    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
> +        (CPU_POWERPC_POWER9_BASE))
> +        env->gpr[4] = H_GUEST_CAPABILITIES_P9_MODE;
> +
> +    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
> +        (CPU_POWERPC_POWER10_BASE))
> +        env->gpr[4] = H_GUEST_CAPABILITIES_P10_MODE;
> +
> +    return H_SUCCESS;
> +}
> +
>  void spapr_register_nested(void)
>  {
>      spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> @@ -382,6 +406,12 @@ void spapr_register_nested(void)
>      spapr_register_hypercall(KVMPPC_H_TLB_INVALIDATE, h_tlb_invalidate);
>      spapr_register_hypercall(KVMPPC_H_COPY_TOFROM_GUEST, h_copy_tofrom_guest);
>  }
> +
> +void spapr_register_nested_phyp(void)
> +{
> +    spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
> +}
> +
>  #else
>  void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>  {
> @@ -392,4 +422,9 @@ void spapr_register_nested(void)
>  {
>      /* DO NOTHING */
>  }
> +
> +void spapr_register_nested_phyp(void)
> +{
> +    /* DO NOTHING */
> +}
>  #endif
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index f8db31075b..ce198e9f70 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h
> @@ -189,6 +189,11 @@
>  /* End of list of Guest State Buffer Element IDs */
>  #define GSB_LAST                GSB_VCPU_SPR_ASDR
>  
> +/* Bit masks to be used in nested PAPR API */
> +#define H_GUEST_CAPABILITIES_COPY_MEM 0x8000000000000000
> +#define H_GUEST_CAPABILITIES_P9_MODE  0x4000000000000000
> +#define H_GUEST_CAPABILITIES_P10_MODE 0x2000000000000000

See, introducing these defines with the patch that uses them isn't so
bad :)

Thanks,
Nick

> +
>  typedef struct SpaprMachineStateNestedGuest {
>      unsigned long vcpus;
>      struct SpaprMachineStateNestedGuestVcpu *vcpu;
> @@ -331,6 +336,7 @@ struct nested_ppc_state {
>  };
>  
>  void spapr_register_nested(void);
> +void spapr_register_nested_phyp(void);
>  void spapr_exit_nested(PowerPCCPU *cpu, int excp);
>  
>  #endif /* HW_SPAPR_NESTED_H */




* Re: [PATCH RESEND 07/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_SET_CAPABILITIES
  2023-09-06  4:33 ` [PATCH RESEND 07/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_SET_CAPABILITIES Harsh Prateek Bora
@ 2023-09-07  2:09   ` Nicholas Piggin
  2023-10-03  4:59     ` Harsh Prateek Bora
  0 siblings, 1 reply; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  2:09 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This patch implements nested PAPR hcall H_GUEST_SET_CAPABILITIES.
> This is used by L1 to set capabilities of the nested guest being
> created. The capabilities being set are subset of the capabilities
> returned from the previous call to H_GUEST_GET_CAPABILITIES hcall.
> Currently, it only supports P9/P10 capability check through PVR.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  hw/ppc/spapr.c                |  1 +
>  hw/ppc/spapr_nested.c         | 46 +++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_nested.h |  3 +++
>  3 files changed, 50 insertions(+)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index cbab7a825f..7c6f6ee25d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3443,6 +3443,7 @@ static void spapr_instance_init(Object *obj)
>          "Host serial number to advertise in guest device tree");
>      /* Nested */
>      spapr->nested.api = 0;
> +    spapr->nested.capabilities_set = false;

I would actually think about moving spapr->nested init into
spapr_nested.c.

>  }
>  
>  static void spapr_machine_finalizefn(Object *obj)
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 37f3a49be2..9af65f257f 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -399,6 +399,51 @@ static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
>      return H_SUCCESS;
>  }
>  
> +static target_ulong h_guest_set_capabilities(PowerPCCPU *cpu,
> +                                             SpaprMachineState *spapr,
> +                                             target_ulong opcode,
> +                                              target_ulong *args)
> +{
> +    CPUPPCState *env = &cpu->env;
> +    target_ulong flags = args[0];
> +    target_ulong capabilities = args[1];
> +
> +    if (flags) { /* don't handle any flags capabilities for now */
> +        return H_PARAMETER;
> +    }
> +
> +

May need to do a pass over whitespace.

> +    /* isn't supported */
> +    if (capabilities & H_GUEST_CAPABILITIES_COPY_MEM) {
> +        env->gpr[4] = 0;
> +        return H_P2;
> +    }
> +
> +    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
> +        (CPU_POWERPC_POWER9_BASE)) {
> +        /* We are a P9 */
> +        if (!(capabilities & H_GUEST_CAPABILITIES_P9_MODE)) {
> +            env->gpr[4] = 1;
> +            return H_P2;
> +        }
> +    }
> +
> +    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
> +        (CPU_POWERPC_POWER10_BASE)) {
> +        /* We are a P10 */

The 2 comments above aren't helpful. Just remove them.

> +        if (!(capabilities & H_GUEST_CAPABILITIES_P10_MODE)) {
> +            env->gpr[4] = 2;
> +            return H_P2;
> +        }
> +    }
> +
> +    spapr->nested.capabilities_set = true;

Is it okay to set twice? If not, add a check. If yes, remove
capabilities_set until it's needed.

> +
> +    spapr->nested.pvr_base = env->spr[SPR_PVR];
> +
> +    return H_SUCCESS;
> +}
> +
>  void spapr_register_nested(void)
>  {
>      spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> @@ -410,6 +455,7 @@ void spapr_register_nested(void)
>  void spapr_register_nested_phyp(void)
>  {
>      spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
> +    spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
>  }
>  
>  #else
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index ce198e9f70..a7996251cb 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h
> @@ -193,6 +193,9 @@
>  #define H_GUEST_CAPABILITIES_COPY_MEM 0x8000000000000000
>  #define H_GUEST_CAPABILITIES_P9_MODE  0x4000000000000000
>  #define H_GUEST_CAPABILITIES_P10_MODE 0x2000000000000000
> +#define H_GUEST_CAP_COPY_MEM_BMAP   0
> +#define H_GUEST_CAP_P9_MODE_BMAP    1
> +#define H_GUEST_CAP_P10_MODE_BMAP   2
>  
>  typedef struct SpaprMachineStateNestedGuest {
>      unsigned long vcpus;




* Re: [PATCH RESEND 08/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE
  2023-09-06  4:33 ` [PATCH RESEND 08/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE Harsh Prateek Bora
@ 2023-09-07  2:28   ` Nicholas Piggin
  2023-10-03  7:57     ` Harsh Prateek Bora
  0 siblings, 1 reply; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  2:28 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This hcall is used by L1 to indicate to L0 that a new nested guest needs
> to be created and therefore necessary resource allocation shall be made.
> The L0 uses a hash table for nested guest specific resource management.
> This data structure is further utilized by other hcalls to operate on
> related members during entire life cycle of the nested guest.

Similar comment for changelog re detail. Detailed specification of API
and implementation could go in comments or documentation if useful.

>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  hw/ppc/spapr_nested.c         | 75 +++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_nested.h |  3 ++
>  2 files changed, 78 insertions(+)
>
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 9af65f257f..09bbbfb341 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -444,6 +444,80 @@ static target_ulong h_guest_set_capabilities(PowerPCCPU *cpu,
>      return H_SUCCESS;
>  }
>  
> +static void
> +destroy_guest_helper(gpointer value)
> +{
> +    struct SpaprMachineStateNestedGuest *guest = value;
> +    g_free(guest);
> +}
> +
> +static target_ulong h_guest_create(PowerPCCPU *cpu,
> +                                   SpaprMachineState *spapr,
> +                                   target_ulong opcode,
> +                                   target_ulong *args)
> +{
> +    CPUPPCState *env = &cpu->env;
> +    target_ulong flags = args[0];
> +    target_ulong continue_token = args[1];
> +    uint64_t lpid;
> +    int nguests = 0;
> +    struct SpaprMachineStateNestedGuest *guest;
> +
> +    if (flags) { /* don't handle any flags for now */
> +        return H_UNSUPPORTED_FLAG;
> +    }
> +
> +    if (continue_token != -1) {
> +        return H_P2;
> +    }
> +
> +    if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_PAPR)) {
> +        return H_FUNCTION;
> +    }
> +
> +    if (!spapr->nested.capabilities_set) {
> +        return H_STATE;
> +    }
> +
> +    if (!spapr->nested.guests) {
> +        spapr->nested.lpid_max = NESTED_GUEST_MAX;
> +        spapr->nested.guests = g_hash_table_new_full(NULL,
> +                                                     NULL,
> +                                                     NULL,
> +                                                     destroy_guest_helper);

Is lpid_max only used by create? Probably no need to have it in
spapr->nested then. Also, do we even need to have a limit?

> +    }
> +
> +    nguests = g_hash_table_size(spapr->nested.guests);
> +
> +    if (nguests == spapr->nested.lpid_max) {
> +        return H_NO_MEM;
> +    }
> +
> +    /* Lookup for available lpid */
> +    for (lpid = 1; lpid < spapr->nested.lpid_max; lpid++) {

PAPR API calls it "guest ID" I think. Should change all references to
lpid to that.

> +        if (!(g_hash_table_lookup(spapr->nested.guests,
> +                                  GINT_TO_POINTER(lpid)))) {
> +            break;
> +        }
> +    }
> +    if (lpid == spapr->nested.lpid_max) {
> +        return H_NO_MEM;
> +    }
> +
> +    guest = g_try_new0(struct SpaprMachineStateNestedGuest, 1);
> +    if (!guest) {
> +        return H_NO_MEM;
> +    }
> +
> +    guest->pvr_logical = spapr->nested.pvr_base;
> +
> +    g_hash_table_insert(spapr->nested.guests, GINT_TO_POINTER(lpid), guest);
> +    printf("%s: lpid: %lu (MAX: %i)\n", __func__, lpid, spapr->nested.lpid_max);

Remove printf.

> +
> +    env->gpr[4] = lpid;
> +    return H_SUCCESS;
> +}
> +
>  void spapr_register_nested(void)
>  {
>      spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> @@ -456,6 +530,7 @@ void spapr_register_nested_phyp(void)
>  {
>      spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
>      spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
> +    spapr_register_hypercall(H_GUEST_CREATE          , h_guest_create);
>  }
>  
>  #else
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index a7996251cb..7841027df8 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h
> @@ -197,6 +197,9 @@
>  #define H_GUEST_CAP_P9_MODE_BMAP    1
>  #define H_GUEST_CAP_P10_MODE_BMAP   2
>  
> +/* Nested PAPR API macros */
> +#define NESTED_GUEST_MAX 4096

Prefix with PAPR_?

Thanks,
Nick

> +
>  typedef struct SpaprMachineStateNestedGuest {
>      unsigned long vcpus;
>      struct SpaprMachineStateNestedGuestVcpu *vcpu;




* Re: [PATCH RESEND 14/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_DELETE
  2023-09-06  4:33 ` [PATCH RESEND 14/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_DELETE Harsh Prateek Bora
@ 2023-09-07  2:31   ` Nicholas Piggin
  2023-10-03  8:01     ` Harsh Prateek Bora
  0 siblings, 1 reply; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  2:31 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This hcall is used by L1 to delete a guest entry in L0 or can also be
> used to delete all guests if needed (usually in shutdown scenarios).

I'd squash with at least the create hcall.

>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  hw/ppc/spapr_nested.c         | 32 ++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_nested.h |  1 +
>  2 files changed, 33 insertions(+)
>
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 3605f27115..5afdad4990 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -1692,6 +1692,37 @@ static void exit_process_output_buffer(PowerPCCPU *cpu,
>      return;
>  }
>  
> +static target_ulong h_guest_delete(PowerPCCPU *cpu,
> +                                   SpaprMachineState *spapr,
> +                                   target_ulong opcode,
> +                                   target_ulong *args)
> +{
> +    target_ulong flags = args[0];
> +    target_ulong lpid = args[1];
> +    struct SpaprMachineStateNestedGuest *guest;
> +
> +    if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_PAPR)) {
> +        return H_FUNCTION;
> +    }

If you only register these hcalls when you apply the cap, then you
don't need to test it, right?

Open question as to whether it's better to register hcalls when
enabling such caps, or do the tests for them here. I guess the
former makes sense.

> +
> +    /* handle flag deleteAllGuests, remaining bits reserved */

This comment is confusing. What is flag deleteAllGuests?

H_GUEST_DELETE_ALL_MASK? Is that a mask, or a flag?
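
Either way, the usual idiom is one defined flag bit with everything else
reserved: reject reserved bits first, then test the flag. A self-contained
sketch (constant names and return values illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define H_SUCCESS               0
#define H_UNSUPPORTED_FLAG   (-256)   /* illustrative value */
/* Single defined flag bit; all other bits are reserved. */
#define H_GUEST_DELETE_ALL_FLAG 0x8000000000000000ULL

static long check_delete_flags(uint64_t flags, int *delete_all)
{
    if (flags & ~H_GUEST_DELETE_ALL_FLAG) {
        return H_UNSUPPORTED_FLAG;    /* a reserved bit was set */
    }
    *delete_all = !!(flags & H_GUEST_DELETE_ALL_FLAG);
    return H_SUCCESS;
}
```

Naming the constant `_FLAG` (or keeping `_MASK` but commenting that the mask
contains exactly one flag) would make the comment above unnecessary.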

> +    if (flags & ~H_GUEST_DELETE_ALL_MASK) {
> +        return H_UNSUPPORTED_FLAG;
> +    } else if (flags & H_GUEST_DELETE_ALL_MASK) {
> +        g_hash_table_destroy(spapr->nested.guests);
> +        return H_SUCCESS;
> +    }
> +
> +    guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(lpid));
> +    if (!guest) {
> +        return H_P2;
> +    }
> +
> +    g_hash_table_remove(spapr->nested.guests, GINT_TO_POINTER(lpid));
> +
> +    return H_SUCCESS;
> +}
> +
>  void spapr_register_nested(void)
>  {
>      spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> @@ -1709,6 +1740,7 @@ void spapr_register_nested_phyp(void)
>      spapr_register_hypercall(H_GUEST_SET_STATE       , h_guest_set_state);
>      spapr_register_hypercall(H_GUEST_GET_STATE       , h_guest_get_state);
>      spapr_register_hypercall(H_GUEST_RUN_VCPU        , h_guest_run_vcpu);
> +    spapr_register_hypercall(H_GUEST_DELETE          , h_guest_delete);
>  }
>  
>  #else
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index ca5d28c06e..9eb43778ad 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h
> @@ -209,6 +209,7 @@
>  #define H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE 0x8000000000000000 /* BE in GSB */
>  #define GUEST_STATE_REQUEST_GUEST_WIDE       0x1
>  #define GUEST_STATE_REQUEST_SET              0x2
> +#define H_GUEST_DELETE_ALL_MASK              0x8000000000000000ULL
>  
>  #define GUEST_STATE_ELEMENT(i, sz, s, f, ptr, c) { \
>      .id = (i),                                     \



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND 09/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE_VCPU
  2023-09-06  4:33 ` [PATCH RESEND 09/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE_VCPU Harsh Prateek Bora
@ 2023-09-07  2:49   ` Nicholas Piggin
  2023-10-04  4:49     ` Harsh Prateek Bora
  0 siblings, 1 reply; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  2:49 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This patch implements support for hcall H_GUEST_CREATE_VCPU which is
> used to instantiate a new VCPU for a previously created nested guest.
> The L1 provides the guest-id (returned by L0 during the call to
> H_GUEST_CREATE) and an associated unique vcpu-id to refer to this
> instance in future calls. It is assumed that vcpu-ids are allocated
> sequentially, and the max vcpu limit is 2048.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  hw/ppc/spapr_nested.c         | 110 ++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h        |   1 +
>  include/hw/ppc/spapr_nested.h |   1 +
>  3 files changed, 112 insertions(+)
>
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 09bbbfb341..e7956685af 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -376,6 +376,47 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>      address_space_unmap(CPU(cpu)->as, regs, len, len, true);
>  }
>  
> +static
> +SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
> +                                                     target_ulong lpid)
> +{
> +    SpaprMachineStateNestedGuest *guest;
> +
> +    guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(lpid));
> +    return guest;
> +}

Are you namespacing the new API stuff with papr or no? Might be good to
reduce confusion.

> +
> +static bool vcpu_check(SpaprMachineStateNestedGuest *guest,
> +                       target_ulong vcpuid,
> +                       bool inoutbuf)

What's it checking? That the id is valid? Allocated? Enabled?

> +{
> +    struct SpaprMachineStateNestedGuestVcpu *vcpu;
> +
> +    if (vcpuid >= NESTED_GUEST_VCPU_MAX) {
> +        return false;
> +    }
> +
> +    if (!(vcpuid < guest->vcpus)) {
> +        return false;
> +    }
> +
> +    vcpu = &guest->vcpu[vcpuid];
> +    if (!vcpu->enabled) {
> +        return false;
> +    }
> +
> +    if (!inoutbuf) {
> +        return true;
> +    }
> +
> +    /* Check to see if the in/out buffers are registered */
> +    if (vcpu->runbufin.addr && vcpu->runbufout.addr) {
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
>  static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
>                                               SpaprMachineState *spapr,
>                                               target_ulong opcode,
> @@ -448,6 +489,11 @@ static void
>  destroy_guest_helper(gpointer value)
>  {
>      struct SpaprMachineStateNestedGuest *guest = value;
> +    int i = 0;

Don't need to set i = 0 twice. A newline would be good though.

> +    for (i = 0; i < guest->vcpus; i++) {
> +        cpu_ppc_tb_free(&guest->vcpu[i].env);
> +    }
> +    g_free(guest->vcpu);
>      g_free(guest);
>  }
>  
> @@ -518,6 +564,69 @@ static target_ulong h_guest_create(PowerPCCPU *cpu,
>      return H_SUCCESS;
>  }
>  
> +static target_ulong h_guest_create_vcpu(PowerPCCPU *cpu,
> +                                        SpaprMachineState *spapr,
> +                                        target_ulong opcode,
> +                                        target_ulong *args)
> +{
> +    CPUPPCState *env = &cpu->env, *l2env;
> +    target_ulong flags = args[0];
> +    target_ulong lpid = args[1];
> +    target_ulong vcpuid = args[2];
> +    SpaprMachineStateNestedGuest *guest;
> +
> +    if (flags) { /* don't handle any flags for now */
> +        return H_UNSUPPORTED_FLAG;
> +    }
> +
> +    guest = spapr_get_nested_guest(spapr, lpid);
> +    if (!guest) {
> +        return H_P2;
> +    }
> +
> +    if (vcpuid < guest->vcpus) {
> +        return H_IN_USE;
> +    }
> +
> +    if (guest->vcpus >= NESTED_GUEST_VCPU_MAX) {
> +        return H_P3;
> +    }
> +
> +    if (guest->vcpus) {
> +        struct SpaprMachineStateNestedGuestVcpu *vcpus;

Ditto for using typedefs. Do a sweep for this.

> +        vcpus = g_try_renew(struct SpaprMachineStateNestedGuestVcpu,
> +                            guest->vcpu,
> +                            guest->vcpus + 1);

g_try_renew doesn't work with NULL mem? That's unfortunate.

> +        if (!vcpus) {
> +            return H_NO_MEM;
> +        }
> +        memset(&vcpus[guest->vcpus], 0,
> +               sizeof(struct SpaprMachineStateNestedGuestVcpu));
> +        guest->vcpu = vcpus;
> +        l2env = &vcpus[guest->vcpus].env;
> +    } else {
> +        guest->vcpu = g_try_new0(struct SpaprMachineStateNestedGuestVcpu, 1);
> +        if (guest->vcpu == NULL) {
> +            return H_NO_MEM;
> +        }
> +        l2env = &guest->vcpu->env;
> +    }

These two legs seem to be doing the same thing in different
ways wrt l2env. Just assign guest->vcpu in the branches and
get the l2env from guest->vcpu[guest->vcpus] afterward, no?
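
For what it's worth, g_try_renew() expands to g_try_realloc(), which should
accept a NULL array the same way realloc() does, so the two legs could
collapse into one growth path. A self-contained sketch with plain realloc()
standing in (toy types, not the real CPUPPCState):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned long pvr;      /* stand-in for CPUPPCState */
} Env;

typedef struct {
    int enabled;
    Env env;
} Vcpu;

typedef struct {
    unsigned long vcpus;
    Vcpu *vcpu;
} Guest;

/* Grow the vcpu array by one; realloc(NULL, ...) covers the first vcpu. */
static int guest_add_vcpu(Guest *guest, unsigned long l1_pvr)
{
    Vcpu *vcpus = realloc(guest->vcpu, (guest->vcpus + 1) * sizeof(*vcpus));
    Env *l2env;

    if (!vcpus) {
        return -1;          /* would be H_NO_MEM in the hcall */
    }
    /* Zero only the new slot so no L1 state leaks into L2. */
    memset(&vcpus[guest->vcpus], 0, sizeof(*vcpus));
    guest->vcpu = vcpus;

    /* One place to pick up the new env, after the single growth path. */
    l2env = &guest->vcpu[guest->vcpus].env;
    l2env->pvr = l1_pvr;

    guest->vcpu[guest->vcpus].enabled = 1;
    guest->vcpus++;
    return 0;
}
```

Note the slot is zeroed exactly once here, which also removes the redundant
second memset the patch does on l2env.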

> +    /* need to memset to zero otherwise we leak L1 state to L2 */
> +    memset(l2env, 0, sizeof(CPUPPCState));

AFAIKS you just zeroed it above.

> +    /* Copy L1 PVR to L2 */
> +    l2env->spr[SPR_PVR] = env->spr[SPR_PVR];
> +    cpu_ppc_tb_init(l2env, SPAPR_TIMEBASE_FREQ);

I would move this down to the end, because it's setting up the
vcpu...

> +
> +    guest->vcpus++;
> +    assert(vcpuid < guest->vcpus); /* linear vcpuid allocation only */
> +    guest->vcpu[vcpuid].enabled = true;
> +

... This is still allocating the vcpu so move it up.

> +    if (!vcpu_check(guest, vcpuid, false)) {
> +        return H_PARAMETER;
> +    }
> +    return H_SUCCESS;
> +}
> +
>  void spapr_register_nested(void)
>  {
>      spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> @@ -531,6 +640,7 @@ void spapr_register_nested_phyp(void)
>      spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
>      spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
>      spapr_register_hypercall(H_GUEST_CREATE          , h_guest_create);
> +    spapr_register_hypercall(H_GUEST_CREATE_VCPU     , h_guest_create_vcpu);
>  }
>  
>  #else
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 8a6e9ce929..c9f9682a46 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -371,6 +371,7 @@ struct SpaprMachineState {
>  #define H_UNSUPPORTED     -67
>  #define H_OVERLAP         -68
>  #define H_STATE           -75
> +#define H_IN_USE          -77

Why add it here and not in the first patch?

>  #define H_INVALID_ELEMENT_ID               -79
>  #define H_INVALID_ELEMENT_SIZE             -80
>  #define H_INVALID_ELEMENT_VALUE            -81
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index 7841027df8..2e8c6ba1ca 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h
> @@ -199,6 +199,7 @@
>  
>  /* Nested PAPR API macros */
>  #define NESTED_GUEST_MAX 4096
> +#define NESTED_GUEST_VCPU_MAX 2048
>  

PAPR_ prefix?

>  typedef struct SpaprMachineStateNestedGuest {
>      unsigned long vcpus;




* Re: [PATCH RESEND 10/15] ppc: spapr: Initialize the GSB Elements lookup table.
  2023-09-06  4:33 ` [PATCH RESEND 10/15] ppc: spapr: Initialize the GSB Elements lookup table Harsh Prateek Bora
@ 2023-09-07  3:01   ` Nicholas Piggin
  2023-10-04  9:27     ` Harsh Prateek Bora
  0 siblings, 1 reply; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  3:01 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

Might be good to add a common nested: prefix to all patches actually.

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This is a first step towards enabling support for the nested PAPR hcalls
> that provide get/set of the various Guest State Buffer (GSB) elements via
> h_guest_[g|s]et_state hcalls. It identifies the correct get/set callback
> for each of the supported elements; support for the hcalls themselves is
> added in the next patch.

Changelog could use work.

>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  hw/ppc/spapr_hcall.c          |   1 +
>  hw/ppc/spapr_nested.c         | 487 ++++++++++++++++++++++++++++++++++
>  include/hw/ppc/ppc.h          |   2 +
>  include/hw/ppc/spapr_nested.h | 102 +++++++
>  4 files changed, 592 insertions(+)
>
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 9b1f225d4a..ca609cb5a4 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1580,6 +1580,7 @@ static void hypercall_register_types(void)
>      spapr_register_hypercall(KVMPPC_H_UPDATE_DT, h_update_dt);
>  
>      spapr_register_nested();
> +    init_nested();

This is for hcall registration, not general subsystem init I think.
Arguably not sure if it matters, it just looks odd for everything
else to be an hcall except this. I would just add a new init
function.

And actually, now I look closer at this, I would not do your papr
hcall init in the cap apply function. If it is possible to do it
inside spapr_register_nested(), then that function could look at
which caps are enabled and register the appropriate hcalls. Then
there is no need to move this into the cap code.

>  }
>  
>  type_init(hypercall_register_types)
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index e7956685af..6fbb1bcb02 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c

[snip]

My eyes are going square, I'll review this later.

> diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
> index e095c002dc..d7acc28d17 100644
> --- a/include/hw/ppc/ppc.h
> +++ b/include/hw/ppc/ppc.h
> @@ -33,6 +33,8 @@ struct ppc_tb_t {
>      QEMUTimer *decr_timer;
>      /* Hypervisor decrementer management */
>      uint64_t hdecr_next;    /* Tick for next hdecr interrupt  */
> +    /* TB that HDEC should fire and return ctrl back to the Host partition */
> +    uint64_t hdecr_expiry_tb;

Why is this here?

>      QEMUTimer *hdecr_timer;
>      int64_t purr_offset;
>      void *opaque;
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index 2e8c6ba1ca..3c0d6a486e 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h

[snip]

>  
> +struct guest_state_element_type {
> +    uint16_t id;
> +    int size;
> +#define GUEST_STATE_ELEMENT_TYPE_FLAG_GUEST_WIDE 0x1
> +#define GUEST_STATE_ELEMENT_TYPE_FLAG_READ_ONLY  0x2
> +   uint16_t flags;
> +    void *(*location)(SpaprMachineStateNestedGuest *, target_ulong);
> +    size_t offset;
> +    void (*copy)(void *, void *, bool);
> +    uint64_t mask;
> +};

I have to wonder whether this is the best way to go. Having
these indirect function calls and array of "ops" like this
might be limiting the compiler. I wonder if it should just
be done in a switch table, which is how most interpreters
I've seen (which admittedly is not many) seem to do it.
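
For comparison, a minimal self-contained sketch of the switch-table style
(toy state and illustrative element IDs, not the real GSB definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Toy vcpu state with two GSB-style elements (IDs illustrative). */
#define GSB_VCPU_GPR0 0x1000
#define GSB_VCPU_NIA  0x1001

typedef struct {
    uint64_t gpr0;
    uint64_t nia;
} VcpuState;

/*
 * Switch-based get/set: the compiler sees one flat jump table
 * instead of indirect calls through a per-element ops struct.
 */
static int gsb_copy(VcpuState *s, uint16_t id, uint64_t *val, int set)
{
    switch (id) {
    case GSB_VCPU_GPR0:
        if (set) { s->gpr0 = *val; } else { *val = s->gpr0; }
        return 0;
    case GSB_VCPU_NIA:
        if (set) { s->nia = *val; } else { *val = s->nia; }
        return 0;
    default:
        return -1;          /* unknown element id */
    }
}
```

The trade-off is less data-driven flexibility (size/flags/mask checks have to
be expressed per-case or in a side table), but the control flow is direct.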

Thanks,
Nick




* Re: [PATCH RESEND 11/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_[GET|SET]_STATE
  2023-09-06  4:33 ` [PATCH RESEND 11/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_[GET|SET]_STATE Harsh Prateek Bora
@ 2023-09-07  3:30   ` Nicholas Piggin
  2023-10-09  8:23     ` Harsh Prateek Bora
  0 siblings, 1 reply; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  3:30 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> L1 can request to get/set the state of any of the supported Guest State
> Buffer (GSB) elements using the h_guest_[get|set]_state hcalls.
> These hcalls need to do the necessary validation checks for each
> get/set request based on the flags passed and the operation supported.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  hw/ppc/spapr_nested.c         | 267 ++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_nested.h |  22 +++
>  2 files changed, 289 insertions(+)
>
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 6fbb1bcb02..498e7286fa 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -897,6 +897,138 @@ void init_nested(void)
>      }
>  }
>  
> +static struct guest_state_element *guest_state_element_next(
> +    struct guest_state_element *element,
> +    int64_t *len,
> +    int64_t *num_elements)
> +{
> +    uint16_t size;
> +
> +    /* size is of element->value[] only. Not whole guest_state_element */
> +    size = be16_to_cpu(element->size);
> +
> +    if (len) {
> +        *len -= size + offsetof(struct guest_state_element, value);
> +    }
> +
> +    if (num_elements) {
> +        *num_elements -= 1;
> +    }
> +
> +    return (struct guest_state_element *)(element->value + size);
> +}
> +
> +static
> +struct guest_state_element_type *guest_state_element_type_find(uint16_t id)
> +{
> +    int i;
> +
> +    for (i = 0; i < ARRAY_SIZE(guest_state_element_types); i++)
> +        if (id == guest_state_element_types[i].id) {
> +            return &guest_state_element_types[i];
> +        }
> +
> +    return NULL;
> +}
> +
> +static void print_element(struct guest_state_element *element,
> +                          struct guest_state_request *gsr)
> +{
> +    printf("id:0x%04x size:0x%04x %s ",
> +           be16_to_cpu(element->id), be16_to_cpu(element->size),
> +           gsr->flags & GUEST_STATE_REQUEST_SET ? "set" : "get");
> +    printf("buf:0x%016lx ...\n", be64_to_cpu(*(uint64_t *)element->value));

No printfs. These could be GUEST_ERROR qemu logs if anything, make
sure they're relatively well formed messages if you keep them, i.e.,
something a Linux/KVM developer could understand what went wrong.
I.e., no __func__ which is internal to QEMU, use "H_GUEST_GET_STATE"
etc. Ditto for all the rest of the printfs.

> +}
> +
> +static bool guest_state_request_check(struct guest_state_request *gsr)
> +{
> +    int64_t num_elements, len = gsr->len;
> +    struct guest_state_buffer *gsb = gsr->gsb;
> +    struct guest_state_element *element;
> +    struct guest_state_element_type *type;
> +    uint16_t id, size;
> +
> +    /* gsb->num_elements = 0 == 32 bits long */
> +    assert(len >= 4);

I haven't looked closely, but can the guest crash the
host with malformed requests here?

This API is pretty complicated, make sure you sanitize all inputs
carefully, as early as possible, and without too deep a call and
control flow chain from the API entry point.
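
A self-contained sketch of the kind of up-front walk that rejects a
malformed buffer before anything acts on it (toy element layout; the
unsigned arithmetic is ordered so nothing under- or overflows):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element header as it sits in guest memory: BE id, BE size, then payload. */
typedef struct {
    uint8_t id[2];
    uint8_t size[2];
} ElemHdr;

static uint16_t be16(const uint8_t *p) { return (p[0] << 8) | p[1]; }
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | (p[1] << 16) | (p[2] << 8) | p[3];
}

/* Validate the whole buffer before acting on any element. */
static int gsb_validate(const uint8_t *buf, size_t len)
{
    size_t off = 4;
    uint32_t n;

    if (len < 4) {
        return -1;                  /* no room for num_elements */
    }
    n = be32(buf);
    while (n--) {
        uint16_t size;

        if (len - off < sizeof(ElemHdr)) {
            return -1;              /* truncated element header */
        }
        size = be16(buf + off + 2);
        off += sizeof(ElemHdr);
        if (len - off < size) {
            return -1;              /* payload overruns the buffer */
        }
        off += size;
    }
    return 0;
}
```

Every bound is checked before the cursor advances, so a hostile
`num_elements` or element size can only produce a clean error, never an
out-of-bounds read.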


> +
> +    num_elements = be32_to_cpu(gsb->num_elements);
> +    element = gsb->elements;
> +    len -= sizeof(gsb->num_elements);
> +
> +    /* Walk the buffer to validate the length */
> +    while (num_elements) {
> +
> +        id = be16_to_cpu(element->id);
> +        size = be16_to_cpu(element->size);
> +
> +        if (false) {
> +            print_element(element, gsr);
> +        }
> +        /* buffer size too small */
> +        if (len < 0) {
> +            return false;
> +        }
> +
> +        type = guest_state_element_type_find(id);
> +        if (!type) {
> +            printf("%s: Element ID %04x unknown\n", __func__, id);
> +            print_element(element, gsr);
> +            return false;
> +        }
> +
> +        if (id == GSB_HV_VCPU_IGNORED_ID) {
> +            goto next_element;
> +        }
> +
> +        if (size != type->size) {
> +            printf("%s: Size mismatch. Element ID:%04x. Size Exp:%i Got:%i\n",
> +                   __func__, id, type->size, size);
> +            print_element(element, gsr);
> +            return false;
> +        }
> +
> +        if ((type->flags & GUEST_STATE_ELEMENT_TYPE_FLAG_READ_ONLY) &&
> +            (gsr->flags & GUEST_STATE_REQUEST_SET)) {
> +            printf("%s: trying to set a read-only Element ID:%04x.\n",
> +                   __func__, id);
> +            return false;
> +        }
> +
> +        if (type->flags & GUEST_STATE_ELEMENT_TYPE_FLAG_GUEST_WIDE) {
> +            /* guest wide element type */
> +            if (!(gsr->flags & GUEST_STATE_REQUEST_GUEST_WIDE)) {
> +                printf("%s: trying to set a guest wide Element ID:%04x.\n",
> +                       __func__, id);
> +                return false;
> +            }
> +        } else {
> +            /* thread wide element type */
> +            if (gsr->flags & GUEST_STATE_REQUEST_GUEST_WIDE) {
> +                printf("%s: trying to set a thread wide Element ID:%04x.\n",
> +                       __func__, id);
> +                return false;
> +            }
> +        }
> +next_element:
> +        element = guest_state_element_next(element, &len, &num_elements);
> +
> +    }
> +    return true;
> +}
> +
> +static bool is_gsr_invalid(struct guest_state_request *gsr,
> +                                   struct guest_state_element *element,
> +                                   struct guest_state_element_type *type)
> +{
> +    if ((gsr->flags & GUEST_STATE_REQUEST_SET) &&
> +        (*(uint64_t *)(element->value) & ~(type->mask))) {
> +        print_element(element, gsr);
> +        printf("L1 can't set reserved bits (allowed mask: 0x%08lx)\n",
> +               type->mask);
> +        return true;
> +    }
> +    return false;
> +}
>  
>  static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
>                                               SpaprMachineState *spapr,
> @@ -1108,6 +1240,139 @@ static target_ulong h_guest_create_vcpu(PowerPCCPU *cpu,
>      return H_SUCCESS;
>  }
>  
> +static target_ulong getset_state(SpaprMachineStateNestedGuest *guest,
> +                                 uint64_t vcpuid,
> +                                 struct guest_state_request *gsr)
> +{
> +    void *ptr;
> +    uint16_t id;
> +    struct guest_state_element *element;
> +    struct guest_state_element_type *type;
> +    int64_t lenleft, num_elements;
> +
> +    lenleft = gsr->len;
> +
> +    if (!guest_state_request_check(gsr)) {
> +        return H_P3;
> +    }
> +
> +    num_elements = be32_to_cpu(gsr->gsb->num_elements);
> +    element = gsr->gsb->elements;
> +    /* Process the elements */
> +    while (num_elements) {
> +        type = NULL;
> +        /* Debug print before doing anything */
> +        if (false) {
> +            print_element(element, gsr);
> +        }
> +
> +        id = be16_to_cpu(element->id);
> +        if (id == GSB_HV_VCPU_IGNORED_ID) {
> +            goto next_element;
> +        }
> +
> +        type = guest_state_element_type_find(id);
> +        assert(type);
> +
> +        /* Get pointer to guest data to get/set */
> +        if (type->location && type->copy) {
> +            ptr = type->location(guest, vcpuid);
> +            assert(ptr);
> +            if (!~(type->mask) && is_gsr_invalid(gsr, element, type)) {
> +                return H_INVALID_ELEMENT_VALUE;
> +            }
> +            type->copy(ptr + type->offset, element->value,
> +                       gsr->flags & GUEST_STATE_REQUEST_SET ? true : false);
> +        }
> +
> +next_element:
> +        element = guest_state_element_next(element, &lenleft, &num_elements);
> +    }
> +
> +    return H_SUCCESS;
> +}
> +
> +static target_ulong map_and_getset_state(PowerPCCPU *cpu,
> +                                         SpaprMachineStateNestedGuest *guest,
> +                                         uint64_t vcpuid,
> +                                         struct guest_state_request *gsr)
> +{
> +    target_ulong rc;
> +    int64_t lenleft, len;
> +    bool is_write;
> +
> +    assert(gsr->len < (1024 * 1024)); /* sanity check */

Use a #define for this, make sure guest can't crash host.
> +
> +    lenleft = len = gsr->len;

Why lenleft? Can't you just check gsr->len like you do gsr->gsb?

> +    gsr->gsb = address_space_map(CPU(cpu)->as, gsr->buf, (uint64_t *)&len,
> +                                 false, MEMTXATTRS_UNSPECIFIED);

So it's a read-only memory access to gsr->buf? Even for the set?

> +    if (!gsr->gsb) {
> +        rc = H_P3;
> +        goto out1;
> +    }
> +
> +    if (len != lenleft) {
> +        rc = H_P3;
> +        goto out1;
> +    }
> +
> +    rc = getset_state(guest, vcpuid, gsr);
> +
> +out1:
> +    is_write = (rc == H_SUCCESS) ? len : 0;
> +    address_space_unmap(CPU(cpu)->as, gsr->gsb, len, is_write, false);

I don't think this is right, you want to specify the length of memory
you actually accessed, even if there was some error.

Over-specifying I think would be okay. So I think just use len.


> +    return rc;
> +}
> +
> +static target_ulong h_guest_getset_state(PowerPCCPU *cpu,
> +                                         SpaprMachineState *spapr,
> +                                         target_ulong *args,
> +                                         bool set)
> +{
> +    target_ulong flags = args[0];
> +    target_ulong lpid = args[1];
> +    target_ulong vcpuid = args[2];
> +    target_ulong buf = args[3];
> +    target_ulong buflen = args[4];
> +    struct guest_state_request gsr;
> +    SpaprMachineStateNestedGuest *guest;
> +
> +    guest = spapr_get_nested_guest(spapr, lpid);
> +    if (!guest) {
> +        return H_P2;
> +    }
> +    gsr.buf = buf;
> +    gsr.len = buflen;
> +    gsr.flags = 0;

Not a big fan of packaging up some args into a structure,
especially if it's pretty static to a file and there's no need to
carry it around with the data. Do you even need this gsr
thing?

> +    if (flags & H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE) {
> +        gsr.flags |= GUEST_STATE_REQUEST_GUEST_WIDE;
> +    }
> +    if (flags & !H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE) {
> +        return H_PARAMETER; /* flag not supported yet */
> +    }
> +
> +    if (set) {
> +        gsr.flags |= GUEST_STATE_REQUEST_SET;
> +    }
> +    return map_and_getset_state(cpu, guest, vcpuid, &gsr);
> +}
> +
> +static target_ulong h_guest_set_state(PowerPCCPU *cpu,
> +                                      SpaprMachineState *spapr,
> +                                      target_ulong opcode,
> +                                      target_ulong *args)
> +{
> +    return h_guest_getset_state(cpu, spapr, args, true);
> +}
> +
> +static target_ulong h_guest_get_state(PowerPCCPU *cpu,
> +                                      SpaprMachineState *spapr,
> +                                      target_ulong opcode,
> +                                      target_ulong *args)
> +{
> +    return h_guest_getset_state(cpu, spapr, args, false);
> +}
> +
>  void spapr_register_nested(void)
>  {
>      spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> @@ -1122,6 +1387,8 @@ void spapr_register_nested_phyp(void)
>      spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
>      spapr_register_hypercall(H_GUEST_CREATE          , h_guest_create);
>      spapr_register_hypercall(H_GUEST_CREATE_VCPU     , h_guest_create_vcpu);
> +    spapr_register_hypercall(H_GUEST_SET_STATE       , h_guest_set_state);
> +    spapr_register_hypercall(H_GUEST_GET_STATE       , h_guest_get_state);
>  }
>  
>  #else
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index 3c0d6a486e..eaee624b87 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h
> @@ -206,6 +206,9 @@
>  #define HVMASK_MSR            0xEBFFFFFFFFBFEFFF
>  #define HVMASK_HDEXCR         0x00000000FFFFFFFF
>  #define HVMASK_TB_OFFSET      0x000000FFFFFFFFFF
> +#define H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE 0x8000000000000000 /* BE in GSB */
> +#define GUEST_STATE_REQUEST_GUEST_WIDE       0x1
> +#define GUEST_STATE_REQUEST_SET              0x2
>  
>  #define GUEST_STATE_ELEMENT(i, sz, s, f, ptr, c) { \
>      .id = (i),                                     \
> @@ -336,6 +339,25 @@ struct guest_state_element_type {
>      uint64_t mask;
>  };
>  
> +struct guest_state_element {
> +    uint16_t id;   /* Big Endian */
> +    uint16_t size; /* Big Endian */
> +    uint8_t value[]; /* Big Endian (based on size above) */
> +} QEMU_PACKED;
> +
> +struct guest_state_buffer {
> +    uint32_t num_elements; /* Big Endian */
> +    struct guest_state_element elements[];
> +} QEMU_PACKED;

I think it's probably enough to add one comment saying the PAPR
API numbers are all in BE format. This is actually expected of PAPR
so it goes without saying really, but the nested HV API actually had
some things in guest endian format so it's worth calling out.

Actually maybe single out the nested HV structures as different. I
don't know if the upstream code actually handles endian properly...

Thanks,
Nick

> +
> +/* Actual buffer plus some metadata about the request */
> +struct guest_state_request {
> +    struct guest_state_buffer *gsb;
> +    int64_t buf;
> +    int64_t len;
> +    uint16_t flags;
> +};
> +
>  /*
>   * Register state for entering a nested guest with H_ENTER_NESTED.
>   * New member must be added at the end.




* Re: [PATCH RESEND 13/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_RUN_VCPU
  2023-09-06  4:33 ` [PATCH RESEND 13/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_RUN_VCPU Harsh Prateek Bora
@ 2023-09-07  3:55   ` Nicholas Piggin
  2023-10-12 10:23     ` Harsh Prateek Bora
  0 siblings, 1 reply; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  3:55 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> Once the L1 has created a nested guest and its associated VCPU, it can
> request execution of the nested guest by setting its initial state, which
> can be done either using h_guest_set_state or via the input
> buffers along with the call to h_guest_run_vcpu(). On guest exit, L0
> uses output buffers to convey the exit cause to the L1. L0 takes care of
> switching context from L1 to L2 during guest entry and restores L1 context
> on guest exit.
>
> Unlike nested-hv, L2 (nested) guest's entire state is retained with
> L0 after guest exit and restored on next entry in case of nested-papr.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  hw/ppc/spapr_nested.c           | 471 +++++++++++++++++++++++++++-----
>  include/hw/ppc/spapr_cpu_core.h |   7 +-
>  include/hw/ppc/spapr_nested.h   |   6 +
>  3 files changed, 408 insertions(+), 76 deletions(-)
>
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 67e389a762..3605f27115 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -12,6 +12,17 @@
>  #ifdef CONFIG_TCG
>  #define PRTS_MASK      0x1f
>  
> +static void exit_nested_restore_vcpu(PowerPCCPU *cpu, int excp,
> +                                     SpaprMachineStateNestedGuestVcpu *vcpu);
> +static void exit_process_output_buffer(PowerPCCPU *cpu,
> +                                      SpaprMachineStateNestedGuest *guest,
> +                                      target_ulong vcpuid,
> +                                      target_ulong *r3);
> +static void restore_common_regs(CPUPPCState *dst, CPUPPCState *src);
> +static bool vcpu_check(SpaprMachineStateNestedGuest *guest,
> +                       target_ulong vcpuid,
> +                       bool inoutbuf);
> +
>  static target_ulong h_set_ptbl(PowerPCCPU *cpu,
>                                 SpaprMachineState *spapr,
>                                 target_ulong opcode,
> @@ -187,21 +198,21 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
>          return H_PARAMETER;
>      }
>  
> -    spapr_cpu->nested_host_state = g_try_new(struct nested_ppc_state, 1);
> -    if (!spapr_cpu->nested_host_state) {
> +    spapr_cpu->nested_hv_host = g_try_new(struct nested_ppc_state, 1);
> +    if (!spapr_cpu->nested_hv_host) {
>          return H_NO_MEM;
>      }

Don't rename existing thing in the same patch as adding new thing.

>  
>      assert(env->spr[SPR_LPIDR] == 0);
>      assert(env->spr[SPR_DPDES] == 0);
> -    nested_save_state(spapr_cpu->nested_host_state, cpu);
> +    nested_save_state(spapr_cpu->nested_hv_host, cpu);
>  
>      len = sizeof(*regs);
>      regs = address_space_map(CPU(cpu)->as, regs_ptr, &len, false,
>                                  MEMTXATTRS_UNSPECIFIED);
>      if (!regs || len != sizeof(*regs)) {
>          address_space_unmap(CPU(cpu)->as, regs, len, 0, false);
> -        g_free(spapr_cpu->nested_host_state);
> +        g_free(spapr_cpu->nested_hv_host);
>          return H_P2;
>      }
>  
> @@ -276,105 +287,146 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
>  
>  void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>  {
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    CPUState *cs = CPU(cpu);

I think it would be worth seeing how it looks to split these into
original and papr functions rather than trying to mash them together.
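
As a toy sketch of the split (names illustrative), the top level would
reduce to a small dispatcher and each API would keep its own exit path:

```c
#include <assert.h>

enum { NESTED_API_KVM_HV = 1, NESTED_API_PAPR = 2 };

static int hv_exits, papr_exits;

/* Each API's exit logic lives in its own function. */
static void exit_nested_hv(void)   { hv_exits++; }
static void exit_nested_papr(void) { papr_exits++; }

/* Thin top-level dispatcher instead of interleaved if/else branches. */
static void spapr_exit_nested(int api)
{
    switch (api) {
    case NESTED_API_KVM_HV:
        exit_nested_hv();
        break;
    case NESTED_API_PAPR:
        exit_nested_papr();
        break;
    default:
        assert(0);
    }
}
```

That keeps the per-API state handling readable and avoids the repeated
`spapr->nested.api` tests scattered through one long function.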

>      CPUPPCState *env = &cpu->env;
>      SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
> +    target_ulong r3_return = env->excp_vectors[excp]; /* hcall return value */
>      struct nested_ppc_state l2_state;
> -    target_ulong hv_ptr = spapr_cpu->nested_host_state->gpr[4];
> -    target_ulong regs_ptr = spapr_cpu->nested_host_state->gpr[5];
> -    target_ulong hsrr0, hsrr1, hdar, asdr, hdsisr;
> +    target_ulong hv_ptr, regs_ptr;
> +    target_ulong hsrr0 = 0, hsrr1 = 0, hdar = 0, asdr = 0, hdsisr = 0;
>      struct kvmppc_hv_guest_state *hvstate;
>      struct kvmppc_pt_regs *regs;
>      hwaddr len;
> +    target_ulong lpid = 0, vcpuid = 0;
> +    struct SpaprMachineStateNestedGuestVcpu *vcpu = NULL;
> +    struct SpaprMachineStateNestedGuest *guest = NULL;
>  
>      assert(spapr_cpu->in_nested);
> -
> -    nested_save_state(&l2_state, cpu);
> -    hsrr0 = env->spr[SPR_HSRR0];
> -    hsrr1 = env->spr[SPR_HSRR1];
> -    hdar = env->spr[SPR_HDAR];
> -    hdsisr = env->spr[SPR_HDSISR];
> -    asdr = env->spr[SPR_ASDR];
> +    if (spapr->nested.api == NESTED_API_KVM_HV) {
> +        nested_save_state(&l2_state, cpu);
> +        hsrr0 = env->spr[SPR_HSRR0];
> +        hsrr1 = env->spr[SPR_HSRR1];
> +        hdar = env->spr[SPR_HDAR];
> +        hdsisr = env->spr[SPR_HDSISR];
> +        asdr = env->spr[SPR_ASDR];
> +    } else if (spapr->nested.api == NESTED_API_PAPR) {
> +        lpid = spapr_cpu->nested_papr_host->gpr[5];
> +        vcpuid = spapr_cpu->nested_papr_host->gpr[6];
> +        guest = spapr_get_nested_guest(spapr, lpid);
> +        assert(guest);
> +        vcpu_check(guest, vcpuid, false);
> +        vcpu = &guest->vcpu[vcpuid];
> +
> +        exit_nested_restore_vcpu(cpu, excp, vcpu);
> +        /* do the output buffer for run_vcpu*/
> +        exit_process_output_buffer(cpu, guest, vcpuid, &r3_return);
> +    } else
> +        g_assert_not_reached();
>  
>      /*
>       * Switch back to the host environment (including for any error).
>       */
>      assert(env->spr[SPR_LPIDR] != 0);
> -    nested_load_state(cpu, spapr_cpu->nested_host_state);
> -    env->gpr[3] = env->excp_vectors[excp]; /* hcall return value */
>  
> -    cpu_ppc_hdecr_exit(env);
> +    if (spapr->nested.api == NESTED_API_KVM_HV) {
> +        nested_load_state(cpu, spapr_cpu->nested_hv_host);
> +        env->gpr[3] = r3_return;
> +    } else if (spapr->nested.api == NESTED_API_PAPR) {
> +        restore_common_regs(env, spapr_cpu->nested_papr_host);
> +        env->tb_env->tb_offset -= vcpu->tb_offset;
> +        env->gpr[3] = H_SUCCESS;
> +        env->gpr[4] = r3_return;
> +        hreg_compute_hflags(env);
> +        ppc_maybe_interrupt(env);
> +        tlb_flush(cs);
> +        env->reserve_addr = -1; /* Reset the reservation */

There's a bunch of stuff getting duplicated anyway, so it's actually
not clear that this maze of if statements makes it easier to see
that nothing is missed.

> +    }
>  
> -    spapr_cpu->in_nested = false;
> +    cpu_ppc_hdecr_exit(env);
>  
> -    g_free(spapr_cpu->nested_host_state);
> -    spapr_cpu->nested_host_state = NULL;
> +    if (spapr->nested.api == NESTED_API_KVM_HV) {
> +        hv_ptr = spapr_cpu->nested_hv_host->gpr[4];
> +        regs_ptr = spapr_cpu->nested_hv_host->gpr[5];
> +
> +        len = sizeof(*hvstate);
> +        hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
> +                                    MEMTXATTRS_UNSPECIFIED);
> +        if (len != sizeof(*hvstate)) {
> +            address_space_unmap(CPU(cpu)->as, hvstate, len, 0, true);
> +            env->gpr[3] = H_PARAMETER;
> +            return;
> +        }
>  
> -    len = sizeof(*hvstate);
> -    hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
> -                                MEMTXATTRS_UNSPECIFIED);
> -    if (len != sizeof(*hvstate)) {
> -        address_space_unmap(CPU(cpu)->as, hvstate, len, 0, true);
> -        env->gpr[3] = H_PARAMETER;
> -        return;
> -    }
> +        hvstate->cfar = l2_state.cfar;
> +        hvstate->lpcr = l2_state.lpcr;
> +        hvstate->pcr = l2_state.pcr;
> +        hvstate->dpdes = l2_state.dpdes;
> +        hvstate->hfscr = l2_state.hfscr;
> +
> +        if (excp == POWERPC_EXCP_HDSI) {
> +            hvstate->hdar = hdar;
> +            hvstate->hdsisr = hdsisr;
> +            hvstate->asdr = asdr;
> +        } else if (excp == POWERPC_EXCP_HISI) {
> +            hvstate->asdr = asdr;
> +        }
>  
> -    hvstate->cfar = l2_state.cfar;
> -    hvstate->lpcr = l2_state.lpcr;
> -    hvstate->pcr = l2_state.pcr;
> -    hvstate->dpdes = l2_state.dpdes;
> -    hvstate->hfscr = l2_state.hfscr;
> +        /* HEIR should be implemented for HV mode and saved here. */
> +        hvstate->srr0 = l2_state.srr0;
> +        hvstate->srr1 = l2_state.srr1;
> +        hvstate->sprg[0] = l2_state.sprg0;
> +        hvstate->sprg[1] = l2_state.sprg1;
> +        hvstate->sprg[2] = l2_state.sprg2;
> +        hvstate->sprg[3] = l2_state.sprg3;
> +        hvstate->pidr = l2_state.pidr;
> +        hvstate->ppr = l2_state.ppr;
> +
> +        /* Is it okay to specify write len larger than actual data written? */
> +        address_space_unmap(CPU(cpu)->as, hvstate, len, len, true);
> +
> +        len = sizeof(*regs);
> +        regs = address_space_map(CPU(cpu)->as, regs_ptr, &len, true,
> +                                    MEMTXATTRS_UNSPECIFIED);
> +        if (!regs || len != sizeof(*regs)) {
> +            address_space_unmap(CPU(cpu)->as, regs, len, 0, true);
> +            env->gpr[3] = H_P2;
> +            return;
> +        }
>  
> -    if (excp == POWERPC_EXCP_HDSI) {
> -        hvstate->hdar = hdar;
> -        hvstate->hdsisr = hdsisr;
> -        hvstate->asdr = asdr;
> -    } else if (excp == POWERPC_EXCP_HISI) {
> -        hvstate->asdr = asdr;
> -    }
> +        len = sizeof(env->gpr);
> +        assert(len == sizeof(regs->gpr));
> +        memcpy(regs->gpr, l2_state.gpr, len);
>  
> -    /* HEIR should be implemented for HV mode and saved here. */
> -    hvstate->srr0 = l2_state.srr0;
> -    hvstate->srr1 = l2_state.srr1;
> -    hvstate->sprg[0] = l2_state.sprg0;
> -    hvstate->sprg[1] = l2_state.sprg1;
> -    hvstate->sprg[2] = l2_state.sprg2;
> -    hvstate->sprg[3] = l2_state.sprg3;
> -    hvstate->pidr = l2_state.pidr;
> -    hvstate->ppr = l2_state.ppr;
> +        regs->link = l2_state.lr;
> +        regs->ctr = l2_state.ctr;
> +        regs->xer = l2_state.xer;
> +        regs->ccr = l2_state.cr;
>  
> -    /* Is it okay to specify write length larger than actual data written? */
> -    address_space_unmap(CPU(cpu)->as, hvstate, len, len, true);
> +        if (excp == POWERPC_EXCP_MCHECK ||
> +            excp == POWERPC_EXCP_RESET ||
> +            excp == POWERPC_EXCP_SYSCALL) {
> +            regs->nip = l2_state.srr0;
> +            regs->msr = l2_state.srr1 & env->msr_mask;
> +        } else {
> +            regs->nip = hsrr0;
> +            regs->msr = hsrr1 & env->msr_mask;
> +        }
>  
> -    len = sizeof(*regs);
> -    regs = address_space_map(CPU(cpu)->as, regs_ptr, &len, true,
> -                                MEMTXATTRS_UNSPECIFIED);
> -    if (!regs || len != sizeof(*regs)) {
> -        address_space_unmap(CPU(cpu)->as, regs, len, 0, true);
> -        env->gpr[3] = H_P2;
> -        return;
> +        /* Is it okay to specify write len larger than actual data written? */
> +        address_space_unmap(CPU(cpu)->as, regs, len, len, true);
>      }
>  
> -    len = sizeof(env->gpr);
> -    assert(len == sizeof(regs->gpr));
> -    memcpy(regs->gpr, l2_state.gpr, len);
> -
> -    regs->link = l2_state.lr;
> -    regs->ctr = l2_state.ctr;
> -    regs->xer = l2_state.xer;
> -    regs->ccr = l2_state.cr;
> +    spapr_cpu->in_nested = false;
>  
> -    if (excp == POWERPC_EXCP_MCHECK ||
> -        excp == POWERPC_EXCP_RESET ||
> -        excp == POWERPC_EXCP_SYSCALL) {
> -        regs->nip = l2_state.srr0;
> -        regs->msr = l2_state.srr1 & env->msr_mask;
> +    if (spapr->nested.api == NESTED_API_KVM_HV) {
> +        g_free(spapr_cpu->nested_hv_host);
> +        spapr_cpu->nested_hv_host = NULL;
>      } else {
> -        regs->nip = hsrr0;
> -        regs->msr = hsrr1 & env->msr_mask;
> +        g_free(spapr_cpu->nested_papr_host);
> +        spapr_cpu->nested_papr_host = NULL;
>      }
>  
> -    /* Is it okay to specify write length larger than actual data written? */
> -    address_space_unmap(CPU(cpu)->as, regs, len, len, true);
>  }
>  
>  SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
> @@ -1372,6 +1424,274 @@ static target_ulong h_guest_get_state(PowerPCCPU *cpu,
>      return h_guest_getset_state(cpu, spapr, args, false);
>  }
>  
> +static void restore_common_regs(CPUPPCState *dst, CPUPPCState *src)
> +{
> +    memcpy(dst->gpr, src->gpr, sizeof(dst->gpr));
> +    memcpy(dst->crf, src->crf, sizeof(dst->crf));
> +    memcpy(dst->vsr, src->vsr, sizeof(dst->vsr));
> +    dst->nip = src->nip;
> +    dst->msr = src->msr;
> +    dst->lr  = src->lr;
> +    dst->ctr = src->ctr;
> +    dst->cfar = src->cfar;
> +    cpu_write_xer(dst, src->xer);
> +    ppc_store_vscr(dst, ppc_get_vscr(src));
> +    ppc_store_fpscr(dst, src->fpscr);
> +    memcpy(dst->spr, src->spr, sizeof(dst->spr));
> +}
> +
> +static void restore_l2_state(PowerPCCPU *cpu,
> +                             CPUPPCState *env,
> +                             struct SpaprMachineStateNestedGuestVcpu *vcpu,
> +                             target_ulong now)
> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +    target_ulong lpcr, lpcr_mask, hdec;
> +    lpcr_mask = LPCR_DPFD | LPCR_ILE | LPCR_AIL | LPCR_LD | LPCR_MER;
> +
> +    if (spapr->nested.api == NESTED_API_PAPR) {
> +        assert(vcpu);
> +        assert(sizeof(env->gpr) == sizeof(vcpu->env.gpr));
> +        restore_common_regs(env, &vcpu->env);
> +        lpcr = (env->spr[SPR_LPCR] & ~lpcr_mask) |
> +               (vcpu->env.spr[SPR_LPCR] & lpcr_mask);
> +        lpcr |= LPCR_HR | LPCR_UPRT | LPCR_GTSE | LPCR_HVICE | LPCR_HDICE;
> +        lpcr &= ~LPCR_LPES0;
> +        env->spr[SPR_LPCR] = lpcr & pcc->lpcr_mask;
> +
> +        hdec = vcpu->env.tb_env->hdecr_expiry_tb - now;
> +        cpu_ppc_store_decr(env, vcpu->dec_expiry_tb - now);
> +        cpu_ppc_hdecr_init(env);
> +        cpu_ppc_store_hdecr(env, hdec);
> +
> +        env->tb_env->tb_offset += vcpu->tb_offset;
> +    }
> +}
> +
> +static void enter_nested(PowerPCCPU *cpu,
> +                         uint64_t lpid,
> +                         struct SpaprMachineStateNestedGuestVcpu *vcpu)

That's not good since we have h_enter_nested for the old API. We really
have to be a bit more consistent with using papr_ for naming, I think.
And you don't have to call this enter_nested anyway; papr_run_vcpu is
okay too since that matches the API call. You can just add a comment /*
Enter the L2 VCPU, equivalent to h_enter_nested */ if you think that's
needed.

> +{
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    CPUState *cs = CPU(cpu);
> +    CPUPPCState *env = &cpu->env;
> +    SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
> +    target_ulong now = cpu_ppc_load_tbl(env);
> +
> +    assert(env->spr[SPR_LPIDR] == 0);
> +    assert(spapr->nested.api); /* ensure API version is initialized */
> +    spapr_cpu->nested_papr_host = g_try_new(CPUPPCState, 1);
> +    assert(spapr_cpu->nested_papr_host);
> +    memcpy(spapr_cpu->nested_papr_host, env, sizeof(CPUPPCState));
> +
> +    restore_l2_state(cpu, env, vcpu, now);
> +    env->spr[SPR_LPIDR] = lpid; /* post restore_l2_state */
> +
> +    spapr_cpu->in_nested = true;
> +
> +    hreg_compute_hflags(env);
> +    ppc_maybe_interrupt(env);
> +    tlb_flush(cs);
> +    env->reserve_addr = -1; /* Reset the reservation */

       ^^^
       This is the kind of block that could be pulled into a
       common helper function. There's 3-4 copies now?
> +
> +}
> +
> +static target_ulong h_guest_run_vcpu(PowerPCCPU *cpu,
> +                                     SpaprMachineState *spapr,
> +                                     target_ulong opcode,
> +                                     target_ulong *args)
> +{
> +    CPUPPCState *env = &cpu->env;
> +    target_ulong flags = args[0];
> +    target_ulong lpid = args[1];
> +    target_ulong vcpuid = args[2];
> +    struct SpaprMachineStateNestedGuestVcpu *vcpu;
> +    struct guest_state_request gsr;
> +    SpaprMachineStateNestedGuest *guest;
> +
> +    if (flags) /* don't handle any flags for now */
> +        return H_PARAMETER;
> +
> +    guest = spapr_get_nested_guest(spapr, lpid);
> +    if (!guest) {
> +        return H_P2;
> +    }
> +    if (!vcpu_check(guest, vcpuid, true)) {
> +        return H_P3;
> +    }
> +
> +    if (guest->parttbl[0] == 0) {
> +        /* At least need a partition scoped radix tree */
> +        return H_NOT_AVAILABLE;
> +    }
> +
> +    vcpu = &guest->vcpu[vcpuid];
> +
> +    /* Read run_vcpu input buffer to update state */
> +    gsr.buf = vcpu->runbufin.addr;
> +    gsr.len = vcpu->runbufin.size;
> +    gsr.flags = GUEST_STATE_REQUEST_SET; /* Thread wide + writing */
> +    if (!map_and_getset_state(cpu, guest, vcpuid, &gsr)) {
> +        enter_nested(cpu, lpid, vcpu);
> +    }
> +
> +    return env->gpr[3];
> +}
> +
> +struct run_vcpu_exit_cause run_vcpu_exit_causes[] = {
> +    { .nia = 0x980,
> +      .count = 0,
> +    },
> +    { .nia = 0xc00,
> +      .count = 10,
> +      .ids = {
> +          GSB_VCPU_GPR3,
> +          GSB_VCPU_GPR4,
> +          GSB_VCPU_GPR5,
> +          GSB_VCPU_GPR6,
> +          GSB_VCPU_GPR7,
> +          GSB_VCPU_GPR8,
> +          GSB_VCPU_GPR9,
> +          GSB_VCPU_GPR10,
> +          GSB_VCPU_GPR11,
> +          GSB_VCPU_GPR12,
> +      },
> +    },
> +    { .nia = 0xe00,
> +      .count = 5,
> +      .ids = {
> +          GSB_VCPU_SPR_HDAR,
> +          GSB_VCPU_SPR_HDSISR,
> +          GSB_VCPU_SPR_ASDR,
> +          GSB_VCPU_SPR_NIA,
> +          GSB_VCPU_SPR_MSR,
> +      },
> +    },
> +    { .nia = 0xe20,
> +      .count = 4,
> +      .ids = {
> +          GSB_VCPU_SPR_HDAR,
> +          GSB_VCPU_SPR_ASDR,
> +          GSB_VCPU_SPR_NIA,
> +          GSB_VCPU_SPR_MSR,
> +      },
> +    },
> +    { .nia = 0xe40,
> +      .count = 3,
> +      .ids = {
> +          GSB_VCPU_SPR_HEIR,
> +          GSB_VCPU_SPR_NIA,
> +          GSB_VCPU_SPR_MSR,
> +      },
> +    },
> +    { .nia = 0xea0,
> +      .count = 0,
> +    },
> +    { .nia = 0xf80,
> +      .count = 3,
> +      .ids = {
> +          GSB_VCPU_SPR_HFSCR,
> +          GSB_VCPU_SPR_NIA,
> +          GSB_VCPU_SPR_MSR,
> +      },
> +    },
> +};
> +
> +static struct run_vcpu_exit_cause *find_exit_cause(uint64_t srr0)
> +{
> +    int i;
> +
> +    for (i = 0; i < ARRAY_SIZE(run_vcpu_exit_causes); i++)
> +        if (srr0 == run_vcpu_exit_causes[i].nia) {
> +            return &run_vcpu_exit_causes[i];
> +        }
> +
> +    printf("%s: srr0:0x%016lx\n", __func__, srr0);
> +    return NULL;
> +}

This is another weird control flow thing, and it's unclear why it's
used here. Compare the version below: 52 lines vs 76, no new struct,
and simpler for the compiler to understand and optimise.

int get_exit_ids(uint64_t srr0, uint16_t ids[16])
{
    int nr;

    switch (srr0) {
    case 0xc00:
        nr = 10;
        ids[0] = GSP_VCPU_GPR3;
        ids[1] = GSP_VCPU_GPR4;
        ids[2] = GSP_VCPU_GPR5;
        ids[3] = GSP_VCPU_GPR6;
        ids[4] = GSP_VCPU_GPR7;
        ids[5] = GSP_VCPU_GPR8;
        ids[6] = GSP_VCPU_GPR9;
        ids[7] = GSP_VCPU_GPR10;
        ids[8] = GSP_VCPU_GPR11;
        ids[9] = GSP_VCPU_GPR12;
        break;
    case 0xe00:
        nr = 5;
        ids[0] = GSP_VCPU_HDAR;
        ids[1] = GSP_VCPU_HDSISR;
        ids[2] = GSP_VCPU_ASDR;
        ids[3] = GSP_VCPU_NIA;
        ids[4] = GSP_VCPU_MSR;
        break;
    case 0xe20:
        nr = 4;
        ids[0] = GSP_VCPU_HDAR;
        ids[1] = GSP_VCPU_ASDR;
        ids[2] = GSP_VCPU_NIA;
        ids[3] = GSP_VCPU_MSR;
        break;
    case 0xe40:
        nr = 3;
        ids[0] = GSP_VCPU_HEIR;
        ids[1] = GSP_VCPU_NIA;
        ids[2] = GSP_VCPU_MSR;
        break;
    case 0xf80:
        nr = 3;
        ids[0] = GSP_VCPU_HFSCR;
        ids[1] = GSP_VCPU_NIA;
        ids[2] = GSP_VCPU_MSR;
        break;
    default:
        nr = 0;
        break;
    }

    return nr;
}

> +
> +static void exit_nested_restore_vcpu(PowerPCCPU *cpu, int excp,
> +                                     SpaprMachineStateNestedGuestVcpu *vcpu)
> +{
> +    CPUPPCState *env = &cpu->env;
> +    target_ulong now, hdar, hdsisr, asdr;
> +
> +    assert(sizeof(env->gpr) == sizeof(vcpu->env.gpr)); /* sanity check */
> +
> +    now = cpu_ppc_load_tbl(env); /* L2 timebase */
> +    now -= vcpu->tb_offset; /* L1 timebase */
> +    vcpu->dec_expiry_tb = now - cpu_ppc_load_decr(env);
> +    /* backup hdar, hdsisr, asdr if reqd later below */
> +    hdar   = vcpu->env.spr[SPR_HDAR];
> +    hdsisr = vcpu->env.spr[SPR_HDSISR];
> +    asdr   = vcpu->env.spr[SPR_ASDR];
> +
> +    restore_common_regs(&vcpu->env, env);
> +
> +    if (excp == POWERPC_EXCP_MCHECK ||
> +        excp == POWERPC_EXCP_RESET ||
> +        excp == POWERPC_EXCP_SYSCALL) {
> +        vcpu->env.nip = env->spr[SPR_SRR0];
> +        vcpu->env.msr = env->spr[SPR_SRR1] & env->msr_mask;
> +    } else {
> +        vcpu->env.nip = env->spr[SPR_HSRR0];
> +        vcpu->env.msr = env->spr[SPR_HSRR1] & env->msr_mask;
> +    }
> +
> +    /* hdar, hdsisr, asdr should be retained unless certain exceptions */
> +    if ((excp != POWERPC_EXCP_HDSI) && (excp != POWERPC_EXCP_HISI)) {
> +        vcpu->env.spr[SPR_ASDR] = asdr;
> +    } else if (excp != POWERPC_EXCP_HDSI) {
> +        vcpu->env.spr[SPR_HDAR]   = hdar;
> +        vcpu->env.spr[SPR_HDSISR] = hdsisr;
> +    }
> +}
> +
> +static void exit_process_output_buffer(PowerPCCPU *cpu,
> +                                      SpaprMachineStateNestedGuest *guest,
> +                                      target_ulong vcpuid,
> +                                      target_ulong *r3)
> +{
> +    SpaprMachineStateNestedGuestVcpu *vcpu = &guest->vcpu[vcpuid];
> +    struct guest_state_request gsr;
> +    struct guest_state_buffer *gsb;
> +    struct guest_state_element *element;
> +    struct guest_state_element_type *type;
> +    struct run_vcpu_exit_cause *exit_cause;
> +    hwaddr len;
> +    int i;
> +
> +    len = vcpu->runbufout.size;
> +    gsb = address_space_map(CPU(cpu)->as, vcpu->runbufout.addr, &len, true,
> +                            MEMTXATTRS_UNSPECIFIED);
> +    if (!gsb || len != vcpu->runbufout.size) {
> +        address_space_unmap(CPU(cpu)->as, gsb, len, 0, true);
> +        *r3 = H_P2;
> +        return;
> +    }
> +
> +    exit_cause = find_exit_cause(*r3);
> +
> +    /* Create a buffer of elements to send back */
> +    gsb->num_elements = cpu_to_be32(exit_cause->count);
> +    element = gsb->elements;
> +    for (i = 0; i < exit_cause->count; i++) {
> +        type = guest_state_element_type_find(exit_cause->ids[i]);
> +        assert(type);
> +        element->id = cpu_to_be16(exit_cause->ids[i]);
> +        element->size = cpu_to_be16(type->size);
> +        element = guest_state_element_next(element, NULL, NULL);
> +    }
> +    gsr.gsb = gsb;
> +    gsr.len = VCPU_OUT_BUF_MIN_SZ;
> +    gsr.flags = 0; /* get + never guest wide */
> +    getset_state(guest, vcpuid, &gsr);
> +
> +    address_space_unmap(CPU(cpu)->as, gsb, len, len, true);
> +    return;
> +}
> +
>  void spapr_register_nested(void)
>  {
>      spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> @@ -1388,6 +1708,7 @@ void spapr_register_nested_phyp(void)
>      spapr_register_hypercall(H_GUEST_CREATE_VCPU     , h_guest_create_vcpu);
>      spapr_register_hypercall(H_GUEST_SET_STATE       , h_guest_set_state);
>      spapr_register_hypercall(H_GUEST_GET_STATE       , h_guest_get_state);
> +    spapr_register_hypercall(H_GUEST_RUN_VCPU        , h_guest_run_vcpu);
>  }
>  
>  #else
> diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
> index 69a52e39b8..09855f69aa 100644
> --- a/include/hw/ppc/spapr_cpu_core.h
> +++ b/include/hw/ppc/spapr_cpu_core.h
> @@ -53,7 +53,12 @@ typedef struct SpaprCpuState {
>  
>      /* Fields for nested-HV support */
>      bool in_nested; /* true while the L2 is executing */
> -    struct nested_ppc_state *nested_host_state; /* holds the L1 state while L2 executes */
> +    union {
> +        /* nested-hv needs minimal set of regs as L1 stores L2 state */
> +        struct nested_ppc_state *nested_hv_host;
> +        /* In nested-papr, L0 retains entire L2 state, so keep it all safe. */
> +        CPUPPCState *nested_papr_host;
> +    };

This IMO still shouldn't be a CPUPPCState, but an extended
nested_ppc_state. Differences between the nested APIs should not be
here either, but inside the nested_ppc_state structure.

Thanks,
Nick

>  } SpaprCpuState;
>  
>  static inline SpaprCpuState *spapr_cpu_state(PowerPCCPU *cpu)
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index eaee624b87..ca5d28c06e 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h
> @@ -358,6 +358,12 @@ struct guest_state_request {
>      uint16_t flags;
>  };
>  
> +struct run_vcpu_exit_cause {
> +    uint64_t nia;
> +    uint64_t count;
> +    uint16_t ids[10]; /* max ids supported by run_vcpu_exit_causes */
> +};
> +
>  /*
>   * Register state for entering a nested guest with H_ENTER_NESTED.
>   * New member must be added at the end.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND 15/15] ppc: spapr: Document Nested PAPR API
  2023-09-06  4:33 ` [PATCH RESEND 15/15] ppc: spapr: Document Nested PAPR API Harsh Prateek Bora
@ 2023-09-07  3:56   ` Nicholas Piggin
  2023-10-12 10:25     ` Harsh Prateek Bora
  0 siblings, 1 reply; 47+ messages in thread
From: Nicholas Piggin @ 2023-09-07  3:56 UTC (permalink / raw)
  To: Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> Adding initial documentation about the Nested PAPR API to describe the set
> of APIs and their usage. Also describes the Guest State Buffer elements
> and their format, which is used between L0/L1 to communicate L2 state.

I would move this patch first (well, behind any cleanup and preparation
patches, but before any new API additions).

Thanks,
Nick

>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  docs/devel/nested-papr.txt | 500 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 500 insertions(+)
>  create mode 100644 docs/devel/nested-papr.txt
>
> diff --git a/docs/devel/nested-papr.txt b/docs/devel/nested-papr.txt
> new file mode 100644
> index 0000000000..c5c2ba7e50
> --- /dev/null
> +++ b/docs/devel/nested-papr.txt
> @@ -0,0 +1,500 @@
> +Nested PAPR API (aka KVM on PowerVM)
> +====================================
> +
> +This API aims to provide support for nested virtualization with
> +KVM on PowerVM. The existing support for nested KVM on PowerNV was
> +introduced with the cap-nested-hv option; with a slight design change,
> +a new cap-nested-papr option enables this on papr/pseries. eg:
> +
> +  qemu-system-ppc64 -cpu POWER10 -machine pseries,cap-nested-papr=true ...
> +
> +Work by:
> +    Michael Neuling <mikey@neuling.org>
> +    Vaibhav Jain <vaibhav@linux.ibm.com>
> +    Jordan Niethe <jniethe5@gmail.com>
> +    Harsh Prateek Bora <harshpb@linux.ibm.com>
> +    Shivaprasad G Bhat <sbhat@linux.ibm.com>
> +    Kautuk Consul <kconsul@linux.vnet.ibm.com>
> +
> +Below taken from the kernel documentation:
> +
> +Introduction
> +============
> +
> +This document explains how a guest operating system can act as a
> +hypervisor and run nested guests through the use of hypercalls, if the
> +hypervisor has implemented them. The terms L0, L1, and L2 are used to
> +refer to different software entities. L0 is the hypervisor mode entity
> +that would normally be called the "host" or "hypervisor". L1 is a
> +guest virtual machine that is directly run under L0 and is initiated
> +and controlled by L0. L2 is a guest virtual machine that is initiated
> +and controlled by L1 acting as a hypervisor. A significant design change
> +wrt the existing API is that the entire L2 state is now maintained within the L0.
> +
> +Existing Nested-HV API
> +======================
> +
> +Linux/KVM has had support for nesting as an L0 or L1 since 2018.
> +
> +The L0 code was added::
> +
> +   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
> +   Author: Paul Mackerras <paulus@ozlabs.org>
> +   Date:   Mon Oct 8 16:31:03 2018 +1100
> +   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization
> +
> +The L1 code was added::
> +
> +   commit 360cae313702cdd0b90f82c261a8302fecef030a
> +   Author: Paul Mackerras <paulus@ozlabs.org>
> +   Date:   Mon Oct 8 16:31:04 2018 +1100
> +   KVM: PPC: Book3S HV: Nested guest entry via hypercall
> +
> +This API works primarily using a single hcall, h_enter_nested(). This
> +call is made by the L1 to tell the L0 to start an L2 vCPU with the given
> +state. The L0 then starts this L2 and runs until an L2 exit condition
> +is reached. Once the L2 exits, the state of the L2 is given back to
> +the L1 by the L0. The full L2 vCPU state is always transferred from
> +and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
> +vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
> +-> L1 exit).
> +
> +The only state kept by the L0 is the partition table. The L1 registers
> +its partition table using the h_set_partition_table() hcall. All
> +other state held by the L0 about the L2s is cached state (such as
> +shadow page tables).
> +
> +The L1 may run any L2 or vCPU without first informing the L0. It
> +simply starts the vCPU using h_enter_nested(). The creation of L2s and
> +vCPUs is done implicitly whenever h_enter_nested() is called.
> +
> +In this document, we call this existing API the v1 API.
> +
> +New PAPR API
> +===============
> +
> +The new PAPR API changes from the v1 API such that creating the L2 and
> +associated vCPUs is explicit. In this document, we call this the v2
> +API.
> +
> +h_enter_nested() is replaced with H_GUEST_VCPU_RUN().  Before this can
> +be called, the L1 must explicitly create the L2 using h_guest_create(),
> +with any associated vCPUs created with h_guest_create_vCPU(). Getting
> +and setting vCPU state can also be performed using the h_guest_{g|s}et
> +hcalls.
> +
> +The basic execution flow for an L1 to create an L2, run it, and
> +delete it is:
> +
> +- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
> +  (normally at L1 boot time).
> +
> +- L1 requests the L0 to create an L2 with H_GUEST_CREATE() and receives a token
> +
> +- L1 requests the L0 to create an L2 vCPU with H_GUEST_CREATE_VCPU()
> +
> +- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET() hcall
> +
> +- L1 requests the L0 to run the vCPU using H_GUEST_RUN_VCPU() hcall
> +
> +- L1 deletes L2 with H_GUEST_DELETE()
> +
> +More details of the individual hcalls follow:
> +
> +HCALL Details
> +=============
> +
> +This documentation is provided to give an overall understanding of the
> +API. It doesn't aim to provide the full details required to implement
> +an L1 or L0. Refer to the latest PAPR spec for more details.
> +
> +All these HCALLs are made by the L1 to the L0.
> +
> +H_GUEST_GET_CAPABILITIES()
> +--------------------------
> +
> +This is called to get the capabilities of the L0 nested
> +hypervisor. This includes capabilities such as the CPU versions (eg
> +POWER9, POWER10) that are supported as L2s.
> +
> +H_GUEST_SET_CAPABILITIES()
> +--------------------------
> +
> +This is called to inform the L0 of the capabilities of the L1
> +hypervisor. The set of flags passed here is the same as
> +H_GUEST_GET_CAPABILITIES().
> +
> +Typically, GET will be called first and then SET will be called with a
> +subset of the flags returned from GET. This process allows the L0 and
> +L1 to negotiate an agreed set of capabilities.
> +
> +H_GUEST_CREATE()
> +----------------
> +
> +This is called to create an L2. Returned is the ID of the L2 created
> +(similar to an LPID), which can be used on subsequent HCALLs to
> +identify the L2.
> +
> +H_GUEST_CREATE_VCPU()
> +---------------------
> +
> +This is called to create a vCPU associated with an L2. The L2 id
> +(returned from H_GUEST_CREATE()) should be passed in. Also passed in
> +is a unique (for this L2) vCPUid. This vCPUid is allocated by the
> +L1.
> +
> +H_GUEST_SET_STATE()
> +-------------------
> +
> +This is called to set L2 wide or vCPU specific L2 state. This info is
> +passed via the Guest State Buffer (GSB), details below.
> +
> +This can set either L2 wide or vCPU specific information. Examples of
> +L2 wide state are the timebase offset or process scoped page table
> +info. Examples of vCPU specific state are GPRs or VSRs. A bit in the flags
> +parameter specifies if this call is L2 wide or vCPU specific and the
> +IDs in the GSB must match this.
> +
> +The L1 provides a pointer to the GSB as a parameter to this call. Also
> +provided is the L2 and vCPU IDs associated with the state to set.
> +
> +The L1 writes all values in the GSB and the L0 only reads the GSB for
> +this call.
> +
> +H_GUEST_GET_STATE()
> +-------------------
> +
> +This is called to get state associated with an L2 or L2 vCPU. This info
> +is passed via the GSB (details below).
> +
> +This can get either L2 wide or vCPU specific information. Examples of
> +L2 wide state are the timebase offset or process scoped page table
> +info. Examples of vCPU specific state are GPRs or VSRs. A bit in the flags
> +parameter specifies if this call is L2 wide or vCPU specific and the
> +IDs in the GSB must match this.
> +
> +The L1 provides a pointer to the GSB as a parameter to this call. Also
> +provided is the L2 and vCPU IDs associated with the state to get.
> +
> +The L1 writes only the IDs and sizes in the GSB.  L0 writes the
> +associated values for each ID in the GSB.
> +
> +H_GUEST_RUN_VCPU()
> +------------------
> +
> +This is called to run an L2 vCPU. The L2 and vCPU IDs are passed in as
> +parameters. The vCPU runs with the state set previously using
> +H_GUEST_SET_STATE(). When the L2 exits, the L1 will resume from this
> +hcall.
> +
> +This hcall also has associated input and output GSBs. Unlike
> +H_GUEST_{S,G}ET_STATE(), these GSB pointers are not passed in as
> +parameters to the hcall (this was done in the interest of
> +performance). The locations of these GSBs must be preregistered using
> +the H_GUEST_SET_STATE() call with ID 0x0c00 and 0x0c01 (see table later
> +below).
> +
> +The input GSB may contain only VCPU wide elements to be set. This GSB
> +may also contain zero elements (ie 0 in the first 4 bytes of the GSB)
> +if nothing needs to be set.
> +
> +On exit from the hcall, the output buffer is filled with elements
> +determined by the L0. The reason for the exit is contained in GPR4 (ie
> +NIP is put in GPR4).  The elements returned depend on the exit
> +type. For example, if the exit reason is the L2 doing a hcall (GPR4 =
> +0xc00), then GPR3-12 are provided in the output GSB as this is the
> +state likely needed to service the hcall. If additional state is
> +needed, H_GUEST_GET_STATE() may be called by the L1.
> +
> +To synthesize interrupts in the L2, when calling H_GUEST_RUN_VCPU()
> +the L1 may set a flag (as a hcall parameter) and the L0 will
> +synthesize the interrupt in the L2. Alternatively, the L1 may
> +synthesize the interrupt itself using H_GUEST_SET_STATE() or the
> +H_GUEST_RUN_VCPU() input GSB to set the state appropriately.
> +
> +H_GUEST_DELETE()
> +----------------
> +
> +This is called to delete an L2. All associated vCPUs are also
> +deleted. No specific vCPU delete call is provided.
> +
> +A flag may be provided to delete all guests. This is used to reset the
> +L0 in the case of kdump/kexec.
> +
> +Guest State Buffer (GSB)
> +========================
> +
> +The Guest State Buffer (GSB) is the main method of communicating state
> +about the L2 between the L1 and L0 via H_GUEST_{G,S}ET_STATE() and
> +H_GUEST_RUN_VCPU() calls.
> +
> +State may be associated with a whole L2 (e.g. timebase offset) or a
> +specific L2 vCPU (e.g. GPR state). Only L2 vCPU state may be set by
> +H_GUEST_RUN_VCPU().
> +
> +All data in the GSB is big endian (as is standard in PAPR).
> +
> +The Guest state buffer has a header which gives the number of
> +elements, followed by the GSB elements themselves.
> +
> +GSB header:
> +
> ++----------+----------+-------------------------------------------+
> +|  Offset  |  Size    |  Purpose                                  |
> +|  Bytes   |  Bytes   |                                           |
> ++==========+==========+===========================================+
> +|    0     |    4     |  Number of elements                       |
> ++----------+----------+-------------------------------------------+
> +|    4     |          |  Guest state buffer elements              |
> ++----------+----------+-------------------------------------------+
> +
> +GSB element:
> +
> ++----------+----------+-------------------------------------------+
> +|  Offset  |  Size    |  Purpose                                  |
> +|  Bytes   |  Bytes   |                                           |
> ++==========+==========+===========================================+
> +|    0     |    2     |  ID                                       |
> ++----------+----------+-------------------------------------------+
> +|    2     |    2     |  Size of Value                            |
> ++----------+----------+-------------------------------------------+
> +|    4     | As above |  Value                                    |
> ++----------+----------+-------------------------------------------+
> +
> +The ID in the GSB element specifies what is to be set. This includes
> +architected state like GPRs, VSRs and SPRs, as well as some metadata
> +about the partition, like the timebase offset and partition-scoped
> +page table information.
> +
> ++--------+-------+----+--------+----------------------------------+
> +|   ID   | Size  | RW | Thread | Details                          |
> +|        | Bytes |    | Guest  |                                  |
> +|        |       |    | Scope  |                                  |
> ++========+=======+====+========+==================================+
> +| 0x0000 |       | RW |   TG   | NOP element                      |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x0001 | 0x08  | R  |   G    | Size of L0 vCPU state            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x0002 | 0x08  | R  |   G    | Size Run vCPU out buffer         |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x0003 | 0x04  | RW |   G    | Logical PVR                      |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x0004 | 0x08  | RW |   G    | TB Offset (L1 relative)          |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x0005 | 0x18  | RW |   G    |Partition scoped page tbl info:   |
> +|        |       |    |        |                                  |
> +|        |       |    |        |- 0x00 Addr part scope table      |
> +|        |       |    |        |- 0x08 Num addr bits              |
> +|        |       |    |        |- 0x10 Size root dir              |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x0006 | 0x10  | RW |   G    |Process Table Information:        |
> +|        |       |    |        |                                  |
> +|        |       |    |        |- 0x0 Addr proc scope table       |
> +|        |       |    |        |- 0x8 Table size.                 |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x0007-|       |    |        | Reserved                         |
> +| 0x0BFF |       |    |        |                                  |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x0C00 | 0x10  | RW |   T    |Run vCPU Input Buffer:            |
> +|        |       |    |        |                                  |
> +|        |       |    |        |- 0x0 Addr of buffer              |
> +|        |       |    |        |- 0x8 Buffer Size.                |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x0C01 | 0x10  | RW |   T    |Run vCPU Output Buffer:           |
> +|        |       |    |        |                                  |
> +|        |       |    |        |- 0x0 Addr of buffer              |
> +|        |       |    |        |- 0x8 Buffer Size.                |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x0C02 | 0x08  | RW |   T    | vCPU VPA Address                 |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x0C03-|       |    |        | Reserved                         |
> +| 0x0FFF |       |    |        |                                  |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1000-| 0x08  | RW |   T    | GPR 0-31                         |
> +| 0x101F |       |    |        |                                  |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1020 | 0x08  | RW |   T    | HDEC expiry TB                   |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1021 | 0x08  | RW |   T    | NIA                              |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1022 | 0x08  | RW |   T    | MSR                              |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1023 | 0x08  | RW |   T    | LR                               |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1024 | 0x08  | RW |   T    | XER                              |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1025 | 0x08  | RW |   T    | CTR                              |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1026 | 0x08  | RW |   T    | CFAR                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1027 | 0x08  | RW |   T    | SRR0                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1028 | 0x08  | RW |   T    | SRR1                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1029 | 0x08  | RW |   T    | DAR                              |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x102A | 0x08  | RW |   T    | DEC expiry TB                    |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x102B | 0x08  | RW |   T    | VTB                              |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x102C | 0x08  | RW |   T    | LPCR                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x102D | 0x08  | RW |   T    | HFSCR                            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x102E | 0x08  | RW |   T    | FSCR                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x102F | 0x08  | RW |   T    | FPSCR                            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1030 | 0x08  | RW |   T    | DAWR0                            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1031 | 0x08  | RW |   T    | DAWR1                            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1032 | 0x08  | RW |   T    | CIABR                            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1033 | 0x08  | RW |   T    | PURR                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1034 | 0x08  | RW |   T    | SPURR                            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1035 | 0x08  | RW |   T    | IC                               |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1036-| 0x08  | RW |   T    | SPRG 0-3                         |
> +| 0x1039 |       |    |        |                                  |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x103A | 0x08  | W  |   T    | PPR                              |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x103B-| 0x08  | RW |   T    | MMCR 0-3                         |
> +| 0x103E |       |    |        |                                  |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x103F | 0x08  | RW |   T    | MMCRA                            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1040 | 0x08  | RW |   T    | SIER                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1041 | 0x08  | RW |   T    | SIER 2                           |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1042 | 0x08  | RW |   T    | SIER 3                           |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1043 | 0x08  | RW |   T    | BESCR                            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1044 | 0x08  | RW |   T    | EBBHR                            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1045 | 0x08  | RW |   T    | EBBRR                            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1046 | 0x08  | RW |   T    | AMR                              |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1047 | 0x08  | RW |   T    | IAMR                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1048 | 0x08  | RW |   T    | AMOR                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1049 | 0x08  | RW |   T    | UAMOR                            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x104A | 0x08  | RW |   T    | SDAR                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x104B | 0x08  | RW |   T    | SIAR                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x104C | 0x08  | RW |   T    | DSCR                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x104D | 0x08  | RW |   T    | TAR                              |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x104E | 0x08  | RW |   T    | DEXCR                            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x104F | 0x08  | RW |   T    | HDEXCR                           |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1050 | 0x08  | RW |   T    | HASHKEYR                         |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1051 | 0x08  | RW |   T    | HASHPKEYR                        |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1052 | 0x08  | RW |   T    | CTRL                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x1053-|       |    |        | Reserved                         |
> +| 0x1FFF |       |    |        |                                  |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x2000 | 0x04  | RW |   T    | CR                               |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x2001 | 0x04  | RW |   T    | PIDR                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x2002 | 0x04  | RW |   T    | DSISR                            |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x2003 | 0x04  | RW |   T    | VSCR                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x2004 | 0x04  | RW |   T    | VRSAVE                           |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x2005 | 0x04  | RW |   T    | DAWRX0                           |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x2006 | 0x04  | RW |   T    | DAWRX1                           |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x2007-| 0x04  | RW |   T    | PMC 1-6                          |
> +| 0x200C |       |    |        |                                  |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x200D | 0x04  | RW |   T    | WORT                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x200E | 0x04  | RW |   T    | PSPB                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x200F-|       |    |        | Reserved                         |
> +| 0x2FFF |       |    |        |                                  |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x3000-| 0x10  | RW |   T    | VSR 0-63                         |
> +| 0x303F |       |    |        |                                  |
> ++--------+-------+----+--------+----------------------------------+
> +| 0x3040-|       |    |        | Reserved                         |
> +| 0xEFFF |       |    |        |                                  |
> ++--------+-------+----+--------+----------------------------------+
> +| 0xF000 | 0x08  | R  |   T    | HDAR                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0xF001 | 0x04  | R  |   T    | HDSISR                           |
> ++--------+-------+----+--------+----------------------------------+
> +| 0xF002 | 0x04  | R  |   T    | HEIR                             |
> ++--------+-------+----+--------+----------------------------------+
> +| 0xF003 | 0x08  | R  |   T    | ASDR                             |
> ++--------+-------+----+--------+----------------------------------+
> +
> +Miscellaneous info
> +==================
> +
> +State not in ptregs/hvregs
> +--------------------------
> +
> +In the v1 API, some state is not in the ptregs/hvregs structures. This
> +includes the vector registers and some SPRs. For the L1 to set this
> +state for the L2, the L1 loads up these hardware registers before the
> +h_enter_nested() call and the L0 ensures they end up as the L2 state
> +(by not touching them).
> +
> +The v2 API removes this and explicitly sets this state via the GSB.
> +
> +L1 Implementation details: Caching state
> +----------------------------------------
> +
> +In the v1 API, all state is sent from the L1 to the L0 and vice versa
> +on every h_enter_nested() hcall. If the L0 is not currently running
> +any L2s, the L0 has no state information about them. The only
> +exception to this is the location of the partition table, registered
> +via h_set_partition_table().
> +
> +The v2 API changes this so that the L0 retains the L2 state even when
> +its vCPUs are no longer running. This means that the L1 only needs to
> +communicate with the L0 about L2 state when it needs to modify the L2
> +state, or when its value is out of date. This provides an opportunity
> +for performance optimisation.
> +
> +When a vCPU exits from an H_GUEST_RUN_VCPU() call, the L1 internally
> +marks all L2 state as invalid. This means that if the L1 wants to know
> +the L2 state (say via a kvm_get_one_reg() call), it needs to call
> +H_GUEST_GET_STATE() to get that state. Once read, it is marked as
> +valid in the L1 until the L2 is run again.
> +
> +Also, when an L1 modifies L2 vCPU state, it doesn't need to write it
> +to the L0 until that L2 vCPU runs again. Hence when the L1 updates
> +state (say via a kvm_set_one_reg() call), it writes to an internal L1
> +copy and only flushes this copy to the L0 when the L2 runs again via
> +the H_GUEST_RUN_VCPU() input buffer.
> +
> +This lazy updating of state by the L1 avoids unnecessary
> +H_GUEST_{G|S}ET_STATE() calls.
> +
> +References
> +==========
> +
> +For more details, please refer to:
> +
> +[1] Kernel documentation (currently v4 on mailing list):
> +    - https://lore.kernel.org/linuxppc-dev/20230905034658.82835-1-jniethe5@gmail.com/




* Re: [PATCH RESEND 01/15] ppc: spapr: Introduce Nested PAPR API related macros
  2023-09-06 23:48   ` Nicholas Piggin
@ 2023-09-11  6:21     ` Harsh Prateek Bora
  0 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-11  6:21 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 05:18, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> Adding new macros for the new hypercall op-codes, their return codes,
>> Guest State Buffer (GSB) element IDs and few registers which shall be
>> used in following patches to support Nested PAPR API.
>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   include/hw/ppc/spapr.h        |  23 ++++-
>>   include/hw/ppc/spapr_nested.h | 186 ++++++++++++++++++++++++++++++++++
>>   target/ppc/cpu.h              |   2 +
>>   3 files changed, 209 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 538b2dfb89..3990fed1d9 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -367,6 +367,16 @@ struct SpaprMachineState {
>>   #define H_NOOP            -63
>>   #define H_UNSUPPORTED     -67
>>   #define H_OVERLAP         -68
>> +#define H_STATE           -75
> 
> [snip]
> 
> I didn't go through to make sure all the numbers are correct, but
> generally looks okay. Are these just copied from KVM sources (or
> vice versa)?

I have mostly referred to the PAPR spec for the numbers. I hope the
KVM sources follow the same.

> 
>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>> index 25fac9577a..6f7f9b9d58 100644
>> --- a/target/ppc/cpu.h
>> +++ b/target/ppc/cpu.h
>> @@ -1587,9 +1587,11 @@ void ppc_compat_add_property(Object *obj, const char *name,
>>   #define SPR_PSPB              (0x09F)
>>   #define SPR_DPDES             (0x0B0)
>>   #define SPR_DAWR0             (0x0B4)
>> +#define SPR_DAWR1             (0x0B5)
>>   #define SPR_RPR               (0x0BA)
>>   #define SPR_CIABR             (0x0BB)
>>   #define SPR_DAWRX0            (0x0BC)
>> +#define SPR_DAWRX1            (0x0BD)
>>   #define SPR_HFSCR             (0x0BE)
>>   #define SPR_VRSAVE            (0x100)
>>   #define SPR_USPRG0            (0x100)
> 
> Stray change? Should be in 2nd DAWR patch, presumably.

This was introduced here following the PAPR ACR spec for the nested
API, initially to support GSB get/set requests. However, I can update
the patch once the 2nd DAWR patch gets merged.

Thanks for reviewing the series.

regards,
Harsh

> 
> Thanks,
> Nick



* Re: [PATCH RESEND 02/15] ppc: spapr: Add new/extend structs to support Nested PAPR API
  2023-09-07  1:06   ` Nicholas Piggin
@ 2023-09-11  6:47     ` Harsh Prateek Bora
  0 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-11  6:47 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 06:36, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> This patch introduces new data structures to be used with Nested PAPR
>> API. Also extends kvmppc_hv_guest_state with additional set of registers
>> supported with nested PAPR API.
>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   include/hw/ppc/spapr_nested.h | 48 +++++++++++++++++++++++++++++++++++
>>   1 file changed, 48 insertions(+)
>>
>> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
>> index 5cb668dd53..f8db31075b 100644
>> --- a/include/hw/ppc/spapr_nested.h
>> +++ b/include/hw/ppc/spapr_nested.h
>> @@ -189,6 +189,39 @@
>>   /* End of list of Guest State Buffer Element IDs */
>>   #define GSB_LAST                GSB_VCPU_SPR_ASDR
>>   
>> +typedef struct SpaprMachineStateNestedGuest {
>> +    unsigned long vcpus;
>> +    struct SpaprMachineStateNestedGuestVcpu *vcpu;
>> +    uint64_t parttbl[2];
>> +    uint32_t pvr_logical;
>> +    uint64_t tb_offset;
>> +} SpaprMachineStateNestedGuest;
>> +
>> +struct SpaprMachineStateNested {
>> +
>> +    uint8_t api;
>> +#define NESTED_API_KVM_HV  1
>> +#define NESTED_API_PAPR    2
>> +    uint64_t ptcr;
>> +    uint32_t lpid_max;
>> +    uint32_t pvr_base;
>> +    bool capabilities_set;
>> +    GHashTable *guests;
>> +};
>> +
>> +struct SpaprMachineStateNestedGuestVcpuRunBuf {
>> +    uint64_t addr;
>> +    uint64_t size;
>> +};
>> +
>> +typedef struct SpaprMachineStateNestedGuestVcpu {
>> +    bool enabled;
>> +    struct SpaprMachineStateNestedGuestVcpuRunBuf runbufin;
>> +    struct SpaprMachineStateNestedGuestVcpuRunBuf runbufout;
>> +    CPUPPCState env;
>> +    int64_t tb_offset;
>> +    int64_t dec_expiry_tb;
>> +} SpaprMachineStateNestedGuestVcpu;
>>   
>>   /*
>>    * Register state for entering a nested guest with H_ENTER_NESTED.
>> @@ -228,6 +261,21 @@ struct kvmppc_hv_guest_state {
>>       uint64_t dawr1;
>>       uint64_t dawrx1;
>>       /* Version 2 ends here */
>> +    uint64_t dec;
>> +    uint64_t fscr;
>> +    uint64_t fpscr;
>> +    uint64_t bescr;
>> +    uint64_t ebbhr;
>> +    uint64_t ebbrr;
>> +    uint64_t tar;
>> +    uint64_t dexcr;
>> +    uint64_t hdexcr;
>> +    uint64_t hashkeyr;
>> +    uint64_t hashpkeyr;
>> +    uint64_t ctrl;
>> +    uint64_t vscr;
>> +    uint64_t vrsave;
>> +    ppc_vsr_t vsr[64];
>>   };
> 
> Why? I can't see where it's used... This is the API for the original
> HV hcalls, which is possibly now broken because the code uses sizeof()
> when mapping it.

Yeah, I had realised after posting the patches that these leftovers
need cleaning up. Please ignore these additions; they shall be removed.

> 
> In general I'm not a fan of splitting patches by the type of code they
> add. Definitions for external APIs okay. But for things like internal
> structures I prefer added where they are introduced.
> 
Makes sense, I shall revisit and move the declarations to where they
are first used.

> It's actually harder to review a patch if related / dependent changes
> aren't in it, IMO. What should be split is unrelated or independent
> changes and logical steps. Same goes for hcalls too actually. Take a
> look at the series that introduced nested HV. 120f738a467 adds all the
> hcalls, all the structures, etc.
> 
> So I would also hink about squashing at least get/set capabilities
> hcalls together, and guest create/delete, and probably vcpu create/run.

Hmm, I think we can keep the get/set capabilities and guest
create/delete hcalls together as you suggested. We may want to keep
vcpu_run separate, as it has significant changes, to make the review
easier. Let me know if you think otherwise.

regards,
Harsh

> 
> Thanks,
> Nick



* Re: [PATCH RESEND 03/15] ppc: spapr: Use SpaprMachineStateNested's ptcr instead of nested_ptcr
  2023-09-07  1:13   ` Nicholas Piggin
@ 2023-09-11  7:24     ` Harsh Prateek Bora
  0 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-11  7:24 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 06:43, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> Use nested guest state specific struct for storing related info.
> 
> So this is the patch I would introduce the SpaprMachineStateNested
> struct, with just the .ptrc member. Add other members to it as they
> are used in later patches.

Sure, will do.

> 
>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   hw/ppc/spapr.c         | 4 ++--
>>   hw/ppc/spapr_nested.c  | 4 ++--
>>   include/hw/ppc/spapr.h | 3 ++-
>>   3 files changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 07e91e3800..e44686b04d 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1340,8 +1340,8 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
>>   
>>           assert(lpid != 0);
>>   
>> -        patb = spapr->nested_ptcr & PTCR_PATB;
>> -        pats = spapr->nested_ptcr & PTCR_PATS;
>> +        patb = spapr->nested.ptcr & PTCR_PATB;
>> +        pats = spapr->nested.ptcr & PTCR_PATS;
>>   
>>           /* Check if partition table is properly aligned */
>>           if (patb & MAKE_64BIT_MASK(0, pats + 12)) {
> 
> At this point I wonder if we should first move the nested part of
> spapr_get_pate into nested code. It's a bit of a wart to have it
> here when most of the other nested cases are abstracted from non
> nested code quite well.

Yeah, I felt the same when modifying the existing nested code here.
Let me do the needful in the next version.

> 
>> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
>> index 121aa96ddc..a669470f1a 100644
>> --- a/hw/ppc/spapr_nested.c
>> +++ b/hw/ppc/spapr_nested.c
>> @@ -25,7 +25,7 @@ static target_ulong h_set_ptbl(PowerPCCPU *cpu,
>>           return H_PARAMETER;
>>       }
>>   
>> -    spapr->nested_ptcr = ptcr; /* Save new partition table */
>> +    spapr->nested.ptcr = ptcr; /* Save new partition table */
>>   
>>       return H_SUCCESS;
>>   }
>> @@ -157,7 +157,7 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
>>       struct kvmppc_pt_regs *regs;
>>       hwaddr len;
>>   
>> -    if (spapr->nested_ptcr == 0) {
>> +    if (spapr->nested.ptcr == 0) {
>>           return H_NOT_AVAILABLE;
>>       }
>>   
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 3990fed1d9..c8b42af430 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -12,6 +12,7 @@
>>   #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
>>   #include "hw/ppc/xics.h"        /* For ICSState */
>>   #include "hw/ppc/spapr_tpm_proxy.h"
>> +#include "hw/ppc/spapr_nested.h" /* for SpaprMachineStateNested */
>>   
>>   struct SpaprVioBus;
>>   struct SpaprPhbState;
>> @@ -216,7 +217,7 @@ struct SpaprMachineState {
>>       uint32_t vsmt;       /* Virtual SMT mode (KVM's "core stride") */
>>   
>>       /* Nested HV support (TCG only) */
>> -    uint64_t nested_ptcr;
>> +    struct SpaprMachineStateNested nested;
> 
> I think convention says to use the typedef for these?

Sure, will update. Thanks.

regards,
Harsh

> 
> Thanks,
> Nick
> 
>>   
>>       Notifier epow_notifier;
>>       QTAILQ_HEAD(, SpaprEventLogEntry) pending_events;
> 



* Re: [PATCH RESEND 04/15] ppc: spapr: Start using nested.api for nested kvm-hv api
  2023-09-07  1:35   ` Nicholas Piggin
@ 2023-09-11  8:18     ` Harsh Prateek Bora
  0 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-11  8:18 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 07:05, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> With this patch, isolating kvm-hv nested api code to be executed only
>> when cap-nested-hv is set. This helps keeping api specific logic
>> mutually exclusive.
> 
> Changelog needs a bit of improvement. Emphasis on "why" for changelogs.
> If you take a changeset that makes a single logical change to the code,
> you should be able to understand why that is done. You could make some
> assumptions about the bigger series when it comes to details, so you
> don't have to explain from first principles. But if it's easy to
> explain the high-level why, you could.
> 
> Why are we adding this fundamentally? So that the spapr nested code can
> be extended to support a second API.
> 
> This patch should add the api field to the struct, and also the
> NESTED_API_KVM_HV definition.

Sure, folding the related changes (struct members, macros) into this
patch and updating the changelog as suggested sounds more meaningful
to me too.

regards,
Harsh

> 
> Thanks,
> Nick
> 
>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   hw/ppc/spapr.c      | 7 ++++++-
>>   hw/ppc/spapr_caps.c | 1 +
>>   2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index e44686b04d..0aa9f21516 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1334,8 +1334,11 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
>>           /* Copy PATE1:GR into PATE0:HR */
>>           entry->dw0 = spapr->patb_entry & PATE0_HR;
>>           entry->dw1 = spapr->patb_entry;
>> +        return true;
>> +    }
>> +    assert(spapr->nested.api);
>>   
>> -    } else {
>> +    if (spapr->nested.api == NESTED_API_KVM_HV) {
>>           uint64_t patb, pats;
>>   
>>           assert(lpid != 0);
>> @@ -3437,6 +3440,8 @@ static void spapr_instance_init(Object *obj)
>>           spapr_get_host_serial, spapr_set_host_serial);
>>       object_property_set_description(obj, "host-serial",
>>           "Host serial number to advertise in guest device tree");
>> +    /* Nested */
>> +    spapr->nested.api = 0;
>>   }
>>   
>>   static void spapr_machine_finalizefn(Object *obj)
>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>> index 5a0755d34f..a3a790b026 100644
>> --- a/hw/ppc/spapr_caps.c
>> +++ b/hw/ppc/spapr_caps.c
>> @@ -454,6 +454,7 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
>>           return;
>>       }
>>   
>> +    spapr->nested.api = NESTED_API_KVM_HV;
>>       if (kvm_enabled()) {
>>           if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00, 0,
>>                                 spapr->max_compat_pvr)) {
> 



* Re: [PATCH RESEND 05/15] ppc: spapr: Introduce cap-nested-papr for nested PAPR API
  2023-09-07  1:49   ` Nicholas Piggin
@ 2023-09-19  9:49     ` Harsh Prateek Bora
  0 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-19  9:49 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 07:19, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> This patch introduces a new cmd line option cap-nested-papr to enable
>> support for nested PAPR API by setting the nested.api version accordingly.
>> It requires the user to launch the L0 Qemu in TCG mode and then L1 Linux
>> can then launch the nested guest in KVM mode. Unlike cap-nested-hv,
>> this is meant for nested guest on pseries (PowerVM) where L0 retains
>> whole state of the nested guest. Both APIs are thus mutually exclusive.
>> Support for related hcalls is being added in next set of patches.
> 
> This changelog could use some work too.
> 
> "Introduce a SPAPR capability cap-nested-papr with provides a nested
(s/with/which?)
> HV facility to the guest. This is similar to cap-nested-hv, but uses
> a different (incompatible) API and so they are mutually exclusive."
> 
We may want to emphasize that nested virtualization on Power uses only 
this new API, whereas the other API was targeted towards PowerNV but 
never became part of the PAPR spec?

> You could add some documentation to say recent Linux pseries guests
> support both, and explain more about KVM and PowerVM support there too,
> if it is relevant.
> 

Sure, will update.

>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   hw/ppc/spapr.c         |  2 ++
>>   hw/ppc/spapr_caps.c    | 48 ++++++++++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr.h |  5 ++++-
>>   3 files changed, 54 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 0aa9f21516..cbab7a825f 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2092,6 +2092,7 @@ static const VMStateDescription vmstate_spapr = {
>>           &vmstate_spapr_cap_fwnmi,
>>           &vmstate_spapr_fwnmi,
>>           &vmstate_spapr_cap_rpt_invalidate,
>> +        &vmstate_spapr_cap_nested_papr,
>>           NULL
>>       }
>>   };
>> @@ -4685,6 +4686,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>       smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_WORKAROUND;
>>       smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
>>       smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>> +    smc->default_caps.caps[SPAPR_CAP_NESTED_PAPR] = SPAPR_CAP_OFF;
>>       smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>       smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
>>       smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>> index a3a790b026..d3b9f107aa 100644
>> --- a/hw/ppc/spapr_caps.c
>> +++ b/hw/ppc/spapr_caps.c
>> @@ -491,6 +491,44 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
>>       }
>>   }
>>   
>> +static void cap_nested_papr_apply(SpaprMachineState *spapr,
>> +                                    uint8_t val, Error **errp)
>> +{
>> +    ERRP_GUARD();
>> +    PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
>> +    CPUPPCState *env = &cpu->env;
>> +
>> +    if (!val) {
>> +        /* capability disabled by default */
>> +        return;
>> +    }
>> +
>> +    if (tcg_enabled()) {
>> +        if (!(env->insns_flags2 & PPC2_ISA300)) {
>> +            error_setg(errp, "Nested-PAPR only supported on POWER9 and later");
>> +            error_append_hint(errp,
>> +                              "Try appending -machine cap-nested-papr=off\n");
>> +            return;
>> +        }
>> +        spapr->nested.api = NESTED_API_PAPR;
> 
> I'm not seeing any mutual exclusion with the other cap here. What if
> you enable them both? Lucky dip?
> 
> It would actually be nice to enable both even if you just choose the
> mode after the first hcall is made. I think you could actually support
> both (even concurrently) quite easily.
> 
> For now this is probably okay if you fix mutex.
> 

Thanks for catching this. Will fix.
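For illustration, the mutual exclusion Nick asked for could be modeled like this. This is a standalone sketch, not the actual QEMU code: the `machine` struct, the `NESTED_API_*` enum and the error-string plumbing are local stand-ins for `SpaprMachineState` and `Error **errp`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local stand-ins for SpaprMachineState->nested.api and friends. */
enum nested_api { NESTED_API_NONE = 0, NESTED_API_KVM_HV, NESTED_API_PAPR };

struct machine { enum nested_api api; };

/* Returns 0 on success, -1 (with a message) if the other nested cap
 * has already claimed the API - whichever cap applies first wins. */
static int cap_nested_papr_apply(struct machine *m, int val,
                                 char *err, size_t errlen)
{
    if (!val) {
        return 0;               /* capability disabled, nothing to do */
    }
    if (m->api == NESTED_API_KVM_HV) {
        snprintf(err, errlen,
                 "cap-nested-papr is incompatible with cap-nested-hv");
        return -1;
    }
    m->api = NESTED_API_PAPR;
    return 0;
}

static int cap_nested_kvm_hv_apply(struct machine *m, int val,
                                   char *err, size_t errlen)
{
    if (!val) {
        return 0;
    }
    if (m->api == NESTED_API_PAPR) {
        snprintf(err, errlen,
                 "cap-nested-hv is incompatible with cap-nested-papr");
        return -1;
    }
    m->api = NESTED_API_KVM_HV;
    return 0;
}
```

Enabling both caps on the command line then fails deterministically at the second apply, instead of the "lucky dip" behaviour.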

> 
>> +    } else if (kvm_enabled()) {
>> +        /*
>> +         * this gets executed in L1 qemu when L2 is launched,
>> +         * needs kvm-hv support in L1 kernel.
>> +         */
>> +        if (!kvmppc_has_cap_nested_kvm_hv()) {
>> +            error_setg(errp,
>> +                       "KVM implementation does not support Nested-HV");
>> +            error_append_hint(errp,
>> +                              "Try appending -machine cap-nested-hv=off\n");
>> +        } else if (kvmppc_set_cap_nested_kvm_hv(val) < 0) {
>> +            error_setg(errp, "Error enabling cap-nested-hv with KVM");
>> +            error_append_hint(errp,
>> +                              "Try appending -machine cap-nested-hv=off\n");
>> +        }
> 
> This is just copy and pasted from the other cap, isn't it?
> 
Yeah, the error logs need to be rephrased.

>> +    }
>> +}
>> +
>>   static void cap_large_decr_apply(SpaprMachineState *spapr,
>>                                    uint8_t val, Error **errp)
>>   {
>> @@ -736,6 +774,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>>           .type = "bool",
>>           .apply = cap_nested_kvm_hv_apply,
>>       },
>> +    [SPAPR_CAP_NESTED_PAPR] = {
>> +        .name = "nested-papr",
>> +        .description = "Allow Nested PAPR (Phyp)",
>> +        .index = SPAPR_CAP_NESTED_PAPR,
>> +        .get = spapr_cap_get_bool,
>> +        .set = spapr_cap_set_bool,
>> +        .type = "bool",
>> +        .apply = cap_nested_papr_apply,
>> +    },
> 
> Should scrub "Phyp". "Phyp" and PowerVM also doesn't mean anything for
> us really. We both implement PAPR. "Nested PAPR" is jibberish for a user
> as well -- "Allow Nested KVM-HV (PAPR API)" or similar might be a bit
> better.
> 
I think "Allow Nested-HV (PAPR API)" may be better, since L0 is the 
hypervisor here which takes care of the nested guests. We may want to avoid 
the term "Nested KVM", which usually means KVM on KVM as the hypervisor.
Also, AFAIK, the former API never became part of the PAPR spec, so I am not 
sure if it is appropriate to say that both implement PAPR?

regards,
Harsh

> Thanks,
> Nick
> 
>>       [SPAPR_CAP_LARGE_DECREMENTER] = {
>>           .name = "large-decr",
>>           .description = "Allow Large Decrementer",
>> @@ -920,6 +967,7 @@ SPAPR_CAP_MIG_STATE(sbbc, SPAPR_CAP_SBBC);
>>   SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
>>   SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
>>   SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
>> +SPAPR_CAP_MIG_STATE(nested_papr, SPAPR_CAP_NESTED_PAPR);
>>   SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>>   SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>>   SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI);
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index c8b42af430..8a6e9ce929 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -81,8 +81,10 @@ typedef enum {
>>   #define SPAPR_CAP_RPT_INVALIDATE        0x0B
>>   /* Support for AIL modes */
>>   #define SPAPR_CAP_AIL_MODE_3            0x0C
>> +/* Nested PAPR */
>> +#define SPAPR_CAP_NESTED_PAPR           0x0D
>>   /* Num Caps */
>> -#define SPAPR_CAP_NUM                   (SPAPR_CAP_AIL_MODE_3 + 1)
>> +#define SPAPR_CAP_NUM                   (SPAPR_CAP_NESTED_PAPR + 1)
>>   
>>   /*
>>    * Capability Values
>> @@ -1005,6 +1007,7 @@ extern const VMStateDescription vmstate_spapr_cap_sbbc;
>>   extern const VMStateDescription vmstate_spapr_cap_ibs;
>>   extern const VMStateDescription vmstate_spapr_cap_hpt_maxpagesize;
>>   extern const VMStateDescription vmstate_spapr_cap_nested_kvm_hv;
>> +extern const VMStateDescription vmstate_spapr_cap_nested_papr;
>>   extern const VMStateDescription vmstate_spapr_cap_large_decr;
>>   extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
>>   extern const VMStateDescription vmstate_spapr_cap_fwnmi;
> 



* Re: [PATCH RESEND 06/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_GET_CAPABILITIES
  2023-09-07  2:02   ` Nicholas Piggin
@ 2023-09-19 10:48     ` Harsh Prateek Bora
  2023-10-03  8:10     ` Cédric Le Goater
  1 sibling, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-09-19 10:48 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 07:32, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> This patch implements nested PAPR hcall H_GUEST_GET_CAPABILITIES and
>> also enables registration of nested PAPR hcalls whenever an L0 is
>> launched with cap-nested-papr=true. The common registration routine
>> shall be used by future patches for registration of related hcall
>> support
>> being added. This hcall is used by L1 kernel to get the set of guest
>> capabilities that are supported by L0 (Qemu TCG).
> 
> Changelog can drop "This patch". Probably don't have to be so
> detailed here either -- we already established that PAPR hcalls can
> be used with cap-nested-papr in the last patch, we know that L1
> kernels make the hcalls to the vhyp, etc.
> 
> "Introduce the nested PAPR hcall H_GUEST_GET_CAPABILITIES which
> is used to query the capabilities of the API and the L2 guests
> it provides."
> 
> I would squash this with set.
> 

Sure, will update the commit log and squash with set.

>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_caps.c           |  1 +
>>   hw/ppc/spapr_nested.c         | 35 +++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr_nested.h |  6 ++++++
>>   3 files changed, 42 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>> index d3b9f107aa..cbe53a79ec 100644
>> --- a/hw/ppc/spapr_caps.c
>> +++ b/hw/ppc/spapr_caps.c
>> @@ -511,6 +511,7 @@ static void cap_nested_papr_apply(SpaprMachineState *spapr,
>>               return;
>>           }
>>           spapr->nested.api = NESTED_API_PAPR;
>> +        spapr_register_nested_phyp();
>>       } else if (kvm_enabled()) {
>>           /*
>>            * this gets executed in L1 qemu when L2 is launched,
> 
> Hmm, this doesn't match nested HV registration. If you want to register
> the hcalls in the cap apply, can you move spapr_register_nested()
> there first? It may make more sense to go in as a dummy function with
> the cap patch first, since you don't introduce all hcalls together.
> 
> Also phyp->papr. Scrub for phyp please.

Sure, will do.

> 
>> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
>> index a669470f1a..37f3a49be2 100644
>> --- a/hw/ppc/spapr_nested.c
>> +++ b/hw/ppc/spapr_nested.c
>> @@ -6,6 +6,7 @@
>>   #include "hw/ppc/spapr.h"
>>   #include "hw/ppc/spapr_cpu_core.h"
>>   #include "hw/ppc/spapr_nested.h"
>> +#include "cpu-models.h"
>>   
>>   #ifdef CONFIG_TCG
>>   #define PRTS_MASK      0x1f
>> @@ -375,6 +376,29 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>>       address_space_unmap(CPU(cpu)->as, regs, len, len, true);
>>   }
>>   
>> +static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
>> +                                             SpaprMachineState *spapr,
>> +                                             target_ulong opcode,
>> +                                             target_ulong *args)
>> +{
>> +    CPUPPCState *env = &cpu->env;
>> +    target_ulong flags = args[0];
>> +
>> +    if (flags) { /* don't handle any flags capabilities for now */
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
>> +        (CPU_POWERPC_POWER9_BASE))
>> +        env->gpr[4] = H_GUEST_CAPABILITIES_P9_MODE;
>> +
>> +    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
>> +        (CPU_POWERPC_POWER10_BASE))
>> +        env->gpr[4] = H_GUEST_CAPABILITIES_P10_MODE;
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>>   void spapr_register_nested(void)
>>   {
>>       spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
>> @@ -382,6 +406,12 @@ void spapr_register_nested(void)
>>       spapr_register_hypercall(KVMPPC_H_TLB_INVALIDATE, h_tlb_invalidate);
>>       spapr_register_hypercall(KVMPPC_H_COPY_TOFROM_GUEST, h_copy_tofrom_guest);
>>   }
>> +
>> +void spapr_register_nested_phyp(void)
>> +{
>> +    spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
>> +}
>> +
>>   #else
>>   void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>>   {
>> @@ -392,4 +422,9 @@ void spapr_register_nested(void)
>>   {
>>       /* DO NOTHING */
>>   }
>> +
>> +void spapr_register_nested_phyp(void)
>> +{
>> +    /* DO NOTHING */
>> +}
>>   #endif
>> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
>> index f8db31075b..ce198e9f70 100644
>> --- a/include/hw/ppc/spapr_nested.h
>> +++ b/include/hw/ppc/spapr_nested.h
>> @@ -189,6 +189,11 @@
>>   /* End of list of Guest State Buffer Element IDs */
>>   #define GSB_LAST                GSB_VCPU_SPR_ASDR
>>   
>> +/* Bit masks to be used in nested PAPR API */
>> +#define H_GUEST_CAPABILITIES_COPY_MEM 0x8000000000000000
>> +#define H_GUEST_CAPABILITIES_P9_MODE  0x4000000000000000
>> +#define H_GUEST_CAPABILITIES_P10_MODE 0x2000000000000000
> 
> See introducing these defines with the patch that uses them isn't so
> bad :)
> 

It's better indeed:)
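For reference, the PVR-to-capability mapping the quoted hunk implements can be sketched standalone like this. The mask and base values mirror the ones used in the patch (the real definitions live in `cpu-models.h` and `spapr_nested.h`), but treat them as illustrative here.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirroring the quoted patch; illustrative stand-ins for the
 * definitions in cpu-models.h and spapr_nested.h. */
#define CPU_POWERPC_POWER_SERVER_MASK 0xffff0000u
#define CPU_POWERPC_POWER9_BASE       0x004e0000u
#define CPU_POWERPC_POWER10_BASE      0x00800000u
#define H_GUEST_CAPABILITIES_P9_MODE  0x4000000000000000ULL
#define H_GUEST_CAPABILITIES_P10_MODE 0x2000000000000000ULL

/* The PVR's upper halfword selects the processor family, which in
 * turn selects the mode capability advertised to L1 in gpr[4]. */
static uint64_t caps_for_pvr(uint32_t pvr)
{
    switch (pvr & CPU_POWERPC_POWER_SERVER_MASK) {
    case CPU_POWERPC_POWER9_BASE:
        return H_GUEST_CAPABILITIES_P9_MODE;
    case CPU_POWERPC_POWER10_BASE:
        return H_GUEST_CAPABILITIES_P10_MODE;
    default:
        return 0;   /* unsupported family: advertise nothing */
    }
}
```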

regards,
Harsh

> Thanks,
> Nick
> 
>> +
>>   typedef struct SpaprMachineStateNestedGuest {
>>       unsigned long vcpus;
>>       struct SpaprMachineStateNestedGuestVcpu *vcpu;
>> @@ -331,6 +336,7 @@ struct nested_ppc_state {
>>   };
>>   
>>   void spapr_register_nested(void);
>> +void spapr_register_nested_phyp(void);
>>   void spapr_exit_nested(PowerPCCPU *cpu, int excp);
>>   
>>   #endif /* HW_SPAPR_NESTED_H */
> 



* Re: [PATCH RESEND 07/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_SET_CAPABILITIES
  2023-09-07  2:09   ` Nicholas Piggin
@ 2023-10-03  4:59     ` Harsh Prateek Bora
  0 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-10-03  4:59 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 07:39, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> This patch implements nested PAPR hcall H_GUEST_SET_CAPABILITIES.
>> This is used by L1 to set capabilities of the nested guest being
>> created. The capabilities being set are subset of the capabilities
>> returned from the previous call to H_GUEST_GET_CAPABILITIES hcall.
>> Currently, it only supports P9/P10 capability check through PVR.
>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   hw/ppc/spapr.c                |  1 +
>>   hw/ppc/spapr_nested.c         | 46 +++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr_nested.h |  3 +++
>>   3 files changed, 50 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index cbab7a825f..7c6f6ee25d 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -3443,6 +3443,7 @@ static void spapr_instance_init(Object *obj)
>>           "Host serial number to advertise in guest device tree");
>>       /* Nested */
>>       spapr->nested.api = 0;
>> +    spapr->nested.capabilities_set = false;
> 
> I would actually think about moving spapr->nested init into
> spapr_nested.c.
> 

Agree, moved.

>>   }
>>   
>>   static void spapr_machine_finalizefn(Object *obj)
>> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
>> index 37f3a49be2..9af65f257f 100644
>> --- a/hw/ppc/spapr_nested.c
>> +++ b/hw/ppc/spapr_nested.c
>> @@ -399,6 +399,51 @@ static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
>>       return H_SUCCESS;
>>   }
>>   
>> +static target_ulong h_guest_set_capabilities(PowerPCCPU *cpu,
>> +                                             SpaprMachineState *spapr,
>> +                                             target_ulong opcode,
>> +                                              target_ulong *args)
>> +{
>> +    CPUPPCState *env = &cpu->env;
>> +    target_ulong flags = args[0];
>> +    target_ulong capabilities = args[1];
>> +
>> +    if (flags) { /* don't handle any flags capabilities for now */
>> +        return H_PARAMETER;
>> +    }
>> +
>> +
> 
> May need to do a pass over whitespace.
> 

Sure, done.

>> +    /* isn't supported */
>> +    if (capabilities & H_GUEST_CAPABILITIES_COPY_MEM) {
>> +        env->gpr[4] = 0;
>> +        return H_P2;
>> +    }
>> +
>> +    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
>> +        (CPU_POWERPC_POWER9_BASE)) {
>> +        /* We are a P9 */
>> +        if (!(capabilities & H_GUEST_CAPABILITIES_P9_MODE)) {
>> +            env->gpr[4] = 1;
>> +            return H_P2;
>> +        }
>> +    }
>> +
>> +    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
>> +        (CPU_POWERPC_POWER10_BASE)) {
>> +        /* We are a P10 */
> 
> The 2 comments above aren't helpful. Just remove them.
> 

Sure, done.

>> +        if (!(capabilities & H_GUEST_CAPABILITIES_P10_MODE)) {
>> +            env->gpr[4] = 2;
>> +            return H_P2;
>> +        }
>> +    }
>> +
>> +    spapr->nested.capabilities_set = true;
> 
> Is it okay to set twice? If not, add a check. If yes, remove
> capabilities_set until it's needed.
> 

Thanks for pointing it out, adding a check as appropriate.
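A minimal model of that duplicate-set guard, for illustration only — the struct is a stand-in for `spapr->nested`, and the return-code values follow Linux's `hvcall.h` but are just placeholders in this sketch. Whether a second set should return H_STATE (rather than being silently accepted) is my reading of the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return codes as in Linux's hvcall.h; illustrative for this sketch. */
#define H_SUCCESS 0
#define H_STATE   (-75L)

struct nested_state {
    bool capabilities_set;
    uint32_t pvr_base;
};

/* First call records the negotiated PVR base; a repeat call is
 * rejected with H_STATE instead of silently overwriting it. */
static long set_capabilities(struct nested_state *n, uint32_t pvr)
{
    if (n->capabilities_set) {
        return H_STATE;
    }
    n->capabilities_set = true;
    n->pvr_base = pvr;
    return H_SUCCESS;
}
```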

Thanks
Harsh

>> +
>> +    spapr->nested.pvr_base = env->spr[SPR_PVR];
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>>   void spapr_register_nested(void)
>>   {
>>       spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
>> @@ -410,6 +455,7 @@ void spapr_register_nested(void)
>>   void spapr_register_nested_phyp(void)
>>   {
>>       spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
>> +    spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
>>   }
>>   
>>   #else
>> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
>> index ce198e9f70..a7996251cb 100644
>> --- a/include/hw/ppc/spapr_nested.h
>> +++ b/include/hw/ppc/spapr_nested.h
>> @@ -193,6 +193,9 @@
>>   #define H_GUEST_CAPABILITIES_COPY_MEM 0x8000000000000000
>>   #define H_GUEST_CAPABILITIES_P9_MODE  0x4000000000000000
>>   #define H_GUEST_CAPABILITIES_P10_MODE 0x2000000000000000
>> +#define H_GUEST_CAP_COPY_MEM_BMAP   0
>> +#define H_GUEST_CAP_P9_MODE_BMAP    1
>> +#define H_GUEST_CAP_P10_MODE_BMAP   2
>>   
>>   typedef struct SpaprMachineStateNestedGuest {
>>       unsigned long vcpus;
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND 08/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE
  2023-09-07  2:28   ` Nicholas Piggin
@ 2023-10-03  7:57     ` Harsh Prateek Bora
  0 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-10-03  7:57 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 07:58, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> This hcall is used by L1 to indicate to L0 that a new nested guest needs
>> to be created and therefore necessary resource allocation shall be made.
>> The L0 uses a hash table for nested guest specific resource management.
>> This data structure is further utilized by other hcalls to operate on
>> related members during entire life cycle of the nested guest.
> 
> Similar comment for changelog re detail. Detailed specification of API
> and implementation could go in comments or documentation if useful.
> 
Sure, squashing guest create/delete together and updating the commit log 
to be more abstract, as needed.

>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_nested.c         | 75 +++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr_nested.h |  3 ++
>>   2 files changed, 78 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
>> index 9af65f257f..09bbbfb341 100644
>> --- a/hw/ppc/spapr_nested.c
>> +++ b/hw/ppc/spapr_nested.c
>> @@ -444,6 +444,80 @@ static target_ulong h_guest_set_capabilities(PowerPCCPU *cpu,
>>       return H_SUCCESS;
>>   }
>>   
>> +static void
>> +destroy_guest_helper(gpointer value)
>> +{
>> +    struct SpaprMachineStateNestedGuest *guest = value;
>> +    g_free(guest);
>> +}
>> +
>> +static target_ulong h_guest_create(PowerPCCPU *cpu,
>> +                                   SpaprMachineState *spapr,
>> +                                   target_ulong opcode,
>> +                                   target_ulong *args)
>> +{
>> +    CPUPPCState *env = &cpu->env;
>> +    target_ulong flags = args[0];
>> +    target_ulong continue_token = args[1];
>> +    uint64_t lpid;
>> +    int nguests = 0;
>> +    struct SpaprMachineStateNestedGuest *guest;
>> +
>> +    if (flags) { /* don't handle any flags for now */
>> +        return H_UNSUPPORTED_FLAG;
>> +    }
>> +
>> +    if (continue_token != -1) {
>> +        return H_P2;
>> +    }
>> +
>> +    if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_PAPR)) {
>> +        return H_FUNCTION;
>> +    }
>> +
>> +    if (!spapr->nested.capabilities_set) {
>> +        return H_STATE;
>> +    }
>> +
>> +    if (!spapr->nested.guests) {
>> +        spapr->nested.lpid_max = NESTED_GUEST_MAX;
>> +        spapr->nested.guests = g_hash_table_new_full(NULL,
>> +                                                     NULL,
>> +                                                     NULL,
>> +                                                     destroy_guest_helper);
> 
> Is lpid_max only used by create? Probably no need to have it in spapr
> then->nested then. Also, do we even need to have a limit?

Yes, as of now it is being used only by create and doesn't need to be part
of spapr->nested. We can simply use the macro for max guests, keeping it
to emulate a finite resource model.
For all practical purposes, nested guests in a TCG-emulated L0
shouldn't reach that limit.
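The allocation scheme under discussion — guest IDs from 1 upward, 0 reserved, first free slot wins, H_NO_MEM once the finite limit is hit — can be sketched standalone like this. A plain array stands in for QEMU's GHashTable, and the max is shrunk from 4096 to keep the sketch small; the return-code value follows hvcall.h but is illustrative here.

```c
#include <assert.h>
#include <stdlib.h>

#define PAPR_NESTED_GUEST_MAX 8   /* 4096 in the patch; small for the sketch */
#define H_NO_MEM (-9L)            /* as in hvcall.h; illustrative here */

/* Slot i non-NULL == guest ID i is live.  Stands in for the GHashTable. */
static void *guests[PAPR_NESTED_GUEST_MAX];

/* Returns the newly allocated guest ID (lowest free, starting at 1),
 * or H_NO_MEM once the finite resource pool is exhausted. */
static long guest_create(void)
{
    long id;

    for (id = 1; id < PAPR_NESTED_GUEST_MAX; id++) {
        if (!guests[id]) {
            guests[id] = malloc(1);   /* stand-in for the guest struct */
            return id;
        }
    }
    return H_NO_MEM;
}

static void guest_delete(long id)
{
    free(guests[id]);
    guests[id] = NULL;
}
```

Note that freed IDs are reused by later creates, since the scan always picks the lowest free slot.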

> 
>> +    }
>> +
>> +    nguests = g_hash_table_size(spapr->nested.guests);
>> +
>> +    if (nguests == spapr->nested.lpid_max) {
>> +        return H_NO_MEM;
>> +    }
>> +
>> +    /* Lookup for available lpid */
>> +    for (lpid = 1; lpid < spapr->nested.lpid_max; lpid++) {
> 
> PAPR API calls it "guest ID" I think. Should change all references to
> lpid to that.

Changing it to "guestid".

> 
>> +        if (!(g_hash_table_lookup(spapr->nested.guests,
>> +                                  GINT_TO_POINTER(lpid)))) {
>> +            break;
>> +        }
>> +    }
>> +    if (lpid == spapr->nested.lpid_max) {
>> +        return H_NO_MEM;
>> +    }
>> +
>> +    guest = g_try_new0(struct SpaprMachineStateNestedGuest, 1);
>> +    if (!guest) {
>> +        return H_NO_MEM;
>> +    }
>> +
>> +    guest->pvr_logical = spapr->nested.pvr_base;
>> +
>> +    g_hash_table_insert(spapr->nested.guests, GINT_TO_POINTER(lpid), guest);
>> +    printf("%s: lpid: %lu (MAX: %i)\n", __func__, lpid, spapr->nested.lpid_max);
> 
> Remove printf.
> 
Done.

>> +
>> +    env->gpr[4] = lpid;
>> +    return H_SUCCESS;
>> +}
>> +
>>   void spapr_register_nested(void)
>>   {
>>       spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
>> @@ -456,6 +530,7 @@ void spapr_register_nested_phyp(void)
>>   {
>>       spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
>>       spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
>> +    spapr_register_hypercall(H_GUEST_CREATE          , h_guest_create);
>>   }
>>   
>>   #else
>> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
>> index a7996251cb..7841027df8 100644
>> --- a/include/hw/ppc/spapr_nested.h
>> +++ b/include/hw/ppc/spapr_nested.h
>> @@ -197,6 +197,9 @@
>>   #define H_GUEST_CAP_P9_MODE_BMAP    1
>>   #define H_GUEST_CAP_P10_MODE_BMAP   2
>>   
>> +/* Nested PAPR API macros */
>> +#define NESTED_GUEST_MAX 4096
> 
> Prefix with PAPR_?

Done.

Thanks
Harsh
> 
> Thanks,
> Nick
> 
>> +
>>   typedef struct SpaprMachineStateNestedGuest {
>>       unsigned long vcpus;
>>       struct SpaprMachineStateNestedGuestVcpu *vcpu;
> 



* Re: [PATCH RESEND 14/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_DELETE
  2023-09-07  2:31   ` Nicholas Piggin
@ 2023-10-03  8:01     ` Harsh Prateek Bora
  0 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-10-03  8:01 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 08:01, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> This hcall is used by L1 to delete a guest entry in L0 or can also be
>> used to delete all guests if needed (usually in shutdown scenarios).
> 
> I'd squash with at least the create hcall.

Done.

> 
>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_nested.c         | 32 ++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr_nested.h |  1 +
>>   2 files changed, 33 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
>> index 3605f27115..5afdad4990 100644
>> --- a/hw/ppc/spapr_nested.c
>> +++ b/hw/ppc/spapr_nested.c
>> @@ -1692,6 +1692,37 @@ static void exit_process_output_buffer(PowerPCCPU *cpu,
>>       return;
>>   }
>>   
>> +static target_ulong h_guest_delete(PowerPCCPU *cpu,
>> +                                   SpaprMachineState *spapr,
>> +                                   target_ulong opcode,
>> +                                   target_ulong *args)
>> +{
>> +    target_ulong flags = args[0];
>> +    target_ulong lpid = args[1];
>> +    struct SpaprMachineStateNestedGuest *guest;
>> +
>> +    if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_PAPR)) {
>> +        return H_FUNCTION;
>> +    }
> 
> If you only register these hcalls when you apply the cap, then you
> don't need to test it, right?
> 
Yes, cleaned up now.

> Open question as to whether it's better to register hcalls when
> enabling such caps, or do the tests for them here. I guess the
> former makes sense.

Yeah, I am inclined towards the former as well.

> 
>> +
>> +    /* handle flag deleteAllGuests, remaining bits reserved */
> 
> This comment is confusing. What is flag deleteAllGuests?
> 
As per the spec, if this flag is set, all guests should be deleted and the
provided guest ID ignored. Updating the comment to mention this.

> H_GUEST_DELETE_ALL_MASK? Is that a mask, or a flag?

A flag; updating it to H_GUEST_DELETE_ALL_FLAG.
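The intended ordering of checks — reserved bits rejected first, then the delete-all flag (guest ID ignored), then a single-guest lookup — can be modeled standalone like this. The table and the H_UNSUPPORTED_FLAG value are stand-ins for this sketch; H_P2 follows hvcall.h.

```c
#include <assert.h>
#include <stdint.h>

#define H_SUCCESS            0
#define H_P2                (-55L)    /* as in hvcall.h */
#define H_UNSUPPORTED_FLAG  (-256L)   /* illustrative placeholder */
#define H_GUEST_DELETE_ALL_FLAG 0x8000000000000000ULL

#define NGUESTS 16

/* Tiny model of the guest table: nonzero slot == live guest. */
static int table[NGUESTS];

static long guest_delete(uint64_t flags, uint64_t id)
{
    if (flags & ~H_GUEST_DELETE_ALL_FLAG) {
        return H_UNSUPPORTED_FLAG;    /* reserved bit set */
    }
    if (flags & H_GUEST_DELETE_ALL_FLAG) {
        /* Per the spec, the guest ID is ignored: drop every guest. */
        for (int i = 0; i < NGUESTS; i++) {
            table[i] = 0;
        }
        return H_SUCCESS;
    }
    if (id >= NGUESTS || !table[id]) {
        return H_P2;                  /* no such guest */
    }
    table[id] = 0;
    return H_SUCCESS;
}
```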

> 
>> +    if (flags & ~H_GUEST_DELETE_ALL_MASK) {
>> +        return H_UNSUPPORTED_FLAG;
>> +    } else if (flags & H_GUEST_DELETE_ALL_MASK) {
>> +        g_hash_table_destroy(spapr->nested.guests);
>> +        return H_SUCCESS;
>> +    }
>> +
>> +    guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(lpid));
>> +    if (!guest) {
>> +        return H_P2;
>> +    }
>> +
>> +    g_hash_table_remove(spapr->nested.guests, GINT_TO_POINTER(lpid));
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>>   void spapr_register_nested(void)
>>   {
>>       spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
>> @@ -1709,6 +1740,7 @@ void spapr_register_nested_phyp(void)
>>       spapr_register_hypercall(H_GUEST_SET_STATE       , h_guest_set_state);
>>       spapr_register_hypercall(H_GUEST_GET_STATE       , h_guest_get_state);
>>       spapr_register_hypercall(H_GUEST_RUN_VCPU        , h_guest_run_vcpu);
>> +    spapr_register_hypercall(H_GUEST_DELETE          , h_guest_delete);
>>   }
>>   
>>   #else
>> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
>> index ca5d28c06e..9eb43778ad 100644
>> --- a/include/hw/ppc/spapr_nested.h
>> +++ b/include/hw/ppc/spapr_nested.h
>> @@ -209,6 +209,7 @@
>>   #define H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE 0x8000000000000000 /* BE in GSB */
>>   #define GUEST_STATE_REQUEST_GUEST_WIDE       0x1
>>   #define GUEST_STATE_REQUEST_SET              0x2
>> +#define H_GUEST_DELETE_ALL_MASK              0x8000000000000000ULL
>>   
>>   #define GUEST_STATE_ELEMENT(i, sz, s, f, ptr, c) { \
>>       .id = (i),                                     \
> 



* Re: [PATCH RESEND 06/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_GET_CAPABILITIES
  2023-09-07  2:02   ` Nicholas Piggin
  2023-09-19 10:48     ` Harsh Prateek Bora
@ 2023-10-03  8:10     ` Cédric Le Goater
  1 sibling, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2023-10-03  8:10 UTC (permalink / raw)
  To: Nicholas Piggin, Harsh Prateek Bora, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul

On 9/7/23 04:02, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> This patch implements nested PAPR hcall H_GUEST_GET_CAPABILITIES and
>> also enables registration of nested PAPR hcalls whenever an L0 is
>> launched with cap-nested-papr=true. The common registration routine
>> shall be used by future patches for registration of related hcall
>> support
>> being added. This hcall is used by L1 kernel to get the set of guest
>> capabilities that are supported by L0 (Qemu TCG).
> 
> Changelog can drop "This patch". Probably don't have to be so
> detailed here either -- we already established that PAPR hcalls can
> be used with cap-nested-papr in the last patch, we know that L1
> kernels make the hcalls to the vhyp, etc.
> 
> "Introduce the nested PAPR hcall H_GUEST_GET_CAPABILITIES which
> is used to query the capabilities of the API and the L2 guests
> it provides."
> 
> I would squash this with set.
> 
>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_caps.c           |  1 +
>>   hw/ppc/spapr_nested.c         | 35 +++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr_nested.h |  6 ++++++
>>   3 files changed, 42 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>> index d3b9f107aa..cbe53a79ec 100644
>> --- a/hw/ppc/spapr_caps.c
>> +++ b/hw/ppc/spapr_caps.c
>> @@ -511,6 +511,7 @@ static void cap_nested_papr_apply(SpaprMachineState *spapr,
>>               return;
>>           }
>>           spapr->nested.api = NESTED_API_PAPR;
>> +        spapr_register_nested_phyp();
>>       } else if (kvm_enabled()) {
>>           /*
>>            * this gets executed in L1 qemu when L2 is launched,
> 
> Hmm, this doesn't match nested HV registration. If you want to register
> the hcalls in the cap apply, can you move spapr_register_nested()
> there first? It may make more sense to go in as a dummy function with
> the cap patch first, since you don't introduce all hcalls together.
> 
> Also phyp->papr. Scrub for phyp please.

Ah. I was going to say the opposite, since on an LPAR:

Architecture:            ppc64le
   Byte Order:            Little Endian
CPU(s):                  192
   On-line CPU(s) list:   0-191
Model name:              POWER10 (architected), altivec supported
   Model:                 2.0 (pvr 0080 0200)
   Thread(s) per core:    8
   Core(s) per socket:    6
   Socket(s):             4
Virtualization features:
   Hypervisor vendor:     pHyp   <-----
   Virtualization type:   para



C.


> 
>> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
>> index a669470f1a..37f3a49be2 100644
>> --- a/hw/ppc/spapr_nested.c
>> +++ b/hw/ppc/spapr_nested.c
>> @@ -6,6 +6,7 @@
>>   #include "hw/ppc/spapr.h"
>>   #include "hw/ppc/spapr_cpu_core.h"
>>   #include "hw/ppc/spapr_nested.h"
>> +#include "cpu-models.h"
>>   
>>   #ifdef CONFIG_TCG
>>   #define PRTS_MASK      0x1f
>> @@ -375,6 +376,29 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>>       address_space_unmap(CPU(cpu)->as, regs, len, len, true);
>>   }
>>   
>> +static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
>> +                                             SpaprMachineState *spapr,
>> +                                             target_ulong opcode,
>> +                                             target_ulong *args)
>> +{
>> +    CPUPPCState *env = &cpu->env;
>> +    target_ulong flags = args[0];
>> +
>> +    if (flags) { /* don't handle any flags capabilities for now */
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
>> +        (CPU_POWERPC_POWER9_BASE))
>> +        env->gpr[4] = H_GUEST_CAPABILITIES_P9_MODE;
>> +
>> +    if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
>> +        (CPU_POWERPC_POWER10_BASE))
>> +        env->gpr[4] = H_GUEST_CAPABILITIES_P10_MODE;
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>>   void spapr_register_nested(void)
>>   {
>>       spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
>> @@ -382,6 +406,12 @@ void spapr_register_nested(void)
>>       spapr_register_hypercall(KVMPPC_H_TLB_INVALIDATE, h_tlb_invalidate);
>>       spapr_register_hypercall(KVMPPC_H_COPY_TOFROM_GUEST, h_copy_tofrom_guest);
>>   }
>> +
>> +void spapr_register_nested_phyp(void)
>> +{
>> +    spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
>> +}
>> +
>>   #else
>>   void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>>   {
>> @@ -392,4 +422,9 @@ void spapr_register_nested(void)
>>   {
>>       /* DO NOTHING */
>>   }
>> +
>> +void spapr_register_nested_phyp(void)
>> +{
>> +    /* DO NOTHING */
>> +}
>>   #endif
>> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
>> index f8db31075b..ce198e9f70 100644
>> --- a/include/hw/ppc/spapr_nested.h
>> +++ b/include/hw/ppc/spapr_nested.h
>> @@ -189,6 +189,11 @@
>>   /* End of list of Guest State Buffer Element IDs */
>>   #define GSB_LAST                GSB_VCPU_SPR_ASDR
>>   
>> +/* Bit masks to be used in nested PAPR API */
>> +#define H_GUEST_CAPABILITIES_COPY_MEM 0x8000000000000000
>> +#define H_GUEST_CAPABILITIES_P9_MODE  0x4000000000000000
>> +#define H_GUEST_CAPABILITIES_P10_MODE 0x2000000000000000
> 
> See introducing these defines with the patch that uses them isn't so
> bad :)
> 
> Thanks,
> Nick
> 
>> +
>>   typedef struct SpaprMachineStateNestedGuest {
>>       unsigned long vcpus;
>>       struct SpaprMachineStateNestedGuestVcpu *vcpu;
>> @@ -331,6 +336,7 @@ struct nested_ppc_state {
>>   };
>>   
>>   void spapr_register_nested(void);
>> +void spapr_register_nested_phyp(void);
>>   void spapr_exit_nested(PowerPCCPU *cpu, int excp);
>>   
>>   #endif /* HW_SPAPR_NESTED_H */
> 
> 



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND 09/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE_VCPU
  2023-09-07  2:49   ` Nicholas Piggin
@ 2023-10-04  4:49     ` Harsh Prateek Bora
  0 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-10-04  4:49 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 08:19, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> This patch implements support for hcall H_GUEST_CREATE_VCPU which is
>> used to instantiate a new VCPU for a previously created nested guest.
>> The L1 provide the guest-id (returned by L0 during call to
>> H_GUEST_CREATE) and an associated unique vcpu-id to refer to this
>> instance in future calls. It is assumed that vcpu-ids are being
>> allocated in a sequential manner and max vcpu limit is 2048.
>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_nested.c         | 110 ++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr.h        |   1 +
>>   include/hw/ppc/spapr_nested.h |   1 +
>>   3 files changed, 112 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
>> index 09bbbfb341..e7956685af 100644
>> --- a/hw/ppc/spapr_nested.c
>> +++ b/hw/ppc/spapr_nested.c
>> @@ -376,6 +376,47 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>>       address_space_unmap(CPU(cpu)->as, regs, len, len, true);
>>   }
>>   
>> +static
>> +SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
>> +                                                     target_ulong lpid)
>> +{
>> +    SpaprMachineStateNestedGuest *guest;
>> +
>> +    guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(lpid));
>> +    return guest;
>> +}
> 
> Are you namespacing the new API stuff with papr or no? Might be good to
> reduce confusion.
> 
I guess you were referring to vcpu_check below.
Renaming vcpu_check to spapr_nested_vcpu_check().

>> +
>> +static bool vcpu_check(SpaprMachineStateNestedGuest *guest,
>> +                       target_ulong vcpuid,
>> +                       bool inoutbuf)
> 
> What's it checking? That the id is valid? Allocated? Enabled?
> 

This is being introduced to do sanity checks on the provided vcpuid of
a guest. It should check that the vcpuid is valid, allocated and enabled
before any further use.
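For what it's worth, the intended checks can be sketched standalone; the struct and field names below are simplified stand-ins for the QEMU types (the real function operates on SpaprMachineStateNestedGuest and takes the limit from spapr_nested.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NESTED_GUEST_VCPU_MAX 2048

/* Simplified stand-ins for the QEMU types (illustration only). */
typedef struct {
    bool enabled;
    uint64_t runbufin_addr;
    uint64_t runbufout_addr;
} NestedGuestVcpu;

typedef struct {
    unsigned long vcpus;      /* number of vcpus allocated so far */
    NestedGuestVcpu *vcpu;
} NestedGuest;

/* Validity check: vcpuid in range, allocated and enabled; optionally
 * also require the run in/out buffers to be registered. */
static bool spapr_nested_vcpu_check(NestedGuest *guest, uint64_t vcpuid,
                                    bool inoutbuf)
{
    NestedGuestVcpu *vcpu;

    if (vcpuid >= NESTED_GUEST_VCPU_MAX || vcpuid >= guest->vcpus) {
        return false;
    }

    vcpu = &guest->vcpu[vcpuid];
    if (!vcpu->enabled) {
        return false;
    }

    if (!inoutbuf) {
        return true;
    }

    /* in/out buffers must both be registered for a run request */
    return vcpu->runbufin_addr && vcpu->runbufout_addr;
}
```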

>> +{
>> +    struct SpaprMachineStateNestedGuestVcpu *vcpu;
>> +
>> +    if (vcpuid >= NESTED_GUEST_VCPU_MAX) {
>> +        return false;
>> +    }
>> +
>> +    if (!(vcpuid < guest->vcpus)) {
>> +        return false;
>> +    }
>> +
>> +    vcpu = &guest->vcpu[vcpuid];
>> +    if (!vcpu->enabled) {
>> +        return false;
>> +    }
>> +
>> +    if (!inoutbuf) {
>> +        return true;
>> +    }
>> +
>> +    /* Check to see if the in/out buffers are registered */
>> +    if (vcpu->runbufin.addr && vcpu->runbufout.addr) {
>> +        return true;
>> +    }
>> +

I think I shall move the in/out buffer related checks to the vcpu_run patch.

>> +    return false;
>> +}
>> +
>>   static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
>>                                                SpaprMachineState *spapr,
>>                                                target_ulong opcode,
>> @@ -448,6 +489,11 @@ static void
>>   destroy_guest_helper(gpointer value)
>>   {
>>       struct SpaprMachineStateNestedGuest *guest = value;
>> +    int i = 0;
> 
> Don't need to set i = 0 twice. A newline would be good though.
> 

Yeah, declaring it within the for loop and removing the init above.

>> +    for (i = 0; i < guest->vcpus; i++) {
>> +        cpu_ppc_tb_free(&guest->vcpu[i].env);
>> +    }
>> +    g_free(guest->vcpu);
>>       g_free(guest);
>>   }
>>   
>> @@ -518,6 +564,69 @@ static target_ulong h_guest_create(PowerPCCPU *cpu,
>>       return H_SUCCESS;
>>   }
>>   
>> +static target_ulong h_guest_create_vcpu(PowerPCCPU *cpu,
>> +                                        SpaprMachineState *spapr,
>> +                                        target_ulong opcode,
>> +                                        target_ulong *args)
>> +{
>> +    CPUPPCState *env = &cpu->env, *l2env;
>> +    target_ulong flags = args[0];
>> +    target_ulong lpid = args[1];
>> +    target_ulong vcpuid = args[2];
>> +    SpaprMachineStateNestedGuest *guest;
>> +
>> +    if (flags) { /* don't handle any flags for now */
>> +        return H_UNSUPPORTED_FLAG;
>> +    }
>> +
>> +    guest = spapr_get_nested_guest(spapr, lpid);
>> +    if (!guest) {
>> +        return H_P2;
>> +    }
>> +
>> +    if (vcpuid < guest->vcpus) {
>> +        return H_IN_USE;
>> +    }
>> +
>> +    if (guest->vcpus >= NESTED_GUEST_VCPU_MAX) {
>> +        return H_P3;
>> +    }
>> +
>> +    if (guest->vcpus) {
>> +        struct SpaprMachineStateNestedGuestVcpu *vcpus;
> 
> Ditto for using typedefs. Do a sweep for this.
> 
Sure, done.

>> +        vcpus = g_try_renew(struct SpaprMachineStateNestedGuestVcpu,
>> +                            guest->vcpu,
>> +                            guest->vcpus + 1);
> 
> g_try_renew doesn't work with NULL mem? That's unfortunate.
> 

Hmm, behaviour with NULL is undefined, so keeping it as is.
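As a side note, a plain-C analogue of this growth path using ISO C realloc (which does accept NULL, behaving like malloc) would not need separate first/subsequent allocation legs at all; the names below are illustrative only, not the QEMU ones:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int enabled;
} Vcpu;

/* Grow a vcpu array by one element and zero the new slot.  ISO C
 * guarantees realloc(NULL, n) behaves like malloc(n), so the same
 * code path covers both the first and subsequent allocations. */
static Vcpu *vcpu_array_grow(Vcpu *vcpus, size_t old_count)
{
    Vcpu *grown = realloc(vcpus, (old_count + 1) * sizeof(*grown));

    if (!grown) {
        return NULL;   /* caller keeps the old array, returns H_NO_MEM */
    }
    memset(&grown[old_count], 0, sizeof(*grown));
    return grown;
}
```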

>> +        if (!vcpus) {
>> +            return H_NO_MEM;
>> +        }
>> +        memset(&vcpus[guest->vcpus], 0,
>> +               sizeof(struct SpaprMachineStateNestedGuestVcpu));
>> +        guest->vcpu = vcpus;
>> +        l2env = &vcpus[guest->vcpus].env;
>> +    } else {
>> +        guest->vcpu = g_try_new0(struct SpaprMachineStateNestedGuestVcpu, 1);
>> +        if (guest->vcpu == NULL) {
>> +            return H_NO_MEM;
>> +        }
>> +        l2env = &guest->vcpu->env;
>> +    }
> 
> These two legs seem to be doing the same thing in different
> ways wrt l2env. Just assign guest->vcpu in the branches and
> get the l2env from guest->vcpu[guest->vcpus] afterward, no?
> 
Sure, that seems better.

>> +    /* need to memset to zero otherwise we leak L1 state to L2 */
>> +    memset(l2env, 0, sizeof(CPUPPCState));
> 
> AFAIKS you just zeroed it above.
> 

Yeah, cleaning up the redundant memset.

>> +    /* Copy L1 PVR to L2 */
>> +    l2env->spr[SPR_PVR] = env->spr[SPR_PVR];
>> +    cpu_ppc_tb_init(l2env, SPAPR_TIMEBASE_FREQ);
> 
> I would move this down to the end, because it's setting up the
> vcpu...
> 

Makes sense to reorder the chunks above and below.

>> +
>> +    guest->vcpus++;
>> +    assert(vcpuid < guest->vcpus); /* linear vcpuid allocation only */
>> +    guest->vcpu[vcpuid].enabled = true;
>> +
> 
> ... This is still allocating the vcpu so move it up.
> 
>> +    if (!vcpu_check(guest, vcpuid, false)) {
>> +        return H_PARAMETER;
>> +    }
>> +    return H_SUCCESS;
>> +}
>> +
>>   void spapr_register_nested(void)
>>   {
>>       spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
>> @@ -531,6 +640,7 @@ void spapr_register_nested_phyp(void)
>>       spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
>>       spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
>>       spapr_register_hypercall(H_GUEST_CREATE          , h_guest_create);
>> +    spapr_register_hypercall(H_GUEST_CREATE_VCPU     , h_guest_create_vcpu);
>>   }
>>   
>>   #else
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 8a6e9ce929..c9f9682a46 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -371,6 +371,7 @@ struct SpaprMachineState {
>>   #define H_UNSUPPORTED     -67
>>   #define H_OVERLAP         -68
>>   #define H_STATE           -75
>> +#define H_IN_USE          -77
> 
> Why add it here and not in the first patch?
> 

Yeah, it was a miss in the initial patch, but I guess we want it here
for v2, introducing definitions where they are first used.

>>   #define H_INVALID_ELEMENT_ID               -79
>>   #define H_INVALID_ELEMENT_SIZE             -80
>>   #define H_INVALID_ELEMENT_VALUE            -81
>> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
>> index 7841027df8..2e8c6ba1ca 100644
>> --- a/include/hw/ppc/spapr_nested.h
>> +++ b/include/hw/ppc/spapr_nested.h
>> @@ -199,6 +199,7 @@
>>   
>>   /* Nested PAPR API macros */
>>   #define NESTED_GUEST_MAX 4096
>> +#define NESTED_GUEST_VCPU_MAX 2048
>>   
> 
> PAPR_ prefix?
> 
Done.

Thanks
Harsh
>>   typedef struct SpaprMachineStateNestedGuest {
>>       unsigned long vcpus;
> 



* Re: [PATCH RESEND 10/15] ppc: spapr: Initialize the GSB Elements lookup table.
  2023-09-07  3:01   ` Nicholas Piggin
@ 2023-10-04  9:27     ` Harsh Prateek Bora
  2023-10-04  9:42       ` Harsh Prateek Bora
  0 siblings, 1 reply; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-10-04  9:27 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 08:31, Nicholas Piggin wrote:
> Might be good to add a common nested: prefix to all patches actually.
> 
Noted.

> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> This is a first step towards enabling support for nested PAPR hcalls for
>> providing the get/set of various Guest State Buffer (GSB) elements via
>> h_guest_[g|s]et_state hcalls. This enables for identifying correct
>> callbacks for get/set for each of the elements supported via
>> h_guest_[g|s]et_state hcalls, support for which is added in next patch.
> 
> Changelog could use work.
> 
Sure, will update.

>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_hcall.c          |   1 +
>>   hw/ppc/spapr_nested.c         | 487 ++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/ppc.h          |   2 +
>>   include/hw/ppc/spapr_nested.h | 102 +++++++
>>   4 files changed, 592 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index 9b1f225d4a..ca609cb5a4 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -1580,6 +1580,7 @@ static void hypercall_register_types(void)
>>       spapr_register_hypercall(KVMPPC_H_UPDATE_DT, h_update_dt);
>>   
>>       spapr_register_nested();
>> +    init_nested();
> 
> This is for hcall registration, not general subsystem init I think.
> Arguably not sure if it matters, it just looks odd for everything
> else to be an hcall except this. I would just add a new init
> function.

I have introduced a new spapr_nested_init routine in spapr_nested.c,
which shall be called from spapr_instance_init. I think we can move the
GSB init there.

> 
> And actually now I look closer at this, I would not do your papr
> hcall init in the cap apply function, if it is possible to do
> inside spapr_register_nested(), then that function could look at
> which caps are enabled and register the appropriate hcalls. Then
> no change to move this into cap code.
> 

IIRC, I had initially tried that during early development but faced
runtime issues with spapr init at this stage, which is needed to
identify nested.api. However, keeping cap-specific registration in the
cap apply function made more sense to me. Further optimizations can be
taken up later though.

>>   }
>>   
>>   type_init(hypercall_register_types)
>> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
>> index e7956685af..6fbb1bcb02 100644
>> --- a/hw/ppc/spapr_nested.c
>> +++ b/hw/ppc/spapr_nested.c
> 
> [snip]
> 
> My eyes are going square, I'll review this later.
> 

Sure.

>> diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
>> index e095c002dc..d7acc28d17 100644
>> --- a/include/hw/ppc/ppc.h
>> +++ b/include/hw/ppc/ppc.h
>> @@ -33,6 +33,8 @@ struct ppc_tb_t {
>>       QEMUTimer *decr_timer;
>>       /* Hypervisor decrementer management */
>>       uint64_t hdecr_next;    /* Tick for next hdecr interrupt  */
>> +    /* TB that HDEC should fire and return ctrl back to the Host partition */
>> +    uint64_t hdecr_expiry_tb;
> 
> Why is this here?

Since there is an existing hypervisor-decrementer-related variable, it
appeared appropriate to me to keep it there. Will move it inside
SpaprMachineStateNestedGuestVcpu if that sounds better.

> 
>>       QEMUTimer *hdecr_timer;
>>       int64_t purr_offset;
>>       void *opaque;
>> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
>> index 2e8c6ba1ca..3c0d6a486e 100644
>> --- a/include/hw/ppc/spapr_nested.h
>> +++ b/include/hw/ppc/spapr_nested.h
> 
> [snip]
> 
>>   
>> +struct guest_state_element_type {
>> +    uint16_t id;
>> +    int size;
>> +#define GUEST_STATE_ELEMENT_TYPE_FLAG_GUEST_WIDE 0x1
>> +#define GUEST_STATE_ELEMENT_TYPE_FLAG_READ_ONLY  0x2
>> +   uint16_t flags;
>> +    void *(*location)(SpaprMachineStateNestedGuest *, target_ulong);
>> +    size_t offset;
>> +    void (*copy)(void *, void *, bool);
>> +    uint64_t mask;
>> +};
> 
> I have to wonder whether this is the best way to go. Having
> these indicrect function calls and array of "ops" like this
> might be limiting the compiler. I wonder if it should just
> be done in a switch table, which is how most interpreters
> I've seen (which admittedly is not many) seem to do it.
> 
Hmm, this approach was chosen after evaluating other options, as it
appeared better. I think we can move forward with the existing approach,
and any further optimizations can be taken up in a follow-up patch.
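For comparison, the switch-table shape Nick suggests might look roughly like this; the element IDs and state fields below are hypothetical placeholders, not the real GSB IDs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical element IDs, for illustration only. */
enum { GSB_GPR0 = 0x1000, GSB_NIA = 0x1001 };

struct vcpu_state {
    uint64_t gpr0;
    uint64_t nia;
};

/* Switch-based dispatch: the compiler sees every case directly,
 * instead of an array of function pointers it cannot inline. */
static int gs_copy(struct vcpu_state *v, uint16_t id, uint64_t *val, int set)
{
    uint64_t *field;

    switch (id) {
    case GSB_GPR0:
        field = &v->gpr0;
        break;
    case GSB_NIA:
        field = &v->nia;
        break;
    default:
        return -1;            /* unknown element */
    }

    if (set) {
        *field = *val;
    } else {
        *val = *field;
    }
    return 0;
}
```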

regards,
Harsh

> Thanks,
> Nick
> 



* Re: [PATCH RESEND 10/15] ppc: spapr: Initialize the GSB Elements lookup table.
  2023-10-04  9:27     ` Harsh Prateek Bora
@ 2023-10-04  9:42       ` Harsh Prateek Bora
  0 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-10-04  9:42 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 10/4/23 14:57, Harsh Prateek Bora wrote:
>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>>> index 9b1f225d4a..ca609cb5a4 100644
>>> --- a/hw/ppc/spapr_hcall.c
>>> +++ b/hw/ppc/spapr_hcall.c
>>> @@ -1580,6 +1580,7 @@ static void hypercall_register_types(void)
>>>       spapr_register_hypercall(KVMPPC_H_UPDATE_DT, h_update_dt);
>>>       spapr_register_nested();
>>> +    init_nested();
>>
>> This is for hcall registration, not general subsystem init I think.
>> Arguably not sure if it matters, it just looks odd for everything
>> else to be an hcall except this. I would just add a new init
>> function.
> 
> I have introduced a new spapr_nested_init routine in spapr_nested.c 
> which shall be called from spapr_instance_init. I think we can move GSB 
> init there.

I revisited the code and feel it is better to do it after the hypercall
registrations in cap apply, for nested-papr only, as this init is needed
only for the nested PAPR API.



* Re: [PATCH RESEND 11/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_[GET|SET]_STATE
  2023-09-07  3:30   ` Nicholas Piggin
@ 2023-10-09  8:23     ` Harsh Prateek Bora
  0 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-10-09  8:23 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 09:00, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> L1 can reuest to get/set state of any of the supported Guest State
>> Buffer (GSB) elements using h_guest_[get|set]_state hcalls.
>> These hcalls needs to do some necessary validation check for each
>> get/set request based on the flags passed and operation supported.
>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_nested.c         | 267 ++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr_nested.h |  22 +++
>>   2 files changed, 289 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
>> index 6fbb1bcb02..498e7286fa 100644
>> --- a/hw/ppc/spapr_nested.c
>> +++ b/hw/ppc/spapr_nested.c
>> @@ -897,6 +897,138 @@ void init_nested(void)
>>       }
>>   }
>>   
>> +static struct guest_state_element *guest_state_element_next(
>> +    struct guest_state_element *element,
>> +    int64_t *len,
>> +    int64_t *num_elements)
>> +{
>> +    uint16_t size;
>> +
>> +    /* size is of element->value[] only. Not whole guest_state_element */
>> +    size = be16_to_cpu(element->size);
>> +
>> +    if (len) {
>> +        *len -= size + offsetof(struct guest_state_element, value);
>> +    }
>> +
>> +    if (num_elements) {
>> +        *num_elements -= 1;
>> +    }
>> +
>> +    return (struct guest_state_element *)(element->value + size);
>> +}
>> +
>> +static
>> +struct guest_state_element_type *guest_state_element_type_find(uint16_t id)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(guest_state_element_types); i++)
>> +        if (id == guest_state_element_types[i].id) {
>> +            return &guest_state_element_types[i];
>> +        }
>> +
>> +    return NULL;
>> +}
>> +
>> +static void print_element(struct guest_state_element *element,
>> +                          struct guest_state_request *gsr)
>> +{
>> +    printf("id:0x%04x size:0x%04x %s ",
>> +           be16_to_cpu(element->id), be16_to_cpu(element->size),
>> +           gsr->flags & GUEST_STATE_REQUEST_SET ? "set" : "get");
>> +    printf("buf:0x%016lx ...\n", be64_to_cpu(*(uint64_t *)element->value));
> 
> No printfs. These could be GUEST_ERROR qemu logs if anything, make
> sure they're relatively well formed messages if you keep them, i.e.,
> something a Linux/KVM developer could understand what went wrong.
> I.e., no __func__ which is internal to QEMU, use "H_GUEST_GET_STATE"
> etc. Ditto for all the rest of the printfs.
> 

Sure, changing to qemu_log_mask(LOG_GUEST_ERROR, "h_guest_%s_state ..."

>> +}
>> +
>> +static bool guest_state_request_check(struct guest_state_request *gsr)
>> +{
>> +    int64_t num_elements, len = gsr->len;
>> +    struct guest_state_buffer *gsb = gsr->gsb;
>> +    struct guest_state_element *element;
>> +    struct guest_state_element_type *type;
>> +    uint16_t id, size;
>> +
>> +    /* gsb->num_elements = 0 == 32 bits long */
>> +    assert(len >= 4);
> 
> I haven't looked closely, but can the guest can't crash the
> host with malformed requests here?
> 
The GSB communication happens between the L1 host and L0 only; the L2
guest does not participate and remains unaware of this state exchange.
Hence, an L1 issuing a malformed request can only crash itself, not L2.
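To illustrate the kind of early length sanitizing being discussed, here is a minimal standalone walk over a guest-state-buffer-like layout; endianness handling is omitted and the struct is a simplified stand-in for the real wire format:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Wire format of a guest state element (big-endian on the wire;
 * plain host-endian here to keep the illustration simple). */
struct gs_element {
    uint16_t id;
    uint16_t size;        /* size of value[] only */
    uint8_t value[];
};

/* Walk a buffer of `num` elements and verify the declared sizes
 * never run past `len` bytes.  Returns 1 if the buffer is well
 * formed, 0 otherwise -- the early sanitizing suggested above. */
static int gsb_len_ok(const uint8_t *buf, int64_t len, uint32_t num)
{
    const uint8_t *p = buf;

    while (num--) {
        const struct gs_element *e;

        if (len < (int64_t)sizeof(struct gs_element)) {
            return 0;     /* not even room for an element header */
        }
        e = (const struct gs_element *)p;
        len -= (int64_t)(sizeof(struct gs_element) + e->size);
        if (len < 0) {
            return 0;     /* element value overruns the buffer */
        }
        p += sizeof(struct gs_element) + e->size;
    }
    return 1;
}
```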

> This API is pretty complicated, make sure you sanitize all inputs
> carefully, as early as possible, and without too deep a call and
> control flow chain from the API entry point.
> 

Noted.

> 
>> +
>> +    num_elements = be32_to_cpu(gsb->num_elements);
>> +    element = gsb->elements;
>> +    len -= sizeof(gsb->num_elements);
>> +
>> +    /* Walk the buffer to validate the length */
>> +    while (num_elements) {
>> +
>> +        id = be16_to_cpu(element->id);
>> +        size = be16_to_cpu(element->size);
>> +
>> +        if (false) {
>> +            print_element(element, gsr);
>> +        }
>> +        /* buffer size too small */
>> +        if (len < 0) {
>> +            return false;
>> +        }
>> +
>> +        type = guest_state_element_type_find(id);
>> +        if (!type) {
>> +            printf("%s: Element ID %04x unknown\n", __func__, id);
>> +            print_element(element, gsr);
>> +            return false;
>> +        }
>> +
>> +        if (id == GSB_HV_VCPU_IGNORED_ID) {
>> +            goto next_element;
>> +        }
>> +
>> +        if (size != type->size) {
>> +            printf("%s: Size mismatch. Element ID:%04x. Size Exp:%i Got:%i\n",
>> +                   __func__, id, type->size, size);
>> +            print_element(element, gsr);
>> +            return false;
>> +        }
>> +
>> +        if ((type->flags & GUEST_STATE_ELEMENT_TYPE_FLAG_READ_ONLY) &&
>> +            (gsr->flags & GUEST_STATE_REQUEST_SET)) {
>> +            printf("%s: trying to set a read-only Element ID:%04x.\n",
>> +                   __func__, id);
>> +            return false;
>> +        }
>> +
>> +        if (type->flags & GUEST_STATE_ELEMENT_TYPE_FLAG_GUEST_WIDE) {
>> +            /* guest wide element type */
>> +            if (!(gsr->flags & GUEST_STATE_REQUEST_GUEST_WIDE)) {
>> +                printf("%s: trying to set a guest wide Element ID:%04x.\n",
>> +                       __func__, id);
>> +                return false;
>> +            }
>> +        } else {
>> +            /* thread wide element type */
>> +            if (gsr->flags & GUEST_STATE_REQUEST_GUEST_WIDE) {
>> +                printf("%s: trying to set a thread wide Element ID:%04x.\n",
>> +                       __func__, id);
>> +                return false;
>> +            }
>> +        }
>> +next_element:
>> +        element = guest_state_element_next(element, &len, &num_elements);
>> +
>> +    }
>> +    return true;
>> +}
>> +
>> +static bool is_gsr_invalid(struct guest_state_request *gsr,
>> +                                   struct guest_state_element *element,
>> +                                   struct guest_state_element_type *type)
>> +{
>> +    if ((gsr->flags & GUEST_STATE_REQUEST_SET) &&
>> +        (*(uint64_t *)(element->value) & ~(type->mask))) {
>> +        print_element(element, gsr);
>> +        printf("L1 can't set reserved bits (allowed mask: 0x%08lx)\n",
>> +               type->mask);
>> +        return true;
>> +    }
>> +    return false;
>> +}
>>   
>>   static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
>>                                                SpaprMachineState *spapr,
>> @@ -1108,6 +1240,139 @@ static target_ulong h_guest_create_vcpu(PowerPCCPU *cpu,
>>       return H_SUCCESS;
>>   }
>>   
>> +static target_ulong getset_state(SpaprMachineStateNestedGuest *guest,
>> +                                 uint64_t vcpuid,
>> +                                 struct guest_state_request *gsr)
>> +{
>> +    void *ptr;
>> +    uint16_t id;
>> +    struct guest_state_element *element;
>> +    struct guest_state_element_type *type;
>> +    int64_t lenleft, num_elements;
>> +
>> +    lenleft = gsr->len;
>> +
>> +    if (!guest_state_request_check(gsr)) {
>> +        return H_P3;
>> +    }
>> +
>> +    num_elements = be32_to_cpu(gsr->gsb->num_elements);
>> +    element = gsr->gsb->elements;
>> +    /* Process the elements */
>> +    while (num_elements) {
>> +        type = NULL;
>> +        /* Debug print before doing anything */
>> +        if (false) {
>> +            print_element(element, gsr);
>> +        }
>> +
>> +        id = be16_to_cpu(element->id);
>> +        if (id == GSB_HV_VCPU_IGNORED_ID) {
>> +            goto next_element;
>> +        }
>> +
>> +        type = guest_state_element_type_find(id);
>> +        assert(type);
>> +
>> +        /* Get pointer to guest data to get/set */
>> +        if (type->location && type->copy) {
>> +            ptr = type->location(guest, vcpuid);
>> +            assert(ptr);
>> +            if (!~(type->mask) && is_gsr_invalid(gsr, element, type)) {
>> +                return H_INVALID_ELEMENT_VALUE;
>> +            }
>> +            type->copy(ptr + type->offset, element->value,
>> +                       gsr->flags & GUEST_STATE_REQUEST_SET ? true : false);
>> +        }
>> +
>> +next_element:
>> +        element = guest_state_element_next(element, &lenleft, &num_elements);
>> +    }
>> +
>> +    return H_SUCCESS;
>> +}
>> +
>> +static target_ulong map_and_getset_state(PowerPCCPU *cpu,
>> +                                         SpaprMachineStateNestedGuest *guest,
>> +                                         uint64_t vcpuid,
>> +                                         struct guest_state_request *gsr)
>> +{
>> +    target_ulong rc;
>> +    int64_t lenleft, len;
>> +    bool is_write;
>> +
>> +    assert(gsr->len < (1024 * 1024)); /* sanity check */
> 
> Use a #define for this, make sure guest can't crash host.

Defined a macro GSB_MAX_BUF_SIZE for this and moved the check to the
caller. As explained earlier, a nested guest can't crash the host, as
the get/set happens only between L1 and L0; L2 doesn't participate.
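A minimal sketch of that caller-side check, with the macro and error code mirrored as local constants (the H_P3 value here is illustrative; the real definition lives in spapr.h):

```c
#include <assert.h>
#include <stdint.h>

#define GSB_MAX_BUF_SIZE (1024 * 1024)   /* 1 MiB cap, per the discussion */
#define H_SUCCESS 0
#define H_P3      (-56)                  /* illustrative value */

/* Reject over-sized buffers with an hcall error instead of assert(),
 * so a misbehaving L1 gets a failure code rather than aborting QEMU. */
static int64_t check_buf_len(uint64_t buflen)
{
    if (buflen > GSB_MAX_BUF_SIZE) {
        return H_P3;
    }
    return H_SUCCESS;
}
```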

>> +
>> +    lenleft = len = gsr->len;
> 
> Why lenleft? Can't you just check gsr->len like you do gsr->gsb?

My bad, updated.

> 
>> +    gsr->gsb = address_space_map(CPU(cpu)->as, gsr->buf, (uint64_t *)&len,
>> +                                 false, MEMTXATTRS_UNSPECIFIED);
> 
> So it's a read-only memory access to gsr->buf? Even for the set?

Hmm, actually set_state should only need RO access, while get_state
would need RW access to the provided buffer. However, I'm not sure if
there is a bug in these routines, as it has been working like this so
far. I shall update and re-validate.

> 
>> +    if (!gsr->gsb) {
>> +        rc = H_P3;
>> +        goto out1;
>> +    }
>> +
>> +    if (len != lenleft) {
>> +        rc = H_P3;
>> +        goto out1;
>> +    }
>> +
>> +    rc = getset_state(guest, vcpuid, gsr);
>> +
>> +out1:
>> +    is_write = (rc == H_SUCCESS) ? len : 0;
>> +    address_space_unmap(CPU(cpu)->as, gsr->gsb, len, is_write, false);
> 
> I don't think this is right, you want to specify the length of memory
> you actually accessed, even if there was some error.
> 
> Over-specifying I think would be okay. So I think just use len.

Hmm, we are specifying len as expected; it's the is_write arg that is
wrongly set. I think this got carried forward from existing code as a
typo, as I see most of the unmaps in spapr_exit_nested are passing the
last two args incorrectly. I shall update this call as appropriate for
now, and bugs in the existing code can be fixed in separate patches.

> 
> 
>> +    return rc;
>> +}
>> +
>> +static target_ulong h_guest_getset_state(PowerPCCPU *cpu,
>> +                                         SpaprMachineState *spapr,
>> +                                         target_ulong *args,
>> +                                         bool set)
>> +{
>> +    target_ulong flags = args[0];
>> +    target_ulong lpid = args[1];
>> +    target_ulong vcpuid = args[2];
>> +    target_ulong buf = args[3];
>> +    target_ulong buflen = args[4];
>> +    struct guest_state_request gsr;
>> +    SpaprMachineStateNestedGuest *guest;
>> +
>> +    guest = spapr_get_nested_guest(spapr, lpid);
>> +    if (!guest) {
>> +        return H_P2;
>> +    }
>> +    gsr.buf = buf;
>> +    gsr.len = buflen;
>> +    gsr.flags = 0;
> 
> Not a big fan of packaging up some args into a structure,
> especially if it's pretty static to a file and no need to be
> carried around with some data. Do you even need this gsr
> thing?

IMHO, it makes sense to keep the related metadata for a guest state
request together in this case. It also helps reduce the number of args
being passed to the multiple helper routines down the path, each of
which may use one or more of its members. If you have strong objections
for better reasons, I am willing to revisit this.
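As a sketch, the packaging plus flag validation could be factored like this; the constants are mirrored locally for illustration, and note that rejecting unsupported flag bits takes the bitwise complement `~`, not logical `!`:

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_GUEST_WIDE  0x8000000000000000ULL
#define REQ_GUEST_WIDE   0x1u
#define REQ_SET          0x2u

#define H_SUCCESS    0
#define H_PARAMETER  (-4)

struct gs_request {
    uint64_t buf;
    uint64_t len;
    uint16_t flags;
};

/* Translate the hcall flag word into the internal request flags,
 * rejecting any flag bit that is not yet supported.  The check uses
 * the bitwise complement `~`, not logical `!`. */
static int64_t build_request(uint64_t flags, uint64_t buf, uint64_t len,
                             int set, struct gs_request *gsr)
{
    if (flags & ~FLAG_GUEST_WIDE) {
        return H_PARAMETER;   /* only GUEST_WIDE is supported for now */
    }

    gsr->buf = buf;
    gsr->len = len;
    gsr->flags = 0;
    if (flags & FLAG_GUEST_WIDE) {
        gsr->flags |= REQ_GUEST_WIDE;
    }
    if (set) {
        gsr->flags |= REQ_SET;
    }
    return H_SUCCESS;
}
```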

> 
>> +    if (flags & H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE) {
>> +        gsr.flags |= GUEST_STATE_REQUEST_GUEST_WIDE;
>> +    }
>> +    if (flags & !H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE) {
>> +        return H_PARAMETER; /* flag not supported yet */
>> +    }
>> +
>> +    if (set) {
>> +        gsr.flags |= GUEST_STATE_REQUEST_SET;
>> +    }
>> +    return map_and_getset_state(cpu, guest, vcpuid, &gsr);
>> +}
>> +
>> +static target_ulong h_guest_set_state(PowerPCCPU *cpu,
>> +                                      SpaprMachineState *spapr,
>> +                                      target_ulong opcode,
>> +                                      target_ulong *args)
>> +{
>> +    return h_guest_getset_state(cpu, spapr, args, true);
>> +}
>> +
>> +static target_ulong h_guest_get_state(PowerPCCPU *cpu,
>> +                                      SpaprMachineState *spapr,
>> +                                      target_ulong opcode,
>> +                                      target_ulong *args)
>> +{
>> +    return h_guest_getset_state(cpu, spapr, args, false);
>> +}
>> +
>>   void spapr_register_nested(void)
>>   {
>>       spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
>> @@ -1122,6 +1387,8 @@ void spapr_register_nested_phyp(void)
>>       spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
>>       spapr_register_hypercall(H_GUEST_CREATE          , h_guest_create);
>>       spapr_register_hypercall(H_GUEST_CREATE_VCPU     , h_guest_create_vcpu);
>> +    spapr_register_hypercall(H_GUEST_SET_STATE       , h_guest_set_state);
>> +    spapr_register_hypercall(H_GUEST_GET_STATE       , h_guest_get_state);
>>   }
>>   
>>   #else
>> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
>> index 3c0d6a486e..eaee624b87 100644
>> --- a/include/hw/ppc/spapr_nested.h
>> +++ b/include/hw/ppc/spapr_nested.h
>> @@ -206,6 +206,9 @@
>>   #define HVMASK_MSR            0xEBFFFFFFFFBFEFFF
>>   #define HVMASK_HDEXCR         0x00000000FFFFFFFF
>>   #define HVMASK_TB_OFFSET      0x000000FFFFFFFFFF
>> +#define H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE 0x8000000000000000 /* BE in GSB */
>> +#define GUEST_STATE_REQUEST_GUEST_WIDE       0x1
>> +#define GUEST_STATE_REQUEST_SET              0x2
>>   
>>   #define GUEST_STATE_ELEMENT(i, sz, s, f, ptr, c) { \
>>       .id = (i),                                     \
>> @@ -336,6 +339,25 @@ struct guest_state_element_type {
>>       uint64_t mask;
>>   };
>>   
>> +struct guest_state_element {
>> +    uint16_t id;   /* Big Endian */
>> +    uint16_t size; /* Big Endian */
>> +    uint8_t value[]; /* Big Endian (based on size above) */
>> +} QEMU_PACKED;
>> +
>> +struct guest_state_buffer {
>> +    uint32_t num_elements; /* Big Endian */
>> +    struct guest_state_element elements[];
>> +} QEMU_PACKED;
> 
> I think it's probably enough to add one comment saying the PAPR
> API numbers are all in BE format. This is actually expected of PAPR
> so it goes without saying really, but the nested HV API actually had
> some things in guest endian format so it's worth calling out.
> 
> Actually maybe single out the nested HV structures as different. I
> don't know if the upstream code actually handles endian properly...
> 

Sure, removing all the BE-related comments from the changes in this series for now.

regards,
Harsh

> Thanks,
> Nick
> 
>> +
>> +/* Actual buffer plus some metadata about the request */
>> +struct guest_state_request {
>> +    struct guest_state_buffer *gsb;
>> +    int64_t buf;
>> +    int64_t len;
>> +    uint16_t flags;
>> +};
>> +
>>   /*
>>    * Register state for entering a nested guest with H_ENTER_NESTED.
>>    * New member must be added at the end.
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND 13/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_RUN_VCPU
  2023-09-07  3:55   ` Nicholas Piggin
@ 2023-10-12 10:23     ` Harsh Prateek Bora
  0 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-10-12 10:23 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 09:25, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> Once the L1 has created a nested guest and its associated VCPU, it can
>> request execution of the nested guest after setting its initial state,
>> which can be done either via h_guest_set_state or via the input
>> buffers passed along with the call to h_guest_run_vcpu(). On guest exit, L0
>> uses output buffers to convey the exit cause to the L1. L0 takes care of
>> switching context from L1 to L2 during guest entry and restores L1 context
>> on guest exit.
>>
>> Unlike nested-hv, the L2 (nested) guest's entire state is retained by
>> the L0 after guest exit and restored on next entry in the case of nested-papr.
>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   hw/ppc/spapr_nested.c           | 471 +++++++++++++++++++++++++++-----
>>   include/hw/ppc/spapr_cpu_core.h |   7 +-
>>   include/hw/ppc/spapr_nested.h   |   6 +
>>   3 files changed, 408 insertions(+), 76 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
>> index 67e389a762..3605f27115 100644
>> --- a/hw/ppc/spapr_nested.c
>> +++ b/hw/ppc/spapr_nested.c
>> @@ -12,6 +12,17 @@
>>   #ifdef CONFIG_TCG
>>   #define PRTS_MASK      0x1f
>>   
>> +static void exit_nested_restore_vcpu(PowerPCCPU *cpu, int excp,
>> +                                     SpaprMachineStateNestedGuestVcpu *vcpu);
>> +static void exit_process_output_buffer(PowerPCCPU *cpu,
>> +                                      SpaprMachineStateNestedGuest *guest,
>> +                                      target_ulong vcpuid,
>> +                                      target_ulong *r3);
>> +static void restore_common_regs(CPUPPCState *dst, CPUPPCState *src);
>> +static bool vcpu_check(SpaprMachineStateNestedGuest *guest,
>> +                       target_ulong vcpuid,
>> +                       bool inoutbuf);
>> +
>>   static target_ulong h_set_ptbl(PowerPCCPU *cpu,
>>                                  SpaprMachineState *spapr,
>>                                  target_ulong opcode,
>> @@ -187,21 +198,21 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
>>           return H_PARAMETER;
>>       }
>>   
>> -    spapr_cpu->nested_host_state = g_try_new(struct nested_ppc_state, 1);
>> -    if (!spapr_cpu->nested_host_state) {
>> +    spapr_cpu->nested_hv_host = g_try_new(struct nested_ppc_state, 1);
>> +    if (!spapr_cpu->nested_hv_host) {
>>           return H_NO_MEM;
>>       }
> 
> Don't rename existing thing in the same patch as adding new thing.
> 

Sure, renaming in a separate patch before this patch.

>>   
>>       assert(env->spr[SPR_LPIDR] == 0);
>>       assert(env->spr[SPR_DPDES] == 0);
>> -    nested_save_state(spapr_cpu->nested_host_state, cpu);
>> +    nested_save_state(spapr_cpu->nested_hv_host, cpu);
>>   
>>       len = sizeof(*regs);
>>       regs = address_space_map(CPU(cpu)->as, regs_ptr, &len, false,
>>                                   MEMTXATTRS_UNSPECIFIED);
>>       if (!regs || len != sizeof(*regs)) {
>>           address_space_unmap(CPU(cpu)->as, regs, len, 0, false);
>> -        g_free(spapr_cpu->nested_host_state);
>> +        g_free(spapr_cpu->nested_hv_host);
>>           return H_P2;
>>       }
>>   
>> @@ -276,105 +287,146 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
>>   
>>   void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>>   {
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    CPUState *cs = CPU(cpu);
> 
> I think it would be worth seeing how it looks to split these into
> original and papr functions rather than try mash them together.
> 

Yeah, that sounds like a better approach. Updated the patch to keep the
nested-hv code untouched and both API code flows isolated.

>>       CPUPPCState *env = &cpu->env;
>>       SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
>> +    target_ulong r3_return = env->excp_vectors[excp]; /* hcall return value */
>>       struct nested_ppc_state l2_state;
>> -    target_ulong hv_ptr = spapr_cpu->nested_host_state->gpr[4];
>> -    target_ulong regs_ptr = spapr_cpu->nested_host_state->gpr[5];
>> -    target_ulong hsrr0, hsrr1, hdar, asdr, hdsisr;
>> +    target_ulong hv_ptr, regs_ptr;
>> +    target_ulong hsrr0 = 0, hsrr1 = 0, hdar = 0, asdr = 0, hdsisr = 0;
>>       struct kvmppc_hv_guest_state *hvstate;
>>       struct kvmppc_pt_regs *regs;
>>       hwaddr len;
>> +    target_ulong lpid = 0, vcpuid = 0;
>> +    struct SpaprMachineStateNestedGuestVcpu *vcpu = NULL;
>> +    struct SpaprMachineStateNestedGuest *guest = NULL;
>>   
>>       assert(spapr_cpu->in_nested);
>> -
>> -    nested_save_state(&l2_state, cpu);
>> -    hsrr0 = env->spr[SPR_HSRR0];
>> -    hsrr1 = env->spr[SPR_HSRR1];
>> -    hdar = env->spr[SPR_HDAR];
>> -    hdsisr = env->spr[SPR_HDSISR];
>> -    asdr = env->spr[SPR_ASDR];
>> +    if (spapr->nested.api == NESTED_API_KVM_HV) {
>> +        nested_save_state(&l2_state, cpu);
>> +        hsrr0 = env->spr[SPR_HSRR0];
>> +        hsrr1 = env->spr[SPR_HSRR1];
>> +        hdar = env->spr[SPR_HDAR];
>> +        hdsisr = env->spr[SPR_HDSISR];
>> +        asdr = env->spr[SPR_ASDR];
>> +    } else if (spapr->nested.api == NESTED_API_PAPR) {
>> +        lpid = spapr_cpu->nested_papr_host->gpr[5];
>> +        vcpuid = spapr_cpu->nested_papr_host->gpr[6];
>> +        guest = spapr_get_nested_guest(spapr, lpid);
>> +        assert(guest);
>> +        vcpu_check(guest, vcpuid, false);
>> +        vcpu = &guest->vcpu[vcpuid];
>> +
>> +        exit_nested_restore_vcpu(cpu, excp, vcpu);
>> +        /* do the output buffer for run_vcpu*/
>> +        exit_process_output_buffer(cpu, guest, vcpuid, &r3_return);
>> +    } else
>> +        g_assert_not_reached();
>>   
>>       /*
>>        * Switch back to the host environment (including for any error).
>>        */
>>       assert(env->spr[SPR_LPIDR] != 0);
>> -    nested_load_state(cpu, spapr_cpu->nested_host_state);
>> -    env->gpr[3] = env->excp_vectors[excp]; /* hcall return value */
>>   
>> -    cpu_ppc_hdecr_exit(env);
>> +    if (spapr->nested.api == NESTED_API_KVM_HV) {
>> +        nested_load_state(cpu, spapr_cpu->nested_hv_host);
>> +        env->gpr[3] = r3_return;
>> +    } else if (spapr->nested.api == NESTED_API_PAPR) {
>> +        restore_common_regs(env, spapr_cpu->nested_papr_host);
>> +        env->tb_env->tb_offset -= vcpu->tb_offset;
>> +        env->gpr[3] = H_SUCCESS;
>> +        env->gpr[4] = r3_return;
>> +        hreg_compute_hflags(env);
>> +        ppc_maybe_interrupt(env);
>> +        tlb_flush(cs);
>> +        env->reserve_addr = -1; /* Reset the reservation */
> 
> There's a bunch of stuff that's getting duplicated anyway, so
> it's actually not clear that this maze of if statements makes
> it simpler to see that nothing is missed.
> 

Yeah, refactored to keep separate routines for nested-hv and nested-papr
to avoid touching the existing nested-hv API logic.

>> +    }
>>   
>> -    spapr_cpu->in_nested = false;
>> +    cpu_ppc_hdecr_exit(env);
>>   
>> -    g_free(spapr_cpu->nested_host_state);
>> -    spapr_cpu->nested_host_state = NULL;
>> +    if (spapr->nested.api == NESTED_API_KVM_HV) {
>> +        hv_ptr = spapr_cpu->nested_hv_host->gpr[4];
>> +        regs_ptr = spapr_cpu->nested_hv_host->gpr[5];
>> +
>> +        len = sizeof(*hvstate);
>> +        hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
>> +                                    MEMTXATTRS_UNSPECIFIED);
>> +        if (len != sizeof(*hvstate)) {
>> +            address_space_unmap(CPU(cpu)->as, hvstate, len, 0, true);
>> +            env->gpr[3] = H_PARAMETER;
>> +            return;
>> +        }
>>   
>> -    len = sizeof(*hvstate);
>> -    hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
>> -                                MEMTXATTRS_UNSPECIFIED);
>> -    if (len != sizeof(*hvstate)) {
>> -        address_space_unmap(CPU(cpu)->as, hvstate, len, 0, true);
>> -        env->gpr[3] = H_PARAMETER;
>> -        return;
>> -    }
>> +        hvstate->cfar = l2_state.cfar;
>> +        hvstate->lpcr = l2_state.lpcr;
>> +        hvstate->pcr = l2_state.pcr;
>> +        hvstate->dpdes = l2_state.dpdes;
>> +        hvstate->hfscr = l2_state.hfscr;
>> +
>> +        if (excp == POWERPC_EXCP_HDSI) {
>> +            hvstate->hdar = hdar;
>> +            hvstate->hdsisr = hdsisr;
>> +            hvstate->asdr = asdr;
>> +        } else if (excp == POWERPC_EXCP_HISI) {
>> +            hvstate->asdr = asdr;
>> +        }
>>   
>> -    hvstate->cfar = l2_state.cfar;
>> -    hvstate->lpcr = l2_state.lpcr;
>> -    hvstate->pcr = l2_state.pcr;
>> -    hvstate->dpdes = l2_state.dpdes;
>> -    hvstate->hfscr = l2_state.hfscr;
>> +        /* HEIR should be implemented for HV mode and saved here. */
>> +        hvstate->srr0 = l2_state.srr0;
>> +        hvstate->srr1 = l2_state.srr1;
>> +        hvstate->sprg[0] = l2_state.sprg0;
>> +        hvstate->sprg[1] = l2_state.sprg1;
>> +        hvstate->sprg[2] = l2_state.sprg2;
>> +        hvstate->sprg[3] = l2_state.sprg3;
>> +        hvstate->pidr = l2_state.pidr;
>> +        hvstate->ppr = l2_state.ppr;
>> +
>> +        /* Is it okay to specify write len larger than actual data written? */
>> +        address_space_unmap(CPU(cpu)->as, hvstate, len, len, true);
>> +
>> +        len = sizeof(*regs);
>> +        regs = address_space_map(CPU(cpu)->as, regs_ptr, &len, true,
>> +                                    MEMTXATTRS_UNSPECIFIED);
>> +        if (!regs || len != sizeof(*regs)) {
>> +            address_space_unmap(CPU(cpu)->as, regs, len, 0, true);
>> +            env->gpr[3] = H_P2;
>> +            return;
>> +        }
>>   
>> -    if (excp == POWERPC_EXCP_HDSI) {
>> -        hvstate->hdar = hdar;
>> -        hvstate->hdsisr = hdsisr;
>> -        hvstate->asdr = asdr;
>> -    } else if (excp == POWERPC_EXCP_HISI) {
>> -        hvstate->asdr = asdr;
>> -    }
>> +        len = sizeof(env->gpr);
>> +        assert(len == sizeof(regs->gpr));
>> +        memcpy(regs->gpr, l2_state.gpr, len);
>>   
>> -    /* HEIR should be implemented for HV mode and saved here. */
>> -    hvstate->srr0 = l2_state.srr0;
>> -    hvstate->srr1 = l2_state.srr1;
>> -    hvstate->sprg[0] = l2_state.sprg0;
>> -    hvstate->sprg[1] = l2_state.sprg1;
>> -    hvstate->sprg[2] = l2_state.sprg2;
>> -    hvstate->sprg[3] = l2_state.sprg3;
>> -    hvstate->pidr = l2_state.pidr;
>> -    hvstate->ppr = l2_state.ppr;
>> +        regs->link = l2_state.lr;
>> +        regs->ctr = l2_state.ctr;
>> +        regs->xer = l2_state.xer;
>> +        regs->ccr = l2_state.cr;
>>   
>> -    /* Is it okay to specify write length larger than actual data written? */
>> -    address_space_unmap(CPU(cpu)->as, hvstate, len, len, true);
>> +        if (excp == POWERPC_EXCP_MCHECK ||
>> +            excp == POWERPC_EXCP_RESET ||
>> +            excp == POWERPC_EXCP_SYSCALL) {
>> +            regs->nip = l2_state.srr0;
>> +            regs->msr = l2_state.srr1 & env->msr_mask;
>> +        } else {
>> +            regs->nip = hsrr0;
>> +            regs->msr = hsrr1 & env->msr_mask;
>> +        }
>>   
>> -    len = sizeof(*regs);
>> -    regs = address_space_map(CPU(cpu)->as, regs_ptr, &len, true,
>> -                                MEMTXATTRS_UNSPECIFIED);
>> -    if (!regs || len != sizeof(*regs)) {
>> -        address_space_unmap(CPU(cpu)->as, regs, len, 0, true);
>> -        env->gpr[3] = H_P2;
>> -        return;
>> +        /* Is it okay to specify write len larger than actual data written? */
>> +        address_space_unmap(CPU(cpu)->as, regs, len, len, true);
>>       }
>>   
>> -    len = sizeof(env->gpr);
>> -    assert(len == sizeof(regs->gpr));
>> -    memcpy(regs->gpr, l2_state.gpr, len);
>> -
>> -    regs->link = l2_state.lr;
>> -    regs->ctr = l2_state.ctr;
>> -    regs->xer = l2_state.xer;
>> -    regs->ccr = l2_state.cr;
>> +    spapr_cpu->in_nested = false;
>>   
>> -    if (excp == POWERPC_EXCP_MCHECK ||
>> -        excp == POWERPC_EXCP_RESET ||
>> -        excp == POWERPC_EXCP_SYSCALL) {
>> -        regs->nip = l2_state.srr0;
>> -        regs->msr = l2_state.srr1 & env->msr_mask;
>> +    if (spapr->nested.api == NESTED_API_KVM_HV) {
>> +        g_free(spapr_cpu->nested_hv_host);
>> +        spapr_cpu->nested_hv_host = NULL;
>>       } else {
>> -        regs->nip = hsrr0;
>> -        regs->msr = hsrr1 & env->msr_mask;
>> +        g_free(spapr_cpu->nested_papr_host);
>> +        spapr_cpu->nested_papr_host = NULL;
>>       }
>>   
>> -    /* Is it okay to specify write length larger than actual data written? */
>> -    address_space_unmap(CPU(cpu)->as, regs, len, len, true);
>>   }
>>   
>>   SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
>> @@ -1372,6 +1424,274 @@ static target_ulong h_guest_get_state(PowerPCCPU *cpu,
>>       return h_guest_getset_state(cpu, spapr, args, false);
>>   }
>>   
>> +static void restore_common_regs(CPUPPCState *dst, CPUPPCState *src)
>> +{
>> +    memcpy(dst->gpr, src->gpr, sizeof(dst->gpr));
>> +    memcpy(dst->crf, src->crf, sizeof(dst->crf));
>> +    memcpy(dst->vsr, src->vsr, sizeof(dst->vsr));
>> +    dst->nip = src->nip;
>> +    dst->msr = src->msr;
>> +    dst->lr  = src->lr;
>> +    dst->ctr = src->ctr;
>> +    dst->cfar = src->cfar;
>> +    cpu_write_xer(dst, src->xer);
>> +    ppc_store_vscr(dst, ppc_get_vscr(src));
>> +    ppc_store_fpscr(dst, src->fpscr);
>> +    memcpy(dst->spr, src->spr, sizeof(dst->spr));
>> +}
>> +
>> +static void restore_l2_state(PowerPCCPU *cpu,
>> +                             CPUPPCState *env,
>> +                             struct SpaprMachineStateNestedGuestVcpu *vcpu,
>> +                             target_ulong now)
>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>> +    target_ulong lpcr, lpcr_mask, hdec;
>> +    lpcr_mask = LPCR_DPFD | LPCR_ILE | LPCR_AIL | LPCR_LD | LPCR_MER;
>> +
>> +    if (spapr->nested.api == NESTED_API_PAPR) {
>> +        assert(vcpu);
>> +        assert(sizeof(env->gpr) == sizeof(vcpu->env.gpr));
>> +        restore_common_regs(env, &vcpu->env);
>> +        lpcr = (env->spr[SPR_LPCR] & ~lpcr_mask) |
>> +               (vcpu->env.spr[SPR_LPCR] & lpcr_mask);
>> +        lpcr |= LPCR_HR | LPCR_UPRT | LPCR_GTSE | LPCR_HVICE | LPCR_HDICE;
>> +        lpcr &= ~LPCR_LPES0;
>> +        env->spr[SPR_LPCR] = lpcr & pcc->lpcr_mask;
>> +
>> +        hdec = vcpu->env.tb_env->hdecr_expiry_tb - now;
>> +        cpu_ppc_store_decr(env, vcpu->dec_expiry_tb - now);
>> +        cpu_ppc_hdecr_init(env);
>> +        cpu_ppc_store_hdecr(env, hdec);
>> +
>> +        env->tb_env->tb_offset += vcpu->tb_offset;
>> +    }
>> +}
>> +
>> +static void enter_nested(PowerPCCPU *cpu,
>> +                         uint64_t lpid,
>> +                         struct SpaprMachineStateNestedGuestVcpu *vcpu)
> 
> That's not good since we have h_enter_nested for the old API. Really
> have to be a bit more consistent with using papr_ for naming I think.
> And you don't have to call this enter_nested anyway, papr_run_vcpu is
> okay too since that matches the API call. Can just add a comment /*
> Enter the L2 VCPU, equivalent to h_enter_nested */ if you think that's
> needed.
> 

Makes sense, renaming it to spapr_nested_run_vcpu().

>> +{
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    CPUState *cs = CPU(cpu);
>> +    CPUPPCState *env = &cpu->env;
>> +    SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
>> +    target_ulong now = cpu_ppc_load_tbl(env);
>> +
>> +    assert(env->spr[SPR_LPIDR] == 0);
>> +    assert(spapr->nested.api); /* ensure API version is initialized */
>> +    spapr_cpu->nested_papr_host = g_try_new(CPUPPCState, 1);
>> +    assert(spapr_cpu->nested_papr_host);
>> +    memcpy(spapr_cpu->nested_papr_host, env, sizeof(CPUPPCState));
>> +
>> +    restore_l2_state(cpu, env, vcpu, now);
>> +    env->spr[SPR_LPIDR] = lpid; /* post restore_l2_state */
>> +
>> +    spapr_cpu->in_nested = true;
>> +
>> +    hreg_compute_hflags(env);
>> +    ppc_maybe_interrupt(env);
>> +    tlb_flush(cs);
>> +    env->reserve_addr = -1; /* Reset the reservation */
> 
>         ^^^
>         This is the kind of block that could be pulled into a
>         common helper function. There's 3-4 copies now?

Yeah, introducing a helper nested_post_state_update() for this block.

>> +
>> +}
>> +
>> +static target_ulong h_guest_run_vcpu(PowerPCCPU *cpu,
>> +                                     SpaprMachineState *spapr,
>> +                                     target_ulong opcode,
>> +                                     target_ulong *args)
>> +{
>> +    CPUPPCState *env = &cpu->env;
>> +    target_ulong flags = args[0];
>> +    target_ulong lpid = args[1];
>> +    target_ulong vcpuid = args[2];
>> +    struct SpaprMachineStateNestedGuestVcpu *vcpu;
>> +    struct guest_state_request gsr;
>> +    SpaprMachineStateNestedGuest *guest;
>> +
>> +    if (flags) /* don't handle any flags for now */
>> +        return H_PARAMETER;
>> +
>> +    guest = spapr_get_nested_guest(spapr, lpid);
>> +    if (!guest) {
>> +        return H_P2;
>> +    }
>> +    if (!vcpu_check(guest, vcpuid, true)) {
>> +        return H_P3;
>> +    }
>> +
>> +    if (guest->parttbl[0] == 0) {
>> +        /* At least need a partition scoped radix tree */
>> +        return H_NOT_AVAILABLE;
>> +    }
>> +
>> +    vcpu = &guest->vcpu[vcpuid];
>> +
>> +    /* Read run_vcpu input buffer to update state */
>> +    gsr.buf = vcpu->runbufin.addr;
>> +    gsr.len = vcpu->runbufin.size;
>> +    gsr.flags = GUEST_STATE_REQUEST_SET; /* Thread wide + writing */
>> +    if (!map_and_getset_state(cpu, guest, vcpuid, &gsr)) {
>> +        enter_nested(cpu, lpid, vcpu);
>> +    }
>> +
>> +    return env->gpr[3];
>> +}
>> +
>> +struct run_vcpu_exit_cause run_vcpu_exit_causes[] = {
>> +    { .nia = 0x980,
>> +      .count = 0,
>> +    },
>> +    { .nia = 0xc00,
>> +      .count = 10,
>> +      .ids = {
>> +          GSB_VCPU_GPR3,
>> +          GSB_VCPU_GPR4,
>> +          GSB_VCPU_GPR5,
>> +          GSB_VCPU_GPR6,
>> +          GSB_VCPU_GPR7,
>> +          GSB_VCPU_GPR8,
>> +          GSB_VCPU_GPR9,
>> +          GSB_VCPU_GPR10,
>> +          GSB_VCPU_GPR11,
>> +          GSB_VCPU_GPR12,
>> +      },
>> +    },
>> +    { .nia = 0xe00,
>> +      .count = 5,
>> +      .ids = {
>> +          GSB_VCPU_SPR_HDAR,
>> +          GSB_VCPU_SPR_HDSISR,
>> +          GSB_VCPU_SPR_ASDR,
>> +          GSB_VCPU_SPR_NIA,
>> +          GSB_VCPU_SPR_MSR,
>> +      },
>> +    },
>> +    { .nia = 0xe20,
>> +      .count = 4,
>> +      .ids = {
>> +          GSB_VCPU_SPR_HDAR,
>> +          GSB_VCPU_SPR_ASDR,
>> +          GSB_VCPU_SPR_NIA,
>> +          GSB_VCPU_SPR_MSR,
>> +      },
>> +    },
>> +    { .nia = 0xe40,
>> +      .count = 3,
>> +      .ids = {
>> +          GSB_VCPU_SPR_HEIR,
>> +          GSB_VCPU_SPR_NIA,
>> +          GSB_VCPU_SPR_MSR,
>> +      },
>> +    },
>> +    { .nia = 0xea0,
>> +      .count = 0,
>> +    },
>> +    { .nia = 0xf80,
>> +      .count = 3,
>> +      .ids = {
>> +          GSB_VCPU_SPR_HFSCR,
>> +          GSB_VCPU_SPR_NIA,
>> +          GSB_VCPU_SPR_MSR,
>> +      },
>> +    },
>> +};
>> +
>> +static struct run_vcpu_exit_cause *find_exit_cause(uint64_t srr0)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(run_vcpu_exit_causes); i++)
>> +        if (srr0 == run_vcpu_exit_causes[i].nia) {
>> +            return &run_vcpu_exit_causes[i];
>> +        }
>> +
>> +    printf("%s: srr0:0x%016lx\n", __func__, srr0);
>> +    return NULL;
>> +}
> 
> This is another weird control flow thing, also unclear why it's used
> here. Here: 52 lines vs 76, no new struct, simpler for the compiler
> to understand and optimise.
> 
> int get_exit_ids(uint64_t srr0, uint16_t ids[16])
> {
>      int nr;
> 
>      switch (srr0) {
>      case 0xc00:
>          nr = 10;
>          ids[0] = GSP_VCPU_GPR3;
>          ids[1] = GSP_VCPU_GPR4;
>          ids[2] = GSP_VCPU_GPR5;
>          ids[3] = GSP_VCPU_GPR6;
>          ids[4] = GSP_VCPU_GPR7;
>          ids[5] = GSP_VCPU_GPR8;
>          ids[6] = GSP_VCPU_GPR9;
>          ids[7] = GSP_VCPU_GPR10;
>          ids[8] = GSP_VCPU_GPR11;
>          ids[9] = GSP_VCPU_GPR12;
>          break;
>      case 0xe00:
>          nr = 5;
>          ids[0] = GSP_VCPU_HDAR;
>          ids[1] = GSP_VCPU_HDSISR;
>          ids[2] = GSP_VCPU_ASDR;
>          ids[3] = GSP_VCPU_NIA;
>          ids[4] = GSP_VCPU_MSR;
>          break;
>      case 0xe20:
>          nr = 4;
>          ids[0] = GSP_VCPU_HDAR;
>          ids[1] = GSP_VCPU_ASDR;
>          ids[2] = GSP_VCPU_NIA;
>          ids[3] = GSP_VCPU_MSR;
>          break;
>      case 0xe40:
>          nr = 3;
>          ids[0] = GSP_VCPU_HEIR;
>          ids[1] = GSP_VCPU_NIA;
>          ids[2] = GSP_VCPU_MSR;
>          break;
>      case 0xf80:
>          nr = 3;
>          ids[0] = GSP_VCPU_HFSCR;
>          ids[1] = GSP_VCPU_NIA;
>          ids[2] = GSP_VCPU_MSR;
>          break;
>      default:
>          nr = 0;
>          break;
>      }
> 
>      return nr;
> }
> 

This is indeed simpler and nicer. Updated the code as suggested. Thanks.

>> +
>> +static void exit_nested_restore_vcpu(PowerPCCPU *cpu, int excp,
>> +                                     SpaprMachineStateNestedGuestVcpu *vcpu)
>> +{
>> +    CPUPPCState *env = &cpu->env;
>> +    target_ulong now, hdar, hdsisr, asdr;
>> +
>> +    assert(sizeof(env->gpr) == sizeof(vcpu->env.gpr)); /* sanity check */
>> +
>> +    now = cpu_ppc_load_tbl(env); /* L2 timebase */
>> +    now -= vcpu->tb_offset; /* L1 timebase */
>> +    vcpu->dec_expiry_tb = now - cpu_ppc_load_decr(env);
>> +    /* backup hdar, hdsisr, asdr if reqd later below */
>> +    hdar   = vcpu->env.spr[SPR_HDAR];
>> +    hdsisr = vcpu->env.spr[SPR_HDSISR];
>> +    asdr   = vcpu->env.spr[SPR_ASDR];
>> +
>> +    restore_common_regs(&vcpu->env, env);
>> +
>> +    if (excp == POWERPC_EXCP_MCHECK ||
>> +        excp == POWERPC_EXCP_RESET ||
>> +        excp == POWERPC_EXCP_SYSCALL) {
>> +        vcpu->env.nip = env->spr[SPR_SRR0];
>> +        vcpu->env.msr = env->spr[SPR_SRR1] & env->msr_mask;
>> +    } else {
>> +        vcpu->env.nip = env->spr[SPR_HSRR0];
>> +        vcpu->env.msr = env->spr[SPR_HSRR1] & env->msr_mask;
>> +    }
>> +
>> +    /* hdar, hdsisr, asdr should be retained unless certain exceptions */
>> +    if ((excp != POWERPC_EXCP_HDSI) && (excp != POWERPC_EXCP_HISI)) {
>> +        vcpu->env.spr[SPR_ASDR] = asdr;
>> +    } else if (excp != POWERPC_EXCP_HDSI) {
>> +        vcpu->env.spr[SPR_HDAR]   = hdar;
>> +        vcpu->env.spr[SPR_HDSISR] = hdsisr;
>> +    }
>> +}
>> +
>> +static void exit_process_output_buffer(PowerPCCPU *cpu,
>> +                                      SpaprMachineStateNestedGuest *guest,
>> +                                      target_ulong vcpuid,
>> +                                      target_ulong *r3)
>> +{
>> +    SpaprMachineStateNestedGuestVcpu *vcpu = &guest->vcpu[vcpuid];
>> +    struct guest_state_request gsr;
>> +    struct guest_state_buffer *gsb;
>> +    struct guest_state_element *element;
>> +    struct guest_state_element_type *type;
>> +    struct run_vcpu_exit_cause *exit_cause;
>> +    hwaddr len;
>> +    int i;
>> +
>> +    len = vcpu->runbufout.size;
>> +    gsb = address_space_map(CPU(cpu)->as, vcpu->runbufout.addr, &len, true,
>> +                            MEMTXATTRS_UNSPECIFIED);
>> +    if (!gsb || len != vcpu->runbufout.size) {
>> +        address_space_unmap(CPU(cpu)->as, gsb, len, 0, true);
>> +        *r3 = H_P2;
>> +        return;
>> +    }
>> +
>> +    exit_cause = find_exit_cause(*r3);
>> +
>> +    /* Create a buffer of elements to send back */
>> +    gsb->num_elements = cpu_to_be32(exit_cause->count);
>> +    element = gsb->elements;
>> +    for (i = 0; i < exit_cause->count; i++) {
>> +        type = guest_state_element_type_find(exit_cause->ids[i]);
>> +        assert(type);
>> +        element->id = cpu_to_be16(exit_cause->ids[i]);
>> +        element->size = cpu_to_be16(type->size);
>> +        element = guest_state_element_next(element, NULL, NULL);
>> +    }
>> +    gsr.gsb = gsb;
>> +    gsr.len = VCPU_OUT_BUF_MIN_SZ;
>> +    gsr.flags = 0; /* get + never guest wide */
>> +    getset_state(guest, vcpuid, &gsr);
>> +
>> +    address_space_unmap(CPU(cpu)->as, gsb, len, len, true);
>> +    return;
>> +}
>> +
>>   void spapr_register_nested(void)
>>   {
>>       spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
>> @@ -1388,6 +1708,7 @@ void spapr_register_nested_phyp(void)
>>       spapr_register_hypercall(H_GUEST_CREATE_VCPU     , h_guest_create_vcpu);
>>       spapr_register_hypercall(H_GUEST_SET_STATE       , h_guest_set_state);
>>       spapr_register_hypercall(H_GUEST_GET_STATE       , h_guest_get_state);
>> +    spapr_register_hypercall(H_GUEST_RUN_VCPU        , h_guest_run_vcpu);
>>   }
>>   
>>   #else
>> diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
>> index 69a52e39b8..09855f69aa 100644
>> --- a/include/hw/ppc/spapr_cpu_core.h
>> +++ b/include/hw/ppc/spapr_cpu_core.h
>> @@ -53,7 +53,12 @@ typedef struct SpaprCpuState {
>>   
>>       /* Fields for nested-HV support */
>>       bool in_nested; /* true while the L2 is executing */
>> -    struct nested_ppc_state *nested_host_state; /* holds the L1 state while L2 executes */
>> +    union {
>> +        /* nested-hv needs minimal set of regs as L1 stores L2 state */
>> +        struct nested_ppc_state *nested_hv_host;
>> +        /* In nested-papr, L0 retains entire L2 state, so keep it all safe. */
>> +        CPUPPCState *nested_papr_host;
>> +    };
> 
> This IMO still shouldn't be a CPUPPCState, but extending
> nested_ppc_state. Differences between nested APIs should not
> be here either, but inside the nested_ppc_state structure.
> 

I think for now we can keep it as a CPUPPCState *, since we make use of
existing helper routines which take an env and operate on the various
regs it contains. Since the nested PAPR API holds the entire L2 state,
I'm not sure it's worth extending nested_ppc_state, which was initially
meant to store a minimal sub-state, as it would now duplicate most of
the CPUPPCState members. Having said that, I am open to any
optimizations that can be taken up as a follow-up patch; this shouldn't
be a blocker to getting this initial API feature committed and
stabilised. Hope you agree.

regards,
Harsh

> Thanks,
> Nick
> 
>>   } SpaprCpuState;
>>   
>>   static inline SpaprCpuState *spapr_cpu_state(PowerPCCPU *cpu)
>> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
>> index eaee624b87..ca5d28c06e 100644
>> --- a/include/hw/ppc/spapr_nested.h
>> +++ b/include/hw/ppc/spapr_nested.h
>> @@ -358,6 +358,12 @@ struct guest_state_request {
>>       uint16_t flags;
>>   };
>>   
>> +struct run_vcpu_exit_cause {
>> +    uint64_t nia;
>> +    uint64_t count;
>> +    uint16_t ids[10]; /* max ids supported by run_vcpu_exit_causes */
>> +};
>> +
>>   /*
>>    * Register state for entering a nested guest with H_ENTER_NESTED.
>>    * New member must be added at the end.
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND 15/15] ppc: spapr: Document Nested PAPR API
  2023-09-07  3:56   ` Nicholas Piggin
@ 2023-10-12 10:25     ` Harsh Prateek Bora
  0 siblings, 0 replies; 47+ messages in thread
From: Harsh Prateek Bora @ 2023-10-12 10:25 UTC (permalink / raw)
  To: Nicholas Piggin, danielhb413, qemu-ppc
  Cc: qemu-devel, mikey, vaibhav, jniethe5, sbhat, kconsul



On 9/7/23 09:26, Nicholas Piggin wrote:
> On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
>> Adding initial documentation about the Nested PAPR API, describing the
>> set of APIs and their usage. It also covers the Guest State Buffer
>> elements and their format, which is used between L0/L1 to communicate L2 state.
> 
> I would move this patch first (well, behind any cleanup and preparation
> patches, but before any new API additions).
> 

Sure, moving this patch before introducing nested PAPR API code.

regards,
Harsh

> Thanks,
> Nick
> 
>>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   docs/devel/nested-papr.txt | 500 +++++++++++++++++++++++++++++++++++++
>>   1 file changed, 500 insertions(+)
>>   create mode 100644 docs/devel/nested-papr.txt
>>
>> diff --git a/docs/devel/nested-papr.txt b/docs/devel/nested-papr.txt
>> new file mode 100644
>> index 0000000000..c5c2ba7e50
>> --- /dev/null
>> +++ b/docs/devel/nested-papr.txt
>> @@ -0,0 +1,500 @@
>> +Nested PAPR API (aka KVM on PowerVM)
>> +====================================
>> +
>> +This API aims to provide support for nested virtualization with
>> +KVM on PowerVM. The existing support for nested KVM on PowerNV was
>> +introduced with the cap-nested-hv option. With a slight design change,
>> +a new cap-nested-papr option is added to enable this on papr/pseries, e.g.:
>> +
>> +  qemu-system-ppc64 -cpu POWER10 -machine pseries,cap-nested-papr=true ...
>> +
>> +Work by:
>> +    Michael Neuling <mikey@neuling.org>
>> +    Vaibhav Jain <vaibhav@linux.ibm.com>
>> +    Jordan Niethe <jniethe5@gmail.com>
>> +    Harsh Prateek Bora <harshpb@linux.ibm.com>
>> +    Shivaprasad G Bhat <sbhat@linux.ibm.com>
>> +    Kautuk Consul <kconsul@linux.vnet.ibm.com>
>> +
>> +Below taken from the kernel documentation:
>> +
>> +Introduction
>> +============
>> +
>> +This document explains how a guest operating system can act as a
>> +hypervisor and run nested guests through the use of hypercalls, if the
>> +hypervisor has implemented them. The terms L0, L1, and L2 are used to
>> +refer to different software entities. L0 is the hypervisor mode entity
>> +that would normally be called the "host" or "hypervisor". L1 is a
>> +guest virtual machine that is directly run under L0 and is initiated
>> +and controlled by L0. L2 is a guest virtual machine that is initiated
>> +and controlled by L1 acting as a hypervisor. A significant design change
>> +wrt the existing API is that the entire L2 state is now maintained within the L0.
>> +
>> +Existing Nested-HV API
>> +======================
>> +
>> +Linux/KVM has had support for nesting as an L0 or L1 since 2018.
>> +
>> +The L0 code was added::
>> +
>> +   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
>> +   Author: Paul Mackerras <paulus@ozlabs.org>
>> +   Date:   Mon Oct 8 16:31:03 2018 +1100
>> +   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization
>> +
>> +The L1 code was added::
>> +
>> +   commit 360cae313702cdd0b90f82c261a8302fecef030a
>> +   Author: Paul Mackerras <paulus@ozlabs.org>
>> +   Date:   Mon Oct 8 16:31:04 2018 +1100
>> +   KVM: PPC: Book3S HV: Nested guest entry via hypercall
>> +
>> +This API works primarily using a single hcall, h_enter_nested(). This
>> +call is made by the L1 to tell the L0 to start an L2 vCPU with the given
>> +state. The L0 then starts this L2 and runs until an L2 exit condition
>> +is reached. Once the L2 exits, the state of the L2 is given back to
>> +the L1 by the L0. The full L2 vCPU state is always transferred from
>> +and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
>> +vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
>> +-> L1 exit).
>> +
>> +The only state kept by the L0 is the partition table. The L1 registers
>> +its partition table using the h_set_partition_table() hcall. All
>> +other state held by the L0 about the L2s is cached state (such as
>> +shadow page tables).
>> +
>> +The L1 may run any L2 or vCPU without first informing the L0. It
>> +simply starts the vCPU using h_enter_nested(). The creation of L2s and
>> +vCPUs is done implicitly whenever h_enter_nested() is called.
>> +
>> +In this document, we call this existing API the v1 API.
>> +
>> +New PAPR API
>> +===============
>> +
>> +The new PAPR API changes from the v1 API such that creating the L2 and
>> +its associated vCPUs is explicit. In this document, we call this the
>> +v2 API.
>> +
>> +h_enter_nested() is replaced with H_GUEST_RUN_VCPU(). Before this can
>> +be called, the L1 must explicitly create the L2 using H_GUEST_CREATE()
>> +and create any associated vCPUs with H_GUEST_CREATE_VCPU(). Getting
>> +and setting vCPU state can also be performed using the
>> +H_GUEST_{G,S}ET_STATE() hcalls.
>> +
>> +The basic execution flow for an L1 to create an L2, run it, and
>> +delete it is:
>> +
>> +- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
>> +  (normally at L1 boot time).
>> +
>> +- L1 requests the L0 to create an L2 with H_GUEST_CREATE() and receives
>> +  a token
>> +
>> +- L1 requests the L0 to create an L2 vCPU with H_GUEST_CREATE_VCPU()
>> +
>> +- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET() hcall
>> +
>> +- L1 requests the L0 to run the vCPU using H_GUEST_RUN_VCPU() hcall
>> +
>> +- L1 deletes L2 with H_GUEST_DELETE()
>> +
>> +More details of the individual hcalls follow:
>> +
>> +HCALL Details
>> +=============
>> +
>> +This documentation is provided to give an overall understanding of the
>> +API. It doesn't aim to provide all the details required to implement
>> +an L1 or L0. Refer to the latest PAPR specification for full details.
>> +
>> +All these HCALLs are made by the L1 to the L0.
>> +
>> +H_GUEST_GET_CAPABILITIES()
>> +--------------------------
>> +
>> +This is called to get the capabilities of the L0 nested
>> +hypervisor. These include capabilities such as the CPU versions (eg
>> +POWER9, POWER10) that are supported as L2s.
>> +
>> +H_GUEST_SET_CAPABILITIES()
>> +--------------------------
>> +
>> +This is called to inform the L0 of the capabilities of the L1
>> +hypervisor. The set of flags passed here are the same as for
>> +H_GUEST_GET_CAPABILITIES().
>> +
>> +Typically, GET will be called first and then SET will be called with a
>> +subset of the flags returned from GET. This process allows the L0 and
>> +L1 to negotiate an agreed set of capabilities.
>> +
>> +H_GUEST_CREATE()
>> +----------------
>> +
>> +This is called to create an L2. Returned is the ID of the newly
>> +created L2 (similar to an LPID), which can be used on subsequent
>> +hcalls to identify the L2.
>> +
>> +H_GUEST_CREATE_VCPU()
>> +---------------------
>> +
>> +This is called to create a vCPU associated with an L2. The L2 ID
>> +(returned from H_GUEST_CREATE()) should be passed in. Also passed in
>> +is a vCPU ID, unique within this L2, allocated by the L1.
>> +
>> +H_GUEST_SET_STATE()
>> +-------------------
>> +
>> +This is called to set L2 wide or vCPU specific L2 state. This info is
>> +passed via the Guest State Buffer (GSB), details below.
>> +
>> +This can set either L2 wide or vCPU specific information. Examples of
>> +L2 wide state are the timebase offset or process scoped page table
>> +info. Examples of vCPU specific state are GPRs or VSRs. A bit in the
>> +flags parameter specifies if this call is L2 wide or vCPU specific,
>> +and the IDs in the GSB must match this.
>> +
>> +The L1 provides a pointer to the GSB as a parameter to this call. Also
>> +provided is the L2 and vCPU IDs associated with the state to set.
>> +
>> +The L1 writes all values in the GSB and the L0 only reads the GSB for
>> +this call.
>> +
>> +H_GUEST_GET_STATE()
>> +-------------------
>> +
>> +This is called to get state associated with an L2 or L2 vCPU. This
>> +info is passed via the GSB (details below).
>> +
>> +This can get either L2 wide or vCPU specific information. Examples of
>> +L2 wide state are the timebase offset or process scoped page table
>> +info. Examples of vCPU specific state are GPRs or VSRs. A bit in the
>> +flags parameter specifies if this call is L2 wide or vCPU specific,
>> +and the IDs in the GSB must match this.
>> +
>> +The L1 provides a pointer to the GSB as a parameter to this call. Also
>> +provided is the L2 and vCPU IDs associated with the state to get.
>> +
>> +The L1 writes only the IDs and sizes in the GSB.  L0 writes the
>> +associated values for each ID in the GSB.
>> +
>> +H_GUEST_RUN_VCPU()
>> +------------------
>> +
>> +This is called to run an L2 vCPU. The L2 and vCPU IDs are passed in as
>> +parameters. The vCPU runs with the state set previously using
>> +H_GUEST_SET_STATE(). When the L2 exits, the L1 will resume from this
>> +hcall.
>> +
>> +This hcall also has associated input and output GSBs. Unlike
>> +H_GUEST_{G,S}ET_STATE(), these GSB pointers are not passed in as
>> +parameters to the hcall (this was done in the interest of
>> +performance). The locations of these GSBs must be preregistered using
>> +the H_GUEST_SET_STATE() call with IDs 0x0c00 and 0x0c01 (see the
>> +table below).
>> +
>> +The input GSB may contain only vCPU specific elements to be set. This
>> +GSB may also contain zero elements (ie 0 in the first 4 bytes of the
>> +GSB) if nothing needs to be set.
>> +
>> +On exit from the hcall, the output buffer is filled with elements
>> +determined by the L0. The reason for the exit is contained in GPR4 (ie
>> +NIP is put in GPR4).  The elements returned depend on the exit
>> +type. For example, if the exit reason is the L2 doing a hcall (GPR4 =
>> +0xc00), then GPR3-12 are provided in the output GSB as this is the
>> +state likely needed to service the hcall. If additional state is
>> +needed, H_GUEST_GET_STATE() may be called by the L1.
>> +
>> +To synthesize interrupts in the L2, when calling H_GUEST_RUN_VCPU()
>> +the L1 may set a flag (as a hcall parameter) and the L0 will
>> +synthesize the interrupt in the L2. Alternatively, the L1 may
>> +synthesize the interrupt itself using H_GUEST_SET_STATE() or the
>> +H_GUEST_RUN_VCPU() input GSB to set the state appropriately.
>> +
>> +H_GUEST_DELETE()
>> +----------------
>> +
>> +This is called to delete an L2. All associated vCPUs are also
>> +deleted. No specific vCPU delete call is provided.
>> +
>> +A flag may be provided to delete all guests. This is used to reset the
>> +L0 in the case of kdump/kexec.
>> +
>> +Guest State Buffer (GSB)
>> +========================
>> +
>> +The Guest State Buffer (GSB) is the main method of communicating state
>> +about the L2 between the L1 and L0 via the H_GUEST_{G,S}ET_STATE() and
>> +H_GUEST_RUN_VCPU() calls.
>> +
>> +State may be associated with a whole L2 (eg timebase offset) or a
>> +specific L2 vCPU (eg GPR state). Only L2 vCPU state may be set by
>> +H_GUEST_RUN_VCPU().
>> +
>> +All data in the GSB is big endian (as is standard in PAPR).
>> +
>> +The Guest state buffer has a header which gives the number of
>> +elements, followed by the GSB elements themselves.
>> +
>> +GSB header:
>> +
>> ++----------+----------+-------------------------------------------+
>> +|  Offset  |  Size    |  Purpose                                  |
>> +|  Bytes   |  Bytes   |                                           |
>> ++==========+==========+===========================================+
>> +|    0     |    4     |  Number of elements                       |
>> ++----------+----------+-------------------------------------------+
>> +|    4     |          |  Guest state buffer elements              |
>> ++----------+----------+-------------------------------------------+
>> +
>> +GSB element:
>> +
>> ++----------+----------+-------------------------------------------+
>> +|  Offset  |  Size    |  Purpose                                  |
>> +|  Bytes   |  Bytes   |                                           |
>> ++==========+==========+===========================================+
>> +|    0     |    2     |  ID                                       |
>> ++----------+----------+-------------------------------------------+
>> +|    2     |    2     |  Size of Value                            |
>> ++----------+----------+-------------------------------------------+
>> +|    4     | As above |  Value                                    |
>> ++----------+----------+-------------------------------------------+
>> +
>> +The ID in the GSB element specifies what is to be set. This includes
>> +architected state like GPRs, VSRs, and SPRs, plus some metadata about
>> +the partition, like the timebase offset and partition scoped page
>> +table information.
>> +
>> ++--------+-------+----+--------+----------------------------------+
>> +|   ID   | Size  | RW | Thread | Details                          |
>> +|        | Bytes |    | Guest  |                                  |
>> +|        |       |    | Scope  |                                  |
>> ++========+=======+====+========+==================================+
>> +| 0x0000 |       | RW |   TG   | NOP element                      |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x0001 | 0x08  | R  |   G    | Size of L0 vCPU state            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x0002 | 0x08  | R  |   G    | Size Run vCPU out buffer         |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x0003 | 0x04  | RW |   G    | Logical PVR                      |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x0004 | 0x08  | RW |   G    | TB Offset (L1 relative)          |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x0005 | 0x18  | RW |   G    |Partition scoped page tbl info:   |
>> +|        |       |    |        |                                  |
>> +|        |       |    |        |- 0x00 Addr part scope table      |
>> +|        |       |    |        |- 0x08 Num addr bits              |
>> +|        |       |    |        |- 0x10 Size root dir              |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x0006 | 0x10  | RW |   G    |Process Table Information:        |
>> +|        |       |    |        |                                  |
>> +|        |       |    |        |- 0x0 Addr proc scope table       |
>> +|        |       |    |        |- 0x8 Table size.                 |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x0007-|       |    |        | Reserved                         |
>> +| 0x0BFF |       |    |        |                                  |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x0C00 | 0x10  | RW |   T    |Run vCPU Input Buffer:            |
>> +|        |       |    |        |                                  |
>> +|        |       |    |        |- 0x0 Addr of buffer              |
>> +|        |       |    |        |- 0x8 Buffer Size.                |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x0C01 | 0x10  | RW |   T    |Run vCPU Output Buffer:           |
>> +|        |       |    |        |                                  |
>> +|        |       |    |        |- 0x0 Addr of buffer              |
>> +|        |       |    |        |- 0x8 Buffer Size.                |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x0C02 | 0x08  | RW |   T    | vCPU VPA Address                 |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x0C03-|       |    |        | Reserved                         |
>> +| 0x0FFF |       |    |        |                                  |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1000-| 0x08  | RW |   T    | GPR 0-31                         |
>> +| 0x101F |       |    |        |                                  |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1020 |  0x08 | T  |   T    | HDEC expiry TB                   |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1021 | 0x08  | RW |   T    | NIA                              |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1022 | 0x08  | RW |   T    | MSR                              |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1023 | 0x08  | RW |   T    | LR                               |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1024 | 0x08  | RW |   T    | XER                              |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1025 | 0x08  | RW |   T    | CTR                              |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1026 | 0x08  | RW |   T    | CFAR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1027 | 0x08  | RW |   T    | SRR0                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1028 | 0x08  | RW |   T    | SRR1                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1029 | 0x08  | RW |   T    | DAR                              |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x102A | 0x08  | RW |   T    | DEC expiry TB                    |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x102B | 0x08  | RW |   T    | VTB                              |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x102C | 0x08  | RW |   T    | LPCR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x102D | 0x08  | RW |   T    | HFSCR                            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x102E | 0x08  | RW |   T    | FSCR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x102F | 0x08  | RW |   T    | FPSCR                            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1030 | 0x08  | RW |   T    | DAWR0                            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1031 | 0x08  | RW |   T    | DAWR1                            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1032 | 0x08  | RW |   T    | CIABR                            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1033 | 0x08  | RW |   T    | PURR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1034 | 0x08  | RW |   T    | SPURR                            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1035 | 0x08  | RW |   T    | IC                               |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1036-| 0x08  | RW |   T    | SPRG 0-3                         |
>> +| 0x1039 |       |    |        |                                  |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x103A | 0x08  | W  |   T    | PPR                              |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x103B-| 0x08  | RW |   T    | MMCR 0-3                         |
>> +| 0x103E |       |    |        |                                  |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x103F | 0x08  | RW |   T    | MMCRA                            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1040 | 0x08  | RW |   T    | SIER                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1041 | 0x08  | RW |   T    | SIER 2                           |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1042 | 0x08  | RW |   T    | SIER 3                           |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1043 | 0x08  | RW |   T    | BESCR                            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1044 | 0x08  | RW |   T    | EBBHR                            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1045 | 0x08  | RW |   T    | EBBRR                            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1046 | 0x08  | RW |   T    | AMR                              |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1047 | 0x08  | RW |   T    | IAMR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1048 | 0x08  | RW |   T    | AMOR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1049 | 0x08  | RW |   T    | UAMOR                            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x104A | 0x08  | RW |   T    | SDAR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x104B | 0x08  | RW |   T    | SIAR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x104C | 0x08  | RW |   T    | DSCR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x104D | 0x08  | RW |   T    | TAR                              |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x104E | 0x08  | RW |   T    | DEXCR                            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x104F | 0x08  | RW |   T    | HDEXCR                           |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1050 | 0x08  | RW |   T    | HASHKEYR                         |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1051 | 0x08  | RW |   T    | HASHPKEYR                        |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1052 | 0x08  | RW |   T    | CTRL                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x1053-|       |    |        | Reserved                         |
>> +| 0x1FFF |       |    |        |                                  |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x2000 | 0x04  | RW |   T    | CR                               |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x2001 | 0x04  | RW |   T    | PIDR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x2002 | 0x04  | RW |   T    | DSISR                            |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x2003 | 0x04  | RW |   T    | VSCR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x2004 | 0x04  | RW |   T    | VRSAVE                           |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x2005 | 0x04  | RW |   T    | DAWRX0                           |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x2006 | 0x04  | RW |   T    | DAWRX1                           |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x2007-| 0x04  | RW |   T    | PMC 1-6                          |
>> +| 0x200c |       |    |        |                                  |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x200D | 0x04  | RW |   T    | WORT                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x200E | 0x04  | RW |   T    | PSPB                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x200F-|       |    |        | Reserved                         |
>> +| 0x2FFF |       |    |        |                                  |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x3000-| 0x10  | RW |   T    | VSR 0-63                         |
>> +| 0x303F |       |    |        |                                  |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0x3040-|       |    |        | Reserved                         |
>> +| 0xEFFF |       |    |        |                                  |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0xF000 | 0x08  | R  |   T    | HDAR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0xF001 | 0x04  | R  |   T    | HDSISR                           |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0xF002 | 0x04  | R  |   T    | HEIR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +| 0xF003 | 0x08  | R  |   T    | ASDR                             |
>> ++--------+-------+----+--------+----------------------------------+
>> +
>> +Miscellaneous info
>> +==================
>> +
>> +State not in ptregs/hvregs
>> +--------------------------
>> +
>> +In the v1 API, some state is not in the ptregs/hvstate. This includes
>> +the vector registers and some SPRs. For the L1 to set this state for
>> +the L2, the L1 loads up these hardware registers before the
>> +h_enter_nested() call and the L0 ensures they end up as the L2 state
>> +(by not touching them).
>> +
>> +The v2 API removes this and explicitly sets this state via the GSB.
>> +
>> +L1 Implementation details: Caching state
>> +----------------------------------------
>> +
>> +In the v1 API, all state is sent from the L1 to the L0 and vice versa
>> +on every h_enter_nested() hcall. If the L0 is not currently running
>> +any L2s, the L0 has no state information about them. The only
>> +exception to this is the location of the partition table, registered
>> +via h_set_partition_table().
>> +
>> +The v2 API changes this so that the L0 retains the L2 state even when
>> +its vCPUs are no longer running. This means that the L1 only needs to
>> +communicate with the L0 about L2 state when it needs to modify the L2
>> +state, or when its value is out of date. This provides an opportunity
>> +for performance optimisation.
>> +
>> +When a vCPU exits from an H_GUEST_RUN_VCPU() call, the L1 internally
>> +marks all L2 state as invalid. This means that if the L1 wants to know
>> +the L2 state (say via a kvm_get_one_reg() call), it needs to call
>> +H_GUEST_GET_STATE() to get that state. Once it's read, it's marked as
>> +valid in L1 until the L2 is run again.
>> +
>> +Also, when an L1 modifies L2 vCPU state, it doesn't need to write it
>> +to the L0 until that L2 vCPU runs again. Hence when the L1 updates
>> +state (say via a kvm_set_one_reg() call), it writes to an internal L1
>> +copy and only flushes this copy to the L0 when the L2 runs again via
>> +the H_GUEST_RUN_VCPU() input buffer.
>> +
>> +This lazy updating of state by the L1 avoids unnecessary
>> +H_GUEST_{G|S}ET_STATE() calls.
>> +
>> +References
>> +==========
>> +
>> +For more details, please refer to:
>> +
>> +[1] Kernel documentation (currently v4 on mailing list):
>> +    - https://lore.kernel.org/linuxppc-dev/20230905034658.82835-1-jniethe5@gmail.com/
> 



end of thread, other threads:[~2023-10-12 10:27 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-06  4:33 [PATCH 00/15] Nested PAPR API (KVM on PowerVM) Harsh Prateek Bora
2023-09-06  4:33 ` [PATCH RESEND 01/15] ppc: spapr: Introduce Nested PAPR API related macros Harsh Prateek Bora
2023-09-06 23:48   ` Nicholas Piggin
2023-09-11  6:21     ` Harsh Prateek Bora
2023-09-06  4:33 ` [PATCH RESEND 02/15] ppc: spapr: Add new/extend structs to support Nested PAPR API Harsh Prateek Bora
2023-09-07  1:06   ` Nicholas Piggin
2023-09-11  6:47     ` Harsh Prateek Bora
2023-09-06  4:33 ` [PATCH RESEND 03/15] ppc: spapr: Use SpaprMachineStateNested's ptcr instead of nested_ptcr Harsh Prateek Bora
2023-09-07  1:13   ` Nicholas Piggin
2023-09-11  7:24     ` Harsh Prateek Bora
2023-09-06  4:33 ` [PATCH RESEND 04/15] ppc: spapr: Start using nested.api for nested kvm-hv api Harsh Prateek Bora
2023-09-07  1:35   ` Nicholas Piggin
2023-09-11  8:18     ` Harsh Prateek Bora
2023-09-06  4:33 ` [PATCH RESEND 05/15] ppc: spapr: Introduce cap-nested-papr for nested PAPR API Harsh Prateek Bora
2023-09-07  1:49   ` Nicholas Piggin
2023-09-19  9:49     ` Harsh Prateek Bora
2023-09-07  1:52   ` Nicholas Piggin
2023-09-06  4:33 ` [PATCH RESEND 06/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_GET_CAPABILITIES Harsh Prateek Bora
2023-09-07  2:02   ` Nicholas Piggin
2023-09-19 10:48     ` Harsh Prateek Bora
2023-10-03  8:10     ` Cédric Le Goater
2023-09-06  4:33 ` [PATCH RESEND 07/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_SET_CAPABILITIES Harsh Prateek Bora
2023-09-07  2:09   ` Nicholas Piggin
2023-10-03  4:59     ` Harsh Prateek Bora
2023-09-06  4:33 ` [PATCH RESEND 08/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE Harsh Prateek Bora
2023-09-07  2:28   ` Nicholas Piggin
2023-10-03  7:57     ` Harsh Prateek Bora
2023-09-06  4:33 ` [PATCH RESEND 09/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE_VCPU Harsh Prateek Bora
2023-09-07  2:49   ` Nicholas Piggin
2023-10-04  4:49     ` Harsh Prateek Bora
2023-09-06  4:33 ` [PATCH RESEND 10/15] ppc: spapr: Initialize the GSB Elements lookup table Harsh Prateek Bora
2023-09-07  3:01   ` Nicholas Piggin
2023-10-04  9:27     ` Harsh Prateek Bora
2023-10-04  9:42       ` Harsh Prateek Bora
2023-09-06  4:33 ` [PATCH RESEND 11/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_[GET|SET]_STATE Harsh Prateek Bora
2023-09-07  3:30   ` Nicholas Piggin
2023-10-09  8:23     ` Harsh Prateek Bora
2023-09-06  4:33 ` [PATCH RESEND 12/15] ppc: spapr: Use correct source for parttbl info for nested PAPR API Harsh Prateek Bora
2023-09-06  4:33 ` [PATCH RESEND 13/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_RUN_VCPU Harsh Prateek Bora
2023-09-07  3:55   ` Nicholas Piggin
2023-10-12 10:23     ` Harsh Prateek Bora
2023-09-06  4:33 ` [PATCH RESEND 14/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_DELETE Harsh Prateek Bora
2023-09-07  2:31   ` Nicholas Piggin
2023-10-03  8:01     ` Harsh Prateek Bora
2023-09-06  4:33 ` [PATCH RESEND 15/15] ppc: spapr: Document Nested PAPR API Harsh Prateek Bora
2023-09-07  3:56   ` Nicholas Piggin
2023-10-12 10:25     ` Harsh Prateek Bora
