* [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction
@ 2023-03-12 18:00 Jason Chen CJ
  2023-03-12 18:00 ` [RFC PATCH part-1 1/5] pkvm: arm64: Move nvhe/spinlock.h to include/asm dir Jason Chen CJ
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:00 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

Protected-KVM (pKVM) on the Intel platform is designed as a thin
hypervisor that extends KVM to support VMs isolated from the host.

I am sending out this RFC to request early review. The patches are at an
early stage and large in LOC, but I hope they present the basic idea of
pKVM on the Intel platform and give you an overview of the most
fundamental changes needed to launch VMs on top of the pKVM hypervisor.
The patches are ultimately intended to be sliced into more digestible
pieces for review and merge.

The concept of pKVM was first introduced by Google for the ARM platform
[1][2][3]; it aims to extend the Trusted Execution Environment (TEE) from
the ARM secure world to virtual machines (VMs). Such VMs are protected by
pKVM against the host OS or other VMs accessing the payloads running
inside them (hence "protected VM"). More details about the overall idea,
design, and motivations can be found in Will's talk at KVM Forum 2020 [4].

There are similar use cases on x86 platforms requesting a protected
environment, isolated from the host OS, for confidential computing.
Meanwhile, the host OS still presents the primary user interface, and
users expect the same bare-metal experience as before in terms of both
performance and functionality (such as rich-I/O usage), so the host OS
should retain the ability to manage system resources as much as possible.
At the same time, to mitigate attacks against the confidential computing
environment, the Trusted Computing Base (TCB) of the solution shall be
minimized.

HW solutions, e.g. TDX [5], also exist to support the above use cases,
but they are available only on very new platforms. Hence a software
solution for the massive base of existing platforms is also plausible.

pKVM has the merit of both providing an isolated environment for
protected VMs and sustaining the rich bare-metal experience expected by
the host OS. This is achieved by creating a small hypervisor below the
host OS which contains only the minimal functionality (e.g. VMX, EPT,
IOMMU) needed to isolate protected VMs from the host OS and other VMs.
Meanwhile, the host kernel retains access to most of the system resources
and plays the role of managing VM life cycles, allocating VM resources,
etc. The existing KVM module calls into the hypervisor (via emulation or
enlightened PV ops) to complete the functionality that has been moved
downward.
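
For the PV-ops direction, such a call boils down to a thin VMCALL
wrapper, e.g. (a minimal sketch; the function name and hypercall numbers
below are hypothetical, not the ABI defined by this series):

/* Hypothetical hypercall numbers; the real ABI is defined by pKVM. */
#define PKVM_HC_SHARE_PAGE	1UL
#define PKVM_HC_UNSHARE_PAGE	2UL

/*
 * Call from the deprivileged host VM into pKVM: VMCALL from VMX
 * non-root mode causes a VM exit into pKVM (VMX root mode), which
 * dispatches to the requested service.
 */
static inline long pkvm_hypercall2(unsigned long nr,
				   unsigned long a0, unsigned long a1)
{
	long ret;

	asm volatile("vmcall"
		     : "=a" (ret)
		     : "a" (nr), "b" (a0), "c" (a1)
		     : "memory");
	return ret;
}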

      +--------------------+   +-----------------+
      |                    |   |                 |
      |     host VM        |   |  protected VM   |
      |    (act like       |   |                 |
      |   on bare metal)   |   |                 |
      |                    |   +-----------------+
      |                    +---------------------+
      |            +--------------------+        |
      |            | vVMX, vEPT, vIOMMU |        |
      |            +--------------------+        |
      +------------------------------------------+
      +------------------------------------------+
      |       pKVM (own VMX, EPT, IOMMU)         |
      +------------------------------------------+

[note: the figure above uses Intel terminology]

The terminologies used in this RFC series:

- host VM:      native Linux, which boots pKVM then deprivileges to a VM
- protected VM: a VM launched by the host but protected by pKVM
- normal VM:    a VM launched & protected by the host

The pKVM binary is compiled as an extension of the KVM module, but
resides in a separate, dedicated memory section of the vmlinux image.
This makes pKVM easy to release and verified-boot together with the Linux
kernel image. It also means pKVM is a post-launched hypervisor, since it
is started by the KVM module.
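
For example, pKVM code and data could be tagged into such dedicated
sections roughly as follows (a sketch only; the actual section names and
macros used by the series may differ):

/* Hypothetical section annotations for pKVM-only code and data. */
#define __pkvm_text __attribute__((__section__(".pkvm.text")))
#define __pkvm_data __attribute__((__section__(".pkvm.data")))

/*
 * Boundary symbols placed by the linker script around the pKVM
 * sections, so KVM can locate and later protect the hypervisor
 * image at runtime.
 */
extern char __pkvm_text_start[], __pkvm_text_end[];

static int __pkvm_data pkvm_vmx_enabled;

static void __pkvm_text pkvm_handle_vmexit(void)
{
	/* Hypervisor-only code path, isolated from the host VM. */
}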

The ARM platform naturally supports different exception levels (ELs), and
the host kernel can be set to run at EL1 during the early boot stage
before launching the pKVM hypervisor, so pKVM just needs to be installed
at EL2. On the Intel platform, the host Linux kernel originally runs in
VMX root mode and is then deprivileged into VMX non-root mode as a host
VM, whereas pKVM keeps running in VMX root mode. Compared with pKVM on
ARM, pKVM on the Intel platform needs a more complicated deprivilege
stage to prepare and set up the VMX environment in VMX root mode.
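
Conceptually, the per-CPU deprivilege step looks like the following (a
heavily simplified sketch; the helper names are illustrative, and error
handling plus most VMCS fields are omitted - the real code arrives in
part-2):

/*
 * Sketch: enter VMX operation on this CPU, build a VMCS whose
 * guest-state area mirrors the current host context, then launch it
 * so the host kernel continues execution in VMX non-root mode.
 */
static int pkvm_deprivilege_cpu(u64 vmxon_pa, u64 vmcs_pa)
{
	cr4_set_bits(X86_CR4_VMXE);	/* CR4.VMXE is required for VMXON */
	vmxon(vmxon_pa);		/* enter VMX root operation */
	vmcs_load(vmcs_pa);

	/*
	 * Guest state = current host state, so that VMLAUNCH "resumes"
	 * the host right below, now deprivileged.
	 */
	vmcs_writel(GUEST_CR3, __read_cr3());
	vmcs_writel(GUEST_RIP, (unsigned long)&&deprivileged);
	/* ... segment, control and stack fields elided ... */

	vmlaunch();			/* does not return here on success */

deprivileged:
	return 0;			/* now running as the host VM */
}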

As a hypervisor, pKVM on the Intel platform leverages virtualization
technologies (see below) to guarantee isolation between itself and the
lower-privilege guests (including host Linux) running on top of it:

 - pKVM manages CPU state/context switches between the hypervisor and the
   different guests. This is largely done through the VMCS.

 - pKVM owns the EPT page tables to manage the GPA-to-HPA mappings of its
   host VM and guest VMs, which ensures they cannot touch the
   hypervisor's memory and are isolated from each other. This is similar
   to pKVM on ARM, which owns the stage-2 MMU page tables to isolate
   memory among the hypervisor, host, protected VMs and normal VMs. To
   allow the host to manage EPT or stage-2 page tables, pKVM can provide
   either PV ops or emulation for these page tables. pKVM on ARM chose PV
   ops, providing hypervisor calls (HVCs) in pKVM for stage-2 MMU page
   table changes. pKVM on the Intel platform provides emulation for EPT
   page table management, which avoids code changes in the x86 KVM MMU
   (see the sketch after this list).

 - pKVM owns the IOMMU (VT-d on the Intel platform, SMMU on the ARM
   platform) to manage device DMA buffer mappings and isolate DMA access.
   To allow the host to manage the IOMMU page tables, similar to
   EPT/stage-2 page table management, either PV ops or emulation can be
   chosen. pKVM on ARM chose PV ops [6], while pKVM on the Intel platform
   will use IOMMU emulation (this RFC does not cover it, and we are
   willing to change if PV ops shows more advantages).
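
To sketch the EPT emulation approach (the function names below are
assumed; the real shadow-EPT code arrives in part-6): on an EPT
violation, pKVM walks the "virtual EPT" that the host's KVM MMU maintains
as if it were hardware, validates the translation against its page
ownership records, and only then mirrors it into the real EPT that pKVM
owns:

/* Illustrative only: handle a guest EPT violation via the shadow EPT. */
static int handle_shadow_ept_violation(struct pkvm_vm *vm, u64 gpa)
{
	u64 hpa, prot;

	/* Translate via the host-managed virtual EPT. */
	if (virtual_ept_walk(vm, gpa, &hpa, &prot))
		return -EFAULT;		/* let the host VM populate it */

	/* Refuse mappings into pKVM-owned or protected-VM memory. */
	if (!host_page_allowed(vm, hpa))
		return -EPERM;

	/* Install the validated translation into the real EPT. */
	return shadow_ept_map(vm, gpa, hpa, prot);
}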

A KVM Forum 2022 talk about supporting TEE on x86 client platforms with
pKVM [7] may help you understand the framework of pKVM on Intel platforms
in more detail, along with the deltas between pKVM on Intel and on ARM
platforms.

This RFC patch series is essential groundwork for future patch series.
Based on this RFC, the host OS is deprivileged and normal VMs can be
launched on top of the pKVM hypervisor. Following is the TODO list after
this series:

- protected VMs
   * page state management
   * security enforcement at vCPU context switch
   * QEMU & crosvm
   * fd-based proposal around KVM private memory [8]
   * guest attestation

- pass-thru devices
   * IOMMU virtualization

This RFC series is organized as follows:

  - Part-1 (this patch set) refactors small portions of the pKVM on ARM
    code to ease the pKVM on Intel platform support;

  - Part-2 introduces pKVM on the Intel platform and deprivileges the
    host OS, meanwhile building pKVM as an independent binary;

  - Part-3 introduces pgtable management in pKVM on the Intel platform,
    then finally isolates pKVM & the host VM by creating pKVM's own
    address space (MMU + host EPT);

  - Part-4 contains misc changes to support VPID, debug and NMI handling
    in pKVM on the Intel platform;

  - Part-5 adds VMX emulation based on shadow VMCS;

  - Part-6 adds EPT emulation based on shadow EPT;

  - and finally Part-7 adds memory protection based on page state
    management.

This work is based on Linux 6.2, and the branch is available here if you
would like to take a look:

  https://github.com/intel-staging/pKVM-IA/tree/RFC-v6.2

Thanks

Jason CJ Chen

[1]: https://lwn.net/Articles/836693/
[2]: https://lwn.net/Articles/837552/
[3]: https://lwn.net/Articles/895790/
[4]: https://kvmforum2020.sched.com/event/eE24/virtualization-for-the-masses-exposing-kvm-on-android-will-deacon-google
[5]: https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
[6]: https://lore.kernel.org/linux-arm-kernel/20230201125328.2186498-1-jean-philippe@linaro.org/T/
[7]: https://kvmforum2022.sched.com/event/15jKc/supporting-tee-on-x86-client-platforms-with-pkvm-jason-chen-intel
[8]: https://lwn.net/Articles/916589/

Jason Chen CJ (5):
  pkvm: arm64: Move nvhe/spinlock.h to include/asm dir
  pkvm: arm64: Make page allocator arch agnostic
  pkvm: arm64: Move page allocator to virt/kvm/pkvm
  pkvm: arm64: Make memory reservation arch agnostic
  pkvm: arm64: Move general part of memory reservation to virt/kvm/pkvm

 arch/arm64/include/asm/kvm_pkvm.h             |  8 ++
 .../asm/pkvm_spinlock.h}                      |  6 +-
 arch/arm64/kvm/Makefile                       |  3 +
 arch/arm64/kvm/hyp/hyp-constants.c            |  2 +-
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +-
 arch/arm64/kvm/hyp/include/nvhe/mm.h          |  4 +-
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  4 +-
 arch/arm64/kvm/hyp/nvhe/Makefile              |  4 +-
 arch/arm64/kvm/hyp/nvhe/early_alloc.c         |  2 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  4 +-
 arch/arm64/kvm/hyp/nvhe/mm.c                  |  6 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c                |  2 +-
 arch/arm64/kvm/hyp/nvhe/psci-relay.c          |  2 +-
 arch/arm64/kvm/hyp/nvhe/setup.c               |  4 +-
 arch/arm64/kvm/pkvm.c                         | 76 ++---------------
 .../memory.h => virt/kvm/pkvm/buddy_memory.h  | 10 +--
 .../hyp/include/nvhe => virt/kvm/pkvm}/gfp.h  | 10 +--
 .../hyp/nvhe => virt/kvm/pkvm}/page_alloc.c   |  3 +-
 virt/kvm/pkvm/pkvm.c                          | 84 +++++++++++++++++++
 19 files changed, 134 insertions(+), 102 deletions(-)
 rename arch/arm64/{kvm/hyp/include/nvhe/spinlock.h => include/asm/pkvm_spinlock.h} (95%)
 rename arch/arm64/kvm/hyp/include/nvhe/memory.h => virt/kvm/pkvm/buddy_memory.h (89%)
 rename {arch/arm64/kvm/hyp/include/nvhe => virt/kvm/pkvm}/gfp.h (86%)
 rename {arch/arm64/kvm/hyp/nvhe => virt/kvm/pkvm}/page_alloc.c (99%)
 create mode 100644 virt/kvm/pkvm/pkvm.c

-- 
2.25.1



* [RFC PATCH part-1 1/5] pkvm: arm64: Move nvhe/spinlock.h to include/asm dir
  2023-03-12 18:00 [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction Jason Chen CJ
@ 2023-03-12 18:00 ` Jason Chen CJ
  2023-03-12 18:00 ` [RFC PATCH part-1 2/5] pkvm: arm64: Make page allocator arch agnostic Jason Chen CJ
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:00 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

Move nvhe/spinlock.h to the include/asm dir and rename it to
pkvm_spinlock.h. This helps expose the spinlock for pKVM, which is needed
by the following patch that moves the pKVM page allocator to a generic
dir.

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 .../include/nvhe/spinlock.h => include/asm/pkvm_spinlock.h} | 6 +++---
 arch/arm64/kvm/hyp/include/nvhe/gfp.h                       | 2 +-
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h               | 2 +-
 arch/arm64/kvm/hyp/include/nvhe/mm.h                        | 2 +-
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h                      | 2 +-
 arch/arm64/kvm/hyp/nvhe/mm.c                                | 2 +-
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h b/arch/arm64/include/asm/pkvm_spinlock.h
similarity index 95%
rename from arch/arm64/kvm/hyp/include/nvhe/spinlock.h
rename to arch/arm64/include/asm/pkvm_spinlock.h
index 7c7ea8c55405..456417b40645 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
+++ b/arch/arm64/include/asm/pkvm_spinlock.h
@@ -10,8 +10,8 @@
  * Copyright (C) 2012 ARM Ltd.
  */
 
-#ifndef __ARM64_KVM_NVHE_SPINLOCK_H__
-#define __ARM64_KVM_NVHE_SPINLOCK_H__
+#ifndef __ARM64_ASM_PKVM_SPINLOCK_H__
+#define __ARM64_ASM_PKVM_SPINLOCK_H__
 
 #include <asm/alternative.h>
 #include <asm/lse.h>
@@ -122,4 +122,4 @@ static inline void hyp_assert_lock_held(hyp_spinlock_t *lock)
 static inline void hyp_assert_lock_held(hyp_spinlock_t *lock) { }
 #endif
 
-#endif /* __ARM64_KVM_NVHE_SPINLOCK_H__ */
+#endif /* __ARM64_ASM_PKVM_SPINLOCK_H__ */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index 0a048dc06a7d..58e9f15b6a64 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -5,7 +5,7 @@
 #include <linux/list.h>
 
 #include <nvhe/memory.h>
-#include <nvhe/spinlock.h>
+#include <asm/pkvm_spinlock.h>
 
 #define HYP_NO_ORDER	USHRT_MAX
 
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index b7bdbe63deed..12b5db7a1ffe 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -12,7 +12,7 @@
 #include <asm/kvm_pgtable.h>
 #include <asm/virt.h>
 #include <nvhe/pkvm.h>
-#include <nvhe/spinlock.h>
+#include <asm/pkvm_spinlock.h>
 
 /*
  * SW bits 0-1 are reserved to track the memory ownership state of each page:
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index d5ec972b5c1e..1d50bb1da315 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -8,7 +8,7 @@
 #include <linux/types.h>
 
 #include <nvhe/memory.h>
-#include <nvhe/spinlock.h>
+#include <asm/pkvm_spinlock.h>
 
 extern struct kvm_pgtable pkvm_pgtable;
 extern hyp_spinlock_t pkvm_pgd_lock;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 82b3d62538a6..992d3492297b 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -10,7 +10,7 @@
 #include <asm/kvm_pkvm.h>
 
 #include <nvhe/gfp.h>
-#include <nvhe/spinlock.h>
+#include <asm/pkvm_spinlock.h>
 
 /*
  * Holds the relevant data for maintaining the vcpu state completely at hyp.
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 318298eb3d6b..9f740e441bce 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -16,7 +16,7 @@
 #include <nvhe/memory.h>
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
-#include <nvhe/spinlock.h>
+#include <asm/pkvm_spinlock.h>
 
 struct kvm_pgtable pkvm_pgtable;
 hyp_spinlock_t pkvm_pgd_lock;
-- 
2.25.1



* [RFC PATCH part-1 2/5] pkvm: arm64: Make page allocator arch agnostic
  2023-03-12 18:00 [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction Jason Chen CJ
  2023-03-12 18:00 ` [RFC PATCH part-1 1/5] pkvm: arm64: Move nvhe/spinlock.h to include/asm dir Jason Chen CJ
@ 2023-03-12 18:00 ` Jason Chen CJ
  2023-03-12 18:00 ` [RFC PATCH part-1 3/5] pkvm: arm64: Move page allocator to virt/kvm/pkvm Jason Chen CJ
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:00 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

Move the arm64 arch-specific definitions in memory.h to asm/kvm_pkvm.h
and remove the unnecessary asm/kvm_hyp.h include in page_alloc.c. Then
memory.h and page_alloc.c are arch agnostic and can be moved to a generic
dir.

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/arm64/include/asm/kvm_pkvm.h        | 3 +++
 arch/arm64/kvm/hyp/include/nvhe/memory.h | 4 +---
 arch/arm64/kvm/hyp/nvhe/page_alloc.c     | 1 -
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 01129b0d4c68..2cc283feb97d 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -8,12 +8,15 @@
 
 #include <linux/memblock.h>
 #include <asm/kvm_pgtable.h>
+#include <asm/kvm_mmu.h>
 
 /* Maximum number of VMs that can co-exist under pKVM. */
 #define KVM_MAX_PVMS 255
 
 #define HYP_MEMBLOCK_REGIONS 128
 
+#define __hyp_va(phys)	((void *)((phys_addr_t)(phys) - hyp_physvirt_offset))
+
 int pkvm_init_host_vm(struct kvm *kvm);
 int pkvm_create_hyp_vm(struct kvm *kvm);
 void pkvm_destroy_hyp_vm(struct kvm *kvm);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index ab205c4d6774..e7d05f41ddf2 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -2,7 +2,7 @@
 #ifndef __KVM_HYP_MEMORY_H
 #define __KVM_HYP_MEMORY_H
 
-#include <asm/kvm_mmu.h>
+#include <asm/kvm_pkvm.h>
 #include <asm/page.h>
 
 #include <linux/types.h>
@@ -15,8 +15,6 @@ struct hyp_page {
 extern u64 __hyp_vmemmap;
 #define hyp_vmemmap ((struct hyp_page *)__hyp_vmemmap)
 
-#define __hyp_va(phys)	((void *)((phys_addr_t)(phys) - hyp_physvirt_offset))
-
 static inline void *hyp_phys_to_virt(phys_addr_t phys)
 {
 	return __hyp_va(phys);
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index 803ba3222e75..ef164102ab6a 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -4,7 +4,6 @@
  * Author: Quentin Perret <qperret@google.com>
  */
 
-#include <asm/kvm_hyp.h>
 #include <nvhe/gfp.h>
 
 u64 __hyp_vmemmap;
-- 
2.25.1



* [RFC PATCH part-1 3/5] pkvm: arm64: Move page allocator to virt/kvm/pkvm
  2023-03-12 18:00 [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction Jason Chen CJ
  2023-03-12 18:00 ` [RFC PATCH part-1 1/5] pkvm: arm64: Move nvhe/spinlock.h to include/asm dir Jason Chen CJ
  2023-03-12 18:00 ` [RFC PATCH part-1 2/5] pkvm: arm64: Make page allocator arch agnostic Jason Chen CJ
@ 2023-03-12 18:00 ` Jason Chen CJ
  2023-03-12 18:00 ` [RFC PATCH part-1 4/5] pkvm: arm64: Make memory reservation arch agnostic Jason Chen CJ
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:00 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

Create virt/kvm/pkvm to hold arch-agnostic files. The first set of files
moved to this directory is related to the page allocator. As the name
memory.h is too general (it may also be used by pKVM for other purposes
in the future) and here it only serves the buddy page allocator, rename
it to buddy_memory.h.

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/arm64/kvm/Makefile                                   | 1 +
 arch/arm64/kvm/hyp/hyp-constants.c                        | 2 +-
 arch/arm64/kvm/hyp/include/nvhe/mm.h                      | 2 +-
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h                    | 2 +-
 arch/arm64/kvm/hyp/nvhe/Makefile                          | 4 +++-
 arch/arm64/kvm/hyp/nvhe/early_alloc.c                     | 2 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c                     | 4 ++--
 arch/arm64/kvm/hyp/nvhe/mm.c                              | 4 ++--
 arch/arm64/kvm/hyp/nvhe/pkvm.c                            | 2 +-
 arch/arm64/kvm/hyp/nvhe/psci-relay.c                      | 2 +-
 arch/arm64/kvm/hyp/nvhe/setup.c                           | 4 ++--
 .../include/nvhe/memory.h => virt/kvm/pkvm/buddy_memory.h | 6 +++---
 {arch/arm64/kvm/hyp/include/nvhe => virt/kvm/pkvm}/gfp.h  | 8 ++++----
 {arch/arm64/kvm/hyp/nvhe => virt/kvm/pkvm}/page_alloc.c   | 2 +-
 14 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5e33c2d4645a..119b074b001a 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -31,6 +31,7 @@ define rule_gen_hyp_constants
 endef
 
 CFLAGS_hyp-constants.o = -I $(srctree)/$(src)/hyp/include
+CFLAGS_hyp-constants.o += -I $(srctree)/virt/kvm/pkvm
 $(obj)/hyp-constants.s: $(src)/hyp/hyp-constants.c FORCE
 	$(call if_changed_dep,cc_s_c)
 
diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
index b257a3b4bfc5..6127969cb182 100644
--- a/arch/arm64/kvm/hyp/hyp-constants.c
+++ b/arch/arm64/kvm/hyp/hyp-constants.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
 #include <linux/kbuild.h>
-#include <nvhe/memory.h>
+#include <buddy_memory.h>
 #include <nvhe/pkvm.h>
 
 int main(void)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 1d50bb1da315..1a955b16c06b 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -7,7 +7,7 @@
 #include <linux/memblock.h>
 #include <linux/types.h>
 
-#include <nvhe/memory.h>
+#include <buddy_memory.h>
 #include <asm/pkvm_spinlock.h>
 
 extern struct kvm_pgtable pkvm_pgtable;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 992d3492297b..4e713e3c4daa 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -9,7 +9,7 @@
 
 #include <asm/kvm_pkvm.h>
 
-#include <nvhe/gfp.h>
+#include <gfp.h>
 #include <asm/pkvm_spinlock.h>
 
 /*
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 530347cdebe3..6cda2cb9b500 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -13,6 +13,7 @@ ccflags-y := -D__KVM_NVHE_HYPERVISOR__ -D__DISABLE_EXPORTS -D__DISABLE_TRACE_MMI
 ccflags-y += -fno-stack-protector	\
 	     -DDISABLE_BRANCH_PROFILING	\
 	     $(DISABLE_STACKLEAK_PLUGIN)
+ccflags-y += -I $(srctree)/virt/kvm/pkvm
 
 hostprogs := gen-hyprel
 HOST_EXTRACFLAGS += -I$(objtree)/include
@@ -21,10 +22,11 @@ lib-objs := clear_page.o copy_page.o memcpy.o memset.o
 lib-objs := $(addprefix ../../../lib/, $(lib-objs))
 
 hyp-obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
-	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o page_alloc.o \
+	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o \
 	 cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o stacktrace.o
 hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
+hyp-obj-y += ../../../../../virt/kvm/pkvm/page_alloc.o
 hyp-obj-$(CONFIG_DEBUG_LIST) += list_debug.o
 hyp-obj-y += $(lib-objs)
 
diff --git a/arch/arm64/kvm/hyp/nvhe/early_alloc.c b/arch/arm64/kvm/hyp/nvhe/early_alloc.c
index 00de04153cc6..be1e72cdcbce 100644
--- a/arch/arm64/kvm/hyp/nvhe/early_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/early_alloc.c
@@ -7,7 +7,7 @@
 #include <asm/kvm_pgtable.h>
 
 #include <nvhe/early_alloc.h>
-#include <nvhe/memory.h>
+#include <buddy_memory.h>
 
 struct kvm_pgtable_mm_ops hyp_early_alloc_mm_ops;
 s64 __ro_after_init hyp_physvirt_offset;
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 552653fa18be..183ae39d2571 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -14,8 +14,8 @@
 
 #include <hyp/fault.h>
 
-#include <nvhe/gfp.h>
-#include <nvhe/memory.h>
+#include <gfp.h>
+#include <buddy_memory.h>
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
 
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 9f740e441bce..ca556bb72a90 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -12,8 +12,8 @@
 #include <asm/spectre.h>
 
 #include <nvhe/early_alloc.h>
-#include <nvhe/gfp.h>
-#include <nvhe/memory.h>
+#include <gfp.h>
+#include <buddy_memory.h>
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
 #include <asm/pkvm_spinlock.h>
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index a06ece14a6d8..75a019345ab5 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -8,7 +8,7 @@
 #include <linux/mm.h>
 #include <nvhe/fixed_config.h>
 #include <nvhe/mem_protect.h>
-#include <nvhe/memory.h>
+#include <buddy_memory.h>
 #include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
index 08508783ec3d..1c757bd02d4d 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
@@ -11,7 +11,7 @@
 #include <linux/kvm_host.h>
 #include <uapi/linux/psci.h>
 
-#include <nvhe/memory.h>
+#include <buddy_memory.h>
 #include <nvhe/trap_handler.h>
 
 void kvm_hyp_cpu_entry(unsigned long r0);
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 110f04627785..395affd81421 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -12,8 +12,8 @@
 
 #include <nvhe/early_alloc.h>
 #include <nvhe/fixed_config.h>
-#include <nvhe/gfp.h>
-#include <nvhe/memory.h>
+#include <gfp.h>
+#include <buddy_memory.h>
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
 #include <nvhe/pkvm.h>
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/virt/kvm/pkvm/buddy_memory.h
similarity index 94%
rename from arch/arm64/kvm/hyp/include/nvhe/memory.h
rename to virt/kvm/pkvm/buddy_memory.h
index e7d05f41ddf2..b961cb7ac28f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/virt/kvm/pkvm/buddy_memory.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
-#ifndef __KVM_HYP_MEMORY_H
-#define __KVM_HYP_MEMORY_H
+#ifndef __PKVM_BUDDY_MEMORY_H
+#define __PKVM_BUDDY_MEMORY_H
 
 #include <asm/kvm_pkvm.h>
 #include <asm/page.h>
@@ -70,4 +70,4 @@ static inline void hyp_set_page_refcounted(struct hyp_page *p)
 	BUG_ON(p->refcount);
 	p->refcount = 1;
 }
-#endif /* __KVM_HYP_MEMORY_H */
+#endif /* __PKVM_BUDDY_MEMORY_H */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/virt/kvm/pkvm/gfp.h
similarity index 89%
rename from arch/arm64/kvm/hyp/include/nvhe/gfp.h
rename to virt/kvm/pkvm/gfp.h
index 58e9f15b6a64..1c3ff697efea 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/virt/kvm/pkvm/gfp.h
@@ -1,10 +1,10 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
-#ifndef __KVM_HYP_GFP_H
-#define __KVM_HYP_GFP_H
+#ifndef __PKVM_GFP_H
+#define __PKVM_GFP_H
 
 #include <linux/list.h>
 
-#include <nvhe/memory.h>
+#include <buddy_memory.h>
 #include <asm/pkvm_spinlock.h>
 
 #define HYP_NO_ORDER	USHRT_MAX
@@ -31,4 +31,4 @@ void hyp_put_page(struct hyp_pool *pool, void *addr);
 /* Used pages cannot be freed */
 int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
 		  unsigned int reserved_pages);
-#endif /* __KVM_HYP_GFP_H */
+#endif /* __PKVM_GFP_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/virt/kvm/pkvm/page_alloc.c
similarity index 99%
rename from arch/arm64/kvm/hyp/nvhe/page_alloc.c
rename to virt/kvm/pkvm/page_alloc.c
index ef164102ab6a..a090ccba7717 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/virt/kvm/pkvm/page_alloc.c
@@ -4,7 +4,7 @@
  * Author: Quentin Perret <qperret@google.com>
  */
 
-#include <nvhe/gfp.h>
+#include <gfp.h>
 
 u64 __hyp_vmemmap;
 
-- 
2.25.1



* [RFC PATCH part-1 4/5] pkvm: arm64: Make memory reservation arch agnostic
  2023-03-12 18:00 [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction Jason Chen CJ
                   ` (2 preceding siblings ...)
  2023-03-12 18:00 ` [RFC PATCH part-1 3/5] pkvm: arm64: Move page allocator to virt/kvm/pkvm Jason Chen CJ
@ 2023-03-12 18:00 ` Jason Chen CJ
  2023-03-12 18:00 ` [RFC PATCH part-1 5/5] pkvm: arm64: Move general part of memory reservation to virt/kvm/pkvm Jason Chen CJ
  2023-03-13 16:33 ` [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction Sean Christopherson
  5 siblings, 0 replies; 18+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:00 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

Do not use the arm-specific kvm_nvhe_sym; expose a new definition,
pkvm_sym, for this usage.

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/arm64/include/asm/kvm_pkvm.h | 2 ++
 arch/arm64/kvm/pkvm.c             | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 2cc283feb97d..b508c7b63ff4 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -17,6 +17,8 @@
 
 #define __hyp_va(phys)	((void *)((phys_addr_t)(phys) - hyp_physvirt_offset))
 
+#define pkvm_sym kvm_nvhe_sym
+
 int pkvm_init_host_vm(struct kvm *kvm);
 int pkvm_create_hyp_vm(struct kvm *kvm);
 void pkvm_destroy_hyp_vm(struct kvm *kvm);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index cf56958b1492..e787bd704043 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -13,8 +13,8 @@
 
 #include "hyp_constants.h"
 
-static struct memblock_region *hyp_memory = kvm_nvhe_sym(hyp_memory);
-static unsigned int *hyp_memblock_nr_ptr = &kvm_nvhe_sym(hyp_memblock_nr);
+static struct memblock_region *hyp_memory = pkvm_sym(hyp_memory);
+static unsigned int *hyp_memblock_nr_ptr = &pkvm_sym(hyp_memblock_nr);
 
 phys_addr_t hyp_mem_base;
 phys_addr_t hyp_mem_size;
-- 
2.25.1



* [RFC PATCH part-1 5/5] pkvm: arm64: Move general part of memory reservation to virt/kvm/pkvm
  2023-03-12 18:00 [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction Jason Chen CJ
                   ` (3 preceding siblings ...)
  2023-03-12 18:00 ` [RFC PATCH part-1 4/5] pkvm: arm64: Make memory reservation arch agnostic Jason Chen CJ
@ 2023-03-12 18:00 ` Jason Chen CJ
  2023-03-13 16:33 ` [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction Sean Christopherson
  5 siblings, 0 replies; 18+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:00 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

Most of the memory reservation for pKVM is arch agnostic; move that part
to virt/kvm/pkvm/pkvm.c. The arch-specific pre_reserve_check and
total_reserve_pages calculation are separated into arch-implemented APIs
and remain in the arch dir.

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/arm64/include/asm/kvm_pkvm.h |  3 ++
 arch/arm64/kvm/Makefile           |  2 +
 arch/arm64/kvm/pkvm.c             | 76 +++-------------------------
 virt/kvm/pkvm/pkvm.c              | 84 +++++++++++++++++++++++++++++++
 4 files changed, 96 insertions(+), 69 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index b508c7b63ff4..42d32d99595e 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -26,6 +26,9 @@ void pkvm_destroy_hyp_vm(struct kvm *kvm);
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
 extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
 
+int hyp_pre_reserve_check(void);
+u64 hyp_total_reserve_pages(void);
+
 static inline unsigned long
 hyp_vmemmap_memblock_size(struct memblock_region *reg, size_t vmemmap_entry_size)
 {
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 119b074b001a..9691fd90de6b 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -22,6 +22,8 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
 	 vgic/vgic-its.o vgic/vgic-debug.o
 
+kvm-y += ../../../virt/kvm/pkvm/pkvm.o
+
 kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o pmu.o
 
 always-y := hyp_constants.h hyp-constants.s
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index e787bd704043..97b9647f3370 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -5,95 +5,33 @@
  */
 
 #include <linux/kvm_host.h>
-#include <linux/memblock.h>
 #include <linux/mutex.h>
-#include <linux/sort.h>
 
 #include <asm/kvm_pkvm.h>
 
 #include "hyp_constants.h"
 
-static struct memblock_region *hyp_memory = pkvm_sym(hyp_memory);
-static unsigned int *hyp_memblock_nr_ptr = &pkvm_sym(hyp_memblock_nr);
-
-phys_addr_t hyp_mem_base;
-phys_addr_t hyp_mem_size;
-
-static int cmp_hyp_memblock(const void *p1, const void *p2)
+int hyp_pre_reserve_check(void)
 {
-	const struct memblock_region *r1 = p1;
-	const struct memblock_region *r2 = p2;
-
-	return r1->base < r2->base ? -1 : (r1->base > r2->base);
-}
-
-static void __init sort_memblock_regions(void)
-{
-	sort(hyp_memory,
-	     *hyp_memblock_nr_ptr,
-	     sizeof(struct memblock_region),
-	     cmp_hyp_memblock,
-	     NULL);
-}
-
-static int __init register_memblock_regions(void)
-{
-	struct memblock_region *reg;
-
-	for_each_mem_region(reg) {
-		if (*hyp_memblock_nr_ptr >= HYP_MEMBLOCK_REGIONS)
-			return -ENOMEM;
+	if (!is_hyp_mode_available() || is_kernel_in_hyp_mode())
+		return -EINVAL;
 
-		hyp_memory[*hyp_memblock_nr_ptr] = *reg;
-		(*hyp_memblock_nr_ptr)++;
-	}
-	sort_memblock_regions();
+	if (kvm_get_mode() != KVM_MODE_PROTECTED)
+		return -EINVAL;
 
 	return 0;
 }
 
-void __init kvm_hyp_reserve(void)
+u64 hyp_total_reserve_pages(void)
 {
 	u64 hyp_mem_pages = 0;
-	int ret;
-
-	if (!is_hyp_mode_available() || is_kernel_in_hyp_mode())
-		return;
-
-	if (kvm_get_mode() != KVM_MODE_PROTECTED)
-		return;
-
-	ret = register_memblock_regions();
-	if (ret) {
-		*hyp_memblock_nr_ptr = 0;
-		kvm_err("Failed to register hyp memblocks: %d\n", ret);
-		return;
-	}
 
 	hyp_mem_pages += hyp_s1_pgtable_pages();
 	hyp_mem_pages += host_s2_pgtable_pages();
 	hyp_mem_pages += hyp_vm_table_pages();
 	hyp_mem_pages += hyp_vmemmap_pages(STRUCT_HYP_PAGE_SIZE);
 
-	/*
-	 * Try to allocate a PMD-aligned region to reduce TLB pressure once
-	 * this is unmapped from the host stage-2, and fallback to PAGE_SIZE.
-	 */
-	hyp_mem_size = hyp_mem_pages << PAGE_SHIFT;
-	hyp_mem_base = memblock_phys_alloc(ALIGN(hyp_mem_size, PMD_SIZE),
-					   PMD_SIZE);
-	if (!hyp_mem_base)
-		hyp_mem_base = memblock_phys_alloc(hyp_mem_size, PAGE_SIZE);
-	else
-		hyp_mem_size = ALIGN(hyp_mem_size, PMD_SIZE);
-
-	if (!hyp_mem_base) {
-		kvm_err("Failed to reserve hyp memory\n");
-		return;
-	}
-
-	kvm_info("Reserved %lld MiB at 0x%llx\n", hyp_mem_size >> 20,
-		 hyp_mem_base);
+	return hyp_mem_pages;
 }
 
 /*
diff --git a/virt/kvm/pkvm/pkvm.c b/virt/kvm/pkvm/pkvm.c
new file mode 100644
index 000000000000..6f06a41f0e77
--- /dev/null
+++ b/virt/kvm/pkvm/pkvm.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#include <linux/memblock.h>
+#include <linux/sort.h>
+
+#include <asm/kvm_pkvm.h>
+
+static struct memblock_region *hyp_memory = pkvm_sym(hyp_memory);
+static unsigned int *hyp_memblock_nr_ptr = &pkvm_sym(hyp_memblock_nr);
+
+phys_addr_t hyp_mem_base;
+phys_addr_t hyp_mem_size;
+
+static int cmp_hyp_memblock(const void *p1, const void *p2)
+{
+	const struct memblock_region *r1 = p1;
+	const struct memblock_region *r2 = p2;
+
+	return r1->base < r2->base ? -1 : (r1->base > r2->base);
+}
+
+static void __init sort_memblock_regions(void)
+{
+	sort(hyp_memory,
+	     *hyp_memblock_nr_ptr,
+	     sizeof(struct memblock_region),
+	     cmp_hyp_memblock,
+	     NULL);
+}
+
+static int __init register_memblock_regions(void)
+{
+	struct memblock_region *reg;
+
+	for_each_mem_region(reg) {
+		if (*hyp_memblock_nr_ptr >= HYP_MEMBLOCK_REGIONS)
+			return -ENOMEM;
+
+		hyp_memory[*hyp_memblock_nr_ptr] = *reg;
+		(*hyp_memblock_nr_ptr)++;
+	}
+	sort_memblock_regions();
+
+	return 0;
+}
+
+void __init kvm_hyp_reserve(void)
+{
+	int ret;
+
+	if (hyp_pre_reserve_check() < 0)
+		return;
+
+	ret = register_memblock_regions();
+	if (ret) {
+		*hyp_memblock_nr_ptr = 0;
+		kvm_err("Failed to register hyp memblocks: %d\n", ret);
+		return;
+	}
+
+	/*
+	 * Try to allocate a PMD-aligned region to reduce TLB pressure once
+	 * this is unmapped from the host stage-2, and fallback to PAGE_SIZE.
+	 */
+	hyp_mem_size = hyp_total_reserve_pages() << PAGE_SHIFT;
+	hyp_mem_base = memblock_phys_alloc(ALIGN(hyp_mem_size, PMD_SIZE),
+					   PMD_SIZE);
+	if (!hyp_mem_base)
+		hyp_mem_base = memblock_phys_alloc(hyp_mem_size, PAGE_SIZE);
+	else
+		hyp_mem_size = ALIGN(hyp_mem_size, PMD_SIZE);
+
+	if (!hyp_mem_base) {
+		kvm_err("Failed to reserve hyp memory\n");
+		return;
+	}
+
+	kvm_info("Reserved %lld MiB at 0x%llx\n", hyp_mem_size >> 20,
+		 hyp_mem_base);
+}
-- 
2.25.1



* Re: [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction
  2023-03-12 18:00 [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction Jason Chen CJ
                   ` (4 preceding siblings ...)
  2023-03-12 18:00 ` [RFC PATCH part-1 5/5] pkvm: arm64: Move general part of memory reservation to virt/kvm/pkvm Jason Chen CJ
@ 2023-03-13 16:33 ` Sean Christopherson
  2023-03-14 16:17   ` Jason Chen CJ
  5 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2023-03-13 16:33 UTC (permalink / raw)
  To: Jason Chen CJ; +Cc: kvm

On Mon, Mar 13, 2023, Jason Chen CJ wrote:
> There are similar use cases on x86 platforms requesting a protected
> environment, isolated from the host OS, for confidential computing.

What exactly are those use cases?  The more details you can provide, the better.
E.g. restricting the isolated VMs to 64-bit mode a la TDX would likely simplify
the pKVM implementation.

> HW solutions, e.g. TDX [5], also exist to support the above use cases,
> but they are available only on very new platforms. Hence a software
> solution for the massive base of existing platforms is also plausible.

TDX is a software solution, not a hardware solution.  TDX relies on hardware features
that are only present in bleeding edge CPUs, e.g. SEAM, but TDX itself is software.

I bring that up because this RFC, especially since it's being posted by folks
from Intel, raises the question: why not utilize SEAM to implement pKVM for x86?


* Re: [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction
  2023-03-14 16:17   ` Jason Chen CJ
@ 2023-03-14 14:21     ` Sean Christopherson
  2023-03-16  8:50       ` Jason Chen CJ
  2023-03-24 10:30       ` Keir Fraser
  0 siblings, 2 replies; 18+ messages in thread
From: Sean Christopherson @ 2023-03-14 14:21 UTC (permalink / raw)
  To: Jason Chen CJ; +Cc: kvm

On Tue, Mar 14, 2023, Jason Chen CJ wrote:
> On Mon, Mar 13, 2023 at 09:33:41AM -0700, Sean Christopherson wrote:
> 
> > On Mon, Mar 13, 2023, Jason Chen CJ wrote:
> > > There are similar use cases on x86 platforms requesting a protected
> > > environment, isolated from the host OS, for confidential computing.
> > 
> > What exactly are those use cases?  The more details you can provide, the better.
> > E.g. restricting the isolated VMs to 64-bit mode a la TDX would likely simplify
> > the pKVM implementation.
> 
> Thanks Sean for your comments, much appreciated!
> 
> We are expected 

Who is "we"?  Unless Intel is making a rather large pivot, I doubt Intel is the
end customer of pKVM-on-x86.  If you aren't at liberty to say due NDA/confidentiality,
then please work with whoever you need to in order to get permission to fully
disclose the use case.  Because realistically, without knowing exactly what is
in scope and why, this is going nowhere.  

> to run protected VMs with a general OS, and possibly with pass-thru secure device support.

Why?  What is the actual use case?

> May I know if your suggestion of "utilize SEAM" is to follow the TDX
> spec and work out a SW-TDX solution, or just to leverage some code from SEAM?

Throw away TDX and let KVM run its own code in SEAM.


* Re: [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction
  2023-03-13 16:33 ` [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction Sean Christopherson
@ 2023-03-14 16:17   ` Jason Chen CJ
  2023-03-14 14:21     ` Sean Christopherson
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Chen CJ @ 2023-03-14 16:17 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm

On Mon, Mar 13, 2023 at 09:33:41AM -0700, Sean Christopherson wrote:

> On Mon, Mar 13, 2023, Jason Chen CJ wrote:
> > There are similar use cases on x86 platforms requesting a protected
> > environment, isolated from the host OS, for confidential computing.
> 
> What exactly are those use cases?  The more details you can provide, the better.
> E.g. restricting the isolated VMs to 64-bit mode a la TDX would likely simplify
> the pKVM implementation.

Thanks Sean for your comments, much appreciated!

We are expected to run protected VMs with a general OS, and possibly
with pass-thru secure device support.

Yes, restricting the isolated (protected) VMs to 64-bit mode could
simplify the pKVM implementation; I think it should be considered.
Especially, it could benefit VMCS isolation for protected VMs - it echoes
your comments on VMX emulation.

But we have a pain point supporting normal VMs. TDX SEAM only takes care
of protected VMs: it has a dedicated secure EPT, TDCS, etc. for a
protected VM, while a normal VM still goes through the old KVM logic,
where legacy EPT, VMCS and the like are still there.

For pKVM, we must rely on EPT, VMCS and the IOMMU to do the isolation, so
these move into the hypervisor, and KVM-high needs to manage them through
pKVM for both normal & protected VMs:

 - for EPT, technically both the paravirtualization and emulation methods
   work; we chose EPT emulation only because we do not want to change the
   KVM x86 MMU code. I am open to switching to the paravirtualization
   method, especially after the TDX patches get merged - we can leverage
   them, but with more consideration for supporting normal VMs.

 - for VMCS, it's trickier: the best solution is for normal VMs to run
   with emulated VMX to see the full VMCS features, while protected VMs
   run with paravirtualized VMX to limit the supported features (which
   simplifies the pKVM implementation of VMCS isolation & management).

 - for the IOMMU, the situation is similar to EPT.

> 
> > HW solutions, e.g. TDX [5], also exist to support the above use cases,
> > but they are available only on very new platforms. Hence a software
> > solution for the massive base of existing platforms is also plausible.
> 
> TDX is a software solution, not a hardware solution.  TDX relies on hardware features
> that are only present in bleeding edge CPUs, e.g. SEAM, but TDX itself is software.

Agree.

> 
> I bring that up because this RFC, especially since it's being posted by folks
> from Intel, raises the question: why not utilize SEAM to implement pKVM for x86?

Some feedback above; I suppose SEAM could be leveraged to support
protected VMs, but with some further questions:

 - how to support normal VMs? If we make the tradeoff of limiting a
   normal VM's features (same as a protected VM), then things may become
   easier - but I don't think that's friendly to end users. If we want to
   run normal VMs as KVM can run them now, we need to add extra code in
   SEAM.

 - do we want to follow the same interface? My feeling is that TDX
   interfaces like SEAMCALL for SEPT (PAGE.ADD/AUG, SEPT.ADD, etc.) are
   complicated; for pKVM, we can actually use simpler, straightforward
   hypercalls like host_donate_guest, host_donate_hyp, host_share_guest,
   ... (see the sketch after this list). Furthermore, in a protected VM
   (a TD guest in TDX terms), PAGE.ACCEPT may not be needed for pKVM, and
   page sharing (based on the SHARED_BIT) may also have a different
   implementation in pKVM.

 - do we want to leverage a page ownership mechanism like the PAMT? I
   have to say pKVM already has a page state management mechanism that
   can easily be used.
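
For example, with a VMCALL wrapper like the pkvm_hypercall2() sketched in
the cover letter, donating a page could look roughly like this (the
hypercall number and semantics here are hypothetical):

/* Sketch only: move one page from host ownership to a protected VM. */
static int donate_page_to_guest(gfn_t gfn, kvm_pfn_t pfn)
{
	/*
	 * On success the host loses access to the page and pKVM maps
	 * it into the protected VM at the given GFN.
	 */
	long ret = pkvm_hypercall2(PKVM_HC_HOST_DONATE_GUEST,
				   (unsigned long)pfn, (unsigned long)gfn);

	return ret ? -EINVAL : 0;
}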

May I know if your suggestion of "utilize SEAM" is to follow the TDX
spec and work out a SW-TDX solution, or just to leverage some code from SEAM?

-- 

Thanks
Jason CJ Chen


* Re: [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction
  2023-03-14 14:21     ` Sean Christopherson
@ 2023-03-16  8:50       ` Jason Chen CJ
  2023-03-24 10:30       ` Keir Fraser
  1 sibling, 0 replies; 18+ messages in thread
From: Jason Chen CJ @ 2023-03-16  8:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm

On Tue, Mar 14, 2023 at 07:21:18AM -0700, Sean Christopherson wrote:
> On Tue, Mar 14, 2023, Jason Chen CJ wrote:
> > On Mon, Mar 13, 2023 at 09:33:41AM -0700, Sean Christopherson wrote:
> > 
> > > On Mon, Mar 13, 2023, Jason Chen CJ wrote:
> > > > There are similar use cases on x86 platforms requesting a protected
> > > > environment, isolated from the host OS, for confidential computing.
> > > 
> > > What exactly are those use cases?  The more details you can provide, the better.
> > > E.g. restricting the isolated VMs to 64-bit mode a la TDX would likely simplify
> > > the pKVM implementation.
> > 
> > Thanks Sean for your comments, much appreciated!
> > 
> > We are expected 
> 
> Who is "we"?  Unless Intel is making a rather large pivot, I doubt Intel is the
> end customer of pKVM-on-x86.  If you aren't at liberty to say due NDA/confidentiality,
> then please work with whoever you need to in order to get permission to fully
> disclose the use case.  Because realistically, without knowing exactly what is
> in scope and why, this is going nowhere.  
> 
> > to run protected VMs with a general OS, and possibly with pass-thru secure device support.
> 
> Why?  What is the actual use case?

Sorry for the confusion; I will try my best to give a general
description of the exact use case.

The use case is for client platforms with confidential computing
requirements:

 - the host VM still works as the primary OS (acting like native); it
   launches the normal VMs it requires (e.g., a Linux or Android OS)
 - a protected VM (e.g., a Linux OS) works as a TEE; it is launched by
   the host VM but ultimately isolated from the host VM and the other
   launched VMs, and it may run with pass-thru secure devices (e.g.,
   fingerprint sensor, secure camera, etc.)

General OS support for protected VMs is the ideal case; I suppose the
user can most likely be convinced to restrict it to a limited one, such
as a 64-bit OS.

> 
> > May I know if your suggestion of "utilize SEAM" is to follow the TDX
> > spec and work out a SW-TDX solution, or just to leverage some code from SEAM?
> 
> Throw away TDX and let KVM run its own code in SEAM.

May I ask what you mean by "KVM run its own code in SEAM"? The target
platforms for running pKVM on x86 are not expected to support TDX. Do you
mean we should keep the same interface as SEAM?


-- 

Thanks
Jason CJ Chen


* Re: [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction
  2023-03-14 14:21     ` Sean Christopherson
  2023-03-16  8:50       ` Jason Chen CJ
@ 2023-03-24 10:30       ` Keir Fraser
  2023-06-07 14:26         ` Mickaël Salaün
  2023-06-08 21:06         ` Dmytro Maluka
  1 sibling, 2 replies; 18+ messages in thread
From: Keir Fraser @ 2023-03-24 10:30 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Jason Chen CJ, kvm, android-kvm

On Tue, Mar 14, 2023 at 07:21:18AM -0700, Sean Christopherson wrote:
> On Tue, Mar 14, 2023, Jason Chen CJ wrote:
> > On Mon, Mar 13, 2023 at 09:33:41AM -0700, Sean Christopherson wrote:
> > 
> > > On Mon, Mar 13, 2023, Jason Chen CJ wrote:
> > > > There are similar use cases on x86 platforms requesting a protected
> > > > environment, isolated from the host OS, for confidential computing.
> > > 
> > > What exactly are those use cases?  The more details you can provide, the better.
> > > E.g. restricting the isolated VMs to 64-bit mode a la TDX would likely simplify
> > > the pKVM implementation.
> > 
> > Thanks Sean for your comments, much appreciated!
> > 
> > We are expected 
> 
> Who is "we"?  Unless Intel is making a rather large pivot, I doubt Intel is the
> end customer of pKVM-on-x86.  If you aren't at liberty to say due NDA/confidentiality,
> then please work with whoever you need to in order to get permission to fully
> disclose the use case.  Because realistically, without knowing exactly what is
> in scope and why, this is going nowhere.  

This is being seriously evaluated by ChromeOS as an alternative to
their existing ManaTEE design. Compared with that (hypervisor == full
Linux) the pKVM design is pretty attractive: smaller TCB, host Linux
"VM" runs closer to native and without nested scheduling, demonstrated
better performance, and closer alignment with Android virtualisation
(that's my team, which of course is ARM focused, but we'd love to see
broader uptake of pKVM in the kernel).

 -- Keir

> > to run protected VMs with a general OS, and possibly with pass-thru secure device support.
> 
> Why?  What is the actual use case?
> 
> > May I know if your suggestion of "utilize SEAM" is to follow the TDX
> > spec and work out a SW-TDX solution, or just to leverage some code from SEAM?
> 
> Throw away TDX and let KVM run its own code in SEAM.
> 


* Re: [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction
  2023-03-24 10:30       ` Keir Fraser
@ 2023-06-07 14:26         ` Mickaël Salaün
  2023-06-08 21:06         ` Dmytro Maluka
  1 sibling, 0 replies; 18+ messages in thread
From: Mickaël Salaün @ 2023-06-07 14:26 UTC (permalink / raw)
  To: Keir Fraser, Sean Christopherson
  Cc: Jason Chen CJ, kvm, android-kvm, x86, linux-hardening


On 24/03/2023 11:30, Keir Fraser wrote:
> On Tue, Mar 14, 2023 at 07:21:18AM -0700, Sean Christopherson wrote:
>> On Tue, Mar 14, 2023, Jason Chen CJ wrote:
>>> On Mon, Mar 13, 2023 at 09:33:41AM -0700, Sean Christopherson wrote:
>>>
>>>> On Mon, Mar 13, 2023, Jason Chen CJ wrote:
>>>>> There are similar use cases on x86 platforms requesting a protected
>>>>> environment, isolated from the host OS, for confidential computing.
>>>>
>>>> What exactly are those use cases?  The more details you can provide, the better.
>>>> E.g. restricting the isolated VMs to 64-bit mode a la TDX would likely simplify
>>>> the pKVM implementation.
>>>
>>> Thanks Sean for your comments, much appreciated!
>>>
>>> We are expected
>>
>> Who is "we"?  Unless Intel is making a rather large pivot, I doubt Intel is the
>> end customer of pKVM-on-x86.  If you aren't at liberty to say due NDA/confidentiality,
>> then please work with whoever you need to in order to get permission to fully
>> disclose the use case.  Because realistically, without knowing exactly what is
>> in scope and why, this is going nowhere.
> 
> This is being seriously evaluated by ChromeOS as an alternative to
> their existing ManaTEE design. Compared with that (hypervisor == full
> Linux) the pKVM design is pretty attractive: smaller TCB, host Linux
> "VM" runs closer to native and without nested scheduling, demonstrated
> better performance, and closer alignment with Android virtualisation
> (that's my team, which of course is ARM focused, but we'd love to see
> broader uptake of pKVM in the kernel).

This pKVM implementation would definitely be useful to protect the host 
from itself (i.e. improved kernel self-protection) thanks to the 
Hypervisor-Enforced Kernel Integrity patch series: 
https://lore.kernel.org/all/20230505152046.6575-1-mic@digikod.net/

Use cases would then include all bare metal Linux systems with security 
requirements. They would initially configure pKVM with the dedicated 
Heki hypercalls, but not necessarily launch guest VMs.


> 
>   -- Keir
> 
>>> to run protected VMs with a general OS, and possibly with pass-thru secure device support.
>>
>> Why?  What is the actual use case?
>>
>>> May I know if your suggestion of "utilize SEAM" is to follow the TDX
>>> spec and work out a SW-TDX solution, or just to leverage some code from SEAM?
>>
>> Throw away TDX and let KVM run its own code in SEAM.
>>
> 


* Re: [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction
  2023-03-24 10:30       ` Keir Fraser
  2023-06-07 14:26         ` Mickaël Salaün
@ 2023-06-08 21:06         ` Dmytro Maluka
       [not found]           ` <d0900265-6ae6-2430-8185-4f9d153ec105@intel.com>
  2023-06-09 16:57           ` Trilok Soni
  1 sibling, 2 replies; 18+ messages in thread
From: Dmytro Maluka @ 2023-06-08 21:06 UTC (permalink / raw)
  To: Keir Fraser, Sean Christopherson
  Cc: Jason Chen CJ, kvm, android-kvm, Dmitry Torokhov, Tomasz Nowicki,
	Grzegorz Jaszczyk

On 3/24/23 11:30, Keir Fraser wrote:
> On Tue, Mar 14, 2023 at 07:21:18AM -0700, Sean Christopherson wrote:
>> On Tue, Mar 14, 2023, Jason Chen CJ wrote:
>>> On Mon, Mar 13, 2023 at 09:33:41AM -0700, Sean Christopherson wrote:
>>>
>>>> On Mon, Mar 13, 2023, Jason Chen CJ wrote:
>>>>> There are similar use cases on x86 platforms requesting a protected
>>>>> environment, isolated from the host OS, for confidential computing.
>>>>
>>>> What exactly are those use cases?  The more details you can provide, the better.
>>>> E.g. restricting the isolated VMs to 64-bit mode a la TDX would likely simplify
>>>> the pKVM implementation.
>>>
>>> Thanks Sean for your comments, much appreciated!
>>>
>>> We are expected 
>>
>> Who is "we"?  Unless Intel is making a rather large pivot, I doubt Intel is the
>> end customer of pKVM-on-x86.  If you aren't at liberty to say due NDA/confidentiality,
>> then please work with whoever you need to in order to get permission to fully
>> disclose the use case.  Because realistically, without knowing exactly what is
>> in scope and why, this is going nowhere.  
> 
> This is being seriously evaluated by ChromeOS as an alternative to
> their existing ManaTEE design. Compared with that (hypervisor == full
> Linux) the pKVM design is pretty attractive: smaller TCB, host Linux
> "VM" runs closer to native and without nested scheduling, demonstrated
> better performance, and closer alignment with Android virtualisation
> (that's my team, which of course is ARM focused, but we'd love to see
> broader uptake of pKVM in the kernel).

Right, we (Google with the help of Semihalf and Intel) have been
evaluating pKVM for ChromeOS on Intel platforms (using this Intel's
pKVM-on-x86 PoC) as a platform for running secure workloads in VMs
protected from the untrusted ChromeOS host, and it looks quite promising
so far, in terms of both performance and design simplicity.

The primary use cases for those secure workloads on Chromebooks are for
protection of sensitive biometric data (e.g. fingerprints, face
authentication), which means that we expect pKVM to provide not just the
basic memory protection for secure VMs but also protection of secure
devices assigned to those VMs (e.g. fingerprint sensor, secure camera).

Summarizing what we discussed at PUCK [1] regarding the existing pKVM
design (with kernel deprivileging) vs pKVM using SEAM (please correct me
if I'm missing something):

- As we are interested in pKVM for client-side platforms (Chromebooks)
  which have no SEAM hardware, using SEAM does not seem to be an option
  at all. And even if it was, we still prefer the current (software
  based) pKVM design, since we need not just memory protection but also
  device protection, and generally we prefer to have more flexibility.

- Sean had a concern that kernel deprivileging may require intrusive
  changes in the common x86 arch code outside KVM, but IIUC it's not
  quite the case. AFAICT the code needed for deprivileging (i.e. making
  the kernel run in VMX non-root as a VM) is almost fully contained
  within KVM, i.e. the rest of the kernel can remain largely agnostic of
  the fact that it is running in VMX non-root. (Jason, please correct me
  if I'm wrong.)

Outside KVM, there are a few changes in drivers/iommu/intel/ for a bit
of PV stuff for the IOMMU in pKVM (not sure if that is already included
in this RFC), and if we go with a more PV-based design [2], not just for
VMX and EPT but also for the IOMMU, then I expect we're going to have
more such PV changes for pKVM there, but still contained within the Intel
IOMMU driver.

[1] https://lore.kernel.org/kvm/20230606181525.1295020-1-seanjc@google.com/
[2] https://lore.kernel.org/all/ZA9WM3xA6Qu5Q43K@google.com/

Thanks,
Dmytro

> 
>  -- Keir
> 
>>> to run protected VMs with a general OS, and possibly with pass-thru secure device support.
>>
>> Why?  What is the actual use case?
>>
>>> May I know if your suggestion of "utilize SEAM" is to follow the TDX
>>> spec and work out a SW-TDX solution, or just to leverage some code from SEAM?
>>
>> Throw away TDX and let KVM run its own code in SEAM.
>>


* Re: [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction
       [not found]           ` <d0900265-6ae6-2430-8185-4f9d153ec105@intel.com>
@ 2023-06-09  8:08             ` Dmytro Maluka
  0 siblings, 0 replies; 18+ messages in thread
From: Dmytro Maluka @ 2023-06-09  8:08 UTC (permalink / raw)
  To: tina.zhang, Keir Fraser, Sean Christopherson
  Cc: Jason Chen CJ, kvm, android-kvm, Dmitry Torokhov, Tomasz Nowicki,
	Grzegorz Jaszczyk

On 6/9/23 01:02, tina.zhang wrote:
> 
> 
> On 6/9/23 05:06, Dmytro Maluka wrote:
>> On 3/24/23 11:30, Keir Fraser wrote:
>>> On Tue, Mar 14, 2023 at 07:21:18AM -0700, Sean Christopherson wrote:
>>>> On Tue, Mar 14, 2023, Jason Chen CJ wrote:
>>>>> On Mon, Mar 13, 2023 at 09:33:41AM -0700, Sean Christopherson wrote:
>>>>>
>>>>>> On Mon, Mar 13, 2023, Jason Chen CJ wrote:
>>>>>>> There are similar use cases on x86 platforms requesting protected
>>>>>>> environment which is isolated from host OS for confidential
>>>>>>> computing.
>>>>>>
>>>>>> What exactly are those use cases?  The more details you can
>>>>>> provide, the better.  E.g. restricting the isolated VMs to 64-bit
>>>>>> mode a la TDX would likely simplify the pKVM implementation.
>>>>>
>>>>> Thanks Sean for your comments, I really appreciate it!
>>>>>
>>>>> We are expected
>>>>
>>>> Who is "we"?  Unless Intel is making a rather large pivot, I doubt
>>>> Intel is the end customer of pKVM-on-x86.  If you aren't at liberty
>>>> to say due to NDA/confidentiality, then please work with whoever you
>>>> need to in order to get permission to fully disclose the use case.
>>>> Because realistically, without knowing exactly what is in scope and
>>>> why, this is going nowhere.
>>>
>>> This is being seriously evaluated by ChromeOS as an alternative to
>>> their existing ManaTEE design. Compared with that (hypervisor == full
>>> Linux) the pKVM design is pretty attractive: smaller TCB, host Linux
>>> "VM" runs closer to native and without nested scheduling, demonstrated
>>> better performance, and closer alignment with Android virtualisation
>>> (that's my team, which of course is ARM focused, but we'd love to see
>>> broader uptake of pKVM in the kernel).
>>
>> Right, we (Google, with the help of Semihalf and Intel) have been
>> evaluating pKVM for ChromeOS on Intel platforms (using Intel's
>> pKVM-on-x86 PoC from this RFC) as a platform for running secure
>> workloads in VMs protected from the untrusted ChromeOS host, and it
>> looks quite promising so far, in terms of both performance and design
>> simplicity.
>>
>> The primary use cases for those secure workloads on Chromebooks are for
>> protection of sensitive biometric data (e.g. fingerprints, face
>> authentication), which means that we expect pKVM to provide not just the
>> basic memory protection for secure VMs but also protection of secure
>> devices assigned to those VMs (e.g. fingerprint sensor, secure camera).
>>
>> Summarizing what we discussed at PUCK [1] regarding the existing pKVM
>> design (with kernel deprivileging) vs pKVM using SEAM (please correct me
>> if I'm missing something):
>>
>> - As we are interested in pKVM for client-side platforms (Chromebooks)
>>    which have no SEAM hardware, using SEAM does not seem to be an option
>>    at all. And even if it was, we still prefer the current (software
>>    based) pKVM design, since we need not just memory protection but also
>>    device protection, and generally we prefer to have more flexibility.
>>
>> - Sean had a concern that kernel deprivileging may require intrusive
>>    changes in the common x86 arch code outside KVM, but IIUC it's not
>>    quite the case. AFAICT the code needed for deprivileging (i.e. making
>>    the kernel run in VMX non-root as a VM) is almost fully contained
>>    within KVM, i.e. the rest of the kernel can remain largely agnostic of
>>    the fact that it is running in VMX non-root. (Jason, please correct me
>>    if I'm wrong.)
>>
>> Outside KVM, there are a few changes in drivers/iommu/intel/ for some
>> PV support for the IOMMU in pKVM (not sure if that is already included
>> in this RFC), and if we go with a more PV-based design [2], not just
>> for VMX and EPT but also for the IOMMU, then I expect we'll have more
>> such PV changes for pKVM there, but still contained within the Intel
>> IOMMU driver.
> Thanks, Dmytro, for the summary. I just want to add a quick update
> about the PV stuff for the Intel IOMMU driver: we took a deep look at
> the solution [1] proposed by the pKVM-ARM folks and we think it's
> promising, especially for platforms that have no hardware support for
> nested IOMMU translation. If PV is going to be the direction, we'd
> like to try that solution on pKVM-IA.

Hi Tina,

Thanks for the info, this looks quite interesting. Yeah, I agree that
PV seems to be the best way to go, and reusing (fully or partially) the
same PV interface as on ARM is probably a good idea as well.
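
Purely as a sketch of what "the same PV interface" could mean in
practice (hypothetical names; nothing here is taken from the ARM series
or this RFC): a shared, arch-neutral set of PV IOMMU operations, where
only the trap instruction differs per architecture:

/*
 * Hypothetical arch-neutral PV IOMMU hypercall IDs that pKVM-ARM and
 * pKVM-IA could share; only the transport differs (HVC on arm64,
 * VMCALL on x86). All names are made up for illustration.
 */
#define PKVM_IOMMU_OP_ALLOC_DOMAIN	0UL
#define PKVM_IOMMU_OP_ATTACH_DEV	1UL
#define PKVM_IOMMU_OP_MAP		2UL
#define PKVM_IOMMU_OP_UNMAP		3UL

/* arm64 would route this via HVC, x86 via a VMCALL-based wrapper. */
long pkvm_iommu_call(unsigned long op, unsigned long a0, unsigned long a1);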

> 
> [1]:
> https://lore.kernel.org/linux-arm-kernel/20230201125328.2186498-14-jean-philippe@linaro.org/T/
> 
> Regards,
> -Tina
>>
>> [1]
>> https://lore.kernel.org/kvm/20230606181525.1295020-1-seanjc@google.com/
>> [2] https://lore.kernel.org/all/ZA9WM3xA6Qu5Q43K@google.com/
>>
>> Thanks,
>> Dmytro
>>
>>>
>>>   -- Keir
>>>
>>>>> to run protected VMs with a general OS, possibly with pass-through
>>>>> secure device support.
>>>>
>>>> Why?  What is the actual use case?
>>>>
>>>>> May I ask whether your suggestion to "utilize SEAM" means
>>>>> following the TDX spec and working out a SW-TDX solution, or just
>>>>> leveraging some of the SEAM code?
>>>>
>>>> Throw away TDX and let KVM run its own code in SEAM.
>>>>

* Re: [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction
  2023-06-08 21:06         ` Dmytro Maluka
       [not found]           ` <d0900265-6ae6-2430-8185-4f9d153ec105@intel.com>
@ 2023-06-09 16:57           ` Trilok Soni
  2023-06-09 18:44             ` Dmytro Maluka
  2023-06-13 17:45             ` Sean Christopherson
  1 sibling, 2 replies; 18+ messages in thread
From: Trilok Soni @ 2023-06-09 16:57 UTC (permalink / raw)
  To: Dmytro Maluka, Keir Fraser, Sean Christopherson
  Cc: Jason Chen CJ, kvm, android-kvm, Dmitry Torokhov, Tomasz Nowicki,
	Grzegorz Jaszczyk

On 6/8/2023 2:06 PM, Dmytro Maluka wrote:
> On 3/24/23 11:30, Keir Fraser wrote:
>> On Tue, Mar 14, 2023 at 07:21:18AM -0700, Sean Christopherson wrote:
>>> On Tue, Mar 14, 2023, Jason Chen CJ wrote:
>>>> On Mon, Mar 13, 2023 at 09:33:41AM -0700, Sean Christopherson wrote:
>>>>
>>>>> On Mon, Mar 13, 2023, Jason Chen CJ wrote:
>>>>>> There are similar use cases on x86 platforms requesting protected
>>>>>> environment which is isolated from host OS for confidential computing.
>>>>>
>>>>> What exactly are those use cases?  The more details you can provide, the better.
>>>>> E.g. restricting the isolated VMs to 64-bit mode a la TDX would likely simplify
>>>>> the pKVM implementation.
>>>>
>>>> Thanks Sean for your comments, I really appreciate it!
>>>>
>>>> We are expected
>>>
>>> Who is "we"?  Unless Intel is making a rather large pivot, I doubt Intel is the
>>> end customer of pKVM-on-x86.  If you aren't at liberty to say due to NDA/confidentiality,
>>> then please work with whoever you need to in order to get permission to fully
>>> disclose the use case.  Because realistically, without knowing exactly what is
>>> in scope and why, this is going nowhere.
>>
>> This is being seriously evaluated by ChromeOS as an alternative to
>> their existing ManaTEE design. Compared with that (hypervisor == full
>> Linux) the pKVM design is pretty attractive: smaller TCB, host Linux
>> "VM" runs closer to native and without nested scheduling, demonstrated
>> better performance, and closer alignment with Android virtualisation
>> (that's my team, which of course is ARM focused, but we'd love to see
>> broader uptake of pKVM in the kernel).
> 
> Right, we (Google, with the help of Semihalf and Intel) have been
> evaluating pKVM for ChromeOS on Intel platforms (using Intel's
> pKVM-on-x86 PoC from this RFC) as a platform for running secure
> workloads in VMs protected from the untrusted ChromeOS host, and it
> looks quite promising so far, in terms of both performance and design
> simplicity.
> 
> The primary use cases for those secure workloads on Chromebooks are for
> protection of sensitive biometric data (e.g. fingerprints, face
> authentication), which means that we expect pKVM to provide not just the
> basic memory protection for secure VMs but also protection of secure
> devices assigned to those VMs (e.g. fingerprint sensor, secure camera).


Very interesting use cases. I would be interested to know how you plan
to paravirtualize the clocks and regulators required for these devices
in the guest VM (protected VM) on x86. On ARM we have the SCMI
specification, and with virtio-scmi it is possible to paravirtualize
clocks and regulators.

A camera may have more h/w dependencies than clocks and regulators,
such as flash LEDs, GPIOs, IOMMUs, and I2C, on top of the camera
driver pipeline itself.

Do you have any proof-of-concept for the above use cases that we can
check and reproduce on ChromeOS with x86?

Do we have the recording of the PUCK meeting?

---Trilok Soni

* Re: [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction
  2023-06-09 16:57           ` Trilok Soni
@ 2023-06-09 18:44             ` Dmytro Maluka
  2023-06-10  8:56               ` Dmytro Maluka
  2023-06-13 17:45             ` Sean Christopherson
  1 sibling, 1 reply; 18+ messages in thread
From: Dmytro Maluka @ 2023-06-09 18:44 UTC (permalink / raw)
  To: Trilok Soni, Keir Fraser, Sean Christopherson
  Cc: Jason Chen CJ, kvm, android-kvm, Dmitry Torokhov, Tomasz Nowicki,
	Grzegorz Jaszczyk

On 6/9/23 18:57, Trilok Soni wrote:
> On 6/8/2023 2:06 PM, Dmytro Maluka wrote:
>> On 3/24/23 11:30, Keir Fraser wrote:
>>> On Tue, Mar 14, 2023 at 07:21:18AM -0700, Sean Christopherson wrote:
>>>> On Tue, Mar 14, 2023, Jason Chen CJ wrote:
>>>>> On Mon, Mar 13, 2023 at 09:33:41AM -0700, Sean Christopherson wrote:
>>>>>
>>>>>> On Mon, Mar 13, 2023, Jason Chen CJ wrote:
>>>>>>> There are similar use cases on x86 platforms requesting protected
>>>>>>> environment which is isolated from host OS for confidential computing.
>>>>>>
>>>>>> What exactly are those use cases?  The more details you can provide, the better.
>>>>>> E.g. restricting the isolated VMs to 64-bit mode a la TDX would likely simplify
>>>>>> the pKVM implementation.
>>>>>
>>>>> Thanks Sean for your comments, I really appreciate it!
>>>>>
>>>>> We are expected
>>>>
>>>> Who is "we"?  Unless Intel is making a rather large pivot, I doubt Intel is the
>>>> end customer of pKVM-on-x86.  If you aren't at liberty to say due to NDA/confidentiality,
>>>> then please work with whoever you need to in order to get permission to fully
>>>> disclose the use case.  Because realistically, without knowing exactly what is
>>>> in scope and why, this is going nowhere.
>>>
>>> This is being seriously evaluated by ChromeOS as an alternative to
>>> their existing ManaTEE design. Compared with that (hypervisor == full
>>> Linux) the pKVM design is pretty attractive: smaller TCB, host Linux
>>> "VM" runs closer to native and without nested scheduling, demonstrated
>>> better performance, and closer alignment with Android virtualisation
>>> (that's my team, which of course is ARM focused, but we'd love to see
>>> broader uptake of pKVM in the kernel).
>>
>> Right, we (Google, with the help of Semihalf and Intel) have been
>> evaluating pKVM for ChromeOS on Intel platforms (using Intel's
>> pKVM-on-x86 PoC from this RFC) as a platform for running secure
>> workloads in VMs protected from the untrusted ChromeOS host, and it
>> looks quite promising so far, in terms of both performance and design
>> simplicity.
>>
>> The primary use cases for those secure workloads on Chromebooks are for
>> protection of sensitive biometric data (e.g. fingerprints, face
>> authentication), which means that we expect pKVM to provide not just the
>> basic memory protection for secure VMs but also protection of secure
>> devices assigned to those VMs (e.g. fingerprint sensor, secure camera).
> 
> 
> Very interesting use cases. I would be interested to know how you
> plan to paravirtualize the clocks and regulators required for these
> devices in the guest VM (protected VM) on x86. On ARM we have the
> SCMI specification, and with virtio-scmi it is possible to
> paravirtualize clocks and regulators.

On x86, things like clocks and regulators tend to be abstracted away
via ACPI, i.e. they are managed by AML code in the ACPI tables, not by
the device driver in the kernel. With pKVM, ACPI is still fully managed
by the host, although the secure device driver is running in the
protected VM.

So, at least in theory, this is automatically solved for us in most
cases (though admittedly it is only a theory so far; we have no
proof-of-concept yet, see below).
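
For illustration, a minimal sketch of the guest driver side (generic
kernel APIs; the fingerprint-sensor driver itself is made up): the
driver only takes a runtime-PM reference, and whoever owns ACPI - the
host, in the pKVM case - executes the AML that does the actual power
sequencing:

#include <linux/platform_device.h>
#include <linux/pm_runtime.h>

/*
 * Sketch: an x86 driver for an ACPI-enumerated fingerprint sensor
 * needs no clock or regulator code of its own. It only takes a
 * runtime-PM reference; the ACPI PM domain (AML in the DSDT) performs
 * the platform-specific power sequencing on its behalf.
 */
static int fp_sensor_probe(struct platform_device *pdev)
{
	pm_runtime_enable(&pdev->dev);

	/* Powers up the device via the ACPI PM domain's AML methods. */
	return pm_runtime_resume_and_get(&pdev->dev);
}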

> A camera may have more h/w dependencies than clocks and regulators,
> such as flash LEDs, GPIOs, IOMMUs, and I2C, on top of the camera
> driver pipeline itself.

When it comes to the camera, we are considering not a separate
physical camera but rather a secure image stream, separated from the
non-secure image stream of the same camera, e.g. with Intel IPU6. In
this case the assigned device (the IPU) is within the SoC. There are
actually lots of challenges with its assignment too, but completely
different ones (how to partition it between the host and the VM and
ensure protection from the host).

> Do you have any proof-of-concept for the above use cases that we can
> check and reproduce on ChromeOS with x86?

Not really, not yet. We've been focused on evaluating the
functionality and performance of ChromeOS itself, i.e. whether ChromeOS
works as well with pKVM as it does natively - without the actual
protected VMs yet, but already with the pKVM functionality required for
protected VMs (memory protection etc). We've also been looking a lot
into the issues of assignment and protection of secure devices for
protected VMs, but (apart from the simple case of generic PCI devices)
mostly theoretically so far.

> Do we have the recording of the PUCK meeting?
> 
> ---Trilok Soni

* Re: [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction
  2023-06-09 18:44             ` Dmytro Maluka
@ 2023-06-10  8:56               ` Dmytro Maluka
  0 siblings, 0 replies; 18+ messages in thread
From: Dmytro Maluka @ 2023-06-10  8:56 UTC (permalink / raw)
  To: Trilok Soni, Keir Fraser, Sean Christopherson
  Cc: Jason Chen CJ, kvm, android-kvm, Dmitry Torokhov, Tomasz Nowicki,
	Grzegorz Jaszczyk

On 6/9/23 20:44, Dmytro Maluka wrote:
> On 6/9/23 18:57, Trilok Soni wrote:
>> On 6/8/2023 2:06 PM, Dmytro Maluka wrote:
>>> On 3/24/23 11:30, Keir Fraser wrote:
>>>> On Tue, Mar 14, 2023 at 07:21:18AM -0700, Sean Christopherson wrote:
>>>>> On Tue, Mar 14, 2023, Jason Chen CJ wrote:
>>>>>> On Mon, Mar 13, 2023 at 09:33:41AM -0700, Sean Christopherson wrote:
>>>>>>
>>>>>>> On Mon, Mar 13, 2023, Jason Chen CJ wrote:
>>>>>>>> There are similar use cases on x86 platforms requesting protected
>>>>>>>> environment which is isolated from host OS for confidential computing.
>>>>>>>
>>>>>>> What exactly are those use cases?  The more details you can provide, the better.
>>>>>>> E.g. restricting the isolated VMs to 64-bit mode a la TDX would likely simplify
>>>>>>> the pKVM implementation.
>>>>>>
>>>>>> Thanks Sean for your comments, I really appreciate it!
>>>>>>
>>>>>> We are expected
>>>>>
>>>>> Who is "we"?  Unless Intel is making a rather large pivot, I doubt Intel is the
>>>>> end customer of pKVM-on-x86.  If you aren't at liberty to say due to NDA/confidentiality,
>>>>> then please work with whoever you need to in order to get permission to fully
>>>>> disclose the use case.  Because realistically, without knowing exactly what is
>>>>> in scope and why, this is going nowhere.
>>>>
>>>> This is being seriously evaluated by ChromeOS as an alternative to
>>>> their existing ManaTEE design. Compared with that (hypervisor == full
>>>> Linux) the pKVM design is pretty attractive: smaller TCB, host Linux
>>>> "VM" runs closer to native and without nested scheduling, demonstrated
>>>> better performance, and closer alignment with Android virtualisation
>>>> (that's my team, which of course is ARM focused, but we'd love to see
>>>> broader uptake of pKVM in the kernel).
>>>
>>> Right, we (Google, with the help of Semihalf and Intel) have been
>>> evaluating pKVM for ChromeOS on Intel platforms (using Intel's
>>> pKVM-on-x86 PoC from this RFC) as a platform for running secure
>>> workloads in VMs protected from the untrusted ChromeOS host, and it
>>> looks quite promising so far, in terms of both performance and
>>> design simplicity.
>>>
>>> The primary use cases for those secure workloads on Chromebooks are for
>>> protection of sensitive biometric data (e.g. fingerprints, face
>>> authentication), which means that we expect pKVM to provide not just the
>>> basic memory protection for secure VMs but also protection of secure
>>> devices assigned to those VMs (e.g. fingerprint sensor, secure camera).
>>
>>
>> Very interesting use cases. I would be interested to know how you
>> plan to paravirtualize the clocks and regulators required for these
>> devices in the guest VM (protected VM) on x86. On ARM we have the
>> SCMI specification, and with virtio-scmi it is possible to
>> paravirtualize clocks and regulators.
> 
> On x86, things like clocks and regulators tend to be abstracted away
> via ACPI, i.e. they are managed by AML code in the ACPI tables, not
> by the device driver in the kernel. With pKVM, ACPI is still fully
> managed by the host, although the secure device driver is running in
> the protected VM.
> 
> So, at least in theory, this is automatically solved for us in most
> cases (though admittedly it is only a theory so far; we have no
> proof-of-concept yet, see below).
> 
>> A camera may have more h/w dependencies than clocks and regulators,
>> such as flash LEDs, GPIOs, IOMMUs, and I2C, on top of the camera
>> driver pipeline itself.
> 
> When it comes to the camera, we are considering not a separate
> physical camera but rather a secure image stream, separated from the
> non-secure image stream of the same camera, e.g. with Intel IPU6. In
> this case the assigned device (the IPU) is within the SoC. There are
> actually lots of challenges with its assignment too, but completely
> different ones (how to partition it between the host and the VM and
> ensure protection from the host).

On second thought, the "within the SoC" point is probably not
important. What's important is that the protected guest needs only a
small part of the IPU hardware functionality - the part related to the
secure camera data channel. Among the things you mentioned, the IPU's
internal IOMMU needs to be assigned to the guest, as it's crucial that
the secure VM has exclusive control over this IOMMU to ensure DMA
isolation between the secure and non-secure data channels. (The host
IOMMU, managed by the pKVM hypervisor, needs to be involved too, but is
not enough.) Other things (GPIOs, clocks, etc.) are left to the host
IPU driver and/or ACPI.

How shall we assign the IPU IOMMU to the guest? In IPU device-specific
ways, unfortunately.

Hopefully for other use cases we can do things more generically, as we
can pass through the entire device to the guest, except for the power
management bits, which are already effectively partitioned away via
ACPI. A rough sketch of what the IPU IOMMU hand-over could look like
follows below.
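
Again purely as a hedged sketch (a made-up hypercall, reusing the
hypothetical pkvm_hypercall() wrapper from earlier in the thread), the
hand-over could boil down to something like:

/*
 * Hypothetical sketch of handing the IPU-internal IOMMU over to a
 * protected VM: the host donates the MMU's MMIO range, and pKVM
 * unmaps it from the host's EPT before mapping it into the guest, so
 * that only the secure VM can program the secure/non-secure DMA
 * isolation. PKVM_HC_DONATE_MMIO is a made-up name; the target VM
 * handle is omitted for brevity.
 */
#define PKVM_HC_DONATE_MMIO	3UL

static int pkvm_donate_ipu_mmu(phys_addr_t mmio_base, size_t mmio_size)
{
	return pkvm_hypercall(PKVM_HC_DONATE_MMIO, mmio_base, mmio_size);
}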

>> Do you have any proof-of-concept for the above use cases that we can
>> check and reproduce on ChromeOS with x86?
> 
> Not really, not yet. We've been focused on evaluating the
> functionality and performance of ChromeOS itself, i.e. whether
> ChromeOS works as well with pKVM as it does natively - without the
> actual protected VMs yet, but already with the pKVM functionality
> required for protected VMs (memory protection etc). We've also been
> looking a lot into the issues of assignment and protection of secure
> devices for protected VMs, but (apart from the simple case of generic
> PCI devices) mostly theoretically so far.
> 
>> Do we have the recording of the PUCK meeting?
>>
>> ---Trilok Soni

* Re: [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction
  2023-06-09 16:57           ` Trilok Soni
  2023-06-09 18:44             ` Dmytro Maluka
@ 2023-06-13 17:45             ` Sean Christopherson
  1 sibling, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2023-06-13 17:45 UTC (permalink / raw)
  To: Trilok Soni
  Cc: Dmytro Maluka, Keir Fraser, Jason Chen CJ, kvm, android-kvm,
	Dmitry Torokhov, Tomasz Nowicki, Grzegorz Jaszczyk

On Fri, Jun 09, 2023, Trilok Soni wrote:
> Do we have the recording of the PUCK meeting?

Link below.  You should have access, though any non-Googlers lurking will likely
need to request access (which I'll grant, I just can't make the folder shared with
literally everyone due to the configuration of our corp accounts).

https://drive.google.com/file/d/1JZ6e8ZgR2gUfB4uBYxsJUxp1KVL5YEA_/view?usp=drive_link&resourcekey=0-MGjMLec-8JEIFC3-vmZeLg

End of thread (newest message: 2023-06-13 17:45 UTC)

Thread overview: 18+ messages
2023-03-12 18:00 [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction Jason Chen CJ
2023-03-12 18:00 ` [RFC PATCH part-1 1/5] pkvm: arm64: Move nvhe/spinlock.h to include/asm dir Jason Chen CJ
2023-03-12 18:00 ` [RFC PATCH part-1 2/5] pkvm: arm64: Make page allocator arch agnostic Jason Chen CJ
2023-03-12 18:00 ` [RFC PATCH part-1 3/5] pkvm: arm64: Move page allocator to virt/kvm/pkvm Jason Chen CJ
2023-03-12 18:00 ` [RFC PATCH part-1 4/5] pkvm: arm64: Make memory reservation arch agnostic Jason Chen CJ
2023-03-12 18:00 ` [RFC PATCH part-1 5/5] pkvm: arm64: Move general part of memory reservation to virt/kvm/pkvm Jason Chen CJ
2023-03-13 16:33 ` [RFC PATCH part-1 0/5] pKVM on Intel Platform Introduction Sean Christopherson
2023-03-14 16:17   ` Jason Chen CJ
2023-03-14 14:21     ` Sean Christopherson
2023-03-16  8:50       ` Jason Chen CJ
2023-03-24 10:30       ` Keir Fraser
2023-06-07 14:26         ` Mickaël Salaün
2023-06-08 21:06         ` Dmytro Maluka
     [not found]           ` <d0900265-6ae6-2430-8185-4f9d153ec105@intel.com>
2023-06-09  8:08             ` Dmytro Maluka
2023-06-09 16:57           ` Trilok Soni
2023-06-09 18:44             ` Dmytro Maluka
2023-06-10  8:56               ` Dmytro Maluka
2023-06-13 17:45             ` Sean Christopherson
