* Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes [not found] <CA+EHjTwTgEVtea7wgef5G6EEgFa0so_GbNXTMZNKyFE=ucyV0g@mail.gmail.com> @ 2023-10-06 3:21 ` Sean Christopherson 2023-10-06 12:47 ` Fuad Tabba 0 siblings, 1 reply; 14+ messages in thread From: Sean Christopherson @ 2023-10-06 3:21 UTC (permalink / raw) To: Fuad Tabba Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, KVM, moderated list:ARM64 PORT (AARCH64 ARCHITECTURE), KVMARM, LinuxMIPS, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, open list, Chao Peng, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov On Thu, Oct 05, 2023, Fuad Tabba wrote: > Hi Sean, > > On Tue, Oct 3, 2023 at 9:51 PM Sean Christopherson <seanjc@google.com> wrote: > > > Like I said, pKVM doesn't need a userspace ABI for managing PRIVATE/SHARED, > > > just a way of tracking in the host kernel of what is shared (as opposed to > > > the hypervisor, which already has the knowledge). The solution could simply > > > be that pKVM does not enable KVM_GENERIC_MEMORY_ATTRIBUTES, has its own > > > tracking of the status of the guest pages, and only selects KVM_PRIVATE_MEM. > > > > At the risk of overstepping my bounds, I think that effectively giving the guest > > full control over what is shared vs. private is a mistake. It more or less locks > > pKVM into a single model, and even within that model, dealing with errors and/or > > misbehaving guests becomes unnecessarily problematic. > > > > Using KVM_SET_MEMORY_ATTRIBUTES may not provide value *today*, e.g. the userspace > > side of pKVM could simply "reflect" all conversion hypercalls, and terminate the > > VM on errors. But the cost is very minimal, e.g. a single extra ioctl() per > > converion, and the upside is that pKVM won't be stuck if a use case comes along > > that wants to go beyond "all conversion requests either immediately succeed or > > terminate the guest". > > Now that I understand the purpose of KVM_SET_MEMORY_ATTRIBUTES, I > agree. However, pKVM needs to track at the host kernel (i.e., EL1) > whether guest memory is shared or private. Why does EL1 need it's own view/opinion? E.g. is it to avoid a accessing data that is still private according to EL2 (on behalf of the guest)? Assuming that's the case, why can't EL1 wait until it gets confirmation from EL2 that the data is fully shared before doing whatever it is that needs to be done? Ah, is the problem that whether or not .mmap() is allowed keys off of the state of the memory attributes? If that's so, then yeah, an internal flag in attributes is probably the way to go. It doesn't need to be a "host kernel private" flag though, e.g. an IN_FLUX flag to capture that the attributes aren't fully realized might be more intuitive for readers, and might have utility for other attributes in the future too. > One approach would be to add another flag to the attributes that > tracks the host kernel view. The way KVM_SET_MEMORY_ATTRIBUTES is > implemented now, userspace can zero it, so in that case, that > operation would need to be masked to avoid that. > > Another approach would be to have a pKVM-specific xarray (or similar) > to do the tracking, but since there is a structure that's already > doing something similar (i.e.,the attributes array), it seems like it > would be unnecessary overhead. > > Do you have any ideas or preferences? > > Cheers, > /fuad _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes 2023-10-06 3:21 ` [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes Sean Christopherson @ 2023-10-06 12:47 ` Fuad Tabba 0 siblings, 0 replies; 14+ messages in thread From: Fuad Tabba @ 2023-10-06 12:47 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, KVM, moderated list:ARM64 PORT (AARCH64 ARCHITECTURE), KVMARM, LinuxMIPS, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, open list, Chao Peng, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov Hi Sean, On Fri, Oct 6, 2023 at 4:21 AM Sean Christopherson <seanjc@google.com> wrote: > > On Thu, Oct 05, 2023, Fuad Tabba wrote: > > Hi Sean, > > > > On Tue, Oct 3, 2023 at 9:51 PM Sean Christopherson <seanjc@google.com> wrote: > > > > Like I said, pKVM doesn't need a userspace ABI for managing PRIVATE/SHARED, > > > > just a way of tracking in the host kernel of what is shared (as opposed to > > > > the hypervisor, which already has the knowledge). The solution could simply > > > > be that pKVM does not enable KVM_GENERIC_MEMORY_ATTRIBUTES, has its own > > > > tracking of the status of the guest pages, and only selects KVM_PRIVATE_MEM. > > > > > > At the risk of overstepping my bounds, I think that effectively giving the guest > > > full control over what is shared vs. private is a mistake. It more or less locks > > > pKVM into a single model, and even within that model, dealing with errors and/or > > > misbehaving guests becomes unnecessarily problematic. > > > > > > Using KVM_SET_MEMORY_ATTRIBUTES may not provide value *today*, e.g. the userspace > > > side of pKVM could simply "reflect" all conversion hypercalls, and terminate the > > > VM on errors. But the cost is very minimal, e.g. a single extra ioctl() per > > > converion, and the upside is that pKVM won't be stuck if a use case comes along > > > that wants to go beyond "all conversion requests either immediately succeed or > > > terminate the guest". > > > > Now that I understand the purpose of KVM_SET_MEMORY_ATTRIBUTES, I > > agree. However, pKVM needs to track at the host kernel (i.e., EL1) > > whether guest memory is shared or private. > > Why does EL1 need it's own view/opinion? E.g. is it to avoid a accessing data > that is still private according to EL2 (on behalf of the guest)? > > Assuming that's the case, why can't EL1 wait until it gets confirmation from EL2 > that the data is fully shared before doing whatever it is that needs to be done? > > Ah, is the problem that whether or not .mmap() is allowed keys off of the state > of the memory attributes? If that's so, then yeah, an internal flag in attributes > is probably the way to go. It doesn't need to be a "host kernel private" flag > though, e.g. an IN_FLUX flag to capture that the attributes aren't fully realized > might be more intuitive for readers, and might have utility for other attributes > in the future too. Yes, it's because of mmap. I think that an IN_FLUX flag might work here. I'll have a go at it and see how it turns out. Thanks, /fuad > > > One approach would be to add another flag to the attributes that > > tracks the host kernel view. The way KVM_SET_MEMORY_ATTRIBUTES is > > implemented now, userspace can zero it, so in that case, that > > operation would need to be masked to avoid that. > > > > Another approach would be to have a pKVM-specific xarray (or similar) > > to do the tracking, but since there is a structure that's already > > doing something similar (i.e.,the attributes array), it seems like it > > would be unnecessary overhead. > > > > Do you have any ideas or preferences? > > > > Cheers, > > /fuad _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
* [RFC PATCH v12 00/33] KVM: guest_memfd() and per-page attributes @ 2023-09-14 1:54 Sean Christopherson 2023-09-14 1:55 ` [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes Sean Christopherson 0 siblings, 1 reply; 14+ messages in thread From: Sean Christopherson @ 2023-09-14 1:54 UTC (permalink / raw) To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Sean Christopherson, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn Cc: kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, linux-kernel, Chao Peng, Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov This is hopefully the last RFC for implementing fd-based (instead of vma-based) memory for KVM guests. If you want the full background of why we are doing this, please go read the v10 cover letter. With luck, v13 will be a "normal" series that's ready for inclusion. Tagged RFC as there are still several empty changelogs, a lot of missing documentation, and a handful of TODOs. And I haven't tested or proofread this anywhere near as much as I normally would. I am posting even though the remaining TODOs aren't _that_ big so that people can test this new version without having to wait a few weeks to close out the remaining TODOs, i.e. to give us at least some chance of hitting v6.7. The most relevant TODO item for non-KVM folks is that we are planning on dropping the dedicated "gmem" file system. Assuming that pans out, the patch to export security_inode_init_security_anon() should go away. KVM folks, there a few changes I want to highlight and get feedback on, all of which are directly related to the "annotated memory faults" series[*]: - Rename kvm_run.memory to kvm_run.memory_fault - Place "memory_fault" in a separate union - Return -EFAULT or -EHWPOISON with exiting with KVM_EXIT_MEMORY_FAULT The first one is pretty self-explanatory, "run->memory.gpa" looks quite odd and would prevent ever doing something directly with memory. Putting the struct in a separate union is not at all necessary for supporting private memory, it's purely forward looking to Anish series, which wants to annotate (fill memory_fault) on all faults, even if KVM ultimately doesn't exit to userspace (x86 has a few unfortunate flows where KVM can clobber a previous exit, or suppress a memory fault exit). Using a separate union, i.e. different bytes in kvm_run, allows exiting to userspace with both memory_fault and the "normal" union filled, e.g. if KVM starts an MMIO exit and then hits a memory fault exit, the MMIO exit will be preserved. It's unlikely userspace will be able to do anything useful with the info in that case, but the reverse will likely be much more interesting, e.g. if KVM hits a memory fault and then doesn't report it to userspace for whatever reason. As for returning -EFAULT/-EHWPOISON, far too many helpers that touch guest memory, i.e. can "fault", return 0 on success, which makes it all bug impossible to use '0' to signal "exit to userspace". Rather than use '0' for _just_ the case where the guest is accessing private vs. shared, my thought is to use -EFAULT everywhere except for the poisoned page case. [*] https://lore.kernel.org/all/20230908222905.1321305-1-amoorthy@google.com TODOs [owner]: - Documentation [none] - Changelogs [Sean] - Fully anonymous inode vs. proper filesystem [Paolo] - kvm_gmem_error_page() testing (my version is untested) [Isaku?] v12: - Squash fixes from others. [Many people] - Kill of the .on_unlock() callback and use .on_lock() when handling memory attributes updates. [Isaku] - Add more tests. [Ackerley] - Move range_has_attrs() to common code. [Paolo] - Return actually number of address spaces for the VM-scoped version of KVM_CAP_MULTI_ADDRESS_SPACE. [Paolo] - Move forward declaration of "struct kvm_gfn_range" to kvm_types.h. [Yuan] - Plumb code to have HVA-based mmu_notifier events affect only shared mappings. [Asish] - Clean up kvm_vm_ioctl_set_mem_attributes() math. [Binbin] - Collect a few reviews and acks. [Paolo, Paul] - Unconditionally advertise a synchronized MMU on PPC. [Paolo] - Check for error return from filemap_grab_folio(). [A - Make max_order optional. [Fuad] - Remove signal injection, zap SPTEs on memory error. [Isaku] - Add KVM_CAP_GUEST_MEMFD. [Xiaoyao] - Invoke kvm_arch_pre_set_memory_attributes() instead of kvm_mmu_unmap_gfn_range(). - Rename kvm_run.memory to kvm_run.memory_fault - Place "memory_fault" in a separate union - Return -EFAULT and -EHWPOISON with KVM_EXIT_MEMORY_FAULT - "Init" run->exit_reason in x86's vcpu_run() v11: - https://lore.kernel.org/all/20230718234512.1690985-1-seanjc@google.com - Test private<=>shared conversions *without* doing fallocate() - PUNCH_HOLE all memory between iterations of the conversion test so that KVM doesn't retain pages in the guest_memfd - Rename hugepage control to be a very generic ALLOW_HUGEPAGE, instead of giving it a THP or PMD specific name. - Fold in fixes from a lot of people (thank you!) - Zap SPTEs *before* updating attributes to ensure no weirdness, e.g. if KVM handles a page fault and looks at inconsistent attributes - Refactor MMU interaction with attributes updates to reuse much of KVM's framework for mmu_notifiers. v10: https://lore.kernel.org/all/20221202061347.1070246-1-chao.p.peng@linux.intel.com Ackerley Tng (1): KVM: selftests: Test KVM exit behavior for private memory/access Chao Peng (8): KVM: Use gfn instead of hva for mmu_notifier_retry KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace KVM: Introduce per-page memory attributes KVM: x86: Disallow hugepages when memory attributes are mixed KVM: x86/mmu: Handle page fault for private memory KVM: selftests: Add KVM_SET_USER_MEMORY_REGION2 helper KVM: selftests: Expand set_memory_region_test to validate guest_memfd() KVM: selftests: Add basic selftest for guest_memfd() Sean Christopherson (21): KVM: Tweak kvm_hva_range and hva_handler_t to allow reusing for gfn ranges KVM: PPC: Drop dead code related to KVM_ARCH_WANT_MMU_NOTIFIER KVM: PPC: Return '1' unconditionally for KVM_CAP_SYNC_MMU KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIER KVM: Introduce KVM_SET_USER_MEMORY_REGION2 KVM: Add a dedicated mmu_notifier flag for reclaiming freed memory KVM: Drop .on_unlock() mmu_notifier hook KVM: Set the stage for handling only shared mappings in mmu_notifier events mm: Add AS_UNMOVABLE to mark mapping as completely unmovable security: Export security_inode_init_security_anon() for use by KVM KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory KVM: Add transparent hugepage support for dedicated guest memory KVM: x86: "Reset" vcpu->run->exit_reason early in KVM_RUN KVM: Drop superfluous __KVM_VCPU_MULTIPLE_ADDRESS_SPACE macro KVM: Allow arch code to track number of memslot address spaces per VM KVM: x86: Add support for "protected VMs" that can utilize private memory KVM: selftests: Drop unused kvm_userspace_memory_region_find() helper KVM: selftests: Convert lib's mem regions to KVM_SET_USER_MEMORY_REGION2 KVM: selftests: Add support for creating private memslots KVM: selftests: Introduce VM "shape" to allow tests to specify the VM type KVM: selftests: Add GUEST_SYNC[1-6] macros for synchronizing more data Vishal Annapurve (3): KVM: selftests: Add helpers to convert guest memory b/w private and shared KVM: selftests: Add helpers to do KVM_HC_MAP_GPA_RANGE hypercalls (x86) KVM: selftests: Add x86-only selftest for private memory conversions Documentation/virt/kvm/api.rst | 116 ++++ arch/arm64/include/asm/kvm_host.h | 2 - arch/arm64/kvm/Kconfig | 2 +- arch/mips/include/asm/kvm_host.h | 2 - arch/mips/kvm/Kconfig | 2 +- arch/powerpc/include/asm/kvm_host.h | 2 - arch/powerpc/kvm/Kconfig | 8 +- arch/powerpc/kvm/book3s_hv.c | 2 +- arch/powerpc/kvm/powerpc.c | 7 +- arch/riscv/include/asm/kvm_host.h | 2 - arch/riscv/kvm/Kconfig | 2 +- arch/x86/include/asm/kvm_host.h | 17 +- arch/x86/include/uapi/asm/kvm.h | 3 + arch/x86/kvm/Kconfig | 14 +- arch/x86/kvm/debugfs.c | 2 +- arch/x86/kvm/mmu/mmu.c | 264 +++++++- arch/x86/kvm/mmu/mmu_internal.h | 2 + arch/x86/kvm/mmu/tdp_mmu.c | 2 +- arch/x86/kvm/vmx/vmx.c | 11 +- arch/x86/kvm/x86.c | 25 +- include/linux/kvm_host.h | 143 +++- include/linux/kvm_types.h | 1 + include/linux/pagemap.h | 19 +- include/uapi/linux/kvm.h | 67 ++ include/uapi/linux/magic.h | 1 + mm/compaction.c | 43 +- mm/migrate.c | 2 + security/security.c | 1 + tools/testing/selftests/kvm/Makefile | 3 + tools/testing/selftests/kvm/dirty_log_test.c | 2 +- .../testing/selftests/kvm/guest_memfd_test.c | 165 +++++ .../selftests/kvm/include/kvm_util_base.h | 148 +++- .../testing/selftests/kvm/include/test_util.h | 5 + .../selftests/kvm/include/ucall_common.h | 11 + .../selftests/kvm/include/x86_64/processor.h | 15 + .../selftests/kvm/kvm_page_table_test.c | 2 +- tools/testing/selftests/kvm/lib/kvm_util.c | 231 ++++--- tools/testing/selftests/kvm/lib/memstress.c | 3 +- .../selftests/kvm/set_memory_region_test.c | 100 +++ .../kvm/x86_64/private_mem_conversions_test.c | 410 +++++++++++ .../kvm/x86_64/private_mem_kvm_exits_test.c | 121 ++++ .../kvm/x86_64/ucna_injection_test.c | 2 +- virt/kvm/Kconfig | 17 + virt/kvm/Makefile.kvm | 1 + virt/kvm/dirty_ring.c | 2 +- virt/kvm/guest_mem.c | 637 ++++++++++++++++++ virt/kvm/kvm_main.c | 482 +++++++++++-- virt/kvm/kvm_mm.h | 38 ++ 48 files changed, 2888 insertions(+), 271 deletions(-) create mode 100644 tools/testing/selftests/kvm/guest_memfd_test.c create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_kvm_exits_test.c create mode 100644 virt/kvm/guest_mem.c base-commit: 0bb80ecc33a8fb5a682236443c1e740d5c917d1d -- 2.42.0.283.g2d96d420d3-goog _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
* [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes 2023-09-14 1:54 [RFC PATCH v12 00/33] KVM: guest_memfd() and per-page attributes Sean Christopherson @ 2023-09-14 1:55 ` Sean Christopherson 2023-09-15 6:32 ` Yan Zhao ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Sean Christopherson @ 2023-09-14 1:55 UTC (permalink / raw) To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Sean Christopherson, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn Cc: kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, linux-kernel, Chao Peng, Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov From: Chao Peng <chao.p.peng@linux.intel.com> In confidential computing usages, whether a page is private or shared is necessary information for KVM to perform operations like page fault handling, page zapping etc. There are other potential use cases for per-page memory attributes, e.g. to make memory read-only (or no-exec, or exec-only, etc.) without having to modify memslots. Introduce two ioctls (advertised by KVM_CAP_MEMORY_ATTRIBUTES) to allow userspace to operate on the per-page memory attributes. - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes to a guest memory range. - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported memory attributes. Use an xarray to store the per-page attributes internally, with a naive, not fully optimized implementation, i.e. prioritize correctness over performance for the initial implementation. Because setting memory attributes is roughly analogous to mprotect() on memory that is mapped into the guest, zap existing mappings prior to updating the memory attributes. Opportunistically provide an arch hook for the post-set path (needed to complete invalidation anyways) in anticipation of x86 needing the hook to update metadata related to determining whether or not a given gfn can be backed with various sizes of hugepages. It's possible that future usages may not require an invalidation, e.g. if KVM ends up supporting RWX protections and userspace grants _more_ protections, but again opt for simplicity and punt optimizations to if/when they are needed. Suggested-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com Cc: Fuad Tabba <tabba@google.com> Cc: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> --- Documentation/virt/kvm/api.rst | 60 ++++++++++ include/linux/kvm_host.h | 18 +++ include/uapi/linux/kvm.h | 14 +++ virt/kvm/Kconfig | 4 + virt/kvm/kvm_main.c | 212 +++++++++++++++++++++++++++++++++ 5 files changed, 308 insertions(+) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index e28a13439a95..c44ef5295a12 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6070,6 +6070,56 @@ writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG interface. No error will be returned, but the resulting offset will not be applied. +4.139 KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES +----------------------------------------- + +:Capability: KVM_CAP_MEMORY_ATTRIBUTES +:Architectures: x86 +:Type: vm ioctl +:Parameters: u64 memory attributes bitmask(out) +:Returns: 0 on success, <0 on error + +Returns supported memory attributes bitmask. Supported memory attributes will +have the corresponding bits set in u64 memory attributes bitmask. + +The following memory attributes are defined:: + + #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) + +4.140 KVM_SET_MEMORY_ATTRIBUTES +----------------------------------------- + +:Capability: KVM_CAP_MEMORY_ATTRIBUTES +:Architectures: x86 +:Type: vm ioctl +:Parameters: struct kvm_memory_attributes(in/out) +:Returns: 0 on success, <0 on error + +Sets memory attributes for pages in a guest memory range. Parameters are +specified via the following structure:: + + struct kvm_memory_attributes { + __u64 address; + __u64 size; + __u64 attributes; + __u64 flags; + }; + +The user sets the per-page memory attributes to a guest memory range indicated +by address/size, and in return KVM adjusts address and size to reflect the +actual pages of the memory range have been successfully set to the attributes. +If the call returns 0, "address" is updated to the last successful address + 1 +and "size" is updated to the remaining address size that has not been set +successfully. The user should check the return value as well as the size to +decide if the operation succeeded for the whole range or not. The user may want +to retry the operation with the returned address/size if the previous range was +partially successful. + +Both address and size should be page aligned and the supported attributes can be +retrieved with KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES. + +The "flags" field may be used for future extensions and should be set to 0s. + 5. The kvm_run structure ======================== @@ -8498,6 +8548,16 @@ block sizes is exposed in KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES as a 64-bit bitmap (each bit describing a block size). The default value is 0, to disable the eager page splitting. +8.41 KVM_CAP_MEMORY_ATTRIBUTES +------------------------------ + +:Capability: KVM_CAP_MEMORY_ATTRIBUTES +:Architectures: x86 +:Type: vm + +This capability indicates KVM supports per-page memory attributes and ioctls +KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES/KVM_SET_MEMORY_ATTRIBUTES are available. + 9. Known KVM API problems ========================= diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index b5373cee2b08..9b695391b11c 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -256,6 +256,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER union kvm_mmu_notifier_arg { pte_t pte; + unsigned long attributes; }; struct kvm_gfn_range { @@ -808,6 +809,9 @@ struct kvm { #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER struct notifier_block pm_notifier; +#endif +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES + struct xarray mem_attr_array; #endif char stats_id[KVM_STATS_NAME_SIZE]; }; @@ -2344,4 +2348,18 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu, vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE; } +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES +static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn) +{ + return xa_to_value(xa_load(&kvm->mem_attr_array, gfn)); +} + +bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end, + unsigned long attrs); +bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm, + struct kvm_gfn_range *range); +bool kvm_arch_post_set_memory_attributes(struct kvm *kvm, + struct kvm_gfn_range *range); +#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */ + #endif diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index d2d913acf0df..f8642ff2eb9d 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1227,6 +1227,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229 #define KVM_CAP_USER_MEMORY2 230 +#define KVM_CAP_MEMORY_ATTRIBUTES 231 #ifdef KVM_CAP_IRQ_ROUTING @@ -2293,4 +2294,17 @@ struct kvm_s390_zpci_op { /* flags for kvm_s390_zpci_op->u.reg_aen.flags */ #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0) +/* Available with KVM_CAP_MEMORY_ATTRIBUTES */ +#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES _IOR(KVMIO, 0xd2, __u64) +#define KVM_SET_MEMORY_ATTRIBUTES _IOW(KVMIO, 0xd3, struct kvm_memory_attributes) + +struct kvm_memory_attributes { + __u64 address; + __u64 size; + __u64 attributes; + __u64 flags; +}; + +#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) + #endif /* __LINUX_KVM_H */ diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index ecae2914c97e..5bd7fcaf9089 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -96,3 +96,7 @@ config KVM_GENERIC_HARDWARE_ENABLING config KVM_GENERIC_MMU_NOTIFIER select MMU_NOTIFIER bool + +config KVM_GENERIC_MEMORY_ATTRIBUTES + select KVM_GENERIC_MMU_NOTIFIER + bool diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index a41f8658dfe0..2726938b684b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1218,6 +1218,9 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname) spin_lock_init(&kvm->mn_invalidate_lock); rcuwait_init(&kvm->mn_memslots_update_rcuwait); xa_init(&kvm->vcpu_array); +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES + xa_init(&kvm->mem_attr_array); +#endif INIT_LIST_HEAD(&kvm->gpc_list); spin_lock_init(&kvm->gpc_lock); @@ -1391,6 +1394,9 @@ static void kvm_destroy_vm(struct kvm *kvm) } cleanup_srcu_struct(&kvm->irq_srcu); cleanup_srcu_struct(&kvm->srcu); +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES + xa_destroy(&kvm->mem_attr_array); +#endif kvm_arch_free_vm(kvm); preempt_notifier_dec(); hardware_disable_all(); @@ -2389,6 +2395,188 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, } #endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */ +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES +/* + * Returns true if _all_ gfns in the range [@start, @end) have attributes + * matching @attrs. + */ +bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end, + unsigned long attrs) +{ + XA_STATE(xas, &kvm->mem_attr_array, start); + unsigned long index; + bool has_attrs; + void *entry; + + rcu_read_lock(); + + if (!attrs) { + has_attrs = !xas_find(&xas, end); + goto out; + } + + has_attrs = true; + for (index = start; index < end; index++) { + do { + entry = xas_next(&xas); + } while (xas_retry(&xas, entry)); + + if (xas.xa_index != index || xa_to_value(entry) != attrs) { + has_attrs = false; + break; + } + } + +out: + rcu_read_unlock(); + return has_attrs; +} + +static u64 kvm_supported_mem_attributes(struct kvm *kvm) +{ + return 0; +} + +static __always_inline void kvm_handle_gfn_range(struct kvm *kvm, + struct kvm_mmu_notifier_range *range) +{ + struct kvm_gfn_range gfn_range; + struct kvm_memory_slot *slot; + struct kvm_memslots *slots; + struct kvm_memslot_iter iter; + bool found_memslot = false; + bool ret = false; + int i; + + gfn_range.arg = range->arg; + gfn_range.may_block = range->may_block; + + /* + * If/when KVM supports more attributes beyond private .vs shared, this + * _could_ set only_{private,shared} appropriately if the entire target + * range already has the desired private vs. shared state (it's unclear + * if that is a net win). For now, KVM reaches this point if and only + * if the private flag is being toggled, i.e. all mappings are in play. + */ + gfn_range.only_private = false; + gfn_range.only_shared = false; + + for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { + slots = __kvm_memslots(kvm, i); + + kvm_for_each_memslot_in_gfn_range(&iter, slots, range->start, range->end) { + slot = iter.slot; + gfn_range.slot = slot; + + gfn_range.start = max(range->start, slot->base_gfn); + gfn_range.end = min(range->end, slot->base_gfn + slot->npages); + if (gfn_range.start >= gfn_range.end) + continue; + + if (!found_memslot) { + found_memslot = true; + KVM_MMU_LOCK(kvm); + if (!IS_KVM_NULL_FN(range->on_lock)) + range->on_lock(kvm); + } + + ret |= range->handler(kvm, &gfn_range); + } + } + + if (range->flush_on_ret && ret) + kvm_flush_remote_tlbs(kvm); + + if (found_memslot) + KVM_MMU_UNLOCK(kvm); +} + +/* Set @attributes for the gfn range [@start, @end). */ +static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, + unsigned long attributes) +{ + struct kvm_mmu_notifier_range pre_set_range = { + .start = start, + .end = end, + .handler = kvm_arch_pre_set_memory_attributes, + .on_lock = kvm_mmu_invalidate_begin, + .flush_on_ret = true, + .may_block = true, + }; + struct kvm_mmu_notifier_range post_set_range = { + .start = start, + .end = end, + .arg.attributes = attributes, + .handler = kvm_arch_post_set_memory_attributes, + .on_lock = kvm_mmu_invalidate_end, + .may_block = true, + }; + unsigned long i; + void *entry; + int r = 0; + + entry = attributes ? xa_mk_value(attributes) : NULL; + + mutex_lock(&kvm->slots_lock); + + /* Nothing to do if the entire range as the desired attributes. */ + if (kvm_range_has_memory_attributes(kvm, start, end, attributes)) + goto out_unlock; + + /* + * Reserve memory ahead of time to avoid having to deal with failures + * partway through setting the new attributes. + */ + for (i = start; i < end; i++) { + r = xa_reserve(&kvm->mem_attr_array, i, GFP_KERNEL_ACCOUNT); + if (r) + goto out_unlock; + } + + kvm_handle_gfn_range(kvm, &pre_set_range); + + for (i = start; i < end; i++) { + r = xa_err(xa_store(&kvm->mem_attr_array, i, entry, + GFP_KERNEL_ACCOUNT)); + KVM_BUG_ON(r, kvm); + } + + kvm_handle_gfn_range(kvm, &post_set_range); + +out_unlock: + mutex_unlock(&kvm->slots_lock); + + return r; +} +static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm, + struct kvm_memory_attributes *attrs) +{ + gfn_t start, end; + + /* flags is currently not used. */ + if (attrs->flags) + return -EINVAL; + if (attrs->attributes & ~kvm_supported_mem_attributes(kvm)) + return -EINVAL; + if (attrs->size == 0 || attrs->address + attrs->size < attrs->address) + return -EINVAL; + if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size)) + return -EINVAL; + + start = attrs->address >> PAGE_SHIFT; + end = (attrs->address + attrs->size) >> PAGE_SHIFT; + + /* + * xarray tracks data using "unsigned long", and as a result so does + * KVM. For simplicity, supports generic attributes only on 64-bit + * architectures. + */ + BUILD_BUG_ON(sizeof(attrs->attributes) != sizeof(unsigned long)); + + return kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes); +} +#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */ + struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn) { return __gfn_to_memslot(kvm_memslots(kvm), gfn); @@ -4587,6 +4775,9 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg) #ifdef CONFIG_HAVE_KVM_MSI case KVM_CAP_SIGNAL_MSI: #endif +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES + case KVM_CAP_MEMORY_ATTRIBUTES: +#endif #ifdef CONFIG_HAVE_KVM_IRQFD case KVM_CAP_IRQFD: #endif @@ -5015,6 +5206,27 @@ static long kvm_vm_ioctl(struct file *filp, break; } #endif /* CONFIG_HAVE_KVM_IRQ_ROUTING */ +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES + case KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES: { + u64 attrs = kvm_supported_mem_attributes(kvm); + + r = -EFAULT; + if (copy_to_user(argp, &attrs, sizeof(attrs))) + goto out; + r = 0; + break; + } + case KVM_SET_MEMORY_ATTRIBUTES: { + struct kvm_memory_attributes attrs; + + r = -EFAULT; + if (copy_from_user(&attrs, argp, sizeof(attrs))) + goto out; + + r = kvm_vm_ioctl_set_mem_attributes(kvm, &attrs); + break; + } +#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */ case KVM_CREATE_DEVICE: { struct kvm_create_device cd; -- 2.42.0.283.g2d96d420d3-goog _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes 2023-09-14 1:55 ` [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes Sean Christopherson @ 2023-09-15 6:32 ` Yan Zhao 2023-09-20 21:00 ` Sean Christopherson 2023-09-18 7:51 ` Binbin Wu 2023-10-03 12:47 ` Fuad Tabba 2 siblings, 1 reply; 14+ messages in thread From: Yan Zhao @ 2023-09-15 6:32 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, linux-kernel, Chao Peng, Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov On Wed, Sep 13, 2023 at 06:55:09PM -0700, Sean Christopherson wrote: > From: Chao Peng <chao.p.peng@linux.intel.com> > > In confidential computing usages, whether a page is private or shared is > necessary information for KVM to perform operations like page fault > handling, page zapping etc. There are other potential use cases for > per-page memory attributes, e.g. to make memory read-only (or no-exec, > or exec-only, etc.) without having to modify memslots. > ... >> +bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end, > + unsigned long attrs) > +{ > + XA_STATE(xas, &kvm->mem_attr_array, start); > + unsigned long index; > + bool has_attrs; > + void *entry; > + > + rcu_read_lock(); > + > + if (!attrs) { > + has_attrs = !xas_find(&xas, end); > + goto out; > + } > + > + has_attrs = true; > + for (index = start; index < end; index++) { > + do { > + entry = xas_next(&xas); > + } while (xas_retry(&xas, entry)); > + > + if (xas.xa_index != index || xa_to_value(entry) != attrs) { Should "xa_to_value(entry) != attrs" be "!(xa_to_value(entry) & attrs)" ? > + has_attrs = false; > + break; > + } > + } > + > +out: > + rcu_read_unlock(); > + return has_attrs; > +} > + ... > +/* Set @attributes for the gfn range [@start, @end). */ > +static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, > + unsigned long attributes) > +{ > + struct kvm_mmu_notifier_range pre_set_range = { > + .start = start, > + .end = end, > + .handler = kvm_arch_pre_set_memory_attributes, > + .on_lock = kvm_mmu_invalidate_begin, > + .flush_on_ret = true, > + .may_block = true, > + }; > + struct kvm_mmu_notifier_range post_set_range = { > + .start = start, > + .end = end, > + .arg.attributes = attributes, > + .handler = kvm_arch_post_set_memory_attributes, > + .on_lock = kvm_mmu_invalidate_end, > + .may_block = true, > + }; > + unsigned long i; > + void *entry; > + int r = 0; > + > + entry = attributes ? xa_mk_value(attributes) : NULL; Also here, do we need to get existing attributes of a GFN first ? > + mutex_lock(&kvm->slots_lock); > + > + /* Nothing to do if the entire range as the desired attributes. */ > + if (kvm_range_has_memory_attributes(kvm, start, end, attributes)) > + goto out_unlock; > + > + /* > + * Reserve memory ahead of time to avoid having to deal with failures > + * partway through setting the new attributes. > + */ > + for (i = start; i < end; i++) { > + r = xa_reserve(&kvm->mem_attr_array, i, GFP_KERNEL_ACCOUNT); > + if (r) > + goto out_unlock; > + } > + > + kvm_handle_gfn_range(kvm, &pre_set_range); > + > + for (i = start; i < end; i++) { > + r = xa_err(xa_store(&kvm->mem_attr_array, i, entry, > + GFP_KERNEL_ACCOUNT)); > + KVM_BUG_ON(r, kvm); > + } > + > + kvm_handle_gfn_range(kvm, &post_set_range); > + > +out_unlock: > + mutex_unlock(&kvm->slots_lock); > + > + return r; > +} _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes 2023-09-15 6:32 ` Yan Zhao @ 2023-09-20 21:00 ` Sean Christopherson 2023-09-21 1:21 ` Yan Zhao 0 siblings, 1 reply; 14+ messages in thread From: Sean Christopherson @ 2023-09-20 21:00 UTC (permalink / raw) To: Yan Zhao Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, linux-kernel, Chao Peng, Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov On Fri, Sep 15, 2023, Yan Zhao wrote: > On Wed, Sep 13, 2023 at 06:55:09PM -0700, Sean Christopherson wrote: > > From: Chao Peng <chao.p.peng@linux.intel.com> > > > > In confidential computing usages, whether a page is private or shared is > > necessary information for KVM to perform operations like page fault > > handling, page zapping etc. There are other potential use cases for > > per-page memory attributes, e.g. to make memory read-only (or no-exec, > > or exec-only, etc.) without having to modify memslots. > > > ... > >> +bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end, > > + unsigned long attrs) > > +{ > > + XA_STATE(xas, &kvm->mem_attr_array, start); > > + unsigned long index; > > + bool has_attrs; > > + void *entry; > > + > > + rcu_read_lock(); > > + > > + if (!attrs) { > > + has_attrs = !xas_find(&xas, end); > > + goto out; > > + } > > + > > + has_attrs = true; > > + for (index = start; index < end; index++) { > > + do { > > + entry = xas_next(&xas); > > + } while (xas_retry(&xas, entry)); > > + > > + if (xas.xa_index != index || xa_to_value(entry) != attrs) { > Should "xa_to_value(entry) != attrs" be "!(xa_to_value(entry) & attrs)" ? No, the exact comparsion is deliberate. The intent of the API is to determine if the entire range already has the desired attributes, not if there is overlap between the two. E.g. if/when RWX attributes are supported, the exact comparison is needed to handle a RW => R conversion. > > + has_attrs = false; > > + break; > > + } > > + } > > + > > +out: > > + rcu_read_unlock(); > > + return has_attrs; > > +} > > + > ... > > +/* Set @attributes for the gfn range [@start, @end). */ > > +static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, > > + unsigned long attributes) > > +{ > > + struct kvm_mmu_notifier_range pre_set_range = { > > + .start = start, > > + .end = end, > > + .handler = kvm_arch_pre_set_memory_attributes, > > + .on_lock = kvm_mmu_invalidate_begin, > > + .flush_on_ret = true, > > + .may_block = true, > > + }; > > + struct kvm_mmu_notifier_range post_set_range = { > > + .start = start, > > + .end = end, > > + .arg.attributes = attributes, > > + .handler = kvm_arch_post_set_memory_attributes, > > + .on_lock = kvm_mmu_invalidate_end, > > + .may_block = true, > > + }; > > + unsigned long i; > > + void *entry; > > + int r = 0; > > + > > + entry = attributes ? xa_mk_value(attributes) : NULL; > Also here, do we need to get existing attributes of a GFN first ? No? @entry is the new value that will be set for all entries. This line doesn't touch the xarray in any way. Maybe I'm just not understanding your question. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes 2023-09-20 21:00 ` Sean Christopherson @ 2023-09-21 1:21 ` Yan Zhao 2023-09-25 17:37 ` Sean Christopherson 0 siblings, 1 reply; 14+ messages in thread From: Yan Zhao @ 2023-09-21 1:21 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, linux-kernel, Chao Peng, Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov On Wed, Sep 20, 2023 at 02:00:22PM -0700, Sean Christopherson wrote: > On Fri, Sep 15, 2023, Yan Zhao wrote: > > On Wed, Sep 13, 2023 at 06:55:09PM -0700, Sean Christopherson wrote: > > > From: Chao Peng <chao.p.peng@linux.intel.com> > > > > > > In confidential computing usages, whether a page is private or shared is > > > necessary information for KVM to perform operations like page fault > > > handling, page zapping etc. There are other potential use cases for > > > per-page memory attributes, e.g. to make memory read-only (or no-exec, > > > or exec-only, etc.) without having to modify memslots. > > > > > ... > > >> +bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end, > > > + unsigned long attrs) > > > +{ > > > + XA_STATE(xas, &kvm->mem_attr_array, start); > > > + unsigned long index; > > > + bool has_attrs; > > > + void *entry; > > > + > > > + rcu_read_lock(); > > > + > > > + if (!attrs) { > > > + has_attrs = !xas_find(&xas, end); > > > + goto out; > > > + } > > > + > > > + has_attrs = true; > > > + for (index = start; index < end; index++) { > > > + do { > > > + entry = xas_next(&xas); > > > + } while (xas_retry(&xas, entry)); > > > + > > > + if (xas.xa_index != index || xa_to_value(entry) != attrs) { > > Should "xa_to_value(entry) != attrs" be "!(xa_to_value(entry) & attrs)" ? > > No, the exact comparsion is deliberate. The intent of the API is to determine > if the entire range already has the desired attributes, not if there is overlap > between the two. > > E.g. if/when RWX attributes are supported, the exact comparison is needed to > handle a RW => R conversion. > > > > + has_attrs = false; > > > + break; > > > + } > > > + } > > > + > > > +out: > > > + rcu_read_unlock(); > > > + return has_attrs; > > > +} > > > + > > ... > > > +/* Set @attributes for the gfn range [@start, @end). */ > > > +static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, > > > + unsigned long attributes) > > > +{ > > > + struct kvm_mmu_notifier_range pre_set_range = { > > > + .start = start, > > > + .end = end, > > > + .handler = kvm_arch_pre_set_memory_attributes, > > > + .on_lock = kvm_mmu_invalidate_begin, > > > + .flush_on_ret = true, > > > + .may_block = true, > > > + }; > > > + struct kvm_mmu_notifier_range post_set_range = { > > > + .start = start, > > > + .end = end, > > > + .arg.attributes = attributes, > > > + .handler = kvm_arch_post_set_memory_attributes, > > > + .on_lock = kvm_mmu_invalidate_end, > > > + .may_block = true, > > > + }; > > > + unsigned long i; > > > + void *entry; > > > + int r = 0; > > > + > > > + entry = attributes ? xa_mk_value(attributes) : NULL; > > Also here, do we need to get existing attributes of a GFN first ? > > No? @entry is the new value that will be set for all entries. This line doesn't > touch the xarray in any way. Maybe I'm just not understanding your question. Hmm, I thought this interface was to allow users to add/remove an attribute to a GFN rather than overwrite all attributes of a GFN. Now I think I misunderstood the intention. But I wonder if there is a way for users to just add one attribute, as I don't find ioctl like KVM_GET_MEMORY_ATTRIBUTES for users to get current attributes and then to add/remove one based on that. e.g. maybe in future, KVM wants to add one attribute in kernel without being told by userspace ? _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes 2023-09-21 1:21 ` Yan Zhao @ 2023-09-25 17:37 ` Sean Christopherson 0 siblings, 0 replies; 14+ messages in thread From: Sean Christopherson @ 2023-09-25 17:37 UTC (permalink / raw) To: Yan Zhao Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, linux-kernel, Chao Peng, Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov On Thu, Sep 21, 2023, Yan Zhao wrote: > On Wed, Sep 20, 2023 at 02:00:22PM -0700, Sean Christopherson wrote: > > On Fri, Sep 15, 2023, Yan Zhao wrote: > > > On Wed, Sep 13, 2023 at 06:55:09PM -0700, Sean Christopherson wrote: > > > > +/* Set @attributes for the gfn range [@start, @end). */ > > > > +static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, > > > > + unsigned long attributes) > > > > +{ > > > > + struct kvm_mmu_notifier_range pre_set_range = { > > > > + .start = start, > > > > + .end = end, > > > > + .handler = kvm_arch_pre_set_memory_attributes, > > > > + .on_lock = kvm_mmu_invalidate_begin, > > > > + .flush_on_ret = true, > > > > + .may_block = true, > > > > + }; > > > > + struct kvm_mmu_notifier_range post_set_range = { > > > > + .start = start, > > > > + .end = end, > > > > + .arg.attributes = attributes, > > > > + .handler = kvm_arch_post_set_memory_attributes, > > > > + .on_lock = kvm_mmu_invalidate_end, > > > > + .may_block = true, > > > > + }; > > > > + unsigned long i; > > > > + void *entry; > > > > + int r = 0; > > > > + > > > > + entry = attributes ? xa_mk_value(attributes) : NULL; > > > Also here, do we need to get existing attributes of a GFN first ? > > > > No? @entry is the new value that will be set for all entries. This line doesn't > > touch the xarray in any way. Maybe I'm just not understanding your question. > Hmm, I thought this interface was to allow users to add/remove an attribute to a GFN > rather than overwrite all attributes of a GFN. Now I think I misunderstood the intention. > > But I wonder if there is a way for users to just add one attribute, as I don't find > ioctl like KVM_GET_MEMORY_ATTRIBUTES for users to get current attributes and then to > add/remove one based on that. e.g. maybe in future, KVM wants to add one attribute in > kernel without being told by userspace ? The plan is that memory attributes will be 100% userspace driven, i.e. that KVM will never add its own attributes. That's why there is (currently) no KVM_GET_MEMORY_ATTRIBUTES, the intended usage model is that userspace is fully responsible for managing attributes, and so should never need to query information that it already knows. If there's a compelling case for getting attributes then we could certainly add such an ioctl(), but I hope we never need to add a GET because that likely means we've made mistakes along the way. Giving userspace full control of attributes allows for a simpler uAPI, e.g. if userspace doesn't have full control, then setting or clearing bits requires a RMW operation, which means creating a more complex ioctl(). That's why its a straight SET operation and not an OR type operation. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes 2023-09-14 1:55 ` [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes Sean Christopherson 2023-09-15 6:32 ` Yan Zhao @ 2023-09-18 7:51 ` Binbin Wu 2023-09-20 21:03 ` Sean Christopherson 2023-10-03 12:47 ` Fuad Tabba 2 siblings, 1 reply; 14+ messages in thread From: Binbin Wu @ 2023-09-18 7:51 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, linux-kernel, Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, Chao Peng, Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov On 9/14/2023 9:55 AM, Sean Christopherson wrote: > From: Chao Peng <chao.p.peng@linux.intel.com> [...] > > +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES > +/* > + * Returns true if _all_ gfns in the range [@start, @end) have attributes > + * matching @attrs. > + */ > +bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end, > + unsigned long attrs) > +{ > + XA_STATE(xas, &kvm->mem_attr_array, start); > + unsigned long index; > + bool has_attrs; > + void *entry; > + > + rcu_read_lock(); > + > + if (!attrs) { > + has_attrs = !xas_find(&xas, end); IIUIC, xas_find() is inclusive for "end", so here should be "end - 1" ? > + goto out; > + } > + > + has_attrs = true; > + for (index = start; index < end; index++) { > + do { > + entry = xas_next(&xas); > + } while (xas_retry(&xas, entry)); > + > + if (xas.xa_index != index || xa_to_value(entry) != attrs) { > + has_attrs = false; > + break; > + } > + } > + > +out: > + rcu_read_unlock(); > + return has_attrs; > +} > + > [...] _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes 2023-09-18 7:51 ` Binbin Wu @ 2023-09-20 21:03 ` Sean Christopherson 2023-09-27 5:19 ` Binbin Wu 0 siblings, 1 reply; 14+ messages in thread From: Sean Christopherson @ 2023-09-20 21:03 UTC (permalink / raw) To: Binbin Wu Cc: kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, linux-kernel, Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, Chao Peng, Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov On Mon, Sep 18, 2023, Binbin Wu wrote: > > > On 9/14/2023 9:55 AM, Sean Christopherson wrote: > > From: Chao Peng <chao.p.peng@linux.intel.com> > [...] > > +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES > > +/* > > + * Returns true if _all_ gfns in the range [@start, @end) have attributes > > + * matching @attrs. > > + */ > > +bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end, > > + unsigned long attrs) > > +{ > > + XA_STATE(xas, &kvm->mem_attr_array, start); > > + unsigned long index; > > + bool has_attrs; > > + void *entry; > > + > > + rcu_read_lock(); > > + > > + if (!attrs) { > > + has_attrs = !xas_find(&xas, end); > IIUIC, xas_find() is inclusive for "end", so here should be "end - 1" ? Yes, that does appear to be the case. Inclusive vs. exclusive on gfn ranges has is the bane of my existence. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes 2023-09-20 21:03 ` Sean Christopherson @ 2023-09-27 5:19 ` Binbin Wu 0 siblings, 0 replies; 14+ messages in thread From: Binbin Wu @ 2023-09-27 5:19 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, linux-kernel, Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, Chao Peng, Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov On 9/21/2023 5:03 AM, Sean Christopherson wrote: > On Mon, Sep 18, 2023, Binbin Wu wrote: >> >> On 9/14/2023 9:55 AM, Sean Christopherson wrote: >>> From: Chao Peng <chao.p.peng@linux.intel.com> >> [...] >>> +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES >>> +/* >>> + * Returns true if _all_ gfns in the range [@start, @end) have attributes >>> + * matching @attrs. >>> + */ >>> +bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end, >>> + unsigned long attrs) >>> +{ >>> + XA_STATE(xas, &kvm->mem_attr_array, start); >>> + unsigned long index; >>> + bool has_attrs; >>> + void *entry; >>> + >>> + rcu_read_lock(); >>> + >>> + if (!attrs) { >>> + has_attrs = !xas_find(&xas, end); >> IIUIC, xas_find() is inclusive for "end", so here should be "end - 1" ? > Yes, that does appear to be the case. Inclusive vs. exclusive on gfn ranges has > is the bane of my existence. Seems this one is not included in the "KVM: guest_memfd fixes" patch series? https://lore.kernel.org/kvm/20230921203331.3746712-1-seanjc@google.com/ _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes 2023-09-14 1:55 ` [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes Sean Christopherson 2023-09-15 6:32 ` Yan Zhao 2023-09-18 7:51 ` Binbin Wu @ 2023-10-03 12:47 ` Fuad Tabba 2023-10-03 15:59 ` Sean Christopherson 2 siblings, 1 reply; 14+ messages in thread From: Fuad Tabba @ 2023-10-03 12:47 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, linux-kernel, Chao Peng, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov Hi, > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index d2d913acf0df..f8642ff2eb9d 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -1227,6 +1227,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228 > #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229 > #define KVM_CAP_USER_MEMORY2 230 > +#define KVM_CAP_MEMORY_ATTRIBUTES 231 > > #ifdef KVM_CAP_IRQ_ROUTING > > @@ -2293,4 +2294,17 @@ struct kvm_s390_zpci_op { > /* flags for kvm_s390_zpci_op->u.reg_aen.flags */ > #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0) > > +/* Available with KVM_CAP_MEMORY_ATTRIBUTES */ > +#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES _IOR(KVMIO, 0xd2, __u64) > +#define KVM_SET_MEMORY_ATTRIBUTES _IOW(KVMIO, 0xd3, struct kvm_memory_attributes) > + > +struct kvm_memory_attributes { > + __u64 address; > + __u64 size; > + __u64 attributes; > + __u64 flags; > +}; > + > +#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) > + In pKVM, we don't want to allow setting (or clearing) of PRIVATE/SHARED attributes from userspace. However, we'd like to use the attributes xarray to track the sharing state of guest pages at the host kernel. Moreover, we'd rather the default guest page state be PRIVATE, and only specify which pages are shared. All pKVM guest pages start off as private, and the majority will remain so. I'm not sure if this is the best way to do this: One idea would be to move the definition of KVM_MEMORY_ATTRIBUTE_PRIVATE to arch/*/include/asm/kvm_host.h, which is where kvm_arch_supported_attributes() lives as well. This would allow different architectures to specify their own attributes (i.e., instead we'd have a KVM_MEMORY_ATTRIBUTE_SHARED for pKVM). This wouldn't help in terms of preventing userspace from clearing attributes (i.e., setting a 0 attribute) though. The other thing, which we need for pKVM anyway, is to make kvm_vm_set_mem_attributes() global, so that it can be called from outside of kvm_main.c (already have a local patch for this that declares it in kvm_host.h), and not gate this function by KVM_GENERIC_MEMORY_ATTRIBUTES. This would let pKVM select only KVM_PRIVATE_MEM (as opposed to KVM_GENERIC_PRIVATE_MEM, which selects KVM_GENERIC_MEMORY_ATTRIBUTES), preventing userspace from setting these attributes, while allowing pKVM to call kvm_vm_set_mem_attributes(). What do you think? Thanks, /fuad _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes 2023-10-03 12:47 ` Fuad Tabba @ 2023-10-03 15:59 ` Sean Christopherson 2023-10-03 18:33 ` Fuad Tabba 0 siblings, 1 reply; 14+ messages in thread From: Sean Christopherson @ 2023-10-03 15:59 UTC (permalink / raw) To: Fuad Tabba Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, linux-kernel, Chao Peng, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov On Tue, Oct 03, 2023, Fuad Tabba wrote: > Hi, > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index d2d913acf0df..f8642ff2eb9d 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -1227,6 +1227,7 @@ struct kvm_ppc_resize_hpt { > > #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228 > > #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229 > > #define KVM_CAP_USER_MEMORY2 230 > > +#define KVM_CAP_MEMORY_ATTRIBUTES 231 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > @@ -2293,4 +2294,17 @@ struct kvm_s390_zpci_op { > > /* flags for kvm_s390_zpci_op->u.reg_aen.flags */ > > #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0) > > > > +/* Available with KVM_CAP_MEMORY_ATTRIBUTES */ > > +#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES _IOR(KVMIO, 0xd2, __u64) > > +#define KVM_SET_MEMORY_ATTRIBUTES _IOW(KVMIO, 0xd3, struct kvm_memory_attributes) > > + > > +struct kvm_memory_attributes { > > + __u64 address; > > + __u64 size; > > + __u64 attributes; > > + __u64 flags; > > +}; > > + > > +#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) > > + > > In pKVM, we don't want to allow setting (or clearing) of PRIVATE/SHARED > attributes from userspace. Why not? The whole thing falls apart if userspace doesn't *know* the state of a page, and the only way for userspace to know the state of a page at a given moment in time is if userspace controls the attributes. E.g. even if KVM were to provide a way for userspace to query attributes, the attributes exposed to usrspace would become stale the instant KVM drops slots_lock (or whatever lock protects the attributes) since userspace couldn't prevent future changes. Why does pKVM need to prevent userspace from stating *its* view of attributes? If the goal is to reduce memory overhead, that can be solved by using an internal, non-ABI attributes flag to track pKVM's view of SHARED vs. PRIVATE. If the guest attempts to access memory where pKVM and userspace don't agree on the state, generate an exit to userspace. Or kill the guest. Or do something else entirely. > However, we'd like to use the attributes xarray to track the sharing state of > guest pages at the host kernel. > > Moreover, we'd rather the default guest page state be PRIVATE, and > only specify which pages are shared. All pKVM guest pages start off as > private, and the majority will remain so. I would rather optimize kvm_vm_set_mem_attributes() to generate range-based xarray entries, at which point it shouldn't matter all that much whether PRIVATE or SHARED is the default "empty" state. We opted not to do that for the initial merge purely to keep the code as simple as possible (which is obviously still not exactly simple). With range-based xarray entries, the cost of tagging huge chunks of memory as PRIVATE should be a non-issue. And if that's not enough for whatever reason, I would rather define the polarity of PRIVATE on a per-VM basis, but only for internal storage. > I'm not sure if this is the best way to do this: One idea would be to move > the definition of KVM_MEMORY_ATTRIBUTE_PRIVATE to > arch/*/include/asm/kvm_host.h, which is where kvm_arch_supported_attributes() > lives as well. This would allow different architectures to specify their own > attributes (i.e., instead we'd have a KVM_MEMORY_ATTRIBUTE_SHARED for pKVM). > This wouldn't help in terms of preventing userspace from clearing attributes > (i.e., setting a 0 attribute) though. > > The other thing, which we need for pKVM anyway, is to make > kvm_vm_set_mem_attributes() global, so that it can be called from outside of > kvm_main.c (already have a local patch for this that declares it in > kvm_host.h), That's no problem, but I am definitely opposed to KVM modifying attributes that are owned by userspace. > and not gate this function by KVM_GENERIC_MEMORY_ATTRIBUTES. As above, I am opposed to pKVM having a completely different ABI for managing PRIVATE vs. SHARED. I have no objection to pKVM using unclaimed flags in the attributes to store extra metadata, but if KVM_SET_MEMORY_ATTRIBUTES doesn't work for pKVM, then we've failed miserably and should revist the uAPI. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes 2023-10-03 15:59 ` Sean Christopherson @ 2023-10-03 18:33 ` Fuad Tabba 2023-10-03 20:51 ` Sean Christopherson 0 siblings, 1 reply; 14+ messages in thread From: Fuad Tabba @ 2023-10-03 18:33 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, linux-kernel, Chao Peng, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov Hi Sean, On Tue, Oct 3, 2023 at 4:59 PM Sean Christopherson <seanjc@google.com> wrote: > > On Tue, Oct 03, 2023, Fuad Tabba wrote: > > Hi, > > > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > > index d2d913acf0df..f8642ff2eb9d 100644 > > > --- a/include/uapi/linux/kvm.h > > > +++ b/include/uapi/linux/kvm.h > > > @@ -1227,6 +1227,7 @@ struct kvm_ppc_resize_hpt { > > > #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228 > > > #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229 > > > #define KVM_CAP_USER_MEMORY2 230 > > > +#define KVM_CAP_MEMORY_ATTRIBUTES 231 > > > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > > > @@ -2293,4 +2294,17 @@ struct kvm_s390_zpci_op { > > > /* flags for kvm_s390_zpci_op->u.reg_aen.flags */ > > > #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0) > > > > > > +/* Available with KVM_CAP_MEMORY_ATTRIBUTES */ > > > +#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES _IOR(KVMIO, 0xd2, __u64) > > > +#define KVM_SET_MEMORY_ATTRIBUTES _IOW(KVMIO, 0xd3, struct kvm_memory_attributes) > > > + > > > +struct kvm_memory_attributes { > > > + __u64 address; > > > + __u64 size; > > > + __u64 attributes; > > > + __u64 flags; > > > +}; > > > + > > > +#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) > > > + > > > > In pKVM, we don't want to allow setting (or clearing) of PRIVATE/SHARED > > attributes from userspace. > > Why not? The whole thing falls apart if userspace doesn't *know* the state of a > page, and the only way for userspace to know the state of a page at a given moment > in time is if userspace controls the attributes. E.g. even if KVM were to provide > a way for userspace to query attributes, the attributes exposed to usrspace would > become stale the instant KVM drops slots_lock (or whatever lock protects the attributes) > since userspace couldn't prevent future changes. I think I might not quite understand the purpose of the KVM_SET_MEMORY_ATTRIBUTES ABI. In pKVM, all of a protected guest's memory is private by default, until the guest shares it with the host (via a hypercall), or another guest (future work). When the guest shares it, userspace is notified via KVM_EXIT_HYPERCALL. In many use cases, userspace doesn't need to keep track directly of all of this, but can reactively un/map the memory being un/shared. > Why does pKVM need to prevent userspace from stating *its* view of attributes? > > If the goal is to reduce memory overhead, that can be solved by using an internal, > non-ABI attributes flag to track pKVM's view of SHARED vs. PRIVATE. If the guest > attempts to access memory where pKVM and userspace don't agree on the state, > generate an exit to userspace. Or kill the guest. Or do something else entirely. For the pKVM hypervisor the guest's view of the attributes doesn't matter. The hypervisor at the end of the day is the ultimate arbiter for what is shared and with how. For pKVM (at least in my port of guestmem), we use the memory attributes from guestmem essentially to control which memory can be mapped by the host. One difference between pKVM and TDX (as I understand it), is that TDX uses the msb of the guest's IPA to indicate whether memory is shared or private, and that can generate a mismatch on guest memory access between what it thinks the state is, and what the sharing state in reality is. pKVM doesn't have that. Memory is private by default, and can be shared in-place, both in the guest's IPA space as well as the underlying physical page. > > However, we'd like to use the attributes xarray to track the sharing state of > > guest pages at the host kernel. > > > > Moreover, we'd rather the default guest page state be PRIVATE, and > > only specify which pages are shared. All pKVM guest pages start off as > > private, and the majority will remain so. > > I would rather optimize kvm_vm_set_mem_attributes() to generate range-based > xarray entries, at which point it shouldn't matter all that much whether PRIVATE > or SHARED is the default "empty" state. We opted not to do that for the initial > merge purely to keep the code as simple as possible (which is obviously still not > exactly simple). > > With range-based xarray entries, the cost of tagging huge chunks of memory as > PRIVATE should be a non-issue. And if that's not enough for whatever reason, I > would rather define the polarity of PRIVATE on a per-VM basis, but only for internal > storage. Sounds good. > > I'm not sure if this is the best way to do this: One idea would be to move > > the definition of KVM_MEMORY_ATTRIBUTE_PRIVATE to > > arch/*/include/asm/kvm_host.h, which is where kvm_arch_supported_attributes() > > lives as well. This would allow different architectures to specify their own > > attributes (i.e., instead we'd have a KVM_MEMORY_ATTRIBUTE_SHARED for pKVM). > > This wouldn't help in terms of preventing userspace from clearing attributes > > (i.e., setting a 0 attribute) though. > > > > The other thing, which we need for pKVM anyway, is to make > > kvm_vm_set_mem_attributes() global, so that it can be called from outside of > > kvm_main.c (already have a local patch for this that declares it in > > kvm_host.h), > > That's no problem, but I am definitely opposed to KVM modifying attributes that > are owned by userspace. > > > and not gate this function by KVM_GENERIC_MEMORY_ATTRIBUTES. > > As above, I am opposed to pKVM having a completely different ABI for managing > PRIVATE vs. SHARED. I have no objection to pKVM using unclaimed flags in the > attributes to store extra metadata, but if KVM_SET_MEMORY_ATTRIBUTES doesn't work > for pKVM, then we've failed miserably and should revist the uAPI. Like I said, pKVM doesn't need a userspace ABI for managing PRIVATE/SHARED, just a way of tracking in the host kernel of what is shared (as opposed to the hypervisor, which already has the knowledge). The solution could simply be that pKVM does not enable KVM_GENERIC_MEMORY_ATTRIBUTES, has its own tracking of the status of the guest pages, and only selects KVM_PRIVATE_MEM. Thanks! /fuad _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes 2023-10-03 18:33 ` Fuad Tabba @ 2023-10-03 20:51 ` Sean Christopherson 0 siblings, 0 replies; 14+ messages in thread From: Sean Christopherson @ 2023-10-03 20:51 UTC (permalink / raw) To: Fuad Tabba Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Matthew Wilcox (Oracle), Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, kvm, linux-arm-kernel, kvmarm, linux-mips, linuxppc-dev, kvm-riscv, linux-riscv, linux-fsdevel, linux-mm, linux-security-module, linux-kernel, Chao Peng, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, Kirill A . Shutemov On Tue, Oct 03, 2023, Fuad Tabba wrote: > On Tue, Oct 3, 2023 at 4:59 PM Sean Christopherson <seanjc@google.com> wrote: > > On Tue, Oct 03, 2023, Fuad Tabba wrote: > > > > +#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) > > > > + > > > > > > In pKVM, we don't want to allow setting (or clearing) of PRIVATE/SHARED > > > attributes from userspace. > > > > Why not? The whole thing falls apart if userspace doesn't *know* the state of a > > page, and the only way for userspace to know the state of a page at a given moment > > in time is if userspace controls the attributes. E.g. even if KVM were to provide > > a way for userspace to query attributes, the attributes exposed to usrspace would > > become stale the instant KVM drops slots_lock (or whatever lock protects the attributes) > > since userspace couldn't prevent future changes. > > I think I might not quite understand the purpose of the > KVM_SET_MEMORY_ATTRIBUTES ABI. In pKVM, all of a protected guest's memory is > private by default, until the guest shares it with the host (via a > hypercall), or another guest (future work). When the guest shares it, > userspace is notified via KVM_EXIT_HYPERCALL. In many use cases, userspace > doesn't need to keep track directly of all of this, but can reactively un/map > the memory being un/shared. Yes, and then userspace needs to tell KVM, via KVM_SET_MEMORY_ATTRIBUTES, that userspace has agreed to change the state of the page. Userspace may not need/want to explicitly track the state of pages, but userspace still needs to tell KVM what userspace wants. KVM is primarily an accelerator, e.g. KVM's role is to make things go fast (relative to doing things in userspace) and provide access to resources/instructions that require elevated privileges. As a general rule, we try to avoid defining the vCPU model, security policies, etc. in KVM, because hardcoding policy into KVM (and the kernel as a whole) eventually limits the utility of KVM. As it pertains to PRIVATE vs. SHARED, KVM's role is to define and enforce the basic rules, but KVM shouldn't do things like define when it is (il)legal to convert memory to/from SHARED, what pages can be converted, what happens if the guest and userspace disagree, etc. > > Why does pKVM need to prevent userspace from stating *its* view of attributes? > > > > If the goal is to reduce memory overhead, that can be solved by using an internal, > > non-ABI attributes flag to track pKVM's view of SHARED vs. PRIVATE. If the guest > > attempts to access memory where pKVM and userspace don't agree on the state, > > generate an exit to userspace. Or kill the guest. Or do something else entirely. > > For the pKVM hypervisor the guest's view of the attributes doesn't > matter. The hypervisor at the end of the day is the ultimate arbiter > for what is shared and with how. For pKVM (at least in my port of > guestmem), we use the memory attributes from guestmem essentially to > control which memory can be mapped by the host. The guest's view absolutely matters. The guest's view may not be expressed at access time, e.g. as you note below, pKVM and other software-protected VMs don't have a dedicated shared vs. private bit like TDX and SNP. But the view is still there, e.g. in the pKVM model, the guest expresses its desire for shared vs. private via hypercall, and IIRC, the guest's view is tracked by the hypervisor in the stage-2 PTEs. pKVM itself may track the guest's view on things, but the view is still the guest's. E.g. if the guest thinks a page is private, but in reality KVM and host userspace have it as shared, then the guest may unintentionally leak data to the untrusted world. IIUC, you have implemented guest_memfd support in pKVM by changing the attributes when the guest makes the hypercall. This can work, but only so long as the guest and userspace are well-behaved, and it will likely paint pKVM into a corner in the long run. E.g. if the guest makes a hypercall to convert memory to PRIVATE, but there is no memslot or the memslot doesn't support private memory, then unless there is policy baked into KVM, or an ABI for the guest<=>host hypercall interface that allows unwinding the program counter, you're stuck. Returning an error for the hypercall straight from KVM is undesirable as that would put policy into KVM that doesn't need to be there, e.g. that would prevent userspace from manipulating memslots in response to (un)share requests from the guest. It's a similar story if KVM marks the page as PRIVATE, as that would prevent userspace from returning an error for the hypercall, i.e. would prevent usersepace from denying the request to convert to PRIVATE. > One difference between pKVM and TDX (as I understand it), is that TDX > uses the msb of the guest's IPA to indicate whether memory is shared > or private, and that can generate a mismatch on guest memory access > between what it thinks the state is, and what the sharing state in > reality is. pKVM doesn't have that. Memory is private by default, and > can be shared in-place, both in the guest's IPA space as well as the > underlying physical page. TDX's shared bit and SNP's encryption bit are just a means of hardware enforcement. pKVM does have a hardware bit because hardware doesn't provide any enforcement. But as above, pKVM does have an equivalent *somewhere*. > > > The other thing, which we need for pKVM anyway, is to make > > > kvm_vm_set_mem_attributes() global, so that it can be called from outside of > > > kvm_main.c (already have a local patch for this that declares it in > > > kvm_host.h), > > > > That's no problem, but I am definitely opposed to KVM modifying attributes that > > are owned by userspace. > > > > > and not gate this function by KVM_GENERIC_MEMORY_ATTRIBUTES. > > > > As above, I am opposed to pKVM having a completely different ABI for managing > > PRIVATE vs. SHARED. I have no objection to pKVM using unclaimed flags in the > > attributes to store extra metadata, but if KVM_SET_MEMORY_ATTRIBUTES doesn't work > > for pKVM, then we've failed miserably and should revist the uAPI. > > Like I said, pKVM doesn't need a userspace ABI for managing PRIVATE/SHARED, > just a way of tracking in the host kernel of what is shared (as opposed to > the hypervisor, which already has the knowledge). The solution could simply > be that pKVM does not enable KVM_GENERIC_MEMORY_ATTRIBUTES, has its own > tracking of the status of the guest pages, and only selects KVM_PRIVATE_MEM. At the risk of overstepping my bounds, I think that effectively giving the guest full control over what is shared vs. private is a mistake. It more or less locks pKVM into a single model, and even within that model, dealing with errors and/or misbehaving guests becomes unnecessarily problematic. Using KVM_SET_MEMORY_ATTRIBUTES may not provide value *today*, e.g. the userspace side of pKVM could simply "reflect" all conversion hypercalls, and terminate the VM on errors. But the cost is very minimal, e.g. a single extra ioctl() per converion, and the upside is that pKVM won't be stuck if a use case comes along that wants to go beyond "all conversion requests either immediately succeed or terminate the guest". _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2023-10-06 12:48 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <CA+EHjTwTgEVtea7wgef5G6EEgFa0so_GbNXTMZNKyFE=ucyV0g@mail.gmail.com> 2023-10-06 3:21 ` [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes Sean Christopherson 2023-10-06 12:47 ` Fuad Tabba 2023-09-14 1:54 [RFC PATCH v12 00/33] KVM: guest_memfd() and per-page attributes Sean Christopherson 2023-09-14 1:55 ` [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes Sean Christopherson 2023-09-15 6:32 ` Yan Zhao 2023-09-20 21:00 ` Sean Christopherson 2023-09-21 1:21 ` Yan Zhao 2023-09-25 17:37 ` Sean Christopherson 2023-09-18 7:51 ` Binbin Wu 2023-09-20 21:03 ` Sean Christopherson 2023-09-27 5:19 ` Binbin Wu 2023-10-03 12:47 ` Fuad Tabba 2023-10-03 15:59 ` Sean Christopherson 2023-10-03 18:33 ` Fuad Tabba 2023-10-03 20:51 ` Sean Christopherson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).