Date: Tue, 17 Jan 2023 11:21:10 +0800
Subject: Re: [PATCH v10 2/9] KVM: Introduce per-page memory attributes
From: Binbin Wu
To: Chao Peng, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
 linux-doc@vger.kernel.org, qemu-devel@nongnu.org
Cc: Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov,
 Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, Arnd Bergmann, Naoya Horiguchi, Miaohe Lin,
 x86@kernel.org, "H . Peter Anvin", Hugh Dickins, Jeff Layton,
 "J . Bruce Fields", Andrew Morton, Shuah Khan, Mike Rapoport,
 Steven Price, "Maciej S . Szmigiero", Vlastimil Babka, Vishal Annapurve,
 Yu Zhang, "Kirill A . Shutemov", luto@kernel.org, jun.nakajima@intel.com,
 dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com,
 aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
 Quentin Perret, tabba@google.com, Michael Roth, mhocko@suse.com,
 wei.w.wang@intel.com
References: <20221202061347.1070246-1-chao.p.peng@linux.intel.com>
 <20221202061347.1070246-3-chao.p.peng@linux.intel.com>
In-Reply-To: <20221202061347.1070246-3-chao.p.peng@linux.intel.com>

On 12/2/2022 2:13 PM, Chao Peng wrote:
> In confidential computing usages, whether a page is private or shared is
> necessary information for KVM to perform operations like page fault
> handling, page zapping etc. There are other potential use cases for
> per-page memory attributes, e.g. to make memory read-only (or no-exec,
> or exec-only, etc.) without having to modify memslots.
>
> Introduce two ioctls (advertised by KVM_CAP_MEMORY_ATTRIBUTES) to allow
> userspace to operate on the per-page memory attributes.
>   - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes to
>     a guest memory range.
>   - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported
>     memory attributes.
>
> KVM internally uses xarray to store the per-page memory attributes.
>
> Suggested-by: Sean Christopherson
> Signed-off-by: Chao Peng
> Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com/
> ---
>  Documentation/virt/kvm/api.rst | 63 ++++++++++++++++++++++++++++
>  arch/x86/kvm/Kconfig           |  1 +
>  include/linux/kvm_host.h       |  3 ++
>  include/uapi/linux/kvm.h       | 17 ++++++++

Should the changes introduced in this file also be added to
tools/include/uapi/linux/kvm.h?

>  virt/kvm/Kconfig               |  3 ++
>  virt/kvm/kvm_main.c            | 76 ++++++++++++++++++++++++++++++++++
>  6 files changed, 163 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 5617bc4f899f..bb2f709c0900 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5952,6 +5952,59 @@ delivery must be provided via the "reg_aen" struct.
>  The "pad" and "reserved" fields may be used for future extensions and should be
>  set to 0s by userspace.
>
> +4.138 KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES
> +-----------------------------------------
> +
> +:Capability: KVM_CAP_MEMORY_ATTRIBUTES
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: u64 memory attributes bitmask(out)
> +:Returns: 0 on success, <0 on error
> +
> +Returns supported memory attributes bitmask. Supported memory attributes will
> +have the corresponding bits set in u64 memory attributes bitmask.
> +
> +The following memory attributes are defined::
> +
> +  #define KVM_MEMORY_ATTRIBUTE_READ              (1ULL << 0)
> +  #define KVM_MEMORY_ATTRIBUTE_WRITE             (1ULL << 1)
> +  #define KVM_MEMORY_ATTRIBUTE_EXECUTE           (1ULL << 2)
> +  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
> +
> +4.139 KVM_SET_MEMORY_ATTRIBUTES
> +-----------------------------------------
> +
> +:Capability: KVM_CAP_MEMORY_ATTRIBUTES
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_memory_attributes(in/out)
> +:Returns: 0 on success, <0 on error
> +
> +Sets memory attributes for pages in a guest memory range. Parameters are
> +specified via the following structure::
> +
> +  struct kvm_memory_attributes {
> +	__u64 address;
> +	__u64 size;
> +	__u64 attributes;
> +	__u64 flags;
> +  };
> +
> +The user sets the per-page memory attributes to a guest memory range indicated
> +by address/size, and in return KVM adjusts address and size to reflect the
> +actual pages of the memory range have been successfully set to the attributes.
> +If the call returns 0, "address" is updated to the last successful address + 1
> +and "size" is updated to the remaining address size that has not been set
> +successfully. The user should check the return value as well as the size to
> +decide if the operation succeeded for the whole range or not. The user may want
> +to retry the operation with the returned address/size if the previous range was
> +partially successful.
> +
> +Both address and size should be page aligned and the supported attributes can be
> +retrieved with KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES.
> +
> +The "flags" field may be used for future extensions and should be set to 0s.
> +
>  5. The kvm_run structure
>  ========================
>
> @@ -8270,6 +8323,16 @@ structure.
>  When getting the Modified Change Topology Report value, the attr->addr
>  must point to a byte where the value will be stored or retrieved from.
>
> +8.40 KVM_CAP_MEMORY_ATTRIBUTES
> +------------------------------
> +
> +:Capability: KVM_CAP_MEMORY_ATTRIBUTES
> +:Architectures: x86
> +:Type: vm
> +
> +This capability indicates KVM supports per-page memory attributes and ioctls
> +KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES/KVM_SET_MEMORY_ATTRIBUTES are available.
> +
>  9. Known KVM API problems
>  =========================
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index fbeaa9ddef59..a8e379a3afee 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -49,6 +49,7 @@ config KVM
>  	select SRCU
>  	select INTERVAL_TREE
>  	select HAVE_KVM_PM_NOTIFIER if PM
> +	select HAVE_KVM_MEMORY_ATTRIBUTES
>  	help
>  	  Support hosting fully virtualized guest machines using hardware
>  	  virtualization extensions. You will need a fairly recent
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 8f874a964313..a784e2b06625 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -800,6 +800,9 @@ struct kvm {
>
>  #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
>  	struct notifier_block pm_notifier;
> +#endif
> +#ifdef CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES
> +	struct xarray mem_attr_array;
>  #endif
>  	char stats_id[KVM_STATS_NAME_SIZE];
>  };
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 64dfe9c07c87..5d0941acb5bb 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1182,6 +1182,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_S390_CPU_TOPOLOGY 222
>  #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
>  #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
> +#define KVM_CAP_MEMORY_ATTRIBUTES 225
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -2238,4 +2239,20 @@ struct kvm_s390_zpci_op {
>  /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
>  #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0)
>
> +/* Available with KVM_CAP_MEMORY_ATTRIBUTES */
> +#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES _IOR(KVMIO, 0xd2, __u64)
> +#define KVM_SET_MEMORY_ATTRIBUTES _IOWR(KVMIO, 0xd3, struct kvm_memory_attributes)
> +
> +struct kvm_memory_attributes {
> +	__u64 address;
> +	__u64 size;
> +	__u64 attributes;
> +	__u64 flags;
> +};
> +
> +#define KVM_MEMORY_ATTRIBUTE_READ              (1ULL << 0)
> +#define KVM_MEMORY_ATTRIBUTE_WRITE             (1ULL << 1)
> +#define KVM_MEMORY_ATTRIBUTE_EXECUTE           (1ULL << 2)
> +#define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
> +
>  #endif /* __LINUX_KVM_H */
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 800f9470e36b..effdea5dd4f0 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -19,6 +19,9 @@ config HAVE_KVM_IRQ_ROUTING
>  config HAVE_KVM_DIRTY_RING
>         bool
>
> +config HAVE_KVM_MEMORY_ATTRIBUTES
> +       bool
> +
>  # Only strongly ordered architectures can select this, as it doesn't
>  # put any explicit constraint on userspace ordering. They can also
>  # select the _ACQ_REL version.
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1782c4555d94..7f0f5e9f2406 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1150,6 +1150,9 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
>  	spin_lock_init(&kvm->mn_invalidate_lock);
>  	rcuwait_init(&kvm->mn_memslots_update_rcuwait);
>  	xa_init(&kvm->vcpu_array);
> +#ifdef CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES
> +	xa_init(&kvm->mem_attr_array);
> +#endif
>
>  	INIT_LIST_HEAD(&kvm->gpc_list);
>  	spin_lock_init(&kvm->gpc_lock);
> @@ -1323,6 +1326,9 @@ static void kvm_destroy_vm(struct kvm *kvm)
>  		kvm_free_memslots(kvm, &kvm->__memslots[i][0]);
>  		kvm_free_memslots(kvm, &kvm->__memslots[i][1]);
>  	}
> +#ifdef CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES
> +	xa_destroy(&kvm->mem_attr_array);
> +#endif
>  	cleanup_srcu_struct(&kvm->irq_srcu);
>  	cleanup_srcu_struct(&kvm->srcu);
>  	kvm_arch_free_vm(kvm);
> @@ -2323,6 +2329,49 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
>  }
>  #endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
>
> +#ifdef CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES
> +static u64 kvm_supported_mem_attributes(struct kvm *kvm)
> +{
> +	return 0;
> +}
> +
> +static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> +					   struct kvm_memory_attributes *attrs)
> +{
> +	gfn_t start, end;
> +	unsigned long i;
> +	void *entry;
> +	u64 supported_attrs = kvm_supported_mem_attributes(kvm);
> +
> +	/* flags is currently not used. */
> +	if (attrs->flags)
> +		return -EINVAL;
> +	if (attrs->attributes & ~supported_attrs)
> +		return -EINVAL;
> +	if (attrs->size == 0 || attrs->address + attrs->size < attrs->address)
> +		return -EINVAL;
> +	if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size))
> +		return -EINVAL;
> +
> +	start = attrs->address >> PAGE_SHIFT;
> +	end = (attrs->address + attrs->size - 1 + PAGE_SIZE) >> PAGE_SHIFT;
> +
> +	entry = attrs->attributes ? xa_mk_value(attrs->attributes) : NULL;
> +
> +	mutex_lock(&kvm->lock);
> +	for (i = start; i < end; i++)
> +		if (xa_err(xa_store(&kvm->mem_attr_array, i, entry,
> +				    GFP_KERNEL_ACCOUNT)))
> +			break;
> +	mutex_unlock(&kvm->lock);
> +
> +	attrs->address = i << PAGE_SHIFT;
> +	attrs->size = (end - i) << PAGE_SHIFT;
> +
> +	return 0;
> +}
> +#endif /* CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES */
> +
>  struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
>  {
>  	return __gfn_to_memslot(kvm_memslots(kvm), gfn);
> @@ -4459,6 +4508,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  #ifdef CONFIG_HAVE_KVM_MSI
>  	case KVM_CAP_SIGNAL_MSI:
>  #endif
> +#ifdef CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES
> +	case KVM_CAP_MEMORY_ATTRIBUTES:
> +#endif
>  #ifdef CONFIG_HAVE_KVM_IRQFD
>  	case KVM_CAP_IRQFD:
>  	case KVM_CAP_IRQFD_RESAMPLE:
> @@ -4804,6 +4856,30 @@ static long kvm_vm_ioctl(struct file *filp,
>  		break;
>  	}
>  #endif /* CONFIG_HAVE_KVM_IRQ_ROUTING */
> +#ifdef CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES
> +	case KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES: {
> +		u64 attrs = kvm_supported_mem_attributes(kvm);
> +
> +		r = -EFAULT;
> +		if (copy_to_user(argp, &attrs, sizeof(attrs)))
> +			goto out;
> +		r = 0;
> +		break;
> +	}
> +	case KVM_SET_MEMORY_ATTRIBUTES: {
> +		struct kvm_memory_attributes attrs;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&attrs, argp, sizeof(attrs)))
> +			goto out;
> +
> +		r = kvm_vm_ioctl_set_mem_attributes(kvm, &attrs);
> +
> +		if (!r && copy_to_user(argp, &attrs, sizeof(attrs)))
> +			r = -EFAULT;
> +		break;
> +	}
> +#endif /* CONFIG_HAVE_KVM_MEMORY_ATTRIBUTES */
>  	case KVM_CREATE_DEVICE: {
>  		struct kvm_create_device cd;
>
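
The KVM_SET_MEMORY_ATTRIBUTES text quoted above says a call can return 0
while covering only part of the range, with "address"/"size" rewritten to
describe what is left. A rough userspace-side sketch of that retry contract
might look like the following. Everything named here is hypothetical:
struct mem_attrs_req is a local mirror of the quoted struct (not taken from
any released header), set_attrs_fn stands in for
ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &req), and mock_set_attrs is a fake
backend that only handles a fixed number of bytes per call. The loop relies
only on "size" reaching 0, not on the exact value written to "address".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local mirror of the struct in the quoted patch (assumption, not a uapi header). */
struct mem_attrs_req {
	uint64_t address;
	uint64_t size;
	uint64_t attributes;
	uint64_t flags;
};

/*
 * Hypothetical stand-in for ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, req).
 * Returns 0 on (possibly partial) success and a negative errno on failure;
 * on 0 it rewrites req->address/req->size to the part not yet covered.
 */
typedef int (*set_attrs_fn)(void *ctx, struct mem_attrs_req *req);

/* Retry until the whole range is covered, an error occurs, or no progress is made. */
int set_attrs_all(set_attrs_fn fn, void *ctx, uint64_t address,
		  uint64_t size, uint64_t attributes, unsigned int max_calls)
{
	struct mem_attrs_req req = { address, size, attributes, 0 };
	unsigned int n;

	for (n = 0; n < max_calls; n++) {
		uint64_t before = req.size;
		int r = fn(ctx, &req);

		if (r < 0)
			return r;		/* hard failure reported by the backend */
		if (req.size == 0)
			return 0;		/* whole range has been set */
		if (req.size >= before)
			return -EAGAIN;		/* nothing was covered, stop retrying */
	}
	return -EAGAIN;				/* retry budget exhausted */
}

/* Fake backend for illustration: covers at most 'chunk' bytes per call. */
struct mock_backend {
	uint64_t chunk;
	unsigned int calls;
	uint64_t bytes_set;
};

int mock_set_attrs(void *ctx, struct mem_attrs_req *req)
{
	struct mock_backend *m = ctx;
	uint64_t done = req->size < m->chunk ? req->size : m->chunk;

	m->calls++;
	m->bytes_set += done;
	req->address += done;	/* first byte not yet covered */
	req->size -= done;	/* bytes still to be covered */
	return 0;
}
```

A caller would pass a thin wrapper around the real ioctl in place of
mock_set_attrs. The no-progress check is a design choice of this sketch, so
that a backend which keeps returning 0 without covering anything cannot spin
the loop until the retry budget runs out.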