From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9139439D6F4; Tue, 28 Apr 2026 23:25:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777418718; cv=none; b=WMnDJQkcHhErPq/etx334vUsumruEEgAWZIK6cd3IC9Ov1PFel0xu/kT/dYFnanv/saQnnpanKfBV9/JKAJUKocEC78hNbp5unCXO6TJedRjGbWa7LYecKoax5d5rdIz5eA/nNjKqMHpRQ1S2RPh4uHgQZqHbX4U+fbZBh9M77g= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777418718; c=relaxed/simple; bh=EtJm+CLHgfPNJ6px7UjasKC7l622h0ZRyjwTXKMOX3U=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=kjhOMTvIhmtALff9C5mzUgQzjTMISBC57APUDTvR5yTaHgXoyJyKOOSF6UC41PNl+yDPBaxOO7yz1004VtZFNUAyyzH9839c5SCUv6ooOAGI3RsYxfNqWGrzBeN2VAVEO8Hz7EAjC2FqVPuofiKYI5PNkj9RMkEmRbZ77PfcnVE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=V0uomQ/J; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="V0uomQ/J" Received: by smtp.kernel.org (Postfix) with ESMTPS id 4DA7EC4AF4D; Tue, 28 Apr 2026 23:25:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777418718; bh=EtJm+CLHgfPNJ6px7UjasKC7l622h0ZRyjwTXKMOX3U=; h=From:Date:Subject:References:In-Reply-To:To:Cc:Reply-To:From; b=V0uomQ/JhXw4yLeGGkngDAxO9P1PdPYD5S2rvZgBxDozaBwEU5jlsmcgi0oHuIiiB G3cHpbjENBALSoK9vtkCfN+PHivZM0Q4SZnEqVTQw9Se9P0H2uA6rkVSAlhaEftBe+ IDyjZIDvLnzqh/da7Qd361zA1wWuIR2swCq+p63hShGpbmqqt4PCV6HH0nYQrYADPH EORNEFG9OId32q0zryQGAyAPJlA+2q7guU3XaHfLLBf9uQro5vApXrJkbN6YF6w+Id 73ZiY/nO4jipWo64F06FhMBxznQkzDXM7ewv32PGWQSlOhQEQ33veOMMBsXCQuF4fr 1m7fks716ZrzA== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id 39D50FF887C; Tue, 28 Apr 2026 23:25:18 +0000 (UTC) From: Ackerley Tng via B4 Relay Date: Tue, 28 Apr 2026 16:25:09 -0700 Subject: [PATCH RFC v5 14/53] KVM: guest_memfd: Advertise KVM_SET_MEMORY_ATTRIBUTES2 ioctl Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20260428-gmem-inplace-conversion-v5-14-d8608ccfca22@google.com> References: <20260428-gmem-inplace-conversion-v5-0-d8608ccfca22@google.com> In-Reply-To: <20260428-gmem-inplace-conversion-v5-0-d8608ccfca22@google.com> To: aik@amd.com, andrew.jones@linux.dev, binbin.wu@linux.intel.com, brauner@kernel.org, chao.p.peng@linux.intel.com, david@kernel.org, ira.weiny@intel.com, jmattson@google.com, jthoughton@google.com, michael.roth@amd.com, oupton@kernel.org, pankaj.gupta@amd.com, qperret@google.com, rick.p.edgecombe@intel.com, rientjes@google.com, shivankg@amd.com, steven.price@arm.com, tabba@google.com, willy@infradead.org, wyihan@google.com, yan.y.zhao@intel.com, forkloop@google.com, pratyush@kernel.org, suzuki.poulose@arm.com, aneesh.kumar@kernel.org, Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Shuah Khan , Shuah Khan , Vishal Annapurve , Andrew Morton , Chris Li , Kairui Song , Kemeng Shi , Nhat Pham , Baoquan He , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Youngjun Park , Qi Zheng , Shakeel Butt , Kiryl Shutsemau , Jason Gunthorpe , Vlastimil Babka Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, Ackerley Tng X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=ed25519-sha256; t=1777418714; l=5936; i=ackerleytng@google.com; s=20260225; h=from:subject:message-id; bh=tu+JlRkYHR1IXTuUBVoULvQy7xjaIDUUYI8NTjhnxms=; b=mrTIaTLDrZxGt/EbqQxPyT5JpCnDYB9pIjvwlVNRpC1BZCmqzBP+bFEx8nXGN2ewa9bJYCDE6 sNYXWkfdQ3/C3Lhmox9HSJZgUHo4eLbBzOaJeyyyVidANnKH7vvxFAF X-Developer-Key: i=ackerleytng@google.com; a=ed25519; pk=sAZDYXdm6Iz8FHitpHeFlCMXwabodTm7p8/3/8xUxuU= X-Endpoint-Received: by B4 Relay for ackerleytng@google.com/20260225 with auth_id=649 X-Original-From: Ackerley Tng Reply-To: ackerleytng@google.com From: Ackerley Tng Introduce KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES to advertise the availability of the KVM_SET_MEMORY_ATTRIBUTES2 ioctl. KVM_SET_MEMORY_ATTRIBUTES2 is a guest_memfd-scoped version of the existing KVM_SET_MEMORY_ATTRIBUTES VM ioctl. It allows userspace to manage memory attributes, such as KVM_MEMORY_ATTRIBUTE_PRIVATE, directly on a guest_memfd file descriptor. This new version uses struct kvm_memory_attributes2, which adds an error_offset field to the output. This allows KVM to return the specific offset that triggered an error, which is especially useful for handling EAGAIN results caused by transient page reference counts during attribute conversions. Update the KVM API documentation to define the new ioctl and its behavior, and add the necessary UAPI definitions and capability checks. Suggested-by: Sean Christopherson Suggested-by: Michael Roth Signed-off-by: Ackerley Tng --- Documentation/virt/kvm/api.rst | 72 +++++++++++++++++++++++++++++++++++++++++- include/uapi/linux/kvm.h | 2 ++ virt/kvm/kvm_main.c | 5 +++ 3 files changed, 78 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 52bbbb553ce10..6ce10c8ddb634 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -117,7 +117,7 @@ description: x86 includes both i386 and x86_64. Type: - system, vm, or vcpu. + system, vm, vcpu or guest_memfd. Parameters: what parameters are accepted by the ioctl. @@ -6361,6 +6361,8 @@ S390: Returns -EINVAL if the VM has the KVM_VM_S390_UCONTROL flag set. Returns -EINVAL if called on a protected VM. +.. _KVM_SET_MEMORY_ATTRIBUTES: + 4.141 KVM_SET_MEMORY_ATTRIBUTES ------------------------------- @@ -6553,6 +6555,74 @@ KVM_S390_KEYOP_SSKE Sets the storage key for the guest address ``guest_addr`` to the key specified in ``key``, returning the previous value in ``key``. +4.145 KVM_SET_MEMORY_ATTRIBUTES2 +--------------------------------- + +:Capability: KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES +:Architectures: all +:Type: guest_memfd ioctl +:Parameters: struct kvm_memory_attributes2 (in/out) +:Returns: 0 on success, <0 on error + +Errors: + + ========== =============================================================== + EINVAL The specified `offset` or `size` were invalid (e.g. not + page aligned, causes an overflow, or size is zero). + EFAULT The parameter address was invalid. + EAGAIN Some page within requested range had unexpected refcounts. The + offset of the page will be returned in `error_offset`. + ENOMEM Ran out of memory trying to track private/shared state + ========== =============================================================== + +KVM_SET_MEMORY_ATTRIBUTES2 is an extension to +KVM_SET_MEMORY_ATTRIBUTES that supports returning (writing) values to +userspace. The original (pre-extension) fields are shared with +KVM_SET_MEMORY_ATTRIBUTES identically. + +Attribute values are shared with KVM_SET_MEMORY_ATTRIBUTES. + +:: + + struct kvm_memory_attributes2 { + /* in */ + union { + __u64 address; + __u64 offset; + }; + __u64 size; + __u64 attributes; + __u64 flags; + /* out */ + __u64 error_offset; + __u64 reserved[11]; + }; + + #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) + +Set attributes for a range of offsets within a guest_memfd to +KVM_MEMORY_ATTRIBUTE_PRIVATE to limit the specified guest_memfd backed +memory range for guest_use. Even if KVM_CAP_GUEST_MEMFD_MMAP is +supported, after a successful call to set +KVM_MEMORY_ATTRIBUTE_PRIVATE, the requested range will not be mappable +into host userspace and will only be mappable by the guest. + +To allow the range to be mappable into host userspace again, call +KVM_SET_MEMORY_ATTRIBUTES2 on the guest_memfd again with +KVM_MEMORY_ATTRIBUTE_PRIVATE unset. + +If this ioctl returns -EAGAIN, the offset of the page with unexpected +refcounts will be returned in `error_offset`. This can occur if there +are transient refcounts on the pages, taken by other parts of the +kernel. + +Userspace is expected to figure out how to remove all known refcounts +on the shared pages, such as refcounts taken by get_user_pages(), and +try the ioctl again. A possible source of these long term refcounts is +if the guest_memfd memory was pinned in IOMMU page tables. + +See also: :ref: `KVM_SET_MEMORY_ATTRIBUTES`. + .. _kvm_run: 5. The kvm_run structure diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 0b55258573d3d..f437fd0f1350c 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -996,6 +996,7 @@ struct kvm_enable_cap { #define KVM_CAP_S390_USER_OPEREXEC 246 #define KVM_CAP_S390_KEYOP 247 #define KVM_CAP_S390_VSIE_ESAMODE 248 +#define KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES 249 struct kvm_irq_routing_irqchip { __u32 irqchip; @@ -1648,6 +1649,7 @@ struct kvm_memory_attributes { __u64 flags; }; +/* Available with KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES */ #define KVM_SET_MEMORY_ATTRIBUTES2 _IOWR(KVMIO, 0xd2, struct kvm_memory_attributes2) struct kvm_memory_attributes2 { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 4d7bf52b7b717..cec02d68d7039 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -4972,6 +4972,11 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg) return 1; case KVM_CAP_GUEST_MEMFD_FLAGS: return kvm_gmem_get_supported_flags(kvm); + case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES: + if (vm_memory_attributes) + return 0; + + return kvm_supported_mem_attributes(kvm); #endif default: break; -- 2.54.0.545.g6539524ca2-goog