Message-ID: <1fe0f46a-152a-4b5b-99e2-2a74873dafdc@intel.com>
Date: Mon, 21 Jul 2025 23:07:25 +0800
Subject: Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type
From: Xiaoyao Li
To: Sean Christopherson, Vishal Annapurve
Cc: Fuad Tabba, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
 linux-mm@kvack.org, kvmarm@lists.linux.dev, pbonzini@redhat.com,
 chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
 paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu,
 viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org,
 akpm@linux-foundation.org, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
 ackerleytng@google.com, mail@maciej.szmigiero.name, david@redhat.com,
 michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com,
 isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
 suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com,
 quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
 catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
 oliver.upton@linux.dev, maz@kernel.org, will@kernel.org, qperret@google.com,
 keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
 jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
 hughd@google.com, jthoughton@google.com, peterx@redhat.com,
 pankaj.gupta@amd.com, ira.weiny@intel.com
References: <20250717162731.446579-1-tabba@google.com>
 <20250717162731.446579-15-tabba@google.com>
 <505a30a3-4c55-434c-86a5-f86d2e9dc78a@intel.com>

On 7/21/2025 10:42 PM, Sean Christopherson wrote:
> On Mon, Jul 21, 2025, Vishal Annapurve wrote:
>> On Mon, Jul 21, 2025 at 5:22 AM Xiaoyao Li wrote:
>>>
>>> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
>>>> +/*
>>>> + * CoCo VMs with hardware
support that use guest_memfd only for backing private
>>>> + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
>>>> + */
>>>> +#define kvm_arch_supports_gmem_mmap(kvm) \
>>>> + (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) && \
>>>> + (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
>>>
>>> I want to share the findings when I do the POC to enable gmem mmap in QEMU.
>>>
>>> Actually, QEMU can use gmem with mmap support as the normal memory even
>>> without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
>>> on KVM_SET_USER_MEMORY_REGION2.
>>>
>>> Since the gmem is mmapable, QEMU can pass the userspace addr got from
>>> mmap() on gmem fd to kvm_userspace_memory_region(2).userspace_addr. It
>>> works well for non-coco VMs on x86.
>>>
>>> Then it seems feasible to use gmem with mmap for the shared memory of
>>> TDX, and an additional gmem without mmap for the private memory. i.e.,
>>> For struct kvm_userspace_memory_region, the @userspace_addr is passed
>>> with the uaddr returned from gmem0 with mmap, while @guest_memfd is
>>> passed with another gmem1 fd without mmap.
>>>
>>> However, it fails actually, because the kvm_arch_supports_gmem_mmap()
>>> returns false for TDX VMs, which means userspace cannot allocate gmem
>>> with mmap just for shared memory for TDX.
>>
>> Why do you want such a usecase to work?
>
> I'm guessing Xiaoyao was asking an honest question in response to finding a
> perceived flaw when trying to get this all working in QEMU.

I'm not sure if it is a flaw. That such a usecase is not supported is just
counterintuitive to me.

>> If kvm allows mappable guest_memfd files for TDX VMs without
>> conversion support, userspace will be able to use those for backing
>
> s/able/unable?

I think Vishal meant "able", because ...

>> private memory unless:
>> 1) KVM checks at binding time if the guest_memfd passed during memslot
>> creation is not a mappable one and doesn't enforce "not mappable"
>> requirement for TDX VMs at creation time.
>
> Xiaoyao's question is about "just for shared memory", so this is irrelevant for
> the question at hand.

... if we allow gmem mmap for TDX, KVM needs to ensure that an mmap'able gmem
is only ever passed via userspace_addr. IOW, KVM needs to forbid userspace
from passing the mmap'able guest_memfd to
kvm_userspace_memory_region2.guest_memfd, because that would allow userspace
to access the private memory.

>> 2) KVM fetches shared faults through userspace page tables and not
>> guest_memfd directly.
>
> This is also irrelevant. KVM _already_ supports resolving shared faults through
> userspace page tables. That support won't go away as KVM will always need/want
> to support mapping VM_IO and/or VM_PFNMAP memory into the guest (even for TDX).
>
>> I don't see value in trying to go out of the way to support such a usecase.
>
> But if/when KVM gains support for tracking shared vs. private in guest_memfd
> itself, i.e. when TDX _does_ support mmap() on guest_memfd, KVM won't have to go
> out of its way to support using guest_memfd for the @userspace_addr backing store.
> Unless I'm missing something, the only thing needed to "support" this scenario is:

As above, we also need the check in 1) mentioned by Vishal, to prevent
userspace from passing an mmap'able guest_memfd to serve as private memory.

> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index d01bd7a2c2bd..34403d2f1eeb 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -533,7 +533,7 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>  	u64 flags = args->flags;
>  	u64 valid_flags = 0;
>
> -	if (kvm_arch_supports_gmem_mmap(kvm))
> +	// if (kvm_arch_supports_gmem_mmap(kvm))
>  		valid_flags |= GUEST_MEMFD_FLAG_MMAP;
>
>  	if (flags & ~valid_flags)
>
> I think the question we actually want to answer is: do we want to go out of our
> way to *prevent* such a usecase. E.g. is there any risk/danger that we need to
> mitigate, and would the cost of the mitigation be acceptable?
>
> I think the answer is "no", because preventing userspace from using guest_memfd
> as shared-only memory would require resolving the VMA during hva_to_pfn() in order
> to fully prevent such behavior, and I definitely don't want to take mmap_lock
> around hva_to_pfn_fast().
>
> I don't see any obvious danger lurking. KVM's pre-guest_memfd memory management
> scheme is all about effectively making KVM behave like "just another" userspace
> agent. E.g. if/when TDX/SNP support comes along, guest_memfd must not allow mapping
> private memory into userspace regardless of what KVM supports for page faults.
>
> So unless I'm missing something, for now we do nothing, and let this support come
> along naturally once TDX supports mmap() on guest_memfd.
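
To make the two checks discussed in this thread concrete, here is a minimal
user-space sketch in C. It is only a model under stated assumptions, not kernel
code: every model_* name, the enum, and the flag value are hypothetical
stand-ins invented for illustration. The creation-time function mirrors the
quoted diff (the MMAP flag is accepted for every VM type once the gate is
dropped), and the binding-time function mirrors Vishal's point 1) (a TDX VM
refuses a mappable gmem passed as guest_memfd).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; NOT the kernel's definitions. */
enum model_vm_type { MODEL_VM_DEFAULT, MODEL_VM_TDX };
#define MODEL_GMEM_FLAG_MMAP (1ULL << 0)

struct model_gmem {
	uint64_t flags;
};

struct model_memslot {
	/* gmem whose mmap() address is passed as userspace_addr (NULL if none) */
	const struct model_gmem *shared_backing;
	/* gmem whose fd is passed as guest_memfd (NULL if none) */
	const struct model_gmem *private_backing;
};

/*
 * Creation time, modelled on the quoted diff: with the per-VM-type gate
 * commented out, the MMAP flag is valid for every VM type.
 */
bool model_gmem_create_ok(enum model_vm_type type, uint64_t flags)
{
	uint64_t valid_flags = MODEL_GMEM_FLAG_MMAP;

	(void)type; /* no longer consulted once the gate is dropped */
	return !(flags & ~valid_flags);
}

/*
 * Binding time, modelled on point 1): a TDX VM must not bind a mappable
 * gmem as private memory. Other VM types are not restricted here.
 */
bool model_memslot_bind_ok(enum model_vm_type type,
			   const struct model_memslot *slot)
{
	if (type != MODEL_VM_TDX || !slot->private_backing)
		return true;
	return !(slot->private_backing->flags & MODEL_GMEM_FLAG_MMAP);
}

/* Walks through the gmem0/gmem1 layout described earlier in the thread. */
void model_selfcheck(void)
{
	struct model_gmem gmem0 = { .flags = MODEL_GMEM_FLAG_MMAP };
	struct model_gmem gmem1 = { .flags = 0 };
	struct model_memslot good = { &gmem0, &gmem1 };
	struct model_memslot bad = { &gmem1, &gmem0 };

	/* TDX may create a mappable gmem for shared memory ... */
	assert(model_gmem_create_ok(MODEL_VM_TDX, gmem0.flags));
	/* ... and use it via userspace_addr with a non-mappable guest_memfd ... */
	assert(model_memslot_bind_ok(MODEL_VM_TDX, &good));
	/* ... but not as the guest_memfd that backs private memory. */
	assert(!model_memslot_bind_ok(MODEL_VM_TDX, &bad));
}
```

The model deliberately says nothing about how the userspace_addr side is
validated, since the thread's conclusion is that fully policing it would
require resolving the VMA during hva_to_pfn().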