From: Nikita Kalyazin
Date: Thu, 22 Jan 2026 18:47:41 +0000
X-Mailing-List: bpf@vger.kernel.org
Subject: Re: [PATCH v9 07/13] KVM: guest_memfd: Add flag to remove from direct map
To: Ackerley Tng , "Edgecombe, Rick P" , "linux-riscv@lists.infradead.org" , "kalyazin@amazon.co.uk" , "kernel@xen0n.name" , "linux-kselftest@vger.kernel.org" , "linux-mm@kvack.org" , "linux-fsdevel@vger.kernel.org" , "linux-s390@vger.kernel.org" , "kvmarm@lists.linux.dev" , "linux-kernel@vger.kernel.org" , "linux-arm-kernel@lists.infradead.org" , "kvm@vger.kernel.org" , "bpf@vger.kernel.org" , "linux-doc@vger.kernel.org" , "loongarch@lists.linux.dev"
CC: "david@kernel.org" , "palmer@dabbelt.com" , "catalin.marinas@arm.com" , "svens@linux.ibm.com" , "jgross@suse.com" , "surenb@google.com" , "riel@surriel.com" , "pfalcato@suse.de" , "peterx@redhat.com" , "x86@kernel.org" , "rppt@kernel.org" , "thuth@redhat.com" , "maz@kernel.org" , "dave.hansen@linux.intel.com" , "ast@kernel.org" , "vbabka@suse.cz" , "Annapurve, Vishal" , "borntraeger@linux.ibm.com" , "alex@ghiti.fr" , "pjw@kernel.org" , "tglx@linutronix.de" , "willy@infradead.org" , "hca@linux.ibm.com" , "wyihan@google.com" , "ryan.roberts@arm.com" , "jolsa@kernel.org" , "yang@os.amperecomputing.com" , "jmattson@google.com" , "luto@kernel.org" , "aneesh.kumar@kernel.org" , "haoluo@google.com" , "patrick.roy@linux.dev" , "akpm@linux-foundation.org" , "coxu@redhat.com" , "mhocko@suse.com" , "mlevitsk@redhat.com" , "jgg@ziepe.ca" , "hpa@zytor.com" , "song@kernel.org" , "oupton@kernel.org" , "peterz@infradead.org" , "maobibo@loongson.cn" , "lorenzo.stoakes@oracle.com" , "Liam.Howlett@oracle.com" , "jthoughton@google.com" , "martin.lau@linux.dev" , "jhubbard@nvidia.com" , "Yu, Yu-cheng" , "Jonathan.Cameron@huawei.com" , "eddyz87@gmail.com" , "yonghong.song@linux.dev" , "chenhuacai@kernel.org" , "shuah@kernel.org" , "prsampat@amd.com" , "kevin.brodsky@arm.com" , "shijie@os.amperecomputing.com" , "suzuki.poulose@arm.com" , "itazur@amazon.co.uk" , "pbonzini@redhat.com" , "yuzenghui@huawei.com" , "dev.jain@arm.com" , "gor@linux.ibm.com" , "jackabt@amazon.co.uk" , "daniel@iogearbox.net" , "agordeev@linux.ibm.com" , "andrii@kernel.org" , "mingo@redhat.com" , "aou@eecs.berkeley.edu" , "joey.gouly@arm.com" , "derekmn@amazon.com" , "xmarcalx@amazon.co.uk" , "kpsingh@kernel.org" , "sdf@fomichev.me" , "jackmanb@google.com" , "bp@alien8.de" , "corbet@lwn.net" , "jannh@google.com" , "john.fastabend@gmail.com" , "kas@kernel.org" , "will@kernel.org" , "seanjc@google.com"
References: <20260114134510.1835-1-kalyazin@amazon.com> <20260114134510.1835-8-kalyazin@amazon.com> <294bca75-2f3e-46db-bb24-7c471a779cc1@amazon.com>

On 22/01/2026 18:37, Ackerley Tng wrote:
> Nikita Kalyazin writes:
>
>> On 16/01/2026 00:00, Edgecombe, Rick P wrote:
>>> On Wed, 2026-01-14 at 13:46 +0000, Kalyazin, Nikita wrote:
>>>> +static void kvm_gmem_folio_restore_direct_map(struct folio *folio)
>>>> +{
>>>> +	/*
>>>> +	 * Direct map restoration cannot fail, as the only error condition
>>>> +	 * for direct map manipulation is failure to allocate page tables
>>>> +	 * when splitting huge pages, but this split would have already
>>>> +	 * happened in folio_zap_direct_map() in kvm_gmem_folio_zap_direct_map().
>
> Do you know if folio_restore_direct_map() will also end up merging page
> table entries to a higher level?

Looking at the x86 call chain, at least, I can't see how it would.

>
>>>> +	 * Thus folio_restore_direct_map() here only updates prot bits.
>>>> +	 */
>>>> +	if (kvm_gmem_folio_no_direct_map(folio)) {
>>>> +		WARN_ON_ONCE(folio_restore_direct_map(folio));
>>>> +		folio->private = (void *)((u64)folio->private & ~KVM_GMEM_FOLIO_NO_DIRECT_MAP);
>>>> +	}
>>>> +}
>>>> +
>>>
>>> Does this assume the folio would not have been split after it was zapped? As in,
>>> if it was zapped at 2MB granularity (no 4KB direct map split required) but then
>>> restored at 4KB (split required)? Or it gets merged somehow before this?
>
> I agree with the rest of the discussion that this will probably land
> before huge page support, so I will have to figure out the intersection
> of the two later.
>
>>
>> AFAIK it can't be zapped at 2MB granularity as the zapping code will
>> inevitably cause splitting because guest_memfd faults occur at the base
>> page granularity as of now.
>
> Here's what I'm thinking for now:
>
> [HugeTLB, no conversions]
> With initial HugeTLB support (no conversions), host userspace
> guest_memfd faults will be:
>
> + For guest_memfd with PUD-sized pages
>     + At PUD level or PTE level
> + For guest_memfd with PMD-sized pages
>     + At PMD level or PTE level
>
> Since this guest_memfd doesn't support conversions, the folio is never
> split/merged, so the direct map is restored at whatever level it was
> zapped. I think this works out well.
>
> [HugeTLB + conversions]
> For a guest_memfd with HugeTLB support and conversions, host userspace
> guest_memfd faults will always be at PTE level, so the direct map will
> be split and the faulted pages have the direct map zapped in 4K chunks
> as they are faulted.
>
> On conversion back to private, put those back into the direct map
> (putting aside whether to merge the direct map PTEs for now).
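To make Rick's granularity question concrete, here is a toy user-space model in C. It assumes, as the patch comment states, that splitting a huge direct-map entry is the only operation that allocates and therefore the only one that can fail. Everything else (`struct toy_direct_map`, `toy_set_present()`, a single 2 MiB region, the simulated allocation failure) is hypothetical and is not the x86 set_memory code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy model, not kernel code: one 2 MiB direct-map region
 * that is either a single huge (PMD-level) entry or, once split, 512
 * base-page (PTE-level) entries. Splitting is the only step that
 * allocates, so it is the only step that can fail.
 */
#define TOY_NR_PTES 512

struct toy_direct_map {
	bool split;			/* huge entry already split into PTEs? */
	bool huge_present;		/* meaningful while !split */
	bool pte_present[TOY_NR_PTES];	/* meaningful once split */
	bool alloc_fails;		/* simulate page-table allocation failure */
	int nr_splits;			/* number of splits performed */
};

static void toy_init(struct toy_direct_map *dm, bool alloc_fails)
{
	dm->split = false;
	dm->huge_present = true;
	for (size_t i = 0; i < TOY_NR_PTES; i++)
		dm->pte_present[i] = true;
	dm->alloc_fails = alloc_fails;
	dm->nr_splits = 0;
}

/* Split the huge entry into PTEs; returns -1 if the "allocation" fails. */
static int toy_split(struct toy_direct_map *dm)
{
	if (dm->split)
		return 0;		/* already split: nothing to allocate */
	if (dm->alloc_fails)
		return -1;
	for (size_t i = 0; i < TOY_NR_PTES; i++)
		dm->pte_present[i] = dm->huge_present;
	dm->split = true;
	dm->nr_splits++;
	return 0;
}

/* Zap (present = false) or restore (present = true) pages [first, first + nr). */
static int toy_set_present(struct toy_direct_map *dm, size_t first,
			   size_t nr, bool present)
{
	assert(first + nr <= TOY_NR_PTES);

	/* The whole region at once: update the huge entry in place. */
	if (!dm->split && first == 0 && nr == TOY_NR_PTES) {
		dm->huge_present = present;
		return 0;
	}

	/* Anything smaller needs PTE granularity, i.e. a (possibly failing) split. */
	if (toy_split(dm))
		return -1;
	for (size_t i = first; i < first + nr; i++)
		dm->pte_present[i] = present;
	return 0;
}
```

In this model, zapping one base page with `toy_set_present(&dm, 3, 1, false)` performs the split, so restoring that page later returns 0 even with `alloc_fails` set. Zapping and restoring all 512 pages never splits (the HugeTLB, no conversions case), whereas zapping all 512 pages and then restoring a single page needs a split and returns -1 when allocation fails, which is the situation the WARN_ON_ONCE() in the patch would flag.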
>
> Unfortunately there's no unmapping callback for guest_memfd to use, so
> perhaps the principle should be to put the folios back into the direct
> map ASAP - at unmapping if guest_memfd is doing the unmapping, otherwise
> at freeing time?
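One way to read this proposal, as a hedged sketch: keep the restore idempotent by gating it on the NO_DIRECT_MAP flag, as kvm_gmem_folio_restore_direct_map() in the patch does, so that it can be called both from an unmap path (when guest_memfd does the unmapping) and from the free path as a fallback, and only the first call does any work. The types and functions below (`struct toy_folio`, `toy_gmem_unmap()`, `toy_gmem_free_folio()`, the flag's bit value) are hypothetical user-space stand-ins, and `uintptr_t` replaces the patch's `u64` cast for portability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-ins for struct folio, folio->private and
 * the patch's KVM_GMEM_FOLIO_NO_DIRECT_MAP flag; the bit value is assumed.
 */
#define TOY_FOLIO_NO_DIRECT_MAP ((uintptr_t)1 << 0)

struct toy_folio {
	void *private;		/* flag bits kept in a pointer-sized field */
	bool in_direct_map;
	int nr_restores;	/* how many times a restore actually ran */
};

static bool toy_folio_no_direct_map(const struct toy_folio *f)
{
	return (uintptr_t)f->private & TOY_FOLIO_NO_DIRECT_MAP;
}

static void toy_folio_zap_direct_map(struct toy_folio *f)
{
	f->in_direct_map = false;
	f->private = (void *)((uintptr_t)f->private | TOY_FOLIO_NO_DIRECT_MAP);
}

/* Gated on the flag, so calling it more than once is harmless. */
static void toy_folio_restore_direct_map(struct toy_folio *f)
{
	if (!toy_folio_no_direct_map(f))
		return;
	f->in_direct_map = true;
	f->nr_restores++;
	f->private = (void *)((uintptr_t)f->private & ~TOY_FOLIO_NO_DIRECT_MAP);
}

/* Earliest point: guest_memfd itself unmaps the folio. */
static void toy_gmem_unmap(struct toy_folio *f)
{
	toy_folio_restore_direct_map(f);
}

/* Fallback when nothing earlier restored it: the folio is being freed. */
static void toy_gmem_free_folio(struct toy_folio *f)
{
	toy_folio_restore_direct_map(f);
}
```

Zapping a folio and then calling `toy_gmem_unmap()` restores it once; a later `toy_gmem_free_folio()` sees the flag already cleared and does nothing, while a folio that guest_memfd never unmaps is restored at free time instead.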