Date: Mon, 26 Jan 2026 16:56:10 +0000
X-Mailing-List: linux-s390@vger.kernel.org
Subject: Re: [PATCH v9 07/13] KVM: guest_memfd: Add flag to remove from direct map
To: Ackerley Tng ,
"Edgecombe, Rick P" , "linux-riscv@lists.infradead.org" , "kalyazin@amazon.co.uk" , "kernel@xen0n.name" , "linux-kselftest@vger.kernel.org" , "linux-mm@kvack.org" , "linux-fsdevel@vger.kernel.org" , "linux-s390@vger.kernel.org" , "kvmarm@lists.linux.dev" , "linux-kernel@vger.kernel.org" , "linux-arm-kernel@lists.infradead.org" , "kvm@vger.kernel.org" , "bpf@vger.kernel.org" , "linux-doc@vger.kernel.org" , "loongarch@lists.linux.dev" CC: "david@kernel.org" , "palmer@dabbelt.com" , "catalin.marinas@arm.com" , "svens@linux.ibm.com" , "jgross@suse.com" , "surenb@google.com" , "riel@surriel.com" , "pfalcato@suse.de" , "peterx@redhat.com" , "x86@kernel.org" , "rppt@kernel.org" , "thuth@redhat.com" , "maz@kernel.org" , "dave.hansen@linux.intel.com" , "ast@kernel.org" , "vbabka@suse.cz" , "Annapurve, Vishal" , "borntraeger@linux.ibm.com" , "alex@ghiti.fr" , "pjw@kernel.org" , "tglx@linutronix.de" , "willy@infradead.org" , "hca@linux.ibm.com" , "wyihan@google.com" , "ryan.roberts@arm.com" , "jolsa@kernel.org" , "yang@os.amperecomputing.com" , "jmattson@google.com" , "luto@kernel.org" , "aneesh.kumar@kernel.org" , "haoluo@google.com" , "patrick.roy@linux.dev" , "akpm@linux-foundation.org" , "coxu@redhat.com" , "mhocko@suse.com" , "mlevitsk@redhat.com" , "jgg@ziepe.ca" , "hpa@zytor.com" , "song@kernel.org" , "oupton@kernel.org" , "peterz@infradead.org" , "maobibo@loongson.cn" , "lorenzo.stoakes@oracle.com" , "Liam.Howlett@oracle.com" , "jthoughton@google.com" , "martin.lau@linux.dev" , "jhubbard@nvidia.com" , "Yu, Yu-cheng" , "Jonathan.Cameron@huawei.com" , "eddyz87@gmail.com" , "yonghong.song@linux.dev" , "chenhuacai@kernel.org" , "shuah@kernel.org" , "prsampat@amd.com" , "kevin.brodsky@arm.com" , "shijie@os.amperecomputing.com" , "suzuki.poulose@arm.com" , "itazur@amazon.co.uk" , "pbonzini@redhat.com" , "yuzenghui@huawei.com" , "dev.jain@arm.com" , "gor@linux.ibm.com" , "jackabt@amazon.co.uk" , "daniel@iogearbox.net" , "agordeev@linux.ibm.com" , "andrii@kernel.org" , 
"mingo@redhat.com" , "aou@eecs.berkeley.edu" , "joey.gouly@arm.com" , "derekmn@amazon.com" , "xmarcalx@amazon.co.uk" , "kpsingh@kernel.org" , "sdf@fomichev.me" , "jackmanb@google.com" , "bp@alien8.de" , "corbet@lwn.net" , "jannh@google.com" , "john.fastabend@gmail.com" , "kas@kernel.org" , "will@kernel.org" , "seanjc@google.com"
References: <20260114134510.1835-1-kalyazin@amazon.com> <20260114134510.1835-8-kalyazin@amazon.com> <294bca75-2f3e-46db-bb24-7c471a779cc1@amazon.com>
From: Nikita Kalyazin

On 22/01/2026 18:37, Ackerley Tng wrote:
> Nikita Kalyazin writes:
>
>> On 16/01/2026 00:00, Edgecombe, Rick P wrote:
>>> On Wed, 2026-01-14 at 13:46 +0000, Kalyazin, Nikita wrote:
>>>> +static void kvm_gmem_folio_restore_direct_map(struct folio *folio)
>>>> +{
>>>> +	/*
>>>> +	 * Direct map restoration cannot fail, as the only error condition
>>>> +	 * for direct map manipulation is failure to allocate page tables
>>>> +	 * when splitting huge pages, but this split would have already
>>>> +	 * happened in folio_zap_direct_map() in kvm_gmem_folio_zap_direct_map().
>
> Do you know if folio_restore_direct_map() will also end up merging page
> table entries to a higher level?
>
>>>> +	 * Thus folio_restore_direct_map() here only updates prot bits.
>>>> +	 */
>>>> +	if (kvm_gmem_folio_no_direct_map(folio)) {
>>>> +		WARN_ON_ONCE(folio_restore_direct_map(folio));
>>>> +		folio->private = (void *)((u64)folio->private & ~KVM_GMEM_FOLIO_NO_DIRECT_MAP);
>>>> +	}
>>>> +}
>>>> +
>>>
>>> Does this assume the folio would not have been split after it was zapped? As in,
>>> if it was zapped at 2MB granularity (no 4KB direct map split required) but then
>>> restored at 4KB (split required)? Or it gets merged somehow before this?
>
> I agree with the rest of the discussion that this will probably land
> before huge page support, so I will have to figure out the intersection
> of the two later.
>
>>
>> AFAIK it can't be zapped at 2MB granularity as the zapping code will
>> inevitably cause splitting because guest_memfd faults occur at the base
>> page granularity as of now.
>
> Here's what I'm thinking for now:
>
> [HugeTLB, no conversions]
> With initial HugeTLB support (no conversions), host userspace
> guest_memfd faults will be:
>
> + For guest_memfd with PUD-sized pages
>   + At PUD level or PTE level
> + For guest_memfd with PMD-sized pages
>   + At PMD level or PTE level
>
> Since this guest_memfd doesn't support conversions, the folio is never
> split/merged, so the direct map is restored at whatever level it was
> zapped. I think this works out well.
>
> [HugeTLB + conversions]
> For a guest_memfd with HugeTLB support and conversions, host userspace
> guest_memfd faults will always be at PTE level, so the direct map will
> be split and the faulted pages have the direct map zapped in 4K chunks
> as they are faulted.
>
> On conversion back to private, put those back into the direct map
> (putting aside whether to merge the direct map PTEs for now).

Makes sense to me.
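The granularity cases above can be checked with a small userspace model. This is not kernel code: `struct dmap_region`, `dmap_set()` and `restore_allocs()` are hypothetical, the model covers a single PMD-sized range only (no PUD level), and it assumes, as the patch comment does, that splitting is the only step that allocates page tables and can therefore fail. `restore_allocs()` reports how many allocations the restore step itself needs after a given zap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one PMD-sized (2 MiB) direct-map range of 4 KiB base pages. */
#define PTES_PER_PMD 512

struct dmap_region {
	bool split;                 /* PMD mapping already split into PTEs? */
	bool pmd_present;           /* presence while still mapped by one PMD */
	bool present[PTES_PER_PMD]; /* per-PTE presence, used once split */
	int pt_allocs;              /* page-table allocations performed so far */
};

static void dmap_init(struct dmap_region *r)
{
	r->split = false;
	r->pmd_present = true;
	r->pt_allocs = 0;
	for (int i = 0; i < PTES_PER_PMD; i++)
		r->present[i] = true;
}

/* Assumption: splitting is the only step that allocates, hence the only one that can fail. */
static void dmap_split(struct dmap_region *r)
{
	if (r->split)
		return;
	r->pt_allocs++;
	for (int i = 0; i < PTES_PER_PMD; i++)
		r->present[i] = r->pmd_present;
	r->split = true;
}

/* Zap (present = false) or restore (present = true) base pages [first, first + n). */
static void dmap_set(struct dmap_region *r, int first, int n, bool present)
{
	assert(first >= 0 && n > 0 && first + n <= PTES_PER_PMD);
	if (!r->split && first == 0 && n == PTES_PER_PMD) {
		r->pmd_present = present; /* whole-PMD update: prot bits only */
		return;
	}
	dmap_split(r);
	for (int i = first; i < first + n; i++)
		r->present[i] = present;
}

/* Allocations needed by the restore step alone, after zapping [zap_first, zap_first + zap_n). */
static int restore_allocs(int zap_first, int zap_n, int restore_first, int restore_n)
{
	struct dmap_region r;
	int before;

	dmap_init(&r);
	dmap_set(&r, zap_first, zap_n, false);
	before = r.pt_allocs;
	dmap_set(&r, restore_first, restore_n, true);
	return r.pt_allocs - before;
}
```

In this model, `restore_allocs(0, PTES_PER_PMD, 0, PTES_PER_PMD)` (zapped and restored at PMD level, the no-conversions case) and `restore_allocs(3, 1, 3, 1)` (PTE-level fault, then conversion back to private) both return 0, so restore only flips presence bits. `restore_allocs(0, PTES_PER_PMD, 3, 1)`, the PMD-zap/4K-restore case Rick asked about, returns 1: the split moves to restore time, where it could fail. The model never merges PTEs back into a PMD, so it does not answer the merging question.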
>
>
> Unfortunately there's no unmapping callback for guest_memfd to use, so
> perhaps the principle should be to put the folios back into the direct
> map ASAP - at unmapping if guest_memfd is doing the unmapping, otherwise
> at freeing time?

I'm not sure I fully understand what you mean here. What would be the
purpose of hooking into unmapping? Why would it not be sufficient to put
folios back into the direct map whenever they are freed or converted to
private?
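The policy suggested here (restore on the free path and on conversion to private, with no unmapping hook) can be sketched as a userspace model. `struct model_folio`, the `model_*` hooks and the numeric value of the flag are hypothetical stand-ins, since guest_memfd's actual free and conversion paths are not shown in this thread. The sketch relies on one property of the quoted kvm_gmem_folio_restore_direct_map(): it is guarded by the KVM_GMEM_FOLIO_NO_DIRECT_MAP bit, so calling it from both paths cannot restore twice. A folio that was never converted back is still restored when freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value: the patch stores KVM_GMEM_FOLIO_NO_DIRECT_MAP as a bit in folio->private. */
#define KVM_GMEM_FOLIO_NO_DIRECT_MAP ((uintptr_t)1)

/* Hypothetical stand-in for struct folio; only ->private and direct-map state are modeled. */
struct model_folio {
	void *private;
	bool in_direct_map;
};

static bool model_no_direct_map(const struct model_folio *f)
{
	return ((uintptr_t)f->private & KVM_GMEM_FOLIO_NO_DIRECT_MAP) != 0;
}

/* Shared fault: remove from the direct map once and record that in ->private. */
static void model_zap_direct_map(struct model_folio *f)
{
	if (model_no_direct_map(f))
		return;
	f->in_direct_map = false;
	f->private = (void *)((uintptr_t)f->private | KVM_GMEM_FOLIO_NO_DIRECT_MAP);
}

/* Same shape as kvm_gmem_folio_restore_direct_map(): flag-guarded, so repeat calls are no-ops. */
static void model_restore_direct_map(struct model_folio *f)
{
	if (model_no_direct_map(f)) {
		f->in_direct_map = true;
		f->private = (void *)((uintptr_t)f->private & ~KVM_GMEM_FOLIO_NO_DIRECT_MAP);
	}
}

/* Hypothetical hook: conversion from shared back to private. */
static void model_convert_to_private(struct model_folio *f)
{
	model_restore_direct_map(f);
}

/* Hypothetical hook: folio freed; it must not reach the allocator while absent from the direct map. */
static void model_free_folio(struct model_folio *f)
{
	model_restore_direct_map(f);
	assert(f->in_direct_map);
}
```

Calling model_free_folio() after model_convert_to_private() finds the flag already clear and does nothing, which is what lets both paths share one restore helper.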