From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 16 May 2025 15:43:14 -0700
In-Reply-To: (message from Ackerley Tng on Wed, 14 May 2025 16:41:39 -0700)
Mime-Version: 1.0
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
From: Ackerley Tng
To: Ackerley Tng
Cc: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org, aik@amd.com,
 ajones@ventanamicro.com, akpm@linux-foundation.org, amoorthy@google.com,
 anthony.yznaga@oracle.com, anup@brainfault.org, aou@eecs.berkeley.edu,
 bfoster@redhat.com, binbin.wu@linux.intel.com, brauner@kernel.org,
 catalin.marinas@arm.com, chao.p.peng@intel.com, chenhuacai@kernel.org,
 dave.hansen@intel.com, david@redhat.com, dmatlack@google.com,
 dwmw@amazon.co.uk, erdemaktas@google.com,
 fan.du@intel.com, fvdl@google.com, graf@amazon.com, haibo1.xu@intel.com,
 hch@infradead.org, hughd@google.com, ira.weiny@intel.com,
 isaku.yamahata@intel.com, jack@suse.cz, james.morse@arm.com,
 jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com, jhubbard@nvidia.com,
 jroedel@suse.de, jthoughton@google.com, jun.miao@intel.com,
 kai.huang@intel.com, keirf@google.com, kent.overstreet@linux.dev,
 kirill.shutemov@intel.com, liam.merwick@oracle.com,
 maciej.wieczor-retman@intel.com, mail@maciej.szmigiero.name,
 maz@kernel.org, mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au,
 muchun.song@linux.dev, nikunj@amd.com, nsaenz@amazon.es,
 oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com,
 paul.walmsley@sifive.com, pbonzini@redhat.com, pdurrant@amazon.co.uk,
 peterx@redhat.com, pgonda@google.com, pvorel@suse.cz, qperret@google.com,
 quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
 quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com,
 quic_tsoni@quicinc.com, richard.weiyang@gmail.com,
 rick.p.edgecombe@intel.com, rientjes@google.com, roypat@amazon.co.uk,
 rppt@kernel.org, seanjc@google.com, shuah@kernel.org,
 steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com,
 tabba@google.com, thomas.lendacky@amd.com, usama.arif@bytedance.com,
 vannapurve@google.com, vbabka@suse.cz, viro@zeniv.linux.org.uk,
 vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org,
 willy@infradead.org, xiaoyao.li@intel.com, yan.y.zhao@intel.com,
 yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com
Content-Type: text/plain; charset="UTF-8"

Ackerley Tng writes:

>
> Here are some remaining issues/TODOs:
>
> 1. Memory error handling such as machine check errors have not been
>    implemented.
> 2. I've not looked into preparedness of pages, only zeroing has been
>    considered.
> 3. When allocating HugeTLB pages, if two threads allocate indices
>    mapping to the same huge page, the utilization in guest_memfd inode's
>    subpool may momentarily go over the subpool limit (the requested size
>    of the inode at guest_memfd creation time), causing one of the two
>    threads to get -ENOMEM. Suggestions to solve this are appreciated!
> 4. max_usage_in_bytes statistic (cgroups v1) for guest_memfd HugeTLB
>    pages should be correct but needs testing and could be wrong.
> 5. memcg charging (charge_memcg()) for cgroups v2 for guest_memfd
>    HugeTLB pages after splitting should be correct but needs testing and
>    could be wrong.
> 6. Page cache accounting: When a hugetlb page is split, guest_memfd will
>    incur page count in both NR_HUGETLB (counted at hugetlb allocation
>    time) and NR_FILE_PAGES stats (counted when split pages are added to
>    the filemap). Is this aligned with what people expect?
>

For people who might be testing this series with non-Coco VMs (heads up,
Patrick and Nikita!): this currently splits the folio as long as any part
of the huge folio is marked shared, which is probably unnecessary?

IIUC core-mm doesn't support mapping at 1G, but from a cursory reading it
seems like the faulting function that calls kvm_gmem_fault_shared() might
be able to map a 1G page at 4K granularity.

Looks like we might need another flag like
GUEST_MEMFD_FLAG_SUPPORT_CONVERSION, which will gate initialization of
the shareability maple tree/xarray. If shareability is NULL for the
entire hugepage range, then no splitting will occur.

For Coco VMs, this should be safe: if this flag is not set,
kvm_gmem_fault_shared() will never be able to fault, since the
shareability value will be NULL.

> Here are some optimizations that could be explored in future series:
>
> 1. Pages could be split from 1G to 2M first and only split to 4K if
>    necessary.
> 2. Zeroing could be skipped for Coco VMs if hardware already zeroes the
>    pages.
>
>
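
To make the GUEST_MEMFD_FLAG_SUPPORT_CONVERSION idea above a bit more
concrete, here is a rough userspace model of the split decision. It is
only a sketch and not kernel code: the flag's bit value, struct
gmem_model, enum entry_state and the gmem_model_*() helpers are all made
up for illustration, and a flat byte array stands in for the shareability
maple tree/xarray. It assumes that a lookup returning "no entry" models a
NULL shareability value, and that a range is split only when some index
in it is shared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag name from the proposal; the bit value is an assumption. */
#define GUEST_MEMFD_FLAG_SUPPORT_CONVERSION (1u << 0)

/* Hypothetical per-index state; ENTRY_NONE models a NULL lookup result. */
enum entry_state { ENTRY_NONE = 0, ENTRY_PRIVATE, ENTRY_SHARED };

/* Hypothetical stand-in for the guest_memfd inode's private data. */
struct gmem_model {
	unsigned int flags;
	const unsigned char *entries;	/* NULL: shareability never initialized */
	size_t nr_pages;		/* number of 4K indices tracked */
};

/* Shareability tracking is only set up when the flag is passed. */
static void gmem_model_init(struct gmem_model *g, unsigned int flags,
			    const unsigned char *backing, size_t nr_pages)
{
	g->flags = flags;
	g->nr_pages = nr_pages;
	g->entries = (flags & GUEST_MEMFD_FLAG_SUPPORT_CONVERSION) ? backing : NULL;
}

/* Look up one index; without tracking, every lookup yields ENTRY_NONE. */
static enum entry_state gmem_model_lookup(const struct gmem_model *g,
					  size_t index)
{
	assert(index < g->nr_pages);
	if (!g->entries)
		return ENTRY_NONE;
	return (enum entry_state)g->entries[index];
}

/*
 * Split only if some index in [start, start + nr) is shared. If every
 * lookup in the range returns ENTRY_NONE, no splitting occurs.
 */
static bool gmem_model_needs_split(const struct gmem_model *g,
				   size_t start, size_t nr)
{
	for (size_t i = start; i < start + nr; i++)
		if (gmem_model_lookup(g, i) == ENTRY_SHARED)
			return true;
	return false;
}
```

With the flag clear, gmem_model_needs_split() returns false for any range
regardless of what the backing array holds, which is the "no splitting"
case described above. How faults would then be served for non-Coco VMs is
deliberately left out of the model.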