From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 16 May 2025 10:51:50 -0700
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <24e8ae7483d0fada8d5042f9cd5598573ca8f1c5.camel@intel.com>
 <7d3b391f3a31396bd9abe641259392fd94b5e72f.camel@intel.com>
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
From: Sean Christopherson
To: Rick P Edgecombe
Cc: Vishal Annapurve, "pvorel@suse.cz", "kvm@vger.kernel.org",
 "catalin.marinas@arm.com", Jun Miao, "palmer@dabbelt.com",
 "pdurrant@amazon.co.uk", "vbabka@suse.cz", "peterx@redhat.com",
 "x86@kernel.org", "amoorthy@google.com", "jack@suse.cz", "maz@kernel.org",
 "tabba@google.com", "vkuznets@redhat.com", "quic_svaddagi@quicinc.com",
 "mail@maciej.szmigiero.name", "hughd@google.com", "quic_eberman@quicinc.com",
 Wei W Wang, "keirf@google.com", Maciej Wieczor-Retman, Yan Y Zhao,
 Dave Hansen, "ajones@ventanamicro.com", "rppt@kernel.org",
 "quic_mnalajal@quicinc.com", "aik@amd.com", "usama.arif@bytedance.com",
 "fvdl@google.com", "paul.walmsley@sifive.com", "quic_cvanscha@quicinc.com",
 "nsaenz@amazon.es", "willy@infradead.org", Fan Du,
 "anthony.yznaga@oracle.com", "linux-kernel@vger.kernel.org",
 "thomas.lendacky@amd.com", "mic@digikod.net", "oliver.upton@linux.dev",
 Kirill Shutemov, "akpm@linux-foundation.org", "steven.price@arm.com",
 "binbin.wu@linux.intel.com", "muchun.song@linux.dev", Zhiquan1 Li,
 "rientjes@google.com", "mpe@ellerman.id.au", Erdem Aktas,
 "david@redhat.com", "jgg@ziepe.ca", "bfoster@redhat.com",
 "jhubbard@nvidia.com", Haibo1 Xu, "anup@brainfault.org", Isaku Yamahata,
 "jthoughton@google.com", "will@kernel.org", "steven.sistare@oracle.com",
 "quic_pheragu@quicinc.com", "jarkko@kernel.org", "chenhuacai@kernel.org",
 Kai Huang, "shuah@kernel.org", "dwmw@amazon.co.uk", "pankaj.gupta@amd.com",
 Chao P Peng, "nikunj@amd.com", Alexander Graf, "viro@zeniv.linux.org.uk",
 "pbonzini@redhat.com", "yuzenghui@huawei.com", "jroedel@suse.de",
 "suzuki.poulose@arm.com", "jgowans@amazon.com", Yilun Xu,
 "liam.merwick@oracle.com", "michael.roth@amd.com", "quic_tsoni@quicinc.com",
 "richard.weiyang@gmail.com", Ira Weiny, "aou@eecs.berkeley.edu",
 Xiaoyao Li, "qperret@google.com", "kent.overstreet@linux.dev",
 "dmatlack@google.com", "james.morse@arm.com", "brauner@kernel.org",
 "ackerleytng@google.com", "linux-fsdevel@vger.kernel.org",
 "pgonda@google.com", "quic_pderrin@quicinc.com", "roypat@amazon.co.uk",
 "linux-mm@kvack.org", "hch@infradead.org"
Content-Type: text/plain; charset="us-ascii"

On Fri, May 16, 2025, Rick P Edgecombe wrote:
> On Fri, 2025-05-16 at 06:11 -0700, Vishal Annapurve wrote:
> > Google internally uses 1G hugetlb pages to achieve high bandwidth IO,
> > lower memory footprint using HVO, and lower MMU/IOMMU page table memory
> > footprint, among other improvements. These percentages carry a
> > substantial impact when working at the scale of large fleets of hosts,
> > each carrying significant memory capacity.
>
> There must have been a lot of measuring involved in that. But the numbers
> I was hoping for were how much *this* series helps upstream.

...

> I asked this question assuming there were some measurements for the 1GB
> part of this series. It sounds like the reasoning is instead that this is
> how Google does things, which is backed by way more benchmarking than
> kernel patches are used to getting. So it can just be reasonably assumed
> to be helpful.
>
> But for upstream code, I'd expect something a bit more concrete than "we
> believe" and "substantial impact". It seems like I'm in the minority here
> though. So if no one else wants to pressure test the thinking in the
> usual way, I guess I'll just have to wonder.

From my perspective, 1GiB hugepage support in guest_memfd isn't about
improving CoCo performance, it's about achieving feature parity on
guest_memfd with respect to existing backing stores, so that it's possible
to use guest_memfd to back all VM shapes in a fleet.
Let's assume there is significant value in backing non-CoCo VMs with 1GiB
pages, unless you want to re-litigate the existence of 1GiB support in
HugeTLBFS.

If we assume 1GiB support is mandatory for non-CoCo VMs, then it becomes
mandatory for CoCo VMs as well, because it's the only realistic way to run
CoCo VMs and non-CoCo VMs on a single host. Mixing 1GiB HugeTLBFS with any
other backing store for VMs simply isn't tenable due to the nature of 1GiB
allocations. E.g. grabbing sub-1GiB chunks of memory for CoCo VMs quickly
fragments memory to the point where HugeTLBFS can't allocate memory for
non-CoCo VMs.

Teaching HugeTLBFS to play nice with TDX and SNP isn't happening, which
leaves adding 1GiB support to guest_memfd as the only way forward. Any
boost to TDX (or SNP) performance is purely a bonus.
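For anyone who hasn't fought this in practice, the fragmentation dynamic
can be sketched with a toy model (plain Python, not kernel code; the host
size, chunk granularity, and allocation rate are all invented for
illustration). It models memory as 2MiB chunks and counts how many
aligned, fully-free 1GiB runs survive after scattering small allocations
over just 1% of memory:

```python
# Toy model of why mixing sub-1GiB allocations with 1GiB hugetlb pages
# on one host fragments memory. Not kernel code; numbers are invented.
import random

CHUNK_MB = 2                           # track memory in 2MiB chunks
TOTAL_GB = 64                          # hypothetical 64GiB host
CHUNKS = TOTAL_GB * 1024 // CHUNK_MB   # 32768 chunks total
PER_GB = 1024 // CHUNK_MB              # 512 chunks per 1GiB run

def free_1gib_slots(used):
    """Count aligned 1GiB-sized runs containing no used chunk."""
    return sum(
        not any(used[i:i + PER_GB])
        for i in range(0, CHUNKS, PER_GB)
    )

used = [False] * CHUNKS
print("free 1GiB slots before:", free_1gib_slots(used))   # 64

# Scatter 2MiB allocations over 1% of memory at random addresses,
# standing in for sub-1GiB chunks grabbed for CoCo VMs.
rng = random.Random(0)
for i in rng.sample(range(CHUNKS), CHUNKS // 100):
    used[i] = True

# Touching ~1% of chunks poisons nearly every aligned 1GiB run
# (each run needs all 512 of its chunks free), so hugetlb-style
# 1GiB allocations fail long before memory is actually "full".
print("free 1GiB slots after: ", free_1gib_slots(used))
```

This is also why 1GiB hugetlb pages are typically reserved at boot (via
the hugepagesz=1G hugepages=N parameters) rather than allocated at
runtime: once a long-running host has fragmented, the contiguous ranges
simply aren't there anymore.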