From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sean Christopherson
Date: Wed, 29 Nov 2023 14:40:19 -0800
Subject: [PATCH v13 17/35] KVM: Add transparent hugepage support for dedicated guest memory
In-Reply-To: <81628606-ca9b-866f-5e71-91001e856871@suse.cz>
References: <92ba7ddd-2bc8-4a8d-bd67-d6614b21914f@intel.com> <4ca2253d-276f-43c5-8e9f-0ded5d5b2779@redhat.com> <81628606-ca9b-866f-5e71-91001e856871@suse.cz>
Message-ID:
List-Id:
To: kvm-riscv@lists.infradead.org
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Mon, Nov 27, 2023, Vlastimil Babka wrote:
> On 11/2/23 16:46, Paolo Bonzini wrote:
> > On Thu, Nov 2, 2023 at 4:38 PM Sean Christopherson wrote:
> >> Actually, looking at this again, there's not actually a hard dependency on THP.
> >> A THP-enabled kernel _probably_ gives a higher probability of using hugepages,
> >> but mostly because THP selects COMPACTION, and I suppose because using THP for
> >> other allocations reduces overall fragmentation.
> >
> > Yes, that's why I didn't even bother enabling it unless THP is
> > enabled, but it makes even more sense to just try.
> >
> >> So rather than honor KVM_GUEST_MEMFD_ALLOW_HUGEPAGE iff THP is enabled, I think
> >> we should do the below (I verified KVM can create hugepages with THP=n).  We'll
> >> need another capability, but (a) we probably should have that anyways and (b) it
> >> provides a cleaner path to adding PUD-sized hugepage support in the future.
> >
> > I wonder if we need KVM_CAP_GUEST_MEMFD_HUGEPAGE_PMD_SIZE though. This
> > should be a generic kernel API and in fact the sizes are available in
> > a not-so-friendly format in /sys/kernel/mm/hugepages.
> >
> > We should just add /sys/kernel/mm/hugepages/sizes that contains
> > "2097152 1073741824" on x86 (only the former if 1G pages are not
> > supported).
> >
> > Plus: is this the best API if we need something else for 1G pages?
> >
> > Let's drop *this* patch and proceed incrementally. (Again, this is
> > what I want to do with this final review: identify places that are
> > still sticky, and don't let them block the rest).
> >
> > Coincidentally we have an open spot next week at plumbers. Let's
> > extend Fuad's section to cover more guestmem work.
>
> Hi,
>
> was there any outcome wrt this one?

No, we punted on hugepage support for the initial guest_memfd merge.  We definitely
plan on adding hugepage support sooner rather than later, but we haven't yet agreed
on exactly what that will look like.

> Based on my experience with THP it would be best if userspace didn't have
> to opt in, nor care about the supported size. If the given size is unaligned,
> provide a mix of large pages up to an aligned size, and for the rest fall back
> to base pages, which should be better than -EINVAL on creation (is it
> possible with the current implementation? I'd hope so?).

guest_memfd serves a different use case than THP.  For modern VMs, and especially
for slice-of-hardware VMs that are one of the main targets for guest_memfd, if not
_the_ main target, guest memory should _always_ be backed by hugepages in the
physical domain.  The actual guest mappings might not be huge, e.g. x86 needs to
do partial mappings to skip over (legacy) memory holes, but KVM already gracefully
handles that.

In other words, for most guest_memfd use cases, if userspace wants hugepages but
KVM can't provide hugepages, then it is much more desirable to return an error
than to silently fall back to small pages.

I 100% agree that having to opt in is suboptimal, but IMO providing "error on an
incompatible configuration" semantics without requiring userspace to opt in is an
even worse experience for userspace.

> A way to opt out of huge pages could be useful, although there's always the
> risk of some initial troubles resulting in various online sources cargo-cult
> recommending to opt out forever.
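
To make the two policies discussed above concrete, here is a rough userspace-side
sketch of the size arithmetic only.  It is not KVM code and not a proposed uAPI:
the function names plan_fallback() and plan_strict() are made up for illustration,
the 4096-byte base page is an assumption for x86, and the 2097152-byte hugepage
size is just the PMD size quoted above.  It also ignores the harder case where the
size is aligned but the kernel can't find a free hugepage at allocation time.

```python
import errno

BASE_PAGE = 4096                 # assumed base page size (x86); illustration only
PMD_HUGEPAGE = 2 * 1024 * 1024   # 2097152, the x86 PMD size quoted in the thread


def plan_fallback(size, huge=PMD_HUGEPAGE, base=BASE_PAGE):
    """Mixed backing: hugepages up to the aligned size, base pages for the tail.

    Returns (number_of_hugepages, number_of_base_pages).
    """
    if size <= 0 or size % base:
        raise OSError(errno.EINVAL, "size must be a positive multiple of the base page size")
    n_huge = size // huge
    n_base = (size - n_huge * huge) // base
    return n_huge, n_base


def plan_strict(size, huge=PMD_HUGEPAGE, base=BASE_PAGE):
    """Error-on-incompatible-configuration: refuse instead of silently falling back.

    Returns the number of hugepages, or raises OSError(EINVAL) if any part of
    the range would have to be backed by base pages.
    """
    n_huge, n_base = plan_fallback(size, huge, base)
    if n_base:
        raise OSError(errno.EINVAL, "size is not hugepage-aligned")
    return n_huge
```

With these assumptions, a 5 MiB request gets two hugepages plus 256 base pages
under the fallback policy, and -EINVAL under the strict one.  The strict variant
is only reasonable if userspace explicitly asked for hugepages, which is the
opt-in trade-off described above.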
PiA+PiBidXQgbW9zdGx5IGJlY2F1c2UgVEhQIHNlbGVjdHMgQ09NUEFDVElPTiwgYW5kIEkgc3Vw cG9zZSBiZWNhdXNlIHVzaW5nIFRIUCBmb3IKPiA+PiBvdGhlciBhbGxvY2F0aW9ucyByZWR1Y2Vz IG92ZXJhbGwgZnJhZ21lbnRhdGlvbi4KPiA+IAo+ID4gWWVzLCB0aGF0J3Mgd2h5IEkgZGlkbid0 IGV2ZW4gYm90aGVyIGVuYWJsaW5nIGl0IHVubGVzcyBUSFAgaXMKPiA+IGVuYWJsZWQsIGJ1dCBp dCBtYWtlcyBldmVuIG1vcmUgc2Vuc2UgdG8ganVzdCB0cnkuCj4gPiAKPiA+PiBTbyByYXRoZXIg dGhhbiBob25vciBLVk1fR1VFU1RfTUVNRkRfQUxMT1dfSFVHRVBBR0UgaWZmIFRIUCBpcyBlbmFi bGVkLCBJIHRoaW5rCj4gPj4gd2Ugc2hvdWxkIGRvIHRoZSBiZWxvdyAoSSB2ZXJpZmllZCBLVk0g Y2FuIGNyZWF0ZSBodWdlcGFnZXMgd2l0aCBUSFA9bikuICBXZSdsbAo+ID4+IG5lZWQgYW5vdGhl ciBjYXBhYmlsaXR5LCBidXQgKGEpIHdlIHByb2JhYmx5IHNob3VsZCBoYXZlIHRoYXQgYW55d2F5 cyBhbmQgKGIpIGl0Cj4gPj4gcHJvdmlkZXMgYSBjbGVhbmVyIHBhdGggdG8gYWRkaW5nIFBVRC1z aXplZCBodWdlcGFnZSBzdXBwb3J0IGluIHRoZSBmdXR1cmUuCj4gPiAKPiA+IEkgd29uZGVyIGlm IHdlIG5lZWQgS1ZNX0NBUF9HVUVTVF9NRU1GRF9IVUdFUEFHRV9QTURfU0laRSB0aG91Z2guIFRo aXMKPiA+IHNob3VsZCBiZSBhIGdlbmVyaWMga2VybmVsIEFQSSBhbmQgaW4gZmFjdCB0aGUgc2l6 ZXMgYXJlIGF2YWlsYWJsZSBpbgo+ID4gYSBub3Qtc28tZnJpZW5kbHkgZm9ybWF0IGluIC9zeXMv a2VybmVsL21tL2h1Z2VwYWdlcy4KPiA+IAo+ID4gV2Ugc2hvdWxkIGp1c3QgYWRkIC9zeXMva2Vy bmVsL21tL2h1Z2VwYWdlcy9zaXplcyB0aGF0IGNvbnRhaW5zCj4gPiAiMjA5NzE1MiAxMDczNzQx ODI0IiBvbiB4ODYgKG9ubHkgdGhlIGZvcm1lciBpZiAxRyBwYWdlcyBhcmUgbm90Cj4gPiBzdXBw b3J0ZWQpLgo+ID4gCj4gPiBQbHVzOiBpcyB0aGlzIHRoZSBiZXN0IEFQSSBpZiB3ZSBuZWVkIHNv bWV0aGluZyBlbHNlIGZvciAxRyBwYWdlcz8KPiA+IAo+ID4gTGV0J3MgZHJvcCAqdGhpcyogcGF0 Y2ggYW5kIHByb2NlZWQgaW5jcmVtZW50YWxseS4gKEFnYWluLCB0aGlzIGlzCj4gPiB3aGF0IEkg d2FudCB0byBkbyB3aXRoIHRoaXMgZmluYWwgcmV2aWV3OiBpZGVudGlmeSBwbGFjZXMgdGhhdCBh cmUKPiA+IHN0aWwgc3RpY2t5LCBhbmQgZG9uJ3QgbGV0IHRoZW0gYmxvY2sgdGhlIHJlc3QpLgo+ ID4gCj4gPiBDb2luY2lkZW50aWFsbHkgd2UgaGF2ZSBhbiBvcGVuIHNwb3QgbmV4dCB3ZWVrIGF0 IHBsdW1iZXJzLiBMZXQncwo+ID4gZXh0ZW5kIEZ1YWQncyBzZWN0aW9uIHRvIGNvdmVyIG1vcmUg Z3Vlc3RtZW0gd29yay4KPiAKPiBIaSwKPiAKPiB3YXMgdGhlcmUgYW55IG91dGNvbWUgd3J0IHRo 
aXMgb25lPwoKTm8sIHdlIHB1bnRlZCBvbiBodWdlcGFnZSBzdXBwb3J0IGZvciB0aGUgaW5pdGlh bCBndWVzdF9tZW1mZCBtZXJnZS4gIFdlIGRlZmluaXRlbHkKcGxhbiBvbiBhZGRpbmcgaHVnZWFw Z2Ugc3VwcG9ydCBzb29uZXIgdGhhbiBsYXRlciwgYnV0IHdlIGhhdmVuJ3QgeWV0IGFncmVlZCBv bgpleGFjdGx5IHdoYXQgdGhhdCB3aWxsIGxvb2sgbGlrZS4KCj4gQmFzZWQgb24gbXkgZXhwZXJp ZW5jZSB3aXRoIFRIUCdzIGl0IHdvdWxkIGJlIGJlc3QgaWYgdXNlcnNwYWNlIGRpZG4ndCBoYXZl Cj4gdG8gb3B0LWluLCBub3IgY2FyZSBhYm91dCB0aGUgc3VwcG9ydGVkIHNpemUuIElmIHRoZSBn aXZlbiBzaXplIGlzIHVuYWxpZ25lZCwKPiBwcm92aWRlIGEgbWl4IG9mIGxhcmdlIHBhZ2VzIHVw IHRvIGFuIGFsaWduZWQgc2l6ZSwgYW5kIGZvciB0aGUgcmVzdCBmYWxsYmFjawo+IHRvIGJhc2Ug cGFnZXMsIHdoaWNoIHNob3VsZCBiZSBiZXR0ZXIgdGhhbiAtRUlOVkFMIG9uIGNyZWF0aW9uIChp cyBpdAo+IHBvc3NpYmxlIHdpdGggdGhlIGN1cnJlbnQgaW1wbGVtZW50YXRpb24/IEknZCBob3Bl IHNvIHNvPykuCgpndWVzdF9tZW1mZCBzZXJ2ZXMgYSBkaWZmZXJlbnQgdXNlIGNhc2UgdGhhbiBU SFAuICBGb3IgbW9kZXJuIFZNcywgYW5kIGVzcGVjaWFsbHkKZm9yIHNsaWNlLW9mLWhhcmR3YXJl IFZNcyB0aGF0IGFyZSBvbmUgb2YgdGhlIG1haW4gdGFyZ2V0cyBmb3IgZ3Vlc3RfbWVtZmQsIGlm IG5vdApfdGhlXyBtYWluIHRhcmdldCwgZ3Vlc3QgbWVtb3J5IHNob3VsZCBfYWx3YXlzXyBiZSBi YWNrZWQgYnkgaHVnZXBhZ2VzIGluIHRoZQpwaHlzaWNhbCBkb21haW4uICBUaGUgYWN0dWFsIGd1 ZXN0IG1hcHBpbmdzIG1pZ2h0IG5vdCBiZSBodWdlLCBlLmcuIHg4NiBuZWVkcyB0bwpkbyBwYXJ0 aWFsIG1hcHBpbmdzIHRvIHNraXAgb3ZlciAobGVnYWN5KSBtZW1vcnkgaG9sZXMsIGJ1dCBLVk0g YWxyZWFkeSBncmFjZWZ1bGx5CmhhbmRsZXMgdGhhdC4KCkluIG90aGVyIHdvcmRzLCBmb3IgbW9z dCBndWVzdF9tZW1mZCB1c2UgY2FzZXMsIGlmIHVzZXJzcGFjZSB3YW50cyBodWdlcGFnZXMgYnV0 CktWTSBjYW4ndCBwcm92aWRlIGh1Z2VwYWdlcywgdGhlbiBpdCBpcyBtdWNoIG1vcmUgZGVzaXJh YmxlIHRvIHJldHVybiBhbiBlcnJvcgp0aGFuIHRvIHNpbGVudGx5IGZhbGwgYmFjayB0byBzbWFs bCBwYWdlcy4KCkkgMTAwJSBhZ3JlZSB0aGF0IGhhdmluZyB0byBvcHQtaW4gaXMgc3Vib3B0aW1h bCwgYnV0IElNTyBwcm92aWRpbmcgImVycm9yIG9uIGFuCmluY29tcGF0aWJsZSBjb25maWd1cmF0 aW9uIiBzZW1hbnRpY3Mgd2l0aG91dCByZXF1aXJpbmcgdXNlcnNwYWNlIHRvIG9wdC1pbiBpcyBh bgpldmVuIHdvcnNlIGV4cGVyaWVuY2UgZm9yIHVzZXJzcGFjZS4KCj4gQSB3YXkgdG8gb3B0LW91 
dCBmcm9tIGh1Z2UgcGFnZXMgY291bGQgYmUgdXNlZnVsIGFsdGhvdWdoIHRoZXJlJ3MgYWx3YXlz IHRoZQo+IHJpc2sgb2Ygc29tZSBpbml0aWFsIHRyb3VibGVzIHJlc3VsdGluZyBpbiB2YXJpb3Vz IG9ubGluZSBzb3VyY2VzIGNhcmdvLWN1bHQKPiByZWNvbW1lbmRpbmcgdG8gb3B0LW91dCBmb3Jl dmVyLgoKX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KbGlu dXgtYXJtLWtlcm5lbCBtYWlsaW5nIGxpc3QKbGludXgtYXJtLWtlcm5lbEBsaXN0cy5pbmZyYWRl YWQub3JnCmh0dHA6Ly9saXN0cy5pbmZyYWRlYWQub3JnL21haWxtYW4vbGlzdGluZm8vbGludXgt YXJtLWtlcm5lbAo=