Date: Wed, 1 Nov 2023 09:36:00 -0700
From: Sean Christopherson
To: Paolo Bonzini
Cc: Xiaoyao Li, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman,
    Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexander Viro,
    Christian Brauner, "Matthew Wilcox (Oracle)", Andrew Morton,
    kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
    linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Xu Yilun, Chao Peng,
    Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, David Matlack, Yu Zhang,
    Isaku Yamahata, Mickaël Salaün, Vlastimil Babka, Vishal Annapurve,
    Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret,
    Michael Roth, Wang, Liam Merwick, Isaku Yamahata, "Kirill A. Shutemov"
Subject: Re: [PATCH v13 17/35] KVM: Add transparent hugepage support for dedicated guest memory
References: <20231027182217.3615211-1-seanjc@google.com>
    <20231027182217.3615211-18-seanjc@google.com>
    <7c0844d8-6f97-4904-a140-abeabeb552c1@intel.com>
    <92ba7ddd-2bc8-4a8d-bd67-d6614b21914f@intel.com>

On Wed, Nov 01, 2023, Paolo Bonzini wrote:
> On Wed, Nov 1, 2023 at 2:41 PM Sean Christopherson wrote:
> >
> > On Wed, Nov 01, 2023, Xiaoyao Li wrote:
> > > On 10/31/2023 10:16 PM, Sean Christopherson wrote:
> > > > On Tue, Oct 31, 2023, Xiaoyao Li wrote:
> > > > > On 10/28/2023 2:21 AM, Sean Christopherson wrote:
> > > But it's different than MADV_HUGEPAGE, in a way. Per my understanding, the
> > > failure of MADV_HUGEPAGE is not fatal; user space can ignore it and
> > > continue.
> > >
> > > However, the failure of KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is fatal, which leads
> > > to failure of guest memfd creation.
> >
> > Failing KVM_CREATE_GUEST_MEMFD isn't truly fatal; it just requires different
> > action from userspace, i.e. instead of ignoring the error, userspace could redo
> > KVM_CREATE_GUEST_MEMFD with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE=0.
> >
> > We could make the behavior more like MADV_HUGEPAGE, e.g. theoretically we could
> > extend fadvise() with FADV_HUGEPAGE, or add a guest_memfd knob/ioctl() to let
> > userspace provide advice/hints after creating a guest_memfd. But I suspect that
> > guest_memfd would be the only user of FADV_HUGEPAGE, and IMO a post-creation hint
> > is actually less desirable.
> >
> > KVM_GUEST_MEMFD_ALLOW_HUGEPAGE will fail only if userspace didn't provide a
> > compatible size or the kernel doesn't support THP. An incompatible size is likely
> > a userspace bug, and for most setups that want to utilize guest_memfd, lack of THP
> > support is likely a configuration bug. I.e. many/most uses *want* failures due to
> > KVM_GUEST_MEMFD_ALLOW_HUGEPAGE to be fatal.
> >
> > > For the current implementation, I think maybe KVM_GUEST_MEMFD_DESIRE_HUGEPAGE
> > > fits better than KVM_GUEST_MEMFD_ALLOW_HUGEPAGE? or maybe *PREFER*?
> >
> > Why? Verbs like "prefer" and "desire" aren't a good fit IMO because they suggest
> > the flag is a hint, and hints are usually best effort only, i.e. are ignored if
> > there is a fundamental incompatibility.
> >
> > "Allow" isn't perfect, e.g. I would much prefer a straight KVM_GUEST_MEMFD_USE_HUGEPAGES
> > or KVM_GUEST_MEMFD_HUGEPAGES flag, but I wanted the name to convey that KVM doesn't
> > (yet) guarantee hugepages. I.e. KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is stronger than
> > a hint, but weaker than a requirement. And if/when KVM supports a dedicated memory
> > pool of some kind, then we can add KVM_GUEST_MEMFD_REQUIRE_HUGEPAGE.
>
> I think that the current patch is fine, but I will adjust it to always
> allow the flag, and to make the size check even if !CONFIG_TRANSPARENT_HUGEPAGE.
> If hugepages are not guaranteed, and (theoretically) you could have no
> hugepage at all in the result, it's okay to get this result even if THP is not
> available in the kernel.

Can you post a fixup patch? It's not clear to me exactly what behavior you intend
to end up with.
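A toy model may help make the two flag semantics under discussion concrete. The Python sketch below is not KVM code and does not issue the real KVM_CREATE_GUEST_MEMFD ioctl; `create_guest_memfd`, `create_with_fallback`, and `Kernel` are hypothetical names, and the flag's bit value, the EINVAL error, the 4 KiB page size, and reading "compatible size" as a multiple of a 2 MiB PMD hugepage are all assumptions not stated in the thread. It contrasts the v13 behavior as described (the flag fails if the size is incompatible or THP is unavailable) with the proposed adjustment (always accept the flag, always check the size), plus the optional userspace retry with the flag cleared.

```python
import errno
from dataclasses import dataclass

# All constants below are illustrative assumptions, not values from the thread.
ALLOW_HUGEPAGE = 1 << 0           # hypothetical bit for KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
PAGE_SIZE = 4096                  # assumed base page size
HPAGE_PMD_SIZE = 2 * 1024 * 1024  # assumed "compatible size" unit (2 MiB PMD hugepage)


@dataclass(frozen=True)
class Kernel:
    thp_enabled: bool        # stands in for CONFIG_TRANSPARENT_HUGEPAGE
    always_allow_flag: bool  # False: v13 behavior as described; True: proposed fixup


def create_guest_memfd(kernel: Kernel, size: int, flags: int) -> dict:
    """Toy model of KVM_CREATE_GUEST_MEMFD flag validation (not kernel code)."""
    if size <= 0 or size % PAGE_SIZE:
        raise OSError(errno.EINVAL, "size must be a positive multiple of PAGE_SIZE")
    if flags & ~ALLOW_HUGEPAGE:
        raise OSError(errno.EINVAL, "unknown flags")

    may_use_hugepages = False
    if flags & ALLOW_HUGEPAGE:
        # v13 as described: the flag is rejected outright when THP is unavailable,
        # so the size check below is never reached in that case.
        if not kernel.thp_enabled and not kernel.always_allow_flag:
            raise OSError(errno.EINVAL, "THP not supported")
        # Both models reject a size that cannot be backed by whole hugepages;
        # the proposed behavior performs this check even without THP.
        if size % HPAGE_PMD_SIZE:
            raise OSError(errno.EINVAL, "size not a multiple of the hugepage size")
        # Hugepages are allowed, never guaranteed: without THP the result
        # simply contains no hugepages.
        may_use_hugepages = kernel.thp_enabled
    return {"size": size, "may_use_hugepages": may_use_hugepages}


def create_with_fallback(kernel: Kernel, size: int) -> dict:
    """One possible userspace policy: retry without the flag on EINVAL."""
    try:
        return create_guest_memfd(kernel, size, ALLOW_HUGEPAGE)
    except OSError as e:
        if e.errno != errno.EINVAL:
            raise
        return create_guest_memfd(kernel, size, 0)
```

With `always_allow_flag=True`, the flag no longer fails merely because THP is missing, so in this model the retry path in `create_with_fallback` is reached only on an incompatible size. As the thread points out, many setups would rather treat that failure as fatal than silently retry without hugepages.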