From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A2C1C4167D for ; Wed, 1 Nov 2023 16:36:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231793AbjKAQgK (ORCPT ); Wed, 1 Nov 2023 12:36:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47830 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231955AbjKAQgG (ORCPT ); Wed, 1 Nov 2023 12:36:06 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8906C111 for ; Wed, 1 Nov 2023 09:36:03 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-da1aa98ec19so6159635276.2 for ; Wed, 01 Nov 2023 09:36:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1698856562; x=1699461362; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=azJu2Dh8gujl5qnxyYwo1n6krfiG0RwudOI1y5vnuB4=; b=yuFCX/+hNdCQoExuPOLtehV1sJV5YXuIC4dK84hnQsWhPdH2Z3iLN7bxdeMtxGj+WB cyzEaD66/7stT4SYcguXXwRhJjsUPKJnPLc7tlAGTT6QQ7U8Hc2NurenL8NF3eoKj+4P RfWTVWvFfkqVD2gRfDU09y5lBFNvzjzueEpuaO2xsVY5I4+kSjoC83fBVUIBnh3QJaL8 RsG8orHK+MB9w4bObQ/9WLURAj4uIVelYqCRJKbCGCOjf013V/nSWEedtPb+5hmOsFjL r6GSwII+CK6XrhP83YYhH3pshpbwSDDI4qB9CrX5BKhf+VGh0py3HXKrLZViwVuSh4mC zYng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698856562; x=1699461362; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=azJu2Dh8gujl5qnxyYwo1n6krfiG0RwudOI1y5vnuB4=; b=Wxp/atMJ5Z2zExZEN0FjqPRoBXHvPpylgSgLdtFQ5FsaO4guXcDqFOxm0zJlUy/4eH Dl54qqEjhQPrSKvHl+HItxUzykaEb7Fzxz+vTxVoEVaxdd3nk5siWsAj6U5lZuvMegYD Ahj96o4t/dMoAz1hL9v9qiY2kvdwgFWQcKDp912EeijVUdt1BcVSMdgPwiL4yaG/dE9y 0MpMPr2ojNdj73y+oIbRdXiW+NXbO0lE7LtsWXjIfkH5+LWDRkEu1EjiR0EFRmkgyK8m hFFlNbMUNbguZY64DzAyGUAd+vjELpGu3Mrn9FgmvHkgl5GM8QUTzOpuJSDUU8LEx3B8 V0FA== X-Gm-Message-State: AOJu0Yy+EHGLzoFOLp5eGCTP37YafnWSTAiGAo6S72Ll6kstixklSc6/ 5c3NrPypr8UVFls6sy/fS/NpEqz6zCE= X-Google-Smtp-Source: AGHT+IFqCABorYKmfCmJZy/oBSeOXTtnBU+krovHVjWnvhFIdm8AwbTMBJoMdfZ9FR5pLbycY5bTrtHt4Z4= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a25:770f:0:b0:da0:73c2:db78 with SMTP id s15-20020a25770f000000b00da073c2db78mr326876ybc.9.1698856562473; Wed, 01 Nov 2023 09:36:02 -0700 (PDT) Date: Wed, 1 Nov 2023 09:36:00 -0700 In-Reply-To: Mime-Version: 1.0 References: <20231027182217.3615211-1-seanjc@google.com> <20231027182217.3615211-18-seanjc@google.com> <7c0844d8-6f97-4904-a140-abeabeb552c1@intel.com> <92ba7ddd-2bc8-4a8d-bd67-d6614b21914f@intel.com> Message-ID: Subject: Re: [PATCH v13 17/35] KVM: Add transparent hugepage support for dedicated guest memory From: Sean Christopherson To: Paolo Bonzini Cc: Xiaoyao Li , Marc Zyngier , Oliver Upton , Huacai Chen , Michael Ellerman , Anup Patel , Paul Walmsley , Palmer Dabbelt , Albert Ou , Alexander Viro , Christian Brauner , "Matthew Wilcox (Oracle)" , Andrew Morton , kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Xu Yilun , Chao Peng , Fuad Tabba , Jarkko Sakkinen , Anish Moorthy , David Matlack , Yu Zhang , Isaku Yamahata , "=?utf-8?Q?Micka=C3=ABl_Sala=C3=BCn?=" , Vlastimil Babka , Vishal Annapurve , Ackerley Tng , Maciej Szmigiero , David Hildenbrand , Quentin Perret , Michael Roth , Wang , Liam Merwick , Isaku Yamahata , "Kirill A . Shutemov" Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 01, 2023, Paolo Bonzini wrote: > On Wed, Nov 1, 2023 at 2:41=E2=80=AFPM Sean Christopherson wrote: > > > > On Wed, Nov 01, 2023, Xiaoyao Li wrote: > > > On 10/31/2023 10:16 PM, Sean Christopherson wrote: > > > > On Tue, Oct 31, 2023, Xiaoyao Li wrote: > > > > > On 10/28/2023 2:21 AM, Sean Christopherson wrote: > > > But it's different than MADV_HUGEPAGE, in a way. Per my understanding= , the > > > failure of MADV_HUGEPAGE is not fatal, user space can ignore it and > > > continue. > > > > > > However, the failure of KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is fatal, whic= h leads > > > to failure of guest memfd creation. > > > > Failing KVM_CREATE_GUEST_MEMFD isn't truly fatal, it just requires diff= erent > > action from userspace, i.e. instead of ignoring the error, userspace co= uld redo > > KVM_CREATE_GUEST_MEMFD with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE=3D0. > > > > We could make the behavior more like MADV_HUGEPAGE, e.g. theoretically = we could > > extend fadvise() with FADV_HUGEPAGE, or add a guest_memfd knob/ioctl() = to let > > userspace provide advice/hints after creating a guest_memfd. But I sus= pect that > > guest_memfd would be the only user of FADV_HUGEPAGE, and IMO a post-cre= ation hint > > is actually less desirable. > > > > KVM_GUEST_MEMFD_ALLOW_HUGEPAGE will fail only if userspace didn't provi= de a > > compatible size or the kernel doesn't support THP. An incompatible siz= e is likely > > a userspace bug, and for most setups that want to utilize guest_memfd, = lack of THP > > support is likely a configuration bug. I.e. many/most uses *want* fail= ures due to > > KVM_GUEST_MEMFD_ALLOW_HUGEPAGE to be fatal. > > > > > For current implementation, I think maybe KVM_GUEST_MEMFD_DESIRE_HUGE= PAGE > > > fits better than KVM_GUEST_MEMFD_ALLOW_HUGEPAGE? or maybe *PREFER*? > > > > Why? Verbs like "prefer" and "desire" aren't a good fit IMO because th= ey suggest > > the flag is a hint, and hints are usually best effort only, i.e. are ig= nored if > > there is a fundamental incompatibility. > > > > "Allow" isn't perfect, e.g. I would much prefer a straight KVM_GUEST_ME= MFD_USE_HUGEPAGES > > or KVM_GUEST_MEMFD_HUGEPAGES flag, but I wanted the name to convey that= KVM doesn't > > (yet) guarantee hugepages. I.e. KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is stro= nger than > > a hint, but weaker than a requirement. And if/when KVM supports a dedi= cated memory > > pool of some kind, then we can add KVM_GUEST_MEMFD_REQUIRE_HUGEPAGE. >=20 > I think that the current patch is fine, but I will adjust it to always > allow the flag, and to make the size check even if !CONFIG_TRANSPARENT_HU= GEPAGE. > If hugepages are not guaranteed, and (theoretically) you could have no > hugepage at all in the result, it's okay to get this result even if THP i= s not > available in the kernel. Can you post a fixup patch? It's not clear to me exactly what behavior you= intend to end up with.