From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73A8ACCA479 for ; Mon, 18 Jul 2022 20:49:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233375AbiGRUtM (ORCPT ); Mon, 18 Jul 2022 16:49:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50078 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233431AbiGRUtE (ORCPT ); Mon, 18 Jul 2022 16:49:04 -0400 Received: from mail-pj1-x1035.google.com (mail-pj1-x1035.google.com [IPv6:2607:f8b0:4864:20::1035]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F4EE63E2 for ; Mon, 18 Jul 2022 13:49:04 -0700 (PDT) Received: by mail-pj1-x1035.google.com with SMTP id z12-20020a17090a7b8c00b001ef84000b8bso19377358pjc.1 for ; Mon, 18 Jul 2022 13:49:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=Zbzt4chIOAHidm79RNeoJvZKga6Ri3IH/kwQEEuMyhQ=; b=ox+L7vvavOYp4ysTAQIE6ZNLqMWWkVbtwMP1VNfH3juV5EfYZ3YpuVCyI4axobd5eI 2RaL+uBtsUPlSjJnh1cpw8gz2z0Qm+NiegNkzBZwuXLUEv+DIhtdmugtgYlEQ3WpnQ71 TePdoppcwfTsJPsnkp88/eMaisHj3PYupQ2j7vZN9kA6vsfpH3BeD6pqNfmuljnYfo/Q Jxh6eh04QPrnDP35DacSAXi0oYSvX9J8TOtU+NtGoLa2Mpk6Ygzc4XUwrZucU4Q0j0SQ +WxpiJmMeitNO4mt7MBoPNwEkgAHx9knjOFgMuFce1mZluXqVj5MqbZ1kY/cYij15r8S x3ag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Zbzt4chIOAHidm79RNeoJvZKga6Ri3IH/kwQEEuMyhQ=; b=BBbXlFE2MZ9EOPuod2PBq8WM7NsvaES51nj+dsXAbd+TURy/Rc5jf3RHGHosclX/Sn qa9YJOHgQHaIK0ns+0CwhlwNf7rfRjKOSqG6puilIhNLh7+apo3048pH29tOPwo/s83C PtC8hOsShwhqVoJMWF29WpOTTYK+KKtYEZ/q0cQOTI76rnOlUTJp2RJdhEpGKTJeim1i TlWXqALzUrUeL/PE9ftKhiReV2Tx1Jp4MjU34W5nlIXkH+hC3GTS00mtGPkNNqgP4XD/ WobNcLhJHXxbwYYQbADrB1Chk5QM1nGcm/pC/ZndYwk/vS/6PAKHgjIV72r7lveKOEd5 4bBA== X-Gm-Message-State: AJIora9KlVPJTMjZgqQowTUVT5MCZ2EtJ3uw1MpZND/M+lss4H4G97RR PfZWnuNC94Ki+kvZSxU9v9Mtjw== X-Google-Smtp-Source: AGRyM1vgptEceVtByGuNYUrudxA+4UJt+Gr4/h5m/Geot5gc24p2f9PZe/3aC+qtDZ5bkni9hTvQtg== X-Received: by 2002:a17:90b:1c0b:b0:1f0:23df:5406 with SMTP id oc11-20020a17090b1c0b00b001f023df5406mr34986891pjb.157.1658177343346; Mon, 18 Jul 2022 13:49:03 -0700 (PDT) Received: from google.com (59.39.145.34.bc.googleusercontent.com. [34.145.39.59]) by smtp.gmail.com with ESMTPSA id v21-20020a17090ac91500b001f113765d48sm7878522pjt.2.2022.07.18.13.49.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Jul 2022 13:49:01 -0700 (PDT) Date: Mon, 18 Jul 2022 20:48:55 +0000 From: Mingwei Zhang To: Sean Christopherson Cc: Paolo Bonzini , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2/4] KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level() Message-ID: References: <20220715232107.3775620-1-seanjc@google.com> <20220715232107.3775620-3-seanjc@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jul 18, 2022, Sean Christopherson wrote: > On Sat, Jul 16, 2022, Mingwei Zhang wrote: > > On Fri, Jul 15, 2022, Sean Christopherson wrote: > > > Add a comment to document how host_pfn_mapping_level() can be used safely, > > > as the line between safe and dangerous is quite thin. E.g. if KVM were > > > to ever support in-place promotion to create huge pages, consuming the > > > level is safe if the caller holds mmu_lock and checks that there's an > > > existing _leaf_ SPTE, but unsafe if the caller only checks that there's a > > > non-leaf SPTE. > > > > > > Opportunistically tweak the existing comments to explicitly document why > > > KVM needs to use READ_ONCE(). > > > > > > No functional change intended. > > > > > > Signed-off-by: Sean Christopherson > > > --- > > > arch/x86/kvm/mmu/mmu.c | 42 +++++++++++++++++++++++++++++++++++------- > > > 1 file changed, 35 insertions(+), 7 deletions(-) > > > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > > index bebff1d5acd4..d5b644f3e003 100644 > > > --- a/arch/x86/kvm/mmu/mmu.c > > > +++ b/arch/x86/kvm/mmu/mmu.c > > > @@ -2919,6 +2919,31 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) > > > __direct_pte_prefetch(vcpu, sp, sptep); > > > } > > > > > > +/* > > > + * Lookup the mapping level for @gfn in the current mm. > > > + * > > > + * WARNING! Use of host_pfn_mapping_level() requires the caller and the end > > > + * consumer to be tied into KVM's handlers for MMU notifier events! > > Since calling this function won't cause kernel crash now, I guess we can > > remove the warning sign here, but keep the remaining statement since it > > is necessary. > > Calling this function won't _directly_ crash the kernel, but improper usage can > most definitely crash the host kernel, or even worse, silently corrupt host and > or guest data. E.g. if KVM were to race with an mmu_notifier event and incorrectly > map a stale huge page into the guest. > > So yes, the function itself is robust, but usage is still very subtle and delicate. Understood. So we basically create another "gup_fast_only()" within KVM and we worry that may confuse other developers so we add the warning sign. > > > > + * > > > + * There are several ways to safely use this helper: > > > + * > > > + * - Check mmu_notifier_retry_hva() after grabbing the mapping level, before > > > + * consuming it. In this case, mmu_lock doesn't need to be held during the > > > + * lookup, but it does need to be held while checking the MMU notifier. > > > > but it does need to be held while checking the MMU notifier and > > consuming the result. > > I didn't want to include "consuming the result" because arguably the result is > being consumed while running the guest, and obviously KVM doesn't hold mmu_lock > while running the guest (though I fully acknowledge the above effectively uses > "consume" in the sense of shoving the result into SPTEs). > > > > + * > > > + * - Hold mmu_lock AND ensure there is no in-progress MMU notifier invalidation > > > + * event for the hva. This can be done by explicit checking the MMU notifier > > s/explicit/explicitly > > > > + * or by ensuring that KVM already has a valid mapping that covers the hva. > > > > Yes, more specifically, "mmu notifier sequence counter". > > Heh, depends on what the reader interprets as "sequence counter". If the reader > interprets that as the literal sequence counter, mmu_notifier_seq, then this phrasing > is incorrect as mmu_notifier_seq isn't bumped until the invalidation completes, > i.e. it guards against _past_ invalidations, not in-progress validations. > > My preference is to intentionally not be precise in describing how to check for an > in-progress invalidation, e.g. so that this comment doesn't need to be updated if > the details change, and to also to try and force developers to do more than copy > and paste if they want to use this helper. Hmm, I was going to say that I strongly disagree about the intentional unclearness. But then I find that MMU notifier implementation does require more than just the counter but also the range, so yeah, talking too much may fall into the weeds. But in general, I think mmu notifier deserves better documentation in both concept and implementation in KVM.