From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.lore.kernel.org (Postfix) with ESMTPS id 8F46FC7EE2D
	for <linux-arm-kernel@archiver.kernel.org>; Mon, 22 May 2023 20:47:15 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=bombadil.20210309; h=Sender:
	Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post:
	List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:Message-ID:
	References:Mime-Version:In-Reply-To:Date:Reply-To:Content-ID:
	Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc
	:Resent-Message-ID:List-Owner;
	bh=QOhqtRy4h9OLuYIl8LRh0NRxDJst+9/8SU9auHcfJ1Q=; b=NJw+Z+lHhJIufwBhUEucoFvSC+
	2fV4dC0deiB/r9Jf08cRpxM+P3K1+QBiEypIRYHAjivdmaG/KRXMa6iaDrcP28Tn+Gaov1i7nsbhC
	e8S9G+ZTgrHNPNiac8tD8Fagy9g95WsBLi6zqmLfEQxMdTeyg2hqYbYilUWcKDcfnDf3bs8t/RW7L
	nM3O3Fwb84itInkCb+hUY4CXsexMlL+HeeEXCGBOIi6oHqqXACgRu6bB2t0rJcsvDO9rOgnuIbZOv
	7yNQvNpKzpKjIvGWoDYm4OuTnIfZvMsKkQGFniDj/+5lCAAiZeChCtKqMGee8oC4aadY/+/t4++lN
	t63n4w1g==;
Received: from localhost ([::1] helo=bombadil.infradead.org)
	by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux))
	id 1q1CQa-007xZQ-1o;
	Mon, 22 May 2023 20:46:48 +0000
Received: from mail-pf1-x44a.google.com ([2607:f8b0:4864:20::44a])
	by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux))
	id 1q1CQW-007xYD-2X
	for linux-arm-kernel@lists.infradead.org;
	Mon, 22 May 2023 20:46:46 +0000
Received: by mail-pf1-x44a.google.com with SMTP id d2e1a72fcca58-64d1cdf9fa0so2903614b3a.0
        for <linux-arm-kernel@lists.infradead.org>; Mon, 22 May 2023 13:46:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20221208; t=1684788403; x=1687380403;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=W87RELi464qEY+cS32xNBLxYHSfFpf8luRfYTadu71s=;
        b=3WmA7xxVaeZdUxvIGDEkmw0U5+8pgH6A4D/wP81nd2cIpTBEuldmrRCAoJtvSHDt+6
         I9pSuquhStkKBpVZrmMtQ3xrA17/B+SV1xa1VbUtnKWToy8Zgzx+eY9D7sGpeRT0Z4PH
         vXbk89Kj2KupVtXxdXiQ+wAYTvsJhsSbRPyrXOQQiRtiLU5DQ1qqmvUntdLZwcUx1+95
         /NCZaahGvA7mpeZwh/bHpumNFHPq8POy0pNIzNjC3APOTmR9FjZVqlczLxIlPHaxPAig
         xrik+Y/H3sklUwbX63lYmkbTEYn8384LPvyhb79CFufru4wemgylfV9yphU+pFQDLC1M
         ZIaA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1684788403; x=1687380403;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=W87RELi464qEY+cS32xNBLxYHSfFpf8luRfYTadu71s=;
        b=U+pmuVMaNylffbHvoZgRcMFQ6jjqXkMjrJkSVMWW9FaditgS4BNf0uMd0B6fLLtaB+
         vfH1nbhgxSdhkixwGFoGyVZ0L/w8EcGCDzJvGDmTOmc5veAPKtdhz6nzCzkrVYR2B8SR
         UQJhfhl+AsfR3qdVEzGeCmEvVuRfJezKlpb45fd0Cw8Ud7NHMdM+lpOS/UUYC67tKfsL
         3kmDX5KFFQK575rsOdbz/V9Fp2hyZM935ZAP2lzQZMi+NFEN/+u7O/8l11gF/M2b0MIZ
         PKTBUfsXHbdA/aI/0NLvTby6xocwMJ4zflAw5Bnze/UAlKyWivN6DqPk1PFCWw4qrgz+
         NxUQ==
X-Gm-Message-State: AC+VfDzR/tJWIHkagndVOLj6EZ2Y6GqXDXKBQXZ6yagh74cfMrrySRRD
	6vLJcX6vvfvzqU8FaCoDcOaR9D7Lg6c=
X-Google-Smtp-Source: ACHHUZ6NQ7aTOkvUTtDMEi0IDRSDH4TmeTu4tsYNNJ9QDi8IZHPcUOZMsPBzSPmHiwbofpkrZxUG52aqDCE=
X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37])
 (user=seanjc job=sendgmr) by 2002:a05:6a00:a1a:b0:624:5886:4b4b with SMTP id
 p26-20020a056a000a1a00b0062458864b4bmr5302538pfh.5.1684788403186; Mon, 22 May
 2023 13:46:43 -0700 (PDT)
Date: Mon, 22 May 2023 13:46:41 -0700
In-Reply-To: <20230330085802.2414466-2-stevensd@google.com>
Mime-Version: 1.0
References: <20230330085802.2414466-1-stevensd@google.com> <20230330085802.2414466-2-stevensd@google.com>
Message-ID: <ZGvUsf7lMkrNDHuE@google.com>
Subject: Re: [PATCH v6 1/4] KVM: mmu: introduce new gfn_to_pfn_noref functions
From: Sean Christopherson <seanjc@google.com>
To: David Stevens <stevensd@chromium.org>
Cc: Marc Zyngier <maz@kernel.org>, Oliver Upton <oliver.upton@linux.dev>, 
	Paolo Bonzini <pbonzini@redhat.com>, linux-arm-kernel@lists.infradead.org, 
	kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org, 
	kvm@vger.kernel.org, Peter Xu <peterx@redhat.com>
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20230522_134644_825636_6570DD35 
X-CRM114-Status: GOOD (  32.36  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>,
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>,
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

+Peter

On Thu, Mar 30, 2023, David Stevens wrote:
> From: David Stevens <stevensd@chromium.org>
> 
> Introduce new gfn_to_pfn_noref functions that parallel existing
> gfn_to_pfn functions. These functions can be used when the caller does
> not need to maintain a reference to the returned pfn (i.e. when usage is
> guarded by a mmu_notifier). The noref functions take an out parameter
> that is used to return the struct page if the hva was resolved via gup.
> The caller needs to drop its reference such a returned page.

I dislike the "noref" name and the approach itself (of providing an entirely
separate set of APIs).  Using "noref" is confusing because the callers do actually
get a reference to the page (if a refcounted page is found).

As for the approach, I really, really don't want to end up with yet more APIs
for getting PFNs from GFNs.  We already have far too many.  In the short term,
I think we'll need to carry multiple sets of APIs, as converting all architectures
to any new API will be too much for a single series.  But I want to have line of
sight to convering on a single, as-small-as-possible set of APIs, and I think/hope
it should be possible to make the old APIs, e.g. gfn_to_pfn(), to be shims around
the new APIs.

And since this series is essentially overhauling the gfn_to_pfn APIs, I think it's
the right series to take on refactoring the APIs to clean up the growing flag
problem.  There was a bit of discussion back when "interruptible" support was
added (https://lore.kernel.org/all/YrTbKaRe497n8M0o@xz-m1.loca), but it got punted
because it wasn't necessary, and because there wasn't immediate agreement on what
exactly the APIs should look like.

Overhauling the APIs would also let us clean up things like async #PF, specifically
replacing the unintuitive "*async = true" logic with something like this:

		if ((flags & FOLL_NOWAIT) && vma_is_valid(vma, flags & FOLL_WRITE))
			pfn = KVM_PFN_ERR_FAULT_MINOR;
		else
			pfn = KVM_PFN_ERR_FAULT;

Lastly, I think there's also an opportunity here to harden KVM's interaction with
mmu_notifiers, and to dedup arch code in KVM .  Specifically, even when the proposed
"allow_unsafe_kmap" is true, KVM should either (a) be "in" an mmu_notifier sequence
or (b) _want_ to grab a reference.  And when KVM does NOT want a reference, the core
API can/should immediately drop the reference even before returning.

My thought is it provide an "entirely" new API, named something like kvm_follow_pfn()
to somewhat mirror follow_{pfn,pte,phys}().  Ideally something to pair with gup()
would be nice, but having a dedicated KVM helper to get _only_ struct page memory
doesn't work well because KVM almost never wants only struct page memory.

As for the flags vs. bools debate (see link above), I think the best approach is
a mix of the two.  Specifically, reuse the FOLL_* flags as-is for inputs, and use
booleans for outputs.  I don't _think_ there are any input bools/flags that don't
map 1:1 with existing FOLL_* flags.

As a very, *very* rough sketch, provide APIs that look a bit like this.

  kvm_pfn_t __kvm_follow_pfn(struct kvm_follow_pfn *foll)
  {
	kvm_pfn_t pfn;

	if (WARN_ON_ONCE(!(foll->flags & FOLL_GET) && !foll.mmu_seq))
		return KVM_PFN_ERR_FAULT;

	pfn = ???;

	if (foll->page && !(foll->flags & FOLL_GET))
		put_page(foll->page);

	return pfn;
  }

  kvm_pfn_t kvm_follow_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct page **page)
  {
	struct kvm_follow_pfn foll = {
		.flags = FOLL_GET | FOLL_WRITE,
	};

	<more stuff here?>

	foll.slot = ???;
	if (!foll.slot || foll.slot->flags & KVM_MEMSLOT_INVALID)
		return KVM_HVA_ERR_BAD;

	if (memslot_is_readonly(foll.slot))
		return KVM_HVA_ERR_RO_BAD;

	return __kvm_follow_pfn(&foll);
  }

and a few partially converted users

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 67e2ac799aa7..5eaf0395ed87 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -550,12 +550,14 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
 
        if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
                flush = true;
-               kvm_set_pfn_accessed(spte_to_pfn(old_spte));
+               if (is_refcounted_page_pte(old_spte))
+                       kvm_set_page_accessed(pfn_to_page(spte_to_pfn));
        }
 
        if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) {
                flush = true;
-               kvm_set_pfn_dirty(spte_to_pfn(old_spte));
+               if (is_refcounted_page_pte(old_spte))
+                       kvm_set_page_dirty(pfn_to_page(spte_to_pfn));
        }
 
        return flush;
@@ -4278,6 +4280,10 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 
 static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
+       struct kvm_follow_pfn foll = {
+               .mmu_seq = fault->mmu_seq,
+               .gfn = fault->gfn,
+       };
        struct kvm_memory_slot *slot = fault->slot;
        bool async;
 
@@ -4309,12 +4315,16 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
                        return RET_PF_EMULATE;
        }
 
-       async = false;
-       fault->pfn = __gfn_to_pfn_noref_memslot(slot, fault->gfn, false, false, &async,
-                                               fault->write, &fault->map_writable,
-                                               &fault->hva, &fault->page);
-       if (!async)
-               return RET_PF_CONTINUE; /* *pfn has correct page already */
+       foll.flags = FOLL_NOWAIT;
+       if (fault->write)
+               foll.flags |= FOLL_WRITE;
+
+       fault->pfn = __kvm_follow_pfn(&foll);
+       if (!is_error_noslot_pfn(fault->pfn))
+               goto success;
+
+       if (!is_fault_minor_pfn(fault->pfn))
+               return RET_PF_CONTINUE;
 
        if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) {
                trace_kvm_try_async_get_page(fault->addr, fault->gfn);
@@ -4332,9 +4342,18 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
         * to wait for IO.  Note, gup always bails if it is unable to quickly
         * get a page and a fatal signal, i.e. SIGKILL, is pending.
         */
-       fault->pfn = __gfn_to_pfn_noref_memslot(slot, fault->gfn, false, true, NULL,
-                                               fault->write, &fault->map_writable,
-                                               &fault->hva, &fault->page);
+       foll.flags |= FOLL_INTERRUPTIBLE;
+       foll.flags &= ~FOLL_NOWAIT;
+
+       fault->pfn = kvm_follow_pfn(&foll);
+       if (!is_error_noslot_pfn(fault->pfn))
+               goto success;
+
+       return RET_PF_CONTINUE;
+success:
+       fault->hva = foll.hva;
+       fault->page = foll.page;
+       fault->map_writable = foll.writable;
        return RET_PF_CONTINUE;
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 360eaa24456f..0bae253c88dd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2663,9 +2663,10 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
                if (r < 0)
                        pfn = KVM_PFN_ERR_FAULT;
        } else {
-               if (async && vma_is_valid(vma, write_fault))
-                       *async = true;
-               pfn = KVM_PFN_ERR_FAULT;
+               if ((flags & FOLL_NOWAIT) && vma_is_valid(vma, flags & FOLL_WRITE))
+                       pfn = KVM_PFN_ERR_FAULT_MINOR;
+               else
...skipping...
+       fault->pfn = kvm_follow_pfn(&foll);
+       if (!is_error_noslot_pfn(fault->pfn))
+               goto success;
+
+       return RET_PF_CONTINUE;
+success:
+       fault->hva = foll.hva;
+       fault->page = foll.page;
+       fault->map_writable = foll.writable;
        return RET_PF_CONTINUE;
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 360eaa24456f..0bae253c88dd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2663,9 +2663,10 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
                if (r < 0)
                        pfn = KVM_PFN_ERR_FAULT;
        } else {
-               if (async && vma_is_valid(vma, write_fault))
-                       *async = true;
-               pfn = KVM_PFN_ERR_FAULT;
+               if ((flags & FOLL_NOWAIT) && vma_is_valid(vma, flags & FOLL_WRITE))
+                       pfn = KVM_PFN_ERR_FAULT_MINOR;
+               else
+                       pfn = KVM_PFN_ERR_FAULT;
        }
 exit:
        mmap_read_unlock(current->mm);
@@ -2732,6 +2733,30 @@ kvm_pfn_t __gfn_to_pfn_noref_memslot(const struct kvm_memory_slot *slot, gfn_t g
 }
 EXPORT_SYMBOL_GPL(__gfn_to_pfn_noref_memslot);
 
+kvm_pfn_t __kvm_follow_pfn(struct kvm_follow_pfn *foll)
+{
+       kvm_pfn_t pfn;
+
+       if (WARN_ON_ONCE(!(foll->flags & FOLL_GET) && !foll.mmu_seq))
+               return KVM_PFN_ERR_FAULT;
+
+       pfn = __gfn_to_pfn_noref_memslot(...);
+
+       if (foll->page && !(foll->flags & FOLL_GET))
+               put_page(foll->page);
+
+       return pfn;
+}
+
+kvm_pfn_t kvm_follow_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct page **page)
+{
+       struct kvm_follow_pfn foll = {
+               .flags = FOLL_GET | FOLL_WRITE,
+       };
+
+       return __kvm_follow_pfn(&foll);
+}
+
 kvm_pfn_t gfn_to_pfn_noref_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
                                bool *writable, struct page **page)
 {
@@ -2910,25 +2935,23 @@ void kvm_release_pfn(kvm_pfn_t pfn, bool dirty)
 
 int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
 {
+       struct page *page;
        kvm_pfn_t pfn;
        void *hva = NULL;
-       struct page *page = KVM_UNMAPPED_PAGE;
 
        if (!map)
                return -EINVAL;
 
-       pfn = gfn_to_pfn(vcpu->kvm, gfn);
+       pfn = kvm_follow_pfn(vcpu->kvm, gfn, &page)
        if (is_error_noslot_pfn(pfn))
                return -EINVAL;
 
-       if (pfn_valid(pfn)) {
-               page = pfn_to_page(pfn);
+       if (page)
                hva = kmap(page);
 #ifdef CONFIG_HAS_IOMEM
-       } else {
+       else if (allow_unsafe_kmap)
                hva = memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
 #endif
-       }
 
        if (!hva)
                return -EFAULT;

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel