Re: [PATCH v7 3/8] KVM: Make __kvm_follow_pfn not imply FOLL_GET

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: David Stevens <stevensd@chromium.org>
Cc: Zhi Wang <zhi.wang.linux@gmail.com>,
	kvm@vger.kernel.org, Marc Zyngier <maz@kernel.org>,
	linux-kernel@vger.kernel.org, Peter Xu <peterx@redhat.com>,
	kvmarm@lists.linux.dev, linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v7 3/8] KVM: Make __kvm_follow_pfn not imply FOLL_GET
Date: Wed, 6 Sep 2023 15:03:23 -0700	[thread overview]
Message-ID: <ZPj3KyvCa4GM0RJ2@google.com> (raw)
In-Reply-To: <CAD=HUj4W4PF3O9oLZx-3Rd_W51x1z30hQ36m_fcWUpw=mxUpSA@mail.gmail.com>

On Wed, Sep 06, 2023, David Stevens wrote:
> On Wed, Sep 6, 2023 at 9:45 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Tue, Sep 05, 2023, David Stevens wrote:
> > > For property 2, FOLL_GET is also important. If guarded_by_mmu_notifier
> > > is set, then we're all good here. If guarded_by_mmu_notifier is not
> > > set, then the check in __kvm_follow_pfn guarantees that FOLL_GET is
> > > set. For struct page memory, we're safe because KVM will hold a
> > > reference as long as it's still using the page. For non struct page
> > > memory, we're not safe - this is where the breaking change of
> > > allow_unsafe_mappings would go. Note that for non-refcounted struct
> > > page, we can't use the allow_unsafe_mappings escape hatch. Since
> > > FOLL_GET was requested, if we returned such a page, then the caller
> > > would eventually corrupt the page refcount via kvm_release_pfn.
> >
> > Yes we can.  The caller simply needs to be made aware of is_refcounted_page.   I
> > didn't include that in the snippet below because I didn't want to write the entire
> > patch.  The whole point of adding is_refcounted_page is so that callers can
> > identify exactly what type of page was at the end of the trail that was followed.
> 
> Are you asking me to completely migrate every caller of any gfn_to_pfn
> variant to __kvm_follow_pfn, so that they can respect
> is_refcounted_page? That's the only way to make it safe for
> allow_unsafe_mappings to apply to non-refcounted pages. That is
> decidedly not simple. Or is kvm_vcpu_map the specific call site you
> care about? At best, I can try to migrate x86, and then just add some
> sort of compatibility shim for other architectures that rejects
> non-refcounted pages.

Ah, I see your conundrum.  No, I don't think it's reasonable to require you to
convert all users in every architecture.  I'll still ask, just in case you're
feeling generous, but it's not a requirement :-)

The easiest way forward I can think of is to add yet another flag to kvm_follow_pfn,
e.g. allow_non_refcounted_struct_page, to communicate whether or not the caller
has been enlightened to play nice with non-refcounted struct page memory.  We'll
need that flag no matter what, otherwise we'd have to convert all users in a single
patch (LOL).  Then your series can simply stop at a reasonable point, e.g. convert
all x86 usage (including kvm_vcpu_map(), and leave converting everything else to
future work.

E.g. I think this would be the outro of hva_to_pfn_remapped():

        if (!page)
                goto out;

        if (get_page_unless_zero(page))
                WARN_ON_ONCE(kvm_follow_refcounted_pfn(foll, page) != pfn);
 out:
        pte_unmap_unlock(ptep, ptl);

	/*
	 * TODO: Drop allow_non_refcounted_struct_page once all callers have
	 * been taught to play nice with non-refcounted tail pages.
	 */
	if (page && !foll->is_refcounted_page &&
	    !foll->allow_non_refcounted_struct_page)
		r = -EFAULT
        else if (!foll->is_refcounted_page && !foll->guarded_by_mmu_notifier &&
        	 !allow_unsafe_mappings)
        	r = -EFAULT;
        else
               *p_pfn = pfn;

        return r;

> > > Property 3 would be nice, but we've already concluded that guarding
> > > all translations with mmu notifiers is infeasible. So maintaining
> > > property 2 is the best we can hope for.
> >
> > No, #3 is just a variant of #2.  Unless you're talking about not making guarantees
> > about guest accesses being ordered with respect to VMA/memslot updates, but I
> > don't think that's the case.
> 
> I'm talking about the fact that kvm_vcpu_map is busted with respect to
> updates to VMA updates. It won't corrupt host memory because the
> mapping keeps a reference to the page, but it will continue to use
> stale translations.

True.  But barring some crazy paravirt use case, userspace modifying a mapping
that is in active use is inherently broken, the guest will have no idea that memory
just got yanked away.

Hmm, though I suppose userspace could theoretically mprotect() a mapping to be
read-only, which would "work" for mmu_notifier paths but not kvm_vcpu_map().  But
KVM doesn't provide enough information on -EFAULT for userspace to do anything in
response to a write to read-only memory, so in practice that's likely inherently
broken too.

> From [1], it sounds like you've granted that fixing that is not feasible, so
> I just wanted to make sure that this isn't the "unsafe" referred to by
> allow_unsafe_mappings.

Right, this is not the "unsafe" I'm referring to.

> [1] https://lore.kernel.org/all/ZBEEQtmtNPaEqU1i@google.com/

next prev parent reply	other threads:[~2023-09-06 22:04 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-04  7:50 [PATCH v7 0/8] KVM: allow mapping non-refcounted pages David Stevens
2023-07-04  7:50 ` [PATCH v7 1/8] KVM: Assert that a page's refcount is elevated when marking accessed/dirty David Stevens
2023-07-04  7:50 ` [PATCH v7 2/8] KVM: Introduce __kvm_follow_pfn function David Stevens
2023-07-05  3:10   ` Yu Zhang
2023-07-05  9:22     ` David Stevens
2023-07-05 10:53       ` Yu Zhang
2023-07-06  5:29         ` David Stevens
2023-07-06 14:52           ` Yu Zhang
2023-08-04 22:03             ` Sean Christopherson
2023-07-05  8:47   ` Zhi Wang
2023-07-05  9:08     ` David Stevens
2023-07-11 17:37       ` Zhi Wang
2023-07-06  1:34   ` Isaku Yamahata
2023-07-06  5:52     ` David Stevens
2023-08-04 22:13   ` Sean Christopherson
2023-07-04  7:50 ` [PATCH v7 3/8] KVM: Make __kvm_follow_pfn not imply FOLL_GET David Stevens
2023-07-05  7:23   ` Yu Zhang
2023-07-05 11:56   ` Yu Zhang
2023-07-06  6:09     ` David Stevens
2023-07-05 13:19   ` Zhi Wang
2023-07-06  6:49     ` David Stevens
2023-07-11 17:33       ` Zhi Wang
2023-07-11 21:59         ` Sean Christopherson
2023-09-05  8:26           ` David Stevens
2023-09-06  0:45             ` Sean Christopherson
2023-09-06  3:24               ` David Stevens
2023-09-06 22:03                 ` Sean Christopherson [this message]
2023-07-04  7:50 ` [PATCH v7 4/8] KVM: x86/mmu: Migrate to __kvm_follow_pfn David Stevens
2023-07-05  8:07   ` Yu Zhang
2023-08-04 22:30     ` Sean Christopherson
2023-07-06  1:54   ` Isaku Yamahata
2023-08-24  8:03     ` David Stevens
2023-07-04  7:50 ` [PATCH v7 5/8] KVM: x86/mmu: Don't pass FOLL_GET " David Stevens
2023-07-05 10:18   ` Yu Zhang
2023-07-05 14:17     ` Yu Zhang
2023-07-06  4:52     ` David Stevens
2023-07-06  7:19       ` Yu Zhang
2023-07-06 15:58       ` Isaku Yamahata
2023-07-07  1:35         ` David Stevens
2023-07-10 16:34           ` Isaku Yamahata
2023-07-11  2:59             ` David Stevens
2023-08-04 22:45       ` Sean Christopherson
2023-07-05 10:25   ` Yu Zhang
2023-08-24  8:03     ` David Stevens
2023-08-24 15:15       ` Sean Christopherson
2023-08-25  1:38         ` David Stevens
2023-08-31 21:18           ` Sean Christopherson
2023-07-06  2:10   ` Isaku Yamahata
2023-07-06  5:18     ` David Stevens
2023-07-19  6:09   ` Yan Zhao
2023-07-19  7:16     ` David Stevens
2023-07-04  7:50 ` [PATCH v7 6/8] KVM: arm64: Migrate " David Stevens
2023-07-04  7:50 ` [PATCH v7 7/8] KVM: PPC: " David Stevens
2023-07-04  7:50 ` [PATCH v7 8/8] KVM: remove __gfn_to_pfn_memslot David Stevens
2023-08-04 22:47 ` [PATCH v7 0/8] KVM: allow mapping non-refcounted pages Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZPj3KyvCa4GM0RJ2@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=maz@kernel.org \
    --cc=peterx@redhat.com \
    --cc=stevensd@chromium.org \
    --cc=zhi.wang.linux@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).