public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: David Stevens <stevensd@chromium.org>,
	Christoph Hellwig <hch@infradead.org>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	 Isaku Yamahata <isaku.yamahata@gmail.com>,
	Zhi Wang <zhi.wang.linux@gmail.com>,
	 Maxim Levitsky <mlevitsk@redhat.com>,
	kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
	 kvm@vger.kernel.org
Subject: Re: [PATCH v11 0/8] KVM: allow mapping non-refcounted pages
Date: Wed, 13 Mar 2024 17:26:03 +0000	[thread overview]
Message-ID: <ZfHhqzKVZeOxXMnx@google.com> (raw)
In-Reply-To: <93df19f9-6dab-41fc-bbcd-b108e52ff50b@amd.com>

On Wed, Mar 13, 2024, Christian König wrote:
> Am 13.03.24 um 16:47 schrieb Sean Christopherson:
> > [SNIP]
> > > It can trivially be that userspace only maps 4KiB of some 2MiB piece of
> > > memory the driver has allocate.
> > > 
> > > > I.e. Christoph is (implicitly) saying that instead of modifying KVM to play nice,
> > > > we should instead fix the TTM allocations.  And David pointed out that that was
> > > > tried and got NAK'd.
> > > Well as far as I can see Christoph rejects the complexity coming with the
> > > approach of sometimes grabbing the reference and sometimes not.
> > Unless I've wildly misread multiple threads, that is not Christoph's objection.
> >  From v9 (https://lore.kernel.org/all/ZRpiXsm7X6BFAU%2Fy@infradead.org):
> > 
> >    On Sun, Oct 1, 2023 at 11:25 PM Christoph Hellwig<hch@infradead.org>  wrote:
> >    >
> >    > On Fri, Sep 29, 2023 at 09:06:34AM -0700, Sean Christopherson wrote:
> >    > > KVM needs to be aware of non-refcounted struct page memory no matter what; see
> >    > > CVE-2021-22543 and, commit f8be156be163 ("KVM: do not allow mapping valid but
> >    > > non-reference-counted pages").  I don't think it makes any sense whatsoever to
> >    > > remove that code and assume every driver in existence will do the right thing.
> >    >
> >    > Agreed.
> >    >
> >    > >
> >    > > With the cleanups done, playing nice with non-refcounted paged instead of outright
> >    > > rejecting them is a wash in terms of lines of code, complexity, and ongoing
> >    > > maintenance cost.
> >    >
> >    > I tend to strongly disagree with that, though.  We can't just let these
> >    > non-refcounted pages spread everywhere and instead need to fix their
> >    > usage.
> 
> And I can only repeat myself that I completely agree with Christoph here.

I am so confused.  If you agree with Christoph, why not fix the TTM allocations?

> > > And I have to agree that this is extremely odd.
> > Yes, it's odd and not ideal.  But with nested virtualization, KVM _must_ "map"
> > pfns directly into the guest via fields in the control structures that are
> > consumed by hardware.  I.e. pfns are exposed to the guest in an "out-of-band"
> > structure that is NOT part of the stage-2 page tables.  And wiring those up to
> > the MMU notifiers is extremely difficult for a variety of reasons[*].
> > 
> > Because KVM doesn't control which pfns are mapped this way, KVM's compromise is
> > to grab a reference to the struct page while the out-of-band mapping exists, i.e.
> > to pin the page to prevent use-after-free.
> 
> Wait a second, as far as I know this approach doesn't work correctly in all
> cases. See here as well: https://lwn.net/Articles/930667/
>
> The MMU notifier is not only to make sure that pages don't go away and
> prevent use after free, but also that PTE updates correctly wait for ongoing
> DMA transfers.
> 
> Otherwise quite a bunch of other semantics doesn't work correctly either.
> 
> > And KVM's historical ABI is to support
> > any refcounted page for these out-of-band mappings, regardless of whether the
> > page was obtained by gup() or follow_pte().
> 
> Well see the discussion from last year the LWN article summarizes.

Oof.  I suspect that in practice, no one has observed issues because the pages
in question are heavily used by the guest and don't get evicted from the page cache.

> I'm not an expert for KVM but as far as I can see what you guys do here is
> illegal and can lead to corruption.
>
> Basically you can't create a second channel to modify page content like
> that.

Well, KVM can, it just needs to honor mmu_notifier events in order to be 100%
robust.

That said, while this might be motivation to revisit tying the out-of-band mappings
to the mmu_notifier, it has no bearing on the outcome of Christoph's objection.
Because (a) AIUI, Christoph is objecting to mapping non-refcounted struct page
memory *anywhere*, and (b) in this series, KVM will explicitly disallow non-
refcounted pages from being mapped in the out-of-band structures (see below).

> > Thus, to support non-refouncted VM_PFNMAP pages without breaking existing userspace,
> > KVM resorts to conditionally grabbing references and disllowing non-refcounted
> > pages from being inserted into the out-of-band mappings.
> 
> Why not only refcount the pages when out of band mappings are requested?

That's also what this series effectively does.  By default, KVM will disallow
inserting *any* non-refcounted memory into the out-of-band mappings.

"By default" because there are use cases where the guest memory pool is hidden
from the kernel at boot, and is fully managed by userspace.  I.e. userspace is
effectively responsible/trusted to not free/change the mappings for an active
guest.

  parent reply	other threads:[~2024-03-13 17:26 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-29  2:57 [PATCH v11 0/8] KVM: allow mapping non-refcounted pages David Stevens
2024-02-29  2:57 ` [PATCH v11 1/8] KVM: Assert that a page's refcount is elevated when marking accessed/dirty David Stevens
2024-02-29  2:57 ` [PATCH v11 2/8] KVM: Relax BUG_ON argument validation David Stevens
2024-02-29  2:57 ` [PATCH v11 3/8] KVM: mmu: Introduce kvm_follow_pfn() David Stevens
2024-02-29  2:57 ` [PATCH v11 4/8] KVM: mmu: Improve handling of non-refcounted pfns David Stevens
2024-02-29  2:57 ` [PATCH v11 5/8] KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn() David Stevens
2024-02-29  2:57 ` [PATCH v11 6/8] KVM: x86: Migrate " David Stevens
2024-02-29  2:57 ` [PATCH v11 7/8] KVM: x86/mmu: Track if sptes refer to refcounted pages David Stevens
2024-02-29  2:57 ` [PATCH v11 8/8] KVM: x86/mmu: Handle non-refcounted pages David Stevens
2024-04-04 16:03   ` Dmitry Osipenko
2024-04-15  7:28     ` David Stevens
2024-04-15  9:36       ` Paolo Bonzini
2024-02-29 13:36 ` [PATCH v11 0/8] KVM: allow mapping " Christoph Hellwig
2024-03-13  4:55   ` David Stevens
2024-03-13  9:55     ` Christian König
2024-03-13 13:34       ` Sean Christopherson
2024-03-13 14:37         ` Christian König
2024-03-13 14:48           ` Sean Christopherson
     [not found]             ` <9e604f99-5b63-44d7-8476-00859dae1dc4@amd.com>
2024-03-13 15:09               ` Christian König
2024-03-13 15:47               ` Sean Christopherson
     [not found]                 ` <93df19f9-6dab-41fc-bbcd-b108e52ff50b@amd.com>
2024-03-13 17:26                   ` Sean Christopherson [this message]
     [not found]                     ` <c84fcf0a-f944-4908-b7f6-a1b66a66a6bc@amd.com>
2024-03-14  9:20                       ` Christian König
2024-03-14 11:31                         ` David Stevens
2024-03-14 11:51                           ` Christian König
2024-03-14 14:45                             ` Sean Christopherson
2024-03-18  1:26                             ` Christoph Hellwig
2024-03-18 13:10                               ` Paolo Bonzini
2024-03-18 23:20                                 ` Christoph Hellwig
2024-03-14 16:17                           ` Sean Christopherson
2024-03-14 17:19                             ` Sean Christopherson
2024-03-15 17:59                               ` Sean Christopherson
2024-03-20 20:54                                 ` Axel Rasmussen
2024-03-13 13:33     ` Christoph Hellwig
2024-06-21 18:32 ` Sean Christopherson
2024-07-31 11:41   ` Alex Bennée
2024-07-31 15:01     ` Sean Christopherson
2024-08-05 23:44     ` David Stevens

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZfHhqzKVZeOxXMnx@google.com \
    --to=seanjc@google.com \
    --cc=christian.koenig@amd.com \
    --cc=hch@infradead.org \
    --cc=isaku.yamahata@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mlevitsk@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=stevensd@chromium.org \
    --cc=yu.c.zhang@linux.intel.com \
    --cc=zhi.wang.linux@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox