From: Sean Christopherson <seanjc@google.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: David Stevens <stevensd@chromium.org>,
Christoph Hellwig <hch@infradead.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Yu Zhang <yu.c.zhang@linux.intel.com>,
Isaku Yamahata <isaku.yamahata@gmail.com>,
Zhi Wang <zhi.wang.linux@gmail.com>,
Maxim Levitsky <mlevitsk@redhat.com>,
kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org
Subject: Re: [PATCH v11 0/8] KVM: allow mapping non-refcounted pages
Date: Wed, 13 Mar 2024 15:47:47 +0000 [thread overview]
Message-ID: <ZfHKoxVMcBAMqcSC@google.com> (raw)
In-Reply-To: <9e604f99-5b63-44d7-8476-00859dae1dc4@amd.com>
On Wed, Mar 13, 2024, Christian König wrote:
> Am 13.03.24 um 15:48 schrieb Sean Christopherson:
> > On Wed, Mar 13, 2024, Christian König wrote:
> > > Am 13.03.24 um 14:34 schrieb Sean Christopherson:
> > > > What Christoph is objecting to is that, in this series, KVM is explicitly adding
> > > > support for mapping non-compound (huge)pages into KVM guests. David is arguing
> > > > that Christoph's objection to _KVM_ adding support is unfair, because the real
> > > > problem is that the kernel already maps such pages into host userspace. I.e. if
> > > > the userspace mapping ceases to exist, then there are no mappings for KVM to follow
> > > > and propagate to KVM's stage-2 page tables.
> > > And I have to agree with Christoph that this doesn't make much sense. KVM
> > > should *never* map (huge) pages from VMAs marked with VM_PFNMAP into KVM
> > > guests in the first place.
> > >
> > > What it should do instead is to mirror the PFN from the host page tables
> > > into the guest page tables.
> > That's exactly what this series does. Christoph is objecting to KVM playing nice
> > with non-compound hugepages, as he feels that such mappings should not exist
> > *anywhere*.
>
> Well Christoph is right those mappings shouldn't exists and they also don't
> exists.
>
> What happens here is that a driver has allocated some contiguous memory to
> do DMA with. And then some page table is pointing to a PFN inside that
> memory because userspace needs to provide parameters for the DMA transfer.
>
> This is *not* a mapping of a non-compound hugepage, it's simply a PTE
> pointing to some PFN.
Yes, I know. And David knows. By "such mappings" I did not mean "huge PMD mappings
that point at non-compound pages", I meant "any mapping in the host userspace
VMAs and page tables that points at memory that is backed by a larger-than-order-0,
non-compound allocation".
And even then, the whole larger-than-order-0 mapping is not something we on the
KVM side care about, at all. The _only_ new thing KVM is trying to do in this
series is to allow mapping non-refcounted struct page memory into KVM guest.
Those details were brought up purely because they provide context on how/why such
non-refcounted pages exist.
> It can trivially be that userspace only maps 4KiB of some 2MiB piece of
> memory the driver has allocate.
>
> > I.e. Christoph is (implicitly) saying that instead of modifying KVM to play nice,
> > we should instead fix the TTM allocations. And David pointed out that that was
> > tried and got NAK'd.
>
> Well as far as I can see Christoph rejects the complexity coming with the
> approach of sometimes grabbing the reference and sometimes not.
Unless I've wildly misread multiple threads, that is not Christoph's objection.
From v9 (https://lore.kernel.org/all/ZRpiXsm7X6BFAU%2Fy@infradead.org):
On Sun, Oct 1, 2023 at 11:25 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, Sep 29, 2023 at 09:06:34AM -0700, Sean Christopherson wrote:
> > KVM needs to be aware of non-refcounted struct page memory no matter what; see
> > CVE-2021-22543 and, commit f8be156be163 ("KVM: do not allow mapping valid but
> > non-reference-counted pages"). I don't think it makes any sense whatsoever to
> > remove that code and assume every driver in existence will do the right thing.
>
> Agreed.
>
> >
> > With the cleanups done, playing nice with non-refcounted paged instead of outright
> > rejecting them is a wash in terms of lines of code, complexity, and ongoing
> > maintenance cost.
>
> I tend to strongly disagree with that, though. We can't just let these
> non-refcounted pages spread everywhere and instead need to fix their
> usage.
> And I have to agree that this is extremely odd.
Yes, it's odd and not ideal. But with nested virtualization, KVM _must_ "map"
pfns directly into the guest via fields in the control structures that are
consumed by hardware. I.e. pfns are exposed to the guest in an "out-of-band"
structure that is NOT part of the stage-2 page tables. And wiring those up to
the MMU notifiers is extremely difficult for a variety of reasons[*].
Because KVM doesn't control which pfns are mapped this way, KVM's compromise is
to grab a reference to the struct page while the out-of-band mapping exists, i.e.
to pin the page to prevent use-after-free. And KVM's historical ABI is to support
any refcounted page for these out-of-band mappings, regardless of whether the
page was obtained by gup() or follow_pte().
Thus, to support non-refouncted VM_PFNMAP pages without breaking existing userspace,
KVM resorts to conditionally grabbing references and disllowing non-refcounted
pages from being inserted into the out-of-band mappings.
But again, I don't think these details are relevant to Christoph's objection.
[*] https://lore.kernel.org/all/ZBEEQtmtNPaEqU1i@google.com
next prev parent reply other threads:[~2024-03-13 15:47 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-29 2:57 [PATCH v11 0/8] KVM: allow mapping non-refcounted pages David Stevens
2024-02-29 2:57 ` [PATCH v11 1/8] KVM: Assert that a page's refcount is elevated when marking accessed/dirty David Stevens
2024-02-29 2:57 ` [PATCH v11 2/8] KVM: Relax BUG_ON argument validation David Stevens
2024-02-29 2:57 ` [PATCH v11 3/8] KVM: mmu: Introduce kvm_follow_pfn() David Stevens
2024-02-29 2:57 ` [PATCH v11 4/8] KVM: mmu: Improve handling of non-refcounted pfns David Stevens
2024-02-29 2:57 ` [PATCH v11 5/8] KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn() David Stevens
2024-02-29 2:57 ` [PATCH v11 6/8] KVM: x86: Migrate " David Stevens
2024-02-29 2:57 ` [PATCH v11 7/8] KVM: x86/mmu: Track if sptes refer to refcounted pages David Stevens
2024-02-29 2:57 ` [PATCH v11 8/8] KVM: x86/mmu: Handle non-refcounted pages David Stevens
2024-04-04 16:03 ` Dmitry Osipenko
2024-04-15 7:28 ` David Stevens
2024-04-15 9:36 ` Paolo Bonzini
2024-02-29 13:36 ` [PATCH v11 0/8] KVM: allow mapping " Christoph Hellwig
2024-03-13 4:55 ` David Stevens
2024-03-13 9:55 ` Christian König
2024-03-13 13:34 ` Sean Christopherson
2024-03-13 14:37 ` Christian König
2024-03-13 14:48 ` Sean Christopherson
[not found] ` <9e604f99-5b63-44d7-8476-00859dae1dc4@amd.com>
2024-03-13 15:09 ` Christian König
2024-03-13 15:47 ` Sean Christopherson [this message]
[not found] ` <93df19f9-6dab-41fc-bbcd-b108e52ff50b@amd.com>
2024-03-13 17:26 ` Sean Christopherson
[not found] ` <c84fcf0a-f944-4908-b7f6-a1b66a66a6bc@amd.com>
2024-03-14 9:20 ` Christian König
2024-03-14 11:31 ` David Stevens
2024-03-14 11:51 ` Christian König
2024-03-14 14:45 ` Sean Christopherson
2024-03-18 1:26 ` Christoph Hellwig
2024-03-18 13:10 ` Paolo Bonzini
2024-03-18 23:20 ` Christoph Hellwig
2024-03-14 16:17 ` Sean Christopherson
2024-03-14 17:19 ` Sean Christopherson
2024-03-15 17:59 ` Sean Christopherson
2024-03-20 20:54 ` Axel Rasmussen
2024-03-13 13:33 ` Christoph Hellwig
2024-06-21 18:32 ` Sean Christopherson
2024-07-31 11:41 ` Alex Bennée
2024-07-31 15:01 ` Sean Christopherson
2024-08-05 23:44 ` David Stevens
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZfHKoxVMcBAMqcSC@google.com \
--to=seanjc@google.com \
--cc=christian.koenig@amd.com \
--cc=hch@infradead.org \
--cc=isaku.yamahata@gmail.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=mlevitsk@redhat.com \
--cc=pbonzini@redhat.com \
--cc=stevensd@chromium.org \
--cc=yu.c.zhang@linux.intel.com \
--cc=zhi.wang.linux@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.