From: Sean Christopherson <seanjc@google.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: David Stevens <stevensd@chromium.org>,
Christoph Hellwig <hch@infradead.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Yu Zhang <yu.c.zhang@linux.intel.com>,
Isaku Yamahata <isaku.yamahata@gmail.com>,
Zhi Wang <zhi.wang.linux@gmail.com>,
Maxim Levitsky <mlevitsk@redhat.com>,
kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org
Subject: Re: [PATCH v11 0/8] KVM: allow mapping non-refcounted pages
Date: Wed, 13 Mar 2024 15:47:47 +0000 [thread overview]
Message-ID: <ZfHKoxVMcBAMqcSC@google.com> (raw)
In-Reply-To: <9e604f99-5b63-44d7-8476-00859dae1dc4@amd.com>
On Wed, Mar 13, 2024, Christian König wrote:
> Am 13.03.24 um 15:48 schrieb Sean Christopherson:
> > On Wed, Mar 13, 2024, Christian König wrote:
> > > Am 13.03.24 um 14:34 schrieb Sean Christopherson:
> > > > What Christoph is objecting to is that, in this series, KVM is explicitly adding
> > > > support for mapping non-compound (huge)pages into KVM guests. David is arguing
> > > > that Christoph's objection to _KVM_ adding support is unfair, because the real
> > > > problem is that the kernel already maps such pages into host userspace. I.e. if
> > > > the userspace mapping ceases to exist, then there are no mappings for KVM to follow
> > > > and propagate to KVM's stage-2 page tables.
> > > And I have to agree with Christoph that this doesn't make much sense. KVM
> > > should *never* map (huge) pages from VMAs marked with VM_PFNMAP into KVM
> > > guests in the first place.
> > >
> > > What it should do instead is to mirror the PFN from the host page tables
> > > into the guest page tables.
> > That's exactly what this series does. Christoph is objecting to KVM playing nice
> > with non-compound hugepages, as he feels that such mappings should not exist
> > *anywhere*.
>
> Well Christoph is right those mappings shouldn't exists and they also don't
> exists.
>
> What happens here is that a driver has allocated some contiguous memory to
> do DMA with. And then some page table is pointing to a PFN inside that
> memory because userspace needs to provide parameters for the DMA transfer.
>
> This is *not* a mapping of a non-compound hugepage, it's simply a PTE
> pointing to some PFN.
Yes, I know. And David knows. By "such mappings" I did not mean "huge PMD mappings
that point at non-compound pages", I meant "any mapping in the host userspace
VMAs and page tables that points at memory that is backed by a larger-than-order-0,
non-compound allocation".
And even then, the whole larger-than-order-0 mapping is not something we on the
KVM side care about, at all. The _only_ new thing KVM is trying to do in this
series is to allow mapping non-refcounted struct page memory into KVM guest.
Those details were brought up purely because they provide context on how/why such
non-refcounted pages exist.
> It can trivially be that userspace only maps 4KiB of some 2MiB piece of
> memory the driver has allocate.
>
> > I.e. Christoph is (implicitly) saying that instead of modifying KVM to play nice,
> > we should instead fix the TTM allocations. And David pointed out that that was
> > tried and got NAK'd.
>
> Well as far as I can see Christoph rejects the complexity coming with the
> approach of sometimes grabbing the reference and sometimes not.
Unless I've wildly misread multiple threads, that is not Christoph's objection.
From v9 (https://lore.kernel.org/all/ZRpiXsm7X6BFAU%2Fy@infradead.org):
On Sun, Oct 1, 2023 at 11:25 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, Sep 29, 2023 at 09:06:34AM -0700, Sean Christopherson wrote:
> > KVM needs to be aware of non-refcounted struct page memory no matter what; see
> > CVE-2021-22543 and, commit f8be156be163 ("KVM: do not allow mapping valid but
> > non-reference-counted pages"). I don't think it makes any sense whatsoever to
> > remove that code and assume every driver in existence will do the right thing.
>
> Agreed.
>
> >
> > With the cleanups done, playing nice with non-refcounted paged instead of outright
> > rejecting them is a wash in terms of lines of code, complexity, and ongoing
> > maintenance cost.
>
> I tend to strongly disagree with that, though. We can't just let these
> non-refcounted pages spread everywhere and instead need to fix their
> usage.
> And I have to agree that this is extremely odd.
Yes, it's odd and not ideal. But with nested virtualization, KVM _must_ "map"
pfns directly into the guest via fields in the control structures that are
consumed by hardware. I.e. pfns are exposed to the guest in an "out-of-band"
structure that is NOT part of the stage-2 page tables. And wiring those up to
the MMU notifiers is extremely difficult for a variety of reasons[*].
Because KVM doesn't control which pfns are mapped this way, KVM's compromise is
to grab a reference to the struct page while the out-of-band mapping exists, i.e.
to pin the page to prevent use-after-free. And KVM's historical ABI is to support
any refcounted page for these out-of-band mappings, regardless of whether the
page was obtained by gup() or follow_pte().
Thus, to support non-refouncted VM_PFNMAP pages without breaking existing userspace,
KVM resorts to conditionally grabbing references and disllowing non-refcounted
pages from being inserted into the out-of-band mappings.
But again, I don't think these details are relevant to Christoph's objection.
[*] https://lore.kernel.org/all/ZBEEQtmtNPaEqU1i@google.com
next prev parent reply other threads:[~2024-03-13 15:47 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-29 2:57 [PATCH v11 0/8] KVM: allow mapping non-refcounted pages David Stevens
2024-02-29 2:57 ` [PATCH v11 1/8] KVM: Assert that a page's refcount is elevated when marking accessed/dirty David Stevens
2024-02-29 2:57 ` [PATCH v11 2/8] KVM: Relax BUG_ON argument validation David Stevens
2024-02-29 2:57 ` [PATCH v11 3/8] KVM: mmu: Introduce kvm_follow_pfn() David Stevens
2024-02-29 2:57 ` [PATCH v11 4/8] KVM: mmu: Improve handling of non-refcounted pfns David Stevens
2024-02-29 2:57 ` [PATCH v11 5/8] KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn() David Stevens
2024-02-29 2:57 ` [PATCH v11 6/8] KVM: x86: Migrate " David Stevens
2024-02-29 2:57 ` [PATCH v11 7/8] KVM: x86/mmu: Track if sptes refer to refcounted pages David Stevens
2024-02-29 2:57 ` [PATCH v11 8/8] KVM: x86/mmu: Handle non-refcounted pages David Stevens
2024-04-04 16:03 ` Dmitry Osipenko
2024-04-15 7:28 ` David Stevens
2024-04-15 9:36 ` Paolo Bonzini
2024-02-29 13:36 ` [PATCH v11 0/8] KVM: allow mapping " Christoph Hellwig
2024-03-13 4:55 ` David Stevens
2024-03-13 9:55 ` Christian König
2024-03-13 13:34 ` Sean Christopherson
2024-03-13 14:37 ` Christian König
2024-03-13 14:48 ` Sean Christopherson
[not found] ` <9e604f99-5b63-44d7-8476-00859dae1dc4@amd.com>
2024-03-13 15:09 ` Christian König
2024-03-13 15:47 ` Sean Christopherson [this message]
[not found] ` <93df19f9-6dab-41fc-bbcd-b108e52ff50b@amd.com>
2024-03-13 17:26 ` Sean Christopherson
[not found] ` <c84fcf0a-f944-4908-b7f6-a1b66a66a6bc@amd.com>
2024-03-14 9:20 ` Christian König
2024-03-14 11:31 ` David Stevens
2024-03-14 11:51 ` Christian König
2024-03-14 14:45 ` Sean Christopherson
2024-03-18 1:26 ` Christoph Hellwig
2024-03-18 13:10 ` Paolo Bonzini
2024-03-18 23:20 ` Christoph Hellwig
2024-03-14 16:17 ` Sean Christopherson
2024-03-14 17:19 ` Sean Christopherson
2024-03-15 17:59 ` Sean Christopherson
2024-03-20 20:54 ` Axel Rasmussen
2024-03-13 13:33 ` Christoph Hellwig
2024-06-21 18:32 ` Sean Christopherson
2024-07-31 11:41 ` Alex Bennée
2024-07-31 15:01 ` Sean Christopherson
2024-08-05 23:44 ` David Stevens
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZfHKoxVMcBAMqcSC@google.com \
--to=seanjc@google.com \
--cc=christian.koenig@amd.com \
--cc=hch@infradead.org \
--cc=isaku.yamahata@gmail.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=mlevitsk@redhat.com \
--cc=pbonzini@redhat.com \
--cc=stevensd@chromium.org \
--cc=yu.c.zhang@linux.intel.com \
--cc=zhi.wang.linux@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).