From: Sean Christopherson <seanjc@google.com>
To: David Stevens <stevensd@chromium.org>
Cc: Wanpeng Li <wanpengli@tencent.com>,
kvm@vger.kernel.org, Marc Zyngier <maz@kernel.org>,
Joerg Roedel <joro@8bytes.org>,
linux-kernel@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
Will Deacon <will@kernel.org>,
kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org,
Jim Mattson <jmattson@google.com>
Subject: Re: [PATCH v5 4/4] KVM: mmu: remove over-aggressive warnings
Date: Wed, 5 Jan 2022 19:02:41 +0000 [thread overview]
Message-ID: <YdXrURHO/R82puD4@google.com> (raw)
In-Reply-To: <CAD=HUj5Q6rW8UyxAXUa3o93T0LBqGQb7ScPj07kvuM3txHMMrQ@mail.gmail.com>
On Wed, Jan 05, 2022, David Stevens wrote:
> On Fri, Dec 31, 2021 at 4:22 AM Sean Christopherson <seanjc@google.com> wrote:
> > > */
> > > - if (!pfn_valid(pfn) || WARN_ON_ONCE(!page_count(pfn_to_page(pfn))))
> > > + if (!pfn_valid(pfn) || !page_count(pfn_to_page(pfn)))
> >
> > Hrm, I know the whole point of this series is to support pages without an elevated
> > refcount, but this WARN was extremely helpful in catching several use-after-free
> > bugs in the TDP MMU. We talked about burying a slow check behind MMU_WARN_ON, but
> > that isn't very helpful because no one runs with MMU_WARN_ON, and this is also a
> > type of check that's most useful if it runs in production.
> >
> > IIUC, this series explicitly disallows using pfns that have a struct page without
> > refcounting, and the issue with the WARN here is that kvm_is_zone_device_pfn() is
> > called by kvm_is_reserved_pfn() before ensure_pfn_ref() rejects problematic pages,
> > i.e. triggers false positive.
> >
> > So, can't we preserve the use-after-free benefits of the check by moving it to
> > where KVM releases the PFN? I.e.
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index fbca2e232e94..675b835525fa 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2904,15 +2904,19 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
> >
> > void kvm_set_pfn_dirty(kvm_pfn_t pfn)
> > {
> > - if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn))
> > + if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn)) {
> > + WARN_ON_ONCE(!page_count(pfn_to_page(pfn)));
> > SetPageDirty(pfn_to_page(pfn));
> > + }
> > }
> > EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
>
> I'm still seeing this warning show up via __handle_changed_spte
> calling kvm_set_pfn_dirty:
>
> [ 113.350473] kvm_set_pfn_dirty+0x26/0x3e
> [ 113.354861] __handle_changed_spte+0x452/0x4f6
> [ 113.359841] __handle_changed_spte+0x452/0x4f6
> [ 113.364819] __handle_changed_spte+0x452/0x4f6
> [ 113.369790] zap_gfn_range+0x1de/0x27a
> [ 113.373992] kvm_tdp_mmu_zap_invalidated_roots+0x64/0xb8
> [ 113.379945] kvm_mmu_zap_all_fast+0x18c/0x1c1
> [ 113.384827] kvm_page_track_flush_slot+0x55/0x87
> [ 113.390000] kvm_set_memslot+0x137/0x455
> [ 113.394394] kvm_delete_memslot+0x5c/0x91
> [ 113.398888] __kvm_set_memory_region+0x3c0/0x5e6
> [ 113.404061] kvm_set_memory_region+0x45/0x74
> [ 113.408844] kvm_vm_ioctl+0x563/0x60c
>
> I wasn't seeing it for my particular test case, but the gfn aging code
> might trigger the warning as well.
Ah, I got royally confused by ensure_pfn_ref()'s comment
* Certain IO or PFNMAP mappings can be backed with valid
* struct pages, but be allocated without refcounting e.g.,
* tail pages of non-compound higher order allocations, which
* would then underflow the refcount when the caller does the
* required put_page. Don't allow those pages here.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
that doesn't apply here because kvm_faultin_pfn() uses the low level
__gfn_to_pfn_page_memslot().
and my understanding is that @page will be non-NULL in ensure_pfn_ref() iff the
page has an elevated refcount.
Can you update the changelogs for the x86+arm64 "use gfn_to_pfn_page" patches to
explicitly call out the various ramifications of moving to gfn_to_pfn_page()?
Side topic, s/covert/convert in both changelogs :-)
> I don't know if setting the dirty/accessed bits in non-refcounted
> struct pages is problematic.
Without knowing exactly what lies behind such pages, KVM needs to set dirty bits,
otherwise there's a potential for data lost.
> The only way I can see to avoid it would be to try to map from the spte to
> the vma and then check its flags. If setting the flags is benign, then we'd
> need to do that lookup to differentiate the safe case from the use-after-free
> case. Do you have any advice on how to handle this?
Hrm. I can't think of a clever generic solution. But for x86-64, we can use a
software available bit to mark SPTEs as being refcounted use that flag to assert
the refcount is elevated when marking the backing pfn dirty/accessed. It'd be
64-bit only because we're out of software available bits for PAE paging, but (a)
practically no one cares about 32-bit and (b) odds are slim that a use-after-free
would be unique to 32-bit KVM.
But that can all go in after your series is merged, e.g. I'd prefer to cleanup
make_spte()'s prototype to use @fault adding yet another parameter, and that'll
take a few patches to make happen since FNAME(sync_page) also uses make_spte().
TL;DR: continue as you were, I'll stop whining about this :-)
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
WARNING: multiple messages have this Message-ID (diff)
From: Sean Christopherson <seanjc@google.com>
To: David Stevens <stevensd@chromium.org>
Cc: Marc Zyngier <maz@kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
James Morse <james.morse@arm.com>,
Alexandru Elisei <alexandru.elisei@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Will Deacon <will@kernel.org>, Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org
Subject: Re: [PATCH v5 4/4] KVM: mmu: remove over-aggressive warnings
Date: Wed, 5 Jan 2022 19:02:41 +0000 [thread overview]
Message-ID: <YdXrURHO/R82puD4@google.com> (raw)
In-Reply-To: <CAD=HUj5Q6rW8UyxAXUa3o93T0LBqGQb7ScPj07kvuM3txHMMrQ@mail.gmail.com>
On Wed, Jan 05, 2022, David Stevens wrote:
> On Fri, Dec 31, 2021 at 4:22 AM Sean Christopherson <seanjc@google.com> wrote:
> > > */
> > > - if (!pfn_valid(pfn) || WARN_ON_ONCE(!page_count(pfn_to_page(pfn))))
> > > + if (!pfn_valid(pfn) || !page_count(pfn_to_page(pfn)))
> >
> > Hrm, I know the whole point of this series is to support pages without an elevated
> > refcount, but this WARN was extremely helpful in catching several use-after-free
> > bugs in the TDP MMU. We talked about burying a slow check behind MMU_WARN_ON, but
> > that isn't very helpful because no one runs with MMU_WARN_ON, and this is also a
> > type of check that's most useful if it runs in production.
> >
> > IIUC, this series explicitly disallows using pfns that have a struct page without
> > refcounting, and the issue with the WARN here is that kvm_is_zone_device_pfn() is
> > called by kvm_is_reserved_pfn() before ensure_pfn_ref() rejects problematic pages,
> > i.e. triggers false positive.
> >
> > So, can't we preserve the use-after-free benefits of the check by moving it to
> > where KVM releases the PFN? I.e.
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index fbca2e232e94..675b835525fa 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2904,15 +2904,19 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
> >
> > void kvm_set_pfn_dirty(kvm_pfn_t pfn)
> > {
> > - if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn))
> > + if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn)) {
> > + WARN_ON_ONCE(!page_count(pfn_to_page(pfn)));
> > SetPageDirty(pfn_to_page(pfn));
> > + }
> > }
> > EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
>
> I'm still seeing this warning show up via __handle_changed_spte
> calling kvm_set_pfn_dirty:
>
> [ 113.350473] kvm_set_pfn_dirty+0x26/0x3e
> [ 113.354861] __handle_changed_spte+0x452/0x4f6
> [ 113.359841] __handle_changed_spte+0x452/0x4f6
> [ 113.364819] __handle_changed_spte+0x452/0x4f6
> [ 113.369790] zap_gfn_range+0x1de/0x27a
> [ 113.373992] kvm_tdp_mmu_zap_invalidated_roots+0x64/0xb8
> [ 113.379945] kvm_mmu_zap_all_fast+0x18c/0x1c1
> [ 113.384827] kvm_page_track_flush_slot+0x55/0x87
> [ 113.390000] kvm_set_memslot+0x137/0x455
> [ 113.394394] kvm_delete_memslot+0x5c/0x91
> [ 113.398888] __kvm_set_memory_region+0x3c0/0x5e6
> [ 113.404061] kvm_set_memory_region+0x45/0x74
> [ 113.408844] kvm_vm_ioctl+0x563/0x60c
>
> I wasn't seeing it for my particular test case, but the gfn aging code
> might trigger the warning as well.
Ah, I got royally confused by ensure_pfn_ref()'s comment
* Certain IO or PFNMAP mappings can be backed with valid
* struct pages, but be allocated without refcounting e.g.,
* tail pages of non-compound higher order allocations, which
* would then underflow the refcount when the caller does the
* required put_page. Don't allow those pages here.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
that doesn't apply here because kvm_faultin_pfn() uses the low level
__gfn_to_pfn_page_memslot().
and my understanding is that @page will be non-NULL in ensure_pfn_ref() iff the
page has an elevated refcount.
Can you update the changelogs for the x86+arm64 "use gfn_to_pfn_page" patches to
explicitly call out the various ramifications of moving to gfn_to_pfn_page()?
Side topic, s/covert/convert in both changelogs :-)
> I don't know if setting the dirty/accessed bits in non-refcounted
> struct pages is problematic.
Without knowing exactly what lies behind such pages, KVM needs to set dirty bits,
otherwise there's a potential for data lost.
> The only way I can see to avoid it would be to try to map from the spte to
> the vma and then check its flags. If setting the flags is benign, then we'd
> need to do that lookup to differentiate the safe case from the use-after-free
> case. Do you have any advice on how to handle this?
Hrm. I can't think of a clever generic solution. But for x86-64, we can use a
software available bit to mark SPTEs as being refcounted use that flag to assert
the refcount is elevated when marking the backing pfn dirty/accessed. It'd be
64-bit only because we're out of software available bits for PAE paging, but (a)
practically no one cares about 32-bit and (b) odds are slim that a use-after-free
would be unique to 32-bit KVM.
But that can all go in after your series is merged, e.g. I'd prefer to cleanup
make_spte()'s prototype to use @fault adding yet another parameter, and that'll
take a few patches to make happen since FNAME(sync_page) also uses make_spte().
TL;DR: continue as you were, I'll stop whining about this :-)
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
WARNING: multiple messages have this Message-ID (diff)
From: Sean Christopherson <seanjc@google.com>
To: David Stevens <stevensd@chromium.org>
Cc: Marc Zyngier <maz@kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
James Morse <james.morse@arm.com>,
Alexandru Elisei <alexandru.elisei@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Will Deacon <will@kernel.org>, Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org
Subject: Re: [PATCH v5 4/4] KVM: mmu: remove over-aggressive warnings
Date: Wed, 5 Jan 2022 19:02:41 +0000 [thread overview]
Message-ID: <YdXrURHO/R82puD4@google.com> (raw)
In-Reply-To: <CAD=HUj5Q6rW8UyxAXUa3o93T0LBqGQb7ScPj07kvuM3txHMMrQ@mail.gmail.com>
On Wed, Jan 05, 2022, David Stevens wrote:
> On Fri, Dec 31, 2021 at 4:22 AM Sean Christopherson <seanjc@google.com> wrote:
> > > */
> > > - if (!pfn_valid(pfn) || WARN_ON_ONCE(!page_count(pfn_to_page(pfn))))
> > > + if (!pfn_valid(pfn) || !page_count(pfn_to_page(pfn)))
> >
> > Hrm, I know the whole point of this series is to support pages without an elevated
> > refcount, but this WARN was extremely helpful in catching several use-after-free
> > bugs in the TDP MMU. We talked about burying a slow check behind MMU_WARN_ON, but
> > that isn't very helpful because no one runs with MMU_WARN_ON, and this is also a
> > type of check that's most useful if it runs in production.
> >
> > IIUC, this series explicitly disallows using pfns that have a struct page without
> > refcounting, and the issue with the WARN here is that kvm_is_zone_device_pfn() is
> > called by kvm_is_reserved_pfn() before ensure_pfn_ref() rejects problematic pages,
> > i.e. triggers false positive.
> >
> > So, can't we preserve the use-after-free benefits of the check by moving it to
> > where KVM releases the PFN? I.e.
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index fbca2e232e94..675b835525fa 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2904,15 +2904,19 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
> >
> > void kvm_set_pfn_dirty(kvm_pfn_t pfn)
> > {
> > - if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn))
> > + if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn)) {
> > + WARN_ON_ONCE(!page_count(pfn_to_page(pfn)));
> > SetPageDirty(pfn_to_page(pfn));
> > + }
> > }
> > EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
>
> I'm still seeing this warning show up via __handle_changed_spte
> calling kvm_set_pfn_dirty:
>
> [ 113.350473] kvm_set_pfn_dirty+0x26/0x3e
> [ 113.354861] __handle_changed_spte+0x452/0x4f6
> [ 113.359841] __handle_changed_spte+0x452/0x4f6
> [ 113.364819] __handle_changed_spte+0x452/0x4f6
> [ 113.369790] zap_gfn_range+0x1de/0x27a
> [ 113.373992] kvm_tdp_mmu_zap_invalidated_roots+0x64/0xb8
> [ 113.379945] kvm_mmu_zap_all_fast+0x18c/0x1c1
> [ 113.384827] kvm_page_track_flush_slot+0x55/0x87
> [ 113.390000] kvm_set_memslot+0x137/0x455
> [ 113.394394] kvm_delete_memslot+0x5c/0x91
> [ 113.398888] __kvm_set_memory_region+0x3c0/0x5e6
> [ 113.404061] kvm_set_memory_region+0x45/0x74
> [ 113.408844] kvm_vm_ioctl+0x563/0x60c
>
> I wasn't seeing it for my particular test case, but the gfn aging code
> might trigger the warning as well.
Ah, I got royally confused by ensure_pfn_ref()'s comment
* Certain IO or PFNMAP mappings can be backed with valid
* struct pages, but be allocated without refcounting e.g.,
* tail pages of non-compound higher order allocations, which
* would then underflow the refcount when the caller does the
* required put_page. Don't allow those pages here.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
that doesn't apply here because kvm_faultin_pfn() uses the low level
__gfn_to_pfn_page_memslot().
and my understanding is that @page will be non-NULL in ensure_pfn_ref() iff the
page has an elevated refcount.
Can you update the changelogs for the x86+arm64 "use gfn_to_pfn_page" patches to
explicitly call out the various ramifications of moving to gfn_to_pfn_page()?
Side topic, s/covert/convert in both changelogs :-)
> I don't know if setting the dirty/accessed bits in non-refcounted
> struct pages is problematic.
Without knowing exactly what lies behind such pages, KVM needs to set dirty bits,
otherwise there's a potential for data lost.
> The only way I can see to avoid it would be to try to map from the spte to
> the vma and then check its flags. If setting the flags is benign, then we'd
> need to do that lookup to differentiate the safe case from the use-after-free
> case. Do you have any advice on how to handle this?
Hrm. I can't think of a clever generic solution. But for x86-64, we can use a
software available bit to mark SPTEs as being refcounted use that flag to assert
the refcount is elevated when marking the backing pfn dirty/accessed. It'd be
64-bit only because we're out of software available bits for PAE paging, but (a)
practically no one cares about 32-bit and (b) odds are slim that a use-after-free
would be unique to 32-bit KVM.
But that can all go in after your series is merged, e.g. I'd prefer to cleanup
make_spte()'s prototype to use @fault adding yet another parameter, and that'll
take a few patches to make happen since FNAME(sync_page) also uses make_spte().
TL;DR: continue as you were, I'll stop whining about this :-)
next prev parent reply other threads:[~2022-01-05 19:02 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-29 3:43 [PATCH v5 0/4] KVM: allow mapping non-refcounted pages David Stevens
2021-11-29 3:43 ` David Stevens
2021-11-29 3:43 ` David Stevens
2021-11-29 3:43 ` [PATCH v5 1/4] KVM: mmu: introduce new gfn_to_pfn_page functions David Stevens
2021-11-29 3:43 ` David Stevens
2021-11-29 3:43 ` David Stevens
2021-12-30 19:26 ` Sean Christopherson
2021-12-30 19:26 ` Sean Christopherson
2021-12-30 19:26 ` Sean Christopherson
2021-11-29 3:43 ` [PATCH v5 2/4] KVM: x86/mmu: use gfn_to_pfn_page David Stevens
2021-11-29 3:43 ` David Stevens
2021-11-29 3:43 ` David Stevens
2021-12-30 19:30 ` Sean Christopherson
2021-12-30 19:30 ` Sean Christopherson
2021-12-30 19:30 ` Sean Christopherson
2021-11-29 3:43 ` [PATCH v5 3/4] KVM: arm64/mmu: " David Stevens
2021-11-29 3:43 ` David Stevens
2021-11-29 3:43 ` David Stevens
2021-12-30 19:45 ` Sean Christopherson
2021-12-30 19:45 ` Sean Christopherson
2021-12-30 19:45 ` Sean Christopherson
2021-11-29 3:43 ` [PATCH v5 4/4] KVM: mmu: remove over-aggressive warnings David Stevens
2021-11-29 3:43 ` David Stevens
2021-11-29 3:43 ` David Stevens
2021-12-30 19:22 ` Sean Christopherson
2021-12-30 19:22 ` Sean Christopherson
2021-12-30 19:22 ` Sean Christopherson
2022-01-05 7:14 ` David Stevens
2022-01-05 7:14 ` David Stevens
2022-01-05 7:14 ` David Stevens
2022-01-05 19:02 ` Sean Christopherson [this message]
2022-01-05 19:02 ` Sean Christopherson
2022-01-05 19:02 ` Sean Christopherson
2022-01-05 19:19 ` Sean Christopherson
2022-01-05 19:19 ` Sean Christopherson
2022-01-05 19:19 ` Sean Christopherson
2022-01-06 2:42 ` David Stevens
2022-01-06 2:42 ` David Stevens
2022-01-06 2:42 ` David Stevens
2022-01-06 17:38 ` Sean Christopherson
2022-01-06 17:38 ` Sean Christopherson
2022-01-06 17:38 ` Sean Christopherson
2022-01-07 2:21 ` David Stevens
2022-01-07 2:21 ` David Stevens
2022-01-07 2:21 ` David Stevens
2022-01-07 16:31 ` Sean Christopherson
2022-01-07 16:31 ` Sean Christopherson
2022-01-07 16:31 ` Sean Christopherson
2022-01-07 16:46 ` Sean Christopherson
2022-01-07 16:46 ` Sean Christopherson
2022-01-07 16:46 ` Sean Christopherson
2022-01-10 23:47 ` David Stevens
2022-01-10 23:47 ` David Stevens
2022-01-10 23:47 ` David Stevens
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YdXrURHO/R82puD4@google.com \
--to=seanjc@google.com \
--cc=jmattson@google.com \
--cc=joro@8bytes.org \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maz@kernel.org \
--cc=pbonzini@redhat.com \
--cc=stevensd@chromium.org \
--cc=wanpengli@tencent.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.