From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dan Williams" <dan.j.williams@intel.com>,
"Radim Krčmář" <rkrcmar@redhat.com>,
"Vitaly Kuznetsov" <vkuznets@redhat.com>,
"Wanpeng Li" <wanpengli@tencent.com>,
"Jim Mattson" <jmattson@google.com>,
"Joerg Roedel" <joro@8bytes.org>,
"KVM list" <kvm@vger.kernel.org>,
"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
"Adam Borowski" <kilobyte@angband.pl>,
"David Hildenbrand" <david@redhat.com>
Subject: Re: [PATCH 1/2] KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved
Date: Wed, 6 Nov 2019 13:30:32 -0800 [thread overview]
Message-ID: <20191106213032.GA20475@linux.intel.com> (raw)
In-Reply-To: <1cf71906-ba99-e637-650f-fc08ac4f3d5f@redhat.com>
On Wed, Nov 06, 2019 at 10:09:29PM +0100, Paolo Bonzini wrote:
> On 06/11/19 19:04, Dan Williams wrote:
> > On Wed, Nov 6, 2019 at 9:07 AM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > This is racy unless you can be certain that the pfn and resulting page
> > has already been pinned by get_user_pages().
>
> What is the race exactly?
The check in kvm_get_pfn() is guaranteed to be racy, but that's already
broken with respect to ZONE_DEVICE.
> In general KVM does not use pfn's until after having gotten them from
> get_user_pages (or follow_pfn for VM_IO | VM_PFNMAP vmas, for which
> get_user_pages fails, but this is not an issue here). It then creates
> the page tables and releases the reference to the struct page.
>
> Anything else happens _after_ the reference has been released, but still
> from an mmu notifier; this is why KVM uses pfn_to_page quite pervasively.
>
> If this is enough to avoid races, then I prefer Sean's patch. If it is
> racy, we need to fix kvm_set_pfn_accessed and kvm_set_pfn_dirty first,
> and second at transparent_hugepage_adjust and kvm_mmu_zap_collapsible_spte:
transparent_hugepage_adjust() would be ok, it's called before the original
reference is put back.
> - if accessed/dirty state need not be tracked properly for ZONE_DEVICE,
> then I suppose David's patch is okay (though I'd like to have a big
> comment explaining all the things that went on in these emails). If
> they need to work, however, Sean's patch 1 is the right thing to do.
>
> - if we need Sean's patch 1, but it is racy to use is_zone_device_page,
> we could first introduce a helper similar to kvm_is_hugepage_allowed()
> from his patch 2, but using pfn_to_online_page() to filter out
> ZONE_DEVICE pages. This would cover both transparent_hugepage_adjust
> and kvm_mmu_zap_collapsible_spte.
After more thought, I agree with Paolo that adding kvm_is_zone_device_pfn()
is preferably if blocking with mmu_notifier is sufficient to avoid races.
The only caller that isn't protected by mmu_notifier (or holding a gup()-
obtained referece) is kvm_get_pfn(), but again, that's completely borked
anyways. Adding a WARN_ON(page_count()) in kvm_is_zone_device_pfn() would
help with the auditing issue.
The scope of the changes required to completely avoid is_zone_device_page()
is massive, and probably would result in additional trickery :-(
> > This is why I told David
> > to steer clear of adding more is_zone_device_page() usage, it's
> > difficult to audit. Without an existing pin the metadata to determine
> > whether a page is ZONE_DEVICE or not could be in the process of being
> > torn down. Ideally KVM would pass around a struct { struct page *page,
> > unsigned long pfn } tuple so it would not have to do this "recall
> > context" dance on every pfn operation.
>
> Unfortunately once KVM has created its own page tables, the struct page*
> reference is lost, as the PFN is the only thing that is stored in there.
Ya, the horrible idea I had in mind was to steal a bit from KVM_PFN_ERR_MASK
to track whether or not the PFN is associated with a struct page.
next prev parent reply other threads:[~2019-11-06 21:30 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-06 17:07 [PATCH 0/2] KVM: MMU: Fix a refcount bug with ZONE_DEVICE pages Sean Christopherson
2019-11-06 17:07 ` [PATCH 1/2] KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved Sean Christopherson
2019-11-06 17:14 ` Paolo Bonzini
2019-11-06 17:46 ` Sean Christopherson
2019-11-06 18:04 ` Dan Williams
2019-11-06 20:26 ` Sean Christopherson
2019-11-06 20:34 ` Dan Williams
2019-11-06 21:09 ` Paolo Bonzini
2019-11-06 21:30 ` Sean Christopherson [this message]
2019-11-06 23:20 ` Dan Williams
2019-11-06 23:39 ` Sean Christopherson
2019-11-07 0:01 ` Dan Williams
2019-11-07 5:48 ` Dan Williams
2019-11-07 11:12 ` Paolo Bonzini
2019-11-07 15:36 ` Dan Williams
2019-11-07 15:58 ` Sean Christopherson
2019-11-09 1:43 ` Sean Christopherson
2019-11-09 2:00 ` Dan Williams
2019-11-11 18:27 ` Sean Christopherson
2019-11-11 19:15 ` Paolo Bonzini
2019-11-12 0:51 ` Dan Williams
2019-11-12 10:19 ` Paolo Bonzini
2019-11-12 16:57 ` Sean Christopherson
2019-11-06 17:07 ` [PATCH 2/2] KVM: x86/mmu: Add helper to consolidate huge page promotion Sean Christopherson
2019-11-06 17:22 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191106213032.GA20475@linux.intel.com \
--to=sean.j.christopherson@intel.com \
--cc=dan.j.williams@intel.com \
--cc=david@redhat.com \
--cc=jmattson@google.com \
--cc=joro@8bytes.org \
--cc=kilobyte@angband.pl \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=rkrcmar@redhat.com \
--cc=vkuznets@redhat.com \
--cc=wanpengli@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.