All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <sean.j.christopherson@intel.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-hyperv@vger.kernel.org, "Michal Hocko" <mhocko@suse.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"KVM list" <kvm@vger.kernel.org>,
	"Pavel Tatashin" <pavel.tatashin@microsoft.com>,
	"KarimAllah Ahmed" <karahmed@amazon.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Alexander Duyck" <alexander.duyck@gmail.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"Paul Mackerras" <paulus@samba.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Wanpeng Li" <wanpengli@tencent.com>,
	"Alexander Duyck" <alexander.h.duyck@linux.intel.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Kees Cook" <keescook@chromium.org>,
	devel@driverdev.osuosl.org,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Stephen Hemminger" <sthemmin@microsoft.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	"Joerg Roedel" <joro@8bytes.org>, "X86 ML" <x86@kernel.org>,
	YueHaibing <yuehaibing@huawei.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Mike Rapoport" <rppt@linux.ibm.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Anthony Yznaga" <anthony.yznaga@oracle.com>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Isaac J. Manjarres" <isaacm@codeaurora.org>,
	"Matt Sickler" <Matt.Sickler@daktronics.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Anshuman Khandual" <anshuman.khandual@arm.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"Sasha Levin" <sashal@kernel.org>,
	kvm-ppc@vger.kernel.org, "Qian Cai" <cai@lca.pw>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Mike Rapoport" <rppt@linux.vnet.ibm.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Andy Lutomirski" <luto@kernel.org>,
	xen-devel <xen-devel@lists.xenproject.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Vitaly Kuznetsov" <vkuznets@redhat.com>,
	"Allison Randal" <allison@lohutok.net>,
	"Jim Mattson" <jmattson@google.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Adam Borowski" <kilobyte@angband.pl>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Pavel Tatashin" <pasha.tatashin@soleen.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes
Date: Wed, 6 Nov 2019 08:09:13 -0800	[thread overview]
Message-ID: <20191106160913.GD16249@linux.intel.com> (raw)
In-Reply-To: <694202e7-d8e6-6ac8-6e47-3553b298bbcc@redhat.com>

On Wed, Nov 06, 2019 at 07:56:34AM +0100, David Hildenbrand wrote:
> On 06.11.19 01:08, Dan Williams wrote:
> >On Tue, Nov 5, 2019 at 4:03 PM Sean Christopherson
> >>But David's proposed fix for the above refcount bug is to omit the patch
> >>so that KVM no longer treats ZONE_DEVICE pages as reserved.  That seems
> >>like the right thing to do, including for thp_adjust(), e.g. it would
> >>naturally let KVM use 2mb pages for the guest when a ZONE_DEVICE page is
> >>mapped with a huge page (2mb or above) in the host.  The only hiccup is
> >>figuring out how to correctly transfer the reference.
> >
> >That might not be the only hiccup. There's currently no such thing as
> >huge pages for ZONE_DEVICE, there are huge *mappings* (pmd and pud),
> >but the result of pfn_to_page() on such a mapping does not yield a
> >huge 'struct page'. It seems there are other paths in KVM that assume
> >that more typical page machinery is active like SetPageDirty() based
> >on kvm_is_reserved_pfn(). While I told David that I did not want to
> >see more usage of is_zone_device_page(), this patch below (untested)
> >seems a cleaner path with less surprises:
> >
> >diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >index 4df0aa6b8e5c..fbea17c1810c 100644
> >--- a/virt/kvm/kvm_main.c
> >+++ b/virt/kvm/kvm_main.c
> >@@ -1831,7 +1831,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);
> >
> >  void kvm_release_pfn_clean(kvm_pfn_t pfn)
> >  {
> >-       if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
> >+       if ((!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) ||

The is_error_noslot_pfn() check shouldn't be overriden by zone_device.

> >+           (pfn_valid(pfn) && is_zone_device_page(pfn_to_page(pfn))))

But rather than special case kvm_release_pfn_clean(), I'd rather KVM
explicitly handle ZONE_DEVICE pages, there are other flows where KVM
really should be aware of ZONE_DEVICE pages, e.g. for sanity checks and
whatnot.  There are surprisingly few callers of kvm_is_reserved_pfn(), so
it's actually not too big of a change. 

> >                 put_page(pfn_to_page(pfn));
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
> 
> I had the same thought, but I do wonder about the kvm_get_pfn() users,
> e.g.,:
> 
> hva_to_pfn_remapped():
> 	r = follow_pfn(vma, addr, &pfn);
> 	...
> 	kvm_get_pfn(pfn);
> 	...
> 
> We would not take a reference for ZONE_DEVICE, but later drop one reference
> via kvm_release_pfn_clean(). IOW, kvm_get_pfn() gets *really* dangerous to
> use. I can't tell if this can happen right now.
> 
> We do have 3 users of kvm_get_pfn() that we have to audit before this
> change. Also, we should add a comment to kvm_get_pfn() that it should never
> be used with possible ZONE_DEVICE pages.
> 
> Also, we should add a comment to kvm_release_pfn_clean(), describing why we
> treat ZONE_DEVICE in a special way here.
> 
> 
> We can then progress like this
> 
> 1. Get this fix upstream, it's somewhat unrelated to this series.
> 2. This patch here remains as is in this series (+/- documentation update)
> 3. Long term, rework KVM to not have to not treat ZONE_DEVICE like reserved
> pages. E.g., get rid of kvm_get_pfn(). Then, this special zone check can go.

Dropping kvm_get_pfn() is less than ideal, and at this point unnecessary.
I'm 99% sure the existing call sites for kvm_get_pfn() can never be
reached with ZONE_DEVICE pages.  I think we can do:

  1. Get a fix upstream to have KVM stop treating ZONE_DEVICE pages as
     reserved PFNs, i.e. exempt them in kvm_is_reserved_pfn() and change
     the callers of kvm_is_reserved_pfn() to handle ZONE_DEVICE pages.
  2. Drop this patch from the series, and instead remove the special
     treatment of ZONE_DEVICE pages from kvm_is_reserved_pfn().

Give me a few minutes to prep a patch.

> 
> -- 
> 
> Thanks,
> 
> David / dhildenb
> 

WARNING: multiple messages have this Message-ID (diff)
From: Sean Christopherson <sean.j.christopherson@intel.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-hyperv@vger.kernel.org, "Michal Hocko" <mhocko@suse.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"KVM list" <kvm@vger.kernel.org>,
	"Pavel Tatashin" <pavel.tatashin@microsoft.com>,
	"KarimAllah Ahmed" <karahmed@amazon.de>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Alexander Duyck" <alexander.duyck@gmail.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Paul Mackerras" <paulus@ozlabs.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"Paul Mackerras" <paulus@samba.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Wanpeng Li" <wanpengli@tencent.com>,
	"Alexander Duyck" <alexander.h.duyck@linux.intel.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Kees Cook" <keescook@chromium.org>,
	devel@driverdev.osuosl.org,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Stephen Hemminger" <sthemmin@microsoft.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	"Joerg Roedel" <joro@8bytes.org>, "X86 ML" <x86@kernel.org>,
	YueHaibing <yuehaibing@huawei.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Mike Rapoport" <rppt@linux.ibm.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Anthony Yznaga" <anthony.yznaga@oracle.com>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Isaac J. Manjarres" <isaacm@codeaurora.org>,
	"Matt Sickler" <Matt.Sickler@daktronics.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Anshuman Khandual" <anshuman.khandual@arm.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"Sasha Levin" <sashal@kernel.org>,
	kvm-ppc@vger.kernel.org, "Qian Cai" <cai@lca.pw>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Mike Rapoport" <rppt@linux.vnet.ibm.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Andy Lutomirski" <luto@kernel.org>,
	xen-devel <xen-devel@lists.xenproject.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Vitaly Kuznetsov" <vkuznets@redhat.com>,
	"Allison Randal" <allison@lohutok.net>,
	"Jim Mattson" <jmattson@google.com>,
	"Christophe Leroy" <christophe.leroy@c-s.fr>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Adam Borowski" <kilobyte@angband.pl>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Pavel Tatashin" <pasha.tatashin@soleen.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [Xen-devel] [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes
Date: Wed, 6 Nov 2019 08:09:13 -0800	[thread overview]
Message-ID: <20191106160913.GD16249@linux.intel.com> (raw)
In-Reply-To: <694202e7-d8e6-6ac8-6e47-3553b298bbcc@redhat.com>

On Wed, Nov 06, 2019 at 07:56:34AM +0100, David Hildenbrand wrote:
> On 06.11.19 01:08, Dan Williams wrote:
> >On Tue, Nov 5, 2019 at 4:03 PM Sean Christopherson
> >>But David's proposed fix for the above refcount bug is to omit the patch
> >>so that KVM no longer treats ZONE_DEVICE pages as reserved.  That seems
> >>like the right thing to do, including for thp_adjust(), e.g. it would
> >>naturally let KVM use 2mb pages for the guest when a ZONE_DEVICE page is
> >>mapped with a huge page (2mb or above) in the host.  The only hiccup is
> >>figuring out how to correctly transfer the reference.
> >
> >That might not be the only hiccup. There's currently no such thing as
> >huge pages for ZONE_DEVICE, there are huge *mappings* (pmd and pud),
> >but the result of pfn_to_page() on such a mapping does not yield a
> >huge 'struct page'. It seems there are other paths in KVM that assume
> >that more typical page machinery is active like SetPageDirty() based
> >on kvm_is_reserved_pfn(). While I told David that I did not want to
> >see more usage of is_zone_device_page(), this patch below (untested)
> >seems a cleaner path with less surprises:
> >
> >diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >index 4df0aa6b8e5c..fbea17c1810c 100644
> >--- a/virt/kvm/kvm_main.c
> >+++ b/virt/kvm/kvm_main.c
> >@@ -1831,7 +1831,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);
> >
> >  void kvm_release_pfn_clean(kvm_pfn_t pfn)
> >  {
> >-       if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
> >+       if ((!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) ||

The is_error_noslot_pfn() check shouldn't be overriden by zone_device.

> >+           (pfn_valid(pfn) && is_zone_device_page(pfn_to_page(pfn))))

But rather than special case kvm_release_pfn_clean(), I'd rather KVM
explicitly handle ZONE_DEVICE pages, there are other flows where KVM
really should be aware of ZONE_DEVICE pages, e.g. for sanity checks and
whatnot.  There are surprisingly few callers of kvm_is_reserved_pfn(), so
it's actually not too big of a change. 

> >                 put_page(pfn_to_page(pfn));
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
> 
> I had the same thought, but I do wonder about the kvm_get_pfn() users,
> e.g.,:
> 
> hva_to_pfn_remapped():
> 	r = follow_pfn(vma, addr, &pfn);
> 	...
> 	kvm_get_pfn(pfn);
> 	...
> 
> We would not take a reference for ZONE_DEVICE, but later drop one reference
> via kvm_release_pfn_clean(). IOW, kvm_get_pfn() gets *really* dangerous to
> use. I can't tell if this can happen right now.
> 
> We do have 3 users of kvm_get_pfn() that we have to audit before this
> change. Also, we should add a comment to kvm_get_pfn() that it should never
> be used with possible ZONE_DEVICE pages.
> 
> Also, we should add a comment to kvm_release_pfn_clean(), describing why we
> treat ZONE_DEVICE in a special way here.
> 
> 
> We can then progress like this
> 
> 1. Get this fix upstream, it's somewhat unrelated to this series.
> 2. This patch here remains as is in this series (+/- documentation update)
> 3. Long term, rework KVM to not have to not treat ZONE_DEVICE like reserved
> pages. E.g., get rid of kvm_get_pfn(). Then, this special zone check can go.

Dropping kvm_get_pfn() is less than ideal, and at this point unnecessary.
I'm 99% sure the existing call sites for kvm_get_pfn() can never be
reached with ZONE_DEVICE pages.  I think we can do:

  1. Get a fix upstream to have KVM stop treating ZONE_DEVICE pages as
     reserved PFNs, i.e. exempt them in kvm_is_reserved_pfn() and change
     the callers of kvm_is_reserved_pfn() to handle ZONE_DEVICE pages.
  2. Drop this patch from the series, and instead remove the special
     treatment of ZONE_DEVICE pages from kvm_is_reserved_pfn().

Give me a few minutes to prep a patch.

> 
> -- 
> 
> Thanks,
> 
> David / dhildenb
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

WARNING: multiple messages have this Message-ID (diff)
From: Sean Christopherson <sean.j.christopherson@intel.com>
To: David Hildenbrand <david@redhat.com>
Cc: "Dan Williams" <dan.j.williams@intel.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	kvm-ppc@vger.kernel.org,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	"KVM list" <kvm@vger.kernel.org>,
	linux-hyperv@vger.kernel.org, devel@driverdev.osuosl.org,
	xen-devel <xen-devel@lists.xenproject.org>,
	"X86 ML" <x86@kernel.org>,
	"Alexander Duyck" <alexander.duyck@gmail.com>,
	"Alexander Duyck" <alexander.h.duyck@linux.intel.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Allison Randal" <allison@lohutok.net>,
	"Andy Lutomirski" <luto@kernel.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	"Anshuman Khandual" <anshuman.khandual@arm.com>,
	"Anthony Yznaga" <anthony.yznaga@oracle.com>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Borislav Petkov" <bp@alien8.de>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Christophe Leroy" <christophe.leroy@c-s.fr>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Isaac J. Manjarres" <isaacm@codeaurora.org>,
	"Jim Mattson" <jmattson@google.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Juergen Gross" <jgross@suse.com>,
	"KarimAllah Ahmed" <karahmed@amazon.de>,
	"Kees Cook" <keescook@chromium.org>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Matt Sickler" <Matt.Sickler@daktronics.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Michal Hocko" <mhocko@suse.com>,
	"Mike Rapoport" <rppt@linux.ibm.com>,
	"Mike Rapoport" <rppt@linux.vnet.ibm.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Paul Mackerras" <paulus@ozlabs.org>,
	"Paul Mackerras" <paulus@samba.org>,
	"Pavel Tatashin" <pasha.tatashin@soleen.com>,
	"Pavel Tatashin" <pavel.tatashin@microsoft.com>,
	"Peter Zijlstra" <peterz@infradead.org>, "Qian Cai" <cai@lca.pw>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Sasha Levin" <sashal@kernel.org>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Stephen Hemminger" <sthemmin@microsoft.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Vitaly Kuznetsov" <vkuznets@redhat.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Wanpeng Li" <wanpengli@tencent.com>,
	YueHaibing <yuehaibing@huawei.com>,
	"Adam Borowski" <kilobyte@angband.pl>
Subject: Re: [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes
Date: Wed, 6 Nov 2019 08:09:13 -0800	[thread overview]
Message-ID: <20191106160913.GD16249@linux.intel.com> (raw)
In-Reply-To: <694202e7-d8e6-6ac8-6e47-3553b298bbcc@redhat.com>

On Wed, Nov 06, 2019 at 07:56:34AM +0100, David Hildenbrand wrote:
> On 06.11.19 01:08, Dan Williams wrote:
> >On Tue, Nov 5, 2019 at 4:03 PM Sean Christopherson
> >>But David's proposed fix for the above refcount bug is to omit the patch
> >>so that KVM no longer treats ZONE_DEVICE pages as reserved.  That seems
> >>like the right thing to do, including for thp_adjust(), e.g. it would
> >>naturally let KVM use 2mb pages for the guest when a ZONE_DEVICE page is
> >>mapped with a huge page (2mb or above) in the host.  The only hiccup is
> >>figuring out how to correctly transfer the reference.
> >
> >That might not be the only hiccup. There's currently no such thing as
> >huge pages for ZONE_DEVICE, there are huge *mappings* (pmd and pud),
> >but the result of pfn_to_page() on such a mapping does not yield a
> >huge 'struct page'. It seems there are other paths in KVM that assume
> >that more typical page machinery is active like SetPageDirty() based
> >on kvm_is_reserved_pfn(). While I told David that I did not want to
> >see more usage of is_zone_device_page(), this patch below (untested)
> >seems a cleaner path with less surprises:
> >
> >diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >index 4df0aa6b8e5c..fbea17c1810c 100644
> >--- a/virt/kvm/kvm_main.c
> >+++ b/virt/kvm/kvm_main.c
> >@@ -1831,7 +1831,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);
> >
> >  void kvm_release_pfn_clean(kvm_pfn_t pfn)
> >  {
> >-       if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
> >+       if ((!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) ||

The is_error_noslot_pfn() check shouldn't be overriden by zone_device.

> >+           (pfn_valid(pfn) && is_zone_device_page(pfn_to_page(pfn))))

But rather than special case kvm_release_pfn_clean(), I'd rather KVM
explicitly handle ZONE_DEVICE pages, there are other flows where KVM
really should be aware of ZONE_DEVICE pages, e.g. for sanity checks and
whatnot.  There are surprisingly few callers of kvm_is_reserved_pfn(), so
it's actually not too big of a change. 

> >                 put_page(pfn_to_page(pfn));
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
> 
> I had the same thought, but I do wonder about the kvm_get_pfn() users,
> e.g.,:
> 
> hva_to_pfn_remapped():
> 	r = follow_pfn(vma, addr, &pfn);
> 	...
> 	kvm_get_pfn(pfn);
> 	...
> 
> We would not take a reference for ZONE_DEVICE, but later drop one reference
> via kvm_release_pfn_clean(). IOW, kvm_get_pfn() gets *really* dangerous to
> use. I can't tell if this can happen right now.
> 
> We do have 3 users of kvm_get_pfn() that we have to audit before this
> change. Also, we should add a comment to kvm_get_pfn() that it should never
> be used with possible ZONE_DEVICE pages.
> 
> Also, we should add a comment to kvm_release_pfn_clean(), describing why we
> treat ZONE_DEVICE in a special way here.
> 
> 
> We can then progress like this
> 
> 1. Get this fix upstream, it's somewhat unrelated to this series.
> 2. This patch here remains as is in this series (+/- documentation update)
> 3. Long term, rework KVM to not have to not treat ZONE_DEVICE like reserved
> pages. E.g., get rid of kvm_get_pfn(). Then, this special zone check can go.

Dropping kvm_get_pfn() is less than ideal, and at this point unnecessary.
I'm 99% sure the existing call sites for kvm_get_pfn() can never be
reached with ZONE_DEVICE pages.  I think we can do:

  1. Get a fix upstream to have KVM stop treating ZONE_DEVICE pages as
     reserved PFNs, i.e. exempt them in kvm_is_reserved_pfn() and change
     the callers of kvm_is_reserved_pfn() to handle ZONE_DEVICE pages.
  2. Drop this patch from the series, and instead remove the special
     treatment of ZONE_DEVICE pages from kvm_is_reserved_pfn().

Give me a few minutes to prep a patch.

> 
> -- 
> 
> Thanks,
> 
> David / dhildenb
> 


  reply	other threads:[~2019-11-06 19:58 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-24 12:09 [PATCH v1 00/10] mm: Don't mark hotplugged pages PG_reserved (including ZONE_DEVICE) David Hildenbrand
2019-10-24 12:09 ` David Hildenbrand
2019-10-24 12:09 ` [Xen-devel] " David Hildenbrand
2019-10-24 12:09 ` [PATCH v1 01/10] mm/memory_hotplug: Don't allow to online/offline memory blocks with holes David Hildenbrand
2019-10-24 12:09   ` David Hildenbrand
2019-10-24 12:09   ` [Xen-devel] " David Hildenbrand
2019-11-05  1:30   ` Dan Williams
2019-11-05  1:30     ` Dan Williams
2019-11-05  1:30     ` [Xen-devel] " Dan Williams
2019-11-05  9:31     ` David Hildenbrand
2019-11-05  9:31       ` David Hildenbrand
2019-11-05  9:31       ` [Xen-devel] " David Hildenbrand
2019-10-24 12:09 ` [PATCH v1 02/10] KVM: x86/mmu: Prepare kvm_is_mmio_pfn() for PG_reserved changes David Hildenbrand
2019-10-24 12:09   ` David Hildenbrand
2019-10-24 12:09   ` [Xen-devel] " David Hildenbrand
2019-11-05  1:37   ` Dan Williams
2019-11-05  1:37     ` Dan Williams
2019-11-05  1:37     ` [Xen-devel] " Dan Williams
2019-11-05 11:09     ` David Hildenbrand
2019-11-05 11:09       ` David Hildenbrand
2019-11-05 11:09       ` [Xen-devel] " David Hildenbrand
2019-10-24 12:09 ` [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() " David Hildenbrand
2019-10-24 12:09   ` David Hildenbrand
2019-10-24 12:09   ` [Xen-devel] " David Hildenbrand
2019-11-05  4:38   ` Dan Williams
2019-11-05  4:38     ` Dan Williams
2019-11-05  4:38     ` [Xen-devel] " Dan Williams
2019-11-05  9:17     ` David Hildenbrand
2019-11-05  9:17       ` David Hildenbrand
2019-11-05  9:17       ` [Xen-devel] " David Hildenbrand
2019-11-05  9:49       ` David Hildenbrand
2019-11-05  9:49         ` David Hildenbrand
2019-11-05  9:49         ` [Xen-devel] " David Hildenbrand
2019-11-05 10:02         ` David Hildenbrand
2019-11-05 10:02           ` David Hildenbrand
2019-11-05 10:02           ` [Xen-devel] " David Hildenbrand
2019-11-05 16:00           ` Sean Christopherson
2019-11-05 16:00             ` Sean Christopherson
2019-11-05 16:00             ` [Xen-devel] " Sean Christopherson
2019-11-05 20:30             ` David Hildenbrand
2019-11-05 20:30               ` David Hildenbrand
2019-11-05 20:30               ` [Xen-devel] " David Hildenbrand
2019-11-05 22:22               ` Sean Christopherson
2019-11-05 22:22                 ` Sean Christopherson
2019-11-05 22:22                 ` [Xen-devel] " Sean Christopherson
2019-11-05 23:02               ` Dan Williams
2019-11-05 23:02                 ` Dan Williams
2019-11-05 23:02                 ` [Xen-devel] " Dan Williams
2019-11-05 23:13                 ` Sean Christopherson
2019-11-05 23:13                   ` Sean Christopherson
2019-11-05 23:13                   ` [Xen-devel] " Sean Christopherson
2019-11-05 23:30                   ` Dan Williams
2019-11-05 23:30                     ` Dan Williams
2019-11-05 23:30                     ` [Xen-devel] " Dan Williams
2019-11-05 23:42                     ` Sean Christopherson
2019-11-05 23:42                       ` Sean Christopherson
2019-11-05 23:42                       ` [Xen-devel] " Sean Christopherson
2019-11-05 23:43                     ` Dan Williams
2019-11-05 23:43                       ` Dan Williams
2019-11-05 23:43                       ` [Xen-devel] " Dan Williams
2019-11-06  0:03                       ` Sean Christopherson
2019-11-06  0:03                         ` Sean Christopherson
2019-11-06  0:03                         ` [Xen-devel] " Sean Christopherson
2019-11-06  0:08                         ` Dan Williams
2019-11-06  0:08                           ` Dan Williams
2019-11-06  0:08                           ` [Xen-devel] " Dan Williams
2019-11-06  6:56                           ` David Hildenbrand
2019-11-06  6:56                             ` David Hildenbrand
2019-11-06  6:56                             ` [Xen-devel] " David Hildenbrand
2019-11-06 16:09                             ` Sean Christopherson [this message]
2019-11-06 16:09                               ` Sean Christopherson
2019-11-06 16:09                               ` [Xen-devel] " Sean Christopherson
2019-10-24 12:09 ` [PATCH v1 04/10] vfio/type1: Prepare is_invalid_reserved_pfn() " David Hildenbrand
2019-10-24 12:09   ` David Hildenbrand
2019-10-24 12:09   ` [Xen-devel] " David Hildenbrand
2019-11-07 15:40   ` Dan Williams
2019-11-07 15:40     ` Dan Williams
2019-11-07 15:40     ` [Xen-devel] " Dan Williams
2019-11-07 18:22     ` David Hildenbrand
2019-11-07 18:22       ` David Hildenbrand
2019-11-07 18:22       ` [Xen-devel] " David Hildenbrand
2019-11-07 22:07       ` David Hildenbrand
2019-11-07 22:07         ` David Hildenbrand
2019-11-07 22:07         ` [Xen-devel] " David Hildenbrand
2019-11-08  5:09         ` Dan Williams
2019-11-08  5:09           ` Dan Williams
2019-11-08  5:09           ` [Xen-devel] " Dan Williams
2019-11-08  7:14           ` David Hildenbrand
2019-11-08  7:14             ` David Hildenbrand
2019-11-08  7:14             ` [Xen-devel] " David Hildenbrand
2019-11-08 10:21             ` David Hildenbrand
2019-11-08 10:21               ` David Hildenbrand
2019-11-08 10:21               ` [Xen-devel] " David Hildenbrand
2019-11-08 18:29               ` Dan Williams
2019-11-08 18:29                 ` Dan Williams
2019-11-08 18:29                 ` [Xen-devel] " Dan Williams
2019-11-08 23:01                 ` David Hildenbrand
2019-11-08 23:01                   ` David Hildenbrand
2019-11-08 23:01                   ` [Xen-devel] " David Hildenbrand
2019-10-24 12:09 ` [PATCH v1 05/10] powerpc/book3s: Prepare kvmppc_book3s_instantiate_page() " David Hildenbrand
2019-10-24 12:09   ` David Hildenbrand
2019-10-24 12:09   ` [Xen-devel] " David Hildenbrand
2019-10-24 12:09 ` [PATCH v1 06/10] powerpc/64s: Prepare hash_page_do_lazy_icache() " David Hildenbrand
2019-10-24 12:09   ` David Hildenbrand
2019-10-24 12:09   ` [Xen-devel] " David Hildenbrand
2019-10-24 12:09 ` [PATCH v1 07/10] powerpc/mm: Prepare maybe_pte_to_page() " David Hildenbrand
2019-10-24 12:09   ` David Hildenbrand
2019-10-24 12:09   ` [Xen-devel] " David Hildenbrand
2019-10-24 12:09 ` [PATCH v1 08/10] x86/mm: Prepare __ioremap_check_ram() " David Hildenbrand
2019-10-24 12:09   ` David Hildenbrand
2019-10-24 12:09   ` [Xen-devel] " David Hildenbrand
2019-10-24 12:09 ` [PATCH v1 09/10] mm/memory_hotplug: Don't mark pages PG_reserved when initializing the memmap David Hildenbrand
2019-10-24 12:09   ` David Hildenbrand
2019-10-24 12:09   ` [Xen-devel] " David Hildenbrand
2019-11-04 22:44   ` Boris Ostrovsky
2019-11-04 22:44     ` Boris Ostrovsky
2019-11-04 22:44     ` [Xen-devel] " Boris Ostrovsky
2019-11-05 10:18     ` David Hildenbrand
2019-11-05 10:18       ` David Hildenbrand
2019-11-05 10:18       ` [Xen-devel] " David Hildenbrand
2019-11-05 16:06       ` Boris Ostrovsky
2019-11-05 16:06         ` Boris Ostrovsky
2019-11-05 16:06         ` [Xen-devel] " Boris Ostrovsky
2019-10-24 12:09 ` [PATCH v1 10/10] mm/usercopy.c: Update comment in check_page_span() regarding ZONE_DEVICE David Hildenbrand
2019-10-24 12:09   ` David Hildenbrand
2019-10-24 12:09   ` [Xen-devel] " David Hildenbrand
2019-11-01 19:24 ` [PATCH v1 00/10] mm: Don't mark hotplugged pages PG_reserved (including ZONE_DEVICE) David Hildenbrand
2019-11-01 19:24   ` David Hildenbrand
2019-11-01 19:24   ` [Xen-devel] " David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191106160913.GD16249@linux.intel.com \
    --to=sean.j.christopherson@intel.com \
    --cc=Matt.Sickler@daktronics.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=alexander.duyck@gmail.com \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=allison@lohutok.net \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=anshuman.khandual@arm.com \
    --cc=anthony.yznaga@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=cai@lca.pw \
    --cc=cohuck@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=devel@driverdev.osuosl.org \
    --cc=haiyangz@microsoft.com \
    --cc=hannes@cmpxchg.org \
    --cc=hpa@zytor.com \
    --cc=isaacm@codeaurora.org \
    --cc=jgross@suse.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=karahmed@amazon.de \
    --cc=keescook@chromium.org \
    --cc=kilobyte@angband.pl \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=luto@kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=osalvador@suse.de \
    --cc=pasha.tatashin@soleen.com \
    --cc=paulus@samba.org \
    --cc=pavel.tatashin@microsoft.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rkrcmar@redhat.com \
    --cc=rppt@linux.ibm.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=sashal@kernel.org \
    --cc=sstabellini@kernel.org \
    --cc=sthemmin@microsoft.com \
    --cc=tglx@linutronix.de \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    --cc=yuehaibing@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.