All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alistair Popple <apopple@nvidia.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Alex Sierra" <alex.sierra@amd.com>,
	"Karol Herbst" <kherbst@redhat.com>,
	"David Airlie" <airlied@linux.ie>,
	nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	linux-mm@kvack.org, amd-gfx@lists.freedesktop.org,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Ben Skeggs" <bskeggs@redhat.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	linux-kernel@vger.kernel.org, "Daniel Vetter" <daniel@ffwll.ch>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: [PATCH 2/7] mm: Free device private pages have zero refcount
Date: Fri, 30 Sep 2022 10:45:37 +1000	[thread overview]
Message-ID: <87k05llly0.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <6335fd87adc7f_f6c9d29474@dwillia2-xfh.jf.intel.com.notmuch>


Dan Williams <dan.j.williams@intel.com> writes:

> Alistair Popple wrote:
>>
>> Jason Gunthorpe <jgg@nvidia.com> writes:
>>
>> > On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
>> >> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
>> >> refcount") device private pages have no longer had an extra reference
>> >> count when the page is in use. However before handing them back to the
>> >> owning device driver we add an extra reference count such that free
>> >> pages have a reference count of one.
>> >>
>> >> This makes it difficult to tell if a page is free or not because both
>> >> free and in use pages will have a non-zero refcount. Instead we should
>> >> return pages to the drivers page allocator with a zero reference count.
>> >> Kernel code can then safely use kernel functions such as
>> >> get_page_unless_zero().
>> >>
>> >> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> >> ---
>> >>  arch/powerpc/kvm/book3s_hv_uvmem.c       | 1 +
>> >>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
>> >>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
>> >>  lib/test_hmm.c                           | 1 +
>> >>  mm/memremap.c                            | 5 -----
>> >>  mm/page_alloc.c                          | 6 ++++++
>> >>  6 files changed, 10 insertions(+), 5 deletions(-)
>> >
>> > I think this is a great idea, but I'm surprised no dax stuff is
>> > touched here?
>>
>> free_zone_device_page() shouldn't be called for pgmap->type ==
>> MEMORY_DEVICE_FS_DAX so I don't think we should have to worry about DAX
>> there. Except that the folio code looks like it might have introduced a
>> bug. AFAICT put_page() always calls
>> put_devmap_managed_page(&folio->page) but folio_put() does not (although
>> folios_put() does!). So it seems folio_put() won't end up calling
>> __put_devmap_managed_page_refs() as I think it should.
>>
>> I think you're right about the change to __init_zone_device_page() - I
>> should limit it to DEVICE_PRIVATE/COHERENT pages only. But I need to
>> look at Dan's patch series more closely as I suspect it might be better
>> to rebase this patch on top of that.
>
> Apologies for the delay I was travelling the past few days. Yes, I think
> this patch slots in nicely to avoid the introduction of an init_mode
> [1]:
>
> https://lore.kernel.org/nvdimm/166329940343.2786261.6047770378829215962.stgit@dwillia2-xfh.jf.intel.com/
>
> Mind if I steal it into my series?

No problem, although I notice Andrew has already merged it into
mm-unstable. If you end up rebasing your series on top of mine I think
all that's needed is a patch somewhere in your series to drop the
various `if (pgmap->type == MEMORY_DEVICE_*)` I added to (hopefully)
avoid breaking DAX. Assuming DAX takes a pagemap reference on struct
page allocation something like below.

---

diff --git a/mm/memremap.c b/mm/memremap.c
index 421bec3a29ee..da1a0e0abb8b 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -507,15 +507,7 @@ void free_zone_device_page(struct page *page)
 	page->mapping = NULL;
 	page->pgmap->ops->page_free(page);

-	if (page->pgmap->type != MEMORY_DEVICE_PRIVATE &&
-	    page->pgmap->type != MEMORY_DEVICE_COHERENT)
-		/*
-		 * Reset the page count to 1 to prepare for handing out the page
-		 * again.
-		 */
-		set_page_count(page, 1);
-	else
-		put_dev_pagemap(page->pgmap);
+	put_dev_pagemap(page->pgmap);
 }

 void zone_device_page_init(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 014dbdf54d62..3e5ff06700ca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6816,9 +6816,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	 * ZONE_DEVICE pages are released directly to the driver page allocator
 	 * which will set the page count to 1 when allocating the page.
 	 */
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
-	    pgmap->type == MEMORY_DEVICE_COHERENT)
-		set_page_count(page, 0);
+	set_page_count(page, 0);
 }

 /*

WARNING: multiple messages have this Message-ID (diff)
From: Alistair Popple <apopple@nvidia.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Alex Sierra" <alex.sierra@amd.com>,
	"David Airlie" <airlied@linux.ie>,
	nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	linux-mm@kvack.org, amd-gfx@lists.freedesktop.org,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Ben Skeggs" <bskeggs@redhat.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	linux-kernel@vger.kernel.org, "Daniel Vetter" <daniel@ffwll.ch>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: [Nouveau] [PATCH 2/7] mm: Free device private pages have zero refcount
Date: Fri, 30 Sep 2022 10:45:37 +1000	[thread overview]
Message-ID: <87k05llly0.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <6335fd87adc7f_f6c9d29474@dwillia2-xfh.jf.intel.com.notmuch>


Dan Williams <dan.j.williams@intel.com> writes:

> Alistair Popple wrote:
>>
>> Jason Gunthorpe <jgg@nvidia.com> writes:
>>
>> > On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
>> >> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
>> >> refcount") device private pages have no longer had an extra reference
>> >> count when the page is in use. However before handing them back to the
>> >> owning device driver we add an extra reference count such that free
>> >> pages have a reference count of one.
>> >>
>> >> This makes it difficult to tell if a page is free or not because both
>> >> free and in use pages will have a non-zero refcount. Instead we should
>> >> return pages to the drivers page allocator with a zero reference count.
>> >> Kernel code can then safely use kernel functions such as
>> >> get_page_unless_zero().
>> >>
>> >> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> >> ---
>> >>  arch/powerpc/kvm/book3s_hv_uvmem.c       | 1 +
>> >>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
>> >>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
>> >>  lib/test_hmm.c                           | 1 +
>> >>  mm/memremap.c                            | 5 -----
>> >>  mm/page_alloc.c                          | 6 ++++++
>> >>  6 files changed, 10 insertions(+), 5 deletions(-)
>> >
>> > I think this is a great idea, but I'm surprised no dax stuff is
>> > touched here?
>>
>> free_zone_device_page() shouldn't be called for pgmap->type ==
>> MEMORY_DEVICE_FS_DAX so I don't think we should have to worry about DAX
>> there. Except that the folio code looks like it might have introduced a
>> bug. AFAICT put_page() always calls
>> put_devmap_managed_page(&folio->page) but folio_put() does not (although
>> folios_put() does!). So it seems folio_put() won't end up calling
>> __put_devmap_managed_page_refs() as I think it should.
>>
>> I think you're right about the change to __init_zone_device_page() - I
>> should limit it to DEVICE_PRIVATE/COHERENT pages only. But I need to
>> look at Dan's patch series more closely as I suspect it might be better
>> to rebase this patch on top of that.
>
> Apologies for the delay I was travelling the past few days. Yes, I think
> this patch slots in nicely to avoid the introduction of an init_mode
> [1]:
>
> https://lore.kernel.org/nvdimm/166329940343.2786261.6047770378829215962.stgit@dwillia2-xfh.jf.intel.com/
>
> Mind if I steal it into my series?

No problem, although I notice Andrew has already merged it into
mm-unstable. If you end up rebasing your series on top of mine I think
all that's needed is a patch somewhere in your series to drop the
various `if (pgmap->type == MEMORY_DEVICE_*)` I added to (hopefully)
avoid breaking DAX. Assuming DAX takes a pagemap reference on struct
page allocation something like below.

---

diff --git a/mm/memremap.c b/mm/memremap.c
index 421bec3a29ee..da1a0e0abb8b 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -507,15 +507,7 @@ void free_zone_device_page(struct page *page)
 	page->mapping = NULL;
 	page->pgmap->ops->page_free(page);

-	if (page->pgmap->type != MEMORY_DEVICE_PRIVATE &&
-	    page->pgmap->type != MEMORY_DEVICE_COHERENT)
-		/*
-		 * Reset the page count to 1 to prepare for handing out the page
-		 * again.
-		 */
-		set_page_count(page, 1);
-	else
-		put_dev_pagemap(page->pgmap);
+	put_dev_pagemap(page->pgmap);
 }

 void zone_device_page_init(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 014dbdf54d62..3e5ff06700ca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6816,9 +6816,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	 * ZONE_DEVICE pages are released directly to the driver page allocator
 	 * which will set the page count to 1 when allocating the page.
 	 */
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
-	    pgmap->type == MEMORY_DEVICE_COHERENT)
-		set_page_count(page, 0);
+	set_page_count(page, 0);
 }

 /*

WARNING: multiple messages have this Message-ID (diff)
From: Alistair Popple <apopple@nvidia.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Alex Sierra" <alex.sierra@amd.com>,
	"Karol Herbst" <kherbst@redhat.com>,
	"David Airlie" <airlied@linux.ie>,
	nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	linux-mm@kvack.org, amd-gfx@lists.freedesktop.org,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Ben Skeggs" <bskeggs@redhat.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Lyude Paul" <lyude@redhat.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	linux-kernel@vger.kernel.org, "Daniel Vetter" <daniel@ffwll.ch>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: [PATCH 2/7] mm: Free device private pages have zero refcount
Date: Fri, 30 Sep 2022 10:45:37 +1000	[thread overview]
Message-ID: <87k05llly0.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <6335fd87adc7f_f6c9d29474@dwillia2-xfh.jf.intel.com.notmuch>


Dan Williams <dan.j.williams@intel.com> writes:

> Alistair Popple wrote:
>>
>> Jason Gunthorpe <jgg@nvidia.com> writes:
>>
>> > On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
>> >> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
>> >> refcount") device private pages have no longer had an extra reference
>> >> count when the page is in use. However before handing them back to the
>> >> owning device driver we add an extra reference count such that free
>> >> pages have a reference count of one.
>> >>
>> >> This makes it difficult to tell if a page is free or not because both
>> >> free and in use pages will have a non-zero refcount. Instead we should
>> >> return pages to the drivers page allocator with a zero reference count.
>> >> Kernel code can then safely use kernel functions such as
>> >> get_page_unless_zero().
>> >>
>> >> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> >> ---
>> >>  arch/powerpc/kvm/book3s_hv_uvmem.c       | 1 +
>> >>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
>> >>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
>> >>  lib/test_hmm.c                           | 1 +
>> >>  mm/memremap.c                            | 5 -----
>> >>  mm/page_alloc.c                          | 6 ++++++
>> >>  6 files changed, 10 insertions(+), 5 deletions(-)
>> >
>> > I think this is a great idea, but I'm surprised no dax stuff is
>> > touched here?
>>
>> free_zone_device_page() shouldn't be called for pgmap->type ==
>> MEMORY_DEVICE_FS_DAX so I don't think we should have to worry about DAX
>> there. Except that the folio code looks like it might have introduced a
>> bug. AFAICT put_page() always calls
>> put_devmap_managed_page(&folio->page) but folio_put() does not (although
>> folios_put() does!). So it seems folio_put() won't end up calling
>> __put_devmap_managed_page_refs() as I think it should.
>>
>> I think you're right about the change to __init_zone_device_page() - I
>> should limit it to DEVICE_PRIVATE/COHERENT pages only. But I need to
>> look at Dan's patch series more closely as I suspect it might be better
>> to rebase this patch on top of that.
>
> Apologies for the delay I was travelling the past few days. Yes, I think
> this patch slots in nicely to avoid the introduction of an init_mode
> [1]:
>
> https://lore.kernel.org/nvdimm/166329940343.2786261.6047770378829215962.stgit@dwillia2-xfh.jf.intel.com/
>
> Mind if I steal it into my series?

No problem, although I notice Andrew has already merged it into
mm-unstable. If you end up rebasing your series on top of mine I think
all that's needed is a patch somewhere in your series to drop the
various `if (pgmap->type == MEMORY_DEVICE_*)` I added to (hopefully)
avoid breaking DAX. Assuming DAX takes a pagemap reference on struct
page allocation something like below.

---

diff --git a/mm/memremap.c b/mm/memremap.c
index 421bec3a29ee..da1a0e0abb8b 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -507,15 +507,7 @@ void free_zone_device_page(struct page *page)
 	page->mapping = NULL;
 	page->pgmap->ops->page_free(page);

-	if (page->pgmap->type != MEMORY_DEVICE_PRIVATE &&
-	    page->pgmap->type != MEMORY_DEVICE_COHERENT)
-		/*
-		 * Reset the page count to 1 to prepare for handing out the page
-		 * again.
-		 */
-		set_page_count(page, 1);
-	else
-		put_dev_pagemap(page->pgmap);
+	put_dev_pagemap(page->pgmap);
 }

 void zone_device_page_init(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 014dbdf54d62..3e5ff06700ca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6816,9 +6816,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	 * ZONE_DEVICE pages are released directly to the driver page allocator
 	 * which will set the page count to 1 when allocating the page.
 	 */
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
-	    pgmap->type == MEMORY_DEVICE_COHERENT)
-		set_page_count(page, 0);
+	set_page_count(page, 0);
 }

 /*

WARNING: multiple messages have this Message-ID (diff)
From: Alistair Popple <apopple@nvidia.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Alex Sierra" <alex.sierra@amd.com>,
	"Karol Herbst" <kherbst@redhat.com>,
	"David Airlie" <airlied@linux.ie>,
	nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	linux-mm@kvack.org, amd-gfx@lists.freedesktop.org,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Ben Skeggs" <bskeggs@redhat.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	linux-kernel@vger.kernel.org,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: [PATCH 2/7] mm: Free device private pages have zero refcount
Date: Fri, 30 Sep 2022 10:45:37 +1000	[thread overview]
Message-ID: <87k05llly0.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <6335fd87adc7f_f6c9d29474@dwillia2-xfh.jf.intel.com.notmuch>


Dan Williams <dan.j.williams@intel.com> writes:

> Alistair Popple wrote:
>>
>> Jason Gunthorpe <jgg@nvidia.com> writes:
>>
>> > On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
>> >> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
>> >> refcount") device private pages have no longer had an extra reference
>> >> count when the page is in use. However before handing them back to the
>> >> owning device driver we add an extra reference count such that free
>> >> pages have a reference count of one.
>> >>
>> >> This makes it difficult to tell if a page is free or not because both
>> >> free and in use pages will have a non-zero refcount. Instead we should
>> >> return pages to the drivers page allocator with a zero reference count.
>> >> Kernel code can then safely use kernel functions such as
>> >> get_page_unless_zero().
>> >>
>> >> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> >> ---
>> >>  arch/powerpc/kvm/book3s_hv_uvmem.c       | 1 +
>> >>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
>> >>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
>> >>  lib/test_hmm.c                           | 1 +
>> >>  mm/memremap.c                            | 5 -----
>> >>  mm/page_alloc.c                          | 6 ++++++
>> >>  6 files changed, 10 insertions(+), 5 deletions(-)
>> >
>> > I think this is a great idea, but I'm surprised no dax stuff is
>> > touched here?
>>
>> free_zone_device_page() shouldn't be called for pgmap->type ==
>> MEMORY_DEVICE_FS_DAX so I don't think we should have to worry about DAX
>> there. Except that the folio code looks like it might have introduced a
>> bug. AFAICT put_page() always calls
>> put_devmap_managed_page(&folio->page) but folio_put() does not (although
>> folios_put() does!). So it seems folio_put() won't end up calling
>> __put_devmap_managed_page_refs() as I think it should.
>>
>> I think you're right about the change to __init_zone_device_page() - I
>> should limit it to DEVICE_PRIVATE/COHERENT pages only. But I need to
>> look at Dan's patch series more closely as I suspect it might be better
>> to rebase this patch on top of that.
>
> Apologies for the delay I was travelling the past few days. Yes, I think
> this patch slots in nicely to avoid the introduction of an init_mode
> [1]:
>
> https://lore.kernel.org/nvdimm/166329940343.2786261.6047770378829215962.stgit@dwillia2-xfh.jf.intel.com/
>
> Mind if I steal it into my series?

No problem, although I notice Andrew has already merged it into
mm-unstable. If you end up rebasing your series on top of mine I think
all that's needed is a patch somewhere in your series to drop the
various `if (pgmap->type == MEMORY_DEVICE_*)` I added to (hopefully)
avoid breaking DAX. Assuming DAX takes a pagemap reference on struct
page allocation something like below.

---

diff --git a/mm/memremap.c b/mm/memremap.c
index 421bec3a29ee..da1a0e0abb8b 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -507,15 +507,7 @@ void free_zone_device_page(struct page *page)
 	page->mapping = NULL;
 	page->pgmap->ops->page_free(page);

-	if (page->pgmap->type != MEMORY_DEVICE_PRIVATE &&
-	    page->pgmap->type != MEMORY_DEVICE_COHERENT)
-		/*
-		 * Reset the page count to 1 to prepare for handing out the page
-		 * again.
-		 */
-		set_page_count(page, 1);
-	else
-		put_dev_pagemap(page->pgmap);
+	put_dev_pagemap(page->pgmap);
 }

 void zone_device_page_init(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 014dbdf54d62..3e5ff06700ca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6816,9 +6816,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	 * ZONE_DEVICE pages are released directly to the driver page allocator
 	 * which will set the page count to 1 when allocating the page.
 	 */
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
-	    pgmap->type == MEMORY_DEVICE_COHERENT)
-		set_page_count(page, 0);
+	set_page_count(page, 0);
 }

 /*

WARNING: multiple messages have this Message-ID (diff)
From: Alistair Popple <apopple@nvidia.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Jason Gunthorpe" <jgg@nvidia.com>,
	linux-mm@kvack.org, "Andrew Morton" <akpm@linux-foundation.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	"David Airlie" <airlied@linux.ie>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Ben Skeggs" <bskeggs@redhat.com>,
	"Karol Herbst" <kherbst@redhat.com>,
	"Lyude Paul" <lyude@redhat.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Alex Sierra" <alex.sierra@amd.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 2/7] mm: Free device private pages have zero refcount
Date: Fri, 30 Sep 2022 10:45:37 +1000	[thread overview]
Message-ID: <87k05llly0.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <6335fd87adc7f_f6c9d29474@dwillia2-xfh.jf.intel.com.notmuch>


Dan Williams <dan.j.williams@intel.com> writes:

> Alistair Popple wrote:
>>
>> Jason Gunthorpe <jgg@nvidia.com> writes:
>>
>> > On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
>> >> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
>> >> refcount") device private pages have no longer had an extra reference
>> >> count when the page is in use. However before handing them back to the
>> >> owning device driver we add an extra reference count such that free
>> >> pages have a reference count of one.
>> >>
>> >> This makes it difficult to tell if a page is free or not because both
>> >> free and in use pages will have a non-zero refcount. Instead we should
>> >> return pages to the drivers page allocator with a zero reference count.
>> >> Kernel code can then safely use kernel functions such as
>> >> get_page_unless_zero().
>> >>
>> >> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> >> ---
>> >>  arch/powerpc/kvm/book3s_hv_uvmem.c       | 1 +
>> >>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
>> >>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
>> >>  lib/test_hmm.c                           | 1 +
>> >>  mm/memremap.c                            | 5 -----
>> >>  mm/page_alloc.c                          | 6 ++++++
>> >>  6 files changed, 10 insertions(+), 5 deletions(-)
>> >
>> > I think this is a great idea, but I'm surprised no dax stuff is
>> > touched here?
>>
>> free_zone_device_page() shouldn't be called for pgmap->type ==
>> MEMORY_DEVICE_FS_DAX so I don't think we should have to worry about DAX
>> there. Except that the folio code looks like it might have introduced a
>> bug. AFAICT put_page() always calls
>> put_devmap_managed_page(&folio->page) but folio_put() does not (although
>> folios_put() does!). So it seems folio_put() won't end up calling
>> __put_devmap_managed_page_refs() as I think it should.
>>
>> I think you're right about the change to __init_zone_device_page() - I
>> should limit it to DEVICE_PRIVATE/COHERENT pages only. But I need to
>> look at Dan's patch series more closely as I suspect it might be better
>> to rebase this patch on top of that.
>
> Apologies for the delay I was travelling the past few days. Yes, I think
> this patch slots in nicely to avoid the introduction of an init_mode
> [1]:
>
> https://lore.kernel.org/nvdimm/166329940343.2786261.6047770378829215962.stgit@dwillia2-xfh.jf.intel.com/
>
> Mind if I steal it into my series?

No problem, although I notice Andrew has already merged it into
mm-unstable. If you end up rebasing your series on top of mine I think
all that's needed is a patch somewhere in your series to drop the
various `if (pgmap->type == MEMORY_DEVICE_*)` I added to (hopefully)
avoid breaking DAX. Assuming DAX takes a pagemap reference on struct
page allocation something like below.

---

diff --git a/mm/memremap.c b/mm/memremap.c
index 421bec3a29ee..da1a0e0abb8b 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -507,15 +507,7 @@ void free_zone_device_page(struct page *page)
 	page->mapping = NULL;
 	page->pgmap->ops->page_free(page);

-	if (page->pgmap->type != MEMORY_DEVICE_PRIVATE &&
-	    page->pgmap->type != MEMORY_DEVICE_COHERENT)
-		/*
-		 * Reset the page count to 1 to prepare for handing out the page
-		 * again.
-		 */
-		set_page_count(page, 1);
-	else
-		put_dev_pagemap(page->pgmap);
+	put_dev_pagemap(page->pgmap);
 }

 void zone_device_page_init(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 014dbdf54d62..3e5ff06700ca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6816,9 +6816,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	 * ZONE_DEVICE pages are released directly to the driver page allocator
 	 * which will set the page count to 1 when allocating the page.
 	 */
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
-	    pgmap->type == MEMORY_DEVICE_COHERENT)
-		set_page_count(page, 0);
+	set_page_count(page, 0);
 }

 /*


  reply	other threads:[~2022-09-30 14:18 UTC|newest]

Thread overview: 130+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-26  6:03 [PATCH 0/7] Fix several device private page reference counting issues Alistair Popple
2022-09-26  6:03 ` Alistair Popple
2022-09-26  6:03 ` Alistair Popple
2022-09-26  6:03 ` Alistair Popple
2022-09-26  6:03 ` [Nouveau] " Alistair Popple
2022-09-26  6:03 ` [PATCH 1/7] mm/memory.c: Fix race when faulting a device private page Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` [Nouveau] " Alistair Popple
2022-09-29  0:07   ` Michael Ellerman
2022-09-29  0:07     ` Michael Ellerman
2022-09-29  0:07     ` Michael Ellerman
2022-09-29  0:07     ` Michael Ellerman
2022-09-29  0:07     ` [Nouveau] " Michael Ellerman
2022-09-29  1:40     ` Alistair Popple
2022-09-29  1:40       ` Alistair Popple
2022-09-29  1:40       ` Alistair Popple
2022-09-29  1:40       ` Alistair Popple
2022-09-29  1:40       ` [Nouveau] " Alistair Popple
2022-09-29  5:07       ` Michael Ellerman
2022-09-29  5:07         ` Michael Ellerman
2022-09-29  5:07         ` Michael Ellerman
2022-09-29  5:07         ` Michael Ellerman
2022-09-29  5:07         ` [Nouveau] " Michael Ellerman
2022-09-26  6:03 ` [PATCH 2/7] mm: Free device private pages have zero refcount Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` [Nouveau] " Alistair Popple
2022-09-26 14:36   ` Jason Gunthorpe
2022-09-26 14:36     ` Jason Gunthorpe
2022-09-26 14:36     ` Jason Gunthorpe
2022-09-26 14:36     ` Jason Gunthorpe
2022-09-26 14:36     ` [Nouveau] " Jason Gunthorpe
2022-09-27  2:06     ` Alistair Popple
2022-09-27  2:06       ` Alistair Popple
2022-09-27  2:06       ` Alistair Popple
2022-09-27  2:06       ` Alistair Popple
2022-09-27  2:06       ` [Nouveau] " Alistair Popple
2022-09-29 20:18       ` Dan Williams
2022-09-29 20:18         ` Dan Williams
2022-09-29 20:18         ` Dan Williams
2022-09-29 20:18         ` Dan Williams
2022-09-29 20:18         ` [Nouveau] " Dan Williams
2022-09-30  0:45         ` Alistair Popple [this message]
2022-09-30  0:45           ` Alistair Popple
2022-09-30  0:45           ` Alistair Popple
2022-09-30  0:45           ` Alistair Popple
2022-09-30  0:45           ` [Nouveau] " Alistair Popple
2022-09-30  1:49           ` Dan Williams
2022-09-30  1:49             ` Dan Williams
2022-09-30  1:49             ` Dan Williams
2022-09-30  1:49             ` Dan Williams
2022-09-30  1:49             ` [Nouveau] " Dan Williams
2022-09-26  6:03 ` [PATCH 3/7] mm/migrate_device.c: Refactor migrate_vma and migrate_deivce_coherent_page() Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` [Nouveau] " Alistair Popple
2022-09-26  6:03 ` [PATCH 4/7] mm/migrate_device.c: Add migrate_device_range() Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` [Nouveau] " Alistair Popple
2022-09-26  6:03 ` [PATCH 5/7] nouveau/dmem: Refactor nouveau_dmem_fault_copy_one() Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` [Nouveau] " Alistair Popple
2022-09-26 21:29   ` Lyude Paul
2022-09-26 21:29     ` Lyude Paul
2022-09-26 21:29     ` Lyude Paul
2022-09-26 21:29     ` Lyude Paul
2022-09-26 21:29     ` [Nouveau] " Lyude Paul
2022-09-28 11:30     ` Alistair Popple
2022-09-28 11:30       ` Alistair Popple
2022-09-28 11:30       ` Alistair Popple
2022-09-28 11:30       ` Alistair Popple
2022-09-28 11:30       ` [Nouveau] " Alistair Popple
2022-09-26  6:03 ` [PATCH 6/7] nouveau/dmem: Evict device private memory during release Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` [Nouveau] " Alistair Popple
2022-09-26 13:28   ` kernel test robot
2022-09-26 13:28     ` kernel test robot
2022-09-26 13:28     ` kernel test robot
2022-09-26 13:28     ` kernel test robot
2022-09-26 13:28     ` [Nouveau] " kernel test robot
2022-09-26 21:35   ` Lyude Paul
2022-09-26 21:35     ` Lyude Paul
2022-09-26 21:35     ` Lyude Paul
2022-09-26 21:35     ` Lyude Paul
2022-09-26 21:35     ` [Nouveau] " Lyude Paul
2022-09-26 22:14     ` John Hubbard
2022-09-26 22:14       ` John Hubbard
2022-09-26 22:14       ` John Hubbard
2022-09-26 22:14       ` John Hubbard
2022-09-26 22:14       ` [Nouveau] " John Hubbard
2022-09-26 23:45       ` Alistair Popple
2022-09-26 23:45         ` Alistair Popple
2022-09-26 23:45         ` Alistair Popple
2022-09-26 23:45         ` Alistair Popple
2022-09-26 23:45         ` [Nouveau] " Alistair Popple
2022-09-28 21:39         ` Lyude Paul
2022-09-28 21:39           ` Lyude Paul
2022-09-28 21:39           ` Lyude Paul
2022-09-28 21:39           ` Lyude Paul
2022-09-28 21:39           ` [Nouveau] " Lyude Paul
2022-09-26 23:07     ` Felix Kuehling
2022-09-26 23:07       ` Felix Kuehling
2022-09-26 23:07       ` Felix Kuehling
2022-09-26 23:07       ` Felix Kuehling
2022-09-26 23:07       ` [Nouveau] " Felix Kuehling
2022-09-27  1:39       ` Alistair Popple
2022-09-27  1:39         ` Alistair Popple
2022-09-27  1:39         ` Alistair Popple
2022-09-27  1:39         ` Alistair Popple
2022-09-27  1:39         ` [Nouveau] " Alistair Popple
2022-09-28 21:23         ` Lyude Paul
2022-09-28 21:23           ` Lyude Paul
2022-09-28 21:23           ` Lyude Paul
2022-09-28 21:23           ` Lyude Paul
2022-09-28 21:23           ` [Nouveau] " Lyude Paul
2022-09-26  6:03 ` [PATCH 7/7] hmm-tests: Add test for migrate_device_range() Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` Alistair Popple
2022-09-26  6:03   ` [Nouveau] " Alistair Popple

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k05llly0.fsf@nvdebian.thelocal \
    --to=apopple@nvidia.com \
    --cc=Felix.Kuehling@amd.com \
    --cc=Xinhui.Pan@amd.com \
    --cc=airlied@linux.ie \
    --cc=akpm@linux-foundation.org \
    --cc=alex.sierra@amd.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=bskeggs@redhat.com \
    --cc=christian.koenig@amd.com \
    --cc=dan.j.williams@intel.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=kherbst@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=nouveau@lists.freedesktop.org \
    --cc=npiggin@gmail.com \
    --cc=rcampbell@nvidia.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.