* [PATCH RFC 1/6] mm/migrate_device.c: Don't read dirty bit of non-present PTEs
From: Alistair Popple @ 2025-03-16 4:29 UTC (permalink / raw)
To: linux-mm; +Cc: linux-fsdevel, linux-kernel, Alistair Popple
migrate_vma_collect_pmd() will opportunistically install migration PTEs
if it is able to lock the migrating folio. This involves clearing the
PTE, which also requires updating page flags such as PageDirty based on
the PTE value when it was cleared.
Copying the dirty bit back to the page was added by fd35ca3d12cc
("mm/migrate_device.c: copy pte dirty bit to page"). However that fix
also copies the pte dirty bit from non-present PTEs, which is
meaningless. It just so happens that on a default x86 configuration
pte_dirty(make_writable_device_private_entry(0)) is true.
This masks issues where drivers may not be correctly setting the
destination page as dirty when migrating from a device-private page,
because effectively the device-private page is always considered dirty
if it was mapped as writable.
In practice not marking the pages correctly is unlikely to cause issues,
because currently only anonymous memory is supported for device private
pages. Therefore the dirty bit is only read when there is a swap file
that has an uptodate copy of a writable page.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Fixes: fd35ca3d12cc ("mm/migrate_device.c: copy pte dirty bit to page")
---
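For illustration only (not part of this patch), a minimal sketch of the
quirk described above, assuming a kernel context with <linux/swapops.h>
available:

#include <linux/swapops.h>
#include <linux/printk.h>

/* Purely illustrative: build the same non-present device-private PTE
 * that migrate_vma_collect_pmd() installs and look at its dirty bit. */
static void __maybe_unused show_nonpresent_pte_dirty_quirk(void)
{
	swp_entry_t entry = make_writable_device_private_entry(0);
	pte_t swp_pte = swp_entry_to_pte(entry);

	/* The PTE is non-present by construction... */
	WARN_ON(pte_present(swp_pte));

	/* ...yet on a default x86 configuration pte_dirty() can still
	 * report true, which is why the hunks below only read the dirty
	 * bit from present PTEs. */
	pr_info("pte_dirty(non-present device-private pte) = %d\n",
		pte_dirty(swp_pte));
}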
mm/migrate_device.c | 15 ++++++++++-----
mm/rmap.c | 2 +-
2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 9cf2659..afc033b 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -215,10 +215,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
migrate->cpages++;
- /* Set the dirty flag on the folio now the pte is gone. */
- if (pte_dirty(pte))
- folio_mark_dirty(folio);
-
/* Setup special migration page table entry */
if (mpfn & MIGRATE_PFN_WRITE)
entry = make_writable_migration_entry(
@@ -232,8 +228,17 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
if (pte_present(pte)) {
if (pte_young(pte))
entry = make_migration_entry_young(entry);
- if (pte_dirty(pte))
+ if (pte_dirty(pte)) {
+ /*
+ * Mark the folio dirty now the pte is
+ * gone because
+ * make_migration_entry_dirty() won't
+ * store the dirty bit if there isn't
+ * room.
+ */
+ folio_mark_dirty(folio);
entry = make_migration_entry_dirty(entry);
+ }
}
swp_pte = swp_entry_to_pte(entry);
if (pte_present(pte)) {
diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4e..df88674 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2176,7 +2176,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
}
/* Set the dirty flag on the folio now the pte is gone. */
- if (pte_dirty(pteval))
+ if (pte_present(pteval) && pte_dirty(pteval))
folio_mark_dirty(folio);
/* Update high watermark before we lower rss */
--
git-series 0.9.1
* [PATCH RFC 2/6] mm/migrate: Support file-backed pages with migrate_vma
From: Alistair Popple @ 2025-03-16 4:29 UTC (permalink / raw)
To: linux-mm; +Cc: linux-fsdevel, linux-kernel, Alistair Popple
This adds support for migrating file-backed pages with the migrate_vma
calls. Note that this does not support migrating file-backed pages to
device private pages, only to CPU addressable memory. However, an extra
refcount argument is added to support faulting on device private pages
in the future.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
Known issues with the RFC:
- Some filesystems (e.g. xfs, nfs) can insert higher-order compound
pages in the pagecache. Migration will fail for such pages.
---
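For reference, a rough sketch of how a driver might drive the migrate_vma
API over a (now possibly file-backed) range; alloc_device_page() and
copy_to_device() are hypothetical driver helpers and error handling is
omitted:

#include <linux/migrate.h>

#define EXAMPLE_NPAGES 64	/* assumes the range covers at most 64 pages */

static int example_migrate_range(struct vm_area_struct *vma,
				 unsigned long start, unsigned long end,
				 void *pgmap_owner)
{
	unsigned long src_pfns[EXAMPLE_NPAGES] = { 0 };
	unsigned long dst_pfns[EXAMPLE_NPAGES] = { 0 };
	struct migrate_vma args = {
		.vma		= vma,
		.start		= start,
		.end		= end,
		.src		= src_pfns,
		.dst		= dst_pfns,
		.pgmap_owner	= pgmap_owner,
		.flags		= MIGRATE_VMA_SELECT_SYSTEM,
	};
	unsigned long i;
	int ret;

	ret = migrate_vma_setup(&args);
	if (ret)
		return ret;

	for (i = 0; i < args.npages; i++) {
		struct page *spage = migrate_pfn_to_page(src_pfns[i]);
		struct page *dpage;

		if (!(src_pfns[i] & MIGRATE_PFN_MIGRATE) || !spage)
			continue;

		dpage = alloc_device_page();		/* hypothetical */
		copy_to_device(dpage, spage);		/* hypothetical */
		dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
	}

	migrate_vma_pages(&args);
	migrate_vma_finalize(&args);
	return 0;
}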
include/linux/migrate.h | 4 ++++
mm/migrate.c | 19 +++++++++++--------
mm/migrate_device.c | 11 +++++++++--
3 files changed, 24 insertions(+), 10 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 29919fa..9023d0f 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -222,6 +222,10 @@ struct migrate_vma {
struct page *fault_page;
};
+int fallback_migrate_folio(struct address_space *mapping,
+ struct folio *dst, struct folio *src, enum migrate_mode mode,
+ int extra_count);
+
int migrate_vma_setup(struct migrate_vma *args);
void migrate_vma_pages(struct migrate_vma *migrate);
void migrate_vma_finalize(struct migrate_vma *migrate);
diff --git a/mm/migrate.c b/mm/migrate.c
index fb19a18..11fca43 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -749,9 +749,10 @@ EXPORT_SYMBOL(folio_migrate_flags);
static int __migrate_folio(struct address_space *mapping, struct folio *dst,
struct folio *src, void *src_private,
- enum migrate_mode mode)
+ enum migrate_mode mode, int extra_count)
{
- int rc, expected_count = folio_expected_refs(mapping, src);
+ int rc;
+ int expected_count = folio_expected_refs(mapping, src) + extra_count;
/* Check whether src does not have extra refs before we do more work */
if (folio_ref_count(src) != expected_count)
@@ -788,7 +789,7 @@ int migrate_folio(struct address_space *mapping, struct folio *dst,
struct folio *src, enum migrate_mode mode)
{
BUG_ON(folio_test_writeback(src)); /* Writeback must be complete */
- return __migrate_folio(mapping, dst, src, NULL, mode);
+ return __migrate_folio(mapping, dst, src, NULL, mode, 0);
}
EXPORT_SYMBOL(migrate_folio);
@@ -942,7 +943,8 @@ EXPORT_SYMBOL_GPL(buffer_migrate_folio_norefs);
int filemap_migrate_folio(struct address_space *mapping,
struct folio *dst, struct folio *src, enum migrate_mode mode)
{
- return __migrate_folio(mapping, dst, src, folio_get_private(src), mode);
+ return __migrate_folio(mapping, dst, src,
+ folio_get_private(src), mode, 0);
}
EXPORT_SYMBOL_GPL(filemap_migrate_folio);
@@ -990,8 +992,9 @@ static int writeout(struct address_space *mapping, struct folio *folio)
/*
* Default handling if a filesystem does not provide a migration function.
*/
-static int fallback_migrate_folio(struct address_space *mapping,
- struct folio *dst, struct folio *src, enum migrate_mode mode)
+int fallback_migrate_folio(struct address_space *mapping,
+ struct folio *dst, struct folio *src, enum migrate_mode mode,
+ int extra_count)
{
if (folio_test_dirty(src)) {
/* Only writeback folios in full synchronous migration */
@@ -1011,7 +1014,7 @@ static int fallback_migrate_folio(struct address_space *mapping,
if (!filemap_release_folio(src, GFP_KERNEL))
return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;
- return migrate_folio(mapping, dst, src, mode);
+ return __migrate_folio(mapping, dst, src, NULL, mode, extra_count);
}
/*
@@ -1052,7 +1055,7 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
rc = mapping->a_ops->migrate_folio(mapping, dst, src,
mode);
else
- rc = fallback_migrate_folio(mapping, dst, src, mode);
+ rc = fallback_migrate_folio(mapping, dst, src, mode, 0);
} else {
const struct movable_operations *mops;
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index afc033b..7bcc177 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -763,11 +763,18 @@ static void __migrate_device_pages(unsigned long *src_pfns,
if (migrate && migrate->fault_page == page)
extra_cnt = 1;
- r = folio_migrate_mapping(mapping, newfolio, folio, extra_cnt);
+ if (mapping)
+ r = fallback_migrate_folio(mapping, newfolio, folio,
+ MIGRATE_SYNC, extra_cnt);
+ else
+ r = folio_migrate_mapping(mapping, newfolio, folio,
+ extra_cnt);
if (r != MIGRATEPAGE_SUCCESS)
src_pfns[i] &= ~MIGRATE_PFN_MIGRATE;
- else
+ else if (!mapping)
folio_migrate_flags(newfolio, folio);
+ else
+ folio->mapping = NULL;
}
if (notified)
--
git-series 0.9.1
* [PATCH RFC 3/6] mm: Allow device private pages to exist in page cache
From: Alistair Popple @ 2025-03-16 4:29 UTC (permalink / raw)
To: linux-mm; +Cc: linux-fsdevel, linux-kernel, Alistair Popple
Device private pages can currently only be used for private anonymous
memory. This is because they are inaccessible from the CPU, making
shared mappings between device and CPU difficult.
For private mappings this problem is resolved by installing non-present
PTEs which allow the pages to be migrated back to the CPU as required.
However shared file-backed mappings are not always accessed via PTEs
(for example read/write syscalls), so such entries are not sufficient to
prevent the CPU from trying to access device private pages.
Most other accesses, however, go via the pagecache and so can be
intercepted there. Implement this by allowing device private pages to
exist in the pagecache. Whenever a device private entry is found in the
pagecache, migrate the entry back from the device to the CPU and restore
the data from disk.
Drivers can create these entries using the standard migrate_vma calls.
For this migration to succeed any buffer heads or private data must
be stripped from the page. Normally the migrate_folio() address space
operation would be used for this if available for a particular mapping.
However this is not appropriate for device private pages because buffers
cannot be migrated to device memory and ZONE_DEVICE pages have nowhere
to store the private data. So instead the page is always cleaned and
written back to disk in an attempt to remove any buffers and/or private
data. If that fails, the migration fails.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
include/linux/migrate.h | 2 +-
mm/filemap.c | 41 ++++++++++++++++++++++++++-
mm/memory.c | 9 ++----
mm/memremap.c | 1 +-
mm/migrate.c | 21 +++++++++----
mm/migrate_device.c | 66 +++++++++++++++++++++++++++++++++++++++++-
6 files changed, 128 insertions(+), 12 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 9023d0f..623fea4 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -62,6 +62,7 @@ extern const char *migrate_reason_names[MR_TYPES];
#ifdef CONFIG_MIGRATION
+void migrate_device_page(struct page *page);
void putback_movable_pages(struct list_head *l);
int migrate_folio(struct address_space *mapping, struct folio *dst,
struct folio *src, enum migrate_mode mode);
@@ -82,6 +83,7 @@ int folio_migrate_mapping(struct address_space *mapping,
#else
+static inline void migrate_device_page(struct page *page) {}
static inline void putback_movable_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_folio_t new,
free_folio_t free, unsigned long private,
diff --git a/mm/filemap.c b/mm/filemap.c
index 804d736..ee35277 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -658,6 +658,12 @@ bool filemap_range_has_writeback(struct address_space *mapping,
xas_for_each(&xas, folio, max) {
if (xas_retry(&xas, folio))
continue;
+ /*
+ * TODO: We would have to query the driver to find out if write
+ * back is required. Probably easiest just to migrate the page
+ * back. Need to drop the rcu lock and retry.
+ */
+ WARN_ON(is_device_private_page(&folio->page));
if (xa_is_value(folio))
continue;
if (folio_test_dirty(folio) || folio_test_locked(folio) ||
@@ -1874,6 +1880,15 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index)
folio_put(folio);
goto repeat;
}
+
+ if (is_device_private_page(&folio->page)) {
+ rcu_read_unlock();
+ migrate_device_page(&folio->page);
+ folio_put(folio);
+ rcu_read_lock();
+ goto repeat;
+ }
+
out:
rcu_read_unlock();
@@ -2034,6 +2049,14 @@ static inline struct folio *find_get_entry(struct xa_state *xas, pgoff_t max,
goto reset;
}
+ if (is_device_private_page(&folio->page)) {
+ rcu_read_unlock();
+ migrate_device_page(&folio->page);
+ folio_put(folio);
+ rcu_read_lock();
+ goto reset;
+ }
+
return folio;
reset:
xas_reset(xas);
@@ -2229,6 +2252,14 @@ unsigned filemap_get_folios_contig(struct address_space *mapping,
if (unlikely(folio != xas_reload(&xas)))
goto put_folio;
+ if (is_device_private_page(&folio->page)) {
+ rcu_read_unlock();
+ migrate_device_page(&folio->page);
+ folio_put(folio);
+ rcu_read_lock();
+ goto retry;
+ }
+
if (!folio_batch_add(fbatch, folio)) {
nr = folio_nr_pages(folio);
*start = folio->index + nr;
@@ -2361,6 +2392,14 @@ static void filemap_get_read_batch(struct address_space *mapping,
if (unlikely(folio != xas_reload(&xas)))
goto put_folio;
+ if (is_device_private_page(&folio->page)) {
+ rcu_read_unlock();
+ migrate_device_page(&folio->page);
+ folio_put(folio);
+ rcu_read_lock();
+ goto retry;
+ }
+
if (!folio_batch_add(fbatch, folio))
break;
if (!folio_test_uptodate(folio))
@@ -3642,6 +3681,8 @@ static struct folio *next_uptodate_folio(struct xa_state *xas,
/* Has the page moved or been split? */
if (unlikely(folio != xas_reload(xas)))
goto skip;
+ if (is_device_private_page(&folio->page))
+ goto skip;
if (!folio_test_uptodate(folio) || folio_test_readahead(folio))
goto skip;
if (!folio_trylock(folio))
diff --git a/mm/memory.c b/mm/memory.c
index 539c0f7..c346683 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1616,12 +1616,11 @@ static inline int zap_nonpresent_ptes(struct mmu_gather *tlb,
if (unlikely(!should_zap_folio(details, folio)))
return 1;
/*
- * Both device private/exclusive mappings should only
- * work with anonymous page so far, so we don't need to
- * consider uffd-wp bit when zap. For more information,
- * see zap_install_uffd_wp_if_needed().
+ * TODO: Do we need to consider uffd-wp bit when zap? For more
+ * information, see zap_install_uffd_wp_if_needed().
*/
- WARN_ON_ONCE(!vma_is_anonymous(vma));
+ WARN_ON_ONCE(zap_install_uffd_wp_if_needed(vma, addr, pte, nr,
+ details, ptent));
rss[mm_counter(folio)]--;
if (is_device_private_entry(entry))
folio_remove_rmap_pte(folio, page, vma);
diff --git a/mm/memremap.c b/mm/memremap.c
index 40d4547..e49fdcb 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -143,7 +143,6 @@ void memunmap_pages(struct dev_pagemap *pgmap)
pgmap->type != MEMORY_DEVICE_COHERENT)
for (i = 0; i < pgmap->nr_range; i++)
percpu_ref_put_many(&pgmap->ref, pfn_len(pgmap, i));
-
wait_for_completion(&pgmap->done);
for (i = 0; i < pgmap->nr_range; i++)
diff --git a/mm/migrate.c b/mm/migrate.c
index 11fca43..21f92eb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -248,12 +248,14 @@ static bool remove_migration_pte(struct folio *folio,
pte_t pte;
swp_entry_t entry;
struct page *new;
+ struct page *old;
unsigned long idx = 0;
/* pgoff is invalid for ksm pages, but they are never large */
if (folio_test_large(folio) && !folio_test_hugetlb(folio))
idx = linear_page_index(vma, pvmw.address) - pvmw.pgoff;
new = folio_page(folio, idx);
+ old = folio_page(rmap_walk_arg->folio, idx);
#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
/* PMD-mapped THP migration entry */
@@ -291,7 +293,12 @@ static bool remove_migration_pte(struct folio *folio,
rmap_flags |= RMAP_EXCLUSIVE;
if (unlikely(is_device_private_page(new))) {
- if (pte_write(pte))
+ /*
+ * Page should have been written out during migration.
+ */
+ WARN_ON_ONCE(PageDirty(old) &&
+ folio_mapping(page_folio(old)));
+ if (!folio_mapping(page_folio(old)) && pte_write(pte))
entry = make_writable_device_private_entry(
page_to_pfn(new));
else
@@ -758,9 +765,12 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
if (folio_ref_count(src) != expected_count)
return -EAGAIN;
- rc = folio_mc_copy(dst, src);
- if (unlikely(rc))
- return rc;
+ /* Drivers will do the copy before calling migrate_device_finalize() */
+ if (!folio_is_device_private(dst) && !folio_is_device_private(src)) {
+ rc = folio_mc_copy(dst, src);
+ if (unlikely(rc))
+ return rc;
+ }
rc = __folio_migrate_mapping(mapping, dst, src, expected_count);
if (rc != MIGRATEPAGE_SUCCESS)
@@ -1044,7 +1054,8 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
rc = migrate_folio(mapping, dst, src, mode);
else if (mapping_inaccessible(mapping))
rc = -EOPNOTSUPP;
- else if (mapping->a_ops->migrate_folio)
+ else if (!is_device_private_page(&dst->page) &&
+ mapping->a_ops->migrate_folio)
/*
* Most folios have a mapping and most filesystems
* provide a migrate_folio callback. Anonymous folios
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 7bcc177..946e9fd 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -745,7 +745,7 @@ static void __migrate_device_pages(unsigned long *src_pfns,
*
* Try to get rid of swap cache if possible.
*/
- if (!folio_test_anon(folio) ||
+ if (folio_test_anon(folio) &&
!folio_free_swap(folio)) {
src_pfns[i] &= ~MIGRATE_PFN_MIGRATE;
continue;
@@ -862,6 +862,7 @@ void migrate_device_finalize(unsigned long *src_pfns,
if (dst != src) {
folio_unlock(dst);
+
if (folio_is_zone_device(dst))
folio_put(dst);
else
@@ -888,6 +889,69 @@ void migrate_vma_finalize(struct migrate_vma *migrate)
}
EXPORT_SYMBOL(migrate_vma_finalize);
+/*
+ * This migrates the device private page back to the page cache. It doesn't
+ * actually copy any data though, it reads it back from the filesystem.
+ */
+void migrate_device_page(struct page *page)
+{
+ int ret;
+ struct page *newpage;
+
+ WARN_ON(!is_device_private_page(page));
+
+ /*
+ * We don't support writeback of dirty pages from the driver yet.
+ */
+ WARN_ON(PageDirty(page));
+
+ lock_page(page);
+ try_to_migrate(page_folio(page), 0);
+
+ /*
+ * We should always be able to unmap device-private pages. Right?
+ */
+ WARN_ON(page_mapped(page));
+
+ newpage = alloc_pages(GFP_HIGHUSER_MOVABLE, 0);
+ /*
+ * OOM is fatal, so need to retry harder although 0-order allocations
+ * should never fail?
+ */
+ WARN_ON(!newpage);
+ lock_page(newpage);
+
+ /*
+ * Replace the device-private page with the new page in the page cache.
+ */
+ ret = fallback_migrate_folio(folio_mapping(page_folio(page)),
+ page_folio(newpage), page_folio(page),
+ MIGRATE_SYNC, 0);
+
+ /* This should never fail... */
+ WARN_ON_ONCE(ret != MIGRATEPAGE_SUCCESS);
+ page->mapping = NULL;
+
+ /*
+ * We're going to read the newpage back from disk so make it not
+ * uptodate.
+ */
+ ClearPageUptodate(newpage);
+
+ /*
+ * IO will unlock newpage asynchronously.
+ */
+ folio_mapping(page_folio(newpage))->a_ops->read_folio(NULL,
+ page_folio(newpage));
+ lock_page(newpage);
+
+ remove_migration_ptes(page_folio(page), page_folio(newpage), false);
+
+ unlock_page(page);
+ unlock_page(newpage);
+ folio_putback_lru(page_folio(newpage));
+}
+
/**
* migrate_device_range() - migrate device private pfns to normal memory.
* @src_pfns: array large enough to hold migrating source device private pfns.
--
git-series 0.9.1
* [PATCH RFC 4/6] mm: Implement writeback for shared device private pages
From: Alistair Popple @ 2025-03-16 4:29 UTC (permalink / raw)
To: linux-mm; +Cc: linux-fsdevel, linux-kernel, Alistair Popple
Currently devices can't write to shared file-backed device private pages
and any writes will be lost. This is because when a device private
pagecache page is migrated back to the CPU the contents are always
reloaded from backing storage.
To allow data written by the device to be migrated back, add a new
pgmap callback, migrate_to_pagecache(), which will be called when a
device private entry is found in the page cache to copy the data back
from the device to the new pagecache page.
Because the page was clean when it was migrated to the device, the
filesystem needs to be told if the device has since dirtied it. Drivers
are expected to do this by calling set_page_dirty() on the new page if
it was written to in the migrate_to_pagecache() callback.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
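For illustration only, a minimal sketch of the expected driver contract
for the new callback; device_copy_to_cpu() and device_page_was_written()
are hypothetical helpers (see the test_hmm and nouveau implementations
later in the series for real examples):

static int example_migrate_to_pagecache(struct page *page,
					struct page *newpage)
{
	/* Copy the device's data into the new pagecache page. */
	if (device_copy_to_cpu(newpage, page))		/* hypothetical */
		return -EIO;	/* core code falls back to reading from disk */

	/*
	 * The page was clean when it was migrated to the device, so the
	 * filesystem must be told if the device dirtied it.
	 */
	if (device_page_was_written(page))		/* hypothetical */
		set_page_dirty(newpage);

	return 0;
}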
include/linux/memremap.h | 2 ++-
mm/migrate.c | 2 +-
mm/migrate_device.c | 54 ++++++++++++++++++++++++++++-------------
3 files changed, 41 insertions(+), 17 deletions(-)
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 3f7143a..d921db2 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -89,6 +89,8 @@ struct dev_pagemap_ops {
*/
vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);
+ int (*migrate_to_pagecache)(struct page *page, struct page *newpage);
+
/*
* Handle the memory failure happens on a range of pfns. Notify the
* processes who are using these pfns, and try to recover the data on
diff --git a/mm/migrate.c b/mm/migrate.c
index 21f92eb..c660151 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1006,7 +1006,7 @@ int fallback_migrate_folio(struct address_space *mapping,
struct folio *dst, struct folio *src, enum migrate_mode mode,
int extra_count)
{
- if (folio_test_dirty(src)) {
+ if (!folio_is_device_private(src) && folio_test_dirty(src)) {
/* Only writeback folios in full synchronous migration */
switch (mode) {
case MIGRATE_SYNC:
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 946e9fd..9aeba66 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -160,6 +160,17 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
goto next;
mpfn = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE;
mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0;
+
+ /*
+ * Tell the driver it may write to the PTE. Normally
+ * page_mkwrite() would need to be called to upgrade a
+ * read-only to writable PTE for a folio with mappings.
+ * So the driver is responsible for marking the page dirty
+ * with set_page_dirty() if it does actually write to
+ * the page.
+ */
+ mpfn |= vma->vm_flags & VM_WRITE && page->mapping ?
+ MIGRATE_PFN_WRITE : 0;
}
/* FIXME support THP */
@@ -240,6 +251,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
entry = make_migration_entry_dirty(entry);
}
}
+ entry = make_migration_entry_dirty(entry);
swp_pte = swp_entry_to_pte(entry);
if (pte_present(pte)) {
if (pte_soft_dirty(pte))
@@ -898,14 +910,15 @@ void migrate_device_page(struct page *page)
int ret;
struct page *newpage;
- WARN_ON(!is_device_private_page(page));
+ if (WARN_ON_ONCE(!is_device_private_page(page)))
+ return;
+
+ lock_page(page);
/*
- * We don't support writeback of dirty pages from the driver yet.
+ * TODO: It would be nice to have the driver call some version of this
+ * (migrate_device_range()?) so it can expand the region.
*/
- WARN_ON(PageDirty(page));
-
- lock_page(page);
try_to_migrate(page_folio(page), 0);
/*
@@ -932,18 +945,27 @@ void migrate_device_page(struct page *page)
WARN_ON_ONCE(ret != MIGRATEPAGE_SUCCESS);
page->mapping = NULL;
- /*
- * We're going to read the newpage back from disk so make it not
- * uptodate.
- */
- ClearPageUptodate(newpage);
+ if (page->pgmap->ops->migrate_to_pagecache)
+ ret = page->pgmap->ops->migrate_to_pagecache(page, newpage);
- /*
- * IO will unlock newpage asynchronously.
- */
- folio_mapping(page_folio(newpage))->a_ops->read_folio(NULL,
- page_folio(newpage));
- lock_page(newpage);
+ /* Fallback to reading page from disk */
+ if (!page->pgmap->ops->migrate_to_pagecache || ret) {
+ if (WARN_ON_ONCE(PageDirty(newpage)))
+ ClearPageDirty(newpage);
+
+ /*
+ * We're going to read the newpage back from disk so make it not
+ * uptodate.
+ */
+ ClearPageUptodate(newpage);
+
+ /*
+ * IO will unlock newpage asynchronously.
+ */
+ folio_mapping(page_folio(newpage))->a_ops->read_folio(NULL,
+ page_folio(newpage));
+ lock_page(newpage);
+ }
remove_migration_ptes(page_folio(page), page_folio(newpage), false);
--
git-series 0.9.1
* [PATCH RFC 5/6] selftests/hmm: Add file-backed migration tests
From: Alistair Popple @ 2025-03-16 4:29 UTC (permalink / raw)
To: linux-mm; +Cc: linux-fsdevel, linux-kernel, Alistair Popple
Add tests of file-backed migration to hmm-tests.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
lib/test_hmm.c | 27 ++-
tools/testing/selftests/mm/hmm-tests.c | 252 +++++++++++++++++++++++++-
2 files changed, 277 insertions(+), 2 deletions(-)
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 056f2e4..bd8cd29 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -979,6 +979,8 @@ static int dmirror_migrate_to_device(struct dmirror *dmirror,
mmap_read_lock(mm);
for (addr = start; addr < end; addr = next) {
+ int i, retried = 0;
+
vma = vma_lookup(mm, addr);
if (!vma || !(vma->vm_flags & VM_READ)) {
ret = -EINVAL;
@@ -987,7 +989,7 @@ static int dmirror_migrate_to_device(struct dmirror *dmirror,
next = min(end, addr + (ARRAY_SIZE(src_pfns) << PAGE_SHIFT));
if (next > vma->vm_end)
next = vma->vm_end;
-
+retry:
args.vma = vma;
args.src = src_pfns;
args.dst = dst_pfns;
@@ -1004,6 +1006,16 @@ static int dmirror_migrate_to_device(struct dmirror *dmirror,
migrate_vma_pages(&args);
dmirror_migrate_finalize_and_map(&args, dmirror);
migrate_vma_finalize(&args);
+
+ for (i = 0; i < ((next - addr) >> PAGE_SHIFT); i++) {
+ if (!(src_pfns[i] & MIGRATE_PFN_MIGRATE)
+ && migrate_pfn_to_page(src_pfns[i])
+ && retried++ < 3) {
+ wait_on_page_writeback(
+ migrate_pfn_to_page(src_pfns[i]));
+ goto retry;
+ }
+ }
}
mmap_read_unlock(mm);
mmput(mm);
@@ -1404,6 +1416,10 @@ static void dmirror_devmem_free(struct page *page)
if (rpage != page)
__free_page(rpage);
+ /* Page has been freed so reinitialize these fields */
+ ClearPageDirty(page);
+ folio_clear_swapbacked(page_folio(page));
+
mdevice = dmirror_page_to_device(page);
spin_lock(&mdevice->lock);
@@ -1459,9 +1475,18 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
return 0;
}
+static int dmirror_devmem_pagecache(struct page *page, struct page *newpage)
+{
+ set_page_dirty(newpage);
+ copy_highpage(newpage, BACKING_PAGE(page));
+
+ return 0;
+}
+
static const struct dev_pagemap_ops dmirror_devmem_ops = {
.page_free = dmirror_devmem_free,
.migrate_to_ram = dmirror_devmem_fault,
+ .migrate_to_pagecache = dmirror_devmem_pagecache,
};
static int dmirror_device_init(struct dmirror_device *mdevice, int id)
diff --git a/tools/testing/selftests/mm/hmm-tests.c b/tools/testing/selftests/mm/hmm-tests.c
index 141bf63..4b77edd 100644
--- a/tools/testing/selftests/mm/hmm-tests.c
+++ b/tools/testing/selftests/mm/hmm-tests.c
@@ -999,6 +999,254 @@ TEST_F(hmm, migrate)
}
/*
+ * Migrate file memory to device private memory.
+ */
+TEST_F(hmm, migrate_file)
+{
+ struct hmm_buffer *buffer;
+ unsigned long npages;
+ unsigned long size;
+ unsigned long i;
+ int *ptr;
+ int ret;
+
+ npages = ALIGN(HMM_BUFFER_SIZE, self->page_size) >> self->page_shift;
+ ASSERT_NE(npages, 0);
+ size = npages << self->page_shift;
+
+ buffer = malloc(sizeof(*buffer));
+ ASSERT_NE(buffer, NULL);
+
+ buffer->fd = hmm_create_file(size);
+ buffer->size = size;
+ buffer->mirror = malloc(size);
+ ASSERT_NE(buffer->mirror, NULL);
+
+ buffer->ptr = mmap(NULL, size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED,
+ buffer->fd, 0);
+ ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+ /* Initialize buffer in system memory. */
+ for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+ ptr[i] = i;
+
+ /*
+ * TODO: Migration code should try and clean the pages, but it's not
+ * working.
+ */
+ fsync(buffer->fd);
+
+ /* Migrate memory to device. */
+ ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /* Check what the device read. */
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], i);
+
+ hmm_buffer_free(buffer);
+}
+
+TEST_F(hmm, migrate_file_fault)
+{
+ struct hmm_buffer *buffer;
+ unsigned long npages;
+ unsigned long size;
+ unsigned long i;
+ int *ptr;
+ int ret;
+
+ npages = ALIGN(HMM_BUFFER_SIZE, self->page_size) >> self->page_shift;
+ ASSERT_NE(npages, 0);
+ size = npages << self->page_shift;
+
+ buffer = malloc(sizeof(*buffer));
+ ASSERT_NE(buffer, NULL);
+
+ buffer->fd = hmm_create_file(size);
+ buffer->size = size;
+ buffer->mirror = malloc(size);
+ ASSERT_NE(buffer->mirror, NULL);
+
+ buffer->ptr = mmap(NULL, size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED,
+ buffer->fd, 0);
+ ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+ /* Initialize buffer in system memory. */
+ for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+ ptr[i] = i;
+
+ /*
+ * TODO: Migration code should try and clean the pages, but it's not
+ * working.
+ */
+ fsync(buffer->fd);
+
+ /* Migrate memory to device. */
+ ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /* Check what the device read. */
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], i);
+
+ /* Fault half the pages back to system memory and check them. */
+ for (i = 0, ptr = buffer->ptr; i < size / (2 * sizeof(*ptr)); ++i)
+ ASSERT_EQ(ptr[i], i);
+
+ /* Migrate memory to the device again. */
+ ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /* Check what the device read. */
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], i);
+
+ hmm_buffer_free(buffer);
+}
+
+TEST_F(hmm, migrate_fault_read_buf)
+{
+ struct hmm_buffer *buffer;
+ unsigned long npages;
+ unsigned long size;
+ unsigned long i;
+ int *ptr;
+ int ret;
+
+ npages = ALIGN(HMM_BUFFER_SIZE, self->page_size) >> self->page_shift;
+ ASSERT_NE(npages, 0);
+ size = npages << self->page_shift;
+
+ buffer = malloc(sizeof(*buffer));
+ ASSERT_NE(buffer, NULL);
+
+ buffer->fd = hmm_create_file(size);
+ buffer->size = size;
+ buffer->mirror = malloc(size);
+ ASSERT_NE(buffer->mirror, NULL);
+
+ buffer->ptr = mmap(NULL, size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED,
+ buffer->fd, 0);
+ ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+ /* Initialize buffer in system memory. */
+ for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+ ptr[i] = i;
+
+ /*
+ * TODO: Migration code should try and clean the pages, but it's not
+ * working.
+ */
+ fsync(buffer->fd);
+
+ /* Migrate memory to device. */
+ ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /* Check what the device read. */
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], i);
+
+ /* Use read and check what we read */
+ read(buffer->fd, buffer->mirror, size);
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], i);
+
+ hmm_buffer_free(buffer);
+}
+
+TEST_F(hmm, migrate_fault_write_buf)
+{
+ struct hmm_buffer *buffer;
+ unsigned long npages;
+ unsigned long size;
+ unsigned long i;
+ int *ptr;
+ int ret;
+
+ npages = ALIGN(HMM_BUFFER_SIZE, self->page_size) >> self->page_shift;
+ ASSERT_NE(npages, 0);
+ size = npages << self->page_shift;
+
+ buffer = malloc(sizeof(*buffer));
+ ASSERT_NE(buffer, NULL);
+
+ buffer->fd = hmm_create_file(size);
+ buffer->size = size;
+ buffer->mirror = malloc(size);
+ ASSERT_NE(buffer->mirror, NULL);
+
+ buffer->ptr = mmap(NULL, size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED,
+ buffer->fd, 0);
+ ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+ /* Initialize buffer in system memory. */
+ for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+ ptr[i] = i;
+
+ /*
+ * TODO: Migration code should try and clean the pages, but it's not
+ * working.
+ */
+ fsync(buffer->fd);
+
+ /* Migrate memory to device. */
+ ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(buffer->cpages, npages);
+
+ /* Check what the device read and update to write to the device. */
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i]++, i);
+
+ /* Write to the buffer from the device */
+ ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_WRITE, buffer, npages);
+ ASSERT_EQ(ret, 0);
+
+ /* Truncate half the file */
+ size >>= 1;
+ ret = truncate("hmm-test-file", size);
+ ASSERT_EQ(ret, 0);
+
+ /* Use read and check what we read */
+ ret = read(buffer->fd, buffer->mirror, size);
+ ASSERT_EQ(ret, size);
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], i + 1);
+
+ /* Should see the same in the mmap */
+ for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], i + 1);
+
+ /* And check we get zeros in the second half */
+ size <<= 1;
+ ret = truncate("hmm-test-file", size);
+ ASSERT_EQ(ret, 0);
+
+ for (i = 0, ptr = buffer->ptr; i < size / (2*sizeof(*ptr)); ++i)
+ ASSERT_EQ(ptr[i], i + 1);
+
+ for (i = size/(2*sizeof(*ptr)), ptr = buffer->ptr;
+ i < size / sizeof(*ptr); ++i)
+ ASSERT_EQ(ptr[i], 0);
+
+ hmm_buffer_free(buffer);
+}
+
+/*
* Migrate anonymous memory to device private memory and fault some of it back
* to system memory, then try migrating the resulting mix of system and device
* private memory to the device.
@@ -1040,8 +1288,10 @@ TEST_F(hmm, migrate_fault)
ASSERT_EQ(buffer->cpages, npages);
/* Check what the device read. */
- for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+ for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i) {
ASSERT_EQ(ptr[i], i);
+ ptr[i]++;
+ }
/* Fault half the pages back to system memory and check them. */
for (i = 0, ptr = buffer->ptr; i < size / (2 * sizeof(*ptr)); ++i)
--
git-series 0.9.1
* [PATCH RFC 6/6] nouveau: Add SVM support for migrating file-backed pages to the GPU
From: Alistair Popple @ 2025-03-16 4:29 UTC (permalink / raw)
To: linux-mm; +Cc: linux-fsdevel, linux-kernel, Alistair Popple
Currently SVM for Nouveau only allows private anonymous memory to be
migrated to the GPU. Add support for migrating file-backed pages by
implementing the new migrate_to_pagecache() callback to copy pages back
as required.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
drivers/gpu/drm/nouveau/nouveau_dmem.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 1a07256..f9a5103 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -218,9 +218,33 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
return ret;
}
+static int nouveau_dmem_migrate_to_pagecache(struct page *page,
+ struct page *newpage)
+{
+ struct nouveau_drm *drm = page_to_drm(page);
+ struct nouveau_dmem *dmem = drm->dmem;
+ dma_addr_t dma_addr = 0;
+ struct nouveau_svmm *svmm;
+ struct nouveau_fence *fence;
+
+ set_page_dirty(newpage);
+ svmm = page->zone_device_data;
+ mutex_lock(&svmm->mutex);
+
+ /* TODO: Error handling */
+ WARN_ON_ONCE(nouveau_dmem_copy_one(drm, page, newpage, &dma_addr));
+ mutex_unlock(&svmm->mutex);
+ nouveau_fence_new(&fence, dmem->migrate.chan);
+ nouveau_dmem_fence_done(&fence);
+ dma_unmap_page(drm->dev->dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
+
+ return 0;
+}
+
static const struct dev_pagemap_ops nouveau_dmem_pagemap_ops = {
.page_free = nouveau_dmem_page_free,
.migrate_to_ram = nouveau_dmem_migrate_to_ram,
+ .migrate_to_pagecache = nouveau_dmem_migrate_to_pagecache,
};
static int
--
git-series 0.9.1
* Re: [PATCH RFC 0/6] Allow file-backed or shared device private pages
From: Christoph Hellwig @ 2025-03-17 6:04 UTC (permalink / raw)
To: Alistair Popple; +Cc: linux-mm, linux-fsdevel, linux-kernel
On Sun, Mar 16, 2025 at 03:29:23PM +1100, Alistair Popple wrote:
> This series lifts that restriction by allowing ZONE_DEVICE private pages to
> exist in the pagecache.
You'd better provide a really good argument for why we'd even want
to do that. So far this cover letter fails to do that.
* Re: [PATCH RFC 0/6] Allow file-backed or shared device private pages
From: Matthew Wilcox @ 2025-03-26 2:14 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Alistair Popple, linux-mm, linux-fsdevel, linux-kernel
On Sun, Mar 16, 2025 at 11:04:07PM -0700, Christoph Hellwig wrote:
> On Sun, Mar 16, 2025 at 03:29:23PM +1100, Alistair Popple wrote:
> > This series lifts that restriction by allowing ZONE_DEVICE private pages to
> > exist in the pagecache.
>
> You'd better provide a really good argument for why we'd even want
> to do that. So far this cover letter fails to do that.
Alistair and I discussed this during his session at LSFMM today.
Here's what I think we agreed to.
The use case is a file containing a potentially very large data set.
Some phases of processing that data set are best done on the GPU, other
phases on the CPU. We agreed that shared writable mmap was not actually
needed (it might need to be supported for correctness, but it's not a
performance requirement).
So, there's no need to put DEVICE_PRIVATE pages in the page cache.
Instead the GPU will take a copy of the page(s). We agreed that there
will have to be some indication (probably a folio flag?) that the GPU has
or may have a copy of (some of) the folio so that it can be invalidated
if the page is removed due to truncation / eviction.
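A purely illustrative sketch of that invalidation idea; the folio flag
(folio_test_gpu_copy()) and the driver callback (gpu_invalidate_folio())
are hypothetical, nothing like them exists yet:

static void example_invalidate_gpu_copy(struct address_space *mapping,
					struct folio *folio)
{
	/* Hypothetical flag set when the GPU took a copy of this folio. */
	if (!folio_test_gpu_copy(folio))
		return;

	/* Hypothetical callback telling the device to drop (or write back)
	 * its copy before the folio is truncated or evicted. */
	gpu_invalidate_folio(mapping, folio_pos(folio), folio_size(folio));
	folio_clear_gpu_copy(folio);
}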
Alistair, let me know if that's not what you think we agreed to ;-)
* Re: [PATCH RFC 0/6] Allow file-backed or shared device private pages
From: Alistair Popple @ 2025-03-27 14:49 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Christoph Hellwig, linux-mm, linux-fsdevel, linux-kernel
On Wed, Mar 26, 2025 at 02:14:59AM +0000, Matthew Wilcox wrote:
> On Sun, Mar 16, 2025 at 11:04:07PM -0700, Christoph Hellwig wrote:
> > On Sun, Mar 16, 2025 at 03:29:23PM +1100, Alistair Popple wrote:
> > > This series lifts that restriction by allowing ZONE_DEVICE private pages to
> > > exist in the pagecache.
> >
> > You'd better provide a really good argument for why we'd even want
> > to do that. So far this cover letter fails to do that.
>
> Alistair and I discussed this during his session at LSFMM today.
> Here's what I think we agreed to.
Thanks for writing up this summary.
>
> The use case is a file containing a potentially very large data set.
> Some phases of processing that data set are best done on the GPU, other
> phases on the CPU. We agreed that shared writable mmap was not actually
> needed (it might need to be supported for correctness, but it's not a
> performance requirement).
Right. I agree we don't currently have a good use case for writeback, so
the next revision will definitely only support read-only access.
> So, there's no need to put DEVICE_PRIVATE pages in the page cache.
> Instead the GPU will take a copy of the page(s). We agreed that there
> will have to be some indication (probably a folio flag?) that the GPU has
> or may have a copy of (some of) the folio so that it can be invalidated
> if the page is removed due to truncation / eviction.
>
> Alistair, let me know if that's not what you think we agreed to ;-)
That all looks about right. I think the flag/indication is a good idea and is
probably the best solution, but I will need to write the code to truly
convince myself of that :-)
* Re: [PATCH RFC 0/6] Allow file-backed or shared device private pages
From: Matthew Wilcox @ 2025-03-27 16:47 UTC (permalink / raw)
To: Alistair Popple; +Cc: Christoph Hellwig, linux-mm, linux-fsdevel, linux-kernel
On Thu, Mar 27, 2025 at 07:49:47AM -0700, Alistair Popple wrote:
> On Wed, Mar 26, 2025 at 02:14:59AM +0000, Matthew Wilcox wrote:
> > So, there's no need to put DEVICE_PRIVATE pages in the page cache.
> > Instead the GPU will take a copy of the page(s). We agreed that there
> > will have to be some indication (probably a folio flag?) that the GPU has
> > or may have a copy of (some of) the folio so that it can be invalidated
> > if the page is removed due to truncation / eviction.
> >
> > Alistair, let me know if that's not what you think we agreed to ;-)
>
> That all looks about right. I think the flag/indication is a good idea and is
> probably the best solution, but I will need to write the code to truely convince
> myself of that :-)
It might end up making more sense to make it a per-VMA flag or a
per-inode flag, but that's probably something you're in a better
position to determine than I am.
* Re: [PATCH RFC 0/6] Allow file-backed or shared device private pages
From: Christoph Hellwig @ 2025-04-07 9:15 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Christoph Hellwig, Alistair Popple, linux-mm, linux-fsdevel,
linux-kernel
On Wed, Mar 26, 2025 at 02:14:59AM +0000, Matthew Wilcox wrote:
> So, there's no need to put DEVICE_PRIVATE pages in the page cache.
> Instead the GPU will take a copy of the page(s). We agreed that there
> will have to be some indication (probably a folio flag?) that the GPU has
> or may have a copy of (some of) the folio so that it can be invalidated
> if the page is removed due to truncation / eviction.
Sounds like layout leases used for pnfs.