* [PATCH v2 0/7] nfs: Modernize Direct I/O path
@ 2026-06-16 13:39 Pranjal Shrivastava
  2026-06-16 13:39 ` [PATCH v2 1/7] nfs: make nfs_page pin-aware Pranjal Shrivastava
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-06-16 13:39 UTC (permalink / raw)
  To: linux-nfs, linux-kernel
  Cc: Trond Myklebust, Anna Schumaker, Christoph Hellwig,
	Christoph Hellwig, Shivaji Kant, Pranjal Shrivastava

Modernize the NFS Direct I/O path as a preparatory step to enable PCI
Peer-to-Peer DMA (P2PDMA) support. Following feedback on the initial
RFC [1], the modernization and architectural changes are split into
this standalone series.

Currently, NFS O_DIRECT relies on the legacy iov_iter_get_pages_alloc2()
API, which does not support the pinning requirements for P2P memory.
This series moves NFS to the modern iov_iter_extract_pages() API and
migrates NFS direct I/O from pages to folios.

Design
======

1. Pin-Awareness
Standard NFS requests use get_page() and put_page() for memory
management. However, memory extracted via iov_iter_extract_pages()
requires explicit pinning.

Add a PG_PINNED flag and a wb_nr_pinned count to struct nfs_page.
This allows the request lifecycle to track ownership of physical pins
and ensure that unpinning is performed only when the I/O is complete.
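
The ownership rule described above can be modelled in userspace as
follows. This is only a sketch under stated assumptions: demo_page,
demo_req and the two helpers are hypothetical names, and plain integer
counters stand in for the kernel's reference and pin counts. It is not
the code from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a page: counters replace refcount and pincount. */
struct demo_page {
	int refs;	/* plain references (get_page()/put_page() analogue) */
	int pins;	/* pins (unpin_user_page() analogue) */
};

/* Hypothetical analogue of struct nfs_page with the two new fields. */
struct demo_req {
	struct demo_page *page;
	bool pinned;		/* models the PG_PINNED flag */
	unsigned int nr_pinned;	/* models wb_nr_pinned */
};

/* Attach a page: a pinned page already carries its pin, so take no extra ref. */
static void demo_req_attach(struct demo_req *req, struct demo_page *page,
			    bool pinned)
{
	req->page = page;
	req->pinned = pinned;
	req->nr_pinned = pinned ? 1 : 0;
	if (!pinned)
		page->refs++;
}

/* Release at I/O completion: drop whichever kind of ownership was recorded. */
static void demo_req_release(struct demo_req *req)
{
	if (!req->page)
		return;
	if (req->pinned) {
		req->page->pins -= (int)req->nr_pinned;
		req->pinned = false;
	} else {
		req->page->refs--;
	}
	req->page = NULL;
}
```

In this model a request created with pinned == true never touches refs,
and one created with pinned == false never touches pins, which is the
invariant the flag is meant to preserve.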

2. API Migration
Migrate the Direct I/O path to the modern iov_iter_extract_pages()
API. This aligns NFS with the modern extraction model and serves as
the foundation for passing ITER_ALLOW_P2PDMA in a follow-up series.

3. Extraction Helper and Folio Support
Introduce a new extraction helper in direct.c to group contiguous
pages from the same folio into a single struct nfs_page. This
effectively migrates the Direct I/O path from being page-based to being
folio-based.
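
As an illustration of the grouping idea, the userspace sketch below
merges consecutive entries of a page array into runs when they belong
to the same folio and are adjacent inside it. The types and the
demo_group_pages() name are assumptions made for this example; the
helper in direct.c works on struct page and struct folio and may group
differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor: owning folio plus index inside that folio. */
struct demo_page {
	int folio_id;
	unsigned int idx;
};

/* One run of pages that would back a single request. */
struct demo_run {
	size_t first;	/* index of the run's first page in the input array */
	size_t npages;	/* number of pages merged into the run */
};

/*
 * Walk the array once; extend the current run while the next page is in the
 * same folio and directly follows the previous page, else start a new run.
 * 'runs' must have room for npages entries. Returns the number of runs.
 */
static size_t demo_group_pages(const struct demo_page *pages, size_t npages,
			       struct demo_run *runs)
{
	size_t nruns = 0;
	size_t i;

	for (i = 0; i < npages; i++) {
		if (nruns > 0) {
			const struct demo_page *prev = &pages[i - 1];

			if (pages[i].folio_id == prev->folio_id &&
			    pages[i].idx == prev->idx + 1) {
				runs[nruns - 1].npages++;
				continue;
			}
		}
		runs[nruns].first = i;
		runs[nruns].npages = 1;
		nruns++;
	}
	return nruns;
}
```

With this scheme a buffer backed by one large folio yields a single
run, while a buffer made of unrelated order-0 pages degrades to one run
per page.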

Note: zone_device_pages_have_same_pgmap() checks are intentionally
omitted in the extraction helper since P2PDMA enablement will be
introduced in a follow-up series.

Bisectability
=============
The series attempts to remain bisectable. 

[Patches 1-2] Introduce pin-aware infrastructure and accounting.
[Patch 3] Adds a centralized request release helper.
[Patch 4] Migrates the Direct I/O path to iov_iter_extract_pages().
[Patches 5-6] Implement the extraction helper and folio-based grouping.
[Patch 7] Removes orphaned page-based helpers.

Testing
=======
This series has been tested with xfstests [2] on RDMA & TCP transports:

 ./check generic/091 generic/130 generic/139 generic/143 generic/154 \
         generic/155 generic/183 generic/188 generic/190 generic/196 \
         generic/198 generic/203 generic/214 generic/240 generic/263 \
         generic/287 generic/290 generic/292 generic/330 generic/444 \
         generic/450 generic/451 generic/586 generic/647 generic/708 \
         generic/729 generic/760

The following summary was tabulated via a custom script [3] (on github).

python3 display.py results/*/check.log
+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| testcase     | rdma-sys-3   | rdma-sys-4.0 | rdma-sys-4.1 | rdma-sys-4.2 | tcp-sys-3    | tcp-sys-4.0  | tcp-sys-4.1  | tcp-sys-4.2  |
+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| generic/091  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/130  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/139  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/143  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/154  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/155  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/183  | skipped      | skipped      | skipped      | pass         | skipped      | skipped      | skipped      | pass         |
| generic/188  | skipped      | skipped      | skipped      | pass         | skipped      | skipped      | skipped      | pass         |
| generic/190  | skipped      | skipped      | skipped      | pass         | skipped      | skipped      | skipped      | pass         |
| generic/196  | skipped      | skipped      | skipped      | pass         | skipped      | skipped      | skipped      | pass         |
| generic/198  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/203  | skipped      | skipped      | skipped      | pass         | skipped      | skipped      | skipped      | pass         |
| generic/214  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/240  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/263  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/287  | skipped      | skipped      | skipped      | pass         | skipped      | skipped      | skipped      | pass         |
| generic/290  | skipped      | skipped      | skipped      | pass         | skipped      | skipped      | skipped      | pass         |
| generic/292  | skipped      | skipped      | skipped      | pass         | skipped      | skipped      | skipped      | pass         |
| generic/330  | skipped      | skipped      | skipped      | pass         | skipped      | skipped      | skipped      | pass         |
| generic/444  | skipped      | skipped      | skipped      | skipped      | skipped      | skipped      | skipped      | skipped      |
| generic/450  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/451  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/586  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/647  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/708  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/729  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
| generic/760  | pass         | pass         | pass         | pass         | pass         | pass         | pass         | pass         |
+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+

Thanks,
Praan

[1] https://lore.kernel.org/all/20260401194501.2269200-1-praan@google.com/
[2] https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git


[v2] 
 - Fix data corruption in nfs_direct_extract_pages() by correctly
   calculating intra-page offsets using offset_in_page().
 - Fix requested_bytes accounting in direct read/write paths to only
   increment after successful RPC scheduling.
 - Add missing kernel-doc descriptions for the @pinned parameter in
   nfs_page_create_from_page() and nfs_page_create_from_folio().
 - Rebase on fs-next/

[v1] https://lore.kernel.org/all/20260603053033.3300318-1-praan@google.com/

Pranjal Shrivastava (7):
  nfs: make nfs_page pin-aware
  nfs: Track number of pinned pages in nfs_page
  nfs: Introduce nfs_release_request_list helper
  nfs: migrate direct I/O to iov_iter_extract_pages
  nfs: introduce nfs_direct_extract_pages helper
  nfs: Optimize direct I/O to use folios for requests
  nfs: Cleanup the nfs_page_create_from_page helper

 fs/nfs/direct.c          | 165 +++++++++++++++++++++++----------------
 fs/nfs/pagelist.c        |  87 +++++++++++----------
 fs/nfs/read.c            |   2 +-
 fs/nfs/write.c           |   2 +-
 include/linux/nfs_page.h |  12 ++-
 5 files changed, 150 insertions(+), 118 deletions(-)


base-commit: 389bb4a76905771adfa86d21ee0b865247148e9d
-- 
2.54.0.1136.gdb2ca164c4-goog



* [PATCH v2 1/7] nfs: make nfs_page pin-aware
  2026-06-16 13:39 [PATCH v2 0/7] nfs: Modernize Direct I/O path Pranjal Shrivastava
@ 2026-06-16 13:39 ` Pranjal Shrivastava
  2026-06-16 13:39 ` [PATCH v2 2/7] nfs: Track number of pinned pages in nfs_page Pranjal Shrivastava
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-06-16 13:39 UTC (permalink / raw)
  To: linux-nfs, linux-kernel
  Cc: Trond Myklebust, Anna Schumaker, Christoph Hellwig,
	Christoph Hellwig, Shivaji Kant, Pranjal Shrivastava

Modernizing the NFS Direct I/O path to use iov_iter_extract_pages()
introduces page pinning (GUP) instead of standard page referencing.
To handle this correctly, nfs_page must track whether it holds a
pin or a standard reference.

Introduce a new flag, PG_PINNED, to struct nfs_page. Update the creation
path (nfs_page_create_from_page and nfs_page_create_from_folio) to
accept a pinned bool and set the flag accordingly. If the page is pinned,
we skip the existing reference increment (get_page/folio_get) as the pin
itself acts as a reference.

Update nfs_clear_request() and nfs_direct_release_pages() to unpin
(unpin_user_page(), unpin_user_folio() or unpin_user_pages()) rather
than drop a plain reference (put_page() or folio_put()) when the pages
are pinned. Finally, ensure subrequests inherit the pinning status from
their parent request.

Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
 fs/nfs/direct.c          | 22 +++++++++++++++-------
 fs/nfs/pagelist.c        | 38 ++++++++++++++++++++++++++++----------
 fs/nfs/read.c            |  2 +-
 fs/nfs/write.c           |  2 +-
 include/linux/nfs_page.h |  3 +++
 5 files changed, 48 insertions(+), 19 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index e626c72495e6..19792a38c924 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -165,11 +165,17 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
 	return 0;
 }
 
-static void nfs_direct_release_pages(struct page **pages, unsigned int npages)
+static void nfs_direct_release_pages(struct page **pages, unsigned int npages,
+				     bool pinned)
 {
 	unsigned int i;
-	for (i = 0; i < npages; i++)
-		put_page(pages[i]);
+
+	if (pinned) {
+		unpin_user_pages(pages, npages);
+	} else {
+		for (i = 0; i < npages; i++)
+			put_page(pages[i]);
+	}
 }
 
 void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
@@ -371,7 +377,8 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 			unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
 			/* XXX do we need to do the eof zeroing found in async_filler? */
 			req = nfs_page_create_from_page(dreq->ctx, pagevec[i],
-							pgbase, pos, req_len);
+							false, pgbase, pos,
+							req_len);
 			if (IS_ERR(req)) {
 				result = PTR_ERR(req);
 				break;
@@ -386,7 +393,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 			requested_bytes += req_len;
 			pos += req_len;
 		}
-		nfs_direct_release_pages(pagevec, npages);
+		nfs_direct_release_pages(pagevec, npages, false);
 		kvfree(pagevec);
 		if (result < 0)
 			break;
@@ -907,7 +914,8 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 			unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
 
 			req = nfs_page_create_from_page(dreq->ctx, pagevec[i],
-							pgbase, pos, req_len);
+							false, pgbase, pos,
+							req_len);
 			if (IS_ERR(req)) {
 				result = PTR_ERR(req);
 				break;
@@ -950,7 +958,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 			desc.pg_error = 0;
 			defer = true;
 		}
-		nfs_direct_release_pages(pagevec, npages);
+		nfs_direct_release_pages(pagevec, npages, false);
 		kvfree(pagevec);
 		if (result < 0)
 			break;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 7dd478ffc2fa..faa8bc1c6526 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -404,20 +404,26 @@ static struct nfs_page *nfs_page_create(struct nfs_lock_context *l_ctx,
 	return req;
 }
 
-static void nfs_page_assign_folio(struct nfs_page *req, struct folio *folio)
+static void nfs_page_assign_folio(struct nfs_page *req, struct folio *folio, bool pinned)
 {
 	if (folio != NULL) {
 		req->wb_folio = folio;
-		folio_get(folio);
+		if (pinned)
+			set_bit(PG_PINNED, &req->wb_flags);
+		else
+			folio_get(folio);
 		set_bit(PG_FOLIO, &req->wb_flags);
 	}
 }
 
-static void nfs_page_assign_page(struct nfs_page *req, struct page *page)
+static void nfs_page_assign_page(struct nfs_page *req, struct page *page, bool pinned)
 {
 	if (page != NULL) {
 		req->wb_page = page;
-		get_page(page);
+		if (pinned)
+			set_bit(PG_PINNED, &req->wb_flags);
+		else
+			get_page(page);
 	}
 }
 
@@ -425,6 +431,7 @@ static void nfs_page_assign_page(struct nfs_page *req, struct page *page)
  * nfs_page_create_from_page - Create an NFS read/write request.
  * @ctx: open context to use
  * @page: page to write
+ * @pinned: true if page is pinned
  * @pgbase: starting offset within the page for the write
  * @offset: file offset for the write
  * @count: number of bytes to read/write
@@ -435,6 +442,7 @@ static void nfs_page_assign_page(struct nfs_page *req, struct page *page)
  */
 struct nfs_page *nfs_page_create_from_page(struct nfs_open_context *ctx,
 					   struct page *page,
+					   bool pinned,
 					   unsigned int pgbase, loff_t offset,
 					   unsigned int count)
 {
@@ -446,7 +454,7 @@ struct nfs_page *nfs_page_create_from_page(struct nfs_open_context *ctx,
 	ret = nfs_page_create(l_ctx, pgbase, offset >> PAGE_SHIFT,
 			      offset_in_page(offset), count);
 	if (!IS_ERR(ret)) {
-		nfs_page_assign_page(ret, page);
+		nfs_page_assign_page(ret, page, pinned);
 		nfs_page_group_init(ret, NULL);
 	}
 	nfs_put_lock_context(l_ctx);
@@ -457,6 +465,7 @@ struct nfs_page *nfs_page_create_from_page(struct nfs_open_context *ctx,
  * nfs_page_create_from_folio - Create an NFS read/write request.
  * @ctx: open context to use
  * @folio: folio to write
+ * @pinned: true if folio is pinned
  * @offset: starting offset within the folio for the write
  * @count: number of bytes to read/write
  *
@@ -466,6 +475,7 @@ struct nfs_page *nfs_page_create_from_page(struct nfs_open_context *ctx,
  */
 struct nfs_page *nfs_page_create_from_folio(struct nfs_open_context *ctx,
 					    struct folio *folio,
+					    bool pinned,
 					    unsigned int offset,
 					    unsigned int count)
 {
@@ -476,7 +486,7 @@ struct nfs_page *nfs_page_create_from_folio(struct nfs_open_context *ctx,
 		return ERR_CAST(l_ctx);
 	ret = nfs_page_create(l_ctx, offset, folio->index, offset, count);
 	if (!IS_ERR(ret)) {
-		nfs_page_assign_folio(ret, folio);
+		nfs_page_assign_folio(ret, folio, pinned);
 		nfs_page_group_init(ret, NULL);
 	}
 	nfs_put_lock_context(l_ctx);
@@ -498,9 +508,11 @@ nfs_create_subreq(struct nfs_page *req,
 			      offset, count);
 	if (!IS_ERR(ret)) {
 		if (folio)
-			nfs_page_assign_folio(ret, folio);
+			nfs_page_assign_folio(ret, folio,
+					      test_bit(PG_PINNED, &req->wb_flags));
 		else
-			nfs_page_assign_page(ret, page);
+			nfs_page_assign_page(ret, page,
+					     test_bit(PG_PINNED, &req->wb_flags));
 		/* find the last request */
 		for (last = req->wb_head;
 		     last->wb_this_page != req->wb_head;
@@ -552,11 +564,17 @@ static void nfs_clear_request(struct nfs_page *req)
 	struct nfs_open_context *ctx;
 
 	if (folio != NULL) {
-		folio_put(folio);
+		if (test_and_clear_bit(PG_PINNED, &req->wb_flags))
+			unpin_user_folio(folio, 1);
+		else
+			folio_put(folio);
 		req->wb_folio = NULL;
 		clear_bit(PG_FOLIO, &req->wb_flags);
 	} else if (page != NULL) {
-		put_page(page);
+		if (test_and_clear_bit(PG_PINNED, &req->wb_flags))
+			unpin_user_page(page);
+		else
+			put_page(page);
 		req->wb_page = NULL;
 	}
 	if (l_ctx != NULL) {
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 2b70bd2b934b..e7497b029d6c 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -324,7 +324,7 @@ int nfs_read_add_folio(struct nfs_pageio_descriptor *pgio,
 
 	aligned_len = min_t(unsigned int, ALIGN(len, rsize), fsize);
 
-	new = nfs_page_create_from_folio(ctx, folio, 0, aligned_len);
+	new = nfs_page_create_from_folio(ctx, folio, false, 0, aligned_len);
 	if (IS_ERR(new)) {
 		error = PTR_ERR(new);
 		if (nfs_netfs_folio_unlock(folio))
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index fcffb8c9e9df..e39e62b65ce2 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1086,7 +1086,7 @@ static struct nfs_page *nfs_setup_write_request(struct nfs_open_context *ctx,
 	req = nfs_try_to_update_request(folio, offset, bytes);
 	if (req != NULL)
 		goto out;
-	req = nfs_page_create_from_folio(ctx, folio, offset, bytes);
+	req = nfs_page_create_from_folio(ctx, folio, false, offset, bytes);
 	if (IS_ERR(req))
 		goto out;
 	nfs_inode_add_request(req);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 4b9a35dbc062..fd7aafe7cb54 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -38,6 +38,7 @@ enum {
 	PG_REMOVE,		/* page group sync bit in write path */
 	PG_CONTENDED1,		/* Is someone waiting for a lock? */
 	PG_CONTENDED2,		/* Is someone waiting for a lock? */
+	PG_PINNED,		/* page is pinned by GUP */
 };
 
 struct nfs_inode;
@@ -125,11 +126,13 @@ struct nfs_pageio_descriptor {
 
 extern struct nfs_page *nfs_page_create_from_page(struct nfs_open_context *ctx,
 						  struct page *page,
+						  bool pinned,
 						  unsigned int pgbase,
 						  loff_t offset,
 						  unsigned int count);
 extern struct nfs_page *nfs_page_create_from_folio(struct nfs_open_context *ctx,
 						   struct folio *folio,
+						   bool pinned,
 						   unsigned int offset,
 						   unsigned int count);
 extern	void nfs_release_request(struct nfs_page *);
-- 
2.54.0.1136.gdb2ca164c4-goog



* [PATCH v2 2/7] nfs: Track number of pinned pages in nfs_page
  2026-06-16 13:39 [PATCH v2 0/7] nfs: Modernize Direct I/O path Pranjal Shrivastava
  2026-06-16 13:39 ` [PATCH v2 1/7] nfs: make nfs_page pin-aware Pranjal Shrivastava
@ 2026-06-16 13:39 ` Pranjal Shrivastava
  2026-06-16 13:39 ` [PATCH v2 3/7] nfs: Introduce nfs_release_request_list helper Pranjal Shrivastava
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-06-16 13:39 UTC (permalink / raw)
  To: linux-nfs, linux-kernel
  Cc: Trond Myklebust, Anna Schumaker, Christoph Hellwig,
	Christoph Hellwig, Shivaji Kant, Pranjal Shrivastava

Track the number of pinned pages in nfs_page so that unpinning is
handled correctly: only primary requests perform the final unpin, and
subrequests do not unpin on behalf of their parent requests.

Add wb_nr_pinned to struct nfs_page to store the count of pinned pages
owned by the request. Update request creation and cleanup helpers to
initialize and use wb_nr_pinned for primary requests. Use the
nfs_page_array_len() helper to calculate the number of pages spanned
by a request's offset and length.
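
For reference, the page-span arithmetic can be sketched in userspace as
below. The demo_pages_spanned() name and the fixed 4 KiB page size are
assumptions for this example; the kernel helper uses PAGE_SIZE and
PAGE_SHIFT of the running configuration.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT	12			/* assumed 4 KiB pages */
#define DEMO_PAGE_SIZE	(1UL << DEMO_PAGE_SHIFT)

/*
 * Number of pages touched by 'len' bytes starting 'base' bytes into the
 * first page: round (base + len) up to the next page boundary.
 */
static size_t demo_pages_spanned(size_t base, size_t len)
{
	return (base + len + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
}
```

A request that starts at the last byte of a page and covers two bytes
spans two pages, which is why the in-page offset has to be part of the
calculation.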

Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
 fs/nfs/pagelist.c        | 9 +++++++--
 include/linux/nfs_page.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index faa8bc1c6526..7d51e10fe97a 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -455,6 +455,8 @@ struct nfs_page *nfs_page_create_from_page(struct nfs_open_context *ctx,
 			      offset_in_page(offset), count);
 	if (!IS_ERR(ret)) {
 		nfs_page_assign_page(ret, page, pinned);
+		if (pinned)
+			ret->wb_nr_pinned = 1;
 		nfs_page_group_init(ret, NULL);
 	}
 	nfs_put_lock_context(l_ctx);
@@ -487,6 +489,9 @@ struct nfs_page *nfs_page_create_from_folio(struct nfs_open_context *ctx,
 	ret = nfs_page_create(l_ctx, offset, folio->index, offset, count);
 	if (!IS_ERR(ret)) {
 		nfs_page_assign_folio(ret, folio, pinned);
+		if (pinned)
+			ret->wb_nr_pinned = nfs_page_array_len(offset_in_page(offset),
+							      count);
 		nfs_page_group_init(ret, NULL);
 	}
 	nfs_put_lock_context(l_ctx);
@@ -565,14 +570,14 @@ static void nfs_clear_request(struct nfs_page *req)
 
 	if (folio != NULL) {
 		if (test_and_clear_bit(PG_PINNED, &req->wb_flags))
-			unpin_user_folio(folio, 1);
+			unpin_user_folio(folio, req->wb_nr_pinned);
 		else
 			folio_put(folio);
 		req->wb_folio = NULL;
 		clear_bit(PG_FOLIO, &req->wb_flags);
 	} else if (page != NULL) {
 		if (test_and_clear_bit(PG_PINNED, &req->wb_flags))
-			unpin_user_page(page);
+			unpin_user_pages(&page, req->wb_nr_pinned);
 		else
 			put_page(page);
 		req->wb_page = NULL;
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index fd7aafe7cb54..080fa3e23580 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -59,6 +59,7 @@ struct nfs_page {
 	struct nfs_page		*wb_this_page;  /* list of reqs for this page */
 	struct nfs_page		*wb_head;       /* head pointer for req list */
 	unsigned short		wb_nio;		/* Number of I/O attempts */
+	unsigned int		wb_nr_pinned;	/* Number of pinned pages */
 };
 
 struct nfs_pgio_mirror;
-- 
2.54.0.1136.gdb2ca164c4-goog



* [PATCH v2 3/7] nfs: Introduce nfs_release_request_list helper
  2026-06-16 13:39 [PATCH v2 0/7] nfs: Modernize Direct I/O path Pranjal Shrivastava
  2026-06-16 13:39 ` [PATCH v2 1/7] nfs: make nfs_page pin-aware Pranjal Shrivastava
  2026-06-16 13:39 ` [PATCH v2 2/7] nfs: Track number of pinned pages in nfs_page Pranjal Shrivastava
@ 2026-06-16 13:39 ` Pranjal Shrivastava
  2026-06-16 13:39 ` [PATCH v2 4/7] nfs: migrate direct I/O to iov_iter_extract_pages Pranjal Shrivastava
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-06-16 13:39 UTC (permalink / raw)
  To: linux-nfs, linux-kernel
  Cc: Trond Myklebust, Anna Schumaker, Christoph Hellwig,
	Christoph Hellwig, Shivaji Kant, Pranjal Shrivastava

Introduce a centralized helper, nfs_release_request_list, to handle
the bulk release of nfs_page requests from a list.

This serves as a preparatory step for two upcoming improvements:

   1. Pin-Aware Cleanup: As we migrate to iov_iter_extract_* API,
      requests will hold pins (GUP) instead of standard references. The
      helper ensures that the correct unpinning logic gets applied
      consistently across all requests in a list.

   2. Folio Support: In subsequent patches where nfs_page structures
      will cover multi-page folios, this helper provides a clean
      infrastructure to unlock these larger units of I/O in bulk during
      completion, similar to the pattern in bio_release_pages.

Additionally, refactor nfs_read_sync_pgio_error() to utilize this new
helper.
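
The drain pattern the helper centralizes looks roughly like the
userspace sketch below. The singly linked demo_req list and
demo_release_list() are hypothetical stand-ins; the kernel version
walks a struct list_head and calls nfs_release_request() on each entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request: a list link plus a reference count. */
struct demo_req {
	struct demo_req *next;
	int refcount;
};

/* Pop every request off the list and drop one reference on each. */
static void demo_release_list(struct demo_req **head)
{
	while (*head) {
		struct demo_req *req = *head;

		*head = req->next;	/* unlink before releasing */
		req->next = NULL;
		req->refcount--;
	}
}
```

Unlinking before dropping the reference matters once the release can
free the request, since the list link lives inside the request itself.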

Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
 fs/nfs/direct.c          |  8 +-------
 fs/nfs/pagelist.c        | 18 ++++++++++++++++++
 include/linux/nfs_page.h |  4 ++--
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 19792a38c924..96995736fac2 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -314,13 +314,7 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
 
 static void nfs_read_sync_pgio_error(struct list_head *head, int error)
 {
-	struct nfs_page *req;
-
-	while (!list_empty(head)) {
-		req = nfs_list_entry(head->next);
-		nfs_list_remove_request(req);
-		nfs_release_request(req);
-	}
+	nfs_release_request_list(head);
 }
 
 static void nfs_direct_pgio_init(struct nfs_pgio_header *hdr)
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 7d51e10fe97a..569bac4faff7 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -622,6 +622,24 @@ void nfs_release_request(struct nfs_page *req)
 }
 EXPORT_SYMBOL_GPL(nfs_release_request);
 
+/*
+ * nfs_release_request_list - Release a list of NFS read/write requests
+ * @head: list of requests to release
+ *
+ * Removes each request from the list and drops its refcount.
+ */
+void nfs_release_request_list(struct list_head *head)
+{
+	struct nfs_page *req;
+
+	while (!list_empty(head)) {
+		req = nfs_list_entry(head->next);
+		nfs_list_remove_request(req);
+		nfs_release_request(req);
+	}
+}
+EXPORT_SYMBOL_GPL(nfs_release_request_list);
+
 /*
  * nfs_generic_pg_test - determine if requests can be coalesced
  * @desc: pointer to descriptor
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 080fa3e23580..d23208ed3a33 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -136,8 +136,8 @@ extern struct nfs_page *nfs_page_create_from_folio(struct nfs_open_context *ctx,
 						   bool pinned,
 						   unsigned int offset,
 						   unsigned int count);
-extern	void nfs_release_request(struct nfs_page *);
-
+extern void nfs_release_request(struct nfs_page *req);
+extern void nfs_release_request_list(struct list_head *head);
 
 extern	void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
 			     struct inode *inode,
-- 
2.54.0.1136.gdb2ca164c4-goog



* [PATCH v2 4/7] nfs: migrate direct I/O to iov_iter_extract_pages
  2026-06-16 13:39 [PATCH v2 0/7] nfs: Modernize Direct I/O path Pranjal Shrivastava
                   ` (2 preceding siblings ...)
  2026-06-16 13:39 ` [PATCH v2 3/7] nfs: Introduce nfs_release_request_list helper Pranjal Shrivastava
@ 2026-06-16 13:39 ` Pranjal Shrivastava
  2026-06-16 13:39 ` [PATCH v2 5/7] nfs: introduce nfs_direct_extract_pages helper Pranjal Shrivastava
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-06-16 13:39 UTC (permalink / raw)
  To: linux-nfs, linux-kernel
  Cc: Trond Myklebust, Anna Schumaker, Christoph Hellwig,
	Christoph Hellwig, Shivaji Kant, Pranjal Shrivastava

Migrate the NFS Direct I/O path away from the legacy
iov_iter_get_pages_alloc2() API to the modern iov_iter_extract_pages() API.
The transition aligns NFS with the modern VFS extraction model and serves
as a preparatory step for supporting requirements such as page pinning
via GUP for DMA.

The migration fixes a bug in the Direct I/O loop where pages were being
unpinned immediately after request creation. With the new extraction
model, pins are held until the I/O is complete. Manual release in the
loop is correspondingly updated to only clean up failed pages.
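
The "only clean up failed pages" rule can be illustrated with the
userspace sketch below. All names are hypothetical, an int array
stands in for per-page pin counts, and fail_at simulates the index at
which request creation fails (pass npages for the no-failure case).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hand pages [0, i) to requests, which keep their pins until completion,
 * and unpin only the tail [i, npages) that never got a request.
 * Returns the number of pages now owned by requests.
 */
static size_t demo_schedule(int *pins, int *owned, size_t npages,
			    size_t fail_at)
{
	size_t i, j;

	for (i = 0; i < npages; i++) {
		if (i == fail_at)
			break;		/* request creation failed for page i */
		owned[i] = 1;		/* the request now owns this pin */
	}

	/* Mirrors the release of (pagevec + i, npages - i) in the patch. */
	for (j = i; j < npages; j++)
		pins[j]--;

	return i;
}
```

When every request is created successfully, i equals npages and the
release loop does nothing, so no pin is dropped before the I/O
completes.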

Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
 fs/nfs/direct.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 96995736fac2..b9ac0a67693c 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -354,16 +354,17 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 	inode_dio_begin(inode);
 
 	while (iov_iter_count(iter)) {
-		struct page **pagevec;
+		struct page **pagevec = NULL;
 		size_t bytes;
 		size_t pgbase;
 		unsigned npages, i;
+		bool pinned = iov_iter_extract_will_pin(iter);
 
-		result = iov_iter_get_pages_alloc2(iter, &pagevec,
-						  rsize, &pgbase);
+		result = iov_iter_extract_pages(iter, &pagevec,
+						rsize, ~0U, 0, &pgbase);
 		if (result < 0)
 			break;
-	
+
 		bytes = result;
 		npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
 		for (i = 0; i < npages; i++) {
@@ -371,7 +372,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 			unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
 			/* XXX do we need to do the eof zeroing found in async_filler? */
 			req = nfs_page_create_from_page(dreq->ctx, pagevec[i],
-							false, pgbase, pos,
+							pinned, pgbase, pos,
 							req_len);
 			if (IS_ERR(req)) {
 				result = PTR_ERR(req);
@@ -387,7 +388,8 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 			requested_bytes += req_len;
 			pos += req_len;
 		}
-		nfs_direct_release_pages(pagevec, npages, false);
+		if (i < npages)
+			nfs_direct_release_pages(pagevec + i, npages - i, pinned);
 		kvfree(pagevec);
 		if (result < 0)
 			break;
@@ -891,13 +893,14 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 
 	NFS_I(inode)->write_io += iov_iter_count(iter);
 	while (iov_iter_count(iter)) {
-		struct page **pagevec;
+		struct page **pagevec = NULL;
 		size_t bytes;
 		size_t pgbase;
 		unsigned npages, i;
+		bool pinned = iov_iter_extract_will_pin(iter);
 
-		result = iov_iter_get_pages_alloc2(iter, &pagevec,
-						  wsize, &pgbase);
+		result = iov_iter_extract_pages(iter, &pagevec,
+						wsize, ~0U, 0, &pgbase);
 		if (result < 0)
 			break;
 
@@ -908,7 +911,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 			unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
 
 			req = nfs_page_create_from_page(dreq->ctx, pagevec[i],
-							false, pgbase, pos,
+							pinned, pgbase, pos,
 							req_len);
 			if (IS_ERR(req)) {
 				result = PTR_ERR(req);
@@ -952,7 +955,8 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 			desc.pg_error = 0;
 			defer = true;
 		}
-		nfs_direct_release_pages(pagevec, npages, false);
+		if (i < npages)
+			nfs_direct_release_pages(pagevec + i, npages - i, pinned);
 		kvfree(pagevec);
 		if (result < 0)
 			break;
-- 
2.54.0.1136.gdb2ca164c4-goog



* [PATCH v2 5/7] nfs: introduce nfs_direct_extract_pages helper
  2026-06-16 13:39 [PATCH v2 0/7] nfs: Modernize Direct I/O path Pranjal Shrivastava
                   ` (3 preceding siblings ...)
  2026-06-16 13:39 ` [PATCH v2 4/7] nfs: migrate direct I/O to iov_iter_extract_pages Pranjal Shrivastava
@ 2026-06-16 13:39 ` Pranjal Shrivastava
  2026-06-16 13:39 ` [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests Pranjal Shrivastava
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-06-16 13:39 UTC (permalink / raw)
  To: linux-nfs, linux-kernel
  Cc: Trond Myklebust, Anna Schumaker, Christoph Hellwig,
	Christoph Hellwig, Shivaji Kant, Pranjal Shrivastava

Introduce nfs_direct_extract_pages() in direct.c to centralize page
extraction and request creation for the Direct I/O path. The helper
manages extraction from the iov_iter and builds a list of nfs_page requests.

Refactor nfs_direct_read_schedule_iovec() and
nfs_direct_write_schedule_iovec() to utilize the new helper, unifying
the extraction logic on both paths.
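
The per-page splitting that such a helper performs on an extracted
range can be sketched in userspace as below. demo_split() and the
4 KiB page size are assumptions for this example, and pgbase is assumed
to be smaller than the page size; the real helper additionally creates
an nfs_page per chunk and advances the file position.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE	4096UL	/* assumed page size */

/*
 * Split 'total' bytes that start 'pgbase' bytes into the first page into
 * per-page chunk lengths. At most 'max' lengths are written to 'lens'.
 * Returns the number of chunks produced.
 */
static size_t demo_split(size_t pgbase, size_t total, size_t *lens, size_t max)
{
	size_t done = 0, n = 0;

	while (done < total && n < max) {
		size_t room = DEMO_PAGE_SIZE - pgbase;
		size_t left = total - done;
		size_t len = left < room ? left : room;

		lens[n++] = len;
		done += len;
		pgbase = 0;	/* only the first page starts mid-page */
	}
	return n;
}
```

For example, 8192 bytes starting at offset 100 in the first page split
into chunks of 3996, 4096 and 100 bytes across three pages.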

Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
 fs/nfs/direct.c | 122 +++++++++++++++++++++++-------------------------
 1 file changed, 59 insertions(+), 63 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index b9ac0a67693c..e2a93cfb6c72 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -178,6 +178,50 @@ static void nfs_direct_release_pages(struct page **pages, unsigned int npages,
 	}
 }
 
+static ssize_t nfs_direct_extract_pages(struct nfs_direct_req *dreq,
+					 struct iov_iter *iter,
+					 size_t size, loff_t *pos,
+					 struct list_head *list)
+{
+	bool pinned = iov_iter_extract_will_pin(iter);
+	struct page **pagevec = NULL;
+	ssize_t result, bytes = 0;
+	unsigned int npages, i;
+	size_t pgbase;
+
+	result = iov_iter_extract_pages(iter, &pagevec, size, ~0U, 0, &pgbase);
+	if (result <= 0)
+		return result;
+
+	npages = (result + pgbase + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	for (i = 0; i < npages; i++) {
+		struct nfs_page *req;
+		unsigned int req_len = min_t(size_t, result - bytes, PAGE_SIZE - pgbase);
+
+		req = nfs_page_create_from_page(dreq->ctx, pagevec[i],
+						pinned, pgbase, *pos,
+						req_len);
+		if (IS_ERR(req)) {
+			if (!bytes)
+				bytes = PTR_ERR(req);
+			break;
+		}
+
+		list_add_tail(&req->wb_list, list);
+		pgbase = 0;
+		bytes += req_len;
+		*pos += req_len;
+	}
+
+	if (i < npages) {
+		iov_iter_revert(iter, result - bytes);
+		nfs_direct_release_pages(pagevec + i, npages - i, pinned);
+	}
+
+	kvfree(pagevec);
+	return bytes;
+}
+
 void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
 			      struct nfs_direct_req *dreq)
 {
@@ -346,6 +390,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 	ssize_t result = -EINVAL;
 	size_t requested_bytes = 0;
 	size_t rsize = max_t(size_t, NFS_SERVER(inode)->rsize, PAGE_SIZE);
+	LIST_HEAD(nfs_page_list);
 
 	nfs_pageio_init_read(&desc, dreq->inode, false,
 			     &nfs_direct_read_completion_ops);
@@ -354,43 +399,22 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 	inode_dio_begin(inode);
 
 	while (iov_iter_count(iter)) {
-		struct page **pagevec = NULL;
-		size_t bytes;
-		size_t pgbase;
-		unsigned npages, i;
-		bool pinned = iov_iter_extract_will_pin(iter);
-
-		result = iov_iter_extract_pages(iter, &pagevec,
-						rsize, ~0U, 0, &pgbase);
+		result = nfs_direct_extract_pages(dreq, iter, rsize, &pos, &nfs_page_list);
 		if (result < 0)
 			break;
 
-		bytes = result;
-		npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
-		for (i = 0; i < npages; i++) {
-			struct nfs_page *req;
-			unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
-			/* XXX do we need to do the eof zeroing found in async_filler? */
-			req = nfs_page_create_from_page(dreq->ctx, pagevec[i],
-							pinned, pgbase, pos,
-							req_len);
-			if (IS_ERR(req)) {
-				result = PTR_ERR(req);
-				break;
-			}
+		requested_bytes += result;
+		while (!list_empty(&nfs_page_list)) {
+			struct nfs_page *req = nfs_list_entry(nfs_page_list.next);
+
+			nfs_list_remove_request(req);
 			if (!nfs_pageio_add_request(&desc, req)) {
 				result = desc.pg_error;
 				nfs_release_request(req);
+				nfs_release_request_list(&nfs_page_list);
 				break;
 			}
-			pgbase = 0;
-			bytes -= req_len;
-			requested_bytes += req_len;
-			pos += req_len;
 		}
-		if (i < npages)
-			nfs_direct_release_pages(pagevec + i, npages - i, pinned);
-		kvfree(pagevec);
 		if (result < 0)
 			break;
 	}
@@ -881,6 +905,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 	ssize_t result = 0;
 	size_t requested_bytes = 0;
 	size_t wsize = max_t(size_t, NFS_SERVER(inode)->wsize, PAGE_SIZE);
+	LIST_HEAD(nfs_page_list);
 	bool defer = false;
 
 	trace_nfs_direct_write_schedule_iovec(dreq);
@@ -893,42 +918,15 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 
 	NFS_I(inode)->write_io += iov_iter_count(iter);
 	while (iov_iter_count(iter)) {
-		struct page **pagevec = NULL;
-		size_t bytes;
-		size_t pgbase;
-		unsigned npages, i;
-		bool pinned = iov_iter_extract_will_pin(iter);
-
-		result = iov_iter_extract_pages(iter, &pagevec,
-						wsize, ~0U, 0, &pgbase);
+		result = nfs_direct_extract_pages(dreq, iter, wsize, &pos, &nfs_page_list);
 		if (result < 0)
 			break;
 
-		bytes = result;
-		npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
-		for (i = 0; i < npages; i++) {
-			struct nfs_page *req;
-			unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
-
-			req = nfs_page_create_from_page(dreq->ctx, pagevec[i],
-							pinned, pgbase, pos,
-							req_len);
-			if (IS_ERR(req)) {
-				result = PTR_ERR(req);
-				break;
-			}
-
-			if (desc.pg_error < 0) {
-				nfs_free_request(req);
-				result = desc.pg_error;
-				break;
-			}
-
-			pgbase = 0;
-			bytes -= req_len;
-			requested_bytes += req_len;
-			pos += req_len;
+		requested_bytes += result;
+		while (!list_empty(&nfs_page_list)) {
+			struct nfs_page *req = nfs_list_entry(nfs_page_list.next);
 
+			nfs_list_remove_request(req);
 			if (defer) {
 				nfs_mark_request_commit(req, NULL, &cinfo, 0);
 				continue;
@@ -942,6 +940,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 			if (desc.pg_error < 0 && desc.pg_error != -EAGAIN) {
 				result = desc.pg_error;
 				nfs_unlock_and_release_request(req);
+				nfs_release_request_list(&nfs_page_list);
 				break;
 			}
 
@@ -955,9 +954,6 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 			desc.pg_error = 0;
 			defer = true;
 		}
-		if (i < npages)
-			nfs_direct_release_pages(pagevec + i, npages - i, pinned);
-		kvfree(pagevec);
 		if (result < 0)
 			break;
 	}
-- 
2.54.0.1136.gdb2ca164c4-goog



* [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests
  2026-06-16 13:39 [PATCH v2 0/7] nfs: Modernize Direct I/O path Pranjal Shrivastava
                   ` (4 preceding siblings ...)
  2026-06-16 13:39 ` [PATCH v2 5/7] nfs: introduce nfs_direct_extract_pages helper Pranjal Shrivastava
@ 2026-06-16 13:39 ` Pranjal Shrivastava
  2026-06-16 15:29   ` Trond Myklebust
  2026-06-16 13:40 ` [PATCH v2 7/7] nfs: Cleanup the nfs_page_create_from_page helper Pranjal Shrivastava
  2026-06-16 14:15 ` [PATCH v2 0/7] nfs: Modernize Direct I/O path Pranjal Shrivastava
  7 siblings, 1 reply; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-06-16 13:39 UTC (permalink / raw)
  To: linux-nfs, linux-kernel
  Cc: Trond Myklebust, Anna Schumaker, Christoph Hellwig,
	Christoph Hellwig, Shivaji Kant, Pranjal Shrivastava

Optimize nfs_direct_extract_pages() to group contiguous pages from the
same folio into single nfs_page structures. This effectively migrates
NFS Direct I/O from being page-based to being folio-based.

Reduce the number of nfs_page allocations and subsequent iterations
by utilizing nfs_page_create_from_folio() to create aggregated
requests.

Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
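The grouping rule can be sketched in userspace as follows. This is a
simplified model, not the patch itself: sketch_page, sketch_chunk and
sketch_group are hypothetical names, the folio is modeled as a plain
integer id, and a 4 KiB page size is assumed. As in the diff below, a
page joins the current chunk only if it belongs to the same folio as the
chunk's first page and directly follows the previous page in memory.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical page model: only records which folio owns the page. */
struct sketch_page {
	int folio_id;
};

/* One aggregated request: a run of pages from a single folio. */
struct sketch_chunk {
	size_t first;		/* index of the first page in pagevec */
	size_t nr_pages;	/* pages merged into this chunk */
	size_t len;		/* bytes covered by this chunk */
};

static size_t sketch_group(struct sketch_page **pagevec, size_t npages,
			   size_t result, size_t pgbase,
			   struct sketch_chunk *out, size_t max)
{
	size_t bytes = 0, i = 0, n = 0;

	while (i < npages && n < max) {
		int folio_id = pagevec[i]->folio_id;
		size_t room = SKETCH_PAGE_SIZE - pgbase;
		size_t len = result - bytes < room ? result - bytes : room;
		size_t nr = 1;

		while (i + nr < npages) {
			struct sketch_page *next = pagevec[i + nr];
			struct sketch_page *prev = pagevec[i + nr - 1];
			size_t left;

			/* Stop at a folio boundary or a gap in memory. */
			if (next->folio_id != folio_id || next != prev + 1)
				break;

			left = result - bytes - len;
			len += left < SKETCH_PAGE_SIZE ? left : SKETCH_PAGE_SIZE;
			nr++;
		}

		out[n].first = i;
		out[n].nr_pages = nr;
		out[n].len = len;
		n++;

		pgbase = 0;	/* later chunks start on a page boundary */
		bytes += len;
		i += nr;
	}
	return n;
}
```

With three adjacent pages of one folio followed by two single-page
folios, the model yields three chunks where the per-page loop would have
produced five requests.
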
 fs/nfs/direct.c | 47 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 37 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index e2a93cfb6c72..ddc6b27f5315 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -194,23 +194,45 @@ static ssize_t nfs_direct_extract_pages(struct nfs_direct_req *dreq,
 		return result;
 
 	npages = (result + pgbase + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	for (i = 0; i < npages; i++) {
+	for (i = 0; i < npages; ) {
+		unsigned int chunk_len, folio_offset;
+		unsigned int nr_to_add = 1;
 		struct nfs_page *req;
-		unsigned int req_len = min_t(size_t, result - bytes, PAGE_SIZE - pgbase);
+		struct folio *folio;
 
-		req = nfs_page_create_from_page(dreq->ctx, pagevec[i],
-						pinned, pgbase, *pos,
-						req_len);
+		folio = page_folio(pagevec[i]);
+		folio_offset = (folio_page_idx(folio, pagevec[i]) << PAGE_SHIFT) + pgbase;
+		chunk_len = min_t(size_t, result - bytes, PAGE_SIZE - pgbase);
+
+		while (i + nr_to_add < npages) {
+			struct page *next_page = pagevec[i + nr_to_add];
+			struct page *prev_page = pagevec[i + nr_to_add - 1];
+
+			if (page_folio(next_page) != folio ||
+			    next_page != prev_page + 1)
+				break;
+
+			chunk_len += min_t(size_t, result - bytes - chunk_len, PAGE_SIZE);
+			nr_to_add++;
+		}
+
+		req = nfs_page_create_from_folio(dreq->ctx, folio,
+						  pinned, folio_offset,
+						  chunk_len);
 		if (IS_ERR(req)) {
 			if (!bytes)
 				bytes = PTR_ERR(req);
 			break;
 		}
 
+		req->wb_index = *pos >> PAGE_SHIFT;
+		req->wb_offset = offset_in_page(*pos);
+
 		list_add_tail(&req->wb_list, list);
 		pgbase = 0;
-		bytes += req_len;
-		*pos += req_len;
+		bytes += chunk_len;
+		*pos += chunk_len;
+		i += nr_to_add;
 	}
 
 	if (i < npages) {
@@ -403,9 +425,9 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 		if (result < 0)
 			break;
 
-		requested_bytes += result;
 		while (!list_empty(&nfs_page_list)) {
 			struct nfs_page *req = nfs_list_entry(nfs_page_list.next);
+			size_t req_len = req->wb_bytes;
 
 			nfs_list_remove_request(req);
 			if (!nfs_pageio_add_request(&desc, req)) {
@@ -414,6 +436,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 				nfs_release_request_list(&nfs_page_list);
 				break;
 			}
+			requested_bytes += req_len;
 		}
 		if (result < 0)
 			break;
@@ -922,19 +945,22 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 		if (result < 0)
 			break;
 
-		requested_bytes += result;
 		while (!list_empty(&nfs_page_list)) {
 			struct nfs_page *req = nfs_list_entry(nfs_page_list.next);
+			size_t req_len = req->wb_bytes;
 
 			nfs_list_remove_request(req);
 			if (defer) {
 				nfs_mark_request_commit(req, NULL, &cinfo, 0);
+				requested_bytes += req_len;
 				continue;
 			}
 
 			nfs_lock_request(req);
-			if (nfs_pageio_add_request(&desc, req))
+			if (nfs_pageio_add_request(&desc, req)) {
+				requested_bytes += req_len;
 				continue;
+			}
 
 			/* Exit on hard errors */
 			if (desc.pg_error < 0 && desc.pg_error != -EAGAIN) {
@@ -951,6 +977,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 			spin_unlock(&dreq->lock);
 			nfs_unlock_request(req);
 			nfs_mark_request_commit(req, NULL, &cinfo, 0);
+			requested_bytes += req_len;
 			desc.pg_error = 0;
 			defer = true;
 		}
-- 
2.54.0.1136.gdb2ca164c4-goog



* [PATCH v2 7/7] nfs: Cleanup the nfs_page_create_from_page helper
  2026-06-16 13:39 [PATCH v2 0/7] nfs: Modernize Direct I/O path Pranjal Shrivastava
                   ` (5 preceding siblings ...)
  2026-06-16 13:39 ` [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests Pranjal Shrivastava
@ 2026-06-16 13:40 ` Pranjal Shrivastava
  2026-06-16 14:15 ` [PATCH v2 0/7] nfs: Modernize Direct I/O path Pranjal Shrivastava
  7 siblings, 0 replies; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-06-16 13:40 UTC (permalink / raw)
  To: linux-nfs, linux-kernel
  Cc: Trond Myklebust, Anna Schumaker, Christoph Hellwig,
	Christoph Hellwig, Shivaji Kant, Pranjal Shrivastava

Remove the nfs_page_create_from_page() helper and its public export.
Following the migration of the Direct I/O path to folios, this
function no longer has any callers in the NFS client.

Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
 fs/nfs/pagelist.c        | 36 ------------------------------------
 include/linux/nfs_page.h |  6 ------
 2 files changed, 42 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 569bac4faff7..56d397645cc0 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -427,42 +427,6 @@ static void nfs_page_assign_page(struct nfs_page *req, struct page *page, bool p
 	}
 }
 
-/**
- * nfs_page_create_from_page - Create an NFS read/write request.
- * @ctx: open context to use
- * @page: page to write
- * @pinned: true if page is pinned
- * @pgbase: starting offset within the page for the write
- * @offset: file offset for the write
- * @count: number of bytes to read/write
- *
- * The page must be locked by the caller. This makes sure we never
- * create two different requests for the same page.
- * User should ensure it is safe to sleep in this function.
- */
-struct nfs_page *nfs_page_create_from_page(struct nfs_open_context *ctx,
-					   struct page *page,
-					   bool pinned,
-					   unsigned int pgbase, loff_t offset,
-					   unsigned int count)
-{
-	struct nfs_lock_context *l_ctx = nfs_get_lock_context(ctx);
-	struct nfs_page *ret;
-
-	if (IS_ERR(l_ctx))
-		return ERR_CAST(l_ctx);
-	ret = nfs_page_create(l_ctx, pgbase, offset >> PAGE_SHIFT,
-			      offset_in_page(offset), count);
-	if (!IS_ERR(ret)) {
-		nfs_page_assign_page(ret, page, pinned);
-		if (pinned)
-			ret->wb_nr_pinned = 1;
-		nfs_page_group_init(ret, NULL);
-	}
-	nfs_put_lock_context(l_ctx);
-	return ret;
-}
-
 /**
  * nfs_page_create_from_folio - Create an NFS read/write request.
  * @ctx: open context to use
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index d23208ed3a33..86d0300075d3 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -125,12 +125,6 @@ struct nfs_pageio_descriptor {
 /* arbitrarily selected limit to number of mirrors */
 #define NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX 16
 
-extern struct nfs_page *nfs_page_create_from_page(struct nfs_open_context *ctx,
-						  struct page *page,
-						  bool pinned,
-						  unsigned int pgbase,
-						  loff_t offset,
-						  unsigned int count);
 extern struct nfs_page *nfs_page_create_from_folio(struct nfs_open_context *ctx,
 						   struct folio *folio,
 						   bool pinned,
-- 
2.54.0.1136.gdb2ca164c4-goog



* Re: [PATCH v2 0/7] nfs: Modernize Direct I/O path
  2026-06-16 13:39 [PATCH v2 0/7] nfs: Modernize Direct I/O path Pranjal Shrivastava
                   ` (6 preceding siblings ...)
  2026-06-16 13:40 ` [PATCH v2 7/7] nfs: Cleanup the nfs_page_create_from_page helper Pranjal Shrivastava
@ 2026-06-16 14:15 ` Pranjal Shrivastava
  7 siblings, 0 replies; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-06-16 14:15 UTC (permalink / raw)
  To: linux-nfs, linux-kernel
  Cc: Trond Myklebust, Anna Schumaker, Christoph Hellwig,
	Christoph Hellwig, Shivaji Kant

On Tue, Jun 16, 2026 at 01:39:53PM +0000, Pranjal Shrivastava wrote:
> Modernize the NFS Direct I/O path as a preparatory step to enable PCI
> Peer-to-Peer DMA (P2PDMA) support. Following feedback on the initial
> RFC [1], the modernization and architectural changes are split into
> this standalone series.
> 
> Currently, NFS O_DIRECT relies on the legacy iov_iter_get_pages_alloc2()
> API which does not support the pinning requirements for P2P memory.
> The implementation moves NFS to the modern iov_iter_extract_pages() API
> and migrates NFS direct I/O away from pages to use folios.
> 
> Design
> ======
> 
> 1. Pin-Awareness
> Standard NFS requests use get_page() and put_page() for memory
> management. However, memory extracted via iov_iter_extract_pages()
> requires explicit pinning.
> 
> Introduce a PG_PINNED flag and a wb_nr_pinned count to struct nfs_page.
> This allows the request lifecycle to track ownership of physical pins
> and ensure that unpinning is performed only when the I/O is complete.
> 
> 2. API Migration
> Migrate the Direct I/O path to the modern iov_iter_extract_pages()
> API. This aligns NFS with the modern extraction model and serves as
> the foundation for passing ITER_ALLOW_P2PDMA in a follow-up series.
> 
> 3. Extraction Helper and Folio Support
> Introduce a new extraction helper in direct.c to group contiguous
> pages from the same folio into a single struct nfs_page. This
> effectively migrates the Direct I/O path from being page-based to being
> folio-based.
> 
> Note: zone_device_pages_have_same_pgmap() checks are intentionally
> omitted in the extraction helper since P2PDMA enablement will be
> introduced in a follow-up series.
> 
> Bisectability
> =============
> The series attempts to remain bisectable. 
> 
> [Patches 1-2] Introduce pin-aware infrastructure and accounting.
> [Patch 3] Adds a centralized request release helper.
> [Patch 4] Migrates the Direct I/O path to iov_iter_extract_pages().
> [Patches 5-6] Implement the extraction helper and folio-based grouping.
> [Patch 7] Removes orphaned page-based helpers.
> 
> Testing
> =======
> This series has been tested with xfstests [2] on RDMA & TCP transports:
> 
>  ./check generic/091 generic/130 generic/139 generic/143 generic/154 \
>          generic/155 generic/183 generic/188 generic/190 generic/196 \
>          generic/198 generic/203 generic/214 generic/240 generic/263 \
>          generic/287 generic/290 generic/292 generic/330 generic/444 \
>          generic/450 generic/451 generic/586 generic/647 generic/708 \
>          generic/729 generic/760
> 
> The following summary was tabulated via a custom script [3] (on github).
> 

[...]

> 
> [1] https://lore.kernel.org/all/20260401194501.2269200-1-praan@google.com/
> [2] https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git

Missed [3] https://github.com/pran005/tools/blob/main/display.py

> 
> 
> [v2] 
>  - Fix data corruption in nfs_direct_extract_pages() by correctly
>    calculating intra-page offsets using offset_in_page().
>  - Fix requested_bytes accounting in direct read/write paths to only
>    increment after successful RPC scheduling.
>  - Add missing kernel-doc descriptions for the @pinned parameter in
>    nfs_page_create_from_page() and nfs_page_create_from_folio().
>  - Rebase on fs-next/

Thanks,
Praan


* Re: [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests
  2026-06-16 13:39 ` [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests Pranjal Shrivastava
@ 2026-06-16 15:29   ` Trond Myklebust
  2026-06-16 17:23     ` Pranjal Shrivastava
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2026-06-16 15:29 UTC (permalink / raw)
  To: Pranjal Shrivastava, linux-nfs, linux-kernel
  Cc: Anna Schumaker, Christoph Hellwig, Christoph Hellwig,
	Shivaji Kant

On Tue, 2026-06-16 at 13:39 +0000, Pranjal Shrivastava wrote:
> Optimize nfs_direct_extract_pages() to group contiguous pages from
> the
> same folio into single nfs_page structures. This effectively migrates
> NFS Direct I/O from being page-based to being folio-based.
> 
> Reduce the number of nfs_page allocations and subsequent iterations
> by utilizing nfs_page_create_from_folio() to create aggregated
> requests.
> 
> Signed-off-by: Pranjal Shrivastava <praan@google.com>
> ---
>  fs/nfs/direct.c | 47 +++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 37 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index e2a93cfb6c72..ddc6b27f5315 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -194,23 +194,45 @@ static ssize_t nfs_direct_extract_pages(struct
> nfs_direct_req *dreq,
>  		return result;
>  
>  	npages = (result + pgbase + PAGE_SIZE - 1) >> PAGE_SHIFT;
> -	for (i = 0; i < npages; i++) {
> +	for (i = 0; i < npages; ) {
> +		unsigned int chunk_len, folio_offset;
> +		unsigned int nr_to_add = 1;
>  		struct nfs_page *req;
> -		unsigned int req_len = min_t(size_t, result - bytes,
> PAGE_SIZE - pgbase);
> +		struct folio *folio;
>  
> -		req = nfs_page_create_from_page(dreq->ctx,
> pagevec[i],
> -						pinned, pgbase,
> *pos,
> -						req_len);
> +		folio = page_folio(pagevec[i]);

I'm clearly missing something. The memory pointed to by these pages can
be any arbitrary user space (or kernel space) memory region. It could
be mapped device memory, for instance.

So why can you assume that page_folio() will resolve to a valid folio
here?


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com


* Re: [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests
  2026-06-16 15:29   ` Trond Myklebust
@ 2026-06-16 17:23     ` Pranjal Shrivastava
  0 siblings, 0 replies; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-06-16 17:23 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: linux-nfs, linux-kernel, Anna Schumaker, Christoph Hellwig,
	Christoph Hellwig, Shivaji Kant

On Tue, Jun 16, 2026 at 11:29:13AM -0400, Trond Myklebust wrote:

Hi Trond

> On Tue, 2026-06-16 at 13:39 +0000, Pranjal Shrivastava wrote:
> > Optimize nfs_direct_extract_pages() to group contiguous pages from
> > the
> > same folio into single nfs_page structures. This effectively migrates
> > NFS Direct I/O from being page-based to being folio-based.
> > 
> > Reduce the number of nfs_page allocations and subsequent iterations
> > by utilizing nfs_page_create_from_folio() to create aggregated
> > requests.
> > 
> > Signed-off-by: Pranjal Shrivastava <praan@google.com>
> > ---
> >  fs/nfs/direct.c | 47 +++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 37 insertions(+), 10 deletions(-)
> > 
> > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> > index e2a93cfb6c72..ddc6b27f5315 100644
> > --- a/fs/nfs/direct.c
> > +++ b/fs/nfs/direct.c
> > @@ -194,23 +194,45 @@ static ssize_t nfs_direct_extract_pages(struct
> > nfs_direct_req *dreq,
> >  		return result;
> >  
> >  	npages = (result + pgbase + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > -	for (i = 0; i < npages; i++) {
> > +	for (i = 0; i < npages; ) {
> > +		unsigned int chunk_len, folio_offset;
> > +		unsigned int nr_to_add = 1;
> >  		struct nfs_page *req;
> > -		unsigned int req_len = min_t(size_t, result - bytes,
> > PAGE_SIZE - pgbase);
> > +		struct folio *folio;
> >  
> > -		req = nfs_page_create_from_page(dreq->ctx,
> > pagevec[i],
> > -						pinned, pgbase,
> > *pos,
> > -						req_len);
> > +		folio = page_folio(pagevec[i]);
> 
> I'm clearly missing something. The memory pointed to by these pages can
> be any arbitrary user space (or kernel space) memory region. It could
> be mapped device memory, for instance.
> 
> So why can you assume that page_folio() will resolve to a valid folio
> here?

AFAIU, the MM subsystem ensures that every valid struct page is part of
a folio. The documentation for page_folio() states this explicitly [1]:

     "Every page is part of a folio. This function cannot be called on a
      NULL pointer."

Since iov_iter_extract_pages() only returns pages that are successfully
pinned and tracked by the kernel, we are guaranteed that pagevec[i] 
points to a valid struct page and thus a valid folio.

Regarding device-mapped memory, ZONE_DEVICE pages have also been
refactored to support folios recently (e.g. free_zone_device_folio() [2]).
If the memory is not part of a large compound page, page_folio() simply
returns the struct page pointer cast to a struct folio * [3]. In this
case, the folio spans a single page, and our extraction loop correctly
handles it as a single-page request; it aggregates pages only when it
identifies physical contiguity within the same folio.

The only other thing to take care of is folio splitting, which is a
concern only when the caller does not hold a reference on the page.
However, in our case (NFS), iov_iter_extract_pages() has already
pinned the folio via GUP by this point, which ensures that the folio
cannot be split or freed under us, making the page_folio() call and the
subsequent aggregation logic safe.

Finally, in cases where device memory is NOT backed by struct page
(e.g. dmabuf or PFN-based mappings via remap_pfn_range), the buffers
are already unsupported for NFS Direct I/O today. The underlying page
pinning (GUP) would fail with -EFAULT in check_vma_flags() [4] even
before reaching this point.

Given the above guarantees by the kernel, we can rely on page_folio()
resolving to a valid folio at this point in the filesystem.

Thanks,
Praan

[1] https://elixir.bootlin.com/linux/v7.1-rc6/source/include/linux/page-flags.h#L291
[2] https://elixir.bootlin.com/linux/v7.1-rc6/source/mm/memremap.c#L416
[3] https://elixir.bootlin.com/linux/v7.1-rc6/source/include/linux/page-flags.h#L234
[4] https://elixir.bootlin.com/linux/v7.1-rc6/source/mm/gup.c#L1208


