* [RFC PATCH 0/4] nfs: Enable PCI Peer-to-Peer DMA (P2PDMA) support
@ 2026-04-01 19:44 Pranjal Shrivastava
From: Pranjal Shrivastava @ 2026-04-01 19:44 UTC
  To: trond.myklebust, anna
  Cc: davem, kuba, edumazet, pabeni, chuck.lever, jlayton, tom,
	okorniev, neil, dai.ngo, linux-nfs, netdev, Pranjal Shrivastava

As high-performance storage environments increasingly rely on direct
data movement between PCIe endpoints (e.g., moving data directly between
an NVMe Controller Memory Buffer and a network interface), support for
Peer-to-Peer DMA (P2PDMA) in the network filesystem layer becomes
essential. This series adds P2PDMA support to the NFS direct I/O path.

Currently, NFS O_DIRECT operations fail with -EREMOTEIO if the user
buffer resides in PCIe BAR memory. There are two causes: the direct I/O
path uses the legacy `iov_iter_get_pages_alloc2()` API, which cannot
pass the required `FOLL_PCI_P2PDMA` flag, and the NFS request lifecycle
is unaware of the pinning requirements of P2P memory.

Design
======
The proposed design makes the NFS request lifecycle "pin-aware" and
upgrades the supporting infrastructure so that the direct I/O path can
use the modern memory extraction API.

1. 64-bit Capability Infrastructure
The existing nfs_server->caps bitmask is 32 bits wide, and all of its
bits are already in use. This series widens the bitmask to 64 bits to
make room for NFS_CAP_P2PDMA. Crucially, it also converts the NFS_CAP_*
constants to unsigned long long (ULL) values. With 32-bit constants,
~NFS_CAP_ACLS is a 32-bit mask that gets zero-extended to 64 bits, so a
negation such as caps &= ~NFS_CAP_ACLS would silently clear the high 32
bits of the 64-bit capability field.
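
To make the truncation bug concrete, here is a small userspace C sketch.
The DEMO_* constants and their bit positions are hypothetical stand-ins
chosen for illustration, not the real NFS_CAP_* values, and the sketch
assumes the old constants were 32-bit unsigned values such as (1U << n).

```c
#include <stdint.h>

/* Hypothetical constants for illustration only; bits 3 and 40 are
 * arbitrary and are not the series' actual NFS_CAP_* assignments. */
#define DEMO_CAP_ACLS_U32  (1U << 3)    /* 32-bit unsigned int constant */
#define DEMO_CAP_ACLS_ULL  (1ULL << 3)  /* 64-bit unsigned long long constant */
#define DEMO_CAP_HIGH      (1ULL << 40) /* a capability stored above bit 31 */

/* Buggy: ~DEMO_CAP_ACLS_U32 is the 32-bit value 0xfffffff7. It is
 * zero-extended to 0x00000000fffffff7 before the AND, so bits 32..63
 * of caps are cleared as a side effect. */
static uint64_t clear_acls_u32(uint64_t caps)
{
	caps &= ~DEMO_CAP_ACLS_U32;
	return caps;
}

/* Fixed: ~DEMO_CAP_ACLS_ULL is 0xfffffffffffffff7, so only bit 3 is
 * cleared and the high capability bits survive. */
static uint64_t clear_acls_ull(uint64_t caps)
{
	caps &= ~DEMO_CAP_ACLS_ULL;
	return caps;
}
```

For example, clear_acls_u32(DEMO_CAP_HIGH | DEMO_CAP_ACLS_ULL) returns 0,
losing DEMO_CAP_HIGH, while clear_acls_ull() on the same value returns
DEMO_CAP_HIGH. In kernel code, BIT_ULL(n) from <linux/bits.h> is the
usual way to spell a 64-bit bit constant.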

2. Transport-Level Detection
P2PDMA support is a property of the local transport hardware. A new
supports_p2pdma operation is added to struct rpc_xprt_ops. The RDMA
transport implements it by querying the underlying device with
ib_dma_pci_p2p_dma_supported(). The NFS client calls this operation at
mount time and sets the NFS_CAP_P2PDMA bit accordingly.
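
The detection flow can be modeled in userspace as an optional callback in
an ops table. This sketch assumes that a transport without P2PDMA support
simply leaves the hook NULL. Every name here (the demo_* types and
functions, the device_is_p2p_capable field standing in for
ib_dma_pci_p2p_dma_supported(), and the capability bit position) is
hypothetical; the real series works on struct rpc_xprt_ops and struct
nfs_server.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NFS_CAP_P2PDMA; bit 40 is arbitrary. */
#define DEMO_CAP_P2PDMA (1ULL << 40)

struct demo_xprt;

/* Models the transport ops table; the hook is optional. */
struct demo_xprt_ops {
	bool (*supports_p2pdma)(const struct demo_xprt *xprt);
};

struct demo_xprt {
	const struct demo_xprt_ops *ops;
	bool device_is_p2p_capable; /* models the RDMA device query */
};

/* Models the per-mount server state holding the capability bits. */
struct demo_server {
	uint64_t caps;
};

/* Models an RDMA transport whose answer comes from the device. */
static bool demo_rdma_supports_p2pdma(const struct demo_xprt *xprt)
{
	return xprt->device_is_p2p_capable;
}

static const struct demo_xprt_ops demo_rdma_ops = {
	.supports_p2pdma = demo_rdma_supports_p2pdma,
};

/* Models TCP/UDP: the hook is absent, so P2PDMA is never advertised. */
static const struct demo_xprt_ops demo_tcp_ops = {
	.supports_p2pdma = NULL,
};

/* Mount-time detection: set the capability only if the transport
 * implements the hook and the hook reports support. */
static void demo_detect_p2pdma(struct demo_server *server,
			       const struct demo_xprt *xprt)
{
	if (xprt->ops->supports_p2pdma && xprt->ops->supports_p2pdma(xprt))
		server->caps |= DEMO_CAP_P2PDMA;
	else
		server->caps &= ~DEMO_CAP_P2PDMA;
}
```

Checking for a NULL hook keeps the capability opt-in: only transports
that explicitly implement the query can ever enable the P2P path.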

3. Pin-Aware Request Lifecycle
Standard NFS requests manage page lifetime with get_page() and
put_page(). Pages extracted from a user buffer with
iov_iter_extract_pages(), however, are pinned rather than referenced,
and must be released with unpin_user_page() instead of put_page().

This series introduces a PG_PINNED flag in struct nfs_page. When it is
set, the request lifecycle skips the usual page reference counting and
calls unpin_user_page() only once the I/O has completed, so the physical
memory stays pinned for the entire DMA transfer.
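
The PG_PINNED rule can be modeled as a per-request flag that selects the
release path. In this userspace sketch, two counters stand in for the
page reference count and pin count, and comments mark where get_page(),
put_page() and unpin_user_page() would be called. The demo_* names are
hypothetical; this is not the code in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Models a page: references (get_page/put_page) and pins are tracked
 * separately, as the kernel tracks them differently. */
struct demo_page {
	int refcount; /* models references taken with get_page() */
	int pincount; /* models pins taken by iov_iter_extract_pages() */
};

/* Models struct nfs_page; 'pinned' plays the role of PG_PINNED. */
struct demo_req {
	struct demo_page *page;
	bool pinned;
};

/* Set up a request. A pinned page already carries a pin from
 * extraction, so no extra reference is taken; an ordinary page gets a
 * normal reference. */
static void demo_req_init(struct demo_req *req, struct demo_page *page,
			  bool pinned)
{
	req->page = page;
	req->pinned = pinned;
	if (!pinned)
		page->refcount++; /* get_page() */
}

/* Called exactly once, after the I/O (and any DMA) has completed. */
static void demo_req_release(struct demo_req *req)
{
	if (req->pinned)
		req->page->pincount--; /* unpin_user_page() */
	else
		req->page->refcount--; /* put_page() */
	req->page = NULL;
}
```

Keeping the choice in a single release helper means each request drops
exactly what it acquired, so a pinned page is never released with
put_page() or vice versa.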

4. API Migration
The direct I/O path is migrated to the modern iov_iter_extract_pages()
API. The ITER_ALLOW_P2PDMA extraction flag is passed to
iov_iter_extract_pages() only when the mount has the NFS_CAP_P2PDMA
capability bit set, so "normal" users on standard TCP/UDP transports see
no change in behavior or overhead.
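
The capability gating amounts to choosing the extraction flags from the
mount's capability bits. This userspace sketch models that choice,
together with the -EREMOTEIO failure described at the top of this
letter. DEMO_CAP_P2PDMA and DEMO_ITER_ALLOW_P2PDMA are hypothetical
stand-ins for NFS_CAP_P2PDMA and ITER_ALLOW_P2PDMA, their values are
arbitrary, and demo_extract() is a simplified model of the extraction
step, not a kernel function.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values are arbitrary. */
#define DEMO_CAP_P2PDMA        (1ULL << 40)
#define DEMO_ITER_ALLOW_P2PDMA 0x01u

/* Choose extraction flags for a direct I/O request: P2P pages are only
 * permitted when the mount advertised transport support. */
static unsigned int demo_dio_extraction_flags(uint64_t server_caps)
{
	unsigned int flags = 0;

	if (server_caps & DEMO_CAP_P2PDMA)
		flags |= DEMO_ITER_ALLOW_P2PDMA;
	return flags;
}

/* Models the extraction step: a buffer in PCIe BAR memory is refused
 * with -EREMOTEIO unless the caller allowed P2PDMA. */
static int demo_extract(bool buffer_is_p2p, unsigned int flags)
{
	if (buffer_is_p2p && !(flags & DEMO_ITER_ALLOW_P2PDMA))
		return -EREMOTEIO;
	return 0;
}
```

Because the flag is derived from the capability bit, a mount over TCP or
UDP never sets it, and ordinary (non-P2P) buffers are extracted the same
way whether or not the flag is present.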

Call for review
===============
Any insights on the proposed changes to the nfs_page lifecycle and the
64-bit capability expansion would be appreciated. If this approach is
wrong, or if there is a more idiomatic way to do this, please point me
in the right direction.

Thanks,
Praan

Pranjal Shrivastava (4):
  sunrpc: add supports_p2pdma to rpc_xprt_ops
  nfs: add NFS_CAP_P2PDMA and detect transport support
  nfs: make nfs_page pin-aware
  nfs: allow P2PDMA in direct I/O path

 fs/nfs/client.c                 |  8 ++++
 fs/nfs/direct.c                 | 51 ++++++++++++++++++-------
 fs/nfs/nfs4_fs.h                |  2 +-
 fs/nfs/pagelist.c               | 18 ++++++---
 fs/nfs/super.c                  |  2 +-
 include/linux/nfs_fs_sb.h       | 67 +++++++++++++++++----------------
 include/linux/nfs_page.h        |  2 +
 include/linux/sunrpc/xprt.h     |  1 +
 net/sunrpc/xprtrdma/transport.c |  9 +++++
 9 files changed, 106 insertions(+), 54 deletions(-)

-- 
2.53.0.1185.g05d4b7b318-goog

Thread overview: 15+ messages
2026-04-01 19:44 [RFC PATCH 0/4] nfs: Enable PCI Peer-to-Peer DMA (P2PDMA) support Pranjal Shrivastava
2026-04-01 19:44 ` [RFC PATCH 1/4] sunrpc: add supports_p2pdma to rpc_xprt_ops Pranjal Shrivastava
2026-04-01 19:44 ` [RFC PATCH 2/4] nfs: add NFS_CAP_P2PDMA and detect transport support Pranjal Shrivastava
2026-04-02 13:11   ` Chuck Lever
2026-04-14 19:54     ` Pranjal Shrivastava
2026-04-14 20:59       ` Chuck Lever
2026-04-01 19:44 ` [RFC PATCH 3/4] nfs: make nfs_page pin-aware Pranjal Shrivastava
2026-04-02  5:04   ` Christoph Hellwig
2026-04-14 19:58     ` Pranjal Shrivastava
2026-04-16  5:28       ` Christoph Hellwig
2026-04-01 19:45 ` [RFC PATCH 4/4] nfs: allow P2PDMA in direct I/O path Pranjal Shrivastava
2026-04-02  5:05   ` Christoph Hellwig
2026-04-14 20:00     ` Pranjal Shrivastava
2026-04-16  5:29       ` Christoph Hellwig
2026-04-02  5:07 ` [RFC PATCH 0/4] nfs: Enable PCI Peer-to-Peer DMA (P2PDMA) support Christoph Hellwig