From: "Chuck Lever" <cel@kernel.org>
To: "Pranjal Shrivastava" <praan@google.com>
Cc: "Trond Myklebust" <trond.myklebust@hammerspace.com>,
"Anna Schumaker" <anna@kernel.org>,
davem@davemloft.net, "Jakub Kicinski" <kuba@kernel.org>,
edumazet@google.com, "Paolo Abeni" <pabeni@redhat.com>,
"Chuck Lever" <chuck.lever@oracle.com>,
"Jeff Layton" <jlayton@kernel.org>, "Tom Talpey" <tom@talpey.com>,
"Olga Kornievskaia" <okorniev@redhat.com>,
NeilBrown <neil@brown.name>, "Dai Ngo" <dai.ngo@oracle.com>,
linux-nfs@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [RFC PATCH 2/4] nfs: add NFS_CAP_P2PDMA and detect transport support
Date: Tue, 14 Apr 2026 13:59:12 -0700
Message-ID: <510c6806-66f6-41b6-bf94-957bc68d31b7@app.fastmail.com>
In-Reply-To: <ad6bkyA1ItA8ou9i@google.com>

On Tue, Apr 14, 2026, at 12:54 PM, Pranjal Shrivastava wrote:
> On Thu, Apr 02, 2026 at 09:11:04AM -0400, Chuck Lever wrote:
>>
>> On Wed, Apr 1, 2026, at 3:44 PM, Pranjal Shrivastava wrote:
>> > The NFS server capabilities bitmask (server->caps) is currently full,
>> > utilizing all 32 bits of the existing unsigned int. Expand the bitmask
>> > to 64 bits (u64) to allow for new feature flags.
>> >
>> > Introduce a new capability bit, NFS_CAP_P2PDMA, to indicate that the
>> > local mount is backed by hardware and a transport capable of PCI
>> > Peer-to-Peer DMA.
>> >
>> > Update nfs_server_set_init_caps() to query the underlying SunRPC
>> > transport for P2PDMA support during the mount process. If the transport
>> > (e.g., RDMA) signals support, set the NFS_CAP_P2PDMA bit in the mount's
>> > capabilities. This allows the high-performance Direct I/O path to
>> > efficiently determine if it should allow P2P memory buffers.
>>
>> > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
>> > index be02bb227741..f177cf098d44 100644
>> > --- a/fs/nfs/client.c
>> > +++ b/fs/nfs/client.c
>>
>> > @@ -725,6 +727,12 @@ void nfs_server_set_init_caps(struct nfs_server *server)
>> >  		nfs4_server_set_init_caps(server);
>> >  		break;
>> >  	}
>> > +
>> > +	rcu_read_lock();
>> > +	xprt = rcu_dereference(server->client->cl_xprt);
>> > +	if (xprt->ops->supports_p2pdma && xprt->ops->supports_p2pdma(xprt))
>> > +		server->caps |= NFS_CAP_P2PDMA;
>> > +	rcu_read_unlock();
>> >  }
>> >  EXPORT_SYMBOL_GPL(nfs_server_set_init_caps);
>>
>> Is the transport even connected when the NFS client does this
>> test? If it isn't, xprtrdma and the RDMA core have not chosen
>> an underlying device yet.
>>
>> Note that, even if this logic /is/ correct, if the transport
>> connection is lost the transport will reconnect automatically,
>> doing the RDMA CM dance again and possibly resolving to a
>> different device. The NFS client layer will be none the wiser
>> and the NFS_CAP_P2PDMA flag setting will be stale at that point,
>> and quite possibly incorrect if the new connection's device is
>> not P2P-enabled.
>>
>> (Basically this is what happens when an RDMA device is removed).
>>
>> So this detection has to be done as part of xprtrdma's connection
>> flow, and it needs to set a flag somewhere in the rpc_xprt. The
>> NFS direct I/O code path then has to look for that flag before
>> choosing the mechanism/flags it uses for each iov iter.
>>
>
> Ack. I agree, so should we start with an initial cap and then update it
> in the event of a transport change / disconnect? Or shall we populate
> the cap only when a transport is connected?

IMO this flag does not belong in the NFS server CAPS, as it is a
capability associated with each RPC transport. How should
NFS_CAP_P2PDMA be set if there are two RPC transports, one with
P2PDMA enabled and one with it disabled? (Perhaps it should be a flag
in the transport switch instance rather than the transport instance.)

Which mechanism to use has to be re-decided every time a dreq is
scheduled because the xprt can change between an original send
and a retransmission (if, say, the COMMIT verifier changes due to
a server reboot).

Trond and Anna will have the final say about how this works.
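
To make the per-transport idea above concrete, here is a rough userspace
sketch in C. It is not kernel code and not a proposed patch. Every name in
it (model_xprt, model_xprt_switch, and the model_* helpers) is hypothetical
and stands in for the real rpc_xprt and transport switch structures. The
rule that P2P buffers are allowed only when every transport in the switch
supports them is an assumption of this sketch, not something settled in
this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rpc_xprt; not the kernel definition. */
struct model_xprt {
	/* Describes only the device behind the current connection. */
	atomic_bool p2pdma_ok;
};

/* Hypothetical stand-in for a transport switch holding several transports. */
struct model_xprt_switch {
	struct model_xprt *xprts[4];
	size_t nr_xprts;
};

/* Connect path: record what the newly resolved device supports. */
static void model_xprt_connected(struct model_xprt *xprt, bool device_has_p2pdma)
{
	assert(xprt != NULL);
	atomic_store(&xprt->p2pdma_ok, device_has_p2pdma);
}

/* Disconnect or device removal: the previous answer no longer applies. */
static void model_xprt_disconnected(struct model_xprt *xprt)
{
	assert(xprt != NULL);
	atomic_store(&xprt->p2pdma_ok, false);
}

/*
 * Assumed policy: a request or its retransmission may land on any transport
 * in the switch, so allow P2P buffers only if all of them support P2PDMA.
 * Meant to be re-evaluated each time a request is scheduled, never cached.
 */
static bool model_switch_allows_p2pdma(const struct model_xprt_switch *xps)
{
	size_t i;

	assert(xps != NULL);
	if (xps->nr_xprts == 0)
		return false;
	for (i = 0; i < xps->nr_xprts; i++) {
		if (!atomic_load(&xps->xprts[i]->p2pdma_ok))
			return false;
	}
	return true;
}
```

In this model the direct I/O path would call model_switch_allows_p2pdma()
each time it schedules or retransmits a request, so a reconnect that
resolves to a different device is picked up on the next decision instead
of leaving a stale mount-wide flag behind.
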
--
Chuck Lever