public inbox for linux-rdma@vger.kernel.org
* Re: [PATCH] Make RPCRDMA_MAX_RECV_BATCH configurable.
       [not found] <20251113093720.20428-1-gaurav.gangalwar@gmail.com>
@ 2025-11-13 14:19 ` Chuck Lever
  2025-11-13 16:39   ` gaurav gangalwar
  0 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2025-11-13 14:19 UTC (permalink / raw)
  To: Gaurav Gangalwar
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, neilb,
	Jeff Layton, linux-rdma@vger.kernel.org

On 11/13/25 4:37 AM, Gaurav Gangalwar wrote:
> Bumped up rpcrdma_max_recv_batch to 64.
> Added a module parameter to change it; a higher value can be handy
> to avoid hangs.

[ Resend with correct NFSD reviewer email addresses and linux-rdma@ ]

Hi Gaurav -

Adding an administrative setting is generally a last resort. First,
we want a full root-cause analysis to understand the symptoms you
are trying to address. Do you have an RCA or a simple reproducer to
share with us?


> Signed-off-by: Gaurav Gangalwar <gaurav.gangalwar@gmail.com>
> ---
>  net/sunrpc/xprtrdma/frwr_ops.c           | 2 +-
>  net/sunrpc/xprtrdma/module.c             | 6 ++++++
>  net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +-
>  net/sunrpc/xprtrdma/verbs.c              | 2 +-
>  net/sunrpc/xprtrdma/xprt_rdma.h          | 4 +---
>  5 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index 31434aeb8e29..863a0c567915 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -246,7 +246,7 @@ int frwr_query_device(struct rpcrdma_ep *ep, const struct ib_device *device)
>  	ep->re_attr.cap.max_send_wr += 1; /* for ib_drain_sq */
>  	ep->re_attr.cap.max_recv_wr = ep->re_max_requests;
>  	ep->re_attr.cap.max_recv_wr += RPCRDMA_BACKWARD_WRS;
> -	ep->re_attr.cap.max_recv_wr += RPCRDMA_MAX_RECV_BATCH;
> +	ep->re_attr.cap.max_recv_wr += rpcrdma_max_recv_batch;
>  	ep->re_attr.cap.max_recv_wr += 1; /* for ib_drain_rq */
>  
>  	ep->re_max_rdma_segs =
> diff --git a/net/sunrpc/xprtrdma/module.c b/net/sunrpc/xprtrdma/module.c
> index 697f571d4c01..afeec5a68151 100644
> --- a/net/sunrpc/xprtrdma/module.c
> +++ b/net/sunrpc/xprtrdma/module.c
> @@ -27,6 +27,12 @@ MODULE_ALIAS("svcrdma");
>  MODULE_ALIAS("xprtrdma");
>  MODULE_ALIAS("rpcrdma6");
>  
> +unsigned int rpcrdma_max_recv_batch = 64;
> +module_param_named(max_recv_batch, rpcrdma_max_recv_batch, uint, 0644);
> +MODULE_PARM_DESC(max_recv_batch,
> +		 "Maximum number of Receive WRs to post in a batch "
> +		 "(default: 64, set to 0 to disable batching)");
> +
>  static void __exit rpc_rdma_cleanup(void)
>  {
>  	xprt_rdma_cleanup();
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 3d7f1413df02..32a9ceb18389 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -440,7 +440,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  	newxprt->sc_max_req_size = svcrdma_max_req_size;
>  	newxprt->sc_max_requests = svcrdma_max_requests;
>  	newxprt->sc_max_bc_requests = svcrdma_max_bc_requests;
> -	newxprt->sc_recv_batch = RPCRDMA_MAX_RECV_BATCH;
> +	newxprt->sc_recv_batch = rpcrdma_max_recv_batch;
>  	newxprt->sc_fc_credits = cpu_to_be32(newxprt->sc_max_requests);
>  
>  	/* Qualify the transport's resource defaults with the
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 63262ef0c2e3..7cd0a2c152e6 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -1359,7 +1359,7 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed)
>  	if (likely(ep->re_receive_count > needed))
>  		goto out;
>  	needed -= ep->re_receive_count;
> -	needed += RPCRDMA_MAX_RECV_BATCH;
> +	needed += rpcrdma_max_recv_batch;
>  
>  	if (atomic_inc_return(&ep->re_receiving) > 1)
>  		goto out;
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index 8147d2b41494..1051aa612f36 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -216,9 +216,7 @@ struct rpcrdma_rep {
>   *
>   * Setting this to zero disables Receive post batching.
>   */
> -enum {
> -	RPCRDMA_MAX_RECV_BATCH = 7,
> -};
> +extern unsigned int rpcrdma_max_recv_batch;
>  
>  /* struct rpcrdma_sendctx - DMA mapped SGEs to unmap after Send completes
>   */


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Make RPCRDMA_MAX_RECV_BATCH configurable.
  2025-11-13 14:19 ` [PATCH] Make RPCRDMA_MAX_RECV_BATCH configurable Chuck Lever
@ 2025-11-13 16:39   ` gaurav gangalwar
  2025-11-13 17:41     ` Chuck Lever
  0 siblings, 1 reply; 8+ messages in thread
From: gaurav gangalwar @ 2025-11-13 16:39 UTC (permalink / raw)
  To: Chuck Lever
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, neilb,
	Jeff Layton, linux-rdma@vger.kernel.org

On Thu, Nov 13, 2025 at 7:49 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On 11/13/25 4:37 AM, Gaurav Gangalwar wrote:
> > Bumped up rpcrdma_max_recv_batch to 64.
> > Added a module parameter to change it; a higher value can be handy
> > to avoid hangs.
>
> [ Resend with correct NFSD reviewer email addresses and linux-rdma@ ]
>
> Hi Gaurav -
>
> Adding an administrative setting is generally a last resort. First,
> we want a full root-cause analysis to understand the symptoms you
> are trying to address. Do you have an RCA or a simple reproducer to
> share with us?

The issue was found while testing an fio workload over RDMA.
Client: Ubuntu 24.04
Server: Ganesha NFS server
We have seen intermittent hangs on the client with a buffered IO
workload at large scale, with around 30 RDMA connections; the client
was under memory pressure.
The Ganesha log shows:

10/11/2025 16:39:12Z : ntnx-10-57-210-224-a-fsvm 1309416[none]
[0x7f49a6c3fe80] rpc :TIRPC :EVENT :rpc_rdma_cq_event_handler() cq
completion status: RNR retry counter exceeded (13) rdma_xprt state 5
opcode 2 cbc 0x7f4996688000 inline 1

This points to a lack of posted recv buffers on the client.
Once we increased rpcrdma_max_recv_batch to 64, the issue was resolved.
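
For anyone who wants to try the same workaround, the parameter added by
the patch could presumably be set like this (assuming the module name
rpcrdma and the max_recv_batch name the patch registers; this is a
configuration sketch, not tested on every distribution):

```shell
# Persistent: set at module load time via modprobe configuration.
echo "options rpcrdma max_recv_batch=64" > /etc/modprobe.d/rpcrdma.conf

# Runtime: the patch registers the parameter with mode 0644, so it
# should also be writable through sysfs. Note that max_recv_wr is
# sized from this value at connect time, so a larger batch may only
# take full effect on connections established afterwards.
echo 64 > /sys/module/rpcrdma/parameters/max_recv_batch
```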

>
>
> > Signed-off-by: Gaurav Gangalwar <gaurav.gangalwar@gmail.com>
> > ---
> >  net/sunrpc/xprtrdma/frwr_ops.c           | 2 +-
> >  net/sunrpc/xprtrdma/module.c             | 6 ++++++
> >  net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +-
> >  net/sunrpc/xprtrdma/verbs.c              | 2 +-
> >  net/sunrpc/xprtrdma/xprt_rdma.h          | 4 +---
> >  5 files changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> > index 31434aeb8e29..863a0c567915 100644
> > --- a/net/sunrpc/xprtrdma/frwr_ops.c
> > +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> > @@ -246,7 +246,7 @@ int frwr_query_device(struct rpcrdma_ep *ep, const struct ib_device *device)
> >       ep->re_attr.cap.max_send_wr += 1; /* for ib_drain_sq */
> >       ep->re_attr.cap.max_recv_wr = ep->re_max_requests;
> >       ep->re_attr.cap.max_recv_wr += RPCRDMA_BACKWARD_WRS;
> > -     ep->re_attr.cap.max_recv_wr += RPCRDMA_MAX_RECV_BATCH;
> > +     ep->re_attr.cap.max_recv_wr += rpcrdma_max_recv_batch;
> >       ep->re_attr.cap.max_recv_wr += 1; /* for ib_drain_rq */
> >
> >       ep->re_max_rdma_segs =
> > diff --git a/net/sunrpc/xprtrdma/module.c b/net/sunrpc/xprtrdma/module.c
> > index 697f571d4c01..afeec5a68151 100644
> > --- a/net/sunrpc/xprtrdma/module.c
> > +++ b/net/sunrpc/xprtrdma/module.c
> > @@ -27,6 +27,12 @@ MODULE_ALIAS("svcrdma");
> >  MODULE_ALIAS("xprtrdma");
> >  MODULE_ALIAS("rpcrdma6");
> >
> > +unsigned int rpcrdma_max_recv_batch = 64;
> > +module_param_named(max_recv_batch, rpcrdma_max_recv_batch, uint, 0644);
> > +MODULE_PARM_DESC(max_recv_batch,
> > +              "Maximum number of Receive WRs to post in a batch "
> > +              "(default: 64, set to 0 to disable batching)");
> > +
> >  static void __exit rpc_rdma_cleanup(void)
> >  {
> >       xprt_rdma_cleanup();
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > index 3d7f1413df02..32a9ceb18389 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > @@ -440,7 +440,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> >       newxprt->sc_max_req_size = svcrdma_max_req_size;
> >       newxprt->sc_max_requests = svcrdma_max_requests;
> >       newxprt->sc_max_bc_requests = svcrdma_max_bc_requests;
> > -     newxprt->sc_recv_batch = RPCRDMA_MAX_RECV_BATCH;
> > +     newxprt->sc_recv_batch = rpcrdma_max_recv_batch;
> >       newxprt->sc_fc_credits = cpu_to_be32(newxprt->sc_max_requests);
> >
> >       /* Qualify the transport's resource defaults with the
> > diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> > index 63262ef0c2e3..7cd0a2c152e6 100644
> > --- a/net/sunrpc/xprtrdma/verbs.c
> > +++ b/net/sunrpc/xprtrdma/verbs.c
> > @@ -1359,7 +1359,7 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed)
> >       if (likely(ep->re_receive_count > needed))
> >               goto out;
> >       needed -= ep->re_receive_count;
> > -     needed += RPCRDMA_MAX_RECV_BATCH;
> > +     needed += rpcrdma_max_recv_batch;
> >
> >       if (atomic_inc_return(&ep->re_receiving) > 1)
> >               goto out;
> > diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> > index 8147d2b41494..1051aa612f36 100644
> > --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> > +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> > @@ -216,9 +216,7 @@ struct rpcrdma_rep {
> >   *
> >   * Setting this to zero disables Receive post batching.
> >   */
> > -enum {
> > -     RPCRDMA_MAX_RECV_BATCH = 7,
> > -};
> > +extern unsigned int rpcrdma_max_recv_batch;
> >
> >  /* struct rpcrdma_sendctx - DMA mapped SGEs to unmap after Send completes
> >   */
>
>
> --
> Chuck Lever


* [PATCH] Make RPCRDMA_MAX_RECV_BATCH configurable.
@ 2025-11-13 16:46 Gaurav Gangalwar
  2025-11-13 17:42 ` Chuck Lever
  0 siblings, 1 reply; 8+ messages in thread
From: Gaurav Gangalwar @ 2025-11-13 16:46 UTC (permalink / raw)
  To: chuck.lever; +Cc: linux-nfs, linux-rdma, Gaurav Gangalwar

Bumped up rpcrdma_max_recv_batch to 64.
Added a module parameter to change it; a higher value can be handy
to avoid hangs.

Signed-off-by: Gaurav Gangalwar <gaurav.gangalwar@gmail.com>
---
 net/sunrpc/xprtrdma/frwr_ops.c           | 2 +-
 net/sunrpc/xprtrdma/module.c             | 6 ++++++
 net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +-
 net/sunrpc/xprtrdma/verbs.c              | 2 +-
 net/sunrpc/xprtrdma/xprt_rdma.h          | 4 +---
 5 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 31434aeb8e29..863a0c567915 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -246,7 +246,7 @@ int frwr_query_device(struct rpcrdma_ep *ep, const struct ib_device *device)
 	ep->re_attr.cap.max_send_wr += 1; /* for ib_drain_sq */
 	ep->re_attr.cap.max_recv_wr = ep->re_max_requests;
 	ep->re_attr.cap.max_recv_wr += RPCRDMA_BACKWARD_WRS;
-	ep->re_attr.cap.max_recv_wr += RPCRDMA_MAX_RECV_BATCH;
+	ep->re_attr.cap.max_recv_wr += rpcrdma_max_recv_batch;
 	ep->re_attr.cap.max_recv_wr += 1; /* for ib_drain_rq */
 
 	ep->re_max_rdma_segs =
diff --git a/net/sunrpc/xprtrdma/module.c b/net/sunrpc/xprtrdma/module.c
index 697f571d4c01..afeec5a68151 100644
--- a/net/sunrpc/xprtrdma/module.c
+++ b/net/sunrpc/xprtrdma/module.c
@@ -27,6 +27,12 @@ MODULE_ALIAS("svcrdma");
 MODULE_ALIAS("xprtrdma");
 MODULE_ALIAS("rpcrdma6");
 
+unsigned int rpcrdma_max_recv_batch = 64;
+module_param_named(max_recv_batch, rpcrdma_max_recv_batch, uint, 0644);
+MODULE_PARM_DESC(max_recv_batch,
+		 "Maximum number of Receive WRs to post in a batch "
+		 "(default: 64, set to 0 to disable batching)");
+
 static void __exit rpc_rdma_cleanup(void)
 {
 	xprt_rdma_cleanup();
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 3d7f1413df02..32a9ceb18389 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -440,7 +440,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	newxprt->sc_max_req_size = svcrdma_max_req_size;
 	newxprt->sc_max_requests = svcrdma_max_requests;
 	newxprt->sc_max_bc_requests = svcrdma_max_bc_requests;
-	newxprt->sc_recv_batch = RPCRDMA_MAX_RECV_BATCH;
+	newxprt->sc_recv_batch = rpcrdma_max_recv_batch;
 	newxprt->sc_fc_credits = cpu_to_be32(newxprt->sc_max_requests);
 
 	/* Qualify the transport's resource defaults with the
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 63262ef0c2e3..7cd0a2c152e6 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1359,7 +1359,7 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed)
 	if (likely(ep->re_receive_count > needed))
 		goto out;
 	needed -= ep->re_receive_count;
-	needed += RPCRDMA_MAX_RECV_BATCH;
+	needed += rpcrdma_max_recv_batch;
 
 	if (atomic_inc_return(&ep->re_receiving) > 1)
 		goto out;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 8147d2b41494..1051aa612f36 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -216,9 +216,7 @@ struct rpcrdma_rep {
  *
  * Setting this to zero disables Receive post batching.
  */
-enum {
-	RPCRDMA_MAX_RECV_BATCH = 7,
-};
+extern unsigned int rpcrdma_max_recv_batch;
 
 /* struct rpcrdma_sendctx - DMA mapped SGEs to unmap after Send completes
  */
-- 
2.43.7



* Re: [PATCH] Make RPCRDMA_MAX_RECV_BATCH configurable.
  2025-11-13 16:39   ` gaurav gangalwar
@ 2025-11-13 17:41     ` Chuck Lever
  2025-11-14  3:22       ` gaurav gangalwar
  2025-11-14 21:04       ` Tom Talpey
  0 siblings, 2 replies; 8+ messages in thread
From: Chuck Lever @ 2025-11-13 17:41 UTC (permalink / raw)
  To: gaurav gangalwar
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, neilb,
	Jeff Layton, linux-rdma@vger.kernel.org

On 11/13/25 11:39 AM, gaurav gangalwar wrote:
> On Thu, Nov 13, 2025 at 7:49 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>>
>> On 11/13/25 4:37 AM, Gaurav Gangalwar wrote:
>>> Bumped up rpcrdma_max_recv_batch to 64.
>>> Added a module parameter to change it; a higher value can be handy
>>> to avoid hangs.
>>
>> [ Resend with correct NFSD reviewer email addresses and linux-rdma@ ]
>>
>> Hi Gaurav -
>>
>> Adding an administrative setting is generally a last resort. First,
>> we want a full root-cause analysis to understand the symptoms you
>> are trying to address. Do you have an RCA or a simple reproducer to
>> share with us?
> 
> The issue was found while testing an fio workload over RDMA.
> Client: Ubuntu 24.04
> Server: Ganesha NFS server
> We have seen intermittent hangs on the client with a buffered IO
> workload at large scale, with around 30 RDMA connections; the client
> was under memory pressure.
> The Ganesha log shows:
> 
> 10/11/2025 16:39:12Z : ntnx-10-57-210-224-a-fsvm 1309416[none]
> [0x7f49a6c3fe80] rpc :TIRPC :EVENT :rpc_rdma_cq_event_handler() cq
> completion status: RNR retry counter exceeded (13) rdma_xprt state 5
> opcode 2 cbc 0x7f4996688000 inline 1
> 
> This points to a lack of posted recv buffers on the client.
> Once we increased rpcrdma_max_recv_batch to 64, the issue was resolved.

That still doesn't convince me that increasing the receive batch count
is a good fix, though it's certainly a workaround.

The client's RPC/RDMA code is supposed to track the number of Sends and
keep the correct number of Receives on the Receive Queue. The goal of
the implementation is to never encounter an RNR.

Therefore, if it's not doing that (and the RNR retries suggest that's
the case) there is an actual bug somewhere. The extra batch Receives are
an optimization, and should have no impact on correct operation.

If you can't reproduce this with the Linux NFS server, the place to
start looking for misbehavior is NFS/Ganesha, as it is the newer NFS
over RDMA implementation of the two servers. Maybe it's not handling
credit accounting correctly, or perhaps it's putting more Sends on
the wire than the credit limit allows.
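
To spell out the invariant: before each Send, the requester replenishes
its Receive Queue so that a posted Receive always exists for every Reply
still expected. A toy model of that accounting (all names here are
illustrative, not taken from net/sunrpc/xprtrdma; only the
needed-plus-batch arithmetic mirrors the code under discussion):

```c
#include <assert.h>

/* Toy model of the client-side accounting described above. */
struct ep_model {
	unsigned int posted_recvs;     /* Receives currently on the RQ */
	unsigned int expected_replies; /* Sends still awaiting a Reply */
	unsigned int batch;            /* spare Receives posted ahead */
};

/* Post enough Receives to cover all expected Replies plus a batch of
 * spares, mirroring the "needed += batch" logic in rpcrdma_post_recvs. */
static void model_post_recvs(struct ep_model *ep, unsigned int needed)
{
	if (ep->posted_recvs > needed)
		return;
	needed -= ep->posted_recvs;
	needed += ep->batch;
	ep->posted_recvs += needed;
}

/* A Send is only issued after Receives are replenished, so the peer's
 * Reply always finds a posted buffer and never triggers RNR. */
static void model_send(struct ep_model *ep)
{
	ep->expected_replies++;
	model_post_recvs(ep, ep->expected_replies);
	assert(ep->posted_recvs >= ep->expected_replies);
}

/* A Receive completion consumes one posted buffer. */
static void model_recv_done(struct ep_model *ep)
{
	ep->posted_recvs--;
	ep->expected_replies--;
}
```

In this model the invariant holds for any batch size, including the
current 7; a larger batch merely reduces how often replenishment runs.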


-- 
Chuck Lever


* Re: [PATCH] Make RPCRDMA_MAX_RECV_BATCH configurable.
  2025-11-13 16:46 Gaurav Gangalwar
@ 2025-11-13 17:42 ` Chuck Lever
  0 siblings, 0 replies; 8+ messages in thread
From: Chuck Lever @ 2025-11-13 17:42 UTC (permalink / raw)
  To: Gaurav Gangalwar; +Cc: linux-nfs, linux-rdma

On 11/13/25 11:46 AM, Gaurav Gangalwar wrote:
> Bumped up rpcrdma_max_recv_batch to 64.
> Added a module parameter to change it; a higher value can be handy
> to avoid hangs.

NAK until we have a full root cause analysis. Please explain why
this change helps.


> Signed-off-by: Gaurav Gangalwar <gaurav.gangalwar@gmail.com>
> ---
>  net/sunrpc/xprtrdma/frwr_ops.c           | 2 +-
>  net/sunrpc/xprtrdma/module.c             | 6 ++++++
>  net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +-
>  net/sunrpc/xprtrdma/verbs.c              | 2 +-
>  net/sunrpc/xprtrdma/xprt_rdma.h          | 4 +---
>  5 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index 31434aeb8e29..863a0c567915 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -246,7 +246,7 @@ int frwr_query_device(struct rpcrdma_ep *ep, const struct ib_device *device)
>  	ep->re_attr.cap.max_send_wr += 1; /* for ib_drain_sq */
>  	ep->re_attr.cap.max_recv_wr = ep->re_max_requests;
>  	ep->re_attr.cap.max_recv_wr += RPCRDMA_BACKWARD_WRS;
> -	ep->re_attr.cap.max_recv_wr += RPCRDMA_MAX_RECV_BATCH;
> +	ep->re_attr.cap.max_recv_wr += rpcrdma_max_recv_batch;
>  	ep->re_attr.cap.max_recv_wr += 1; /* for ib_drain_rq */
>  
>  	ep->re_max_rdma_segs =
> diff --git a/net/sunrpc/xprtrdma/module.c b/net/sunrpc/xprtrdma/module.c
> index 697f571d4c01..afeec5a68151 100644
> --- a/net/sunrpc/xprtrdma/module.c
> +++ b/net/sunrpc/xprtrdma/module.c
> @@ -27,6 +27,12 @@ MODULE_ALIAS("svcrdma");
>  MODULE_ALIAS("xprtrdma");
>  MODULE_ALIAS("rpcrdma6");
>  
> +unsigned int rpcrdma_max_recv_batch = 64;
> +module_param_named(max_recv_batch, rpcrdma_max_recv_batch, uint, 0644);
> +MODULE_PARM_DESC(max_recv_batch,
> +		 "Maximum number of Receive WRs to post in a batch "
> +		 "(default: 64, set to 0 to disable batching)");
> +
>  static void __exit rpc_rdma_cleanup(void)
>  {
>  	xprt_rdma_cleanup();
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 3d7f1413df02..32a9ceb18389 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -440,7 +440,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  	newxprt->sc_max_req_size = svcrdma_max_req_size;
>  	newxprt->sc_max_requests = svcrdma_max_requests;
>  	newxprt->sc_max_bc_requests = svcrdma_max_bc_requests;
> -	newxprt->sc_recv_batch = RPCRDMA_MAX_RECV_BATCH;
> +	newxprt->sc_recv_batch = rpcrdma_max_recv_batch;
>  	newxprt->sc_fc_credits = cpu_to_be32(newxprt->sc_max_requests);
>  
>  	/* Qualify the transport's resource defaults with the
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 63262ef0c2e3..7cd0a2c152e6 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -1359,7 +1359,7 @@ void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, int needed)
>  	if (likely(ep->re_receive_count > needed))
>  		goto out;
>  	needed -= ep->re_receive_count;
> -	needed += RPCRDMA_MAX_RECV_BATCH;
> +	needed += rpcrdma_max_recv_batch;
>  
>  	if (atomic_inc_return(&ep->re_receiving) > 1)
>  		goto out;
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index 8147d2b41494..1051aa612f36 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -216,9 +216,7 @@ struct rpcrdma_rep {
>   *
>   * Setting this to zero disables Receive post batching.
>   */
> -enum {
> -	RPCRDMA_MAX_RECV_BATCH = 7,
> -};
> +extern unsigned int rpcrdma_max_recv_batch;
>  
>  /* struct rpcrdma_sendctx - DMA mapped SGEs to unmap after Send completes
>   */


-- 
Chuck Lever


* Re: [PATCH] Make RPCRDMA_MAX_RECV_BATCH configurable.
  2025-11-13 17:41     ` Chuck Lever
@ 2025-11-14  3:22       ` gaurav gangalwar
  2025-11-14 21:04       ` Tom Talpey
  1 sibling, 0 replies; 8+ messages in thread
From: gaurav gangalwar @ 2025-11-14  3:22 UTC (permalink / raw)
  To: Chuck Lever
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, neilb,
	Jeff Layton, linux-rdma@vger.kernel.org

On Thu, Nov 13, 2025 at 11:11 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On 11/13/25 11:39 AM, gaurav gangalwar wrote:
> > On Thu, Nov 13, 2025 at 7:49 PM Chuck Lever <chuck.lever@oracle.com> wrote:
> >>
> >> On 11/13/25 4:37 AM, Gaurav Gangalwar wrote:
> >>> Bumped up rpcrdma_max_recv_batch to 64.
> >>> Added a module parameter to change it; a higher value can be handy
> >>> to avoid hangs.
> >>
> >> [ Resend with correct NFSD reviewer email addresses and linux-rdma@ ]
> >>
> >> Hi Gaurav -
> >>
> >> Adding an administrative setting is generally a last resort. First,
> >> we want a full root-cause analysis to understand the symptoms you
> >> are trying to address. Do you have an RCA or a simple reproducer to
> >> share with us?
> >
> > The issue was found while testing an fio workload over RDMA.
> > Client: Ubuntu 24.04
> > Server: Ganesha NFS server
> > We have seen intermittent hangs on the client with a buffered IO
> > workload at large scale, with around 30 RDMA connections; the client
> > was under memory pressure.
> > The Ganesha log shows:
> >
> > 10/11/2025 16:39:12Z : ntnx-10-57-210-224-a-fsvm 1309416[none]
> > [0x7f49a6c3fe80] rpc :TIRPC :EVENT :rpc_rdma_cq_event_handler() cq
> > completion status: RNR retry counter exceeded (13) rdma_xprt state 5
> > opcode 2 cbc 0x7f4996688000 inline 1
> >
> > This points to a lack of posted recv buffers on the client.
> > Once we increased rpcrdma_max_recv_batch to 64, the issue was resolved.
>
> That still doesn't convince me that increasing the receive batch count
> is a good fix, though it's certainly a workaround.
>
> The client's RPC/RDMA code is supposed to track the number of Sends and
> keep the correct number of Receives on the Receive Queue. The goal of
> the implementation is to never encounter an RNR.
>
> Therefore, if it's not doing that (and the RNR retries suggests that's
> the case) there is an actual bug somewhere. The extra batch Receives are
> an optimization, and should have no impact on correct operation.
>
> If you can't reproduce this with the Linux NFS server, the place to
> start looking for misbehavior is NFS/Ganesha, as it is the newer NFS
> over RDMA implementation of the two servers. Maybe it's not handling
> credit accounting correctly, or perhaps it's putting more Sends on
> the wire than the credit limit allows.
Sure, I will try to get more details.
The issue is specific to pNFS; for non-pNFS shares we don't see it.
Even for pNFS, we only see hangs when the number of DS connections is
high (around 30 DS connections).
But this workaround is definitely helping.
>
>
> --
> Chuck Lever


* Re: [PATCH] Make RPCRDMA_MAX_RECV_BATCH configurable.
  2025-11-13 17:41     ` Chuck Lever
  2025-11-14  3:22       ` gaurav gangalwar
@ 2025-11-14 21:04       ` Tom Talpey
  2025-11-28  5:48         ` gaurav gangalwar
  1 sibling, 1 reply; 8+ messages in thread
From: Tom Talpey @ 2025-11-14 21:04 UTC (permalink / raw)
  To: Chuck Lever, gaurav gangalwar
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, neilb, Jeff Layton,
	linux-rdma@vger.kernel.org

On 11/13/2025 12:41 PM, Chuck Lever wrote:
> On 11/13/25 11:39 AM, gaurav gangalwar wrote:
>> On Thu, Nov 13, 2025 at 7:49 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>>>
>>> On 11/13/25 4:37 AM, Gaurav Gangalwar wrote:
>>>> Bumped up rpcrdma_max_recv_batch to 64.
>>>> Added a module parameter to change it; a higher value can be handy
>>>> to avoid hangs.
>>>
>>> [ Resend with correct NFSD reviewer email addresses and linux-rdma@ ]
>>>
>>> Hi Gaurav -
>>>
>>> Adding an administrative setting is generally a last resort. First,
>>> we want a full root-cause analysis to understand the symptoms you
>>> are trying to address. Do you have an RCA or a simple reproducer to
>>> share with us?
>>
>> The issue was found while testing an fio workload over RDMA.
>> Client: Ubuntu 24.04
>> Server: Ganesha NFS server
>> We have seen intermittent hangs on the client with a buffered IO
>> workload at large scale, with around 30 RDMA connections; the client
>> was under memory pressure.
>> The Ganesha log shows:
>>
>> 10/11/2025 16:39:12Z : ntnx-10-57-210-224-a-fsvm 1309416[none]
>> [0x7f49a6c3fe80] rpc :TIRPC :EVENT :rpc_rdma_cq_event_handler() cq
>> completion status: RNR retry counter exceeded (13) rdma_xprt state 5
>> opcode 2 cbc 0x7f4996688000 inline 1
>>
>> This points to a lack of posted recv buffers on the client.
>> Once we increased rpcrdma_max_recv_batch to 64, the issue was resolved.
> 
> That still doesn't convince me that increasing the receive batch count
> is a good fix, though it's certainly a workaround.

It's not a workaround, this will fail on any RDMA provider that doesn't
perform RNR retry, for example iWARP. But more importantly, RNR retry is
unnecessary because the rpcrdma protocol implements a strict crediting
exchange. A proper rpcrdma implementation will never trigger RNR.

This is almost certainly an rpcrdma protocol violation in the sender,
which is failing to honor the credit limit granted by the receiving
peer and is overrunning the peer's receive queue. A wireshark trace
would prove it. Please do this research.
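
For clarity, the crediting rule works out to a simple gate on the
sender. The sketch below is a hypothetical model of that rule (not
TIRPC or kernel code): a requester may never have more messages in
flight than the credit value the peer last advertised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sender-side credit gate, modeling the RPC-over-RDMA
 * rule: at most `granted` unanswered messages may be outstanding,
 * where `granted` is the credit value from the peer's last reply. */
struct credit_model {
	unsigned int granted;   /* credits advertised by the peer */
	unsigned int in_flight; /* sends not yet answered */
};

static bool model_may_send(const struct credit_model *cm)
{
	return cm->in_flight < cm->granted;
}

static bool model_send(struct credit_model *cm)
{
	if (!model_may_send(cm))
		return false; /* must wait; never overrun the peer's RQ */
	cm->in_flight++;
	return true;
}

/* Each reply retires one in-flight send and carries a fresh grant;
 * the peer may raise or lower it. */
static void model_reply(struct credit_model *cm, unsigned int new_grant)
{
	cm->in_flight--;
	cm->granted = new_grant;
}
```

A sender that enforces this gate can never need RNR retry, because the
receiver sizes its Receive Queue to the credits it granted.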

Tom.


> 
> The client's RPC/RDMA code is supposed to track the number of Sends and
> keep the correct number of Receives on the Receive Queue. The goal of
> the implementation is to never encounter an RNR.
> 
> Therefore, if it's not doing that (and the RNR retries suggest that's
> the case) there is an actual bug somewhere. The extra batch Receives are
> an optimization, and should have no impact on correct operation.
> 
> If you can't reproduce this with the Linux NFS server, the place to
> start looking for misbehavior is NFS/Ganesha, as it is the newer NFS
> over RDMA implementation of the two servers. Maybe it's not handling
> credit accounting correctly, or perhaps it's putting more Sends on
> the wire than the credit limit allows.
> 
> 



* Re: [PATCH] Make RPCRDMA_MAX_RECV_BATCH configurable.
  2025-11-14 21:04       ` Tom Talpey
@ 2025-11-28  5:48         ` gaurav gangalwar
  0 siblings, 0 replies; 8+ messages in thread
From: gaurav gangalwar @ 2025-11-28  5:48 UTC (permalink / raw)
  To: Tom Talpey
  Cc: Chuck Lever, linux-nfs, Olga Kornievskaia, Dai Ngo, neilb,
	Jeff Layton, linux-rdma@vger.kernel.org

On Sat, Nov 15, 2025 at 2:34 AM Tom Talpey <tom@talpey.com> wrote:
>
> On 11/13/2025 12:41 PM, Chuck Lever wrote:
> > On 11/13/25 11:39 AM, gaurav gangalwar wrote:
> >> On Thu, Nov 13, 2025 at 7:49 PM Chuck Lever <chuck.lever@oracle.com> wrote:
> >>>
> >>> On 11/13/25 4:37 AM, Gaurav Gangalwar wrote:
> >>>> Bumped up rpcrdma_max_recv_batch to 64.
> >>>> Added a module parameter to change it; a higher value can be handy
> >>>> to avoid hangs.
> >>>
> >>> [ Resend with correct NFSD reviewer email addresses and linux-rdma@ ]
> >>>
> >>> Hi Gaurav -
> >>>
> >>> Adding an administrative setting is generally a last resort. First,
> >>> we want a full root-cause analysis to understand the symptoms you
> >>> are trying to address. Do you have an RCA or a simple reproducer to
> >>> share with us?
> >>
> >> The issue was found while testing an fio workload over RDMA.
> >> Client: Ubuntu 24.04
> >> Server: Ganesha NFS server
> >> We have seen intermittent hangs on the client with a buffered IO
> >> workload at large scale, with around 30 RDMA connections; the client
> >> was under memory pressure.
> >> The Ganesha log shows:
> >>
> >> 10/11/2025 16:39:12Z : ntnx-10-57-210-224-a-fsvm 1309416[none]
> >> [0x7f49a6c3fe80] rpc :TIRPC :EVENT :rpc_rdma_cq_event_handler() cq
> >> completion status: RNR retry counter exceeded (13) rdma_xprt state 5
> >> opcode 2 cbc 0x7f4996688000 inline 1
> >>
> >> This points to a lack of posted recv buffers on the client.
> >> Once we increased rpcrdma_max_recv_batch to 64, the issue was resolved.
> >
> > That still doesn't convince me that increasing the receive batch count
> > is a good fix, though it's certainly a workaround.
>
> It's not a workaround, this will fail on any RDMA provider that doesn't
> perform RNR retry, for example iWARP. But more importantly, RNR retry is
> unnecessary because the rpcrdma protocol implements a strict crediting
> exchange. A proper rpcrdma implementation will never trigger RNR.
>
> This is almost certainly an rpcrdma protocol violation in the sender,
> which is failing to honor the credit limit granted by the receiving
> peer and is overrunning the peer's receive queue. A wireshark trace
> would prove it. Please do this research.
>
> Tom.
>
>
> >
> > The client's RPC/RDMA code is supposed to track the number of Sends and
> > keep the correct number of Receives on the Receive Queue. The goal of
> > the implementation is to never encounter an RNR.
> >
> > Therefore, if it's not doing that (and the RNR retries suggest that's
> > the case) there is an actual bug somewhere. The extra batch Receives are
> > an optimization, and should have no impact on correct operation.
> >
> > If you can't reproduce this with the Linux NFS server, the place to
> > start looking for misbehavior is NFS/Ganesha, as it is the newer NFS
> > over RDMA implementation of the two servers. Maybe it's not handling
> > credit accounting correctly, or perhaps it's putting more Sends on
> > the wire than the credit limit allows.
> >
> >
>
Thanks for the review. I was going through the server implementation
in NFS Ganesha: we strictly honor read_chunks, write_chunks and
reply_chunks, so the credit limit should be client driven only.
The exception is callbacks from server to client; as of now there is
no credit check there. We advertise server_credits in the cb call,
similar to rb_bc_max_requests in the Linux NFS server, but we don't
check the client_credits granted by the client in its reply. We need
to limit callbacks to MIN(server_credits, client_credits).
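
That limit would amount to something like the sketch below (a
hypothetical helper, not actual Ganesha code; cb_limit and cb_may_send
are made-up names): the backchannel must respect whichever grant is
smaller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical backchannel gate: callbacks in flight must stay below
 * MIN(server_credits, client_credits), since either side's grant can
 * be the binding one. */
static unsigned int cb_limit(unsigned int server_credits,
			     unsigned int client_credits)
{
	return server_credits < client_credits ? server_credits
					       : client_credits;
}

static bool cb_may_send(unsigned int in_flight,
			unsigned int server_credits,
			unsigned int client_credits)
{
	return in_flight < cb_limit(server_credits, client_credits);
}
```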

Regards,
Gaurav

