From: Chuck Lever <chuck.lever@oracle.com>
To: gaurav gangalwar <gaurav.gangalwar@gmail.com>
Cc: linux-nfs@vger.kernel.org,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
neilb@brown.name, Jeff Layton <jlayton@kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH] Make RPCRDMA_MAX_RECV_BATCH configurable.
Date: Thu, 13 Nov 2025 12:41:21 -0500
Message-ID: <fc58b0f2-d00b-4e4e-a353-ffe43bec6c6e@oracle.com>
In-Reply-To: <CAJiE4O=zhEaJKQO7bBc8g9gXCiMoi7G7qSiVbQ5Cq+SwBK8OVw@mail.gmail.com>

On 11/13/25 11:39 AM, gaurav gangalwar wrote:
> On Thu, Nov 13, 2025 at 7:49 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>>
>> On 11/13/25 4:37 AM, Gaurav Gangalwar wrote:
>>> Bumped up rpcrdma_max_recv_batch to 64.
>>> Added param to change to it, it becomes handy to use higher value
>>> to avoid hung.
>>
>> [ Resend with correct NFSD reviewer email addresses and linux-rdma@ ]
>>
>> Hi Gaurav -
>>
>> Adding an administrative setting is generally a last resort. First,
>> we want a full root-cause analysis to understand the symptoms you
>> are trying to address. Do you have an RCA or a simple reproducer to
>> share with us?
>
> Issue found while testing fio workload over RDMA
> Client: Ubuntu 24.04
> Server: Ganesha NFS server
> We have seen intermittent hung on client with buffered IO workload at
> large scale with around 30 RDMA connections, client was under memory
> pressure.
> Ganesha log shows
>
> 10/11/2025 16:39:12Z : ntnx-10-57-210-224-a-fsvm 1309416[none]
> [0x7f49a6c3fe80] rpc :TIRPC :EVENT :rpc_rdma_cq_event_handler() cq
> completion status: RNR retry counter exceeded (13) rdma_xprt state 5
> opcode 2 cbc 0x7f4996688000 inline 1
>
> Which points to lack of posted recv buffers on client.
> Once we increased rpcrdma_max_recv_batch to 64, issue was resolved.

That still doesn't convince me that increasing the receive batch count
is a good fix, though it's certainly a workaround.

The client's RPC/RDMA code is supposed to track the number of Sends and
keep the correct number of Receives on the Receive Queue. The goal of
the implementation is to never encounter an RNR.

Therefore, if it's not doing that (and the RNR retries suggest that's
the case), there is an actual bug somewhere. The extra batch Receives
are an optimization, and should have no impact on correct operation.

If you can't reproduce this with the Linux NFS server, the place to
start looking for misbehavior is NFS/Ganesha, as it is the newer NFS
over RDMA implementation of the two servers. Maybe it's not handling
credit accounting correctly, or perhaps it's putting more Sends on
the wire than the credit limit allows.

--
Chuck Lever
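
Illustrative note, not part of the original message: the argument above
can be sketched with a toy model. This is not the kernel's xprtrdma code
and it does not describe what Ganesha actually does. ToyClient, run(),
and extra_per_call are hypothetical names, and the "one Receive per
outstanding Send, plus a batch" policy is a simplifying assumption. The
sketch only shows why a larger batch should not matter when the peer
sends one message per call, and how it could hide a peer that sends more.

```python
class ToyClient:
    """Toy model of a client Receive Queue (hypothetical, not xprtrdma)."""

    def __init__(self, batch):
        self.batch = batch      # extra Receives posted as an optimization
        self.posted = 0         # Receives currently on the Receive Queue
        self.in_flight = 0      # Sends still waiting for a reply
        self.rnr_events = 0     # messages that arrived with no Receive posted

    def send_call(self):
        # Before a Send goes out, make sure at least one Receive is
        # posted for every reply that is still expected.
        self.in_flight += 1
        shortfall = self.in_flight - self.posted
        if shortfall > 0:
            self.posted += shortfall + self.batch

    def message_arrives(self, is_reply=True):
        # Every incoming message consumes one posted Receive.
        if self.posted == 0:
            self.rnr_events += 1    # the sender would see an RNR here
            return
        self.posted -= 1
        if is_reply:
            self.in_flight -= 1


def run(batch, calls, extra_per_call):
    """Issue `calls` calls; the peer answers each one and also sends
    `extra_per_call` messages that the client did not account for."""
    client = ToyClient(batch)
    for _ in range(calls):
        client.send_call()
        client.message_arrives(is_reply=True)
        for _ in range(extra_per_call):
            client.message_arrives(is_reply=False)
    return client.rnr_events


if __name__ == "__main__":
    for batch in (0, 7):
        for extra in (0, 1):
            print(batch, extra, run(batch, calls=100, extra_per_call=extra))
```

In this model a well-behaved peer (extra_per_call=0) never triggers an
RNR, whatever the batch size. With one unaccounted message per call, a
batch of 0 produces an RNR on every call, while a batch of 7 absorbs the
extras and reports none. That is the sense in which a larger batch acts
as a workaround rather than a fix in this simplified setting.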