From: Jeff Layton <jlayton@kernel.org>
To: NeilBrown <neil@brown.name>
Cc: Chuck Lever <chuck.lever@oracle.com>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
Trond Myklebust <trondmy@kernel.org>,
Anna Schumaker <anna@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
David Howells <dhowells@redhat.com>,
Brandon Adams <brandona@meta.com>,
linux-nfs@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/2] sunrpc: add a slot to rqstp->rq_bvec for TCP record marker
Date: Fri, 10 Oct 2025 06:25:16 -0400 [thread overview]
Message-ID: <8cbef9511c8b70dfcf7cdaa9a620f931ab170faa.camel@kernel.org> (raw)
In-Reply-To: <176005502018.1793333.5043420085151021396@noble.neil.brown.name>
On Fri, 2025-10-10 at 11:10 +1100, NeilBrown wrote:
> On Thu, 09 Oct 2025, Jeff Layton wrote:
> > On Thu, 2025-10-09 at 08:51 +1100, NeilBrown wrote:
> > > On Thu, 09 Oct 2025, Jeff Layton wrote:
> > > > We've seen some occurrences of messages like this in dmesg on some knfsd
> > > > servers:
> > > >
> > > > xdr_buf_to_bvec: bio_vec array overflow
> > > >
> > > > Usually followed by messages like this that indicate a short send (note
> > > > that this message is from an older kernel and the amount that it reports
> > > > attempting to send is short by 4 bytes):
> > > >
> > > > rpc-srv/tcp: nfsd: sent 1048155 when sending 1048152 bytes - shutting down socket
> > > >
> > > > svc_tcp_sendmsg() steals a slot in the rq_bvec array for the TCP record
> > > > marker. If the send is an unaligned READ call though, then there may not
> > > > be enough slots in the rq_bvec array in some cases.
> > > >
> > > > Add a slot to the rq_bvec array, and fix up the array lengths in the
> > > > callers that care.
> > > >
> > > > Fixes: e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call")
> > > > Tested-by: Brandon Adams <brandona@meta.com>
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > > fs/nfsd/vfs.c | 6 +++---
> > > > net/sunrpc/svc.c | 3 ++-
> > > > net/sunrpc/svcsock.c | 4 ++--
> > > > 3 files changed, 7 insertions(+), 6 deletions(-)
> > >
> > > I can't say that I'm liking this patch.
> > >
> > > There are 11 place where (in nfsd-testing recently) where
> > > rq_maxpages is used (as opposed to declared or assigned).
> > >
> > > 3 in nfsd/vfs.c
> > > 4 in sunrpc/svc.c
> > > 1 in sunrpc/svc_xprt.c
> > > 2 in sunrpc/svcsock.c
> > > 1 in xprtrdma/svc_rdma_rc.c
> > >
> > > Your patch changes six of those to add 1. I guess the others aren't
> > > "callers that care". It would help to have it clearly stated why, or
> > > why not, a caller might care.
> > >
> > > But also, what does "rq_maxpages" even mean now?
> > > The comment in svc.h still says "num of entries in rq_pages"
> > > which is certainly no longer the case.
> > > But if it was the case, we should have called it "rq_numpages"
> > > or similar.
> > > But maybe it wasn't meant to be the number of pages in the array,
> > > maybe it was meant to be the maximum number of pages is a request
> > > or a reply.....
> > > No - that is sv_max_mesg, to which we add 2 and 1.
> > > So I could ask "why not just add another 1 in svc_serv_maxpages()?"
> > > Would the callers that might not care be harmed if rq_maxpages were
> > > one larger than it is?
> > >
> > > It seems to me that rq_maxpages is rather confused and the bug you have
> > > found which requires this patch is some evidence to that confusion. We
> > > should fix the confusion, not just the bug.
> > >
> > > So simple question to cut through my waffle:
> > > Would this:
> > > - return DIV_ROUND_UP(serv->sv_max_mesg, PAGE_SIZE) + 2 + 1;
> > > + return DIV_ROUND_UP(serv->sv_max_mesg, PAGE_SIZE) + 2 + 1 + 1;
> > >
> > > fix the problem. If not, why not? If so, can we just do this?
> > > then look at renaming rq_maxpages to rq_numpages and audit all the uses
> > > (and maybe you have already audited...).
> > >
> >
> > I get the objection. I'm not crazy about all of the adjustments either.
> >
> > rq_maxpages is used to size two fields in the rqstp: rq_pages and
> > rq_bvec. It turns out that they both want rq_maxpages + 1 slots. The
> > rq_pages array needs the extra slot for a NULL terminator, and rq_bvec
> > needs it for the TCP record marker.
>
> Somehow the above para helped a lot for me to understand what the issue
> is here - thanks.
>
> rq_bvec is used for two quite separate purposes.
>
> nfsd/vfs.c uses it to assemble read/write requests to send to the
> filesystem.
> sunrpc/svcsock.c uses to assemble send/recv requests to send to the
> network.
>
> It might help me if this were documented clearly in svc.h as I seem to
> have had to discover several times now :-(
>
> Should these even use the same rq_bvec? I guess it makes sense to share
> but we should be cautious about letting the needs of one side infect the
> code of the other side.
>
> So if we increase the size of rq_bvec to meet the needs of svcsock.c, do
> we need to make *any* code changes to vfs.c? I doubt it.
>
> It bothers me a little bit that svc_tcp_sendmsg() needs to allocate a
> frag. But given that it does, could it also allocate a larger bvec if
> rq_bvec isn't big enough?
>
> Or should svc_tcp_recvfrom() allocate the frag and make sure the bvec is
> big enough ......
> Or svc_alloc_arg() could check with each active transport for any
> preallocation requirements...
> Or svc_create_socket() could update some "bvec_size" field in svc_serv
> which svc_alloc_arg() could check an possibly realloc rq_bvec.
>
> I'm rambling a bit here. I agree with Chuck (and you) that it would be
> nice if this need for a larger bvec were kept local to svcsock code if
> possible.
>
> But I'm fairly confident that the current problem doesn't justify any
> changes to vfs.c. svc.c probably needs to somehow be involved in
> rq_bvec being bigger and svcsock.c certainly needs to be able to make
> use of the extra space, but that seems to be all that is required.
>
I sent a v3 patch which adds a separate rq_bvec_len field and uses that
in the places where the code is iterating over the rq_bvec. That does
change places in vfs.c, but I think it makes the code clearer. Are you
OK with that version?
Thanks,
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2025-10-10 10:25 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-08 20:23 [PATCH v2 0/2] sunrpc: fix handling of rq_bvec array in svc_rqst Jeff Layton
2025-10-08 20:23 ` [PATCH v2 1/2] sunrpc: account for TCP record marker in rq_bvec array when sending Jeff Layton
2025-10-08 21:32 ` NeilBrown
2025-10-08 20:23 ` [PATCH v2 2/2] sunrpc: add a slot to rqstp->rq_bvec for TCP record marker Jeff Layton
2025-10-08 21:51 ` NeilBrown
2025-10-09 0:08 ` Mike Snitzer
2025-10-09 12:22 ` Jeff Layton
2025-10-10 0:10 ` NeilBrown
2025-10-10 10:25 ` Jeff Layton [this message]
2025-10-10 10:47 ` NeilBrown
2025-10-10 12:47 ` Chuck Lever
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8cbef9511c8b70dfcf7cdaa9a620f931ab170faa.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=Dai.Ngo@oracle.com \
--cc=anna@kernel.org \
--cc=brandona@meta.com \
--cc=chuck.lever@oracle.com \
--cc=davem@davemloft.net \
--cc=dhowells@redhat.com \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
--cc=netdev@vger.kernel.org \
--cc=okorniev@redhat.com \
--cc=pabeni@redhat.com \
--cc=tom@talpey.com \
--cc=trondmy@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.