* [PATCH v3 0/2] sunrpc: fix handling of rq_bvec array in svc_rqst
@ 2025-10-09 14:40 Jeff Layton
From: Jeff Layton @ 2025-10-09 14:40 UTC
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Howells
  Cc: Brandon Adams, linux-nfs, netdev, linux-kernel, Jeff Layton

This version of the series changes only the second patch: it now uses a
separate rq_bvec_len field instead of rq_maxpages in the places that
iterate over rq_bvec.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Changes in v3:
- Add rq_bvec_len field and use it in appropriate places
- Link to v2: https://lore.kernel.org/r/20251008-rq_bvec-v2-0-823c0a85a27c@kernel.org

Changes in v2:
- Better changelog message for patch #2
- Link to v1: https://lore.kernel.org/r/20251008-rq_bvec-v1-0-7f23d32d75e5@kernel.org

---
Jeff Layton (2):
      sunrpc: account for TCP record marker in rq_bvec array when sending
      sunrpc: add a slot to rqstp->rq_bvec for TCP record marker

 fs/nfsd/vfs.c              | 6 +++---
 include/linux/sunrpc/svc.h | 1 +
 net/sunrpc/svc.c           | 4 +++-
 net/sunrpc/svcsock.c       | 4 ++--
 4 files changed, 9 insertions(+), 6 deletions(-)
---
base-commit: 177818f176ef904fb18d237d1dbba00c2643aaf2
change-id: 20251008-rq_bvec-b66afd0fdbbb

Best regards,
-- 
Jeff Layton <jlayton@kernel.org>



* [PATCH v3 1/2] sunrpc: account for TCP record marker in rq_bvec array when sending
@ 2025-10-09 14:40 ` Jeff Layton
From: Jeff Layton @ 2025-10-09 14:40 UTC
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Howells
  Cc: Brandon Adams, linux-nfs, netdev, linux-kernel, Jeff Layton

The call to xdr_buf_to_bvec() in svc_tcp_sendmsg() passes the second
slot of the bvec array as the starting slot, but doesn't decrease the
length of the array by one to account for it.
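
To see the arithmetic, here is a minimal userspace sketch of the slot
accounting (illustrative stand-ins only, not the kernel's types or
xdr_buf_to_bvec() itself):

#include <stdio.h>
#include <stddef.h>

/* Stand-in for struct bio_vec: just a base pointer and a length. */
struct bvec {
        void *base;
        size_t len;
};

/*
 * Stand-in for xdr_buf_to_bvec(): fills up to nslots entries starting
 * at bv and returns the number used.  The caller must pass the space
 * that actually remains in the array.
 */
static size_t fill_payload(struct bvec *bv, size_t nslots, size_t want)
{
        size_t i, n = want < nslots ? want : nslots;

        for (i = 0; i < n; i++)
                bv[i].len = 4096;       /* pretend each entry is one page */
        return n;
}

int main(void)
{
        enum { RQ_MAXPAGES = 8 };
        struct bvec rq_bvec[RQ_MAXPAGES] = { { NULL, 4 } }; /* slot 0: marker */
        size_t used;

        /*
         * The buggy bound, RQ_MAXPAGES, would let fill_payload() touch
         * rq_bvec[1 + RQ_MAXPAGES - 1], one entry past the end of the
         * array.  The fixed bound passes only the slots that remain.
         */
        used = fill_payload(rq_bvec + 1, RQ_MAXPAGES - 1, RQ_MAXPAGES);
        printf("payload entries used: %zu of %d available\n",
               used, RQ_MAXPAGES - 1);
        return 0;
}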

Fixes: 59cf7346542b ("sunrpc: Replace the rq_bvec array with dynamically-allocated memory")
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 net/sunrpc/svcsock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 7b90abc5cf0ee1520796b2f38fcb977417009830..377fcaaaa061463fc5c85fc09c7a8eab5e06af77 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1244,7 +1244,7 @@ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
 	memcpy(buf, &marker, sizeof(marker));
 	bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker));
 
-	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages,
+	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages - 1,
 				&rqstp->rq_res);
 
 	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,

-- 
2.51.0



* [PATCH v3 2/2] sunrpc: add a slot to rqstp->rq_bvec for TCP record marker
@ 2025-10-09 14:40 ` Jeff Layton
From: Jeff Layton @ 2025-10-09 14:40 UTC
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Howells
  Cc: Brandon Adams, linux-nfs, netdev, linux-kernel, Jeff Layton

We've seen some occurrences of messages like this in dmesg on some knfsd
servers:

    xdr_buf_to_bvec: bio_vec array overflow

Usually followed by messages like this that indicate a short send (note
that this message is from an older kernel and the amount that it reports
attempting to send is short by 4 bytes):

    rpc-srv/tcp: nfsd: sent 1048155 when sending 1048152 bytes - shutting down socket

svc_tcp_sendmsg() steals a slot in the rq_bvec array for the TCP record
marker. If the send is an unaligned READ call though, then there may not
be enough slots in the rq_bvec array in some cases.

Add a rqstp->rq_bvec_len field and use that to keep track of the length
of rq_bvec. Use that in place of rq_maxpages where it's iterating over
the bvec.
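
As a minimal model of the shape of the fix (illustrative userspace C
with stand-in struct names, not the kernel code):

#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for struct bio_vec and struct svc_rqst. */
struct bvec {
        void *base;
        size_t len;
};

struct rqst {
        size_t rq_maxpages;     /* pages needed for the largest payload */
        struct bvec *rq_bvec;
        size_t rq_bvec_len;     /* number of entries in rq_bvec */
};

static int rqst_alloc_bvec(struct rqst *rqstp)
{
        /*
         * One extra slot so the TCP record marker no longer steals a
         * payload slot; every iteration bound then uses rq_bvec_len
         * rather than rq_maxpages.
         */
        rqstp->rq_bvec_len = rqstp->rq_maxpages + 1;
        rqstp->rq_bvec = calloc(rqstp->rq_bvec_len, sizeof(*rqstp->rq_bvec));
        return rqstp->rq_bvec ? 0 : -1;
}

int main(void)
{
        struct rqst r = { .rq_maxpages = 258 };

        if (rqst_alloc_bvec(&r))
                return 1;
        printf("rq_bvec_len = %zu\n", r.rq_bvec_len);
        free(r.rq_bvec);
        return 0;
}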

Fixes: e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call")
Tested-by: Brandon Adams <brandona@meta.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/vfs.c              | 6 +++---
 include/linux/sunrpc/svc.h | 1 +
 net/sunrpc/svc.c           | 4 +++-
 net/sunrpc/svcsock.c       | 4 ++--
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 77f6879c2e063fa79865100bbc2d1e64eb332f42..6c7224570d2dadae21876e0069e0b2e0551af0fa 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1111,7 +1111,7 @@ nfsd_direct_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	v = 0;
 	total = dio_end - dio_start;
-	while (total && v < rqstp->rq_maxpages &&
+	while (total && v < rqstp->rq_bvec_len &&
 	       rqstp->rq_next_page < rqstp->rq_page_end) {
 		len = min_t(size_t, total, PAGE_SIZE);
 		bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page,
@@ -1200,7 +1200,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	v = 0;
 	total = *count;
-	while (total && v < rqstp->rq_maxpages &&
+	while (total && v < rqstp->rq_bvec_len &&
 	       rqstp->rq_next_page < rqstp->rq_page_end) {
 		len = min_t(size_t, total, PAGE_SIZE - base);
 		bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page,
@@ -1318,7 +1318,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (stable && !fhp->fh_use_wgather)
 		kiocb.ki_flags |= IOCB_DSYNC;
 
-	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
+	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_bvec_len, payload);
 	iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
 	since = READ_ONCE(file->f_wb_err);
 	if (verf)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 5506d20857c318774cd223272d4b0022cc19ffb8..0ee1f411860e55d5e0131c29766540f673193d5f 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -206,6 +206,7 @@ struct svc_rqst {
 
 	struct folio_batch	rq_fbatch;
 	struct bio_vec		*rq_bvec;
+	u32			rq_bvec_len;
 
 	__be32			rq_xid;		/* transmission id */
 	u32			rq_prog;	/* program number */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 4704dce7284eccc9e2bc64cf22947666facfa86a..a6bdd83fba77b13f973da66a1bac00050ae922fe 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -706,7 +706,9 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
 	if (!svc_init_buffer(rqstp, serv, node))
 		goto out_enomem;
 
-	rqstp->rq_bvec = kcalloc_node(rqstp->rq_maxpages,
+	/* +1 for the TCP record marker */
+	rqstp->rq_bvec_len = rqstp->rq_maxpages + 1;
+	rqstp->rq_bvec = kcalloc_node(rqstp->rq_bvec_len,
 				      sizeof(struct bio_vec),
 				      GFP_KERNEL, node);
 	if (!rqstp->rq_bvec)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 377fcaaaa061463fc5c85fc09c7a8eab5e06af77..2075ddec250b3fdb36becca4a53f1c0536f8634a 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -740,7 +740,7 @@ static int svc_udp_sendto(struct svc_rqst *rqstp)
 	if (svc_xprt_is_dead(xprt))
 		goto out_notconn;
 
-	count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, xdr);
+	count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_bvec_len, xdr);
 
 	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
 		      count, rqstp->rq_res.len);
@@ -1244,7 +1244,7 @@ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
 	memcpy(buf, &marker, sizeof(marker));
 	bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker));
 
-	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages - 1,
+	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_bvec_len - 1,
 				&rqstp->rq_res);
 
 	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,

-- 
2.51.0



* Re: [PATCH v3 2/2] sunrpc: add a slot to rqstp->rq_bvec for TCP record marker
@ 2025-10-09 15:03 ` Chuck Lever
From: Chuck Lever @ 2025-10-09 15:03 UTC
  To: Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Howells
  Cc: Brandon Adams, linux-nfs, netdev, linux-kernel

On 10/9/25 10:40 AM, Jeff Layton wrote:
> We've seen some occurrences of messages like this in dmesg on some knfsd
> servers:
> 
>     xdr_buf_to_bvec: bio_vec array overflow
> 
> Usually followed by messages like this that indicate a short send (note
> that this message is from an older kernel and the amount that it reports
> attempting to send is short by 4 bytes):
> 
>     rpc-srv/tcp: nfsd: sent 1048155 when sending 1048152 bytes - shutting down socket
> 
> svc_tcp_sendmsg() steals a slot in the rq_bvec array for the TCP record
> marker. If the send is an unaligned READ call though, then there may not
> be enough slots in the rq_bvec array in some cases.
> 
> Add a rqstp->rq_bvec_len field and use that to keep track of the length
> of rq_bvec. Use that in place of rq_maxpages where it's iterating over
> the bvec.

Granted, the number of items in rq_pages and in rq_bvec doesn't have
to coincide; they just happen to be the same, historically. And each
bvec in rq_bvec doesn't necessarily have to be a page.
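
The diffs in this thread already show both kinds of entry; side by
side, as a kernel-context fragment (not compilable on its own):

/*
 * Both helpers come from <linux/bvec.h>.  Slot 0 wraps 4 bytes of
 * kernel virtual memory for the record marker; payload slots wrap
 * whole or partial pages.
 */
bvec_set_virt(&rqstp->rq_bvec[0], buf, sizeof(__be32));
bvec_set_page(&rqstp->rq_bvec[1], *rqstp->rq_next_page, PAGE_SIZE, 0);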


> Fixes: e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call")
> Tested-by: Brandon Adams <brandona@meta.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/vfs.c              | 6 +++---
>  include/linux/sunrpc/svc.h | 1 +
>  net/sunrpc/svc.c           | 4 +++-
>  net/sunrpc/svcsock.c       | 4 ++--
>  4 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 77f6879c2e063fa79865100bbc2d1e64eb332f42..6c7224570d2dadae21876e0069e0b2e0551af0fa 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1111,7 +1111,7 @@ nfsd_direct_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  
>  	v = 0;
>  	total = dio_end - dio_start;
> -	while (total && v < rqstp->rq_maxpages &&
> +	while (total && v < rqstp->rq_bvec_len &&
>  	       rqstp->rq_next_page < rqstp->rq_page_end) {
>  		len = min_t(size_t, total, PAGE_SIZE);
>  		bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page,
> @@ -1200,7 +1200,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  
>  	v = 0;
>  	total = *count;
> -	while (total && v < rqstp->rq_maxpages &&
> +	while (total && v < rqstp->rq_bvec_len &&
>  	       rqstp->rq_next_page < rqstp->rq_page_end) {
>  		len = min_t(size_t, total, PAGE_SIZE - base);
>  		bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page,
> @@ -1318,7 +1318,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (stable && !fhp->fh_use_wgather)
>  		kiocb.ki_flags |= IOCB_DSYNC;
>  
> -	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> +	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_bvec_len, payload);
>  	iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
>  	since = READ_ONCE(file->f_wb_err);
>  	if (verf)
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 5506d20857c318774cd223272d4b0022cc19ffb8..0ee1f411860e55d5e0131c29766540f673193d5f 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -206,6 +206,7 @@ struct svc_rqst {
>  
>  	struct folio_batch	rq_fbatch;
>  	struct bio_vec		*rq_bvec;
> +	u32			rq_bvec_len;
>  
>  	__be32			rq_xid;		/* transmission id */
>  	u32			rq_prog;	/* program number */
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 4704dce7284eccc9e2bc64cf22947666facfa86a..a6bdd83fba77b13f973da66a1bac00050ae922fe 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -706,7 +706,9 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
>  	if (!svc_init_buffer(rqstp, serv, node))
>  		goto out_enomem;
>  
> -	rqstp->rq_bvec = kcalloc_node(rqstp->rq_maxpages,
> +	/* +1 for the TCP record marker */
> +	rqstp->rq_bvec_len = rqstp->rq_maxpages + 1;

What bugs me about this is that svc_prepare_thread() shouldn't have
specific knowledge about the needs of transports. But I don't have a
better idea...
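
The obvious transport-agnostic alternative, a transport-declared
count, would mean new plumbing for a single small number; sketched
purely hypothetically (no xcl_extra_bvecs field exists upstream):

/*
 * Hypothetical sketch only: each transport class would declare how
 * many extra bvec slots it needs, and the generic code would size
 * rq_bvec from the maximum over the registered transport classes,
 * without ever naming TCP.
 */
struct svc_xprt_class {
        /* ...existing fields... */
        unsigned int xcl_extra_bvecs;   /* e.g. 1 for TCP's record marker */
};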


> +	rqstp->rq_bvec = kcalloc_node(rqstp->rq_bvec_len,
>  				      sizeof(struct bio_vec),
>  				      GFP_KERNEL, node);
>  	if (!rqstp->rq_bvec)
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 377fcaaaa061463fc5c85fc09c7a8eab5e06af77..2075ddec250b3fdb36becca4a53f1c0536f8634a 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -740,7 +740,7 @@ static int svc_udp_sendto(struct svc_rqst *rqstp)
>  	if (svc_xprt_is_dead(xprt))
>  		goto out_notconn;
>  
> -	count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, xdr);
> +	count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_bvec_len, xdr);
>  
>  	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
>  		      count, rqstp->rq_res.len);
> @@ -1244,7 +1244,7 @@ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
>  	memcpy(buf, &marker, sizeof(marker));
>  	bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker));
>  
> -	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages - 1,
> +	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_bvec_len - 1,
>  				&rqstp->rq_res);
>  
>  	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
> 


-- 
Chuck Lever


* Re: [PATCH v3 2/2] sunrpc: add a slot to rqstp->rq_bvec for TCP record marker
@ 2025-10-09 15:07 ` Jeff Layton
From: Jeff Layton @ 2025-10-09 15:07 UTC
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Howells
  Cc: Brandon Adams, linux-nfs, netdev, linux-kernel

On Thu, 2025-10-09 at 11:03 -0400, Chuck Lever wrote:
> On 10/9/25 10:40 AM, Jeff Layton wrote:
> > We've seen some occurrences of messages like this in dmesg on some knfsd
> > servers:
> > 
> >     xdr_buf_to_bvec: bio_vec array overflow
> > 
> > Usually followed by messages like this that indicate a short send (note
> > that this message is from an older kernel and the amount that it reports
> > attempting to send is short by 4 bytes):
> > 
> >     rpc-srv/tcp: nfsd: sent 1048155 when sending 1048152 bytes - shutting down socket
> > 
> > svc_tcp_sendmsg() steals a slot in the rq_bvec array for the TCP record
> > marker. If the send is an unaligned READ call though, then there may not
> > be enough slots in the rq_bvec array in some cases.
> > 
> > Add a rqstp->rq_bvec_len field and use that to keep track of the length
> > of rq_bvec. Use that in place of rq_maxpages where it's iterating over
> > the bvec.
> 
> Granted that the number of items in rq_pages and in rq_bvec don't have
> to coincide, they just happen to be the same, historically. And, each
> bvec in rq_bvec doesn't necessarily have to be a page.
> 
> 
> > Fixes: e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call")
> > Tested-by: Brandon Adams <brandona@meta.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/nfsd/vfs.c              | 6 +++---
> >  include/linux/sunrpc/svc.h | 1 +
> >  net/sunrpc/svc.c           | 4 +++-
> >  net/sunrpc/svcsock.c       | 4 ++--
> >  4 files changed, 9 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 77f6879c2e063fa79865100bbc2d1e64eb332f42..6c7224570d2dadae21876e0069e0b2e0551af0fa 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1111,7 +1111,7 @@ nfsd_direct_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  
> >  	v = 0;
> >  	total = dio_end - dio_start;
> > -	while (total && v < rqstp->rq_maxpages &&
> > +	while (total && v < rqstp->rq_bvec_len &&
> >  	       rqstp->rq_next_page < rqstp->rq_page_end) {
> >  		len = min_t(size_t, total, PAGE_SIZE);
> >  		bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page,
> > @@ -1200,7 +1200,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  
> >  	v = 0;
> >  	total = *count;
> > -	while (total && v < rqstp->rq_maxpages &&
> > +	while (total && v < rqstp->rq_bvec_len &&
> >  	       rqstp->rq_next_page < rqstp->rq_page_end) {
> >  		len = min_t(size_t, total, PAGE_SIZE - base);
> >  		bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page,
> > @@ -1318,7 +1318,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	if (stable && !fhp->fh_use_wgather)
> >  		kiocb.ki_flags |= IOCB_DSYNC;
> >  
> > -	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> > +	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_bvec_len, payload);
> >  	iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
> >  	since = READ_ONCE(file->f_wb_err);
> >  	if (verf)
> > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> > index 5506d20857c318774cd223272d4b0022cc19ffb8..0ee1f411860e55d5e0131c29766540f673193d5f 100644
> > --- a/include/linux/sunrpc/svc.h
> > +++ b/include/linux/sunrpc/svc.h
> > @@ -206,6 +206,7 @@ struct svc_rqst {
> >  
> >  	struct folio_batch	rq_fbatch;
> >  	struct bio_vec		*rq_bvec;
> > +	u32			rq_bvec_len;
> >  
> >  	__be32			rq_xid;		/* transmission id */
> >  	u32			rq_prog;	/* program number */
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index 4704dce7284eccc9e2bc64cf22947666facfa86a..a6bdd83fba77b13f973da66a1bac00050ae922fe 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -706,7 +706,9 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
> >  	if (!svc_init_buffer(rqstp, serv, node))
> >  		goto out_enomem;
> >  
> > -	rqstp->rq_bvec = kcalloc_node(rqstp->rq_maxpages,
> > +	/* +1 for the TCP record marker */
> > +	rqstp->rq_bvec_len = rqstp->rq_maxpages + 1;
> 
> What bugs me about this is that svc_prepare_thread() shouldn't have
> specific knowledge about the needs of transports. But I don't have a
> better idea...
> 

Yeah, it's a minor layering violation. I guess I could phrase it as:

    /* Some transports need an extra bvec. (e.g. TCP needs it for the record marker) */

...but I'm not sure that's any clearer.

> 
> > +	rqstp->rq_bvec = kcalloc_node(rqstp->rq_bvec_len,
> >  				      sizeof(struct bio_vec),
> >  				      GFP_KERNEL, node);
> >  	if (!rqstp->rq_bvec)
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 377fcaaaa061463fc5c85fc09c7a8eab5e06af77..2075ddec250b3fdb36becca4a53f1c0536f8634a 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -740,7 +740,7 @@ static int svc_udp_sendto(struct svc_rqst *rqstp)
> >  	if (svc_xprt_is_dead(xprt))
> >  		goto out_notconn;
> >  
> > -	count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, xdr);
> > +	count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_bvec_len, xdr);
> >  
> >  	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
> >  		      count, rqstp->rq_res.len);
> > @@ -1244,7 +1244,7 @@ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
> >  	memcpy(buf, &marker, sizeof(marker));
> >  	bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker));
> >  
> > -	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages - 1,
> > +	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_bvec_len - 1,
> >  				&rqstp->rq_res);
> >  
> >  	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
> > 
> 

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v3 2/2] sunrpc: add a slot to rqstp->rq_bvec for TCP record marker
@ 2025-10-10 10:54 ` NeilBrown
From: NeilBrown @ 2025-10-10 10:54 UTC
  To: Jeff Layton
  Cc: Chuck Lever, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Howells,
	Brandon Adams, linux-nfs, netdev, linux-kernel, Jeff Layton

On Fri, 10 Oct 2025, Jeff Layton wrote:
> We've seen some occurrences of messages like this in dmesg on some knfsd
> servers:
> 
>     xdr_buf_to_bvec: bio_vec array overflow
> 
> Usually followed by messages like this that indicate a short send (note
> that this message is from an older kernel and the amount that it reports
> attempting to send is short by 4 bytes):
> 
>     rpc-srv/tcp: nfsd: sent 1048155 when sending 1048152 bytes - shutting down socket
> 
> svc_tcp_sendmsg() steals a slot in the rq_bvec array for the TCP record
> marker. If the send is an unaligned READ call though, then there may not
> be enough slots in the rq_bvec array in some cases.
> 
> Add a rqstp->rq_bvec_len field and use that to keep track of the length
> of rq_bvec. Use that in place of rq_maxpages where it's iterating over
> the bvec.

The above never says that the patch increases the size of rq_bvec, which
is important for actually fixing the bug.

This patch does two things:

 1/ introduce ->rq_bvec_len, which records the length of rq_bvec, and use
    it wherever that length is needed, rather than assuming it is
    rq_maxpages.
 2/ increase ->rq_bvec_len by 1, as svc_tcp_sendmsg() needs an extra slot
    to send the record marker.

You could conceivably make it two patches, but I don't think that is
necessary.  It *is* necessary to make it clear that these two distinct
though related changes are happening.

With something like that added to the commit message:

  Reviewed-by: NeilBrown <neil@brown.name>

Thanks,
NeilBrown


> 
> Fixes: e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call")
> Tested-by: Brandon Adams <brandona@meta.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/vfs.c              | 6 +++---
>  include/linux/sunrpc/svc.h | 1 +
>  net/sunrpc/svc.c           | 4 +++-
>  net/sunrpc/svcsock.c       | 4 ++--
>  4 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 77f6879c2e063fa79865100bbc2d1e64eb332f42..6c7224570d2dadae21876e0069e0b2e0551af0fa 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1111,7 +1111,7 @@ nfsd_direct_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  
>  	v = 0;
>  	total = dio_end - dio_start;
> -	while (total && v < rqstp->rq_maxpages &&
> +	while (total && v < rqstp->rq_bvec_len &&
>  	       rqstp->rq_next_page < rqstp->rq_page_end) {
>  		len = min_t(size_t, total, PAGE_SIZE);
>  		bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page,
> @@ -1200,7 +1200,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  
>  	v = 0;
>  	total = *count;
> -	while (total && v < rqstp->rq_maxpages &&
> +	while (total && v < rqstp->rq_bvec_len &&
>  	       rqstp->rq_next_page < rqstp->rq_page_end) {
>  		len = min_t(size_t, total, PAGE_SIZE - base);
>  		bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page,
> @@ -1318,7 +1318,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (stable && !fhp->fh_use_wgather)
>  		kiocb.ki_flags |= IOCB_DSYNC;
>  
> -	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> +	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_bvec_len, payload);
>  	iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
>  	since = READ_ONCE(file->f_wb_err);
>  	if (verf)
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 5506d20857c318774cd223272d4b0022cc19ffb8..0ee1f411860e55d5e0131c29766540f673193d5f 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -206,6 +206,7 @@ struct svc_rqst {
>  
>  	struct folio_batch	rq_fbatch;
>  	struct bio_vec		*rq_bvec;
> +	u32			rq_bvec_len;
>  
>  	__be32			rq_xid;		/* transmission id */
>  	u32			rq_prog;	/* program number */
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 4704dce7284eccc9e2bc64cf22947666facfa86a..a6bdd83fba77b13f973da66a1bac00050ae922fe 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -706,7 +706,9 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
>  	if (!svc_init_buffer(rqstp, serv, node))
>  		goto out_enomem;
>  
> -	rqstp->rq_bvec = kcalloc_node(rqstp->rq_maxpages,
> +	/* +1 for the TCP record marker */
> +	rqstp->rq_bvec_len = rqstp->rq_maxpages + 1;
> +	rqstp->rq_bvec = kcalloc_node(rqstp->rq_bvec_len,
>  				      sizeof(struct bio_vec),
>  				      GFP_KERNEL, node);
>  	if (!rqstp->rq_bvec)
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 377fcaaaa061463fc5c85fc09c7a8eab5e06af77..2075ddec250b3fdb36becca4a53f1c0536f8634a 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -740,7 +740,7 @@ static int svc_udp_sendto(struct svc_rqst *rqstp)
>  	if (svc_xprt_is_dead(xprt))
>  		goto out_notconn;
>  
> -	count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, xdr);
> +	count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_bvec_len, xdr);
>  
>  	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
>  		      count, rqstp->rq_res.len);
> @@ -1244,7 +1244,7 @@ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
>  	memcpy(buf, &marker, sizeof(marker));
>  	bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker));
>  
> -	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages - 1,
> +	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_bvec_len - 1,
>  				&rqstp->rq_res);
>  
>  	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
> 
> -- 
> 2.51.0
> 
> 


