[PATCH v3 0/2] sunrpc: fix handling of rq_bvec array in svc_rqst
From: Jeff Layton @ 2025-10-09 14:40 UTC
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, David Howells
Cc: Brandon Adams, linux-nfs, netdev, linux-kernel, Jeff Layton
This version of the series changes only the second patch: it now uses a
separate rq_bvec_len field instead of rq_maxpages in the places that
iterate over rq_bvec.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Changes in v3:
- Add rq_bvec_len field and use it in appropriate places
- Link to v2: https://lore.kernel.org/r/20251008-rq_bvec-v2-0-823c0a85a27c@kernel.org
Changes in v2:
- Better changelog message for patch #2
- Link to v1: https://lore.kernel.org/r/20251008-rq_bvec-v1-0-7f23d32d75e5@kernel.org
---
Jeff Layton (2):
sunrpc: account for TCP record marker in rq_bvec array when sending
sunrpc: add a slot to rqstp->rq_bvec for TCP record marker
 fs/nfsd/vfs.c              | 6 +++---
 include/linux/sunrpc/svc.h | 1 +
 net/sunrpc/svc.c           | 4 +++-
 net/sunrpc/svcsock.c       | 4 ++--
 4 files changed, 9 insertions(+), 6 deletions(-)
---
base-commit: 177818f176ef904fb18d237d1dbba00c2643aaf2
change-id: 20251008-rq_bvec-b66afd0fdbbb
Best regards,
--
Jeff Layton <jlayton@kernel.org>

[PATCH v3 1/2] sunrpc: account for TCP record marker in rq_bvec array when sending
From: Jeff Layton @ 2025-10-09 14:40 UTC
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Howells
Cc: Brandon Adams, linux-nfs, netdev, linux-kernel, Jeff Layton

The call to xdr_buf_to_bvec() in svc_tcp_sendmsg() passes in the second
slot of the bvec array as the starting slot, but doesn't decrease the
length of the array by one.

Fixes: 59cf7346542b ("sunrpc: Replace the rq_bvec array with dynamically-allocated memory")
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 net/sunrpc/svcsock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 7b90abc5cf0ee1520796b2f38fcb977417009830..377fcaaaa061463fc5c85fc09c7a8eab5e06af77 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1244,7 +1244,7 @@ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
 	memcpy(buf, &marker, sizeof(marker));
 	bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker));
 
-	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages,
+	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages - 1,
 				&rqstp->rq_res);
 
 	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
-- 
2.51.0
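
A minimal userspace sketch of the slot accounting above (NSLOTS and
try_send() are hypothetical stand-ins for rq_maxpages and the
xdr_buf_to_bvec() call; this models the arithmetic, it is not kernel
code):

#include <stdio.h>

#define NSLOTS 4	/* stand-in for rqstp->rq_maxpages */

/*
 * Model of the svc_tcp_sendmsg() setup: slot 0 of the bvec array holds
 * the 4-byte TCP record marker, so only NSLOTS - start slots remain
 * for the RPC reply itself.
 */
static void try_send(int start, int advertised)
{
	int avail = NSLOTS - start;	/* slots actually left in the array */

	if (advertised > avail)
		printf("advertised %d slots, only %d exist: overflow\n",
		       advertised, avail);
	else
		printf("advertised %d slots, %d exist: ok\n",
		       advertised, avail);
}

int main(void)
{
	try_send(1, NSLOTS);		/* before: rq_bvec + 1, rq_maxpages */
	try_send(1, NSLOTS - 1);	/* after:  rq_bvec + 1, rq_maxpages - 1 */
	return 0;
}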

[PATCH v3 2/2] sunrpc: add a slot to rqstp->rq_bvec for TCP record marker
From: Jeff Layton @ 2025-10-09 14:40 UTC
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Howells
Cc: Brandon Adams, linux-nfs, netdev, linux-kernel, Jeff Layton

We've seen some occurrences of messages like this in dmesg on some knfsd
servers:

    xdr_buf_to_bvec: bio_vec array overflow

Usually followed by messages like this that indicate a short send (note
that this message is from an older kernel and the amount that it reports
attempting to send is short by 4 bytes):

    rpc-srv/tcp: nfsd: sent 1048155 when sending 1048152 bytes - shutting down socket

svc_tcp_sendmsg() steals a slot in the rq_bvec array for the TCP record
marker. If the send is an unaligned READ call though, then there may not
be enough slots in the rq_bvec array in some cases.

Add a rqstp->rq_bvec_len field and use that to keep track of the length
of rq_bvec. Use that in place of rq_maxpages where it's iterating over
the bvec.

Fixes: e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call")
Tested-by: Brandon Adams <brandona@meta.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/vfs.c              | 6 +++---
 include/linux/sunrpc/svc.h | 1 +
 net/sunrpc/svc.c           | 4 +++-
 net/sunrpc/svcsock.c       | 4 ++--
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 77f6879c2e063fa79865100bbc2d1e64eb332f42..6c7224570d2dadae21876e0069e0b2e0551af0fa 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1111,7 +1111,7 @@ nfsd_direct_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	v = 0;
 	total = dio_end - dio_start;
-	while (total && v < rqstp->rq_maxpages &&
+	while (total && v < rqstp->rq_bvec_len &&
 	       rqstp->rq_next_page < rqstp->rq_page_end) {
 		len = min_t(size_t, total, PAGE_SIZE);
 		bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page,
@@ -1200,7 +1200,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	v = 0;
 	total = *count;
-	while (total && v < rqstp->rq_maxpages &&
+	while (total && v < rqstp->rq_bvec_len &&
 	       rqstp->rq_next_page < rqstp->rq_page_end) {
 		len = min_t(size_t, total, PAGE_SIZE - base);
 		bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page,
@@ -1318,7 +1318,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (stable && !fhp->fh_use_wgather)
 		kiocb.ki_flags |= IOCB_DSYNC;
 
-	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
+	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_bvec_len, payload);
 	iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
 	since = READ_ONCE(file->f_wb_err);
 	if (verf)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 5506d20857c318774cd223272d4b0022cc19ffb8..0ee1f411860e55d5e0131c29766540f673193d5f 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -206,6 +206,7 @@ struct svc_rqst {
 
 	struct folio_batch	rq_fbatch;
 	struct bio_vec		*rq_bvec;
+	u32			rq_bvec_len;
 
 	__be32			rq_xid;		/* transmission id */
 	u32			rq_prog;	/* program number */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 4704dce7284eccc9e2bc64cf22947666facfa86a..a6bdd83fba77b13f973da66a1bac00050ae922fe 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -706,7 +706,9 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
 	if (!svc_init_buffer(rqstp, serv, node))
 		goto out_enomem;
 
-	rqstp->rq_bvec = kcalloc_node(rqstp->rq_maxpages,
+	/* +1 for the TCP record marker */
+	rqstp->rq_bvec_len = rqstp->rq_maxpages + 1;
+	rqstp->rq_bvec = kcalloc_node(rqstp->rq_bvec_len,
 				      sizeof(struct bio_vec),
 				      GFP_KERNEL, node);
 	if (!rqstp->rq_bvec)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 377fcaaaa061463fc5c85fc09c7a8eab5e06af77..2075ddec250b3fdb36becca4a53f1c0536f8634a 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -740,7 +740,7 @@ static int svc_udp_sendto(struct svc_rqst *rqstp)
 	if (svc_xprt_is_dead(xprt))
 		goto out_notconn;
 
-	count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, xdr);
+	count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_bvec_len, xdr);
 
 	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
 		      count, rqstp->rq_res.len);
@@ -1244,7 +1244,7 @@ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
 	memcpy(buf, &marker, sizeof(marker));
 	bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker));
 
-	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages - 1,
+	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_bvec_len - 1,
 				&rqstp->rq_res);
 
 	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
-- 
2.51.0
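
A rough model of the new allocation and bookkeeping (userspace C;
rqst_stub and alloc_bvec() are hypothetical stand-ins for struct
svc_rqst and the svc_prepare_thread() logic, a sketch rather than the
real implementation):

#include <stdlib.h>

struct bio_vec_stub { void *addr; unsigned int len; };

/* Stand-in for the relevant fields of struct svc_rqst */
struct rqst_stub {
	unsigned int rq_maxpages;	/* pages needed for the largest payload */
	unsigned int rq_bvec_len;	/* number of entries in rq_bvec */
	struct bio_vec_stub *rq_bvec;
};

static int alloc_bvec(struct rqst_stub *rqstp)
{
	/*
	 * +1 so the TCP transport can use slot 0 for the record marker
	 * and still fit rq_maxpages payload entries behind it.
	 */
	rqstp->rq_bvec_len = rqstp->rq_maxpages + 1;
	rqstp->rq_bvec = calloc(rqstp->rq_bvec_len, sizeof(*rqstp->rq_bvec));
	return rqstp->rq_bvec ? 0 : -1;
}

int main(void)
{
	struct rqst_stub r = { .rq_maxpages = 257 };	/* example value */

	return alloc_bvec(&r);
}

Every consumer then bounds its iteration by rq_bvec_len instead of
rq_maxpages, so the array size and the loop limits can never drift
apart again.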

Re: [PATCH v3 2/2] sunrpc: add a slot to rqstp->rq_bvec for TCP record marker
From: Chuck Lever @ 2025-10-09 15:03 UTC
To: Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Howells
Cc: Brandon Adams, linux-nfs, netdev, linux-kernel

On 10/9/25 10:40 AM, Jeff Layton wrote:
> We've seen some occurrences of messages like this in dmesg on some knfsd
> servers:
>
>     xdr_buf_to_bvec: bio_vec array overflow
>
> Usually followed by messages like this that indicate a short send (note
> that this message is from an older kernel and the amount that it reports
> attempting to send is short by 4 bytes):
>
>     rpc-srv/tcp: nfsd: sent 1048155 when sending 1048152 bytes - shutting down socket
>
> svc_tcp_sendmsg() steals a slot in the rq_bvec array for the TCP record
> marker. If the send is an unaligned READ call though, then there may not
> be enough slots in the rq_bvec array in some cases.
>
> Add a rqstp->rq_bvec_len field and use that to keep track of the length
> of rq_bvec. Use that in place of rq_maxpages where it's iterating over
> the bvec.

Granted that the number of items in rq_pages and in rq_bvec don't have
to coincide, they just happen to be the same, historically. And, each
bvec in rq_bvec doesn't necessarily have to be a page.

[ ... ]

> @@ -706,7 +706,9 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
> 	if (!svc_init_buffer(rqstp, serv, node))
> 		goto out_enomem;
> 
> -	rqstp->rq_bvec = kcalloc_node(rqstp->rq_maxpages,
> +	/* +1 for the TCP record marker */
> +	rqstp->rq_bvec_len = rqstp->rq_maxpages + 1;

What bugs me about this is that svc_prepare_thread() shouldn't have
specific knowledge about the needs of transports. But I don't have a
better idea...

> +	rqstp->rq_bvec = kcalloc_node(rqstp->rq_bvec_len,
> 				      sizeof(struct bio_vec),
> 				      GFP_KERNEL, node);
> 	if (!rqstp->rq_bvec)

[ ... ]

-- 
Chuck Lever

Re: [PATCH v3 2/2] sunrpc: add a slot to rqstp->rq_bvec for TCP record marker
From: Jeff Layton @ 2025-10-09 15:07 UTC
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Howells
Cc: Brandon Adams, linux-nfs, netdev, linux-kernel

On Thu, 2025-10-09 at 11:03 -0400, Chuck Lever wrote:
> On 10/9/25 10:40 AM, Jeff Layton wrote:

[ ... ]

> > @@ -706,7 +706,9 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
> > 	if (!svc_init_buffer(rqstp, serv, node))
> > 		goto out_enomem;
> > 
> > -	rqstp->rq_bvec = kcalloc_node(rqstp->rq_maxpages,
> > +	/* +1 for the TCP record marker */
> > +	rqstp->rq_bvec_len = rqstp->rq_maxpages + 1;
> 
> What bugs me about this is that svc_prepare_thread() shouldn't have
> specific knowledge about the needs of transports. But I don't have a
> better idea...

Yeah, it's a minor layering violation. I guess I could phrase it as:

	/* Some transports need an extra bvec. (e.g. TCP needs it for the record marker) */

...but I'm not sure that's any clearer.

[ ... ]

-- 
Jeff Layton <jlayton@kernel.org>

Re: [PATCH v3 2/2] sunrpc: add a slot to rqstp->rq_bvec for TCP record marker
From: NeilBrown @ 2025-10-10 10:54 UTC
To: Jeff Layton
Cc: Chuck Lever, Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, David Howells, Brandon Adams, linux-nfs,
	netdev, linux-kernel, Jeff Layton

On Fri, 10 Oct 2025, Jeff Layton wrote:
> We've seen some occurrences of messages like this in dmesg on some knfsd
> servers:
>
>     xdr_buf_to_bvec: bio_vec array overflow
>
[ ... ]
>
> Add a rqstp->rq_bvec_len field and use that to keep track of the length
> of rq_bvec. Use that in place of rq_maxpages where it's iterating over
> the bvec.

The above never says that the patch increases the size of rq_bvec,
which is important for actually fixing the bug.

This patch does two things:

 1/ introduce ->rq_bvec_len, which records the length of rq_bvec, and
    use it wherever that length is needed, rather than assuming it is
    rq_maxpages.

 2/ increase ->rq_bvec_len by 1, as svc_tcp_sendmsg needs an extra slot
    to send the record header.

You could conceivably make it two patches, but I don't think that is
necessary. It *is* necessary to make it clear that these two distinct
though related changes are happening.

With something like that added to the commit message:

Reviewed-by: NeilBrown <neil@brown.name>

Thanks,
NeilBrown

[ ... ]