* [PATCH v3 00/11] Allocate payload arrays dynamically
@ 2025-04-23 15:21 cel
2025-04-23 15:21 ` [PATCH v3 01/11] sunrpc: Remove backchannel check in svc_init_buffer() cel
` (10 more replies)
0 siblings, 11 replies; 14+ messages in thread
From: cel @ 2025-04-23 15:21 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
In order to make RPCSVC_MAXPAYLOAD larger (or variable in size), we
need to do something clever with the payload arrays embedded in
struct svc_rqst and elsewhere.
My preference is to keep these arrays allocated all the time because
allocating them on demand increases the risk of a memory allocation
failure during a large I/O. This is a quick-and-dirty approach that
might be replaced once NFSD is converted to use large folios.
The downside of this design choice is that it pins a few pages per
NFSD thread (which is already the current situation). But note
that because RPCSVC_MAXPAGES is 259, each array is just over a page
in size, so each allocation wastes quite a bit of memory beyond
the end of the array due to power-of-2 allocator round-up. This gets
worse as the MAXPAGES value is doubled or quadrupled.
This series also addresses similar issues in the socket and RDMA
transports.
Changes since v2:
* Address Jeff's review comments
* Address Neil's review comments
* Start removing a few uses of NFSSVC_MAXBLKSIZE
Chuck Lever (11):
sunrpc: Remove backchannel check in svc_init_buffer()
sunrpc: Add a helper to derive maxpages from sv_max_mesg
sunrpc: Replace the rq_pages array with dynamically-allocated memory
sunrpc: Replace the rq_vec array with dynamically-allocated memory
sunrpc: Replace the rq_bvec array with dynamically-allocated memory
sunrpc: Adjust size of socket's receive page array dynamically
svcrdma: Adjust the number of RDMA contexts per transport
svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages
svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages
sunrpc: Remove the RPCSVC_MAXPAGES macro
NFSD: Remove NFSSVC_MAXBLKSIZE from .pc_xdrressize
fs/nfsd/nfs3proc.c | 4 +-
fs/nfsd/nfsproc.c | 4 +-
fs/nfsd/vfs.c | 2 +-
include/linux/sunrpc/svc.h | 31 +++++++++-----
include/linux/sunrpc/svc_rdma.h | 6 ++-
include/linux/sunrpc/svcsock.h | 4 +-
net/sunrpc/svc.c | 51 +++++++++++++++---------
net/sunrpc/svc_xprt.c | 10 +----
net/sunrpc/svcsock.c | 15 ++++---
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 8 +++-
net/sunrpc/xprtrdma/svc_rdma_rw.c | 2 +-
net/sunrpc/xprtrdma/svc_rdma_sendto.c | 16 ++++++--
net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +-
13 files changed, 97 insertions(+), 58 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v3 01/11] sunrpc: Remove backchannel check in svc_init_buffer()
2025-04-23 15:21 [PATCH v3 00/11] Allocate payload arrays dynamically cel
@ 2025-04-23 15:21 ` cel
2025-04-23 15:21 ` [PATCH v3 02/11] sunrpc: Add a helper to derive maxpages from sv_max_mesg cel
` (9 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: cel @ 2025-04-23 15:21 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
The server's backchannel uses struct svc_rqst, but does not use the
pages in svc_rqst::rq_pages. Its rq_arg::pages and rq_res::pages
come from the RPC client's page allocator. Currently,
svc_init_buffer() skips allocating pages in rq_pages for that
reason.
However, svc_rqst::rq_pages is filled anyway when a backchannel
svc_rqst is passed to svc_recv() and then to svc_alloc_arg().
This isn't really a problem at the moment, except that these pages
are allocated but then never used, as far as I can tell.
The problem is that later in this series, in addition to populating
the entries of rq_pages[], svc_init_buffer() will also allocate the
memory underlying the rq_pages[] array itself. If that allocation is
skipped, then svc_alloc_arg() chases a NULL pointer for ingress
backchannel requests.
I'm not sure the way proposed here is best, but it does avoid
introducing extra conditional logic in svc_alloc_arg(), which is a
hot path.
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/svc.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index e7f9c295d13c..8ce3e6b3df6a 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -640,10 +640,6 @@ svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node)
{
unsigned long pages, ret;
- /* bc_xprt uses fore channel allocated buffers */
- if (svc_is_backchannel(rqstp))
- return true;
-
pages = size / PAGE_SIZE + 1; /* extra page as we hold both request and reply.
* We assume one is at most one page
*/
--
2.49.0
* [PATCH v3 02/11] sunrpc: Add a helper to derive maxpages from sv_max_mesg
2025-04-23 15:21 [PATCH v3 00/11] Allocate payload arrays dynamically cel
2025-04-23 15:21 ` [PATCH v3 01/11] sunrpc: Remove backchannel check in svc_init_buffer() cel
@ 2025-04-23 15:21 ` cel
2025-04-23 15:21 ` [PATCH v3 03/11] sunrpc: Replace the rq_pages array with dynamically-allocated memory cel
` (8 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: cel @ 2025-04-23 15:21 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
This page count is to be used to allocate various arrays of pages,
bio_vecs, and kvecs, replacing the fixed RPCSVC_MAXPAGES value.
The documenting comment is somewhat stale -- of course NFSv4
COMPOUND procedures may have multiple payloads.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/linux/sunrpc/svc.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 74658cca0f38..e83ac14267e8 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -159,6 +159,23 @@ extern u32 svc_max_payload(const struct svc_rqst *rqstp);
#define RPCSVC_MAXPAGES ((RPCSVC_MAXPAYLOAD+PAGE_SIZE-1)/PAGE_SIZE \
+ 2 + 1)
+/**
+ * svc_serv_maxpages - maximum pages/kvecs needed for one RPC message
+ * @serv: RPC service context
+ *
+ * Returns a count of pages or vectors that can hold the maximum
+ * size RPC message for @serv.
+ *
+ * Each request/reply pair can have at most one "payload", plus two
+ * pages, one for the request, and one for the reply.
+ * nfsd_splice_actor() might need an extra page when a READ payload
+ * is not page-aligned.
+ */
+static inline unsigned long svc_serv_maxpages(const struct svc_serv *serv)
+{
+ return DIV_ROUND_UP(serv->sv_max_mesg, PAGE_SIZE) + 2 + 1;
+}
+
/*
* The context of a single thread, including the request currently being
* processed.
--
2.49.0
* [PATCH v3 03/11] sunrpc: Replace the rq_pages array with dynamically-allocated memory
2025-04-23 15:21 [PATCH v3 00/11] Allocate payload arrays dynamically cel
2025-04-23 15:21 ` [PATCH v3 01/11] sunrpc: Remove backchannel check in svc_init_buffer() cel
2025-04-23 15:21 ` [PATCH v3 02/11] sunrpc: Add a helper to derive maxpages from sv_max_mesg cel
@ 2025-04-23 15:21 ` cel
2025-04-23 15:21 ` [PATCH v3 04/11] sunrpc: Replace the rq_vec " cel
` (7 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: cel @ 2025-04-23 15:21 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
As a step towards making NFSD's maximum rsize and wsize variable at
run-time, replace the fixed-size rq_pages[] array in struct svc_rqst
with a chunk of dynamically-allocated memory.
On a system with 8-byte pointers and 4KB pages, pahole reports that
the rq_pages[] array is 2080 bytes. This patch replaces that with
a single 8-byte pointer field.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/linux/sunrpc/svc.h | 3 ++-
net/sunrpc/svc.c | 34 ++++++++++++++++++-------------
net/sunrpc/svc_xprt.c | 10 +--------
net/sunrpc/xprtrdma/svc_rdma_rw.c | 2 +-
4 files changed, 24 insertions(+), 25 deletions(-)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index e83ac14267e8..ea3a33eec29b 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -205,7 +205,8 @@ struct svc_rqst {
struct xdr_stream rq_res_stream;
struct page *rq_scratch_page;
struct xdr_buf rq_res;
- struct page *rq_pages[RPCSVC_MAXPAGES + 1];
+ unsigned long rq_maxpages; /* num of entries in rq_pages */
+ struct page * *rq_pages;
struct page * *rq_respages; /* points into rq_pages */
struct page * *rq_next_page; /* next reply page to use */
struct page * *rq_page_end; /* one past the last page */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 8ce3e6b3df6a..682e11c9be36 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -636,20 +636,25 @@ svc_destroy(struct svc_serv **servp)
EXPORT_SYMBOL_GPL(svc_destroy);
static bool
-svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node)
+svc_init_buffer(struct svc_rqst *rqstp, const struct svc_serv *serv, int node)
{
- unsigned long pages, ret;
+ unsigned long ret;
- pages = size / PAGE_SIZE + 1; /* extra page as we hold both request and reply.
- * We assume one is at most one page
- */
- WARN_ON_ONCE(pages > RPCSVC_MAXPAGES);
- if (pages > RPCSVC_MAXPAGES)
- pages = RPCSVC_MAXPAGES;
+ /* Add an extra page, as rq_pages holds both request and reply.
+ * We assume one of those is at most one page.
+ */
+ rqstp->rq_maxpages = svc_serv_maxpages(serv) + 1;
- ret = alloc_pages_bulk_node(GFP_KERNEL, node, pages,
+ /* rq_pages' last entry is NULL for historical reasons. */
+ rqstp->rq_pages = kcalloc_node(rqstp->rq_maxpages + 1,
+ sizeof(struct page *),
+ GFP_KERNEL, node);
+ if (!rqstp->rq_pages)
+ return false;
+
+ ret = alloc_pages_bulk_node(GFP_KERNEL, node, rqstp->rq_maxpages,
rqstp->rq_pages);
- return ret == pages;
+ return ret == rqstp->rq_maxpages;
}
/*
@@ -658,11 +663,12 @@ svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node)
static void
svc_release_buffer(struct svc_rqst *rqstp)
{
- unsigned int i;
+ unsigned long i;
- for (i = 0; i < ARRAY_SIZE(rqstp->rq_pages); i++)
+ for (i = 0; i < rqstp->rq_maxpages; i++)
if (rqstp->rq_pages[i])
put_page(rqstp->rq_pages[i]);
+ kfree(rqstp->rq_pages);
}
static void
@@ -704,7 +710,7 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
if (!rqstp->rq_resp)
goto out_enomem;
- if (!svc_init_buffer(rqstp, serv->sv_max_mesg, node))
+ if (!svc_init_buffer(rqstp, serv, node))
goto out_enomem;
rqstp->rq_err = -EAGAIN; /* No error yet */
@@ -896,7 +902,7 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads);
bool svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
{
struct page **begin = rqstp->rq_pages;
- struct page **end = &rqstp->rq_pages[RPCSVC_MAXPAGES];
+ struct page **end = &rqstp->rq_pages[rqstp->rq_maxpages];
if (unlikely(rqstp->rq_next_page < begin || rqstp->rq_next_page > end)) {
trace_svc_replace_page_err(rqstp);
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 32018557797b..cb14d6ddac6c 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -652,18 +652,10 @@ static void svc_check_conn_limits(struct svc_serv *serv)
static bool svc_alloc_arg(struct svc_rqst *rqstp)
{
- struct svc_serv *serv = rqstp->rq_server;
struct xdr_buf *arg = &rqstp->rq_arg;
unsigned long pages, filled, ret;
- pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
- if (pages > RPCSVC_MAXPAGES) {
- pr_warn_once("svc: warning: pages=%lu > RPCSVC_MAXPAGES=%lu\n",
- pages, RPCSVC_MAXPAGES);
- /* use as many pages as possible */
- pages = RPCSVC_MAXPAGES;
- }
-
+ pages = rqstp->rq_maxpages;
for (filled = 0; filled < pages; filled = ret) {
ret = alloc_pages_bulk(GFP_KERNEL, pages, rqstp->rq_pages);
if (ret > filled)
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 40797114d50a..661b3fe2779f 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -765,7 +765,7 @@ static int svc_rdma_build_read_segment(struct svc_rqst *rqstp,
}
len -= seg_len;
- if (len && ((head->rc_curpage + 1) > ARRAY_SIZE(rqstp->rq_pages)))
+ if (len && ((head->rc_curpage + 1) > rqstp->rq_maxpages))
goto out_overrun;
}
--
2.49.0
* [PATCH v3 04/11] sunrpc: Replace the rq_vec array with dynamically-allocated memory
2025-04-23 15:21 [PATCH v3 00/11] Allocate payload arrays dynamically cel
` (2 preceding siblings ...)
2025-04-23 15:21 ` [PATCH v3 03/11] sunrpc: Replace the rq_pages array with dynamically-allocated memory cel
@ 2025-04-23 15:21 ` cel
2025-04-23 15:21 ` [PATCH v3 05/11] sunrpc: Replace the rq_bvec " cel
` (6 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: cel @ 2025-04-23 15:21 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
As a step towards making NFSD's maximum rsize and wsize variable at
run-time, replace the fixed-size rq_vec[] array in struct svc_rqst
with a chunk of dynamically-allocated memory.
The rq_vec array is sized assuming request processing will need at
most one kvec per page in a maximum-sized RPC message.
On a system with 8-byte pointers and 4KB pages, pahole reports that
the rq_vec[] array is 4144 bytes. This patch replaces that array
with a single 8-byte pointer field.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/vfs.c | 2 +-
include/linux/sunrpc/svc.h | 2 +-
net/sunrpc/svc.c | 8 +++++++-
3 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 68f7d0094b06..ea0773bb7c05 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1096,7 +1096,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
++v;
base = 0;
}
- WARN_ON_ONCE(v > ARRAY_SIZE(rqstp->rq_vec));
+ WARN_ON_ONCE(v > rqstp->rq_maxpages);
trace_nfsd_read_vector(rqstp, fhp, offset, *count);
iov_iter_kvec(&iter, ITER_DEST, rqstp->rq_vec, v, *count);
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index ea3a33eec29b..f663d58abd7a 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -212,7 +212,7 @@ struct svc_rqst {
struct page * *rq_page_end; /* one past the last page */
struct folio_batch rq_fbatch;
- struct kvec rq_vec[RPCSVC_MAXPAGES]; /* generally useful.. */
+ struct kvec *rq_vec;
struct bio_vec rq_bvec[RPCSVC_MAXPAGES];
__be32 rq_xid; /* transmission id */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 682e11c9be36..5808d4b97547 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -675,6 +675,7 @@ static void
svc_rqst_free(struct svc_rqst *rqstp)
{
folio_batch_release(&rqstp->rq_fbatch);
+ kfree(rqstp->rq_vec);
svc_release_buffer(rqstp);
if (rqstp->rq_scratch_page)
put_page(rqstp->rq_scratch_page);
@@ -713,6 +714,11 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
if (!svc_init_buffer(rqstp, serv, node))
goto out_enomem;
+ rqstp->rq_vec = kcalloc_node(rqstp->rq_maxpages, sizeof(struct kvec),
+ GFP_KERNEL, node);
+ if (!rqstp->rq_vec)
+ goto out_enomem;
+
rqstp->rq_err = -EAGAIN; /* No error yet */
serv->sv_nrthreads += 1;
@@ -1750,7 +1756,7 @@ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp,
++pages;
}
- WARN_ON_ONCE(i > ARRAY_SIZE(rqstp->rq_vec));
+ WARN_ON_ONCE(i > rqstp->rq_maxpages);
return i;
}
EXPORT_SYMBOL_GPL(svc_fill_write_vector);
--
2.49.0
* [PATCH v3 05/11] sunrpc: Replace the rq_bvec array with dynamically-allocated memory
2025-04-23 15:21 [PATCH v3 00/11] Allocate payload arrays dynamically cel
` (3 preceding siblings ...)
2025-04-23 15:21 ` [PATCH v3 04/11] sunrpc: Replace the rq_vec " cel
@ 2025-04-23 15:21 ` cel
2025-04-23 15:21 ` [PATCH v3 06/11] sunrpc: Adjust size of socket's receive page array dynamically cel
` (5 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: cel @ 2025-04-23 15:21 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
As a step towards making NFSD's maximum rsize and wsize variable at
run-time, replace the fixed-size rq_bvec[] array in struct svc_rqst
with a chunk of dynamically-allocated memory.
The rq_bvec[] array contains enough bio_vecs to handle each page in
a maximum-size RPC message.
On a system with 8-byte pointers and 4KB pages, pahole reports that
the rq_bvec[] array is 4144 bytes. This patch replaces that array
with a single 8-byte pointer field.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/linux/sunrpc/svc.h | 2 +-
net/sunrpc/svc.c | 7 +++++++
net/sunrpc/svcsock.c | 7 +++----
3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index f663d58abd7a..4e6074bb0573 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -213,7 +213,7 @@ struct svc_rqst {
struct folio_batch rq_fbatch;
struct kvec *rq_vec;
- struct bio_vec rq_bvec[RPCSVC_MAXPAGES];
+ struct bio_vec *rq_bvec;
__be32 rq_xid; /* transmission id */
u32 rq_prog; /* program number */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 5808d4b97547..0741e506c35c 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -675,6 +675,7 @@ static void
svc_rqst_free(struct svc_rqst *rqstp)
{
folio_batch_release(&rqstp->rq_fbatch);
+ kfree(rqstp->rq_bvec);
kfree(rqstp->rq_vec);
svc_release_buffer(rqstp);
if (rqstp->rq_scratch_page)
@@ -719,6 +720,12 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
if (!rqstp->rq_vec)
goto out_enomem;
+ rqstp->rq_bvec = kcalloc_node(rqstp->rq_maxpages,
+ sizeof(struct bio_vec),
+ GFP_KERNEL, node);
+ if (!rqstp->rq_bvec)
+ goto out_enomem;
+
rqstp->rq_err = -EAGAIN; /* No error yet */
serv->sv_nrthreads += 1;
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 60f2883268fa..d9fdc6ae8020 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -713,8 +713,7 @@ static int svc_udp_sendto(struct svc_rqst *rqstp)
if (svc_xprt_is_dead(xprt))
goto out_notconn;
- count = xdr_buf_to_bvec(rqstp->rq_bvec,
- ARRAY_SIZE(rqstp->rq_bvec), xdr);
+ count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, xdr);
iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
count, rqstp->rq_res.len);
@@ -1219,8 +1218,8 @@ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
memcpy(buf, &marker, sizeof(marker));
bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker));
- count = xdr_buf_to_bvec(rqstp->rq_bvec + 1,
- ARRAY_SIZE(rqstp->rq_bvec) - 1, &rqstp->rq_res);
+ count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages,
+ &rqstp->rq_res);
iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
1 + count, sizeof(marker) + rqstp->rq_res.len);
--
2.49.0
* [PATCH v3 06/11] sunrpc: Adjust size of socket's receive page array dynamically
2025-04-23 15:21 [PATCH v3 00/11] Allocate payload arrays dynamically cel
` (4 preceding siblings ...)
2025-04-23 15:21 ` [PATCH v3 05/11] sunrpc: Replace the rq_bvec " cel
@ 2025-04-23 15:21 ` cel
2025-04-23 15:21 ` [PATCH v3 07/11] svcrdma: Adjust the number of RDMA contexts per transport cel
` (4 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: cel @ 2025-04-23 15:21 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
As a step towards making NFSD's maximum rsize and wsize variable at
run-time, make sk_pages a flexible array.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/linux/sunrpc/svcsock.h | 4 +++-
net/sunrpc/svcsock.c | 8 ++++++--
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index bf45d9e8492a..963bbe251e52 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -40,7 +40,9 @@ struct svc_sock {
struct completion sk_handshake_done;
- struct page * sk_pages[RPCSVC_MAXPAGES]; /* received data */
+ /* received data */
+ unsigned long sk_maxpages;
+ struct page * sk_pages[] __counted_by(sk_maxpages);
};
static inline u32 svc_sock_reclen(struct svc_sock *svsk)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index d9fdc6ae8020..e1c85123b445 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1339,7 +1339,8 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
svsk->sk_marker = xdr_zero;
svsk->sk_tcplen = 0;
svsk->sk_datalen = 0;
- memset(&svsk->sk_pages[0], 0, sizeof(svsk->sk_pages));
+ memset(&svsk->sk_pages[0], 0,
+ svsk->sk_maxpages * sizeof(struct page *));
tcp_sock_set_nodelay(sk);
@@ -1378,10 +1379,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
struct svc_sock *svsk;
struct sock *inet;
int pmap_register = !(flags & SVC_SOCK_ANONYMOUS);
+ unsigned long pages;
- svsk = kzalloc(sizeof(*svsk), GFP_KERNEL);
+ pages = svc_serv_maxpages(serv);
+ svsk = kzalloc(struct_size(svsk, sk_pages, pages), GFP_KERNEL);
if (!svsk)
return ERR_PTR(-ENOMEM);
+ svsk->sk_maxpages = pages;
inet = sock->sk;
--
2.49.0
* [PATCH v3 07/11] svcrdma: Adjust the number of RDMA contexts per transport
2025-04-23 15:21 [PATCH v3 00/11] Allocate payload arrays dynamically cel
` (5 preceding siblings ...)
2025-04-23 15:21 ` [PATCH v3 06/11] sunrpc: Adjust size of socket's receive page array dynamically cel
@ 2025-04-23 15:21 ` cel
2025-04-23 15:21 ` [PATCH v3 08/11] svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages cel
` (3 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: cel @ 2025-04-23 15:21 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
The RDMA accept code requests enough RDMA contexts to read
and write one page per maximum-size RPC message, plus one
context that is being recycled for the next RPC.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index aca8bdf65d72..22687533c3e9 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -467,7 +467,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
* progress even if the client is using one rkey per page
* in each Read chunk.
*/
- ctxts = 3 * RPCSVC_MAXPAGES;
+ ctxts = 3 * svc_serv_maxpages(xprt->xpt_server);
newxprt->sc_sq_depth = rq_depth + ctxts;
if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr)
newxprt->sc_sq_depth = dev->attrs.max_qp_wr;
--
2.49.0
* [PATCH v3 08/11] svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages
2025-04-23 15:21 [PATCH v3 00/11] Allocate payload arrays dynamically cel
` (6 preceding siblings ...)
2025-04-23 15:21 ` [PATCH v3 07/11] svcrdma: Adjust the number of RDMA contexts per transport cel
@ 2025-04-23 15:21 ` cel
2025-04-23 15:21 ` [PATCH v3 09/11] svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages cel
` (2 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: cel @ 2025-04-23 15:21 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Allow allocation of more entries in the rc_pages[] array when the
maximum size of an RPC message is increased.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/linux/sunrpc/svc_rdma.h | 3 ++-
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 8 ++++++--
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 619fc0bd837a..1016f2feddc4 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -202,7 +202,8 @@ struct svc_rdma_recv_ctxt {
struct svc_rdma_pcl rc_reply_pcl;
unsigned int rc_page_count;
- struct page *rc_pages[RPCSVC_MAXPAGES];
+ unsigned long rc_maxpages;
+ struct page *rc_pages[] __counted_by(rc_maxpages);
};
/*
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 292022f0976e..e7e4a39ca6c6 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -120,12 +120,16 @@ svc_rdma_recv_ctxt_alloc(struct svcxprt_rdma *rdma)
{
int node = ibdev_to_node(rdma->sc_cm_id->device);
struct svc_rdma_recv_ctxt *ctxt;
+ unsigned long pages;
dma_addr_t addr;
void *buffer;
- ctxt = kzalloc_node(sizeof(*ctxt), GFP_KERNEL, node);
+ pages = svc_serv_maxpages(rdma->sc_xprt.xpt_server);
+ ctxt = kzalloc_node(struct_size(ctxt, rc_pages, pages),
+ GFP_KERNEL, node);
if (!ctxt)
goto fail0;
+ ctxt->rc_maxpages = pages;
buffer = kmalloc_node(rdma->sc_max_req_size, GFP_KERNEL, node);
if (!buffer)
goto fail1;
@@ -497,7 +501,7 @@ static bool xdr_check_write_chunk(struct svc_rdma_recv_ctxt *rctxt)
* a computation, perform a simple range check. This is an
* arbitrary but sensible limit (ie, not architectural).
*/
- if (unlikely(segcount > RPCSVC_MAXPAGES))
+ if (unlikely(segcount > rctxt->rc_maxpages))
return false;
p = xdr_inline_decode(&rctxt->rc_stream,
--
2.49.0
* [PATCH v3 09/11] svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages
2025-04-23 15:21 [PATCH v3 00/11] Allocate payload arrays dynamically cel
` (7 preceding siblings ...)
2025-04-23 15:21 ` [PATCH v3 08/11] svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages cel
@ 2025-04-23 15:21 ` cel
2025-04-23 15:21 ` [PATCH v3 10/11] sunrpc: Remove the RPCSVC_MAXPAGES macro cel
2025-04-23 15:21 ` [PATCH v3 11/11] NFSD: Remove NFSSVC_MAXBLKSIZE from .pc_xdrressize cel
10 siblings, 0 replies; 14+ messages in thread
From: cel @ 2025-04-23 15:21 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Allow allocation of more entries in the sc_pages[] array when the
maximum size of an RPC message is increased.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/linux/sunrpc/svc_rdma.h | 3 ++-
net/sunrpc/xprtrdma/svc_rdma_sendto.c | 16 +++++++++++++---
2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 1016f2feddc4..22704c2e5b9b 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -245,7 +245,8 @@ struct svc_rdma_send_ctxt {
void *sc_xprt_buf;
int sc_page_count;
int sc_cur_sge_no;
- struct page *sc_pages[RPCSVC_MAXPAGES];
+ unsigned long sc_maxpages;
+ struct page **sc_pages;
struct ib_sge sc_sges[];
};
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 96154a2367a1..914cd263c2f1 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -118,6 +118,7 @@ svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma)
{
int node = ibdev_to_node(rdma->sc_cm_id->device);
struct svc_rdma_send_ctxt *ctxt;
+ unsigned long pages;
dma_addr_t addr;
void *buffer;
int i;
@@ -126,13 +127,19 @@ svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma)
GFP_KERNEL, node);
if (!ctxt)
goto fail0;
+ pages = svc_serv_maxpages(rdma->sc_xprt.xpt_server);
+ ctxt->sc_pages = kcalloc_node(pages, sizeof(struct page *),
+ GFP_KERNEL, node);
+ if (!ctxt->sc_pages)
+ goto fail1;
+ ctxt->sc_maxpages = pages;
buffer = kmalloc_node(rdma->sc_max_req_size, GFP_KERNEL, node);
if (!buffer)
- goto fail1;
+ goto fail2;
addr = ib_dma_map_single(rdma->sc_pd->device, buffer,
rdma->sc_max_req_size, DMA_TO_DEVICE);
if (ib_dma_mapping_error(rdma->sc_pd->device, addr))
- goto fail2;
+ goto fail3;
svc_rdma_send_cid_init(rdma, &ctxt->sc_cid);
@@ -151,8 +158,10 @@ svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma)
ctxt->sc_sges[i].lkey = rdma->sc_pd->local_dma_lkey;
return ctxt;
-fail2:
+fail3:
kfree(buffer);
+fail2:
+ kfree(ctxt->sc_pages);
fail1:
kfree(ctxt);
fail0:
@@ -176,6 +185,7 @@ void svc_rdma_send_ctxts_destroy(struct svcxprt_rdma *rdma)
rdma->sc_max_req_size,
DMA_TO_DEVICE);
kfree(ctxt->sc_xprt_buf);
+ kfree(ctxt->sc_pages);
kfree(ctxt);
}
}
--
2.49.0
* [PATCH v3 10/11] sunrpc: Remove the RPCSVC_MAXPAGES macro
2025-04-23 15:21 [PATCH v3 00/11] Allocate payload arrays dynamically cel
` (8 preceding siblings ...)
2025-04-23 15:21 ` [PATCH v3 09/11] svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages cel
@ 2025-04-23 15:21 ` cel
2025-04-23 15:21 ` [PATCH v3 11/11] NFSD: Remove NFSSVC_MAXBLKSIZE from .pc_xdrressize cel
10 siblings, 0 replies; 14+ messages in thread
From: cel @ 2025-04-23 15:21 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
It is no longer used.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/linux/sunrpc/svc.h | 7 -------
1 file changed, 7 deletions(-)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 4e6074bb0573..e27bc051ec67 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -150,14 +150,7 @@ extern u32 svc_max_payload(const struct svc_rqst *rqstp);
* list. xdr_buf.tail points to the end of the first page.
* This assumes that the non-page part of an rpc reply will fit
* in a page - NFSd ensures this. lockd also has no trouble.
- *
- * Each request/reply pair can have at most one "payload", plus two pages,
- * one for the request, and one for the reply.
- * We using ->sendfile to return read data, we might need one extra page
- * if the request is not page-aligned. So add another '1'.
*/
-#define RPCSVC_MAXPAGES ((RPCSVC_MAXPAYLOAD+PAGE_SIZE-1)/PAGE_SIZE \
- + 2 + 1)
/**
* svc_serv_maxpages - maximum pages/kvecs needed for one RPC message
--
2.49.0
* [PATCH v3 11/11] NFSD: Remove NFSSVC_MAXBLKSIZE from .pc_xdrressize
2025-04-23 15:21 [PATCH v3 00/11] Allocate payload arrays dynamically cel
` (9 preceding siblings ...)
2025-04-23 15:21 ` [PATCH v3 10/11] sunrpc: Remove the RPCSVC_MAXPAGES macro cel
@ 2025-04-23 15:21 ` cel
2025-04-26 4:31 ` NeilBrown
10 siblings, 1 reply; 14+ messages in thread
From: cel @ 2025-04-23 15:21 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
The value in the .pc_xdrressize field is used to "reserve space in
the output queue", which is relevant only to UDP transports, AFAICT.
The fixed value of NFSSVC_MAXBLKSIZE is added to that field for
NFSv2 and NFSv3 read requests, even though nfsd_proc_read() is
already careful to reserve the actual size of the read payload.
Adding the maximum payload size to .pc_xdrressize seems to be
unnecessary.
Also, instead of adding a constant 4 bytes for each payload's
XDR pad, add the actual size of the pad for better accuracy of
the reservation size.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs3proc.c | 4 ++--
fs/nfsd/nfsproc.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 372bdcf5e07a..dbb750a7b5db 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -202,7 +202,7 @@ nfsd3_proc_read(struct svc_rqst *rqstp)
*/
resp->count = argp->count;
svc_reserve_auth(rqstp, ((1 + NFS3_POST_OP_ATTR_WORDS + 3) << 2) +
- resp->count + 4);
+ xdr_align_size(resp->count));
fh_copy(&resp->fh, &argp->fh);
resp->status = nfsd_read(rqstp, &resp->fh, argp->offset,
@@ -921,7 +921,7 @@ static const struct svc_procedure nfsd_procedures3[22] = {
.pc_argzero = sizeof(struct nfsd3_readargs),
.pc_ressize = sizeof(struct nfsd3_readres),
.pc_cachetype = RC_NOCACHE,
- .pc_xdrressize = ST+pAT+4+NFSSVC_MAXBLKSIZE/4,
+ .pc_xdrressize = ST+pAT+3,
.pc_name = "READ",
},
[NFS3PROC_WRITE] = {
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 6dda081eb24c..a95faf726e58 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -219,7 +219,7 @@ nfsd_proc_read(struct svc_rqst *rqstp)
/* Obtain buffer pointer for payload. 19 is 1 word for
* status, 17 words for fattr, and 1 word for the byte count.
*/
- svc_reserve_auth(rqstp, (19<<2) + argp->count + 4);
+ svc_reserve_auth(rqstp, (19<<2) + xdr_align_size(argp->count));
resp->count = argp->count;
fh_copy(&resp->fh, &argp->fh);
@@ -739,7 +739,7 @@ static const struct svc_procedure nfsd_procedures2[18] = {
.pc_argzero = sizeof(struct nfsd_readargs),
.pc_ressize = sizeof(struct nfsd_readres),
.pc_cachetype = RC_NOCACHE,
- .pc_xdrressize = ST+AT+1+NFSSVC_MAXBLKSIZE_V2/4,
+ .pc_xdrressize = ST+AT+1,
.pc_name = "READ",
},
[NFSPROC_WRITECACHE] = {
--
2.49.0
* Re: [PATCH v3 11/11] NFSD: Remove NFSSVC_MAXBLKSIZE from .pc_xdrressize
2025-04-23 15:21 ` [PATCH v3 11/11] NFSD: Remove NFSSVC_MAXBLKSIZE from .pc_xdrressize cel
@ 2025-04-26 4:31 ` NeilBrown
2025-04-26 15:08 ` Chuck Lever
0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2025-04-26 4:31 UTC (permalink / raw)
To: cel
Cc: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
Chuck Lever
On Thu, 24 Apr 2025, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> The value in the .pc_xdrressize field is used to "reserve space in
> the output queue". Relevant only to UDP transports, AFAICT.
>
> The fixed value of NFSSVC_MAXBLKSIZE is added to that field for
> NFSv2 and NFSv3 read requests, even though nfsd_proc_read() is
> already careful to reserve the actual size of the read payload.
> Adding the maximum payload size to .pc_xdrressize seems to be
> unnecessary.
I believe it is necessary.
svc_reserve() (and svc_reserve_auth) only ever reduces the size of the
reservation.
->rq_reserved is initialised to serv->sv_max_mesg, then reduced to
.pc_xdrressize once the proc is known, then possibly reduced further by
code in the proc.
So .pc_xdrressize must be (at least) the largest possible size.
>
> Also, instead of adding a constant 4 bytes for each payload's
> XDR pad, add the actual size of the pad for better accuracy of
> the reservation size.
Could we instead change svc_reserve() to add the pad, and remove all the
manual padding?
But pc_xdrressize is in xdr units - it is multiplied by 4 before passing
to svc_reserve. So these changes don't do what you think they do...
NeilBrown
* Re: [PATCH v3 11/11] NFSD: Remove NFSSVC_MAXBLKSIZE from .pc_xdrressize
2025-04-26 4:31 ` NeilBrown
@ 2025-04-26 15:08 ` Chuck Lever
0 siblings, 0 replies; 14+ messages in thread
From: Chuck Lever @ 2025-04-26 15:08 UTC (permalink / raw)
To: NeilBrown
Cc: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
Chuck Lever
On 4/26/25 12:31 AM, NeilBrown wrote:
> On Thu, 24 Apr 2025, cel@kernel.org wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> The value in the .pc_xdrressize field is used to "reserve space in
>> the output queue". Relevant only to UDP transports, AFAICT.
>>
>> The fixed value of NFSSVC_MAXBLKSIZE is added to that field for
>> NFSv2 and NFSv3 read requests, even though nfsd_proc_read() is
>> already careful to reserve the actual size of the read payload.
>> Adding the maximum payload size to .pc_xdrressize seems to be
>> unnecessary.
>
> I believe it is necessary.
>
> svc_reserve() (and svc_reserve_auth) only ever reduces the size of the
> reservation.
> ->rq_reserved is initialised to serv->sv_max_mesg, then reduced to
> .pc_xdrressize once the proc is known, then possibly reduced further by
> code in the proc.
> So .pc_xdrressize must be (at least) the largest possible size.
Hrm. I instrumented this code path. It seemed to be doing exactly
what I expected. The behavior of xdr_reserve_space() is to /increase/
buffer space reservation, so svc_reserve() seems to behave in the
opposite way, then?
So if the maximum payload size is no longer a constant, these
pc_xdrressize values still have to add the largest payload size that
NFSD can support (probably will be 8MB), not the max-payload-size
setting in effect for that nfsd thread pool.
>> Also, instead of adding a constant 4 bytes for each payload's
>> XDR pad, add the actual size of the pad for better accuracy of
>> the reservation size.
>
> Could we instead change svc_reserve() to add the pad, and remove all the
> manual padding?
The padding is needed only for a few operations: READ and
READLINK, I think that's it. NFSv4 GETATTR might need it too.
IOW, the common case is that no padding is needed.
The largest payload, even if it needs an XDR pad, will be
NFSSVC_MAXBLKSIZE.
> But pc_xdrressize is in xdr units - it is multiplied by 4 before passing
> to svc_reserve. So these changes don't do what you think they do...
Fair enough. I can drop "NFSD: Remove NFSSVC_MAXBLKSIZE from
.pc_xdrressize" and "NFSD: Remove NFSD_BUFSIZE".
I find the name NFSSVC_MAXBLKSIZE confusing, though. NFS is a file
protocol, so I'm not clear what a "block" is in this context.
Also, am I correct that the only transport that cares about this
send buffer reservation is UDP?
--
Chuck Lever
Thread overview: 14+ messages
2025-04-23 15:21 [PATCH v3 00/11] Allocate payload arrays dynamically cel
2025-04-23 15:21 ` [PATCH v3 01/11] sunrpc: Remove backchannel check in svc_init_buffer() cel
2025-04-23 15:21 ` [PATCH v3 02/11] sunrpc: Add a helper to derive maxpages from sv_max_mesg cel
2025-04-23 15:21 ` [PATCH v3 03/11] sunrpc: Replace the rq_pages array with dynamically-allocated memory cel
2025-04-23 15:21 ` [PATCH v3 04/11] sunrpc: Replace the rq_vec " cel
2025-04-23 15:21 ` [PATCH v3 05/11] sunrpc: Replace the rq_bvec " cel
2025-04-23 15:21 ` [PATCH v3 06/11] sunrpc: Adjust size of socket's receive page array dynamically cel
2025-04-23 15:21 ` [PATCH v3 07/11] svcrdma: Adjust the number of RDMA contexts per transport cel
2025-04-23 15:21 ` [PATCH v3 08/11] svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages cel
2025-04-23 15:21 ` [PATCH v3 09/11] svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages cel
2025-04-23 15:21 ` [PATCH v3 10/11] sunrpc: Remove the RPCSVC_MAXPAGES macro cel
2025-04-23 15:21 ` [PATCH v3 11/11] NFSD: Remove NFSSVC_MAXBLKSIZE from .pc_xdrressize cel
2025-04-26 4:31 ` NeilBrown
2025-04-26 15:08 ` Chuck Lever