Linux NFS development
* [PATCH v2 00/10] Allocate payload arrays dynamically
@ 2025-04-19 17:28 cel
  2025-04-19 17:28 ` [PATCH v2 01/10] sunrpc: Remove backchannel check in svc_init_buffer() cel
                   ` (11 more replies)
  0 siblings, 12 replies; 20+ messages in thread
From: cel @ 2025-04-19 17:28 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

In order to make RPCSVC_MAXPAYLOAD larger (or variable in size), we
need to do something clever with the payload arrays embedded in
struct svc_rqst and elsewhere.

My preference is to keep these arrays allocated all the time because
allocating them on demand increases the risk of a memory allocation
failure during a large I/O. This is a quick-and-dirty approach that
might be replaced once NFSD is converted to use large folios.

The downside of this design choice is that it pins a few pages per
NFSD thread (which is already the case today). Note, however, that
because RPCSVC_MAXPAGES is 259, each of these arrays is just over a
page in size, so power-of-2 allocator round-up wastes a significant
amount of memory beyond the end of each array. That waste grows
worse as the MAXPAGES value is doubled or quadrupled.

This series also addresses similar issues in the socket and RDMA
transports.

Chuck Lever (10):
  sunrpc: Remove backchannel check in svc_init_buffer()
  sunrpc: Add a helper to derive maxpages from sv_max_mesg
  sunrpc: Replace the rq_pages array with dynamically-allocated memory
  sunrpc: Replace the rq_vec array with dynamically-allocated memory
  sunrpc: Replace the rq_bvec array with dynamically-allocated memory
  sunrpc: Adjust size of socket's receive page array dynamically
  svcrdma: Adjust the number of RDMA contexts per transport
  svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages
  svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages
  sunrpc: Remove the RPCSVC_MAXPAGES macro

 fs/nfsd/nfs4proc.c                       |  1 -
 fs/nfsd/vfs.c                            |  2 +-
 include/linux/sunrpc/svc.h               | 31 +++++++++-----
 include/linux/sunrpc/svc_rdma.h          |  6 ++-
 include/linux/sunrpc/svcsock.h           |  4 +-
 net/sunrpc/svc.c                         | 51 +++++++++++++++---------
 net/sunrpc/svc_xprt.c                    | 10 +----
 net/sunrpc/svcsock.c                     | 15 ++++---
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  8 +++-
 net/sunrpc/xprtrdma/svc_rdma_rw.c        |  2 +-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    | 16 ++++++--
 net/sunrpc/xprtrdma/svc_rdma_transport.c |  2 +-
 12 files changed, 93 insertions(+), 55 deletions(-)

-- 
2.49.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 01/10] sunrpc: Remove backchannel check in svc_init_buffer()
  2025-04-19 17:28 [PATCH v2 00/10] Allocate payload arrays dynamically cel
@ 2025-04-19 17:28 ` cel
  2025-04-21 12:16   ` Jeff Layton
  2025-04-19 17:28 ` [PATCH v2 02/10] sunrpc: Add a helper to derive maxpages from sv_max_mesg cel
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: cel @ 2025-04-19 17:28 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

A backchannel service might use forechannel send and receive
buffers, but svc_recv() still calls svc_alloc_arg() on backchannel
svc_rqsts, so it will populate the svc_rqst's rq_pages[] array
anyway. The "is backchannel" check in svc_init_buffer() saves us
nothing.

In a moment, I plan to replace the rq_pages[] array with a
dynamically-allocated piece of memory, and svc_init_buffer() is
where that memory will get allocated. Backchannel requests actually
do use that array, so it has to be available to handle those
requests without a segfault.

XXX: Or, make svc_alloc_arg() ignore backchannel requests too?
     Could set rqstp->rq_maxpages to zero.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/svc.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index e7f9c295d13c..8ce3e6b3df6a 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -640,10 +640,6 @@ svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node)
 {
 	unsigned long pages, ret;
 
-	/* bc_xprt uses fore channel allocated buffers */
-	if (svc_is_backchannel(rqstp))
-		return true;
-
 	pages = size / PAGE_SIZE + 1; /* extra page as we hold both request and reply.
 				       * We assume one is at most one page
 				       */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 02/10] sunrpc: Add a helper to derive maxpages from sv_max_mesg
  2025-04-19 17:28 [PATCH v2 00/10] Allocate payload arrays dynamically cel
  2025-04-19 17:28 ` [PATCH v2 01/10] sunrpc: Remove backchannel check in svc_init_buffer() cel
@ 2025-04-19 17:28 ` cel
  2025-04-22 20:48   ` NeilBrown
  2025-04-19 17:28 ` [PATCH v2 03/10] sunrpc: Replace the rq_pages array with dynamically-allocated memory cel
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: cel @ 2025-04-19 17:28 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

This page count is to be used to allocate various arrays of pages,
bio_vecs, and kvecs, replacing the fixed RPCSVC_MAXPAGES value.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 74658cca0f38..5b879c31d7b8 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -159,6 +159,18 @@ extern u32 svc_max_payload(const struct svc_rqst *rqstp);
 #define RPCSVC_MAXPAGES		((RPCSVC_MAXPAYLOAD+PAGE_SIZE-1)/PAGE_SIZE \
 				+ 2 + 1)
 
+/**
+ * svc_serv_maxpages - maximum pages/kvecs needed for one RPC message
+ * @serv: RPC service context
+ *
+ * Returns a count of pages or vectors that can hold the maximum
+ * size RPC message for @serv.
+ */
+static inline unsigned long svc_serv_maxpages(const struct svc_serv *serv)
+{
+	return ((serv->sv_max_mesg + PAGE_SIZE - 1) >> PAGE_SHIFT) + 2 + 1;
+}
+
 /*
  * The context of a single thread, including the request currently being
  * processed.
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 03/10] sunrpc: Replace the rq_pages array with dynamically-allocated memory
  2025-04-19 17:28 [PATCH v2 00/10] Allocate payload arrays dynamically cel
  2025-04-19 17:28 ` [PATCH v2 01/10] sunrpc: Remove backchannel check in svc_init_buffer() cel
  2025-04-19 17:28 ` [PATCH v2 02/10] sunrpc: Add a helper to derive maxpages from sv_max_mesg cel
@ 2025-04-19 17:28 ` cel
  2025-04-21 12:19   ` Jeff Layton
  2025-04-19 17:28 ` [PATCH v2 04/10] sunrpc: Replace the rq_vec " cel
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: cel @ 2025-04-19 17:28 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

As a step towards making NFSD's maximum rsize and wsize variable at
run-time, replace the fixed-size rq_pages[] array in struct svc_rqst
with a chunk of dynamically-allocated memory.

On a system with 8-byte pointers and 4KB pages, pahole reports that
the rq_pages[] array is 2080 bytes. Replacing it with a single
pointer reduces the size of struct svc_rqst to just over 9500 bytes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc.h        |  3 ++-
 net/sunrpc/svc.c                  | 34 ++++++++++++++++++-------------
 net/sunrpc/svc_xprt.c             | 10 +--------
 net/sunrpc/xprtrdma/svc_rdma_rw.c |  2 +-
 4 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 5b879c31d7b8..96ac12dbb04d 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -200,7 +200,8 @@ struct svc_rqst {
 	struct xdr_stream	rq_res_stream;
 	struct page		*rq_scratch_page;
 	struct xdr_buf		rq_res;
-	struct page		*rq_pages[RPCSVC_MAXPAGES + 1];
+	unsigned long		rq_maxpages;	/* num of entries in rq_pages */
+	struct page *		*rq_pages;
 	struct page *		*rq_respages;	/* points into rq_pages */
 	struct page *		*rq_next_page; /* next reply page to use */
 	struct page *		*rq_page_end;  /* one past the last page */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 8ce3e6b3df6a..682e11c9be36 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -636,20 +636,25 @@ svc_destroy(struct svc_serv **servp)
 EXPORT_SYMBOL_GPL(svc_destroy);
 
 static bool
-svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node)
+svc_init_buffer(struct svc_rqst *rqstp, const struct svc_serv *serv, int node)
 {
-	unsigned long pages, ret;
+	unsigned long ret;
 
-	pages = size / PAGE_SIZE + 1; /* extra page as we hold both request and reply.
-				       * We assume one is at most one page
-				       */
-	WARN_ON_ONCE(pages > RPCSVC_MAXPAGES);
-	if (pages > RPCSVC_MAXPAGES)
-		pages = RPCSVC_MAXPAGES;
+	/* Add an extra page, as rq_pages holds both request and reply.
+	 * We assume one of those is at most one page.
+	 */
+	rqstp->rq_maxpages = svc_serv_maxpages(serv) + 1;
 
-	ret = alloc_pages_bulk_node(GFP_KERNEL, node, pages,
+	/* rq_pages' last entry is NULL for historical reasons. */
+	rqstp->rq_pages = kcalloc_node(rqstp->rq_maxpages + 1,
+				       sizeof(struct page *),
+				       GFP_KERNEL, node);
+	if (!rqstp->rq_pages)
+		return false;
+
+	ret = alloc_pages_bulk_node(GFP_KERNEL, node, rqstp->rq_maxpages,
 				    rqstp->rq_pages);
-	return ret == pages;
+	return ret == rqstp->rq_maxpages;
 }
 
 /*
@@ -658,11 +663,12 @@ svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node)
 static void
 svc_release_buffer(struct svc_rqst *rqstp)
 {
-	unsigned int i;
+	unsigned long i;
 
-	for (i = 0; i < ARRAY_SIZE(rqstp->rq_pages); i++)
+	for (i = 0; i < rqstp->rq_maxpages; i++)
 		if (rqstp->rq_pages[i])
 			put_page(rqstp->rq_pages[i]);
+	kfree(rqstp->rq_pages);
 }
 
 static void
@@ -704,7 +710,7 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
 	if (!rqstp->rq_resp)
 		goto out_enomem;
 
-	if (!svc_init_buffer(rqstp, serv->sv_max_mesg, node))
+	if (!svc_init_buffer(rqstp, serv, node))
 		goto out_enomem;
 
 	rqstp->rq_err = -EAGAIN; /* No error yet */
@@ -896,7 +902,7 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads);
 bool svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
 {
 	struct page **begin = rqstp->rq_pages;
-	struct page **end = &rqstp->rq_pages[RPCSVC_MAXPAGES];
+	struct page **end = &rqstp->rq_pages[rqstp->rq_maxpages];
 
 	if (unlikely(rqstp->rq_next_page < begin || rqstp->rq_next_page > end)) {
 		trace_svc_replace_page_err(rqstp);
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index ae25405d8bd2..23547ed25269 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -651,18 +651,10 @@ static void svc_check_conn_limits(struct svc_serv *serv)
 
 static bool svc_alloc_arg(struct svc_rqst *rqstp)
 {
-	struct svc_serv *serv = rqstp->rq_server;
 	struct xdr_buf *arg = &rqstp->rq_arg;
 	unsigned long pages, filled, ret;
 
-	pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
-	if (pages > RPCSVC_MAXPAGES) {
-		pr_warn_once("svc: warning: pages=%lu > RPCSVC_MAXPAGES=%lu\n",
-			     pages, RPCSVC_MAXPAGES);
-		/* use as many pages as possible */
-		pages = RPCSVC_MAXPAGES;
-	}
-
+	pages = rqstp->rq_maxpages;
 	for (filled = 0; filled < pages; filled = ret) {
 		ret = alloc_pages_bulk(GFP_KERNEL, pages, rqstp->rq_pages);
 		if (ret > filled)
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 40797114d50a..661b3fe2779f 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -765,7 +765,7 @@ static int svc_rdma_build_read_segment(struct svc_rqst *rqstp,
 		}
 		len -= seg_len;
 
-		if (len && ((head->rc_curpage + 1) > ARRAY_SIZE(rqstp->rq_pages)))
+		if (len && ((head->rc_curpage + 1) > rqstp->rq_maxpages))
 			goto out_overrun;
 	}
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 04/10] sunrpc: Replace the rq_vec array with dynamically-allocated memory
  2025-04-19 17:28 [PATCH v2 00/10] Allocate payload arrays dynamically cel
                   ` (2 preceding siblings ...)
  2025-04-19 17:28 ` [PATCH v2 03/10] sunrpc: Replace the rq_pages array with dynamically-allocated memory cel
@ 2025-04-19 17:28 ` cel
  2025-04-21 12:22   ` Jeff Layton
  2025-04-19 17:28 ` [PATCH v2 05/10] sunrpc: Replace the rq_bvec " cel
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: cel @ 2025-04-19 17:28 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

As a step towards making NFSD's maximum rsize and wsize variable at
run-time, replace the fixed-size rq_vec[] array in struct svc_rqst
with a chunk of dynamically-allocated memory.

The rq_vec array is sized assuming request processing will need at
most one kvec per page in a maximum-sized RPC message.

On a system with 8-byte pointers and 4KB pages, pahole reports that
the rq_vec[] array is 4144 bytes. Replacing it with a single
pointer reduces the size of struct svc_rqst to about 5400 bytes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4proc.c         | 1 -
 fs/nfsd/vfs.c              | 2 +-
 include/linux/sunrpc/svc.h | 2 +-
 net/sunrpc/svc.c           | 8 +++++++-
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index b397246dae7b..d1be58b557d1 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1228,7 +1228,6 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	write->wr_how_written = write->wr_stable_how;
 
 	nvecs = svc_fill_write_vector(rqstp, &write->wr_payload);
-	WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec));
 
 	status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf,
 				write->wr_offset, rqstp->rq_vec, nvecs, &cnt,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 9abdc4b75813..4eaac3aa7e15 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1094,7 +1094,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		++v;
 		base = 0;
 	}
-	WARN_ON_ONCE(v > ARRAY_SIZE(rqstp->rq_vec));
+	WARN_ON_ONCE(v > rqstp->rq_maxpages);
 
 	trace_nfsd_read_vector(rqstp, fhp, offset, *count);
 	iov_iter_kvec(&iter, ITER_DEST, rqstp->rq_vec, v, *count);
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 96ac12dbb04d..72d016772711 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -207,7 +207,7 @@ struct svc_rqst {
 	struct page *		*rq_page_end;  /* one past the last page */
 
 	struct folio_batch	rq_fbatch;
-	struct kvec		rq_vec[RPCSVC_MAXPAGES]; /* generally useful.. */
+	struct kvec		*rq_vec;
 	struct bio_vec		rq_bvec[RPCSVC_MAXPAGES];
 
 	__be32			rq_xid;		/* transmission id */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 682e11c9be36..5808d4b97547 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -675,6 +675,7 @@ static void
 svc_rqst_free(struct svc_rqst *rqstp)
 {
 	folio_batch_release(&rqstp->rq_fbatch);
+	kfree(rqstp->rq_vec);
 	svc_release_buffer(rqstp);
 	if (rqstp->rq_scratch_page)
 		put_page(rqstp->rq_scratch_page);
@@ -713,6 +714,11 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
 	if (!svc_init_buffer(rqstp, serv, node))
 		goto out_enomem;
 
+	rqstp->rq_vec = kcalloc_node(rqstp->rq_maxpages, sizeof(struct kvec),
+				      GFP_KERNEL, node);
+	if (!rqstp->rq_vec)
+		goto out_enomem;
+
 	rqstp->rq_err = -EAGAIN; /* No error yet */
 
 	serv->sv_nrthreads += 1;
@@ -1750,7 +1756,7 @@ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp,
 		++pages;
 	}
 
-	WARN_ON_ONCE(i > ARRAY_SIZE(rqstp->rq_vec));
+	WARN_ON_ONCE(i > rqstp->rq_maxpages);
 	return i;
 }
 EXPORT_SYMBOL_GPL(svc_fill_write_vector);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 05/10] sunrpc: Replace the rq_bvec array with dynamically-allocated memory
  2025-04-19 17:28 [PATCH v2 00/10] Allocate payload arrays dynamically cel
                   ` (3 preceding siblings ...)
  2025-04-19 17:28 ` [PATCH v2 04/10] sunrpc: Replace the rq_vec " cel
@ 2025-04-19 17:28 ` cel
  2025-04-19 17:28 ` [PATCH v2 06/10] sunrpc: Adjust size of socket's receive page array dynamically cel
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: cel @ 2025-04-19 17:28 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

As a step towards making NFSD's maximum rsize and wsize variable at
run-time, replace the fixed-size rq_bvec[] array in struct svc_rqst
with a chunk of dynamically-allocated memory.

The rq_bvec[] array contains enough bio_vecs to handle each page in
a maximum size RPC message.

On a system with 8-byte pointers and 4KB pages, pahole reports that
the rq_bvec[] array is 4144 bytes. Replacing it with a single
pointer reduces the size of struct svc_rqst to about 1240 bytes, or
just over 19 cache lines.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc.h | 2 +-
 net/sunrpc/svc.c           | 7 +++++++
 net/sunrpc/svcsock.c       | 7 +++----
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 72d016772711..a524564094d6 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -208,7 +208,7 @@ struct svc_rqst {
 
 	struct folio_batch	rq_fbatch;
 	struct kvec		*rq_vec;
-	struct bio_vec		rq_bvec[RPCSVC_MAXPAGES];
+	struct bio_vec		*rq_bvec;
 
 	__be32			rq_xid;		/* transmission id */
 	u32			rq_prog;	/* program number */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 5808d4b97547..0741e506c35c 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -675,6 +675,7 @@ static void
 svc_rqst_free(struct svc_rqst *rqstp)
 {
 	folio_batch_release(&rqstp->rq_fbatch);
+	kfree(rqstp->rq_bvec);
 	kfree(rqstp->rq_vec);
 	svc_release_buffer(rqstp);
 	if (rqstp->rq_scratch_page)
@@ -719,6 +720,12 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
 	if (!rqstp->rq_vec)
 		goto out_enomem;
 
+	rqstp->rq_bvec = kcalloc_node(rqstp->rq_maxpages,
+				      sizeof(struct bio_vec),
+				      GFP_KERNEL, node);
+	if (!rqstp->rq_bvec)
+		goto out_enomem;
+
 	rqstp->rq_err = -EAGAIN; /* No error yet */
 
 	serv->sv_nrthreads += 1;
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 72e5a01df3d3..c846341bb08c 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -713,8 +713,7 @@ static int svc_udp_sendto(struct svc_rqst *rqstp)
 	if (svc_xprt_is_dead(xprt))
 		goto out_notconn;
 
-	count = xdr_buf_to_bvec(rqstp->rq_bvec,
-				ARRAY_SIZE(rqstp->rq_bvec), xdr);
+	count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, xdr);
 
 	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
 		      count, rqstp->rq_res.len);
@@ -1219,8 +1218,8 @@ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
 	memcpy(buf, &marker, sizeof(marker));
 	bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker));
 
-	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1,
-				ARRAY_SIZE(rqstp->rq_bvec) - 1, &rqstp->rq_res);
+	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages,
+				&rqstp->rq_res);
 
 	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
 		      1 + count, sizeof(marker) + rqstp->rq_res.len);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 06/10] sunrpc: Adjust size of socket's receive page array dynamically
  2025-04-19 17:28 [PATCH v2 00/10] Allocate payload arrays dynamically cel
                   ` (4 preceding siblings ...)
  2025-04-19 17:28 ` [PATCH v2 05/10] sunrpc: Replace the rq_bvec " cel
@ 2025-04-19 17:28 ` cel
  2025-04-19 17:28 ` [PATCH v2 07/10] svcrdma: Adjust the number of RDMA contexts per transport cel
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: cel @ 2025-04-19 17:28 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

As a step towards making NFSD's maximum rsize and wsize variable at
run-time, make sk_pages a flexible array.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svcsock.h | 4 +++-
 net/sunrpc/svcsock.c           | 8 ++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index bf45d9e8492a..963bbe251e52 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -40,7 +40,9 @@ struct svc_sock {
 
 	struct completion	sk_handshake_done;
 
-	struct page *		sk_pages[RPCSVC_MAXPAGES];	/* received data */
+	/* received data */
+	unsigned long		sk_maxpages;
+	struct page *		sk_pages[] __counted_by(sk_maxpages);
 };
 
 static inline u32 svc_sock_reclen(struct svc_sock *svsk)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index c846341bb08c..5432e4a2f858 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1339,7 +1339,8 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
 		svsk->sk_marker = xdr_zero;
 		svsk->sk_tcplen = 0;
 		svsk->sk_datalen = 0;
-		memset(&svsk->sk_pages[0], 0, sizeof(svsk->sk_pages));
+		memset(&svsk->sk_pages[0], 0,
+		       svsk->sk_maxpages * sizeof(struct page *));
 
 		tcp_sock_set_nodelay(sk);
 
@@ -1378,10 +1379,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 	struct svc_sock	*svsk;
 	struct sock	*inet;
 	int		pmap_register = !(flags & SVC_SOCK_ANONYMOUS);
+	unsigned long	pages;
 
-	svsk = kzalloc(sizeof(*svsk), GFP_KERNEL);
+	pages = svc_serv_maxpages(serv);
+	svsk = kzalloc(struct_size(svsk, sk_pages, pages), GFP_KERNEL);
 	if (!svsk)
 		return ERR_PTR(-ENOMEM);
+	svsk->sk_maxpages = pages;
 
 	inet = sock->sk;
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 07/10] svcrdma: Adjust the number of RDMA contexts per transport
  2025-04-19 17:28 [PATCH v2 00/10] Allocate payload arrays dynamically cel
                   ` (5 preceding siblings ...)
  2025-04-19 17:28 ` [PATCH v2 06/10] sunrpc: Adjust size of socket's receive page array dynamically cel
@ 2025-04-19 17:28 ` cel
  2025-04-19 17:28 ` [PATCH v2 08/10] svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages cel
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: cel @ 2025-04-19 17:28 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

The RDMA accept code requests enough RDMA contexts to read
and write one page per maximum size RPC message, plus one
context that is getting recycled for the next RPC.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index aca8bdf65d72..22687533c3e9 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -467,7 +467,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	 * progress even if the client is using one rkey per page
 	 * in each Read chunk.
 	 */
-	ctxts = 3 * RPCSVC_MAXPAGES;
+	ctxts = 3 * svc_serv_maxpages(xprt->xpt_server);
 	newxprt->sc_sq_depth = rq_depth + ctxts;
 	if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr)
 		newxprt->sc_sq_depth = dev->attrs.max_qp_wr;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 08/10] svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages
  2025-04-19 17:28 [PATCH v2 00/10] Allocate payload arrays dynamically cel
                   ` (6 preceding siblings ...)
  2025-04-19 17:28 ` [PATCH v2 07/10] svcrdma: Adjust the number of RDMA contexts per transport cel
@ 2025-04-19 17:28 ` cel
  2025-04-19 17:28 ` [PATCH v2 09/10] svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages cel
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: cel @ 2025-04-19 17:28 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Allow allocation of more entries in the rc_pages[] array when the
maximum size of an RPC message is increased.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h         | 3 ++-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 8 ++++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 619fc0bd837a..1016f2feddc4 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -202,7 +202,8 @@ struct svc_rdma_recv_ctxt {
 	struct svc_rdma_pcl	rc_reply_pcl;
 
 	unsigned int		rc_page_count;
-	struct page		*rc_pages[RPCSVC_MAXPAGES];
+	unsigned long		rc_maxpages;
+	struct page		*rc_pages[] __counted_by(rc_maxpages);
 };
 
 /*
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 292022f0976e..e7e4a39ca6c6 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -120,12 +120,16 @@ svc_rdma_recv_ctxt_alloc(struct svcxprt_rdma *rdma)
 {
 	int node = ibdev_to_node(rdma->sc_cm_id->device);
 	struct svc_rdma_recv_ctxt *ctxt;
+	unsigned long pages;
 	dma_addr_t addr;
 	void *buffer;
 
-	ctxt = kzalloc_node(sizeof(*ctxt), GFP_KERNEL, node);
+	pages = svc_serv_maxpages(rdma->sc_xprt.xpt_server);
+	ctxt = kzalloc_node(struct_size(ctxt, rc_pages, pages),
+			    GFP_KERNEL, node);
 	if (!ctxt)
 		goto fail0;
+	ctxt->rc_maxpages = pages;
 	buffer = kmalloc_node(rdma->sc_max_req_size, GFP_KERNEL, node);
 	if (!buffer)
 		goto fail1;
@@ -497,7 +501,7 @@ static bool xdr_check_write_chunk(struct svc_rdma_recv_ctxt *rctxt)
 	 * a computation, perform a simple range check. This is an
 	 * arbitrary but sensible limit (ie, not architectural).
 	 */
-	if (unlikely(segcount > RPCSVC_MAXPAGES))
+	if (unlikely(segcount > rctxt->rc_maxpages))
 		return false;
 
 	p = xdr_inline_decode(&rctxt->rc_stream,
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 09/10] svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages
  2025-04-19 17:28 [PATCH v2 00/10] Allocate payload arrays dynamically cel
                   ` (7 preceding siblings ...)
  2025-04-19 17:28 ` [PATCH v2 08/10] svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages cel
@ 2025-04-19 17:28 ` cel
  2025-04-19 17:28 ` [PATCH v2 10/10] sunrpc: Remove the RPCSVC_MAXPAGES macro cel
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: cel @ 2025-04-19 17:28 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Allow allocation of more entries in the sc_pages[] array when the
maximum size of an RPC message is increased.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h       |  3 ++-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c | 16 +++++++++++++---
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 1016f2feddc4..22704c2e5b9b 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -245,7 +245,8 @@ struct svc_rdma_send_ctxt {
 	void			*sc_xprt_buf;
 	int			sc_page_count;
 	int			sc_cur_sge_no;
-	struct page		*sc_pages[RPCSVC_MAXPAGES];
+	unsigned long		sc_maxpages;
+	struct page		**sc_pages;
 	struct ib_sge		sc_sges[];
 };
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 96154a2367a1..914cd263c2f1 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -118,6 +118,7 @@ svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma)
 {
 	int node = ibdev_to_node(rdma->sc_cm_id->device);
 	struct svc_rdma_send_ctxt *ctxt;
+	unsigned long pages;
 	dma_addr_t addr;
 	void *buffer;
 	int i;
@@ -126,13 +127,19 @@ svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma)
 			    GFP_KERNEL, node);
 	if (!ctxt)
 		goto fail0;
+	pages = svc_serv_maxpages(rdma->sc_xprt.xpt_server);
+	ctxt->sc_pages = kcalloc_node(pages, sizeof(struct page *),
+				      GFP_KERNEL, node);
+	if (!ctxt->sc_pages)
+		goto fail1;
+	ctxt->sc_maxpages = pages;
 	buffer = kmalloc_node(rdma->sc_max_req_size, GFP_KERNEL, node);
 	if (!buffer)
-		goto fail1;
+		goto fail2;
 	addr = ib_dma_map_single(rdma->sc_pd->device, buffer,
 				 rdma->sc_max_req_size, DMA_TO_DEVICE);
 	if (ib_dma_mapping_error(rdma->sc_pd->device, addr))
-		goto fail2;
+		goto fail3;
 
 	svc_rdma_send_cid_init(rdma, &ctxt->sc_cid);
 
@@ -151,8 +158,10 @@ svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma)
 		ctxt->sc_sges[i].lkey = rdma->sc_pd->local_dma_lkey;
 	return ctxt;
 
-fail2:
+fail3:
 	kfree(buffer);
+fail2:
+	kfree(ctxt->sc_pages);
 fail1:
 	kfree(ctxt);
 fail0:
@@ -176,6 +185,7 @@ void svc_rdma_send_ctxts_destroy(struct svcxprt_rdma *rdma)
 				    rdma->sc_max_req_size,
 				    DMA_TO_DEVICE);
 		kfree(ctxt->sc_xprt_buf);
+		kfree(ctxt->sc_pages);
 		kfree(ctxt);
 	}
 }
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 10/10] sunrpc: Remove the RPCSVC_MAXPAGES macro
  2025-04-19 17:28 [PATCH v2 00/10] Allocate payload arrays dynamically cel
                   ` (8 preceding siblings ...)
  2025-04-19 17:28 ` [PATCH v2 09/10] svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages cel
@ 2025-04-19 17:28 ` cel
  2025-04-19 17:54 ` [PATCH v2 00/10] Allocate payload arrays dynamically Chuck Lever
  2025-04-21 12:28 ` Jeff Layton
  11 siblings, 0 replies; 20+ messages in thread
From: cel @ 2025-04-19 17:28 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

It is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc.h | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index a524564094d6..f12abb476e07 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -150,14 +150,7 @@ extern u32 svc_max_payload(const struct svc_rqst *rqstp);
  * list.  xdr_buf.tail points to the end of the first page.
  * This assumes that the non-page part of an rpc reply will fit
  * in a page - NFSd ensures this.  lockd also has no trouble.
- *
- * Each request/reply pair can have at most one "payload", plus two pages,
- * one for the request, and one for the reply.
- * We using ->sendfile to return read data, we might need one extra page
- * if the request is not page-aligned.  So add another '1'.
  */
-#define RPCSVC_MAXPAGES		((RPCSVC_MAXPAYLOAD+PAGE_SIZE-1)/PAGE_SIZE \
-				+ 2 + 1)
 
 /**
  * svc_serv_maxpages - maximum pages/kvecs needed for one RPC message
@@ -165,6 +158,11 @@ extern u32 svc_max_payload(const struct svc_rqst *rqstp);
  *
  * Returns a count of pages or vectors that can hold the maximum
  * size RPC message for @serv.
+ *
+ * Each request/reply pair can have at most one "payload", plus two
+ * pages, one for the request, and one for the reply.
+ * nfsd_splice_actor() might need an extra page when a READ payload
+ * is not page-aligned.
  */
 static inline unsigned long svc_serv_maxpages(const struct svc_serv *serv)
 {
-- 
2.49.0



* Re: [PATCH v2 00/10] Allocate payload arrays dynamically
  2025-04-19 17:28 [PATCH v2 00/10] Allocate payload arrays dynamically cel
                   ` (9 preceding siblings ...)
  2025-04-19 17:28 ` [PATCH v2 10/10] sunrpc: Remove the RPCSVC_MAXPAGES macro cel
@ 2025-04-19 17:54 ` Chuck Lever
  2025-04-21 12:28 ` Jeff Layton
  11 siblings, 0 replies; 20+ messages in thread
From: Chuck Lever @ 2025-04-19 17:54 UTC (permalink / raw)
  To: cel, NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo,
	Tom Talpey
  Cc: linux-nfs

On 4/19/25 1:28 PM, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> In order to make RPCSVC_MAXPAYLOAD larger (or variable in size), we
> need to do something clever with the payload arrays embedded in
> struct svc_rqst and elsewhere.
> 
> My preference is to keep these arrays allocated all the time because
> allocating them on demand increases the risk of a memory allocation
> failure during a large I/O. This is a quick-and-dirty approach that
> might be replaced once NFSD is converted to use large folios.
> 
> The downside of this design choice is that it pins a few pages per
> NFSD thread (and that's the current situation already). But note
> that because RPCSVC_MAXPAGES is 259, each array is just over a page
> in size, making the allocation waste quite a bit of memory beyond
> the end of the array due to power-of-2 allocator round up. This gets
> worse as the MAXPAGES value is doubled or quadrupled.
> 
> This series also addresses similar issues in the socket and RDMA
> transports.

I forgot to note that this series has other benefits besides making
these arrays flexible in size. The reduction in size of struct
svc_rqst to under half a page is kind of a big deal.


> Chuck Lever (9):
>   sunrpc: Remove backchannel check in svc_init_buffer()
>   sunrpc: Add a helper to derive maxpages from sv_max_mesg
>   sunrpc: Replace the rq_pages array with dynamically-allocated memory
>   sunrpc: Replace the rq_vec array with dynamically-allocated memory
>   sunrpc: Replace the rq_bvec array with dynamically-allocated memory
>   sunrpc: Adjust size of socket's receive page array dynamically
>   svcrdma: Adjust the number of RDMA contexts per transport
>   svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages
>   svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages
> 
>  fs/nfsd/nfs4proc.c                       |  1 -
>  fs/nfsd/vfs.c                            |  2 +-
>  include/linux/sunrpc/svc.h               | 19 +++++++--
>  include/linux/sunrpc/svc_rdma.h          |  6 ++-
>  include/linux/sunrpc/svcsock.h           |  4 +-
>  net/sunrpc/svc.c                         | 51 +++++++++++++++---------
>  net/sunrpc/svc_xprt.c                    | 10 +----
>  net/sunrpc/svcsock.c                     | 15 ++++---
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  8 +++-
>  net/sunrpc/xprtrdma/svc_rdma_rw.c        |  2 +-
>  net/sunrpc/xprtrdma/svc_rdma_sendto.c    | 16 ++++++--
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |  2 +-
>  12 files changed, 88 insertions(+), 48 deletions(-)
> 


-- 
Chuck Lever


* Re: [PATCH v2 01/10] sunrpc: Remove backchannel check in svc_init_buffer()
  2025-04-19 17:28 ` [PATCH v2 01/10] sunrpc: Remove backchannel check in svc_init_buffer() cel
@ 2025-04-21 12:16   ` Jeff Layton
  2025-04-21 14:59     ` Chuck Lever
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2025-04-21 12:16 UTC (permalink / raw)
  To: cel, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

On Sat, 2025-04-19 at 13:28 -0400, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> A backchannel service might use forechannel send and receive
> buffers, but svc_recv() still calls svc_alloc_arg() on backchannel
> svc_rqsts, so it will populate the svc_rqst's rq_pages[] array
> anyway. The "is backchannel" check in svc_init_buffer() saves us
> nothing.
> 
> In a moment, I plan to replace the rq_pages[] array with a
> dynamically-allocated piece of memory, and svc_init_buffer() is
> where that memory will get allocated. Backchannel requests actually
> do use that array, so it has to be available to handle those
> requests without a segfault.
> 
> XXX: Or, make svc_alloc_arg() ignore backchannel requests too?
>      Could set rqstp->rq_maxpages to zero.
> 

Maybe I'm confused here, but the backchannel still needs some pages to
do its thing? I guess the main change here is that the pages are
allocated at svc_create time instead of waiting until later?

> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/svc.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index e7f9c295d13c..8ce3e6b3df6a 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -640,10 +640,6 @@ svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node)
>  {
>  	unsigned long pages, ret;
>  
> -	/* bc_xprt uses fore channel allocated buffers */
> -	if (svc_is_backchannel(rqstp))
> -		return true;
> -
>  	pages = size / PAGE_SIZE + 1; /* extra page as we hold both request and reply.
>  				       * We assume one is at most one page
>  				       */

Acked-by: Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v2 03/10] sunrpc: Replace the rq_pages array with dynamically-allocated memory
  2025-04-19 17:28 ` [PATCH v2 03/10] sunrpc: Replace the rq_pages array with dynamically-allocated memory cel
@ 2025-04-21 12:19   ` Jeff Layton
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2025-04-21 12:19 UTC (permalink / raw)
  To: cel, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

On Sat, 2025-04-19 at 13:28 -0400, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> As a step towards making NFSD's maximum rsize and wsize variable at
> run-time, replace the fixed-size rq_vec[] array in struct svc_rqst
> with a chunk of dynamically-allocated memory.
> 
> On a system with 8-byte pointers and 4KB pages, pahole reports that
> the rq_pages[] array is 2080 bytes. Replacing it with a single
> pointer reduces the size of struct svc_rqst to just over 9500 bytes.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  include/linux/sunrpc/svc.h        |  3 ++-
>  net/sunrpc/svc.c                  | 34 ++++++++++++++++++-------------
>  net/sunrpc/svc_xprt.c             | 10 +--------
>  net/sunrpc/xprtrdma/svc_rdma_rw.c |  2 +-
>  4 files changed, 24 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 5b879c31d7b8..96ac12dbb04d 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -200,7 +200,8 @@ struct svc_rqst {
>  	struct xdr_stream	rq_res_stream;
>  	struct page		*rq_scratch_page;
>  	struct xdr_buf		rq_res;
> -	struct page		*rq_pages[RPCSVC_MAXPAGES + 1];
> +	unsigned long		rq_maxpages;	/* num of entries in rq_pages */
> +	struct page *		*rq_pages;
>  	struct page *		*rq_respages;	/* points into rq_pages */
>  	struct page *		*rq_next_page; /* next reply page to use */
>  	struct page *		*rq_page_end;  /* one past the last page */
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 8ce3e6b3df6a..682e11c9be36 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -636,20 +636,25 @@ svc_destroy(struct svc_serv **servp)
>  EXPORT_SYMBOL_GPL(svc_destroy);
>  
>  static bool
> -svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node)
> +svc_init_buffer(struct svc_rqst *rqstp, const struct svc_serv *serv, int node)
>  {
> -	unsigned long pages, ret;
> +	unsigned long ret;
>  
> -	pages = size / PAGE_SIZE + 1; /* extra page as we hold both request and reply.
> -				       * We assume one is at most one page
> -				       */
> -	WARN_ON_ONCE(pages > RPCSVC_MAXPAGES);
> -	if (pages > RPCSVC_MAXPAGES)
> -		pages = RPCSVC_MAXPAGES;
> +	/* Add an extra page, as rq_pages holds both request and reply.
> +	 * We assume one of those is at most one page.
> +	 */
> +	rqstp->rq_maxpages = svc_serv_maxpages(serv) + 1;
>  
> -	ret = alloc_pages_bulk_node(GFP_KERNEL, node, pages,
> +	/* rq_pages' last entry is NULL for historical reasons. */
> +	rqstp->rq_pages = kcalloc_node(rqstp->rq_maxpages + 1,
> +				       sizeof(struct page *),
> +				       GFP_KERNEL, node);
> +	if (!rqstp->rq_pages)
> +		return false;
> +
> +	ret = alloc_pages_bulk_node(GFP_KERNEL, node, rqstp->rq_maxpages,
>  				    rqstp->rq_pages);
> -	return ret == pages;
> +	return ret == rqstp->rq_maxpages;
>  }
>  
>  /*
> @@ -658,11 +663,12 @@ svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node)
>  static void
>  svc_release_buffer(struct svc_rqst *rqstp)
>  {
> -	unsigned int i;
> +	unsigned long i;
>  
> -	for (i = 0; i < ARRAY_SIZE(rqstp->rq_pages); i++)
> +	for (i = 0; i < rqstp->rq_maxpages; i++)
>  		if (rqstp->rq_pages[i])
>  			put_page(rqstp->rq_pages[i]);
> +	kfree(rqstp->rq_pages);
>  }
>  
>  static void
> @@ -704,7 +710,7 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
>  	if (!rqstp->rq_resp)
>  		goto out_enomem;
>  
> -	if (!svc_init_buffer(rqstp, serv->sv_max_mesg, node))
> +	if (!svc_init_buffer(rqstp, serv, node))
>  		goto out_enomem;
>  
>  	rqstp->rq_err = -EAGAIN; /* No error yet */
> @@ -896,7 +902,7 @@ EXPORT_SYMBOL_GPL(svc_set_num_threads);
>  bool svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
>  {
>  	struct page **begin = rqstp->rq_pages;
> -	struct page **end = &rqstp->rq_pages[RPCSVC_MAXPAGES];
> +	struct page **end = &rqstp->rq_pages[rqstp->rq_maxpages];
>  
>  	if (unlikely(rqstp->rq_next_page < begin || rqstp->rq_next_page > end)) {
>  		trace_svc_replace_page_err(rqstp);
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index ae25405d8bd2..23547ed25269 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -651,18 +651,10 @@ static void svc_check_conn_limits(struct svc_serv *serv)
>  
>  static bool svc_alloc_arg(struct svc_rqst *rqstp)
>  {
> -	struct svc_serv *serv = rqstp->rq_server;
>  	struct xdr_buf *arg = &rqstp->rq_arg;
>  	unsigned long pages, filled, ret;
>  
> -	pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
> -	if (pages > RPCSVC_MAXPAGES) {
> -		pr_warn_once("svc: warning: pages=%lu > RPCSVC_MAXPAGES=%lu\n",
> -			     pages, RPCSVC_MAXPAGES);
> -		/* use as many pages as possible */
> -		pages = RPCSVC_MAXPAGES;
> -	}
> -
> +	pages = rqstp->rq_maxpages;
>  	for (filled = 0; filled < pages; filled = ret) {
>  		ret = alloc_pages_bulk(GFP_KERNEL, pages, rqstp->rq_pages);
>  		if (ret > filled)
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> index 40797114d50a..661b3fe2779f 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> @@ -765,7 +765,7 @@ static int svc_rdma_build_read_segment(struct svc_rqst *rqstp,
>  		}
>  		len -= seg_len;
>  
> -		if (len && ((head->rc_curpage + 1) > ARRAY_SIZE(rqstp->rq_pages)))
> +		if (len && ((head->rc_curpage + 1) > rqstp->rq_maxpages))
>  			goto out_overrun;
>  	}
>  

nit: I'd probably squash 2 and 3 together, but they both look right:

Reviewed-by: Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v2 04/10] sunrpc: Replace the rq_vec array with dynamically-allocated memory
  2025-04-19 17:28 ` [PATCH v2 04/10] sunrpc: Replace the rq_vec " cel
@ 2025-04-21 12:22   ` Jeff Layton
  2025-04-21 15:05     ` Chuck Lever
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2025-04-21 12:22 UTC (permalink / raw)
  To: cel, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

On Sat, 2025-04-19 at 13:28 -0400, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> As a step towards making NFSD's maximum rsize and wsize variable at
> run-time, replace the fixed-size rq_vec[] array in struct svc_rqst
> with a chunk of dynamically-allocated memory.
> 
> The rq_vec array is sized assuming request processing will need at
> most one kvec per page in a maximum-sized RPC message.
> 
> On a system with 8-byte pointers and 4KB pages, pahole reports that
> the rq_vec[] array is 4144 bytes. Replacing it with a single
> pointer reduces the size of struct svc_rqst to about 5400 bytes.

nit: so I guess the current struct is ~9k or so? If you're going to
post numbers here, they should probably refer to the same thing.

I'm lazy -- don't make me do math.

> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4proc.c         | 1 -
>  fs/nfsd/vfs.c              | 2 +-
>  include/linux/sunrpc/svc.h | 2 +-
>  net/sunrpc/svc.c           | 8 +++++++-
>  4 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index b397246dae7b..d1be58b557d1 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1228,7 +1228,6 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	write->wr_how_written = write->wr_stable_how;
>  
>  	nvecs = svc_fill_write_vector(rqstp, &write->wr_payload);
> -	WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec));
>  
>  	status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf,
>  				write->wr_offset, rqstp->rq_vec, nvecs, &cnt,
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 9abdc4b75813..4eaac3aa7e15 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1094,7 +1094,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		++v;
>  		base = 0;
>  	}
> -	WARN_ON_ONCE(v > ARRAY_SIZE(rqstp->rq_vec));
> +	WARN_ON_ONCE(v > rqstp->rq_maxpages);
>  
>  	trace_nfsd_read_vector(rqstp, fhp, offset, *count);
>  	iov_iter_kvec(&iter, ITER_DEST, rqstp->rq_vec, v, *count);
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 96ac12dbb04d..72d016772711 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -207,7 +207,7 @@ struct svc_rqst {
>  	struct page *		*rq_page_end;  /* one past the last page */
>  
>  	struct folio_batch	rq_fbatch;
> -	struct kvec		rq_vec[RPCSVC_MAXPAGES]; /* generally useful.. */
> +	struct kvec		*rq_vec;
>  	struct bio_vec		rq_bvec[RPCSVC_MAXPAGES];
>  
>  	__be32			rq_xid;		/* transmission id */
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 682e11c9be36..5808d4b97547 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -675,6 +675,7 @@ static void
>  svc_rqst_free(struct svc_rqst *rqstp)
>  {
>  	folio_batch_release(&rqstp->rq_fbatch);
> +	kfree(rqstp->rq_vec);
>  	svc_release_buffer(rqstp);
>  	if (rqstp->rq_scratch_page)
>  		put_page(rqstp->rq_scratch_page);
> @@ -713,6 +714,11 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
>  	if (!svc_init_buffer(rqstp, serv, node))
>  		goto out_enomem;
>  
> +	rqstp->rq_vec = kcalloc_node(rqstp->rq_maxpages, sizeof(struct kvec),
> +				      GFP_KERNEL, node);
> +	if (!rqstp->rq_vec)
> +		goto out_enomem;
> +
>  	rqstp->rq_err = -EAGAIN; /* No error yet */
>  
>  	serv->sv_nrthreads += 1;
> @@ -1750,7 +1756,7 @@ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp,
>  		++pages;
>  	}
>  
> -	WARN_ON_ONCE(i > ARRAY_SIZE(rqstp->rq_vec));
> +	WARN_ON_ONCE(i > rqstp->rq_maxpages);
>  	return i;
>  }
>  EXPORT_SYMBOL_GPL(svc_fill_write_vector);

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v2 00/10] Allocate payload arrays dynamically
  2025-04-19 17:28 [PATCH v2 00/10] Allocate payload arrays dynamically cel
                   ` (10 preceding siblings ...)
  2025-04-19 17:54 ` [PATCH v2 00/10] Allocate payload arrays dynamically Chuck Lever
@ 2025-04-21 12:28 ` Jeff Layton
  11 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2025-04-21 12:28 UTC (permalink / raw)
  To: cel, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

On Sat, 2025-04-19 at 13:28 -0400, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> In order to make RPCSVC_MAXPAYLOAD larger (or variable in size), we
> need to do something clever with the payload arrays embedded in
> struct svc_rqst and elsewhere.
> 
> My preference is to keep these arrays allocated all the time because
> allocating them on demand increases the risk of a memory allocation
> failure during a large I/O. This is a quick-and-dirty approach that
> might be replaced once NFSD is converted to use large folios.
> 
> The downside of this design choice is that it pins a few pages per
> NFSD thread (and that's the current situation already). But note
> that because RPCSVC_MAXPAGES is 259, each array is just over a page
> in size, making the allocation waste quite a bit of memory beyond
> the end of the array due to power-of-2 allocator round up. This gets
> worse as the MAXPAGES value is doubled or quadrupled.
> 
> This series also addresses similar issues in the socket and RDMA
> transports.
> 
> Chuck Lever (9):
>   sunrpc: Remove backchannel check in svc_init_buffer()
>   sunrpc: Add a helper to derive maxpages from sv_max_mesg
>   sunrpc: Replace the rq_pages array with dynamically-allocated memory
>   sunrpc: Replace the rq_vec array with dynamically-allocated memory
>   sunrpc: Replace the rq_bvec array with dynamically-allocated memory
>   sunrpc: Adjust size of socket's receive page array dynamically
>   svcrdma: Adjust the number of RDMA contexts per transport
>   svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages
>   svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages
> 
>  fs/nfsd/nfs4proc.c                       |  1 -
>  fs/nfsd/vfs.c                            |  2 +-
>  include/linux/sunrpc/svc.h               | 19 +++++++--
>  include/linux/sunrpc/svc_rdma.h          |  6 ++-
>  include/linux/sunrpc/svcsock.h           |  4 +-
>  net/sunrpc/svc.c                         | 51 +++++++++++++++---------
>  net/sunrpc/svc_xprt.c                    | 10 +----
>  net/sunrpc/svcsock.c                     | 15 ++++---
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  8 +++-
>  net/sunrpc/xprtrdma/svc_rdma_rw.c        |  2 +-
>  net/sunrpc/xprtrdma/svc_rdma_sendto.c    | 16 ++++++--
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |  2 +-
>  12 files changed, 88 insertions(+), 48 deletions(-)
> 
> -- 
> 2.49.0
> 
> 
> Chuck Lever (10):
>   sunrpc: Remove backchannel check in svc_init_buffer()
>   sunrpc: Add a helper to derive maxpages from sv_max_mesg
>   sunrpc: Replace the rq_pages array with dynamically-allocated memory
>   sunrpc: Replace the rq_vec array with dynamically-allocated memory
>   sunrpc: Replace the rq_bvec array with dynamically-allocated memory
>   sunrpc: Adjust size of socket's receive page array dynamically
>   svcrdma: Adjust the number of RDMA contexts per transport
>   svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages
>   svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages
>   sunrpc: Remove the RPCSVC_MAXPAGES macro
> 
>  fs/nfsd/nfs4proc.c                       |  1 -
>  fs/nfsd/vfs.c                            |  2 +-
>  include/linux/sunrpc/svc.h               | 31 +++++++++-----
>  include/linux/sunrpc/svc_rdma.h          |  6 ++-
>  include/linux/sunrpc/svcsock.h           |  4 +-
>  net/sunrpc/svc.c                         | 51 +++++++++++++++---------
>  net/sunrpc/svc_xprt.c                    | 10 +----
>  net/sunrpc/svcsock.c                     | 15 ++++---
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  8 +++-
>  net/sunrpc/xprtrdma/svc_rdma_rw.c        |  2 +-
>  net/sunrpc/xprtrdma/svc_rdma_sendto.c    | 16 ++++++--
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |  2 +-
>  12 files changed, 93 insertions(+), 55 deletions(-)
> 

I think the pile looks fine, modulo a few nits (which you can clean up
if you like).

Reviewed-by: Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v2 01/10] sunrpc: Remove backchannel check in svc_init_buffer()
  2025-04-21 12:16   ` Jeff Layton
@ 2025-04-21 14:59     ` Chuck Lever
  0 siblings, 0 replies; 20+ messages in thread
From: Chuck Lever @ 2025-04-21 14:59 UTC (permalink / raw)
  To: Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

On 4/21/25 8:16 AM, Jeff Layton wrote:
> On Sat, 2025-04-19 at 13:28 -0400, cel@kernel.org wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> A backchannel service might use forechannel send and receive
>> buffers, but svc_recv() still calls svc_alloc_arg() on backchannel
>> svc_rqsts, so it will populate the svc_rqst's rq_pages[] array
>> anyway. The "is backchannel" check in svc_init_buffer() saves us
>> nothing.
>>
>> In a moment, I plan to replace the rq_pages[] array with a
>> dynamically-allocated piece of memory, and svc_init_buffer() is
>> where that memory will get allocated. Backchannel requests actually
>> do use that array, so it has to be available to handle those
>> requests without a segfault.
>>
>> XXX: Or, make svc_alloc_arg() ignore backchannel requests too?
>>      Could set rqstp->rq_maxpages to zero.
>>
> 
> Maybe I'm confused here, but the backchannel still needs some pages to
> do its thing? I guess the main change here is that the pages are
> allocated at svc_create time instead of waiting until later?

The backchannel does not use the pages in svc_rqst::rq_pages; its
rq_arg::pages come from the RPC client's page allocator. Currently,
svc_init_buffer() skips allocating pages in rq_pages for that reason.

Except that svc_rqst::rq_pages is filled anyway when a backchannel
svc_rqst is passed to svc_recv() and then to svc_alloc_arg().

This isn't really a problem at the moment, except that these pages are
allocated but then never used AFAICT.

The problem is that in 03/10, in addition to populating rq_pages[],
svc_init_buffer() also now allocates the rq_pages array itself. If
that is skipped, then svc_alloc_arg() chases a NULL pointer.

As I mentioned in the patch description, there are two or three ways to
handle this. I'm not sure the way proposed here is best, but it does
avoid introducing extra conditional logic in svc_alloc_arg(), which is
a hot path.

Let me know if you have a better idea. I will try to bolster the patch
description here as well.


>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  net/sunrpc/svc.c | 4 ----
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
>> index e7f9c295d13c..8ce3e6b3df6a 100644
>> --- a/net/sunrpc/svc.c
>> +++ b/net/sunrpc/svc.c
>> @@ -640,10 +640,6 @@ svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node)
>>  {
>>  	unsigned long pages, ret;
>>  
>> -	/* bc_xprt uses fore channel allocated buffers */
>> -	if (svc_is_backchannel(rqstp))
>> -		return true;
>> -
>>  	pages = size / PAGE_SIZE + 1; /* extra page as we hold both request and reply.
>>  				       * We assume one is at most one page
>>  				       */
> 
> Acked-by: Jeff Layton <jlayton@kernel.org>


-- 
Chuck Lever


* Re: [PATCH v2 04/10] sunrpc: Replace the rq_vec array with dynamically-allocated memory
  2025-04-21 12:22   ` Jeff Layton
@ 2025-04-21 15:05     ` Chuck Lever
  0 siblings, 0 replies; 20+ messages in thread
From: Chuck Lever @ 2025-04-21 15:05 UTC (permalink / raw)
  To: Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

On 4/21/25 8:22 AM, Jeff Layton wrote:
> On Sat, 2025-04-19 at 13:28 -0400, cel@kernel.org wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> As a step towards making NFSD's maximum rsize and wsize variable at
>> run-time, replace the fixed-size rq_vec[] array in struct svc_rqst
>> with a chunk of dynamically-allocated memory.
>>
>> The rq_vec array is sized assuming request processing will need at
>> most one kvec per page in a maximum-sized RPC message.
>>
>> On a system with 8-byte pointers and 4KB pages, pahole reports that
>> the rq_vec[] array is 4144 bytes. Replacing it with a single
>> pointer reduces the size of struct svc_rqst to about 5400 bytes.
> 
> nit: so I guess the current struct is ~9k or so?

11K plus, on my test system.


> If you're going to
> post numbers here, they should probably refer to the same thing.
> 
> I'm lazy -- don't make me do math.

Well it's a running total. Each patch in this part of the series reduces
the size of struct svc_rqst.

But I can update the descriptions. What math were you attempting to do?


>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  fs/nfsd/nfs4proc.c         | 1 -
>>  fs/nfsd/vfs.c              | 2 +-
>>  include/linux/sunrpc/svc.h | 2 +-
>>  net/sunrpc/svc.c           | 8 +++++++-
>>  4 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index b397246dae7b..d1be58b557d1 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -1228,7 +1228,6 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>  	write->wr_how_written = write->wr_stable_how;
>>  
>>  	nvecs = svc_fill_write_vector(rqstp, &write->wr_payload);
>> -	WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec));
>>  
>>  	status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf,
>>  				write->wr_offset, rqstp->rq_vec, nvecs, &cnt,
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 9abdc4b75813..4eaac3aa7e15 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -1094,7 +1094,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>  		++v;
>>  		base = 0;
>>  	}
>> -	WARN_ON_ONCE(v > ARRAY_SIZE(rqstp->rq_vec));
>> +	WARN_ON_ONCE(v > rqstp->rq_maxpages);
>>  
>>  	trace_nfsd_read_vector(rqstp, fhp, offset, *count);
>>  	iov_iter_kvec(&iter, ITER_DEST, rqstp->rq_vec, v, *count);
>> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
>> index 96ac12dbb04d..72d016772711 100644
>> --- a/include/linux/sunrpc/svc.h
>> +++ b/include/linux/sunrpc/svc.h
>> @@ -207,7 +207,7 @@ struct svc_rqst {
>>  	struct page *		*rq_page_end;  /* one past the last page */
>>  
>>  	struct folio_batch	rq_fbatch;
>> -	struct kvec		rq_vec[RPCSVC_MAXPAGES]; /* generally useful.. */
>> +	struct kvec		*rq_vec;
>>  	struct bio_vec		rq_bvec[RPCSVC_MAXPAGES];
>>  
>>  	__be32			rq_xid;		/* transmission id */
>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
>> index 682e11c9be36..5808d4b97547 100644
>> --- a/net/sunrpc/svc.c
>> +++ b/net/sunrpc/svc.c
>> @@ -675,6 +675,7 @@ static void
>>  svc_rqst_free(struct svc_rqst *rqstp)
>>  {
>>  	folio_batch_release(&rqstp->rq_fbatch);
>> +	kfree(rqstp->rq_vec);
>>  	svc_release_buffer(rqstp);
>>  	if (rqstp->rq_scratch_page)
>>  		put_page(rqstp->rq_scratch_page);
>> @@ -713,6 +714,11 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
>>  	if (!svc_init_buffer(rqstp, serv, node))
>>  		goto out_enomem;
>>  
>> +	rqstp->rq_vec = kcalloc_node(rqstp->rq_maxpages, sizeof(struct kvec),
>> +				      GFP_KERNEL, node);
>> +	if (!rqstp->rq_vec)
>> +		goto out_enomem;
>> +
>>  	rqstp->rq_err = -EAGAIN; /* No error yet */
>>  
>>  	serv->sv_nrthreads += 1;
>> @@ -1750,7 +1756,7 @@ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp,
>>  		++pages;
>>  	}
>>  
>> -	WARN_ON_ONCE(i > ARRAY_SIZE(rqstp->rq_vec));
>> +	WARN_ON_ONCE(i > rqstp->rq_maxpages);
>>  	return i;
>>  }
>>  EXPORT_SYMBOL_GPL(svc_fill_write_vector);
> 


-- 
Chuck Lever


* Re: [PATCH v2 02/10] sunrpc: Add a helper to derive maxpages from sv_max_mesg
  2025-04-19 17:28 ` [PATCH v2 02/10] sunrpc: Add a helper to derive maxpages from sv_max_mesg cel
@ 2025-04-22 20:48   ` NeilBrown
  2025-04-23 13:16     ` Chuck Lever
  0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2025-04-22 20:48 UTC (permalink / raw)
  To: cel
  Cc: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	Chuck Lever

On Sun, 20 Apr 2025, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> This page count is to be used to allocate various arrays of pages,
> bio_vecs, and kvecs, replacing the fixed RPCSVC_MAXPAGES value.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  include/linux/sunrpc/svc.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 74658cca0f38..5b879c31d7b8 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -159,6 +159,18 @@ extern u32 svc_max_payload(const struct svc_rqst *rqstp);
>  #define RPCSVC_MAXPAGES		((RPCSVC_MAXPAYLOAD+PAGE_SIZE-1)/PAGE_SIZE \
>  				+ 2 + 1)
>  
> +/**
> + * svc_serv_maxpages - maximum pages/kvecs needed for one RPC message
> + * @serv: RPC service context
> + *
> + * Returns a count of pages or vectors that can hold the maximum
> + * size RPC message for @serv.
> + */
> +static inline unsigned long svc_serv_maxpages(const struct svc_serv *serv)
> +{
> +	return ((serv->sv_max_mesg + PAGE_SIZE - 1) >> PAGE_SHIFT) + 2 + 1;
> +}
> +

This looks like it should be
     DIV_ROUND_UP(serv->sv_max_mesg, PAGE_SIZE) + 2

Could we document what the "+ 2" is for??

Thanks,
NeilBrown


>  /*
>   * The context of a single thread, including the request currently being
>   * processed.
> -- 
> 2.49.0
> 
> 



* Re: [PATCH v2 02/10] sunrpc: Add a helper to derive maxpages from sv_max_mesg
  2025-04-22 20:48   ` NeilBrown
@ 2025-04-23 13:16     ` Chuck Lever
  0 siblings, 0 replies; 20+ messages in thread
From: Chuck Lever @ 2025-04-23 13:16 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	Chuck Lever

On 4/22/25 4:48 PM, NeilBrown wrote:
> On Sun, 20 Apr 2025, cel@kernel.org wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> This page count is to be used to allocate various arrays of pages,
>> bio_vecs, and kvecs, replacing the fixed RPCSVC_MAXPAGES value.
>>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  include/linux/sunrpc/svc.h | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
>> index 74658cca0f38..5b879c31d7b8 100644
>> --- a/include/linux/sunrpc/svc.h
>> +++ b/include/linux/sunrpc/svc.h
>> @@ -159,6 +159,18 @@ extern u32 svc_max_payload(const struct svc_rqst *rqstp);
>>  #define RPCSVC_MAXPAGES		((RPCSVC_MAXPAYLOAD+PAGE_SIZE-1)/PAGE_SIZE \
>>  				+ 2 + 1)
>>  
>> +/**
>> + * svc_serv_maxpages - maximum pages/kvecs needed for one RPC message
>> + * @serv: RPC service context
>> + *
>> + * Returns a count of pages or vectors that can hold the maximum
>> + * size RPC message for @serv.
>> + */
>> +static inline unsigned long svc_serv_maxpages(const struct svc_serv *serv)
>> +{
>> +	return ((serv->sv_max_mesg + PAGE_SIZE - 1) >> PAGE_SHIFT) + 2 + 1;
>> +}
>> +
> 
> This looks like it should be
>      DIV_ROUND_UP(serv->sv_max_mesg, PAGE_SIZE) + 2

Fair enough! I had forgotten about that macro.


> Could we document what the "+ 2" is for??

10/10 in this series moves an existing documenting comment for that
purpose. I've relocated that hunk to this patch, but I think the text
needs to be revisited because it ignores NFSv4 COMPOUNDs that may
carry multiple payloads.

I plan to post a v3 of this series soon.


> Thanks,
> NeilBrown
> 
> 
>>  /*
>>   * The context of a single thread, including the request currently being
>>   * processed.
>> -- 
>> 2.49.0
>>
>>
> 


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 20+ messages in thread
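[Editorial note: as an arithmetic cross-check against the cover letter's figure of RPCSVC_MAXPAGES = 259: with 4 KB pages, a 1 MB input yields 256 + 2 + 1 = 259. The sketch below is a hypothetical userspace rendering of the helper; the struct, constants, and comment text are stand-ins, not the kernel's definitions or the comment Chuck is relocating, and sv_max_mesg is only derived from, not identical to, RPCSVC_MAXPAYLOAD in the real code.]

```c
#include <assert.h>

/* Stand-ins for kernel definitions (4 KB pages assumed). */
#define PAGE_SIZE 4096UL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Minimal stand-in for the kernel's struct svc_serv. */
struct svc_serv {
	unsigned long sv_max_mesg;	/* largest RPC message, in bytes */
};

/*
 * svc_serv_maxpages - maximum pages/kvecs needed for one RPC message
 *
 * The trailing "+ 2 + 1" covers pages beyond the payload itself; the
 * exact accounting is what the comment relocated into v3 of this
 * patch is meant to spell out.
 */
static unsigned long svc_serv_maxpages(const struct svc_serv *serv)
{
	return DIV_ROUND_UP(serv->sv_max_mesg, PAGE_SIZE) + 2 + 1;
}
```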

end of thread, other threads:[~2025-04-23 13:16 UTC | newest]

Thread overview: 20+ messages
2025-04-19 17:28 [PATCH v2 00/10] Allocate payload arrays dynamically cel
2025-04-19 17:28 ` [PATCH v2 01/10] sunrpc: Remove backchannel check in svc_init_buffer() cel
2025-04-21 12:16   ` Jeff Layton
2025-04-21 14:59     ` Chuck Lever
2025-04-19 17:28 ` [PATCH v2 02/10] sunrpc: Add a helper to derive maxpages from sv_max_mesg cel
2025-04-22 20:48   ` NeilBrown
2025-04-23 13:16     ` Chuck Lever
2025-04-19 17:28 ` [PATCH v2 03/10] sunrpc: Replace the rq_pages array with dynamically-allocated memory cel
2025-04-21 12:19   ` Jeff Layton
2025-04-19 17:28 ` [PATCH v2 04/10] sunrpc: Replace the rq_vec " cel
2025-04-21 12:22   ` Jeff Layton
2025-04-21 15:05     ` Chuck Lever
2025-04-19 17:28 ` [PATCH v2 05/10] sunrpc: Replace the rq_bvec " cel
2025-04-19 17:28 ` [PATCH v2 06/10] sunrpc: Adjust size of socket's receive page array dynamically cel
2025-04-19 17:28 ` [PATCH v2 07/10] svcrdma: Adjust the number of RDMA contexts per transport cel
2025-04-19 17:28 ` [PATCH v2 08/10] svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages cel
2025-04-19 17:28 ` [PATCH v2 09/10] svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages cel
2025-04-19 17:28 ` [PATCH v2 10/10] sunrpc: Remove the RPCSVC_MAXPAGES macro cel
2025-04-19 17:54 ` [PATCH v2 00/10] Allocate payload arrays dynamically Chuck Lever
2025-04-21 12:28 ` Jeff Layton
