From: Chuck Lever <cel@kernel.org>
To: Leon Romanovsky <leon@kernel.org>, Christoph Hellwig <hch@lst.de>
Cc: <linux-rdma@vger.kernel.org>, Chuck Lever <chuck.lever@oracle.com>
Subject: [RFC PATCH] svcrdma: Use compound pages for RDMA Read sink buffers
Date: Tue, 10 Mar 2026 15:56:50 -0400
Message-ID: <20260310195650.15785-1-cel@kernel.org>
In-Reply-To: <20260310193151.GN12611@unreal>
From: Chuck Lever <chuck.lever@oracle.com>

svc_rdma_build_read_segment() constructs RDMA Read sink
buffers by consuming pages one at a time from rq_pages[]
and building one bvec per page. A 64KB NFS READ payload
produces 16 separate bvecs, 16 DMA mappings, and
potentially multiple RDMA Read WRs.

A single higher-order allocation followed by split_page()
yields physically contiguous memory while preserving
per-page refcounts. A single bvec spanning the contiguous
range causes rdma_rw_ctx_init_bvec() to take the
rdma_rw_init_single_wr_bvec() fast path: one DMA mapping,
one SGE, one WR.

The split sub-pages replace the original rq_pages[] entries,
so all downstream page tracking, completion handling, and
xdr_buf assembly remain unchanged.

Allocation uses __GFP_NORETRY | __GFP_NOWARN and falls back
through decreasing orders. If even order-1 fails, the
existing per-page path handles the segment.

The compound path is attempted only when the segment starts
page-aligned (rc_pageoff == 0) and spans at least two pages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/svc_rdma_rw.c | 120 ++++++++++++++++++++++++++++++
1 file changed, 120 insertions(+)
What if svcrdma did something derpy like this?
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 9e17700fae2a..42de7151ae68 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -754,6 +754,118 @@ int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma,
return xdr->len;
}
+#define SVC_RDMA_COMPOUND_MAX_ORDER 4 /* 64KB max */
+
+/**
+ * svc_rdma_alloc_read_pages - Allocate physically contiguous pages
+ * @nr_pages: number of pages needed
+ * @order: on success, set to the allocation order
+ *
+ * Attempts a higher-order allocation, falling back to smaller orders.
+ * The returned pages are split immediately so each sub-page has its
+ * own refcount and can be freed independently.
+ *
+ * Returns a pointer to the first page on success, or NULL if even
+ * order-1 allocation fails.
+ */
+static struct page *
+svc_rdma_alloc_read_pages(unsigned int nr_pages, unsigned int *order)
+{
+ unsigned int o;
+ struct page *page;
+
+ o = get_order(nr_pages << PAGE_SHIFT);
+ if (o > SVC_RDMA_COMPOUND_MAX_ORDER)
+ o = SVC_RDMA_COMPOUND_MAX_ORDER;
+
+ while (o >= 1) {
+ page = alloc_pages(GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN,
+ o);
+ if (page) {
+ split_page(page, o);
+ *order = o;
+ return page;
+ }
+ o--;
+ }
+ return NULL;
+}
+
+/**
+ * svc_rdma_build_read_segment_compound - Build a single RDMA Read WR using compound pages
+ * @rqstp: RPC transaction context
+ * @head: context for ongoing I/O
+ * @segment: co-ordinates of remote memory to be read
+ *
+ * Allocates a higher-order page and splits it, then builds a single
+ * bvec spanning the contiguous physical range. The split sub-pages
+ * replace entries in rq_pages[] so downstream cleanup is unchanged.
+ *
+ * Returns:
+ *   %0: the Read WR was constructed successfully
+ *   %-ENOMEM: the page allocation, rq_pages slot check, or rw_ctxt
+ *      allocation failed; the caller falls back to the per-page path
+ *   %-EIO: a DMA mapping error occurred
+ */
+static int svc_rdma_build_read_segment_compound(struct svc_rqst *rqstp,
+ struct svc_rdma_recv_ctxt *head,
+ const struct svc_rdma_segment *segment)
+{
+ struct svcxprt_rdma *rdma = svc_rdma_rqst_rdma(rqstp);
+ struct svc_rdma_chunk_ctxt *cc = &head->rc_cc;
+ unsigned int order, alloc_nr, nr_data_pages, i;
+ struct svc_rdma_rw_ctxt *ctxt;
+ struct page *page;
+ int ret;
+
+ nr_data_pages = PAGE_ALIGN(segment->rs_length) >> PAGE_SHIFT;
+
+ page = svc_rdma_alloc_read_pages(nr_data_pages, &order);
+ if (!page)
+ return -ENOMEM;
+ alloc_nr = 1 << order;
+
+ if (alloc_nr < nr_data_pages ||
+ head->rc_curpage + alloc_nr > rqstp->rq_maxpages) {
+ for (i = 0; i < alloc_nr; i++)
+ __free_page(page + i);
+ return -ENOMEM;
+ }
+
+ ctxt = svc_rdma_get_rw_ctxt(rdma, 1);
+ if (!ctxt) {
+ for (i = 0; i < alloc_nr; i++)
+ __free_page(page + i);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < alloc_nr; i++) {
+ put_page(rqstp->rq_pages[head->rc_curpage + i]);
+ rqstp->rq_pages[head->rc_curpage + i] = page + i;
+ }
+
+ bvec_set_page(&ctxt->rw_bvec[0], page, segment->rs_length, 0);
+ ctxt->rw_nents = 1;
+
+ head->rc_page_count += nr_data_pages;
+ head->rc_pageoff = offset_in_page(segment->rs_length);
+ if (head->rc_pageoff)
+ head->rc_curpage += nr_data_pages - 1;
+ else
+ head->rc_curpage += nr_data_pages;
+
+ ret = svc_rdma_rw_ctx_init(rdma, ctxt, segment->rs_offset,
+ segment->rs_handle, segment->rs_length,
+ DMA_FROM_DEVICE);
+ if (ret < 0)
+ return -EIO;
+ percpu_counter_inc(&svcrdma_stat_read);
+
+ list_add(&ctxt->rw_list, &cc->cc_rwctxts);
+ cc->cc_sqecount += ret;
+ return 0;
+}
+
/**
* svc_rdma_build_read_segment - Build RDMA Read WQEs to pull one RDMA segment
* @rqstp: RPC transaction context
@@ -780,6 +892,14 @@ static int svc_rdma_build_read_segment(struct svc_rqst *rqstp,
if (check_add_overflow(head->rc_pageoff, len, &total))
return -EINVAL;
nr_bvec = PAGE_ALIGN(total) >> PAGE_SHIFT;
+
+ if (head->rc_pageoff == 0 && nr_bvec >= 2) {
+ ret = svc_rdma_build_read_segment_compound(rqstp, head,
+ segment);
+ if (ret != -ENOMEM)
+ return ret;
+ }
+
ctxt = svc_rdma_get_rw_ctxt(rdma, nr_bvec);
if (!ctxt)
return -ENOMEM;
--
2.53.0