From: Chuck Lever <cel@kernel.org>
To: Leon Romanovsky <leon@kernel.org>, Christoph Hellwig <hch@lst.de>
Cc: <linux-rdma@vger.kernel.org>, Chuck Lever <chuck.lever@oracle.com>
Subject: [RFC PATCH] svcrdma: Use compound pages for RDMA Read sink buffers
Date: Tue, 10 Mar 2026 15:56:50 -0400
Message-ID: <20260310195650.15785-1-cel@kernel.org>
In-Reply-To: <20260310193151.GN12611@unreal>
From: Chuck Lever <chuck.lever@oracle.com>

svc_rdma_build_read_segment() constructs RDMA Read sink
buffers by consuming pages one at a time from rq_pages[]
and building one bvec per page. With 4KB pages, a 64KB NFS
WRITE payload produces 16 separate bvecs, 16 DMA mappings,
and potentially multiple RDMA Read WRs.
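
As a rough userspace illustration (not kernel code), the per-page bvec count follows the nr_bvec arithmetic already in svc_rdma_build_read_segment(), PAGE_ALIGN(rc_pageoff + len) >> PAGE_SHIFT. The sketch assumes 4KB pages; the SKETCH_* macros and both helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4KB pages. Hypothetical names, not kernel API. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_ALIGN(x) (((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/* Per-page path: one bvec (and one DMA mapping) per page the segment touches. */
static size_t per_page_bvecs(size_t pageoff, size_t len)
{
	return SKETCH_PAGE_ALIGN(pageoff + len) >> SKETCH_PAGE_SHIFT;
}

/* Contiguous path: the whole segment is described by a single bvec. */
static size_t contiguous_bvecs(size_t len)
{
	return len ? 1 : 0;
}
```

Under these assumptions a page-aligned 64KB segment needs 16 bvecs on the per-page path, and one that starts 100 bytes into a page needs 17, whereas a single physically contiguous range is described by one bvec.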
A single higher-order allocation followed by split_page()
yields physically contiguous memory while preserving
per-page refcounts. A single bvec spanning the contiguous
range causes rdma_rw_ctx_init_bvec() to take the
rdma_rw_init_single_wr_bvec() fast path: one DMA mapping,
one SGE, one WR.
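
Why one bvec is enough can be shown with a hypothetical coalescing count: pages whose page frame numbers (PFNs) are consecutive can be described as one physically contiguous range. The sub-pages that split_page() produces from an order-N block are consecutive by construction; pages drawn individually from rq_pages[] usually are not. This sketch does not model rdma_rw_ctx_init_bvec() itself, and count_contiguous_runs() is an illustrative name only:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many contiguous ranges are needed to describe @n pages,
 * merging each page into the previous range when its PFN directly
 * follows the previous page's PFN. Hypothetical helper.
 */
static size_t count_contiguous_runs(const unsigned long *pfns, size_t n)
{
	size_t runs = 0;

	for (size_t i = 0; i < n; i++) {
		/* Start a new range unless this page is adjacent to the last. */
		if (i == 0 || pfns[i] != pfns[i - 1] + 1)
			runs++;
	}
	return runs;
}
```

Four sub-pages of one order-2 block collapse to a single range; four unrelated pages need four.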
The split sub-pages replace the original rq_pages[] entries,
so all downstream page tracking, completion handling, and
xdr_buf assembly remain unchanged.
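
A minimal model of the rq_pages[] cursor update, assuming 4KB pages and a page-aligned start (rc_pageoff == 0), follows the same rule as the patch: a segment that ends mid-page leaves rc_curpage on that partially filled page with rc_pageoff set to the bytes used. struct read_cursor and advance_cursor() are hypothetical stand-ins for the relevant svc_rdma_recv_ctxt fields:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4KB pages */

/* Hypothetical stand-in for rc_curpage, rc_pageoff, rc_page_count. */
struct read_cursor {
	size_t curpage;    /* index of the rq_pages[] slot in use */
	size_t pageoff;    /* bytes already used in that slot */
	size_t page_count; /* pages consumed so far */
};

/*
 * Advance the cursor past a page-aligned segment of @len bytes.
 * If the segment ends mid-page, stay on that page so the next
 * segment can continue filling it.
 */
static void advance_cursor(struct read_cursor *c, size_t len)
{
	size_t nr = (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

	assert(c->pageoff == 0); /* compound path needs a page-aligned start */
	c->page_count += nr;
	c->pageoff = len % SKETCH_PAGE_SIZE;
	c->curpage += c->pageoff ? nr - 1 : nr;
}
```

For example, a 10000-byte segment consumes three pages and leaves the cursor at index 2 with a 1808-byte offset.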
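
Allocation uses __GFP_NORETRY | __GFP_NOWARN and falls back
through decreasing orders. If even order-1 fails, or the order
obtained is too small to cover the whole segment (which includes
any segment larger than the order-4 cap, 64KB with 4KB pages),
the existing per-page path handles the segment.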
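
The order selection can be simulated without allocating memory. In this sketch order_for_pages() stands in for get_order(nr_pages << PAGE_SHIFT), the try_alloc callback stands in for alloc_pages(), and SKETCH_MAX_ORDER mirrors SVC_RDMA_COMPOUND_MAX_ORDER; all of these names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_ORDER 4 /* mirrors SVC_RDMA_COMPOUND_MAX_ORDER */

/* Smallest order whose (1 << order) pages cover nr_pages, like get_order(). */
static unsigned int order_for_pages(unsigned int nr_pages)
{
	unsigned int order = 0;

	while (order < 31 && (1U << order) < nr_pages)
		order++;
	return order;
}

/*
 * Try the capped starting order first, then each smaller order down
 * to 1. Returns the first order for which try_alloc() succeeds, or 0
 * when even order 1 fails (the patch's helper returns NULL instead).
 */
static unsigned int pick_order(unsigned int nr_pages,
			       bool (*try_alloc)(unsigned int order))
{
	unsigned int o = order_for_pages(nr_pages);

	if (o > SKETCH_MAX_ORDER)
		o = SKETCH_MAX_ORDER;
	for (; o >= 1; o--) {
		if (try_alloc(o))
			return o;
	}
	return 0;
}

/* Hypothetical allocator behaviours for the simulation. */
static bool always_succeeds(unsigned int order) { (void)order; return true; }
static bool only_up_to_order_2(unsigned int order) { return order <= 2; }
static bool never_succeeds(unsigned int order) { (void)order; return false; }
```

pick_order(16, only_up_to_order_2) returns 2: the allocation succeeds, but 1 << 2 = 4 pages cannot cover a 16-page segment, so svc_rdma_build_read_segment_compound() frees them and returns -ENOMEM, and the segment goes down the per-page path.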
The compound path is attempted only when the segment starts
page-aligned (rc_pageoff == 0) and spans at least two pages.
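
The eligibility test is small enough to restate directly. This userspace version of the condition added to svc_rdma_build_read_segment() assumes 4KB pages; use_compound_path() is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4KB pages */

/* Mirrors: head->rc_pageoff == 0 && nr_bvec >= 2 */
static bool use_compound_path(size_t pageoff, size_t len)
{
	/* Pages touched by the segment, as in PAGE_ALIGN(total) >> PAGE_SHIFT */
	size_t nr_bvec = (pageoff + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

	return pageoff == 0 && nr_bvec >= 2;
}
```

A single-page segment already needs only one bvec, and a segment that begins mid-page stays on the per-page path.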

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/svc_rdma_rw.c | 120 ++++++++++++++++++++++++++++++
1 file changed, 120 insertions(+)
What if svcrdma did something derpy like this?
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 9e17700fae2a..42de7151ae68 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -754,6 +754,118 @@ int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma,
 	return xdr->len;
 }
 
+#define SVC_RDMA_COMPOUND_MAX_ORDER	4	/* 64KB max */
+
+/**
+ * svc_rdma_alloc_read_pages - Allocate physically contiguous pages
+ * @nr_pages: number of pages needed
+ * @order: on success, set to the allocation order
+ *
+ * Attempts a higher-order allocation, falling back to smaller orders.
+ * The returned pages are split immediately so each sub-page has its
+ * own refcount and can be freed independently.
+ *
+ * Returns a pointer to the first page on success, or NULL if even
+ * order-1 allocation fails.
+ */
+static struct page *
+svc_rdma_alloc_read_pages(unsigned int nr_pages, unsigned int *order)
+{
+	unsigned int o;
+	struct page *page;
+
+	o = get_order(nr_pages << PAGE_SHIFT);
+	if (o > SVC_RDMA_COMPOUND_MAX_ORDER)
+		o = SVC_RDMA_COMPOUND_MAX_ORDER;
+
+	while (o >= 1) {
+		page = alloc_pages(GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN,
+				   o);
+		if (page) {
+			split_page(page, o);
+			*order = o;
+			return page;
+		}
+		o--;
+	}
+	return NULL;
+}
+
+/**
+ * svc_rdma_build_read_segment_compound - Build a single RDMA Read WR using compound pages
+ * @rqstp: RPC transaction context
+ * @head: context for ongoing I/O
+ * @segment: co-ordinates of remote memory to be read
+ *
+ * Allocates a higher-order page and splits it, then builds a single
+ * bvec spanning the contiguous physical range. The split sub-pages
+ * replace entries in rq_pages[] so downstream cleanup is unchanged.
+ *
+ * Returns:
+ *   %0: the Read WR was constructed successfully
+ *   %-ENOMEM: contiguous allocation failed or was too small, too few
+ *             rq_pages slots remain, or rw_ctxt allocation failed
+ *   %-EIO: a DMA mapping error occurred
+ */
+static int svc_rdma_build_read_segment_compound(struct svc_rqst *rqstp,
+						struct svc_rdma_recv_ctxt *head,
+						const struct svc_rdma_segment *segment)
+{
+	struct svcxprt_rdma *rdma = svc_rdma_rqst_rdma(rqstp);
+	struct svc_rdma_chunk_ctxt *cc = &head->rc_cc;
+	unsigned int order, alloc_nr, nr_data_pages, i;
+	struct svc_rdma_rw_ctxt *ctxt;
+	struct page *page;
+	int ret;
+
+	nr_data_pages = PAGE_ALIGN(segment->rs_length) >> PAGE_SHIFT;
+
+	page = svc_rdma_alloc_read_pages(nr_data_pages, &order);
+	if (!page)
+		return -ENOMEM;
+	alloc_nr = 1 << order;
+
+	if (alloc_nr < nr_data_pages ||
+	    head->rc_curpage + alloc_nr > rqstp->rq_maxpages) {
+		for (i = 0; i < alloc_nr; i++)
+			__free_page(page + i);
+		return -ENOMEM;
+	}
+
+	ctxt = svc_rdma_get_rw_ctxt(rdma, 1);
+	if (!ctxt) {
+		for (i = 0; i < alloc_nr; i++)
+			__free_page(page + i);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < alloc_nr; i++) {
+		put_page(rqstp->rq_pages[head->rc_curpage + i]);
+		rqstp->rq_pages[head->rc_curpage + i] = page + i;
+	}
+
+	bvec_set_page(&ctxt->rw_bvec[0], page, segment->rs_length, 0);
+	ctxt->rw_nents = 1;
+
+	head->rc_page_count += nr_data_pages;
+	head->rc_pageoff = offset_in_page(segment->rs_length);
+	if (head->rc_pageoff)
+		head->rc_curpage += nr_data_pages - 1;
+	else
+		head->rc_curpage += nr_data_pages;
+
+	ret = svc_rdma_rw_ctx_init(rdma, ctxt, segment->rs_offset,
+				   segment->rs_handle, segment->rs_length,
+				   DMA_FROM_DEVICE);
+	if (ret < 0)
+		return -EIO;
+	percpu_counter_inc(&svcrdma_stat_read);
+
+	list_add(&ctxt->rw_list, &cc->cc_rwctxts);
+	cc->cc_sqecount += ret;
+	return 0;
+}
+
 /**
  * svc_rdma_build_read_segment - Build RDMA Read WQEs to pull one RDMA segment
  * @rqstp: RPC transaction context
@@ -780,6 +892,14 @@ static int svc_rdma_build_read_segment(struct svc_rqst *rqstp,
 	if (check_add_overflow(head->rc_pageoff, len, &total))
 		return -EINVAL;
 	nr_bvec = PAGE_ALIGN(total) >> PAGE_SHIFT;
+
+	if (head->rc_pageoff == 0 && nr_bvec >= 2) {
+		ret = svc_rdma_build_read_segment_compound(rqstp, head,
+							   segment);
+		if (ret != -ENOMEM)
+			return ret;
+	}
+
 	ctxt = svc_rdma_get_rw_ctxt(rdma, nr_bvec);
 	if (!ctxt)
 		return -ENOMEM;
--
2.53.0
Thread overview: 8+ messages
2026-03-10 3:46 [PATCH] RDMA/rw: Fix MR pool exhaustion in bvec RDMA READ path Chuck Lever
2026-03-10 13:42 ` Christoph Hellwig
2026-03-10 14:36 ` Chuck Lever
2026-03-10 18:37 ` Leon Romanovsky
2026-03-10 18:49 ` Chuck Lever
2026-03-10 19:31 ` Leon Romanovsky
2026-03-10 19:56 ` Chuck Lever [this message]
2026-03-10 20:27 ` [RFC PATCH] svcrdma: Use compound pages for RDMA Read sink buffers Leon Romanovsky