From: Chuck Lever <cel@kernel.org>
To: Mike Snitzer <snitzer@kernel.org>
Cc: <linux-nfs@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	Jonathan Flynn <jonathan.flynn@hammerspace.com>
Subject: [PATCH] svcrdma: Avoid direct reclaim when allocating Read sink buffers
Date: Fri,  5 Jun 2026 18:31:18 -0400
Message-ID: <20260605223118.75092-1-cel@kernel.org>

From: Chuck Lever <chuck.lever@oracle.com>

svc_rdma_alloc_read_pages() passes __GFP_NORETRY, which limits the
allocator to a single round of direct reclaim and asynchronous
compaction per attempt. Under memory pressure or fragmentation that
round can take a long time, and the fallback loop repeats it at
each order, multiplying the stall while the RPC waits for its Read
sink buffer.

The contiguous allocation is opportunistic: when it fails, Read
sink buffers come from the pages already in rq_pages[]. Direct
reclaim effort buys little here. Allocate with GFP_NOWAIT instead,
which omits __GFP_DIRECT_RECLAIM so the allocator takes pages only
from the free lists and returns NULL immediately when none are
available. GFP_NOWAIT retains __GFP_KSWAPD_RECLAIM, so a failed
attempt still wakes kswapd to replenish higher-order pages in the
background, and it already includes __GFP_NOWARN. __GFP_NORETRY
has no effect once direct reclaim is off. skb_page_frag_refill()
takes the same approach for its opportunistic high-order
allocation.

Reported-by: Jonathan Flynn <jonathan.flynn@hammerspace.com>
Fixes: 18755b8c2f24 ("svcrdma: Use contiguous pages for RDMA Read sink buffers")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_rw.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)


Given the perf symbol resolution inaccuracies I can't swear this
will fix the issue, but here's a stab at it.
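
For anyone who wants to see the intended effect in isolation, here is a
minimal userspace sketch of the fallback loop. It is a toy model, not
kernel code: the M_* flag bits, struct model_zone, model_alloc(), and
model_alloc_read_pages() are all hypothetical names made up for this
illustration, and the bit values do not match include/linux/gfp_types.h.
The model only counts how often a failed attempt would enter direct
reclaim or wake kswapd, assuming one reclaim round per failed attempt
as described above for __GFP_NORETRY.

```c
/* Userspace toy model only. Nothing here is a kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values, chosen for the model. */
#define M_DIRECT_RECLAIM	0x01u
#define M_KSWAPD_RECLAIM	0x02u
#define M_IO			0x04u
#define M_FS			0x08u
#define M_NORETRY		0x10u
#define M_NOWARN		0x20u

/* Same composition as the real flags, per the commit message. */
#define M_GFP_KERNEL	(M_DIRECT_RECLAIM | M_KSWAPD_RECLAIM | M_IO | M_FS)
#define M_GFP_NOWAIT	(M_KSWAPD_RECLAIM | M_NOWARN)

#define MODEL_MAX_ORDER	4

struct model_zone {
	unsigned int free_blocks[MODEL_MAX_ORDER + 1];	/* per-order free lists */
	unsigned int direct_reclaim_rounds;		/* caller stalls here */
	unsigned int kswapd_wakeups;			/* background work only */
};

/* Try the free lists; on a miss, record what the flags would trigger. */
static bool model_alloc(struct model_zone *z, unsigned int gfp,
			unsigned int order)
{
	for (unsigned int j = order; j <= MODEL_MAX_ORDER; j++) {
		if (z->free_blocks[j] == 0)
			continue;
		z->free_blocks[j]--;
		/* Splitting leaves one free buddy at each order below j. */
		for (unsigned int k = order; k < j; k++)
			z->free_blocks[k]++;
		return true;
	}
	if (gfp & M_KSWAPD_RECLAIM)
		z->kswapd_wakeups++;
	if (gfp & M_DIRECT_RECLAIM)
		z->direct_reclaim_rounds++;	/* modeled as one round, no retry */
	return false;
}

/*
 * Descending-order fallback, shaped like the loop in the patch.
 * start_order is a parameter here; the real function derives it.
 * Returns 0 and sets *order on success, -1 if even order-1 fails.
 */
static int model_alloc_read_pages(struct model_zone *z, unsigned int gfp,
				  unsigned int start_order,
				  unsigned int *order)
{
	assert(order != NULL);
	assert(start_order <= MODEL_MAX_ORDER);

	for (unsigned int o = start_order; o >= 1; o--) {
		if (model_alloc(z, gfp, o)) {
			*order = o;
			return 0;
		}
	}
	return -1;
}
```

On a fully fragmented zone (only order-0 blocks free), the model takes
one direct reclaim round per order with the old flags and none with
M_GFP_NOWAIT, while kswapd is woken the same number of times in both
cases. Real reclaim and compaction cost is not modeled.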


diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 587e4cd29303..efde26cac961 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -746,10 +746,9 @@ int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma,
 }
 
 /*
- * Cap contiguous RDMA Read sink allocations at order-4.
- * Higher orders risk allocation failure under
- * __GFP_NORETRY, which would negate the benefit of the
- * contiguous fast path.
+ * Cap contiguous RDMA Read sink allocations at order-4. Higher orders risk
+ * allocation failure under GFP_NOWAIT, which would negate the benefit of
+ * the contiguous fast path.
  */
 #define SVC_RDMA_CONTIG_MAX_ORDER	4
 
@@ -758,9 +757,11 @@ int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma,
  * @nr_pages: number of pages needed
  * @order: on success, set to the allocation order
  *
- * Attempts a higher-order allocation, falling back to smaller orders.
- * The returned pages are split immediately so each sub-page has its
- * own refcount and can be freed independently.
+ * Attempts a higher-order allocation, falling back to smaller orders. The
+ * allocation is opportunistic: it takes pages only from the free lists,
+ * without direct reclaim, so it fails fast under memory pressure. The
+ * returned pages are split immediately so each sub-page has its own
+ * refcount and can be freed independently.
  *
  * Returns a pointer to the first page on success, or NULL if even
  * order-1 allocation fails.
@@ -775,8 +776,7 @@ svc_rdma_alloc_read_pages(unsigned int nr_pages, unsigned int *order)
 		SVC_RDMA_CONTIG_MAX_ORDER);
 
 	while (o >= 1) {
-		page = alloc_pages(GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN,
-				   o);
+		page = alloc_pages(GFP_NOWAIT, o);
 		if (page) {
 			split_page(page, o);
 			*order = o;
-- 
2.54.0

