From: Chuck Lever <cel@kernel.org>
To: Mike Snitzer <snitzer@kernel.org>
Cc: <linux-nfs@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	Jonathan Flynn <jonathan.flynn@hammerspace.com>
Subject: [PATCH] svcrdma: Cap Read sink allocations at PAGE_ALLOC_COSTLY_ORDER
Date: Fri,  5 Jun 2026 23:57:22 -0400
Message-ID: <20260606035722.83175-1-cel@kernel.org>

From: Chuck Lever <chuck.lever@oracle.com>

Jonathan Flynn reports that commit 18755b8c2f24 ("svcrdma: Use
contiguous pages for RDMA Read sink buffers") regresses NFS/RDMA
WRITE throughput from 73.9 GiB/s to 30.3 GiB/s on a 128-core
single-NUMA-node server driving dual 400Gb/s links with 640 nfsd
threads. In the regressed configuration, server CPU utilization
rises from 8.5% to 76%, and 73% of all server CPU cycles are spent
in native_queued_spin_lock_slowpath.

The contended lock is zone->lock. The page allocator serves
allocations only up to PAGE_ALLOC_COSTLY_ORDER (3) from its per-CPU
page lists; SVC_RDMA_CONTIG_MAX_ORDER is 4, so every contiguous
sink buffer allocation falls through to rmqueue_buddy() and
acquires the zone lock. The workload above issues roughly half a
million order-4 allocations per second, all serialized on the
single zone lock of the one NUMA node. Replacing the GFP mask with
GFP_NOWAIT did not change the profile because direct reclaim never
ran: the cycles are spent acquiring the lock, not reclaiming
memory.

Cap the allocation order at PAGE_ALLOC_COSTLY_ORDER so contiguous
sink buffer allocations remain eligible for the per-CPU page
lists, where zone lock acquisition is amortized across pcp batch
refills. An order-3 chunk still replaces eight per-page bvecs with
one.

Reported-by: Jonathan Flynn <jonathan.flynn@hammerspace.com>
Fixes: 18755b8c2f24 ("svcrdma: Use contiguous pages for RDMA Read sink buffers")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_rw.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index efde26cac961..4546e594f2d7 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -746,11 +746,12 @@ int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma,
 }
 
 /*
- * Cap contiguous RDMA Read sink allocations at order-4. Higher orders risk
- * allocation failure under GFP_NOWAIT, which would negate the benefit of
- * the contiguous fast path.
+ * Cap contiguous RDMA Read sink allocations at PAGE_ALLOC_COSTLY_ORDER.
+ * The page allocator serves allocations at or below that order from
+ * its per-CPU page lists; above it, every allocation acquires the
+ * zone lock, which serializes all nfsd threads.
  */
-#define SVC_RDMA_CONTIG_MAX_ORDER	4
+#define SVC_RDMA_CONTIG_MAX_ORDER	PAGE_ALLOC_COSTLY_ORDER
 
 /**
  * svc_rdma_alloc_read_pages - Allocate physically contiguous pages
-- 
2.54.0

