From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 569DA3D3332; Fri, 5 Jun 2026 22:31:20 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780698681; cv=none; b=ty0s0lQVMXKMi9ah7/P0xtIbXPIqqPzIYsNoBHJ1DBIK0qvl1aQfL8oDbozDq073hspp13RHrGLQWH4gDoVmHX5B9dgIsZgsQlS71GZxzwhbEwjRySOfch+eGVdLYve/to/GnSLxD7Eyd/Jj4XK4U9NAp87cZYh8aEm6KFI7mEU=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780698681; c=relaxed/simple; bh=//fl8NY8cHCuLu0F6IcD2+A7VFp0LS+mBjTKvPo0Mkg=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version; b=M+XyuaKPc/A8Scg1fGf2kps6oGqvcduSUFohJJSv2bZn9kiF9xOdvC4fbnrk1J+H6XH43D1wlst5ppf3MtSU72JNffMGbBelIFVQBfnsB3Bmg0j5SyGAOD4O1m8IO+8sKhIMAZ876P4nWkj5UX40e7qMwtq+dZx5rqT9P9DaLxk=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=XV47B6By; arc=none smtp.client-ip=100.103.45.18
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="XV47B6By"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id DBFBC1F00893; Fri, 5 Jun 2026 22:31:19 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780698680; bh=+biCh8iFNbHBCiJ/wzUPBBlfLt4ML/msKMECFhIQ4+M=; h=From:To:Cc:Subject:Date; b=XV47B6ByzDuqQZsnYNuit8tWlCitgnM/wytQSeUXQUf+YIDiV+3Q8ivZ1JpC/YRDF lmsrkJFeX8BYUDOD8cFe5nel8Y5UvhazUr5lDTkTUiGow5ODIDm0Onoid3RchCCEtU u1l9Mo2TPY9/GKbnljICeyG+XDIdxosfHOkXot/z6WCbE9F5OxzLzlXRISyWvqbBIb KkPgrmiY6hzbVaqCVWj/coXOaQdqCAOm0ru0z7d48Qz9AFEMJad72U8C6ECdXMAISl XWjY88lPHpty4gT0JfBqT8BFvKTK2oTaie6aZzOZyKKRcHrQQF9PboQgbRoaymKqrn 8WDvIAYq59DTA==
From: Chuck Lever
To: Mike Snitzer
Cc: , , Chuck Lever , Jonathan Flynn
Subject: [PATCH] svcrdma: Avoid direct reclaim when allocating Read sink buffers
Date: Fri, 5 Jun 2026 18:31:18 -0400
Message-ID: <20260605223118.75092-1-cel@kernel.org>
X-Mailer: git-send-email 2.54.0
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Chuck Lever

svc_rdma_alloc_read_pages() passes __GFP_NORETRY, which limits the
allocator to a single round of direct reclaim and asynchronous
compaction per attempt. Under memory pressure or fragmentation that
round can take a long time, and the fallback loop repeats it at each
order, multiplying the stall while the RPC waits for its Read sink
buffer.

The contiguous allocation is opportunistic: when it fails, Read sink
buffers come from the pages already in rq_pages[]. Direct reclaim
effort buys little here.

Allocate with GFP_NOWAIT instead, which omits __GFP_DIRECT_RECLAIM so
the allocator takes pages only from the free lists and returns NULL
immediately when none are available. GFP_NOWAIT retains
__GFP_KSWAPD_RECLAIM, so a failed attempt still wakes kswapd to
replenish higher-order pages in the background, and it already
includes __GFP_NOWARN. __GFP_NORETRY has no effect once direct reclaim
is off.

skb_page_frag_refill() takes the same approach for its opportunistic
high-order allocation.

Reported-by: Jonathan Flynn
Fixes: 18755b8c2f24 ("svcrdma: Use contiguous pages for RDMA Read sink buffers")
Signed-off-by: Chuck Lever
---
 net/sunrpc/xprtrdma/svc_rdma_rw.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

Given the perf symbol resolution inaccuracies I can't swear this will
fix the issue, but here's a stab at it.

diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 587e4cd29303..efde26cac961 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -746,10 +746,9 @@ int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma,
 }
 
 /*
- * Cap contiguous RDMA Read sink allocations at order-4.
- * Higher orders risk allocation failure under
- * __GFP_NORETRY, which would negate the benefit of the
- * contiguous fast path.
+ * Cap contiguous RDMA Read sink allocations at order-4. Higher orders risk
+ * allocation failure under GFP_NOWAIT, which would negate the benefit of
+ * the contiguous fast path.
  */
 #define SVC_RDMA_CONTIG_MAX_ORDER 4
 
@@ -758,9 +757,11 @@ int svc_rdma_prepare_reply_chunk(struct svcxprt_rdma *rdma,
  * @nr_pages: number of pages needed
  * @order: on success, set to the allocation order
  *
- * Attempts a higher-order allocation, falling back to smaller orders.
- * The returned pages are split immediately so each sub-page has its
- * own refcount and can be freed independently.
+ * Attempts a higher-order allocation, falling back to smaller orders. The
+ * allocation is opportunistic: it takes pages only from the free lists,
+ * without direct reclaim, so it fails fast under memory pressure. The
+ * returned pages are split immediately so each sub-page has its own
+ * refcount and can be freed independently.
  *
  * Returns a pointer to the first page on success, or NULL if even
  * order-1 allocation fails.
@@ -775,8 +776,7 @@ svc_rdma_alloc_read_pages(unsigned int nr_pages, unsigned int *order)
 		SVC_RDMA_CONTIG_MAX_ORDER);
 
 	while (o >= 1) {
-		page = alloc_pages(GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN,
-				   o);
+		page = alloc_pages(GFP_NOWAIT, o);
 		if (page) {
 			split_page(page, o);
 			*order = o;
-- 
2.54.0