From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
	willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
	ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev,
	Rik van Riel, Rik van Riel
Subject: [RFC PATCH 33/45] mm: page_alloc: refuse to taint clean SPBs for atomic NORETRY callers
Date: Thu, 30 Apr 2026 16:21:02 -0400
Message-ID: <20260430202233.111010-34-riel@surriel.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260430202233.111010-1-riel@surriel.com>
References: <20260430202233.111010-1-riel@surriel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From: Rik van Riel

get_page_from_freelist()'s atomic-allocation retry logic progressively
relaxes ALLOC_NOFRAGMENT to give atomic allocations every chance: first
it adds ALLOC_NOFRAG_TAINTED_OK (allow stealing from tainted SPBs), then
it drops ALLOC_NOFRAGMENT entirely (allow tainting clean SPBs). The
intent is that atomic allocations have no slowpath escape and need extra
room to succeed.

For callers that pass __GFP_NORETRY, this tradeoff is wrong. The NORETRY
contract is "I have a fallback; don't go to extreme lengths." The
network stack's skb_page_frag_refill, slab high-order allocations, and
similar hot-path callers all use NORETRY exactly so the allocator can
return NULL and let the caller's own fallback (smaller frag, lower-order
slab, etc.) take over.

Tainting a clean superpageblock (SPB) to satisfy such a request is a
lasting cost that outlives the single allocation that triggered it: the
SPB stays tainted for the remainder of the workload's lifetime, blocking
1 GiB hugepage allocation from that region.

Skip the relaxation steps for NORETRY callers and return NULL
immediately. Their fallback path absorbs the failure cleanly.

Observed on a 247 GB devvm running the page-superblock v18 series: an
atomic order-3 alloc from swapper context (PCP refill, gfp=0x152820 =
__GFP_HIGH | __GFP_KSWAPD_RECLAIM | __GFP_NOWARN | __GFP_NORETRY |
__GFP_COMP | __GFP_HARDWALL) tainted a fresh clean SPB roughly 90
minutes after boot despite ALLOC_NOFRAGMENT being set, because the
atomic-retry path stripped the flag. The caller had a NORETRY fallback
ready; the taint was gratuitous.

Signed-off-by: Rik van Riel
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
 mm/page_alloc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ff7755ef2b79..e8d6d5b47f63 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5895,9 +5895,20 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 	 * first: allow steal/claim from tainted SPBs only. This avoids
 	 * tainting clean SPBs while still finding pages in tainted ones.
 	 * Only drop NOFRAGMENT entirely if that also fails.
+	 *
+	 * Exception: callers that explicitly opted into failure with
+	 * __GFP_NORETRY have a fallback path of their own (a smaller
+	 * order, a different cache, returning NULL from a best-effort
+	 * cache refill, etc.). Tainting a clean superpageblock is a
+	 * lasting cost that outlives this allocation; it is not justified
+	 * to absorb it just to satisfy a caller that already has a
+	 * cheaper escape hatch. Return NULL and let the caller's fallback
+	 * run instead.
 	 */
 	if (no_fallback && !defrag_mode &&
 	    !(gfp_mask & __GFP_DIRECT_RECLAIM)) {
+		if (gfp_mask & __GFP_NORETRY)
+			return NULL;
 		if (!(alloc_flags & ALLOC_NOFRAG_TAINTED_OK)) {
 			alloc_flags |= ALLOC_NOFRAG_TAINTED_OK;
 			goto retry;
-- 
2.52.0
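
For readers who want to reason about the change outside the kernel, the
retry ladder and the new NORETRY exit can be approximated by a small
userspace model. This is only a sketch under stated assumptions, not the
kernel code: the flag bit values, struct mem_state, enum outcome and
model_alloc() are all hypothetical names invented for illustration, and
the final "drop ALLOC_NOFRAGMENT" step is inferred from the commit
message because the hunk does not show it.

```c
#include <stdbool.h>

/* Hypothetical bit values for this model only; the kernel's differ. */
#define GFP_DIRECT_RECLAIM      (1u << 0)
#define GFP_NORETRY             (1u << 1)
#define ALLOC_NOFRAGMENT        (1u << 0)
#define ALLOC_NOFRAG_TAINTED_OK (1u << 1)

/* Hypothetical summary of where free pages of the wanted order exist. */
struct mem_state {
	bool nofrag_ok;         /* satisfiable without any fallback */
	bool tainted_has_pages; /* satisfiable by stealing from a tainted SPB */
	bool clean_has_pages;   /* satisfiable only by tainting a clean SPB */
};

enum outcome {
	OUT_NULL,          /* allocation fails, caller's fallback runs */
	OUT_NO_FALLBACK,   /* served without fragmenting anything */
	OUT_FROM_TAINTED,  /* served from an already tainted SPB */
	OUT_TAINTED_CLEAN  /* served by tainting a clean SPB */
};

/* Model of the fast-path scan plus the relaxation ladder. */
enum outcome model_alloc(unsigned int gfp, const struct mem_state *m,
			 bool defrag_mode)
{
	unsigned int alloc_flags = ALLOC_NOFRAGMENT;

	for (;;) {
		bool no_fallback = alloc_flags & ALLOC_NOFRAGMENT;

		/* Stand-in for the zone scan under the current flags. */
		if (m->nofrag_ok)
			return OUT_NO_FALLBACK;
		if ((alloc_flags & ALLOC_NOFRAG_TAINTED_OK) &&
		    m->tainted_has_pages)
			return OUT_FROM_TAINTED;
		if (!no_fallback && m->clean_has_pages)
			return OUT_TAINTED_CLEAN;

		/* Only non-reclaiming (atomic) callers get the ladder. */
		if (no_fallback && !defrag_mode &&
		    !(gfp & GFP_DIRECT_RECLAIM)) {
			/* The exception added by this patch. */
			if (gfp & GFP_NORETRY)
				return OUT_NULL;
			/* Step 1: allow stealing from tainted SPBs. */
			if (!(alloc_flags & ALLOC_NOFRAG_TAINTED_OK)) {
				alloc_flags |= ALLOC_NOFRAG_TAINTED_OK;
				continue;
			}
			/* Step 2 (assumed): drop NOFRAGMENT entirely. */
			alloc_flags &= ~ALLOC_NOFRAGMENT;
			continue;
		}
		return OUT_NULL;
	}
}
```

In this model, a caller passing GFP_NORETRY against a state where only
clean SPBs have free pages gets OUT_NULL, while the same request without
GFP_NORETRY climbs both steps and ends in OUT_TAINTED_CLEAN. That is the
behavioural difference the patch is after.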