Date: Tue, 7 Apr 2026 07:45:58 +1000
From: Dave Chinner
To: Ritesh Harjani
Cc: Matthew Wilcox, linux-xfs@vger.kernel.org
Subject: Re: Hang with xfs/285 on 2026-03-02 kernel
References: <341amd4w.ritesh.list@gmail.com>

On Mon, Apr 06, 2026 at 05:57:06AM +0530, Ritesh Harjani wrote:
> > However, turning off direct reclaim should make no difference in
> > the long run because vmalloc is only trying to allocate a batch of
> > single page folios.
> >
> > If we are in low memory situations where no single page folios are
> > available, then even for a NORETRY/no direct reclaim allocation
> > the expectation is that the failed allocation attempt would be
> > kicking kswapd to perform background memory reclaim.
> >
> > This is especially true when the allocation is GFP_NOFS/GFP_NOIO
> > even with direct reclaim turned on - if all the memory is held in
> > shrinkable fs/vfs caches then direct reclaim cannot reclaim
> > anything filesystem/IO related.
>
> So, looking at the logs from Matthew, I think this case might have
> benefitted from __GFP_DIRECT_RECLAIM, because we have many clean
> inactive file pages. So theoretically, IMO direct reclaim should be
> able to use one of those clean file pages (after it gets
> direct-reclaimed).
>
> nr_zone_inactive_file 62769
> nr_zone_write_pending 0

You miss the point - this is not an isolated use case. e.g. look at
xlog_kvmalloc() - it's also a ~__GFP_DIRECT_RECLAIM, NORETRY
vmalloc() loop. What's to stop that one from getting stuck in
exactly the same way?
To that point, kvmalloc(GFP_NOFAIL) now implements the semantics that
xlog_kvmalloc() requires - it turns off direct reclaim (and hence
costly compaction) for the kmalloc() allocation attempt, then falls
back to vmalloc(GFP_NOFAIL) if kmalloc fails.

That's also pretty much the exact semantics we are trying to
implement in xfs_buf_alloc(), yes? i.e. xfs_buf_alloc() does:

For buffers < PAGE_SIZE, it calls kmalloc() directly and returns.
For buffers == PAGE_SIZE, it calls folio_alloc(GFP_KERNEL).
For buffers > PAGE_SIZE, it calls folio_alloc(NORETRY,
~__GFP_DIRECT_RECLAIM).
If either folio_alloc() call fails, it effectively runs an open
coded __vmalloc() no-fail loop.

IOWs, we are implementing essentially the same semantics as
kvmalloc(__GFP_NOFAIL), modulo the reclaim flags for the __vmalloc()
loop. If we are going to change the flags for the vmalloc() loop to
be the original, then we are essentially reimplementing
kvmalloc(GFP_NOFAIL) semantics exactly. At which point....

> > i.e. background reclaim making forwards progress is absolutely
> > necessary for any sort of "nofail" allocation loop to succeed
> > regardless of whether direct reclaim is enabled or not.
> >
> > Hence if background memory reclaim is making progress, this
> > allocation loop should eventually succeed. If the allocation is
> > not succeeding, then it implies that some critical resource in the
> > allocation path is not being refilled either on allocation failure
> > or by background reclaim, and hence the allocation failure
> > persists because nothing alleviates the resource shortage that is
> > triggering the ENOMEM issue.
>
> I agree, the background memory reclaim / kswapd thread should have
> made forward progress.
>
> I am not sure why, in this case, we are hitting hung task issues
> then. Could be because of multiple fsstress threads running in
> parallel (from ps -eax output), and maybe some other process ends up
> using the pages reclaimed by background kswapd (just a theory).
I don't think that's the case, because kswapd is supposed to run
until watermarks are reached, and that means all free page pools are
supposed to have at least some free pages in them...

That's why I think there's a reclaim bug lurking here - allocation
appears to be stalling on something that background reclaim is not
refilling. And if allocation is stalling on buffer allocation, then
it can stall in other critical parts of XFS, too. Background reclaim
not doing sufficient work to allow looping non-blocking, no-retry
allocations to succeed seems like a memory allocation/reclaim bug to
me, not an XFS issue...

> > So the question is: where in the __vmalloc allocation path is the
> > ENOMEM error being generated from, and is it the same place every
> > time?
>
> Although I can't say for sure, in this case, after looking at the
> code and knowing that we are not passing __GFP_DIRECT_RECLAIM, it
> might be returning from here (after get_page_from_freelist()
> couldn't get a free page):
>
> __alloc_pages_slowpath() {
>         ...
>         /* Caller is not willing to reclaim, we can't balance anything */
>         if (!can_direct_reclaim)
>                 goto nopage;

Sure, we can't balance anything, but we've set ALLOC_KSWAPD early in
this function, and so every time we get to the above point in the
allocation code we've already run this:

retry:
        /* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
        if (alloc_flags & ALLOC_KSWAPD)
                wake_all_kswapds(order, gfp_mask, ac);

Hence kswapds should be active and doing reclaim work to bring
everything back to minimum free pool watermarks. That *should* be
sufficient for a no-direct-reclaim allocation loop to make progress.

-Dave.
-- 
Dave Chinner
dgc@kernel.org