Date: Thu, 30 Oct 2025 21:25:43 +0000
From: Matthew Wilcox
To: Baokun Li
Cc: "Darrick J. Wong", linux-ext4@vger.kernel.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, jack@suse.cz, linux-kernel@vger.kernel.org,
	kernel@pankajraghav.com, mcgrof@kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, yi.zhang@huawei.com, yangerkun@huawei.com,
	chengzhihao1@huawei.com, libaokun1@huawei.com
Subject: Re: [PATCH 22/25] fs/buffer: prevent WARN_ON in __alloc_pages_slowpath() when BS > PS
References: <20251025032221.2905818-1-libaokun@huaweicloud.com>
 <20251025032221.2905818-23-libaokun@huaweicloud.com>

On Sat, Oct 25, 2025 at 02:32:45PM +0800, Baokun Li wrote:
> On 2025-10-25 12:45, Matthew Wilcox wrote:
> > No, absolutely not. We're not having open-coded GFP_NOFAIL semantics.
> > The right way forward is for ext4 to use iomap, not for buffer heads
> > to support large block sizes.
>
> ext4 only calls getblk_unmovable or __getblk when reading critical
> metadata. Both of these functions set __GFP_NOFAIL to ensure that
> metadata reads do not fail due to memory pressure.
>
> Both functions eventually call grow_dev_folio(), which is why we
> handle the __GFP_NOFAIL logic there. xfs_buf_alloc_backing_mem()
> has similar logic, but XFS manages its own metadata, allowing it
> to use vmalloc for memory allocation.

In today's ext4 call, we discussed various options:

1. Change folios to be potentially fragmented. This change would be
   ridiculously large and nobody thinks this is a good idea. Included
   here for completeness.

2. Separate the buffer cache from the page cache again. They were
   unified about 25 years ago, and this also feels like a very big job.

3. Duplicate the buffer cache into ext4/jbd2, remove the functionality
   that isn't needed, and make _this_ version of the buffer cache
   allocate its own memory instead of aliasing into the page cache.
   More feasible than 1 or 2; still quite a big job.

4. Pick up Catherine's work and make ext4/jbd2 use it. Seems to be
   about an equivalent amount of work to option 3.

5. Make __GFP_NOFAIL work for allocations up to 64KiB (we decided this
   was probably the practical limit of sector sizes that people
   actually want). In terms of programming, it's a one-line change
   (sketched below). But we need to sell this change to the MM people.
   I think it's doable because if we have a filesystem with 64KiB
   sectors, there will be many clean folios in the page cache which
   are 64KiB or larger.

So, we liked option 5 best.
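
Some context for anyone reading along without a tree handy: the NOFAIL
behaviour Baokun describes is baked into the getblk helpers themselves.
From memory, they look roughly like this (paraphrased from a recent
include/linux/buffer_head.h; the exact form shifts between releases):

static inline struct buffer_head *__getblk(struct block_device *bdev,
		sector_t block, unsigned size)
{
	gfp_t gfp;

	/*
	 * Inherit the bdev mapping's constraints, but never recurse
	 * into the filesystem, and never fail.
	 */
	gfp = mapping_gfp_constraint(bdev->bd_mapping, ~__GFP_FS);
	gfp |= __GFP_MOVABLE | __GFP_NOFAIL;

	return bdev_getblk(bdev, block, size, gfp);
}

static inline struct buffer_head *getblk_unmovable(struct block_device *bdev,
		sector_t block, unsigned size)
{
	gfp_t gfp;

	/* Same as __getblk(), minus __GFP_MOVABLE. */
	gfp = mapping_gfp_constraint(bdev->bd_mapping, ~__GFP_FS);
	gfp |= __GFP_NOFAIL;

	return bdev_getblk(bdev, block, size, gfp);
}

With a block size larger than the page size, bdev_getblk() ends up in
grow_dev_folio() asking the page cache for a high-order folio with
__GFP_NOFAIL set, which is what trips the warning this patch is trying
to silence.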
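
The one-line change in option 5 would then be to relax that warning.
Today the __GFP_NOFAIL path in __alloc_pages_slowpath() warns for any
costly-order request, where costly_order means
order > PAGE_ALLOC_COSTLY_ORDER, i.e. anything of 64KiB or more with
4KiB pages. Something like the following; a guess at the shape, not a
tested patch:

-		WARN_ON_ONCE_GFP(costly_order, gfp_mask);
+		/* Allow __GFP_NOFAIL up to 64KiB (order 4 with 4KiB pages) */
+		WARN_ON_ONCE_GFP(order > get_order(SZ_64K), gfp_mask);

The argument to sell it is the one above: a filesystem with 64KiB
blocks keeps plenty of clean folios of 64KiB or larger in the page
cache, so reclaim can always make such an allocation succeed
eventually.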