From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 2 Jan 2026 06:23:05 +0000
From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Nanzhe Zhao
Cc: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] [PATCH 1/2 v4] f2fs: support large folio for immutable non-compressed case
Message-ID:
References: <20251120235446.1947532-1-jaegeuk@kernel.org>
 <0153ff69-789d-4fe1-a89c-0c607a9a7d6c@kernel.org>
 <0bf2eedd-eebb-440b-96f5-72ac3a68b608@kernel.org>
In-Reply-To:

Hi Nanzhe,

On 01/01, Nanzhe Zhao wrote:
> Dear Kim:
> Happy New Year!
>
> > +static struct f2fs_folio_state *
> > +ffs_find_or_alloc(struct folio *folio)
> > +{
> > +	struct f2fs_folio_state *ffs = folio->private;
> > +
> > +	if (ffs)
> > +		return ffs;
> > +
> > +	ffs = f2fs_kmem_cache_alloc(ffs_entry_slab, GFP_NOIO, true, NULL);
> > +
> > +	spin_lock_init(&ffs->state_lock);
> > +	folio_attach_private(folio, ffs);
> > +	return ffs;
> > +}
>
> It looks like ffs_find_or_alloc() does not initialize
> read_pages_pending. When I debugged locally, printing
> read_pages_pending showed an undefined random value. Also, when I ran
> a basic read test with dd, tasks could hang: read_pages_pending never
> reaches zero, so the folio is never unlocked and never marked
> uptodate.
>
> I know this function is modeled after iomap's ifs_alloc():
>
> static struct iomap_folio_state *ifs_alloc(struct inode *inode,
> 		struct folio *folio, unsigned int flags)
> {
> 	struct iomap_folio_state *ifs = folio->private;
> 	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
> 	gfp_t gfp;
>
> 	if (ifs || nr_blocks <= 1)
> 		return ifs;
> 	/*...*/
> 	/*
> 	 * ifs->state tracks two sets of state flags when the
> 	 * filesystem block size is smaller than the folio size.
> 	 * The first state tracks per-block uptodate and the
> 	 * second tracks per-block dirty state.
> 	 */
> 	ifs = kzalloc(struct_size(ifs, state,
> 			BITS_TO_LONGS(2 * nr_blocks)), gfp);
> 	if (!ifs)
> 		return ifs;
>
> 	spin_lock_init(&ifs->state_lock);
> 	if (folio_test_uptodate(folio))
> 		bitmap_set(ifs->state, 0, nr_blocks);
> 	if (folio_test_dirty(folio))
> 		bitmap_set(ifs->state, nr_blocks, nr_blocks);
> 	folio_attach_private(folio, ifs);
>
> 	return ifs;
> }
>
> Note that ifs_alloc() uses kzalloc(), which zero-initializes the
> allocated memory, while f2fs_kmem_cache_alloc() does not.
>
> We could fix this by explicitly setting read_pages_pending = 0, or by
> doing a memset() right after f2fs_kmem_cache_alloc() (the latter seems
> more extensible if the struct grows). What do you think?

Agreed. What about adding __GFP_ZERO to the flags passed to
f2fs_kmem_cache_alloc()?

> > +	/*
> > +	 * This page will go to BIO. Do we need to send this
> > +	 * BIO off first?
> > +	 */
> > +	if (bio && (!page_is_mergeable(F2FS_I_SB(inode), bio,
> > +				last_block_in_bio, block_nr) ||
> > +			!f2fs_crypt_mergeable_bio(bio, inode, index, NULL))) {
> > +submit_and_realloc:
> > +		f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA);
> > +		bio = NULL;
> > +	}
> > +	if (bio == NULL)
> > +		bio = f2fs_grab_read_bio(inode, block_nr,
> > +				max_nr_pages,
> > +				f2fs_ra_op_flags(rac),
> > +				index, false);
> > +
> > +	/*
> > +	 * If the page is under writeback, we need to wait for
> > +	 * its completion to see the correct decrypted data.
> > +	 */
> > +	f2fs_wait_on_block_writeback(inode, block_nr);
> > +
> > +	if (!bio_add_folio(bio, folio, F2FS_BLKSIZE,
> > +			offset << PAGE_SHIFT))
> > +		goto submit_and_realloc;
> > +
> > +	if (folio_test_large(folio)) {
> > +		ffs = ffs_find_or_alloc(folio);
> > +
> > +		/* set the bitmap to wait */
> > +		spin_lock_irq(&ffs->state_lock);
> > +		ffs->read_pages_pending++;
> > +		spin_unlock_irq(&ffs->state_lock);
> > +	}
>
> In the current code, a subpage is added to the BIO (or a cached BIO is
> submitted) before read_pages_pending is incremented. This can cause
> the following behaviour:
>
> After one subpage of a folio is submitted, if the I/O completes very
> fast, the endio path may interrupt the read loop, run bio_endio(), and
> eventually call f2fs_finish_read_bio(), which decrements
> read_pages_pending down to zero. That can make folio_finish_read() run
> too early, even though other parts of the same folio have not been
> added to a BIO yet.
>
> I managed to trigger this locally by creating a heavily fragmented
> file and temporarily injecting the following code right after BIO
> submission:
>
> 	f2fs_io_schedule_timeout(1);
> 	WARN_ON_ONCE(!folio_test_locked(folio));
>
> I think the correct ordering is to increment read_pages_pending first,
> and then add the corresponding subpage to the BIO. With that ordering,
> the BIO side will either:
> 1) add a subpage after the increment (matching the new pending count),
>    or
> 2) submit a BIO that corresponds to the pending increment from the
>    **previous iteration**,
> so read_pages_pending will not reach zero prematurely. This is exactly
> the order that iomap_readpage_iter() implements.
>
> If you need the script I used to reproduce the bug, please let me
> know. I will attach it in my next reply. Thanks!

I think this is also valid. If possible, could you please post patches
to fix these two bugs?
Thanks,

>
> Best regards,
> Nanzhe Zhao

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel