Re: [f2fs-dev] [PATCH 1/2 v4] f2fs: support large folio for immutable non-compressed case

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jaegeuk Kim via Linux-f2fs-devel <linux-f2fs-devel@lists.sourceforge.net>
To: Nanzhe Zhao <nzzhao@126.com>
Cc: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] [PATCH 1/2 v4] f2fs: support large folio for immutable non-compressed case
Date: Fri, 2 Jan 2026 06:23:05 +0000	[thread overview]
Message-ID: <aVdkSZeuwzsNq7pE@google.com> (raw)
In-Reply-To: <a7f7efde-53e3-48c3-9caf-9410b018b1e1@126.com>

Hi Nanzhe,

On 01/01, Nanzhe Zhao wrote:
> Dear Kim:
> Happy New Year!
> 
> > +static struct f2fs_folio_state *
> > +ffs_find_or_alloc(struct folio *folio)
> > +{
> > +	struct f2fs_folio_state *ffs = folio->private;
> > +
> > +	if (ffs)
> > +		return ffs;
> > +
> > +	ffs = f2fs_kmem_cache_alloc(ffs_entry_slab, GFP_NOIO, true, NULL);
> > +
> > +	spin_lock_init(&ffs->state_lock);
> > +	folio_attach_private(folio, ffs);
> > +	return ffs;
> > +}
> 
> It looks like ffs_find_or_alloc() does not initialize
> read_pages_pending.
> When I debug locally, printing read_pages_pending shows an undefined
> random value. Also, when I run a basic read test with dd, tasks can hang
> (because read_pages_pending never reaches zero, so the folio is never
> unlocked and never marked uptodate).
> 
> I know this function is modeled after iomap's ifs_alloc():
> 
> static struct iomap_folio_state *ifs_alloc(struct inode *inode,
> 					struct folio *folio, unsigned int flags)
> {
> 	struct iomap_folio_state *ifs = folio->private;
> 	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
> 	gfp_t gfp;
> 
> 	if (ifs || nr_blocks <= 1)
> 		return ifs;
> 	/*...*/
> 	/*
> 	 * ifs->state tracks two sets of state flags when the
> 	 * filesystem block size is smaller than the folio size.
> 	 * The first state tracks per-block uptodate and the
> 	 * second tracks per-block dirty state.
> 	 */
> 	ifs = kzalloc(struct_size(ifs, state,
> 		      BITS_TO_LONGS(2 * nr_blocks)), gfp);
> 	if (!ifs)
> 		return ifs;
> 
> 	spin_lock_init(&ifs->state_lock);
> 	if (folio_test_uptodate(folio))
> 		bitmap_set(ifs->state, 0, nr_blocks);
> 	if (folio_test_dirty(folio))
> 		bitmap_set(ifs->state, nr_blocks, nr_blocks);
> 	folio_attach_private(folio, ifs);
> 
> 	return ifs;
> }
> 
> Note ifs_alloc() uses kzalloc(), which zero-initializes the allocated memory
> by default while f2fs_kmem_cache_alloc() does not.
> 
> We could fix this by explicitly setting read_pages_pending = 0,
> or by doing a memset() right after f2fs_kmem_cache_alloc()
> (the latter seems more extensible if the struct grows). What do you think?

Agreed. What about adding __GFP_ZERO for f2fs_kmem_cache_alloc()?

> 
> > 		/*
> > +		 * This page will go to BIO.  Do we need to send this
> > +		 * BIO off first?
> > +		 */
> > +		if (bio && (!page_is_mergeable(F2FS_I_SB(inode), bio,
> > +					last_block_in_bio, block_nr) ||
> > +			!f2fs_crypt_mergeable_bio(bio, inode, index, NULL))) {
> > +submit_and_realloc:
> > +			f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA);
> > +			bio = NULL;
> > +		}
> > +		if (bio == NULL)
> > +			bio = f2fs_grab_read_bio(inode, block_nr,
> > +					max_nr_pages,
> > +					f2fs_ra_op_flags(rac),
> > +					index, false);
> > +
> > +		/*
> > +		 * If the page is under writeback, we need to wait for
> > +		 * its completion to see the correct decrypted data.
> > +		 */
> > +		f2fs_wait_on_block_writeback(inode, block_nr);
> > +
> > +		if (!bio_add_folio(bio, folio, F2FS_BLKSIZE,
> > +					offset << PAGE_SHIFT))
> > +			goto submit_and_realloc;
> > +
> > +		if (folio_test_large(folio)) {
> > +			ffs = ffs_find_or_alloc(folio);
> > +
> > +			/* set the bitmap to wait */
> > +			spin_lock_irq(&ffs->state_lock);
> > +			ffs->read_pages_pending++;
> > +			spin_unlock_irq(&ffs->state_lock);
> > +		}
> 
> In the current code, it looks like a subpage is added to the BIO (or a
> cached BIO is submitted) before read_pages_pending is incremented.
> This can cause the following behaviour:
> 
> After one subpage of a folio is submitted, if the I/O completes very
> fast, the endio path may interrupt the read loop, run bio_endio, and
> eventually call f2fs_finish_read_bio(), which decrements read_pages_pending
> down to zero. That can make folio_finish_read() run too early, even though
> other parts of the same folio have not been added to a BIO yet.
> 
> I managed to trigger this locally by creating a heavily fragmented file
> and temporarily injecting the following code right after BIO submission:
> 
> 	f2fs_io_schedule_timeout(1);
> 	WARN_ON_ONCE(!folio_test_locked(folio));
> 
> I think the correct ordering is to increment read_pages_pending first,
> and then add the corresponding subpage to the BIO.
> In that ordering, the BIO side will either:
>   1) add a subpage after the increment (matching the new pending count),
> or
>   2) submit a BIO that corresponds to the pending increment from the
>      ** previous iteration **,
> so read_pages_pending will not reach zero prematurely.
> This is exactly the order that iomap_readpage_iter() implements.
> 
> If you need the script I used to reproduce the bug, please let me know.
> I will attach it in my next reply. Thanks!

I think this is also valid. If possible, could you please post patches to
fix these two bugs?

Thanks,

> 
> Best regards,
> Nanzhe Zhao
> 
> 
> 
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

WARNING: multiple messages have this Message-ID (diff)

From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Nanzhe Zhao <nzzhao@126.com>
Cc: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] [PATCH 1/2 v4] f2fs: support large folio for immutable non-compressed case
Date: Fri, 2 Jan 2026 06:23:05 +0000	[thread overview]
Message-ID: <aVdkSZeuwzsNq7pE@google.com> (raw)
In-Reply-To: <a7f7efde-53e3-48c3-9caf-9410b018b1e1@126.com>

Hi Nanzhe,

On 01/01, Nanzhe Zhao wrote:
> Dear Kim:
> Happy New Year!
> 
> > +static struct f2fs_folio_state *
> > +ffs_find_or_alloc(struct folio *folio)
> > +{
> > +	struct f2fs_folio_state *ffs = folio->private;
> > +
> > +	if (ffs)
> > +		return ffs;
> > +
> > +	ffs = f2fs_kmem_cache_alloc(ffs_entry_slab, GFP_NOIO, true, NULL);
> > +
> > +	spin_lock_init(&ffs->state_lock);
> > +	folio_attach_private(folio, ffs);
> > +	return ffs;
> > +}
> 
> It looks like ffs_find_or_alloc() does not initialize
> read_pages_pending.
> When I debug locally, printing read_pages_pending shows an undefined
> random value. Also, when I run a basic read test with dd, tasks can hang
> (because read_pages_pending never reaches zero, so the folio is never
> unlocked and never marked uptodate).
> 
> I know this function is modeled after iomap's ifs_alloc():
> 
> static struct iomap_folio_state *ifs_alloc(struct inode *inode,
> 					struct folio *folio, unsigned int flags)
> {
> 	struct iomap_folio_state *ifs = folio->private;
> 	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
> 	gfp_t gfp;
> 
> 	if (ifs || nr_blocks <= 1)
> 		return ifs;
> 	/*...*/
> 	/*
> 	 * ifs->state tracks two sets of state flags when the
> 	 * filesystem block size is smaller than the folio size.
> 	 * The first state tracks per-block uptodate and the
> 	 * second tracks per-block dirty state.
> 	 */
> 	ifs = kzalloc(struct_size(ifs, state,
> 		      BITS_TO_LONGS(2 * nr_blocks)), gfp);
> 	if (!ifs)
> 		return ifs;
> 
> 	spin_lock_init(&ifs->state_lock);
> 	if (folio_test_uptodate(folio))
> 		bitmap_set(ifs->state, 0, nr_blocks);
> 	if (folio_test_dirty(folio))
> 		bitmap_set(ifs->state, nr_blocks, nr_blocks);
> 	folio_attach_private(folio, ifs);
> 
> 	return ifs;
> }
> 
> Note ifs_alloc() uses kzalloc(), which zero-initializes the allocated memory
> by default while f2fs_kmem_cache_alloc() does not.
> 
> We could fix this by explicitly setting read_pages_pending = 0,
> or by doing a memset() right after f2fs_kmem_cache_alloc()
> (the latter seems more extensible if the struct grows). What do you think?

Agreed. What about adding __GFP_ZERO for f2fs_kmem_cache_alloc()?

> 
> > 		/*
> > +		 * This page will go to BIO.  Do we need to send this
> > +		 * BIO off first?
> > +		 */
> > +		if (bio && (!page_is_mergeable(F2FS_I_SB(inode), bio,
> > +					last_block_in_bio, block_nr) ||
> > +			!f2fs_crypt_mergeable_bio(bio, inode, index, NULL))) {
> > +submit_and_realloc:
> > +			f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA);
> > +			bio = NULL;
> > +		}
> > +		if (bio == NULL)
> > +			bio = f2fs_grab_read_bio(inode, block_nr,
> > +					max_nr_pages,
> > +					f2fs_ra_op_flags(rac),
> > +					index, false);
> > +
> > +		/*
> > +		 * If the page is under writeback, we need to wait for
> > +		 * its completion to see the correct decrypted data.
> > +		 */
> > +		f2fs_wait_on_block_writeback(inode, block_nr);
> > +
> > +		if (!bio_add_folio(bio, folio, F2FS_BLKSIZE,
> > +					offset << PAGE_SHIFT))
> > +			goto submit_and_realloc;
> > +
> > +		if (folio_test_large(folio)) {
> > +			ffs = ffs_find_or_alloc(folio);
> > +
> > +			/* set the bitmap to wait */
> > +			spin_lock_irq(&ffs->state_lock);
> > +			ffs->read_pages_pending++;
> > +			spin_unlock_irq(&ffs->state_lock);
> > +		}
> 
> In the current code, it looks like a subpage is added to the BIO (or a
> cached BIO is submitted) before read_pages_pending is incremented.
> This can cause the following behaviour:
> 
> After one subpage of a folio is submitted, if the I/O completes very
> fast, the endio path may interrupt the read loop, run bio_endio, and
> eventually call f2fs_finish_read_bio(), which decrements read_pages_pending
> down to zero. That can make folio_finish_read() run too early, even though
> other parts of the same folio have not been added to a BIO yet.
> 
> I managed to trigger this locally by creating a heavily fragmented file
> and temporarily injecting the following code right after BIO submission:
> 
> 	f2fs_io_schedule_timeout(1);
> 	WARN_ON_ONCE(!folio_test_locked(folio));
> 
> I think the correct ordering is to increment read_pages_pending first,
> and then add the corresponding subpage to the BIO.
> In that ordering, the BIO side will either:
>   1) add a subpage after the increment (matching the new pending count),
> or
>   2) submit a BIO that corresponds to the pending increment from the
>      ** previous iteration **,
> so read_pages_pending will not reach zero prematurely.
> This is exactly the order that iomap_readpage_iter() implements.
> 
> If you need the script I used to reproduce the bug, please let me know.
> I will attach it in my next reply. Thanks!

I think this is also valid. If possible, could you please post patches to
fix these two bugs?

Thanks,

> 
> Best regards,
> Nanzhe Zhao
> 
> 
> 
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

next prev parent reply	other threads:[~2026-01-02  6:23 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-20 23:54 [f2fs-dev] [PATCH 1/2] f2fs: support large folio for immutable non-compressed case Jaegeuk Kim via Linux-f2fs-devel
2025-11-20 23:54 ` Jaegeuk Kim
2025-11-20 23:54 ` [f2fs-dev] [PATCH 2/2] f2fs: add a tracepoint to see large folio read submission Jaegeuk Kim via Linux-f2fs-devel
2025-11-20 23:54   ` Jaegeuk Kim
2025-11-21 10:23   ` [f2fs-dev] " Chao Yu via Linux-f2fs-devel
2025-11-21 10:23     ` Chao Yu
2025-11-21 10:20 ` [f2fs-dev] [PATCH 1/2] f2fs: support large folio for immutable non-compressed case Chao Yu via Linux-f2fs-devel
2025-11-21 10:20   ` Chao Yu
2025-11-22  1:17   ` Jaegeuk Kim via Linux-f2fs-devel
2025-11-22  1:17     ` Jaegeuk Kim
2025-11-25  1:38     ` Chao Yu via Linux-f2fs-devel
2025-11-25  1:38       ` Chao Yu
2025-12-01 19:31   ` [f2fs-dev] [PATCH 1/2 v2] " Jaegeuk Kim via Linux-f2fs-devel
2025-12-01 19:31     ` Jaegeuk Kim
2025-12-01 21:37     ` Chao Yu via Linux-f2fs-devel
2025-12-01 21:37       ` Chao Yu
2025-12-01 22:30     ` [f2fs-dev] [PATCH 1/2 v3] " Jaegeuk Kim via Linux-f2fs-devel
2025-12-01 22:30       ` Jaegeuk Kim
2025-12-01 22:37       ` Chao Yu via Linux-f2fs-devel
2025-12-01 22:37         ` Chao Yu
2025-12-02  2:38       ` [f2fs-dev] [PATCH 1/2 v4] " Jaegeuk Kim via Linux-f2fs-devel
2025-12-02  2:38         ` Jaegeuk Kim
2025-12-02 18:07         ` Chao Yu via Linux-f2fs-devel
2025-12-02 18:07           ` Chao Yu
2025-12-09  8:32         ` Chao Yu via Linux-f2fs-devel
2025-12-09  8:32           ` Chao Yu
2025-12-09 18:38           ` Jaegeuk Kim via Linux-f2fs-devel
2025-12-09 18:38             ` Jaegeuk Kim
2026-01-01 11:20             ` Nanzhe Zhao
2026-01-01 11:20               ` Nanzhe Zhao
2026-01-02  6:23               ` Jaegeuk Kim via Linux-f2fs-devel [this message]
2026-01-02  6:23                 ` Jaegeuk Kim
2026-01-03 10:54                 ` Nanzhe Zhao
2026-01-03 10:54                   ` Nanzhe Zhao
2026-01-04  3:20                   ` Jaegeuk Kim via Linux-f2fs-devel
2026-01-04  3:20                     ` Jaegeuk Kim
2025-11-22  1:18 ` [f2fs-dev] [PATCH 1/2 v2] " Jaegeuk Kim via Linux-f2fs-devel
2025-11-22  1:18   ` Jaegeuk Kim
2025-12-16 19:20 ` [f2fs-dev] [PATCH 1/2] " patchwork-bot+f2fs--- via Linux-f2fs-devel
2025-12-16 19:20   ` patchwork-bot+f2fs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aVdkSZeuwzsNq7pE@google.com \
    --to=linux-f2fs-devel@lists.sourceforge.net \
    --cc=jaegeuk@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nzzhao@126.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.