From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 2 Jan 2026 06:23:05 +0000
From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Nanzhe Zhao
Cc: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] [PATCH 1/2 v4] f2fs: support large folio for immutable non-compressed case
Message-ID:
References: <20251120235446.1947532-1-jaegeuk@kernel.org>
 <0153ff69-789d-4fe1-a89c-0c607a9a7d6c@kernel.org>
 <0bf2eedd-eebb-440b-96f5-72ac3a68b608@kernel.org>
In-Reply-To:

Hi Nanzhe,

On 01/01, Nanzhe Zhao wrote:
> Dear Kim:
> Happy New Year!
>
> > +static struct f2fs_folio_state *
> > +ffs_find_or_alloc(struct folio *folio)
> > +{
> > +	struct f2fs_folio_state *ffs = folio->private;
> > +
> > +	if (ffs)
> > +		return ffs;
> > +
> > +	ffs = f2fs_kmem_cache_alloc(ffs_entry_slab, GFP_NOIO, true, NULL);
> > +
> > +	spin_lock_init(&ffs->state_lock);
> > +	folio_attach_private(folio, ffs);
> > +	return ffs;
> > +}
>
> It looks like ffs_find_or_alloc() does not initialize
> read_pages_pending. When I debugged locally, printing
> read_pages_pending showed an undefined random value. Also, when I ran
> a basic read test with dd, tasks could hang: read_pages_pending never
> reaches zero, so the folio is never unlocked and never marked
> uptodate.
>
> I know this function is modeled after iomap's ifs_alloc():
>
> static struct iomap_folio_state *ifs_alloc(struct inode *inode,
> 		struct folio *folio, unsigned int flags)
> {
> 	struct iomap_folio_state *ifs = folio->private;
> 	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
> 	gfp_t gfp;
>
> 	if (ifs || nr_blocks <= 1)
> 		return ifs;
> 	/*...*/
> 	/*
> 	 * ifs->state tracks two sets of state flags when the
> 	 * filesystem block size is smaller than the folio size.
> 	 * The first state tracks per-block uptodate and the
> 	 * second tracks per-block dirty state.
> 	 */
> 	ifs = kzalloc(struct_size(ifs, state,
> 			BITS_TO_LONGS(2 * nr_blocks)), gfp);
> 	if (!ifs)
> 		return ifs;
>
> 	spin_lock_init(&ifs->state_lock);
> 	if (folio_test_uptodate(folio))
> 		bitmap_set(ifs->state, 0, nr_blocks);
> 	if (folio_test_dirty(folio))
> 		bitmap_set(ifs->state, nr_blocks, nr_blocks);
> 	folio_attach_private(folio, ifs);
>
> 	return ifs;
> }
>
> Note that ifs_alloc() uses kzalloc(), which zero-initializes the
> allocated memory, while f2fs_kmem_cache_alloc() does not.
>
> We could fix this by explicitly setting read_pages_pending = 0, or by
> doing a memset() right after f2fs_kmem_cache_alloc() (the latter seems
> more extensible if the struct grows). What do you think?

Agreed. What about adding __GFP_ZERO to the flags passed to
f2fs_kmem_cache_alloc()?

> > +	/*
> > +	 * This page will go to BIO. Do we need to send this
> > +	 * BIO off first?
> > +	 */
> > +	if (bio && (!page_is_mergeable(F2FS_I_SB(inode), bio,
> > +				last_block_in_bio, block_nr) ||
> > +			!f2fs_crypt_mergeable_bio(bio, inode, index, NULL))) {
> > +submit_and_realloc:
> > +		f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA);
> > +		bio = NULL;
> > +	}
> > +	if (bio == NULL)
> > +		bio = f2fs_grab_read_bio(inode, block_nr,
> > +				max_nr_pages,
> > +				f2fs_ra_op_flags(rac),
> > +				index, false);
> > +
> > +	/*
> > +	 * If the page is under writeback, we need to wait for
> > +	 * its completion to see the correct decrypted data.
> > +	 */
> > +	f2fs_wait_on_block_writeback(inode, block_nr);
> > +
> > +	if (!bio_add_folio(bio, folio, F2FS_BLKSIZE,
> > +			offset << PAGE_SHIFT))
> > +		goto submit_and_realloc;
> > +
> > +	if (folio_test_large(folio)) {
> > +		ffs = ffs_find_or_alloc(folio);
> > +
> > +		/* set the bitmap to wait */
> > +		spin_lock_irq(&ffs->state_lock);
> > +		ffs->read_pages_pending++;
> > +		spin_unlock_irq(&ffs->state_lock);
> > +	}
>
> In the current code, a subpage is added to the BIO (or a cached BIO is
> submitted) before read_pages_pending is incremented. This can cause
> the following behaviour:
>
> After one subpage of a folio is submitted, if the I/O completes very
> fast, the endio path may interrupt the read loop, run bio_endio(), and
> eventually call f2fs_finish_read_bio(), which decrements
> read_pages_pending down to zero. That can make folio_finish_read() run
> too early, even though other parts of the same folio have not been
> added to a BIO yet.
>
> I managed to trigger this locally by creating a heavily fragmented
> file and temporarily injecting the following code right after BIO
> submission:
>
> 	f2fs_io_schedule_timeout(1);
> 	WARN_ON_ONCE(!folio_test_locked(folio));
>
> I think the correct ordering is to increment read_pages_pending first,
> and then add the corresponding subpage to the BIO. With that ordering,
> the BIO side will either:
> 1) add a subpage after the increment (matching the new pending count),
>    or
> 2) submit a BIO that corresponds to the pending increment from the
>    **previous iteration**,
> so read_pages_pending will not reach zero prematurely. This is exactly
> the order that iomap_readpage_iter() implements.
>
> If you need the script I used to reproduce the bug, please let me
> know. I will attach it in my next reply. Thanks!

I think this is also valid. If possible, could you please post patches
to fix these two bugs?
Thanks,

>
> Best regards,
> Nanzhe Zhao

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel