From: Brian Foster <bfoster@redhat.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: brauner@kernel.org, miklos@szeredi.hu, djwong@kernel.org,
hch@infradead.org, hsiangkao@linux.alibaba.com,
linux-block@vger.kernel.org, gfs2@lists.linux.dev,
linux-fsdevel@vger.kernel.org, kernel-team@meta.com,
linux-xfs@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v5 07/14] iomap: track pending read bytes more optimally
Date: Fri, 24 Oct 2025 13:14:09 -0400 [thread overview]
Message-ID: <aPuz4Uop66-jRpN-@bfoster> (raw)
In-Reply-To: <CAJnrk1b3bHYhbW9q0r4A0NjnMNEbtCFExosAL_rUoBupr1mO3Q@mail.gmail.com>
On Fri, Oct 24, 2025 at 09:25:13AM -0700, Joanne Koong wrote:
> On Thu, Oct 23, 2025 at 5:01 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Thu, Oct 23, 2025 at 12:30 PM Brian Foster <bfoster@redhat.com> wrote:
> > >
> > > On Thu, Sep 25, 2025 at 05:26:02PM -0700, Joanne Koong wrote:
> > > > Instead of incrementing read_bytes_pending for every folio range read in
> > > > (which requires acquiring the spinlock to do so), set read_bytes_pending
> > > > to the folio size when the first range is asynchronously read in, keep
> > > > track of how many bytes total are asynchronously read in, and adjust
> > > > read_bytes_pending accordingly after issuing requests to read in all the
> > > > necessary ranges.
> > > >
> > > > iomap_read_folio_ctx->cur_folio_in_bio can be removed since a non-zero
> > > > value for pending bytes necessarily indicates the folio is in the bio.
> > > >
> > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > > Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
> > > > ---
> > >
> > > Hi Joanne,
> > >
> > > I was throwing some extra testing at the vfs-6.19.iomap branch given the
> > > little merge conflict thing with iomap_iter_advance(). I ended up hitting
> > > what appears to be a lockup on XFS with 1k FSB (-bsize=1k) running
> > > generic/051. It reproduces fairly reliably within a few iterations or so
> > > and seems to always stall during a read for a dedupe operation:
> > >
> > > task:fsstress state:D stack:0 pid:12094 tgid:12094 ppid:12091 task_flags:0x400140 flags:0x00080003
> > > Call Trace:
> > > <TASK>
> > > __schedule+0x2fc/0x7a0
> > > schedule+0x27/0x80
> > > io_schedule+0x46/0x70
> > > folio_wait_bit_common+0x12b/0x310
> > > ? __pfx_wake_page_function+0x10/0x10
> > > ? __pfx_xfs_vm_read_folio+0x10/0x10 [xfs]
> > > filemap_read_folio+0x85/0xd0
> > > ? __pfx_xfs_vm_read_folio+0x10/0x10 [xfs]
> > > do_read_cache_folio+0x7c/0x1b0
> > > vfs_dedupe_file_range_compare.constprop.0+0xaf/0x2d0
> > > __generic_remap_file_range_prep+0x276/0x2a0
> > > generic_remap_file_range_prep+0x10/0x20
> > > xfs_reflink_remap_prep+0x22c/0x300 [xfs]
> > > xfs_file_remap_range+0x84/0x360 [xfs]
> > > vfs_dedupe_file_range_one+0x1b2/0x1d0
> > > ? remap_verify_area+0x46/0x140
> > > vfs_dedupe_file_range+0x162/0x220
> > > do_vfs_ioctl+0x4d1/0x940
> > > __x64_sys_ioctl+0x75/0xe0
> > > do_syscall_64+0x84/0x800
> > > ? do_syscall_64+0xbb/0x800
> > > ? avc_has_perm_noaudit+0x6b/0xf0
> > > ? _copy_to_user+0x31/0x40
> > > ? cp_new_stat+0x130/0x170
> > > ? __do_sys_newfstat+0x44/0x70
> > > ? do_syscall_64+0xbb/0x800
> > > ? do_syscall_64+0xbb/0x800
> > > ? clear_bhb_loop+0x30/0x80
> > > ? clear_bhb_loop+0x30/0x80
> > > entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > RIP: 0033:0x7fe6bbd9a14d
> > > RSP: 002b:00007ffde72cd4e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > > RAX: ffffffffffffffda RBX: 0000000000000068 RCX: 00007fe6bbd9a14d
> > > RDX: 000000000a1394b0 RSI: 00000000c0189436 RDI: 0000000000000004
> > > RBP: 00007ffde72cd530 R08: 0000000000001000 R09: 000000000a11a3fc
> > > R10: 000000000001d6c0 R11: 0000000000000246 R12: 000000000a12cfb0
> > > R13: 000000000a12ba10 R14: 000000000a14e610 R15: 0000000000019000
> > > </TASK>
> > >
> > > It wasn't immediately clear to me what the issue was so I bisected and
> > > it landed on this patch. It kind of looks like we're failing to unlock a
> > > folio at some point and then tripping over it later..? I can kill the
> > > fsstress process but then the umount ultimately gets stuck tossing
> > > pagecache [1], so the mount still ends up stuck indefinitely. Anyways,
> > > I'll poke at it some more but I figure you might be able to make sense
> > > of this faster than I can.
> > >
> > > Brian
> >
> > Hi Brian,
> >
> > Thanks for your report and the repro instructions. I will look into
> > this and report back what I find.
>
> This is the fix:
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 4e6258fdb915..aa46fec8362d 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -445,6 +445,9 @@ static void iomap_read_end(struct folio *folio, size_t bytes_pending)
>  	bool end_read, uptodate;
>  	size_t bytes_accounted = folio_size(folio) - bytes_pending;
>  
> +	if (!bytes_accounted)
> +		return;
> +
>  	spin_lock_irq(&ifs->state_lock);
>
>
> What I missed was that if all the bytes in the folio are non-uptodate
> and need to be read in by the filesystem, then there's a bug where the
> read will be ended on the folio twice (once in iomap_read_end() and
> once when the filesystem calls iomap_finish_folio_read(), when only the
> filesystem should end the read), which does two folio unlocks and ends
> up locking up the folio. Looking at the writeback patch that does a
> similar optimization [1], I missed the same thing there.
>
Makes sense... though a short comment wouldn't hurt in there. ;) I found
myself a little confused by the accounted vs. pending naming when
reading through that code. If I follow correctly, "accounted" refers to
the additional bytes accounted to read_bytes_pending via the init (where
it just accounts the whole folio up front), and "pending" refers to the
bytes covered by submitted I/O.
Presumably that extra accounting also doubles as the typical "don't
complete the op before the submitter is done processing" extra
reference, except in this full-submit case, of course. If so, that's
subtle enough in my mind that a sentence or two on it wouldn't hurt.
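
FWIW, here's roughly how I'm picturing the lifecycle, as a hand-wavy
pseudocode sketch of my reading of the patch (not the actual code;
read_in_nonuptodate_ranges() is a made-up placeholder and the other
names are approximate), mainly to spell out why the full-submit case
has to bail out early:

	/* submit side (iomap_read_folio()/readahead, roughly): */

	/* init: account the whole folio up front */
	ifs->read_bytes_pending = folio_size(folio);

	/* read in the non-uptodate ranges, summing up how many bytes
	 * were actually handed off for async I/O */
	bytes_pending = read_in_nonuptodate_ranges(folio);

	/* submitter drops whatever no I/O completion will account for */
	iomap_read_end(folio, bytes_pending);

	static void iomap_read_end(struct folio *folio, size_t bytes_pending)
	{
		size_t bytes_accounted = folio_size(folio) - bytes_pending;
		bool end_read, uptodate;

		/*
		 * Everything was submitted, so the submitter holds no
		 * extra accounting; ending the read here as well would
		 * double up with the I/O completion and unlock the folio
		 * twice -- hence the early return in the fix above.
		 */
		if (!bytes_accounted)
			return;

		spin_lock_irq(&ifs->state_lock);
		ifs->read_bytes_pending -= bytes_accounted;
		end_read = !ifs->read_bytes_pending;
		uptodate = ifs_is_fully_uptodate(folio, ifs);
		spin_unlock_irq(&ifs->state_lock);

		if (end_read)
			folio_end_read(folio, uptodate);
	}

If that's in the right ballpark, a comment along the lines of the one
above the early return would be plenty for me.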
> I'll fix up both. Thanks for catching this and bisecting it down to
> this patch. Sorry for the trouble.
>
No prob. Thanks for the fix!
Brian
> Thanks,
> Joanne
>
> [1] https://lore.kernel.org/linux-fsdevel/20251009225611.3744728-4-joannelkoong@gmail.com/
> >
> > Thanks,
> > Joanne
> > >
>
Thread overview: 28+ messages
2025-09-26 0:25 [PATCH v5 00/14] fuse: use iomap for buffered reads + readahead Joanne Koong
2025-09-26 0:25 ` [PATCH v5 01/14] iomap: move bio read logic into helper function Joanne Koong
2025-09-26 0:25 ` [PATCH v5 02/14] iomap: move read/readahead bio submission " Joanne Koong
2025-09-26 0:25 ` [PATCH v5 03/14] iomap: store read/readahead bio generically Joanne Koong
2025-09-26 0:25 ` [PATCH v5 04/14] iomap: iterate over folio mapping in iomap_readpage_iter() Joanne Koong
2025-09-26 0:26 ` [PATCH v5 05/14] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter() Joanne Koong
2025-09-26 0:26 ` [PATCH v5 06/14] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx Joanne Koong
2025-09-26 0:26 ` [PATCH v5 07/14] iomap: track pending read bytes more optimally Joanne Koong
2025-10-23 19:34 ` Brian Foster
2025-10-24 0:01 ` Joanne Koong
2025-10-24 16:25 ` Joanne Koong
2025-10-24 17:14 ` Brian Foster [this message]
2025-10-24 19:48 ` Joanne Koong
2025-10-24 21:55 ` Joanne Koong
2025-10-27 12:16 ` Brian Foster
2025-10-24 17:21 ` Matthew Wilcox
2025-10-24 19:22 ` Joanne Koong
2025-10-24 20:59 ` Matthew Wilcox
2025-10-24 21:37 ` Darrick J. Wong
2025-10-24 21:58 ` Joanne Koong
2025-09-26 0:26 ` [PATCH v5 08/14] iomap: set accurate iter->pos when reading folio ranges Joanne Koong
2025-09-26 0:26 ` [PATCH v5 09/14] iomap: add caller-provided callbacks for read and readahead Joanne Koong
2025-09-26 0:26 ` [PATCH v5 10/14] iomap: move buffered io bio logic into new file Joanne Koong
2025-09-26 0:26 ` [PATCH v5 11/14] iomap: make iomap_read_folio() a void return Joanne Koong
2025-09-26 0:26 ` [PATCH v5 12/14] fuse: use iomap for read_folio Joanne Koong
2025-09-26 0:26 ` [PATCH v5 13/14] fuse: use iomap for readahead Joanne Koong
2025-09-26 0:26 ` [PATCH v5 14/14] fuse: remove fc->blkbits workaround for partial writes Joanne Koong
2025-09-29 9:38 ` [PATCH v5 00/14] fuse: use iomap for buffered reads + readahead Christian Brauner