Linux Btrfs filesystem development
From: Qu Wenruo <wqu@suse.com>
To: Yun Zhou <yun.zhou@windriver.com>, clm@fb.com, dsterba@suse.com
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] btrfs: wait for ordered extents before buffered write fallback in direct IO
Date: Thu, 25 Jun 2026 14:43:47 +0930
Message-ID: <aa009e0d-a629-44ed-bb68-202ff2dc5063@suse.com>
In-Reply-To: <20260625021456.724803-1-yun.zhou@windriver.com>



On 2026/6/25 11:44, Yun Zhou wrote:
> When btrfs_direct_write() falls back to buffered IO after a failed DIO
> attempt, it may race with the asynchronous completion of DIO ordered
> extents.  This leads to a BUG_ON in insert_ordered_extent() due to
> overlapping ordered extents in the per-inode rb-tree.
> 
> The race sequence is:
>   1. DIO creates an ordered extent via btrfs_dio_iomap_begin()
>   2. Page fault occurs (nofault=true), no bio is submitted (submitted=0)
>   3. btrfs_dio_iomap_end() truncates and finishes the OE asynchronously
>      via btrfs_finish_ordered_extent() which queues work
>   4. iomap returns 0, retry logic faults in pages and retries DIO
>   5. Second DIO attempt also fails, code reaches buffered: label
>   6. btrfs_buffered_write() dirties pages for the same range

btrfs_buffered_write()
|- copy_one_range()
    |- lock_and_cleanup_extent_if_needed()
       |- btrfs_start_ordered_extent()

So your explanation doesn't make sense: if the direct IO ordered extent
(OE) is still around, the buffered write will wait for that OE to complete.

There is still something missing.
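
To make the objection concrete, here is a minimal userspace sketch of the
"wait for every OE in the range, then insert" behavior implied by the call
chain above. It is a toy model under stated assumptions, not kernel code:
all toy_* names are hypothetical, the per-inode tree is modeled as a flat
array, and "waiting" is modeled as immediately removing the OE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_OE 16

/* Hypothetical stand-in for an ordered extent: byte range [start, start + len). */
struct toy_oe {
	uint64_t start;
	uint64_t len;
	bool in_tree;
};

/* Hypothetical stand-in for the per-inode ordered extent tree. */
struct toy_inode {
	struct toy_oe oes[TOY_MAX_OE];
};

/* Half-open range overlap test against an OE that is still in the tree. */
static bool toy_overlaps(const struct toy_oe *oe, uint64_t start, uint64_t len)
{
	return oe->in_tree && oe->start < start + len &&
	       start < oe->start + oe->len;
}

/* Return the first OE overlapping [start, start + len), or NULL if none. */
static struct toy_oe *toy_lookup_range(struct toy_inode *inode,
				       uint64_t start, uint64_t len)
{
	for (size_t i = 0; i < TOY_MAX_OE; i++)
		if (toy_overlaps(&inode->oes[i], start, len))
			return &inode->oes[i];
	return NULL;
}

/*
 * Insert a new OE. Returns false on overlap (the case reported as a BUG_ON
 * in insert_ordered_extent()) or when the toy array is full.
 */
static bool toy_insert(struct toy_inode *inode, uint64_t start, uint64_t len)
{
	if (toy_lookup_range(inode, start, len))
		return false;
	for (size_t i = 0; i < TOY_MAX_OE; i++) {
		if (!inode->oes[i].in_tree) {
			inode->oes[i] = (struct toy_oe){ start, len, true };
			return true;
		}
	}
	return false;
}

/*
 * Model of the buffered write preparing its range: "wait" for each
 * overlapping OE (modeled as removal). Returns how many were waited on.
 */
static unsigned int toy_wait_range(struct toy_inode *inode,
				   uint64_t start, uint64_t len)
{
	unsigned int waited = 0;
	struct toy_oe *oe;

	while ((oe = toy_lookup_range(inode, start, len)) != NULL) {
		oe->in_tree = false;
		waited++;
	}
	return waited;
}
```

In this model, once toy_wait_range() has run for a range, a later
toy_insert() inside that same range cannot fail because of an OE that was
already present, which is why the described sequence alone does not seem
to explain the overlap.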

>   7. btrfs_fdatawrite_range() triggers writeback
>   8. run_delalloc_nocow() -> fallback_to_cow() -> cow_file_range()
>      tries to insert a new ordered extent for the same file offset
>   9. The DIO ordered extent hasn't been removed from the rb-tree yet
>      (btrfs_finish_ordered_io running async in workqueue) -> BUG_ON
> 
> Fix this by waiting for any pending ordered extents in the target range
> before starting the buffered write.
> 
> Reported-by: syzbot+ba2afde329fc27e3f22e@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=ba2afde329fc27e3f22e
> Fixes: acf9ed3a6c00 ("btrfs: retry faulting in the pages after a zero sized short direct write")
> Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
> ---
>   fs/btrfs/direct-io.c | 24 ++++++++++++++++++++++++
>   1 file changed, 24 insertions(+)
> 
> diff --git a/fs/btrfs/direct-io.c b/fs/btrfs/direct-io.c
> index 460326d34143..e8ac9492844c 100644
> --- a/fs/btrfs/direct-io.c
> +++ b/fs/btrfs/direct-io.c
> @@ -844,6 +844,7 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
>   	struct file *file = iocb->ki_filp;
>   	struct inode *inode = file_inode(file);
>   	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
> +	struct btrfs_ordered_extent *ordered;
>   	loff_t pos;
>   	ssize_t written = 0;
>   	ssize_t written_buffered;
> @@ -1025,6 +1026,29 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
>   	}
>   
>   	pos = iocb->ki_pos;
> +
> +	/*
> +	 * The DIO path may have created ordered extent(s) that are still being
> +	 * processed asynchronously in a work queue.  We must wait for them to
> +	 * be fully completed and removed from the rb-tree before doing a
> +	 * buffered write to the same or overlapping range; otherwise the
> +	 * buffered writeback path (run_delalloc_nocow -> fallback_to_cow ->
> +	 * cow_file_range) may try to insert a new ordered extent that conflicts
> +	 * with the still-pending DIO one, triggering a BUG_ON in
> +	 * insert_ordered_extent().
> +	 *
> +	 * This happens when DIO creates an ordered extent but has a short write
> +	 * (submitted < length in btrfs_dio_iomap_end()), which truncates and
> +	 * finishes the ordered extent asynchronously while we fall back to
> +	 * buffered IO for the same range.
> +	 */
> +	while ((ordered = btrfs_lookup_ordered_range(BTRFS_I(inode),
> +				(u64)(pos - written),
> +				(u64)written + iov_iter_count(from))) != NULL) {
> +		btrfs_start_ordered_extent(ordered);
> +		btrfs_put_ordered_extent(ordered);
> +	}
> +
>   	written_buffered = btrfs_buffered_write(iocb, from);
>   	if (written_buffered < 0) {
>   		ret = written_buffered;

