[PATCH] btrfs: fix hung task when cloning inline extent races with writeback

public inbox for stable@vger.kernel.org

From: Deepanshu Kartikey <kartikey406@gmail.com>
To: syzbot+63056bf627663701bbbf@syzkaller.appspotmail.com
Cc: Deepanshu Kartikey <Kartikey406@gmail.com>,
	stable@vger.kernel.org,
	Deepanshu Kartikey <kartikey406@gmail.com>
Subject: [PATCH] btrfs: fix hung task when cloning inline extent races with writeback
Date: Thu, 26 Mar 2026 07:19:53 +0530
Message-ID: <20260326014953.16727-1-kartikey406@gmail.com>

From: Deepanshu Kartikey <Kartikey406@gmail.com>

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

When cloning an inline extent, clone_copy_inline_extent() calls
copy_inline_to_page() which locks an extent range in the destination
inode's io_tree, dirties a page with the inline data, and sets
BTRFS_INODE_NO_DELALLOC_FLUSH on the inode. At this point i_size is
still 0 since clone_finish_inode_update() has not been called yet.

Then clone_copy_inline_extent() calls start_transaction() which may
block waiting for the current transaction to commit. While blocked,
the transaction commit calls btrfs_start_delalloc_flush() which calls
try_to_writeback_inodes_sb(), queuing a kworker to flush the clone
destination inode.

The kworker calls btrfs_writepages() -> extent_writepage(), and since
i_size is still 0, the dirty page appears to be beyond EOF. This
causes extent_writepage() to call folio_invalidate() ->
btrfs_invalidate_folio() -> btrfs_lock_extent(), which blocks forever
because the clone operation holds that lock. The result is a circular
deadlock:

  clone   -> waits for transaction commit to finish
  commit  -> waits for kworker writeback to finish
  kworker -> waits for extent lock held by clone

Additionally, any periodic background writeback that races with the
clone operation before i_size is updated will also block on the same
extent lock, causing a hung task warning.

The flag BTRFS_INODE_NO_DELALLOC_FLUSH was introduced by commit
3d45f221ce62 to prevent this deadlock, but it was only checked inside
start_delalloc_inodes(), which is only reached through the btrfs
metadata reclaim path. The transaction commit path goes through
try_to_writeback_inodes_sb(), a VFS function that bypasses
start_delalloc_inodes() entirely, so the flag was never checked there.

Fix this by checking BTRFS_INODE_NO_DELALLOC_FLUSH at the top of
btrfs_writepages() and returning early if set. This catches all
writeback paths since every writeback on a btrfs inode eventually
calls btrfs_writepages(). The inode will be safely written after the
clone operation finishes and clears the flag, at which point all
locks are released and i_size is properly updated.

Also change the local variable type from 'struct inode *' to
'struct btrfs_inode *' to avoid the repeated BTRFS_I() conversions.

Fixes: 3d45f221ce62 ("btrfs: fix deadlock when cloning inline extent and low on free metadata space")
Cc: stable@vger.kernel.org
Reported-by: syzbot+63056bf627663701bbbf@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
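Note (not part of the commit message): the three-way wait described
above can be illustrated with a small userspace model. This is only a
sketch under stated assumptions; none of the names below (wait_graph,
wait_graph_add, wait_graph_has_cycle) exist in the kernel. It encodes
"task A waits on task B" as edges and uses a depth-first search to
show that the kworker -> clone edge is what closes the cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical labels for the three actors in the commit message. */
enum task { CLONE, COMMIT, KWORKER, NR_TASKS };

struct wait_graph {
	/* waits_on[a][b] is true when task a is blocked on task b. */
	bool waits_on[NR_TASKS][NR_TASKS];
};

static void wait_graph_init(struct wait_graph *g)
{
	memset(g, 0, sizeof(*g));
}

static void wait_graph_add(struct wait_graph *g, enum task waiter,
			   enum task holder)
{
	assert(waiter < NR_TASKS && holder < NR_TASKS);
	g->waits_on[waiter][holder] = true;
}

/* Depth-first search; on_stack marks the tasks on the current path. */
static bool wait_graph_visit(const struct wait_graph *g, int t,
			     bool *done, bool *on_stack)
{
	if (on_stack[t])
		return true;	/* reached a task already on this path */
	if (done[t])
		return false;	/* fully explored earlier, no cycle here */

	on_stack[t] = true;
	for (int n = 0; n < NR_TASKS; n++) {
		if (g->waits_on[t][n] &&
		    wait_graph_visit(g, n, done, on_stack))
			return true;
	}
	on_stack[t] = false;
	done[t] = true;
	return false;
}

/* Returns true when the wait-for graph contains a circular wait. */
static bool wait_graph_has_cycle(const struct wait_graph *g)
{
	bool done[NR_TASKS] = { false };
	bool on_stack[NR_TASKS] = { false };

	for (int t = 0; t < NR_TASKS; t++) {
		if (wait_graph_visit(g, t, done, on_stack))
			return true;
	}
	return false;
}
```

With only the clone -> commit and commit -> kworker edges the model
reports no cycle; adding kworker -> clone (the blocked extent lock)
makes wait_graph_has_cycle() return true. Skipping writeback while the
flag is set corresponds to never adding that last edge.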
 fs/btrfs/extent_io.c | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5f97a3d2a8d7..f7df7c0c8955 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2698,21 +2698,54 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f

 int btrfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
 {
-	struct inode *inode = mapping->host;
+	struct btrfs_inode *inode = BTRFS_I(mapping->host);
 	int ret = 0;
 	struct btrfs_bio_ctrl bio_ctrl = {
 		.wbc = wbc,
 		.opf = REQ_OP_WRITE | wbc_to_write_flags(wbc),
 	};

+	/*
+	 * If this inode is being used for a clone/reflink operation that
+	 * copied an inline extent into a page of the destination inode, skip
+	 * writeback to avoid a deadlock or a long blocked task.
+	 *
+	 * The clone operation holds the extent range locked in the inode's
+	 * io_tree for its entire duration. Any writeback attempt on this
+	 * inode will block trying to lock that same extent range inside
+	 * writepage_delalloc() or btrfs_invalidate_folio(), causing a
+	 * hung task.
+	 *
+	 * When writeback is triggered from the transaction commit path via
+	 * btrfs_start_delalloc_flush() -> try_to_writeback_inodes_sb(),
+	 * this becomes a true circular deadlock:
+	 *
+	 *   clone   -> waits for transaction commit to finish
+	 *   commit  -> waits for kworker writeback to finish
+	 *   kworker -> waits for extent lock held by clone
+	 *
+	 * The flag BTRFS_INODE_NO_DELALLOC_FLUSH was already checked in
+	 * start_delalloc_inodes() but only for the btrfs metadata reclaim
+	 * path. The transaction commit path goes through
+	 * try_to_writeback_inodes_sb() which bypasses that check entirely
+	 * and calls btrfs_writepages() directly.
+	 *
+	 * By checking the flag here we catch all writeback paths. The inode
+	 * will be safely written after the clone operation finishes and
+	 * clears BTRFS_INODE_NO_DELALLOC_FLUSH, at which point all locks
+	 * are released and writeback can proceed normally.
+	 */
+	if (test_bit(BTRFS_INODE_NO_DELALLOC_FLUSH, &inode->runtime_flags))
+		return 0;
+
 	/*
 	 * Allow only a single thread to do the reloc work in zoned mode to
 	 * protect the write pointer updates.
 	 */
-	btrfs_zoned_data_reloc_lock(BTRFS_I(inode));
+	btrfs_zoned_data_reloc_lock(inode);
 	ret = extent_write_cache_pages(mapping, &bio_ctrl);
 	submit_write_bio(&bio_ctrl, ret);
-	btrfs_zoned_data_reloc_unlock(BTRFS_I(inode));
+	btrfs_zoned_data_reloc_unlock(inode);
 	return ret;
 }

-- 
2.43.0
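
For readers who want to reason about the early return outside the
kernel, here is a minimal userspace sketch of the idea, assuming a
simplified inode with one flag bit, one extent lock and a dirty page
counter. Every name in it (model_inode, model_clone_begin,
model_writepages, MODEL_NO_DELALLOC_FLUSH) is hypothetical and only
mirrors the behaviour described in the patch; it is not btrfs code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for BTRFS_INODE_NO_DELALLOC_FLUSH. */
enum { MODEL_NO_DELALLOC_FLUSH = 0 };

struct model_inode {
	atomic_ulong runtime_flags;	/* bit field, like runtime_flags */
	bool extent_locked;		/* extent range held by the clone */
	int dirty_pages;		/* pages waiting for writeback */
};

static void model_inode_init(struct model_inode *inode)
{
	atomic_init(&inode->runtime_flags, 0UL);
	inode->extent_locked = false;
	inode->dirty_pages = 0;
}

static bool model_test_bit(int nr, atomic_ulong *flags)
{
	assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * 8));
	return (atomic_load(flags) >> nr) & 1UL;
}

/* Clone start: lock the range, dirty one page, then set the flag. */
static void model_clone_begin(struct model_inode *inode)
{
	inode->extent_locked = true;
	inode->dirty_pages++;
	atomic_fetch_or(&inode->runtime_flags,
			1UL << MODEL_NO_DELALLOC_FLUSH);
}

/* Clone end: clear the flag and release the extent lock. */
static void model_clone_end(struct model_inode *inode)
{
	atomic_fetch_and(&inode->runtime_flags,
			 ~(1UL << MODEL_NO_DELALLOC_FLUSH));
	inode->extent_locked = false;
}

/*
 * Returns the number of pages written, 0 when writeback is skipped
 * because the flag is set, or -1 where the real code would sleep on
 * the extent lock (the hung task case).
 */
static int model_writepages(struct model_inode *inode)
{
	int written;

	if (model_test_bit(MODEL_NO_DELALLOC_FLUSH, &inode->runtime_flags))
		return 0;
	if (inode->extent_locked)
		return -1;

	written = inode->dirty_pages;
	inode->dirty_pages = 0;
	return written;
}
```

In this model, a writeback attempt between model_clone_begin() and
model_clone_end() returns 0 and leaves the page dirty, and the next
attempt after model_clone_end() writes it. The model is single-threaded
and does not cover the window between clearing the flag and releasing
the lock.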

Thread overview: 2+ messages
2026-03-26  1:49 Deepanshu Kartikey [this message]
2026-03-26  2:46 ` [syzbot] [btrfs?] INFO: task hung in btrfs_invalidate_folio (3) syzbot
