linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] btrfs: locking: Add extra check in btrfs_init_new_buffer() to avoid deadlock
@ 2018-08-14  5:51 Qu Wenruo
  2018-08-20 16:38 ` David Sterba
  0 siblings, 1 reply; 3+ messages in thread
From: Qu Wenruo @ 2018-08-14  5:51 UTC (permalink / raw)
  To: linux-btrfs, dsterba

[BUG]
For certains crafted image, whose csum root leaf has missing backref, if
we try to trigger write with data csum, it could cause deadlock with the
following kernel WARN_ON():
------
WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400
CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8
Workqueue: btrfs-endio-write btrfs_endio_write_helper
RIP: 0010:btrfs_tree_lock+0x3e2/0x400
Call Trace:
 btrfs_alloc_tree_block+0x39f/0x770
 __btrfs_cow_block+0x285/0x9e0
 btrfs_cow_block+0x191/0x2e0
 btrfs_search_slot+0x492/0x1160
 btrfs_lookup_csum+0xec/0x280
 btrfs_csum_file_blocks+0x2be/0xa60
 add_pending_csums+0xaf/0xf0
 btrfs_finish_ordered_io+0x74b/0xc90
 finish_ordered_fn+0x15/0x20
 normal_work_helper+0xf6/0x500
 btrfs_endio_write_helper+0x12/0x20
 process_one_work+0x302/0x770
 worker_thread+0x81/0x6d0
 kthread+0x180/0x1d0
 ret_from_fork+0x35/0x40
---[ end trace 2e85051acb5f6dc1 ]---
------

[CAUSE]
That crafted image has missing backref for csum tree root leaf.
And when we try to allocate new tree block, since there is no
EXTENT/METADATA_ITEM for csum tree root, btrfs consider it's free slot
and use it.

The extent tree of the image looks like:
Normal image                      |       This fuzzed image
----------------------------------+--------------------------------
BG 29360128                       | BG 29360128
 One empty slot                   |  One empty slot
29364224: backref to UUID tree    | 29364224: backref to UUID tree
 Two empty slots                  |  Two empty slots
29376512: backref to CSUM tree    |  One empty slot (bad type) <<<
29380608: backref to D_RELOC tree | 29380608: backref to D_RELOC tree
...                               | ...

Since bytenr 29376512 has no METADATA/EXTENT_ITEM, when btrfs try to
alloc tree block, it's an valid slot for btrfs.

And for finish_ordered_write, when we need to insert csum, we try to CoW
csum tree root.

By *COINCIDENT*, empty slots at bytenr BG_OFFSET, BG_OFFSET + 8K,
BG_OFFSET + 12K is already used by tree block COW for other trees,
the next empty slot is BG_OFFSET + 16K, which should be the backref for
CSUM tree.

But due to the bad type, btrfs can recognize it and still consider it as
an empty slot, and will try to use it for csum tree CoW.

Then in the following call trace, we will try to lock the new tree
block, which turns out to be the old csum tree root which is already
locked:

btrfs_search_slot() called on csum tree root, which is at 29376512
|- btrfs_cow_block()
   |- btrfs_set_lock_block()
   |  |- Now locks tree block 29376512 (old csum tree root)
   |- __btrfs_cow_block()
      |- btrfs_alloc_tree_block()
         |- btrfs_reserve_extent()
            | Now it returns tree block 29376512, which extent tree
            | shows its empty slot, but it's already hold by csum tree
            |- btrfs_init_new_buffer()
               |- btrfs_tree_lock()
                  | Triggers WARN_ON(eb->lock_owner == current->pid)
                  |- wait_event()
                     Wait lock owner to release the lock, but it's
                     locked by ourself, so it will deadlock

[FIX]
This patch will do the lock_owner and current->pid check at
btrfs_init_new_buffer().
So above deadlock can be avoided.

Since such problem can only happen in crafted image, we will still
trigger kernel warning, but with a little more meaningful warning
message.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
changelog:
v2:
  Modify btrfs_tree_lock() to be able to return int to detect possible
  deadlock and return.
  Titled as "btrfs: locking: Allow btrfs_tree_lock() to return error to avoid deadlock"
v3:
  Instead of modify all btrfs_tree_lock() callers, only check possible
  deadlock at btrfs_init_new_buffer(), as the bug only happens for newly
  allocated tree block.
  With better commit message describing the on-disk extent tree
  corruption along with the call trace of how dead lock happens.

Hi David,

This v3 should explain the bug with more details and bring a minimal
impact to existing function callers.

Thanks,
Qu
---
 fs/btrfs/extent-tree.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e7559b89eaf7..c85ff8b7a95b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8302,6 +8302,19 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 	if (IS_ERR(buf))
 		return buf;
 
+	/*
+	 * Just extra safe net in case extent tree is corrupted and extent
+	 * allocator chooses to use a tree block which is already used and
+	 * locked.
+	 */
+	if (buf->lock_owner == current->pid) {
+		WARN(1,
+"tree block %llu already locked by pid=%d, extent tree corruption detected",
+		     buf->start, current->pid);
+		free_extent_buffer(buf);
+		return ERR_PTR(-EUCLEAN);
+	}
+
 	btrfs_set_header_generation(buf, trans->transid);
 	btrfs_set_buffer_lockdep_class(root->root_key.objectid, buf, level);
 	btrfs_tree_lock(buf);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-08-21  3:22 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-08-14  5:51 [PATCH RFC] btrfs: locking: Add extra check in btrfs_init_new_buffer() to avoid deadlock Qu Wenruo
2018-08-20 16:38 ` David Sterba
2018-08-21  0:04   ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).