From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH] Btrfs: fix loading of orphan roots leading to BUG_ON
Date: Wed, 2 Mar 2016 15:49:38 +0000 [thread overview]
Message-ID: <1456933778-7944-1-git-send-email-fdmanana@kernel.org> (raw)
From: Filipe Manana <fdmanana@suse.com>
When looking for orphan roots during mount we can end up hitting a
BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is
replayed and qgroups are enabled. This is because after a log tree is
replayed, a transaction commit is made, which triggers qgroup extent
accounting which in turn does backref walking which ends up reading and
inserting all roots in the radix tree fs_info->fs_root_radix, including
orphan roots (deleted snapshots). So after the log tree is replayed, when
finding orphan roots we hit the BUG_ON with the following trace:
[118209.182438] ------------[ cut here ]------------
[118209.183279] kernel BUG at fs/btrfs/root-tree.c:314!
[118209.184074] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[118209.185123] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic ppdev xor raid6_pq evdev sg parport_pc parport acpi_cpufreq tpm_tis tpm psmouse
processor i2c_piix4 serio_raw pcspkr i2c_core button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata
virtio_pci virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
[118209.186318] CPU: 14 PID: 28428 Comm: mount Tainted: G W 4.5.0-rc5-btrfs-next-24+ #1
[118209.186318] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[118209.186318] task: ffff8801ec131040 ti: ffff8800af34c000 task.ti: ffff8800af34c000
[118209.186318] RIP: 0010:[<ffffffffa04237d7>] [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
[118209.186318] RSP: 0018:ffff8800af34faa8 EFLAGS: 00010246
[118209.186318] RAX: 00000000ffffffef RBX: 00000000ffffffef RCX: 0000000000000001
[118209.186318] RDX: 0000000080000000 RSI: 0000000000000001 RDI: 00000000ffffffff
[118209.186318] RBP: ffff8800af34fb08 R08: 0000000000000001 R09: 0000000000000000
[118209.186318] R10: ffff8800af34f9f0 R11: 6db6db6db6db6db7 R12: ffff880171b97000
[118209.186318] R13: ffff8801ca9d65e0 R14: ffff8800afa2e000 R15: 0000160000000000
[118209.186318] FS: 00007f5bcb914840(0000) GS:ffff88023edc0000(0000) knlGS:0000000000000000
[118209.186318] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[118209.186318] CR2: 00007f5bcaceb5d9 CR3: 00000000b49b5000 CR4: 00000000000006e0
[118209.186318] Stack:
[118209.186318] fffffbffffffffff 010230ffffffffff 0101000000000000 ff84000000000000
[118209.186318] fbffffffffffffff 30ffffffffffffff 0000000000000101 ffff880082348000
[118209.186318] 0000000000000000 ffff8800afa2e000 ffff8800afa2e000 0000000000000000
[118209.186318] Call Trace:
[118209.186318] [<ffffffffa042e2db>] open_ctree+0x1e37/0x21b9 [btrfs]
[118209.186318] [<ffffffffa040a753>] btrfs_mount+0x97e/0xaed [btrfs]
[118209.186318] [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
[118209.186318] [<ffffffff8117b87e>] mount_fs+0x67/0x131
[118209.186318] [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
[118209.186318] [<ffffffffa0409f81>] btrfs_mount+0x1ac/0xaed [btrfs]
[118209.186318] [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
[118209.186318] [<ffffffff8108c26b>] ? lockdep_init_map+0xb9/0x1b3
[118209.186318] [<ffffffff8117b87e>] mount_fs+0x67/0x131
[118209.186318] [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
[118209.186318] [<ffffffff81195637>] do_mount+0x8a6/0x9e8
[118209.186318] [<ffffffff8119598d>] SyS_mount+0x77/0x9f
[118209.186318] [<ffffffff81493017>] entry_SYSCALL_64_fastpath+0x12/0x6b
[118209.186318] Code: 64 00 00 85 c0 89 c3 75 24 f0 41 80 4c 24 20 20 49 8b bc 24 f0 01 00 00 4c 89 e6 e8 e8 65 00 00 85 c0 89 c3 74 11 83 f8 ef 75 02 <0f> 0b
4c 89 e7 e8 da 72 00 00 eb 1c 41 83 bc 24 00 01 00 00 00
[118209.186318] RIP [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
[118209.186318] RSP <ffff8800af34faa8>
[118209.230735] ---[ end trace 83938f987d85d477 ]---
So fix this by not treating the error -EEXIST, returned when attempting
to insert a root already inserted by the backref walking code, as an error.
The following test case for xfstests reproduces the bug:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
_cleanup_flakey
cd /
rm -f $tmp.*
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
. ./common/dmflakey
# real QA test starts here
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_dm_target flakey
_require_metadata_journaling $SCRATCH_DEV
rm -f $seqres.full
_scratch_mkfs >>$seqres.full 2>&1
_init_flakey
_mount_flakey
_run_btrfs_util_prog quota enable $SCRATCH_MNT
# Create 2 directories with one file in one of them.
# We use these just to trigger a transaction commit later, moving the file from
# directory a to directory b and doing an fsync against directory a.
mkdir $SCRATCH_MNT/a
mkdir $SCRATCH_MNT/b
touch $SCRATCH_MNT/a/f
sync
# Create our test file with 2 4K extents.
$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
# Create a snapshot and delete it. This doesn't really delete the snapshot
# immediately, just makes it inaccessible and invisible to user space, the
# snapshot is deleted later by a dedicated kernel thread (cleaner kthread)
# which is woke up at the next transaction commit.
# A root orphan item is inserted into the tree of tree roots, so that if a
# power failure happens before the dedicated kernel thread does the snapshot
# deletion, the next time the filesystem is mounted it resumes the snapshot
# deletion.
_run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
_run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap
# Now overwrite half of the extents we wrote before. Because we made a snapshpot
# before, which isn't really deleted yet (since no transaction commit happened
# after we did the snapshot delete request), the non overwritten extents get
# referenced twice, once by the default subvolume and once by the snapshot.
$XFS_IO_PROG -c "pwrite -S 0xbb 4K 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
# Now move file f from directory a to directory b and fsync directory a.
# The fsync on the directory a triggers a transaction commit (because a file
# was moved from it to another directory) and the file fsync leaves a log tree
# with file extent items to replay.
mv $SCRATCH_MNT/a/f $SCRATCH_MNT/a/b
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
echo "File digest before power failure:"
md5sum $SCRATCH_MNT/foobar | _filter_scratch
# Now simulate a power failure and mount the filesystem to replay the log tree.
# After the log tree was replayed, we used to hit a BUG_ON() when processing
# the root orphan item for the deleted snapshot. This is because when processing
# an orphan root the code expected to be the first code inserting the root into
# the fs_info->fs_root_radix radix tree, while in reallity it was the second
# caller attempting to do it - the first caller was the transaction commit that
# took place after replaying the log tree, when updating the qgroup counters.
_flakey_drop_and_remount
echo "File digest before after failure:"
# Must match what he got before the power failure.
md5sum $SCRATCH_MNT/foobar | _filter_scratch
_unmount_flakey
status=0
exit
Fixes: 2d9e97761087 ("Btrfs: use btrfs_get_fs_root in resolve_indirect_ref")
Cc: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/root-tree.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index a25f3b2..9fcd6df 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -310,8 +310,16 @@ int btrfs_find_orphan_roots(struct btrfs_root *tree_root)
set_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state);
err = btrfs_insert_fs_root(root->fs_info, root);
+ /*
+ * The root might have been inserted already, as before we look
+ * for orphan roots, log replay might have happened, which
+ * triggers a transaction commit and qgroup accounting, which
+ * in turn reads and inserts fs roots while doing backref
+ * walking.
+ */
+ if (err == -EEXIST)
+ err = 0;
if (err) {
- BUG_ON(err == -EEXIST);
btrfs_free_fs_root(root);
break;
}
--
2.7.0.rc3
next reply other threads:[~2016-03-02 15:49 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-02 15:49 fdmanana [this message]
2016-03-03 4:31 ` [PATCH] Btrfs: fix loading of orphan roots leading to BUG_ON Duncan
2016-03-03 6:26 ` Qu Wenruo
2016-03-03 7:44 ` Duncan
2016-03-03 8:04 ` Qu Wenruo
2016-03-03 9:10 ` Filipe Manana
2016-04-14 5:34 ` Qu Wenruo
2016-04-14 9:21 ` Filipe Manana
2016-04-15 1:17 ` Qu Wenruo
2016-04-15 9:39 ` David Sterba
2016-03-03 6:29 ` Qu Wenruo
2016-03-03 9:17 ` Filipe Manana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1456933778-7944-1-git-send-email-fdmanana@kernel.org \
--to=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.