linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <fdmanana@kernel.org>, <linux-btrfs@vger.kernel.org>
Cc: Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH] Btrfs: fix order by which delayed references are run
Date: Fri, 10 Jul 2015 08:28:56 +0800	[thread overview]
Message-ID: <559F11C8.8090604@cn.fujitsu.com> (raw)
In-Reply-To: <1436453447-27466-1-git-send-email-fdmanana@kernel.org>



  wrote on 2015/07/09 15:50 +0100:
> From: Filipe Manana <fdmanana@suse.com>
>
> When we have an extent that got N references removed and N new references
> added in the same transaction, we must run the insertion of the references
> first because otherwise the last removed reference will remove the extent
> item from the extent tree, resulting in a failure for the insertions.
>
> This is a regression introduced in the 4.2-rc1 release and this fix just
> brings back the behaviour of selecting reference additions before any
> reference removals.
>
Thanks, Filipe, that's right, it's my fault to forgot such case.

Acked-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

Thanks,
Qu
> The following test case for fstests reproduces the issue:
>
>    seq=`basename $0`
>    seqres=$RESULT_DIR/$seq
>    echo "QA output created by $seq"
>    tmp=/tmp/$$
>    status=1	# failure is the default!
>    trap "_cleanup; exit \$status" 0 1 2 3 15
>
>    _cleanup()
>    {
>        _cleanup_flakey
>        rm -f $tmp.*
>    }
>
>    # get standard environment, filters and checks
>    . ./common/rc
>    . ./common/filter
>    . ./common/dmflakey
>
>    # real QA test starts here
>    _need_to_be_root
>    _supported_fs btrfs
>    _supported_os Linux
>    _require_scratch
>    _require_dm_flakey
>    _require_cloner
>    _require_metadata_journaling $SCRATCH_DEV
>
>    rm -f $seqres.full
>
>    _scratch_mkfs >>$seqres.full 2>&1
>    _init_flakey
>    _mount_flakey
>
>    # Create prealloc extent covering range [160K, 620K[
>    $XFS_IO_PROG -f -c "falloc 160K 460K" $SCRATCH_MNT/foo
>
>    # Now write to the last 80K of the prealloc extent plus 40K to the unallocated
>    # space that immediately follows it. This creates a new extent of 40K that spans
>    # the range [620K, 660K[.
>    $XFS_IO_PROG -c "pwrite -S 0xaa 540K 120K" $SCRATCH_MNT/foo | _filter_xfs_io
>
>    # At this point, there are now 2 back references to the prealloc extent in our
>    # extent tree. Both are for our file offset 160K and one relates to a file
>    # extent item with a data offset of 0 and a length of 380K, while the other
>    # relates to a file extent item with a data offset of 380K and a length of 80K.
>
>    # Make sure everything done so far is durably persisted (all back references are
>    # in the extent tree, etc).
>    sync
>
>    # Now clone all extents of our file that cover the offset 160K up to its eof
>    # (660K at this point) into itself at offset 2M. This leaves a hole in the file
>    # covering the range [660K, 2M[. The prealloc extent will now be referenced by
>    # the file twice, once for offset 160K and once for offset 2M. The 40K extent
>    # that follows the prealloc extent will also be referenced twice by our file,
>    # once for offset 620K and once for offset 2M + 460K.
>    $CLONER_PROG -s $((160 * 1024)) -d $((2 * 1024 * 1024)) -l 0 $SCRATCH_MNT/foo \
> 	$SCRATCH_MNT/foo
>
>    # Now create one new extent in our file with a size of 100Kb. It will span the
>    # range [3M, 3M + 100K[. It also will cause creation of a hole spanning the
>    # range [2M + 460K, 3M[. Our new file size is 3M + 100K.
>    $XFS_IO_PROG -c "pwrite -S 0xbb 3M 100K" $SCRATCH_MNT/foo | _filter_xfs_io
>
>    # At this point, there are now (in memory) 4 back references to the prealloc
>    # extent.
>    #
>    # Two of them are for file offset 160K, related to file extent items
>    # matching the file offsets 160K and 540K respectively, with data offsets of
>    # 0 and 380K respectively, and with lengths of 380K and 80K respectively.
>    #
>    # The other two references are for file offset 2M, related to file extent items
>    # matching the file offsets 2M and 2M + 380K respectively, with data offsets of
>    # 0 and 380K respectively, and with lengths of 389K and 80K respectively.
>    #
>    # The 40K extent has 2 back references, one for file offset 620K and the other
>    # for file offset 2M + 460K.
>    #
>    # The 100K extent has a single back reference and it relates to file offset 3M.
>
>    # Now clone our 100K extent into offset 600K. That offset covers the last 20K
>    # of the prealloc extent, the whole 40K extent and 40K of the hole starting at
>    # offset 660K.
>    $CLONER_PROG -s $((3 * 1024 * 1024)) -d $((600 * 1024)) -l $((100 * 1024)) \
>        $SCRATCH_MNT/foo $SCRATCH_MNT/foo
>
>    # At this point there's only one reference to the 40K extent, at file offset
>    # 2M + 460K, we have 4 references for the prealloc extent (2 for file offset
>    # 160K and 2 for file offset 2M) and 2 references for the 100K extent (1 for
>    # file offset 3M and a new one for file offset 600K).
>
>    # Now fsync our file to make all its new data and metadata updates are durably
>    # persisted and present if a power failure/crash happens after a successful
>    # fsync and before the next transaction commit.
>    $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
>
>    echo "File digest before power failure:"
>    md5sum $SCRATCH_MNT/foo | _filter_scratch
>
>    # Silently drop all writes and ummount to simulate a crash/power failure.
>    _load_flakey_table $FLAKEY_DROP_WRITES
>    _unmount_flakey
>
>    # Allow writes again, mount to trigger log replay and validate file contents.
>    # During log replay, the btrfs delayed references implementation used to run the
>    # deletion of back references before the addition of new back references, which
>    # made the addition fail as it didn't find the key in the extent tree that it
>    # was looking for. The failure triggered by this test was related to the 40K
>    # extent, which got 1 reference dropped and 1 reference added during the fsync
>    # log replay - when running the delayed references at transaction commit time,
>    # btrfs was applying the deletion before the insertion, resulting in a failure
>    # of the insertion that ended up turning the fs into read-only mode.
>    _load_flakey_table $FLAKEY_ALLOW_WRITES
>    _mount_flakey
>
>    echo "File digest after log replay:"
>    md5sum $SCRATCH_MNT/foo | _filter_scratch
>
>    _unmount_flakey
>
>    status=0
>    exit
>
> This issue turned the filesystem into read-only mode (current transaction
> aborted) and produced the following traces:
>
>    [ 8247.578385] ------------[ cut here ]------------
>    [ 8247.579947] WARNING: CPU: 0 PID: 11341 at fs/btrfs/extent-tree.c:1547 lookup_inline_extent_backref+0x17d/0x45d [btrfs]()
>    (...)
>    [ 8247.601697] Call Trace:
>    [ 8247.602222]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
>    [ 8247.604320]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
>    [ 8247.605488]  [<ffffffffa0506c8d>] ? lookup_inline_extent_backref+0x17d/0x45d [btrfs]
>    [ 8247.608226]  [<ffffffffa0506c8d>] lookup_inline_extent_backref+0x17d/0x45d [btrfs]
>    [ 8247.617061]  [<ffffffffa0507957>] insert_inline_extent_backref+0x41/0xb2 [btrfs]
>    [ 8247.621856]  [<ffffffffa0507c4f>] __btrfs_inc_extent_ref+0x8c/0x20a [btrfs]
>    [ 8247.624366]  [<ffffffffa050ee60>] __btrfs_run_delayed_refs+0xb0c/0xd49 [btrfs]
>    [ 8247.626176]  [<ffffffffa0510dcd>] btrfs_run_delayed_refs+0x6d/0x1d4 [btrfs]
>    [ 8247.627435]  [<ffffffff81155c9b>] ? __cache_free+0x4a7/0x4b6
>    [ 8247.628531]  [<ffffffffa0520482>] btrfs_commit_transaction+0x4c/0xa20 [btrfs]
>    (...)
>    [ 8247.648430] ---[ end trace 2461e55f92c2ac2d ]---
>
>    [ 8247.727263] WARNING: CPU: 3 PID: 11341 at fs/btrfs/extent-tree.c:2771 btrfs_run_delayed_refs+0xa4/0x1d4 [btrfs]()
>    [ 8247.728954] BTRFS: Transaction aborted (error -5)
>    (...)
>    [ 8247.760866] Call Trace:
>    [ 8247.761534]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
>    [ 8247.764271]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
>    [ 8247.767582]  [<ffffffffa0510e04>] ? btrfs_run_delayed_refs+0xa4/0x1d4 [btrfs]
>    [ 8247.769373]  [<ffffffff8104b410>] warn_slowpath_fmt+0x46/0x48
>    [ 8247.770836]  [<ffffffffa0510e04>] btrfs_run_delayed_refs+0xa4/0x1d4 [btrfs]
>    [ 8247.772532]  [<ffffffff81155c9b>] ? __cache_free+0x4a7/0x4b6
>    [ 8247.773664]  [<ffffffffa0520482>] btrfs_commit_transaction+0x4c/0xa20 [btrfs]
>    [ 8247.775047]  [<ffffffff81087310>] ? trace_hardirqs_on+0xd/0xf
>    [ 8247.776176]  [<ffffffff81155dd5>] ? kmem_cache_free+0x12b/0x189
>    [ 8247.777427]  [<ffffffffa055a920>] btrfs_recover_log_trees+0x2da/0x33d [btrfs]
>    [ 8247.778575]  [<ffffffffa055898e>] ? replay_one_extent+0x4fc/0x4fc [btrfs]
>    [ 8247.779838]  [<ffffffffa051e265>] open_ctree+0x1cc0/0x201a [btrfs]
>    [ 8247.781020]  [<ffffffff81120f48>] ? register_shrinker+0x56/0x81
>    [ 8247.782285]  [<ffffffffa04fb12c>] btrfs_mount+0x5f0/0x734 [btrfs]
>    (...)
>    [ 8247.793394] ---[ end trace 2461e55f92c2ac2e ]---
>    [ 8247.794276] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2771: errno=-5 IO failure
>    [ 8247.797335] BTRFS: error (device dm-0) in btrfs_replay_log:2375: errno=-5 IO failure (Failed to recover log tree)
>
> Fixes: c6fc24549960 ("btrfs: delayed-ref: Use list to replace the ref_root in ref_head.")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   fs/btrfs/extent-tree.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 1c2bd17..bccceea5 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2296,9 +2296,22 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
>   static inline struct btrfs_delayed_ref_node *
>   select_delayed_ref(struct btrfs_delayed_ref_head *head)
>   {
> +	struct btrfs_delayed_ref_node *ref;
> +
>   	if (list_empty(&head->ref_list))
>   		return NULL;
>
> +	/*
> +	 * Select a delayed ref of type BTRFS_ADD_DELAYED_REF first.
> +	 * This is to prevent a ref count from going down to zero, which deletes
> +	 * the extent item from the extent tree, when here still are references
> +	 * to add, which would fail because they would not find the extent item.
> +	 */
> +	list_for_each_entry(ref, &head->ref_list, list) {
> +		if (ref->action == BTRFS_ADD_DELAYED_REF)
> +			return ref;
> +	}
> +
>   	return list_entry(head->ref_list.next, struct btrfs_delayed_ref_node,
>   			  list);
>   }
>

  reply	other threads:[~2015-07-10  0:29 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-09 14:50 [PATCH] Btrfs: fix order by which delayed references are run fdmanana
2015-07-10  0:28 ` Qu Wenruo [this message]
2015-07-27  6:53 ` Qu Wenruo
2015-07-27  8:26   ` Filipe Manana
2015-07-27  9:22     ` Qu Wenruo
2015-07-27 13:50       ` Chris Mason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=559F11C8.8090604@cn.fujitsu.com \
    --to=quwenruo@cn.fujitsu.com \
    --cc=fdmanana@kernel.org \
    --cc=fdmanana@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).