linux-btrfs.vger.kernel.org archive mirror
From: Josef Bacik <jbacik@fb.com>
To: Chandan Rajendra <chandan@linux.vnet.ibm.com>,
	<linux-btrfs@vger.kernel.org>
Cc: <clm@fb.com>, <chandan@mykolab.com>
Subject: Re: [PATCH V2] Btrfs: find_free_extent: Do not erroneously skip LOOP_CACHING_WAIT state
Date: Mon, 2 Nov 2015 10:46:25 -0500
Message-ID: <56378551.6070601@fb.com>
In-Reply-To: <1446452986-6581-1-git-send-email-chandan@linux.vnet.ibm.com>

On 11/02/2015 03:29 AM, Chandan Rajendra wrote:
> When executing generic/001 in a loop on a ppc64 machine (with both sectorsize
> and nodesize set to 64k), the following call trace is observed,
>
> WARNING: at /root/repos/linux/fs/btrfs/locking.c:253
> Modules linked in:
> CPU: 2 PID: 8353 Comm: umount Not tainted 4.3.0-rc5-13676-ga5e681d #54
> task: c0000000f2b1f560 ti: c0000000f6008000 task.ti: c0000000f6008000
> NIP: c000000000520c88 LR: c0000000004a3b34 CTR: 0000000000000000
> REGS: c0000000f600a820 TRAP: 0700   Not tainted  (4.3.0-rc5-13676-ga5e681d)
> MSR: 8000000102029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 24444884  XER: 00000000
> CFAR: c0000000004a3b30 SOFTE: 1
> GPR00: c0000000004a3b34 c0000000f600aaa0 c00000000108ac00 c0000000f5a808c0
> GPR04: 0000000000000000 c0000000f600ae60 0000000000000000 0000000000000005
> GPR08: 00000000000020a1 0000000000000001 c0000000f2b1f560 0000000000000030
> GPR12: 0000000084842882 c00000000fdc0900 c0000000f600ae60 c0000000f070b800
> GPR16: 0000000000000000 c0000000f3c8a000 0000000000000000 0000000000000049
> GPR20: 0000000000000001 0000000000000001 c0000000f5aa01f8 0000000000000000
> GPR24: 0f83e0f83e0f83e1 c0000000f5a808c0 c0000000f3c8d000 c000000000000000
> GPR28: c0000000f600ae74 0000000000000001 c0000000f3c8d000 c0000000f5a808c0
> NIP [c000000000520c88] .btrfs_tree_lock+0x48/0x2a0
> LR [c0000000004a3b34] .btrfs_lock_root_node+0x44/0x80
> Call Trace:
> [c0000000f600aaa0] [c0000000f600ab80] 0xc0000000f600ab80 (unreliable)
> [c0000000f600ab80] [c0000000004a3b34] .btrfs_lock_root_node+0x44/0x80
> [c0000000f600ac00] [c0000000004a99dc] .btrfs_search_slot+0xa8c/0xc00
> [c0000000f600ad40] [c0000000004ab878] .btrfs_insert_empty_items+0x98/0x120
> [c0000000f600adf0] [c00000000050da44] .btrfs_finish_chunk_alloc+0x1d4/0x620
> [c0000000f600af20] [c0000000004be854] .btrfs_create_pending_block_groups+0x1d4/0x2c0
> [c0000000f600b020] [c0000000004bf188] .do_chunk_alloc+0x3c8/0x420
> [c0000000f600b100] [c0000000004c27cc] .find_free_extent+0xbfc/0x1030
> [c0000000f600b260] [c0000000004c2ce8] .btrfs_reserve_extent+0xe8/0x250
> [c0000000f600b330] [c0000000004c2f90] .btrfs_alloc_tree_block+0x140/0x590
> [c0000000f600b440] [c0000000004a47b4] .__btrfs_cow_block+0x124/0x780
> [c0000000f600b530] [c0000000004a4fc0] .btrfs_cow_block+0xf0/0x250
> [c0000000f600b5e0] [c0000000004a917c] .btrfs_search_slot+0x22c/0xc00
> [c0000000f600b720] [c00000000050aa40] .btrfs_remove_chunk+0x1b0/0x9f0
> [c0000000f600b850] [c0000000004c4e04] .btrfs_delete_unused_bgs+0x434/0x570
> [c0000000f600b950] [c0000000004d3cb8] .close_ctree+0x2e8/0x3b0
> [c0000000f600ba20] [c00000000049d178] .btrfs_put_super+0x18/0x30
> [c0000000f600ba90] [c000000000243cd4] .generic_shutdown_super+0xa4/0x1a0
> [c0000000f600bb10] [c0000000002441d8] .kill_anon_super+0x18/0x30
> [c0000000f600bb90] [c00000000049c898] .btrfs_kill_super+0x18/0xc0
> [c0000000f600bc10] [c0000000002444f8] .deactivate_locked_super+0x98/0xe0
> [c0000000f600bc90] [c000000000269f94] .cleanup_mnt+0x54/0xa0
> [c0000000f600bd10] [c0000000000bd744] .task_work_run+0xc4/0x100
> [c0000000f600bdb0] [c000000000016334] .do_notify_resume+0x74/0x80
> [c0000000f600be30] [c0000000000098b8] .ret_from_except_lite+0x64/0x68
> Instruction dump:
> fba1ffe8 fbc1fff0 fbe1fff8 7c791b78 f8010010 f821ff21 e94d0290 81030040
> 812a04e8 7d094a78 7d290034 5529d97e <0b090000> 3b400000 3be30050 3bc3004c
>
> The above call trace is also seen on x86_64, albeit very rarely, and only
> with nodesize set to 64k and the nospace_cache mount option in use.
>
> The reason for the above call trace is:
>
> btrfs_remove_chunk
>   check_system_chunk
>     Allocate a chunk if required
>   For each physical stripe on the underlying device,
>     btrfs_free_dev_extent
>       ...
>       Take the lock on the device tree's root node
>       btrfs_cow_block("dev tree's root node");
>         btrfs_reserve_extent
>           find_free_extent
>             index = BTRFS_RAID_DUP;
>             have_caching_bg = false;
>
>             When in the LOOP_CACHING_NOWAIT state, assume we find a block
>             group which is being cached; hence have_caching_bg is set to
>             true.
>
>             When repeating the search for the next RAID index, we reset
>             have_caching_bg to false.
>
> Hence, right after completing the LOOP_CACHING_NOWAIT state, we incorrectly
> skip the LOOP_CACHING_WAIT state and move to the LOOP_ALLOC_CHUNK state,
> where we allocate a chunk and try to add entries corresponding to the
> chunk's physical stripes into the device tree. While doing so, the task
> deadlocks, waiting for the blocking lock on the device tree's root node,
> which it is already holding.
>
> This commit fixes the issue by introducing a new local variable,
> orig_have_caching_bg, to indicate whether a block group of any RAID type is
> being cached.
>
> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
> ---
> Changelog:
> v1->v2: Honor 80 column restriction.
>
>   fs/btrfs/extent-tree.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index f50c7c2..99a8e57 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -7029,6 +7029,7 @@ static noinline int find_free_extent(struct btrfs_root *orig_root,
>   	bool failed_alloc = false;
>   	bool use_cluster = true;
>   	bool have_caching_bg = false;
> +	bool orig_have_caching_bg = false;
>   	bool full_search = false;
>
>   	WARN_ON(num_bytes < root->sectorsize);
> @@ -7378,6 +7379,10 @@ loop:
>   	}
>   	up_read(&space_info->groups_sem);
>
> +	if ((loop == LOOP_CACHING_NOWAIT) && have_caching_bg
> +		&& !orig_have_caching_bg)
> +		orig_have_caching_bg = true;
> +
>   	if (!ins->objectid && loop >= LOOP_CACHING_WAIT && have_caching_bg)
>   		goto search;
>
> @@ -7400,7 +7405,7 @@ loop:
>   			 * don't have any uncached bgs and we've already done a
>   			 * full search through.
>   			 */
> -			if (have_caching_bg || !full_search)
> +			if (orig_have_caching_bg || !full_search)
>   				loop = LOOP_CACHING_WAIT;
>   			else
>   				loop = LOOP_ALLOC_CHUNK;
>

Ah good catch, thanks Chandan,

Reviewed-by: Josef Bacik <jbacik@fb.com>
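
For readers who don't have find_free_extent() fresh in mind, below is a minimal,
self-contained C model of the state decision the patch changes. It is an
illustration only, not the btrfs code: the LOOP_* names follow the kernel's
constants, but NUM_RAID_INDEXES, index_has_caching_bg() and the scenario in
main() are made up. It shows how have_caching_bg, because it is reset on every
RAID-index pass, forgets the caching block group found under an earlier index
and lets the allocator jump straight to LOOP_ALLOC_CHUNK, while the sticky
orig_have_caching_bg introduced by the patch keeps it in LOOP_CACHING_WAIT.

/*
 * Simplified model of the retry-state decision in find_free_extent().
 * The LOOP_* names mirror the kernel's constants; everything else
 * (NUM_RAID_INDEXES, index_has_caching_bg(), the scenario in main())
 * is hypothetical and exists only to demonstrate the flag behaviour.
 */
#include <stdbool.h>
#include <stdio.h>

enum {
    LOOP_CACHING_NOWAIT,    /* only use already-cached block groups */
    LOOP_CACHING_WAIT,      /* wait for in-progress caching to finish */
    LOOP_ALLOC_CHUNK,       /* allocate a new chunk (touches the dev tree) */
    LOOP_NO_EMPTY_SIZE,
};

#define NUM_RAID_INDEXES 3  /* hypothetical: e.g. DUP, RAID1, SINGLE */

/* In this scenario only the first RAID index has a bg that is still caching. */
static bool index_has_caching_bg(int index)
{
    return index == 0;
}

/*
 * Return the loop state chosen after one full LOOP_CACHING_NOWAIT search
 * over all RAID indexes.  use_sticky_flag == false models the original
 * code, true models the patched code.
 */
static int next_state(bool use_sticky_flag)
{
    bool have_caching_bg = false;
    bool orig_have_caching_bg = false;  /* the patch's extra flag */
    bool full_search = true;            /* every index was searched */

    for (int index = 0; index < NUM_RAID_INDEXES; index++) {
        /* The kernel resets this at its 'search:' label, i.e. once per
         * RAID-index pass, so only the result of the last pass survives. */
        have_caching_bg = index_has_caching_bg(index);
        if (have_caching_bg)
            orig_have_caching_bg = true;    /* sticky across passes */
    }

    bool saw_caching_bg = use_sticky_flag ? orig_have_caching_bg
                                          : have_caching_bg;

    /* The decision taken at the end of LOOP_CACHING_NOWAIT. */
    if (saw_caching_bg || !full_search)
        return LOOP_CACHING_WAIT;
    return LOOP_ALLOC_CHUNK;    /* chunk allocation -> device tree updates */
}

int main(void)
{
    printf("unpatched: %s\n", next_state(false) == LOOP_CACHING_WAIT ?
           "LOOP_CACHING_WAIT" : "LOOP_ALLOC_CHUNK");
    printf("patched:   %s\n", next_state(true) == LOOP_CACHING_WAIT ?
           "LOOP_CACHING_WAIT" : "LOOP_ALLOC_CHUNK");
    return 0;
}

Compiled and run, the model prints LOOP_ALLOC_CHUNK for the unpatched decision
and LOOP_CACHING_WAIT for the patched one; the latter is what keeps
find_free_extent from allocating a chunk while the device tree's root node is
already locked further up the call chain.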

Thread overview: 6+ messages
2015-10-31 19:11 [PATCH] Btrfs: find_free_extent: Do not erroneously skip LOOP_CACHING_WAIT state Chandan Rajendra
2015-11-02  8:29 ` [PATCH V2] " Chandan Rajendra
2015-11-02 15:46   ` Josef Bacik [this message]
2015-11-02 16:52   ` Chris Mason
     [not found]     ` <CAOcd+r3ku8mhfK_ZvZRL2pRSZ=Y1Q2maNF_ob+8fEA=2Etm1Lg@mail.gmail.com>
2015-12-13 10:18       ` Alex Lyakas
2015-12-14  7:59         ` Chandan Rajendra
