From: Liu Bo <liubo2009@cn.fujitsu.com>
To: Daniel J Blueman <daniel@quora.org>
Cc: Chris Mason <chris.mason@oracle.com>,
Josef Bacik <josef@redhat.com>, Jeff Mahoney <jeffm@suse.com>,
Linux BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: 3.4-rc6: delayed alloc deadlock...
Date: Wed, 09 May 2012 09:20:42 +0800
Message-ID: <4FA9C66A.80900@cn.fujitsu.com>
In-Reply-To: <CAMVG2st1p8xkgOj7dknGSNtX5y42tv_tmnsUjZFQkMP3dXa9rw@mail.gmail.com>
On 05/08/2012 04:56 PM, Daniel J Blueman wrote:
> Delayed allocation ref mutexes are taken [1] inside
> btrfs_commit_transaction. A later call fails and jumps to the
> cleanup_transaction label (transaction.c:1501) with these mutexes
> still held causing deadlock [2] when they are reacquired.
>
> We can either introduce an earlier label (cleanup_transaction_lock)
> and a function to unlock these mutexes, or tweak
> btrfs_destroy_delayed_refs to conditionally use mutex_trylock.
>
> What is the suggested approach?
>
Hi Daniel,
I prefer mutex_trylock(), as other places in the code already do.
You can give it a try. :)
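
For illustration only, here is a rough userspace sketch of the trylock
idea, using pthreads instead of kernel mutexes. Note that
pthread_mutex_trylock() returns 0 on success, while the kernel's
mutex_trylock() returns 1 on success. struct ref_head and
destroy_ref_head() are made-up stand-ins and not the btrfs code, and the
sketch assumes that during cleanup the only possible holder of the mutex
is the aborted commit path itself:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a delayed ref head; only the mutex matters here. */
struct ref_head {
	pthread_mutex_t mutex;	/* assumed to be a normal, non-recursive mutex */
	bool destroyed;
};

/*
 * Cleanup path: take the head mutex only if it is currently free.
 *
 * If the trylock fails, the mutex is already held. This sketch assumes
 * the holder is the caller (the failed commit path), so it proceeds
 * without blocking and leaves the unlock to whoever took the lock.
 *
 * Returns true if this call acquired (and released) the mutex itself,
 * false if the mutex was already held on entry.
 */
bool destroy_ref_head(struct ref_head *head)
{
	bool locked_here = (pthread_mutex_trylock(&head->mutex) == 0);

	head->destroyed = true;

	if (locked_here)
		pthread_mutex_unlock(&head->mutex);

	return locked_here;
}
```

With a plain blocking lock in place of the trylock, the second case
(mutex already held by the caller) would block forever, which is the
recursive locking that lockdep reports in [2].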
thanks,
liubo
> Thanks,
> Daniel
>
> --- [1]
>
> btrfs_commit_transaction -> btrfs_run_delayed_refs ->
> run_clustered_refs -> btrfs_delayed_ref_lock -> struct
> btrfs_delayed_ref_head -> mutex
>
> --- [2]
>
> btrfs bad tree block start 0 39845888
> btrfs bad tree block start 0 39845888
> btrfs: run_one_delayed_ref returned -5
> WARNING: at fs/btrfs/super.c:219 __btrfs_abort_transaction+0xa6/0xc0 [btrfs]()
> Hardware name: Latitude E5420
> btrfs: Transaction aborted
> Modules linked in: brd nls_iso8859_1 nls_cp437 vfat fat dm_crypt
> dm_mod kvm_intel kvm coretemp binfmt_misc microcode uvcvideo
> videobuf2_core videodev videobuf2_vmalloc videobuf2_memops iwlwifi
> btrfs i915 cfbcopyarea cfbimgblt cfbfillrect video usb_storage
> Pid: 14985, comm: btrfs-endio-wri Tainted: G W 3.4.0-rc6-debug #14
> Call Trace:
> [<ffffffff8103c5ca>] warn_slowpath_common+0x7a/0xb0
> [<ffffffff8103c6a1>] warn_slowpath_fmt+0x41/0x50
> [<ffffffff8108e9cd>] ? __lock_release+0xad/0xd0
> [<ffffffffa0094c76>] __btrfs_abort_transaction+0xa6/0xc0 [btrfs]
> [<ffffffffa00a87a6>] btrfs_run_delayed_refs+0x296/0x300 [btrfs]
> [<ffffffffa00b9ad7>] __btrfs_end_transaction+0xa7/0x360 [btrfs]
> [<ffffffffa00b9df0>] btrfs_end_transaction+0x10/0x20 [btrfs]
> [<ffffffffa00c049d>] btrfs_finish_ordered_io+0x17d/0x3b0 [btrfs]
> [<ffffffff8108f505>] ? trace_hardirqs_on_caller+0x105/0x190
> [<ffffffffa00c06e5>] btrfs_writepage_end_io_hook+0x15/0x20 [btrfs]
> [<ffffffffa00dbbb8>] end_extent_writepage+0x58/0x100 [btrfs]
> [<ffffffffa00dbcc4>] end_bio_extent_writepage+0x64/0x90 [btrfs]
> [<ffffffff81147458>] bio_endio+0x18/0x30
> [<ffffffffa00b1efc>] end_workqueue_fn+0x3c/0x50 [btrfs]
> [<ffffffffa00e8cc6>] worker_loop+0x86/0x330 [btrfs]
> [<ffffffffa00e8c40>] ? check_pending_worker_creates.isra.1+0xd0/0xd0 [btrfs]
> [<ffffffff8105da6e>] kthread+0x8e/0xa0
> [<ffffffff815b1b94>] kernel_thread_helper+0x4/0x10
> [<ffffffff815b0259>] ? retint_restore_args+0xe/0xe
> [<ffffffff8105d9e0>] ? __init_kthread_worker+0x70/0x70
> [<ffffffff815b1b90>] ? gs_change+0xb/0xb
> ---[ end trace df06b72f93439fa3 ]---
> BTRFS warning (device ram1): Aborting unused transaction.
> btrfs bad tree block start 0 39845888
> btrfs bad tree block start 0 39845888
> btrfs: run_one_delayed_ref returned -5
> BTRFS error (device ram1) in btrfs_run_delayed_refs:2454: IO failure
> btrfs is forced readonly
> BTRFS warning (device ram1): Skipping commit of aborted transaction.
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.4.0-rc6-debug #14 Tainted: G W
> ---------------------------------------------
> btrfs/18749 is trying to acquire lock:
> (&head_ref->mutex){+.+...}, at: [<ffffffffa00b22a9>]
> btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs]
>
> but task is already holding lock:
> (&head_ref->mutex){+.+...}, at: [<ffffffffa00fce07>]
> btrfs_delayed_ref_lock+0x37/0x140 [btrfs]
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&head_ref->mutex);
> lock(&head_ref->mutex);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 3 locks held by btrfs/18749:
> #0: (&type->i_mutex_dir_key#4/1){+.+.+.}, at: [<ffffffffa00eb86e>]
> btrfs_mksubvol+0x4e/0x1a0 [btrfs]
> #1: (&fs_info->subvol_sem){++++..}, at: [<ffffffffa00eb927>]
> btrfs_mksubvol+0x107/0x1a0 [btrfs]
> #2: (&head_ref->mutex){+.+...}, at: [<ffffffffa00fce07>]
> btrfs_delayed_ref_lock+0x37/0x140 [btrfs]
>
> stack backtrace:
> Pid: 18749, comm: btrfs Tainted: G W 3.4.0-rc6-debug #14
> Call Trace:
> [<ffffffff8108b913>] print_deadlock_bug+0xf3/0x100
> [<ffffffff8108bb02>] check_deadlock.isra.29+0x1e2/0x1f0
> [<ffffffff8108d443>] validate_chain.isra.33+0x383/0x510
> [<ffffffff8108dff8>] __lock_acquire+0x388/0x900
> [<ffffffff8108ea95>] lock_acquire+0x55/0x70
> [<ffffffffa00b22a9>] ? btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs]
> [<ffffffff815ad38b>] mutex_lock_nested+0x6b/0x340
> [<ffffffffa00b22a9>] ? btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs]
> [<ffffffff8108e9cd>] ? __lock_release+0xad/0xd0
> [<ffffffffa00b22a9>] btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs]
> [<ffffffffa00b5682>] btrfs_cleanup_one_transaction+0x12/0x100 [btrfs]
> [<ffffffffa00b8976>] cleanup_transaction+0x76/0xf0 [btrfs]
> [<ffffffffa00b91f1>] btrfs_commit_transaction+0xf1/0x900 [btrfs]
> [<ffffffff8105e250>] ? __init_waitqueue_head+0x60/0x60
> [<ffffffffa00eb7eb>] create_snapshot.isra.46+0x1ab/0x1e0 [btrfs]
> [<ffffffffa00eb955>] btrfs_mksubvol+0x135/0x1a0 [btrfs]
> [<ffffffff811158e0>] ? files_lglock_local_lock+0x70/0x70
> [<ffffffffa00ebaea>] btrfs_ioctl_snap_create_transid+0x12a/0x190 [btrfs]
> [<ffffffffa00ec8b0>] btrfs_ioctl_snap_create_v2.constprop.57+0xe0/0xf0 [btrfs]
> [<ffffffff815ae241>] ? __schedule+0x351/0x8b0
> [<ffffffffa00eee39>] btrfs_ioctl+0x409/0x770 [btrfs]
> [<ffffffff81128767>] do_vfs_ioctl+0x87/0x340
> [<ffffffff81128a6a>] sys_ioctl+0x4a/0x80
> [<ffffffff815b09a2>] system_call_fastpath+0x16/0x1b