From: Liu Bo
Subject: Re: 3.4-rc6: delayed alloc deadlock...
Date: Wed, 09 May 2012 09:20:42 +0800
Message-ID: <4FA9C66A.80900@cn.fujitsu.com>
Cc: Chris Mason, Josef Bacik, Jeff Mahoney, Linux BTRFS
To: Daniel J Blueman

On 05/08/2012 04:56 PM, Daniel J Blueman wrote:
> Delayed allocation ref mutexes are taken [1] inside
> btrfs_commit_transaction. A later call fails and jumps to the
> cleanup_transaction label (transaction.c:1501) with these mutexes
> still held causing deadlock [2] when they are reacquired.
>
> Either we can introduce an earlier label (cleanup_transaction_lock)
> and function to unlock these mutexes or can tweak
> btrfs_destroy_delayed_refs to conditionally use mutex_trylock.
>
> What is the suggested approach?
>

Hi Daniel,

I prefer mutex_trylock(), as other places already do.

You can give it a try. :)

thanks,
liubo

> Thanks,
> Daniel
>
> --- [1]
>
> btrfs_commit_transaction -> btrfs_run_delayed_refs ->
> run_clustered_refs -> btrfs_delayed_ref_lock -> struct
> btrfs_delayed_ref_head -> mutex
>
> --- [2]
>
> btrfs bad tree block start 0 39845888
> btrfs bad tree block start 0 39845888
> btrfs: run_one_delayed_ref returned -5
> WARNING: at fs/btrfs/super.c:219 __btrfs_abort_transaction+0xa6/0xc0 [btrfs]()
> Hardware name: Latitude E5420
> btrfs: Transaction aborted
> Modules linked in: brd nls_iso8859_1 nls_cp437 vfat fat dm_crypt
> dm_mod kvm_intel kvm coretemp binfmt_misc microcode uvcvideo
> videobuf2_core videodev videobuf2_vmalloc videobuf2_memops iwlwifi
> btrfs i915 cfbcopyarea cfbimgblt cfbfillrect video usb_storage
> Pid: 14985, comm: btrfs-endio-wri Tainted: G W 3.4.0-rc6-debug #14
> Call Trace:
> [] warn_slowpath_common+0x7a/0xb0
> [] warn_slowpath_fmt+0x41/0x50
> [] ? __lock_release+0xad/0xd0
> [] __btrfs_abort_transaction+0xa6/0xc0 [btrfs]
> [] btrfs_run_delayed_refs+0x296/0x300 [btrfs]
> [] __btrfs_end_transaction+0xa7/0x360 [btrfs]
> [] btrfs_end_transaction+0x10/0x20 [btrfs]
> [] btrfs_finish_ordered_io+0x17d/0x3b0 [btrfs]
> [] ? trace_hardirqs_on_caller+0x105/0x190
> [] btrfs_writepage_end_io_hook+0x15/0x20 [btrfs]
> [] end_extent_writepage+0x58/0x100 [btrfs]
> [] end_bio_extent_writepage+0x64/0x90 [btrfs]
> [] bio_endio+0x18/0x30
> [] end_workqueue_fn+0x3c/0x50 [btrfs]
> [] worker_loop+0x86/0x330 [btrfs]
> [] ? check_pending_worker_creates.isra.1+0xd0/0xd0 [btrfs]
> [] kthread+0x8e/0xa0
> [] kernel_thread_helper+0x4/0x10
> [] ? retint_restore_args+0xe/0xe
> [] ? __init_kthread_worker+0x70/0x70
> [] ? gs_change+0xb/0xb
> ---[ end trace df06b72f93439fa3 ]---
> BTRFS warning (device ram1): Aborting unused transaction.
> btrfs bad tree block start 0 39845888
> btrfs bad tree block start 0 39845888
> btrfs: run_one_delayed_ref returned -5
> BTRFS error (device ram1) in btrfs_run_delayed_refs:2454: IO failure
> btrfs is forced readonly
> BTRFS warning (device ram1): Skipping commit of aborted transaction.
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.4.0-rc6-debug #14 Tainted: G W
> ---------------------------------------------
> btrfs/18749 is trying to acquire lock:
> (&head_ref->mutex){+.+...}, at: []
> btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs]
>
> but task is already holding lock:
> (&head_ref->mutex){+.+...}, at: []
> btrfs_delayed_ref_lock+0x37/0x140 [btrfs]
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&head_ref->mutex);
> lock(&head_ref->mutex);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 3 locks held by btrfs/18749:
> #0: (&type->i_mutex_dir_key#4/1){+.+.+.}, at: []
> btrfs_mksubvol+0x4e/0x1a0 [btrfs]
> #1: (&fs_info->subvol_sem){++++..}, at: []
> btrfs_mksubvol+0x107/0x1a0 [btrfs]
> #2: (&head_ref->mutex){+.+...}, at: []
> btrfs_delayed_ref_lock+0x37/0x140 [btrfs]
>
> stack backtrace:
> Pid: 18749, comm: btrfs Tainted: G W 3.4.0-rc6-debug #14
> Call Trace:
> [] print_deadlock_bug+0xf3/0x100
> [] check_deadlock.isra.29+0x1e2/0x1f0
> [] validate_chain.isra.33+0x383/0x510
> [] __lock_acquire+0x388/0x900
> [] lock_acquire+0x55/0x70
> [] ? btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs]
> [] mutex_lock_nested+0x6b/0x340
> [] ? btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs]
> [] ? __lock_release+0xad/0xd0
> [] btrfs_destroy_delayed_refs.isra.96+0xf9/0x210 [btrfs]
> [] btrfs_cleanup_one_transaction+0x12/0x100 [btrfs]
> [] cleanup_transaction+0x76/0xf0 [btrfs]
> [] btrfs_commit_transaction+0xf1/0x900 [btrfs]
> [] ? __init_waitqueue_head+0x60/0x60
> [] create_snapshot.isra.46+0x1ab/0x1e0 [btrfs]
> [] btrfs_mksubvol+0x135/0x1a0 [btrfs]
> [] ? files_lglock_local_lock+0x70/0x70
> [] btrfs_ioctl_snap_create_transid+0x12a/0x190 [btrfs]
> [] btrfs_ioctl_snap_create_v2.constprop.57+0xe0/0xf0 [btrfs]
> [] ? __schedule+0x351/0x8b0
> [] btrfs_ioctl+0x409/0x770 [btrfs]
> [] do_vfs_ioctl+0x87/0x340
> [] sys_ioctl+0x4a/0x80
> [] system_call_fastpath+0x16/0x1b