From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from acsinet15.oracle.com ([141.146.126.227]:48759 "EHLO acsinet15.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753445Ab2ILJqW (ORCPT ); Wed, 12 Sep 2012 05:46:22 -0400 Message-ID: <505059E4.1080903@oracle.com> Date: Wed, 12 Sep 2012 17:46:12 +0800 From: Liu Bo MIME-Version: 1.0 To: Roman Mamedov CC: Linux Btrfs Subject: Re: Deadlock in btrfs-cleaner, related to snapshot deletion References: <20120912135315.32d670c2@natsu> In-Reply-To: <20120912135315.32d670c2@natsu> Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-btrfs-owner@vger.kernel.org List-ID: On 09/12/2012 03:53 PM, Roman Mamedov wrote: > Hello, > > (this is a recap of yesterday's discussion on BTRFS IRC, also to save relevant pastes before pastebins expire) > > I have my /home on btrfs; a cronjob makes one snapshot every 30 minutes; these > snapshots are kept for 24-48 hours, then deleted in batches. > > This is a 16K Leaf/Node BTRFS on top of mdadm RAID1. > > As system uptime approached 2 weeks, I started noticing that the free space on > that partition does not free up as it should assuming continuous clean-up of older snapshots. > > I have tried a reboot, and it locked up unmounting the FS. After a hard > reset, soon after mounting the FS processes start to lock-up in D state, with > messages that a task is blocked for more than 120 seconds in dmesg. > > I have upgraded to 3.6.0-rc5, and this is what I got via Alt-SysRq-W in this locked up state: > I try to figure out where goes wrong, but the listed stacks seems to be not enough for solving the deadlock, so could you please use a lockdep debug kernel and post the result here? thanks, liubo > [ 386.318062] SysRq : Show Blocked State > [ 386.318141] task PC stack pid father > [ 386.318188] btrfs-cleaner D ffffffff8180d440 0 3506 2 0x00000000 > [ 386.318197] ffff88040c34b6a0 0000000000000046 ffff88040c34b650 ffffffff81139f03 > [ 386.318205] ffff88040c34bfd8 ffff88040c34bfd8 ffff88040c34bfd8 0000000000013dc0 > [ 386.318213] ffffffff81c14440 ffff88040b3496d0 ffff88040c34b6b0 ffff880405a04408 > [ 386.318220] Call Trace: > [ 386.318236] [] ? activate_page+0x83/0xa0 > [ 386.318245] [] schedule+0x29/0x70 > [ 386.318305] [] btrfs_tree_lock+0xcd/0x250 [btrfs] > [ 386.318314] [] ? add_wait_queue+0x60/0x60 > [ 386.318348] [] btrfs_init_new_buffer+0x68/0x140 [btrfs] > [ 386.318379] [] btrfs_alloc_free_block+0x9f/0x220 [btrfs] > [ 386.318408] [] __btrfs_cow_block+0x122/0x4f0 [btrfs] > [ 386.318415] [] ? _raw_spin_lock+0xe/0x20 > [ 386.318443] [] btrfs_cow_block+0xfc/0x220 [btrfs] > [ 386.318472] [] btrfs_search_slot+0x368/0x740 [btrfs] > [ 386.318502] [] lookup_inline_extent_backref+0x8e/0x4c0 [btrfs] > [ 386.318532] [] lookup_extent_backref+0x60/0xf0 [btrfs] > [ 386.318561] [] __btrfs_free_extent+0xb5/0x670 [btrfs] > [ 386.318592] [] run_delayed_tree_ref+0x114/0x190 [btrfs] > [ 386.318623] [] run_one_delayed_ref+0xde/0xf0 [btrfs] > [ 386.318654] [] run_clustered_refs+0x136/0x3d0 [btrfs] > [ 386.318685] [] btrfs_run_delayed_refs+0xf0/0x2d0 [btrfs] > [ 386.318721] [] __btrfs_end_transaction+0xf9/0x3c0 [btrfs] > [ 386.318753] [] ? btrfs_drop_snapshot+0x545/0x600 [btrfs] > [ 386.318789] [] btrfs_end_transaction_throttle+0x18/0x20 [btrfs] > [ 386.318820] [] btrfs_drop_snapshot+0x213/0x600 [btrfs] > [ 386.318857] [] btrfs_clean_old_snapshots+0xa7/0x150 [btrfs] > [ 386.318890] [] cleaner_kthread+0xa9/0x120 [btrfs] > [ 386.318924] [] ? btree_invalidatepage+0x80/0x80 [btrfs] > [ 386.318931] [] kthread+0x93/0xa0 > [ 386.318939] [] kernel_thread_helper+0x4/0x10 > [ 386.318946] [] ? flush_kthread_worker+0xb0/0xb0 > [ 386.318952] [] ? gs_change+0x13/0x13 > [ 386.318956] btrfs-transacti D 0000000000000000 0 3507 2 0x00000000 > [ 386.318963] ffff88040aa15920 0000000000000046 0000000000000003 0000000000025250 > [ 386.318971] ffff88040aa15fd8 ffff88040aa15fd8 ffff88040aa15fd8 0000000000013dc0 > [ 386.318977] ffff88040a7fada0 ffff88040b34c470 ffff88040aa15930 ffff8803c7798ac0 > [ 386.318984] Call Trace: > [ 386.318991] [] schedule+0x29/0x70 > [ 386.319031] [] btrfs_tree_lock+0xcd/0x250 [btrfs] > [ 386.319039] [] ? add_wait_queue+0x60/0x60 > [ 386.319068] [] btrfs_search_slot+0x5f1/0x740 [btrfs] > [ 386.319098] [] lookup_inline_extent_backref+0x8e/0x4c0 [btrfs] > [ 386.319128] [] lookup_extent_backref+0x60/0xf0 [btrfs] > [ 386.319157] [] __btrfs_free_extent+0xb5/0x670 [btrfs] > [ 386.319188] [] run_delayed_tree_ref+0x114/0x190 [btrfs] > [ 386.319219] [] run_one_delayed_ref+0xde/0xf0 [btrfs] > [ 386.319250] [] run_clustered_refs+0x136/0x3d0 [btrfs] > [ 386.319281] [] btrfs_run_delayed_refs+0xf0/0x2d0 [btrfs] > [ 386.319317] [] btrfs_commit_transaction+0xa5/0x9c0 [btrfs] > [ 386.319325] [] ? add_wait_queue+0x60/0x60 > [ 386.319359] [] transaction_kthread+0x1b5/0x230 [btrfs] > [ 386.319393] [] ? btrfs_destroy_delayed_refs.isra.103+0x210/0x210 [btrfs] > [ 386.319400] [] kthread+0x93/0xa0 > [ 386.319407] [] kernel_thread_helper+0x4/0x10 > [ 386.319414] [] ? flush_kthread_worker+0xb0/0xb0 > [ 386.319421] [] ? gs_change+0x13/0x13 > [ 386.319424] flush-btrfs-3 D 0000000000000000 0 3520 2 0x00000000 > [ 386.319431] ffff8803c181b9d0 0000000000000046 ffff8803c181b970 ffff880405ab2d98 > [ 386.319437] ffff8803c181bfd8 ffff8803c181bfd8 ffff8803c181bfd8 0000000000013dc0 > [ 386.319444] ffff88040a7fada0 ffff8804014316d0 ffff8803c181b9e0 ffff8803d870e2b0 > [ 386.319450] Call Trace: > [ 386.319457] [] schedule+0x29/0x70 > [ 386.319496] [] btrfs_tree_lock+0xcd/0x250 [btrfs] > [ 386.319503] [] ? add_wait_queue+0x60/0x60 > [ 386.319542] [] lock_extent_buffer_for_io+0x103/0x260 [btrfs] > [ 386.319581] [] btree_write_cache_pages+0x21c/0x410 [btrfs] > [ 386.319619] [] ? can_nocow_odirect+0x2b0/0x2b0 [btrfs] > [ 386.319654] [] btree_writepages+0x3b/0x40 [btrfs] > [ 386.319661] [] do_writepages+0x20/0x40 > [ 386.319670] [] __writeback_single_inode.isra.32+0x3f/0x190 > [ 386.319678] [] writeback_sb_inodes+0x1a0/0x350 > [ 386.319685] [] wb_writeback+0xef/0x340 > [ 386.319693] [] wb_do_writeback+0xb8/0x1d0 > [ 386.319701] [] ? usleep_range+0x50/0x50 > [ 386.319709] [] bdi_writeback_thread+0x8b/0x290 > [ 386.319717] [] ? wb_do_writeback+0x1d0/0x1d0 > [ 386.319723] [] kthread+0x93/0xa0 > [ 386.319730] [] kernel_thread_helper+0x4/0x10 > [ 386.319738] [] ? flush_kthread_worker+0xb0/0xb0 > [ 386.319744] [] ? gs_change+0x13/0x13 > [ 386.319748] sync D ffffffff8180d440 0 3585 3570 0x00000000 > [ 386.319754] ffff8803dacb1cf8 0000000000000082 ffff8803dacb1cd8 ffffffff81094699 > [ 386.319761] ffff8803dacb1fd8 ffff8803dacb1fd8 ffff8803dacb1fd8 0000000000013dc0 > [ 386.319768] ffff88040d92db40 ffff88040bc14470 0000000000013dc0 7fffffffffffffff > [ 386.319774] Call Trace: > [ 386.319783] [] ? check_preempt_wakeup+0x1c9/0x270 > [ 386.319789] [] schedule+0x29/0x70 > [ 386.319796] [] schedule_timeout+0x1fd/0x260 > [ 386.319803] [] ? ttwu_do_activate.constprop.86+0x66/0x70 > [ 386.319809] [] ? ttwu_queue+0xd6/0xf0 > [ 386.319815] [] wait_for_common+0xdf/0x180 > [ 386.319822] [] ? fdatawrite_one_bdev+0x20/0x20 > [ 386.319828] [] ? try_to_wake_up+0x200/0x200 > [ 386.319834] [] ? fdatawrite_one_bdev+0x20/0x20 > [ 386.319840] [] wait_for_completion+0x1d/0x20 > [ 386.319847] [] sync_inodes_sb+0x97/0xd0 > [ 386.319854] [] sync_inodes_one_sb+0x1d/0x30 > [ 386.319862] [] iterate_supers+0x101/0x110 > [ 386.319867] [] sys_sync+0x35/0x90 > [ 386.319874] [] system_call_fastpath+0x1a/0x1f > [ 386.319881] Sched Debug Version: v0.10, 3.6.0-030600rc5-generic #201209082035 > [ 386.319885] ktime : 386839.691480 > [ 386.319889] sched_clk : 386319.878549 > [ 386.319893] cpu_clk : 386319.878681 > [ 386.319896] jiffies : 4294989005 > [ 386.319899] sched_clock_stable : 1 > [ 386.319901] > [ 386.319903] sysctl_sched > [ 386.319907] .sysctl_sched_latency : 12.000000 > [ 386.319911] .sysctl_sched_min_granularity : 1.500000 > [ 386.319914] .sysctl_sched_wakeup_granularity : 2.000000 > [ 386.319917] .sysctl_sched_child_runs_first : 0 > [ 386.319920] .sysctl_sched_features : 24119 > [ 386.319924] .sysctl_sched_tunable_scaling : 1 (logaritmic) > [ 386.319929] > [ 386.319929] cpu#0, 3013.515 MHz > [ 386.319933] .nr_running : 2 > [ 386.319936] .load : 2048 > [ 386.319939] .nr_switches : 437801 > [ 386.319942] .nr_load_updates : 22334 > [ 386.319945] .nr_uninterruptible : 3 > [ 386.319948] .next_balance : 4294.988996 > [ 386.319951] .curr->pid : 3590 > [ 386.319954] .clock : 386317.815742 > [ 386.319957] .cpu_load[0] : 0 > [ 386.319960] .cpu_load[1] : 0 > [ 386.319962] .cpu_load[2] : 0 > [ 386.319965] .cpu_load[3] : 0 > [ 386.319967] .cpu_load[4] : 0 > [ 386.319970] .yld_count : 0 > [ 386.319973] .sched_count : 445110 > [ 386.319976] .sched_goidle : 198003 > [ 386.319979] .avg_idle : 1000000 > [ 386.319982] .ttwu_count : 184667 > [ 386.319985] .ttwu_local : 178809 > [ 386.319990] > [ 386.319990] cfs_rq[0]:/autogroup-47 > [ 386.319994] .exec_clock : 13.273210 > [ 386.319998] .MIN_vruntime : 0.000001 > [ 386.320001] .min_vruntime : 13.529500 > [ 386.320005] .max_vruntime : 0.000001 > [ 386.320008] .spread : 0.000000 > [ 386.320011] .spread0 : -23453.744689 > [ 386.320015] .nr_spread_over : 0 > [ 386.320018] .nr_running : 1 > [ 386.320020] .load : 1024 > [ 386.320024] .load_avg : 0.000000 > [ 386.320027] .load_period : 9.999999 > [ 386.320030] .load_contrib : 0 > [ 386.320033] .load_tg : 0 > [ 386.320037] .se->exec_start : 386317.815742 > [ 386.320040] .se->vruntime : 23467.274189 > [ 386.320044] .se->sum_exec_runtime : 13.339940 > [ 386.320047] .se->statistics.wait_start : 0.000000 > [ 386.320051] .se->statistics.sleep_start : 0.000000 > [ 386.320054] .se->statistics.block_start : 0.000000 > [ 386.320057] .se->statistics.sleep_max : 0.000000 > [ 386.320060] .se->statistics.block_max : 0.000000 > [ 386.320063] .se->statistics.exec_max : 1.952409 > [ 386.320067] .se->statistics.slice_max : 0.000000 > [ 386.320070] .se->statistics.wait_max : 0.029838 > [ 386.320073] .se->statistics.wait_sum : 0.036460 > [ 386.320076] .se->statistics.wait_count : 32 > [ 386.320079] .se->load.weight : 1024 > [ 386.320083] > [ 386.320083] cfs_rq[0]:/ > [ 386.320087] .exec_clock : 23614.135229 > [ 386.320090] .MIN_vruntime : 23465.802624 > [ 386.320094] .min_vruntime : 23467.274189 > [ 386.320097] .max_vruntime : 23465.802624 > [ 386.320100] .spread : 0.000000 > [ 386.320103] .spread0 : 0.000000 > [ 386.320106] .nr_spread_over : 117 > [ 386.320109] .nr_running : 2 > [ 386.320112] .load : 2048 > [ 386.320115] .load_avg : 0.000000 > [ 386.320118] .load_period : 0.000000 > [ 386.320121] .load_contrib : 0 > [ 386.320123] .load_tg : 0 > [ 386.320131] > [ 386.320131] rt_rq[0]:/ > [ 386.320135] .rt_nr_running : 0 > [ 386.320137] .rt_throttled : 0 > [ 386.320140] .rt_time : 0.000000 > [ 386.320144] .rt_runtime : 950.000000 > [ 386.320148] > [ 386.320148] runnable tasks: > [ 386.320148] task PID tree-key switches prio exec-runtime sum-exec sum-sleep > [ 386.320148] ---------------------------------------------------------------------------------------------------------- > [ 386.320173] kworker/0:6 3413 23465.802624 9246 120 23465.802624 187.949272 352670.165012 / > [ 386.320190] R bash 3590 13.529500 34 120 13.529500 16.465746 17470.132476 /autogroup-47 > [ 386.320203] > [ 386.320203] cpu#1, 3013.515 MHz > [ 386.320207] .nr_running : 0 > [ 386.320210] .load : 0 > [ 386.320213] .nr_switches : 291294 > [ 386.320216] .nr_load_updates : 35844 > [ 386.320219] .nr_uninterruptible : 1 > [ 386.320222] .next_balance : 4294.988982 > [ 386.320225] .curr->pid : 0 > [ 386.320228] .clock : 386317.758699 > [ 386.320231] .cpu_load[0] : 0 > [ 386.320234] .cpu_load[1] : 0 > [ 386.320237] .cpu_load[2] : 0 > [ 386.320239] .cpu_load[3] : 0 > [ 386.320242] .cpu_load[4] : 2 > [ 386.320245] .yld_count : 0 > [ 386.320248] .sched_count : 311183 > [ 386.320251] .sched_goidle : 130944 > [ 386.320253] .avg_idle : 1000000 > [ 386.320256] .ttwu_count : 186513 > [ 386.320259] .ttwu_local : 143360 > [ 386.320263] > [ 386.320263] cfs_rq[1]:/autogroup-46 > [ 386.320268] .exec_clock : 64.885545 > [ 386.320271] .MIN_vruntime : 0.000001 > [ 386.320274] .min_vruntime : 63.581650 > [ 386.320278] .max_vruntime : 0.000001 > [ 386.320281] .spread : 0.000000 > [ 386.320284] .spread0 : -23403.692539 > [ 386.320287] .nr_spread_over : 1 > [ 386.320290] .nr_running : 0 > [ 386.320292] .load : 0 > [ 386.320296] .load_avg : 5119.999488 > [ 386.320299] .load_period : 5.010586 > [ 386.320302] .load_contrib : 1021 > [ 386.320305] .load_tg : 1021 > [ 386.320309] .se->exec_start : 386317.737525 > [ 386.320312] .se->vruntime : 16960.176657 > [ 386.320315] .se->sum_exec_runtime : 64.885545 > [ 386.320319] .se->statistics.wait_start : 0.000000 > [ 386.320322] .se->statistics.sleep_start : 0.000000 > [ 386.320325] .se->statistics.block_start : 0.000000 > [ 386.320328] .se->statistics.sleep_max : 0.000000 > [ 386.320331] .se->statistics.block_max : 0.000000 > [ 386.320335] .se->statistics.exec_max : 3.994871 > [ 386.320338] .se->statistics.slice_max : 0.000000 > [ 386.320341] .se->statistics.wait_max : 4.113969 > [ 386.320344] .se->statistics.wait_sum : 4.372857 > [ 386.320347] .se->statistics.wait_count : 93 > [ 386.320350] .se->load.weight : 2 > [ 386.320353] > [ 386.320353] cfs_rq[1]:/ > [ 386.320357] .exec_clock : 18869.698887 > [ 386.320360] .MIN_vruntime : 0.000001 > [ 386.320364] .min_vruntime : 16966.057472 > [ 386.320367] .max_vruntime : 0.000001 > [ 386.320370] .spread : 0.000000 > [ 386.320373] .spread0 : -6501.216717 > [ 386.320376] .nr_spread_over : 73 > [ 386.320379] .nr_running : 0 > [ 386.320382] .load : 0 > [ 386.320385] .load_avg : 0.000000 > [ 386.320388] .load_period : 0.000000 > [ 386.320390] .load_contrib : 0 > [ 386.320393] .load_tg : 0 > [ 386.320397] > [ 386.320397] rt_rq[1]:/ > [ 386.320401] .rt_nr_running : 0 > [ 386.320404] .rt_throttled : 0 > [ 386.320407] .rt_time : 0.000000 > [ 386.320410] .rt_runtime : 950.000000 > [ 386.320413] > [ 386.320413] runnable tasks: > [ 386.320413] task PID tree-key switches prio exec-runtime sum-exec sum-sleep > [ 386.320413] ---------------------------------------------------------------------------------------------------------- > [ 386.320437] >