From: Liu Bo <bo.li.liu@oracle.com>
To: Roman Mamedov <rm@romanrm.ru>
Cc: Linux Btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Deadlock in btrfs-cleaner, related to snapshot deletion
Date: Wed, 12 Sep 2012 17:46:12 +0800 [thread overview]
Message-ID: <505059E4.1080903@oracle.com> (raw)
In-Reply-To: <20120912135315.32d670c2@natsu>
On 09/12/2012 03:53 PM, Roman Mamedov wrote:
> Hello,
>
> (this is a recap of yesterday's discussion on BTRFS IRC, also to save relevant pastes before pastebins expire)
>
> I have my /home on btrfs; a cronjob makes one snapshot every 30 minutes; these
> snapshots are kept for 24-48 hours, then deleted in batches.
>
> This is a 16K Leaf/Node BTRFS on top of mdadm RAID1.
>
> As system uptime approached 2 weeks, I started noticing that the free space on
> that partition does not free up as it should assuming continuous clean-up of older snapshots.
>
> I have tried a reboot, and it locked up unmounting the FS. After a hard
> reset, soon after mounting the FS processes start to lock-up in D state, with
> messages that a task is blocked for more than 120 seconds in dmesg.
>
> I have upgraded to 3.6.0-rc5, and this is what I got via Alt-SysRq-W in this locked up state:
>
I try to figure out where goes wrong, but the listed stacks seems to be not enough for solving the deadlock,
so could you please use a lockdep debug kernel and post the result here?
thanks,
liubo
> [ 386.318062] SysRq : Show Blocked State
> [ 386.318141] task PC stack pid father
> [ 386.318188] btrfs-cleaner D ffffffff8180d440 0 3506 2 0x00000000
> [ 386.318197] ffff88040c34b6a0 0000000000000046 ffff88040c34b650 ffffffff81139f03
> [ 386.318205] ffff88040c34bfd8 ffff88040c34bfd8 ffff88040c34bfd8 0000000000013dc0
> [ 386.318213] ffffffff81c14440 ffff88040b3496d0 ffff88040c34b6b0 ffff880405a04408
> [ 386.318220] Call Trace:
> [ 386.318236] [<ffffffff81139f03>] ? activate_page+0x83/0xa0
> [ 386.318245] [<ffffffff8169d0e9>] schedule+0x29/0x70
> [ 386.318305] [<ffffffffa011b29d>] btrfs_tree_lock+0xcd/0x250 [btrfs]
> [ 386.318314] [<ffffffff8107ccd0>] ? add_wait_queue+0x60/0x60
> [ 386.318348] [<ffffffffa00dc4f8>] btrfs_init_new_buffer+0x68/0x140 [btrfs]
> [ 386.318379] [<ffffffffa00dc66f>] btrfs_alloc_free_block+0x9f/0x220 [btrfs]
> [ 386.318408] [<ffffffffa00c83b2>] __btrfs_cow_block+0x122/0x4f0 [btrfs]
> [ 386.318415] [<ffffffff8169e00e>] ? _raw_spin_lock+0xe/0x20
> [ 386.318443] [<ffffffffa00c887c>] btrfs_cow_block+0xfc/0x220 [btrfs]
> [ 386.318472] [<ffffffffa00cc9f8>] btrfs_search_slot+0x368/0x740 [btrfs]
> [ 386.318502] [<ffffffffa00d32de>] lookup_inline_extent_backref+0x8e/0x4c0 [btrfs]
> [ 386.318532] [<ffffffffa00d3770>] lookup_extent_backref+0x60/0xf0 [btrfs]
> [ 386.318561] [<ffffffffa00d5c55>] __btrfs_free_extent+0xb5/0x670 [btrfs]
> [ 386.318592] [<ffffffffa00d6324>] run_delayed_tree_ref+0x114/0x190 [btrfs]
> [ 386.318623] [<ffffffffa00dac2e>] run_one_delayed_ref+0xde/0xf0 [btrfs]
> [ 386.318654] [<ffffffffa00dad76>] run_clustered_refs+0x136/0x3d0 [btrfs]
> [ 386.318685] [<ffffffffa00db100>] btrfs_run_delayed_refs+0xf0/0x2d0 [btrfs]
> [ 386.318721] [<ffffffffa00ecf99>] __btrfs_end_transaction+0xf9/0x3c0 [btrfs]
> [ 386.318753] [<ffffffffa00dcd35>] ? btrfs_drop_snapshot+0x545/0x600 [btrfs]
> [ 386.318789] [<ffffffffa00ed2b8>] btrfs_end_transaction_throttle+0x18/0x20 [btrfs]
> [ 386.318820] [<ffffffffa00dca03>] btrfs_drop_snapshot+0x213/0x600 [btrfs]
> [ 386.318857] [<ffffffffa00edae7>] btrfs_clean_old_snapshots+0xa7/0x150 [btrfs]
> [ 386.318890] [<ffffffffa00e57d9>] cleaner_kthread+0xa9/0x120 [btrfs]
> [ 386.318924] [<ffffffffa00e5730>] ? btree_invalidatepage+0x80/0x80 [btrfs]
> [ 386.318931] [<ffffffff8107c3f3>] kthread+0x93/0xa0
> [ 386.318939] [<ffffffff816a7a84>] kernel_thread_helper+0x4/0x10
> [ 386.318946] [<ffffffff8107c360>] ? flush_kthread_worker+0xb0/0xb0
> [ 386.318952] [<ffffffff816a7a80>] ? gs_change+0x13/0x13
> [ 386.318956] btrfs-transacti D 0000000000000000 0 3507 2 0x00000000
> [ 386.318963] ffff88040aa15920 0000000000000046 0000000000000003 0000000000025250
> [ 386.318971] ffff88040aa15fd8 ffff88040aa15fd8 ffff88040aa15fd8 0000000000013dc0
> [ 386.318977] ffff88040a7fada0 ffff88040b34c470 ffff88040aa15930 ffff8803c7798ac0
> [ 386.318984] Call Trace:
> [ 386.318991] [<ffffffff8169d0e9>] schedule+0x29/0x70
> [ 386.319031] [<ffffffffa011b29d>] btrfs_tree_lock+0xcd/0x250 [btrfs]
> [ 386.319039] [<ffffffff8107ccd0>] ? add_wait_queue+0x60/0x60
> [ 386.319068] [<ffffffffa00ccc81>] btrfs_search_slot+0x5f1/0x740 [btrfs]
> [ 386.319098] [<ffffffffa00d32de>] lookup_inline_extent_backref+0x8e/0x4c0 [btrfs]
> [ 386.319128] [<ffffffffa00d3770>] lookup_extent_backref+0x60/0xf0 [btrfs]
> [ 386.319157] [<ffffffffa00d5c55>] __btrfs_free_extent+0xb5/0x670 [btrfs]
> [ 386.319188] [<ffffffffa00d6324>] run_delayed_tree_ref+0x114/0x190 [btrfs]
> [ 386.319219] [<ffffffffa00dac2e>] run_one_delayed_ref+0xde/0xf0 [btrfs]
> [ 386.319250] [<ffffffffa00dad76>] run_clustered_refs+0x136/0x3d0 [btrfs]
> [ 386.319281] [<ffffffffa00db100>] btrfs_run_delayed_refs+0xf0/0x2d0 [btrfs]
> [ 386.319317] [<ffffffffa00ec555>] btrfs_commit_transaction+0xa5/0x9c0 [btrfs]
> [ 386.319325] [<ffffffff8107ccd0>] ? add_wait_queue+0x60/0x60
> [ 386.319359] [<ffffffffa00e6a65>] transaction_kthread+0x1b5/0x230 [btrfs]
> [ 386.319393] [<ffffffffa00e68b0>] ? btrfs_destroy_delayed_refs.isra.103+0x210/0x210 [btrfs]
> [ 386.319400] [<ffffffff8107c3f3>] kthread+0x93/0xa0
> [ 386.319407] [<ffffffff816a7a84>] kernel_thread_helper+0x4/0x10
> [ 386.319414] [<ffffffff8107c360>] ? flush_kthread_worker+0xb0/0xb0
> [ 386.319421] [<ffffffff816a7a80>] ? gs_change+0x13/0x13
> [ 386.319424] flush-btrfs-3 D 0000000000000000 0 3520 2 0x00000000
> [ 386.319431] ffff8803c181b9d0 0000000000000046 ffff8803c181b970 ffff880405ab2d98
> [ 386.319437] ffff8803c181bfd8 ffff8803c181bfd8 ffff8803c181bfd8 0000000000013dc0
> [ 386.319444] ffff88040a7fada0 ffff8804014316d0 ffff8803c181b9e0 ffff8803d870e2b0
> [ 386.319450] Call Trace:
> [ 386.319457] [<ffffffff8169d0e9>] schedule+0x29/0x70
> [ 386.319496] [<ffffffffa011b29d>] btrfs_tree_lock+0xcd/0x250 [btrfs]
> [ 386.319503] [<ffffffff8107ccd0>] ? add_wait_queue+0x60/0x60
> [ 386.319542] [<ffffffffa0102873>] lock_extent_buffer_for_io+0x103/0x260 [btrfs]
> [ 386.319581] [<ffffffffa010862c>] btree_write_cache_pages+0x21c/0x410 [btrfs]
> [ 386.319619] [<ffffffffa00f1e40>] ? can_nocow_odirect+0x2b0/0x2b0 [btrfs]
> [ 386.319654] [<ffffffffa00e4a6b>] btree_writepages+0x3b/0x40 [btrfs]
> [ 386.319661] [<ffffffff81138450>] do_writepages+0x20/0x40
> [ 386.319670] [<ffffffff811af12f>] __writeback_single_inode.isra.32+0x3f/0x190
> [ 386.319678] [<ffffffff811af850>] writeback_sb_inodes+0x1a0/0x350
> [ 386.319685] [<ffffffff811afbbf>] wb_writeback+0xef/0x340
> [ 386.319693] [<ffffffff811b1468>] wb_do_writeback+0xb8/0x1d0
> [ 386.319701] [<ffffffff81067c20>] ? usleep_range+0x50/0x50
> [ 386.319709] [<ffffffff811b160b>] bdi_writeback_thread+0x8b/0x290
> [ 386.319717] [<ffffffff811b1580>] ? wb_do_writeback+0x1d0/0x1d0
> [ 386.319723] [<ffffffff8107c3f3>] kthread+0x93/0xa0
> [ 386.319730] [<ffffffff816a7a84>] kernel_thread_helper+0x4/0x10
> [ 386.319738] [<ffffffff8107c360>] ? flush_kthread_worker+0xb0/0xb0
> [ 386.319744] [<ffffffff816a7a80>] ? gs_change+0x13/0x13
> [ 386.319748] sync D ffffffff8180d440 0 3585 3570 0x00000000
> [ 386.319754] ffff8803dacb1cf8 0000000000000082 ffff8803dacb1cd8 ffffffff81094699
> [ 386.319761] ffff8803dacb1fd8 ffff8803dacb1fd8 ffff8803dacb1fd8 0000000000013dc0
> [ 386.319768] ffff88040d92db40 ffff88040bc14470 0000000000013dc0 7fffffffffffffff
> [ 386.319774] Call Trace:
> [ 386.319783] [<ffffffff81094699>] ? check_preempt_wakeup+0x1c9/0x270
> [ 386.319789] [<ffffffff8169d0e9>] schedule+0x29/0x70
> [ 386.319796] [<ffffffff8169b82d>] schedule_timeout+0x1fd/0x260
> [ 386.319803] [<ffffffff8108c1a6>] ? ttwu_do_activate.constprop.86+0x66/0x70
> [ 386.319809] [<ffffffff8108c286>] ? ttwu_queue+0xd6/0xf0
> [ 386.319815] [<ffffffff8169cf3f>] wait_for_common+0xdf/0x180
> [ 386.319822] [<ffffffff811b4fc0>] ? fdatawrite_one_bdev+0x20/0x20
> [ 386.319828] [<ffffffff8108eec0>] ? try_to_wake_up+0x200/0x200
> [ 386.319834] [<ffffffff811b4fc0>] ? fdatawrite_one_bdev+0x20/0x20
> [ 386.319840] [<ffffffff8169d0bd>] wait_for_completion+0x1d/0x20
> [ 386.319847] [<ffffffff811ad907>] sync_inodes_sb+0x97/0xd0
> [ 386.319854] [<ffffffff811b4fdd>] sync_inodes_one_sb+0x1d/0x30
> [ 386.319862] [<ffffffff811894a1>] iterate_supers+0x101/0x110
> [ 386.319867] [<ffffffff811b5145>] sys_sync+0x35/0x90
> [ 386.319874] [<ffffffff816a686d>] system_call_fastpath+0x1a/0x1f
> [ 386.319881] Sched Debug Version: v0.10, 3.6.0-030600rc5-generic #201209082035
> [ 386.319885] ktime : 386839.691480
> [ 386.319889] sched_clk : 386319.878549
> [ 386.319893] cpu_clk : 386319.878681
> [ 386.319896] jiffies : 4294989005
> [ 386.319899] sched_clock_stable : 1
> [ 386.319901]
> [ 386.319903] sysctl_sched
> [ 386.319907] .sysctl_sched_latency : 12.000000
> [ 386.319911] .sysctl_sched_min_granularity : 1.500000
> [ 386.319914] .sysctl_sched_wakeup_granularity : 2.000000
> [ 386.319917] .sysctl_sched_child_runs_first : 0
> [ 386.319920] .sysctl_sched_features : 24119
> [ 386.319924] .sysctl_sched_tunable_scaling : 1 (logaritmic)
> [ 386.319929]
> [ 386.319929] cpu#0, 3013.515 MHz
> [ 386.319933] .nr_running : 2
> [ 386.319936] .load : 2048
> [ 386.319939] .nr_switches : 437801
> [ 386.319942] .nr_load_updates : 22334
> [ 386.319945] .nr_uninterruptible : 3
> [ 386.319948] .next_balance : 4294.988996
> [ 386.319951] .curr->pid : 3590
> [ 386.319954] .clock : 386317.815742
> [ 386.319957] .cpu_load[0] : 0
> [ 386.319960] .cpu_load[1] : 0
> [ 386.319962] .cpu_load[2] : 0
> [ 386.319965] .cpu_load[3] : 0
> [ 386.319967] .cpu_load[4] : 0
> [ 386.319970] .yld_count : 0
> [ 386.319973] .sched_count : 445110
> [ 386.319976] .sched_goidle : 198003
> [ 386.319979] .avg_idle : 1000000
> [ 386.319982] .ttwu_count : 184667
> [ 386.319985] .ttwu_local : 178809
> [ 386.319990]
> [ 386.319990] cfs_rq[0]:/autogroup-47
> [ 386.319994] .exec_clock : 13.273210
> [ 386.319998] .MIN_vruntime : 0.000001
> [ 386.320001] .min_vruntime : 13.529500
> [ 386.320005] .max_vruntime : 0.000001
> [ 386.320008] .spread : 0.000000
> [ 386.320011] .spread0 : -23453.744689
> [ 386.320015] .nr_spread_over : 0
> [ 386.320018] .nr_running : 1
> [ 386.320020] .load : 1024
> [ 386.320024] .load_avg : 0.000000
> [ 386.320027] .load_period : 9.999999
> [ 386.320030] .load_contrib : 0
> [ 386.320033] .load_tg : 0
> [ 386.320037] .se->exec_start : 386317.815742
> [ 386.320040] .se->vruntime : 23467.274189
> [ 386.320044] .se->sum_exec_runtime : 13.339940
> [ 386.320047] .se->statistics.wait_start : 0.000000
> [ 386.320051] .se->statistics.sleep_start : 0.000000
> [ 386.320054] .se->statistics.block_start : 0.000000
> [ 386.320057] .se->statistics.sleep_max : 0.000000
> [ 386.320060] .se->statistics.block_max : 0.000000
> [ 386.320063] .se->statistics.exec_max : 1.952409
> [ 386.320067] .se->statistics.slice_max : 0.000000
> [ 386.320070] .se->statistics.wait_max : 0.029838
> [ 386.320073] .se->statistics.wait_sum : 0.036460
> [ 386.320076] .se->statistics.wait_count : 32
> [ 386.320079] .se->load.weight : 1024
> [ 386.320083]
> [ 386.320083] cfs_rq[0]:/
> [ 386.320087] .exec_clock : 23614.135229
> [ 386.320090] .MIN_vruntime : 23465.802624
> [ 386.320094] .min_vruntime : 23467.274189
> [ 386.320097] .max_vruntime : 23465.802624
> [ 386.320100] .spread : 0.000000
> [ 386.320103] .spread0 : 0.000000
> [ 386.320106] .nr_spread_over : 117
> [ 386.320109] .nr_running : 2
> [ 386.320112] .load : 2048
> [ 386.320115] .load_avg : 0.000000
> [ 386.320118] .load_period : 0.000000
> [ 386.320121] .load_contrib : 0
> [ 386.320123] .load_tg : 0
> [ 386.320131]
> [ 386.320131] rt_rq[0]:/
> [ 386.320135] .rt_nr_running : 0
> [ 386.320137] .rt_throttled : 0
> [ 386.320140] .rt_time : 0.000000
> [ 386.320144] .rt_runtime : 950.000000
> [ 386.320148]
> [ 386.320148] runnable tasks:
> [ 386.320148] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
> [ 386.320148] ----------------------------------------------------------------------------------------------------------
> [ 386.320173] kworker/0:6 3413 23465.802624 9246 120 23465.802624 187.949272 352670.165012 /
> [ 386.320190] R bash 3590 13.529500 34 120 13.529500 16.465746 17470.132476 /autogroup-47
> [ 386.320203]
> [ 386.320203] cpu#1, 3013.515 MHz
> [ 386.320207] .nr_running : 0
> [ 386.320210] .load : 0
> [ 386.320213] .nr_switches : 291294
> [ 386.320216] .nr_load_updates : 35844
> [ 386.320219] .nr_uninterruptible : 1
> [ 386.320222] .next_balance : 4294.988982
> [ 386.320225] .curr->pid : 0
> [ 386.320228] .clock : 386317.758699
> [ 386.320231] .cpu_load[0] : 0
> [ 386.320234] .cpu_load[1] : 0
> [ 386.320237] .cpu_load[2] : 0
> [ 386.320239] .cpu_load[3] : 0
> [ 386.320242] .cpu_load[4] : 2
> [ 386.320245] .yld_count : 0
> [ 386.320248] .sched_count : 311183
> [ 386.320251] .sched_goidle : 130944
> [ 386.320253] .avg_idle : 1000000
> [ 386.320256] .ttwu_count : 186513
> [ 386.320259] .ttwu_local : 143360
> [ 386.320263]
> [ 386.320263] cfs_rq[1]:/autogroup-46
> [ 386.320268] .exec_clock : 64.885545
> [ 386.320271] .MIN_vruntime : 0.000001
> [ 386.320274] .min_vruntime : 63.581650
> [ 386.320278] .max_vruntime : 0.000001
> [ 386.320281] .spread : 0.000000
> [ 386.320284] .spread0 : -23403.692539
> [ 386.320287] .nr_spread_over : 1
> [ 386.320290] .nr_running : 0
> [ 386.320292] .load : 0
> [ 386.320296] .load_avg : 5119.999488
> [ 386.320299] .load_period : 5.010586
> [ 386.320302] .load_contrib : 1021
> [ 386.320305] .load_tg : 1021
> [ 386.320309] .se->exec_start : 386317.737525
> [ 386.320312] .se->vruntime : 16960.176657
> [ 386.320315] .se->sum_exec_runtime : 64.885545
> [ 386.320319] .se->statistics.wait_start : 0.000000
> [ 386.320322] .se->statistics.sleep_start : 0.000000
> [ 386.320325] .se->statistics.block_start : 0.000000
> [ 386.320328] .se->statistics.sleep_max : 0.000000
> [ 386.320331] .se->statistics.block_max : 0.000000
> [ 386.320335] .se->statistics.exec_max : 3.994871
> [ 386.320338] .se->statistics.slice_max : 0.000000
> [ 386.320341] .se->statistics.wait_max : 4.113969
> [ 386.320344] .se->statistics.wait_sum : 4.372857
> [ 386.320347] .se->statistics.wait_count : 93
> [ 386.320350] .se->load.weight : 2
> [ 386.320353]
> [ 386.320353] cfs_rq[1]:/
> [ 386.320357] .exec_clock : 18869.698887
> [ 386.320360] .MIN_vruntime : 0.000001
> [ 386.320364] .min_vruntime : 16966.057472
> [ 386.320367] .max_vruntime : 0.000001
> [ 386.320370] .spread : 0.000000
> [ 386.320373] .spread0 : -6501.216717
> [ 386.320376] .nr_spread_over : 73
> [ 386.320379] .nr_running : 0
> [ 386.320382] .load : 0
> [ 386.320385] .load_avg : 0.000000
> [ 386.320388] .load_period : 0.000000
> [ 386.320390] .load_contrib : 0
> [ 386.320393] .load_tg : 0
> [ 386.320397]
> [ 386.320397] rt_rq[1]:/
> [ 386.320401] .rt_nr_running : 0
> [ 386.320404] .rt_throttled : 0
> [ 386.320407] .rt_time : 0.000000
> [ 386.320410] .rt_runtime : 950.000000
> [ 386.320413]
> [ 386.320413] runnable tasks:
> [ 386.320413] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
> [ 386.320413] ----------------------------------------------------------------------------------------------------------
> [ 386.320437]
>
next prev parent reply other threads:[~2012-09-12 9:46 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-12 7:53 Deadlock in btrfs-cleaner, related to snapshot deletion Roman Mamedov
2012-09-12 9:46 ` Liu Bo [this message]
2012-09-12 21:23 ` Roman Mamedov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=505059E4.1080903@oracle.com \
--to=bo.li.liu@oracle.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=rm@romanrm.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.