linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Roman Mamedov <rm@romanrm.ru>
To: Linux Btrfs <linux-btrfs@vger.kernel.org>
Subject: Deadlock in btrfs-cleaner, related to snapshot deletion
Date: Wed, 12 Sep 2012 13:53:15 +0600	[thread overview]
Message-ID: <20120912135315.32d670c2@natsu> (raw)

[-- Attachment #1: Type: text/plain, Size: 17803 bytes --]

Hello,

(this is a recap of yesterday's discussion on BTRFS IRC, also to save relevant pastes before pastebins expire)

I have my /home on btrfs; a cronjob makes one snapshot every 30 minutes; these
snapshots are kept for 24-48 hours, then deleted in batches.

This is a 16K Leaf/Node BTRFS on top of mdadm RAID1.

As system uptime approached 2 weeks, I started noticing that the free space on
that partition does not free up as it should assuming continuous clean-up of older snapshots.

I have tried a reboot, and it locked up unmounting the FS. After a hard
reset, soon after mounting the FS processes start to lock-up in D state, with
messages that a task is blocked for more than 120 seconds in dmesg.

I have upgraded to 3.6.0-rc5, and this is what I got via Alt-SysRq-W in this locked up state:

[  386.318062] SysRq : Show Blocked State
[  386.318141]   task                        PC stack   pid father
[  386.318188] btrfs-cleaner   D ffffffff8180d440     0  3506      2 0x00000000
[  386.318197]  ffff88040c34b6a0 0000000000000046 ffff88040c34b650 ffffffff81139f03
[  386.318205]  ffff88040c34bfd8 ffff88040c34bfd8 ffff88040c34bfd8 0000000000013dc0
[  386.318213]  ffffffff81c14440 ffff88040b3496d0 ffff88040c34b6b0 ffff880405a04408
[  386.318220] Call Trace:
[  386.318236]  [<ffffffff81139f03>] ? activate_page+0x83/0xa0
[  386.318245]  [<ffffffff8169d0e9>] schedule+0x29/0x70
[  386.318305]  [<ffffffffa011b29d>] btrfs_tree_lock+0xcd/0x250 [btrfs]
[  386.318314]  [<ffffffff8107ccd0>] ? add_wait_queue+0x60/0x60
[  386.318348]  [<ffffffffa00dc4f8>] btrfs_init_new_buffer+0x68/0x140 [btrfs]
[  386.318379]  [<ffffffffa00dc66f>] btrfs_alloc_free_block+0x9f/0x220 [btrfs]
[  386.318408]  [<ffffffffa00c83b2>] __btrfs_cow_block+0x122/0x4f0 [btrfs]
[  386.318415]  [<ffffffff8169e00e>] ? _raw_spin_lock+0xe/0x20
[  386.318443]  [<ffffffffa00c887c>] btrfs_cow_block+0xfc/0x220 [btrfs]
[  386.318472]  [<ffffffffa00cc9f8>] btrfs_search_slot+0x368/0x740 [btrfs]
[  386.318502]  [<ffffffffa00d32de>] lookup_inline_extent_backref+0x8e/0x4c0 [btrfs]
[  386.318532]  [<ffffffffa00d3770>] lookup_extent_backref+0x60/0xf0 [btrfs]
[  386.318561]  [<ffffffffa00d5c55>] __btrfs_free_extent+0xb5/0x670 [btrfs]
[  386.318592]  [<ffffffffa00d6324>] run_delayed_tree_ref+0x114/0x190 [btrfs]
[  386.318623]  [<ffffffffa00dac2e>] run_one_delayed_ref+0xde/0xf0 [btrfs]
[  386.318654]  [<ffffffffa00dad76>] run_clustered_refs+0x136/0x3d0 [btrfs]
[  386.318685]  [<ffffffffa00db100>] btrfs_run_delayed_refs+0xf0/0x2d0 [btrfs]
[  386.318721]  [<ffffffffa00ecf99>] __btrfs_end_transaction+0xf9/0x3c0 [btrfs]
[  386.318753]  [<ffffffffa00dcd35>] ? btrfs_drop_snapshot+0x545/0x600 [btrfs]
[  386.318789]  [<ffffffffa00ed2b8>] btrfs_end_transaction_throttle+0x18/0x20 [btrfs]
[  386.318820]  [<ffffffffa00dca03>] btrfs_drop_snapshot+0x213/0x600 [btrfs]
[  386.318857]  [<ffffffffa00edae7>] btrfs_clean_old_snapshots+0xa7/0x150 [btrfs]
[  386.318890]  [<ffffffffa00e57d9>] cleaner_kthread+0xa9/0x120 [btrfs]
[  386.318924]  [<ffffffffa00e5730>] ? btree_invalidatepage+0x80/0x80 [btrfs]
[  386.318931]  [<ffffffff8107c3f3>] kthread+0x93/0xa0
[  386.318939]  [<ffffffff816a7a84>] kernel_thread_helper+0x4/0x10
[  386.318946]  [<ffffffff8107c360>] ? flush_kthread_worker+0xb0/0xb0
[  386.318952]  [<ffffffff816a7a80>] ? gs_change+0x13/0x13
[  386.318956] btrfs-transacti D 0000000000000000     0  3507      2 0x00000000
[  386.318963]  ffff88040aa15920 0000000000000046 0000000000000003 0000000000025250
[  386.318971]  ffff88040aa15fd8 ffff88040aa15fd8 ffff88040aa15fd8 0000000000013dc0
[  386.318977]  ffff88040a7fada0 ffff88040b34c470 ffff88040aa15930 ffff8803c7798ac0
[  386.318984] Call Trace:
[  386.318991]  [<ffffffff8169d0e9>] schedule+0x29/0x70
[  386.319031]  [<ffffffffa011b29d>] btrfs_tree_lock+0xcd/0x250 [btrfs]
[  386.319039]  [<ffffffff8107ccd0>] ? add_wait_queue+0x60/0x60
[  386.319068]  [<ffffffffa00ccc81>] btrfs_search_slot+0x5f1/0x740 [btrfs]
[  386.319098]  [<ffffffffa00d32de>] lookup_inline_extent_backref+0x8e/0x4c0 [btrfs]
[  386.319128]  [<ffffffffa00d3770>] lookup_extent_backref+0x60/0xf0 [btrfs]
[  386.319157]  [<ffffffffa00d5c55>] __btrfs_free_extent+0xb5/0x670 [btrfs]
[  386.319188]  [<ffffffffa00d6324>] run_delayed_tree_ref+0x114/0x190 [btrfs]
[  386.319219]  [<ffffffffa00dac2e>] run_one_delayed_ref+0xde/0xf0 [btrfs]
[  386.319250]  [<ffffffffa00dad76>] run_clustered_refs+0x136/0x3d0 [btrfs]
[  386.319281]  [<ffffffffa00db100>] btrfs_run_delayed_refs+0xf0/0x2d0 [btrfs]
[  386.319317]  [<ffffffffa00ec555>] btrfs_commit_transaction+0xa5/0x9c0 [btrfs]
[  386.319325]  [<ffffffff8107ccd0>] ? add_wait_queue+0x60/0x60
[  386.319359]  [<ffffffffa00e6a65>] transaction_kthread+0x1b5/0x230 [btrfs]
[  386.319393]  [<ffffffffa00e68b0>] ? btrfs_destroy_delayed_refs.isra.103+0x210/0x210 [btrfs]
[  386.319400]  [<ffffffff8107c3f3>] kthread+0x93/0xa0
[  386.319407]  [<ffffffff816a7a84>] kernel_thread_helper+0x4/0x10
[  386.319414]  [<ffffffff8107c360>] ? flush_kthread_worker+0xb0/0xb0
[  386.319421]  [<ffffffff816a7a80>] ? gs_change+0x13/0x13
[  386.319424] flush-btrfs-3   D 0000000000000000     0  3520      2 0x00000000
[  386.319431]  ffff8803c181b9d0 0000000000000046 ffff8803c181b970 ffff880405ab2d98
[  386.319437]  ffff8803c181bfd8 ffff8803c181bfd8 ffff8803c181bfd8 0000000000013dc0
[  386.319444]  ffff88040a7fada0 ffff8804014316d0 ffff8803c181b9e0 ffff8803d870e2b0
[  386.319450] Call Trace:
[  386.319457]  [<ffffffff8169d0e9>] schedule+0x29/0x70
[  386.319496]  [<ffffffffa011b29d>] btrfs_tree_lock+0xcd/0x250 [btrfs]
[  386.319503]  [<ffffffff8107ccd0>] ? add_wait_queue+0x60/0x60
[  386.319542]  [<ffffffffa0102873>] lock_extent_buffer_for_io+0x103/0x260 [btrfs]
[  386.319581]  [<ffffffffa010862c>] btree_write_cache_pages+0x21c/0x410 [btrfs]
[  386.319619]  [<ffffffffa00f1e40>] ? can_nocow_odirect+0x2b0/0x2b0 [btrfs]
[  386.319654]  [<ffffffffa00e4a6b>] btree_writepages+0x3b/0x40 [btrfs]
[  386.319661]  [<ffffffff81138450>] do_writepages+0x20/0x40
[  386.319670]  [<ffffffff811af12f>] __writeback_single_inode.isra.32+0x3f/0x190
[  386.319678]  [<ffffffff811af850>] writeback_sb_inodes+0x1a0/0x350
[  386.319685]  [<ffffffff811afbbf>] wb_writeback+0xef/0x340
[  386.319693]  [<ffffffff811b1468>] wb_do_writeback+0xb8/0x1d0
[  386.319701]  [<ffffffff81067c20>] ? usleep_range+0x50/0x50
[  386.319709]  [<ffffffff811b160b>] bdi_writeback_thread+0x8b/0x290
[  386.319717]  [<ffffffff811b1580>] ? wb_do_writeback+0x1d0/0x1d0
[  386.319723]  [<ffffffff8107c3f3>] kthread+0x93/0xa0
[  386.319730]  [<ffffffff816a7a84>] kernel_thread_helper+0x4/0x10
[  386.319738]  [<ffffffff8107c360>] ? flush_kthread_worker+0xb0/0xb0
[  386.319744]  [<ffffffff816a7a80>] ? gs_change+0x13/0x13
[  386.319748] sync            D ffffffff8180d440     0  3585   3570 0x00000000
[  386.319754]  ffff8803dacb1cf8 0000000000000082 ffff8803dacb1cd8 ffffffff81094699
[  386.319761]  ffff8803dacb1fd8 ffff8803dacb1fd8 ffff8803dacb1fd8 0000000000013dc0
[  386.319768]  ffff88040d92db40 ffff88040bc14470 0000000000013dc0 7fffffffffffffff
[  386.319774] Call Trace:
[  386.319783]  [<ffffffff81094699>] ? check_preempt_wakeup+0x1c9/0x270
[  386.319789]  [<ffffffff8169d0e9>] schedule+0x29/0x70
[  386.319796]  [<ffffffff8169b82d>] schedule_timeout+0x1fd/0x260
[  386.319803]  [<ffffffff8108c1a6>] ? ttwu_do_activate.constprop.86+0x66/0x70
[  386.319809]  [<ffffffff8108c286>] ? ttwu_queue+0xd6/0xf0
[  386.319815]  [<ffffffff8169cf3f>] wait_for_common+0xdf/0x180
[  386.319822]  [<ffffffff811b4fc0>] ? fdatawrite_one_bdev+0x20/0x20
[  386.319828]  [<ffffffff8108eec0>] ? try_to_wake_up+0x200/0x200
[  386.319834]  [<ffffffff811b4fc0>] ? fdatawrite_one_bdev+0x20/0x20
[  386.319840]  [<ffffffff8169d0bd>] wait_for_completion+0x1d/0x20
[  386.319847]  [<ffffffff811ad907>] sync_inodes_sb+0x97/0xd0
[  386.319854]  [<ffffffff811b4fdd>] sync_inodes_one_sb+0x1d/0x30
[  386.319862]  [<ffffffff811894a1>] iterate_supers+0x101/0x110
[  386.319867]  [<ffffffff811b5145>] sys_sync+0x35/0x90
[  386.319874]  [<ffffffff816a686d>] system_call_fastpath+0x1a/0x1f
[  386.319881] Sched Debug Version: v0.10, 3.6.0-030600rc5-generic #201209082035
[  386.319885] ktime                                   : 386839.691480
[  386.319889] sched_clk                               : 386319.878549
[  386.319893] cpu_clk                                 : 386319.878681
[  386.319896] jiffies                                 : 4294989005
[  386.319899] sched_clock_stable                      : 1
[  386.319901] 
[  386.319903] sysctl_sched
[  386.319907]   .sysctl_sched_latency                    : 12.000000
[  386.319911]   .sysctl_sched_min_granularity            : 1.500000
[  386.319914]   .sysctl_sched_wakeup_granularity         : 2.000000
[  386.319917]   .sysctl_sched_child_runs_first           : 0
[  386.319920]   .sysctl_sched_features                   : 24119
[  386.319924]   .sysctl_sched_tunable_scaling            : 1 (logaritmic)
[  386.319929] 
[  386.319929] cpu#0, 3013.515 MHz
[  386.319933]   .nr_running                    : 2
[  386.319936]   .load                          : 2048
[  386.319939]   .nr_switches                   : 437801
[  386.319942]   .nr_load_updates               : 22334
[  386.319945]   .nr_uninterruptible            : 3
[  386.319948]   .next_balance                  : 4294.988996
[  386.319951]   .curr->pid                     : 3590
[  386.319954]   .clock                         : 386317.815742
[  386.319957]   .cpu_load[0]                   : 0
[  386.319960]   .cpu_load[1]                   : 0
[  386.319962]   .cpu_load[2]                   : 0
[  386.319965]   .cpu_load[3]                   : 0
[  386.319967]   .cpu_load[4]                   : 0
[  386.319970]   .yld_count                     : 0
[  386.319973]   .sched_count                   : 445110
[  386.319976]   .sched_goidle                  : 198003
[  386.319979]   .avg_idle                      : 1000000
[  386.319982]   .ttwu_count                    : 184667
[  386.319985]   .ttwu_local                    : 178809
[  386.319990] 
[  386.319990] cfs_rq[0]:/autogroup-47
[  386.319994]   .exec_clock                    : 13.273210
[  386.319998]   .MIN_vruntime                  : 0.000001
[  386.320001]   .min_vruntime                  : 13.529500
[  386.320005]   .max_vruntime                  : 0.000001
[  386.320008]   .spread                        : 0.000000
[  386.320011]   .spread0                       : -23453.744689
[  386.320015]   .nr_spread_over                : 0
[  386.320018]   .nr_running                    : 1
[  386.320020]   .load                          : 1024
[  386.320024]   .load_avg                      : 0.000000
[  386.320027]   .load_period                   : 9.999999
[  386.320030]   .load_contrib                  : 0
[  386.320033]   .load_tg                       : 0
[  386.320037]   .se->exec_start                : 386317.815742
[  386.320040]   .se->vruntime                  : 23467.274189
[  386.320044]   .se->sum_exec_runtime          : 13.339940
[  386.320047]   .se->statistics.wait_start     : 0.000000
[  386.320051]   .se->statistics.sleep_start    : 0.000000
[  386.320054]   .se->statistics.block_start    : 0.000000
[  386.320057]   .se->statistics.sleep_max      : 0.000000
[  386.320060]   .se->statistics.block_max      : 0.000000
[  386.320063]   .se->statistics.exec_max       : 1.952409
[  386.320067]   .se->statistics.slice_max      : 0.000000
[  386.320070]   .se->statistics.wait_max       : 0.029838
[  386.320073]   .se->statistics.wait_sum       : 0.036460
[  386.320076]   .se->statistics.wait_count     : 32
[  386.320079]   .se->load.weight               : 1024
[  386.320083] 
[  386.320083] cfs_rq[0]:/
[  386.320087]   .exec_clock                    : 23614.135229
[  386.320090]   .MIN_vruntime                  : 23465.802624
[  386.320094]   .min_vruntime                  : 23467.274189
[  386.320097]   .max_vruntime                  : 23465.802624
[  386.320100]   .spread                        : 0.000000
[  386.320103]   .spread0                       : 0.000000
[  386.320106]   .nr_spread_over                : 117
[  386.320109]   .nr_running                    : 2
[  386.320112]   .load                          : 2048
[  386.320115]   .load_avg                      : 0.000000
[  386.320118]   .load_period                   : 0.000000
[  386.320121]   .load_contrib                  : 0
[  386.320123]   .load_tg                       : 0
[  386.320131] 
[  386.320131] rt_rq[0]:/
[  386.320135]   .rt_nr_running                 : 0
[  386.320137]   .rt_throttled                  : 0
[  386.320140]   .rt_time                       : 0.000000
[  386.320144]   .rt_runtime                    : 950.000000
[  386.320148] 
[  386.320148] runnable tasks:
[  386.320148]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[  386.320148] ----------------------------------------------------------------------------------------------------------
[  386.320173]      kworker/0:6  3413     23465.802624      9246   120     23465.802624       187.949272    352670.165012 /
[  386.320190] R           bash  3590        13.529500        34   120        13.529500        16.465746     17470.132476 /autogroup-47
[  386.320203] 
[  386.320203] cpu#1, 3013.515 MHz
[  386.320207]   .nr_running                    : 0
[  386.320210]   .load                          : 0
[  386.320213]   .nr_switches                   : 291294
[  386.320216]   .nr_load_updates               : 35844
[  386.320219]   .nr_uninterruptible            : 1
[  386.320222]   .next_balance                  : 4294.988982
[  386.320225]   .curr->pid                     : 0
[  386.320228]   .clock                         : 386317.758699
[  386.320231]   .cpu_load[0]                   : 0
[  386.320234]   .cpu_load[1]                   : 0
[  386.320237]   .cpu_load[2]                   : 0
[  386.320239]   .cpu_load[3]                   : 0
[  386.320242]   .cpu_load[4]                   : 2
[  386.320245]   .yld_count                     : 0
[  386.320248]   .sched_count                   : 311183
[  386.320251]   .sched_goidle                  : 130944
[  386.320253]   .avg_idle                      : 1000000
[  386.320256]   .ttwu_count                    : 186513
[  386.320259]   .ttwu_local                    : 143360
[  386.320263] 
[  386.320263] cfs_rq[1]:/autogroup-46
[  386.320268]   .exec_clock                    : 64.885545
[  386.320271]   .MIN_vruntime                  : 0.000001
[  386.320274]   .min_vruntime                  : 63.581650
[  386.320278]   .max_vruntime                  : 0.000001
[  386.320281]   .spread                        : 0.000000
[  386.320284]   .spread0                       : -23403.692539
[  386.320287]   .nr_spread_over                : 1
[  386.320290]   .nr_running                    : 0
[  386.320292]   .load                          : 0
[  386.320296]   .load_avg                      : 5119.999488
[  386.320299]   .load_period                   : 5.010586
[  386.320302]   .load_contrib                  : 1021
[  386.320305]   .load_tg                       : 1021
[  386.320309]   .se->exec_start                : 386317.737525
[  386.320312]   .se->vruntime                  : 16960.176657
[  386.320315]   .se->sum_exec_runtime          : 64.885545
[  386.320319]   .se->statistics.wait_start     : 0.000000
[  386.320322]   .se->statistics.sleep_start    : 0.000000
[  386.320325]   .se->statistics.block_start    : 0.000000
[  386.320328]   .se->statistics.sleep_max      : 0.000000
[  386.320331]   .se->statistics.block_max      : 0.000000
[  386.320335]   .se->statistics.exec_max       : 3.994871
[  386.320338]   .se->statistics.slice_max      : 0.000000
[  386.320341]   .se->statistics.wait_max       : 4.113969
[  386.320344]   .se->statistics.wait_sum       : 4.372857
[  386.320347]   .se->statistics.wait_count     : 93
[  386.320350]   .se->load.weight               : 2
[  386.320353] 
[  386.320353] cfs_rq[1]:/
[  386.320357]   .exec_clock                    : 18869.698887
[  386.320360]   .MIN_vruntime                  : 0.000001
[  386.320364]   .min_vruntime                  : 16966.057472
[  386.320367]   .max_vruntime                  : 0.000001
[  386.320370]   .spread                        : 0.000000
[  386.320373]   .spread0                       : -6501.216717
[  386.320376]   .nr_spread_over                : 73
[  386.320379]   .nr_running                    : 0
[  386.320382]   .load                          : 0
[  386.320385]   .load_avg                      : 0.000000
[  386.320388]   .load_period                   : 0.000000
[  386.320390]   .load_contrib                  : 0
[  386.320393]   .load_tg                       : 0
[  386.320397] 
[  386.320397] rt_rq[1]:/
[  386.320401]   .rt_nr_running                 : 0
[  386.320404]   .rt_throttled                  : 0
[  386.320407]   .rt_time                       : 0.000000
[  386.320410]   .rt_runtime                    : 950.000000
[  386.320413] 
[  386.320413] runnable tasks:
[  386.320413]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[  386.320413] ----------------------------------------------------------------------------------------------------------
[  386.320437]

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

             reply	other threads:[~2012-09-12  7:53 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-12  7:53 Roman Mamedov [this message]
2012-09-12  9:46 ` Deadlock in btrfs-cleaner, related to snapshot deletion Liu Bo
2012-09-12 21:23   ` Roman Mamedov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120912135315.32d670c2@natsu \
    --to=rm@romanrm.ru \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).