linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Martin Steigerwald <Martin@lichtvoll.de>
To: Chris Mason <clm@fb.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: BTRFS hang with 3.16-rc5
Date: Sat, 19 Jul 2014 19:59:56 +0200	[thread overview]
Message-ID: <3378728.FvZP3jOPGh@merkaba> (raw)
In-Reply-To: <53C922C6.1020406@fb.com>

Am Freitag, 18. Juli 2014, 09:36:06 schrieb Chris Mason:
> On 07/18/2014 03:51 AM, Martin Steigerwald wrote:
> > Am Dienstag, 15. Juli 2014, 09:21:40 schrieb Chris Mason:
> >> On 07/14/2014 05:58 PM, Martin Steigerwald wrote:
> >>> Am Montag, 14. Juli 2014, 16:12:22 schrieb Chris Mason:
> >>>> On 07/14/2014 11:10 AM, Martin Steigerwald wrote:
> >>>>> Am Montag, 14. Juli 2014, 17:04:22 schrieben Sie:
> >>>>>> Hi!
> >>>>>> 
> >>>>>> While with 3.16-rc3 and rc4 I didn´t have a BTRFS hang in several
> >>>>>> days
> >>>>>> of
> >>>>>> usage, with 3-16-rc5 I had a hang again. Less than a hour since
> >>>>>> booting
> >>>>>> it.
> >>>>>> 
> >>>>>> Since the hang bug I and others had with 3.15 and upto 3.16-rc2
> >>>>>> usually
> >>>>>> didn´t happen that quickly after boot and since backtrace looks a bit
> >>>>>> different from what I have in memory, I post this in a new thread.
> >>>>>> See thread "Blocked tasks on 3.15.1" for a discussion of previous
> >>>>>> hang
> >>>>>> issues.
> >>>>> 
> >>>>> Probably good to add some basic information on the filesystem:
> >>>> Do you have compression enabled?  I wasn't able to nail down the 3.15.1
> >>>> hang before vacation attacked me, but I'm hoping to track it down
> >>>> today.
> >>> 
> >>> Yes. I have.
> >>> 
> >>> It just hung again while I was playing PlaneShift.
> >>> 
> >>> Back to 3.16-rc4 as rc5 seems to be broke here.
> >> 
> >> The btrfs hang you're hitting goes back to 3.15.  So 3.16-rc4 vs rc5
> >> shouldn't be a factor.  Are you hitting other problems with 3.16?
> > 
> > On this system it is a matter.
> > 
> > 3.16-rc5: Two hangs in one day
> > 
> > 3.16-rc4: No hang so far with three days uptime (well with hibernation
> > cycles in between)
> > 
> > So easy observation for me: 3.16-rc4 fine, 3.16-rc5 broke.
> 
> Can you please try this patch on rc5 and look for the printk:
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3668048..8ab56df 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8157,6 +8157,13 @@ void btrfs_destroy_inode(struct inode *inode)
>  		spin_unlock(&root->fs_info->ordered_root_lock);
>  	}
> 
> +	spin_lock(&root->fs_info->ordered_root_lock);
> +	if (!list_empty(&BTRFS_I(inode)->ordered_operations)) {
> +		list_del_init(&BTRFS_I(inode)->ordered_operations);
> +printk(KERN_CRIT "racing inode deletion with ordered
> operations!!!!!!!!!!!\n"); +	}
> +	spin_unlock(&root->fs_info->ordered_root_lock);
> +
>  	if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
>  		     &BTRFS_I(inode)->runtime_flags)) {
>  		btrfs_info(root->fs_info, "inode %llu still on the orphan list",

Did so and again got a hang.

No racing inodes tough:

merkaba:/boot> zgrep -i "racing inode" /var/log/syslog*
merkaba:/boot#1>

Built kernel seems right:

martin@merkaba:[…]> LANG=C grep -ir "racing inode" fs/btrfs
fs/btrfs/inode.c:printk(KERN_CRIT "racing inode deletion with ordered operations!!!!!!!!!!!\n");
Binary file fs/btrfs/inode.o matches
Binary file fs/btrfs/btrfs.o matches
Binary file fs/btrfs/btrfs.ko matches

Backtrace doesn´t seem to contain any function related to inodes.


Back to rc4 again for now.


These hangs seemed to occur first at writing several hundred MiB onto a
high speed SDHC card… yet, they persisted long after the write was finished,
upto to a point where I had to reboot cause machine hung on trying to
switch between tty7 (X11) and tty1 (for diagnosis).



Jul 19 19:29:11 merkaba kernel: [12346.692457] mmc0: new high speed SDHC card at address 0001
Jul 19 19:29:11 merkaba kernel: [12346.744276] mmcblk0: mmc0:0001 00000 29.2 GiB 
Jul 19 19:29:11 merkaba kernel: [12346.769850]  mmcblk0: p1
Jul 19 19:29:20 merkaba kernel: [12355.796267] FAT-fs (mmcblk0p1): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case se
nsitive!
Jul 19 19:29:20 merkaba kernel: [12355.805515] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Jul 19 19:33:27 merkaba kernel: [12602.162818] INFO: task btrfs-transacti:715 blocked for more than 120 seconds.
Jul 19 19:33:27 merkaba kernel: [12602.162826]       Tainted: G           O  3.16.0-rc5-tp520-btrfs-delrace+ #5
Jul 19 19:33:27 merkaba kernel: [12602.162827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 19:33:27 merkaba kernel: [12602.162828] btrfs-transacti D ffff8800cf90e780     0   715      2 0x00000000
Jul 19 19:33:27 merkaba kernel: [12602.162834]  ffff880401ddbc80 0000000000000002 ffff880407fc0000 ffff880401ddbfd8
Jul 19 19:33:27 merkaba kernel: [12602.162836]  ffff8800cf90e240 0000000000013080 ffff8800cf90e240 7fffffffffffffff
Jul 19 19:33:27 merkaba kernel: [12602.162839]  ffff8804018efd28 0000000000000002 ffffffff81470270 ffff8804018efd20
Jul 19 19:33:27 merkaba kernel: [12602.162842] Call Trace:
Jul 19 19:33:27 merkaba kernel: [12602.162856]  [<ffffffff81470270>] ? michael_mic.part.6+0x1e/0x1e
Jul 19 19:33:27 merkaba kernel: [12602.162860]  [<ffffffff81470c70>] schedule+0x64/0x66
Jul 19 19:33:27 merkaba kernel: [12602.162862]  [<ffffffff8147029f>] schedule_timeout+0x2f/0x114
Jul 19 19:33:27 merkaba kernel: [12602.162867]  [<ffffffff81062c4d>] ? get_parent_ip+0xd/0x3c
Jul 19 19:33:27 merkaba kernel: [12602.162868]  [<ffffffff81062cf7>] ? preempt_count_add+0x7b/0x8e
Jul 19 19:33:27 merkaba kernel: [12602.162871]  [<ffffffff8147149d>] __wait_for_common+0x11e/0x163
Jul 19 19:33:27 merkaba kernel: [12602.162873]  [<ffffffff8147149d>] ? __wait_for_common+0x11e/0x163
Jul 19 19:33:27 merkaba kernel: [12602.162876]  [<ffffffff810647e9>] ? wake_up_state+0xd/0xd
Jul 19 19:33:27 merkaba kernel: [12602.162879]  [<ffffffff81471501>] wait_for_completion+0x1f/0x21
Jul 19 19:33:27 merkaba kernel: [12602.162916]  [<ffffffffc0423c16>] btrfs_wait_and_free_delalloc_work+0x11/0x23 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.162933]  [<ffffffffc042bb55>] btrfs_run_ordered_operations+0x1ea/0x21a [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.162947]  [<ffffffffc0417392>] btrfs_commit_transaction+0x22/0x8df [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.162959]  [<ffffffffc0413fb6>] transaction_kthread+0x107/0x1b9 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.162970]  [<ffffffffc0413eaf>] ? btrfs_cleanup_transaction+0x43a/0x43a [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.162973]  [<ffffffff81058002>] kthread+0xb2/0xba
Jul 19 19:33:27 merkaba kernel: [12602.162976]  [<ffffffff81470000>] ? iommu_prepare_identity_map+0x1d/0x19d
Jul 19 19:33:27 merkaba kernel: [12602.162978]  [<ffffffff81057f50>] ? __kthread_parkme+0x62/0x62
Jul 19 19:33:27 merkaba kernel: [12602.162980]  [<ffffffff81473bec>] ret_from_fork+0x7c/0xb0
Jul 19 19:33:27 merkaba kernel: [12602.162982]  [<ffffffff81057f50>] ? __kthread_parkme+0x62/0x62
Jul 19 19:33:27 merkaba kernel: [12602.162999] INFO: task QThread:2996 blocked for more than 120 seconds.
Jul 19 19:33:27 merkaba kernel: [12602.163000]       Tainted: G           O  3.16.0-rc5-tp520-btrfs-delrace+ #5
Jul 19 19:33:27 merkaba kernel: [12602.163001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 19:33:27 merkaba kernel: [12602.163002] QThread         D ffff8803f71a6780     0  2996      1 0x00000000
Jul 19 19:33:27 merkaba kernel: [12602.163005]  ffff8802df447c18 0000000000000002 ffff880407f649b0 ffff8802df447fd8
Jul 19 19:33:27 merkaba kernel: [12602.163007]  ffff8803f71a6240 0000000000013080 ffff8803f71a6240 ffff88041e293080
Jul 19 19:33:27 merkaba kernel: [12602.163009]  ffff88041e5d5268 ffff8802df447cb0 0000000000000002 ffffffff810daffc
Jul 19 19:33:27 merkaba kernel: [12602.163012] Call Trace:
Jul 19 19:33:27 merkaba kernel: [12602.163015]  [<ffffffff810daffc>] ? wait_on_page_read+0x37/0x37
Jul 19 19:33:27 merkaba kernel: [12602.163017]  [<ffffffff81470c70>] schedule+0x64/0x66
Jul 19 19:33:27 merkaba kernel: [12602.163019]  [<ffffffff81470df7>] io_schedule+0x57/0x76
Jul 19 19:33:27 merkaba kernel: [12602.163021]  [<ffffffff810db005>] sleep_on_page+0x9/0xd
Jul 19 19:33:27 merkaba kernel: [12602.163023]  [<ffffffff814711d7>] __wait_on_bit_lock+0x41/0x85
Jul 19 19:33:27 merkaba kernel: [12602.163025]  [<ffffffff810db0c2>] __lock_page+0x70/0x7c
Jul 19 19:33:27 merkaba kernel: [12602.163030]  [<ffffffff81070f79>] ? autoremove_wake_function+0x2f/0x2f
Jul 19 19:33:27 merkaba kernel: [12602.163045]  [<ffffffffc042c58f>] lock_page+0x1e/0x21 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163059]  [<ffffffffc042c58f>] ? lock_page+0x1e/0x21 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163073]  [<ffffffffc043086c>] extent_write_cache_pages.isra.21.constprop.42+0x1a7/0x2d9 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163076]  [<ffffffff814716d9>] ? mutex_unlock+0x11/0x13
Jul 19 19:33:27 merkaba kernel: [12602.163089]  [<ffffffffc0425ebd>] ? btrfs_file_write_iter+0x322/0x412 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163091]  [<ffffffff81062c4d>] ? get_parent_ip+0xd/0x3c
Jul 19 19:33:27 merkaba kernel: [12602.163105]  [<ffffffffc0430d94>] extent_writepages+0x46/0x57 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163119]  [<ffffffffc041b25a>] ? btrfs_submit_direct+0x3ef/0x3ef [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163122]  [<ffffffff81075192>] ? cpuacct_account_field+0x55/0x5f
Jul 19 19:33:27 merkaba kernel: [12602.163134]  [<ffffffffc041960f>] btrfs_writepages+0x23/0x25 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163136]  [<ffffffff810e5381>] do_writepages+0x1b/0x24
Jul 19 19:33:27 merkaba kernel: [12602.163138]  [<ffffffff810dc6ad>] __filemap_fdatawrite_range+0x50/0x52
Jul 19 19:33:27 merkaba kernel: [12602.163140]  [<ffffffff810dc714>] filemap_fdatawrite_range+0xe/0x10
Jul 19 19:33:27 merkaba kernel: [12602.163154]  [<ffffffffc0424a26>] btrfs_sync_file+0x84/0x2bd [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163156]  [<ffffffff8120e159>] ? __this_cpu_preempt_check+0x13/0x15
Jul 19 19:33:27 merkaba kernel: [12602.163159]  [<ffffffff81155fab>] vfs_fsync_range+0x1c/0x1e
Jul 19 19:33:27 merkaba kernel: [12602.163161]  [<ffffffff81155fc4>] vfs_fsync+0x17/0x19
Jul 19 19:33:27 merkaba kernel: [12602.163164]  [<ffffffff811561eb>] do_fsync+0x2c/0x45
Jul 19 19:33:27 merkaba kernel: [12602.163166]  [<ffffffff811563d1>] SyS_fsync+0xb/0xf
Jul 19 19:33:27 merkaba kernel: [12602.163168]  [<ffffffff81473e8b>] tracesys+0xdd/0xe2
Jul 19 19:33:27 merkaba kernel: [12602.163193] INFO: task kworker/u8:6:3861 blocked for more than 120 seconds.
Jul 19 19:33:27 merkaba kernel: [12602.163194]       Tainted: G           O  3.16.0-rc5-tp520-btrfs-delrace+ #5
Jul 19 19:33:27 merkaba kernel: [12602.163196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 19:33:27 merkaba kernel: [12602.163196] kworker/u8:6    D ffff880071cccef0     0  3861      2 0x00000000
Jul 19 19:33:27 merkaba kernel: [12602.163202] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-3)
Jul 19 19:33:27 merkaba kernel: [12602.163204]  ffff88029ad179b8 0000000000000002 ffff880407fc6240 ffff88029ad17fd8
Jul 19 19:33:27 merkaba kernel: [12602.163206]  ffff880071ccc9b0 0000000000013080 ffff880071ccc9b0 ffff88041e293080
Jul 19 19:33:27 merkaba kernel: [12602.163209]  ffff88041e5c0268 ffff88029ad17a50 0000000000000002 ffffffff810daffc
Jul 19 19:33:27 merkaba kernel: [12602.163211] Call Trace:
Jul 19 19:33:27 merkaba kernel: [12602.163213]  [<ffffffff810daffc>] ? wait_on_page_read+0x37/0x37
Jul 19 19:33:27 merkaba kernel: [12602.163216]  [<ffffffff81470c70>] schedule+0x64/0x66
Jul 19 19:33:27 merkaba kernel: [12602.163218]  [<ffffffff81470df7>] io_schedule+0x57/0x76
Jul 19 19:33:27 merkaba kernel: [12602.163219]  [<ffffffff810db005>] sleep_on_page+0x9/0xd
Jul 19 19:33:27 merkaba kernel: [12602.163222]  [<ffffffff814711d7>] __wait_on_bit_lock+0x41/0x85
Jul 19 19:33:27 merkaba kernel: [12602.163224]  [<ffffffff810db0c2>] __lock_page+0x70/0x7c
Jul 19 19:33:27 merkaba kernel: [12602.163226]  [<ffffffff81070f79>] ? autoremove_wake_function+0x2f/0x2f
Jul 19 19:33:27 merkaba kernel: [12602.163242]  [<ffffffffc042c58f>] lock_page+0x1e/0x21 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163255]  [<ffffffffc042c58f>] ? lock_page+0x1e/0x21 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163269]  [<ffffffffc043086c>] extent_write_cache_pages.isra.21.constprop.42+0x1a7/0x2d9 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163284]  [<ffffffffc0430d94>] extent_writepages+0x46/0x57 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163296]  [<ffffffffc041b25a>] ? btrfs_submit_direct+0x3ef/0x3ef [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163298]  [<ffffffff81062cf4>] ? preempt_count_add+0x78/0x8e
Jul 19 19:33:27 merkaba kernel: [12602.163310]  [<ffffffffc041960f>] btrfs_writepages+0x23/0x25 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163312]  [<ffffffff810e5381>] do_writepages+0x1b/0x24
Jul 19 19:33:27 merkaba kernel: [12602.163315]  [<ffffffff81151d0d>] __writeback_single_inode+0x58/0x212
Jul 19 19:33:27 merkaba kernel: [12602.163317]  [<ffffffff81152c0f>] writeback_sb_inodes+0x1e9/0x32e
Jul 19 19:33:27 merkaba kernel: [12602.163319]  [<ffffffff81152dce>] __writeback_inodes_wb+0x7a/0xb0
Jul 19 19:33:27 merkaba kernel: [12602.163321]  [<ffffffff81152f1a>] wb_writeback+0x116/0x270
Jul 19 19:33:27 merkaba kernel: [12602.163323]  [<ffffffff811532c9>] bdi_writeback_workfn+0x171/0x313
Jul 19 19:33:27 merkaba kernel: [12602.163327]  [<ffffffff81052d7c>] process_one_work+0x16f/0x2b8
Jul 19 19:33:27 merkaba kernel: [12602.163329]  [<ffffffff81053626>] worker_thread+0x27b/0x32e
Jul 19 19:33:27 merkaba kernel: [12602.163332]  [<ffffffff810533ab>] ? cancel_delayed_work_sync+0x10/0x10
Jul 19 19:33:27 merkaba kernel: [12602.163334]  [<ffffffff81058002>] kthread+0xb2/0xba
Jul 19 19:33:27 merkaba kernel: [12602.163336]  [<ffffffff81470000>] ? iommu_prepare_identity_map+0x1d/0x19d
Jul 19 19:33:27 merkaba kernel: [12602.163338]  [<ffffffff81057f50>] ? __kthread_parkme+0x62/0x62
Jul 19 19:33:27 merkaba kernel: [12602.163340]  [<ffffffff81473bec>] ret_from_fork+0x7c/0xb0
Jul 19 19:33:27 merkaba kernel: [12602.163342]  [<ffffffff81057f50>] ? __kthread_parkme+0x62/0x62
Jul 19 19:33:27 merkaba kernel: [12602.163344] INFO: task kworker/u8:7:3987 blocked for more than 120 seconds.
Jul 19 19:33:27 merkaba kernel: [12602.163346]       Tainted: G           O  3.16.0-rc5-tp520-btrfs-delrace+ #5
Jul 19 19:33:27 merkaba kernel: [12602.163347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 19:33:27 merkaba kernel: [12602.163348] kworker/u8:7    D ffff8803ce039dd0     0  3987      2 0x00000000


Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

  reply	other threads:[~2014-07-19 17:59 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-14 15:04 BTRFS hang with 3.16-rc5 Martin Steigerwald
2014-07-14 15:10 ` Martin Steigerwald
2014-07-14 17:51   ` Duncan
2014-07-14 22:03     ` Martin Steigerwald
2014-07-15  2:45       ` Duncan
2014-07-14 20:12   ` Chris Mason
2014-07-14 21:58     ` Martin Steigerwald
2014-07-15 13:21       ` Chris Mason
2014-07-15 15:08         ` Martin Steigerwald
2014-07-23 22:47           ` BTRFS hang with 3.16-rc5 (and also with 3.16-rc4) Martin Steigerwald
2014-07-24 14:58             ` Chris Mason
2014-07-24 16:24               ` Martin Steigerwald
2014-07-24 18:49               ` Martin Steigerwald
2014-07-24 20:04                 ` Chris Mason
2014-07-28 22:57                   ` Martin Steigerwald
2014-07-25  2:32                 ` Duncan
2014-07-25  3:06                   ` Nick Krause
     [not found]                     ` <20140725080244.GA31950@carfax.org.uk>
2014-07-25  9:13                       ` Hugo Mills
2014-07-28 13:20                         ` David Sterba
2014-07-25 10:07                   ` Martin Steigerwald
2014-07-25  4:51               ` Torbjørn
     [not found]                 ` <20140725092800.GC25859@localhost.localdomain>
2014-07-25 10:22                   ` Torbjørn
     [not found]                     ` <53D23AF1.9010704@skagestad.org>
2014-07-25 11:37                       ` Torbjørn
2014-07-25 16:14                         ` Torbjørn
2014-07-28 10:00                         ` Liu Bo
2014-07-28 11:11                           ` Torbjørn
2014-07-29 10:18                             ` Liu Bo
2014-07-29 15:07                               ` Torbjørn
2014-07-30  5:09                                 ` Liu Bo
2014-07-18  7:51         ` BTRFS hang with 3.16-rc5 Martin Steigerwald
2014-07-18 13:36           ` Chris Mason
2014-07-19 17:59             ` Martin Steigerwald [this message]
2014-07-19 18:39               ` Chris Mason
2014-07-19 19:00                 ` Martin Steigerwald

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3378728.FvZP3jOPGh@merkaba \
    --to=martin@lichtvoll.de \
    --cc=clm@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).