All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Steigerwald <Martin@lichtvoll.de>
To: Chris Mason <clm@fb.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: BTRFS hang with 3.16-rc5
Date: Sat, 19 Jul 2014 19:59:56 +0200	[thread overview]
Message-ID: <3378728.FvZP3jOPGh@merkaba> (raw)
In-Reply-To: <53C922C6.1020406@fb.com>

Am Freitag, 18. Juli 2014, 09:36:06 schrieb Chris Mason:
> On 07/18/2014 03:51 AM, Martin Steigerwald wrote:
> > Am Dienstag, 15. Juli 2014, 09:21:40 schrieb Chris Mason:
> >> On 07/14/2014 05:58 PM, Martin Steigerwald wrote:
> >>> Am Montag, 14. Juli 2014, 16:12:22 schrieb Chris Mason:
> >>>> On 07/14/2014 11:10 AM, Martin Steigerwald wrote:
> >>>>> Am Montag, 14. Juli 2014, 17:04:22 schrieben Sie:
> >>>>>> Hi!
> >>>>>> 
> >>>>>> While with 3.16-rc3 and rc4 I didn´t have a BTRFS hang in several
> >>>>>> days
> >>>>>> of
> >>>>>> usage, with 3-16-rc5 I had a hang again. Less than a hour since
> >>>>>> booting
> >>>>>> it.
> >>>>>> 
> >>>>>> Since the hang bug I and others had with 3.15 and upto 3.16-rc2
> >>>>>> usually
> >>>>>> didn´t happen that quickly after boot and since backtrace looks a bit
> >>>>>> different from what I have in memory, I post this in a new thread.
> >>>>>> See thread "Blocked tasks on 3.15.1" for a discussion of previous
> >>>>>> hang
> >>>>>> issues.
> >>>>> 
> >>>>> Probably good to add some basic information on the filesystem:
> >>>> Do you have compression enabled?  I wasn't able to nail down the 3.15.1
> >>>> hang before vacation attacked me, but I'm hoping to track it down
> >>>> today.
> >>> 
> >>> Yes. I have.
> >>> 
> >>> It just hung again while I was playing PlaneShift.
> >>> 
> >>> Back to 3.16-rc4 as rc5 seems to be broke here.
> >> 
> >> The btrfs hang you're hitting goes back to 3.15.  So 3.16-rc4 vs rc5
> >> shouldn't be a factor.  Are you hitting other problems with 3.16?
> > 
> > On this system it is a matter.
> > 
> > 3.16-rc5: Two hangs in one day
> > 
> > 3.16-rc4: No hang so far with three days uptime (well with hibernation
> > cycles in between)
> > 
> > So easy observation for me: 3.16-rc4 fine, 3.16-rc5 broke.
> 
> Can you please try this patch on rc5 and look for the printk:
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3668048..8ab56df 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8157,6 +8157,13 @@ void btrfs_destroy_inode(struct inode *inode)
>  		spin_unlock(&root->fs_info->ordered_root_lock);
>  	}
> 
> +	spin_lock(&root->fs_info->ordered_root_lock);
> +	if (!list_empty(&BTRFS_I(inode)->ordered_operations)) {
> +		list_del_init(&BTRFS_I(inode)->ordered_operations);
> +printk(KERN_CRIT "racing inode deletion with ordered
> operations!!!!!!!!!!!\n"); +	}
> +	spin_unlock(&root->fs_info->ordered_root_lock);
> +
>  	if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
>  		     &BTRFS_I(inode)->runtime_flags)) {
>  		btrfs_info(root->fs_info, "inode %llu still on the orphan list",

Did so and again got a hang.

No racing inodes tough:

merkaba:/boot> zgrep -i "racing inode" /var/log/syslog*
merkaba:/boot#1>

Built kernel seems right:

martin@merkaba:[…]> LANG=C grep -ir "racing inode" fs/btrfs
fs/btrfs/inode.c:printk(KERN_CRIT "racing inode deletion with ordered operations!!!!!!!!!!!\n");
Binary file fs/btrfs/inode.o matches
Binary file fs/btrfs/btrfs.o matches
Binary file fs/btrfs/btrfs.ko matches

Backtrace doesn´t seem to contain any function related to inodes.


Back to rc4 again for now.


These hangs seemed to occur first at writing several hundred MiB onto a
high speed SDHC card… yet, they persisted long after the write was finished,
upto to a point where I had to reboot cause machine hung on trying to
switch between tty7 (X11) and tty1 (for diagnosis).



Jul 19 19:29:11 merkaba kernel: [12346.692457] mmc0: new high speed SDHC card at address 0001
Jul 19 19:29:11 merkaba kernel: [12346.744276] mmcblk0: mmc0:0001 00000 29.2 GiB 
Jul 19 19:29:11 merkaba kernel: [12346.769850]  mmcblk0: p1
Jul 19 19:29:20 merkaba kernel: [12355.796267] FAT-fs (mmcblk0p1): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case se
nsitive!
Jul 19 19:29:20 merkaba kernel: [12355.805515] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Jul 19 19:33:27 merkaba kernel: [12602.162818] INFO: task btrfs-transacti:715 blocked for more than 120 seconds.
Jul 19 19:33:27 merkaba kernel: [12602.162826]       Tainted: G           O  3.16.0-rc5-tp520-btrfs-delrace+ #5
Jul 19 19:33:27 merkaba kernel: [12602.162827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 19:33:27 merkaba kernel: [12602.162828] btrfs-transacti D ffff8800cf90e780     0   715      2 0x00000000
Jul 19 19:33:27 merkaba kernel: [12602.162834]  ffff880401ddbc80 0000000000000002 ffff880407fc0000 ffff880401ddbfd8
Jul 19 19:33:27 merkaba kernel: [12602.162836]  ffff8800cf90e240 0000000000013080 ffff8800cf90e240 7fffffffffffffff
Jul 19 19:33:27 merkaba kernel: [12602.162839]  ffff8804018efd28 0000000000000002 ffffffff81470270 ffff8804018efd20
Jul 19 19:33:27 merkaba kernel: [12602.162842] Call Trace:
Jul 19 19:33:27 merkaba kernel: [12602.162856]  [<ffffffff81470270>] ? michael_mic.part.6+0x1e/0x1e
Jul 19 19:33:27 merkaba kernel: [12602.162860]  [<ffffffff81470c70>] schedule+0x64/0x66
Jul 19 19:33:27 merkaba kernel: [12602.162862]  [<ffffffff8147029f>] schedule_timeout+0x2f/0x114
Jul 19 19:33:27 merkaba kernel: [12602.162867]  [<ffffffff81062c4d>] ? get_parent_ip+0xd/0x3c
Jul 19 19:33:27 merkaba kernel: [12602.162868]  [<ffffffff81062cf7>] ? preempt_count_add+0x7b/0x8e
Jul 19 19:33:27 merkaba kernel: [12602.162871]  [<ffffffff8147149d>] __wait_for_common+0x11e/0x163
Jul 19 19:33:27 merkaba kernel: [12602.162873]  [<ffffffff8147149d>] ? __wait_for_common+0x11e/0x163
Jul 19 19:33:27 merkaba kernel: [12602.162876]  [<ffffffff810647e9>] ? wake_up_state+0xd/0xd
Jul 19 19:33:27 merkaba kernel: [12602.162879]  [<ffffffff81471501>] wait_for_completion+0x1f/0x21
Jul 19 19:33:27 merkaba kernel: [12602.162916]  [<ffffffffc0423c16>] btrfs_wait_and_free_delalloc_work+0x11/0x23 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.162933]  [<ffffffffc042bb55>] btrfs_run_ordered_operations+0x1ea/0x21a [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.162947]  [<ffffffffc0417392>] btrfs_commit_transaction+0x22/0x8df [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.162959]  [<ffffffffc0413fb6>] transaction_kthread+0x107/0x1b9 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.162970]  [<ffffffffc0413eaf>] ? btrfs_cleanup_transaction+0x43a/0x43a [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.162973]  [<ffffffff81058002>] kthread+0xb2/0xba
Jul 19 19:33:27 merkaba kernel: [12602.162976]  [<ffffffff81470000>] ? iommu_prepare_identity_map+0x1d/0x19d
Jul 19 19:33:27 merkaba kernel: [12602.162978]  [<ffffffff81057f50>] ? __kthread_parkme+0x62/0x62
Jul 19 19:33:27 merkaba kernel: [12602.162980]  [<ffffffff81473bec>] ret_from_fork+0x7c/0xb0
Jul 19 19:33:27 merkaba kernel: [12602.162982]  [<ffffffff81057f50>] ? __kthread_parkme+0x62/0x62
Jul 19 19:33:27 merkaba kernel: [12602.162999] INFO: task QThread:2996 blocked for more than 120 seconds.
Jul 19 19:33:27 merkaba kernel: [12602.163000]       Tainted: G           O  3.16.0-rc5-tp520-btrfs-delrace+ #5
Jul 19 19:33:27 merkaba kernel: [12602.163001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 19:33:27 merkaba kernel: [12602.163002] QThread         D ffff8803f71a6780     0  2996      1 0x00000000
Jul 19 19:33:27 merkaba kernel: [12602.163005]  ffff8802df447c18 0000000000000002 ffff880407f649b0 ffff8802df447fd8
Jul 19 19:33:27 merkaba kernel: [12602.163007]  ffff8803f71a6240 0000000000013080 ffff8803f71a6240 ffff88041e293080
Jul 19 19:33:27 merkaba kernel: [12602.163009]  ffff88041e5d5268 ffff8802df447cb0 0000000000000002 ffffffff810daffc
Jul 19 19:33:27 merkaba kernel: [12602.163012] Call Trace:
Jul 19 19:33:27 merkaba kernel: [12602.163015]  [<ffffffff810daffc>] ? wait_on_page_read+0x37/0x37
Jul 19 19:33:27 merkaba kernel: [12602.163017]  [<ffffffff81470c70>] schedule+0x64/0x66
Jul 19 19:33:27 merkaba kernel: [12602.163019]  [<ffffffff81470df7>] io_schedule+0x57/0x76
Jul 19 19:33:27 merkaba kernel: [12602.163021]  [<ffffffff810db005>] sleep_on_page+0x9/0xd
Jul 19 19:33:27 merkaba kernel: [12602.163023]  [<ffffffff814711d7>] __wait_on_bit_lock+0x41/0x85
Jul 19 19:33:27 merkaba kernel: [12602.163025]  [<ffffffff810db0c2>] __lock_page+0x70/0x7c
Jul 19 19:33:27 merkaba kernel: [12602.163030]  [<ffffffff81070f79>] ? autoremove_wake_function+0x2f/0x2f
Jul 19 19:33:27 merkaba kernel: [12602.163045]  [<ffffffffc042c58f>] lock_page+0x1e/0x21 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163059]  [<ffffffffc042c58f>] ? lock_page+0x1e/0x21 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163073]  [<ffffffffc043086c>] extent_write_cache_pages.isra.21.constprop.42+0x1a7/0x2d9 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163076]  [<ffffffff814716d9>] ? mutex_unlock+0x11/0x13
Jul 19 19:33:27 merkaba kernel: [12602.163089]  [<ffffffffc0425ebd>] ? btrfs_file_write_iter+0x322/0x412 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163091]  [<ffffffff81062c4d>] ? get_parent_ip+0xd/0x3c
Jul 19 19:33:27 merkaba kernel: [12602.163105]  [<ffffffffc0430d94>] extent_writepages+0x46/0x57 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163119]  [<ffffffffc041b25a>] ? btrfs_submit_direct+0x3ef/0x3ef [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163122]  [<ffffffff81075192>] ? cpuacct_account_field+0x55/0x5f
Jul 19 19:33:27 merkaba kernel: [12602.163134]  [<ffffffffc041960f>] btrfs_writepages+0x23/0x25 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163136]  [<ffffffff810e5381>] do_writepages+0x1b/0x24
Jul 19 19:33:27 merkaba kernel: [12602.163138]  [<ffffffff810dc6ad>] __filemap_fdatawrite_range+0x50/0x52
Jul 19 19:33:27 merkaba kernel: [12602.163140]  [<ffffffff810dc714>] filemap_fdatawrite_range+0xe/0x10
Jul 19 19:33:27 merkaba kernel: [12602.163154]  [<ffffffffc0424a26>] btrfs_sync_file+0x84/0x2bd [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163156]  [<ffffffff8120e159>] ? __this_cpu_preempt_check+0x13/0x15
Jul 19 19:33:27 merkaba kernel: [12602.163159]  [<ffffffff81155fab>] vfs_fsync_range+0x1c/0x1e
Jul 19 19:33:27 merkaba kernel: [12602.163161]  [<ffffffff81155fc4>] vfs_fsync+0x17/0x19
Jul 19 19:33:27 merkaba kernel: [12602.163164]  [<ffffffff811561eb>] do_fsync+0x2c/0x45
Jul 19 19:33:27 merkaba kernel: [12602.163166]  [<ffffffff811563d1>] SyS_fsync+0xb/0xf
Jul 19 19:33:27 merkaba kernel: [12602.163168]  [<ffffffff81473e8b>] tracesys+0xdd/0xe2
Jul 19 19:33:27 merkaba kernel: [12602.163193] INFO: task kworker/u8:6:3861 blocked for more than 120 seconds.
Jul 19 19:33:27 merkaba kernel: [12602.163194]       Tainted: G           O  3.16.0-rc5-tp520-btrfs-delrace+ #5
Jul 19 19:33:27 merkaba kernel: [12602.163196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 19:33:27 merkaba kernel: [12602.163196] kworker/u8:6    D ffff880071cccef0     0  3861      2 0x00000000
Jul 19 19:33:27 merkaba kernel: [12602.163202] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-3)
Jul 19 19:33:27 merkaba kernel: [12602.163204]  ffff88029ad179b8 0000000000000002 ffff880407fc6240 ffff88029ad17fd8
Jul 19 19:33:27 merkaba kernel: [12602.163206]  ffff880071ccc9b0 0000000000013080 ffff880071ccc9b0 ffff88041e293080
Jul 19 19:33:27 merkaba kernel: [12602.163209]  ffff88041e5c0268 ffff88029ad17a50 0000000000000002 ffffffff810daffc
Jul 19 19:33:27 merkaba kernel: [12602.163211] Call Trace:
Jul 19 19:33:27 merkaba kernel: [12602.163213]  [<ffffffff810daffc>] ? wait_on_page_read+0x37/0x37
Jul 19 19:33:27 merkaba kernel: [12602.163216]  [<ffffffff81470c70>] schedule+0x64/0x66
Jul 19 19:33:27 merkaba kernel: [12602.163218]  [<ffffffff81470df7>] io_schedule+0x57/0x76
Jul 19 19:33:27 merkaba kernel: [12602.163219]  [<ffffffff810db005>] sleep_on_page+0x9/0xd
Jul 19 19:33:27 merkaba kernel: [12602.163222]  [<ffffffff814711d7>] __wait_on_bit_lock+0x41/0x85
Jul 19 19:33:27 merkaba kernel: [12602.163224]  [<ffffffff810db0c2>] __lock_page+0x70/0x7c
Jul 19 19:33:27 merkaba kernel: [12602.163226]  [<ffffffff81070f79>] ? autoremove_wake_function+0x2f/0x2f
Jul 19 19:33:27 merkaba kernel: [12602.163242]  [<ffffffffc042c58f>] lock_page+0x1e/0x21 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163255]  [<ffffffffc042c58f>] ? lock_page+0x1e/0x21 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163269]  [<ffffffffc043086c>] extent_write_cache_pages.isra.21.constprop.42+0x1a7/0x2d9 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163284]  [<ffffffffc0430d94>] extent_writepages+0x46/0x57 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163296]  [<ffffffffc041b25a>] ? btrfs_submit_direct+0x3ef/0x3ef [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163298]  [<ffffffff81062cf4>] ? preempt_count_add+0x78/0x8e
Jul 19 19:33:27 merkaba kernel: [12602.163310]  [<ffffffffc041960f>] btrfs_writepages+0x23/0x25 [btrfs]
Jul 19 19:33:27 merkaba kernel: [12602.163312]  [<ffffffff810e5381>] do_writepages+0x1b/0x24
Jul 19 19:33:27 merkaba kernel: [12602.163315]  [<ffffffff81151d0d>] __writeback_single_inode+0x58/0x212
Jul 19 19:33:27 merkaba kernel: [12602.163317]  [<ffffffff81152c0f>] writeback_sb_inodes+0x1e9/0x32e
Jul 19 19:33:27 merkaba kernel: [12602.163319]  [<ffffffff81152dce>] __writeback_inodes_wb+0x7a/0xb0
Jul 19 19:33:27 merkaba kernel: [12602.163321]  [<ffffffff81152f1a>] wb_writeback+0x116/0x270
Jul 19 19:33:27 merkaba kernel: [12602.163323]  [<ffffffff811532c9>] bdi_writeback_workfn+0x171/0x313
Jul 19 19:33:27 merkaba kernel: [12602.163327]  [<ffffffff81052d7c>] process_one_work+0x16f/0x2b8
Jul 19 19:33:27 merkaba kernel: [12602.163329]  [<ffffffff81053626>] worker_thread+0x27b/0x32e
Jul 19 19:33:27 merkaba kernel: [12602.163332]  [<ffffffff810533ab>] ? cancel_delayed_work_sync+0x10/0x10
Jul 19 19:33:27 merkaba kernel: [12602.163334]  [<ffffffff81058002>] kthread+0xb2/0xba
Jul 19 19:33:27 merkaba kernel: [12602.163336]  [<ffffffff81470000>] ? iommu_prepare_identity_map+0x1d/0x19d
Jul 19 19:33:27 merkaba kernel: [12602.163338]  [<ffffffff81057f50>] ? __kthread_parkme+0x62/0x62
Jul 19 19:33:27 merkaba kernel: [12602.163340]  [<ffffffff81473bec>] ret_from_fork+0x7c/0xb0
Jul 19 19:33:27 merkaba kernel: [12602.163342]  [<ffffffff81057f50>] ? __kthread_parkme+0x62/0x62
Jul 19 19:33:27 merkaba kernel: [12602.163344] INFO: task kworker/u8:7:3987 blocked for more than 120 seconds.
Jul 19 19:33:27 merkaba kernel: [12602.163346]       Tainted: G           O  3.16.0-rc5-tp520-btrfs-delrace+ #5
Jul 19 19:33:27 merkaba kernel: [12602.163347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 19:33:27 merkaba kernel: [12602.163348] kworker/u8:7    D ffff8803ce039dd0     0  3987      2 0x00000000


Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

  reply	other threads:[~2014-07-19 17:59 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-14 15:04 BTRFS hang with 3.16-rc5 Martin Steigerwald
2014-07-14 15:10 ` Martin Steigerwald
2014-07-14 17:51   ` Duncan
2014-07-14 22:03     ` Martin Steigerwald
2014-07-15  2:45       ` Duncan
2014-07-14 20:12   ` Chris Mason
2014-07-14 21:58     ` Martin Steigerwald
2014-07-15 13:21       ` Chris Mason
2014-07-15 15:08         ` Martin Steigerwald
2014-07-23 22:47           ` BTRFS hang with 3.16-rc5 (and also with 3.16-rc4) Martin Steigerwald
2014-07-24 14:58             ` Chris Mason
2014-07-24 16:24               ` Martin Steigerwald
2014-07-24 18:49               ` Martin Steigerwald
2014-07-24 20:04                 ` Chris Mason
2014-07-28 22:57                   ` Martin Steigerwald
2014-07-25  2:32                 ` Duncan
2014-07-25  3:06                   ` Nick Krause
     [not found]                     ` <20140725080244.GA31950@carfax.org.uk>
2014-07-25  9:13                       ` Hugo Mills
2014-07-28 13:20                         ` David Sterba
2014-07-25 15:36                       ` Nick Krause
2014-07-25 10:07                   ` Martin Steigerwald
2014-07-25  4:51               ` Torbjørn
     [not found]                 ` <20140725092800.GC25859@localhost.localdomain>
2014-07-25 10:22                   ` Torbjørn
     [not found]                     ` <53D23AF1.9010704@skagestad.org>
2014-07-25 11:37                       ` Torbjørn
2014-07-25 16:14                         ` Torbjørn
2014-07-28 10:00                         ` Liu Bo
2014-07-28 11:11                           ` Torbjørn
2014-07-29 10:18                             ` Liu Bo
2014-07-29 15:07                               ` Torbjørn
2014-07-30  5:09                                 ` Liu Bo
2014-07-18  7:51         ` BTRFS hang with 3.16-rc5 Martin Steigerwald
2014-07-18 13:36           ` Chris Mason
2014-07-19 17:59             ` Martin Steigerwald [this message]
2014-07-19 18:39               ` Chris Mason
2014-07-19 19:00                 ` Martin Steigerwald

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3378728.FvZP3jOPGh@merkaba \
    --to=martin@lichtvoll.de \
    --cc=clm@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.