linux-btrfs.vger.kernel.org archive mirror
From: Johannes Ernst <johannes.ernst@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: Hanging after frequent use of systemd-nspawn --ephemeral
Date: Sun, 14 Jan 2018 12:18:01 -0800
Message-ID: <204DC482-8ADF-4C88-B84C-D47A0E8C9346@gmail.com>
In-Reply-To: <8c9933e1-f694-a513-83ca-33fcb503b7e8@gmx.com>

> On Jan 13, 2018, at 18:27, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2018年01月14日 09:36, Johannes Ernst wrote:
>> Summary: frequent “hangs” of the system with dmesg saying:
>> 
>> task systemd:22229 blocked for more than 120 seconds.
>> [ 2948.928653]       Tainted: G         C O    4.14.9-1-ARCH #1
>> [ 2948.928658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 2948.928665] systemd         D    0 22229  22226 0x00000100
>> [ 2948.928671] Call Trace:
>> [ 2948.928684]  ? __schedule+0x290/0x890
>> [ 2948.928690]  schedule+0x2f/0x90
>> [ 2948.928744]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
>> [ 2948.928752]  ? wait_woken+0x80/0x80
>> [ 2948.928799]  find_parent_nodes+0x341/0xfd0 [btrfs]
>> [ 2948.928827]  ? btrfs_search_slot+0x84c/0x9f0 [btrfs]
>> [ 2948.928873]  ? btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
>> [ 2948.928912]  btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
>> [ 2948.928950]  btrfs_find_all_roots+0x45/0x60 [btrfs]
>> [ 2948.928987]  btrfs_qgroup_trace_extent_post+0x30/0x60 [btrfs]
> 
> You're using qgroup, and the timing is to find the old_roots of one
> extent, which will only search commit roots.
> 
> Normally this shouldn't cause any problem, especially for commit roots.
> 
> Is there any special operation happening?

Nope. It appears to happen right when systemd-nspawn begins to run, and I am not executing any other commands.

I did not realize qgroups were involved … all I did was run mkfs.btrfs and systemd-nspawn :-) Perhaps qgroups should be off by default? (But I digress)
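
For anyone else unsure whether qgroups are active on their file systems, here is a minimal Python sketch of one way to check. It assumes the usual /proc/mounts layout (device, mount point, fstype, options, ...) and that "btrfs qgroup show <path>" exits with an error when quotas are not enabled. The helper names are hypothetical and made up for illustration; nothing here comes from btrfs-progs or systemd.

```python
import subprocess


def btrfs_mount_points(mounts_text):
    """Return the mount points whose filesystem type is btrfs.

    mounts_text is the content of /proc/mounts: one mount per line,
    whitespace-separated fields (device, mount point, fstype, ...).
    Escaped characters in paths are not decoded in this sketch.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "btrfs":
            if fields[1] not in points:  # skip duplicate entries
                points.append(fields[1])
    return points


def qgroup_show_commands(mount_points):
    """Build one 'btrfs qgroup show' command line per mount point."""
    return [["btrfs", "qgroup", "show", mp] for mp in mount_points]


def check_quotas(mounts_path="/proc/mounts"):
    """Run the commands (needs root and btrfs-progs) and print a summary."""
    with open(mounts_path) as f:
        mount_points = btrfs_mount_points(f.read())
    for cmd in qgroup_show_commands(mount_points):
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Assumption: a non-zero exit means quotas are off (or another error).
        state = "quota enabled" if result.returncode == 0 else "quota off or error"
        print(" ".join(cmd), "->", state)
```

Calling check_quotas() as root prints one line per btrfs mount point. Only the two pure helpers are exercised without a btrfs file system at hand.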

>> That works well … but not for long. Often we don’t make it through the test suite and the starting of new containers hangs. Other filesystem operations also hang. The above stack trace, or something rather similar, shows up in dmesg (not in the journal, because that hangs, too!). This is repeated, but I don’t see any other relevant messages. Only a reboot seems to allow recovery.
> 
> So Qgroup is used to limit disk usage of each container?
> 
> Maybe it's related to subvolume deletion?
> 
> Anyway, if qgroup is not necessary, disabling qgroup should fix your
> problem.
> 
> That aside, did it really hang?
> Qgroup dramatically increases the overhead of deleting a subvolume or
> balancing the fs.
> Maybe it's just a little slow?

I have waited for several hours and the system did not recover (I walked away from the running tests and returned hours later).

Update: I executed “btrfs quota disable” on all relevant file systems (right?). Running the tests again, this morning I’m getting:

INFO: task systemd-journal:20876 blocked for more than 120 seconds.
[ 5037.962603]       Tainted: G         C O    4.14.9-1-ARCH #1
[ 5037.962609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5037.962616] systemd-journal D    0 20876  20860 0x00000100
[ 5037.962622] Call Trace:
[ 5037.962635]  ? __schedule+0x290/0x890
[ 5037.962640]  ? __slab_free+0x14e/0x300
[ 5037.962645]  ? _copy_to_iter+0x8f/0x3d0
[ 5037.962651]  schedule+0x2f/0x90
[ 5037.962704]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
[ 5037.962713]  ? wait_woken+0x80/0x80
[ 5037.962739]  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[ 5037.962767]  btrfs_search_slot+0x703/0x9f0 [btrfs]
[ 5037.962796]  btrfs_insert_empty_items+0x66/0xb0 [btrfs]
[ 5037.962841]  btrfs_insert_orphan_item+0x66/0xa0 [btrfs]
[ 5037.962880]  btrfs_orphan_add+0xa1/0x200 [btrfs]
[ 5037.962919]  btrfs_setattr+0x123/0x3b0 [btrfs]
[ 5037.962926]  notify_change+0x2fd/0x420
[ 5037.962933]  do_truncate+0x75/0xc0
[ 5037.962940]  do_sys_ftruncate.constprop.19+0xe7/0x100
[ 5037.962947]  do_syscall_64+0x55/0x110
[ 5037.962952]  entry_SYSCALL64_slow_path+0x25/0x25
[ 5037.962956] RIP: 0033:0x7fd9423697ba
[ 5037.962958] RSP: 002b:00007fff1179cc18 EFLAGS: 00000206 ORIG_RAX: 000000000000004d
[ 5037.962962] RAX: ffffffffffffffda RBX: 000055bd48cbe9f0 RCX: 00007fd9423697ba
[ 5037.962965] RDX: 000055bd48cbe660 RSI: 0000000000640000 RDI: 000000000000000f
[ 5037.962966] RBP: 00007fff1179cc50 R08: 000055bd48cbc62c R09: 000055bd48cbea6c
[ 5037.962968] R10: 000055bd48cbe9f0 R11: 0000000000000206 R12: 00007fff1179cc48
[ 5037.962970] R13: 0000000000000003 R14: 000562b7234e71a9 R15: 000055bd47749ca0

and doing a simple “touch /build/tmp/foo” never returns. 10+ hours have passed since the previous command was issued.
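
To compare reports like the two in this message, a small Python sketch can pull out the blocked task and its reliable stack frames from dmesg text. This is only an illustration: the regular expressions are assumptions based on the lines shown here, the function name is hypothetical, and frames prefixed with "? " are skipped because the kernel marks those as unreliable.

```python
import re

# "task <comm>:<pid> blocked for more than <N> seconds"
HUNG_RE = re.compile(
    r"task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# "[ 5037.962651]  schedule+0x2f/0x90", optionally prefixed with "? "
FRAME_RE = re.compile(
    r"\]\s+(?P<q>\? )?(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(log_text):
    """Return a list of dicts: comm, pid, secs, frames (reliable frames only)."""
    reports = []
    for line in log_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            reports.append({
                "comm": hung.group("comm"),
                "pid": int(hung.group("pid")),
                "secs": int(hung.group("secs")),
                "frames": [],
            })
            continue
        frame = FRAME_RE.search(line)
        # Attach frames to the most recent report; drop "? " (unreliable) ones.
        if frame and reports and not frame.group("q"):
            reports[-1]["frames"].append(frame.group("func"))
    return reports
```

Fed the output of dmesg, the first frame after schedule in each report shows where the task is waiting; in both traces above that is btrfs_tree_read_lock.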


