From: Nikolay Borisov <kernel@kyup.com>
To: Peter Hurley <peter@hurleysoftware.com>
Cc: "Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>
Subject: more hangs in the tty layer
Date: Tue, 4 Oct 2016 11:10:20 +0300 [thread overview]
Message-ID: <57F363EC.8040304@kyup.com> (raw)
Hello Peter,
I've emailed you before re. spurious hangs in the TTY layer, but
at that time I was running a rather old (but LTS) 3.12 kernel. Now,
I'm running a 4.4.10 and I still observe the following lock-ups.
I have multiple processes which hang with the following callstack:
[4592064.967445] sulogin D ffff8839bca53b28 0 38279 38274 0x00000006
[4592064.967448] ffff8839bca53b28 ffff881ff2ed44c0 ffff883748d85280 ffff8839bca54000
[4592064.967452] ffff883748d85280 ffff881ff07c988c 00000000ffffffff ffff881ff07c9890
[4592064.967454] ffff8839bca53b40 ffffffff8161020c ffff881ff07c9888 ffff8839bca53b50
[4592064.967458] Call Trace:
[4592064.967460] [<ffffffff8161020c>] schedule+0x3c/0x90
[4592064.967463] [<ffffffff81610505>] schedule_preempt_disabled+0x15/0x20
[4592064.967466] [<ffffffff816121e0>] __mutex_lock_slowpath+0xb0/0x130
[4592064.967468] [<ffffffff8161228d>] mutex_lock+0x2d/0x40
[4592064.967471] [<ffffffff81614388>] tty_lock+0x38/0x90
[4592064.967474] [<ffffffff8138e6a5>] tty_release+0x55/0x590
[4592064.967478] [<ffffffff8119a8ea>] __fput+0xea/0x1f0
[4592064.967481] [<ffffffff8119aa2e>] ____fput+0xe/0x10
[4592064.967485] [<ffffffff8106fc6f>] task_work_run+0x7f/0xb0
[4592064.967488] [<ffffffff810554d3>] do_exit+0x373/0xb70
[4592064.967490] [<ffffffff816120c2>] ? __mutex_unlock_slowpath+0x32/0x40
[4592064.967493] [<ffffffff816120eb>] ? mutex_unlock+0x1b/0x20
[4592064.967495] [<ffffffff81390e5b>] ? n_tty_read+0x31b/0x9e0
[4592064.967498] [<ffffffff81055d53>] do_group_exit+0x43/0xd0
[4592064.967500] [<ffffffff81061bd1>] get_signal+0x2e1/0x710
[4592064.967503] [<ffffffff810053c8>] do_signal+0x28/0x770
[4592064.967505] [<ffffffff81394d36>] ? tty_ldisc_deref+0x16/0x20
[4592064.967508] [<ffffffff8138cfbc>] ? tty_read+0x9c/0xf0
[4592064.967511] [<ffffffff811987a8>] ? __vfs_read+0x28/0xd0
[4592064.967514] [<ffffffff812a7d53>] ? security_file_permission+0xa3/0xc0
[4592064.967517] [<ffffffff810021da>] exit_to_usermode_loop+0x8a/0xd0
[4592064.967520] [<ffffffff81002a90>] syscall_return_slowpath+0x80/0xf0
[4592064.967523] [<ffffffff816145f4>] int_ret_from_sys_call+0x25/0x9f
After debugging the memory dump of the server it turned out the
following task is holding the mutex on which the sulogin is stuck:
PID: 38274 TASK: ffff88395a109b80 CPU: 13 COMMAND: "sh"
#0 [ffff8837aba27ac8] __schedule at ffffffff8160fa9a
#1 [ffff8837aba27b10] schedule at ffffffff8161020c
#2 [ffff8837aba27b28] schedule_timeout at ffffffff81613472
#3 [ffff8837aba27ba8] ldsem_down_write at ffffffff81613cc5
#4 [ffff8837aba27c10] tty_ldisc_lock at ffffffff816142e6
#5 [ffff8837aba27c28] tty_ldisc_hangup at ffffffff8139561a
#6 [ffff8837aba27c50] __tty_hangup at ffffffff8138cd03
#7 [ffff8837aba27ca0] disassociate_ctty at ffffffff8138d2b9
#8 [ffff8837aba27cd0] disassociate_ctty at ffffffff8138d5d9
#9 [ffff8837aba27ce0] do_exit at ffffffff81055908
#10 [ffff8837aba27d48] do_group_exit at ffffffff81055d53
#11 [ffff8837aba27d70] get_signal at ffffffff81061bd1
#12 [ffff8837aba27de8] do_signal at ffffffff810053c8
#13 [ffff8837aba27ee0] exit_to_usermode_loop at ffffffff810021da
#14 [ffff8837aba27f18] syscall_return_slowpath at ffffffff81002a90
#15 [ffff8837aba27f50] int_ret_from_sys_call at ffffffff816145f4
RIP: 00007f18fde88aba RSP: 00007ffc47a1fa78 RFLAGS: 00000246
RAX: fffffffffffffe00 RBX: 0000561242611d20 RCX: 00007f18fde88aba
RDX: 0000000000000000 RSI: 00007ffc47a1fa9c RDI: 00000000ffffffff
RBP: 000056124260f948 R8: 0000000000000000 R9: 00007f18fe3a5740
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc47a1fb20
R13: 00007ffc47a1faa0 R14: 0000000000000000 R15: 0000000000000001
ORIG_RAX: 000000000000003d CS: 0033 SS: 002b
But I also observed also the following hung-up task,
waiting for read access:
[4592064.968193] index.pl D ffff881430387ce8 0 47168 1 0x00000004
[4592064.968196] ffff881430387ce8 ffff881ff2ed0dc0 ffff8808a0960000 ffff881430388000
[4592064.968200] ffff881ff07c9828 ffff881ff07c9830 ffff8808a0960000 7fffffffffffffff
[4592064.968203] ffff881430387d00 ffffffff8161020c ffffffff00000004 ffff881430387d80
[4592064.968205] Call Trace:
[4592064.968208] [<ffffffff8161020c>] schedule+0x3c/0x90
[4592064.968211] [<ffffffff81613472>] schedule_timeout+0x202/0x260
[4592064.968216] [<ffffffff812a6a48>] ? security_capable+0x48/0x60
[4592064.968219] [<ffffffff81613ab2>] ldsem_down_read+0x122/0x240
[4592064.968222] [<ffffffff813992cc>] ? vt_ioctl+0x6c/0x1460
[4592064.968225] [<ffffffff81394c90>] tty_ldisc_ref_wait+0x20/0x50
[4592064.968228] [<ffffffff8138da37>] tty_ioctl+0x417/0xd40
[4592064.968232] [<ffffffff8115a976>] ? handle_mm_fault+0x19b6/0x19c0
[4592064.968236] [<ffffffff811abe8f>] do_vfs_ioctl+0x30f/0x560
[4592064.968239] [<ffffffff811af8f2>] ? dput+0x192/0x230
[4592064.968242] [<ffffffff811ac159>] SyS_ioctl+0x79/0x90
[4592064.968245] [<ffffffff8161449b>] entry_SYSCALL_64_fastpath+0x16/0x6e
I checked and all the tasks are working on the same tty_struct. Here
is the state of the struct: http://paste.ubuntu.com/23273973/
To me this seems like a lost wakeup event, especially that I've
seen patches flying around fixing such issues. Unfortunately I
cannot reliably reproduce this and booting a production machine
with lockdep would be problematic.
Regards,
Nikolay
next reply other threads:[~2016-10-04 8:10 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-04 8:10 Nikolay Borisov [this message]
2016-11-04 0:38 ` more hangs in the tty layer Peter Hurley
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57F363EC.8040304@kyup.com \
--to=kernel@kyup.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peter@hurleysoftware.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.