From: Nikolay Borisov <kernel@kyup.com>
To: tytso@mit.edu
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
Michal Hocko <mhocko@suse.cz>, Marian Marinov <mm@1h.com>
Subject: Lockup in wait_transaction_locked under memory pressure
Date: Thu, 25 Jun 2015 13:13:27 +0300
Message-ID: <558BD447.1010503@kyup.com>
Hello,
On a fairly busy server running LXC, I'm observing that the processes
of a particular container sometimes lock up in D (uninterruptible
sleep) state. Backtraces of those processes show that they are all
blocked in wait_transaction_locked (part of JBD2).
This occurs sporadically when the container is under memory
pressure and the OOM killer selects a process to kill. I'm running
kernel 4.0.0 with oom_kill_allocating_task enabled.
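For anyone who wants to check for the same symptom, here is a minimal
sketch of how stuck tasks could be listed from /proc. It is not the
tool I used to collect the traces below. The helper names (parse_stat,
read_text, d_state_tasks) are made up for illustration, it only looks
at thread-group leaders (not /proc/<pid>/task/*), and it assumes
/proc/<pid>/wchan is readable and resolves to a symbol name on the
running kernel.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar = stat_text.index("(")
    rpar = stat_text.rindex(")")
    pid = int(stat_text[:lpar].strip())
    comm = stat_text[lpar + 1:rpar]
    state = stat_text[rpar + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Read a small /proc file; return '' if the task vanished or access is denied."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def d_state_tasks(proc="/proc"):
    """Return [(pid, comm, wchan)] for every process currently in D state."""
    try:
        entries = os.listdir(proc)
    except OSError:
        return []  # no procfs at this path (e.g. not running on Linux)
    tasks = []
    for entry in entries:
        if not entry.isdigit():
            continue
        stat = read_text(os.path.join(proc, entry, "stat"))
        if not stat:
            continue
        pid, comm, state = parse_stat(stat)
        if state == "D":
            wchan = read_text(os.path.join(proc, entry, "wchan"))
            tasks.append((pid, comm, wchan))
    return tasks
```

Calling d_state_tasks() a few times some seconds apart and keeping the
PIDs that show up every time separates tasks that are really stuck
from ones that are only briefly waiting on I/O.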
Here are backtraces from multiple such processes:
alxc9 kernel: nginx D ffff8820a90b39a8 11496 9331 30627 0x00000004
alxc9 kernel: ffff8820a90b39a8 ffff881ff284f010 ffff88396c6d1e90 0000000000000282
alxc9 kernel: ffff8820a90b0010 ffff883ff12f3870 ffff883ff12f3800 0000000000027df1
alxc9 kernel: ffff880413c08000 ffff8820a90b39c8 ffffffff815ab76e ffff883ff12f3870
alxc9 kernel: Call Trace:
alxc9 kernel: [<ffffffff815ab76e>] schedule+0x3e/0x90
alxc9 kernel: [<ffffffff81264265>] wait_transaction_locked+0x85/0xc0
alxc9 kernel: [<ffffffff810910d0>] ? woken_wake_function+0x20/0x20
alxc9 kernel: [<ffffffff815ab18a>] ? __schedule+0x39a/0x870
alxc9 kernel: [<ffffffff81264540>] add_transaction_credits+0xf0/0x250
alxc9 kernel: [<ffffffff815ab76e>] ? schedule+0x3e/0x90
alxc9 kernel: [<ffffffff815ae5e5>] ? schedule_timeout+0x165/0x1f0
alxc9 kernel: [<ffffffff81264824>] start_this_handle+0x184/0x420
alxc9 kernel: [<ffffffff810e0880>] ? __delayacct_blkio_end+0x30/0x50
alxc9 kernel: [<ffffffff8117a48e>] ? kmem_cache_alloc+0xee/0x1c0
alxc9 kernel: [<ffffffff81265220>] jbd2__journal_start+0x100/0x200
alxc9 kernel: [<ffffffff8121da5c>] ? ext4_dirty_inode+0x3c/0x80
alxc9 kernel: [<ffffffff8124bb49>] __ext4_journal_start_sb+0x79/0x100
alxc9 kernel: [<ffffffff8121da5c>] ext4_dirty_inode+0x3c/0x80
alxc9 kernel: [<ffffffff811be5c3>] __mark_inode_dirty+0x173/0x400
alxc9 kernel: [<ffffffff811ae9c5>] generic_update_time+0x85/0xd0
alxc9 kernel: [<ffffffff81120f5a>] ? filemap_map_pages+0x1ca/0x210
alxc9 kernel: [<ffffffff811ae632>] file_update_time+0xb2/0x110
alxc9 kernel: [<ffffffff811226c2>] __generic_file_write_iter+0x172/0x3a0
alxc9 kernel: [<ffffffff81214814>] ext4_file_write_iter+0x134/0x460
alxc9 kernel: [<ffffffff810ad910>] ? update_rmtp+0x80/0x80
alxc9 kernel: [<ffffffff81194047>] new_sync_write+0x97/0xc0
alxc9 kernel: [<ffffffff8119445e>] vfs_write+0xce/0x180
alxc9 kernel: [<ffffffff81194bda>] SyS_write+0x5a/0xd0
alxc9 kernel: [<ffffffff815afa89>] system_call_fastpath+0x12/0x17
alxc9 kernel: mysqld D ffff8821352638d8 11936 5176 30627 0x00000006
alxc9 kernel: ffff8821352638d8 ffff881ff2848000 ffff8812d3d28a30 0000000000000286
alxc9 kernel: ffff882135260010 ffff883ff12f3870 ffff883ff12f3800 0000000000027df1
alxc9 kernel: ffff880413c08000 ffff8821352638f8 ffffffff815ab76e ffff883ff12f3870
alxc9 kernel: Call Trace:
alxc9 kernel: [<ffffffff815ab76e>] schedule+0x3e/0x90
alxc9 kernel: [<ffffffff81264265>] wait_transaction_locked+0x85/0xc0
alxc9 kernel: [<ffffffff810910d0>] ? woken_wake_function+0x20/0x20
alxc9 kernel: [<ffffffff81264540>] add_transaction_credits+0xf0/0x250
alxc9 kernel: [<ffffffff81264824>] start_this_handle+0x184/0x420
alxc9 kernel: [<ffffffff8117a48e>] ? kmem_cache_alloc+0xee/0x1c0
alxc9 kernel: [<ffffffff81265220>] jbd2__journal_start+0x100/0x200
alxc9 kernel: [<ffffffff8121fa70>] ? ext4_evict_inode+0x190/0x490
alxc9 kernel: [<ffffffff8124bb49>] __ext4_journal_start_sb+0x79/0x100
alxc9 kernel: [<ffffffff8121fa70>] ext4_evict_inode+0x190/0x490
alxc9 kernel: [<ffffffff811af6d8>] evict+0xb8/0x1a0
alxc9 kernel: [<ffffffff811af8b6>] iput_final+0xf6/0x190
alxc9 kernel: [<ffffffff811b0230>] iput+0xa0/0xe0
alxc9 kernel: [<ffffffff811ab068>] dentry_iput+0xa8/0xf0
alxc9 kernel: [<ffffffff811ac1c5>] __dentry_kill+0x85/0x130
alxc9 kernel: [<ffffffff811ac42c>] dput+0x1bc/0x220
alxc9 kernel: [<ffffffff811966b4>] __fput+0x144/0x200
alxc9 kernel: [<ffffffff8119681e>] ____fput+0xe/0x10
alxc9 kernel: [<ffffffff8106dc85>] task_work_run+0xd5/0x120
alxc9 kernel: [<ffffffff810537d9>] do_exit+0x1b9/0x560
alxc9 kernel: [<ffffffff810aebc2>] ? hrtimer_cancel+0x22/0x30
alxc9 kernel: [<ffffffff81053bd6>] do_group_exit+0x56/0x100
alxc9 kernel: [<ffffffff81061787>] get_signal+0x237/0x530
alxc9 kernel: [<ffffffff81002d45>] do_signal+0x25/0x130
alxc9 kernel: [<ffffffff810a57d9>] ? rcu_eqs_exit+0x79/0xb0
alxc9 kernel: [<ffffffff810a5823>] ? rcu_user_exit+0x13/0x20
alxc9 kernel: [<ffffffff81002ec8>] do_notify_resume+0x78/0xb0
alxc9 kernel: [<ffffffff815afce3>] int_signal+0x12/0x17
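Traces like the two above can also be compared mechanically. The
following is only a sketch, with hypothetical helper names
(reliable_frames, common_prefix) and a regular expression that assumes
the "[<address>] symbol+0xoff/0xsize" line format shown in this log.
It drops the frames marked "?", which the kernel prints for return
addresses it found on the stack but could not confirm, and then
reports the innermost frames that all traces share.

```python
import re

# Matches e.g. "[<ffffffff81264265>] wait_transaction_locked+0x85/0xc0";
# group 1 is set when the frame carries the "?" (unreliable) marker.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(lines):
    """Return the function names of all frames not marked with '?'."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m and not m.group(1):
            frames.append(m.group(2))
    return frames


def common_prefix(traces):
    """Return the leading (innermost) frames shared by every trace."""
    prefix = []
    for column in zip(*traces):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix
```

Fed the nginx and mysqld traces above, the shared prefix runs from
schedule through __ext4_journal_start_sb, and the traces diverge only
at the ext4 caller (ext4_dirty_inode versus ext4_evict_inode).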
My hypothesis is that the OOM killer kills a process while it is writing
to a transaction, before it has a chance to complete it, which leaves
all other processes waiting for a wakeup that never comes. Is such a
failure scenario even possible? If not, what else could stall a
transaction for hours?
Regards,
Nikolay
Thread overview: 36+ messages
2015-06-25 10:13 Nikolay Borisov [this message]
2015-06-25 10:16 ` Lockup in wait_transaction_locked under memory pressure Nikolay Borisov
2015-06-25 11:21 ` Michal Hocko
2015-06-25 11:43 ` Nikolay Borisov
2015-06-25 11:50 ` Michal Hocko
2015-06-25 12:05 ` Nikolay Borisov
2015-06-25 13:29 ` Nikolay Borisov
2015-06-25 13:45 ` Michal Hocko
2015-06-25 13:54 ` Nikolay Borisov
2015-06-25 13:58 ` Michal Hocko
2015-06-25 13:31 ` Theodore Ts'o
2015-06-25 13:49 ` Nikolay Borisov
2015-06-25 14:05 ` Michal Hocko
2015-06-25 14:34 ` Nikolay Borisov
2015-06-25 15:18 ` Michal Hocko
2015-06-25 15:27 ` Nikolay Borisov
2015-06-29 8:32 ` Michal Hocko
2015-06-29 9:07 ` Nikolay Borisov
2015-06-29 9:16 ` Michal Hocko
2015-06-29 9:23 ` Nikolay Borisov
2015-06-29 9:38 ` Michal Hocko
2015-06-29 10:21 ` Nikolay Borisov
2015-06-29 11:44 ` Michal Hocko
2015-06-25 14:45 ` Theodore Ts'o
2015-06-25 13:57 ` Michal Hocko
2015-06-29 9:01 ` Nikolay Borisov
2015-06-29 9:36 ` Michal Hocko
2015-06-30 1:52 ` Dave Chinner
2015-06-30 3:02 ` Theodore Ts'o
2015-06-30 6:35 ` Nikolay Borisov
2015-06-30 12:30 ` Michal Hocko
2015-06-30 14:31 ` Michal Hocko
2015-06-30 22:58 ` Dave Chinner
2015-07-01 6:10 ` Michal Hocko
2015-07-01 11:13 ` Theodore Ts'o
2015-07-01 14:21 ` Michal Hocko