From: Nikolay Borisov <kernel@kyup.com>
To: linux-ext4@vger.kernel.org
Cc: Michal Hocko <mhocko@suse.cz>, Marian Marinov <mm@1h.com>
Subject: Lockup in wait_transaction_locked under memory pressure
Date: Thu, 25 Jun 2015 13:16:39 +0300
Message-ID: <558BD507.9070002@kyup.com>
In-Reply-To: <558BD447.1010503@kyup.com>
[Resending to linux-ext4 as the first try failed]
On 06/25/2015 01:13 PM, Nikolay Borisov wrote:
> Hello,
>
> On a fairly busy server running LXC, I am observing that the
> processes of a particular container sometimes lock up by going into D
> (uninterruptible sleep) state. In the backtraces obtained for those
> processes, one thing stands out: they are all blocked in
> wait_transaction_locked (part of JBD2). This occurs sporadically
> when the container is under memory pressure and the OOM killer
> selects a process for killing. I'm running kernel 4.0.0 and
> oom_kill_allocating_task is enabled.
>
> Here are backtraces from multiple such processes:
>
>
> alxc9 kernel: nginx D ffff8820a90b39a8 11496 9331 30627 0x00000004
> alxc9 kernel: ffff8820a90b39a8 ffff881ff284f010 ffff88396c6d1e90 0000000000000282
> alxc9 kernel: ffff8820a90b0010 ffff883ff12f3870 ffff883ff12f3800 0000000000027df1
> alxc9 kernel: ffff880413c08000 ffff8820a90b39c8 ffffffff815ab76e ffff883ff12f3870
> alxc9 kernel: Call Trace:
> alxc9 kernel: [<ffffffff815ab76e>] schedule+0x3e/0x90
> alxc9 kernel: [<ffffffff81264265>] wait_transaction_locked+0x85/0xc0
> alxc9 kernel: [<ffffffff810910d0>] ? woken_wake_function+0x20/0x20
> alxc9 kernel: [<ffffffff815ab18a>] ? __schedule+0x39a/0x870
> alxc9 kernel: [<ffffffff81264540>] add_transaction_credits+0xf0/0x250
> alxc9 kernel: [<ffffffff815ab76e>] ? schedule+0x3e/0x90
> alxc9 kernel: [<ffffffff815ae5e5>] ? schedule_timeout+0x165/0x1f0
> alxc9 kernel: [<ffffffff81264824>] start_this_handle+0x184/0x420
> alxc9 kernel: [<ffffffff810e0880>] ? __delayacct_blkio_end+0x30/0x50
> alxc9 kernel: [<ffffffff8117a48e>] ? kmem_cache_alloc+0xee/0x1c0
> alxc9 kernel: [<ffffffff81265220>] jbd2__journal_start+0x100/0x200
> alxc9 kernel: [<ffffffff8121da5c>] ? ext4_dirty_inode+0x3c/0x80
> alxc9 kernel: [<ffffffff8124bb49>] __ext4_journal_start_sb+0x79/0x100
> alxc9 kernel: [<ffffffff8121da5c>] ext4_dirty_inode+0x3c/0x80
> alxc9 kernel: [<ffffffff811be5c3>] __mark_inode_dirty+0x173/0x400
> alxc9 kernel: [<ffffffff811ae9c5>] generic_update_time+0x85/0xd0
> alxc9 kernel: [<ffffffff81120f5a>] ? filemap_map_pages+0x1ca/0x210
> alxc9 kernel: [<ffffffff811ae632>] file_update_time+0xb2/0x110
> alxc9 kernel: [<ffffffff811226c2>] __generic_file_write_iter+0x172/0x3a0
> alxc9 kernel: [<ffffffff81214814>] ext4_file_write_iter+0x134/0x460
> alxc9 kernel: [<ffffffff810ad910>] ? update_rmtp+0x80/0x80
> alxc9 kernel: [<ffffffff81194047>] new_sync_write+0x97/0xc0
> alxc9 kernel: [<ffffffff8119445e>] vfs_write+0xce/0x180
> alxc9 kernel: [<ffffffff81194bda>] SyS_write+0x5a/0xd0
> alxc9 kernel: [<ffffffff815afa89>] system_call_fastpath+0x12/0x17
>
> alxc9 kernel: mysqld D ffff8821352638d8 11936 5176 30627 0x00000006
> alxc9 kernel: ffff8821352638d8 ffff881ff2848000 ffff8812d3d28a30 0000000000000286
> alxc9 kernel: ffff882135260010 ffff883ff12f3870 ffff883ff12f3800 0000000000027df1
> alxc9 kernel: ffff880413c08000 ffff8821352638f8 ffffffff815ab76e ffff883ff12f3870
> alxc9 kernel: Call Trace:
> alxc9 kernel: [<ffffffff815ab76e>] schedule+0x3e/0x90
> alxc9 kernel: [<ffffffff81264265>] wait_transaction_locked+0x85/0xc0
> alxc9 kernel: [<ffffffff810910d0>] ? woken_wake_function+0x20/0x20
> alxc9 kernel: [<ffffffff81264540>] add_transaction_credits+0xf0/0x250
> alxc9 kernel: [<ffffffff81264824>] start_this_handle+0x184/0x420
> alxc9 kernel: [<ffffffff8117a48e>] ? kmem_cache_alloc+0xee/0x1c0
> alxc9 kernel: [<ffffffff81265220>] jbd2__journal_start+0x100/0x200
> alxc9 kernel: [<ffffffff8121fa70>] ? ext4_evict_inode+0x190/0x490
> alxc9 kernel: [<ffffffff8124bb49>] __ext4_journal_start_sb+0x79/0x100
> alxc9 kernel: [<ffffffff8121fa70>] ext4_evict_inode+0x190/0x490
> alxc9 kernel: [<ffffffff811af6d8>] evict+0xb8/0x1a0
> alxc9 kernel: [<ffffffff811af8b6>] iput_final+0xf6/0x190
> alxc9 kernel: [<ffffffff811b0230>] iput+0xa0/0xe0
> alxc9 kernel: [<ffffffff811ab068>] dentry_iput+0xa8/0xf0
> alxc9 kernel: [<ffffffff811ac1c5>] __dentry_kill+0x85/0x130
> alxc9 kernel: [<ffffffff811ac42c>] dput+0x1bc/0x220
> alxc9 kernel: [<ffffffff811966b4>] __fput+0x144/0x200
> alxc9 kernel: [<ffffffff8119681e>] ____fput+0xe/0x10
> alxc9 kernel: [<ffffffff8106dc85>] task_work_run+0xd5/0x120
> alxc9 kernel: [<ffffffff810537d9>] do_exit+0x1b9/0x560
> alxc9 kernel: [<ffffffff810aebc2>] ? hrtimer_cancel+0x22/0x30
> alxc9 kernel: [<ffffffff81053bd6>] do_group_exit+0x56/0x100
> alxc9 kernel: [<ffffffff81061787>] get_signal+0x237/0x530
> alxc9 kernel: [<ffffffff81002d45>] do_signal+0x25/0x130
> alxc9 kernel: [<ffffffff810a57d9>] ? rcu_eqs_exit+0x79/0xb0
> alxc9 kernel: [<ffffffff810a5823>] ? rcu_user_exit+0x13/0x20
> alxc9 kernel: [<ffffffff81002ec8>] do_notify_resume+0x78/0xb0
> alxc9 kernel: [<ffffffff815afce3>] int_signal+0x12/0x17
>
> My hypothesis is that the OOM killer is killing a process while it's
> performing a write to a transaction, without giving it a chance to
> complete, which leaves all other processes waiting to be woken up, which
> naturally never happens. Is such a failure scenario even possible? If
> not, what could force a transaction to stall for hours?
>
> Regards,
> Nikolay
>
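
The observation above, that every stuck task is blocked in the same
function, can be checked mechanically when there are many traces. Below
is a minimal Python sketch, assuming the log lines have the same layout
as the traces quoted in this message (task header with state D, then
frames of the form "[<address>] symbol+0xoff/0xlen"). The helper names
blocking_frames and summarize are hypothetical, and the field layout of
the task header (PID as the second number after the address) is an
assumption taken from these two traces.

```python
import re
from collections import Counter

# Task header, e.g. "alxc9 kernel: nginx D ffff8820a90b39a8 11496 9331 30627".
# Assumed layout: comm, state D, address, a number, PID, parent PID.
TASK_RE = re.compile(r"kernel:\s+(\S+)\s+D\s+[0-9a-f]{16}\s+\d+\s+(\d+)\s+\d+")

# Stack frame, e.g. "[<ffffffff81264265>] wait_transaction_locked+0x85/0xc0".
# Frames prefixed with "?" are unreliable and are skipped on purpose.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?!\?)(\w+)\+0x")


def blocking_frames(log_text, skip=("schedule",)):
    """Map (comm, pid) to the first reliable frame that is not in `skip`."""
    result = {}
    current = None
    for line in log_text.splitlines():
        task = TASK_RE.search(line)
        if task:
            current = (task.group(1), int(task.group(2)))
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and current not in result:
            if frame.group(1) not in skip:
                result[current] = frame.group(1)
    return result


def summarize(log_text):
    """Count how many D-state tasks are blocked in each function."""
    return Counter(blocking_frames(log_text).values())
```

Feeding the two traces above to summarize() should report
wait_transaction_locked for both tasks. The same text could come from a
saved kernel log after requesting a dump of blocked tasks (for example
via SysRq 'w'), provided the line format matches.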