From: Nikolay Borisov <kernel@kyup.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-ext4@vger.kernel.org, Marian Marinov <mm@1h.com>
Subject: Re: Lockup in wait_transaction_locked under memory pressure
Date: Thu, 25 Jun 2015 16:54:23 +0300
Message-ID: <558C080F.7040104@kyup.com>
In-Reply-To: <20150625134558.GF17237@dhcp22.suse.cz>



On 06/25/2015 04:45 PM, Michal Hocko wrote:
> On Thu 25-06-15 16:29:31, Nikolay Borisov wrote:
>> I couldn't find any particular OOM that stands out; here is what a typical
>> one looks like:
>>
>> alxc9 kernel: Memory cgroup out of memory (oom_kill_allocating_task): Kill process 9703 (postmaster) score 0 or sacrifice child
>> alxc9 kernel: Killed process 9703 (postmaster) total-vm:205800kB, anon-rss:1128kB, file-rss:0kB
>> alxc9 kernel: php invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
>> alxc9 kernel: php cpuset=cXXXX mems_allowed=0-1
>> alxc9 kernel: CPU: 12 PID: 1000 Comm: php Not tainted 4.0.0-clouder9+ #31
>> alxc9 kernel: Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.2 01/16/2015
>> alxc9 kernel: ffff8805d8440400 ffff88208d863c78 ffffffff815aaca3 ffff8820b947c750
>> alxc9 kernel: ffff8820b947c750 ffff88208d863cc8 ffffffff81123b2e ffff882000000000
>> alxc9 kernel: ffffffff000000d0 ffff8805d8440400 ffff8820b947c750 ffff8820b947cee0
>> alxc9 kernel: Call Trace:
>> alxc9 kernel: [<ffffffff815aaca3>] dump_stack+0x48/0x5d
>> alxc9 kernel: [<ffffffff81123b2e>] dump_header+0x8e/0xe0
>> alxc9 kernel: [<ffffffff81123fa7>] oom_kill_process+0x1d7/0x3c0
>> alxc9 kernel: [<ffffffff810d85a1>] ? cpuset_mems_allowed_intersects+0x21/0x30
>> alxc9 kernel: [<ffffffff8118c2bd>] mem_cgroup_out_of_memory+0x2bd/0x370
>> alxc9 kernel: [<ffffffff81189b37>] ? mem_cgroup_iter+0x177/0x390
>> alxc9 kernel: [<ffffffff8118c5d7>] mem_cgroup_oom_synchronize+0x267/0x290
>> alxc9 kernel: [<ffffffff811874f0>] ? mem_cgroup_wait_acct_move+0x140/0x140
>> alxc9 kernel: [<ffffffff81124504>] pagefault_out_of_memory+0x24/0xe0
>> alxc9 kernel: [<ffffffff81041927>] mm_fault_error+0x47/0x160
>> alxc9 kernel: [<ffffffff81041db0>] __do_page_fault+0x340/0x3c0
>> alxc9 kernel: [<ffffffff81041e6c>] do_page_fault+0x3c/0x90
>> alxc9 kernel: [<ffffffff815b1758>] page_fault+0x28/0x30
>> alxc9 kernel: Task in /lxc/cXXXX killed as a result of limit of /lxc/cXXXX
>> alxc9 kernel: memory: usage 2097152kB, limit 2097152kB, failcnt 7832302
>> alxc9 kernel: memory+swap: usage 2097152kB, limit 2621440kB, failcnt 0
>> alxc9 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
>> alxc9 kernel: Memory cgroup stats for /lxc/cXXXX: cache:22708KB rss:2074444KB rss_huge:0KB 
>> mapped_file:19960KB writeback:4KB swap:0KB inactive_anon:20364KB active_anon:2074896KB 
>> inactive_file:1236KB active_file:464KB unevictable:0KB
>>
>> The backtrace for other processes is exactly the same. 
> 
> OK, so this is not the global OOM killer. That wasn't clear from your
> previous description. It makes a difference because it means that the
> system is still healthy globally and allocation requests will not loop
> forever in the allocator. The memcg charging path will not block
> waiting for the OOM to resolve; it returns ENOMEM when not called from
> the page fault path.

Yes, overall the machine is healthy; only the processes of one
particular container all go into uninterruptible sleep.
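
For reference, a minimal sketch of how the stuck tasks can be listed from
userspace by scanning /proc for processes in state D (uninterruptible
sleep). The helper names are hypothetical and this is only an
illustration, not the exact tooling used on the affected machine. It
looks at process entries only, not at individual threads.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def tasks_in_d_state(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in tasks_in_d_state():
        print(pid, comm)
```

Narrowing the list to a single container would additionally require
matching each pid against /proc/<pid>/cgroup, which is left out here.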

> 
> The memcg OOM killer ignores oom_kill_allocating_task, so the victim might
> be different from the current task. That means the victim might get stuck
> behind a lock held by somebody else. If the ext4 journaling code depends
> on memcg charges and retries endlessly, then the waiters would get stuck
> as well.

I've patched the cgroup OOM killer so that it honors
oom_kill_allocating_task.
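
For context, the knob in question is the vm.oom_kill_allocating_task
sysctl. A minimal sketch of checking it from userspace follows; the
function name is hypothetical, and the check only reads the global
sysctl value. It says nothing about whether a given kernel's memcg OOM
path actually consults it.

```python
def oom_kill_allocating_task_enabled(
        path="/proc/sys/vm/oom_kill_allocating_task"):
    """Return True if the sysctl file at `path` holds a non-zero value."""
    with open(path) as f:
        return int(f.read().strip()) != 0


if __name__ == "__main__":
    print(oom_kill_allocating_task_enabled())
```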

> 
> I can see some calls to find_or_create_page from fs/ext4/mballoc.c, but
> AFAIU they handle ENOMEM and lead to a transaction abort. I am not
> familiar enough with this code, though, so somebody familiar with ext4
> should double check that.
> 
> This all suggests that your lockup is most probably caused by something
> other than OOM.
> 
