mm,numad,rcu: hang on OOM

From: Sasha Levin <levinsasha928@gmail.com>
To: paulmck <paulmck@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>
Cc: linux-mm <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: mm,numad,rcu: hang on OOM
Date: Fri, 29 Jun 2012 18:44:41 +0200
Message-ID: <1340988281.2936.58.camel@lappy>

Hi all,

While fuzzing with trinity on a KVM tools guest running today's linux-next, I've hit the following lockup:

[  362.261729] INFO: task numad/2:27 blocked for more than 120 seconds.
[  362.263974] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  362.271684] numad/2         D 0000000000000001  5672    27      2 0x00000000
[  362.280052]  ffff8800294c7c58 0000000000000046 ffff8800294c7c08 ffffffff81163dba
[  362.294477]  ffff8800294c6000 ffff8800294c6010 ffff8800294c7fd8 ffff8800294c6000
[  362.306631]  ffff8800294c6010 ffff8800294c7fd8 ffff88000d5c3000 ffff8800294c8000
[  362.315395] Call Trace:
[  362.318556]  [<ffffffff81163dba>] ? __lock_release+0x1ba/0x1d0
[  362.325411]  [<ffffffff8372ab75>] schedule+0x55/0x60
[  362.328844]  [<ffffffff8372b965>] rwsem_down_failed_common+0xf5/0x130
[  362.332501]  [<ffffffff8115d38e>] ? put_lock_stats+0xe/0x40
[  362.334496]  [<ffffffff81160135>] ? __lock_contended+0x1f5/0x230
[  362.336723]  [<ffffffff8372b9d5>] rwsem_down_read_failed+0x15/0x17
[  362.339297]  [<ffffffff81985e34>] call_rwsem_down_read_failed+0x14/0x30
[  362.341768]  [<ffffffff83729a29>] ? down_read+0x79/0xa0
[  362.343669]  [<ffffffff8122d262>] ? lazy_migrate_process+0x22/0x60
[  362.345616]  [<ffffffff8122d262>] lazy_migrate_process+0x22/0x60
[  362.347464]  [<ffffffff811453c0>] process_mem_migrate+0x10/0x20
[  362.349340]  [<ffffffff81145090>] move_processes+0x190/0x230
[  362.351398]  [<ffffffff81145b7a>] numad_thread+0x7a/0x120
[  362.353245]  [<ffffffff81145b00>] ? find_busiest_node+0x310/0x310
[  362.355396]  [<ffffffff81119e82>] kthread+0xb2/0xc0
[  362.356996]  [<ffffffff8372ea34>] kernel_thread_helper+0x4/0x10
[  362.359253]  [<ffffffff8372ccb4>] ? retint_restore_args+0x13/0x13
[  362.361168]  [<ffffffff81119dd0>] ? __init_kthread_worker+0x70/0x70
[  362.363277]  [<ffffffff8372ea30>] ? gs_change+0x13/0x13

I hit sysrq-t to see what might be the cause, and it appears that an OOM was in progress and was stuck waiting on an RCU barrier:

[  578.086230] trinity-child69 D ffff8800277a54c8  3968  6658   6580 0x00000000
[  578.086230]  ffff880022c5f518 0000000000000046 ffff880022c5f4c8 ffff88001b9d6e00
[  578.086230]  ffff880022c5e000 ffff880022c5e010 ffff880022c5ffd8 ffff880022c5e000
[  578.086230]  ffff880022c5e010 ffff880022c5ffd8 ffff880023c08000 ffff880022c33000
[  578.086230] Call Trace:
[  578.086230]  [<ffffffff8372ab75>] schedule+0x55/0x60
[  578.086230]  [<ffffffff837285c8>] schedule_timeout+0x38/0x2c0
[  578.086230]  [<ffffffff81161d16>] ? mark_held_locks+0xf6/0x120
[  578.086230]  [<ffffffff81163dba>] ? __lock_release+0x1ba/0x1d0
[  578.086230]  [<ffffffff8372c67b>] ? _raw_spin_unlock_irq+0x2b/0x80
[  578.086230]  [<ffffffff8372a06f>] wait_for_common+0xff/0x170
[  578.086230]  [<ffffffff81132c10>] ? try_to_wake_up+0x290/0x290
[  578.086230]  [<ffffffff8372a188>] wait_for_completion+0x18/0x20
[  578.086230]  [<ffffffff811a5de7>] _rcu_barrier+0x4a7/0x4e0
[  578.086230]  [<ffffffff810705bd>] ? sched_clock+0x1d/0x30
[  578.086230]  [<ffffffff81134c95>] ? sched_clock_local+0x25/0x90
[  578.086230]  [<ffffffff81134e08>] ? sched_clock_cpu+0x108/0x120
[  578.086230]  [<ffffffff8116369c>] ? __lock_acquire+0x42c/0x4b0
[  578.086230]  [<ffffffff811a58d0>] ? rcu_barrier_func+0x70/0x70
[  578.086230]  [<ffffffff8115d38e>] ? put_lock_stats+0xe/0x40
[  578.086230]  [<ffffffff8115fe14>] ? __lock_acquired+0x2a4/0x2e0
[  578.086230]  [<ffffffff811a5e70>] rcu_barrier_bh+0x10/0x20
[  578.086230]  [<ffffffff811a5e96>] rcu_oom_notify+0x16/0x30
[  578.086230]  [<ffffffff81121f3e>] notifier_call_chain+0xee/0x130
[  578.086230]  [<ffffffff81122326>] __blocking_notifier_call_chain+0xa6/0xd0
[  578.086230]  [<ffffffff81122361>] blocking_notifier_call_chain+0x11/0x20
[  578.086230]  [<ffffffff811e3f14>] out_of_memory+0x44/0x240
[  578.086230]  [<ffffffff8372c560>] ? _raw_spin_unlock+0x30/0x60
[  578.086230]  [<ffffffff811eaabf>] __alloc_pages_slowpath+0x55f/0x6a0
[  578.086230]  [<ffffffff811ea305>] ? get_page_from_freelist+0x625/0x660
[  578.086230]  [<ffffffff811eae46>] __alloc_pages_nodemask+0x246/0x330
[  578.086230]  [<ffffffff8122cd0d>] alloc_pages_current+0xdd/0x110
[  578.086230]  [<ffffffff811df077>] __page_cache_alloc+0xc7/0xe0
[  578.086230]  [<ffffffff811e110f>] filemap_fault+0x35f/0x4c0
[  578.086230]  [<ffffffff8120e26e>] __do_fault+0xae/0x560
[  578.086230]  [<ffffffff8120ed81>] handle_pte_fault+0x81/0x1f0
[  578.086230]  [<ffffffff8120f219>] handle_mm_fault+0x329/0x350
[  578.086230]  [<ffffffff810a5211>] do_page_fault+0x421/0x450
[  578.086230]  [<ffffffff81208b6e>] ? might_fault+0x4e/0xa0
[  578.086230]  [<ffffffff81208b6e>] ? might_fault+0x4e/0xa0
[  578.086230]  [<ffffffff81163dba>] ? __lock_release+0x1ba/0x1d0
[  578.086230]  [<ffffffff81208b6e>] ? might_fault+0x4e/0xa0
[  578.086230]  [<ffffffff8109d301>] do_async_page_fault+0x31/0xb0
[  578.086230]  [<ffffffff8372cf95>] async_page_fault+0x25/0x30
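
To make the chain above easier to read, here is a minimal sketch that drops the frames marked with "?" (addresses found on the stack but not confirmed as part of the call chain) and keeps only the reliable ones. The helper name `reliable_frames` and the regular expression are illustrative, not part of any kernel tooling, and the sample is a shortened excerpt of the trace above.

```python
import re

# Matches one frame of a kernel call trace, e.g.
#   [  578.086230]  [<ffffffff811a5e70>] rcu_barrier_bh+0x10/0x20
# An optional "? " before the symbol marks an address that was found on the
# stack but not confirmed as part of the call chain.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<guess>\?\s+)?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace):
    """Return the symbols of frames not marked with '?', innermost first."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("guess"):
            frames.append(m.group("symbol"))
    return frames


# Shortened excerpt of the trinity-child69 trace (hypothetical sample input).
SAMPLE = """\
[  578.086230]  [<ffffffff8372a188>] wait_for_completion+0x18/0x20
[  578.086230]  [<ffffffff811a5de7>] _rcu_barrier+0x4a7/0x4e0
[  578.086230]  [<ffffffff810705bd>] ? sched_clock+0x1d/0x30
[  578.086230]  [<ffffffff811a58d0>] ? rcu_barrier_func+0x70/0x70
[  578.086230]  [<ffffffff811a5e70>] rcu_barrier_bh+0x10/0x20
[  578.086230]  [<ffffffff811a5e96>] rcu_oom_notify+0x16/0x30
[  578.086230]  [<ffffffff81121f3e>] notifier_call_chain+0xee/0x130
"""

if __name__ == "__main__":
    # Prints the confirmed call chain, innermost frame first.
    print(" <- ".join(reliable_frames(SAMPLE)))
```

With the speculative entries filtered out, the excerpt reads as the OOM notifier chain calling rcu_oom_notify, which calls rcu_barrier_bh and then sleeps in wait_for_completion.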

Other than that, there are several threads stuck in hugepage-related code trying to allocate:

[  578.086230] trinity-child72 D ffff880022cd84c8  3264  6661   6580 0x00000004
[  578.086230]  ffff880022ccd848 0000000000000046 ffff880022ccd7f8 ffffffff81163dba
[  578.086230]  ffff880022ccc000 ffff880022ccc010 ffff880022ccdfd8 ffff880022ccc000
[  578.086230]  ffff880022ccc010 ffff880022ccdfd8 ffff880027733000 ffff880022cd0000
[  578.086230] Call Trace:
[  578.086230]  [<ffffffff81163dba>] ? __lock_release+0x1ba/0x1d0
[  578.086230]  [<ffffffff8372ab75>] schedule+0x55/0x60
[  578.086230]  [<ffffffff83728806>] schedule_timeout+0x276/0x2c0
[  578.086230]  [<ffffffff810fe110>] ? lock_timer_base+0x70/0x70
[  578.086230]  [<ffffffff83728869>] schedule_timeout_uninterruptible+0x19/0x20
[  578.086230]  [<ffffffff811eaa4f>] __alloc_pages_slowpath+0x4ef/0x6a0
[  578.086230]  [<ffffffff811ea305>] ? get_page_from_freelist+0x625/0x660
[  578.086230]  [<ffffffff811eae46>] __alloc_pages_nodemask+0x246/0x330
[  578.086230]  [<ffffffff8122cd0d>] alloc_pages_current+0xdd/0x110
[  578.086230]  [<ffffffff810a9a16>] pte_alloc_one+0x16/0x40
[  578.086230]  [<ffffffff812099bd>] __pte_alloc+0x2d/0x1e0
[  578.086230]  [<ffffffff81245831>] do_huge_pmd_anonymous_page+0x151/0x230
[  578.086230]  [<ffffffff8120f0d3>] handle_mm_fault+0x1e3/0x350
[  578.086230]  [<ffffffff8120b0b7>] ? follow_page+0xe7/0x5a0
[  578.086230]  [<ffffffff8120f738>] __get_user_pages+0x438/0x5d0
[  578.086230]  [<ffffffff81210826>] __mlock_vma_pages_range+0xc6/0xd0
[  578.086230]  [<ffffffff81210a25>] mlock_vma_pages_range+0x75/0xb0
[  578.086230]  [<ffffffff8121463c>] mmap_region+0x4bc/0x5f0
[  578.086230]  [<ffffffff81214a29>] do_mmap_pgoff+0x2b9/0x350
[  578.086230]  [<ffffffff811ff39c>] ? vm_mmap_pgoff+0x6c/0xb0
[  578.086230]  [<ffffffff811ff3b4>] vm_mmap_pgoff+0x84/0xb0
[  578.086230]  [<ffffffff81211f32>] sys_mmap_pgoff+0x182/0x190
[  578.086230]  [<ffffffff81985efe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  578.086230]  [<ffffffff8106d4dd>] sys_mmap+0x1d/0x20
[  578.086230]  [<ffffffff8372d579>] system_call_fastpath+0x16/0x1b

And one thread that is still runnable, in the middle of reclaim:

[  578.086230] trinity-child70 R  running task     3440  6659   6580 0x00000004
[  578.086230]  ffff880022c7f5e8 0000000000000046 ffff880022c7f5b8 ffffffff81161d16
[  578.086230]  ffff880022c7e000 ffff880022c7e010 ffff880022c7ffd8 ffff880022c7e000
[  578.086230]  ffff880022c7e010 ffff880022c7ffd8 ffff880028e13000 ffff880022c80000
[  578.086230] Call Trace:
[  578.086230]  [<ffffffff81161d16>] ? mark_held_locks+0xf6/0x120
[  578.086230]  [<ffffffff8372af94>] preempt_schedule_irq+0x94/0xd0
[  578.086230]  [<ffffffff8372cde6>] retint_kernel+0x26/0x30
[  578.086230]  [<ffffffff8305c9a5>] ? shrink_zcache_memory+0xe5/0x110
[  578.086230]  [<ffffffff811f6c10>] shrink_slab+0xd0/0x520
[  578.086230]  [<ffffffff811f6b10>] ? shrink_zones+0x1f0/0x220
[  578.086230]  [<ffffffff811f7ee9>] do_try_to_free_pages+0x1c9/0x3e0
[  578.086230]  [<ffffffff811f8323>] try_to_free_pages+0x143/0x200
[  578.086230]  [<ffffffff8372c5f5>] ? _raw_spin_unlock_irqrestore+0x65/0xc0
[  578.086230]  [<ffffffff811e60db>] __perform_reclaim+0x8b/0xe0
[  578.086230]  [<ffffffff811ea967>] __alloc_pages_slowpath+0x407/0x6a0
[  578.086230]  [<ffffffff811ea305>] ? get_page_from_freelist+0x625/0x660
[  578.086230]  [<ffffffff811eae46>] __alloc_pages_nodemask+0x246/0x330
[  578.086230]  [<ffffffff8122cd0d>] alloc_pages_current+0xdd/0x110
[  578.086230]  [<ffffffff810a9a16>] pte_alloc_one+0x16/0x40
[  578.086230]  [<ffffffff812099bd>] __pte_alloc+0x2d/0x1e0
[  578.086230]  [<ffffffff81245831>] do_huge_pmd_anonymous_page+0x151/0x230
[  578.086230]  [<ffffffff8120f0d3>] handle_mm_fault+0x1e3/0x350
[  578.086230]  [<ffffffff8120b0b7>] ? follow_page+0xe7/0x5a0
[  578.086230]  [<ffffffff8120f738>] __get_user_pages+0x438/0x5d0
[  578.086230]  [<ffffffff81210826>] __mlock_vma_pages_range+0xc6/0xd0
[  578.086230]  [<ffffffff81210a25>] mlock_vma_pages_range+0x75/0xb0
[  578.086230]  [<ffffffff8121463c>] mmap_region+0x4bc/0x5f0
[  578.086230]  [<ffffffff81214a29>] do_mmap_pgoff+0x2b9/0x350
[  578.086230]  [<ffffffff811ff39c>] ? vm_mmap_pgoff+0x6c/0xb0
[  578.086230]  [<ffffffff811ff3b4>] vm_mmap_pgoff+0x84/0xb0
[  578.086230]  [<ffffffff81211f32>] sys_mmap_pgoff+0x182/0x190
[  578.086230]  [<ffffffff81985efe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  578.086230]  [<ffffffff8106d4dd>] sys_mmap+0x1d/0x20
[  578.086230]  [<ffffffff8372d579>] system_call_fastpath+0x16/0x1b

The rest of the threads weren't particularly interesting, so I guess that the problem is in one of the above.
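
For a quick overview of the four tasks quoted above, here is a rough sketch that pulls the state letter and PID out of each task header line. The field layout assumed in the regular expression (command name, state, saved PC or "running task", free stack, PID, parent PID, flags) is my reading of the sysrq-t output in this log and may not hold for other kernel versions; the `task_states` helper is hypothetical.

```python
import re

# Assumed layout of a task header line in this log:
#   comm, state letter, saved PC (or the text "running task"),
#   free stack, PID, parent PID, flags
HEADER_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+(?P<comm>\S+)\s+(?P<state>[A-Z])\s+"
    r"(?:running task|[0-9a-f]{16})\s+"
    r"(?P<free>\d+)\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s+0x(?P<flags>[0-9a-f]+)"
)


def task_states(log):
    """Map each task name to a (state letter, PID) tuple."""
    tasks = {}
    for line in log.splitlines():
        m = HEADER_RE.search(line)
        if m:
            tasks[m.group("comm")] = (m.group("state"), int(m.group("pid")))
    return tasks


# The four task header lines quoted in this report.
HEADERS = """\
[  362.271684] numad/2         D 0000000000000001  5672    27      2 0x00000000
[  578.086230] trinity-child69 D ffff8800277a54c8  3968  6658   6580 0x00000000
[  578.086230] trinity-child72 D ffff880022cd84c8  3264  6661   6580 0x00000004
[  578.086230] trinity-child70 R  running task     3440  6659   6580 0x00000004
"""

if __name__ == "__main__":
    for comm, (state, pid) in task_states(HEADERS).items():
        print(f"{comm:16} pid={pid:<5} state={state}")
```

State D is uninterruptible sleep and R is runnable, so three of the four tasks are blocked and only trinity-child70 is still making progress.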
