linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* mm: hung task (handle_pte_fault)
@ 2012-03-23 10:45 Sasha Levin
  2012-03-25 11:10 ` Avi Kivity
  2012-03-26 23:17 ` Andrew Morton
  0 siblings, 2 replies; 6+ messages in thread
From: Sasha Levin @ 2012-03-23 10:45 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar
  Cc: Pekka Enberg, Dave Jones, linux-mm,
	linux-kernel@vger.kernel.org List

Hi guys,

During fuzzing using trinity inside a KVM tools guest with latest
linux-next, I seem to be getting it hung once in a while, with the
following spew:

[ 1441.420617] INFO: task trinity:2706 blocked for more than 120 seconds.
[ 1441.421894] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1441.424493] trinity         D 0000000000000000  3472  2706  16846 0x00000004
[ 1441.426749]  ffff880029dbbb38 0000000000000086 ffff880029dbbae8
ffff880029dbbfd8
[ 1441.428582]  00000000001d45c0 ffff880029dba000 00000000001d45c0
00000000001d45c0
[ 1441.430156]  00000000001d45c0 00000000001d45c0 ffff880029dbbfd8
00000000001d45c0
[ 1441.432371] Call Trace:
[ 1441.433042]  [<ffffffff81176dd0>] ? file_read_actor+0x1d0/0x1d0
[ 1441.434251]  [<ffffffff827148d4>] schedule+0x24/0x70
[ 1441.435314]  [<ffffffff82714e67>] io_schedule+0x87/0xd0
[ 1441.436357]  [<ffffffff81176dd9>] sleep_on_page+0x9/0x10
[ 1441.437442]  [<ffffffff82712cb7>] __wait_on_bit+0x57/0x80
[ 1441.438584]  [<ffffffff81177bff>] ? __lock_page_or_retry+0x8f/0xd0
[ 1441.439948]  [<ffffffff811775de>] wait_on_page_bit+0x6e/0x80
[ 1441.440859]  [<ffffffff810d6d20>] ? autoremove_wake_function+0x40/0x40
[ 1441.441700]  [<ffffffff810dbfbe>] ? up_read+0x1e/0x40
[ 1441.442428]  [<ffffffff81177c36>] __lock_page_or_retry+0xc6/0xd0
[ 1441.443270]  [<ffffffff81178490>] filemap_fault+0x440/0x4e0
[ 1441.444072]  [<ffffffff811991cf>] __do_fault+0x7f/0x5f0
[ 1441.444829]  [<ffffffff81112c00>] ?
add_lock_to_list.clone.18.clone.27+0xd0/0xe0
[ 1441.445886]  [<ffffffff8119cd27>] handle_pte_fault+0xf7/0x1e0
[ 1441.446740]  [<ffffffff8119e1ce>] handle_mm_fault+0x1ce/0x330
[ 1441.447537]  [<ffffffff8119e53c>] __get_user_pages+0x14c/0x640
[ 1441.448399]  [<ffffffff811129ae>] ? put_lock_stats.clone.19+0xe/0x40
[ 1441.449288]  [<ffffffff81117b1d>] ? __lock_acquired+0x19d/0x270
[ 1441.450164]  [<ffffffff811a0087>] __mlock_vma_pages_range+0x87/0xa0
[ 1441.451127]  [<ffffffff811a0129>] do_mlock_pages+0x89/0x160
[ 1441.451932]  [<ffffffff811a0b71>] sys_mlockall+0x111/0x1a0
[ 1441.452761]  [<ffffffff827176bd>] system_call_fastpath+0x1a/0x1f
[ 1441.453659] no locks held by trinity/2706.
[ 1441.454267] Kernel panic - not syncing: hung_task: blocked tasks

According to the logs, it's not the direct result of anything specific
happening, so I can't give an exact scenario to reproduce it. It does
happen rather often.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mm: hung task (handle_pte_fault)
  2012-03-23 10:45 mm: hung task (handle_pte_fault) Sasha Levin
@ 2012-03-25 11:10 ` Avi Kivity
  2012-03-26 23:17 ` Andrew Morton
  1 sibling, 0 replies; 6+ messages in thread
From: Avi Kivity @ 2012-03-25 11:10 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Andrew Morton, Hugh Dickins, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Pekka Enberg, Dave Jones, linux-mm,
	linux-kernel@vger.kernel.org List

On 03/23/2012 12:45 PM, Sasha Levin wrote:
> Hi guys,
>
> During fuzzing using trinity inside a KVM tools guest with latest
> linux-next, I seem to be getting it hung once in a while, with the
> following spew:
>
> [ 1441.420617] INFO: task trinity:2706 blocked for more than 120 seconds.
> [ 1441.421894] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 1441.424493] trinity         D 0000000000000000  3472  2706  16846 0x00000004
> [ 1441.426749]  ffff880029dbbb38 0000000000000086 ffff880029dbbae8
> ffff880029dbbfd8
> [ 1441.428582]  00000000001d45c0 ffff880029dba000 00000000001d45c0
> 00000000001d45c0
> [ 1441.430156]  00000000001d45c0 00000000001d45c0 ffff880029dbbfd8
> 00000000001d45c0
> [ 1441.432371] Call Trace:
> [ 1441.433042]  [<ffffffff81176dd0>] ? file_read_actor+0x1d0/0x1d0
> [ 1441.434251]  [<ffffffff827148d4>] schedule+0x24/0x70
> [ 1441.435314]  [<ffffffff82714e67>] io_schedule+0x87/0xd0
> [ 1441.436357]  [<ffffffff81176dd9>] sleep_on_page+0x9/0x10
> [ 1441.437442]  [<ffffffff82712cb7>] __wait_on_bit+0x57/0x80
> [ 1441.438584]  [<ffffffff81177bff>] ? __lock_page_or_retry+0x8f/0xd0
> [ 1441.439948]  [<ffffffff811775de>] wait_on_page_bit+0x6e/0x80
> [ 1441.440859]  [<ffffffff810d6d20>] ? autoremove_wake_function+0x40/0x40
> [ 1441.441700]  [<ffffffff810dbfbe>] ? up_read+0x1e/0x40
> [ 1441.442428]  [<ffffffff81177c36>] __lock_page_or_retry+0xc6/0xd0
> [ 1441.443270]  [<ffffffff81178490>] filemap_fault+0x440/0x4e0
> [ 1441.444072]  [<ffffffff811991cf>] __do_fault+0x7f/0x5f0
> [ 1441.444829]  [<ffffffff81112c00>] ?
> add_lock_to_list.clone.18.clone.27+0xd0/0xe0
> [ 1441.445886]  [<ffffffff8119cd27>] handle_pte_fault+0xf7/0x1e0
> [ 1441.446740]  [<ffffffff8119e1ce>] handle_mm_fault+0x1ce/0x330
> [ 1441.447537]  [<ffffffff8119e53c>] __get_user_pages+0x14c/0x640
> [ 1441.448399]  [<ffffffff811129ae>] ? put_lock_stats.clone.19+0xe/0x40
> [ 1441.449288]  [<ffffffff81117b1d>] ? __lock_acquired+0x19d/0x270
> [ 1441.450164]  [<ffffffff811a0087>] __mlock_vma_pages_range+0x87/0xa0
> [ 1441.451127]  [<ffffffff811a0129>] do_mlock_pages+0x89/0x160
> [ 1441.451932]  [<ffffffff811a0b71>] sys_mlockall+0x111/0x1a0
> [ 1441.452761]  [<ffffffff827176bd>] system_call_fastpath+0x1a/0x1f
> [ 1441.453659] no locks held by trinity/2706.
> [ 1441.454267] Kernel panic - not syncing: hung_task: blocked tasks
>
> According to the logs, it's not the direct result of anything specific
> happening, so I can't give an exact scenario to reproduce it. It does
> happen rather often.
>

Ingo notes that this could well be a kvm problem.  Does your processor
support EPT/NPT?  Can you provide kvm_stat and traces of what's
happening in the host when this triggers?

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mm: hung task (handle_pte_fault)
  2012-03-23 10:45 mm: hung task (handle_pte_fault) Sasha Levin
  2012-03-25 11:10 ` Avi Kivity
@ 2012-03-26 23:17 ` Andrew Morton
  2012-03-27 23:53   ` Sasha Levin
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2012-03-26 23:17 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Hugh Dickins, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Pekka Enberg, Dave Jones, linux-mm,
	linux-kernel@vger.kernel.org List

On Fri, 23 Mar 2012 12:45:03 +0200
Sasha Levin <levinsasha928@gmail.com> wrote:

> Hi guys,
> 
> During fuzzing using trinity inside a KVM tools guest with latest
> linux-next, I seem to be getting it hung once in a while, with the
> following spew:
> 
> [ 1441.420617] INFO: task trinity:2706 blocked for more than 120 seconds.
> [ 1441.421894] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 1441.424493] trinity         D 0000000000000000  3472  2706  16846 0x00000004
> [ 1441.426749]  ffff880029dbbb38 0000000000000086 ffff880029dbbae8
> ffff880029dbbfd8
> [ 1441.428582]  00000000001d45c0 ffff880029dba000 00000000001d45c0
> 00000000001d45c0
> [ 1441.430156]  00000000001d45c0 00000000001d45c0 ffff880029dbbfd8
> 00000000001d45c0
> [ 1441.432371] Call Trace:
> [ 1441.433042]  [<ffffffff81176dd0>] ? file_read_actor+0x1d0/0x1d0
> [ 1441.434251]  [<ffffffff827148d4>] schedule+0x24/0x70
> [ 1441.435314]  [<ffffffff82714e67>] io_schedule+0x87/0xd0
> [ 1441.436357]  [<ffffffff81176dd9>] sleep_on_page+0x9/0x10
> [ 1441.437442]  [<ffffffff82712cb7>] __wait_on_bit+0x57/0x80
> [ 1441.438584]  [<ffffffff81177bff>] ? __lock_page_or_retry+0x8f/0xd0
> [ 1441.439948]  [<ffffffff811775de>] wait_on_page_bit+0x6e/0x80
> [ 1441.440859]  [<ffffffff810d6d20>] ? autoremove_wake_function+0x40/0x40
> [ 1441.441700]  [<ffffffff810dbfbe>] ? up_read+0x1e/0x40
> [ 1441.442428]  [<ffffffff81177c36>] __lock_page_or_retry+0xc6/0xd0
> [ 1441.443270]  [<ffffffff81178490>] filemap_fault+0x440/0x4e0
> [ 1441.444072]  [<ffffffff811991cf>] __do_fault+0x7f/0x5f0
> [ 1441.444829]  [<ffffffff81112c00>] ?
> add_lock_to_list.clone.18.clone.27+0xd0/0xe0
> [ 1441.445886]  [<ffffffff8119cd27>] handle_pte_fault+0xf7/0x1e0
> [ 1441.446740]  [<ffffffff8119e1ce>] handle_mm_fault+0x1ce/0x330
> [ 1441.447537]  [<ffffffff8119e53c>] __get_user_pages+0x14c/0x640
> [ 1441.448399]  [<ffffffff811129ae>] ? put_lock_stats.clone.19+0xe/0x40
> [ 1441.449288]  [<ffffffff81117b1d>] ? __lock_acquired+0x19d/0x270
> [ 1441.450164]  [<ffffffff811a0087>] __mlock_vma_pages_range+0x87/0xa0
> [ 1441.451127]  [<ffffffff811a0129>] do_mlock_pages+0x89/0x160
> [ 1441.451932]  [<ffffffff811a0b71>] sys_mlockall+0x111/0x1a0
> [ 1441.452761]  [<ffffffff827176bd>] system_call_fastpath+0x1a/0x1f
> [ 1441.453659] no locks held by trinity/2706.
> [ 1441.454267] Kernel panic - not syncing: hung_task: blocked tasks
> 
> According to the logs, it's not the direct result of anything specific
> happening, so I can't give an exact scenario to reproduce it. It does
> happen rather often.

The task is waiting for IO to complete against a page, and it isn't
happening.

There are quite a lot of things which could cause this, alas.  VM,
readahead, scheduler, core wait/wakeup code, IO system, interrupt
system (if it happens outside KVM, I guess).

So....  ugh.  Hopefully someone will hit this in a situation where it
can be narrowed down or bisected.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mm: hung task (handle_pte_fault)
  2012-03-26 23:17 ` Andrew Morton
@ 2012-03-27 23:53   ` Sasha Levin
  2012-03-28  4:53     ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Sasha Levin @ 2012-03-27 23:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Pekka Enberg, Dave Jones, linux-mm,
	linux-kernel@vger.kernel.org List

On Tue, Mar 27, 2012 at 1:17 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> The task is waiting for IO to complete against a page, and it isn't
> happening.
>
> There are quite a lot of things which could cause this, alas.  VM,
> readahead, scheduler, core wait/wakeup code, IO system, interrupt
> system (if it happens outside KVM, I guess).
>
> So....  ugh.  Hopefully someone will hit this in a situation where it
> can be narrowed down or bisected.

I've only managed to reproduce it once, and was unable to get anything
useful out of it due to technical reasons.

The good part is that I've managed to hit something similar (although
I'm not 100% sure it's the same problem as the one in the original
mail).

Here's the spew:

[ 1450.628565] BUG: sleeping function called from invalid context at
fs/proc/task_mmu.c:826
[ 1450.632828] in_atomic(): 1, irqs_disabled(): 0, pid: 17086, name: trinity
[ 1450.637242] 2 locks held by trinity/17086:
[ 1450.639308]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8124aee3>]
pagemap_read+0x263/0x340
[ 1450.656425]  #1:  (&(&mm->page_table_lock)->rlock){+.+.-.}, at:
[<ffffffff8124b410>] pagemap_pte_range+0x70/0x2d0
[ 1450.669409] Pid: 17086, comm: trinity Not tainted
3.3.0-next-20120327-sasha #70
[ 1450.674559] Call Trace:
[ 1450.676252]  [<ffffffff810eb609>] __might_sleep+0x149/0x200
[ 1450.679876]  [<ffffffff8124b410>] ? pagemap_pte_range+0x70/0x2d0
[ 1450.681392]  [<ffffffff8124b4bf>] pagemap_pte_range+0x11f/0x2d0
[ 1450.684906]  [<ffffffff811191bf>] ? lock_release_non_nested+0x30f/0x350
[ 1450.687309]  [<ffffffff811ad5e8>] walk_pmd_range+0x118/0x200
[ 1450.689444]  [<ffffffff811ad7e8>] walk_pud_range+0x118/0x150
[ 1450.691498]  [<ffffffff811ada31>] walk_page_range+0x211/0x260
[ 1450.693574]  [<ffffffff8124aef2>] pagemap_read+0x272/0x340
[ 1450.695271]  [<ffffffff8124b3a0>] ? clear_refs_pte_range+0x190/0x190
[ 1450.700743]  [<ffffffff81249b10>] ? get_vmalloc_info+0x120/0x120
[ 1450.704875]  [<ffffffff8124b010>] ? m_stop+0x50/0x50
[ 1450.707599]  [<ffffffff811e3a93>] vfs_read+0xc3/0x180
[ 1450.709751]  [<ffffffff811e3e3f>] sys_read+0x4f/0xa0
[ 1450.713269]  [<ffffffff826d107d>] system_call_fastpath+0x1a/0x1f
[ 1450.715966] BUG: scheduling while atomic: trinity/17086/0x10000002
[ 1450.722426] 2 locks held by trinity/17086:
[ 1450.724283]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8124aee3>]
pagemap_read+0x263/0x340
[ 1450.727407]  #1:  (&(&mm->page_table_lock)->rlock){+.+.-.}, at:
[<ffffffff8124b410>] pagemap_pte_range+0x70/0x2d0
[ 1450.732867] Pid: 17086, comm: trinity Not tainted
3.3.0-next-20120327-sasha #70
[ 1450.735580] Call Trace:
[ 1450.736529]  [<ffffffff810e1b9c>] __schedule_bug+0x8c/0xa0
[ 1450.738627]  [<ffffffff826ce28d>] __schedule+0x6bd/0x6d0
[ 1450.740642]  [<ffffffff810e7e83>] __cond_resched+0x13/0x20
[ 1450.742691]  [<ffffffff826ce4dc>] _cond_resched+0x2c/0x40
[ 1450.745357]  [<ffffffff8124b4c4>] pagemap_pte_range+0x124/0x2d0
[ 1450.747744]  [<ffffffff811191bf>] ? lock_release_non_nested+0x30f/0x350
[ 1450.752051]  [<ffffffff811ad5e8>] walk_pmd_range+0x118/0x200
[ 1450.755811]  [<ffffffff811ad7e8>] walk_pud_range+0x118/0x150
[ 1450.758000]  [<ffffffff811ada31>] walk_page_range+0x211/0x260
[ 1450.760054]  [<ffffffff8124aef2>] pagemap_read+0x272/0x340
[ 1450.762597]  [<ffffffff8124b3a0>] ? clear_refs_pte_range+0x190/0x190
[ 1450.765484]  [<ffffffff81249b10>] ? get_vmalloc_info+0x120/0x120
[ 1450.767166]  [<ffffffff8124b010>] ? m_stop+0x50/0x50
[ 1450.768858]  [<ffffffff811e3a93>] vfs_read+0xc3/0x180
[ 1450.770809]  [<ffffffff811e3e3f>] sys_read+0x4f/0xa0
[ 1450.772074]  [<ffffffff826d107d>] system_call_fastpath+0x1a/0x1f
[ 1450.791135]
[ 1450.791397] ======================================================
[ 1450.792019] [ INFO: possible circular locking dependency detected ]
[ 1450.792019] 3.3.0-next-20120327-sasha #70 Not tainted
[ 1450.792019] -------------------------------------------------------
[ 1450.792019] trinity/17086 is trying to acquire lock:
[ 1450.792019]  (&mm->mmap_sem){++++++}, at: [<ffffffff8124aee3>]
pagemap_read+0x263/0x340
[ 1450.792019]
[ 1450.792019] but task is already holding lock:
[ 1450.792019]  (&(&mm->page_table_lock)->rlock){+.+.-.}, at:
[<ffffffff8124b410>] pagemap_pte_range+0x70/0x2d0
[ 1450.792019]
[ 1450.792019] which lock already depends on the new lock.
[ 1450.792019]
[ 1450.792019]
[ 1450.792019] the existing dependency chain (in reverse order) is:
[ 1450.813449]
[ 1450.813449] -> #1 (&(&mm->page_table_lock)->rlock){+.+.-.}:
[ 1450.813449]        [<ffffffff81115b9c>] validate_chain.clone.26+0x88c/0x960
[ 1450.813449]        [<ffffffff81118735>] __lock_acquire+0x3f5/0xb70
[ 1450.813449]        [<ffffffff81119553>] lock_acquire+0xc3/0x100
[ 1450.813449]        [<ffffffff826cf71b>] _raw_spin_lock+0x3b/0x70
[ 1450.813449]        [<ffffffff8119cd84>] __pmd_alloc+0x44/0x100
[ 1450.813449]        [<ffffffff811a4fce>] alloc_new_pmd.clone.3+0x13e/0x160
[ 1450.813449]        [<ffffffff811a52cd>] move_page_tables+0x12d/0x3a0
[ 1450.813449]        [<ffffffff811ea1e7>] shift_arg_pages+0xc7/0x190
[ 1450.813449]        [<ffffffff811ebb5b>] setup_arg_pages+0x1db/0x200
[ 1450.813449]        [<ffffffff8123a445>] load_elf_binary+0x455/0xe20
[ 1450.813449]        [<ffffffff811ec3f1>] search_binary_handler+0x141/0x2d0
[ 1450.813449]        [<ffffffff811ec82d>] do_execve_common.clone.32+0x2ad/0x340
[ 1450.813449]        [<ffffffff811ec8d6>] do_execve+0x16/0x20
[ 1450.813449]        [<ffffffff810579b5>] sys_execve+0x45/0x70
[ 1450.813449]        [<ffffffff826d26a8>] kernel_execve+0x68/0xd0
[ 1450.813449]        [<ffffffff81002130>] init_post+0xb0/0xd0
[ 1450.813449]        [<ffffffff83927f3b>] kernel_init+0x1d9/0x1eb
[ 1450.813449]        [<ffffffff826d2634>] kernel_thread_helper+0x4/0x10
[ 1450.813449]
[ 1450.813449] -> #0 (&mm->mmap_sem){++++++}:
[ 1450.813449]        [<ffffffff811152e1>] check_prev_add+0x6b1/0x6e0
[ 1450.813449]        [<ffffffff81115b9c>] validate_chain.clone.26+0x88c/0x960
[ 1450.813449]        [<ffffffff81118735>] __lock_acquire+0x3f5/0xb70
[ 1450.813449]        [<ffffffff81119553>] lock_acquire+0xc3/0x100
[ 1450.813449]        [<ffffffff826cd707>] down_read+0x47/0x90
[ 1450.813449]        [<ffffffff8124aee3>] pagemap_read+0x263/0x340
[ 1450.813449]        [<ffffffff811e3a93>] vfs_read+0xc3/0x180
[ 1450.813449]        [<ffffffff811e3e3f>] sys_read+0x4f/0xa0
[ 1450.813449]        [<ffffffff826d107d>] system_call_fastpath+0x1a/0x1f
[ 1450.813449]
[ 1450.813449] other info that might help us debug this:
[ 1450.813449]
[ 1450.813449]  Possible unsafe locking scenario:
[ 1450.813449]
[ 1450.813449]        CPU0                    CPU1
[ 1450.813449]        ----                    ----
[ 1450.813449]   lock(&(&mm->page_table_lock)->rlock);
[ 1450.813449]                                lock(&mm->mmap_sem);
[ 1450.813449]
lock(&(&mm->page_table_lock)->rlock);
[ 1450.813449]   lock(&mm->mmap_sem);
[ 1450.813449]
[ 1450.813449]  *** DEADLOCK ***
[ 1450.813449]
[ 1450.813449] 1 lock held by trinity/17086:
[ 1450.813449]  #0:  (&(&mm->page_table_lock)->rlock){+.+.-.}, at:
[<ffffffff8124b410>] pagemap_pte_range+0x70/0x2d0
[ 1450.813449]
[ 1450.813449] stack backtrace:
[ 1450.813449] Pid: 17086, comm: trinity Not tainted
3.3.0-next-20120327-sasha #70
[ 1450.813449] Call Trace:
[ 1450.813449]  [<ffffffff81113fcf>] print_circular_bug+0x10f/0x120
[ 1450.813449]  [<ffffffff811152e1>] check_prev_add+0x6b1/0x6e0
[ 1450.813449]  [<ffffffff81115b9c>] validate_chain.clone.26+0x88c/0x960
[ 1450.813449]  [<ffffffff81118735>] __lock_acquire+0x3f5/0xb70
[ 1450.813449]  [<ffffffff8124b410>] ? pagemap_pte_range+0x70/0x2d0
[ 1450.813449]  [<ffffffff8124b410>] ? pagemap_pte_range+0x70/0x2d0
[ 1450.813449]  [<ffffffff811ad5e8>] ? walk_pmd_range+0x118/0x200
[ 1450.813449]  [<ffffffff81119553>] lock_acquire+0xc3/0x100
[ 1450.813449]  [<ffffffff8124aee3>] ? pagemap_read+0x263/0x340
[ 1450.813449]  [<ffffffff826cd707>] down_read+0x47/0x90
[ 1450.813449]  [<ffffffff8124aee3>] ? pagemap_read+0x263/0x340
[ 1450.813449]  [<ffffffff810dc02e>] ? up_read+0x1e/0x40
[ 1450.813449]  [<ffffffff8124aee3>] pagemap_read+0x263/0x340
[ 1450.813449]  [<ffffffff8124b3a0>] ? clear_refs_pte_range+0x190/0x190
[ 1450.813449]  [<ffffffff81249b10>] ? get_vmalloc_info+0x120/0x120
[ 1450.813449]  [<ffffffff8124b010>] ? m_stop+0x50/0x50
[ 1450.813449]  [<ffffffff811e3a93>] vfs_read+0xc3/0x180
[ 1450.813449]  [<ffffffff811e3e3f>] sys_read+0x4f/0xa0
[ 1450.813449]  [<ffffffff826d107d>] system_call_fastpath+0x1a/0x1f
391000 iterations.
606000 iterations.
[ 1476.139003] BUG: soft lockup - CPU#0 stuck for 22s! [trinity:17086]
[ 1476.139012] irq event stamp: 4409
[ 1476.139012] hardirqs last  enabled at (4409): [<ffffffff826cfa4b>]
_raw_spin_unlock_irq+0x2b/0x70
[ 1476.139012] hardirqs last disabled at (4408): [<ffffffff826cf834>]
_raw_spin_lock_irq+0x24/0x90
[ 1476.139012] softirqs last  enabled at (4406): [<ffffffff810b9ac3>]
__do_softirq+0x133/0x180
[ 1476.139012] softirqs last disabled at (4361): [<ffffffff826d272c>]
call_softirq+0x1c/0x30
[ 1476.139012] CPU 0
[ 1476.139012] Pid: 17086, comm: trinity Not tainted
3.3.0-next-20120327-sasha #70
[ 1476.139012] RIP: 0010:[<ffffffff81056b6c>]  [<ffffffff81056b6c>]
native_read_tsc+0xc/0x20
[ 1476.139012] RSP: 0018:ffff880006a71c00  EFLAGS: 00000286
[ 1476.139012] RAX: 0000000000000359 RBX: ffffffff826d07b4 RCX: 00000000127bc595
[ 1476.139012] RDX: 0000000000000359 RSI: ffffffff82ed0b9f RDI: 0000000000000001
[ 1476.139012] RBP: ffff880006a71c38 R08: 0000000000000000 R09: 0000000000000002
[ 1476.139012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880006a71b78
[ 1476.139012] R13: ffff8800253a8000 R14: ffff880006a70000 R15: ffff880006a71fd8
[ 1476.139012] FS:  00007feb180a7700(0000) GS:ffff88003d600000(0000)
knlGS:0000000000000000
[ 1476.139012] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1476.139012] CR2: 0000000000ec86b0 CR3: 0000000023814000 CR4: 00000000000406f0
[ 1476.139012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1476.139012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1476.139012] Process trinity (pid: 17086, threadinfo
ffff880006a70000, task ffff8800253a8000)
[ 1476.139012] Stack:
[ 1476.139012]  ffffffff818850df 0000000000000018 ffff8800350ad060
0000000011c28537
[ 1476.139012]  00000000948b09e0 0000000000000001 ffff880035177b10
ffff880006a71c48
[ 1476.139012]  ffffffff818851ea ffff880006a71c88 ffffffff81899972
ffff880006a71c88
[ 1476.139012] Call Trace:
[ 1476.139012]  [<ffffffff818850df>] ? delay_tsc+0x3f/0x120
[ 1476.139012]  [<ffffffff818851ea>] __delay+0xa/0x10
[ 1476.139012]  [<ffffffff81899972>] do_raw_spin_lock+0xf2/0x140
[ 1476.139012]  [<ffffffff826cf73b>] _raw_spin_lock+0x5b/0x70
[ 1476.139012]  [<ffffffff8124b410>] ? pagemap_pte_range+0x70/0x2d0
[ 1476.139012]  [<ffffffff8188640e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1476.139012]  [<ffffffff8124b410>] pagemap_pte_range+0x70/0x2d0
[ 1476.139012]  [<ffffffff811ad5e8>] walk_pmd_range+0x118/0x200
[ 1476.139012]  [<ffffffff811ad7e8>] walk_pud_range+0x118/0x150
[ 1476.139012]  [<ffffffff811ada31>] walk_page_range+0x211/0x260
[ 1476.139012]  [<ffffffff8124aef2>] pagemap_read+0x272/0x340
[ 1476.139012]  [<ffffffff8124b3a0>] ? clear_refs_pte_range+0x190/0x190
[ 1476.139012]  [<ffffffff81249b10>] ? get_vmalloc_info+0x120/0x120
[ 1476.139012]  [<ffffffff8124b010>] ? m_stop+0x50/0x50
[ 1476.139012]  [<ffffffff811e3a93>] vfs_read+0xc3/0x180
[ 1476.139012]  [<ffffffff811e3e3f>] sys_read+0x4f/0xa0
[ 1476.139012]  [<ffffffff826d107d>] system_call_fastpath+0x1a/0x1f
[ 1476.139012] Code: 02 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08
5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31$
[ 1476.139012] Call Trace:
[ 1476.139012]  [<ffffffff818850df>] ? delay_tsc+0x3f/0x120
[ 1476.139012]  [<ffffffff818851ea>] __delay+0xa/0x10
[ 1476.139012]  [<ffffffff81899972>] do_raw_spin_lock+0xf2/0x140
[ 1476.139012]  [<ffffffff826cf73b>] _raw_spin_lock+0x5b/0x70
[ 1476.139012]  [<ffffffff8124b410>] ? pagemap_pte_range+0x70/0x2d0
[ 1476.139012]  [<ffffffff8188640e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1476.139012]  [<ffffffff8124b410>] pagemap_pte_range+0x70/0x2d0
[ 1476.139012]  [<ffffffff811ad5e8>] walk_pmd_range+0x118/0x200
[ 1476.139012]  [<ffffffff811ad7e8>] walk_pud_range+0x118/0x150
[ 1476.139012]  [<ffffffff811ada31>] walk_page_range+0x211/0x260
[ 1476.139012]  [<ffffffff8124aef2>] pagemap_read+0x272/0x340
[ 1476.139012]  [<ffffffff8124b3a0>] ? clear_refs_pte_range+0x190/0x190
[ 1476.139012]  [<ffffffff81249b10>] ? get_vmalloc_info+0x120/0x120
[ 1476.139012]  [<ffffffff8124b010>] ? m_stop+0x50/0x50
[ 1476.139012]  [<ffffffff811e3a93>] vfs_read+0xc3/0x180
[ 1476.139012]  [<ffffffff811e3e3f>] sys_read+0x4f/0xa0
[ 1476.139012]  [<ffffffff826d107d>] system_call_fastpath+0x1a/0x1f
[ 1476.139012] Kernel panic - not syncing: softlockup: hung tasks
[ 1476.139012] Rebooting in 1 seconds..  # lkvm run -k ./bzImage -m
1024 -c 8 --name run

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mm: hung task (handle_pte_fault)
  2012-03-27 23:53   ` Sasha Levin
@ 2012-03-28  4:53     ` Hugh Dickins
  2012-03-28 17:35       ` Naoya Horiguchi
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2012-03-28  4:53 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Andrew Morton, Naoya Horiguchi, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Pekka Enberg, Dave Jones, linux-mm, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 14109 bytes --]

On Wed, 28 Mar 2012, Sasha Levin wrote:
> On Tue, Mar 27, 2012 at 1:17 AM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > The task is waiting for IO to complete against a page, and it isn't
> > happening.
> >
> > There are quite a lot of things which could cause this, alas.  VM,
> > readahead, scheduler, core wait/wakeup code, IO system, interrupt
> > system (if it happens outside KVM, I guess).
> >
> > So....  ugh.  Hopefully someone will hit this in a situation where it
> > can be narrowed down or bisected.
> 
> I've only managed to reproduce it once, and was unable to get anything
> useful out of it due to technical reasons.
> 
> The good part is that I've managed to hit something similar (although
> I'm not 100% sure it's the same problem as the one in the original
> mail).

I don't think this one has anything to do with the first you posted,
but it does look like a good catch against current linux-next, where
pagemap_pte_range() appears to do a spin_lock(&walk->mm->page_table_lock)
which should have been removed by "thp: optimize away unnecessary page
table locking".  Some kind of mismerge perhaps: Horiguchi-san added to Cc.

Hugh

> 
> Here's the spew:
> 
> [ 1450.628565] BUG: sleeping function called from invalid context at
> fs/proc/task_mmu.c:826
> [ 1450.632828] in_atomic(): 1, irqs_disabled(): 0, pid: 17086, name: trinity
> [ 1450.637242] 2 locks held by trinity/17086:
> [ 1450.639308]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8124aee3>]
> pagemap_read+0x263/0x340
> [ 1450.656425]  #1:  (&(&mm->page_table_lock)->rlock){+.+.-.}, at:
> [<ffffffff8124b410>] pagemap_pte_range+0x70/0x2d0
> [ 1450.669409] Pid: 17086, comm: trinity Not tainted
> 3.3.0-next-20120327-sasha #70
> [ 1450.674559] Call Trace:
> [ 1450.676252]  [<ffffffff810eb609>] __might_sleep+0x149/0x200
> [ 1450.679876]  [<ffffffff8124b410>] ? pagemap_pte_range+0x70/0x2d0
> [ 1450.681392]  [<ffffffff8124b4bf>] pagemap_pte_range+0x11f/0x2d0
> [ 1450.684906]  [<ffffffff811191bf>] ? lock_release_non_nested+0x30f/0x350
> [ 1450.687309]  [<ffffffff811ad5e8>] walk_pmd_range+0x118/0x200
> [ 1450.689444]  [<ffffffff811ad7e8>] walk_pud_range+0x118/0x150
> [ 1450.691498]  [<ffffffff811ada31>] walk_page_range+0x211/0x260
> [ 1450.693574]  [<ffffffff8124aef2>] pagemap_read+0x272/0x340
> [ 1450.695271]  [<ffffffff8124b3a0>] ? clear_refs_pte_range+0x190/0x190
> [ 1450.700743]  [<ffffffff81249b10>] ? get_vmalloc_info+0x120/0x120
> [ 1450.704875]  [<ffffffff8124b010>] ? m_stop+0x50/0x50
> [ 1450.707599]  [<ffffffff811e3a93>] vfs_read+0xc3/0x180
> [ 1450.709751]  [<ffffffff811e3e3f>] sys_read+0x4f/0xa0
> [ 1450.713269]  [<ffffffff826d107d>] system_call_fastpath+0x1a/0x1f
> [ 1450.715966] BUG: scheduling while atomic: trinity/17086/0x10000002
> [ 1450.722426] 2 locks held by trinity/17086:
> [ 1450.724283]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8124aee3>]
> pagemap_read+0x263/0x340
> [ 1450.727407]  #1:  (&(&mm->page_table_lock)->rlock){+.+.-.}, at:
> [<ffffffff8124b410>] pagemap_pte_range+0x70/0x2d0
> [ 1450.732867] Pid: 17086, comm: trinity Not tainted
> 3.3.0-next-20120327-sasha #70
> [ 1450.735580] Call Trace:
> [ 1450.736529]  [<ffffffff810e1b9c>] __schedule_bug+0x8c/0xa0
> [ 1450.738627]  [<ffffffff826ce28d>] __schedule+0x6bd/0x6d0
> [ 1450.740642]  [<ffffffff810e7e83>] __cond_resched+0x13/0x20
> [ 1450.742691]  [<ffffffff826ce4dc>] _cond_resched+0x2c/0x40
> [ 1450.745357]  [<ffffffff8124b4c4>] pagemap_pte_range+0x124/0x2d0
> [ 1450.747744]  [<ffffffff811191bf>] ? lock_release_non_nested+0x30f/0x350
> [ 1450.752051]  [<ffffffff811ad5e8>] walk_pmd_range+0x118/0x200
> [ 1450.755811]  [<ffffffff811ad7e8>] walk_pud_range+0x118/0x150
> [ 1450.758000]  [<ffffffff811ada31>] walk_page_range+0x211/0x260
> [ 1450.760054]  [<ffffffff8124aef2>] pagemap_read+0x272/0x340
> [ 1450.762597]  [<ffffffff8124b3a0>] ? clear_refs_pte_range+0x190/0x190
> [ 1450.765484]  [<ffffffff81249b10>] ? get_vmalloc_info+0x120/0x120
> [ 1450.767166]  [<ffffffff8124b010>] ? m_stop+0x50/0x50
> [ 1450.768858]  [<ffffffff811e3a93>] vfs_read+0xc3/0x180
> [ 1450.770809]  [<ffffffff811e3e3f>] sys_read+0x4f/0xa0
> [ 1450.772074]  [<ffffffff826d107d>] system_call_fastpath+0x1a/0x1f
> [ 1450.791135]
> [ 1450.791397] ======================================================
> [ 1450.792019] [ INFO: possible circular locking dependency detected ]
> [ 1450.792019] 3.3.0-next-20120327-sasha #70 Not tainted
> [ 1450.792019] -------------------------------------------------------
> [ 1450.792019] trinity/17086 is trying to acquire lock:
> [ 1450.792019]  (&mm->mmap_sem){++++++}, at: [<ffffffff8124aee3>]
> pagemap_read+0x263/0x340
> [ 1450.792019]
> [ 1450.792019] but task is already holding lock:
> [ 1450.792019]  (&(&mm->page_table_lock)->rlock){+.+.-.}, at:
> [<ffffffff8124b410>] pagemap_pte_range+0x70/0x2d0
> [ 1450.792019]
> [ 1450.792019] which lock already depends on the new lock.
> [ 1450.792019]
> [ 1450.792019]
> [ 1450.792019] the existing dependency chain (in reverse order) is:
> [ 1450.813449]
> [ 1450.813449] -> #1 (&(&mm->page_table_lock)->rlock){+.+.-.}:
> [ 1450.813449]        [<ffffffff81115b9c>] validate_chain.clone.26+0x88c/0x960
> [ 1450.813449]        [<ffffffff81118735>] __lock_acquire+0x3f5/0xb70
> [ 1450.813449]        [<ffffffff81119553>] lock_acquire+0xc3/0x100
> [ 1450.813449]        [<ffffffff826cf71b>] _raw_spin_lock+0x3b/0x70
> [ 1450.813449]        [<ffffffff8119cd84>] __pmd_alloc+0x44/0x100
> [ 1450.813449]        [<ffffffff811a4fce>] alloc_new_pmd.clone.3+0x13e/0x160
> [ 1450.813449]        [<ffffffff811a52cd>] move_page_tables+0x12d/0x3a0
> [ 1450.813449]        [<ffffffff811ea1e7>] shift_arg_pages+0xc7/0x190
> [ 1450.813449]        [<ffffffff811ebb5b>] setup_arg_pages+0x1db/0x200
> [ 1450.813449]        [<ffffffff8123a445>] load_elf_binary+0x455/0xe20
> [ 1450.813449]        [<ffffffff811ec3f1>] search_binary_handler+0x141/0x2d0
> [ 1450.813449]        [<ffffffff811ec82d>] do_execve_common.clone.32+0x2ad/0x340
> [ 1450.813449]        [<ffffffff811ec8d6>] do_execve+0x16/0x20
> [ 1450.813449]        [<ffffffff810579b5>] sys_execve+0x45/0x70
> [ 1450.813449]        [<ffffffff826d26a8>] kernel_execve+0x68/0xd0
> [ 1450.813449]        [<ffffffff81002130>] init_post+0xb0/0xd0
> [ 1450.813449]        [<ffffffff83927f3b>] kernel_init+0x1d9/0x1eb
> [ 1450.813449]        [<ffffffff826d2634>] kernel_thread_helper+0x4/0x10
> [ 1450.813449]
> [ 1450.813449] -> #0 (&mm->mmap_sem){++++++}:
> [ 1450.813449]        [<ffffffff811152e1>] check_prev_add+0x6b1/0x6e0
> [ 1450.813449]        [<ffffffff81115b9c>] validate_chain.clone.26+0x88c/0x960
> [ 1450.813449]        [<ffffffff81118735>] __lock_acquire+0x3f5/0xb70
> [ 1450.813449]        [<ffffffff81119553>] lock_acquire+0xc3/0x100
> [ 1450.813449]        [<ffffffff826cd707>] down_read+0x47/0x90
> [ 1450.813449]        [<ffffffff8124aee3>] pagemap_read+0x263/0x340
> [ 1450.813449]        [<ffffffff811e3a93>] vfs_read+0xc3/0x180
> [ 1450.813449]        [<ffffffff811e3e3f>] sys_read+0x4f/0xa0
> [ 1450.813449]        [<ffffffff826d107d>] system_call_fastpath+0x1a/0x1f
> [ 1450.813449]
> [ 1450.813449] other info that might help us debug this:
> [ 1450.813449]
> [ 1450.813449]  Possible unsafe locking scenario:
> [ 1450.813449]
> [ 1450.813449]        CPU0                    CPU1
> [ 1450.813449]        ----                    ----
> [ 1450.813449]   lock(&(&mm->page_table_lock)->rlock);
> [ 1450.813449]                                lock(&mm->mmap_sem);
> [ 1450.813449]
> lock(&(&mm->page_table_lock)->rlock);
> [ 1450.813449]   lock(&mm->mmap_sem);
> [ 1450.813449]
> [ 1450.813449]  *** DEADLOCK ***
> [ 1450.813449]
> [ 1450.813449] 1 lock held by trinity/17086:
> [ 1450.813449]  #0:  (&(&mm->page_table_lock)->rlock){+.+.-.}, at:
> [<ffffffff8124b410>] pagemap_pte_range+0x70/0x2d0
> [ 1450.813449]
> [ 1450.813449] stack backtrace:
> [ 1450.813449] Pid: 17086, comm: trinity Not tainted
> 3.3.0-next-20120327-sasha #70
> [ 1450.813449] Call Trace:
> [ 1450.813449]  [<ffffffff81113fcf>] print_circular_bug+0x10f/0x120
> [ 1450.813449]  [<ffffffff811152e1>] check_prev_add+0x6b1/0x6e0
> [ 1450.813449]  [<ffffffff81115b9c>] validate_chain.clone.26+0x88c/0x960
> [ 1450.813449]  [<ffffffff81118735>] __lock_acquire+0x3f5/0xb70
> [ 1450.813449]  [<ffffffff8124b410>] ? pagemap_pte_range+0x70/0x2d0
> [ 1450.813449]  [<ffffffff8124b410>] ? pagemap_pte_range+0x70/0x2d0
> [ 1450.813449]  [<ffffffff811ad5e8>] ? walk_pmd_range+0x118/0x200
> [ 1450.813449]  [<ffffffff81119553>] lock_acquire+0xc3/0x100
> [ 1450.813449]  [<ffffffff8124aee3>] ? pagemap_read+0x263/0x340
> [ 1450.813449]  [<ffffffff826cd707>] down_read+0x47/0x90
> [ 1450.813449]  [<ffffffff8124aee3>] ? pagemap_read+0x263/0x340
> [ 1450.813449]  [<ffffffff810dc02e>] ? up_read+0x1e/0x40
> [ 1450.813449]  [<ffffffff8124aee3>] pagemap_read+0x263/0x340
> [ 1450.813449]  [<ffffffff8124b3a0>] ? clear_refs_pte_range+0x190/0x190
> [ 1450.813449]  [<ffffffff81249b10>] ? get_vmalloc_info+0x120/0x120
> [ 1450.813449]  [<ffffffff8124b010>] ? m_stop+0x50/0x50
> [ 1450.813449]  [<ffffffff811e3a93>] vfs_read+0xc3/0x180
> [ 1450.813449]  [<ffffffff811e3e3f>] sys_read+0x4f/0xa0
> [ 1450.813449]  [<ffffffff826d107d>] system_call_fastpath+0x1a/0x1f
> 391000 iterations.
> 606000 iterations.
> [ 1476.139003] BUG: soft lockup - CPU#0 stuck for 22s! [trinity:17086]
> [ 1476.139012] irq event stamp: 4409
> [ 1476.139012] hardirqs last  enabled at (4409): [<ffffffff826cfa4b>]
> _raw_spin_unlock_irq+0x2b/0x70
> [ 1476.139012] hardirqs last disabled at (4408): [<ffffffff826cf834>]
> _raw_spin_lock_irq+0x24/0x90
> [ 1476.139012] softirqs last  enabled at (4406): [<ffffffff810b9ac3>]
> __do_softirq+0x133/0x180
> [ 1476.139012] softirqs last disabled at (4361): [<ffffffff826d272c>]
> call_softirq+0x1c/0x30
> [ 1476.139012] CPU 0
> [ 1476.139012] Pid: 17086, comm: trinity Not tainted
> 3.3.0-next-20120327-sasha #70
> [ 1476.139012] RIP: 0010:[<ffffffff81056b6c>]  [<ffffffff81056b6c>]
> native_read_tsc+0xc/0x20
> [ 1476.139012] RSP: 0018:ffff880006a71c00  EFLAGS: 00000286
> [ 1476.139012] RAX: 0000000000000359 RBX: ffffffff826d07b4 RCX: 00000000127bc595
> [ 1476.139012] RDX: 0000000000000359 RSI: ffffffff82ed0b9f RDI: 0000000000000001
> [ 1476.139012] RBP: ffff880006a71c38 R08: 0000000000000000 R09: 0000000000000002
> [ 1476.139012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880006a71b78
> [ 1476.139012] R13: ffff8800253a8000 R14: ffff880006a70000 R15: ffff880006a71fd8
> [ 1476.139012] FS:  00007feb180a7700(0000) GS:ffff88003d600000(0000)
> knlGS:0000000000000000
> [ 1476.139012] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 1476.139012] CR2: 0000000000ec86b0 CR3: 0000000023814000 CR4: 00000000000406f0
> [ 1476.139012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1476.139012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 1476.139012] Process trinity (pid: 17086, threadinfo
> ffff880006a70000, task ffff8800253a8000)
> [ 1476.139012] Stack:
> [ 1476.139012]  ffffffff818850df 0000000000000018 ffff8800350ad060
> 0000000011c28537
> [ 1476.139012]  00000000948b09e0 0000000000000001 ffff880035177b10
> ffff880006a71c48
> [ 1476.139012]  ffffffff818851ea ffff880006a71c88 ffffffff81899972
> ffff880006a71c88
> [ 1476.139012] Call Trace:
> [ 1476.139012]  [<ffffffff818850df>] ? delay_tsc+0x3f/0x120
> [ 1476.139012]  [<ffffffff818851ea>] __delay+0xa/0x10
> [ 1476.139012]  [<ffffffff81899972>] do_raw_spin_lock+0xf2/0x140
> [ 1476.139012]  [<ffffffff826cf73b>] _raw_spin_lock+0x5b/0x70
> [ 1476.139012]  [<ffffffff8124b410>] ? pagemap_pte_range+0x70/0x2d0
> [ 1476.139012]  [<ffffffff8188640e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 1476.139012]  [<ffffffff8124b410>] pagemap_pte_range+0x70/0x2d0
> [ 1476.139012]  [<ffffffff811ad5e8>] walk_pmd_range+0x118/0x200
> [ 1476.139012]  [<ffffffff811ad7e8>] walk_pud_range+0x118/0x150
> [ 1476.139012]  [<ffffffff811ada31>] walk_page_range+0x211/0x260
> [ 1476.139012]  [<ffffffff8124aef2>] pagemap_read+0x272/0x340
> [ 1476.139012]  [<ffffffff8124b3a0>] ? clear_refs_pte_range+0x190/0x190
> [ 1476.139012]  [<ffffffff81249b10>] ? get_vmalloc_info+0x120/0x120
> [ 1476.139012]  [<ffffffff8124b010>] ? m_stop+0x50/0x50
> [ 1476.139012]  [<ffffffff811e3a93>] vfs_read+0xc3/0x180
> [ 1476.139012]  [<ffffffff811e3e3f>] sys_read+0x4f/0xa0
> [ 1476.139012]  [<ffffffff826d107d>] system_call_fastpath+0x1a/0x1f
> [ 1476.139012] Code: 02 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08
> 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31$
> [ 1476.139012] Call Trace:
> [ 1476.139012]  [<ffffffff818850df>] ? delay_tsc+0x3f/0x120
> [ 1476.139012]  [<ffffffff818851ea>] __delay+0xa/0x10
> [ 1476.139012]  [<ffffffff81899972>] do_raw_spin_lock+0xf2/0x140
> [ 1476.139012]  [<ffffffff826cf73b>] _raw_spin_lock+0x5b/0x70
> [ 1476.139012]  [<ffffffff8124b410>] ? pagemap_pte_range+0x70/0x2d0
> [ 1476.139012]  [<ffffffff8188640e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 1476.139012]  [<ffffffff8124b410>] pagemap_pte_range+0x70/0x2d0
> [ 1476.139012]  [<ffffffff811ad5e8>] walk_pmd_range+0x118/0x200
> [ 1476.139012]  [<ffffffff811ad7e8>] walk_pud_range+0x118/0x150
> [ 1476.139012]  [<ffffffff811ada31>] walk_page_range+0x211/0x260
> [ 1476.139012]  [<ffffffff8124aef2>] pagemap_read+0x272/0x340
> [ 1476.139012]  [<ffffffff8124b3a0>] ? clear_refs_pte_range+0x190/0x190
> [ 1476.139012]  [<ffffffff81249b10>] ? get_vmalloc_info+0x120/0x120
> [ 1476.139012]  [<ffffffff8124b010>] ? m_stop+0x50/0x50
> [ 1476.139012]  [<ffffffff811e3a93>] vfs_read+0xc3/0x180
> [ 1476.139012]  [<ffffffff811e3e3f>] sys_read+0x4f/0xa0
> [ 1476.139012]  [<ffffffff826d107d>] system_call_fastpath+0x1a/0x1f
> [ 1476.139012] Kernel panic - not syncing: softlockup: hung tasks
> [ 1476.139012] Rebooting in 1 seconds..  # lkvm run -k ./bzImage -m
> 1024 -c 8 --name run
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mm: hung task (handle_pte_fault)
  2012-03-28  4:53     ` Hugh Dickins
@ 2012-03-28 17:35       ` Naoya Horiguchi
  0 siblings, 0 replies; 6+ messages in thread
From: Naoya Horiguchi @ 2012-03-28 17:35 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Sasha Levin, Andrew Morton, Naoya Horiguchi, Peter Zijlstra,
	Gleixner, Ingo Molnar, Pekka Enberg, Dave Jones, linux-mm,
	linux-kernel

On Tue, Mar 27, 2012 at 09:53:41PM -0700, Hugh Dickins wrote:
> On Wed, 28 Mar 2012, Sasha Levin wrote:
> > On Tue, Mar 27, 2012 at 1:17 AM, Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> > > The task is waiting for IO to complete against a page, and it isn't
> > > happening.
> > >
> > > There are quite a lot of things which could cause this, alas. VM,
> > > readahead, scheduler, core wait/wakeup code, IO system, interrupt
> > > system (if it happens outside KVM, I guess).
> > >
> > > So.... ugh. Hopefully someone will hit this in a situation where it
> > > can be narrowed down or bisected.
> > 
> > I've only managed to reproduce it once, and was unable to get anything
> > useful out of it due to technical reasons.
> > 
> > The good part is that I've managed to hit something similar (although
> > I'm not 100% sure it's the same problem as the one in the original
> > mail).
> 
> I don't think this one has anything to do with the first you posted,
> but it does look like a good catch against current linux-next, where
> pagemap_pte_range() appears to do a spin_lock(&walk->mm->page_table_lock)
> which should have been removed by "thp: optimize away unnecessary page
> table locking".  Some kind of mismerge perhaps: Horiguchi-san added to Cc.

Thanks for reporting.
This spin_lock() also exists in mainline, so we need a fix on it.
I'll post later for -stable tree.

Thanks,
Naoya

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-03-28 17:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-23 10:45 mm: hung task (handle_pte_fault) Sasha Levin
2012-03-25 11:10 ` Avi Kivity
2012-03-26 23:17 ` Andrew Morton
2012-03-27 23:53   ` Sasha Levin
2012-03-28  4:53     ` Hugh Dickins
2012-03-28 17:35       ` Naoya Horiguchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).