From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx169.postini.com [74.125.245.169]) by kanga.kvack.org (Postfix) with SMTP id BCD136B0081 for ; Sat, 26 May 2012 15:33:57 -0400 (EDT) Received: by bkcjm19 with SMTP id jm19so2073207bkc.14 for ; Sat, 26 May 2012 12:33:55 -0700 (PDT) Message-ID: <1338060890.4284.12.camel@lappy> Subject: mm: hung task after fuzzing From: Sasha Levin Date: Sat, 26 May 2012 21:34:50 +0200 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Mime-Version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton , "a.p.zijlstra" , hughd@google.com Cc: linux-mm , "linux-kernel@vger.kernel.org" , Dave Jones Hi all, I stumbled on the following when fuzzing with trinity inside a KVM guest, using latest linux-next kernel: [ 3367.906805] INFO: task numad/2:24 blocked for more than 120 seconds. [ 3367.910318] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3367.914853] numad/2 D 0000000000000001 5928 24 2 0x00000000 [ 3367.918638] ffff8800294fdc58 0000000000000046 ffff8800294fdc08 ffffffff81152212 [ 3367.923198] ffff8800294fc000 ffff8800294fc010 ffff8800294fdfd8 ffff8800294fc000 [ 3367.933897] ffff8800294fc010 ffff8800294fdfd8 ffff88000c878000 ffff880029500000 [ 3367.942510] Call Trace: [ 3367.944815] [] ? __lock_release+0x1c2/0x1e0 [ 3367.948480] [] schedule+0x55/0x60 [ 3367.951647] [] rwsem_down_failed_common+0xf5/0x130 [ 3367.955499] [] ? put_lock_stats+0xe/0x40 [ 3367.959585] [] ? __lock_contended+0x1f5/0x230 [ 3367.964727] [] rwsem_down_read_failed+0x15/0x17 [ 3367.968708] [] call_rwsem_down_read_failed+0x14/0x30 [ 3367.972904] [] ? down_read+0x79/0xa0 [ 3367.976229] [] ? lazy_migrate_process+0x22/0x60 [ 3367.980044] [] lazy_migrate_process+0x22/0x60 [ 3367.983716] [] process_mem_migrate+0x10/0x20 [ 3367.987362] [] move_processes+0x190/0x230 [ 3367.990828] [] numad_thread+0x7a/0x120 [ 3367.994156] [] ? find_busiest_node+0x310/0x310 [ 3367.998057] [] kthread+0xb2/0xc0 [ 3367.998062] [] kernel_thread_helper+0x4/0x10 [ 3367.998067] [] ? retint_restore_args+0x13/0x13 [ 3367.998069] [] ? __init_kthread_worker+0x70/0x70 [ 3367.998072] [] ? gs_change+0x13/0x13 [ 3367.998076] 1 lock held by numad/2/24: [ 3367.998083] #0: (&mm->mmap_sem){++++++}, at: [] lazy_migrate_process+0x22/0x60 [ 3367.998117] INFO: task khugepaged:2640 blocked for more than 120 seconds. [ 3367.998118] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3367.998124] khugepaged D 0000000000000001 5264 2640 2 0x00000000 [ 3367.998129] ffff88000c919c18 0000000000000046 ffff88000c919bc8 ffffffff81152212 [ 3367.998134] ffff88000c918000 ffff88000c918010 ffff88000c919fd8 ffff88000c918000 [ 3367.998138] ffff88000c918010 ffff88000c919fd8 ffff88000c87b000 ffff88000c89b000 [ 3367.998139] Call Trace: [ 3367.998143] [] ? __lock_release+0x1c2/0x1e0 [ 3367.998145] [] schedule+0x55/0x60 [ 3367.998147] [] rwsem_down_failed_common+0xf5/0x130 [ 3367.998150] [] ? put_lock_stats+0xe/0x40 [ 3367.998152] [] ? __lock_contended+0x1f5/0x230 [ 3367.998156] [] rwsem_down_read_failed+0x15/0x17 [ 3367.998160] [] call_rwsem_down_read_failed+0x14/0x30 [ 3367.998161] [] ? down_read+0x79/0xa0 [ 3367.998164] [] ? khugepaged_scan_mm_slot+0xaf/0x320 [ 3367.998166] [] ? _raw_spin_unlock+0x30/0x60 [ 3367.998168] [] khugepaged_scan_mm_slot+0xaf/0x320 [ 3367.998170] [] khugepaged_do_scan+0xc0/0x100 [ 3367.998172] [] khugepaged_loop+0x69/0x400 [ 3367.998174] [] ? __mutex_unlock_slowpath+0x1a5/0x200 [ 3367.998176] [] ? wake_up_bit+0x40/0x40 [ 3367.998178] [] ? khugepaged_loop+0x400/0x400 [ 3367.998180] [] khugepaged+0x5d/0xf0 [ 3367.998181] [] kthread+0xb2/0xc0 [ 3367.998183] [] kernel_thread_helper+0x4/0x10 [ 3367.998185] [] ? retint_restore_args+0x13/0x13 [ 3367.998187] [] ? __init_kthread_worker+0x70/0x70 [ 3367.998188] [] ? gs_change+0x13/0x13 [ 3367.998190] 1 lock held by khugepaged/2640: [ 3367.998193] #0: (&mm->mmap_sem){++++++}, at: [] khugepaged_scan_mm_slot+0xaf/0x320 Digging into it, I noticed that both of those threads were waiting for a down_read() on mmap_sem, but trying to figure out which thread was trying to write, yielded nothing: [ 3433.005415] Showing all locks held in the system: [ 3433.005415] 1 lock held by numad/2/24: [ 3433.005415] #0: (&mm->mmap_sem){++++++}, at: [] lazy_migrate_process+0x22/0x60 [ 3433.005415] 1 lock held by khugepaged/2640: [ 3433.005415] #0: (&mm->mmap_sem){++++++}, at: [] khugepaged_scan_mm_slot+0xaf/0x320 [ 3433.005415] 6 locks held by kjournald/5818: [ 3433.005415] #0: (shrinker_rwsem){++++..}, at: [] shrink_slab+0x37/0x520 [ 3433.005415] #1: (sb_lock){+.+.-.}, at: [] grab_super_passive+0x16/0x80 [ 3433.005415] #2: (&(&i->lock)->rlock){-.-.-.}, at: [] serial8250_interrupt+0x2c/0xd0 [ 3433.005415] #3: (&port_lock_key){-.-.-.}, at: [] serial8250_handle_irq+0x23/0x80 [ 3433.005415] #4: (sysrq_key_table_lock){-.....}, at: [] __handle_sysrq+0x2d/0x180 [ 3433.005415] #5: (tasklist_lock){.?.+..}, at: [] debug_show_all_locks+0x5c/0x260 [ 3433.005415] 1 lock held by sh/5834: [ 3433.005415] #0: (&tty->atomic_read_lock){+.+...}, at: [] n_tty_read+0x2fe/0x8d0 On the other hand, I've noticed that all other fuzzing processes were stuck inside the kernel, trying to reclaim memory (I'm pasting just a few, they all look the same in the reclaiming part of the stack): [ 3433.005415] trinity-child2 R running task 3088 5243 5832 0x00000004 [ 3433.005415] ffff880023d1f638 0000000000000046 0000000000000001 0000000000000001 [ 3433.005415] ffff880023d1e000 ffff880023d1e010 ffff880023d1ffd8 ffff880023d1e000 [ 3433.005415] ffff880023d1e010 ffff880023d1ffd8 ffff88000c808000 ffff880023d2b000 [ 3433.005415] Call Trace: [ 3433.005415] [] ? lock_release+0x17d/0x1a0 [ 3433.005415] [] preempt_schedule+0x51/0x70 [ 3433.005415] [] _raw_spin_unlock+0x49/0x60 [ 3433.005415] [] put_super+0x2c/0x40 [ 3433.005415] [] drop_super+0x1d/0x30 [ 3433.005415] [] prune_super+0x16b/0x1b0 [ 3433.005415] [] shrink_slab+0xd0/0x520 [ 3433.005415] [] do_try_to_free_pages+0x1c9/0x3e0 [ 3433.005415] [] try_to_free_pages+0x143/0x200 [ 3433.005415] [] __perform_reclaim+0x8b/0xe0 [ 3433.005415] [] __alloc_pages_slowpath+0x407/0x6a0 [ 3433.005415] [] ? get_page_from_freelist+0x625/0x660 [ 3433.005415] [] __alloc_pages_nodemask+0x246/0x330 [ 3433.005415] [] alloc_pages_current+0xed/0x140 [ 3433.005415] [] pte_alloc_one+0x16/0x40 [ 3433.005415] [] __pte_alloc+0x2d/0x1e0 [ 3433.005415] [] do_huge_pmd_anonymous_page+0x151/0x230 [ 3433.005415] [] handle_mm_fault+0x1e3/0x350 [ 3433.005415] [] ? follow_page+0xe7/0x5a0 [ 3433.005415] [] __get_user_pages+0x438/0x5d0 [ 3433.005415] [] __mlock_vma_pages_range+0xc6/0xd0 [ 3433.005415] [] mlock_vma_pages_range+0x75/0xb0 [ 3433.005415] [] mmap_region+0x4c6/0x600 [ 3433.005415] [] do_mmap_pgoff+0x300/0x390 [ 3433.005415] [] ? sys_mmap_pgoff+0x1ed/0x230 [ 3433.005415] [] sys_mmap_pgoff+0x20b/0x230 [ 3433.005415] [] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 3433.005415] [] sys_mmap+0x1d/0x20 [ 3433.005415] [] system_call_fastpath+0x16/0x1b [ 3433.005415] trinity-child2 R running task 3824 5357 5826 0x00000004 [ 3433.005415] ffff88004c0935d8 0000000000000046 ffff88004c0935a8 ffffffff81150146 [ 3433.005415] ffff88004c092000 ffff88004c092010 ffff88004c093fd8 ffff88004c092000 [ 3433.005415] ffff88004c092010 ffff88004c093fd8 ffff88000c808000 ffff880023d28000 [ 3433.005415] Call Trace: [ 3433.005415] [] ? mark_held_locks+0xf6/0x120 [ 3433.005415] [] ? drop_super+0x15/0x30 [ 3433.005415] [] preempt_schedule_irq+0x94/0xd0 [ 3433.005415] [] retint_kernel+0x26/0x30 [ 3433.005415] [] ? lock_release+0x17d/0x1a0 [ 3433.005415] [] up_read+0x1e/0x40 [ 3433.005415] [] drop_super+0x15/0x30 [ 3433.005415] [] prune_super+0x16b/0x1b0 [ 3433.005415] [] shrink_slab+0xd0/0x520 [ 3433.005415] [] do_try_to_free_pages+0x1c9/0x3e0 [ 3433.005415] [] try_to_free_pages+0x143/0x200 [ 3433.005415] [] __perform_reclaim+0x8b/0xe0 [ 3433.005415] [] __alloc_pages_slowpath+0x407/0x6a0 [ 3433.005415] [] ? get_page_from_freelist+0x625/0x660 [ 3433.005415] [] __alloc_pages_nodemask+0x246/0x330 [ 3433.005415] [] alloc_pages_current+0xed/0x140 [ 3433.005415] [] __page_cache_alloc+0xc7/0xe0 [ 3433.005415] [] filemap_fault+0x367/0x4d0 [ 3433.005415] [] __do_fault+0xa9/0x5a0 [ 3433.005415] [] ? sched_clock_cpu+0x108/0x120 [ 3433.005415] [] handle_pte_fault+0x81/0x1f0 [ 3433.005415] [] handle_mm_fault+0x329/0x350 [ 3433.005415] [] ? follow_page+0x550/0x5a0 [ 3433.005415] [] __get_user_pages+0x438/0x5d0 [ 3433.005415] [] __mlock_vma_pages_range+0xc6/0xd0 [ 3433.005415] [] do_mlock_pages+0x101/0x180 [ 3433.005415] [] sys_mlockall+0x128/0x1a0 [ 3433.005415] [] system_call_fastpath+0x16/0x1b [ 3433.005415] trinity-child2 R running task 4032 5365 5830 0x00000004 [ 3433.005415] ffff880027a0b718 0000000000000046 ffff880027a0b6b8 000000000000bf6f [ 3433.005415] ffff880027a0a000 ffff880027a0a010 ffff880027a0bfd8 ffff880027a0a000 [ 3433.005415] ffff880027a0a010 ffff880027a0bfd8 ffff88000c80b000 ffff880023d90000 [ 3433.005415] Call Trace: [ 3433.005415] [] preempt_schedule+0x51/0x70 [ 3433.005415] [] _raw_spin_unlock+0x49/0x60 [ 3433.005415] [] grab_super_passive+0x46/0x80 [ 3433.005415] [] prune_super+0x47/0x1b0 [ 3433.005415] [] shrink_slab+0xd0/0x520 [ 3433.005415] [] do_try_to_free_pages+0x1c9/0x3e0 [ 3433.005415] [] ? mark_held_locks+0xf6/0x120 [ 3433.005415] [] try_to_free_pages+0x143/0x200 [ 3433.005415] [] __perform_reclaim+0x8b/0xe0 [ 3433.005415] [] __alloc_pages_slowpath+0x407/0x6a0 [ 3433.005415] [] ? get_page_from_freelist+0x625/0x660 [ 3433.005415] [] __alloc_pages_nodemask+0x246/0x330 [ 3433.005415] [] alloc_pages_vma+0x12f/0x140 [ 3433.005415] [] __do_fault+0x61/0x5a0 [ 3433.005415] [] ? sched_clock_cpu+0x108/0x120 [ 3433.005415] [] ? get_lock_stats+0x2a/0x60 [ 3433.005415] [] ? put_lock_stats+0xe/0x40 [ 3433.005415] [] handle_pte_fault+0x81/0x1f0 [ 3433.005415] [] handle_mm_fault+0x329/0x350 [ 3433.005415] [] ? follow_page+0x550/0x5a0 [ 3433.005415] [] __get_user_pages+0x438/0x5d0 [ 3433.005415] [] __mlock_vma_pages_range+0xc6/0xd0 [ 3433.005415] [] do_mlock_pages+0x101/0x180 [ 3433.005415] [] sys_mlockall+0x128/0x1a0 [ 3433.005415] [] system_call_fastpath+0x16/0x1b [ 3433.005415] trinity-child1 R running task 3824 5367 5832 0x00000004 [ 3433.005415] ffff880017189758 0000000000000046 ffff8800171896f8 0000000000000218 [ 3433.005415] ffff880017188000 ffff880017188010 ffff880017189fd8 ffff880017188000 [ 3433.005415] ffff880017188010 ffff880017189fd8 ffff88000c840000 ffff88001580b000 [ 3433.005415] Call Trace: [ 3433.005415] [] preempt_schedule+0x51/0x70 [ 3433.005415] [] _raw_spin_unlock+0x49/0x60 [ 3433.005415] [] grab_super_passive+0x46/0x80 [ 3433.005415] [] prune_super+0x47/0x1b0 [ 3433.005415] [] shrink_slab+0xd0/0x520 [ 3433.005415] [] ? shrink_zones+0x1f0/0x220 [ 3433.005415] [] do_try_to_free_pages+0x1c9/0x3e0 [ 3433.005415] [] ? retint_restore_args+0x13/0x13 [ 3433.005415] [] try_to_free_pages+0x143/0x200 [ 3433.005415] [] __perform_reclaim+0x8b/0xe0 [ 3433.005415] [] __alloc_pages_slowpath+0x407/0x6a0 [ 3433.005415] [] ? get_page_from_freelist+0x625/0x660 [ 3433.005415] [] __alloc_pages_nodemask+0x246/0x330 [ 3433.005415] [] alloc_pages_vma+0x12f/0x140 [ 3433.005415] [] do_anonymous_page+0x17b/0x310 [ 3433.005415] [] ? __lock_acquire+0x43c/0x4c0 [ 3433.005415] [] handle_pte_fault+0xa4/0x1f0 [ 3433.005415] [] handle_mm_fault+0x329/0x350 [ 3433.005415] [] do_page_fault+0x421/0x450 [ 3433.005415] [] ? get_lock_stats+0x2a/0x60 [ 3433.005415] [] ? put_lock_stats+0xe/0x40 [ 3433.005415] [] ? sys_brk+0x147/0x160 [ 3433.005415] [] ? __lock_release+0x1c2/0x1e0 [ 3433.005415] [] ? sys_brk+0x147/0x160 [ 3433.005415] [] do_async_page_fault+0x31/0xb0 [ 3433.005415] [] async_page_fault+0x25/0x30 [ 3433.005415] trinity-child3 R running task 3824 5368 5826 0x00000004 [ 3433.005415] ffff880027a09668 0000000000000046 ffff880027a09608 00000000000114d2 [ 3433.005415] ffff880027a08000 ffff880027a08010 ffff880027a09fd8 ffff880027a08000 [ 3433.005415] ffff880027a08010 ffff880027a09fd8 ffff88000c81b000 ffff880023d58000 [ 3433.005415] Call Trace: [ 3433.005415] [] preempt_schedule+0x51/0x70 [ 3433.005415] [] _raw_spin_unlock+0x49/0x60 [ 3433.005415] [] put_super+0x2c/0x40 [ 3433.005415] [] drop_super+0x1d/0x30 [ 3433.005415] [] prune_super+0x16b/0x1b0 [ 3433.005415] [] shrink_slab+0xd0/0x520 [ 3433.005415] [] do_try_to_free_pages+0x1c9/0x3e0 [ 3433.005415] [] try_to_free_pages+0x143/0x200 [ 3433.005415] [] __perform_reclaim+0x8b/0xe0 [ 3433.005415] [] __alloc_pages_slowpath+0x407/0x6a0 [ 3433.005415] [] ? get_page_from_freelist+0x625/0x660 [ 3433.005415] [] __alloc_pages_nodemask+0x246/0x330 [ 3433.005415] [] alloc_pages_current+0xed/0x140 [ 3433.005415] [] __page_cache_alloc+0xc7/0xe0 [ 3433.005415] [] filemap_fault+0x367/0x4d0 [ 3433.005415] [] __do_fault+0xa9/0x5a0 [ 3433.005415] [] handle_pte_fault+0x81/0x1f0 [ 3433.005415] [] handle_mm_fault+0x329/0x350 [ 3433.005415] [] do_page_fault+0x421/0x450 [ 3433.005415] [] ? might_fault+0x4e/0xa0 [ 3433.005415] [] ? might_fault+0x4e/0xa0 [ 3433.005415] [] ? __lock_release+0x1c2/0x1e0 [ 3433.005415] [] ? might_fault+0x4e/0xa0 [ 3433.005415] [] do_async_page_fault+0x31/0xb0 [ 3433.005415] [] async_page_fault+0x25/0x30 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org