From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753243Ab1LaPNT (ORCPT );
	Sat, 31 Dec 2011 10:13:19 -0500
Received: from mail-ee0-f46.google.com ([74.125.83.46]:40041 "EHLO
	mail-ee0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753146Ab1LaPNQ (ORCPT );
	Sat, 31 Dec 2011 10:13:16 -0500
Message-ID: <1325344394.28904.43.camel@lappy>
Subject: [BUG] 3.2-rc7: Hang when calling clone()
From: Sasha Levin
To: Linus Torvalds , Ingo Molnar , Peter Zijlstra
Cc: linux-kernel
Date: Sat, 31 Dec 2011 17:13:14 +0200
Content-Type: text/plain; charset="us-ascii"
X-Mailer: Evolution 3.2.2
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi all,

During recent fuzzer tests (Trinity over the KVM tool), I've managed to
trigger the following kernel hang, ending in a hung_task panic:

[10080.793053] INFO: task kworker/u:0:5 blocked for more than 120 seconds.
[10080.794297] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10080.795751] kworker/u:0     D ffff880015320bd8  4576     5      2 0x00000000
[10080.797127]  ffff880019d5ba00 0000000000000082 ffff8800ffffffff 000016b2b9b199b6
[10080.798571]  ffff88001a7d3ec0 00000000001d33c0 ffff880019d3d800 00000000001d33c0
[10080.800050]  ffff880019d5bfd8 ffff880019d5a000 00000000001d33c0 00000000001d33c0
[10080.802092] Call Trace:
[10080.802546]  [] schedule+0x3a/0x50
[10080.803466]  [] schedule_timeout+0x245/0x2c0
[10080.804522]  [] ? mark_held_locks+0x6e/0x130
[10080.805580]  [] ? lock_release_holdtime+0xb2/0x160
[10080.806743]  [] ? _raw_spin_unlock_irq+0x2b/0x70
[10080.807885]  [] ? get_parent_ip+0x11/0x50
[10080.808933]  [] wait_for_common+0x120/0x170
[10080.810118]  [] ? try_to_wake_up+0x350/0x350
[10080.811205]  [] ? wake_up_new_task+0x124/0x1f0
[10080.812318]  [] wait_for_completion+0x18/0x20
[10080.813416]  [] do_fork+0xf4/0x330
[10080.814339]  [] ? wait_for_common+0x44/0x170
[10080.815429]  [] kernel_thread+0x71/0x80
[10080.816448]  [] ? proc_cap_handler+0x1c0/0x1c0
[10080.817565]  [] ? gs_change+0x13/0x13
[10080.818543]  [] __call_usermodehelper+0x32/0xa0
[10080.819679]  [] process_one_work+0x1c7/0x460
[10080.820754]  [] ? process_one_work+0x166/0x460
[10080.821843]  [] ? call_usermodehelper_freeinfo+0x30/0x30
[10080.823078]  [] worker_thread+0x162/0x340
[10080.824108]  [] ? manage_workers.clone.20+0x240/0x240
[10080.825286]  [] kthread+0xb6/0xc0
[10080.826171]  [] kernel_thread_helper+0x4/0x10
[10080.827235]  [] ? retint_restore_args+0x13/0x13
[10080.828368]  [] ? kthread_flush_work_fn+0x10/0x10
[10080.829538]  [] ? gs_change+0x13/0x13
[10080.843036] 2 locks held by kworker/u:0/5:
[10080.843803]  #0:  (khelper){.+.+.+}, at: [] process_one_work+0x166/0x460
[10080.845447]  #1:  ((&sub_info->work)){+.+.+.}, at: [] process_one_work+0x166/0x460
[10080.847223] Kernel panic - not syncing: hung_task: blocked tasks
[10080.848338] Pid: 947, comm: khungtaskd Not tainted 3.2.0-rc7-sasha-00039-g89307ba #93
[10080.849787] Call Trace:
[10080.850361]  [] panic+0x96/0x1c5
[10080.851259]  [] ? print_lock+0x61/0xb0
[10080.852251]  [] watchdog+0x2b6/0x2f0
[10080.853205]  [] ? watchdog+0x70/0x2f0
[10080.854188]  [] ? _raw_spin_unlock_irqrestore+0x93/0xa0
[10080.855421]  [] ? hung_task_panic+0x20/0x20
[10080.856463]  [] kthread+0xb6/0xc0
[10080.857363]  [] kernel_thread_helper+0x4/0x10
[10080.858458]  [] ? retint_restore_args+0x13/0x13
[10080.859570]  [] ? kthread_flush_work_fn+0x10/0x10
[10080.860743]  [] ? gs_change+0x13/0x13

This is the syscall that caused it:

clone(clone_flags=0xd8220000, newsp=0xf3f270[page_0xff],
	parent_tid=0xf41290[page_allocs], child_tid=0xf3f270[page_0xff],
	regs=0x7f1b066a1000)

I've seen two variants of this: one where the hang was in the same
process that called clone(), and one (like the above) where it happened
in a kworker. In both cases, the stack above do_fork() is the same.

--

Sasha.