From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754075AbYJFOMa (ORCPT );
	Mon, 6 Oct 2008 10:12:30 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1752420AbYJFOMX (ORCPT );
	Mon, 6 Oct 2008 10:12:23 -0400
Received: from one.firstfloor.org ([213.235.205.2]:60970 "EHLO
	one.firstfloor.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752338AbYJFOMW (ORCPT );
	Mon, 6 Oct 2008 10:12:22 -0400
Date: Mon, 6 Oct 2008 16:12:20 +0200
From: Andi Kleen
To: mingo@elte.hu, linux-kernel@vger.kernel.org, rjw@sisk.pl
Subject: scheduler hang on cpu re-hotplug with 2.6.27rc8
Message-ID: <20081006141220.GA14160@basil.nowhere.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.13 (2006-08-11)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

[Rafael, something for the regression list]

While testing cpu hotunplug/hotreplug (first setting two CPUs to offline
and then to online again) on a 16 thread machine with 2.6.27rc8 the first

# echo 1 > ./devices/system/cpu/cpu14/online

after hotunplug deadlocked somewhere in the scheduler:

bash D 00000000ffffcb5b 0 4683 4671
 ffff8804bc583c68 0000000000000086 ffff8804bc9d8640 0000000000000296
 ffff8804bdd34730 ffff8804be6fc090 ffff8804bdd34978 0000000c805a1e2a
 ffff8804be4fd780 ffffffff802298b4 ffffffff808acd98 ffff88027d0b1168
Call Trace:
 [] __dequeue_entity+0x25/0x68
 [] schedule_timeout+0x1e/0xad
 [] __disable_runtime+0x57/0x155
 [] cpupri_set+0xbe/0xcd
 [] wait_for_common+0xcd/0x131
 [] default_wake_function+0x0/0xe
 [] synchronize_rcu+0x30/0x36
 [] wakeme_after_rcu+0x0/0xc
 [] partition_sched_domains+0x9b/0x1dd
 [] update_sched_domains+0x2e/0x35
 [] notifier_call_chain+0x29/0x4c
 [] _cpu_up+0xd0/0x10a
 [] cpu_up+0x54/0x61
 [] store_online+0x43/0x67
 [] sysfs_write_file+0xd2/0x110
 [] vfs_write+0xad/0x136
 [] sys_write+0x45/0x6e
 [] system_call_fastpath+0x16/0x1b

It just hung forever, but the machine was otherwise fully functional.

This was without frame pointers so the backtrace presumably has some
garbage.

Haven't looked too closely.

-Andi

-- 
ak@linux.intel.com
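
For anyone trying to reproduce this, the offline-then-online sequence described above could be scripted roughly as follows. This is only a sketch under assumptions: the helper name `rehotplug` and its `sysfs_root` parameter are made up for illustration, and cpu13 is a hypothetical second CPU, since the report names only cpu14. The demo at the bottom runs against a scratch directory rather than the real /sys, so it is safe to execute anywhere.

```python
import os
import tempfile


def rehotplug(cpus, sysfs_root="/sys"):
    """Write 0, then 1, to each listed CPU's sysfs 'online' file.

    All CPUs are taken offline first and only then brought back online,
    matching the order described in the report. Returns the list of
    (path, value) writes in the order they were performed.
    """
    paths = [
        os.path.join(sysfs_root, "devices", "system", "cpu", f"cpu{n}", "online")
        for n in cpus
    ]
    performed = []
    for value in ("0", "1"):
        for path in paths:
            with open(path, "w") as f:
                f.write(value)
            performed.append((path, value))
    return performed


# Dry run against a temporary directory that mimics the sysfs layout.
# cpu13 is a hypothetical choice; only cpu14 appears in the report.
scratch = tempfile.mkdtemp()
for n in (13, 14):
    os.makedirs(os.path.join(scratch, "devices", "system", "cpu", f"cpu{n}"))

log = rehotplug([13, 14], sysfs_root=scratch)
for path, value in log:
    print(value, ">", path)
```

On a real machine this would be called as root with the default `sysfs_root`. If the hang reproduces, the write of "1" would presumably block in the same way the `echo` did, so the call would never return.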