From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: tglx@linutronix.de, Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: [BUG] hotplug_cpu vs no_hz
Date: Thu, 03 Jul 2008 14:35:53 +0800
Message-ID: <486C7349.9010707@cn.fujitsu.com>

config:
CONFIG_DETECT_SOFTLOCKUP=y  # just for call trace
CONFIG_HOTPLUG_CPU=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_NO_HZ=y		# if =n, these 2 bugs do not occur


These 2 bugs occur on all 4 combinations of 2 kernel versions and 2 platforms:
platform:	i386, 2 cpus
		x86_64, 2 cores * 2 cpus

kernel version:	2.6.25
		2.6.26-rc8

test 1: (cpu dies)
Offline all other cpus, leaving only cpu#0 and cpu#1 online, then run:

i=0
while ((++i))
do
	echo 0 > /sys/devices/system/cpu/cpu1/online
	sleep 1
	echo 1 > /sys/devices/system/cpu/cpu1/online
	sleep 1
	echo $i
done

After anywhere from several seconds to several hours,
"echo 1 > /sys/devices/system/cpu/cpu1/online" blocks and cpu#1 can no
longer be used. The output of dmesg is:

BUG: soft lockup - CPU#1 stuck for 61s! [events/1:9898]
CPU 1:
Modules linked in:
Pid: 9898, comm: events/1 Not tainted 2.6.26-rc8-official-LAI-00089-ge1441b9 #5
RIP: 0010:[<ffffffff80237612>]  [<ffffffff80237612>] __do_softirq+0x4b/0xc7
RSP: 0018:ffff81006b42ff20  EFLAGS: 00000206
RAX: ffff81006a9b9fd8 RBX: ffff81006b42ff40 RCX: 0000000000000006
RDX: 0000000000000042 RSI: ffffffff8022da16 RDI: ffffffff8022da16
RBP: ffff81006b42fea0 R08: ffff81007f2c9178 R09: ffff81007f2c9140
R10: ffff8100807cc000 R11: 0000000000000000 R12: ffffffff8020be36
R13: ffff81006b42fea0 R14: ffffffff807a5100 R15: 0000000000000042
FS:  0000000000000000(0000) GS:ffff81007fb3ccc0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f4cc97b6000 CR3: 0000000000201000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
 <IRQ>  [<ffffffff8020c38c>] ? call_softirq+0x1c/0x28
 [<ffffffff8020dad6>] ? do_softirq+0x34/0x72
 [<ffffffff80237586>] ? irq_exit+0x3f/0x80
 [<ffffffff8021b128>] ? smp_apic_timer_interrupt+0x8b/0xa7
 [<ffffffff8020be36>] ? apic_timer_interrupt+0x66/0x70
 <EOI>  [<ffffffff8022da16>] ? finish_task_switch+0x31/0x82
 [<ffffffff80590e7d>] ? thread_return+0x3d/0x9c
 [<ffffffff80241ca1>] ? worker_thread+0xa3/0xe5
 [<ffffffff80244780>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff80241bfe>] ? worker_thread+0x0/0xe5
 [<ffffffff80244645>] ? kthread+0x49/0x78
 [<ffffffff8020c018>] ? child_rip+0xa/0x12
 [<ffffffff802445fc>] ? kthread+0x0/0x78
 [<ffffffff8020c00e>] ? child_rip+0x0/0x12

INFO: task syslogd:3835 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syslogd       D ffffffff805af600     0  3835      1
 ffff81007b80fd28 0000000000000082 0000000000000000 ffff81007dd83ec0
 ffff81007f01a820 ffff81007fbb1560 ffff81007f01ab78 0000000100000001
 00000000ffffffff 0000000000000292 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8030f58f>] log_wait_commit+0xa4/0xf4
 [<ffffffff80244780>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff8030b1e7>] journal_stop+0x17c/0x1a9
 [<ffffffff8030ba16>] journal_force_commit+0x23/0x25
 [<ffffffff803049d0>] ext3_force_commit+0x26/0x28
 [<ffffffff802fed64>] ext3_write_inode+0x39/0x3f
 [<ffffffff802ad9ad>] __writeback_single_inode+0x180/0x29f
 [<ffffffff802ae353>] sync_inode+0x24/0x31
 [<ffffffff802fb2ff>] ext3_sync_file+0xa3/0xb4
 [<ffffffff802b08f7>] do_fsync+0x54/0xaa
 [<ffffffff802b097b>] __do_fsync+0x2e/0x44
 [<ffffffff802b09ac>] sys_fsync+0xb/0xd
 [<ffffffff8020b1fb>] system_call_after_swapgs+0x7b/0x80
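
The setup step for both tests (offlining every cpu except cpu#0 and cpu#1)
could be scripted roughly as follows. This is only a sketch: the function
name offline_extra_cpus and its optional ROOT argument are hypothetical and
are not part of the original test; ROOT exists so the function can be tried
on a scratch directory before pointing it at the real sysfs tree.

```bash
#!/bin/bash
# offline_extra_cpus [ROOT]: write 0 to the "online" file of every cpuN
# directory under ROOT, except cpu0 and cpu1.
# ROOT defaults to the real sysfs path; the parameter is an assumption
# added for illustration so the function can be exercised safely.
offline_extra_cpus() {
	local root=${1:-/sys/devices/system/cpu}
	local dir n
	for dir in "$root"/cpu[0-9]*; do
		n=${dir##*/cpu}		# "cpu3" -> "3"
		case "$n" in
			0|1) continue ;;	# keep cpu#0 and cpu#1 online
		esac
		# Skip cpus that expose no writable "online" file.
		[ -w "$dir/online" ] || continue
		echo 0 > "$dir/online"
	done
}
```

Running "offline_extra_cpus" with no argument as root would act on
/sys/devices/system/cpu; on a machine with only cpu#0 and cpu#1 it does
nothing.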



test 2: (time subsystem is broken)
Offline all other cpus, leaving only cpu#0 and cpu#1 online, then:

try several times: {
	echo 0 > /sys/devices/system/cpu/cpu1/online
	echo 1 > /sys/devices/system/cpu/cpu1/online
	cat /dev/zero > /dev/null &
	taskset -p 2 $!		# set affinity to cpu#1

	top	# check the cpu usage of "cat /dev/zero":
		# if cpu usage = 0%, the bug of test 1 has occurred;
		# stop the test

		# if cpu usage > 150%, the time subsystem is broken;
		# offline/online cpu#1 again and "top" shows even higher cpu usage

		# if nothing abnormal happens, kill "cat /dev/zero" and try again
}
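
The manual check with "top" in test 2 could be approximated by a script
along these lines. It is a sketch under assumptions, not the procedure
actually used above: the helper names classify_usage and cpu_usage_pct and
the result labels are hypothetical, and it has not been verified that
sampling /proc/PID/stat shows the same inflated readings that "top" does
when the bug triggers. The 0% and 150% thresholds are the ones given in the
test description.

```bash
#!/bin/bash
# classify_usage PCT: map an integer cpu-usage percentage to the three
# outcomes listed in test 2. The labels are made up for illustration.
classify_usage() {
	local pct=$1
	if [ "$pct" -eq 0 ]; then
		echo "cpu1-dead"	# bug of test 1 has occurred
	elif [ "$pct" -gt 150 ]; then
		echo "time-broken"	# time subsystem is broken
	else
		echo "ok"		# nothing abnormal, try again
	fi
}

# cpu_usage_pct PID SECS: sample utime+stime (fields 14 and 15 of
# /proc/PID/stat, in clock ticks) twice, SECS whole seconds apart, and
# print the usage as an integer percentage of one cpu.
# Assumes the process name contains no spaces (true for "cat").
cpu_usage_pct() {
	local pid=$1 secs=$2 hz t0 t1
	hz=$(getconf CLK_TCK)
	t0=$(awk '{print $14 + $15}' "/proc/$pid/stat")
	sleep "$secs"
	t1=$(awk '{print $14 + $15}' "/proc/$pid/stat")
	echo $(( (t1 - t0) * 100 / (hz * secs) ))
}
```

For example, right after the "taskset -p 2 $!" step,
classify_usage "$(cpu_usage_pct $! 5)" would print one of the three labels
for the background "cat /dev/zero".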

