From: Zachary Amsden <zach@vmware.com>
To: caglar@pardus.org.tr
Cc: Andi Kleen <ak@suse.de>,
linux-kernel@vger.kernel.org, Gerd Hoffmann <kraxel@suse.de>,
Andrew Morton <akpm@osdl.org>, Daniel Hecht <dhecht@vmware.com>,
Sahil Rihan <srihan@vmware.com>,
Trampus Richmond <trampus@vmware.com>,
betts@vmware.com
Subject: Fix for OpenSUSE kernel bug (was Re: [Opps] Invalid opcode)
Date: Wed, 29 Nov 2006 22:44:50 -0800
Message-ID: <456E7DE2.5070808@vmware.com>
In-Reply-To: <200611051917.56971.caglar@pardus.org.tr>
[-- Attachment #1: Type: text/plain, Size: 1416 bytes --]
S.Çağlar Onur wrote:
> On Sun, 05 Nov 2006 at 18:40, Andi Kleen wrote:
>
>> How do you know this?
>>
>
> Just guessing; if I'm not wrong, the panics occur after the SMP alternatives
> switching code has done its job.
>
>
>> And does it still happen in 2.6.19-rc4?
>>
>
> Will try
>
>
>>> in VMware and Microsoft Virtual
>>> PC, and in order to confirm this bug is not specific to our distro, I
>>> downloaded and tried the latest openSUSE as well. [1] and [2] are screens
>>> captured in VMware, but the exact same panic occurs in Virtual PC, as reported
>>> to us in [3].
>>>
>> Always the same BUG()?
>>
>
> Yes, same bug
>
>
>> There is just some rolling Turkish text there.
>>
>
> Ah, I'm sorry, here are the correct links :(
>
> [1] http://cekirdek.pardus.org.tr/~caglar/2.6.18/panic_on_opensuse.png
> [2] http://cekirdek.pardus.org.tr/~caglar/2.6.18/panic_on_pardus.png
>
> Cheers
>
I'm proposing this as a fix for your bug. Having tasklets scheduled
before ksoftirqd gets to run might be somewhat backwards, but I can find
nothing wrong with it from a correctness point of view.
Better to boot the kernel even when it is compiled with bug checking on, I think.
This bug started becoming apparent in 2.6.18 because of some rework of
the CPU hotplug code, but in theory it exists at least all the way back
to 2.6.10, which is as far back as I looked.
Zach
[-- Attachment #2: fix-softirq-race --]
[-- Type: text/plain, Size: 1619 bytes --]
It is possible to have tasklets get scheduled before ksoftirqd has had
a chance to spawn on all CPUs. This is totally harmless; once the
CPU_UP_PREPARE action succeeds, the CPU_ONLINE action is called, which
immediately wakes ksoftirqd on the appropriate CPU to process the already
pending tasklets. So there is no danger of a missed wakeup for
any tasklets that were already pending.
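The ordering argument above can be illustrated with a small user-space sketch. This is not kernel code: `struct tasklet`, `struct cpu_state`, `schedule_tasklet()`, `cpu_up_prepare()` and `cpu_online()` are hypothetical stand-ins invented for this illustration, and the `strict` flag only models the two BUG_ON() checks that the patch removes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2

/* Hypothetical stand-in for one queued tasklet. */
struct tasklet {
	struct tasklet *next;
	bool ran;
};

/* Hypothetical per-CPU state for this sketch. */
struct cpu_state {
	struct tasklet *pending;	/* models per_cpu(tasklet_vec, cpu).list */
	bool worker_created;		/* models the ksoftirqd thread existing */
};

static struct cpu_state cpus[NR_CPUS];

/* An interrupt queues work; nothing prevents this before the worker exists. */
static void schedule_tasklet(int cpu, struct tasklet *t)
{
	t->ran = false;
	t->next = cpus[cpu].pending;
	cpus[cpu].pending = t;
}

/*
 * Models CPU_UP_PREPARE: create the worker.
 * With strict set, a non-empty pending list is rejected, which is where
 * the removed BUG_ON() checks would have fired. Returns false in that case.
 */
static bool cpu_up_prepare(int cpu, bool strict)
{
	if (strict && cpus[cpu].pending != NULL)
		return false;
	cpus[cpu].worker_created = true;
	return true;
}

/*
 * Models CPU_ONLINE: wake the worker, which drains everything that is
 * pending, including work queued before the worker was created.
 * Returns the number of tasklets handled.
 */
static int cpu_online(int cpu)
{
	int handled = 0;

	assert(cpus[cpu].worker_created);
	while (cpus[cpu].pending != NULL) {
		struct tasklet *t = cpus[cpu].pending;

		cpus[cpu].pending = t->next;
		t->ran = true;
		handled++;
	}
	return handled;
}
```

In this model, calling schedule_tasklet() for a CPU and then cpu_up_prepare() with strict set fails, mirroring the reported BUG. With strict cleared, cpu_up_prepare() succeeds and the following cpu_online() runs the early tasklet, so no wakeup is missed.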
In particular, i386 is affected by this during startup, and the problem is
visible when using a very large initrd; during the time it takes for the initrd
to be decompressed, a timer IRQ can come in and schedule RCU callbacks. It is
also possible that resending a hardware IRQ via a softirq triggers the same bug.
Because of different timing conditions, this shows up in all emulators
and virtual machines tested, including Xen, VMware, Virtual PC, and QEMU.
It is also possible to trigger it on native hardware with a large enough initrd,
although I don't have a reliable case demonstrating that.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Index: linux-2.6.18/kernel/softirq.c
===================================================================
--- linux-2.6.18.orig/kernel/softirq.c 2006-11-10 14:44:39.000000000 -0800
+++ linux-2.6.18/kernel/softirq.c 2006-11-29 22:19:36.000000000 -0800
@@ -574,8 +574,6 @@ static int __cpuinit cpu_callback(struct
 	switch (action) {
 	case CPU_UP_PREPARE:
-		BUG_ON(per_cpu(tasklet_vec, hotcpu).list);
-		BUG_ON(per_cpu(tasklet_hi_vec, hotcpu).list);
 		p = kthread_create(ksoftirqd, hcpu, "ksoftirqd/%d", hotcpu);
 		if (IS_ERR(p)) {
 			printk("ksoftirqd for %i failed\n", hotcpu);