From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Wed, 6 Nov 2002 14:40:19 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Wed, 6 Nov 2002 14:40:19 -0500 Received: from packet.digeo.com ([12.110.80.53]:14534 "EHLO packet.digeo.com") by vger.kernel.org with ESMTP id ; Wed, 6 Nov 2002 14:40:16 -0500 Message-ID: <3DC9719B.AC139E50@digeo.com> Date: Wed, 06 Nov 2002 11:46:35 -0800 From: Andrew Morton X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.5.46 i686) X-Accept-Language: en MIME-Version: 1.0 To: "J.E.J. Bottomley" CC: "Martin J. Bligh" , linux-kernel , jejb@steeleye.com, Rusty Russell , dipankar@in.ibm.com Subject: Re: Strange panic as soon as timer interrupts are enabled (recent 2.5) References: Message from "Martin J. Bligh" of "Wed, 06 Nov 2002 12:11:09 PST." <116630000.1036613469@flay> <200211061932.gA6JWKa03782@localhost.localdomain> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 06 Nov 2002 19:46:35.0767 (UTC) FILETIME=[37052870:01C285CD] Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org "J.E.J. Bottomley" wrote: > > Martin.Bligh@us.ibm.com said: > > Conversations on IRC revealed that jejb has hit the same thing on > > voyager ... as I understood him, he felt the cause was the CPU was > > taking an interrupt before cpu_up was called, and the interrupt was > > going back through a non existent tasklet structure (tasklets now > > have per_cpu areas which are allocated as the cpu comes up) I'll let > > him discuss what he did to fix that, but the ensuing discussion made > > me think that taking this out to a wider audience for an appropriate > > long-term solution would be prudent. > > Yes, this caused for me, a completely reliable boot time panic with 2.5.46. > The problem is that per_cpu areas aren't initiallised until cpu_up is called, > so a cpu cannot now take an interrupt before cpu_up is called. Rusty's da man on this, but I think the fix is to not turn on the interrupts (at the APIC level) until cpu_up() has called __cpu_up(). Look at cpu_up(): ret = notifier_call_chain(&cpu_chain, CPU_UP_PREPARE, hcpu); if (ret == NOTIFY_BAD) { printk("%s: attempt to bring up CPU %u failed\n", __FUNCTION__, cpu); ret = -EINVAL; goto out_notify; } /* Arch-specific enabling code. */ ret = __cpu_up(cpu); The softirq storage is initialised inside the CPU_UP_PREPARE call. So we're ready for interrupts on that CPU when your architecture's __cpu_up() is called. And no sooner than this.