From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S261535AbVFMXUl (ORCPT );
	Mon, 13 Jun 2005 19:20:41 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S261622AbVFMXSp (ORCPT );
	Mon, 13 Jun 2005 19:18:45 -0400
Received: from mailout1.vmware.com ([65.113.40.130]:17414 "EHLO
	mailout1.vmware.com") by vger.kernel.org with ESMTP
	id S261632AbVFMXRE (ORCPT );
	Mon, 13 Jun 2005 19:17:04 -0400
Message-ID: <42AE13EF.8060105@vmware.com>
Date: Mon, 13 Jun 2005 16:17:03 -0700
From: Zachary Amsden
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040803
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Tom Duffy
Cc: "Langsdorf, Mark" , discuss@x86-64.org,
	Linux Kernel Mailing List
Subject: Re: [discuss] [OOPS] powernow on smp dual core amd64
References: <84EA05E2CA77634C82730353CBE3A84301CFC14C@SAUSEXMB1.amd.com> <1118701245.9114.23.camel@duffman>
In-Reply-To: <1118701245.9114.23.camel@duffman>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 13 Jun 2005 23:17:03.0078 (UTC) FILETIME=[01EA6860:01C5706E]
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Tom Duffy wrote:

>On Mon, 2005-06-13 at 16:47 -0500, Langsdorf, Mark wrote:
>
>
>>Okay, I think I have figured this out.  During initialization,
>>the cpufreq infrastructure only initializes the first core of
>>each processor.  When a request comes into the second core,
>>its data structure is uninitialized and we get the null pointer
>>dereference.
>>
>>The solution is to assign the pointer to the data structure for
>>the first core to all the other cores.
>>
>>Tom, could you try this patch and see if it helps?
>>
>>
>
>Yes!  It fixed the panic.  I get much further.
>
>Thanks!
>
>Unfortunately, after starting cpuspeed daemon, I get this:
>
>Starting cpuspeed: [ OK ]
>Starting pcmcia: Starting PCMCIA services:
>CPU 6: Machine Check Exception: 4 Bank 4: b200000000070f0f
>TSC 4129a3d70d
>Kernel panic - not syncing: Machine check
> <1>Unable to handle kernel NULL pointer dereference at 00000000000000ff RIP:
>[<00000000000000ff>]
>
>

asmlinkage void smp_call_function_interrupt(void)
{
        void (*func) (void *info) = call_data->func;
        void *info = call_data->info;
        int wait = call_data->wait;

        ack_APIC_irq();
        /*
         * Notify initiating CPU that I've grabbed the data and am
         * about to execute the function
         */
        mb();
        atomic_inc(&call_data->started);
        /*
         * At this point the info structure may be out of scope unless wait==1
         */
        irq_enter();
        (*func)(info);    <--- passed bogus data

Looks like you jumped through a bogus function pointer.  I'm guessing it
has something to do with an uninitialized IRQ vector for the CPU speed on
one of the cores (simply because it seems somewhat plausible):

extern u8 irq_vector[NR_IRQ_VECTORS];
#define IO_APIC_VECTOR(irq)     (irq_vector[irq])
#define AUTO_ASSIGN             -1

So irq_vector[AUTO_ASSIGN] = 0xff, which could have somehow made it into
your function pointer.  Just a theory.
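
For what it's worth, here is a minimal sketch of the arithmetic behind that theory. It assumes u8 is an unsigned 8-bit type, as in the kernel; store_in_u8() and as_address() are hypothetical helpers made up for illustration, not kernel functions. It only shows that -1 narrows to 0xff in a u8 and zero-extends to the same value as the faulting address in the oops. It says nothing about how such a value would actually reach call_data->func.

```c
#include <stdint.h>

/* Stand-ins for the kernel definitions quoted above (assumed, not copied
 * from any particular kernel tree). */
typedef uint8_t u8;
#define AUTO_ASSIGN -1

/* Hypothetical helper: the value a u8 slot holds after an int is stored
 * into it.  Conversion to an unsigned type is modulo 256, so -1 -> 0xff. */
static u8 store_in_u8(int value)
{
        return (u8)value;
}

/* Hypothetical helper: the 64-bit value obtained if that byte is
 * zero-extended, e.g. when it ends up being treated as an address. */
static uint64_t as_address(u8 vector)
{
        return (uint64_t)vector;
}
```

With these, as_address(store_in_u8(AUTO_ASSIGN)) comes out as 0x00000000000000ff, the same value as the RIP in the oops above.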