* [BUG]linux-2.4.26 Quad-Opteron: panic when init scsi
@ 2004-04-27 10:17 Juergen Stohr
  2004-04-28 23:13 ` Marcelo Tosatti
  0 siblings, 1 reply; 3+ messages in thread
From: Juergen Stohr @ 2004-04-27 10:17 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 784 bytes --]

Hi,

I have a problem similar to the one discussed in the thread
"[BUG]linux-2.4.24 with k8 numa support panic when init scsi": when
booting 2.4.26 on a quad-Opteron box, the kernel crashes in most cases
while initializing SCSI.
This looks like a race to me, since the machine does manage to boot in
a few cases.

The machine always boots if I set maxcpus=1.

If I append numa=off to the command line, the kernel gets through SCSI
initialization but then crashes with a "Machine Check Exception"
(perhaps this is a separate bug?).

Does anybody know how to solve this problem, or is anybody working on it?
Can someone give me a hint on where to start debugging this race?

I am attaching my kernel config, the syslogs and the ksymoops
output.

Regards,
Jürgen

[-- Attachment #2: config-2.4.26.gz --]
[-- Type: application/x-gzip, Size: 981 bytes --]

[-- Attachment #3: syslog.txt.gz --]
[-- Type: application/x-gzip, Size: 3934 bytes --]

[-- Attachment #4: ksymoops.txt.gz --]
[-- Type: application/x-gzip, Size: 992 bytes --]

* Re: [BUG]linux-2.4.26 Quad-Opteron: panic when init scsi
@ 2004-04-29  9:24 Alexander v. Buelow
  0 siblings, 0 replies; 3+ messages in thread
From: Alexander v. Buelow @ 2004-04-29  9:24 UTC
  To: Andi Kleen; +Cc: Marcelo Tosatti, Juergen Stohr, linux-kernel@vger.kernel.org

Hi,

> It looks like you compiled this on a SuSE 9.0/64bit system, right?
> I presume this means the SuSE 2.4.21 smp kernel worked on the same
> box, right? 

Yes, that's right.

> Can you perhaps try to narrow down where it broke between (mainline)
> 2.4.21 and 2.4.26 ? 

We tried: 2.4.21 -> ok
          2.4.22 -> ok
          2.4.23 -> not ok !!
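
The narrowing step above can be sketched as a bisection over the 2.4.x sublevels. This is only an illustration: `sketch_first_bad` and `sketch_reported_boots` are hypothetical names, and the predicate hard-codes the results reported here under the assumption that every release from 2.4.23 through 2.4.26 fails in the same way.

```c
/* Predicate type: returns nonzero if kernel 2.4.<sublevel> boots. */
typedef int (*sketch_boots_fn)(int sublevel);

/*
 * Bisect between a known-good and a known-bad sublevel.
 * Returns the first sublevel for which boots() reports failure,
 * assuming the range is good up to some point and bad afterwards.
 */
static int sketch_first_bad(int good, int bad, sketch_boots_fn boots)
{
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        if (boots(mid))
            good = mid;   /* breakage is later than mid */
        else
            bad = mid;    /* breakage is at mid or earlier */
    }
    return bad;
}

/*
 * Results from this thread: 2.4.21 and 2.4.22 boot, 2.4.23 does not.
 * Assumption (not tested in the thread): 2.4.24 and 2.4.25 fail too.
 */
static int sketch_reported_boots(int sublevel)
{
    return sublevel <= 22;
}
```

Calling `sketch_first_bad(21, 26, sketch_reported_boots)` walks 2.4.23, then 2.4.22, and settles on sublevel 23. Because the crash is described as racy and the machine occasionally boots, a real test of each version would need several boot attempts before it is marked good.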

We then tried to track down the error, and I made this change in mm/numa.c:

--- linux-2.4.26/mm/numa.c      2001-09-18 01:15:02.000000000 +0200
+++ linux-2.4.26-recoms/mm/numa.c       2004-04-27 18:25:28.000000000 +0200
@@ -105,6 +105,11 @@
                return NULL;
 #ifdef CONFIG_NUMA
        temp = NODE_DATA(numa_node_id());
+       if((gfp_mask & GFP_DMA) == GFP_DMA)
+         {
+           printk(KERN_WARNING "RECOMS: Umleitung DMA auf CPU 0\n");
+           temp = NODE_DATA(0);
+         }
 #else
        spin_lock_irqsave(&node_lock, flags);
        if (!next) next = pgdat_list;
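
The effect of the hunk above can be modelled in user space as follows. This is a sketch, not kernel code: `SKETCH_GFP_DMA`, `struct sketch_node` and `sketch_pick_node` are hypothetical stand-ins for `GFP_DMA`, `pg_data_t` and the node selection done with `NODE_DATA()`, and the flag value is illustrative. The likely motivation, stated here as an assumption, is that only node 0 holds the low memory that DMA allocations need.

```c
/* Hypothetical stand-ins for kernel definitions; values are illustrative. */
#define SKETCH_GFP_DMA   0x01u
#define SKETCH_MAX_NODES 4

struct sketch_node {
    int node_id;
};

static struct sketch_node sketch_nodes[SKETCH_MAX_NODES] = {
    {0}, {1}, {2}, {3}
};

/*
 * Choose the node to allocate from.
 * DMA requests are always redirected to node 0;
 * all other requests stay on the caller's local node.
 */
static struct sketch_node *sketch_pick_node(unsigned int gfp_mask,
                                            int local_node)
{
    if ((gfp_mask & SKETCH_GFP_DMA) == SKETCH_GFP_DMA)
        return &sketch_nodes[0];
    return &sketch_nodes[local_node];
}
```

With this model, a request carrying `SKETCH_GFP_DMA` issued on node 2 is served from node 0, while a request without the flag issued on node 2 stays on node 2.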

And in mm/page_alloc.c I added the lines marked with "+" to
void __init free_area_init_core(..):

*gmap = pgdat->node_mem_map = lmem_map;
pgdat->node_size = totalpages;
pgdat->node_start_paddr = zone_start_paddr;
pgdat->node_start_mapnr = (lmem_map - mem_map);
pgdat->nr_zones = 0;
+// Alex:
+pgdat->node_id = nid;

offset = lmem_map - mem_map;
for (j = 0; j < MAX_NR_ZONES; j++) {
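
Why the added `node_id` assignment could matter is sketched below in user space. `struct sketch_pgdat` and `sketch_init_node` are hypothetical and reduced to a few fields, and the `set_node_id` switch exists only to compare the two behaviours. The sketch assumes the descriptor starts out zeroed, so that a node whose `node_id` is never written reports itself as node 0.

```c
/* Hypothetical, reduced model of a per-node descriptor. */
struct sketch_pgdat {
    unsigned long node_size;
    int nr_zones;
    int node_id;
};

/*
 * Model of the per-node setup step.
 * With set_node_id == 0 the node_id field is left as it was,
 * which for a zeroed descriptor means every node claims to be node 0.
 */
static void sketch_init_node(int nid, struct sketch_pgdat *pgdat,
                             unsigned long totalpages, int set_node_id)
{
    pgdat->node_size = totalpages;
    pgdat->nr_zones = 0;
    if (set_node_id)
        pgdat->node_id = nid;   /* the line added in the snippet above */
}
```

Initializing a zeroed descriptor for node 3 without the assignment leaves `node_id` at 0; with it, the descriptor reports 3, so any later lookup keyed on `node_id` reaches the right node.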

This seemed to work; the SCSI error no longer occurred.

But then we ran into various other problems: sometimes we got MCEs (GART
TLB), and at other times different kernel errors such as page faults,
NULL pointer dereferences and general protection faults. We cannot
trigger these errors on demand, but they occur frequently.

I hope you will find a solution!

Regards,

Jürgen and Alexander

-- 
-----------------------------------------------------------------------    
Dipl.-Ing. Alexander von Buelow                http://www.rcs.ei.tum.de
Institute for Real-Time Computersystems (RCS)      fon +49/89-289-23556 
Technische Universitaet Muenchen, D-80290 Muenchen fax +49/89-289-23555
