* [BUG]linux-2.4.26 Quad-Opteron: panic when init scsi
@ 2004-04-27 10:17 Juergen Stohr
2004-04-28 23:13 ` Marcelo Tosatti
0 siblings, 1 reply; 3+ messages in thread
From: Juergen Stohr @ 2004-04-27 10:17 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 784 bytes --]
Hi,
I have a problem similar to the one discussed in the thread
"[BUG]linux-2.4.24 with k8 numa support panic when init scsi": when
booting 2.4.26 on a quad Opteron box, the kernel crashes in most
cases while initializing SCSI.
It seems to me that this bug is caused by a race, since in a few
cases the machine is able to boot.
The machine always boots if I set maxcpus=1.
If I append numa=off to the command line the kernel crashes with a
"Machine Check Exception" (but is able to initialize SCSI; perhaps
this is another bug?)
Does anybody know how to solve this problem or is anybody working on it?
Can someone give me a hint where to start when debugging this race?
I will attach the config of my kernel, the syslogs and the output of
ksymoops.
Regards,
Jürgen
[-- Attachment #2: config-2.4.26.gz --]
[-- Type: application/x-gzip, Size: 981 bytes --]
[-- Attachment #3: syslog.txt.gz --]
[-- Type: application/x-gzip, Size: 3934 bytes --]
[-- Attachment #4: ksymoops.txt.gz --]
[-- Type: application/x-gzip, Size: 992 bytes --]
* Re: [BUG]linux-2.4.26 Quad-Opteron: panic when init scsi
2004-04-27 10:17 [BUG]linux-2.4.26 Quad-Opteron: panic when init scsi Juergen Stohr
@ 2004-04-28 23:13 ` Marcelo Tosatti
0 siblings, 0 replies; 3+ messages in thread
From: Marcelo Tosatti @ 2004-04-28 23:13 UTC (permalink / raw)
To: Juergen Stohr; +Cc: linux-kernel
On Tue, Apr 27, 2004 at 12:17:20PM +0200, Juergen Stohr wrote:
> Hi,
>
> I have a problem similar to the one discussed in the thread
> "[BUG]linux-2.4.24 with k8 numa support panic when init scsi": when
> booting 2.4.26 on a quad Opteron box, the kernel crashes in most
> cases while initializing SCSI.
> It seems to me that this bug is caused by a race, since in a few
> cases the machine is able to boot.
>
> The machine always boots if I set maxcpus=1.
>
> If I append numa=off to the command line the kernel crashes with a
> "Machine Check Exception" (but is able to initialize SCSI; perhaps
> this is another bug?)
>
> Does anybody know how to solve this problem or is anybody working on it?
> Can someone give me a hint where to start when debugging this race?
>
> I will attach the config of my kernel, the syslogs and the output of
> ksymoops.
Andi,
Have you seen Juergen's ksymoops trace?
It seems some BUG() (maybe BAD_RANGE()) is triggering at startup in
__alloc_pages().
* Re: [BUG]linux-2.4.26 Quad-Opteron: panic when init scsi
@ 2004-04-29 9:24 Alexander v. Buelow
0 siblings, 0 replies; 3+ messages in thread
From: Alexander v. Buelow @ 2004-04-29 9:24 UTC (permalink / raw)
To: Andi Kleen; +Cc: Marcelo Tosatti, Juergen Stohr, linux-kernel@vger.kernel.org
Hi,
> It looks like you compiled this on a SuSE 9.0/64bit system, right?
> I presume this means the SuSE 2.4.21 smp kernel worked on the same
> box, right?
Yes, that's right.
> Can you perhaps try to narrow down where it broke between (mainline)
> 2.4.21 and 2.4.26 ?
We tried:
    2.4.21 -> ok
    2.4.22 -> ok
    2.4.23 -> not ok !!
We then tried to track down the error, and I made the following change in
mm/numa.c (the printk message is German for "redirecting DMA to CPU 0"):
--- linux-2.4.26/mm/numa.c      2001-09-18 01:15:02.000000000 +0200
+++ linux-2.4.26-recoms/mm/numa.c       2004-04-27 18:25:28.000000000 +0200
@@ -105,6 +105,11 @@
                return NULL;
 #ifdef CONFIG_NUMA
        temp = NODE_DATA(numa_node_id());
+       if((gfp_mask & GFP_DMA) == GFP_DMA)
+       {
+               printk(KERN_WARNING "RECOMS: Umleitung DMA auf CPU 0\n");
+               temp = NODE_DATA(0);
+       }
 #else
        spin_lock_irqsave(&node_lock, flags);
        if (!next) next = pgdat_list;
In mm/page_alloc.c, I added the following to
void __init free_area_init_core(..):
        *gmap = pgdat->node_mem_map = lmem_map;
        pgdat->node_size = totalpages;
        pgdat->node_start_paddr = zone_start_paddr;
        pgdat->node_start_mapnr = (lmem_map - mem_map);
        pgdat->nr_zones = 0;
+// Alex:
+pgdat->node_id = nid;
        offset = lmem_map - mem_map;
        for (j = 0; j < MAX_NR_ZONES; j++) {
This seemed to work: the SCSI error no longer occurred.
But we then ran into various other problems. Sometimes we got MCEs (GART
TLB), and at other times kernel errors such as page faults, NULL pointer
dereferences and general protection faults. These errors are not
reproducible on demand, but they occur frequently.
I hope you will find a solution!
Regards,
Jürgen and Alexander
--
-----------------------------------------------------------------------
Dipl.-Ing. Alexander von Buelow http://www.rcs.ei.tum.de
Institute for Real-Time Computersystems (RCS) fon +49/89-289-23556
Technische Universitaet Muenchen, D-80290 Muenchen fax +49/89-289-23555