public inbox for linux-kernel@vger.kernel.org
* Intel RAID Controller SRCU42X in SGI Altix 350
@ 2006-06-29 15:18 Robert Nagy
  2006-06-29 18:32 ` Jesse Barnes
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Nagy @ 2006-06-29 15:18 UTC (permalink / raw)
  To: linux-kernel

Hi,

Distribution: Debian testing/unstable
Hardware Environment: SGI Altix 350, 2xItanium 2, EFI (read dmesg)
http://bsd.hu/~robert/altix.dmesg
http://bsd.hu/~robert/altix.kconf

Problem Description: I've installed an Intel(r) RAID Controller SRCU42X (PCI-X)
controller to this machine.
http://www.intel.com/design/servers/raid/srcu42x/index.htm
I've never used such a controller so if someone has any idea about this please
tell me. The dmesg will show everything, but:

megaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST 2005)
megaraid: 2.20.4.8 (Release Date: Mon Apr 11 12:27:22 EST 2006)
megaraid: probe new device 0x1000:0x0407:0x8086:0x0532: bus 2:slot 0:func 0
megaraid: out of memory, megaraid_alloc_cmd_packets 965
megaraid: maibox adapter did not initialize

0001:01:01.0 Co-processor: Silicon Graphics, Inc. IOC4 I/O controller (rev 4f)
0001:01:03.0 SCSI storage controller: QLogic Corp. ISP12160 Dual Channel Ultra3
SCSI Processor (rev 06)
0001:01:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit
Ethernet (rev 15)
0002:01:02.0 PCI bridge: IBM PCI-X to PCI-X Bridge (rev 03)
0002:02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID (rev 02)

Thanks

* Re: Intel RAID Controller SRCU42X in SGI Altix 350
  2006-06-29 15:18 Intel RAID Controller SRCU42X in SGI Altix 350 Robert Nagy
@ 2006-06-29 18:32 ` Jesse Barnes
  2006-06-29 19:12   ` Robert Nagy
  0 siblings, 1 reply; 11+ messages in thread
From: Jesse Barnes @ 2006-06-29 18:32 UTC (permalink / raw)
  To: Robert Nagy; +Cc: linux-kernel

On Thursday, June 29, 2006 8:18 am, Robert Nagy wrote:
> Hi,
>
> Distribution: Debian testing/unstable
> Hardware Environment: SGI Altix 350, 2xItanium 2, EFI (read dmesg)
> http://bsd.hu/~robert/altix.dmesg
> http://bsd.hu/~robert/altix.kconf
>
> Problem Description: I've installed an Intel(r) RAID Controller
> SRCU42X (PCI-X) controller to this machine.
> http://www.intel.com/design/servers/raid/srcu42x/index.htm
> I've never used such a controller so if someone has any idea about
> this please tell me. The dmesg will show everything, but:
>
> megaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST 2005)
> megaraid: 2.20.4.8 (Release Date: Mon Apr 11 12:27:22 EST 2006)
> megaraid: probe new device 0x1000:0x0407:0x8086:0x0532: bus 2:slot 0:func 0
> megaraid: out of memory, megaraid_alloc_cmd_packets 965
> megaraid: maibox adapter did not initialize

IIRC some Altix boxes don't support 32 bit DMA for PCI-X devices.  Based 
on the initialization code I looked at (just a quick scan), it looks 
like the command packet initialization is done before the switch to a 64 
bit DMA mask, which might cause the failure you see here.  You can try 
this patch out (totally untested).  It's not fully correct: it should 
probably try 64 bit first, fall back to 32 bit if that fails, and only 
then give up.  The other 64 bit DMA mask call should probably be removed.
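
A minimal sketch of that ordering (try 64 bit, fall back to 32 bit, then give up), written as a userspace model so it compiles outside the kernel tree. Every model_* name and MODEL_* constant is a hypothetical stand-in invented for illustration; in the driver the corresponding call is pci_set_dma_mask() with DMA_64BIT_MASK and DMA_32BIT_MASK, as in the attached diff. It has not been run against real hardware.

```c
/*
 * Userspace model of the DMA mask fallback order.  Nothing here is a
 * kernel API: model_set_dma_mask() only imitates the "0 on success,
 * nonzero on failure" convention of pci_set_dma_mask().
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_DMA_64BIT_MASK 0xffffffffffffffffULL
#define MODEL_DMA_32BIT_MASK 0x00000000ffffffffULL

struct model_pci_dev {
	int accepts_64bit;	/* platform accepts a 64 bit mask for this device */
	int accepts_32bit;	/* platform accepts a 32 bit mask for this device */
	uint64_t dma_mask;	/* mask currently in effect, 0 if none was set */
};

/* Hypothetical stand-in for pci_set_dma_mask(): 0 on success, -1 on failure. */
static int model_set_dma_mask(struct model_pci_dev *dev, uint64_t mask)
{
	assert(dev != NULL);

	if (mask == MODEL_DMA_64BIT_MASK && dev->accepts_64bit) {
		dev->dma_mask = mask;
		return 0;
	}
	if (mask == MODEL_DMA_32BIT_MASK && dev->accepts_32bit) {
		dev->dma_mask = mask;
		return 0;
	}
	return -1;
}

/* Try 64 bit first, fall back to 32 bit, and only then give up. */
static int model_setup_dma_mask(struct model_pci_dev *dev)
{
	if (model_set_dma_mask(dev, MODEL_DMA_64BIT_MASK) == 0)
		return 0;
	if (model_set_dma_mask(dev, MODEL_DMA_32BIT_MASK) == 0)
		return 0;
	return -1;	/* neither mask was accepted: abort the probe */
}
```

With this model, a device that rejects the 64 bit mask but accepts the 32 bit one ends up with dma_mask set to the 32 bit value, and a device that rejects both gets -1 with dma_mask left at 0, which is the point where a probe routine would bail out.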

Anyway, good luck.  If this one doesn't work you'll have to talk with one 
of the SGI guys and get some more debug info about the allocation 
failure for the command packets.

Jesse

[-- Attachment #2: megaraid-dma-mask-hack.patch --]
[-- Type: text/x-diff, Size: 593 bytes --]

diff --git a/drivers/scsi/megaraid/megaraid_mbox.c b/drivers/scsi/megaraid/megaraid_mbox.c
index b7caf60..032a3d7 100644
--- a/drivers/scsi/megaraid/megaraid_mbox.c
+++ b/drivers/scsi/megaraid/megaraid_mbox.c
@@ -457,7 +457,7 @@ megaraid_probe_one(struct pci_dev *pdev,
 
 	// Setup the default DMA mask. This would be changed later on
 	// depending on hardware capabilities
-	if (pci_set_dma_mask(adapter->pdev, DMA_32BIT_MASK) != 0) {
+	if (pci_set_dma_mask(adapter->pdev, DMA_64BIT_MASK) != 0) {
 
 		con_log(CL_ANN, (KERN_WARNING
 			"megaraid: pci_set_dma_mask failed:%d\n", __LINE__));

* Re: Intel RAID Controller SRCU42X in SGI Altix 350
  2006-06-29 18:32 ` Jesse Barnes
@ 2006-06-29 19:12   ` Robert Nagy
  2006-06-29 19:16     ` Jesse Barnes
  2006-06-29 20:42     ` Jesse Barnes
  0 siblings, 2 replies; 11+ messages in thread
From: Robert Nagy @ 2006-06-29 19:12 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-kernel

I've tried the diff but there is no difference.
I've also tried to use the EFI driver from Intel, but that did not work either.

2006/6/29, Jesse Barnes <jbarnes@virtuousgeek.org>:
> On Thursday, June 29, 2006 8:18 am, Robert Nagy wrote:
> > Hi,
> >
> > Distribution: Debian testing/unstable
> > Hardware Environment: SGI Altix 350, 2xItanium 2, EFI (read dmesg)
> > http://bsd.hu/~robert/altix.dmesg
> > http://bsd.hu/~robert/altix.kconf
> >
> > Problem Description: I've installed an Intel(r) RAID Controller
> > SRCU42X (PCI-X) controller to this machine.
> > http://www.intel.com/design/servers/raid/srcu42x/index.htm
> > I've never used such a controller so if someone has any idea about
> > this please tell me. The dmesg will show everything, but:
> >
> > megaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST 2005)
> > megaraid: 2.20.4.8 (Release Date: Mon Apr 11 12:27:22 EST 2006)
> > megaraid: probe new device 0x1000:0x0407:0x8086:0x0532: bus 2:slot 0:func 0
> > megaraid: out of memory, megaraid_alloc_cmd_packets 965
> > megaraid: maibox adapter did not initialize
>
> IIRC some Altix boxes don't support 32 bit DMA for PCI-X devices.  Based
> on the initialization code I looked at (just a quick scan), it looks
> like the command packet initialization is done before the switch to a 64
> bit DMA mask, which might cause the failure you see here.  You can try
> this patch out (totally untested).  It's not fully correct: it should
> probably try 64 bit first, fall back to 32 bit if that fails, and only
> then give up.  The other 64 bit DMA mask call should probably be removed.
>
> Anyway, good luck.  If this one doesn't work you'll have to talk with one
> of the SGI guys and get some more debug info about the allocation
> failure for the command packets.
>
> Jesse
>
>
>

* Re: Intel RAID Controller SRCU42X in SGI Altix 350
  2006-06-29 19:12   ` Robert Nagy
@ 2006-06-29 19:16     ` Jesse Barnes
  2006-06-30  8:44       ` Robert Nagy
  2006-06-29 20:42     ` Jesse Barnes
  1 sibling, 1 reply; 11+ messages in thread
From: Jesse Barnes @ 2006-06-29 19:16 UTC (permalink / raw)
  To: Robert Nagy; +Cc: linux-kernel

On Thursday, June 29, 2006 12:12 pm, Robert Nagy wrote:
> I've tried the diff but there is no difference.
> I've also tried to use the EFI driver from Intel, but that did not
> work either.

Yeah, using the EFI driver won't help at all, as it's only available in 
EFI context (it might let you boot off the RAID but that's about it).

If you applied the diff and recompiled the megaraid driver and still got 
the same error, I'm not sure what the problem is....

Jesse

* Re: Intel RAID Controller SRCU42X in SGI Altix 350
  2006-06-29 19:12   ` Robert Nagy
  2006-06-29 19:16     ` Jesse Barnes
@ 2006-06-29 20:42     ` Jesse Barnes
  2006-06-30 12:07       ` Robert Nagy
  1 sibling, 1 reply; 11+ messages in thread
From: Jesse Barnes @ 2006-06-29 20:42 UTC (permalink / raw)
  To: Robert Nagy; +Cc: linux-kernel

On Thursday, June 29, 2006 12:12 pm, Robert Nagy wrote:
> I've tried the diff but there is no difference.
> I've also tried to use the EFI driver from Intel, but that did not
> work either.

I've just been informed that megaraid has command ring addressing 
limitations, so you may not be able to use this card in PCI-X mode at 
all, at least on your Altix.  You can force it into PCI mode by putting 
an old PCI device on the same bus though, I think.  That might get things 
working (without my patch).

Jesse

* Re: Intel RAID Controller SRCU42X in SGI Altix 350
  2006-06-29 19:16     ` Jesse Barnes
@ 2006-06-30  8:44       ` Robert Nagy
  2006-06-30 16:07         ` Jesse Barnes
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Nagy @ 2006-06-30  8:44 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-kernel

I know that it only works in EFI context. But I haven't seen the disk
attached to it. I am going to try forcing it to PCI mode.

2006/6/29, Jesse Barnes <jbarnes@virtuousgeek.org>:
> On Thursday, June 29, 2006 12:12 pm, Robert Nagy wrote:
> > I've tried the diff but there is no difference.
> > I've also tried to use the EFI driver from Intel, but that did not
> > work either.
>
> Yeah, using the EFI driver won't help at all, as it's only available in
> EFI context (it might let you boot off the RAID but that's about it).
>
> If you applied the diff and recompiled the megaraid driver and still got
> the same error, I'm not sure what the problem is....
>
> Jesse
>

* Re: Intel RAID Controller SRCU42X in SGI Altix 350
  2006-06-29 20:42     ` Jesse Barnes
@ 2006-06-30 12:07       ` Robert Nagy
  2006-06-30 16:06         ` Jesse Barnes
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Nagy @ 2006-06-30 12:07 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-kernel

I've tried that with two different cards. Now the error is different.
Even the firmware boots on the controller but then the machine resets.
Same thing happens if I load the EFI driver but that drops me to the debugger.
More info can be found at http://pastebin.ca/75652

megaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST 2005)
megaraid: 2.20.4.8 (Release Date: Mon Apr 11 12:27:22 EST 2006)
megaraid: probe new device 0x1000:0x0407:0x8086:0x0532: bus 2:slot 0:func 0
ACPI: PCI Interrupt 0002:02:00.0[A]: no GSI
megaraid mailbox: wait for FW to boot [ok]
Entered OS MCA handler. PSP=20000000fff21120 cpu=0 monarch=1
All OS MCA slaves have reached rendezvous

2006/6/29, Jesse Barnes <jbarnes@virtuousgeek.org>:
> On Thursday, June 29, 2006 12:12 pm, Robert Nagy wrote:
> > I've tried the diff but there is no difference.
> > I've also tried to use the EFI driver from Intel, but that did not
> > work either.
>
> I've just been informed that megaraid has command ring addressing
> limitations, so you may not be able to use this card in PCI-X mode at
> all, at least on your Altix.  You can force it into PCI mode by putting
> an old PCI device on the same bus though, I think.  That might get things
> working (without my patch).
>
> Jesse
>

* Re: Intel RAID Controller SRCU42X in SGI Altix 350
  2006-06-30 12:07       ` Robert Nagy
@ 2006-06-30 16:06         ` Jesse Barnes
  2006-06-30 16:14           ` Robert Nagy
  0 siblings, 1 reply; 11+ messages in thread
From: Jesse Barnes @ 2006-06-30 16:06 UTC (permalink / raw)
  To: Robert Nagy; +Cc: linux-kernel

On Friday, June 30, 2006 5:07 am, Robert Nagy wrote:
> I've tried that with two different cards. Now the error is different.
> Even the firmware boots on the controller but then the machine resets.
> Same thing happens if I load the EFI driver but that drops me to the
> debugger. More info can be found at http://pastebin.ca/75652
>
> megaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST 2005)
> megaraid: 2.20.4.8 (Release Date: Mon Apr 11 12:27:22 EST 2006)
> megaraid: probe new device 0x1000:0x0407:0x8086:0x0532: bus 2:slot 0:func 0
> ACPI: PCI Interrupt 0002:02:00.0[A]: no GSI
> megaraid mailbox: wait for FW to boot [ok]
> Entered OS MCA handler. PSP=20000000fff21120 cpu=0 monarch=1
> All OS MCA slaves have reached rendezvous

This is what happens when you have a PCI card on the bus next to your RAID 
card and run without my patch?  Hm...  this might be a regular driver 
bug.  Interesting that this driver might do an msleep right after the 
[ok] is printed.  Do you have kdb built into your kernel?  If so, maybe 
you could get a backtrace.  Otherwise you could put in some printk 
statements to see if we can figure out where the MCA is occurring...
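
A minimal sketch of the printk approach, assuming no kdb: put a marker before and after each suspect step, so that the last marker on the console brackets the point where the MCA hits. The version below is a userspace model so it can be compiled anywhere. trace_point(), TRACE_POINT() and model_probe() are hypothetical names, and the three steps are placeholders rather than the real megaraid init sequence. In the driver each marker would be a printk() call printing __FUNCTION__ and __LINE__ instead of fprintf().

```c
/*
 * Userspace model of "printk bracketing".  The early return in
 * model_probe() stands in for the machine check; it is not how an
 * MCA actually behaves, it only stops execution at a chosen step.
 */
#include <assert.h>
#include <stdio.h>

static const char *trace_last_func;	/* function of the last marker reached */
static int trace_count;			/* number of markers reached so far */

static void trace_point(const char *func, int line)
{
	assert(func != NULL);
	trace_last_func = func;
	trace_count++;
	fprintf(stderr, "megaraid-trace: reached %s:%d\n", func, line);
	fflush(stderr);	/* make sure the line is out before anything dies */
}

#define TRACE_POINT() trace_point(__func__, __LINE__)

/*
 * Placeholder probe with three markers.  crash_after selects the marker
 * after which the simulated failure happens; 0 means no failure.
 */
static int model_probe(int crash_after)
{
	TRACE_POINT();		/* marker 1: entered the probe routine */
	if (crash_after == 1)
		return -1;

	TRACE_POINT();		/* marker 2: firmware reported ready */
	if (crash_after == 2)
		return -1;

	TRACE_POINT();		/* marker 3: past the delay after the [ok] */
	return 0;
}
```

If the console shows marker 2 but never marker 3, the failure lies between those two lines, and more markers can be added there to narrow it down further.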

Jesse

* Re: Intel RAID Controller SRCU42X in SGI Altix 350
  2006-06-30  8:44       ` Robert Nagy
@ 2006-06-30 16:07         ` Jesse Barnes
  0 siblings, 0 replies; 11+ messages in thread
From: Jesse Barnes @ 2006-06-30 16:07 UTC (permalink / raw)
  To: Robert Nagy; +Cc: linux-kernel

On Friday, June 30, 2006 1:44 am, Robert Nagy wrote:
> I know that it only works in EFI context. But I haven't seen the disk
> attached to it. I am going to try forcing it to PCI mode.

Ah ok, just as a sanity check?  It's been a while since I looked at the 
EFI driver API, it could very well have problems on a machine like Altix 
due to memory mapping constraints and such.

Jesse

* Re: Intel RAID Controller SRCU42X in SGI Altix 350
  2006-06-30 16:06         ` Jesse Barnes
@ 2006-06-30 16:14           ` Robert Nagy
  2006-06-30 16:21             ` Jesse Barnes
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Nagy @ 2006-06-30 16:14 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-kernel

no i do not have kdb. and i cannot even boot the box now. is there any
way to disable the megaraid driver with an argument?

2006/6/30, Jesse Barnes <jbarnes@virtuousgeek.org>:
> On Friday, June 30, 2006 5:07 am, Robert Nagy wrote:
> > I've tried that with two different cards. Now the error is different.
> > Even the firmware boots on the controller but then the machine resets.
> > Same thing happens if I load the EFI driver but that drops me to the
> > debugger. More info can be found at http://pastebin.ca/75652
> >
> > megaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST 2005)
> > megaraid: 2.20.4.8 (Release Date: Mon Apr 11 12:27:22 EST 2006)
> > megaraid: probe new device 0x1000:0x0407:0x8086:0x0532: bus 2:slot 0:func 0
> > ACPI: PCI Interrupt 0002:02:00.0[A]: no GSI
> > megaraid mailbox: wait for FW to boot [ok]
> > Entered OS MCA handler. PSP=20000000fff21120 cpu=0 monarch=1
> > All OS MCA slaves have reached rendezvous
>
> This is what happens when you have a PCI card on the bus next to your RAID
> card and run without my patch?  Hm...  this might be a regular driver
> bug.  Interesting that this driver might do an msleep right after the
> [ok] is printed.  Do you have kdb built into your kernel?  If so, maybe
> you could get a backtrace.  Otherwise you could put in some printk
> statements to see if we can figure out where the MCA is occurring...
>
> Jesse
>

* Re: Intel RAID Controller SRCU42X in SGI Altix 350
  2006-06-30 16:14           ` Robert Nagy
@ 2006-06-30 16:21             ` Jesse Barnes
  0 siblings, 0 replies; 11+ messages in thread
From: Jesse Barnes @ 2006-06-30 16:21 UTC (permalink / raw)
  To: Robert Nagy; +Cc: linux-kernel

On Friday, June 30, 2006 9:14 am, Robert Nagy wrote:
> no i do not have kdb. and i cannot even boot the box now. is there any
> way to disable the megaraid driver with an argument?

Not that I know of.  But you can use your system controller to disable 
the PCI slot containing the RAID card... that should let you boot.  I 
don't remember what the command is, but you can type 'help' at your L1 
or L2 prompt.

Jesse
