sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040

* sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
@ 2009-10-03  5:10 Bernie Innocenti
  2009-10-05 21:45 ` Mark Lord
  0 siblings, 1 reply; 16+ messages in thread
From: Bernie Innocenti @ 2009-10-03  5:10 UTC (permalink / raw)
  To: linux-ide; +Cc: lkml, sysadmin

The error in the subject appears in the console immediately followed by
a hard freeze of the machine.  The error occurs reproducibly on two
identical Opteron servers, each one equipped with two identical
controller cards:

03:04.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
03:06.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)

We can trigger the problem within a few seconds by starting a
reconstruction on a drive hooked to port 4 (counting from 0) of the
second controller.  Oddly, every other drive works reliably and the
faulty drive works if we connect it to, for example, port 4 of the first
controller.

I'd like to stress that the problem occurs systematically, on two
completely distinct machines.  We swapped drives, cables and controllers
to exclude other possibilities.

Tested with Debian kernels 2.6.26-19 and 2.6.30-8.  Let me know if
further details are needed.

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/


* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-03  5:10 sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040 Bernie Innocenti
@ 2009-10-05 21:45 ` Mark Lord
  2009-10-06  4:16   ` Bernie Innocenti
  2009-10-06 12:25   ` Harri Olin
  0 siblings, 2 replies; 16+ messages in thread
From: Mark Lord @ 2009-10-05 21:45 UTC (permalink / raw)
  To: Bernie Innocenti; +Cc: linux-ide, lkml, sysadmin

Bernie Innocenti wrote:
> The error in the subject appears in the console immediately followed by
> a hard freeze of the machine.  The error occurs reproducibly on two
> identical Opteron servers, each one equipped with two identical
> controller cards:
> 
> 03:04.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
> 03:06.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
> 
> We can trigger the problem within a few seconds by starting a
> reconstruction on a drive hooked to port 4 (counting from 0) of the
> second controller.  Oddly, every other drive works reliably and the
> faulty drive works if we connect it to, for example, port 4 of the first
> controller.
> 
> I'd like to stress that the problem occurs systematically, on two
> completely distinct machines.  We swapped drives, cables and controllers
> to exclude other possibilities.
> 
> Tested with Debian kernels 2.6.26-19 and 2.6.30-8.  Let me know if
> further details are needed.
..
> 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040..
..

  0x30000040 here means "MRdPerr":
    "bad data parity detected during PCI master read".

Which means that a data parity error happened
during outgoing data transfer on the PCI-X bus.
This could happen due to noise on the bus,
dying capacitors, or (?) bad RAM (not sure about the last one).
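
As a small illustration of reading the cause value (not part of the
original mail): the sketch below only lists which bit positions are set
in 0x30000040. The function name set_bits is hypothetical, and the
mapping from individual bits to names such as MRdPerr comes from the
chip documentation, which is not reproduced in this thread.

```python
def set_bits(value):
    """Return the positions of the 1-bits in a 32-bit register value."""
    return [bit for bit in range(32) if value & (1 << bit)]


# Value reported in the "PCI IRQ cause" message from the subject line.
cause = 0x30000040
print(set_bits(cause))
```

Listing the bits this way makes it easier to compare two cause values
from different error reports without assuming anything about bit names.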

The expected behaviour here is for sata_mv to then perform
a full SATA reset, after which the I/O will be reattempted.

But it appears to lock up before that happens.
The code does try and clear the PCI error interrupt,
but perhaps it needs clearing in more than the one register
where it currently does so.

Looking over the code and the documentation I have (NDA),
nothing obvious springs to view.  There are some extra registers
we could be dumping out, to show exactly what PCI phase and address
caused the error, but reading those won't cause or prevent a lockup.

Best bet would be to try replacing the RAM in that box,
and see if the problem goes away.

Cheers


* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-05 21:45 ` Mark Lord
@ 2009-10-06  4:16   ` Bernie Innocenti
  2009-10-06 12:25   ` Harri Olin
  1 sibling, 0 replies; 16+ messages in thread
From: Bernie Innocenti @ 2009-10-06  4:16 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-ide, lkml, sysadmin

On Mon, 05-10-2009 at 17:45 -0400, Mark Lord wrote:
>   0x30000040 here means "MRdPerr":
>     "bad data parity detected during PCI master read".
> 
> Which means that a data parity error happened
> during outgoing data transfer on the PCI-X bus.
> This could happen due to noise on the bus,
> dying capacitors, or (?) bad RAM (not sure about the last one).

Oddly, we see this on two different machines.  And only on specific
ports of the second controller card.

On one of these machines, we've also found a bunch of MCEs related to
ECC errors, but we were unable to reproduce them by exercising the CPU
and the bus with tools like cpuburn or md5sum of entire drives.

The other one has been running for 2 days with no errors whatsoever.
Both have successfully completed a 24h cycle of memtest86+.

> The expected behaviour here is for sata_mv to then perform
> a full SATA reset, after which the I/O will be reattempted.
>
> But it appears to lock up before that happens.
> The code does try and clear the PCI error interrupt,
> but perhaps it needs clearing in more than the one register
> where it currently does so.

I've got a few of these recoverable errors overnight (perhaps along with
the MCE errors I described above).  The bus was reset as you describe.

The PCI errors seem to cause a system freeze only during RAID
reconstruction. Perhaps the bus reset logic is not sufficiently locked
against re-entrance?

> Looking over the code and the documentation I have (NDA),
> nothing obvious springs to view.  There are some extra registers
> we could be dumping out, to show exactly what PCI phase and address
> caused the error, but reading those won't cause or prevent a lockup.
> 
> Best bet would be to try replacing the RAM in that box,
> and see if the problem goes away.

We'll try this tomorrow, thank you very much for providing these clues.

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/


* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-05 21:45 ` Mark Lord
  2009-10-06  4:16   ` Bernie Innocenti
@ 2009-10-06 12:25   ` Harri Olin
  2009-10-06 18:04     ` Bernie Innocenti
  1 sibling, 1 reply; 16+ messages in thread
From: Harri Olin @ 2009-10-06 12:25 UTC (permalink / raw)
  To: Mark Lord; +Cc: Bernie Innocenti, linux-ide, lkml, sysadmin

Mark Lord wrote:
> Bernie Innocenti wrote:
>> The error in the subject appears in the console immediately followed by
>> a hard freeze of the machine.  The error occurs reproducibly on two
>> identical Opteron servers, each one equipped with two identical
>> controller cards:
>>
>> 03:04.0 SCSI storage controller: Marvell Technology Group Ltd. 
>> MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
>> 03:06.0 SCSI storage controller: Marvell Technology Group Ltd. 
>> MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
>>
>> We can trigger the problem within a few seconds by starting a
>> reconstruction on a drive hooked to port 4 (counting from 0) of the
>> second controller.  Oddly, every other drive works reliably and the
>> faulty drive works if we connect it to, for example, port 4 of the first
>> controller.
>>
>> Tested with Debian kernels 2.6.26-19 and 2.6.30-8.  Let me know if
>> further details are needed.
> ..
>> 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040..
> ..
>
>  0x30000040 here means "MRdPerr":
>    "bad data parity detected during PCI master read".
>
> Which means that a data parity error happened
> during outgoing data transfer on the PCI-X bus.
> This could happen due to noise on the bus,
> dying capacitors, or (?) bad RAM (not sure about the last one).
>
I have heard of the same thing happening with the same kind of configuration,
using a Supermicro H8DME-2 motherboard and an Opteron 2378 CPU.

Even the controllers were in the same slots.

My initial suspicion was that the motherboard does not drop the PCI-X
bus frequency to 100MHz and drives the bus at 133MHz even though there
are 2 controllers connected. The proposed fix was to move one of the
controllers to the other bus, as the H8DME-2 has four PCI-X slots, 2x100MHz
and 2x133MHz, but I haven't yet heard back whether it helped.

Even the kernel was the same - the latest Debian distribution kernel. It might
be worthwhile to try a vanilla kernel.org kernel if possible.

At home I have two 6081 controllers on the same bus, but at 100MHz, and no
problems yet.

-- 
Harri.





* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-06 12:25   ` Harri Olin
@ 2009-10-06 18:04     ` Bernie Innocenti
  2009-10-06 20:06       ` Mark Lord
  2009-10-08 16:26       ` Bernie Innocenti
  0 siblings, 2 replies; 16+ messages in thread
From: Bernie Innocenti @ 2009-10-06 18:04 UTC (permalink / raw)
  To: Harri Olin; +Cc: Mark Lord, linux-ide, lkml, sysadmin

On Tue, 06-10-2009 at 15:25 +0300, Harri Olin wrote:
> Mark Lord wrote:
> > Bernie Innocenti wrote:
> >> The error in the subject appears in the console immediately followed by
> >> a hard freeze of the machine.  The error occurs reproducibly on two
> >> identical Opteron servers, each one equipped with two identical
> >> controller cards:
> >>
> >> 03:04.0 SCSI storage controller: Marvell Technology Group Ltd. 
> >> MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
> >> 03:06.0 SCSI storage controller: Marvell Technology Group Ltd. 
> >> MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
> >>
> >> We can trigger the problem within a few seconds by starting a
> >> reconstruction on a drive hooked to port 4 (counting from 0) of the
> >> second controller.  Oddly, every other drive works reliably and the
> >> faulty drive works if we connect it to, for example, port 4 of the first
> >> controller.
> >>
> >> Tested with Debian kernels 2.6.26-19 and 2.6.30-8.  Let me know if
> >> further details are needed.
> > ..
> >> 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040..
> > ..
> >
> >  0x30000040 here means "MRdPerr":
> >    "bad data parity detected during PCI master read".
> >
> > Which means that a data parity error happened
> > during outgoing data transfer on the PCI-X bus.
> > This could happen due to noise on the bus,
> > dying capacitors, or (?) bad RAM (not sure about the last one).
> >
> I have heard same thing happened with same kind of configuration, using 
> Supermicro H8DME-2 motherboard, Opteron 2378 CPU.
>
>Even the controllers were on same slots.

Close.  Mine is a Supermicro H8DM8-2 with 2x Opteron 2374 HE CPU.


> My initial suspicion was that the motherboard does not drop the PCI-X 
> bus frequency to 100MHz and drives the bus at 133MHz even though there 
> are 2 controllers connected. Proposed fix was to move the other 
> controller to other bus, as the H8DME-2 has four PCI-X slots, 2x100MHz 
> and 2x133MHz, but I haven't yet heard back if it helped.

Thanks for this hint, I'll try this tomorrow.


> Even the kernel was same - latest Debian distribution kernel. Might be 
> worthwile to try using vanilla kernel.org kernel if possible.

As a matter of fact, yesterday I tried booting off an Open Solaris
Nexenta CD and I couldn't reproduce the issue, although I also couldn't
recreate the exact conditions that trigger the bug systematically
on Linux.


> I have at home two 6081 controllers at same bus but at 100MHz and no 
> problems yet.

Is there a way to find out what the current PCI-X bus frequency is from
Linux?  And from the BIOS?

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/



* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-06 18:04     ` Bernie Innocenti
@ 2009-10-06 20:06       ` Mark Lord
  2009-10-07  0:06         ` Bernie Innocenti
  2009-10-08 16:26       ` Bernie Innocenti
  1 sibling, 1 reply; 16+ messages in thread
From: Mark Lord @ 2009-10-06 20:06 UTC (permalink / raw)
  To: Bernie Innocenti; +Cc: Harri Olin, linux-ide, lkml, sysadmin

If you could also send me the output of "lspci -vv" for the cards
then I can also have a quick look at chipset errata for possibilities.

The early revs of these chips did have a number of errata specific to PCI-X.

Cheers


* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-06 20:06       ` Mark Lord
@ 2009-10-07  0:06         ` Bernie Innocenti
  2009-10-07  1:40           ` Bernie Innocenti
  0 siblings, 1 reply; 16+ messages in thread
From: Bernie Innocenti @ 2009-10-07  0:06 UTC (permalink / raw)
  To: Mark Lord; +Cc: Harri Olin, linux-ide, lkml, sysadmin

On Tue, 06-10-2009 at 16:06 -0400, Mark Lord wrote:
> If you could also send me the output of "lspci -vv" for the cards
> then I can also have a quick look at chipset errata for possibilities.

See below.

Looking at the Status field, is it correct to say that the cards are
definitely running at 133MHz?  Is there a way to force them to a
different speed from Linux or from the BIOS?


> The early revs of these chips did have a number of errata specific to PCI-X.

I checked the revision (09) against the sata_mv source and I couldn't
spot anything relevant to us.


03:04.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
	Subsystem: Marvell Technology Group Ltd. Device 11ab
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 19
	Region 0: Memory at feb00000 (64-bit, non-prefetchable) [size=1M]
	Region 2: I/O ports at e800 [size=256]
	Region 3: [virtual] Memory at fdc00000 (32-bit, non-prefetchable) [size=4M]
	[virtual] Expansion ROM at fd800000 [disabled] [size=4M]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000
	Capabilities: [60] PCI-X non-bridge device
		Command: DPERE- ERO- RBC=512 OST=4
		Status: Dev=03:04.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-
	Kernel driver in use: sata_mv
	Kernel modules: sata_mv

03:06.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
	Subsystem: Marvell Technology Group Ltd. Device 11ab
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 18
	Region 0: Memory at fea00000 (64-bit, non-prefetchable) [size=1M]
	Region 2: I/O ports at e400 [size=256]
	Region 3: [virtual] Memory at fd400000 (32-bit, non-prefetchable) [size=4M]
	[virtual] Expansion ROM at fd000000 [disabled] [size=4M]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000
	Capabilities: [60] PCI-X non-bridge device
		Command: DPERE- ERO- RBC=512 OST=4
		Status: Dev=03:06.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-
	Kernel driver in use: sata_mv
	Kernel modules: sata_mv
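
To help read the PCI-X "Status:" lines above, here is a minimal sketch
(not from the original mail) that pulls the +/- flags out of lspci -vv
text. The function name pcix_status_flags is hypothetical. Note that, as
far as I know, the 133MHz+ flag is the card's "133 MHz capable" bit: it
says what the device can do, not necessarily the clock the bus is
currently running at.

```python
import re


def pcix_status_flags(lspci_text):
    """Map each device address to the +/- flags on its PCI-X Status line."""
    result = {}
    for line in lspci_text.splitlines():
        # Only the PCI-X capability's Status line starts with "Dev=".
        match = re.search(r"Status: Dev=(\S+)\s+(.*)", line)
        if match:
            device, rest = match.groups()
            # Keep tokens ending in + or -; skip key=value tokens.
            pairs = re.findall(r"(\w+)([+-])(?=\s|$)", rest)
            result[device] = {name: sign == "+" for name, sign in pairs}
    return result


sample = (
    "\t\tStatus: Dev=03:06.0 64bit+ 133MHz+ SCD- USC- DC=simple "
    "DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-"
)
flags = pcix_status_flags(sample)["03:06.0"]
print(flags["64bit"], flags["133MHz"], flags["266MHz"])
```

The same function can be fed the full output of "lspci -vv" captured to
a string; lines without "Dev=" are ignored.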


-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/



* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-07  0:06         ` Bernie Innocenti
@ 2009-10-07  1:40           ` Bernie Innocenti
  2009-10-07  3:13             ` Mark Lord
  0 siblings, 1 reply; 16+ messages in thread
From: Bernie Innocenti @ 2009-10-07  1:40 UTC (permalink / raw)
  To: Mark Lord; +Cc: Harri Olin, linux-ide, lkml, sysadmin

On Tue, 06-10-2009 at 20:06 -0400, Bernie Innocenti wrote:
> > The early revs of these chips did have a number of errata specific to PCI-X.
> 
> I checked the revision (09) against the sata_mv source and I couldn't
> spot anything relevant to us.

NEWSFLASH: today we replaced the 4x500GB Seagate drives with 4x1.5TB
drives and reconstruction of the array has been running for 2h without a
glitch.

One interesting difference is that the 500GB drives were being
configured in 1.5Gbps SATA mode.  Another notable difference is the
sequential read speed: ~70MB/s vs ~130MB/s with the 1.5TB model.

Could the PCI bus errors be a red herring?

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/


* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-07  1:40           ` Bernie Innocenti
@ 2009-10-07  3:13             ` Mark Lord
  2009-10-08 16:42               ` Bernie Innocenti
  0 siblings, 1 reply; 16+ messages in thread
From: Mark Lord @ 2009-10-07  3:13 UTC (permalink / raw)
  To: Bernie Innocenti; +Cc: Harri Olin, linux-ide, lkml, sysadmin

Bernie Innocenti wrote:
> On Tue, 06-10-2009 at 20:06 -0400, Bernie Innocenti wrote:
>>> The early revs of these chips did have a number of errata specific to PCI-X.
>> I checked the revision (09) against the sata_mv source and I couldn't
>> spot anything relevant to us.
> 
> NEWSFLASH: today we replaced the 4x500GB Seagate drives with 4x1.5TB
> drives and reconstruction of the array has been running for 2h without a
> glitch.
> 
> One interesting difference is that the 500GB drives were being
> configured in 1.5Gbps SATA mode.  Another notable difference is the
> sequential read speed: ~70MB/s vs ~130MB/s with the 1.5TB model.
> 
> Could the PCI bus errors be a red herring?
..

Dunno.  Rev.9 == "C0" in Marvell terminology,
and that's the latest/final rev for the 6081 chip,
with most of the PCI-X bugs fixed or worked around.
So not much to go on there.

The Bus error report was real, though.
But with 3.0Gb/s SATA connections, the chip will be
using some different internal clocks and timings,
which could be enough to avoid triggering the PCI errors.

I guess.  Let's hope so, anyway.

Cheers




* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-07  3:13             ` Mark Lord
@ 2009-10-08 16:42               ` Bernie Innocenti
  2009-10-08 17:09                 ` Tony Vroon
  2009-10-09  3:07                 ` Mark Lord
  0 siblings, 2 replies; 16+ messages in thread
From: Bernie Innocenti @ 2009-10-08 16:42 UTC (permalink / raw)
  To: Mark Lord; +Cc: Harri Olin, linux-ide, lkml, sysadmin

On Tue, 06-10-2009 at 23:13 -0400, Mark Lord wrote:
> Dunno.  Rev.9 == "C0" in Marvell terminology,
> and that's the latest/final rev for the 6081 chip,
> with most of the PCI-X bugs fixed or worked around.
> So not much to go on there.
> 
> The Bus error report was real, though.
> But with 3.0Gb/s SATA connections, the chip will be
> using some different internal clocks and timings,
> which could be enough to avoid triggering the PCI errors.
> 
> I guess.  Let's hope so, anyway.

Our prayers have not been answered :-(

I tried several things:

 - Forcing all the 500GB Seagate drives to 3.0Gbps does not help

 - Replacing the 500GB drives with 1.5TB drives seems to make
   the PCI error much less frequent

 - Moving the controllers to different slots (on different busses)
   does not help

 - Happens with both 2.6.26 (from lenny) and 2.6.30 (from sid)

 - Unplugging one of the controllers appeared to lead to a stable
   configuration, but yesterday I left the machines reconstructing
   the arrays and this morning one of them is not answering
   to pings ;-(

I want to try reducing the frequency of the PCI-X bus, but the BIOS does
not seem to provide a setting for it.  Is there another way?

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/



* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-08 16:42               ` Bernie Innocenti
@ 2009-10-08 17:09                 ` Tony Vroon
  2009-10-14 15:24                   ` [SOLVED] " Bernie Innocenti
  2009-10-09  3:07                 ` Mark Lord
  1 sibling, 1 reply; 16+ messages in thread
From: Tony Vroon @ 2009-10-08 17:09 UTC (permalink / raw)
  To: Bernie Innocenti; +Cc: Mark Lord, Harri Olin, linux-ide, lkml, sysadmin

On Thu, 2009-10-08 at 12:42 -0400, Bernie Innocenti wrote:
> On Tue, 06-10-2009 at 23:13 -0400, Mark Lord wrote:
> I want to try reducing the frequency of the PCI-X bus, but the BIOS does
> not seem to provide a setting for it.  Is there another way?

Generally this is done with a physical jumper on the board instead.
You'll find it near to the bridge chip, which is almost always by NEC.
Another technique to slow the bridge down is to insert a regular PCI
card in the other slot (these bridges tend to offer 2 or 3 slots). As
the weakest link, it'll drag everything down to 33MHz.
An old PCI-X 66MHz-only card may prove helpful here as well. You don't
have to drive it in any way; getting power to it is sufficient.
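
The "weakest link" rule described above can be sketched as follows.
This is a deliberately simplified model, not how any particular bridge
is documented to behave: the function name negotiated_bus_mhz is
hypothetical, and real boards may also switch between PCI-X and
conventional PCI signalling or cap the clock by the number of populated
slots.

```python
def negotiated_bus_mhz(card_max_mhz):
    """Return the bus clock under the weakest-link rule: the slowest card wins."""
    if not card_max_mhz:
        raise ValueError("no cards on the bus")
    return min(card_max_mhz)


# Two 133MHz-capable controllers alone on the bus.
print(negotiated_bus_mhz([133, 133]))
# The same two controllers plus a plain 33MHz PCI card in a spare slot.
print(negotiated_bus_mhz([133, 133, 33]))
```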

Regards,
Tony V.


* [SOLVED] Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-08 17:09                 ` Tony Vroon
@ 2009-10-14 15:24                   ` Bernie Innocenti
  0 siblings, 0 replies; 16+ messages in thread
From: Bernie Innocenti @ 2009-10-14 15:24 UTC (permalink / raw)
  To: Tony Vroon; +Cc: Mark Lord, Harri Olin, linux-ide, lkml, sysadmin

On Thu, 08-10-2009 at 18:09 +0100, Tony Vroon wrote:
> On Thu, 2009-10-08 at 12:42 -0400, Bernie Innocenti wrote:
> > On Tue, 06-10-2009 at 23:13 -0400, Mark Lord wrote:
> > I want to try reducing the frequency of the PCI-X bus, but the BIOS does
> > not seem to provide a setting for it.  Is there another way?
> 
> Generally this is done with a physical jumper on the board instead.
> You'll find it near to the bridge chip, which is almost always by NEC.
> Another technique to slow the bridge down is to insert a regular PCI
> card in the other slot (these bridges tend to offer 2 or 3 slots). As
> the weakest link, it'll drag everything down to 33MHz.
> An old PCI-X 66MHz-only card may prove helpful here as well. You don't
> have to drive it in any way; getting power to it is sufficient.

Hurray! It seems we've fixed our stability issue at last.

We forced the bus speed down to PCI-X 66MHz for both buses by shorting
pins 1-2 of the on-board jumpers.
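
For a rough sense of what the slower clock costs, the sketch below
computes the theoretical peak transfer rate of a parallel bus as
width times clock. It assumes a 64-bit bus (the lspci output earlier in
the thread shows 64bit+) and uses the rounded clock figures from the
thread; the function name peak_mb_per_s is hypothetical, and real
throughput is lower because of protocol overhead.

```python
def peak_mb_per_s(width_bits, clock_mhz):
    """Theoretical peak: bytes per clock times million clocks per second."""
    return (width_bits / 8) * clock_mhz


for mhz in (66, 100, 133):
    print(mhz, "MHz ->", peak_mb_per_s(64, mhz), "MB/s")
```

The nominal clocks are 66.67MHz and 133.33MHz, which is why these rates
are usually quoted as about 533MB/s and 1066MB/s.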

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/



* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-08 16:42               ` Bernie Innocenti
  2009-10-08 17:09                 ` Tony Vroon
@ 2009-10-09  3:07                 ` Mark Lord
  2009-10-09  3:16                   ` Mark Lord
  1 sibling, 1 reply; 16+ messages in thread
From: Mark Lord @ 2009-10-09  3:07 UTC (permalink / raw)
  To: Bernie Innocenti; +Cc: Harri Olin, linux-ide, lkml, sysadmin

Bernie Innocenti wrote:
>
> I want to try reducing the frequency of the PCI-X bus, but the BIOS does
> not seem to provide a setting for it.  Is there another way?
..

Nothing that's easy.

Here.. apply this patch, and post the output after you reboot with it.


--- 2.6.31/drivers/ata/sata_mv.c.orig	2009-08-21 22:16:05.000000000 -0400
+++ linux/drivers/ata/sata_mv.c	2009-10-08 23:05:37.392203506 -0400
@@ -3738,6 +3738,12 @@
 			hp_flags |= MV_HP_ERRATA_60X1B2;
 			break;
 		case 0x9:
+		{
+			struct mv_host_priv *hpriv = host->private_data;
+			void __iomem *mmio = hpriv->base;
+			printk(KERN_INFO "sata_mv: pcix_mode=%d\n", mv_in_pcix_mode(host));
+			printk(KERN_INFO "sata_mv: MV_PCI_COMMAND=%08x\n", readl(mmio + MV_PCI_COMMAND));
+		}
 			hp_flags |= MV_HP_ERRATA_60X1C0;
 			break;
 		default:


* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-09  3:07                 ` Mark Lord
@ 2009-10-09  3:16                   ` Mark Lord
  0 siblings, 0 replies; 16+ messages in thread
From: Mark Lord @ 2009-10-09  3:16 UTC (permalink / raw)
  To: Bernie Innocenti; +Cc: Harri Olin, linux-ide, lkml, sysadmin

Mark Lord wrote:
> Bernie Innocenti wrote:
>>
>> I want to try reducing the frequency of the PCI-X bus, but the BIOS does
>> not seem to provide a setting for it.  Is there another way?
> ..
> 
> Nothing that's easy.
..

Adding to that:  there is a register on the chip,
which software could use to override the normal auto-detected
PCI mode (bus speed) for the chip.  This could be used to,
say, select 100MHz or 66MHz, or even 33MHz operation.

BUT.. the register is autodetected from the bus at power-on,
and so if software wants to override that (by rewriting the reg),
it will also need to reset the PCI bus afterward.

Which requires knowing how to reset a PCI bridge,
something I don't know about.

Cheers


* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-06 18:04     ` Bernie Innocenti
  2009-10-06 20:06       ` Mark Lord
@ 2009-10-08 16:26       ` Bernie Innocenti
  2009-10-08 21:51         ` Harri Olin
  1 sibling, 1 reply; 16+ messages in thread
From: Bernie Innocenti @ 2009-10-08 16:26 UTC (permalink / raw)
  To: Harri Olin; +Cc: Mark Lord, linux-ide, lkml, sysadmin

On Tue, 06-10-2009 at 14:04 -0400, Bernie Innocenti wrote:
> > I have heard same thing happened with same kind of configuration, using 
> > Supermicro H8DME-2 motherboard, Opteron 2378 CPU.
> >
> >Even the controllers were on same slots.
> 
> Close.  Mine is a Supermicro H8DM8-2 with 2x Opteron 2374 HE CPU.

I was wrong (the BIOS DMI block is wrong).  The motherboard is labeled
as H8DME-2.

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/



* Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
  2009-10-08 16:26       ` Bernie Innocenti
@ 2009-10-08 21:51         ` Harri Olin
  0 siblings, 0 replies; 16+ messages in thread
From: Harri Olin @ 2009-10-08 21:51 UTC (permalink / raw)
  To: Bernie Innocenti; +Cc: Mark Lord, linux-ide, lkml, sysadmin

Bernie Innocenti wrote:
> On Tue, 06-10-2009 at 14:04 -0400, Bernie Innocenti wrote:
>>> I have heard same thing happened with same kind of configuration, using 
>>> Supermicro H8DME-2 motherboard, Opteron 2378 CPU.
>>>
>>> Even the controllers were on same slots.
>> Close.  Mine is a Supermicro H8DM8-2 with 2x Opteron 2374 HE CPU.
> 
> I was wrong (the BIOS DMI block is wrong).  The motherboard is labeled
> as H8DME-2.
> 

H8DME-2 is the same board as H8DM8-2, just without the SCSI controller.

There are two 3-pin jumpers somewhere between the PCI-X slots, one for each
bus. With these you can force the bus to 66MHz PCI or 66MHz PCI-X.
No jumper means autodetect. Note that this information is only from the
manual; I haven't been able to confirm what it really does :)

Oh, and in the other identical case, I heard that moving one controller
to a different bus (1st controller in the top slot and 2nd controller in the
2nd slot from the bottom) resolved the issue, or at least it has not
errored yet.

-- 
Harri.

