public inbox for linux-scsi@vger.kernel.org
* aic7xxx broken in 2.5.53/54 ?
@ 2003-01-03 10:16 Dipankar Sarma
  2003-01-03 15:14 ` Justin T. Gibbs
  0 siblings, 1 reply; 8+ messages in thread
From: Dipankar Sarma @ 2003-01-03 10:16 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-kernel

Looks like the aic7xxx driver in 2.5.53 and 54 are broken on my hardware.
The older driver works fine. The new driver used to work until 2.5.52.
Does this look familiar to anyone ?

hda: ATAPI 48X CD-ROM drive, 120kB Cache, (U)DMA
Uniform CD-ROM driver Revision: 3.12
end_request: I/O error, dev hda, sector 0
aic7xxx: PCI Device 0:1:0 failed memory mapped test.  Using PIO.
Uhhuh. NMI received for unknown reason 25 on CPU 0.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
aic7xxx: PCI Device 0:1:1 failed memory mapped test.  Using PIO.
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.25
        <Adaptec aic7896/97 Ultra2 SCSI adapter>
        aic7896/97: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi0: PCI error Interrupt at seqaddr = 0x2
scsi0: Signaled a Target Abort
scsi1: PCI error Interrupt at seqaddr = 0x2
scsi1: Signaled a Target Abort
Uhhuh. NMI received for unknown reason 25 on CPU 0.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
(scsi1:A:0): 80.000MB/s transfers (40.000MHz, offset 63, 16bit)
(scsi1:A:1): 80.000MB/s transfers (40.000MHz, offset 63, 16bit)
(scsi1:A:2): 80.000MB/s transfers (40.000MHz, offset 63, 16bit)
scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.25
        <Adaptec aic7896/97 Ultra2 SCSI adapter>
        aic7896/97: Ultra2 Wide Channel B, SCSI Id=7, 32/253 SCBs

  Vendor: IBM-ESXS  Model: ST318305LC    !#  Rev: B245
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi1:A:0:0: Tagged Queuing enabled.  Depth 253
  Vendor: IBM-ESXS  Model: ST318305LC    !#  Rev: B245
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi1:A:1:0: Tagged Queuing enabled.  Depth 253
  Vendor: IBM-ESXS  Model: ST318305LC    !#  Rev: B245
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi1:A:2:0: Tagged Queuing enabled.  Depth 253
  Vendor: IBM       Model: AuSaV1S2          Rev: 0
  Type:   Processor                          ANSI SCSI revision: 02

The hardware [4-CPU P3 xeon] -

[root@llm04 root]# lspci
00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (rev 21)
00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
00:00.2 Host bridge: ServerWorks: Unknown device 0006
00:00.3 Host bridge: ServerWorks: Unknown device 0006
00:01.0 SCSI storage controller: Adaptec AIC-7896U2/7897U2
00:01.1 SCSI storage controller: Adaptec AIC-7896U2/7897U2
00:05.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet LANCE]
00:06.0 VGA compatible controller: S3 Inc. Trio 64 3D (rev 01)
00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 50)
00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 USB Controller (rev 04)
02:03.0 RAID bus controller: IBM Netfinity ServeRAID controller
02:05.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0c)
02:06.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0c)

Thanks
Dipankar


* Re: aic7xxx broken in 2.5.53/54 ?
  2003-01-03 10:16 aic7xxx broken in 2.5.53/54 ? Dipankar Sarma
@ 2003-01-03 15:14 ` Justin T. Gibbs
  2003-01-06  7:32   ` Dipankar Sarma
  0 siblings, 1 reply; 8+ messages in thread
From: Justin T. Gibbs @ 2003-01-03 15:14 UTC (permalink / raw)
  To: dipankar, linux-scsi; +Cc: linux-kernel

> Looks like the aic7xxx driver in 2.5.53 and 54 are broken on my hardware.

It looks like the driver recovers fine.

...

> aic7xxx: PCI Device 0:1:0 failed memory mapped test.  Using PIO.
> Uhhuh. NMI received for unknown reason 25 on CPU 0.

SERR must be enabled by your BIOS.  I will change the driver so
that, should the memory mapped I/O test fail, an SERR (and thus an
NMI) is not generated.
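
A minimal sketch of that kind of change, not the actual driver patch: save
the PCI command register, clear the SERR# enable bit (bit 8, per the PCI
specification) while the probe runs, then restore the saved value. The
register variable, the two accessors, and sample_probe are hypothetical
stand-ins so the example runs in user space; a real driver would go through
the kernel's PCI config-space helpers instead.

```c
#include <stdint.h>

/* Bit 8 of the PCI command register enables SERR# reporting. */
#define PCI_COMMAND_SERR 0x0100u

/* Hypothetical stand-in for a device's 16-bit PCI command register.
 * 0x0147 is an example value that has the SERR# enable bit set. */
static uint16_t fake_command_reg = 0x0147;

/* Hypothetical accessors for the simulated register. */
static uint16_t cfg_read_command(void)
{
    return fake_command_reg;
}

static void cfg_write_command(uint16_t value)
{
    fake_command_reg = value;
}

/* Run a probe with SERR# masked, then restore the saved command value. */
static int run_with_serr_masked(int (*probe)(void))
{
    uint16_t saved = cfg_read_command();
    int result;

    cfg_write_command(saved & (uint16_t)~PCI_COMMAND_SERR);
    result = probe();
    cfg_write_command(saved);
    return result;
}

/* Example probe: records whether SERR# was enabled while it ran and
 * pretends that the memory mapped test failed. */
static int serr_seen_during_probe;

static int sample_probe(void)
{
    serr_seen_during_probe = (cfg_read_command() & PCI_COMMAND_SERR) != 0;
    return -1;
}
```

Calling run_with_serr_masked(sample_probe) returns the probe's result, and
serr_seen_during_probe stays 0 because the enable bit was clear while the
probe ran. The command register ends up back at its original value.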

...

> scsi0: PCI error Interrupt at seqaddr = 0x2
> scsi0: Signaled a Target Abort

These are left over from the failed memory mapped I/O test.  They
should have been cleared by the test, but the behavior must be
different for the 7896/97.  I'll review the documentation for this
chip and see if I can quiet up the failure.
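
For context, the error bits in the standard PCI status register latch until
software writes a 1 to them (write-one-to-clear). The sketch below simulates
that behavior in user space; it is an illustration under that assumption,
not the driver's code, and it does not model whatever chip-specific status
the 7896/97 may keep in addition. sim_status, status_read, and status_write
are hypothetical names.

```c
#include <stdint.h>

/* Error bits of the PCI status register (bit positions per the PCI spec). */
#define PCI_STATUS_SIG_TARGET_ABORT 0x0800u
#define PCI_STATUS_REC_TARGET_ABORT 0x1000u
#define PCI_STATUS_REC_MASTER_ABORT 0x2000u
#define PCI_STATUS_SIG_SYSTEM_ERROR 0x4000u
#define PCI_STATUS_DETECTED_PARITY  0x8000u

#define PCI_STATUS_ERROR_BITS                                     \
    (PCI_STATUS_SIG_TARGET_ABORT | PCI_STATUS_REC_TARGET_ABORT |  \
     PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_SIG_SYSTEM_ERROR |  \
     PCI_STATUS_DETECTED_PARITY)

/* Hypothetical simulated status register. */
static uint16_t sim_status;

static uint16_t status_read(void)
{
    return sim_status;
}

/* Write-one-to-clear: each 1 written to an error bit clears that bit;
 * writing 0 leaves it alone, and non-error bits are not affected. */
static void status_write(uint16_t value)
{
    sim_status &= (uint16_t)~(value & PCI_STATUS_ERROR_BITS);
}

/* Clear whatever error bits are currently latched by writing them back. */
static void clear_latched_errors(void)
{
    uint16_t latched = (uint16_t)(status_read() & PCI_STATUS_ERROR_BITS);

    if (latched != 0)
        status_write(latched);
}
```

A test that provokes a target abort would call something like
clear_latched_errors() before returning, so that a later interrupt handler
does not report the stale condition.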

Just out of curiosity, do you have any strange PCI options enabled
in your BIOS?  I remember seeing memory mapped I/O failures on this
ServerWorks chipset under FreeBSD in the past, but an updated BIOS
resolved the issue for the affected users.  It seemed that the BIOS
incorrectly placed the Adaptec controller in a prefetchable region.
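
One rough way to check for that situation, sketched here under assumptions:
take the controller's memory BAR from a PCI config-space dump, strip the low
flag bits to get the base address, and see whether the resulting range
overlaps a region known to be treated as prefetchable. How to learn that
region is chipset-specific (bridge registers, MTRRs) and is not covered
here; the window is simply a parameter, and all names below are
hypothetical.

```c
#include <stdint.h>

/* The low bits of a 32-bit PCI BAR are flags, not address bits. */
#define BAR_SPACE_IO 0x1u          /* bit 0: 1 = I/O space, 0 = memory */
#define BAR_MEM_MASK 0xFFFFFFF0u   /* address bits of a memory BAR */

struct addr_range {
    uint32_t start;
    uint32_t end;   /* inclusive */
};

static int bar_is_memory(uint32_t bar)
{
    return (bar & BAR_SPACE_IO) == 0;
}

/* Address range decoded by a memory BAR of the given size in bytes. */
static struct addr_range bar_mem_range(uint32_t bar, uint32_t size)
{
    struct addr_range r;

    r.start = bar & BAR_MEM_MASK;
    r.end = r.start + size - 1;
    return r;
}

static int ranges_overlap(struct addr_range a, struct addr_range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* True if a memory BAR lands inside the given prefetchable window. */
static int bar_in_prefetchable_window(uint32_t bar, uint32_t size,
                                      struct addr_range window)
{
    return bar_is_memory(bar) &&
           ranges_overlap(bar_mem_range(bar, size), window);
}
```

With an assumed window of 0xF0000000 through 0xF7FFFFFF, a 4 KB BAR at
0xF4100000 is flagged, while one at 0xFEBFE000 is not. An I/O BAR is never
flagged.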

--
Justin



* Re: aic7xxx broken in 2.5.53/54 ?
  2003-01-03 15:14 ` Justin T. Gibbs
@ 2003-01-06  7:32   ` Dipankar Sarma
  2003-01-06 16:16     ` Justin T. Gibbs
  0 siblings, 1 reply; 8+ messages in thread
From: Dipankar Sarma @ 2003-01-06  7:32 UTC (permalink / raw)
  To: Justin T. Gibbs; +Cc: linux-scsi, linux-kernel

Hi Justin,

On Fri, Jan 03, 2003 at 08:14:06AM -0700, Justin T. Gibbs wrote:
> > Looks like the aic7xxx driver in 2.5.53 and 54 are broken on my hardware.
> 
> It looks like the driver recovers fine.

Not for long. It dies shortly afterwards.

> > aic7xxx: PCI Device 0:1:0 failed memory mapped test.  Using PIO.
> > Uhhuh. NMI received for unknown reason 25 on CPU 0.
> 
> SERR must be enabled by your BIOS.  I will change the driver so
> that, should the memory mapped I/O test fail, an SERR (and thus an
> NMI) is not generated.

I guess having to use PIO with aic7xxx is bad. MMIO failure is
what we need to investigate.

> 
> Just out of curiosity, do you have any strange PCI options enabled
> in your BIOS?  I remember seeing memory mapped I/O failures on this
> ServerWorks chipset under FreeBSD in the past, but an updated BIOS
> resolved the issue for the affected users.  It seemed that the BIOS
> incorrectly placed the Adaptec controller in a prefetchable region.
> 

I didn't change anything in that box since it was delivered to me. FYI
it is an IBM x250. Would it help if I can get a PCI space dump and mtrr 
dump ? FWIW, the older driver works fine. Does the older driver use 
only PIO ?

Thanks
Dipankar


* Re: aic7xxx broken in 2.5.53/54 ?
  2003-01-06  7:32   ` Dipankar Sarma
@ 2003-01-06 16:16     ` Justin T. Gibbs
  2003-01-08  2:41       ` Tomas Szepe
  2003-01-09 11:52       ` David Lang
  0 siblings, 2 replies; 8+ messages in thread
From: Justin T. Gibbs @ 2003-01-06 16:16 UTC (permalink / raw)
  To: dipankar; +Cc: linux-scsi, linux-kernel

> Hi Justin,
> 
> On Fri, Jan 03, 2003 at 08:14:06AM -0700, Justin T. Gibbs wrote:
>> > Looks like the aic7xxx driver in 2.5.53 and 54 are broken on my
>> > hardware.
>> 
>> It looks like the driver recovers fine.
> 
> Not for long. It dies shortly afterwards.

In what fashion?

>> > aic7xxx: PCI Device 0:1:0 failed memory mapped test.  Using PIO.
>> > Uhhuh. NMI received for unknown reason 25 on CPU 0.
>> 
>> SERR must be enabled by your BIOS.  I will change the driver so
>> that, should the memory mapped I/O test fail, an SERR (and thus an
>> NMI) is not generated.
> 
> I guess having to use PIO with aic7xxx is bad. MMIO failure is
> what we need to investigate.

The only way that I know how to investigate these issues is
with a PCI bus analyzer.  We're in the process of going through
all of the systems we have in our lab to see which ones fail and
why, but I certainly don't have one of every failing system on
the planet. 8-)

>> Just out of curiosity, do you have any strange PCI options enabled
>> in your BIOS?  I remember seeing memory mapped I/O failures on this
>> ServerWorks chipset under FreeBSD in the past, but an updated BIOS
>> resolved the issue for the affected users.  It seemed that the BIOS
>> incorrectly placed the Adaptec controller in a prefetchable region.
>> 
> 
> I didn't change anything in that box since it was delivered to me. FYI
> it is an IBM x250. Would it help if I can get a PCI space dump and mtrr 
> dump ? FWIW, the older driver works fine. Does the older driver use 
> only PIO ?

It would be good to know the chipset on the motherboard.  As to why
the old driver worked, for 6.X.X drivers, you may have just been lucky.
For 5.X.X drivers, they perform a read after every register write to
"manually" prevent any byte-merging.  These reads are actually more
expensive than just using PIO.  Neither of these older drivers included
a test to try and catch fishy behavior.
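
The read-after-write technique can be illustrated with a small user-space
simulation. This is a sketch under assumptions, not code from either driver:
sim_bus models a bridge that holds one posted write until another access
forces it out, and reg_write_flushed is a hypothetical helper that reads the
register back so each write reaches the device on its own before the next
one is issued. The reads counter shows the cost, one extra bus read per
write.

```c
#include <stddef.h>
#include <stdint.h>

#define NREGS 4

/* Simulated bridge with a one-entry posted-write buffer. */
struct sim_bus {
    uint8_t  regs[NREGS];  /* what the device has actually received */
    int      pending;      /* 1 if a write is still held in the bridge */
    size_t   pend_off;
    uint8_t  pend_val;
    unsigned reads;        /* number of bus reads issued */
};

/* Deliver the held write, if any, to the device. */
static void bus_flush(struct sim_bus *b)
{
    if (b->pending) {
        b->regs[b->pend_off] = b->pend_val;
        b->pending = 0;
    }
}

/* A write is posted: it sits in the bridge until something pushes it out. */
static void bus_write8(struct sim_bus *b, size_t off, uint8_t val)
{
    bus_flush(b);
    b->pending = 1;
    b->pend_off = off;
    b->pend_val = val;
}

/* A read cannot pass a posted write, so it flushes the buffer first. */
static uint8_t bus_read8(struct sim_bus *b, size_t off)
{
    bus_flush(b);
    b->reads++;
    return b->regs[off];
}

/* Write, then read back to force the write out to the device. */
static void reg_write_flushed(struct sim_bus *b, size_t off, uint8_t val)
{
    bus_write8(b, off, val);
    (void)bus_read8(b, off);
}
```

After a plain bus_write8() the value is still pending in the simulated
bridge; after reg_write_flushed() nothing is pending and the device register
holds the new value.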

--
Justin


* Re: aic7xxx broken in 2.5.53/54 ?
  2003-01-06 16:16     ` Justin T. Gibbs
@ 2003-01-08  2:41       ` Tomas Szepe
  2003-01-08  4:23         ` Justin T. Gibbs
  2003-01-09 11:52       ` David Lang
  1 sibling, 1 reply; 8+ messages in thread
From: Tomas Szepe @ 2003-01-08  2:41 UTC (permalink / raw)
  To: Justin T. Gibbs; +Cc: dipankar, linux-scsi, linux-kernel

> [gibbs@scsiguy.com]
>
> These reads are actually more expensive than just using PIO.  Neither of
> these older drivers included a test to try and catch fishy behavior.

Justin, are you quite sure that these tests actually work?
I too have just run into

aic7xxx: PCI Device 0:16:0 failed memory mapped test.  Using PIO.
aic7xxx: PCI Device 0:17:0 failed memory mapped test.  Using PIO.

with aic79xx-linux-2.4-20021230 (6.2.25) in Linux 2.4.21-pre3.
What makes me scratch my head in particular is:

	o  The chipset is i440BX aka the Compatibility King.
	o  I've never had *any* problems with 6.2.8.

Full ahc boot-up messages follow:

PCI: Found IRQ 11 for device 00:10.0
aic7xxx: PCI Device 0:16:0 failed memory mapped test.  Using PIO.
PCI: Found IRQ 10 for device 00:11.0
aic7xxx: PCI Device 0:17:0 failed memory mapped test.  Using PIO.
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.25
        <Adaptec 2940 Ultra SCSI adapter>
        aic7880: Ultra Single Channel A, SCSI Id=7, 16/253 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.25
        <Adaptec 2940 Ultra SCSI adapter>
        aic7880: Ultra Single Channel A, SCSI Id=7, 16/253 SCBs

(scsi1:A:4): 20.000MB/s transfers (20.000MHz, offset 15)
(scsi0:A:3): 20.000MB/s transfers (20.000MHz, offset 15)
  Vendor: SEAGATE   Model: ST39173N          Rev: 6244
  Type:   Direct-Access                      ANSI SCSI revision: 02
scsi0:A:3:0: Tagged Queuing enabled.  Depth 253
  Vendor: SEAGATE   Model: ST39173N          Rev: 6244
  Type:   Direct-Access                      ANSI SCSI revision: 02
scsi1:A:4:0: Tagged Queuing enabled.  Depth 253
Attached scsi disk sda at scsi0, channel 0, id 3, lun 0
Attached scsi disk sdb at scsi1, channel 0, id 4, lun 0
...

-- 
Tomas Szepe <szepe@pinerecords.com>


* Re: aic7xxx broken in 2.5.53/54 ?
  2003-01-08  2:41       ` Tomas Szepe
@ 2003-01-08  4:23         ` Justin T. Gibbs
  2003-01-08 10:05           ` Tomas Szepe
  0 siblings, 1 reply; 8+ messages in thread
From: Justin T. Gibbs @ 2003-01-08  4:23 UTC (permalink / raw)
  To: Tomas Szepe; +Cc: dipankar, linux-scsi, linux-kernel

>> [gibbs@scsiguy.com]
>> 
>> These reads are actually more expensive than just using PIO.  Neither of
>> these older drivers included a test to try and catch fishy behavior.
> 
> Justin, are you quite sure that these tests actually work?
> I too have just run into

See my recent post to the SCSI list.  The tests don't work on
certain older controllers that lack a feature I was using.  The
latest csets submitted to Linus correct this problem (as verified
on a dusty dual P-90 PCI/EISA box just added to our regression cluster).

--
Justin



* Re: aic7xxx broken in 2.5.53/54 ?
  2003-01-08  4:23         ` Justin T. Gibbs
@ 2003-01-08 10:05           ` Tomas Szepe
  0 siblings, 0 replies; 8+ messages in thread
From: Tomas Szepe @ 2003-01-08 10:05 UTC (permalink / raw)
  To: Justin T. Gibbs; +Cc: dipankar, linux-scsi, linux-kernel

> [gibbs@scsiguy.com]
> 
> > [gibbs@scsiguy.com]
> > 
> > These reads are actually more expensive than just using PIO.  Neither of
> > these older drivers included a test to try and catch fishy behavior.
> > 
> > Justin, are you quite sure that these tests actually work?
> > I too have just run into
> 
> See my recent post to the SCSI list.  The tests don't work on
> certain older controllers that lack a feature I was using.  The
> latest csets submitted to Linus correct this problem (as verified
> on a dusty dual P-90 PCI/EISA box just added to our regression cluster).

Ok.  I can confirm 6.2.26 fixes the false positive here:

PCI: Found IRQ 11 for device 00:10.0
PCI: Found IRQ 10 for device 00:11.0
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.26
        <Adaptec 2940 Ultra SCSI adapter>
        aic7880: Ultra Single Channel A, SCSI Id=7, 16/253 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.26
        <Adaptec 2940 Ultra SCSI adapter>
        aic7880: Ultra Single Channel A, SCSI Id=7, 16/253 SCBs

Thanks,
-- 
Tomas Szepe <szepe@pinerecords.com>


* Re: aic7xxx broken in 2.5.53/54 ?
  2003-01-06 16:16     ` Justin T. Gibbs
  2003-01-08  2:41       ` Tomas Szepe
@ 2003-01-09 11:52       ` David Lang
  1 sibling, 0 replies; 8+ messages in thread
From: David Lang @ 2003-01-09 11:52 UTC (permalink / raw)
  To: Justin T. Gibbs; +Cc: dipankar, linux-scsi, linux-kernel

I just tried 2.5.55 and it still locks up. I will hook up my laptop and
see if I can get a serial console dump tomorrow night.

messages are

Slave Alloc 0
launching DV thread
begin domain validation
scsi0:2477 going from state 0 to state 1
scsi0:A:0:0: sending INQ
scsi0:timeout while doing DV command 12
scsi0:0:0:0 command completed status=0x90000
scsi0:A:0:0 entering ahc_linux_dv_transition, state=1 status=0x14005, cmd->result=0x90000
scsi0:2645 going from state 1 to state 1

At this point all the messages between the 'going to state' messages
repeat exactly. This happens for a couple of minutes, and then a whole bunch
of other stuff scrolls by (I don't know if this happens on previous
versions; I had given up before that much time had passed). The final
message is something about a recovery sleep, and then the machine stops
responding (I waited 10 minutes this time to make sure it wasn't going to
start working again).

David Lang

 On Mon, 6 Jan 2003, Justin T. Gibbs wrote:

> Date: Mon, 06 Jan 2003 09:16:53 -0700
> From: Justin T. Gibbs <gibbs@scsiguy.com>
> To: dipankar@in.ibm.com
> Cc: linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: aic7xxx broken in 2.5.53/54 ?
>
> > Hi Justin,
> >
> > On Fri, Jan 03, 2003 at 08:14:06AM -0700, Justin T. Gibbs wrote:
> >> > Looks like the aic7xxx driver in 2.5.53 and 54 are broken on my
> >> > hardware.
> >>
> >> It looks like the driver recovers fine.
> >
> > Not for long. It dies shortly afterwards.
>
> In what fashion?
>
> >> > aic7xxx: PCI Device 0:1:0 failed memory mapped test.  Using PIO.
> >> > Uhhuh. NMI received for unknown reason 25 on CPU 0.
> >>
> >> SERR must be enabled by your BIOS.  I will change the driver so
> >> that, should the memory mapped I/O test fail, an SERR (and thus an
> >> NMI) is not generated.
> >
> > I guess having to use PIO with aic7xxx is bad. MMIO failure is
> > what we need to investigate.
>
> The only way that I know how to investigate these issues is
> with a PCI bus analyzer.  We're in the process of going through
> all of the systems we have in our lab to see which ones fail and
> why, but I certainly don't have one of every failing system on
> the planet. 8-)
>
> >> Just out of curiosity, do you have any strange PCI options enabled
> >> in your BIOS?  I remember seeing memory mapped I/O failures on this
> >> ServerWorks chipset under FreeBSD in the past, but an updated BIOS
> >> resolved the issue for the affected users.  It seemed that the BIOS
> >> incorrectly placed the Adaptec controller in a prefetchable region.
> >>
> >
> > I didn't change anything in that box since it was delivered to me. FYI
> > it is an IBM x250. Would it help if I can get a PCI space dump and mtrr
> > dump ? FWIW, the older driver works fine. Does the older driver use
> > only PIO ?
>
> It would be good to know the chipset on the motherboard.  As to why
> the old driver worked, for 6.X.X drivers, you may have just been lucky.
> For 5.X.X drivers, they perform a read after every register write to
> "manually" prevent any byte-merging.  These reads are actually more
> expensive than just using PIO.  Neither of these older drivers included
> a test to try and catch fishy behavior.
>
> --
> Justin
