Re: Re[2]: Sata Sil3512 bug?


* Re: Re[2]: Sata Sil3512 bug?
@ 2007-10-03  7:26 Mikael Pettersson
  2007-10-03  8:31 ` Alexander Sabourenkov
  0 siblings, 1 reply; 32+ messages in thread
From: Mikael Pettersson @ 2007-10-03  7:26 UTC (permalink / raw)
  To: MisterE2002, htejun; +Cc: alan, benh, jgarzik, linux-ide

On Tue, 2 Oct 2007 21:20:23 +0200, MisterE wrote:
> I built another setup with almost the same hardware.
> This motherboard already had the latest BIOS.
> I notice that the computer almost never finds the hard drive,
> although the controller is found every time (with lspci). So i get no
> drive (sda) assigned. I don't always see the "bios" screen from the
> controller at startup. And in the past it showed the hard drive.
> So i could not experiment with this motherboard.
> 
> After that i installed Windows XP and used the original (sweex)
> drivers with the first motherboard. This also corrupts the data.
> So it seems not to be a Linux problem. So there is something wrong with
> the motherboard or the 3512 controller.
> 
> After that i plugged both hard drives (ide with windows and sata disk)
> into the Asus board. No data corruption. So the hard disks aren't the
> problem either.
> 
> I'm thinking of replacing both 3512 controllers with a Promise SATA300
> TX4. Do you know if there are problems with this device?

(please don't top-post)

There are no known data-corruption issues with Promise SATA cards.
However, some of them, especially the 2nd generation SATA300 TX4,
are known to trigger intermittent error interrupts (that are dealt
with but may cause a speed reduction) in some systems. We're still
scratching our heads on that issue.

/Mikael

> Friday, September 28, 2007, 6:55:47 PM, you wrote:
> 
> > Alan Cox wrote:
> >>> sda1 are corrupted (2 to 4 blocks missing). Copying that data back to
> >>> Windows and it give the same results in Quickpar. So reading does not
> >>> have problems. The data written to hda1 is correct.
> >> 
> >> We've got a whole pile of reports like this with the 3512 and almost
> >> always Nvidia chipset, plus reports of BIOS updates fixing it. That you
> >> see something similar on intel boards is a bit worrying.
> 
> > Multiple sil3112/3512 + nvidia chipset problem doesn't usually involve
> > device errors or timeouts.  It usually corrupts data silently.  And,
> > yeah, data corruption on intel board is really disturbing.
> 
> > MisterE, do you have any processor powersaving mechanism enabled?  If
> > so, can you disable all and see whether that changes anything?


* Re: Sata Sil3512 bug?
  2007-10-03  7:26 Re[2]: Sata Sil3512 bug? Mikael Pettersson
@ 2007-10-03  8:31 ` Alexander Sabourenkov
  2007-10-03 14:45   ` Re[2]: " MisterE
                     ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-03  8:31 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: MisterE2002, htejun, alan, benh, jgarzik, linux-ide

Mikael Pettersson wrote:

>>
>> I'm thinking of replacing both 3512 controllers with a Promise SATA300
>> TX4. Do you know if there are problems with this device?
> 
> (please don't top-post)
> 
> There are no known data-corruption issues with Promise SATA cards.
> However, some of them, especially the 2nd generation SATA300 TX4,
> are known to trigger intermittent error interrupts (that are dealt
> with but may cause a speed reduction) in some systems. We're still
> scratching our heads on that issue.
> 

But see this thread:

http://marc.info/?l=linux-ide&m=119122463403033&w=2
http://www.spinics.net/lists/linux-ide/msg14868.html

Personally I would not recommend Promise SATA300 TX4 at the moment.

-- 

./lxnt




* Re[2]: Sata Sil3512 bug?
  2007-10-03  8:31 ` Alexander Sabourenkov
@ 2007-10-03 14:45   ` MisterE
  2007-10-03 14:50     ` Alan Cox
  2007-10-14 12:07   ` Re[2]: " MisterE
  2007-10-17 12:39   ` Re[2]: Sata Sil3512 bug?; Promise SATA300 TX4 MisterE
  2 siblings, 1 reply; 32+ messages in thread
From: MisterE @ 2007-10-03 14:45 UTC (permalink / raw)
  To: Alexander Sabourenkov
  Cc: Mikael Pettersson, htejun, alan, benh, jgarzik, linux-ide

Hello Alexander,

Wednesday, October 3, 2007, 10:31:17 AM, you wrote:

> Mikael Pettersson wrote:

>>>
>>> I'm thinking of replacing both 3512 controllers with a Promise SATA300
>>> TX4. Do you know if there are problems with this device?
>> 
>> (please don't top-post)
>> 
>> There are no known data-corruption issues with Promise SATA cards.
>> However, some of them, especially the 2nd generation SATA300 TX4,
>> are known to trigger intermittent error interrupts (that are dealt
>> with but may cause a speed reduction) in some systems. We're still
>> scratching our heads on that issue.
>> 

> But see this thread:

> http://marc.info/?l=linux-ide&m=119122463403033&w=2
> http://www.spinics.net/lists/linux-ide/msg14868.html

> Personally I would not recommend Promise SATA300 TX4 at the moment.


That is not hopeful. Highpoint does not have sata controllers (Except
softraid controllers). Other (real raid controllers) brands are too
expensive and/or do not have a PCI interface.
Maybe i should keep those 3512 cards? What are users' experiences
with these controllers (except on nvidia boards)? Because i don't really
trust the intel boards, using the Asus would be an option.


-- 
Best regards,
 MisterE                            mailto:MisterE2002@zonnet.nl




* Re: Sata Sil3512 bug?
  2007-10-03 14:45   ` Re[2]: " MisterE
@ 2007-10-03 14:50     ` Alan Cox
  0 siblings, 0 replies; 32+ messages in thread
From: Alan Cox @ 2007-10-03 14:50 UTC (permalink / raw)
  To: MisterE
  Cc: Alexander Sabourenkov, Mikael Pettersson, htejun, benh, jgarzik,
	linux-ide

> That is not hopeful. Highpoint does not have sata controllers (Except
> softraid controllers). Other (real raid controllers) brands are too

There are pretty much no "real" RAID controllers in the ATA world except
the very high end pricy ones.



* Re[2]: Sata Sil3512 bug?
  2007-10-03  8:31 ` Alexander Sabourenkov
  2007-10-03 14:45   ` Re[2]: " MisterE
@ 2007-10-14 12:07   ` MisterE
  2007-10-15  8:44     ` Alexander Sabourenkov
  2007-10-17 12:39   ` Re[2]: Sata Sil3512 bug?; Promise SATA300 TX4 MisterE
  2 siblings, 1 reply; 32+ messages in thread
From: MisterE @ 2007-10-14 12:07 UTC (permalink / raw)
  To: Alexander Sabourenkov
  Cc: Mikael Pettersson, htejun, alan, benh, jgarzik, linux-ide

Hello,

Alexander, do these problems with the Promise SATA300 TX4 happen to
everyone?

The only alternatives are using soft-raid products as normal
controllers. Does anyone have experience with the following products?
* Highpoint RocketRAID 1640 (150 MB/s)
* Highpoint RocketRAID 1740 (300 MB/s)
* Adaptec 1210SA


Wednesday, October 3, 2007, 10:31:17 AM, you wrote:

> Mikael Pettersson wrote:

>>>
>>> I'm thinking of replacing both 3512 controllers with a Promise SATA300
>>> TX4. Do you know if there are problems with this device?
>> 
>> (please don't top-post)
>> 
>> There are no known data-corruption issues with Promise SATA cards.
>> However, some of them, especially the 2nd generation SATA300 TX4,
>> are known to trigger intermittent error interrupts (that are dealt
>> with but may cause a speed reduction) in some systems. We're still
>> scratching our heads on that issue.
>> 

> But see this thread:

> http://marc.info/?l=linux-ide&m=119122463403033&w=2
> http://www.spinics.net/lists/linux-ide/msg14868.html

> Personally I would not recommend Promise SATA300 TX4 at the moment.




-- 
Best regards,
 MisterE                            mailto:MisterE2002@zonnet.nl




* Re: Sata Sil3512 bug?
  2007-10-14 12:07   ` Re[2]: " MisterE
@ 2007-10-15  8:44     ` Alexander Sabourenkov
  0 siblings, 0 replies; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-15  8:44 UTC (permalink / raw)
  To: MisterE; +Cc: Mikael Pettersson, htejun, alan, benh, jgarzik, linux-ide

MisterE wrote:
> Hello,
> 
> Alexander, do these problems with the Promise SATA300 TX4 happen to
> everyone?
> 

Most probably not, as I think it would have been fixed much faster then.

I was waiting for a) release of 2.6.23, and b) me completing the move to
another flat to retest all the latest developments in mainline and
libata-dev.

With a) done and b) almost done, I'll retest and report any issues quite
soon.

Besides, there is a report of TX4 and 2.6.23 not showing problems that
were there with 2.6.22 (see "Bug is fixed in 2.6.23.1: sata_promise:
port is slow to respond, reset failed" thread).

> The only alternatives are using soft-raid products as normal
> controllers. Does anyone have experience with the following products?
> * Highpoint RocketRAID 1640 (150 MB/s)
> * Highpoint RocketRAID 1740 (300 MB/s)
> * Adaptec 1210SA
> 

For any kind of non-hobby task I'd skip trying to build a disk array and
buy a SATA-SCSI/SATA-iSCSI box instead.
While I had many mind-boggling issues with various combinations of SATA
HDDs, onboard and standalone controllers, Promise and Infortrend disk
arrays worked quite reliably.

-- 

./lxnt


* Re[2]: Sata Sil3512 bug?;  Promise SATA300 TX4
  2007-10-03  8:31 ` Alexander Sabourenkov
  2007-10-03 14:45   ` Re[2]: " MisterE
  2007-10-14 12:07   ` Re[2]: " MisterE
@ 2007-10-17 12:39   ` MisterE
  2007-10-17 12:54     ` Alexander Sabourenkov
  2 siblings, 1 reply; 32+ messages in thread
From: MisterE @ 2007-10-17 12:39 UTC (permalink / raw)
  To: Alexander Sabourenkov
  Cc: Mikael Pettersson, htejun, alan, benh, jgarzik, linux-ide, jeff

Hello,

Wednesday, October 3, 2007, 10:31:17 AM, you wrote:

> Mikael Pettersson wrote:

>>>
>>> I'm thinking of replacing both 3512 controllers with a Promise SATA300
>>> TX4. Do you know if there are problems with this device?
>> 
>> (please don't top-post)
>> 
>> There are no known data-corruption issues with Promise SATA cards.
>> However, some of them, especially the 2nd generation SATA300 TX4,
>> are known to trigger intermittent error interrupts (that are dealt
>> with but may cause a speed reduction) in some systems. We're still
>> scratching our heads on that issue.
>> 

> But see this thread:

> http://marc.info/?l=linux-ide&m=119122463403033&w=2
> http://www.spinics.net/lists/linux-ide/msg14868.html

> Personally I would not recommend Promise SATA300 TX4 at the moment.

After all the problems i had with the sweex 3512 cards i returned them
to the shop and decided to buy a SATA300 TX4 (because the shop nearby
had one; unfortunately the shops in the region don't have Highpoints).

Things looked promising when i inserted the card in both Intel D815EEA
motherboards. No problems detecting the hard drives (unlike with the 3512 cards).
With the 3512 i had LOTS of error messages and corrupt data when writing to it.
Using a separate video card, instead of the onboard one, seemed to reduce the number of errors.

But after some heavy reading/writing with the promise i got 2 errors. (see log file).
But i didn't find any corrupt files, and i cannot reproduce the error.
I'm not sure if these are the "intermittent error interrupts" Mikael
Pettersson mentioned?

ps: as you can see, i got some errors from the boot disk (hda) at boot.
I'm not sure what is wrong with it. Sometimes it produces these
errors. I ran a non-destructive read-write test with badblocks, but no
bad sectors were found. I don't know if this could influence the sata controller.

Alexander Sabourenkov can you please tell me where i can find the
"Bug is fixed in 2.6.23.1: sata_promise: port is slow to respond,
reset failed" thread you mentioned?

I also see that the driver is now at version 2.10. Has anything
really critical changed? I've tried testing with Debian stable
(2.6.18-4-686; sata_promise: 1.04) and with Debian Unstable
(2.6.22-2-686; sata_promise: 2.07). 2.6.23 is not in the repositories
yet.

So basically the question is this: can i trust the SATA300 TX4 or
should i buy a Highpoint RocketRAID 1640/1740? I can order such a device
online but i need to be sure that it works correctly :(

-- 
Best regards,
 MisterE                            mailto:MisterE2002@zonnet.nl


* Re: Sata Sil3512 bug?;  Promise SATA300 TX4
  2007-10-17 12:39   ` Re[2]: Sata Sil3512 bug?; Promise SATA300 TX4 MisterE
@ 2007-10-17 12:54     ` Alexander Sabourenkov
  2007-10-17 15:04       ` Re[2]: " MisterE
  0 siblings, 1 reply; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-17 12:54 UTC (permalink / raw)
  To: MisterE; +Cc: Mikael Pettersson, htejun, alan, benh, jgarzik, linux-ide, jeff

MisterE wrote:
> But after some heavy reading/writing with the promise i got 2 errors. (see log file).

Log file got lost. Please post relevant parts inline.


> Alexander Sabourenkov can you please tell me where i can find the
> "Bug is fixed in 2.6.23.1: sata_promise: port is slow to respond,
> reset failed" thread you mentioned?

That would be this one:
(got split into two parts)
http://www.spinics.net/lists/linux-ide/msg14069.html
http://www.spinics.net/lists/linux-ide/msg15299.html


> 
> So basically the question is this: can i trust the SATA300 TX4 or
> should i buy a Highpoint RocketRAID 1640/1740? I can order such a device
> online but i need to be sure that it works correctly :(
> 

Since you have the hardware, do the tests and decide for yourself.

I'd try copying one (big, preferably over 160G) disk onto another (with
dd) for a start, while waiting for answers on mailing lists.
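
A copy test like this is only conclusive if the result is compared with the source afterwards. The following is a minimal Python sketch of such a comparison, not something posted in this thread: the function name `first_mismatch` and the 1 MiB block size are illustrative assumptions. It reads both paths in parallel and reports the byte offset of the first difference.

```python
def first_mismatch(path_a, path_b, block_size=1024 * 1024):
    """Return the offset of the first differing byte, or None if identical.

    A length difference counts as a mismatch at the end of the shorter input.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if a != b:
                common = min(len(a), len(b))
                for i in range(common):
                    if a[i] != b[i]:
                        return offset + i
                # Common prefix is equal, so one input ended early.
                return offset + common
            if not a:
                # Both inputs ended at the same point without a difference.
                return None
            offset += len(a)
```

When the target disk is larger than the source, the reported offset equals the source size, which only means the source ended first. Running the comparison needs read access to both devices (for example `first_mismatch("/dev/sda", "/dev/sdb")`, with the device names being placeholders).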


-- 

./lxnt


* Re[2]: Sata Sil3512 bug?;  Promise SATA300 TX4
  2007-10-17 12:54     ` Alexander Sabourenkov
@ 2007-10-17 15:04       ` MisterE
  2007-10-17 19:21         ` Peter Favrholdt
  2007-10-18 21:07         ` Alexander Sabourenkov
  0 siblings, 2 replies; 32+ messages in thread
From: MisterE @ 2007-10-17 15:04 UTC (permalink / raw)
  To: Alexander Sabourenkov
  Cc: Mikael Pettersson, htejun, alan, benh, jgarzik, linux-ide, jeff

Hello Alexander,

Wednesday, October 17, 2007, 2:54:25 PM, you wrote:

> Log file got lost. Please post relevant parts inline.

Sorry, i totally forgot to include them.
I can not reproduce the errors. The last few times hda did not give
errors, so i'm not sure whether the two are related. (In that thread you
mentioned that you can't explain what fixed Peter Favrholdt's problem, so
maybe it does indeed have something to do with libata.)

Oct 16 14:10:59 fileserver kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Oct 16 14:10:59 fileserver kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Oct 16 14:10:59 fileserver kernel: ide: failed opcode was: unknown
Oct 16 14:12:49 fileserver kernel: kjournald starting.  Commit interval 5 seconds
Oct 16 14:12:49 fileserver kernel: EXT3 FS on sda1, internal journal
Oct 16 14:12:49 fileserver kernel: EXT3-fs: mounted filesystem with ordered data mode.
Oct 16 14:13:34 fileserver kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Oct 16 14:13:34 fileserver kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Oct 16 14:13:34 fileserver kernel: ide: failed opcode was: unknown
Oct 16 14:17:21 fileserver kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Oct 16 14:17:21 fileserver kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Oct 16 14:17:21 fileserver kernel: ide: failed opcode was: unknown
Oct 16 14:17:21 fileserver kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Oct 16 14:17:21 fileserver kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Oct 16 14:17:21 fileserver kernel: ide: failed opcode was: unknown
Oct 16 14:17:21 fileserver kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Oct 16 14:17:21 fileserver kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Oct 16 14:17:21 fileserver kernel: ide: failed opcode was: unknown
Oct 16 14:17:21 fileserver kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Oct 16 14:17:21 fileserver kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Oct 16 14:17:21 fileserver kernel: ide: failed opcode was: unknown
Oct 16 14:17:21 fileserver kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Oct 16 14:17:21 fileserver kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Oct 16 14:17:21 fileserver kernel: ide: failed opcode was: unknown
Oct 16 14:17:21 fileserver kernel: hdb: DMA disabled
Oct 16 14:17:21 fileserver kernel: ide0: reset: success
Oct 16 14:32:51 fileserver kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
Oct 16 14:32:51 fileserver kernel: ata1.00: (port_status 0x20080000)
Oct 16 14:32:51 fileserver kernel: ata1.00: cmd c8/00:00:77:f6:6c/00:00:00:00:00/e4 tag 0 cdb 0x0 data 131072 in
Oct 16 14:32:51 fileserver kernel:          res 50/00:00:76:f7:6c/00:00:00:00:00/e4 Emask 0x2 (HSM violation)
Oct 16 14:32:51 fileserver kernel: ata1: soft resetting port
Oct 16 14:32:51 fileserver kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 16 14:32:51 fileserver kernel: ata1.00: configured for UDMA/133
Oct 16 14:32:51 fileserver kernel: ata1: EH complete
Oct 16 14:32:51 fileserver kernel: sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
Oct 16 14:32:51 fileserver kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 16 14:32:51 fileserver kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 16 14:32:51 fileserver kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 16 14:44:09 fileserver kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
Oct 16 14:48:48 fileserver kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
Oct 16 14:48:48 fileserver kernel: ata1.00: (port_status 0x20080000)
Oct 16 14:48:48 fileserver kernel: ata1.00: cmd 25/00:00:3f:d0:26/00:01:23:00:00/e0 tag 0 cdb 0x0 data 131072 in
Oct 16 14:48:48 fileserver kernel:          res 50/00:00:3e:d1:26/00:00:23:00:00/e0 Emask 0x2 (HSM violation)
Oct 16 14:48:48 fileserver kernel: ata1: soft resetting port
Oct 16 14:48:49 fileserver kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 16 14:48:49 fileserver kernel: ata1.00: configured for UDMA/133
Oct 16 14:48:49 fileserver kernel: ata1: EH complete
Oct 16 14:48:49 fileserver kernel: sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
Oct 16 14:48:49 fileserver kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 16 14:48:49 fileserver kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 16 14:48:49 fileserver kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
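
For longer test runs it can help to tally the libata exceptions per port instead of reading the log by eye. Below is a small Python sketch that does this for the message layout shown in the log above; the function name `summarise` and the two regular expressions are assumptions based only on these sample lines, so other kernel versions may need different patterns.

```python
import re
from collections import Counter

# "ataN.MM: exception Emask ..." opens an error report for a port.
EXCEPTION_RE = re.compile(r"\b(ata\d+)\.\d+: exception Emask")
# "res .../.. Emask 0x2 (HSM violation)" carries the reason in parentheses.
REASON_RE = re.compile(r"\bres \S+ Emask 0x[0-9a-f]+ \(([^)]+)\)")

def summarise(lines):
    """Count (port, reason) pairs, pairing each exception with its next 'res' line."""
    counts = Counter()
    port = None
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            port = match.group(1)
            continue
        match = REASON_RE.search(line)
        if match and port is not None:
            counts[(port, match.group(1))] += 1
            port = None
    return counts

sample = [
    "Oct 16 14:32:51 fileserver kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2",
    "Oct 16 14:32:51 fileserver kernel: ata1.00: (port_status 0x20080000)",
    "Oct 16 14:32:51 fileserver kernel: ata1.00: cmd c8/00:00:77:f6:6c/00:00:00:00:00/e4 tag 0 cdb 0x0 data 131072 in",
    "Oct 16 14:32:51 fileserver kernel:          res 50/00:00:76:f7:6c/00:00:00:00:00/e4 Emask 0x2 (HSM violation)",
]
print(summarise(sample))
```

Feeding it the lines of a saved kernel log (for example `summarise(open(path))`, where `path` is wherever the log was stored) gives one counter entry per port and error reason.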



> Since you have the hardware, do the tests and decide for yourself.

> I'd try copying one (big, preferably over 160G) disk onto another (with
> dd) for a start, while waiting for answers on mailing lists.


I can order that 1740 online, but returning something is always more
difficult. So i need to be quite sure that there aren't problems with
this Highpoint.

Tonight i will try the Asus motherboard with 1 drive and much I/O. And
i will create a new array, which takes 7 hours. But how often, or for how
many hours, do you need to test something to prove it does not fail? :P

-- 
Best regards,
 MisterE                            mailto:MisterE2002@zonnet.nl




* Re: Sata Sil3512 bug?;  Promise SATA300 TX4
  2007-10-17 15:04       ` Re[2]: " MisterE
@ 2007-10-17 19:21         ` Peter Favrholdt
  2007-10-19 12:02           ` Re[2]: " MisterE
  2007-10-18 21:07         ` Alexander Sabourenkov
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Favrholdt @ 2007-10-17 19:21 UTC (permalink / raw)
  To: MisterE; +Cc: Alexander Sabourenkov, Mikael Pettersson, linux-ide

Hi,

MisterE wrote:
> Tonight i will try the Asus motherboard with 1 drive and much I/O. And
> i will create a new array, which takes 7 hours. But how often, or for how
> many hours, do you need to test something to prove it does not fail? :P

On one box I had problems with the SATA300 TX4 using 2.6.21 through
2.6.22 (different versions). I have 4x500GB Seagate ES SATA drives
connected. The system would run fine, but when put under stress, i.e.
loaded on all SATA ports, one or two ports would fail, one after the
other. I have _always_ been able to make it fail doing:

dd if=/dev/sda of=/dev/null bs=1M &
dd if=/dev/sdb of=/dev/null bs=1M &
dd if=/dev/sdc of=/dev/null bs=1M &
dd if=/dev/sdd of=/dev/null bs=1M &

The ports would freeze before running long - e.g. in less than an hour.

This can be done without even starting the array (mdadm). Therefore no 
data corruption will happen.
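
The same read-only load can also be generated from a script, which makes it easier to record which device failed. This is a minimal Python sketch under stated assumptions, not the test used above: the names `read_all` and `stress_read` are made up for illustration, and the 1 MiB block size simply mirrors `bs=1M`. Each path is read to the end in its own thread and the data is discarded, so nothing is written to the drives.

```python
import threading

def read_all(path, block_size=1024 * 1024):
    """Read path sequentially to the end, discard the data, return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as handle:
        while True:
            chunk = handle.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total

def stress_read(paths, block_size=1024 * 1024):
    """Read all paths concurrently; map each path to its byte count or the OSError hit."""
    results = {}

    def worker(path):
        try:
            results[path] = read_all(path, block_size)
        except OSError as exc:
            # An I/O error from a failing port ends up here instead of killing the run.
            results[path] = exc

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

A call such as `stress_read(["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"])` would correspond to the four `dd` commands and needs read permission on the devices. A port that freezes may block its thread for as long as the kernel's error handling takes.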

The above issue was fixed by updating to vanilla 2.6.23.1.

Until then I had been running with 2.6.21-rc2 with a Mikael Pettersson
patch to force the SATA to 1.5Gbps (this could possibly be accomplished
by jumpers on the drives as well - but I didn't try that).

I have another system (Dell PE1800 = different from the above) running
24x7 using vanilla linux 2.6.19.5. This system has been running without
hiccups for more than a year (current uptime 135 days).

Hope this helps,

Best regards,

Peter


* Re[2]: Sata Sil3512 bug?;  Promise SATA300 TX4
  2007-10-17 19:21         ` Peter Favrholdt
@ 2007-10-19 12:02           ` MisterE
  0 siblings, 0 replies; 32+ messages in thread
From: MisterE @ 2007-10-19 12:02 UTC (permalink / raw)
  To: Peter Favrholdt; +Cc: Alexander Sabourenkov, Mikael Pettersson, linux-ide

Hello Peter,

Wednesday, October 17, 2007, 9:21:28 PM, you wrote:


> On one box I had problems with the SATA300 TX4 using 2.6.21 through
> 2.6.22 (different versions). I have 4x500GB Seagate ES SATA drives
> connected. The system would run fine, but when put under stress, i.e.
> loaded on all SATA ports, one or two ports would fail, one after the
> other. I have _always_ been able to make it fail doing:

> dd if=/dev/sda of=/dev/null bs=1M &
> dd if=/dev/sdb of=/dev/null bs=1M &
> dd if=/dev/sdc of=/dev/null bs=1M &
> dd if=/dev/sdd of=/dev/null bs=1M &

> The ports would freeze before running long - e.g. in less than an hour.

I followed your advice and tested it. I have 4x500GB drives (Western
Digital Caviar SE16 WD5000AAKS). I tested it with and without jumpers
(1.5 and 3.0 Gbps mode). All tests were done with the Asus CUSL2-C.


1 :: The first run; debian 2.6.18-4-686 (stable); 3.0 Gbps [3 hours in total]:
Oct 17 18:06:12 debian kernel: ata1: no sense translation for status: 0x50
Oct 17 18:06:12 debian kernel: ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Oct 17 18:06:12 debian kernel: ata1: status=0x50 { DriveReady SeekComplete }
Oct 17 19:37:15 debian kernel: ata1: no sense translation for status: 0x50
Oct 17 19:37:15 debian kernel: ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Oct 17 19:37:15 debian kernel: ata1: status=0x50 { DriveReady SeekComplete }
Oct 17 19:42:11 debian kernel: ata3: no sense translation for status: 0x50
Oct 17 19:42:12 debian kernel: ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Oct 17 19:42:12 debian kernel: ata3: status=0x50 { DriveReady SeekComplete }
Oct 17 20:23:38 debian kernel: ata1: no sense translation for status: 0x50
Oct 17 20:23:39 debian kernel: ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Oct 17 20:23:39 debian kernel: ata1: status=0x50 { DriveReady SeekComplete }
Oct 17 20:31:38 debian kernel: ata2: no sense translation for status: 0x50
Oct 17 20:31:38 debian kernel: ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Oct 17 20:31:38 debian kernel: ata2: status=0x50 { DriveReady SeekComplete }
Oct 17 20:44:56 debian kernel: ata3: no sense translation for status: 0x50
Oct 17 20:44:56 debian kernel: ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Oct 17 20:44:56 debian kernel: ata3: status=0x50 { DriveReady SeekComplete }

2 :: Second run (1 hour); same settings:
Oct 18 09:27:47 debian kernel: ata4: no sense translation for status: 0x50
Oct 18 09:27:47 debian kernel: ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Oct 18 09:27:47 debian kernel: ata4: status=0x50 { DriveReady SeekComplete }
Oct 18 09:38:18 debian kernel: ata3: no sense translation for status: 0x50
Oct 18 09:38:18 debian kernel: ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Oct 18 09:38:18 debian kernel: ata3: status=0x50 { DriveReady SeekComplete }


3 :: After that, 3 to 5 hours with the drives jumpered. No problems.


4 :: 17:15 - 18:28; 2.6.22-2-686; 3.0 Gbps

Oct 18 13:45:25 fileserver kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
Oct 18 13:45:25 fileserver kernel: ata1.00: (port_status 0x20080000)
Oct 18 13:45:25 fileserver kernel: ata1.00: cmd c8/00:08:00:e6:cb/00:00:00:00:00/e2 tag 0 cdb 0x0 data 4096 in
Oct 18 13:45:25 fileserver kernel:          res 50/00:00:07:e6:cb/00:00:00:00:00/e2 Emask 0x2 (HSM violation)
Oct 18 13:45:26 fileserver kernel: ata1: soft resetting port
Oct 18 13:45:26 fileserver kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 18 13:45:26 fileserver kernel: ata1.00: configured for UDMA/133
Oct 18 13:45:26 fileserver kernel: ata1: EH complete
Oct 18 13:45:26 fileserver kernel: sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
Oct 18 13:45:26 fileserver kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 18 13:45:26 fileserver kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 18 13:45:26 fileserver kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 18 13:57:19 fileserver kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
Oct 18 13:57:19 fileserver kernel: ata2.00: (port_status 0x20080000)
Oct 18 13:57:19 fileserver kernel: ata2.00: cmd c8/00:08:00:e6:92/00:00:00:00:00/e4 tag 0 cdb 0x0 data 4096 in
Oct 18 13:57:19 fileserver kernel:          res 50/00:00:07:e6:92/00:00:00:00:00/e4 Emask 0x2 (HSM violation)
Oct 18 13:57:19 fileserver kernel: ata2: soft resetting port
Oct 18 13:57:20 fileserver kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 18 13:57:20 fileserver kernel: ata2.00: configured for UDMA/133
Oct 18 13:57:20 fileserver kernel: ata2: EH complete
Oct 18 13:57:20 fileserver kernel: sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Oct 18 13:57:20 fileserver kernel: sd 1:0:0:0: [sdb] Write Protect is off
Oct 18 13:57:20 fileserver kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Oct 18 13:57:20 fileserver kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 18 14:09:44 fileserver kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
Oct 18 14:09:44 fileserver kernel: ata1.00: (port_status 0x20080000)
Oct 18 14:09:44 fileserver kernel: ata1.00: cmd c8/00:e0:20:8d:3b/00:00:00:00:00/e6 tag 0 cdb 0x0 data 114688 in
Oct 18 14:09:44 fileserver kernel:          res 50/00:00:ff:8d:3b/00:00:00:00:00/e6 Emask 0x2 (HSM violation)
Oct 18 14:09:44 fileserver kernel: ata1: soft resetting port
Oct 18 14:09:44 fileserver kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 18 14:09:44 fileserver kernel: ata1.00: configured for UDMA/133
Oct 18 14:09:44 fileserver kernel: ata1: EH complete
Oct 18 14:09:44 fileserver kernel: sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
Oct 18 14:09:44 fileserver kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 18 14:09:44 fileserver kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 18 14:09:44 fileserver kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 18 14:15:37 fileserver kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
Oct 18 14:15:37 fileserver kernel: ata3.00: (port_status 0x20080000)
Oct 18 14:15:37 fileserver kernel: ata3.00: cmd c8/00:08:00:4a:27/00:00:00:00:00/e7 tag 0 cdb 0x0 data 4096 in
Oct 18 14:15:37 fileserver kernel:          res 50/00:00:07:4a:27/00:00:00:00:00/e7 Emask 0x2 (HSM violation)
Oct 18 14:15:37 fileserver kernel: ata3: soft resetting port
Oct 18 14:15:38 fileserver kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 18 14:15:38 fileserver kernel: ata3.00: configured for UDMA/133
Oct 18 14:15:38 fileserver kernel: ata3: EH complete
Oct 18 14:15:38 fileserver kernel: sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
Oct 18 14:15:38 fileserver kernel: sd 2:0:0:0: [sdc] Write Protect is off
Oct 18 14:15:38 fileserver kernel: sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Oct 18 14:15:38 fileserver kernel: sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA


5 :: 2.6.22-2-686 - 2 hours with the drives jumpered. No problems.


> The above issue was fixed by updating to vanilla 2.6.23.1.

So, when running in the 1.5 Gbps mode there are no problems.
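
Whether a jumper actually took effect can be read from the `SStatus` value that libata prints on "SATA link up". The sketch below decodes it in Python, assuming the printed value is hexadecimal and using the SStatus field layout from the SATA specification (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8); the helper names are illustrative only.

```python
# Negotiated link speed as encoded in the SPD field of SStatus.
SPEEDS = {0: "no negotiated speed", 1: "1.5 Gbps", 2: "3.0 Gbps"}

def decode_sstatus(hex_text):
    """Split an SStatus value (hex string as printed in the log) into its fields."""
    value = int(hex_text, 16)
    return {
        "det": value & 0xF,         # device detection state
        "spd": (value >> 4) & 0xF,  # negotiated interface speed
        "ipm": (value >> 8) & 0xF,  # interface power management state
    }

def link_speed(hex_text):
    """Return the negotiated speed as text, or 'unknown' for other SPD values."""
    return SPEEDS.get(decode_sstatus(hex_text)["spd"], "unknown")

print(decode_sstatus("123"))  # {'det': 3, 'spd': 2, 'ipm': 1}
print(link_speed("123"))      # 3.0 Gbps
```

With the drives jumpered, the same log line would be expected to show an SPD field of 1, for example `SStatus 113`, which this sketch reports as 1.5 Gbps.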

I'm going to try the same with .23(.1). I'm not really familiar with
updating the kernel. I tried it before with: http://www.debianhelp.co.uk/kernel2.6.htm
but without much success. But i'm going to try...
I will post the results later.



-- 
Best regards,
 MisterE                            mailto:MisterE2002@zonnet.nl




* Re: Sata Sil3512 bug?;  Promise SATA300 TX4
  2007-10-17 15:04       ` Re[2]: " MisterE
  2007-10-17 19:21         ` Peter Favrholdt
@ 2007-10-18 21:07         ` Alexander Sabourenkov
  2007-10-19  1:26           ` Tejun Heo
  1 sibling, 1 reply; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-18 21:07 UTC (permalink / raw)
  To: linux-ide; +Cc: MisterE, Mikael Pettersson, htejun, alan, benh, jgarzik, jeff

Hello.


I have done some quick tests with 2.6.23/amd64 and unfortunately, the
very same problem persists.

By the way, the 8 in (port_status 0x20080000) stands for
        PDC_OVERRUN_ERR         = (1 << 19), /* S/G byte count larger than HD requires */


Does 'S/G' here by any chance relate to the 'sg' in the 'sg-chaining
work' there is so much talk about on the -kernel mailing list?
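
The bit arithmetic can be checked with a few lines of Python. This is only an illustrative sketch: `PDC_OVERRUN_ERR` is taken from the driver excerpt quoted above, `set_bits` is a made-up helper, and the meaning of the other set bit is deliberately left open because it is not given in this thread.

```python
PDC_OVERRUN_ERR = 1 << 19  # value from the sata_promise excerpt quoted above

def set_bits(value):
    """Return the positions of all bits set in a 32-bit value, lowest first."""
    return [bit for bit in range(32) if value & (1 << bit)]

port_status = 0x20080000
print(set_bits(port_status))                # [19, 29]
print(bool(port_status & PDC_OVERRUN_ERR))  # True
print(hex(PDC_OVERRUN_ERR))                 # 0x80000
```

So the "8" in the fifth hex digit from the right is bit 19, which matches the overrun flag.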



In a somewhat parallel development, write errors caused my (other) md
RAID-1 to lose one drive while copying data under 2.6.22
from TX4-attached drives to onboard-VIA-attached ones.

Device: VIA VT6420
00:0f.0 0104: 1106:3149 (rev 80)

Boot:

Oct 17 21:28:25 host sata_via 0000:00:0f.0: version 2.2
Oct 17 21:28:25 host ACPI: PCI Interrupt 0000:00:0f.0[B] -> GSI 20 (level, low) -> IRQ 17
Oct 17 21:28:25 host sata_via 0000:00:0f.0: routed to hard irq line 10
Oct 17 21:28:25 host scsi4 : sata_via
Oct 17 21:28:25 host scsi5 : sata_via

Oct 17 21:28:25 host ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 17 21:28:25 host ata6.00: ATA-7: ST3200827AS, 3.AAH, max UDMA/133
Oct 17 21:28:25 host ata6.00: 390721968 sectors, multi 0: LBA48 NCQ (depth 0/32)
Oct 17 21:28:25 host ata6.00: configured for UDMA/133

... the first two port resets:

Oct 17 23:10:50 host ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0
action 0x2
Oct 17 23:10:50 host ata6.00: (BMDMA stat 0x4)
Oct 17 23:10:50 host ata6.00: cmd ca/00:08:e7:30:00/00:00:00:00:00/e0
tag 0 cdb 0x0 data 4096 out
Oct 17 23:10:50 host res 51/84:08:e7:30:00/00:00:00:00:00/e0 Emask 0x10
(ATA bus error)
Oct 17 23:10:50 host ata6: soft resetting port
Oct 17 23:10:50 host ata6.00: configured for UDMA/133
Oct 17 23:10:50 host ata6: EH complete
Oct 17 23:10:50 host sd 5:0:0:0: [sdd] 390721968 512-byte hardware
sectors (200050 MB)
Oct 17 23:10:50 host sd 5:0:0:0: [sdd] Write Protect is off
Oct 17 23:10:50 host sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Oct 17 23:10:50 host sd 5:0:0:0: [sdd] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
Oct 17 23:10:50 host ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0
action 0x2
Oct 17 23:10:50 host ata6.00: (BMDMA stat 0x5)
Oct 17 23:10:50 host ata6.00: cmd ca/00:f8:4f:31:00/00:00:00:00:00/e0
tag 0 cdb 0x0 data 126976 out
Oct 17 23:10:50 host res 51/84:f8:4f:31:00/00:00:00:00:00/e0 Emask 0x10
(ATA bus error)
Oct 17 23:10:50 host ata6: soft resetting port
Oct 17 23:10:50 host ata6.00: configured for UDMA/133
Oct 17 23:10:50 host ata6: EH complete
Oct 17 23:10:50 host sd 5:0:0:0: [sdd] 390721968 512-byte hardware
sectors (200050 MB)
Oct 17 23:10:50 host sd 5:0:0:0: [sdd] Write Protect is off
Oct 17 23:10:50 host sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Oct 17 23:10:50 host sd 5:0:0:0: [sdd] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
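
The first two fields of the "res 51/84:..." lines above are the ATA
status and error registers. A minimal sketch of decoding them follows;
the bit positions are the standard ATA ones, while the table and
function names are invented for this example and are not libata's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct bitname {
	uint8_t bit;
	const char *name;
};

/* ATA status register bits, most significant first. */
static const struct bitname status_bits[] = {
	{ 0x80, "BSY" }, { 0x40, "DRDY" }, { 0x20, "DF" },
	{ 0x10, "DSC" }, { 0x08, "DRQ" }, { 0x01, "ERR" },
};

/* ATA error register bits, most significant first. */
static const struct bitname error_bits[] = {
	{ 0x80, "ICRC" }, { 0x40, "UNC" }, { 0x10, "IDNF" }, { 0x04, "ABRT" },
};

/* Write the space-separated names of all set bits into out. */
char *decode_bits(uint8_t val, const struct bitname *tbl, size_t n,
		  char *out, size_t outsz)
{
	size_t i, used = 0;

	assert(out != NULL && outsz > 0);
	out[0] = '\0';
	for (i = 0; i < n; i++) {
		int w;

		if (!(val & tbl[i].bit))
			continue;
		w = snprintf(out + used, outsz - used, "%s%s",
			     used ? " " : "", tbl[i].name);
		if (w < 0 || (size_t)w >= outsz - used)
			break;	/* buffer full: stop, output is truncated */
		used += (size_t)w;
	}
	return out;
}

char *decode_status(uint8_t status, char *out, size_t outsz)
{
	return decode_bits(status, status_bits,
			   sizeof(status_bits) / sizeof(status_bits[0]),
			   out, outsz);
}

char *decode_error(uint8_t error, char *out, size_t outsz)
{
	return decode_bits(error, error_bits,
			   sizeof(error_bits) / sizeof(error_bits[0]),
			   out, outsz);
}
```

With these tables, status 0x51 reads as DRDY DSC ERR and error 0x84 as
ICRC ABRT, which seems consistent with the "(ATA bus error)" annotation
in the log.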

... and multiple unsuccessful port resets follow:

Oct 17 23:11:57 host ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0
action 0x2 frozen
Oct 17 23:11:57 host ata6.00: cmd 25/00:08:7f:bf:28/00:00:16:00:00/e0
tag 0 cdb 0x0 data 4096 in
Oct 17 23:11:57 host res 40/00:f8:4f:31:00/00:00:00:00:00/e0 Emask 0x4
(timeout)
Oct 17 23:12:02 host ata6: port is slow to respond, please be patient
(Status 0xd0)
Oct 17 23:12:07 host ata6: soft resetting port
Oct 17 23:12:37 host ata6.00: qc timeout (cmd 0xec)
Oct 17 23:12:37 host ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 17 23:12:37 host ata6.00: revalidation failed (errno=-5)
Oct 17 23:12:37 host ata6: failed to recover some devices, retrying in 5
secs
Oct 17 23:12:47 host ata6: port is slow to respond, please be patient
(Status 0xd0)
Oct 17 23:12:52 host ata6: soft resetting port
Oct 17 23:13:22 host ata6.00: qc timeout (cmd 0xec)
Oct 17 23:13:22 host ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 17 23:13:22 host ata6.00: revalidation failed (errno=-5)
Oct 17 23:13:22 host ata6.00: limiting speed to UDMA/133:PIO3
Oct 17 23:13:22 host ata6: failed to recover some devices, retrying in 5
secs
Oct 17 23:13:32 host ata6: port is slow to respond, please be patient
(Status 0xd0)
Oct 17 23:13:37 host ata6: soft resetting port
Oct 17 23:14:08 host ata6.00: qc timeout (cmd 0xec)
Oct 17 23:14:08 host ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 17 23:14:08 host ata6.00: revalidation failed (errno=-5)
Oct 17 23:14:08 host ata6.00: disabled
Oct 17 23:14:08 host ata6: EH complete
Oct 17 23:14:08 host sd 5:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET
driverbyte=DRIVER_OK,SUGGEST_OK
Oct 17 23:14:08 host end_request: I/O error, dev sdd, sector 371769215
Oct 17 23:14:08 host raid1: sdd1: rescheduling sector 371769152
Oct 17 23:14:08 host sd 5:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET
driverbyte=DRIVER_OK,SUGGEST_OK
Oct 17 23:14:08 host end_request: I/O error, dev sdd, sector 390379327
Oct 17 23:14:08 host md: super_written gets error=-5, uptodate=0
Oct 17 23:14:08 host raid1: Disk failure on sdd1, disabling device.
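
As a cross-check of the log above, here is a small sketch that rebuilds
the LBA and transfer size from the taskfile bytes in a "cmd" line. It
assumes the field order command/feature:nsect:lbal:lbam:lbah/
hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device and 512-byte
sectors; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a 48-bit LBA from the six LBA bytes of a taskfile. */
uint64_t tf_lba48(uint8_t lbal, uint8_t lbam, uint8_t lbah,
		  uint8_t hob_lbal, uint8_t hob_lbam, uint8_t hob_lbah)
{
	return (uint64_t)lbal |
	       ((uint64_t)lbam << 8) |
	       ((uint64_t)lbah << 16) |
	       ((uint64_t)hob_lbal << 24) |
	       ((uint64_t)hob_lbam << 32) |
	       ((uint64_t)hob_lbah << 40);
}

/*
 * Transfer size in bytes for a sector count, assuming 512-byte sectors.
 * The special meaning of a zero count is not handled in this sketch.
 */
uint32_t tf_bytes(uint16_t nsect)
{
	return (uint32_t)nsect * 512u;
}
```

For "cmd 25/00:08:7f:bf:28/00:00:16:00:00/e0" this gives LBA 371769215,
the same sector that end_request reports, and 0x08 sectors correspond to
the "data 4096" in the log.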

I'm unable to reproduce this on 2.6.23, so this is of historic interest
only.

-- 

./lxnt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Sata Sil3512 bug?;  Promise SATA300 TX4
  2007-10-18 21:07         ` Alexander Sabourenkov
@ 2007-10-19  1:26           ` Tejun Heo
  2007-10-19 21:06             ` Alexander Sabourenkov
  0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2007-10-19  1:26 UTC (permalink / raw)
  To: Alexander Sabourenkov
  Cc: linux-ide, MisterE, Mikael Pettersson, alan, benh, jgarzik, jeff

Hello,

Alexander Sabourenkov wrote:
> In a somewhat parallel development, write errors caused my (other) md
> RAID-1 to lose one drive while copying data under 2.6.22
> from TX4-attached drives to onboard-VIA-attached ones.
 >
> ... the first two port resets:
> 
> Oct 17 23:10:50 host ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0
> action 0x2
> Oct 17 23:10:50 host ata6.00: (BMDMA stat 0x4)
> Oct 17 23:10:50 host ata6.00: cmd ca/00:08:e7:30:00/00:00:00:00:00/e0
> tag 0 cdb 0x0 data 4096 out
> Oct 17 23:10:50 host res 51/84:08:e7:30:00/00:00:00:00:00/e0 Emask 0x10
> (ATA bus error)
> Oct 17 23:10:50 host ata6: soft resetting port
> Oct 17 23:10:50 host ata6.00: configured for UDMA/133
> Oct 17 23:10:50 host ata6: EH complete
[--snip--]
> Oct 17 23:13:37 host ata6: soft resetting port
> Oct 17 23:14:08 host ata6.00: qc timeout (cmd 0xec)
> Oct 17 23:14:08 host ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> Oct 17 23:14:08 host ata6.00: revalidation failed (errno=-5)
> Oct 17 23:14:08 host ata6.00: disabled
> Oct 17 23:14:08 host ata6: EH complete
> Oct 17 23:14:08 host sd 5:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK,SUGGEST_OK
> Oct 17 23:14:08 host end_request: I/O error, dev sdd, sector 371769215
> Oct 17 23:14:08 host raid1: sdd1: rescheduling sector 371769152
> Oct 17 23:14:08 host sd 5:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK,SUGGEST_OK
> Oct 17 23:14:08 host end_request: I/O error, dev sdd, sector 390379327
> Oct 17 23:14:08 host md: super_written gets error=-5, uptodate=0
> Oct 17 23:14:08 host raid1: Disk failure on sdd1, disabling device.
> 
> I'm unable to reproduce this on 2.6.23, so this is of historic interest
> only.

It might not have anything to do with the OS or the driver.  Some SATA
controllers and/or drives aren't very reliable and just fail from
time to time.  My previous desktop used sata_nv with Seagate SATA
drives and was up 24/7.  I used it for about two years, and during that
time there was a single transfer error that brought the drive down
completely; I had to reboot and rebuild my RAID 1 array.  ISTR that what
died was the controller port.  IIRC, power-cycling the drive
didn't help.

Another interesting case was first-gen SATA hard drives from a certain
vendor.  After any transfer error, those drives went completely deaf.
The only way to recover them was removing power, waiting a bit and
reapplying it.

So, my bet for your second report is that your hardware went through
something similar to the above.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Sata Sil3512 bug?;  Promise SATA300 TX4
  2007-10-19  1:26           ` Tejun Heo
@ 2007-10-19 21:06             ` Alexander Sabourenkov
  2007-10-19 22:58               ` Re[2]: " MisterE
  2007-10-19 23:58               ` Tejun Heo
  0 siblings, 2 replies; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-19 21:06 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, MisterE, alan, benh, jgarzik, jeff

Hello.

> 
> So, my bet for your second report is your hardware went through
> something similar as above.
> 

Thanks for the insight. Let's dismiss it then.

Back to the TX4: I tried libata-dev.git cloned at about 20:00 UTC on
19.10, with no perceived difference. Parallel reads from two drives
still cause a lot of errors.

dmesgs with boot and errors are at http://lxnt.info/linux/libata-dev/

I don't know what to try next. Any ideas?

-- 

./lxnt






^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re[2]: Sata Sil3512 bug?;  Promise SATA300 TX4
  2007-10-19 21:06             ` Alexander Sabourenkov
@ 2007-10-19 22:58               ` MisterE
  2007-10-19 23:58               ` Tejun Heo
  1 sibling, 0 replies; 32+ messages in thread
From: MisterE @ 2007-10-19 22:58 UTC (permalink / raw)
  To: Alexander Sabourenkov; +Cc: Tejun Heo, linux-ide, alan, benh, jgarzik, jeff

Hello Alexander,

Friday, October 19, 2007, 11:06:02 PM, you wrote:

> I don't know what to try next. Any ideas?

I'm no kernel hacker, but I'll take a shot.
I assume you have done most of this already...

* Hardware: tested without, or swapped in another, motherboard, video
card, memory, hard drives, power supply, and all other hardware.

* Tried a more n00b-proof distribution. As far as I know you have all
those flags with Gentoo. A mistake is easily made.

* Tested with the latest official drivers (Red Hat) from the Promise
site, with that OS installed on a disk. I assume they made working
drivers, so it should work with them...

* Does it work correctly with Windows?

These are the steps I would take to determine the cause of the
problem.

Finally, my 2.6.23 kernel is done. I'm going to try to install it now.
Results tomorrow :)

-- 
Best regards,
 MisterE                            mailto:MisterE2002@zonnet.nl

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Sata Sil3512 bug?;  Promise SATA300 TX4
  2007-10-19 21:06             ` Alexander Sabourenkov
  2007-10-19 22:58               ` Re[2]: " MisterE
@ 2007-10-19 23:58               ` Tejun Heo
  2007-10-20 21:50                 ` Alexander Sabourenkov
  1 sibling, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2007-10-19 23:58 UTC (permalink / raw)
  To: Alexander Sabourenkov; +Cc: linux-ide, MisterE, alan, benh, jgarzik, jeff

[-- Attachment #1: Type: text/plain, Size: 527 bytes --]

Alexander Sabourenkov wrote:
> Hello.
> 
>> So, my bet for your second report is your hardware went through
>> something similar as above.
>>
> 
> Thanks for the insight. Let's dismiss it then.
> 
> Back to the TX4, I tried libata-dev.git cloned at about 20:00 UTC 19.10,
>   no perceived difference - parallel read from two drives causes a lot
> of  errors.
> 
> dmesgs  with boot and errors are at http://lxnt.info/linux/libata-dev/
> 
> I don't know what to try next. Any ideas?
> 

Does the attached patch help?

-- 
tejun

[-- Attachment #2: limit-PHY-to-1.5Gbps.patch --]
[-- Type: text/plain, Size: 402 bytes --]

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 68699b3..4c93fee 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6435,6 +6435,7 @@ int sata_link_init_spd(struct ata_link *link)
 	spd = (scontrol >> 4) & 0xf;
 	if (spd)
 		link->hw_sata_spd_limit &= (1 << spd) - 1;
+	link->hw_sata_spd_limit = 1;
 
 	link->sata_spd_limit = link->hw_sata_spd_limit;
 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: Sata Sil3512 bug?;  Promise SATA300 TX4
  2007-10-19 23:58               ` Tejun Heo
@ 2007-10-20 21:50                 ` Alexander Sabourenkov
  2007-10-27 13:24                   ` [PATCH-RFC] (was: Re: Sata Sil3512 bug?; Promise SATA300 TX4) Alexander Sabourenkov
  0 siblings, 1 reply; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-20 21:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, MisterE, alan, benh, jgarzik, jeff

Hello.

Tejun Heo wrote:
> 
> Does the attached patch help?
> 

It does somehow force 1.5 Gbps mode, it changes the pattern of
'configured for UDMAxxx' messages that come along with errors, and it
causes the following error:

ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xb t4
ata3: hotplug_status 0x10
ata3: soft resetting link
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata3.00: configured for UDMA/133
ata3: EH complete

for both drives on TX4 on startup, but read errors are still there.
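
To illustrate the mask arithmetic from the patch hunk, here is a minimal
user-space sketch. The two helper names are hypothetical; the shift and
mask mirror the quoted lines from sata_link_init_spd(), and the SControl
values in the log are read as hexadecimal.

```c
#include <assert.h>
#include <stdint.h>

/* SPD field of SControl (bits 7:4); 0 means no speed limit is set. */
unsigned int scontrol_spd(uint32_t scontrol)
{
	return (scontrol >> 4) & 0xf;
}

/*
 * Apply the SControl SPD limit to a speed mask in which bit 0 stands
 * for 1.5 Gbps and bit 1 for 3.0 Gbps, as in the quoted hunk.
 */
uint32_t spd_limit_mask(uint32_t hw_limit, uint32_t scontrol)
{
	unsigned int spd = scontrol_spd(scontrol);

	if (spd)
		hw_limit &= (1u << spd) - 1;
	return hw_limit;
}
```

Under that reading, SControl 300 carries no limit (SPD = 0), while
SControl 310 has SPD = 1, which reduces the mask to 1, that is, 1.5 Gbps
only. This matches the value the test patch forces into
hw_sata_spd_limit.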

dmesgs at http://lxnt.info/linux/libata-dev/patch0/



-- 

./lxnt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH-RFC] (was: Re: Sata Sil3512 bug?;  Promise SATA300 TX4)
  2007-10-20 21:50                 ` Alexander Sabourenkov
@ 2007-10-27 13:24                   ` Alexander Sabourenkov
  2007-10-27 13:44                     ` [PATCH-RFC] Alexander Sabourenkov
  0 siblings, 1 reply; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-27 13:24 UTC (permalink / raw)
  To: linux-ide; +Cc: Tejun Heo, MisterE, benh, jgarzik, jeff

Hello.

There appears to be a hardware bug: the controller chokes on a
scatterlist whose last item is larger than 164 bytes.

The patch that follows fixes my problem on 2.6.22.

I can't think of a way to avoid a second pass over the scatterlist
without duplicating code (ata_qc_prep() and ata_fill_sg() from
libata-core.c).


--- a/drivers/ata/sata_promise.c	2007-07-09 03:32:17.000000000 +0400
+++ b/drivers/ata/sata_promise.c	2007-10-27 17:20:03.000000000 +0400
@@ -531,6 +531,80 @@
 	memcpy(buf+31, cdb, cdb_len);
 }

+/**
+ *	pdc_qc_prep - Fill PCI IDE PRD table
+ *	@qc: Metadata associated with taskfile to be transferred
+ *
+ *	Fill PCI IDE PRD (scatter-gather) table with segments
+ *	associated with the current disk command.
+ *	Make sure hardware does not choke on it.
+ *
+ *	LOCKING:
+ *	spin_lock_irqsave(host lock)
+ *
+ */
+static void pdc_qc_prep(struct ata_queued_cmd *qc)
+{
+	struct ata_port *ap = qc->ap;
+	struct scatterlist *sg;
+	unsigned int idx;
+	const u32 SG_COUNT_ASIC_BUG = 41*4;
+
+	if (!(qc->flags & ATA_QCFLAG_DMAMAP))
+		return;
+	
+	WARN_ON(qc->__sg == NULL);
+	WARN_ON(qc->n_elem == 0 && qc->pad_len == 0);
+
+	idx = 0;
+	ata_for_each_sg(sg, qc) {
+		u32 addr, offset;
+		u32 sg_len, len;
+
+		/* determine if physical DMA addr spans 64K boundary.
+		 * Note h/w doesn't support 64-bit, so we unconditionally
+		 * truncate dma_addr_t to u32.
+		 */
+		addr = (u32) sg_dma_address(sg);
+		sg_len = sg_dma_len(sg);
+
+		while (sg_len) {
+			offset = addr & 0xffff;
+			len = sg_len;
+			if ((offset + sg_len) > 0x10000)
+				len = 0x10000 - offset;
+
+			ap->prd[idx].addr = cpu_to_le32(addr);
+			ap->prd[idx].flags_len = cpu_to_le32(len & 0xffff);
+			VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", idx, addr, len);
+
+			idx++;
+			sg_len -= len;
+			addr += len;
+		}
+	}
+
+	if (idx) {
+		u32 len = ap->prd[idx - 1].flags_len;
+		if (len > SG_COUNT_ASIC_BUG) {
+			u32 addr, len;
+
+			VPRINTK("Last PRD split\n");
+			
+			len = le32_to_cpu(ap->prd[idx - 1].flags_len) - SG_COUNT_ASIC_BUG;
+			addr = le32_to_cpu(ap->prd[idx - 1].addr);
+			ap->prd[idx - 1].flags_len = cpu_to_le32(len);
+			VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", idx, addr, len);
+			
+			ap->prd[idx].flags_len = cpu_to_le32(SG_COUNT_ASIC_BUG);
+			ap->prd[idx].addr = cpu_to_le32(addr + len);
+			idx++;
+			VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", idx, addr + len, SG_COUNT_ASIC_BUG);
+		}
+		ap->prd[idx - 1].flags_len |= cpu_to_le32(ATA_PRD_EOT);
+	}
+}
+
 static void pdc_qc_prep(struct ata_queued_cmd *qc)
 {
 	struct pdc_port_priv *pp = qc->ap->private_data;
@@ -540,7 +614,7 @@

 	switch (qc->tf.protocol) {
 	case ATA_PROT_DMA:
-		ata_qc_prep(qc);
+		pdc_qc_prep(qc);
 		/* fall through */

 	case ATA_PROT_NODATA:


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH-RFC]
  2007-10-27 13:24                   ` [PATCH-RFC] (was: Re: Sata Sil3512 bug?; Promise SATA300 TX4) Alexander Sabourenkov
@ 2007-10-27 13:44                     ` Alexander Sabourenkov
  2007-10-27 14:08                       ` Re[2]: [PATCH-RFC] MisterE
  2007-10-27 15:16                       ` [PATCH-RFC] Promise TX4 implement hw-bug workaround Alexander Sabourenkov
  0 siblings, 2 replies; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-27 13:44 UTC (permalink / raw)
  To: Alexander Sabourenkov; +Cc: linux-ide, Tejun Heo, MisterE, benh, jgarzik, jeff

Alexander Sabourenkov wrote:
> Hello.
> 
> There appears to be a hardware bug in that it chokes on scatterlist
> if the last item is larger than 164 bytes.
> 
> The patch that follows fixes my problem on 2.6.22.
> 
> I can't think of a way to avoid second pass over scatterlist without
> duplicating code (ata_qc_prep() and ata_fill_sg() from libata-core.c).
> 
> 

Sorry, this was the wrong patch :(. Two days of looking at vendor code
must have driven me insane. I will send the correct one ASAP.

-- 

./lxnt


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re[2]: [PATCH-RFC]
  2007-10-27 13:44                     ` [PATCH-RFC] Alexander Sabourenkov
@ 2007-10-27 14:08                       ` MisterE
  2007-10-27 15:09                         ` [PATCH-RFC] Alexander Sabourenkov
  2007-10-27 15:16                       ` [PATCH-RFC] Promise TX4 implement hw-bug workaround Alexander Sabourenkov
  1 sibling, 1 reply; 32+ messages in thread
From: MisterE @ 2007-10-27 14:08 UTC (permalink / raw)
  To: Alexander Sabourenkov; +Cc: linux-ide, Tejun Heo, benh, jgarzik, jeff

Hello Alexander,

Saturday, October 27, 2007, 3:44:51 PM, you wrote:

>> There appears to be a hardware bug in that it chokes on scatterlist
>> if the last item is larger than 164 bytes.

Can you confirm that this only happens when running in 300 (3.0 Gbps)
mode? I have the drives jumpered and have no errors. I tested the "copy
to null" method several times with several kernel versions. I'm now in
the phase of copying all my data to the fileserver.

I'm willing to try your patch, but I'm not an experienced Linux guru ;)
I once patched the kernel source (2.6.23 to 2.6.23.1), but I got stuck
on how to install the updated driver....

-- 
Best regards,
 MisterE                            mailto:MisterE2002@zonnet.nl

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH-RFC]
  2007-10-27 14:08                       ` Re[2]: [PATCH-RFC] MisterE
@ 2007-10-27 15:09                         ` Alexander Sabourenkov
  0 siblings, 0 replies; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-27 15:09 UTC (permalink / raw)
  To: MisterE; +Cc: linux-ide

MisterE wrote:
> 
> Can you confirm that this only will happen when running at 300Gb mode?

I confirm that without this patch errors happen in both 150 and 300
modes, on both jumpered and unjumpered drives. It seems that the errors
are highly hardware/configuration dependent.

> I'm willing to try your patch but i'm not a experienced linux guru ;)

I would not advise trying this patch now if you do not experience
problems, and certainly not with any valuable data behind the controller.

-- 

./lxnt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH-RFC] Promise TX4 implement hw-bug workaround
  2007-10-27 13:44                     ` [PATCH-RFC] Alexander Sabourenkov
  2007-10-27 14:08                       ` Re[2]: [PATCH-RFC] MisterE
@ 2007-10-27 15:16                       ` Alexander Sabourenkov
  2007-10-27 18:09                         ` Alan Cox
  2007-10-28 10:29                         ` Jeff Garzik
  1 sibling, 2 replies; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-27 15:16 UTC (permalink / raw)
  To: Alexander Sabourenkov; +Cc: linux-ide, Tejun Heo, MisterE, benh, jgarzik, jeff

Hello.
Once again,

There appears to be a hardware bug: the controller chokes on a
scatterlist whose last item is larger than 164 bytes. This was
discovered by reading the code of the vendor-supplied driver.

The patch that follows fixes my problem on 2.6.22.

I can't think of a way to avoid a second pass over the scatterlist
without duplicating code (ata_qc_prep() and ata_fill_sg() from
libata-core.c).
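
The idea of the workaround can be sketched in user space as follows.
This is only a model under stated assumptions: struct prd here is a
host-endian stand-in (not the hardware PRD layout), split_last_prd() and
ASIC_BUG_LIMIT are names made up for the example, and the EOT flag is
left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASIC_BUG_LIMIT (41 * 4)	/* 164 bytes, per the vendor driver */

/* Host-endian model of one table entry; not the hardware layout. */
struct prd {
	uint32_t addr;
	uint32_t len;
};

/*
 * If the last entry is longer than the limit, move its final 164 bytes
 * into a new trailing entry. n is the number of used entries, cap the
 * table capacity. Returns the new number of used entries.
 */
size_t split_last_prd(struct prd *tbl, size_t n, size_t cap)
{
	struct prd *last;

	if (n == 0)
		return 0;
	last = &tbl[n - 1];
	if (last->len <= ASIC_BUG_LIMIT)
		return n;

	assert(n < cap);	/* the split needs one spare slot */
	tbl[n].addr = last->addr + last->len - ASIC_BUG_LIMIT;
	tbl[n].len = ASIC_BUG_LIMIT;
	last->len -= ASIC_BUG_LIMIT;
	return n + 1;
}
```

A single 4096-byte entry at 0x1000 becomes two entries: 3932 bytes at
0x1000 followed by 164 bytes at 0x1f5c. The assert on the table capacity
marks the one place where the split can run out of room.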





--- a/drivers/ata/sata_promise.c	2007-07-09 03:32:17.000000000 +0400
+++ b/drivers/ata/sata_promise.c	2007-10-27 19:12:46.000000000 +0400
@@ -531,6 +531,87 @@
 	memcpy(buf+31, cdb, cdb_len);
 }

+/**
+ *	pdc_fill_sg - Fill PCI IDE PRD table
+ *	@qc: Metadata associated with taskfile to be transferred
+ *
+ *	Fill PCI IDE PRD (scatter-gather) table with segments
+ *	associated with the current disk command.
+ *	Make sure hardware does not choke on it.
+ *
+ *	LOCKING:
+ *	spin_lock_irqsave(host lock)
+ *
+ */
+static void pdc_fill_sg(struct ata_queued_cmd *qc)
+{
+	struct ata_port *ap = qc->ap;
+	struct scatterlist *sg;
+	unsigned int idx;
+	const u32 SG_COUNT_ASIC_BUG = 41*4;
+
+	if (!(qc->flags & ATA_QCFLAG_DMAMAP))
+		return;
+	
+	WARN_ON(qc->__sg == NULL);
+	WARN_ON(qc->n_elem == 0 && qc->pad_len == 0);
+
+	idx = 0;
+	ata_for_each_sg(sg, qc) {
+		u32 addr, offset;
+		u32 sg_len, len;
+
+		/* determine if physical DMA addr spans 64K boundary.
+		 * Note h/w doesn't support 64-bit, so we unconditionally
+		 * truncate dma_addr_t to u32.
+		 */
+		addr = (u32) sg_dma_address(sg);
+		sg_len = sg_dma_len(sg);
+
+		while (sg_len) {
+			offset = addr & 0xffff;
+			len = sg_len;
+			if ((offset + sg_len) > 0x10000)
+				len = 0x10000 - offset;
+
+			ap->prd[idx].addr = cpu_to_le32(addr);
+			ap->prd[idx].flags_len = cpu_to_le32(len & 0xffff);
+			VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", idx, addr, len);
+
+			idx++;
+			sg_len -= len;
+			addr += len;
+		}
+	}
+
+	if (idx) {
+		u32 len = le32_to_cpu(ap->prd[idx - 1].flags_len);
+
+		if (len > SG_COUNT_ASIC_BUG) {
+			u32 addr;
+			/* if len < 2*SG_COUNT_ASIC_BUG then last
+			   segment will be larger than next-to-last.
+			   Somewhat ugly :(
+			*/
+
+			VPRINTK("Splitting last PRD.\n");
+
+			ap->prd[idx - 1].flags_len = cpu_to_le32(len - SG_COUNT_ASIC_BUG);
+			VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", idx - 1, le32_to_cpu(ap->prd[idx - 1].addr), len - SG_COUNT_ASIC_BUG);
+			
+			addr = le32_to_cpu(ap->prd[idx - 1].addr) + len - SG_COUNT_ASIC_BUG;
+			len  = SG_COUNT_ASIC_BUG;
+			ap->prd[idx].addr = cpu_to_le32(addr);
+			ap->prd[idx].flags_len = cpu_to_le32(len);
+			VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", idx, addr, len);
+
+			idx++;
+		}
+
+		ap->prd[idx - 1].flags_len |= cpu_to_le32(ATA_PRD_EOT);
+	}
+}
+
 static void pdc_qc_prep(struct ata_queued_cmd *qc)
 {
 	struct pdc_port_priv *pp = qc->ap->private_data;
@@ -540,7 +621,7 @@

 	switch (qc->tf.protocol) {
 	case ATA_PROT_DMA:
-		ata_qc_prep(qc);
+		pdc_fill_sg(qc);
 		/* fall through */

 	case ATA_PROT_NODATA:
@@ -556,11 +637,11 @@
 		break;

 	case ATA_PROT_ATAPI:
-		ata_qc_prep(qc);
+		pdc_fill_sg(qc);
 		break;

 	case ATA_PROT_ATAPI_DMA:
-		ata_qc_prep(qc);
+		pdc_fill_sg(qc);
 		/*FALLTHROUGH*/
 	case ATA_PROT_ATAPI_NODATA:
 		pdc_atapi_pkt(qc);


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH-RFC] Promise TX4 implement hw-bug workaround
  2007-10-27 15:16                       ` [PATCH-RFC] Promise TX4 implement hw-bug workaround Alexander Sabourenkov
@ 2007-10-27 18:09                         ` Alan Cox
  2007-10-27 18:18                           ` Alexander Sabourenkov
  2007-10-28 10:29                         ` Jeff Garzik
  1 sibling, 1 reply; 32+ messages in thread
From: Alan Cox @ 2007-10-27 18:09 UTC (permalink / raw)
  Cc: Alexander Sabourenkov, linux-ide, Tejun Heo, MisterE, benh,
	jgarzik, jeff

> I can't think of a way to avoid second pass over scatterlist without
> duplicating code (ata_qc_prep() and ata_fill_sg() from libata-core.c).

This appears to be incomplete:

> +			VPRINTK("Splitting last PRD.\n");
> +
> +			ap->prd[idx - 1].flags_len -= cpu_to_le32(SG_COUNT_ASIC_BUG);
> +			VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", idx - 1, addr, SG_COUNT_ASIC_BUG);
> +			
> +			addr = le32_to_cpu(ap->prd[idx - 1].addr) + len - SG_COUNT_ASIC_BUG;
> +			len  = SG_COUNT_ASIC_BUG;
> +			ap->prd[idx].addr = cpu_to_le32(addr);
> +			ap->prd[idx].flags_len = cpu_to_le32(len);
> +			VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", idx, addr, len);
> +
> +			idx++;

What guarantees you have enough PRD entries to do this without changing
the limit in the structures ?

Otherwise looks good

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH-RFC] Promise TX4 implement hw-bug workaround
  2007-10-27 18:09                         ` Alan Cox
@ 2007-10-27 18:18                           ` Alexander Sabourenkov
  2007-10-27 18:37                             ` Alexander Sabourenkov
  2007-10-28  8:21                             ` Jeff Garzik
  0 siblings, 2 replies; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-27 18:18 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-ide, Tejun Heo, MisterE, benh, jgarzik, jeff

Alan Cox wrote:
>> I can't think of a way to avoid second pass over scatterlist without
>> duplicating code (ata_qc_prep() and ata_fill_sg() from libata-core.c).
> 
> This appears to be incomplete:
> 

[...]

> 
> What guarantees you have enough PRD entries to do this without changing
> the limit in the structures ?
> 
> Otherwise looks good

PRD entries count is 256
include/linux/ata.h:
	ATA_MAX_PRD		= 256,
	ATA_PRD_TBL_SZ          = (ATA_MAX_PRD * ATA_PRD_SZ),

drivers/ata/libata-core.c:
 ap->prd = dmam_alloc_coherent(dev, ATA_PRD_TBL_SZ, &ap->prd_dma,

sata_promise Scsi_Host declares support for half of that:

include/linux/libata.h:
LIBATA_MAX_PRD		= ATA_MAX_PRD / 2,

drivers/ata/sata_promise.c
    .sg_tablesize           = LIBATA_MAX_PRD,


PS: Vendor code has this limit at 32.

-- 

./lxnt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH-RFC] Promise TX4 implement hw-bug workaround
  2007-10-27 18:18                           ` Alexander Sabourenkov
@ 2007-10-27 18:37                             ` Alexander Sabourenkov
  2007-10-28  8:21                             ` Jeff Garzik
  1 sibling, 0 replies; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-27 18:37 UTC (permalink / raw)
  To: Alexander Sabourenkov
  Cc: Alan Cox, linux-ide, Tejun Heo, MisterE, benh, jgarzik, jeff

Alexander Sabourenkov wrote:
> Alan Cox wrote:
>>> I can't think of a way to avoid second pass over scatterlist without
>>> duplicating code (ata_qc_prep() and ata_fill_sg() from libata-core.c).
>> This appears to be incomplete:
>>
> 
> [...]
> 
>> What guarantees you have enough PRD entries to do this without changing
>> the limit in the structures ?
>>
>> Otherwise looks good
> 
> PRD entries count is 256
> include/linux/ata.h:
> 	ATA_MAX_PRD		= 256,
> 	ATA_PRD_TBL_SZ          = (ATA_MAX_PRD * ATA_PRD_SZ),
> 
> drivers/ata/libata-core.c:
>  ap->prd = dmam_alloc_coherent(dev, ATA_PRD_TBL_SZ, &ap->prd_dma,
> 
> sata_promise Scsi_Host declares support for half of that:
> 
> include/linux/libata.h:
> LIBATA_MAX_PRD		= ATA_MAX_PRD / 2,
> 
> drivers/ata/sata_promise.c
>     .sg_tablesize           = LIBATA_MAX_PRD,
> 
> 
> PS: Vendor code has this limit at 32.
> 

That's an interesting question in itself. I don't know what limits the
PRD count, but if it's the hardware, then the driver should somehow make
sure that it gets no more than the hardware can handle, minus one for
this erratum.

Right now the driver declares that any hardware it supports can handle
128 PRD entries. If this is not true for any existing specimen, we're
asking for trouble.

-- 

./lxnt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH-RFC] Promise TX4 implement hw-bug workaround
  2007-10-27 18:18                           ` Alexander Sabourenkov
  2007-10-27 18:37                             ` Alexander Sabourenkov
@ 2007-10-28  8:21                             ` Jeff Garzik
  2007-10-28 20:03                               ` Alexander Sabourenkov
  1 sibling, 1 reply; 32+ messages in thread
From: Jeff Garzik @ 2007-10-28  8:21 UTC (permalink / raw)
  To: Alexander Sabourenkov
  Cc: Alan Cox, linux-ide, Tejun Heo, MisterE, benh, jgarzik

Alexander Sabourenkov wrote:
> Alan Cox wrote:
>>> I can't think of a way to avoid second pass over scatterlist without
>>> duplicating code (ata_qc_prep() and ata_fill_sg() from libata-core.c).
>> This appears to be incomplete:
>>
> 
> [...]
> 
>> What guarantees you have enough PRD entries to do this without changing
>> the limit in the structures ?
>>
>> Otherwise looks good
> 
> PRD entries count is 256
> include/linux/ata.h:
> 	ATA_MAX_PRD		= 256,
> 	ATA_PRD_TBL_SZ          = (ATA_MAX_PRD * ATA_PRD_SZ),
> 
> drivers/ata/libata-core.c:
>  ap->prd = dmam_alloc_coherent(dev, ATA_PRD_TBL_SZ, &ap->prd_dma,
> 
> sata_promise Scsi_Host declares support for half of that:
> 
> include/linux/libata.h:
> LIBATA_MAX_PRD		= ATA_MAX_PRD / 2,
> 
> drivers/ata/sata_promise.c
>     .sg_tablesize           = LIBATA_MAX_PRD,

Alan's point was that the existing code will give you up to 
LIBATA_MAX_PRD entries.  After the post-virtual-merge splitting code in 
ata_fill_sg() executes, the worst case result is ATA_MAX_PRD entries.

Thus, since your code has the potential to increase the number of s/g 
entries above that, it can potentially corrupt memory, lock up the 
machine, all the wonderful things that can happen when you run off the 
end of the s/g list.

The fix is to decrease .sg_tablesize (LIBATA_MAX_PRD - 2 perhaps?) so 
that you guarantee this worst case never occurs, by guaranteeing that 
the system never sends you enough s/g entries to cause your code to go 
out of bounds.
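
The worst case described above can be checked with a small user-space
sketch. It assumes no s/g segment is larger than 64K, so each segment
splits into at most two entries; prd_entries_for() and worst_case_prd()
are hypothetical names, and the two constants are the ones quoted
earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define ATA_MAX_PRD	256			/* include/linux/ata.h */
#define LIBATA_MAX_PRD	(ATA_MAX_PRD / 2)	/* include/linux/libata.h */

/* PRD entries needed for one segment when no entry may cross 64K. */
unsigned int prd_entries_for(uint32_t addr, uint32_t len)
{
	unsigned int n = 0;

	while (len) {
		uint32_t offset = addr & 0xffff;
		uint32_t chunk = len;

		if (chunk > 0x10000 - offset)
			chunk = 0x10000 - offset;
		n++;
		len -= chunk;
		addr += chunk;
	}
	return n;
}

/*
 * Upper bound on table usage: every segment splits in two at a 64K
 * boundary, plus one extra entry for the last-entry workaround.
 */
unsigned int worst_case_prd(unsigned int sg_tablesize)
{
	return sg_tablesize * 2 + 1;
}
```

With .sg_tablesize = LIBATA_MAX_PRD the bound is 257 entries, one more
than ATA_MAX_PRD; with LIBATA_MAX_PRD - 2 it drops to 253, which fits.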

	Jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH-RFC] Promise TX4 implement hw-bug workaround
  2007-10-28  8:21                             ` Jeff Garzik
@ 2007-10-28 20:03                               ` Alexander Sabourenkov
  0 siblings, 0 replies; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-28 20:03 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Alan Cox, linux-ide, Tejun Heo, MisterE, benh, jgarzik

Jeff Garzik wrote:
> 
> Alan's point was that the existing code will give you up to
> LIBATA_MAX_PRD entries.  After the post-virtual-merge splitting code in
> ata_fill_sg() executes, the worst case result is ATA_MAX_PRD entries.
> 
> Thus, since your code has the potential to increase the number of s/g
> entries above that, it can potentially corrupt memory, lock up the
> machine, all the wonderful things that can happen when you run off the
> end of the s/g list.
> 
> The fix is to decrease .sg_tablesize (LIBATA_MAX_PRD - 2 perhaps?) so
> that you guarantee this worst case never occurs, by guaranteeing that
> the system never sends you enough s/g entries to cause your code to go
> out of bounds.
> 

Ah, now I understand. Thanks for the explanation.

I take it something guarantees that s/g entry size can not exceed 128K.


-- 

./lxnt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH-RFC] Promise TX4 implement hw-bug workaround
  2007-10-27 15:16                       ` [PATCH-RFC] Promise TX4 implement hw-bug workaround Alexander Sabourenkov
  2007-10-27 18:09                         ` Alan Cox
@ 2007-10-28 10:29                         ` Jeff Garzik
  2007-10-28 11:52                           ` Alexander Sabourenkov
  1 sibling, 1 reply; 32+ messages in thread
From: Jeff Garzik @ 2007-10-28 10:29 UTC (permalink / raw)
  To: Alexander Sabourenkov; +Cc: linux-ide, Tejun Heo, MisterE, benh, jgarzik

BTW, looking at the Promise code I see

> cam_con.h:
> /* for ASIC bug, limit the last element of SG byteCount must < 32 Dword */
> #define SG_COUNT_ASIC_BUG       32
> //#define SG_COUNT_ASIC_BUG     128

	and in the code itself

> /* check PRD table, last element <= (32 Dword), fix ASIC bug */

(though the code obviously uses SG_COUNT_ASIC_BUG==32, as the first 
paste indicates)

so it seems like Promise first used 128 (32 dwords), but then backed 
down to 32 (8 dwords).

Either way, we definitely have an ASIC bug to work around, it seems...

	Jeff




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH-RFC] Promise TX4 implement hw-bug workaround
  2007-10-28 10:29                         ` Jeff Garzik
@ 2007-10-28 11:52                           ` Alexander Sabourenkov
  2007-10-28 11:10                             ` Jeff Garzik
  0 siblings, 1 reply; 32+ messages in thread
From: Alexander Sabourenkov @ 2007-10-28 11:52 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-ide, Tejun Heo, MisterE, benh, jgarzik

Jeff Garzik wrote:
> BTW, looking at the Promise code I see
> 
>> cam_con.h:
>> /* for ASIC bug, limit the last element of SG byteCount must < 32
>> Dword */
>> #define SG_COUNT_ASIC_BUG       32
>> //#define SG_COUNT_ASIC_BUG     128
> 
>     and in the code itself
> 
>> /* check PRD table, last element <= (32 Dword), fix ASIC bug */
> 
> (though the code obviously uses SG_COUNT_ASIC_BUG==32, as the first
> paste indicates)
> 
> so it seems like Promise first used 128 (32 dwords), but then backed
> down to 32 (8 dwords).
> 

Which version is this define from?

Both versions that are available now from their website define it at 41*4:


/* for ASIC bug, limit the last element of SG byteCount must <= 41 Dword */
#define SG_COUNT_ASIC_BUG       41*4
//#define SG_COUNT_ASIC_BUG     32
//#define SG_COUNT_ASIC_BUG     128
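
A side note on the arithmetic, as a minimal sketch: a dword is 4 bytes,
so 41 dwords are 164 bytes. The vendor define is also unparenthesized,
which is harmless in a comparison but changes the result in a division.
The _RAW/_SAFE macro names and the helper functions are invented for
this example.

```c
#include <assert.h>
#include <stdint.h>

#define SG_COUNT_ASIC_BUG_RAW	41*4	/* spelled as in the vendor header */
#define SG_COUNT_ASIC_BUG_SAFE	(41 * 4)	/* parenthesized variant */

/* Convert a count of 32-bit dwords to bytes. */
uint32_t dwords_to_bytes(uint32_t dwords)
{
	return dwords * 4;
}

/* Expands to len / 41 * 4, which is not len / 164. */
uint32_t chunks_raw(uint32_t len)
{
	return len / SG_COUNT_ASIC_BUG_RAW;
}

/* Number of whole 164-byte chunks in len. */
uint32_t chunks_safe(uint32_t len)
{
	return len / SG_COUNT_ASIC_BUG_SAFE;
}
```

For len = 328 the parenthesized form yields 2 and the raw form yields
32. Declaring the limit as a typed constant, as the patch does with
"const u32 SG_COUNT_ASIC_BUG = 41*4;", avoids the issue altogether.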


-- 

./lxnt









^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH-RFC] Promise TX4 implement hw-bug workaround
  2007-10-28 11:52                           ` Alexander Sabourenkov
@ 2007-10-28 11:10                             ` Jeff Garzik
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Garzik @ 2007-10-28 11:10 UTC (permalink / raw)
  To: Alexander Sabourenkov, Mikael Pettersson
  Cc: linux-ide, Tejun Heo, MisterE, benh

Alexander Sabourenkov wrote:
> Jeff Garzik wrote:
>> BTW, looking at the Promise code I see
>>
>>> cam_con.h:
>>> /* for ASIC bug, limit the last element of SG byteCount must < 32
>>> Dword */
>>> #define SG_COUNT_ASIC_BUG       32
>>> //#define SG_COUNT_ASIC_BUG     128
>>     and in the code itself
>>
>>> /* check PRD table, last element <= (32 Dword), fix ASIC bug */
>> (though the code obviously uses SG_COUNT_ASIC_BUG==32, as the first
>> paste indicates)
>>
>> so it seems like Promise first used 128 (32 dwords), but then backed
>> down to 32 (8 dwords).
>>
> 
> Which version is this define from?
> 
> Both versions that are available now from their website define it at 41*4:

Mikael Pettersson wrote:
> You're looking at the old pdc-ultra2 driver. The newer unified
> sataii150-300 driver (v1.01.0.23) upped the value to 41*4.


I was looking at pdc-ulsata2_1.00.0.15.tgz, which was the latest driver 
that Promise's website gave me when I listed "SATA300 TX4" as my product.

Sounds like that is outdated information, thanks for the correction!

	Jeff




* Re: [PATCH-RFC] Promise TX4 implement hw-bug workaround
@ 2007-10-28 11:03 Mikael Pettersson
  0 siblings, 0 replies; 32+ messages in thread
From: Mikael Pettersson @ 2007-10-28 11:03 UTC (permalink / raw)
  To: jeff, screwdriver; +Cc: MisterE2002, benh, htejun, jgarzik, linux-ide

On Sun, 28 Oct 2007 06:29:32 -0400, Jeff Garzik wrote:
> BTW, looking at the Promise code I see
> 
> > cam_con.h:
> > /* for ASIC bug, limit the last element of SG byteCount must < 32 Dword */
> > #define SG_COUNT_ASIC_BUG       32
> > //#define SG_COUNT_ASIC_BUG     128
> 
> 	and in the code itself
> 
> > /* check PRD table, last element <= (32 Dword), fix ASIC bug */
> 
> (though the code obviously uses SG_COUNT_ASIC_BUG==32, as the first 
> paste indicates)
> 
> so it seems like Promise first used 128 (32 dwords), but then backed 
> down to 32 (8 dwords).
> 
> Either way, we definitely have an ASIC bug to work around, it seems...

You're looking at the old pdc-ultra2 driver. The newer unified
sataii150-300 driver (v1.01.0.23) upped the value to 41*4.

I've reviewed Alexander's patch, and I'm currently testing it
with the add-on patch below (fix sg_tablesize, code formatting
stuff, fix uninitialised 'addr' in a VPRINTK).

/Mikael

--- linux-2.6.24-rc1/drivers/ata/sata_promise.c.~1~	2007-10-28 11:58:01.000000000 +0100
+++ linux-2.6.24-rc1/drivers/ata/sata_promise.c	2007-10-28 12:20:53.000000000 +0100
@@ -155,7 +155,7 @@ static struct scsi_host_template pdc_ata
 	.queuecommand		= ata_scsi_queuecmd,
 	.can_queue		= ATA_DEF_QUEUE,
 	.this_id		= ATA_SHT_THIS_ID,
-	.sg_tablesize		= LIBATA_MAX_PRD,
+	.sg_tablesize		= LIBATA_MAX_PRD-1,
 	.cmd_per_lun		= ATA_SHT_CMD_PER_LUN,
 	.emulated		= ATA_SHT_EMULATED,
 	.use_clustering		= ATA_SHT_USE_CLUSTERING,
@@ -542,7 +542,7 @@ static void pdc_fill_sg(struct ata_queue
 
 	if (!(qc->flags & ATA_QCFLAG_DMAMAP))
 		return;
-	
+
 	WARN_ON(qc->__sg == NULL);
 	WARN_ON(qc->n_elem == 0 && qc->pad_len == 0);
 
@@ -579,18 +579,15 @@ static void pdc_fill_sg(struct ata_queue
 
 		if (len > SG_COUNT_ASIC_BUG) {
 			u32 addr;
-			/* if len < 2*SG_COUNT_ASIC_BUG then last
-			   segment will be larger than next-to-last.
-			   Somewhat ugly :(
-			*/
 
 			VPRINTK("Splitting last PRD.\n");
 
+			addr = le32_to_cpu(ap->prd[idx - 1].addr);
 			ap->prd[idx - 1].flags_len -= cpu_to_le32(SG_COUNT_ASIC_BUG);
 			VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", idx - 1, addr, SG_COUNT_ASIC_BUG);
-			
-			addr = le32_to_cpu(ap->prd[idx - 1].addr) + len - SG_COUNT_ASIC_BUG;
-			len  = SG_COUNT_ASIC_BUG;
+
+			addr = addr + len - SG_COUNT_ASIC_BUG;
+			len = SG_COUNT_ASIC_BUG;
 			ap->prd[idx].addr = cpu_to_le32(addr);
 			ap->prd[idx].flags_len = cpu_to_le32(len);
 			VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", idx, addr, len);
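
For readers who want the idea of the workaround without the surrounding driver, here is a minimal stand-alone sketch of the last-PRD split. It is not the kernel code: `struct prd_entry`, `split_last_prd` and `ASIC_BUG_LIMIT` are hypothetical names, the entries are kept in CPU byte order with no flag bits, and the 41*4 limit is simply the value quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASIC_BUG_LIMIT (41u * 4u)  /* bytes; value quoted in the thread */

/* Hypothetical CPU-order PRD entry; the real table uses __le32 fields. */
struct prd_entry {
	uint32_t addr;
	uint32_t len;
};

/*
 * If the last of n entries is longer than ASIC_BUG_LIMIT, move its final
 * ASIC_BUG_LIMIT bytes into a new entry at prd[n] and return n + 1.
 * Otherwise return n unchanged. The caller has to keep one slot free,
 * which is the reason for advertising one entry less than the table holds.
 */
static size_t split_last_prd(struct prd_entry *prd, size_t n, size_t capacity)
{
	if (n == 0 || prd[n - 1].len <= ASIC_BUG_LIMIT)
		return n;
	assert(n < capacity);  /* the spare slot must exist */

	uint32_t addr = prd[n - 1].addr;
	uint32_t len = prd[n - 1].len;

	prd[n - 1].len = len - ASIC_BUG_LIMIT;      /* shortened original */
	prd[n].addr = addr + len - ASIC_BUG_LIMIT;  /* tail starts here */
	prd[n].len = ASIC_BUG_LIMIT;                /* tail is exactly the limit */
	return n + 1;
}
```

The sketch does all arithmetic in CPU byte order and would convert afterwards. That sidesteps the question of whether subtracting an already-converted `cpu_to_le32()` value from `flags_len` gives the intended result on big-endian hosts.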


* Re: [PATCH-RFC] Promise TX4 implement hw-bug workaround
@ 2007-10-28 16:32 Mikael Pettersson
  0 siblings, 0 replies; 32+ messages in thread
From: Mikael Pettersson @ 2007-10-28 16:32 UTC (permalink / raw)
  To: jeff, mikpe, screwdriver; +Cc: MisterE2002, benh, htejun, jgarzik, linux-ide

On Sun, 28 Oct 2007 12:03:16 +0100 (MET), Mikael Pettersson wrote:
> On Sun, 28 Oct 2007 06:29:32 -0400, Jeff Garzik wrote:
> > BTW, looking at the Promise code I see
> > 
> > > cam_con.h:
> > > /* for ASIC bug, limit the last element of SG byteCount must < 32 Dword */
> > > #define SG_COUNT_ASIC_BUG       32
> > > //#define SG_COUNT_ASIC_BUG     128
> > 
> > 	and in the code itself
> > 
> > > /* check PRD table, last element <= (32 Dword), fix ASIC bug */
> > 
> > (though the code obviously uses SG_COUNT_ASIC_BUG==32, as the first 
> > paste indicates)
> > 
> > so it seems like Promise first used 128 (32 dwords), but then backed 
> > down to 32 (8 dwords).
> > 
> > Either way, we definitely have an ASIC bug to work around, it seems...
> 
> You're looking at the old pdc-ultra2 driver. The newer unified
> sataii150-300 driver (v1.01.0.23) upped the value to 41*4.
> 
> I've reviewed Alexander's patch, and I'm currently testing it
> with the add-on patch below (fix sg_tablesize, code formatting
> stuff, fix uninitialised 'addr' in a VPRINTK).

FYI:

Several hours of testing on a SATA300 TX4 with two 3Gbps disks went well,
as did a quick test on a SATA300 TX2plus with one SATA and one PATA disk.

I'll test further on a 1st generation controller tomorrow.

> 
> /Mikael
> 
> --- linux-2.6.24-rc1/drivers/ata/sata_promise.c.~1~	2007-10-28 11:58:01.000000000 +0100
> +++ linux-2.6.24-rc1/drivers/ata/sata_promise.c	2007-10-28 12:20:53.000000000 +0100
> @@ -155,7 +155,7 @@ static struct scsi_host_template pdc_ata
>  	.queuecommand		= ata_scsi_queuecmd,
>  	.can_queue		= ATA_DEF_QUEUE,
>  	.this_id		= ATA_SHT_THIS_ID,
> -	.sg_tablesize		= LIBATA_MAX_PRD,
> +	.sg_tablesize		= LIBATA_MAX_PRD-1,
>  	.cmd_per_lun		= ATA_SHT_CMD_PER_LUN,
>  	.emulated		= ATA_SHT_EMULATED,
>  	.use_clustering		= ATA_SHT_USE_CLUSTERING,
> @@ -542,7 +542,7 @@ static void pdc_fill_sg(struct ata_queue
>  
>  	if (!(qc->flags & ATA_QCFLAG_DMAMAP))
>  		return;
> -	
> +
>  	WARN_ON(qc->__sg == NULL);
>  	WARN_ON(qc->n_elem == 0 && qc->pad_len == 0);
>  
> @@ -579,18 +579,15 @@ static void pdc_fill_sg(struct ata_queue
>  
>  		if (len > SG_COUNT_ASIC_BUG) {
>  			u32 addr;
> -			/* if len < 2*SG_COUNT_ASIC_BUG then last
> -			   segment will be larger than next-to-last.
> -			   Somewhat ugly :(
> -			*/
>  
>  			VPRINTK("Splitting last PRD.\n");
>  
> +			addr = le32_to_cpu(ap->prd[idx - 1].addr);
>  			ap->prd[idx - 1].flags_len -= cpu_to_le32(SG_COUNT_ASIC_BUG);
>  			VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", idx - 1, addr, SG_COUNT_ASIC_BUG);
> -			
> -			addr = le32_to_cpu(ap->prd[idx - 1].addr) + len - SG_COUNT_ASIC_BUG;
> -			len  = SG_COUNT_ASIC_BUG;
> +
> +			addr = addr + len - SG_COUNT_ASIC_BUG;
> +			len = SG_COUNT_ASIC_BUG;
>  			ap->prd[idx].addr = cpu_to_le32(addr);
>  			ap->prd[idx].flags_len = cpu_to_le32(len);
>  			VPRINTK("PRD[%u] = (0x%X, 0x%X)\n", idx, addr, len);


end of thread, other threads:[~2007-10-28 19:13 UTC | newest]

Thread overview: 32+ messages
2007-10-03  7:26 Re[2]: Sata Sil3512 bug? Mikael Pettersson
2007-10-03  8:31 ` Alexander Sabourenkov
2007-10-03 14:45   ` Re[2]: " MisterE
2007-10-03 14:50     ` Alan Cox
2007-10-14 12:07   ` Re[2]: " MisterE
2007-10-15  8:44     ` Alexander Sabourenkov
2007-10-17 12:39   ` Re[2]: Sata Sil3512 bug?; Promise SATA300 TX4 MisterE
2007-10-17 12:54     ` Alexander Sabourenkov
2007-10-17 15:04       ` Re[2]: " MisterE
2007-10-17 19:21         ` Peter Favrholdt
2007-10-19 12:02           ` Re[2]: " MisterE
2007-10-18 21:07         ` Alexander Sabourenkov
2007-10-19  1:26           ` Tejun Heo
2007-10-19 21:06             ` Alexander Sabourenkov
2007-10-19 22:58               ` Re[2]: " MisterE
2007-10-19 23:58               ` Tejun Heo
2007-10-20 21:50                 ` Alexander Sabourenkov
2007-10-27 13:24                   ` [PATCH-RFC] (was: Re: Sata Sil3512 bug?; Promise SATA300 TX4) Alexander Sabourenkov
2007-10-27 13:44                     ` [PATCH-RFC] Alexander Sabourenkov
2007-10-27 14:08                       ` Re[2]: [PATCH-RFC] MisterE
2007-10-27 15:09                         ` [PATCH-RFC] Alexander Sabourenkov
2007-10-27 15:16                       ` [PATCH-RFC] Promise TX4 implement hw-bug workaround Alexander Sabourenkov
2007-10-27 18:09                         ` Alan Cox
2007-10-27 18:18                           ` Alexander Sabourenkov
2007-10-27 18:37                             ` Alexander Sabourenkov
2007-10-28  8:21                             ` Jeff Garzik
2007-10-28 20:03                               ` Alexander Sabourenkov
2007-10-28 10:29                         ` Jeff Garzik
2007-10-28 11:52                           ` Alexander Sabourenkov
2007-10-28 11:10                             ` Jeff Garzik
  -- strict thread matches above, loose matches on Subject: below --
2007-10-28 11:03 Mikael Pettersson
2007-10-28 16:32 Mikael Pettersson
