Linux ATA/IDE development
* sata_sil data corruption, possible workarounds
@ 2012-12-15  8:02 bl0
  2012-12-15 21:55 ` Robert Hancock
  0 siblings, 1 reply; 17+ messages in thread
From: bl0 @ 2012-12-15  8:02 UTC (permalink / raw)
  To: linux-ide

I have a PCI card based on the Silicon Image 3114 SATA controller. Like many
people in the past, I have experienced silent data corruption.
I am lucky to have a hardware configuration where the problem is easy to
reproduce, at a 100% rate, by copying a file from a USB stick plugged
into another PCI card. My motherboard has an NVIDIA chipset.
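
A copy test like the one described above could be automated roughly as follows.
This is only a sketch: the helper names sha256_of and copy_and_verify are
made up for illustration, and the original test may have compared the files in
some other way.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy `src` to `dst`, then report whether both files hash identically.

    Hypothetical helper: returns True when the copy matches the source.
    """
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)
```

Note that a read-back immediately after the copy may be served from the page
cache rather than from the disk, so a realistic test would probably use a file
larger than RAM or drop the caches before re-reading.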

Going through messages and bug reports about this problem, I saw someone
mention that the PCI cache line size (CLS) may be relevant. I did some
testing with different CLS values and found that the data corruption goes
away if either
A). CLS is set to 0, before or after the sata_sil kernel driver is loaded
  # setpci -d 1095:3114 CACHE_LINE_SIZE=0
where 1095:3114 is the vendor:device ID as shown by 'lspci -nn'. The same
command can also be used in the grub2 (recent versions) shell or
configuration file before booting Linux.
or
B). CLS is set to a sufficiently large value, only after sata_sil is loaded.
  # setpci -d 1095:3114 CACHE_LINE_SIZE=28
(the value is hexadecimal and in 4-byte units, so 0x28 = 40 units = 160 bytes)
What counts as sufficiently large depends on the CLS that was set before
the driver was loaded. If it was 32 bytes, I have to raise it (after the
driver is loaded) to 128 bytes; if it was 64 bytes, I have to raise it to
160 bytes.
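
The unit conversion used with setpci above can be written out as a small
sketch. The function names are hypothetical; the only facts assumed are that
the register value is hexadecimal, counts 4-byte units, and is 8 bits wide.

```python
def cls_register_to_bytes(hex_value):
    """Convert a CACHE_LINE_SIZE register value (hex string) to bytes."""
    return int(hex_value, 16) * 4


def bytes_to_cls_register(size_bytes):
    """Convert a cache line size in bytes to the hex string setpci expects."""
    # The register is one byte wide, so at most 255 units of 4 bytes each.
    if size_bytes % 4 != 0 or not 0 <= size_bytes <= 255 * 4:
        raise ValueError("size must be a multiple of 4 between 0 and 1020")
    return format(size_bytes // 4, "02x")
```

For example, cls_register_to_bytes("28") gives 160, and
bytes_to_cls_register(128) gives "20".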

In the sata_sil.c source, sil_init_controller writes a hardware-specific
value that depends on the PCI cache line size. By lowering this value I can
get the controller to work with a lower CLS. The lowest value, 0, works with
a CLS of 64 bytes. With a CLS of 32 bytes even that is not enough, and I have
to increase the CLS.
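
For readers who want to see how such a value could follow from the CLS, here
is a rough model. It assumes the driver derives the value as
(cls_register >> 3) + 1 and writes it into both bytes of a 16-bit FIFO
configuration register, skipping the write when the CLS register is 0. That
formula is not stated in this message and should be checked against the
sata_sil.c in your own kernel tree; the function name is made up.

```python
def fifo_cfg_from_cls(cls_register):
    """Model the FIFO config value derived from the raw CLS register.

    ASSUMPTION: value = (cls_register >> 3) + 1, placed in both the high
    and low byte. Returns None when the CLS register is 0, on the further
    assumption that the driver then leaves the FIFO register untouched.
    """
    if cls_register == 0:
        return None
    threshold = (cls_register >> 3) + 1
    return (threshold << 8) | threshold
```

Under these assumptions a 64-byte CLS (register 0x10) yields 0x0303 and a
32-byte CLS (register 0x08) yields 0x0202.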

Data corruption is the biggest problem for me, and these workarounds help.
However, another problem remains: sometimes, when multiple PCI devices are
accessed at the same time, the SATA drives become inaccessible and commands
time out with log messages similar to:
[  411.351805] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  411.351824] ata3.00: cmd c8/00:00:00:af:00/00:00:00:00:00/e0 tag 0 dma 131072 in
[  411.351826]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  411.351830] ata3.00: status: { DRDY }
[  411.351843] ata3: hard resetting link
[  411.671775] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  411.697059] ata3.00: configured for UDMA/100
[  411.697080] ata3: EH complete

A reboot is needed to access the SATA drives again. If the root filesystem
were on a SATA drive, this would probably crash the system.

Another observation that may be related: comparing lspci output shows that
when multiple PCI devices are accessed at the same time, the DiscTmrStat
(Discard Timer Status) flag gets set on device "00:08.0 PCI bridge:
nVidia Corporation nForce2 External PCI Bridge". I don't know whether that
is normal.
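
Checking that flag by hand means comparing 'lspci -vv' output before and after
a transfer. A small parser along these lines could do the comparison; it is a
sketch that relies on lspci printing bridge control flags as a name followed
by '+' or '-', and the function name and the sample text in the usage note are
illustrative, not taken from the machine described here.

```python
import re


def disc_tmr_stat(lspci_vv_text):
    """Return True/False for DiscTmrStat+/-, or None if the flag is absent."""
    match = re.search(r"DiscTmrStat([+-])", lspci_vv_text)
    if match is None:
        return None
    return match.group(1) == "+"
```

Fed the text "PriDiscTmr- SecDiscTmr- DiscTmrStat+ DiscTmrSERREn-", the
function returns True. The input would come from something like
'lspci -vv -s 00:08.0' run as root.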

Finally, the same simple test that I use on Linux does not produce data
corruption on FreeBSD. Either this problem doesn't occur there or it's not
trivial to reproduce.

This bug has been around for so long. I hope someone will find this
information useful.





Thread overview: 17+ messages
2012-12-15  8:02 sata_sil data corruption, possible workarounds bl0
2012-12-15 21:55 ` Robert Hancock
2012-12-16 12:21   ` bl0
2012-12-17  5:44     ` Robert Hancock
2012-12-18 15:23       ` bl0
2012-12-19  3:44         ` Robert Hancock
2012-12-20  8:54           ` bl0
2013-01-07  4:11             ` Robert Hancock
2013-01-08 12:25               ` bl0
2012-12-24 14:37         ` bl0
2013-01-09  4:48           ` Robert Hancock
2013-01-09 19:17             ` Tejun Heo
2013-01-11 10:28               ` bl0
2013-01-11 13:53               ` Mark Lord
2013-01-11 13:54                 ` Mark Lord
2013-01-14 17:58                   ` Jeff Garzik
2013-01-15  7:44                     ` bl0
