linux-ide.vger.kernel.org archive mirror
* Re: Corrupt data - RAID sata_sil 3114 chip
@ 2010-01-29 16:13 Ulli.Brennenstuhl
  2010-01-29 19:37 ` Robert Hancock
  0 siblings, 1 reply; 29+ messages in thread
From: Ulli.Brennenstuhl @ 2010-01-29 16:13 UTC (permalink / raw)
  To: linux-ide

The last message in this discussion is more than a year old, but there is 
still no solution to this problem.

I recently ran into the same problem: a RAID created with mdadm from three 
SAMSUNG HD154UI SATA hard disks showed random errors, and mdadm --examine 
would randomly report the checksums as correct or wrong.

The SATA controller with the SiI 3114 chipset runs on an old Epox 8K3A 
board with a VIA KT133 chipset. I noticed that moving the controller to 
another PCI slot changed the results of mdadm --examine: in one slot the 
checksums randomly alternated between correct and wrong, while in another 
slot they were always reported as wrong.
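A minimal sketch of how one might quantify the random checksum results: run `mdadm --examine` on one member disk repeatedly and count how often the checksum is reported as correct versus wrong. The helper names are hypothetical, the script must run as root, and it assumes mdadm prints a line shaped like `Checksum : <hex> - correct` or `Checksum : <hex> - expected <hex>`.

```python
import re
import subprocess
import sys
from collections import Counter

# Assumed shape of the checksum line in `mdadm --examine` output, e.g.
#   "Checksum : 3c0f7d2e - correct"
#   "Checksum : 3c0f7d2e - expected 3c0f7d2f"
CHECKSUM_RE = re.compile(r"Checksum\s*:\s*[0-9a-fA-F]+\s*-\s*(correct|expected)")


def checksum_state(examine_output: str) -> str:
    """Classify one --examine run as 'correct', 'wrong' or 'unknown'."""
    match = CHECKSUM_RE.search(examine_output)
    if match is None:
        return "unknown"  # no checksum line found (e.g. no superblock)
    return "correct" if match.group(1) == "correct" else "wrong"


def tally(outputs) -> Counter:
    """Count how often each state occurs across several runs."""
    return Counter(checksum_state(text) for text in outputs)


def examine_repeatedly(device: str, runs: int = 20) -> Counter:
    """Run `mdadm --examine DEVICE` `runs` times and tally the results."""
    outputs = []
    for _ in range(runs):
        result = subprocess.run(
            ["mdadm", "--examine", device], capture_output=True, text=True
        )
        outputs.append(result.stdout)
    return tally(outputs)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical device name): python3 examine_tally.py /dev/sdb1
    print(examine_repeatedly(sys.argv[1]))
```

On an idle array whose superblocks are not being rewritten, a tally that mixes "correct" and "wrong" suggests the data is being corrupted on the read path rather than on disk.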

After deactivating every BIOS option that optimizes the PCI bus in some 
way, the problem seems to be gone. After more testing I narrowed it down 
to the option "PCI Master 0 WS Write", which controls whether write 
requests to the PCI bus are executed immediately (with zero wait states) 
or whether every write request is delayed by one wait state.

Obviously this reduces performance. I didn't run benchmarks, but the 
resync speed of the RAID dropped from ~28 MB/s to ~17 MB/s.

I hope this also solves the problem for other people. It would be 
interesting to know whether a change to the driver could allow 
re-enabling the "PCI Master 0 WS Write" option.


Regards,
Ulli Brennenstuhl


* Re: Corrupt data - RAID sata_sil 3114 chip
@ 2009-01-03 20:04 Bernd Schubert
  2009-01-03 20:53 ` Robert Hancock
  2009-01-07  4:59 ` Tejun Heo
  0 siblings, 2 replies; 29+ messages in thread
From: Bernd Schubert @ 2009-01-03 20:04 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Alan Cox, Justin Piszcz, debian-user, linux-raid, linux-ide

[Sorry for sending this again: Robert dropped all mailing list CCs and I 
didn't notice at first.]

On Sat, Jan 03, 2009 at 12:31:12PM -0600, Robert Hancock wrote:
> Bernd Schubert wrote:
>> On Sat, Jan 03, 2009 at 01:39:36PM +0000, Alan Cox wrote:
>>> On Fri, 2 Jan 2009 22:30:07 +0100
>>> Bernd Schubert <bs@q-leap.de> wrote:
>>>
>>>> Hello Bengt,
>>>>
>>>> sil3114 is known to cause data corruption with some disks. 
>>> News to me. There are a few people with lots of SI and other devices
>>
>> No no, you just forgot about it, since you even reviewed the patches ;)
>>
>> http://lkml.org/lkml/2007/10/11/137
>
> And Jeff explained why they were not merged:
>
> http://lkml.org/lkml/2007/10/11/166
>
> All the patch does is try to reduce the speed impact of the workaround.  
> But as was pointed out, they don't reliably solve the problem the  
> workaround is trying to fix, and besides, the workaround is already not  
> applied to SiI3114 at all, as it is apparently not applicable on that  
> controller (only 3112).

Well, they do reliably solve the problem in our case (before taking the patch
into production I ran checksum tests for about 2 weeks). Anyway, I entirely
understand why the patches didn't get accepted.

But now more than a year has passed again without anything being done
about it, and that is what I strongly criticize. Most people don't know
about issues like this and don't run file checksum tests, as I now always
do before taking a disk into production. So users are exposed to known
data corruption problems without even being warned. Usually even backups
don't help, since one ends up backing up the corrupted data.
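For readers who want to run a file checksum test before putting a disk into production, here is a minimal sketch in Python. It is not Bernd's actual procedure; the function names, file count and file size are assumptions. It writes files of random data, records the SHA-256 of each buffer as computed in memory, and later re-reads the files from disk and reports any whose hash no longer matches.

```python
import hashlib
import os
from pathlib import Path

CHUNK = 1 << 20  # read files back in 1 MiB chunks


def sha256_of(path: Path) -> str:
    """Hash a file as it is read back from storage."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def write_test_files(directory: Path, count: int, size: int) -> dict:
    """Write `count` random files of `size` bytes; return {path: expected sha256}."""
    directory.mkdir(parents=True, exist_ok=True)
    expected = {}
    for i in range(count):
        data = os.urandom(size)
        path = directory / f"chk_{i:05d}.bin"
        with path.open("wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to the device, not just the page cache
        # Reference hash comes from the in-memory buffer, before any disk round trip.
        expected[path] = hashlib.sha256(data).hexdigest()
    return expected


def verify(expected: dict) -> list:
    """Re-read every file and return the paths whose checksum no longer matches."""
    return [path for path, digest in expected.items() if sha256_of(path) != digest]
```

Run `verify()` in several passes spread over time. Re-reads may be served from the page cache rather than the disk, so either write more data than the machine has RAM or drop caches between passes (as root: `sync; echo 3 > /proc/sys/vm/drop_caches`).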

So IMHO, the driver should be deactivated for sil3114 until a real solution is 
found. It should only be possible to force-activate it with a kernel flag, 
which would then also print a huuuge warning about possible data corruption 
(unfortunately most distributions disable initial kernel messages *grumble*).


Cheers,
Bernd




Thread overview: 29+ messages
2010-01-29 16:13 Corrupt data - RAID sata_sil 3114 chip Ulli.Brennenstuhl
2010-01-29 19:37 ` Robert Hancock
2010-02-06  3:54   ` Tejun Heo
2010-02-06 15:16     ` Tim Small
2010-02-07 16:09       ` Robert Hancock
2010-02-08  2:31         ` Tejun Heo
2010-02-08 14:25         ` Tim Small
     [not found] <bQVFb-3SB-37@gated-at.bofh.it>
     [not found] ` <bQVFb-3SB-39@gated-at.bofh.it>
     [not found]   ` <bQVFb-3SB-41@gated-at.bofh.it>
     [not found]     ` <bQVFc-3SB-43@gated-at.bofh.it>
     [not found]       ` <bQVFc-3SB-45@gated-at.bofh.it>
     [not found]         ` <bQVFc-3SB-47@gated-at.bofh.it>
     [not found]           ` <bQVFb-3SB-35@gated-at.bofh.it>
     [not found]             ` <4963306F.4060504@sm7jqb.se>
2009-01-06 10:48               ` Justin Piszcz
  -- strict thread matches above, loose matches on Subject: below --
2009-01-03 20:04 Bernd Schubert
2009-01-03 20:53 ` Robert Hancock
2009-01-03 21:11   ` Bernd Schubert
2009-01-03 23:23     ` Robert Hancock
2009-01-07  4:59 ` Tejun Heo
2009-01-07  5:38   ` Robert Hancock
2009-01-07 15:31     ` Bernd Schubert
2009-01-11  0:32       ` Robert Hancock
2009-01-11  0:43         ` Robert Hancock
2009-01-12  1:30           ` Tejun Heo
2009-01-19 18:43             ` Dave Jones
2009-01-20  2:50               ` Robert Hancock
2009-01-20 20:07                 ` Dave Jones
     [not found] <495E01E3.9060903@sm7jqb.se>
     [not found] ` <alpine.DEB.1.10.0901020741200.11852@p34.internal.lan>
2009-01-02 21:30   ` Bernd Schubert
2009-01-02 21:47     ` Twigathy
2009-01-03  2:31     ` Redeeman
2009-01-03 13:13       ` Bernd Schubert
2009-01-03 13:39     ` Alan Cox
2009-01-03 16:20       ` Bernd Schubert
2009-01-03 18:31         ` Robert Hancock
2009-01-03 22:19     ` James Youngman
