From: Robert Hancock <hancockr@shaw.ca>
To: Bernd Schubert <bs@q-leap.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
Justin Piszcz <jpiszcz@lucidpixels.com>,
debian-user@lists.debian.org, linux-raid@vger.kernel.org,
linux-ide@vger.kernel.org
Subject: Re: Corrupt data - RAID sata_sil 3114 chip
Date: Sat, 03 Jan 2009 17:23:34 -0600
Message-ID: <495FF376.3050909@shaw.ca>
In-Reply-To: <20090103211147.GA3707@lanczos.q-leap.de>

Bernd Schubert wrote:
> On Sat, Jan 03, 2009 at 02:53:09PM -0600, Robert Hancock wrote:
>> Bernd Schubert wrote:
>>> [sorry sent again, since Robert dropped all mailing list CCs and I
>>> didn't notice first]
>>>
>>> On Sat, Jan 03, 2009 at 12:31:12PM -0600, Robert Hancock wrote:
>>>> Bernd Schubert wrote:
>>>>> On Sat, Jan 03, 2009 at 01:39:36PM +0000, Alan Cox wrote:
>>>>>> On Fri, 2 Jan 2009 22:30:07 +0100
>>>>>> Bernd Schubert <bs@q-leap.de> wrote:
>>>>>>
>>>>>>> Hello Bengt,
>>>>>>>
>>>>>>> sil3114 is known to cause data corruption with some disks.
>>>>>> News to me. There are a few people with lots of SI and other devices
>>>>> No no, you just forgot about it, since you even reviewed the patches ;)
>>>>>
>>>>> http://lkml.org/lkml/2007/10/11/137
>>>> And Jeff explained why they were not merged:
>>>>
>>>> http://lkml.org/lkml/2007/10/11/166
>>>>
>>>> All the patch does is try to reduce the speed impact of the
>>>> workaround. But as was pointed out, they don't reliably solve the
>>>> problem the workaround is trying to fix, and besides, the workaround
>>>> is already not applied to SiI3114 at all, as it is apparently not
>>>> applicable on that controller (only 3112).
>>> Well, they do reliably solve the problem in our case (before taking the patch
>>> into production I ran checksum tests for about 2 weeks). Anyway, I entirely
>>> understand that the patches didn't get accepted.
>>>
>>> But now more than a year has passed again without doing anything
>>> about it and actually this is what I strongly criticize. Most people don't
>>> know about issues like that and don't run file checksum tests as I now always
>>> do before taking a disk into production. So users are exposed to known
>>> data corruption problems without even being warned about it. Usually
>>> even backups don't help, since one creates a backup of the corrupted data.
>>>
>>> So IMHO, the driver should be deactivated for sil3114 until a real
>>> solution is found. And it should only be possible to force-activate it
>>> by a kernel flag, which then also would print a huuuge warning about
>>> possible data corruption (unfortunately most distributions disable
>>> initial kernel messages *grumble*).
>> If the corruption was happening on all such controllers then people
>> would have been complaining in droves and something would have been
>> done. It seems much more likely that in this case the problem is some
>> kind of hardware fault or combination of hardware which is causing the
>> problem. Unfortunately these kinds of not-easily-reproducible issues tend
>> to be very hard to track down.
>>
>
> Well yes, it only happens with certain drives. But these drives work fine on
> other controllers. Still, these are by now known issues and nothing is being
> done about them.
> I would happily help to solve the problem, I just don't have any knowledge
> about hardware programming. What would be your next step, if you had remote
> access to such a system?
Have you been able to track down exactly what kind of corruption is
occurring, i.e. what is happening to the data: is data being zeroed out,
are random bits being flipped, are chunks of a certain size being
corrupted, etc.? That would likely be useful in determining where to go next.
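
As a rough illustration of how that information could be gathered, here is
a minimal Python sketch. It assumes a known-good copy of a file is available
to compare against the copy read back through the suspect controller. The
function names (diff_runs, classify), the 512-byte sector size, and the
two-bit threshold for calling something a bit flip are all assumptions made
for this example, not anything taken from the driver or from an existing tool.

```python
import sys

SECTOR = 512  # assumed sector size; adjust for the drives in question


def diff_runs(good: bytes, bad: bytes):
    """Return (offset, length) for each contiguous run of differing bytes."""
    runs = []
    start = None
    n = min(len(good), len(bad))
    for i in range(n):
        if good[i] != bad[i]:
            if start is None:
                start = i          # a new run of differences begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the data
        runs.append((start, n - start))
    return runs


def classify(good: bytes, bad: bytes, offset: int, length: int) -> str:
    """Roughly label one differing run as zeroed, bit-flip, or other."""
    g = good[offset:offset + length]
    b = bad[offset:offset + length]
    if not any(b):
        return "zeroed"            # every corrupted byte reads back as 0x00
    # Count how many individual bits differ across the run.
    flipped = sum(bin(x ^ y).count("1") for x, y in zip(g, b))
    if flipped <= 2:               # arbitrary threshold for this sketch
        return "bit-flip"
    return "other"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Reads both files fully into memory, which is fine for a sketch
    # but not for very large files.
    with open(sys.argv[1], "rb") as f:
        good_data = f.read()
    with open(sys.argv[2], "rb") as f:
        bad_data = f.read()
    for offset, length in diff_runs(good_data, bad_data):
        aligned = offset % SECTOR == 0 and length % SECTOR == 0
        kind = classify(good_data, bad_data, offset, length)
        print(f"offset={offset} length={length} "
              f"sector_aligned={aligned} kind={kind}")
```

It could be run as "python3 compare.py good.bin suspect.bin" (the script and
file names are placeholders). Note that a zeroed region whose original data
already contained zero bytes will show up as several adjacent runs rather
than one, so the offsets and lengths are worth reading together.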