From: Tejun Heo <htejun@gmail.com>
To: Tejun Heo <htejun@gmail.com>
Cc: Greg Freemyer <greg.freemyer@gmail.com>,
Jens Axboe <axboe@suse.de>,
linux@horizon.com, linux-ide@vger.kernel.org
Subject: Re: sata_sil24 corruption details
Date: Fri, 11 Nov 2005 02:32:46 +0900
Message-ID: <4373843E.2030308@gmail.com>
In-Reply-To: <43735C19.4040402@gmail.com>
Tejun Heo wrote:
> Greg Freemyer wrote:
>
>> On 11/10/05, Tejun Heo <htejun@gmail.com> wrote:
>>
>>> linux@horizon.com wrote:
>>>
>>>> Three days ago, I wrote:
>>>>
>>>>
>>>>> I finished "badblocks -b 4096 -c 65536 -s -v -w -t random" run on 350
>>>>> G of one drive without seeing problems, and am working on the other 5.
>>>>> (In parallel, just to stress the driver.)
>>>>
>>>>
>>>>
>>>> My parallel -p1 badblocks runs (I shrunk the chunk size to -c 16384)
>>>> finished on 3 of the 5 drives, but after 69 hours and I don't know how
>>>> many passes, it's still running on one pair of drives. Interestingly,
>>>> the pair (sdc4 & sdd4) is connected to a single controller.
>>>>
>>>> Thus, it might not be a multiple-controller issue (I don't know how
>>>> many other people have 3 Sil3132s in a system), but perhaps an issue
>>>> with simultaneous activity on the 2 ports of a single controller.
>>>>
>>>> Is there anything else I could do to help debug this problem? Any
>>>> additional debugging I can enable?
>>>>
>>>> It would take me a while to clean the backups off the system and move
>>>> it outside the firewall to allow remote access if someone wants access
>>>> to that particular hardware, but it's just an expensive bit bucket at
>>>> the moment, so ask if it would help...
>>>
>>>
>>> Hello, there.
>>>
>>> I'll soon try to tackle this one. However, I currently have only one
>>> 3124 controller and one harddisk to hook to that controller, so I cannot
>>> reproduce your setup over here. Here are things that I think might help
>>> in diagnosing the problem.
>>>
>>> * Trying other drivers
>>>   * Trying the original driver. I'll port the original driver
>>>     from sii to the current tree and post the patch.
>>>   * Performing similar test under Windows.
>>>
>>> * Ruling out disk problem
>>>   * Trying other harddisks. All harddisk drives perform error
>>>     detection/correction when data are read from the media, but
>>>     ruling out the possibility would still be helpful.
>>>
>>> * If you have log of failed sectors, finding patterns will be helpful.
>>> If the errors occur at random places, it's likely that we have
>>> controller/driver issues. If errors are localized over multiple runs,
>>> maybe the disk is at fault.
>>>
>>> --
>>> tejun
>>
>>
>>
>> Tejun,
>>
>> I assume you saw my e-mail that with a 3112 and a single SATA drive we
>> were seeing corruption as well. That being the case I think you
>> should first verify that corruption is not occurring in the single SATA
>> drive case.
>>
>> Our test was to create a bunch of 2 GB files on a PATA drive.
>>
>> We simply used a drive with real data as the source of our test files.
>> ie. IIRC: cd test_dir; dd if=/dev/hde conv=noerror,sync | split -b 2000m
>>
>> Then we calculated the md5 of all the 2 GB pieces. All of this done
>> in a pure PATA setup.
>>
>> Then we connected a SATA drive to a 3112 and simply copied the files
>> from the PATA drive to the SATA drive and verified the md5 values. We
>> found corruption in 1 - 3% of the files copied.
>>
>> FYI: The above are all very common steps for a computer forensic
>> examination, thus we found this issue in our attempts to qualify the 3112
>> as part of our forensic equipment. We have not tested since 2.6.11
>> and that was with a SUSE kernel.
>>
>
> Hi,
>
> I'll run a single drive test on sil3112 tonight, but can you please try
> 2.6.14? IIRC, there has been a PCI FIFO setting change. Hmmm..
> oh.. it was the following commit.
>
> ---
> $ git-cat-file commit e1dd23a0012c3929737798fda9fede0e783f4ff3
> tree c7f808b6433ef1015f55418e7f11f432943bdefd
> parent 5273a00d9c763108397658d440618f7ac3e40f83
> author Jens Axboe <axboe@suse.de> 1118228545 +0200
> committer Jeff Garzik <jgarzik@pobox.com> 1118300782 -0400
>
> [PATCH] sata_sil: Fix FIFO PCI Bus Arbitration kernel oops
>
> Correct this.
> ---
>
> Jens, is it possible that the above change fixes the data corruption?
>
Greg, the first pass of 'badblocks -t random -v -w' on a 100G partition of
a 160G disk just finished without any error. This is a Samsung HD160JJ
drive on a sil3112 controller. I'll let badblocks run through the night
and perform the file copy & md5sum test tomorrow. But my hunch is that
there is no common data corruption problem with sil3112. It's just in too
widespread use to have such a data corruption problem with so few reports.
What exact controller/disk did you use? Care to retest your setup with
2.6.14?
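
For reference, the file copy & md5sum test could be scripted roughly as
below. This is only a sketch in Python under a few assumptions: the
function names (md5_of, checksum_dir, copy_files, find_mismatches) are
made up for illustration and do not come from Greg's setup, and the
test files are assumed to sit flat in one directory, as split would
leave them. Note that reading the copies back right away may be served
from the page cache rather than the disk, so on a real run the target
filesystem should be unmounted and remounted between the copy and the
verify step.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_dir(directory):
    """Map file name -> MD5 digest for every regular file in a directory."""
    return {
        p.name: md5_of(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }


def copy_files(src_dir, dst_dir):
    """Copy every regular file from src_dir into dst_dir (created if missing)."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).iterdir()):
        if src.is_file():
            shutil.copyfile(src, dst_dir / src.name)


def find_mismatches(expected, directory):
    """Return sorted names whose digest in `directory` differs or is missing."""
    actual = checksum_dir(directory)
    return sorted(
        name for name, digest in expected.items() if actual.get(name) != digest
    )
```

The intended order is: checksum_dir() on the PATA source directory,
copy_files() to the SATA target, remount the target, then
find_mismatches() with the saved digests. An empty list means every
copy matched its source.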
--
tejun