From: Tejun Heo <htejun@gmail.com>
To: Tejun Heo <htejun@gmail.com>
Cc: Greg Freemyer <greg.freemyer@gmail.com>,
    Jens Axboe <axboe@suse.de>,
    linux@horizon.com, linux-ide@vger.kernel.org
Subject: Re: sata_sil24 corruption details
Date: Fri, 11 Nov 2005 02:32:46 +0900
Message-ID: <4373843E.2030308@gmail.com>
In-Reply-To: <43735C19.4040402@gmail.com>
Tejun Heo wrote:
> Greg Freemyer wrote:
>
>> On 11/10/05, Tejun Heo <htejun@gmail.com> wrote:
>>
>>> linux@horizon.com wrote:
>>>
>>>> Three days ago, I wrote:
>>>>
>>>>
>>>>> I finished "badblocks -b 4096 -c 65536 -s -v -w -t random" run on 350
>>>>> G of one drive without seeing problems, and am working on the other 5.
>>>>> (In parallel, just to stress the driver.)
>>>>
>>>>
>>>>
>>>> My parallel -p1 badblocks runs (I shrunk the chunk size to -c 16384)
>>>> finished on 3 of the 5 drives, but after 69 hours and I don't know how
>>>> many passes, it's still running on one pair of drives. Interestingly,
>>>> the pair (sdc4 & sdd4) is connected to a single controller.
>>>>
>>>> Thus, it might not be a multiple-controller issue (I don't know how
>>>> many other people have 3 Sil3132s in a system), but perhaps an issue
>>>> with simultaneous activity on the 2 ports of a single controller.
>>>>
>>>> Is there anything else I could do to help debug this problem? Any
>>>> additional
>>>> debugging I can enable?
>>>>
>>>> It would take me a while to clean the backups off the system and move
>>>> it outside the firewall to allow remote access if someone wants access
>>>> to that particular hardware, but it's just an expensive bit bucket at
>>>> the moment, so ask if it would help...
>>>
>>>
>>> Hello, there.
>>>
>>> I'll soon try to tackle this one. However, I currently have only one
>>> 3124 controller and one harddisk to hook to that controller, so I cannot
>>> reproduce your setup over here. Here are things that I think might help
>>> in diagnosing the problem.
>>>
>>> * Trying other drivers
>>>   * Trying the original driver. I'll port the original driver
>>>     from sii to the current tree and post the patch.
>>>   * Performing similar test under Windows.
>>>
>>> * Ruling out disk problem
>>>   * Trying other harddisks. All harddisk drives perform error
>>>     detection/correction when data are read from the media, but
>>>     ruling out the possibility would still be helpful.
>>>
>>> * If you have log of failed sectors, finding patterns will be helpful.
>>> If the errors occur at random places, it's likely that we have
>>> controller/driver issues. If errors are localized over multiple runs,
>>> maybe the disk is at fault.
>>>
>>> --
>>> tejun
>>
>>
>>
>> Tejun,
>>
>> I assume you saw my e-mail that with a 3112 and a single SATA drive we
>> were seeing corruption as well. That being the case I think you
>> should first verify that corruption is not occurring in the single SATA
>> drive case.
>>
>> Our test was to create a bunch of 2 GB files on a PATA drive.
>>
>> We simply used a drive with real data as the source of our test files.
>> ie. IIRC: cd test_dir; dd if=/dev/hde conv=noerror,sync | split -b 2000m
>>
>> Then we calculated the md5 of all the 2 GB pieces. All of this done
>> in a pure PATA setup.
>>
>> Then we connected a SATA drive to a 3112 and simply copied the files
>> from the PATA drive to the SATA drive and verified the md5 values. We
>> found corruption in 1 - 3% of the files copied.
>>
>> FYI: The above are all very common steps for a computer forensic
>> examine, thus we found this issue in our attempts to qualify the 3112
>> as part of our forensic equipment. We have not tested since 2.6.11
>> and that was with a SUSE kernel.
>>
>
> Hi,
>
> I'll run single drive test on sil3112 tonight, but can you please try
> 2.6.14? IIRC, there has been a PCI FIFO setting change. Hmmm..
> oh.. it was the following commit.
>
> ---
> $ git-cat-file commit e1dd23a0012c3929737798fda9fede0e783f4ff3
> tree c7f808b6433ef1015f55418e7f11f432943bdefd
> parent 5273a00d9c763108397658d440618f7ac3e40f83
> author Jens Axboe <axboe@suse.de> 1118228545 +0200
> committer Jeff Garzik <jgarzik@pobox.com> 1118300782 -0400
>
> [PATCH] sata_sil: Fix FIFO PCI Bus Arbitration kernel oops
>
> Correct this.
> ---
>
> Jens, is it possible that the above change fixes the data corruption?
>
Greg, the first pass of 'badblocks -t random -v -w' on a 100G partition of a
160G disk just finished without any errors. This is a Samsung HD160JJ drive
on a sil3112 controller. I'll let badblocks run through the night and
perform the file copy & md5sum test tomorrow. But my hunch is that there is
no common data corruption problem with the sil3112. It is in too widespread
use to have such a data corruption problem with so few reports.
What exact controller/disk did you use? Care to retest your setup with
2.6.14?
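
For anyone who wants to repeat the file copy & md5sum test described above,
here is a minimal sketch in Python. It is only an illustration, not the
procedure actually used in this thread: the function names and the directory
paths in the usage note are hypothetical. It records the MD5 of every file in
a source directory, copies the files, and reports any file whose checksum
differs on the destination.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_dir(directory):
    """Map each regular file name in `directory` to its MD5 digest."""
    return {
        p.name: md5_of(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }


def copy_files(src_dir, dst_dir):
    """Copy every regular file from src_dir into dst_dir (created if missing)."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).iterdir()):
        if src.is_file():
            shutil.copyfile(src, dst_dir / src.name)


def find_mismatches(expected, directory):
    """Return the sorted names whose digest in `directory` is missing or differs."""
    actual = checksum_dir(directory)
    return sorted(
        name for name, digest in expected.items() if actual.get(name) != digest
    )
```

Usage would be along the lines of `expected = checksum_dir("/mnt/pata/test_dir")`,
then `copy_files("/mnt/pata/test_dir", "/mnt/sata/test_dir")`, then
`find_mismatches(expected, "/mnt/sata/test_dir")` (both paths are made up).
The verify step is kept separate from the copy on purpose: a read straight
after the copy may be served from the page cache rather than from the disk, so
unmounting and remounting the destination before verifying is advisable.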
--
tejun