linux-ide.vger.kernel.org archive mirror
From: Tejun Heo <htejun@gmail.com>
To: Tejun Heo <htejun@gmail.com>
Cc: Greg Freemyer <greg.freemyer@gmail.com>,
	Jens Axboe <axboe@suse.de>,
	linux@horizon.com, linux-ide@vger.kernel.org
Subject: Re: sata_sil24 corruption details
Date: Fri, 11 Nov 2005 02:32:46 +0900
Message-ID: <4373843E.2030308@gmail.com>
In-Reply-To: <43735C19.4040402@gmail.com>

Tejun Heo wrote:
> Greg Freemyer wrote:
> 
>> On 11/10/05, Tejun Heo <htejun@gmail.com> wrote:
>>
>>> linux@horizon.com wrote:
>>>
>>>> Three days ago, I wrote:
>>>>
>>>>
>>>>> I finished "badblocks -b 4096 -c 65536 -s -v -w -t random" run on 350
>>>>> G of one drive without seeing problems, and am working on the other 5.
>>>>> (In parallel, just to stress the driver.)
>>>>
>>>>
>>>>
>>>> My parallel -p1 badblocks runs (I shrunk the chunk size to -c 16384)
>>>> finished on 3 of the 5 drives, but after 69 hours and I don't know how
>>>> many passes, it's still running on one pair of drives.  Interestingly,
>>>> the pair (sdc4 & sdd4) is connected to a single controller.
>>>>
>>>> Thus, it might not be a multiple-controller issue (I don't know how
>>>> many other people have 3 Sil3132s in a system), but perhaps an issue
>>>> with simultaneous activity on the 2 ports of a single controller.
>>>>
>>>> Is there anything else I could do to help debug this problem?  Any
>>>> additional debugging I can enable?
>>>>
>>>> It would take me a while to clean the backups off the system and move
>>>> it outside the firewall to allow remote access if someone wants access
>>>> to that particular hardware, but it's just an expensive bit bucket at
>>>> the moment, so ask if it would help...
>>>
>>>
>>> Hello, there.
>>>
>>> I'll soon try to tackle this one.  However, I currently have only one
>>> 3124 controller and one harddisk to hook to that controller, so I cannot
>>> reproduce your setup over here.  Here are things that I think might help
>>> in diagnosing the problem.
>>>
>>> * Trying other drivers
>>>        * Trying the original driver.  I'll port the original driver
>>>          from sii to the current tree and post the patch.
>>>        * Performing similar test under Windows.
>>>
>>> * Ruling out disk problem
>>>        * Trying other harddisks.  All harddisk drives perform error
>>>          detection/correction when data are read from the media, but
>>>          ruling out the possibility would still be helpful.
>>>
>>> * If you have log of failed sectors, finding patterns will be helpful.
>>>  If the errors occur at random places, it's likely that we have
>>>  controller/driver issues.  If errors are localized over multiple runs,
>>>  maybe the disk is at fault.
>>>
>>> -- 
>>> tejun
>>
>>
>>
>> Tejun,
>>
>> I assume you saw my e-mail that with a 3112 and a single SATA drive we
>> were seeing corruption as well.  That being the case I think you
>> should first verify that corruption is not occurring in the single SATA
>> drive case.
>>
>> Our test was to create a bunch of 2 GB files on a PATA drive.
>>
>> We simply used a drive with real data as the source of our test files.
>> ie. IIRC: cd test_dir; dd if=/dev/hde conv=noerror,sync | split -b 2000m
>>
>> Then we calculated the md5 of all the 2 GB pieces.  All of this done
>> in a pure PATA setup.
>>
>> Then we connected a SATA drive to a 3112 and simply copied the files
>> from the PATA drive to the SATA drive and verified the md5 values.  We
>> found corruption in 1 - 3% of the files copied.
>>
>> FYI: The above are all very common steps for a computer forensic
>> examination, thus we found this issue in our attempts to qualify the 3112
>> as part of our forensic equipment.  We have not tested since 2.6.11
>> and that was with a SUSE kernel.
>>
> 
> Hi,
> 
> I'll run single drive test on sil3112 tonight, but can you please try 
> 2.6.14?  IIRC, there have been some PCI FIFO setting change.  Hmmm.. 
> oh.. it was the following commit.
> 
> ---
> $ git-cat-file commit e1dd23a0012c3929737798fda9fede0e783f4ff3
> tree c7f808b6433ef1015f55418e7f11f432943bdefd
> parent 5273a00d9c763108397658d440618f7ac3e40f83
> author Jens Axboe <axboe@suse.de> 1118228545 +0200
> committer Jeff Garzik <jgarzik@pobox.com> 1118300782 -0400
> 
> [PATCH] sata_sil: Fix FIFO PCI Bus Arbitration kernel oops
> 
> Correct this.
> ---
> 
> Jens, is it possible that above change fixes data corruption?
> 

Greg, the first pass of 'badblocks -t random -v -w' on a 100G partition of
a 160G disk just finished without any errors.  This is a Samsung HD160JJ
drive on a sil3112 controller.  I'll let badblocks run through the night
and perform the file copy & md5sum test tomorrow.  But my hunch is that
there is no common data corruption problem with sil3112.  It is in too
widespread use to have such a problem with so few reports.
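
For reference, the file copy & md5sum test could be scripted roughly as
below.  This is only a sketch under assumptions: the function names
(md5_of, copy_and_verify) and the directory arguments are made up for
illustration and are not taken from anyone's actual test setup.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src_dir, dst_dir):
    """Copy every regular file in src_dir to dst_dir (hypothetical paths)
    and return the names of files whose MD5 differs after the copy."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    mismatched = []
    for src in sorted(p for p in src_dir.iterdir() if p.is_file()):
        dst = dst_dir / src.name
        shutil.copyfile(src, dst)
        if md5_of(src) != md5_of(dst):
            mismatched.append(src.name)
    return mismatched
```

Note that a read-back right after the copy may be served from the page
cache rather than from the disk, so for a real test the destination
should probably be remounted (or caches dropped) before verifying.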

What exact controller/disk did you use?  Care to retest your setup with 
2.6.14?
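
On the failed-sector logs mentioned earlier in the thread: a rough way
to tell random errors from localized ones is to check how many bad
blocks repeat across runs.  A minimal sketch, assuming each run's log is
a plain list of block numbers, one per line (as written by badblocks
-o); the function names are hypothetical.

```python
from collections import Counter


def load_bad_blocks(lines):
    """Parse an iterable of text lines into a set of block numbers,
    skipping blank lines."""
    return {int(line) for line in lines if line.strip()}


def repeat_fraction(runs):
    """Given one set of bad block numbers per run, return the fraction
    of distinct bad blocks that were reported in more than one run."""
    counts = Counter()
    for run in runs:
        counts.update(set(run))
    if not counts:
        return 0.0
    repeated = sum(1 for n in counts.values() if n > 1)
    return repeated / len(counts)
```

A fraction near zero over several runs would point towards the
controller or driver; a high fraction would suggest the same spots on
the disk keep failing.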

-- 
tejun

Thread overview: 23+ messages
2005-11-07  9:59 sata_sil24 corruption details linux
2005-11-07 16:15 ` Greg Freemyer
2005-11-10  7:17 ` linux
2005-11-10  9:01   ` Tejun Heo
2005-11-10 14:15     ` Greg Freemyer
2005-11-10 14:41       ` Tejun Heo
2005-11-10 15:26         ` linux
2005-11-10 17:32         ` Tejun Heo [this message]
2005-11-10 20:34           ` Greg Freemyer
2005-11-12  0:49             ` Greg Freemyer
2005-11-12  2:59               ` Tejun Heo
2005-11-13 10:19                 ` Tejun Heo
2005-11-14 23:30                   ` Greg Freemyer
2005-11-18  2:23                     ` sata_sil24 corruption FIXED by motherboard swap linux
2005-11-18 19:36                       ` sata_sil24 test support linux
2005-11-22  0:23                         ` linux
2005-11-22  1:52                           ` Tejun Heo
2005-11-11  2:16           ` sata_sil24 corruption details linux
2005-11-13  6:11             ` linux
2005-11-10 17:39         ` Jens Axboe
2005-11-10 20:27   ` Edward Falk
  -- strict thread matches above, loose matches on Subject: below --
2005-11-07 16:05 SMALL, Timothy
2005-11-15  9:30 SMALL, Timothy
