Re: Corrupt data - RAID sata_sil 3114 chip


From: Robert Hancock <hancockr@shaw.ca>
Cc: Bernd Schubert <bs@q-leap.de>, Tejun Heo <tj@kernel.org>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Justin Piszcz <jpiszcz@lucidpixels.com>,
	debian-user@lists.debian.org, linux-raid@vger.kernel.org,
	linux-ide@vger.kernel.org
Subject: Re: Corrupt data - RAID sata_sil 3114 chip
Date: Sat, 10 Jan 2009 18:43:00 -0600
Message-ID: <49694094.60501@shaw.ca>
In-Reply-To: <49693E08.3050209@shaw.ca>

Robert Hancock wrote:
> Bernd Schubert wrote:
>>>> I think it's something related to setting up the PCI side of things.
>>>> There have been hints that incorrect CLS setting was the culprit and I
>>>> tried the combinations but without any success and unfortunately the
>>>> problem wasn't reproducible with the hardware I have here.  :-(
>>> As far as the cache line size register, the only thing the documentation
>>> says it controls _directly_ is "With the SiI3114 as a master, initiating
>>> a read transaction, it issues PCI command Read Multiple in place, when
>>> empty space in its FIFO is larger than the value programmed in this
>>> register."
>>>
>>> The interesting thing is the commit (log below) that added code to the
>>> driver to check the PCI cache line size register and set up the FIFO
>>> thresholds:
>>>
>>>    2005/03/24 23:32:42-05:00 Carlos.Pardo
>>>    [PATCH] sata_sil: Fix FIFO PCI Bus Arbitration
>>>
>>>    This patch set default values for the FIFO PCI Bus Arbitration to
>>>    avoid data corruption. The root cause is due to our PCI bus master
>>>    handling mismatch with the chipset PCI bridge during DMA xfer (write
>>>    data to the device). The patch is to setup the DMA fifo threshold so
>>>    that there is no chance for the DMA engine to change protocol. We have
>>>    seen this problem only on one motherboard.
>>>
>>>    Signed-off-by: Silicon Image Corporation <cpardo@siliconimage.com>
>>>    Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
>>>
>>> What the code's doing is setting the FIFO thresholds, used to assign
>>> priority when requesting a PCI bus read or write operation, based on the
>>> cache line size somehow. It seems to be trusting that the chip's cache
>>> line size register has been set properly by the BIOS. The kernel should
>>> know what the cache line size is but AFAIK normally only sets it when
>>> the driver requests MWI. This chip doesn't support MWI, but it looks
>>> like pci_set_mwi would fix up the CLS register as a side effect..
>>>
>>>> Anyways, there was an interesting report that updating the BIOS on the
>>>> controller fixed the problem.
>>>>
>>>>   http://bugzilla.kernel.org/show_bug.cgi?id=10480
>>>>
>>>> Taking "lspci -nnvvvxxx" output of before and after such BIOS update
>>>> will shed some light on what's really going on.  Can you please try
>>>> that?
>>> Yes, that would be quite interesting.. the output even with the current
>>> BIOS would be useful to see if the BIOS set some stupid cache line size
>>> value..
>>
>> Unfortunately I can't update the bios/firmware of the Sil3114 
>> directly, it is onboard and the firmware is included into the 
>> mainboard bios. There is not the most recent bios version installed, 
>> but when we initially had the problems, we first tried a bios update, 
>> but it didn't help.
> 
> Well if one is really adventurous one can sometimes use some BIOS image 
> editing tools to install an updated flash image for such integrated 
> chips into the main BIOS image. This is definitely for advanced users 
> only though..
> 
>>
>> As suggested by Robert, I'm presently trying to figure out the 
>> corruption pattern. Actually our test tool easily provides these data. 
>> Unfortunately, it so far didn't report anything, although the reiserfs 
>> already got corrupted. Might be my colleague, who wrote that tool, 
>> recently broke something (as it is the second time, it doesn't report 
>> corruptions), in the past it did work reliably. Please give me a few 
>> more days...
>>
>>
>> 03:05.0 Mass storage controller [0180]: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller [1095:3114] (rev 02)
>>         Subsystem: Silicon Image, Inc. SiI 3114 SATALink Controller [1095:3114]
>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
>>         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>>         Latency: 64, Cache Line Size: 64 bytes
> 
> Well, 64 seems quite reasonable, so that doesn't really give any more 
> useful information.
> 
> I'm CCing Carlos Pardo at Silicon Image who wrote the patch above, maybe 
> he has some insight.. Carlos, we have a case here where Bernd is 
> reporting seeing corruption on an integrated SiI3114 on a Tyan Thunder 
> K8S Pro (S2882) board, AMD 8111 chipset. This is reportedly occurring 
> only with certain Seagate drives. Do you have any insight into this 
> problem, in particular as far as whether the problem worked around in 
> the patch mentioned above might be related?
> 
> There are apparently some reports of issues on NVidia chipsets as well, 
> though I don't have any details at hand.

Well, Carlos' email bounces, so much for that one. Anyone have any other 
contacts at Silicon Image?

> 
>>         Interrupt: pin A routed to IRQ 19
>>         Region 0: I/O ports at bc00 [size=8]
>>         Region 1: I/O ports at b880 [size=4]
>>         Region 2: I/O ports at b800 [size=8]
>>         Region 3: I/O ports at ac00 [size=4]
>>         Region 4: I/O ports at a880 [size=16]
>>         Region 5: Memory at feafec00 (32-bit, non-prefetchable) [size=1K]
>>         Expansion ROM at fea00000 [disabled] [size=512K]
>>         Capabilities: [60] Power Management version 2
>>                 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>>                 Status: D0 PME-Enable- DSel=0 DScale=2 PME-
>> 00: 95 10 14 31 07 01 b0 02 02 00 80 01 10 40 00 00
>> 10: 01 bc 00 00 81 b8 00 00 01 b8 00 00 01 ac 00 00
>> 20: 81 a8 00 00 00 ec af fe 00 00 00 00 95 10 14 31
>> 30: 00 00 a0 fe 60 00 00 00 00 00 00 00 0a 01 00 00
>> 40: 02 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
>> 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 60: 01 00 22 06 00 40 00 64 00 00 00 00 00 00 00 00
>> 70: 00 00 60 00 d0 d0 09 00 00 00 60 00 00 00 00 00
>> 80: 03 00 00 00 22 00 00 00 00 00 00 00 c8 93 7f ef
>> 90: 00 00 00 09 ff ff 00 00 00 00 00 19 00 00 00 00
>> a0: 01 31 15 65 dd 62 dd 62 92 43 92 43 09 40 09 40
>> b0: 01 21 15 65 dd 62 dd 62 92 43 92 43 09 40 09 40
>> c0: 84 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
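
The raw dump can be cross-checked against the decoded "Latency: 64, Cache
Line Size: 64 bytes" line. A minimal Python sketch, assuming the standard PCI
type 0 header layout (cache line size at offset 0x0c, counted in 32-bit
dwords; latency timer at offset 0x0d). The variable names are illustrative
only and are not taken from the driver:

```python
# First row of the "lspci -xxx" dump quoted above (config offsets 0x00-0x0f).
row0 = "95 10 14 31 07 01 b0 02 02 00 80 01 10 40 00 00"
cfg = bytes.fromhex(row0)

# Standard PCI config header offsets (assumed, per the PCI local bus spec).
PCI_CACHE_LINE_SIZE = 0x0C  # register value is in units of 32-bit dwords
PCI_LATENCY_TIMER = 0x0D    # register value is in PCI clocks

# Vendor and device IDs are little-endian 16-bit fields at offsets 0 and 2.
vendor = int.from_bytes(cfg[0:2], "little")
device = int.from_bytes(cfg[2:4], "little")

cls_dwords = cfg[PCI_CACHE_LINE_SIZE]
cls_bytes = cls_dwords * 4  # convert dwords to bytes
latency = cfg[PCI_LATENCY_TIMER]

print(f"{vendor:04x}:{device:04x} CLS={cls_bytes} bytes latency={latency}")
```

The register holds 0x10 (16 dwords), which is the 64 bytes lspci reports, so
the raw bytes agree with the decoded output.
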
>>
>>
>>
>> Cheers,
>> Bernd
>>
> 
