From: Robert Hancock <hancockr@shaw.ca>
Cc: Bernd Schubert <bs@q-leap.de>, Tejun Heo <tj@kernel.org>,
Alan Cox <alan@lxorguk.ukuu.org.uk>,
Justin Piszcz <jpiszcz@lucidpixels.com>,
debian-user@lists.debian.org, linux-raid@vger.kernel.org,
linux-ide@vger.kernel.org
Subject: Re: Corrupt data - RAID sata_sil 3114 chip
Date: Sat, 10 Jan 2009 18:43:00 -0600
Message-ID: <49694094.60501@shaw.ca>
In-Reply-To: <49693E08.3050209@shaw.ca>
Robert Hancock wrote:
> Bernd Schubert wrote:
>>>> I think it's something related to setting up the PCI side of things.
>>>> There have been hints that incorrect CLS setting was the culprit and I
>>>> tried the combinations but without any success and unfortunately the
>>>> problem wasn't reproducible with the hardware I have here. :-(
>>> As far as the cache line size register, the only thing the documentation
>>> says it controls _directly_ is "With the SiI3114 as a master, initiating
>>> a read transaction, it issues PCI command Read Multiple in place, when
>>> empty space in its FIFO is larger than the value programmed in this
>>> register."
>>>
>>> The interesting thing is the commit (log below) that added code to the
>>> driver to check the PCI cache line size register and set up the FIFO
>>> thresholds:
>>>
>>> 2005/03/24 23:32:42-05:00 Carlos.Pardo
>>> [PATCH] sata_sil: Fix FIFO PCI Bus Arbitration
>>>
>>> This patch set default values for the FIFO PCI Bus Arbitration to
>>> avoid data corruption. The root cause is due to our PCI bus master
>>> handling mismatch with the chipset PCI bridge during DMA xfer (write
>>> data to the device). The patch is to setup the DMA fifo threshold so
>>> that there is no chance for the DMA engine to change protocol. We
>>> have seen this problem only on one motherboard.
>>>
>>> Signed-off-by: Silicon Image Corporation <cpardo@siliconimage.com>
>>> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
>>>
>>> What the code's doing is setting the FIFO thresholds, used to assign
>>> priority when requesting a PCI bus read or write operation, based on the
>>> cache line size somehow. It seems to be trusting that the chip's cache
>>> line size register has been set properly by the BIOS. The kernel should
>>> know what the cache line size is but AFAIK normally only sets it when
>>> the driver requests MWI. This chip doesn't support MWI, but it looks
>>> like pci_set_mwi would fix up the CLS register as a side effect..
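
To make the dependency on the CLS register concrete, here is a minimal Python model of the threshold calculation. It assumes the driver computes (cls >> 3) + 1 from the raw register value and writes the result into both the low and high byte of each port's FIFO configuration register. That formula is a recollection of sata_sil.c from this period, not something stated in the commit log above, so check it against the actual source. The function name fifo_cfg_from_cls is hypothetical.

```python
from typing import Optional


def fifo_cfg_from_cls(cls_reg: int) -> Optional[int]:
    """Model the FIFO config value derived from the PCI cache line size register.

    cls_reg is the raw 8-bit register value (in units of 32-bit dwords).
    ASSUMPTION: threshold = (cls_reg >> 3) + 1, replicated into both bytes
    of a 16-bit FIFO config word. Verify against the real driver source.
    """
    if not 0 <= cls_reg <= 0xFF:
        raise ValueError("cache line size register is a single byte")
    if cls_reg == 0:
        # Register left unset: in this model there is nothing to derive.
        return None
    thresh = (cls_reg >> 3) + 1
    return (thresh << 8) | thresh


if __name__ == "__main__":
    # 0x10 dwords corresponds to a 64-byte cache line.
    for raw in (0x00, 0x08, 0x10, 0x20):
        cfg = fifo_cfg_from_cls(raw)
        shown = "unset" if cfg is None else f"0x{cfg:04x}"
        print(f"CLS register 0x{raw:02x} -> FIFO config {shown}")
```

Under that assumption, a wrong or zero CLS value left by the BIOS feeds straight into the thresholds, which is why the register contents are worth checking.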
>>>
>>>> Anyways, there was an interesting report that updating the BIOS on the
>>>> controller fixed the problem.
>>>>
>>>> http://bugzilla.kernel.org/show_bug.cgi?id=10480
>>>>
>>>> Taking "lspci -nnvvvxxx" output of before and after such BIOS update
>>>> will shed some light on what's really going on. Can you please try
>>>> that?
>>> Yes, that would be quite interesting.. the output even with the current
>>> BIOS would be useful to see if the BIOS set some stupid cache line size
>>> value..
>>
>> Unfortunately I can't update the bios/firmware of the Sil3114
>> directly, it is onboard and the firmware is included into the
>> mainboard bios. There is not the most recent bios version installed,
>> but when we initially had the problems, we first tried a bios update,
>> but it didn't help.
>
> Well if one is really adventurous one can sometimes use some BIOS image
> editing tools to install an updated flash image for such integrated
> chips into the main BIOS image. This is definitely for advanced users
> only though..
>
>>
>> As suggested by Robert, I'm presently trying to figure out the
>> corruption pattern. Actually our test tool easily provides these data.
>> Unfortunately, it so far didn't report anything, although the reiserfs
>> already got corrupted. Might be my colleague, who wrote that tool,
>> recently broke something (as it is the second time, it doesn't report
>> corruptions), in the past it did work reliably. Please give me a few
>> more days...
>>
>>
>> 03:05.0 Mass storage controller [0180]: Silicon Image, Inc. SiI 3114
>> [SATALink/SATARaid] Serial ATA Controller [1095:3114] (rev 02)
>> Subsystem: Silicon Image, Inc. SiI 3114 SATALink Controller
>> [1095:3114]
>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>> ParErr- Stepping- SERR+ FastB2B-
>> Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>> >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>> Latency: 64, Cache Line Size: 64 bytes
>
> Well, 64 seems quite reasonable, so that doesn't really give any more
> useful information.
>
> I'm CCing Carlos Pardo at Silicon Image who wrote the patch above, maybe
> he has some insight.. Carlos, we have a case here where Bernd is
> reporting seeing corruption on an integrated SiI3114 on a Tyan Thunder
> K8S Pro (S2882) board, AMD 8111 chipset. This is reportedly occurring
> only with certain Seagate drives. Do you have any insight into this
> problem, in particular as far as whether the problem worked around in
> the patch mentioned above might be related?
>
> There are apparently some reports of issues on NVidia chipsets as well,
> though I don't have any details at hand.
Well, Carlos' email bounces, so much for that one. Anyone have any other
contacts at Silicon Image?
>
>> Interrupt: pin A routed to IRQ 19
>> Region 0: I/O ports at bc00 [size=8]
>> Region 1: I/O ports at b880 [size=4]
>> Region 2: I/O ports at b800 [size=8]
>> Region 3: I/O ports at ac00 [size=4]
>> Region 4: I/O ports at a880 [size=16]
>> Region 5: Memory at feafec00 (32-bit, non-prefetchable) [size=1K]
>> Expansion ROM at fea00000 [disabled] [size=512K]
>> Capabilities: [60] Power Management version 2
>> Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA
>> PME(D0-,D1-,D2-,D3hot-,D3cold-)
>> Status: D0 PME-Enable- DSel=0 DScale=2 PME-
>> 00: 95 10 14 31 07 01 b0 02 02 00 80 01 10 40 00 00
>> 10: 01 bc 00 00 81 b8 00 00 01 b8 00 00 01 ac 00 00
>> 20: 81 a8 00 00 00 ec af fe 00 00 00 00 95 10 14 31
>> 30: 00 00 a0 fe 60 00 00 00 00 00 00 00 0a 01 00 00
>> 40: 02 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
>> 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 60: 01 00 22 06 00 40 00 64 00 00 00 00 00 00 00 00
>> 70: 00 00 60 00 d0 d0 09 00 00 00 60 00 00 00 00 00
>> 80: 03 00 00 00 22 00 00 00 00 00 00 00 c8 93 7f ef
>> 90: 00 00 00 09 ff ff 00 00 00 00 00 19 00 00 00 00
>> a0: 01 31 15 65 dd 62 dd 62 92 43 92 43 09 40 09 40
>> b0: 01 21 15 65 dd 62 dd 62 92 43 92 43 09 40 09 40
>> c0: 84 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
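
The raw dump can be cross-checked against the decoded lspci lines. A minimal Python sketch, using only the "00:" row quoted above and the standard PCI header layout (vendor ID at 0x00, device ID at 0x02, cache line size at 0x0C in units of 32-bit dwords, latency timer at 0x0D); the function name decode_header is made up for illustration.

```python
# Row "00:" of the config-space dump quoted above.
ROW_00 = "95 10 14 31 07 01 b0 02 02 00 80 01 10 40 00 00"


def decode_header(row: str) -> dict:
    """Decode a few fields from the first 16 bytes of PCI config space."""
    data = bytes.fromhex(row.replace(" ", ""))
    if len(data) != 16:
        raise ValueError("expected exactly 16 bytes")
    return {
        # Multi-byte fields are little-endian.
        "vendor_id": int.from_bytes(data[0x00:0x02], "little"),
        "device_id": int.from_bytes(data[0x02:0x04], "little"),
        "revision": data[0x08],
        # The register counts 32-bit dwords; convert to bytes.
        "cache_line_bytes": data[0x0C] * 4,
        "latency_timer": data[0x0D],
    }


if __name__ == "__main__":
    info = decode_header(ROW_00)
    print(f"device      : {info['vendor_id']:04x}:{info['device_id']:04x}"
          f" (rev {info['revision']:02x})")
    print(f"cache line  : {info['cache_line_bytes']} bytes")
    print(f"latency     : {info['latency_timer']}")
```

The register holds 0x10 dwords, i.e. 64 bytes, and the latency timer is 0x40, which agrees with the "Latency: 64, Cache Line Size: 64 bytes" line lspci printed.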
>>
>>
>>
>> Cheers,
>> Bernd
>>
>