All of lore.kernel.org
 help / color / mirror / Atom feed
From: Robert Hancock <hancockr@shaw.ca>
Cc: Bernd Schubert <bs@q-leap.de>, Tejun Heo <tj@kernel.org>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Justin Piszcz <jpiszcz@lucidpixels.com>,
	debian-user@lists.debian.org, linux-raid@vger.kernel.org,
	linux-ide@vger.kernel.org
Subject: Re: Corrupt data - RAID sata_sil 3114 chip
Date: Sat, 10 Jan 2009 18:43:00 -0600	[thread overview]
Message-ID: <49694094.60501@shaw.ca> (raw)
In-Reply-To: <49693E08.3050209@shaw.ca>

Robert Hancock wrote:
> Bernd Schubert wrote:
>>>> I think it's something related to setting up the PCI side of things.
>>>> There have been hints that incorrect CLS setting was the culprit and I
>>>> tried thte combinations but without any success and unfortunately the
>>>> problem wasn't reproducible with the hardware I have here.  :-(
>>> As far as the cache line size register, the only thing the documentation
>>> says it controls _directly_ is "With the SiI3114 as a master, initiating
>>> a read transaction, it issues PCI command Read Multiple in place, when
>>> empty space in its FIFO is larger than the value programmed in this
>>> register."
>>>
>>> The interesting thing is the commit (log below) that added code to the
>>> driver to check the PCI cache line size register and set up the FIFO
>>> thresholds:
>>>
>>>    2005/03/24 23:32:42-05:00 Carlos.Pardo
>>>    [PATCH] sata_sil: Fix FIFO PCI Bus Arbitration
>>>
>>>    This patch set default values for the FIFO PCI Bus Arbitration to
>>>    avoid data corruption. The root cause is due to our PCI bus master
>>>    handling mismatch with the chipset PCI bridge during DMA xfer (write
>>>    data to the device). The patch is to setup the DMA fifo threshold so
>>>    that there is no chance for the DMA engine to change protocol. We 
>>> have
>>>    seen this problem only on one motherboard.
>>>
>>>    Signed-off-by: Silicon Image Corporation <cpardo@siliconimage.com>
>>>    Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
>>> 4
>>> What the code's doing is setting the FIFO thresholds, used to assign
>>> priority when requesting a PCI bus read or write operation, based on the
>>> cache line size somehow. It seems to be trusting that the chip's cache
>>> line size register has been set properly by the BIOS. The kernel should
>>> know what the cache line size is but AFAIK normally only sets it when
>>> the driver requests MWI. This chip doesn't support MWI, but it looks
>>> like pci_set_mwi would fix up the CLS register as a side effect..
>>>
>>>> Anyways, there was an interesting report that updating the BIOS on the
>>>> controller fixed the problem.
>>>>
>>>>   http://bugzilla.kernel.org/show_bug.cgi?id=10480
>>>>
>>>> Taking "lspci -nnvvvxxx" output of before and after such BIOS update
>>>> will shed some light on what's really going on.  Can you please try
>>>> that?
>>> Yes, that would be quite interesting.. the output even with the current
>>> BIOS would be useful to see if the BIOS set some stupid cache line size
>>> value..
>>
>> Unfortunately I can't update the bios/firmware of the Sil3114 
>> directly, it is onboard and the firmware is included into the 
>> mainboard bios. There is not the most recent bios version installed, 
>> but when we initially had the problems, we first tried a bios update, 
>> but it didn't help.
> 
> Well if one is really adventurous one can sometimes use some BIOS image 
> editing tools to install an updated flash image for such integrated 
> chips into the main BIOS image. This is definitely for advanced users 
> only though..
> 
>>
>> As suggested by Robert, I'm presently trying to figure out the 
>> corruption pattern. Actually our test tool easily provides these data. 
>> Unfortunately, it so far didn't report anything, although the reiserfs 
>> already got corrupted. Might be my colleague, who wrote that tool, 
>> recently broke something (as it is the second time, it doesn't report 
>> corruptions), in the past it did work reliably. Please give me a few 
>> more days...
>>
>>
>> 03:05.0 Mass storage controller [0180]: Silicon Image, Inc. SiI 3114 
>> [SATALink/SATARaid] Serial ATA Controller [1095:3114] (rev 02)
>>         Subsystem: Silicon Image, Inc. SiI 3114 SATALink Controller 
>> [1095:3114]
>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
>> ParErr- Stepping- SERR+ FastB2B-
>>         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium 
>> >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>>         Latency: 64, Cache Line Size: 64 bytes
> 
> Well, 64 seems quite reasonable, so that doesn't really give any more 
> useful information.
> 
> I'm CCing Carlos Pardo at Silicon Image who wrote the patch above, maybe 
> he has some insight.. Carlos, we have a case here where Bernd is 
> reporting seeing corruption on an integrated SiI3114 on a Tyan Thunder 
> K8S Pro (S2882) board, AMD 8111 chipset. This is reportedly occurring 
> only with certain Seagate drives. Do you have any insight into this 
> problem, in particular as far as whether the problem worked around in 
> the patch mentioned above might be related?
> 
> There are apparently some reports of issues on NVidia chipsets as well, 
> though I don't have any details at hand.

Well, Carlos' email bounces, so much for that one. Anyone have any other 
contacts at Silicon Image?

> 
>>         Interrupt: pin A routed to IRQ 19
>>         Region 0: I/O ports at bc00 [size=8]
>>         Region 1: I/O ports at b880 [size=4]
>>         Region 2: I/O ports at b800 [size=8]
>>         Region 3: I/O ports at ac00 [size=4]
>>         Region 4: I/O ports at a880 [size=16]
>>         Region 5: Memory at feafec00 (32-bit, non-prefetchable) [size=1K]
>>         Expansion ROM at fea00000 [disabled] [size=512K]
>>         Capabilities: [60] Power Management version 2
>>                 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA 
>> PME(D0-,D1-,D2-,D3hot-,D3cold-)
>>                 Status: D0 PME-Enable- DSel=0 DScale=2 PME-
>> 00: 95 10 14 31 07 01 b0 02 02 00 80 01 10 40 00 00
>> 10: 01 bc 00 00 81 b8 00 00 01 b8 00 00 01 ac 00 00
>> 20: 81 a8 00 00 00 ec af fe 00 00 00 00 95 10 14 31
>> 30: 00 00 a0 fe 60 00 00 00 00 00 00 00 0a 01 00 00
>> 40: 02 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
>> 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 60: 01 00 22 06 00 40 00 64 00 00 00 00 00 00 00 00
>> 70: 00 00 60 00 d0 d0 09 00 00 00 60 00 00 00 00 00
>> 80: 03 00 00 00 22 00 00 00 00 00 00 00 c8 93 7f ef
>> 90: 00 00 00 09 ff ff 00 00 00 00 00 19 00 00 00 00
>> a0: 01 31 15 65 dd 62 dd 62 92 43 92 43 09 40 09 40
>> b0: 01 21 15 65 dd 62 dd 62 92 43 92 43 09 40 09 40
>> c0: 84 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>>
>>
>> Cheers,
>> Bernd
>>
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


  reply	other threads:[~2009-01-11  0:43 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-03 20:04 Corrupt data - RAID sata_sil 3114 chip Bernd Schubert
2009-01-03 20:53 ` Robert Hancock
2009-01-03 21:11   ` Bernd Schubert
2009-01-03 23:23     ` Robert Hancock
2009-01-07  4:59 ` Tejun Heo
2009-01-07  5:38   ` Robert Hancock
2009-01-07 15:31     ` Bernd Schubert
2009-01-11  0:32       ` Robert Hancock
2009-01-11  0:43         ` Robert Hancock [this message]
2009-01-12  1:30           ` Tejun Heo
2009-01-19 18:43             ` Dave Jones
2009-01-20  2:50               ` Robert Hancock
2009-01-20 20:07                 ` Dave Jones
  -- strict thread matches above, loose matches on Subject: below --
2010-01-29 16:13 Ulli.Brennenstuhl
2010-01-29 19:37 ` Robert Hancock
2010-02-06  3:54   ` Tejun Heo
2010-02-06 15:16     ` Tim Small
2010-02-07 16:09       ` Robert Hancock
2010-02-08  2:31         ` Tejun Heo
2010-02-08 14:25         ` Tim Small
     [not found] <bQVFb-3SB-37@gated-at.bofh.it>
     [not found] ` <bQVFb-3SB-39@gated-at.bofh.it>
     [not found]   ` <bQVFb-3SB-41@gated-at.bofh.it>
     [not found]     ` <bQVFc-3SB-43@gated-at.bofh.it>
     [not found]       ` <bQVFc-3SB-45@gated-at.bofh.it>
     [not found]         ` <bQVFc-3SB-47@gated-at.bofh.it>
     [not found]           ` <bQVFb-3SB-35@gated-at.bofh.it>
     [not found]             ` <4963306F.4060504@sm7jqb.se>
2009-01-06 10:48               ` Justin Piszcz
     [not found] <495E01E3.9060903@sm7jqb.se>
2009-01-02 12:42 ` Justin Piszcz
2009-01-02 21:30   ` Bernd Schubert
2009-01-02 21:47     ` Twigathy
2009-01-03  2:31     ` Redeeman
2009-01-03 13:13       ` Bernd Schubert
2009-01-03 13:39     ` Alan Cox
2009-01-03 16:20       ` Bernd Schubert
2009-01-03 18:31         ` Robert Hancock
2009-01-03 22:19     ` James Youngman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49694094.60501@shaw.ca \
    --to=hancockr@shaw.ca \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=bs@q-leap.de \
    --cc=debian-user@lists.debian.org \
    --cc=jpiszcz@lucidpixels.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.