From: bl0
Subject: Re: sata_sil data corruption, possible workarounds
Date: Tue, 18 Dec 2012 16:23:02 +0100
References: <50CCF1E0.9070804@gmail.com> <50CEB13B.9010100@gmail.com>
List-Id: linux-ide@vger.kernel.org
To: linux-ide@vger.kernel.org
Cc: linux-pci@vger.kernel.org

On Monday 17 December 2012 06:44, Robert Hancock wrote:
> Hmm, looks like I was looking at the wrong register. The CLS itself is
> described by what I posted, so changing that does affect things (i.e.
> the threshold for Memory Read Multiple). The other value being written
> into fifo_cfg is the FIFO Write Request Control and FIFO Read Request
> Control field (that's why it's written to bits 0-2 and 8-10).
>
> "The FIFO Write Request Control and FIFO Read Request Control fields in
> these registers provide threshold settings for establishing when PCI
> requests are made to the Arbiter. The Arbiter arbitrates among the four
> requests using fixed priority with masking. The fixed priority is, from
> highest to lowest: channel 0; channel 1; channel 2; and channel 3.
> If multiple requests are present, the arbiter grants PCI bus access to the
> highest priority channel that is not masked. That channel’s request is
> then masked as long as any unmasked requests are present.
>
> ..
>
> FIFO Read Request Control. This bit field defines the FIFO threshold to
> assign priority when requesting a PCI bus read operation. A value of 00H
> indicates that read request priority is set whenever the FIFO has
> greater than 32 bytes available space, while a value of 07H indicates
> that read request priority is set whenever the FIFO has greater than
> 7x32 bytes (=224 bytes) available space.

A fencepost error? If 00H means 1x32 bytes, then 07H should mean 8x32
(=256) bytes...

> This bit field is useful when
> multiple DMA channels are competing for accessing the PCI bus.
>
>
> FIFO Write Request Control. This bit field defines the FIFO threshold to
> assign priority when requesting a PCI bus write operation. A value of
> 00H indicates that write request priority is set whenever the FIFO
> contains greater than 32 bytes, while a value of 07H indicates that
> write request priority is set whenever the FIFO contains greater than
> 7x32 bytes (=224 bytes). This bit field is useful when multiple DMA
> channels are competing for the PCI bus."
>
> The value apparently being written to the register according to the code
> (and given that the value in the CLS register is in units of 32-bit
> words) is (cache line size >> 3) + 1.
>
> From looking at the history of this code (which dates from the pre-git
> days in 2005) it comes from:
>
> https://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=fceff08ed7660f9bbe96ee659acb02841a3f1f39
>
> which refers to an issue with DMA FIFO thresholds which could cause data
> corruption.
> The description is pretty much hand-waving and doesn't
> really describe what is going on.

From the description: "The patch is to setup the DMA fifo threshold so that
there is no chance for the DMA engine to change protocol." Is there a way to
check (or add debugging messages) if/when this change of protocol is
happening?

> But it seems quite likely that
> whatever magic numbers this code is picking don't work on your system
> for some reason. It appears the root cause is likely a bug in the SiI
> chip. There shouldn't be any reason why messing around with these values
> should cause data corruption other than that.

Do you think something should be done about it in the Linux sata_sil driver?

For lack of a better solution, here is my suggestion. There is already one
option, 'slow_down', for problematic disks. Another option, for example
'cache_line_workaround', could be added for problematic motherboards. If
enabled, the most straightforward approach is to set the cache line size to 0
and not worry about the fifo_cfg register. If someone else confirms that it
solves the problem for them, this option could be enabled automatically when
a certain motherboard chipset is detected.

A comment in the source of another Silicon Image driver, siimage.c: "If you
have strange problems with nVidia chipset systems please see the SI support
documentation and update your system BIOS if necessary". Do you (or anyone
else) know what support documentation it's referring to?
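
For reference, the arithmetic discussed in this thread can be sketched in
user-space C. This is only an illustration under stated assumptions, not the
driver's code: the function names are made up, the clamp to the 3-bit field
width is my own addition, and the (field + 1) x 32 reading of the threshold is
just the fencepost guess, which the datasheet text does not confirm.

```c
#include <stdint.h>

/* Each threshold field is 3 bits wide (bits 0-2 and bits 8-10 of fifo_cfg). */
#define FIFO_FIELD_MASK 0x7u

/*
 * Hypothetical helper: derive the threshold field from the PCI cache line
 * size, which is expressed in 32-bit words, as (cache line size >> 3) + 1.
 * ASSUMPTION: the clamp to 7 is not taken from the driver; it only keeps
 * the value inside the 3-bit field in this sketch.
 */
unsigned fifo_threshold_field(uint8_t cls_dwords)
{
    unsigned v = ((unsigned)cls_dwords >> 3) + 1u;
    return v > FIFO_FIELD_MASK ? FIFO_FIELD_MASK : v;
}

/* Hypothetical helper: place the same field value in bits 0-2 and 8-10. */
uint16_t fifo_cfg_value(uint8_t cls_dwords)
{
    unsigned v = fifo_threshold_field(cls_dwords);
    return (uint16_t)((v << 8) | v);
}

/*
 * ASSUMPTION: the "fencepost" reading, where field value n stands for a
 * threshold of (n + 1) * 32 bytes, so 0 gives 32 bytes and 7 gives 256.
 */
unsigned fifo_threshold_bytes(unsigned field)
{
    return ((field & FIFO_FIELD_MASK) + 1u) * 32u;
}
```

With a 64-byte cache line (CLS = 16 dwords) this gives a field value of 3 and
a register value of 0x0303; with a 32-byte cache line (CLS = 8) it gives
0x0202.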