linux-ide.vger.kernel.org archive mirror
* Sil3114 and Hitachi disks
@ 2004-12-23 15:35 Bogdan Costescu
  2004-12-24 17:20 ` Bogdan Costescu
  0 siblings, 1 reply; 8+ messages in thread
From: Bogdan Costescu @ 2004-12-23 15:35 UTC (permalink / raw)
  To: linux-ide


Hi!

I see some data corruption with a Sil3114 controller and some Hitachi
disks, and I wanted to ask whether I'm the only one (i.e. whether I
just have some broken disks...).

Hardware details:
Sil3114 onboard a Tyan S2882 (dual Opteron 246, 8 GB RAM)
Hitachi HDS722516VLSA80 Rev: V34O (as reported by sata_sil), 160 GB

Software details:
TaoLinux 1.0 (RHEL3 clone), used both kernel 2.4.21-20 and the newly 
released 2.4.21-27, the x86_64 version, containing:
libata version 1.02
sata_sil version 0.54
Latest BIOS (including some update for the Sil firmware) installed

The disk is addressed with lba48:
ata4: dev 0 cfg 49:2f00 82:74eb 83:7fea 84:4023 85:74e8 86:3c02 87:4023 88:203f
ata4: dev 0 ATA, max UDMA/100, 321672960 sectors: lba48
ata4: dev 0 configured for UDMA/100

and therefore I wonder if the problem is the same as with the other 
blacklisted drives. However, no other Hitachi drive appears in the 
list and I've even found some e-mails saying that the Hitachi drives 
should not be affected.
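
As a side illustration (not part of the original message), the sector
count in the log above can be checked against the 28-bit LBA limit. This
is a minimal sketch that assumes 512-byte logical sectors; the function
names are made up for the example, and the Maxtor sector count is only
estimated from its nominal 120 GB size.

```python
SECTOR_BYTES = 512            # assumption: traditional 512-byte logical sectors
LBA28_MAX_SECTORS = 2 ** 28   # a 28-bit LBA can address at most 2^28 sectors


def capacity_bytes(sectors, sector_bytes=SECTOR_BYTES):
    """Return the raw capacity in bytes for a given sector count."""
    return sectors * sector_bytes


def needs_lba48(sectors):
    """True if the disk has more sectors than 28-bit LBA can address."""
    return sectors > LBA28_MAX_SECTORS


hitachi_sectors = 321672960            # from the "ata4: dev 0 ATA" log line
maxtor_sectors = 120 * 10**9 // 512    # assumption: nominal 120 GB, not measured

print(capacity_bytes(hitachi_sectors))   # raw size of the Hitachi disk in bytes
print(needs_lba48(hitachi_sectors))      # Hitachi: beyond the LBA28 limit
print(needs_lba48(maxtor_sectors))       # Maxtor: fits within LBA28
```

The LBA28 limit works out to 2^28 * 512 bytes, roughly 137 GB, which is
consistent with the 160 GB Hitachi being driven in lba48 mode and the
120 GB Maxtor not.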

The corruption was initially seen after running an ext3 FS for some
time and later on also when e2fsck is run. How to reproduce: on a
clean disk, I create a partition (tested even with a small partition
at the beginning of the disk), format it with 'mke2fs -j -m 0', mount
it without any options, copy some files there with tar from the
network, unmount, then run e2fsck which finds lots of errors. I have 2
identical disks and tried both of them, also with different SATA
cables and connected to different controller ports and the results
were the same.
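
The reproduction steps above could be written down roughly as follows.
This is only a dry-run sketch that prints the commands instead of
running them. The device name, mount point, and the ssh/tar copy step
are hypothetical placeholders, and the "-f -n" flags for e2fsck (force a
check, open read-only) are an assumption, since the message does not say
which options were used.

```python
import shlex


def repro_commands(partition, mountpoint, copy_step):
    """Build the shell commands for the described test, without running them.

    copy_step is whatever command copies files onto the mounted
    filesystem; the original message only says "with tar from the network".
    """
    p = shlex.quote(partition)
    m = shlex.quote(mountpoint)
    return [
        f"mke2fs -j -m 0 {p}",   # ext3 (journal), no reserved blocks
        f"mount {p} {m}",        # mount without any options
        copy_step,               # copy some files onto the filesystem
        f"umount {m}",
        f"e2fsck -f -n {p}",     # assumption: forced, read-only check
    ]


if __name__ == "__main__":
    # Hypothetical names; adjust before using on real hardware.
    copy = "ssh somehost 'tar cf - somedir' | tar xf - -C /mnt/test"
    for cmd in repro_commands("/dev/sdX1", "/mnt/test", copy):
        print(cmd)
```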

Under exactly the same conditions, a Maxtor 6Y120M0 Rev: YAR5 (120 GB, no
lba48) works fine, as does a PATA drive connected to the onboard
controller (which rules out corruption coming from bad memory or a bad
kernel), so this leads me to believe that the Hitachi drives are the
problem.

I did not get any error messages in the logs nor crashes, except for 
ext3 starting to complain at some point which led me to these tests.

Have other people used this combination of controller and disks? Is
this enough data to blacklist this kind of disk? Is there any other
data that I can provide?

Thanks in advance!

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De





* Re: Sil3114 and Hitachi disks
  2004-12-23 15:35 Sil3114 and Hitachi disks Bogdan Costescu
@ 2004-12-24 17:20 ` Bogdan Costescu
  2004-12-24 17:49   ` Bogdan Costescu
  0 siblings, 1 reply; 8+ messages in thread
From: Bogdan Costescu @ 2004-12-24 17:20 UTC (permalink / raw)
  To: linux-ide

On Thu, 23 Dec 2004, Bogdan Costescu wrote:

> I see some data corruption with a Sil3114 controller and some Hitachi
> disks, and I wanted to ask whether I'm the only one (i.e. whether I
> just have some broken disks...).

More data:
- using a UP or an SMP x86_64 kernel doesn't make any difference.
- adding this model of Hitachi disks to the blacklist of sata_sil in
the same way as the Seagate disks (mod15 bug) reduces the speed, but
does not fix the misbehaviour, when using the x86_64 kernel.
- the Hitachi disks behave well if connected to a standalone Sil3114
controller in a 32-bit computer running an SMP athlon kernel.
- the Hitachi disks behave well if I use the original hardware setup
(dual Opteron), but run the SMP athlon kernel and 32-bit userland.
- when moving a disk with a corrupted FS (checked with e2fsck -n) from
an x86_64 kernel to an athlon kernel, the corruption is still found by
e2fsck. When moving a disk with a non-corrupted FS from an athlon
kernel to x86_64, e2fsck does not find any problem.

So, I come to the following conclusions:
- my Hitachi disks are not broken, therefore my suggestion of 
blacklisting is unfounded.
- the combination of x86_64 kernel, sata_sil and the Hitachi disks is
not good; the corruption seems to appear while writing to disk.
Unfortunately, I don't have any other SATA disks with lba48, so I
can't test whether this is a more general lba48 problem or restricted to
this specific Hitachi model.

Please let me know if I can provide more data for fixing this.

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De



* Re: Sil3114 and Hitachi disks
  2004-12-24 17:20 ` Bogdan Costescu
@ 2004-12-24 17:49   ` Bogdan Costescu
  2004-12-25  4:46     ` Jeff Garzik
  0 siblings, 1 reply; 8+ messages in thread
From: Bogdan Costescu @ 2004-12-24 17:49 UTC (permalink / raw)
  To: linux-ide

On Fri, 24 Dec 2004, Bogdan Costescu wrote:

> - the combination of x86_64 kernel, sata_sil and the Hitachi disks is
> not good; the corruption seems to appear while writing to disk.

I've gotten some more inspiration just after I sent the message...

The Sil3114 appears on the PCI bus as a 32-bit 66MHz device, therefore
it should use IOMMU with >4 GB RAM. Booting the x86_64 kernel with
"mem=2048m" makes the corruption go away; the difference is that the
IOMMU is automatically disabled by the kernel:

PCI-DMA: Disabling IOMMU.

With 8 GB, there is an error related to the IOMMU:

Checking aperture...
CPU 0: aperture @ 1ee0000000 size 32768 KB
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
Mapping aperture over 65536 KB of RAM @ 8000000

The IOMMU was indeed disabled in the BIOS. I enabled it and tried
several aperture sizes, but the IOMMU message from the kernel stays the
same. Checking my log backups, I have always gotten this message
related to the IOMMU, while the non-lba48 Maxtor disk worked fine for
several months and still works.

So, is there any link between IOMMU and lba48 ?
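
To illustrate the addressing argument above (this sketch is not part of
the original message): a 32-bit PCI device can only generate bus
addresses below 4 GiB, so a buffer located above that boundary has to be
remapped through an IOMMU or copied through a bounce buffer. The helper
names are invented for the example, and the second function makes the
simplifying assumption that RAM is mapped contiguously from address 0,
which real machines with a PCI memory hole do not strictly follow.

```python
MIB = 1 << 20
GIB = 1 << 30
DMA_MASK_32 = (1 << 32) - 1   # highest address a 32-bit device can reach


def reachable_by_32bit_dma(phys_addr, length):
    """True if the whole buffer lies at or below the 32-bit DMA mask."""
    return phys_addr + length - 1 <= DMA_MASK_32


def ram_may_exceed_32bit_dma(ram_bytes):
    """True if some RAM lies above 4 GiB.

    Simplification: assumes RAM starts at 0 and is contiguous.
    """
    return ram_bytes > DMA_MASK_32 + 1


print(reachable_by_32bit_dma(0x10000000, 4096))   # buffer in low memory
print(reachable_by_32bit_dma(5 * GIB, 4096))      # buffer above 4 GiB
print(ram_may_exceed_32bit_dma(2048 * MIB))       # the "mem=2048m" case
print(ram_may_exceed_32bit_dma(8 * GIB))          # the full 8 GB case
```

Under these assumptions, limiting the machine to 2048 MB keeps every
buffer directly reachable by the controller, which would be consistent
with the corruption going away in that configuration.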

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De



* Re: Sil3114 and Hitachi disks
  2004-12-24 17:49   ` Bogdan Costescu
@ 2004-12-25  4:46     ` Jeff Garzik
  2004-12-27 13:23       ` Bogdan Costescu
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Garzik @ 2004-12-25  4:46 UTC (permalink / raw)
  To: Bogdan Costescu; +Cc: linux-ide

On Fri, Dec 24, 2004 at 06:49:42PM +0100, Bogdan Costescu wrote:
> On Fri, 24 Dec 2004, Bogdan Costescu wrote:
> 
> > - the combination of x86_64 kernel, sata_sil and the Hitachi disks is
> > not good; the corruption seems to appear while writing to disk.
> 
> I've gotten some more inspiration just after I sent the message...
> 
> The Sil3114 appears on the PCI bus as a 32-bit 66MHz device, therefore
> it should use IOMMU with >4 GB RAM. Booting the x86_64 kernel with
> "mem=2048m" makes the corruption go away; the difference is that the
> IOMMU is automatically disabled by the kernel:

Does 2.4.29-pre3 or 2.6.10 work for you?

	Jeff





* Re: Sil3114 and Hitachi disks
  2004-12-25  4:46     ` Jeff Garzik
@ 2004-12-27 13:23       ` Bogdan Costescu
  2004-12-27 13:31         ` Jeff Garzik
  0 siblings, 1 reply; 8+ messages in thread
From: Bogdan Costescu @ 2004-12-27 13:23 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-ide


On Fri, 24 Dec 2004, Jeff Garzik wrote:

> > The Sil3114 appears on the PCI bus as a 32-bit 66MHz device, therefore
> > it should use IOMMU with >4 GB RAM. Booting the x86_64 kernel with
> > "mem=2048m" makes the corruption go away; the difference is that the
> > IOMMU is automatically disabled by the kernel:
> 
> Does 2.4.29-pre3 or 2.6.10 work for you?

Because the userland cannot easily cope with either of these
(2.4 due to NPTL and 2.6 due to modutils), I installed the x86_64
Fedora Core 3. Its kernel seems to work fine. I've tried both
the original (2.6.9-1.667) and the latest update (2.6.9-1.681_FC3) and
they both behaved well. I haven't made any modification to BIOS
settings or hardware in the meantime.

Do you want me to still try the vanilla kernels that you asked about ?

I would like to get the RHEL3 kernels to behave. Both libata and
sata_sil seem to be the same version in the RHEL3 and FC3 kernels, so
I gather this would be more of an IOMMU problem than a SATA problem. Am
I interpreting things correctly? If so, I'll file a report in Red
Hat's Bugzilla.

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De


* Re: Sil3114 and Hitachi disks
  2004-12-27 13:23       ` Bogdan Costescu
@ 2004-12-27 13:31         ` Jeff Garzik
  2004-12-27 14:22           ` Bogdan Costescu
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Garzik @ 2004-12-27 13:31 UTC (permalink / raw)
  To: Bogdan Costescu; +Cc: linux-ide

On Mon, Dec 27, 2004 at 02:23:56PM +0100, Bogdan Costescu wrote:
> I would like to get the RHEL3 kernels to behave. Both libata and

RHEL3 is a bit behind the times.  In particular it needs one hotfix for
all swiotlb platforms (includes Intel's EM64T).  The next update will
have it, but the current one does not.


> sata_sil seem to be the same version in the RHEL3 and FC3 kernels, so
> I gather this would be more of an IOMMU problem than a SATA problem. Am
> I interpreting things correctly? If so, I'll file a report in Red
> Hat's Bugzilla.

Lack-of-an-IOMMU problem actually :)

	Jeff





* Re: Sil3114 and Hitachi disks
  2004-12-27 13:31         ` Jeff Garzik
@ 2004-12-27 14:22           ` Bogdan Costescu
  2004-12-27 19:58             ` Jeff Garzik
  0 siblings, 1 reply; 8+ messages in thread
From: Bogdan Costescu @ 2004-12-27 14:22 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-ide

On Mon, 27 Dec 2004, Jeff Garzik wrote:

> RHEL3 is a bit behind the times.

That's not always a bad thing ;-)

> In particular it needs one hotfix for all swiotlb platforms
> (includes Intel's EM64T).  The next update will have it, but the
> current one does not.

Is there some patch that I can test ? I tried to comb through Red
Hat's Bugzilla, but I didn't find anything related.
Does this fix exist in vanilla 2.4.29-pre3 ?

> Lack-of-an-IOMMU problem actually :)

The hardware is an AMD Opteron with an AMD chipset, so an IOMMU should
exist. It might be that for some reason it is not working properly; I
already found some bug reports about the Tyan boards' BIOS, but this
is the latest BIOS version (actually the beta 2.03I; the stable 2.03
behaved the same). I have again tried to activate the IOMMU in the BIOS
and gave it 64 MB, but even the FC3 kernels are misdetecting it:

Checking aperture...
CPU 0: aperture @ 1ee0000000 size 64 MB
Aperture from northbridge cpu 0 beyond 4GB. Ignoring.
No AGP bridge found
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 8000000
...
PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ 8000000 size 65536 KB
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
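
As a quick illustration of the numbers in this log (not part of the
original message): the aperture base reported by the northbridge lies
far above 4 GiB, while the fallback aperture the kernel maps over RAM
sits at 128 MiB. The check below is a simplified stand-in for whatever
test the kernel actually performs, and the function name is made up.

```python
KIB = 1 << 10
MIB = 1 << 20
FOUR_GIB = 1 << 32


def aperture_below_4g(base, size):
    """True if the whole aperture window fits below the 4 GiB boundary.

    Simplified stand-in, not the kernel's actual check.
    """
    return base + size <= FOUR_GIB


bios_aperture_base = 0x1ee0000000    # "CPU 0: aperture @ 1ee0000000 size 64 MB"
fallback_base = 0x8000000            # "Mapping aperture over 65536 KB of RAM @ 8000000"
fallback_size = 65536 * KIB

print(aperture_below_4g(bios_aperture_base, 64 * MIB))   # BIOS-provided window
print(aperture_below_4g(fallback_base, fallback_size))   # kernel fallback window
print(fallback_base // MIB, fallback_size // MIB)        # fallback base and size in MiB
```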

But this is no longer SATA related, so I'll stop this thread here.

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De


* Re: Sil3114 and Hitachi disks
  2004-12-27 14:22           ` Bogdan Costescu
@ 2004-12-27 19:58             ` Jeff Garzik
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Garzik @ 2004-12-27 19:58 UTC (permalink / raw)
  To: Bogdan Costescu; +Cc: linux-ide

Bogdan Costescu wrote:
> On Mon, 27 Dec 2004, Jeff Garzik wrote:
> 
> 
>>RHEL3 is a bit behind the times.
> 
> 
> That's not always a bad thing ;-)
> 
> 
>>In particular it needs one hotfix for all swiotlb platforms
>>(includes Intel's EM64T).  The next update will have it, but the
>>current one does not.
> 
> 
> Is there some patch that I can test ? I tried to comb through Red
> Hat's Bugzilla, but I didn't find anything related.
> Does this fix exist in vanilla 2.4.29-pre3 ?


Yes.

	Jeff



