From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S264411AbUFSSGg (ORCPT ); Sat, 19 Jun 2004 14:06:36 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S264515AbUFSSGg (ORCPT ); Sat, 19 Jun 2004 14:06:36 -0400
Received: from port760.ds1-suoe.adsl.cybercity.dk ([212.242.163.7]:12115
	"EHLO mha.dyndns.dk") by vger.kernel.org with ESMTP
	id S264411AbUFSSG3 (ORCPT ); Sat, 19 Jun 2004 14:06:29 -0400
Subject: Corruption and crashes with SIL3112A SATA chipset
From: Martin Alexander Hammer
To: linux-kernel@vger.kernel.org
Content-Type: text/plain
Message-Id: <1087668387.1972.72.camel@idoru>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6
Date: Sat, 19 Jun 2004 20:06:27 +0200
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

I'm experiencing both data corruption and crashes when using a SATA
controller with the SIL3112A chipset.

The controller itself is a "Syba" PCI adapter with two SATA-150
connectors:
http://www.syba.com/us/en/product/43/02/03/index.html

Two 200GB Seagate disks are connected to the adapter, and I have tested
it in two different machines by trying to store 180GB of data on each
drive.

Here's a list of the combinations of kernels and drivers that I have
tried, and what the outcome was:

*** Machine 1: Pentium 4, VIA P4X266A chipset
Kernel: 2.6.5 and 2.6.7
Driver: siimage and sata_sil

Files copied to either of the two Seagate disks are instantly corrupted.
Md5sum returns a different checksum each time it is run on the same
file. I have also tried using only one of the disks at a time, and using
another brand of SATA cables, but no luck.

*** Machine 2: Pentium 3, Intel i815 chipset
Kernel: 2.4.27-pre6 and 2.6.7
Driver: sata_sil

At first all seems fine, but each time either of the disks gets filled
to around 80-85GB, something crashes and takes the entire system down
with it.
No errors are logged, but it manages to write a few lines to the
console, among others:
"lost page write due to I/O error"
... and some SCSI errors. Unfortunately, I didn't write it all down, but
I can easily crash it again if anyone needs it.

*** Machine 2: Pentium 3, Intel i815 chipset
Kernel: 2.6.7
Driver: siimage

This one is a little slower than the sata_sil driver, but in the
beginning everything seemed fine again. About 50GB into the copying, the
transfer rate slows down to a couple of megabytes per second, and the
following appears in the log several times:

Jun 19 18:41:38 debian kernel: hde: sata_error = 0x00090000, watchdog = 1, siimage_mmio_ide_dma_test_irq
Jun 19 18:41:58 debian kernel: hde: dma_timer_expiry: dma status == 0x21
Jun 19 18:42:08 debian kernel: hde: DMA timeout error
Jun 19 18:42:08 debian kernel: hde: dma timeout error: status=0x50 { DriveReady SeekComplete }
Jun 19 19:04:50 debian kernel: hde: sata_error = 0x00090000, watchdog = 1, siimage_mmio_ide_dma_test_irq
Jun 19 19:04:50 debian kernel: hde: sata_error = 0x00090000, watchdog = 1, siimage_mmio_ide_dma_test_irq
Jun 19 19:05:10 debian kernel: hde: dma_timer_expiry: dma status == 0x21
Jun 19 19:05:20 debian kernel: hde: DMA timeout error
Jun 19 19:05:20 debian kernel: hde: dma timeout error: status=0x50 { DriveReady SeekComplete }
Jun 19 19:05:20 debian kernel:

But then it starts to get really bad, and it spews out this several
times:

Jun 19 19:30:43 debian kernel: [] __report_bad_irq+0x2a/0x90
Jun 19 19:30:43 debian kernel: [] note_interrupt+0x70/0xb0
Jun 19 19:30:43 debian kernel: [] do_IRQ+0xe0/0xf0
Jun 19 19:30:43 debian kernel: [] common_interrupt+0x18/0x20
Jun 19 19:30:43 debian kernel:
Jun 19 19:31:12 debian kernel: hde: lost interrupt
Jun 19 19:31:12 debian kernel: hde: task_out_intr: status=0x50 { DriveReady SeekComplete }
Jun 19 19:31:12 debian kernel:
Jun 19 19:31:12 debian kernel: hde: status timeout: status=0xd0 { Busy }
Jun 19 19:31:12 debian kernel:
Jun 19 19:31:12 debian kernel: ide2: reset phy, status=0x00000113, siimage_reset
Jun 19 19:31:12 debian kernel: ide2: reset: success
Jun 19 19:31:14 debian kernel: [] __report_bad_irq+0x2a/0x90
Jun 19 19:31:14 debian kernel: [] note_interrupt+0x70/0xb0
Jun 19 19:31:14 debian kernel: [] do_IRQ+0xe0/0xf0
Jun 19 19:31:14 debian kernel: [] common_interrupt+0x18/0x20
Jun 19 19:31:14 debian kernel: [] default_idle+0x23/0x40
Jun 19 19:31:45 debian kernel: [] cpu_idle+0x34/0x40
Jun 19 19:31:45 debian kernel: [] start_kernel+0x148/0x170
Jun 19 19:31:45 debian kernel: [] unknown_bootoption+0x0/0x120

... and this:

Jun 19 19:34:22 debian kernel: [] __report_bad_irq+0x2a/0x90
Jun 19 19:34:22 debian kernel: [] note_interrupt+0x70/0xb0
Jun 19 19:34:22 debian kernel: [] do_IRQ+0xe0/0xf0
Jun 19 19:34:22 debian kernel: [] common_interrupt+0x18/0x20
Jun 19 19:34:22 debian kernel: [] inode2sd+0x35/0x160
Jun 19 19:34:22 debian kernel: [] wake_up_buffer+0x13/0x40
Jun 19 19:34:22 debian kernel: [] reiserfs_update_sd_size+0x159/0x230
Jun 19 19:34:22 debian kernel: [] reiserfs_dirty_inode+0x0/0x90
Jun 19 19:34:22 debian kernel: [] reiserfs_dirty_inode+0x73/0x90
Jun 19 19:34:22 debian kernel: [] __mark_inode_dirty+0x1ad/0x1c0
Jun 19 19:34:22 debian kernel: [] inode_update_time+0xd0/0xe0
Jun 19 19:34:22 debian kernel: [] reiserfs_file_write+0x24f/0x690
Jun 19 19:34:22 debian kernel: [] linvfs_read+0x8b/0xa0
Jun 19 19:34:22 debian kernel: [] do_sync_read+0x89/0xc0
Jun 19 19:34:22 debian kernel: [] handle_IRQ_event+0x49/0x80
Jun 19 19:34:22 debian kernel: [] do_IRQ+0x8c/0xf0
Jun 19 19:34:22 debian kernel: [] common_interrupt+0x18/0x20
Jun 19 19:34:22 debian kernel: [] vfs_write+0xb8/0x130
Jun 19 19:34:22 debian kernel: [] sys_write+0x42/0x70
Jun 19 19:34:22 debian kernel: [] syscall_call+0x7/0xb

And this is written to the console several times:

debian kernel: Disabling IRQ #18

(IRQ 18 belongs to the SATA adapter).

Have I got a bad SATA controller, or what is going on here?
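
For anyone who wants to reproduce the Machine 1 symptom (a different
md5sum on each run over the same file), here is a minimal sketch of that
check in Python. It is not what was actually run for this report, which
was plain md5sum; the function names file_md5 and reads_are_stable are
made up for illustration, and the path in the usage comment is a
placeholder.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reads_are_stable(path, passes=3):
    """Hash the same file several times and compare the digests.

    Returns (stable, digests): stable is True when every pass produced
    the same digest; digests is the sorted list of distinct values seen.
    """
    digests = {file_md5(path) for _ in range(passes)}
    return len(digests) == 1, sorted(digests)


# Hypothetical usage; the path is a placeholder, not from the report:
#   stable, seen = reads_are_stable("/mnt/sata1/testfile", passes=5)
#   print("stable" if stable else "UNSTABLE", seen)
```

Note that repeated reads of a small file may be served from the page
cache instead of the disk, so a test file larger than RAM (or a remount
between passes) is needed for every pass to actually go through the
controller.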
-- 
Best regards,
Martin Alexander Hammer
http://mha.dyndns.dk