public inbox for linux-kernel@vger.kernel.org
* sata_nv issues with MCP51 SATA controller
@ 2007-09-13  7:18 Jon Ivar Rykkelid
  2007-09-13  9:16 ` Tejun Heo
  0 siblings, 1 reply; 25+ messages in thread
From: Jon Ivar Rykkelid @ 2007-09-13  7:18 UTC
  To: linux-kernel; +Cc: linux-ide

[-- Attachment #1: Type: text/plain, Size: 2437 bytes --]

Hi, I was told to forward my error report to this address.
I am keen to test again if someone has a good suggestion or an updated
driver (give me a couple of days in that case).


-----
Hi,

I'm having serious disk issues when using the on-board NVIDIA SATA
controller for my HDDs (my motherboard is a Gigabyte GA-N650SLI-DS4 with
an NVIDIA chipset; the CPU is an Intel Core 2 Quad).

excerpt from "lspci":
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)

I have a normal IDE (PATA) disk attached to the IDE controller, and that
works fine (/dev/hda).

However, any disks connected to the SATA controllers (I have tried 2 and
4) eventually fail; see the attached log (the relevant excerpt from
/var/log/messages).
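To line up when each SATA port first froze and when libata finally disabled its device, the relevant lines can be pulled out of /var/log/messages with a short script. This is a minimal sketch, assuming the classic syslogd line format seen in the attached log (`Mon DD HH:MM:SS host kernel: ataN[.DD]: message`); `port_timeline` is a hypothetical helper, not part of any kernel or distro tool.

```python
import re
from collections import defaultdict

# Assumed syslogd prefix, as in the attached log, e.g.:
#   "Sep  8 00:05:59 mirakel kernel: ata1.00: exception ... frozen"
#   "Sep  8 00:07:30 mirakel kernel: ata1: EH complete"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"(?P<port>ata\d+)(?:\.\d+)?:\s+(?P<msg>.*)$"
)


def port_timeline(lines):
    """Map each libata port (ata1, ata2, ...) to the timestamps of its first
    'exception ... frozen' message and of the message disabling its device."""
    timeline = defaultdict(dict)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a libata port message (sd, md, ext3, continuation lines)
        port, msg, ts = m.group("port"), m.group("msg").rstrip(), m.group("ts")
        if msg.startswith("exception") and msg.endswith("frozen"):
            timeline[port].setdefault("frozen", ts)
        elif msg == "disabled":
            timeline[port].setdefault("disabled", ts)
    return dict(timeline)
```

Passing it an open file, e.g. `port_timeline(open("/var/log/messages"))`, on the attached log gives ata1 and ata2 freezing within a second of each other (00:05:59 and 00:06:00) and being disabled at 00:07:30 and 00:07:42, with ata3 and ata4 following about two minutes later (frozen at 00:08:12 and 00:08:42, disabled at 00:09:43 and 00:10:24).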

At first the disks were REALLY unstable, but then I disabled S.M.A.R.T.
(both in the BIOS and in Linux) and updated from the CentOS 5 (equivalent
of RHEL5) kernel (2.6.18) to the latest (at that time) official kernel
from kernel.org:

 > uname -a
Linux mirakel 2.6.22.5-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007 i686 i686 i386 GNU/Linux

Now it normally takes a day or two before the SATA disks fail, so things
are better, but the setup is still rather useless.

The first error when sata_nv gets into trouble is always:
"exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
(as shown in the attached log file). When this happens to one device, it
almost instantly happens to the other disk attached to the same
controller as well. A couple of minutes later, the disk(s) connected to
the other controller start acting up in the same manner. I/O freezes,
and nothing helps except a reboot...
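The `cmd` line that follows each "exception ... frozen" message is libata's dump of the ATA taskfile that timed out. The sketch below decodes it, assuming the field order printed by libata's error handler (drivers/ata/libata-eh.c), `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`, and covering only the opcodes that appear in this log; `decode_cmd` and the opcode table are illustrative, not a kernel API.

```python
import re

# Opcodes seen in the attached log (ATA/ATAPI command set).
ATA_OPCODES = {
    0x27: "READ NATIVE MAX ADDRESS EXT",
    0x35: "WRITE DMA EXT",
    0xC8: "READ DMA",
    0xEA: "FLUSH CACHE EXT",
}
# Of those, the 48-bit (EXT) commands whose LBA also uses the hob_* registers.
LBA48_OPCODES = {0x27, 0x35, 0xEA}

_BYTE5 = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
# Assumed libata layout:
#   cmd command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
CMD_RE = re.compile(r"cmd ([0-9a-f]{2})/" + _BYTE5 + "/" + _BYTE5 + r"/([0-9a-f]{2})")


def decode_cmd(line):
    """Decode the opcode, sector-count field and starting LBA of a libata 'cmd' dump."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata 'cmd' taskfile dump in line")
    command = int(m.group(1), 16)
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in m.group(3).split(":")
    )
    device = int(m.group(4), 16)

    if command in LBA48_OPCODES:
        # 48-bit addressing: the "high order byte" registers hold LBA bits 24..47.
        lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
               | (lbah << 16) | (lbam << 8) | lbal)
        sectors = (hob_nsect << 8) | nsect
    else:
        # 28-bit addressing: LBA bits 24..27 sit in the low nibble of the device register.
        lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
        sectors = nsect
    return {
        "opcode": command,
        "name": ATA_OPCODES.get(command, "unknown"),
        "lba": lba,
        "sectors": sectors,  # raw count field; 0 has a special meaning for data commands
    }
```

For the first ata1 line this gives WRITE DMA EXT of 8 sectors (4096 bytes) at LBA 488407879, the same sector that later appears in `end_request: I/O error, dev sda, sector 488407879`; the ata2 line decodes to READ DMA at LBA 141520599, matching the first sdb error. In other words, the commands that time out are ordinary reads and writes, and the `qc timeout (cmd 0x27)` lines after the reset are READ NATIVE MAX ADDRESS EXT.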

As I run a rather large software (md) RAID-5 array on this server (I do
a bit of video editing), every crash means a time-consuming rebuild of
the array...
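The rebuild cost follows from how RAID-5 stores redundancy: each stripe carries one parity chunk equal to the XOR of its data chunks, so restoring a kicked member means reading every surviving member in full. The attached log's `RAID5 conf printout` shows `rd:8 wd:4`, which presumably means 8 member disks with only 4 still working, well beyond the single missing member RAID-5 can tolerate, and consistent with the ext3 journal on md0 aborting right afterwards. The sketch below is a simplified model (plain byte strings, no parity rotation or md metadata); the function names are hypothetical, not md's code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_chunks):
    """Parity chunk of one stripe: XOR of all its data chunks."""
    return xor_blocks(data_chunks)


def rebuild_missing(surviving_chunks):
    """Recover the one missing chunk of a stripe (data or parity): since the XOR
    of all chunks in a stripe is zero, it equals the XOR of all the survivors."""
    return xor_blocks(surviving_chunks)


def raid5_can_run(raid_disks, working_disks):
    """RAID-5 keeps running (degraded) with at most one member missing."""
    return working_disks >= raid_disks - 1
```

With the numbers from the log, `raid5_can_run(8, 7)` is true (one SATA disk kicked, array degraded) while `raid5_can_run(8, 4)` is false, which is where md0 ended up once all four disks on the MCP51 ports had been disabled.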

I have given up on the sata_nv / NVIDIA controllers for the time being.
I now resort to some old PCI SATA controllers, which work fine (but are
slow, as they are outdated and "overloaded").

So, if anyone has a good solution, suggestion, or improved driver (over
the one supplied with the official 2.6.22.5 kernel), I am eager to give
it a go and see if the situation can be resolved.

I appreciate any sensible suggestions.

BR
Jon Ivar

-----

[-- Attachment #2: sata_nv-error.log --]
[-- Type: text/plain, Size: 17111 bytes --]

Sep  8 00:05:59 mirakel kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  8 00:05:59 mirakel kernel: ata1.00: cmd 35/00:08:47:83:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep  8 00:05:59 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  8 00:05:59 mirakel kernel: ata1: soft resetting port
Sep  8 00:05:59 mirakel kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:06:00 mirakel kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  8 00:06:00 mirakel kernel: ata2.00: cmd c8/00:08:d7:6e:6f/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 in
Sep  8 00:06:00 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  8 00:06:00 mirakel kernel: ata2: soft resetting port
Sep  8 00:06:01 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:06:30 mirakel kernel: ata1.00: qc timeout (cmd 0x27)
Sep  8 00:06:30 mirakel kernel: ata1.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:06:30 mirakel kernel: ata1.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:06:30 mirakel kernel: ata1: failed to recover some devices, retrying in 5 secs
Sep  8 00:06:31 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep  8 00:06:31 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:06:31 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:06:31 mirakel kernel: ata2: failed to recover some devices, retrying in 5 secs
Sep  8 00:06:35 mirakel kernel: ata1: hard resetting port
Sep  8 00:06:35 mirakel kernel: ata1: SRST failed (errno=-19)
Sep  8 00:06:35 mirakel kernel: ata1: reset failed (errno=-19), retrying in 10 secs
Sep  8 00:06:36 mirakel kernel: ata2: hard resetting port
Sep  8 00:06:36 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:06:45 mirakel kernel: ata1: hard resetting port
Sep  8 00:06:45 mirakel kernel: ata1: SRST failed (errno=-19)
Sep  8 00:06:45 mirakel kernel: ata1: reset failed (errno=-19), retrying in 10 secs
Sep  8 00:06:55 mirakel kernel: ata1: hard resetting port
Sep  8 00:06:55 mirakel kernel: ata1: SRST failed (errno=-19)
Sep  8 00:06:55 mirakel kernel: ata1: reset failed (errno=-19), retrying in 35 secs
Sep  8 00:07:06 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep  8 00:07:06 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:07:06 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:07:06 mirakel kernel: ata2.00: limiting speed to UDMA/133:PIO3
Sep  8 00:07:06 mirakel kernel: ata2: failed to recover some devices, retrying in 5 secs
Sep  8 00:07:11 mirakel kernel: ata2: hard resetting port
Sep  8 00:07:12 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:07:30 mirakel kernel: ata1: hard resetting port
Sep  8 00:07:30 mirakel kernel: ata1: SRST failed (errno=-19)
Sep  8 00:07:30 mirakel kernel: ata1: reset failed, giving up
Sep  8 00:07:30 mirakel kernel: ata1.00: disabled
Sep  8 00:07:30 mirakel kernel: ata1: EH complete
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 488407879
Sep  8 00:07:30 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep  8 00:07:30 mirakel kernel: raid5: Disk failure on dm-0, disabling device. Operation continuing on 7 devices
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 141263543
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 4560055
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] READ CAPACITY failed
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Sense not available.
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Asking for cache data failed
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Assuming drive cache: write through
Sep  8 00:07:42 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep  8 00:07:42 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:07:42 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:07:42 mirakel kernel: ata2.00: disabled
Sep  8 00:07:42 mirakel kernel: ata2: EH complete
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 141520599
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 141671879
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 488407879
Sep  8 00:07:42 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep  8 00:07:42 mirakel kernel: raid5: Disk failure on dm-1, disabling device. Operation continuing on 6 devices
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] READ CAPACITY failed
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Sense not available.
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Write Protect is off
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Asking for cache data failed
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Assuming drive cache: write through
Sep  8 00:08:12 mirakel kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  8 00:08:12 mirakel kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0 
Sep  8 00:08:12 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  8 00:08:13 mirakel kernel: ata3: soft resetting port
Sep  8 00:08:13 mirakel kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:08:42 mirakel kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  8 00:08:42 mirakel kernel: ata4.00: cmd 35/00:08:bf:44:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep  8 00:08:42 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  8 00:08:43 mirakel kernel: ata4: soft resetting port
Sep  8 00:08:43 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:08:43 mirakel kernel: ata3.00: qc timeout (cmd 0x27)
Sep  8 00:08:43 mirakel kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:08:43 mirakel kernel: ata3.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:08:43 mirakel kernel: ata3: failed to recover some devices, retrying in 5 secs
Sep  8 00:08:48 mirakel kernel: ata3: hard resetting port
Sep  8 00:08:48 mirakel kernel: ata3: SRST failed (errno=-19)
Sep  8 00:08:48 mirakel kernel: ata3: reset failed (errno=-19), retrying in 10 secs
Sep  8 00:08:58 mirakel kernel: ata3: hard resetting port
Sep  8 00:08:58 mirakel kernel: ata3: SRST failed (errno=-19)
Sep  8 00:08:58 mirakel kernel: ata3: reset failed (errno=-19), retrying in 10 secs
Sep  8 00:09:08 mirakel kernel: ata3: hard resetting port
Sep  8 00:09:08 mirakel kernel: ata3: SRST failed (errno=-19)
Sep  8 00:09:08 mirakel kernel: ata3: reset failed (errno=-19), retrying in 35 secs
Sep  8 00:09:13 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep  8 00:09:13 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:09:13 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:09:13 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep  8 00:09:18 mirakel kernel: ata4: hard resetting port
Sep  8 00:09:18 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:09:43 mirakel kernel: ata3: hard resetting port
Sep  8 00:09:43 mirakel kernel: ata3: SRST failed (errno=-19)
Sep  8 00:09:43 mirakel kernel: ata3: reset failed, giving up
Sep  8 00:09:43 mirakel kernel: ata3.00: disabled
Sep  8 00:09:43 mirakel kernel: ata3: EH complete
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:09:43 mirakel kernel: end_request: I/O error, dev sdc, sector 488391871
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] READ CAPACITY failed
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Sense not available.
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Write Protect is off
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Asking for cache data failed
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Assuming drive cache: write through
Sep  8 00:09:43 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep  8 00:09:43 mirakel kernel: raid5: Disk failure on sdc1, disabling device. Operation continuing on 5 devices
Sep  8 00:09:48 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep  8 00:09:48 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:09:48 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:09:48 mirakel kernel: ata4.00: limiting speed to UDMA/133:PIO3
Sep  8 00:09:48 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep  8 00:09:53 mirakel kernel: ata4: hard resetting port
Sep  8 00:09:54 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:10:24 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep  8 00:10:24 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:10:24 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:10:24 mirakel kernel: ata4.00: disabled
Sep  8 00:10:25 mirakel kernel: ata4: EH complete
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:10:25 mirakel kernel: end_request: I/O error, dev sdd, sector 488391871
Sep  8 00:10:25 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep  8 00:10:25 mirakel kernel: raid5: Disk failure on sdd1, disabling device. Operation continuing on 4 devices
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] READ CAPACITY failed
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Sense not available.
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Write Protect is off
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Asking for cache data failed
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Assuming drive cache: write through
Sep  8 00:10:25 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:25 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716576
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:25 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716499
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716500
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716501
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 6175
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: Aborting journal on device md0.
Sep  8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
Sep  8 00:10:25 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:25 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:25 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:25 mirakel kernel:  disk 5, o:0, dev:sdc1
Sep  8 00:10:25 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:25 mirakel kernel:  disk 7, o:0, dev:sdd1
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 0
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_dirty_inode: Journal has aborted
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 0
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_free_blocks_sb: Journal has aborted
Sep  8 00:10:26 mirakel kernel: ext3_abort called.
Sep  8 00:10:26 mirakel kernel: EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Sep  8 00:10:26 mirakel kernel: Remounting filesystem read-only
Sep  8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123686376
Sep  8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123689709
Sep  8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123689744
Sep  8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:26 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:26 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:26 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:26 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:26 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:26 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:26 mirakel kernel:  disk 5, o:0, dev:sdc1
Sep  8 00:10:26 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:26 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:26 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:26 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:26 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:26 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:26 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:26 mirakel kernel:  disk 5, o:0, dev:sdc1
Sep  8 00:10:26 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:26 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:26 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:26 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:26 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:26 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:27 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:27 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:27 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:27 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:27 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:27 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:27 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:27 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:27 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:27 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:27 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:27 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:27 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:27 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:27 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:27 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:27 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:27 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:27 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: EXT3-fs error (device md0): ext3_readdir: directory #126337 contains a hole at offset 4096


* sata_nv issues with MCP51 SATA controller
@ 2007-09-13  7:46 Jon Ivar Rykkelid
  2007-09-13 14:20 ` Jeff Garzik
  0 siblings, 1 reply; 25+ messages in thread
From: Jon Ivar Rykkelid @ 2007-09-13  7:46 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2290 bytes --]


Hi, I'm resending (I didn't see my first attempt appear on the mailing list):



I'm having serious disk issues when using the on-board NVIDIA SATA
controller for my HDDs (my motherboard is a Gigabyte GA-N650SLI-DS4 with
an NVIDIA chipset; the CPU is an Intel Core 2 Quad).

excerpt from "lspci":
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)

I have a normal IDE (PATA) disk attached to the IDE controller, and that
works fine (/dev/hda).

However, any disks connected to the SATA controllers (I have tried 2 and
4) eventually fail; see the attached log (the relevant excerpt from
/var/log/messages).

At first the disks were REALLY unstable, but then I disabled S.M.A.R.T.
(both in the BIOS and in Linux) and updated from the CentOS 5 (equivalent
of RHEL5) kernel (2.6.18) to the latest (at that time) official kernel
from kernel.org:

  > uname -a
Linux mirakel 2.6.22.5-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007 i686 i686 i386 GNU/Linux

Now it normally takes a day or two before the SATA disks fail, so things
are better, but the setup is still rather useless.

The first error when sata_nv gets into trouble is always:
"exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
(as shown in the attached log file). When this happens to one device, it
almost instantly happens to the other disk attached to the same
controller as well. A couple of minutes later, the disk(s) connected to
the other controller start acting up in the same manner. I/O freezes,
and nothing helps except a reboot...

As I run a rather large software (md) RAID-5 array on this server (I do
a bit of video editing), every crash means a time-consuming rebuild of
the array...

I have given up on the sata_nv / NVIDIA controllers for the time being.
I now resort to some old PCI SATA controllers, which work fine (but are
slow, as they are outdated and "overloaded").

So, if anyone has a good solution, suggestion, or improved driver (over
the one supplied with the official 2.6.22.5 kernel), I am eager to give
it a go and see if the situation can be resolved.

I appreciate any sensible suggestions.

BR
Jon Ivar


[-- Attachment #2: sata_nv-error.log --]
[-- Type: text/plain, Size: 17112 bytes --]

Sep  8 00:05:59 mirakel kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  8 00:05:59 mirakel kernel: ata1.00: cmd 35/00:08:47:83:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep  8 00:05:59 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  8 00:05:59 mirakel kernel: ata1: soft resetting port
Sep  8 00:05:59 mirakel kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:06:00 mirakel kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  8 00:06:00 mirakel kernel: ata2.00: cmd c8/00:08:d7:6e:6f/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 in
Sep  8 00:06:00 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  8 00:06:00 mirakel kernel: ata2: soft resetting port
Sep  8 00:06:01 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:06:30 mirakel kernel: ata1.00: qc timeout (cmd 0x27)
Sep  8 00:06:30 mirakel kernel: ata1.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:06:30 mirakel kernel: ata1.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:06:30 mirakel kernel: ata1: failed to recover some devices, retrying in 5 secs
Sep  8 00:06:31 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep  8 00:06:31 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:06:31 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:06:31 mirakel kernel: ata2: failed to recover some devices, retrying in 5 secs
Sep  8 00:06:35 mirakel kernel: ata1: hard resetting port
Sep  8 00:06:35 mirakel kernel: ata1: SRST failed (errno=-19)
Sep  8 00:06:35 mirakel kernel: ata1: reset failed (errno=-19), retrying in 10 secs
Sep  8 00:06:36 mirakel kernel: ata2: hard resetting port
Sep  8 00:06:36 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:06:45 mirakel kernel: ata1: hard resetting port
Sep  8 00:06:45 mirakel kernel: ata1: SRST failed (errno=-19)
Sep  8 00:06:45 mirakel kernel: ata1: reset failed (errno=-19), retrying in 10 secs
Sep  8 00:06:55 mirakel kernel: ata1: hard resetting port
Sep  8 00:06:55 mirakel kernel: ata1: SRST failed (errno=-19)
Sep  8 00:06:55 mirakel kernel: ata1: reset failed (errno=-19), retrying in 35 secs
Sep  8 00:07:06 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep  8 00:07:06 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:07:06 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:07:06 mirakel kernel: ata2.00: limiting speed to UDMA/133:PIO3
Sep  8 00:07:06 mirakel kernel: ata2: failed to recover some devices, retrying in 5 secs
Sep  8 00:07:11 mirakel kernel: ata2: hard resetting port
Sep  8 00:07:12 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:07:30 mirakel kernel: ata1: hard resetting port
Sep  8 00:07:30 mirakel kernel: ata1: SRST failed (errno=-19)
Sep  8 00:07:30 mirakel kernel: ata1: reset failed, giving up
Sep  8 00:07:30 mirakel kernel: ata1.00: disabled
Sep  8 00:07:30 mirakel kernel: ata1: EH complete
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 488407879
Sep  8 00:07:30 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep  8 00:07:30 mirakel kernel: raid5: Disk failure on dm-0, disabling device. Operation continuing on 7 devices
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 141263543
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 4560055
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] READ CAPACITY failed
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Sense not available.
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Asking for cache data failed
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Assuming drive cache: write through
Sep  8 00:07:42 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep  8 00:07:42 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:07:42 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:07:42 mirakel kernel: ata2.00: disabled
Sep  8 00:07:42 mirakel kernel: ata2: EH complete
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 141520599
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 141671879
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 488407879
Sep  8 00:07:42 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep  8 00:07:42 mirakel kernel: raid5: Disk failure on dm-1, disabling device. Operation continuing on 6 devices
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] READ CAPACITY failed
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Sense not available.
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Write Protect is off
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Asking for cache data failed
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Assuming drive cache: write through
Sep  8 00:08:12 mirakel kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  8 00:08:12 mirakel kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0 
Sep  8 00:08:12 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  8 00:08:13 mirakel kernel: ata3: soft resetting port
Sep  8 00:08:13 mirakel kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:08:42 mirakel kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  8 00:08:42 mirakel kernel: ata4.00: cmd 35/00:08:bf:44:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep  8 00:08:42 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  8 00:08:43 mirakel kernel: ata4: soft resetting port
Sep  8 00:08:43 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:08:43 mirakel kernel: ata3.00: qc timeout (cmd 0x27)
Sep  8 00:08:43 mirakel kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:08:43 mirakel kernel: ata3.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:08:43 mirakel kernel: ata3: failed to recover some devices, retrying in 5 secs
Sep  8 00:08:48 mirakel kernel: ata3: hard resetting port
Sep  8 00:08:48 mirakel kernel: ata3: SRST failed (errno=-19)
Sep  8 00:08:48 mirakel kernel: ata3: reset failed (errno=-19), retrying in 10 secs
Sep  8 00:08:58 mirakel kernel: ata3: hard resetting port
Sep  8 00:08:58 mirakel kernel: ata3: SRST failed (errno=-19)
Sep  8 00:08:58 mirakel kernel: ata3: reset failed (errno=-19), retrying in 10 secs
Sep  8 00:09:08 mirakel kernel: ata3: hard resetting port
Sep  8 00:09:08 mirakel kernel: ata3: SRST failed (errno=-19)
Sep  8 00:09:08 mirakel kernel: ata3: reset failed (errno=-19), retrying in 35 secs
Sep  8 00:09:13 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep  8 00:09:13 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:09:13 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:09:13 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep  8 00:09:18 mirakel kernel: ata4: hard resetting port
Sep  8 00:09:18 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:09:43 mirakel kernel: ata3: hard resetting port
Sep  8 00:09:43 mirakel kernel: ata3: SRST failed (errno=-19)
Sep  8 00:09:43 mirakel kernel: ata3: reset failed, giving up
Sep  8 00:09:43 mirakel kernel: ata3.00: disabled
Sep  8 00:09:43 mirakel kernel: ata3: EH complete
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:09:43 mirakel kernel: end_request: I/O error, dev sdc, sector 488391871
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] READ CAPACITY failed
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Sense not available.
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Write Protect is off
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Asking for cache data failed
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Assuming drive cache: write through
Sep  8 00:09:43 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep  8 00:09:43 mirakel kernel: raid5: Disk failure on sdc1, disabling device. Operation continuing on 5 devices
Sep  8 00:09:48 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep  8 00:09:48 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:09:48 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:09:48 mirakel kernel: ata4.00: limiting speed to UDMA/133:PIO3
Sep  8 00:09:48 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep  8 00:09:53 mirakel kernel: ata4: hard resetting port
Sep  8 00:09:54 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:10:24 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep  8 00:10:24 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:10:24 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:10:24 mirakel kernel: ata4.00: disabled
Sep  8 00:10:25 mirakel kernel: ata4: EH complete
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:10:25 mirakel kernel: end_request: I/O error, dev sdd, sector 488391871
Sep  8 00:10:25 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep  8 00:10:25 mirakel kernel: raid5: Disk failure on sdd1, disabling device. Operation continuing on 4 devices
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] READ CAPACITY failed
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Sense not available.
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Write Protect is off
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Asking for cache data failed
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Assuming drive cache: write through
Sep  8 00:10:25 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:25 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716576
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:25 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716499
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716500
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716501
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 6175
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: Aborting journal on device md0.
Sep  8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
Sep  8 00:10:25 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:25 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:25 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:25 mirakel kernel:  disk 5, o:0, dev:sdc1
Sep  8 00:10:25 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:25 mirakel kernel:  disk 7, o:0, dev:sdd1
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 0
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_dirty_inode: Journal has aborted
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 0
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_free_blocks_sb: Journal has aborted
Sep  8 00:10:26 mirakel kernel: ext3_abort called.
Sep  8 00:10:26 mirakel kernel: EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Sep  8 00:10:26 mirakel kernel: Remounting filesystem read-only
Sep  8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123686376
Sep  8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123689709
Sep  8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123689744
Sep  8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:26 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:26 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:26 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:26 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:26 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:26 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:26 mirakel kernel:  disk 5, o:0, dev:sdc1
Sep  8 00:10:26 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:26 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:26 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:26 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:26 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:26 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:26 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:26 mirakel kernel:  disk 5, o:0, dev:sdc1
Sep  8 00:10:26 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:26 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:26 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:26 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:26 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:26 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:27 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:27 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:27 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:27 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:27 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:27 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:27 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:27 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:27 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:27 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:27 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:27 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:27 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:27 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:27 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:27 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:27 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:27 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:27 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: EXT3-fs error (device md0): ext3_readdir: directory #126337 contains a hole at offset 4096


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: sata_nv issues with MCP51 SATA controller
  2007-09-13  7:18 sata_nv issues with MCP51 SATA controller Jon Ivar Rykkelid
@ 2007-09-13  9:16 ` Tejun Heo
  0 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2007-09-13  9:16 UTC (permalink / raw)
  To: Jon Ivar Rykkelid; +Cc: linux-kernel, linux-ide, Robert Hancock

Jon Ivar Rykkelid wrote:
> I'm having serious disk-issues when using the on-board nvidia controller
> for my HDDs (My motherboard is a Gigabyte GA-N650SLI-DS4 with nvidia
> chipset, cpu is intel Core2Quad)
> 
> excerpt from "lspci":
> 00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
> 00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
> (rev a1)
> 00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
> (rev a1)
> 
> I have a normal IDE/P-ATA-disk attached to the "IDE"-controller and that
> works fine (/dev/hda)
> 
> However, any number of disks (I have tried 2 and 4) connected to the
> SATA-controller(s), will eventually fail. - See attached log (excerpt /
> anything relevant from /var/log/messages)
> 
> At first, disks were REALLY unstable, but then I disabled S.M.A.R.T.
> (both in BIOS and Linux), and I updated from the CentOS5 (equivalent of
> RHEL5) kernel (2.6.18) to the latest (at that time) official kernel from
> kernel.org:
> 
>> uname -a
> Linux mirakel 2.6.22.5-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007
> i686 i686 i386 GNU/Linux
> 
> Now it will normally take a day or two before SATA crashes, so things
> are better, but still rather useless.
> 
> The first error when sata_nv gets into problems is always:
> "exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
> (as shown in the attached log-file.) - when this happens to one device,
> it'll almost instantly happen to the other disk attached to that
> controller as well. A couple of minutes (or so) later, the disk(s)
> connected to the other controller will start acting up as well (in the
> same manner). - I/O freezes, and nothing helps except a reboot...
> 
> As I run a rather large (software / md) RAID-5 disk array on this server
> (I'm doing a bit of video editing), every crash means a time-consuming
> rebuild of the disk-array...
> 
> I have given up on the sata_nv / nvidia-controllers for the time being.
> I now resort to some old PCI-connected sata-controllers which work fine
> (but slow, as they are outdated and "overloaded").
> 
> So, if anyone has a good solution / suggestion / improved driver (over
> the one supplied with the official 2.6.22.5-kernel) I am eager to give
> it a go and see if the situation can be resolved.
> 
> I appreciate any sensible suggestions.

Wheeee... the whole controller seems to have gone down at once and it's
not even an IRQ routing problem - resets are failing.  This is the first
time I've seen something like this.  Sorry, but I don't have any idea what's
going on.  cc'ing Robert.  Any ideas?
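The numeric values in these messages are easier to read once decoded. The sketch below is a hypothetical helper, not part of libata. It assumes the error-mask bit names from the ata_completion_errors enum in include/linux/libata.h of the 2.6.2x era and takes the opcode names from the ATA command set, so check both against the kernel tree and spec actually in use. Under those assumptions it maps errno=-19 to ENODEV, "Emask 0x4 (timeout)" to the timeout bit, and the "cmd 0x27" in the qc timeouts to READ NATIVE MAX ADDRESS EXT.

```python
import errno

# libata error-mask bits, assumed from the ata_completion_errors enum in
# include/linux/libata.h (2.6.2x era); verify against the tree in use.
AC_ERR_BITS = {
    0x001: "AC_ERR_DEV",        # device reported error
    0x002: "AC_ERR_HSM",        # host state machine violation
    0x004: "AC_ERR_TIMEOUT",    # timeout
    0x008: "AC_ERR_MEDIA",      # media error
    0x010: "AC_ERR_ATA_BUS",    # ATA bus error
    0x020: "AC_ERR_HOST_BUS",   # host bus error
    0x040: "AC_ERR_SYSTEM",     # system error
    0x080: "AC_ERR_INVALID",    # invalid argument
    0x100: "AC_ERR_OTHER",      # unknown
}

# ATA opcodes that appear in this log ("cmd XX/..." and "cmd 0xXX").
ATA_OPCODES = {
    0x27: "READ NATIVE MAX ADDRESS EXT",
    0x35: "WRITE DMA EXT",
    0xC8: "READ DMA",
    0xEA: "FLUSH CACHE EXT",
}


def decode_err_mask(mask: int) -> list[str]:
    """Return the names of the bits set in an Emask/err_mask value."""
    return [name for bit, name in sorted(AC_ERR_BITS.items()) if mask & bit]


def decode_errno(value: int) -> str:
    """Map a negative kernel errno (e.g. -19) to its symbolic name."""
    return errno.errorcode.get(-value, f"unknown errno {value}")


if __name__ == "__main__":
    print(decode_errno(-19))       # from "SRST failed (errno=-19)"
    print(decode_err_mask(0x4))    # from "Emask 0x4 (timeout)"
    print(decode_err_mask(0x40))   # from "failed to set xfermode (err_mask=0x40)"
    print(ATA_OPCODES[0x27])       # from "qc timeout (cmd 0x27)"
```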

-- 
tejun

* Re: sata_nv issues with MCP51 SATA controller
  2007-09-13  7:46 Jon Ivar Rykkelid
@ 2007-09-13 14:20 ` Jeff Garzik
  2007-09-13 15:05   ` Jon Ivar Rykkelid
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Garzik @ 2007-09-13 14:20 UTC (permalink / raw)
  To: Jon Ivar Rykkelid; +Cc: linux-kernel

Jon Ivar Rykkelid wrote:
> 
> Hi, I'm resending (didn't see my first attempt appear on the mailing list):
> 
> 
> 
> I'm having serious disk-issues when using the on-board nvidia controller
> for my HDDs (My motherboard is a Gigabyte GA-N650SLI-DS4 with nvidia
> chipset, cpu is intel Core2Quad)
> 
> excerpt from "lspci":
> 00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
> 00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
> (rev a1)
> 00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
> (rev a1)
> 
> I have a normal IDE/P-ATA-disk attached to the "IDE"-controller and that
> works fine (/dev/hda)
> 
> However, any number of disks (I have tried 2 and 4) connected to the
> SATA-controller(s), will eventually fail. - See attached log (excerpt /
> anything relevant from /var/log/messages)
> 
> At first, disks were REALLY unstable, but then I disabled S.M.A.R.T.
> (both in BIOS and Linux), and I updated from the CentOS5 (equivalent of
> RHEL5) kernel (2.6.18) to the latest (at that time) official kernel from
> kernel.org:
> 
>  > uname -a
> Linux mirakel 2.6.22.5-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007
> i686 i686 i386 GNU/Linux
> 
> Now it will normally take a day or two before SATA crashes, so things
> are better, but still rather useless.
> 
> The first error when sata_nv gets into problems is always:
> "exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
> (as shown in the attached log-file.) - when this happens to one device,
> it'll almost instantly happen to the other disk attached to that
> controller as well. A couple of minutes (or so) later, the disk(s)
> connected to the other controller will start acting up as well (in the
> same manner). - I/O freezes, and nothing helps except a reboot...
> 
> As I run a rather large (software / md) RAID-5 disk array on this server
> (I'm doing a bit of video editing), every crash means a time-consuming
> rebuild of the disk-array...
> 
> I have given up on the sata_nv / nvidia-controllers for the time being.
> I now resort to some old PCI-connected sata-controllers which work fine
> (but slow, as they are outdated and "overloaded").
> 
> So, if anyone has a good solution / suggestion / improved driver (over
> the one supplied with the official 2.6.22.5-kernel) I am eager to give
> it a go and see if the situation can be resolved.

does adma=0 module option do anything?

	Jeff




* Re: sata_nv issues with MCP51 SATA controller
  2007-09-13 14:20 ` Jeff Garzik
@ 2007-09-13 15:05   ` Jon Ivar Rykkelid
  2007-09-13 15:14     ` Tejun Heo
  0 siblings, 1 reply; 25+ messages in thread
From: Jon Ivar Rykkelid @ 2007-09-13 15:05 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, Tejun Heo

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>>
>> Hi, I'm resending (didn't see my first attempt appear on the mailing list):
>>
>>
>>
>> I'm having serious disk-issues when using the on-board nvidia controller
>> for my HDDs (My motherboard is a Gigabyte GA-N650SLI-DS4 with nvidia
>> chipset, cpu is intel Core2Quad)
>>
>> excerpt from "lspci":
>> 00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
>> 00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
>> (rev a1)
>> 00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
>> (rev a1)
>>
>> I have a normal IDE/P-ATA-disk attached to the "IDE"-controller and that
>> works fine (/dev/hda)
>>
>> However, any number of disks (I have tried 2 and 4) connected to the
>> SATA-controller(s), will eventually fail. - See attached log (excerpt /
>> anything relevant from /var/log/messages)
>>
>> At first, disks were REALLY unstable, but then I disabled S.M.A.R.T.
>> (both in BIOS and Linux), and I updated from the CentOS5 (equivalent of
>> RHEL5) kernel (2.6.18) to the latest (at that time) official kernel from
>> kernel.org:
>>
>>  > uname -a
>> Linux mirakel 2.6.22.5-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007
>> i686 i686 i386 GNU/Linux
>>
>> Now it will normally take a day or two before SATA crashes, so things
>> are better, but still rather useless.
>>
>> The first error when sata_nv gets into problems is always:
>> "exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
>> (as shown in the attached log-file.) - when this happens to one device,
>> it'll almost instantly happen to the other disk attached to that
>> controller as well. A couple of minutes (or so) later, the disk(s)
>> connected to the other controller will start acting up as well (in the
>> same manner). - I/O freezes, and nothing helps except a reboot...
>>
>> As I run a rather large (software / md) RAID-5 disk array on this server
>> (I'm doing a bit of video editing), every crash means a time-consuming
>> rebuild of the disk-array...
>>
>> I have given up on the sata_nv / nvidia-controllers for the time being.
>> I now resort to some old PCI-connected sata-controllers which work fine
>> (but slow, as they are outdated and "overloaded").
>>
>> So, if anyone has a good solution / suggestion / improved driver (over
>> the one supplied with the official 2.6.22.5-kernel) I am eager to give
>> it a go and see if the situation can be resolved.
>
> does adma=0 module option do anything?
>
>     Jeff
Thanks for the suggestion, but sata_nv is not built modular in my 
current kernel, so "no can do" at the moment
(However, if some expert REALLY thinks this will fix things, I will 
CERTAINLY recompile and give it a go)

As I said before, it all works for some time (a day or two) before it 
crashes with the current kernel & no "S.M.A.R.T.". With my current setup 
I have always had the time to fully rebuild my disk-array before a new 
crash. - In the case of 4 disks attached to the nvidia controllers 
(disregarding the disks on other controllers), this means that the 
sata_nv-driver / controllers alone have read at least 750GB and written 
250GB of data before the crash (with no resets working) - soft reboot 
fixes everything. -  I'm pretty confident that this is a driver issue.

As Tejun Heo <htejun@gmail.com> writes, "the whole controller seems to 
have gone down at once and it's not even an IRQ routing problem - resets 
are failing."

The error-messages / crash-symptoms were the same with SMART enabled and 
the original CentOS5-kernel, except that with that setup, the crashes 
were much more frequent.

Any help?

BR
Jon Ivar


* Re: sata_nv issues with MCP51 SATA controller
  2007-09-13 15:05   ` Jon Ivar Rykkelid
@ 2007-09-13 15:14     ` Tejun Heo
  2007-09-13 18:01       ` Jon Ivar Rykkelid
  0 siblings, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2007-09-13 15:14 UTC (permalink / raw)
  To: Jon Ivar Rykkelid; +Cc: Jeff Garzik, linux-kernel

Jon Ivar Rykkelid wrote:
> Thanks for the suggestion, but sata_nv is not built modular in my
> current kernel, so "no can do" at the moment
> (However, if some expert REALLY thinks this will fix things, I will
> CERTAINLY recompile and give it a go)

Passing "sata_nv.adma=0" as kernel boot parameter will do the trick.
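For a driver built into the kernel, a module parameter is given on the kernel command line as module.parameter=value, which is the form sata_nv.adma=0 takes. A small sketch (hypothetical helpers, not a kernel or distribution tool) that picks such settings out of a command line string, for example the contents of /proc/cmdline. It ignores quoting and bare flags without '='.

```python
def module_params(cmdline: str) -> dict[str, dict[str, str]]:
    """Collect 'module.param=value' tokens from a kernel command line.

    Tokens without a dot before '=' (e.g. 'root=/dev/hda1') are ordinary
    kernel parameters and are skipped, as are tokens with no '=' at all.
    """
    params: dict[str, dict[str, str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if not sep or "." not in key:
            continue
        module, _, name = key.partition(".")
        params.setdefault(module, {})[name] = value
    return params


def current_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # Hypothetical command line; the root device is made up for illustration.
    print(module_params("ro root=/dev/hda1 sata_nv.adma=0"))
    # {'sata_nv': {'adma': '0'}}
```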

-- 
tejun

* Re: sata_nv issues with MCP51 SATA controller
  2007-09-13 15:14     ` Tejun Heo
@ 2007-09-13 18:01       ` Jon Ivar Rykkelid
  2007-09-13 19:26         ` Jon Ivar Rykkelid
  2007-09-14 13:29         ` Prakash Punnoor
  0 siblings, 2 replies; 25+ messages in thread
From: Jon Ivar Rykkelid @ 2007-09-13 18:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jeff Garzik, linux-kernel, Robert Hancock

Resending, as my first attempts contained HTML and were blocked...

Tejun Heo wrote:
> Jon Ivar Rykkelid wrote:
>   
>> Thanks for the suggestion, but sata_nv is not built modular in my
>> current kernel, so "no can do" at the moment
>> (However, if some expert REALLY thinks this will fix things, I will
>> CERTAINLY recompile and give it a go)
>>     
>
> Passing "sata_nv.adma=0" as kernel boot parameter will do the trick.
>
>   
Ahh, silly me... Of course!
Ooops, I just got back, and verified: I actually have sata_nv running as 
a module after all on this server... My bad.
I fixed /etc/modprobe.conf to include the following two lines:
"
alias scsi_hostadapter sata_nv
options sata_nv adma=0
...
"

I then ran "mkinitrd" so that the latest options from modprobe.conf are 
included in the initrd image that I load at boot.

- Do you guys think this is worth a try? Anyway, I have rebooted now, so 
I'll test it for a few days and let you know - We'll just have to wait 
and see...
Do you think I should re-enable SMART to provoke a failure, or would 
that be tempting fate too much? (For now I have not re-enabled SMART)

PS: Is there any way of testing / verifying that sata_nv is now running 
with this option? - I am pretty sure I have done it correctly, but I 
would still like to confirm that the proper option has been passed if 
possible.
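One way to check is to read the parameter back from sysfs. A minimal sketch, assuming that the sata_nv in this kernel exports adma as a readable parameter under /sys/module/sata_nv/parameters/ and that boolean parameters read back as Y or N. sata_nv_adma_enabled is a hypothetical helper name, and it returns None when the file is not there (driver not loaded, or parameter not exported).

```python
from pathlib import Path
from typing import Optional


def sata_nv_adma_enabled(sysfs_root: str = "/sys/module") -> Optional[bool]:
    """Report the adma setting of the loaded sata_nv driver.

    Returns True/False from the sysfs parameter file, or None if the file
    does not exist.
    """
    param = Path(sysfs_root) / "sata_nv" / "parameters" / "adma"
    try:
        value = param.read_text().strip()
    except FileNotFoundError:
        return None
    # Boolean module parameters are assumed to read back as 'Y' or 'N'.
    return value in ("Y", "1")


if __name__ == "__main__":
    state = sata_nv_adma_enabled()
    if state is None:
        print("sata_nv adma parameter not visible in sysfs")
    else:
        print("ADMA enabled" if state else "ADMA disabled")
```

If that file exists, "cat /sys/module/sata_nv/parameters/adma" from a shell shows the same value.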

Thanks
Jon Ivar


* Re: sata_nv issues with MCP51 SATA controller
  2007-09-13 18:01       ` Jon Ivar Rykkelid
@ 2007-09-13 19:26         ` Jon Ivar Rykkelid
  2007-09-13 19:54           ` Jeff Garzik
  2007-09-14 13:29         ` Prakash Punnoor
  1 sibling, 1 reply; 25+ messages in thread
From: Jon Ivar Rykkelid @ 2007-09-13 19:26 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jeff Garzik, linux-kernel, Robert Hancock

[-- Attachment #1: Type: text/plain, Size: 658 bytes --]

Hi,

I have now tested with the adma=0 option, but if anything the crash came 
sooner than before. The same error messages started coming in, but this 
time the system hung before I could capture the log (I saw the errors, 
and they were the same as before, except that this time the ata3 channel 
was the first to act up). To remind you all what this is about, I have 
reattached the log that I originally captured...
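The order in which the ports went down can be pulled out of the attached log with a short script. A sketch that assumes the syslog line format seen in this log and records, for each ataN port, the time of the first "frozen" exception and the time its device was disabled. failure_timeline is a hypothetical helper name, and the script file name in the usage comment is made up.

```python
import re
import sys
from typing import Iterable

# Matches lines such as
#   "Sep  8 00:05:59 mirakel kernel: ata1.00: exception Emask 0x0 ... frozen"
#   "Sep  8 00:07:30 mirakel kernel: ata1.00: disabled"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+ (?P<time>\d\d:\d\d:\d\d) \S+ kernel: "
    r"(?P<port>ata\d+)\.\d+: (?P<msg>.*)$"
)


def failure_timeline(lines: Iterable[str]) -> dict[str, tuple[str, str]]:
    """Map each ataN port to (first frozen exception, disabled) times.

    A field is '' if that message never appears for the port.
    """
    first_exc: dict[str, str] = {}
    disabled: dict[str, str] = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # port-level lines like "ata1: soft resetting port"
        port, time, msg = m["port"], m["time"], m["msg"]
        if msg.startswith("exception") and msg.endswith("frozen"):
            first_exc.setdefault(port, time)
        elif msg == "disabled":
            disabled.setdefault(port, time)
    ports = sorted(set(first_exc) | set(disabled), key=lambda p: int(p[3:]))
    return {p: (first_exc.get(p, ""), disabled.get(p, "")) for p in ports}


if __name__ == "__main__":
    # Usage: python timeline.py sata_nv-error.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log:
            for port, (start, end) in failure_timeline(log).items():
                print(f"{port}: frozen at {start or '-'}, disabled at {end or '-'}")
```

On the attached log this should show ata1 and ata2 freezing at 00:05:59 and 00:06:00, and ata3 and ata4 following at 00:08:12 and 00:08:42. Assuming ata1/ata2 and ata3/ata4 are the port pairs of the two controllers, that matches the "other controller goes a couple of minutes later" pattern described in the first report.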

Any help / clever suggestions are appreciated.

Jon Ivar Rykkelid wrote:
> I fixed /etc/modprobe.conf to include the following two lines:
> "
> alias scsi_hostadapter sata_nv
> options sata_nv adma=0
> ...
> "

Jon Ivar

[-- Attachment #2: sata_nv-error.log --]
[-- Type: text/plain, Size: 17111 bytes --]

Sep  8 00:05:59 mirakel kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  8 00:05:59 mirakel kernel: ata1.00: cmd 35/00:08:47:83:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep  8 00:05:59 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  8 00:05:59 mirakel kernel: ata1: soft resetting port
Sep  8 00:05:59 mirakel kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:06:00 mirakel kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  8 00:06:00 mirakel kernel: ata2.00: cmd c8/00:08:d7:6e:6f/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 in
Sep  8 00:06:00 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  8 00:06:00 mirakel kernel: ata2: soft resetting port
Sep  8 00:06:01 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:06:30 mirakel kernel: ata1.00: qc timeout (cmd 0x27)
Sep  8 00:06:30 mirakel kernel: ata1.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:06:30 mirakel kernel: ata1.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:06:30 mirakel kernel: ata1: failed to recover some devices, retrying in 5 secs
Sep  8 00:06:31 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep  8 00:06:31 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:06:31 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:06:31 mirakel kernel: ata2: failed to recover some devices, retrying in 5 secs
Sep  8 00:06:35 mirakel kernel: ata1: hard resetting port
Sep  8 00:06:35 mirakel kernel: ata1: SRST failed (errno=-19)
Sep  8 00:06:35 mirakel kernel: ata1: reset failed (errno=-19), retrying in 10 secs
Sep  8 00:06:36 mirakel kernel: ata2: hard resetting port
Sep  8 00:06:36 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:06:45 mirakel kernel: ata1: hard resetting port
Sep  8 00:06:45 mirakel kernel: ata1: SRST failed (errno=-19)
Sep  8 00:06:45 mirakel kernel: ata1: reset failed (errno=-19), retrying in 10 secs
Sep  8 00:06:55 mirakel kernel: ata1: hard resetting port
Sep  8 00:06:55 mirakel kernel: ata1: SRST failed (errno=-19)
Sep  8 00:06:55 mirakel kernel: ata1: reset failed (errno=-19), retrying in 35 secs
Sep  8 00:07:06 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep  8 00:07:06 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:07:06 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:07:06 mirakel kernel: ata2.00: limiting speed to UDMA/133:PIO3
Sep  8 00:07:06 mirakel kernel: ata2: failed to recover some devices, retrying in 5 secs
Sep  8 00:07:11 mirakel kernel: ata2: hard resetting port
Sep  8 00:07:12 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:07:30 mirakel kernel: ata1: hard resetting port
Sep  8 00:07:30 mirakel kernel: ata1: SRST failed (errno=-19)
Sep  8 00:07:30 mirakel kernel: ata1: reset failed, giving up
Sep  8 00:07:30 mirakel kernel: ata1.00: disabled
Sep  8 00:07:30 mirakel kernel: ata1: EH complete
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 488407879
Sep  8 00:07:30 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep  8 00:07:30 mirakel kernel: raid5: Disk failure on dm-0, disabling device. Operation continuing on 7 devices
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 141263543
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 4560055
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] READ CAPACITY failed
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Sense not available.
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Asking for cache data failed
Sep  8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Assuming drive cache: write through
Sep  8 00:07:42 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep  8 00:07:42 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:07:42 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:07:42 mirakel kernel: ata2.00: disabled
Sep  8 00:07:42 mirakel kernel: ata2: EH complete
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 141520599
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 141671879
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 488407879
Sep  8 00:07:42 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep  8 00:07:42 mirakel kernel: raid5: Disk failure on dm-1, disabling device. Operation continuing on 6 devices
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] READ CAPACITY failed
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Sense not available.
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Write Protect is off
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Asking for cache data failed
Sep  8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Assuming drive cache: write through
Sep  8 00:08:12 mirakel kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  8 00:08:12 mirakel kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0 
Sep  8 00:08:12 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  8 00:08:13 mirakel kernel: ata3: soft resetting port
Sep  8 00:08:13 mirakel kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:08:42 mirakel kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  8 00:08:42 mirakel kernel: ata4.00: cmd 35/00:08:bf:44:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep  8 00:08:42 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  8 00:08:43 mirakel kernel: ata4: soft resetting port
Sep  8 00:08:43 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:08:43 mirakel kernel: ata3.00: qc timeout (cmd 0x27)
Sep  8 00:08:43 mirakel kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:08:43 mirakel kernel: ata3.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:08:43 mirakel kernel: ata3: failed to recover some devices, retrying in 5 secs
Sep  8 00:08:48 mirakel kernel: ata3: hard resetting port
Sep  8 00:08:48 mirakel kernel: ata3: SRST failed (errno=-19)
Sep  8 00:08:48 mirakel kernel: ata3: reset failed (errno=-19), retrying in 10 secs
Sep  8 00:08:58 mirakel kernel: ata3: hard resetting port
Sep  8 00:08:58 mirakel kernel: ata3: SRST failed (errno=-19)
Sep  8 00:08:58 mirakel kernel: ata3: reset failed (errno=-19), retrying in 10 secs
Sep  8 00:09:08 mirakel kernel: ata3: hard resetting port
Sep  8 00:09:08 mirakel kernel: ata3: SRST failed (errno=-19)
Sep  8 00:09:08 mirakel kernel: ata3: reset failed (errno=-19), retrying in 35 secs
Sep  8 00:09:13 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep  8 00:09:13 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:09:13 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:09:13 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep  8 00:09:18 mirakel kernel: ata4: hard resetting port
Sep  8 00:09:18 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:09:43 mirakel kernel: ata3: hard resetting port
Sep  8 00:09:43 mirakel kernel: ata3: SRST failed (errno=-19)
Sep  8 00:09:43 mirakel kernel: ata3: reset failed, giving up
Sep  8 00:09:43 mirakel kernel: ata3.00: disabled
Sep  8 00:09:43 mirakel kernel: ata3: EH complete
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:09:43 mirakel kernel: end_request: I/O error, dev sdc, sector 488391871
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] READ CAPACITY failed
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Sense not available.
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Write Protect is off
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Asking for cache data failed
Sep  8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Assuming drive cache: write through
Sep  8 00:09:43 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep  8 00:09:43 mirakel kernel: raid5: Disk failure on sdc1, disabling device. Operation continuing on 5 devices
Sep  8 00:09:48 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep  8 00:09:48 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:09:48 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:09:48 mirakel kernel: ata4.00: limiting speed to UDMA/133:PIO3
Sep  8 00:09:48 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep  8 00:09:53 mirakel kernel: ata4: hard resetting port
Sep  8 00:09:54 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep  8 00:10:24 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep  8 00:10:24 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep  8 00:10:24 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep  8 00:10:24 mirakel kernel: ata4.00: disabled
Sep  8 00:10:25 mirakel kernel: ata4: EH complete
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:10:25 mirakel kernel: end_request: I/O error, dev sdd, sector 488391871
Sep  8 00:10:25 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep  8 00:10:25 mirakel kernel: raid5: Disk failure on sdd1, disabling device. Operation continuing on 4 devices
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] READ CAPACITY failed
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Sense not available.
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Write Protect is off
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Asking for cache data failed
Sep  8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Assuming drive cache: write through
Sep  8 00:10:25 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:25 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716576
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:25 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716499
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716500
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716501
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 6175
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: Aborting journal on device md0.
Sep  8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
Sep  8 00:10:25 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:25 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:25 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:25 mirakel kernel:  disk 5, o:0, dev:sdc1
Sep  8 00:10:25 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:25 mirakel kernel:  disk 7, o:0, dev:sdd1
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 0
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_dirty_inode: Journal has aborted
Sep  8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 0
Sep  8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_free_blocks_sb: Journal has aborted
Sep  8 00:10:26 mirakel kernel: ext3_abort called.
Sep  8 00:10:26 mirakel kernel: EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Sep  8 00:10:26 mirakel kernel: Remounting filesystem read-only
Sep  8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123686376
Sep  8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123689709
Sep  8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123689744
Sep  8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep  8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:26 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:26 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:26 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:26 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:26 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:26 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:26 mirakel kernel:  disk 5, o:0, dev:sdc1
Sep  8 00:10:26 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:26 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:26 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:26 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:26 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:26 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:26 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:26 mirakel kernel:  disk 5, o:0, dev:sdc1
Sep  8 00:10:26 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:26 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:26 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:26 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:26 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:26 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:27 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:27 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:27 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:27 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:27 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 4, o:0, dev:dm-0
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:27 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:27 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:27 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:27 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:27 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:27 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:27 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:27 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:27 mirakel kernel:  disk 2, o:0, dev:dm-1
Sep  8 00:10:27 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:27 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:27 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:27 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:27 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep  8 00:10:27 mirakel kernel: EXT3-fs error (device md0): ext3_readdir: directory #126337 contains a hole at offset 4096
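How to read the "RAID5 conf printout" blocks above:

- "rd" is the number of member slots in the array.
- "wd" is the number of members still working.
- Each "disk N, o:X, dev:NAME" line lists one slot. o:1 means the member is operational; o:0 means it has been marked faulty.

RAID5 keeps a single parity block per stripe, so it can keep serving data with at most one member missing. "rd:8 wd:4" therefore means the array has failed. That is consistent with ext3 aborting its journal and remounting read-only.

The following minimal Python sketch parses such a block and applies that rule. parse_conf and Raid5Conf are hypothetical, illustrative names, not part of mdadm or the kernel. The sketch assumes the log uses exactly the format shown above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical helper (not part of mdadm or the kernel): parse the last
# "RAID5 conf printout" block found in a kernel log excerpt.
GEOMETRY_RE = re.compile(r"--- rd:(\d+) wd:(\d+)")
MEMBER_RE = re.compile(r"disk (\d+), o:([01]), dev:(\S+)")


@dataclass
class Raid5Conf:
    raid_disks: int     # "rd": number of member slots in the array
    working_disks: int  # "wd": members still working
    members: dict = field(default_factory=dict)  # slot -> (device, operational)

    @property
    def failed_slots(self) -> int:
        return self.raid_disks - self.working_disks

    @property
    def readable(self) -> bool:
        # Single parity: RAID5 can rebuild data with at most one member missing.
        return self.failed_slots <= 1


def parse_conf(lines):
    conf = None
    for line in lines:
        geometry = GEOMETRY_RE.search(line)
        if geometry:
            # A new printout starts; members of any earlier block are dropped.
            conf = Raid5Conf(int(geometry.group(1)), int(geometry.group(2)))
            continue
        member = MEMBER_RE.search(line)
        if member and conf is not None:
            slot, op, dev = member.groups()
            conf.members[int(slot)] = (dev, op == "1")
    return conf


# The last printout from the log above.
SAMPLE = """\
Sep  8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep  8 00:10:27 mirakel kernel:  --- rd:8 wd:4
Sep  8 00:10:27 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep  8 00:10:27 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep  8 00:10:27 mirakel kernel:  disk 3, o:1, dev:hds1
Sep  8 00:10:27 mirakel kernel:  disk 6, o:1, dev:hdk1
"""

if __name__ == "__main__":
    conf = parse_conf(SAMPLE.splitlines())
    print(conf.failed_slots, conf.readable)  # 4 False
```

When a log contains several printouts, the sketch keeps only the last one. That block reflects the array state at the end of the excerpt.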


* Re: sata_nv issues with MCP51 SATA controller
  2007-09-13 19:26         ` Jon Ivar Rykkelid
@ 2007-09-13 19:54           ` Jeff Garzik
  2007-09-13 21:15             ` Jon Ivar Rykkelid
  2007-09-14  0:37             ` Robert Hancock
  0 siblings, 2 replies; 25+ messages in thread
From: Jeff Garzik @ 2007-09-13 19:54 UTC (permalink / raw)
  To: Jon Ivar Rykkelid; +Cc: Tejun Heo, linux-kernel, Robert Hancock

Jon Ivar Rykkelid wrote:
> Hi,
> 
> I now tested with the adma=0 option, but if anything I got a crash 
> quicker than before. Same error message started coming in, but this time 
> the system hung before I was able to capture the log as well (but I saw 
> the error, and it was the same as before, except that this time it was 
> the ata3-channel that first started acting up..) - To remind you all 
> what this is about, I have reattached the log that I originally captured...

Sounds like a hardware problem, since disabling ADMA is generally the 
cure-all we use -- it appears to stress the hardware less.

	Jeff





* Re: sata_nv issues with MCP51 SATA controller
  2007-09-13 19:54           ` Jeff Garzik
@ 2007-09-13 21:15             ` Jon Ivar Rykkelid
  2007-09-14  0:37             ` Robert Hancock
  1 sibling, 0 replies; 25+ messages in thread
From: Jon Ivar Rykkelid @ 2007-09-13 21:15 UTC (permalink / raw)
  To: Jeff Garzik, Tejun Heo, Robert Hancock; +Cc: linux-kernel

Is this the general opinion? - Should I try to get a replacement 
motherboard of the same type?

If so, can anyone confirm that the sata_nv driver works with the 
Gigabyte GA-N650SLI-DS4 motherboard at all, or has anyone been successful 
with this MB? How about the MCP51 SATA controller? Can anyone confirm 
that the driver works on this HW? I would feel awkward claiming a 
warranty replacement if it then turns out that the HW is OK after 
all and the problem is with the Linux driver...

BR
Jon Ivar

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>> Hi,
>>
>> I now tested with the adma=0 option, but if anything I got a crash 
>> quicker than before. Same error message started coming in, but this 
>> time the system hung before I was able to capture the log as well 
>> (but I saw the error, and it was the same as before, except that this 
>> time it was the ata3-channel that first started acting up..) - To 
>> remind you all what this is about, I have reattached the log that I 
>> originally captured...
>
> Sounds like a hardware problem, since disabling ADMA is generally the 
> cure-all we use -- it appears to stress the hardware less.
>
>     Jeff
>


-- 
Jon Ivar Rykkelid                     Web: http://www.pvv.org/~jonry
Enromvegen 191                        Phone:     +47 72 56 86 86
N-7026 Trondheim                      Mob.:      +47 906 20 250
Norway                                Email:     jonry@pvv.org



* Re: sata_nv issues with MCP51 SATA controller
  2007-09-13 19:54           ` Jeff Garzik
  2007-09-13 21:15             ` Jon Ivar Rykkelid
@ 2007-09-14  0:37             ` Robert Hancock
  2007-09-14 12:10               ` Jon Ivar Rykkelid
  1 sibling, 1 reply; 25+ messages in thread
From: Robert Hancock @ 2007-09-14  0:37 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Jon Ivar Rykkelid, Tejun Heo, linux-kernel

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>> Hi,
>>
>> I now tested with the adma=0 option, but if anything I got a crash 
>> quicker than before. Same error message started coming in, but this 
>> time the system hung before I was able to capture the log as well (but 
>> I saw the error, and it was the same as before, except that this time 
>> it was the ata3-channel that first started acting up..) - To remind 
>> you all what this is about, I have reattached the log that I 
>> originally captured...
> 
> Sounds like a hardware problem, since disabling ADMA is generally the 
> cure-all we use -- it appears to stress the hardware less.

If this is an MCP51 chipset, adma=0 will make no difference since that 
chipset does not support ADMA in the first place.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* Re: sata_nv issues with MCP51 SATA controller
  2007-09-14  0:37             ` Robert Hancock
@ 2007-09-14 12:10               ` Jon Ivar Rykkelid
  0 siblings, 0 replies; 25+ messages in thread
From: Jon Ivar Rykkelid @ 2007-09-14 12:10 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Jeff Garzik, Tejun Heo, linux-kernel

Hi,

To eliminate the possibility of this being a hardware issue, I have now 
acquired another "Gigabyte GA-N650SLI-DS4" motherboard (with the "MCP51" 
chipset) for testing. I'll swap parts this evening. Hopefully I'll be 
able to tell you in a few hours whether this appears to be working as it 
should. The motherboard that I'm going to swap to has actually been 
tested (with MS Windows OS+driver) for more than a day with a disk 
connected, so if this MB also fails, I think it will be safe to say that 
the issue is with the sata_nv driver... So hang on.

(Can't you think of something else that could conflict with the sata_nv 
driver after a while, like two of my RAID disks being encrypted, 
my running a SW RAID-5 array, some special HW (quad-core CPU), or my 
running VMware on this server? To me, all these suggestions seem 
rather far-fetched, especially as everything works with another 
controller, so I'm arguing that unless there's a HW issue, the issue is 
with the driver. But you're the expert(s), so let me know if you disagree.)

I'll keep you posted on the result of swapping HW. Give me a few 
hours.  :-)

BR
Jon Ivar

Robert Hancock wrote:
> Jeff Garzik wrote:
>> Jon Ivar Rykkelid wrote:
>>> Hi,
>>>
>>> I now tested with the adma=0 option, but if anything I got a crash 
>>> quicker than before. Same error message started coming in, but this 
>>> time the system hung before I was able to capture the log as well 
>>> (but I saw the error, and it was the same as before, except that 
>>> this time it was the ata3-channel that first started acting up..) - 
>>> To remind you all what this is about, I have reattached the log that 
>>> I originally captured...
>>
>> Sounds like a hardware problem, since disabling ADMA is generally the 
>> cure-all we use -- it appears to stress the hardware less.
>
> If this is an MCP51 chipset, adma=0 will make no difference since that 
> chipset does not support ADMA in the first place.
>


* Re: sata_nv issues with MCP51 SATA controller
  2007-09-13 18:01       ` Jon Ivar Rykkelid
  2007-09-13 19:26         ` Jon Ivar Rykkelid
@ 2007-09-14 13:29         ` Prakash Punnoor
  2007-09-14 14:17           ` Jon Ivar Rykkelid
  1 sibling, 1 reply; 25+ messages in thread
From: Prakash Punnoor @ 2007-09-14 13:29 UTC (permalink / raw)
  To: Jon Ivar Rykkelid; +Cc: Tejun Heo, Jeff Garzik, linux-kernel, Robert Hancock

[-- Attachment #1: Type: text/plain, Size: 1018 bytes --]

On the day of Thursday 13 September 2007 Jon Ivar Rykkelid hast written:
> Resending, as my first attempts contained HTML and was blocked...
>
> Tejun Heo wrote:
> > Jon Ivar Rykkelid wrote:
> >> Thanks for the suggestion, but sata_nv is not built modular in my
> >> current kernel, so "no can do" at the moment
> >> (However, if some expert REALLY thinks this will fix things, I will
> >> CERTAINLY recompile and give it a go)
> >
> > Passing "sata_nv.adma=0" as kernel boot parameter will do the trick.
>
> Ahh, silly me... Of course!
> Ooops, I just got back, and verified: I actually have sata_nv running as
> a module after all on this server... My bad.
> I fixed /etc/modprobe.conf to include the following two lines:
> "
> alias scsi_hostadapter sata_nv
> options sata_nv adma=0
> ...
> "

I don't think it will matter, as adma doesn't affect MCP51, but only nforce4. 
So I'd look for other trouble makers.
-- 
(°=                 =°)
//\ Prakash Punnoor /\\
V_/                 \_V

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: sata_nv issues with MCP51 SATA controller
  2007-09-14 13:29         ` Prakash Punnoor
@ 2007-09-14 14:17           ` Jon Ivar Rykkelid
  2007-09-14 14:25             ` Jeff Garzik
  2007-09-14 20:35             ` Jon Ivar Rykkelid
  0 siblings, 2 replies; 25+ messages in thread
From: Jon Ivar Rykkelid @ 2007-09-14 14:17 UTC (permalink / raw)
  To: Prakash Punnoor; +Cc: Tejun Heo, Jeff Garzik, linux-kernel, Robert Hancock

Prakash Punnoor wrote:
> I don't think it will matter, as adma doesn't affect MCP51, but only nforce4. 
> So I'd look for other trouble makers.
>   
Robert told me. (And you're correct - It didn't help).

I'm going to test another (identical) motherboard this evening to 
establish whether it could be a HW-issue.

I'll keep you posted

Jon Ivar



* Re: sata_nv issues with MCP51 SATA controller
  2007-09-14 14:17           ` Jon Ivar Rykkelid
@ 2007-09-14 14:25             ` Jeff Garzik
  2007-09-14 14:39               ` Tejun Heo
  2007-09-14 20:35             ` Jon Ivar Rykkelid
  1 sibling, 1 reply; 25+ messages in thread
From: Jeff Garzik @ 2007-09-14 14:25 UTC (permalink / raw)
  To: Jon Ivar Rykkelid
  Cc: Prakash Punnoor, Tejun Heo, linux-kernel, Robert Hancock

Jon Ivar Rykkelid wrote:
> Prakash Punnoor wrote:
>> I don't think it will matter, as adma doesn't affect MCP51, but only 
>> nforce4. So I'd look for other trouble makers.
>>   
> Robert told me. (And you're correct - It didn't help).

Yes, it was already in slow-and-safe (non-ADMA) mode.


> I'm going to test another (identical) motherboard this evening to 
> establish whether it could be a HW-issue.

Not just motherboard.  It is more likely to be a cable, drive or PSU 
problem.

	Jeff




* Re: sata_nv issues with MCP51 SATA controller
  2007-09-14 14:25             ` Jeff Garzik
@ 2007-09-14 14:39               ` Tejun Heo
       [not found]                 ` <46EAAA9A.1020903@pvv.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2007-09-14 14:39 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Jon Ivar Rykkelid, Prakash Punnoor, linux-kernel, Robert Hancock

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>> Prakash Punnoor wrote:
>>> I don't think it will matter, as adma doesn't affect MCP51, but only
>>> nforce4. So I'd look for other trouble makers.
>>>   
>> Robert told me. (And you're correct - It didn't help).
> 
> Yes, it was already in slow-and-safe mode.
> 
> 
>> I'm going to test another (identical) motherboard this evening to
>> establish whether it could be a HW-issue.
> 
> Not just motherboard.  It is more likely to be a cable, drive or PSU
> problem.

I don't think it's cable as the problem occurs on multiple ports.  My
bet is either the controller or PSU.

Thanks.

-- 
tejun
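Tejun's argument rests on the observation that the trouble is not confined to one port. A single bad cable would be expected to affect one port. Near-simultaneous failures on several ports point at something shared, such as the controller or the power supply.

One way to check this from a saved log is to record when each ataN port logged its first "exception" line, then compare the times. The sketch below is hypothetical, not a libata or syslog tool. It assumes the classic "Mon DD HH:MM:SS host kernel:" syslog prefix. Because that prefix carries no year, it substitutes a fixed one.

```python
import re
from datetime import datetime

# Hypothetical log helper: find the first libata "exception" line per port.
# Assumes the classic syslog prefix "Mon DD HH:MM:SS host kernel: ...".
EXC_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: (ata\d+)(?:\.\d+)?: exception"
)

# Syslog lines carry no year; assume a fixed (leap) year, since only the
# differences between timestamps matter here.
ASSUMED_YEAR = 2000


def first_exceptions(lines):
    """Return {port: datetime} for the first exception logged on each port."""
    first = {}
    for line in lines:
        m = EXC_RE.match(line)
        if not m:
            continue
        stamp, port = m.groups()
        normalized = " ".join(stamp.split())  # "Sep  8" -> "Sep 8"
        when = datetime.strptime(f"{ASSUMED_YEAR} {normalized}", "%Y %b %d %H:%M:%S")
        first.setdefault(port, when)
    return first


def spread_seconds(first):
    """Seconds between the earliest and the latest first failure across ports."""
    times = sorted(first.values())
    return (times[-1] - times[0]).total_seconds() if times else 0.0


SAMPLE = """\
Sep 14 20:09:15 mirakel kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 14 20:09:15 mirakel kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 14 20:09:16 mirakel kernel: ata3: soft resetting port
"""

if __name__ == "__main__":
    first = first_exceptions(SAMPLE.splitlines())
    print(sorted(first), spread_seconds(first))  # ['ata3', 'ata4'] 0.0
```

Applied to the sata_nv-new.log excerpt attached later in this thread, ata3 and ata4 log their first exception in the same second (20:09:15).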


* Re: sata_nv issues with MCP51 SATA controller
       [not found]                 ` <46EAAA9A.1020903@pvv.org>
@ 2007-09-14 15:58                   ` Jeff Garzik
  2007-09-14 18:38                     ` Jon Ivar Rykkelid
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Garzik @ 2007-09-14 15:58 UTC (permalink / raw)
  To: Jon Ivar Rykkelid
  Cc: Tejun Heo, Prakash Punnoor, linux-kernel, Robert Hancock

Jon Ivar Rykkelid wrote:
> It is NOT the PSU, nor is it cables, as all the drives work well using 
> the same cables + PSU (in the same box) if I connect them to my other 
> two controllers (in that same box).


It's sometimes the combination that matters most.  You cannot really 
make that determination yet.

	Jeff




* Re: sata_nv issues with MCP51 SATA controller
  2007-09-14 15:58                   ` Jeff Garzik
@ 2007-09-14 18:38                     ` Jon Ivar Rykkelid
  2007-09-14 20:24                       ` auxsvr
  0 siblings, 1 reply; 25+ messages in thread
From: Jon Ivar Rykkelid @ 2007-09-14 18:38 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Tejun Heo, Prakash Punnoor, linux-kernel, Robert Hancock

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>
>> I'm going to test another (identical) motherboard this evening to 
>> establish whether it could be a HW-issue.
>
> Not just motherboard.  It is more likely to be a cable, drive or PSU 
> problem.

>> It is NOT the PSU, nor is it cables, as all the drives work well 
>> using the same cables + PSU (in the same box) if I connect them to my 
>> other two controllers (in that same box).
>
>
> It's sometimes the combination that matters most.  You cannot really 
> make that determination yet.
>

Whatever.
(Though I must confess that, in spite of my Master's degree in Electrical 
Engineering and extensive HW experience, I cannot for the life of me 
understand how you can find it more likely to be the cables (which work fine 
with other controllers), the disks (which also work fine with other 
controllers) or the power supply (which also works fine with exactly the 
same things connected to it) rather than the motherboard's 
SATA controller (the item that is actually reported to fail in 
the first place). Sure, I'm well aware that sometimes the combination 
of HW matters, but in my experience we're normally not talking about 
"dumb" stuff like cables and PSU if that is the issue.)

Anyway, I have just changed to the other (identical) motherboard, and 
things are running just fine at the moment...
I'll let you know if they start acting up (as they did before). If not, I 
guess the fault was with the motherboard and not the driver. I guess 
we'll know pretty soon...

Thanks for all your effort, gents, let's hope it all works now!

BR
Jon Ivar



* Re: sata_nv issues with MCP51 SATA controller
  2007-09-14 18:38                     ` Jon Ivar Rykkelid
@ 2007-09-14 20:24                       ` auxsvr
  0 siblings, 0 replies; 25+ messages in thread
From: auxsvr @ 2007-09-14 20:24 UTC (permalink / raw)
  To: linux-kernel

Hello,

I get a similar, if not identical, problem with an ASUS A8N SLI nForce4-based 
motherboard. The PC (with a Seagate SATA-2 120 GB HDD) ran fine for two 
years. Last Christmas, Windows XP started crashing (I didn't change either 
hardware or drivers), and the filesystem got corrupted beyond repair 
within 8 hours after every installation. The system log contained entries 
about bad sectors and, based on the Seagate diagnostic tool, I returned the 
system to the supplier. According to the retail shop, neither the disk nor 
the system had any problems, so I was coerced into paying for a replacement disk. 
The replacement HDD (Seagate again, 120 GB) ran fine until a month ago (this 
time the system is connected to a UPS), when the same problem occurred! I 
moved the disk to a Linux system with the Promise TX2plus controller (the one 
I'm typing this from), found bad sectors and formatted it, and everything has worked 
fine for at least 6 hours of continuous disk writes and reads in this system. 
If I return the disk to the nForce4 system, it becomes corrupted within a few 
hours of disk access, no matter whether Linux or Windows is installed, and 
regardless of NCQ settings, drivers and cables.

The symptoms are the same in both cases: the system crashes, then runs for 
some hours, then the controller stops responding completely (the first error 
message is "ata1: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0x2 frozen"). 
The disk access LED blinks continuously, Linux 2.6.18 (openSUSE 
10.2) throws lots of error messages similar to the ones you mention above, 
Linux says that the device is dead, and the system becomes unusable (no disk 
access). After a reboot, the filesystem is fine for some time. Afterwards 
similar error messages appear, seek errors appear, and the filesystem becomes 
completely destroyed. The positive part of this ordeal is that the Linux SATA 
error handling works fine: Linux recovered the first time (without access 
to the drive, of course). Windows, by contrast, crashed badly, and at first 
I was unable to find out what was happening.

I cannot say with certainty whether this is a hardware error or damage. Seagate 
technical support insists that their HDD is at fault, which is obviously 
wrong. The PC is (after the second incident) connected to a UPS and was 
checked by the service at the shop. The weirdest thing, which I cannot 
explain, is that the system ran fine for 8 months after I changed the 
disk, even though the disk wasn't damaged!

There are two possibilities:

- The motherboard is damaged or faulty. But then how can it have run fine 
  for 8 months after I changed the disk?
- There is some very weird interaction between the HDD and the SATA 
  controller. This isn't unlikely, considering the problems reported with 
  combinations of nForce4 and Maxtor HDDs. Yet it still doesn't explain 
  the 2-year and 8-month periods of normal operation.

I'm going to contact the service again and see how this turns out.
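The hex values in the error line quoted above can be decoded bit by bit. The sketch below is a hypothetical Python helper; decode, EMASK_BITS and SERR_BITS are illustrative names, not libata symbols. Both tables rest on assumptions:

- The Emask table assumes the AC_ERR_* bit assignments in include/linux/libata.h of kernels from this period.
- The SErr labels follow the SATA SError register layout, using the short names that newer libata versions print.

Verify both tables against your own kernel tree before relying on them.

```python
# Hypothetical decoder for the "Emask" and "SErr" values in libata error lines.
# ASSUMPTION: Emask bits mirror the AC_ERR_* constants of this era's libata.h;
# SErr labels follow the SATA SError layout with the short names newer libata prints.
EMASK_BITS = {
    0: "DEV", 1: "HSM", 2: "TIMEOUT", 3: "MEDIA", 4: "ATA_BUS",
    5: "HOST_BUS", 6: "SYSTEM", 7: "INVALID", 8: "OTHER",
}

SERR_BITS = {
    # ERR field (bits 0-15)
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt",
    # DIAG field (bits 16-31)
    16: "PHYRdyChg", 17: "PHYInt", 18: "CommWake", 19: "10B8B",
    20: "Dispar", 21: "BadCRC", 22: "Handshk", 23: "LinkSeq",
    24: "TrStaTrns", 25: "UnrecFIS", 26: "DevExch",
}


def decode(value, table):
    """Return the names of all set bits in value, lowest bit first."""
    names = [name for bit, name in sorted(table.items()) if value & (1 << bit)]
    # Report any set bits the table does not know about instead of hiding them.
    unknown = value & ~sum(1 << bit for bit in table)
    if unknown:
        names.append(f"unknown:{unknown:#x}")
    return names


if __name__ == "__main__":
    # First error line quoted above: "Emask 0x10 ... SErr 0x1810000"
    print(decode(0x10, EMASK_BITS))      # ['ATA_BUS']
    print(decode(0x1810000, SERR_BITS))  # ['PHYRdyChg', 'LinkSeq', 'TrStaTrns']
```

Under these assumptions, Emask 0x10 is an ATA bus error. SErr 0x1810000 has the PHY-ready-change, link-sequence and transport-state-transition bits set, which points to link-level trouble.

The MCP51 logs in this thread look different. They show SErr 0x0, with commands failing on Emask 0x4 (a timeout). The two reports may therefore not be the same failure, although this decoding alone cannot settle that.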


* Re: sata_nv issues with MCP51 SATA controller
  2007-09-14 14:17           ` Jon Ivar Rykkelid
  2007-09-14 14:25             ` Jeff Garzik
@ 2007-09-14 20:35             ` Jon Ivar Rykkelid
  2007-09-15  7:12               ` Prakash Punnoor
  1 sibling, 1 reply; 25+ messages in thread
From: Jon Ivar Rykkelid @ 2007-09-14 20:35 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Prakash Punnoor, Tejun Heo, Jeff Garzik, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 739 bytes --]

Hi, I'm becoming more and more confident that the driver is the issue.

I have now been able to reproduce the same error on the new motherboard 
as well... - (the same MB was tested to work in Windows with 
windows-drivers)...

Unless you guys can come up with something clever, I'll see if I can get 
my hands on / change to another (non-nvidia) chipset in a day or two, as 
the sata_nv with this chipset apparently isn't working.

(Or has anyone EVER been successful with the latest kernel/driver on 
this HW?)

Attaching everything relevant from /var/log/messages...


Jon Ivar Rykkelid wrote:
> I'm going to test another (identical) motherboard this evening to 
> establish whether it could be a HW-issue.
>
> I'll keep you posted
Jon Ivar
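The first exception in the log attached below is printed together with the ATA taskfile of the command that timed out. Decoding that taskfile ties the rest of the log together.

The sketch below is a hypothetical Python helper; decode_cmd and decode_sstatus are illustrative names. It assumes the field order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. That order is our reading of libata's error report from this era; the thread itself does not state it.

Under that assumption:

- Opcode 0x35 is WRITE DMA EXT.
- The 48-bit LBA is 488391871. This is the same sector end_request later reports for sdc and sdd, just before md logs "super_written gets error=-5".
- The count of 8 sectors matches "data 4096 out".

The helper also splits the "SStatus 113" value from the link-up lines into its SATA fields.

```python
import re

# Hypothetical parser for the taskfile libata prints with an exception, e.g.
#   ata3.00: cmd 35/00:08:bf:44:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
# ASSUMED field order:
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
HEX5 = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
TF_RE = re.compile(r"cmd ([0-9a-f]{2})/" + HEX5 + "/" + HEX5 + r"/([0-9a-f]{2})")

# Only the ATA opcodes that show up in this thread.
OPCODES = {
    0x27: "READ NATIVE MAX ADDRESS EXT",
    0x35: "WRITE DMA EXT",
}


def decode_cmd(line):
    m = TF_RE.search(line)
    if m is None:
        raise ValueError("no taskfile in line")
    command = int(m.group(1), 16)
    _feat, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    _hfeat, hnsect, hlbal, hlbam, hlbah = (int(b, 16) for b in m.group(3).split(":"))
    # LBA48: the "hob" (high order byte) registers carry bits 24-47 of the LBA.
    lba = (hlbah << 40) | (hlbam << 32) | (hlbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "opcode": command,
        "name": OPCODES.get(command, "unknown"),
        "lba": lba,
        # Sector count; for LBA48 data commands a value of 0 would mean 65536
        # (not special-cased here).
        "sectors": (hnsect << 8) | nsect,
        "device": int(m.group(4), 16),
    }


def decode_sstatus(value):
    """Split SStatus into (DET, SPD, IPM): detection, speed generation, power state."""
    return value & 0xF, (value >> 4) & 0xF, (value >> 8) & 0xF


if __name__ == "__main__":
    tf = decode_cmd("ata3.00: cmd 35/00:08:bf:44:1c/00:00:1d:00:00/e0 "
                    "tag 0 cdb 0x0 data 4096 out")
    print(tf["name"], tf["lba"], tf["sectors"] * 512)  # WRITE DMA EXT 488391871 4096
    print(decode_sstatus(0x113))                       # (3, 1, 1)
```

Per the SATA SStatus layout, 0x113 decodes to:

- DET=3: device present, PHY communication established.
- SPD=1: Gen1, 1.5 Gbps.
- IPM=1: interface active.

So the link reports itself up even while commands time out. That includes the "cmd 0x27" (READ NATIVE MAX ADDRESS EXT), apparently issued while libata revalidates the device.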



[-- Attachment #2: sata_nv-new.log --]
[-- Type: text/plain, Size: 8614 bytes --]

Sep 14 20:09:15 mirakel kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 14 20:09:15 mirakel kernel: ata3.00: cmd 35/00:08:bf:44:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep 14 20:09:15 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 14 20:09:15 mirakel kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 14 20:09:15 mirakel kernel: ata4.00: cmd 35/00:08:bf:44:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep 14 20:09:15 mirakel kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 14 20:09:16 mirakel kernel: ata3: soft resetting port
Sep 14 20:09:16 mirakel kernel: ata4: soft resetting port
Sep 14 20:09:16 mirakel kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 14 20:09:16 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 14 20:09:46 mirakel kernel: ata3.00: qc timeout (cmd 0x27)
Sep 14 20:09:46 mirakel kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 14 20:09:46 mirakel kernel: ata3.00: failed to set xfermode (err_mask=0x40)
Sep 14 20:09:46 mirakel kernel: ata3: failed to recover some devices, retrying in 5 secs
Sep 14 20:09:46 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep 14 20:09:46 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 14 20:09:46 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep 14 20:09:46 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep 14 20:09:51 mirakel kernel: ata3: hard resetting port
Sep 14 20:09:51 mirakel kernel: ata4: hard resetting port
Sep 14 20:09:51 mirakel kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 14 20:09:51 mirakel kernel: ata4: SRST failed (errno=-19)
Sep 14 20:09:51 mirakel kernel: ata4: reset failed (errno=-19), retrying in 10 secs
Sep 14 20:10:01 mirakel kernel: ata4: hard resetting port
Sep 14 20:10:01 mirakel kernel: ata4: SRST failed (errno=-19)
Sep 14 20:10:01 mirakel kernel: ata4: reset failed (errno=-19), retrying in 10 secs
Sep 14 20:10:11 mirakel kernel: ata4: hard resetting port
Sep 14 20:10:11 mirakel kernel: ata4: SRST failed (errno=-19)
Sep 14 20:10:11 mirakel kernel: ata4: reset failed (errno=-19), retrying in 35 secs
Sep 14 20:10:21 mirakel kernel: ata3.00: qc timeout (cmd 0x27)
Sep 14 20:10:21 mirakel kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 14 20:10:21 mirakel kernel: ata3.00: failed to set xfermode (err_mask=0x40)
Sep 14 20:10:21 mirakel kernel: ata3.00: limiting speed to UDMA/133:PIO3
Sep 14 20:10:21 mirakel kernel: ata3: failed to recover some devices, retrying in 5 secs
Sep 14 20:10:26 mirakel kernel: ata3: hard resetting port
Sep 14 20:10:27 mirakel kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 14 20:10:46 mirakel kernel: ata4: hard resetting port
Sep 14 20:10:46 mirakel kernel: ata4: SRST failed (errno=-19)
Sep 14 20:10:46 mirakel kernel: ata4: reset failed, giving up
Sep 14 20:10:46 mirakel kernel: ata4.00: disabled
Sep 14 20:10:46 mirakel kernel: ata4: EH complete
Sep 14 20:10:46 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 14 20:10:46 mirakel kernel: end_request: I/O error, dev sdd, sector 488391871
Sep 14 20:10:46 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 14 20:10:46 mirakel kernel: raid5: Disk failure on sdd1, disabling device. Operation continuing on 7 devices
Sep 14 20:10:57 mirakel kernel: ata3.00: qc timeout (cmd 0x27)
Sep 14 20:10:57 mirakel kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 14 20:10:57 mirakel kernel: ata3.00: failed to set xfermode (err_mask=0x40)
Sep 14 20:10:57 mirakel kernel: ata3.00: disabled
Sep 14 20:10:58 mirakel kernel: ata3: EH complete
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 14 20:10:58 mirakel kernel: end_request: I/O error, dev sdc, sector 488391871
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] READ CAPACITY failed
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] Sense not available.
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] Write Protect is off
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] Asking for cache data failed
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] Assuming drive cache: write through
Sep 14 20:10:58 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 14 20:10:58 mirakel kernel: raid5: Disk failure on sdc1, disabling device. Operation continuing on 6 devices
Sep 14 20:10:58 mirakel kernel: RAID5 conf printout:
Sep 14 20:10:58 mirakel kernel: Buffer I/O error on device md0, logical block 119194013
Sep 14 20:10:58 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:58 mirakel kernel: Buffer I/O error on device md0, logical block 119194014
Sep 14 20:10:58 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:58 mirakel kernel: Buffer I/O error on device md0, logical block 6660
Sep 14 20:10:58 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:58 mirakel kernel: Aborting journal on device md0.
Sep 14 20:10:58 mirakel kernel:  --- rd:8 wd:6
Sep 14 20:10:58 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep 14 20:10:58 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep 14 20:10:58 mirakel kernel:  disk 2, o:1, dev:dm-0
Sep 14 20:10:58 mirakel kernel:  disk 3, o:1, dev:hds1
Sep 14 20:10:58 mirakel kernel:  disk 4, o:1, dev:dm-1
Sep 14 20:10:58 mirakel kernel:  disk 5, o:0, dev:sdd1
Sep 14 20:10:58 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep 14 20:10:58 mirakel kernel:  disk 7, o:0, dev:sdc1
Sep 14 20:10:58 mirakel kernel: ext3_abort called.
Sep 14 20:10:58 mirakel kernel: EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Sep 14 20:10:58 mirakel kernel: Remounting filesystem read-only
Sep 14 20:10:58 mirakel kernel: RAID5 conf printout:
Sep 14 20:10:58 mirakel kernel:  --- rd:8 wd:6
Sep 14 20:10:58 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep 14 20:10:58 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep 14 20:10:58 mirakel kernel:  disk 2, o:1, dev:dm-0
Sep 14 20:10:58 mirakel kernel:  disk 3, o:1, dev:hds1
Sep 14 20:10:58 mirakel kernel:  disk 4, o:1, dev:dm-1
Sep 14 20:10:58 mirakel kernel:  disk 5, o:0, dev:sdd1
Sep 14 20:10:58 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep 14 20:10:58 mirakel kernel: RAID5 conf printout:
Sep 14 20:10:58 mirakel kernel:  --- rd:8 wd:6
Sep 14 20:10:58 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep 14 20:10:58 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep 14 20:10:58 mirakel kernel:  disk 2, o:1, dev:dm-0
Sep 14 20:10:58 mirakel kernel:  disk 3, o:1, dev:hds1
Sep 14 20:10:58 mirakel kernel:  disk 4, o:1, dev:dm-1
Sep 14 20:10:58 mirakel kernel:  disk 5, o:0, dev:sdd1
Sep 14 20:10:58 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep 14 20:10:58 mirakel kernel: RAID5 conf printout:
Sep 14 20:10:58 mirakel kernel:  --- rd:8 wd:6
Sep 14 20:10:58 mirakel kernel:  disk 0, o:1, dev:hdg1
Sep 14 20:10:58 mirakel kernel:  disk 1, o:1, dev:hdo1
Sep 14 20:10:58 mirakel kernel:  disk 2, o:1, dev:dm-0
Sep 14 20:10:58 mirakel kernel:  disk 3, o:1, dev:hds1
Sep 14 20:10:58 mirakel kernel:  disk 4, o:1, dev:dm-1
Sep 14 20:10:58 mirakel kernel:  disk 6, o:1, dev:hdk1
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 29
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 30
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 119177216
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 119180640
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 119180739
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 119193951
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 119193953
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0
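The "RAID5 conf printout" blocks in the log above can be read mechanically. In the md raid5 driver of this era, `rd` appears to be the number of member slots and `wd` the number still working. Each `disk N, o:X, dev:Y` line marks a member as operational (`o:1`) or faulty (`o:0`). Slots with no device attached seem not to be printed at all, which would explain why sdc1 and then sdd1 drop out of the later printouts. The following is a minimal Python sketch that assumes exactly the line format seen in this log; other kernel versions may print differently. The names `parse_conf_printout` and `Raid5Conf` are hypothetical helpers for illustration.

```python
import re
from typing import List, NamedTuple

# " --- rd:8 wd:6": number of RAID member slots and of members still working.
SUMMARY_RE = re.compile(r"---\s+rd:(\d+)\s+wd:(\d+)")
# " disk 5, o:0, dev:sdd1": o:1 = operational, o:0 = marked faulty.
DISK_RE = re.compile(r"disk\s+(\d+),\s+o:([01]),\s+dev:(\S+)")


class Raid5Conf(NamedTuple):
    raid_disks: int
    working_disks: int
    faulty: List[str]  # device names printed with o:0


def parse_conf_printout(lines: List[str]) -> Raid5Conf:
    """Parse the lines of one "RAID5 conf printout:" block.

    Unrelated log lines (e.g. interleaved "Buffer I/O error" messages) are ignored.
    """
    raid_disks = working = None
    faulty: List[str] = []
    for line in lines:
        m = SUMMARY_RE.search(line)
        if m:
            raid_disks, working = int(m.group(1)), int(m.group(2))
            continue
        m = DISK_RE.search(line)
        if m and m.group(2) == "0":
            faulty.append(m.group(3))
    if raid_disks is None or working is None:
        raise ValueError("no ' --- rd:N wd:M' line found")
    return Raid5Conf(raid_disks, working, faulty)


# First printout from the log above (syslog prefixes shortened).
SAMPLE = """\
kernel:  --- rd:8 wd:6
kernel:  disk 0, o:1, dev:hdg1
kernel:  disk 1, o:1, dev:hdo1
kernel:  disk 2, o:1, dev:dm-0
kernel:  disk 3, o:1, dev:hds1
kernel:  disk 4, o:1, dev:dm-1
kernel:  disk 5, o:0, dev:sdd1
kernel:  disk 6, o:1, dev:hdk1
kernel:  disk 7, o:0, dev:sdc1
"""

if __name__ == "__main__":
    print(parse_conf_printout(SAMPLE.splitlines()))
    # Raid5Conf(raid_disks=8, working_disks=6, faulty=['sdd1', 'sdc1'])
```

In this sample, the two `o:0` members account for the gap between `rd:8` and `wd:6`.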

* Re: sata_nv issues with MCP51 SATA controller
  2007-09-14 20:35             ` Jon Ivar Rykkelid
@ 2007-09-15  7:12               ` Prakash Punnoor
  2007-09-15 10:14                 ` Jon Ivar Rykkelid
       [not found]                 ` <46EBA82C.6050000@pvv.org>
  0 siblings, 2 replies; 25+ messages in thread
From: Prakash Punnoor @ 2007-09-15  7:12 UTC (permalink / raw)
  To: Jon Ivar Rykkelid; +Cc: Robert Hancock, Tejun Heo, Jeff Garzik, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 562 bytes --]

On the day of Friday 14 September 2007 Jon Ivar Rykkelid hast written:
> Hi, I'm getting more confident that the driver is the issue.
>
>
> (Or has anyone EVER been successful with the latest kernel/driver on
> this HW?)

I don't have exactly the same hardware, but I have the same chipset and I
don't have any problems, even with the swncq patch applied. Do you have an
HPET? If not, try booting with acpi_use_timer_override. My system won't work
without the override.
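Both of Prakash's questions (is there an HPET, and was the override actually passed) can be checked from userspace. The following is a minimal Python sketch. It assumes the kernel exposes its clocksource list at /sys/devices/system/clocksource/clocksource0/available_clocksource and the boot command line at /proc/cmdline. If "hpet" is missing from that list, that suggests, but does not prove, that the kernel found no usable HPET. A parameter appearing in /proc/cmdline also only shows it was passed, not that the kernel recognized it: kernels of this vintage generally do not complain about unknown boot options. The helper names are hypothetical.

```python
from pathlib import Path

# Assumed locations; present on 2.6-era kernels with clocksource support.
CLOCKSOURCE_LIST = Path("/sys/devices/system/clocksource/clocksource0/available_clocksource")
CMDLINE = Path("/proc/cmdline")


def has_hpet(available_clocksources: str) -> bool:
    """True if 'hpet' appears among the space-separated clocksource names."""
    return "hpet" in available_clocksources.split()


def param_passed(cmdline: str, name: str) -> bool:
    """True if the bare flag or a name=value form appears on the command line.

    This only shows the option was passed; it does not prove the kernel used it.
    """
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False


def read_or_empty(path: Path) -> str:
    """Return the file contents, or "" if it is missing or unreadable."""
    try:
        return path.read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    print("HPET clocksource available:", has_hpet(read_or_empty(CLOCKSOURCE_LIST)))
    print("acpi_use_timer_override passed:",
          param_passed(read_or_empty(CMDLINE), "acpi_use_timer_override"))
```

Because the match is exact per token, a misspelled option such as "acpi_use_time_override" is reported as not passed.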

-- 
(°=                 =°)
//\ Prakash Punnoor /\\
V_/                 \_V

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: sata_nv issues with MCP51 SATA controller
  2007-09-15  7:12               ` Prakash Punnoor
@ 2007-09-15 10:14                 ` Jon Ivar Rykkelid
  2007-09-15 14:47                   ` John Stoffel
       [not found]                 ` <46EBA82C.6050000@pvv.org>
  1 sibling, 1 reply; 25+ messages in thread
From: Jon Ivar Rykkelid @ 2007-09-15 10:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: Prakash Punnoor, Robert Hancock, Tejun Heo, Jeff Garzik

Prakash Punnoor wrote:
> I don't have exactly the same hardware, but I have the same chipset and I
> don't have any problems, even with the swncq patch applied. Do you have an
> HPET? If not, try booting with acpi_use_timer_override. My system won't work
> without the override.
>
>   
Hi, I reconnected the disks and rebooted with the kernel option
"acpi_use_timer_override" (this is the correct spelling, isn't it? The
kernel didn't complain). It didn't help; I got the same error as before.
I'll have to connect all disks back to my PCI-connected SATA controllers
and start rebuilding my RAID yet again.

It seems random which disk is affected first (so far I know it has
happened to ata1, ata3 and ata4, three of my four disks on that
controller). I guess it just happens to whichever disk is in use at the
moment the driver/controller acts up.

I'm about to give in. I think I'll replace both (Gigabyte GA-N650SLI-DS4)
motherboards, as the driver simply isn't working with the on-board
controller of these boards. It could of course be a combination of the
controllers and some other hardware on the motherboards, but everything
works when I connect all disks to my non-NVIDIA controllers. I guess I'll
opt for a motherboard with an Intel chipset after all...

BR
Jon Ivar

* Re: sata_nv issues with MCP51 SATA controller
       [not found]                 ` <46EBA82C.6050000@pvv.org>
@ 2007-09-15 11:30                   ` Prakash Punnoor
  0 siblings, 0 replies; 25+ messages in thread
From: Prakash Punnoor @ 2007-09-15 11:30 UTC (permalink / raw)
  To: Jon Ivar Rykkelid; +Cc: Robert Hancock, Tejun Heo, Jeff Garzik, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 436 bytes --]

On the day of Saturday 15 September 2007 Jon Ivar Rykkelid hast written:
> Do you get the same error messages that I do if you're running without
> "acpi_use_timer_override" (this is how it is spelled, isn't it?)

I don't remember which messages I got, but for me the kernel didn't boot
with certain versions. And yes, you spelled it correctly.
-- 
(°=                 =°)
//\ Prakash Punnoor /\\
V_/                 \_V

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: sata_nv issues with MCP51 SATA controller
  2007-09-15 10:14                 ` Jon Ivar Rykkelid
@ 2007-09-15 14:47                   ` John Stoffel
  2007-09-15 19:29                     ` Jon Ivar Rykkelid
  0 siblings, 1 reply; 25+ messages in thread
From: John Stoffel @ 2007-09-15 14:47 UTC (permalink / raw)
  To: Jon Ivar Rykkelid
  Cc: linux-kernel, Prakash Punnoor, Robert Hancock, Tejun Heo,
	Jeff Garzik

>>>>> "Jon" == Jon Ivar Rykkelid <jonry@pvv.org> writes:

Jon> Prakash Punnoor wrote:
>> I don't have exactly the same hardware, but I have the same chipset and I
>> don't have any problems, even with the swncq patch applied. Do you have an
>> HPET? If not, try booting with acpi_use_timer_override. My system won't work
>> without the override.

Jon> Hi, I reconnected the disks and rebooted with the kernel option
Jon> "acpi_use_timer_override" (this is the correct spelling, isn't it?
Jon> The kernel didn't complain). It didn't help; I got the same error as
Jon> before. I'll have to connect all disks back to my PCI-connected SATA
Jon> controllers and start rebuilding my RAID yet again.

What happens when you have just ONE disk connected to the motherboard
controller, and the rest connected to PCI controllers?  Does it crap
out then?  You've got such a nicely repeatable problem across
motherboards that it's a shame to waste this debugging time.

I'm wondering if it's a PCI bus issue somehow, and that the load on
the motherboard controller isn't supportable when you have a bunch of
disks on PCI controllers as well.  Shot in the dark...

Thanks for all your hard work on this, I know how frustrating it is to
not have a stable system!

John

* Re: sata_nv issues with MCP51 SATA controller
  2007-09-15 14:47                   ` John Stoffel
@ 2007-09-15 19:29                     ` Jon Ivar Rykkelid
  0 siblings, 0 replies; 25+ messages in thread
From: Jon Ivar Rykkelid @ 2007-09-15 19:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: John Stoffel, Prakash Punnoor, Robert Hancock, Tejun Heo,
	Jeff Garzik

John Stoffel wrote:
> What happens when you have just ONE disk connected to the motherboard
> controller, and the rest connected to PCI controllers?  Does it crap
> out then?  You've got such a nicely repeatable problem across
> motherboards that it's a shame to waste this debugging time.
>   
Sorry, I gave in. I have now abandoned my NVIDIA trials (both
motherboards have been returned, and I'm now running an Intel chipset).
My current motherboard is less ideal in terms of PCI slots etc., but on
the other hand it works...
> I'm wondering if it's a PCI bus issue somehow, and that the load on
> the motherboard controller isn't supportable when you have a bunch of
> disks on PCI controllers as well.  Shot in the dark...
>   
That was actually not a bad idea... Unfortunately it's too late now
(otherwise I would certainly have tested it). After all, I was (and still
am) running an 8-disk SATA array, plus a normal IDE disk that is not in
the RAID. I had 4 disks running through two PCI cards and 4 disks on the
motherboard's controller. When all 8 disks were connected to the two PCI
cards, throughput dropped compared to when the motherboard's controller
took some of the load. (So maybe it is a bandwidth/load issue? I don't
know.)
> Thanks for all your hard work on this, I know how frustrating it is to
> not have a stable system!
>   
Sorry for giving in, but I felt I was banging my head against the wall
(with too few sensible solutions being suggested). For now I'm semi-happy
that everything seems to work OK with the Intel chipset. It's frustrating
that the sata_nv driver / NVIDIA hardware didn't work with my
configuration, though...

Thank you all for your effort as well; I hope someone figures this out
sometime in the future.

All the best
Jon Ivar

Thread overview: 25+ messages
2007-09-13  7:18 sata_nv issues with MCP51 SATA controller Jon Ivar Rykkelid
2007-09-13  9:16 ` Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2007-09-13  7:46 Jon Ivar Rykkelid
2007-09-13 14:20 ` Jeff Garzik
2007-09-13 15:05   ` Jon Ivar Rykkelid
2007-09-13 15:14     ` Tejun Heo
2007-09-13 18:01       ` Jon Ivar Rykkelid
2007-09-13 19:26         ` Jon Ivar Rykkelid
2007-09-13 19:54           ` Jeff Garzik
2007-09-13 21:15             ` Jon Ivar Rykkelid
2007-09-14  0:37             ` Robert Hancock
2007-09-14 12:10               ` Jon Ivar Rykkelid
2007-09-14 13:29         ` Prakash Punnoor
2007-09-14 14:17           ` Jon Ivar Rykkelid
2007-09-14 14:25             ` Jeff Garzik
2007-09-14 14:39               ` Tejun Heo
     [not found]                 ` <46EAAA9A.1020903@pvv.org>
2007-09-14 15:58                   ` Jeff Garzik
2007-09-14 18:38                     ` Jon Ivar Rykkelid
2007-09-14 20:24                       ` auxsvr
2007-09-14 20:35             ` Jon Ivar Rykkelid
2007-09-15  7:12               ` Prakash Punnoor
2007-09-15 10:14                 ` Jon Ivar Rykkelid
2007-09-15 14:47                   ` John Stoffel
2007-09-15 19:29                     ` Jon Ivar Rykkelid
     [not found]                 ` <46EBA82C.6050000@pvv.org>
2007-09-15 11:30                   ` Prakash Punnoor
