* CK804 SATA Errors (still got them)
From: Alistair John Strachan @ 2007-03-01 13:39 UTC
To: Robert Hancock; +Cc: Jeff Garzik, linux-kernel
Hi Robert,
Despite all the work that went into making these less frequent with ADMA,
they're still possible to trigger.
alistair@damocles:~$ cat /proc/version
Linux version 2.6.21-rc2-damocles (root@damocles) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Wed Feb 28 21:58:41 GMT 2007
alistair@damocles:~$ dmesg | tail -n 13
ata1: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x500 next cpb count 0x0 next cpb idx 0x0
ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd ca/00:38:ae:08:c2/00:00:00:00:00/e0 tag 0 cdb 0x0 data 28672 out
res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
These cause the same ~30 second stalls. Machine was not under load.
No 3rd party modules were loaded.
--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

* Re: CK804 SATA Errors (still got them)
From: Robert Hancock @ 2007-03-01 14:45 UTC
To: Alistair John Strachan; +Cc: Jeff Garzik, linux-kernel

Alistair John Strachan wrote:
> Hi Robert,
>
> Despite all the work that went into making these less frequent with ADMA,
> they're still possible to trigger.
>
> alistair@damocles:~$ cat /proc/version
> Linux version 2.6.21-rc2-damocles (root@damocles) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Wed Feb 28 21:58:41 GMT 2007
>
> alistair@damocles:~$ dmesg | tail -n 13
> ata1: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x500 next cpb count 0x0 next cpb idx 0x0
> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd ca/00:38:ae:08:c2/00:00:00:00:00/e0 tag 0 cdb 0x0 data 28672 out
> res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1: soft resetting port
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata1.00: configured for UDMA/133
> ata1: EH complete
> SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
> sda: Write Protect is off
> sda: Mode Sense: 00 3a 00 00
> SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> These cause the same ~30 second stalls. Machine was not under load.
>
> No 3rd party modules were loaded.

This one seems a bit different. This time it's not related to NCQ vs.
non-NCQ (this is a non-NCQ write here), it's in ADMA mode (so it's
presumably not related to switching between ADMA and register mode,
unless perhaps a flush cache or something executed just before), and
from the CPB data it appears the command completed but the controller's
registers aren't indicating that it has. Not sure if I've seen one like
that before..

How easily can you reproduce this?

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
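
For readers following Robert's reading of the log: the "cmd" line printed by libata's error handler is the taskfile that timed out, laid out as command/feature:nsect:lbal:lbam:lbah, then five "hob" bytes, then the device register. The C sketch below decodes that line. The struct and function names (tf_dump, tf_parse and so on) are invented for illustration and are not kernel API; the opcode names come from the ATA command set. Decoding the first report gives opcode 0xca (WRITE DMA, a non-NCQ command) with 0x38 = 56 sectors, which agrees with the "data 28672 out" on the same line.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One "cmd" or "res" dump as printed by libata's error handler.
 * Illustrative type, not a kernel structure. */
struct tf_dump {
	unsigned command, feature, nsect, lbal, lbam, lbah;
	unsigned hob[5];	/* hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah */
	unsigned device;
};

/* Parse e.g. "ca/00:38:ae:08:c2/00:00:00:00:00/e0".
 * Returns 0 on success, -1 if the text does not match the layout. */
static int tf_parse(const char *s, struct tf_dump *tf)
{
	assert(s && tf);
	int n = sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
		       &tf->command, &tf->feature, &tf->nsect,
		       &tf->lbal, &tf->lbam, &tf->lbah,
		       &tf->hob[0], &tf->hob[1], &tf->hob[2],
		       &tf->hob[3], &tf->hob[4], &tf->device);
	return n == 12 ? 0 : -1;
}

/* LBA28 addressing: bits 27..24 sit in the low nibble of the device register. */
static uint32_t tf_lba28(const struct tf_dump *tf)
{
	return ((uint32_t)(tf->device & 0x0f) << 24) |
	       ((uint32_t)tf->lbah << 16) |
	       ((uint32_t)tf->lbam << 8) |
	       (uint32_t)tf->lbal;
}

/* Sector count for 28-bit commands; a count byte of 0 means 256 sectors. */
static unsigned tf_sectors(const struct tf_dump *tf)
{
	return tf->nsect ? tf->nsect : 256;
}

/* Name the handful of opcodes relevant to this thread. */
static const char *tf_cmd_name(const struct tf_dump *tf)
{
	switch (tf->command) {
	case 0xc8: return "READ DMA";
	case 0xca: return "WRITE DMA";
	case 0x60: return "READ FPDMA QUEUED (NCQ)";
	case 0x61: return "WRITE FPDMA QUEUED (NCQ)";
	default:   return "other";
	}
}
```

For example, calling tf_parse("c8/00:80:85:c4:ed/00:00:00:00:00/e3", &tf) and then tf_cmd_name(&tf) names the command from the second report further down as READ DMA, again a non-NCQ command, with tf_sectors(&tf) * 512 matching its "data 65536 in".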

* Re: CK804 SATA Errors (still got them)
From: Alistair John Strachan @ 2007-03-01 15:13 UTC
To: Robert Hancock; +Cc: Jeff Garzik, linux-kernel

On Thursday 01 March 2007 14:45, Robert Hancock wrote:
> This one seems a bit different. This time it's not related to NCQ vs.
> non-NCQ (this is a non-NCQ write here), it's in ADMA mode (so it's
> presumably not related to switching between ADMA and register mode,
> unless perhaps a flush cache or something executed just before), and
> from the CPB data it appears the command completed but the controller's
> registers aren't indicating that it has. Not sure if I've seen one like
> that before..
>
> How easily can you reproduce this?

It's the first one since -rc2, so apparently not easily. I'm more than
willing to find loads that expose it, though, so I might try that this
afternoon.

--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

* Re: CK804 SATA Errors (still got them)
From: Alistair John Strachan @ 2007-03-02 1:20 UTC
To: Robert Hancock; +Cc: Jeff Garzik, linux-kernel

On Thursday 01 March 2007 15:13, Alistair John Strachan wrote:
> On Thursday 01 March 2007 14:45, Robert Hancock wrote:
> > This one seems a bit different. This time it's not related to NCQ vs.
> > non-NCQ (this is a non-NCQ write here), it's in ADMA mode (so it's
> > presumably not related to switching between ADMA and register mode,
> > unless perhaps a flush cache or something executed just before), and
> > from the CPB data it appears the command completed but the controller's
> > registers aren't indicating that it has. Not sure if I've seen one like
> > that before..
> >
> > How easily can you reproduce this?
>
> It's the first one since -rc2, so apparently not easily. I'm more than
> willing to find loads that expose it, though, so I might try that this
> afternoon.

Got another:

ata2: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x500 next cpb count 0x0 next cpb idx 0x0
ata2: CPB 0: ctl_flags 0xd, resp_flags 0x1
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd c8/00:80:85:c4:ed/00:00:00:00:00/e3 tag 0 cdb 0x0 data 65536 in
res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Different HD, similar problem.

--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
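
Both reports carry the same CPB dump, "ctl_flags 0xd, resp_flags 0x1", which is the data Robert refers to when he says the command appears to have completed. The sketch below decodes those two values. The bit positions are written from memory of the NV_CPB_* constants in drivers/ata/sata_nv.c and are assumptions to be checked against your own tree; the helper names are invented for this note and do not exist in the driver.

```c
/* CPB flag bits. ASSUMPTION: values recalled from sata_nv.c, not verified
 * against the 2.6.21-rc2 source discussed in this thread. */
enum {
	/* response flags, written by the controller */
	CPB_RESP_DONE      = 1 << 0,
	CPB_RESP_ATA_ERR   = 1 << 3,
	CPB_RESP_CMD_ERR   = 1 << 4,
	CPB_RESP_CPB_ERR   = 1 << 7,

	/* control flags, written by the driver */
	CPB_CTL_CPB_VALID  = 1 << 0,
	CPB_CTL_QUEUE      = 1 << 1,
	CPB_CTL_APRD_VALID = 1 << 2,
	CPB_CTL_IEN        = 1 << 3,
	CPB_CTL_FPDMA      = 1 << 4
};

/* 1 if the controller marked the CPB done and flagged no error. */
static int cpb_completed_ok(unsigned resp_flags)
{
	unsigned err = CPB_RESP_ATA_ERR | CPB_RESP_CMD_ERR | CPB_RESP_CPB_ERR;

	return (resp_flags & CPB_RESP_DONE) && !(resp_flags & err);
}

/* 1 if the CPB was submitted as a queued (NCQ) command. */
static int cpb_is_ncq(unsigned ctl_flags)
{
	return (ctl_flags & (CPB_CTL_QUEUE | CPB_CTL_FPDMA)) != 0;
}
```

Under those assumed bit values, cpb_completed_ok(0x1) is true and cpb_is_ncq(0xd) is false, which is consistent with Robert's description of a non-NCQ command that the CPB reports as complete even though the port timed out.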

* Re: CK804 SATA Errors (still got them)
From: Robert Hancock @ 2007-03-02 2:40 UTC
To: Alistair John Strachan; +Cc: Jeff Garzik, linux-kernel

Alistair John Strachan wrote:
> On Thursday 01 March 2007 15:13, Alistair John Strachan wrote:
>> On Thursday 01 March 2007 14:45, Robert Hancock wrote:
>>> This one seems a bit different. This time it's not related to NCQ vs.
>>> non-NCQ (this is a non-NCQ write here), it's in ADMA mode (so it's
>>> presumably not related to switching between ADMA and register mode,
>>> unless perhaps a flush cache or something executed just before), and
>>> from the CPB data it appears the command completed but the controller's
>>> registers aren't indicating that it has. Not sure if I've seen one like
>>> that before..
>>>
>>> How easily can you reproduce this?
>> It's the first one since -rc2, so apparently not easily. I'm more than
>> willing to find loads that expose it, though, so I might try that this
>> afternoon.
>
> Got another:
>
> ata2: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x500 next cpb count 0x0 next cpb idx 0x0
> ata2: CPB 0: ctl_flags 0xd, resp_flags 0x1
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata2.00: cmd c8/00:80:85:c4:ed/00:00:00:00:00/e3 tag 0 cdb 0x0 data 65536 in
> res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2: soft resetting port
> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata2.00: configured for UDMA/133
> ata2: EH complete
> SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
> sdb: Write Protect is off
> sdb: Mode Sense: 00 3a 00 00
> SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> Different HD, similar problem.

Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
(link below) and see what effect that has?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=721449bf0d51213fe3abf0ac3e3561ef9ea7827a

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

* Re: CK804 SATA Errors (still got them)
From: Alistair John Strachan @ 2007-03-02 15:47 UTC
To: Robert Hancock; +Cc: Jeff Garzik, linux-kernel

On Friday 02 March 2007 02:40, Robert Hancock wrote:
> Alistair John Strachan wrote:
> > On Thursday 01 March 2007 15:13, Alistair John Strachan wrote:
> >> On Thursday 01 March 2007 14:45, Robert Hancock wrote:
> >>> This one seems a bit different. This time it's not related to NCQ vs.
> >>> non-NCQ (this is a non-NCQ write here), it's in ADMA mode (so it's
> >>> presumably not related to switching between ADMA and register mode,
> >>> unless perhaps a flush cache or something executed just before), and
> >>> from the CPB data it appears the command completed but the controller's
> >>> registers aren't indicating that it has. Not sure if I've seen one like
> >>> that before..
> >>>
> >>> How easily can you reproduce this?
> >>
> >> It's the first one since -rc2, so apparently not easily. I'm more than
> >> willing to find loads that expose it, though, so I might try that this
> >> afternoon.
> >
> > Got another:
> >
> > ata2: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000
> > status 0x500 next cpb count 0x0 next cpb idx 0x0 ata2: CPB 0: ctl_flags
> > 0xd, resp_flags 0x1
> > ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> > ata2.00: cmd c8/00:80:85:c4:ed/00:00:00:00:00/e3 tag 0 cdb 0x0 data 65536
> > in res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) ata2: soft
> > resetting port
> > ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > ata2.00: configured for UDMA/133
> > ata2: EH complete
> > SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
> > sdb: Write Protect is off
> > sdb: Mode Sense: 00 3a 00 00
> > SCSI device sdb: write cache: enabled, read cache: enabled, doesn't
> > support DPO or FUA
> >
> > Different HD, similar problem.
>
> Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
> (link below) and see what effect that has?
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h
>=721449bf0d51213fe3abf0ac3e3561ef9ea7827a

Obviously, I'll let you know if it happens again, but I've reverted this
commit and transferred 22.5GB over 45 minutes onto a RAID5 with 4 HDs on an
NVIDIA SATA controller, and this error hasn't appeared.

So I'm inclined to (very unscientifically) say that this brings it back to
2.6.20's level of stability.

--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

* Re: CK804 SATA Errors (still got them)
From: Robert Hancock @ 2007-03-04 23:25 UTC
To: Alistair John Strachan; +Cc: Jeff Garzik, linux-kernel

Alistair John Strachan wrote:
>> Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
>> (link below) and see what effect that has?
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h
>> =721449bf0d51213fe3abf0ac3e3561ef9ea7827a
>
> Obviously, I'll let you know if it happens again, but I've reverted this
> commit and transferred 22.5GB over 45 minutes onto a RAID5 with 4 HDs on an
> NVIDIA SATA controller, and this error hasn't appeared.
>
> So I'm inclined to (very unscientifically) say that this brings it back to
> 2.6.20's level of stability.

Interesting. Can you try un-reverting that patch, and applying this one?

The reading of the status register is something that was part of the original
NVidia code, and I'm not really sure why it is there. Given that reading the
status register clears the drive's interrupt status, that might be causing
some weird interaction with the ADMA controller. Also, I added in a printk for
cases where notifiers are triggered but the command doesn't indicate
completion - if you still get problems, let me know if you see that message.

--- linux-2.6.21-rc2-git3/drivers/ata/sata_nv.c	2007-03-04 14:44:05.000000000 -0600
+++ linux-2.6.21-rc2-git3edit/drivers/ata/sata_nv.c	2007-03-04 17:09:06.000000000 -0600
@@ -745,10 +745,10 @@
 		/* Grab the ATA port status for non-NCQ commands.
 		   For NCQ commands the current status may have
 		   nothing to do with the command just completed. */
-		if (qc->tf.protocol != ATA_PROT_NCQ) {
+/*		if (qc->tf.protocol != ATA_PROT_NCQ) {
 			u8 ata_status = readb(pp->ctl_block + (ATA_REG_STATUS * 4));
 			qc->err_mask |= ac_err_mask(ata_status);
-		}
+		}*/
 		DPRINTK("Completing qc from tag %d with err_mask %u\n",cpb_num,
 			qc->err_mask);
 		ata_qc_complete(qc);
@@ -764,6 +764,9 @@
 			ata_port_freeze(ap);
 			return 1;
 		}
+	} else {
+		ata_port_printk(ap, KERN_WARNING, "notifier for tag %d but not complete?\n",
+			cpb_num);
 	}
 	return 0;
 }

* Re: CK804 SATA Errors (still got them)
From: Alistair John Strachan @ 2007-03-04 23:41 UTC
To: Robert Hancock; +Cc: Jeff Garzik, linux-kernel

On Sunday 04 March 2007 23:25, Robert Hancock wrote:
> Alistair John Strachan wrote:
> >> Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
> >> (link below) and see what effect that has?
> >>
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commi
> >>t;h =721449bf0d51213fe3abf0ac3e3561ef9ea7827a
> >
> > Obviously, I'll let you know if it happens again, but I've reverted this
> > commit and transferred 22.5GB over 45 minutes onto a RAID5 with 4 HDs on
> > an NVIDIA SATA controller, and this error hasn't appeared.
> >
> > So I'm inclined to (very unscientifically) say that this brings it back
> > to 2.6.20's level of stability.
>
> Interesting. Can you try un-reverting that patch, and applying this one?

Sorry for the newbie question, but is it adequate to do a:

	git reset --hard v2.6.21-rc2

To ensure a patch is "unreverted" (I reverted it with "git revert"), before
applying your patch?

I've done so now, assuming this _will_ work. The reason I ask is that your
diff was offset by 12 lines versus -rc2.

--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

* Re: CK804 SATA Errors (still got them)
From: Robert Hancock @ 2007-03-04 23:49 UTC
To: Alistair John Strachan; +Cc: Jeff Garzik, linux-kernel

Alistair John Strachan wrote:
>> Interesting. Can you try un-reverting that patch, and applying this one?
>
> Sorry for the newbie question, but is it adequate to do a:
>
> git reset --hard v2.6.21-rc2
>
> To ensure a patch is "unreverted" (I reverted it with "git revert"), before
> applying your patch?
>
> I've done so now, assuming this _will_ work. The reason I ask is that your
> diff was offset by 12 lines versus -rc2.

I assume it's OK, though I'm not a git expert. I diffed against rc2-git3
which has some CONFIG_PM ifdef changes, those shouldn't be important though.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

* Re: CK804 SATA Errors (still got them)
From: Jeff Garzik @ 2007-03-04 23:50 UTC
To: Alistair John Strachan; +Cc: Robert Hancock, linux-kernel

Alistair John Strachan wrote:
> On Sunday 04 March 2007 23:25, Robert Hancock wrote:
>> Alistair John Strachan wrote:
>>>> Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
>>>> (link below) and see what effect that has?
>>>>
>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commi
>>>> t;h =721449bf0d51213fe3abf0ac3e3561ef9ea7827a
>>> Obviously, I'll let you know if it happens again, but I've reverted this
>>> commit and transferred 22.5GB over 45 minutes onto a RAID5 with 4 HDs on
>>> an NVIDIA SATA controller, and this error hasn't appeared.
>>>
>>> So I'm inclined to (very unscientifically) say that this brings it back
>>> to 2.6.20's level of stability.
>> Interesting. Can you try un-reverting that patch, and applying this one?
>
> Sorry for the newbie question, but is it adequate to do a:
>
> git reset --hard v2.6.21-rc2
>
> To ensure a patch is "unreverted" (I reverted it with "git revert"), before
> applying your patch?
>
> I've done so now, assuming this _will_ work. The reason I ask is that your
> diff was offset by 12 lines versus -rc2.

If you committed the revert to the repository, it's probably easiest to blow
it away and re-clone.

Generally, with git, you want to keep a pristine,
never-touched-except-for-pulling kernel repository around, and then when
doing compiles and experiments and such, run

	git-clone --reference my-vanilla-2.6-repo $URL

The --reference argument will ensure that you don't haul around multiple
copies of the repository objects, with each clone.

Otherwise, if you have committed nothing to the repository, this will
undo all your not-committed changes:

	git checkout -f

Regards,

	Jeff

* Re: CK804 SATA Errors (still got them)
From: Jeff Garzik @ 2007-03-04 23:46 UTC
To: Robert Hancock
Cc: Alistair John Strachan, linux-kernel, IDE/ATA development list

Robert Hancock wrote:
> Alistair John Strachan wrote:
>>> Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
>>> (link below) and see what effect that has?
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h
>>>
>>> =721449bf0d51213fe3abf0ac3e3561ef9ea7827a
>>
>> Obviously, I'll let you know if it happens again, but I've reverted
>> this commit and transferred 22.5GB over 45 minutes onto a RAID5 with 4
>> HDs on an NVIDIA SATA controller, and this error hasn't appeared.
>>
>> So I'm inclined to (very unscientifically) say that this brings it
>> back to 2.6.20's level of stability.
>
> Interesting. Can you try un-reverting that patch, and applying this one?
>
> The reading of the status register is something that was part of the
> original NVidia code, and I'm not really sure why it is there. Given that
> reading the status register clears the drive's interrupt status, that
> might be causing some weird interaction with the ADMA controller. Also, I
> added in a printk for cases where notifiers are triggered but the command
> doesn't indicate completion - if you still get problems, let me know if
> you see that message.

AFAICS, when in ADMA mode, you absolutely should not touch the ATA
shadow registers at all.

This is normal for all controllers with both a "legacy mode" and an
"enhanced DMA mode" of some sort: the internal silicon state machines
"own" the ATA shadow registers while in enhanced DMA mode. Reading or
writing the ATA shadow registers while in enhanced DMA mode can lead to
undefined results, running the gamut from no-op to data corruption and
hardware lock-ups.

You may only access the ATA shadow registers when NV_ADMA_CTL_GO is
cleared, and then NV_ADMA_STAT_LEGACY is set, indicating the NVIDIA chip
is in register mode (aka legacy mode).

	Jeff
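
Jeff's rule can be written down as a guard around any shadow-register access. The sketch below is a userspace model, not driver code: struct mock_port and both functions are hypothetical, and the two bit positions are assumptions recalled from sata_nv.c that should be checked against the real NV_ADMA_CTL_GO and NV_ADMA_STAT_LEGACY definitions. It only illustrates the condition he states: GO cleared, and LEGACY set.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED bit positions; verify against drivers/ata/sata_nv.c. */
enum {
	ADMA_CTL_GO      = 1 << 7,	/* ADMA engine running */
	ADMA_STAT_LEGACY = 1 << 9	/* controller reports register (legacy) mode */
};

/* Hypothetical stand-in for the per-port register state. */
struct mock_port {
	uint16_t adma_ctl;	/* models the ADMA control register */
	uint16_t adma_stat;	/* models the ADMA status register */
	uint8_t  shadow_status;	/* models the ATA shadow status register */
};

/* 1 only when GO is cleared and the chip has confirmed legacy mode. */
static int shadow_regs_accessible(const struct mock_port *p)
{
	return !(p->adma_ctl & ADMA_CTL_GO) &&
	       (p->adma_stat & ADMA_STAT_LEGACY) != 0;
}

/* Read the shadow status register only if it is safe to do so.
 * Returns 0 and stores the value in *out, or -1 while in ADMA mode. */
static int read_shadow_status(const struct mock_port *p, uint8_t *out)
{
	assert(p && out);
	if (!shadow_regs_accessible(p))
		return -1;
	*out = p->shadow_status;
	return 0;
}
```

A caller that gets -1 back would have to switch the port to register mode first, or do without the status byte, rather than read it anyway; the sketch makes no claim about how the real driver performs that mode switch.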

* Re: CK804 SATA Errors (still got them)
From: Alistair John Strachan @ 2007-03-05 3:52 UTC
To: Robert Hancock; +Cc: Jeff Garzik, linux-kernel

On Sunday 04 March 2007 23:25, Robert Hancock wrote:
> Alistair John Strachan wrote:
> >> Can you try reverting commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a
> >> (link below) and see what effect that has?
> >>
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commi
> >>t;h =721449bf0d51213fe3abf0ac3e3561ef9ea7827a
> >
> > Obviously, I'll let you know if it happens again, but I've reverted this
> > commit and transferred 22.5GB over 45 minutes onto a RAID5 with 4 HDs on
> > an NVIDIA SATA controller, and this error hasn't appeared.
> >
> > So I'm inclined to (very unscientifically) say that this brings it back
> > to 2.6.20's level of stability.
>
> Interesting. Can you try un-reverting that patch, and applying this one?
>
> The reading of the status register is something that was part of the
> original NVidia code, and I'm not really sure why it is there. Given that
> reading the status register clears the drive's interrupt status, that might
> be causing some weird interaction with the ADMA controller. Also, I added
> in a printk for cases where notifiers are triggered but the command doesn't
> indicate completion - if you still get problems, let me know if you see
> that message.

Didn't take long to observe the problem again, so I'm guessing that this
isn't it. I was definitely using a kernel compiled with your patch:

alistair@damocles:~$ uname -v
#1 SMP Sun Mar 4 23:39:56 GMT 2007

I got the following in dmesg:

ata1: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x500 next cpb count 0x0 next cpb idx 0x0
ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd c8/00:08:37:77:61/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Your debugging message did not appear in dmesg, however.

--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.