* Re: help with PMP failures
From: Marc MERLIN @ 2009-11-17 17:39 UTC
To: Tejun Heo; +Cc: Tejun Heo, linux-ide
On Tue, Nov 17, 2009 at 02:47:24PM +0900, Tejun Heo wrote:
> Hello,
>
> Can you please cc linux-ide@vger.kernel.org?
Absolutely, didn't know it was good for PMP too. Done.
> > Nov 2 17:03:17 gargamel kernel: ata6.15: exception Emask 0x100 SAct 0x0 SErr 0x200000 action 0x6 frozen
> > Nov 2 17:03:17 gargamel kernel: ata6.15: irq_stat 0x02060002, PMP DMA CS errata
>
> Command execution error reported.
>
> Sil3124/32 has an errata which worsens PMP error handling quite a bit.
> Its DMA context gets corrupted if a failure occurs while commands are in
> flight to 3 or more devices, so the driver has to abort all commands
> immediately.
gotcha
> This is the actual failure. Your 6.02 drive reported a media error,
> which combined with the controller errata caused a port-wide failure.
Ah, I see, so it should be the one for me to focus on.
If it hadn't had that error, everything else wouldn't have gone down the
toilet next, right?
scsi 6:2:0:0: Direct-Access ATA Hitachi HDS72101 GKAO PQ: 0 ANSI: 5
sd 6:2:0:0: [sdj] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
If it's a media error, shouldn't it show up in the smart counters?
=== START OF INFORMATION SECTION ===
Model Family: Hitachi Deskstar 7K1000
Device Model: Hitachi HDS721010KLA330
Serial Number: GTJ000PAG2JLKC
Firmware Version: GKAOA70F
User Capacity: 1,000,204,886,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 1
Local Time is: Tue Nov 17 09:32:47 2009 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 130 130 054 Pre-fail Offline - 150
3 Spin_Up_Time 0x0007 105 105 024 Pre-fail Always - 662 (Average 662)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 179
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 1
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 132 132 020 Pre-fail Offline - 33
9 Power_On_Hours 0x0012 098 098 000 Old_age Always - 18566
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 92
192 Power-Off_Retract_Count 0x0032 061 061 000 Old_age Always - 47436
193 Load_Cycle_Count 0x0012 061 061 000 Old_age Always - 47436
194 Temperature_Celsius 0x0002 125 125 000 Old_age Always - 48 (Lifetime Min/Max 20/63)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 1
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 359
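(For reference, here is roughly how I pull out just the attributes that
normally track media problems; the /dev/sdj path is an assumption for the
drive in question:)
# sketch: show only the media-error-related counters from the dump above
smartctl -A /dev/sdj | grep -E 'Reallocated|Pending|Uncorrectable|UDMA_CRC'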
> The device gets kicked out of the system, so the errors follow. I have
> no idea why ata6.00 decided to stop responding. It might be a
> firmware bug, or the PMP might be malfunctioning. If this happens again,
> you can verify that by detaching the offending drive from the PMP without
> disconnecting power (the drive stays powered up), connecting it to a
> different port, and seeing whether it works. If it doesn't, it means
> the firmware on the drive is firmly hung and will require a power cycle
> to get working again. Earlier SATA drives, and a few recent ones,
> sometimes do this after certain failures.
I can't really move it to another PMP port but I have indeed had failures
that required not just a reboot of my server but an actual power cycle
of the drive.
> Anyway, if my guess is right, the sequence of events is: first, the
> drive with the bad sector led to EH kicking in abruptly due to the
> controller errata, which in turn caused another drive to lock up due to
> its firmware problem.
Ok, so this all sounds like it's a bit fragile due to hardware issues :)
I now have to figure out if /dev/sdj has a bad sector or not.
Last time I had this happen, though, I did run
dd if=/dev/drive of=/dev/null bs=1M
for my 5 drives, and it ran clean.
If I had a bad sector, shouldn't it show up in Current_Pending_Sector
and shouldn't reading the entire drive with dd fail?
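(A sketch of the check I'd redo to answer that; the dd line is the same one
as above, while the drive list and the smartctl re-check are assumptions:)
# re-read each array member end to end, keep dd's summary lines, then
# re-check the sector counters afterwards
for d in sdf sdg sdh sdi sdj; do
    echo "=== /dev/$d ==="
    dd if=/dev/$d of=/dev/null bs=1M 2>&1 | tail -3
    smartctl -A /dev/$d | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector'
done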
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems & security ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: help with PMP failures
From: Tejun Heo @ 2009-11-18 4:03 UTC
To: Marc MERLIN; +Cc: Tejun Heo, linux-ide
Hello,
11/18/2009 02:39 AM, Marc MERLIN wrote:
>> This is the actual failure. Your 6.02 drive reported a media error,
>> which combined with the controller errata caused a port-wide failure.
>
> Ah, I see, so it should be the one for me to focus on.
> If it hadn't had that error, everything else wouldn't have gone down the
> toilet next, right?
Yes, that's my guess.
> scsi 6:2:0:0: Direct-Access ATA Hitachi HDS72101 GKAO PQ: 0 ANSI: 5
> sd 6:2:0:0: [sdj] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
>
> If it's a media error, shouldn't it show up in the smart counters?
Does the smartctl -a output show any logged errors?
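(smartctl -a includes the drive's error log; if the full dump is too long,
something like this should show just the error log and self-test results,
with the device path being an assumption:)
smartctl -l error /dev/sdj      # ATA error log only
smartctl -l selftest /dev/sdj   # self-test results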
> I can't really move it to another PMP port but I have indeed had failures
> that required not just a reboot of my server but an actual power cycle
> of the drive.
Yeah, some old drives do that after being abruptly aborted while executing
commands. :-(
> Ok, so this all sounds like it's a bit fragile due to hardware issues :)
>
> I now have to figure out if /dev/sdj has a bad sector or not.
>
> Last time I had this happen, though, I did run
> dd if=/dev/drive of=/dev/null bs=1M
> for my 5 drives, and it ran clean.
>
> If I had a bad sector, shouldn't it show up in Current_Pending_Sector
> and shouldn't reading the entire drive with dd fail?
I'm not sure which SMART counter would be affected. It also depends
on the firmware implementation: a read error might happen on one trial
but not on the next (if the drive for some reason didn't relocate the
failed sector), or maybe the drive is continuously developing bad
sectors.
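(If you want the drive itself to re-scan the whole media, an extended
self-test is one way to do it; just a sketch, device path assumed:)
smartctl -t long /dev/sdj       # start an extended offline self-test
# once it finishes (smartctl prints a duration estimate), check the result:
smartctl -l selftest /dev/sdj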
Thanks.
--
tejun
* Re: help with PMP failures
From: Marc MERLIN @ 2009-11-18 7:41 UTC
To: Tejun Heo; +Cc: Tejun Heo, linux-ide
On Wed, Nov 18, 2009 at 01:03:00PM +0900, Tejun Heo wrote:
> Hello,
>
> 11/18/2009 02:39 AM, Marc MERLIN wrote:
> >> This is the actual failure. Your 6.02 drive reported a media error,
> >> which combined with the controller errata caused a port-wide failure.
> >
> > Ah, I see, so it should be the one for me to focus on.
> > If it hadn't had that error, everything else wouldn't have gone down the
> > toilet next, right?
>
> Yes, that's my guess.
>
> > scsi 6:2:0:0: Direct-Access ATA Hitachi HDS72101 GKAO PQ: 0 ANSI: 5
> > sd 6:2:0:0: [sdj] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
> >
> > If it's a media error, shouldn't it show up in the smart counters?
>
> Does the smartctl -a output show any logged errors?
Didn't know about -a, good call.
Yep, they do, and the times are roughly consistent with the raid going down.
Not sure if the last 5 errors are enough to give a good clue, or not.
If I can't quite figure it out, I'll just pop in a 16-port Adaptec SATA
board I recently picked up. It'll take the PMP out of the equation if I truly
have drives returning read/write errors that don't quite seem to show up
in SMART yet.
9 Power_On_Hours 0x0012 098 098 000 Old_age Always - 18580
SMART Error Log Version: 1
ATA Error Count: 47 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 47 occurred at disk power-on lifetime: 18547 hours (772 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 46 59 70 44
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 00 3f 59 70 40 08 13d+19:41:56.500 WRITE FPDMA QUEUED
ea 00 00 00 00 00 a0 08 13d+19:41:56.500 FLUSH CACHE EXT
ea 00 00 00 00 00 a0 08 13d+19:41:56.200 FLUSH CACHE EXT
61 08 00 3f 59 70 40 08 13d+19:41:56.200 WRITE FPDMA QUEUED
ea 00 00 00 00 00 a0 08 13d+19:41:56.200 FLUSH CACHE EXT
Error 46 occurred at disk power-on lifetime: 18546 hours (772 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 46 59 70 44
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 00 3f 59 70 40 08 13d+19:27:26.400 WRITE FPDMA QUEUED
27 00 00 00 00 00 e0 08 13d+19:27:26.400 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08 13d+19:27:26.400 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 08 13d+19:27:26.400 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 08 13d+19:27:26.400 READ NATIVE MAX ADDRESS EXT
Error 45 occurred at disk power-on lifetime: 18546 hours (772 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 46 59 70 44
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 00 3f 59 70 40 08 13d+19:27:26.100 WRITE FPDMA QUEUED
ea 00 00 00 00 00 a0 08 13d+19:27:26.100 FLUSH CACHE EXT
ea 00 00 00 00 00 a0 08 13d+19:27:25.800 FLUSH CACHE EXT
61 08 00 3f 59 70 40 08 13d+19:27:25.800 WRITE FPDMA QUEUED
27 00 00 00 00 00 e0 08 13d+19:27:25.800 READ NATIVE MAX ADDRESS EXT
Error 44 occurred at disk power-on lifetime: 18546 hours (772 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 46 59 70 44
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 00 3f 59 70 40 08 13d+19:27:24.900 WRITE FPDMA QUEUED
ea 00 00 00 00 00 a0 08 13d+19:27:24.900 FLUSH CACHE EXT
b0 d0 01 00 4f c2 00 08 13d+19:20:22.300 SMART READ DATA
b0 d8 00 00 4f c2 00 08 13d+19:20:22.100 SMART ENABLE OPERATIONS
e5 00 00 00 00 00 00 08 13d+19:20:22.100 CHECK POWER MODE
Error 43 occurred at disk power-on lifetime: 18546 hours (772 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 46 59 70 44
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 00 3f 59 70 40 08 13d+19:12:24.500 WRITE FPDMA QUEUED
ea 00 00 00 00 00 a0 08 13d+19:12:24.500 FLUSH CACHE EXT
ea 00 00 00 00 00 a0 08 13d+19:12:23.600 FLUSH CACHE EXT
61 08 00 3f 59 70 40 08 13d+19:12:23.600 WRITE FPDMA QUEUED
ea 00 00 00 00 00 a0 08 13d+19:12:23.600 FLUSH CACHE EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 18565 -
# 2 Short offline Completed without error 00% 18535 -
# 3 Short offline Completed without error 00% 18511 -
# 4 Extended offline Completed without error 00% 18492 -
# 5 Short offline Completed without error 00% 18487 -
# 6 Short offline Completed without error 00% 18463 -
# 7 Short offline Completed without error 00% 18439 -
# 8 Short offline Completed without error 00% 18415 -
# 9 Short offline Completed without error 00% 18391 -
#10 Short offline Completed without error 00% 18366 -
#11 Short offline Completed without error 00% 18343 -
#12 Extended offline Completed without error 00% 18324 -
#13 Short offline Completed without error 00% 18319 -
#14 Short offline Completed without error 00% 18295 -
#15 Short offline Completed without error 00% 18271 -
#16 Short offline Completed without error 00% 18247 -
#17 Short offline Completed without error 00% 18223 -
#18 Short offline Completed without error 00% 18199 -
#19 Short offline Completed without error 00% 18178 -
#20 Short offline Completed without error 00% 18178 -
#21 Short offline Completed without error 00% 18178 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Thanks for looking,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems & security ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: help with PMP failures
From: Tejun Heo @ 2009-11-18 8:33 UTC
To: Marc MERLIN; +Cc: Tejun Heo, linux-ide
Hello,
11/18/2009 04:41 PM, Marc MERLIN wrote:
> Didn't know about -a, good call.
> Yep, they do, and the times are roughly consistent with the raid going down.
>
> Not sure if the last 5 errors are enough to give a good clue, or not.
>
> If I can't quite figure it out, I'll just pop in a 16-port Adaptec SATA
> board I recently picked up. It'll take the PMP out of the equation if I truly
> have drives returning read/write errors that don't quite seem to show up
> in SMART yet.
>
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 84 51 00 46 59 70 44
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 61 08 00 3f 59 70 40 08 13d+19:41:56.500 WRITE FPDMA QUEUED
All the logged commands are writes, but the one which triggered the
failure was a read. The error value of 0x84 indicates ICRC and ABORT, so
all the logged commands failed due to a transmission failure from the
host. Hmmm....
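(That decode is just two bits of the ATA error register: ICRC is bit 7 and
ABRT is bit 2, so for example in a shell:)
printf '0x%02x\n' $(( 0x80 | 0x04 ))    # ICRC | ABRT = 0x84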
--
tejun
* Re: help with PMP failures
From: Marc MERLIN @ 2009-11-18 18:29 UTC
To: Tejun Heo; +Cc: Tejun Heo, linux-ide
On Wed, Nov 18, 2009 at 05:33:23PM +0900, Tejun Heo wrote:
> All the logged commands are writes, but the one which triggered the
> failure was a read. The error value of 0x84 indicates ICRC and ABORT, so
> all the logged commands failed due to a transmission failure from the
> host. Hmmm....
In case it helps, I dug up my 3rd such failure in my logs (from about a month
ago now).
This should let you confirm or contradict your earlier suspicions.
Funny how it also starts with a CRC error on the PMP.
For that matter, it's a similar failure string to the previous
one I posted.
However, this trace doesn't show any media error.
Comparing this with the last one you looked at, does it help pinpoint where
the fault might be?
Oct 17 21:18:25 gargamel kernel: ata6.00: failed to read SCR 1 (Emask=0x40)
Oct 17 21:18:25 gargamel kernel: ata6.01: failed to read SCR 1 (Emask=0x40)
Oct 17 21:18:25 gargamel kernel: ata6.02: failed to read SCR 1 (Emask=0x40)
Oct 17 21:18:25 gargamel kernel: ata6.03: failed to read SCR 1 (Emask=0x40)
Oct 17 21:18:25 gargamel kernel: ata6.04: failed to read SCR 1 (Emask=0x40)
Oct 17 21:18:25 gargamel kernel: ata6.05: failed to read SCR 1 (Emask=0x40)
Oct 17 21:18:25 gargamel kernel: ata6.15: exception Emask 0x100 SAct 0x0 SErr 0x200000 action 0x6 frozen
Oct 17 21:18:25 gargamel kernel: ata6.15: irq_stat 0x02060002, PMP DMA CS errata
Oct 17 21:18:25 gargamel kernel: ata6.15: SError: { BadCRC }
Oct 17 21:18:25 gargamel kernel: ata6.00: exception Emask 0x100 SAct 0xa SErr 0x0 action 0x6 frozen
Oct 17 21:18:25 gargamel kernel: ata6.00: cmd 60/80:08:bf:67:cc/00:00:2b:00:00/40 tag 1 ncq 65536 in
Oct 17 21:18:25 gargamel kernel: res 3c/36:00:00:00:00/cd:00:40:10:3c/00 Emask 0x2 (HSM violation)
Oct 17 21:18:25 gargamel kernel: ata6.00: status: { DF DRQ }
Oct 17 21:18:25 gargamel kernel: ata6.00: error: { IDNF ABRT }
Oct 17 21:18:25 gargamel kernel: ata6.00: cmd 60/10:18:3f:68:cc/00:00:2b:00:00/40 tag 3 ncq 8192 in
Oct 17 21:18:25 gargamel kernel: res 60/10:18:3f:68:cc/00:00:2b:00:00/40 Emask 0x81 (invalid argument)
Oct 17 21:18:25 gargamel kernel: ata6.00: status: { DRDY DF }
Oct 17 21:18:25 gargamel kernel: ata6.00: error: { IDNF }
Oct 17 21:18:25 gargamel kernel: ata6.01: exception Emask 0x100 SAct 0x885 SErr 0x0 action 0x6 frozen
Oct 17 21:18:25 gargamel kernel: ata6.01: cmd 60/70:00:cf:66:cc/00:00:2b:00:00/40 tag 0 ncq 57344 in
Oct 17 21:18:25 gargamel kernel: res 3c/36:00:00:00:00/cd:00:00:00:3c/00 Emask 0x2 (HSM violation)
Oct 17 21:18:25 gargamel kernel: ata6.01: status: { DF DRQ }
Oct 17 21:18:25 gargamel kernel: ata6.01: error: { IDNF ABRT }
Oct 17 21:18:25 gargamel kernel: ata6.01: cmd 60/10:10:bf:66:cc/00:00:2b:00:00/40 tag 2 ncq 8192 in
Oct 17 21:18:25 gargamel kernel: res 3c/36:00:00:00:00/00:00:00:20:3c/00 Emask 0x2 (HSM violation)
Oct 17 21:18:25 gargamel kernel: ata6.01: status: { DF DRQ }
Oct 17 21:18:25 gargamel kernel: ata6.01: error: { IDNF ABRT }
Oct 17 21:18:25 gargamel kernel: ata6.01: cmd 60/80:38:3f:67:cc/00:00:2b:00:00/40 tag 7 ncq 65536 in
Oct 17 21:18:25 gargamel kernel: res 3c/36:00:00:00:00/00:00:00:70:3c/00 Emask 0x2 (HSM violation)
Oct 17 21:18:25 gargamel kernel: ata6.01: status: { DF DRQ }
Oct 17 21:18:25 gargamel kernel: ata6.01: error: { IDNF ABRT }
Oct 17 21:18:25 gargamel kernel: ata6.01: cmd 60/10:58:bf:67:cc/00:00:2b:00:00/40 tag 11 ncq 8192 in
Oct 17 21:18:25 gargamel kernel: res 3c/36:00:00:00:00/00:00:00:b0:3c/00 Emask 0x2 (HSM violation)
Oct 17 21:18:25 gargamel kernel: ata6.01: status: { DF DRQ }
Oct 17 21:18:25 gargamel kernel: ata6.01: error: { IDNF ABRT }
Oct 17 21:18:26 gargamel kernel: ata6.02: exception Emask 0x1 SAct 0x1100 SErr 0x0 action 0x6 frozen
Oct 17 21:18:26 gargamel kernel: ata6.02: irq_stat 0x02060002, device error via SDB FIS
Oct 17 21:18:26 gargamel kernel: ata6.02: cmd 60/70:40:cf:66:cc/00:00:2b:00:00/40 tag 8 ncq 57344 in
Oct 17 21:18:26 gargamel kernel: res 3c/36:00:00:00:00/00:00:80:80:3c/00 Emask 0x3 (HSM violation)
Oct 17 21:18:26 gargamel kernel: ata6.02: status: { DF DRQ }
Oct 17 21:18:26 gargamel kernel: ata6.02: error: { IDNF ABRT }
Oct 17 21:18:26 gargamel kernel: ata6.02: cmd 60/80:60:3f:67:cc/00:00:2b:00:00/40 tag 12 ncq 65536 in
Oct 17 21:18:26 gargamel kernel: res 60/80:60:3f:67:cc/00:00:2b:00:00/40 Emask 0x10 (ATA bus error)
Oct 17 21:18:26 gargamel kernel: ata6.02: status: { DRDY DF }
Oct 17 21:18:26 gargamel kernel: ata6.02: error: { ICRC }
Oct 17 21:18:26 gargamel kernel: ata6.03: exception Emask 0x100 SAct 0x2000 SErr 0x0 action 0x6 frozen
Oct 17 21:18:26 gargamel kernel: ata6.03: cmd 60/80:68:bf:67:cc/00:00:2b:00:00/40 tag 13 ncq 65536 in
Oct 17 21:18:26 gargamel kernel: res 3c/36:00:00:00:00/00:00:c0:d0:3c/00 Emask 0x2 (HSM violation)
Oct 17 21:18:26 gargamel kernel: ata6.03: status: { DF DRQ }
Oct 17 21:18:26 gargamel kernel: ata6.03: error: { IDNF ABRT }
Oct 17 21:18:26 gargamel kernel: ata6.04: exception Emask 0x100 SAct 0x40 SErr 0x0 action 0x6 frozen
Oct 17 21:18:26 gargamel kernel: ata6.04: cmd 60/70:30:4f:67:cc/00:00:2b:00:00/40 tag 6 ncq 57344 in
Oct 17 21:18:26 gargamel kernel: res 3c/36:00:00:00:00/00:00:40:60:3c/00 Emask 0x2 (HSM violation)
Oct 17 21:18:26 gargamel kernel: ata6.04: status: { DF DRQ }
Oct 17 21:18:26 gargamel kernel: ata6.04: error: { IDNF ABRT }
Oct 17 21:18:26 gargamel kernel: ata6.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 17 21:18:26 gargamel kernel: ata6.15: hard resetting link
Oct 17 21:18:26 gargamel kernel: ata6: controller in dubious state, performing PORT_RST
Oct 17 21:18:28 gargamel kernel: ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Oct 17 21:18:28 gargamel kernel: ata6.00: hard resetting link
Oct 17 21:18:28 gargamel kernel: ata6.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Oct 17 21:18:28 gargamel kernel: ata6.01: hard resetting link
Oct 17 21:18:28 gargamel kernel: ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:18:28 gargamel kernel: ata6.02: hard resetting link
Oct 17 21:18:29 gargamel kernel: ata6.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:18:29 gargamel kernel: ata6.03: hard resetting link
Oct 17 21:18:29 gargamel kernel: ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:18:29 gargamel kernel: ata6.04: hard resetting link
Oct 17 21:18:29 gargamel kernel: ata6.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:18:29 gargamel kernel: ata6.05: hard resetting link
Oct 17 21:18:30 gargamel kernel: ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
Oct 17 21:18:30 gargamel kernel: ata6.00: configured for UDMA/100
Oct 17 21:18:35 gargamel kernel: ata6.01: qc timeout (cmd 0xec)
Oct 17 21:18:35 gargamel kernel: ata6.01: failed to IDENTIFY (I/O error, err_mask=0x5)
Oct 17 21:18:35 gargamel kernel: ata6.01: revalidation failed (errno=-5)
Oct 17 21:18:35 gargamel kernel: ata6.15: hard resetting link
Oct 17 21:18:37 gargamel kernel: ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Oct 17 21:18:37 gargamel kernel: ata6.00: hard resetting link
Oct 17 21:18:37 gargamel kernel: ata6.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Oct 17 21:18:37 gargamel kernel: ata6.01: hard resetting link
Oct 17 21:18:37 gargamel kernel: ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:18:37 gargamel kernel: ata6.02: hard resetting link
Oct 17 21:18:38 gargamel kernel: ata6.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:18:38 gargamel kernel: ata6.03: hard resetting link
Oct 17 21:18:38 gargamel kernel: ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:18:38 gargamel kernel: ata6.04: hard resetting link
Oct 17 21:18:38 gargamel kernel: ata6.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:18:38 gargamel kernel: ata6.05: hard resetting link
Oct 17 21:18:39 gargamel kernel: ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
Oct 17 21:18:39 gargamel kernel: ata6.00: configured for UDMA/100
Oct 17 21:18:49 gargamel kernel: ata6.01: qc timeout (cmd 0xec)
Oct 17 21:18:49 gargamel kernel: ata6.01: failed to IDENTIFY (I/O error, err_mask=0x5)
Oct 17 21:18:49 gargamel kernel: ata6.01: revalidation failed (errno=-5)
Oct 17 21:18:49 gargamel kernel: ata6.15: hard resetting link
Oct 17 21:18:51 gargamel kernel: ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Oct 17 21:18:51 gargamel kernel: ata6.00: hard resetting link
Oct 17 21:18:51 gargamel kernel: ata6.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Oct 17 21:18:51 gargamel kernel: ata6.01: hard resetting link
Oct 17 21:18:51 gargamel kernel: ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:18:51 gargamel kernel: ata6.02: hard resetting link
Oct 17 21:18:52 gargamel kernel: ata6.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:18:52 gargamel kernel: ata6.03: hard resetting link
Oct 17 21:18:52 gargamel kernel: ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:18:52 gargamel kernel: ata6.04: hard resetting link
Oct 17 21:18:52 gargamel kernel: ata6.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:18:52 gargamel kernel: ata6.05: hard resetting link
Oct 17 21:18:53 gargamel kernel: ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
Oct 17 21:18:53 gargamel kernel: ata6.00: configured for UDMA/100
Oct 17 21:19:23 gargamel kernel: ata6.01: qc timeout (cmd 0xec)
Oct 17 21:19:23 gargamel kernel: ata6.01: failed to IDENTIFY (I/O error, err_mask=0x5)
Oct 17 21:19:23 gargamel kernel: ata6.01: revalidation failed (errno=-5)
Oct 17 21:19:23 gargamel kernel: ata6.01: failed to recover link after 3 tries, disabling
Oct 17 21:19:23 gargamel kernel: ata6.01: disabled
Oct 17 21:19:23 gargamel kernel: ata6.15: hard resetting link
Oct 17 21:19:25 gargamel kernel: ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Oct 17 21:19:25 gargamel kernel: ata6.00: hard resetting link
Oct 17 21:19:25 gargamel kernel: ata6.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Oct 17 21:19:25 gargamel kernel: ata6.02: hard resetting link
Oct 17 21:19:26 gargamel kernel: ata6.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:19:26 gargamel kernel: ata6.03: hard resetting link
Oct 17 21:19:26 gargamel kernel: ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:19:26 gargamel kernel: ata6.04: hard resetting link
Oct 17 21:19:26 gargamel kernel: ata6.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 17 21:19:26 gargamel kernel: ata6.05: hard resetting link
Oct 17 21:19:27 gargamel kernel: ata6.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
Oct 17 21:19:27 gargamel kernel: ata6.00: configured for UDMA/100
Oct 17 21:19:27 gargamel kernel: ata6.02: configured for UDMA/100
Oct 17 21:19:27 gargamel kernel: ata6.03: configured for UDMA/100
Oct 17 21:19:27 gargamel kernel: ata6.04: configured for UDMA/100
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Unhandled sense code
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Sense Key : Hardware Error [current] [descriptor]
Oct 17 21:19:27 gargamel kernel: Descriptor sense data with sense descriptors (in hex):
Oct 17 21:19:27 gargamel kernel: 72 04 00 00 00 00 00 0c 00 0a 80 00 00 00 3c 00
Oct 17 21:19:27 gargamel kernel: 00 00 00 00
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Add. Sense: No additional sense information
Oct 17 21:19:27 gargamel kernel: end_request: I/O error, dev sdi, sector 734815951
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: rejecting I/O to offline device
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Unhandled error code
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Oct 17 21:19:27 gargamel kernel: end_request: I/O error, dev sdi, sector 734816207
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Unhandled sense code
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Sense Key : Hardware Error [current] [descriptor]
Oct 17 21:19:27 gargamel kernel: Descriptor sense data with sense descriptors (in hex):
Oct 17 21:19:27 gargamel kernel: 72 04 00 00 00 00 00 0c 00 0a 80 00 00 00 3c 20
Oct 17 21:19:27 gargamel kernel: 00 00 00 00
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Add. Sense: No additional sense information
Oct 17 21:19:27 gargamel kernel: end_request: I/O error, dev sdi, sector 734815935
Oct 17 21:19:27 gargamel kernel: sd 6:0:0:0: [sdh] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 17 21:19:27 gargamel kernel: sd 6:0:0:0: [sdh] Sense Key : Aborted Command [current] [descriptor]
Oct 17 21:19:27 gargamel kernel: Descriptor sense data with sense descriptors (in hex):
Oct 17 21:19:27 gargamel kernel: 72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Oct 17 21:19:27 gargamel kernel: 2b cc 68 3f
Oct 17 21:19:27 gargamel kernel: sd 6:0:0:0: [sdh] Add. Sense: Recorded entity not found
Oct 17 21:19:27 gargamel kernel: end_request: I/O error, dev sdh, sector 734816319
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Unhandled sense code
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: rejecting I/O to offline device
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Sense Key : Hardware Error [current] [descriptor]
Oct 17 21:19:27 gargamel kernel: Descriptor sense data with sense descriptors (in hex):
Oct 17 21:19:27 gargamel kernel: 72 04 00 00 00 00 00 0c 00 0a 80 00 00 00 3c 70
Oct 17 21:19:27 gargamel kernel: 00 00 00 00
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Add. Sense: No additional sense information
Oct 17 21:19:27 gargamel kernel: end_request: I/O error, dev sdi, sector 734816063
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Unhandled sense code
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Sense Key : Hardware Error [current] [descriptor]
Oct 17 21:19:27 gargamel kernel: Descriptor sense data with sense descriptors (in hex):
Oct 17 21:19:27 gargamel kernel: 72 04 00 00 00 00 00 0c 00 0a 80 00 00 00 3c b0
Oct 17 21:19:27 gargamel kernel: 00 00 00 00
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Add. Sense: No additional sense information
Oct 17 21:19:27 gargamel kernel: end_request: I/O error, dev sdi, sector 734816191
Oct 17 21:19:27 gargamel kernel: ata6: EH complete
Oct 17 21:19:27 gargamel kernel: ata6.01: detaching (SCSI 6:1:0:0)
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Synchronizing SCSI cache
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Stopping disk
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] START_STOP FAILED
Oct 17 21:19:27 gargamel kernel: sd 6:1:0:0: [sdi] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 17 21:19:27 gargamel kernel: raid5: Disk failure on sdi1, disabling device.
Oct 17 21:19:27 gargamel kernel: raid5: Operation continuing on 4 devices.
Oct 17 21:19:27 gargamel kernel: raid5: Disk failure on sdh1, disabling device.
Oct 17 21:19:27 gargamel kernel: raid5: Operation continuing on 3 devices.
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems & security ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/