* 2.6.34 PDC20268 PATA IO error loop makes system unusable
From: Andi Kleen @ 2010-06-13 15:48 UTC
To: linux-ide
Hi,
On 2.6.34:
While writing some data to an old PATA Maxtor disk connected to a PDC20268 Promise
controller using the libata driver there were some IO errors.
After some time those resulted in an endless error message loop that made the system essentially
unusable (the console was flooded and unusable, ssh was extremely slow, etc.):
This does not exactly look like graceful error handling.
Excerpts from the log (full version available on request)
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
sd 11:0:0:0: [sdd] Unhandled sense code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
00 5b 23 bb
sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 4f 00 00 80 00
ata12: EH complete
ata12.00: limiting speed to UDMA/66:PIO4
ata12: soft resetting link
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
sd 11:0:0:0: [sdd] Unhandled sense code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
00 5b 23 bb
sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
quiet_error: 10 callbacks suppressed
ata12: EH complete
EXT4-fs (dm-0): mounted filesystem with ordered data mode
kjournald starting. Commit interval 5 seconds
EXT3-fs (dm-1): using internal journal
EXT3-fs (dm-1): mounted filesystem with writeback data mode
EXT4-fs (dm-2): mounted filesystem with ordered data mode
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
...
sd 11:0:0:0: [sdd] Unhandled sense code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
00 5b 23 bb
sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
quiet_error: 10 callbacks suppressed
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
sd 11:0:0:0: [sdd] Unhandled sense code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
00 5b 23 bb
sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
... lots of similar messages until it goes down to PIO0 then some more errors ....
ata12: soft resetting link
ata12: soft resetting link
ata12: link is slow to respond, please be patient (ready=0)
ata12.00: qc timeout (cmd 0xec)
ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata12: link is slow to respond, please be patient (ready=0)
ata12: soft resetting link
ata12.00: disabled
ata12: soft resetting link
ata12: EH complete
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e7 08 00 01 00 00
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
sd 11:0:0:0: [sdd] Unhandled error code
and finally an endless flood of
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e8 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e9 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ea 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 eb 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ec 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ed 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ee 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ef 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
same messages repeating forever, just with CDB changing occasionally.
....
not stopping until I reset the box.
-Andi
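The last phase of the log can be partly decoded by hand. `cmd 0xec` is the ATA IDENTIFY DEVICE opcode, which matches the `failed to IDENTIFY` line that follows it, and `err_mask` is a bitmask of libata completion-error flags. The sketch below names the set bits. The bit assignments reflect my reading of `enum ata_completion_errors` in `include/linux/libata.h` and should be checked against the kernel tree in question; `describe_err_mask` is a hypothetical helper, not a libata function. Under those assumptions `err_mask=0x4` is a plain timeout, consistent with the preceding `qc timeout` line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Names for the low bits of a libata err_mask, following (to my
 * understanding) enum ata_completion_errors in include/linux/libata.h.
 * Verify against the tree you are debugging. */
static const char *const ac_err_names[] = {
	"AC_ERR_DEV",      /* bit 0: device reported error */
	"AC_ERR_HSM",      /* bit 1: host state machine violation */
	"AC_ERR_TIMEOUT",  /* bit 2: timeout */
	"AC_ERR_MEDIA",    /* bit 3: media error */
	"AC_ERR_ATA_BUS",  /* bit 4: ATA bus error */
	"AC_ERR_HOST_BUS", /* bit 5: host bus error */
	"AC_ERR_SYSTEM",   /* bit 6: system error */
	"AC_ERR_INVALID",  /* bit 7: invalid argument */
	"AC_ERR_OTHER",    /* bit 8: unknown */
};

/* Hypothetical helper: write the names of all set bits in err_mask into
 * buf, joined by '|'. Bits without a name above are printed as "bitN". */
static void describe_err_mask(unsigned int err_mask, char *buf, size_t len)
{
	const size_t nnames = sizeof(ac_err_names) / sizeof(ac_err_names[0]);
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (unsigned int bit = 0; bit < 32; bit++) {
		char unknown[16];
		const char *name;
		int n;

		if (!(err_mask & (1u << bit)))
			continue;
		if (bit < nnames) {
			name = ac_err_names[bit];
		} else {
			snprintf(unknown, sizeof(unknown), "bit%u", bit);
			name = unknown;
		}
		n = snprintf(buf + used, len - used, "%s%s", used ? "|" : "", name);
		if (n < 0 || (size_t)n >= len - used)
			return; /* truncated; snprintf still NUL-terminated buf */
		used += (size_t)n;
	}
}
```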
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
From: Tejun Heo @ 2010-06-14 7:43 UTC
To: Andi Kleen; +Cc: linux-ide, linux-scsi

Hello, Andi.

(cc'ing linux-scsi and quoting whole message)

On 06/13/2010 05:48 PM, Andi Kleen wrote:
>
> Hi,
>
> On 2.6.34:
>
> While writing some data to an old PATA Maxtor disk connected to a
> PDC20268 Promise controller using the libata driver there were some
> IO errors.
>
> [...]
>
> ata12: soft resetting link
> ata12: soft resetting link
> ata12: link is slow to respond, please be patient (ready=0)
> ata12.00: qc timeout (cmd 0xec)
> ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata12: link is slow to respond, please be patient (ready=0)
> ata12: soft resetting link
> ata12.00: disabled
> ata12: soft resetting link
> ata12: EH complete

At this point, the drive stopped responding and libata removed the
drive from the system.

> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e7 08 00 01 00 00

As the device is gone, any command is immediately failed with
DID_BAD_TARGET.

> lost page write due to I/O error on sdd
> [...]
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
>
> same messages repeating forever, just with CDB changing occasionally.
>
> ....
>
> not stopping until I reset the box.

Did you have a lot of dirty pages? It looks like upper layer is
trying to flush all the dirty buffers and SCSI is a tad bit too
verbose about failing each IO w/ DID_BAD_TARGET thus taking a very
long time if there are many to fail.

Thanks.

-- 
tejun
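Tejun's dirty-page theory can be cross-checked against the CDBs in the flood. In a WRITE(10) CDB (opcode 0x2a), bytes 2-5 carry the big-endian starting LBA and bytes 7-8 the big-endian transfer length in logical blocks. The minimal userspace decoder below (hypothetical names) shows that each failed write covers 0x100 = 256 blocks and starts exactly where the previous one ended, which is consistent with writeback of a large sequential stream such as the `dd` Andi mentions later in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a READ(10)/WRITE(10) CDB relevant to this log. */
struct rw10 {
	uint8_t  opcode;  /* 0x28 = READ(10), 0x2a = WRITE(10) */
	uint32_t lba;     /* bytes 2..5, big-endian */
	uint16_t nblocks; /* bytes 7..8, big-endian transfer length */
};

/* Decode a 10-byte READ(10)/WRITE(10) CDB (hypothetical helper). */
static struct rw10 decode_rw10(const uint8_t cdb[10])
{
	struct rw10 r;

	assert(cdb != NULL);
	r.opcode  = cdb[0];
	r.lba     = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
		    ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
	r.nblocks = (uint16_t)((cdb[7] << 8) | cdb[8]);
	return r;
}

/* True if request b starts at the block right after request a ends. */
static bool is_contiguous(struct rw10 a, struct rw10 b)
{
	return (uint64_t)a.lba + a.nblocks == b.lba;
}
```

Assuming 512-byte logical blocks (an assumption for this disk), each request is 256 × 512 B = 128 KiB, and sd logged three lines per failed request (`Unhandled error code`, `Result:`, `CDB:`), so the length of the flood grows linearly with the amount of dirty data still queued.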
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
From: Andi Kleen @ 2010-06-14 7:53 UTC
To: Tejun Heo; +Cc: Andi Kleen, linux-ide, linux-scsi

On Mon, Jun 14, 2010 at 09:43:28AM +0200, Tejun Heo wrote:
> > sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> > sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
> > sd 11:0:0:0: [sdd] Unhandled error code
> > sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> > sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
> > sd 11:0:0:0: [sdd] Unhandled error code
> > sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> > sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
> >
> > same messages repeating forever, just with CDB changing occasionally.
> >
> > ....
> >
> > not stopping until I reset the box.
>
> Did you have a lot of dirty pages? It looks like upper layer is

Yes, there was a dd running.

> trying to flush all the dirty buffers and SCSI is a tad bit too
> verbose about failing each IO w/ DID_BAD_TARGET thus taking a very

A bit too verbose? That's really a euphemism ...

During the CDB: Write loop the console was totally unusable!

And I think the fsyncs in syslogd completely made the performance
tank.

So basically it was a "reset button only" situation.

When the device is gone what's the point in giving a message
more than once? Can't the requests just be silently failed in this
case?

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
From: Tejun Heo @ 2010-06-14 7:59 UTC
To: Andi Kleen; +Cc: linux-ide, linux-scsi

Hello,

On 06/14/2010 09:53 AM, Andi Kleen wrote:
> On Mon, Jun 14, 2010 at 09:43:28AM +0200, Tejun Heo wrote:
>>> [...]
>>> same messages repeating forever, just with CDB changing occasionally.
>>>
>>> ....
>>>
>>> not stopping until I reset the box.
>>
>> Did you have a lot of dirty pages? It looks like upper layer is
>
> Yes, there was a dd running.
>
>> trying to flush all the dirty buffers and SCSI is a tad bit too
>> verbose about failing each IO w/ DID_BAD_TARGET thus taking a very
>
> A bit too verbose? That's really a euphemism ...

Yeap, of course it was. :-)

> During the CDB: Write loop the console was totally unusable!
>
> And I think the fsyncs in syslogd completely made the performance
> tank.

Console often becomes the bottleneck too when there are a lot of
kernel messages.

> So basically it was a "reset button only" situation.
>
> When the device is gone what's the point in giving a message
> more than once? Can't the requests just be silently failed in this
> case?

Yeah, it would be better to somehow summarize those error messages
instead of spitting out all of them.

Thanks.

-- 
tejun
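One possible shape for the summarizing Tejun suggests, which also covers Andi's "give the message only once" question, is sketched below. It is a hypothetical userspace model, not SCSI midlayer code: the first failure for a device is reported in full, later failures with the same host byte are only counted, and a single summary line is emitted when the result changes or the state is flushed (for example on device removal). `HOST_BYTE_BAD_TARGET` mirrors the value of `DID_BAD_TARGET` (0x04) in `<scsi/scsi.h>`; all other names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define HOST_BYTE_BAD_TARGET 0x04 /* same value as DID_BAD_TARGET */

/* Hypothetical per-device state for collapsing repeated failures. */
struct err_summary {
	int last_result;       /* host byte of the last fully reported failure */
	unsigned long repeats; /* identical failures counted since then */
	bool active;           /* true once a failure has been reported */
};

/* Record one failure. Returns true if the caller should print the full
 * details (first failure, or a result different from the previous one).
 * When the result changes, a summary for the old result is printed first. */
static bool err_summary_note(struct err_summary *s, const char *dev, int result)
{
	assert(s != NULL && dev != NULL);
	if (s->active && s->last_result == result) {
		s->repeats++;
		return false;
	}
	if (s->active && s->repeats > 0)
		printf("%s: last error repeated %lu times\n", dev, s->repeats);
	s->last_result = result;
	s->repeats = 0;
	s->active = true;
	return true;
}

/* Emit any pending summary and reset, e.g. when the device goes away.
 * Returns the number of failures that had been folded into the summary. */
static unsigned long err_summary_flush(struct err_summary *s, const char *dev)
{
	unsigned long n = s->repeats;

	if (s->active && n > 0)
		printf("%s: last error repeated %lu times\n", dev, n);
	s->repeats = 0;
	s->active = false;
	return n;
}
```

With this scheme the per-request detail (each changing CDB) after the first failure is lost; only the count survives.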
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
From: Andi Kleen @ 2010-06-14 9:22 UTC
To: Tejun Heo; +Cc: Andi Kleen, linux-ide, linux-scsi

> > So basically it was a "reset button only" situation.
> >
> > When the device is gone what's the point in giving a message
> > more than once? Can't the requests just be silently failed in this
> > case?
>
> Yeah, it would be better to somehow summarize those error messages
> instead of spitting out all of them.

Hmm, I think I have some ideas on that; I'll take a look.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
From: James Bottomley @ 2010-06-14 16:20 UTC
To: Tejun Heo; +Cc: Andi Kleen, linux-ide, linux-scsi

On Mon, 2010-06-14 at 09:59 +0200, Tejun Heo wrote:
> Hello,
>
> On 06/14/2010 09:53 AM, Andi Kleen wrote:
> [...]
> > So basically it was a "reset button only" situation.
> >
> > When the device is gone what's the point in giving a message
> > more than once? Can't the requests just be silently failed in this
> > case?
>
> Yeah, it would be better to somehow summarize those error messages
> instead of spitting out all of them.

I don't think we can summarize. However, when things start to go
wrong, it's usually only the first set of errors that are significant,
so we could do a simple ratelimit.

James

---

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1646fe7..c8c7483 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -896,7 +896,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	case ACTION_FAIL:
 		/* Give up and fail the remainder of the request */
 		scsi_release_buffers(cmd);
-		if (!(req->cmd_flags & REQ_QUIET)) {
+		if (!(req->cmd_flags & REQ_QUIET) && printk_ratelimit()) {
 			if (description)
 				scmd_printk(KERN_INFO, cmd, "%s\n",
 					    description);
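James's patch gates the `ACTION_FAIL` messages on `printk_ratelimit()`, which lets a burst of messages through per time window and drops the rest; the `quiet_error: 10 callbacks suppressed` lines in Andi's log appear to come from the same kind of limiter in the buffer layer. The model below is a simplified userspace sketch of an interval/burst limiter with a caller-supplied clock so it can be tested deterministically. The names are hypothetical, and the 5000 ms / 10-message parameters used in the test match what I understand the kernel's default printk ratelimit settings to be; they are not stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified interval/burst limiter, loosely modeled on the kernel's
 * ratelimit state. Times are milliseconds supplied by the caller. */
struct rl_state {
	long interval_ms; /* length of one window */
	int burst;        /* messages allowed per window */
	long begin_ms;    /* start of the current window, -1 = not started */
	int printed;      /* messages allowed in the current window */
	int missed;       /* messages dropped in the current window */
};

/* Returns true if a message at time now_ms may be printed. When a new
 * window opens and the previous one dropped messages, a summary line in
 * the style "who: N callbacks suppressed" is printed first. */
static bool rl_allow(struct rl_state *rs, const char *who, long now_ms)
{
	assert(rs != NULL && who != NULL);
	if (rs->begin_ms < 0)
		rs->begin_ms = now_ms;
	if (now_ms - rs->begin_ms >= rs->interval_ms) {
		if (rs->missed)
			printf("%s: %d callbacks suppressed\n", who, rs->missed);
		rs->begin_ms = now_ms;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

This keeps the first `burst` messages of each window intact, which matches James's point that the first errors are usually the significant ones, and reduces everything after them to a single count per window.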
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
From: Alan Cox @ 2010-06-14 11:17 UTC
To: Tejun Heo; +Cc: Andi Kleen, linux-ide, linux-scsi

> > sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]

That is the drive complaining about a bad block.

> > Descriptor sense data with sense descriptors (in hex):
> > 72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> > 00 5b 23 bb
> > sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> > sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 4f 00 00 80 00

Which in my experience usually comes from

- The disc being unable to find the data block

- A corrupt indirect block in the file system leading to a request for an
  out of range data block

In theory it could be an on-the-wire bitflip, but that's a bit too
consistent to believe.

Alan
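The sense bytes Alan is reading are in descriptor format (response code 0x72). A minimal decoder is sketched below, assuming the SPC layout (sense key in the low nibble of byte 1, ASC and ASCQ in bytes 2 and 3, additional length in byte 7, descriptors from byte 8) and interpreting only the Information descriptor (type 0x00); `decode_desc_sense` and `struct sense_info` are hypothetical names. For the bytes in the log it yields sense key 0x3 (MEDIUM ERROR), ASC/ASCQ 0x13/0x00 (address mark not found for data field) and an information field of 0x5b23bb. That LBA lies inside the 128-block range starting at 0x5b234f requested by the first failing READ(10), and it is exactly the starting LBA of the later 4-block retries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sense_info {
	uint8_t  sense_key;  /* e.g. 0x3 = MEDIUM ERROR */
	uint8_t  asc;        /* additional sense code */
	uint8_t  ascq;       /* additional sense code qualifier */
	bool     info_valid; /* VALID bit of the Information descriptor */
	uint64_t info;       /* Information field, here the failing LBA */
};

/* Decode descriptor-format sense data (response code 0x72 or 0x73).
 * Only the Information descriptor (type 0x00) is interpreted; other
 * descriptors are skipped. Returns false on malformed input. */
static bool decode_desc_sense(const uint8_t *buf, size_t len,
			      struct sense_info *out)
{
	uint8_t rc;
	size_t end, pos;

	assert(buf != NULL && out != NULL);
	if (len < 8)
		return false;
	rc = buf[0] & 0x7f; /* bit 7 is reserved in descriptor format */
	if (rc != 0x72 && rc != 0x73)
		return false;

	out->sense_key = buf[1] & 0x0f;
	out->asc = buf[2];
	out->ascq = buf[3];
	out->info_valid = false;
	out->info = 0;

	end = 8 + (size_t)buf[7]; /* byte 7: additional sense length */
	if (end > len)
		end = len;

	for (pos = 8; pos + 2 <= end;) {
		uint8_t type = buf[pos];
		size_t dlen = 2 + (size_t)buf[pos + 1]; /* header + payload */

		if (pos + dlen > end)
			return false; /* descriptor overruns the sense data */
		if (type == 0x00 && dlen >= 12) {
			/* Information descriptor: VALID bit, then 8-byte field */
			out->info_valid = (buf[pos + 2] & 0x80) != 0;
			out->info = 0;
			for (size_t i = 0; i < 8; i++)
				out->info = (out->info << 8) | buf[pos + 4 + i];
		}
		pos += dlen;
	}
	return true;
}
```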
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
From: Andi Kleen @ 2010-06-14 11:29 UTC
To: Alan Cox; +Cc: Tejun Heo, Andi Kleen, linux-ide, linux-scsi

On Mon, Jun 14, 2010 at 12:17:53PM +0100, Alan Cox wrote:
> > > sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
>
> That is the drive complaining about a bad block.
>
> > > Descriptor sense data with sense descriptors (in hex):
> > > 72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> > > 00 5b 23 bb
> > > sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> > > sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 4f 00 00 80 00
>
> Which in my experience usually comes from
>
> - The disc being unable to find the data block
>
> - A corrupt indirect block in the file system leading to a request for an
> out of range data block
>
> In theory it could be an on-the-wire bitflip, but that's a bit too
> consistent to believe.

Thanks, Alan.

I really don't care about the drive (nothing valuable on it). It was an
old IDE disk and I am not surprised it has some bad sectors. In fact I
was clearing it for junking it (although I may keep it now just in case
I need to test IDE error handling again :)

But I cared about the system becoming unusable due to error handling
running amok. That was the point of this bug report.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.