From: Tejun Heo <tj@kernel.org>
To: Andi Kleen <andi@firstfloor.org>
Cc: linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
Date: Mon, 14 Jun 2010 09:43:28 +0200
Message-ID: <4C15DDA0.3020409@kernel.org>
In-Reply-To: <20100613154808.GA10408@basil.fritz.box>

Hello, Andi.

(cc'ing linux-scsi and quoting whole message)

On 06/13/2010 05:48 PM, Andi Kleen wrote:
> 
> Hi,
> 
> On 2.6.34:
> 
> While writing some data to an old PATA Maxtor disk connected to a
> PDC20268 Promise controller using the libata driver there were some
> IO errors.
> 
> After some time those resulted in an endless error message loop that
> made the system essentially unusable (console was flooded and
> unusable, ssh was extremely slow, etc.):
> 
> This does not exactly look like graceful error handling.
> 
> Excerpts from the log (full version available on request)
> 
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
>        00 5b 23 bb 
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 4f 00 00 80 00
> ata12: EH complete
> ata12.00: limiting speed to UDMA/66:PIO4
> ata12: soft resetting link
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
>        00 5b 23 bb 
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
> quiet_error: 10 callbacks suppressed
> ata12: EH complete
> EXT4-fs (dm-0): mounted filesystem with ordered data mode
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs (dm-1): using internal journal
> EXT3-fs (dm-1): mounted filesystem with writeback data mode
> EXT4-fs (dm-2): mounted filesystem with ordered data mode
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> 
> ...
> 
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
>        00 5b 23 bb 
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
> quiet_error: 10 callbacks suppressed
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
>        00 5b 23 bb 
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> 
> 
> ... lots of similar messages until it goes down to PIO0 then some more errors ....
> 
> 
> ata12: soft resetting link
> ata12: soft resetting link
> ata12: link is slow to respond, please be patient (ready=0)
> ata12.00: qc timeout (cmd 0xec)
> ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata12: link is slow to respond, please be patient (ready=0)
> ata12: soft resetting link
> ata12.00: disabled
> ata12: soft resetting link
> ata12: EH complete

At this point, the drive stopped responding and libata removed the
drive from the system.

> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e7 08 00 01 00 00

As the device is gone, any command is immediately failed with
DID_BAD_TARGET.
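
As a worked illustration, the Write(10) CDB in the log line quoted above
can be decoded by hand. This is a minimal sketch that assumes the standard
10-byte READ(10)/WRITE(10) layout; the function name decode_rw10 is made
up for this example and is not a kernel or library API.

```python
import struct


def decode_rw10(cdb_hex):
    """Decode a 10-byte READ(10)/WRITE(10) CDB given as hex text.

    Returns (command name, starting LBA, transfer length in blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    # Byte 0: opcode, byte 1: flags, bytes 2-5: LBA (big-endian),
    # byte 6: group number, bytes 7-8: transfer length, byte 9: control.
    opcode, _flags, lba, _group, nblocks, _control = struct.unpack(">BBIBHB", cdb)
    names = {0x28: "Read(10)", 0x2A: "Write(10)"}
    return names.get(opcode, hex(opcode)), lba, nblocks


# CDB taken verbatim from the log line above.
print(decode_rw10("2a 00 00 e5 e7 08 00 01 00 00"))
```

Applied to the Write(10) CDBs in the log, each request is 256 blocks long
and the starting LBA advances by 256 from one failure to the next, i.e. a
sequential stream of writes being failed one request at a time.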

> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> sd 11:0:0:0: [sdd] Unhandled error code
> 
> and finally an endless flood of
> 
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e8 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e9 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ea 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 eb 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ec 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ed 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ee 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ef 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
> 
> same messages repeating forever, just with CDB changing occasionally.
> 
> ....
> 
> not stopping until I reset the box.

Did you have a lot of dirty pages?  It looks like the upper layer is
trying to flush all the dirty buffers, and SCSI is a bit too verbose
about failing each IO w/ DID_BAD_TARGET, so it takes a very long time
if there are many to fail.
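
One way to answer the dirty-pages question on a live system is to read
the Dirty and Writeback counters from /proc/meminfo. The following is a
minimal sketch; the helper name dirty_kib is hypothetical, and it assumes
the usual "Name:   value kB" line format of that file.

```python
def dirty_kib(meminfo_text):
    """Return (Dirty, Writeback) in KiB from /proc/meminfo-style text.

    A field that is missing from the text is returned as None.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields.get("Dirty"), fields.get("Writeback")


# Usage on Linux:
#   with open("/proc/meminfo") as f:
#       print(dirty_kib(f.read()))
```

A large Dirty value just before the device is disabled would mean many
queued writes, each of which then gets its own failure message.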

Thanks.

-- 
tejun