From: Douglas Gilbert <dougg@torque.net>
To: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Alan Stern <stern@rowland.harvard.edu>,
Martin Peschke <mpeschke@de.ibm.com>,
Radovan Garabik <garabik@kassiopeia.juls.savba.sk>,
SCSI Mailing List <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH as468] Retry supposedly "unrecoverable" hardware errors
Date: Fri, 18 Feb 2005 10:49:53 +1000
Message-ID: <42153BB1.4050303@torque.net>
In-Reply-To: <1108653107.5507.3.camel@mulgrave>
James Bottomley wrote:
> On Thu, 2005-02-17 at 14:27 +1000, Douglas Gilbert wrote:
>
>>Recent SPC-3 and SBC-2 drafts treat the sense keys of
>>MEDIUM ERROR and HARDWARE ERROR in a similar way.
>>Both can return an "info" field which has the same
>>meaning (lba of first failure). The distinction is that
>>MEDIUM ERROR is a little more precise (at least for
>>magnetic rotating media) **. For flash ram the distinction
>>is moot.
>
>
> My copy of SPC-3 (r21d) still defined HARDWARE ERROR in Table 27 as
>
> HARDWARE ERROR: Indicates that the device server detected a non-
> recoverable hardware failure
> (e.g., controller failure, device failure, or parity error) while
> performing the command or during a self
> test.
>
> which looks pretty non-retryable to me ... where does it say that the
> error might be retryable?
James,
The definition of MEDIUM ERROR from the same table:
"Indicates that the command terminated with a non-recoverable
error condition that may have been caused by a flaw in the
medium or an error in the recorded data. This sense key may
also be returned if the device server is unable to
distinguish between a flaw in the medium and a specific
hardware failure (i.e. sense key 4h)". Sense key "4h" is
HARDWARE ERROR.
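To make the sense key values concrete: in fixed-format sense data (response code 70h/71h) the sense key is the low nibble of byte 2, and in descriptor-format sense data (72h/73h) it is the low nibble of byte 1. The following is a minimal C sketch. It assumes a raw sense buffer exactly as returned by the device. The helper names are hypothetical and are not taken from the SCSI mid level or sd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sense key values as listed in the SPC-3 sense key table. */
enum sense_key {
    SK_NO_SENSE        = 0x0,
    SK_RECOVERED_ERROR = 0x1,
    SK_NOT_READY       = 0x2,
    SK_MEDIUM_ERROR    = 0x3,
    SK_HARDWARE_ERROR  = 0x4,
    SK_ILLEGAL_REQUEST = 0x5,
    SK_UNIT_ATTENTION  = 0x6,
};

/*
 * Hypothetical helper: return the sense key, or -1 if the buffer is too
 * short or is neither fixed (70h/71h) nor descriptor (72h/73h) format.
 * Bit 7 of byte 0 (VALID, fixed format only) is masked off first.
 */
static int get_sense_key(const uint8_t *sense, size_t len)
{
    assert(sense != NULL);
    if (len < 3)
        return -1;
    switch (sense[0] & 0x7f) {
    case 0x70: case 0x71:          /* fixed format: byte 2, bits 3..0 */
        return sense[2] & 0x0f;
    case 0x72: case 0x73:          /* descriptor format: byte 1, bits 3..0 */
        return sense[1] & 0x0f;
    default:
        return -1;
    }
}

/* True for the two sense keys that may report an unrecovered error. */
static int is_unrecovered_error_key(int key)
{
    return key == SK_MEDIUM_ERROR || key == SK_HARDWARE_ERROR;
}
```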
I interpret that as SPC-3 saying MEDIUM ERROR and
HARDWARE ERROR may both report non-recoverable errors.
Also note that MEDIUM ERROR, HARDWARE ERROR and RECOVERED
ERROR can return an "actual retry count" in their additional
sense data.
SBC-2 (rev 16) makes little distinction between
the two sense keys for "unrecovered read errors": table 4 shows
either can be used. It also says on page 19: "When
an unrecovered read error is reported the information field
of the sense data shall contain the LBA of the unrecovered
logical block."
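The information field that SBC-2 refers to is found in one of two places. In fixed-format sense data it occupies bytes 3 to 6 and is qualified by the VALID bit, which is bit 7 of byte 0. In descriptor-format sense data it is carried in an information descriptor, type 00h, as an 8-byte value. Below is a rough C sketch that extracts the LBA of the unrecovered block. It assumes well-formed SPC-3 sense data, and `get_sense_info` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store the INFORMATION field (for an unrecovered
 * read error, the LBA of the failing block) in *info and return 1 if it
 * is present and marked valid; otherwise return 0.
 */
static int get_sense_info(const uint8_t *sense, size_t len, uint64_t *info)
{
    assert(sense != NULL && info != NULL);
    if (len < 1)
        return 0;
    switch (sense[0] & 0x7f) {
    case 0x70: case 0x71:
        /* Fixed format: VALID is byte 0 bit 7, field is bytes 3..6. */
        if (len < 7 || !(sense[0] & 0x80))
            return 0;
        *info = ((uint64_t)sense[3] << 24) | ((uint64_t)sense[4] << 16) |
                ((uint64_t)sense[5] << 8) | sense[6];
        return 1;
    case 0x72: case 0x73: {
        /* Descriptor format: descriptors follow the 8-byte header. */
        size_t end, pos = 8;
        if (len < 8)
            return 0;
        end = 8 + (size_t)sense[7];     /* byte 7: additional sense length */
        if (end > len)
            end = len;
        while (pos + 2 <= end) {
            size_t dlen = 2 + (size_t)sense[pos + 1];
            if (sense[pos] == 0x00 && dlen >= 12 && pos + 12 <= end) {
                /* Information descriptor: VALID is its byte 2, bit 7. */
                if (!(sense[pos + 2] & 0x80))
                    return 0;
                *info = 0;
                for (int i = 4; i < 12; i++)   /* 8-byte big-endian value */
                    *info = (*info << 8) | sense[pos + i];
                return 1;
            }
            pos += dlen;
        }
        return 0;
    }
    default:
        return 0;
    }
}
```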
Nothing that I can see links an "unrecovered (read) error" with
the application client retrying the same command in either draft.
If "actual retry count" is > 1 in the sense key specific field
then that implies the device has already tried several times.
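The "actual retry count" is carried in the sense key specific field. In fixed-format sense data that field occupies bytes 15 to 17. It is valid only when the SKSV bit (byte 15, bit 7) is set, and the count itself is in bytes 16 and 17. SPC-3 leaves the meaning of the count implementation specific, so the sketch below only reports the number and does not try to interpret it. The helper is hypothetical and handles the fixed format only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the ACTUAL RETRY COUNT from fixed-format
 * sense data, or -1 if it is not reported. It is only defined for the
 * sense keys RECOVERED ERROR (1h), MEDIUM ERROR (3h) and HARDWARE
 * ERROR (4h), and only when SKSV (byte 15, bit 7) is set.
 */
static int get_actual_retry_count(const uint8_t *sense, size_t len)
{
    int key;

    assert(sense != NULL);
    /* Fixed format (70h/71h, VALID bit ignored), bytes 0..17 present,
     * additional sense length (byte 7) large enough to cover them. */
    if (len < 18 || (sense[0] & 0x7e) != 0x70 || sense[7] < 10)
        return -1;
    key = sense[2] & 0x0f;
    if (key != 0x1 && key != 0x3 && key != 0x4)
        return -1;
    if (!(sense[15] & 0x80))        /* SKSV clear: field not valid */
        return -1;
    return (sense[16] << 8) | sense[17];
}
```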
SSC-3 (for tape drives) also allows MEDIUM ERROR or HARDWARE ERROR
to indicate an unrecovered read error (rev 1c, table 2). For tape
drives, retrying the same command is probably not appropriate. [I
note that st and sg set their 'max_retries' to 0 to inhibit this.]
MMC-5 only mentions the HARDWARE ERROR sense key for a self
diagnostic failure.
This analysis leads me to question why retries are instigated
from the mid level rather than from the sd driver (and perhaps
the sr driver as well). If retries were moved into sd, then sd
should not instigate them when the device indicates that a
reasonable number of retries have already taken place, unless it
can change some other factor or is instructed to retry by some
parameter to sd.
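One possible way for sd, or any upper-level driver, to apply that policy is sketched below. Everything in it is hypothetical. This includes the function name and the `retry_threshold` parameter, which stands in for "some parameter to sd". The sketch handles only fixed-format sense data and only considers MEDIUM ERROR and HARDWARE ERROR. For any other sense key, or for sense data it cannot parse, it returns false and leaves the decision to whatever policy the caller already applies. A `max_retries` of 0, as st and sg use, never retries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy: may the driver reissue a command that failed
 * with MEDIUM ERROR or HARDWARE ERROR? */
static bool should_retry_unrecovered(const uint8_t *sense, size_t len,
                                     int attempts, int max_retries,
                                     int retry_threshold)
{
    int key, count = -1;

    assert(sense != NULL);
    /* max_retries == 0 (as st and sg set it) inhibits retries. */
    if (attempts >= max_retries)
        return false;
    /* Only fixed-format sense data (70h/71h) is handled here. */
    if (len < 3 || (sense[0] & 0x7e) != 0x70)
        return false;
    key = sense[2] & 0x0f;
    if (key != 0x3 && key != 0x4)   /* not MEDIUM/HARDWARE ERROR */
        return false;
    /* ACTUAL RETRY COUNT, if reported (SKSV set, bytes 16..17). */
    if (len >= 18 && sense[7] >= 10 && (sense[15] & 0x80))
        count = (sense[16] << 8) | sense[17];
    /* The device says it has already retried enough: do not add more. */
    if (count > retry_threshold)
        return false;
    return true;
}
```

When the device reports no retry count, this sketch allows a retry. That would still cover a device like the one in this thread, which evidently needed one.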
As Alan Stern points out, my patch fails the reality
test. The device in question obviously required a retry when
it returned a HARDWARE ERROR sense key (although perhaps the
underlying cause was not an unrecovered error, or was not
reported properly).
Doug Gilbert