* [PATCH] libata: handle IDE AMNF errors in the EH path
@ 2014-07-12 8:22 Alexey Asemov
2014-07-14 16:10 ` Tejun Heo
0 siblings, 1 reply; 5+ messages in thread
From: Alexey Asemov @ 2014-07-12 8:22 UTC (permalink / raw)
To: linux-ide; +Cc: tj, linux-kernel, Alexey Asemov
libata-eh.c should handle AMNF error condition (error byte bit 0, usually
code 0x01) in libata-eh.c along with UNC as a media error so SCSI stack
can handle it properly (translation code 0x01 is already present in
libata-scsi.c) but was never passed down due to lack of handling in EH.
https://bugzilla.kernel.org/show_bug.cgi?id=80031
Signed-off-by: Alexey Asemov <alex@alex-at.ru>
---
drivers/ata/libata-eh.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 6760fc4..dad83df 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1811,7 +1811,7 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc,
case ATA_DEV_ATA:
if (err & ATA_ICRC)
qc->err_mask |= AC_ERR_ATA_BUS;
- if (err & ATA_UNC)
+ if (err & (ATA_UNC | ATA_AMNF))
qc->err_mask |= AC_ERR_MEDIA;
if (err & ATA_IDNF)
qc->err_mask |= AC_ERR_INVALID;
@@ -2556,11 +2556,12 @@ static void ata_eh_link_report(struct ata_link *link)
}
if (cmd->command != ATA_CMD_PACKET &&
- (res->feature & (ATA_ICRC | ATA_UNC | ATA_IDNF |
- ATA_ABORTED)))
- ata_dev_err(qc->dev, "error: { %s%s%s%s}\n",
+ (res->feature & (ATA_ICRC | ATA_UNC | ATA_AMNF |
+ ATA_IDNF | ATA_ABORTED)))
+ ata_dev_err(qc->dev, "error: { %s%s%s%s%s}\n",
res->feature & ATA_ICRC ? "ICRC " : "",
res->feature & ATA_UNC ? "UNC " : "",
+ res->feature & ATA_AMNF ? "AMNF " : "",
res->feature & ATA_IDNF ? "IDNF " : "",
res->feature & ATA_ABORTED ? "ABRT " : "");
#endif
--
1.7.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] libata: handle IDE AMNF errors in the EH path
2014-07-12 8:22 [PATCH] libata: handle IDE AMNF errors in the EH path Alexey Asemov
@ 2014-07-14 16:10 ` Tejun Heo
2014-07-14 21:16 ` Alexey Asemov (Alex/AT)
2014-07-15 6:28 ` [PATCH] libata-eh.c should handle AMNF error condition (error byte bit 0, usually code 0x01) in libata-eh.c along with UNC as a media error so SCSI stack can handle it properly (translation code 0x01 is already present in libata-scsi.c) but was never passed down due to lack of handling in EH Alexey Asemov
0 siblings, 2 replies; 5+ messages in thread
From: Tejun Heo @ 2014-07-14 16:10 UTC (permalink / raw)
To: Alexey Asemov; +Cc: linux-ide, linux-kernel
On Sat, Jul 12, 2014 at 12:22:53PM +0400, Alexey Asemov wrote:
> libata-eh.c should handle AMNF error condition (error byte bit 0, usually
> code 0x01) in libata-eh.c along with UNC as a media error so SCSI stack
> can handle it properly (translation code 0x01 is already present in
> libata-scsi.c) but was never passed down due to lack of handling in EH.
IIRC AMNF has been deprecated for a very long time now. The bit isn't
too likely to get reused given that major updates to ATA spec is
improbable but I'm still not quite fond of adding handling of a long
deprecated flag. Can you please provide a bit more detail? Which
device was it?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] libata: handle IDE AMNF errors in the EH path
2014-07-14 16:10 ` Tejun Heo
@ 2014-07-14 21:16 ` Alexey Asemov (Alex/AT)
2014-07-15 6:28 ` [PATCH] libata-eh.c should handle AMNF error condition (error byte bit 0, usually code 0x01) in libata-eh.c along with UNC as a media error so SCSI stack can handle it properly (translation code 0x01 is already present in libata-scsi.c) but was never passed down due to lack of handling in EH Alexey Asemov
1 sibling, 0 replies; 5+ messages in thread
From: Alexey Asemov (Alex/AT) @ 2014-07-14 21:16 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-ide, linux-kernel
Tejun Heo <tj@kernel.org> wrote at Mon, 14 Jul 2014 20:10:10 +0400:
> IIRC AMNF has been deprecated for a very long time now. The bit isn't
> too likely to get reused given that major updates to ATA spec is
> improbable but I'm still not quite fond of adding handling of a long
> deprecated flag. Can you please provide a bit more detail? Which
> device was it?
The hardware is: AMD 6550M-based notebook, WD7500BPVT 2.5" 750G SATA2 hard
drive. PCI IDs for the controller are 1022:7801 subsys 1025:059d.
I was using linux-based machine to salvage data from failing hard drive,
and found that pure AMNF 0x01 error code generates generic "device error"
that is retried several times by SCSI stack instead of "media error" that
is passed up to software. That slowed down salvaging a lot, so the
patching was done and then everything went smooth.
So I presume AMNF error code is surely not dead yet, and it's better for
it to be handled. It is used by modern enough devices, and used properly:
drive returned AMNF only when IDs for track cannot be read completely due
to dying head or positioning, otherwise it returned UNC(orrectables).
Also, there is handling code in libata-scsi.c for 0x01 AMNF error already.
Not handling it causes excessive retries that can damage failing drivers
further, and wrong generic error code ("device error") reporting.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] libata-eh.c should handle AMNF error condition (error byte bit 0, usually code 0x01) in libata-eh.c along with UNC as a media error so SCSI stack can handle it properly (translation code 0x01 is already present in libata-scsi.c) but was never passed down due to lack of handling in EH.
2014-07-14 16:10 ` Tejun Heo
2014-07-14 21:16 ` Alexey Asemov (Alex/AT)
@ 2014-07-15 6:28 ` Alexey Asemov
2014-07-15 15:16 ` Tejun Heo
1 sibling, 1 reply; 5+ messages in thread
From: Alexey Asemov @ 2014-07-15 6:28 UTC (permalink / raw)
To: linux-ide; +Cc: tj, linux-kernel, Alexey Asemov
While using linux-based machine (AMD 6550M-based notebook, PCI IDs for the
controller are 1022:7801 subsys 1025:059d) and ddrescue to salvage data
from failing hard drive (WD7500BPVT 2.5" 750G SATA2), I've found that pure
AMNF 0x01 error code generates generic "device error" that is retried
several times by SCSI stack instead of "media error" that is passed up to
software.
So we may assume deprecated AMNF error code is surely not dead yet, and
it's better for it to be handled properly. As we may see it is used by
modern enough devices, and used properly: drive returned AMNF only when IDs
for track cannot be read completely due to dying head or positioning,
otherwise it returned UNC(orrectables).
Not handling it causes wrong generic error code ("device error") reporting
down the stack, can damage failing drives further because of excessive
retries, and slows salvaging down a lot. Also, there is handling code in
libata-scsi.c for 0x01 AMNF error already.
https://bugzilla.kernel.org/show_bug.cgi?id=80031
Signed-off-by: Alexey Asemov <alex@alex-at.ru>
---
drivers/ata/libata-eh.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 6760fc4..dad83df 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1811,7 +1811,7 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc,
case ATA_DEV_ATA:
if (err & ATA_ICRC)
qc->err_mask |= AC_ERR_ATA_BUS;
- if (err & ATA_UNC)
+ if (err & (ATA_UNC | ATA_AMNF))
qc->err_mask |= AC_ERR_MEDIA;
if (err & ATA_IDNF)
qc->err_mask |= AC_ERR_INVALID;
@@ -2556,11 +2556,12 @@ static void ata_eh_link_report(struct ata_link *link)
}
if (cmd->command != ATA_CMD_PACKET &&
- (res->feature & (ATA_ICRC | ATA_UNC | ATA_IDNF |
- ATA_ABORTED)))
- ata_dev_err(qc->dev, "error: { %s%s%s%s}\n",
+ (res->feature & (ATA_ICRC | ATA_UNC | ATA_AMNF |
+ ATA_IDNF | ATA_ABORTED)))
+ ata_dev_err(qc->dev, "error: { %s%s%s%s%s}\n",
res->feature & ATA_ICRC ? "ICRC " : "",
res->feature & ATA_UNC ? "UNC " : "",
+ res->feature & ATA_AMNF ? "AMNF " : "",
res->feature & ATA_IDNF ? "IDNF " : "",
res->feature & ATA_ABORTED ? "ABRT " : "");
#endif
--
1.7.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] libata-eh.c should handle AMNF error condition (error byte bit 0, usually code 0x01) in libata-eh.c along with UNC as a media error so SCSI stack can handle it properly (translation code 0x01 is already present in libata-scsi.c) but was never passed down due to lack of handling in EH.
2014-07-15 6:28 ` [PATCH] libata-eh.c should handle AMNF error condition (error byte bit 0, usually code 0x01) in libata-eh.c along with UNC as a media error so SCSI stack can handle it properly (translation code 0x01 is already present in libata-scsi.c) but was never passed down due to lack of handling in EH Alexey Asemov
@ 2014-07-15 15:16 ` Tejun Heo
0 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2014-07-15 15:16 UTC (permalink / raw)
To: Alexey Asemov; +Cc: linux-ide, linux-kernel
On Tue, Jul 15, 2014 at 10:28:42AM +0400, Alexey Asemov wrote:
> While using linux-based machine (AMD 6550M-based notebook, PCI IDs for the
> controller are 1022:7801 subsys 1025:059d) and ddrescue to salvage data
> from failing hard drive (WD7500BPVT 2.5" 750G SATA2), I've found that pure
> AMNF 0x01 error code generates generic "device error" that is retried
> several times by SCSI stack instead of "media error" that is passed up to
> software.
>
> So we may assume deprecated AMNF error code is surely not dead yet, and
> it's better for it to be handled properly. As we may see it is used by
> modern enough devices, and used properly: drive returned AMNF only when IDs
> for track cannot be read completely due to dying head or positioning,
> otherwise it returned UNC(orrectables).
>
> Not handling it causes wrong generic error code ("device error") reporting
> down the stack, can damage failing drives further because of excessive
> retries, and slows salvaging down a lot. Also, there is handling code in
> libata-scsi.c for 0x01 AMNF error already.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=80031
>
> Signed-off-by: Alexey Asemov <alex@alex-at.ru>
Applied to libata/for-3.16-fixes w/ shortened $SUBJ (moved to the
first paragraph).
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-07-15 15:16 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-12 8:22 [PATCH] libata: handle IDE AMNF errors in the EH path Alexey Asemov
2014-07-14 16:10 ` Tejun Heo
2014-07-14 21:16 ` Alexey Asemov (Alex/AT)
2014-07-15 6:28 ` [PATCH] libata-eh.c should handle AMNF error condition (error byte bit 0, usually code 0x01) in libata-eh.c along with UNC as a media error so SCSI stack can handle it properly (translation code 0x01 is already present in libata-scsi.c) but was never passed down due to lack of handling in EH Alexey Asemov
2014-07-15 15:16 ` Tejun Heo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).