linux-block.vger.kernel.org archive mirror
* [PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ
@ 2017-04-03 12:03 Dmitry Monakhov
  2017-04-03 12:03 ` [PATCH 2/2] block: Improve error handling verbosity Dmitry Monakhov
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Dmitry Monakhov @ 2017-04-03 12:03 UTC (permalink / raw)
  To: linux-block; +Cc: linux-scsi, Dmitry Monakhov

It is quite easy to hit an unrecovered read error (URE) after a power
failure and get a scary message. A URE happens when the drive's internal
CRC does not match because a sector was only partially updated. Most
people interpret such a message as "My drive is dying", which is a
reasonable assumption if your dmesg is full of complaints from disks and
read(2) returns EIO. In fact this error is not fatal. One can fix it easily
by rewriting the affected sector.

So we have to handle URE as follows:
- Return EILSEQ to signal to the caller that this is a bad-data problem.
- Do not retry the command, because retrying is useless.
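
As an illustration of what the distinct return value lets a caller do, here
is a minimal userspace sketch in C. It is not part of the patch: the enum and
the names classify_read_errno() and read_block() are hypothetical, and it
assumes that pread(2) on the device reports EILSEQ for a URE once this change
is applied. Short reads are not handled.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical verdicts, used only in this sketch. */
enum read_verdict {
	READ_OK,	/* the read returned data */
	READ_BAD_DATA,	/* EILSEQ: sector content is bad, a rewrite may fix it */
	READ_IO_ERROR,	/* EIO or anything else: treat as a device problem */
};

/* Map an errno value from a failed read to a verdict. */
static enum read_verdict classify_read_errno(int err)
{
	switch (err) {
	case 0:
		return READ_OK;
	case EILSEQ:
		return READ_BAD_DATA;
	default:
		return READ_IO_ERROR;
	}
}

/* Read len bytes at offset off and report how the caller might react. */
static enum read_verdict read_block(int fd, void *buf, size_t len, off_t off)
{
	ssize_t n = pread(fd, buf, len, off);

	return classify_read_errno(n < 0 ? errno : 0);
}
```

A caller that gets READ_BAD_DATA could restore the affected range from a
replica or backup and rewrite it, instead of treating the whole disk as
failed.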



### Test case
# Test uses two HDDs: sdb and sdc
# Write phase
# let fio work ~100sec and then cut the power
fio --ioengine=libaio --direct=1 --rw=write --bs=1M --iodepth=16 \
--time_based=1 --runtime=600 --filesize=1G --size=1T \
--name /dev/sdb --name /dev/sdc

# Check phase, after the system comes back up
fio --ioengine=libaio --direct=1 --group_reporting --rw=read --bs=1M \
--iodepth=16 --size=1G --filesize=1G \
--name=/dev/sdb --name /dev/sdc

More info about URE probability here:
https://plus.google.com/101761226576930717211/posts/Pctq7kk1dLL

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 drivers/scsi/scsi_lib.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 19125d7..59d64ad 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -961,6 +961,19 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 			/* See SSC3rXX or current. */
 			action = ACTION_FAIL;
 			break;
+		case MEDIUM_ERROR:
+			if (sshdr.asc == 0x11) {
+				/* Handle unrecovered read error */
+				switch (sshdr.ascq) {
+				case 0x00: /* URE */
+				case 0x04: /* URE auto reallocate failed */
+				case 0x0B: /* URE recommend reassignment*/
+				case 0x0C: /* URE recommend rewrite the data */
+					action = ACTION_FAIL;
+					error = -EILSEQ;
+					break;
+				}
+			}
 		default:
 			action = ACTION_FAIL;
 			break;
-- 
2.9.3
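
The sense-data match in the hunk above can be sketched as a standalone
predicate. This is an illustration only, not kernel code: the function name
and the two macros are hypothetical, and 0x03 is assumed to be the value of
the MEDIUM ERROR sense key that the kernel's MEDIUM_ERROR constant stands for.
The ASC and ASCQ values are the ones listed in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_MEDIUM_ERROR		0x03	/* assumed value of the MEDIUM ERROR sense key */
#define ASC_UNRECOVERED_READ	0x11	/* ASC checked by the patch */

/* Return true for the sense combinations the patch maps to -EILSEQ. */
static bool is_unrecovered_read_error(uint8_t sense_key, uint8_t asc,
				      uint8_t ascq)
{
	if (sense_key != SK_MEDIUM_ERROR || asc != ASC_UNRECOVERED_READ)
		return false;

	switch (ascq) {
	case 0x00:	/* URE */
	case 0x04:	/* URE, auto reallocate failed */
	case 0x0B:	/* URE, recommend reassignment */
	case 0x0C:	/* URE, recommend rewrite the data */
		return true;
	default:
		return false;
	}
}
```

Any other ASCQ under ASC 0x11 is left alone, as in the patch, where it keeps
whatever error value was already set.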


* [PATCH 2/2] block: Improve error handling verbosity
  2017-04-03 12:03 [PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ Dmitry Monakhov
@ 2017-04-03 12:03 ` Dmitry Monakhov
  2017-04-20  5:37   ` Christoph Hellwig
  2017-04-04  6:57 ` [PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ Christoph Hellwig
  2017-04-20  5:37 ` Christoph Hellwig
  2 siblings, 1 reply; 5+ messages in thread
From: Dmitry Monakhov @ 2017-04-03 12:03 UTC (permalink / raw)
  To: linux-block; +Cc: linux-scsi, Dmitry Monakhov

EILSEQ is returned on an internal checksum error on the disk or fabric,
so add a dedicated message to distinguish it from other errors. Also
print the original numerical error code.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 block/blk-core.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 071a998..8eab846 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2576,13 +2576,16 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 		case -ENODATA:
 			error_type = "critical medium";
 			break;
+		case -EILSEQ:
+			error_type = "bad data";
+			break;
 		case -EIO:
 		default:
 			error_type = "I/O";
 			break;
 		}
-		printk_ratelimited(KERN_ERR "%s: %s error, dev %s, sector %llu\n",
-				   __func__, error_type, req->rq_disk ?
+		printk_ratelimited(KERN_ERR "%s: %s error (%d), dev %s, sector %llu\n",
+				   __func__, error_type, error, req->rq_disk ?
 				   req->rq_disk->disk_name : "?",
 				   (unsigned long long)blk_rq_pos(req));
 
-- 
2.9.3
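
For readers who want to see the resulting message shape outside the kernel,
here is a minimal userspace sketch. It is not the kernel implementation: the
function names error_type_label() and format_error() are hypothetical, only
the three cases visible in the hunk above are reproduced, and the __func__
prefix of the real message is left out.

```c
#include <errno.h>
#include <stdio.h>

/* Map a negative errno to the label used in the log message. */
static const char *error_type_label(int error)
{
	switch (error) {
	case -ENODATA:
		return "critical medium";
	case -EILSEQ:
		return "bad data";
	case -EIO:
	default:
		return "I/O";
	}
}

/* Format "<label> error (<errno>), dev <disk>, sector <n>" into buf. */
static int format_error(char *buf, size_t len, int error,
			const char *disk, unsigned long long sector)
{
	return snprintf(buf, len, "%s error (%d), dev %s, sector %llu",
			error_type_label(error), error,
			disk ? disk : "?", sector);
}
```

With the numeric code in the message, an error that falls into the default
"I/O" label can still be told apart from a plain -EIO when reading dmesg.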


* Re: [PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ
  2017-04-03 12:03 [PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ Dmitry Monakhov
  2017-04-03 12:03 ` [PATCH 2/2] block: Improve error handling verbosity Dmitry Monakhov
@ 2017-04-04  6:57 ` Christoph Hellwig
  2017-04-20  5:37 ` Christoph Hellwig
  2 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2017-04-04  6:57 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-block, linux-scsi

I'm planning to introduce a new block-layer-specific status code ASAP,
so I'd prefer not to add new errno special cases.

I'll port your patches to the new code and will send them out with
my series in a few days, though.


* Re: [PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ
  2017-04-03 12:03 [PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ Dmitry Monakhov
  2017-04-03 12:03 ` [PATCH 2/2] block: Improve error handling verbosity Dmitry Monakhov
  2017-04-04  6:57 ` [PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ Christoph Hellwig
@ 2017-04-20  5:37 ` Christoph Hellwig
  2 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2017-04-20  5:37 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-block, linux-scsi

Looks like I won't get the major error status changes into 4.12,
so let's go with these patches for now:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 2/2] block: Improve error handling verbosity
  2017-04-03 12:03 ` [PATCH 2/2] block: Improve error handling verbosity Dmitry Monakhov
@ 2017-04-20  5:37   ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2017-04-20  5:37 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-block, linux-scsi

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

