All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH]sd: Don't treat succeeded SYNC as error
@ 2016-04-25 10:36 Jinpu Wang
  2016-04-25 20:28 ` Bart Van Assche
  0 siblings, 1 reply; 3+ messages in thread
From: Jinpu Wang @ 2016-04-25 10:36 UTC (permalink / raw)
  To: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
	Christoph Hellwig
  Cc: s.parschauer

[-- Attachment #1: Type: text/plain, Size: 233 bytes --]

Hi, all

We hit IO error on fsync, it turns out was because sd treat succeeded
SYNC as error. From what I checked in SBC spec there is no indication
we should fail IO in this case, so we create this patch.


Best Regards,

Jack Wang

[-- Attachment #2: 0001-sd-Don-t-treat-succeeded-SYNC-as-error.patch --]
[-- Type: text/x-patch, Size: 1453 bytes --]

From 5d1f72d9643ce61cd9f3d312377378c43f171d0c Mon Sep 17 00:00:00 2001
From: Jack Wang <jinpu.wang@profitbricks.com>
Date: Mon, 25 Apr 2016 12:05:22 +0200
Subject: [PATCH] sd: Don't treat succeeded SYNC as error

We hit IO error in our production on multipath devices during resize
device on target side, the problem turns out sd driver passes up as IO
error when sense data is UNIT_ATTENTION and ASC && ASCQ indicate
Capacity data has changed, even storage side sync the data properly.

In order to fix this check in sd_done, report success if condition
matches.

Sebastian Parschauer report/analyze the bug here:
https://sourceforge.net/p/scst/mailman/message/34953416/

Signed-off-by: Sebastian Parschauer <s.parschauer@gmx.de>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 drivers/scsi/sd.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 5a5457a..e9bfe01 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1833,6 +1833,19 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 			}
 		}
 		break;
+	case UNIT_ATTENTION:
+		/* Capacity data has changed */
+		if (sshdr.asc == 0x2a && sshdr.ascq == 0x09) {
+			switch (op) {
+			/* don't treat succeeded fsync() as error */
+			case SYNCHRONIZE_CACHE:
+			case SYNCHRONIZE_CACHE_16:
+				if (good_bytes == scsi_bufflen(SCpnt))
+					SCpnt->result = 0;
+					break;
+			}
+		}
+		break;
 	default:
 		break;
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH]sd: Don't treat succeeded SYNC as error
  2016-04-25 10:36 [PATCH]sd: Don't treat succeeded SYNC as error Jinpu Wang
@ 2016-04-25 20:28 ` Bart Van Assche
  2016-04-26 10:57   ` Jinpu Wang
  0 siblings, 1 reply; 3+ messages in thread
From: Bart Van Assche @ 2016-04-25 20:28 UTC (permalink / raw)
  To: Jinpu Wang
  Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
	Christoph Hellwig, s.parschauer

On 04/25/2016 03:36 AM, Jinpu Wang wrote:
> We hit IO error on fsync, it turns out was because sd treat succeeded
> SYNC as error. From what I checked in SBC spec there is no indication
> we should fail IO in this case, so we create this patch.

Please follow the rules in Documentation/SubmittingPatches and submit 
patches inline. Regarding the patch itself: why is the switch(op) 
needed? Can it be left out? And regarding (sshdr.asc == 0x2a && 
sshdr.ascq == 0x09): which other unit attentions should cause SCSI 
commands to succeed instead of to fail?

Bart.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH]sd: Don't treat succeeded SYNC as error
  2016-04-25 20:28 ` Bart Van Assche
@ 2016-04-26 10:57   ` Jinpu Wang
  0 siblings, 0 replies; 3+ messages in thread
From: Jinpu Wang @ 2016-04-26 10:57 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Martin K. Petersen, James E.J. Bottomley,
	Christoph Hellwig, Sebastian Parschauer

On Mon, Apr 25, 2016 at 10:28 PM, Bart Van Assche
<bart.vanassche@sandisk.com> wrote:
> On 04/25/2016 03:36 AM, Jinpu Wang wrote:
>>
>> We hit IO error on fsync, it turns out was because sd treat succeeded
>> SYNC as error. From what I checked in SBC spec there is no indication
>> we should fail IO in this case, so we create this patch.
>
>
> Please follow the rules in Documentation/SubmittingPatches and submit
> patches inline. Regarding the patch itself: why is the switch(op) needed?
> Can it be left out? And regarding (sshdr.asc == 0x2a && sshdr.ascq == 0x09):
> which other unit attentions should cause SCSI commands to succeed instead of
> to fail?
>
> Bart.

Hi Bart,

Thanks for looking into this.
Sure, I can resubmit inline.
Because our test only cover this, unconditional treat other commands
without test may break other application.

At least for READ/WRITE operation, ML will requeue it because not all
bytes transfered:
[ 1794.236001] sd 1:0:0:0: Capacity data has changed
[ 1794.236141] sd 1:0:0:0: [sdb] tag#0 Done: SUCCESS Result:
hostbyte=DID_OK driverbyte=DRIVER_OK
[ 1794.236381] sd 1:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 00
00 00 01 00
[ 1794.236609] sd 1:0:0:0: [sdb] tag#0 Sense Key : Unit Attention [current]
[ 1794.236725] sd 1:0:0:0: [sdb] tag#0 Add. Sense: Capacity data has changed
[ 1794.236839] sd 1:0:0:0: [sdb] tag#0 scsi host busy 1 failed 0
[ 1794.236981] sd 1:0:0:0: Notifying upper driver of completion (result 8000002)
[ 1794.237096] sd 1:0:0:0: [sdb] tag#0 sd_done: completed 0 of 512 bytes
[ 1794.237210] sd 1:0:0:0: [sdb] tag#0 1 sectors total, 0 bytes done.
[ 1794.237325] sd 1:0:0:0: [sdb] tag#0 Inserting command
ffff88022b9d4780 into mlqueue
[ 1794.238631] sd 1:0:0:0: [sdb] sd_open
[ 1794.238745] sd 1:0:0:0: scsi_block_when_processing_errors: rtn: 1
[ 1794.238864] sd 1:0:0:0: scsi_block_when_processing_errors: rtn: 1
[ 1794.238989] sd 1:0:0:0: scsi_block_when_processing_errors: rtn: 1
[ 1794.239107] sd 1:0:0:0: unblocking device at zero depth
[ 1794.239224] sd 1:0:0:0: [sdb] tag#0 Send: scmd 0xffff88022b9d4780
[ 1794.239338] sd 1:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 00
00 00 01 00
[ 1794.239592] sd 1:0:0:0: [sdb] tag#1 Send: scmd 0xffff8802296af380
[ 1794.239675] sd 1:0:0:0: [sdb] tag#0 Done: SUCCESS Result:
hostbyte=DID_OK driverbyte=DRIVER_OK
[ 1794.239677] sd 1:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 00
00 00 01 00
[ 1794.239678] sd 1:0:0:0: [sdb] tag#0 scsi host busy 2 failed 0
[ 1794.239679] sd 1:0:0:0: Notifying upper driver of completion (result 0)

But for SYNC CACHE:

[   84.450906] sd 6:0:0:1: [sdc] sd_open
[   84.451008] sd 6:0:0:1: scsi_block_when_processing_errors: rtn: 1
[   84.451084] sd 6:0:0:1: [sdc] tag#1 Send: scmd 0xffff880210357080
[   84.451143] sd 6:0:0:1: [sdc] tag#1 CDB: Synchronize Cache(10) 35
00 00 00 00 00 00 00 00 00
[   84.456081] sd 6:0:0:1: [sdc] tag#0 Done: SUCCESS Result:
hostbyte=DID_OK driverbyte=DRIVER_OK
[   84.456209] sd 6:0:0:1: [sdc] tag#0 CDB: Read(10) 28 00 00 00 01 10
00 00 08 00
[   84.456295] sd 6:0:0:1: [sdc] tag#0 scsi host busy 2 failed 0
[   84.456374] sd 6:0:0:1: Notifying upper driver of completion (result 0)
[   84.456438] sd 6:0:0:1: [sdc] tag#0 sd_done: completed 4096 of 4096 bytes
[   84.456498] sd 6:0:0:1: [sdc] tag#0 8 sectors total, 4096 bytes done.
[   84.456595] sd 6:0:0:1: Capacity data has changed
[   84.456612] sd 6:0:0:1: [sdc] sd_setup_read_write_cmnd: block=280, count=8
[   84.456616] sd 6:0:0:1: [sdc] block=280
[   84.456620] sd 6:0:0:1: [sdc] reading 8/8 512 byte blocks.
[   84.456628] sd 6:0:0:1: [sdc] tag#0 Send: scmd 0xffff8800ca41ed80
[   84.456634] sd 6:0:0:1: [sdc] tag#0 CDB: Read(10) 28 00 00 00 01 18
00 00 08 00
[   84.457003] sd 6:0:0:1: [sdc] tag#1 Done: SUCCESS Result:
hostbyte=DID_OK driverbyte=DRIVER_OK
[   84.457090] sd 6:0:0:1: [sdc] tag#1 CDB: Synchronize Cache(10) 35
00 00 00 00 00 00 00 00 00
[   84.457211] sd 6:0:0:1: [sdc] tag#1 Sense Key : Unit Attention [current]
[   84.457272] sd 6:0:0:1: [sdc] tag#1 Add. Sense: Capacity data has changed
[   84.457331] sd 6:0:0:1: [sdc] tag#1 scsi host busy 2 failed 0
[   84.457387] sd 6:0:0:1: Notifying upper driver of completion (result 8000002)
[   84.457445] sd 6:0:0:1: [sdc] tag#1 sd_done: completed 0 of 0 bytes
[   84.457503] sd 6:0:0:1: [sdc] tag#1 0 sectors total, 0 bytes done.
[   84.457562] blk_update_request: I/O error, dev sdc, sector 0

It completed with error directly.
We don't have failure case with our UA yet.

Thanks,
Jack

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-04-26 10:58 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-25 10:36 [PATCH]sd: Don't treat succeeded SYNC as error Jinpu Wang
2016-04-25 20:28 ` Bart Van Assche
2016-04-26 10:57   ` Jinpu Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.