public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* T10-PI: Getting failed tag info
@ 2014-12-11  4:58 Vladislav Bolkhovitin
  2014-12-12  3:12 ` Martin K. Petersen
  0 siblings, 1 reply; 10+ messages in thread
From: Vladislav Bolkhovitin @ 2014-12-11  4:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: Martin K. Petersen

Hi,

We are currently developing a SCSI target system with T10-PI. We are using block integrity interface and found a problem that this interface fundamentally can not pass Oracle T10-PI certification tests. Those tests require to receive on the initiator side information about which particular tag failed the target checks, but the block integrity interface does not preserve this information, hence the target can not deliver it to the initiator => certification failure. The storage provides the right sense, but then in scsi_io_completion() it is dropped and replaced by a single EILSEQ.

What would be the best way to fix that? By making a patch introducing new -EXXXXXX error codes for the PI errors?

Thanks,
Vlad

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: T10-PI: Getting failed tag info
  2014-12-11  4:58 T10-PI: Getting failed tag info Vladislav Bolkhovitin
@ 2014-12-12  3:12 ` Martin K. Petersen
  2014-12-12 23:15   ` Nicholas A. Bellinger
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Martin K. Petersen @ 2014-12-12  3:12 UTC (permalink / raw)
  To: Vladislav Bolkhovitin; +Cc: linux-scsi, Martin K. Petersen

>>>>> "Vlad" == Vladislav Bolkhovitin <vst@vlnb.net> writes:

Vlad,

Vlad> We are currently developing a SCSI target system with T10-PI. We
Vlad> are using block integrity interface and found a problem that this
Vlad> interface fundamentally can not pass Oracle T10-PI certification
Vlad> tests. Those tests require to receive on the initiator side
Vlad> information about which particular tag failed the target checks,
Vlad> but the block integrity interface does not preserve this
Vlad> information, hence the target can not deliver it to the initiator
Vlad> => certification failure. The storage provides the right sense,
Vlad> but then in scsi_io_completion() it is dropped and replaced by a
Vlad> single EILSEQ.

Vlad> What would be the best way to fix that? By making a patch
Vlad> introducing new -EXXXXXX error codes for the PI errors?

I posted such a patch a while back. We use that in our qualification
tooling to ensure that the right things are reported when a PI error is
injected at various places in the stack.

One thing that needs to be done is to make returning these new errors to
userland conditional on !BIP_BLOCK_INTEGRITY. I'll put that on my list.

-- 
Martin K. Petersen	Oracle Linux Engineering


block: Add specific data integrity errors

Introduce a set of error codes that can be used by the block integrity
subsystem to signal which class of error was encountered by either the
I/O controller or the storage device.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/block/blk-core.c b/block/blk-core.c
index 6f8dba161bfe..bde1de440b03 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2442,6 +2442,18 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 		case -ENODATA:
 			error_type = "critical medium";
 			break;
+		case -ECTRLGRD:
+		case -ECTRLAPP:
+		case -ECTRLREF:
+		case -EDISKGRD:
+		case -EDISKAPP:
+		case -EDISKREF:
+		case -EKERNGRD:
+		case -EKERNAPP:
+		case -EKERNREF:
+		case -EILSEQ:
+			error_type = "data integrity";
+			break;
 		case -EIO:
 		default:
 			error_type = "I/O";
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index f4167b013d99..c05e86f4dded 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1210,6 +1210,15 @@ static int noretry_error(int error)
 	case -EOPNOTSUPP:
 	case -EREMOTEIO:
 	case -EILSEQ:
+	case -ECTRLGRD:
+	case -ECTRLAPP:
+	case -ECTRLREF:
+	case -EDISKGRD:
+	case -EDISKAPP:
+	case -EDISKREF:
+	case -EKERNGRD:
+	case -EKERNAPP:
+	case -EKERNREF:
 	case -ENODATA:
 	case -ENOSPC:
 		return 1;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 85cf0ef843f6..9f9e5770606b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -837,7 +837,20 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 				action = ACTION_REPREP;
 			} else if (sshdr.asc == 0x10) /* DIX */ {
 				action = ACTION_FAIL;
-				error = -EILSEQ;
+				switch (sshdr.ascq) {
+				case 0x1:
+					error = -ECTRLGRD;
+					break;
+				case 0x2:
+					error = -ECTRLAPP;
+					break;
+				case 0x3:
+					error = -ECTRLREF;
+					break;
+				default:
+					error = -EILSEQ;
+					break;
+				}
 			/* INVALID COMMAND OPCODE or INVALID FIELD IN CDB */
 			} else if (sshdr.asc == 0x20 || sshdr.asc == 0x24) {
 				action = ACTION_FAIL;
@@ -847,8 +860,22 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 			break;
 		case ABORTED_COMMAND:
 			action = ACTION_FAIL;
-			if (sshdr.asc == 0x10) /* DIF */
-				error = -EILSEQ;
+			if (sshdr.asc == 0x10) { /* DIF */
+				switch (sshdr.ascq) {
+				case 0x1:
+					error = -EDISKGRD;
+					break;
+				case 0x2:
+					error = -EDISKAPP;
+					break;
+				case 0x3:
+					error = -EDISKREF;
+					break;
+				default:
+					error = -EILSEQ;
+					break;
+				}
+			}
 			break;
 		case NOT_READY:
 			/* If the device is in the process of becoming
diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
index 1e1ea6e6e7a5..d89af4ba1e04 100644
--- a/include/uapi/asm-generic/errno.h
+++ b/include/uapi/asm-generic/errno.h
@@ -110,4 +110,15 @@
 
 #define EHWPOISON	133	/* Memory page has hardware error */
 
+/* data integrity errors */
+#define ECTRLGRD	134	/* I/O controller detected guard tag error */
+#define ECTRLAPP	135	/* I/O controller detected app tag error */
+#define ECTRLREF	136	/* I/O controller detected ref tag error */
+#define EDISKGRD	137	/* Storage device detected guard tag error */
+#define EDISKAPP	138	/* Storage device detected app tag error */
+#define EDISKREF	139	/* Storage device detected ref tag error */
+#define EKERNGRD	140	/* Kernel detected guard tag error */
+#define EKERNAPP	141	/* Kernel detected app tag error */
+#define EKERNREF	142	/* Kernel detected ref tag error */
+
 #endif

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: T10-PI: Getting failed tag info
  2014-12-12  3:12 ` Martin K. Petersen
@ 2014-12-12 23:15   ` Nicholas A. Bellinger
  2014-12-16  0:30     ` Martin K. Petersen
  2014-12-13  3:57   ` Vladislav Bolkhovitin
  2014-12-15  8:18   ` Christoph Hellwig
  2 siblings, 1 reply; 10+ messages in thread
From: Nicholas A. Bellinger @ 2014-12-12 23:15 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Vladislav Bolkhovitin, linux-scsi

On Thu, 2014-12-11 at 22:12 -0500, Martin K. Petersen wrote:
> >>>>> "Vlad" == Vladislav Bolkhovitin <vst@vlnb.net> writes:
> 
> Vlad,
> 
> Vlad> We are currently developing a SCSI target system with T10-PI. We
> Vlad> are using block integrity interface and found a problem that this
> Vlad> interface fundamentally can not pass Oracle T10-PI certification
> Vlad> tests. Those tests require to receive on the initiator side
> Vlad> information about which particular tag failed the target checks,
> Vlad> but the block integrity interface does not preserve this
> Vlad> information, hence the target can not deliver it to the initiator
> Vlad> => certification failure. The storage provides the right sense,
> Vlad> but then in scsi_io_completion() it is dropped and replaced by a
> Vlad> single EILSEQ.
> 
> Vlad> What would be the best way to fix that? By making a patch
> Vlad> introducing new -EXXXXXX error codes for the PI errors?
> 
> I posted such a patch a while back. We use that in our qualification
> tooling to ensure that the right things are reported when a PI error is
> injected at various places in the stack.
> 
> One thing that needs to be done is to make returning these new errors to
> userland conditional on !BIP_BLOCK_INTEGRITY. I'll put that on my list.
> 

Any chance this patch will be merged up soon..? 

I still need the same logic for IBLOCK backends to propagate up the
correct sense codes from bio completion in the event of GUARD check or
REF tag failure.

Also, AFAICT this patch still doesn't set the failed LBA, right..?
Any thoughts about the most sane way to get the failed LBA information
back to a caller like IBLOCK performing submission with bio + bip..?

--nab






^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: T10-PI: Getting failed tag info
  2014-12-12  3:12 ` Martin K. Petersen
  2014-12-12 23:15   ` Nicholas A. Bellinger
@ 2014-12-13  3:57   ` Vladislav Bolkhovitin
  2014-12-16  0:32     ` Martin K. Petersen
  2014-12-15  8:18   ` Christoph Hellwig
  2 siblings, 1 reply; 10+ messages in thread
From: Vladislav Bolkhovitin @ 2014-12-13  3:57 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi

Martin K. Petersen wrote on 12/11/2014 07:12 PM:
>>>>>> "Vlad" == Vladislav Bolkhovitin <vst@vlnb.net> writes:
> Vlad> We are currently developing a SCSI target system with T10-PI. We
> Vlad> are using block integrity interface and found a problem that this
> Vlad> interface fundamentally can not pass Oracle T10-PI certification
> Vlad> tests. Those tests require to receive on the initiator side
> Vlad> information about which particular tag failed the target checks,
> Vlad> but the block integrity interface does not preserve this
> Vlad> information, hence the target can not deliver it to the initiator
> Vlad> => certification failure. The storage provides the right sense,
> Vlad> but then in scsi_io_completion() it is dropped and replaced by a
> Vlad> single EILSEQ.
> 
> Vlad> What would be the best way to fix that? By making a patch
> Vlad> introducing new -EXXXXXX error codes for the PI errors?
> 
> I posted such a patch a while back. We use that in our qualification
> tooling to ensure that the right things are reported when a PI error is
> injected at various places in the stack.

Thanks, this is exactly what is needed.

Reviewed-by: Vladislav Bolkhovitin <vst@vlnb.net>

> One thing that needs to be done is to make returning these new errors to
> userland conditional on !BIP_BLOCK_INTEGRITY. I'll put that on my list.

Ever without it this patch is quite valuable.

Vlad



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: T10-PI: Getting failed tag info
  2014-12-12  3:12 ` Martin K. Petersen
  2014-12-12 23:15   ` Nicholas A. Bellinger
  2014-12-13  3:57   ` Vladislav Bolkhovitin
@ 2014-12-15  8:18   ` Christoph Hellwig
  2014-12-16  0:45     ` Martin K. Petersen
  2 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2014-12-15  8:18 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Vladislav Bolkhovitin, linux-scsi

On Thu, Dec 11, 2014 at 10:12:11PM -0500, Martin K. Petersen wrote:
> block: Add specific data integrity errors
> 
> Introduce a set of error codes that can be used by the block integrity
> subsystem to signal which class of error was encountered by either the
> I/O controller or the storage device.

I really don't like adding new errno codes for all these.  I'd much
rather have a integrity error field with specific codes in the bio.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: T10-PI: Getting failed tag info
  2014-12-12 23:15   ` Nicholas A. Bellinger
@ 2014-12-16  0:30     ` Martin K. Petersen
  0 siblings, 0 replies; 10+ messages in thread
From: Martin K. Petersen @ 2014-12-16  0:30 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Martin K. Petersen, Vladislav Bolkhovitin, linux-scsi

>>>>> "Nic" == Nicholas A Bellinger <nab@linux-iscsi.org> writes:

Nic> Also, AFAICT this patch still doesn't set the failed LBA, right..?
Nic> Any thoughts about the most sane way to get the failed LBA
Nic> information back to a caller like IBLOCK performing submission with
Nic> bio + bip..?

That's correct. My original patch did this.

More on that in my response to hch...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: T10-PI: Getting failed tag info
  2014-12-13  3:57   ` Vladislav Bolkhovitin
@ 2014-12-16  0:32     ` Martin K. Petersen
  0 siblings, 0 replies; 10+ messages in thread
From: Martin K. Petersen @ 2014-12-16  0:32 UTC (permalink / raw)
  To: Vladislav Bolkhovitin; +Cc: Martin K. Petersen, linux-scsi

>>>>> "Vlad" == Vladislav Bolkhovitin <vst@vlnb.net> writes:

>> One thing that needs to be done is to make returning these new errors
>> to userland conditional on !BIP_BLOCK_INTEGRITY. I'll put that on my
>> list.

Vlad> Ever without it this patch is quite valuable.

Well, the concern here is that the new magic errors bubble up to
userland applications that only check for EIO/EINTR.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: T10-PI: Getting failed tag info
  2014-12-15  8:18   ` Christoph Hellwig
@ 2014-12-16  0:45     ` Martin K. Petersen
  2014-12-30 12:15       ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Martin K. Petersen @ 2014-12-16  0:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Martin K. Petersen, Vladislav Bolkhovitin, linux-scsi

>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:

Christoph> I really don't like adding new errno codes for all these.

This was mainly done to accommodate Darrick's work on aio extensions. If
these errors were forever trapped inside the kernel I would agree with
you but the plan is to make this generally applicable.

Christoph> I'd much rather have a integrity error field with specific
Christoph> codes in the bio.

My original device qualification code did this. I also had one that
included a field for the offending LBA but that appears to be lost in
the mist of time.

(Patch against a 2.6.3x era kernel so will need some massaging).

diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index ba48f0b..716285a 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -757,6 +757,7 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
 	bip->bip_vcnt = bip_src->bip_vcnt;
 	bip->bip_idx = bip_src->bip_idx;
 	bip->bip_flags = bip_src->bip_flags;
+	bip->bip_completion = bip_src->bip_completion;
 
 	return 0;
 }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1aa9ee4..b4c08b8 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -163,6 +163,21 @@ static inline int bio_has_allocated_vec(struct bio *bio)
 #define bio_get(bio)	atomic_inc(&(bio)->bi_cnt)
 
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
+
+enum bio_integrity_errors {
+	BIP_ERR_NONE = 0,	/* no error */
+	BIP_ERR_CTRL_GUARD,	/* controller detected guard tag error */
+	BIP_ERR_CTRL_APP,	/* controller detected app tag error */
+	BIP_ERR_CTRL_REF,	/* controller detected ref tag error */
+	BIP_ERR_DISK_GUARD,	/* disk detected guard tag error */
+	BIP_ERR_DISK_APP,	/* disk detected app tag error */
+	BIP_ERR_DISK_REF,	/* disk detected ref tag error */
+};
+
+struct bio_integrity_completion {
+	enum bio_integrity_errors	error;
+};
+
 /*
  * bio integrity payload
  */
@@ -173,13 +188,14 @@ struct bio_integrity_payload {
 
 	void			*bip_buf;	/* generated integrity data */
 	bio_end_io_t		*bip_end_io;	/* saved I/O completion fn */
+	struct bio_integrity_completion	*bip_completion; /* I/O completion */
 
 	unsigned int		bip_size;
 
 	unsigned short		bip_slab;	/* slab the bip came from */
 	unsigned short		bip_vcnt;	/* # of integrity bio_vecs */
 	unsigned short		bip_idx;	/* current bip_vec index */
-	unsigned short		bip_flags;	/* control and status flags */
+	unsigned short		bip_flags;	/* control flags */
 
 	struct work_struct	bip_work;	/* I/O completion */
 	struct bio_vec		bip_vec[0];	/* embedded bvec array */

The thing that bugged me about this approach was the shared completion
mask for all clones. I felt that was a bit icky. However, it seemed like
a major hassle to have this be clone-private and have to register endio
handlers for each and every clone to get status bubbled up. That was
something that the switch to errnos handled much more elegantly.
However, if we want to have accurate offset reporting we will need to do
the completion struct thing.

Open to ideas. The many-clones-to-one-completion-status issue isn't
entirely trivial to tackle.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: T10-PI: Getting failed tag info
  2014-12-16  0:45     ` Martin K. Petersen
@ 2014-12-30 12:15       ` Christoph Hellwig
  2015-01-06 23:49         ` Martin K. Petersen
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2014-12-30 12:15 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Christoph Hellwig, Vladislav Bolkhovitin, linux-scsi

On Mon, Dec 15, 2014 at 07:45:39PM -0500, Martin K. Petersen wrote:
> >>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:
> 
> Christoph> I really don't like adding new errno codes for all these.
> 
> This was mainly done to accommodate Darrick's work on aio extensions. If
> these errors were forever trapped inside the kernel I would agree with
> you but the plan is to make this generally applicable.

But those extensions require new structures to pass in the PI info
anyway..


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: T10-PI: Getting failed tag info
  2014-12-30 12:15       ` Christoph Hellwig
@ 2015-01-06 23:49         ` Martin K. Petersen
  0 siblings, 0 replies; 10+ messages in thread
From: Martin K. Petersen @ 2015-01-06 23:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Martin K. Petersen, Vladislav Bolkhovitin, linux-scsi

>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:

Christoph> But those extensions require new structures to pass in the PI
Christoph> info anyway..

Yep. But I still don't see why that warrants having a parallel error
passing infrastructure in the kernel to handle data integrity
errors. What does that buy us?

I agree with not leaking any of the new errors to unsuspecting
applications and only passing them out if the caller has explicitly
enabled DIX.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-01-06 23:50 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-11  4:58 T10-PI: Getting failed tag info Vladislav Bolkhovitin
2014-12-12  3:12 ` Martin K. Petersen
2014-12-12 23:15   ` Nicholas A. Bellinger
2014-12-16  0:30     ` Martin K. Petersen
2014-12-13  3:57   ` Vladislav Bolkhovitin
2014-12-16  0:32     ` Martin K. Petersen
2014-12-15  8:18   ` Christoph Hellwig
2014-12-16  0:45     ` Martin K. Petersen
2014-12-30 12:15       ` Christoph Hellwig
2015-01-06 23:49         ` Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox