Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
       [not found] <bug-9405-10286@http.bugzilla.kernel.org/>
@ 2007-11-19 20:50 ` Andrew Morton
  2007-11-19 20:56   ` James Bottomley
  2007-11-19 21:15   ` Mike Christie
  0 siblings, 2 replies; 18+ messages in thread
From: Andrew Morton @ 2007-11-19 20:50 UTC
  To: linux-scsi; +Cc: bugme-daemon, bart.vanassche

On Mon, 19 Nov 2007 05:44:01 -0800 (PST)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9405
> 
>            Summary: iSCSI does not implement ordering guarantees required by
>                     e.g. journaling filesystems
>            Product: IO/Storage
>            Version: 2.5
>      KernelVersion: 2.6.23.1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: SCSI
>         AssignedTo: io_scsi@kernel-bugs.osdl.org
>         ReportedBy: bart.vanassche@gmail.com
> 
> 
> Most recent kernel where this bug did not occur: (new issue)
> Distribution: any
> Hardware Environment: (does not apply)
> Software Environment: (does not apply) 
> Problem Description: The sd (SCSI disk) driver ignores block device barriers
> (REQ_HARDBARRIER). The iSCSI code in the kernel sends all iSCSI commands with
> flag ISCSI_ATTR_SIMPLE to the iSCSI target. This means that the target may
> reorder these commands. Since a.o. correct operation of journaling filesystems
> depends on being able to enforce the order of certain block write operations,
> not enforcing write ordering is a bug. This can be solved by either adding
> support for REQ_HARDBARRIER in the sd device or by replacing ISCSI_ATTR_SIMPLE
> by ISCSI_ATTR_ORDERED.
> 
> Steps to reproduce: Source reading of drivers/scsi/sd.c and
> drivers/scsi/libiscsi.c.
> 
> References: SCSI Architecture Model - 3, paragraph 8.6
> (http://www.t10.org/ftp/t10/drafts/sam3/sam3r14.pdf).
> 

(does iscsi have a maintainer?)

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-19 20:50 ` [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems Andrew Morton
@ 2007-11-19 20:56   ` James Bottomley
  2007-11-19 21:22     ` Mike Christie
  2007-11-19 21:15   ` Mike Christie
  1 sibling, 1 reply; 18+ messages in thread
From: James Bottomley @ 2007-11-19 20:56 UTC
  To: Andrew Morton; +Cc: linux-scsi, bugme-daemon, bart.vanassche


On Mon, 2007-11-19 at 12:50 -0800, Andrew Morton wrote:
> On Mon, 19 Nov 2007 05:44:01 -0800 (PST)
> bugme-daemon@bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=9405
> > 
> >            Summary: iSCSI does not implement ordering guarantees required by
> >                     e.g. journaling filesystems
> >            Product: IO/Storage
> >            Version: 2.5
> >      KernelVersion: 2.6.23.1
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: high
> >           Priority: P1
> >          Component: SCSI
> >         AssignedTo: io_scsi@kernel-bugs.osdl.org
> >         ReportedBy: bart.vanassche@gmail.com
> > 
> > 
> > Most recent kernel where this bug did not occur: (new issue)
> > Distribution: any
> > Hardware Environment: (does not apply)
> > Software Environment: (does not apply) 
> > Problem Description: The sd (SCSI disk) driver ignores block device barriers
> > (REQ_HARDBARRIER). The iSCSI code in the kernel sends all iSCSI commands with
> > flag ISCSI_ATTR_SIMPLE to the iSCSI target. This means that the target may
> > reorder these commands. Since a.o. correct operation of journaling filesystems
> > depends on being able to enforce the order of certain block write operations,
> > not enforcing write ordering is a bug. This can be solved by either adding
> > support for REQ_HARDBARRIER in the sd device or by replacing ISCSI_ATTR_SIMPLE
> > by ISCSI_ATTR_ORDERED.
> > 
> > Steps to reproduce: Source reading of drivers/scsi/sd.c and
> > drivers/scsi/libiscsi.c.
> > 
> > References: SCSI Architecture Model - 3, paragraph 8.6
> > (http://www.t10.org/ftp/t10/drafts/sam3/sam3r14.pdf).
> > 
> 
> (does iscsi have a maintainer?)

Yes, mike christie

And please close this as invalid.  FS ordering guarantees in linux
aren't done via ordered tags.

James



* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-19 20:50 ` [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems Andrew Morton
  2007-11-19 20:56   ` James Bottomley
@ 2007-11-19 21:15   ` Mike Christie
  2007-11-19 21:18     ` Matthew Wilcox
  1 sibling, 1 reply; 18+ messages in thread
From: Mike Christie @ 2007-11-19 21:15 UTC
  To: Andrew Morton; +Cc: linux-scsi, bugme-daemon, bart.vanassche

[-- Attachment #1: Type: text/plain, Size: 1837 bytes --]

Andrew Morton wrote:
> On Mon, 19 Nov 2007 05:44:01 -0800 (PST)
> bugme-daemon@bugzilla.kernel.org wrote:
> 
>> http://bugzilla.kernel.org/show_bug.cgi?id=9405
>>
>>            Summary: iSCSI does not implement ordering guarantees required by
>>                     e.g. journaling filesystems
>>            Product: IO/Storage
>>            Version: 2.5
>>      KernelVersion: 2.6.23.1
>>           Platform: All
>>         OS/Version: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: high
>>           Priority: P1
>>          Component: SCSI
>>         AssignedTo: io_scsi@kernel-bugs.osdl.org
>>         ReportedBy: bart.vanassche@gmail.com
>>
>>
>> Most recent kernel where this bug did not occur: (new issue)
>> Distribution: any
>> Hardware Environment: (does not apply)
>> Software Environment: (does not apply) 
>> Problem Description: The sd (SCSI disk) driver ignores block device barriers
>> (REQ_HARDBARRIER). The iSCSI code in the kernel sends all iSCSI commands with
>> flag ISCSI_ATTR_SIMPLE to the iSCSI target. This means that the target may
>> reorder these commands. Since a.o. correct operation of journaling filesystems
>> depends on being able to enforce the order of certain block write operations,
>> not enforcing write ordering is a bug. This can be solved by either adding
>> support for REQ_HARDBARRIER in the sd device or by replacing ISCSI_ATTR_SIMPLE
>> by ISCSI_ATTR_ORDERED.
>>
>> Steps to reproduce: Source reading of drivers/scsi/sd.c and
>> drivers/scsi/libiscsi.c.
>>
>> References: SCSI Architecture Model - 3, paragraph 8.6
>> (http://www.t10.org/ftp/t10/drafts/sam3/sam3r14.pdf).
>>
> 
> (does iscsi have a maintainer?)

Attached is a patch to add me to the maintainers file so it will be 
easier to hunt me down in the future. It was made over 2.6.24-rc2.

[-- Attachment #2: add-mike-christie-to-maintainers.patch --]
[-- Type: text/x-patch, Size: 560 bytes --]

Add Mike Christie to MAINTAINERS file.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>

diff --git a/MAINTAINERS b/MAINTAINERS
index 1c7c229..82b5751 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2111,6 +2111,14 @@ L:	irda-users@lists.sourceforge.net (subscribers-only)
 W:	http://irda.sourceforge.net/
 S:	Maintained
 
+iSCSI
+P:	Mike Christie
+M:	michaelc@cs.wisc.edu
+L:	open-iscsi@googlegroups.com
+W:	www.open-iscsi.org
+T:	git kernel.org:/pub/scm/linux/kernel/mnc/linux-2.6-iscsi.git
+S:	Maintained
+
 ISAPNP
 P:	Jaroslav Kysela
 M:	perex@perex.cz

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-19 21:15   ` Mike Christie
@ 2007-11-19 21:18     ` Matthew Wilcox
  2007-11-19 21:24       ` Mike Christie
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2007-11-19 21:18 UTC
  To: Mike Christie; +Cc: Andrew Morton, linux-scsi, bugme-daemon, bart.vanassche

On Mon, Nov 19, 2007 at 03:15:22PM -0600, Mike Christie wrote:
> +iSCSI

It's traditional to use all-caps here, even when the normal
capitalisation is different.  so I think this should be "ISCSI".
Damned if I know why we have this convention, though.

> +P:	Mike Christie
> +M:	michaelc@cs.wisc.edu
> +L:	open-iscsi@googlegroups.com
> +W:	www.open-iscsi.org
> +T:	git kernel.org:/pub/scm/linux/kernel/mnc/linux-2.6-iscsi.git
> +S:	Maintained


-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-19 20:56   ` James Bottomley
@ 2007-11-19 21:22     ` Mike Christie
  2007-11-19 21:28       ` James Bottomley
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Christie @ 2007-11-19 21:22 UTC
  To: James Bottomley; +Cc: Andrew Morton, linux-scsi, bugme-daemon, bart.vanassche

[-- Attachment #1: Type: text/plain, Size: 2443 bytes --]

James Bottomley wrote:
> On Mon, 2007-11-19 at 12:50 -0800, Andrew Morton wrote:
>> On Mon, 19 Nov 2007 05:44:01 -0800 (PST)
>> bugme-daemon@bugzilla.kernel.org wrote:
>>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=9405
>>>
>>>            Summary: iSCSI does not implement ordering guarantees required by
>>>                     e.g. journaling filesystems
>>>            Product: IO/Storage
>>>            Version: 2.5
>>>      KernelVersion: 2.6.23.1
>>>           Platform: All
>>>         OS/Version: Linux
>>>               Tree: Mainline
>>>             Status: NEW
>>>           Severity: high
>>>           Priority: P1
>>>          Component: SCSI
>>>         AssignedTo: io_scsi@kernel-bugs.osdl.org
>>>         ReportedBy: bart.vanassche@gmail.com
>>>
>>>
>>> Most recent kernel where this bug did not occur: (new issue)
>>> Distribution: any
>>> Hardware Environment: (does not apply)
>>> Software Environment: (does not apply) 
>>> Problem Description: The sd (SCSI disk) driver ignores block device barriers
>>> (REQ_HARDBARRIER). The iSCSI code in the kernel sends all iSCSI commands with
>>> flag ISCSI_ATTR_SIMPLE to the iSCSI target. This means that the target may
>>> reorder these commands. Since a.o. correct operation of journaling filesystems
>>> depends on being able to enforce the order of certain block write operations,
>>> not enforcing write ordering is a bug. This can be solved by either adding
>>> support for REQ_HARDBARRIER in the sd device or by replacing ISCSI_ATTR_SIMPLE
>>> by ISCSI_ATTR_ORDERED.
>>>
>>> Steps to reproduce: Source reading of drivers/scsi/sd.c and
>>> drivers/scsi/libiscsi.c.
>>>
>>> References: SCSI Architecture Model - 3, paragraph 8.6
>>> (http://www.t10.org/ftp/t10/drafts/sam3/sam3r14.pdf).
>>>
>> (does iscsi have a maintainer?)
> 
> Yes, mike christie
> 
> And please close this as invalid.  FS ordering guarantees in linux
> aren't done via ordered tags.
> 

I had a related question. I was working on the attached patch for some
other testing (patch made against scsi-rc-fixes, but is not stable so do 
not apply), which does the scsi_populate_tag_msg conversion from MSG_* 
to ISCSI_ATTR and sets the proper iscsi bits.

If I do this patch where I call scsi_activate_tcq on a device and that 
conversion, does this require that my driver not reorder commands? I
was just a little worried on some of the error handling paths where we 
requeue commands to the mid layer.

[-- Attachment #2: add-tcq.patch --]
[-- Type: text/x-patch, Size: 2192 bytes --]

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 57ce225..d256cf3 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -37,6 +37,7 @@
 #include <linux/scatterlist.h>
 #include <net/tcp.h>
 #include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_tcq.h>
 #include <scsi/scsi_device.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi.h>
@@ -2211,8 +2212,17 @@ static void iscsi_tcp_session_destroy(struct iscsi_cls_session *cls_session)
 
 static int iscsi_tcp_slave_configure(struct scsi_device *sdev)
 {
+	int depth = 1, tag = 0;
+
 	blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY);
 	blk_queue_dma_alignment(sdev->request_queue, 0);
+
+	if (sdev->tagged_supported) {
+		scsi_activate_tcq(sdev, ISCSI_DEF_CMD_PER_LUN);
+		depth = ISCSI_DEF_CMD_PER_LUN;
+		tag = MSG_ORDERED_TAG;
+	}
+	scsi_adjust_queue_depth(sdev, tag, depth);
 	return 0;
 }
 
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 8b57af5..7e13a03 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -122,6 +122,27 @@ void iscsi_prep_unsolicit_data_pdu(struct iscsi_cmd_task *ctask,
 }
 EXPORT_SYMBOL_GPL(iscsi_prep_unsolicit_data_pdu);
 
+static uint32_t iscsi_command_attr(struct scsi_cmnd *cmd)
+{
+	unsigned int attr = ISCSI_ATTR_UNTAGGED;
+	char msg[2];
+
+	if (scsi_populate_tag_msg(cmd, msg) == 2) {
+		switch (msg[0]) {
+		case MSG_SIMPLE_TAG:
+			attr = ISCSI_ATTR_SIMPLE;
+			break;
+		case MSG_HEAD_TAG:
+			attr = ISCSI_ATTR_HEAD_OF_QUEUE;
+			break;
+		case MSG_ORDERED_TAG:
+			attr = ISCSI_ATTR_ORDERED;
+			break;
+		}
+	}
+	return attr;
+}
+
 /**
  * iscsi_prep_scsi_cmd_pdu - prep iscsi scsi cmd pdu
  * @ctask: iscsi cmd task
@@ -137,7 +158,8 @@ static void iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask)
 	struct scsi_cmnd *sc = ctask->sc;
 
         hdr->opcode = ISCSI_OP_SCSI_CMD;
-        hdr->flags = ISCSI_ATTR_SIMPLE;
+        hdr->flags = (iscsi_command_attr(sc) &
+						ISCSI_FLAG_CMD_ATTR_MASK);
         int_to_scsilun(sc->device->lun, (struct scsi_lun *)hdr->lun);
         hdr->itt = build_itt(ctask->itt, conn->id, session->age);
         hdr->data_length = cpu_to_be32(scsi_bufflen(sc));

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-19 21:18     ` Matthew Wilcox
@ 2007-11-19 21:24       ` Mike Christie
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Christie @ 2007-11-19 21:24 UTC
  To: Matthew Wilcox; +Cc: Andrew Morton, linux-scsi, bugme-daemon, bart.vanassche

[-- Attachment #1: Type: text/plain, Size: 322 bytes --]

Matthew Wilcox wrote:
> On Mon, Nov 19, 2007 at 03:15:22PM -0600, Mike Christie wrote:
>> +iSCSI
> 
> It's traditional to use all-caps here, even when the normal
> capitalisation is different.  so I think this should be "ISCSI".
> Damned if I know why we have this convention, though.
> 

Thanks. Here is an updated patch.

[-- Attachment #2: add-mike-christie-to-maintainers.patch --]
[-- Type: text/x-patch, Size: 591 bytes --]

Add Mike Christie to MAINTAINERS file.

v2 - used all caps for system

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>

diff --git a/MAINTAINERS b/MAINTAINERS
index 1c7c229..1fdbb72 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2111,6 +2111,14 @@ L:	irda-users@lists.sourceforge.net (subscribers-only)
 W:	http://irda.sourceforge.net/
 S:	Maintained
 
+ISCSI
+P:	Mike Christie
+M:	michaelc@cs.wisc.edu
+L:	open-iscsi@googlegroups.com
+W:	www.open-iscsi.org
+T:	git kernel.org:/pub/scm/linux/kernel/mnc/linux-2.6-iscsi.git
+S:	Maintained
+
 ISAPNP
 P:	Jaroslav Kysela
 M:	perex@perex.cz

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-19 21:22     ` Mike Christie
@ 2007-11-19 21:28       ` James Bottomley
  2007-11-20 15:04         ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 18+ messages in thread
From: James Bottomley @ 2007-11-19 21:28 UTC
  To: Mike Christie; +Cc: Andrew Morton, linux-scsi, bugme-daemon, bart.vanassche


On Mon, 2007-11-19 at 15:22 -0600, Mike Christie wrote:
> James Bottomley wrote:
> > On Mon, 2007-11-19 at 12:50 -0800, Andrew Morton wrote:
> >> On Mon, 19 Nov 2007 05:44:01 -0800 (PST)
> >> bugme-daemon@bugzilla.kernel.org wrote:
> >>
> >>> http://bugzilla.kernel.org/show_bug.cgi?id=9405
> >>>
> >>>            Summary: iSCSI does not implement ordering guarantees required by
> >>>                     e.g. journaling filesystems
> >>>            Product: IO/Storage
> >>>            Version: 2.5
> >>>      KernelVersion: 2.6.23.1
> >>>           Platform: All
> >>>         OS/Version: Linux
> >>>               Tree: Mainline
> >>>             Status: NEW
> >>>           Severity: high
> >>>           Priority: P1
> >>>          Component: SCSI
> >>>         AssignedTo: io_scsi@kernel-bugs.osdl.org
> >>>         ReportedBy: bart.vanassche@gmail.com
> >>>
> >>>
> >>> Most recent kernel where this bug did not occur: (new issue)
> >>> Distribution: any
> >>> Hardware Environment: (does not apply)
> >>> Software Environment: (does not apply) 
> >>> Problem Description: The sd (SCSI disk) driver ignores block device barriers
> >>> (REQ_HARDBARRIER). The iSCSI code in the kernel sends all iSCSI commands with
> >>> flag ISCSI_ATTR_SIMPLE to the iSCSI target. This means that the target may
> >>> reorder these commands. Since a.o. correct operation of journaling filesystems
> >>> depends on being able to enforce the order of certain block write operations,
> >>> not enforcing write ordering is a bug. This can be solved by either adding
> >>> support for REQ_HARDBARRIER in the sd device or by replacing ISCSI_ATTR_SIMPLE
> >>> by ISCSI_ATTR_ORDERED.
> >>>
> >>> Steps to reproduce: Source reading of drivers/scsi/sd.c and
> >>> drivers/scsi/libiscsi.c.
> >>>
> >>> References: SCSI Architecture Model - 3, paragraph 8.6
> >>> (http://www.t10.org/ftp/t10/drafts/sam3/sam3r14.pdf).
> >>>
> >> (does iscsi have a maintainer?)
> > 
> > Yes, mike christie
> > 
> > And please close this as invalid.  FS ordering guarantees in linux
> > aren't done via ordered tags.
> > 
> 
> > I had a related question. I was working on the attached patch for some
> other testing (patch made against scsi-rc-fixes, but is not stable so do 
> not apply), which does the scsi_populate_tag_msg conversion from MSG_* 
> to ISCSI_ATTR and sets the proper iscsi bits.
> 
> If I do this patch where I call scsi_activate_tcq on a device and that 
> > conversion, does this require that my driver not reorder commands? I
> was just a little worried on some of the error handling paths where we 
> requeue commands to the mid layer.

Right, there's no way of guaranteeing that commands aren't reordered in
the error path (or even the queue full submission path) which is why we
don't use ordered tags to enforce barriers.
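
To make the reordering concrete, here is a toy model of the QUEUE FULL
case. It is only a sketch, not kernel code, and every name in it is
hypothetical. It assumes the target rejects exactly one command with
QUEUE FULL and that the rejected command is resubmitted only after the
rest of the batch has been sent. Under that assumption the retried
command reaches the target after its successors, whatever task attribute
it carried:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CMDS 8

/* Hypothetical model: the order in which a target accepts commands. */
struct accept_log {
	int ids[MAX_CMDS];
	size_t count;
};

/*
 * Submit commands 0..n-1 in order.  The target answers QUEUE FULL exactly
 * once, to command reject_id; that command is requeued and resubmitted
 * after the rest of the batch has been sent (a simplifying assumption).
 */
static void submit_with_one_queue_full(struct accept_log *log, int n,
				       int reject_id)
{
	int requeued = -1;

	assert(n >= 0 && n <= MAX_CMDS);
	log->count = 0;
	for (int id = 0; id < n; id++) {
		if (id == reject_id) {
			requeued = id;	/* QUEUE FULL: back to the mid layer */
			continue;
		}
		log->ids[log->count++] = id;
	}
	if (requeued >= 0)
		log->ids[log->count++] = requeued;	/* retry lands last */
}

/* Returns 1 if the target saw the commands in submission order. */
static int accepted_in_order(const struct accept_log *log)
{
	for (size_t i = 1; i < log->count; i++)
		if (log->ids[i] < log->ids[i - 1])
			return 0;
	return 1;
}
```

Submitting four commands with command 1 rejected gives the acceptance
order 0, 2, 3, 1 in this model, so a barrier carried by command 1 would
no longer separate the commands around it.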

James



* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-19 21:28       ` James Bottomley
@ 2007-11-20 15:04         ` Vladislav Bolkhovitin
  2007-11-20 15:28           ` James Bottomley
  0 siblings, 1 reply; 18+ messages in thread
From: Vladislav Bolkhovitin @ 2007-11-20 15:04 UTC
  To: James Bottomley
  Cc: Mike Christie, Andrew Morton, linux-scsi, bugme-daemon,
	bart.vanassche

James Bottomley wrote:
>>>And please close this as invalid.  FS ordering guarantees in linux
>>>aren't done via ordered tags.
>>
>>I had a related question. I was working on the attached patch for some
>>other testing (patch made against scsi-rc-fixes, but is not stable so do 
>>not apply), which does the scsi_populate_tag_msg conversion from MSG_* 
>>to ISCSI_ATTR and sets the proper iscsi bits.
>>
>>If I do this patch where I call scsi_activate_tcq on a device and that 
>>conversion, does this require that my driver not reorder commands? I
>>was just a little worried on some of the error handling paths where we 
>>requeue commands to the mid layer.
> 
> Right, there's no way of guaranteeing that commands aren't reordered in
> the error path (or even the queue full submission path) which is why we
> don't use ordered tags to enforce barriers.

May I make your answer more precise? For non-caching and
write-through caching devices, SCSI provides a way to guarantee the order
of commands on the error path via the ACA and UA_INTLCK facilities, if
they are supported by the device. For write-back caching devices it's
different, because the cache may reorder commands after they are reported
as completed to the initiator, and there is also the possibility of
deferred errors.

So, there is no way to guarantee command order in case of errors
because Linux doesn't implement that.

BTW, there is still something wrong in the SCSI/block/FS layers' error
processing. Playing with my SCSI target I've noticed that if it returns
a pretty valid TASK ABORTED status for some SCSI command, the FS on the
initiator (ext3) immediately gets corrupted and journal replay on remount
doesn't repair it; only a manual e2fsck helps. So, apparently:

1. The SCSI ML does not handle well all the status codes that it should.

2. The block/FS levels (sometimes) don't handle I/O errors well enough
to avoid corrupting file systems.

I don't have time for further investigation, but if somebody prepares a
patch to fix that, I'm willing to assist in testing.

Vlad

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-20 15:04         ` Vladislav Bolkhovitin
@ 2007-11-20 15:28           ` James Bottomley
  2007-11-20 16:15             ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 18+ messages in thread
From: James Bottomley @ 2007-11-20 15:28 UTC
  To: Vladislav Bolkhovitin
  Cc: Mike Christie, Andrew Morton, linux-scsi, bugme-daemon,
	bart.vanassche

On Tue, 2007-11-20 at 18:04 +0300, Vladislav Bolkhovitin wrote:
> James Bottomley wrote:
> >>>And please close this as invalid.  FS ordering guarantees in linux
> >>>aren't done via ordered tags.
> >>
> >>I had a related question. I was working on the attached patch for some
> >>other testing (patch made against scsi-rc-fixes, but is not stable so do 
> >>not apply), which does the scsi_populate_tag_msg conversion from MSG_* 
> >>to ISCSI_ATTR and sets the proper iscsi bits.
> >>
> >>If I do this patch where I call scsi_activate_tcq on a device and that 
> >>conversion, does this require that my driver not reorder commands? I
> >>was just a little worried on some of the error handling paths where we 
> >>requeue commands to the mid layer.
> > 
> > Right, there's no way of guaranteeing that commands aren't reordered in
> > the error path (or even the queue full submission path) which is why we
> > don't use ordered tags to enforce barriers.
> 
> May I make your answer more precise? SCSI for non-caching and 
> write-through caching devices provides a way to guarantee order of 
> commands on the error path via ACA and UA_INTLCK facilities, if they are 
> supported by device. For write-back caching devices it's different, 
> because cache may reorder commands after they are reported as completed 
> to the initiator as well as there is a possibility for deferred errors.

Yes, I know this.  The problem is that because we can't rely on the
ordering guarantees in *every* situation, it's unsafe to rely on them
for barrier support (the case you most need them is the one where the
guarantees have likely failed).  Thus, Linux filesystems on SCSI implement
barriers by waiting for completions.  The only case where we could
implement flush barriers in SCSI, as they do in IDE, is the single
outstanding command case, where we don't have any reordering to worry
about (i.e. queue depth of one).
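
As a rough illustration of "barriers by waiting for completions", here is
a toy dispatcher. It is a sketch under stated assumptions and not the
block-layer implementation; the struct and function names are
hypothetical. A barrier request is held back until nothing is in flight,
and nothing else is dispatched until the barrier itself has completed:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical dispatcher state for a single device queue. */
struct drain_queue {
	int in_flight;			/* sent but not yet completed */
	bool barrier_in_flight;		/* the barrier itself is outstanding */
};

/*
 * Decide whether the next request may be sent to the device now.
 * Ordering comes from the initiator waiting, so the device may reorder
 * whatever it has been given: nothing issued before the barrier is still
 * outstanding when the barrier is sent, and nothing issued after it is
 * sent until the barrier has completed.
 */
static bool may_dispatch(const struct drain_queue *q, bool is_barrier)
{
	if (q->barrier_in_flight)
		return false;		/* wait for the barrier to complete */
	if (is_barrier && q->in_flight > 0)
		return false;		/* drain earlier commands first */
	return true;
}

static void dispatch(struct drain_queue *q, bool is_barrier)
{
	assert(may_dispatch(q, is_barrier));
	q->in_flight++;
	if (is_barrier)
		q->barrier_in_flight = true;
}

static void complete_one(struct drain_queue *q, bool was_barrier)
{
	assert(q->in_flight > 0);
	q->in_flight--;
	if (was_barrier)
		q->barrier_in_flight = false;
}
```

With two writes outstanding, may_dispatch() refuses a barrier until both
have completed, and then refuses further writes until the barrier has
completed as well. No task attribute is involved at any point.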

> So, there is no way to guarantee commands order in case of errors, 
> because Linux doesn't implement that.
> 
> BTW, there is still something wrong in the SCSI/block/FS layers error 
> processing. Playing with my SCSI target I've noticed that if it returns 
> pretty valid TASK ABORTED status for some SCSI command, FS on initiator 
> (ext3) immediately gets corrupted and journal replay on remount doesn't 
> repair it, only manual e2fsck helps. So, apparently:
> 
> 1. SCSI ML handles well not all status codes, which it should.

It certainly handles TASK ABORTED.

> 2. Block/FS levels (sometimes) don't handle I/O errors well enough 
> without corrupting file systems.

I'm not sure your conclusions necessarily follow your data.  What was
the reason for the TASK ABORTED (I'd guess QErr settings, right)?

Journals can fail to recover in cases where the underlying medium is
corrupted.  If TASK ABORTED was because of QErr, what was the original
failure?

Also, what was going on in the system (and what device was this ...
iSCSI I guess) ... I assume nothing powered down, so it's not a caching
problem (and that, since you seem to be using TCQ you do have your
caches set to write through).

> I don't have time for further investigations, but, if somebody prepare a 
> patch to fix that, I'm willing to assist in testing.

We'll need a bit more data to identify an actual root cause for this
problem before anyone can prepare a patch to fix it.

James

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-20 15:28           ` James Bottomley
@ 2007-11-20 16:15             ` Vladislav Bolkhovitin
  2007-11-20 16:43               ` James Bottomley
  0 siblings, 1 reply; 18+ messages in thread
From: Vladislav Bolkhovitin @ 2007-11-20 16:15 UTC
  To: James Bottomley
  Cc: Mike Christie, Andrew Morton, linux-scsi, bugme-daemon,
	bart.vanassche

James Bottomley wrote:
> On Tue, 2007-11-20 at 18:04 +0300, Vladislav Bolkhovitin wrote:
> 
>>James Bottomley wrote:
>>
>>>>>And please close this as invalid.  FS ordering guarantees in linux
>>>>>aren't done via ordered tags.
>>>>
>>>>I had a related question. I was working on the attached patch for some
>>>>other testing (patch made against scsi-rc-fixes, but is not stable so do 
>>>>not apply), which does the scsi_populate_tag_msg conversion from MSG_* 
>>>>to ISCSI_ATTR and sets the proper iscsi bits.
>>>>
>>>>If I do this patch where I call scsi_activate_tcq on a device and that 
>>>>conversion, does this require that my driver not reorder commands? I
>>>>was just a little worried on some of the error handling paths where we 
>>>>requeue commands to the mid layer.
>>>
>>>Right, there's no way of guaranteeing that commands aren't reordered in
>>>the error path (or even the queue full submission path) which is why we
>>>don't use ordered tags to enforce barriers.
>>
>>May I make your answer more precise? SCSI for non-caching and 
>>write-through caching devices provides a way to guarantee order of 
>>commands on the error path via ACA and UA_INTLCK facilities, if they are 
>>supported by device. For write-back caching devices it's different, 
>>because cache may reorder commands after they are reported as completed 
>>to the initiator as well as there is a possibility for deferred errors.
> 
> Yes, I know this.  The problem is that because we can't rely on the
> ordering guarantees in *every* situation, it's unsafe to rely on them
> for barrier support (the case you most need them is the one where the
> guarantees have likely failed).  Thus, linux fs on SCSI implement
> barriers by waiting for completions.  The only case we could implement
> flush barriers in SCSI, as they do in IDE is in the single outstanding
> command case where we don't have any reordering to worry about (i.e.
> queue depth of one).

...if we are going to work only with devices that have a write-back cache
or that don't support the ACA/UA_INTLCK facilities. It might well be
possible that some hypothetical SCSI device with a write-through cache
(WCE bit is 0 or set to 0) and ACA/UA_INTLCK and ORDERED command support
would perform considerably better with barriers by ORDERED tags than with
barriers by waiting for completions and a write-back cache, especially for
file systems like XFS, because with barriers by ORDERED tags it is
possible to keep the SCSI transport wire pipe full, whereas it has to be
drained with barriers by waiting for completions. But, since AFAIK the
majority of SCSI disks don't support ACA/UA_INTLCK, I have to agree with
you, there is not much point currently in implementing barriers by ORDERED
tags in the SCSI ML.
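
The "pipe full" argument can be put into numbers with a deliberately
crude model. It assumes every command costs exactly one round trip on the
wire, that queue depth is unlimited, that device service time is
negligible, and that no errors occur (which is exactly the case the rest
of this thread is worried about). The function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Round trips until the last command completes when barriers are
 * implemented by draining: a barrier is issued only after everything
 * before it has completed, and later commands wait for the barrier.
 */
static unsigned finish_time_drain(const bool *is_barrier, size_t n)
{
	unsigned gate = 0;	/* earliest time the next command may be issued */
	unsigned last_done = 0;	/* latest completion time seen so far */

	assert(is_barrier != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (is_barrier[i]) {
			/* wait for the queue to drain before issuing */
			unsigned issue = last_done > gate ? last_done : gate;

			last_done = issue + 1;
			gate = last_done;	/* successors wait for it */
		} else if (gate + 1 > last_done) {
			last_done = gate + 1;
		}
	}
	return last_done;
}

/*
 * Same stream when the target is trusted to honour ORDERED: everything
 * is issued at time 0, so the whole stream costs one round trip.
 */
static unsigned finish_time_ordered_tags(size_t n)
{
	return n > 0 ? 1 : 0;
}
```

For the stream write, write, barrier, write, write the drain model needs
three round trips and the ORDERED model needs one. The model says nothing
about what happens on the error path.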

>>So, there is no way to guarantee commands order in case of errors, 
>>because Linux doesn't implement that.
>>
>>BTW, there is still something wrong in the SCSI/block/FS layers error 
>>processing. Playing with my SCSI target I've noticed that if it returns 
>>pretty valid TASK ABORTED status for some SCSI command, FS on initiator 
>>(ext3) immediately gets corrupted and journal replay on remount doesn't 
>>repair it, only manual e2fsck helps. So, apparently:
>>
>>1. SCSI ML handles well not all status codes, which it should.
> 
> 
> It certainly handles TASK ABORTED.
> 
> 
>>2. Block/FS levels (sometimes) don't handle I/O errors well enough 
>>without corrupting file systems.
> 
> 
> I'm not sure your conclusions necessarily follow your data.  What was
> the reason for the TASK ABORTED (I'd guess QErr settings, right)?

It was my desire/curiosity during tests of SCST (http://scst.sf.net),
when it was working with several initiators with different transports over
the same set of devices, each of them having the TAS bit in the control
mode page set. According to SAM, in this case TASK ABORTED status can be
returned at any time, similarly to QUEUE FULL, i.e. IMHO such a command
should just be retried. But while QUEUE FULL status is handled well, TASK
ABORTED leads to filesystem corruption.

> Journals can fail to recover in cases where the underlying medium is
> corrupted.  If TASK ABORTED was because of QErr, what was the original
> failure?

See above. No "medium" corruption happened.

> Also, what was going on in the system (and what device was this ...
> iSCSI I guess) ...

It doesn't matter. It happens with FC transport as well.

> I assume nothing powered down, so it's not a caching
> problem (and that, since you seem to be using TCQ you do have your
> caches set to write through).

The target stays pretty well and healthy.

>>I don't have time for further investigations, but, if somebody prepare a 
>>patch to fix that, I'm willing to assist in testing.
> 
> We'll need a bit more data to identify an actual root cause for this
> problem before anyone can prepare a patch to fix it.
> 
> James

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-20 16:15             ` Vladislav Bolkhovitin
@ 2007-11-20 16:43               ` James Bottomley
  2007-11-20 17:17                 ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 18+ messages in thread
From: James Bottomley @ 2007-11-20 16:43 UTC
  To: Vladislav Bolkhovitin
  Cc: Mike Christie, Andrew Morton, linux-scsi, bugme-daemon,
	bart.vanassche

On Tue, 2007-11-20 at 19:15 +0300, Vladislav Bolkhovitin wrote:
> James Bottomley wrote:
> > I'm not sure your conclusions necessarily follow your data.  What was
> > the reason for the TASK ABORTED (I'd guess QErr settings, right)?
> 
> It was my desire/curiosity during tests of SCST (http://scst.sf.net), 
> when it working with several initiators with different transports over 
> the same set of devices, each of them having with TAS bit in the control 
> mode page set. According to SAM, in this case TASK ABORTED status can be 
> returned at any time, similarly to QUEUE FULL, i.e. IMHO such command 
> just should be retried. But QUEUE FULL status handled well, but TASK 
> ABORTED leads to filesystem corruption.

So this is with a soft target implementation ... so it could be an
ordering issue inside the target that's causing the filesystem
corruption on error.

If you specifically set TAS=1 you're giving up the right to know what
caused the command termination.  With insufficient information, it's
really unsafe to simply retry, which is why the mid layer just returns
TASK ABORTED as an error.  If you set TAS=0 we'll get a check
condition/unit attention explaining what happened (usually commands
cleared by another initiator) and we'll explicitly do the right thing
based on the sense data.
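
To make the distinction above concrete, here is a minimal Python sketch of the policy as described in this message. It is a toy model, not the Linux mid-layer code: `decide` and `Disposition` are hypothetical names, and treating "the right thing" for a COMMANDS CLEARED BY ANOTHER INITIATOR unit attention as a retry is an assumption. The status, sense key and ASC/ASCQ values are the standard SAM/SPC ones.

```python
from enum import Enum

# SAM status byte values.
GOOD = 0x00
CHECK_CONDITION = 0x02
TASK_SET_FULL = 0x28   # called QUEUE FULL in this thread
TASK_ABORTED = 0x40

# SPC sense data values.
UNIT_ATTENTION = 0x06              # sense key
COMMANDS_CLEARED = (0x2F, 0x00)    # ASC/ASCQ: COMMANDS CLEARED BY ANOTHER INITIATOR


class Disposition(Enum):
    SUCCESS = "success"
    RETRY = "retry"
    FAIL = "fail"


def decide(status, sense_key=None, asc_ascq=None):
    """Toy model of the policy described above (hypothetical, not kernel code)."""
    if status == GOOD:
        return Disposition.SUCCESS
    if status == TASK_SET_FULL:
        # The target could not accept the command, so it is simply resent.
        return Disposition.RETRY
    if status == CHECK_CONDITION:
        # TAS=0 case: the sense data says why the command was terminated.
        if sense_key == UNIT_ATTENTION and asc_ascq == COMMANDS_CLEARED:
            return Disposition.RETRY  # assumption: "the right thing" is a retry
        return Disposition.FAIL
    # TAS=1 case: TASK ABORTED carries no sense data, so it is failed upward.
    return Disposition.FAIL
```

With this model, `decide(TASK_ABORTED)` fails the command, while the same event reported as a unit attention with ASC/ASCQ 2Fh/00h is retried.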

One of my test suites has an initiator which randomly spits errors.
I've yet to see it cause an error that an ext3 journal can't recover
from.  So, if there's a genuine problem we need a nice test case to pass
to the filesystem people.

James

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-20 16:43               ` James Bottomley
@ 2007-11-20 17:17                 ` Vladislav Bolkhovitin
  2007-11-20 17:30                   ` James Bottomley
  0 siblings, 1 reply; 18+ messages in thread
From: Vladislav Bolkhovitin @ 2007-11-20 17:17 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Christie, Andrew Morton, linux-scsi, bugme-daemon,
	bart.vanassche

James Bottomley wrote:
> On Tue, 2007-11-20 at 19:15 +0300, Vladislav Bolkhovitin wrote:
> 
>>James Bottomley wrote:
>>
>>>I'm not sure your conclusions necessarily follow your data.  What was
>>>the reason for the TASK ABORTED (I'd guess QErr settings, right)?
>>
>>It was my desire/curiosity during tests of SCST (http://scst.sf.net), 
>>when it working with several initiators with different transports over 
>>the same set of devices, each of them having with TAS bit in the control 
>>mode page set. According to SAM, in this case TASK ABORTED status can be 
>>returned at any time, similarly to QUEUE FULL, i.e. IMHO such command 
>>just should be retried. But QUEUE FULL status handled well, but TASK 
>>ABORTED leads to filesystem corruption.
> 
> So this is with a soft target implementation ... so it could be an
> ordering issue inside the target that's causing the filesystem
> corruption on error.

The target offers no ordering guarantees for SIMPLE commands and says so 
plainly to the initiator via a QUEUE ALGORITHM MODIFIER value of 1 in the 
control mode page. As we know, the initiator doesn't use ORDERED tags 
(and, according to the logs, it really doesn't), so if it's an ordering 
issue, it's on the initiator's side.
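
For anyone who wants to check what a target advertises, the following Python sketch decodes the fields discussed here from a raw control mode page (page code 0Ah). The function name is hypothetical, and the bit positions are given as understood from the SPC-3 control mode page layout; they should be checked against the standard before being relied upon.

```python
def parse_control_mode_page(page: bytes) -> dict:
    """Extract ordering-related fields from a control mode page (hypothetical helper)."""
    if len(page) < 6 or (page[0] & 0x3F) != 0x0A:
        raise ValueError("not a control mode page")
    return {
        # Byte 3, bits 7-4: 0 = restricted reordering, 1 = unrestricted reordering.
        "queue_algorithm_modifier": page[3] >> 4,
        # Byte 3, bits 2-1: QErr.
        "qerr": (page[3] >> 1) & 0x03,
        # Byte 5, bit 6: TAS.
        "tas": (page[5] >> 6) & 0x01,
    }


# Example page: QUEUE ALGORITHM MODIFIER = 1, QErr = 0, TAS = 1.
example = bytes([0x0A, 0x0A, 0x00, 0x10, 0x00, 0x40, 0, 0, 0, 0, 0, 0])
fields = parse_control_mode_page(example)
```

The example bytes are made up to match the configuration described in this message; they are not taken from a real device.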

> if you specifically set TAS=1 you're giving up the right to know what
> caused the command termination.  With insufficient information, it's
> really unsafe to simply retry, which is why the mid layer just returns
> TASK ABORTED as an error.  If you set TAS=0 we'll get a check
> condition/unit attention explaining what happened (usually commands
> cleared by another initiator) and we'll explicitly do the right thing
> based on the sense data.

But having TAS=1 is legal, right? So it should be handled well. If 
TAS=0, TASK ABORTED can't be returned; that would be illegal. So, TASK 
ABORTED status can only be returned with TAS=1.

> One of my test suites has an initiator which randomly spits errors.
> I've yet to see it cause an error that an ext3 journal can't recover
> from.  So, if there's a genuine problem we need a nice test case to pass
> to the filesystem people.

If you need a clear testcase (IMHO, in this case it isn't needed, 
because it's clear without it), I can prepare a patch for SCST to 
randomly return TASK ABORTED status.

You can get the latest version of SCST and the target drivers using SVN:

$ svn co https://scst.svn.sourceforge.net/svnroot/scst

> James
> 
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-20 17:17                 ` Vladislav Bolkhovitin
@ 2007-11-20 17:30                   ` James Bottomley
  2007-11-20 17:45                     ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 18+ messages in thread
From: James Bottomley @ 2007-11-20 17:30 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: Mike Christie, Andrew Morton, linux-scsi, bugme-daemon,
	bart.vanassche


On Tue, 2007-11-20 at 20:17 +0300, Vladislav Bolkhovitin wrote:
> James Bottomley wrote:
> > On Tue, 2007-11-20 at 19:15 +0300, Vladislav Bolkhovitin wrote:
> > 
> >>James Bottomley wrote:
> >>
> >>>I'm not sure your conclusions necessarily follow your data.  What was
> >>>the reason for the TASK ABORTED (I'd guess QErr settings, right)?
> >>
> >>It was my desire/curiosity during tests of SCST (http://scst.sf.net), 
> >>when it working with several initiators with different transports over 
> >>the same set of devices, each of them having with TAS bit in the control 
> >>mode page set. According to SAM, in this case TASK ABORTED status can be 
> >>returned at any time, similarly to QUEUE FULL, i.e. IMHO such command 
> >>just should be retried. But QUEUE FULL status handled well, but TASK 
> >>ABORTED leads to filesystem corruption.
> > 
> > So this is with a soft target implementation ... so it could be an
> > ordering issue inside the target that's causing the filesystem
> > corruption on error.
> 
> Target offers no ordering guarantees for SIMPLE commands and frankly 
> says that to initiator via QUEUE ALGORITHM MODIFIER value 1 in the 
> control mode page. As we know, initiator doesn't use ORDERED tags (and 
> it really doesn't use them according to the logs), so if it's an 
> ordering issue, it's at the initiator's side.
> 
> > if you specifically set TAS=1 you're giving up the right to know what
> > caused the command termination.  With insufficient information, it's
> > really unsafe to simply retry, which is why the mid layer just returns
> > TASK ABORTED as an error.  If you set TAS=0 we'll get a check
> > condition/unit attention explaining what happened (usually commands
> > cleared by another initiator) and we'll explicitly do the right thing
> > based on the sense data.
> 
> But having TAS=1 is legal, right? So it should be handled well. If 
> TAS=0, TASK ABORTED can't be returned, it would be illegal. So, TASK 
> ABORTED status can only be returned with TAS=1.

Driving with your handbrake on is legal too ... that doesn't mean you
should do it ... and it certainly doesn't give you a legitimate
complaint against the manufacturer of your car for excessive brake pad
wear.

We handle TASK ABORTED as well as we can (by failing it).  For better
handling set TAS=0 and we'll handle the individual cases according to
the sense codes.

> > One of my test suites has an initiator which randomly spits errors.
> > I've yet to see it cause an error that an ext3 journal can't recover
> > from.  So, if there's a genuine problem we need a nice test case to pass
> > to the filesystem people.
> 
> If you need a clear testcase (IMHO, in this case it isn't needed, 
> because it's clear without it), I can prepare a patch for SCST to 
> randomly return TASK ABORTED status.
> 
> You can get the latest version of SCST and the target drivers using SVN:
> 
> $ svn co https://scst.svn.sourceforge.net/svnroot/scst

There's no real need to bother with setting all this up ... a simple
initiator modification randomly to return TASK ABORTED should suffice.
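
As a rough illustration of the idea (a Python model only; a real test would be a patch to the initiator driver), the sketch below wraps command completion so that a configurable fraction of successful commands is reported as TASK ABORTED instead. `make_injector` and its parameters are hypothetical names.

```python
import random

GOOD = 0x00
TASK_ABORTED = 0x40  # SAM status byte value


def make_injector(probability, seed=None):
    """Return a completion hook that randomly turns GOOD into TASK ABORTED."""
    rng = random.Random(seed)  # seeded so a failing run can be reproduced

    def complete(status):
        # Only successful completions are rewritten; real errors pass through.
        if status == GOOD and rng.random() < probability:
            return TASK_ABORTED
        return status

    return complete


inject = make_injector(probability=0.01, seed=1234)
statuses = [inject(GOOD) for _ in range(10_000)]
aborted = statuses.count(TASK_ABORTED)
```

This sketch only models the reported status. Whether the data of an "aborted" write should also be kept from the medium is a separate decision the real test would have to make.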

James



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-20 17:30                   ` James Bottomley
@ 2007-11-20 17:45                     ` Vladislav Bolkhovitin
  2007-11-20 17:52                       ` Matthew Wilcox
  2007-11-20 17:57                       ` James Bottomley
  0 siblings, 2 replies; 18+ messages in thread
From: Vladislav Bolkhovitin @ 2007-11-20 17:45 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Christie, Andrew Morton, linux-scsi, bugme-daemon,
	bart.vanassche

James Bottomley wrote:
>>>>>I'm not sure your conclusions necessarily follow your data.  What was
>>>>>the reason for the TASK ABORTED (I'd guess QErr settings, right)?
>>>>
>>>>It was my desire/curiosity during tests of SCST (http://scst.sf.net), 
>>>>when it working with several initiators with different transports over 
>>>>the same set of devices, each of them having with TAS bit in the control 
>>>>mode page set. According to SAM, in this case TASK ABORTED status can be 
>>>>returned at any time, similarly to QUEUE FULL, i.e. IMHO such command 
>>>>just should be retried. But QUEUE FULL status handled well, but TASK 
>>>>ABORTED leads to filesystem corruption.
>>>
>>>So this is with a soft target implementation ... so it could be an
>>>ordering issue inside the target that's causing the filesystem
>>>corruption on error.
>>
>>Target offers no ordering guarantees for SIMPLE commands and frankly 
>>says that to initiator via QUEUE ALGORITHM MODIFIER value 1 in the 
>>control mode page. As we know, initiator doesn't use ORDERED tags (and 
>>it really doesn't use them according to the logs), so if it's an 
>>ordering issue, it's at the initiator's side.
>>
>>
>>>if you specifically set TAS=1 you're giving up the right to know what
>>>caused the command termination.  With insufficient information, it's
>>>really unsafe to simply retry, which is why the mid layer just returns
>>>TASK ABORTED as an error.  If you set TAS=0 we'll get a check
>>>condition/unit attention explaining what happened (usually commands
>>>cleared by another initiator) and we'll explicitly do the right thing
>>>based on the sense data.
>>
>>But having TAS=1 is legal, right? So it should be handled well. If 
>>TAS=0, TASK ABORTED can't be returned, it would be illegal. So, TASK 
>>ABORTED status can only be returned with TAS=1.
> 
> Driving with your handbrake on is legal too ... that doesn't mean you
> should do it ... and it certainly doesn't give you a legitimate
> complaint against the manufacturer of your car for excessive brake pad
> wear.
> 
> We handle TASK ABORTED as well as we can (by failing it).  For better
> handling set TAS=0 and we'll handle the individual cases according to
> the sense codes.

So, should I take your words to mean that you think it's perfectly fine 
to corrupt the file system for devices with TAS=1? Absolutely legal 
devices, I repeat. Hence, in your opinion, no further investigation 
should be done?

>>>One of my test suites has an initiator which randomly spits errors.
>>>I've yet to see it cause an error that an ext3 journal can't recover
>>>from.  So, if there's a genuine problem we need a nice test case to pass
>>>to the filesystem people.
>>
>>If you need a clear testcase (IMHO, in this case it isn't needed, 
>>because it's clear without it), I can prepare a patch for SCST to 
>>randomly return TASK ABORTED status.
>>
>>You can get the latest version of SCST and the target drivers using SVN:
>>
>>$ svn co https://scst.svn.sourceforge.net/svnroot/scst
> 
> There's no real need to bother with setting all this up ... a simple
> initiator modification randomly to return TASK ABORTED should suffice.

Yes, you're right. Then, I suppose, Mike Christie should be the best 
person to do it?

Vlad

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-20 17:45                     ` Vladislav Bolkhovitin
@ 2007-11-20 17:52                       ` Matthew Wilcox
  2007-11-20 17:57                       ` James Bottomley
  1 sibling, 0 replies; 18+ messages in thread
From: Matthew Wilcox @ 2007-11-20 17:52 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: James Bottomley, Mike Christie, Andrew Morton, linux-scsi,
	bugme-daemon, bart.vanassche

On Tue, Nov 20, 2007 at 08:45:12PM +0300, Vladislav Bolkhovitin wrote:
> James Bottomley wrote:
> >We handle TASK ABORTED as well as we can (by failing it).  For better
> >handling set TAS=0 and we'll handle the individual cases according to
> >the sense codes.
> 
> So, should I consider your words as you think that it's perfectly fine 
> to corrupt file system for devices with TAS=1? Absolutely legal devices, 
> repeat. Hence, in your opinion, no further investigation should be done?

I don't know how you manage to read his words this way.  I understand
him to mean that the SCSI subsystem is doing the best it can under the
somewhat misconfigured circumstances, and the problem lies in the FS not
handling errors correctly.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-20 17:45                     ` Vladislav Bolkhovitin
  2007-11-20 17:52                       ` Matthew Wilcox
@ 2007-11-20 17:57                       ` James Bottomley
  2007-11-20 18:22                         ` Vladislav Bolkhovitin
  2007-11-21 12:31                         ` Vladislav Bolkhovitin
  1 sibling, 2 replies; 18+ messages in thread
From: James Bottomley @ 2007-11-20 17:57 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: Mike Christie, Andrew Morton, linux-scsi, bugme-daemon,
	bart.vanassche


On Tue, 2007-11-20 at 20:45 +0300, Vladislav Bolkhovitin wrote:
> James Bottomley wrote:
> >>>>>I'm not sure your conclusions necessarily follow your data.  What was
> >>>>>the reason for the TASK ABORTED (I'd guess QErr settings, right)?
> >>>>
> >>>>It was my desire/curiosity during tests of SCST (http://scst.sf.net), 
> >>>>when it working with several initiators with different transports over 
> >>>>the same set of devices, each of them having with TAS bit in the control 
> >>>>mode page set. According to SAM, in this case TASK ABORTED status can be 
> >>>>returned at any time, similarly to QUEUE FULL, i.e. IMHO such command 
> >>>>just should be retried. But QUEUE FULL status handled well, but TASK 
> >>>>ABORTED leads to filesystem corruption.
> >>>
> >>>So this is with a soft target implementation ... so it could be an
> >>>ordering issue inside the target that's causing the filesystem
> >>>corruption on error.
> >>
> >>Target offers no ordering guarantees for SIMPLE commands and frankly 
> >>says that to initiator via QUEUE ALGORITHM MODIFIER value 1 in the 
> >>control mode page. As we know, initiator doesn't use ORDERED tags (and 
> >>it really doesn't use them according to the logs), so if it's an 
> >>ordering issue, it's at the initiator's side.
> >>
> >>
> >>>if you specifically set TAS=1 you're giving up the right to know what
> >>>caused the command termination.  With insufficient information, it's
> >>>really unsafe to simply retry, which is why the mid layer just returns
> >>>TASK ABORTED as an error.  If you set TAS=0 we'll get a check
> >>>condition/unit attention explaining what happened (usually commands
> >>>cleared by another initiator) and we'll explicitly do the right thing
> >>>based on the sense data.
> >>
> >>But having TAS=1 is legal, right? So it should be handled well. If 
> >>TAS=0, TASK ABORTED can't be returned, it would be illegal. So, TASK 
> >>ABORTED status can only be returned with TAS=1.
> > 
> > Driving with your handbrake on is legal too ... that doesn't mean you
> > should do it ... and it certainly doesn't give you a legitimate
> > complaint against the manufacturer of your car for excessive brake pad
> > wear.
> > 
> > We handle TASK ABORTED as well as we can (by failing it).  For better
> > handling set TAS=0 and we'll handle the individual cases according to
> > the sense codes.
> 
> So, should I consider your words as you think that it's perfectly fine 
> to corrupt file system for devices with TAS=1? Absolutely legal devices, 
> repeat. Hence, in your opinion, no further investigation should be done?

Logic wouldn't support such a conclusion.

You have intertwined two issues:

     1. How should the mid layer handle TASK ABORTED?  I think we've
        reached the point where returning I/O error is the best we can
        do, but if TAS=0 we could have used the sense data to do better.
     2. Should a request I/O error cause corruption in ext3 that can't
        be recovered by a journal replay?  I think the answer here is
        no, so there needs to be an easily reproducible test case to
        pass to the filesystem people.

James



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-20 17:57                       ` James Bottomley
@ 2007-11-20 18:22                         ` Vladislav Bolkhovitin
  2007-11-21 12:31                         ` Vladislav Bolkhovitin
  1 sibling, 0 replies; 18+ messages in thread
From: Vladislav Bolkhovitin @ 2007-11-20 18:22 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Christie, Andrew Morton, linux-scsi, bugme-daemon,
	bart.vanassche

James Bottomley wrote:
>>>>>>>I'm not sure your conclusions necessarily follow your data.  What was
>>>>>>>the reason for the TASK ABORTED (I'd guess QErr settings, right)?
>>>>>>
>>>>>>It was my desire/curiosity during tests of SCST (http://scst.sf.net), 
>>>>>>when it working with several initiators with different transports over 
>>>>>>the same set of devices, each of them having with TAS bit in the control 
>>>>>>mode page set. According to SAM, in this case TASK ABORTED status can be 
>>>>>>returned at any time, similarly to QUEUE FULL, i.e. IMHO such command 
>>>>>>just should be retried. But QUEUE FULL status handled well, but TASK 
>>>>>>ABORTED leads to filesystem corruption.
>>>>>
>>>>>So this is with a soft target implementation ... so it could be an
>>>>>ordering issue inside the target that's causing the filesystem
>>>>>corruption on error.
>>>>
>>>>Target offers no ordering guarantees for SIMPLE commands and frankly 
>>>>says that to initiator via QUEUE ALGORITHM MODIFIER value 1 in the 
>>>>control mode page. As we know, initiator doesn't use ORDERED tags (and 
>>>>it really doesn't use them according to the logs), so if it's an 
>>>>ordering issue, it's at the initiator's side.
>>>>
>>>>>if you specifically set TAS=1 you're giving up the right to know what
>>>>>caused the command termination.  With insufficient information, it's
>>>>>really unsafe to simply retry, which is why the mid layer just returns
>>>>>TASK ABORTED as an error.  If you set TAS=0 we'll get a check
>>>>>condition/unit attention explaining what happened (usually commands
>>>>>cleared by another initiator) and we'll explicitly do the right thing
>>>>>based on the sense data.
>>>>
>>>>But having TAS=1 is legal, right? So it should be handled well. If 
>>>>TAS=0, TASK ABORTED can't be returned, it would be illegal. So, TASK 
>>>>ABORTED status can only be returned with TAS=1.
>>>
>>>Driving with your handbrake on is legal too ... that doesn't mean you
>>>should do it ... and it certainly doesn't give you a legitimate
>>>complaint against the manufacturer of your car for excessive brake pad
>>>wear.
>>>
>>>We handle TASK ABORTED as well as we can (by failing it).  For better
>>>handling set TAS=0 and we'll handle the individual cases according to
>>>the sense codes.
>>
>>So, should I consider your words as you think that it's perfectly fine 
>>to corrupt file system for devices with TAS=1? Absolutely legal devices, 
>>repeat. Hence, in your opinion, no further investigation should be done?
> 
> Logic wouldn't support such a conclusion.

Sorry, lately I've been getting too many "I won't bother, this is your 
problem" style answers.

> You have intertwined two issues
> 
>      1. How should the mid layer handle TASK ABORTED.  I think we've
>         reached the point where returning I/O error is the best we can
>         do, but if TAS=0 we could have used the sense data to do better.
>      2. Should a request I/O error cause corruption in ext3 that can't
>         be recovered by a journal replay. I think the answer here is
>         no, so there needs to be an easily reproducible test case to
>         pass to the filesystem people.

OK, I see your point. As I already wrote, I can assist only in testing here.

> James
> 
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems
  2007-11-20 17:57                       ` James Bottomley
  2007-11-20 18:22                         ` Vladislav Bolkhovitin
@ 2007-11-21 12:31                         ` Vladislav Bolkhovitin
  1 sibling, 0 replies; 18+ messages in thread
From: Vladislav Bolkhovitin @ 2007-11-21 12:31 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Christie, Andrew Morton, linux-scsi, bugme-daemon,
	bart.vanassche

James Bottomley wrote:
>>>>>if you specifically set TAS=1 you're giving up the right to know what
>>>>>caused the command termination.  With insufficient information, it's
>>>>>really unsafe to simply retry, which is why the mid layer just returns
>>>>>TASK ABORTED as an error.  If you set TAS=0 we'll get a check
>>>>>condition/unit attention explaining what happened (usually commands
>>>>>cleared by another initiator) and we'll explicitly do the right thing
>>>>>based on the sense data.

Actually, having TAS=1 has a considerable advantage over TAS=0 from an 
error recovery point of view. With TAS=1 all aborted commands are 
supposed to be returned immediately to all affected initiators. With 
TAS=0 the affected initiators will not receive any notification about 
the aborted commands; only a COMMANDS CLEARED BY ANOTHER INITIATOR UA 
will be established. So they will learn about the abort only when their 
commands time out.

Thus, with TAS=1 almost immediate error recovery is possible, but with 
TAS=0 error recovery is possible only after a timeout, which for SSC 
devices can be hours.

>>>>But having TAS=1 is legal, right? So it should be handled well. If 
>>>>TAS=0, TASK ABORTED can't be returned, it would be illegal. So, TASK 
>>>>ABORTED status can only be returned with TAS=1.
>>>
>>>Driving with your handbrake on is legal too ... that doesn't mean you
>>>should do it ... and it certainly doesn't give you a legitimate
>>>complaint against the manufacturer of your car for excessive brake pad
>>>wear.
>>>
>>>We handle TASK ABORTED as well as we can (by failing it).  For better
>>>handling set TAS=0 and we'll handle the individual cases according to
>>>the sense codes.

After some digging in SAM/SPC I've figured out that TASK ABORTED status 
can be returned in exactly the same circumstances as the COMMANDS CLEARED 
BY ANOTHER INITIATOR UA; which form of notification is used depends only 
on the TAS bit. So TASK ABORTED status carries the same information as 
the COMMANDS CLEARED BY ANOTHER INITIATOR UA and should be handled in 
the same way. I.e., if the affected commands are restarted for the 
COMMANDS CLEARED BY ANOTHER INITIATOR UA, they should be restarted for 
TASK ABORTED status as well.
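
The argument above amounts to saying that both notifications describe one event and should be classified identically. A minimal Python sketch of that classification follows. The function names are hypothetical and this is not existing mid-layer code; the status and sense values are the standard SAM/SPC ones.

```python
CHECK_CONDITION = 0x02
TASK_ABORTED = 0x40
UNIT_ATTENTION = 0x06
COMMANDS_CLEARED = (0x2F, 0x00)  # ASC/ASCQ: COMMANDS CLEARED BY ANOTHER INITIATOR


def cleared_by_another_initiator(status, sense_key=None, asc_ascq=None):
    """True if the completion reports that another initiator aborted the command."""
    if status == TASK_ABORTED:
        # TAS=1: the status alone is the notification.
        return True
    # TAS=0: the same event arrives as a unit attention with sense data.
    return (status == CHECK_CONDITION
            and sense_key == UNIT_ATTENTION
            and asc_ascq == COMMANDS_CLEARED)


def should_restart(status, sense_key=None, asc_ascq=None):
    # Proposed policy: one event, one handling path, whichever way it was reported.
    return cleared_by_another_initiator(status, sense_key, asc_ascq)
```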

Vlad

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2007-11-21 12:31 UTC | newest]

Thread overview: 18+ messages
     [not found] <bug-9405-10286@http.bugzilla.kernel.org/>
2007-11-19 20:50 ` [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems Andrew Morton
2007-11-19 20:56   ` James Bottomley
2007-11-19 21:22     ` Mike Christie
2007-11-19 21:28       ` James Bottomley
2007-11-20 15:04         ` Vladislav Bolkhovitin
2007-11-20 15:28           ` James Bottomley
2007-11-20 16:15             ` Vladislav Bolkhovitin
2007-11-20 16:43               ` James Bottomley
2007-11-20 17:17                 ` Vladislav Bolkhovitin
2007-11-20 17:30                   ` James Bottomley
2007-11-20 17:45                     ` Vladislav Bolkhovitin
2007-11-20 17:52                       ` Matthew Wilcox
2007-11-20 17:57                       ` James Bottomley
2007-11-20 18:22                         ` Vladislav Bolkhovitin
2007-11-21 12:31                         ` Vladislav Bolkhovitin
2007-11-19 21:15   ` Mike Christie
2007-11-19 21:18     ` Matthew Wilcox
2007-11-19 21:24       ` Mike Christie
