Proposed changes to generic blk tag for use in SCSI (1/3)

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Proposed changes to generic blk tag for use in SCSI (1/3)
@ 2002-06-11  2:46 James Bottomley
  2002-06-11  5:50 ` Jens Axboe
  2002-06-13 21:01 ` Doug Ledford
  0 siblings, 2 replies; 39+ messages in thread
From: James Bottomley @ 2002-06-11  2:46 UTC (permalink / raw)
  To: axboe, linux-scsi; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]

The attached is what I needed in the generic block layer to get the SCSI 
subsystem using it for Tag Command Queueing.

The changes are basically

1) I need a function to find the tagged request given the queue and the tag, 
which I've added as a function in the block layer

2) The SCSI queue will stall if it gets an untagged request in the stream, so 
once tagged queueing is enabled, all commands (including SPECIALS) must be 
tagged.  I altered the check in blk_queue_start_tag to permit this.

This is part of a set of three patches which provide a sample implementation 
of a SCSI driver using the generic TCQ code.

There are several shortcomings of the prototype, most notably it doesn't have 
tag starvation detection and processing.  However, I think I can re-introduce 
this as part of the error handler functions.

James Bottomley


[-- Attachment #2: blk-tag-2.5.21.diff --]
[-- Type: text/plain , Size: 2554 bytes --]

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.585   -> 1.586  
#	drivers/block/ll_rw_blk.c	1.67    -> 1.68   
#	include/linux/blkdev.h	1.44    -> 1.45   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/06/10	jejb@mulgrave.(none)	1.586
# [BLK LAYER]
# 
# add find tag function, adjust criteria for tagging commands.
# --------------------------------------------
#
diff -Nru a/drivers/block/ll_rw_blk.c b/drivers/block/ll_rw_blk.c
--- a/drivers/block/ll_rw_blk.c	Mon Jun 10 22:29:38 2002
+++ b/drivers/block/ll_rw_blk.c	Mon Jun 10 22:29:38 2002
@@ -304,6 +304,27 @@
 }
 
 /**
+ * blk_queue_find_tag - find a request by its tag and queue
+ *
+ * @q:	 The request queue for the device
+ * @tag: The tag of the request
+ *
+ * Notes:
+ *    Should be used when a device returns a tag and you want to match
+ *    it with a request.
+ *
+ *    no locks need be held.
+ **/
+struct request *blk_queue_find_tag(request_queue_t *q, int tag)
+{
+	struct blk_queue_tag *bqt = q->queue_tags;
+
+	if(unlikely(bqt == NULL || bqt->max_depth < tag))
+		return NULL;
+
+	return bqt->tag_index[tag];
+}
+/**
  * blk_queue_free_tags - release tag maintenance info
  * @q:  the request queue for the device
  *
@@ -448,7 +469,7 @@
 	unsigned long *map = bqt->tag_map;
 	int tag = 0;
 
-	if (unlikely(!(rq->flags & REQ_CMD)))
+	if (unlikely((rq->flags & REQ_QUEUED)))
 		return 1;
 
 	for (map = bqt->tag_map; *map == -1UL; map++) {
@@ -1945,6 +1966,7 @@
 EXPORT_SYMBOL(ll_10byte_cmd_build);
 EXPORT_SYMBOL(blk_queue_prep_rq);
 
+EXPORT_SYMBOL(blk_queue_find_tag);
 EXPORT_SYMBOL(blk_queue_init_tags);
 EXPORT_SYMBOL(blk_queue_free_tags);
 EXPORT_SYMBOL(blk_queue_start_tag);
diff -Nru a/include/linux/blkdev.h b/include/linux/blkdev.h
--- a/include/linux/blkdev.h	Mon Jun 10 22:29:38 2002
+++ b/include/linux/blkdev.h	Mon Jun 10 22:29:38 2002
@@ -339,6 +339,7 @@
 #define blk_queue_tag_queue(q)		((q)->queue_tags->busy < (q)->queue_tags->max_depth)
 #define blk_rq_tagged(rq)		((rq)->flags & REQ_QUEUED)
 extern int blk_queue_start_tag(request_queue_t *, struct request *);
+extern struct request *blk_queue_find_tag(request_queue_t *, int);
 extern void blk_queue_end_tag(request_queue_t *, struct request *);
 extern int blk_queue_init_tags(request_queue_t *, int);
 extern void blk_queue_free_tags(request_queue_t *);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-11  2:46 Proposed changes to generic blk tag for use in SCSI (1/3) James Bottomley
@ 2002-06-11  5:50 ` Jens Axboe
  2002-06-11 14:29   ` James Bottomley
  2002-06-13 21:01 ` Doug Ledford
  1 sibling, 1 reply; 39+ messages in thread
From: Jens Axboe @ 2002-06-11  5:50 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-kernel

On Mon, Jun 10 2002, James Bottomley wrote:
> The attached is what I needed in the generic block layer to get the SCSI 
> subsystem using it for Tag Command Queueing.
> 
> The changes are basically
> 
> 1) I need a function to find the tagged request given the queue and the tag, 
> which I've added as a function in the block layer

Ehm it's already there, one could argue that it's pretty core
functionality for this type of stuff :-). It's called
blk_queue_get_tag(q, tag), and it's in blkdev.h. However, I agree that
we should just move it into ll_rw_blk.c. That gets better documented as
well. Could you redo that part?

> 2) The SCSI queue will stall if it gets an untagged request in the stream, so 
> once tagged queueing is enabled, all commands (including SPECIALS) must be 
> tagged.  I altered the check in blk_queue_start_tag to permit this.

I completely agree with this, blk_queue_start_tag() should not need to
know about these things so just checking if the request is already
marked tagged is fine with me. But please make that a warning, like

	if (rq->flags & REQ_QUEUED) {
		printk("blk_queue_start_tag: rq already tagged\n");
		return 1;
	}

Also, you need to fix drivers/ide/tcq.c to make sure it doesn't call
blk_queue_start_tag() for non REQ_CMD requests. Ah wait, I'll just
change that. And also _please_ fix the comment about REQ_CMD and not
just the code, it's doesn't stand anymore.

> This is part of a set of three patches which provide a sample implementation 
> of a SCSI driver using the generic TCQ code.

Cool! Looking forward to reviewing it later today.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-11  5:50 ` Jens Axboe
@ 2002-06-11 14:29   ` James Bottomley
  2002-06-11 14:45     ` Jens Axboe
  0 siblings, 1 reply; 39+ messages in thread
From: James Bottomley @ 2002-06-11 14:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: James Bottomley, linux-scsi, linux-kernel

axboe@suse.de said:
> Ehm it's already there, one could argue that it's pretty core
> functionality for this type of stuff :-). It's called
> blk_queue_get_tag(q, tag), and it's in blkdev.h. However, I agree that
> we should just move it into ll_rw_blk.c. That gets better documented
> as well. Could you redo that part? 

I guessed it must be.  I grepped the IDE tree looking for anything with `get' 
or `find' in it, but came up empty.  It's actually called 
blk_queue_tag_request(), which is why I didn't find it.

Do you want me to keep this name if I move it?

axboe@suse.de said:
> I completely agree with this, blk_queue_start_tag() should not need to
> know about these things so just checking if the request is already
> marked tagged is fine with me. But please make that a warning, like 

Actually, I think it should be a BUG().  By the time a tagged request comes in 
to blk_queue_start_tag, we must already have corrupted the lists since we use 
the same list element (req->queuelist) to queue on both the tag queue and the 
request queue.

> And also _please_ fix the comment about REQ_CMD and not just the code,
> it's doesn't stand anymore. 

Will do...I didn't see much point altering the comment in the prototype until 
there was agreement that it was OK to do it this way.

James

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-11 14:29   ` James Bottomley
@ 2002-06-11 14:45     ` Jens Axboe
  2002-06-11 16:39       ` James Bottomley
  0 siblings, 1 reply; 39+ messages in thread
From: Jens Axboe @ 2002-06-11 14:45 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-kernel

On Tue, Jun 11 2002, James Bottomley wrote:
> axboe@suse.de said:
> > Ehm it's already there, one could argue that it's pretty core
> > functionality for this type of stuff :-). It's called
> > blk_queue_get_tag(q, tag), and it's in blkdev.h. However, I agree that
> > we should just move it into ll_rw_blk.c. That gets better documented
> > as well. Could you redo that part? 
> 
> I guessed it must be.  I grepped the IDE tree looking for anything with `get' 
> or `find' in it, but came up empty.  It's actually called 
> blk_queue_tag_request(), which is why I didn't find it.
> 
> Do you want me to keep this name if I move it?

Nah the name sucks, you see I didn't even remember it myself either. Why
not just change the name to something sane, like blk_queue_get_tag() or
blk_queue_find_tag(). I think yours (the latter) describes it best. Just
ide accordingly as well, that's the sole current user.

> axboe@suse.de said:
> > I completely agree with this, blk_queue_start_tag() should not need to
> > know about these things so just checking if the request is already
> > marked tagged is fine with me. But please make that a warning, like 
> 
> Actually, I think it should be a BUG().  By the time a tagged request
> comes in to blk_queue_start_tag, we must already have corrupted the
> lists since we use the same list element (req->queuelist) to queue on
> both the tag queue and the request queue.

Agree, make it a BUG_ON() or something.

> > And also _please_ fix the comment about REQ_CMD and not just the code,
> > it's doesn't stand anymore. 
> 
> Will do...I didn't see much point altering the comment in the
> prototype until there was agreement that it was OK to do it this way.

Fine.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-11 14:45     ` Jens Axboe
@ 2002-06-11 16:39       ` James Bottomley
  0 siblings, 0 replies; 39+ messages in thread
From: James Bottomley @ 2002-06-11 16:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: James Bottomley, linux-scsi, linux-kernel

OK, this is the new patch with all the changes.  I didn't alter the IDE 
blk_queue_start_tag() to check for a REQ_CMD, since you said you would do that.

James

You can import this changeset into BK by piping this whole message to:
'| bk receive [path to repository]' or apply the patch as usual.

===================================================================


ChangeSet@1.490, 2002-06-11 12:33:05-04:00, jejb@mulgrave.(none)
  [BLK TCQ]
  
  remove blk_queue_tag_request() in favour of blk_queue_find_tag()
  Allow any request type into blk_queue_start_tag()

ChangeSet@1.489, 2002-06-11 10:34:17-04:00, jejb@mulgrave.(none)
  [BLK LAYER]
  
  add find tag function, adjust criteria for tagging commands.


 drivers/block/ll_rw_blk.c |   40 ++++++++++++++++++++++++++++++++++------
 drivers/ide/tcq.c         |    2 +-
 include/linux/blkdev.h    |    2 +-
 3 files changed, 36 insertions(+), 8 deletions(-)


diff -Nru a/drivers/block/ll_rw_blk.c b/drivers/block/ll_rw_blk.c
--- a/drivers/block/ll_rw_blk.c	Tue Jun 11 12:38:12 2002
+++ b/drivers/block/ll_rw_blk.c	Tue Jun 11 12:38:12 2002
@@ -297,6 +297,27 @@
 }
 
 /**
+ * blk_queue_find_tag - find a request by its tag and queue
+ *
+ * @q:	 The request queue for the device
+ * @tag: The tag of the request
+ *
+ * Notes:
+ *    Should be used when a device returns a tag and you want to match
+ *    it with a request.
+ *
+ *    no locks need be held.
+ **/
+struct request *blk_queue_find_tag(request_queue_t *q, int tag)
+{
+	struct blk_queue_tag *bqt = q->queue_tags;
+
+	if(unlikely(bqt == NULL || bqt->max_depth < tag))
+		return NULL;
+
+	return bqt->tag_index[tag];
+}
+/**
  * blk_queue_free_tags - release tag maintenance info
  * @q:  the request queue for the device
  *
@@ -429,10 +450,12 @@
  *  Description:
  *    This can either be used as a stand-alone helper, or possibly be
  *    assigned as the queue &prep_rq_fn (in which case &struct request
- *    automagically gets a tag assigned). Note that this function assumes
- *    that only REQ_CMD requests can be queued! The request will also be
- *    removed from the request queue, so it's the drivers responsibility to
- *    readd it if it should need to be restarted for some reason.
+ *    automagically gets a tag assigned). Note that this function
+ *    assumes that any type of request can be queued! if this is not
+ *    true for your device, you must check the request type before
+ *    calling this function.  The request will also be removed from
+ *    the request queue, so it's the drivers responsibility to readd
+ *    it if it should need to be restarted for some reason.
  *
  *  Notes:
  *   queue lock must be held.
@@ -443,8 +466,12 @@
 	unsigned long *map = bqt->tag_map;
 	int tag = 0;
 
-	if (unlikely(!(rq->flags & REQ_CMD)))
-		return 1;
+	if (unlikely((rq->flags & REQ_QUEUED))) {
+		printk(KERN_ERR 
+		       "request %p for device [02%x:02%x] already tagged %d",
+		       rq, major(rq->rq_dev), minor(rq->rq_dev), rq->tag);
+		BUG();
+	}
 
 	for (map = bqt->tag_map; *map == -1UL; map++) {
 		tag += BLK_TAGS_PER_LONG;
@@ -2027,6 +2054,7 @@
 
 EXPORT_SYMBOL(blk_queue_prep_rq);
 
+EXPORT_SYMBOL(blk_queue_find_tag);
 EXPORT_SYMBOL(blk_queue_init_tags);
 EXPORT_SYMBOL(blk_queue_free_tags);
 EXPORT_SYMBOL(blk_queue_start_tag);
diff -Nru a/drivers/ide/tcq.c b/drivers/ide/tcq.c
--- a/drivers/ide/tcq.c	Tue Jun 11 12:38:12 2002
+++ b/drivers/ide/tcq.c	Tue Jun 11 12:38:12 2002
@@ -282,7 +282,7 @@
 
 	TCQ_PRINTK("%s: stat %x, feat %x\n", __FUNCTION__, stat, feat);
 
-	rq = blk_queue_tag_request(&drive->queue, tag);
+	rq = blk_queue_find_tag(&drive->queue, tag);
 	if (!rq) {
 		printk(KERN_ERR"%s: missing request for tag %d\n", __FUNCTION__, tag);
 		return ide_stopped;
diff -Nru a/include/linux/blkdev.h b/include/linux/blkdev.h
--- a/include/linux/blkdev.h	Tue Jun 11 12:38:12 2002
+++ b/include/linux/blkdev.h	Tue Jun 11 12:38:12 2002
@@ -328,11 +328,11 @@
 /*
  * tag stuff
  */
-#define blk_queue_tag_request(q, tag)	((q)->queue_tags->tag_index[(tag)])
 #define blk_queue_tag_depth(q)		((q)->queue_tags->busy)
 #define blk_queue_tag_queue(q)		((q)->queue_tags->busy < (q)->queue_tags->
max_depth)
 #define blk_rq_tagged(rq)		((rq)->flags & REQ_QUEUED)
 extern int blk_queue_start_tag(request_queue_t *, struct request *);
+extern struct request *blk_queue_find_tag(request_queue_t *, int);
 extern void blk_queue_end_tag(request_queue_t *, struct request *);
 extern int blk_queue_init_tags(request_queue_t *, int);
 extern void blk_queue_free_tags(request_queue_t *);

===================================================================


This BitKeeper patch contains the following changesets:
1.489..1.490
## Wrapped with gzip_uu ##


begin 664 bkpatch9679
M'XL(`'0G!CT``\U8^W/:.!#^&?\5>^VT![D`\I-'+YF\F%XF:9J29N8ZO0ZC
MV`*<^`&6@#!'__=;R>892IJT=,YA@BVO5OOX]M.*EW#-65+/W;+;&^TE_!5S
M4<^%@Z"3T"$KY:,X8@4<;\8QCI>[<<C*4K1\=%:^">Z*$1L5!>T4C9*MH=@E
M%6X7ABSA]9Q>,F<C8MQC]5RS\?;Z_+"I:7M[<-RE48==,0%[>YJ(DR$-/'Y`
M13>(HY)(:,1#)FC)C</)3'1B$&+@GZU73&([$]TA5F7BZIZN4TMG'C&LJF/-
MM4EK-^MR2(W84M_$L&NFK9V`7K*J-2!&F3AE70>=U$VKKE>*Q*H3`M+U@Y7H
MP!\&%(EV!#_7BV/-A<]'YV=P?OBIT?R"3_BAG@=M/_(`8P[M0>0*/XYV<?AV
MP`6XB2]8XE-HQXF4Z/A1!W#MD$8>+VEG8%B$U+3+>?"UXA,O32.4:/N/^.HE
MOL0`(B1V[\I!T$I&+41+R5WP'4VQT'>G6IM43=UT+.+JAN>XMNFL#?-C2AU=
MURW3THT)L:VJ_:B-?N0&`X^5`S\:W$LL>VQ8ZBXFIV:1B648IC6YH<RUJ5,Q
M:J1BMF^,]09NTKA@7968AJ:MU;"*SFP2HE,E3J&S1A;0:=1-LT[LC>@TMXK.
MC\<?,FPF+(R'#-#O5G_`!JR%"&PE#.^YR!?`CZ!-A_$@@;B]("31+"7S!51Q
M&`3Q"&@TAFR>(@Z<*N*%*5S01&1S$--HIOX+,/WK\>*8ADTF1+<P]?O@T8"Y
M=_X!&_H<B[XX9)$8)(PO%9R/ZH3;G]:$34R]HB.U371\PC16VI6*P=KM"G&K
MK&9M+K0597.33)LXQO^6!%(;#=NIZ&JG>>".W'&V$<UG*74P#(Y9Q4HS;02/
MJG'=62YQHVZ3C26N0U'?2HE?T)"!JR8NE^UR;2^5YT)%GT$*E?=03$;J@]5V
M^3`CSRC9$Z-J@ZZ=IE^YI`][ZVQXK=8J[JOA7;DI%MXLH>(!D![O1WX0V-\'
MDV\!?`87P['-%"Y5\T&_8FSN5\QMX47V)P^S,.M4)%!4^T)G!-]!=W&\RT#-
MD3K2]B;=5`[3SF8-^4,4"ZDO8>T!9_).#OIM-=YL?&@=OSLI*77C5(8&4R&7
MX@RN%KTZOCJ%GM]C2,)*">H/`MDKJ2;B&\!]D)QG`/C4J-7`T#7861>QXFJ<
M;L;@"Z[,EP%*@P4[<OI!OYZ#C^C,5%:]3)M`',5=Q7>9$L39=24IU6`]B_FD
M3-=%+!BORSN\KKKQ(,"$,L!X>3#J8J9HI@_G(6@CC@-3F\;Q`$8T4ED)9=N?
MJ?$%C'S1G3M3RA;#*XI!QI)#Q)A:J<L"3[[?*6M<)`-7S+S:65/>V;LI*<%.
M?U=V"ZK.M7^U7*9BB;=041]/'=#/6$&.\3?:/UK.;^<'4>#?L6"<5S)[<'%]
M?@Z3">!C<3^D]RV/]="5/]4*!2V72\.@Y)2.[%G)2Y)$2]G]9[S[\D;[JI5W
M=K03RW(D<Z5?N"C,5\TG:%4[0(/@M0+QA^O&=>.D@$N=8@G6<$+C[\OWS8^M
MJT_OCMZ?YQ_&1/+;S^B?%;-83]V(\"CD;(590GK'UM(`=5U,B2ISU2PBK*>(
MR;-[]0XK01)!D##JC=7!"+&&!G/5/ZI&88N%?F+9)EB8;]L!)X,]'8@XI!W?
M1;(90X>)61UQ[G<BYA5*JA2Q0BFBN>OS&8=.-7`^"!6)T?7.NS22]:3"Y?TF
MB5&IP0\29*8$JR/EB;'LR]/"WE5U'*KS9!?WJ4622!>Y83B%92JD!_*8N61C
M"9;X:.0'`8:?Q]*@])B`Q]@D#J=FK'+7+J"L+WY/.3I+!HKP7AQQ_\8/?#%.
MN1\WG#G+H(_XGZ>LI0A%9$LJN,A%T5<>AW*(<K03D^-4P,#D.%5,SO<6(R"U
MY'H),LU=_JS1O&@UFDW`(4BO%U-G7O74BAEC?B;&J_NZ_/=E%8ROO!>[\_D)
MLEA(;^-$69#TD72&!1SRH]4A^9#6?"YW=/TV+V^^JOYF_>'B\>;F1XXYFSN;
MC<>=65M#\#LE'UM_8ENC;^N@^XVNII?$(I85@2RB#O4K)++>X>>T"J8IFUQV
M+QAN+<_9%M6N^,V=X2D_7*C,&$_<%LBV^LU-OSE`7C9,">L%U&4A(E%RW[IC
FBMP%U"%[6_D[,4T,U/SW4<6K2-][)JVZ58>VM?\`!I#L'H85````
`
end



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-11  2:46 Proposed changes to generic blk tag for use in SCSI (1/3) James Bottomley
  2002-06-11  5:50 ` Jens Axboe
@ 2002-06-13 21:01 ` Doug Ledford
  2002-06-13 21:26   ` James Bottomley
  1 sibling, 1 reply; 39+ messages in thread
From: Doug Ledford @ 2002-06-13 21:01 UTC (permalink / raw)
  To: James Bottomley; +Cc: axboe, linux-scsi, linux-kernel

On Mon, Jun 10, 2002 at 10:46:44PM -0400, James Bottomley wrote:
> 2) The SCSI queue will stall if it gets an untagged request in the stream, so 
> once tagged queueing is enabled, all commands (including SPECIALS) must be 
> tagged.  I altered the check in blk_queue_start_tag to permit this.

Hmmm...this seems broken to me.  Switching from tagged to untagged 
momentarily and then back is perfectly valid.  Can the bio layer 
handle this and not the scsi layer, or are both layers unable to handle 
this sort of tag manipulation? 

> There are several shortcomings of the prototype, most notably it doesn't have 
> tag starvation detection and processing.  However, I think I can re-introduce 
> this as part of the error handler functions.

If you are using the bio layer tag processing, then it should be 
doing this part I would think.  If it isn't, then it sounds like either 
it's design is missing some key elements required to be fully functional 
or the integration between the scsi layer and the bio layer needs some 
additional work.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-13 21:01 ` Doug Ledford
@ 2002-06-13 21:26   ` James Bottomley
  2002-06-13 21:50     ` Doug Ledford
  0 siblings, 1 reply; 39+ messages in thread
From: James Bottomley @ 2002-06-13 21:26 UTC (permalink / raw)
  To: James Bottomley, axboe, linux-scsi, linux-kernel

On Mon, Jun 10, 2002 at 10:46:44PM -0400, James Bottomley wrote:
> 2) The SCSI queue will stall if it gets an untagged request in the stream, so 
> once tagged queueing is enabled, all commands (including SPECIALS) must be 
> tagged.  I altered the check in blk_queue_start_tag to permit this.

dledford@redhat.com said:
> Hmmm...this seems broken to me.  Switching from tagged to untagged
> momentarily and then back is perfectly valid.  Can the bio layer
> handle this and not the scsi layer, or are both layers unable to
> handle  this sort of tag manipulation?  

The layers can cope with the switch easily enough.  The problem is that to 
send an untagged command to a SCSI device you have to wait for the outstanding 
tags to clear which is what causes the stall.  The scsi mid-layer queue push 
back system pushes all commands back to the BIO layer marked as REQ_SPECIAL 
(because the upper layer drivers generate the commands and it has no idea what 
they are supposed to be doing) if the driver cannot handle them.  This means 
for those drivers (like the new adaptec) which load up the device until it 
returns a queue full (thus causing push back into the bio layer) we'd get 
stutter in the command pipeline.  The cleanest solution is to allow (but not 
require) tagging of every request type.

On Mon, Jun 10, 2002 at 10:46:44PM -0400, James Bottomley wrote:
> There are several shortcomings of the prototype, most notably it doesn't have 
> tag starvation detection and processing.  However, I think I can re-introduce 
> this as part of the error handler functions.

dledford@redhat.com said:
> If you are using the bio layer tag processing, then it should be
> doing this part I would think.  If it isn't, then it sounds like
> either  it's design is missing some key elements required to be fully
> functional  or the integration between the scsi layer and the bio
> layer needs some  additional work. 

I thought about doing this.  The problem is that the blk layer doesn't have 
very good instrumentation for detecting the condition.  The SCSI layer is the 
one that has per command timers and all the other necessaries so it can detect 
when a command should have returned and take corrective action.

James

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-13 21:26   ` James Bottomley
@ 2002-06-13 21:50     ` Doug Ledford
  2002-06-13 22:09       ` James Bottomley
  0 siblings, 1 reply; 39+ messages in thread
From: Doug Ledford @ 2002-06-13 21:50 UTC (permalink / raw)
  To: James Bottomley; +Cc: axboe, linux-scsi, linux-kernel

On Thu, Jun 13, 2002 at 05:26:44PM -0400, James Bottomley wrote:
> On Mon, Jun 10, 2002 at 10:46:44PM -0400, James Bottomley wrote:
> > 2) The SCSI queue will stall if it gets an untagged request in the stream, so 
> > once tagged queueing is enabled, all commands (including SPECIALS) must be 
> > tagged.  I altered the check in blk_queue_start_tag to permit this.
> 
> dledford@redhat.com said:
> > Hmmm...this seems broken to me.  Switching from tagged to untagged
> > momentarily and then back is perfectly valid.  Can the bio layer
> > handle this and not the scsi layer, or are both layers unable to
> > handle  this sort of tag manipulation?  
> 
> The layers can cope with the switch easily enough.  The problem is that to 
> send an untagged command to a SCSI device you have to wait for the outstanding 
> tags to clear which is what causes the stall.

Well, intentional behaviour is hardly what I would call a stall.  I 
thought you were implying that it would stall the queue indefinitely.  I'm 
fully aware that it forces the queue to wait until all outstanding 
commands have completed before sending the untagged command, that's part 
of the desired behaviour in that case.

>  The scsi mid-layer queue push 
> back system pushes all commands back to the BIO layer marked as REQ_SPECIAL 
> (because the upper layer drivers generate the commands and it has no idea what 
> they are supposed to be doing) if the driver cannot handle them.  This means 
> for those drivers (like the new adaptec) which load up the device until it 
> returns a queue full (thus causing push back into the bio layer) we'd get 
> stutter in the command pipeline.  The cleanest solution is to allow (but not 
> require) tagging of every request type.

This I'm not following.  If you get a QUEUE_FULL from the adaptec driver, 
then the commands you are pushing back should still be tagged and no stall 
should be required beyond just waiting for any outstanding command on the 
drive to complete or for a timeout to pass.  It should not require any 
untagged type stall where you have to drain the entire pipeline...

> I thought about doing this.  The problem is that the blk layer doesn't have 
> very good instrumentation for detecting the condition.  The SCSI layer is the 
> one that has per command timers and all the other necessaries so it can detect 
> when a command should have returned and take corrective action.

I would think that, eventually, the bio layer will support I/O fencing via 
tagged commands (aka, ext3 needs an I/O fence and the bio layer does as 
needed to enforce that, which on scsi may mean an ordered queue tag is 
generated instead of a regular tag and on IDE it may mean something else).  
It will have to be able to tell that some of these conditions have been 
satisfied in those cases, so I see no reason why it shouldn't be made 
aware of them now.  Just my $.02

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-13 21:50     ` Doug Ledford
@ 2002-06-13 22:09       ` James Bottomley
  0 siblings, 0 replies; 39+ messages in thread
From: James Bottomley @ 2002-06-13 22:09 UTC (permalink / raw)
  To: James Bottomley, axboe, linux-scsi, linux-kernel

dledford@redhat.com said:
> This I'm not following.  If you get a QUEUE_FULL from the adaptec
> driver,  then the commands you are pushing back should still be tagged
> and no stall  should be required beyond just waiting for any
> outstanding command on the  drive to complete or for a timeout to
> pass.  It should not require any  untagged type stall where you have
> to drain the entire pipeline... 

Ah, but that was the problem in the blk generic tagging.  only requests of 
type REQ_CMD get tagged, so say the SD driver gets a block read request as a 
REQ_CMD, translates it into a SCSI READ, sends it through the mid-layer to the 
low layer driver which requests a block layer tag but eventually responds 
QUEUE_FULL.  Now the command gets pushed back to the blk queue head as a 
REQ_SPECIAL (and as part of the push back, we have to finish the tag since 
command moves from the tag queue to the blk queue), which to the scsi mid 
layer means request with already formulated SCSI command so don't go back 
through the upper layer driver again.  The problem is that when this command 
comes back again into the scsi mid layer for execution it would now do so as 
an untagged command because the blk_queue_start_tag() code will only tag 
REQ_CMD requests, and hence we get a queue stall every time the low level 
driver responds QUEUE_FULL.

This was the behaviour (in the blk layer) I was objecting to---on the second 
go around, we request a tag using blk_queue_start_tag() but get denied because 
the request isn't of the correct type---and why I think the 
blk_generic_start_tag() needs to allow REQ_SPECIAL requests to be tagged.

> I would think that, eventually, the bio layer will support I/O fencing
> via  tagged commands (aka, ext3 needs an I/O fence and the bio layer
> does as  needed to enforce that, which on scsi may mean an ordered
> queue tag is  generated instead of a regular tag and on IDE it may
> mean something else).   It will have to be able to tell that some of
> these conditions have been  satisfied in those cases, so I see no
> reason why it shouldn't be made  aware of them now.  Just my $.02 

I've actually already put this code into the mid layer patch (the 
scsi_populate_tag_msg() function in scsi.h) to generate an ordered tag for the 
case where the request is marked REQ_BARRIER.

James

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
@ 2002-09-03 14:35 James Bottomley
  2002-09-03 18:23 ` Doug Ledford
  0 siblings, 1 reply; 39+ messages in thread
From: James Bottomley @ 2002-09-03 14:35 UTC (permalink / raw)
  To: Justin T. Gibbs, Doug Ledford; +Cc: linux-kernel, linux-scsi

> Doug Ledford writes:
> 
>  > took the device off line.  So, in short, the mid layer isn't waiting
> long   > enough, or when it gets sense indicated not ready it needs to
> implement a   > waiting queue with a timeout to try rekicking things a
> few times and don't   > actually mark the device off line until a longer
> period of time has   > elasped without the device coming back.
> 
> There is a kernel config CONFIG_AIC7XXX_RESET_DELAY_MS (default 15s).
> Would increasing it help?

Justin Gibbs writes:
> This currently only effects the initial bus reset delay.  If the
> driver holds off commands after subsequent bus resets, it can cause
> undeserved timeouts on the commands it has intentionally deferred.
> The mid-layer has a 5 second delay after bus resets, but I haven't
> verified that this is honored correctly during error recovery.

I'm planning a major re-write of this area in the error handler.  The way I 
think it should go is:

1) Quiesce host (set in_recovery flag)
2) Suspend active timers on this host
3) Proceed down the error correction track (eliminate abort and go down 
device, bus and host resets and finally set the device offline).
5) On each error recovery wait out a recovery timer for the device to become 
active before talking to it again.  Send all affected commands back to the 
block layer to await reissue (note: it would now be illegal for commands to 
lie to the mid layer and say they've done the reset when they haven't).
6) issue a TUR using a command allocated to the eh for that purpose.  Process 
the return code (in particular, if the device says NOT READY, wait some more). 
 Only if the TUR definitively fails proceed up the recovery chain all the way 
to taking the device offline.

I also plan to expose the suspend and resume timers API in some form for FC 
drivers to use.

James

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 14:35 aic7xxx sets CDR offline, how to reset? James Bottomley
@ 2002-09-03 18:23 ` Doug Ledford
  2002-09-03 19:09   ` James Bottomley
  0 siblings, 1 reply; 39+ messages in thread
From: Doug Ledford @ 2002-09-03 18:23 UTC (permalink / raw)
  To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, Sep 03, 2002 at 09:35:02AM -0500, James Bottomley wrote:
> 
> 1) Quiesce host (set in_recovery flag)

Right.

> 2) Suspend active timers on this host

Right.

> 3) Proceed down the error correction track (eliminate abort and go down 
> device, bus and host resets and finally set the device offline).

Leave abort active.  It does actually work in certain scenarios.  The CD 
burner scenario that started this thread is an example of somewhere that 
an abort should actually do the job.

> 5) On each error recovery wait out a recovery timer for the device to become 
> active before talking to it again.  Send all affected commands back to the 
> block layer to await reissue (note: it would now be illegal for commands to 
> lie to the mid layer and say they've done the reset when they haven't).
> 6) issue a TUR using a command allocated to the eh for that purpose.  Process 
> the return code (in particular, if the device says NOT READY, wait some more). 
>  Only if the TUR definitively fails proceed up the recovery chain all the way 
> to taking the device offline.

Right.

> I also plan to expose the suspend and resume timers API in some form for FC 
> drivers to use.



-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 18:23 ` Doug Ledford
@ 2002-09-03 19:09   ` James Bottomley
  2002-09-03 20:59     ` Alan Cox
                       ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: James Bottomley @ 2002-09-03 19:09 UTC (permalink / raw)
  To: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

dledford@redhat.com said:
> Leave abort active.  It does actually work in certain scenarios.  The
> CD  burner scenario that started this thread is an example of
> somewhere that  an abort should actually do the job. 

Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.  
At best, abort probably causes a command to overtake a barrier it shouldn't, 
at worst we abort the ordered tag that is the barrier and transactional 
integrity is lost.

When error correction is needed, we have to return all the commands for that 
device to the block layer so that ordering and barrier issues can be taken 
care of in the reissue.  This makes LUN RESET (for those that support it) the 
minimum level of error correction we can apply.

James

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 19:09   ` James Bottomley
@ 2002-09-03 20:59     ` Alan Cox
  2002-09-03 21:32       ` James Bottomley
  2002-09-03 21:13     ` Doug Ledford
  2002-09-03 21:24     ` Patrick Mansfield
  2 siblings, 1 reply; 39+ messages in thread
From: Alan Cox @ 2002-09-03 20:59 UTC (permalink / raw)
  To: James Bottomley
  Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, 2002-09-03 at 20:09, James Bottomley wrote:
> dledford@redhat.com said:
> > Leave abort active.  It does actually work in certain scenarios.  The
> > CD  burner scenario that started this thread is an example of
> > somewhere that  an abort should actually do the job. 
> 
> Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.  
> At best, abort probably causes a command to overtake a barrier it shouldn't, 
> at worst we abort the ordered tag that is the barrier and transactional 
> integrity is lost.

What do we plan to do for the cases where reset is disabled because we
have shared disk scsi so don't want to reset and hose the reservations ?

If your error correction always requires all commands return to the
block layer then the block layer is IMHO broken. Its messy enough doing
that before you hit the fun situations where insert scsi commands of
their own the block layer never initiated.

Next you only need to return stuff if commands have been issued between
the aborting command and a barrier. Since most sane systems will never
be causing REQ_BARRIER that should mean the general case for an abort is
going to be fine. The CD burner example is also true for this. If we
track barrier sequences then we will know the barrier count for the
command we are aborting and the top barrier count for commands issued to
the device. Finally you only need to go to the large hammer approach
when you are dealing with a media changing command (ie WRITE*) - if we
abort a read then knowing we don't queue overlapping read then write to
disk we already know that the read will not break down the tag ordering
as I understand it ?

If we get to the point we need an abort we don't want to issue a reset.
Not every device comes back sane from a reset and in some cases we have
to issue a whole sequence of commands to get the state of the device
back (door locking, power management, ..)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 20:59     ` Alan Cox
@ 2002-09-03 21:32       ` James Bottomley
  2002-09-03 21:54         ` Alan Cox
  2002-09-03 22:50         ` Doug Ledford
  0 siblings, 2 replies; 39+ messages in thread
From: James Bottomley @ 2002-09-03 21:32 UTC (permalink / raw)
  To: Alan Cox; +Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

alan@lxorguk.ukuu.org.uk said:
> What do we plan to do for the cases where reset is disabled because we
> have shared disk scsi so don't want to reset and hose the reservations

The reset gets issued and the reservation gets broken.  Good HA or other 
software knows the reservation may be lost and takes this into account in the 
cluster algorithm.

With SCSI-2 reservations, there's no way to preserve the reservation and have 
the reset be effective (I know, in theory, that this can be circumvented by 
the soft reset alternative, but I've never seen a device that implements it 
correctly).  I suppose we hope SCSI-3 Persistent Group Reservations come along 
quickly.

> If your error correction always requires all commands return to the
> block layer then the block layer is IMHO broken. Its messy enough
> doing that before you hit the fun situations where insert scsi
> commands of their own the block layer never initiated. 

This is part of the slim SCSI down approach.  The block layer already has 
handling for tag errors like this.  Inserted SCSI commands should now work 
correctly since we're deprecating the scsi_do_cmnd() in favour of scsi_do_req, 
which means the command is always associated with a request and goes into the 
block queue just like any other request.

I think the block layer, which already knows about the barrier ordering, is 
the appropriate place for this.  If you think the scsi error handler is a 
hairy wart now, just watch it grow into a stonking great carbuncle as I try to 
introduce it to the concept of command queue ordering and appropriate recovery.

> Next you only need to return stuff if commands have been issued
> between the aborting command and a barrier. Since most sane systems
> will never be causing REQ_BARRIER that should mean the general case
> for an abort is going to be fine. The CD burner example is also true
> for this. If we track barrier sequences then we will know the barrier
> count for the command we are aborting and the top barrier count for
> commands issued to the device. Finally you only need to go to the
> large hammer approach when you are dealing with a media changing
> command (ie WRITE*) - if we abort a read then knowing we don't queue
> overlapping read then write to disk we already know that the read will
> not break down the tag ordering as I understand it ? 

I agree with your reasoning.  However, errors occur infrequently enough (I 
hope) so that its just not worth the extra code complexity to make the error 
handler look for that case.

However, in all honesty, I have to say that I just don't believe ABORTs are 
ever particularly effective.  As part of error recovery, If a device is 
tipping over into failure, adding another message isn't a good way to pull it 
back.  ABORT is really part of the I/O cancellation API, and, like all 
cancellation implementations, it's potentially full of holes.  The only uses 
it might have---like oops I didn't mean to fixate that CD, give it back to me 
now---aren't clearly defined in the SPEC to produce the desired effect (stop 
the fixation so the drive door can be opened).

> If we get to the point we need an abort we don't want to issue a
> reset. Not every device comes back sane from a reset and in some cases
> we have to issue a whole sequence of commands to get the state of the
> device back (door locking, power management, ..)

Well, this is SCSI---the first thing most controllers do for parallel SCSI at 
least is reset the BUS.  Some FC drivers do the FC equivalent as well (not 
that they should, but that's another issue).

The pain of coming back from a reset (and I grant, it isn't trivial) is well 
known and well implemented in SCSI.  It also, from error handlings point of 
view, sets the device back to a known point in the state model.

James

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 21:32       ` James Bottomley
@ 2002-09-03 21:54         ` Alan Cox
  2002-09-03 22:50         ` Doug Ledford
  1 sibling, 0 replies; 39+ messages in thread
From: Alan Cox @ 2002-09-03 21:54 UTC (permalink / raw)
  To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, 2002-09-03 at 22:32, James Bottomley wrote:
> I think the block layer, which already knows about the barrier ordering, is 
> the appropriate place for this.  If you think the scsi error handler is a 
> hairy wart now, just watch it grow into a stonking great carbuncle as I try to 
> introduce it to the concept of command queue ordering and appropriate recovery.

Point taken

> I agree with your reasoning.  However, errors occur infrequently enough (I 
> hope) so that its just not worth the extra code complexity to make the error 
> handler look for that case.

When you ar ehandling CD-ROM problems then they can be quite a lot. Not
helped by the fact some ancient CD's seem to like to take a long walk in
the park when told to reset.

> cancellation implementations, it's potentially full of holes.  The only uses 
> it might have---like oops I didn't mean to fixate that CD, give it back to me 
> now---aren't clearly defined in the SPEC to produce the desired effect (stop 
> the fixation so the drive door can be opened).

Yes but does windows assume it. There are two specs 8)

> The pain of coming back from a reset (and I grant, it isn't trivial) is well 
> known and well implemented in SCSI.  It also, from error handlings point of 
> view, sets the device back to a known point in the state model.
> 

Resetting the bus is antisocial to say the least. We had lots of hangs
with the old eh handler that went away when the new one didnt keep
resetting the bus. If we do reset the bus then we have to handle "Sorry
can't reset the bus" (we have cards that can't or won't) as well as
being conservative about timings, making sure we don't reset the bus
during the seconds while a device rattles back into online state and so
on.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 21:32       ` James Bottomley
  2002-09-03 21:54         ` Alan Cox
@ 2002-09-03 22:50         ` Doug Ledford
  2002-09-03 23:28           ` Alan Cox
                             ` (2 more replies)
  1 sibling, 3 replies; 39+ messages in thread
From: Doug Ledford @ 2002-09-03 22:50 UTC (permalink / raw)
  To: James Bottomley; +Cc: Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi

> alan@lxorguk.ukuu.org.uk said:
> > Next you only need to return stuff if commands have been issued
> > between the aborting command and a barrier. Since most sane systems
> > will never be causing REQ_BARRIER

Hmmm...I thought a big reason for adding REQ_BARRIER was to be able to 
support more robust journaling with order requirement verification.  If 
that's true, then REQ_BARRIER commands could become quite common on disks 
using ext3.

On Tue, Sep 03, 2002 at 04:32:38PM -0500, James Bottomley wrote:
> However, in all honesty, I have to say that I just don't believe ABORTs are 
> ever particularly effective.  As part of error recovery, If a device is 
> tipping over into failure, adding another message isn't a good way to pull it 
                             ^^^^^^^^^^^^^^^^^^^^^^
Then you might as well skip device resets since they are implemented using 
messages and go straight to bus resets.  Shot deflected, no score.

> back.  ABORT is really part of the I/O cancellation API, and, like all 
> cancellation implementations, it's potentially full of holes.  The only uses 
> it might have---like oops I didn't mean to fixate that CD, give it back to me 
> now---aren't clearly defined in the SPEC to produce the desired effect (stop 
> the fixation so the drive door can be opened).

In my experience, aborts have always actually worked fairly well in any 
scenario where a bus device reset will work.  Generally speaking, the 
problems I've always ran into with SCSI busses have been either A) this 
command is screwing up but it isn't confusing the drive so we can abort it 
or BDR it because the drive still responds to us or B) the bus is hung 
hard and no transfers or messages of any kind can make it through.  In the 
B case, a full bus reset is the only thing that works.  In the A case, 
aborts work just as often as anything else.

> The pain of coming back from a reset (and I grant, it isn't trivial) is well 
> known and well implemented in SCSI.  It also, from error handlings point of 
> view, sets the device back to a known point in the state model.

So does a successful abort.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:50         ` Doug Ledford
@ 2002-09-03 23:28           ` Alan Cox
  2002-09-04  7:40           ` Jeremy Higdon
  2002-09-04 16:13           ` James Bottomley
  2 siblings, 0 replies; 39+ messages in thread
From: Alan Cox @ 2002-09-03 23:28 UTC (permalink / raw)
  To: Doug Ledford; +Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, 2002-09-03 at 23:50, Doug Ledford wrote:
> Hmmm...I thought a big reason for adding REQ_BARRIER was to be able to 
> support more robust journaling with order requirement verification.  If 
> that's true, then REQ_BARRIER commands could become quite common on disks 
> using ext3.

I doubt it very much because the only way to implement barriers on IDE
is fantastically expensive. I'm dubious it makes sense for ext3 to use
barriers in that way. All it needs as I understand the rules is that an
I/O isnt reported as completed by the device driver before it is on the
medium or in a non volatile cache.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:50         ` Doug Ledford
  2002-09-03 23:28           ` Alan Cox
@ 2002-09-04  7:40           ` Jeremy Higdon
  2002-09-04 16:24             ` James Bottomley
  2002-09-04 16:13           ` James Bottomley
  2 siblings, 1 reply; 39+ messages in thread
From: Jeremy Higdon @ 2002-09-04  7:40 UTC (permalink / raw)
  To: Doug Ledford, James Bottomley
  Cc: Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi

On Sep 3,  6:50pm, Doug Ledford wrote:
> 
> > alan@lxorguk.ukuu.org.uk said:
> > > Next you only need to return stuff if commands have been issued
> > > between the aborting command and a barrier. Since most sane systems
> > > will never be causing REQ_BARRIER
> 
> Hmmm...I thought a big reason for adding REQ_BARRIER was to be able to 
> support more robust journaling with order requirement verification.  If 
> that's true, then REQ_BARRIER commands could become quite common on disks 
> using ext3.

Hmm.  There do seem to be a lot of loopholes/race conditions where the
barrier just won't work right in the face of error recovery.  I wouldn't
want to use barriers on any system where data integrity was crucial.

For example, in Fibrechannel using class 3 (the usual)

	send command (command frame corrupted; device does not receive)
	send barrier (completes normally)
	... (lots of time goes by, many more commands are processed)
	timeout original command whose command frame was corrupted

The only safe way to run such a filesystem is to hold the barriers in
the driver until all previous commands are successfully completed.

There was also the problem of the queue full to the barrier command,
etc.

Did I miss the answer to these?  I don't recall seeing an answer to
Patrick Mansfield's questions either (original message edited cutting
out a couple of paragraphs):

> On Tue, Sep 03, 2002 at 02:09:44PM -0500, James Bottomley wrote:
> > dledford@redhat.com said:
> > > Leave abort active.  It does actually work in certain scenarios.  The
> > > CD  burner scenario that started this thread is an example of
> > > somewhere that  an abort should actually do the job.
> >
> > Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.
> > At best, abort probably causes a command to overtake a barrier it shouldn't,
> > at worst we abort the ordered tag that is the barrier and transactional
> > integrity is lost.
> >
> > When error correction is needed, we have to return all the commands for that
> > device to the block layer so that ordering and barrier issues can be taken
> > care of in the reissue.  This makes LUN RESET (for those that support it) the
> > minimum level of error correction we can apply.
> >
> > James
> 
> If we only send an abort or reset after a quiesce I don't see why one
> is better than the other.
> 
> Not specific to reset or abort - if a single command gets an error, we
> wait for oustanding commands to complete before starting up the error
> handler thread. If all the commands (error one and outstanding) have
> barriers, those that do not error out will complete out of order from
> the errored command.
> 
> How is this properly handled?
> 
> And what happens if one command gets some sort of check condition (like
> medium error, or aborted command) that causes a retry? Will IO's still
> be correctly ordered?


jeremy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04  7:40           ` Jeremy Higdon
@ 2002-09-04 16:24             ` James Bottomley
  2002-09-04 17:13               ` Mike Anderson
  2002-09-05  9:50               ` Jeremy Higdon
  0 siblings, 2 replies; 39+ messages in thread
From: James Bottomley @ 2002-09-04 16:24 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Doug Ledford, Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi

jeremy@classic.engr.sgi.com said:
> For example, in Fibrechannel using class 3 (the usual)

> 	send command (command frame corrupted; device does not receive)
> 	send barrier (completes normally)
> 	... (lots of time goes by, many more commands are processed)
> 	timeout original command whose command frame was corrupted 

This doesn't look right to me from the SCSI angle  I don't see how you can get 
a successful disconnect on a command the device doesn't receive (I take it 
this is some type of Fibre magic?).  Of course, if the device (or its proxy) 
does receive the command then the ordered queue tag implementation requires 
that the corrupted frame command be processed prior to the barrier,  this 
isn't optional if you obey the spec.  Thus, assuming the processor does no 
integrity checking of the command until it does processing (this should be a 
big if), then we still must get notification of the failed command before the 
barrier tag is begun.  Obviously, from that notification we do then race to 
eliminate the overtaking tags.

> There was also the problem of the queue full to the barrier command,
> etc. 

The queue full problem still exists.  I've used this argument against the 
filesystem people many times at the various fora where it has been discussed.  
The situation is that everyone agrees that it's a theoretical problem, but 
no-one is convinced that it will actually occur in practice (I think it falls 
into the "risk we're willing to take" category).

James

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 16:24             ` James Bottomley
@ 2002-09-04 17:13               ` Mike Anderson
  2002-09-05  9:50               ` Jeremy Higdon
  1 sibling, 0 replies; 39+ messages in thread
From: Mike Anderson @ 2002-09-04 17:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jeremy Higdon, Doug Ledford, Alan Cox, Justin T. Gibbs,
	linux-kernel, linux-scsi

James Bottomley [James.Bottomley@steeleye.com] wrote:
> jeremy@classic.engr.sgi.com said:
> > For example, in Fibrechannel using class 3 (the usual)
> 
> > 	send command (command frame corrupted; device does not receive)
> > 	send barrier (completes normally)
> > 	... (lots of time goes by, many more commands are processed)
> > 	timeout original command whose command frame was corrupted 
> 
> This doesn't look right to me from the SCSI angle  I don't see how you can get 
> a successful disconnect on a command the device doesn't receive (I take it 
> this is some type of Fibre magic?).  Of course, if the device (or its proxy) 
> does receive the command then the ordered queue tag implementation requires 
> that the corrupted frame command be processed prior to the barrier,  this 
> isn't optional if you obey the spec.  Thus, assuming the processor does no 
> integrity checking of the command until it does processing (this should be a 
> big if), then we still must get notification of the failed command before the 
> barrier tag is begun.  Obviously, from that notification we do then race to 
> eliminate the overtaking tags.

In FC class 3 if you are logged into a port then notice of this loss
doesn't happen until a upper level timeout occurs (ULTP?). The loss can
happen prior to the command reaching the device (i.e. the switch can
drop the frame). If a corrupted frame makes it to the device it will be
discarded as there is not much it can do with a frame containing unreliable
data. In FC class 2 frames are ack'd so the recovery can be much more
responsive.


-Mike
-- 
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 16:24             ` James Bottomley
  2002-09-04 17:13               ` Mike Anderson
@ 2002-09-05  9:50               ` Jeremy Higdon
  1 sibling, 0 replies; 39+ messages in thread
From: Jeremy Higdon @ 2002-09-05  9:50 UTC (permalink / raw)
  To: James Bottomley
  Cc: Doug Ledford, Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi

On Sep 4, 11:24am, James Bottomley wrote:
> 
> jeremy@classic.engr.sgi.com said:
> > For example, in Fibrechannel using class 3 (the usual)
> 
> > 	send command (command frame corrupted; device does not receive)
> > 	send barrier (completes normally)
> > 	... (lots of time goes by, many more commands are processed)
> > 	timeout original command whose command frame was corrupted 
> 
> This doesn't look right to me from the SCSI angle  I don't see how you can get 
> a successful disconnect on a command the device doesn't receive (I take it 
> this is some type of Fibre magic?).  Of course, if the device (or its proxy) 
> does receive the command then the ordered queue tag implementation requires 
> that the corrupted frame command be processed prior to the barrier,  this 
> isn't optional if you obey the spec.  Thus, assuming the processor does no 
> integrity checking of the command until it does processing (this should be a 
> big if), then we still must get notification of the failed command before the 
> barrier tag is begun.  Obviously, from that notification we do then race to 
> eliminate the overtaking tags.

You don't have disconnect per se with Fibre (or with other packet based
interfaces).  A command is sent as a frame.  If the frame is corrupted
or lost on the way to the target, then the target will never know about
the command having been sent.  There are obvious problems if that command
has an ordered tag, preceded and followed by simple tags dependent on
the ordered semantics being followed, and where the simple tagged commands
following don't wait for the ordered tag to finish.  There are other
failure modes too, of course.

> > There was also the problem of the queue full to the barrier command,
> > etc. 
> 
> The queue full problem still exists.  I've used this argument against the 
> filesystem people many times at the various fora where it has been discussed.  
> The situation is that everyone agrees that it's a theoretical problem, but 
> no-one is convinced that it will actually occur in practice (I think it falls 
> into the "risk we're willing to take" category).

I guess it all depends on how sensitive one is to getting it wrong.  In
any case, it looks as though you are doing what you can . . . .

> James

thanks

jeremy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:50         ` Doug Ledford
  2002-09-03 23:28           ` Alan Cox
  2002-09-04  7:40           ` Jeremy Higdon
@ 2002-09-04 16:13           ` James Bottomley
  2002-09-04 16:50             ` Justin T. Gibbs
  2 siblings, 1 reply; 39+ messages in thread
From: James Bottomley @ 2002-09-04 16:13 UTC (permalink / raw)
  To: James Bottomley, Alan Cox, Justin T. Gibbs, linux-kernel,
	linux-scsi

dledford@redhat.com said:
> Now, granted, that is more complex than going straight to a BDR, but I
>  have to argue that it *isn't* that complex.  It certainly isn't the
> nightmare you make it sound like ;-) 

It's three times longer even in pseudocode...

However, assume we do this (because we must for barrier preservation).  The 
chances are that for a failing device we're aborting a significant number of 
the tags.  This is quite a big increase in the message load over what we do 
now---Particularly for the AIC driver which can have hundreds of tags 
outstanding (murphys law says it's usually the earilest tag which times out).  
I'm not convinced that a BDR, which is a single message and has roughly the 
same effect, isn't preferable.

However, what about a compromise?  We can count outstanding commands, so what 
about doing abort *if* the number of outstanding commands is exactly one (the 
one we're trying to abort).  This means for devices that don't do TCQ (like 
old CD-ROMs) we'll always try abort first.  For large numbers of outstanding 
tags, we skip over abort and move straight to BDR.  The code to implement this 
will be clean and simple because abort no longer has to pay attention to the 
barrier.

James

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 16:13           ` James Bottomley
@ 2002-09-04 16:50             ` Justin T. Gibbs
  2002-09-05  9:39               ` Jeremy Higdon
  0 siblings, 1 reply; 39+ messages in thread
From: Justin T. Gibbs @ 2002-09-04 16:50 UTC (permalink / raw)
  To: James Bottomley, Alan Cox, linux-kernel, linux-scsi

> dledford@redhat.com said:
>> Now, granted, that is more complex than going straight to a BDR, but I
>>  have to argue that it *isn't* that complex.  It certainly isn't the
>> nightmare you make it sound like ;-) 
> 
> It's three times longer even in pseudocode...

To make this work, you really need to use the QErr bit in the
disconnect/reconnect page and/or ECA or ACA.  QErr I believe is
well supported in devices, but ECA (pre SCSI-3) and ACA most
likely receive very little testing.

I will also voice my opinion (again) that watchdog timer recovery
is in the wrong place in Linux.  It belongs in the controller drivers:

1) Only the controller driver knows when to start the timeout
2) Only the controller driver knows the current status of the bus/transport
3) Only the controller can close timeout/just completed races
4) Only the controller driver knows the true transport type
   (SPI/FC/ATA/USB/1394/IP) and what recovery actions are appropriate
   for that transport type given the capabilities of the controller.
5) The algorithm for recovery and maintaining order becomes quite simple:
	1) Freeze the input queue for the controller
	2) Return all transactions unqueued to a device to the mid-layer
	3) Perform the recovery actions required
	4) Unfreeze the controller's queue
	5) Device type driver (sd, cd, tape, etc) decides what errors
	   at what rates should cause the failure of a device.  The
	   controller driver just needs to have the error codes so
	   it can honestly and fully report to the type driver what
	   really happens so it can make good decissions

   This of course assumes that all transactions have a serial number and
   that requeuing transactions orders them by serial number.  With QErr
   set, the device closes the rest if the barrier race for you, but even
   without barriers, full transaction ordering is required if you don't
   want a read to inadvertantly pass a write to the same location during
   recovery.

   For prior art, take a look at FreeBSD.  In the worst case, where
   escalation to a bus reset is required, recovery takes 5 seconds.

--
Justin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 16:50             ` Justin T. Gibbs
@ 2002-09-05  9:39               ` Jeremy Higdon
  2002-09-05 13:35                 ` Justin T. Gibbs
  0 siblings, 1 reply; 39+ messages in thread
From: Jeremy Higdon @ 2002-09-05  9:39 UTC (permalink / raw)
  To: Justin T. Gibbs, James Bottomley, Alan Cox, linux-kernel,
	linux-scsi

On Sep 4, 10:50am, Justin T. Gibbs wrote:
> 
>    This of course assumes that all transactions have a serial number and
>    that requeuing transactions orders them by serial number.  With QErr
>    set, the device closes the rest if the barrier race for you, but even
>    without barriers, full transaction ordering is required if you don't
>    want a read to inadvertantly pass a write to the same location during
>    recovery.


The original FCP (SCSI commands over Fibre) profile specified that QERR=1
was not available.  Unless that is changed, it would appear that you cannot
count on being able to set Qerr.

Qerr was one of those annoying little things in SCSI that forces host
adapter drivers to know a mode page setting.

jeremy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-05  9:39               ` Jeremy Higdon
@ 2002-09-05 13:35                 ` Justin T. Gibbs
  0 siblings, 0 replies; 39+ messages in thread
From: Justin T. Gibbs @ 2002-09-05 13:35 UTC (permalink / raw)
  To: Jeremy Higdon, James Bottomley, Alan Cox, linux-kernel,
	linux-scsi

> On Sep 4, 10:50am, Justin T. Gibbs wrote:
>> 
>>    This of course assumes that all transactions have a serial number and
>>    that requeuing transactions orders them by serial number.  With QErr
>>    set, the device closes the rest if the barrier race for you, but even
>>    without barriers, full transaction ordering is required if you don't
>>    want a read to inadvertantly pass a write to the same location during
>>    recovery.
> 
> 
> The original FCP (SCSI commands over Fibre) profile specified that QERR=1
> was not available.  Unless that is changed, it would appear that you
> cannot count on being able to set Qerr.
> 
> Qerr was one of those annoying little things in SCSI that forces host
> adapter drivers to know a mode page setting.

It is not the controllers that know, but the type drivers.  The controller
drivers should be conduits for commands, nothing more.  With the proper
events and error codes, the type drivers can maintain mode parameters
and only the type drivers know what parameters are appropriate for their
type of service.

For FC, you need to use ECA/ACA anyway as that is the only way to deal with
inflight commands at the time of an error.

--
Justin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 19:09   ` James Bottomley
  2002-09-03 20:59     ` Alan Cox
@ 2002-09-03 21:13     ` Doug Ledford
  2002-09-03 21:48       ` James Bottomley
  2002-09-03 21:24     ` Patrick Mansfield
  2 siblings, 1 reply; 39+ messages in thread
From: Doug Ledford @ 2002-09-03 21:13 UTC (permalink / raw)
  To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, Sep 03, 2002 at 02:09:44PM -0500, James Bottomley wrote:
> dledford@redhat.com said:
> > Leave abort active.  It does actually work in certain scenarios.  The
> > CD  burner scenario that started this thread is an example of
> > somewhere that  an abort should actually do the job. 
> 
> Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.  

REQ_BARRIER is used for filesystem devices (disks mainly) while the 
devices that might benefit most from working aborts would likely be other 
devices.  But, regardless, the REQ_BARRIER ordering *can* be preserved 
while using abort processing.  Since the command that needs aborting is, 
as you are hypothesizing, before the REQ_BARRIER command, and since it 
hasn't completed, then the REQ_BARRIER command can not be complete and 
neither can any of the commands behind the REQ_BARRIER.  On direct access 
devices you are only concerned about ordering around the barrier, not 
ordering of the actual tagged commands, so for abort you can actually call 
abort on all the commands past the REQ_BARRIER command first, then the 
REQ_BARRIER command, then the hung command.  That would do the job and 
preserve REQ_BARRIER ordering while still using aborts.

> At best, abort probably causes a command to overtake a barrier it shouldn't, 
> at worst we abort the ordered tag that is the barrier and transactional 
> integrity is lost.
> 
> When error correction is needed, we have to return all the commands for that 
> device to the block layer so that ordering and barrier issues can be taken 
> care of in the reissue.

Not really, this would be easily enough done in the ML_QUEUE area of the 
scsi layer, but it matters not to me.  However, if you throw a BDR, then 
you have cancelled all outstanding commands and (typically) initiated a 
hard reset of the device which then requires a device settle time.  All of 
this is more drastic and typically takes longer than the individual aborts 
which are completed in a single connect->disconnect cycle without ever 
hitting a data phase and without triggering a full device reset and 
requiring a settle time.

>  This makes LUN RESET (for those that support it) the 
> minimum level of error correction we can apply.

Not true.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 21:13     ` Doug Ledford
@ 2002-09-03 21:48       ` James Bottomley
  2002-09-03 22:42         ` Doug Ledford
  2002-09-04 10:37         ` Andries Brouwer
  0 siblings, 2 replies; 39+ messages in thread
From: James Bottomley @ 2002-09-03 21:48 UTC (permalink / raw)
  To: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

dledford@redhat.com said:
> But, regardless, the REQ_BARRIER ordering *can* be preserved  while
> using abort processing.  Since the command that needs aborting is,  as
> you are hypothesizing, before the REQ_BARRIER command, and since it
> hasn't completed, then the REQ_BARRIER command can not be complete and
>  neither can any of the commands behind the REQ_BARRIER.

You are correct.  However, as soon as you abort the problem command (assuming 
the device recovers from this), it will go on its merry way processing the 
remaining commands in the queue.  Assuming one of these is the barrier, you've 
no way now of re-queueing the aborted command so that it comes before the 
ordered tag barrier.  You can try using a head of queue tag, but it's still a 
nasty race.

> On direct access  devices you are only concerned about ordering around
> the barrier, not  ordering of the actual tagged commands, so for abort
> you can actually call  abort on all the commands past the REQ_BARRIER
> command first, then the  REQ_BARRIER command, then the hung command.
> That would do the job and  preserve REQ_BARRIER ordering while still
> using aborts.

I agree, but the most likely scenario is that now you're trying to abort 
almost every tag for that device in the system.  Isn't reset a simpler 
alternative to this?

> > At best, abort probably causes a command to overtake a barrier it shouldn't, 
> > at worst we abort the ordered tag that is the barrier and transactional 
> > integrity is lost.
> > 
> > When error correction is needed, we have to return all the commands for that 
> > device to the block layer so that ordering and barrier issues can be taken 
> > care of in the reissue.

> Not really, this would be easily enough done in the ML_QUEUE area of
> the  scsi layer, but it matters not to me.  However, if you throw a
> BDR, then  you have cancelled all outstanding commands and (typically)
> initiated a  hard reset of the device which then requires a device
> settle time.  All of  this is more drastic and typically takes longer
> than the individual aborts  which are completed in a single connect->
> disconnect cycle without ever  hitting a data phase and without
> triggering a full device reset and  requiring a settle time. 

I agree.  I certainly could do it.  I'm just a lazy so-and-so.  However, think 
what it does.  Apart from me having to do more work, the code becomes longer 
and the error recovery path more convoluted and difficult to follow.  The 
benefit?  well, error recovery might be faster in certain circumstances.  I 
just don't see that it's a cost effective change.  If you're hitting error 
recovery so often that whether it recovers in  half a second or several 
seconds makes a difference, I'd say there's something else wrong.

James

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 21:48       ` James Bottomley
@ 2002-09-03 22:42         ` Doug Ledford
  2002-09-03 22:52           ` Doug Ledford
                             ` (2 more replies)
  2002-09-04 10:37         ` Andries Brouwer
  1 sibling, 3 replies; 39+ messages in thread
From: Doug Ledford @ 2002-09-03 22:42 UTC (permalink / raw)
  To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, Sep 03, 2002 at 04:48:39PM -0500, James Bottomley wrote:
> dledford@redhat.com said:
> You are correct.  However, as soon as you abort the problem command (assuming 
> the device recovers from this), it will go on its merry way processing the 
> remaining commands in the queue.  Assuming one of these is the barrier, you've 
> no way now of re-queueing the aborted command so that it comes before the 
> ordered tag barrier.  You can try using a head of queue tag, but it's still a 
> nasty race.

(Solution to this race was in my next paragraph as you found ;-)

> > On direct access  devices you are only concerned about ordering around
> > the barrier, not  ordering of the actual tagged commands, so for abort
> > you can actually call  abort on all the commands past the REQ_BARRIER
> > command first, then the  REQ_BARRIER command, then the hung command.
> > That would do the job and  preserve REQ_BARRIER ordering while still
> > using aborts.
> 
> I agree, but the most likely scenario is that now you're trying to abort 
> almost every tag for that device in the system.  Isn't reset a simpler 
> alternative to this?

Not really.  It hasn't been done yet, but one of my goals is to change the 
scsi commands over to reasonable list usage (finally) so that we can avoid 
all these horrible linear scans it does now looking for an available 
command (it also means things like SCSI_OWNER_MID_LAYER can go away 
because ownership is defined implicitly by list membership).  So, 
basically, you have a list item struct on each command.  When you build 
the commands, you add them to SDpnt->free_list.  When you need a command, 
instead of searching for a free one, you just grab the head of 
SDpnt->free_list and use it.  Once you've built the command and are ready 
to hand it off to the lldd, you put the command on the tail of the 
SDpnt->active_list.  When a command completes, you list_remove() it from 
the SDpnt->active_list and put it on the SDpnt->complete_list to be 
handled by the tasklet.  When the tasklet actually completes the command, 
it frees the scsi command struct by simply putting it back on the 
SDpnt->free_list.  Now, if you do things that way, your reset vs. abort 
code is actually pretty trivial.

Case 1: you want to throw a BDR.  Sample code might end up looking like 
this,

	[ oops we timed out ]
	hostt->bus_device_reset(cmd);
	if(!list_empty(cmd->device->active_list)) {
		[ our commands haven't all been returned, spew chunks! ]
	}
	[ do post reset processing ]

Case 2: you want to do an abort, but you need to preserve ordering around 
any possible REQ_BARRIERs on the bus.  This requires that we keep a 
REQ_BARRIER count for the device, it is after all possible that we could 
have multiple barriers active at once, so as each command is put on the 
active_list, if it is a barrier, then we increment SDpnt->barrier_count 
and as we complete commands (at the interrupt context completion, not the 
final completion) if it is a barrier command we decrement the count.

	[ oops we timed out ]
	while(SDpnt->barrier_count && cmd) {
		// when the aborted command is returned via the done()
		// it will remove it from the active_list, so don't remove
		// it here
		abort_cmd = list_get_tail(SDpnt->active_list);
		if(hostt->abort(abort_cmd) != SUCCESS) {
			[ oops, go on to more drastic action ]
		} else {
			if(abort_cmd->type == BARRIER)
				SDpnt->barrier_count--;
			if(abort_cmd == cmd)
				cmd = NULL;
		}
	}
	if(cmd) {
		if(hostt->abort(cmd) != SUCCESS)
			[ oops, go on to more drastic action ]
	}

Now, granted, that is more complex than going straight to a BDR, but I 
have to argue that it *isn't* that complex.  It certainly isn't the 
nightmare you make it sound like ;-)

> > > At best, abort probably causes a command to overtake a barrier it shouldn't, 
> > > at worst we abort the ordered tag that is the barrier and transactional 
> > > integrity is lost.
> > > 
> > > When error correction is needed, we have to return all the commands for that 
> > > device to the block layer so that ordering and barrier issues can be taken 
> > > care of in the reissue.
> 
> > Not really, this would be easily enough done in the ML_QUEUE area of
> > the  scsi layer, but it matters not to me.  However, if you throw a
> > BDR, then  you have cancelled all outstanding commands and (typically)
> > initiated a  hard reset of the device which then requires a device
> > settle time.  All of  this is more drastic and typically takes longer
> > than the individual aborts  which are completed in a single connect->
> > disconnect cycle without ever  hitting a data phase and without
> > triggering a full device reset and  requiring a settle time. 
> 
> I agree.  I certainly could do it.  I'm just a lazy so-and-so.  However, think 
> what it does.  Apart from me having to do more work, the code becomes longer 
> and the error recovery path more convoluted and difficult to follow.  The 
> benefit?  well, error recovery might be faster in certain circumstances.

Well, as I've laid it out above, I don't really think it's all that much 
to implement ;-)  At least not in the mid layer.  The low level device 
drivers are doing *far* more work to support aborts than the mid layer has 
to do.

>  I 
> just don't see that it's a cost effective change.

Matter of some question, I'm sure.  I don't see it as all that much work, 
so it seems reasonably cost effective to me ;-)

>  If you're hitting error 
> recovery so often that whether it recovers in  half a second or several 
> seconds makes a difference, I'd say there's something else wrong.

Hehehe, if you are hitting error recovery at all then something else is 
wrong by definition, the only difference is in how you handle it :-P

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:42         ` Doug Ledford
@ 2002-09-03 22:52           ` Doug Ledford
  2002-09-03 23:29           ` Alan Cox
  2002-09-04 21:16           ` Luben Tuikov
  2 siblings, 0 replies; 39+ messages in thread
From: Doug Ledford @ 2002-09-03 22:52 UTC (permalink / raw)
  To: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, Sep 03, 2002 at 06:42:16PM -0400, Doug Ledford wrote:
> 
> Case 2: you want to do an abort, but you need to preserve ordering around 
> any possible REQ_BARRIERs on the bus.  This requires that we keep a 
> REQ_BARRIER count for the device, it is after all possible that we could 
> have multiple barriers active at once, so as each command is put on the 
> active_list, if it is a barrier, then we increment SDpnt->barrier_count 
> and as we complete commands (at the interrupt context completion, not the 
> final completion) if it is a barrier command we decrement the count.
> 
> 	[ oops we timed out ]
> 	while(SDpnt->barrier_count && cmd) {
> 		// when the aborted command is returned via the done()
> 		// it will remove it from the active_list, so don't remove
> 		// it here
> 		abort_cmd = list_get_tail(SDpnt->active_list);
> 		if(hostt->abort(abort_cmd) != SUCCESS) {
> 			[ oops, go on to more drastic action ]
> 		} else {
> 			if(abort_cmd->type == BARRIER)
> 				SDpnt->barrier_count--;

Oops, delete those last two lines....the done() function decrements the 
barrier_count for us.

> 			if(abort_cmd == cmd)
> 				cmd = NULL;
> 		}
> 	}
> 	if(cmd) {
> 		if(hostt->abort(cmd) != SUCCESS)
> 			[ oops, go on to more drastic action ]
> 	}

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:42         ` Doug Ledford
  2002-09-03 22:52           ` Doug Ledford
@ 2002-09-03 23:29           ` Alan Cox
  2002-09-04 21:16           ` Luben Tuikov
  2 siblings, 0 replies; 39+ messages in thread
From: Alan Cox @ 2002-09-03 23:29 UTC (permalink / raw)
  To: Doug Ledford; +Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, 2002-09-03 at 23:42, Doug Ledford wrote:
> Not really.  It hasn't been done yet, but one of my goals is to change the 
> scsi commands over to reasonable list usage (finally) so that we can avoid 
> all these horrible linear scans it does now looking for an available 
> command (it also means things like SCSI_OWNER_MID_LAYER can go away 

At least partly done in 2.4-ac and waiting pushing to Marcelo. I believe
Hugh contributed the O(1) command finder



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:42         ` Doug Ledford
  2002-09-03 22:52           ` Doug Ledford
  2002-09-03 23:29           ` Alan Cox
@ 2002-09-04 21:16           ` Luben Tuikov
  2 siblings, 0 replies; 39+ messages in thread
From: Luben Tuikov @ 2002-09-04 21:16 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-kernel, linux-scsi

Doug Ledford wrote:
> 
> Not really.  It hasn't been done yet, but one of my goals is to change the
> scsi commands over to reasonable list usage (finally) so that we can avoid
> all these horrible linear scans it does now looking for an available
> command

Using the struct list_head for this will literally allow you to do _magic_.
Avoiding the linear scan is the last thing this will fix.

It would allow for a lot better/simpler/sound design of all of
the mid layer/SCSI core. Things will be/become easier as you
point out below. Currently the mid-layer queuing is hairy at best.

I'm all for it.

> So,
> basically, you have a list item struct on each command.  When you build
> the commands, you add them to SDpnt->free_list.  When you need a command,
> instead of searching for a free one, you just grab the head of
> SDpnt->free_list and use it.  Once you've built the command and are ready
> to hand it off to the lldd, you put the command on the tail of the
> SDpnt->active_list.  When a command completes, you list_remove() it from
> the SDpnt->active_list and put it on the SDpnt->complete_list to be
> handled by the tasklet.  When the tasklet actually completes the command,
> it frees the scsi command struct by simply putting it back on the
> SDpnt->free_list.

Great!

Once you're on that train, you may want to rethink the whole queuing
mechanism of the mid-layer (straight from sd/etc and internally down to LLDD)
for an improved design.

There'd be problems like cmd moving b/n lists is atomic, only cmd movers
can actually cancel a command, move before calling queuecommand(), etc,
but is nothing extraordinary.

> Now, granted, that is more complex than going straight to a BDR, but I
> have to argue that it *isn't* that complex.  It certainly isn't the
> nightmare you make it sound like ;-)

No, it certainly is NOT!

Granted, by looking at the code it will not be overly clear
who moves what and when, but a 2 page commentary on the design
would only leave one exlaiming ``Aaaaah... such simplicity, so great!'' 

> Well, as I've laid it out above, I don't really think it's all that much
> to implement ;-)  At least not in the mid layer.

Right, it's not. This type of queuing mechanism would only make things
more consistent and easy to manipulate.

There'd be logistical issues, but those are easy to figure out
with pen and paper.

-- 
Luben

``Perfection is achieved not when there is nothing more to add
  but when there is nothing left to take away.''
                              Antoine de Saint Exupery

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 21:48       ` James Bottomley
  2002-09-03 22:42         ` Doug Ledford
@ 2002-09-04 10:37         ` Andries Brouwer
  2002-09-04 10:48           ` Doug Ledford
  2002-09-04 11:23           ` Alan Cox
  1 sibling, 2 replies; 39+ messages in thread
From: Andries Brouwer @ 2002-09-04 10:37 UTC (permalink / raw)
  To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, Sep 03, 2002 at 04:48:39PM -0500, James Bottomley wrote:

> If you're hitting error recovery so often that whether it recovers
> in  half a second or several seconds makes a difference, I'd say
> there's something else wrong.

Not that I want to contradict, but an example.
Without my sd.c patch from yesterday or so (fixing MODE SENSE calls)
an "insmod usb-storage.o" would take 14 minutes and 6 seconds for me.

[One USB device, with 3 subdevices, gets into a bad state when
presented with a MODE SENSE command that asks for more than the
56 bytes it has available. For each of the three subdevices we
get a long sequence of retries, abort, reset, host reset, bus reset
before it is taken off-line.]

The scsi error recovery has many bad properties, but one is its slowness.
Once it gets triggered on a machine with SCSI disks it is common to
have a dead system for several minutes. I have not yet met a situation
in which rebooting was not preferable above scsi error recovery,
especially since the attempt to recover often fails.

Andries

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 10:37         ` Andries Brouwer
@ 2002-09-04 10:48           ` Doug Ledford
  2002-09-04 11:23           ` Alan Cox
  1 sibling, 0 replies; 39+ messages in thread
From: Doug Ledford @ 2002-09-04 10:48 UTC (permalink / raw)
  To: Andries Brouwer
  Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

On Wed, Sep 04, 2002 at 12:37:37PM +0200, Andries Brouwer wrote:
> 
> The scsi error recovery has many bad properties, but one is its slowness.

This does not have to be this way.  It is a solvable problem.

> Once it gets triggered on a machine with SCSI disks it is common to
> have a dead system for several minutes.

Yes, well, this too is solvable.  It, in fact, reminds me that one of the 
things I think needs added to the scsi host settings is a default timeout 
value for typical devices.  Something like adding a default_timeout value 
to each Scsi_Device struct and allowing the scsi driver to modify the 
value during the slave_attach() call.  Then we can put the default timeout 
on non-intelligent controllers to something sane while things like 
MegaRAID controllers can keep their sky high timeout values.

> I have not yet met a situation
> in which rebooting was not preferable above scsi error recovery,
> especially since the attempt to recover often fails.

This, too, is solvable.  It just requires that the scsi subsystem start 
paying attention to *how* things fail and making the error handling code 
smart enough to know when to retry things and when to just fail.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 10:37         ` Andries Brouwer
  2002-09-04 10:48           ` Doug Ledford
@ 2002-09-04 11:23           ` Alan Cox
  2002-09-04 16:25             ` Rogier Wolff
  1 sibling, 1 reply; 39+ messages in thread
From: Alan Cox @ 2002-09-04 11:23 UTC (permalink / raw)
  To: Andries Brouwer
  Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

On Wed, 2002-09-04 at 11:37, Andries Brouwer wrote:
> The scsi error recovery has many bad properties, but one is its slowness.
> Once it gets triggered on a machine with SCSI disks it is common to
> have a dead system for several minutes. I have not yet met a situation
> in which rebooting was not preferable above scsi error recovery,
> especially since the attempt to recover often fails.

Well I for one prefer the scsi timeout/abort sequence on a CD getting
confused badly by a bad block (as at least some of my drives do) to a
reboot everytime I get bad media


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 11:23           ` Alan Cox
@ 2002-09-04 16:25             ` Rogier Wolff
  2002-09-04 19:34               ` Thunder from the hill
  0 siblings, 1 reply; 39+ messages in thread
From: Rogier Wolff @ 2002-09-04 16:25 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andries Brouwer, James Bottomley, Justin T. Gibbs, linux-kernel,
	linux-scsi

On Wed, Sep 04, 2002 at 12:23:13PM +0100, Alan Cox wrote:
> On Wed, 2002-09-04 at 11:37, Andries Brouwer wrote:
> > The scsi error recovery has many bad properties, but one is its slowness.
> > Once it gets triggered on a machine with SCSI disks it is common to
> > have a dead system for several minutes. I have not yet met a situation
> > in which rebooting was not preferable above scsi error recovery,
> > especially since the attempt to recover often fails.
> 
> Well I for one prefer the scsi timeout/abort sequence on a CD getting
> confused badly by a bad block (as at least some of my drives do) to a
> reboot everytime I get bad media

Reboot is bad. Retries are bad. 

Errors should be returned to an upper layer, with an error code: "may
retry", or "will never work". (Like in SMTP)

I will most likely set the "retry count" to 0: Never retry. Almost
never works anyway. And the disk already retried manytimes, so
retrying in software is only "taking time".

We do datarecovery around here. We get bad disks on a dayly basis. We
are currently reading a drive that gets over 10Mb per second while
spitting out bad block reports!

Thing is: those blocks that didn't work first time, may work on our
second retry. However, we need userspace control over that retry. We
prefer to get the 18G worth of data off the disk first, and only then
retry the blocks that happened to be bad first time around.

			Roger.

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* The Worlds Ecosystem is a stable system. Stable systems may experience *
* excursions from the stable situation. We are currenly in such an       * 
* excursion: The stable situation does not include humans. ***************

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 16:25             ` Rogier Wolff
@ 2002-09-04 19:34               ` Thunder from the hill
  0 siblings, 0 replies; 39+ messages in thread
From: Thunder from the hill @ 2002-09-04 19:34 UTC (permalink / raw)
  To: Rogier Wolff
  Cc: Alan Cox, Andries Brouwer, James Bottomley, Justin T. Gibbs,
	linux-kernel, linux-scsi

Hi,

On Wed, 4 Sep 2002, Rogier Wolff wrote:
> I will most likely set the "retry count" to 0: Never retry. Almost
> never works anyway. And the disk already retried manytimes, so
> retrying in software is only "taking time".

fd = open(pathname, O_NORETRY); ?

			Thunder
-- 
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 19:09   ` James Bottomley
  2002-09-03 20:59     ` Alan Cox
  2002-09-03 21:13     ` Doug Ledford
@ 2002-09-03 21:24     ` Patrick Mansfield
  2002-09-03 22:02       ` James Bottomley
  2 siblings, 1 reply; 39+ messages in thread
From: Patrick Mansfield @ 2002-09-03 21:24 UTC (permalink / raw)
  To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi

James -

On Tue, Sep 03, 2002 at 02:09:44PM -0500, James Bottomley wrote:
> dledford@redhat.com said:
> > Leave abort active.  It does actually work in certain scenarios.  The
> > CD  burner scenario that started this thread is an example of
> > somewhere that  an abort should actually do the job. 
> 
> Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.  
> At best, abort probably causes a command to overtake a barrier it shouldn't, 
> at worst we abort the ordered tag that is the barrier and transactional 
> integrity is lost.
> 
> When error correction is needed, we have to return all the commands for that 
> device to the block layer so that ordering and barrier issues can be taken 
> care of in the reissue.  This makes LUN RESET (for those that support it) the 
> minimum level of error correction we can apply.
> 
> James

If we only send an abort or reset after a quiesce I don't see why one
is better than the other.

Not specific to reset or abort - if a single command gets an error, we
wait for oustanding commands to complete before starting up the error
handler thread. If all the commands (error one and outstanding) have
barriers, those that do not error out will complete out of order from
the errored command.

How is this properly handled? 

And what happens if one command gets some sort of check condition (like
medium error, or aborted command) that causes a retry? Will IO's still
be correctly ordered?

The abort could also be usefull in handling the locking/ownership of the
scsi_cmnd - the abort at the LLD layer can be used by the LLD to cancel
any software timeouts, as well as to flush the command from the hardware.
After the abort, the mid-layer could assume that it once again "owned"
the scsi_cmnd, especially if the LLD abort were a required function.

I would like to see error handling occur without quiescing the entire
adapter before taking any action. Stopping all adapter IO for a timeout
can be a bit expensive - imagine a tape drive and multiple disks on an
adapter, any IO disk timeout or failure will wait for the tape IO to
complete before allowing any other IO, if the tape operation is long or
is going to timeout this could be minutes or hours.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 21:24     ` Patrick Mansfield
@ 2002-09-03 22:02       ` James Bottomley
  2002-09-03 23:26         ` Alan Cox
  0 siblings, 1 reply; 39+ messages in thread
From: James Bottomley @ 2002-09-03 22:02 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

patmans@us.ibm.com said:
> If we only send an abort or reset after a quiesce I don't see why one
> is better than the other.

Quiesce means from the top (no more commands go from the block layer to the 
device) not from the bottom (commands can still be completed for the device).

> Not specific to reset or abort - if a single command gets an error, we
> wait for oustanding commands to complete before starting up the error
> handler thread. If all the commands (error one and outstanding) have
> barriers, those that do not error out will complete out of order from
> the errored command. 

We don't wait.  The commands may possibly be completing in parallel with 
recovery.  To address your point, though, that's why the device needs to be 
reset as fast as possible: to preserve what's left of the command order.  I 
accept that for a misbehaving device, this may be a race.

> And what happens if one command gets some sort of check condition
> (like medium error, or aborted command) that causes a retry? Will IO's
> still be correctly ordered? 

Retries get eliminated.  It should be up to the upper layers (sd or beyond) to 
say whether a retry is done.  Since, as you point out, retries automatically 
violate any barrier, it is probably up to the block layer to judge what should 
be done about the problem.

> I would like to see error handling occur without quiescing the entire
> adapter before taking any action. Stopping all adapter IO for a
> timeout can be a bit expensive - imagine a tape drive and multiple
> disks on an adapter, any IO disk timeout or failure will wait for the
> tape IO to complete before allowing any other IO, if the tape
> operation is long or is going to timeout this could be minutes or
> hours. 

Actually, I do think that quiescing has an important role to play.  A lot of 
drive errors can be self inflicted indigestion by accepting more tags into the 
queue than it can process.  Quiescing the queue lets us see if the drive can 
digest what it currently has or whether we need to apply a strong emetic.

I'd actually like to add tag starvation recovery to the error handler's list 
of things to do.  In that case, all you do on entry to the error handler is 
quiesce the upper queue and wait a while to see if the commands continue 
returning.  You only begin more drastic measures if nothing comes back in the 
designated time period.

James

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:02       ` James Bottomley
@ 2002-09-03 23:26         ` Alan Cox
  0 siblings, 0 replies; 39+ messages in thread
From: Alan Cox @ 2002-09-03 23:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: Patrick Mansfield, Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, 2002-09-03 at 23:02, James Bottomley wrote:
> > And what happens if one command gets some sort of check condition
> > (like medium error, or aborted command) that causes a retry? Will IO's
> > still be correctly ordered? 
> 
> Retries get eliminated.  It should be up to the upper layers (sd or beyond) to 
> say whether a retry is done.  Since, as you point out, retries automatically 
> violate any barrier, it is probably up to the block layer to judge what should 
> be done about the problem.

Then we need to give the block layer a lot more information about what
kind of a problem occurred


^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2002-09-05 13:32 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-06-11  2:46 Proposed changes to generic blk tag for use in SCSI (1/3) James Bottomley
2002-06-11  5:50 ` Jens Axboe
2002-06-11 14:29   ` James Bottomley
2002-06-11 14:45     ` Jens Axboe
2002-06-11 16:39       ` James Bottomley
2002-06-13 21:01 ` Doug Ledford
2002-06-13 21:26   ` James Bottomley
2002-06-13 21:50     ` Doug Ledford
2002-06-13 22:09       ` James Bottomley
  -- strict thread matches above, loose matches on Subject: below --
2002-09-03 14:35 aic7xxx sets CDR offline, how to reset? James Bottomley
2002-09-03 18:23 ` Doug Ledford
2002-09-03 19:09   ` James Bottomley
2002-09-03 20:59     ` Alan Cox
2002-09-03 21:32       ` James Bottomley
2002-09-03 21:54         ` Alan Cox
2002-09-03 22:50         ` Doug Ledford
2002-09-03 23:28           ` Alan Cox
2002-09-04  7:40           ` Jeremy Higdon
2002-09-04 16:24             ` James Bottomley
2002-09-04 17:13               ` Mike Anderson
2002-09-05  9:50               ` Jeremy Higdon
2002-09-04 16:13           ` James Bottomley
2002-09-04 16:50             ` Justin T. Gibbs
2002-09-05  9:39               ` Jeremy Higdon
2002-09-05 13:35                 ` Justin T. Gibbs
2002-09-03 21:13     ` Doug Ledford
2002-09-03 21:48       ` James Bottomley
2002-09-03 22:42         ` Doug Ledford
2002-09-03 22:52           ` Doug Ledford
2002-09-03 23:29           ` Alan Cox
2002-09-04 21:16           ` Luben Tuikov
2002-09-04 10:37         ` Andries Brouwer
2002-09-04 10:48           ` Doug Ledford
2002-09-04 11:23           ` Alan Cox
2002-09-04 16:25             ` Rogier Wolff
2002-09-04 19:34               ` Thunder from the hill
2002-09-03 21:24     ` Patrick Mansfield
2002-09-03 22:02       ` James Bottomley
2002-09-03 23:26         ` Alan Cox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox