* Proposed changes to generic blk tag for use in SCSI (1/3)
@ 2002-06-11 2:46 James Bottomley
2002-06-11 5:50 ` Jens Axboe
2002-06-13 21:01 ` Doug Ledford
0 siblings, 2 replies; 39+ messages in thread
From: James Bottomley @ 2002-06-11 2:46 UTC (permalink / raw)
To: axboe, linux-scsi; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 845 bytes --]
The attached is what I needed in the generic block layer to get the SCSI
subsystem using it for Tag Command Queueing.
The changes are basically
1) I need a function to find the tagged request given the queue and the tag,
which I've added as a function in the block layer
2) The SCSI queue will stall if it gets an untagged request in the stream, so
once tagged queueing is enabled, all commands (including SPECIALS) must be
tagged. I altered the check in blk_queue_start_tag to permit this.
This is part of a set of three patches which provide a sample implementation
of a SCSI driver using the generic TCQ code.
There are several shortcomings of the prototype, most notably it doesn't have
tag starvation detection and processing. However, I think I can re-introduce
this as part of the error handler functions.
James Bottomley
[-- Attachment #2: blk-tag-2.5.21.diff --]
[-- Type: text/plain , Size: 2554 bytes --]
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.585 -> 1.586
# drivers/block/ll_rw_blk.c 1.67 -> 1.68
# include/linux/blkdev.h 1.44 -> 1.45
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/06/10 jejb@mulgrave.(none) 1.586
# [BLK LAYER]
#
# add find tag function, adjust criteria for tagging commands.
# --------------------------------------------
#
diff -Nru a/drivers/block/ll_rw_blk.c b/drivers/block/ll_rw_blk.c
--- a/drivers/block/ll_rw_blk.c Mon Jun 10 22:29:38 2002
+++ b/drivers/block/ll_rw_blk.c Mon Jun 10 22:29:38 2002
@@ -304,6 +304,27 @@
}
/**
+ * blk_queue_find_tag - find a request by its tag and queue
+ *
+ * @q: The request queue for the device
+ * @tag: The tag of the request
+ *
+ * Notes:
+ * Should be used when a device returns a tag and you want to match
+ * it with a request.
+ *
+ * no locks need be held.
+ **/
+struct request *blk_queue_find_tag(request_queue_t *q, int tag)
+{
+ struct blk_queue_tag *bqt = q->queue_tags;
+
+ if(unlikely(bqt == NULL || bqt->max_depth < tag))
+ return NULL;
+
+ return bqt->tag_index[tag];
+}
+/**
* blk_queue_free_tags - release tag maintenance info
* @q: the request queue for the device
*
@@ -448,7 +469,7 @@
unsigned long *map = bqt->tag_map;
int tag = 0;
- if (unlikely(!(rq->flags & REQ_CMD)))
+ if (unlikely((rq->flags & REQ_QUEUED)))
return 1;
for (map = bqt->tag_map; *map == -1UL; map++) {
@@ -1945,6 +1966,7 @@
EXPORT_SYMBOL(ll_10byte_cmd_build);
EXPORT_SYMBOL(blk_queue_prep_rq);
+EXPORT_SYMBOL(blk_queue_find_tag);
EXPORT_SYMBOL(blk_queue_init_tags);
EXPORT_SYMBOL(blk_queue_free_tags);
EXPORT_SYMBOL(blk_queue_start_tag);
diff -Nru a/include/linux/blkdev.h b/include/linux/blkdev.h
--- a/include/linux/blkdev.h Mon Jun 10 22:29:38 2002
+++ b/include/linux/blkdev.h Mon Jun 10 22:29:38 2002
@@ -339,6 +339,7 @@
#define blk_queue_tag_queue(q) ((q)->queue_tags->busy < (q)->queue_tags->max_depth)
#define blk_rq_tagged(rq) ((rq)->flags & REQ_QUEUED)
extern int blk_queue_start_tag(request_queue_t *, struct request *);
+extern struct request *blk_queue_find_tag(request_queue_t *, int);
extern void blk_queue_end_tag(request_queue_t *, struct request *);
extern int blk_queue_init_tags(request_queue_t *, int);
extern void blk_queue_free_tags(request_queue_t *);
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: Proposed changes to generic blk tag for use in SCSI (1/3) 2002-06-11 2:46 Proposed changes to generic blk tag for use in SCSI (1/3) James Bottomley @ 2002-06-11 5:50 ` Jens Axboe 2002-06-11 14:29 ` James Bottomley 2002-06-13 21:01 ` Doug Ledford 1 sibling, 1 reply; 39+ messages in thread From: Jens Axboe @ 2002-06-11 5:50 UTC (permalink / raw) To: James Bottomley; +Cc: linux-scsi, linux-kernel On Mon, Jun 10 2002, James Bottomley wrote: > The attached is what I needed in the generic block layer to get the SCSI > subsystem using it for Tag Command Queueing. > > The changes are basically > > 1) I need a function to find the tagged request given the queue and the tag, > which I've added as a function in the block layer Ehm it's already there, one could argue that it's pretty core functionality for this type of stuff :-). It's called blk_queue_get_tag(q, tag), and it's in blkdev.h. However, I agree that we should just move it into ll_rw_blk.c. That gets better documented as well. Could you redo that part? > 2) The SCSI queue will stall if it gets an untagged request in the stream, so > once tagged queueing is enabled, all commands (including SPECIALS) must be > tagged. I altered the check in blk_queue_start_tag to permit this. I completely agree with this, blk_queue_start_tag() should not need to know about these things so just checking if the request is already marked tagged is fine with me. But please make that a warning, like if (rq->flags & REQ_QUEUED) { printk("blk_queue_start_tag: rq already tagged\n"); return 1; } Also, you need to fix drivers/ide/tcq.c to make sure it doesn't call blk_queue_start_tag() for non REQ_CMD requests. Ah wait, I'll just change that. And also _please_ fix the comment about REQ_CMD and not just the code, it's doesn't stand anymore. > This is part of a set of three patches which provide a sample implementation > of a SCSI driver using the generic TCQ code. Cool! Looking forward to reviewing it later today. -- Jens Axboe ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Proposed changes to generic blk tag for use in SCSI (1/3) 2002-06-11 5:50 ` Jens Axboe @ 2002-06-11 14:29 ` James Bottomley 2002-06-11 14:45 ` Jens Axboe 0 siblings, 1 reply; 39+ messages in thread From: James Bottomley @ 2002-06-11 14:29 UTC (permalink / raw) To: Jens Axboe; +Cc: James Bottomley, linux-scsi, linux-kernel axboe@suse.de said: > Ehm it's already there, one could argue that it's pretty core > functionality for this type of stuff :-). It's called > blk_queue_get_tag(q, tag), and it's in blkdev.h. However, I agree that > we should just move it into ll_rw_blk.c. That gets better documented > as well. Could you redo that part? I guessed it must be. I grepped the IDE tree looking for anything with `get' or `find' in it, but came up empty. It's actually called blk_queue_tag_request(), which is why I didn't find it. Do you want me to keep this name if I move it? axboe@suse.de said: > I completely agree with this, blk_queue_start_tag() should not need to > know about these things so just checking if the request is already > marked tagged is fine with me. But please make that a warning, like Actually, I think it should be a BUG(). By the time a tagged request comes in to blk_queue_start_tag, we must already have corrupted the lists since we use the same list element (req->queuelist) to queue on both the tag queue and the request queue. > And also _please_ fix the comment about REQ_CMD and not just the code, > it's doesn't stand anymore. Will do...I didn't see much point altering the comment in the prototype until there was agreement that it was OK to do it this way. James ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Proposed changes to generic blk tag for use in SCSI (1/3) 2002-06-11 14:29 ` James Bottomley @ 2002-06-11 14:45 ` Jens Axboe 2002-06-11 16:39 ` James Bottomley 0 siblings, 1 reply; 39+ messages in thread From: Jens Axboe @ 2002-06-11 14:45 UTC (permalink / raw) To: James Bottomley; +Cc: linux-scsi, linux-kernel On Tue, Jun 11 2002, James Bottomley wrote: > axboe@suse.de said: > > Ehm it's already there, one could argue that it's pretty core > > functionality for this type of stuff :-). It's called > > blk_queue_get_tag(q, tag), and it's in blkdev.h. However, I agree that > > we should just move it into ll_rw_blk.c. That gets better documented > > as well. Could you redo that part? > > I guessed it must be. I grepped the IDE tree looking for anything with `get' > or `find' in it, but came up empty. It's actually called > blk_queue_tag_request(), which is why I didn't find it. > > Do you want me to keep this name if I move it? Nah the name sucks, you see I didn't even remember it myself either. Why not just change the name to something sane, like blk_queue_get_tag() or blk_queue_find_tag(). I think yours (the latter) describes it best. Just ide accordingly as well, that's the sole current user. > axboe@suse.de said: > > I completely agree with this, blk_queue_start_tag() should not need to > > know about these things so just checking if the request is already > > marked tagged is fine with me. But please make that a warning, like > > Actually, I think it should be a BUG(). By the time a tagged request > comes in to blk_queue_start_tag, we must already have corrupted the > lists since we use the same list element (req->queuelist) to queue on > both the tag queue and the request queue. Agree, make it a BUG_ON() or something. > > And also _please_ fix the comment about REQ_CMD and not just the code, > > it's doesn't stand anymore. > > Will do...I didn't see much point altering the comment in the > prototype until there was agreement that it was OK to do it this way. Fine. -- Jens Axboe ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Proposed changes to generic blk tag for use in SCSI (1/3) 2002-06-11 14:45 ` Jens Axboe @ 2002-06-11 16:39 ` James Bottomley 0 siblings, 0 replies; 39+ messages in thread From: James Bottomley @ 2002-06-11 16:39 UTC (permalink / raw) To: Jens Axboe; +Cc: James Bottomley, linux-scsi, linux-kernel OK, this is the new patch with all the changes. I didn't alter the IDE blk_queue_start_tag() to check for a REQ_CMD, since you said you would do that. James You can import this changeset into BK by piping this whole message to: '| bk receive [path to repository]' or apply the patch as usual. =================================================================== ChangeSet@1.490, 2002-06-11 12:33:05-04:00, jejb@mulgrave.(none) [BLK TCQ] remove blk_queue_tag_request() in favour of blk_queue_find_tag() Allow any request type into blk_queue_start_tag() ChangeSet@1.489, 2002-06-11 10:34:17-04:00, jejb@mulgrave.(none) [BLK LAYER] add find tag function, adjust criteria for tagging commands. drivers/block/ll_rw_blk.c | 40 ++++++++++++++++++++++++++++++++++------ drivers/ide/tcq.c | 2 +- include/linux/blkdev.h | 2 +- 3 files changed, 36 insertions(+), 8 deletions(-) diff -Nru a/drivers/block/ll_rw_blk.c b/drivers/block/ll_rw_blk.c --- a/drivers/block/ll_rw_blk.c Tue Jun 11 12:38:12 2002 +++ b/drivers/block/ll_rw_blk.c Tue Jun 11 12:38:12 2002 @@ -297,6 +297,27 @@ } /** + * blk_queue_find_tag - find a request by its tag and queue + * + * @q: The request queue for the device + * @tag: The tag of the request + * + * Notes: + * Should be used when a device returns a tag and you want to match + * it with a request. + * + * no locks need be held. + **/ +struct request *blk_queue_find_tag(request_queue_t *q, int tag) +{ + struct blk_queue_tag *bqt = q->queue_tags; + + if(unlikely(bqt == NULL || bqt->max_depth < tag)) + return NULL; + + return bqt->tag_index[tag]; +} +/** * blk_queue_free_tags - release tag maintenance info * @q: the request queue for the device * @@ -429,10 +450,12 @@ * Description: * This can either be used as a stand-alone helper, or possibly be * assigned as the queue &prep_rq_fn (in which case &struct request - * automagically gets a tag assigned). Note that this function assumes - * that only REQ_CMD requests can be queued! The request will also be - * removed from the request queue, so it's the drivers responsibility to - * readd it if it should need to be restarted for some reason. + * automagically gets a tag assigned). Note that this function + * assumes that any type of request can be queued! if this is not + * true for your device, you must check the request type before + * calling this function. The request will also be removed from + * the request queue, so it's the drivers responsibility to readd + * it if it should need to be restarted for some reason. * * Notes: * queue lock must be held. @@ -443,8 +466,12 @@ unsigned long *map = bqt->tag_map; int tag = 0; - if (unlikely(!(rq->flags & REQ_CMD))) - return 1; + if (unlikely((rq->flags & REQ_QUEUED))) { + printk(KERN_ERR + "request %p for device [02%x:02%x] already tagged %d", + rq, major(rq->rq_dev), minor(rq->rq_dev), rq->tag); + BUG(); + } for (map = bqt->tag_map; *map == -1UL; map++) { tag += BLK_TAGS_PER_LONG; @@ -2027,6 +2054,7 @@ EXPORT_SYMBOL(blk_queue_prep_rq); +EXPORT_SYMBOL(blk_queue_find_tag); EXPORT_SYMBOL(blk_queue_init_tags); EXPORT_SYMBOL(blk_queue_free_tags); EXPORT_SYMBOL(blk_queue_start_tag); diff -Nru a/drivers/ide/tcq.c b/drivers/ide/tcq.c --- a/drivers/ide/tcq.c Tue Jun 11 12:38:12 2002 +++ b/drivers/ide/tcq.c Tue Jun 11 12:38:12 2002 @@ -282,7 +282,7 @@ TCQ_PRINTK("%s: stat %x, feat %x\n", __FUNCTION__, stat, feat); - rq = blk_queue_tag_request(&drive->queue, tag); + rq = blk_queue_find_tag(&drive->queue, tag); if (!rq) { printk(KERN_ERR"%s: missing request for tag %d\n", __FUNCTION__, tag); return ide_stopped; diff -Nru a/include/linux/blkdev.h b/include/linux/blkdev.h --- a/include/linux/blkdev.h Tue Jun 11 12:38:12 2002 +++ b/include/linux/blkdev.h Tue Jun 11 12:38:12 2002 @@ -328,11 +328,11 @@ /* * tag stuff */ -#define blk_queue_tag_request(q, tag) ((q)->queue_tags->tag_index[(tag)]) #define blk_queue_tag_depth(q) ((q)->queue_tags->busy) #define blk_queue_tag_queue(q) ((q)->queue_tags->busy < (q)->queue_tags-> max_depth) #define blk_rq_tagged(rq) ((rq)->flags & REQ_QUEUED) extern int blk_queue_start_tag(request_queue_t *, struct request *); +extern struct request *blk_queue_find_tag(request_queue_t *, int); extern void blk_queue_end_tag(request_queue_t *, struct request *); extern int blk_queue_init_tags(request_queue_t *, int); extern void blk_queue_free_tags(request_queue_t *); =================================================================== This BitKeeper patch contains the following changesets: 1.489..1.490 ## Wrapped with gzip_uu ## begin 664 bkpatch9679 M'XL(`'0G!CT``\U8^W/:.!#^&?\5>^VT![D`\I-'+YF\F%XF:9J29N8ZO0ZC MV`*<^`&6@#!'__=;R>892IJT=,YA@BVO5OOX]M.*EW#-65+/W;+;&^TE_!5S M4<^%@Z"3T"$KY:,X8@4<;\8QCI>[<<C*4K1\=%:^">Z*$1L5!>T4C9*MH=@E M%6X7ABSA]9Q>,F<C8MQC]5RS\?;Z_+"I:7M[<-RE48==,0%[>YJ(DR$-/'Y` M13>(HY)(:,1#)FC)C</)3'1B$&+@GZU73&([$]TA5F7BZIZN4TMG'C&LJF/- MM4EK-^MR2(W84M_$L&NFK9V`7K*J-2!&F3AE70>=U$VKKE>*Q*H3`M+U@Y7H MP!\&%(EV!#_7BV/-A<]'YV=P?OBIT?R"3_BAG@=M/_(`8P[M0>0*/XYV<?AV MP`6XB2]8XE-HQXF4Z/A1!W#MD$8>+VEG8%B$U+3+>?"UXA,O32.4:/N/^.HE MOL0`(B1V[\I!T$I&+41+R5WP'4VQT'>G6IM43=UT+.+JAN>XMNFL#?-C2AU= MURW3THT)L:VJ_:B-?N0&`X^5`S\:W$LL>VQ8ZBXFIV:1B648IC6YH<RUJ5,Q M:J1BMF^,]09NTKA@7968AJ:MU;"*SFP2HE,E3J&S1A;0:=1-LT[LC>@TMXK. MC\<?,FPF+(R'#-#O5G_`!JR%"&PE#.^YR!?`CZ!-A_$@@;B]("31+"7S!51Q M&`3Q"&@TAFR>(@Z<*N*%*5S01&1S$--HIOX+,/WK\>*8ADTF1+<P]?O@T8"Y M=_X!&_H<B[XX9)$8)(PO%9R/ZH3;G]:$34R]HB.U371\PC16VI6*P=KM"G&K MK&9M+K0597.33)LXQO^6!%(;#=NIZ&JG>>".W'&V$<UG*74P#(Y9Q4HS;02/ MJG'=62YQHVZ3C26N0U'?2HE?T)"!JR8NE^UR;2^5YT)%GT$*E?=03$;J@]5V M^3`CSRC9$Z-J@ZZ=IE^YI`][ZVQXK=8J[JOA7;DI%MXLH>(!D![O1WX0V-\' MDV\!?`87P['-%"Y5\T&_8FSN5\QMX47V)P^S,.M4)%!4^T)G!-]!=W&\RT#- MD3K2]B;=5`[3SF8-^4,4"ZDO8>T!9_).#OIM-=YL?&@=OSLI*77C5(8&4R&7 MX@RN%KTZOCJ%GM]C2,)*">H/`MDKJ2;B&\!]D)QG`/C4J-7`T#7861>QXFJ< M;L;@"Z[,EP%*@P4[<OI!OYZ#C^C,5%:]3)M`',5=Q7>9$L39=24IU6`]B_FD M3-=%+!BORSN\KKKQ(,"$,L!X>3#J8J9HI@_G(6@CC@-3F\;Q`$8T4ED)9=N? MJ?$%C'S1G3M3RA;#*XI!QI)#Q)A:J<L"3[[?*6M<)`-7S+S:65/>V;LI*<%. M?U=V"ZK.M7^U7*9BB;=041]/'=#/6$&.\3?:/UK.;^<'4>#?L6"<5S)[<'%] M?@Z3">!C<3^D]RV/]="5/]4*!2V72\.@Y)2.[%G)2Y)$2]G]9[S[\D;[JI5W M=K03RW(D<Z5?N"C,5\TG:%4[0(/@M0+QA^O&=>.D@$N=8@G6<$+C[\OWS8^M MJT_OCMZ?YQ_&1/+;S^B?%;-83]V(\"CD;(590GK'UM(`=5U,B2ISU2PBK*>( MR;-[]0XK01)!D##JC=7!"+&&!G/5/ZI&88N%?F+9)EB8;]L!)X,]'8@XI!W? M1;(90X>)61UQ[G<BYA5*JA2Q0BFBN>OS&8=.-7`^"!6)T?7.NS22]:3"Y?TF MB5&IP0\29*8$JR/EB;'LR]/"WE5U'*KS9!?WJ4622!>Y83B%92JD!_*8N61C M"9;X:.0'`8:?Q]*@])B`Q]@D#J=FK'+7+J"L+WY/.3I+!HKP7AQQ_\8/?#%. MN1\WG#G+H(_XGZ>LI0A%9$LJN,A%T5<>AW*(<K03D^-4P,#D.%5,SO<6(R"U MY'H),LU=_JS1O&@UFDW`(4BO%U-G7O74BAEC?B;&J_NZ_/=E%8ROO!>[\_D) MLEA(;^-$69#TD72&!1SRH]4A^9#6?"YW=/TV+V^^JOYF_>'B\>;F1XXYFSN; MC<>=65M#\#LE'UM_8ENC;^N@^XVNII?$(I85@2RB#O4K)++>X>>T"J8IFUQV M+QAN+<_9%M6N^,V=X2D_7*C,&$_<%LBV^LU-OSE`7C9,">L%U&4A(E%RW[IC FBMP%U"%[6_D[,4T,U/SW4<6K2-][)JVZ58>VM?\`!I#L'H85```` ` end ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Proposed changes to generic blk tag for use in SCSI (1/3) 2002-06-11 2:46 Proposed changes to generic blk tag for use in SCSI (1/3) James Bottomley 2002-06-11 5:50 ` Jens Axboe @ 2002-06-13 21:01 ` Doug Ledford 2002-06-13 21:26 ` James Bottomley 1 sibling, 1 reply; 39+ messages in thread From: Doug Ledford @ 2002-06-13 21:01 UTC (permalink / raw) To: James Bottomley; +Cc: axboe, linux-scsi, linux-kernel On Mon, Jun 10, 2002 at 10:46:44PM -0400, James Bottomley wrote: > 2) The SCSI queue will stall if it gets an untagged request in the stream, so > once tagged queueing is enabled, all commands (including SPECIALS) must be > tagged. I altered the check in blk_queue_start_tag to permit this. Hmmm...this seems broken to me. Switching from tagged to untagged momentarily and then back is perfectly valid. Can the bio layer handle this and not the scsi layer, or are both layers unable to handle this sort of tag manipulation? > There are several shortcomings of the prototype, most notably it doesn't have > tag starvation detection and processing. However, I think I can re-introduce > this as part of the error handler functions. If you are using the bio layer tag processing, then it should be doing this part I would think. If it isn't, then it sounds like either it's design is missing some key elements required to be fully functional or the integration between the scsi layer and the bio layer needs some additional work. -- Doug Ledford <dledford@redhat.com> 919-754-3700 x44233 Red Hat, Inc. 1801 Varsity Dr. Raleigh, NC 27606 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Proposed changes to generic blk tag for use in SCSI (1/3) 2002-06-13 21:01 ` Doug Ledford @ 2002-06-13 21:26 ` James Bottomley 2002-06-13 21:50 ` Doug Ledford 0 siblings, 1 reply; 39+ messages in thread From: James Bottomley @ 2002-06-13 21:26 UTC (permalink / raw) To: James Bottomley, axboe, linux-scsi, linux-kernel On Mon, Jun 10, 2002 at 10:46:44PM -0400, James Bottomley wrote: > 2) The SCSI queue will stall if it gets an untagged request in the stream, so > once tagged queueing is enabled, all commands (including SPECIALS) must be > tagged. I altered the check in blk_queue_start_tag to permit this. dledford@redhat.com said: > Hmmm...this seems broken to me. Switching from tagged to untagged > momentarily and then back is perfectly valid. Can the bio layer > handle this and not the scsi layer, or are both layers unable to > handle this sort of tag manipulation? The layers can cope with the switch easily enough. The problem is that to send an untagged command to a SCSI device you have to wait for the outstanding tags to clear which is what causes the stall. The scsi mid-layer queue push back system pushes all commands back to the BIO layer marked as REQ_SPECIAL (because the upper layer drivers generate the commands and it has no idea what they are supposed to be doing) if the driver cannot handle them. This means for those drivers (like the new adaptec) which load up the device until it returns a queue full (thus causing push back into the bio layer) we'd get stutter in the command pipeline. The cleanest solution is to allow (but not require) tagging of every request type. On Mon, Jun 10, 2002 at 10:46:44PM -0400, James Bottomley wrote: > There are several shortcomings of the prototype, most notably it doesn't have > tag starvation detection and processing. However, I think I can re-introduce > this as part of the error handler functions. dledford@redhat.com said: > If you are using the bio layer tag processing, then it should be > doing this part I would think. If it isn't, then it sounds like > either it's design is missing some key elements required to be fully > functional or the integration between the scsi layer and the bio > layer needs some additional work. I thought about doing this. The problem is that the blk layer doesn't have very good instrumentation for detecting the condition. The SCSI layer is the one that has per command timers and all the other necessaries so it can detect when a command should have returned and take corrective action. James ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Proposed changes to generic blk tag for use in SCSI (1/3) 2002-06-13 21:26 ` James Bottomley @ 2002-06-13 21:50 ` Doug Ledford 2002-06-13 22:09 ` James Bottomley 0 siblings, 1 reply; 39+ messages in thread From: Doug Ledford @ 2002-06-13 21:50 UTC (permalink / raw) To: James Bottomley; +Cc: axboe, linux-scsi, linux-kernel On Thu, Jun 13, 2002 at 05:26:44PM -0400, James Bottomley wrote: > On Mon, Jun 10, 2002 at 10:46:44PM -0400, James Bottomley wrote: > > 2) The SCSI queue will stall if it gets an untagged request in the stream, so > > once tagged queueing is enabled, all commands (including SPECIALS) must be > > tagged. I altered the check in blk_queue_start_tag to permit this. > > dledford@redhat.com said: > > Hmmm...this seems broken to me. Switching from tagged to untagged > > momentarily and then back is perfectly valid. Can the bio layer > > handle this and not the scsi layer, or are both layers unable to > > handle this sort of tag manipulation? > > The layers can cope with the switch easily enough. The problem is that to > send an untagged command to a SCSI device you have to wait for the outstanding > tags to clear which is what causes the stall. Well, intentional behaviour is hardly what I would call a stall. I thought you were implying that it would stall the queue indefinitely. I'm fully aware that it forces the queue to wait until all outstanding commands have completed before sending the untagged command, that's part of the desired behaviour in that case. > The scsi mid-layer queue push > back system pushes all commands back to the BIO layer marked as REQ_SPECIAL > (because the upper layer drivers generate the commands and it has no idea what > they are supposed to be doing) if the driver cannot handle them. This means > for those drivers (like the new adaptec) which load up the device until it > returns a queue full (thus causing push back into the bio layer) we'd get > stutter in the command pipeline. The cleanest solution is to allow (but not > require) tagging of every request type. This I'm not following. If you get a QUEUE_FULL from the adaptec driver, then the commands you are pushing back should still be tagged and no stall should be required beyond just waiting for any outstanding command on the drive to complete or for a timeout to pass. It should not require any untagged type stall where you have to drain the entire pipeline... > I thought about doing this. The problem is that the blk layer doesn't have > very good instrumentation for detecting the condition. The SCSI layer is the > one that has per command timers and all the other necessaries so it can detect > when a command should have returned and take corrective action. I would think that, eventually, the bio layer will support I/O fencing via tagged commands (aka, ext3 needs an I/O fence and the bio layer does as needed to enforce that, which on scsi may mean an ordered queue tag is generated instead of a regular tag and on IDE it may mean something else). It will have to be able to tell that some of these conditions have been satisfied in those cases, so I see no reason why it shouldn't be made aware of them now. Just my $.02 -- Doug Ledford <dledford@redhat.com> 919-754-3700 x44233 Red Hat, Inc. 1801 Varsity Dr. Raleigh, NC 27606 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: Proposed changes to generic blk tag for use in SCSI (1/3) 2002-06-13 21:50 ` Doug Ledford @ 2002-06-13 22:09 ` James Bottomley 0 siblings, 0 replies; 39+ messages in thread From: James Bottomley @ 2002-06-13 22:09 UTC (permalink / raw) To: James Bottomley, axboe, linux-scsi, linux-kernel dledford@redhat.com said: > This I'm not following. If you get a QUEUE_FULL from the adaptec > driver, then the commands you are pushing back should still be tagged > and no stall should be required beyond just waiting for any > outstanding command on the drive to complete or for a timeout to > pass. It should not require any untagged type stall where you have > to drain the entire pipeline... Ah, but that was the problem in the blk generic tagging. only requests of type REQ_CMD get tagged, so say the SD driver gets a block read request as a REQ_CMD, translates it into a SCSI READ, sends it through the mid-layer to the low layer driver which requests a block layer tag but eventually responds QUEUE_FULL. Now the command gets pushed back to the blk queue head as a REQ_SPECIAL (and as part of the push back, we have to finish the tag since command moves from the tag queue to the blk queue), which to the scsi mid layer means request with already formulated SCSI command so don't go back through the upper layer driver again. The problem is that when this command comes back again into the scsi mid layer for execution it would now do so as an untagged command because the blk_queue_start_tag() code will only tag REQ_CMD requests, and hence we get a queue stall every time the low level driver responds QUEUE_FULL. This was the behaviour (in the blk layer) I was objecting to---on the second go around, we request a tag using blk_queue_start_tag() but get denied because the request isn't of the correct type---and why I think the blk_generic_start_tag() needs to allow REQ_SPECIAL requests to be tagged. > I would think that, eventually, the bio layer will support I/O fencing > via tagged commands (aka, ext3 needs an I/O fence and the bio layer > does as needed to enforce that, which on scsi may mean an ordered > queue tag is generated instead of a regular tag and on IDE it may > mean something else). It will have to be able to tell that some of > these conditions have been satisfied in those cases, so I see no > reason why it shouldn't be made aware of them now. Just my $.02 I've actually already put this code into the mid layer patch (the scsi_populate_tag_msg() function in scsi.h) to generate an ordered tag for the case where the request is marked REQ_BARRIER. James ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? @ 2002-09-03 14:35 James Bottomley 2002-09-03 18:23 ` Doug Ledford 0 siblings, 1 reply; 39+ messages in thread From: James Bottomley @ 2002-09-03 14:35 UTC (permalink / raw) To: Justin T. Gibbs, Doug Ledford; +Cc: linux-kernel, linux-scsi > Doug Ledford writes: > > > took the device off line. So, in short, the mid layer isn't waiting > long > enough, or when it gets sense indicated not ready it needs to > implement a > waiting queue with a timeout to try rekicking things a > few times and don't > actually mark the device off line until a longer > period of time has > elasped without the device coming back. > > There is a kernel config CONFIG_AIC7XXX_RESET_DELAY_MS (default 15s). > Would increasing it help? Justin Gibbs writes: > This currently only effects the initial bus reset delay. If the > driver holds off commands after subsequent bus resets, it can cause > undeserved timeouts on the commands it has intentionally deferred. > The mid-layer has a 5 second delay after bus resets, but I haven't > verified that this is honored correctly during error recovery. I'm planning a major re-write of this area in the error handler. The way I think it should go is: 1) Quiesce host (set in_recovery flag) 2) Suspend active timers on this host 3) Proceed down the error correction track (eliminate abort and go down device, bus and host resets and finally set the device offline). 5) On each error recovery wait out a recovery timer for the device to become active before talking to it again. Send all affected commands back to the block layer to await reissue (note: it would now be illegal for commands to lie to the mid layer and say they've done the reset when they haven't). 6) issue a TUR using a command allocated to the eh for that purpose. Process the return code (in particular, if the device says NOT READY, wait some more). Only if the TUR definitively fails proceed up the recovery chain all the way to taking the device offline. I also plan to expose the suspend and resume timers API in some form for FC drivers to use. James ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 14:35 aic7xxx sets CDR offline, how to reset? James Bottomley @ 2002-09-03 18:23 ` Doug Ledford 2002-09-03 19:09 ` James Bottomley 0 siblings, 1 reply; 39+ messages in thread From: Doug Ledford @ 2002-09-03 18:23 UTC (permalink / raw) To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi On Tue, Sep 03, 2002 at 09:35:02AM -0500, James Bottomley wrote: > > 1) Quiesce host (set in_recovery flag) Right. > 2) Suspend active timers on this host Right. > 3) Proceed down the error correction track (eliminate abort and go down > device, bus and host resets and finally set the device offline). Leave abort active. It does actually work in certain scenarios. The CD burner scenario that started this thread is an example of somewhere that an abort should actually do the job. > 5) On each error recovery wait out a recovery timer for the device to become > active before talking to it again. Send all affected commands back to the > block layer to await reissue (note: it would now be illegal for commands to > lie to the mid layer and say they've done the reset when they haven't). > 6) issue a TUR using a command allocated to the eh for that purpose. Process > the return code (in particular, if the device says NOT READY, wait some more). > Only if the TUR definitively fails proceed up the recovery chain all the way > to taking the device offline. Right. > I also plan to expose the suspend and resume timers API in some form for FC > drivers to use. -- Doug Ledford <dledford@redhat.com> 919-754-3700 x44233 Red Hat, Inc. 1801 Varsity Dr. Raleigh, NC 27606 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 18:23 ` Doug Ledford @ 2002-09-03 19:09 ` James Bottomley 2002-09-03 20:59 ` Alan Cox ` (2 more replies) 0 siblings, 3 replies; 39+ messages in thread From: James Bottomley @ 2002-09-03 19:09 UTC (permalink / raw) To: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi dledford@redhat.com said: > Leave abort active. It does actually work in certain scenarios. The > CD burner scenario that started this thread is an example of > somewhere that an abort should actually do the job. Unfortunately, it would destroy the REQ_BARRIER approach in the block layer. At best, abort probably causes a command to overtake a barrier it shouldn't, at worst we abort the ordered tag that is the barrier and transactional integrity is lost. When error correction is needed, we have to return all the commands for that device to the block layer so that ordering and barrier issues can be taken care of in the reissue. This makes LUN RESET (for those that support it) the minimum level of error correction we can apply. James ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 19:09 ` James Bottomley @ 2002-09-03 20:59 ` Alan Cox 2002-09-03 21:32 ` James Bottomley 2002-09-03 21:13 ` Doug Ledford 2002-09-03 21:24 ` Patrick Mansfield 2 siblings, 1 reply; 39+ messages in thread From: Alan Cox @ 2002-09-03 20:59 UTC (permalink / raw) To: James Bottomley Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi On Tue, 2002-09-03 at 20:09, James Bottomley wrote: > dledford@redhat.com said: > > Leave abort active. It does actually work in certain scenarios. The > > CD burner scenario that started this thread is an example of > > somewhere that an abort should actually do the job. > > Unfortunately, it would destroy the REQ_BARRIER approach in the block layer. > At best, abort probably causes a command to overtake a barrier it shouldn't, > at worst we abort the ordered tag that is the barrier and transactional > integrity is lost. What do we plan to do for the cases where reset is disabled because we have shared disk scsi so don't want to reset and hose the reservations ? If your error correction always requires all commands return to the block layer then the block layer is IMHO broken. Its messy enough doing that before you hit the fun situations where insert scsi commands of their own the block layer never initiated. Next you only need to return stuff if commands have been issued between the aborting command and a barrier. Since most sane systems will never be causing REQ_BARRIER that should mean the general case for an abort is going to be fine. The CD burner example is also true for this. If we track barrier sequences then we will know the barrier count for the command we are aborting and the top barrier count for commands issued to the device. Finally you only need to go to the large hammer approach when you are dealing with a media changing command (ie WRITE*) - if we abort a read then knowing we don't queue overlapping read then write to disk we already know that the read will not break down the tag ordering as I understand it ? If we get to the point we need an abort we don't want to issue a reset. Not every device comes back sane from a reset and in some cases we have to issue a whole sequence of commands to get the state of the device back (door locking, power management, ..) ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 20:59 ` Alan Cox @ 2002-09-03 21:32 ` James Bottomley 2002-09-03 21:54 ` Alan Cox 2002-09-03 22:50 ` Doug Ledford 0 siblings, 2 replies; 39+ messages in thread From: James Bottomley @ 2002-09-03 21:32 UTC (permalink / raw) To: Alan Cox; +Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi alan@lxorguk.ukuu.org.uk said: > What do we plan to do for the cases where reset is disabled because we > have shared disk scsi so don't want to reset and hose the reservations The reset gets issued and the reservation gets broken. Good HA or other software knows the reservation may be lost and takes this into account in the cluster algorithm. With SCSI-2 reservations, there's no way to preserve the reservation and have the reset be effective (I know, in theory, that this can be circumvented by the soft reset alternative, but I've never seen a device that implements it correctly). I suppose we hope SCSI-3 Persistent Group Reservations come along quickly. > If your error correction always requires all commands return to the > block layer then the block layer is IMHO broken. Its messy enough > doing that before you hit the fun situations where insert scsi > commands of their own the block layer never initiated. This is part of the slim SCSI down approach. The block layer already has handling for tag errors like this. Inserted SCSI commands should now work correctly since we're deprecating the scsi_do_cmnd() in favour of scsi_do_req, which means the command is always associated with a request and goes into the block queue just like any other request. I think the block layer, which already knows about the barrier ordering, is the appropriate place for this. If you think the scsi error handler is a hairy wart now, just watch it grow into a stonking great carbuncle as I try to introduce it to the concept of command queue ordering and appropriate recovery. > Next you only need to return stuff if commands have been issued > between the aborting command and a barrier. Since most sane systems > will never be causing REQ_BARRIER that should mean the general case > for an abort is going to be fine. The CD burner example is also true > for this. If we track barrier sequences then we will know the barrier > count for the command we are aborting and the top barrier count for > commands issued to the device. Finally you only need to go to the > large hammer approach when you are dealing with a media changing > command (ie WRITE*) - if we abort a read then knowing we don't queue > overlapping read then write to disk we already know that the read will > not break down the tag ordering as I understand it ? I agree with your reasoning. However, errors occur infrequently enough (I hope) so that its just not worth the extra code complexity to make the error handler look for that case. However, in all honesty, I have to say that I just don't believe ABORTs are ever particularly effective. As part of error recovery, If a device is tipping over into failure, adding another message isn't a good way to pull it back. ABORT is really part of the I/O cancellation API, and, like all cancellation implementations, it's potentially full of holes. The only uses it might have---like oops I didn't mean to fixate that CD, give it back to me now---aren't clearly defined in the SPEC to produce the desired effect (stop the fixation so the drive door can be opened). > If we get to the point we need an abort we don't want to issue a > reset. Not every device comes back sane from a reset and in some cases > we have to issue a whole sequence of commands to get the state of the > device back (door locking, power management, ..) Well, this is SCSI---the first thing most controllers do for parallel SCSI at least is reset the BUS. Some FC drivers do the FC equivalent as well (not that they should, but that's another issue). The pain of coming back from a reset (and I grant, it isn't trivial) is well known and well implemented in SCSI. It also, from error handlings point of view, sets the device back to a known point in the state model. James ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 21:32 ` James Bottomley @ 2002-09-03 21:54 ` Alan Cox 2002-09-03 22:50 ` Doug Ledford 1 sibling, 0 replies; 39+ messages in thread From: Alan Cox @ 2002-09-03 21:54 UTC (permalink / raw) To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi On Tue, 2002-09-03 at 22:32, James Bottomley wrote: > I think the block layer, which already knows about the barrier ordering, is > the appropriate place for this. If you think the scsi error handler is a > hairy wart now, just watch it grow into a stonking great carbuncle as I try to > introduce it to the concept of command queue ordering and appropriate recovery. Point taken > I agree with your reasoning. However, errors occur infrequently enough (I > hope) so that its just not worth the extra code complexity to make the error > handler look for that case. When you ar ehandling CD-ROM problems then they can be quite a lot. Not helped by the fact some ancient CD's seem to like to take a long walk in the park when told to reset. > cancellation implementations, it's potentially full of holes. The only uses > it might have---like oops I didn't mean to fixate that CD, give it back to me > now---aren't clearly defined in the SPEC to produce the desired effect (stop > the fixation so the drive door can be opened). Yes but does windows assume it. There are two specs 8) > The pain of coming back from a reset (and I grant, it isn't trivial) is well > known and well implemented in SCSI. It also, from error handlings point of > view, sets the device back to a known point in the state model. > Resetting the bus is antisocial to say the least. We had lots of hangs with the old eh handler that went away when the new one didnt keep resetting the bus. If we do reset the bus then we have to handle "Sorry can't reset the bus" (we have cards that can't or won't) as well as being conservative about timings, making sure we don't reset the bus during the seconds while a device rattles back into online state and so on. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 21:32 ` James Bottomley 2002-09-03 21:54 ` Alan Cox @ 2002-09-03 22:50 ` Doug Ledford 2002-09-03 23:28 ` Alan Cox ` (2 more replies) 1 sibling, 3 replies; 39+ messages in thread From: Doug Ledford @ 2002-09-03 22:50 UTC (permalink / raw) To: James Bottomley; +Cc: Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi > alan@lxorguk.ukuu.org.uk said: > > Next you only need to return stuff if commands have been issued > > between the aborting command and a barrier. Since most sane systems > > will never be causing REQ_BARRIER Hmmm...I thought a big reason for adding REQ_BARRIER was to be able to support more robust journaling with order requirement verification. If that's true, then REQ_BARRIER commands could become quite common on disks using ext3. On Tue, Sep 03, 2002 at 04:32:38PM -0500, James Bottomley wrote: > However, in all honesty, I have to say that I just don't believe ABORTs are > ever particularly effective. As part of error recovery, If a device is > tipping over into failure, adding another message isn't a good way to pull it ^^^^^^^^^^^^^^^^^^^^^^ Then you might as well skip device resets since they are implemented using messages and go straight to bus resets. Shot deflected, no score. > back. ABORT is really part of the I/O cancellation API, and, like all > cancellation implementations, it's potentially full of holes. The only uses > it might have---like oops I didn't mean to fixate that CD, give it back to me > now---aren't clearly defined in the SPEC to produce the desired effect (stop > the fixation so the drive door can be opened). In my experience, aborts have always actually worked fairly well in any scenario where a bus device reset will work. Generally speaking, the problems I've always ran into with SCSI busses have been either A) this command is screwing up but it isn't confusing the drive so we can abort it or BDR it because the drive still responds to us or B) the bus is hung hard and no transfers or messages of any kind can make it through. In the B case, a full bus reset is the only thing that works. In the A case, aborts work just as often as anything else. > The pain of coming back from a reset (and I grant, it isn't trivial) is well > known and well implemented in SCSI. It also, from error handlings point of > view, sets the device back to a known point in the state model. So does a successful abort. -- Doug Ledford <dledford@redhat.com> 919-754-3700 x44233 Red Hat, Inc. 1801 Varsity Dr. Raleigh, NC 27606 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 22:50 ` Doug Ledford @ 2002-09-03 23:28 ` Alan Cox 2002-09-04 7:40 ` Jeremy Higdon 2002-09-04 16:13 ` James Bottomley 2 siblings, 0 replies; 39+ messages in thread From: Alan Cox @ 2002-09-03 23:28 UTC (permalink / raw) To: Doug Ledford; +Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi On Tue, 2002-09-03 at 23:50, Doug Ledford wrote: > Hmmm...I thought a big reason for adding REQ_BARRIER was to be able to > support more robust journaling with order requirement verification. If > that's true, then REQ_BARRIER commands could become quite common on disks > using ext3. I doubt it very much because the only way to implement barriers on IDE is fantastically expensive. I'm dubious it makes sense for ext3 to use barriers in that way. All it needs as I understand the rules is that an I/O isnt reported as completed by the device driver before it is on the medium or in a non volatile cache. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 22:50 ` Doug Ledford 2002-09-03 23:28 ` Alan Cox @ 2002-09-04 7:40 ` Jeremy Higdon 2002-09-04 16:24 ` James Bottomley 2002-09-04 16:13 ` James Bottomley 2 siblings, 1 reply; 39+ messages in thread From: Jeremy Higdon @ 2002-09-04 7:40 UTC (permalink / raw) To: Doug Ledford, James Bottomley Cc: Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi On Sep 3, 6:50pm, Doug Ledford wrote: > > > alan@lxorguk.ukuu.org.uk said: > > > Next you only need to return stuff if commands have been issued > > > between the aborting command and a barrier. Since most sane systems > > > will never be causing REQ_BARRIER > > Hmmm...I thought a big reason for adding REQ_BARRIER was to be able to > support more robust journaling with order requirement verification. If > that's true, then REQ_BARRIER commands could become quite common on disks > using ext3. Hmm. There do seem to be a lot of loopholes/race conditions where the barrier just won't work right in the face of error recovery. I wouldn't want to use barriers on any system where data integrity was crucial. For example, in Fibrechannel using class 3 (the usual) send command (command frame corrupted; device does not receive) send barrier (completes normally) ... (lots of time goes by, many more commands are processed) timeout original command whose command frame was corrupted The only safe way to run such a filesystem is to hold the barriers in the driver until all previous commands are successfully completed. There was also the problem of the queue full to the barrier command, etc. Did I miss the answer to these? I don't recall seeing an answer to Patrick Mansfield's questions either (original message edited cutting out a couple of paragraphs): > On Tue, Sep 03, 2002 at 02:09:44PM -0500, James Bottomley wrote: > > dledford@redhat.com said: > > > Leave abort active. It does actually work in certain scenarios. The > > > CD burner scenario that started this thread is an example of > > > somewhere that an abort should actually do the job. > > > > Unfortunately, it would destroy the REQ_BARRIER approach in the block layer. > > At best, abort probably causes a command to overtake a barrier it shouldn't, > > at worst we abort the ordered tag that is the barrier and transactional > > integrity is lost. > > > > When error correction is needed, we have to return all the commands for that > > device to the block layer so that ordering and barrier issues can be taken > > care of in the reissue. This makes LUN RESET (for those that support it) the > > minimum level of error correction we can apply. > > > > James > > If we only send an abort or reset after a quiesce I don't see why one > is better than the other. > > Not specific to reset or abort - if a single command gets an error, we > wait for oustanding commands to complete before starting up the error > handler thread. If all the commands (error one and outstanding) have > barriers, those that do not error out will complete out of order from > the errored command. > > How is this properly handled? > > And what happens if one command gets some sort of check condition (like > medium error, or aborted command) that causes a retry? Will IO's still > be correctly ordered? jeremy ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-04 7:40 ` Jeremy Higdon @ 2002-09-04 16:24 ` James Bottomley 2002-09-04 17:13 ` Mike Anderson 2002-09-05 9:50 ` Jeremy Higdon 0 siblings, 2 replies; 39+ messages in thread From: James Bottomley @ 2002-09-04 16:24 UTC (permalink / raw) To: Jeremy Higdon Cc: Doug Ledford, Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi jeremy@classic.engr.sgi.com said: > For example, in Fibrechannel using class 3 (the usual) > send command (command frame corrupted; device does not receive) > send barrier (completes normally) > ... (lots of time goes by, many more commands are processed) > timeout original command whose command frame was corrupted This doesn't look right to me from the SCSI angle I don't see how you can get a successful disconnect on a command the device doesn't receive (I take it this is some type of Fibre magic?). Of course, if the device (or its proxy) does receive the command then the ordered queue tag implementation requires that the corrupted frame command be processed prior to the barrier, this isn't optional if you obey the spec. Thus, assuming the processor does no integrity checking of the command until it does processing (this should be a big if), then we still must get notification of the failed command before the barrier tag is begun. Obviously, from that notification we do then race to eliminate the overtaking tags. > There was also the problem of the queue full to the barrier command, > etc. The queue full problem still exists. I've used this argument against the filesystem people many times at the various fora where it has been discussed. The situation is that everyone agrees that it's a theoretical problem, but no-one is convinced that it will actually occur in practice (I think it falls into the "risk we're willing to take" category). James ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-04 16:24 ` James Bottomley @ 2002-09-04 17:13 ` Mike Anderson 2002-09-05 9:50 ` Jeremy Higdon 1 sibling, 0 replies; 39+ messages in thread From: Mike Anderson @ 2002-09-04 17:13 UTC (permalink / raw) To: James Bottomley Cc: Jeremy Higdon, Doug Ledford, Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi James Bottomley [James.Bottomley@steeleye.com] wrote: > jeremy@classic.engr.sgi.com said: > > For example, in Fibrechannel using class 3 (the usual) > > > send command (command frame corrupted; device does not receive) > > send barrier (completes normally) > > ... (lots of time goes by, many more commands are processed) > > timeout original command whose command frame was corrupted > > This doesn't look right to me from the SCSI angle I don't see how you can get > a successful disconnect on a command the device doesn't receive (I take it > this is some type of Fibre magic?). Of course, if the device (or its proxy) > does receive the command then the ordered queue tag implementation requires > that the corrupted frame command be processed prior to the barrier, this > isn't optional if you obey the spec. Thus, assuming the processor does no > integrity checking of the command until it does processing (this should be a > big if), then we still must get notification of the failed command before the > barrier tag is begun. Obviously, from that notification we do then race to > eliminate the overtaking tags. In FC class 3 if you are logged into a port then notice of this loss doesn't happen until a upper level timeout occurs (ULTP?). The loss can happen prior to the command reaching the device (i.e. the switch can drop the frame). If a corrupted frame makes it to the device it will be discarded as there is not much it can do with a frame containing unreliable data. In FC class 2 frames are ack'd so the recovery can be much more responsive. -Mike -- Michael Anderson andmike@us.ibm.com ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-04 16:24 ` James Bottomley 2002-09-04 17:13 ` Mike Anderson @ 2002-09-05 9:50 ` Jeremy Higdon 1 sibling, 0 replies; 39+ messages in thread From: Jeremy Higdon @ 2002-09-05 9:50 UTC (permalink / raw) To: James Bottomley Cc: Doug Ledford, Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi On Sep 4, 11:24am, James Bottomley wrote: > > jeremy@classic.engr.sgi.com said: > > For example, in Fibrechannel using class 3 (the usual) > > > send command (command frame corrupted; device does not receive) > > send barrier (completes normally) > > ... (lots of time goes by, many more commands are processed) > > timeout original command whose command frame was corrupted > > This doesn't look right to me from the SCSI angle I don't see how you can get > a successful disconnect on a command the device doesn't receive (I take it > this is some type of Fibre magic?). Of course, if the device (or its proxy) > does receive the command then the ordered queue tag implementation requires > that the corrupted frame command be processed prior to the barrier, this > isn't optional if you obey the spec. Thus, assuming the processor does no > integrity checking of the command until it does processing (this should be a > big if), then we still must get notification of the failed command before the > barrier tag is begun. Obviously, from that notification we do then race to > eliminate the overtaking tags. You don't have disconnect per se with Fibre (or with other packet based interfaces). A command is sent as a frame. If the frame is corrupted or lost on the way to the target, then the target will never know about the command having been sent. There are obvious problems if that command has an ordered tag, preceded and followed by simple tags dependent on the ordered semantics being followed, and where the simple tagged commands following don't wait for the ordered tag to finish. There are other failure modes too, of course. > > There was also the problem of the queue full to the barrier command, > > etc. > > The queue full problem still exists. I've used this argument against the > filesystem people many times at the various fora where it has been discussed. > The situation is that everyone agrees that it's a theoretical problem, but > no-one is convinced that it will actually occur in practice (I think it falls > into the "risk we're willing to take" category). I guess it all depends on how sensitive one is to getting it wrong. In any case, it looks as though you are doing what you can . . . . > James thanks jeremy ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 22:50 ` Doug Ledford 2002-09-03 23:28 ` Alan Cox 2002-09-04 7:40 ` Jeremy Higdon @ 2002-09-04 16:13 ` James Bottomley 2002-09-04 16:50 ` Justin T. Gibbs 2 siblings, 1 reply; 39+ messages in thread From: James Bottomley @ 2002-09-04 16:13 UTC (permalink / raw) To: James Bottomley, Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi dledford@redhat.com said: > Now, granted, that is more complex than going straight to a BDR, but I > have to argue that it *isn't* that complex. It certainly isn't the > nightmare you make it sound like ;-) It's three times longer even in pseudocode... However, assume we do this (because we must for barrier preservation). The chances are that for a failing device we're aborting a significant number of the tags. This is quite a big increase in the message load over what we do now---Particularly for the AIC driver which can have hundreds of tags outstanding (murphys law says it's usually the earilest tag which times out). I'm not convinced that a BDR, which is a single message and has roughly the same effect, isn't preferable. However, what about a compromise? We can count outstanding commands, so what about doing abort *if* the number of outstanding commands is exactly one (the one we're trying to abort). This means for devices that don't do TCQ (like old CD-ROMs) we'll always try abort first. For large numbers of outstanding tags, we skip over abort and move straight to BDR. The code to implement this will be clean and simple because abort no longer has to pay attention to the barrier. James ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-04 16:13 ` James Bottomley @ 2002-09-04 16:50 ` Justin T. Gibbs 2002-09-05 9:39 ` Jeremy Higdon 0 siblings, 1 reply; 39+ messages in thread From: Justin T. Gibbs @ 2002-09-04 16:50 UTC (permalink / raw) To: James Bottomley, Alan Cox, linux-kernel, linux-scsi > dledford@redhat.com said: >> Now, granted, that is more complex than going straight to a BDR, but I >> have to argue that it *isn't* that complex. It certainly isn't the >> nightmare you make it sound like ;-) > > It's three times longer even in pseudocode... To make this work, you really need to use the QErr bit in the disconnect/reconnect page and/or ECA or ACA. QErr I believe is well supported in devices, but ECA (pre SCSI-3) and ACA most likely receive very little testing. I will also voice my opinion (again) that watchdog timer recovery is in the wrong place in Linux. It belongs in the controller drivers: 1) Only the controller driver knows when to start the timeout 2) Only the controller driver knows the current status of the bus/transport 3) Only the controller can close timeout/just completed races 4) Only the controller driver knows the true transport type (SPI/FC/ATA/USB/1394/IP) and what recovery actions are appropriate for that transport type given the capabilities of the controller. 5) The algorithm for recovery and maintaining order becomes quite simple: 1) Freeze the input queue for the controller 2) Return all transactions unqueued to a device to the mid-layer 3) Perform the recovery actions required 4) Unfreeze the controller's queue 5) Device type driver (sd, cd, tape, etc) decides what errors at what rates should cause the failure of a device. The controller driver just needs to have the error codes so it can honestly and fully report to the type driver what really happens so it can make good decissions This of course assumes that all transactions have a serial number and that requeuing transactions orders them by serial number. With QErr set, the device closes the rest if the barrier race for you, but even without barriers, full transaction ordering is required if you don't want a read to inadvertantly pass a write to the same location during recovery. For prior art, take a look at FreeBSD. In the worst case, where escalation to a bus reset is required, recovery takes 5 seconds. -- Justin ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-04 16:50 ` Justin T. Gibbs @ 2002-09-05 9:39 ` Jeremy Higdon 2002-09-05 13:35 ` Justin T. Gibbs 0 siblings, 1 reply; 39+ messages in thread From: Jeremy Higdon @ 2002-09-05 9:39 UTC (permalink / raw) To: Justin T. Gibbs, James Bottomley, Alan Cox, linux-kernel, linux-scsi On Sep 4, 10:50am, Justin T. Gibbs wrote: > > This of course assumes that all transactions have a serial number and > that requeuing transactions orders them by serial number. With QErr > set, the device closes the rest if the barrier race for you, but even > without barriers, full transaction ordering is required if you don't > want a read to inadvertantly pass a write to the same location during > recovery. The original FCP (SCSI commands over Fibre) profile specified that QERR=1 was not available. Unless that is changed, it would appear that you cannot count on being able to set Qerr. Qerr was one of those annoying little things in SCSI that forces host adapter drivers to know a mode page setting. jeremy ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-05 9:39 ` Jeremy Higdon @ 2002-09-05 13:35 ` Justin T. Gibbs 0 siblings, 0 replies; 39+ messages in thread From: Justin T. Gibbs @ 2002-09-05 13:35 UTC (permalink / raw) To: Jeremy Higdon, James Bottomley, Alan Cox, linux-kernel, linux-scsi > On Sep 4, 10:50am, Justin T. Gibbs wrote: >> >> This of course assumes that all transactions have a serial number and >> that requeuing transactions orders them by serial number. With QErr >> set, the device closes the rest if the barrier race for you, but even >> without barriers, full transaction ordering is required if you don't >> want a read to inadvertantly pass a write to the same location during >> recovery. > > > The original FCP (SCSI commands over Fibre) profile specified that QERR=1 > was not available. Unless that is changed, it would appear that you > cannot count on being able to set Qerr. > > Qerr was one of those annoying little things in SCSI that forces host > adapter drivers to know a mode page setting. It is not the controllers that know, but the type drivers. The controller drivers should be conduits for commands, nothing more. With the proper events and error codes, the type drivers can maintain mode parameters and only the type drivers know what parameters are appropriate for their type of service. For FC, you need to use ECA/ACA anyway as that is the only way to deal with inflight commands at the time of an error. -- Justin ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 19:09 ` James Bottomley 2002-09-03 20:59 ` Alan Cox @ 2002-09-03 21:13 ` Doug Ledford 2002-09-03 21:48 ` James Bottomley 2002-09-03 21:24 ` Patrick Mansfield 2 siblings, 1 reply; 39+ messages in thread From: Doug Ledford @ 2002-09-03 21:13 UTC (permalink / raw) To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi On Tue, Sep 03, 2002 at 02:09:44PM -0500, James Bottomley wrote: > dledford@redhat.com said: > > Leave abort active. It does actually work in certain scenarios. The > > CD burner scenario that started this thread is an example of > > somewhere that an abort should actually do the job. > > Unfortunately, it would destroy the REQ_BARRIER approach in the block layer. REQ_BARRIER is used for filesystem devices (disks mainly) while the devices that might benefit most from working aborts would likely be other devices. But, regardless, the REQ_BARRIER ordering *can* be preserved while using abort processing. Since the command that needs aborting is, as you are hypothesizing, before the REQ_BARRIER command, and since it hasn't completed, then the REQ_BARRIER command can not be complete and neither can any of the commands behind the REQ_BARRIER. On direct access devices you are only concerned about ordering around the barrier, not ordering of the actual tagged commands, so for abort you can actually call abort on all the commands past the REQ_BARRIER command first, then the REQ_BARRIER command, then the hung command. That would do the job and preserve REQ_BARRIER ordering while still using aborts. > At best, abort probably causes a command to overtake a barrier it shouldn't, > at worst we abort the ordered tag that is the barrier and transactional > integrity is lost. > > When error correction is needed, we have to return all the commands for that > device to the block layer so that ordering and barrier issues can be taken > care of in the reissue. Not really, this would be easily enough done in the ML_QUEUE area of the scsi layer, but it matters not to me. However, if you throw a BDR, then you have cancelled all outstanding commands and (typically) initiated a hard reset of the device which then requires a device settle time. All of this is more drastic and typically takes longer than the individual aborts which are completed in a single connect->disconnect cycle without ever hitting a data phase and without triggering a full device reset and requiring a settle time. > This makes LUN RESET (for those that support it) the > minimum level of error correction we can apply. Not true. -- Doug Ledford <dledford@redhat.com> 919-754-3700 x44233 Red Hat, Inc. 1801 Varsity Dr. Raleigh, NC 27606 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 21:13 ` Doug Ledford @ 2002-09-03 21:48 ` James Bottomley 2002-09-03 22:42 ` Doug Ledford 2002-09-04 10:37 ` Andries Brouwer 0 siblings, 2 replies; 39+ messages in thread From: James Bottomley @ 2002-09-03 21:48 UTC (permalink / raw) To: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi dledford@redhat.com said: > But, regardless, the REQ_BARRIER ordering *can* be preserved while > using abort processing. Since the command that needs aborting is, as > you are hypothesizing, before the REQ_BARRIER command, and since it > hasn't completed, then the REQ_BARRIER command can not be complete and > neither can any of the commands behind the REQ_BARRIER. You are correct. However, as soon as you abort the problem command (assuming the device recovers from this), it will go on its merry way processing the remaining commands in the queue. Assuming one of these is the barrier, you've no way now of re-queueing the aborted command so that it comes before the ordered tag barrier. You can try using a head of queue tag, but it's still a nasty race. > On direct access devices you are only concerned about ordering around > the barrier, not ordering of the actual tagged commands, so for abort > you can actually call abort on all the commands past the REQ_BARRIER > command first, then the REQ_BARRIER command, then the hung command. > That would do the job and preserve REQ_BARRIER ordering while still > using aborts. I agree, but the most likely scenario is that now you're trying to abort almost every tag for that device in the system. Isn't reset a simpler alternative to this? > > At best, abort probably causes a command to overtake a barrier it shouldn't, > > at worst we abort the ordered tag that is the barrier and transactional > > integrity is lost. > > > > When error correction is needed, we have to return all the commands for that > > device to the block layer so that ordering and barrier issues can be taken > > care of in the reissue. > Not really, this would be easily enough done in the ML_QUEUE area of > the scsi layer, but it matters not to me. However, if you throw a > BDR, then you have cancelled all outstanding commands and (typically) > initiated a hard reset of the device which then requires a device > settle time. All of this is more drastic and typically takes longer > than the individual aborts which are completed in a single connect-> > disconnect cycle without ever hitting a data phase and without > triggering a full device reset and requiring a settle time. I agree. I certainly could do it. I'm just a lazy so-and-so. However, think what it does. Apart from me having to do more work, the code becomes longer and the error recovery path more convoluted and difficult to follow. The benefit? well, error recovery might be faster in certain circumstances. I just don't see that it's a cost effective change. If you're hitting error recovery so often that whether it recovers in half a second or several seconds makes a difference, I'd say there's something else wrong. James ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 21:48 ` James Bottomley @ 2002-09-03 22:42 ` Doug Ledford 2002-09-03 22:52 ` Doug Ledford ` (2 more replies) 2002-09-04 10:37 ` Andries Brouwer 1 sibling, 3 replies; 39+ messages in thread From: Doug Ledford @ 2002-09-03 22:42 UTC (permalink / raw) To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi On Tue, Sep 03, 2002 at 04:48:39PM -0500, James Bottomley wrote: > dledford@redhat.com said: > You are correct. However, as soon as you abort the problem command (assuming > the device recovers from this), it will go on its merry way processing the > remaining commands in the queue. Assuming one of these is the barrier, you've > no way now of re-queueing the aborted command so that it comes before the > ordered tag barrier. You can try using a head of queue tag, but it's still a > nasty race. (Solution to this race was in my next paragraph as you found ;-) > > On direct access devices you are only concerned about ordering around > > the barrier, not ordering of the actual tagged commands, so for abort > > you can actually call abort on all the commands past the REQ_BARRIER > > command first, then the REQ_BARRIER command, then the hung command. > > That would do the job and preserve REQ_BARRIER ordering while still > > using aborts. > > I agree, but the most likely scenario is that now you're trying to abort > almost every tag for that device in the system. Isn't reset a simpler > alternative to this? Not really. It hasn't been done yet, but one of my goals is to change the scsi commands over to reasonable list usage (finally) so that we can avoid all these horrible linear scans it does now looking for an available command (it also means things like SCSI_OWNER_MID_LAYER can go away because ownership is defined implicitly by list membership). So, basically, you have a list item struct on each command. When you build the commands, you add them to SDpnt->free_list. When you need a command, instead of searching for a free one, you just grab the head of SDpnt->free_list and use it. Once you've built the command and are ready to hand it off to the lldd, you put the command on the tail of the SDpnt->active_list. When a command completes, you list_remove() it from the SDpnt->active_list and put it on the SDpnt->complete_list to be handled by the tasklet. When the tasklet actually completes the command, it frees the scsi command struct by simply putting it back on the SDpnt->free_list. Now, if you do things that way, your reset vs. abort code is actually pretty trivial. Case 1: you want to throw a BDR. Sample code might end up looking like this, [ oops we timed out ] hostt->bus_device_reset(cmd); if(!list_empty(cmd->device->active_list)) { [ our commands haven't all been returned, spew chunks! ] } [ do post reset processing ] Case 2: you want to do an abort, but you need to preserve ordering around any possible REQ_BARRIERs on the bus. This requires that we keep a REQ_BARRIER count for the device, it is after all possible that we could have multiple barriers active at once, so as each command is put on the active_list, if it is a barrier, then we increment SDpnt->barrier_count and as we complete commands (at the interrupt context completion, not the final completion) if it is a barrier command we decrement the count. [ oops we timed out ] while(SDpnt->barrier_count && cmd) { // when the aborted command is returned via the done() // it will remove it from the active_list, so don't remove // it here abort_cmd = list_get_tail(SDpnt->active_list); if(hostt->abort(abort_cmd) != SUCCESS) { [ oops, go on to more drastic action ] } else { if(abort_cmd->type == BARRIER) SDpnt->barrier_count--; if(abort_cmd == cmd) cmd = NULL; } } if(cmd) { if(hostt->abort(cmd) != SUCCESS) [ oops, go on to more drastic action ] } Now, granted, that is more complex than going straight to a BDR, but I have to argue that it *isn't* that complex. It certainly isn't the nightmare you make it sound like ;-) > > > At best, abort probably causes a command to overtake a barrier it shouldn't, > > > at worst we abort the ordered tag that is the barrier and transactional > > > integrity is lost. > > > > > > When error correction is needed, we have to return all the commands for that > > > device to the block layer so that ordering and barrier issues can be taken > > > care of in the reissue. > > > Not really, this would be easily enough done in the ML_QUEUE area of > > the scsi layer, but it matters not to me. However, if you throw a > > BDR, then you have cancelled all outstanding commands and (typically) > > initiated a hard reset of the device which then requires a device > > settle time. All of this is more drastic and typically takes longer > > than the individual aborts which are completed in a single connect-> > > disconnect cycle without ever hitting a data phase and without > > triggering a full device reset and requiring a settle time. > > I agree. I certainly could do it. I'm just a lazy so-and-so. However, think > what it does. Apart from me having to do more work, the code becomes longer > and the error recovery path more convoluted and difficult to follow. The > benefit? well, error recovery might be faster in certain circumstances. Well, as I've laid it out above, I don't really think it's all that much to implement ;-) At least not in the mid layer. The low level device drivers are doing *far* more work to support aborts than the mid layer has to do. > I > just don't see that it's a cost effective change. Matter of some question, I'm sure. I don't see it as all that much work, so it seems reasonably cost effective to me ;-) > If you're hitting error > recovery so often that whether it recovers in half a second or several > seconds makes a difference, I'd say there's something else wrong. Hehehe, if you are hitting error recovery at all then something else is wrong by definition, the only difference is in how you handle it :-P -- Doug Ledford <dledford@redhat.com> 919-754-3700 x44233 Red Hat, Inc. 1801 Varsity Dr. Raleigh, NC 27606 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 22:42 ` Doug Ledford @ 2002-09-03 22:52 ` Doug Ledford 2002-09-03 23:29 ` Alan Cox 2002-09-04 21:16 ` Luben Tuikov 2 siblings, 0 replies; 39+ messages in thread From: Doug Ledford @ 2002-09-03 22:52 UTC (permalink / raw) To: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi On Tue, Sep 03, 2002 at 06:42:16PM -0400, Doug Ledford wrote: > > Case 2: you want to do an abort, but you need to preserve ordering around > any possible REQ_BARRIERs on the bus. This requires that we keep a > REQ_BARRIER count for the device, it is after all possible that we could > have multiple barriers active at once, so as each command is put on the > active_list, if it is a barrier, then we increment SDpnt->barrier_count > and as we complete commands (at the interrupt context completion, not the > final completion) if it is a barrier command we decrement the count. > > [ oops we timed out ] > while(SDpnt->barrier_count && cmd) { > // when the aborted command is returned via the done() > // it will remove it from the active_list, so don't remove > // it here > abort_cmd = list_get_tail(SDpnt->active_list); > if(hostt->abort(abort_cmd) != SUCCESS) { > [ oops, go on to more drastic action ] > } else { > if(abort_cmd->type == BARRIER) > SDpnt->barrier_count--; Oops, delete those last two lines....the done() function decrements the barrier_count for us. > if(abort_cmd == cmd) > cmd = NULL; > } > } > if(cmd) { > if(hostt->abort(cmd) != SUCCESS) > [ oops, go on to more drastic action ] > } -- Doug Ledford <dledford@redhat.com> 919-754-3700 x44233 Red Hat, Inc. 1801 Varsity Dr. Raleigh, NC 27606 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 22:42 ` Doug Ledford 2002-09-03 22:52 ` Doug Ledford @ 2002-09-03 23:29 ` Alan Cox 2002-09-04 21:16 ` Luben Tuikov 2 siblings, 0 replies; 39+ messages in thread From: Alan Cox @ 2002-09-03 23:29 UTC (permalink / raw) To: Doug Ledford; +Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi On Tue, 2002-09-03 at 23:42, Doug Ledford wrote: > Not really. It hasn't been done yet, but one of my goals is to change the > scsi commands over to reasonable list usage (finally) so that we can avoid > all these horrible linear scans it does now looking for an available > command (it also means things like SCSI_OWNER_MID_LAYER can go away At least partly done in 2.4-ac and waiting pushing to Marcelo. I believe Hugh contributed the O(1) command finder ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 22:42 ` Doug Ledford 2002-09-03 22:52 ` Doug Ledford 2002-09-03 23:29 ` Alan Cox @ 2002-09-04 21:16 ` Luben Tuikov 2 siblings, 0 replies; 39+ messages in thread From: Luben Tuikov @ 2002-09-04 21:16 UTC (permalink / raw) To: Doug Ledford; +Cc: linux-kernel, linux-scsi Doug Ledford wrote: > > Not really. It hasn't been done yet, but one of my goals is to change the > scsi commands over to reasonable list usage (finally) so that we can avoid > all these horrible linear scans it does now looking for an available > command Using the struct list_head for this will literally allow you to do _magic_. Avoiding the linear scan is the last thing this will fix. It would allow for a lot better/simpler/sound design of all of the mid layer/SCSI core. Things will be/become easier as you point out below. Currently the mid-layer queuing is hairy at best. I'm all for it. > So, > basically, you have a list item struct on each command. When you build > the commands, you add them to SDpnt->free_list. When you need a command, > instead of searching for a free one, you just grab the head of > SDpnt->free_list and use it. Once you've built the command and are ready > to hand it off to the lldd, you put the command on the tail of the > SDpnt->active_list. When a command completes, you list_remove() it from > the SDpnt->active_list and put it on the SDpnt->complete_list to be > handled by the tasklet. When the tasklet actually completes the command, > it frees the scsi command struct by simply putting it back on the > SDpnt->free_list. Great! Once you're on that train, you may want to rethink the whole queuing mechanism of the mid-layer (straight from sd/etc and internally down to LLDD) for an improved design. There'd be problems like cmd moving b/n lists is atomic, only cmd movers can actually cancel a command, move before calling queuecommand(), etc, but is nothing extraordinary. > Now, granted, that is more complex than going straight to a BDR, but I > have to argue that it *isn't* that complex. It certainly isn't the > nightmare you make it sound like ;-) No, it certainly is NOT! Granted, by looking at the code it will not be overly clear who moves what and when, but a 2 page commentary on the design would only leave one exlaiming ``Aaaaah... such simplicity, so great!'' > Well, as I've laid it out above, I don't really think it's all that much > to implement ;-) At least not in the mid layer. Right, it's not. This type of queuing mechanism would only make things more consistent and easy to manipulate. There'd be logistical issues, but those are easy to figure out with pen and paper. -- Luben ``Perfection is achieved not when there is nothing more to add but when there is nothing left to take away.'' Antoine de Saint Exupery ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 21:48 ` James Bottomley 2002-09-03 22:42 ` Doug Ledford @ 2002-09-04 10:37 ` Andries Brouwer 2002-09-04 10:48 ` Doug Ledford 2002-09-04 11:23 ` Alan Cox 1 sibling, 2 replies; 39+ messages in thread From: Andries Brouwer @ 2002-09-04 10:37 UTC (permalink / raw) To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi On Tue, Sep 03, 2002 at 04:48:39PM -0500, James Bottomley wrote: > If you're hitting error recovery so often that whether it recovers > in half a second or several seconds makes a difference, I'd say > there's something else wrong. Not that I want to contradict, but an example. Without my sd.c patch from yesterday or so (fixing MODE SENSE calls) an "insmod usb-storage.o" would take 14 minutes and 6 seconds for me. [One USB device, with 3 subdevices, gets into a bad state when presented with a MODE SENSE command that asks for more than the 56 bytes it has available. For each of the three subdevices we get a long sequence of retries, abort, reset, host reset, bus reset before it is taken off-line.] The scsi error recovery has many bad properties, but one is its slowness. Once it gets triggered on a machine with SCSI disks it is common to have a dead system for several minutes. I have not yet met a situation in which rebooting was not preferable above scsi error recovery, especially since the attempt to recover often fails. Andries ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-04 10:37 ` Andries Brouwer @ 2002-09-04 10:48 ` Doug Ledford 2002-09-04 11:23 ` Alan Cox 1 sibling, 0 replies; 39+ messages in thread From: Doug Ledford @ 2002-09-04 10:48 UTC (permalink / raw) To: Andries Brouwer Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi On Wed, Sep 04, 2002 at 12:37:37PM +0200, Andries Brouwer wrote: > > The scsi error recovery has many bad properties, but one is its slowness. This does not have to be this way. It is a solvable problem. > Once it gets triggered on a machine with SCSI disks it is common to > have a dead system for several minutes. Yes, well, this too is solvable. It, in fact, reminds me that one of the things I think needs added to the scsi host settings is a default timeout value for typical devices. Something like adding a default_timeout value to each Scsi_Device struct and allowing the scsi driver to modify the value during the slave_attach() call. Then we can put the default timeout on non-intelligent controllers to something sane while things like MegaRAID controllers can keep their sky high timeout values. > I have not yet met a situation > in which rebooting was not preferable above scsi error recovery, > especially since the attempt to recover often fails. This, too, is solvable. It just requires that the scsi subsystem start paying attention to *how* things fail and making the error handling code smart enough to know when to retry things and when to just fail. -- Doug Ledford <dledford@redhat.com> 919-754-3700 x44233 Red Hat, Inc. 1801 Varsity Dr. Raleigh, NC 27606 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-04 10:37 ` Andries Brouwer 2002-09-04 10:48 ` Doug Ledford @ 2002-09-04 11:23 ` Alan Cox 2002-09-04 16:25 ` Rogier Wolff 1 sibling, 1 reply; 39+ messages in thread From: Alan Cox @ 2002-09-04 11:23 UTC (permalink / raw) To: Andries Brouwer Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi On Wed, 2002-09-04 at 11:37, Andries Brouwer wrote: > The scsi error recovery has many bad properties, but one is its slowness. > Once it gets triggered on a machine with SCSI disks it is common to > have a dead system for several minutes. I have not yet met a situation > in which rebooting was not preferable above scsi error recovery, > especially since the attempt to recover often fails. Well I for one prefer the scsi timeout/abort sequence on a CD getting confused badly by a bad block (as at least some of my drives do) to a reboot everytime I get bad media ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-04 11:23 ` Alan Cox @ 2002-09-04 16:25 ` Rogier Wolff 2002-09-04 19:34 ` Thunder from the hill 0 siblings, 1 reply; 39+ messages in thread From: Rogier Wolff @ 2002-09-04 16:25 UTC (permalink / raw) To: Alan Cox Cc: Andries Brouwer, James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi On Wed, Sep 04, 2002 at 12:23:13PM +0100, Alan Cox wrote: > On Wed, 2002-09-04 at 11:37, Andries Brouwer wrote: > > The scsi error recovery has many bad properties, but one is its slowness. > > Once it gets triggered on a machine with SCSI disks it is common to > > have a dead system for several minutes. I have not yet met a situation > > in which rebooting was not preferable above scsi error recovery, > > especially since the attempt to recover often fails. > > Well I for one prefer the scsi timeout/abort sequence on a CD getting > confused badly by a bad block (as at least some of my drives do) to a > reboot everytime I get bad media Reboot is bad. Retries are bad. Errors should be returned to an upper layer, with an error code: "may retry", or "will never work". (Like in SMTP) I will most likely set the "retry count" to 0: Never retry. Almost never works anyway. And the disk already retried manytimes, so retrying in software is only "taking time". We do datarecovery around here. We get bad disks on a dayly basis. We are currently reading a drive that gets over 10Mb per second while spitting out bad block reports! Thing is: those blocks that didn't work first time, may work on our second retry. However, we need userspace control over that retry. We prefer to get the 18G worth of data off the disk first, and only then retry the blocks that happened to be bad first time around. Roger. -- ** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 ** *-- BitWizard writes Linux device drivers for any device you may have! --* * The Worlds Ecosystem is a stable system. Stable systems may experience * * excursions from the stable situation. We are currenly in such an * * excursion: The stable situation does not include humans. *************** ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-04 16:25 ` Rogier Wolff @ 2002-09-04 19:34 ` Thunder from the hill 0 siblings, 0 replies; 39+ messages in thread From: Thunder from the hill @ 2002-09-04 19:34 UTC (permalink / raw) To: Rogier Wolff Cc: Alan Cox, Andries Brouwer, James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi Hi, On Wed, 4 Sep 2002, Rogier Wolff wrote: > I will most likely set the "retry count" to 0: Never retry. Almost > never works anyway. And the disk already retried manytimes, so > retrying in software is only "taking time". fd = open(pathname, O_NORETRY); ? Thunder -- --./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .- --/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..- .- -/---/--/---/.-./.-./---/.--/.-.-.- --./.-/-.../.-./.././.-../.-.-.- ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 19:09 ` James Bottomley 2002-09-03 20:59 ` Alan Cox 2002-09-03 21:13 ` Doug Ledford @ 2002-09-03 21:24 ` Patrick Mansfield 2002-09-03 22:02 ` James Bottomley 2 siblings, 1 reply; 39+ messages in thread From: Patrick Mansfield @ 2002-09-03 21:24 UTC (permalink / raw) To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi James - On Tue, Sep 03, 2002 at 02:09:44PM -0500, James Bottomley wrote: > dledford@redhat.com said: > > Leave abort active. It does actually work in certain scenarios. The > > CD burner scenario that started this thread is an example of > > somewhere that an abort should actually do the job. > > Unfortunately, it would destroy the REQ_BARRIER approach in the block layer. > At best, abort probably causes a command to overtake a barrier it shouldn't, > at worst we abort the ordered tag that is the barrier and transactional > integrity is lost. > > When error correction is needed, we have to return all the commands for that > device to the block layer so that ordering and barrier issues can be taken > care of in the reissue. This makes LUN RESET (for those that support it) the > minimum level of error correction we can apply. > > James If we only send an abort or reset after a quiesce I don't see why one is better than the other. Not specific to reset or abort - if a single command gets an error, we wait for oustanding commands to complete before starting up the error handler thread. If all the commands (error one and outstanding) have barriers, those that do not error out will complete out of order from the errored command. How is this properly handled? And what happens if one command gets some sort of check condition (like medium error, or aborted command) that causes a retry? Will IO's still be correctly ordered? The abort could also be usefull in handling the locking/ownership of the scsi_cmnd - the abort at the LLD layer can be used by the LLD to cancel any software timeouts, as well as to flush the command from the hardware. After the abort, the mid-layer could assume that it once again "owned" the scsi_cmnd, especially if the LLD abort were a required function. I would like to see error handling occur without quiescing the entire adapter before taking any action. Stopping all adapter IO for a timeout can be a bit expensive - imagine a tape drive and multiple disks on an adapter, any IO disk timeout or failure will wait for the tape IO to complete before allowing any other IO, if the tape operation is long or is going to timeout this could be minutes or hours. -- Patrick Mansfield ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 21:24 ` Patrick Mansfield @ 2002-09-03 22:02 ` James Bottomley 2002-09-03 23:26 ` Alan Cox 0 siblings, 1 reply; 39+ messages in thread From: James Bottomley @ 2002-09-03 22:02 UTC (permalink / raw) To: Patrick Mansfield Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi patmans@us.ibm.com said: > If we only send an abort or reset after a quiesce I don't see why one > is better than the other. Quiesce means from the top (no more commands go from the block layer to the device) not from the bottom (commands can still be completed for the device). > Not specific to reset or abort - if a single command gets an error, we > wait for oustanding commands to complete before starting up the error > handler thread. If all the commands (error one and outstanding) have > barriers, those that do not error out will complete out of order from > the errored command. We don't wait. The commands may possibly be completing in parallel with recovery. To address your point, though, that's why the device needs to be reset as fast as possible: to preserve what's left of the command order. I accept that for a misbehaving device, this may be a race. > And what happens if one command gets some sort of check condition > (like medium error, or aborted command) that causes a retry? Will IO's > still be correctly ordered? Retries get eliminated. It should be up to the upper layers (sd or beyond) to say whether a retry is done. Since, as you point out, retries automatically violate any barrier, it is probably up to the block layer to judge what should be done about the problem. > I would like to see error handling occur without quiescing the entire > adapter before taking any action. Stopping all adapter IO for a > timeout can be a bit expensive - imagine a tape drive and multiple > disks on an adapter, any IO disk timeout or failure will wait for the > tape IO to complete before allowing any other IO, if the tape > operation is long or is going to timeout this could be minutes or > hours. Actually, I do think that quiescing has an important role to play. A lot of drive errors can be self inflicted indigestion by accepting more tags into the queue than it can process. Quiescing the queue lets us see if the drive can digest what it currently has or whether we need to apply a strong emetic. I'd actually like to add tag starvation recovery to the error handler's list of things to do. In that case, all you do on entry to the error handler is quiesce the upper queue and wait a while to see if the commands continue returning. You only begin more drastic measures if nothing comes back in the designated time period. James ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: aic7xxx sets CDR offline, how to reset? 2002-09-03 22:02 ` James Bottomley @ 2002-09-03 23:26 ` Alan Cox 0 siblings, 0 replies; 39+ messages in thread From: Alan Cox @ 2002-09-03 23:26 UTC (permalink / raw) To: James Bottomley Cc: Patrick Mansfield, Justin T. Gibbs, linux-kernel, linux-scsi On Tue, 2002-09-03 at 23:02, James Bottomley wrote: > > And what happens if one command gets some sort of check condition > > (like medium error, or aborted command) that causes a retry? Will IO's > > still be correctly ordered? > > Retries get eliminated. It should be up to the upper layers (sd or beyond) to > say whether a retry is done. Since, as you point out, retries automatically > violate any barrier, it is probably up to the block layer to judge what should > be done about the problem. Then we need to give the block layer a lot more information about what kind of a problem occurred ^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2002-09-05 13:32 UTC | newest] Thread overview: 39+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-06-11 2:46 Proposed changes to generic blk tag for use in SCSI (1/3) James Bottomley 2002-06-11 5:50 ` Jens Axboe 2002-06-11 14:29 ` James Bottomley 2002-06-11 14:45 ` Jens Axboe 2002-06-11 16:39 ` James Bottomley 2002-06-13 21:01 ` Doug Ledford 2002-06-13 21:26 ` James Bottomley 2002-06-13 21:50 ` Doug Ledford 2002-06-13 22:09 ` James Bottomley -- strict thread matches above, loose matches on Subject: below -- 2002-09-03 14:35 aic7xxx sets CDR offline, how to reset? James Bottomley 2002-09-03 18:23 ` Doug Ledford 2002-09-03 19:09 ` James Bottomley 2002-09-03 20:59 ` Alan Cox 2002-09-03 21:32 ` James Bottomley 2002-09-03 21:54 ` Alan Cox 2002-09-03 22:50 ` Doug Ledford 2002-09-03 23:28 ` Alan Cox 2002-09-04 7:40 ` Jeremy Higdon 2002-09-04 16:24 ` James Bottomley 2002-09-04 17:13 ` Mike Anderson 2002-09-05 9:50 ` Jeremy Higdon 2002-09-04 16:13 ` James Bottomley 2002-09-04 16:50 ` Justin T. Gibbs 2002-09-05 9:39 ` Jeremy Higdon 2002-09-05 13:35 ` Justin T. Gibbs 2002-09-03 21:13 ` Doug Ledford 2002-09-03 21:48 ` James Bottomley 2002-09-03 22:42 ` Doug Ledford 2002-09-03 22:52 ` Doug Ledford 2002-09-03 23:29 ` Alan Cox 2002-09-04 21:16 ` Luben Tuikov 2002-09-04 10:37 ` Andries Brouwer 2002-09-04 10:48 ` Doug Ledford 2002-09-04 11:23 ` Alan Cox 2002-09-04 16:25 ` Rogier Wolff 2002-09-04 19:34 ` Thunder from the hill 2002-09-03 21:24 ` Patrick Mansfield 2002-09-03 22:02 ` James Bottomley 2002-09-03 23:26 ` Alan Cox
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox