* sg driver and the error handler @ 2005-05-10 18:51 Alan Stern 2005-05-10 22:58 ` Patrick Mansfield 0 siblings, 1 reply; 18+ messages in thread From: Alan Stern @ 2005-05-10 18:51 UTC (permalink / raw) To: James Bottomley, Doug Gilbert; +Cc: SCSI development list I don't know who the right person is to handle this, but maybe someone can help... When a command injected through the sg driver encounters any sort of error, the usual retry mechanism and error handler are brought into play. Since sg sets the number of retries to 0 by default, the retry mechanism shouldn't cause any difficulty. But the error handler does. IMO it should never get involved with requests coming through sg -- sg should provide access that is as transparent as possible so that userspace programs will have a clean shot at managing their device. Consider a case that just came up. A USB CD drive causes a phase error when it receives a certain READ BUFFER command (buggy firmware on the drive, never mind that now). You would expect the user program to receive from sg a host_status value indicating DID_ERROR or something of the sort. Instead the error handler takes charge and sends out one or two TEST UNIT READY commands. The status information finally received by the user program is the status from the TEST UNIT READY, not from the failed READ BUFFER! How's a program supposed to cope with that sort of obfuscation? Something in the SCSI stack (scsi_io_completion ?) should check for requests coming in via sg and should know to complete the requests immediately. No requeuing and no error handling. Does this sound like a feasible thing to implement? Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sg driver and the error handler 2005-05-10 18:51 sg driver and the error handler Alan Stern @ 2005-05-10 22:58 ` Patrick Mansfield 2005-05-11 11:59 ` Douglas Gilbert ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Patrick Mansfield @ 2005-05-10 22:58 UTC (permalink / raw) To: Alan Stern; +Cc: James Bottomley, Doug Gilbert, SCSI development list Hi Alan - On Tue, May 10, 2005 at 02:51:44PM -0400, Alan Stern wrote: > When a command injected through the sg driver encounters any sort of > error, the usual retry mechanism and error handler are brought into play. > Since sg sets the number of retries to 0 by default, the retry mechanism > shouldn't cause any difficulty. But the error handler does. IMO it > should never get involved with requests coming through sg -- sg should > provide access that is as transparent as possible so that userspace > programs will have a clean shot at managing their device. The error handler is mainly a timeout handler, so it has to run to cancel the timed out command, we can't complete the timed out command back to the upper level until we know the HBA is no longer using it. > Consider a case that just came up. A USB CD drive causes a phase error > when it receives a certain READ BUFFER command (buggy firmware on the > drive, never mind that now). You would expect the user program to receive > from sg a host_status value indicating DID_ERROR or something of the sort. > > Instead the error handler takes charge and sends out one or two TEST UNIT > READY commands. The status information finally received by the user > program is the status from the TEST UNIT READY, not from the failed READ > BUFFER! How's a program supposed to cope with that sort of obfuscation? :-( > Something in the SCSI stack (scsi_io_completion ?) should check for > requests coming in via sg and should know to complete the requests > immediately. No requeuing and no error handling. > Does this sound like a feasible thing to implement? No per above. I think there are still some cases where sg or SG_IO commands are not immediately returned, I think for some certain ASC codes, some of those probably should be retried, as should cases like QUEUE FULL. As you say, what you really want is the correct result going back to the user, not the result of the TUR. So save and restore the result in scsi_eh_tur, and also in scsi_eh_try_stu. The request sense one already saves and restores it. And always set result |= (DRIVER_TIMEOUT << 24) if we are not requeueing the IO. Like (only compile tested) this, against scsi-misc git tree of a few days ago. Hopefully, sg or SG_IO programs can handle DRIVER_TIMEOUT :-/ but it is certainly better than returning as if no error occurred. What do you think? diff -uprN -X /home/patman/dontdiff scsi-misc-2.6.git/drivers/scsi/scsi_error.c to-res-scsi-misc-2.6.git/drivers/scsi/scsi_error.c --- scsi-misc-2.6.git/drivers/scsi/scsi_error.c 2005-05-09 14:19:41.000000000 -0700 +++ to-res-scsi-misc-2.6.git/drivers/scsi/scsi_error.c 2005-05-10 15:49:01.000000000 -0700 @@ -770,6 +770,7 @@ static int scsi_eh_tur(struct scsi_cmnd { static unsigned char tur_command[6] = {TEST_UNIT_READY, 0, 0, 0, 0, 0}; int retry_cnt = 1, rtn; + int saved_result; retry_tur: memcpy(scmd->cmnd, tur_command, sizeof(tur_command)); @@ -780,6 +781,7 @@ retry_tur: */ memset(scmd->sense_buffer, 0, sizeof(scmd->sense_buffer)); + saved_result = scmd->result; scmd->request_buffer = NULL; scmd->request_bufflen = 0; scmd->use_sg = 0; @@ -794,6 +796,7 @@ retry_tur: * the original request, so let's restore the original data. (db) */ scsi_setup_cmd_retry(scmd); + scmd->result = saved_result; /* * hey, we are done. let's look to see what happened. @@ -896,6 +899,7 @@ static int scsi_eh_try_stu(struct scsi_c { static unsigned char stu_command[6] = {START_STOP, 0, 0, 0, 1, 0}; int rtn; + int saved_result; if (!scmd->device->allow_restart) return 1; @@ -908,6 +912,7 @@ static int scsi_eh_try_stu(struct scsi_c */ memset(scmd->sense_buffer, 0, sizeof(scmd->sense_buffer)); + saved_result = scmd->result; scmd->request_buffer = NULL; scmd->request_bufflen = 0; scmd->use_sg = 0; @@ -922,6 +927,7 @@ static int scsi_eh_try_stu(struct scsi_c * the original request, so let's restore the original data. (db) */ scsi_setup_cmd_retry(scmd); + scmd->result = saved_result; /* * hey, we are done. let's look to see what happened. @@ -1561,8 +1567,7 @@ static void scsi_eh_flush_done_q(struct scmd)); scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY); } else { - if (!scmd->result) - scmd->result |= (DRIVER_TIMEOUT << 24); + scmd->result |= (DRIVER_TIMEOUT << 24); SCSI_LOG_ERROR_RECOVERY(3, printk("%s: flush finish" " cmd: %p\n", current->comm, scmd)); -- Patrick Mansfield ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sg driver and the error handler 2005-05-10 22:58 ` Patrick Mansfield @ 2005-05-11 11:59 ` Douglas Gilbert 2005-05-11 16:35 ` Luben Tuikov 2005-05-11 16:40 ` Luben Tuikov 2005-05-16 17:42 ` [PATCH] saved and restore result for timed out commands Patrick Mansfield 2 siblings, 1 reply; 18+ messages in thread From: Douglas Gilbert @ 2005-05-11 11:59 UTC (permalink / raw) To: Patrick Mansfield; +Cc: Alan Stern, James Bottomley, SCSI development list Patrick Mansfield wrote: > Hi Alan - > > On Tue, May 10, 2005 at 02:51:44PM -0400, Alan Stern wrote: > >>When a command injected through the sg driver encounters any sort of >>error, the usual retry mechanism and error handler are brought into play. >>Since sg sets the number of retries to 0 by default, the retry mechanism >>shouldn't cause any difficulty. But the error handler does. IMO it >>should never get involved with requests coming through sg -- sg should >>provide access that is as transparent as possible so that userspace >>programs will have a clean shot at managing their device. > > > The error handler is mainly a timeout handler, so it has to run to cancel > the timed out command, we can't complete the timed out command back to the > upper level until we know the HBA is no longer using it. > > >>Consider a case that just came up. A USB CD drive causes a phase error >>when it receives a certain READ BUFFER command (buggy firmware on the >>drive, never mind that now). You would expect the user program to receive >>from sg a host_status value indicating DID_ERROR or something of the sort. >> >>Instead the error handler takes charge and sends out one or two TEST UNIT >>READY commands. The status information finally received by the user >>program is the status from the TEST UNIT READY, not from the failed READ >>BUFFER! How's a program supposed to cope with that sort of obfuscation? > > > :-( Writing and maintaining a clean SCSI command pass-through in a block-centric environment has not been easy. It often feels like a one step forward and two steps back process :-) In my recent "sense descriptor format" changes I did unclutter some SG_IO error return paths, mainly for usages via block device nodes (e.g. /dev/sda). As ever, handling errors sanely is not easy. It also runs the risk of tripping up some hardware. >>Something in the SCSI stack (scsi_io_completion ?) should check for >>requests coming in via sg and should know to complete the requests >>immediately. No requeuing and no error handling. The sg driver does not currently call scsi_io_completion(). In the mid level you can see code like: if (blk_pc_request(scsi_cmnd->request)) pass-through else block-injected [almost always a READ or WRITE] I think the "_pc_" stands for packet command. Maybe we need more instances of this. >>Does this sound like a feasible thing to implement? > > > No per above. I think there are still some cases where sg or SG_IO > commands are not immediately returned, I think for some certain ASC codes, > some of those probably should be retried, as should cases like QUEUE FULL. > > As you say, what you really want is the correct result going back to the > user, not the result of the TUR. > > So save and restore the result in scsi_eh_tur, and also in scsi_eh_try_stu. > The request sense one already saves and restores it. > > And always set result |= (DRIVER_TIMEOUT << 24) if we are not requeueing > the IO. > > Like (only compile tested) this, against scsi-misc git tree of a few days > ago. Hopefully, sg or SG_IO programs can handle DRIVER_TIMEOUT :-/ but it > is certainly better than returning as if no error occurred. sg3_utils error processing first decodes SCSI status and sense data. Then if any of the other bits in "result" are set, then it comments on them. <snip> Patch looks fine. Perhaps Alan could try some utility from sg3_utils in the problem environment once Patrick's patch has been applied. The sg_dd utility now has a "verbose=<n>" option where <n>=1 should decode any SCSI status (including DID and DRIVER) errors received. Doug Gilbert ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sg driver and the error handler 2005-05-11 11:59 ` Douglas Gilbert @ 2005-05-11 16:35 ` Luben Tuikov 2005-05-11 17:45 ` Patrick Mansfield 2005-05-11 19:43 ` Alan Stern 0 siblings, 2 replies; 18+ messages in thread From: Luben Tuikov @ 2005-05-11 16:35 UTC (permalink / raw) To: dougg; +Cc: Patrick Mansfield, Alan Stern, James Bottomley, SCSI development list >>The error handler is mainly a timeout handler, so it has to run to cancel >>the timed out command, we can't complete the timed out command back to the >>upper level until we know the HBA is no longer using it. >> >> >> >>>Consider a case that just came up. A USB CD drive causes a phase error >>>when it receives a certain READ BUFFER command (buggy firmware on the >>>drive, never mind that now). You would expect the user program to receive >> >>>from sg a host_status value indicating DID_ERROR or something of the sort. >> >>>Instead the error handler takes charge and sends out one or two TEST UNIT >>>READY commands. The status information finally received by the user >>>program is the status from the TEST UNIT READY, not from the failed READ >>>BUFFER! How's a program supposed to cope with that sort of obfuscation? >> >> >>:-( I think Alan even wants to see the QUEUEFULL status not being retried. That is, other than reclaiming the command from the LLDD, the eh code should not retry or do anything with the command, just pass it back to the user process who sent it via sg. Maybe the application client has other tricks up its sleeve on QUEUEFULL coming from the device. I think scsi generic as a passthrough, where minimal logic from SCSI Core is applied and most is left to the application client. Luben ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sg driver and the error handler 2005-05-11 16:35 ` Luben Tuikov @ 2005-05-11 17:45 ` Patrick Mansfield 2005-05-11 17:55 ` Luben Tuikov 2005-05-11 19:43 ` Alan Stern 1 sibling, 1 reply; 18+ messages in thread From: Patrick Mansfield @ 2005-05-11 17:45 UTC (permalink / raw) To: Luben Tuikov; +Cc: dougg, Alan Stern, James Bottomley, SCSI development list On Wed, May 11, 2005 at 12:35:39PM -0400, Luben Tuikov wrote: > I think Alan even wants to see the QUEUEFULL status not being > retried. > > That is, other than reclaiming the command from the LLDD, the > eh code should not retry or do anything with the command, just > pass it back to the user process who sent it via sg. > > Maybe the application client has other tricks up its sleeve > on QUEUEFULL coming from the device. > > I think scsi generic as a passthrough, where minimal logic > from SCSI Core is applied and most is left to the application > client. IMO, normal sg/SG_IO usage wants scsi core to requeue the request, they want the command to be processed by the LUN. And user space does not have the information available in scsi core (status of other outstanding IO to the same LUN), and can't retry in a timely manner, possibly avoid starvation issues. Maybe fast fail should be used to modify QUEUE FULL re-queue behaviours. It might be a bit odd in some cases and maybe hardware specific (for multi path io, does QUEUE FULL mean a storage port/controller is full, or the LUN is full? if port, then you want to send back to the upper level, but if LUN, you might want to let scsi core requeue it). -- Patrick Mansfield ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sg driver and the error handler 2005-05-11 17:45 ` Patrick Mansfield @ 2005-05-11 17:55 ` Luben Tuikov 2005-05-11 18:02 ` Patrick Mansfield 0 siblings, 1 reply; 18+ messages in thread From: Luben Tuikov @ 2005-05-11 17:55 UTC (permalink / raw) To: Patrick Mansfield Cc: dougg, Alan Stern, James Bottomley, SCSI development list On 05/11/05 13:45, Patrick Mansfield wrote: > IMO, normal sg/SG_IO usage wants scsi core to requeue the request, they > want the command to be processed by the LUN. > > And user space does not have the information available in scsi core > (status of other outstanding IO to the same LUN), and can't retry in a > timely manner, possibly avoid starvation issues. > > Maybe fast fail should be used to modify QUEUE FULL re-queue behaviours. Yes, if this is controllable through sg, that would avoid the problems Alan mentioned. > It might be a bit odd in some cases and maybe hardware specific (for multi > path io, does QUEUE FULL mean a storage port/controller is full, or the > LUN is full? if port, then you want to send back to the upper level, but > if LUN, you might want to let scsi core requeue it). If it is coming from SDS (TASK SET FULL) then it is the LUN. If it is coming from the LLDD, then it is the other (host queue full). Luben ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sg driver and the error handler 2005-05-11 17:55 ` Luben Tuikov @ 2005-05-11 18:02 ` Patrick Mansfield 0 siblings, 0 replies; 18+ messages in thread From: Patrick Mansfield @ 2005-05-11 18:02 UTC (permalink / raw) To: Luben Tuikov; +Cc: dougg, Alan Stern, James Bottomley, SCSI development list On Wed, May 11, 2005 at 01:55:50PM -0400, Luben Tuikov wrote: > On 05/11/05 13:45, Patrick Mansfield wrote: > > It might be a bit odd in some cases and maybe hardware specific (for multi > > path io, does QUEUE FULL mean a storage port/controller is full, or the > > LUN is full? if port, then you want to send back to the upper level, but > > if LUN, you might want to let scsi core requeue it). > > If it is coming from SDS (TASK SET FULL) then it is the LUN. If it > is coming from the LLDD, then it is the other (host queue full). I mean that if it is from the LUN, and there are multiple ports or controllers on the LUN, you don't know if the QUEUE FULL means the port is busy, or the LUN (or target) is busy. -- Patrick Mansfield ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sg driver and the error handler 2005-05-11 16:35 ` Luben Tuikov 2005-05-11 17:45 ` Patrick Mansfield @ 2005-05-11 19:43 ` Alan Stern 1 sibling, 0 replies; 18+ messages in thread From: Alan Stern @ 2005-05-11 19:43 UTC (permalink / raw) To: Luben Tuikov Cc: dougg, Patrick Mansfield, James Bottomley, SCSI development list On Wed, 11 May 2005, Luben Tuikov wrote: > I think Alan even wants to see the QUEUEFULL status not being > retried. > > That is, other than reclaiming the command from the LLDD, the > eh code should not retry or do anything with the command, just > pass it back to the user process who sent it via sg. Yes. > Maybe the application client has other tricks up its sleeve > on QUEUEFULL coming from the device. > > I think scsi generic as a passthrough, where minimal logic > from SCSI Core is applied and most is left to the application > client. That's how I think of it too. > I think he even wants the stronger condition -- after the command is reclaimed > from the LLDD, just send back the result to the application client. And let > the application client decide whether they want to send TUR or something else. Absolutely correct. Now, I have no idea how applications will handle exotic errors like QUEUEFULL. And it might make sense to have the core retry errors like that when they come from the LLDD and not from the device. But when an error does come from the device, it certainly should be passed directly back to the sg client. The error handler shouldn't do anything more than abort and clean up after a timed-out command. On Wed, 11 May 2005, Patrick Mansfield wrote: > The TUR is part of the error recovery, and so it should be sent > independent of the timed out command. Or shouldn't be sent, as the case may be. If the error handler isn't going to do error recovery and is going to leave it up to the application, then there's no reason to send the TUR. > I know we want to have transport specific error (or timeout) handling and > recovery. But for current usage, we should return the result of the > timeout to the upper level driver (and implicitly to the application). Right. Ultimately I don't know how transport-specific issues need to be handled. It would be best if they can be limited to minimal cleanup so that the application can do device-related error recovery. Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sg driver and the error handler 2005-05-10 22:58 ` Patrick Mansfield 2005-05-11 11:59 ` Douglas Gilbert @ 2005-05-11 16:40 ` Luben Tuikov 2005-05-11 17:14 ` Patrick Mansfield 2005-05-16 17:42 ` [PATCH] saved and restore result for timed out commands Patrick Mansfield 2 siblings, 1 reply; 18+ messages in thread From: Luben Tuikov @ 2005-05-11 16:40 UTC (permalink / raw) To: Patrick Mansfield Cc: Alan Stern, James Bottomley, Doug Gilbert, SCSI development list On 05/10/05 18:58, Patrick Mansfield wrote: > As you say, what you really want is the correct result going back to the > user, not the result of the TUR. > > So save and restore the result in scsi_eh_tur, and also in scsi_eh_try_stu. > The request sense one already saves and restores it. I think he even wants the stronger condition -- after the command is reclaimed from the LLDD, just send back the result to the application client. And let the application client decide whether they want to send TUR or something else. Luben ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: sg driver and the error handler 2005-05-11 16:40 ` Luben Tuikov @ 2005-05-11 17:14 ` Patrick Mansfield 0 siblings, 0 replies; 18+ messages in thread From: Patrick Mansfield @ 2005-05-11 17:14 UTC (permalink / raw) To: Luben Tuikov Cc: Alan Stern, James Bottomley, Doug Gilbert, SCSI development list On Wed, May 11, 2005 at 12:40:05PM -0400, Luben Tuikov wrote: > On 05/10/05 18:58, Patrick Mansfield wrote: > > As you say, what you really want is the correct result going back to the > > user, not the result of the TUR. > > > > So save and restore the result in scsi_eh_tur, and also in scsi_eh_try_stu. > > The request sense one already saves and restores it. > > I think he even wants the stronger condition -- after the command is reclaimed > from the LLDD, just send back the result to the application client. And let > the application client decide whether they want to send TUR or something else. The TUR is part of the error recovery, and so it should be sent independent of the timed out command. I know we want to have transport specific error (or timeout) handling and recovery. But for current usage, we should return the result of the timeout to the upper level driver (and implicitly to the application). -- Patrick Mansfield ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] saved and restore result for timed out commands 2005-05-10 22:58 ` Patrick Mansfield 2005-05-11 11:59 ` Douglas Gilbert 2005-05-11 16:40 ` Luben Tuikov @ 2005-05-16 17:42 ` Patrick Mansfield 2005-05-16 19:42 ` Alan Stern 2005-06-01 18:45 ` Alan Stern 2 siblings, 2 replies; 18+ messages in thread From: Patrick Mansfield @ 2005-05-16 17:42 UTC (permalink / raw) To: Alan Stern; +Cc: James Bottomley, Doug Gilbert, SCSI development list Save and restore the scmd->result, so that timed out commands do not return the result of the TEST UNIT READY or the start/stop commands. Code is already in place to save and restore the result for the request sense case. The previous version of this patch erroneously removed the "if" check, instead add a comment as to why the "if" is needed. Signed-off-by: Patrick Mansfield <patmans@us.ibm.com> diff -uprN -X /home/patman/dontdiff scsi-misc-2.6.git/drivers/scsi/scsi_error.c to-res-scsi-misc-2.6.git/drivers/scsi/scsi_error.c --- scsi-misc-2.6.git/drivers/scsi/scsi_error.c 2005-05-09 14:19:41.000000000 -0700 +++ to-res-scsi-misc-2.6.git/drivers/scsi/scsi_error.c 2005-05-16 10:01:11.000000000 -0700 @@ -770,6 +770,7 @@ static int scsi_eh_tur(struct scsi_cmnd { static unsigned char tur_command[6] = {TEST_UNIT_READY, 0, 0, 0, 0, 0}; int retry_cnt = 1, rtn; + int saved_result; retry_tur: memcpy(scmd->cmnd, tur_command, sizeof(tur_command)); @@ -780,6 +781,7 @@ retry_tur: */ memset(scmd->sense_buffer, 0, sizeof(scmd->sense_buffer)); + saved_result = scmd->result; scmd->request_buffer = NULL; scmd->request_bufflen = 0; scmd->use_sg = 0; @@ -794,6 +796,7 @@ retry_tur: * the original request, so let's restore the original data. (db) */ scsi_setup_cmd_retry(scmd); + scmd->result = saved_result; /* * hey, we are done. let's look to see what happened. @@ -896,6 +899,7 @@ static int scsi_eh_try_stu(struct scsi_c { static unsigned char stu_command[6] = {START_STOP, 0, 0, 0, 1, 0}; int rtn; + int saved_result; if (!scmd->device->allow_restart) return 1; @@ -908,6 +912,7 @@ static int scsi_eh_try_stu(struct scsi_c */ memset(scmd->sense_buffer, 0, sizeof(scmd->sense_buffer)); + saved_result = scmd->result; scmd->request_buffer = NULL; scmd->request_bufflen = 0; scmd->use_sg = 0; @@ -922,6 +927,7 @@ static int scsi_eh_try_stu(struct scsi_c * the original request, so let's restore the original data. (db) */ scsi_setup_cmd_retry(scmd); + scmd->result = saved_result; /* * hey, we are done. let's look to see what happened. @@ -1561,6 +1567,11 @@ static void scsi_eh_flush_done_q(struct scmd)); scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY); } else { + /* + * If just we got sense for the device (called + * scsi_eh_get_sense), scmd->result is already + * set, do not set DRIVER_TIMEOUT. + */ if (!scmd->result) scmd->result |= (DRIVER_TIMEOUT << 24); SCSI_LOG_ERROR_RECOVERY(3, printk("%s: flush finish" ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] saved and restore result for timed out commands 2005-05-16 17:42 ` [PATCH] saved and restore result for timed out commands Patrick Mansfield @ 2005-05-16 19:42 ` Alan Stern 2005-06-01 18:45 ` Alan Stern 1 sibling, 0 replies; 18+ messages in thread From: Alan Stern @ 2005-05-16 19:42 UTC (permalink / raw) To: Patrick Mansfield; +Cc: James Bottomley, Doug Gilbert, SCSI development list On Mon, 16 May 2005, Patrick Mansfield wrote: > Save and restore the scmd->result, so that timed out commands do not > return the result of the TEST UNIT READY or the start/stop commands. Code > is already in place to save and restore the result for the request sense > case. > > The previous version of this patch erroneously removed the "if" check, > instead add a comment as to why the "if" is needed. I can confirm that this patch does help; the correct status is now returned to the sg user. However there are still a few things that should be fixed up: First, this doesn't erase the sense data after error recovery. I tried it with two different scenarios, one where the command returned a phase error and one where the command timed out. In both cases the sg user got back a buffer with valid sense data, from the TUR commands issued by the error handler. This could be highly misleading. Second, as Patrick has noted, this doesn't prevent the error handler from issuing those TURs in the first place. If some sort of Unit Attention came up, the sg user wouldn't be able to handle it correctly because the error handler would have swallowed it. Of course, once this is fixed the first problem will no longer be an issue (no TUR, hence no unsolicited sense data). Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] saved and restore result for timed out commands 2005-05-16 17:42 ` [PATCH] saved and restore result for timed out commands Patrick Mansfield 2005-05-16 19:42 ` Alan Stern @ 2005-06-01 18:45 ` Alan Stern 2005-06-01 21:00 ` James Bottomley 1 sibling, 1 reply; 18+ messages in thread From: Alan Stern @ 2005-06-01 18:45 UTC (permalink / raw) To: Patrick Mansfield; +Cc: James Bottomley, Doug Gilbert, SCSI development list On Mon, 16 May 2005, Patrick Mansfield wrote: > Save and restore the scmd->result, so that timed out commands do not > return the result of the TEST UNIT READY or the start/stop commands. Code > is already in place to save and restore the result for the request sense > case. > > The previous version of this patch erroneously removed the "if" check, > instead add a comment as to why the "if" is needed. Has there been any more progress on this issue? I spoke with Jörg Schilling (author of the cdrecord package, a widely-used sg client) and he said that failed operations should be retried automatically if the command had not gotten as far as being sent on the bus -- e.g., if the host adapter driver's queue was full. But if the command did get sent to the device then any subsequent failure should be reflected back to the client with no retrys. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] saved and restore result for timed out commands 2005-06-01 18:45 ` Alan Stern @ 2005-06-01 21:00 ` James Bottomley 2005-06-01 21:26 ` Alan Stern 0 siblings, 1 reply; 18+ messages in thread From: James Bottomley @ 2005-06-01 21:00 UTC (permalink / raw) To: Alan Stern; +Cc: Patrick Mansfield, Doug Gilbert, SCSI development list On Wed, 2005-06-01 at 14:45 -0400, Alan Stern wrote: > Has there been any more progress on this issue? Well, the patch is in scsi-misc, if that's what you're asking. > I spoke with Jörg Schilling (author of the cdrecord package, a widely-used > sg client) and he said that failed operations should be retried > automatically if the command had not gotten as far as being sent on the > bus -- e.g., if the host adapter driver's queue was full. But if the > command did get sent to the device then any subsequent failure should be > reflected back to the client with no retrys. That's pretty much what we do now ... with the exception of QUEUE_FULL and BUSY handling. What's the actual problem you were discussing? James - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] saved and restore result for timed out commands 2005-06-01 21:00 ` James Bottomley @ 2005-06-01 21:26 ` Alan Stern 2005-06-01 21:57 ` Patrick Mansfield 2005-06-03 15:21 ` James Bottomley 0 siblings, 2 replies; 18+ messages in thread From: Alan Stern @ 2005-06-01 21:26 UTC (permalink / raw) To: James Bottomley; +Cc: Patrick Mansfield, Doug Gilbert, SCSI development list On Wed, 1 Jun 2005, James Bottomley wrote: > On Wed, 2005-06-01 at 14:45 -0400, Alan Stern wrote: > > Has there been any more progress on this issue? > > Well, the patch is in scsi-misc, if that's what you're asking. That's good. Is it too much to ask for an additional patch to zero out the sense data buffer after the error handler has finished with a command? After a transport error or a timeout there shouldn't be any sense data (races aside), but the error handler often leaves apparently valid data in the buffer, unrelated to the original command. By the way, is scsi-misc available anywhere as a patch (or set of patches) for those of us not on the bleeding edge, or does it exist only as a git repository? > > I spoke with Jörg Schilling (author of the cdrecord package, a widely-used > > sg client) and he said that failed operations should be retried > > automatically if the command had not gotten as far as being sent on the > > bus -- e.g., if the host adapter driver's queue was full. But if the > > command did get sent to the device then any subsequent failure should be > > reflected back to the client with no retrys. > > That's pretty much what we do now ... with the exception of QUEUE_FULL > and BUSY handling. What's the actual problem you were discussing? This isn't so much in response to a particular problem as it is an attempt to make the Linux sg driver more useful. (The discussion got started because of an actual problem, but that problem was at the USB level, not the SCSI level.) You may not have run across Schilling very much. He's done a lot of work on SCSI application programming on a variety of OS's. His opinion of the Linux SCSI stack isn't very high, although admittedly many of the problems he complains about only existed in much earlier versions of Linux. (Try reading through the some of the documentation or man pages for cdrecord and you'll see what I mean.) Anyway, he still talks about how Linux's sg interface is so much worse than Solaris's. Considering this issue of proper return codes and client control of the device, you can agree that he does have a point. He must think of himself purely as an applications programmer, not a kernel programmer, because he hasn't done anything to _fix_ these problems -- he prefers to complain about them. So I decided to see if something could be done to fix this particular issue, which does appear to be genuine. Overall, the motivation is simply to improve sg by making it more transparent, giving clients more control over the device, and returning valid and useful status. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] saved and restore result for timed out commands 2005-06-01 21:26 ` Alan Stern @ 2005-06-01 21:57 ` Patrick Mansfield 2005-06-03 15:21 ` James Bottomley 1 sibling, 0 replies; 18+ messages in thread From: Patrick Mansfield @ 2005-06-01 21:57 UTC (permalink / raw) To: Alan Stern; +Cc: James Bottomley, Doug Gilbert, SCSI development list On Wed, Jun 01, 2005 at 05:26:34PM -0400, Alan Stern wrote: > On Wed, 1 Jun 2005, James Bottomley wrote: > > > On Wed, 2005-06-01 at 14:45 -0400, Alan Stern wrote: > > > Has there been any more progress on this issue? > > > > Well, the patch is in scsi-misc, if that's what you're asking. > > That's good. Is it too much to ask for an additional patch to zero out > the sense data buffer after the error handler has finished with a command? > After a transport error or a timeout there shouldn't be any sense data > (races aside), but the error handler often leaves apparently valid data in > the buffer, unrelated to the original command. I was hoping someone else might comment on that, I am not certain if sense data is valid only when CHECK_CONDITION is set. But, we can zero/clear it after the TUR without hurting anything. -- Patrick Mansfield ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] saved and restore result for timed out commands 2005-06-01 21:26 ` Alan Stern 2005-06-01 21:57 ` Patrick Mansfield @ 2005-06-03 15:21 ` James Bottomley 2005-06-03 15:35 ` Alan Stern 1 sibling, 1 reply; 18+ messages in thread From: James Bottomley @ 2005-06-03 15:21 UTC (permalink / raw) To: Alan Stern; +Cc: Patrick Mansfield, Doug Gilbert, SCSI development list On Wed, 2005-06-01 at 17:26 -0400, Alan Stern wrote: > By the way, is scsi-misc available anywhere as a patch (or set of > patches) for those of us not on the bleeding edge, or does it exist only > as a git repository? I suppose I should have a cron job sending this information to linux- scsi weekly ... The git repositories are: rsync://www.parisc-linux.org/~jejb/git/scsi-misc-2.6.git rsync://www.parisc-linux.org/~jejb/git/scsi-rc-fixes-2.6.git There's a browser for the git tree at http://www.parisc-linux.org/cgi-bin/gitweb.pl And the patches and change logs are here http://www.parisc-linux.org/~jejb/scsi_diffs I'm also in the process of setting up SCSI git trees on www.kernel.org James ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] saved and restore result for timed out commands 2005-06-03 15:21 ` James Bottomley @ 2005-06-03 15:35 ` Alan Stern 0 siblings, 0 replies; 18+ messages in thread From: Alan Stern @ 2005-06-03 15:35 UTC (permalink / raw) To: James Bottomley; +Cc: Patrick Mansfield, Doug Gilbert, SCSI development list On Fri, 3 Jun 2005, James Bottomley wrote: > On Wed, 2005-06-01 at 17:26 -0400, Alan Stern wrote: > > By the way, is scsi-misc available anywhere as a patch (or set of > > patches) for those of us not on the bleeding edge, or does it exist only > > as a git repository? > > I suppose I should have a cron job sending this information to linux- > scsi weekly ... > > The git repositories are: > > rsync://www.parisc-linux.org/~jejb/git/scsi-misc-2.6.git > rsync://www.parisc-linux.org/~jejb/git/scsi-rc-fixes-2.6.git > > There's a browser for the git tree at > > http://www.parisc-linux.org/cgi-bin/gitweb.pl > > And the patches and change logs are here > > http://www.parisc-linux.org/~jejb/scsi_diffs > > I'm also in the process of setting up SCSI git trees on www.kernel.org Sorry -- it's my fault for not subscribing to linux-scsi (I don't post to it very often). Thanks for the pointers. Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2005-06-03 15:35 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-05-10 18:51 sg driver and the error handler Alan Stern 2005-05-10 22:58 ` Patrick Mansfield 2005-05-11 11:59 ` Douglas Gilbert 2005-05-11 16:35 ` Luben Tuikov 2005-05-11 17:45 ` Patrick Mansfield 2005-05-11 17:55 ` Luben Tuikov 2005-05-11 18:02 ` Patrick Mansfield 2005-05-11 19:43 ` Alan Stern 2005-05-11 16:40 ` Luben Tuikov 2005-05-11 17:14 ` Patrick Mansfield 2005-05-16 17:42 ` [PATCH] saved and restore result for timed out commands Patrick Mansfield 2005-05-16 19:42 ` Alan Stern 2005-06-01 18:45 ` Alan Stern 2005-06-01 21:00 ` James Bottomley 2005-06-01 21:26 ` Alan Stern 2005-06-01 21:57 ` Patrick Mansfield 2005-06-03 15:21 ` James Bottomley 2005-06-03 15:35 ` Alan Stern
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox