* [PATCH] Fix eh_abort race condition
@ 2004-02-25 16:11 Brian King
2004-02-25 17:55 ` Mike Anderson
0 siblings, 1 reply; 3+ messages in thread
From: Brian King @ 2004-02-25 16:11 UTC (permalink / raw)
To: linux-scsi
[-- Attachment #1: Type: text/plain, Size: 193 bytes --]
The following patch fixes the race condition discussed here:
http://marc.theaimsgroup.com/?l=linux-scsi&m=107757213405773&w=2
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
[-- Attachment #2: patch-2.6.3-eh_abort.patch --]
[-- Type: text/plain, Size: 1786 bytes --]
The following patch fixes a race condition in abort processing. With this
patch, the mid-layer can now guarantee to LLDs that it will only call
eh_abort for ops which returned 0 in queuecommand and have not yet had
their ->done function called.
---
diff -puN drivers/scsi/scsi_error.c~eh_abort drivers/scsi/scsi_error.c
--- linux-2.6.3/drivers/scsi/scsi_error.c~eh_abort Wed Feb 25 09:54:52 2004
+++ linux-2.6.3-bjking1/drivers/scsi/scsi_error.c Wed Feb 25 09:54:52 2004
@@ -471,8 +471,9 @@ static int scsi_send_eh_cmnd(struct scsi
* we should treat them differently anyways.
*/
spin_lock_irqsave(scmd->device->host->host_lock, flags);
- if (scmd->device->host->hostt->eh_abort_handler)
- scmd->device->host->hostt->eh_abort_handler(scmd);
+ if (scmd->serial_number != 0)
+ if (scmd->device->host->hostt->eh_abort_handler)
+ scmd->device->host->hostt->eh_abort_handler(scmd);
spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
scmd->request->rq_status = RQ_SCSI_DONE;
@@ -687,17 +688,17 @@ static int scsi_try_to_abort_cmd(struct
if (!scmd->device->host->hostt->eh_abort_handler)
return rtn;
+ spin_lock_irqsave(scmd->device->host->host_lock, flags);
/*
* scsi_done was called just after the command timed out and before
* we had a chance to process it. (db)
*/
- if (scmd->serial_number == 0)
- return SUCCESS;
-
- scmd->owner = SCSI_OWNER_LOWLEVEL;
-
- spin_lock_irqsave(scmd->device->host->host_lock, flags);
- rtn = scmd->device->host->hostt->eh_abort_handler(scmd);
+ if (scmd->serial_number == 0) {
+ rtn = SUCCESS;
+ } else {
+ scmd->owner = SCSI_OWNER_LOWLEVEL;
+ rtn = scmd->device->host->hostt->eh_abort_handler(scmd);
+ }
spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
return rtn;
_
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] Fix eh_abort race condition
2004-02-25 16:11 [PATCH] Fix eh_abort race condition Brian King
@ 2004-02-25 17:55 ` Mike Anderson
2004-02-25 21:10 ` Brian King
0 siblings, 1 reply; 3+ messages in thread
From: Mike Anderson @ 2004-02-25 17:55 UTC (permalink / raw)
To: Brian King; +Cc: linux-scsi
Brian King [brking@us.ibm.com] wrote:
>
> The following patch fixes a race condition in abort processing. With this
> patch, the mid-layer can now guarantee to LLDs that it will only call
> eh_abort for ops which returned 0 in queuecommand and have not yet had
> their ->done function called.
How does this address the race of scsi_times_out has fired and the LLD
calls scsi_done. scsi_done will return prior to setting the serial
number to 0 if the timer has fired. The LLDD will believe scsi_done
added the command to the scsi_done_q, but it did not.
-andmike
--
Michael Anderson
andmike@us.ibm.com
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] Fix eh_abort race condition
2004-02-25 17:55 ` Mike Anderson
@ 2004-02-25 21:10 ` Brian King
0 siblings, 0 replies; 3+ messages in thread
From: Brian King @ 2004-02-25 21:10 UTC (permalink / raw)
To: Mike Anderson; +Cc: linux-scsi
Mike Anderson wrote:
> Brian King [brking@us.ibm.com] wrote:
>
>>The following patch fixes a race condition in abort processing. With this
>>patch, the mid-layer can now guarantee to LLDs that it will only call
>>eh_abort for ops which returned 0 in queuecommand and have not yet had
>>their ->done function called.
>
>
>
> How does this address the race of scsi_times_out has fired and the LLD
> calls scsi_done. scsi_done will return prior to setting the serial
> number to 0 if the timer has fired. The LLDD will believe scsi_done
> added the command to the scsi_done_q, but it did not.
It doesn't;) I missed that... Is the fix as simple as changing
scsi_times_out to zero the serial_number before calling scsi_delete_timer,
or am I missing something?
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2004-02-25 21:11 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-02-25 16:11 [PATCH] Fix eh_abort race condition Brian King
2004-02-25 17:55 ` Mike Anderson
2004-02-25 21:10 ` Brian King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).