linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix eh_abort race condition
@ 2004-02-25 16:11 Brian King
  2004-02-25 17:55 ` Mike Anderson
  0 siblings, 1 reply; 3+ messages in thread
From: Brian King @ 2004-02-25 16:11 UTC (permalink / raw)
  To: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 193 bytes --]

The following patch fixes the race condition discussed here:

http://marc.theaimsgroup.com/?l=linux-scsi&m=107757213405773&w=2



-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

[-- Attachment #2: patch-2.6.3-eh_abort.patch --]
[-- Type: text/plain, Size: 1786 bytes --]


The following patch fixes a race condition in abort processing. With this
patch, the mid-layer can now guarantee to LLDs that it will only call
eh_abort for ops which returned 0 in queuecommand and have not yet had 
their ->done function called.

---


diff -puN drivers/scsi/scsi_error.c~eh_abort drivers/scsi/scsi_error.c
--- linux-2.6.3/drivers/scsi/scsi_error.c~eh_abort	Wed Feb 25 09:54:52 2004
+++ linux-2.6.3-bjking1/drivers/scsi/scsi_error.c	Wed Feb 25 09:54:52 2004
@@ -471,8 +471,9 @@ static int scsi_send_eh_cmnd(struct scsi
 		 * we should treat them differently anyways.
 		 */
 		spin_lock_irqsave(scmd->device->host->host_lock, flags);
-		if (scmd->device->host->hostt->eh_abort_handler)
-			scmd->device->host->hostt->eh_abort_handler(scmd);
+		if (scmd->serial_number != 0)
+			if (scmd->device->host->hostt->eh_abort_handler)
+				scmd->device->host->hostt->eh_abort_handler(scmd);
 		spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
 			
 		scmd->request->rq_status = RQ_SCSI_DONE;
@@ -687,17 +688,17 @@ static int scsi_try_to_abort_cmd(struct 
 	if (!scmd->device->host->hostt->eh_abort_handler)
 		return rtn;
 
+	spin_lock_irqsave(scmd->device->host->host_lock, flags);
 	/*
 	 * scsi_done was called just after the command timed out and before
 	 * we had a chance to process it. (db)
 	 */
-	if (scmd->serial_number == 0)
-		return SUCCESS;
-
-	scmd->owner = SCSI_OWNER_LOWLEVEL;
-
-	spin_lock_irqsave(scmd->device->host->host_lock, flags);
-	rtn = scmd->device->host->hostt->eh_abort_handler(scmd);
+	if (scmd->serial_number == 0) {
+		rtn = SUCCESS;
+	} else {
+		scmd->owner = SCSI_OWNER_LOWLEVEL;
+		rtn = scmd->device->host->hostt->eh_abort_handler(scmd);
+	}
 	spin_unlock_irqrestore(scmd->device->host->host_lock, flags);
 
 	return rtn;

_

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Fix eh_abort race condition
  2004-02-25 16:11 [PATCH] Fix eh_abort race condition Brian King
@ 2004-02-25 17:55 ` Mike Anderson
  2004-02-25 21:10   ` Brian King
  0 siblings, 1 reply; 3+ messages in thread
From: Mike Anderson @ 2004-02-25 17:55 UTC (permalink / raw)
  To: Brian King; +Cc: linux-scsi

Brian King [brking@us.ibm.com] wrote:
> 
> The following patch fixes a race condition in abort processing. With this
> patch, the mid-layer can now guarantee to LLDs that it will only call
> eh_abort for ops which returned 0 in queuecommand and have not yet had 
> their ->done function called.


How does this address the race of scsi_times_out has fired and the LLD
calls scsi_done. scsi_done will return prior to setting the serial
number to 0 if the timer has fired. The LLDD will believe scsi_done
added the command to the scsi_done_q, but it did not.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Fix eh_abort race condition
  2004-02-25 17:55 ` Mike Anderson
@ 2004-02-25 21:10   ` Brian King
  0 siblings, 0 replies; 3+ messages in thread
From: Brian King @ 2004-02-25 21:10 UTC (permalink / raw)
  To: Mike Anderson; +Cc: linux-scsi

Mike Anderson wrote:
> Brian King [brking@us.ibm.com] wrote:
> 
>>The following patch fixes a race condition in abort processing. With this
>>patch, the mid-layer can now guarantee to LLDs that it will only call
>>eh_abort for ops which returned 0 in queuecommand and have not yet had 
>>their ->done function called.
> 
> 
> 
> How does this address the race of scsi_times_out has fired and the LLD
> calls scsi_done. scsi_done will return prior to setting the serial
> number to 0 if the timer has fired. The LLDD will believe scsi_done
> added the command to the scsi_done_q, but it did not.

It doesn't;) I missed that... Is the fix as simple as changing
scsi_times_out to zero the serial_number before calling scsi_delete_timer,
or am I missing something?



-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2004-02-25 21:11 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-02-25 16:11 [PATCH] Fix eh_abort race condition Brian King
2004-02-25 17:55 ` Mike Anderson
2004-02-25 21:10   ` Brian King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).