[PATCH] qla2xxx: fix bad locking during eh

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] qla2xxx: fix bad locking during eh_abort
@ 2005-05-26 23:19 Andrew Vasquez
  2005-05-27 20:06 ` James Bottomley
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Vasquez @ 2005-05-26 23:19 UTC (permalink / raw)
  To: Linux-SCSI Mailing List; +Cc: James Bottomley

James,

Please apply, this should go in before 2.6.12 is released.




Correct incorrect locking order in qla2xxx_eh_abort() handler which
would case a hang during certain code-paths.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
---

 drivers/scsi/qla2xxx/qla_os.c |    3 +--
 1 files changed, 1 insertion(+), 2 deletions(-)

drivers/scsi/qla2xxx/qla_os.c: needs update
Index: drivers/scsi/qla2xxx/qla_os.c
===================================================================
--- d9be308337b4cca0f0829b0bd62f1d5b830954e1/drivers/scsi/qla2xxx/qla_os.c  (mode:100644)
+++ uncommitted/drivers/scsi/qla2xxx/qla_os.c  (mode:100644)
@@ -547,16 +547,15 @@
 
 		break;
 	}
+	spin_unlock(&ha->hardware_lock);
 
 	/* Wait for the command to be returned. */
 	if (ret == SUCCESS) {
-		spin_unlock(&ha->hardware_lock);
 		if (qla2x00_eh_wait_on_command(ha, cmd) != QLA_SUCCESS) {
 			qla_printk(KERN_ERR, ha, 
 			    "scsi(%ld:%d:%d): Abort handler timed out -- %lx "
 			    "%x.\n", ha->host_no, id, lun, serial, ret);
 		}
-		spin_lock(&ha->hardware_lock);
 	}
 	spin_lock_irq(ha->host->host_lock);
 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] qla2xxx: fix bad locking during eh_abort
  2005-05-26 23:19 [PATCH] qla2xxx: fix bad locking during eh_abort Andrew Vasquez
@ 2005-05-27 20:06 ` James Bottomley
  2005-05-27 20:18   ` Andrew Vasquez
  0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2005-05-27 20:06 UTC (permalink / raw)
  To: Andrew Vasquez; +Cc: Linux-SCSI Mailing List

On Thu, 2005-05-26 at 16:19 -0700, Andrew Vasquez wrote:
> Please apply, this should go in before 2.6.12 is released.

I've got it in the rc-fixes tree for scsi.  However:

> +	spin_unlock(&ha->hardware_lock);

Should be spin_unlock_irq(&ha->hardware_lock); shouldn't it?  Otherwise
we could sleep with interrupts disabled and the kernel now squeaks about
that.

James



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] qla2xxx: fix bad locking during eh_abort
  2005-05-27 20:06 ` James Bottomley
@ 2005-05-27 20:18   ` Andrew Vasquez
  2005-05-27 20:30     ` Jeff Garzik
  2005-05-27 20:31     ` James Bottomley
  0 siblings, 2 replies; 9+ messages in thread
From: Andrew Vasquez @ 2005-05-27 20:18 UTC (permalink / raw)
  To: James Bottomley, Jeff Garzik; +Cc: Linux-SCSI Mailing List

On Fri, 27 May 2005, James Bottomley wrote:

> On Thu, 2005-05-26 at 16:19 -0700, Andrew Vasquez wrote:
> > Please apply, this should go in before 2.6.12 is released.
> 
> I've got it in the rc-fixes tree for scsi.  However:
> 
> > +	spin_unlock(&ha->hardware_lock);
> 
> Should be spin_unlock_irq(&ha->hardware_lock); shouldn't it?  Otherwise
> we could sleep with interrupts disabled and the kernel now squeaks about
> that.

Yes, with the latest changes being proposed/implemented by Jeff G.,
there would need to be some additional massaging of the driver's
eh_*() routines.

Are you planning on putting the host_lock-free changes in for -rc,
I figured that would be going into the next kernel rev.  I mention
that because I have a block of patches which I have queued-up to add
new chip support to the driver.

Jeff, could you drop the qla2xxx driver from your scrubing
host_lock-free changes.  I'll go ahead and post an updated patch
following the patches in my queue.

Thanks,
Andrew Vasquez

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] qla2xxx: fix bad locking during eh_abort
  2005-05-27 20:18   ` Andrew Vasquez
@ 2005-05-27 20:30     ` Jeff Garzik
  2005-05-27 20:38       ` James Bottomley
  2005-05-27 20:31     ` James Bottomley
  1 sibling, 1 reply; 9+ messages in thread
From: Jeff Garzik @ 2005-05-27 20:30 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Vasquez, Linux-SCSI Mailing List

Andrew Vasquez wrote:
> On Fri, 27 May 2005, James Bottomley wrote:
> 
> 
>>On Thu, 2005-05-26 at 16:19 -0700, Andrew Vasquez wrote:
>>
>>>Please apply, this should go in before 2.6.12 is released.
>>
>>I've got it in the rc-fixes tree for scsi.  However:
>>
>>
>>>+	spin_unlock(&ha->hardware_lock);
>>
>>Should be spin_unlock_irq(&ha->hardware_lock); shouldn't it?  Otherwise
>>we could sleep with interrupts disabled and the kernel now squeaks about
>>that.

The ha->hardware_lock is obtained without spin_lock_irq(), so that's 
correct.

The statement immediately following that one is spin_lock_irq(host_lock)


> Yes, with the latest changes being proposed/implemented by Jeff G.,
> there would need to be some additional massaging of the driver's
> eh_*() routines.

No, I think James was under the impression that ha->hardware_lock was 
obtained with spin_lock_irq()


> Jeff, could you drop the qla2xxx driver from your scrubing
> host_lock-free changes.  I'll go ahead and post an updated patch
> following the patches in my queue.

I'll filter them out appropriately.  It's easier to leave in ATM, since 
the changes are off on an unofficial dev branch noone cares about.

	Jeff



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] qla2xxx: fix bad locking during eh_abort
  2005-05-27 20:30     ` Jeff Garzik
@ 2005-05-27 20:38       ` James Bottomley
  2005-05-27 20:42         ` Jeff Garzik
  0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2005-05-27 20:38 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andrew Vasquez, Linux-SCSI Mailing List

On Fri, 2005-05-27 at 16:30 -0400, Jeff Garzik wrote:
> The ha->hardware_lock is obtained without spin_lock_irq(), so that's 
> correct.

Yes, that was my confusion

But ... I wish you hadn't pointed it out.  Now I look at the driver, the
ha->hardware_lock is taken at interrupt level (in the interrupt
routines).

However, this sequence of code:

	spin_unlock_irq(ha->host->host_lock);
	spin_lock(&ha->hardware_lock);

Enables interrupts then takes this lock.  If we ever get a qla interrupt
before we drop the ha->hardware_lock again, that will be a classic
deadlock.

James

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] qla2xxx: fix bad locking during eh_abort
  2005-05-27 20:38       ` James Bottomley
@ 2005-05-27 20:42         ` Jeff Garzik
  2005-05-27 22:04           ` Andrew Vasquez
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Garzik @ 2005-05-27 20:42 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Vasquez, Linux-SCSI Mailing List

James Bottomley wrote:
> On Fri, 2005-05-27 at 16:30 -0400, Jeff Garzik wrote:
> 
>>The ha->hardware_lock is obtained without spin_lock_irq(), so that's 
>>correct.
> 
> 
> Yes, that was my confusion
> 
> But ... I wish you hadn't pointed it out.  Now I look at the driver, the
> ha->hardware_lock is taken at interrupt level (in the interrupt
> routines).
> 
> However, this sequence of code:
> 
> 	spin_unlock_irq(ha->host->host_lock);
> 	spin_lock(&ha->hardware_lock);
> 
> Enables interrupts then takes this lock.  If we ever get a qla interrupt
> before we drop the ha->hardware_lock again, that will be a classic
> deadlock.

Agreed, this was my analysis as well.

This driver appears to get locking -backwards-:

it uses spin_lock_irqsave() in interrupt handler.
it does not use spin_lock_irqsave() outside interrupt handler.

IMO this is a 2.6.12-rc issue...

	Jeff




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] qla2xxx: fix bad locking during eh_abort
  2005-05-27 20:42         ` Jeff Garzik
@ 2005-05-27 22:04           ` Andrew Vasquez
  2005-05-27 22:47             ` Jeff Garzik
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Vasquez @ 2005-05-27 22:04 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: James Bottomley, Linux-SCSI Mailing List

On Fri, 27 May 2005, Jeff Garzik wrote:

> James Bottomley wrote:
> >On Fri, 2005-05-27 at 16:30 -0400, Jeff Garzik wrote:
> >
> >>The ha->hardware_lock is obtained without spin_lock_irq(), so that's 
> >>correct.
> >
> >
> >Yes, that was my confusion
> >
> >But ... I wish you hadn't pointed it out.  Now I look at the driver, the
> >ha->hardware_lock is taken at interrupt level (in the interrupt
> >routines).
> >
> >However, this sequence of code:
> >
> >	spin_unlock_irq(ha->host->host_lock);
> >	spin_lock(&ha->hardware_lock);
> >
> >Enables interrupts then takes this lock.  If we ever get a qla interrupt
> >before we drop the ha->hardware_lock again, that will be a classic
> >deadlock.
> 
> Agreed, this was my analysis as well.
> 
> This driver appears to get locking -backwards-:
> 
> it uses spin_lock_irqsave() in interrupt handler.

Actually, the ISR function is used as a polling function during
hardware initialization.  But since at that point we've never
registered it via request_irq(), it's safe to not use the
_irqsave()/_irqrestore() variants.

> it does not use spin_lock_irqsave() outside interrupt handler.
> 

No, only within EH routines where I dealt (poorly) with the nested
locking...

> IMO this is a 2.6.12-rc issue...

Arrg...  This would essentailly revert a patch I sent last year:

	ftp://ftp.qlogic.com/outgoing/linux/patches/8.x/8.00.00b14k/18-eh_fixes.patch

which I submitted to 'take care of' these nested spinlock issues.
I'll go through my old emails to look for some 'justification' for the
original patch (and subsequently climb into a dark whole for three
days, when I can't).

--
av



Index: drivers/scsi/qla2xxx/qla_os.c
===================================================================
--- d9be308337b4cca0f0829b0bd62f1d5b830954e1/drivers/scsi/qla2xxx/qla_os.c  (mode:100644)
+++ uncommitted/drivers/scsi/qla2xxx/qla_os.c  (mode:100644)
@@ -507,6 +507,7 @@
 	int ret, i;
 	unsigned int id, lun;
 	unsigned long serial;
+	unsigned long flags;
 
 	if (!CMD_SP(cmd))
 		return FAILED;
@@ -519,7 +520,7 @@
 
 	/* Check active list for command command. */
 	spin_unlock_irq(ha->host->host_lock);
-	spin_lock(&ha->hardware_lock);
+	spin_lock_irqsave(&ha->hardware_lock, flags);
 	for (i = 1; i < MAX_OUTSTANDING_COMMANDS; i++) {
 		sp = ha->outstanding_cmds[i];
 
@@ -534,7 +535,7 @@
 		    sp->state));
 		DEBUG3(qla2x00_print_scsi_cmd(cmd);)
 
-		spin_unlock(&ha->hardware_lock);
+		spin_unlock_irqrestore(&ha->hardware_lock, flags);
 		if (qla2x00_abort_command(ha, sp)) {
 			DEBUG2(printk("%s(%ld): abort_command "
 			    "mbx failed.\n", __func__, ha->host_no));
@@ -543,20 +544,19 @@
 			    "mbx success.\n", __func__, ha->host_no));
 			ret = SUCCESS;
 		}
-		spin_lock(&ha->hardware_lock);
+		spin_lock_irqsave(&ha->hardware_lock, flags);
 
 		break;
 	}
+	spin_unlock_irqrestore(&ha->hardware_lock, flags);
 
 	/* Wait for the command to be returned. */
 	if (ret == SUCCESS) {
-		spin_unlock(&ha->hardware_lock);
 		if (qla2x00_eh_wait_on_command(ha, cmd) != QLA_SUCCESS) {
 			qla_printk(KERN_ERR, ha, 
 			    "scsi(%ld:%d:%d): Abort handler timed out -- %lx "
 			    "%x.\n", ha->host_no, id, lun, serial, ret);
 		}
-		spin_lock(&ha->hardware_lock);
 	}
 	spin_lock_irq(ha->host->host_lock);
 
@@ -588,6 +588,7 @@
 	int	status;
 	srb_t		*sp;
 	struct scsi_cmnd *cmd;
+	unsigned long flags;
 
 	status = 0;
 
@@ -596,11 +597,11 @@
 	 * array
 	 */
 	for (cnt = 1; cnt < MAX_OUTSTANDING_COMMANDS; cnt++) {
-		spin_lock(&ha->hardware_lock);
+		spin_lock_irqsave(&ha->hardware_lock, flags);
 		sp = ha->outstanding_cmds[cnt];
 		if (sp) {
 			cmd = sp->cmd;
-			spin_unlock(&ha->hardware_lock);
+			spin_unlock_irqrestore(&ha->hardware_lock, flags);
 			if (cmd->device->id == t) {
 				if (!qla2x00_eh_wait_on_command(ha, cmd)) {
 					status = 1;
@@ -608,7 +609,7 @@
 				}
 			}
 		} else {
-			spin_unlock(&ha->hardware_lock);
+			spin_unlock_irqrestore(&ha->hardware_lock, flags);
 		}
 	}
 	return (status);
@@ -740,6 +741,7 @@
 	int	status;
 	srb_t		*sp;
 	struct scsi_cmnd *cmd;
+	unsigned long flags;
 
 	status = 1;
 
@@ -748,17 +750,17 @@
 	 * array
 	 */
 	for (cnt = 1; cnt < MAX_OUTSTANDING_COMMANDS; cnt++) {
-		spin_lock(&ha->hardware_lock);
+		spin_lock_irqsave(&ha->hardware_lock, flags);
 		sp = ha->outstanding_cmds[cnt];
 		if (sp) {
 			cmd = sp->cmd;
-			spin_unlock(&ha->hardware_lock);
+			spin_unlock_irqrestore(&ha->hardware_lock, flags);
 			status = qla2x00_eh_wait_on_command(ha, cmd);
 			if (status == 0)
 				break;
 		}
 		else {
-			spin_unlock(&ha->hardware_lock);
+			spin_unlock_irqrestore(&ha->hardware_lock, flags);
 		}
 	}
 	return (status);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] qla2xxx: fix bad locking during eh_abort
  2005-05-27 22:04           ` Andrew Vasquez
@ 2005-05-27 22:47             ` Jeff Garzik
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff Garzik @ 2005-05-27 22:47 UTC (permalink / raw)
  To: Andrew Vasquez; +Cc: James Bottomley, Linux-SCSI Mailing List

Andrew Vasquez wrote:
> On Fri, 27 May 2005, Jeff Garzik wrote:
> 
> 
>>James Bottomley wrote:
>>
>>>On Fri, 2005-05-27 at 16:30 -0400, Jeff Garzik wrote:
>>>
>>>
>>>>The ha->hardware_lock is obtained without spin_lock_irq(), so that's 
>>>>correct.
>>>
>>>
>>>Yes, that was my confusion
>>>
>>>But ... I wish you hadn't pointed it out.  Now I look at the driver, the
>>>ha->hardware_lock is taken at interrupt level (in the interrupt
>>>routines).
>>>
>>>However, this sequence of code:
>>>
>>>	spin_unlock_irq(ha->host->host_lock);
>>>	spin_lock(&ha->hardware_lock);
>>>
>>>Enables interrupts then takes this lock.  If we ever get a qla interrupt
>>>before we drop the ha->hardware_lock again, that will be a classic
>>>deadlock.
>>
>>Agreed, this was my analysis as well.
>>
>>This driver appears to get locking -backwards-:
>>
>>it uses spin_lock_irqsave() in interrupt handler.
> 
> 
> Actually, the ISR function is used as a polling function during
> hardware initialization.  But since at that point we've never
> registered it via request_irq(), it's safe to not use the
> _irqsave()/_irqrestore() variants.

I see what's going on, thanks for explaining.

Note using _irqsave in the interrupt handler is not recommended and 
quite unusual, for performance reasons if nothing else.  The fast, 
common path [interrupt handler] should use the least expensive spinlock 
variant: spin_lock().  This implies that other [slow] paths need to be 
changed to call local_irq_save() or spin_lock_irq() or whatnot.

	Jeff



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] qla2xxx: fix bad locking during eh_abort
  2005-05-27 20:18   ` Andrew Vasquez
  2005-05-27 20:30     ` Jeff Garzik
@ 2005-05-27 20:31     ` James Bottomley
  1 sibling, 0 replies; 9+ messages in thread
From: James Bottomley @ 2005-05-27 20:31 UTC (permalink / raw)
  To: Andrew Vasquez; +Cc: Jeff Garzik, Linux-SCSI Mailing List

On Fri, 2005-05-27 at 13:18 -0700, Andrew Vasquez wrote:
> Yes, with the latest changes being proposed/implemented by Jeff G.,
> there would need to be some additional massaging of the driver's
> eh_*() routines.
> 
> Are you planning on putting the host_lock-free changes in for -rc,
> I figured that would be going into the next kernel rev.  I mention
> that because I have a block of patches which I have queued-up to add
> new chip support to the driver.

No ... we're too close to a real kernel to put something like that in.
I'm just doing minimal essential bug fixes at the moment.

James



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2005-05-27 22:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-05-26 23:19 [PATCH] qla2xxx: fix bad locking during eh_abort Andrew Vasquez
2005-05-27 20:06 ` James Bottomley
2005-05-27 20:18   ` Andrew Vasquez
2005-05-27 20:30     ` Jeff Garzik
2005-05-27 20:38       ` James Bottomley
2005-05-27 20:42         ` Jeff Garzik
2005-05-27 22:04           ` Andrew Vasquez
2005-05-27 22:47             ` Jeff Garzik
2005-05-27 20:31     ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox