* [PATCH] qla2xxx: fix bad locking during eh_abort
@ 2005-05-26 23:19 Andrew Vasquez
2005-05-27 20:06 ` James Bottomley
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Vasquez @ 2005-05-26 23:19 UTC (permalink / raw)
To: Linux-SCSI Mailing List; +Cc: James Bottomley
James,
Please apply, this should go in before 2.6.12 is released.
Correct incorrect locking order in qla2xxx_eh_abort() handler which
would case a hang during certain code-paths.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
---
drivers/scsi/qla2xxx/qla_os.c | 3 +--
1 files changed, 1 insertion(+), 2 deletions(-)
drivers/scsi/qla2xxx/qla_os.c: needs update
Index: drivers/scsi/qla2xxx/qla_os.c
===================================================================
--- d9be308337b4cca0f0829b0bd62f1d5b830954e1/drivers/scsi/qla2xxx/qla_os.c (mode:100644)
+++ uncommitted/drivers/scsi/qla2xxx/qla_os.c (mode:100644)
@@ -547,16 +547,15 @@
break;
}
+ spin_unlock(&ha->hardware_lock);
/* Wait for the command to be returned. */
if (ret == SUCCESS) {
- spin_unlock(&ha->hardware_lock);
if (qla2x00_eh_wait_on_command(ha, cmd) != QLA_SUCCESS) {
qla_printk(KERN_ERR, ha,
"scsi(%ld:%d:%d): Abort handler timed out -- %lx "
"%x.\n", ha->host_no, id, lun, serial, ret);
}
- spin_lock(&ha->hardware_lock);
}
spin_lock_irq(ha->host->host_lock);
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] qla2xxx: fix bad locking during eh_abort 2005-05-26 23:19 [PATCH] qla2xxx: fix bad locking during eh_abort Andrew Vasquez @ 2005-05-27 20:06 ` James Bottomley 2005-05-27 20:18 ` Andrew Vasquez 0 siblings, 1 reply; 9+ messages in thread From: James Bottomley @ 2005-05-27 20:06 UTC (permalink / raw) To: Andrew Vasquez; +Cc: Linux-SCSI Mailing List On Thu, 2005-05-26 at 16:19 -0700, Andrew Vasquez wrote: > Please apply, this should go in before 2.6.12 is released. I've got it in the rc-fixes tree for scsi. However: > + spin_unlock(&ha->hardware_lock); Should be spin_unlock_irq(&ha->hardware_lock); shouldn't it? Otherwise we could sleep with interrupts disabled and the kernel now squeaks about that. James ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] qla2xxx: fix bad locking during eh_abort 2005-05-27 20:06 ` James Bottomley @ 2005-05-27 20:18 ` Andrew Vasquez 2005-05-27 20:30 ` Jeff Garzik 2005-05-27 20:31 ` James Bottomley 0 siblings, 2 replies; 9+ messages in thread From: Andrew Vasquez @ 2005-05-27 20:18 UTC (permalink / raw) To: James Bottomley, Jeff Garzik; +Cc: Linux-SCSI Mailing List On Fri, 27 May 2005, James Bottomley wrote: > On Thu, 2005-05-26 at 16:19 -0700, Andrew Vasquez wrote: > > Please apply, this should go in before 2.6.12 is released. > > I've got it in the rc-fixes tree for scsi. However: > > > + spin_unlock(&ha->hardware_lock); > > Should be spin_unlock_irq(&ha->hardware_lock); shouldn't it? Otherwise > we could sleep with interrupts disabled and the kernel now squeaks about > that. Yes, with the latest changes being proposed/implemented by Jeff G., there would need to be some additional massaging of the driver's eh_*() routines. Are you planning on putting the host_lock-free changes in for -rc, I figured that would be going into the next kernel rev. I mention that because I have a block of patches which I have queued-up to add new chip support to the driver. Jeff, could you drop the qla2xxx driver from your scrubing host_lock-free changes. I'll go ahead and post an updated patch following the patches in my queue. Thanks, Andrew Vasquez ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] qla2xxx: fix bad locking during eh_abort 2005-05-27 20:18 ` Andrew Vasquez @ 2005-05-27 20:30 ` Jeff Garzik 2005-05-27 20:38 ` James Bottomley 2005-05-27 20:31 ` James Bottomley 1 sibling, 1 reply; 9+ messages in thread From: Jeff Garzik @ 2005-05-27 20:30 UTC (permalink / raw) To: James Bottomley; +Cc: Andrew Vasquez, Linux-SCSI Mailing List Andrew Vasquez wrote: > On Fri, 27 May 2005, James Bottomley wrote: > > >>On Thu, 2005-05-26 at 16:19 -0700, Andrew Vasquez wrote: >> >>>Please apply, this should go in before 2.6.12 is released. >> >>I've got it in the rc-fixes tree for scsi. However: >> >> >>>+ spin_unlock(&ha->hardware_lock); >> >>Should be spin_unlock_irq(&ha->hardware_lock); shouldn't it? Otherwise >>we could sleep with interrupts disabled and the kernel now squeaks about >>that. The ha->hardware_lock is obtained without spin_lock_irq(), so that's correct. The statement immediately following that one is spin_lock_irq(host_lock) > Yes, with the latest changes being proposed/implemented by Jeff G., > there would need to be some additional massaging of the driver's > eh_*() routines. No, I think James was under the impression that ha->hardware_lock was obtained with spin_lock_irq() > Jeff, could you drop the qla2xxx driver from your scrubing > host_lock-free changes. I'll go ahead and post an updated patch > following the patches in my queue. I'll filter them out appropriately. It's easier to leave in ATM, since the changes are off on an unofficial dev branch noone cares about. Jeff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] qla2xxx: fix bad locking during eh_abort 2005-05-27 20:30 ` Jeff Garzik @ 2005-05-27 20:38 ` James Bottomley 2005-05-27 20:42 ` Jeff Garzik 0 siblings, 1 reply; 9+ messages in thread From: James Bottomley @ 2005-05-27 20:38 UTC (permalink / raw) To: Jeff Garzik; +Cc: Andrew Vasquez, Linux-SCSI Mailing List On Fri, 2005-05-27 at 16:30 -0400, Jeff Garzik wrote: > The ha->hardware_lock is obtained without spin_lock_irq(), so that's > correct. Yes, that was my confusion But ... I wish you hadn't pointed it out. Now I look at the driver, the ha->hardware_lock is taken at interrupt level (in the interrupt routines). However, this sequence of code: spin_unlock_irq(ha->host->host_lock); spin_lock(&ha->hardware_lock); Enables interrupts then takes this lock. If we ever get a qla interrupt before we drop the ha->hardware_lock again, that will be a classic deadlock. James ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] qla2xxx: fix bad locking during eh_abort 2005-05-27 20:38 ` James Bottomley @ 2005-05-27 20:42 ` Jeff Garzik 2005-05-27 22:04 ` Andrew Vasquez 0 siblings, 1 reply; 9+ messages in thread From: Jeff Garzik @ 2005-05-27 20:42 UTC (permalink / raw) To: James Bottomley; +Cc: Andrew Vasquez, Linux-SCSI Mailing List James Bottomley wrote: > On Fri, 2005-05-27 at 16:30 -0400, Jeff Garzik wrote: > >>The ha->hardware_lock is obtained without spin_lock_irq(), so that's >>correct. > > > Yes, that was my confusion > > But ... I wish you hadn't pointed it out. Now I look at the driver, the > ha->hardware_lock is taken at interrupt level (in the interrupt > routines). > > However, this sequence of code: > > spin_unlock_irq(ha->host->host_lock); > spin_lock(&ha->hardware_lock); > > Enables interrupts then takes this lock. If we ever get a qla interrupt > before we drop the ha->hardware_lock again, that will be a classic > deadlock. Agreed, this was my analysis as well. This driver appears to get locking -backwards-: it uses spin_lock_irqsave() in interrupt handler. it does not use spin_lock_irqsave() outside interrupt handler. IMO this is a 2.6.12-rc issue... Jeff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] qla2xxx: fix bad locking during eh_abort 2005-05-27 20:42 ` Jeff Garzik @ 2005-05-27 22:04 ` Andrew Vasquez 2005-05-27 22:47 ` Jeff Garzik 0 siblings, 1 reply; 9+ messages in thread From: Andrew Vasquez @ 2005-05-27 22:04 UTC (permalink / raw) To: Jeff Garzik; +Cc: James Bottomley, Linux-SCSI Mailing List On Fri, 27 May 2005, Jeff Garzik wrote: > James Bottomley wrote: > >On Fri, 2005-05-27 at 16:30 -0400, Jeff Garzik wrote: > > > >>The ha->hardware_lock is obtained without spin_lock_irq(), so that's > >>correct. > > > > > >Yes, that was my confusion > > > >But ... I wish you hadn't pointed it out. Now I look at the driver, the > >ha->hardware_lock is taken at interrupt level (in the interrupt > >routines). > > > >However, this sequence of code: > > > > spin_unlock_irq(ha->host->host_lock); > > spin_lock(&ha->hardware_lock); > > > >Enables interrupts then takes this lock. If we ever get a qla interrupt > >before we drop the ha->hardware_lock again, that will be a classic > >deadlock. > > Agreed, this was my analysis as well. > > This driver appears to get locking -backwards-: > > it uses spin_lock_irqsave() in interrupt handler. Actually, the ISR function is used as a polling function during hardware initialization. But since at that point we've never registered it via request_irq(), it's safe to not use the _irqsave()/_irqrestore() variants. > it does not use spin_lock_irqsave() outside interrupt handler. > No, only within EH routines where I dealt (poorly) with the nested locking... > IMO this is a 2.6.12-rc issue... Arrg... This would essentailly revert a patch I sent last year: ftp://ftp.qlogic.com/outgoing/linux/patches/8.x/8.00.00b14k/18-eh_fixes.patch which I submitted to 'take care of' these nested spinlock issues. I'll go through my old emails to look for some 'justification' for the original patch (and subsequently climb into a dark whole for three days, when I can't). -- av Index: drivers/scsi/qla2xxx/qla_os.c =================================================================== --- d9be308337b4cca0f0829b0bd62f1d5b830954e1/drivers/scsi/qla2xxx/qla_os.c (mode:100644) +++ uncommitted/drivers/scsi/qla2xxx/qla_os.c (mode:100644) @@ -507,6 +507,7 @@ int ret, i; unsigned int id, lun; unsigned long serial; + unsigned long flags; if (!CMD_SP(cmd)) return FAILED; @@ -519,7 +520,7 @@ /* Check active list for command command. */ spin_unlock_irq(ha->host->host_lock); - spin_lock(&ha->hardware_lock); + spin_lock_irqsave(&ha->hardware_lock, flags); for (i = 1; i < MAX_OUTSTANDING_COMMANDS; i++) { sp = ha->outstanding_cmds[i]; @@ -534,7 +535,7 @@ sp->state)); DEBUG3(qla2x00_print_scsi_cmd(cmd);) - spin_unlock(&ha->hardware_lock); + spin_unlock_irqrestore(&ha->hardware_lock, flags); if (qla2x00_abort_command(ha, sp)) { DEBUG2(printk("%s(%ld): abort_command " "mbx failed.\n", __func__, ha->host_no)); @@ -543,20 +544,19 @@ "mbx success.\n", __func__, ha->host_no)); ret = SUCCESS; } - spin_lock(&ha->hardware_lock); + spin_lock_irqsave(&ha->hardware_lock, flags); break; } + spin_unlock_irqrestore(&ha->hardware_lock, flags); /* Wait for the command to be returned. */ if (ret == SUCCESS) { - spin_unlock(&ha->hardware_lock); if (qla2x00_eh_wait_on_command(ha, cmd) != QLA_SUCCESS) { qla_printk(KERN_ERR, ha, "scsi(%ld:%d:%d): Abort handler timed out -- %lx " "%x.\n", ha->host_no, id, lun, serial, ret); } - spin_lock(&ha->hardware_lock); } spin_lock_irq(ha->host->host_lock); @@ -588,6 +588,7 @@ int status; srb_t *sp; struct scsi_cmnd *cmd; + unsigned long flags; status = 0; @@ -596,11 +597,11 @@ * array */ for (cnt = 1; cnt < MAX_OUTSTANDING_COMMANDS; cnt++) { - spin_lock(&ha->hardware_lock); + spin_lock_irqsave(&ha->hardware_lock, flags); sp = ha->outstanding_cmds[cnt]; if (sp) { cmd = sp->cmd; - spin_unlock(&ha->hardware_lock); + spin_unlock_irqrestore(&ha->hardware_lock, flags); if (cmd->device->id == t) { if (!qla2x00_eh_wait_on_command(ha, cmd)) { status = 1; @@ -608,7 +609,7 @@ } } } else { - spin_unlock(&ha->hardware_lock); + spin_unlock_irqrestore(&ha->hardware_lock, flags); } } return (status); @@ -740,6 +741,7 @@ int status; srb_t *sp; struct scsi_cmnd *cmd; + unsigned long flags; status = 1; @@ -748,17 +750,17 @@ * array */ for (cnt = 1; cnt < MAX_OUTSTANDING_COMMANDS; cnt++) { - spin_lock(&ha->hardware_lock); + spin_lock_irqsave(&ha->hardware_lock, flags); sp = ha->outstanding_cmds[cnt]; if (sp) { cmd = sp->cmd; - spin_unlock(&ha->hardware_lock); + spin_unlock_irqrestore(&ha->hardware_lock, flags); status = qla2x00_eh_wait_on_command(ha, cmd); if (status == 0) break; } else { - spin_unlock(&ha->hardware_lock); + spin_unlock_irqrestore(&ha->hardware_lock, flags); } } return (status); ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] qla2xxx: fix bad locking during eh_abort 2005-05-27 22:04 ` Andrew Vasquez @ 2005-05-27 22:47 ` Jeff Garzik 0 siblings, 0 replies; 9+ messages in thread From: Jeff Garzik @ 2005-05-27 22:47 UTC (permalink / raw) To: Andrew Vasquez; +Cc: James Bottomley, Linux-SCSI Mailing List Andrew Vasquez wrote: > On Fri, 27 May 2005, Jeff Garzik wrote: > > >>James Bottomley wrote: >> >>>On Fri, 2005-05-27 at 16:30 -0400, Jeff Garzik wrote: >>> >>> >>>>The ha->hardware_lock is obtained without spin_lock_irq(), so that's >>>>correct. >>> >>> >>>Yes, that was my confusion >>> >>>But ... I wish you hadn't pointed it out. Now I look at the driver, the >>>ha->hardware_lock is taken at interrupt level (in the interrupt >>>routines). >>> >>>However, this sequence of code: >>> >>> spin_unlock_irq(ha->host->host_lock); >>> spin_lock(&ha->hardware_lock); >>> >>>Enables interrupts then takes this lock. If we ever get a qla interrupt >>>before we drop the ha->hardware_lock again, that will be a classic >>>deadlock. >> >>Agreed, this was my analysis as well. >> >>This driver appears to get locking -backwards-: >> >>it uses spin_lock_irqsave() in interrupt handler. > > > Actually, the ISR function is used as a polling function during > hardware initialization. But since at that point we've never > registered it via request_irq(), it's safe to not use the > _irqsave()/_irqrestore() variants. I see what's going on, thanks for explaining. Note using _irqsave in the interrupt handler is not recommended and quite unusual, for performance reasons if nothing else. The fast, common path [interrupt handler] should use the least expensive spinlock variant: spin_lock(). This implies that other [slow] paths need to be changed to call local_irq_save() or spin_lock_irq() or whatnot. Jeff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] qla2xxx: fix bad locking during eh_abort 2005-05-27 20:18 ` Andrew Vasquez 2005-05-27 20:30 ` Jeff Garzik @ 2005-05-27 20:31 ` James Bottomley 1 sibling, 0 replies; 9+ messages in thread From: James Bottomley @ 2005-05-27 20:31 UTC (permalink / raw) To: Andrew Vasquez; +Cc: Jeff Garzik, Linux-SCSI Mailing List On Fri, 2005-05-27 at 13:18 -0700, Andrew Vasquez wrote: > Yes, with the latest changes being proposed/implemented by Jeff G., > there would need to be some additional massaging of the driver's > eh_*() routines. > > Are you planning on putting the host_lock-free changes in for -rc, > I figured that would be going into the next kernel rev. I mention > that because I have a block of patches which I have queued-up to add > new chip support to the driver. No ... we're too close to a real kernel to put something like that in. I'm just doing minimal essential bug fixes at the moment. James ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2005-05-27 22:47 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-05-26 23:19 [PATCH] qla2xxx: fix bad locking during eh_abort Andrew Vasquez 2005-05-27 20:06 ` James Bottomley 2005-05-27 20:18 ` Andrew Vasquez 2005-05-27 20:30 ` Jeff Garzik 2005-05-27 20:38 ` James Bottomley 2005-05-27 20:42 ` Jeff Garzik 2005-05-27 22:04 ` Andrew Vasquez 2005-05-27 22:47 ` Jeff Garzik 2005-05-27 20:31 ` James Bottomley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox