All of lore.kernel.org
 help / color / mirror / Atom feed
* RE: [PATCH 6/11]  lpfc 8.1.1 : Fixes to error handlers
@ 2005-11-28 21:16 James.Smart
  2005-11-29  7:11 ` Douglas Gilbert
  2005-11-30 19:17 ` Christoph Hellwig
  0 siblings, 2 replies; 9+ messages in thread
From: James.Smart @ 2005-11-28 21:16 UTC (permalink / raw)
  To: hch; +Cc: stefanr, linux-scsi


> So you this storage vendor please provide us this test case 
> as a start?

Basic test case:
 - start load on the fc disks
 - run sg_reset -b /dev/sgX or sg_reset -d /dev/sgX

> That beeing said I mentioned that we need to do SG-based 
> resets via the
> EH thread aswell.  As Emulex apparently cares about the above 
> case would
> you mind implementing it?

Well - we would prefer the vendor test the system as it should behave
in normal circumstances rather than test-suite-derived conditions.

Yes, we will sign up for this. However, we have several items on our
plate so we won't be on this quickly. I would hope there would be no
issue with accepting the patch as is and backing out the relatived code
as part of the eventual patch.  We'll also be contacting the SG maintainer
to get his/her thoughts.

-- james


^ permalink raw reply	[flat|nested] 9+ messages in thread
* RE: [PATCH 6/11]  lpfc 8.1.1 : Fixes to error handlers
@ 2005-12-01 15:37 James.Smart
  0 siblings, 0 replies; 9+ messages in thread
From: James.Smart @ 2005-12-01 15:37 UTC (permalink / raw)
  To: hch; +Cc: stefanr, linux-scsi

> > Yes, we will sign up for this. However, we have several items on our
> > plate so we won't be on this quickly. I would hope there would be no
> > issue with accepting the patch as is and backing out the 
> relatived code
> > as part of the eventual patch.
> 
> No.  As you might have noticed fixing mid-layer bugs in 
> drivers is generally
> not okay.

Bug ?  I thought this was a feature.....  :)

I think it's a little silly to use the strong-arm tactics on this kind of
patch. I've already committed to fix it. All it does is lessen -rc testing
on the driver changes, especially if it holds up the whole group of patches
which includes things (and bug fixes) that are unrelated.

Given your response, you deem this high priority. So, I'll get on the fix
and try to get it done in time to make 2.6.15.

-- james

^ permalink raw reply	[flat|nested] 9+ messages in thread
* RE: [PATCH 6/11]  lpfc 8.1.1 : Fixes to error handlers
@ 2005-11-28 19:08 James.Smart
  2005-11-28 19:27 ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: James.Smart @ 2005-11-28 19:08 UTC (permalink / raw)
  To: stefanr, linux-scsi

This is true - when invoked via the error handler thread.  Unfortunately,
reset functions invoked via SG do not have the same behavior. There is a
rather large test suite from a big storage vendor that uses asynchronous
resets via SG extensively.

-- james.

> -----Original Message-----
> From: Stefan Richter [mailto:stefanr@s5r6.in-berlin.de]
> Sent: Monday, November 28, 2005 2:03 PM
> To: linux-scsi@vger.kernel.org
> Cc: Smart, James
> Subject: Re: [PATCH 6/11] lpfc 8.1.1 : Fixes to error handlers
> 
> 
> James Smart wrote:
> > - Serialize EH calls and block requests when EH function is running.
> 
> Is this necessary? From what I understood from docs or 
> comments, there 
> is nothing enqueued while an EH is run. (Furthermore, EHs are not 
> concurrent per host.) Or am I mistaken?
> -- 
> Stefan Richter
> -=====-=-=-= =-== ===--
> http://arcgraph.de/sr/
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread
* [PATCH 6/11]  lpfc 8.1.1 : Fixes to error handlers
@ 2005-11-28 16:42 James.Smart
  2005-11-28 19:02 ` Stefan Richter
  0 siblings, 1 reply; 9+ messages in thread
From: James.Smart @ 2005-11-28 16:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: James.Smart


Fixes to error handlers:
- Release task management command before counting outstanding commands.
  TMF was being erroneously counted as an active outstanding command.
- Serialize EH calls and block requests when EH function is running.


Signed-off-by: James Smart <James.Smart@emulex.com>


diff -upNr a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
--- a/drivers/scsi/lpfc/lpfc.h	2005-11-24 10:07:36.000000000 -0500
+++ b/drivers/scsi/lpfc/lpfc.h	2005-11-28 10:32:43.000000000 -0500
@@ -167,6 +167,7 @@ struct lpfc_hba {
 	dma_addr_t slim2p_mapping;
 	uint16_t pci_cfg_value;
 
+	struct semaphore hba_can_block;
 	uint32_t hba_state;
 
 #define LPFC_INIT_START           1	/* Initial state after board reset */
diff -upNr a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
--- a/drivers/scsi/lpfc/lpfc_init.c	2005-11-28 10:15:37.000000000 -0500
+++ b/drivers/scsi/lpfc/lpfc_init.c	2005-11-28 10:32:43.000000000 -0500
@@ -1345,7 +1345,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev,
 		goto out_put_host;
 
 	host->unique_id = phba->brd_no;
-
+	init_MUTEX(&phba->hba_can_block);
 	INIT_LIST_HEAD(&phba->ctrspbuflist);
 	INIT_LIST_HEAD(&phba->rnidrspbuflist);
 	INIT_LIST_HEAD(&phba->freebufList);
diff -upNr a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
--- a/drivers/scsi/lpfc/lpfc_scsi.c	2005-11-28 10:26:47.000000000 -0500
+++ b/drivers/scsi/lpfc/lpfc_scsi.c	2005-11-28 10:32:43.000000000 -0500
@@ -41,6 +41,20 @@
 #define LPFC_ABORT_WAIT  2
 
 
+static inline void
+lpfc_block_requests(struct lpfc_hba * phba)
+{
+	down(&phba->hba_can_block);
+	scsi_block_requests(phba->host);
+}
+
+static inline void
+lpfc_unblock_requests(struct lpfc_hba * phba)
+{
+	scsi_unblock_requests(phba->host);
+	up(&phba->hba_can_block);
+}
+
 /*
  * This routine allocates a scsi buffer, which contains all the necessary
  * information needed to initiate a SCSI I/O.  The non-DMAable buffer region
@@ -774,6 +788,7 @@ lpfc_abort_handler(struct scsi_cmnd *cmn
 	unsigned int loop_count = 0;
 	int ret = SUCCESS;
 
+	lpfc_block_requests(phba);
 	spin_lock_irq(shost->host_lock);
 
 	lpfc_cmd = (struct lpfc_scsi_buf *)cmnd->host_scribble;
@@ -853,6 +868,7 @@ lpfc_abort_handler(struct scsi_cmnd *cmn
 			cmnd->device->lun, cmnd->serial_number);
 
 	spin_unlock_irq(shost->host_lock);
+	lpfc_unblock_requests(phba);
 
 	return ret;
 }
@@ -866,9 +882,11 @@ lpfc_reset_lun_handler(struct scsi_cmnd 
 	struct lpfc_iocbq *iocbq, *iocbqrsp;
 	struct lpfc_rport_data *rdata = cmnd->device->hostdata;
 	struct lpfc_nodelist *pnode = rdata->pnode;
+	uint32_t cmd_result = 0, cmd_status = 0;
 	int ret = FAILED;
 	int cnt, loopcnt;
 
+	lpfc_block_requests(phba);
 	spin_lock_irq(shost->host_lock);
 	/*
 	 * If target is not in a MAPPED state, delay the reset until
@@ -912,26 +930,28 @@ lpfc_reset_lun_handler(struct scsi_cmnd 
 	if (ret == IOCB_SUCCESS)
 		ret = SUCCESS;
 
-	lpfc_cmd->result = iocbqrsp->iocb.un.ulpWord[4];
-	lpfc_cmd->status = iocbqrsp->iocb.ulpStatus;
-	if (lpfc_cmd->status == IOSTAT_LOCAL_REJECT)
-		if (lpfc_cmd->result & IOERR_DRVR_MASK)
-			lpfc_cmd->status = IOSTAT_DRIVER_REJECT;
+
+	cmd_result = iocbqrsp->iocb.un.ulpWord[4];
+	cmd_status = iocbqrsp->iocb.ulpStatus;
+
+	lpfc_sli_release_iocbq(phba, iocbqrsp);
+	lpfc_release_scsi_buf(phba, lpfc_cmd);
 
 	/*
-	 * All outstanding txcmplq I/Os should have been aborted by the target.
+	 * All outstanding txcmplq I/Os should have been aborted by the device.
 	 * Unfortunately, some targets do not abide by this forcing the driver
 	 * to double check.
 	 */
-	lpfc_sli_abort_iocb(phba, &phba->sli.ring[phba->sli.fcp_ring],
-			    cmnd->device->id, cmnd->device->lun, 0,
-			    LPFC_CTX_LUN);
-
+	cnt = lpfc_sli_sum_iocb(phba, &phba->sli.ring[phba->sli.fcp_ring],
+				cmnd->device->id, cmnd->device->lun,
+				LPFC_CTX_LUN);
+	if (cnt)
+		lpfc_sli_abort_iocb(phba,
+				    &phba->sli.ring[phba->sli.fcp_ring],
+				    cmnd->device->id, cmnd->device->lun,
+				    0, LPFC_CTX_LUN);
 	loopcnt = 0;
-	while((cnt = lpfc_sli_sum_iocb(phba,
-				       &phba->sli.ring[phba->sli.fcp_ring],
-				       cmnd->device->id, cmnd->device->lun,
-				       LPFC_CTX_LUN))) {
+	while(cnt) {
 		spin_unlock_irq(phba->host->host_lock);
 		schedule_timeout_uninterruptible(LPFC_RESET_WAIT*HZ);
 		spin_lock_irq(phba->host->host_lock);
@@ -939,6 +959,11 @@ lpfc_reset_lun_handler(struct scsi_cmnd 
 		if (++loopcnt
 		    > (2 * phba->cfg_nodev_tmo)/LPFC_RESET_WAIT)
 			break;
+
+		cnt = lpfc_sli_sum_iocb(phba,
+					&phba->sli.ring[phba->sli.fcp_ring],
+					cmnd->device->id, cmnd->device->lun,
+					LPFC_CTX_LUN);
 	}
 
 	if (cnt) {
@@ -948,18 +973,16 @@ lpfc_reset_lun_handler(struct scsi_cmnd 
 		ret = FAILED;
 	}
 
-	lpfc_sli_release_iocbq(phba, iocbqrsp);
-
 out_free_scsi_buf:
 	lpfc_printf_log(phba, KERN_ERR, LOG_FCP,
 			"%d:0713 SCSI layer issued LUN reset (%d, %d) "
 			"Data: x%x x%x x%x\n",
-			phba->brd_no, lpfc_cmd->pCmd->device->id,
-			lpfc_cmd->pCmd->device->lun, ret, lpfc_cmd->status,
-			lpfc_cmd->result);
-	lpfc_release_scsi_buf(phba, lpfc_cmd);
+			phba->brd_no, cmnd->device->id,cmnd->device->lun,
+			ret, cmd_status, cmd_result);
+
 out:
 	spin_unlock_irq(shost->host_lock);
+	lpfc_unblock_requests(phba);
 	return ret;
 }
 
@@ -975,6 +998,7 @@ lpfc_reset_bus_handler(struct scsi_cmnd 
 	unsigned int midlayer_id = 0;
 	struct lpfc_scsi_buf * lpfc_cmd;
 
+	lpfc_block_requests(phba);
 	spin_lock_irq(shost->host_lock);
 
 	lpfc_cmd = lpfc_sli_get_scsi_buf (phba);
@@ -1008,18 +1032,31 @@ lpfc_reset_bus_handler(struct scsi_cmnd 
 		lpfc_cmd->pCmd->device->hostdata = ndlp->rport->dd_data;
 		ret = lpfc_scsi_tgt_reset(lpfc_cmd, phba);
 		if (ret != SUCCESS) {
-			lpfc_printf_log(phba, KERN_INFO, LOG_FCP,
+			lpfc_printf_log(phba, KERN_ERR, LOG_FCP,
 				"%d:0713 Bus Reset on target %d failed\n",
 				phba->brd_no, i);
 			err_count++;
 		}
 	}
 
+	if (err_count == 0)
+		ret = SUCCESS;
+
+	lpfc_release_scsi_buf(phba, lpfc_cmd);
+
+	/*
+	 * All outstanding txcmplq I/Os should have been aborted by
+	 * the targets.  Unfortunately, some targets do not abide by
+	 * this forcing the driver to double check.
+	 */
 	cmnd->device->id = midlayer_id;
+	cnt = lpfc_sli_sum_iocb(phba, &phba->sli.ring[phba->sli.fcp_ring],
+				0, 0, LPFC_CTX_HOST);
+	if (cnt)
+		lpfc_sli_abort_iocb(phba, &phba->sli.ring[phba->sli.fcp_ring],
+				    0, 0, 0, LPFC_CTX_HOST);
 	loopcnt = 0;
-	while((cnt = lpfc_sli_sum_iocb(phba,
-				&phba->sli.ring[phba->sli.fcp_ring],
-				0, 0, LPFC_CTX_HOST))) {
+	while(cnt) {
 		spin_unlock_irq(phba->host->host_lock);
 		schedule_timeout_uninterruptible(LPFC_RESET_WAIT*HZ);
 		spin_lock_irq(phba->host->host_lock);
@@ -1027,25 +1064,19 @@ lpfc_reset_bus_handler(struct scsi_cmnd 
 		if (++loopcnt
 		    > (2 * phba->cfg_nodev_tmo)/LPFC_RESET_WAIT)
 			break;
+
+		cnt = lpfc_sli_sum_iocb(phba,
+					&phba->sli.ring[phba->sli.fcp_ring],
+					0, 0, LPFC_CTX_HOST);
 	}
 
 	if (cnt) {
-		/* flush all outstanding commands on the host */
-		i = lpfc_sli_abort_iocb(phba,
-				&phba->sli.ring[phba->sli.fcp_ring], 0, 0, 0,
-				LPFC_CTX_HOST);
-
-		lpfc_printf_log(phba, KERN_INFO, LOG_FCP,
+		lpfc_printf_log(phba, KERN_ERR, LOG_FCP,
 		   "%d:0715 Bus Reset I/O flush failure: cnt x%x left x%x\n",
 		   phba->brd_no, cnt, i);
-	}
-
-	if (cnt == 0)
-		ret = SUCCESS;
-	else
 		ret = FAILED;
+	}
 
-	lpfc_release_scsi_buf(phba, lpfc_cmd);
 	lpfc_printf_log(phba,
 			KERN_ERR,
 			LOG_FCP,
@@ -1053,6 +1084,7 @@ lpfc_reset_bus_handler(struct scsi_cmnd 
 			phba->brd_no, ret);
 out:
 	spin_unlock_irq(shost->host_lock);
+	lpfc_unblock_requests(phba);
 	return ret;
 }
 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2005-12-01 15:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-11-28 21:16 [PATCH 6/11] lpfc 8.1.1 : Fixes to error handlers James.Smart
2005-11-29  7:11 ` Douglas Gilbert
2005-11-30 19:16   ` Christoph Hellwig
2005-11-30 19:17 ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2005-12-01 15:37 James.Smart
2005-11-28 19:08 James.Smart
2005-11-28 19:27 ` Christoph Hellwig
2005-11-28 16:42 James.Smart
2005-11-28 19:02 ` Stefan Richter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.