RE: [PATCH 6/11] lpfc 8.1.1 : Fixes to error handlers

All of lore.kernel.org
 help / color / mirror / Atom feed

* RE: [PATCH 6/11]  lpfc 8.1.1 : Fixes to error handlers
@ 2005-11-28 21:16 James.Smart
  2005-11-29  7:11 ` Douglas Gilbert
  2005-11-30 19:17 ` Christoph Hellwig
  0 siblings, 2 replies; 9+ messages in thread
From: James.Smart @ 2005-11-28 21:16 UTC (permalink / raw)
  To: hch; +Cc: stefanr, linux-scsi

> So you this storage vendor please provide us this test case 
> as a start?

Basic test case:
 - start load on the fc disks
 - run sg_reset -b /dev/sgX or sg_reset -d /dev/sgX

> That beeing said I mentioned that we need to do SG-based 
> resets via the
> EH thread aswell.  As Emulex apparently cares about the above 
> case would
> you mind implementing it?

Well - we would prefer the vendor test the system as it should behave
in normal circumstances rather than test-suite-derived conditions.

Yes, we will sign up for this. However, we have several items on our
plate so we won't be on this quickly. I would hope there would be no
issue with accepting the patch as is and backing out the relatived code
as part of the eventual patch.  We'll also be contacting the SG maintainer
to get his/her thoughts.

-- james

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 6/11]  lpfc 8.1.1 : Fixes to error handlers
  2005-11-28 21:16 [PATCH 6/11] lpfc 8.1.1 : Fixes to error handlers James.Smart
@ 2005-11-29  7:11 ` Douglas Gilbert
  2005-11-30 19:16   ` Christoph Hellwig
  2005-11-30 19:17 ` Christoph Hellwig
  1 sibling, 1 reply; 9+ messages in thread
From: Douglas Gilbert @ 2005-11-29  7:11 UTC (permalink / raw)
  To: James.Smart; +Cc: hch, stefanr, linux-scsi

James.Smart@Emulex.Com wrote:
>>So you this storage vendor please provide us this test case 
>>as a start?
> 
> 
> Basic test case:
>  - start load on the fc disks
>  - run sg_reset -b /dev/sgX or sg_reset -d /dev/sgX
> 
> 
>>That beeing said I mentioned that we need to do SG-based 
>>resets via the
>>EH thread aswell.  As Emulex apparently cares about the above 
>>case would
>>you mind implementing it?
> 
> 
> Well - we would prefer the vendor test the system as it should behave
> in normal circumstances rather than test-suite-derived conditions.
> 
> Yes, we will sign up for this. However, we have several items on our
> plate so we won't be on this quickly. I would hope there would be no
> issue with accepting the patch as is and backing out the relatived code
> as part of the eventual patch.  We'll also be contacting the SG maintainer
> to get his/her thoughts.

While on the subject of resets, linux still has a
fuzziness around a "device" reset. There should be
two resets around this level:
  - target reset: reset a target device, its ports and
    all logical units within it [transport level]
  - logical unit reset: reset a logical unit only
    [this is a mandatory SCSI task management function]

If changes are being made in this area, then perhaps
this fuzziness should be addressed.

Doug Gilbert


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 6/11]  lpfc 8.1.1 : Fixes to error handlers
  2005-11-29  7:11 ` Douglas Gilbert
@ 2005-11-30 19:16   ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2005-11-30 19:16 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: James.Smart, hch, stefanr, linux-scsi

On Tue, Nov 29, 2005 at 05:11:42PM +1000, Douglas Gilbert wrote:
> While on the subject of resets, linux still has a
> fuzziness around a "device" reset. There should be
> two resets around this level:
>   - target reset: reset a target device, its ports and
>     all logical units within it [transport level]
>   - logical unit reset: reset a logical unit only
>     [this is a mandatory SCSI task management function]

Agreed.  In fact the emulex driver has a target reset handler already
in lower level code, but only loops over it for all found targets in the
linux bus reset handler.

> If changes are being made in this area, then perhaps
> this fuzziness should be addressed.

I don't think it's related to the asynchronous reset change.  On the
other hand it's a pretty simple change, I should have done it long
ago.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 6/11]  lpfc 8.1.1 : Fixes to error handlers
  2005-11-28 21:16 [PATCH 6/11] lpfc 8.1.1 : Fixes to error handlers James.Smart
  2005-11-29  7:11 ` Douglas Gilbert
@ 2005-11-30 19:17 ` Christoph Hellwig
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2005-11-30 19:17 UTC (permalink / raw)
  To: James.Smart; +Cc: hch, stefanr, linux-scsi

On Mon, Nov 28, 2005 at 04:16:33PM -0500, James.Smart@Emulex.Com wrote:
> Well - we would prefer the vendor test the system as it should behave
> in normal circumstances rather than test-suite-derived conditions.
> 
> Yes, we will sign up for this. However, we have several items on our
> plate so we won't be on this quickly. I would hope there would be no
> issue with accepting the patch as is and backing out the relatived code
> as part of the eventual patch.

No.  As you might have noticed fixing mid-layer bugs in drivers is generally
not okay.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 6/11]  lpfc 8.1.1 : Fixes to error handlers
@ 2005-12-01 15:37 James.Smart
  0 siblings, 0 replies; 9+ messages in thread
From: James.Smart @ 2005-12-01 15:37 UTC (permalink / raw)
  To: hch; +Cc: stefanr, linux-scsi

> > Yes, we will sign up for this. However, we have several items on our
> > plate so we won't be on this quickly. I would hope there would be no
> > issue with accepting the patch as is and backing out the 
> relatived code
> > as part of the eventual patch.
> 
> No.  As you might have noticed fixing mid-layer bugs in 
> drivers is generally
> not okay.

Bug ?  I thought this was a feature.....  :)

I think it's a little silly to use the strong-arm tactics on this kind of
patch. I've already committed to fix it. All it does is lessen -rc testing
on the driver changes, especially if it holds up the whole group of patches
which includes things (and bug fixes) that are unrelated.

Given your response, you deem this high priority. So, I'll get on the fix
and try to get it done in time to make 2.6.15.

-- james

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 6/11]  lpfc 8.1.1 : Fixes to error handlers
@ 2005-11-28 19:08 James.Smart
  2005-11-28 19:27 ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: James.Smart @ 2005-11-28 19:08 UTC (permalink / raw)
  To: stefanr, linux-scsi

This is true - when invoked via the error handler thread.  Unfortunately,
reset functions invoked via SG do not have the same behavior. There is a
rather large test suite from a big storage vendor that uses asynchronous
resets via SG extensively.

-- james.

> -----Original Message-----
> From: Stefan Richter [mailto:stefanr@s5r6.in-berlin.de]
> Sent: Monday, November 28, 2005 2:03 PM
> To: linux-scsi@vger.kernel.org
> Cc: Smart, James
> Subject: Re: [PATCH 6/11] lpfc 8.1.1 : Fixes to error handlers
> 
> 
> James Smart wrote:
> > - Serialize EH calls and block requests when EH function is running.
> 
> Is this necessary? From what I understood from docs or 
> comments, there 
> is nothing enqueued while an EH is run. (Furthermore, EHs are not 
> concurrent per host.) Or am I mistaken?
> -- 
> Stefan Richter
> -=====-=-=-= =-== ===--
> http://arcgraph.de/sr/
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 6/11]  lpfc 8.1.1 : Fixes to error handlers
  2005-11-28 19:08 James.Smart
@ 2005-11-28 19:27 ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2005-11-28 19:27 UTC (permalink / raw)
  To: James.Smart; +Cc: stefanr, linux-scsi

On Mon, Nov 28, 2005 at 02:08:56PM -0500, James.Smart@Emulex.Com wrote:
> This is true - when invoked via the error handler thread.  Unfortunately,
> reset functions invoked via SG do not have the same behavior. There is a
> rather large test suite from a big storage vendor that uses asynchronous
> resets via SG extensively.

So you this storage vendor please provide us this test case as a start?

That beeing said I mentioned that we need to do SG-based resets via the
EH thread aswell.  As Emulex apparently cares about the above case would
you mind implementing it?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 6/11]  lpfc 8.1.1 : Fixes to error handlers
@ 2005-11-28 16:42 James.Smart
  2005-11-28 19:02 ` Stefan Richter
  0 siblings, 1 reply; 9+ messages in thread
From: James.Smart @ 2005-11-28 16:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: James.Smart


Fixes to error handlers:
- Release task management command before counting outstanding commands.
  TMF was being erroneously counted as an active outstanding command.
- Serialize EH calls and block requests when EH function is running.


Signed-off-by: James Smart <James.Smart@emulex.com>


diff -upNr a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
--- a/drivers/scsi/lpfc/lpfc.h	2005-11-24 10:07:36.000000000 -0500
+++ b/drivers/scsi/lpfc/lpfc.h	2005-11-28 10:32:43.000000000 -0500
@@ -167,6 +167,7 @@ struct lpfc_hba {
 	dma_addr_t slim2p_mapping;
 	uint16_t pci_cfg_value;
 
+	struct semaphore hba_can_block;
 	uint32_t hba_state;
 
 #define LPFC_INIT_START           1	/* Initial state after board reset */
diff -upNr a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
--- a/drivers/scsi/lpfc/lpfc_init.c	2005-11-28 10:15:37.000000000 -0500
+++ b/drivers/scsi/lpfc/lpfc_init.c	2005-11-28 10:32:43.000000000 -0500
@@ -1345,7 +1345,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev,
 		goto out_put_host;
 
 	host->unique_id = phba->brd_no;
-
+	init_MUTEX(&phba->hba_can_block);
 	INIT_LIST_HEAD(&phba->ctrspbuflist);
 	INIT_LIST_HEAD(&phba->rnidrspbuflist);
 	INIT_LIST_HEAD(&phba->freebufList);
diff -upNr a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
--- a/drivers/scsi/lpfc/lpfc_scsi.c	2005-11-28 10:26:47.000000000 -0500
+++ b/drivers/scsi/lpfc/lpfc_scsi.c	2005-11-28 10:32:43.000000000 -0500
@@ -41,6 +41,20 @@
 #define LPFC_ABORT_WAIT  2
 
 
+static inline void
+lpfc_block_requests(struct lpfc_hba * phba)
+{
+	down(&phba->hba_can_block);
+	scsi_block_requests(phba->host);
+}
+
+static inline void
+lpfc_unblock_requests(struct lpfc_hba * phba)
+{
+	scsi_unblock_requests(phba->host);
+	up(&phba->hba_can_block);
+}
+
 /*
  * This routine allocates a scsi buffer, which contains all the necessary
  * information needed to initiate a SCSI I/O.  The non-DMAable buffer region
@@ -774,6 +788,7 @@ lpfc_abort_handler(struct scsi_cmnd *cmn
 	unsigned int loop_count = 0;
 	int ret = SUCCESS;
 
+	lpfc_block_requests(phba);
 	spin_lock_irq(shost->host_lock);
 
 	lpfc_cmd = (struct lpfc_scsi_buf *)cmnd->host_scribble;
@@ -853,6 +868,7 @@ lpfc_abort_handler(struct scsi_cmnd *cmn
 			cmnd->device->lun, cmnd->serial_number);
 
 	spin_unlock_irq(shost->host_lock);
+	lpfc_unblock_requests(phba);
 
 	return ret;
 }
@@ -866,9 +882,11 @@ lpfc_reset_lun_handler(struct scsi_cmnd 
 	struct lpfc_iocbq *iocbq, *iocbqrsp;
 	struct lpfc_rport_data *rdata = cmnd->device->hostdata;
 	struct lpfc_nodelist *pnode = rdata->pnode;
+	uint32_t cmd_result = 0, cmd_status = 0;
 	int ret = FAILED;
 	int cnt, loopcnt;
 
+	lpfc_block_requests(phba);
 	spin_lock_irq(shost->host_lock);
 	/*
 	 * If target is not in a MAPPED state, delay the reset until
@@ -912,26 +930,28 @@ lpfc_reset_lun_handler(struct scsi_cmnd 
 	if (ret == IOCB_SUCCESS)
 		ret = SUCCESS;
 
-	lpfc_cmd->result = iocbqrsp->iocb.un.ulpWord[4];
-	lpfc_cmd->status = iocbqrsp->iocb.ulpStatus;
-	if (lpfc_cmd->status == IOSTAT_LOCAL_REJECT)
-		if (lpfc_cmd->result & IOERR_DRVR_MASK)
-			lpfc_cmd->status = IOSTAT_DRIVER_REJECT;
+
+	cmd_result = iocbqrsp->iocb.un.ulpWord[4];
+	cmd_status = iocbqrsp->iocb.ulpStatus;
+
+	lpfc_sli_release_iocbq(phba, iocbqrsp);
+	lpfc_release_scsi_buf(phba, lpfc_cmd);
 
 	/*
-	 * All outstanding txcmplq I/Os should have been aborted by the target.
+	 * All outstanding txcmplq I/Os should have been aborted by the device.
 	 * Unfortunately, some targets do not abide by this forcing the driver
 	 * to double check.
 	 */
-	lpfc_sli_abort_iocb(phba, &phba->sli.ring[phba->sli.fcp_ring],
-			    cmnd->device->id, cmnd->device->lun, 0,
-			    LPFC_CTX_LUN);
-
+	cnt = lpfc_sli_sum_iocb(phba, &phba->sli.ring[phba->sli.fcp_ring],
+				cmnd->device->id, cmnd->device->lun,
+				LPFC_CTX_LUN);
+	if (cnt)
+		lpfc_sli_abort_iocb(phba,
+				    &phba->sli.ring[phba->sli.fcp_ring],
+				    cmnd->device->id, cmnd->device->lun,
+				    0, LPFC_CTX_LUN);
 	loopcnt = 0;
-	while((cnt = lpfc_sli_sum_iocb(phba,
-				       &phba->sli.ring[phba->sli.fcp_ring],
-				       cmnd->device->id, cmnd->device->lun,
-				       LPFC_CTX_LUN))) {
+	while(cnt) {
 		spin_unlock_irq(phba->host->host_lock);
 		schedule_timeout_uninterruptible(LPFC_RESET_WAIT*HZ);
 		spin_lock_irq(phba->host->host_lock);
@@ -939,6 +959,11 @@ lpfc_reset_lun_handler(struct scsi_cmnd 
 		if (++loopcnt
 		    > (2 * phba->cfg_nodev_tmo)/LPFC_RESET_WAIT)
 			break;
+
+		cnt = lpfc_sli_sum_iocb(phba,
+					&phba->sli.ring[phba->sli.fcp_ring],
+					cmnd->device->id, cmnd->device->lun,
+					LPFC_CTX_LUN);
 	}
 
 	if (cnt) {
@@ -948,18 +973,16 @@ lpfc_reset_lun_handler(struct scsi_cmnd 
 		ret = FAILED;
 	}
 
-	lpfc_sli_release_iocbq(phba, iocbqrsp);
-
 out_free_scsi_buf:
 	lpfc_printf_log(phba, KERN_ERR, LOG_FCP,
 			"%d:0713 SCSI layer issued LUN reset (%d, %d) "
 			"Data: x%x x%x x%x\n",
-			phba->brd_no, lpfc_cmd->pCmd->device->id,
-			lpfc_cmd->pCmd->device->lun, ret, lpfc_cmd->status,
-			lpfc_cmd->result);
-	lpfc_release_scsi_buf(phba, lpfc_cmd);
+			phba->brd_no, cmnd->device->id,cmnd->device->lun,
+			ret, cmd_status, cmd_result);
+
 out:
 	spin_unlock_irq(shost->host_lock);
+	lpfc_unblock_requests(phba);
 	return ret;
 }
 
@@ -975,6 +998,7 @@ lpfc_reset_bus_handler(struct scsi_cmnd 
 	unsigned int midlayer_id = 0;
 	struct lpfc_scsi_buf * lpfc_cmd;
 
+	lpfc_block_requests(phba);
 	spin_lock_irq(shost->host_lock);
 
 	lpfc_cmd = lpfc_sli_get_scsi_buf (phba);
@@ -1008,18 +1032,31 @@ lpfc_reset_bus_handler(struct scsi_cmnd 
 		lpfc_cmd->pCmd->device->hostdata = ndlp->rport->dd_data;
 		ret = lpfc_scsi_tgt_reset(lpfc_cmd, phba);
 		if (ret != SUCCESS) {
-			lpfc_printf_log(phba, KERN_INFO, LOG_FCP,
+			lpfc_printf_log(phba, KERN_ERR, LOG_FCP,
 				"%d:0713 Bus Reset on target %d failed\n",
 				phba->brd_no, i);
 			err_count++;
 		}
 	}
 
+	if (err_count == 0)
+		ret = SUCCESS;
+
+	lpfc_release_scsi_buf(phba, lpfc_cmd);
+
+	/*
+	 * All outstanding txcmplq I/Os should have been aborted by
+	 * the targets.  Unfortunately, some targets do not abide by
+	 * this forcing the driver to double check.
+	 */
 	cmnd->device->id = midlayer_id;
+	cnt = lpfc_sli_sum_iocb(phba, &phba->sli.ring[phba->sli.fcp_ring],
+				0, 0, LPFC_CTX_HOST);
+	if (cnt)
+		lpfc_sli_abort_iocb(phba, &phba->sli.ring[phba->sli.fcp_ring],
+				    0, 0, 0, LPFC_CTX_HOST);
 	loopcnt = 0;
-	while((cnt = lpfc_sli_sum_iocb(phba,
-				&phba->sli.ring[phba->sli.fcp_ring],
-				0, 0, LPFC_CTX_HOST))) {
+	while(cnt) {
 		spin_unlock_irq(phba->host->host_lock);
 		schedule_timeout_uninterruptible(LPFC_RESET_WAIT*HZ);
 		spin_lock_irq(phba->host->host_lock);
@@ -1027,25 +1064,19 @@ lpfc_reset_bus_handler(struct scsi_cmnd 
 		if (++loopcnt
 		    > (2 * phba->cfg_nodev_tmo)/LPFC_RESET_WAIT)
 			break;
+
+		cnt = lpfc_sli_sum_iocb(phba,
+					&phba->sli.ring[phba->sli.fcp_ring],
+					0, 0, LPFC_CTX_HOST);
 	}
 
 	if (cnt) {
-		/* flush all outstanding commands on the host */
-		i = lpfc_sli_abort_iocb(phba,
-				&phba->sli.ring[phba->sli.fcp_ring], 0, 0, 0,
-				LPFC_CTX_HOST);
-
-		lpfc_printf_log(phba, KERN_INFO, LOG_FCP,
+		lpfc_printf_log(phba, KERN_ERR, LOG_FCP,
 		   "%d:0715 Bus Reset I/O flush failure: cnt x%x left x%x\n",
 		   phba->brd_no, cnt, i);
-	}
-
-	if (cnt == 0)
-		ret = SUCCESS;
-	else
 		ret = FAILED;
+	}
 
-	lpfc_release_scsi_buf(phba, lpfc_cmd);
 	lpfc_printf_log(phba,
 			KERN_ERR,
 			LOG_FCP,
@@ -1053,6 +1084,7 @@ lpfc_reset_bus_handler(struct scsi_cmnd 
 			phba->brd_no, ret);
 out:
 	spin_unlock_irq(shost->host_lock);
+	lpfc_unblock_requests(phba);
 	return ret;
 }
 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 6/11]  lpfc 8.1.1 : Fixes to error handlers
  2005-11-28 16:42 James.Smart
@ 2005-11-28 19:02 ` Stefan Richter
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Richter @ 2005-11-28 19:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James.Smart

James Smart wrote:
> - Serialize EH calls and block requests when EH function is running.

Is this necessary? From what I understood from docs or comments, there 
is nothing enqueued while an EH is run. (Furthermore, EHs are not 
concurrent per host.) Or am I mistaken?
-- 
Stefan Richter
-=====-=-=-= =-== ===--
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2005-12-01 15:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-11-28 21:16 [PATCH 6/11] lpfc 8.1.1 : Fixes to error handlers James.Smart
2005-11-29  7:11 ` Douglas Gilbert
2005-11-30 19:16   ` Christoph Hellwig
2005-11-30 19:17 ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2005-12-01 15:37 James.Smart
2005-11-28 19:08 James.Smart
2005-11-28 19:27 ` Christoph Hellwig
2005-11-28 16:42 James.Smart
2005-11-28 19:02 ` Stefan Richter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.