[PATCH 04/05] mptfusion: Fix for device offline while doing aggressive HBA reset


* [PATCH 04/05] mptfusion: Fix for device offline while doing aggressive HBA reset
@ 2011-08-04 11:13 kashyap.desai
From: kashyap.desai @ 2011-08-04 11:13 UTC (permalink / raw)
  To: linux-scsi; +Cc: Nagalakshmi.Nandigama, Sathya.Prakash, Eric.Moore, JBottomley

Issue:
A device goes offline when an aggressive HBA reset test is run while
a utility is doing I/O.

Root cause:
The aggressive resets put the FW into a bad state from which a soft
reset cannot recover it. The aggressive resets also open a window in
which the error handling thread can be kicked off while the HBA is
stuck in a constant reset loop, and this can take the device offline.

Changes:
1. Added an extra check inside the eh_timed_out callback:
       if (ioc->ioc_reset_in_progress)
           rc = BLK_EH_RESET_TIMER;
2. Removed the DOORBELL_ACTIVE check from the task management context
   for controllers that use a high-priority queue for task management
   (MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q with MsgVersion >= MPI 1.5),
   such as SAS controllers. The check is not required for them.
3. In the task management context, replaced the SoftReset call
   (mpt_Soft_Hard_ResetHandler) with a HardReset call
   (mpt_HardResetHandler).
   
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
---
diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
index 517621f..e9c6a60 100644
--- a/drivers/message/fusion/mptbase.c
+++ b/drivers/message/fusion/mptbase.c
@@ -6474,8 +6474,19 @@ mpt_config(MPT_ADAPTER *ioc, CONFIGPARMS *pCfg)
 			pReq->Action, ioc->mptbase_cmds.status, timeleft));
 		if (ioc->mptbase_cmds.status & MPT_MGMT_STATUS_DID_IOCRESET)
 			goto out;
-		if (!timeleft)
+		if (!timeleft) {
+			spin_lock_irqsave(&ioc->taskmgmt_lock, flags);
+			if (ioc->ioc_reset_in_progress) {
+				spin_unlock_irqrestore(&ioc->taskmgmt_lock,
+					flags);
+				printk(MYIOC_s_INFO_FMT "%s: host reset in"
+					" progress mpt_config timed out.!!\n",
+					__func__, ioc->name);
+				return -EFAULT;
+			}
+			spin_unlock_irqrestore(&ioc->taskmgmt_lock, flags);
 			issue_hard_reset = 1;
+		}
 		goto out;
 	}
 
@@ -7189,7 +7200,18 @@ mpt_HardResetHandler(MPT_ADAPTER *ioc, int sleepFlag)
 	spin_lock_irqsave(&ioc->taskmgmt_lock, flags);
 	if (ioc->ioc_reset_in_progress) {
 		spin_unlock_irqrestore(&ioc->taskmgmt_lock, flags);
-		return 0;
+		ioc->wait_on_reset_completion = 1;
+		do {
+			ssleep(1);
+		} while (ioc->ioc_reset_in_progress == 1);
+		ioc->wait_on_reset_completion = 0;
+		return ioc->reset_status;
+	}
+	if (ioc->wait_on_reset_completion) {
+		spin_unlock_irqrestore(&ioc->taskmgmt_lock, flags);
+		rc = 0;
+		time_count = jiffies;
+		goto exit;
 	}
 	ioc->ioc_reset_in_progress = 1;
 	if (ioc->alt_ioc)
@@ -7226,6 +7248,7 @@ mpt_HardResetHandler(MPT_ADAPTER *ioc, int sleepFlag)
 	ioc->ioc_reset_in_progress = 0;
 	ioc->taskmgmt_quiesce_io = 0;
 	ioc->taskmgmt_in_progress = 0;
+	ioc->reset_status = rc;
 	if (ioc->alt_ioc) {
 		ioc->alt_ioc->ioc_reset_in_progress = 0;
 		ioc->alt_ioc->taskmgmt_quiesce_io = 0;
@@ -7241,7 +7264,7 @@ mpt_HardResetHandler(MPT_ADAPTER *ioc, int sleepFlag)
 					ioc->alt_ioc, MPT_IOC_POST_RESET);
 		}
 	}
-
+exit:
 	dtmprintk(ioc,
 	    printk(MYIOC_s_DEBUG_FMT
 		"HardResetHandler: completed (%d seconds): %s\n", ioc->name,
diff --git a/drivers/message/fusion/mptbase.h b/drivers/message/fusion/mptbase.h
index 69ddabd..66e6f7b 100644
--- a/drivers/message/fusion/mptbase.h
+++ b/drivers/message/fusion/mptbase.h
@@ -753,6 +753,8 @@ typedef struct _MPT_ADAPTER
 	int			 taskmgmt_in_progress;
 	u8			 taskmgmt_quiesce_io;
 	u8			 ioc_reset_in_progress;
+	u8			 reset_status;
+	u8			 wait_on_reset_completion;
 	MPT_SCHEDULE_TARGET_RESET schedule_target_reset;
 	MPT_FLUSH_RUNNING_CMDS schedule_dead_ioc_flush_running_cmds;
 	struct work_struct	 sas_persist_task;
diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index c65aa75..40ecfcf 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -1950,6 +1950,15 @@ static enum blk_eh_timer_return mptsas_eh_timed_out(struct scsi_cmnd *sc)
 		goto done;
 	}
 
+	/* In case if IOC is in reset from internal context.
+	*  Do not execute EEH for the same IOC. SML should to reset timer.
+	*/
+	if (ioc->ioc_reset_in_progress) {
+		dtmprintk(ioc, printk(MYIOC_s_WARN_FMT ": %s: ioc is in reset,"
+		    "SML need to reset the timer (sc=%p)\n",
+		    ioc->name, __func__, sc));
+		rc = BLK_EH_RESET_TIMER;
+	}
 	vdevice = sc->device->hostdata;
 	if (vdevice && vdevice->vtarget && (vdevice->vtarget->inDMD
 		|| vdevice->vtarget->deleted)) {
diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
index c9a7109..c1c0339 100644
--- a/drivers/message/fusion/mptscsih.c
+++ b/drivers/message/fusion/mptscsih.c
@@ -1630,7 +1630,13 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun,
 		return 0;
 	}
 
-	if (ioc_raw_state & MPI_DOORBELL_ACTIVE) {
+	/* DOORBELL ACTIVE check is not required if
+	*  MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q is supported.
+	*/
+
+	if (!((ioc->facts.IOCCapabilities & MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q)
+		 && (ioc->facts.MsgVersion >= MPI_VERSION_01_05)) &&
+		(ioc_raw_state & MPI_DOORBELL_ACTIVE)) {
 		printk(MYIOC_s_WARN_FMT
 			"TaskMgmt type=%x: ioc_state: "
 			"DOORBELL_ACTIVE (0x%x)!\n",
@@ -1729,7 +1735,7 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun,
 		printk(MYIOC_s_WARN_FMT
 		       "Issuing Reset from %s!! doorbell=0x%08x\n",
 		       ioc->name, __func__, mpt_GetIocState(ioc, 0));
-		retval = mpt_Soft_Hard_ResetHandler(ioc, CAN_SLEEP);
+		retval = mpt_HardResetHandler(ioc, CAN_SLEEP);
 		mpt_free_msg_frame(ioc, mf);
 	}
 


* Re: [PATCH 04/05] mptfusion: Fix for device offline while doing aggressive HBA reset
@ 2011-08-04 13:09 ` Bernd Schubert
From: Bernd Schubert @ 2011-08-04 13:09 UTC (permalink / raw)
  To: kashyap.desai
  Cc: linux-scsi, Nagalakshmi.Nandigama, Sathya.Prakash, Eric.Moore,
	JBottomley

On 08/04/2011 01:13 PM, kashyap.desai@lsi.com wrote:
> Issue:
> Device goes offline while doing aggressive HBA reset
> along with IO using some utility.
>
> Root cause:
> FW goes into bad state due to aggressive reset. Softreset does
> not help to recover FW. And also aggressive reset open up the
> window for Error handling thread to kicked off at the same time
> HBA will be in constant RESET loop as part of aggressive reset
> test case can lead Device to goes offline.
>
> Changes:
> 1. Added extra check as below inside eh_timed_out call back as below.
> if(ioc->ioc_reset_in_progress)
>      Rc = EH_TIMER_RESET
> 2. Removed " DOORBELL_ACTIVE" check for SAS controller from task management context.
>     Since SAS controller uses high priority queue for task management. This check is
>     not required for SAS controller.
> 3. Moved SoftReset call to HardReset from Task Mgmt context.

[...]


> --- a/drivers/message/fusion/mptscsih.c
> +++ b/drivers/message/fusion/mptscsih.c
> @@ -1630,7 +1630,13 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun,
>   		return 0;
>   	}
>
> -	if (ioc_raw_state & MPI_DOORBELL_ACTIVE) {
> +	/* DOORBELL ACTIVE check is not required if
> +	*  MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q is supported.
> +	*/
> +
> +	if (!((ioc->facts.IOCCapabilities & MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q)
> +		 && (ioc->facts.MsgVersion >= MPI_VERSION_01_05)) &&
> +		(ioc_raw_state & MPI_DOORBELL_ACTIVE)) {
>   		printk(MYIOC_s_WARN_FMT
>   			"TaskMgmt type=%x: ioc_state: "
>   			"DOORBELL_ACTIVE (0x%x)!\n",
> @@ -1729,7 +1735,7 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun,
>   		printk(MYIOC_s_WARN_FMT
>   		       "Issuing Reset from %s!! doorbell=0x%08x\n",
>   		       ioc->name, __func__, mpt_GetIocState(ioc, 0));
> -		retval = mpt_Soft_Hard_ResetHandler(ioc, CAN_SLEEP);
> +		retval = mpt_HardResetHandler(ioc, CAN_SLEEP);
>   		mpt_free_msg_frame(ioc, mf);
>   	}

Have you ever tested that with dual port 501030C parallel SCSI HBAs? A
hard reset on those HBAs resets *both* ports, and eventually *both*
ports fail. A couple of years ago I tried to convince Eric to disable
hard resets for those chips entirely (and even sent a patch), but Eric
never agreed to that.
The soft-reset handler was a workaround for that problem, but with this
patch the issue will reappear. The affected systems are still in
production and probably will be for the next few years.


Thanks,
Bernd


* RE: [PATCH 04/05] mptfusion: Fix for device offline while doing aggressive HBA reset
@ 2011-08-04 13:37   ` Desai, Kashyap
From: Desai, Kashyap @ 2011-08-04 13:37 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-scsi@vger.kernel.org, Nandigama, Nagalakshmi,
	Prakash, Sathya, Moore, Eric, JBottomley@parallels.com



> -----Original Message-----
> From: Bernd Schubert [mailto:bernd.schubert@itwm.fraunhofer.de]
> Sent: Thursday, August 04, 2011 6:39 PM
> To: Desai, Kashyap
> Cc: linux-scsi@vger.kernel.org; Nandigama, Nagalakshmi; Prakash, Sathya;
> Moore, Eric; JBottomley@parallels.com
> Subject: Re: [PATCH 04/05] mptfusion: Fix for device offline while doing
> aggressive HBA reset
> 
> On 08/04/2011 01:13 PM, kashyap.desai@lsi.com wrote:
> > [...]
> 
> Have you ever tested that with dual port 501030C parallel scsi HBAs? The
> hard reset with those HBAs will reset *both* ports and eventually *both*
> ports will fail. A couple of years ago I tried to convince Eric to
> disable hard resets for those chips at all (and even sent a patch), but
> Eric never agreed on that.
> The soft-reset handler was a workaround for that problem, but with that
> patch the issue will re-appear. The affected systems are still in
> production and probably will still be for the next few years.

I did not try this with the dual-port 501030C parallel SCSI HBA, but I remember the exact issue you describe here.
I can add a check for ioc->bus_type == SAS so that SAS controllers use HardReset, while all other cases continue to use SoftReset.
Would that be enough to avoid the issue you mentioned?

Please let me know your view, so that I can resend the patch.

~ Kashyap
> 
> 
> Thanks,
> Bernd


* Re: [PATCH 04/05] mptfusion: Fix for device offline while doing aggressive HBA reset
@ 2011-08-04 13:47     ` Bernd Schubert
From: Bernd Schubert @ 2011-08-04 13:47 UTC (permalink / raw)
  To: Desai, Kashyap
  Cc: linux-scsi@vger.kernel.org, Nandigama, Nagalakshmi,
	Prakash, Sathya, Moore, Eric, JBottomley@parallels.com

On 08/04/2011 03:37 PM, Desai, Kashyap wrote:
>
>
>> -----Original Message-----
>> From: Bernd Schubert [mailto:bernd.schubert@itwm.fraunhofer.de]
>> Sent: Thursday, August 04, 2011 6:39 PM
>> To: Desai, Kashyap
>> Cc: linux-scsi@vger.kernel.org; Nandigama, Nagalakshmi; Prakash, Sathya;
>> Moore, Eric; JBottomley@parallels.com
>> Subject: Re: [PATCH 04/05] mptfusion: Fix for device offline while doing
>> aggressive HBA reset
>>
>> On 08/04/2011 01:13 PM, kashyap.desai@lsi.com wrote:
>>> [...]
>>
>> Have you ever tested that with dual port 501030C parallel scsi HBAs? The
>> hard reset with those HBAs will reset *both* ports and eventually *both*
>> ports will fail. A couple of years ago I tried to convince Eric to
>> disable hard resets for those chips at all (and even sent a patch), but
>> Eric never agreed on that.
>> The soft-reset handler was a workaround for that problem, but with that
>> patch the issue will re-appear. The affected systems are still in
>> production and probably will still be for the next few years.
>
> I did not tried with dual port 501030C parallel scsi HBA.. I remember that exact issue you have described here.
> I can add check for ioc->bus_type == SAS to have HardReset and other case I will continue with SoftReset.
> Just wanted to know Is this fine to avoid issue which you have mentioned ?
>
> Pls let me know your view on it, so that I can resend the patch.

Yes, I think adding a test for SAS would be fine and would keep the
workaround for 1030C chips.

Thanks,
Bernd



* Re: [PATCH 04/05] mptfusion: Fix for device offline while doing aggressive HBA reset
@ 2011-08-10 16:57       ` Benjamin ESTRABAUD
From: Benjamin ESTRABAUD @ 2011-08-10 16:57 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Desai, Kashyap, linux-scsi@vger.kernel.org,
	Nandigama, Nagalakshmi, Prakash, Sathya, Moore, Eric,
	JBottomley@parallels.com

On 04/08/11 14:47, Bernd Schubert wrote:
> On 08/04/2011 03:37 PM, Desai, Kashyap wrote:
>>
>>
>>> -----Original Message-----
>>> From: Bernd Schubert [mailto:bernd.schubert@itwm.fraunhofer.de]
>>> Sent: Thursday, August 04, 2011 6:39 PM
>>> To: Desai, Kashyap
>>> Cc: linux-scsi@vger.kernel.org; Nandigama, Nagalakshmi; Prakash, Sathya;
>>> Moore, Eric; JBottomley@parallels.com
>>> Subject: Re: [PATCH 04/05] mptfusion: Fix for device offline while doing
>>> aggressive HBA reset
>>>
>>> On 08/04/2011 01:13 PM, kashyap.desai@lsi.com wrote:
>>>> [...]
>>>
>>> Have you ever tested that with dual port 501030C parallel scsi HBAs? The
>>> hard reset with those HBAs will reset *both* ports and eventually *both*
>>> ports will fail. A couple of years ago I tried to convince Eric to
>>> disable hard resets for those chips at all (and even sent a patch), but
>>> Eric never agreed on that.
>>> The soft-reset handler was a workaround for that problem, but with that
>>> patch the issue will re-appear. The affected systems are still in
>>> production and probably will still be for the next few years.
>>
>> I did not tried with dual port 501030C parallel scsi HBA.. I remember that exact issue you have described here.
>> I can add check for ioc->bus_type == SAS to have HardReset and other case I will continue with SoftReset.
>> Just wanted to know Is this fine to avoid issue which you have mentioned ?
>>
>> Pls let me know your view on it, so that I can resend the patch.
>
> Yes, I think adding a test for SAS would be fine and would keep the 
> workaround for 1030C Chips.
>
> Thanks,
> Bernd

Just a quick note: as far as I know, a hard reset also disables both
ports on SAS (on 1068 chips, for instance), so one bad port can take
down the second port. I am unsure whether the port comes back up OK
after the hard reset.

Regards,
Ben.

