public inbox for linux-scsi@vger.kernel.org
From: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
To: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"Nandigama, Nagalakshmi" <Nagalakshmi.Nandigama@lsi.com>,
	"Prakash, Sathya" <Sathya.Prakash@lsi.com>,
	"Moore, Eric" <Eric.Moore@lsi.com>,
	"JBottomley@parallels.com" <JBottomley@parallels.com>
Subject: Re: [PATCH 04/05] mptfusion: Fix for device offline while doing aggressive HBA reset
Date: Thu, 04 Aug 2011 15:47:40 +0200
Message-ID: <4E3AA2FC.9030805@itwm.fraunhofer.de>
In-Reply-To: <B2FD678A64EAAD45B089B123FDFC3ED7016650A1BA@inbmail01.lsi.com>

On 08/04/2011 03:37 PM, Desai, Kashyap wrote:
>
>
>> -----Original Message-----
>> From: Bernd Schubert [mailto:bernd.schubert@itwm.fraunhofer.de]
>> Sent: Thursday, August 04, 2011 6:39 PM
>> To: Desai, Kashyap
>> Cc: linux-scsi@vger.kernel.org; Nandigama, Nagalakshmi; Prakash, Sathya;
>> Moore, Eric; JBottomley@parallels.com
>> Subject: Re: [PATCH 04/05] mptfusion: Fix for device offline while doing aggressive HBA reset
>>
>> On 08/04/2011 01:13 PM, kashyap.desai@lsi.com wrote:
>>> Issue:
>>> A device goes offline when a utility issues aggressive HBA resets
>>> while I/O is running.
>>>
>>> Root cause:
>>> The FW goes into a bad state due to the aggressive resets, and a soft
>>> reset does not recover it. The aggressive resets also open a window in
>>> which the error handling thread can be kicked off while the HBA is in
>>> a constant reset loop as part of the test case, which can cause the
>>> device to go offline.
>>>
>>> Changes:
>>> 1. Added extra check as below inside eh_timed_out call back as below.
>>> if(ioc->ioc_reset_in_progress)
>>>       Rc = EH_TIMER_RESET
>>> 2. Removed " DOORBELL_ACTIVE" check for SAS controller from task
>> management context.
>>>      Since SAS controller uses high priority queue for task management.
>> This check is
>>>      not required for SAS controller.
>>> 3. Moved SoftReset call to HardReset from Task Mgmt context.
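Change 1 can be illustrated with a minimal, self-contained C sketch. The struct, enum and function names below are hypothetical stand-ins, not the driver's types; the enum only mirrors the idea behind the kernel's eh_timed_out return values (BLK_EH_NOT_HANDLED / BLK_EH_RESET_TIMER in kernels of this era).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's eh_timed_out return values. */
enum sketch_eh_return {
	SKETCH_EH_NOT_HANDLED,	/* let the SCSI error handler take over */
	SKETCH_EH_RESET_TIMER	/* re-arm the command timer and wait */
};

/* Hypothetical stand-in for the adapter state; only the flag used here. */
struct sketch_ioc {
	bool ioc_reset_in_progress;
};

/*
 * While the driver is already resetting the IOC, ask the midlayer to
 * restart the command timer instead of starting error handling in parallel.
 */
static enum sketch_eh_return sketch_eh_timed_out(const struct sketch_ioc *ioc)
{
	if (ioc->ioc_reset_in_progress)
		return SKETCH_EH_RESET_TIMER;
	return SKETCH_EH_NOT_HANDLED;
}
```

Under this scheme the reset-timer result only defers the decision: if the command is still outstanding when the timer expires again, the callback runs again and, once no reset is in progress, normal error handling proceeds.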
>>
>> [...]
>>
>>
>>> --- a/drivers/message/fusion/mptscsih.c
>>> +++ b/drivers/message/fusion/mptscsih.c
>>> @@ -1630,7 +1630,13 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun,
>>>    		return 0;
>>>    	}
>>>
>>> -	if (ioc_raw_state & MPI_DOORBELL_ACTIVE) {
>>> +	/* DOORBELL ACTIVE check is not required if
>>> +	*  MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q is supported.
>>> +	*/
>>> +
>>> +	if (!((ioc->facts.IOCCapabilities & MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q)
>>> +		&& (ioc->facts.MsgVersion >= MPI_VERSION_01_05)) &&
>>> +		(ioc_raw_state & MPI_DOORBELL_ACTIVE)) {
>>>    		printk(MYIOC_s_WARN_FMT
>>>    			"TaskMgmt type=%x: ioc_state: "
>>>    			"DOORBELL_ACTIVE (0x%x)!\n",
>>> @@ -1729,7 +1735,7 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun,
>>>    		printk(MYIOC_s_WARN_FMT
>>>    		       "Issuing Reset from %s!! doorbell=0x%08x\n",
>>>    		       ioc->name, __func__, mpt_GetIocState(ioc, 0));
>>> -		retval = mpt_Soft_Hard_ResetHandler(ioc, CAN_SLEEP);
>>> +		retval = mpt_HardResetHandler(ioc, CAN_SLEEP);
>>>    		mpt_free_msg_frame(ioc, mf);
>>>    	}
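The guard in the first hunk is easiest to read as a predicate. A minimal C sketch, assuming placeholder constants: the SKETCH_* values below stand in for MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q, MPI_VERSION_01_05 and MPI_DOORBELL_ACTIVE and should be checked against the real MPI headers; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for the MPI header constants (assumed, verify them). */
#define SKETCH_CAP_HIGH_PRI_Q   0x00000001u
#define SKETCH_MPI_VERSION_1_5  0x0105u
#define SKETCH_DOORBELL_ACTIVE  0x08000000u

/* True when the firmware advertises a high-priority queue for task
 * management and speaks MPI 1.5 or later. */
static bool sketch_has_high_pri_q(uint32_t ioc_capabilities, uint16_t msg_version)
{
	return (ioc_capabilities & SKETCH_CAP_HIGH_PRI_Q) &&
	       msg_version >= SKETCH_MPI_VERSION_1_5;
}

/* Mirrors the patched condition: refuse to issue task management only if
 * there is no high-priority queue AND the doorbell is busy. */
static bool sketch_must_refuse_tm(uint32_t ioc_capabilities, uint16_t msg_version,
				  uint32_t ioc_raw_state)
{
	return !sketch_has_high_pri_q(ioc_capabilities, msg_version) &&
	       (ioc_raw_state & SKETCH_DOORBELL_ACTIVE);
}
```

Because both conditions must hold, adapters without a high-priority queue keep the old DOORBELL_ACTIVE behavior, while adapters that have one can issue task management even when the doorbell is busy.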
>>
>> Have you ever tested that with dual port 501030C parallel scsi HBAs? The
>> hard reset with those HBAs will reset *both* ports and eventually *both*
>> ports will fail. A couple of years ago I tried to convince Eric to
>> disable hard resets for those chips entirely (and even sent a patch), but
>> Eric never agreed to that.
>> The soft-reset handler was a workaround for that problem, but with that
>> patch the issue will re-appear. The affected systems are still in
>> production and probably will still be for the next few years.
>
> I have not tried this with the dual port 501030C parallel SCSI HBA, but I remember the exact issue you describe.
> I can add a check for ioc->bus_type == SAS so that SAS controllers use HardReset, and all other cases continue to use SoftReset.
> Would that be fine to avoid the issue you mentioned?
>
> Please let me know your view on it, so that I can resend the patch.

Yes, I think adding a test for SAS would be fine; it would keep the
workaround for the 1030C chips.
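The conditional reset agreed on here amounts to choosing a reset path by bus type. A minimal C sketch; the enum and function names are hypothetical, and in the driver the two outcomes would correspond to calling mpt_HardResetHandler() or mpt_Soft_Hard_ResetHandler() based on ioc->bus_type.

```c
/* Hypothetical stand-in for the driver's bus type field. */
enum sketch_bus_type { SKETCH_SPI, SKETCH_FC, SKETCH_SAS };

/* Which reset path task management should take on failure. */
enum sketch_reset_kind {
	SKETCH_HARD_RESET,		/* like mpt_HardResetHandler() */
	SKETCH_SOFT_THEN_HARD_RESET	/* like mpt_Soft_Hard_ResetHandler() */
};

static enum sketch_reset_kind sketch_pick_reset(enum sketch_bus_type bus)
{
	/* SAS controllers: go straight to a hard reset, as in the patch. */
	if (bus == SKETCH_SAS)
		return SKETCH_HARD_RESET;
	/* Other buses (e.g. the dual-port parallel SCSI HBAs discussed above)
	 * keep the soft-reset-first path, since a hard reset there resets
	 * both ports. */
	return SKETCH_SOFT_THEN_HARD_RESET;
}
```

Keeping the decision in one small function makes it easy to extend if other controller families turn out to need the soft-reset workaround as well.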

Thanks,
Bernd

