From: Benjamin ESTRABAUD
Subject: Re: [PATCH 04/05] mptfusion: Fix for device offline while doing aggressive HBA reset
Date: Wed, 10 Aug 2011 17:57:28 +0100
Message-ID: <4E42B878.4030103@mpstor.com>
References: <201108041051.p74AprAP005912@milmhbs0.lsil.com> <4E3A9A05.8090602@itwm.fraunhofer.de> <4E3AA2FC.9030805@itwm.fraunhofer.de>
In-Reply-To: <4E3AA2FC.9030805@itwm.fraunhofer.de>
To: Bernd Schubert
Cc: "Desai, Kashyap", "linux-scsi@vger.kernel.org", "Nandigama, Nagalakshmi", "Prakash, Sathya", "Moore, Eric", "JBottomley@parallels.com"

On 04/08/11 14:47, Bernd Schubert wrote:
> On 08/04/2011 03:37 PM, Desai, Kashyap wrote:
>>
>>
>>> -----Original Message-----
>>> From: Bernd Schubert [mailto:bernd.schubert@itwm.fraunhofer.de]
>>> Sent: Thursday, August 04, 2011 6:39 PM
>>> To: Desai, Kashyap
>>> Cc: linux-scsi@vger.kernel.org; Nandigama, Nagalakshmi; Prakash,
>>> Sathya; Moore, Eric; JBottomley@parallels.com
>>> Subject: Re: [PATCH 04/05] mptfusion: Fix for device offline while
>>> doing aggressive HBA reset
>>>
>>> On 08/04/2011 01:13 PM, kashyap.desai@lsi.com wrote:
>>>> Issue:
>>>> The device goes offline during aggressive HBA resets issued by a
>>>> utility while I/O is running.
>>>>
>>>> Root cause:
>>>> The aggressive resets put the FW into a bad state, and a soft reset
>>>> does not recover it. The aggressive resets also open a window in
>>>> which the error-handling thread can be kicked off while the HBA is
>>>> stuck in the constant reset loop of the aggressive-reset test case,
>>>> which can take the device offline.
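The root cause above is a race: the SCSI error-handling thread starts recovery while the HBA is still stuck in its reset loop. The patch targets this with a guard in the driver's eh_timed_out callback. What follows is a minimal userspace sketch of that guard, not driver code. `struct fake_ioc`, `enum eh_verdict` and `fake_eh_timed_out()` are hypothetical stand-ins, loosely modelled on the kernel's `enum blk_eh_timer_return` and on the `ioc_reset_in_progress` flag named in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's enum blk_eh_timer_return;
 * names and values are illustrative only. */
enum eh_verdict {
	EH_VERDICT_NOT_HANDLED, /* hand the command to the SCSI error handler */
	EH_VERDICT_RESET_TIMER  /* re-arm the command timer and keep waiting */
};

/* Hypothetical, heavily reduced adapter state: only the flag that the
 * patch description checks. */
struct fake_ioc {
	bool ioc_reset_in_progress;
};

/* Timeout hook modelled on the guard in the patch description. While a
 * host reset is running it asks for more time instead of escalating, so
 * the error-handling thread does not start recovery in the middle of the
 * reset loop. Otherwise, normal error handling proceeds. */
static enum eh_verdict fake_eh_timed_out(const struct fake_ioc *ioc)
{
	assert(ioc != NULL);
	if (ioc->ioc_reset_in_progress)
		return EH_VERDICT_RESET_TIMER;
	return EH_VERDICT_NOT_HANDLED;
}
```

Re-arming the timer, rather than reporting the command as handled, keeps the command owned by the mid-layer. Once the reset completes and the flag clears, a later timeout falls through to normal error handling.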
>>>>
>>>> Changes:
>>>> 1. Added an extra check inside the eh_timed_out callback:
>>>>        if (ioc->ioc_reset_in_progress)
>>>>                rc = EH_TIMER_RESET;
>>>> 2. Removed the "DOORBELL_ACTIVE" check for SAS controllers from the
>>>>    task management context. SAS controllers use a high-priority
>>>>    queue for task management, so this check is not required for
>>>>    them.
>>>> 3. Replaced the SoftReset call with a HardReset in the task
>>>>    management context.
>>>
>>> [...]
>>>
>>>
>>>> --- a/drivers/message/fusion/mptscsih.c
>>>> +++ b/drivers/message/fusion/mptscsih.c
>>>> @@ -1630,7 +1630,13 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun,
>>>>  		return 0;
>>>>  	}
>>>>  
>>>> -	if (ioc_raw_state & MPI_DOORBELL_ACTIVE) {
>>>> +	/* DOORBELL ACTIVE check is not required if
>>>> +	 * MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q is supported.
>>>> +	 */
>>>> +
>>>> +	if (!((ioc->facts.IOCCapabilities & MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q)
>>>> +		&& (ioc->facts.MsgVersion >= MPI_VERSION_01_05)) &&
>>>> +		(ioc_raw_state & MPI_DOORBELL_ACTIVE)) {
>>>>  		printk(MYIOC_s_WARN_FMT
>>>>  			"TaskMgmt type=%x: ioc_state: "
>>>>  			"DOORBELL_ACTIVE (0x%x)!\n",
>>>> @@ -1729,7 +1735,7 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun,
>>>>  		printk(MYIOC_s_WARN_FMT
>>>>  			"Issuing Reset from %s!! doorbell=0x%08x\n",
>>>>  			ioc->name, __func__, mpt_GetIocState(ioc, 0));
>>>> -		retval = mpt_Soft_Hard_ResetHandler(ioc, CAN_SLEEP);
>>>> +		retval = mpt_HardResetHandler(ioc, CAN_SLEEP);
>>>>  		mpt_free_msg_frame(ioc, mf);
>>>>  	}
>>>
>>> Have you ever tested that with dual port 501030C parallel SCSI HBAs?
>>> The hard reset with those HBAs will reset *both* ports, and
>>> eventually *both* ports will fail. A couple of years ago I tried to
>>> convince Eric to disable hard resets for those chips entirely (and
>>> even sent a patch), but Eric never agreed to that.
>>> The soft-reset handler was a workaround for that problem, but with
>>> that patch the issue will reappear. The affected systems are still
>>> in production and probably will be for the next few years.
>>
>> I did not try it with dual-port 501030C parallel SCSI HBAs, but I
>> remember the exact issue you have described here.
>> I can add a check for ioc->bus_type == SAS to use HardReset there and
>> keep SoftReset in the other cases.
>> Would this be fine to avoid the issue you have mentioned?
>>
>> Please let me know your view on it, so that I can resend the patch.
>
> Yes, I think adding a test for SAS would be fine and would keep the
> workaround for 1030C chips.
>
> Thanks,
> Bernd

Just a quick note: as far as I know, the hard reset also disables both
ports on SAS (on 1068 chips, for instance), so one bad port can take
down the second port. I am unsure whether the port comes back up OK
after the hard reset.

Regards,
Ben.
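The two task-management decisions discussed in this thread can be modelled in a few lines. This is a userspace illustration built on stated assumptions, not mptfusion code. The types, the `tm_`/`TM_` names and the flag and version values are hypothetical placeholders that were not taken from the MPI headers. `tm_blocked_by_doorbell()` has the same shape as the patched DOORBELL_ACTIVE condition. `tm_pick_reset()` implements the bus-type check Kashyap proposes, which uses a hard reset only on SAS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values for illustration only; NOT copied from the MPI headers. */
#define TM_CAP_HIGH_PRI_Q   0x00000001u /* stands in for MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q */
#define TM_MPI_VERSION_1_05 0x0105u     /* stands in for MPI_VERSION_01_05 */

/* Hypothetical bus-type tags; the driver has its own definitions. */
enum tm_bus_type { TM_BUS_SPI, TM_BUS_FC, TM_BUS_SAS };

/* Which reset handler the task-management path would call. */
enum tm_reset_kind {
	TM_RESET_SOFT_HARD, /* stands in for calling mpt_Soft_Hard_ResetHandler() */
	TM_RESET_HARD       /* stands in for calling mpt_HardResetHandler() */
};

/* Hypothetical, reduced view of the adapter state used by both checks. */
struct tm_ioc {
	enum tm_bus_type bus_type;
	unsigned int capabilities; /* models ioc->facts.IOCCapabilities */
	unsigned int msg_version;  /* models ioc->facts.MsgVersion */
};

/* Same shape as the patched condition: DOORBELL_ACTIVE blocks task
 * management only when the IOC does not advertise a high-priority queue
 * together with an MPI message version of at least 1.05. */
static bool tm_blocked_by_doorbell(const struct tm_ioc *ioc, bool doorbell_active)
{
	assert(ioc != NULL);
	bool has_high_pri_q = (ioc->capabilities & TM_CAP_HIGH_PRI_Q) != 0 &&
			      ioc->msg_version >= TM_MPI_VERSION_1_05;
	return !has_high_pri_q && doorbell_active;
}

/* Policy proposed in the thread: hard reset from task-management context
 * only on SAS. Other bus types keep the soft/hard handler, which Bernd
 * describes as the workaround for the dual-port parallel SCSI HBAs. */
static enum tm_reset_kind tm_pick_reset(const struct tm_ioc *ioc)
{
	assert(ioc != NULL);
	return ioc->bus_type == TM_BUS_SAS ? TM_RESET_HARD : TM_RESET_SOFT_HARD;
}
```

Ben raises a caveat. If a hard reset on a 1068 SAS controller also takes down both ports, a choice keyed on bus type alone would not protect that case. It would need to depend on a per-chip property, which this sketch does not model.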