From: Benjamin ESTRABAUD <be@mpstor.com>
To: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Cc: "Desai, Kashyap" <Kashyap.Desai@lsi.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"Nandigama, Nagalakshmi" <Nagalakshmi.Nandigama@lsi.com>,
	"Prakash, Sathya" <Sathya.Prakash@lsi.com>,
	"Moore, Eric" <Eric.Moore@lsi.com>,
	"JBottomley@parallels.com" <JBottomley@parallels.com>
Subject: Re: [PATCH 04/05] mptfusion: Fix for device offline while doing aggressive HBA reset
Date: Wed, 10 Aug 2011 17:57:28 +0100
Message-ID: <4E42B878.4030103@mpstor.com>
In-Reply-To: <4E3AA2FC.9030805@itwm.fraunhofer.de>

On 04/08/11 14:47, Bernd Schubert wrote:
> On 08/04/2011 03:37 PM, Desai, Kashyap wrote:
>>
>>
>>> -----Original Message-----
>>> From: Bernd Schubert [mailto:bernd.schubert@itwm.fraunhofer.de]
>>> Sent: Thursday, August 04, 2011 6:39 PM
>>> To: Desai, Kashyap
>>> Cc: linux-scsi@vger.kernel.org; Nandigama, Nagalakshmi; Prakash, Sathya;
>>> Moore, Eric; JBottomley@parallels.com
>>> Subject: Re: [PATCH 04/05] mptfusion: Fix for device offline while doing
>>> aggressive HBA reset
>>>
>>> On 08/04/2011 01:13 PM, kashyap.desai@lsi.com wrote:
>>>> Issue:
>>>> A device goes offline when aggressive HBA resets are issued while a
>>>> utility is generating I/O.
>>>>
>>>> Root cause:
>>>> The FW goes into a bad state due to the aggressive resets, and a soft
>>>> reset does not help to recover it. The aggressive resets also open a
>>>> window in which the error handling thread can be kicked off while the
>>>> HBA is in a constant reset loop (as part of the aggressive reset test
>>>> case), which can cause the device to go offline.
>>>>
>>>> Changes:
>>>> 1. Added an extra check inside the eh_timed_out callback:
>>>>       if (ioc->ioc_reset_in_progress)
>>>>           rc = BLK_EH_RESET_TIMER;
>>>> 2. Removed the DOORBELL_ACTIVE check for SAS controllers from the task
>>>>    management context. Since SAS controllers use a high-priority queue
>>>>    for task management, this check is not required for them.
>>>> 3. Replaced the SoftReset call with a HardReset call in the task
>>>>    management context.
>>>
>>> [...]
>>>
>>>
>>>> --- a/drivers/message/fusion/mptscsih.c
>>>> +++ b/drivers/message/fusion/mptscsih.c
>>>> @@ -1630,7 +1630,13 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun,
>>>>            return 0;
>>>>        }
>>>>
>>>> -    if (ioc_raw_state & MPI_DOORBELL_ACTIVE) {
>>>> +    /* DOORBELL ACTIVE check is not required if
>>>> +    *  MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q is supported.
>>>> +    */
>>>> +
>>>> +    if (!((ioc->facts.IOCCapabilities & MPI_IOCFACTS_CAPABILITY_HIGH_PRI_Q)
>>>> +        && (ioc->facts.MsgVersion >= MPI_VERSION_01_05)) &&
>>>> +        (ioc_raw_state & MPI_DOORBELL_ACTIVE)) {
>>>>            printk(MYIOC_s_WARN_FMT
>>>>                "TaskMgmt type=%x: ioc_state: "
>>>>                "DOORBELL_ACTIVE (0x%x)!\n",
>>>> @@ -1729,7 +1735,7 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd, u8 type, u8 channel, u8 id, int lun,
>>>>            printk(MYIOC_s_WARN_FMT
>>>>                   "Issuing Reset from %s!! doorbell=0x%08x\n",
>>>>                   ioc->name, __func__, mpt_GetIocState(ioc, 0));
>>>> -        retval = mpt_Soft_Hard_ResetHandler(ioc, CAN_SLEEP);
>>>> +        retval = mpt_HardResetHandler(ioc, CAN_SLEEP);
>>>>            mpt_free_msg_frame(ioc, mf);
>>>>        }
>>>
>>> Have you ever tested that with dual-port 501030C parallel SCSI HBAs? The
>>> hard reset with those HBAs will reset *both* ports, and eventually *both*
>>> ports will fail. A couple of years ago I tried to convince Eric to
>>> disable hard resets for those chips entirely (and even sent a patch), but
>>> Eric never agreed to that.
>>> The soft-reset handler was a workaround for that problem, but with this
>>> patch the issue will reappear. The affected systems are still in
>>> production and probably will be for the next few years.
>>
>> I have not tried it with the dual-port 501030C parallel SCSI HBA, but I
>> remember the exact issue you describe here.
>> I can add a check for ioc->bus_type == SAS so that SAS controllers use
>> HardReset, and continue to use SoftReset in all other cases.
>> Would this be fine to avoid the issue you mentioned?
>>
>> Please let me know your view on it, so that I can resend the patch.
>
> Yes, I think adding a test for SAS would be fine and would keep the
> workaround for 1030C chips.
>
> Thanks,
> Bernd

Just a quick note: as far as I know, the hard reset will also disable
both ports on SAS (on 1068 chips, for instance), causing one bad port to
take down the second port. I am unsure whether the port comes back up OK
after the hard reset.

Regards,
Ben.

Thread overview: 5+ messages
2011-08-04 11:13 [PATCH 04/05] mptfusion: Fix for device offline while doing aggressive HBA reset kashyap.desai
2011-08-04 13:09 ` Bernd Schubert
2011-08-04 13:37   ` Desai, Kashyap
2011-08-04 13:47     ` Bernd Schubert
2011-08-10 16:57       ` Benjamin ESTRABAUD [this message]