* [PATCH] [SCSI] mpt2sas: fix for driver fails EEH recovery from injected pci bus error
@ 2012-12-17 21:58 Sreekanth Reddy
  2012-12-17 13:12 ` Tomas Henzl
From: Sreekanth Reddy @ 2012-12-17 21:58 UTC
To: jejb, sreekanth.reddy, Nagalakshmi.Nandigama, JBottomley
Cc: linux-scsi, sathya.prakash

This patch stops the driver from invoking the kthread (which removes the
dead ioc) for some time once EEH recovery has started.

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
---

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
index ffd85c5..2349531 100755
--- a/drivers/scsi/mpt2sas/mpt2sas_base.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
@@ -155,7 +155,7 @@ _base_fault_reset_work(struct work_struct *work)
 	struct task_struct *p;
 
 	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
-	if (ioc->shost_recovery)
+	if (ioc->shost_recovery || ioc->pci_error_recovery)
 		goto rearm_timer;
 	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
 
@@ -164,6 +164,20 @@ _base_fault_reset_work(struct work_struct *work)
 		printk(MPT2SAS_INFO_FMT "%s : SAS host is non-operational !!!!\n",
 		    ioc->name, __func__);
 
+		/* It may be possible that EEH recovery can resolve some of
+		 * pci bus failure issues rather removing the dead ioc function
+		 * by considering controller is in a non-operational state. So
+		 * here priority is given to the EEH recovery. If it doesn't
+		 * not resolve this issue, mpt2sas driver will consider this
+		 * controller to non-operational state and remove the dead ioc
+		 * function.
+		 */
+		if (ioc->non_operational_loop++ < 5) {
+			spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock,
+			    flags);
+			goto rearm_timer;
+		}
+
 		/*
 		 * Call _scsih_flush_pending_cmds callback so that we flush all
 		 * pending commands back to OS. This call is required to aovid
@@ -4386,6 +4400,7 @@ mpt2sas_base_attach(struct MPT2SAS_ADAPTER *ioc)
 	if (missing_delay[0] != -1 && missing_delay[1] != -1)
 		_base_update_missing_delay(ioc, missing_delay[0],
 		    missing_delay[1]);
+	ioc->non_operational_loop = 0;
 
 	return 0;
 
diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
index 543d8d6..c6ee7aa 100755
--- a/drivers/scsi/mpt2sas/mpt2sas_base.h
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
@@ -835,6 +835,7 @@ struct MPT2SAS_ADAPTER {
 	u16 cpu_msix_table_sz;
 	u32 ioc_reset_count;
 	MPT2SAS_FLUSH_RUNNING_CMDS schedule_dead_ioc_flush_running_cmds;
+	u32 non_operational_loop;
 
 	/* internal commands, callback index */
 	u8 scsi_io_cb_idx;

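The deferral logic in the patch above can be modelled in user space. The following is a minimal sketch, not driver code: `struct model_ioc`, `model_fault_poll()`, `polls_until_removal()`, the `operational` flag and the `MAX_NON_OPERATIONAL_POLLS` macro are hypothetical names introduced only for illustration, and locking and timer re-arming are left out. Only the two flag checks and the `non_operational_loop++ < 5` test are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the literal 5 in the patch; the macro name is hypothetical. */
#define MAX_NON_OPERATIONAL_POLLS 5

enum poll_action { POLL_REARM, POLL_REMOVE_IOC };

/* Hypothetical stand-in for the few MPT2SAS_ADAPTER fields involved. */
struct model_ioc {
	bool shost_recovery;
	bool pci_error_recovery;
	bool operational; /* stand-in for the driver's doorbell state check */
	unsigned int non_operational_loop;
};

/* One pass of the periodic check, reduced to the decision the patch changes. */
static enum poll_action model_fault_poll(struct model_ioc *ioc)
{
	assert(ioc != NULL);

	/* Host reset or PCI error recovery in flight: look again later. */
	if (ioc->shost_recovery || ioc->pci_error_recovery)
		return POLL_REARM;

	if (!ioc->operational) {
		/* Defer removal for the first five non-operational polls. */
		if (ioc->non_operational_loop++ < MAX_NON_OPERATIONAL_POLLS)
			return POLL_REARM;
		return POLL_REMOVE_IOC;
	}

	return POLL_REARM;
}

/*
 * Returns the 1-based poll on which removal is chosen,
 * or 0 if removal is never chosen within max_polls.
 */
static unsigned int polls_until_removal(struct model_ioc *ioc,
					unsigned int max_polls)
{
	for (unsigned int i = 1; i <= max_polls; i++)
		if (model_fault_poll(ioc) == POLL_REMOVE_IOC)
			return i;
	return 0;
}
```

In this model a controller that stays non-operational is kept for five polls and removal is chosen on the sixth, while a controller with `pci_error_recovery` set is never removed and its counter is not touched. How long five polls last in the real driver depends on the timer interval, which the thread does not state.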
* Re: [PATCH] [SCSI] mpt2sas: fix for driver fails EEH recovery from injected pci bus error
  2012-12-17 21:58 [PATCH] [SCSI] mpt2sas: fix for driver fails EEH recovery from injected pci bus error Sreekanth Reddy
@ 2012-12-17 13:12 ` Tomas Henzl
  2012-12-18  5:07   ` Reddy, Sreekanth
From: Tomas Henzl @ 2012-12-17 13:12 UTC
To: Sreekanth Reddy
Cc: jejb, Nagalakshmi.Nandigama, JBottomley, linux-scsi, sathya.prakash

On 12/17/2012 10:58 PM, Sreekanth Reddy wrote:
> This patch stops the driver from invoking the kthread (which removes the
> dead ioc) for some time once EEH recovery has started.

Thank you for posting this; the issue we have seen is resolved now.
Shouldn't an additional initialization be added, so that
non_operational_loop is reset again after a transient event?

Tomas

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
index fd3b3d7..480111c 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
@@ -208,6 +208,8 @@ _base_fault_reset_work(struct work_struct *work)
 		return; /* don't rearm timer */
 	}
 
+	ioc->non_operational_loop = 0;
+
 	if ((doorbell & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT) {
 		rc = mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
 		    FORCE_BIG_HAMMER);

>
> Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
> ---
>
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
> index ffd85c5..2349531 100755
> --- a/drivers/scsi/mpt2sas/mpt2sas_base.c
> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
> @@ -155,7 +155,7 @@ _base_fault_reset_work(struct work_struct *work)
>  	struct task_struct *p;
>
>  	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
> -	if (ioc->shost_recovery)
> +	if (ioc->shost_recovery || ioc->pci_error_recovery)
>  		goto rearm_timer;
>  	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
>
> @@ -164,6 +164,20 @@ _base_fault_reset_work(struct work_struct *work)
>  		printk(MPT2SAS_INFO_FMT "%s : SAS host is non-operational !!!!\n",
>  		    ioc->name, __func__);
>
> +		/* It may be possible that EEH recovery can resolve some of
> +		 * pci bus failure issues rather removing the dead ioc function
> +		 * by considering controller is in a non-operational state. So
> +		 * here priority is given to the EEH recovery. If it doesn't
> +		 * not resolve this issue, mpt2sas driver will consider this
> +		 * controller to non-operational state and remove the dead ioc
> +		 * function.
> +		 */
> +		if (ioc->non_operational_loop++ < 5) {
> +			spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock,
> +			    flags);
> +			goto rearm_timer;
> +		}
> +
>  		/*
>  		 * Call _scsih_flush_pending_cmds callback so that we flush all
>  		 * pending commands back to OS. This call is required to aovid
> @@ -4386,6 +4400,7 @@ mpt2sas_base_attach(struct MPT2SAS_ADAPTER *ioc)
>  	if (missing_delay[0] != -1 && missing_delay[1] != -1)
>  		_base_update_missing_delay(ioc, missing_delay[0],
>  		    missing_delay[1]);
> +	ioc->non_operational_loop = 0;
>
>  	return 0;
>
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
> index 543d8d6..c6ee7aa 100755
> --- a/drivers/scsi/mpt2sas/mpt2sas_base.h
> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
> @@ -835,6 +835,7 @@ struct MPT2SAS_ADAPTER {
>  	u16 cpu_msix_table_sz;
>  	u32 ioc_reset_count;
>  	MPT2SAS_FLUSH_RUNNING_CMDS schedule_dead_ioc_flush_running_cmds;
> +	u32 non_operational_loop;
>
>  	/* internal commands, callback index */
>  	u8 scsi_io_cb_idx;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

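To see why the extra reset suggested above matters, the counter can be replayed over a trace of poll results. This is a rough sketch under stated assumptions: `first_removal_poll()` and its parameters are hypothetical, a `true` entry means "this poll found the controller operational", and the reset is applied on every such poll, which is how the placement in the follow-up diff reads but is not confirmed by the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the literal 5 in the patch; the macro name is hypothetical. */
#define MAX_NON_OPERATIONAL_POLLS 5

/*
 * Replays a trace of poll results (true = controller looked operational).
 * Returns the 1-based poll on which removal would be chosen, or 0 if never.
 * reset_on_healthy selects whether an operational poll clears the counter.
 */
static size_t first_removal_poll(const bool *operational, size_t n,
				 bool reset_on_healthy)
{
	unsigned int non_operational_loop = 0;

	assert(operational != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!operational[i]) {
			/* Same test as in the patch: tolerate five bad polls. */
			if (non_operational_loop++ < MAX_NON_OPERATIONAL_POLLS)
				continue;
			return i + 1;
		}
		/* The suggested addition: forget earlier transient failures. */
		if (reset_on_healthy)
			non_operational_loop = 0;
	}
	return 0;
}
```

For a trace of three bad polls, one good poll and three more bad polls, the version without the reset chooses removal on the seventh poll because the two short outages add up, while the version with the reset never chooses removal. Without the reset the counter is only zeroed in `mpt2sas_base_attach()`, so unrelated transient events accumulate over the lifetime of the adapter.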
* RE: [PATCH] [SCSI] mpt2sas: fix for driver fails EEH recovery from injected pci bus error
  2012-12-17 13:12 ` Tomas Henzl
@ 2012-12-18  5:07   ` Reddy, Sreekanth
  2012-12-18 13:36     ` Tomas Henzl
From: Reddy, Sreekanth @ 2012-12-18 5:07 UTC
To: Tomas Henzl
Cc: jejb@kernel.org, Nandigama, Nagalakshmi, JBottomley@Parallels.com, linux-scsi@vger.kernel.org, Prakash, Sathya

Yes Tomas, we need to reset non_operational_loop to zero after the transient event.

Thanks,
Sreekanth.

-----Original Message-----
From: Tomas Henzl [mailto:thenzl@redhat.com]
Sent: Monday, December 17, 2012 6:43 PM
To: Reddy, Sreekanth
Cc: jejb@kernel.org; Nandigama, Nagalakshmi; JBottomley@Parallels.com; linux-scsi@vger.kernel.org; Prakash, Sathya
Subject: Re: [PATCH] [SCSI] mpt2sas: fix for driver fails EEH recovery from injected pci bus error

On 12/17/2012 10:58 PM, Sreekanth Reddy wrote:
> This patch stops the driver from invoking the kthread (which removes the
> dead ioc) for some time once EEH recovery has started.

Thank you for posting this; the issue we have seen is resolved now.
Shouldn't an additional initialization be added, so that
non_operational_loop is reset again after a transient event?

Tomas

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
index fd3b3d7..480111c 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
@@ -208,6 +208,8 @@ _base_fault_reset_work(struct work_struct *work)
 		return; /* don't rearm timer */
 	}
 
+	ioc->non_operational_loop = 0;
+
 	if ((doorbell & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT) {
 		rc = mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
 		    FORCE_BIG_HAMMER);

>
> Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
> ---
>
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
> index ffd85c5..2349531 100755
> --- a/drivers/scsi/mpt2sas/mpt2sas_base.c
> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
> @@ -155,7 +155,7 @@ _base_fault_reset_work(struct work_struct *work)
>  	struct task_struct *p;
>
>  	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
> -	if (ioc->shost_recovery)
> +	if (ioc->shost_recovery || ioc->pci_error_recovery)
>  		goto rearm_timer;
>  	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
>
> @@ -164,6 +164,20 @@ _base_fault_reset_work(struct work_struct *work)
>  		printk(MPT2SAS_INFO_FMT "%s : SAS host is non-operational !!!!\n",
>  		    ioc->name, __func__);
>
> +		/* It may be possible that EEH recovery can resolve some of
> +		 * pci bus failure issues rather removing the dead ioc function
> +		 * by considering controller is in a non-operational state. So
> +		 * here priority is given to the EEH recovery. If it doesn't
> +		 * not resolve this issue, mpt2sas driver will consider this
> +		 * controller to non-operational state and remove the dead ioc
> +		 * function.
> +		 */
> +		if (ioc->non_operational_loop++ < 5) {
> +			spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock,
> +			    flags);
> +			goto rearm_timer;
> +		}
> +
>  		/*
>  		 * Call _scsih_flush_pending_cmds callback so that we flush all
>  		 * pending commands back to OS. This call is required to aovid
> @@ -4386,6 +4400,7 @@ mpt2sas_base_attach(struct MPT2SAS_ADAPTER *ioc)
>  	if (missing_delay[0] != -1 && missing_delay[1] != -1)
>  		_base_update_missing_delay(ioc, missing_delay[0],
>  		    missing_delay[1]);
> +	ioc->non_operational_loop = 0;
>
>  	return 0;
>
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
> index 543d8d6..c6ee7aa 100755
> --- a/drivers/scsi/mpt2sas/mpt2sas_base.h
> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
> @@ -835,6 +835,7 @@ struct MPT2SAS_ADAPTER {
>  	u16 cpu_msix_table_sz;
>  	u32 ioc_reset_count;
>  	MPT2SAS_FLUSH_RUNNING_CMDS schedule_dead_ioc_flush_running_cmds;
> +	u32 non_operational_loop;
>
>  	/* internal commands, callback index */
>  	u8 scsi_io_cb_idx;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

* Re: [PATCH] [SCSI] mpt2sas: fix for driver fails EEH recovery from injected pci bus error
  2012-12-18  5:07   ` Reddy, Sreekanth
@ 2012-12-18 13:36     ` Tomas Henzl
From: Tomas Henzl @ 2012-12-18 13:36 UTC
To: Reddy, Sreekanth
Cc: jejb@kernel.org, Nandigama, Nagalakshmi, JBottomley@Parallels.com, linux-scsi@vger.kernel.org, Prakash, Sathya

On 12/18/2012 06:07 AM, Reddy, Sreekanth wrote:
> Yes Tomas, we need to reset non_operational_loop to zero after the transient event.

OK, so let me repost a V2 of the whole patch.

>
> Thanks,
> Sreekanth.
>
> -----Original Message-----
> From: Tomas Henzl [mailto:thenzl@redhat.com]
> Sent: Monday, December 17, 2012 6:43 PM
> To: Reddy, Sreekanth
> Cc: jejb@kernel.org; Nandigama, Nagalakshmi; JBottomley@Parallels.com; linux-scsi@vger.kernel.org; Prakash, Sathya
> Subject: Re: [PATCH] [SCSI] mpt2sas: fix for driver fails EEH recovery from injected pci bus error
>
> On 12/17/2012 10:58 PM, Sreekanth Reddy wrote:
>> This patch stops the driver from invoking the kthread (which removes the
>> dead ioc) for some time once EEH recovery has started.
> Thank you for posting this; the issue we have seen is resolved now.
> Shouldn't an additional initialization be added, so that
> non_operational_loop is reset again after a transient event?
>
> Tomas
>
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
> index fd3b3d7..480111c 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_base.c
> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
> @@ -208,6 +208,8 @@ _base_fault_reset_work(struct work_struct *work)
>  		return; /* don't rearm timer */
>  	}
>
> +	ioc->non_operational_loop = 0;
> +
>  	if ((doorbell & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT) {
>  		rc = mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
>  		    FORCE_BIG_HAMMER);
>
>
>
>> Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
>> ---
>>
>> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
>> index ffd85c5..2349531 100755
>> --- a/drivers/scsi/mpt2sas/mpt2sas_base.c
>> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
>> @@ -155,7 +155,7 @@ _base_fault_reset_work(struct work_struct *work)
>>  	struct task_struct *p;
>>
>>  	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
>> -	if (ioc->shost_recovery)
>> +	if (ioc->shost_recovery || ioc->pci_error_recovery)
>>  		goto rearm_timer;
>>  	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
>>
>> @@ -164,6 +164,20 @@ _base_fault_reset_work(struct work_struct *work)
>>  		printk(MPT2SAS_INFO_FMT "%s : SAS host is non-operational !!!!\n",
>>  		    ioc->name, __func__);
>>
>> +		/* It may be possible that EEH recovery can resolve some of
>> +		 * pci bus failure issues rather removing the dead ioc function
>> +		 * by considering controller is in a non-operational state. So
>> +		 * here priority is given to the EEH recovery. If it doesn't
>> +		 * not resolve this issue, mpt2sas driver will consider this
>> +		 * controller to non-operational state and remove the dead ioc
>> +		 * function.
>> +		 */
>> +		if (ioc->non_operational_loop++ < 5) {
>> +			spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock,
>> +			    flags);
>> +			goto rearm_timer;
>> +		}
>> +
>>  		/*
>>  		 * Call _scsih_flush_pending_cmds callback so that we flush all
>>  		 * pending commands back to OS. This call is required to aovid
>> @@ -4386,6 +4400,7 @@ mpt2sas_base_attach(struct MPT2SAS_ADAPTER *ioc)
>>  	if (missing_delay[0] != -1 && missing_delay[1] != -1)
>>  		_base_update_missing_delay(ioc, missing_delay[0],
>>  		    missing_delay[1]);
>> +	ioc->non_operational_loop = 0;
>>
>>  	return 0;
>>
>> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
>> index 543d8d6..c6ee7aa 100755
>> --- a/drivers/scsi/mpt2sas/mpt2sas_base.h
>> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
>> @@ -835,6 +835,7 @@ struct MPT2SAS_ADAPTER {
>>  	u16 cpu_msix_table_sz;
>>  	u32 ioc_reset_count;
>>  	MPT2SAS_FLUSH_RUNNING_CMDS schedule_dead_ioc_flush_running_cmds;
>> +	u32 non_operational_loop;
>>
>>  	/* internal commands, callback index */
>>  	u8 scsi_io_cb_idx;
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

end of thread, other threads:[~2012-12-18 13:37 UTC | newest]

Thread overview: 4+ messages
2012-12-17 21:58 [PATCH] [SCSI] mpt2sas: fix for driver fails EEH recovery from injected pci bus error Sreekanth Reddy
2012-12-17 13:12 ` Tomas Henzl
2012-12-18  5:07   ` Reddy, Sreekanth
2012-12-18 13:36     ` Tomas Henzl