* [PATCH 0/4 V2] mpt fusion error handler patches
@ 2008-09-23 13:16 Bernd Schubert
2008-09-23 13:20 ` [ PATCH 1/4 ] mpt fusion SoftReset handler Bernd Schubert
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Bernd Schubert @ 2008-09-23 13:16 UTC
To: Linux SCSI Mailing List, Eric Moore, Sathya Prakash,
James Bottomley
Cc: DL-MPTFusionLinux
Hello,
Here is version 2 of the patches; version 1 can be found here:
http://kerneltrap.org/mailarchive/linux-scsi/2008/9/12/3279954
Unlike the previous series, this one no longer removes mptscsih_TMHandler(),
since that function actually does the right thing; it only caused the
HardReset function to be activated, and that was what caused the real trouble.
I also merged the soft and hard reset calls into one main call. From this main
call it is then also possible, in general, to avoid calling the hard reset
handler.
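The control flow of such a single entry point can be shown with a minimal userspace sketch. It is not the driver code. `struct fake_ioc`, `fake_soft_reset()`, `fake_hard_reset()` and the `allow_hard` flag are hypothetical stand-ins for MPT_ADAPTER and the two driver handlers, which return 0 on success and -1 on failure.

```c
#include <stdbool.h>

/* Hypothetical stand-in for MPT_ADAPTER: only what the sketch needs. */
struct fake_ioc {
	bool soft_ok;		/* does the soft reset succeed? */
	bool hard_ok;		/* does the hard reset succeed? */
	bool allow_hard;	/* may we escalate to a hard reset? */
	int soft_calls;
	int hard_calls;
};

/* Stubs for the soft/hard reset handlers: 0 on success, -1 on failure. */
static int fake_soft_reset(struct fake_ioc *ioc)
{
	ioc->soft_calls++;
	return ioc->soft_ok ? 0 : -1;
}

static int fake_hard_reset(struct fake_ioc *ioc)
{
	ioc->hard_calls++;
	return ioc->hard_ok ? 0 : -1;
}

/* Single entry point: try the cheap soft reset first and escalate to
 * the hard reset only if it failed and escalation is permitted. */
static int soft_hard_reset(struct fake_ioc *ioc)
{
	int rc = fake_soft_reset(ioc);

	if (rc != 0 && ioc->allow_hard)
		rc = fake_hard_reset(ioc);
	return rc;
}
```

Routing every caller through one function keeps the escalation policy (here the `allow_hard` flag) in a single place instead of at each call site.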
There is still one issue left: while mptscsih_IssueTaskMgmt() succeeds for task
aborts in the 4.x series of the driver, it still fails in the in-kernel
version. My guess is that this is related to the completion calls. The
in-kernel driver uses mptscsih_tm_wait_for_completion(), while the 4.x series,
just like the in-kernel mptsas, uses the generic wait_for_completion_timeout()
handler.
But switching to that would mean a lot of work. Without being sure that this
is really the cause of the trouble, I won't spend my time on it.
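The completion-with-timeout pattern mentioned above can be sketched in userspace with POSIX threads. The names `struct ucompletion`, `ucomplete()` and `uwait_for_completion_timeout()` are hypothetical analogues of the kernel's `complete()` and `wait_for_completion_timeout()`. The kernel call returns the remaining jiffies (0 on timeout). This sketch takes milliseconds and returns true or false instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical userspace analogue of the kernel's struct completion. */
struct ucompletion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int done;	/* number of not yet consumed completions */
};

static void ucompletion_init(struct ucompletion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* Like complete(): record one completion and wake one waiter. */
static void ucomplete(struct ucompletion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Wait at most timeout_ms; true if a completion was consumed,
 * false if the timeout expired first. */
static bool uwait_for_completion_timeout(struct ucompletion *c, long timeout_ms)
{
	struct timespec deadline;
	int err = 0;
	bool completed;

	/* pthread_cond_timedwait() wants an absolute CLOCK_REALTIME deadline */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* loop to tolerate spurious wakeups; stop on timeout or error */
	while (c->done == 0 && err == 0)
		err = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	completed = c->done > 0;
	if (completed)
		c->done--;
	pthread_mutex_unlock(&c->lock);
	return completed;
}
```

In a driver, the waiting side would typically be the thread that issued the request, and the completing side would be the reply handler. The sketch only demonstrates the timeout semantics.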
Cheers,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
* [ PATCH 1/4 ] mpt fusion SoftReset handler
2008-09-23 13:16 [PATCH 0/4 V2] mpt fusion error handler patches Bernd Schubert
@ 2008-09-23 13:20 ` Bernd Schubert
2008-10-30 15:37 ` Prakash, Sathya
2008-09-23 13:26 ` [ PATCH 1/4 ] mpt fusion disable hard resets for 53C1030 based devices Bernd Schubert
` (2 subsequent siblings)
3 siblings, 1 reply; 18+ messages in thread
From: Bernd Schubert @ 2008-09-23 13:20 UTC
To: Linux SCSI Mailing List
Cc: Eric Moore, Sathya Prakash, James Bottomley, DL-MPTFusionLinux
On dual-port 53C1030 based HBAs such as the LSI22320R, the hard reset handler
causes DID_SOFT_ERROR for innocent devices on the second port.
Introduce an mpt_SoftResetHandler(), which does not cause this issue, and
slightly improve mpt_HardResetHandler().
This is mostly a backport from the fusion-4.x driver available from LSI.
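The Message Unit Reset sequence in mpt_SoftResetHandler() can be modelled as a small simulation. This is a simplified sketch under several assumptions:

- `struct sim_ioc` and the step functions are hypothetical stubs.
- PrimeIocFifos(), SendIocInit() and SendEventNotification() are collapsed into one `reinit_ioc()` step.
- Interrupt masking and the reset-handler callbacks are omitted.
- No delay is taken between the GetIocFacts retries.

```c
#include <stdbool.h>

/* Hypothetical IOC states, loosely mirroring MPI_IOC_STATE_*. */
enum ioc_state { IOC_READY, IOC_OPERATIONAL, IOC_FAULT, IOC_RESET };

/* Hypothetical simulated adapter; the *fail* fields inject failures. */
struct sim_ioc {
	enum ioc_state state;
	bool reset_in_progress;
	bool active;
	int facts_failures;	/* how many GetIocFacts attempts fail */
	bool mur_fails;
	bool init_fails;
};

static int send_message_unit_reset(struct sim_ioc *ioc)
{
	if (ioc->mur_fails)
		return -1;
	ioc->state = IOC_READY;	/* a MUR leaves the IOC in READY */
	return 0;
}

static int get_ioc_facts(struct sim_ioc *ioc)
{
	if (ioc->facts_failures > 0) {
		ioc->facts_failures--;
		return -1;
	}
	return 0;
}

/* Stands for PrimeIocFifos() + SendIocInit() + SendEventNotification(). */
static int reinit_ioc(struct sim_ioc *ioc)
{
	if (ioc->init_fails)
		return -1;
	ioc->state = IOC_OPERATIONAL;
	return 0;
}

static int soft_reset(struct sim_ioc *ioc)
{
	int rc = -1;	/* stays -1 until every step has passed */
	int tries;

	/* a faulted or resetting IOC needs a hard reset instead */
	if (ioc->state == IOC_FAULT || ioc->state == IOC_RESET)
		return -1;
	if (ioc->reset_in_progress)	/* someone else is already resetting */
		return -1;
	ioc->reset_in_progress = true;
	ioc->active = false;

	if (send_message_unit_reset(ioc) != 0)
		goto out;
	if (ioc->state != IOC_READY)
		goto out;
	for (tries = 0; tries < 5; tries++) {
		if (get_ioc_facts(ioc) == 0)
			break;
		/* the driver waits 100 ms here (msleep() or mdelay()) */
	}
	if (tries == 5)
		goto out;
	if (reinit_ioc(ioc) != 0)
		goto out;

	ioc->active = true;
	rc = 0;
out:
	ioc->reset_in_progress = false;
	return rc;
}
```

The sketch deliberately keeps `rc` at -1 until every step has passed. In the mptbase.c hunk, `rc` still holds the 0 returned by SendIocReset() when the IOC is then found not to be READY. That path therefore appears to report SUCCESS without setting `ioc->active`, which would also keep mpt_SoftHardResetHandler() from escalating.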
Signed-off-by: Bernd Schubert <bs@q-leap.de>
drivers/message/fusion/mptbase.c | 211 ++++++++++++++++++++++++----
drivers/message/fusion/mptbase.h | 11 +
drivers/message/fusion/mptctl.c | 7
drivers/message/fusion/mptsas.c | 4
drivers/message/fusion/mptscsih.c | 35 ++--
5 files changed, 218 insertions(+), 50 deletions(-)
Index: linux-2.6.26/drivers/message/fusion/mptbase.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptbase.c
+++ linux-2.6.26/drivers/message/fusion/mptbase.c
@@ -5858,7 +5858,7 @@ mpt_timer_expired(unsigned long data)
dcprintk(ioc, printk(MYIOC_s_DEBUG_FMT "mpt_timer_expired! \n", ioc->name));
/* Perform a FW reload */
- if (mpt_HardResetHandler(ioc, NO_SLEEP) < 0)
+ if (mpt_SoftHardResetHandler(ioc, NO_SLEEP) < 0)
printk(MYIOC_s_WARN_FMT "Firmware Reload FAILED!\n", ioc->name);
/* No more processing.
@@ -6232,6 +6232,129 @@ mpt_print_ioc_summary(MPT_ADAPTER *ioc,
/*
* Reset Handling
*/
+
+/**
+ * mpt_SoftResetHandler - Issues a less expensive reset
+ * @ioc: Pointer to MPT_ADAPTER structure
+ * @sleepFlag: Indicates if sleep or schedule must be called.
+ *
+ *
+ * Returns 0 for SUCCESS or -1 if FAILED.
+ *
+ * Message Unit Reset - instructs the IOC to reset the Reply Post and
+ * Free FIFO's. All the Message Frames on Reply Free FIFO are discarded.
+ * All posted buffers are freed, and event notification is turned off.
+ * IOC doesn't reply to any outstanding request. This will transfer IOC
+ * to READY state.
+ **/
+static int
+mpt_SoftResetHandler(MPT_ADAPTER *ioc, int sleepFlag)
+{
+ int rc;
+ int ii;
+ u8 cb_idx;
+ unsigned long flags;
+ u32 ioc_state;
+ unsigned long time_count;
+
+ dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "SoftResetHandler Entered!\n",
+ ioc->name));
+
+ ioc_state = mpt_GetIocState(ioc, 0) & MPI_IOC_STATE_MASK;
+ if (ioc_state == MPI_IOC_STATE_FAULT ||
+ ioc_state == MPI_IOC_STATE_RESET) {
+ dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT
+ "skipping, either in FAULT or RESET state!\n", ioc->name));
+ return -1;
+ }
+
+ spin_lock_irqsave(&ioc->diagLock, flags);
+ if (ioc->ioc_reset_in_progress) {
+ spin_unlock_irqrestore(&ioc->diagLock, flags);
+ return -1;
+ }
+ ioc->ioc_reset_in_progress = 1;
+ spin_unlock_irqrestore(&ioc->diagLock, flags);
+
+ rc = -1;
+
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx])
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_SETUP_RESET);
+ }
+
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx])
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_PRE_RESET);
+ }
+
+ /* Disable reply interrupts (also blocks FreeQ) */
+ CHIPREG_WRITE32(&ioc->chip->IntMask, 0xFFFFFFFF);
+ ioc->active = 0;
+ time_count = jiffies;
+ rc = SendIocReset(ioc, MPI_FUNCTION_IOC_MESSAGE_UNIT_RESET, sleepFlag);
+ if (rc != 0)
+ goto out;
+ ioc_state = mpt_GetIocState(ioc, 0) & MPI_IOC_STATE_MASK;
+ if (ioc_state != MPI_IOC_STATE_READY)
+ goto out;
+
+ for (ii = 0; ii < 5; ii++) {
+ /* Get IOC facts! Allow 5 retries */
+ rc = GetIocFacts(ioc, sleepFlag, MPT_HOSTEVENT_IOC_RECOVER);
+ if (rc == 0)
+ break;
+ if (sleepFlag == CAN_SLEEP)
+ msleep(100);
+ else
+ mdelay(100);
+ }
+ if (ii == 5)
+ goto out;
+
+ rc = PrimeIocFifos(ioc);
+ if (rc != 0)
+ goto out;
+
+ rc = SendIocInit(ioc, sleepFlag);
+ if (rc != 0)
+ goto out;
+
+ rc = SendEventNotification(ioc, 1);
+ if (rc != 0)
+ goto out;
+
+ if (ioc->hard_resets < -1)
+ ioc->hard_resets++;
+
+ /*
+ * At this point, we know soft reset succeeded.
+ */
+
+ ioc->active = 1;
+ CHIPREG_WRITE32(&ioc->chip->IntMask, MPI_HIM_DIM);
+
+ out:
+ spin_lock_irqsave(&ioc->diagLock, flags);
+ ioc->ioc_reset_in_progress = 0;
+ ioc->taskmgmt_quiesce_io = 0;
+ ioc->taskmgmt_in_progress = 0;
+ spin_unlock_irqrestore(&ioc->diagLock, flags);
+
+ if (ioc->active) { /* otherwise, hard reset coming */
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx])
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_POST_RESET);
+ }
+ }
+
+ printk(MYIOC_s_INFO_FMT "SoftResetHandler: completed (%d seconds): %s\n",
+ ioc->name, jiffies_to_msecs(jiffies - time_count)/1000,
+ ((rc == 0) ? "SUCCESS" : "FAILED"));
+
+ return rc;
+}
+
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/**
* mpt_HardResetHandler - Generic reset handler
@@ -6253,9 +6376,10 @@ int
mpt_HardResetHandler(MPT_ADAPTER *ioc, int sleepFlag)
{
int rc;
+ u8 cb_idx;
unsigned long flags;
+ unsigned long time_count;
- dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "HardResetHandler Entered!\n", ioc->name));
#ifdef MFCNT
printk(MYIOC_s_INFO_FMT "HardResetHandler Entered!\n", ioc->name);
printk("MF count 0x%x !\n", ioc->mfcnt);
@@ -6265,12 +6389,15 @@ mpt_HardResetHandler(MPT_ADAPTER *ioc, i
* mpt_do_ioc_recovery at any instant in time.
*/
spin_lock_irqsave(&ioc->diagLock, flags);
- if ((ioc->diagPending) || (ioc->alt_ioc && ioc->alt_ioc->diagPending)){
+ if (ioc->ioc_reset_in_progress) {
spin_unlock_irqrestore(&ioc->diagLock, flags);
return 0;
} else {
ioc->diagPending = 1;
}
+ ioc->ioc_reset_in_progress = 1;
+ if (ioc->alt_ioc)
+ ioc->alt_ioc->ioc_reset_in_progress = 1;
spin_unlock_irqrestore(&ioc->diagLock, flags);
/* FIXME: If do_ioc_recovery fails, repeat....
@@ -6281,42 +6408,71 @@ mpt_HardResetHandler(MPT_ADAPTER *ioc, i
* Prevents timeouts occurring during a diagnostic reset...very bad.
* For all other protocol drivers, this is a no-op.
*/
- {
- u8 cb_idx;
- int r = 0;
-
- for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
- if (MptResetHandlers[cb_idx]) {
- dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "Calling IOC reset_setup handler #%d\n",
- ioc->name, cb_idx));
- r += mpt_signal_reset(cb_idx, ioc, MPT_IOC_SETUP_RESET);
- if (ioc->alt_ioc) {
- dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "Calling alt-%s setup reset handler #%d\n",
- ioc->name, ioc->alt_ioc->name, cb_idx));
- r += mpt_signal_reset(cb_idx, ioc->alt_ioc, MPT_IOC_SETUP_RESET);
- }
- }
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx]) {
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_SETUP_RESET);
+ if (ioc->alt_ioc)
+ mpt_signal_reset(cb_idx, ioc->alt_ioc, MPT_IOC_SETUP_RESET);
}
}
- if ((rc = mpt_do_ioc_recovery(ioc, MPT_HOSTEVENT_IOC_RECOVER, sleepFlag)) != 0) {
- printk(MYIOC_s_WARN_FMT "Cannot recover rc = %d!\n", ioc->name, rc);
+ time_count = jiffies;
+ rc = mpt_do_ioc_recovery(ioc, MPT_HOSTEVENT_IOC_RECOVER, sleepFlag);
+ if (rc != 0) {
+ printk(KERN_WARNING MYNAM ": WARNING - (%d) Cannot recover %s\n",
+ rc, ioc->name);
+ } else {
+ if (ioc->hard_resets < -1)
+ ioc->hard_resets++;
}
- ioc->reload_fw = 0;
- if (ioc->alt_ioc)
- ioc->alt_ioc->reload_fw = 0;
spin_lock_irqsave(&ioc->diagLock, flags);
- ioc->diagPending = 0;
- if (ioc->alt_ioc)
- ioc->alt_ioc->diagPending = 0;
+ ioc->ioc_reset_in_progress = 0;
+ ioc->taskmgmt_quiesce_io = 0;
+ ioc->taskmgmt_in_progress = 0;
+ if (ioc->alt_ioc) {
+ ioc->alt_ioc->ioc_reset_in_progress = 0;
+ ioc->alt_ioc->taskmgmt_quiesce_io = 0;
+ ioc->alt_ioc->taskmgmt_in_progress = 0;
+ }
spin_unlock_irqrestore(&ioc->diagLock, flags);
- dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "HardResetHandler rc = %d!\n", ioc->name, rc));
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx]) {
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_POST_RESET);
+ if (ioc->alt_ioc)
+ mpt_signal_reset(cb_idx, ioc->alt_ioc, MPT_IOC_POST_RESET);
+ }
+ }
+
+ dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "HardResetHandler: completed (%d seconds): %s\n",
+ ioc->name, jiffies_to_msecs(jiffies - time_count)/1000,
+ ((rc == 0) ? "SUCCESS" : "FAILED")));
+ return rc;
+}
+
+/**
+ * mpt_SoftHardResetHandler - Generic reset handler
+ * @ioc: Pointer to MPT_ADAPTER structure
+ * @sleepFlag: Indicates if sleep or schedule must be called.
+ *
+ * First try to do a soft reset and if this fails, call the
+ * hard-reset-handler
+ */
+int
+mpt_SoftHardResetHandler(MPT_ADAPTER *ioc, int sleepFlag)
+{
+ int rc;
+
+ rc = mpt_SoftResetHandler(ioc, sleepFlag);
+ if (rc) {
+ rc = mpt_HardResetHandler(ioc, sleepFlag);
+ }
return rc;
}
+
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
static void
EventDescriptionStr(u8 event, u32 evData0, char *evStr)
@@ -7475,6 +7631,7 @@ EXPORT_SYMBOL(mpt_verify_adapter);
EXPORT_SYMBOL(mpt_GetIocState);
EXPORT_SYMBOL(mpt_print_ioc_summary);
EXPORT_SYMBOL(mpt_HardResetHandler);
+EXPORT_SYMBOL(mpt_SoftHardResetHandler);
EXPORT_SYMBOL(mpt_config);
EXPORT_SYMBOL(mpt_findImVolumes);
EXPORT_SYMBOL(mpt_alloc_fw_memory);
Index: linux-2.6.26/drivers/message/fusion/mptbase.h
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptbase.h
+++ linux-2.6.26/drivers/message/fusion/mptbase.h
@@ -699,6 +699,9 @@ typedef struct _MPT_ADAPTER
MPT_SAS_MGMT sas_mgmt;
struct work_struct sas_persist_task;
+ int taskmgmt_in_progress;
+ u8 taskmgmt_quiesce_io;
+
struct work_struct fc_setup_reset_work;
struct list_head fc_rports;
struct work_struct fc_lsc_work;
@@ -707,6 +710,11 @@ typedef struct _MPT_ADAPTER
struct work_struct fc_rescan_work;
char fc_rescan_work_q_name[KOBJ_NAME_LEN];
struct workqueue_struct *fc_rescan_work_q;
+
+ unsigned long hard_resets; /* driver forced bus resets count */
+ unsigned long soft_resets; /* fw/external bus resets count */
+ u8 ioc_reset_in_progress;
+
struct scsi_cmnd **ScsiLookup;
spinlock_t scsi_lookup_lock;
} MPT_ADAPTER;
@@ -836,8 +844,6 @@ typedef struct _MPT_SCSI_HOST {
MPT_FRAME_HDR *cmdPtr; /* Ptr to nonOS request */
struct scsi_cmnd *abortSCpnt;
MPT_LOCAL_REPLY localReply; /* internal cmd reply struct */
- unsigned long hard_resets; /* driver forced bus resets count */
- unsigned long soft_resets; /* fw/external bus resets count */
unsigned long timeouts; /* cmd timeouts */
ushort sel_timeout[MPT_MAX_FC_DEVICES];
char *info_kbuf;
@@ -908,6 +914,7 @@ extern int mpt_verify_adapter(int iocid
extern u32 mpt_GetIocState(MPT_ADAPTER *ioc, int cooked);
extern void mpt_print_ioc_summary(MPT_ADAPTER *ioc, char *buf, int *size, int len, int showlan);
extern int mpt_HardResetHandler(MPT_ADAPTER *ioc, int sleepFlag);
+extern int mpt_SoftHardResetHandler(MPT_ADAPTER *ioc, int sleepFlag);
extern int mpt_config(MPT_ADAPTER *ioc, CONFIGPARMS *cfg);
extern int mpt_alloc_fw_memory(MPT_ADAPTER *ioc, int size);
extern void mpt_free_fw_memory(MPT_ADAPTER *ioc);
Index: linux-2.6.26/drivers/message/fusion/mptscsih.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptscsih.c
+++ linux-2.6.26/drivers/message/fusion/mptscsih.c
@@ -1605,7 +1605,7 @@ mptscsih_TMHandler(MPT_SCSI_HOST *hd, u8
"TM Handler for type=%x: IOC Not operational (0x%x)!\n",
ioc->name, type, ioc_raw_state);
printk(MYIOC_s_WARN_FMT " Issuing HardReset!!\n", ioc->name);
- if (mpt_HardResetHandler(ioc, CAN_SLEEP) < 0)
+ if (mpt_SoftHardResetHandler(ioc, CAN_SLEEP) < 0)
printk(MYIOC_s_WARN_FMT "TMHandler: HardReset "
"FAILED!!\n", ioc->name);
return FAILED;
@@ -1621,8 +1621,8 @@ mptscsih_TMHandler(MPT_SCSI_HOST *hd, u8
/* Isse the Task Mgmt request.
*/
- if (hd->hard_resets < -1)
- hd->hard_resets++;
+ if (ioc->hard_resets < -1)
+ ioc->hard_resets++;
rc = mptscsih_IssueTaskMgmt(hd, type, channel, id, lun,
ctx2abort, timeout);
@@ -1724,7 +1724,7 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd
ioc, mf));
dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "Calling HardReset! \n",
ioc->name));
- retval = mpt_HardResetHandler(ioc, CAN_SLEEP);
+ retval = mpt_SoftHardResetHandler(ioc, CAN_SLEEP);
dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "rc=%d \n",
ioc->name, retval));
goto fail_out;
@@ -1999,11 +1999,12 @@ int
mptscsih_host_reset(struct scsi_cmnd *SCpnt)
{
MPT_SCSI_HOST * hd;
- int retval;
+ int retval, status;
MPT_ADAPTER *ioc;
/* If we can't locate the host to reset, then we failed. */
- if ((hd = shost_priv(SCpnt->device->host)) == NULL){
+ hd = shost_priv(SCpnt->device->host);
+ if (hd == NULL) {
printk(KERN_ERR MYNAM ": host reset: "
"Can't locate host! (sc=%p)\n", SCpnt);
return FAILED;
@@ -2016,21 +2017,23 @@ mptscsih_host_reset(struct scsi_cmnd *SC
/* If our attempts to reset the host failed, then return a failed
* status. The host will be taken off line by the SCSI mid-layer.
*/
- if (mpt_HardResetHandler(ioc, CAN_SLEEP) < 0) {
- retval = FAILED;
- } else {
+ retval = mpt_SoftHardResetHandler(ioc, CAN_SLEEP);
+
+ if (retval < 0)
+ status = FAILED;
+ else {
/* Make sure TM pending is cleared and TM state is set to
* NONE.
*/
- retval = 0;
+ status = SUCCESS;
hd->tmPending = 0;
hd->tmState = TM_STATE_NONE;
}
printk(MYIOC_s_INFO_FMT "host reset: %s (sc=%p)\n",
- ioc->name, ((retval == 0) ? "SUCCESS" : "FAILED" ), SCpnt);
+ ioc->name, ((status == SUCCESS) ? "SUCCESS" : "FAILED"), SCpnt);
- return retval;
+ return status;
}
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
@@ -2219,7 +2222,7 @@ mptscsih_taskmgmt_complete(MPT_ADAPTER *
*/
if (iocstatus == MPI_IOCSTATUS_SCSI_TASK_MGMT_FAILED ||
hd->cmdPtr)
- if (mpt_HardResetHandler(ioc, NO_SLEEP) < 0)
+ if (mpt_SoftHardResetHandler(ioc, NO_SLEEP) < 0)
printk(MYIOC_s_WARN_FMT " Firmware Reload FAILED!!\n", ioc->name);
break;
@@ -2741,8 +2744,8 @@ mptscsih_event_process(MPT_ADAPTER *ioc,
break;
case MPI_EVENT_IOC_BUS_RESET: /* 04 */
case MPI_EVENT_EXT_BUS_RESET: /* 05 */
- if (hd && (ioc->bus_type == SPI) && (hd->soft_resets < -1))
- hd->soft_resets++;
+ if (hd && (ioc->bus_type == SPI) && (ioc->soft_resets < -1))
+ ioc->soft_resets++;
break;
case MPI_EVENT_LOGOUT: /* 09 */
/* FIXME! */
@@ -2980,7 +2983,7 @@ mptscsih_timer_expired(unsigned long dat
*/
} else {
/* Perform a FW reload */
- if (mpt_HardResetHandler(ioc, NO_SLEEP) < 0) {
+ if (mpt_SoftHardResetHandler(ioc, NO_SLEEP) < 0) {
printk(MYIOC_s_WARN_FMT "Firmware Reload FAILED!\n", ioc->name);
}
}
Index: linux-2.6.26/drivers/message/fusion/mptctl.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptctl.c
+++ linux-2.6.26/drivers/message/fusion/mptctl.c
@@ -323,7 +323,7 @@ static void mptctl_timeout_expired (MPT_
*/
dctlprintk(ioctl->ioc, printk(MYIOC_s_DEBUG_FMT "Calling HardReset! \n",
ioctl->ioc->name));
- mpt_HardResetHandler(ioctl->ioc, CAN_SLEEP);
+ mpt_SoftHardResetHandler(ioctl->ioc, CAN_SLEEP);
}
return;
@@ -680,6 +680,7 @@ static int mptctl_do_reset(unsigned long
dctlprintk(iocp, printk(MYIOC_s_DEBUG_FMT "mptctl_do_reset called.\n",
iocp->name));
+ /* FIXME: Can we call mpt_SoftHardResetHandler() here? */
if (mpt_HardResetHandler(iocp, CAN_SLEEP) != 0) {
printk (MYIOC_s_ERR_FMT "%s@%d::mptctl_do_reset - reset failed.\n",
iocp->name, __FILE__, __LINE__);
@@ -2467,8 +2468,8 @@ mptctl_hp_hostinfo(unsigned long arg, un
MPT_SCSI_HOST *hd = shost_priv(ioc->sh);
if (hd && (cim_rev == 1)) {
- karg.hard_resets = hd->hard_resets;
- karg.soft_resets = hd->soft_resets;
+ karg.hard_resets = ioc->hard_resets;
+ karg.soft_resets = ioc->soft_resets;
karg.timeouts = hd->timeouts;
}
}
Index: linux-2.6.26/drivers/message/fusion/mptsas.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptsas.c
+++ linux-2.6.26/drivers/message/fusion/mptsas.c
@@ -1167,7 +1167,7 @@ static int mptsas_phy_reset(struct sas_p
if (!timeleft) {
/* On timeout reset the board */
mpt_free_msg_frame(ioc, mf);
- mpt_HardResetHandler(ioc, CAN_SLEEP);
+ mpt_SoftHardResetHandler(ioc, CAN_SLEEP);
error = -ETIMEDOUT;
goto out_unlock;
}
@@ -1345,7 +1345,7 @@ static int mptsas_smp_handler(struct Scs
if (!timeleft) {
printk(MYIOC_s_ERR_FMT "%s: smp timeout!\n", ioc->name, __FUNCTION__);
/* On timeout reset the board */
- mpt_HardResetHandler(ioc, CAN_SLEEP);
+ mpt_SoftHardResetHandler(ioc, CAN_SLEEP);
ret = -ETIMEDOUT;
goto unmap;
}
--
Bernd Schubert
Q-Leap Networks GmbH
* [ PATCH 1/4 ] mpt fusion disable hard resets for 53C1030 based devices
2008-09-23 13:16 [PATCH 0/4 V2] mpt fusion error handler patches Bernd Schubert
2008-09-23 13:20 ` [ PATCH 1/4 ] mpt fusion SoftReset handler Bernd Schubert
@ 2008-09-23 13:26 ` Bernd Schubert
2008-09-23 13:30 ` [ PATCH 2/4 " Bernd Schubert
2008-09-23 13:27 ` [ PATCH 3/4 ] mpt fusion prevent DV deadlock Bernd Schubert
2008-09-23 13:28 ` [PATCH 4/4 ] Increase scsi timeouts Bernd Schubert
3 siblings, 1 reply; 18+ messages in thread
From: Bernd Schubert @ 2008-09-23 13:26 UTC
To: Linux SCSI Mailing List
Cc: Eric Moore, Sathya Prakash, James Bottomley, DL-MPTFusionLinux
For 53C1030 based dual-port HBAs, the hard reset handler causes trouble
for innocent devices on the second channel. It is better to fail the
device that triggered the error handler than to cause errors on
unrelated devices. Of course, the real solution would be to figure out
why the hard reset handler causes trouble on the second channel, but
probably only LSI can do that.
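The decision this patch adds after a failed soft reset fits in a few lines. The sketch rests on three assumptions:

- `struct port` is a hypothetical stand-in for an IOC and its `alt_ioc`.
- `ndevices` replaces the shost_for_each_device() walk.
- The two device-ID values are assumed to match MPI_MANUFACTPAGE_DEVID_53C1030 and MPI_MANUFACTPAGE_DEVID_53C1030ZC from mpi_cnfg.h. Please verify them against the header.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEVID_53C1030   0x0030	/* assumed MPI_MANUFACTPAGE_DEVID_53C1030 */
#define DEVID_53C1030ZC 0x0031	/* assumed MPI_MANUFACTPAGE_DEVID_53C1030ZC */

/* Hypothetical stand-in for one IOC (port) of an HBA. */
struct port {
	unsigned short pci_device;
	size_t ndevices;	/* devices attached to this port's host */
	struct port *alt;	/* sibling port on a dual-port HBA, or NULL */
};

/* Mirrors avoid_hard_reset(): only 53C1030 based chips are flagged. */
static bool chip_avoids_hard_reset(unsigned short pci_device)
{
	switch (pci_device) {
	case DEVID_53C1030:
	case DEVID_53C1030ZC:
		return true;
	default:
		return false;
	}
}

/* Mirrors alt_ioc_with_dev(): is anything attached to the sibling? */
static bool alt_port_has_devices(const struct port *p)
{
	return p->alt != NULL && p->alt->ndevices > 0;
}

/* After a failed soft reset: escalate unless that would disturb
 * devices on the sibling port of an affected chip. */
static bool may_hard_reset(const struct port *p)
{
	return !(chip_avoids_hard_reset(p->pci_device) &&
		 alt_port_has_devices(p));
}
```

An affected chip whose sibling port is empty still gets the hard reset, because then no innocent device can be hit.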
Signed-off-by: Bernd Schubert <bs@q-leap.de>
drivers/message/fusion/mptbase.c | 42 ++++++++++++++++++++++++++++-
drivers/message/fusion/mptspi.c | 31 +++++++++++++++++++++
2 files changed, 72 insertions(+), 1 deletion(-)
Index: linux-2.6.26/drivers/message/fusion/mptbase.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptbase.c
+++ linux-2.6.26/drivers/message/fusion/mptbase.c
@@ -59,6 +59,7 @@
#include <linux/interrupt.h> /* needed for in_interrupt() proto */
#include <linux/dma-mapping.h>
#include <asm/io.h>
+#include <scsi/scsi_device.h>
#ifdef CONFIG_MTRR
#include <asm/mtrr.h>
#endif
@@ -6452,6 +6453,33 @@ mpt_HardResetHandler(MPT_ADAPTER *ioc, i
}
/**
+ * Check if there are devices connected to the second (alt) ioc.
+ * Return 1 if there is at least one device and 0 if there are
+ * none or no alt_ioc.
+ */
+static int
+alt_ioc_with_dev(MPT_ADAPTER *ioc)
+{
+ struct Scsi_Host *shost;
+ struct scsi_device *sdev;
+ int have_devices = 0;
+
+ if (!ioc->alt_ioc)
+ return 0;
+
+ shost = ioc->alt_ioc->sh;
+
+ shost_for_each_device(sdev, shost) {
+ /* when we are here, we know there is a device
+ * attached to this host, which is all we need to know */
+ have_devices = 1;
+ break;
+ }
+
+ return have_devices;
+}
+
+/**
* mpt_SoftHardResetHandler - Generic reset handler
* @ioc: Pointer to MPT_ADAPTER structure
* @sleepFlag: Indicates if sleep or schedule must be called.
@@ -6466,7 +6494,19 @@ mpt_SoftHardResetHandler(MPT_ADAPTER *io
rc = mpt_SoftResetHandler(ioc, sleepFlag);
if (rc) {
- rc = mpt_HardResetHandler(ioc, sleepFlag);
+ if (ioc->no_hard_reset && alt_ioc_with_dev(ioc)) {
+ /* On dual port HBAs based on the 53C1030 chip the
+ * hard reset handler will cause DID_SOFT_ERROR on
+ * the second (in principle independent) port.
+ * This error can almost never be recovered from and
+ * causes entire device failures, so it is better not
+ * to call the hard reset handler at all in order to
+ * prevent failures of independent devices */
+ printk(MYIOC_s_INFO_FMT "Skipping hard reset in "
+ "order to prevent failures on %s.\n",
+ ioc->name, ioc->alt_ioc->name);
+ } else
+ rc = mpt_HardResetHandler(ioc, sleepFlag);
}
return rc;
Index: linux-2.6.26/drivers/message/fusion/mptspi.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptspi.c
+++ linux-2.6.26/drivers/message/fusion/mptspi.c
@@ -1301,6 +1301,33 @@ mptspi_resume(struct pci_dev *pdev)
#endif
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
+/**
+ * avoid_hard_reset - check if hard resets should be avoided
+ * @pdev: Pointer to pci_dev structure
+ *
+ * Hard resets will cause trouble on the secondary IOC of
+ * 53C1030 based devices.
+ *
+ * Returns 1 if an affected chip is found and 0 for unaffected chips
+ */
+static int
+avoid_hard_reset(struct pci_dev *pdev)
+{
+ int avoid;
+
+ switch (pdev->device) {
+ case MPI_MANUFACTPAGE_DEVID_53C1030:
+ case MPI_MANUFACTPAGE_DEVID_53C1030ZC:
+ /* TODO: which chips are affected as well? */
+ avoid = 1;
+ break;
+ default:
+ avoid = 0;
+ }
+
+ return avoid;
+}
+
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/*
* mptspi_probe - Installs scsi devices per bus.
@@ -1509,6 +1536,10 @@ mptspi_probe(struct pci_dev *pdev, const
goto out_mptspi_probe;
}
+ /* hard resets on 53C1030 HBAs will cause trouble on secondary (alt)
+ * IOCs, so better no hard reset on these */
+ ioc->no_hard_reset = avoid_hard_reset(pdev);
+
/*
* issue internal bus reset
*/
* [ PATCH 3/4 ] mpt fusion prevent DV deadlock
2008-09-23 13:16 [PATCH 0/4 V2] mpt fusion error handler patches Bernd Schubert
2008-09-23 13:20 ` [ PATCH 1/4 ] mpt fusion SoftReset handler Bernd Schubert
2008-09-23 13:26 ` [ PATCH 1/4 ] mpt fusion disable hard resets for 53C1030 based devices Bernd Schubert
@ 2008-09-23 13:27 ` Bernd Schubert
2008-10-30 17:58 ` Prakash, Sathya
2008-09-23 13:28 ` [PATCH 4/4 ] Increase scsi timeouts Bernd Schubert
3 siblings, 1 reply; 18+ messages in thread
From: Bernd Schubert @ 2008-09-23 13:27 UTC
To: Linux SCSI Mailing List
Cc: Eric Moore, Sathya Prakash, James Bottomley, DL-MPTFusionLinux
The mpt fusion driver re-runs domain validation (DV) after an IOC reset, but
this DV can cause a deadlock. The problem has been analyzed in detail in this
thread: http://marc.info/?t=118039577800004, but so far none of the suggested
solutions has been implemented.
This patch simply skips domain validation when it is known that it would run
into the deadlock.
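The check added in the mptspi.c hunk boils down to a predicate on the per-direction request counters. This sketch assumes a 2.6.26-style `struct request_list` with an `int count[2]` array. `struct fake_request_list` and `dv_would_block()` are hypothetical names.

```c
#include <stdbool.h>

/* Minimal stand-in for the 2.6.26-era struct request_list:
 * one allocated-request counter per direction (index 0 and 1). */
struct fake_request_list {
	int count[2];
};

/* DV has to allocate at least one more request. If either counter is
 * within one slot of nr_requests, that allocation presumably waits for a
 * request that cannot complete while error recovery holds the queue,
 * so DV should be skipped. */
static bool dv_would_block(const struct fake_request_list *rl,
			   unsigned long nr_requests)
{
	return (unsigned long)rl->count[0] + 1 >= nr_requests ||
	       (unsigned long)rl->count[1] + 1 >= nr_requests;
}
```

Skipping DV only when the queue is nearly full keeps normal post-reset revalidation intact in the common case.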
Signed-off-by: Bernd Schubert <bs@q-leap.de>
drivers/message/fusion/mptspi.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
Index: linux-2.6.26/drivers/message/fusion/mptspi.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptspi.c
+++ linux-2.6.26/drivers/message/fusion/mptspi.c
@@ -672,12 +672,24 @@ static void mptspi_dv_device(struct _MPT
{
VirtTarget *vtarget = scsi_target(sdev)->hostdata;
MPT_ADAPTER *ioc = hd->ioc;
+ unsigned long nr_requests = sdev->request_queue->nr_requests;
+ struct request_list *rl = &sdev->request_queue->rq;
/* no DV on RAID devices */
if (sdev->channel == 0 &&
mptspi_is_raid(hd, sdev->id))
return;
+ if (rl->count[0] + 1 >= nr_requests
+ || rl->count[1] + 1 >= nr_requests) {
+ /* we must NOT do a DV after an error recovery when
+ * there is no free slot left in the request list, since
+ * this would cause a deadlock */
+ starget_printk(KERN_INFO, scsi_target(sdev), MYIOC_s_FMT
+ "Skipping DV, to prevent dead lock!\n", ioc->name);
+ return;
+ }
+
/* If this is a piece of a RAID, then quiesce first */
if (sdev->channel == 1 &&
mptscsih_quiesce_raid(hd, 1, vtarget->channel, vtarget->id) < 0) {
--
Bernd Schubert
Q-Leap Networks GmbH
* [PATCH 4/4 ] Increase scsi timeouts
2008-09-23 13:16 [PATCH 0/4 V2] mpt fusion error handler patches Bernd Schubert
` (2 preceding siblings ...)
2008-09-23 13:27 ` [ PATCH 3/4 ] mpt fusion prevent DV deadlock Bernd Schubert
@ 2008-09-23 13:28 ` Bernd Schubert
2008-10-06 9:07 ` Prakash, Sathya
3 siblings, 1 reply; 18+ messages in thread
From: Bernd Schubert @ 2008-09-23 13:28 UTC
To: Linux SCSI Mailing List
Cc: Eric Moore, Sathya Prakash, James Bottomley, DL-MPTFusionLinux
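Increase the SCSI task management timeout for SPI (and any other bus type
that falls through to the default) from 2 to 10 seconds, the value already
used for SAS, similarly to the LSI 4.x driver.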
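The resulting timeout table, in seconds, can be sketched as follows. The `enum bus_type` only mirrors the driver's SPI/FC/SAS bus types, and `tm_timeout_secs()` is a hypothetical name. The values are the ones from the mptscsih.c hunk.

```c
/* Hypothetical copy of the driver's bus types. */
enum bus_type { SPI, FC, SAS };

/* Task management timeout in seconds after this patch. */
static int tm_timeout_secs(enum bus_type bus)
{
	switch (bus) {
	case FC:
		return 40;
	case SAS:
	case SPI:
	default:
		return 10;	/* SPI used 2 s before the patch */
	}
}
```

Grouping SAS and SPI with the `default` label means any unlisted bus type also gets 10 seconds.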
Signed-off-by: Bernd Schubert <bs@q-leap.de>
drivers/message/fusion/mptscsih.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
Index: linux-2.6.26/drivers/message/fusion/mptscsih.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptscsih.c
+++ linux-2.6.26/drivers/message/fusion/mptscsih.c
@@ -1760,10 +1760,9 @@ mptscsih_get_tm_timeout(MPT_ADAPTER *ioc
case FC:
return 40;
case SAS:
- return 10;
case SPI:
default:
- return 2;
+ return 10;
}
}
* Re: [ PATCH 2/4 ] mpt fusion disable hard resets for 53C1030 based devices
2008-09-23 13:26 ` [ PATCH 1/4 ] mpt fusion disable hard resets for 53C1030 based devices Bernd Schubert
@ 2008-09-23 13:30 ` Bernd Schubert
2008-10-06 9:07 ` Prakash, Sathya
0 siblings, 1 reply; 18+ messages in thread
From: Bernd Schubert @ 2008-09-23 13:30 UTC
To: Linux SCSI Mailing List
Cc: Eric Moore, Sathya Prakash, James Bottomley, DL-MPTFusionLinux
This is patch 2/4 ...
On Tuesday 23 September 2008 15:26:30 Bernd Schubert wrote:
> For 53C1030 based dual-port HBAs, the hard reset handler causes trouble
> for innocent devices on the second channel. It is better to fail the
> device that triggered the error handler than to cause errors on
> unrelated devices. Of course, the real solution would be to figure out
> why the hard reset handler causes trouble on the second channel, but
> probably only LSI can do that.
>
> Signed-off-by: Bernd Schubert <bs@q-leap.de>
--
Bernd Schubert
Q-Leap Networks GmbH
* RE: [ PATCH 2/4 ] mpt fusion disable hard resets for 53C1030 based devices
2008-09-23 13:30 ` [ PATCH 2/4 " Bernd Schubert
@ 2008-10-06 9:07 ` Prakash, Sathya
2008-10-06 9:32 ` Bernd Schubert
0 siblings, 1 reply; 18+ messages in thread
From: Prakash, Sathya @ 2008-10-06 9:07 UTC
To: Bernd Schubert, Linux SCSI Mailing List
Cc: Moore, Eric, James Bottomley, DL-MPT Fusion Linux
Hi Bernd,
There are some cases where a hard reset is required for the firmware to recover, so we cannot avoid hard resets completely. That is the main reason for our design of first trying a soft reset and falling back to a hard reset only if the soft reset fails.
So I would suggest not removing the hard reset. If you think the latest LSI-provided driver avoids hard resets in some cases, like the one in this patch, please let me know and I will check and get back to you.
Thanks
Sathya
-----Original Message-----
From: Bernd Schubert [mailto:bs@q-leap.de]
Sent: Tuesday, September 23, 2008 7:00 PM
To: Linux SCSI Mailing List
Cc: Moore, Eric; Prakash, Sathya; James Bottomley; DL-MPT Fusion Linux
Subject: Re: [ PATCH 2/4 ] mpt fusion disable hard resets for 53C1030 based devices
This is patch 2/4 ...
On Tuesday 23 September 2008 15:26:30 Bernd Schubert wrote:
> For 53C1030 based dual port HBAs the hard reset handler will cause
> trouble on the second channel with innocent devices. It is then better
> to fail the device which activated the error handler than to cause
> errors on unrelated devices. Of course, the real solution would be to
> figure out why the hard reset handler causes trouble on the second
> channel. Probably only LSI can do that, though.
>
> Signed-off-by: Bernd Schubert <bs@q-leap.de>
>
> drivers/message/fusion/mptbase.c | 42 ++++++++++++++++++++++++++++-
> drivers/message/fusion/mptspi.c | 31 +++++++++++++++++++++
> 2 files changed, 72 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.26/drivers/message/fusion/mptbase.c
> ===================================================================
> --- linux-2.6.26.orig/drivers/message/fusion/mptbase.c
> +++ linux-2.6.26/drivers/message/fusion/mptbase.c
> @@ -59,6 +59,7 @@
> #include <linux/interrupt.h> /* needed for in_interrupt() proto */
> #include <linux/dma-mapping.h>
> #include <asm/io.h>
> +#include <scsi/scsi_device.h>
> #ifdef CONFIG_MTRR
> #include <asm/mtrr.h>
> #endif
> @@ -6452,6 +6453,33 @@ mpt_HardResetHandler(MPT_ADAPTER *ioc, i
> }
>
> /**
> + * Check if there are devices connected to the second (alt) ioc.
> + * Return 1 if there is at least one device and 0 if there are
> + * none or no alt_ioc.
> + */
> +static int
> +alt_ioc_with_dev(MPT_ADAPTER *ioc)
> +{
> + struct Scsi_Host *shost;
> + struct scsi_device *sdev;
> + int have_devices = 0;
> +
> + if (!ioc->alt_ioc)
> + return 0;
> +
> + shost = ioc->alt_ioc->sh;
> +
> + shost_for_each_device(sdev, shost) {
> + /* when we are here, we know there is a device
> + * attached to this host, which is all we need to know */
> + have_devices = 1;
> + break;
> + }
> +
> + return have_devices;
> +}
> +
> +/**
> * mpt_SoftHardResetHandler - Generic reset handler
> * @ioc: Pointer to MPT_ADAPTER structure
> * @sleepFlag: Indicates if sleep or schedule must be called.
> @@ -6466,7 +6494,19 @@ mpt_SoftHardResetHandler(MPT_ADAPTER *io
>
> rc = mpt_SoftResetHandler(ioc, sleepFlag);
> if (rc) {
> - rc = mpt_HardResetHandler(ioc, sleepFlag);
> + if (ioc->no_hard_reset && alt_ioc_with_dev(ioc)) {
> + /* On dual port HBAs based on the 53C1030 chip the
> + * hard reset handler will cause DID_SOFT_ERROR on
> + * the second (in principle independent) port.
> + * In almost all cases this error cannot be recovered,
> + * causing entire device failures. So it is better not
> + * to call the hard reset handler at all, in order to
> + * prevent failures of independent devices */
> + printk(MYIOC_s_INFO_FMT "Skipping hard reset in "
> + "order to prevent failures on %s.\n",
> + ioc->name, ioc->alt_ioc->name);
> + } else
> + rc = mpt_HardResetHandler(ioc, sleepFlag);
> }
>
> return rc;
> Index: linux-2.6.26/drivers/message/fusion/mptspi.c
> ===================================================================
> --- linux-2.6.26.orig/drivers/message/fusion/mptspi.c
> +++ linux-2.6.26/drivers/message/fusion/mptspi.c
> @@ -1301,6 +1301,33 @@ mptspi_resume(struct pci_dev *pdev)
> #endif
>
>
> /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>-=*/ +/**
> + * avoid_hard_reset - check if hard resets should be avoided
> + * @pdev: Pointer to pci_dev structure
> + *
> + * Hard resets will cause trouble on the secondary IOC of
> + * 53C1030 based devices.
> + *
> + * Returns 1 if an affected chip is found and 0 for unaffected chips
> + */
> +static int
> +avoid_hard_reset(struct pci_dev *pdev)
> +{
> + int avoid;
> +
> + switch (pdev->device) {
> + case MPI_MANUFACTPAGE_DEVID_53C1030:
> + case MPI_MANUFACTPAGE_DEVID_53C1030ZC:
> + /* TODO: which chips are affected as well? */
> + avoid = 1;
> + break;
> + default:
> + avoid = 0;
> + }
> +
> + return avoid;
> +}
> +
>
> /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>-=*/ /*
> * mptspi_probe - Installs scsi devices per bus.
> @@ -1509,6 +1536,10 @@ mptspi_probe(struct pci_dev *pdev, const
> goto out_mptspi_probe;
> }
>
> + /* hard resets on 53C1030 HBAs will cause trouble on secondary (alt)
> + * IOCs, so better avoid hard resets on these */
> + ioc->no_hard_reset = avoid_hard_reset(pdev);
> +
> /*
> * issue internal bus reset
> */
--
Bernd Schubert
Q-Leap Networks GmbH
* RE: [PATCH 4/4 ] Increase scsi timeouts
2008-09-23 13:28 ` [PATCH 4/4 ] Increase scsi timeouts Bernd Schubert
@ 2008-10-06 9:07 ` Prakash, Sathya
0 siblings, 0 replies; 18+ messages in thread
From: Prakash, Sathya @ 2008-10-06 9:07 UTC (permalink / raw)
To: Bernd Schubert, Linux SCSI Mailing List
Cc: Moore, Eric, James Bottomley, DL-MPT Fusion Linux
ACK
-----Original Message-----
From: Bernd Schubert [mailto:bs@q-leap.de]
Sent: Tuesday, September 23, 2008 6:59 PM
To: Linux SCSI Mailing List
Cc: Moore, Eric; Prakash, Sathya; James Bottomley; DL-MPT Fusion Linux
Subject: [PATCH 4/4 ] Increase scsi timeouts
Increase the SCSI task management timeouts, similarly to the LSI 4.x driver.
Signed-off-by: Bernd Schubert <bs@q-leap.de>
drivers/message/fusion/mptscsih.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
Index: linux-2.6.26/drivers/message/fusion/mptscsih.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptscsih.c
+++ linux-2.6.26/drivers/message/fusion/mptscsih.c
@@ -1760,10 +1760,9 @@ mptscsih_get_tm_timeout(MPT_ADAPTER *ioc
case FC:
return 40;
case SAS:
- return 10;
case SPI:
default:
- return 2;
+ return 10;
}
}
* Re: [ PATCH 2/4 ] mpt fusion disable hard resets for 53C1030 based devices
2008-10-06 9:07 ` Prakash, Sathya
@ 2008-10-06 9:32 ` Bernd Schubert
0 siblings, 0 replies; 18+ messages in thread
From: Bernd Schubert @ 2008-10-06 9:32 UTC (permalink / raw)
To: Prakash, Sathya
Cc: Linux SCSI Mailing List, Moore, Eric, James Bottomley,
DL-MPT Fusion Linux
Hello Sathya,
On Monday 06 October 2008 11:07:00 Prakash, Sathya wrote:
> Hi Bernd,
> There are some cases where we MUST do a hard reset for the firmware to
> recover, so we cannot completely avoid the hard reset. That is the main
> purpose of our design: first try a soft reset, and if it fails, go for a
> hard reset.
>
> So I would suggest it is better not to remove the hard reset. If you think
> the latest LSI-provided driver avoids the hard reset in some cases like the
> one in this patch, please let me know and I will check and get back to you.
Yes, I can understand that. However, the patch only skips the hard reset when
it knows the reset will cause even more trouble than skipping it would. The
primary rule really MUST be "Do not kill innocent, unrelated devices!".
What are the requirements for a hard reset? When BOTH IOCs are in trouble, a
single hard reset wouldn't make the situation worse.
I haven't checked yet, and maybe you know offhand: is ioc->alt_ioc a pointer
to the real IOC structure? If so, we could use it to check whether the other
IOC is also in deep trouble.
Cheers,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
* RE: [ PATCH 2/4 ] mpt fusion disable hard resets for 53C1030 based devices
[not found] <6A4D764DC1BDE14495DA8DC60A3D69531E967A0759@hkgmail01.lsi.com>
@ 2008-10-07 16:25 ` Moore, Eric
2008-10-07 17:09 ` Bernd Schubert
0 siblings, 1 reply; 18+ messages in thread
From: Moore, Eric @ 2008-10-07 16:25 UTC (permalink / raw)
To: bs@q-leap.de, James.Bottomley@HansenPartnership.com,
linux-scsi@vger.kernel.org
Cc: Prakash, Sathya
Bernd - No, we cannot remove the hard reset from the code. You're going to break a whole lot of customers. The hard reset is a "start of day" big hammer reset, where the controller firmware is reloaded. The soft reset does not do a "start of day" reset, hence why it's called soft reset. The soft reset will only reset a single function of the controller and not affect the other function. Firmware is not reloaded in the case of soft reset. We have several multifunction cards like the 53c1030 ULTRA320 controller, and several in our line of Fibre Channel cards. The soft reset was added to address some problems with timeouts on one channel resulting in host reset being called; when you did that, it would kill all the IO on the other channel, resulting in timeouts on the other channel.
So adding the soft reset prevented the other channel from being affected. In some cases the soft reset does not recover the card, and we need the big hammer to do a "start of day" recovery. The ioc->alt_ioc pointer is used in the case of a multifunction card, where one function points to the other. There are cases where we need to be aware of the other controller; for instance, some controllers require that the firmware be uploaded into driver memory, and whenever we unload the driver, we can only download the image back on one function (not both).
Eric
On Tuesday, October 07, 2008 4:56 AM, Prakash, Sathya wrote:
>
> Eric,
> I think this patch will create more trouble; can you please
> send your thoughts on this in your free time? I strongly
> believe that removing the hard reset totally is a bad idea.
>
> Thanks
> Sathya
* Re: [ PATCH 2/4 ] mpt fusion disable hard resets for 53C1030 based devices
2008-10-07 16:25 ` [ PATCH 2/4 ] mpt fusion disable hard resets for 53C1030 based devices Moore, Eric
@ 2008-10-07 17:09 ` Bernd Schubert
2008-10-07 20:00 ` Moore, Eric
0 siblings, 1 reply; 18+ messages in thread
From: Bernd Schubert @ 2008-10-07 17:09 UTC (permalink / raw)
To: Moore, Eric
Cc: James.Bottomley@HansenPartnership.com, linux-scsi@vger.kernel.org,
Prakash, Sathya
Hello Eric, hello Sathya,
I am open to other ideas, of course. Here are a few notes:
I didn't entirely remove the hard reset handler. It is only removed for 53C1030
based chips. However, I'm not entirely sure whether struct _MPT_ADAPTER is
initialized with zeros for FC and SAS cards. If not, we need to fix the patch
to set ioc->no_hard_reset=0 on these. If you look at the patch, you will see
one **needs** to set ioc->no_hard_reset=1 to disable the hard reset handler.
Furthermore, the hard resets are *only* disabled when there is an alt_ioc
with connected devices.
So I already tried my best not to break other customers. If you think the
tests still need to be extended, please tell me what needs to be done.
I guess you count the LSI22320R HBA as a "multifunction card"? This is the card
I wrote the patches for, since the way the hard reset handler destroys the
operational state of the second port is simply not acceptable. Basically, what
happens is the following:
1) The SCSI error handler activates the fusion reset handler.
2) The port of the LSI22320R for which the hard reset handler was activated
recovers properly. However, devices on the second port of this HBA now
suddenly fail with DID_SOFT_ERROR ==> goto 1)
3) Eventually, after a number of hard resets, the chip is in such a bad state
that even the hard reset handler fails ==> devices on both ports are offlined.
If you are 'lucky', the ping-pong of hard resets may last for hours; please
see my messages to the scsi list back in December. I have some generic (not
yet posted) scsi-error patches here, which will at least limit the maximum
number of failures within a given time frame and then offline the device.
Of course, the optimal solution would be to make the hard reset handler not
cause errors on the second port. If you have an idea how to achieve this, I am
absolutely open to this perfect solution.
Do you have an example of a multifunction 53C1030 card for which you know the
hard resets won't cause trouble? Working for a small, mostly hardware-selling
company, I don't have the possibility to test all of your hardware, but on the
other hand I have now tested the LSI22320R for weeks, and it definitely works
better without the hard reset handler...
Cheers,
Bernd
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Bernd Schubert
Q-Leap Networks GmbH
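For readers following the control flow of the quoted mptbase.c change, here is a userspace model of the soft-then-hard escalation with the new skip rule. `struct fake_ioc`, `soft_reset()`, `hard_reset()` and `soft_hard_reset()` are hypothetical stand-ins, not driver APIs; only the decision logic mirrors mpt_SoftHardResetHandler() and alt_ioc_with_dev() from the patch, under the assumption that a nonzero return means failure.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for MPT_ADAPTER: only the fields the policy reads. */
struct fake_ioc {
	const char *name;
	bool no_hard_reset;   /* set at probe time for 53C1030 parts */
	int alt_dev_count;    /* devices on the alternate function; <= 0 means none */
	int soft_reset_rc;    /* what the stubbed soft reset returns (0 = success) */
	int hard_reset_calls; /* counts escalations so the model can be checked */
};

/* Mirrors alt_ioc_with_dev(): 1 if the alternate IOC has at least one device. */
static int alt_ioc_with_dev(const struct fake_ioc *ioc)
{
	return ioc->alt_dev_count > 0;
}

/* Stub: a soft reset resets only this function. */
static int soft_reset(struct fake_ioc *ioc)
{
	return ioc->soft_reset_rc;
}

/* Stub: a hard reset reloads firmware; pretend it always succeeds. */
static int hard_reset(struct fake_ioc *ioc)
{
	ioc->hard_reset_calls++;
	return 0;
}

/* Soft reset first; escalate unless the patch's skip rule applies. */
static int soft_hard_reset(struct fake_ioc *ioc)
{
	int rc = soft_reset(ioc);

	if (rc) {
		if (ioc->no_hard_reset && alt_ioc_with_dev(ioc))
			printf("%s: skipping hard reset to protect the alt ioc\n",
			       ioc->name);
		else
			rc = hard_reset(ioc);
	}
	return rc;
}
```

With no_hard_reset set and a device on the alternate function, a failed soft reset returns its own error instead of escalating, which is the trade-off debated in the rest of this thread.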
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [ PATCH 2/4 ] mpt fusion disable hard resets for 53C1030 based devices
2008-10-07 17:09 ` Bernd Schubert
@ 2008-10-07 20:00 ` Moore, Eric
2008-10-07 21:27 ` Bernd Schubert
0 siblings, 1 reply; 18+ messages in thread
From: Moore, Eric @ 2008-10-07 20:00 UTC (permalink / raw)
To: Bernd Schubert
Cc: James.Bottomley@HansenPartnership.com, linux-scsi@vger.kernel.org,
Prakash, Sathya
No, the hard reset can't be removed. If your controller ever goes into FAULT state, only the hard reset will recover it. The soft reset is unable to recover a card in fault. I have an application I can send you that will put a card into FAULT. Please let me know.
Regarding multifunction cards: you can identify them with lspci. Also, when the driver loads, you will have an ioc0 and an ioc1 assigned to a single controller.
Here is an example of a 53C1030 dual-function card. Notice 03:01.0 is the 1st function, and 03:01.1 is the 2nd function.
03:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
03:01.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
Here is an example of a Fibre Channel dual-function card. Notice 04:02.0 is the 1st function, and 04:02.1 is the 2nd function.
04:02.0 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre Channel Adapter (rev 80)
04:02.1 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre Channel Adapter (rev 80)
Eric
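As a sketch of the check Eric describes, the helper below parses lspci-style `bus:slot.func` addresses (hexadecimal fields) and reports whether two of them are different functions of the same physical card, i.e. they share bus and slot. `struct pci_bdf`, `parse_bdf()` and `same_card()` are hypothetical names; domain-prefixed addresses such as `0000:03:01.0` are deliberately rejected rather than handled.

```c
#include <stdbool.h>
#include <stdio.h>

/* A PCI address as lspci prints it without a domain: "bus:slot.func". */
struct pci_bdf {
	unsigned int bus, slot, func;
};

/* Parse "03:01.0"-style addresses; returns false on malformed input. */
static bool parse_bdf(const char *s, struct pci_bdf *out)
{
	char extra;

	/* Exactly three hex fields must match; %c catches trailing junk. */
	if (sscanf(s, "%x:%x.%x%c", &out->bus, &out->slot, &out->func,
		   &extra) != 3)
		return false;
	/* PCI allows 32 slots per bus and 8 functions per slot. */
	return out->slot < 32 && out->func < 8;
}

/* Two functions of one multifunction card share bus and slot. */
static bool same_card(const char *a, const char *b)
{
	struct pci_bdf x, y;

	if (!parse_bdf(a, &x) || !parse_bdf(b, &y))
		return false;
	return x.bus == y.bus && x.slot == y.slot && x.func != y.func;
}
```

For the listings above, `same_card("03:01.0", "03:01.1")` and `same_card("04:02.0", "04:02.1")` are true, while addresses on different slots are not.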
Bernd Schubert wrote:
>
> I am open to other ideas, of course. Here are a few notes:
>
> I didn't entirely remove the hard reset handler. It is only removed for
> 53C1030 based chips. Though, I'm not entirely sure whether struct
> _MPT_ADAPTER is initialized with zeros for FC and SAS cards. If not, we
> need to fix the patch to set ioc->no_hard_reset=0 on these. If you look
> at the patch, you will see one **needs** to set ioc->no_hard_reset=1 to
> disable the hard reset handler.
> Furthermore, the hard resets are *only* disabled when there is an
> alt_ioc with connected devices.
> So I already tried my best not to break other customers. If you think
> the tests still need to be extended, please tell me what needs to be
> done.
>
> I guess you count the LSI22320R HBA as a "multifunction card"? This is
> the card I wrote the patches for, since the way the hard reset handler
> destroys the operational state of the second port is simply not
> acceptable. Basically, this is what happens:
>
> 1) The SCSI error handler activates the fusion reset handler.
>
> 2) The port of the LSI22320R for which the hard reset handler was
> activated recovers properly. However, devices on the second port of
> this HBA now suddenly fail with DID_SOFT_ERROR ==> goto 1)
>
> 3) Eventually, after a number of hard resets, the chip is in such a bad
> state that even the hard reset handler fails ==> devices on both ports
> are offlined.
>
> If you are "lucky", the ping-pong of hard resets may last for hours;
> please see my messages to the SCSI list back in December. I have some
> generic (not yet posted) scsi-error patches here, which will at least
> limit the maximum number of failures within a given time frame and then
> offline the device.
>
> Of course, the optimal solution would be to make the hard reset handler
> not cause errors on the second port. If you have an idea how to achieve
> this, I am absolutely open to this perfect solution.
>
> Do you have an example of a multifunction 53C1030 card for which you
> know the hard resets won't cause trouble? Working for a small, mostly
> hardware-selling company, I don't have the possibility to test all of
> your hardware, but on the other hand I have now tested the LSI22320R for
> weeks, and it definitely works better without the hard reset handler...
>
>
> Cheers,
> Bernd
>
>
>
> On Tuesday 07 October 2008 18:25:35 Moore, Eric wrote:
> > Bernd - No, we cannot remove the hard reset from the code. You're going
> > to break a whole lot of customers. The hard reset is a "start of day"
> > big hammer reset, where the controller firmware is reloaded. The soft
> > reset does not do a "start of day" reset, hence why it's called soft
> > reset. The soft reset will only reset a single function of the
> > controller, and will not affect the other function. Firmware is not
> > reloaded in the case of soft reset. We have several multifunction cards,
> > like the 53c1030 ULTRA320 controller and several in our line of Fibre
> > Channel cards. The soft reset was added to address some problems with
> > timeouts on one channel resulting in host reset being called; when that
> > happened, it would kill all the IO on the other channel, resulting in
> > timeouts on the other channel. So adding the soft reset prevented the
> > other channel from being affected. In some cases the soft reset does not
> > recover the card, and we need the big hammer to do a "start of day"
> > recovery. The ioc->alt_ioc pointer is used in the case of a
> > multifunction card, where one function points to the other. There are
> > cases where we need to be aware of the other controller; for instance,
> > some controllers require that firmware be uploaded into driver memory,
> > and whenever we unload the driver, we can only download the image back
> > on one function (not both).
> >
> > Eric
> >
> > On Tuesday, October 07, 2008 4:56 AM, Prakash, Sathya wrote:
> > > Eric,
> > > I think this patch will create more trouble; can you please
> > > send your thoughts on this when you have time? I strongly
> > > believe that removing hard reset totally is a bad idea.
> > >
> > > Thanks
> > > Sathya
> > >
> > > -----Original Message-----
> > > From: Bernd Schubert [mailto:bs@q-leap.de]
> > > Sent: Monday, October 06, 2008 3:02 PM
> > > To: Prakash, Sathya
> > > Cc: Linux SCSI Mailing List; Moore, Eric; James Bottomley;
> > > DL-MPT Fusion Linux
> > > Subject: Re: [ PATCH 2/4 ] mpt fusion disable hard resets for
> > > 53C1030 based devices
> > >
> > > Hello Sathya,
> > >
> > > On Monday 06 October 2008 11:07:00 Prakash, Sathya wrote:
> > > > Hi Bernd,
> > > > There are some cases where we MUST have a hardreset for the firmware
> > > > to recover. So we can not completely avoid SoftReset. That is the
> > > > main purpose of our design: first try softreset and, if it fails,
> > > > then go for hard reset.
> > > >
> > > > So I would suggest it is better not to remove the hard reset. If you
> > > > think the latest LSI-provided driver has some cases of avoiding hard
> > > > reset like the one in this patch, please let me know; I will check
> > > > and get back to you.
> > >
> > > Yes, I can understand that; however, it only skips the hard
> > > reset when it knows it will cause even more trouble than
> > > without it. The primary rule really MUST be "Do not kill innocent
> > > unrelated devices!".
> > >
> > > What are the requirements for a hard reset? When BOTH IOCs
> > > are in trouble, a single hard reset wouldn't make the situation worse.
> > > I didn't check yet, and maybe you know immediately: is
> > > ioc->alt_ioc a pointer to the real IOC structure? If so, we
> > > could use it to check if this IOC also is in deep trouble.
> > >
> > >
> > > Cheers,
> > > Bernd
> > >
> > > > Thanks
> > > > Sathya
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Bernd Schubert [mailto:bs@q-leap.de]
> > > > Sent: Tuesday, September 23, 2008 7:00 PM
> > > > To: Linux SCSI Mailing List
> > > > Cc: Moore, Eric; Prakash, Sathya; James Bottomley; DL-MPT Fusion Linux
> > > > Subject: Re: [ PATCH 2/4 ] mpt fusion disable hard resets for 53C1030 based devices
> > > >
> > > > This is patch 2/4 ...
> > > >
> > > > On Tuesday 23 September 2008 15:26:30 Bernd Schubert wrote:
> > > > > For 53C1030 based dual port HBAs the hard reset handler will cause
> > > > > trouble on the second channel with innocent devices. It is then
> > > > > better to fail the device which activated the error handler than to
> > > > > cause errors on unrelated devices. Of course, the real solution
> > > > > would be to figure out why the hard reset handler causes trouble on
> > > > > the second channel. Probably only LSI can do that, though.
> > > > >
> > > > > Signed-off-by: Bernd Schubert <bs@q-leap.de>
> > > > >
> > > > >  drivers/message/fusion/mptbase.c |   42 ++++++++++++++++++++++++++++-
> > > > >  drivers/message/fusion/mptspi.c  |   31 +++++++++++++++++++++
> > > > >  2 files changed, 72 insertions(+), 1 deletion(-)
> > > > >
> > > > > Index: linux-2.6.26/drivers/message/fusion/mptbase.c
> > > > > ===================================================================
> > > > > --- linux-2.6.26.orig/drivers/message/fusion/mptbase.c
> > > > > +++ linux-2.6.26/drivers/message/fusion/mptbase.c
> > > > > @@ -59,6 +59,7 @@
> > > > >  #include <linux/interrupt.h>	/* needed for in_interrupt() proto */
> > > > >  #include <linux/dma-mapping.h>
> > > > >  #include <asm/io.h>
> > > > > +#include <scsi/scsi_device.h>
> > > > >  #ifdef CONFIG_MTRR
> > > > >  #include <asm/mtrr.h>
> > > > >  #endif
>
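Bernd's quoted note mentions unposted generic scsi-error patches that cap the number of failures within a time frame before offlining the device. Those patches are not shown in this thread, so the following is only a hypothetical sliding-window limiter illustrating the idea; `struct reset_limiter`, `reset_allowed()` and the `MAX_RESETS` budget are assumptions, not his code.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RESETS 4 /* hypothetical budget of resets per window */

/* Ring of the most recent reset timestamps, in seconds. */
struct reset_limiter {
	long stamps[MAX_RESETS];
	size_t count; /* valid entries, never more than MAX_RESETS */
	size_t next;  /* slot for the next stamp; holds the oldest once full */
	long window;  /* length of the sliding window in seconds */
};

/*
 * Returns true and records the reset if another one is allowed at 'now'.
 * Returns false when MAX_RESETS resets already fall inside the window;
 * that is the point where a caller would offline the device instead.
 */
static bool reset_allowed(struct reset_limiter *rl, long now)
{
	if (rl->count == MAX_RESETS) {
		long oldest = rl->stamps[rl->next];

		if (now - oldest < rl->window)
			return false;
	}
	rl->stamps[rl->next] = now;
	rl->next = (rl->next + 1) % MAX_RESETS;
	if (rl->count < MAX_RESETS)
		rl->count++;
	return true;
}
```

A caller would consult reset_allowed() before escalating and offline the device once it returns false, which bounds the hard-reset ping-pong described above to MAX_RESETS attempts per window.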
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [ PATCH 2/4 ] mpt fusion disable hard resets for 53C1030 based devices
2008-10-07 20:00 ` Moore, Eric
@ 2008-10-07 21:27 ` Bernd Schubert
0 siblings, 0 replies; 18+ messages in thread
From: Bernd Schubert @ 2008-10-07 21:27 UTC (permalink / raw)
To: Moore, Eric
Cc: James.Bottomley@HansenPartnership.com, linux-scsi@vger.kernel.org,
Prakash, Sathya
Yes, please send me the application that puts the card into FAULT. Also, is there
documentation available on how to access the MPT firmware and/or the 53C1030
chip?
So we can agree that an example of a dual-function card is the LSI22320R
HBA :) This is what lspci shows here for this HBA:
01:03.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
01:03.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
Also, as I wrote before, the patch is not supposed to change anything for
FC or SAS devices, only for 53C1030 based HBAs.
One way or another, the hard reset handler will cause trouble on 53C1030
devices. If it is not run, the port it is supposed to recover will not
work anymore, and if it is run, the other port will also get errors...
Cheers,
Bernd
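One way to reconcile the positions in this thread is a policy that still protects a healthy alternate port but escalates when a hard reset is unavoidable: Eric notes that only a hard reset recovers an IOC in FAULT, and Bernd earlier suggested that a hard reset does no extra harm when both IOCs are in trouble. The sketch below is hypothetical and is not what the driver or the patch does; `enum ioc_health`, `struct ioc_view` and `should_hard_reset()` are illustrative, and how the driver would actually detect FAULT or trouble on the alternate function is left open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one IOC function's health. */
enum ioc_health { IOC_OK, IOC_TIMEOUTS, IOC_FAULT };

struct ioc_view {
	enum ioc_health health;
	bool has_devices;
};

/*
 * Decide, after a failed soft reset, whether a hard reset should be tried.
 * 'alt' is NULL when there is no alternate function.
 */
static bool should_hard_reset(const struct ioc_view *self,
			      const struct ioc_view *alt,
			      bool no_hard_reset)
{
	/* Nothing innocent to protect: keep the original escalation. */
	if (!no_hard_reset || alt == NULL || !alt->has_devices)
		return true;
	/* Per Eric, only a hard reset recovers an IOC in FAULT. */
	if (self->health == IOC_FAULT)
		return true;
	/* Both functions already in trouble: a hard reset costs little extra. */
	if (alt->health != IOC_OK)
		return true;
	/* Otherwise protect the healthy, busy alternate port. */
	return false;
}
```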
On Tue, Oct 07, 2008 at 02:00:59PM -0600, Moore, Eric wrote:
> No, the hard reset can't be removed. If your controller ever goes into FAULT state, only the hard reset will recover it. The soft reset is unable to recover a card in fault. I have an application I can send you that will put a card into FAULT. Please let me know.
>
> Regarding multifunction cards: you can identify them with lspci. Also, when the driver loads, you will have an ioc0 and an ioc1 assigned to a single controller.
>
> Here is an example of a 53C1030 dual-function card. Notice 03:01.0 is the 1st function, and 03:01.1 is the 2nd function.
>
> 03:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
> 03:01.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
>
> Here is an example of a Fibre Channel dual-function card. Notice 04:02.0 is the 1st function, and 04:02.1 is the 2nd function.
>
> 04:02.0 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre Channel Adapter (rev 80)
> 04:02.1 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre Channel Adapter (rev 80)
>
> Eric
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [ PATCH 1/4 ] mpt fusion SoftReset handler
2008-09-23 13:20 ` [ PATCH 1/4 ] mpt fusion SoftReset handler Bernd Schubert
@ 2008-10-30 15:37 ` Prakash, Sathya
2008-10-30 16:18 ` Bernd Schubert
0 siblings, 1 reply; 18+ messages in thread
From: Prakash, Sathya @ 2008-10-30 15:37 UTC
To: Bernd Schubert, Linux SCSI Mailing List
Cc: Moore, Eric, James Bottomley, DL-MPT Fusion Linux
Bernd,
My comments are inline, marked with /*SP. Apart from those, the patch looks good.
Thanks
Sathya
-----Original Message-----
From: Bernd Schubert [mailto:bs@q-leap.de]
Sent: Tuesday, September 23, 2008 6:51 PM
To: Linux SCSI Mailing List
Cc: Moore, Eric; Prakash, Sathya; James Bottomley; DL-MPT Fusion Linux
Subject: [ PATCH 1/4 ] mpt fusion SoftReset handler
On dual-port 53C1030-based HBAs such as the LSI22320R, the hard reset handler causes DID_SOFT_ERROR for innocent devices on the second port.
Introduce mpt_SoftResetHandler(), which doesn't cause this issue, and slightly improve mpt_HardResetHandler().
This is mostly a backport from the fusion-4.x driver available from LSI.
Signed-off-by: Bernd Schubert <bs@q-leap.de>
drivers/message/fusion/mptbase.c | 211 ++++++++++++++++++++++++----
drivers/message/fusion/mptbase.h | 11 +
drivers/message/fusion/mptctl.c | 7
drivers/message/fusion/mptsas.c | 4
drivers/message/fusion/mptscsih.c | 35 ++--
5 files changed, 218 insertions(+), 50 deletions(-)
Index: linux-2.6.26/drivers/message/fusion/mptbase.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptbase.c
+++ linux-2.6.26/drivers/message/fusion/mptbase.c
@@ -5858,7 +5858,7 @@ mpt_timer_expired(unsigned long data)
dcprintk(ioc, printk(MYIOC_s_DEBUG_FMT "mpt_timer_expired! \n", ioc->name));
/* Perform a FW reload */
- if (mpt_HardResetHandler(ioc, NO_SLEEP) < 0)
+ if (mpt_SoftHardResetHandler(ioc, NO_SLEEP) < 0)
printk(MYIOC_s_WARN_FMT "Firmware Reload FAILED!\n", ioc->name);
/* No more processing.
@@ -6232,6 +6232,129 @@ mpt_print_ioc_summary(MPT_ADAPTER *ioc,
/*
* Reset Handling
*/
+
+/**
+ * mpt_SoftResetHandler - Issues a less expensive reset
+ * @ioc: Pointer to MPT_ADAPTER structure
+ * @sleepFlag: Indicates if sleep or schedule must be called.
+ *
+ *
+ * Returns 0 for SUCCESS or -1 if FAILED.
+ *
+ * Message Unit Reset - instructs the IOC to reset the Reply Post and
+ * Free FIFO's. All the Message Frames on Reply Free FIFO are discarded.
+ * All posted buffers are freed, and event notification is turned off.
+ * IOC doesn't reply to any outstanding request. This will transfer IOC
+ * to READY state.
+ **/
+static int
+mpt_SoftResetHandler(MPT_ADAPTER *ioc, int sleepFlag)
+{
+ int rc;
+ int ii;
+ u8 cb_idx;
+ unsigned long flags;
+ u32 ioc_state;
+ unsigned long time_count;
+
+ dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "SoftResetHandler Entered!\n",
+ ioc->name));
+
+ ioc_state = mpt_GetIocState(ioc, 0) & MPI_IOC_STATE_MASK;
+ if (ioc_state == MPI_IOC_STATE_FAULT ||
+ ioc_state == MPI_IOC_STATE_RESET) {
+ dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT
+ "skipping, either in FAULT or RESET state!\n", ioc->name));
+ return -1;
+ }
+
+ spin_lock_irqsave(&ioc->diagLock, flags);
+ if (ioc->ioc_reset_in_progress) {
+ spin_unlock_irqrestore(&ioc->diagLock, flags);
+ return -1;
+ }
+ ioc->ioc_reset_in_progress = 1;
+ spin_unlock_irqrestore(&ioc->diagLock, flags);
+
+ rc = -1;
+
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx])
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_SETUP_RESET);
+ }
+
/*SP: Move the following PRE_RESET signal after issuing the MUR. In its current position it can sometimes cause the firmware to fault. */
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx])
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_PRE_RESET);
+ }
+
+ /* Disable reply interrupts (also blocks FreeQ) */
+ CHIPREG_WRITE32(&ioc->chip->IntMask, 0xFFFFFFFF);
+ ioc->active = 0;
+ time_count = jiffies;
+ rc = SendIocReset(ioc, MPI_FUNCTION_IOC_MESSAGE_UNIT_RESET, sleepFlag);
+ if (rc != 0)
+ goto out;
+ ioc_state = mpt_GetIocState(ioc, 0) & MPI_IOC_STATE_MASK;
+ if (ioc_state != MPI_IOC_STATE_READY)
+ goto out;
+
+ for (ii = 0; ii < 5; ii++) {
+ /* Get IOC facts! Allow 5 retries */
+ rc = GetIocFacts(ioc, sleepFlag, MPT_HOSTEVENT_IOC_RECOVER);
+ if (rc == 0)
+ break;
+ if (sleepFlag == CAN_SLEEP)
+ msleep(100);
+ else
+ mdelay(100);
+ }
+ if (ii == 5)
+ goto out;
+
+ rc = PrimeIocFifos(ioc);
+ if (rc != 0)
+ goto out;
+
+ rc = SendIocInit(ioc, sleepFlag);
+ if (rc != 0)
+ goto out;
+
+ rc = SendEventNotification(ioc, 1);
+ if (rc != 0)
+ goto out;
+
+ if (ioc->hard_resets < -1)
+ ioc->hard_resets++;
+
+ /*
+ * At this point, we know soft reset succeeded.
+ */
+
+ ioc->active = 1;
+ CHIPREG_WRITE32(&ioc->chip->IntMask, MPI_HIM_DIM);
+
+ out:
+ spin_lock_irqsave(&ioc->diagLock, flags);
+ ioc->ioc_reset_in_progress = 0;
+ ioc->taskmgmt_quiesce_io = 0;
+ ioc->taskmgmt_in_progress = 0;
+ spin_unlock_irqrestore(&ioc->diagLock, flags);
+
+ if (ioc->active) { /* otherwise, hard reset coming */
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx])
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_POST_RESET);
+ }
+ }
+
+ printk(MYIOC_s_INFO_FMT "SoftResetHandler: completed (%d seconds): %s\n",
+ ioc->name, jiffies_to_msecs(jiffies - time_count)/1000,
+ ((rc == 0) ? "SUCCESS" : "FAILED"));
+
+ return rc;
+}
+
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/**
* mpt_HardResetHandler - Generic reset handler
@@ -6253,9 +6376,10 @@ int
mpt_HardResetHandler(MPT_ADAPTER *ioc, int sleepFlag)
{
int rc;
+ u8 cb_idx;
unsigned long flags;
+ unsigned long time_count;
- dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "HardResetHandler Entered!\n", ioc->name));
#ifdef MFCNT
printk(MYIOC_s_INFO_FMT "HardResetHandler Entered!\n", ioc->name);
printk("MF count 0x%x !\n", ioc->mfcnt);
@@ -6265,12 +6389,15 @@ mpt_HardResetHandler(MPT_ADAPTER *ioc, i
* mpt_do_ioc_recovery at any instant in time.
*/
spin_lock_irqsave(&ioc->diagLock, flags);
- if ((ioc->diagPending) || (ioc->alt_ioc && ioc->alt_ioc->diagPending)){
+ if (ioc->ioc_reset_in_progress) {
spin_unlock_irqrestore(&ioc->diagLock, flags);
return 0;
} else {
ioc->diagPending = 1;
}
+ ioc->ioc_reset_in_progress = 1;
+ if (ioc->alt_ioc)
+ ioc->alt_ioc->ioc_reset_in_progress = 1;
spin_unlock_irqrestore(&ioc->diagLock, flags);
/* FIXME: If do_ioc_recovery fails, repeat....
@@ -6281,42 +6408,71 @@ mpt_HardResetHandler(MPT_ADAPTER *ioc, i
* Prevents timeouts occurring during a diagnostic reset...very bad.
* For all other protocol drivers, this is a no-op.
*/
- {
- u8 cb_idx;
- int r = 0;
-
- for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
- if (MptResetHandlers[cb_idx]) {
- dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "Calling IOC reset_setup handler #%d\n",
- ioc->name, cb_idx));
- r += mpt_signal_reset(cb_idx, ioc, MPT_IOC_SETUP_RESET);
- if (ioc->alt_ioc) {
- dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "Calling alt-%s setup reset handler #%d\n",
- ioc->name, ioc->alt_ioc->name, cb_idx));
- r += mpt_signal_reset(cb_idx, ioc->alt_ioc, MPT_IOC_SETUP_RESET);
- }
- }
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx]) {
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_SETUP_RESET);
+ if (ioc->alt_ioc)
+ mpt_signal_reset(cb_idx, ioc->alt_ioc,
+ MPT_IOC_SETUP_RESET);
}
}
- if ((rc = mpt_do_ioc_recovery(ioc, MPT_HOSTEVENT_IOC_RECOVER, sleepFlag)) != 0) {
- printk(MYIOC_s_WARN_FMT "Cannot recover rc = %d!\n", ioc->name, rc);
+ time_count = jiffies;
+ rc = mpt_do_ioc_recovery(ioc, MPT_HOSTEVENT_IOC_RECOVER, sleepFlag);
+ if (rc != 0) {
+ printk(KERN_WARNING MYNAM ": WARNING - (%d) Cannot recover %s\n",
+ rc, ioc->name);
+ } else {
+ if (ioc->hard_resets < -1)
+ ioc->hard_resets++;
}
- ioc->reload_fw = 0;
- if (ioc->alt_ioc)
- ioc->alt_ioc->reload_fw = 0;
spin_lock_irqsave(&ioc->diagLock, flags);
- ioc->diagPending = 0;
- if (ioc->alt_ioc)
- ioc->alt_ioc->diagPending = 0;
+ ioc->ioc_reset_in_progress = 0;
+ ioc->taskmgmt_quiesce_io = 0;
+ ioc->taskmgmt_in_progress = 0;
+ if (ioc->alt_ioc) {
+ ioc->alt_ioc->ioc_reset_in_progress = 0;
+ ioc->alt_ioc->taskmgmt_quiesce_io = 0;
+ ioc->alt_ioc->taskmgmt_in_progress = 0;
+ }
spin_unlock_irqrestore(&ioc->diagLock, flags);
- dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "HardResetHandler rc = %d!\n", ioc->name, rc));
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx]) {
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_POST_RESET);
+ if (ioc->alt_ioc)
+ mpt_signal_reset(cb_idx, ioc->alt_ioc, MPT_IOC_POST_RESET);
+ }
+ }
+
+ dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "HardResetHandler: completed (%d seconds): %s\n",
+ ioc->name, jiffies_to_msecs(jiffies - time_count)/1000,
+ ((rc == 0) ? "SUCCESS" : "FAILED")));
+ return rc;
+}
+
+/**
+ * mpt_SoftHardResetHandler - Generic reset handler
+ * @ioc: Pointer to MPT_ADAPTER structure
+ * @sleepFlag: Indicates if sleep or schedule must be called.
+ *
+ * First try to do a soft reset and if this fails, call the
+ * hard-reset-handler
+ */
+int
+mpt_SoftHardResetHandler(MPT_ADAPTER *ioc, int sleepFlag)
+{
+ int rc;
+
+ rc = mpt_SoftResetHandler(ioc, sleepFlag);
+ if (rc) {
+ rc = mpt_HardResetHandler(ioc, sleepFlag);
+ }
return rc;
}
+
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
static void
EventDescriptionStr(u8 event, u32 evData0, char *evStr)
@@ -7475,6 +7631,7 @@ EXPORT_SYMBOL(mpt_verify_adapter);
EXPORT_SYMBOL(mpt_GetIocState);
EXPORT_SYMBOL(mpt_print_ioc_summary);
EXPORT_SYMBOL(mpt_HardResetHandler);
+EXPORT_SYMBOL(mpt_SoftHardResetHandler);
EXPORT_SYMBOL(mpt_config);
EXPORT_SYMBOL(mpt_findImVolumes);
EXPORT_SYMBOL(mpt_alloc_fw_memory);
Index: linux-2.6.26/drivers/message/fusion/mptbase.h
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptbase.h
+++ linux-2.6.26/drivers/message/fusion/mptbase.h
@@ -699,6 +699,9 @@ typedef struct _MPT_ADAPTER
MPT_SAS_MGMT sas_mgmt;
struct work_struct sas_persist_task;
+ int taskmgmt_in_progress;
+ u8 taskmgmt_quiesce_io;
+
struct work_struct fc_setup_reset_work;
struct list_head fc_rports;
struct work_struct fc_lsc_work;
@@ -707,6 +710,11 @@ typedef struct _MPT_ADAPTER
struct work_struct fc_rescan_work;
char fc_rescan_work_q_name[KOBJ_NAME_LEN];
struct workqueue_struct *fc_rescan_work_q;
+
+ unsigned long hard_resets; /* driver forced bus resets count */
+ unsigned long soft_resets; /* fw/external bus resets count */
+ u8 ioc_reset_in_progress;
+
struct scsi_cmnd **ScsiLookup;
spinlock_t scsi_lookup_lock;
} MPT_ADAPTER;
@@ -836,8 +844,6 @@ typedef struct _MPT_SCSI_HOST {
MPT_FRAME_HDR *cmdPtr; /* Ptr to nonOS request */
struct scsi_cmnd *abortSCpnt;
MPT_LOCAL_REPLY localReply; /* internal cmd reply struct */
- unsigned long hard_resets; /* driver forced bus resets count */
- unsigned long soft_resets; /* fw/external bus resets count */
unsigned long timeouts; /* cmd timeouts */
ushort sel_timeout[MPT_MAX_FC_DEVICES];
char *info_kbuf;
@@ -908,6 +914,7 @@ extern int mpt_verify_adapter(int iocid
extern u32 mpt_GetIocState(MPT_ADAPTER *ioc, int cooked);
extern void mpt_print_ioc_summary(MPT_ADAPTER *ioc, char *buf, int *size, int len, int showlan);
extern int mpt_HardResetHandler(MPT_ADAPTER *ioc, int sleepFlag);
+extern int mpt_SoftHardResetHandler(MPT_ADAPTER *ioc, int sleepFlag);
extern int mpt_config(MPT_ADAPTER *ioc, CONFIGPARMS *cfg);
extern int mpt_alloc_fw_memory(MPT_ADAPTER *ioc, int size);
extern void mpt_free_fw_memory(MPT_ADAPTER *ioc);
Index: linux-2.6.26/drivers/message/fusion/mptscsih.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptscsih.c
+++ linux-2.6.26/drivers/message/fusion/mptscsih.c
@@ -1605,7 +1605,7 @@ mptscsih_TMHandler(MPT_SCSI_HOST *hd, u8
"TM Handler for type=%x: IOC Not operational (0x%x)!\n",
ioc->name, type, ioc_raw_state);
printk(MYIOC_s_WARN_FMT " Issuing HardReset!!\n", ioc->name);
- if (mpt_HardResetHandler(ioc, CAN_SLEEP) < 0)
+ if (mpt_SoftHardResetHandler(ioc, CAN_SLEEP) < 0)
printk(MYIOC_s_WARN_FMT "TMHandler: HardReset "
"FAILED!!\n", ioc->name);
return FAILED;
@@ -1621,8 +1621,8 @@ mptscsih_TMHandler(MPT_SCSI_HOST *hd, u8
/* Isse the Task Mgmt request.
*/
- if (hd->hard_resets < -1)
- hd->hard_resets++;
+ if (ioc->hard_resets < -1)
+ ioc->hard_resets++;
rc = mptscsih_IssueTaskMgmt(hd, type, channel, id, lun,
ctx2abort, timeout);
@@ -1724,7 +1724,7 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd
ioc, mf));
dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "Calling HardReset! \n",
ioc->name));
- retval = mpt_HardResetHandler(ioc, CAN_SLEEP);
+ retval = mpt_SoftHardResetHandler(ioc, CAN_SLEEP);
dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "rc=%d \n",
ioc->name, retval));
goto fail_out;
@@ -1999,11 +1999,12 @@ int
mptscsih_host_reset(struct scsi_cmnd *SCpnt)
{
MPT_SCSI_HOST * hd;
- int retval;
+ int retval, status;
MPT_ADAPTER *ioc;
/* If we can't locate the host to reset, then we failed. */
- if ((hd = shost_priv(SCpnt->device->host)) == NULL){
+ hd = shost_priv(SCpnt->device->host);
+ if (hd == NULL) {
printk(KERN_ERR MYNAM ": host reset: "
"Can't locate host! (sc=%p)\n", SCpnt);
return FAILED;
@@ -2016,21 +2017,23 @@ mptscsih_host_reset(struct scsi_cmnd *SC
/* If our attempts to reset the host failed, then return a failed
* status. The host will be taken off line by the SCSI mid-layer.
*/
- if (mpt_HardResetHandler(ioc, CAN_SLEEP) < 0) {
- retval = FAILED;
- } else {
+ retval = mpt_SoftHardResetHandler(ioc, CAN_SLEEP);
+
+ if (retval < 0)
+ status = FAILED;
+ else {
/* Make sure TM pending is cleared and TM state is set to
* NONE.
*/
- retval = 0;
+ status = SUCCESS;
hd->tmPending = 0;
hd->tmState = TM_STATE_NONE;
}
printk(MYIOC_s_INFO_FMT "host reset: %s (sc=%p)\n",
- ioc->name, ((retval == 0) ? "SUCCESS" : "FAILED" ), SCpnt);
+ ioc->name, ((status == SUCCESS) ? "SUCCESS" : "FAILED"),
+ SCpnt);
- return retval;
+ return status;
}
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
@@ -2219,7 +2222,7 @@ mptscsih_taskmgmt_complete(MPT_ADAPTER *
*/
if (iocstatus == MPI_IOCSTATUS_SCSI_TASK_MGMT_FAILED ||
hd->cmdPtr)
- if (mpt_HardResetHandler(ioc, NO_SLEEP) < 0)
+ if (mpt_SoftHardResetHandler(ioc, NO_SLEEP) < 0)
printk(MYIOC_s_WARN_FMT " Firmware Reload FAILED!!\n", ioc->name);
break;
@@ -2741,8 +2744,8 @@ mptscsih_event_process(MPT_ADAPTER *ioc,
break;
case MPI_EVENT_IOC_BUS_RESET: /* 04 */
case MPI_EVENT_EXT_BUS_RESET: /* 05 */
- if (hd && (ioc->bus_type == SPI) && (hd->soft_resets < -1))
- hd->soft_resets++;
+ if (hd && (ioc->bus_type == SPI) && (ioc->soft_resets < -1))
+ ioc->soft_resets++;
break;
case MPI_EVENT_LOGOUT: /* 09 */
/* FIXME! */
@@ -2980,7 +2983,7 @@ mptscsih_timer_expired(unsigned long dat
*/
} else {
/* Perform a FW reload */
- if (mpt_HardResetHandler(ioc, NO_SLEEP) < 0) {
+ if (mpt_SoftHardResetHandler(ioc, NO_SLEEP) < 0) {
printk(MYIOC_s_WARN_FMT "Firmware Reload FAILED!\n", ioc->name);
}
}
Index: linux-2.6.26/drivers/message/fusion/mptctl.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptctl.c
+++ linux-2.6.26/drivers/message/fusion/mptctl.c
@@ -323,7 +323,7 @@ static void mptctl_timeout_expired (MPT_
*/
dctlprintk(ioctl->ioc, printk(MYIOC_s_DEBUG_FMT "Calling HardReset! \n",
ioctl->ioc->name));
- mpt_HardResetHandler(ioctl->ioc, CAN_SLEEP);
+ mpt_SoftHardResetHandler(ioctl->ioc, CAN_SLEEP);
}
return;
@@ -680,6 +680,7 @@ static int mptctl_do_reset(unsigned long
dctlprintk(iocp, printk(MYIOC_s_DEBUG_FMT "mptctl_do_reset called.\n",
iocp->name));
+ /* FIXME: Can we call mpt_SoftHardResetHandler() here? */
/*SP No. It should be retained as a last resort for clearing firmware faults via utilities. */
if (mpt_HardResetHandler(iocp, CAN_SLEEP) != 0) {
printk (MYIOC_s_ERR_FMT "%s@%d::mptctl_do_reset - reset failed.\n",
iocp->name, __FILE__, __LINE__);
@@ -2467,8 +2468,8 @@ mptctl_hp_hostinfo(unsigned long arg, un
MPT_SCSI_HOST *hd = shost_priv(ioc->sh);
if (hd && (cim_rev == 1)) {
- karg.hard_resets = hd->hard_resets;
- karg.soft_resets = hd->soft_resets;
+ karg.hard_resets = ioc->hard_resets;
+ karg.soft_resets = ioc->soft_resets;
karg.timeouts = hd->timeouts;
}
}
Index: linux-2.6.26/drivers/message/fusion/mptsas.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptsas.c
+++ linux-2.6.26/drivers/message/fusion/mptsas.c
@@ -1167,7 +1167,7 @@ static int mptsas_phy_reset(struct sas_p
if (!timeleft) {
/* On timeout reset the board */
mpt_free_msg_frame(ioc, mf);
- mpt_HardResetHandler(ioc, CAN_SLEEP);
+ mpt_SoftHardResetHandler(ioc, CAN_SLEEP);
error = -ETIMEDOUT;
goto out_unlock;
}
@@ -1345,7 +1345,7 @@ static int mptsas_smp_handler(struct Scs
if (!timeleft) {
printk(MYIOC_s_ERR_FMT "%s: smp timeout!\n", ioc->name, __FUNCTION__);
/* On timeout reset the board */
- mpt_HardResetHandler(ioc, CAN_SLEEP);
+ mpt_SoftHardResetHandler(ioc, CAN_SLEEP);
ret = -ETIMEDOUT;
goto unmap;
}
--
Bernd Schubert
Q-Leap Networks GmbH
* Re: [ PATCH 1/4 ] mpt fusion SoftReset handler
2008-10-30 15:37 ` Prakash, Sathya
@ 2008-10-30 16:18 ` Bernd Schubert
2008-10-30 18:10 ` Prakash, Sathya
0 siblings, 1 reply; 18+ messages in thread
From: Bernd Schubert @ 2008-10-30 16:18 UTC
To: Prakash, Sathya
Cc: Linux SCSI Mailing List, Moore, Eric, James Bottomley,
DL-MPT Fusion Linux
Hello Sathya,
thanks for the review! See below for my comments.
On Thursday 30 October 2008 16:37:33 Prakash, Sathya wrote:
> Bernd,
> My comments are inlined with /*SP. Except those the patch looks good
> Thanks
> Sathya
> + for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
> + if (MptResetHandlers[cb_idx])
> + mpt_signal_reset(cb_idx, ioc, MPT_IOC_SETUP_RESET);
> + }
> +
>
> /*SP: Move the following PRE_RESET signal after issuing the MUR. This will
> cause the firmware to fault sometime */
>
> + for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
> + if (MptResetHandlers[cb_idx])
> + mpt_signal_reset(cb_idx, ioc, MPT_IOC_PRE_RESET);
> + }
> +
> + /* Disable reply interrupts (also blocks FreeQ) */
> + CHIPREG_WRITE32(&ioc->chip->IntMask, 0xFFFFFFFF);
> + ioc->active = 0;
> + time_count = jiffies;
> + rc = SendIocReset(ioc, MPI_FUNCTION_IOC_MESSAGE_UNIT_RESET, sleepFlag);
> + if (rc != 0)
> + goto out;
You mean
mpt_signal_reset(cb_idx, ioc, MPT_IOC_PRE_RESET);
here? But at this point we already did a reset, so MPT_IOC_PRE_RESET seems a
bit odd. But if you are sure... (I really wish I had the MPT docs).
[...]
>
> @@ -680,6 +680,7 @@ static int mptctl_do_reset(unsigned long
> dctlprintk(iocp, printk(MYIOC_s_DEBUG_FMT "mptctl_do_reset called.\n",
> iocp->name));
>
> + /* FIXME: Can we call mptSoftHardResetHandler() here? */
> /*SP No It should be retained as a last option to clear firmware faults using utilities */
> if (mpt_HardResetHandler(iocp, CAN_SLEEP) != 0) {
Ok, I will remove this hunk then.
Cheers,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
* RE: [ PATCH 3/4 ] mpt fusion prevent DV deadlock
2008-09-23 13:27 ` [ PATCH 3/4 ] mpt fusion prevent DV deadlock Bernd Schubert
@ 2008-10-30 17:58 ` Prakash, Sathya
2008-10-31 18:53 ` Bernd Schubert
0 siblings, 1 reply; 18+ messages in thread
From: Prakash, Sathya @ 2008-10-30 17:58 UTC
To: Bernd Schubert, Linux SCSI Mailing List
Cc: Moore, Eric, James Bottomley, DL-MPT Fusion Linux
Bernd,
This looks OK to me, but I am not sure whether we should take this workaround into the upstream driver, as it avoids the deadlock rather than fixing it. I hope you can retain it in your in-house driver.
Maybe James can comment on whether we can take this into the MPT driver or not?
Thanks
Sathya
-----Original Message-----
From: Bernd Schubert [mailto:bs@q-leap.de]
Sent: Tuesday, September 23, 2008 6:58 PM
To: Linux SCSI Mailing List
Cc: Moore, Eric; Prakash, Sathya; James Bottomley; DL-MPT Fusion Linux
Subject: [ PATCH 3/4 ] mpt fusion prevent DV deadlock
The mpt fusion driver does a domain revalidation on an IOC reset, but this DV might cause a live deadlock. The problem has been fully analyzed in this thread http://marc.info/?t=118039577800004, but so far none of the suggested solutions has been implemented.
This patch simply skips the domain revalidation when it knows the DV would run into the deadlock.
Signed-off-by: Bernd Schubert <bs@q-leap.de>
drivers/message/fusion/mptspi.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
Index: linux-2.6.26/drivers/message/fusion/mptspi.c
===================================================================
--- linux-2.6.26.orig/drivers/message/fusion/mptspi.c
+++ linux-2.6.26/drivers/message/fusion/mptspi.c
@@ -672,12 +672,24 @@ static void mptspi_dv_device(struct _MPT
{
VirtTarget *vtarget = scsi_target(sdev)->hostdata;
MPT_ADAPTER *ioc = hd->ioc;
+ unsigned long nr_requests = sdev->request_queue->nr_requests;
+ struct request_list *rl = &sdev->request_queue->rq;
/* no DV on RAID devices */
if (sdev->channel == 0 &&
mptspi_is_raid(hd, sdev->id))
return;
+ if (rl->count[0] + 1 >= nr_requests
+ || rl->count[1] + 1 >= nr_requests) {
+ /* we must NOT do a DV after an error recovery when no
+ * space is left in the request list, since
+ * this will cause a live deadlock */
+ starget_printk(KERN_INFO, scsi_target(sdev), MYIOC_s_FMT
+ "Skipping DV, to prevent dead lock!\n", ioc->name);
+ return;
+ }
+
/* If this is a piece of a RAID, then quiesce first */
if (sdev->channel == 1 &&
mptscsih_quiesce_raid(hd, 1, vtarget->channel, vtarget->id) < 0) {
--
Bernd Schubert
Q-Leap Networks GmbH
* RE: [ PATCH 1/4 ] mpt fusion SoftReset handler
2008-10-30 16:18 ` Bernd Schubert
@ 2008-10-30 18:10 ` Prakash, Sathya
0 siblings, 0 replies; 18+ messages in thread
From: Prakash, Sathya @ 2008-10-30 18:10 UTC
To: Bernd Schubert
Cc: Linux SCSI Mailing List, Moore, Eric, James Bottomley,
DL-MPT Fusion Linux
Hi Bernd,
Response inline
Thanks
Sathya
-----Original Message-----
From: Bernd Schubert [mailto:bs@q-leap.de]
Sent: Thursday, October 30, 2008 9:48 PM
To: Prakash, Sathya
Cc: Linux SCSI Mailing List; Moore, Eric; James Bottomley; DL-MPT Fusion Linux
Subject: Re: [ PATCH 1/4 ] mpt fusion SoftReset handler
Hello Sathya,
thanks for review! See below for my comments.
On Thursday 30 October 2008 16:37:33 Prakash, Sathya wrote:
> Bernd,
> My comments are inlined with /*SP. Except those the patch looks good
> Thanks Sathya
> + for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
> + if (MptResetHandlers[cb_idx])
> + mpt_signal_reset(cb_idx, ioc, MPT_IOC_SETUP_RESET);
> + }
> +
>
> /*SP: Move the following PRE_RESET signal after issuing the MUR. This
> will cause the firmware to fault sometime */
>
> + for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
> + if (MptResetHandlers[cb_idx])
> + mpt_signal_reset(cb_idx, ioc, MPT_IOC_PRE_RESET);
> + }
> +
> + /* Disable reply interrupts (also blocks FreeQ) */
> + CHIPREG_WRITE32(&ioc->chip->IntMask, 0xFFFFFFFF);
> + ioc->active = 0;
> + time_count = jiffies;
> + rc = SendIocReset(ioc, MPI_FUNCTION_IOC_MESSAGE_UNIT_RESET, sleepFlag);
> + if (rc != 0)
> + goto out;
You mean
mpt_signal_reset(cb_idx, ioc, MPT_IOC_PRE_RESET);
here? But at this point we already did a reset, so MPT_IOC_PRE_RESET seems a bit odd. But if you are sure... (I really wish I had the MPT docs).
[...]
/* Yes, here. The name PRE_RESET is confusing: if you look at what the PRE_RESET handler in mptscsih.c does, it clears the pending requests issued to the firmware. If that happens before the MUR is sent to the firmware, the firmware may, before processing the MUR, try to find a request to do DMA, fail to find it because the driver already cleared it in PRE_RESET, and fault as a result.
>
> @@ -680,6 +680,7 @@ static int mptctl_do_reset(unsigned long
> dctlprintk(iocp, printk(MYIOC_s_DEBUG_FMT "mptctl_do_reset called.\n",
> iocp->name));
>
> + /* FIXME: Can we call mptSoftHardResetHandler() here? */
> /*SP No It should be retained as a last option to clear firmware faults using utilities */
> if (mpt_HardResetHandler(iocp, CAN_SLEEP) != 0) {
Ok, I will remove this hunk then.
Cheers,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
* Re: [ PATCH 3/4 ] mpt fusion prevent DV deadlock
2008-10-30 17:58 ` Prakash, Sathya
@ 2008-10-31 18:53 ` Bernd Schubert
0 siblings, 0 replies; 18+ messages in thread
From: Bernd Schubert @ 2008-10-31 18:53 UTC
To: Prakash, Sathya
Cc: Linux SCSI Mailing List, Moore, Eric, James Bottomley,
DL-MPT Fusion Linux
Hello Sathya,
thanks for the review. I have prepared an updated patch series, but I can't test it
properly, since someone broke the error handler between 2.6.26 and
2.6.28-rc. The error handler gets activated, but then deadlocks somewhere (not
always, but in 4 of 5 test cases) :(
I don't have time to figure out what this is right now; all of that has to wait until
next Wednesday, when I'm back in the office.
Regarding the DV deadlock patch, this is something we really need to fix
upstream as well, since transtec (from whom I get my test hardware) mostly uses SLES
or RHEL kernels, and this patch is essential for them.
Thanks,
Bernd
On Thursday 30 October 2008 18:58:31 Prakash, Sathya wrote:
> Bernd,
> This looks OK for me, but I am not sure whether we should take this
> work-around to the upstream driver. As this is a workaround to avoid the
> deadlock and not the actual fix. I hope you can retain this with your
> in-house driver.
>
> May be James can comment whether we can take this into MPT driver or not?
>
> Thanks
> Sathya
>
> -----Original Message-----
> From: Bernd Schubert [mailto:bs@q-leap.de]
> Sent: Tuesday, September 23, 2008 6:58 PM
> To: Linux SCSI Mailing List
> Cc: Moore, Eric; Prakash, Sathya; James Bottomley; DL-MPT Fusion Linux
> Subject: [ PATCH 3/4 ] mpt fusion prevent DV deadlock
>
> The mpt fusion driver will do a domain revalidation on an ioc reset, but
> this DV might cause a live deadlock. The problem has been entirely analyzed
> in this thread http://marc.info/?t=118039577800004, but so far none of the
> suggested solutions has been implemented. This patch simply disables the
> domain revalidation, if it does know it will run into the deadlock.
>
> Signed-off-by: Bernd Schubert <bs@q-leap.de>
>
> drivers/message/fusion/mptspi.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> Index: linux-2.6.26/drivers/message/fusion/mptspi.c
> ===================================================================
> --- linux-2.6.26.orig/drivers/message/fusion/mptspi.c
> +++ linux-2.6.26/drivers/message/fusion/mptspi.c
> @@ -672,12 +672,24 @@ static void mptspi_dv_device(struct _MPT
> {
> VirtTarget *vtarget = scsi_target(sdev)->hostdata;
> MPT_ADAPTER *ioc = hd->ioc;
> + unsigned long nr_requests = sdev->request_queue->nr_requests;
> + struct request_list *rl = &sdev->request_queue->rq;
>
> /* no DV on RAID devices */
> if (sdev->channel == 0 &&
> mptspi_is_raid(hd, sdev->id))
> return;
>
> + if (rl->count[0] + 1 >= nr_requests
> + || rl->count[1] + 1 >= nr_requests) {
> + /* we must NOT do a DV after an error recovery, when we
> + * don't have left a space in the request list, since
> + * this will cause a live dead lock */
> + starget_printk(KERN_INFO, scsi_target(sdev), MYIOC_s_FMT
> + "Skipping DV, to prevent dead lock!\n", ioc->name);
> + return;
> + }
> +
> /* If this is a piece of a RAID, then quiesce first */
> if (sdev->channel == 1 &&
> mptscsih_quiesce_raid(hd, 1, vtarget->channel, vtarget->id) < 0) {
>
>
>
> --
> Bernd Schubert
> Q-Leap Networks GmbH
--
Bernd Schubert
Q-Leap Networks GmbH
Thread overview: 18+ messages
2008-09-23 13:16 [PATCH 0/4 V2] mpt fusion error handler patches Bernd Schubert
2008-09-23 13:20 ` [ PATCH 1/4 ] mpt fusion SoftReset handler Bernd Schubert
2008-10-30 15:37 ` Prakash, Sathya
2008-10-30 16:18 ` Bernd Schubert
2008-10-30 18:10 ` Prakash, Sathya
2008-09-23 13:26 ` [ PATCH 1/4 ] mpt fusion disable hard resets for 53C1030 based devices Bernd Schubert
2008-09-23 13:30 ` [ PATCH 2/4 " Bernd Schubert
2008-10-06 9:07 ` Prakash, Sathya
2008-10-06 9:32 ` Bernd Schubert
2008-09-23 13:27 ` [ PATCH 3/4 ] mpt fusion prevent DV deadlock Bernd Schubert
2008-10-30 17:58 ` Prakash, Sathya
2008-10-31 18:53 ` Bernd Schubert
2008-09-23 13:28 ` [PATCH 4/4 ] Increase scsi timeouts Bernd Schubert
2008-10-06 9:07 ` Prakash, Sathya
[not found] <6A4D764DC1BDE14495DA8DC60A3D69531E967A0759@hkgmail01.lsi.com>
2008-10-07 16:25 ` [ PATCH 2/4 ] mpt fusion disable hard resets for 53C1030 based devices Moore, Eric
2008-10-07 17:09 ` Bernd Schubert
2008-10-07 20:00 ` Moore, Eric
2008-10-07 21:27 ` Bernd Schubert