linux-arm-msm.vger.kernel.org archive mirror
From: Yaniv Gardi <ygardi@codeaurora.org>
To: James.Bottomley@HansenPartnership.com, pebolle@tiscali.nl,
	hch@infradead.org
Cc: linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-arm-msm@vger.kernel.org, santoshsy@gmail.com,
	linux-scsi-owner@vger.kernel.org, subhashj@codeaurora.org,
	ygardi@codeaurora.org, gbroner@codeaurora.org,
	draviv@codeaurora.org, Vinayak Holikatti <vinholikatti@gmail.com>,
	"James E.J. Bottomley" <JBottomley@odin.com>
Subject: [PATCH v1 11/17] scsi: ufs: add error recovery after DL NAC error
Date: Sun, 13 Sep 2015 17:52:51 +0300
Message-ID: <1442155977-7686-12-git-send-email-ygardi@codeaurora.org>
In-Reply-To: <1442155977-7686-1-git-send-email-ygardi@codeaurora.org>

Some vendors' UFS devices send back-to-back NACs for DL data frames,
causing the host controller to raise the DFES error status. Sometimes
such devices send back-to-back NACs without waiting for the
retransmitted DL frame from the host, and in such cases the host UniPro
may go into a bad state without raising the DFES error interrupt. If
this happens, all pending commands time out only after their respective
SW command timeouts expire (which are generally too long).

This change works around such device behaviour as follows:
- As soon as SW sees the DL NAC error, it schedules the error
  handler.
- The error handler sleeps for 50ms to see if any fatal errors are
  raised by the UFS controller.
   - If there are fatal errors, SW does normal error recovery.
   - If there are no fatal errors, SW sends a NOP command to the
     device to check if the link is alive.
       - If the NOP command times out, SW does normal error recovery.
       - If the NOP command succeeds, SW skips the error handling.

If the DL NAC error is seen multiple times with some vendor's UFS
devices, enable the UFS_DEVICE_QUIRK_RECOVERY_FROM_DL_NAC_ERRORS quirk
to initiate quick error recovery and also silence the related error
logs to reduce spamming of the kernel logs.

Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>

---
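The decision flow described above can be sketched outside the driver as
a small standalone C model. This is only an illustration under stated
assumptions: the SK_* bit values, struct sk_state, the two callback
types and every sk_* function are hypothetical names made up for this
sketch, not ufshcd definitions. The three fatal-error checks of the
patch are folded into a single SK_FATAL_ERROR bit, and the 50ms sleep
and the NOP probe are replaced by callbacks so the flow can be run
without hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch; not the driver's definitions. */
#define SK_FATAL_ERROR	(1u << 0)	/* any fatal error, in saved_err */
#define SK_UIC_ERROR	(1u << 1)	/* a UIC error was reported, in saved_err */
#define SK_DL_NAC	(1u << 0)	/* DL NAC received, in saved_uic_err */
#define SK_DL_REPLAY	(1u << 1)	/* DL replay timeout, in saved_uic_err */

struct sk_state {
	uint32_t saved_err;
	uint32_t saved_uic_err;
};

/* Stand-in for the 50ms wait, during which new errors may be latched. */
typedef void (*sk_settle_fn)(struct sk_state *s);
/* Stand-in for the NOP probe: returns 0 if the device answered. */
typedef int (*sk_nop_fn)(void);

/* Returns true if normal error recovery is needed, false if it can be skipped. */
bool sk_needs_recovery(struct sk_state *s, sk_settle_fn settle, sk_nop_fn nop)
{
	/* Fatal or replay-timeout errors always take the normal path. */
	if (s->saved_err & SK_FATAL_ERROR)
		return true;
	if ((s->saved_err & SK_UIC_ERROR) && (s->saved_uic_err & SK_DL_REPLAY))
		return true;
	/* No DL NAC error recorded: nothing for the workaround to filter. */
	if (!((s->saved_err & SK_UIC_ERROR) && (s->saved_uic_err & SK_DL_NAC)))
		return true;

	settle(s);

	/* Anything other than DL NAC showed up while waiting. */
	if ((s->saved_err & SK_FATAL_ERROR) ||
	    ((s->saved_err & SK_UIC_ERROR) && (s->saved_uic_err & ~SK_DL_NAC)))
		return true;

	/* DL NAC is the only error so far: probe the link. */
	if (nop() != 0)
		return true;

	/* Link is alive: drop the DL NAC error. */
	if (s->saved_uic_err == SK_DL_NAC)
		s->saved_err &= ~SK_UIC_ERROR;
	s->saved_uic_err &= ~SK_DL_NAC;
	return s->saved_uic_err != 0;
}

void sk_settle_quiet(struct sk_state *s) { (void)s; }
void sk_settle_fatal(struct sk_state *s) { s->saved_err |= SK_FATAL_ERROR; }
int sk_nop_ok(void) { return 0; }
int sk_nop_timeout(void) { return -1; }

/* Usage example: walks the three outcomes listed in the description. */
void sk_selfcheck(void)
{
	struct sk_state a = { SK_UIC_ERROR, SK_DL_NAC };
	struct sk_state b = { SK_UIC_ERROR, SK_DL_NAC };
	struct sk_state c = { SK_UIC_ERROR, SK_DL_NAC };

	/* NOP succeeds: recovery is skipped and the saved errors are cleared. */
	assert(!sk_needs_recovery(&a, sk_settle_quiet, sk_nop_ok));
	assert(a.saved_err == 0 && a.saved_uic_err == 0);
	/* NOP times out: normal recovery. */
	assert(sk_needs_recovery(&b, sk_settle_quiet, sk_nop_timeout));
	/* Fatal error raised during the wait: normal recovery. */
	assert(sk_needs_recovery(&c, sk_settle_fatal, sk_nop_ok));
}
```

Passing the wait and the probe as callbacks is a choice made only for
this sketch; the patch itself calls msleep(50) and
ufshcd_verify_dev_init() directly, dropping the host lock around both.
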
 drivers/scsi/ufs/ufshcd.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/ufs/ufshci.h |  2 +
 2 files changed, 95 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 6a1a8cd..a649250 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3837,6 +3837,79 @@ static void ufshcd_complete_requests(struct ufs_hba *hba)
 }
 
 /**
+ * ufshcd_quirk_dl_nac_errors - check whether error handling is needed
+ *				to recover from DL NAC errors.
+ * @hba: per-adapter instance
+ *
+ * Returns true if error handling is required, false otherwise
+ */
+static bool ufshcd_quirk_dl_nac_errors(struct ufs_hba *hba)
+{
+	unsigned long flags;
+	bool err_handling = true;
+
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	/*
+	 * UFS_DEVICE_QUIRK_RECOVERY_FROM_DL_NAC_ERRORS only works around
+	 * device fatal errors and/or DL NAC & REPLAY timeout errors.
+	 */
+	if (hba->saved_err & (CONTROLLER_FATAL_ERROR | SYSTEM_BUS_FATAL_ERROR))
+		goto out;
+
+	if ((hba->saved_err & DEVICE_FATAL_ERROR) ||
+	    ((hba->saved_err & UIC_ERROR) &&
+	     (hba->saved_uic_err & UFSHCD_UIC_DL_TCx_REPLAY_ERROR)))
+		goto out;
+
+	if ((hba->saved_err & UIC_ERROR) &&
+	    (hba->saved_uic_err & UFSHCD_UIC_DL_NAC_RECEIVED_ERROR)) {
+		int err;
+		/*
+		 * wait for 50ms to see if we can get any other errors or not.
+		 */
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
+		msleep(50);
+		spin_lock_irqsave(hba->host->host_lock, flags);
+
+		/*
+		 * now check whether any severe error other than the DL NAC
+		 * error has been raised in the meantime.
+		 */
+		if ((hba->saved_err & INT_FATAL_ERRORS) ||
+		    ((hba->saved_err & UIC_ERROR) &&
+		    (hba->saved_uic_err & ~UFSHCD_UIC_DL_NAC_RECEIVED_ERROR)))
+			goto out;
+
+		/*
+		 * As DL NAC is the only error received so far, send out NOP
+		 * command to confirm if link is still active or not.
+		 *   - If we don't get any response then do error recovery.
+		 *   - If we get response then clear the DL NAC error bit.
+		 */
+
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
+		err = ufshcd_verify_dev_init(hba);
+		spin_lock_irqsave(hba->host->host_lock, flags);
+
+		if (err)
+			goto out;
+
+		/* Link seems to be alive hence ignore the DL NAC errors */
+		if (hba->saved_uic_err == UFSHCD_UIC_DL_NAC_RECEIVED_ERROR)
+			hba->saved_err &= ~UIC_ERROR;
+		/* clear NAC error */
+		hba->saved_uic_err &= ~UFSHCD_UIC_DL_NAC_RECEIVED_ERROR;
+		if (!hba->saved_uic_err) {
+			err_handling = false;
+			goto out;
+		}
+	}
+out:
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+	return err_handling;
+}
+
+/**
  * ufshcd_err_handler - handle UFS errors that require s/w attention
  * @work: pointer to work structure
  */
@@ -3864,6 +3937,17 @@ static void ufshcd_err_handler(struct work_struct *work)
 
 	/* Complete requests that have door-bell cleared by h/w */
 	ufshcd_complete_requests(hba);
+
+	if (hba->dev_quirks & UFS_DEVICE_QUIRK_RECOVERY_FROM_DL_NAC_ERRORS) {
+		bool ret;
+
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
+		/* release the lock as ufshcd_quirk_dl_nac_errors() may sleep */
+		ret = ufshcd_quirk_dl_nac_errors(hba);
+		spin_lock_irqsave(hba->host->host_lock, flags);
+		if (!ret)
+			goto skip_err_handling;
+	}
 	if ((hba->saved_err & INT_FATAL_ERRORS) ||
 	    ((hba->saved_err & UIC_ERROR) &&
 	    (hba->saved_uic_err & (UFSHCD_UIC_DL_PA_INIT_ERROR |
@@ -3939,6 +4023,7 @@ skip_pending_xfer_clear:
 		hba->saved_uic_err = 0;
 	}
 
+skip_err_handling:
 	if (!needs_reset) {
 		hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL;
 		if (hba->saved_err || hba->saved_uic_err)
@@ -3967,6 +4052,14 @@ static void ufshcd_update_uic_error(struct ufs_hba *hba)
 	reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_DATA_LINK_LAYER);
 	if (reg & UIC_DATA_LINK_LAYER_ERROR_PA_INIT)
 		hba->uic_error |= UFSHCD_UIC_DL_PA_INIT_ERROR;
+	else if (hba->dev_quirks &
+		   UFS_DEVICE_QUIRK_RECOVERY_FROM_DL_NAC_ERRORS) {
+		if (reg & UIC_DATA_LINK_LAYER_ERROR_NAC_RECEIVED)
+			hba->uic_error |=
+				UFSHCD_UIC_DL_NAC_RECEIVED_ERROR;
+		else if (reg & UIC_DATA_LINK_LAYER_ERROR_TCx_REPLAY_TIMEOUT)
+			hba->uic_error |= UFSHCD_UIC_DL_TCx_REPLAY_ERROR;
+	}
 
 	/* UIC NL/TL/DME errors needs software retry */
 	reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_NETWORK_LAYER);
diff --git a/drivers/scsi/ufs/ufshci.h b/drivers/scsi/ufs/ufshci.h
index 0ae0967..2b05bfb 100644
--- a/drivers/scsi/ufs/ufshci.h
+++ b/drivers/scsi/ufs/ufshci.h
@@ -170,6 +170,8 @@ enum {
 #define UIC_DATA_LINK_LAYER_ERROR		UFS_BIT(31)
 #define UIC_DATA_LINK_LAYER_ERROR_CODE_MASK	0x7FFF
 #define UIC_DATA_LINK_LAYER_ERROR_PA_INIT	0x2000
+#define UIC_DATA_LINK_LAYER_ERROR_NAC_RECEIVED	0x0001
+#define UIC_DATA_LINK_LAYER_ERROR_TCx_REPLAY_TIMEOUT 0x0002
 
 /* UECN - Host UIC Error Code Network Layer 40h */
 #define UIC_NETWORK_LAYER_ERROR			UFS_BIT(31)
-- 
1.8.5.2

-- 
QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
