linux-rdma.vger.kernel.org archive mirror
From: ira.weiny@intel.com
To: gregkh@linuxfoundation.org, devel@driverdev.osuosl.org
Cc: linux-rdma@vger.kernel.org, dennis.dalessandro@intel.com,
	dledford@redhat.com,
	Vennila Megavannan <vennila.megavannan@intel.com>
Subject: [PATCH v3 04/23] staging/rdma/hfi1: Prevent host software lock up
Date: Mon, 26 Oct 2015 10:28:30 -0400
Message-ID: <1445869729-7507-5-git-send-email-ira.weiny@intel.com>
In-Reply-To: <1445869729-7507-1-git-send-email-ira.weiny@intel.com>

From: Vennila Megavannan <vennila.megavannan@intel.com>

If packets stop egressing the hardware link, software can lock up.

Implement a timeout for send context halt recovery.  In both the PIO send
context (pio.c) and SDMA engine (sdma.c) egress wait loops, this patch
increases the packet egress timeout to 500 us and resets the timer to zero
whenever the packet occupancy changes.  On timeout, the link is bounced.
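The reset-on-progress timeout described above can be sketched in user-space C. This is a minimal model under stated assumptions, not the driver code: `occupancy_fn`, `wait_for_egress`, `struct sim` and `sim_read` are hypothetical names, the simulated register stands in for read_csr()/packet_occupancy(), and the udelay() and queue_work() calls are reduced to comments.

```c
#include <stdint.h>

/* Hypothetical stand-in for read_csr() + packet_occupancy(): returns the
 * number of packets still waiting to egress. */
typedef uint64_t (*occupancy_fn)(void *ctx);

/*
 * Poll until occupancy reaches zero (returns 0) or until it stays
 * unchanged for more than max_stall consecutive polls (returns -1).
 * *polls receives the number of reads performed.
 */
static int wait_for_egress(occupancy_fn read_occ, void *ctx,
			   uint32_t max_stall, uint32_t *polls)
{
	uint64_t reg = 0;
	uint64_t reg_prev;
	uint32_t loop = 0;

	*polls = 0;
	for (;;) {
		reg_prev = reg;
		reg = read_occ(ctx);
		(*polls)++;
		if (reg == 0)
			return 0;
		/* progress was made: restart the stall counter */
		if (reg != reg_prev)
			loop = 0;
		if (loop > max_stall)
			return -1;	/* the driver queues link_bounce_work here */
		loop++;
		/* the driver delays (udelay) here before the next read */
	}
}

/* Hypothetical simulated hardware: occupancy drops by one every `period`
 * reads; period == 0 models a link that has stopped egressing. */
struct sim {
	uint64_t occ;
	uint32_t period;
	uint32_t tick;
};

static uint64_t sim_read(void *ctx)
{
	struct sim *s = ctx;

	if (s->period && s->occ && ++s->tick % s->period == 0)
		s->occ--;
	return s->occ;
}
```

With a simulated stall (`struct sim s = { 5, 0, 0 };`) and max_stall = 500, the call returns -1 after 502 reads. With occupancy 3 draining one packet every 400 reads it returns 0 after 1200 reads, even though the total exceeds 500, because each change restarts the counter. A consequence of the reset-on-change design is that the timeout bounds each individual stall, not the overall wait, so a slowly draining queue can take considerably longer than the nominal timeout.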

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/staging/rdma/hfi1/pio.c  | 14 +++++++++++---
 drivers/staging/rdma/hfi1/sdma.c | 15 ++++++++++++---
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/pio.c b/drivers/staging/rdma/hfi1/pio.c
index 67dd93a6888c..e5c32db4bc67 100644
--- a/drivers/staging/rdma/hfi1/pio.c
+++ b/drivers/staging/rdma/hfi1/pio.c
@@ -922,10 +922,12 @@ void sc_disable(struct send_context *sc)
 static void sc_wait_for_packet_egress(struct send_context *sc, int pause)
 {
 	struct hfi1_devdata *dd = sc->dd;
-	u64 reg;
+	u64 reg = 0;
+	u64 reg_prev;
 	u32 loop = 0;
 
 	while (1) {
+		reg_prev = reg;
 		reg = read_csr(dd, sc->hw_context * 8 +
 			       SEND_EGRESS_CTXT_STATUS);
 		/* done if egress is stopped */
@@ -934,11 +936,17 @@ static void sc_wait_for_packet_egress(struct send_context *sc, int pause)
 		reg = packet_occupancy(reg);
 		if (reg == 0)
 			break;
-		if (loop > 100) {
+		/* counter is reset if occupancy count changes */
+		if (reg != reg_prev)
+			loop = 0;
+		if (loop > 500) {
+			/* timed out - bounce the link */
 			dd_dev_err(dd,
-				"%s: context %u(%u) timeout waiting for packets to egress, remaining count %u\n",
+				"%s: context %u(%u) timeout waiting for packets to egress, remaining count %u, bouncing link\n",
 				__func__, sc->sw_index,
 				sc->hw_context, (u32)reg);
+			queue_work(dd->pport->hfi1_wq,
+				&dd->pport->link_bounce_work);
 			break;
 		}
 		loop++;
diff --git a/drivers/staging/rdma/hfi1/sdma.c b/drivers/staging/rdma/hfi1/sdma.c
index 63ab72102183..d57531796723 100644
--- a/drivers/staging/rdma/hfi1/sdma.c
+++ b/drivers/staging/rdma/hfi1/sdma.c
@@ -303,17 +303,26 @@ static void sdma_wait_for_packet_egress(struct sdma_engine *sde,
 	u64 off = 8 * sde->this_idx;
 	struct hfi1_devdata *dd = sde->dd;
 	int lcnt = 0;
+	u64 reg_prev;
+	u64 reg = 0;
 
 	while (1) {
-		u64 reg = read_csr(dd, off + SEND_EGRESS_SEND_DMA_STATUS);
+		reg_prev = reg;
+		reg = read_csr(dd, off + SEND_EGRESS_SEND_DMA_STATUS);
 
 		reg &= SDMA_EGRESS_PACKET_OCCUPANCY_SMASK;
 		reg >>= SDMA_EGRESS_PACKET_OCCUPANCY_SHIFT;
 		if (reg == 0)
 			break;
-		if (lcnt++ > 100) {
-			dd_dev_err(dd, "%s: engine %u timeout waiting for packets to egress, remaining count %u\n",
+		/* counter is reset if occupancy count changes */
+		if (reg != reg_prev)
+			lcnt = 0;
+		if (lcnt++ > 500) {
+			/* timed out - bounce the link */
+			dd_dev_err(dd, "%s: engine %u timeout waiting for packets to egress, remaining count %u, bouncing link\n",
 				  __func__, sde->this_idx, (u32)reg);
+			queue_work(dd->pport->hfi1_wq,
+				&dd->pport->link_bounce_work);
 			break;
 		}
 		udelay(1);
-- 
1.8.2
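The two hunks spell the stall counter differently: pio.c checks `loop > 500` and increments at the bottom of the loop, while sdma.c uses the post-increment `lcnt++ > 500`. The hypothetical helpers below model only that counter logic, assuming an occupancy value that never changes, and count how many register reads each form performs before giving up.

```c
#include <stdint.h>

/* Hypothetical model of the pio.c loop: test the counter, increment later. */
static uint32_t pio_style_reads(uint32_t limit, uint64_t occ)
{
	uint64_t reg = 0, reg_prev;
	uint32_t loop = 0, reads = 0;

	for (;;) {
		reg_prev = reg;
		reg = occ;		/* stalled: occupancy never changes */
		reads++;
		if (reg == 0)
			break;
		if (reg != reg_prev)
			loop = 0;
		if (loop > limit)
			break;		/* timeout */
		loop++;
	}
	return reads;
}

/* Hypothetical model of the sdma.c loop: post-increment inside the test. */
static uint32_t sdma_style_reads(uint32_t limit, uint64_t occ)
{
	uint64_t reg = 0, reg_prev;
	int lcnt = 0;
	uint32_t reads = 0;

	for (;;) {
		reg_prev = reg;
		reg = occ;		/* stalled: occupancy never changes */
		reads++;
		if (reg == 0)
			break;
		if (reg != reg_prev)
			lcnt = 0;
		if (lcnt++ > (int)limit)
			break;		/* timeout */
	}
	return reads;
}
```

Both helpers return 502 for a limit of 500, so the two forms are equivalent. In the sdma.c loop every non-final read is followed by udelay(1), giving 501 delays, which is consistent with the roughly 500 us timeout stated in the commit message (ignoring the time spent on the CSR reads themselves).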

Thread overview: 33+ messages
2015-10-26 14:28 [PATCH v3 00/23] staging/rdma/hfi1: Fix bugs and performance issues ira.weiny
2015-10-26 14:28 ` [PATCH v3 02/23] staging/rdma/hfi1: Fix code to reset ASIC CSRs on FLR ira.weiny
2015-10-26 14:28 ` ira.weiny [this message]
2015-10-26 14:28 ` [PATCH v3 05/23] staging/rdma/hfi1: Remove QSFP_ENABLED from HFI capability mask ira.weiny
2015-10-26 14:28 ` [PATCH v3 06/23] staging/rdma/hfi1: Add coalescing support for SDMA TX descriptors ira.weiny
2015-10-26 14:28   ` [PATCH v3 01/23] staging/rdma/hfi1: Fix regression in send performance ira.weiny
2015-10-26 14:28   ` [PATCH v3 03/23] staging/rdma/hfi1: Extend the offline timeout ira.weiny
2015-10-26 14:28   ` [PATCH v3 07/23] staging/rdma/hfi1: close shared context security hole ira.weiny
2015-10-26 14:28   ` [PATCH v3 09/23] staging/rdma/hfi1: Add a schedule in send thread ira.weiny
2015-10-26 14:28   ` [PATCH v3 12/23] staging/rdma/hfi1: Macro code clean up ira.weiny
2015-10-27  8:19       ` Greg KH
2015-10-27 20:51           ` ira.weiny
2015-10-27 21:14               ` Greg KH
2015-10-28 15:47                   ` ira.weiny
2015-10-26 14:28   ` [PATCH v3 13/23] staging/rdma/hfi1: Wrong cast breaks desired pointer arithmetic ira.weiny
2015-10-26 14:28   ` [PATCH v3 15/23] staging/rdma/hfi1: Allow tuning of SDMA interrupt rate ira.weiny
2015-10-26 14:28   ` [PATCH v3 18/23] staging/rdma/hfi1: Thread the receive interrupt ira.weiny
2015-10-26 14:28   ` [PATCH v3 19/23] staging/rdma/hfi: modify workqueue for parallelism ira.weiny
2015-10-27  8:44     ` Greg KH
2015-10-26 14:28   ` [PATCH v3 20/23] staging/rdma/hfi1: Load SBus firmware once per ASIC ira.weiny
2015-10-26 14:28   ` [PATCH v3 21/23] staging/rdma/hfi1: Add unit # to verbs txreq cache name ira.weiny
2015-10-26 14:28   ` [PATCH v3 22/23] staging/rdma/hfi1: add additional rc traces ira.weiny
2015-10-26 14:28   ` [PATCH v3 23/23] staging/rdma/hfi1: Update driver version string to 0.9-294 ira.weiny
2015-10-27  8:46       ` Greg KH
2015-10-27 21:00           ` ira.weiny
2015-10-27 21:15             ` Greg KH
2015-10-26 14:28 ` [PATCH v3 08/23] staging/rdma/hfi1: Reset firmware instead of reloading Sbus ira.weiny
2015-10-26 14:28 ` [PATCH v3 10/23] staging/rdma/hfi1: Fix port bounce issues with 0.22 DC firmware ira.weiny
2015-10-26 14:28 ` [PATCH v3 11/23] staging/rdma/hfi1: Prevent silent data corruption with user SDMA ira.weiny
2015-10-26 14:28 ` [PATCH v3 14/23] staging/rdma/hfi1: Implement Expected Receive TID caching ira.weiny
2015-10-27  8:22   ` Greg KH
2015-10-26 14:28 ` [PATCH v3 16/23] staging/rdma/hfi1: Increase SDMA descriptor queue size ira.weiny
2015-10-26 14:28 ` [PATCH v3 17/23] staging/rdma/hfi1: Add irqsaves in the packet processing path ira.weiny