linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org
To: gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b@public.gmane.org
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	Vennila Megavannan
	<vennila.megavannan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Subject: [PATCH v2 04/22] staging/rdma/hfi1: Prevent host software lock up
Date: Mon, 19 Oct 2015 22:11:19 -0400	[thread overview]
Message-ID: <1445307097-8244-5-git-send-email-ira.weiny@intel.com> (raw)
In-Reply-To: <1445307097-8244-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

From: Vennila Megavannan <vennila.megavannan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

If packets stop egressing the hardware link, software can lock up.

Implement a timeout for send context halt recovery.  This patch increases the
timeout for packet egress to 500 us and timer resets to zero if the packet
occupancy changes. Also we bounce the link on time out.

Reviewed-by: Dean Luick <dean.luick-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Vennila Megavannan <vennila.megavannan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/staging/rdma/hfi1/pio.c  | 14 +++++++++++---
 drivers/staging/rdma/hfi1/sdma.c | 15 ++++++++++++---
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/pio.c b/drivers/staging/rdma/hfi1/pio.c
index 67dd93a6888c..e5c32db4bc67 100644
--- a/drivers/staging/rdma/hfi1/pio.c
+++ b/drivers/staging/rdma/hfi1/pio.c
@@ -922,10 +922,12 @@ void sc_disable(struct send_context *sc)
 static void sc_wait_for_packet_egress(struct send_context *sc, int pause)
 {
 	struct hfi1_devdata *dd = sc->dd;
-	u64 reg;
+	u64 reg = 0;
+	u64 reg_prev;
 	u32 loop = 0;
 
 	while (1) {
+		reg_prev = reg;
 		reg = read_csr(dd, sc->hw_context * 8 +
 			       SEND_EGRESS_CTXT_STATUS);
 		/* done if egress is stopped */
@@ -934,11 +936,17 @@ static void sc_wait_for_packet_egress(struct send_context *sc, int pause)
 		reg = packet_occupancy(reg);
 		if (reg == 0)
 			break;
-		if (loop > 100) {
+		/* counter is reset if occupancy count changes */
+		if (reg != reg_prev)
+			loop = 0;
+		if (loop > 500) {
+			/* timed out - bounce the link */
 			dd_dev_err(dd,
-				"%s: context %u(%u) timeout waiting for packets to egress, remaining count %u\n",
+				"%s: context %u(%u) timeout waiting for packets to egress, remaining count %u, bouncing link\n",
 				__func__, sc->sw_index,
 				sc->hw_context, (u32)reg);
+			queue_work(dd->pport->hfi1_wq,
+				&dd->pport->link_bounce_work);
 			break;
 		}
 		loop++;
diff --git a/drivers/staging/rdma/hfi1/sdma.c b/drivers/staging/rdma/hfi1/sdma.c
index 63ab72102183..d57531796723 100644
--- a/drivers/staging/rdma/hfi1/sdma.c
+++ b/drivers/staging/rdma/hfi1/sdma.c
@@ -303,17 +303,26 @@ static void sdma_wait_for_packet_egress(struct sdma_engine *sde,
 	u64 off = 8 * sde->this_idx;
 	struct hfi1_devdata *dd = sde->dd;
 	int lcnt = 0;
+	u64 reg_prev;
+	u64 reg = 0;
 
 	while (1) {
-		u64 reg = read_csr(dd, off + SEND_EGRESS_SEND_DMA_STATUS);
+		reg_prev = reg;
+		reg = read_csr(dd, off + SEND_EGRESS_SEND_DMA_STATUS);
 
 		reg &= SDMA_EGRESS_PACKET_OCCUPANCY_SMASK;
 		reg >>= SDMA_EGRESS_PACKET_OCCUPANCY_SHIFT;
 		if (reg == 0)
 			break;
-		if (lcnt++ > 100) {
-			dd_dev_err(dd, "%s: engine %u timeout waiting for packets to egress, remaining count %u\n",
+		/* counter is reest if accupancy count changes */
+		if (reg != reg_prev)
+			lcnt = 0;
+		if (lcnt++ > 500) {
+			/* timed out - bounce the link */
+			dd_dev_err(dd, "%s: engine %u timeout waiting for packets to egress, remaining count %u, bouncing link\n",
 				  __func__, sde->this_idx, (u32)reg);
+			queue_work(dd->pport->hfi1_wq,
+				&dd->pport->link_bounce_work);
 			break;
 		}
 		udelay(1);
-- 
1.8.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2015-10-20  2:11 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-20  2:11 [PATCH v2 00/22] staging/rdma/hfi1: Fix bugs and performance issues ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found] ` <1445307097-8244-1-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-10-20  2:11   ` [PATCH v2 01/22] staging/rdma/hfi1: Fix regression in send performance ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1445307097-8244-2-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-10-21 13:18       ` Dan Carpenter
2015-10-26  2:10         ` ira.weiny
2015-10-20  2:11   ` [PATCH v2 02/22] staging/rdma/hfi1: Fix code to reset ASIC CSRs on FLR ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 03/22] staging/rdma/hfi1: Extend the offline timeout ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` ira.weiny-ral2JQCrhuEAvxtiuMwx3w [this message]
2015-10-20  2:11   ` [PATCH v2 05/22] staging/rdma/hfi1: Remove QSFP_ENABLED from HFI capability mask ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 06/22] staging/rdma/hfi1: Add coalescing support for SDMA TX descriptors ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 07/22] staging/rdma/hfi1: Fix sparse error in sdma.h file ira.weiny-ral2JQCrhuEAvxtiuMwx3w
     [not found]     ` <1445307097-8244-8-git-send-email-ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-10-21 14:12       ` Dan Carpenter
2015-10-21 16:29         ` Weiny, Ira
2015-10-22 10:01           ` Dan Carpenter
2015-10-25  1:59             ` gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r
2015-10-20  2:11   ` [PATCH v2 08/22] staging/rdma/hfi1: close shared context security hole ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 09/22] staging/rdma/hfi1: Reset firmware instead of reloading Sbus ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 10/22] staging/rdma/hfi1: Add a schedule in send thread ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 11/22] staging/rdma/hfi1: Fix port bounce issues with 0.22 DC firmware ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 12/22] staging/rdma/hfi1: Prevent silent data corruption with user SDMA ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 13/22] staging/rdma/hfi1: Macro code clean up ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 14/22] staging/rdma/hfi1: Implement Expected Receive TID caching ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-22 10:41     ` Dan Carpenter
2015-10-22 23:18       ` ira.weiny
     [not found]         ` <20151022231819.GB4019-W4f6Xiosr+yv7QzWx2u06xL4W9x8LtSr@public.gmane.org>
2015-10-23  3:53           ` Dan Carpenter
2015-10-20  2:11   ` [PATCH v2 15/22] staging/rdma/hfi1: Allow tuning of SDMA interrupt rate ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-22 10:54     ` Dan Carpenter
2015-10-22 22:27       ` ira.weiny
2015-10-20  2:11   ` [PATCH v2 16/22] staging/rdma/hfi1: Add irqsaves in the packet processing path ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 17/22] staging/rdma/hfi1: Thread the receive interrupt ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 18/22] staging/rdma/hfi: modify workqueue for parallelism ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 19/22] staging/rdma/hfi1: Load SBus firmware once per ASIC ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 20/22] staging/rdma/hfi1: Add unit # to verbs txreq cache name ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 21/22] staging/rdma/hfi1: add additional rc traces ira.weiny-ral2JQCrhuEAvxtiuMwx3w
2015-10-20  2:11   ` [PATCH v2 22/22] staging/rdma/hfi1: Update driver version string to 0.9-294 ira.weiny-ral2JQCrhuEAvxtiuMwx3w

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1445307097-8244-5-git-send-email-ira.weiny@intel.com \
    --to=ira.weiny-ral2jqcrhueavxtiumwx3w@public.gmane.org \
    --cc=dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    --cc=devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b@public.gmane.org \
    --cc=dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    --cc=vennila.megavannan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).