From: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Easwar Hariharan
<easwar.hariharan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
Dean Luick <dean.luick-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Subject: [PATCH 11/28] IB/hfi1: Explain state complete frame details
Date: Mon, 25 Jul 2016 13:38:56 -0700
Message-ID: <20160725203855.4800.55229.stgit@scvm10.sc.intel.com>
In-Reply-To: <20160725203554.4800.37248.stgit-9QXIwq+3FY+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
From: Dean Luick <dean.luick-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
When link up fails during link negotiation and initialization (LNI),
the local and peer state complete frames are reported only as raw
numbers. Decode the frames and explain what the values mean so the
operator can better diagnose the problem.
Reviewed-by: Easwar Hariharan <easwar.hariharan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dean Luick <dean.luick-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
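Note for readers: the field layout this patch decodes can be tried out
in user space. The following is only a minimal sketch, assuming the bit
layout given in the comment inside decode_state_complete(). The names
struct lni_frame, lni_frame_decode() and lni_state_name() are
hypothetical helpers that are not part of the driver, and the frame
value in the usage note is made up for illustration.

```c
#include <stdint.h>

/* Fields of one state complete frame, per the layout comment in the patch. */
struct lni_frame {
	unsigned int success;	/* bit  0     */
	unsigned int state;	/* bits 3:1   */
	unsigned int timeout;	/* bits 7:4, next state timeout */
	unsigned int reason;	/* bits 15:8  */
	unsigned int lanes;	/* bits 31:16 */
};

/* Hypothetical helper: split a raw 32-bit frame into its fields. */
static inline struct lni_frame lni_frame_decode(uint32_t frame)
{
	struct lni_frame f;

	f.success = frame & 0x1;
	f.state = (frame >> 1) & 0x7;
	f.timeout = (frame >> 4) & 0xf;
	f.reason = (frame >> 8) & 0xff;
	f.lanes = (frame >> 16) & 0xffff;
	return f;
}

/* Hypothetical helper: same names and fallback as state_completed_string(). */
static inline const char *lni_state_name(unsigned int state)
{
	static const char * const names[] = {
		"EstablishComm",
		"OptimizeEQ",
		"VerifyCap"
	};

	if (state < sizeof(names) / sizeof(names[0]))
		return names[state];
	return "unknown";
}
```

With the made-up frame 0x00031200, lni_frame_decode() yields success 0,
state 0 (EstablishComm), reason 0x12 and a passing lane mask of 0x3.
Reason 0x12 is the "sufficient lanes" frame sync entry in
state_complete_reasons[].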
drivers/infiniband/hw/hfi1/chip.c | 134 +++++++++++++++++++++++++++++++++++--
1 files changed, 126 insertions(+), 8 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index e5f49ef..f3782b3 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -9918,6 +9918,131 @@ static int wait_phy_linkstate(struct hfi1_devdata *dd, u32 state, u32 msecs)
return 0;
}
+static const char *state_completed_string(u32 completed)
+{
+ static const char * const state_completed[] = {
+ "EstablishComm",
+ "OptimizeEQ",
+ "VerifyCap"
+ };
+
+ if (completed < ARRAY_SIZE(state_completed))
+ return state_completed[completed];
+
+ return "unknown";
+}
+
+static const char all_lanes_dead_timeout_expired[] =
+ "All lanes were inactive – was the interconnect media removed?";
+static const char tx_out_of_policy[] =
+ "Passing lanes on local port do not meet the local link width policy";
+static const char no_state_complete[] =
+ "State timeout occurred before link partner completed the state";
+static const char * const state_complete_reasons[] = {
+ [0x00] = "Reason unknown",
+ [0x01] = "Link was halted by driver, refer to LinkDownReason",
+ [0x02] = "Link partner reported failure",
+ [0x10] = "Unable to achieve frame sync on any lane",
+ [0x11] =
+ "Unable to find a common bit rate with the link partner",
+ [0x12] =
+ "Unable to achieve frame sync on sufficient lanes to meet the local link width policy",
+ [0x13] =
+ "Unable to identify preset equalization on sufficient lanes to meet the local link width policy",
+ [0x14] = no_state_complete,
+ [0x15] =
+ "State timeout occurred before link partner identified equalization presets",
+ [0x16] =
+ "Link partner completed the EstablishComm state, but the passing lanes do not meet the local link width policy",
+ [0x17] = tx_out_of_policy,
+ [0x20] = all_lanes_dead_timeout_expired,
+ [0x21] =
+ "Unable to achieve acceptable BER on sufficient lanes to meet the local link width policy",
+ [0x22] = no_state_complete,
+ [0x23] =
+ "Link partner completed the OptimizeEq state, but the passing lanes do not meet the local link width policy",
+ [0x24] = tx_out_of_policy,
+ [0x30] = all_lanes_dead_timeout_expired,
+ [0x31] =
+ "State timeout occurred waiting for host to process received frames",
+ [0x32] = no_state_complete,
+ [0x33] =
+ "Link partner completed the VerifyCap state, but the passing lanes do not meet the local link width policy",
+ [0x34] = tx_out_of_policy,
+};
+
+static const char *state_complete_reason_code_string(struct hfi1_pportdata *ppd,
+ u32 code)
+{
+ const char *str = NULL;
+
+ if (code < ARRAY_SIZE(state_complete_reasons))
+ str = state_complete_reasons[code];
+
+ if (str)
+ return str;
+ return "Reserved";
+}
+
+/* describe the given last state complete frame */
+static void decode_state_complete(struct hfi1_pportdata *ppd, u32 frame,
+ const char *prefix)
+{
+ struct hfi1_devdata *dd = ppd->dd;
+ u32 success;
+ u32 state;
+ u32 reason;
+ u32 lanes;
+
+ /*
+ * Decode frame:
+ * [ 0: 0] - success
+ * [ 3: 1] - state
+ * [ 7: 4] - next state timeout
+ * [15: 8] - reason code
+ * [31:16] - lanes
+ */
+ success = frame & 0x1;
+ state = (frame >> 1) & 0x7;
+ reason = (frame >> 8) & 0xff;
+ lanes = (frame >> 16) & 0xffff;
+
+ dd_dev_err(dd, "Last %s LNI state complete frame 0x%08x:\n",
+ prefix, frame);
+ dd_dev_err(dd, " last reported state: %s (0x%x)\n",
+ state_completed_string(state), state);
+ dd_dev_err(dd, " state successfully completed: %s\n",
+ success ? "yes" : "no");
+ dd_dev_err(dd, " fail reason 0x%x: %s\n",
+ reason, state_complete_reason_code_string(ppd, reason));
+ dd_dev_err(dd, " passing lane mask: 0x%x\n", lanes);
+}
+
+/*
+ * Read the last state complete frames and explain them. This routine
+ * expects to be called if the link went down during link negotiation
+ * and initialization (LNI). That is, anywhere between polling and link up.
+ */
+static void check_lni_states(struct hfi1_pportdata *ppd)
+{
+ u32 last_local_state;
+ u32 last_remote_state;
+
+ read_last_local_state(ppd->dd, &last_local_state);
+ read_last_remote_state(ppd->dd, &last_remote_state);
+
+ /*
+ * Don't report anything if there is nothing to report. A value of
+ * 0 means the link was taken down while polling and there was no
+ * training in-process.
+ */
+ if (last_local_state == 0 && last_remote_state == 0)
+ return;
+
+ decode_state_complete(ppd, last_local_state, "transmitted");
+ decode_state_complete(ppd, last_remote_state, "received");
+}
+
/*
* Helper for set_link_state(). Do not call except from that routine.
* Expects ppd->hls_mutex to be held.
@@ -9930,8 +10055,6 @@ static int goto_offline(struct hfi1_pportdata *ppd, u8 rem_reason)
{
struct hfi1_devdata *dd = ppd->dd;
u32 pstate, previous_state;
- u32 last_local_state;
- u32 last_remote_state;
int ret;
int do_transition;
int do_wait;
@@ -10031,12 +10154,7 @@ static int goto_offline(struct hfi1_pportdata *ppd, u8 rem_reason)
} else if (previous_state
& (HLS_DN_POLL | HLS_VERIFY_CAP | HLS_GOING_UP)) {
/* went down while attempting link up */
- /* byte 1 of last_*_state is the failure reason */
- read_last_local_state(dd, &last_local_state);
- read_last_remote_state(dd, &last_remote_state);
- dd_dev_err(dd,
- "LNI failure last states: local 0x%08x, remote 0x%08x\n",
- last_local_state, last_remote_state);
+ check_lni_states(ppd);
}
/* the active link width (downgrade) is 0 on link down */
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html