From: Nick Child <nnac123@linux.ibm.com>
To: netdev@vger.kernel.org
Cc: haren@linux.ibm.com, ricklind@us.ibm.com, danymadden@us.ibm.com,
tlfalcon@linux.ibm.com, bjking1@linux.ibm.com,
Nick Child <nnac123@linux.ibm.com>
Subject: [PATCH net 4/5] ibmvnic: Do partial reset on login failure
Date: Thu, 3 Aug 2023 15:20:09 -0500
Message-ID: <20230803202010.37149-4-nnac123@linux.ibm.com>
In-Reply-To: <20230803202010.37149-1-nnac123@linux.ibm.com>

Perform a partial reset before sending a login request if either of the
following is true:
 1. The previous request timed out. Retrying without a reset is
    dangerous because the VIOS could still receive the old login
    request at any point after the timeout. Therefore, it is best to
    re-register the CRQs and sub-CRQs before retrying.
 2. The previous request returned an error that is not described in
    PAPR. PAPR provides procedures if the login returns with partial
    success or aborted return codes (section L.5.1), but other values
    do not have a defined procedure. Previously, these conditions
    just returned an error from the login function rather than trying
    to resolve the issue.
    Returning an error can cause further issues, since most callers of
    the login function are not prepared to handle an error when
    logging in. This improper cleanup can lead to the device being
    permanently DOWN'd. For example, if the VIOS believes that the
    device is already logged in, then it will return INVALID_STATE (-7).
    If we never re-register the CRQs, then the VIOS will always think
    that the device is already logged in. This leaves the device
    inoperable.

The partial reset involves freeing the sub-CRQs, freeing the CRQ, then
registering and initializing a new CRQ and sub-CRQs. This essentially
restarts all communication with the VIOS to allow for a fresh login
attempt that will be unhindered by any previous failed attempts.

Fixes: dff515a3e71d ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
---
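Note for readers: the INVALID_STATE scenario in the commit message can be
modelled in user space. The following is a minimal sketch, not driver code.
Every name in it (fake_vios, fake_send_login, fake_partial_reset,
model_login, max_retries) is hypothetical, and the VIOS behaviour is an
assumption reduced to a single flag. It only illustrates why retrying
without re-registering the CRQ cannot succeed once the VIOS holds stale
login state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return codes for the model. INVALID_STATE (-7) is the value quoted in
 * the commit message; everything else here is illustrative only.
 */
enum login_rc {
	LOGIN_OK = 0,
	LOGIN_INVALID_STATE = -7
};

/* Hypothetical stand-in for the VIOS side of the connection. */
struct fake_vios {
	bool thinks_logged_in;	/* stale state left by an earlier attempt */
	int crq_registrations;	/* how many times a CRQ has been registered */
};

/* Model of sending a login request and reading the response. */
static enum login_rc fake_send_login(struct fake_vios *vios)
{
	if (vios->thinks_logged_in)
		return LOGIN_INVALID_STATE;
	vios->thinks_logged_in = true;
	return LOGIN_OK;
}

/* Model of the partial reset: the old queues are dropped and new ones are
 * registered. In this sketch, re-registering is what clears the stale
 * login state (an assumption taken from the commit message).
 */
static void fake_partial_reset(struct fake_vios *vios)
{
	vios->thinks_logged_in = false;
	vios->crq_registrations++;
}

/* Retry loop. Returns 0 on success, -1 once the retries are exhausted.
 * do_partial_reset selects between the old behaviour (plain retry) and
 * the behaviour described in this patch.
 */
static int model_login(struct fake_vios *vios, bool do_partial_reset,
		       int max_retries)
{
	int attempt;

	assert(vios != NULL);

	for (attempt = 0; attempt <= max_retries; attempt++) {
		if (fake_send_login(vios) == LOGIN_OK)
			return 0;
		if (do_partial_reset)
			fake_partial_reset(vios);
	}
	return -1;
}
```

Calling model_login() on a fake_vios whose thinks_logged_in flag is already
set fails on every attempt when do_partial_reset is false, and succeeds on
the second attempt when it is true.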
 drivers/net/ethernet/ibm/ibmvnic.c | 46 ++++++++++++++++++++++++++----
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 718af76fd711..8fd9639665a0 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -97,6 +97,8 @@ static int pending_scrq(struct ibmvnic_adapter *,
 static union sub_crq *ibmvnic_next_scrq(struct ibmvnic_adapter *,
 					struct ibmvnic_sub_crq_queue *);
 static int ibmvnic_poll(struct napi_struct *napi, int data);
+static int reset_sub_crq_queues(struct ibmvnic_adapter *adapter);
+static inline void reinit_init_done(struct ibmvnic_adapter *adapter);
 static void send_query_map(struct ibmvnic_adapter *adapter);
 static int send_request_map(struct ibmvnic_adapter *, dma_addr_t, u32, u8);
 static int send_request_unmap(struct ibmvnic_adapter *, u8);
@@ -1527,11 +1529,9 @@ static int ibmvnic_login(struct net_device *netdev)
 
 		if (!wait_for_completion_timeout(&adapter->init_done,
 						 timeout)) {
-			netdev_warn(netdev, "Login timed out, retrying...\n");
-			retry = true;
-			adapter->init_done_rc = 0;
-			retry_count++;
-			continue;
+			netdev_warn(netdev, "Login timed out\n");
+			adapter->login_pending = false;
+			goto partial_reset;
 		}
 
 		if (adapter->init_done_rc == ABORTED) {
@@ -1576,7 +1576,41 @@ static int ibmvnic_login(struct net_device *netdev)
 		} else if (adapter->init_done_rc) {
 			netdev_warn(netdev, "Adapter login failed, init_done_rc = %d\n",
 				    adapter->init_done_rc);
-			return -EIO;
+
+partial_reset:
+			/* adapter login failed, so free any CRQs or sub-CRQs
+			 * and register again before attempting to login again.
+			 * If we don't do this then the VIOS may think that
+			 * we are already logged in and reject any subsequent
+			 * attempts
+			 */
+			netdev_warn(netdev,
+				    "Freeing and re-registering CRQs before attempting to login again\n");
+			retry = true;
+			adapter->init_done_rc = 0;
+			retry_count++;
+			release_sub_crqs(adapter, true);
+			reinit_init_done(adapter);
+			release_crq_queue(adapter);
+			/* If we don't sleep here then we risk an unnecessary
+			 * failover event from the VIOS. This is a known VIOS
+			 * issue caused by a vnic device freeing and registering
+			 * a CRQ too quickly.
+			 */
+			msleep(1500);
+			rc = init_crq_queue(adapter);
+			if (rc) {
+				netdev_err(netdev, "login recovery: init CRQ failed %d\n",
+					   rc);
+				return -EIO;
+			}
+
+			rc = ibmvnic_reset_init(adapter, false);
+			if (rc) {
+				netdev_err(netdev, "login recovery: Reset init failed %d\n",
+					   rc);
+				return -EIO;
+			}
 		}
 	} while (retry);
 
--
2.39.3