public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] ibmvfc: Move login error handling
@ 2021-05-11 18:12 Brian King
  2021-05-11 18:12 ` [PATCH 1/3] ibmvfc: Handle move login failure Brian King
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Brian King @ 2021-05-11 18:12 UTC (permalink / raw)
  To: james.bottomley; +Cc: martin.petersen, linux-scsi, tyreld, Brian King

This set of patches makes the move login handling in the
ibmvfc driver more robust and resolves some issues seen
in our SAN interop testing.

Brian King (3):
  ibmvfc: Handle move login failure
  ibmvfc: Avoid move login if fast fail is enabled
  ibmvfc: Reinit target retries

 drivers/scsi/ibmvscsi/ibmvfc.c | 60 +++++++++++++++++++++++++++++-------------
 drivers/scsi/ibmvscsi/ibmvfc.h |  3 ++-
 2 files changed, 43 insertions(+), 20 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/3] ibmvfc: Handle move login failure
  2021-05-11 18:12 [PATCH 0/3] ibmvfc: Move login error handling Brian King
@ 2021-05-11 18:12 ` Brian King
  2021-05-11 18:12 ` [PATCH 2/3] ibmvfc: Avoid move login if fast fail is enabled Brian King
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Brian King @ 2021-05-11 18:12 UTC (permalink / raw)
  To: james.bottomley; +Cc: martin.petersen, linux-scsi, tyreld, Brian King

When service is being performed on an SVC with NPIV enabled, the WWPN of
the canister / node being serviced fails over to the another canister /
node. This looks to the ibmvfc driver as a WWPN moving from one SCSI ID
to another. The driver will first attempt to do an implicit logout of
the old SCSI ID. If this works, we simply delete the rport at the old
location and add an rport at the new location and the FC transport class
handles everything. However, if there is I/O outstanding, this implicit
logout will fail, in which case we will send a "move login" request to
the VIOS. This will cancel any outstanding I/O to that port, logout the
port, and PLOGI the new port. Recently we've encountered a scenario
where the move login fails. This was resulting in an attempted plogi
to the new scsi id, without the old scsi id getting logged out, which
is a VIOS protocol violation. To solve this, we want to keep tracking
the old scsi id as the current scsi id. That way, once
terminate_rport_io cancels the outstanding i/o, it will send us back
through to do an implicit logout of the old scsi id, rather than the
new scsi id, and then we can plogi the new scsi id.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 16 ++++++++--------
 drivers/scsi/ibmvscsi/ibmvfc.h |  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 6540d48..4ac5bff 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -4299,9 +4299,10 @@ static void ibmvfc_tgt_move_login_done(struct ibmvfc_event *evt)
 	ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_NONE);
 	switch (status) {
 	case IBMVFC_MAD_SUCCESS:
-		tgt_dbg(tgt, "Move Login succeeded for old scsi_id: %llX\n", tgt->old_scsi_id);
+		tgt_dbg(tgt, "Move Login succeeded for new scsi_id: %llX\n", tgt->new_scsi_id);
 		tgt->ids.node_name = wwn_to_u64(rsp->service_parms.node_name);
 		tgt->ids.port_name = wwn_to_u64(rsp->service_parms.port_name);
+		tgt->scsi_id = tgt->new_scsi_id;
 		tgt->ids.port_id = tgt->scsi_id;
 		memcpy(&tgt->service_parms, &rsp->service_parms,
 		       sizeof(tgt->service_parms));
@@ -4319,8 +4320,8 @@ static void ibmvfc_tgt_move_login_done(struct ibmvfc_event *evt)
 		level += ibmvfc_retry_tgt_init(tgt, ibmvfc_tgt_move_login);
 
 		tgt_log(tgt, level,
-			"Move Login failed: old scsi_id: %llX, flags:%x, vios_flags:%x, rc=0x%02X\n",
-			tgt->old_scsi_id, be32_to_cpu(rsp->flags), be16_to_cpu(rsp->vios_flags),
+			"Move Login failed: new scsi_id: %llX, flags:%x, vios_flags:%x, rc=0x%02X\n",
+			tgt->new_scsi_id, be32_to_cpu(rsp->flags), be16_to_cpu(rsp->vios_flags),
 			status);
 		break;
 	}
@@ -4357,8 +4358,8 @@ static void ibmvfc_tgt_move_login(struct ibmvfc_target *tgt)
 	move->common.opcode = cpu_to_be32(IBMVFC_MOVE_LOGIN);
 	move->common.length = cpu_to_be16(sizeof(*move));
 
-	move->old_scsi_id = cpu_to_be64(tgt->old_scsi_id);
-	move->new_scsi_id = cpu_to_be64(tgt->scsi_id);
+	move->old_scsi_id = cpu_to_be64(tgt->scsi_id);
+	move->new_scsi_id = cpu_to_be64(tgt->new_scsi_id);
 	move->wwpn = cpu_to_be64(tgt->wwpn);
 	move->node_name = cpu_to_be64(tgt->ids.node_name);
 
@@ -4367,7 +4368,7 @@ static void ibmvfc_tgt_move_login(struct ibmvfc_target *tgt)
 		ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_DEL_RPORT);
 		kref_put(&tgt->kref, ibmvfc_release_tgt);
 	} else
-		tgt_dbg(tgt, "Sent Move Login for old scsi_id: %llX\n", tgt->old_scsi_id);
+		tgt_dbg(tgt, "Sent Move Login for new scsi_id: %llX\n", tgt->new_scsi_id);
 }
 
 /**
@@ -4737,8 +4738,7 @@ static int ibmvfc_alloc_target(struct ibmvfc_host *vhost,
 			 * normal ibmvfc_set_tgt_action to set this, as we
 			 * don't normally want to allow this state change.
 			 */
-			wtgt->old_scsi_id = wtgt->scsi_id;
-			wtgt->scsi_id = scsi_id;
+			wtgt->new_scsi_id = scsi_id;
 			wtgt->action = IBMVFC_TGT_ACTION_INIT;
 			ibmvfc_init_tgt(wtgt, ibmvfc_tgt_move_login);
 			goto unlock_out;
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 19dcec3..4601bd2 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -718,7 +718,7 @@ struct ibmvfc_target {
 	struct ibmvfc_host *vhost;
 	u64 scsi_id;
 	u64 wwpn;
-	u64 old_scsi_id;
+	u64 new_scsi_id;
 	struct fc_rport *rport;
 	int target_id;
 	enum ibmvfc_target_action action;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/3] ibmvfc: Avoid move login if fast fail is enabled
  2021-05-11 18:12 [PATCH 0/3] ibmvfc: Move login error handling Brian King
  2021-05-11 18:12 ` [PATCH 1/3] ibmvfc: Handle move login failure Brian King
@ 2021-05-11 18:12 ` Brian King
  2021-05-11 18:12 ` [PATCH 3/3] ibmvfc: Reinit target retries Brian King
  2021-05-15  3:02 ` [PATCH 0/3] ibmvfc: Move login error handling Martin K. Petersen
  3 siblings, 0 replies; 5+ messages in thread
From: Brian King @ 2021-05-11 18:12 UTC (permalink / raw)
  To: james.bottomley; +Cc: martin.petersen, linux-scsi, tyreld, Brian King

If fast fail is enabled and we encounter a WWPN moving from one port id
to another port id with I/O outstanding, if we use the move login MAD,
although it will work, it will leave any outstanding I/O still outstanding
to the old port id. Eventually, the SCSI command timers will fire and
we will abort these commands, however, this is generally much longer than
the fast fail timeout, which can lead to I/O operations being outstanding
for a long time. This patch changes the behavior to avoid the move login
if fast fail is enabled. Once terminate_rport_io cleans up the rport, then
we force the target back through the delete process, which re-drives the
implicit logout, then kicks us back into discovery where we will discover
the WWPN at the new location and do a PLOGI to it.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 39 ++++++++++++++++++++++++++++-----------
 drivers/scsi/ibmvscsi/ibmvfc.h |  1 +
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 4ac5bff..c8d3fdf 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -4728,19 +4728,24 @@ static int ibmvfc_alloc_target(struct ibmvfc_host *vhost,
 		 * and it failed for some reason, such as there being I/O
 		 * pending to the target. In this case, we will have already
 		 * deleted the rport from the FC transport so we do a move
-		 * login, which works even with I/O pending, as it will cancel
-		 * any active commands.
+		 * login, which works even with I/O pending, however, if
+		 * there is still I/O pending, it will stay outstanding, so
+		 * we only do this if fast fail is disabled for the rport,
+		 * otherwise we let terminate_rport_io clean up the port
+		 * before we login at the new location.
 		 */
 		if (wtgt->action == IBMVFC_TGT_ACTION_LOGOUT_DELETED_RPORT) {
-			/*
-			 * Do a move login here. The old target is no longer
-			 * known to the transport layer We don't use the
-			 * normal ibmvfc_set_tgt_action to set this, as we
-			 * don't normally want to allow this state change.
-			 */
-			wtgt->new_scsi_id = scsi_id;
-			wtgt->action = IBMVFC_TGT_ACTION_INIT;
-			ibmvfc_init_tgt(wtgt, ibmvfc_tgt_move_login);
+			if (wtgt->move_login) {
+				/*
+				 * Do a move login here. The old target is no longer
+				 * known to the transport layer We don't use the
+				 * normal ibmvfc_set_tgt_action to set this, as we
+				 * don't normally want to allow this state change.
+				 */
+				wtgt->new_scsi_id = scsi_id;
+				wtgt->action = IBMVFC_TGT_ACTION_INIT;
+				ibmvfc_init_tgt(wtgt, ibmvfc_tgt_move_login);
+			}
 			goto unlock_out;
 		} else {
 			tgt_err(wtgt, "Unexpected target state: %d, %p\n",
@@ -5486,6 +5491,18 @@ static void ibmvfc_do_work(struct ibmvfc_host *vhost)
 				rport = tgt->rport;
 				tgt->rport = NULL;
 				ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_LOGOUT_DELETED_RPORT);
+
+				/*
+				 * If fast fail is enabled, we wait for it to fire and then clean up
+				 * the old port, since we expect the fast fail timer to clean up the
+				 * outstanding I/O faster than waiting for normal command timeouts.
+				 * However, if fast fail is disabled, any I/O outstanding to the
+				 * rport LUNs will stay outstanding indefinitely, since the EH handlers
+				 * won't get invoked for I/O's timing out. If this is a NPIV failover
+				 * scenario, the better alternative is to use the move login.
+				 */
+				if (rport && rport->fast_io_fail_tmo == -1)
+					tgt->move_login = 1;
 				spin_unlock_irqrestore(vhost->host->host_lock, flags);
 				if (rport)
 					fc_remote_port_delete(rport);
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 4601bd2..4f0f3ba 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -726,6 +726,7 @@ struct ibmvfc_target {
 	int add_rport;
 	int init_retries;
 	int logo_rcvd;
+	int move_login;
 	u32 cancel_key;
 	struct ibmvfc_service_parms service_parms;
 	struct ibmvfc_service_parms service_parms_change;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/3] ibmvfc: Reinit target retries
  2021-05-11 18:12 [PATCH 0/3] ibmvfc: Move login error handling Brian King
  2021-05-11 18:12 ` [PATCH 1/3] ibmvfc: Handle move login failure Brian King
  2021-05-11 18:12 ` [PATCH 2/3] ibmvfc: Avoid move login if fast fail is enabled Brian King
@ 2021-05-11 18:12 ` Brian King
  2021-05-15  3:02 ` [PATCH 0/3] ibmvfc: Move login error handling Martin K. Petersen
  3 siblings, 0 replies; 5+ messages in thread
From: Brian King @ 2021-05-11 18:12 UTC (permalink / raw)
  To: james.bottomley; +Cc: martin.petersen, linux-scsi, tyreld, Brian King

If rport target discovery commands fail for some reason, they get retried
up to a set number of retries. Once the retry limit is exceeded, the
target is deleted. In order to delete the target, we either need to
do an implicit logout or a move login. In the move login case, if the
move login fails, we want to retry it. This ensures the retry counter
gets reinitialized so the move login will get retried.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index c8d3fdf..a251dbf 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -654,8 +654,10 @@ static void ibmvfc_reinit_host(struct ibmvfc_host *vhost)
  **/
 static void ibmvfc_del_tgt(struct ibmvfc_target *tgt)
 {
-	if (!ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_LOGOUT_RPORT))
+	if (!ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_LOGOUT_RPORT)) {
 		tgt->job_step = ibmvfc_tgt_implicit_logout_and_del;
+		tgt->init_retries = 0;
+	}
 	wake_up(&tgt->vhost->work_wait_q);
 }
 
@@ -4744,6 +4746,7 @@ static int ibmvfc_alloc_target(struct ibmvfc_host *vhost,
 				 */
 				wtgt->new_scsi_id = scsi_id;
 				wtgt->action = IBMVFC_TGT_ACTION_INIT;
+				wtgt->init_retries = 0;
 				ibmvfc_init_tgt(wtgt, ibmvfc_tgt_move_login);
 			}
 			goto unlock_out;
@@ -5336,6 +5339,7 @@ static void ibmvfc_tgt_add_rport(struct ibmvfc_target *tgt)
 		tgt_dbg(tgt, "Deleting rport with outstanding I/O\n");
 		ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_LOGOUT_DELETED_RPORT);
 		tgt->rport = NULL;
+		tgt->init_retries = 0;
 		spin_unlock_irqrestore(vhost->host->host_lock, flags);
 		fc_remote_port_delete(rport);
 		return;
@@ -5490,6 +5494,7 @@ static void ibmvfc_do_work(struct ibmvfc_host *vhost)
 				tgt_dbg(tgt, "Deleting rport with I/O outstanding\n");
 				rport = tgt->rport;
 				tgt->rport = NULL;
+				tgt->init_retries = 0;
 				ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_LOGOUT_DELETED_RPORT);
 
 				/*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/3] ibmvfc: Move login error handling
  2021-05-11 18:12 [PATCH 0/3] ibmvfc: Move login error handling Brian King
                   ` (2 preceding siblings ...)
  2021-05-11 18:12 ` [PATCH 3/3] ibmvfc: Reinit target retries Brian King
@ 2021-05-15  3:02 ` Martin K. Petersen
  3 siblings, 0 replies; 5+ messages in thread
From: Martin K. Petersen @ 2021-05-15  3:02 UTC (permalink / raw)
  To: Brian King; +Cc: james.bottomley, martin.petersen, linux-scsi, tyreld


Brian,

> This set of patches makes the move login handling in the ibmvfc driver
> more robust and resolves some issues seen in our SAN interop testing.

Applied to 5.14/scsi-staging, thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-05-15  3:03 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-05-11 18:12 [PATCH 0/3] ibmvfc: Move login error handling Brian King
2021-05-11 18:12 ` [PATCH 1/3] ibmvfc: Handle move login failure Brian King
2021-05-11 18:12 ` [PATCH 2/3] ibmvfc: Avoid move login if fast fail is enabled Brian King
2021-05-11 18:12 ` [PATCH 3/3] ibmvfc: Reinit target retries Brian King
2021-05-15  3:02 ` [PATCH 0/3] ibmvfc: Move login error handling Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox