[PATCHv500/15] scsi: EH rework prep patches, part 2

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCHv500/15] scsi: EH rework prep patches, part 2
@ 2023-10-02 15:49 Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 01/15] zfcp: do not wait for rports to become unblocked after host reset Hannes Reinecke
                   ` (14 more replies)
  0 siblings, 15 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke

Hi all,

(taking up an old thread:)
here's the second batch of patches for my EH rework.
It modifies the reset callbacks for SCSI drivers
such that the final conversion to drop the 'struct scsi_cmnd'
argument and use the entity in question (host, bus, target, device)
as the argument to the SCSI EH callbacks becomes possible.
The second part covers drivers which require a bit more love.
In particular the fnic, snic, and csiostor drivers require a tag to
send TMFs. So to handle that I've modified the patches to call
scsi_alloc_request() with the NOWAIT flag; that will return a scsi
command with a valid tag. It might fail (eg when the tagset is full),
but in these cases it might be better to fall back to host_reset anyway.

As usual, comments and reviews are welcome.

Changes to v4:
- rework snic to use a dedicated tag for host reset
- rework snic to allocate TMFs on the fly
- rework fnic to allocate TMFs on the fly
- rework csiostor to allocat TMFs on the fly
- drop fc_block_rport() from zfcp host_reset
- Rebase to latest linus tree

Changes to v3:
- Move fnic and snic patches to the next patchset
- Include reviews from Ewan Milne

Changes to v2:
- Include reviews from John Garry
- move mpi3mr, zfcp, sym53c8xx_2, and qla1280 patches to the
  next patchset

Changes to the initial version:
- Include reviews from Christoph
- Fixup build robot issues

Hannes Reinecke (15):
  zfcp: do not wait for rports to become unblocked after host reset
  bfa: Do not use scsi command to signal TMF status
  aha152x: look for stuck command when resetting device
  a1000u2w: do not rely on the command for inia100_device_reset()
  fas216: Rework device reset to not rely on SCSI command pointer
  xen-scsifront: add scsi device as argument to scsifront_do_request()
  xen-scsifront: rework scsifront_action_handler()
  libiscsi: use cls_session as argument for target and session reset
  scsi_transport_iscsi: use session as argument for
    iscsi_block_scsi_eh()
  snic: reserve tag for TMF
  snic: allocate device reset command
  snic: Use scsi_host_busy_iter() to traverse commands
  fnic: allocate device reset command on the fly
  fnic: use fc_block_rport() correctly
  csiostor: use separate TMF command

 drivers/s390/scsi/zfcp_scsi.c       |   4 -
 drivers/scsi/a100u2w.c              |  43 +---
 drivers/scsi/aha152x.c              |  26 ++-
 drivers/scsi/arm/fas216.c           |  39 ++--
 drivers/scsi/be2iscsi/be_main.c     |  10 +-
 drivers/scsi/bfa/bfad_im.c          | 112 +++++-----
 drivers/scsi/bfa/bfad_im.h          |   2 +
 drivers/scsi/csiostor/csio_scsi.c   |  72 ++++---
 drivers/scsi/fnic/fnic.h            |   1 -
 drivers/scsi/fnic/fnic_scsi.c       | 115 +++++-----
 drivers/scsi/libiscsi.c             |  21 +-
 drivers/scsi/qla4xxx/ql4_os.c       |  34 +--
 drivers/scsi/scsi_transport_iscsi.c |   6 +-
 drivers/scsi/snic/snic.h            |   2 +-
 drivers/scsi/snic/snic_main.c       |   5 +-
 drivers/scsi/snic/snic_scsi.c       | 319 ++++++++++++----------------
 drivers/scsi/xen-scsifront.c        |  49 +++--
 include/scsi/libiscsi.h             |   2 +-
 include/scsi/scsi_transport_iscsi.h |   2 +-
 19 files changed, 414 insertions(+), 450 deletions(-)

-- 
2.35.3


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 01/15] zfcp: do not wait for rports to become unblocked after host reset
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-12 13:54   ` Benjamin Block
  2023-10-02 15:49 ` [PATCH 02/15] bfa: Do not use scsi command to signal TMF status Hannes Reinecke
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke,
	Benjamin Block

zfcp_scsi_eh_host_reset_handler() would call fc_block_rport() to
wait for all rports to become unblocked after host reset.
But after host reset it might happen that the port is gone, hence
fc_block_rport() might fail due to a missing port.
But that's a perfectly legal operation; on FC remote ports might
come and go.
In the same vein FC HBAs are able to deal with ports being temporarily
blocked, so really there is not point in waiting for all ports
to become unblocked during host reset.
Hence remove the call to fc_block_rport() in host reset.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Steffen Maier <maier@linux.ibm.com
Cc: Benjamin Block <bblock@linux.ibm.com>
---
 drivers/s390/scsi/zfcp_scsi.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index b2a8cd792266..14f929cca271 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -383,10 +383,6 @@ static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
 	}
 	zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");
 	zfcp_erp_wait(adapter);
-	fc_ret = fc_block_scsi_eh(scpnt);
-	if (fc_ret)
-		ret = fc_ret;
-
 	zfcp_dbf_scsi_eh("schrh_r", adapter, ~0, ret);
 	return ret;
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 02/15] bfa: Do not use scsi command to signal TMF status
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 01/15] zfcp: do not wait for rports to become unblocked after host reset Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 03/15] aha152x: look for stuck command when resetting device Hannes Reinecke
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke

The bfa driver hijacks the scsi command to signal the TMF status,
which will no longer work if the TMF handler will be converted.
So rework TMF handling to not use a scsi command but rather add
new TMF fields to the per-device structure.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/bfa/bfad_im.c | 112 ++++++++++++++++++++-----------------
 drivers/scsi/bfa/bfad_im.h |   2 +
 2 files changed, 63 insertions(+), 51 deletions(-)

diff --git a/drivers/scsi/bfa/bfad_im.c b/drivers/scsi/bfa/bfad_im.c
index a9d3d8562d3c..ef352fc59458 100644
--- a/drivers/scsi/bfa/bfad_im.c
+++ b/drivers/scsi/bfa/bfad_im.c
@@ -147,13 +147,13 @@ void
 bfa_cb_tskim_done(void *bfad, struct bfad_tskim_s *dtsk,
 		   enum bfi_tskim_status tsk_status)
 {
-	struct scsi_cmnd *cmnd = (struct scsi_cmnd *)dtsk;
+	struct bfad_itnim_data_s *itnim_data =
+		(struct bfad_itnim_data_s *)dtsk;
 	wait_queue_head_t *wq;
 
-	bfad_priv(cmnd)->status |= tsk_status << 1;
-	set_bit(IO_DONE_BIT, &bfad_priv(cmnd)->status);
-	wq = bfad_priv(cmnd)->wq;
-	bfad_priv(cmnd)->wq = NULL;
+	itnim_data->tmf_status |= tsk_status << 1;
+	set_bit(IO_DONE_BIT, &itnim_data->tmf_status);
+	wq = itnim_data->tmf_wq;
 
 	if (wq)
 		wake_up(wq);
@@ -238,15 +238,16 @@ bfad_im_abort_handler(struct scsi_cmnd *cmnd)
 }
 
 static bfa_status_t
-bfad_im_target_reset_send(struct bfad_s *bfad, struct scsi_cmnd *cmnd,
-		     struct bfad_itnim_s *itnim)
+bfad_im_target_reset_send(struct bfad_s *bfad,
+			  struct bfad_itnim_data_s *itnim_data)
 {
+	struct bfad_itnim_s *itnim = itnim_data->itnim;
 	struct bfa_tskim_s *tskim;
 	struct bfa_itnim_s *bfa_itnim;
 	bfa_status_t    rc = BFA_STATUS_OK;
 	struct scsi_lun scsilun;
 
-	tskim = bfa_tskim_alloc(&bfad->bfa, (struct bfad_tskim_s *) cmnd);
+	tskim = bfa_tskim_alloc(&bfad->bfa, (struct bfad_tskim_s *) itnim_data);
 	if (!tskim) {
 		BFA_LOG(KERN_ERR, bfad, bfa_log_level,
 			"target reset, fail to allocate tskim\n");
@@ -254,12 +255,6 @@ bfad_im_target_reset_send(struct bfad_s *bfad, struct scsi_cmnd *cmnd,
 		goto out;
 	}
 
-	/*
-	 * Set host_scribble to NULL to avoid aborting a task command if
-	 * happens.
-	 */
-	cmnd->host_scribble = NULL;
-	bfad_priv(cmnd)->status = 0;
 	bfa_itnim = bfa_fcs_itnim_get_halitn(&itnim->fcs_itnim);
 	/*
 	 * bfa_itnim can be NULL if the port gets disconnected and the bfa
@@ -290,10 +285,11 @@ bfad_im_target_reset_send(struct bfad_s *bfad, struct scsi_cmnd *cmnd,
 static int
 bfad_im_reset_lun_handler(struct scsi_cmnd *cmnd)
 {
-	struct Scsi_Host *shost = cmnd->device->host;
+	struct scsi_device *sdev = cmnd->device;
+	struct Scsi_Host *shost = sdev->host;
 	struct bfad_im_port_s *im_port =
 			(struct bfad_im_port_s *) shost->hostdata[0];
-	struct bfad_itnim_data_s *itnim_data = cmnd->device->hostdata;
+	struct bfad_itnim_data_s *itnim_data = sdev->hostdata;
 	struct bfad_s         *bfad = im_port->bfad;
 	struct bfa_tskim_s *tskim;
 	struct bfad_itnim_s   *itnim;
@@ -305,14 +301,20 @@ bfad_im_reset_lun_handler(struct scsi_cmnd *cmnd)
 	struct scsi_lun scsilun;
 
 	spin_lock_irqsave(&bfad->bfad_lock, flags);
+	if (itnim_data->tmf_wq) {
+		BFA_LOG(KERN_ERR, bfad, bfa_log_level,
+			"LUN reset, TMF already active");
+		spin_unlock_irqrestore(&bfad->bfad_lock, flags);
+		rc = FAILED;
+		goto out;
+	}
 	itnim = itnim_data->itnim;
 	if (!itnim) {
 		spin_unlock_irqrestore(&bfad->bfad_lock, flags);
 		rc = FAILED;
 		goto out;
 	}
-
-	tskim = bfa_tskim_alloc(&bfad->bfa, (struct bfad_tskim_s *) cmnd);
+	tskim = bfa_tskim_alloc(&bfad->bfa, (struct bfad_tskim_s *) itnim_data);
 	if (!tskim) {
 		BFA_LOG(KERN_ERR, bfad, bfa_log_level,
 				"LUN reset, fail to allocate tskim");
@@ -321,13 +323,8 @@ bfad_im_reset_lun_handler(struct scsi_cmnd *cmnd)
 		goto out;
 	}
 
-	/*
-	 * Set host_scribble to NULL to avoid aborting a task command
-	 * if happens.
-	 */
-	cmnd->host_scribble = NULL;
-	bfad_priv(cmnd)->wq = &wq;
-	bfad_priv(cmnd)->status = 0;
+	itnim_data->tmf_wq = &wq;
+	itnim_data->tmf_status = 0;
 	bfa_itnim = bfa_fcs_itnim_get_halitn(&itnim->fcs_itnim);
 	/*
 	 * bfa_itnim can be NULL if the port gets disconnected and the bfa
@@ -342,14 +339,15 @@ bfad_im_reset_lun_handler(struct scsi_cmnd *cmnd)
 		rc = FAILED;
 		goto out;
 	}
-	int_to_scsilun(cmnd->device->lun, &scsilun);
+	int_to_scsilun(sdev->lun, &scsilun);
 	bfa_tskim_start(tskim, bfa_itnim, scsilun,
 			    FCP_TM_LUN_RESET, BFAD_LUN_RESET_TMO);
 	spin_unlock_irqrestore(&bfad->bfad_lock, flags);
 
-	wait_event(wq, test_bit(IO_DONE_BIT, &bfad_priv(cmnd)->status));
+	wait_event(wq, test_bit(IO_DONE_BIT, &itnim_data->tmf_status));
 
-	task_status = bfad_priv(cmnd)->status >> 1;
+	itnim_data->tmf_wq = NULL;
+	task_status = itnim_data->tmf_status >> 1;
 	if (task_status != BFI_TSKIM_STS_OK) {
 		BFA_LOG(KERN_ERR, bfad, bfa_log_level,
 			"LUN reset failure, status: %d\n", task_status);
@@ -368,35 +366,47 @@ bfad_im_reset_target_handler(struct scsi_cmnd *cmnd)
 {
 	struct Scsi_Host *shost = cmnd->device->host;
 	struct scsi_target *starget = scsi_target(cmnd->device);
+	struct fc_rport *rport = starget_to_rport(starget);
 	struct bfad_im_port_s *im_port =
 				(struct bfad_im_port_s *) shost->hostdata[0];
-	struct bfad_s         *bfad = im_port->bfad;
-	struct bfad_itnim_s   *itnim;
-	unsigned long   flags;
-	u32        rc, rtn = FAILED;
+	struct bfad_s *bfad = im_port->bfad;
+	struct bfad_itnim_data_s *itnim_data;
+	struct bfad_itnim_s *itnim;
+	unsigned long flags;
+	u32 rc, rtn = FAILED;
 	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	enum bfi_tskim_status task_status;
 
 	spin_lock_irqsave(&bfad->bfad_lock, flags);
-	itnim = bfad_get_itnim(im_port, starget->id);
-	if (itnim) {
-		bfad_priv(cmnd)->wq = &wq;
-		rc = bfad_im_target_reset_send(bfad, cmnd, itnim);
-		if (rc == BFA_STATUS_OK) {
-			/* wait target reset to complete */
-			spin_unlock_irqrestore(&bfad->bfad_lock, flags);
-			wait_event(wq, test_bit(IO_DONE_BIT,
-						&bfad_priv(cmnd)->status));
-			spin_lock_irqsave(&bfad->bfad_lock, flags);
-
-			task_status = bfad_priv(cmnd)->status >> 1;
-			if (task_status != BFI_TSKIM_STS_OK)
-				BFA_LOG(KERN_ERR, bfad, bfa_log_level,
-					"target reset failure,"
-					" status: %d\n", task_status);
-			else
-				rtn = SUCCESS;
-		}
+	if (!rport->dd_data) {
+		spin_unlock_irqrestore(&bfad->bfad_lock, flags);
+		return rtn;
+	}
+	itnim_data = rport->dd_data;
+	if (itnim_data->tmf_wq) {
+		BFA_LOG(KERN_ERR, bfad, bfa_log_level,
+			"target reset failed, TMF already active");
+		spin_unlock_irqrestore(&bfad->bfad_lock, flags);
+		return rtn;
+	}
+	itnim = itnim_data->itnim;
+	itnim_data->tmf_wq = &wq;
+	itnim_data->tmf_status = 0;
+	rc = bfad_im_target_reset_send(bfad, itnim_data);
+	if (rc == BFA_STATUS_OK) {
+		/* wait target reset to complete */
+		spin_unlock_irqrestore(&bfad->bfad_lock, flags);
+		wait_event(wq, test_bit(IO_DONE_BIT,
+					&itnim_data->tmf_status));
+		spin_lock_irqsave(&bfad->bfad_lock, flags);
+
+		task_status = itnim_data->tmf_status >> 1;
+		if (task_status != BFI_TSKIM_STS_OK)
+			BFA_LOG(KERN_ERR, bfad, bfa_log_level,
+				"target reset failure,"
+				" status: %d\n", task_status);
+		else
+			rtn = SUCCESS;
 	}
 	spin_unlock_irqrestore(&bfad->bfad_lock, flags);
 
diff --git a/drivers/scsi/bfa/bfad_im.h b/drivers/scsi/bfa/bfad_im.h
index 4353feedf76a..48e8c12969c9 100644
--- a/drivers/scsi/bfa/bfad_im.h
+++ b/drivers/scsi/bfa/bfad_im.h
@@ -61,6 +61,8 @@ static inline struct bfad_cmd_priv *bfad_priv(struct scsi_cmnd *cmd)
 
 struct bfad_itnim_data_s {
 	struct bfad_itnim_s *itnim;
+	wait_queue_head_t *tmf_wq;
+	unsigned long tmf_status;
 };
 
 struct bfad_im_port_s {
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 03/15] aha152x: look for stuck command when resetting device
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 01/15] zfcp: do not wait for rports to become unblocked after host reset Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 02/15] bfa: Do not use scsi command to signal TMF status Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 04/15] a1000u2w: do not rely on the command for inia100_device_reset() Hannes Reinecke
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke

The LLDD needs a command to send the reset with, so look at the
list of outstanding commands to get one.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/aha152x.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/aha152x.c b/drivers/scsi/aha152x.c
index 055adb349b0e..936c9f5c6f23 100644
--- a/drivers/scsi/aha152x.c
+++ b/drivers/scsi/aha152x.c
@@ -1070,24 +1070,28 @@ static int aha152x_abort(struct scsi_cmnd *SCpnt)
  */
 static int aha152x_device_reset(struct scsi_cmnd * SCpnt)
 {
-	struct Scsi_Host *shpnt = SCpnt->device->host;
+	struct scsi_device *sdev = SCpnt->device;
+	struct Scsi_Host *shpnt = sdev->host;
 	DECLARE_COMPLETION(done);
 	int ret, issued, disconnected;
-	unsigned char old_cmd_len = SCpnt->cmd_len;
+	unsigned char old_cmd_len;
 	unsigned long flags;
 	unsigned long timeleft;
 
-	if(CURRENT_SC==SCpnt) {
-		scmd_printk(KERN_ERR, SCpnt, "cannot reset current device\n");
-		return FAILED;
-	}
-
 	DO_LOCK(flags);
-	issued       = remove_SC(&ISSUE_SC, SCpnt) == NULL;
-	disconnected = issued && remove_SC(&DISCONNECTED_SC, SCpnt);
+	/* Look for the stuck command */
+	SCpnt = remove_lun_SC(&ISSUE_SC, sdev->id, sdev->lun);
+	if (SCpnt)
+		issued = 1;
+	else
+		SCpnt = remove_lun_SC(&DISCONNECTED_SC, sdev->id, sdev->lun);
+	if (!issued && SCpnt)
+		disconnected = 1;
 	DO_UNLOCK(flags);
-
-	SCpnt->cmd_len         = 0;
+	if (!SCpnt)
+		return FAILED;
+	old_cmd_len = SCpnt->cmd_len;
+	SCpnt->cmd_len = 0;
 
 	aha152x_internal_queue(SCpnt, &done, resetting);
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 04/15] a1000u2w: do not rely on the command for inia100_device_reset()
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
                   ` (2 preceding siblings ...)
  2023-10-02 15:49 ` [PATCH 03/15] aha152x: look for stuck command when resetting device Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 05/15] fas216: Rework device reset to not rely on SCSI command pointer Hannes Reinecke
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke

Use the scsi device as argument to orc_device_reset() instead
of relying on the passed in scsi command.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/a100u2w.c | 43 +++++++++++-------------------------------
 1 file changed, 11 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/a100u2w.c b/drivers/scsi/a100u2w.c
index b95147fb18b0..43b119add2b9 100644
--- a/drivers/scsi/a100u2w.c
+++ b/drivers/scsi/a100u2w.c
@@ -592,39 +592,20 @@ static int orc_reset_scsi_bus(struct orc_host * host)
  *	commands for target w/o soft reset
  */
 
-static int orc_device_reset(struct orc_host * host, struct scsi_cmnd *cmd, unsigned int target)
+static int orc_device_reset(struct orc_host * host, struct scsi_device *sdev)
 {				/* I need Host Control Block Information */
 	struct orc_scb *scb;
 	struct orc_extended_scb *escb;
-	struct orc_scb *host_scb;
-	u8 i;
 	unsigned long flags;
 
 	spin_lock_irqsave(&(host->allocation_lock), flags);
 	scb = (struct orc_scb *) NULL;
 	escb = (struct orc_extended_scb *) NULL;
 
-	/* setup scatter list address with one buffer */
-	host_scb = host->scb_virt;
-
 	/* FIXME: is this safe if we then fail to issue the reset or race
 	   a completion ? */
 	init_alloc_map(host);
 
-	/* Find the scb corresponding to the command */
-	for (i = 0; i < ORC_MAXQUEUE; i++) {
-		escb = host_scb->escb;
-		if (host_scb->status && escb->srb == cmd)
-			break;
-		host_scb++;
-	}
-
-	if (i == ORC_MAXQUEUE) {
-		printk(KERN_ERR "Unable to Reset - No SCB Found\n");
-		spin_unlock_irqrestore(&(host->allocation_lock), flags);
-		return FAILED;
-	}
-
 	/* Allocate a new SCB for the reset command to the firmware */
 	if ((scb = __orc_alloc_scb(host)) == NULL) {
 		/* Can't happen.. */
@@ -635,7 +616,7 @@ static int orc_device_reset(struct orc_host * host, struct scsi_cmnd *cmd, unsig
 	/* Reset device is handled by the firmware, we fill in an SCB and
 	   fire it at the controller, it does the rest */
 	scb->opcode = ORC_BUSDEVRST;
-	scb->target = target;
+	scb->target = sdev->id;
 	scb->hastat = 0;
 	scb->tastat = 0;
 	scb->status = 0x0;
@@ -645,8 +626,8 @@ static int orc_device_reset(struct orc_host * host, struct scsi_cmnd *cmd, unsig
 	scb->xferlen = cpu_to_le32(0);
 	scb->sg_len = cpu_to_le32(0);
 
+	escb = scb->escb;
 	escb->srb = NULL;
-	escb->srb = cmd;
 	orc_exec_scb(host, scb);	/* Start execute SCB            */
 	spin_unlock_irqrestore(&host->allocation_lock, flags);
 	return SUCCESS;
@@ -971,7 +952,7 @@ static int inia100_device_reset(struct scsi_cmnd * cmd)
 {				/* I need Host Control Block Information */
 	struct orc_host *host;
 	host = (struct orc_host *) cmd->device->host->hostdata;
-	return orc_device_reset(host, cmd, scmd_id(cmd));
+	return orc_device_reset(host, cmd->device);
 
 }
 
@@ -991,11 +972,7 @@ static void inia100_scb_handler(struct orc_host *host, struct orc_scb *scb)
 	struct orc_extended_scb *escb;
 
 	escb = scb->escb;
-	if ((cmd = (struct scsi_cmnd *) escb->srb) == NULL) {
-		printk(KERN_ERR "inia100_scb_handler: SRB pointer is empty\n");
-		orc_release_scb(host, scb);	/* Release SCB for current channel */
-		return;
-	}
+	cmd = (struct scsi_cmnd *)escb->srb;
 	escb->srb = NULL;
 
 	switch (scb->hastat) {
@@ -1033,13 +1010,15 @@ static void inia100_scb_handler(struct orc_host *host, struct orc_scb *scb)
 		break;
 	}
 
-	if (scb->tastat == 2) {	/* Check condition              */
+	if (cmd && scb->tastat == 2) {	/* Check condition              */
 		memcpy((unsigned char *) &cmd->sense_buffer[0],
 		   (unsigned char *) &escb->sglist[0], SENSE_SIZE);
 	}
-	cmd->result = scb->tastat | (scb->hastat << 16);
-	scsi_dma_unmap(cmd);
-	scsi_done(cmd);		/* Notify system DONE           */
+	if (cmd) {
+		cmd->result = scb->tastat | (scb->hastat << 16);
+		scsi_dma_unmap(cmd);
+		scsi_done(cmd);		/* Notify system DONE           */
+	}
 	orc_release_scb(host, scb);	/* Release SCB for current channel */
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 05/15] fas216: Rework device reset to not rely on SCSI command pointer
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
                   ` (3 preceding siblings ...)
  2023-10-02 15:49 ` [PATCH 04/15] a1000u2w: do not rely on the command for inia100_device_reset() Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 06/15] xen-scsifront: add scsi device as argument to scsifront_do_request() Hannes Reinecke
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke

The device reset code should not rely on the SCSI command pointer;
it will be going away with the device reset handler rework.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/arm/fas216.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/arm/fas216.c b/drivers/scsi/arm/fas216.c
index 4ce0b2d73614..e6289c6af5ef 100644
--- a/drivers/scsi/arm/fas216.c
+++ b/drivers/scsi/arm/fas216.c
@@ -1985,7 +1985,6 @@ static void fas216_devicereset_done(FAS216_Info *info, struct scsi_cmnd *SCpnt,
 {
 	fas216_log(info, LOG_ERROR, "fas216 device reset complete");
 
-	info->rstSCpnt = NULL;
 	info->rst_dev_status = 1;
 	wake_up(&info->eh_wait);
 }
@@ -2143,12 +2142,12 @@ static void fas216_done(FAS216_Info *info, unsigned int result)
 
 	fas216_checkmagic(info);
 
-	if (!info->SCpnt)
+	if (!info->SCpnt && info->rst_dev_status)
 		goto no_command;
 
 	SCpnt = info->SCpnt;
 	info->SCpnt = NULL;
-    	info->scsi.phase = PHASE_IDLE;
+	info->scsi.phase = PHASE_IDLE;
 
 	if (info->scsi.aborting) {
 		fas216_log(info, 0, "uncaught abort - returning DID_ABORT");
@@ -2160,7 +2159,7 @@ static void fas216_done(FAS216_Info *info, unsigned int result)
 	 * Sanity check the completion - if we have zero bytes left
 	 * to transfer, we should not have a valid pointer.
 	 */
-	if (info->scsi.SCp.ptr && info->scsi.SCp.this_residual == 0) {
+	if (SCpnt && info->scsi.SCp.ptr && info->scsi.SCp.this_residual == 0) {
 		scmd_printk(KERN_INFO, SCpnt,
 			    "zero bytes left to transfer, but buffer pointer still valid: ptr=%p len=%08x\n",
 			    info->scsi.SCp.ptr, info->scsi.SCp.this_residual);
@@ -2173,12 +2172,18 @@ static void fas216_done(FAS216_Info *info, unsigned int result)
 	 * the sense information, fas216_kick will re-assert the busy
 	 * status.
 	 */
-	info->device[SCpnt->device->id].parity_check = 0;
-	clear_bit(SCpnt->device->id * 8 +
-		  (u8)(SCpnt->device->lun & 0x7), info->busyluns);
-
-	fn = (void (*)(FAS216_Info *, struct scsi_cmnd *, unsigned int))SCpnt->host_scribble;
-	fn(info, SCpnt, result);
+	if (SCpnt) {
+		info->device[SCpnt->device->id].parity_check = 0;
+		clear_bit(SCpnt->device->id * 8 +
+			  (u8)(SCpnt->device->lun & 0x7), info->busyluns);
+	}
+	if (!info->rst_dev_status) {
+		info->rst_dev_status = 1;
+		wake_up(&info->eh_wait);
+	} else {
+		fn = (void (*)(FAS216_Info *, struct scsi_cmnd *, unsigned int))SCpnt->host_scribble;
+		fn(info, SCpnt, result);
+	}
 
 	if (info->scsi.irq) {
 		spin_lock_irqsave(&info->host_lock, flags);
@@ -2478,9 +2483,10 @@ int fas216_eh_abort(struct scsi_cmnd *SCpnt)
  */
 int fas216_eh_device_reset(struct scsi_cmnd *SCpnt)
 {
-	FAS216_Info *info = (FAS216_Info *)SCpnt->device->host->hostdata;
+	struct scsi_device *sdev = SCpnt->device;
+	FAS216_Info *info = (FAS216_Info *)sdev->host->hostdata;
 	unsigned long flags;
-	int i, res = FAILED, target = SCpnt->device->id;
+	int i, res = FAILED, target = sdev->id;
 
 	fas216_log(info, LOG_ERROR, "device reset for target %d", target);
 
@@ -2494,7 +2500,7 @@ int fas216_eh_device_reset(struct scsi_cmnd *SCpnt)
 		 * and we need a bus reset.
 		 */
 		if (info->SCpnt && !info->scsi.disconnectable &&
-		    info->SCpnt->device->id == SCpnt->device->id)
+		    info->SCpnt->device->id == sdev->id)
 			break;
 
 		/*
@@ -2512,14 +2518,7 @@ int fas216_eh_device_reset(struct scsi_cmnd *SCpnt)
 		for (i = 0; i < 8; i++)
 			clear_bit(target * 8 + i, info->busyluns);
 
-		/*
-		 * Hijack this SCSI command structure to send
-		 * a bus device reset message to this device.
-		 */
-		SCpnt->host_scribble = (void *)fas216_devicereset_done;
-
 		info->rst_dev_status = 0;
-		info->rstSCpnt = SCpnt;
 
 		if (info->scsi.phase == PHASE_IDLE)
 			fas216_kick(info);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 06/15] xen-scsifront: add scsi device as argument to scsifront_do_request()
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
                   ` (4 preceding siblings ...)
  2023-10-02 15:49 ` [PATCH 05/15] fas216: Rework device reset to not rely on SCSI command pointer Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 07/15] xen-scsifront: rework scsifront_action_handler() Hannes Reinecke
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke

Add scsi device as argument to scsifront_do_request() so that it
will be possible to call it with a NULL command pointer.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/xen-scsifront.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/xen-scsifront.c b/drivers/scsi/xen-scsifront.c
index 9ec55ddc1204..a0c13200d53a 100644
--- a/drivers/scsi/xen-scsifront.c
+++ b/drivers/scsi/xen-scsifront.c
@@ -178,7 +178,8 @@ static void scsifront_put_rqid(struct vscsifrnt_info *info, uint32_t id)
 		scsifront_wake_up(info);
 }
 
-static int scsifront_do_request(struct vscsifrnt_info *info,
+static int scsifront_do_request(struct scsi_device *sdev,
+				struct vscsifrnt_info *info,
 				struct vscsifrnt_shadow *shadow)
 {
 	struct vscsiif_front_ring *ring = &(info->ring);
@@ -205,17 +206,25 @@ static int scsifront_do_request(struct vscsifrnt_info *info,
 	ring_req->ref_rqid    = shadow->ref_rqid;
 	ring_req->nr_segments = shadow->nr_segments;
 
-	ring_req->id      = sc->device->id;
-	ring_req->lun     = sc->device->lun;
-	ring_req->channel = sc->device->channel;
-	ring_req->cmd_len = sc->cmd_len;
+	ring_req->id      = sdev->id;
+	ring_req->lun     = sdev->lun;
+	ring_req->channel = sdev->channel;
 
-	BUG_ON(sc->cmd_len > VSCSIIF_MAX_COMMAND_SIZE);
+	if (sc) {
+		ring_req->cmd_len = sc->cmd_len;
 
-	memcpy(ring_req->cmnd, sc->cmnd, sc->cmd_len);
+		BUG_ON(sc->cmd_len > VSCSIIF_MAX_COMMAND_SIZE);
 
-	ring_req->sc_data_direction   = (uint8_t)sc->sc_data_direction;
-	ring_req->timeout_per_command = scsi_cmd_to_rq(sc)->timeout / HZ;
+		memcpy(ring_req->cmnd, sc->cmnd, sc->cmd_len);
+
+		ring_req->sc_data_direction   = (uint8_t)sc->sc_data_direction;
+		ring_req->timeout_per_command =
+			scsi_cmd_to_rq(sc)->timeout / HZ;
+	} else {
+		ring_req->cmd_len = VSCSIIF_MAX_COMMAND_SIZE;
+		memset(ring_req->cmnd, 0, VSCSIIF_MAX_COMMAND_SIZE);
+		ring_req->sc_data_direction = DMA_NONE;
+	}
 
 	for (i = 0; i < (shadow->nr_segments & ~VSCSIIF_SG_GRANT); i++)
 		ring_req->seg[i] = shadow->seg[i];
@@ -637,7 +646,7 @@ static int scsifront_queuecommand(struct Scsi_Host *shost,
 		return 0;
 	}
 
-	if (scsifront_do_request(info, shadow)) {
+	if (scsifront_do_request(sc->device, info, shadow)) {
 		scsifront_gnttab_done(info, shadow);
 		goto busy;
 	}
@@ -685,7 +694,7 @@ static int scsifront_action_handler(struct scsi_cmnd *sc, uint8_t act)
 		if (scsifront_enter(info))
 			goto fail;
 
-		if (!scsifront_do_request(info, shadow))
+		if (!scsifront_do_request(sc->device, info, shadow))
 			break;
 
 		scsifront_return(info);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 07/15] xen-scsifront: rework scsifront_action_handler()
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
                   ` (5 preceding siblings ...)
  2023-10-02 15:49 ` [PATCH 06/15] xen-scsifront: add scsi device as argument to scsifront_do_request() Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 08/15] libiscsi: use cls_session as argument for target and session reset Hannes Reinecke
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke

Rework scsifront_action_handler() to add the SCSI device as the
first argument, and select between abort and device reset by
checking whether the scsi_cmnd argument is NULL.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/xen-scsifront.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/xen-scsifront.c b/drivers/scsi/xen-scsifront.c
index a0c13200d53a..26dd229aeb22 100644
--- a/drivers/scsi/xen-scsifront.c
+++ b/drivers/scsi/xen-scsifront.c
@@ -668,11 +668,11 @@ static int scsifront_queuecommand(struct Scsi_Host *shost,
  * We have to wait until an answer is returned. This answer contains the
  * result to be returned to the requestor.
  */
-static int scsifront_action_handler(struct scsi_cmnd *sc, uint8_t act)
+static int scsifront_action_handler(struct scsi_device *sdev, struct scsi_cmnd *sc)
 {
-	struct Scsi_Host *host = sc->device->host;
+	struct Scsi_Host *host = sdev->host;
 	struct vscsifrnt_info *info = shost_priv(host);
-	struct vscsifrnt_shadow *shadow, *s = scsi_cmd_priv(sc);
+	struct vscsifrnt_shadow *shadow;
 	int err = 0;
 
 	if (info->host_active == STATE_ERROR)
@@ -682,10 +682,14 @@ static int scsifront_action_handler(struct scsi_cmnd *sc, uint8_t act)
 	if (!shadow)
 		return FAILED;
 
-	shadow->act = act;
+	shadow->act = sc ? VSCSIIF_ACT_SCSI_ABORT : VSCSIIF_ACT_SCSI_RESET;
 	shadow->rslt_reset = RSLT_RESET_WAITING;
 	shadow->sc = sc;
-	shadow->ref_rqid = s->rqid;
+	if (sc) {
+		struct vscsifrnt_shadow *s = scsi_cmd_priv(sc);
+
+		shadow->ref_rqid = s->rqid;
+	}
 	init_waitqueue_head(&shadow->wq_reset);
 
 	spin_lock_irq(host->host_lock);
@@ -735,13 +739,13 @@ static int scsifront_action_handler(struct scsi_cmnd *sc, uint8_t act)
 static int scsifront_eh_abort_handler(struct scsi_cmnd *sc)
 {
 	pr_debug("%s\n", __func__);
-	return scsifront_action_handler(sc, VSCSIIF_ACT_SCSI_ABORT);
+	return scsifront_action_handler(sc->device, sc);
 }
 
 static int scsifront_dev_reset_handler(struct scsi_cmnd *sc)
 {
 	pr_debug("%s\n", __func__);
-	return scsifront_action_handler(sc, VSCSIIF_ACT_SCSI_RESET);
+	return scsifront_action_handler(sc->device, NULL);
 }
 
 static int scsifront_sdev_configure(struct scsi_device *sdev)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 08/15] libiscsi: use cls_session as argument for target and session reset
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
                   ` (6 preceding siblings ...)
  2023-10-02 15:49 ` [PATCH 07/15] xen-scsifront: rework scsifront_action_handler() Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 09/15] scsi_transport_iscsi: use session as argument for iscsi_block_scsi_eh() Hannes Reinecke
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke

iscsi_eh_target_reset() and iscsi_eh_session_reset() only depend
on the cls_session, so use that as an argument.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/be2iscsi/be_main.c | 10 +++++++++-
 drivers/scsi/libiscsi.c         | 21 +++++++++------------
 include/scsi/libiscsi.h         |  2 +-
 3 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index e48f14ad6dfd..441ad2ebc5d5 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -385,6 +385,14 @@ static int beiscsi_eh_device_reset(struct scsi_cmnd *sc)
 	return rc;
 }
 
+static int beiscsi_eh_session_reset(struct scsi_cmnd *sc)
+{
+	struct iscsi_cls_session *cls_session;
+
+	cls_session = starget_to_session(scsi_target(sc->device));
+	return iscsi_eh_session_reset(cls_session);
+}
+
 /*------------------- PCI Driver operations and data ----------------- */
 static const struct pci_device_id beiscsi_pci_id_table[] = {
 	{ PCI_DEVICE(BE_VENDOR_ID, BE_DEVICE_ID1) },
@@ -408,7 +416,7 @@ static const struct scsi_host_template beiscsi_sht = {
 	.eh_timed_out = iscsi_eh_cmd_timed_out,
 	.eh_abort_handler = beiscsi_eh_abort,
 	.eh_device_reset_handler = beiscsi_eh_device_reset,
-	.eh_target_reset_handler = iscsi_eh_session_reset,
+	.eh_target_reset_handler = beiscsi_eh_session_reset,
 	.shost_groups = beiscsi_groups,
 	.sg_tablesize = BEISCSI_SGLIST_ELEMENTS,
 	.can_queue = BE2_IO_DEPTH,
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 0fda8905eabd..a561eefabb50 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -2600,13 +2600,11 @@ EXPORT_SYMBOL_GPL(iscsi_session_recovery_timedout);
  * This function will wait for a relogin, session termination from
  * userspace, or a recovery/replacement timeout.
  */
-int iscsi_eh_session_reset(struct scsi_cmnd *sc)
+int iscsi_eh_session_reset(struct iscsi_cls_session *cls_session)
 {
-	struct iscsi_cls_session *cls_session;
 	struct iscsi_session *session;
 	struct iscsi_conn *conn;
 
-	cls_session = starget_to_session(scsi_target(sc->device));
 	session = cls_session->dd_data;
 
 	mutex_lock(&session->eh_mutex);
@@ -2653,7 +2651,7 @@ int iscsi_eh_session_reset(struct scsi_cmnd *sc)
 }
 EXPORT_SYMBOL_GPL(iscsi_eh_session_reset);
 
-static void iscsi_prep_tgt_reset_pdu(struct scsi_cmnd *sc, struct iscsi_tm *hdr)
+static void iscsi_prep_tgt_reset_pdu(struct iscsi_tm *hdr)
 {
 	memset(hdr, 0, sizeof(*hdr));
 	hdr->opcode = ISCSI_OP_SCSI_TMFUNC | ISCSI_OP_IMMEDIATE;
@@ -2668,19 +2666,16 @@ static void iscsi_prep_tgt_reset_pdu(struct scsi_cmnd *sc, struct iscsi_tm *hdr)
  *
  * This will attempt to send a warm target reset.
  */
-static int iscsi_eh_target_reset(struct scsi_cmnd *sc)
+static int iscsi_eh_target_reset(struct iscsi_cls_session *cls_session)
 {
-	struct iscsi_cls_session *cls_session;
 	struct iscsi_session *session;
 	struct iscsi_conn *conn;
 	struct iscsi_tm *hdr;
 	int rc = FAILED;
 
-	cls_session = starget_to_session(scsi_target(sc->device));
 	session = cls_session->dd_data;
 
-	ISCSI_DBG_EH(session, "tgt Reset [sc %p tgt %s]\n", sc,
-		     session->targetname);
+	ISCSI_DBG_EH(session, "tgt Reset [tgt %s]\n", session->targetname);
 
 	mutex_lock(&session->eh_mutex);
 	spin_lock_bh(&session->frwd_lock);
@@ -2698,7 +2693,7 @@ static int iscsi_eh_target_reset(struct scsi_cmnd *sc)
 	session->tmf_state = TMF_QUEUED;
 
 	hdr = &session->tmhdr;
-	iscsi_prep_tgt_reset_pdu(sc, hdr);
+	iscsi_prep_tgt_reset_pdu(hdr);
 
 	if (iscsi_exec_task_mgmt_fn(conn, hdr, session->age,
 				    session->tgt_reset_timeout)) {
@@ -2750,11 +2745,13 @@ static int iscsi_eh_target_reset(struct scsi_cmnd *sc)
  */
 int iscsi_eh_recover_target(struct scsi_cmnd *sc)
 {
+	struct iscsi_cls_session *cls_session;
 	int rc;
 
-	rc = iscsi_eh_target_reset(sc);
+	cls_session = starget_to_session(scsi_target(sc->device));
+	rc = iscsi_eh_target_reset(cls_session);
 	if (rc == FAILED)
-		rc = iscsi_eh_session_reset(sc);
+		rc = iscsi_eh_session_reset(cls_session);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(iscsi_eh_recover_target);
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index 7282555adfd5..7dddf785edd0 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -390,7 +390,7 @@ struct iscsi_host {
  */
 extern int iscsi_eh_abort(struct scsi_cmnd *sc);
 extern int iscsi_eh_recover_target(struct scsi_cmnd *sc);
-extern int iscsi_eh_session_reset(struct scsi_cmnd *sc);
+extern int iscsi_eh_session_reset(struct iscsi_cls_session *cls_session);
 extern int iscsi_eh_device_reset(struct scsi_cmnd *sc);
 extern int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc);
 extern enum scsi_timeout_action iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 09/15] scsi_transport_iscsi: use session as argument for iscsi_block_scsi_eh()
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
                   ` (7 preceding siblings ...)
  2023-10-02 15:49 ` [PATCH 08/15] libiscsi: use cls_session as argument for target and session reset Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 10/15] snic: reserve tag for TMF Hannes Reinecke
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke

We should be passing in the session directly instead of deriving it
from the scsi command.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/qla4xxx/ql4_os.c       | 34 ++++++++++++++++++-----------
 drivers/scsi/scsi_transport_iscsi.c |  6 ++---
 include/scsi/scsi_transport_iscsi.h |  2 +-
 3 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
index 675332e49a7b..961fe65bbe65 100644
--- a/drivers/scsi/qla4xxx/ql4_os.c
+++ b/drivers/scsi/qla4xxx/ql4_os.c
@@ -9272,15 +9272,18 @@ static int qla4xxx_eh_abort(struct scsi_cmnd *cmd)
  **/
 static int qla4xxx_eh_device_reset(struct scsi_cmnd *cmd)
 {
-	struct scsi_qla_host *ha = to_qla_host(cmd->device->host);
-	struct ddb_entry *ddb_entry = cmd->device->hostdata;
+	struct scsi_device *sdev = cmd->device;
+	struct scsi_qla_host *ha = to_qla_host(sdev->host);
+	struct iscsi_cls_session *session;
+	struct ddb_entry *ddb_entry = sdev->hostdata;
 	int ret = FAILED, stat;
 	int rval;
 
 	if (!ddb_entry)
 		return ret;
 
-	ret = iscsi_block_scsi_eh(cmd);
+	session = starget_to_session(scsi_target(sdev));
+	ret = iscsi_block_scsi_eh(session);
 	if (ret)
 		return ret;
 	ret = FAILED;
@@ -9341,19 +9344,25 @@ static int qla4xxx_eh_device_reset(struct scsi_cmnd *cmd)
  **/
 static int qla4xxx_eh_target_reset(struct scsi_cmnd *cmd)
 {
-	struct scsi_qla_host *ha = to_qla_host(cmd->device->host);
-	struct ddb_entry *ddb_entry = cmd->device->hostdata;
+	struct scsi_target *starget = scsi_target(cmd->device);
+	struct iscsi_cls_session *cls_session = starget_to_session(starget);
+	struct iscsi_session *sess;
+	struct scsi_qla_host *ha;
+	struct ddb_entry *ddb_entry;
 	int stat, ret;
 	int rval;
 
+	sess = cls_session->dd_data;
+	ddb_entry = sess->dd_data;
 	if (!ddb_entry)
 		return FAILED;
+	ha = ddb_entry->ha;
 
-	ret = iscsi_block_scsi_eh(cmd);
+	ret = iscsi_block_scsi_eh(cls_session);
 	if (ret)
 		return ret;
 
-	starget_printk(KERN_INFO, scsi_target(cmd->device),
+	starget_printk(KERN_INFO, starget,
 		       "WARM TARGET RESET ISSUED.\n");
 
 	DEBUG2(printk(KERN_INFO
@@ -9370,14 +9379,13 @@ static int qla4xxx_eh_target_reset(struct scsi_cmnd *cmd)
 
 	stat = qla4xxx_reset_target(ha, ddb_entry);
 	if (stat != QLA_SUCCESS) {
-		starget_printk(KERN_INFO, scsi_target(cmd->device),
+		starget_printk(KERN_INFO, starget,
 			       "WARM TARGET RESET FAILED.\n");
 		return FAILED;
 	}
 
-	if (qla4xxx_eh_wait_for_commands(ha, scsi_target(cmd->device),
-					 NULL)) {
-		starget_printk(KERN_INFO, scsi_target(cmd->device),
+	if (qla4xxx_eh_wait_for_commands(ha, starget, NULL)) {
+		starget_printk(KERN_INFO, starget,
 			       "WARM TARGET DEVICE RESET FAILED - "
 			       "waiting for commands.\n");
 		return FAILED;
@@ -9386,13 +9394,13 @@ static int qla4xxx_eh_target_reset(struct scsi_cmnd *cmd)
 	/* Send marker. */
 	if (qla4xxx_send_marker_iocb(ha, ddb_entry, cmd->device->lun,
 		MM_TGT_WARM_RESET) != QLA_SUCCESS) {
-		starget_printk(KERN_INFO, scsi_target(cmd->device),
+		starget_printk(KERN_INFO, starget,
 			       "WARM TARGET DEVICE RESET FAILED - "
 			       "marker iocb failed.\n");
 		return FAILED;
 	}
 
-	starget_printk(KERN_INFO, scsi_target(cmd->device),
+	starget_printk(KERN_INFO, starget,
 		       "WARM TARGET RESET SUCCEEDED.\n");
 	return SUCCESS;
 }
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 3075b2ddf7a6..4e81ef882a49 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1838,17 +1838,15 @@ static void iscsi_scan_session(struct work_struct *work)
 
 /**
  * iscsi_block_scsi_eh - block scsi eh until session state has transistioned
- * @cmd: scsi cmd passed to scsi eh handler
+ * @session: iscsi session derived from scsi eh handler argument
  *
  * If the session is down this function will wait for the recovery
  * timer to fire or for the session to be logged back in. If the
  * recovery timer fires then FAST_IO_FAIL is returned. The caller
  * should pass this error value to the scsi eh.
  */
-int iscsi_block_scsi_eh(struct scsi_cmnd *cmd)
+int iscsi_block_scsi_eh(struct iscsi_cls_session *session)
 {
-	struct iscsi_cls_session *session =
-			starget_to_session(scsi_target(cmd->device));
 	unsigned long flags;
 	int ret = 0;
 
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
index fb3399e4cd29..8ec36de5d236 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -466,7 +466,7 @@ extern struct iscsi_endpoint *iscsi_create_endpoint(int dd_size);
 extern void iscsi_destroy_endpoint(struct iscsi_endpoint *ep);
 extern struct iscsi_endpoint *iscsi_lookup_endpoint(u64 handle);
 extern void iscsi_put_endpoint(struct iscsi_endpoint *ep);
-extern int iscsi_block_scsi_eh(struct scsi_cmnd *cmd);
+extern int iscsi_block_scsi_eh(struct iscsi_cls_session *session);
 extern struct iscsi_iface *iscsi_create_iface(struct Scsi_Host *shost,
 					      struct iscsi_transport *t,
 					      uint32_t iface_type,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 10/15] snic: reserve tag for TMF
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
                   ` (8 preceding siblings ...)
  2023-10-02 15:49 ` [PATCH 09/15] scsi_transport_iscsi: use session as argument for iscsi_block_scsi_eh() Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 11/15] snic: allocate device reset command Hannes Reinecke
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke

Rework the hba reset function to not rely on scsi commands and
reserve one command for HBA reset.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/snic/snic.h      |  2 +-
 drivers/scsi/snic/snic_main.c |  5 ++-
 drivers/scsi/snic/snic_scsi.c | 82 ++++++++---------------------------
 3 files changed, 21 insertions(+), 68 deletions(-)

diff --git a/drivers/scsi/snic/snic.h b/drivers/scsi/snic/snic.h
index 32f5a34b6987..0b7411624bcf 100644
--- a/drivers/scsi/snic/snic.h
+++ b/drivers/scsi/snic/snic.h
@@ -366,7 +366,7 @@ int snic_queuecommand(struct Scsi_Host *, struct scsi_cmnd *);
 int snic_abort_cmd(struct scsi_cmnd *);
 int snic_device_reset(struct scsi_cmnd *);
 int snic_host_reset(struct scsi_cmnd *);
-int snic_reset(struct Scsi_Host *, struct scsi_cmnd *);
+int snic_reset(struct Scsi_Host *);
 void snic_shutdown_scsi_cleanup(struct snic *);
 
 
diff --git a/drivers/scsi/snic/snic_main.c b/drivers/scsi/snic/snic_main.c
index cc824dcfe7da..9bee91eedc10 100644
--- a/drivers/scsi/snic/snic_main.c
+++ b/drivers/scsi/snic/snic_main.c
@@ -494,9 +494,10 @@ snic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Configure Maximum Outstanding IO reqs */
 	max_ios = snic->config.io_throttle_count;
 	if (max_ios != SNIC_UCSM_DFLT_THROTTLE_CNT_BLD)
-		shost->can_queue = min_t(u32, SNIC_MAX_IO_REQ,
+		max_ios = min_t(u32, SNIC_MAX_IO_REQ,
 					 max_t(u32, SNIC_MIN_IO_REQ, max_ios));
-
+	/* Reserve one tag for HBA reset */
+	shost->can_queue = max_ios - 1;
 	snic->max_tag_id = shost->can_queue;
 
 	shost->max_lun = snic->config.luns_per_tgt;
diff --git a/drivers/scsi/snic/snic_scsi.c b/drivers/scsi/snic/snic_scsi.c
index c50ede326cc4..f1ef781df837 100644
--- a/drivers/scsi/snic/snic_scsi.c
+++ b/drivers/scsi/snic/snic_scsi.c
@@ -952,13 +952,13 @@ snic_itmf_cmpl_handler(struct snic *snic, struct snic_fw_req *fwreq)
 
 
 static void
-snic_hba_reset_scsi_cleanup(struct snic *snic, struct scsi_cmnd *sc)
+snic_hba_reset_scsi_cleanup(struct snic *snic)
 {
 	struct snic_stats *st = &snic->s_stats;
 	long act_ios = 0, act_fwreqs = 0;
 
 	SNIC_SCSI_DBG(snic->shost, "HBA Reset scsi cleanup.\n");
-	snic_scsi_cleanup(snic, snic_cmd_tag(sc));
+	snic_scsi_cleanup(snic, snic->max_tag_id);
 
 	/* Update stats on pending IOs */
 	act_ios = atomic64_read(&st->io.active);
@@ -984,7 +984,6 @@ snic_hba_reset_cmpl_handler(struct snic *snic, struct snic_fw_req *fwreq)
 	u32 hid;
 	u8 typ;
 	u8 hdr_stat;
-	struct scsi_cmnd *sc = NULL;
 	struct snic_req_info *rqi = NULL;
 	spinlock_t *io_lock = NULL;
 	unsigned long flags, gflags;
@@ -999,18 +998,9 @@ snic_hba_reset_cmpl_handler(struct snic *snic, struct snic_fw_req *fwreq)
 		      "reset_cmpl: type = %x, hdr_stat = %x, cmnd_id = %x, hid = %x, ctx = %lx\n",
 		      typ, hdr_stat, cmnd_id, hid, ctx);
 
-	/* spl case, host reset issued through ioctl */
-	if (cmnd_id == SCSI_NO_TAG) {
-		rqi = (struct snic_req_info *) ctx;
-		SNIC_HOST_INFO(snic->shost,
-			       "reset_cmpl:Tag %d ctx %lx cmpl stat %s\n",
-			       cmnd_id, ctx, snic_io_status_to_str(hdr_stat));
-		sc = rqi->sc;
-
-		goto ioctl_hba_rst;
-	}
+	rqi = (struct snic_req_info *) ctx;
 
-	if (cmnd_id >= snic->max_tag_id) {
+	if (cmnd_id != snic->max_tag_id) {
 		SNIC_HOST_ERR(snic->shost,
 			      "reset_cmpl: Tag 0x%x out of Range,HdrStat %s\n",
 			      cmnd_id, snic_io_status_to_str(hdr_stat));
@@ -1019,55 +1009,35 @@ snic_hba_reset_cmpl_handler(struct snic *snic, struct snic_fw_req *fwreq)
 		return 1;
 	}
 
-	sc = scsi_host_find_tag(snic->shost, cmnd_id);
-ioctl_hba_rst:
-	if (!sc) {
-		atomic64_inc(&snic->s_stats.io.sc_null);
-		SNIC_HOST_ERR(snic->shost,
-			      "reset_cmpl: sc is NULL - Hdr Stat %s Tag 0x%x\n",
-			      snic_io_status_to_str(hdr_stat), cmnd_id);
-		ret = 1;
-
-		return ret;
-	}
-
 	SNIC_HOST_INFO(snic->shost,
-		       "reset_cmpl: sc %p rqi %p Tag %d flags 0x%llx\n",
-		       sc, rqi, cmnd_id, CMD_FLAGS(sc));
+		       "reset_cmpl: rqi %p Tag %d\n", rqi, cmnd_id);
 
-	io_lock = snic_io_lock_hash(snic, sc);
+	io_lock = snic_io_lock_tag(snic, cmnd_id);
 	spin_lock_irqsave(io_lock, flags);
 
 	if (!snic->remove_wait) {
 		spin_unlock_irqrestore(io_lock, flags);
 		SNIC_HOST_ERR(snic->shost,
 			      "reset_cmpl:host reset completed after timeout\n");
-		ret = 1;
-
-		return ret;
+		return 1;
 	}
 
-	rqi = (struct snic_req_info *) CMD_SP(sc);
 	WARN_ON_ONCE(!rqi);
 
 	if (!rqi) {
 		atomic64_inc(&snic->s_stats.io.req_null);
 		spin_unlock_irqrestore(io_lock, flags);
-		CMD_FLAGS(sc) |= SNIC_IO_ABTS_TERM_REQ_NULL;
 		SNIC_HOST_ERR(snic->shost,
-			      "reset_cmpl: rqi is null,Hdr stat %s Tag 0x%x sc 0x%p flags 0x%llx\n",
-			      snic_io_status_to_str(hdr_stat), cmnd_id, sc,
-			      CMD_FLAGS(sc));
-
-		ret = 1;
+			      "reset_cmpl: rqi is null,Hdr stat %s Tag 0x%x\n",
+			      snic_io_status_to_str(hdr_stat), cmnd_id);
 
-		return ret;
+		return 1;
 	}
 	/* stats */
 	spin_unlock_irqrestore(io_lock, flags);
 
 	/* scsi cleanup */
-	snic_hba_reset_scsi_cleanup(snic, sc);
+	snic_hba_reset_scsi_cleanup(snic);
 
 	SNIC_BUG_ON(snic_get_state(snic) != SNIC_OFFLINE &&
 		    snic_get_state(snic) != SNIC_FWRESET);
@@ -2203,7 +2173,7 @@ snic_device_reset(struct scsi_cmnd *sc)
  * snic_issue_hba_reset : Queues FW Reset Request.
  */
 static int
-snic_issue_hba_reset(struct snic *snic, struct scsi_cmnd *sc)
+snic_issue_hba_reset(struct snic *snic)
 {
 	struct snic_req_info *rqi = NULL;
 	struct snic_host_req *req = NULL;
@@ -2219,26 +2189,15 @@ snic_issue_hba_reset(struct snic *snic, struct scsi_cmnd *sc)
 		goto hba_rst_end;
 	}
 
-	if (snic_cmd_tag(sc) == SCSI_NO_TAG) {
-		memset(scsi_cmd_priv(sc), 0,
-			sizeof(struct snic_internal_io_state));
-		SNIC_HOST_INFO(snic->shost, "issu_hr:Host reset thru ioctl.\n");
-		rqi->sc = sc;
-	}
-
 	req = rqi_to_req(rqi);
 
-	io_lock = snic_io_lock_hash(snic, sc);
+	io_lock = snic_io_lock_tag(snic, snic->max_tag_id);
 	spin_lock_irqsave(io_lock, flags);
-	SNIC_BUG_ON(CMD_SP(sc) != NULL);
-	CMD_STATE(sc) = SNIC_IOREQ_PENDING;
-	CMD_SP(sc) = (char *) rqi;
-	CMD_FLAGS(sc) |= SNIC_IO_INITIALIZED;
 	snic->remove_wait = &wait;
 	spin_unlock_irqrestore(io_lock, flags);
 
 	/* Initialize Request */
-	snic_io_hdr_enc(&req->hdr, SNIC_REQ_HBA_RESET, 0, snic_cmd_tag(sc),
+	snic_io_hdr_enc(&req->hdr, SNIC_REQ_HBA_RESET, 0, snic->max_tag_id,
 			snic->config.hid, 0, (ulong) rqi);
 
 	req->u.reset.flags = 0;
@@ -2252,9 +2211,6 @@ snic_issue_hba_reset(struct snic *snic, struct scsi_cmnd *sc)
 		goto hba_rst_err;
 	}
 
-	spin_lock_irqsave(io_lock, flags);
-	CMD_FLAGS(sc) |= SNIC_HOST_RESET_ISSUED;
-	spin_unlock_irqrestore(io_lock, flags);
 	atomic64_inc(&snic->s_stats.reset.hba_resets);
 	SNIC_HOST_INFO(snic->shost, "Queued HBA Reset Successfully.\n");
 
@@ -2270,8 +2226,6 @@ snic_issue_hba_reset(struct snic *snic, struct scsi_cmnd *sc)
 
 	spin_lock_irqsave(io_lock, flags);
 	snic->remove_wait = NULL;
-	rqi = (struct snic_req_info *) CMD_SP(sc);
-	CMD_SP(sc) = NULL;
 	spin_unlock_irqrestore(io_lock, flags);
 
 	if (rqi)
@@ -2284,8 +2238,6 @@ snic_issue_hba_reset(struct snic *snic, struct scsi_cmnd *sc)
 hba_rst_err:
 	spin_lock_irqsave(io_lock, flags);
 	snic->remove_wait = NULL;
-	rqi = (struct snic_req_info *) CMD_SP(sc);
-	CMD_SP(sc) = NULL;
 	spin_unlock_irqrestore(io_lock, flags);
 
 	if (rqi)
@@ -2300,7 +2252,7 @@ snic_issue_hba_reset(struct snic *snic, struct scsi_cmnd *sc)
 } /* end of snic_issue_hba_reset */
 
 int
-snic_reset(struct Scsi_Host *shost, struct scsi_cmnd *sc)
+snic_reset(struct Scsi_Host *shost)
 {
 	struct snic *snic = shost_priv(shost);
 	enum snic_state sv_state;
@@ -2329,7 +2281,7 @@ snic_reset(struct Scsi_Host *shost, struct scsi_cmnd *sc)
 	while (atomic_read(&snic->ios_inflight))
 		schedule_timeout(msecs_to_jiffies(1));
 
-	ret = snic_issue_hba_reset(snic, sc);
+	ret = snic_issue_hba_reset(snic);
 	if (ret) {
 		SNIC_HOST_ERR(shost,
 			      "reset:Host Reset Failed w/ err %d.\n",
@@ -2368,7 +2320,7 @@ snic_host_reset(struct scsi_cmnd *sc)
 		      sc, sc->cmnd[0], scsi_cmd_to_rq(sc),
 		      snic_cmd_tag(sc), CMD_FLAGS(sc));
 
-	ret = snic_reset(shost, sc);
+	ret = snic_reset(shost);
 
 	SNIC_TRC(shost->host_no, snic_cmd_tag(sc), (ulong) sc,
 		 jiffies_to_msecs(jiffies - start_time),
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 11/15] snic: allocate device reset command
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
                   ` (9 preceding siblings ...)
  2023-10-02 15:49 ` [PATCH 10/15] snic: reserve tag for TMF Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 12/15] snic: Use scsi_host_busy_iter() to traverse commands Hannes Reinecke
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke

Allocate a command to send a device reset instead of relying
on using the command which triggered the device failure.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/snic/snic_scsi.c | 63 +++++++++++++++++------------------
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/snic/snic_scsi.c b/drivers/scsi/snic/snic_scsi.c
index f1ef781df837..06615619b84c 100644
--- a/drivers/scsi/snic/snic_scsi.c
+++ b/drivers/scsi/snic/snic_scsi.c
@@ -2091,70 +2091,69 @@ snic_unlink_and_release_req(struct snic *snic, struct scsi_cmnd *sc, int flag)
 int
 snic_device_reset(struct scsi_cmnd *sc)
 {
-	struct Scsi_Host *shost = sc->device->host;
+	struct scsi_device *sdev = sc->device;
+	struct Scsi_Host *shost = sdev->host;
 	struct snic *snic = shost_priv(shost);
+	struct request *req;
 	struct snic_req_info *rqi = NULL;
-	int tag = snic_cmd_tag(sc);
 	int start_time = jiffies;
 	int ret = FAILED;
 	int dr_supp = 0;
 
-	SNIC_SCSI_DBG(shost, "dev_reset:sc %p :0x%x :req = %p :tag = %d\n",
-		      sc, sc->cmnd[0], scsi_cmd_to_rq(sc),
-		      snic_cmd_tag(sc));
-	dr_supp = snic_dev_reset_supported(sc->device);
+	SNIC_SCSI_DBG(shost, "dev_reset\n");
+	dr_supp = snic_dev_reset_supported(sdev);
 	if (!dr_supp) {
 		/* device reset op is not supported */
 		SNIC_HOST_INFO(shost, "LUN Reset Op not supported.\n");
-		snic_unlink_and_release_req(snic, sc, SNIC_DEV_RST_NOTSUP);
-
-		goto dev_rst_end;
+		return ret;
 	}
 
 	if (unlikely(snic_get_state(snic) != SNIC_ONLINE)) {
-		snic_unlink_and_release_req(snic, sc, 0);
 		SNIC_HOST_ERR(shost, "Devrst: Parent Devs are not online.\n");
 
-		goto dev_rst_end;
+		return ret;
 	}
 
-	/* There is no tag when lun reset is issue through ioctl. */
-	if (unlikely(tag <= SNIC_NO_TAG)) {
-		SNIC_HOST_INFO(snic->shost,
-			       "Devrst: LUN Reset Recvd thru IOCTL.\n");
-
-		rqi = snic_req_init(snic, 0);
-		if (!rqi)
-			goto dev_rst_end;
-
-		memset(scsi_cmd_priv(sc), 0,
-			sizeof(struct snic_internal_io_state));
-		CMD_SP(sc) = (char *)rqi;
-		CMD_FLAGS(sc) = SNIC_NO_FLAGS;
+	rqi = snic_req_init(snic, 0);
+	if (!rqi)
+		return ret;
 
-		/* Add special tag for dr coming from user spc */
-		rqi->tm_tag = SNIC_TAG_IOCTL_DEV_RST;
-		rqi->sc = sc;
+	req = scsi_alloc_request(sdev->request_queue, REQ_OP_DRV_IN,
+				 BLK_MQ_REQ_NOWAIT);
+	if (!req) {
+		SNIC_HOST_ERR(snic->shost,
+			      "Devrst: TMF busy\n");
+		goto dev_rst_end;
 	}
+	sc = blk_mq_rq_to_pdu(req);
+	memset(scsi_cmd_priv(sc), 0,
+	       sizeof(struct snic_internal_io_state));
+	CMD_SP(sc) = (char *)rqi;
+	CMD_FLAGS(sc) = SNIC_NO_FLAGS;
 
+	/* Add special tag for dr coming from user spc */
+	rqi->tm_tag = SNIC_TAG_IOCTL_DEV_RST;
+	rqi->sc = sc;
+	WRITE_ONCE(req->state, MQ_RQ_IN_FLIGHT);
 	ret = snic_send_dr_and_wait(snic, sc);
 	if (ret) {
 		SNIC_HOST_ERR(snic->shost,
 			      "Devrst: IO w/ Tag %x Failed w/ err = %d\n",
-			      tag, ret);
-
+			      snic_cmd_tag(sc), ret);
+		blk_mq_set_request_complete(req);
 		snic_unlink_and_release_req(snic, sc, 0);
 
 		goto dev_rst_end;
 	}
-
+	blk_mq_set_request_complete(req);
 	ret = snic_dr_finish(snic, sc);
 
 dev_rst_end:
-	SNIC_TRC(snic->shost->host_no, tag, (ulong) sc,
+	SNIC_TRC(snic->shost->host_no, snic_cmd_tag(sc), (ulong) sc,
 		 jiffies_to_msecs(jiffies - start_time),
 		 0, SNIC_TRC_CMD(sc), SNIC_TRC_CMD_STATE_FLAGS(sc));
-
+	if (req)
+		blk_mq_free_request(req);
 	SNIC_SCSI_DBG(snic->shost,
 		      "Devrst: Returning from Device Reset : %s\n",
 		      (ret == SUCCESS) ? "SUCCESS" : "FAILED");
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 12/15] snic: Use scsi_host_busy_iter() to traverse commands
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
                   ` (10 preceding siblings ...)
  2023-10-02 15:49 ` [PATCH 11/15] snic: allocate device reset command Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 13/15] fnic: allocate device reset command on the fly Hannes Reinecke
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke,
	Johannes Thumshirn

Use scsi_host_busy_iter() to traverse commands instead of hand-crafted
routines walking the command list.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 drivers/scsi/snic/snic_scsi.c | 177 ++++++++++++++++------------------
 1 file changed, 84 insertions(+), 93 deletions(-)

diff --git a/drivers/scsi/snic/snic_scsi.c b/drivers/scsi/snic/snic_scsi.c
index 06615619b84c..48261a37d4a6 100644
--- a/drivers/scsi/snic/snic_scsi.c
+++ b/drivers/scsi/snic/snic_scsi.c
@@ -63,7 +63,7 @@ static const char * const snic_io_status_str[] = {
 	[SNIC_STAT_FATAL_ERROR]	= "SNIC_STAT_FATAL_ERROR",
 };
 
-static void snic_scsi_cleanup(struct snic *, int);
+static void snic_scsi_cleanup(struct snic *);
 
 const char *
 snic_state_to_str(unsigned int state)
@@ -958,7 +958,7 @@ snic_hba_reset_scsi_cleanup(struct snic *snic)
 	long act_ios = 0, act_fwreqs = 0;
 
 	SNIC_SCSI_DBG(snic->shost, "HBA Reset scsi cleanup.\n");
-	snic_scsi_cleanup(snic, snic->max_tag_id);
+	snic_scsi_cleanup(snic);
 
 	/* Update stats on pending IOs */
 	act_ios = atomic64_read(&st->io.active);
@@ -2357,53 +2357,36 @@ snic_cmpl_pending_tmreq(struct snic *snic, struct scsi_cmnd *sc)
 		complete(rqi->abts_done);
 }
 
-/*
- * snic_scsi_cleanup: Walks through tag map and releases the reqs
- */
-static void
-snic_scsi_cleanup(struct snic *snic, int ex_tag)
+static bool
+snic_scsi_cleanup_iter(struct scsi_cmnd *sc, void *data, bool reserved)
 {
+	struct snic *snic = data;
 	struct snic_req_info *rqi = NULL;
-	struct scsi_cmnd *sc = NULL;
 	spinlock_t *io_lock = NULL;
 	unsigned long flags;
-	int tag;
+	int tag = scsi_cmd_to_rq(sc)->tag;
 	u64 st_time = 0;
 
 	SNIC_SCSI_DBG(snic->shost, "sc_clean: scsi cleanup.\n");
 
-	for (tag = 0; tag < snic->max_tag_id; tag++) {
-		/* Skip ex_tag */
-		if (tag == ex_tag)
-			continue;
-
-		io_lock = snic_io_lock_tag(snic, tag);
-		spin_lock_irqsave(io_lock, flags);
-		sc = scsi_host_find_tag(snic->shost, tag);
-		if (!sc) {
-			spin_unlock_irqrestore(io_lock, flags);
-
-			continue;
-		}
-
-		if (unlikely(snic_tmreq_pending(sc))) {
-			/*
-			 * When FW Completes reset w/o sending completions
-			 * for outstanding ios.
-			 */
-			snic_cmpl_pending_tmreq(snic, sc);
-			spin_unlock_irqrestore(io_lock, flags);
-
-			continue;
-		}
+	io_lock = snic_io_lock_tag(snic, tag);
+	spin_lock_irqsave(io_lock, flags);
 
-		rqi = (struct snic_req_info *) CMD_SP(sc);
-		if (!rqi) {
-			spin_unlock_irqrestore(io_lock, flags);
+	if (unlikely(snic_tmreq_pending(sc))) {
+		/*
+		 * When FW Completes reset w/o sending completions
+		 * for outstanding ios.
+		 */
+		snic_cmpl_pending_tmreq(snic, sc);
+		spin_unlock_irqrestore(io_lock, flags);
 
-			goto cleanup;
-		}
+		return true;
+	}
 
+	rqi = (struct snic_req_info *) CMD_SP(sc);
+	if (!rqi)
+		spin_unlock_irqrestore(io_lock, flags);
+	else {
 		SNIC_SCSI_DBG(snic->shost,
 			      "sc_clean: sc %p, rqi %p, tag %d flags 0x%llx\n",
 			      sc, rqi, tag, CMD_FLAGS(sc));
@@ -2418,24 +2401,34 @@ snic_scsi_cleanup(struct snic *snic, int ex_tag)
 			       rqi, CMD_FLAGS(sc));
 
 		snic_release_req_buf(snic, rqi, sc);
+	}
+	sc->result = DID_TRANSPORT_DISRUPTED << 16;
+	SNIC_HOST_INFO(snic->shost,
+		       "sc_clean: DID_TRANSPORT_DISRUPTED for sc %p, Tag %d flags 0x%llx rqi %p duration %u msecs\n",
+		       sc, tag, CMD_FLAGS(sc), rqi,
+		       jiffies_to_msecs(jiffies - st_time));
 
-cleanup:
-		sc->result = DID_TRANSPORT_DISRUPTED << 16;
-		SNIC_HOST_INFO(snic->shost,
-			       "sc_clean: DID_TRANSPORT_DISRUPTED for sc %p, Tag %d flags 0x%llx rqi %p duration %u msecs\n",
-			       sc, scsi_cmd_to_rq(sc)->tag, CMD_FLAGS(sc), rqi,
-			       jiffies_to_msecs(jiffies - st_time));
+	/* Update IO stats */
+	snic_stats_update_io_cmpl(&snic->s_stats);
 
-		/* Update IO stats */
-		snic_stats_update_io_cmpl(&snic->s_stats);
+	SNIC_TRC(snic->shost->host_no, tag, (ulong) sc,
+		 jiffies_to_msecs(jiffies - st_time), 0,
+		 SNIC_TRC_CMD(sc),
+		 SNIC_TRC_CMD_STATE_FLAGS(sc));
 
-		SNIC_TRC(snic->shost->host_no, tag, (ulong) sc,
-			 jiffies_to_msecs(jiffies - st_time), 0,
-			 SNIC_TRC_CMD(sc),
-			 SNIC_TRC_CMD_STATE_FLAGS(sc));
+	scsi_done(sc);
+	return true;
+}
 
-		scsi_done(sc);
-	}
+/*
+ * snic_scsi_cleanup: Walks through tag map and releases the reqs
+ */
+static void
+snic_scsi_cleanup(struct snic *snic)
+{
+	SNIC_SCSI_DBG(snic->shost, "sc_clean: scsi cleanup\n");
+
+	scsi_host_busy_iter(snic->shost, snic_scsi_cleanup_iter, snic);
 } /* end of snic_scsi_cleanup */
 
 void
@@ -2443,7 +2436,7 @@ snic_shutdown_scsi_cleanup(struct snic *snic)
 {
 	SNIC_HOST_INFO(snic->shost, "Shutdown time SCSI Cleanup.\n");
 
-	snic_scsi_cleanup(snic, SCSI_NO_TAG);
+	snic_scsi_cleanup(snic);
 } /* end of snic_shutdown_scsi_cleanup */
 
 /*
@@ -2457,7 +2450,7 @@ snic_internal_abort_io(struct snic *snic, struct scsi_cmnd *sc, int tmf)
 	spinlock_t *io_lock = NULL;
 	unsigned long flags;
 	u32 sv_state = 0;
-	int ret = 0;
+	int ret = FAILED;
 
 	io_lock = snic_io_lock_hash(snic, sc);
 	spin_lock_irqsave(io_lock, flags);
@@ -2532,6 +2525,35 @@ snic_internal_abort_io(struct snic *snic, struct scsi_cmnd *sc, int tmf)
 	return ret;
 } /* end of snic_internal_abort_io */
 
+struct snic_tgt_scsi_abort_io_data {
+	struct snic *snic;
+	struct snic_tgt *tgt;
+	int tmf;
+	int abt_cnt;
+};
+
+static bool snic_tgt_scsi_abort_io_iter(struct scsi_cmnd *sc, void *data,
+					bool reserved)
+{
+	struct snic_tgt_scsi_abort_io_data *iter_data = data;
+	struct snic *snic = iter_data->snic;
+	struct snic_tgt *sc_tgt;
+	int ret;
+
+	sc_tgt = starget_to_tgt(scsi_target(sc->device));
+	if (sc_tgt != iter_data->tgt)
+		return true;
+
+	ret = snic_internal_abort_io(snic, sc, iter_data->tmf);
+	if (ret == SUCCESS)
+		iter_data->abt_cnt++;
+	else
+		SNIC_HOST_ERR(snic->shost,
+			      "tgt_abt_io: Tag %x, Failed w err = %d\n",
+			      scsi_cmd_to_rq(sc)->tag, ret);
+	return true;
+}
+
 /*
  * snic_tgt_scsi_abort_io : called by snic_tgt_del
  */
@@ -2539,11 +2561,9 @@ int
 snic_tgt_scsi_abort_io(struct snic_tgt *tgt)
 {
 	struct snic *snic = NULL;
-	struct scsi_cmnd *sc = NULL;
-	struct snic_tgt *sc_tgt = NULL;
-	spinlock_t *io_lock = NULL;
-	unsigned long flags;
-	int ret = 0, tag, abt_cnt = 0, tmf = 0;
+	struct snic_tgt_scsi_abort_io_data data = {
+		.abt_cnt = 0,
+	};
 
 	if (!tgt)
 		return -1;
@@ -2551,44 +2571,15 @@ snic_tgt_scsi_abort_io(struct snic_tgt *tgt)
 	snic = shost_priv(snic_tgt_to_shost(tgt));
 	SNIC_SCSI_DBG(snic->shost, "tgt_abt_io: Cleaning Pending IOs.\n");
 
+	data.snic = snic;
 	if (tgt->tdata.typ == SNIC_TGT_DAS)
-		tmf = SNIC_ITMF_ABTS_TASK;
+		data.tmf = SNIC_ITMF_ABTS_TASK;
 	else
-		tmf = SNIC_ITMF_ABTS_TASK_TERM;
-
-	for (tag = 0; tag < snic->max_tag_id; tag++) {
-		io_lock = snic_io_lock_tag(snic, tag);
-
-		spin_lock_irqsave(io_lock, flags);
-		sc = scsi_host_find_tag(snic->shost, tag);
-		if (!sc) {
-			spin_unlock_irqrestore(io_lock, flags);
-
-			continue;
-		}
+		data.tmf = SNIC_ITMF_ABTS_TASK_TERM;
 
-		sc_tgt = starget_to_tgt(scsi_target(sc->device));
-		if (sc_tgt != tgt) {
-			spin_unlock_irqrestore(io_lock, flags);
-
-			continue;
-		}
-		spin_unlock_irqrestore(io_lock, flags);
-
-		ret = snic_internal_abort_io(snic, sc, tmf);
-		if (ret < 0) {
-			SNIC_HOST_ERR(snic->shost,
-				      "tgt_abt_io: Tag %x, Failed w err = %d\n",
-				      tag, ret);
-
-			continue;
-		}
-
-		if (ret == SUCCESS)
-			abt_cnt++;
-	}
+	scsi_host_busy_iter(snic->shost, snic_tgt_scsi_abort_io_iter, &data);
 
-	SNIC_SCSI_DBG(snic->shost, "tgt_abt_io: abt_cnt = %d\n", abt_cnt);
+	SNIC_SCSI_DBG(snic->shost, "tgt_abt_io: abt_cnt = %d\n", data.abt_cnt);
 
 	return 0;
 } /* end of snic_tgt_scsi_abort_io */
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 13/15] fnic: allocate device reset command on the fly
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
                   ` (11 preceding siblings ...)
  2023-10-02 15:49 ` [PATCH 12/15] snic: Use scsi_host_busy_iter() to traverse commands Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 14/15] fnic: use fc_block_rport() correctly Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 15/15] csiostor: use separate TMF command Hannes Reinecke
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke

Allocate a reset command on the fly instead of relying
on using the command which triggered the device failure.
This might fail if all available tags are busy, but in
that case it'll be safer to fall back to host reset anyway.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/fnic/fnic.h      |   1 -
 drivers/scsi/fnic/fnic_scsi.c | 108 +++++++++++++++-------------------
 drivers/scsi/snic/snic_scsi.c |   5 +-
 3 files changed, 48 insertions(+), 66 deletions(-)

diff --git a/drivers/scsi/fnic/fnic.h b/drivers/scsi/fnic/fnic.h
index 93c68931a593..8ffcafb4687f 100644
--- a/drivers/scsi/fnic/fnic.h
+++ b/drivers/scsi/fnic/fnic.h
@@ -236,7 +236,6 @@ struct fnic {
 	unsigned int wq_count;
 	unsigned int cq_count;
 
-	struct mutex sgreset_mutex;
 	struct dentry *fnic_stats_debugfs_host;
 	struct dentry *fnic_stats_debugfs_file;
 	struct dentry *fnic_reset_debugfs_file;
diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 9761b2c9db48..72352ac89eab 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -1984,7 +1984,6 @@ static inline int fnic_queue_dr_io_req(struct fnic *fnic,
 
 struct fnic_pending_aborts_iter_data {
 	struct fnic *fnic;
-	struct scsi_cmnd *lr_sc;
 	struct scsi_device *lun_dev;
 	int ret;
 };
@@ -2002,7 +2001,7 @@ static bool fnic_pending_aborts_iter(struct scsi_cmnd *sc, void *data)
 	DECLARE_COMPLETION_ONSTACK(tm_done);
 	enum fnic_ioreq_state old_ioreq_state;
 
-	if (sc == iter_data->lr_sc || sc->device != lun_dev)
+	if (sc->device != lun_dev)
 		return true;
 
 	io_lock = fnic_io_lock_tag(fnic, abt_tag);
@@ -2105,17 +2104,11 @@ static bool fnic_pending_aborts_iter(struct scsi_cmnd *sc, void *data)
 		return false;
 	}
 	fnic_priv(sc)->state = FNIC_IOREQ_ABTS_COMPLETE;
-
-	/* original sc used for lr is handled by dev reset code */
-	if (sc != iter_data->lr_sc)
-		fnic_priv(sc)->io_req = NULL;
+	fnic_priv(sc)->io_req = NULL;
 	spin_unlock_irqrestore(io_lock, flags);
 
-	/* original sc used for lr is handled by dev reset code */
-	if (sc != iter_data->lr_sc) {
-		fnic_release_ioreq_buf(fnic, io_req, sc);
-		mempool_free(io_req, fnic->io_req_pool);
-	}
+	fnic_release_ioreq_buf(fnic, io_req, sc);
+	mempool_free(io_req, fnic->io_req_pool);
 
 	/*
 	 * Any IO is returned during reset, it needs to call scsi_done
@@ -2135,8 +2128,7 @@ static bool fnic_pending_aborts_iter(struct scsi_cmnd *sc, void *data)
  * successfully aborted, 1 otherwise
  */
 static int fnic_clean_pending_aborts(struct fnic *fnic,
-				     struct scsi_cmnd *lr_sc,
-				     bool new_sc)
+				     struct scsi_cmnd *lr_sc)
 
 {
 	int ret = 0;
@@ -2146,9 +2138,6 @@ static int fnic_clean_pending_aborts(struct fnic *fnic,
 		.ret = SUCCESS,
 	};
 
-	if (new_sc)
-		iter_data.lr_sc = lr_sc;
-
 	scsi_host_busy_iter(fnic->lport->host,
 			    fnic_pending_aborts_iter, &iter_data);
 	if (iter_data.ret == FAILED) {
@@ -2174,7 +2163,8 @@ static int fnic_clean_pending_aborts(struct fnic *fnic,
  */
 int fnic_device_reset(struct scsi_cmnd *sc)
 {
-	struct request *rq = scsi_cmd_to_rq(sc);
+	struct scsi_device *sdev = sc->device;
+	struct request *req;
 	struct fc_lport *lp;
 	struct fnic *fnic;
 	struct fnic_io_req *io_req = NULL;
@@ -2187,15 +2177,17 @@ int fnic_device_reset(struct scsi_cmnd *sc)
 	struct scsi_lun fc_lun;
 	struct fnic_stats *fnic_stats;
 	struct reset_stats *reset_stats;
-	int tag = rq->tag;
+	int tag;
 	DECLARE_COMPLETION_ONSTACK(tm_done);
-	bool new_sc = 0;
 
 	/* Wait for rport to unblock */
-	fc_block_scsi_eh(sc);
+	rport = starget_to_rport(scsi_target(sdev));
+	ret = fc_block_rport(rport);
+	if (ret)
+		return ret;
 
 	/* Get local-port, check ready and link up */
-	lp = shost_priv(sc->device->host);
+	lp = shost_priv(sdev->host);
 
 	fnic = lport_priv(lp);
 	fnic_stats = &fnic->fnic_stats;
@@ -2203,53 +2195,46 @@ int fnic_device_reset(struct scsi_cmnd *sc)
 
 	atomic64_inc(&reset_stats->device_resets);
 
-	rport = starget_to_rport(scsi_target(sc->device));
 	FNIC_SCSI_DBG(KERN_DEBUG, fnic->lport->host,
-		      "Device reset called FCID 0x%x, LUN 0x%llx sc 0x%p\n",
-		      rport->port_id, sc->device->lun, sc);
+		      "Device reset called FCID 0x%x, LUN 0x%llx\n",
+		      rport->port_id, sdev->lun);
 
 	if (lp->state != LPORT_ST_READY || !(lp->link_up))
-		goto fnic_device_reset_end;
+		return ret;
 
 	/* Check if remote port up */
 	if (fc_remote_port_chkready(rport)) {
 		atomic64_inc(&fnic_stats->misc_stats.rport_not_ready);
-		goto fnic_device_reset_end;
+		return ret;
 	}
 
-	fnic_priv(sc)->flags = FNIC_DEVICE_RESET;
-
-	if (unlikely(tag < 0)) {
-		/*
-		 * For device reset issued through sg3utils, we let
-		 * only one LUN_RESET to go through and use a special
-		 * tag equal to max_tag_id so that we don't have to allocate
-		 * or free it. It won't interact with tags
-		 * allocated by mid layer.
-		 */
-		mutex_lock(&fnic->sgreset_mutex);
-		tag = fnic->fnic_max_tag_id;
-		new_sc = 1;
+	req = scsi_alloc_request(sdev->request_queue, REQ_OP_DRV_IN,
+				 BLK_MQ_REQ_NOWAIT);
+	if (!req) {
+		FNIC_SCSI_DBG(KERN_DEBUG, fnic->lport->host,
+			      "Device reset all commands busy\n");
+		return ret;
 	}
+	sc = blk_mq_rq_to_pdu(req);
+
+	tag = req->tag;
 	io_lock = fnic_io_lock_hash(fnic, sc);
 	spin_lock_irqsave(io_lock, flags);
 	io_req = fnic_priv(sc)->io_req;
+	if (io_req)
+		goto fnic_device_reset_end;
 
-	/*
-	 * If there is a io_req attached to this command, then use it,
-	 * else allocate a new one.
-	 */
+	io_req = mempool_alloc(fnic->io_req_pool, GFP_ATOMIC);
 	if (!io_req) {
-		io_req = mempool_alloc(fnic->io_req_pool, GFP_ATOMIC);
-		if (!io_req) {
-			spin_unlock_irqrestore(io_lock, flags);
-			goto fnic_device_reset_end;
-		}
-		memset(io_req, 0, sizeof(*io_req));
-		io_req->port_id = rport->port_id;
-		fnic_priv(sc)->io_req = io_req;
+		spin_unlock_irqrestore(io_lock, flags);
+		goto fnic_device_reset_end;
 	}
+	memset(io_req, 0, sizeof(*io_req));
+	io_req->port_id = rport->port_id;
+	fnic_priv(sc)->io_req = io_req;
+
 	io_req->dr_done = &tm_done;
+	fnic_priv(sc)->flags = FNIC_DEVICE_RESET;
 	fnic_priv(sc)->state = FNIC_IOREQ_CMD_PENDING;
 	fnic_priv(sc)->lr_status = FCPIO_INVALID_CODE;
 	spin_unlock_irqrestore(io_lock, flags);
@@ -2260,11 +2245,13 @@ int fnic_device_reset(struct scsi_cmnd *sc)
 	 * issue the device reset, if enqueue failed, clean up the ioreq
 	 * and break assoc with scsi cmd
 	 */
+	WRITE_ONCE(req->state, MQ_RQ_IN_FLIGHT);
 	if (fnic_queue_dr_io_req(fnic, sc, io_req)) {
 		spin_lock_irqsave(io_lock, flags);
 		io_req = fnic_priv(sc)->io_req;
 		if (io_req)
 			io_req->dr_done = NULL;
+		WRITE_ONCE(req->state, MQ_RQ_IDLE);
 		goto fnic_device_reset_clean;
 	}
 	spin_lock_irqsave(io_lock, flags);
@@ -2279,11 +2266,12 @@ int fnic_device_reset(struct scsi_cmnd *sc)
 				    msecs_to_jiffies(FNIC_LUN_RESET_TIMEOUT));
 
 	spin_lock_irqsave(io_lock, flags);
+	WRITE_ONCE(req->state, MQ_RQ_IDLE);
 	io_req = fnic_priv(sc)->io_req;
 	if (!io_req) {
 		spin_unlock_irqrestore(io_lock, flags);
 		FNIC_SCSI_DBG(KERN_DEBUG, fnic->lport->host,
-				"io_req is null tag 0x%x sc 0x%p\n", tag, sc);
+				"io_req is null tag 0x%x\n", tag);
 		goto fnic_device_reset_end;
 	}
 	io_req->dr_done = NULL;
@@ -2326,7 +2314,7 @@ int fnic_device_reset(struct scsi_cmnd *sc)
 				spin_unlock_irqrestore(io_lock, flags);
 				FNIC_SCSI_DBG(KERN_DEBUG, fnic->lport->host,
 				"Abort and terminate issued on Device reset "
-				"tag 0x%x sc 0x%p\n", tag, sc);
+				"tag 0x%x\n", tag);
 				break;
 			}
 		}
@@ -2364,7 +2352,7 @@ int fnic_device_reset(struct scsi_cmnd *sc)
 	 * the lun reset cmd. If all cmds get cleaned, the lun reset
 	 * succeeds
 	 */
-	if (fnic_clean_pending_aborts(fnic, sc, new_sc)) {
+	if (fnic_clean_pending_aborts(fnic, sc)) {
 		spin_lock_irqsave(io_lock, flags);
 		io_req = fnic_priv(sc)->io_req;
 		FNIC_SCSI_DBG(KERN_DEBUG, fnic->lport->host,
@@ -2393,15 +2381,15 @@ int fnic_device_reset(struct scsi_cmnd *sc)
 	}
 
 fnic_device_reset_end:
-	FNIC_TRACE(fnic_device_reset, sc->device->host->host_no, rq->tag, sc,
+	FNIC_TRACE(fnic_device_reset, sc->device->host->host_no,
+		  scsi_cmd_to_rq(sc)->tag, sc,
 		  jiffies_to_msecs(jiffies - start_time),
 		  0, ((u64)sc->cmnd[0] << 32 |
 		  (u64)sc->cmnd[2] << 24 | (u64)sc->cmnd[3] << 16 |
 		  (u64)sc->cmnd[4] << 8 | sc->cmnd[5]),
 		  fnic_flags_and_state(sc));
 
-	if (new_sc)
-		mutex_unlock(&fnic->sgreset_mutex);
+	blk_mq_free_request(req);
 
 	FNIC_SCSI_DBG(KERN_DEBUG, fnic->lport->host,
 		      "Returning from device reset %s\n",
@@ -2633,8 +2621,6 @@ static bool fnic_abts_pending_iter(struct scsi_cmnd *sc, void *data)
 	 * ignore this lun reset cmd or cmds that do not belong to
 	 * this lun
 	 */
-	if (iter_data->lr_sc && sc == iter_data->lr_sc)
-		return true;
 	if (iter_data->lun_dev && sc->device != iter_data->lun_dev)
 		return true;
 
@@ -2677,10 +2663,8 @@ int fnic_is_abts_pending(struct fnic *fnic, struct scsi_cmnd *lr_sc)
 		.ret = 0,
 	};
 
-	if (lr_sc) {
+	if (lr_sc)
 		iter_data.lun_dev = lr_sc->device;
-		iter_data.lr_sc = lr_sc;
-	}
 
 	/* walk again to check, if IOs are still pending in fw */
 	scsi_host_busy_iter(fnic->lport->host,
diff --git a/drivers/scsi/snic/snic_scsi.c b/drivers/scsi/snic/snic_scsi.c
index 48261a37d4a6..67b78029e557 100644
--- a/drivers/scsi/snic/snic_scsi.c
+++ b/drivers/scsi/snic/snic_scsi.c
@@ -2358,7 +2358,7 @@ snic_cmpl_pending_tmreq(struct snic *snic, struct scsi_cmnd *sc)
 }
 
 static bool
-snic_scsi_cleanup_iter(struct scsi_cmnd *sc, void *data, bool reserved)
+snic_scsi_cleanup_iter(struct scsi_cmnd *sc, void *data)
 {
 	struct snic *snic = data;
 	struct snic_req_info *rqi = NULL;
@@ -2532,8 +2532,7 @@ struct snic_tgt_scsi_abort_io_data {
 	int abt_cnt;
 };
 
-static bool snic_tgt_scsi_abort_io_iter(struct scsi_cmnd *sc, void *data,
-					bool reserved)
+static bool snic_tgt_scsi_abort_io_iter(struct scsi_cmnd *sc, void *data)
 {
 	struct snic_tgt_scsi_abort_io_data *iter_data = data;
 	struct snic *snic = iter_data->snic;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 14/15] fnic: use fc_block_rport() correctly
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
                   ` (12 preceding siblings ...)
  2023-10-02 15:49 ` [PATCH 13/15] fnic: allocate device reset command on the fly Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  2023-10-02 15:49 ` [PATCH 15/15] csiostor: use separate TMF command Hannes Reinecke
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke,
	Johannes Thumshirn

Use fc_block_rport() instead of fc_block_scsi_eh() and evaluate
the return value to avoid issuing TMF commands when the port
is still blocked.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 drivers/scsi/fnic/fnic_scsi.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 72352ac89eab..25af91081baf 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -1719,9 +1719,6 @@ int fnic_abort_cmd(struct scsi_cmnd *sc)
 	unsigned long abt_issued_time;
 	DECLARE_COMPLETION_ONSTACK(tm_done);
 
-	/* Wait for rport to unblock */
-	fc_block_scsi_eh(sc);
-
 	/* Get local-port, check ready and link up */
 	lp = shost_priv(sc->device->host);
 
@@ -1731,6 +1728,10 @@ int fnic_abort_cmd(struct scsi_cmnd *sc)
 	term_stats = &fnic->fnic_stats.term_stats;
 
 	rport = starget_to_rport(scsi_target(sc->device));
+	ret = fc_block_rport(rport);
+	if (ret)
+		return ret;
+
 	FNIC_SCSI_DBG(KERN_DEBUG,
 		fnic->lport->host,
 		"Abort Cmd called FCID 0x%x, LUN 0x%llx TAG %x flags %x\n",
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 15/15] csiostor: use separate TMF command
  2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
                   ` (13 preceding siblings ...)
  2023-10-02 15:49 ` [PATCH 14/15] fnic: use fc_block_rport() correctly Hannes Reinecke
@ 2023-10-02 15:49 ` Hannes Reinecke
  14 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-02 15:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, linux-scsi, Christoph Hellwig, Hannes Reinecke

Set one command aside as a TMF command, and use this command to
send the TMF. This avoids having to rely on the passed-in scsi
command when resetting the device.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/csiostor/csio_scsi.c | 72 ++++++++++++++++++++-----------
 1 file changed, 48 insertions(+), 24 deletions(-)

diff --git a/drivers/scsi/csiostor/csio_scsi.c b/drivers/scsi/csiostor/csio_scsi.c
index 05e1a63e00c3..08597de49b7f 100644
--- a/drivers/scsi/csiostor/csio_scsi.c
+++ b/drivers/scsi/csiostor/csio_scsi.c
@@ -1724,7 +1724,10 @@ csio_scsi_err_handler(struct csio_hw *hw, struct csio_ioreq *req)
 	}
 
 	cmnd->result = (((host_status) << 16) | scsi_status);
-	scsi_done(cmnd);
+	if (csio_priv(cmnd)->fc_tm_flags == FCP_TMF_LUN_RESET)
+		blk_mq_set_request_complete(scsi_cmd_to_rq(cmnd));
+	else
+		scsi_done(cmnd);
 
 	/* Wake up waiting threads */
 	csio_scsi_cmnd(req) = NULL;
@@ -1752,7 +1755,10 @@ csio_scsi_cbfn(struct csio_hw *hw, struct csio_ioreq *req)
 		}
 
 		cmnd->result = (((host_status) << 16) | scsi_status);
-		scsi_done(cmnd);
+		if (csio_priv(cmnd)->fc_tm_flags == FCP_TMF_LUN_RESET)
+			blk_mq_set_request_complete(scsi_cmd_to_rq(cmnd));
+		else
+			scsi_done(cmnd);
 		csio_scsi_cmnd(req) = NULL;
 		CSIO_INC_STATS(csio_hw_to_scsim(hw), n_tot_success);
 	} else {
@@ -2056,17 +2062,21 @@ csio_tm_cbfn(struct csio_hw *hw, struct csio_ioreq *req)
 
 	/* Wake up the TM handler thread */
 	csio_scsi_cmnd(req) = NULL;
+	cmnd->host_scribble = NULL;
 }
 
 static int
 csio_eh_lun_reset_handler(struct scsi_cmnd *cmnd)
 {
-	struct csio_lnode *ln = shost_priv(cmnd->device->host);
+	struct scsi_device *sdev = cmnd->device;
+	struct csio_lnode *ln = shost_priv(sdev->host);
 	struct csio_hw *hw = csio_lnode_to_hw(ln);
 	struct csio_scsim *scsim = csio_hw_to_scsim(hw);
-	struct csio_rnode *rn = (struct csio_rnode *)(cmnd->device->hostdata);
+	struct csio_rnode *rn = (struct csio_rnode *)(sdev->hostdata);
 	struct csio_ioreq *ioreq = NULL;
 	struct csio_scsi_qset *sqset;
+	struct scsi_cmnd *tmf_cmnd;
+	struct request *req;
 	unsigned long flags;
 	int retval;
 	int count, ret;
@@ -2074,17 +2084,17 @@ csio_eh_lun_reset_handler(struct scsi_cmnd *cmnd)
 	struct csio_scsi_level_data sld;
 
 	if (!rn)
-		goto fail;
+		return FAILED;
 
 	csio_dbg(hw, "Request to reset LUN:%llu (ssni:0x%x tgtid:%d)\n",
-		      cmnd->device->lun, rn->flowid, rn->scsi_id);
+		      sdev->lun, rn->flowid, rn->scsi_id);
 
 	if (!csio_is_lnode_ready(ln)) {
 		csio_err(hw,
 			 "LUN reset cannot be issued on non-ready"
 			 " local node vnpi:0x%x (LUN:%llu)\n",
-			 ln->vnp_flowid, cmnd->device->lun);
-		goto fail;
+			 ln->vnp_flowid, sdev->lun);
+		return FAILED;
 	}
 
 	/* Lnode is ready, now wait on rport node readiness */
@@ -2103,10 +2113,19 @@ csio_eh_lun_reset_handler(struct scsi_cmnd *cmnd)
 		csio_err(hw,
 			 "LUN reset cannot be issued on non-ready"
 			 " remote node ssni:0x%x (LUN:%llu)\n",
-			 rn->flowid, cmnd->device->lun);
-		goto fail;
+			 rn->flowid, sdev->lun);
+		return FAILED;
 	}
 
+	req = scsi_alloc_request(sdev->request_queue, REQ_OP_DRV_IN,
+				 BLK_MQ_REQ_NOWAIT);
+	if (!req) {
+		csio_err(hw,
+			 "LUN reset TMF already busy (LUN:%llu)\n",
+			 sdev->lun);
+		return FAILED;
+	}
+	tmf_cmnd = blk_mq_rq_to_pdu(req);
 	/* Get a free ioreq structure - SM is already set to uninit */
 	ioreq = csio_get_scsi_ioreq_lock(hw, scsim);
 
@@ -2123,11 +2142,11 @@ csio_eh_lun_reset_handler(struct scsi_cmnd *cmnd)
 	ioreq->iq_idx		= sqset->iq_idx;
 	ioreq->eq_idx		= sqset->eq_idx;
 
-	csio_scsi_cmnd(ioreq)	= cmnd;
-	cmnd->host_scribble	= (unsigned char *)ioreq;
-	csio_priv(cmnd)->wr_status = 0;
+	csio_scsi_cmnd(ioreq)	= tmf_cmnd;
+	tmf_cmnd->host_scribble	= (unsigned char *)ioreq;
+	csio_priv(tmf_cmnd)->wr_status = 0;
 
-	csio_priv(cmnd)->fc_tm_flags = FCP_TMF_LUN_RESET;
+	csio_priv(tmf_cmnd)->fc_tm_flags = FCP_TMF_LUN_RESET;
 	ioreq->tmo		= CSIO_SCSI_LUNRST_TMO_MS / 1000;
 
 	/*
@@ -2144,30 +2163,33 @@ csio_eh_lun_reset_handler(struct scsi_cmnd *cmnd)
 	sld.level = CSIO_LEV_LUN;
 	sld.lnode = ioreq->lnode;
 	sld.rnode = ioreq->rnode;
-	sld.oslun = cmnd->device->lun;
+	sld.oslun = sdev->lun;
 
+	WRITE_ONCE(req->state, MQ_RQ_IN_FLIGHT);
 	spin_lock_irqsave(&hw->lock, flags);
 	/* Kick off TM SM on the ioreq */
 	retval = csio_scsi_start_tm(ioreq);
 	spin_unlock_irqrestore(&hw->lock, flags);
 
 	if (retval != 0) {
+		blk_mq_set_request_complete(req);
 		csio_err(hw, "Failed to issue LUN reset, req:%p, status:%d\n",
 			    ioreq, retval);
+		tmf_cmnd->host_scribble = NULL;
 		goto fail_ret_ioreq;
 	}
 
 	csio_dbg(hw, "Waiting max %d secs for LUN reset completion\n",
 		    count * (CSIO_SCSI_TM_POLL_MS / 1000));
 	/* Wait for completion */
-	while ((((struct scsi_cmnd *)csio_scsi_cmnd(ioreq)) == cmnd)
+	while ((((struct scsi_cmnd *)csio_scsi_cmnd(ioreq)) == tmf_cmnd)
 								&& count--)
 		msleep(CSIO_SCSI_TM_POLL_MS);
 
 	/* LUN reset timed-out */
-	if (((struct scsi_cmnd *)csio_scsi_cmnd(ioreq)) == cmnd) {
+	if (((struct scsi_cmnd *)csio_scsi_cmnd(ioreq)) == tmf_cmnd) {
 		csio_err(hw, "LUN reset (%d:%llu) timed out\n",
-			 cmnd->device->id, cmnd->device->lun);
+			 sdev->id, sdev->lun);
 
 		spin_lock_irq(&hw->lock);
 		csio_scsi_drvcleanup(ioreq);
@@ -2178,10 +2200,10 @@ csio_eh_lun_reset_handler(struct scsi_cmnd *cmnd)
 	}
 
 	/* LUN reset returned, check cached status */
-	if (csio_priv(cmnd)->wr_status != FW_SUCCESS) {
+	if (csio_priv(tmf_cmnd)->wr_status != FW_SUCCESS) {
 		csio_err(hw, "LUN reset failed (%d:%llu), status: %d\n",
-			 cmnd->device->id, cmnd->device->lun,
-			 csio_priv(cmnd)->wr_status);
+			 sdev->id, sdev->lun,
+			 csio_priv(tmf_cmnd)->wr_status);
 		goto fail;
 	}
 
@@ -2201,7 +2223,7 @@ csio_eh_lun_reset_handler(struct scsi_cmnd *cmnd)
 	if (retval != 0) {
 		csio_err(hw,
 			 "Attempt to abort I/Os during LUN reset of %llu"
-			 " returned %d\n", cmnd->device->lun, retval);
+			 " returned %d\n", sdev->lun, retval);
 		/* Return I/Os back to active_q */
 		spin_lock_irq(&hw->lock);
 		list_splice_tail_init(&local_q, &scsim->active_q);
@@ -2212,14 +2234,16 @@ csio_eh_lun_reset_handler(struct scsi_cmnd *cmnd)
 	CSIO_INC_STATS(rn, n_lun_rst);
 
 	csio_info(hw, "LUN reset occurred (%d:%llu)\n",
-		  cmnd->device->id, cmnd->device->lun);
-
+		  sdev->id, sdev->lun);
+	blk_mq_free_request(req);
 	return SUCCESS;
 
 fail_ret_ioreq:
 	csio_put_scsi_ioreq_lock(hw, scsim, ioreq);
 fail:
 	CSIO_INC_STATS(rn, n_lun_rst_fail);
+	if (req)
+		blk_mq_free_request(req);
 	return FAILED;
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/15] zfcp: do not wait for rports to become unblocked after host reset
  2023-10-02 15:49 ` [PATCH 01/15] zfcp: do not wait for rports to become unblocked after host reset Hannes Reinecke
@ 2023-10-12 13:54   ` Benjamin Block
  2023-10-12 14:23     ` Hannes Reinecke
  0 siblings, 1 reply; 20+ messages in thread
From: Benjamin Block @ 2023-10-12 13:54 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, linux-scsi,
	Christoph Hellwig, Steffen Maier

Hey Hannes,

I've got a few questions re the rational for this change.

On Mon, Oct 02, 2023 at 05:49:13PM +0200, Hannes Reinecke wrote:
> zfcp_scsi_eh_host_reset_handler() would call fc_block_rport() to
> wait for all rports to become unblocked after host reset.
> But after host reset it might happen that the port is gone, hence
> fc_block_rport() might fail due to a missing port.
> But that's a perfectly legal operation; on FC remote ports might
> come and go.
> In the same vein FC HBAs are able to deal with ports being temporarily
> blocked, so really there is not point in waiting for all ports
> to become unblocked during host reset.

But in scsi_transport_fc.c we have this documented:

    * fc_block_scsi_eh - Block SCSI eh thread for blocked fc_rport
    * @cmnd: SCSI command that scsi_eh is trying to recover
    *
    * This routine can be called from a FC LLD scsi_eh callback. It
    * blocks the scsi_eh thread until the fc_rport leaves the
    * FC_PORTSTATE_BLOCKED, or the fast_io_fail_tmo fires. This is
    * necessary to avoid the scsi_eh failing recovery actions for blocked
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    * rports which would lead to offlined SCSI devices.
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

So I don't understand what the real expectation by the SCSI EH call back for
host reset is then.

Is it that all objects (host/target ports/luns) are operational again once we
return to the EH thread, or is it ok that some parts are still being
recovered (as with our host reset handler, rports might still be blocked after
`zfcp_erp_wait()` finishes, because of how this is organized internally).

If it's the later, I'd think this change is fine. But then I'd wonder why this
function exists in the first place? Is it because in other EH steps it's more
important that rports are ready after the step (e.g. because a TUR is send
after, and if that fails, things get escalate unnecessarily)?

Oh.. speaking of that, we do send a TUR after host reset as well
(`scsi_eh_test_devices()`). So doesn't this break then if one or more rports
are sill blocked after host reset returns? 
    At least `zfcp_scsi_queuecommand()` will bail very early if the rport is
not ready (we call `fc_remote_port_chkready()` as more or less first thing),
and so `scsi_send_eh_cmnd()` that is used for the TUR will fail; then it might
be retried one time, but this is a tight loop without any delay, so I'd guess
this has a good chance to fail as well.
    And then we'd offline the whole host as further escalation, which would
*REALLY* suck (with no automatic recovery no less).

My impression from look at the code that follows `scsi_try_host_reset()` in
`scsi_error.c` really is, it rather expects things to be ready to be used
after, right there and then (admittedly, this is probably already today
problematic, as things might go back to not working concurrently because of
some fabric event.. but anyway, we can life with that off-chance it seems).

Or do I miss something?

> Hence remove the call to fc_block_rport() in host reset.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Cc: Steffen Maier <maier@linux.ibm.com
> Cc: Benjamin Block <bblock@linux.ibm.com>
> ---
>  drivers/s390/scsi/zfcp_scsi.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
> index b2a8cd792266..14f929cca271 100644
> --- a/drivers/s390/scsi/zfcp_scsi.c
> +++ b/drivers/s390/scsi/zfcp_scsi.c
> @@ -383,10 +383,6 @@ static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
>  	}
>  	zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");
>  	zfcp_erp_wait(adapter);
> -	fc_ret = fc_block_scsi_eh(scpnt);
> -	if (fc_ret)
> -		ret = fc_ret;
> -
>  	zfcp_dbf_scsi_eh("schrh_r", adapter, ~0, ret);
>  	return ret;
>  }
> -- 
> 2.35.3
> 

-- 
Best Regards, Benjamin Block        /        Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH    /   https://www.ibm.com/privacy
Vors. Aufs.-R.: Gregor Pillen         /         Geschäftsführung: David Faller
Sitz der Ges.: Böblingen     /    Registergericht: AmtsG Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/15] zfcp: do not wait for rports to become unblocked after host reset
  2023-10-12 13:54   ` Benjamin Block
@ 2023-10-12 14:23     ` Hannes Reinecke
  2023-10-12 17:49       ` Benjamin Block
  0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-12 14:23 UTC (permalink / raw)
  To: Benjamin Block
  Cc: Martin K. Petersen, James Bottomley, linux-scsi,
	Christoph Hellwig, Steffen Maier

On 10/12/23 15:54, Benjamin Block wrote:
> Hey Hannes,
> 
> I've got a few questions re the rational for this change.
> 
> On Mon, Oct 02, 2023 at 05:49:13PM +0200, Hannes Reinecke wrote:
>> zfcp_scsi_eh_host_reset_handler() would call fc_block_rport() to
>> wait for all rports to become unblocked after host reset.
>> But after host reset it might happen that the port is gone, hence
>> fc_block_rport() might fail due to a missing port.
>> But that's a perfectly legal operation; on FC remote ports might
>> come and go.
>> In the same vein FC HBAs are able to deal with ports being temporarily
>> blocked, so really there is not point in waiting for all ports
>> to become unblocked during host reset.
> 
> But in scsi_transport_fc.c we have this documented:
> 
>      * fc_block_scsi_eh - Block SCSI eh thread for blocked fc_rport
>      * @cmnd: SCSI command that scsi_eh is trying to recover
>      *
>      * This routine can be called from a FC LLD scsi_eh callback. It
>      * blocks the scsi_eh thread until the fc_rport leaves the
>      * FC_PORTSTATE_BLOCKED, or the fast_io_fail_tmo fires. This is
>      * necessary to avoid the scsi_eh failing recovery actions for blocked
>                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>      * rports which would lead to offlined SCSI devices.
>        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> So I don't understand what the real expectation by the SCSI EH call back for
> host reset is then.
> 
> Is it that all objects (host/target ports/luns) are operational again once we
> return to the EH thread, or is it ok that some parts are still being
> recovered (as with our host reset handler, rports might still be blocked after
> `zfcp_erp_wait()` finishes, because of how this is organized internally).
> 
> If it's the later, I'd think this change is fine. But then I'd wonder why this
> function exists in the first place? Is it because in other EH steps it's more
> important that rports are ready after the step (e.g. because a TUR is send
> after, and if that fails, things get escalate unnecessarily)?
> 
Thing is, fc_block_scsi_eh() is assumed to be called from eh callbacks
_before_ any TMFs are to be sent.
Typically you would call them in eh_device_reset() or eh_target_reset()
to ensure that you can sent TMFs in the first place; no point in attempting
to send TMFs is the port is blocked.

Your particular case is arguably a mis-use of fc_block_scsi_eh() as
it is called _after_ host reset is initiated, essentially serving as
a completion point to ensure that all rports are back online.

However, for the FC transport implementation rport lifetimes are
decoupled from SCSI Host lifetimes; rports may (and do!) come and
go during the lifetime of a SCSI host. Consequently there is no
difference between a host with all rports blocked (eg during RSCN
processing) and a host just coming on-line after SCSI EH where rports
are still in the process of getting ready.

Hence the use of fc_block_scsi_eh() after host reset is not required,
and we can make our life easier by just dropping the call.

> Oh.. speaking of that, we do send a TUR after host reset as well
> (`scsi_eh_test_devices()`). So doesn't this break then if one or more rports
> are sill blocked after host reset returns?
>      At least `zfcp_scsi_queuecommand()` will bail very early if the rport is
> not ready (we call `fc_remote_port_chkready()` as more or less first thing),
> and so `scsi_send_eh_cmnd()` that is used for the TUR will fail; then it might
> be retried one time, but this is a tight loop without any delay, so I'd guess
> this has a good chance to fail as well.
>      And then we'd offline the whole host as further escalation, which would
> *REALLY* suck (with no automatic recovery no less).
> 
> My impression from look at the code that follows `scsi_try_host_reset()` in
> `scsi_error.c` really is, it rather expects things to be ready to be used
> after, right there and then (admittedly, this is probably already today
> problematic, as things might go back to not working concurrently because of
> some fabric event.. but anyway, we can life with that off-chance it seems).
> 
> Or do I miss something?
> 
Ah, right. True, when the rports are not ready (ie still being blocked)
sending a TEST UNIT READY will fail, with probably unintended consequences.

But: if host reset would return FAST_IO_FAIL everything would be dandy
as then we would just check if the devices are online (by virtue of
scsi_eh_flush_done_q() in scsi_unjam_host()), which they really should
as no-one should have set them offline by then.

So I guess that's the correct way to go.
Will be modifying the patch accordingly.

Thanks for the feedback!

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/15] zfcp: do not wait for rports to become unblocked after host reset
  2023-10-12 14:23     ` Hannes Reinecke
@ 2023-10-12 17:49       ` Benjamin Block
  2023-10-13  6:18         ` Hannes Reinecke
  0 siblings, 1 reply; 20+ messages in thread
From: Benjamin Block @ 2023-10-12 17:49 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, linux-scsi,
	Christoph Hellwig, Steffen Maier

On Thu, Oct 12, 2023 at 04:23:47PM +0200, Hannes Reinecke wrote:
> On 10/12/23 15:54, Benjamin Block wrote:
> > On Mon, Oct 02, 2023 at 05:49:13PM +0200, Hannes Reinecke wrote:
> >> zfcp_scsi_eh_host_reset_handler() would call fc_block_rport() to
> >> wait for all rports to become unblocked after host reset.
> >> But after host reset it might happen that the port is gone, hence
> >> fc_block_rport() might fail due to a missing port.
> >> But that's a perfectly legal operation; on FC remote ports might
> >> come and go.
> >> In the same vein FC HBAs are able to deal with ports being temporarily
> >> blocked, so really there is not point in waiting for all ports
> >> to become unblocked during host reset.
> > 
> > But in scsi_transport_fc.c we have this documented:
> > 
> >      * fc_block_scsi_eh - Block SCSI eh thread for blocked fc_rport
> >      * @cmnd: SCSI command that scsi_eh is trying to recover
> >      *
> >      * This routine can be called from a FC LLD scsi_eh callback. It
> >      * blocks the scsi_eh thread until the fc_rport leaves the
> >      * FC_PORTSTATE_BLOCKED, or the fast_io_fail_tmo fires. This is
> >      * necessary to avoid the scsi_eh failing recovery actions for blocked
> >                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >      * rports which would lead to offlined SCSI devices.
> >        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > So I don't understand what the real expectation by the SCSI EH call back for
> > host reset is then.
> > 
> > Is it that all objects (host/target ports/luns) are operational again once we
> > return to the EH thread, or is it ok that some parts are still being
> > recovered (as with our host reset handler, rports might still be blocked after
> > `zfcp_erp_wait()` finishes, because of how this is organized internally).
> > 
> > If it's the later, I'd think this change is fine. But then I'd wonder why this
> > function exists in the first place? Is it because in other EH steps it's more
> > important that rports are ready after the step (e.g. because a TUR is send
> > after, and if that fails, things get escalate unnecessarily)?
> >
>
> Thing is, fc_block_scsi_eh() is assumed to be called from eh callbacks
> _before_ any TMFs are to be sent.
> Typically you would call them in eh_device_reset() or eh_target_reset()
> to ensure that you can sent TMFs in the first place; no point in attempting
> to send TMFs is the port is blocked.

Ok. Interesting. We don't really care about the state of the rport when
sending TMFs or Aborts, as those commands are sent outside the normal
queuecommand flow (we just check "internal bits"), in case of Aborts we even
hand this off to firmware. Consequently we don't really care about their state
before trying to send either.

> Your particular case is arguably a mis-use of fc_block_scsi_eh() as
> it is called _after_ host reset is initiated, essentially serving as
> a completion point to ensure that all rports are back online.
> 
> However, for the FC transport implementation rport lifetimes are
> decoupled from SCSI Host lifetimes; rports may (and do!) come and
> go during the lifetime of a SCSI host.

Ye, Ack. We can't control Fabric events.

> Consequently there is no
> difference between a host with all rports blocked (eg during RSCN
> processing) and a host just coming on-line after SCSI EH where rports
> are still in the process of getting ready.
> 
> Hence the use of fc_block_scsi_eh() after host reset is not required,
> and we can make our life easier by just dropping the call.
> 
> > Oh.. speaking of that, we do send a TUR after host reset as well
> > (`scsi_eh_test_devices()`). So doesn't this break then if one or more rports
> > are sill blocked after host reset returns?
> >      At least `zfcp_scsi_queuecommand()` will bail very early if the rport is
> > not ready (we call `fc_remote_port_chkready()` as more or less first thing),
> > and so `scsi_send_eh_cmnd()` that is used for the TUR will fail; then it might
> > be retried one time, but this is a tight loop without any delay, so I'd guess
> > this has a good chance to fail as well.
> >      And then we'd offline the whole host as further escalation, which would
> > *REALLY* suck (with no automatic recovery no less).
> > 
> > My impression from look at the code that follows `scsi_try_host_reset()` in
> > `scsi_error.c` really is, it rather expects things to be ready to be used
> > after, right there and then (admittedly, this is probably already today
> > problematic, as things might go back to not working concurrently because of
> > some fabric event.. but anyway, we can life with that off-chance it seems).
> > 
> > Or do I miss something?
> > 
>
> Ah, right. True, when the rports are not ready (ie still being blocked)
> sending a TEST UNIT READY will fail, with probably unintended consequences.
> 
> But: if host reset would return FAST_IO_FAIL everything would be dandy

Ok, so that would mean, we finish all commands left in the EH work_q with
`scsi_eh_finish_cmd()`, and not populate the local `check_list` at all, which
in turns means, we don't do anything in `scsi_eh_test_devices()` (no state
checks, not TURs).

> as then we would just check if the devices are online (by virtue of
> scsi_eh_flush_done_q() in scsi_unjam_host()), which they really should
> as no-one should have set them offline by then.

When returning to `scsi_unjam_host()` directly after we return from
`scsi_eh_host_reset()` we call into `scsi_eh_flush_done_q()` and go over all
commands that are now in the done-queue (everything, if host reset returned
FAST_IO_FAIL).

In there we delete the commands from the EH list, and then check whether we
ought to retry the command on the same SDEV or return it to some upper layer
(i.e. hopefully dm-multipath for our installations).

The former depends on whether the SDEV is online again. If everything is fine
in the SAN (not cable pulled or something), I think this should be the case,
but IFF we assume the rport is still blocked because the async registration
(`zfcp_scsi_rport_work()`) hasn't finished yet (the original point for using
`fc_block_scsi_eh()`), then the SDEV might still be in state
SDEV_TRANSPORT_OFFLINE.
    This can happen during adapter recovery (where we block, IOW call
`fc_remote_port_delete()` on all rports) if fast-io-fail-tmo runs out, and
`fc_terminate_rport_io()` is called.
    That is undone when we call `fc_remote_port_add()` to 'unblock' the rport.
This would then set all SDEVs into RUNNING again. And there we have the
interaction with `fc_block_scsi_eh()` again.

Hmm. I think I could life with both though. If someone drives I/O directly on
the SDEV, and it fails after EH because of some unfortunate timing, that's bad
luck, and something was actually wrong in the SAN if fast-io-fail-tmo runs out
during recovery. They ought to use dm-multipath.
    And if they do, the commands are re-issued from that layer. I think that
should be fine.

So I think we can work with returning FAST_IO_FAIL from
`zfcp_scsi_eh_host_reset_handler()`, and removing the call to
`fc_block_scsi_eh()`.
    We (Steffen and/or I) might still want to look into some other solution
for only returning from that when we know the async rport registrations have
ran at least once after adapter recovery. But as far as your patchset goes, I
don't think that is a gate.


-- 
Best Regards, Benjamin Block        /        Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH    /   https://www.ibm.com/privacy
Vors. Aufs.-R.: Gregor Pillen         /         Geschäftsführung: David Faller
Sitz der Ges.: Böblingen     /    Registergericht: AmtsG Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 01/15] zfcp: do not wait for rports to become unblocked after host reset
  2023-10-12 17:49       ` Benjamin Block
@ 2023-10-13  6:18         ` Hannes Reinecke
  0 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2023-10-13  6:18 UTC (permalink / raw)
  To: Benjamin Block
  Cc: Martin K. Petersen, James Bottomley, linux-scsi,
	Christoph Hellwig, Steffen Maier

On 10/12/23 19:49, Benjamin Block wrote:
> On Thu, Oct 12, 2023 at 04:23:47PM +0200, Hannes Reinecke wrote:
>> On 10/12/23 15:54, Benjamin Block wrote:
>>> On Mon, Oct 02, 2023 at 05:49:13PM +0200, Hannes Reinecke wrote:
>>>> zfcp_scsi_eh_host_reset_handler() would call fc_block_rport() to
>>>> wait for all rports to become unblocked after host reset.
>>>> But after host reset it might happen that the port is gone, hence
>>>> fc_block_rport() might fail due to a missing port.
>>>> But that's a perfectly legal operation; on FC remote ports might
>>>> come and go.
>>>> In the same vein FC HBAs are able to deal with ports being temporarily
>>>> blocked, so really there is not point in waiting for all ports
>>>> to become unblocked during host reset.
>>>
>>> But in scsi_transport_fc.c we have this documented:
>>>
>>>       * fc_block_scsi_eh - Block SCSI eh thread for blocked fc_rport
>>>       * @cmnd: SCSI command that scsi_eh is trying to recover
>>>       *
>>>       * This routine can be called from a FC LLD scsi_eh callback. It
>>>       * blocks the scsi_eh thread until the fc_rport leaves the
>>>       * FC_PORTSTATE_BLOCKED, or the fast_io_fail_tmo fires. This is
>>>       * necessary to avoid the scsi_eh failing recovery actions for blocked
>>>                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>       * rports which would lead to offlined SCSI devices.
>>>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>
>>> So I don't understand what the real expectation by the SCSI EH call back for
>>> host reset is then.
>>>
>>> Is it that all objects (host/target ports/luns) are operational again once we
>>> return to the EH thread, or is it ok that some parts are still being
>>> recovered (as with our host reset handler, rports might still be blocked after
>>> `zfcp_erp_wait()` finishes, because of how this is organized internally).
>>>
>>> If it's the later, I'd think this change is fine. But then I'd wonder why this
>>> function exists in the first place? Is it because in other EH steps it's more
>>> important that rports are ready after the step (e.g. because a TUR is send
>>> after, and if that fails, things get escalate unnecessarily)?
>>>
>>
>> Thing is, fc_block_scsi_eh() is assumed to be called from eh callbacks
>> _before_ any TMFs are to be sent.
>> Typically you would call them in eh_device_reset() or eh_target_reset()
>> to ensure that you can sent TMFs in the first place; no point in attempting
>> to send TMFs is the port is blocked.
> 
> Ok. Interesting. We don't really care about the state of the rport when
> sending TMFs or Aborts, as those commands are sent outside the normal
> queuecommand flow (we just check "internal bits"), in case of Aborts we even
> hand this off to firmware. Consequently we don't really care about their state
> before trying to send either.
> 
Interesting. All others do care about the state of the rport, as for
them sending commands to a blocked rport will just cause
the TMF to fail.

[ .. ]
>>> My impression from look at the code that follows `scsi_try_host_reset()` in
>>> `scsi_error.c` really is, it rather expects things to be ready to be used
>>> after, right there and then (admittedly, this is probably already today
>>> problematic, as things might go back to not working concurrently because of
>>> some fabric event.. but anyway, we can life with that off-chance it seems).
>>>
>>> Or do I miss something?
>>>
>>
>> Ah, right. True, when the rports are not ready (ie still being blocked)
>> sending a TEST UNIT READY will fail, with probably unintended consequences.
>>
>> But: if host reset would return FAST_IO_FAIL everything would be dandy
> 
> Ok, so that would mean, we finish all commands left in the EH work_q with
> `scsi_eh_finish_cmd()`, and not populate the local `check_list` at all, which
> in turns means, we don't do anything in `scsi_eh_test_devices()` (no state
> checks, not TURs).
> 
>> as then we would just check if the devices are online (by virtue of
>> scsi_eh_flush_done_q() in scsi_unjam_host()), which they really should
>> as no-one should have set them offline by then.
> 
> When returning to `scsi_unjam_host()` directly after we return from
> `scsi_eh_host_reset()` we call into `scsi_eh_flush_done_q()` and go over all
> commands that are now in the done-queue (everything, if host reset returned
> FAST_IO_FAIL).
> 
> In there we delete the commands from the EH list, and then check whether we
> ought to retry the command on the same SDEV or return it to some upper layer
> (i.e. hopefully dm-multipath for our installations).
> 
> The former depends on whether the SDEV is online again. If everything is fine
> in the SAN (not cable pulled or something), I think this should be the case,
> but IFF we assume the rport is still blocked because the async registration
> (`zfcp_scsi_rport_work()`) hasn't finished yet (the original point for using
> `fc_block_scsi_eh()`), then the SDEV might still be in state
> SDEV_TRANSPORT_OFFLINE.
>      This can happen during adapter recovery (where we block, IOW call
> `fc_remote_port_delete()` on all rports) if fast-io-fail-tmo runs out, and
> `fc_terminate_rport_io()` is called.
>      That is undone when we call `fc_remote_port_add()` to 'unblock' the rport.
> This would then set all SDEVs into RUNNING again. And there we have the
> interaction with `fc_block_scsi_eh()` again.
> 
Yes, but that is perfectly fine, and in fact exactly as things should
work. Once a device is in SDEV_TRANSPORT_OFFLINE it means that the
underlying rport has been deleted after dev_loss_tmo has expired.
If that happened during SCSI EH, well, tough luck. I/O would have
been aborted even without SCSI EH here.
All fine by me.

> Hmm. I think I could life with both though. If someone drives I/O directly on
> the SDEV, and it fails after EH because of some unfortunate timing, that's bad
> luck, and something was actually wrong in the SAN if fast-io-fail-tmo runs out
> during recovery. They ought to use dm-multipath.
>      And if they do, the commands are re-issued from that layer. I think that
> should be fine.
> 
> So I think we can work with returning FAST_IO_FAIL from
> `zfcp_scsi_eh_host_reset_handler()`, and removing the call to
> `fc_block_scsi_eh()`.
>      We (Steffen and/or I) might still want to look into some other solution
> for only returning from that when we know the async rport registrations have
> ran at least once after adapter recovery. But as far as your patchset goes, I
> don't think that is a gate.
> 
Thanks a lot. Modification to zfcp have been bogging me down for quite some
while, glad that we've found this rather easy solution.

Will be sending an updated version.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-10-13  6:18 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-02 15:49 [PATCHv500/15] scsi: EH rework prep patches, part 2 Hannes Reinecke
2023-10-02 15:49 ` [PATCH 01/15] zfcp: do not wait for rports to become unblocked after host reset Hannes Reinecke
2023-10-12 13:54   ` Benjamin Block
2023-10-12 14:23     ` Hannes Reinecke
2023-10-12 17:49       ` Benjamin Block
2023-10-13  6:18         ` Hannes Reinecke
2023-10-02 15:49 ` [PATCH 02/15] bfa: Do not use scsi command to signal TMF status Hannes Reinecke
2023-10-02 15:49 ` [PATCH 03/15] aha152x: look for stuck command when resetting device Hannes Reinecke
2023-10-02 15:49 ` [PATCH 04/15] a1000u2w: do not rely on the command for inia100_device_reset() Hannes Reinecke
2023-10-02 15:49 ` [PATCH 05/15] fas216: Rework device reset to not rely on SCSI command pointer Hannes Reinecke
2023-10-02 15:49 ` [PATCH 06/15] xen-scsifront: add scsi device as argument to scsifront_do_request() Hannes Reinecke
2023-10-02 15:49 ` [PATCH 07/15] xen-scsifront: rework scsifront_action_handler() Hannes Reinecke
2023-10-02 15:49 ` [PATCH 08/15] libiscsi: use cls_session as argument for target and session reset Hannes Reinecke
2023-10-02 15:49 ` [PATCH 09/15] scsi_transport_iscsi: use session as argument for iscsi_block_scsi_eh() Hannes Reinecke
2023-10-02 15:49 ` [PATCH 10/15] snic: reserve tag for TMF Hannes Reinecke
2023-10-02 15:49 ` [PATCH 11/15] snic: allocate device reset command Hannes Reinecke
2023-10-02 15:49 ` [PATCH 12/15] snic: Use scsi_host_busy_iter() to traverse commands Hannes Reinecke
2023-10-02 15:49 ` [PATCH 13/15] fnic: allocate device reset command on the fly Hannes Reinecke
2023-10-02 15:49 ` [PATCH 14/15] fnic: use fc_block_rport() correctly Hannes Reinecke
2023-10-02 15:49 ` [PATCH 15/15] csiostor: use separate TMF command Hannes Reinecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox