public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Fix bugs in qla2xxx driver
@ 2024-10-08 13:23 Anastasia Kovaleva
  2024-10-08 13:24 ` [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success Anastasia Kovaleva
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Anastasia Kovaleva @ 2024-10-08 13:23 UTC (permalink / raw)
  To: target-devel
  Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley,
	martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani,
	linux-scsi, linux-kernel, linux

This series of patches contains 3 separate changes that fix some bugs in
the qla2xxx driver.

Anastasia Kovaleva (3):
  scsi: qla2xxx: Drop starvation counter on success
  scsi: qla2xxx: make target send correct LOGO
  scsi: qla2xxx: Remove incorrect trap

 drivers/scsi/qla2xxx/qla_iocb.c   | 11 +++++++++++
 drivers/scsi/qla2xxx/qla_isr.c    |  4 ++++
 drivers/scsi/qla2xxx/qla_target.c | 21 ++++++++++++---------
 3 files changed, 27 insertions(+), 9 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success
  2024-10-08 13:23 [PATCH 0/3] Fix bugs in qla2xxx driver Anastasia Kovaleva
@ 2024-10-08 13:24 ` Anastasia Kovaleva
  2024-10-09  6:30   ` Hannes Reinecke
  2024-10-08 13:24 ` [PATCH 2/3] scsi: qla2xxx: make target send correct LOGO Anastasia Kovaleva
  2024-10-08 13:24 ` [PATCH 3/3] scsi: qla2xxx: Remove incorrect trap Anastasia Kovaleva
  2 siblings, 1 reply; 7+ messages in thread
From: Anastasia Kovaleva @ 2024-10-08 13:24 UTC (permalink / raw)
  To: target-devel
  Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley,
	martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani,
	linux-scsi, linux-kernel, linux

Long-lived sessions under high load can accumulate a starvation counter,
and the current implementation does not allow this counter to be reset
during an active session.

If HBA sends correct ATIO IOCB, then it has enough resources to process
commands and we should not call ISP recovery.

Cc: stable@vger.kernel.org
Fixes: ead038556f64 ("qla2xxx: Add Dual mode support in the driver")
Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
---
 drivers/scsi/qla2xxx/qla_isr.c    |  4 ++++
 drivers/scsi/qla2xxx/qla_target.c | 11 +++++++++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index fe98c76e9be3..5234ce0985e0 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -1959,6 +1959,10 @@ qla2x00_async_event(scsi_qla_host_t *vha, struct rsp_que *rsp, uint16_t *mb)
 		ql_dbg(ql_dbg_async, vha, 0x5091, "Transceiver Removal\n");
 		break;
 
+	case MBA_REJECTED_FCP_CMD:
+		ql_dbg(ql_dbg_async, vha, 0x5092, "LS_RJT was sent. No resources to process the ELS request.\n");
+		break;
+
 	default:
 		ql_dbg(ql_dbg_async, vha, 0x5057,
 		    "Unknown AEN:%04x %04x %04x %04x\n",
diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index d7551b1443e4..bc7feef6ee79 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -6801,6 +6801,7 @@ qlt_24xx_process_atio_queue(struct scsi_qla_host *vha, uint8_t ha_locked)
 	struct qla_hw_data *ha = vha->hw;
 	struct atio_from_isp *pkt;
 	int cnt, i;
+	unsigned long flags = 0;
 
 	if (!ha->flags.fw_started)
 		return;
@@ -6826,6 +6827,16 @@ qlt_24xx_process_atio_queue(struct scsi_qla_host *vha, uint8_t ha_locked)
 			qlt_send_term_exchange(ha->base_qpair, NULL, pkt,
 			    ha_locked, 0);
 		} else {
+			/*
+			 * If we get correct ATIO, then HBA had enough memory
+			 * to proceed without reset.
+			 */
+			if (!ha_locked)
+				spin_lock_irqsave(&ha->hardware_lock, flags);
+			vha->hw->exch_starvation = 0;
+			if (!ha_locked)
+				spin_unlock_irqrestore(&ha->hardware_lock, flags);
+
 			qlt_24xx_atio_pkt_all_vps(vha,
 			    (struct atio_from_isp *)pkt, ha_locked);
 		}
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] scsi: qla2xxx: make target send correct LOGO
  2024-10-08 13:23 [PATCH 0/3] Fix bugs in qla2xxx driver Anastasia Kovaleva
  2024-10-08 13:24 ` [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success Anastasia Kovaleva
@ 2024-10-08 13:24 ` Anastasia Kovaleva
  2024-10-09  6:32   ` Hannes Reinecke
  2024-10-08 13:24 ` [PATCH 3/3] scsi: qla2xxx: Remove incorrect trap Anastasia Kovaleva
  2 siblings, 1 reply; 7+ messages in thread
From: Anastasia Kovaleva @ 2024-10-08 13:24 UTC (permalink / raw)
  To: target-devel
  Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley,
	martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani,
	linux-scsi, linux-kernel, linux

Upon removing the ACL from the target, it sends a LOGO command to the
initiator to break the connection. But HBA fills port_name and port_id
of the LOGO command with all zeroes, which is not valid. The initiator
sends a reject for this command, but it is not being processed on the
target, since it assumes LOGO can never fail. This leaves a system in a
state where the initiator thinks it is still logged in to the target and
can send commands to it, but the target ignores all incoming commands
from this initiator.

If, in such a situation, the initiator sends some command (e.g. during a
scan), after not receiving a response for a timeout duration, it sends
ABORT for the command. After a timeout on receiving an ABORT response,
the initiator sends LOGO to the target. Only after that, the initiator
can successfully relogin to the target and restore the connection. In
the end, this whole situation hangs the system for approximately a
minute.

By default, the driver sends a LOGO command to HBA filling only port_id,
expecting HBA to match port_id with the correct port_name from it's
internal table. HBA doesn't do that, instead filling these fields with
all zeroes.

This patch makes the driver send a LOGO command to HBA with port_name
and port_id in the I/O PARMETER fields. HBA then copies these values to
corresponding fields in the LOGO command frame.

Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
---
 drivers/scsi/qla2xxx/qla_iocb.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index 0b41e8a06602..90026fca14dc 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -2486,6 +2486,17 @@ qla24xx_logout_iocb(srb_t *sp, struct logio_entry_24xx *logio)
 	logio->port_id[1] = sp->fcport->d_id.b.area;
 	logio->port_id[2] = sp->fcport->d_id.b.domain;
 	logio->vp_index = sp->vha->vp_idx;
+	logio->io_parameter[0] = cpu_to_le32(sp->vha->d_id.b.al_pa |
+				 sp->vha->d_id.b.area << 8 |
+				 sp->vha->d_id.b.domain << 16);
+	logio->io_parameter[1] = cpu_to_le32(sp->vha->port_name[3] |
+				 sp->vha->port_name[2] << 8 |
+				 sp->vha->port_name[1] << 16 |
+				 sp->vha->port_name[0] << 24);
+	logio->io_parameter[2] = cpu_to_le32(sp->vha->port_name[7] |
+				 sp->vha->port_name[6] << 8 |
+				 sp->vha->port_name[5] << 16 |
+				 sp->vha->port_name[4] << 24);
 }
 
 static void
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] scsi: qla2xxx: Remove incorrect trap
  2024-10-08 13:23 [PATCH 0/3] Fix bugs in qla2xxx driver Anastasia Kovaleva
  2024-10-08 13:24 ` [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success Anastasia Kovaleva
  2024-10-08 13:24 ` [PATCH 2/3] scsi: qla2xxx: make target send correct LOGO Anastasia Kovaleva
@ 2024-10-08 13:24 ` Anastasia Kovaleva
  2024-10-09  6:33   ` Hannes Reinecke
  2 siblings, 1 reply; 7+ messages in thread
From: Anastasia Kovaleva @ 2024-10-08 13:24 UTC (permalink / raw)
  To: target-devel
  Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley,
	martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani,
	linux-scsi, linux-kernel, linux

This BUG_ON() is triggered when there is no fc_port with a certain
loop ID in the scsi host vp_fcports list, but there is one in
lport_loopid_map. As these two data structures do not change
simultaneously and atomically, such a trap is invalid.

Cc: stable@vger.kernel.org
Fixes: 726b85487067 ("qla2xxx: Add framework for async fabric discovery")
Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
---
 drivers/scsi/qla2xxx/qla_target.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index bc7feef6ee79..9a5dbd00de01 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -5190,15 +5190,7 @@ static int qlt_24xx_handle_els(struct scsi_qla_host *vha,
 		ql_dbg(ql_dbg_disc, vha, 0x20fc,
 		    "%s: logo %llx res %d sess %p ",
 		    __func__, wwn, res, sess);
-		if (res == 0) {
-			/*
-			 * cmd went upper layer, look for qlt_xmit_tm_rsp()
-			 * for LOGO_ACK & sess delete
-			 */
-			BUG_ON(!sess);
-			res = 0;
-		} else {
-			/* cmd did not go to upper layer. */
+		if (res) {
 			if (sess) {
 				qlt_schedule_sess_for_deletion(sess);
 				res = 0;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success
  2024-10-08 13:24 ` [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success Anastasia Kovaleva
@ 2024-10-09  6:30   ` Hannes Reinecke
  0 siblings, 0 replies; 7+ messages in thread
From: Hannes Reinecke @ 2024-10-09  6:30 UTC (permalink / raw)
  To: Anastasia Kovaleva, target-devel
  Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley,
	martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani,
	linux-scsi, linux-kernel, linux

On 10/8/24 15:24, Anastasia Kovaleva wrote:
> Long-lived sessions under high load can accumulate a starvation counter,
> and the current implementation does not allow this counter to be reset
> during an active session.
> 
> If HBA sends correct ATIO IOCB, then it has enough resources to process
> commands and we should not call ISP recovery.
> 
> Cc: stable@vger.kernel.org
> Fixes: ead038556f64 ("qla2xxx: Add Dual mode support in the driver")
> Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
> ---
>   drivers/scsi/qla2xxx/qla_isr.c    |  4 ++++
>   drivers/scsi/qla2xxx/qla_target.c | 11 +++++++++++
>   2 files changed, 15 insertions(+)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
> index fe98c76e9be3..5234ce0985e0 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -1959,6 +1959,10 @@ qla2x00_async_event(scsi_qla_host_t *vha, struct rsp_que *rsp, uint16_t *mb)
>   		ql_dbg(ql_dbg_async, vha, 0x5091, "Transceiver Removal\n");
>   		break;
>   
> +	case MBA_REJECTED_FCP_CMD:
> +		ql_dbg(ql_dbg_async, vha, 0x5092, "LS_RJT was sent. No resources to process the ELS request.\n");
> +		break;
> +
>   	default:
>   		ql_dbg(ql_dbg_async, vha, 0x5057,
>   		    "Unknown AEN:%04x %04x %04x %04x\n",
> diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> index d7551b1443e4..bc7feef6ee79 100644
> --- a/drivers/scsi/qla2xxx/qla_target.c
> +++ b/drivers/scsi/qla2xxx/qla_target.c
> @@ -6801,6 +6801,7 @@ qlt_24xx_process_atio_queue(struct scsi_qla_host *vha, uint8_t ha_locked)
>   	struct qla_hw_data *ha = vha->hw;
>   	struct atio_from_isp *pkt;
>   	int cnt, i;
> +	unsigned long flags = 0;
>   
>   	if (!ha->flags.fw_started)
>   		return;
> @@ -6826,6 +6827,16 @@ qlt_24xx_process_atio_queue(struct scsi_qla_host *vha, uint8_t ha_locked)
>   			qlt_send_term_exchange(ha->base_qpair, NULL, pkt,
>   			    ha_locked, 0);
>   		} else {
> +			/*
> +			 * If we get correct ATIO, then HBA had enough memory
> +			 * to proceed without reset.
> +			 */
> +			if (!ha_locked)
> +				spin_lock_irqsave(&ha->hardware_lock, flags);
> +			vha->hw->exch_starvation = 0;
> +			if (!ha_locked)
> +				spin_unlock_irqrestore(&ha->hardware_lock, flags);
> +
>   			qlt_24xx_atio_pkt_all_vps(vha,
>   			    (struct atio_from_isp *)pkt, ha_locked);
>   		}

Why not just 'WRITE_ONCE()' and drop the spinlock?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.com                               +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/3] scsi: qla2xxx: make target send correct LOGO
  2024-10-08 13:24 ` [PATCH 2/3] scsi: qla2xxx: make target send correct LOGO Anastasia Kovaleva
@ 2024-10-09  6:32   ` Hannes Reinecke
  0 siblings, 0 replies; 7+ messages in thread
From: Hannes Reinecke @ 2024-10-09  6:32 UTC (permalink / raw)
  To: Anastasia Kovaleva, target-devel
  Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley,
	martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani,
	linux-scsi, linux-kernel, linux

On 10/8/24 15:24, Anastasia Kovaleva wrote:
> Upon removing the ACL from the target, it sends a LOGO command to the
> initiator to break the connection. But HBA fills port_name and port_id
> of the LOGO command with all zeroes, which is not valid. The initiator
> sends a reject for this command, but it is not being processed on the
> target, since it assumes LOGO can never fail. This leaves a system in a
> state where the initiator thinks it is still logged in to the target and
> can send commands to it, but the target ignores all incoming commands
> from this initiator.
> 
> If, in such a situation, the initiator sends some command (e.g. during a
> scan), after not receiving a response for a timeout duration, it sends
> ABORT for the command. After a timeout on receiving an ABORT response,
> the initiator sends LOGO to the target. Only after that, the initiator
> can successfully relogin to the target and restore the connection. In
> the end, this whole situation hangs the system for approximately a
> minute.
> 
> By default, the driver sends a LOGO command to HBA filling only port_id,
> expecting HBA to match port_id with the correct port_name from it's
> internal table. HBA doesn't do that, instead filling these fields with
> all zeroes.
> 
> This patch makes the driver send a LOGO command to HBA with port_name
> and port_id in the I/O PARMETER fields. HBA then copies these values to
> corresponding fields in the LOGO command frame.
> 
> Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
> ---
>   drivers/scsi/qla2xxx/qla_iocb.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
> index 0b41e8a06602..90026fca14dc 100644
> --- a/drivers/scsi/qla2xxx/qla_iocb.c
> +++ b/drivers/scsi/qla2xxx/qla_iocb.c
> @@ -2486,6 +2486,17 @@ qla24xx_logout_iocb(srb_t *sp, struct logio_entry_24xx *logio)
>   	logio->port_id[1] = sp->fcport->d_id.b.area;
>   	logio->port_id[2] = sp->fcport->d_id.b.domain;
>   	logio->vp_index = sp->vha->vp_idx;
> +	logio->io_parameter[0] = cpu_to_le32(sp->vha->d_id.b.al_pa |
> +				 sp->vha->d_id.b.area << 8 |
> +				 sp->vha->d_id.b.domain << 16);
> +	logio->io_parameter[1] = cpu_to_le32(sp->vha->port_name[3] |
> +				 sp->vha->port_name[2] << 8 |
> +				 sp->vha->port_name[1] << 16 |
> +				 sp->vha->port_name[0] << 24);
> +	logio->io_parameter[2] = cpu_to_le32(sp->vha->port_name[7] |
> +				 sp->vha->port_name[6] << 8 |
> +				 sp->vha->port_name[5] << 16 |
> +				 sp->vha->port_name[4] << 24);
>   }
>   
>   static void

Now that looks like serious debugging. Well done.

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.com                               +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] scsi: qla2xxx: Remove incorrect trap
  2024-10-08 13:24 ` [PATCH 3/3] scsi: qla2xxx: Remove incorrect trap Anastasia Kovaleva
@ 2024-10-09  6:33   ` Hannes Reinecke
  0 siblings, 0 replies; 7+ messages in thread
From: Hannes Reinecke @ 2024-10-09  6:33 UTC (permalink / raw)
  To: Anastasia Kovaleva, target-devel
  Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley,
	martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani,
	linux-scsi, linux-kernel, linux

On 10/8/24 15:24, Anastasia Kovaleva wrote:
> This BUG_ON() is triggered when there is no fc_port with a certain
> loop ID in the scsi host vp_fcports list, but there is one in
> lport_loopid_map. As these two data structures do not change
> simultaneously and atomically, such a trap is invalid.
> 
> Cc: stable@vger.kernel.org
> Fixes: 726b85487067 ("qla2xxx: Add framework for async fabric discovery")
> Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
> ---
>   drivers/scsi/qla2xxx/qla_target.c | 10 +---------
>   1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> index bc7feef6ee79..9a5dbd00de01 100644
> --- a/drivers/scsi/qla2xxx/qla_target.c
> +++ b/drivers/scsi/qla2xxx/qla_target.c
> @@ -5190,15 +5190,7 @@ static int qlt_24xx_handle_els(struct scsi_qla_host *vha,
>   		ql_dbg(ql_dbg_disc, vha, 0x20fc,
>   		    "%s: logo %llx res %d sess %p ",
>   		    __func__, wwn, res, sess);
> -		if (res == 0) {
> -			/*
> -			 * cmd went upper layer, look for qlt_xmit_tm_rsp()
> -			 * for LOGO_ACK & sess delete
> -			 */
> -			BUG_ON(!sess);
> -			res = 0;
> -		} else {
> -			/* cmd did not go to upper layer. */
> +		if (res) {
>   			if (sess) {
>   				qlt_schedule_sess_for_deletion(sess);
>   				res = 0;

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.com                               +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-10-09  6:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-08 13:23 [PATCH 0/3] Fix bugs in qla2xxx driver Anastasia Kovaleva
2024-10-08 13:24 ` [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success Anastasia Kovaleva
2024-10-09  6:30   ` Hannes Reinecke
2024-10-08 13:24 ` [PATCH 2/3] scsi: qla2xxx: make target send correct LOGO Anastasia Kovaleva
2024-10-09  6:32   ` Hannes Reinecke
2024-10-08 13:24 ` [PATCH 3/3] scsi: qla2xxx: Remove incorrect trap Anastasia Kovaleva
2024-10-09  6:33   ` Hannes Reinecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox