* [PATCH 0/3] Fix bugs in qla2xxx driver
@ 2024-10-08 13:23 Anastasia Kovaleva
2024-10-08 13:24 ` [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success Anastasia Kovaleva
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Anastasia Kovaleva @ 2024-10-08 13:23 UTC (permalink / raw)
To: target-devel
Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley,
martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani,
linux-scsi, linux-kernel, linux
This series of patches contains 3 separate changes that fix some bugs in
the qla2xxx driver.
Anastasia Kovaleva (3):
scsi: qla2xxx: Drop starvation counter on success
scsi: qla2xxx: make target send correct LOGO
scsi: qla2xxx: Remove incorrect trap
drivers/scsi/qla2xxx/qla_iocb.c | 11 +++++++++++
drivers/scsi/qla2xxx/qla_isr.c | 4 ++++
drivers/scsi/qla2xxx/qla_target.c | 21 ++++++++++++---------
3 files changed, 27 insertions(+), 9 deletions(-)
--
2.40.1
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success 2024-10-08 13:23 [PATCH 0/3] Fix bugs in qla2xxx driver Anastasia Kovaleva @ 2024-10-08 13:24 ` Anastasia Kovaleva 2024-10-09 6:30 ` Hannes Reinecke 2024-10-08 13:24 ` [PATCH 2/3] scsi: qla2xxx: make target send correct LOGO Anastasia Kovaleva 2024-10-08 13:24 ` [PATCH 3/3] scsi: qla2xxx: Remove incorrect trap Anastasia Kovaleva 2 siblings, 1 reply; 7+ messages in thread From: Anastasia Kovaleva @ 2024-10-08 13:24 UTC (permalink / raw) To: target-devel Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley, martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani, linux-scsi, linux-kernel, linux Long-lived sessions under high load can accumulate a starvation counter, and the current implementation does not allow this counter to be reset during an active session. If HBA sends correct ATIO IOCB, then it has enough resources to process commands and we should not call ISP recovery. Cc: stable@vger.kernel.org Fixes: ead038556f64 ("qla2xxx: Add Dual mode support in the driver") Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> --- drivers/scsi/qla2xxx/qla_isr.c | 4 ++++ drivers/scsi/qla2xxx/qla_target.c | 11 +++++++++++ 2 files changed, 15 insertions(+) diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index fe98c76e9be3..5234ce0985e0 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -1959,6 +1959,10 @@ qla2x00_async_event(scsi_qla_host_t *vha, struct rsp_que *rsp, uint16_t *mb) ql_dbg(ql_dbg_async, vha, 0x5091, "Transceiver Removal\n"); break; + case MBA_REJECTED_FCP_CMD: + ql_dbg(ql_dbg_async, vha, 0x5092, "LS_RJT was sent. No resources to process the ELS request.\n"); + break; + default: ql_dbg(ql_dbg_async, vha, 0x5057, "Unknown AEN:%04x %04x %04x %04x\n", diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c index d7551b1443e4..bc7feef6ee79 100644 --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -6801,6 +6801,7 @@ qlt_24xx_process_atio_queue(struct scsi_qla_host *vha, uint8_t ha_locked) struct qla_hw_data *ha = vha->hw; struct atio_from_isp *pkt; int cnt, i; + unsigned long flags = 0; if (!ha->flags.fw_started) return; @@ -6826,6 +6827,16 @@ qlt_24xx_process_atio_queue(struct scsi_qla_host *vha, uint8_t ha_locked) qlt_send_term_exchange(ha->base_qpair, NULL, pkt, ha_locked, 0); } else { + /* + * If we get correct ATIO, then HBA had enough memory + * to proceed without reset. + */ + if (!ha_locked) + spin_lock_irqsave(&ha->hardware_lock, flags); + vha->hw->exch_starvation = 0; + if (!ha_locked) + spin_unlock_irqrestore(&ha->hardware_lock, flags); + qlt_24xx_atio_pkt_all_vps(vha, (struct atio_from_isp *)pkt, ha_locked); } -- 2.40.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success 2024-10-08 13:24 ` [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success Anastasia Kovaleva @ 2024-10-09 6:30 ` Hannes Reinecke 0 siblings, 0 replies; 7+ messages in thread From: Hannes Reinecke @ 2024-10-09 6:30 UTC (permalink / raw) To: Anastasia Kovaleva, target-devel Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley, martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani, linux-scsi, linux-kernel, linux On 10/8/24 15:24, Anastasia Kovaleva wrote: > Long-lived sessions under high load can accumulate a starvation counter, > and the current implementation does not allow this counter to be reset > during an active session. > > If HBA sends correct ATIO IOCB, then it has enough resources to process > commands and we should not call ISP recovery. > > Cc: stable@vger.kernel.org > Fixes: ead038556f64 ("qla2xxx: Add Dual mode support in the driver") > Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> > Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> > --- > drivers/scsi/qla2xxx/qla_isr.c | 4 ++++ > drivers/scsi/qla2xxx/qla_target.c | 11 +++++++++++ > 2 files changed, 15 insertions(+) > > diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c > index fe98c76e9be3..5234ce0985e0 100644 > --- a/drivers/scsi/qla2xxx/qla_isr.c > +++ b/drivers/scsi/qla2xxx/qla_isr.c > @@ -1959,6 +1959,10 @@ qla2x00_async_event(scsi_qla_host_t *vha, struct rsp_que *rsp, uint16_t *mb) > ql_dbg(ql_dbg_async, vha, 0x5091, "Transceiver Removal\n"); > break; > > + case MBA_REJECTED_FCP_CMD: > + ql_dbg(ql_dbg_async, vha, 0x5092, "LS_RJT was sent. No resources to process the ELS request.\n"); > + break; > + > default: > ql_dbg(ql_dbg_async, vha, 0x5057, > "Unknown AEN:%04x %04x %04x %04x\n", > diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c > index d7551b1443e4..bc7feef6ee79 100644 > --- a/drivers/scsi/qla2xxx/qla_target.c > +++ b/drivers/scsi/qla2xxx/qla_target.c > @@ -6801,6 +6801,7 @@ qlt_24xx_process_atio_queue(struct scsi_qla_host *vha, uint8_t ha_locked) > struct qla_hw_data *ha = vha->hw; > struct atio_from_isp *pkt; > int cnt, i; > + unsigned long flags = 0; > > if (!ha->flags.fw_started) > return; > @@ -6826,6 +6827,16 @@ qlt_24xx_process_atio_queue(struct scsi_qla_host *vha, uint8_t ha_locked) > qlt_send_term_exchange(ha->base_qpair, NULL, pkt, > ha_locked, 0); > } else { > + /* > + * If we get correct ATIO, then HBA had enough memory > + * to proceed without reset. > + */ > + if (!ha_locked) > + spin_lock_irqsave(&ha->hardware_lock, flags); > + vha->hw->exch_starvation = 0; > + if (!ha_locked) > + spin_unlock_irqrestore(&ha->hardware_lock, flags); > + > qlt_24xx_atio_pkt_all_vps(vha, > (struct atio_from_isp *)pkt, ha_locked); > } Why not just 'WRITE_ONCE()' and drop the spinlock? Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.com +49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/3] scsi: qla2xxx: make target send correct LOGO 2024-10-08 13:23 [PATCH 0/3] Fix bugs in qla2xxx driver Anastasia Kovaleva 2024-10-08 13:24 ` [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success Anastasia Kovaleva @ 2024-10-08 13:24 ` Anastasia Kovaleva 2024-10-09 6:32 ` Hannes Reinecke 2024-10-08 13:24 ` [PATCH 3/3] scsi: qla2xxx: Remove incorrect trap Anastasia Kovaleva 2 siblings, 1 reply; 7+ messages in thread From: Anastasia Kovaleva @ 2024-10-08 13:24 UTC (permalink / raw) To: target-devel Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley, martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani, linux-scsi, linux-kernel, linux Upon removing the ACL from the target, it sends a LOGO command to the initiator to break the connection. But HBA fills port_name and port_id of the LOGO command with all zeroes, which is not valid. The initiator sends a reject for this command, but it is not being processed on the target, since it assumes LOGO can never fail. This leaves a system in a state where the initiator thinks it is still logged in to the target and can send commands to it, but the target ignores all incoming commands from this initiator. If, in such a situation, the initiator sends some command (e.g. during a scan), after not receiving a response for a timeout duration, it sends ABORT for the command. After a timeout on receiving an ABORT response, the initiator sends LOGO to the target. Only after that, the initiator can successfully relogin to the target and restore the connection. In the end, this whole situation hangs the system for approximately a minute. By default, the driver sends a LOGO command to HBA filling only port_id, expecting HBA to match port_id with the correct port_name from it's internal table. HBA doesn't do that, instead filling these fields with all zeroes. This patch makes the driver send a LOGO command to HBA with port_name and port_id in the I/O PARMETER fields. HBA then copies these values to corresponding fields in the LOGO command frame. Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> --- drivers/scsi/qla2xxx/qla_iocb.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c index 0b41e8a06602..90026fca14dc 100644 --- a/drivers/scsi/qla2xxx/qla_iocb.c +++ b/drivers/scsi/qla2xxx/qla_iocb.c @@ -2486,6 +2486,17 @@ qla24xx_logout_iocb(srb_t *sp, struct logio_entry_24xx *logio) logio->port_id[1] = sp->fcport->d_id.b.area; logio->port_id[2] = sp->fcport->d_id.b.domain; logio->vp_index = sp->vha->vp_idx; + logio->io_parameter[0] = cpu_to_le32(sp->vha->d_id.b.al_pa | + sp->vha->d_id.b.area << 8 | + sp->vha->d_id.b.domain << 16); + logio->io_parameter[1] = cpu_to_le32(sp->vha->port_name[3] | + sp->vha->port_name[2] << 8 | + sp->vha->port_name[1] << 16 | + sp->vha->port_name[0] << 24); + logio->io_parameter[2] = cpu_to_le32(sp->vha->port_name[7] | + sp->vha->port_name[6] << 8 | + sp->vha->port_name[5] << 16 | + sp->vha->port_name[4] << 24); } static void -- 2.40.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/3] scsi: qla2xxx: make target send correct LOGO 2024-10-08 13:24 ` [PATCH 2/3] scsi: qla2xxx: make target send correct LOGO Anastasia Kovaleva @ 2024-10-09 6:32 ` Hannes Reinecke 0 siblings, 0 replies; 7+ messages in thread From: Hannes Reinecke @ 2024-10-09 6:32 UTC (permalink / raw) To: Anastasia Kovaleva, target-devel Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley, martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani, linux-scsi, linux-kernel, linux On 10/8/24 15:24, Anastasia Kovaleva wrote: > Upon removing the ACL from the target, it sends a LOGO command to the > initiator to break the connection. But HBA fills port_name and port_id > of the LOGO command with all zeroes, which is not valid. The initiator > sends a reject for this command, but it is not being processed on the > target, since it assumes LOGO can never fail. This leaves a system in a > state where the initiator thinks it is still logged in to the target and > can send commands to it, but the target ignores all incoming commands > from this initiator. > > If, in such a situation, the initiator sends some command (e.g. during a > scan), after not receiving a response for a timeout duration, it sends > ABORT for the command. After a timeout on receiving an ABORT response, > the initiator sends LOGO to the target. Only after that, the initiator > can successfully relogin to the target and restore the connection. In > the end, this whole situation hangs the system for approximately a > minute. > > By default, the driver sends a LOGO command to HBA filling only port_id, > expecting HBA to match port_id with the correct port_name from it's > internal table. HBA doesn't do that, instead filling these fields with > all zeroes. > > This patch makes the driver send a LOGO command to HBA with port_name > and port_id in the I/O PARMETER fields. HBA then copies these values to > corresponding fields in the LOGO command frame. > > Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> > Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> > --- > drivers/scsi/qla2xxx/qla_iocb.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c > index 0b41e8a06602..90026fca14dc 100644 > --- a/drivers/scsi/qla2xxx/qla_iocb.c > +++ b/drivers/scsi/qla2xxx/qla_iocb.c > @@ -2486,6 +2486,17 @@ qla24xx_logout_iocb(srb_t *sp, struct logio_entry_24xx *logio) > logio->port_id[1] = sp->fcport->d_id.b.area; > logio->port_id[2] = sp->fcport->d_id.b.domain; > logio->vp_index = sp->vha->vp_idx; > + logio->io_parameter[0] = cpu_to_le32(sp->vha->d_id.b.al_pa | > + sp->vha->d_id.b.area << 8 | > + sp->vha->d_id.b.domain << 16); > + logio->io_parameter[1] = cpu_to_le32(sp->vha->port_name[3] | > + sp->vha->port_name[2] << 8 | > + sp->vha->port_name[1] << 16 | > + sp->vha->port_name[0] << 24); > + logio->io_parameter[2] = cpu_to_le32(sp->vha->port_name[7] | > + sp->vha->port_name[6] << 8 | > + sp->vha->port_name[5] << 16 | > + sp->vha->port_name[4] << 24); > } > > static void Now that looks like serious debugging. Well done. Reviewed-by: Hannes Reinecke <hare@suse.de> Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.com +49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 3/3] scsi: qla2xxx: Remove incorrect trap 2024-10-08 13:23 [PATCH 0/3] Fix bugs in qla2xxx driver Anastasia Kovaleva 2024-10-08 13:24 ` [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success Anastasia Kovaleva 2024-10-08 13:24 ` [PATCH 2/3] scsi: qla2xxx: make target send correct LOGO Anastasia Kovaleva @ 2024-10-08 13:24 ` Anastasia Kovaleva 2024-10-09 6:33 ` Hannes Reinecke 2 siblings, 1 reply; 7+ messages in thread From: Anastasia Kovaleva @ 2024-10-08 13:24 UTC (permalink / raw) To: target-devel Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley, martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani, linux-scsi, linux-kernel, linux This BUG_ON() is triggered when there is no fc_port with a certain loop ID in the scsi host vp_fcports list, but there is one in lport_loopid_map. As these two data structures do not change simultaneously and atomically, such a trap is invalid. Cc: stable@vger.kernel.org Fixes: 726b85487067 ("qla2xxx: Add framework for async fabric discovery") Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> --- drivers/scsi/qla2xxx/qla_target.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c index bc7feef6ee79..9a5dbd00de01 100644 --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -5190,15 +5190,7 @@ static int qlt_24xx_handle_els(struct scsi_qla_host *vha, ql_dbg(ql_dbg_disc, vha, 0x20fc, "%s: logo %llx res %d sess %p ", __func__, wwn, res, sess); - if (res == 0) { - /* - * cmd went upper layer, look for qlt_xmit_tm_rsp() - * for LOGO_ACK & sess delete - */ - BUG_ON(!sess); - res = 0; - } else { - /* cmd did not go to upper layer. */ + if (res) { if (sess) { qlt_schedule_sess_for_deletion(sess); res = 0; -- 2.40.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 3/3] scsi: qla2xxx: Remove incorrect trap 2024-10-08 13:24 ` [PATCH 3/3] scsi: qla2xxx: Remove incorrect trap Anastasia Kovaleva @ 2024-10-09 6:33 ` Hannes Reinecke 0 siblings, 0 replies; 7+ messages in thread From: Hannes Reinecke @ 2024-10-09 6:33 UTC (permalink / raw) To: Anastasia Kovaleva, target-devel Cc: njavali, GR-QLogic-Storage-Upstream, James.Bottomley, martin.petersen, bvanassche, quinn.tran, nab, himanshu.madhani, linux-scsi, linux-kernel, linux On 10/8/24 15:24, Anastasia Kovaleva wrote: > This BUG_ON() is triggered when there is no fc_port with a certain > loop ID in the scsi host vp_fcports list, but there is one in > lport_loopid_map. As these two data structures do not change > simultaneously and atomically, such a trap is invalid. > > Cc: stable@vger.kernel.org > Fixes: 726b85487067 ("qla2xxx: Add framework for async fabric discovery") > Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> > Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> > --- > drivers/scsi/qla2xxx/qla_target.c | 10 +--------- > 1 file changed, 1 insertion(+), 9 deletions(-) > > diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c > index bc7feef6ee79..9a5dbd00de01 100644 > --- a/drivers/scsi/qla2xxx/qla_target.c > +++ b/drivers/scsi/qla2xxx/qla_target.c > @@ -5190,15 +5190,7 @@ static int qlt_24xx_handle_els(struct scsi_qla_host *vha, > ql_dbg(ql_dbg_disc, vha, 0x20fc, > "%s: logo %llx res %d sess %p ", > __func__, wwn, res, sess); > - if (res == 0) { > - /* > - * cmd went upper layer, look for qlt_xmit_tm_rsp() > - * for LOGO_ACK & sess delete > - */ > - BUG_ON(!sess); > - res = 0; > - } else { > - /* cmd did not go to upper layer. */ > + if (res) { > if (sess) { > qlt_schedule_sess_for_deletion(sess); > res = 0; Reviewed-by: Hannes Reinecke <hare@suse.de> Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.com +49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-10-09 6:33 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-10-08 13:23 [PATCH 0/3] Fix bugs in qla2xxx driver Anastasia Kovaleva 2024-10-08 13:24 ` [PATCH 1/3] scsi: qla2xxx: Drop starvation counter on success Anastasia Kovaleva 2024-10-09 6:30 ` Hannes Reinecke 2024-10-08 13:24 ` [PATCH 2/3] scsi: qla2xxx: make target send correct LOGO Anastasia Kovaleva 2024-10-09 6:32 ` Hannes Reinecke 2024-10-08 13:24 ` [PATCH 3/3] scsi: qla2xxx: Remove incorrect trap Anastasia Kovaleva 2024-10-09 6:33 ` Hannes Reinecke
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox