From: Damien Le Moal <damien.lemoal@opensource.wdc.com>
To: Niklas Cassel <niklas.cassel@wdc.com>, tj@kernel.org, hare@suse.de
Cc: linux-ide@vger.kernel.org
Subject: Re: [PATCH v3] ata: libata-core: do not issue non-internal commands once EH is pending
Date: Fri, 11 Nov 2022 17:55:25 +0900 [thread overview]
Message-ID: <8a9a84da-13fc-04e9-ec23-807933b616e4@opensource.wdc.com> (raw)
In-Reply-To: <20221108231534.982516-1-niklas.cassel@wdc.com>
On 11/9/22 08:15, Niklas Cassel wrote:
> While the ATA specification states that a device should return command
> aborted for all commands queued after the device has entered error state,
> since ATA only keeps the sense data for the latest command (in non-NCQ
> case), we really don't want to send block layer commands to the device
> after it has entered error state. (Only ATA EH commands should be sent,
> to read the sense data etc.)
>
> Currently, scsi_queue_rq() will check if scsi_host_in_recovery()
> (state is SHOST_RECOVERY), and if so, it will _not_ issue a command via:
> scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
> -> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()
>
> Before commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
> when receiving a TFES error IRQ, the call chain looked like this:
> ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
> ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
> blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
> scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)
>
> Which meant that as soon as an error IRQ was serviced, SHOST_RECOVERY
> would be set.
>
> However, after commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
> scsi_times_out() will instead call scsi_abort_command() which will queue
> delayed work, and the worker function scmd_eh_abort_handler() will call
> scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).
>
> So now, after the TFES error IRQ has been serviced, we need to wait for
> the SCSI workqueue to run its work before SHOST_RECOVERY gets set.
>
> It is worth noting that, even before commit e494f6a72839 ("[SCSI] improved
> eh timeout handler"), we could receive an error IRQ from the time when
> scsi_queue_rq() checks scsi_host_in_recovery(), to the time when
> ata_scsi_queuecmd() is actually called.
>
> In order to handle both the delayed setting of SHOST_RECOVERY and the
> window where we can receive an error IRQ, add a check against
> ATA_PFLAG_EH_PENDING (which gets set when servicing the error IRQ),
> inside ata_scsi_queuecmd() itself, while holding the ap->lock.
> (Since the ap->lock is held while servicing IRQs.)
>
> Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler")
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Applied to for-6.1-fixes. Thanks !
> ---
> Changes since v2:
> -Improve commit message and the comment inside the code.
> -Move the check to __ata_scsi_queuecmd().
>
> drivers/ata/libata-scsi.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 4cb914103382..93ebcdf2e354 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3978,9 +3978,19 @@ static inline ata_xlat_func_t ata_get_xlat_func(struct ata_device *dev, u8 cmd)
>
> int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev)
> {
> + struct ata_port *ap = dev->link->ap;
> u8 scsi_op = scmd->cmnd[0];
> ata_xlat_func_t xlat_func;
>
> + /*
> + * scsi_queue_rq() will defer commands if scsi_host_in_recovery().
> + * However, this check is done without holding the ap->lock (a libata
> + * specific lock), so we can have received an error irq since then,
> + * therefore we must check if EH is pending, while holding ap->lock.
> + */
> + if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS))
> + return SCSI_MLQUEUE_DEVICE_BUSY;
> +
> if (unlikely(!scmd->cmd_len))
> goto bad_cdb_len;
>
--
Damien Le Moal
Western Digital Research
next prev parent reply other threads:[~2022-11-11 8:55 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-08 23:15 [PATCH v3] ata: libata-core: do not issue non-internal commands once EH is pending Niklas Cassel
2022-11-09 5:15 ` Damien Le Moal
2022-11-09 7:57 ` John Garry
2022-11-11 8:55 ` Damien Le Moal [this message]
2022-11-11 10:11 ` John Garry
2022-11-12 1:41 ` Damien Le Moal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8a9a84da-13fc-04e9-ec23-807933b616e4@opensource.wdc.com \
--to=damien.lemoal@opensource.wdc.com \
--cc=hare@suse.de \
--cc=linux-ide@vger.kernel.org \
--cc=niklas.cassel@wdc.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox