From: John Garry <john.g.garry@oracle.com>
To: Damien Le Moal <damien.lemoal@opensource.wdc.com>,
Niklas Cassel <niklas.cassel@wdc.com>,
tj@kernel.org, hare@suse.de
Cc: linux-ide@vger.kernel.org, chenxiang66@hisilicon.com
Subject: Re: [PATCH v3] ata: libata-core: do not issue non-internal commands once EH is pending
Date: Wed, 9 Nov 2022 07:57:32 +0000 [thread overview]
Message-ID: <bb443aaa-ebc4-5302-9962-a5bca6143c3a@oracle.com> (raw)
In-Reply-To: <5b63945b-b12c-480c-3b3d-cea9420d08d4@opensource.wdc.com>
On 09/11/2022 05:15, Damien Le Moal wrote:
+
> On 11/9/22 08:15, Niklas Cassel wrote:
>> While the ATA specification states that a device should return command
>> aborted for all commands queued after the device has entered error state,
>> since ATA only keeps the sense data for the latest command (in non-NCQ
>> case), we really don't want to send block layer commands to the device
>> after it has entered error state. (Only ATA EH commands should be sent,
>> to read the sense data etc.)
>>
>> Currently, scsi_queue_rq() will check if scsi_host_in_recovery()
>> (state is SHOST_RECOVERY), and if so, it will _not_ issue a command via:
>> scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
>> -> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()
>>
>> Before commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
>> when receiving a TFES error IRQ, the call chain looked like this:
>> ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
>> ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
>> blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
>> scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)
>>
>> Which meant that as soon as an error IRQ was serviced, SHOST_RECOVERY
>> would be set.
>>
>> However, after commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
>> scsi_times_out() will instead call scsi_abort_command() which will queue
>> delayed work, and the worker function scmd_eh_abort_handler() will call
>> scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).
>>
>> So now, after the TFES error IRQ has been serviced, we need to wait for
>> the SCSI workqueue to run its work before SHOST_RECOVERY gets set.
>>
>> It is worth noting that, even before commit e494f6a72839 ("[SCSI] improved
>> eh timeout handler"), we could receive an error IRQ from the time when
>> scsi_queue_rq() checks scsi_host_in_recovery(), to the time when
>> ata_scsi_queuecmd() is actually called.
>>
>> In order to handle both the delayed setting of SHOST_RECOVERY and the
>> window where we can receive an error IRQ, add a check against
>> ATA_PFLAG_EH_PENDING (which gets set when servicing the error IRQ),
>> inside ata_scsi_queuecmd() itself, while holding the ap->lock.
>> (Since the ap->lock is held while servicing IRQs.)
>>
>> Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler")
>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>> ---
>> Changes since v2:
>> -Improve commit message and the comment inside the code.
>> -Move the check to __ata_scsi_queuecmd().
>>
>> drivers/ata/libata-scsi.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index 4cb914103382..93ebcdf2e354 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -3978,9 +3978,19 @@ static inline ata_xlat_func_t ata_get_xlat_func(struct ata_device *dev, u8 cmd)
>>
>> int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev)
>> {
>> + struct ata_port *ap = dev->link->ap;
>> u8 scsi_op = scmd->cmnd[0];
>> ata_xlat_func_t xlat_func;
>>
>> + /*
>> + * scsi_queue_rq() will defer commands if scsi_host_in_recovery().
>> + * However, this check is done without holding the ap->lock (a libata
>> + * specific lock), so we can have received an error irq since then,
>> + * therefore we must check if EH is pending, while holding ap->lock.
>> + */
>> + if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS))
>> + return SCSI_MLQUEUE_DEVICE_BUSY;
>> +
>
> I prefer this to the scsi recovery state approach.
>
> John,
>
> Can you test this with libsas ?
I have no HW access currently, sorry. Maybe Xiang Chen, cc'ed, can assist.
Thanks,
John
>
>
>> if (unlikely(!scmd->cmd_len))
>> goto bad_cdb_len;
>>
>
next prev parent reply other threads:[~2022-11-09 7:58 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-08 23:15 [PATCH v3] ata: libata-core: do not issue non-internal commands once EH is pending Niklas Cassel
2022-11-09 5:15 ` Damien Le Moal
2022-11-09 7:57 ` John Garry [this message]
2022-11-11 8:55 ` Damien Le Moal
2022-11-11 10:11 ` John Garry
2022-11-12 1:41 ` Damien Le Moal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bb443aaa-ebc4-5302-9962-a5bca6143c3a@oracle.com \
--to=john.g.garry@oracle.com \
--cc=chenxiang66@hisilicon.com \
--cc=damien.lemoal@opensource.wdc.com \
--cc=hare@suse.de \
--cc=linux-ide@vger.kernel.org \
--cc=niklas.cassel@wdc.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox