From: Hannes Reinecke <hare@suse.de>
To: Niklas Cassel <Niklas.Cassel@wdc.com>,
Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: John Garry <john.g.garry@oracle.com>,
"tj@kernel.org" <tj@kernel.org>,
"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
James Bottomley <James.Bottomley@hansenpartnership.com>
Subject: Re: [PATCH v2] ata: libata-core: do not issue non-internal commands once EH is pending
Date: Wed, 9 Nov 2022 08:11:17 +0100 [thread overview]
Message-ID: <c0a34e41-17ca-cbc1-cf54-9fee23b830a3@suse.de> (raw)
In-Reply-To: <Y2ri7EVPZb2O9iD8@x1-carbon>
On 11/9/22 00:14, Niklas Cassel wrote:
> On Tue, Nov 08, 2022 at 12:50:44PM +0900, Damien Le Moal wrote:
>>>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>>>> index 4cb914103382..383a208f5f99 100644
>>>>> --- a/drivers/ata/libata-scsi.c
>>>>> +++ b/drivers/ata/libata-scsi.c
>>>>> @@ -1736,6 +1736,26 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
>>>>> if (xlat_func(qc))
>>>>> goto early_finish;
>>>>> + /*
>>>>> + * scsi_queue_rq() will defer commands when in state SHOST_RECOVERY.
>>>>> + *
>>>>> + * When getting an error interrupt, ata_port_abort() will be called,
>>>>> + * which ends up calling ata_qc_schedule_eh() on all QCs.
>>>>> + *
>>>>> + * ata_qc_schedule_eh() will call ata_eh_set_pending() and then call
>>>>> + * blk_abort_request() on the given QC. blk_abort_request() will
>>>>> + * asynchronously end up calling scsi_eh_scmd_add(), which will set
>>>>> + * the state to SHOST_RECOVERY and wake up SCSI EH.
>>>>> + *
>>>>> + * In order to avoid requests from being issued to the device from the
>>>>> + * time ata_eh_set_pending() is called, to the time scsi_eh_scmd_add()
>>>>> + * sets the state to SHOST_RECOVERY, we defer requests here as well.
>>>>> + */
>>>>> + if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
>>>>> + rc = ATA_DEFER_LINK;
>>>>> + goto defer;
>>>>
>>>> Could we move this check earlier? I mean, we didn't need to have the qc
>>>> alloc'ed and xlat'ed for this check to be done, right?
>>>
>>> Sure, we could put it in e.g. ata_scsi_queuecmd() or __ata_scsi_queuecmd().
>>>
>>>
>>> Or, perhaps it is just time to accept that ATA EH is so interconnected with
>>> SCSI EH, so the simplest thing is just to do:
>>>
>>> --- a/drivers/ata/libata-eh.c
>>> +++ b/drivers/ata/libata-eh.c
>>> @@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
>>>
>>> qc->flags |= ATA_QCFLAG_FAILED;
>>> ata_eh_set_pending(ap, 1);
>>> + scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);
>>
>> Why put this in this function ? Nothing in ata_qc_schedule_eh() calls
>> scsi_schedule_eh() or scsi_eh_scmd_add(), which set that state.
>
> It does, but after calling blk_abort_request(), we need to wait for
> two different workqueues to run their work, before the SHOST_RECOVERY
> state gets set:
>
> blk_abort_request() -> kblockd_schedule_work(&req->q->timeout_work) ->
> queue_work(kblockd_workqueue, work)
>
> ... -> blk_mq_timeout_work() -> blk_mq_handle_expired() ->
> blk_mq_rq_timed_out() -> req->q->mq_ops->timeout() (scsi_timeout()) ->
> scsi_abort_command() ->
> queue_delayed_work(shost->tmf_work_q, &scmd->abort_work, HZ / 100);
>
> ... -> scmd_eh_abort_handler() -> scsi_eh_scmd_add() ->
> scsi_host_set_state(shost, SHOST_RECOVERY)
>
> After setting state to SHOST_RECOVERY, scsi_eh_scmd_add() will
> call scsi_eh_inc_host_failed(), which will cause the while (true) loop
> in scsi_error_handler() to proceed to perform SCSI EH, and eventually
> call shost->transportt->eh_strategy_handler(shost) which for ATA is set
> to ata_scsi_error().
>
> Then we have the regular ATA side of things:
> ata_scsi_error() -> ata_scsi_port_error_handler() ->
> ap->ops->error_handler(ap) -> (for e.g. AHCI) ahci_error_handler() ->
> sata_pmp_error_handler() -> ata_eh_autopsy() -> ata_eh_link_autopsy() ->
> ata_eh_analyze_ncq_error(). (Where reading the NCQ error log in the
> command that brings the device out of error state.)
>
>
> Looking at this original commit, there are two ways for libata to trigger
> SCSI EH:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b70fc039824bc7303e4007a5f758f832de56611
>
> ata_qc_schedule_eh(): which is explained above. It indirectly schedules
> SCSI with an associated QC.
>
> ata_port_schedule_eh(): It directly schedules EH for @ap without an
> associated qc. (I assume this is for e.g. an errors with the link,
> where we get an error interrupt and need to read SError register.)
>
>
> The commit message here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ee7863bc68fa6ad6fe7cfcc0e5ebe9efe0c0664e
>
> "In short, SCSI host is not supposed to know about exceptions unrelated
> to specific device or command. Such exceptions should be handled by
> transport layer proper. However, the distinction is not essential to
> ATA and libata is planning to depart from SCSI, so, for the time
> being, libata will be using SCSI EH to handle such exceptions."
>
> So it appears that this distinction is not important for libata.
> Sure, if libata EH function ata_scsi_error() sees any commands in
> host->eh_cmd_q, then we know that they timed out or aborted.
> But ata_scsi_cmd_error_handler() will leave any QC alone that
> has ATA_QCFLAG_FAILED flag set.
> Those QCs will instead be processed by ata_scsi_port_error_handler()
> which totally ignores the host->eh_cmd_q list supplied by SCSI EH,
> and only looks at QCs with ATA_QCFLAG_FAILED set.
>
>
> So it would be tempting to do something like:
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -1001,8 +1001,7 @@ static int ata_do_link_abort(struct ata_port *ap, struct ata_link *link)
> }
> }
>
> - if (!nr_aborted)
> - ata_port_schedule_eh(ap);
> + ata_port_schedule_eh(ap);
>
> return nr_aborted;
>
>
> However, doing so would go against how this API is supposed to be used, see:
> 36fed4980529 ("[SCSI] libsas: cleanup spurious calls to scsi_schedule_eh")
>
> I did decide to try it anyway, but it turns out both this and the previous
> suggestion:
>
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
>
> qc->flags |= ATA_QCFLAG_FAILED;
> ata_eh_set_pending(ap, 1);
> + scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);
>
>
> Both actually lead to two error interrupts.
> (We should only have one.)
>
> QEMU shows that this is from scsi_restart_operations(),
> which temporary clears the SHOST_RECOVERY bit, then calls
> scsi_run_host_queues().
>
> Since we've reached this far, that must mean that in
> #3 scsi_queue_rq
>
> when the:
> if (unlikely(scsi_host_in_recovery(shost))) {
> check was done, we were not in recovery.
>
> But from that time, to #0, we must have received an error irq,
> because a:
> (gdb) p /x scmd->device->host->shost_state
> $9 = 0x5
>
> which is SHOST_RECOVERY,
>
>
> #0 __ata_scsi_queuecmd (scmd=scmd@entry=0xffff8881016d7b88, dev=0xffff888101e124c0) at drivers/ata/libata-scsi.c:4256
> #1 0xffffffff819c1d8b in ata_scsi_queuecmd (shost=<optimized out>, cmd=0xffff8881016d7b88) at drivers/ata/libata-scsi.c:4337
> #2 0xffffffff81995c41 in scsi_dispatch_cmd (cmd=0xffff8881016d7b88) at drivers/scsi/scsi_lib.c:1516
> #3 scsi_queue_rq (hctx=<optimized out>, bd=<optimized out>) at drivers/scsi/scsi_lib.c:1757
> --Type <RET> for more, q to quit, c to continue without paging--
> #4 0xffffffff81578a42 in blk_mq_dispatch_rq_list (hctx=hctx@entry=0xffff8881008e0000, list=list@entry=0xffffc90000367da0, nr_budgets=0, nr_budgets@entry=1) at block/blk-mq.c:2056
> #5 0xffffffff8157ef1b in __blk_mq_do_dispatch_sched (hctx=0xffff8881008e0000) at block/blk-mq-sched.c:173
> #6 blk_mq_do_dispatch_sched (hctx=hctx@entry=0xffff8881008e0000) at block/blk-mq-sched.c:187
> #7 0xffffffff8157f238 in __blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff8881008e0000) at block/blk-mq-sched.c:313
> #8 0xffffffff8157f2fb in blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff8881008e0000) at block/blk-mq-sched.c:339
> #9 0xffffffff81573cc0 in __blk_mq_run_hw_queue (hctx=0xffff8881008e0000) at block/blk-mq.c:2174
> #10 0xffffffff8157546c in __blk_mq_delay_run_hw_queue (hctx=<optimized out>, async=<optimized out>, msecs=msecs@entry=0) at block/blk-mq.c:2250
> #11 0xffffffff815756a6 in blk_mq_run_hw_queue (hctx=<optimized out>, async=async@entry=false) at block/blk-mq.c:2298
> #12 0xffffffff81575990 in blk_mq_run_hw_queues (q=0xffff888102530000, async=async@entry=false) at block/blk-mq.c:2346
> #13 0xffffffff8199221d in scsi_run_queue (q=<optimized out>) at drivers/scsi/scsi_lib.c:457
> #14 0xffffffff81994ead in scsi_run_host_queues (shost=shost@entry=0xffff888101ca2000) at drivers/scsi/scsi_lib.c:475
> #15 0xffffffff819918bf in scsi_restart_operations (shost=0xffff888101ca2000) at drivers/scsi/scsi_error.c:2134
> #16 scsi_error_handler (data=0xffff888101ca2000) at drivers/scsi/scsi_error.c:2327
>
>
> So it appears that simply checking if SHOST_RECOVERY is set in
> scsi_queue_rq() is not enough. Since this is done without holding
> ap->lock (which is in libata..), we can always get an error irq.
>
> So the only reliable way to ensure that we don't send down requests
> while the drive is in error state, is to have a check against
> ATA_PFLAG_EH_PENDING inside __ata_scsi_queuecmd(), while holding
> the ap->lock.
> Will send a V3 that is similar to V2, but with the check inside
> __ata_scsi_queuecmd().
> Thanks for the detailed explanation, Niklas.
However, one fundamental thing is still unresolved:
I've switched SCSI EH to do asynchronous aborts with commit e494f6a72839
("[SCSI] improved eh timeout handler"); since then commands will be
aborted without invoking SCSI EH.
This goes _fundamentally_ against libata's .eh_strategy (as it's never
invoked), and as such libata _cannot_ use command aborts.
Which typically wouldn't matter as ATA doesn't have command aborts, and
realistically any error is causing the NCQ queue to be drained.
So SCSI EH scsi_abort_command() really shouldn't queue a workqueue item,
as it'll never be able to do anything meaningful.
You need this patch:
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index be2a70c5ac6d..4fb72b73871e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -242,6 +242,11 @@ scsi_abort_command(struct scsi_cmnd *scmd)
return FAILED;
}
+ if (!hostt->eh_abort_handler) {
+ /* No abort handler, fail command directly */
+ return FAILED;
+ }
+
spin_lock_irqsave(shost->host_lock, flags);
if (shost->eh_deadline != -1 && !shost->last_reset)
shost->last_reset = jiffies;
to avoid having libata trying to queue a (pointless) abort workqueue
item, and get rid of the various workqueue thingies you mentioned above.
And it's a sensible fix anyway, will be sending it as a separate patch.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman
next prev parent reply other threads:[~2022-11-09 7:11 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20221107161036.670237-1-niklas.cassel@wdc.com>
[not found] ` <5a4fa5db-c706-5cf2-1145-bf091445d20e@oracle.com>
[not found] ` <Y2mbX8MvYrF0FHaI@x1-carbon>
2022-11-08 3:50 ` [PATCH v2] ata: libata-core: do not issue non-internal commands once EH is pending Damien Le Moal
2022-11-08 23:14 ` Niklas Cassel
2022-11-09 7:11 ` Hannes Reinecke [this message]
2022-11-09 9:28 ` Niklas Cassel
2022-11-09 9:37 ` Hannes Reinecke
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c0a34e41-17ca-cbc1-cf54-9fee23b830a3@suse.de \
--to=hare@suse.de \
--cc=James.Bottomley@hansenpartnership.com \
--cc=Niklas.Cassel@wdc.com \
--cc=damien.lemoal@opensource.wdc.com \
--cc=john.g.garry@oracle.com \
--cc=linux-ide@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox