public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
From: Hannes Reinecke <hare@suse.de>
To: Niklas Cassel <Niklas.Cassel@wdc.com>,
	Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: John Garry <john.g.garry@oracle.com>,
	"tj@kernel.org" <tj@kernel.org>,
	"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	James Bottomley <James.Bottomley@hansenpartnership.com>
Subject: Re: [PATCH v2] ata: libata-core: do not issue non-internal commands once EH is pending
Date: Wed, 9 Nov 2022 08:11:17 +0100	[thread overview]
Message-ID: <c0a34e41-17ca-cbc1-cf54-9fee23b830a3@suse.de> (raw)
In-Reply-To: <Y2ri7EVPZb2O9iD8@x1-carbon>

On 11/9/22 00:14, Niklas Cassel wrote:
> On Tue, Nov 08, 2022 at 12:50:44PM +0900, Damien Le Moal wrote:
>>>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>>>> index 4cb914103382..383a208f5f99 100644
>>>>> --- a/drivers/ata/libata-scsi.c
>>>>> +++ b/drivers/ata/libata-scsi.c
>>>>> @@ -1736,6 +1736,26 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
>>>>>    	if (xlat_func(qc))
>>>>>    		goto early_finish;
>>>>> +	/*
>>>>> +	 * scsi_queue_rq() will defer commands when in state SHOST_RECOVERY.
>>>>> +	 *
>>>>> +	 * When getting an error interrupt, ata_port_abort() will be called,
>>>>> +	 * which ends up calling ata_qc_schedule_eh() on all QCs.
>>>>> +	 *
>>>>> +	 * ata_qc_schedule_eh() will call ata_eh_set_pending() and then call
>>>>> +	 * blk_abort_request() on the given QC. blk_abort_request() will
>>>>> +	 * asynchronously end up calling scsi_eh_scmd_add(), which will set
>>>>> +	 * the state to SHOST_RECOVERY and wake up SCSI EH.
>>>>> +	 *
>>>>> +	 * In order to avoid requests from being issued to the device from the
>>>>> +	 * time ata_eh_set_pending() is called, to the time scsi_eh_scmd_add()
>>>>> +	 * sets the state to SHOST_RECOVERY, we defer requests here as well.
>>>>> +	 */
>>>>> +	if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
>>>>> +		rc = ATA_DEFER_LINK;
>>>>> +		goto defer;
>>>>
>>>> Could we move this check earlier? I mean, we didn't need to have the qc
>>>> alloc'ed and xlat'ed for this check to be done, right?
>>>
>>> Sure, we could put it in e.g. ata_scsi_queuecmd() or __ata_scsi_queuecmd().
>>>
>>>
>>> Or, perhaps it is just time to accept that ATA EH is so interconnected with
>>> SCSI EH, so the simplest thing is just to do:
>>>
>>> --- a/drivers/ata/libata-eh.c
>>> +++ b/drivers/ata/libata-eh.c
>>> @@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
>>>   
>>>          qc->flags |= ATA_QCFLAG_FAILED;
>>>          ata_eh_set_pending(ap, 1);
>>> +       scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);
>>
>> Why put this in this function ? Nothing in ata_qc_schedule_eh() calls
>> scsi_schedule_eh() or scsi_eh_scmd_add(), which set that state.
> 
> It does, but after calling blk_abort_request(), we need to wait for
> two different workqueues to run their work, before the SHOST_RECOVERY
> state gets set:
> 
> blk_abort_request() -> kblockd_schedule_work(&req->q->timeout_work) ->
> queue_work(kblockd_workqueue, work)
> 
> ... -> blk_mq_timeout_work() -> blk_mq_handle_expired() ->
> blk_mq_rq_timed_out() -> req->q->mq_ops->timeout() (scsi_timeout()) ->
> scsi_abort_command() ->
> queue_delayed_work(shost->tmf_work_q, &scmd->abort_work, HZ / 100);
> 
> ... -> scmd_eh_abort_handler() -> scsi_eh_scmd_add() ->
> scsi_host_set_state(shost, SHOST_RECOVERY)
> 
> After setting state to SHOST_RECOVERY, scsi_eh_scmd_add() will
> call scsi_eh_inc_host_failed(), which will cause the while (true) loop
> in scsi_error_handler() to proceed to perform SCSI EH, and eventually
> call shost->transportt->eh_strategy_handler(shost) which for ATA is set
> to ata_scsi_error().
> 
> Then we have the regular ATA side of things:
> ata_scsi_error() -> ata_scsi_port_error_handler() ->
> ap->ops->error_handler(ap) -> (for e.g. AHCI) ahci_error_handler() ->
> sata_pmp_error_handler() -> ata_eh_autopsy() -> ata_eh_link_autopsy() ->
> ata_eh_analyze_ncq_error(). (Where reading the NCQ error log in the
> command that brings the device out of error state.)
> 
> 
> Looking at this original commit, there are two ways for libata to trigger
> SCSI EH:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b70fc039824bc7303e4007a5f758f832de56611
> 
> ata_qc_schedule_eh(): which is explained above. It indirectly schedules
> SCSI with an associated QC.
> 
> ata_port_schedule_eh(): It directly schedules EH for @ap without an
> associated qc. (I assume this is for e.g. an errors with the link,
> where we get an error interrupt and need to read SError register.)
> 
> 
> The commit message here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ee7863bc68fa6ad6fe7cfcc0e5ebe9efe0c0664e
> 
> "In short, SCSI host is not supposed to know about exceptions unrelated
> to specific device or command.  Such exceptions should be handled by
> transport layer proper.  However, the distinction is not essential to
> ATA and libata is planning to depart from SCSI, so, for the time
> being, libata will be using SCSI EH to handle such exceptions."
> 
> So it appears that this distinction is not important for libata.
> Sure, if libata EH function ata_scsi_error() sees any commands in
> host->eh_cmd_q, then we know that they timed out or aborted.
> But ata_scsi_cmd_error_handler() will leave any QC alone that
> has ATA_QCFLAG_FAILED flag set.
> Those QCs will instead be processed by ata_scsi_port_error_handler()
> which totally ignores the host->eh_cmd_q list supplied by SCSI EH,
> and only looks at QCs with ATA_QCFLAG_FAILED set.
> 
> 
> So it would be tempting to do something like:
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -1001,8 +1001,7 @@ static int ata_do_link_abort(struct ata_port *ap, struct ata_link *link)
>                  }
>          }
>   
> -       if (!nr_aborted)
> -               ata_port_schedule_eh(ap);
> +       ata_port_schedule_eh(ap);
>   
>          return nr_aborted;
> 
> 
> However, doing so would go against how this API is supposed to be used, see:
> 36fed4980529 ("[SCSI] libsas: cleanup spurious calls to scsi_schedule_eh")
> 
> I did decide to try it anyway, but it turns out both this and the previous
> suggestion:
> 
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
> 
>          qc->flags |= ATA_QCFLAG_FAILED;
>          ata_eh_set_pending(ap, 1);
> +       scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);
> 
> 
> Both actually lead to two error interrupts.
> (We should only have one.)
> 
> QEMU shows that this is from scsi_restart_operations(),
> which temporary clears the SHOST_RECOVERY bit, then calls
> scsi_run_host_queues().
> 
> Since we've reached this far, that must mean that in
> #3  scsi_queue_rq
> 
> when the:
> if (unlikely(scsi_host_in_recovery(shost))) {
> check was done, we were not in recovery.
> 
> But from that time, to #0, we must have received an error irq,
> because a:
> (gdb) p /x scmd->device->host->shost_state
> $9 = 0x5
> 
> which is SHOST_RECOVERY,
> 
> 
> #0  __ata_scsi_queuecmd (scmd=scmd@entry=0xffff8881016d7b88, dev=0xffff888101e124c0) at drivers/ata/libata-scsi.c:4256
> #1  0xffffffff819c1d8b in ata_scsi_queuecmd (shost=<optimized out>, cmd=0xffff8881016d7b88) at drivers/ata/libata-scsi.c:4337
> #2  0xffffffff81995c41 in scsi_dispatch_cmd (cmd=0xffff8881016d7b88) at drivers/scsi/scsi_lib.c:1516
> #3  scsi_queue_rq (hctx=<optimized out>, bd=<optimized out>) at drivers/scsi/scsi_lib.c:1757
> --Type <RET> for more, q to quit, c to continue without paging--
> #4  0xffffffff81578a42 in blk_mq_dispatch_rq_list (hctx=hctx@entry=0xffff8881008e0000, list=list@entry=0xffffc90000367da0, nr_budgets=0, nr_budgets@entry=1) at block/blk-mq.c:2056
> #5  0xffffffff8157ef1b in __blk_mq_do_dispatch_sched (hctx=0xffff8881008e0000) at block/blk-mq-sched.c:173
> #6  blk_mq_do_dispatch_sched (hctx=hctx@entry=0xffff8881008e0000) at block/blk-mq-sched.c:187
> #7  0xffffffff8157f238 in __blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff8881008e0000) at block/blk-mq-sched.c:313
> #8  0xffffffff8157f2fb in blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff8881008e0000) at block/blk-mq-sched.c:339
> #9  0xffffffff81573cc0 in __blk_mq_run_hw_queue (hctx=0xffff8881008e0000) at block/blk-mq.c:2174
> #10 0xffffffff8157546c in __blk_mq_delay_run_hw_queue (hctx=<optimized out>, async=<optimized out>, msecs=msecs@entry=0) at block/blk-mq.c:2250
> #11 0xffffffff815756a6 in blk_mq_run_hw_queue (hctx=<optimized out>, async=async@entry=false) at block/blk-mq.c:2298
> #12 0xffffffff81575990 in blk_mq_run_hw_queues (q=0xffff888102530000, async=async@entry=false) at block/blk-mq.c:2346
> #13 0xffffffff8199221d in scsi_run_queue (q=<optimized out>) at drivers/scsi/scsi_lib.c:457
> #14 0xffffffff81994ead in scsi_run_host_queues (shost=shost@entry=0xffff888101ca2000) at drivers/scsi/scsi_lib.c:475
> #15 0xffffffff819918bf in scsi_restart_operations (shost=0xffff888101ca2000) at drivers/scsi/scsi_error.c:2134
> #16 scsi_error_handler (data=0xffff888101ca2000) at drivers/scsi/scsi_error.c:2327
> 
> 
> So it appears that simply checking if SHOST_RECOVERY is set in
> scsi_queue_rq() is not enough. Since this is done without holding
> ap->lock (which is in libata..), we can always get an error irq.
> 
> So the only reliable way to ensure that we don't send down requests
> while the drive is in error state, is to have a check against
> ATA_PFLAG_EH_PENDING inside __ata_scsi_queuecmd(), while holding
> the ap->lock.
> Will send a V3 that is similar to V2, but with the check inside
> __ata_scsi_queuecmd().
> Thanks for the detailed explanation, Niklas.

However, one fundamental thing is still unresolved:
I've switched SCSI EH to do asynchronous aborts with commit e494f6a72839 
("[SCSI] improved eh timeout handler"); since then commands will be 
aborted without invoking SCSI EH.

This goes _fundamentally_ against libata's .eh_strategy (as it's never 
invoked), and as such libata _cannot_ use command aborts.
Which typically wouldn't matter as ATA doesn't have command aborts, and 
realistically any error is causing the NCQ queue to be drained.

So SCSI EH scsi_abort_command() really shouldn't queue a workqueue item, 
as it'll never be able to do anything meaningful.

You need this patch:

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index be2a70c5ac6d..4fb72b73871e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -242,6 +242,11 @@ scsi_abort_command(struct scsi_cmnd *scmd)
                 return FAILED;
         }

+       if (!hostt->eh_abort_handler) {
+               /* No abort handler, fail command directly */
+               return FAILED;
+       }
+
         spin_lock_irqsave(shost->host_lock, flags);
         if (shost->eh_deadline != -1 && !shost->last_reset)
                 shost->last_reset = jiffies;

to avoid having libata trying to queue a (pointless) abort workqueue 
item, and get rid of the various workqueue thingies you mentioned above.

And it's a sensible fix anyway, will be sending it as a separate patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


  reply	other threads:[~2022-11-09  7:11 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20221107161036.670237-1-niklas.cassel@wdc.com>
     [not found] ` <5a4fa5db-c706-5cf2-1145-bf091445d20e@oracle.com>
     [not found]   ` <Y2mbX8MvYrF0FHaI@x1-carbon>
2022-11-08  3:50     ` [PATCH v2] ata: libata-core: do not issue non-internal commands once EH is pending Damien Le Moal
2022-11-08 23:14       ` Niklas Cassel
2022-11-09  7:11         ` Hannes Reinecke [this message]
2022-11-09  9:28           ` Niklas Cassel
2022-11-09  9:37             ` Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c0a34e41-17ca-cbc1-cf54-9fee23b830a3@suse.de \
    --to=hare@suse.de \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=Niklas.Cassel@wdc.com \
    --cc=damien.lemoal@opensource.wdc.com \
    --cc=john.g.garry@oracle.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox