From: Niklas Cassel <Niklas.Cassel@wdc.com>
To: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: "linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>
Subject: Re: [PATCH 1/2] ata: libata: only mark a single command as error during a NCQ error
Date: Mon, 14 Nov 2022 17:19:01 +0000 [thread overview]
Message-ID: <Y3J4hLWA3SR+bDvW@x1-carbon> (raw)
In-Reply-To: <d9f8025b-2058-5ca3-0353-5c8bf67ff3c0@opensource.wdc.com>
On Mon, Nov 14, 2022 at 11:11:26AM +0900, Damien Le Moal wrote:
> On 11/11/22 20:09, Niklas Cassel wrote:
> > A NCQ error means that the device has aborted a single NCQ command and
> > halted further processing of queued commands.
>
> Nit: To be strict with wording, this should say something like "an NCQ
> command failure results in the device aborting the execution of all active
> commands."
>
> > To get the single NCQ command that caused the NCQ error, host software has
> > to read the NCQ error log, which also takes the device out of error state.
> >
> > When the device encounters a NCQ error, we receive an error interrupt from
> > the HBA, and call ata_do_link_abort() to mark all outstanding commands on
> > the link as ATA_QCFLAG_FAILED (which means that these commands are owned
> > by libata EH), and then call ata_qc_complete() on them.
> >
> > ata_qc_complete() will call fill_result_tf() for all commands marked as
> > ATA_QCFLAG_FAILED.
> >
> > The taskfile is simply the latest status/error as seen from the device's
> > perspective. The taskfile will have ATA_ERR set in the status field and
> > ATA_ABORTED set in the error field.
> >
> > When we fill the current taskfile values for all outstanding commands,
> > that means that qc->result_tf will have ATA_ERR set for all commands
> > owned by libata EH.
> >
> > When ata_eh_link_autopsy() later analyzes all commands owned by libata EH,
> > it will call ata_eh_analyze_tf(), which will check if qc->result_tf has
> > ATA_ERR set, if it does, it will set qc->err_mask (which marks the command
> > as an error).
> >
> > When ata_eh_finish() later calls __ata_qc_complete() on all commands owned
> > by libata EH, it will call qc->complete_fn() (ata_scsi_qc_complete()),
> > ata_scsi_qc_complete() will call ata_gen_ata_sense() to generate sense
> > data if qc->err_mask is set.
> >
> > This means that we will generate sense data for commands that really
> > should not have any sense data set. Having sense data set might cause SCSI
> > to finish these commands instead of retrying them.
> >
> > While this incorrect behavior has existed for a long time, this first
> > became a problem once we started reading the correct taskfile register in
> > commit 4ba09d202657 ("ata: libahci: read correct status and error field
> > for NCQ commands").
> >
> > Before this commit, NCQ commands would read the taskfile values received
> > from the last non-NCQ command completion, which most likely did not have
> > ATA_ERR set, since the last non-NCQ command was most likely not an error.
> >
> > Fix this by clearing ATA_ERR and any error bits for all commands except
> > the actual command that caused the NCQ error, since the error bits in the
> > taskfile are not applicable to them.
> >
> > Fixes: 4ba09d202657 ("ata: libahci: read correct status and error field for NCQ commands")
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > ---
> > drivers/ata/libata-sata.c | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
> >
> > diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
> > index 283ce1ab29cf..6b2dcf3eb2fb 100644
> > --- a/drivers/ata/libata-sata.c
> > +++ b/drivers/ata/libata-sata.c
> > @@ -1476,6 +1476,25 @@ void ata_eh_analyze_ncq_error(struct ata_link *link)
> > }
> > }
> >
> > + ata_qc_for_each_raw(ap, qc, tag) {
> > + if (!(qc->flags & ATA_QCFLAG_FAILED) ||
> > + ata_dev_phys_link(qc->dev) != link)
> > + continue;
> > +
> > + /* Skip the single QC which caused the NCQ error. */
> > + if (qc->err_mask)
> > + continue;
>
> Before the continue, should we check that this qc tag is the one we got
> from ata_eh_read_log_10h() ? We should at least warn if there is a mismatch.
I really see no point of displaying a warning in case of mismatch here.
At this point, there will be exactly one command that has AC_ERR_NCQ set.
If ata_eh_read_log_10h() reported an invalid tag, we would have returned
in the check performed directly after ata_eh_read_log_10h() was called:
if (!(link->sactive & (1 << tag))) {
ata_link_err(link, "log page 10h reported inactive tag %d\n",
tag);
return;
}
Which means that we would never have reached this new code.
However, there could theoretically be another command that has e.g.
AC_ERR_TIMEOUT set, if there was a command that timed out just before
the NCQ error, so EH did not yet have a change to run before handling
both errors at the same time.
Therefore I wrote:
+ if (qc->err_mask)
+ continue;
Instead of:
+ if (qc->err_mask & AC_ERR_NCQ)
+ continue;
Kind regards,
Niklas
next prev parent reply other threads:[~2022-11-14 17:19 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-11 11:09 [PATCH 0/2] libata NCQ error handling fix and improvement Niklas Cassel
2022-11-11 11:09 ` [PATCH 1/2] ata: libata: only mark a single command as error during a NCQ error Niklas Cassel
2022-11-14 2:11 ` Damien Le Moal
2022-11-14 17:19 ` Niklas Cassel [this message]
2022-11-11 11:09 ` [PATCH 2/2] ata: libata: skip error analysis for commands that are not errors Niklas Cassel
2022-11-14 2:15 ` Damien Le Moal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y3J4hLWA3SR+bDvW@x1-carbon \
--to=niklas.cassel@wdc.com \
--cc=damien.lemoal@opensource.wdc.com \
--cc=linux-ide@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox