From: John Garry <john.garry@huawei.com>
To: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: "jejb@linux.ibm.com" <jejb@linux.ibm.com>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
"jinpu.wang@cloud.ionos.com" <jinpu.wang@cloud.ionos.com>,
"damien.lemoal@opensource.wdc.com"
<damien.lemoal@opensource.wdc.com>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linuxarm@huawei.com" <linuxarm@huawei.com>,
"yangxingui@huawei.com" <yangxingui@huawei.com>,
"yanaijie@huawei.com" <yanaijie@huawei.com>
Subject: Re: [PATCH v5 0/7] libsas and drivers: NCQ error handling
Date: Tue, 4 Oct 2022 15:04:51 +0100
Message-ID: <cfa52b91-db81-a179-76c2-8a61266c099d@huawei.com>
In-Reply-To: <YzwvpUUftX6Ziurt@x1-carbon>
On 04/10/2022 14:05, Niklas Cassel wrote:
>> 2.35.3
>>
> For pm80xx (pm8001 changes untested):
> Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
>
>
Thanks!
>
> Notes unrelated to this patch:
>
> Both before and after this series, this driver prints:
> [ 215.845053] ata21.00: exception Emask 0x0 SAct 0xfc0000 SErr 0x0 action 0x6
> [ 215.852308] ata21.00: failed command: WRITE FPDMA QUEUED
> [ 215.857801] ata21.00: cmd 61/00:00:00:3a:d3/01:00:b3:04:00/40 tag 18 ncq dma 131072 out
> res 43/04:00:ff:3a:d3/00:00:b3:04:00/40 Emask 0x400 (NCQ error) <F>
> [ 215.874396] ata21.00: status: { DRDY SENSE ERR }
> [ 215.879192] ata21.00: error: { ABRT }
> [ 215.882997] ata21.00: failed command: WRITE FPDMA QUEUED
> [ 215.888479] ata21.00: cmd 61/00:00:00:3b:d3/01:00:b3:04:00/40 tag 19 ncq dma 131072 out
> res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
> [ 215.904814] ata21.00: failed command: WRITE FPDMA QUEUED
> [ 215.910311] ata21.00: cmd 61/00:00:00:3c:d3/01:00:b3:04:00/40 tag 20 ncq dma 131072 out
> res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
> [ 215.932679] ata21.00: failed command: WRITE FPDMA QUEUED
> [ 215.941203] ata21.00: cmd 61/00:00:00:3d:d3/01:00:b3:04:00/40 tag 21 ncq dma 131072 out
> res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
> [ 215.963616] ata21.00: failed command: WRITE FPDMA QUEUED
> [ 215.972150] ata21.00: cmd 61/00:00:00:3e:d3/01:00:b3:04:00/40 tag 22 ncq dma 131072 out
> res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
> [ 215.994532] ata21.00: failed command: WRITE FPDMA QUEUED
> [ 216.003124] ata21.00: cmd 61/00:00:00:3f:d3/01:00:b3:04:00/40 tag 23 ncq dma 131072 out
> res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
>
> HSM (Host State Machine) violation errors.
>
> For the same SATA drive connected via AHCI this will instead give:
>
> [ 3796.944923] ata14.00: exception Emask 0x0 SAct 0x80800003 SErr 0xc0000 action 0x0
> [ 3796.959375] ata14.00: irq_stat 0x40000008
> [ 3796.970140] ata14: SError: { CommWake 10B8B }
> [ 3796.981231] ata14.00: failed command: WRITE FPDMA QUEUED
> [ 3796.993237] ata14.00: cmd 61/00:08:00:7e:73/02:00:8e:08:00/40 tag 1 ncq dma 262144 out
> res 43/04:01:00:00:00/00:00:00:00:00/40 Emask 0x1 (device error)
> [ 3797.017984] ata14.00: status: { DRDY SENSE ERR }
> [ 3797.026833] ata14.00: error: { ABRT }
> [ 3797.034664] ata14.00: failed command: WRITE FPDMA QUEUED
> [ 3797.043015] ata14.00: cmd 61/00:b8:00:60:73/0a:00:8e:08:00/40 tag 23 ncq dma 1310720 out
> res 43/04:00:df:67:73/00:00:8e:08:00/40 Emask 0x400 (NCQ error) <F>
> [ 3797.065224] ata14.00: status: { DRDY SENSE ERR }
> [ 3797.072914] ata14.00: error: { ABRT }
> [ 3797.079598] ata14.00: failed command: WRITE FPDMA QUEUED
> [ 3797.087920] ata14.00: cmd 61/00:f8:00:6a:73/0a:00:8e:08:00/40 tag 31 ncq dma 1310720 out
> res 43/04:00:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
> [ 3797.109800] ata14.00: status: { DRDY SENSE ERR }
> [ 3797.117451] ata14.00: error: { ABRT }
>
> device error errors.
>
>
> I don't think it is correct to report the remaining outstanding I/Os
> (all except the I/O that caused the NCQ error) as HSM violation errors,
> regardless of whether they were aborted by the drive as a side effect of
> reading the NCQ error log (see 13.7.4 Queued Error Log (10h) in the
> SATA 3.5a spec) or aborted by the host (by sas_ata_device_link_abort()).
>
> HSM violation errors are e.g. when you try to issue a command to a drive
> that has ATA_BUSY bit set.
We had a similar issue for hisi_sas and solved it in patch 4/7: we don't
set ATA_ERR in the FIS for those I/Os which complete with error, but
instead abort them via sas_abort_task().
For pm80xx the I/O is either rejected (it actually completes with a
rejection) or is aborted via an internal abort command. Maybe we can do
something similar for pm8001, as we allow the I/O to complete with error
in both cases. I'll check.
Thanks,
John
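
For reference when reading the Emask values in the logs quoted above
(0x400 "NCQ error", 0x2 "HSM violation", 0x1 "device error"), here is a
minimal userspace sketch of the mapping from a single Emask bit to its
libata AC_ERR_* name. It assumes the bit assignments of the AC_ERR_*
flags in include/linux/libata.h, copied here by hand rather than taken
from kernel headers; the table and the helper emask_bit_name() are
hypothetical names used only for illustration, not kernel API.

```c
#include <stddef.h>

/* One Emask bit and the name of the libata flag assumed to match it. */
struct emask_bit {
	unsigned int bit;
	const char *name;
};

/*
 * Assumption: values mirror the AC_ERR_* flags in include/linux/libata.h.
 * They are reproduced by hand, so check them against your kernel tree.
 */
static const struct emask_bit emask_bits[] = {
	{ 1u << 0,  "AC_ERR_DEV" },        /* logged as "device error" */
	{ 1u << 1,  "AC_ERR_HSM" },        /* logged as "HSM violation" */
	{ 1u << 2,  "AC_ERR_TIMEOUT" },
	{ 1u << 3,  "AC_ERR_MEDIA" },
	{ 1u << 4,  "AC_ERR_ATA_BUS" },
	{ 1u << 5,  "AC_ERR_HOST_BUS" },
	{ 1u << 6,  "AC_ERR_SYSTEM" },
	{ 1u << 7,  "AC_ERR_INVALID" },
	{ 1u << 8,  "AC_ERR_OTHER" },
	{ 1u << 9,  "AC_ERR_NODEV_HINT" },
	{ 1u << 10, "AC_ERR_NCQ" },        /* logged as "NCQ error" */
};

/*
 * Hypothetical helper: return the flag name for a value with exactly one
 * known bit set, or NULL for zero, unknown bits, or several bits at once.
 */
static const char *emask_bit_name(unsigned int bit)
{
	size_t i;

	for (i = 0; i < sizeof(emask_bits) / sizeof(emask_bits[0]); i++) {
		if (emask_bits[i].bit == bit)
			return emask_bits[i].name;
	}
	return NULL;
}
```

With this table, the per-command Emask 0x2 printed for tags 19 to 23 in
the pm80xx log corresponds to the HSM flag, while the AHCI log reports
the same kind of collateral aborts with Emask 0x1, the device error flag.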