From: Igor Pylypiv <ipylypiv@google.com>
To: Niklas Cassel <cassel@kernel.org>
Cc: TJ Adams <tadamsjr@google.com>,
Jack Wang <jinpu.wang@cloud.ionos.com>,
"James E . J . Bottomley" <James.Bottomley@hansenpartnership.com>,
"Martin K . Petersen" <martin.petersen@oracle.com>,
linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] scsi: pm80xx: Do not issue hard reset before NCQ EH
Date: Fri, 14 Jun 2024 21:41:55 +0000 [thread overview]
Message-ID: <Zmy5I-XNK5bffSe3@google.com> (raw)
In-Reply-To: <ZmgNkK8haUisJ5-b@ryzen.lan>
On Tue, Jun 11, 2024 at 10:40:48AM +0200, Niklas Cassel wrote:
> Hello Igor, TJ,
>
Hi Niklas,
Thank you for the feedback!
> On Fri, Jun 07, 2024 at 05:57:42PM +0000, TJ Adams wrote:
> > From: Igor Pylypiv <ipylypiv@google.com>
> >
> > v6.2 commit 811be570a9a8 ("scsi: pm8001: Use sas_ata_device_link_abort()
>
> Do not specify kernel version (it is irrelevant), SHA1 is enough.
>
Noted.
>
> > to handle NCQ errors") removed duplicate NCQ EH from the pm80xx driver
> > and started relying on libata to handle the NCQ errors. The PM8006
> > controller has a special EH sequence that was added in v4.15 commit
> > 869ddbdcae3b ("scsi: pm80xx: corrected SATA abort handling sequence.").
>
> Do not specify kernel version (it is irrelevant), SHA1 is enough.
>
> Since the code added in 869ddbdcae3b still exists in the pm80xx driver,
> I think that you should mention the commits in chronological order.
> (Right now you mention the oldest still existing code last, which seems
> a bit backwards.)
>
Noted. I wanted to emphasise that newer commit 811be570a9a8 broke the NCQ EH
for pm8006 so I put it first. I should have added a Fixes tag to make it
clear.
>
> > The special EH sequence issues a hard reset to a drive before libata EH
> > has a chance to read the NCQ log page. Libata EH gets confused by empty
> > NCQ log page which results in HSM violation. The failed command gets
> > retried a few times and each time fails with the same HSM violation.
> > Finally, libata decides to disable NCQ due to subsequent HSM vioaltions.
>
> s/vioaltions/violations/
>
> I'm not an expert in libsas EH, but I think that your commit message fails
> to explain why this change actually fixes anything. You do not mention the
> relationship between the code that you add pm8001_work_fn() and the
> existing code in pm8001_abort_task(), and the order in which the functions
> get executed.
>
Noted, will update with more details.
> Does calling sas_execute_internal_abort_dev() from pm8001_work_fn() ensure
> that the libsas EH is never invoked? Or does it cancel the hard reset that
> is part of the "special EH sequence" in pm8001_abort_task() ?
>
It ensures that all I/Os are aborted before libsas EH kicks in. Since all
I/Os are aborted by the controller the pm8001_abort_task() will not be called.
Going to add that to commit message as well.
> Wouldn't it be better if this was fixed in pm8001_abort_task() or similar
> instead? It appears that the code you add to pm8001_work_fn() (that has a
> very ugly if (pm8006)) is only there to undo or avoid the hard reset that
> is done in pm8001_abort_task() (which also has a very ugly if (pm8006)).
>
It would definetely be better to fix this in pm8001_abort_task(), if possible.
One way would be to add a flag that will be set when NCQ error happens (when
IO_XFER_ERROR_ABORTED_NCQ_MODE event is received) and then check that flag
in pm8001_abort_task() to skip hard reset. This approach adds another type
of ugliness to the code and I'm not sure which of these ugly approaches is
less ugly.
> Now we have this ugly if (pm8006) in two different functions... which
> makes my "this could be solved in a nicer way" detector go off.
>
I would be very happy if we can drop those ugly if (pm8006) checks and have
a generic nice solution.
> If this patch (as is) is really the way to go, then I think there should
> be a more detailed reasoning why this change is the most sensible one.
>
Let me investigate this issue more to see if there is a way to drop
the ugly pm8006 checks.
Any ideas/suggestions on how to fix this nicely are very welcomed.
>
> Kind regards,
> Niklas
Thank you,
Igor
next prev parent reply other threads:[~2024-06-14 21:42 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-07 17:57 [RESEND][PATCH 0/3] small pm80xx driver fixes TJ Adams
2024-06-07 17:57 ` [PATCH 1/3] scsi: pm80xx: Set phy->enable_completion only when we wait for it TJ Adams
2024-06-14 5:32 ` Jinpu Wang
2024-06-07 17:57 ` [PATCH 2/3] scsi: pm80xx: Do not issue hard reset before NCQ EH TJ Adams
2024-06-11 8:40 ` Niklas Cassel
2024-06-14 21:41 ` Igor Pylypiv [this message]
2024-06-07 17:57 ` [PATCH 3/3] scsi: pm8001: Update log level when reading config table TJ Adams
2024-06-14 5:37 ` Jinpu Wang
-- strict thread matches above, loose matches on Subject: below --
2024-06-07 17:06 [PATCH 0/3] small pm80xx driver fixes TJ Adams
2024-06-07 17:06 ` [PATCH 2/3] scsi: pm80xx: Do not issue hard reset before NCQ EH TJ Adams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zmy5I-XNK5bffSe3@google.com \
--to=ipylypiv@google.com \
--cc=James.Bottomley@hansenpartnership.com \
--cc=cassel@kernel.org \
--cc=jinpu.wang@cloud.ionos.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=tadamsjr@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.