From: Damien Le Moal <dlemoal@kernel.org>
To: James Bottomley <James.Bottomley@HansenPartnership.com>,
Christoph Hellwig <hch@infradead.org>
Cc: Yafang Shao <laoar.shao@gmail.com>,
Matthew Wilcox <willy@infradead.org>,
Christian Brauner <brauner@kernel.org>,
djwong@kernel.org, cem@kernel.org, linux-xfs@vger.kernel.org,
Linux-Fsdevel <linux-fsdevel@vger.kernel.org>,
Damien Le Moal <Damien.LeMoal@wdc.com>,
Sathya Prakash <sathya.prakash@broadcom.com>,
Sreekanth Reddy <sreekanth.reddy@broadcom.com>,
Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
MPT-FusionLinux.pdl@broadcom.com, linux-scsi@vger.kernel.org
Subject: Re: [QUESTION] xfs, iomap: Handle writeback errors to prevent silent data corruption
Date: Wed, 4 Jun 2025 16:29:43 +0900 [thread overview]
Message-ID: <ee8952a0-d0ee-47db-8012-abb2722ae7ef@kernel.org> (raw)
In-Reply-To: <abe44d8f2bebe805dd0975be198994c89a100644.camel@HansenPartnership.com>
On 6/3/25 11:57 PM, James Bottomley wrote:
> On Tue, 2025-06-03 at 07:41 -0700, Christoph Hellwig wrote:
>> [taking this private to discuss the mpt drivers]
>>
>>> Hmmm... DID_SOFT_ERROR... Normally, this is an immediate retry as
>>> this normally is used to indicate that a command is a collateral
>>> abort due to an NCQ error, and per ATA spec, that command should be
>>> retried. However, the *BAD* thing about Broadcom HBAs using this is
>>> that it increments the command retry counter, so if a command ends
>>> up being retried more than 5 times due to other commands failing,
>>> the command runs out of retries and is failed like this. The
>>> command retry counter should *not* be incremented for NCQ
>>> collateral aborts. I tried to fix this, but it is impossible as we
>>> actually do not know if this is a collateral abort or something
>>> else. The HBA events used to handle completion do not allow
>>> differentiation. Waiting on Broadcom to do something about this
>>> (the mpi3mr HBA driver has the same nasty issue).
>>
>> Maybe we should just change the mpt3 sas/mr drivers to use
>> DID_SOFT_ERROR less? In fact there's not really a whole lot of
>> DID_SOFT_ERROR users otherwise, and there's probably better status
>> codes whatever they are doing can be translated to that do not
>> increment the retry counter.
>
> The status code that does that (retry without incrementing the counter)
> is DID_IMM_RETRY. The driver has to be a bit careful about using this
> because we can get into infinite retry loops.
James,
Thank you for the information. Will have a try again at changing the driver to
use this.
--
Damien Le Moal
Western Digital Research
prev parent reply other threads:[~2025-06-04 7:31 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-29 2:50 [QUESTION] xfs, iomap: Handle writeback errors to prevent silent data corruption Yafang Shao
2025-05-29 4:25 ` Darrick J. Wong
2025-05-29 5:55 ` Yafang Shao
2025-05-30 5:17 ` Christian Brauner
2025-05-30 15:38 ` Darrick J. Wong
2025-05-31 23:02 ` Dave Chinner
2025-06-03 0:03 ` Darrick J. Wong
2025-06-06 10:43 ` Christian Brauner
2025-06-12 3:43 ` Darrick J. Wong
2025-06-12 6:29 ` Amir Goldstein
2025-07-02 18:41 ` Darrick J. Wong
2025-06-02 5:32 ` Christoph Hellwig
2025-06-03 14:35 ` Darrick J. Wong
2025-06-03 14:38 ` Christoph Hellwig
2025-05-29 4:36 ` Dave Chinner
2025-05-29 6:04 ` Yafang Shao
2025-06-02 5:38 ` Christoph Hellwig
2025-06-02 23:19 ` Dave Chinner
2025-06-03 4:50 ` Christoph Hellwig
2025-06-03 22:05 ` Dave Chinner
2025-06-04 6:33 ` Christoph Hellwig
2025-06-05 2:18 ` Dave Chinner
2025-06-05 4:51 ` Christoph Hellwig
2025-06-02 5:31 ` Christoph Hellwig
2025-06-03 3:03 ` Yafang Shao
2025-06-03 3:13 ` Matthew Wilcox
2025-06-03 3:21 ` Yafang Shao
2025-06-03 3:26 ` Matthew Wilcox
2025-06-03 3:50 ` Yafang Shao
2025-06-03 4:40 ` Christoph Hellwig
2025-06-03 5:17 ` Damien Le Moal
2025-06-03 5:54 ` Yafang Shao
2025-06-03 6:36 ` Damien Le Moal
2025-06-03 14:41 ` Christoph Hellwig
2025-06-03 14:57 ` James Bottomley
2025-06-04 7:29 ` Damien Le Moal [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ee8952a0-d0ee-47db-8012-abb2722ae7ef@kernel.org \
--to=dlemoal@kernel.org \
--cc=Damien.LeMoal@wdc.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=MPT-FusionLinux.pdl@broadcom.com \
--cc=brauner@kernel.org \
--cc=cem@kernel.org \
--cc=djwong@kernel.org \
--cc=hch@infradead.org \
--cc=laoar.shao@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=sathya.prakash@broadcom.com \
--cc=sreekanth.reddy@broadcom.com \
--cc=suganath-prabu.subramani@broadcom.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.