From: Bjorn Helgaas <helgaas@kernel.org>
To: Alexey Bogoslavsky <Alexey.Bogoslavsky@wdc.com>
Cc: Keith Busch <kbusch@kernel.org>,
linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
Christoph Hellwig <hch@lst.de>,
Grant Grundler <grundler@chromium.org>,
Rajat Khandelwal <rajat.khandelwal@linux.intel.com>
Subject: Re: [PATCH 1/1] PCI/AER: Ignore correctable error reports for SN730 WD SSD
Date: Tue, 11 Apr 2023 17:15:04 -0500
Message-ID: <20230411221504.GA4180865@bhelgaas>
In-Reply-To: <DM6PR04MB647368572FC2A56C2869B1758BC69@DM6PR04MB6473.namprd04.prod.outlook.com>
[+cc Grant, Rajat]
On Tue, Jan 17, 2023 at 06:15:28PM +0000, Alexey Bogoslavsky wrote:
> >From: Keith Busch <kbusch@kernel.org>
> >Sent: Tuesday, January 17, 2023 5:55 PM
> >To: Alexey Bogoslavsky <Alexey.Bogoslavsky@wdc.com>
> >Cc: linux-pci@vger.kernel.org; bhelgaas@google.com; 'hch@lst.de' <hch@lst.de>
> >Subject: Re: [PATCH 1/1] PCI/AER: Ignore correctable error reports for SN730 WD SSD
>
> >On Mon, Jan 16, 2023 at 06:32:54PM +0000, Alexey Bogoslavsky wrote:
> >> From: Alexey Bogoslavsky <mailto:Alexey.Bogoslavsky@wdc.com>
> >>
> >> A bug was found in SN730 WD SSD that causes occasional false AER reporting
> >> of correctable errors. While functionally harmless, this causes error
> >> messages to appear in the system log (dmesg) which, in turn, causes
> >> problems in automated platform validation tests. Since the issue can not
> >> be fixed by FW, customers asked for correctable error reporting to be
> >> quirked out in the kernel for this particular device.
> >
> >> The patch was manually verified. It was checked that correctable errors
> >> are still detected but ignored for the target device (SN730), and are both
> >> detected and reported for devices not affected by this quirk.
>
> >If you're just going to have the kernel ignore these, are you not able
> >to suppress the ERR_COR message at the source? Have the following
> >options been tried?
>
> > a. Disabling Correctable Error Reporting Enable in Device Control
> > Register; i.e. mask out PCI_EXP_DEVCTL_CERE.
> > b. Setting AER Correctable Error Mask Register to all 1's
>
> >I think it's usually possible for firmware to hardwire these. If the
>
> I believe these options were discussed but deemed non-viable. I'll
> double check anyway
>
> >If firmware can't do that, quirking the kernel to always disable
> >reporting sounds like a better option. If either of the above fail
> >to suppress the error messages, then I guess having the kernel
> >ignore it is the only option.
>
> This could probably work. I'll discuss this with our FW team to make
> sure the issue can be resolved this way. Thank you

Any resolution on this FW possibility?

We have patches in progress to rate-limit correctable error messages
and make them KERN_INFO instead of KERN_WARNING [1], but I don't think
that's going to be a good enough solution for you because nobody wants
to see even an informational message every 5 seconds if the message is
useless.

If firmware on the device can turn off these errors, that would be the
best solution.  If not, I think your quirk is a reasonable approach
and just needs a little polishing per the previous comments.

Bjorn

[1] https://lore.kernel.org/r/20230317175109.3859943-1-grundler@chromium.org
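
For reference, options (a) and (b) from Keith's quoted message each come
down to a single register update. The sketch below is a userspace
illustration only, not the patch under discussion and not kernel code:
struct fake_dev_regs and both helper functions are hypothetical names
that stand in for the device's Device Control register and AER
Correctable Error Mask register. Only the PCI_EXP_DEVCTL_CERE value is
taken from include/uapi/linux/pci_regs.h. In a real kernel quirk the
same updates would presumably go through config-space accessors such as
pcie_capability_clear_word() and pci_write_config_dword().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Correctable Error Reporting Enable bit in the PCIe Device Control
 * register; value as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_DEVCTL_CERE 0x0001

/* Hypothetical in-memory stand-in for the two registers of interest. */
struct fake_dev_regs {
	uint16_t devctl;   /* PCIe Device Control register */
	uint32_t cor_mask; /* AER Correctable Error Mask register */
};

/* Option (a): stop the device from sending ERR_COR messages by
 * clearing only the CERE bit and leaving the other bits untouched. */
void disable_cor_reporting(struct fake_dev_regs *regs)
{
	assert(regs != NULL);
	regs->devctl &= (uint16_t)~PCI_EXP_DEVCTL_CERE;
}

/* Option (b): mask every correctable error source by writing all 1's
 * to the AER Correctable Error Mask register. */
void mask_all_cor_errors(struct fake_dev_regs *regs)
{
	assert(regs != NULL);
	regs->cor_mask = 0xffffffffu;
}
```

Either update suppresses the message at the source, so the kernel never
sees the report, whereas the quirk in the patch still receives the
report and then ignores it.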