linux-acpi.vger.kernel.org archive mirror
From: "Alex G." <mr.nuke.me@gmail.com>
To: Borislav Petkov <bp@alien8.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	alex_gagniuc@dellteam.com, austin_bolen@dell.com,
	shyam_iyer@dell.com, "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Len Brown <lenb@kernel.org>, Tony Luck <tony.luck@intel.com>,
	Tyler Baicar <tbaicar@codeaurora.org>,
	Will Deacon <will.deacon@arm.com>,
	James Morse <james.morse@arm.com>,
	Shiju Jose <shiju.jose@huawei.com>,
	"Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>,
	Dongjiu Geng <gengdongjiu@huawei.com>,
	ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v6 1/2] acpi: apei: Rename ghes_severity() to ghes_cper_severity()
Date: Tue, 22 May 2018 10:22:19 -0500
Message-ID: <9b3823fc-a660-a619-68b9-43b879f81b05@gmail.com>
In-Reply-To: <20180522145426.GG5512@pd.tnic>



On 05/22/2018 09:54 AM, Borislav Petkov wrote:
> On Tue, May 22, 2018 at 09:39:15AM -0500, Alex G. wrote:
>> No, the problem is with the current approach, not with mine. The problem
>> is trying to handle the error outside of the existing handler. That's a
>> no-no, IMO.
> 
> Let me save you some time: until you come up with a proper solution for
> *all* PCIe errors so that the kernel can correctly decide what to do for
> each error based on its actual severity, consider this NAKed.

I do have a proper solution for _all_ PCIe errors. In fact, we discussed
several valid approaches already.

> I don't care about outside or inside of the handler 

I do. I have a handler that can handle (no pun intended) errors. I want
to use the same code path in native and GHES cases. If I allow ghes.c to
take different decisions than what aer_do_recovery() would, I've failed.
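A minimal sketch of the "same code path" idea, for illustration only. Every name here (sketch_decide_recovery(), sketch_handle_pcie_error(), the enums) is hypothetical and is not the kernel's API; the only assumption is that the recovery decision depends on the PCIe severity and never on whether the error was reported natively or through GHES.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; not the kernel's definitions. */
enum sketch_source { SKETCH_SRC_NATIVE_AER, SKETCH_SRC_GHES };
enum sketch_pcie_sev {
	SKETCH_PCIE_CORRECTABLE,
	SKETCH_PCIE_NONFATAL,
	SKETCH_PCIE_FATAL,
};
enum sketch_action {
	SKETCH_ACT_LOG_ONLY,      /* nothing to recover, just report */
	SKETCH_ACT_DRIVER_RECOVER, /* let the driver recover, link is usable */
	SKETCH_ACT_RESET_LINK,    /* link may be unusable, reset then recover */
};

/* Single place where the recovery policy lives. */
static enum sketch_action sketch_decide_recovery(enum sketch_pcie_sev sev)
{
	switch (sev) {
	case SKETCH_PCIE_CORRECTABLE:
		return SKETCH_ACT_LOG_ONLY;
	case SKETCH_PCIE_NONFATAL:
		return SKETCH_ACT_DRIVER_RECOVER;
	case SKETCH_PCIE_FATAL:
	default:
		return SKETCH_ACT_RESET_LINK;
	}
}

/*
 * Both reporting paths funnel into the same decision. The source is
 * accepted only to show that it is deliberately ignored.
 */
enum sketch_action sketch_handle_pcie_error(enum sketch_source src,
					    enum sketch_pcie_sev sev)
{
	(void)src;
	return sketch_decide_recovery(sev);
}
```

With this shape, a firmware-first report and a native report of the same error cannot end up with different outcomes, because there is only one function that decides.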

> - this thing needs to be done properly

Exactly!

> and not just to serve your particular use case of
> abrupt removal of devices causing PCIe errors, and punish the rest.

I think you're confused about what I'm actually trying to do. Or maybe
you're confused about how PCIe errors work. That's understandable. PCIe
uses the term "fatal" for errors that may make the link unusable and
may require a link reset, whereas in most other specs "fatal" means
"on fire". I understand your confusion, and I hope I cleared it up.

You're trying to make the case that surprise removal is my only concern
and use case, because that's the example that I gave. It makes your
argument stronger, but it's wrong. You don't know our test setup, and
all the things I'm testing for, and whenever I try to tell you, you fall
back to the 'surprise removal' example.

I don't know why you'd think Dell would pay me to work on this if I were
to allow things like silent data corruption to creep in. This isn't a
traditional company from Redmond, Washington.

> I especially don't want to have the case where a PCIe error is *really*
> fatal and then we noodle in some handlers debating about the severity
> because it got marked as recoverable intermittently and end up causing
> data corruption on the storage device. Here's a real no-no for ya.

I especially don't want a kernel maintainer who hasn't even read the
recovery handler (let alone the spec around which the handler was
written) tell me how the recovery handler works and what it's supposed
to do (see, I can be an ass).

PCIe errors really are fatal. They might need to unload the driver and
remove the device. But somebody set the questionable policy that
"fatal"=="panic", and that is completely inappropriate for a larger
class of errors -- PCIe happens to be the easiest example to pick on.
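The policy objection can be sketched as a small decision function. This is an illustration under stated assumptions, not the patch under review: the names (sketch_should_panic() and the enums) are hypothetical, and the sketch carves out only PCIe sections, leaving every other fatal report on the panic path as an assumed default.

```c
#include <stdbool.h>

/* Hypothetical severity and section types; not the kernel's definitions. */
enum sketch_cper_sev {
	SKETCH_SEV_CORRECTED,
	SKETCH_SEV_RECOVERABLE,
	SKETCH_SEV_FATAL,
};
enum sketch_section { SKETCH_SEC_PCIE, SKETCH_SEC_OTHER };

/*
 * Decide whether a firmware-reported error should panic the system.
 * A "fatal" PCIe error may need a link reset or device removal, which
 * a recovery handler can attempt, so it is not treated as panic-worthy.
 */
bool sketch_should_panic(enum sketch_section sec, enum sketch_cper_sev sev)
{
	if (sev != SKETCH_SEV_FATAL)
		return false;	/* corrected/recoverable never panic here */
	if (sec == SKETCH_SEC_PCIE)
		return false;	/* hand off to the PCIe recovery handler */
	return true;		/* assumed default for other fatal sections */
}
```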

And even you realize that the argument that a panic() will somehow
prevent data corruption is complete noodle sauce.

Alex

