From: Brijesh Singh <brijeshkumar.singh@amd.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: brijeshkumar.singh@amd.com, linux-arm-kernel@lists.infradead.org,
robh+dt@kernel.org, pawel.moll@arm.com,
ijc+devicetree@hellion.org.uk, galak@codeaurora.org,
dougthompson@xmission.com, bp@alien8.de, mchehab@osg.samsung.com,
devicetree@vger.kernel.org, guohanjun@huawei.com,
andre.przywara@arm.com, arnd@arndb.de, sboyd@codeaurora.org,
linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org
Subject: Re: [PATCH v4] EDAC: Add ARM64 EDAC
Date: Fri, 30 Oct 2015 11:26:58 -0500 [thread overview]
Message-ID: <56339A52.8060701@amd.com> (raw)
In-Reply-To: <20151028164053.GG25451@leverpostej>
Hi Mark,
>> +
>> +Required properties:
>> +- compatible: Should be "arm,cortex-a57-edac" or "arm,cortex-a53-edac"
>> +
>> +Example:
>> + edac {
>> + compatible = "arm,cortex-a57-edac";
>> + };
>> +
>
> This is insufficient for big.LITTLE, no interrupt is possible, and we
> haven't defined the rules for accessing the registers (e.g. whether
> write backs are permitted).
>
> Please see my prior comments [1] on those points.
>
> If we're going to use this feature directly within the kernel, we need
> to consider the envelope of possible implementations rather than your
> use-case alone.
>
I have looked at possibility of pushing correctable error logging in the
firmware; but given current hardware limitation it seems like OS is the best
place to implement it. Let me summaries the issues we are running into:
* Correctable errors does not generate any interrupt:
If we have to implement error parsing inside the firmware then work need
to be split between OS and firmware. Maybe OS can call SMC instruction to
dial into firmware and then firmware can check error syndrome registers;
if it finds correctable error then build HEST table. This method will introduce
performance issue because it require OS executing SMC every 100ms or so to just
poll for correctable error. If you have any other recommendation then please share it.
> * Interaction with firmware
> - When/do we handle interrupts?
We can a properties in dt bindings:
1) "num-interrupts = 1" - number of interrupt count. One interrupts per cluster
e.g if you have 4 cluster then num-interrupts=4.
2) interrupts = <0, 92, 0> <0, 94, 0> <0, 96, 0> <0, 98, 0> // interrupt mapping
If num-interrupts = 0, then firmware handles interrupts. Optionally we can use HEST FIRMWARE-FIRST
bit, if bit is set then firmware is handling the interrupt otherwise use DT information.
>
> - When is it valid to write back and clear an error? We should not do
> this behind the back of any firmware that owns the interface.
As per A57 TRM is concerned you are right both the correctable and uncorrectable
error needs to clear VALID bit in L1/L2 syndrome registers. So yes we need to define
a rule for accessing the registers. I can think of two possible approach here:
1) add "error-syndrome-reg-write-access=1" property in dt.
* if '1' then OS has exclusive write backs access to error syndrome register
* if '0' then OS will not clear the valid bit on fatal error
The handler looks like this:
parse_error_syndrome () {
val = read_cpumerrsr
if (!IS_VALID(val))
return
/* log the error details */
/* if fatal error and OS does not have exclusive write back access */
if (IS_FATAL(val) && !error-syndrom-reg-write-access)
return;
val = ~(1UL << 31); /* clear valid bit */
}
2) Use HEST FIRMWARE-FIRST bit field, if the bit is set then OS should not clear
the valid bit on fatal error and similarly if bit is clear then OS clears the VALID bit.
Since firmware will never handle the correctable error hence its always safe to clear
the VALID bit on non-fatal error. If you have any other suggestions then please share it.
I am not pushing my use-case only; I am trying to work through current hardware
limitation and still support all the possibilities. I am open to hear your suggestions.
I am also not well versed on big.LITTILE CPU, so you may need to point me on right
direction as we progress. My testing is limited to Cortex A57.
>
> I don't think the use of old_mask is sufficient here, given the mapping
> of logical to physical ID is arbitrary. For example, we could have CPUs
> 0,5,6,7 in one cluster, and CPUs 1,2,3,4 in another, and in that case
> we'd check the first cluster twice.
>
Noted. I should use physical ID instead of logical mapping.
> This also is wrong for big.LITTLE; we can't necessarily check on every
> CPU.
>
next prev parent reply other threads:[~2015-10-30 16:26 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-28 16:13 [PATCH v4] EDAC: Add ARM64 EDAC Brijesh Singh
[not found] ` <1446048829-3359-1-git-send-email-brijeshkumar.singh-5C7GfCeVMHo@public.gmane.org>
2015-10-28 16:40 ` Mark Rutland
2015-10-30 16:26 ` Brijesh Singh [this message]
2015-10-30 17:06 ` Mark Rutland
2015-10-30 17:32 ` Mark Rutland
2015-10-30 17:51 ` Borislav Petkov
2015-10-30 19:12 ` Brijesh Singh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56339A52.8060701@amd.com \
--to=brijeshkumar.singh@amd.com \
--cc=andre.przywara@arm.com \
--cc=arnd@arndb.de \
--cc=bp@alien8.de \
--cc=devicetree@vger.kernel.org \
--cc=dougthompson@xmission.com \
--cc=galak@codeaurora.org \
--cc=guohanjun@huawei.com \
--cc=ijc+devicetree@hellion.org.uk \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mchehab@osg.samsung.com \
--cc=pawel.moll@arm.com \
--cc=robh+dt@kernel.org \
--cc=sboyd@codeaurora.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).