From: James Morse <james.morse@arm.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Rui Zhao <ruizhao@microsoft.com>, Sasha Levin <sashal@kernel.org>,
"mchehab@kernel.org" <mchehab@kernel.org>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Linux Kernel <linux-kernel@microsoft.com>,
"will.deacon@arm.com" <will.deacon@arm.com>,
"okaya@kernel.org" <okaya@kernel.org>
Subject: EDAC, dmc520:: add DMC520 EDAC driver
Date: Tue, 5 Feb 2019 17:31:24 +0000
Message-ID: <e2105800-db52-513f-e163-3769a26d1dfb@arm.com>
Hi Boris,
On 23/01/2019 18:46, Borislav Petkov wrote:
> On Wed, Jan 23, 2019 at 06:36:23PM +0000, James Morse wrote:
>>> Would like to know what's the impact if this error happens, and how to fit it
>>> with current reporting in EDAC core.
>>
>> At a guess the interrupt triggers when link_err_count increases. (link_err has
>> an overflow bit, so the interrupt must be related to a counter).
>>
>> If we could associate a link with a layer in edac, we could report errors
>> against that point. But I've no idea how 'links' correspond with 'ranks and banks'!
> Well, I have no clue what kind of links you guys are talking but if
> those are per-chance coherent links used by cores to communicate in a
> coherent fabric, or cores and devices, what would showing those errors
> to the user bring ya?
(I mentioned this because it's the next interrupt in the register; it's an example
of something that may be added for another platform in the future, which affects
the DT and probing.)
> Or are ya talking about different kinds of links?
... whatever the manual means by 'link', good point, it could be the
interconnect side.
'alert_mode_next', in the feature control register, talks about DIMM training,
and says 'dfi_err' is treated as a link error. DFI is defined earlier as the 'DDR
PHY interface', so these must be links between the DMC520 and DDR.
> In any case, the first question to ask would be, can some agent or the
> user do something with the information that X or Y link errors happened?
>
> If not, then why bother?
> If yes, then that's a different story.
I agree. Surely if the DIMMs are socketed, link errors are another reason to
replace the DIMM.
It looks like this doesn't matter on Rui's platform,
Thanks,
James