From: Andi Kleen <andi@firstfloor.org>
To: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>,
Borislav Petkov <petkovbb@googlemail.com>,
mingo@redhat.com, hpa@zytor.com, linux-kernel@vger.kernel.org,
andi@firstfloor.org, tglx@linutronix.de,
Andreas Herrmann <andreas.herrmann3@amd.com>,
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
linux-tip-commits@vger.kernel.org,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Frédéric Weisbecker <fweisbec@gmail.com>,
Aristeu Rozanski <aris@redhat.com>,
Doug Thompson <norsk5@yahoo.com>,
Huang Ying <ying.huang@intel.com>,
Arjan van de Ven <arjan@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: Re: [tip:x86/mce] x86, mce: Rename cpu_specific_poll to mce_cpu_specific_poll
Date: Wed, 24 Feb 2010 21:28:27 +0100
Message-ID: <20100224202827.GA25414@basil.fritz.box>
In-Reply-To: <4B856500.9030108@infradead.org>
On Wed, Feb 24, 2010 at 02:42:24PM -0300, Mauro Carvalho Chehab wrote:
> Mauro Carvalho Chehab wrote:
> > The EDAC data model needs some discussion, as, currently, the memory is represented
> > per csrow, and modern MCUs don't allow that level of control (and it doesn't
> > make much sense to represent it this way, as you can't replace a csrow). It is
> > better to use the DIMM as the minimum unit.
>
> Just to start the data model, this is what a typical EDAC driver presents:
First, I suspect the result wouldn't be very compatible with current EDAC,
so it might be less confusing to give it a new name.
>
> /sys/devices/system/edac/mc/mc0/
> |-- ce_count
> |-- ce_noinfo_count
> |-- csrow0
> | |-- ce_count
> | |-- ch0_ce_count
> | |-- ch0_dimm_label
> | |-- ch1_ce_count
> | |-- ch1_dimm_label
> | |-- ch2_ce_count
> | |-- ch2_dimm_label
> | |-- ch3_ce_count
> | |-- ch3_dimm_label
> | |-- dev_type
> | |-- edac_mode
> | |-- mem_type
> | |-- size_mb
> | `-- ue_count
> |-- csrow1
> | |-- ce_count
> | |-- ch0_ce_count
> | |-- ch0_dimm_label
> | |-- ch1_ce_count
> | |-- ch1_dimm_label
> | |-- ch2_ce_count
> | |-- ch2_dimm_label
> | |-- ch3_ce_count
> | |-- ch3_dimm_label
> | |-- dev_type
> | |-- edac_mode
> | |-- mem_type
> | |-- size_mb
> | `-- ue_count
> |-- device -> ../../../../pci0000:3f/0000:3f:03.0
> |-- mc_name
> |-- reset_counters
> |-- sdram_scrub_rate
> |-- seconds_since_reset
> |-- size_mb
> |-- ue_count
> `-- ue_noinfo_count
>
> In the case of i7core_edac, there's no way to identify csrows by using
> the public registers (I have no idea whether there is any undocumented register
> for it). So, the driver maps one DIMM per "edac csrow".
Some thoughts on this:
One of my goals would be that a fallback driver can create the information
from SMBIOS alone (that is, without error counts, although in some cases
those can be mapped from corrected MCEs).
I think I would prefer a flat model:
socket <-> DIMM
The socket could be arbitrary: either a CPU socket or an external memory
controller. Each DIMM would then have a "path" string that can be an
arbitrary path to the DIMM, depending on the system: e.g. on a system with
on-board buffers it could include the buffer, the channel to the buffer and
the channel behind the buffer. The string could be free-form like a path
name, e.g.
FOO-CH1/DDR-CH2/DIMMa
I don't think it makes sense to encode that in the file system hierarchy
itself: that would just make it harder to parse without any real
benefit, since hardware changes could completely change the directory
structure. I would just keep it as an attribute file in the leaf nodes.
I don't think it makes much sense to have anything below the DIMM (e.g. ranks).
These are typically hard to get in a generic way, and the most important
information at this level is which unit to replace.
The other important part is a good way to manage fallback counters
for when the target DIMM cannot be identified (that happens occasionally,
for various reasons).
In this case we would need an "aggregator object" with a truncated path
that still maintains the same counts.
Then, of course, there still needs to be an event interface that lets someone
actually do something with all these counts without having to poll.
The most interesting cases always need user space support, so of course
user space needs the raw events too.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.