From: Mauro Carvalho Chehab <mchehab@redhat.com>
To: EDAC devel <linux-edac@vger.kernel.org>
Cc: Borislav Petkov <bp@amd64.org>, Tony Luck <tony.luck@intel.com>,
	Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>
Subject: [PATCHv7] EDAC core changes in order to properly report errors from all types of memory controllers
Date: Tue, 06 Mar 2012 21:20:27 -0300
Message-ID: <4F56A9CB.2010504@redhat.com>
In-Reply-To: <20120306121616.GB11661@aftab>

Here is version 7 of the EDAC core changes.

Version 6 was skipped due to a small issue in the series.

This series has only "cosmetic" changes over the last one. No
functional changes. What's different:

- Instead of 43 patches, this series contains 21 patches. Most of the
  dirty history was removed. It is now cleaner for review.

- A few coding style changes were applied (24 lines changed, mostly in
  comments that exceeded 80 columns).

- The first approach to address the needs of non-csrow-based memory
  controllers was removed from the history. This made the series
  cleaner, as several patches could be folded, improving patch
  readability;

- patch descriptions were changed/improved.

The series now contains:

- 2 fix patches over upstream:
      edac/ppc4xx_edac: Fix compilation
      i5400_edac: Avoid calling pci_put_device() twice

- 1 comment improvement patch:
      edac: Improve the comments to better describe the memory concepts

- 1 internal struct renaming patch:
      edac: rename channel_info to rank_info

- 6 patches that prepare the internal structures to represent the memory
  properties per dimm, instead of per csrow. This is needed for modern
  controllers, where the memories at different channels may be different:
      edac: Create a dimm struct and move the labels into it
      edac: Add per dimm's sysfs nodes
      edac: move dimm properties to struct memset_info
      edac: Don't initialize csrow's first_page & friends when not needed
      edac: move nr_pages to dimm struct
      edac: Add per-dimm sysfs show nodes

- 2 patches that add proper support for FB-DIMM and for the modern Intel
  DDR2/DDR3 memory controllers: 
      edac: Fix core support for MC's that see DIMMS instead of ranks
      edac: Export MC hierarchy counters for CE and UE

- 1 log cleanup patch, that prepares for using a MCA based tracepoint:
      edac: Cleanup the logs for i7core and sb edac drivers

- 2 debug improvement patches:
      edac: Add a sysfs node to test the EDAC error report facility
      edac: Initialize the dimm label with the known information

- 5 post-FB-DIMM patches that clean, fix and/or improve a few random things:
      edac_mc_sysfs: don't create inactive errcount sysfs nodes
      i5000_edac: Fix the logic that retrieves memory information
      edac: add a sysfs node that stores the max possible memory location
      edac: Call the sysfs nodes as "rank" instead of "dimm" if chip select is used
      i5400_edac: improve debug messages to better represent the filled memory

- 1 patch that adds a trace event to report memory errors:
      events/hw_event: Create a Hardware Events Report Mecanism (HERM)

While the preliminary tests are working OK on the machines I'm testing,
I haven't finished testing yet, so some other fix patches may be needed.
I'll add them at the end of the series, as rebasing a large patchset
like this is very time-consuming.

So, I think it is time to merge it into -next, in order to give it more
visibility. I'll add it there tomorrow if I get no complaints.

The above changes since commit 805a6af8dba5dfdd35ec35dc52ec0122400b2610:

  Linux 3.2 (2012-01-04 15:55:44 -0800)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac.git hw_events_v7


On 06-03-2012 09:16, Borislav Petkov wrote:
> On Tue, Mar 06, 2012 at 08:31:36AM -0300, Mauro Carvalho Chehab wrote:

>> For a FB-DIMM controller, the number of ranks is just a detail associated with
>> a given DIMM slot, as the memory is selected by slot, and not by rank.
>>
>> So, the logic is completely broken for single-rank memories and half-broken for 
>> double-rank ones.
> 
> I'm still wondering whether FBDIMM-based drivers should get their own
> EDAC infrastructure and own nomenclature instead of fitting them in the
> existing scheme...

A typical driver using csrow/channel describes the memory based on ranks. 
A FB-DIMM memory controller describes memory based on DIMMs. But those
are just the two opposite sides of the issue. There are a number of other
situations between them. Creating an FBDIMM-based infrastructure won't
cover them.

There are "non-typical" DDR2/DDR3 drivers that also describe the memory
internally using DIMMs, due to several factors:
	1) a rank is not a FRU. The FRU is a DIMM;
	2) several memory controllers hide the ranks information;
	3) some memory controllers have the number of ranks as a property
	   for a dimm;
	4) Some memory controllers allow using different dimms on separate
	   channels[1]. So, the memory at slot 0 at channel 0 can be different
	   than the one at channel 1.

[1] probably, there are some limits on it, depending on how the memory
    channels are interlaced, but it seems that the Intel memory controllers
    with 3 or 4 channels allow the usage of different memory sticks on
    each channel or channel pair.

After analyzing all EDAC drivers, the "typical" case is actually a minority,
nowadays.

Also, the upstream version currently has a per-rank memory label, which is
very bad, as two ranks on the same DIMM may receive two different labels.

So, it is actually better to convert the existing drivers to internally
represent the memory DIMMs.
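
As a rough illustration of the idea (a minimal sketch only: the struct and
function names below are hypothetical and are not the structs used by the
patches), the label and the other memory properties live in a per-DIMM
object, and a rank merely points at the DIMM it belongs to. Two ranks on
the same DIMM then cannot end up with different labels, and the DIMM at
slot 0 of channel 0 can have different properties from the one at slot 0
of channel 1:

```c
#include <stdio.h>

#define SKETCH_LABEL_LEN 32

/* Hypothetical per-DIMM object: the DIMM is the FRU, so it owns the label. */
struct dimm_sketch {
	char label[SKETCH_LABEL_LEN];
	unsigned int channel;
	unsigned int slot;
	unsigned int nr_ranks;	/* the number of ranks is a DIMM property */
	unsigned long nr_pages;	/* size is tracked per DIMM, not per csrow */
};

/* Hypothetical per-rank object: it has no label of its own. */
struct rank_sketch {
	struct dimm_sketch *dimm;
};

/* Fill in a DIMM and derive a default label from its location. */
void dimm_sketch_init(struct dimm_sketch *d, unsigned int channel,
		      unsigned int slot, unsigned int nr_ranks,
		      unsigned long nr_pages)
{
	d->channel = channel;
	d->slot = slot;
	d->nr_ranks = nr_ranks;
	d->nr_pages = nr_pages;
	snprintf(d->label, sizeof(d->label), "channel#%u slot#%u",
		 channel, slot);
}

/* Every rank reports the label of the DIMM that contains it. */
const char *rank_sketch_label(const struct rank_sketch *r)
{
	return r->dimm->label;
}
```

With this layout, a csrow/channel-based driver can still enumerate ranks,
while a driver for a controller that only sees DIMMs fills one
dimm_sketch per slot and ignores the rank level.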


Regards,
Mauro
