linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@amd64.org>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>,
	Borislav Petkov <bp@amd64.org>,
	Linux Edac Mailing List <linux-edac@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Aristeu Rozanski <arozansk@redhat.com>,
	Doug Thompson <norsk5@yahoo.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH] RAS: Add a tracepoint for reporting memory controller events
Date: Thu, 31 May 2012 12:00:05 +0200	[thread overview]
Message-ID: <20120531100005.GC14074@aftab.osrc.amd.com> (raw)
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F192F6672@ORSMSX104.amr.corp.intel.com>

On Wed, May 30, 2012 at 11:24:41PM +0000, Luck, Tony wrote:
> >         u32 grain;              /* granularity of reported error in bytes */
> > 				   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> >> 			dimm->grain = nr_pages << PAGE_SHIFT;
> 
> I'm not at all sure what we'll see digging into the chipset registers
> like EDAC does - but we do have different granularity when reporting
> via machine check banks.  That's why we have this code:
> 
>                 /*
>                  * Mask the reported address by the reported granularity.
>                  */
>                 if (mce_ser && (m->status & MCI_STATUS_MISCV)) {
>                         u8 shift = MCI_MISC_ADDR_LSB(m->misc);
>                         m->addr >>= shift;
>                         m->addr <<= shift;

That's 64 bytes max, IIRC.

> in mce_read_aux().  In practice right now I think that many errors will
> report with cache line granularity,

Yep.

> while a few (IIRC patrol scrub) will report with page (4K)
> granularity. Linux doesn't really care - they all have to get rounded
> up to page size because we can't take away just one cache line from a
> process.

I'd like to see that :-)

> > @Tony: Can you ensure us that, on Intel memory controllers, the address
> > mask remains constant at module's lifetime, or are there any events that
> > may change it (memory hot-plug, mirror mode changes, interleaving 
> > reconfiguration, ...)?
> 
> I could see different controllers (or even different channels) having
> different setup if you have a system with different size/speed/#ranks
> DIMMs ... most systems today allow almost arbitrary mix & match, and the
> BIOS will decide which interleave modes are possible based on what it
> finds in the slots.  Mirroring imposes more constraints, so you will
> see less crazy options. Hot plug for Linux reduces to just the hot add
> case (as we still don't have a good way to remove DIMM sized chunks of
> memory) ... so I don't see any clever reconfiguration possibilities
> there (when you add memory, all the existing memory had better stay
> where it is, preserving contents).

You're funny :-)

> Perhaps the only option where things might change radically is socket
> migration ... where the constraint is only that the target of the
> migration have >= memory of the source. So you might move from some
> weird configuration with mixed DIMM sizes and thus no interleave, to a
> homogeneous socket with matched DIMMs and full interleave. But from an
> EDAC level, this is a new controller on a new socket ... not a changed
> configuration on an existing socket.

Right, from the frequency of such events happening, it still sounds to
me like the perfect place for the grain value is in sysfs.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

  reply	other threads:[~2012-05-31  9:59 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-18 16:31 [PATCH EDAC v26 00/66] EDAC patches for v3.5 Mauro Carvalho Chehab
2012-05-18 16:31 ` [PATCH EDAC v26 01/66] edac: Create a dimm struct and move the labels into it Mauro Carvalho Chehab
2012-05-18 16:31 ` [PATCH EDAC v26 03/66] edac: Don't initialize csrow's first_page & friends when not needed Mauro Carvalho Chehab
2012-05-18 16:31 ` [PATCH EDAC v26 04/66] edac: move nr_pages to dimm struct Mauro Carvalho Chehab
2012-05-18 16:31 ` [PATCH EDAC v26 05/66] edac: rewrite edac_align_ptr() Mauro Carvalho Chehab
2012-05-18 16:31 ` [PATCH EDAC v26 06/66] edac.h: Add generic layers for describing a memory location Mauro Carvalho Chehab
2012-05-18 16:31 ` [PATCH EDAC v26 08/66] amd64_edac: convert driver to use the new edac ABI Mauro Carvalho Chehab
2012-05-18 16:31 ` [PATCH EDAC v26 09/66] amd76x_edac: " Mauro Carvalho Chehab
2012-05-18 16:31 ` [PATCH EDAC v26 10/66] cell_edac: " Mauro Carvalho Chehab
2012-05-18 16:31 ` [PATCH EDAC v26 11/66] cpc925_edac: " Mauro Carvalho Chehab
2012-05-18 16:31 ` [PATCH EDAC v26 12/66] e752x_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 13/66] e7xxx_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 14/66] i3000_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 15/66] i3200_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 16/66] i5000_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 17/66] i5100_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 18/66] i5400_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 19/66] i7300_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 20/66] i7core_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 21/66] i82443bxgx_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 22/66] i82860_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 23/66] i82875p_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 24/66] i82975x_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 25/66] mpc85xx_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 26/66] mv64x60_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 27/66] pasemi_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 28/66] ppc4xx_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 29/66] r82600_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 30/66] sb_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 31/66] tile_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 32/66] x38_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 33/66] edac: Remove the legacy EDAC ABI Mauro Carvalho Chehab
2012-05-18 17:51   ` Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 34/66] edac: Initialize the dimm label with the known information Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 35/66] edac: Cleanup the logs for i7core and sb edac drivers Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 36/66] i5400_edac: improve debug messages to better represent the filled memory Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 37/66] RAS: Add a tracepoint for reporting memory controller events Mauro Carvalho Chehab
2012-05-24 10:14   ` [PATCH] " Mauro Carvalho Chehab
2012-05-24 10:56     ` Borislav Petkov
2012-05-24 16:13       ` Mauro Carvalho Chehab
2012-05-24 16:17         ` Mauro Carvalho Chehab
2012-05-24 16:45         ` Borislav Petkov
2012-05-24 18:00           ` Mauro Carvalho Chehab
2012-05-29 11:58             ` Borislav Petkov
2012-05-29 14:02               ` Mauro Carvalho Chehab
2012-05-29 14:52                 ` Borislav Petkov
2012-05-29 15:23                   ` Mauro Carvalho Chehab
2012-05-30 23:24                     ` Luck, Tony
2012-05-31 10:00                       ` Borislav Petkov [this message]
2012-05-31 10:33                         ` Mauro Carvalho Chehab
2012-05-31 12:17                           ` Borislav Petkov
2012-05-31 13:56                             ` Mauro Carvalho Chehab
2012-05-31 14:22                               ` Borislav Petkov
2012-05-31 14:44                                 ` Mauro Carvalho Chehab
2012-05-31 14:54                                   ` Borislav Petkov
2012-05-31 15:01                                     ` Mauro Carvalho Chehab
2012-05-31 15:14                                       ` Borislav Petkov
2012-05-31 16:14                                         ` Mauro Carvalho Chehab
2012-05-31 17:13                                           ` Borislav Petkov
2012-05-31 18:04                                             ` Mauro Carvalho Chehab
2012-05-31 18:33                                               ` Aristeu Rozanski
2012-05-31 19:37                                               ` Borislav Petkov
2012-05-31 19:32                                             ` Steven Rostedt
2012-05-31 19:42                                               ` Borislav Petkov
2012-05-31 20:11                                                 ` Steven Rostedt
2012-05-31 20:18                                                   ` Borislav Petkov
2012-05-31 20:52                                                     ` Luck, Tony
2012-06-01  9:10                                                       ` Borislav Petkov
2012-06-01  9:40                                                         ` Chen Gong
2012-06-01 12:15                                                         ` Mauro Carvalho Chehab
2012-06-01 15:42                                                         ` Luck, Tony
2012-06-01 16:00                                                           ` Borislav Petkov
2012-06-01 18:21                                                             ` Luck, Tony
2012-06-01 23:00                                                               ` Borislav Petkov
2012-06-01 23:19                                                                 ` Luck, Tony
2012-06-01 23:28                                                                   ` Borislav Petkov
2012-05-31 16:51                         ` Luck, Tony
2012-05-31 17:20                           ` Borislav Petkov
2012-05-31 18:14                             ` Luck, Tony
2012-05-31 19:26                               ` Borislav Petkov
2012-05-31 18:24                             ` Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 38/66] i5000_edac: Fix the logic that retrieves memory information Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 39/66] e752x_edac: provide more info about how DIMMS/ranks are mapped Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 40/66] edac: Rename the parent dev to pdev Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 41/66] edac: use Documentation-nano format for some data structs Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 42/66] edac: rewrite the sysfs code to use struct device Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 43/66] mpc85xx_edac: convert sysfs logic " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 44/66] amd64_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 45/66] i7core_edac: convert it " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 46/66] edac: Get rid of the old kobj's from the edac mc code Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 47/66] edac: add a new per-dimm API and make the old per-virtual-rank API obsolete Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 48/66] edac: add a sysfs node to report the maximum location for the system Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 49/66] edac: Add debufs nodes to allow doing fake error inject Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 50/66] edac: Move grain/dtype/edac_type calculus to be out of channel loop Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 51/66] i82975x_edac: Test nr_pages earlier to save a few CPU cycles Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 52/66] i5100_edac: Fix a warning when compiled with 32 bits Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 53/66] i7300_edac: Get rid of some wrongly-solved rebase conflict Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 54/66] edac: Only expose csrows/channels on legacy API if they're populated Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 55/66] edac: change the mem allocation scheme to make Documentation/kobject.txt happy Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 56/66] i7core_edac: " Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 57/66] edac: move documentation ABI to ABI/testing/sysfs-devices-edac Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 58/66] Edac: Add ABI Documentation for the new device nodes Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 59/66] i5000: Fix the fatal error handling Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 60/66] i7core: fix ranks information at the per-channel struct Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 61/66] edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 62/66] edac: Use more normal debugging macro style Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 63/66] edac: Convert debugfX to edac_dbg(X, Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 64/66] edac_mc: Cleanup per-dimm_info debug messages Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 65/66] edac: Increase version to 3.0.0 Mauro Carvalho Chehab
2012-05-18 16:32 ` [PATCH EDAC v26 66/66] edac_mc: check for allocation failure in edac_mc_alloc() Mauro Carvalho Chehab
2012-05-18 16:46 ` [PATCH EDAC v26 00/66] EDAC patches for v3.5 Borislav Petkov
2012-05-18 17:43   ` Mauro Carvalho Chehab
2012-05-18 17:53     ` Borislav Petkov
2012-05-28 15:46       ` Mauro Carvalho Chehab
2012-05-28 20:36         ` Borislav Petkov
2012-05-28 23:13           ` Mauro Carvalho Chehab
2012-05-29  2:40             ` Chen Gong
2012-05-29 11:45               ` Mauro Carvalho Chehab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120531100005.GC14074@aftab.osrc.amd.com \
    --to=bp@amd64.org \
    --cc=arozansk@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab@redhat.com \
    --cc=mingo@redhat.com \
    --cc=norsk5@yahoo.com \
    --cc=rostedt@goodmis.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).