linux-kernel.vger.kernel.org archive mirror
From: Mauro Carvalho Chehab <mchehab@redhat.com>
To: Borislav Petkov <bp@amd64.org>
Cc: Linux Edac Mailing List <linux-edac@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Aristeu Rozanski <arozansk@redhat.com>,
	Doug Thompson <norsk5@yahoo.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Ingo Molnar <mingo@redhat.com>,
	"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [PATCH] RAS: Add a tracepoint for reporting memory controller events
Date: Tue, 29 May 2012 12:23:23 -0300
Message-ID: <4FC4E9EB.5030801@redhat.com>
In-Reply-To: <20120529145245.GG29157@aftab.osrc.amd.com>

On 29-05-2012 11:52, Borislav Petkov wrote:
> On Tue, May 29, 2012 at 11:02:10AM -0300, Mauro Carvalho Chehab wrote:
>> It seems you were unable to read the comments in the function that fills dimm->grain:
>>
>> 	/*
>> 	 * The dram rank boundary (DRB) reg values are boundary addresses
>> 	 * for each DRAM rank with a granularity of 64MB.  DRB regs are
>> 	 * cumulative; the last one will contain the total memory
>> 	 * contained in all ranks.
> 
> This looks like a bug:
> 
> "The DRAM Rank Boundary Register defines the upper boundary address
> of each DRAM rank with a granularity of 32 MB. Each rank has its own
> single-byte DRB register. These registers are used to determine which
> chip select will be active for a given address."
> 
> This is from http://www.intel.com/Assets/PDF/datasheet/306828.pdf which
> is 955X but it should be documenting the same thing - DRB.

Maybe the i3200 is similar to the 955X. I don't know, as I didn't write this driver.

> Now, if I'm reporting an error address and I'm saying "you had an error
> at X, but this error is somewhere in the X+64MB region", then I can
> simply say which rank it is. And we're doing that already with the
> layer-things.

That doesn't make sense, as a rank is bigger than 64 MB. I suspect that the
word "rank" is used to indicate something else, like the DRAM bank.

If so, an address within a 64 MB region could be used to identify the DRAM
chip.
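To make the cumulative-boundary scheme from the quoted driver comment concrete, here is a minimal C sketch of mapping an address to the rank whose DRB range contains it. The function name drb_addr_to_rank() and the DRB_GRANULARITY constant are hypothetical. The unit is set to 64 MB per the driver comment, while the 955X datasheet quoted above says 32 MB, so it would need to match whatever the i3200 actually uses.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: DRB registers hold cumulative upper boundaries,
 * one byte per rank, in units of DRB_GRANULARITY bytes. The unit is an
 * assumption: the driver comment says 64 MB, the 955X datasheet 32 MB.
 */
#define DRB_GRANULARITY (64ULL << 20)

/*
 * Return the rank whose [lower, upper) boundary range contains addr,
 * or -1 if addr lies beyond the last boundary (total memory).
 */
static int drb_addr_to_rank(const uint8_t *drb, int nr_ranks, uint64_t addr)
{
	uint64_t lower = 0;

	for (int rank = 0; rank < nr_ranks; rank++) {
		uint64_t upper = (uint64_t)drb[rank] * DRB_GRANULARITY;

		/* An empty rank has upper == lower and never matches. */
		if (addr >= lower && addr < upper)
			return rank;
		lower = upper;
	}
	return -1;
}
```

For example, with `drb[] = { 8, 16, 16, 32 }` (ranks of 512 MB, 512 MB, empty, 1 GB), address 0x50000000 maps to rank 3, because rank 2 adds no capacity.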

> 
> [ … ]
> 
>> That means that any correlation function used by a stochastic process
>> analysis will need to take the grain into account, in order to detect
>> whether a series of errors is due to random noise, or whether it is due
>> to a physical problem in the device.
> 
> Dude, stop talking crap and concentrate. On which planet is granularity
> of the error 64 MB?
> 
> From <Documentation/edac.txt>:
> 
> ============================================================================
> SYSTEM LOGGING
> 
> If logging for UEs and CEs are enabled then system logs will have
> error notices indicating errors that have been detected:
> 
> EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0,
> channel 1 "DIMM_B1": amd76x_edac
> 
> EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0,
> channel 1 "DIMM_B1": amd76x_edac
> 
> 
> The structure of the message is:
>         the memory controller                   (MC0)
>         Error type                              (CE)
>         memory page                             (0x283)
>         offset in the page                      (0xce0)
>         the byte granularity                    (grain 8)
>                 or resolution of the error
> 	^^^^
> 
> and
> 
> struct csrow_info {
>         unsigned long first_page;       /* first page number in dimm */
>         unsigned long last_page;        /* last page number in dimm */
>         unsigned long page_mask;        /* used for interleaving -
>                                          * 0UL for non intlv
>                                          */
>         u32 nr_pages;           /* number of pages in csrow */
>         u32 grain;              /* granularity of reported error in bytes */
> 				   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>> 			dimm->grain = nr_pages << PAGE_SHIFT;

The grain unit is bytes, so it seems OK.
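Since the grain is expressed in bytes, the quoted line `dimm->grain = nr_pages << PAGE_SHIFT;` sets it to the DIMM's size in bytes. The C sketch below shows that calculation next to the address reconstruction from the (page, offset) pair in the quoted log format, and the grain-sized block in which the error is then known to lie. PAGE_SHIFT is assumed to be 12 (4 KiB pages); the helper names are hypothetical, not EDAC core functions.

```c
#include <stdint.h>

/* Assumed 4 KiB pages, as on x86; in the kernel this comes from asm/page.h. */
#define PAGE_SHIFT 12

/* Same computation as the quoted driver line, widened to 64 bits
 * before shifting so that large DIMMs do not overflow. */
static uint64_t dimm_grain_bytes(uint32_t nr_pages)
{
	return (uint64_t)nr_pages << PAGE_SHIFT;
}

/* Physical address from the (page, offset) pair printed in EDAC logs. */
static uint64_t edac_err_addr(uint64_t page, uint64_t offset)
{
	return (page << PAGE_SHIFT) + offset;
}

/*
 * Start of the grain-sized block containing addr: the error is only
 * known to lie somewhere in [start, start + grain). The grain need not
 * be a power of two, so use modulo rather than a mask.
 */
static uint64_t grain_block_start(uint64_t addr, uint64_t grain)
{
	return grain ? addr - (addr % grain) : addr;
}
```

With the first quoted log line (page 0x283, offset 0xce0, grain 8), the address is 0x283ce0 and the error lies within 0x283ce0..0x283ce7; with a 64 MB grain, the same address only narrows the error down to the first 64 MB.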

Also, you might not have noticed, but, at least in this driver, the grain
is a per-memory-module value (not a per-memory-controller value).
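Because the grain is per DIMM rather than per memory controller, any correlation over a series of errors has to look it up for each event. Below is a minimal C sketch of deciding whether two errors can be attributed to the same location, assuming hypothetical types (`struct dimm_sketch`, `struct err_event`) in place of the real EDAC structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down per-DIMM record; not the real struct dimm_info. */
struct dimm_sketch {
	uint32_t grain;		/* granularity of reported errors, in bytes */
};

/* Hypothetical error event: which DIMM reported it and at what address. */
struct err_event {
	int dimm;
	uint64_t addr;
};

/*
 * Two events can only be attributed to the same location if they come
 * from the same DIMM and fall in the same grain-sized block of that
 * DIMM. Since the grain differs per DIMM, it is looked up per event.
 */
static bool same_error_location(const struct dimm_sketch *dimms,
				const struct err_event *a,
				const struct err_event *b)
{
	uint64_t grain;

	if (a->dimm != b->dimm)
		return false;
	grain = dimms[a->dimm].grain;
	if (grain == 0)		/* unknown grain: require an exact match */
		return a->addr == b->addr;
	return a->addr / grain == b->addr / grain;
}
```

The same pair of addresses can count as one location on a coarse-grained DIMM and as two on a fine-grained one, which is why the grain matters to the analysis.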

> But none of that matters - the only thing that matters is that this
> thing is static and doesn't change for the module's lifetime.

I'm not so sure about that.

@Tony: Can you assure us that, on Intel memory controllers, the address
mask remains constant for the module's lifetime, or are there events that
may change it (memory hot-plug, mirror mode changes, interleaving
reconfiguration, ...)?

> 
> So add it as a part of some EDAC initialization printk which we print
> once on boot in dmesg and userspace tools can read it. Or to sysfs, if
> it makes more sense.
> 
> But not in _each_ tracepoint record, filling the buffers with useless info.
> 
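Borislav's alternative, logging the grain once at initialization and leaving it out of each trace record, could look roughly like the sketch below. Everything here is hypothetical: the record layout, the message format, and the helper names are illustrative, not existing EDAC code, and the approach only holds if the grain really is static for the module's lifetime, which is the open question above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-event record without a grain field. */
struct mc_event_record {
	uint64_t addr;		/* error address */
	uint32_t syndrome;	/* ECC syndrome */
	uint16_t mc;		/* memory controller index */
	uint16_t dimm;		/* DIMM index; the grain is looked up from it */
};

/*
 * Hypothetical one-time init message describing a DIMM's grain.
 * Returns the snprintf() result (the length that would be written).
 */
static int format_dimm_grain_line(char *buf, size_t len, unsigned int mc,
				  unsigned int dimm, uint32_t grain)
{
	return snprintf(buf, len, "EDAC MC%u: DIMM %u grain %" PRIu32 " bytes",
			mc, dimm, grain);
}

/*
 * Userspace side: recover the grain for an event from the table built
 * from the init messages. Returns 0 if the DIMM index is unknown.
 */
static uint32_t lookup_grain(const uint32_t *grain_by_dimm, size_t nr_dimms,
			     const struct mc_event_record *r)
{
	return (size_t)r->dimm < nr_dimms ? grain_by_dimm[r->dimm] : 0;
}
```

The trade-off is a smaller record per event against a dependency on userspace correlating each record with the boot-time table.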

Regards,
Mauro
