linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/4] edac: Remove unused mpc85xx debug code
@ 2009-11-20  0:42 Peter Tyser
  2009-11-20  0:42 ` [PATCH 2/4] edac: Fix mpc85xx page calculation Peter Tyser
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Peter Tyser @ 2009-11-20  0:42 UTC (permalink / raw)
  To: bluesmoke-devel; +Cc: Peter Tyser, linuxppc-dev, djiang

Some unused, unsupported debug code existed in the mpc85xx EDAC driver
that resulted in a build failure when CONFIG_EDAC_DEBUG was defined:

  drivers/edac/mpc85xx_edac.c: In function 'mpc85xx_mc_err_probe':
  drivers/edac/mpc85xx_edac.c:1031: error: implicit declaration of function 'edac_mc_register_mcidev_debug'
  drivers/edac/mpc85xx_edac.c:1031: error: 'debug_attr' undeclared (first use in this function)
  drivers/edac/mpc85xx_edac.c:1031: error: (Each undeclared identifier is reported only once
  drivers/edac/mpc85xx_edac.c:1031: error: for each function it appears in.)

Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
---
 drivers/edac/mpc85xx_edac.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
index cf27402..28d3211 100644
--- a/drivers/edac/mpc85xx_edac.c
+++ b/drivers/edac/mpc85xx_edac.c
@@ -892,10 +892,6 @@ static int __devinit mpc85xx_mc_err_probe(struct of_device *op,
 
 	mpc85xx_init_csrows(mci);
 
-#ifdef CONFIG_EDAC_DEBUG
-	edac_mc_register_mcidev_debug((struct attribute **)debug_attr);
-#endif
-
 	/* store the original error disable bits */
 	orig_ddr_err_disable =
 	    in_be32(pdata->mc_vbase + MPC85XX_MC_ERR_DISABLE);
-- 
1.6.2.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/4] edac: Fix mpc85xx page calculation
  2009-11-20  0:42 [PATCH 1/4] edac: Remove unused mpc85xx debug code Peter Tyser
@ 2009-11-20  0:42 ` Peter Tyser
  2009-11-20  0:42 ` [PATCH 3/4] edac: Mask mpc85xx ECC syndrome appropriately Peter Tyser
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Peter Tyser @ 2009-11-20  0:42 UTC (permalink / raw)
  To: bluesmoke-devel; +Cc: Peter Tyser, linuxppc-dev, djiang

Commit b4846251727a38a7f248e41308c060995371dd05 accidentally broke how a
chip select's first and last page addresses are calculated.  The page
addresses are being shifted too far right by PAGE_SHIFT.  This results
in errors such as:

  EDAC MPC85xx MC1: Err addr: 0x003075c0
  EDAC MPC85xx MC1: PFN: 0x00000307
  EDAC MPC85xx MC1: PFN out of range!
  EDAC MC1: INTERNAL ERROR: row out of range (4 >= 4)
  EDAC MC1: CE - no information available: INTERNAL ERROR

The vale of PAGE_SHIFT is already being taken into consideration during
the calculation of the 'start' and 'end' variables, thus it is not
necessary to account for it again when setting a chip select's first and
last page address.

Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
---
 drivers/edac/mpc85xx_edac.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
index 28d3211..ecd5928 100644
--- a/drivers/edac/mpc85xx_edac.c
+++ b/drivers/edac/mpc85xx_edac.c
@@ -804,8 +804,8 @@ static void __devinit mpc85xx_init_csrows(struct mem_ctl_info *mci)
 		end   <<= (24 - PAGE_SHIFT);
 		end    |= (1 << (24 - PAGE_SHIFT)) - 1;
 
-		csrow->first_page = start >> PAGE_SHIFT;
-		csrow->last_page = end >> PAGE_SHIFT;
+		csrow->first_page = start;
+		csrow->last_page = end;
 		csrow->nr_pages = end + 1 - start;
 		csrow->grain = 8;
 		csrow->mtype = mtype;
-- 
1.6.2.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/4] edac: Mask mpc85xx ECC syndrome appropriately
  2009-11-20  0:42 [PATCH 1/4] edac: Remove unused mpc85xx debug code Peter Tyser
  2009-11-20  0:42 ` [PATCH 2/4] edac: Fix mpc85xx page calculation Peter Tyser
@ 2009-11-20  0:42 ` Peter Tyser
  2009-11-20  0:42 ` [PATCH 4/4] edac: Improve SDRAM error reporting for mpc85xx Peter Tyser
  2010-01-25 17:20 ` [PATCH 1/4] edac: Remove unused mpc85xx debug code Peter Tyser
  3 siblings, 0 replies; 5+ messages in thread
From: Peter Tyser @ 2009-11-20  0:42 UTC (permalink / raw)
  To: bluesmoke-devel; +Cc: Peter Tyser, linuxppc-dev, djiang

With a 64-bit wide data bus only the lowest 8-bits of the ECC syndrome
are relevant.  With a 32-bit wide data bus only the lowest 16-bits are
relevant on most architectures.

Without this change, the ECC syndrome displayed can be mildly confusing,
eg:

  EDAC MPC85xx MC1: syndrome: 0x25252525

When in reality the ECC syndrome is 0x25.

Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
---
A variety of Freescale manual's say a variety of different things about
how to decode the CAPTURE_ECC (syndrome) register.  I don't have a
system with a 32-bit bus to test on, but I believe the change is
correct.  It'd be good to get an ACK from someone at Freescale
about this change though.

 drivers/edac/mpc85xx_edac.c |   12 +++++++++++-
 drivers/edac/mpc85xx_edac.h |    3 +++
 2 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
index ecd5928..6d0114a 100644
--- a/drivers/edac/mpc85xx_edac.c
+++ b/drivers/edac/mpc85xx_edac.c
@@ -672,6 +672,7 @@ static void mpc85xx_mc_check(struct mem_ctl_info *mci)
 {
 	struct mpc85xx_mc_pdata *pdata = mci->pvt_info;
 	struct csrow_info *csrow;
+	u32 bus_width;
 	u32 err_detect;
 	u32 syndrome;
 	u32 err_addr;
@@ -692,6 +693,15 @@ static void mpc85xx_mc_check(struct mem_ctl_info *mci)
 	}
 
 	syndrome = in_be32(pdata->mc_vbase + MPC85XX_MC_CAPTURE_ECC);
+
+	/* Mask off appropriate bits of syndrome based on bus width */
+	bus_width = (in_be32(pdata->mc_vbase + MPC85XX_MC_DDR_SDRAM_CFG) &
+			DSC_DBW_MASK) ? 32 : 64;
+	if (bus_width == 64)
+		syndrome &= 0xff;
+	else
+		syndrome &= 0xffff;
+
 	err_addr = in_be32(pdata->mc_vbase + MPC85XX_MC_CAPTURE_ADDRESS);
 	pfn = err_addr >> PAGE_SHIFT;
 
@@ -707,7 +717,7 @@ static void mpc85xx_mc_check(struct mem_ctl_info *mci)
 	mpc85xx_mc_printk(mci, KERN_ERR, "Capture Data Low: %#8.8x\n",
 			  in_be32(pdata->mc_vbase +
 				  MPC85XX_MC_CAPTURE_DATA_LO));
-	mpc85xx_mc_printk(mci, KERN_ERR, "syndrome: %#8.8x\n", syndrome);
+	mpc85xx_mc_printk(mci, KERN_ERR, "syndrome: %#2.2x\n", syndrome);
 	mpc85xx_mc_printk(mci, KERN_ERR, "err addr: %#8.8x\n", err_addr);
 	mpc85xx_mc_printk(mci, KERN_ERR, "PFN: %#8.8x\n", pfn);
 
diff --git a/drivers/edac/mpc85xx_edac.h b/drivers/edac/mpc85xx_edac.h
index 52432ee..cb24df8 100644
--- a/drivers/edac/mpc85xx_edac.h
+++ b/drivers/edac/mpc85xx_edac.h
@@ -48,6 +48,9 @@
 #define DSC_MEM_EN	0x80000000
 #define DSC_ECC_EN	0x20000000
 #define DSC_RD_EN	0x10000000
+#define DSC_DBW_MASK	0x00180000
+#define DSC_DBW_32	0x00080000
+#define DSC_DBW_64	0x00000000
 
 #define DSC_SDTYPE_MASK		0x07000000
 
-- 
1.6.2.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 4/4] edac: Improve SDRAM error reporting for mpc85xx
  2009-11-20  0:42 [PATCH 1/4] edac: Remove unused mpc85xx debug code Peter Tyser
  2009-11-20  0:42 ` [PATCH 2/4] edac: Fix mpc85xx page calculation Peter Tyser
  2009-11-20  0:42 ` [PATCH 3/4] edac: Mask mpc85xx ECC syndrome appropriately Peter Tyser
@ 2009-11-20  0:42 ` Peter Tyser
  2010-01-25 17:20 ` [PATCH 1/4] edac: Remove unused mpc85xx debug code Peter Tyser
  3 siblings, 0 replies; 5+ messages in thread
From: Peter Tyser @ 2009-11-20  0:42 UTC (permalink / raw)
  To: bluesmoke-devel; +Cc: Peter Tyser, linuxppc-dev, djiang

Add the ability to detect the specific data line or ECC line which
failed when printing out SDRAM single-bit errors.  An example of a
single-bit SDRAM ECC error is below:

  EDAC MPC85xx MC1: Err Detect Register: 0x80000004
  EDAC MPC85xx MC1: Faulty data bit: 59
  EDAC MPC85xx MC1: Expected Data / ECC:  0x7f80d000_409effa0 / 0x6d
  EDAC MPC85xx MC1: Captured Data / ECC:  0x7780d000_409effa0 / 0x6d
  EDAC MPC85xx MC1: Err addr: 0x00031ca0
  EDAC MPC85xx MC1: PFN: 0x00000031

Knowning which specific data or ECC line caused an error can be useful in
tracking down hardware issues such as improperly terminated signals,
loose pins, etc.

Note that this feature is only currently enabled for 64-bit wide data
buses, 32-bit wide bus support should be added.

Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
---
I don't have any 32-bit wide systems to test on.  If someone has one
and is willing to give this patch a shot with the check for a 64-bit
data bus removed it would be much appreciated and I can re-submit
with both 32 and 64 bit buses supported.

 drivers/edac/mpc85xx_edac.c |  146 ++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 138 insertions(+), 8 deletions(-)

diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
index 6d0114a..517042f 100644
--- a/drivers/edac/mpc85xx_edac.c
+++ b/drivers/edac/mpc85xx_edac.c
@@ -668,6 +668,111 @@ static struct of_platform_driver mpc85xx_l2_err_driver = {
 
 /**************************** MC Err device ***************************/
 
+/*
+ * Taken from table 8-55 in the MPC8641 User's Manual and/or 9-61 in the
+ * MPC8572 User's Manual.  Each line represents a syndrome bit column as a
+ * 64-bit value, but split into an upper and lower 32-bit chunk.  The labels
+ * below correspond to Freescale's manuals.
+ */
+static unsigned int ecc_table[16] = {
+	/* MSB           LSB */
+	/* [0:31]    [32:63] */
+	0xf00fe11e, 0xc33c0ff7,	/* Syndrome bit 7 */
+	0x00ff00ff, 0x00fff0ff,
+	0x0f0f0f0f, 0x0f0fff00,
+	0x11113333, 0x7777000f,
+	0x22224444, 0x8888222f,
+	0x44448888, 0xffff4441,
+	0x8888ffff, 0x11118882,
+	0xffff1111, 0x22221114,	/* Syndrome bit 0 */
+};
+
+/*
+ * Calculate the correct ECC value for a 64-bit value specified by high:low
+ */
+static u8 calculate_ecc(u32 high, u32 low)
+{
+	u32 mask_low;
+	u32 mask_high;
+	int bit_cnt;
+	u8 ecc = 0;
+	int i;
+	int j;
+
+	for (i = 0; i < 8; i++) {
+		mask_high = ecc_table[i * 2];
+		mask_low = ecc_table[i * 2 + 1];
+		bit_cnt = 0;
+
+		for (j = 0; j < 32; j++) {
+			if ((mask_high >> j) & 1)
+				bit_cnt ^= (high >> j) & 1;
+			if ((mask_low >> j) & 1)
+				bit_cnt ^= (low >> j) & 1;
+		}
+
+		ecc |= bit_cnt << i;
+	}
+
+	return ecc;
+}
+
+/*
+ * Create the syndrome code which is generated if the data line specified by
+ * 'bit' failed.  Eg generate an 8-bit codes seen in Table 8-55 in the MPC8641
+ * User's Manual and 9-61 in the MPC8572 User's Manual.
+ */
+static u8 syndrome_from_bit(unsigned int bit) {
+	int i;
+	u8 syndrome = 0;
+
+	/*
+	 * Cycle through the upper or lower 32-bit portion of each value in
+	 * ecc_table depending on if 'bit' is in the upper or lower half of
+	 * 64-bit data.
+	 */
+	for (i = bit < 32; i < 16; i += 2)
+		syndrome |= ((ecc_table[i] >> (bit % 32)) & 1) << (i / 2);
+
+	return syndrome;
+}
+
+/*
+ * Decode data and ecc syndrome to determine what went wrong
+ * Note: This can only decode single-bit errors
+ */
+static void sbe_ecc_decode(u32 cap_high, u32 cap_low, u32 cap_ecc,
+		       int *bad_data_bit, int *bad_ecc_bit)
+{
+	int i;
+	u8 syndrome;
+
+	*bad_data_bit = -1;
+	*bad_ecc_bit = -1;
+
+	/*
+	 * Calculate the ECC of the captured data and XOR it with the captured
+	 * ECC to find an ECC syndrome value we can search for
+	 */
+	syndrome = calculate_ecc(cap_high, cap_low) ^ cap_ecc;
+
+	/* Check if a data line is stuck... */
+	for (i = 0; i < 64; i++) {
+		if (syndrome == syndrome_from_bit(i)) {
+			*bad_data_bit = i;
+			return;
+		}
+	}
+
+	/* If data is correct, check ECC bits for errors... */
+	for (i = 0; i < 8; i++) {
+		if ((syndrome >> i) & 0x1) {
+			*bad_ecc_bit = i;
+			return;
+		}
+	}
+}
+
 static void mpc85xx_mc_check(struct mem_ctl_info *mci)
 {
 	struct mpc85xx_mc_pdata *pdata = mci->pvt_info;
@@ -678,6 +783,10 @@ static void mpc85xx_mc_check(struct mem_ctl_info *mci)
 	u32 err_addr;
 	u32 pfn;
 	int row_index;
+	u32 cap_high;
+	u32 cap_low;
+	int bad_data_bit;
+	int bad_ecc_bit;
 
 	err_detect = in_be32(pdata->mc_vbase + MPC85XX_MC_ERR_DETECT);
 	if (!err_detect)
@@ -711,14 +820,35 @@ static void mpc85xx_mc_check(struct mem_ctl_info *mci)
 			break;
 	}
 
-	mpc85xx_mc_printk(mci, KERN_ERR, "Capture Data High: %#8.8x\n",
-			  in_be32(pdata->mc_vbase +
-				  MPC85XX_MC_CAPTURE_DATA_HI));
-	mpc85xx_mc_printk(mci, KERN_ERR, "Capture Data Low: %#8.8x\n",
-			  in_be32(pdata->mc_vbase +
-				  MPC85XX_MC_CAPTURE_DATA_LO));
-	mpc85xx_mc_printk(mci, KERN_ERR, "syndrome: %#2.2x\n", syndrome);
-	mpc85xx_mc_printk(mci, KERN_ERR, "err addr: %#8.8x\n", err_addr);
+	cap_high = in_be32(pdata->mc_vbase + MPC85XX_MC_CAPTURE_DATA_HI);
+	cap_low = in_be32(pdata->mc_vbase + MPC85XX_MC_CAPTURE_DATA_LO);
+
+	/*
+	 * Analyze single-bit errors on 64-bit wide buses
+	 * TODO: Add support for 32-bit wide buses
+	 */
+	if ((err_detect & DDR_EDE_SBE) && (bus_width == 64)) {
+		sbe_ecc_decode(cap_high, cap_low, syndrome,
+				&bad_data_bit, &bad_ecc_bit);
+
+		if (bad_data_bit != -1)
+			mpc85xx_mc_printk(mci, KERN_ERR,
+				"Faulty Data bit: %d\n", bad_data_bit);
+		if (bad_ecc_bit != -1)
+			mpc85xx_mc_printk(mci, KERN_ERR,
+				"Faulty ECC bit: %d\n", bad_ecc_bit);
+
+		mpc85xx_mc_printk(mci, KERN_ERR,
+			"Expected Data / ECC:\t%#8.8x_%08x / %#2.2x\n",
+			cap_high ^ (1 << (bad_data_bit - 32)),
+			cap_low ^ (1 << bad_data_bit),
+			syndrome ^ (1 << bad_ecc_bit));
+	}
+
+	mpc85xx_mc_printk(mci, KERN_ERR,
+			"Captured Data / ECC:\t%#8.8x_%08x / %#2.2x\n",
+			cap_high, cap_low, syndrome);
+	mpc85xx_mc_printk(mci, KERN_ERR, "Err addr: %#8.8x\n", err_addr);
 	mpc85xx_mc_printk(mci, KERN_ERR, "PFN: %#8.8x\n", pfn);
 
 	/* we are out of range */
-- 
1.6.2.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/4] edac: Remove unused mpc85xx debug code
  2009-11-20  0:42 [PATCH 1/4] edac: Remove unused mpc85xx debug code Peter Tyser
                   ` (2 preceding siblings ...)
  2009-11-20  0:42 ` [PATCH 4/4] edac: Improve SDRAM error reporting for mpc85xx Peter Tyser
@ 2010-01-25 17:20 ` Peter Tyser
  3 siblings, 0 replies; 5+ messages in thread
From: Peter Tyser @ 2010-01-25 17:20 UTC (permalink / raw)
  To: bluesmoke-devel; +Cc: linuxppc-dev, djiang

On Thu, 2009-11-19 at 18:42 -0600, Peter Tyser wrote:
> Some unused, unsupported debug code existed in the mpc85xx EDAC driver
> that resulted in a build failure when CONFIG_EDAC_DEBUG was defined:
> 
>   drivers/edac/mpc85xx_edac.c: In function 'mpc85xx_mc_err_probe':
>   drivers/edac/mpc85xx_edac.c:1031: error: implicit declaration of function 'edac_mc_register_mcidev_debug'
>   drivers/edac/mpc85xx_edac.c:1031: error: 'debug_attr' undeclared (first use in this function)
>   drivers/edac/mpc85xx_edac.c:1031: error: (Each undeclared identifier is reported only once
>   drivers/edac/mpc85xx_edac.c:1031: error: for each function it appears in.)
> 
> Signed-off-by: Peter Tyser <ptyser@xes-inc.com>

<snip>

Any chance some of these will be picked up for 2.6.33?

I added Kumar on CC as I'm not sure if these should go through the EDAC
or PPC tree.

1/4 fixes a build error when CONFIG_EDAC_DEBUG is defined and is pretty
trivial.

2/4 fixes a regression which results in ECC detection not working.

3/4 and 4/4 are "improvements", but don't necessarily need to go into
2.6.33.

Best,
Peter

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-01-25 17:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-20  0:42 [PATCH 1/4] edac: Remove unused mpc85xx debug code Peter Tyser
2009-11-20  0:42 ` [PATCH 2/4] edac: Fix mpc85xx page calculation Peter Tyser
2009-11-20  0:42 ` [PATCH 3/4] edac: Mask mpc85xx ECC syndrome appropriately Peter Tyser
2009-11-20  0:42 ` [PATCH 4/4] edac: Improve SDRAM error reporting for mpc85xx Peter Tyser
2010-01-25 17:20 ` [PATCH 1/4] edac: Remove unused mpc85xx debug code Peter Tyser

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).