linuxppc-dev.lists.ozlabs.org archive mirror
* Re: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check() ?
       [not found] <57AC2FA1761300418C7AB8F3EA493C9702C5F200@HQ-EXCH-5.corp.brocade.com>
@ 2009-04-10 21:47 ` Andrew Morton
  2009-04-13 17:10   ` Dave Jiang
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2009-04-10 21:47 UTC
  To: Jeff Haran; +Cc: linuxppc-dev, Dave Jiang, linux-kernel, Doug Thompson

(cc's added)

On Wed, 8 Apr 2009 14:57:42 -0700
"Jeff Haran" <jharan@Brocade.COM> wrote:

> Hi,
> 
> Recent versions of this function start off with:
> 
> static void mpc85xx_mc_check(struct mem_ctl_info *mci)
> {
>     struct mpc85xx_mc_pdata *pdata = mci->pvt_info;
>     ...
> 
>     err_detect = in_be32(pdata->mc_vbase + MPC85XX_MC_ERR_DETECT);
>     if (err_detect)
>         return;
> 
>     ...
> }
> 
> My reading of the Freescale 8548E Manual leads me to conclude that the
> Memory Error Detect register (ERR_DETECT) will have various bits set if
> the memory controller has detected an error since the last time it was
> cleared. If no memory error has occurred, the register will contain 0.
> 
> Perhaps I am missing something very basic, but it seems to me that the
> above "if" should be:
> 
>     if (!err_detect)
>         return;
> 
> as the existing code would seem to read "if any errors have occurred,
> ignore them", though perhaps testing has demonstrated that the Freescale
> manual is in error.
> 
> Please include this email address in responses as I do not subscribe.
> 
> Thanks,
> 
> Jeff Haran
> Brocade


* Re: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check() ?
  2009-04-10 21:47 ` bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check() ? Andrew Morton
@ 2009-04-13 17:10   ` Dave Jiang
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Jiang @ 2009-04-13 17:10 UTC
  To: Jeff Haran; +Cc: linuxppc-dev, Andrew Morton, linux-kernel, Doug Thompson

Jeff, you are correct. I will submit a patch to correct that.
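
A minimal sketch of the corrected early return, assuming the ERR_DETECT
value has already been read (the driver does this with in_be32()). The
function name sketch_mc_should_handle() is hypothetical and not part of
the driver; it only models the polarity of the test.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decides whether the check routine should go on
 * to decode and report an error. err_detect stands in for the value
 * read from the ERR_DETECT register.
 */
static bool sketch_mc_should_handle(uint32_t err_detect)
{
    /* ERR_DETECT reads as 0 when no error has been latched: bail out. */
    if (!err_detect)
        return false;

    /* At least one error bit is set, so there is something to report. */
    return true;
}
```

With the original "if (err_detect) return;" the sense is inverted: the
routine would carry on only when the register is zero.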

Andrew Morton wrote:
> (cc's added)
> 
> On Wed, 8 Apr 2009 14:57:42 -0700
> "Jeff Haran" <jharan@Brocade.COM> wrote:
> 
>> Hi,
>>
>> Recent versions of this function start off with:
>>
>> static void mpc85xx_mc_check(struct mem_ctl_info *mci)
>> {
>>     struct mpc85xx_mc_pdata *pdata = mci->pvt_info;
>>     ...
>>
>>     err_detect = in_be32(pdata->mc_vbase + MPC85XX_MC_ERR_DETECT);
>>     if (err_detect)
>>         return;
>>
>>     ...
>> }
>>
>> My reading of the Freescale 8548E Manual leads me to conclude that the
>> Memory Error Detect register (ERR_DETECT) will have various bits set if
>> the memory controller has detected an error since the last time it was
>> cleared. If no memory error has occurred, the register will contain 0.
>>
>> Perhaps I am missing something very basic, but it seems to me that the
>> above "if" should be:
>>
>>     if (!err_detect)
>>         return;
>>
>> as the existing code would seem to read "if any errors have occurred,
>> ignore them", though perhaps testing has demonstrated that the Freescale
>> manual is in error.
>>
>> Please include this email address in responses as I do not subscribe.
>>
>> Thanks,
>>
>> Jeff Haran
>> Brocade
> 


-- 

------------------------------------------------------
Dave Jiang
Software Engineer
MontaVista Software, Inc.
http://www.mvista.com
------------------------------------------------------


* Re: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check()
       [not found] <57AC2FA1761300418C7AB8F3EA493C9702E319DB@HQ-EXCH-5.corp.brocade.com>
@ 2009-04-29  7:37 ` Andrew Morton
  2009-04-29 12:46   ` Kumar Gala
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2009-04-29  7:37 UTC
  To: Jeff Haran
  Cc: Doug Thompson, linux-kernel, Dave Jiang, linuxppc-dev,
	Kumar Gala

Let's cc the suitable people.

On Tue, 28 Apr 2009 18:23:42 -0700 "Jeff Haran" <jharan@Brocade.COM> wrote:

> Hi,
> 
> Recent versions of this function contain the following snippets:
> 
>     if (err_detect & DDR_EDE_SBE)
>         edac_mc_handle_ce(mci, pfn, err_addr & PAGE_MASK,
>                   syndrome, row_index, 0, mci->ctl_name);
> 
>     if (err_detect & DDR_EDE_MBE)
>         edac_mc_handle_ue(mci, pfn, err_addr & PAGE_MASK,
>                   row_index, mci->ctl_name);
> 
> I am pretty sure the references to PAGE_MASK should be preceded by a
> tilde, as in:
> 
>     if (err_detect & DDR_EDE_SBE)
>         edac_mc_handle_ce(mci, pfn, err_addr & ~PAGE_MASK,
>                   syndrome, row_index, 0, mci->ctl_name);
> 
>     if (err_detect & DDR_EDE_MBE)
>         edac_mc_handle_ue(mci, pfn, err_addr & ~PAGE_MASK,
>                   row_index, mci->ctl_name);
> 

Could well be.  PAGE_MASK is very easy to get wrong.  I've _never_
trusted my own memory of it and I always have to go back to the
definition when reviewing code :(

> Much as I would like to submit a tested patch like the rest of the
> world, I find myself in the situation where the only Freescale target
> system I have to test on is running a 3 year old kernel (2.6.14), which
> precedes the introduction of EDAC driver support, at least for
> Freescale. So the best I can do is borrow from the new EDAC driver and
> backport it to the old kernel.
> 
> But I have learned a few things in this process and can thus share what
> I've learned as it may be of help to the EDAC driver developers:
> 
> 1) Before you read the Freescale 8548 CAPTURE_ADDRESS register, you want
> to read CAPTURE_ATTRIBUTES first and make sure the VLD bit (least
> significant bit in the register) is set or else the data in
> CAPTURE_ADDRESS may not yet be valid.
> 
> 2) When you are done scrubbing the memory with the single bit error, you
> want to write 0 to CAPTURE_ATTRIBUTES so as to clear VLD and thus set up
> the ECC capture logic to capture the next single bit error.
> 


* Re: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check()
  2009-04-29  7:37 ` bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check() Andrew Morton
@ 2009-04-29 12:46   ` Kumar Gala
  0 siblings, 0 replies; 4+ messages in thread
From: Kumar Gala @ 2009-04-29 12:46 UTC
  To: Andrew Morton
  Cc: Jeff Haran, linux-kernel, Dave Jiang, linuxppc-dev, Doug Thompson,
	Kumar Gala


On Apr 29, 2009, at 2:37 AM, Andrew Morton wrote:

> Let's cc the suitable people.
>
> On Tue, 28 Apr 2009 18:23:42 -0700 "Jeff Haran" <jharan@Brocade.COM> wrote:
>
>> Hi,
>>
>> Recent versions of this function contain the following snippets:
>>
>>    if (err_detect & DDR_EDE_SBE)
>>        edac_mc_handle_ce(mci, pfn, err_addr & PAGE_MASK,
>>                  syndrome, row_index, 0, mci->ctl_name);
>>
>>    if (err_detect & DDR_EDE_MBE)
>>        edac_mc_handle_ue(mci, pfn, err_addr & PAGE_MASK,
>>                  row_index, mci->ctl_name);
>>
>> I am pretty sure the references to PAGE_MASK should be preceded by a
>> tilde, as in:
>>
>>    if (err_detect & DDR_EDE_SBE)
>>        edac_mc_handle_ce(mci, pfn, err_addr & ~PAGE_MASK,
>>                  syndrome, row_index, 0, mci->ctl_name);
>>
>>    if (err_detect & DDR_EDE_MBE)
>>        edac_mc_handle_ue(mci, pfn, err_addr & ~PAGE_MASK,
>>                  row_index, mci->ctl_name);
>>
>
> Could well be.  PAGE_MASK is very easy to get wrong.  I've _never_
> trusted my own memory of it and I always have to go back to the
> definition when reviewing code :(

This should be ~PAGE_MASK to get the offset into the page.
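
For reference, a small sketch of how the mask splits an address,
assuming 4 KiB pages (a page shift of 12). The SKETCH_* names and the
helpers are hypothetical stand-ins. The kernel's own PAGE_MASK is
~(PAGE_SIZE - 1), so ANDing with PAGE_MASK keeps the page base and
ANDing with ~PAGE_MASK keeps the offset within the page.

```c
#include <stdint.h>

/* Assumed 4 KiB pages; the real values come from the kernel headers. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  (UINT32_C(1) << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1u))

/* Page frame number: the address with the in-page bits shifted out. */
static uint32_t sketch_pfn(uint32_t addr)
{
    return addr >> SKETCH_PAGE_SHIFT;
}

/* Offset into the page: what the handlers expect alongside the pfn. */
static uint32_t sketch_offset_in_page(uint32_t addr)
{
    return addr & ~SKETCH_PAGE_MASK;
}

/* Page base address: what "addr & PAGE_MASK" actually yields. */
static uint32_t sketch_page_base(uint32_t addr)
{
    return addr & SKETCH_PAGE_MASK;
}
```

For an address of 0x12345678 this gives a pfn of 0x12345, an offset of
0x678, and a page base of 0x12345000, which shows why passing
"err_addr & PAGE_MASK" as the offset is wrong.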

>> Much as I would like to submit a tested patch like the rest of the
>> world, I find myself in the situation where the only Freescale target
>> system I have to test on is running a 3 year old kernel (2.6.14), which
>> precedes the introduction of EDAC driver support, at least for
>> Freescale. So the best I can do is borrow from the new EDAC driver and
>> backport it to the old kernel.
>>
>> But I have learned a few things in this process and can thus share what
>> I've learned as it may be of help to the EDAC driver developers:
>>
>> 1) Before you read the Freescale 8548 CAPTURE_ADDRESS register, you want
>> to read CAPTURE_ATTRIBUTES first and make sure the VLD bit (least
>> significant bit in the register) is set or else the data in
>> CAPTURE_ADDRESS may not yet be valid.
>>
>> 2) When you are done scrubbing the memory with the single bit error, you
>> want to write 0 to CAPTURE_ATTRIBUTES so as to clear VLD and thus set up
>> the ECC capture logic to capture the next single bit error.

This is a correct description based on how FSL error HW works.
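
The two steps can be sketched as follows. This is only a model under
stated assumptions: the registers are represented by a plain struct
rather than memory-mapped I/O, and struct sketch_mc_regs,
SKETCH_CAPTURE_ATTR_VLD, sketch_read_capture() and
sketch_rearm_capture() are hypothetical names, not driver symbols. The
only facts taken from the description above are that VLD is the least
significant bit of CAPTURE_ATTRIBUTES and that writing 0 clears it.

```c
#include <stdbool.h>
#include <stdint.h>

/* VLD is the least significant bit of CAPTURE_ATTRIBUTES. */
#define SKETCH_CAPTURE_ATTR_VLD UINT32_C(0x1)

/* Hypothetical in-memory model of the two capture registers. */
struct sketch_mc_regs {
    uint32_t capture_attributes;
    uint32_t capture_address;
};

/*
 * Step 1: check VLD before trusting CAPTURE_ADDRESS.
 * Returns true and stores the address only when the capture is valid.
 */
static bool sketch_read_capture(const struct sketch_mc_regs *regs,
                                uint32_t *addr)
{
    if (!(regs->capture_attributes & SKETCH_CAPTURE_ATTR_VLD))
        return false;

    *addr = regs->capture_address;
    return true;
}

/*
 * Step 2: after the error has been handled, write 0 to
 * CAPTURE_ATTRIBUTES to clear VLD so the next error can be captured.
 */
static void sketch_rearm_capture(struct sketch_mc_regs *regs)
{
    regs->capture_attributes = 0;
}
```

In a real driver the struct accesses would be replaced by the
platform's register accessors (in_be32()/out_be32() in this driver).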

- k

