* bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check() ?
@ 2009-04-08 21:57 Jeff Haran
2009-04-10 21:47 ` Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: Jeff Haran @ 2009-04-08 21:57 UTC (permalink / raw)
To: linux-kernel
Hi,
Recent versions of this function start off with:
static void mpc85xx_mc_check(struct mem_ctl_info *mci)
{
struct mpc85xx_mc_pdata *pdata = mci->pvt_info;
...
err_detect = in_be32(pdata->mc_vbase + MPC85XX_MC_ERR_DETECT);
if (err_detect)
return;
...
}
My reading of the Freescale 8548E Manual leads me to conclude that the
Memory Error Detect register (ERR_DETECT) will have various bits set if
the memory controller has detected an error since the last time it was
cleared. If no memory error has occurred, the register will contain 0.
Perhaps I am missing something very basic, but it seem to me that the
above "if" should be:
if (!err_detect)
return;
as the existing code would seem to read "if any errors have occurred,
ignore them", though perhaps testing has demonstrated that the Freescale
manual is in error.
Please include this email address in responses as I do not subscribe.
Thanks,
Jeff Haran
Brocade
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check() ?
2009-04-08 21:57 bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check() ? Jeff Haran
@ 2009-04-10 21:47 ` Andrew Morton
2009-04-13 17:10 ` Dave Jiang
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2009-04-10 21:47 UTC (permalink / raw)
To: Jeff Haran
Cc: linux-kernel, Doug Thompson, Kumar Gala, Dave Jiang, linuxppc-dev
(cc's added)
On Wed, 8 Apr 2009 14:57:42 -0700
"Jeff Haran" <jharan@Brocade.COM> wrote:
> Hi,
>
> Recent versions of this function start off with:
>
> static void mpc85xx_mc_check(struct mem_ctl_info *mci)
> {
> struct mpc85xx_mc_pdata *pdata = mci->pvt_info;
> ...
>
> err_detect = in_be32(pdata->mc_vbase + MPC85XX_MC_ERR_DETECT);
> if (err_detect)
> return;
>
> ...
> }
>
> My reading of the Freescale 8548E Manual leads me to conclude that the
> Memory Error Detect register (ERR_DETECT) will have various bits set if
> the memory controller has detected an error since the last time it was
> cleared. If no memory error has occurred, the register will contain 0.
>
> Perhaps I am missing something very basic, but it seem to me that the
> above "if" should be:
>
> if (!err_detect)
> return;
>
> as the existing code would seem to read "if any errors have occurred,
> ignore them", though perhaps testing has demonstrated that the Freescale
> manual is in error.
>
> Please include this email address in responses as I do not subscribe.
>
> Thanks,
>
> Jeff Haran
> Brocade
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check() ?
2009-04-10 21:47 ` Andrew Morton
@ 2009-04-13 17:10 ` Dave Jiang
0 siblings, 0 replies; 7+ messages in thread
From: Dave Jiang @ 2009-04-13 17:10 UTC (permalink / raw)
To: Jeff Haran
Cc: Andrew Morton, linux-kernel, Doug Thompson, Kumar Gala,
linuxppc-dev
Jeff, you are correct. I will submit a patch to correct that.
Andrew Morton wrote:
> (cc's added)
>
> On Wed, 8 Apr 2009 14:57:42 -0700
> "Jeff Haran" <jharan@Brocade.COM> wrote:
>
>> Hi,
>>
>> Recent versions of this function start off with:
>>
>> static void mpc85xx_mc_check(struct mem_ctl_info *mci)
>> {
>> struct mpc85xx_mc_pdata *pdata = mci->pvt_info;
>> ...
>>
>> err_detect = in_be32(pdata->mc_vbase + MPC85XX_MC_ERR_DETECT);
>> if (err_detect)
>> return;
>>
>> ...
>> }
>>
>> My reading of the Freescale 8548E Manual leads me to conclude that the
>> Memory Error Detect register (ERR_DETECT) will have various bits set if
>> the memory controller has detected an error since the last time it was
>> cleared. If no memory error has occurred, the register will contain 0.
>>
>> Perhaps I am missing something very basic, but it seem to me that the
>> above "if" should be:
>>
>> if (!err_detect)
>> return;
>>
>> as the existing code would seem to read "if any errors have occurred,
>> ignore them", though perhaps testing has demonstrated that the Freescale
>> manual is in error.
>>
>> Please include this email address in responses as I do not subscribe.
>>
>> Thanks,
>>
>> Jeff Haran
>> Brocade
>
--
------------------------------------------------------
Dave Jiang
Software Engineer
MontaVista Software, Inc.
http://www.mvista.com
------------------------------------------------------
^ permalink raw reply [flat|nested] 7+ messages in thread
* bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check()
@ 2009-04-29 1:23 Jeff Haran
2009-04-29 7:37 ` Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: Jeff Haran @ 2009-04-29 1:23 UTC (permalink / raw)
To: linux-kernel
Hi,
Recent versions of this function contain the following snippets:
if (err_detect & DDR_EDE_SBE)
edac_mc_handle_ce(mci, pfn, err_addr & PAGE_MASK,
syndrome, row_index, 0, mci->ctl_name);
if (err_detect & DDR_EDE_MBE)
edac_mc_handle_ue(mci, pfn, err_addr & PAGE_MASK,
row_index, mci->ctl_name);
I am pretty sure the references to PAGE_MASK should be proceeded by a
tilda, as in:
if (err_detect & DDR_EDE_SBE)
edac_mc_handle_ce(mci, pfn, err_addr & ~PAGE_MASK,
syndrome, row_index, 0, mci->ctl_name);
if (err_detect & DDR_EDE_MBE)
edac_mc_handle_ue(mci, pfn, err_addr & ~PAGE_MASK,
row_index, mci->ctl_name);
Much as I would like to submit a tested patch like the rest of the
world, I find myself in the situation where the only Freescale target
system I have to test on is running a 3 year old kernel (2.6.14), which
preceeds the introduction of EDAC driver support, at least for
Freescale. So the best I can do is borrow from the new EDAC driver and
backport it to the old kernel.
But I have learned a few things in this process and can thus share what
I've learned as it may be of help to the EDAC driver developers:
1) Before you read the Freescale 8548 CAPTURE_ADDRESS register, you want
to read CAPTURE_ATTRIBUTES first and make sure the VLD bit (least
significant bit in the register) is set or else the data in
CAPTURE_ADDRESS may not be yet valid.
2) When you are done scrubbing the memory with the single bit error, you
want to write 0 to CAPTURE_ATTRIBUTES so as to clear VLD and thus setup
the ECC capture logic to capture the next single bit error.
Please include this email address in responses as I do not subscribe.
Thanks,
Jeff Haran
Brocade
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check()
2009-04-29 1:23 bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check() Jeff Haran
@ 2009-04-29 7:37 ` Andrew Morton
2009-04-29 12:46 ` Kumar Gala
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2009-04-29 7:37 UTC (permalink / raw)
To: Jeff Haran
Cc: linux-kernel, Dave Jiang, Doug Thompson, Benjamin Herrenschmidt,
Kumar Gala, linuxppc-dev
Let's cc the suitable people.
On Tue, 28 Apr 2009 18:23:42 -0700 "Jeff Haran" <jharan@Brocade.COM> wrote:
> Hi,
>
> Recent versions of this function contain the following snippets:
>
> if (err_detect & DDR_EDE_SBE)
> edac_mc_handle_ce(mci, pfn, err_addr & PAGE_MASK,
> syndrome, row_index, 0, mci->ctl_name);
>
> if (err_detect & DDR_EDE_MBE)
> edac_mc_handle_ue(mci, pfn, err_addr & PAGE_MASK,
> row_index, mci->ctl_name);
>
> I am pretty sure the references to PAGE_MASK should be proceeded by a
> tilda, as in:
>
> if (err_detect & DDR_EDE_SBE)
> edac_mc_handle_ce(mci, pfn, err_addr & ~PAGE_MASK,
> syndrome, row_index, 0, mci->ctl_name);
>
> if (err_detect & DDR_EDE_MBE)
> edac_mc_handle_ue(mci, pfn, err_addr & ~PAGE_MASK,
> row_index, mci->ctl_name);
>
Could well be. PAGE_MASK is very easy to get wrong. I've _never_
trusted my own memory of it and I always have to go back to the
definition when reviewing code :(
> Much as I would like to submit a tested patch like the rest of the
> world, I find myself in the situation where the only Freescale target
> system I have to test on is running a 3 year old kernel (2.6.14), which
> preceeds the introduction of EDAC driver support, at least for
> Freescale. So the best I can do is borrow from the new EDAC driver and
> backport it to the old kernel.
>
> But I have learned a few things in this process and can thus share what
> I've learned as it may be of help to the EDAC driver developers:
>
> 1) Before you read the Freescale 8548 CAPTURE_ADDRESS register, you want
> to read CAPTURE_ATTRIBUTES first and make sure the VLD bit (least
> significant bit in the register) is set or else the data in
> CAPTURE_ADDRESS may not be yet valid.
>
> 2) When you are done scrubbing the memory with the single bit error, you
> want to write 0 to CAPTURE_ATTRIBUTES so as to clear VLD and thus setup
> the ECC capture logic to capture the next single bit error.
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check()
2009-04-29 7:37 ` Andrew Morton
@ 2009-04-29 12:46 ` Kumar Gala
0 siblings, 0 replies; 7+ messages in thread
From: Kumar Gala @ 2009-04-29 12:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Jeff Haran, linux-kernel, Dave Jiang, Doug Thompson,
Benjamin Herrenschmidt, Kumar Gala, linuxppc-dev
On Apr 29, 2009, at 2:37 AM, Andrew Morton wrote:
> Let's cc the suitable people.
>
> On Tue, 28 Apr 2009 18:23:42 -0700 "Jeff Haran" <jharan@Brocade.COM>
> wrote:
>
>> Hi,
>>
>> Recent versions of this function contain the following snippets:
>>
>> if (err_detect & DDR_EDE_SBE)
>> edac_mc_handle_ce(mci, pfn, err_addr & PAGE_MASK,
>> syndrome, row_index, 0, mci->ctl_name);
>>
>> if (err_detect & DDR_EDE_MBE)
>> edac_mc_handle_ue(mci, pfn, err_addr & PAGE_MASK,
>> row_index, mci->ctl_name);
>>
>> I am pretty sure the references to PAGE_MASK should be proceeded by a
>> tilda, as in:
>>
>> if (err_detect & DDR_EDE_SBE)
>> edac_mc_handle_ce(mci, pfn, err_addr & ~PAGE_MASK,
>> syndrome, row_index, 0, mci->ctl_name);
>>
>> if (err_detect & DDR_EDE_MBE)
>> edac_mc_handle_ue(mci, pfn, err_addr & ~PAGE_MASK,
>> row_index, mci->ctl_name);
>>
>
> Could well be. PAGE_MASK is very easy to get wrong. I've _never_
> trusted my own memory of it and I always have to go back to the
> definition when reviewing code :(
This should ~PAGE_MASK to get the offset into the page.
>> Much as I would like to submit a tested patch like the rest of the
>> world, I find myself in the situation where the only Freescale target
>> system I have to test on is running a 3 year old kernel (2.6.14),
>> which
>> preceeds the introduction of EDAC driver support, at least for
>> Freescale. So the best I can do is borrow from the new EDAC driver
>> and
>> backport it to the old kernel.
>>
>> But I have learned a few things in this process and can thus share
>> what
>> I've learned as it may be of help to the EDAC driver developers:
>>
>> 1) Before you read the Freescale 8548 CAPTURE_ADDRESS register, you
>> want
>> to read CAPTURE_ATTRIBUTES first and make sure the VLD bit (least
>> significant bit in the register) is set or else the data in
>> CAPTURE_ADDRESS may not be yet valid.
>>
>> 2) When you are done scrubbing the memory with the single bit
>> error, you
>> want to write 0 to CAPTURE_ATTRIBUTES so as to clear VLD and thus
>> setup
>> the ECC capture logic to capture the next single bit error.
This is a correct description based on how FSL error HW works.
- k
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check()
@ 2009-04-29 4:12 Doug Thompson
0 siblings, 0 replies; 7+ messages in thread
From: Doug Thompson @ 2009-04-29 4:12 UTC (permalink / raw)
To: Dave Jiang; +Cc: linux-kernel, Jeff Haran
Dave, can you look at this
doug thompson
--- On Tue, 4/28/09, Jeff Haran <jharan@Brocade.COM> wrote:
> From: Jeff Haran <jharan@Brocade.COM>
> Subject: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check()
> To: linux-kernel@vger.kernel.org
> Date: Tuesday, April 28, 2009, 7:23 PM
> Hi,
>
> Recent versions of this function contain the following
> snippets:
>
> if (err_detect & DDR_EDE_SBE)
> edac_mc_handle_ce(mci, pfn,
> err_addr & PAGE_MASK,
>
> syndrome, row_index, 0, mci->ctl_name);
>
> if (err_detect & DDR_EDE_MBE)
> edac_mc_handle_ue(mci, pfn,
> err_addr & PAGE_MASK,
>
> row_index, mci->ctl_name);
>
> I am pretty sure the references to PAGE_MASK should be
> proceeded by a
> tilda, as in:
>
> if (err_detect & DDR_EDE_SBE)
> edac_mc_handle_ce(mci, pfn,
> err_addr & ~PAGE_MASK,
>
> syndrome, row_index, 0, mci->ctl_name);
>
> if (err_detect & DDR_EDE_MBE)
> edac_mc_handle_ue(mci, pfn,
> err_addr & ~PAGE_MASK,
>
> row_index, mci->ctl_name);
>
>
> Much as I would like to submit a tested patch like the rest
> of the
> world, I find myself in the situation where the only
> Freescale target
> system I have to test on is running a 3 year old kernel
> (2.6.14), which
> preceeds the introduction of EDAC driver support, at least
> for
> Freescale. So the best I can do is borrow from the new EDAC
> driver and
> backport it to the old kernel.
>
> But I have learned a few things in this process and can
> thus share what
> I've learned as it may be of help to the EDAC driver
> developers:
>
> 1) Before you read the Freescale 8548 CAPTURE_ADDRESS
> register, you want
> to read CAPTURE_ATTRIBUTES first and make sure the VLD bit
> (least
> significant bit in the register) is set or else the data
> in
> CAPTURE_ADDRESS may not be yet valid.
>
> 2) When you are done scrubbing the memory with the single
> bit error, you
> want to write 0 to CAPTURE_ATTRIBUTES so as to clear VLD
> and thus setup
> the ECC capture logic to capture the next single bit
> error.
>
> Please include this email address in responses as I do not
> subscribe.
>
> Thanks,
>
> Jeff Haran
> Brocade
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-04-29 12:48 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-08 21:57 bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check() ? Jeff Haran
2009-04-10 21:47 ` Andrew Morton
2009-04-13 17:10 ` Dave Jiang
-- strict thread matches above, loose matches on Subject: below --
2009-04-29 1:23 bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check() Jeff Haran
2009-04-29 7:37 ` Andrew Morton
2009-04-29 12:46 ` Kumar Gala
2009-04-29 4:12 Doug Thompson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox