public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jojy Varghese <jojyv@juniper.net>
To: Guenter Roeck <linux@roeck-us.net>, Scott Wood <scottwood@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Guenter Roeck <groeck@juniper.net>,
	"hongtao.jia@freescale.com" <hongtao.jia@freescale.com>
Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500
Date: Tue, 30 Sep 2014 20:15:24 +0000	[thread overview]
Message-ID: <D0505C72.10E3B%jojyv@juniper.net> (raw)
In-Reply-To: <20140930155029.GA4724@roeck-us.net>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="euc-kr", Size: 7444 bytes --]



On 9/30/14 8:50 AM, "Guenter Roeck" <linux@roeck-us.net> wrote:

>On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
>> On Mon, 2014-09-29 at 23:03 +0000, Jojy Varghese wrote:
>> > 
>> > On 9/29/14 12:06 PM, "Guenter Roeck" <linux@roeck-us.net> wrote:
>> > 
>> > >On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
>> > >> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
>> > >> > From: Jojy G Varghese <jojyv@juniper.net>
>> > >> > 
>> > >> > For E500MC and E5500, a machine check exception in pci(e) memory
>>space
>> > >> > crashes the kernel.
>> > >> > 
>> > >> > Testing shows that the MCAR(U) register is zero on a MC
>>exception for
>> > >>the
>> > >> > E5500 core. At the same time, DEAR register has been found to
>>have the
>> > >> > address of the faulty load address during an MC exception for
>>this
>> > >>core.
>> > >> > 
>> > >> > This fix changes the current behavior to fixup the result
>>register
>> > >> > and instruction pointers in the case of a load operation on a
>>faulty
>> > >> > PCI address.
>> > >> > 
>> > >> > The changes are:
>> > >> > - Added the hook to pci machine check handing to the e500mc
>>machine
>> > >>check
>> > >> >   exception handler.
>> > >> > - For the E5500 core, load faulting address from SPRN_DEAR
>>register.
>> > >> >   As mentioned above, this is necessary because the E5500 core
>>does
>> > >>not
>> > >> >   report the fault address in the MCAR register.
>> > >> > 
>> > >> > Cc: Scott Wood <scottwood@freescale.com>
>> > >> > Signed-off-by: Jojy G Varghese <jojyv@juniper.net>
>> > >> > [Guenter Roeck: updated description]
>> > >> > Signed-off-by: Guenter Roeck <groeck@juniper.net>
>> > >> > Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>> > >> > ---
>> > >> >  arch/powerpc/kernel/traps.c   | 3 ++-
>> > >> >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
>> > >> >  2 files changed, 7 insertions(+), 1 deletion(-)
>> > >> > 
>> > >> > diff --git a/arch/powerpc/kernel/traps.c
>>b/arch/powerpc/kernel/traps.c
>> > >> > index 0dc43f9..ecb709b 100644
>> > >> > --- a/arch/powerpc/kernel/traps.c
>> > >> > +++ b/arch/powerpc/kernel/traps.c
>> > >> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs
>>*regs)
>> > >> >  	int recoverable = 1;
>> > >> >  
>> > >> >  	if (reason & MCSR_LD) {
>> > >> > -		recoverable = fsl_rio_mcheck_exception(regs);
>> > >> > +		recoverable = fsl_rio_mcheck_exception(regs) ||
>> > >> > +			fsl_pci_mcheck_exception(regs);
>> > >> >  		if (recoverable == 1)
>> > >> >  			goto silent_out;
>> > >> >  	}
>> > >> > diff --git a/arch/powerpc/sysdev/fsl_pci.c
>> > >>b/arch/powerpc/sysdev/fsl_pci.c
>> > >> > index c507767..bdb956b 100644
>> > >> > --- a/arch/powerpc/sysdev/fsl_pci.c
>> > >> > +++ b/arch/powerpc/sysdev/fsl_pci.c
>> > >> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct
>>pt_regs
>> > >>*regs)
>> > >> >  #endif
>> > >> >  	addr += mfspr(SPRN_MCAR);
>> > >> >  
>> > >> > +#ifdef CONFIG_E5500_CPU
>> > >> > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
>> > >> > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
>> > >> > +#endif
>> > >> 
>> > >> Kconfig tells you what hardware is supported, not what hardware
>>you're
>> > >> actually running on.
>> 
>> Plus, CONFIG_E5500_CPU may not even be set when running on an e5500, as
>> it is used for selecting GCC optimization settings.  You could have
>> CONFIG_GENERIC_CPU instead.
>> 
>> And the subject says "E500MC / E5500", not just "E5500". :-)
>> 
>> > >Hi Scott,
>> > >
>> > >Good point. Jojy, guess we'll have to check if the CPU is actually an
>> > >E5500.
>> > >Can you look into that ?
>> > 
>> > 
>> > "/proc/cpuinfo" shows the cpu as "e5500". Scott, are you suggesting
>>that
>> > we use a runtime method of determining the cpu type (cpu_spec's
>>cpu_name
>> > for
>> > example).  
>> 
>> Yes, if there's a bug to be worked around, and we don't want to apply
>> the workaround unconditionally, you should use PVR to determine whether
>> you're running on an affected core.
>> 
>> > >> Can we rely on DEAR or is this just a side effect of likely having
>>taken
>> > >> a TLB miss for the address recently?  Perhaps we should use the
>> > >> instruction emulation to determine the effective address instead.
>> > >> 
>> > >> Guenter, is this patch intended to deal with an erratum or are you
>> > >> covering up legitimate errors?
>> > >> 
>> >
>> > >Those are errors related to PCIe hotplug, and are seen with
>>unexpected
>> > >PCIe
>> > >device removals (triggered, for example, by removing power from a
>>PCIe
>> > >adapter).
>> > >The behavior we see on E5500 is quite similar to the same behavior on
>> > >E500:
>> > >If unhandled, the CPU keeps executing the same instruction over and
>>over
>> > >again
>> > >if there is an error on a PCIe access and thus stalls. I don't know
>>if
>> > >this
>> > >is considered an erratum or expected behavior, but it is one we have
>>to
>> > >address
>> > >since we have to be able to handle that condition.
>> 
>> The reason I ask is that the handling for e500 was described as an
>> erratum workaround.  If it is an erratum it would be nice to know the
>> erratum number and the full list of affected chips.
>> 
>My understanding, which may be wrong, was that this is expected behavior,
>at least for E5500. I actually thought I had seen it somewhere in the
>specification (response to PCIe errors), but I don't recall where exactly.
>
>At least for my part I am not aware of an erratum.
>
>> > >Ultimately, we'll want
>> > >to
>> > >implement PCIe error handlers for the affected drivers, but that
>>will be
>> > >a next
>> > >step.
>> 
>> For now can we at least print a ratelimited error message?  I don't like
>> the idea of silently ignoring these errors.  I suppose it's a separate
>> issue from extending the workaround to cover e500mc, though.
>> 
>I don't really like the idea of printing an error message pretty much
>each time
>when an unexpected hotplug event occurs.
>
>> > According to the spec, we MCAR is supposed to hold the faulty data
>>address
>> > but for 5500 core, we found that MCAR is zero.
>> 
>> Which specific chip and revision did you see this on?  What is the value
>> in MCSR?
>> 
>Jojy can answer that, at least for P5020. We have seen it on P5040 as
>well,
>though, so it is not just limited to one chip/revision.

The specifics are:
PVR: 0x80240012
Instruction that causes the MC exception: lwbrx
	The faulty load address is also present in RB. So we could change the
logic to use that 
instead of DEAR. What I don¡¯t know is of there are other cases also which
escapes the current logic.

					
				
			
		
	

>
>Guenter
>
>> > You are right that DEAR entry could
>> > be a resultOf a TLB miss but that©ös the register we could rely on.
>> 
>> If it's the result of a previous TLB miss then we can't rely on it.  The
>> translation might have been loaded into the TLB before the hotplug
>> event, or there might have been an interrupt between loading the
>> translation into the TLB and using the translation.
>> 
>> > What do you mean by "instruction emulation"?
>> 
>> mcheck_handle_load()
>> 
>> > Are you suggesting that we
>> > examine the RD, RS
>> > registers for the instruction?
>> 
>> Yes, if we don't have a simpler reliable source of the address.
>> 
>> -Scott
>> 
>> 

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

  reply	other threads:[~2014-09-30 20:30 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-29 16:48 [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500 Guenter Roeck
2014-09-29 18:36 ` Scott Wood
2014-09-29 19:06   ` Guenter Roeck
2014-09-29 23:03     ` Jojy Varghese
2014-09-29 23:31       ` Scott Wood
2014-09-30 15:50         ` Guenter Roeck
2014-09-30 20:15           ` Jojy Varghese [this message]
2014-09-30 20:17             ` Scott Wood
2014-09-30 20:20               ` Jojy Varghese
2014-10-01  0:43           ` Scott Wood
2014-10-08  3:10             ` Hongtao Jia
2014-10-08  3:08   ` Hongtao Jia
2014-10-08 23:48     ` Scott Wood
2014-10-09  2:18       ` Hongtao Jia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D0505C72.10E3B%jojyv@juniper.net \
    --to=jojyv@juniper.net \
    --cc=benh@kernel.crashing.org \
    --cc=groeck@juniper.net \
    --cc=hongtao.jia@freescale.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@samba.org \
    --cc=scottwood@freescale.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox