* Maple: killing a process that causes a machine check exception
@ 2006-05-23 14:59 jfaslist
2006-05-23 15:15 ` Anton Blanchard
0 siblings, 1 reply; 6+ messages in thread
From: jfaslist @ 2006-05-23 14:59 UTC (permalink / raw)
To: linuxppc64-dev
Hi,
By applying the following changes (please see below), I was able to have a
user process that caused a machine check exception terminated (on
a Maple platform), as expected. I was wondering why PPC64 has
different ME handling from PPC, which does send SIGBUS to the process?
Thanks
Regards,
-jean-francois simon
diff -urN -X linux-2.6.16.14/Documentation/dontdiff linux-2.6.16.14/arch/powerpc/kernel/traps.c linux-2.6.16.14.vmeberr_fix/arch/powerpc/kernel/traps.c
--- linux-2.6.16.14/arch/powerpc/kernel/traps.c 2006-05-04 17:03:45.000000000 -0700
+++ linux-2.6.16.14.vmeberr_fix/arch/powerpc/kernel/traps.c 2006-05-09 02:46:59.000000000 -0700
@@ -340,12 +340,19 @@
 #ifdef CONFIG_PPC64
         int recover = 0;
+
         /* See if any machine dependent calls */
         if (ppc_md.machine_check_exception)
                 recover = ppc_md.machine_check_exception(regs);
         if (recover)
                 return;
+
+        if (user_mode(regs)) {
+                regs->msr |= MSR_RI;
+                _exception(SIGBUS, regs, BUS_ADRERR, regs->nip);
+                return;
+        }
 #else
         unsigned long reason = get_mc_reason(regs);
___________________________________________________________________________
Yahoo! Mail reinvents email! Discover the new Yahoo! Mail and its revolutionary interface.
http://fr.mail.yahoo.com
* Re: Maple: killing a process that causes a machine check exception
2006-05-23 14:59 Maple: killing a process that causes a machine check exception jfaslist
@ 2006-05-23 15:15 ` Anton Blanchard
2006-05-23 16:09 ` jfaslist
0 siblings, 1 reply; 6+ messages in thread
From: Anton Blanchard @ 2006-05-23 15:15 UTC (permalink / raw)
To: jfaslist; +Cc: linuxppc64-dev
> By applying the following changes (please see below), I was able to have a
> user process that caused a machine check exception terminated (on
> a Maple platform), as expected. I was wondering why PPC64 has
> different ME handling from PPC, which does send SIGBUS to the process?
Not all machine checks are synchronous, so we can't always do this. From
memory, the ppc64 version will panic if the machine check wasn't
synchronous.
Anton
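The distinction drawn above can be sketched as a small decision function. This is a user-space model only, not kernel code: `struct mc_event`, `enum mc_action` and `mc_decide` are hypothetical names, and treating a synchronous kernel-mode fault as fatal is an assumption of the sketch rather than something stated in the thread.

```c
#include <stdbool.h>

/* Hypothetical model of what a handler would know about a machine check. */
struct mc_event {
    bool synchronous; /* raised by the instruction the current task was executing */
    bool user_mode;   /* exception was taken while running user-space code */
};

enum mc_action { MC_SIGBUS_TASK, MC_PANIC };

/*
 * Only a synchronous fault taken in user mode can safely be blamed on the
 * current task. Anything else (for example a DMA error from a device) is
 * treated as fatal in this sketch.
 */
enum mc_action mc_decide(const struct mc_event *ev)
{
    if (ev->synchronous && ev->user_mode)
        return MC_SIGBUS_TASK;
    return MC_PANIC;
}
```

The point of the model is that the `user_mode` test alone, as in the posted patch, is not sufficient: the handler also needs some platform-specific way to know the fault was synchronous.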
* Re: Maple: killing a process that causes a machine check exception
2006-05-23 15:15 ` Anton Blanchard
@ 2006-05-23 16:09 ` jfaslist
2006-05-23 16:23 ` Anton Blanchard
0 siblings, 1 reply; 6+ messages in thread
From: jfaslist @ 2006-05-23 16:09 UTC (permalink / raw)
To: Anton Blanchard; +Cc: linuxppc64-dev
Hi,
What do you mean by synchronous? Do you mean that the current process
may not be the one that caused the ME?
In my case I _need_ the process to be killed, as it is causing a VME bus
error / PCI target-abort.
I am starting to get desperate. I have been working for several months
with IBM to get a solution to machine-check-related issues on the Maple.
First, the Maple was hanging hard. Now that this is fixed, I need the
Linux ME handler to kill the offending process!
My feeling now is that I am really starting to have had it with ppc64!
-jfs
Anton Blanchard wrote:
>>By applying the following changes (please see below), I was able to have a
>>user process that caused a machine check exception terminated (on
>>a Maple platform), as expected. I was wondering why PPC64 has
>>different ME handling from PPC, which does send SIGBUS to the process?
>
>Not all machine checks are synchronous, so we can't always do this. From
>memory, the ppc64 version will panic if the machine check wasn't
>synchronous.
>
>Anton
* Re: Maple: killing a process that causes a machine check exception
2006-05-23 16:09 ` jfaslist
@ 2006-05-23 16:23 ` Anton Blanchard
2006-05-23 16:30 ` jfaslist
2006-05-23 16:48 ` Linas Vepstas
0 siblings, 2 replies; 6+ messages in thread
From: Anton Blanchard @ 2006-05-23 16:23 UTC (permalink / raw)
To: jfaslist; +Cc: linuxppc64-dev
Hi,
> What do you mean by synchronous? Do you mean that the current process
> may not be the one that caused the ME?
Yeah, a device doing DMA might cause a machine check independent of your
current task. In that case we really need to take the machine down.
> In my case I _need_ the process to be killed, as it is causing a VME bus
> error / PCI target-abort.
> I am starting to get desperate. I have been working for several months
> with IBM to get a solution to machine-check-related issues on the Maple.
> First, the Maple was hanging hard. Now that this is fixed, I need the
> Linux ME handler to kill the offending process!
> My feeling now is that I am really starting to have had it with ppc64!
Sounds like you need a Maple-specific machine check handler. My point is
we can't merge a fix like that because it affects every powerpc arch out
there, all with different machine check handling requirements.
Anton
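A platform-specific handler of the kind suggested above would plug into the `ppc_md.machine_check_exception` hook that the posted diff already calls. The following is a user-space model of that dispatch, not kernel code: `struct regs_model`, `struct machdep_model`, `maple_machine_check_model` and `generic_machine_check_model` are hypothetical stand-ins, and the `sigbus_sent` flag merely stands in for delivering SIGBUS to the task.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's pt_regs. */
struct regs_model {
    bool user_mode;   /* fault taken in user space */
    bool sigbus_sent; /* set where a real handler would deliver SIGBUS */
};

/* Hypothetical stand-in for ppc_md: one optional per-platform hook. */
struct machdep_model {
    /* Returns nonzero if the platform handled (recovered from) the fault. */
    int (*machine_check_exception)(struct regs_model *regs);
};

/* Maple-only policy in this sketch: signal the faulting user task. */
int maple_machine_check_model(struct regs_model *regs)
{
    if (!regs->user_mode)
        return 0;              /* leave kernel-mode faults to the generic path */
    regs->sigbus_sent = true;
    return 1;
}

/*
 * Generic path, mirroring the structure of the code in the diff: ask the
 * platform hook first. Returns true if recovered, false if the caller
 * should fall through to the fatal path.
 */
bool generic_machine_check_model(const struct machdep_model *md,
                                 struct regs_model *regs)
{
    int recover = 0;

    if (md->machine_check_exception)
        recover = md->machine_check_exception(regs);
    return recover != 0;
}
```

Keeping the policy behind the hook means platforms that do not install a handler keep their existing behaviour, which is the objection raised to changing the shared code.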
* Re: Maple: killing a process that causes a machine check exception
2006-05-23 16:23 ` Anton Blanchard
@ 2006-05-23 16:30 ` jfaslist
2006-05-23 16:48 ` Linas Vepstas
1 sibling, 0 replies; 6+ messages in thread
From: jfaslist @ 2006-05-23 16:30 UTC (permalink / raw)
To: Anton Blanchard; +Cc: linuxppc64-dev
>>In my case I _need_ the process to be killed, as it is causing a VME bus
>>error / PCI target-abort.
>>I am starting to get desperate. I have been working for several months
>>with IBM to get a solution to machine-check-related issues on the Maple.
>>First, the Maple was hanging hard. Now that this is fixed, I need the
>>Linux ME handler to kill the offending process!
>>My feeling now is that I am really starting to have had it with ppc64!
>
>Sounds like you need a Maple-specific machine check handler. My point is
>we can't merge a fix like that because it affects every powerpc arch out
>there, all with different machine check handling requirements.
>
Understood, thanks. Sorry for letting off a little steam.
-jean-francois simon
* Re: Maple: killing a process that causes a machine check exception
2006-05-23 16:23 ` Anton Blanchard
2006-05-23 16:30 ` jfaslist
@ 2006-05-23 16:48 ` Linas Vepstas
1 sibling, 0 replies; 6+ messages in thread
From: Linas Vepstas @ 2006-05-23 16:48 UTC (permalink / raw)
To: Anton Blanchard; +Cc: linuxppc64-dev
On Wed, May 24, 2006 at 02:23:48AM +1000, Anton Blanchard wrote:
> jfaslist <jfaslist@yahoo.fr> wrote:
> > What do you mean by synchronous? Do you mean that the current process
> > may not be the one that caused the ME?
>
> Yeah, a device doing DMA might cause a machine check independent of your
> current task. In that case we really need to take the machine down.
>
> > In my case I _need_ the process to be killed, as it is causing a VME bus
> > error / PCI target-abort.
>
> Sounds like you need a Maple-specific machine check handler. My point is
> we can't merge a fix like that because it affects every powerpc arch out
> there, all with different machine check handling requirements.
Here's an utterly crazy idea that might take a lot of work to implement,
but might help with the problem. *If* it can be determined which PCI device
caused the error, then it might be possible to reset the PCI device and
restart the device driver.
There is an existing infrastructure for "PCI Error Recovery" (known as
EEH on the pSeries) for detecting and clearing PCI bus errors. On the
pSeries, it depends on a combination of custom hardware PCI bridges and
firmware to isolate the failing device; but maybe on other systems, one
might be able to do "almost" as well.
(See kernel source, Documentation/pci-error-recovery.txt)
--linas
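As a rough illustration of the reset-and-restart idea, here is a user-space model of a driver exposing recovery callbacks to a platform layer. It is deliberately simplified and is not the kernel API: `enum ers_result`, `struct err_handlers_model`, `recover_device_model` and the `demo_*` driver are hypothetical names, and the real flow described in Documentation/pci-error-recovery.txt has more steps than the three modelled here.

```c
/* Hypothetical result codes a driver callback can return in this model. */
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };

/* Hypothetical per-driver callback table. */
struct err_handlers_model {
    enum ers_result (*error_detected)(void *dev); /* error reported, I/O blocked */
    enum ers_result (*slot_reset)(void *dev);     /* called after the slot was reset */
    void (*resume)(void *dev);                    /* normal operation may restart */
};

/* Returns 1 if the device was brought back, 0 if it must be taken offline. */
int recover_device_model(const struct err_handlers_model *h, void *dev)
{
    enum ers_result r;

    if (!h || !h->error_detected)
        return 0;                   /* driver cannot take part in recovery */

    r = h->error_detected(dev);
    if (r == ERS_NEED_RESET) {
        if (!h->slot_reset)
            return 0;
        /* A real platform would reset the PCI slot at this point. */
        r = h->slot_reset(dev);
    }
    if (r != ERS_RECOVERED && r != ERS_CAN_RECOVER)
        return 0;                   /* driver gave up on the device */

    if (h->resume)
        h->resume(dev);
    return 1;
}

/* Demo driver: always asks for a reset, then reports success. */
struct demo_dev { int resets; int resumed; };

static enum ers_result demo_error_detected(void *dev)
{
    (void)dev;
    return ERS_NEED_RESET;
}

static enum ers_result demo_slot_reset(void *dev)
{
    ((struct demo_dev *)dev)->resets++;
    return ERS_RECOVERED;
}

static void demo_resume(void *dev)
{
    ((struct demo_dev *)dev)->resumed = 1;
}

const struct err_handlers_model demo_handlers = {
    demo_error_detected, demo_slot_reset, demo_resume
};
```

The hard part on a platform without EEH-style hardware is the step this model takes for granted: identifying which device caused the error so that only its driver is called.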