public inbox for linux-kernel@vger.kernel.org
* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
@ 2008-06-01  9:01 j.mell
  2008-06-01 11:40 ` Andi Kleen
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: j.mell @ 2008-06-01  9:01 UTC
  To: Steven Rostedt; +Cc: linux-kernel, ak

Hi,

> On Sat, May 17, 2008 at 06:31:08PM +0200, Jürgen Mell wrote:
> I tracked this down to a single kernel configuration option. If
> CONFIG_PREEMPT is set to 'y' the application will start crashing.
> If  CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the
> application will run without errors.

With lots of help from Heinz-Bernd, Bernd and Oliver of the Einstein@Home 
project, I have now found the following:

1. Einstein@home crashes with trap #8 when the problem is present. The 
error occurs anywhere from a few minutes to more than 10 hours after 
starting Einstein. This seems to depend on how many other applications 
are running on the system (it takes much longer if only the Einstein 
processes are active).

2. The error was introduced between kernel.org kernels 2.6.19.7 and 2.6.20. 
It is still present in 2.6.26-rc4.

3. If I revert the patch
 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acc207616a91a413a50fdd8847a747c4a7324167

in 2.6.20, Einstein does not crash anymore (program was run for more than 
30 hours while system was in normal use with programming, multi-media 
etc.). Unfortunately git refuses to revert this patch in 2.6.26-rc4.

Now I need some help as I am not an expert in this area. What I assume is 
that either the state of the FPU is not always restored (perhaps if the 
process is swapped between the two cores?) or it is restored more than 
once. Please keep in mind that I am always running two Einstein processes 
simultaneously on my two cores!
I am willing to do further testing of this problem if someone can give me a 
hint how to continue.
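
One possible way to continue testing without depending on the Einstein 
binary is a small stand-alone checker that repeats a deterministic 
floating-point computation and counts any round whose result differs from 
the first one. The sketch below is only an illustration: the function 
names are hypothetical, and it is an assumption (not something verified 
against this bug) that such a loop would expose the problem. The reported 
failure is a trap, not a silently wrong result, so a crash with signal 8 
while the loop runs would be just as informative as a mismatch.

```c
#include <string.h>

/* Deterministic workload: partial sum of the harmonic series.
 * 'volatile' forces a store on every iteration, so the result does not
 * depend on how long values stay in FPU registers. */
static double fp_workload(unsigned n)
{
    volatile double acc = 0.0;
    for (unsigned i = 1; i <= n; i++)
        acc += 1.0 / (double)i;
    return acc;
}

/* Runs the workload 'rounds' times and returns how many rounds produced
 * a bit pattern different from the reference computed up front. */
static unsigned long fp_consistency_check(unsigned long rounds, unsigned n)
{
    double ref = fp_workload(n);
    unsigned long mismatches = 0;

    for (unsigned long r = 0; r < rounds; r++) {
        double got = fp_workload(n);
        if (memcmp(&got, &ref, sizeof ref) != 0)
            mismatches++;
    }
    return mismatches;
}
```

To mimic the setup described above, one instance per core would be started 
(built as a 32-bit binary) and left running under normal desktop load; on 
a healthy system fp_consistency_check() should always return 0.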

Bye,

          Jürgen

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
@ 2008-05-24 18:52 j.mell
  0 siblings, 0 replies; 20+ messages in thread
From: j.mell @ 2008-05-24 18:52 UTC
  To: Steven Rostedt; +Cc: linux-kernel

On Sunday, 18 May 2008, you wrote:
> > On Sat, May 17, 2008 at 06:31:08PM +0200, Jürgen Mell wrote:
> > > I tracked this down to a single kernel configuration option. If
> > > CONFIG_PREEMPT is set to 'y' the application will start crashing. If
> > > CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the
> > > application will run without errors.

> > If you see it in the kernel.org kernel, can you please do a git-bisect
> > to see which commit caused the problem?

> This is a bit of a problem. I do not know whether there was *ever* a
> kernel version with CONFIG_PREEMPT and without this problem as I have
> not tried any older kernel version yet. I will go back to SUSE 10.2 and
> try the 2.6.18 kernel that comes with it.

I have now found that the problem was introduced somewhere between the 
kernel.org kernels 2.6.19.7 and 2.6.20. I will start bisecting now.
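
For readers unfamiliar with bisecting, the idea can be sketched as a 
binary search over an ordered list of commits whose first entry is known 
good and whose last entry is known bad. The code below is a simplified 
illustration, not what git does internally (real history is a graph, not 
an array); first_bad() and bad_from_threshold() are hypothetical names, 
and the predicate stands in for "build this kernel, run Einstein, see 
whether it crashes".

```c
#include <stddef.h>

/* Predicate type: nonzero if the commit at 'index' shows the problem. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/* Stand-in predicate for demonstration: every commit at or after the
 * threshold stored in ctx is bad. */
static int bad_from_threshold(size_t index, void *ctx)
{
    return index >= *(size_t *)ctx;
}

/* Assumes n >= 2, commit 0 is good, commit n-1 is bad, and all commits
 * after the first bad one are bad too. Returns the index of the first
 * bad commit and counts the tests performed in *tests_run. */
static size_t first_bad(size_t n, is_bad_fn is_bad, void *ctx,
                        size_t *tests_run)
{
    size_t good = 0, bad = n - 1;

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (tests_run)
            (*tests_run)++;
        if (is_bad(mid, ctx))
            bad = mid;   /* first bad commit is at mid or earlier */
        else
            good = mid;  /* first bad commit is after mid */
    }
    return bad;
}
```

With 5000 candidate commits (an arbitrary figure for illustration) this 
needs at most 13 tests. Because the crash can take more than 10 hours to 
appear, each "good" verdict needs a long run, and a single wrong "good" 
sends the search to the wrong half.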

Bye,

          Jürgen

* CONFIG_PREEMPT causes corruption of application's FPU stack
@ 2008-05-17 16:31 Jürgen Mell
  2008-05-18 15:07 ` Steven Rostedt
  0 siblings, 1 reply; 20+ messages in thread
From: Jürgen Mell @ 2008-05-17 16:31 UTC
  To: linux-kernel

I am running the Einstein@home application (version 4.35, 
http://einstein.phys.uwm.edu). This application does lots of computations, 
mostly with FPU and SSE instructions.
After I started experimenting with real-time optimized kernels, the 
application began to crash with floating-point errors like the 
following:

APP DEBUG: Application caught signal 8.

FPU status word ffffa0e1, flags:  ERR_SUMM STACK_FAULT PRECISION INVALID
Obtained 6 stack frames for this thread.
Use gdb command: 'info line *0xADDRESS' to print corresponding line numbers.
einstein_S5R3_4.35_i686-pc-linux-gnu[0x8069e7e]
einstein_S5R3_4.35_i686-pc-linux-gnu[0x818d436]
einstein_S5R3_4.35_i686-pc-linux-gnu[0x805db8f]
einstein_S5R3_4.35_i686-pc-linux-gnu[0x806b11c]
/lib/libc.so.6(__libc_start_main+0xe0)[0xb7e14fe0]
einstein_S5R3_4.35_i686-pc-linux-gnu(shmat+0x59)[0x804bda1]
Stack trace of LAL functions in worker thread:
GetSemiCohToplist at line 3177 of file /home/bema/einsteinathome/HierarchicalSearch/EaH_build_release_einstein_S5R3_4.35/extra_sources/lalapps-CVS/src/pulsar/hough/src2/HierarchicalSearch.c
At lowest level status code = 0, description: NO LAL ERROR REGISTERED
called boinc_finish
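
The status word in the message above can be decoded by hand. The sketch 
below follows the x87 status word layout from the Intel manuals; the 
struct and function names are hypothetical, and it is an assumption that 
only the low 16 bits of the reported value ffffa0e1 are significant.

```c
#include <stdint.h>

/* Selected fields of the 16-bit x87 FPU status word. */
struct x87_status {
    unsigned invalid;       /* bit 0, IE */
    unsigned precision;     /* bit 5, PE */
    unsigned stack_fault;   /* bit 6, SF */
    unsigned error_summary; /* bit 7, ES */
    unsigned c1;            /* bit 9, condition code C1 */
    unsigned top;           /* bits 11-13, top-of-stack pointer */
};

static struct x87_status x87_decode(uint32_t raw)
{
    /* Assumption: the upper 16 bits are padding, not status bits. */
    uint32_t sw = raw & 0xffffu;
    struct x87_status s;

    s.invalid       = (sw >> 0) & 1u;
    s.precision     = (sw >> 5) & 1u;
    s.stack_fault   = (sw >> 6) & 1u;
    s.error_summary = (sw >> 7) & 1u;
    s.c1            = (sw >> 9) & 1u;
    s.top           = (sw >> 11) & 7u;
    return s;
}

/* When SF is set, C1 distinguishes the direction of the fault:
 * returns 1 for stack overflow, -1 for stack underflow, 0 if no fault. */
static int x87_stack_fault_kind(uint32_t raw)
{
    struct x87_status s = x87_decode(raw);

    if (!s.stack_fault)
        return 0;
    return s.c1 ? 1 : -1;
}
```

For 0xffffa0e1 this yields IE, PE, SF and ES set, which matches the flags 
printed by the application, with TOP = 4 and C1 clear. Under the 
assumption above, that would point to a stack underflow, i.e. an 
instruction reading an x87 register that was marked empty.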

I tracked this down to a single kernel configuration option. If 
CONFIG_PREEMPT is set to 'y' the application will start crashing. If 
CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the application 
will run without errors.

The problem is reproducible insofar as the error always occurs when 
CONFIG_PREEMPT is set, but the time to the first occurrence varies greatly, 
from a few minutes up to more than 10 CPU hours.

I first found this error on an openSUSE kernel 2.6.22.17-0.1-rt. I verified 
the problem on the following kernel versions:

openSUSE 2.6.22.17-0.1-default
openSUSE 2.6.23.17-ccj64-rt
kernel.org 2.6.26-rc1
kernel.org 2.6.26-rc2-git5

My CPU is an Intel Core2Duo 6420, running two of the Einstein applications 
in 32-bit mode. From a discussion on the Einstein message boards I know 
that other users of the application are also affected.

Please let me know if you need any additional information to track this 
down.
              Jürgen


end of thread, other threads:[~2008-06-04 13:06 UTC | newest]

Thread overview: 20+ messages
2008-06-01  9:01 CONFIG_PREEMPT causes corruption of application's FPU stack j.mell
2008-06-01 11:40 ` Andi Kleen
2008-06-01 16:47   ` Jürgen Mell
2008-06-02 21:37     ` Suresh Siddha
2008-06-02 22:57       ` Suresh Siddha
2008-06-03  6:02         ` Jürgen Mell
2008-06-04  7:44         ` Jürgen Mell
2008-06-04 10:53           ` Ingo Molnar
2008-06-04 12:55             ` Steven Rostedt
2008-06-04 13:02               ` Ingo Molnar
2008-06-01 12:12 ` Steven Rostedt
2008-06-01 17:11 ` Simon Holm Thøgersen
2008-06-02 21:31   ` Suresh Siddha
2008-06-03 13:23     ` Simon Holm Thøgersen
2008-06-03 19:43       ` Suresh Siddha
2008-06-03 21:08         ` Simon Holm Thøgersen
  -- strict thread matches above, loose matches on Subject: below --
2008-05-24 18:52 j.mell
2008-05-17 16:31 Jürgen Mell
2008-05-18 15:07 ` Steven Rostedt
2008-05-18 15:57   ` Jürgen Mell
