CONFIG_PREEMPT causes corruption of application's FPU stack

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* CONFIG_PREEMPT causes corruption of application's FPU stack
@ 2008-05-17 16:31 Jürgen Mell
  2008-05-18 15:07 ` Steven Rostedt
  0 siblings, 1 reply; 20+ messages in thread
From: Jürgen Mell @ 2008-05-17 16:31 UTC (permalink / raw)
  To: linux-kernel

I am running the Einstein@home application (version 4.35, 
http://einstein.phys.uwm.edu).This application does lots of computations 
mostly with FPU and SSE instructions.
After I started experimenting with real-time optimized kernels the 
application began to crash with floating point errors like in the 
following message:

APP DEBUG: Application caught signal 8.

FPU status word ffffa0e1, flags:  ERR_SUMM STACK_FAULT PRECISION INVALID
Obtained 6 stack frames for this thread.
Use gdb command: 'info line *0xADDRESS' to print corresponding line 
numbers.
einstein_S5R3_4.35_i686-pc-linux-gnu[0x8069e7e]
einstein_S5R3_4.35_i686-pc-linux-gnu[0x818d436]
einstein_S5R3_4.35_i686-pc-linux-gnu[0x805db8f]
einstein_S5R3_4.35_i686-pc-linux-gnu[0x806b11c]
/lib/libc.so.6(__libc_start_main+0xe0)[0xb7e14fe0]
einstein_S5R3_4.35_i686-pc-linux-gnu(shmat+0x59)[0x804bda1]
Stack trace of LAL functions in worker thread:
GetSemiCohToplist at line 3177 of 
file /home/bema/einsteinathome/HierarchicalSearch/EaH_build_release_einstein_S5R3_4.35/extra_sources/lalapps-CVS/src/pulsar/hough/src2/HierarchicalSearch.c
At lowest level status code = 0, description: NO LAL ERROR REGISTERED
called boinc_finish

I tracked this down to a single kernel configuration option. If 
CONFIG_PREEMPT is set to 'y' the application will start crashing. If 
CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the application 
will run without errors.

The problem is reproducible in so far as the error always occurs when 
CONFIG_PREEMPT is set, but the time to the first occurrence varies greatly 
from some minutes up to more than 10 CPU hours.

I found this error first on an openSUSE kernel 2.6.22.17-0.1-rt. I verified 
the problem on the following kernel versions:

openSUSE 2.6.22.17-0.1-default
openSUSE 2.6.23.17-ccj64-rt
kernel.org 2.6.26-rc1
kernel.org 2.6.26-rc2-git5

My CPU is an Intel Core2Duo 6420, running two of the Einstein applications 
in 32-bit mode. From a discussion on the Einstein message boards I know 
that other user of the application are also affected.

Please let me know if you need any additional information to track this 
down.
              Jürgen

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-05-17 16:31 CONFIG_PREEMPT causes corruption of application's FPU stack Jürgen Mell
@ 2008-05-18 15:07 ` Steven Rostedt
  2008-05-18 15:57   ` Jürgen Mell
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2008-05-18 15:07 UTC (permalink / raw)
  To: J?rgen Mell; +Cc: linux-kernel

On Sat, May 17, 2008 at 06:31:08PM +0200, J?rgen Mell wrote:
> 
> I tracked this down to a single kernel configuration option. If 
> CONFIG_PREEMPT is set to 'y' the application will start crashing. If 
> CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the application 
> will run without errors.
> 
> The problem is reproducible in so far as the error always occurs when 
> CONFIG_PREEMPT is set, but the time to the first occurrence varies greatly 
> from some minutes up to more than 10 CPU hours.
> 
> I found this error first on an openSUSE kernel 2.6.22.17-0.1-rt. I verified 
> the problem on the following kernel versions:
> 
> openSUSE 2.6.22.17-0.1-default
> openSUSE 2.6.23.17-ccj64-rt
> kernel.org 2.6.26-rc1
> kernel.org 2.6.26-rc2-git5

So you see this error in both the SuSE RT kernel, *and* mainline
kernel.org?

If you see it in the kernel.org kernel, can you please do a git-bisect
to see which commit caused the problem?

Thanks,

-- Steve

> 
> My CPU is an Intel Core2Duo 6420, running two of the Einstein applications 
> in 32-bit mode. From a discussion on the Einstein message boards I know 
> that other user of the application are also affected.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-05-18 15:07 ` Steven Rostedt
@ 2008-05-18 15:57   ` Jürgen Mell
  0 siblings, 0 replies; 20+ messages in thread
From: Jürgen Mell @ 2008-05-18 15:57 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel

On Sonday, 18. May 2008, you wrote:
> On Sat, May 17, 2008 at 06:31:08PM +0200, J?rgen Mell wrote:
> > I tracked this down to a single kernel configuration option. If
> > CONFIG_PREEMPT is set to 'y' the application will start crashing. If
> > CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the
> > application will run without errors.
> >
> > The problem is reproducible in so far as the error always occurs when
> > CONFIG_PREEMPT is set, but the time to the first occurrence varies
> > greatly from some minutes up to more than 10 CPU hours.
> >
> > I found this error first on an openSUSE kernel 2.6.22.17-0.1-rt. I
> > verified the problem on the following kernel versions:
> >
> > openSUSE 2.6.22.17-0.1-default
> > openSUSE 2.6.23.17-ccj64-rt
> > kernel.org 2.6.26-rc1
> > kernel.org 2.6.26-rc2-git5
>
> So you see this error in both the SuSE RT kernel, *and* mainline
> kernel.org?

Yes, that is correct. The error is present from the 2.6.22 SUSE kernel up 
to the most recent mainline kernel. It is also present in the standard 
SUSE kernel, if I just modify *only* CONFIG_PREEMPT.
What makes me wonder: I am using the machine in a production environment 
for programming, multi-media etc. Why does only the Einstein program catch 
the SIGFPE? Normally I would expect other programs to crash, too, if the 
problem is present. But up to now this never happened.

> If you see it in the kernel.org kernel, can you please do a git-bisect
> to see which commit caused the problem?

This is a bit of a problem. I do not know whether there was *ever* a kernel 
version with CONFIG_PREEMPT and without this problem as I have not tried 
any older kernel version yet. I will go back to SUSE 10.2 and try the 
2.6.18 kernel that comes with it.

Bye,
           Jürgen

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
@ 2008-05-24 18:52 j.mell
  0 siblings, 0 replies; 20+ messages in thread
From: j.mell @ 2008-05-24 18:52 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel

On Sonday, 18. May 2008, you wrote:
> > On Sat, May 17, 2008 at 06:31:08PM +0200, J?rgen Mell wrote:
> > > I tracked this down to a single kernel configuration option. If
> > > CONFIG_PREEMPT is set to 'y' the application will start crashing. If
> > > CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the
> > > application will run without errors.

> > If you see it in the kernel.org kernel, can you please do a git-bisect
> > to see which commit caused the problem?

> This is a bit of a problem. I do not know whether there was *ever* a
> kernel  version with CONFIG_PREEMPT and without this problem as I have
> not tried  any older kernel version yet. I will go back to SUSE 10.2 and
> try the  2.6.18 kernel that comes with it.

I found now that the problem was introduced somewhere between the 
kernel.org kernels 2.6.19.7 and 2.6.20. I will start bisecting now.

Bye,

          Jürgen

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
@ 2008-06-01  9:01 j.mell
  2008-06-01 11:40 ` Andi Kleen
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: j.mell @ 2008-06-01  9:01 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, ak

Hi,

> On Sat, May 17, 2008 at 06:31:08PM +0200, J?rgen Mell wrote:
> I tracked this down to a single kernel configuration option. If
> CONFIG_PREEMPT is set to 'y' the application will start crashing.
> If  CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the
> application will run without errors.

With lots of help from Heinz-Bernd, Bernd and Oliver of the Einstein@Home 
project I now found the the following:

1. Einstein@home will crash with trap #8 if the problem is present. The 
error occurs between some minutes after starting Einstein up to more than 
10 hours after starting Einstein. This seems to depend on how many other 
applications are used on the system (it takes much more time, if only the 
Einstein processes are active on the system).

2. The error was introduced between kernel.org kernels 2.6.19.7 and 2.6.20. 
It is still present in 2.6.26-rc4

3. If I revert the patch

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acc207616a91a413a50fdd8847a747c4a7324167

in 2.6.20, Einstein does not crash anymore (program was run for more than 
30 hours while system was in normal use with programming, multi-media 
etc.). Unfortunately git refuses to revert this patch in 2.6.26-rc4.

Now I need some help as I am not an expert in this area. What I assume is 
that either the state of the FPU is not always restored (perhaps if the 
process is swapped between the two cores?) or it is restored more than 
once. Please keep in mind, that I am always running two Einstein processes 
simultaneously on my two cores!
I am willing to do further testing of this problem if someone can give me a 
hint how to continue.

Bye,

          Jürgen

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-01  9:01 j.mell
@ 2008-06-01 11:40 ` Andi Kleen
  2008-06-01 16:47   ` Jürgen Mell
  2008-06-01 12:12 ` Steven Rostedt
  2008-06-01 17:11 ` Simon Holm Thøgersen
  2 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2008-06-01 11:40 UTC (permalink / raw)
  To: j.mell; +Cc: Steven Rostedt, linux-kernel, suresh.b.siddha, arjan

j.mell@t-online.de writes:

> or it is restored more than 
> once. Please keep in mind, that I am always running two Einstein processes 
> simultaneously on my two cores!
> I am willing to do further testing of this problem if someone can give me a 
> hint how to continue.

My bet would have been actually on aa283f49276e7d840a40fb01eee6de97eaa7e012
because it does some nasty things (enable interrupts in the middle
of __switch_to).

I looked through the old patchkit and couldn't find any specific 
PREEMPT problems. All code it changes should run with preempt_off

You could verify with sticking WARN_ON_ONCE(preemptible()) into 
all the places acc207616a91a413a50fdd8847a747c4a7324167 
changes (__unlazy_fpu, math_state_restore) and see if that triggers
anywhere.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-01  9:01 j.mell
  2008-06-01 11:40 ` Andi Kleen
@ 2008-06-01 12:12 ` Steven Rostedt
  2008-06-01 17:11 ` Simon Holm Thøgersen
  2 siblings, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2008-06-01 12:12 UTC (permalink / raw)
  To: j.mell; +Cc: LKML, Chuck Ebbert, Arjan van de Ven, Andrew Morton, Andi Kleen


[
  Fixed Andi's email and added those that Signed off on the
  problem commit.
]


On Sun, 1 Jun 2008 j.mell@t-online.de wrote:

>
> Hi,
>
> > On Sat, May 17, 2008 at 06:31:08PM +0200, J?rgen Mell wrote:
> > I tracked this down to a single kernel configuration option. If
> > CONFIG_PREEMPT is set to 'y' the application will start crashing.
> > If  CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the
> > application will run without errors.
>
> With lots of help from Heinz-Bernd, Bernd and Oliver of the Einstein@Home
> project I now found the the following:
>
> 1. Einstein@home will crash with trap #8 if the problem is present. The
> error occurs between some minutes after starting Einstein up to more than
> 10 hours after starting Einstein. This seems to depend on how many other
> applications are used on the system (it takes much more time, if only the
> Einstein processes are active on the system).
>
> 2. The error was introduced between kernel.org kernels 2.6.19.7 and 2.6.20.
> It is still present in 2.6.26-rc4
>
> 3. If I revert the patch
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acc207616a91a413a50fdd8847a747c4a7324167

Hi,

Thanks for bisecting this. I added the commiter and those that signed off
on the problem commit. They are the ones that will need to help you solve
this.

-- Steve


>
> in 2.6.20, Einstein does not crash anymore (program was run for more than
> 30 hours while system was in normal use with programming, multi-media
> etc.). Unfortunately git refuses to revert this patch in 2.6.26-rc4.
>
> Now I need some help as I am not an expert in this area. What I assume is
> that either the state of the FPU is not always restored (perhaps if the
> process is swapped between the two cores?) or it is restored more than
> once. Please keep in mind, that I am always running two Einstein processes
> simultaneously on my two cores!
> I am willing to do further testing of this problem if someone can give me a
> hint how to continue.
>
> Bye,
>
>           Jürgen
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-01 11:40 ` Andi Kleen
@ 2008-06-01 16:47   ` Jürgen Mell
  2008-06-02 21:37     ` Suresh Siddha
  0 siblings, 1 reply; 20+ messages in thread
From: Jürgen Mell @ 2008-06-01 16:47 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Steven Rostedt, linux-kernel, suresh.b.siddha, arjan

On Sonntag, 1. Juni 2008, Andi Kleen wrote:
> j.mell@t-online.de writes:
> > or it is restored more than
> > once. Please keep in mind, that I am always running two Einstein
> > processes simultaneously on my two cores!
> > I am willing to do further testing of this problem if someone can give
> > me a hint how to continue.
>
> My bet would have been actually on
> aa283f49276e7d840a40fb01eee6de97eaa7e012 because it does some nasty
> things (enable interrupts in the middle of __switch_to).
>
> I looked through the old patchkit and couldn't find any specific
> PREEMPT problems. All code it changes should run with preempt_off
>
> You could verify with sticking WARN_ON_ONCE(preemptible()) into
> all the places acc207616a91a413a50fdd8847a747c4a7324167
> changes (__unlazy_fpu, math_state_restore) and see if that triggers
> anywhere.

No, that did not trigger. I put the WARN_ON_ONCE into process.c, traps.c 
and also into the __unlazy_fpu macro in i387.h but I got no messages 
anywhere (dmesg, /var/log/messages, /var/log/warn) when the trap #8 
occurred. 
Meanwhile I am also running the tests on another machine to make sure it is 
not a hardware-related problem.

Any new ideas are welcome!

Meanwhile I will go back to 2.6.20 and revert
aa283f49276e7d840a40fb01eee6de97eaa7e012. Maybe I got on a wrong track...

Bye, 
         Jürgen

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-01  9:01 j.mell
  2008-06-01 11:40 ` Andi Kleen
  2008-06-01 12:12 ` Steven Rostedt
@ 2008-06-01 17:11 ` Simon Holm Thøgersen
  2008-06-02 21:31   ` Suresh Siddha
  2 siblings, 1 reply; 20+ messages in thread
From: Simon Holm Thøgersen @ 2008-06-01 17:11 UTC (permalink / raw)
  To: j.mell; +Cc: Steven Rostedt, linux-kernel, ak

søn, 01 06 2008 kl. 11:01 +0200, skrev j.mell@t-online.de:
[...]
> 
> 3. If I revert the patch
>  
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acc207616a91a413a50fdd8847a747c4a7324167
> 
> in 2.6.20, Einstein does not crash anymore (program was run for more than 
> 30 hours while system was in normal use with programming, multi-media 
> etc.). Unfortunately git refuses to revert this patch in 2.6.26-rc4.
[...]

I don't think the bisected commit is responsible for anything, but
triggering a bug elsewhere with your workload. I've been chasing the
same problem I think, but with other symptoms.

I'm triggering the following by running an lguest guest, but I guess the
workload just need to have the right scheduler intensity to trigger the
bug.

BUG: sleeping function called from invalid context at mm/slab.c:3052
in_atomic():1, irqs_disabled():0
Pid: 4771, comm: lguest Not tainted
2.6.26-rc4-debug-only-preemptible-00103-g1beee8d #3
 [<c01146ee>] __might_sleep+0xe4/0xeb
 [<c01605d9>] kmem_cache_alloc+0x22/0xb4
 [<c0108479>] init_fpu+0xb0/0x14d
 [<c0104768>] math_state_restore+0x26/0x5d
 [<c01045ab>] device_not_available+0x43/0x48
 [<c011007b>] ? handle_vm86_fault+0x213/0x6b8
 [<c01029ad>] ? __switch_to+0x23/0x113
 [<c02d6c9f>] schedule+0x221/0x2a4
 [<c02d716a>] ? schedule_timeout+0x16/0x89
 [<c016ed36>] ? __pollwait+0xaa/0xb0
 [<c0168358>] ? pipe_poll+0x29/0x89
 [<c016e79a>] ? do_select+0x478/0x4cd
 [<c016ec8c>] ? __pollwait+0x0/0xb0
 [<c0116657>] ? default_wake_function+0x0/0xd
 [<c0116657>] ? default_wake_function+0x0/0xd
 [<c0116657>] ? default_wake_function+0x0/0xd
 [<c0116657>] ? default_wake_function+0x0/0xd
 [<c0116657>] ? default_wake_function+0x0/0xd
 [<c012f49d>] ? getnstimeofday+0x37/0xb7
 [<c012d810>] ? ktime_get_ts+0x40/0x44
 [<c012d827>] ? ktime_get+0x13/0x2f
 [<c011406f>] ? hrtick_start_fair+0xd5/0x111
 [<c01216f6>] ? internal_add_timer+0x8e/0x92
 [<c01f27b7>] ? delay_tsc+0x4f/0x68
 [<c025b05b>] ? ide_dma_intr+0x0/0x79
 [<c01f274d>] ? __delay+0x9/0xb
 [<c01f2766>] ? __const_udelay+0x17/0x19
 [<c0256df7>] ? ide_execute_command+0x7b/0x95
 [<c025a7a2>] ? ide_dma_start+0x24/0x36
 [<c0259ebb>] ? do_rw_taskfile+0x1be/0x1cf
 [<c025c31a>] ? ide_do_rw_disk+0x19a/0x1dd
 [<c01f2766>] ? __const_udelay+0x17/0x19
 [<c025554e>] ? ide_do_request+0x838/0x875
 [<c0254941>] ? ide_end_request+0x7d/0x99
 [<c016e9d8>] ? core_sys_select+0x1e9/0x2c7
 [<c0146aa4>] ? find_lock_page+0xa1/0xbb
 [<c01489a2>] ? filemap_fault+0x21c/0x382
 [<c0146942>] ? unlock_page+0x24/0x27
 [<c0151bac>] ? __do_fault+0x314/0x34c
 [<c0153a56>] ? handle_mm_fault+0x291/0x65a
 [<c016edcb>] ? sys_select+0x8f/0x143
 [<c02d9dc8>] ? do_page_fault+0x33c/0x616
 [<c0103a67>] ? sysenter_past_esp+0x78/0xb1
 [<c02d0000>] ? packet_rcv+0x159/0x2c7
 =======================
BUG: sleeping function called from invalid context at mm/slab.c:3052
in_atomic():1, irqs_disabled():0
Pid: 4771, comm: lguest Not tainted
2.6.26-rc4-debug-only-preemptible-00103-g1beee8d #3
 [<c01146ee>] __might_sleep+0xe4/0xeb
 [<c01605d9>] kmem_cache_alloc+0x22/0xb4
 [<c012f49d>] ? getnstimeofday+0x37/0xb7
 [<c012f49d>] ? getnstimeofday+0x37/0xb7
 [<c0108479>] init_fpu+0xb0/0x14d
 [<c0104768>] math_state_restore+0x26/0x5d
 [<c01045ab>] device_not_available+0x43/0x48
 [<c0110000>] ? handle_vm86_fault+0x198/0x6b8
 [<c01029ad>] ? __switch_to+0x23/0x113
 [<c02d6c9f>] schedule+0x221/0x2a4
 [<c012a95b>] ? prepare_to_wait+0x6c/0x84
 [<c0168bdd>] ? pipe_wait+0x53/0x72
 [<c012a76f>] ? autoremove_wake_function+0x0/0x30
 [<c01692b9>] ? pipe_read+0x29a/0x302
 [<c012d6ae>] ? hrtimer_start+0xcc/0xf8
 [<c0115efd>] ? hrtick_set+0xcc/0x140
 [<c01630b0>] ? do_sync_read+0xba/0xf8
 [<c012a76f>] ? autoremove_wake_function+0x0/0x30
 [<c01638b8>] ? default_llseek+0xa7/0xb5
 [<c0162ff6>] ? do_sync_read+0x0/0xf8
 [<c0163795>] ? vfs_read+0x8a/0x106
 [<c0163ada>] ? sys_read+0x3b/0x60
 [<c0103a67>] ? sysenter_past_esp+0x78/0xb1
 =======================

I'm getting the traces with CONFIG_DEBUG_PREEMPT=y and no
CONFIG_DEBUG_SPINLOCK, since otherwise I'd just get 

BUG: sleeping function called from invalid context at mm/slab.c:3052
BUG: spinlock recursion on CPU#0, lguest/5428

and nothing further. Using CONFIG_DEBUG_PREEMPT=y,
CONFIG_DEBUG_SPINLOCK=y and booting with "lapic nmi_watchdog=2" I was
also able to grab the following over netconsole though

BUG: sleeping function called from invalid context at mm/slab.c:3052
BUG: spinlock recursion on CPU#0, lguest/5428
BUG: NMI Watchdog detected LOCKUP on CPU0, ip c01f1f76, registers:
Modules linked in: lg tun arc4 ecb crypto_blkcipher cryptomgr
crypto_algapi ieee80211_crypt_wep bridge llc snd_seq snd_seq_device
radeonfb uhci_hcd snd_intel8x0 ehci_hcd snd_ac97_codec ipw2200 usbcore
fb ac97_bus fb_ddc backlight snd_pcm ieee80211 firewire_ohci
firewire_core i2c_algo_bit snd_timer intel_agp cfbcopyarea snd agpgart
i2c_i801 ieee80211_crypt firmware_class cfbimgblt soundcore crc_itu_t
cfbfillrect rtc iTCO_wdt snd_page_alloc i2c_core

Pid: 5428, comm: lguest Not tainted
(2.6.26-rc4debug-locks-extended-00103-g1beee8d #2)
EIP: 0060:[<c01f1f76>] EFLAGS: 00000002 CPU: 0
EIP is at delay_tsc+0x2e/0x68
EAX: 100a8436 EBX: c03acd84 ECX: 00000000 EDX: 0000004c
ESI: 100a83fb EDI: 00000001 EBP: e9651738 ESP: e9651724
 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process lguest (pid: 5428, ti=e9650000 task=efbc92c0 task.ti=e964c000)
Stack: 00000001 100a83fb c03acd84 02df69af 00000001 e9651740 c01f1f2d
e9651758 
       c01ff481 593ac9e8 c03acd84 00000092 000078cf e9651774 c02d4968
00000000 
       00000002 c01141c4 c03acd84 00000001 e965178c c01141c4 00000001
00007892 
Call Trace:
 [<c01f1f2d>] ? __delay+0x9/0xb
 [<c01ff481>] ? _raw_spin_lock+0x83/0xcb
 [<c02d4968>] ? _spin_lock_irqsave+0x46/0x4f
 [<c01141c4>] ? __wake_up+0x15/0x3b
 [<c01141c4>] ? __wake_up+0x15/0x3b
 [<c0119cf0>] ? wake_up_klogd+0x2e/0x31
 [<c0119ea0>] ? release_console_sem+0x1ad/0x1b5
 [<c011a3d1>] ? vprintk+0x383/0x3c2
 [<c0134276>] ? debug_check_no_locks_freed+0x111/0x13c
 [<c0134276>] ? debug_check_no_locks_freed+0x111/0x13c
 [<c011a425>] ? printk+0x15/0x17
 [<c01ff286>] ? spin_bug+0x3f/0x80
 [<c01ff432>] ? _raw_spin_lock+0x34/0xcb
 [<c02d474e>] ? _spin_lock+0x2a/0x32
 [<c011471b>] ? task_rq_lock+0x2a/0x31
 [<c011471b>] ? task_rq_lock+0x2a/0x31
 [<c0114dc8>] ? try_to_wake_up+0x15/0x81
 [<c0114e3f>] ? default_wake_function+0xb/0xd
 [<c012985f>] ? autoremove_wake_function+0xe/0x30
 [<c0113626>] ? __wake_up_common+0x35/0x5b
 [<c01141d7>] ? __wake_up+0x28/0x3b
 [<c0119cf0>] ? wake_up_klogd+0x2e/0x31
 [<c0119ea0>] ? release_console_sem+0x1ad/0x1b5
 [<c011a3d1>] ? vprintk+0x383/0x3c2
 [<c011a425>] ? printk+0x15/0x17
 [<c0115232>] ? __might_sleep+0x8c/0x108
 [<c0160c37>] ? kmem_cache_alloc+0x22/0xd6
 [<c0108441>] ? init_fpu+0xb0/0x14d
 [<c01047fd>] ? math_state_restore+0x2b/0x67
 [<c010463a>] ? device_not_available+0x4e/0x53
 [<c02d2b8a>] ? schedule+0x1fe/0x2a1
 [<c011007b>] ? handle_vm86_fault+0x31f/0x6b8
 [<c01029c1>] ? __switch_to+0x23/0x113
 [<c02d2bc6>] ? schedule+0x23a/0x2a1
 [<c0134130>] ? trace_hardirqs_on+0xe6/0x11b
 [<c02d4c3f>] ? _spin_unlock_irqrestore+0x56/0x6c
 [<c02d2db8>] ? schedule_timeout+0x16/0x89
 [<c016ee08>] ? __pollwait+0xaa/0xb0
 [<c0168604>] ? pipe_poll+0x29/0x89
 [<c016e844>] ? do_select+0x4a6/0x4f8
 [<c016ed5e>] ? __pollwait+0x0/0xb0
 [<c0114e34>] ? default_wake_function+0x0/0xd
 [<c0114e34>] ? default_wake_function+0x0/0xd
 [<c0114e34>] ? default_wake_function+0x0/0xd
 [<c0114e34>] ? default_wake_function+0x0/0xd
 [<c0114e34>] ? default_wake_function+0x0/0xd
 [<c0113dfa>] ? hrtick_start_fair+0x117/0x11f
 [<c01607a8>] ? kfree+0xad/0xc0
 [<c0160833>] ? kmem_cache_free+0x78/0x8a
 [<c0134130>] ? trace_hardirqs_on+0xe6/0x11b
 [<c0133fac>] ? mark_held_locks+0x4e/0x66
 [<c01349ae>] ? __lock_acquire+0x488/0xb4c
 [<c016eaad>] ? core_sys_select+0x217/0x2f2
 [<c02d4d42>] ? _read_unlock_irq+0x36/0x4b
 [<c0147fab>] ? unlock_page+0x24/0x27
 [<c0152c7f>] ? __do_fault+0x31e/0x356
 [<c0153fb3>] ? handle_mm_fault+0x2a0/0x637
 [<c016ee9d>] ? sys_select+0x8f/0x143
 [<c02d67d3>] ? do_page_fault+0x352/0x631
 [<c0103b59>] ? restore_nocheck+0x12/0x15
 [<c02d6481>] ? do_page_fault+0x0/0x631
 [<c0134130>] ? trace_hardirqs_on+0xe6/0x11b
 [<c0103a4f>] ? sysenter_past_esp+0x78/0xd1
 =======================
Code: 57 56 53 83 ec 08 89 45 ec b8 01 00 00 00 e8 b8 4b 0e 00 0f 31 0f
1f 40 00 b9 00 00 00 00 89 c6 89 c8 09 f0 89 45 f0 f3 90 0f 31 <0f> 1f
40 00 b9 00 00 00 00 89 c6 89 c8 09 f0 2b 45 f0 3b 45 ec 


Let us take a closer look at __switch_to. I'm using test data from yet
another trace (that I actually captured before all the other traces
presented here).

BUG: unable to handle kernel NULL pointer dereference at 000001ff
IP: [<c0102964>] __switch_to+0x19/0xff

That is the only part of the trace I've got; there wasn't produced more.
Anyway, for that particular kernel I used, __switch_to disassembled to

0xc0102955 <__switch_to+10>: mov    0x4(%eax),%eax
0xc0102958 <__switch_to+13>: testb  $0x1,0xc(%eax)
0xc010295c <__switch_to+17>: je     0xc0102995 <__switch_to+74>
0xc010295e <__switch_to+19>: mov    0x26c(%edi),%eax
0xc0102964 <__switch_to+25>: fnsave (%eax)

with corresponding source code

struct task_struct * __switch_to(struct task_struct *prev_p, struct
task_struct *next_p)
{
[...]
        __unlazy_fpu(prev_p);
[...]
}

static inline void __unlazy_fpu(struct task_struct *tsk)
{
        if (task_thread_info(tsk)->status & TS_USEDFPU) {
                __save_init_fpu(tsk);
                stts();
        } else
                tsk->fpu_counter = 0;
}

where __save_init_fpu(tsk) accesses
	tsk->thread.xstate->fxsave

The first mov, testb and je of the disassembly are the if statement, and
the second mov and fnsave are part of __save_init_fpu that access
tsk->thread.xstate->fxsave. %eax (initially) and %edi hold prev_p from
__switch_to.

Hope that was readable and not too confusing. Now comes some partly
guesswork from my side that could be wrong. As far as I can tell
%eax (initially) and %edi / prev_p must hold 0xffffff93. Why the first
mov is valid then I'm not sure, but the if statement must be true by
chance.  It is not by pure chance though, since prev_p always has the
same value (i.e. it is consistently "dereference at 000001ff" I get
across multiple runs).

So I'd say some sort of corruption caused by very fast context switch
between the launcher and switcher (or-whatever-their-names-are)
processes in lguest.

That is as far as I have been able to debug this, suggestions are
welcome. I guess I should note that I haven't tried with preemption
disabled, but I don't think there's much point in it.


Simon Holm Thøgersen


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-01 17:11 ` Simon Holm Thøgersen
@ 2008-06-02 21:31   ` Suresh Siddha
  2008-06-03 13:23     ` Simon Holm Thøgersen
  0 siblings, 1 reply; 20+ messages in thread
From: Suresh Siddha @ 2008-06-02 21:31 UTC (permalink / raw)
  To: Simon Holm Thøgersen
  Cc: j.mell, Steven Rostedt, linux-kernel, ak, mingo, hpa, tglx, arjan

On Sun, Jun 01, 2008 at 07:11:02PM +0200, Simon Holm Thøgersen wrote:
> søn, 01 06 2008 kl. 11:01 +0200, skrev j.mell@t-online.de:
> [...]
> > 
> > 3. If I revert the patch
> >  
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acc207616a91a413a50fdd8847a747c4a7324167
> > 
> > in 2.6.20, Einstein does not crash anymore (program was run for more than 
> > 30 hours while system was in normal use with programming, multi-media 
> > etc.). Unfortunately git refuses to revert this patch in 2.6.26-rc4.
> [...]
> 
> I don't think the bisected commit is responsible for anything, but
> triggering a bug elsewhere with your workload. I've been chasing the
> same problem I think, but with other symptoms.

Simon, There seems to be multiple issues here. fpu corruption seems
to be a different problem compared to the issue you have encountered.

> 
> I'm triggering the following by running an lguest guest, but I guess the
> workload just need to have the right scheduler intensity to trigger the
> bug.
> 
> BUG: sleeping function called from invalid context at mm/slab.c:3052
> in_atomic():1, irqs_disabled():0
> Pid: 4771, comm: lguest Not tainted
> 2.6.26-rc4-debug-only-preemptible-00103-g1beee8d #3
>  [<c01146ee>] __might_sleep+0xe4/0xeb
>  [<c01605d9>] kmem_cache_alloc+0x22/0xb4
>  [<c0108479>] init_fpu+0xb0/0x14d
>  [<c0104768>] math_state_restore+0x26/0x5d
>  [<c01045ab>] device_not_available+0x43/0x48
>  [<c011007b>] ? handle_vm86_fault+0x213/0x6b8
>  [<c01029ad>] ? __switch_to+0x23/0x113
>  [<c02d6c9f>] schedule+0x221/0x2a4

Simon, Can you please try the appended patch and see if it fixes this
issue? Thanks.
---

[patch] x86: fix blocking call (math_state_restore()) condition in __switch_to

Add tsk_used_math() checks to prevent calling math_state_restore()
which can sleep in the case of !tsk_used_math(). This prevents
making a blocking call in __switch_to().

Apparently "fpu_counter > 5" check is not enough, as in some signal handling
and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index f8476df..6d54833 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -649,8 +649,11 @@ struct task_struct * __switch_to(struct task_struct *prev_p, struct task_struct
 	/* If the task has used fpu the last 5 timeslices, just do a full
 	 * restore of the math state immediately to avoid the trap; the
 	 * chances of needing FPU soon are obviously high now
+	 *
+	 * tsk_used_math() checks prevent calling math_state_restore(),
+	 * which can sleep in the case of !tsk_used_math()
 	 */
-	if (next_p->fpu_counter > 5)
+	if (tsk_used_math(next_p) && next_p->fpu_counter > 5)
 		math_state_restore();
 
 	/*
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e2319f3..ac54ff5 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -658,8 +658,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/* If the task has used fpu the last 5 timeslices, just do a full
 	 * restore of the math state immediately to avoid the trap; the
 	 * chances of needing FPU soon are obviously high now
+	 *
+	 * tsk_used_math() checks prevent calling math_state_restore(),
+	 * which can sleep in the case of !tsk_used_math()
 	 */
-	if (next_p->fpu_counter>5)
+	if (tsk_used_math(next_p) && next_p->fpu_counter > 5)
 		math_state_restore();
 	return prev_p;
 }

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-01 16:47   ` Jürgen Mell
@ 2008-06-02 21:37     ` Suresh Siddha
  2008-06-02 22:57       ` Suresh Siddha
  0 siblings, 1 reply; 20+ messages in thread
From: Suresh Siddha @ 2008-06-02 21:37 UTC (permalink / raw)
  To: Jürgen Mell
  Cc: Andi Kleen, Steven Rostedt, linux-kernel, suresh.b.siddha, arjan,
	mingo, hpa, tglx

On Sun, Jun 01, 2008 at 06:47:29PM +0200, Jürgen Mell wrote:
> On Sonntag, 1. Juni 2008, Andi Kleen wrote:
> > j.mell@t-online.de writes:
> > > or it is restored more than
> > > once. Please keep in mind, that I am always running two Einstein
> > > processes simultaneously on my two cores!
> > > I am willing to do further testing of this problem if someone can give
> > > me a hint how to continue.
> >
> > My bet would have been actually on
> > aa283f49276e7d840a40fb01eee6de97eaa7e012 because it does some nasty
> > things (enable interrupts in the middle of __switch_to).
> >
> > I looked through the old patchkit and couldn't find any specific
> > PREEMPT problems. All code it changes should run with preempt_off
> >
> > You could verify with sticking WARN_ON_ONCE(preemptible()) into
> > all the places acc207616a91a413a50fdd8847a747c4a7324167
> > changes (__unlazy_fpu, math_state_restore) and see if that triggers
> > anywhere.
> 
> No, that did not trigger. I put the WARN_ON_ONCE into process.c, traps.c 
> and also into the __unlazy_fpu macro in i387.h but I got no messages 
> anywhere (dmesg, /var/log/messages, /var/log/warn) when the trap #8 
> occurred. 
> Meanwhile I am also running the tests on another machine to make sure it is 
> not a hardware-related problem.
> 
> Any new ideas are welcome!
> 
> Meanwhile I will go back to 2.6.20 and revert
> aa283f49276e7d840a40fb01eee6de97eaa7e012. Maybe I got on a wrong track...

2.6.20 doesn't have the commit 'aa283f49276e7d840a40fb01eee6de97eaa7e012'

As you are seeing this corruption problem starting from 2.6.20,
atleast recent(in 2.6.26 series) fpu changes don't play a role in this.

I will try to reproduce your issue.

thanks,
suresh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-02 21:37     ` Suresh Siddha
@ 2008-06-02 22:57       ` Suresh Siddha
  2008-06-03  6:02         ` Jürgen Mell
  2008-06-04  7:44         ` Jürgen Mell
  0 siblings, 2 replies; 20+ messages in thread
From: Suresh Siddha @ 2008-06-02 22:57 UTC (permalink / raw)
  To: j.mell
  Cc: Jürgen Mell, Andi Kleen, Steven Rostedt, linux-kernel, arjan,
	mingo, hpa, tglx

On Mon, Jun 02, 2008 at 02:37:56PM -0700, Suresh Siddha wrote:
> On Sun, Jun 01, 2008 at 06:47:29PM +0200, Jürgen Mell wrote:
> > On Sonntag, 1. Juni 2008, Andi Kleen wrote:
> > > j.mell@t-online.de writes:
> > > > or it is restored more than
> > > > once. Please keep in mind, that I am always running two Einstein
> > > > processes simultaneously on my two cores!
> > > > I am willing to do further testing of this problem if someone can give
> > > > me a hint how to continue.
> > >
> > > My bet would have been actually on
> > > aa283f49276e7d840a40fb01eee6de97eaa7e012 because it does some nasty
> > > things (enable interrupts in the middle of __switch_to).
> > >
> > > I looked through the old patchkit and couldn't find any specific
> > > PREEMPT problems. All code it changes should run with preempt_off
> > >
> > > You could verify with sticking WARN_ON_ONCE(preemptible()) into
> > > all the places acc207616a91a413a50fdd8847a747c4a7324167
> > > changes (__unlazy_fpu, math_state_restore) and see if that triggers
> > > anywhere.
> > 
> > No, that did not trigger. I put the WARN_ON_ONCE into process.c, traps.c 
> > and also into the __unlazy_fpu macro in i387.h but I got no messages 
> > anywhere (dmesg, /var/log/messages, /var/log/warn) when the trap #8 
> > occurred. 
> > Meanwhile I am also running the tests on another machine to make sure it is 
> > not a hardware-related problem.
> > 
> > Any new ideas are welcome!
> > 
> > Meanwhile I will go back to 2.6.20 and revert
> > aa283f49276e7d840a40fb01eee6de97eaa7e012. Maybe I got on a wrong track...
> 
> 2.6.20 doesn't have the commit 'aa283f49276e7d840a40fb01eee6de97eaa7e012'
> 
> As you are seeing this corruption problem starting from 2.6.20,
> atleast recent(in 2.6.26 series) fpu changes don't play a role in this.
> 
> I will try to reproduce your issue.

Jürgen, I think I found the reason for your issue aswell.

As you observed, it is probably coming from the commit
acc207616a91a413a50fdd8847a747c4a7324167, i386: add sleazy FPU optimization

It's a side affect though. This is the failing scenario:

process 'A' in save_i387_ia32() just after clear_used_math()

Got an interrupt and pre-empted out.

At the next context switch to process 'A' again, kernel tries to restore
the math state proactively and sees a fpu_counter > 0 and !tsk_used_math()

This results in init_fpu() during the __switch_to()'s math_state_restore()

And resulting in fpu corruption which will be saved/restored 
(save_i387_fxsave and restore_i387_fxsave) during the remaining
part of the signal handling after the context switch.

So in short, yes the problem shows up for preempt enabled kernels and the
same patch I sent out 30 mins back (appended again) should fix your issue
aswell. Can you please test this and check if my theory is indeed correct.
If it fixes your issue aswell, then I will re-post the patch with
a new changelog and updated comments in the patch.

thanks,
suresh
---

[patch] x86: fix blocking call (math_state_restore()) condition in __switch_to

Add tsk_used_math() checks to prevent calling math_state_restore()
which can sleep in the case of !tsk_used_math(). This prevents
making a blocking call in __switch_to().

Apparently "fpu_counter > 5" check is not enough, as in some signal handling
and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index f8476df..6d54833 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -649,8 +649,11 @@ struct task_struct * __switch_to(struct task_struct *prev_p, struct task_struct
 	/* If the task has used fpu the last 5 timeslices, just do a full
 	 * restore of the math state immediately to avoid the trap; the
 	 * chances of needing FPU soon are obviously high now
+	 *
+	 * tsk_used_math() checks prevent calling math_state_restore(),
+	 * which can sleep in the case of !tsk_used_math()
 	 */
-	if (next_p->fpu_counter > 5)
+	if (tsk_used_math(next_p) && next_p->fpu_counter > 5)
 		math_state_restore();
 
 	/*
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e2319f3..ac54ff5 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -658,8 +658,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/* If the task has used fpu the last 5 timeslices, just do a full
 	 * restore of the math state immediately to avoid the trap; the
 	 * chances of needing FPU soon are obviously high now
+	 *
+	 * tsk_used_math() checks prevent calling math_state_restore(),
+	 * which can sleep in the case of !tsk_used_math()
 	 */
-	if (next_p->fpu_counter>5)
+	if (tsk_used_math(next_p) && next_p->fpu_counter > 5)
 		math_state_restore();
 	return prev_p;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-02 22:57       ` Suresh Siddha
@ 2008-06-03  6:02         ` Jürgen Mell
  2008-06-04  7:44         ` Jürgen Mell
  1 sibling, 0 replies; 20+ messages in thread
From: Jürgen Mell @ 2008-06-03  6:02 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Andi Kleen, Steven Rostedt, linux-kernel, arjan, mingo, hpa, tglx

On Dienstag, 3. Juni 2008, Suresh Siddha wrote:
> On Mon, Jun 02, 2008 at 02:37:56PM -0700, Suresh Siddha wrote:
> > On Sun, Jun 01, 2008 at 06:47:29PM +0200, Jürgen Mell wrote:
> > > On Sonntag, 1. Juni 2008, Andi Kleen wrote:
> > > > j.mell@t-online.de writes:
> > > > > or it is restored more than
> > > > > once. Please keep in mind, that I am always running two Einstein
> > > > > processes simultaneously on my two cores!
> > > > > I am willing to do further testing of this problem if someone
> > > > > can give me a hint how to continue.
> > > >
> > > > My bet would have been actually on
> > > > aa283f49276e7d840a40fb01eee6de97eaa7e012 because it does some
> > > > nasty things (enable interrupts in the middle of __switch_to).
> > > >
> > > > I looked through the old patchkit and couldn't find any specific
> > > > PREEMPT problems. All code it changes should run with preempt_off
> > > >
> > > > You could verify with sticking WARN_ON_ONCE(preemptible()) into
> > > > all the places acc207616a91a413a50fdd8847a747c4a7324167
> > > > changes (__unlazy_fpu, math_state_restore) and see if that
> > > > triggers anywhere.
> > >
> > > No, that did not trigger. I put the WARN_ON_ONCE into process.c,
> > > traps.c and also into the __unlazy_fpu macro in i387.h but I got no
> > > messages anywhere (dmesg, /var/log/messages, /var/log/warn) when the
> > > trap #8 occurred.
> > > Meanwhile I am also running the tests on another machine to make
> > > sure it is not a hardware-related problem.
> > >
> > > Any new ideas are welcome!
> > >
> > > Meanwhile I will go back to 2.6.20 and revert
> > > aa283f49276e7d840a40fb01eee6de97eaa7e012. Maybe I got on a wrong
> > > track...
> >
> > 2.6.20 doesn't have the commit
> > 'aa283f49276e7d840a40fb01eee6de97eaa7e012'
> >
> > As you are seeing this corruption problem starting from 2.6.20,
> > atleast recent(in 2.6.26 series) fpu changes don't play a role in
> > this.
> >
> > I will try to reproduce your issue.
>
> Jürgen, I think I found the reason for your issue aswell.
>
> As you observed, it is probably coming from the commit
> acc207616a91a413a50fdd8847a747c4a7324167, i386: add sleazy FPU
> optimization
>
> It's a side affect though. This is the failing scenario:
>
> process 'A' in save_i387_ia32() just after clear_used_math()
>
> Got an interrupt and pre-empted out.
>
> At the next context switch to process 'A' again, kernel tries to restore
> the math state proactively and sees a fpu_counter > 0 and
> !tsk_used_math()
>
> This results in init_fpu() during the __switch_to()'s
> math_state_restore()
>
> And resulting in fpu corruption which will be saved/restored
> (save_i387_fxsave and restore_i387_fxsave) during the remaining
> part of the signal handling after the context switch.
>
> So in short, yes the problem shows up for preempt enabled kernels and
> the same patch I sent out 30 mins back (appended again) should fix your
> issue aswell. Can you please test this and check if my theory is indeed
> correct. If it fixes your issue aswell, then I will re-post the patch
> with a new changelog and updated comments in the patch.
>
> thanks,
> suresh

Many thanks for the patch!
I will test this immediately but as it takes some time to make sure that 
the problem is really gone it will take some time until I have a report.

Thanks,
               Jürgen

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-02 21:31   ` Suresh Siddha
@ 2008-06-03 13:23     ` Simon Holm Thøgersen
  2008-06-03 19:43       ` Suresh Siddha
  0 siblings, 1 reply; 20+ messages in thread
From: Simon Holm Thøgersen @ 2008-06-03 13:23 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: j.mell, Steven Rostedt, linux-kernel, ak, mingo, hpa, tglx, arjan,
	lguest

[CC lguest <lguest@ozlabs.org>]
man, 02 06 2008 kl. 14:31 -0700, skrev Suresh Siddha: 
> On Sun, Jun 01, 2008 at 07:11:02PM +0200, Simon Holm Thøgersen wrote:
> > søn, 01 06 2008 kl. 11:01 +0200, skrev j.mell@t-online.de:
> > [...]
> > > 
> > > 3. If I revert the patch
> > >  
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=acc207616a91a413a50fdd8847a747c4a7324167
> > > 
> > > in 2.6.20, Einstein does not crash anymore (program was run for more than 
> > > 30 hours while system was in normal use with programming, multi-media 
> > > etc.). Unfortunately git refuses to revert this patch in 2.6.26-rc4.
> > [...]
> > 
> > I don't think the bisected commit is responsible for anything, but
> > triggering a bug elsewhere with your workload. I've been chasing the
> > same problem I think, but with other symptoms.
> 
> Simon, There seems to be multiple issues here. fpu corruption seems
> to be a different problem compared to the issue you have encountered.
> 
> > 
> > I'm triggering the following by running an lguest guest, but I guess the
> > workload just need to have the right scheduler intensity to trigger the
> > bug.
> > 
> > BUG: sleeping function called from invalid context at mm/slab.c:3052
> > in_atomic():1, irqs_disabled():0
> > Pid: 4771, comm: lguest Not tainted
> > 2.6.26-rc4-debug-only-preemptible-00103-g1beee8d #3
> >  [<c01146ee>] __might_sleep+0xe4/0xeb
> >  [<c01605d9>] kmem_cache_alloc+0x22/0xb4
> >  [<c0108479>] init_fpu+0xb0/0x14d
> >  [<c0104768>] math_state_restore+0x26/0x5d
> >  [<c01045ab>] device_not_available+0x43/0x48
> >  [<c011007b>] ? handle_vm86_fault+0x213/0x6b8
> >  [<c01029ad>] ? __switch_to+0x23/0x113
> >  [<c02d6c9f>] schedule+0x221/0x2a4
> 
> Simon, Can you please try the appended patch and see if it fixes this
> issue? Thanks.
> ---
> 
> [patch] x86: fix blocking call (math_state_restore()) condition in __switch_to
> 
> Add tsk_used_math() checks to prevent calling math_state_restore()
> which can sleep in the case of !tsk_used_math(). This prevents
> making a blocking call in __switch_to().
> 
> Apparently "fpu_counter > 5" check is not enough, as in some signal handling
> and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.
> 
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> ---
Hi Suresh,

and thanks for looking into this. The patch did not fix the issue, but
I'm wondering if it is lguest calling math_state_restore in
drivers/lguest/x86/core.c that could be the problem?

Regardless of whether that is the issue, I think you (and everybody
else) will be able to reproduce the issue by running lguest on a 32-bit
system with CONFIG_PREEMPT=y and CONFIG_DEBUG_SPINLOCKS_SLEEP=y (I'm
also using CONFIG_DEBUG_PREEMPT=y but I don't think that matter). If you
download http://xm-test.xensource.com/ramdisks/initrd-1.1-i386.img and
run

Documentation/lguest/lguest 64 vmlinux --block=initrd-1.1-i386.img

it will very likely trigger the backtraces I'm getting. Has anyone on
the lguest list tried running with CONFIG_PREEMPT?


Simon


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-03 13:23     ` Simon Holm Thøgersen
@ 2008-06-03 19:43       ` Suresh Siddha
  2008-06-03 21:08         ` Simon Holm Thøgersen
  0 siblings, 1 reply; 20+ messages in thread
From: Suresh Siddha @ 2008-06-03 19:43 UTC (permalink / raw)
  To: Simon Holm Thøgersen
  Cc: Suresh Siddha, j.mell, Steven Rostedt, linux-kernel, mingo, hpa,
	tglx, arjan, lguest, andi

On Tue, Jun 03, 2008 at 03:23:30PM +0200, Simon Holm Thøgersen wrote:
> > [patch] x86: fix blocking call (math_state_restore()) condition in __switch_to
> > 
> > Add tsk_used_math() checks to prevent calling math_state_restore()
> > which can sleep in the case of !tsk_used_math(). This prevents
> > making a blocking call in __switch_to().
> > 
> > Apparently "fpu_counter > 5" check is not enough, as in some signal handling
> > and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.
> > 
> > Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> > ---
> Hi Suresh,
> 
> and thanks for looking into this. The patch did not fix the issue, but

Ok. You are probably running into different issue (please see below).
Above patch fixes a real issue and I think it should fix the fpu
corruption issue encountered by Jürgen. I will wait for Jürgen's test
results before pushing the above patch.

> I'm wondering if it is lguest calling math_state_restore in
> drivers/lguest/x86/core.c that could be the problem?

I def see a problem. In lguest_arch_run_guest(), MSR_IA32_SYSENTER_CS is not
restored before making the math_state_restore() call. As the
math_state_restore() can now block, this can cause issues. Appending
patch should fix this issue and from your oops report, it is not very
clear if the below patch should help fix your issue or not. Can you
please try the below appended patch.

> 
> Regardless of whether that is the issue, I think you (and everybody
> else) will be able to reproduce the issue by running lguest on a 32-bit
> system with CONFIG_PREEMPT=y and CONFIG_DEBUG_SPINLOCKS_SLEEP=y (I'm
> also using CONFIG_DEBUG_PREEMPT=y but I don't think that matter). If you
> download http://xm-test.xensource.com/ramdisks/initrd-1.1-i386.img and
> run
> 
> Documentation/lguest/lguest 64 vmlinux --block=initrd-1.1-i386.img
> 
> it will very likely trigger the backtraces I'm getting.

If the below patch doesn't help fix your issue, then I will try to reproduce
it locally here.

thanks,
suresh
---

[patch] x86, lguest: Restore MSR_IA32_SYSENTER_CS before math_state_restore()

Restore MSR_IA32_SYSENTER_CS before making the blocking math_state_restore()
in lguest_arch_run_guest()

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 5126d5d..9279ce7 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -191,6 +191,10 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
 	 * was doing. */
 	run_guest_once(cpu, lguest_pages(raw_smp_processor_id()));
 
+	/* Restore SYSENTER if it's supposed to be on. */
+	if (boot_cpu_has(X86_FEATURE_SEP))
+		wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
+
 	/* Note that the "regs" structure contains two extra entries which are
 	 * not really registers: a trap number which says what interrupt or
 	 * trap made the switcher code come back, and an error code which some
@@ -203,13 +207,10 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
 	if (cpu->regs->trapnum == 14)
 		cpu->arch.last_pagefault = read_cr2();
 	/* Similarly, if we took a trap because the Guest used the FPU,
-	 * we have to restore the FPU it expects to see. */
+	 * we have to restore the FPU it expects to see. math_state_restore() can
+	 * re-enable interrupts and block. */
 	else if (cpu->regs->trapnum == 7)
 		math_state_restore();
-
-	/* Restore SYSENTER if it's supposed to be on. */
-	if (boot_cpu_has(X86_FEATURE_SEP))
-		wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
 }
 
 /*H:130 Now we've examined the hypercall code; our Guest can make requests.

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-03 19:43       ` Suresh Siddha
@ 2008-06-03 21:08         ` Simon Holm Thøgersen
  0 siblings, 0 replies; 20+ messages in thread
From: Simon Holm Thøgersen @ 2008-06-03 21:08 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: j.mell, Steven Rostedt, linux-kernel, mingo, hpa, tglx, arjan,
	lguest, andi

tir, 03 06 2008 kl. 12:43 -0700, skrev Suresh Siddha:
> On Tue, Jun 03, 2008 at 03:23:30PM +0200, Simon Holm Thøgersen wrote:
> > > [patch] x86: fix blocking call (math_state_restore()) condition in __switch_to
> > > 
> > > Add tsk_used_math() checks to prevent calling math_state_restore()
> > > which can sleep in the case of !tsk_used_math(). This prevents
> > > making a blocking call in __switch_to().
> > > 
> > > Apparently "fpu_counter > 5" check is not enough, as in some signal handling
> > > and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.
> > > 
> > > Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> > > ---
> > Hi Suresh,
> > 
> > and thanks for looking into this. The patch did not fix the issue, but
> 
> Ok. You are probably running into different issue (please see below).
> Above patch fixes a real issue and I think it should fix the fpu
> corruption issue encountered by Jürgen. I will wait for Jürgen's test
> results before pushing the above patch.
> 
> > I'm wondering if it is lguest calling math_state_restore in
> > drivers/lguest/x86/core.c that could be the problem?
> 
> I def see a problem. In lguest_arch_run_guest(), MSR_IA32_SYSENTER_CS is not
> restored before making the math_state_restore() call. As the
> math_state_restore() can now block, this can cause issues. Appending
> patch should fix this issue and from your oops report, it is not very
> clear if the below patch should help fix your issue or not. Can you
> please try the below appended patch.
> 
> > 
> > Regardless of whether that is the issue, I think you (and everybody
> > else) will be able to reproduce the issue by running lguest on a 32-bit
> > system with CONFIG_PREEMPT=y and CONFIG_DEBUG_SPINLOCKS_SLEEP=y (I'm
> > also using CONFIG_DEBUG_PREEMPT=y but I don't think that matter). If you
> > download http://xm-test.xensource.com/ramdisks/initrd-1.1-i386.img and
> > run
> > 
> > Documentation/lguest/lguest 64 vmlinux --block=initrd-1.1-i386.img
> > 
> > it will very likely trigger the backtraces I'm getting.
> 
> If the below patch doesn't help fix your issue, then I will try to reproduce
> it locally here.
> 
It didn't, I'm afraid. I had both patches applied, and was able to
reproduce the trace fairly easily. The patches might have made the issue
slightly more difficult to provoke, but I'm not sure.


Simon


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-02 22:57       ` Suresh Siddha
  2008-06-03  6:02         ` Jürgen Mell
@ 2008-06-04  7:44         ` Jürgen Mell
  2008-06-04 10:53           ` Ingo Molnar
  1 sibling, 1 reply; 20+ messages in thread
From: Jürgen Mell @ 2008-06-04  7:44 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Andi Kleen, Steven Rostedt, linux-kernel, arjan, mingo, hpa, tglx,
	Simon Holm Thøgersen

On Tuesday, 3rd June 2008, Suresh Siddha wrote:
> On Mon, Jun 02, 2008 at 02:37:56PM -0700, Suresh Siddha wrote:
> > On Sun, Jun 01, 2008 at 06:47:29PM +0200, Jürgen Mell wrote:
> > > On Sonntag, 1. Juni 2008, Andi Kleen wrote:
> > > > j.mell@t-online.de writes:
> > > > > or it is restored more than
> > > > > once. Please keep in mind, that I am always running two Einstein
> > > > > processes simultaneously on my two cores!
> > > > > I am willing to do further testing of this problem if someone
> > > > > can give me a hint how to continue.
> > > >
> > > > My bet would have been actually on
> > > > aa283f49276e7d840a40fb01eee6de97eaa7e012 because it does some
> > > > nasty things (enable interrupts in the middle of __switch_to).
> > > >
> > > > I looked through the old patchkit and couldn't find any specific
> > > > PREEMPT problems. All code it changes should run with preempt_off
> > > >
> > > > You could verify with sticking WARN_ON_ONCE(preemptible()) into
> > > > all the places acc207616a91a413a50fdd8847a747c4a7324167
> > > > changes (__unlazy_fpu, math_state_restore) and see if that
> > > > triggers anywhere.
> > >
> > > No, that did not trigger. I put the WARN_ON_ONCE into process.c,
> > > traps.c and also into the __unlazy_fpu macro in i387.h but I got no
> > > messages anywhere (dmesg, /var/log/messages, /var/log/warn) when the
> > > trap #8 occurred.
> > > Meanwhile I am also running the tests on another machine to make
> > > sure it is not a hardware-related problem.
> > >
> > > Any new ideas are welcome!
> > >
> > > Meanwhile I will go back to 2.6.20 and revert
> > > aa283f49276e7d840a40fb01eee6de97eaa7e012. Maybe I got on a wrong
> > > track...
> >
> > 2.6.20 doesn't have the commit
> > 'aa283f49276e7d840a40fb01eee6de97eaa7e012'
> >
> > As you are seeing this corruption problem starting from 2.6.20,
> > atleast recent(in 2.6.26 series) fpu changes don't play a role in
> > this.
> >
> > I will try to reproduce your issue.
>
> Jürgen, I think I found the reason for your issue aswell.
>
> As you observed, it is probably coming from the commit
> acc207616a91a413a50fdd8847a747c4a7324167, i386: add sleazy FPU
> optimization
>
> It's a side affect though. This is the failing scenario:
>
> process 'A' in save_i387_ia32() just after clear_used_math()
>
> Got an interrupt and pre-empted out.
>
> At the next context switch to process 'A' again, kernel tries to restore
> the math state proactively and sees a fpu_counter > 0 and
> !tsk_used_math()
>
> This results in init_fpu() during the __switch_to()'s
> math_state_restore()
>
> And resulting in fpu corruption which will be saved/restored
> (save_i387_fxsave and restore_i387_fxsave) during the remaining
> part of the signal handling after the context switch.
>
> So in short, yes the problem shows up for preempt enabled kernels and
> the same patch I sent out 30 mins back (appended again) should fix your
> issue aswell. Can you please test this and check if my theory is indeed
> correct. If it fixes your issue aswell, then I will re-post the patch
> with a new changelog and updated comments in the patch.
>

I have applied your patch to both an openSUSE 2.6.22.17 kernel and a 
2.6.26-rc4 kernel.org kernel and run the test with Einstein@home on two 
different machines. One machine is running 24 hours now, the other 18 
hours. 

During this time there were no faults on both machines.

As it never before took more than 12 hours until the first appearance of 
the problem, I think your patch fixed it. Very good work!

I will continue running the test, but I believe we can call this fixed.

Thank you again!
                             Jürgen

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-04  7:44         ` Jürgen Mell
@ 2008-06-04 10:53           ` Ingo Molnar
  2008-06-04 12:55             ` Steven Rostedt
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2008-06-04 10:53 UTC (permalink / raw)
  To: Jürgen Mell
  Cc: Suresh Siddha, Andi Kleen, Steven Rostedt, linux-kernel, arjan,
	hpa, tglx, Simon Holm Thøgersen


* Jürgen Mell <j.mell@t-online.de> wrote:

> > Jürgen, I think I found the reason for your issue aswell.
> >
> > As you observed, it is probably coming from the commit
> > acc207616a91a413a50fdd8847a747c4a7324167, i386: add sleazy FPU
> > optimization
> >
> > It's a side affect though. This is the failing scenario:
> >
> > process 'A' in save_i387_ia32() just after clear_used_math()
> >
> > Got an interrupt and pre-empted out.
> >
> > At the next context switch to process 'A' again, kernel tries to restore
> > the math state proactively and sees a fpu_counter > 0 and
> > !tsk_used_math()
> >
> > This results in init_fpu() during the __switch_to()'s
> > math_state_restore()
> >
> > And resulting in fpu corruption which will be saved/restored
> > (save_i387_fxsave and restore_i387_fxsave) during the remaining
> > part of the signal handling after the context switch.
> >
> > So in short, yes the problem shows up for preempt enabled kernels and
> > the same patch I sent out 30 mins back (appended again) should fix your
> > issue aswell. Can you please test this and check if my theory is indeed
> > correct. If it fixes your issue aswell, then I will re-post the patch
> > with a new changelog and updated comments in the patch.
> >
> 
> I have applied your patch to both an openSUSE 2.6.22.17 kernel and a 
> 2.6.26-rc4 kernel.org kernel and run the test with Einstein@home on 
> two different machines. One machine is running 24 hours now, the other 
> 18 hours.
> 
> During this time there were no faults on both machines.
> 
> As it never before took more than 12 hours until the first appearance 
> of the problem, I think your patch fixed it. Very good work!
> 
> I will continue running the test, but I believe we can call this 
> fixed.
> 
> Thank you again!

fix applied to tip/x86/urgent. Thanks everyone, nice find!

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-04 10:53           ` Ingo Molnar
@ 2008-06-04 12:55             ` Steven Rostedt
  2008-06-04 13:02               ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2008-06-04 12:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jürgen Mell, Suresh Siddha, Andi Kleen, linux-kernel, arjan,
	hpa, tglx, Simon Holm Thøgersen


On Wed, 4 Jun 2008, Ingo Molnar wrote:
> * Jürgen Mell <j.mell@t-online.de> wrote:
> >
> > Thank you again!

No, thank YOU. What you gave was excellent feedback.

>
> fix applied to tip/x86/urgent. Thanks everyone, nice find!
>

Ingo, should this be forward to the stable branch?

-- Steve


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: CONFIG_PREEMPT causes corruption of application's FPU stack
  2008-06-04 12:55             ` Steven Rostedt
@ 2008-06-04 13:02               ` Ingo Molnar
  0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2008-06-04 13:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jürgen Mell, Suresh Siddha, Andi Kleen, linux-kernel, arjan,
	hpa, tglx, Simon Holm Thøgersen, stable


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > fix applied to tip/x86/urgent. Thanks everyone, nice find!
> 
> Ingo, should this be forward to the stable branch?

yes indeed. Cc:-ed.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2008-06-04 13:06 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-17 16:31 CONFIG_PREEMPT causes corruption of application's FPU stack Jürgen Mell
2008-05-18 15:07 ` Steven Rostedt
2008-05-18 15:57   ` Jürgen Mell
  -- strict thread matches above, loose matches on Subject: below --
2008-05-24 18:52 j.mell
2008-06-01  9:01 j.mell
2008-06-01 11:40 ` Andi Kleen
2008-06-01 16:47   ` Jürgen Mell
2008-06-02 21:37     ` Suresh Siddha
2008-06-02 22:57       ` Suresh Siddha
2008-06-03  6:02         ` Jürgen Mell
2008-06-04  7:44         ` Jürgen Mell
2008-06-04 10:53           ` Ingo Molnar
2008-06-04 12:55             ` Steven Rostedt
2008-06-04 13:02               ` Ingo Molnar
2008-06-01 12:12 ` Steven Rostedt
2008-06-01 17:11 ` Simon Holm Thøgersen
2008-06-02 21:31   ` Suresh Siddha
2008-06-03 13:23     ` Simon Holm Thøgersen
2008-06-03 19:43       ` Suresh Siddha
2008-06-03 21:08         ` Simon Holm Thøgersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox