public inbox for linux-kernel@vger.kernel.org
* Re: PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches
@ 2004-06-16 23:01 eliot
  2004-06-17 10:35 ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: eliot @ 2004-06-16 23:01 UTC
  To: ak; +Cc: linux-kernel, tgriggs

Hi Andi,

	you asked:
| On what CPUs does the failure occur? Linux uses different paths
| depending on if the CPU supports SSE or not.

Travis responded:

| We run on both AMDs (Durons and Athlons) as well as PII, PIII, and 
| PIV's. Our kernels are all compiled as generic 586+. Though when we were 
| testing for this, we did try the more generic 486+ option, as well as 
| exact processor matches for the AMD at least. I don't remember it making 
| a difference.

+-----------------------------
| Date:	Wed, 16 Jun 2004 23:40:18 +0200
| From:	Andi Kleen <ak@muc.de>
| Subject:	Re: PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches



| eliot@cincom.com writes:

| > 	I am the team lead and chief VM developer for a Smalltalk
| > 	implementation based on a JIT execution engine.  Our customers
| > 	have been seeing rare incorrect floating-point results in
| > 	intensive fp applications on 2.6 kernels using various x86
| > 	compatible processors.  These problems do not occur on
| > 	previous kernel versions.  We recently had occasion to
| > 	reimplement our fp primitives to avoid severe performance
| > 	problems on Xeon processors that were traced to Xeon's
| > 	relatively slow implementation of fnclex and fstsw.  The older

| Funny, Linux just added fnclex to a critical path on popular request.
| But I guess it will need to be removed again, we already discussed
| that. 


| > I don't know whether any action on your part is appropriate.  The
| > use of the FPU status flags is presumably rare on linux (I believe
| > that neither gcc nor glibc make use of them).  But "exotic"
| > execution machinery such as runtimes for dynamic or functional
| > languages (language implementations that may not use IEEE arithmetic
| > and instead flag Infs and NaNs as an error) may fall foul of this
| > issue.  Since previous versions of the kernel on x86 apparently do
| > preserve the FPU status flags perhaps it's simple to preserve the old
| > behaviour.  At the very least let me suggest you document the
| > limitation.

| This sounds like a serious kernel bug that should be fixed if
| true. Can you perhaps create a simple demo program that shows the
| problem and post it?

| On what CPUs does the failure occur? Linux uses different paths
| depending on if the CPU supports SSE or not.

| Does your program receive signals? Could it be related to them?

| -Andi
---
Eliot Miranda                 ,,,^..^,,,                mailto:eliot@cincom.com
VisualWorks Engineering, Cincom  Smalltalk: scene not herd  Tel +1 408 216 4581
3350 Scott Blvd, Bldg 36 Suite B, Santa Clara, CA 95054 USA Fax +1 408 216 4500



* Re: PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches
@ 2004-06-16 22:26 eliot
  2004-06-17 10:39 ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: eliot @ 2004-06-16 22:26 UTC
  To: ak; +Cc: linux-kernel, mingo, eliot

Hi Andi,


| Funny, Linux just added fnclex to a critical path on popular request.
| But I guess it will need to be removed again, we already discussed
| that. 

Yes, this is a right royal pain.  We have problems around fnclex.  Because people can call arbitrary code from within Smalltalk we have to do an fnclex prior to an fp operation if we're to trap NaN/Inf results.  But doing so prior to each fp operation is becoming increasingly slow on more "modern" x86 implementations.  So we've now moved the fnclex to the return from an external language call as this tends to have lower dynamic frequency.  So from where I sit (a mushroom-like position) the issue feels like a design flaw in the x87 fpu...
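
The clear-then-test pattern described above can be sketched roughly as follows. This is only an illustration, not the VisualWorks primitive: the function name x87_checked_divide, the HAVE_X87 macro and the X87_* constants are made up for the example, and it assumes GCC-style inline assembly on an x86 target with the default masked exceptions.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X87 1   /* hypothetical macro: x87 inline asm is available */

/* Exception flags in the low bits of the x87 status word. */
enum { X87_IE = 0x01, X87_ZE = 0x04, X87_FLAGS = 0x3F };

/* Divide num by den on the x87 unit and return the status word read
 * right after the divide. fnclex runs first so that stale flags left
 * behind by foreign code are not mistaken for flags raised here. */
static uint16_t x87_checked_divide(double num, double den, double *quotient)
{
    uint16_t sw;
    double q;

    __asm__ volatile("fnclex\n\t"      /* clear pending exception flags */
                     "fldl %2\n\t"     /* st(0) = num                   */
                     "fdivl %3\n\t"    /* st(0) = num / den             */
                     "fnstsw %0\n\t"   /* copy status word to ax        */
                     "fstpl %1"        /* store result, pop the stack   */
                     : "=&a"(sw), "=m"(q)
                     : "m"(num), "m"(den));
    *quotient = q;
    return sw;
}
#else
#define HAVE_X87 0   /* no x87 available: nothing to demonstrate */
#endif
```

A caller would treat the result as invalid when (sw & (X87_IE | X87_ZE)) is non-zero. Moving the fnclex out of this routine and into the return path of external calls, as described above, keeps the test but drops one instruction from every fp operation.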

| > I don't know whether any action on your part is appropriate.  The
| > use of the FPU status flags is presumably rare on linux (I believe
| > that neither gcc nor glibc make use of them).  But "exotic"
| > execution machinery such as runtimes for dynamic or functional
| > languages (language implementations that may not use IEEE arithmetic
| > and instead flag Infs and NaNs as an error) may fall foul of this
| > issue.  Since previous versions of the kernel on x86 apparently do
| > preserve the FPU status flags perhaps it's simple to preserve the old
| > behaviour.  At the very least let me suggest you document the
| > limitation.

| This sounds like a serious kernel bug that should be fixed if
| true. Can you perhaps create a simple demo program that shows the
| problem and post it?

OK, I'm working on it.  I have to get one of our customers to run the test because I don't have a 2.6 kernel handy.  As I'm in release crunch mode right now there may be a couple of weeks' delay.  But I should have a test program to you soon.

| On what CPUs does the failure occur? Linux uses different paths
| depending on if the CPU supports SSE or not.

This answer should be more prompt.  Say tomorrow.

| Does your program receive signals? Could it be related to them?

Could be.  Yes, we do have to handle signals.  But I'm pretty confident the issue is with the FPU flags because, as far as fp goes, the only significant change between the version that shows the problem and the one that doesn't is the use of the FPU flags (via fxam, fstsw).  The version that uses fxam & fstsw doesn't show the problem on kernels prior to 2.6.  In any case, if I'm right the test program should show it pretty clearly.
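
For readers who want to probe this themselves, one possible shape for such a test is sketched below. It is not the program promised in this message; count_lost_flags and HAVE_X87_DEMO are hypothetical names, the use of sched_yield() to invite a context switch is an assumption about what makes the window wide enough, and GCC-style inline assembly on x86 is assumed.

```c
#include <sched.h>    /* sched_yield (POSIX) */
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X87_DEMO 1   /* hypothetical macro: demo can be built */

/* Count iterations in which the zero-divide flag (ZE, bit 2 of the x87
 * status word) set before a yield is no longer set afterwards. */
static long count_lost_flags(long iterations)
{
    const uint16_t ze = 0x0004;
    long lost = 0;

    for (long i = 0; i < iterations; i++) {
        double one = 1.0, zero = 0.0, quotient;
        uint16_t sw;

        /* Clear old flags, then compute 1.0 / 0.0, which sets ZE. */
        __asm__ volatile("fnclex\n\t"
                         "fldl %1\n\t"
                         "fdivl %2\n\t"
                         "fstpl %0"
                         : "=m"(quotient)
                         : "m"(one), "m"(zero));
        (void)quotient;   /* only the flags matter here */

        /* Give the scheduler a chance to run something else. */
        sched_yield();

        /* Read the status word only now, after the possible switch. */
        __asm__ volatile("fnstsw %0" : "=a"(sw));

        if (!(sw & ze))
            lost++;
    }
    return lost;
}
#else
#define HAVE_X87_DEMO 0   /* no x87 available: nothing to demonstrate */
#endif
```

If the kernel saves and restores the status word correctly, count_lost_flags() should return 0 however many iterations are run. A non-zero count would point at lost flags, though signal delivery (Andi's other question) would have to be ruled out separately.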

As I say, give me a couple of weeks or so.

Cheers,
---
Eliot Miranda                 ,,,^..^,,,                mailto:eliot@cincom.com
VisualWorks Engineering, Cincom  Smalltalk: scene not herd  Tel +1 408 216 4581
3350 Scott Blvd, Bldg 36 Suite B, Santa Clara, CA 95054 USA Fax +1 408 216 4500



* PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches
@ 2004-06-16 20:32 eliot
  2004-06-16 21:03 ` Richard B. Johnson
  0 siblings, 1 reply; 8+ messages in thread
From: eliot @ 2004-06-16 20:32 UTC
  To: linux-kernel; +Cc: tgriggs, eliot

Hi,

	I am the team lead and chief VM developer for a Smalltalk implementation based on a JIT execution engine.  Our customers have been seeing rare incorrect floating-point results in intensive fp applications on 2.6 kernels using various x86 compatible processors.  These problems do not occur on previous kernel versions.  We recently had occasion to reimplement our fp primitives to avoid severe performance problems on Xeon processors that were traced to Xeon's relatively slow implementation of fnclex and fstsw.  The older implementation would produce a result and test for a valid (non-NaN, non-Inf) result by examining the FPU status flags via fstsw.  The newer implementation produces a result and tests its exponent for the NaN/Inf exponent.  The new implementation does not show the rare incorrect floating-point results in intensive fp applications on 2.6 kernels.  My conclusion is that context switches between the production of the result and the execution of the fstsw are the culprit, and that the context switch machinery fails to preserve the FPU status flags.
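
As a rough illustration of the exponent test mentioned above (a sketch assuming IEEE 754 double-precision results, not the actual VisualWorks code; the names double_bits and is_nan_or_inf are invented for the example):

```c
#include <stdint.h>
#include <string.h>

/* Copy the bit pattern of a double into an integer without breaking
 * C aliasing rules. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* An IEEE 754 double is a NaN or an Inf exactly when all 11 exponent
 * bits are set, whatever the sign and mantissa bits hold. */
static int is_nan_or_inf(double d)
{
    const uint64_t exp_mask = 0x7FF0000000000000ULL;
    return (double_bits(d) & exp_mask) == exp_mask;
}
```

This check looks only at the result value and never reads the FPU status word, so it cannot be affected by status flags being lost between the operation and the test.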

I don't know whether any action on your part is appropriate.  The use of the FPU status flags is presumably rare on linux (I believe that neither gcc nor glibc make use of them).  But "exotic" execution machinery such as runtimes for dynamic or functional languages (language implementations that may not use IEEE arithmetic and instead flag Infs and NaNs as an error) may fall foul of this issue.  Since previous versions of the kernel on x86 apparently do preserve the FPU status flags perhaps it's simple to preserve the old behaviour.  At the very least let me suggest you document the limitation.

Sincerely,
---
Eliot Miranda                 ,,,^..^,,,                mailto:eliot@cincom.com
VisualWorks Engineering, Cincom  Smalltalk: scene not herd  Tel +1 408 216 4581
3350 Scott Blvd, Bldg 36 Suite B, Santa Clara, CA 95054 USA Fax +1 408 216 4500




end of thread, other threads:[~2004-06-17 10:39 UTC | newest]

Thread overview: 8+ messages
     [not found] <27Pmy-sO-25@gated-at.bofh.it>
2004-06-16 21:40 ` PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches Andi Kleen
2004-06-16 23:01 eliot
2004-06-17 10:35 ` Andi Kleen
  -- strict thread matches above, loose matches on Subject: below --
2004-06-16 22:26 eliot
2004-06-17 10:39 ` Andi Kleen
2004-06-16 20:32 eliot
2004-06-16 21:03 ` Richard B. Johnson
2004-06-17  6:43   ` Denis Vlasenko
