From: Andy Isaacson <adi@hexapodia.org>
To: Daniel Forrest <forrest@lmcg.wisc.edu>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Somewhat OT: gcc, x86, -ffast-math, and Linux
Date: Fri, 26 Mar 2004 15:45:19 -0600
Message-ID: <20040326214519.GA22309@hexapodia.org>
In-Reply-To: <200403262054.i2QKsV223748@rda07.lmcg.wisc.edu>

linux-kernel isn't the right forum for this, but I'll take a stab
anyways.

On Fri, Mar 26, 2004 at 02:54:31PM -0600, Daniel Forrest wrote:
[snip: 180 dual Xeon boxes]
> I am running one of our applications that has been compiled using gcc
> with the -ffast-math option.  I am finding that the identical program
> using the same input data files is producing different results on
> different machines.  However, the differences are all less than the
> precision of a single-precision floating point number.  By this I mean
> that if the results (which are written to 15 digits of precision) are
> only compared to 7 digits then the results are the same.  Also, most
> of the time the 15 digit values are the same.
> 
> My question is this: Why aren't the results always the same?  What is
> the -ffast-math option doing?  How are the excess bits of precision
> dealt with during context switches?  Shouldn't the same binary with
> the same inputs produce the same output on identical hardware?

The kernel should be doing the right thing to preserve FPU state during
context switches.  That doesn't prevent the app from doing things wrong
and thus getting the wrong answer (perhaps only under certain
circumstances).  And of course the kernel might have bugs, though a bug
as simple as "doesn't preserve FPU state correctly" is unlikely; a lot
of people depend on that codepath being right.

Likely there is some difference in one of the following areas:
 - hardware problems
 - kernel
 - libraries
 - CPU microcode

Or, you have a bug in your program which is triggered by some
environmental factor.  (For example, an inter-thread race condition
affected by IO interrupts.)
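
To illustrate why a timing-dependent race could show up only in the low
digits: floating-point addition is not associative, so if threads
accumulate partial results in an order that depends on scheduling, the
last few bits of the total can change from run to run.  The sketch below
is only an illustration with made-up values (the function name
grouped_sums is hypothetical), not a diagnosis of your program:

```python
def grouped_sums(a, b, c):
    """Add three doubles with two different groupings.

    Mathematically both results are equal; in IEEE 754 double
    precision they can differ in the last bit or so.
    """
    left = (a + b) + c   # e.g. the order one thread interleaving produces
    right = a + (b + c)  # e.g. the order another interleaving produces
    return left, right


left, right = grouped_sums(0.1, 0.2, 0.3)

# At 17 significant digits the two totals are distinguishable ...
print("%.17g" % left)
print("%.17g" % right)
# ... but compared at 7 digits they look identical.
print("%.7g" % left, "%.7g" % right)
```

That is the same pattern you describe: agreement at 7 digits,
occasional disagreement further out.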

To rule these out:
 - first, run memtest86 or a similar program to verify that you are not
   simply the victim of a bad memory stick.
 - next, check that the kernel, libc, and libm are identical across the
   machines that display the problem.
 - next, check /proc/cpuinfo and dmesg(1) output to verify that your
   CPUs are the same stepping, and running the same microcode.  (The
   likelihood that this is the problem is so small as to be almost not
   worth mentioning.)
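
With 180 boxes, the library and CPU checks are easier to script than to
eyeball.  A minimal sketch, assuming Python is available on each node:
parse_cpuinfo() and file_sha256() are hypothetical helper names, the
"microcode" field is only present in /proc/cpuinfo on kernels that
export it (otherwise fall back to dmesg), and the paths to libc and libm
are whatever your distribution uses.

```python
import hashlib

# Fields of /proc/cpuinfo worth comparing between machines.
FIELDS = ("model name", "stepping", "microcode")


def parse_cpuinfo(text):
    """Return the distinct (model name, stepping, microcode) tuples.

    A field that the kernel does not report comes back as None.
    """
    seen = set()
    current = {}
    # The trailing "" flushes the last processor block.
    for line in text.splitlines() + [""]:
        if not line.strip():
            if current:
                seen.add(tuple(current.get(f) for f in FIELDS))
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    return sorted(seen, key=repr)


def file_sha256(path):
    """Hex SHA-256 of a file, for comparing libc/libm across machines."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

On each node, print parse_cpuinfo(open("/proc/cpuinfo").read()) together
with file_sha256() of the libc and libm files the program actually
loads (ldd on the binary will list them), then diff the output between
a machine that gives the expected result and one that does not.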

> I have run the same test with the program compiled without -ffast-math
> enabled and the results are always identical.

You don't say how many different results you've gotten.  Is there just
one correct and one incorrect result?  Or do different runs give
different incorrect results?  What is the software environment?
(language, libraries, threading, etc.)

Basically, at this point you haven't provided us enough information to
be able to even point a finger at the kernel.  It's certainly possible
that there's a bug, but it's pretty unlikely (IMO).  I'd be looking at
hardware and at threading problems in the apps, first.

-andy
