From: Bill Davidsen <davidsen@tmr.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Chuck Ebbert <76306.1226@compuserve.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@osdl.org>
Subject: Re: [patch 2.6.13-rc3a] i386: inline restore_fpu
Date: Mon, 25 Jul 2005 15:26:51 -0400
Message-ID: <42E53CFB.7080300@tmr.com>
In-Reply-To: <Pine.LNX.4.58.0507221028070.6074@g5.osdl.org>

Linus Torvalds wrote:
> 
> On Fri, 22 Jul 2005, Adrian Bunk wrote:
> 
>>If this patch makes a difference, could you do me a favour and check 
>>whether replacing the current cpu_has_fxsr #define in
>>include/asm-i386/cpufeature.h with
>>
>>  #define cpu_has_fxsr           1
>>
>>on top of your patch brings an additional improvement?
> 
> 
> It would be really sad if it made a difference. There might be a branch
> mispredict, but the real expense of the fnsave/fxsave will be that
> instruction itself, and any cache misses associated with it. The 9%
> performance difference would almost have to be due to a memory bank
> conflict or something (likely some unnecessary I$ prefetching that
> interacts badly with the writeback needed for the _big_ memory write
> forced by the fxsave).
> 
> I can't see any way that a single branch mispredict could make that big of 
> a difference, but I _can_ see how bad memory access patterns could do it.
> 
> Btw, the switch from fnsave to fxsave (and thus the change from a 112-byte
> save area to a 512-byte one, or whatever the exact details are) caused
> _huge_ performance degradation for various context switching benchmarks. I
> really hated that, but obviously the need to support SSE2 made it
> non-optional. The point being that the real overhead is that big memory 
> read/write in fxrstor/fxsave.
> 
> What _could_ make a bigger difference is not doing the lazy FPU at all.  
> That lazy FPU is a huge optimization on 99.9% of all loads, but it sounds
> like java/volanomark are broken and always use the FPU, and then we take a
> big hit on doing the FP restore exception (an exception is a lot more
> expensive than a mispredict).

Doing the save/restore when it isn't needed is expensive, which is why 
the code got lazy in the first place. Would it be useful to have a small 
flag or count field? Start by assuming that the FPU is not used; if the 
exception takes place, set the count to unconditionally save the FP state 
for some number of context switches, and then try reverting to lazy save.

That 99.9% may be a guess, but I suspect that there are a lot of 
applications which alternate between using the FPU and not, even if they 
do use it for some parts of their work. With a count like this, the 
common applications would still get the benefit of lazy save, and the 
exception overhead would be greatly reduced for both ill-behaved and 
legitimately FPU-intensive applications.
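
To make the idea concrete, here is a rough user-space sketch of the counter, not kernel code and totally untested against a real kernel. Every name in it (fpu_task_sim, FPU_EAGER_SWITCHES, switch_in, use_fpu, run_slice) is made up for illustration, and the value 5 is an arbitrary guess rather than a tuned number. It only counts how many exceptions and how many unconditional FP-state operations a task would see under the scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tunable: switches to stay non-lazy after one FPU exception. */
#define FPU_EAGER_SWITCHES 5

/* Per-task bookkeeping for the simulation (not a real kernel structure). */
struct fpu_task_sim {
	unsigned int eager_left;      /* switches left with unconditional FP handling */
	unsigned long traps;          /* FPU exceptions taken while lazy */
	unsigned long eager_restores; /* FP state handled at switch time */
};

/*
 * Task is switched in. Returns true if FP state was handled eagerly,
 * meaning the first FPU use in this slice will not trap.
 */
bool switch_in(struct fpu_task_sim *t)
{
	assert(t != NULL);
	if (t->eager_left == 0)
		return false;         /* lazy: wait for the exception */
	t->eager_left--;
	t->eager_restores++;
	return true;
}

/* Task touches the FPU during its slice. */
void use_fpu(struct fpu_task_sim *t, bool restored_at_switch)
{
	assert(t != NULL);
	if (restored_at_switch)
		return;               /* state already loaded, no exception */
	t->traps++;                   /* pay for the exception once ... */
	t->eager_left = FPU_EAGER_SWITCHES; /* ... then go non-lazy for a while */
}

/* One time slice: switch the task in, then optionally use the FPU. */
void run_slice(struct fpu_task_sim *t, bool uses_fpu)
{
	bool restored = switch_in(t);

	if (uses_fpu)
		use_fpu(t, restored);
}
```

In this model a task that uses the FPU in every slice takes one exception per FPU_EAGER_SWITCHES + 1 switches instead of one per switch, while a task that never touches the FPU never leaves the lazy path. A task that uses the FPU once and then stops pays for FPU_EAGER_SWITCHES unneeded operations before it drops back to lazy, which is the cost side of the trade.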

> 
> Something like the following (totally untested) should make it be
> non-lazy. It's going to slow down normal task switches, but might speed up 
> the "restoring FP context all the time" case.
> 
> Chuck? This should work fine with or without your inline thing. Does it 
> make any difference?

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me
