public inbox for linux-kernel@vger.kernel.org
* [patch 0/2] sLeAZY FPU feature
@ 2006-07-01 17:11 Arjan van de Ven
  2006-07-01 17:12 ` [patch 1/2] sLeAZY FPU feature - x86_64 support Arjan van de Ven
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Arjan van de Ven @ 2006-07-01 17:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, ak

Hi,

the two patches in this series (the x86-64 one by me, the i386 one by
Chuck Ebbert) change how the lazy FPU feature works. Currently we are
100% lazy: after every context switch, the application takes a trap on
its first FPU use, and the trap handler then restores the FPU context.

The sLeAZY FPU patch changes this behavior; if a process has used the
FPU for 5 stints in a row, the behavior becomes proactive and the FPU
context is restored during the regular context switch already. This
means we can avoid the trap.

The underlying assumption is that if a process uses the FPU 5
consecutive times, it's likely to do so the 6th and later times as well
(i.e. it's not a one-off behavior).

There is a limit built in; this proactive behavior resets after 255
times, so that when a process is long-lived and changes behavior, it'll
still get the right behavior (for performance) after some time.

Chuck measured a performance gain of roughly 0.4%, and my experiments
show a similar improvement.

Greetings,
   Arjan van de Ven


* Re: [patch 0/2] sLeAZY FPU feature
@ 2006-07-02  0:57 Chuck Ebbert
  0 siblings, 0 replies; 8+ messages in thread
From: Chuck Ebbert @ 2006-07-02  0:57 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Andrew Morton, linux-kernel, Andi Kleen, Nick Piggin

In-Reply-To: <1151782942.3195.56.camel@laptopd505.fenrus.org>

On Sat, 01 Jul 2006 21:42:22 +0200, Arjan van de Ven wrote:

> > What sort of test?
>
> the one I did was long running FPU app (calculating PI using FPU)

Mine was just running a program that loops doing getpid() in one window
and this in another:

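#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

/* Read the 64-bit time stamp counter; "=A" means edx:eax on i386 only. */
#define rdtscll(t)      asm volatile("rdtsc" : "=A" (t))

int main(int argc, char * const argv[])
{
        unsigned long long tsc1, tsc2;
        long double ld = 0.0;   /* long double arithmetic goes through the x87 FPU */
        int i, iters = 999999999;

        rdtscll(tsc1);
        for (i = 0; i < iters; i++)
                ld += 1.0;      /* one FPU add per iteration */
        rdtscll(tsc2);

        /* Elapsed TSC ticks for the whole loop, context switches included. */
        printf("count: %Lf, clocks: %llu\n", ld, tsc2 - tsc1);

        return 0;
}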

So the ~0.4% gain I saw (averaging 10 tests) was likely the minimum
and Arjan's 8.5% gain when switching tasks after every FPU operation
is the max.

-- 
Chuck
 "You can't read a newspaper if you can't read."  --George W. Bush


end of thread, other threads:[~2006-07-02  1:03 UTC | newest]

Thread overview: 8+ messages
2006-07-01 17:11 [patch 0/2] sLeAZY FPU feature Arjan van de Ven
2006-07-01 17:12 ` [patch 1/2] sLeAZY FPU feature - x86_64 support Arjan van de Ven
2006-07-01 21:49   ` Andi Kleen
2006-07-01 21:56     ` Arjan van de Ven
2006-07-01 17:13 ` [patch 2/2] sLeAZY FPU feature - i386 support Arjan van de Ven
2006-07-01 17:40 ` [patch 0/2] sLeAZY FPU feature Nick Piggin
2006-07-01 19:42   ` Arjan van de Ven
  -- strict thread matches above, loose matches on Subject: below --
2006-07-02  0:57 Chuck Ebbert
