* [patch 0/2] sLeAZY FPU feature
@ 2006-07-01 17:11 Arjan van de Ven
  2006-07-01 17:12 ` [patch 1/2] sLeAZY FPU feature - x86_64 support Arjan van de Ven
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Arjan van de Ven @ 2006-07-01 17:11 UTC
  To: linux-kernel; +Cc: akpm, ak

Hi,

The two patches in this series (the x86-64 one by me, the i386 one by
Chuck Ebbert) change how the lazy fpu feature works. In the current
situation, we are 100% lazy, meaning that after every context switch,
the application takes a trap on the first FPU use, which then restores
the FPU context.

The sLeAZY FPU patch changes this behavior; if a process has used the
FPU for 5 stints in a row, the behavior becomes proactive and the FPU
context is restored during the regular context switch already. This
means we can avoid the trap.

The underlying assumption is that if a process uses the FPU 5 consecutive
times, it's likely to do so the 6th and later times as well (i.e. it's
not a one-off behavior).

There is a limit built in; this proactive behavior resets after 255
times, so that when a process is long-lived and changes behavior, it'll
still get the right behavior (for performance) after some time.
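
To make the heuristic concrete, here is a minimal user-space sketch of the
counter logic described above. It is not the patch itself: struct task,
FPU_EAGER_THRESHOLD, restore_eagerly(), account_stint() and count_traps()
are hypothetical names made up for illustration, and resetting the counter
after a stint without FPU use is an assumption based on the word
"consecutive". Only the fpu_counter field, the threshold of 5 and the
unsigned char wraparound come from the description.

```c
#include <stdbool.h>

/* Assumed threshold: 5 consecutive FPU-using stints, per the description. */
#define FPU_EAGER_THRESHOLD 5

/* Hypothetical stand-in for the relevant part of the task structure. */
struct task {
    unsigned char fpu_counter; /* wraps from 255 back to 0 */
};

/* At switch-in: should the FPU context be restored right away? */
static bool restore_eagerly(const struct task *t)
{
    return t->fpu_counter >= FPU_EAGER_THRESHOLD;
}

/* At switch-out: record whether the stint that just ended used the FPU. */
static void account_stint(struct task *t, bool used_fpu)
{
    if (used_fpu)
        t->fpu_counter++;   /* unsigned char arithmetic: 255 + 1 becomes 0 */
    else
        t->fpu_counter = 0; /* streak broken, back to fully lazy (assumed) */
}

/*
 * Simulate `stints` consecutive time slices that all use the FPU and
 * return how many of them had to take the trap on first FPU use.
 */
static int count_traps(struct task *t, int stints)
{
    int traps = 0;

    for (int i = 0; i < stints; i++) {
        if (!restore_eagerly(t))
            traps++;        /* lazy path: the first FPU use traps */
        account_stint(t, true);
    }
    return traps;
}
```

In this model a task that uses the FPU in every stint traps during its first
5 stints and then runs trap-free until the counter wraps, at which point it
goes through another 5 lazy stints before turning proactive again.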

Chuck measured a performance gain of roughly 0.4%, and my experiments show
a similar improvement.

Greetings,
   Arjan van de Ven


* Re: [patch 1/2] sLeAZY FPU feature - x86_64 support
@ 2006-07-02  1:39 Voluspa
  2006-07-02  7:39 ` Arjan van de Ven
  0 siblings, 1 reply; 9+ messages in thread
From: Voluspa @ 2006-07-02  1:39 UTC
  To: arjan; +Cc: ak, linux-kernel

You have a very strange 2.6.17 kernel there. The include/linux/sched.h is so 
incompatible that the patching (with fuzz) places "unsigned char fpu_counter;" 
in a totally unrelated struct, and not in "struct task_struct  {".

Here's a working rebase of that part - sorry about mangling by this webmail 
client... Btw, the whole thing has no measurable effect on real world stuff 
like rendering through blender - on my machine, at least.

diff -Nur linux-2.6.17-git19-original/include/linux/sched.h linux-2.6.17-git19-sleazyfpu/include/linux/sched.h
--- linux-2.6.17-git19-original/include/linux/sched.h   2006-07-02 01:17:36.000000000 +0200
+++ linux-2.6.17-git19-sleazyfpu/include/linux/sched.h  2006-07-02 01:10:42.000000000 +0200
@@ -926,6 +926,16 @@
         * cache last used pipe for splice
         */
        struct pipe_inode_info *splice_pipe;
+
+       /*
+       * fpu_counter contains the number of consecutive context switches
+       * that the FPU is used. If this is over a threshold, the lazy fpu
+       * saving becomes unlazy to save the trap. This is an unsigned char
+       * so that after 256 times the counter wraps and the behavior turns
+       * lazy again; this to deal with bursty apps that only use FPU for
+       * a short time
+       */
+       unsigned char fpu_counter;
 };
 
 static inline pid_t process_group(struct task_struct *tsk)

Mvh
Mats Johannesson



Thread overview: 9+ messages
2006-07-01 17:11 [patch 0/2] sLeAZY FPU feature Arjan van de Ven
2006-07-01 17:12 ` [patch 1/2] sLeAZY FPU feature - x86_64 support Arjan van de Ven
2006-07-01 21:49   ` Andi Kleen
2006-07-01 21:56     ` Arjan van de Ven
2006-07-01 17:13 ` [patch 2/2] sLeAZY FPU feature - i386 support Arjan van de Ven
2006-07-01 17:40 ` [patch 0/2] sLeAZY FPU feature Nick Piggin
2006-07-01 19:42   ` Arjan van de Ven
  -- strict thread matches above, loose matches on Subject: below --
2006-07-02  1:39 [patch 1/2] sLeAZY FPU feature - x86_64 support Voluspa
2006-07-02  7:39 ` Arjan van de Ven
