* [PATCH] Fast path context switch - microoptimize FPU reload
@ 2003-03-09 16:39 Andi Kleen
0 siblings, 0 replies; 3+ messages in thread
From: Andi Kleen @ 2003-03-09 16:39 UTC (permalink / raw)
To: linux-kernel; +Cc: torvalds
This follows some changes on x86-64.
When cpu_has_fxsr is defined to 1, as in many kernels, unlazy_fpu can
collapse to three instructions, so inlining it is a very good idea.
Otherwise it's 10 instructions or so, which can still be inlined.
We don't need the lock prefix to test our local thread flags state.
Unfortunately test_thread_flag currently always uses test_bit, which
has a LOCK on SMP, but that's unnecessary. LOCK is costly on P4,
so it's a good idea to avoid it.
Work around this for now by testing the flags word directly. It would
probably be better to define __set_bit for all architectures to not
guarantee atomicity and then always use that for local thread_info
accesses in linux/thread_info.h.
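A minimal user-space sketch of the direct test described above. The struct name, the helper names, and the bit position 16 are assumptions made for illustration and are not taken from the kernel headers; the point is only the difference between the bit number (TIF_USEDFPU) and the bit mask (_TIF_USEDFPU) when the flags word is accessed with plain C operators.

```c
#include <stdbool.h>

/* Hypothetical values for illustration only. */
#define TIF_USEDFPU   16                     /* bit number */
#define _TIF_USEDFPU  (1UL << TIF_USEDFPU)   /* bit mask   */

/* Stand-in for the kernel's thread_info; only the flags word matters here. */
struct thread_info_sketch {
	unsigned long flags;
};

/* Plain load and mask: no LOCK prefix, no atomic bit operation. */
static inline bool sketch_used_fpu(const struct thread_info_sketch *ti)
{
	return (ti->flags & _TIF_USEDFPU) != 0;
}

/* Plain read-modify-write. The mask is required here: the bit number
 * would clear the wrong bits. This is only safe if no other CPU writes
 * the same word concurrently. */
static inline void sketch_clear_used_fpu(struct thread_info_sketch *ti)
{
	ti->flags &= ~_TIF_USEDFPU;
}
```

With these assumed values, `flags &= ~TIF_USEDFPU` would clear bit 4 and leave bit 16 set, so the open-coded form has to use the underscore-prefixed mask throughout.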
-Andi
diff -u linux-2.5.63-work/arch/i386/kernel/i387.c-FPU linux-2.5.63-work/arch/i386/kernel/i387.c
--- linux-2.5.63-work/arch/i386/kernel/i387.c-FPU 2003-02-10 19:39:17.000000000 +0100
+++ linux-2.5.63-work/arch/i386/kernel/i387.c 2003-03-05 00:23:19.000000000 +0100
@@ -52,24 +52,6 @@
* FPU lazy state save handling.
*/
-static inline void __save_init_fpu( struct task_struct *tsk )
-{
- if ( cpu_has_fxsr ) {
- asm volatile( "fxsave %0 ; fnclex"
- : "=m" (tsk->thread.i387.fxsave) );
- } else {
- asm volatile( "fnsave %0 ; fwait"
- : "=m" (tsk->thread.i387.fsave) );
- }
- clear_tsk_thread_flag(tsk, TIF_USEDFPU);
-}
-
-void save_init_fpu( struct task_struct *tsk )
-{
- __save_init_fpu(tsk);
- stts();
-}
-
void kernel_fpu_begin(void)
{
preempt_disable();
diff -u linux-2.5.63-work/include/asm-i386/i387.h-FPU linux-2.5.63-work/include/asm-i386/i387.h
--- linux-2.5.63-work/include/asm-i386/i387.h-FPU 2003-02-10 19:38:49.000000000 +0100
+++ linux-2.5.63-work/include/asm-i386/i387.h 2003-03-05 00:23:19.000000000 +0100
@@ -21,23 +21,41 @@
/*
* FPU lazy state save handling...
*/
-extern void save_init_fpu( struct task_struct *tsk );
extern void restore_fpu( struct task_struct *tsk );
extern void kernel_fpu_begin(void);
#define kernel_fpu_end() do { stts(); preempt_enable(); } while(0)
+static inline void __save_init_fpu( struct task_struct *tsk )
+{
+ if ( cpu_has_fxsr ) {
+ asm volatile( "fxsave %0 ; fnclex"
+ : "=m" (tsk->thread.i387.fxsave) );
+ } else {
+ asm volatile( "fnsave %0 ; fwait"
+ : "=m" (tsk->thread.i387.fsave) );
+ }
+ tsk->thread_info->flags &= ~_TIF_USEDFPU;
+}
+
+static inline void save_init_fpu( struct task_struct *tsk )
+{
+ __save_init_fpu(tsk);
+ stts();
+}
+
+
#define unlazy_fpu( tsk ) do { \
- if (test_tsk_thread_flag(tsk, TIF_USEDFPU)) \
+ if ((tsk)->thread_info->flags & _TIF_USEDFPU) \
save_init_fpu( tsk ); \
} while (0)
#define clear_fpu( tsk ) \
do { \
- if (test_tsk_thread_flag(tsk, TIF_USEDFPU)) { \
+ if ((tsk)->thread_info->flags & _TIF_USEDFPU) { \
asm volatile("fwait"); \
- clear_tsk_thread_flag(tsk, TIF_USEDFPU); \
+ (tsk)->thread_info->flags &= ~_TIF_USEDFPU; \
stts(); \
} \
} while (0)
* Re: [PATCH] Fast path context switch - microoptimize FPU reload
@ 2003-03-09 17:15 Manfred Spraul
2003-03-09 17:30 ` Andi Kleen
0 siblings, 1 reply; 3+ messages in thread
From: Manfred Spraul @ 2003-03-09 17:15 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel
Andi wrote:
>We don't need the lock prefix to test our local thread flags state.
>Unfortunately test_thread_flag currently always uses test_bit which
>has a LOCK on SMP, but that's unnecessary. LOCK is costly on P4,
>so it's a good idea to avoid it.
>
>
No, LOCK is required: the TIF_ flags word is also written by other CPUs
for signal delivery, so a non-atomic read-modify-write could lose one of
those updates and corrupt the state.
What about moving TIF_USEDFPU from the thread_info into
task_struct->flags? That flags word is only accessed by "current", so it
has no special atomicity requirements.
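The lost update can be sketched deterministically in user space. This is an illustration under assumptions, not kernel code: the SK_ names and bit positions are hypothetical, and the "remote CPU" is modelled by placing its write by hand between the load and the store of the local read-modify-write.

```c
#include <stdatomic.h>

/* Hypothetical bit masks for illustration only. */
#define SK_SIGPENDING (1UL << 2)
#define SK_USEDFPU    (1UL << 16)

/* Non-atomic clear of SK_USEDFPU, split into load / modify / store so
 * that a remote set of SK_SIGPENDING can land in the middle. */
static unsigned long sk_racy_clear(unsigned long flags)
{
	unsigned long tmp = flags;   /* local CPU: load              */
	flags |= SK_SIGPENDING;      /* remote CPU: signal delivery  */
	tmp &= ~SK_USEDFPU;          /* local CPU: modify stale copy */
	flags = tmp;                 /* local CPU: store, remote bit is lost */
	return flags;
}

/* The same two updates done as atomic read-modify-write operations.
 * Each one is indivisible, so the remote set lands either before or
 * after the local clear and survives in both orders. */
static unsigned long sk_atomic_clear(unsigned long initial)
{
	atomic_ulong flags;

	atomic_init(&flags, initial);
	atomic_fetch_or(&flags, SK_SIGPENDING);   /* remote CPU */
	atomic_fetch_and(&flags, ~SK_USEDFPU);    /* local CPU  */
	return atomic_load(&flags);
}
```

Starting from a word with only SK_USEDFPU set, the racy version ends with SK_SIGPENDING cleared again, while the atomic version keeps it.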
--
Manfred
* Re: [PATCH] Fast path context switch - microoptimize FPU reload
2003-03-09 17:15 [PATCH] Fast path context switch - microoptimize FPU reload Manfred Spraul
@ 2003-03-09 17:30 ` Andi Kleen
0 siblings, 0 replies; 3+ messages in thread
From: Andi Kleen @ 2003-03-09 17:30 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Andi Kleen, linux-kernel
On Sun, Mar 09, 2003 at 06:15:09PM +0100, Manfred Spraul wrote:
> What about moving TIF_USEDFPU from the thread_info into
> task_struct->flags? This flag word is only accessed by "current", no
> special atomicity requirements.
There is still a PF_USEDFPU defined there that can be used.
I'll change the patch to use that.
But it would be a good idea to completely separate the flags that are
changed externally, like the signal flags, from the process-local bits.
Then we could avoid the wasteful LOCK prefixes for everything local.
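One way the split could look, as a user-space sketch only. The struct layout, the sk_ helper names, and the bit values are all assumptions for illustration; nothing here is the kernel's actual layout. Externally written bits live in a word that is only touched with atomic operations, and owner-only bits live in a separate word that uses plain loads and stores.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit masks for illustration only. */
#define SK_SHARED_SIGPENDING (1UL << 2)
#define SK_LOCAL_USEDFPU     (1UL << 0)

struct task_flags_sketch {
	atomic_ulong  shared;  /* may be written by other CPUs       */
	unsigned long local;   /* only written by the owning task    */
};

static inline void sk_flags_init(struct task_flags_sketch *t)
{
	atomic_init(&t->shared, 0UL);
	t->local = 0UL;
}

/* Remote path (e.g. signal delivery): needs an atomic RMW. */
static inline void sk_set_sigpending(struct task_flags_sketch *t)
{
	atomic_fetch_or(&t->shared, SK_SHARED_SIGPENDING);
}

/* Owner-only paths: plain accesses, no locked instruction needed. */
static inline void sk_set_used_fpu(struct task_flags_sketch *t)
{
	t->local |= SK_LOCAL_USEDFPU;
}

static inline bool sk_test_and_clear_used_fpu(struct task_flags_sketch *t)
{
	bool was_set = (t->local & SK_LOCAL_USEDFPU) != 0;

	t->local &= ~SK_LOCAL_USEDFPU;
	return was_set;
}
```

Because the two words never share a store, a plain clear of the local bit cannot overwrite a concurrent remote set of the shared bit.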
-Andi