public inbox for linux-kernel@vger.kernel.org
* Re: [patch] i386: use C code for current_thread_info()
@ 2006-06-11 20:43 Chuck Ebbert
  2006-06-11 21:05 ` Emmanuel Fleury
  0 siblings, 1 reply; 15+ messages in thread
From: Chuck Ebbert @ 2006-06-11 20:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel

In-Reply-To: <Pine.LNX.4.64.0606111225380.5498@g5.osdl.org>

On Sun, 11 Jun 2006 12:33:10 -0700, Linus Torvalds wrote:

> On Sun, 11 Jun 2006, Chuck Ebbert wrote:
> >
> > Using C code for current_thread_info() lets the compiler optimize it.
> 
> Ok, me likee. I just worry that this might break some older gcc version. 
> Have you checked with gcc-3.2 or something?

I just tried gcc 3.3.3 and the kernel gets a little bigger but it boots
and runs OK. That's the oldest compiler I can find.

   text    data     bss     dec     hex filename
3593627  559864  342728 4496219  449b5b 2.6.17-rc6-32-post/vmlinux
3591371  559864  342728 4493963  44928b 2.6.17-rc6-32/vmlinux
  +2256

Looking at the generated code, it seems the compiler just makes dumb
choices and tends to recompute current_thread_info() in unlikely code
paths even when there is no register pressure.  4.0.2 makes better
choices.

-- 
Chuck


* Re: [patch] i386: use C code for current_thread_info()
@ 2006-06-13  5:26 Albert Cahalan
  0 siblings, 0 replies; 15+ messages in thread
From: Albert Cahalan @ 2006-06-13  5:26 UTC (permalink / raw)
  To: linux-kernel, emmanuel.fleury, torvalds, s0348365, jengelh,
	76306.1226, akpm

Chuck Ebbert writes:

> Using C code for current_thread_info() lets the compiler optimize it.
> With gcc 4.0.2, kernel is smaller:

The often-forgotten __attribute__((const)) might do the job.
The function is indeed const as far as gcc can see, except
perhaps near the scheduler code that switches stacks.

This applies to many similar functions, like the ones used
to get current. It should work for the C code too.
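
A minimal sketch of what the attribute looks like, assuming gcc or a
compatible compiler. It uses a stand-in function that takes the stack
pointer as an argument; the real current_thread_info() reads %esp
directly, so whether the attribute is safe there is exactly the caveat
about stack switching noted above. stack_base_of() and sum_twice() are
hypothetical names, and THREAD_SIZE is assumed to be 8 KiB.

```c
#include <stdint.h>

/* Assumption: 8 KiB kernel stacks, as on the i386 config discussed here. */
#define THREAD_SIZE 8192u

/*
 * Hypothetical stand-in for current_thread_info(): the result depends
 * only on the argument and the function touches no memory, so it
 * qualifies for __attribute__((const)).
 */
__attribute__((const))
static uint32_t stack_base_of(uint32_t sp)
{
	return sp & ~(THREAD_SIZE - 1u);
}

/*
 * With the attribute, the compiler is allowed (not required) to
 * evaluate stack_base_of(sp) once and reuse the value for both uses.
 */
static uint32_t sum_twice(uint32_t sp)
{
	return stack_base_of(sp) + stack_base_of(sp);
}
```

The attribute changes only what the optimizer may assume; the value
returned is the same with or without it.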

* Re: [patch] i386: use C code for current_thread_info()
@ 2006-06-13  1:50 Chuck Ebbert
  2006-06-13  6:43 ` Andreas Mohr
  0 siblings, 1 reply; 15+ messages in thread
From: Chuck Ebbert @ 2006-06-13  1:50 UTC (permalink / raw)
  To: Andreas Mohr; +Cc: linux-kernel, Emmanuel Fleury, Linus Torvalds

In-Reply-To: <20060612184833.GA29177@rhlx01.fht-esslingen.de>

On Mon, 12 Jun 2006 20:48:33 +0200, Andreas Mohr wrote:

> > Kernel code starts out ~30K bytes smaller with gcc 4.1 and using C
> > for current_thread_info() helps even more than with 4.0.  Nice...
> 
> Especially since current_thread_info() often has an AGI stall (read:
> severe pipeline stall) since it often cannot properly intermingle
> with nearby opcodes due to lack of suitable ones, e.g. at a
> function prologue.
> mov    $0xffffe000,%eax
> and    %esp,%eax
> are fundamentally incompatible due to having to wait for the address
> generation before the "and" can be executed.
> This shows up during profiling quite noticeably (IIRC 8 hits vs. 1 to 2
> hits on other places), which really hurts since this function is used
> basically *everywhere*.

Hmmm.  The compiler does it this way:

  mov    %esp,%eax
  and    $0xffffe000,%eax

which could be faster because esp can be moved to eax while the mask
is being fetched.

-- 
Chuck


* Re: [patch] i386: use C code for current_thread_info()
@ 2006-06-12 17:14 Chuck Ebbert
  2006-06-12 17:55 ` Adrian Bunk
  2006-06-12 18:48 ` Andreas Mohr
  0 siblings, 2 replies; 15+ messages in thread
From: Chuck Ebbert @ 2006-06-12 17:14 UTC (permalink / raw)
  To: Emmanuel Fleury; +Cc: linux-kernel

In-Reply-To: <448C85B7.1010902@labri.fr>

On Sun, 11 Jun 2006 23:05:59 +0200, Emmanuel Fleury wrote:

> > Looking at the generated code, it seems the compiler just makes dumb
> > choices and tends to recompute current_thread_info() in unlikely code
> > paths even when there is no register pressure.  4.0.2 makes better
> > choices.
>
> What size with gcc 4.1.2 ? (just curiosity)

The 3.3 vs 4.0 comparisons were with two different configs, so only
relative gain/loss with asm vs. C could be compared.

I downloaded gcc 4.1.1 and compared to 4.0.2 with the exact same config,
since I was curious how much better it might be overall.

gcc 4.0.2:
   text	   data	    bss	    dec	    hex	filename
3645212	 555556	 312024	4512792	 44dc18	2.6.17-rc6-nb-C/vmlinux
3647276	 555556	 312024	4514856	 44e428	2.6.17-rc6-nb-asm/vmlinux
  -2064

gcc 4.1.1:
   text	   data	    bss	    dec	    hex	filename
3614686	 520416	 311672	4446774	 43da36	2.6.17-rc6-nb-C/vmlinux
3616942	 520416	 311672	4449030	 43e306	2.6.17-rc6-nb-asm/vmlinux
  -2256

Kernel code starts out ~30K bytes smaller with gcc 4.1 and using C
for current_thread_info() helps even more than with 4.0.  Nice...

Maybe a patch that enables C code for gcc 4.0+ would work, since
on 3.3 the asm code is better?

-- 
Chuck
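
The version gate suggested above might look roughly like the sketch
below. It only reports which path a build would take:
thread_info_in_c() is a hypothetical name, and a real patch would
instead wrap the two bodies of current_thread_info() in the same #if.
It relies on the predefined __GNUC__ macro, which holds the compiler's
major version.

```c
/*
 * Hypothetical helper: returns 1 when the plain-C variant would be
 * selected (gcc 4.0 or later), 0 when the inline-asm variant would be
 * kept (older gcc, where the asm version produced a smaller kernel).
 */
static inline int thread_info_in_c(void)
{
#if defined(__GNUC__) && __GNUC__ >= 4
	return 1;	/* gcc 4.0+: let the compiler optimize the C code */
#else
	return 0;	/* gcc 3.x or unknown compiler: keep the asm */
#endif
}
```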

* [patch] i386: use C code for current_thread_info()
@ 2006-06-11 19:07 Chuck Ebbert
  2006-06-11 19:33 ` Linus Torvalds
  2006-06-11 19:42 ` Jan Engelhardt
  0 siblings, 2 replies; 15+ messages in thread
From: Chuck Ebbert @ 2006-06-11 19:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Linus Torvalds

Using C code for current_thread_info() lets the compiler optimize it.
With gcc 4.0.2, kernel is smaller:

    text           data     bss     dec     hex filename
 3645212         555556  312024 4512792  44dc18 2.6.17-rc6-nb-post/vmlinux
 3647276         555556  312024 4514856  44e428 2.6.17-rc6-nb/vmlinux
 -------
   -2064

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>

--- 2.6.17-rc6-32.orig/include/asm-i386/thread_info.h
+++ 2.6.17-rc6-32/include/asm-i386/thread_info.h
@@ -84,17 +84,15 @@ struct thread_info {
 #define init_stack		(init_thread_union.stack)
 
 
+/* how to get the current stack pointer from C */
+register unsigned long current_stack_pointer asm("esp") __attribute_used__;
+
 /* how to get the thread information struct from C */
 static inline struct thread_info *current_thread_info(void)
 {
-	struct thread_info *ti;
-	__asm__("andl %%esp,%0; ":"=r" (ti) : "0" (~(THREAD_SIZE - 1)));
-	return ti;
+	return (struct thread_info *)(current_stack_pointer & ~(THREAD_SIZE - 1));
 }
 
-/* how to get the current stack pointer from C */
-register unsigned long current_stack_pointer asm("esp") __attribute_used__;
-
 /* thread information allocation */
 #ifdef CONFIG_DEBUG_STACK_USAGE
 #define alloc_thread_info(tsk)					\
-- 
Chuck
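
As a rough user-space illustration of the masking the patch relies on,
the sketch below rounds an address down to a THREAD_SIZE boundary. It
assumes 8 KiB stacks (which gives the 0xffffe000 mask quoted elsewhere
in this thread) and 32-bit addresses. thread_base() is a hypothetical
helper, not a kernel function, and the address is passed in as a plain
integer because user space cannot bind a register variable to the
kernel stack pointer.

```c
#include <stdint.h>

/* Assumption: 8 KiB kernel stacks, so ~(THREAD_SIZE - 1) == 0xffffe000. */
#define THREAD_SIZE 8192u

/*
 * Hypothetical helper: clear the low 13 bits of a stack address to get
 * the THREAD_SIZE-aligned base, which is where the patch expects
 * struct thread_info to live.
 */
static inline uint32_t thread_base(uint32_t sp)
{
	return sp & ~(THREAD_SIZE - 1u);
}
```

For example, thread_base(0xc1234abc) yields 0xc1234000: any address
inside the same 8 KiB stack maps to the same base.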


end of thread, other threads:[~2006-06-13  9:27 UTC | newest]

Thread overview: 15+ messages
2006-06-11 20:43 [patch] i386: use C code for current_thread_info() Chuck Ebbert
2006-06-11 21:05 ` Emmanuel Fleury
  -- strict thread matches above, loose matches on Subject: below --
2006-06-13  5:26 Albert Cahalan
2006-06-13  1:50 Chuck Ebbert
2006-06-13  6:43 ` Andreas Mohr
2006-06-13  9:27   ` Avi Kivity
2006-06-12 17:14 Chuck Ebbert
2006-06-12 17:55 ` Adrian Bunk
2006-06-12 18:48 ` Andreas Mohr
2006-06-11 19:07 Chuck Ebbert
2006-06-11 19:33 ` Linus Torvalds
2006-06-11 19:42 ` Jan Engelhardt
2006-06-11 20:33   ` Jan Engelhardt
2006-06-11 20:44   ` Alistair John Strachan
2006-06-12  8:10     ` Andi Kleen
