public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Small optimization for UP in sched and prefetch
@ 2002-01-11 10:33 Rainer Keller
  2002-01-11 10:51 ` Jeff Garzik
  2002-01-11 19:59 ` [PATCH] Small optimization for UP in sched and prefetch Robert Love
  0 siblings, 2 replies; 6+ messages in thread
From: Rainer Keller @ 2002-01-11 10:33 UTC (permalink / raw)
  To: linux-kernel, Marcelo Tosatti

[-- Attachment #1: Type: text/plain, Size: 649 bytes --]

Hello Marcelo & all,

After checking the assembler code of clearbit with regard to the
substition of task->processor, I added the last occurence of the
variable in init_idle.

Do You think the following patch is safe for inclusion into 2.4?

It's been running for 3 days on this computer.

Greetings,
raY

PS: Because of usage of prefetch in include/linux/list.h, the memory
prefetch is triggered 137 times on my configuration...
-- 
---------------------------------------------------------------
Rainer Keller           Mail: Keller@hlrs.de
Allmandring 30          WWW: http://www.hlrs.de/people/keller
70550 Stuttgart         Tel. 0711 / 685 5858

[-- Attachment #2: patch_prefetch_sched-2.4.17_second.diff --]
[-- Type: text/plain, Size: 4098 bytes --]

diff -ur linux-2.4.17/include/asm-i386/processor.h linux-2.4.17-mine/include/asm-i386/processor.h
--- linux-2.4.17/include/asm-i386/processor.h	Thu Nov 22 20:46:19 2001
+++ linux-2.4.17-mine/include/asm-i386/processor.h	Fri Jan 11 11:31:49 2002
@@ -478,8 +478,8 @@
 
 #define cpu_relax()	rep_nop()
 
-/* Prefetch instructions for Pentium III and AMD Athlon */
-#ifdef 	CONFIG_MPENTIUMIII
+/* Prefetch instructions for Pentium III, Pentium 4 and AMD Athlon */
+#if defined(CONFIG_MPENTIUMIII) || defined(CONFIG_MPENTIUM4)
 
 #define ARCH_HAS_PREFETCH
 extern inline void prefetch(const void *x)
@@ -502,7 +502,12 @@
 {
 	 __asm__ __volatile__ ("prefetchw (%0)" : : "r"(x));
 }
-#define spin_lock_prefetch(x)	prefetchw(x)
+
+#ifndef CONFIG_SMP
+#define spin_lock_prefetch(x) do { } while(0)
+#else
+#define spin_lock_prefetch(x) prefetchw(x)
+#endif
 
 #endif
 
diff -ur linux-2.4.17/include/linux/prefetch.h linux-2.4.17-mine/include/linux/prefetch.h
--- linux-2.4.17/include/linux/prefetch.h	Thu Nov 22 20:46:19 2001
+++ linux-2.4.17-mine/include/linux/prefetch.h	Thu Jan 10 13:21:39 2002
@@ -10,6 +10,7 @@
 #ifndef _LINUX_PREFETCH_H
 #define _LINUX_PREFETCH_H
 
+#include <linux/config.h>
 #include <asm/processor.h>
 #include <asm/cache.h>
 
@@ -26,7 +27,9 @@
 	
 	prefetch(x)  	- prefetches the cacheline at "x" for read
 	prefetchw(x)	- prefetches the cacheline at "x" for write
-	spin_lock_prefetch(x) - prefectches the spinlock *x for taking
+	spin_lock_prefetch(x) - prefetches the spinlock *x for taking,
+	                        if on SMP, otherwise not needed
+				(except for debugging reasons -- slow anyway).
 	
 	there is also PREFETCH_STRIDE which is the architecure-prefered 
 	"lookahead" size for prefetching streamed operations.
@@ -50,7 +53,11 @@
 
 #ifndef ARCH_HAS_SPINLOCK_PREFETCH
 #define ARCH_HAS_SPINLOCK_PREFETCH
+#ifndef CONFIG_SMP
+#define spin_lock_prefetch(x) do { } while(0)
+#else
 #define spin_lock_prefetch(x) prefetchw(x)
+#endif
 #endif
 
 #ifndef PREFETCH_STRIDE
diff -ur linux-2.4.17/kernel/sched.c linux-2.4.17-mine/kernel/sched.c
--- linux-2.4.17/kernel/sched.c	Fri Dec 21 18:42:04 2001
+++ linux-2.4.17-mine/kernel/sched.c	Fri Jan 11 11:30:43 2002
@@ -117,11 +117,13 @@
 #define idle_task(cpu) (init_tasks[cpu_number_map(cpu)])
 #define can_schedule(p,cpu) \
 	((p)->cpus_runnable & (p)->cpus_allowed & (1 << cpu))
+#define processor_of_tsk(tsk) (tsk)->processor
 
 #else
 
 #define idle_task(cpu) (&init_task)
 #define can_schedule(p,cpu) (1)
+#define processor_of_tsk(tsk) (0)
 
 #endif
 
@@ -172,7 +174,7 @@
 #ifdef CONFIG_SMP
 		/* Give a largish advantage to the same processor...   */
 		/* (this is equivalent to penalizing other processors) */
-		if (p->processor == this_cpu)
+		if (processor_of_tsk(p) == this_cpu)
 			weight += PROC_CHANGE_PENALTY;
 #endif
 
@@ -221,7 +223,7 @@
 	 * shortcut if the woken up task's last CPU is
 	 * idle now.
 	 */
-	best_cpu = p->processor;
+	best_cpu = processor_of_tsk(p);
 	if (can_schedule(p, best_cpu)) {
 		tsk = idle_task(best_cpu);
 		if (cpu_curr(best_cpu) == tsk) {
@@ -295,18 +297,18 @@
 	tsk = target_tsk;
 	if (tsk) {
 		if (oldest_idle != -1ULL) {
-			best_cpu = tsk->processor;
+			best_cpu = processor_of_tsk(tsk);
 			goto send_now_idle;
 		}
 		tsk->need_resched = 1;
-		if (tsk->processor != this_cpu)
-			smp_send_reschedule(tsk->processor);
+		if (processor_of_tsk(tsk) != this_cpu)
+			smp_send_reschedule(processor_of_tsk(tsk));
 	}
 	return;
 		
 
 #else /* UP */
-	int this_cpu = smp_processor_id();
+	const int this_cpu = smp_processor_id();
 	struct task_struct *tsk;
 
 	tsk = cpu_curr(this_cpu);
@@ -559,7 +561,7 @@
 	if (!current->active_mm) BUG();
 need_resched_back:
 	prev = current;
-	this_cpu = prev->processor;
+	this_cpu = processor_of_tsk(prev);
 
 	if (unlikely(in_interrupt())) {
 		printk("Scheduling in interrupt\n");
@@ -1311,7 +1313,7 @@
 	}
 	sched_data->curr = current;
 	sched_data->last_schedule = get_cycles();
-	clear_bit(current->processor, &wait_init_idle);
+	clear_bit(processor_of_tsk(current), &wait_init_idle);
 }
 
 extern void init_timervecs (void);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Small optimization for UP in sched and prefetch
  2002-01-11 10:33 [PATCH] Small optimization for UP in sched and prefetch Rainer Keller
@ 2002-01-11 10:51 ` Jeff Garzik
  2002-01-11 17:07   ` [PATCH] Small optimization for UP in sched and prefetch (take 3) Rainer Keller
  2002-01-11 19:59 ` [PATCH] Small optimization for UP in sched and prefetch Robert Love
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff Garzik @ 2002-01-11 10:51 UTC (permalink / raw)
  To: Rainer Keller; +Cc: linux-kernel, Marcelo Tosatti

Rainer Keller wrote:
> PS: Because of usage of prefetch in include/linux/list.h, the memory
> prefetch is triggered 137 times on my configuration...

We need to merge in __builtin_prefetch support into the kernel, because
gcc 3.1 recently got support for it.  It would be nice at least for
future prefetch-related patches to perhaps call __builtin_prefetch, and
have the headers substitute a prefetch if the compiler does not support
it.

	Jeff



-- 
Jeff Garzik      | Alternate titles for LOTR:
Building 1024    | Fast Times at Uruk-Hai
MandrakeSoft     | The Took, the Elf, His Daughter and Her Lover
                 | Samwise Gamgee: International Hobbit of Mystery

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Small optimization for UP in sched and prefetch (take 3)
  2002-01-11 10:51 ` Jeff Garzik
@ 2002-01-11 17:07   ` Rainer Keller
  0 siblings, 0 replies; 6+ messages in thread
From: Rainer Keller @ 2002-01-11 17:07 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, Marcelo Tosatti

[-- Attachment #1: Type: text/plain, Size: 1366 bytes --]

Jeff Garzik wrote:
> > PS: Because of usage of prefetch in include/linux/list.h, the
> > memory prefetch is triggered 137 times on my configuration...
> We need to merge in __builtin_prefetch support into the kernel,
> because gcc 3.1 recently got support for it.  It would be nice at
> least for future prefetch-related patches to perhaps call
> __builtin_prefetch, and have the headers substitute a prefetch if
> the compiler does not support it.

OK, I added this suggestion -- it changes compiler.h to look out for
gcc-3.1 and then set macros for __builtin_prefetch.
But in order to get gcc 3.1 to issue prefetches, the -mcpu (and/or
-march?) must be set to pentium3 or pentium4 (same applies for AMD
athlons).

There's one minor nit to this:
"copy_to_user" has declared the parameter "from" as const and then does
a prefetch on "from".... This produces a warning:

/usr/src/linux-2.4.17-mine/include/asm/uaccess.h: In function
`__constant_copy_to_user':
/usr/src/linux-2.4.17-mine/include/asm/uaccess.h:549: warning: passing
arg 1 of `__builtin_prefetch' discards qualifiers from pointer target
type

How can I resolve this ?

Greetings,
raY
-- 
---------------------------------------------------------------
Rainer Keller           Mail: Keller@hlrs.de
Allmandring 30          WWW: http://www.hlrs.de/people/keller
70550 Stuttgart         Tel. 0711 / 685 5858

[-- Attachment #2: patch_prefetch_sched-2.4.17_third.diff --]
[-- Type: text/plain, Size: 5928 bytes --]

diff -ur linux-2.4.17/include/asm-i386/processor.h linux-2.4.17-mine/include/asm-i386/processor.h
--- linux-2.4.17/include/asm-i386/processor.h	Thu Nov 22 20:46:19 2001
+++ linux-2.4.17-mine/include/asm-i386/processor.h	Fri Jan 11 13:27:30 2002
@@ -478,8 +478,14 @@
 
 #define cpu_relax()	rep_nop()
 
-/* Prefetch instructions for Pentium III and AMD Athlon */
-#ifdef 	CONFIG_MPENTIUMIII
+/*
+ * If we don't have a compiler, which offers builtin_prefetch, do it our selve,
+ * if the processor supports it.
+ */
+#ifndef HAVE_builtin_prefetch
+
+/* Prefetch instructions for Pentium III, Pentium 4 and AMD Athlon */
+#if defined(CONFIG_MPENTIUMIII) || defined(CONFIG_MPENTIUM4)
 
 #define ARCH_HAS_PREFETCH
 extern inline void prefetch(const void *x)
@@ -502,8 +508,14 @@
 {
 	 __asm__ __volatile__ ("prefetchw (%0)" : : "r"(x));
 }
-#define spin_lock_prefetch(x)	prefetchw(x)
 
+#ifndef CONFIG_SMP
+#define spin_lock_prefetch(x) do { } while(0)
+#else
+#define spin_lock_prefetch(x) prefetchw(x)
 #endif
+
+#endif /* CONFIG_MPENTIUMII || CONFIG_MPENTIUM4 */
+#endif /* HAVE_builtin_prefetch */
 
 #endif /* __ASM_I386_PROCESSOR_H */
diff -ur linux-2.4.17/include/linux/compiler.h linux-2.4.17-mine/include/linux/compiler.h
--- linux-2.4.17/include/linux/compiler.h	Tue Sep 18 23:12:45 2001
+++ linux-2.4.17-mine/include/linux/compiler.h	Fri Jan 11 13:19:40 2002
@@ -1,6 +1,8 @@
 #ifndef __LINUX_COMPILER_H
 #define __LINUX_COMPILER_H
 
+#include <linux/config.h>
+
 /* Somewhere in the middle of the GCC 2.96 development cycle, we implemented
    a mechanism by which the user can annotate likely branch directions and
    expect the blocks to be reordered appropriately.  Define __builtin_expect
@@ -12,5 +14,24 @@
 
 #define likely(x)	__builtin_expect((x),1)
 #define unlikely(x)	__builtin_expect((x),0)
+
+
+/*
+ * Starting somewhere in GCC 3.1, builtin_prefetch support was added, i.e. we're
+ * not dependant on information by include/asm/processor.h
+ */
+#if __GNUC__ == 3 && __GNUC_MINOR__ >= 1
+#define HAVE_builtin_prefetch
+
+#define prefetch(x)  __builtin_prefetch((x))
+#define prefetchw(x) __builtin_prefetch((x), 2)
+
+#ifndef CONFIG_SMP
+#define spin_lock_prefetch(x) do { } while(0)
+#else
+#define spin_lock_prefetch(x) prefetchw(x)
+#endif
+
+#endif
 
 #endif /* __LINUX_COMPILER_H */
diff -ur linux-2.4.17/include/linux/prefetch.h linux-2.4.17-mine/include/linux/prefetch.h
--- linux-2.4.17/include/linux/prefetch.h	Thu Nov 22 20:46:19 2001
+++ linux-2.4.17-mine/include/linux/prefetch.h	Fri Jan 11 12:44:30 2002
@@ -10,6 +10,8 @@
 #ifndef _LINUX_PREFETCH_H
 #define _LINUX_PREFETCH_H
 
+#include <linux/config.h>
+#include <linux/compiler.h>
 #include <asm/processor.h>
 #include <asm/cache.h>
 
@@ -26,7 +28,9 @@
 	
 	prefetch(x)  	- prefetches the cacheline at "x" for read
 	prefetchw(x)	- prefetches the cacheline at "x" for write
-	spin_lock_prefetch(x) - prefectches the spinlock *x for taking
+	spin_lock_prefetch(x) - prefetches the spinlock *x for taking,
+	                        if on SMP, otherwise not needed
+				(except for debugging reasons -- slow anyway).
 	
 	there is also PREFETCH_STRIDE which is the architecure-prefered 
 	"lookahead" size for prefetching streamed operations.
@@ -37,7 +41,9 @@
  *	These cannot be do{}while(0) macros. See the mental gymnastics in
  *	the loop macro.
  */
- 
+
+#ifndef HAVE_builtin_prefetch 
+
 #ifndef ARCH_HAS_PREFETCH
 #define ARCH_HAS_PREFETCH
 static inline void prefetch(const void *x) {;}
@@ -50,11 +56,17 @@
 
 #ifndef ARCH_HAS_SPINLOCK_PREFETCH
 #define ARCH_HAS_SPINLOCK_PREFETCH
+#ifndef CONFIG_SMP
+#define spin_lock_prefetch(x) do { } while(0)
+#else
 #define spin_lock_prefetch(x) prefetchw(x)
 #endif
+#endif
 
 #ifndef PREFETCH_STRIDE
 #define PREFETCH_STRIDE (4*L1_CACHE_BYTES)
 #endif
+
+#endif /* HAVE_builtin_prefetch */
 
 #endif
diff -ur linux-2.4.17/kernel/sched.c linux-2.4.17-mine/kernel/sched.c
--- linux-2.4.17/kernel/sched.c	Fri Dec 21 18:42:04 2001
+++ linux-2.4.17-mine/kernel/sched.c	Fri Jan 11 11:30:43 2002
@@ -117,11 +117,13 @@
 #define idle_task(cpu) (init_tasks[cpu_number_map(cpu)])
 #define can_schedule(p,cpu) \
 	((p)->cpus_runnable & (p)->cpus_allowed & (1 << cpu))
+#define processor_of_tsk(tsk) (tsk)->processor
 
 #else
 
 #define idle_task(cpu) (&init_task)
 #define can_schedule(p,cpu) (1)
+#define processor_of_tsk(tsk) (0)
 
 #endif
 
@@ -172,7 +174,7 @@
 #ifdef CONFIG_SMP
 		/* Give a largish advantage to the same processor...   */
 		/* (this is equivalent to penalizing other processors) */
-		if (p->processor == this_cpu)
+		if (processor_of_tsk(p) == this_cpu)
 			weight += PROC_CHANGE_PENALTY;
 #endif
 
@@ -221,7 +223,7 @@
 	 * shortcut if the woken up task's last CPU is
 	 * idle now.
 	 */
-	best_cpu = p->processor;
+	best_cpu = processor_of_tsk(p);
 	if (can_schedule(p, best_cpu)) {
 		tsk = idle_task(best_cpu);
 		if (cpu_curr(best_cpu) == tsk) {
@@ -295,18 +297,18 @@
 	tsk = target_tsk;
 	if (tsk) {
 		if (oldest_idle != -1ULL) {
-			best_cpu = tsk->processor;
+			best_cpu = processor_of_tsk(tsk);
 			goto send_now_idle;
 		}
 		tsk->need_resched = 1;
-		if (tsk->processor != this_cpu)
-			smp_send_reschedule(tsk->processor);
+		if (processor_of_tsk(tsk) != this_cpu)
+			smp_send_reschedule(processor_of_tsk(tsk));
 	}
 	return;
 		
 
 #else /* UP */
-	int this_cpu = smp_processor_id();
+	const int this_cpu = smp_processor_id();
 	struct task_struct *tsk;
 
 	tsk = cpu_curr(this_cpu);
@@ -559,7 +561,7 @@
 	if (!current->active_mm) BUG();
 need_resched_back:
 	prev = current;
-	this_cpu = prev->processor;
+	this_cpu = processor_of_tsk(prev);
 
 	if (unlikely(in_interrupt())) {
 		printk("Scheduling in interrupt\n");
@@ -1311,7 +1313,7 @@
 	}
 	sched_data->curr = current;
 	sched_data->last_schedule = get_cycles();
-	clear_bit(current->processor, &wait_init_idle);
+	clear_bit(processor_of_tsk(current), &wait_init_idle);
 }
 
 extern void init_timervecs (void);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Small optimization for UP in sched and prefetch
  2002-01-11 10:33 [PATCH] Small optimization for UP in sched and prefetch Rainer Keller
  2002-01-11 10:51 ` Jeff Garzik
@ 2002-01-11 19:59 ` Robert Love
  2002-01-11 20:02   ` Dave Jones
  1 sibling, 1 reply; 6+ messages in thread
From: Robert Love @ 2002-01-11 19:59 UTC (permalink / raw)
  To: Rainer Keller; +Cc: linux-kernel, Marcelo Tosatti

On Fri, 2002-01-11 at 05:33, Rainer Keller wrote:

> -/* Prefetch instructions for Pentium III and AMD Athlon */
> -#ifdef 	CONFIG_MPENTIUMIII
> +/* Prefetch instructions for Pentium III, Pentium 4 and AMD Athlon */
> +#if defined(CONFIG_MPENTIUMIII) || defined(CONFIG_MPENTIUM4)

if we really intend to check for the use of the AMD Athlon as well, we
need to add CONFIG_MK7, too.  Since the Athlon does have this prefetch,
it would make sense.  Otherwise, the comment is wrong.

Anyhow, good patch and I can't see it not being safe for 2.4.

	Robert Love


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Small optimization for UP in sched and prefetch
  2002-01-11 19:59 ` [PATCH] Small optimization for UP in sched and prefetch Robert Love
@ 2002-01-11 20:02   ` Dave Jones
  2002-01-11 20:07     ` Robert Love
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Jones @ 2002-01-11 20:02 UTC (permalink / raw)
  To: Robert Love; +Cc: Rainer Keller, linux-kernel, Marcelo Tosatti

On 11 Jan 2002, Robert Love wrote:

> if we really intend to check for the use of the AMD Athlon as well, we
> need to add CONFIG_MK7, too.  Since the Athlon does have this prefetch,
> it would make sense.  Otherwise, the comment is wrong.

It's handled a few lines further down in a CONFIG_X86_USE_3DNOW
which means that CyrixIII's can also use them too.

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Small optimization for UP in sched and prefetch
  2002-01-11 20:02   ` Dave Jones
@ 2002-01-11 20:07     ` Robert Love
  0 siblings, 0 replies; 6+ messages in thread
From: Robert Love @ 2002-01-11 20:07 UTC (permalink / raw)
  To: Dave Jones; +Cc: Rainer Keller, linux-kernel, Marcelo Tosatti

On Fri, 2002-01-11 at 15:02, Dave Jones wrote:

> It's handled a few lines further down in a CONFIG_X86_USE_3DNOW
> which means that CyrixIII's can also use them too.

Ah, my mistake.  I should of read the source and not just the patch.

	Robert Love


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2002-01-11 20:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-01-11 10:33 [PATCH] Small optimization for UP in sched and prefetch Rainer Keller
2002-01-11 10:51 ` Jeff Garzik
2002-01-11 17:07   ` [PATCH] Small optimization for UP in sched and prefetch (take 3) Rainer Keller
2002-01-11 19:59 ` [PATCH] Small optimization for UP in sched and prefetch Robert Love
2002-01-11 20:02   ` Dave Jones
2002-01-11 20:07     ` Robert Love

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox