linux-arch.vger.kernel.org archive mirror
* [RFC][PATCH 0/5] preempt_count rework
@ 2013-08-14 13:15 Peter Zijlstra
  2013-08-14 13:15 ` Peter Zijlstra
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Peter Zijlstra @ 2013-08-14 13:15 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, linux-kernel, linux-arch, Peter Zijlstra

These patches optimize preempt_enable(), first by folding the preempt and
need_resched tests into a single test -- this should work for all
architectures -- and second by providing per-arch preempt_count
implementations, with x86 using a per-cpu preempt_count for fastest access.

These patches have so far only been compiled for defconfig-x86_64 +
CONFIG_PREEMPT=y and boot tested with kvm -smp 4 up to the point of wanting
to mount root.

It still needs asm volatile("call preempt_schedule" : : : "memory"); as per
Andi's other patches, to keep the C calling convention from cluttering the
preempt_enable() sites.
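
For illustration, here is a rough sketch of the fast path the series is
aiming for (not code from these patches -- the real x86 version is patch
5/5, and the name preempt_enable_sketch() is made up here):

/*
 * Sketch only: with NEED_RESCHED folded into preempt_count (patch 2/5,
 * bit inverted), enabling preemption becomes a decrement that tests for
 * zero, plus a conditional call through the asm wrapper quoted above so
 * the C calling convention stays out of the fast path.
 */
#define preempt_enable_sketch() \
do { \
	barrier(); \
	if (unlikely(!--*preempt_count_ptr())) \
		asm volatile("call preempt_schedule" : : : "memory"); \
} while (0)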

---
 arch/alpha/include/asm/Kbuild      |  1 +
 arch/arc/include/asm/Kbuild        |  1 +
 arch/arm/include/asm/Kbuild        |  1 +
 arch/arm64/include/asm/Kbuild      |  1 +
 arch/avr32/include/asm/Kbuild      |  1 +
 arch/blackfin/include/asm/Kbuild   |  1 +
 arch/c6x/include/asm/Kbuild        |  1 +
 arch/cris/include/asm/Kbuild       |  1 +
 arch/frv/include/asm/Kbuild        |  1 +
 arch/h8300/include/asm/Kbuild      |  1 +
 arch/hexagon/include/asm/Kbuild    |  1 +
 arch/ia64/include/asm/Kbuild       |  1 +
 arch/m32r/include/asm/Kbuild       |  1 +
 arch/m68k/include/asm/Kbuild       |  1 +
 arch/metag/include/asm/Kbuild      |  1 +
 arch/microblaze/include/asm/Kbuild |  1 +
 arch/mips/include/asm/Kbuild       |  1 +
 arch/mn10300/include/asm/Kbuild    |  1 +
 arch/openrisc/include/asm/Kbuild   |  1 +
 arch/parisc/include/asm/Kbuild     |  1 +
 arch/powerpc/include/asm/Kbuild    |  1 +
 arch/s390/include/asm/Kbuild       |  1 +
 arch/score/include/asm/Kbuild      |  1 +
 arch/sh/include/asm/Kbuild         |  1 +
 arch/sparc/include/asm/Kbuild      |  1 +
 arch/tile/include/asm/Kbuild       |  1 +
 arch/um/include/asm/Kbuild         |  1 +
 arch/unicore32/include/asm/Kbuild  |  1 +
 arch/x86/include/asm/thread_info.h |  5 ++---
 arch/x86/kernel/asm-offsets.c      |  1 -
 arch/x86/kernel/cpu/common.c       |  5 +++++
 arch/x86/kernel/entry_32.S         |  7 ++-----
 arch/x86/kernel/entry_64.S         |  4 +---
 arch/x86/kernel/process_32.c       | 10 ++++++++++
 arch/x86/kernel/process_64.c       | 10 ++++++++++
 arch/xtensa/include/asm/Kbuild     |  1 +
 include/linux/preempt.h            | 49 ++++++++++++++++++++++++++++++++++++++-----------
 include/linux/sched.h              |  2 +-
 include/linux/thread_info.h        |  1 +
 include/trace/events/sched.h       |  2 +-
 init/main.c                        |  2 +-
 kernel/context_tracking.c          |  3 +--
 kernel/cpu/idle.c                  | 10 ++++++++++
 kernel/sched/core.c                | 31 +++++++++++++++++++------------
 kernel/softirq.c                   |  4 ++--
 kernel/timer.c                     |  8 ++++----
 lib/smp_processor_id.c             |  3 +--
 47 files changed, 138 insertions(+), 48 deletions(-)

* [RFC][PATCH 1/5] sched: Introduce preempt_count accessor functions
  2013-08-14 13:15 [RFC][PATCH 0/5] preempt_count rework Peter Zijlstra
  2013-08-14 13:15 ` Peter Zijlstra
@ 2013-08-14 13:15 ` Peter Zijlstra
  2013-08-14 13:15   ` Peter Zijlstra
  2013-08-14 13:15 ` [RFC][PATCH 2/5] sched: Add NEED_RESCHED to the preempt_count Peter Zijlstra
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2013-08-14 13:15 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, linux-kernel, linux-arch, Peter Zijlstra

[-- Attachment #1: peterz-preempt_count-accessors.patch --]
[-- Type: text/plain, Size: 5023 bytes --]

Replace the single preempt_count() 'function', which is currently an lvalue,
with two proper functions:

 preempt_count() - returns the preempt_count value as an rvalue
 preempt_count_ptr() - returns a pointer to the preempt_count

Then change all sites that modify the preempt count to use
preempt_count_ptr().
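
A minimal before/after illustration of the conversion (val and prev are
placeholders; the pattern is the one applied throughout the hunks below):

/* before: preempt_count() is an lvalue, so writes look like this */
preempt_count() += val;
preempt_count() = prev;

/* after: reads keep using preempt_count(), writes go through the pointer */
*preempt_count_ptr() += val;
*preempt_count_ptr() = prev;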

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-7zr0ph7cvibcp96lw02ln9nr@git.kernel.org
---
 include/linux/preempt.h |   20 ++++++++++++++------
 init/main.c             |    2 +-
 kernel/sched/core.c     |    4 ++--
 kernel/softirq.c        |    4 ++--
 kernel/timer.c          |    8 ++++----
 lib/smp_processor_id.c  |    3 +--
 6 files changed, 24 insertions(+), 17 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -10,19 +10,27 @@
 #include <linux/linkage.h>
 #include <linux/list.h>
 
+static __always_inline int preempt_count(void)
+{
+	return current_thread_info()->preempt_count;
+}
+
+static __always_inline int *preempt_count_ptr(void)
+{
+	return &current_thread_info()->preempt_count;
+}
+
 #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
   extern void add_preempt_count(int val);
   extern void sub_preempt_count(int val);
 #else
-# define add_preempt_count(val)	do { preempt_count() += (val); } while (0)
-# define sub_preempt_count(val)	do { preempt_count() -= (val); } while (0)
+# define add_preempt_count(val)	do { *preempt_count_ptr() += (val); } while (0)
+# define sub_preempt_count(val)	do { *preempt_count_ptr() -= (val); } while (0)
 #endif
 
 #define inc_preempt_count() add_preempt_count(1)
 #define dec_preempt_count() sub_preempt_count(1)
 
-#define preempt_count()	(current_thread_info()->preempt_count)
-
 #ifdef CONFIG_PREEMPT
 
 asmlinkage void preempt_schedule(void);
@@ -81,9 +89,9 @@ do { \
 
 /* For debugging and tracer internals only! */
 #define add_preempt_count_notrace(val)			\
-	do { preempt_count() += (val); } while (0)
+	do { *preempt_count_ptr() += (val); } while (0)
 #define sub_preempt_count_notrace(val)			\
-	do { preempt_count() -= (val); } while (0)
+	do { *preempt_count_ptr() -= (val); } while (0)
 #define inc_preempt_count_notrace() add_preempt_count_notrace(1)
 #define dec_preempt_count_notrace() sub_preempt_count_notrace(1)
 
--- a/init/main.c
+++ b/init/main.c
@@ -690,7 +690,7 @@ int __init_or_module do_one_initcall(ini
 
 	if (preempt_count() != count) {
 		sprintf(msgbuf, "preemption imbalance ");
-		preempt_count() = count;
+		*preempt_count_ptr() = count;
 	}
 	if (irqs_disabled()) {
 		strlcat(msgbuf, "disabled interrupts ", sizeof(msgbuf));
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2233,7 +2233,7 @@ void __kprobes add_preempt_count(int val
 	if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
 		return;
 #endif
-	preempt_count() += val;
+	*preempt_count_ptr() += val;
 #ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 * Spinlock count overflowing soon?
@@ -2264,7 +2264,7 @@ void __kprobes sub_preempt_count(int val
 
 	if (preempt_count() == val)
 		trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
-	preempt_count() -= val;
+	*preempt_count_ptr() -= val;
 }
 EXPORT_SYMBOL(sub_preempt_count);
 
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -106,7 +106,7 @@ static void __local_bh_disable(unsigned
 	 * We must manually increment preempt_count here and manually
 	 * call the trace_preempt_off later.
 	 */
-	preempt_count() += cnt;
+	*preempt_count_ptr() += cnt;
 	/*
 	 * Were softirqs turned off above:
 	 */
@@ -256,7 +256,7 @@ asmlinkage void __do_softirq(void)
 				       " exited with %08x?\n", vec_nr,
 				       softirq_to_name[vec_nr], h->action,
 				       prev_count, preempt_count());
-				preempt_count() = prev_count;
+				*preempt_count_ptr() = prev_count;
 			}
 
 			rcu_bh_qs(cpu);
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1092,7 +1092,7 @@ static int cascade(struct tvec_base *bas
 static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long),
 			  unsigned long data)
 {
-	int preempt_count = preempt_count();
+	int count = preempt_count();
 
 #ifdef CONFIG_LOCKDEP
 	/*
@@ -1119,16 +1119,16 @@ static void call_timer_fn(struct timer_l
 
 	lock_map_release(&lockdep_map);
 
-	if (preempt_count != preempt_count()) {
+	if (count != preempt_count()) {
 		WARN_ONCE(1, "timer: %pF preempt leak: %08x -> %08x\n",
-			  fn, preempt_count, preempt_count());
+			  fn, count, preempt_count());
 		/*
 		 * Restore the preempt count. That gives us a decent
 		 * chance to survive and extract information. If the
 		 * callback kept a lock held, bad luck, but not worse
 		 * than the BUG() we had.
 		 */
-		preempt_count() = preempt_count;
+		*preempt_count_ptr() = count;
 	}
 }
 
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -9,10 +9,9 @@
 
 notrace unsigned int debug_smp_processor_id(void)
 {
-	unsigned long preempt_count = preempt_count();
 	int this_cpu = raw_smp_processor_id();
 
-	if (likely(preempt_count))
+	if (likely(preempt_count()))
 		goto out;
 
 	if (irqs_disabled())

* [RFC][PATCH 2/5] sched: Add NEED_RESCHED to the preempt_count
  2013-08-14 13:15 [RFC][PATCH 0/5] preempt_count rework Peter Zijlstra
  2013-08-14 13:15 ` Peter Zijlstra
  2013-08-14 13:15 ` [RFC][PATCH 1/5] sched: Introduce preempt_count accessor functions Peter Zijlstra
@ 2013-08-14 13:15 ` Peter Zijlstra
  2013-08-14 13:15   ` Peter Zijlstra
  2013-08-14 13:15 ` [RFC][PATCH 3/5] sched, arch: Create asm/preempt.h Peter Zijlstra
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2013-08-14 13:15 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, linux-kernel, linux-arch, Peter Zijlstra

[-- Attachment #1: peterz-preempt_count-need_resched.patch --]
[-- Type: text/plain, Size: 7527 bytes --]

In order to combine the preemption and need_resched tests we need to
fold the need_resched information into the preempt_count value.

We keep the existing TIF_NEED_RESCHED infrastructure in place, but test it
and fold its value into preempt_count at three sites, namely:

 - resched_task(), when setting TIF_NEED_RESCHED on the current task;
 - scheduler_ipi(): when resched_task() sets TIF_NEED_RESCHED on a
                    remote task it follows up with a reschedule IPI,
                    and we can modify the cpu-local preempt_count from
                    there;
 - cpu_idle_loop(), for when resched_task() found tsk_is_polling().

We use an inverted bitmask to indicate need_resched so that a 0 means
both need_resched and !atomic.
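
To make the inversion concrete, an illustration (count values picked
arbitrarily) of how the raw count reads once the bit is folded in:

/*
 *   raw preempt_count              meaning
 *   PREEMPT_NEED_RESCHED | 0       preemptible, no resched pending
 *   PREEMPT_NEED_RESCHED | 2       preempt disabled, no resched pending
 *                          2       preempt disabled, resched pending
 *                          0       preemptible AND resched pending
 *
 * so "may we and must we reschedule?" becomes a single test, which is
 * what the new preempt_check_resched() below does:
 */
if (unlikely(!*preempt_count_ptr()))
	preempt_schedule();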

Also remove the barrier() in preempt_enable() between
preempt_enable_no_resched() and preempt_check_resched(), to avoid
having to reload the preemption value and to allow the compiler to use
the flags from the previous decrement. I couldn't come up with any sane
reason for this barrier() to be there, as preempt_enable_no_resched()
already has a barrier() before doing the decrement.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-7io71l00ukk9a78unrglfldi@git.kernel.org
---
 include/linux/preempt.h     |   42 +++++++++++++++++++++++++++++++++++++-----
 include/linux/sched.h       |    2 +-
 include/linux/thread_info.h |    1 +
 kernel/context_tracking.c   |    3 +--
 kernel/cpu/idle.c           |   10 ++++++++++
 kernel/sched/core.c         |   20 ++++++++++++++------
 6 files changed, 64 insertions(+), 14 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -10,9 +10,19 @@
 #include <linux/linkage.h>
 #include <linux/list.h>
 
+/*
+ * We use the MSB mostly because it's available; see <linux/hardirq.h> for
+ * the other bits.
+ */
+#define PREEMPT_NEED_RESCHED	0x80000000
+
+/*
+ * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
+ * that think a non-zero value indicates we cannot preempt.
+ */
 static __always_inline int preempt_count(void)
 {
-	return current_thread_info()->preempt_count;
+	return current_thread_info()->preempt_count & ~PREEMPT_NEED_RESCHED;
 }
 
 static __always_inline int *preempt_count_ptr(void)
@@ -20,6 +30,30 @@ static __always_inline int *preempt_coun
 	return &current_thread_info()->preempt_count;
 }
 
+/*
+ * We fold the NEED_RESCHED bit into the preempt count such that
+ * preempt_enable() can decrement and test for needing to reschedule with a
+ * single instruction.
+ *
+ * We invert the actual bit, so that when the decrement hits 0 we know we both
+ * need to resched (the bit is cleared) and can resched (no preempt count).
+ */
+
+static __always_inline void set_preempt_need_resched(void)
+{
+	*preempt_count_ptr() &= ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline void clear_preempt_need_resched(void)
+{
+	*preempt_count_ptr() |= PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline bool test_preempt_need_resched(void)
+{
+	return !(*preempt_count_ptr() & PREEMPT_NEED_RESCHED);
+}
+
 #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
   extern void add_preempt_count(int val);
   extern void sub_preempt_count(int val);
@@ -37,7 +71,7 @@ asmlinkage void preempt_schedule(void);
 
 #define preempt_check_resched() \
 do { \
-	if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+	if (unlikely(!*preempt_count_ptr())) \
 		preempt_schedule(); \
 } while (0)
 
@@ -47,7 +81,7 @@ void preempt_schedule_context(void);
 
 #define preempt_check_resched_context() \
 do { \
-	if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+	if (unlikely(!*preempt_count_ptr())) \
 		preempt_schedule_context(); \
 } while (0)
 #else
@@ -83,7 +117,6 @@ do { \
 #define preempt_enable() \
 do { \
 	preempt_enable_no_resched(); \
-	barrier(); \
 	preempt_check_resched(); \
 } while (0)
 
@@ -111,7 +144,6 @@ do { \
 #define preempt_enable_notrace() \
 do { \
 	preempt_enable_no_resched_notrace(); \
-	barrier(); \
 	preempt_check_resched_context(); \
 } while (0)
 
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2405,7 +2405,7 @@ static inline int signal_pending_state(l
 
 static inline int need_resched(void)
 {
-	return unlikely(test_thread_flag(TIF_NEED_RESCHED));
+	return unlikely(test_preempt_need_resched());
 }
 
 /*
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -104,6 +104,7 @@ static inline int test_ti_thread_flag(st
 #define test_thread_flag(flag) \
 	test_ti_thread_flag(current_thread_info(), flag)
 
+#define test_need_resched()	test_thread_flag(TIF_NEED_RESCHED)
 #define set_need_resched()	set_thread_flag(TIF_NEED_RESCHED)
 #define clear_need_resched()	clear_thread_flag(TIF_NEED_RESCHED)
 
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -87,10 +87,9 @@ void user_enter(void)
  */
 void __sched notrace preempt_schedule_context(void)
 {
-	struct thread_info *ti = current_thread_info();
 	enum ctx_state prev_ctx;
 
-	if (likely(ti->preempt_count || irqs_disabled()))
+	if (likely(preempt_count() || irqs_disabled()))
 		return;
 
 	/*
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -106,6 +106,16 @@ static void cpu_idle_loop(void)
 				current_set_polling();
 			}
 			arch_cpu_idle_exit();
+			/*
+			 * We need to test and propagate the TIF_NEED_RESCHED
+			 * bit here because we might not have sent the
+			 * reschedule IPI to idle tasks.
+			 *
+			 * XXX is there a better place in the generic idle code
+			 *     for this?
+			 */
+			if (test_need_resched())
+				set_preempt_need_resched();
 		}
 		tick_nohz_idle_exit();
 		schedule_preempt_disabled();
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -526,8 +526,10 @@ void resched_task(struct task_struct *p)
 	set_tsk_need_resched(p);
 
 	cpu = task_cpu(p);
-	if (cpu == smp_processor_id())
+	if (cpu == smp_processor_id()) {
+		set_preempt_need_resched();
 		return;
+	}
 
 	/* NEED_RESCHED must be visible before we test polling */
 	smp_mb();
@@ -1411,6 +1413,14 @@ static void sched_ttwu_pending(void)
 
 void scheduler_ipi(void)
 {
+	/*
+	 * Fold TIF_NEED_RESCHED into the preempt_count; anybody setting
+	 * TIF_NEED_RESCHED remotely (for the first time) will also send
+	 * this IPI.
+	 */
+	if (test_need_resched())
+		set_preempt_need_resched();
+
 	if (llist_empty(&this_rq()->wake_list)
 			&& !tick_nohz_full_cpu(smp_processor_id())
 			&& !got_nohz_idle_kick())
@@ -2433,6 +2443,7 @@ static void __sched __schedule(void)
 	put_prev_task(rq, prev);
 	next = pick_next_task(rq);
 	clear_tsk_need_resched(prev);
+	clear_preempt_need_resched();
 	rq->skip_clock_update = 0;
 
 	if (likely(prev != next)) {
@@ -2515,13 +2526,11 @@ void __sched schedule_preempt_disabled(v
  */
 asmlinkage void __sched notrace preempt_schedule(void)
 {
-	struct thread_info *ti = current_thread_info();
-
 	/*
 	 * If there is a non-zero preempt_count or interrupts are disabled,
 	 * we do not want to preempt the current task. Just return..
 	 */
-	if (likely(ti->preempt_count || irqs_disabled()))
+	if (likely(preempt_count() || irqs_disabled()))
 		return;
 
 	do {
@@ -2546,11 +2555,10 @@ EXPORT_SYMBOL(preempt_schedule);
  */
 asmlinkage void __sched preempt_schedule_irq(void)
 {
-	struct thread_info *ti = current_thread_info();
 	enum ctx_state prev_state;
 
 	/* Catch callers which need to be fixed */
-	BUG_ON(ti->preempt_count || !irqs_disabled());
+	BUG_ON(preempt_count() || !irqs_disabled());
 
 	prev_state = exception_enter();
 

* [RFC][PATCH 3/5] sched, arch: Create asm/preempt.h
  2013-08-14 13:15 [RFC][PATCH 0/5] preempt_count rework Peter Zijlstra
                   ` (2 preceding siblings ...)
  2013-08-14 13:15 ` [RFC][PATCH 2/5] sched: Add NEED_RESCHED to the preempt_count Peter Zijlstra
@ 2013-08-14 13:15 ` Peter Zijlstra
  2013-08-14 13:15   ` Peter Zijlstra
  2013-08-14 13:15 ` [RFC][PATCH 4/5] sched: Create more preempt_count accessors Peter Zijlstra
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2013-08-14 13:15 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, linux-kernel, linux-arch, Peter Zijlstra

[-- Attachment #1: peterz-asm-preempt_count.patch --]
[-- Type: text/plain, Size: 9299 bytes --]

In order to prepare for per-arch implementations of preempt_count, move
the required bits into an asm-generic header and use it for all
archs.
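
To illustrate where this is heading: an architecture can now override the
generic header entirely. The sketch below is hypothetical (the real x86
conversion is patch 5/5; the __preempt_count per-cpu variable name is an
assumption, not taken from that patch):

/* hypothetical arch/x86/include/asm/preempt.h, for illustration only */
#include <linux/percpu.h>

DECLARE_PER_CPU(int, __preempt_count);	/* assumed name */

static __always_inline int preempt_count(void)
{
	return __this_cpu_read(__preempt_count) & ~PREEMPT_NEED_RESCHED;
}

static __always_inline int *preempt_count_ptr(void)
{
	return this_cpu_ptr(&__preempt_count);
}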

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-svxlmlhaa58low5tko1itj2v@git.kernel.org
---
 arch/alpha/include/asm/Kbuild      |    1 +
 arch/arc/include/asm/Kbuild        |    1 +
 arch/arm/include/asm/Kbuild        |    1 +
 arch/arm64/include/asm/Kbuild      |    1 +
 arch/avr32/include/asm/Kbuild      |    1 +
 arch/blackfin/include/asm/Kbuild   |    1 +
 arch/c6x/include/asm/Kbuild        |    1 +
 arch/cris/include/asm/Kbuild       |    1 +
 arch/frv/include/asm/Kbuild        |    1 +
 arch/h8300/include/asm/Kbuild      |    1 +
 arch/hexagon/include/asm/Kbuild    |    1 +
 arch/ia64/include/asm/Kbuild       |    1 +
 arch/m32r/include/asm/Kbuild       |    1 +
 arch/m68k/include/asm/Kbuild       |    1 +
 arch/metag/include/asm/Kbuild      |    1 +
 arch/microblaze/include/asm/Kbuild |    1 +
 arch/mips/include/asm/Kbuild       |    1 +
 arch/mn10300/include/asm/Kbuild    |    1 +
 arch/openrisc/include/asm/Kbuild   |    1 +
 arch/parisc/include/asm/Kbuild     |    1 +
 arch/powerpc/include/asm/Kbuild    |    1 +
 arch/s390/include/asm/Kbuild       |    1 +
 arch/score/include/asm/Kbuild      |    1 +
 arch/sh/include/asm/Kbuild         |    1 +
 arch/sparc/include/asm/Kbuild      |    1 +
 arch/tile/include/asm/Kbuild       |    1 +
 arch/um/include/asm/Kbuild         |    1 +
 arch/unicore32/include/asm/Kbuild  |    1 +
 arch/x86/include/asm/Kbuild        |    1 +
 arch/xtensa/include/asm/Kbuild     |    1 +
 include/asm-generic/preempt.h      |   20 ++++++++++++++++++++
 include/linux/preempt.h            |   15 +--------------
 32 files changed, 51 insertions(+), 14 deletions(-)

--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
 
 generic-y += exec.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -46,3 +46,4 @@ generic-y += ucontext.h
 generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -33,3 +33,4 @@ generic-y += timex.h
 generic-y += trace_clock.h
 generic-y += types.h
 generic-y += unaligned.h
+generic-y += preempt.h
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -50,3 +50,4 @@ generic-y += unaligned.h
 generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/avr32/include/asm/Kbuild
+++ b/arch/avr32/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y	+= clkdev.h
 generic-y	+= exec.h
 generic-y	+= trace_clock.h
 generic-y	+= param.h
+generic-y += preempt.h
--- a/arch/blackfin/include/asm/Kbuild
+++ b/arch/blackfin/include/asm/Kbuild
@@ -44,3 +44,4 @@ generic-y += ucontext.h
 generic-y += unaligned.h
 generic-y += user.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/c6x/include/asm/Kbuild
+++ b/arch/c6x/include/asm/Kbuild
@@ -56,3 +56,4 @@ generic-y += ucontext.h
 generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/cris/include/asm/Kbuild
+++ b/arch/cris/include/asm/Kbuild
@@ -11,3 +11,4 @@ generic-y += module.h
 generic-y += trace_clock.h
 generic-y += vga.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/frv/include/asm/Kbuild
+++ b/arch/frv/include/asm/Kbuild
@@ -2,3 +2,4 @@
 generic-y += clkdev.h
 generic-y += exec.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/h8300/include/asm/Kbuild
+++ b/arch/h8300/include/asm/Kbuild
@@ -6,3 +6,4 @@ generic-y += mmu.h
 generic-y += module.h
 generic-y += trace_clock.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -53,3 +53,4 @@ generic-y += types.h
 generic-y += ucontext.h
 generic-y += unaligned.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
 generic-y += exec.h
 generic-y += kvm_para.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/m32r/include/asm/Kbuild
+++ b/arch/m32r/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
 generic-y += exec.h
 generic-y += module.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -31,3 +31,4 @@ generic-y += trace_clock.h
 generic-y += types.h
 generic-y += word-at-a-time.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/metag/include/asm/Kbuild
+++ b/arch/metag/include/asm/Kbuild
@@ -52,3 +52,4 @@ generic-y += unaligned.h
 generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
 generic-y += exec.h
 generic-y += trace_clock.h
 generic-y += syscalls.h
+generic-y += preempt.h
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -1,2 +1,3 @@
 # MIPS headers
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/mn10300/include/asm/Kbuild
+++ b/arch/mn10300/include/asm/Kbuild
@@ -2,3 +2,4 @@
 generic-y += clkdev.h
 generic-y += exec.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -67,3 +67,4 @@ generic-y += ucontext.h
 generic-y += user.h
 generic-y += word-at-a-time.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -4,3 +4,4 @@ generic-y += word-at-a-time.h auxvec.h u
 	  div64.h irq_regs.h kdebug.h kvm_para.h local64.h local.h param.h \
 	  poll.h xor.h clkdev.h exec.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -2,3 +2,4 @@
 generic-y += clkdev.h
 generic-y += rwsem.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -2,3 +2,4 @@
 
 generic-y += clkdev.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/score/include/asm/Kbuild
+++ b/arch/score/include/asm/Kbuild
@@ -4,3 +4,4 @@ header-y +=
 generic-y += clkdev.h
 generic-y += trace_clock.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -34,3 +34,4 @@ generic-y += termios.h
 generic-y += trace_clock.h
 generic-y += ucontext.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -16,3 +16,4 @@ generic-y += serial.h
 generic-y += trace_clock.h
 generic-y += types.h
 generic-y += word-at-a-time.h
+generic-y += preempt.h
--- a/arch/tile/include/asm/Kbuild
+++ b/arch/tile/include/asm/Kbuild
@@ -37,3 +37,4 @@ generic-y += termios.h
 generic-y += trace_clock.h
 generic-y += types.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += hw_irq.h irq_regs.h kdebug.
 generic-y += ftrace.h pci.h io.h param.h delay.h mutex.h current.h exec.h
 generic-y += switch_to.h clkdev.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/unicore32/include/asm/Kbuild
+++ b/arch/unicore32/include/asm/Kbuild
@@ -60,3 +60,4 @@ generic-y += unaligned.h
 generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -5,3 +5,4 @@ genhdr-y += unistd_64.h
 genhdr-y += unistd_x32.h
 
 generic-y += clkdev.h
+generic-y += preempt.h
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -28,3 +28,4 @@ generic-y += termios.h
 generic-y += topology.h
 generic-y += trace_clock.h
 generic-y += xor.h
+generic-y += preempt.h
--- /dev/null
+++ b/include/asm-generic/preempt.h
@@ -0,0 +1,20 @@
+#ifndef __ASM_PREEMPT_H
+#define __ASM_PREEMPT_H
+
+#include <linux/thread_info.h>
+
+/*
+ * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
+ * that think a non-zero value indicates we cannot preempt.
+ */
+static __always_inline int preempt_count(void)
+{
+	return current_thread_info()->preempt_count & ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline int *preempt_count_ptr(void)
+{
+	return &current_thread_info()->preempt_count;
+}
+
+#endif /* __ASM_PREEMPT_H */
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -6,7 +6,6 @@
  * preempt_count (used for kernel preemption, interrupt count, etc.)
  */
 
-#include <linux/thread_info.h>
 #include <linux/linkage.h>
 #include <linux/list.h>
 
@@ -16,19 +15,7 @@
  */
 #define PREEMPT_NEED_RESCHED	0x80000000
 
-/*
- * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
- * that think a non-zero value indicates we cannot preempt.
- */
-static __always_inline int preempt_count(void)
-{
-	return current_thread_info()->preempt_count & ~PREEMPT_NEED_RESCHED;
-}
-
-static __always_inline int *preempt_count_ptr(void)
-{
-	return &current_thread_info()->preempt_count;
-}
+#include <asm/preempt.h>
 
 /*
  * We fold the NEED_RESCHED bit into the preempt count such that

* [RFC][PATCH 4/5] sched: Create more preempt_count accessors
  2013-08-14 13:15 [RFC][PATCH 0/5] preempt_count rework Peter Zijlstra
                   ` (3 preceding siblings ...)
  2013-08-14 13:15 ` [RFC][PATCH 3/5] sched, arch: Create asm/preempt.h Peter Zijlstra
@ 2013-08-14 13:15 ` Peter Zijlstra
  2013-08-14 13:15   ` Peter Zijlstra
  2013-08-14 13:15 ` [RFC][PATCH 5/5] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2013-08-14 13:15 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, linux-kernel, linux-arch, Peter Zijlstra

[-- Attachment #1: peterz-task_preempt_count.patch --]
[-- Type: text/plain, Size: 2742 bytes --]

We need a few special preempt_count accessors:
 - task_preempt_count() for when we're interested in the preemption
   count of another (non-running) task.
 - init_task_preempt_count() for properly initializing the preemption
   count.
 - init_idle_preempt_count(), a special case of the above for the idle
   threads.

With these, no generic code ever touches thread_info::preempt_count
anymore, and architectures could choose to remove it.
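
As a reading aid for the initial values (not new code, just the generic
expansions from the hunks below): because the NEED_RESCHED bit is inverted,
a freshly forked task gets the bit set, meaning no reschedule pending, on
top of the usual count of 1:

init_task_preempt_count(p);
/* expands (generic version) to:
 *   task_thread_info(p)->preempt_count = 1 | PREEMPT_NEED_RESCHED;
 * i.e. one level of preempt disable, no reschedule pending.
 */

init_idle_preempt_count(idle, cpu);
/* expands (generic version) to:
 *   task_thread_info(idle)->preempt_count = 0;
 * i.e. the idle thread starts with a zero count.
 */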

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-ko6tuved7x9y0t08qxhhnubz@git.kernel.org
---
 include/asm-generic/preempt.h |   14 ++++++++++++++
 include/trace/events/sched.h  |    2 +-
 kernel/sched/core.c           |    7 +++----
 3 files changed, 18 insertions(+), 5 deletions(-)

--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -17,4 +17,18 @@ static __always_inline int *preempt_coun
 	return &current_thread_info()->preempt_count;
 }
 
+/*
+ * must be macros to avoid header recursion hell
+ */
+#define task_preempt_count(p) \
+	(task_thread_info(p)->preempt_count & ~PREEMPT_NEED_RESCHED)
+
+#define init_task_preempt_count(p) do { \
+	task_thread_info(p)->preempt_count = 1 | PREEMPT_NEED_RESCHED; \
+} while (0)
+
+#define init_idle_preempt_count(p, cpu) do { \
+	task_thread_info(p)->preempt_count = 0; \
+} while (0)
+
 #endif /* __ASM_PREEMPT_H */
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -103,7 +103,7 @@ static inline long __trace_sched_switch_
 	/*
 	 * For all intents and purposes a preempted task is a running task.
 	 */
-	if (task_thread_info(p)->preempt_count & PREEMPT_ACTIVE)
+	if (task_preempt_count(p) & PREEMPT_ACTIVE)
 		state = TASK_RUNNING | TASK_STATE_MAX;
 #endif
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -996,7 +996,7 @@ void set_task_cpu(struct task_struct *p,
 	 * ttwu() will sort out the placement.
 	 */
 	WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
-			!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
+			!(task_preempt_count(p) & PREEMPT_ACTIVE));
 
 #ifdef CONFIG_LOCKDEP
 	/*
@@ -1737,8 +1737,7 @@ void sched_fork(struct task_struct *p)
 	p->on_cpu = 0;
 #endif
 #ifdef CONFIG_PREEMPT_COUNT
-	/* Want to start with kernel preemption disabled. */
-	task_thread_info(p)->preempt_count = 1;
+	init_task_preempt_count(p);
 #endif
 #ifdef CONFIG_SMP
 	plist_node_init(&p->pushable_tasks, MAX_PRIO);
@@ -4225,7 +4224,7 @@ void init_idle(struct task_struct *idle,
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 
 	/* Set the preempt count _outside_ the spinlocks! */
-	task_thread_info(idle)->preempt_count = 0;
+	init_idle_preempt_count(idle, cpu);
 
 	/*
 	 * The idle tasks have their own, simple scheduling class:

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC][PATCH 5/5] sched, x86: Provide a per-cpu preempt_count implementation
  2013-08-14 13:15 [RFC][PATCH 0/5] preempt_count rework Peter Zijlstra
                   ` (4 preceding siblings ...)
  2013-08-14 13:15 ` [RFC][PATCH 4/5] sched: Create more preempt_count accessors Peter Zijlstra
@ 2013-08-14 13:15 ` Peter Zijlstra
  2013-08-14 13:47 ` [RFC][PATCH 0/5] preempt_count rework H. Peter Anvin
  2013-08-14 16:48 ` Andi Kleen
  7 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2013-08-14 13:15 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, linux-kernel, linux-arch, Peter Zijlstra

[-- Attachment #1: peterz-x86-per-cpu-preempt_count.patch --]
[-- Type: text/plain, Size: 6784 bytes --]

Convert x86 to use a per-cpu preemption count. The reason for doing so
is that accessing per-cpu variables is a lot cheaper than accessing
thread_info variables.

With this, preempt_enable() reduces to:

  decl fs:__preempt_count
  jnz 1f
  call preempt_schedule
1:

We still need to save/restore the actual preemption count due to
PREEMPT_ACTIVE, so we place the per-cpu __preempt_count variable in the
same cache-line as the other hot __switch_to() variables such as
current_task.

Also rename thread_info::preempt_count to ensure nobody is
'accidentally' still poking at it.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-dechwc0p86ik69ajunev8brp@git.kernel.org
---
 arch/x86/include/asm/Kbuild        |    1 
 arch/x86/include/asm/preempt.h     |   38 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/thread_info.h |    5 +---
 arch/x86/kernel/asm-offsets.c      |    1 
 arch/x86/kernel/cpu/common.c       |    5 ++++
 arch/x86/kernel/entry_32.S         |    7 +-----
 arch/x86/kernel/entry_64.S         |    4 ---
 arch/x86/kernel/process_32.c       |   10 +++++++++
 arch/x86/kernel/process_64.c       |   10 +++++++++
 9 files changed, 68 insertions(+), 13 deletions(-)

--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -5,4 +5,3 @@ genhdr-y += unistd_64.h
 genhdr-y += unistd_x32.h
 
 generic-y += clkdev.h
-generic-y += preempt.h
--- /dev/null
+++ b/arch/x86/include/asm/preempt.h
@@ -0,0 +1,38 @@
+#ifndef __ASM_PREEMPT_H
+#define __ASM_PREEMPT_H
+
+#include <asm/percpu.h>
+#include <linux/thread_info.h>
+
+DECLARE_PER_CPU(int, __preempt_count);
+
+/*
+ * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
+ * that think a non-zero value indicates we cannot preempt.
+ */
+static __always_inline int preempt_count(void)
+{
+	return __raw_get_cpu_var(__preempt_count) & ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline int *preempt_count_ptr(void)
+{
+	return &__raw_get_cpu_var(__preempt_count);
+}
+
+/*
+ * must be macros to avoid header recursion hell
+ */
+#define task_preempt_count(p) \
+	(task_thread_info(p)->saved_preempt_count & ~PREEMPT_NEED_RESCHED)
+
+#define init_task_preempt_count(p) do { \
+	task_thread_info(p)->saved_preempt_count = 1 | PREEMPT_NEED_RESCHED; \
+} while (0)
+
+#define init_idle_preempt_count(p, cpu) do { \
+	task_thread_info(p)->saved_preempt_count = 0; \
+	per_cpu(__preempt_count, (cpu)) = 0; \
+} while (0)
+
+#endif /* __ASM_PREEMPT_H */
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -28,8 +28,7 @@ struct thread_info {
 	__u32			flags;		/* low level flags */
 	__u32			status;		/* thread synchronous flags */
 	__u32			cpu;		/* current CPU */
-	int			preempt_count;	/* 0 => preemptable,
-						   <0 => BUG */
+	int			saved_preempt_count;
 	mm_segment_t		addr_limit;
 	struct restart_block    restart_block;
 	void __user		*sysenter_return;
@@ -49,7 +48,7 @@ struct thread_info {
 	.exec_domain	= &default_exec_domain,	\
 	.flags		= 0,			\
 	.cpu		= 0,			\
-	.preempt_count	= INIT_PREEMPT_COUNT,	\
+	.saved_preempt_count = INIT_PREEMPT_COUNT,	\
 	.addr_limit	= KERNEL_DS,		\
 	.restart_block = {			\
 		.fn = do_no_restart_syscall,	\
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -32,7 +32,6 @@ void common(void) {
 	OFFSET(TI_flags, thread_info, flags);
 	OFFSET(TI_status, thread_info, status);
 	OFFSET(TI_addr_limit, thread_info, addr_limit);
-	OFFSET(TI_preempt_count, thread_info, preempt_count);
 
 	BLANK();
 	OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1095,6 +1095,9 @@ DEFINE_PER_CPU(char *, irq_stack_ptr) =
 
 DEFINE_PER_CPU(unsigned int, irq_count) = -1;
 
+DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
+EXPORT_PER_CPU_SYMBOL(__preempt_count);
+
 DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
 
 /*
@@ -1169,6 +1172,8 @@ void debug_stack_reset(void)
 
 DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
+DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
+EXPORT_PER_CPU_SYMBOL(__preempt_count);
 DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
 
 #ifdef CONFIG_CC_STACKPROTECTOR
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -362,12 +362,9 @@ END(ret_from_exception)
 #ifdef CONFIG_PREEMPT
 ENTRY(resume_kernel)
 	DISABLE_INTERRUPTS(CLBR_ANY)
-	cmpl $0,TI_preempt_count(%ebp)	# non-zero preempt_count ?
-	jnz restore_all
 need_resched:
-	movl TI_flags(%ebp), %ecx	# need_resched set ?
-	testb $_TIF_NEED_RESCHED, %cl
-	jz restore_all
+	cmpl $0,PER_CPU_VAR(__preempt_count)
+	jnz restore_all
 	testl $X86_EFLAGS_IF,PT_EFLAGS(%esp)	# interrupts off (exception path) ?
 	jz restore_all
 	call preempt_schedule_irq
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1118,10 +1118,8 @@ ENTRY(native_iret)
 	/* Returning to kernel space. Check if we need preemption */
 	/* rcx:	 threadinfo. interrupts off. */
 ENTRY(retint_kernel)
-	cmpl $0,TI_preempt_count(%rcx)
+	cmpl $0,PER_CPU_VAR(__preempt_count)
 	jnz  retint_restore_args
-	bt  $TIF_NEED_RESCHED,TI_flags(%rcx)
-	jnc  retint_restore_args
 	bt   $9,EFLAGS-ARGOFFSET(%rsp)	/* interrupts off? */
 	jnc  retint_restore_args
 	call preempt_schedule_irq
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -291,6 +291,16 @@ __switch_to(struct task_struct *prev_p,
 	if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
 		set_iopl_mask(next->iopl);
 
+#ifdef CONFIG_PREEMPT_COUNT
+	/*
+	 * If it were not for PREEMPT_ACTIVE we could guarantee that the
+	 * preempt_count of all tasks was equal here and this would not be
+	 * needed.
+	 */
+	task_thread_info(prev_p)->saved_preempt_count = __raw_get_cpu_var(__preempt_count);
+	__raw_get_cpu_var(__preempt_count) = task_thread_info(next_p)->saved_preempt_count;
+#endif
+
 	/*
 	 * Now maybe handle debug registers and/or IO bitmaps
 	 */
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -363,6 +363,16 @@ __switch_to(struct task_struct *prev_p,
 	this_cpu_write(old_rsp, next->usersp);
 	this_cpu_write(current_task, next_p);
 
+#ifdef CONFIG_PREEMPT_COUNT
+	/*
+	 * If it were not for PREEMPT_ACTIVE we could guarantee that the
+	 * preempt_count of all tasks was equal here and this would not be
+	 * needed.
+	 */
+	task_thread_info(prev_p)->saved_preempt_count = __raw_get_cpu_var(__preempt_count);
+	__raw_get_cpu_var(__preempt_count) = task_thread_info(next_p)->saved_preempt_count;
+#endif
+
 	this_cpu_write(kernel_stack,
 		  (unsigned long)task_stack_page(next_p) +
 		  THREAD_SIZE - KERNEL_STACK_OFFSET);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 0/5] preempt_count rework
  2013-08-14 13:15 [RFC][PATCH 0/5] preempt_count rework Peter Zijlstra
                   ` (5 preceding siblings ...)
  2013-08-14 13:15 ` [RFC][PATCH 5/5] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
@ 2013-08-14 13:47 ` H. Peter Anvin
  2013-08-14 15:39   ` Mike Galbraith
  2013-08-14 16:04   ` Peter Zijlstra
  2013-08-14 16:48 ` Andi Kleen
  7 siblings, 2 replies; 26+ messages in thread
From: H. Peter Anvin @ 2013-08-14 13:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, linux-kernel, linux-arch

On 08/14/2013 06:15 AM, Peter Zijlstra wrote:
> These patches optimize preempt_enable by firstly folding the preempt and
> need_resched tests into one -- this should work for all architectures. And
> secondly by providing per-arch preempt_count implementations; with x86 using
> per-cpu preempt_count for fastest access.
> 
> These patches have so far only been compiled for defconfig-x86_64 +
> CONFIG_PREEMPT=y and boot tested with kvm -smp 4 upto wanting to mount root.
> 
> It still needs asm volatile("call preempt_schedule": : :"memory"); as per
> Andi's other patches to avoid the C calling convention cluttering the
> preempt_enable() sites.

Hi,

I still don't see this using a decrement of the percpu variable
anywhere.  The C compiler doesn't know how to generate those, so if I'm
not completely wet we will end up relying on sub_preempt_count()...
which, because it relies on taking the address of the percpu variable
will generate absolutely horrific code.

On x86, you never want to take the address of a percpu variable if you
can avoid it, as you end up generating code like:

	movq %fs:0,%rax
	subl $1,(%rax)

... for absolutely no good reason.  You can use the existing accessors
for percpu variables, but that would make you lose the flags output
which was part of the point, so I think the whole sequence needs to be
in assembly (note that once you are manipulating percpu state you are
already in assembly.)

	-hpa
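
A minimal sketch of the accessor-vs-pointer difference described above,
assuming the per-cpu __preempt_count from patch 5/5 and the stock
this_cpu_*() helpers; the function names here are illustrative only:

#include <linux/percpu.h>

DECLARE_PER_CPU(int, __preempt_count);

/* Illustrative sketch. Pointer form: the per-cpu address is
 * materialized first (segment-base load plus an add), then the
 * decrement goes through it -- the expensive two-step sequence
 * quoted above. */
static inline void preempt_count_dec_via_ptr(void)
{
	int *p = this_cpu_ptr(&__preempt_count);
	(*p)--;
}

/* Accessor form: the segment prefix does the addressing, so this
 * compiles to roughly a single decl of the per-cpu location. */
static inline void preempt_count_dec_via_accessor(void)
{
	this_cpu_dec(__preempt_count);
}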

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 0/5] preempt_count rework
  2013-08-14 13:47 ` [RFC][PATCH 0/5] preempt_count rework H. Peter Anvin
@ 2013-08-14 15:39   ` Mike Galbraith
  2013-08-14 15:39     ` Mike Galbraith
                       ` (2 more replies)
  2013-08-14 16:04   ` Peter Zijlstra
  1 sibling, 3 replies; 26+ messages in thread
From: Mike Galbraith @ 2013-08-14 15:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Andi Kleen,
	Thomas Gleixner, Arjan van de Ven, linux-kernel, linux-arch

On Wed, 2013-08-14 at 06:47 -0700, H. Peter Anvin wrote:

> On x86, you never want to take the address of a percpu variable if you
> can avoid it, as you end up generating code like:
> 
> 	movq %fs:0,%rax
> 	subl $1,(%rax)

Hmmm..

#define cpu_rq(cpu)             (&per_cpu(runqueues, (cpu)))
#define this_rq()               (&__get_cpu_var(runqueues))

ffffffff81438c7f:       48 c7 c3 80 11 01 00    mov    $0x11180,%rbx
        /*
         * this_rq must be evaluated again because prev may have moved
         * CPUs since it called schedule(), thus the 'rq' on its stack
         * frame will be invalid.
         */
        finish_task_switch(this_rq(), prev);
ffffffff81438c86:       e8 25 b4 c0 ff          callq  ffffffff810440b0 <finish_task_switch>
                 * The context switch have flipped the stack from under us
                 * and restored the local variables which were saved when
                 * this task called schedule() in the past. prev == current
                 * is still correct, but it can be moved to another cpu/rq.
                 */
                cpu = smp_processor_id();
ffffffff81438c8b:       65 8b 04 25 b8 c5 00    mov    %gs:0xc5b8,%eax
ffffffff81438c92:       00
                rq = cpu_rq(cpu);
ffffffff81438c93:       48 98                   cltq
ffffffff81438c95:       48 03 1c c5 00 f3 bb    add    -0x7e440d00(,%rax,8),%rbx

..so could the rq = cpu_rq(cpu) sequence be improved cycle expenditure
wise by squirreling rq pointer away in a percpu this_rq, and replacing
cpu_rq(cpu) above with a __this_cpu_read(this_rq) version of this_rq()?

-Mike

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 0/5] preempt_count rework
  2013-08-14 15:39   ` Mike Galbraith
  2013-08-14 15:39     ` Mike Galbraith
@ 2013-08-14 15:43     ` H. Peter Anvin
  2013-08-15  9:01       ` Mike Galbraith
  2013-08-14 16:06     ` Peter Zijlstra
  2 siblings, 1 reply; 26+ messages in thread
From: H. Peter Anvin @ 2013-08-14 15:43 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Andi Kleen,
	Thomas Gleixner, Arjan van de Ven, linux-kernel, linux-arch

On 08/14/2013 08:39 AM, Mike Galbraith wrote:
> 
> ..so could the rq = cpu_rq(cpu) sequence be improved cycle expenditure
> wise by squirreling rq pointer away in a percpu this_rq, and replacing
> cpu_rq(cpu) above with a __this_cpu_read(this_rq) version of this_rq()?
> 

Yes.

	-hpa

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 0/5] preempt_count rework
  2013-08-14 13:47 ` [RFC][PATCH 0/5] preempt_count rework H. Peter Anvin
  2013-08-14 15:39   ` Mike Galbraith
@ 2013-08-14 16:04   ` Peter Zijlstra
  2013-08-14 17:31     ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2013-08-14 16:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, linux-kernel, linux-arch

On Wed, Aug 14, 2013 at 06:47:45AM -0700, H. Peter Anvin wrote:

> I still don't see this using a decrement of the percpu variable
> anywhere.  The C compiler doesn't know how to generate those, so if I'm
> not completely wet we will end up relying on sub_preempt_count()...
> which, because it relies on taking the address of the percpu variable
> will generate absolutely horrific code.
> 
> On x86, you never want to take the address of a percpu variable if you
> can avoid it, as you end up generating code like:
> 
> 	movq %fs:0,%rax
> 	subl $1,(%rax)
> 

Urgh, yes you're right. I keep forgetting GCC doesn't know how to merge
those :/

OK, so something like the below would cure the worst of that I suppose.
It compiles but doesn't boot; must've done something wrong.

Someone please look at it because my asm-foo blows. I pretty much
copy/pasted this from asm/percpu.h.

---
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -20,6 +20,28 @@ static __always_inline int *preempt_coun
 	return &__raw_get_cpu_var(__preempt_count);
 }
 
+#define __preempt_count_add(x) do {		\
+	asm("addl %1," __percpu_arg(0)		\
+		: "+m" (__preempt_count) 	\
+		: "ri" ((int)x) 		\
+		: "memory"); 			\
+} while (0)
+
+#define __preempt_count_sub(x) do {		\
+	asm("subl %1," __percpu_arg(0)		\
+		: "+m" (__preempt_count) 	\
+		: "ri" ((int)x) 		\
+		: "memory"); 			\
+} while (0)
+
+#define preempt_enable() do {			\
+	asm("\nsubl $1," __percpu_arg(0)	\
+	    "\njnz 1f"				\
+	    "\ncall preempt_schedule"		\
+	    "\n1:" : "+m" (__preempt_count)	\
+		   : : "memory");		\
+} while (0)
+
 /*
  * must be macros to avoid header recursion hell
  */
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -17,6 +17,9 @@ static __always_inline int *preempt_coun
 	return &current_thread_info()->preempt_count;
 }
 
+#define __preempt_count_add(x)	do { current_thread_info()->preempt_count += (x); } while (0)
+#define __preempt_count_sub(x)  __preempt_count_add(-(x))
+
 /*
  * must be macros to avoid header recursion hell
  */
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -45,8 +45,8 @@ static __always_inline bool test_preempt
   extern void add_preempt_count(int val);
   extern void sub_preempt_count(int val);
 #else
-# define add_preempt_count(val)	do { *preempt_count_ptr() += (val); } while (0)
-# define sub_preempt_count(val)	do { *preempt_count_ptr() -= (val); } while (0)
+# define add_preempt_count(val)	__preempt_count_add(val)
+# define sub_preempt_count(val)	__preempt_count_sub(val)
 #endif
 
 #define inc_preempt_count() add_preempt_count(1)
@@ -101,17 +101,17 @@ do { \
 
 #define preempt_enable_no_resched()	sched_preempt_enable_no_resched()
 
+#ifndef preempt_enable
 #define preempt_enable() \
 do { \
 	preempt_enable_no_resched(); \
 	preempt_check_resched(); \
 } while (0)
+#endif
 
 /* For debugging and tracer internals only! */
-#define add_preempt_count_notrace(val)			\
-	do { *preempt_count_ptr() += (val); } while (0)
-#define sub_preempt_count_notrace(val)			\
-	do { *preempt_count_ptr() -= (val); } while (0)
+#define add_preempt_count_notrace(val)	__preempt_count_add(val)
+#define sub_preempt_count_notrace(val)	__preempt_count_sub(val)
 #define inc_preempt_count_notrace() add_preempt_count_notrace(1)
 #define dec_preempt_count_notrace() sub_preempt_count_notrace(1)
 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 0/5] preempt_count rework
  2013-08-14 15:39   ` Mike Galbraith
  2013-08-14 15:39     ` Mike Galbraith
  2013-08-14 15:43     ` H. Peter Anvin
@ 2013-08-14 16:06     ` Peter Zijlstra
  2013-08-14 16:14       ` H. Peter Anvin
  2 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2013-08-14 16:06 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: H. Peter Anvin, Linus Torvalds, Ingo Molnar, Andi Kleen,
	Thomas Gleixner, Arjan van de Ven, linux-kernel, linux-arch

On Wed, Aug 14, 2013 at 05:39:11PM +0200, Mike Galbraith wrote:
> On Wed, 2013-08-14 at 06:47 -0700, H. Peter Anvin wrote:
> 
> > On x86, you never want to take the address of a percpu variable if you
> > can avoid it, as you end up generating code like:
> > 
> > 	movq %fs:0,%rax
> > 	subl $1,(%rax)
> 
> Hmmm..
> 
> #define cpu_rq(cpu)             (&per_cpu(runqueues, (cpu)))
> #define this_rq()               (&__get_cpu_var(runqueues))
> 
> ffffffff81438c7f:       48 c7 c3 80 11 01 00    mov    $0x11180,%rbx
>         /*
>          * this_rq must be evaluated again because prev may have moved
>          * CPUs since it called schedule(), thus the 'rq' on its stack
>          * frame will be invalid.
>          */
>         finish_task_switch(this_rq(), prev);
> ffffffff81438c86:       e8 25 b4 c0 ff          callq  ffffffff810440b0 <finish_task_switch>
>                  * The context switch have flipped the stack from under us
>                  * and restored the local variables which were saved when
>                  * this task called schedule() in the past. prev == current
>                  * is still correct, but it can be moved to another cpu/rq.
>                  */
>                 cpu = smp_processor_id();
> ffffffff81438c8b:       65 8b 04 25 b8 c5 00    mov    %gs:0xc5b8,%eax
> ffffffff81438c92:       00
>                 rq = cpu_rq(cpu);
> ffffffff81438c93:       48 98                   cltq
> ffffffff81438c95:       48 03 1c c5 00 f3 bb    add    -0x7e440d00(,%rax,8),%rbx
> 
> ..so could the rq = cpu_rq(cpu) sequence be improved cycle expenditure
> wise by squirreling rq pointer away in a percpu this_rq, and replacing
> cpu_rq(cpu) above with a __this_cpu_read(this_rq) version of this_rq()?

Well, this_rq() should already get you that. The above code sucks for
using cpu_rq() when we know cpu == smp_processor_id().

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 0/5] preempt_count rework
  2013-08-14 16:06     ` Peter Zijlstra
@ 2013-08-14 16:14       ` H. Peter Anvin
  2013-08-14 16:52         ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: H. Peter Anvin @ 2013-08-14 16:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Linus Torvalds, Ingo Molnar, Andi Kleen,
	Thomas Gleixner, Arjan van de Ven, linux-kernel, linux-arch

On 08/14/2013 09:06 AM, Peter Zijlstra wrote:
> On Wed, Aug 14, 2013 at 05:39:11PM +0200, Mike Galbraith wrote:
>> On Wed, 2013-08-14 at 06:47 -0700, H. Peter Anvin wrote:
>>
>>> On x86, you never want to take the address of a percpu variable if you
>>> can avoid it, as you end up generating code like:
>>>
>>> 	movq %fs:0,%rax
>>> 	subl $1,(%rax)
>>
>> Hmmm..
>>
>> #define cpu_rq(cpu)             (&per_cpu(runqueues, (cpu)))
>> #define this_rq()               (&__get_cpu_var(runqueues))
>>
>> ffffffff81438c7f:       48 c7 c3 80 11 01 00    mov    $0x11180,%rbx
>>         /*
>>          * this_rq must be evaluated again because prev may have moved
>>          * CPUs since it called schedule(), thus the 'rq' on its stack
>>          * frame will be invalid.
>>          */
>>         finish_task_switch(this_rq(), prev);
>> ffffffff81438c86:       e8 25 b4 c0 ff          callq  ffffffff810440b0 <finish_task_switch>
>>                  * The context switch have flipped the stack from under us
>>                  * and restored the local variables which were saved when
>>                  * this task called schedule() in the past. prev == current
>>                  * is still correct, but it can be moved to another cpu/rq.
>>                  */
>>                 cpu = smp_processor_id();
>> ffffffff81438c8b:       65 8b 04 25 b8 c5 00    mov    %gs:0xc5b8,%eax
>> ffffffff81438c92:       00
>>                 rq = cpu_rq(cpu);
>> ffffffff81438c93:       48 98                   cltq
>> ffffffff81438c95:       48 03 1c c5 00 f3 bb    add    -0x7e440d00(,%rax,8),%rbx
>>
>> ..so could the rq = cpu_rq(cpu) sequence be improved cycle expenditure
>> wise by squirreling rq pointer away in a percpu this_rq, and replacing
>> cpu_rq(cpu) above with a __this_cpu_read(this_rq) version of this_rq()?
> 
> Well, this_rq() should already get you that. The above code sucks for
> using cpu_rq() when we know cpu == smp_processor_id().
> 

Even so, this_rq() uses __get_cpu_var() and takes its address, which
turns into a sequence like:

	leaq __percpu_runqueues(%rip),%rax
	addq %gs:this_cpu_off,%rax

... which is better than the above but still more heavyweight than it
would be if the pointer was itself a percpu variable.

	-hpa
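
A minimal sketch of the per-cpu-pointer variant being suggested here;
runqueue_ptr and this_rq_sketch() are illustrative names only, and
Mike's patch further down the thread is the real version:

#include <linux/percpu.h>

struct rq;

/* Illustrative sketch: the runqueue pointer itself lives in a per-cpu
 * variable, filled in once per cpu at init time. */
DECLARE_PER_CPU(struct rq *, runqueue_ptr);

static inline struct rq *this_rq_sketch(void)
{
	/* A single segment-relative load; no lea + add of this_cpu_off. */
	return __this_cpu_read(runqueue_ptr);
}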

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 0/5] preempt_count rework
  2013-08-14 13:15 [RFC][PATCH 0/5] preempt_count rework Peter Zijlstra
                   ` (6 preceding siblings ...)
  2013-08-14 13:47 ` [RFC][PATCH 0/5] preempt_count rework H. Peter Anvin
@ 2013-08-14 16:48 ` Andi Kleen
  2013-08-14 16:48   ` Andi Kleen
  2013-08-14 16:55   ` Peter Zijlstra
  7 siblings, 2 replies; 26+ messages in thread
From: Andi Kleen @ 2013-08-14 16:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, linux-kernel, linux-arch

On Wed, Aug 14, 2013 at 03:15:39PM +0200, Peter Zijlstra wrote:
> These patches optimize preempt_enable by firstly folding the preempt and
> need_resched tests into one -- this should work for all architectures. And
> secondly by providing per-arch preempt_count implementations; with x86 using
> per-cpu preempt_count for fastest access.
> 
> These patches have so far only been compiled for defconfig-x86_64 +
> CONFIG_PREEMPT=y and boot tested with kvm -smp 4 upto wanting to mount root.
> 
> It still needs asm volatile("call preempt_schedule": : :"memory"); as per
> Andi's other patches to avoid the C calling convention cluttering the
> preempt_enable() sites.

FWIW I removed the user_schedule in v2 because I don't need it anymore.
Feel free to pick it up from v1 though.

It needs two patches: the one adding SAVE_ALL for 32bit and
the parts of the put_user() patch adding user_schedule

When it's not used in user_ anymore it should probably 
be renamed too, to preempt_schedule or somesuch,
and probably moved to a different file.

-Andi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 0/5] preempt_count rework
  2013-08-14 16:14       ` H. Peter Anvin
@ 2013-08-14 16:52         ` Peter Zijlstra
  2013-08-14 16:58           ` H. Peter Anvin
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2013-08-14 16:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mike Galbraith, Linus Torvalds, Ingo Molnar, Andi Kleen,
	Thomas Gleixner, Arjan van de Ven, linux-kernel, linux-arch

On Wed, Aug 14, 2013 at 09:14:34AM -0700, H. Peter Anvin wrote:
> Even so, this_rq() uses __get_cpu_var() and takes its address, which
> turns into a sequence like:
> 
> 	leaq __percpu_runqueues(%rip),%rax
> 	addq %gs:this_cpu_off,%rax
> 
> ... which is better than the above but still more heavyweight than it
> would be if the pointer was itself a percpu variable.

Oh curses, this is because lea can't do segment offsets? So there's no
sane way to get addresses of per-cpu variables.

Because ideally we'd have something like:

  lea %gs:__percpu_runqueues,%rax

So in this case it makes sense to also store the actual pointer; how
unfortunate.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 0/5] preempt_count rework
  2013-08-14 16:48 ` Andi Kleen
  2013-08-14 16:48   ` Andi Kleen
@ 2013-08-14 16:55   ` Peter Zijlstra
  2013-08-14 17:12     ` Andi Kleen
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2013-08-14 16:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Ingo Molnar, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, linux-kernel, linux-arch

On Wed, Aug 14, 2013 at 09:48:27AM -0700, Andi Kleen wrote:

> FWIW I removed the user_schedule in v2 because I don't need it anymore.
> Feel free to pick it up from v1 though.

Ah, I had a quick look through your v2 because I got a link into it from
Ingo but didn't find it. I'll have to ask Google to locate this v1. I
suppose that's what's wrong with my last patch. It directly does a call
preempt_schedule -- which I had hoped would work due to its asmlinkage,
but apparently there's more to it.

Also, I suppose there's a few volatile thingies missing.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 0/5] preempt_count rework
  2013-08-14 16:52         ` Peter Zijlstra
@ 2013-08-14 16:58           ` H. Peter Anvin
  0 siblings, 0 replies; 26+ messages in thread
From: H. Peter Anvin @ 2013-08-14 16:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Linus Torvalds, Ingo Molnar, Andi Kleen,
	Thomas Gleixner, Arjan van de Ven, linux-kernel, linux-arch

On 08/14/2013 09:52 AM, Peter Zijlstra wrote:
> 
> Oh curses, this is because lea can't do segment offsets? So there's no
> sane way to get addresses of per-cpu variables.
> 
> Because ideally we'd have something like:
> 
>   lea %gs:__percpu_runqueues,%rax
> 
> So in this case it makes sense to also store the actual pointer; how
> unfortunate.
> 

Correct; the segment offset is not part of the effective address, even
in 64-bit mode.

	-hpa

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 0/5] preempt_count rework
  2013-08-14 16:55   ` Peter Zijlstra
@ 2013-08-14 17:12     ` Andi Kleen
  0 siblings, 0 replies; 26+ messages in thread
From: Andi Kleen @ 2013-08-14 17:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, linux-kernel, linux-arch

On Wed, Aug 14, 2013 at 06:55:05PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 14, 2013 at 09:48:27AM -0700, Andi Kleen wrote:
> 
> > FWIW I removed the user_schedule in v2 because I don't need it anymore.
> > Feel free to pick it up from v1 though.
> 
> Ah, I had a quick look through your v2 because I got a link into it from
> Ingo but didn't find it. I'll have to ask Google to locate this v1. I

Sorry, it's the thread starting with

http://permalink.gmane.org/gmane.linux.kernel/1541950

I also pushed the branch to git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git

uaccess-opt311 is v1
uaccess-opt311-2 is v2 (without user_schedule)


> suppose that's what's wrong with my last patch. It directly does a call
> preempt_schedule -- which I had hoped would work due to its asmlinkage,
> but apparently there's more to it.

preempt_schedule does not preserve registers, so yes that would
explain a lot of problems.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 0/5] preempt_count rework
  2013-08-14 16:04   ` Peter Zijlstra
@ 2013-08-14 17:31     ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2013-08-14 17:31 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, linux-kernel, linux-arch

On Wed, Aug 14, 2013 at 06:04:16PM +0200, Peter Zijlstra wrote:
> OK, so something like the below would cure the worst of that I suppose.
> It compiles but doesn't boot; must've done something wrong.
> 
> Someone please look at it because my asm-foo blows. I pretty much
> copy/pasted this from asm/percpu.h.

OK, another hatchet job, now with bits borrowed from Andi's
voluntary-preempt v1. This one actually boots.

---
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -48,6 +48,8 @@ For 32-bit we have the following convent
 
 #include <asm/dwarf2.h>
 
+#ifdef CONFIG_X86_64
+
 /*
  * 64-bit system call stack frame layout defines and helpers,
  * for assembly code:
@@ -192,3 +194,51 @@ For 32-bit we have the following convent
 	.macro icebp
 	.byte 0xf1
 	.endm
+
+#else /* CONFIG_X86_64 */
+
+/*
+ * For 32bit only simplified versions of SAVE_ALL/RESTORE_ALL. These
+ * are different from the entry_32.S versions in not changing the segment
+ * registers. So only suitable for in kernel use, not when transitioning
+ * from or to user space. The resulting stack frame is not a standard
+ * pt_regs frame. The main use case is calling C code from assembler
+ * when all the registers need to be preserved.
+ */
+
+	.macro SAVE_ALL
+	pushl_cfi %eax
+	CFI_REL_OFFSET eax, 0
+	pushl_cfi %ebp
+	CFI_REL_OFFSET ebp, 0
+	pushl_cfi %edi
+	CFI_REL_OFFSET edi, 0
+	pushl_cfi %esi
+	CFI_REL_OFFSET esi, 0
+	pushl_cfi %edx
+	CFI_REL_OFFSET edx, 0
+	pushl_cfi %ecx
+	CFI_REL_OFFSET ecx, 0
+	pushl_cfi %ebx
+	CFI_REL_OFFSET ebx, 0
+	.endm
+
+	.macro RESTORE_ALL
+	popl_cfi %ebx
+	CFI_RESTORE ebx
+	popl_cfi %ecx
+	CFI_RESTORE ecx
+	popl_cfi %edx
+	CFI_RESTORE edx
+	popl_cfi %esi
+	CFI_RESTORE esi
+	popl_cfi %edi
+	CFI_RESTORE edi
+	popl_cfi %ebp
+	CFI_RESTORE ebp
+	popl_cfi %eax
+	CFI_RESTORE eax
+	.endm
+
+#endif /* CONFIG_X86_64 */
+
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -20,6 +20,28 @@ static __always_inline int *preempt_coun
 	return &__raw_get_cpu_var(__preempt_count);
 }
 
+#define __preempt_count_add(x) do {		\
+	asm volatile("addl %1," __percpu_arg(0)	\
+		: "+m" (__preempt_count) 	\
+		: "ri" ((int)x) 		\
+		: "memory"); 			\
+} while (0)
+
+#define __preempt_count_sub(x) do {		\
+	asm volatile("subl %1," __percpu_arg(0)	\
+		: "+m" (__preempt_count) 	\
+		: "ri" ((int)x) 		\
+		: "memory"); 			\
+} while (0)
+
+#define __preempt_enable() do {				\
+	asm volatile("\nsubl $1," __percpu_arg(0)	\
+		     "\njnz 1f"				\
+		     "\ncall __preempt_schedule"	\
+		     "\n1:" 				\
+		: "+m" (__preempt_count) : : "memory");	\
+} while (0)
+
 /*
  * must be macros to avoid header recursion hell
  */
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -36,6 +36,8 @@ obj-y			+= tsc.o io_delay.o rtc.o
 obj-y			+= pci-iommu_table.o
 obj-y			+= resource.o
 
+obj-$(CONFIG_PREEMPT)	+= preempt.o
+
 obj-y				+= process.o
 obj-y				+= i387.o xsave.o
 obj-y				+= ptrace.o
--- /dev/null
+++ b/arch/x86/kernel/preempt.S
@@ -0,0 +1,25 @@
+
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+#include <asm/asm.h>
+#include <asm/calling.h>
+
+ENTRY(__preempt_schedule)
+	CFI_STARTPROC
+	SAVE_ALL
+	call preempt_schedule
+	RESTORE_ALL
+	ret
+	CFI_ENDPROC
+
+#ifdef CONFIG_CONTEXT_TRACKING
+
+ENTRY(__preempt_schedule_context)
+	CFI_STARTPROC
+	SAVE_ALL
+	call preempt_schedule_context
+	RESTORE_ALL
+	ret
+	CFI_ENDPROC
+
+#endif
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -17,6 +17,9 @@ static __always_inline int *preempt_coun
 	return &current_thread_info()->preempt_count;
 }
 
+#define __preempt_count_add(x)	do { current_thread_info()->preempt_count += (x); } while (0)
+#define __preempt_count_sub(x)  __preempt_count_add(-(x))
+
 /*
  * must be macros to avoid header recursion hell
  */
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -45,8 +45,8 @@ static __always_inline bool test_preempt
   extern void add_preempt_count(int val);
   extern void sub_preempt_count(int val);
 #else
-# define add_preempt_count(val)	do { *preempt_count_ptr() += (val); } while (0)
-# define sub_preempt_count(val)	do { *preempt_count_ptr() -= (val); } while (0)
+# define add_preempt_count(val)	__preempt_count_add(val)
+# define sub_preempt_count(val)	__preempt_count_sub(val)
 #endif
 
 #define inc_preempt_count() add_preempt_count(1)
@@ -64,7 +64,7 @@ do { \
 
 #ifdef CONFIG_CONTEXT_TRACKING
 
-void preempt_schedule_context(void);
+asmlinkage void preempt_schedule_context(void);
 
 #define preempt_check_resched_context() \
 do { \
@@ -101,17 +101,19 @@ do { \
 
 #define preempt_enable_no_resched()	sched_preempt_enable_no_resched()
 
+#ifndef __preempt_enable
 #define preempt_enable() \
 do { \
 	preempt_enable_no_resched(); \
 	preempt_check_resched(); \
 } while (0)
+#else
+#define preempt_enable() __preempt_enable()
+#endif
 
 /* For debugging and tracer internals only! */
-#define add_preempt_count_notrace(val)			\
-	do { *preempt_count_ptr() += (val); } while (0)
-#define sub_preempt_count_notrace(val)			\
-	do { *preempt_count_ptr() -= (val); } while (0)
+#define add_preempt_count_notrace(val)	__preempt_count_add(val)
+#define sub_preempt_count_notrace(val)	__preempt_count_sub(val)
 #define inc_preempt_count_notrace() add_preempt_count_notrace(1)
 #define dec_preempt_count_notrace() sub_preempt_count_notrace(1)
 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH 0/5] preempt_count rework
  2013-08-14 15:43     ` H. Peter Anvin
@ 2013-08-15  9:01       ` Mike Galbraith
  0 siblings, 0 replies; 26+ messages in thread
From: Mike Galbraith @ 2013-08-15  9:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Andi Kleen,
	Thomas Gleixner, Arjan van de Ven, linux-kernel, linux-arch

On Wed, 2013-08-14 at 08:43 -0700, H. Peter Anvin wrote: 
> On 08/14/2013 08:39 AM, Mike Galbraith wrote:
> > 
> > ..so could the rq = cpu_rq(cpu) sequence be improved cycle expenditure
> > wise by squirreling rq pointer away in a percpu this_rq, and replacing
> > cpu_rq(cpu) above with a __this_cpu_read(this_rq) version of this_rq()?
> > 
> 
> Yes.

Oh darn, that worked out about as you'd expect.  Cycles are so far down
in the frog hair as to be invisible, so it's not worth the space cost.

pinned sched_yield proggy, switches/sec, 3 boots/5 runs each:
                                                                            avg
pre:      1650522     1580422     1604430     1611697     1612928     1611999.8
          1682789     1609103     1603866     1559040     1607424     1612444.4
          1608265     1607513     1606730     1607079     1635914     1613100.2
                                                                      1612514.8  avg avg  1.000

post:     1649396     1595364     1621720     1643665     1641829     1630394.8
          1571322     1591638     1575406     1629960     1592129     1592091.0
          1641807     1622591     1620581     1651145     1663025     1639829.8
                                                                      1620771.8  avg avg  1.005

---
 kernel/sched/core.c  |    8 ++++----
 kernel/sched/sched.h |   12 +++++++++---
 2 files changed, 13 insertions(+), 7 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -111,6 +111,7 @@ void start_bandwidth_timer(struct hrtime
 
 DEFINE_MUTEX(sched_domains_mutex);
 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+DEFINE_PER_CPU_SHARED_ALIGNED(struct rq *, runqueue);
 
 static void update_rq_clock_task(struct rq *rq, s64 delta);
 
@@ -2390,7 +2391,7 @@ static void __sched __schedule(void)
 need_resched:
 	preempt_disable();
 	cpu = smp_processor_id();
-	rq = cpu_rq(cpu);
+	rq = this_rq();
 	rcu_note_context_switch(cpu);
 	prev = rq->curr;
 
@@ -2447,8 +2448,7 @@ static void __sched __schedule(void)
 		 * this task called schedule() in the past. prev == current
 		 * is still correct, but it can be moved to another cpu/rq.
 		 */
-		cpu = smp_processor_id();
-		rq = cpu_rq(cpu);
+		rq = this_rq();
 	} else
 		raw_spin_unlock_irq(&rq->lock);
 
@@ -6470,7 +6470,7 @@ void __init sched_init(void)
 	for_each_possible_cpu(i) {
 		struct rq *rq;
 
-		rq = cpu_rq(i);
+		rq = per_cpu(runqueue, i) = &per_cpu(runqueues, i);
 		raw_spin_lock_init(&rq->lock);
 		rq->nr_running = 0;
 		rq->calc_load_active = 0;
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -537,11 +537,17 @@ static inline int cpu_of(struct rq *rq)
 
 DECLARE_PER_CPU(struct rq, runqueues);
 
-#define cpu_rq(cpu)		(&per_cpu(runqueues, (cpu)))
-#define this_rq()		(&__get_cpu_var(runqueues))
+/*
+ * Runqueue pointer for use by macros to avoid costly code generated
+ * by taking the address of percpu variables.
+ */
+DECLARE_PER_CPU(struct rq *, runqueue);
+
+#define cpu_rq(cpu)		(per_cpu(runqueue, (cpu)))
+#define this_rq()		(__this_cpu_read(runqueue))
 #define task_rq(p)		cpu_rq(task_cpu(p))
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
-#define raw_rq()		(&__raw_get_cpu_var(runqueues))
+#define raw_rq()		(__raw_get_cpu_var(runqueue))
 
 static inline u64 rq_clock(struct rq *rq)
 {

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2013-08-15  9:02 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-14 13:15 [RFC][PATCH 0/5] preempt_count rework Peter Zijlstra
2013-08-14 13:15 ` Peter Zijlstra
2013-08-14 13:15 ` [RFC][PATCH 1/5] sched: Introduce preempt_count accessor functions Peter Zijlstra
2013-08-14 13:15   ` Peter Zijlstra
2013-08-14 13:15 ` [RFC][PATCH 2/5] sched: Add NEED_RESCHED to the preempt_count Peter Zijlstra
2013-08-14 13:15   ` Peter Zijlstra
2013-08-14 13:15 ` [RFC][PATCH 3/5] sched, arch: Create asm/preempt.h Peter Zijlstra
2013-08-14 13:15   ` Peter Zijlstra
2013-08-14 13:15 ` [RFC][PATCH 4/5] sched: Create more preempt_count accessors Peter Zijlstra
2013-08-14 13:15   ` Peter Zijlstra
2013-08-14 13:15 ` [RFC][PATCH 5/5] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
2013-08-14 13:47 ` [RFC][PATCH 0/5] preempt_count rework H. Peter Anvin
2013-08-14 15:39   ` Mike Galbraith
2013-08-14 15:39     ` Mike Galbraith
2013-08-14 15:43     ` H. Peter Anvin
2013-08-15  9:01       ` Mike Galbraith
2013-08-14 16:06     ` Peter Zijlstra
2013-08-14 16:14       ` H. Peter Anvin
2013-08-14 16:52         ` Peter Zijlstra
2013-08-14 16:58           ` H. Peter Anvin
2013-08-14 16:04   ` Peter Zijlstra
2013-08-14 17:31     ` Peter Zijlstra
2013-08-14 16:48 ` Andi Kleen
2013-08-14 16:48   ` Andi Kleen
2013-08-14 16:55   ` Peter Zijlstra
2013-08-14 17:12     ` Andi Kleen
