public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH 0/5] nohz: Move nohz kick out of scheduler IPI, v3
@ 2014-05-11 23:33 Frederic Weisbecker
  2014-05-11 23:33 ` [PATCH 1/5] irq_work: Architecture support for remote irq work raise Frederic Weisbecker
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2014-05-11 23:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Kevin Hilman,
	Paul E. McKenney, Peter Zijlstra, Thomas Gleixner, Viresh Kumar

Hi,

So this version gives up on smp_queue_function_single() and instead
extends irq work to support remote queueing, as suggested by PeterZ.

Comments?

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/nohz-irq-work

Thanks,
	Frederic
---

Frederic Weisbecker (5):
      irq_work: Architecture support for remote irq work raise
      irq_work: Force non-lazy works on IPI
      irq_work: Allow remote queueing
      nohz: Move full nohz kick to its own IPI
      nohz: Use IPI implicit full barrier against rq->nr_running r/w


 arch/Kconfig               | 12 +++++++
 arch/alpha/kernel/time.c   |  3 +-
 arch/arm/Kconfig           |  1 +
 arch/arm/kernel/smp.c      |  4 +--
 arch/powerpc/kernel/time.c |  3 +-
 arch/sparc/kernel/pcr.c    |  3 +-
 arch/x86/Kconfig           |  1 +
 arch/x86/kernel/irq_work.c | 10 ++----
 include/linux/irq_work.h   |  3 ++
 include/linux/tick.h       |  9 ++++-
 kernel/irq_work.c          | 87 +++++++++++++++++++++++++++++++---------------
 kernel/sched/core.c        | 14 ++++----
 kernel/sched/sched.h       | 12 +++++--
 kernel/time/Kconfig        |  2 ++
 kernel/time/tick-sched.c   | 10 +++---
 kernel/timer.c             |  2 +-
 16 files changed, 118 insertions(+), 58 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/5] irq_work: Architecture support for remote irq work raise
  2014-05-11 23:33 [RFC PATCH 0/5] nohz: Move nohz kick out of scheduler IPI, v3 Frederic Weisbecker
@ 2014-05-11 23:33 ` Frederic Weisbecker
  2014-05-12  0:08   ` Benjamin Herrenschmidt
  2014-05-12  7:56   ` Peter Zijlstra
  2014-05-11 23:33 ` [PATCH 2/5] irq_work: Force non-lazy works on IPI Frederic Weisbecker
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2014-05-11 23:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Benjamin Herrenschmidt,
	David S. Miller, Ingo Molnar, Kevin Hilman, Paul E. McKenney,
	Paul Mackerras, Peter Zijlstra, Russell King, Thomas Gleixner,
	Viresh Kumar

We are going to extend irq work to support remote queueing.

So let's add a cpu argument to arch_irq_work_raise(). Architectures
willing to support remote raising must then provide the backend to send
irq work IPIs to other CPUs.

Initial support is provided for x86 and ARM since they are easily
extended. The other archs that override arch_irq_work_raise() appear to
rely on local clock interrupts and therefore need a deeper rewrite of
their irq work support to implement remote raising.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/Kconfig               | 12 ++++++++++++
 arch/alpha/kernel/time.c   |  3 ++-
 arch/arm/Kconfig           |  1 +
 arch/arm/kernel/smp.c      |  4 ++--
 arch/powerpc/kernel/time.c |  3 ++-
 arch/sparc/kernel/pcr.c    |  3 ++-
 arch/x86/Kconfig           |  1 +
 arch/x86/kernel/irq_work.c | 10 ++--------
 kernel/irq_work.c          |  4 ++--
 9 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 97ff872..3a38356 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -472,6 +472,18 @@ config HAVE_IRQ_EXIT_ON_IRQ_STACK
 	  This spares a stack switch and improves cache usage on softirq
 	  processing.
 
+config HAVE_IRQ_WORK_IPI
+	bool
+	help
+	  Architecture supports raising irq work interrupts both locally and
+	  remotely. Without this capability, we can only trigger local irq works
+	  loosely handled by the generic timer tick with the bad implications
+	  coming along: the irq work is subject to HZ latency and it runs under
+	  the tick random locking scenario (possibly holding hrtimer lock).
+
+	  This capability is required on configs running with a very minimized
+	  tick rate like full dynticks.
+
 #
 # ABI hall of shame
 #
diff --git a/arch/alpha/kernel/time.c b/arch/alpha/kernel/time.c
index ee39cee..2ff0c61 100644
--- a/arch/alpha/kernel/time.c
+++ b/arch/alpha/kernel/time.c
@@ -60,8 +60,9 @@ DEFINE_PER_CPU(u8, irq_work_pending);
 #define test_irq_work_pending()      __get_cpu_var(irq_work_pending)
 #define clear_irq_work_pending()     __get_cpu_var(irq_work_pending) = 0
 
-void arch_irq_work_raise(void)
+void arch_irq_work_raise(int cpu)
 {
+	WARN_ON_ONCE(cpu != smp_processor_id());
 	set_irq_work_pending_flag();
 }
 
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index db3c541..7edce21 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -46,6 +46,7 @@ config ARM
 	select HAVE_HW_BREAKPOINT if (PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7))
 	select HAVE_IDE if PCI || ISA || PCMCIA
 	select HAVE_IRQ_TIME_ACCOUNTING
+	select HAVE_IRQ_WORK_IPI
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_LZ4
 	select HAVE_KERNEL_LZMA
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 7c4fada..042a800 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -454,10 +454,10 @@ void arch_send_call_function_single_ipi(int cpu)
 }
 
 #ifdef CONFIG_IRQ_WORK
-void arch_irq_work_raise(void)
+void arch_irq_work_raise(int cpu)
 {
 	if (is_smp())
-		smp_cross_call(cpumask_of(smp_processor_id()), IPI_IRQ_WORK);
+		smp_cross_call(cpumask_of(cpu), IPI_IRQ_WORK);
 }
 #endif
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 122a580..4de25f4 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -464,9 +464,10 @@ DEFINE_PER_CPU(u8, irq_work_pending);
 
 #endif /* 32 vs 64 bit */
 
-void arch_irq_work_raise(void)
+void arch_irq_work_raise(int cpu)
 {
 	preempt_disable();
+	WARN_ON_ONCE(cpu != smp_processor_id());
 	set_irq_work_pending_flag();
 	set_dec(1);
 	preempt_enable();
diff --git a/arch/sparc/kernel/pcr.c b/arch/sparc/kernel/pcr.c
index 269af58..0e5bfd9 100644
--- a/arch/sparc/kernel/pcr.c
+++ b/arch/sparc/kernel/pcr.c
@@ -43,8 +43,9 @@ void __irq_entry deferred_pcr_work_irq(int irq, struct pt_regs *regs)
 	set_irq_regs(old_regs);
 }
 
-void arch_irq_work_raise(void)
+void arch_irq_work_raise(int cpu)
 {
+	WARN_ON_ONCE(cpu != smp_processor_id());
 	set_softint(1 << PIL_DEFERRED_PCR_WORK);
 }
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 25d2c6f..b06f3fd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -130,6 +130,7 @@ config X86
 	select HAVE_CC_STACKPROTECTOR
 	select GENERIC_CPU_AUTOPROBE
 	select HAVE_ARCH_AUDITSYSCALL
+	select HAVE_IRQ_WORK_IPI
 
 config INSTRUCTION_DECODER
 	def_bool y
diff --git a/arch/x86/kernel/irq_work.c b/arch/x86/kernel/irq_work.c
index 1de84e3..500ec1f 100644
--- a/arch/x86/kernel/irq_work.c
+++ b/arch/x86/kernel/irq_work.c
@@ -38,13 +38,7 @@ __visible void smp_trace_irq_work_interrupt(struct pt_regs *regs)
 	exiting_irq();
 }
 
-void arch_irq_work_raise(void)
+void arch_irq_work_raise(int cpu)
 {
-#ifdef CONFIG_X86_LOCAL_APIC
-	if (!cpu_has_apic)
-		return;
-
-	apic->send_IPI_self(IRQ_WORK_VECTOR);
-	apic_wait_icr_idle();
-#endif
+	apic->send_IPI_mask(cpumask_of(cpu), IRQ_WORK_VECTOR);
 }
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index a82170e..2559383 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -48,7 +48,7 @@ static bool irq_work_claim(struct irq_work *work)
 	return true;
 }
 
-void __weak arch_irq_work_raise(void)
+void __weak arch_irq_work_raise(int cpu)
 {
 	/*
 	 * Lame architectures will get the timer tick callback
@@ -79,7 +79,7 @@ bool irq_work_queue(struct irq_work *work)
 	 */
 	if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
 		if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
-			arch_irq_work_raise();
+			arch_irq_work_raise(smp_processor_id());
 	}
 
 	preempt_enable();
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/5] irq_work: Force non-lazy works on IPI
  2014-05-11 23:33 [RFC PATCH 0/5] nohz: Move nohz kick out of scheduler IPI, v3 Frederic Weisbecker
  2014-05-11 23:33 ` [PATCH 1/5] irq_work: Architecture support for remote irq work raise Frederic Weisbecker
@ 2014-05-11 23:33 ` Frederic Weisbecker
  2014-05-11 23:33 ` [PATCH 3/5] irq_work: Allow remote queueing Frederic Weisbecker
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2014-05-11 23:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Kevin Hilman,
	Paul E. McKenney, Peter Zijlstra, Thomas Gleixner, Viresh Kumar

As we plan to handle the full nohz kick with irq work, we need to force
non-lazy works to run from the IPI instead of from the tick, because the
tick handler is called under the hrtimer lock. Running there is not
desirable for the nohz callback that re-evaluates the tick, since it may
need to take the hrtimer lock itself.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/irq_work.h |  1 +
 kernel/irq_work.c        | 58 ++++++++++++++++++++++++++----------------------
 kernel/timer.c           |  2 +-
 3 files changed, 34 insertions(+), 27 deletions(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 19ae05d..429b1ba 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -34,6 +34,7 @@ void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
 
 bool irq_work_queue(struct irq_work *work);
 void irq_work_run(void);
+void irq_work_run_tick(void);
 void irq_work_sync(struct irq_work *work);
 
 #ifdef CONFIG_IRQ_WORK
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 2559383..0a554a6 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -19,8 +19,8 @@
 #include <asm/processor.h>
 
 
-static DEFINE_PER_CPU(struct llist_head, irq_work_list);
-static DEFINE_PER_CPU(int, irq_work_raised);
+static DEFINE_PER_CPU(struct llist_head, lazy_list);
+static DEFINE_PER_CPU(struct llist_head, raised_list);
 
 /*
  * Claim the entry so that no one else will poke at it.
@@ -63,14 +63,14 @@ void __weak arch_irq_work_raise(int cpu)
  */
 bool irq_work_queue(struct irq_work *work)
 {
+	unsigned long flags;
+
 	/* Only queue if not already pending */
 	if (!irq_work_claim(work))
 		return false;
 
 	/* Queue the entry and raise the IPI if needed. */
-	preempt_disable();
-
-	llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
+	local_irq_save(flags);
 
 	/*
 	 * If the work is not "lazy" or the tick is stopped, raise the irq
@@ -78,11 +78,13 @@ bool irq_work_queue(struct irq_work *work)
 	 * for the next tick.
 	 */
 	if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
-		if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
+		if (llist_add(&work->llnode, &__get_cpu_var(raised_list)))
 			arch_irq_work_raise(smp_processor_id());
+	} else {
+		llist_add(&work->llnode, &__get_cpu_var(lazy_list));
 	}
 
-	preempt_enable();
+	local_irq_restore(flags);
 
 	return true;
 }
@@ -90,10 +92,7 @@ EXPORT_SYMBOL_GPL(irq_work_queue);
 
 bool irq_work_needs_cpu(void)
 {
-	struct llist_head *this_list;
-
-	this_list = &__get_cpu_var(irq_work_list);
-	if (llist_empty(this_list))
+	if (llist_empty(&__get_cpu_var(lazy_list)))
 		return false;
 
 	/* All work should have been flushed before going offline */
@@ -102,28 +101,18 @@ bool irq_work_needs_cpu(void)
 	return true;
 }
 
-static void __irq_work_run(void)
+static void __irq_work_run(struct llist_head *list)
 {
 	unsigned long flags;
 	struct irq_work *work;
-	struct llist_head *this_list;
 	struct llist_node *llnode;
 
-
-	/*
-	 * Reset the "raised" state right before we check the list because
-	 * an NMI may enqueue after we find the list empty from the runner.
-	 */
-	__this_cpu_write(irq_work_raised, 0);
-	barrier();
-
-	this_list = &__get_cpu_var(irq_work_list);
-	if (llist_empty(this_list))
+	if (llist_empty(list))
 		return;
 
 	BUG_ON(!irqs_disabled());
 
-	llnode = llist_del_all(this_list);
+	llnode = llist_del_all(list);
 	while (llnode != NULL) {
 		work = llist_entry(llnode, struct irq_work, llnode);
 
@@ -155,11 +144,27 @@ static void __irq_work_run(void)
 void irq_work_run(void)
 {
 	BUG_ON(!in_irq());
-	__irq_work_run();
+	__irq_work_run(&__get_cpu_var(raised_list));
+	__irq_work_run(&__get_cpu_var(lazy_list));
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
 
 /*
+ * Run the lazy irq_work entries on this cpu from the tick. But let
+ * the IPI handle the others. Some works may require to work outside
+ * the tick due to its locking dependencies (hrtimer lock).
+ */
+void irq_work_run_tick(void)
+{
+	BUG_ON(!in_irq());
+#ifndef CONFIG_HAVE_IRQ_WORK_IPI
+	/* No IPI support, we don't have the choice... */
+	__irq_work_run(&__get_cpu_var(raised_list));
+#endif
+	__irq_work_run(&__get_cpu_var(lazy_list));
+}
+
+/*
  * Synchronize against the irq_work @entry, ensures the entry is not
  * currently in use.
  */
@@ -183,7 +188,8 @@ static int irq_work_cpu_notify(struct notifier_block *self,
 		/* Called from stop_machine */
 		if (WARN_ON_ONCE(cpu != smp_processor_id()))
 			break;
-		__irq_work_run();
+		__irq_work_run(&__get_cpu_var(raised_list));
+		__irq_work_run(&__get_cpu_var(lazy_list));
 		break;
 	default:
 		break;
diff --git a/kernel/timer.c b/kernel/timer.c
index 3bb01a3..0251dfa 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1384,7 +1384,7 @@ void update_process_times(int user_tick)
 	rcu_check_callbacks(cpu, user_tick);
 #ifdef CONFIG_IRQ_WORK
 	if (in_irq())
-		irq_work_run();
+		irq_work_run_tick();
 #endif
 	scheduler_tick();
 	run_posix_cpu_timers(p);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/5] irq_work: Allow remote queueing
  2014-05-11 23:33 [RFC PATCH 0/5] nohz: Move nohz kick out of scheduler IPI, v3 Frederic Weisbecker
  2014-05-11 23:33 ` [PATCH 1/5] irq_work: Architecture support for remote irq work raise Frederic Weisbecker
  2014-05-11 23:33 ` [PATCH 2/5] irq_work: Force non-lazy works on IPI Frederic Weisbecker
@ 2014-05-11 23:33 ` Frederic Weisbecker
  2014-05-11 23:33 ` [PATCH 4/5] nohz: Move full nohz kick to its own IPI Frederic Weisbecker
  2014-05-11 23:33 ` [PATCH 5/5] nohz: Use IPI implicit full barrier against rq->nr_running r/w Frederic Weisbecker
  4 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2014-05-11 23:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Kevin Hilman,
	Paul E. McKenney, Peter Zijlstra, Thomas Gleixner, Viresh Kumar

irq work currently only supports local callbacks. However its code
is mostly ready to run remote callbacks, and we have some potential users.

The full nohz subsystem currently open codes its own remote irq work
on top of the scheduler IPI when it wants a CPU to re-evaluate its next
tick.

However this ad hoc solution bloats the scheduler IPI.

Let's just extend the irq work subsystem to handle this kind of user.
This shouldn't add noticeable overhead.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/irq_work.h |  2 ++
 kernel/irq_work.c        | 29 +++++++++++++++++++++++++++--
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 429b1ba..511e7f7 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -33,6 +33,8 @@ void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
 #define DEFINE_IRQ_WORK(name, _f) struct irq_work name = { .func = (_f), }
 
 bool irq_work_queue(struct irq_work *work);
+bool irq_work_queue_on(struct irq_work *work, int cpu);
+
 void irq_work_run(void);
 void irq_work_run_tick(void);
 void irq_work_sync(struct irq_work *work);
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 0a554a6..3a7e163 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -55,12 +55,37 @@ void __weak arch_irq_work_raise(int cpu)
 	 */
 }
 
+static void irq_work_queue_raise(struct irq_work *work,
+				 struct llist_head *head, int cpu)
+{
+	if (llist_add(&work->llnode, head))
+		arch_irq_work_raise(cpu);
+}
+
+#ifdef CONFIG_HAVE_IRQ_WORK_IPI
 /*
  * Enqueue the irq_work @entry unless it's already pending
  * somewhere.
  *
  * Can be re-enqueued while the callback is still in progress.
  */
+bool irq_work_queue_on(struct irq_work *work, int cpu)
+{
+	/* Only queue if not already pending */
+	if (!irq_work_claim(work))
+		return false;
+
+	/* All work should have been flushed before going offline */
+	WARN_ON_ONCE(cpu_is_offline(cpu));
+	WARN_ON_ONCE(work->flags & IRQ_WORK_LAZY);
+
+	irq_work_queue_raise(work, &per_cpu(raised_list, cpu), cpu);
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(irq_work_queue_on);
+#endif /* #endif CONFIG_HAVE_IRQ_WORK_IPI */
+
 bool irq_work_queue(struct irq_work *work)
 {
 	unsigned long flags;
@@ -78,8 +103,8 @@ bool irq_work_queue(struct irq_work *work)
 	 * for the next tick.
 	 */
 	if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
-		if (llist_add(&work->llnode, &__get_cpu_var(raised_list)))
-			arch_irq_work_raise(smp_processor_id());
+		irq_work_queue_raise(work, &__get_cpu_var(raised_list),
+				     smp_processor_id());
 	} else {
 		llist_add(&work->llnode, &__get_cpu_var(lazy_list));
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/5] nohz: Move full nohz kick to its own IPI
  2014-05-11 23:33 [RFC PATCH 0/5] nohz: Move nohz kick out of scheduler IPI, v3 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2014-05-11 23:33 ` [PATCH 3/5] irq_work: Allow remote queueing Frederic Weisbecker
@ 2014-05-11 23:33 ` Frederic Weisbecker
  2014-05-11 23:33 ` [PATCH 5/5] nohz: Use IPI implicit full barrier against rq->nr_running r/w Frederic Weisbecker
  4 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2014-05-11 23:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Kevin Hilman,
	Paul E. McKenney, Peter Zijlstra, Thomas Gleixner, Viresh Kumar

Now that the irq work subsystem can queue remote callbacks, it's
a perfect fit to safely queue IPIs when interrupts are disabled
without worrying about concurrent callers.

Let's use it for the full dynticks kick that notifies a CPU it is
exiting single-task mode.

This slims down the scheduler IPI a bit; the nohz code was abusing it
for its convenient "callable anywhere/anytime" properties.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/tick.h     |  9 ++++++++-
 kernel/sched/core.c      |  5 +----
 kernel/sched/sched.h     |  2 +-
 kernel/time/Kconfig      |  2 ++
 kernel/time/tick-sched.c | 10 ++++++----
 5 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index b84773c..8a4987f 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -181,7 +181,13 @@ static inline bool tick_nohz_full_cpu(int cpu)
 
 extern void tick_nohz_init(void);
 extern void __tick_nohz_full_check(void);
-extern void tick_nohz_full_kick(void);
+extern void tick_nohz_full_kick_cpu(int cpu);
+
+static inline void tick_nohz_full_kick(void)
+{
+	tick_nohz_full_kick_cpu(smp_processor_id());
+}
+
 extern void tick_nohz_full_kick_all(void);
 extern void __tick_nohz_task_switch(struct task_struct *tsk);
 #else
@@ -189,6 +195,7 @@ static inline void tick_nohz_init(void) { }
 static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
 static inline void __tick_nohz_full_check(void) { }
+static inline void tick_nohz_full_kick_cpu(int cpu) { }
 static inline void tick_nohz_full_kick(void) { }
 static inline void tick_nohz_full_kick_all(void) { }
 static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 268a45e..00ac248 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1500,9 +1500,7 @@ void scheduler_ipi(void)
 	 */
 	preempt_fold_need_resched();
 
-	if (llist_empty(&this_rq()->wake_list)
-			&& !tick_nohz_full_cpu(smp_processor_id())
-			&& !got_nohz_idle_kick())
+	if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick())
 		return;
 
 	/*
@@ -1519,7 +1517,6 @@ void scheduler_ipi(void)
 	 * somewhat pessimize the simple resched case.
 	 */
 	irq_enter();
-	tick_nohz_full_check();
 	sched_ttwu_pending();
 
 	/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 456e492..6089e00 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1225,7 +1225,7 @@ static inline void inc_nr_running(struct rq *rq)
 		if (tick_nohz_full_cpu(rq->cpu)) {
 			/* Order rq->nr_running write against the IPI */
 			smp_wmb();
-			smp_send_reschedule(rq->cpu);
+			tick_nohz_full_kick_cpu(rq->cpu);
 		}
        }
 #endif
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index f448513..27f1f63 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -101,6 +101,8 @@ config NO_HZ_FULL
 	depends on HAVE_CONTEXT_TRACKING
 	# VIRT_CPU_ACCOUNTING_GEN dependency
 	depends on HAVE_VIRT_CPU_ACCOUNTING_GEN
+	# tickless irq work
+	depends on HAVE_IRQ_WORK_IPI
 	select NO_HZ_COMMON
 	select RCU_USER_QS
 	select RCU_NOCB_CPU
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6558b7a..3d63944 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -224,13 +224,15 @@ static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
 };
 
 /*
- * Kick the current CPU if it's full dynticks in order to force it to
+ * Kick the CPU if it's full dynticks in order to force it to
  * re-evaluate its dependency on the tick and restart it if necessary.
  */
-void tick_nohz_full_kick(void)
+void tick_nohz_full_kick_cpu(int cpu)
 {
-	if (tick_nohz_full_cpu(smp_processor_id()))
-		irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
+	if (!tick_nohz_full_cpu(cpu))
+		return;
+
+	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
 }
 
 static void nohz_full_kick_ipi(void *info)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/5] nohz: Use IPI implicit full barrier against rq->nr_running r/w
  2014-05-11 23:33 [RFC PATCH 0/5] nohz: Move nohz kick out of scheduler IPI, v3 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2014-05-11 23:33 ` [PATCH 4/5] nohz: Move full nohz kick to its own IPI Frederic Weisbecker
@ 2014-05-11 23:33 ` Frederic Weisbecker
  4 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2014-05-11 23:33 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Kevin Hilman,
	Paul E. McKenney, Peter Zijlstra, Thomas Gleixner, Viresh Kumar

A full dynticks CPU is allowed to stop its tick when a single task runs.
Meanwhile when a new task gets enqueued, the CPU must be notified so that
it restarts its tick to maintain local fairness and other accounting details.

This notification is performed by way of an IPI. Then when the target
receives the IPI, we expect it to see the new value of rq->nr_running.

Hence the following ordering scenario:

   CPU 0                      CPU 1

   write rq->nr_running       get IPI
   smp_wmb()                  smp_rmb()
   send IPI                   read rq->nr_running

But Paul McKenney says that nowadays IPIs imply a full barrier on
all architectures. So we can safely remove this barrier pair and rely
on the implicit barriers that come along with IPI send/receive. Let's
just document this new assumption.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/core.c  |  9 +++++----
 kernel/sched/sched.h | 10 ++++++++--
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 00ac248..0d78470 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -670,10 +670,11 @@ bool sched_can_stop_tick(void)
 
        rq = this_rq();
 
-       /* Make sure rq->nr_running update is visible after the IPI */
-       smp_rmb();
-
-       /* More than one running task need preemption */
+       /*
+	* More than one running task need preemption.
+	* nr_running update is assumed to be visible
+	* after IPI is sent from wakers.
+	*/
        if (rq->nr_running > 1)
                return false;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6089e00..219bfbd 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1223,8 +1223,14 @@ static inline void inc_nr_running(struct rq *rq)
 #ifdef CONFIG_NO_HZ_FULL
 	if (rq->nr_running == 2) {
 		if (tick_nohz_full_cpu(rq->cpu)) {
-			/* Order rq->nr_running write against the IPI */
-			smp_wmb();
+			/*
+			 * Tick is needed if more than one task runs on a CPU.
+			 * Send the target an IPI to kick it out of nohz mode.
+			 *
+			 * We assume that IPI implies full memory barrier and the
+			 * new value of rq->nr_running is visible on reception
+			 * from the target.
+			 */
 			tick_nohz_full_kick_cpu(rq->cpu);
 		}
        }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] irq_work: Architecture support for remote irq work raise
  2014-05-11 23:33 ` [PATCH 1/5] irq_work: Architecture support for remote irq work raise Frederic Weisbecker
@ 2014-05-12  0:08   ` Benjamin Herrenschmidt
  2014-05-12  3:11     ` Benjamin Herrenschmidt
  2014-05-12  7:56   ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2014-05-12  0:08 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, David S. Miller, Ingo Molnar, Kevin Hilman,
	Paul E. McKenney, Paul Mackerras, Peter Zijlstra, Russell King,
	Thomas Gleixner, Viresh Kumar

On Mon, 2014-05-12 at 01:33 +0200, Frederic Weisbecker wrote:
> We are going to extend irq work to support remote queueing.
> 
> So let's add a cpu argument to arch_irq_work_raise(). Architectures
> willing to support remote raising must then provide the backend to send
> irq work IPIs to other CPUs.
> 
> Initial support is provided for x86 and ARM since they are easily
> extended. The other archs that override arch_irq_work_raise() appear to
> rely on local clock interrupts and therefore need a deeper rewrite of
> their irq work support to implement remote raising.

Well, looks like it's time to turn it into an IPI... It gets a bit more
tricky because whether whacking the interrupt controller from an NMI is
safe might depend on the irq controller implementation...

It looks like XICS and MPIC should be safe though, so at least we
should be able to cover ppc64, but I'll leave ppc32 alone.

Cheers,
Ben.

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Kevin Hilman <khilman@linaro.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  arch/Kconfig               | 12 ++++++++++++
>  arch/alpha/kernel/time.c   |  3 ++-
>  arch/arm/Kconfig           |  1 +
>  arch/arm/kernel/smp.c      |  4 ++--
>  arch/powerpc/kernel/time.c |  3 ++-
>  arch/sparc/kernel/pcr.c    |  3 ++-
>  arch/x86/Kconfig           |  1 +
>  arch/x86/kernel/irq_work.c | 10 ++--------
>  kernel/irq_work.c          |  4 ++--
>  9 files changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 97ff872..3a38356 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -472,6 +472,18 @@ config HAVE_IRQ_EXIT_ON_IRQ_STACK
>  	  This spares a stack switch and improves cache usage on softirq
>  	  processing.
>  
> +config HAVE_IRQ_WORK_IPI
> +	bool
> +	help
> +	  Architecture supports raising irq work interrupts both locally and
> +	  remotely. Without this capability, we can only trigger local irq works
> +	  loosely handled by the generic timer tick with the bad implications
> +	  coming along: the irq work is subject to HZ latency and it runs under
> +	  the tick random locking scenario (possibly holding hrtimer lock).
> +
> +	  This capability is required on configs running with a very minimized
> +	  tick rate like full dynticks.
> +
>  #
>  # ABI hall of shame
>  #
> diff --git a/arch/alpha/kernel/time.c b/arch/alpha/kernel/time.c
> index ee39cee..2ff0c61 100644
> --- a/arch/alpha/kernel/time.c
> +++ b/arch/alpha/kernel/time.c
> @@ -60,8 +60,9 @@ DEFINE_PER_CPU(u8, irq_work_pending);
>  #define test_irq_work_pending()      __get_cpu_var(irq_work_pending)
>  #define clear_irq_work_pending()     __get_cpu_var(irq_work_pending) = 0
>  
> -void arch_irq_work_raise(void)
> +void arch_irq_work_raise(int cpu)
>  {
> +	WARN_ON_ONCE(cpu != smp_processor_id());
>  	set_irq_work_pending_flag();
>  }
>  
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index db3c541..7edce21 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -46,6 +46,7 @@ config ARM
>  	select HAVE_HW_BREAKPOINT if (PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7))
>  	select HAVE_IDE if PCI || ISA || PCMCIA
>  	select HAVE_IRQ_TIME_ACCOUNTING
> +	select HAVE_IRQ_WORK_IPI
>  	select HAVE_KERNEL_GZIP
>  	select HAVE_KERNEL_LZ4
>  	select HAVE_KERNEL_LZMA
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 7c4fada..042a800 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -454,10 +454,10 @@ void arch_send_call_function_single_ipi(int cpu)
>  }
>  
>  #ifdef CONFIG_IRQ_WORK
> -void arch_irq_work_raise(void)
> +void arch_irq_work_raise(int cpu)
>  {
>  	if (is_smp())
> -		smp_cross_call(cpumask_of(smp_processor_id()), IPI_IRQ_WORK);
> +		smp_cross_call(cpumask_of(cpu), IPI_IRQ_WORK);
>  }
>  #endif
>  
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 122a580..4de25f4 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -464,9 +464,10 @@ DEFINE_PER_CPU(u8, irq_work_pending);
>  
>  #endif /* 32 vs 64 bit */
>  
> -void arch_irq_work_raise(void)
> +void arch_irq_work_raise(int cpu)
>  {
>  	preempt_disable();
> +	WARN_ON_ONCE(cpu != smp_processor_id());
>  	set_irq_work_pending_flag();
>  	set_dec(1);
>  	preempt_enable();
> diff --git a/arch/sparc/kernel/pcr.c b/arch/sparc/kernel/pcr.c
> index 269af58..0e5bfd9 100644
> --- a/arch/sparc/kernel/pcr.c
> +++ b/arch/sparc/kernel/pcr.c
> @@ -43,8 +43,9 @@ void __irq_entry deferred_pcr_work_irq(int irq, struct pt_regs *regs)
>  	set_irq_regs(old_regs);
>  }
>  
> -void arch_irq_work_raise(void)
> +void arch_irq_work_raise(int cpu)
>  {
> +	WARN_ON_ONCE(cpu != smp_processor_id());
>  	set_softint(1 << PIL_DEFERRED_PCR_WORK);
>  }
>  
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 25d2c6f..b06f3fd 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -130,6 +130,7 @@ config X86
>  	select HAVE_CC_STACKPROTECTOR
>  	select GENERIC_CPU_AUTOPROBE
>  	select HAVE_ARCH_AUDITSYSCALL
> +	select HAVE_IRQ_WORK_IPI
>  
>  config INSTRUCTION_DECODER
>  	def_bool y
> diff --git a/arch/x86/kernel/irq_work.c b/arch/x86/kernel/irq_work.c
> index 1de84e3..500ec1f 100644
> --- a/arch/x86/kernel/irq_work.c
> +++ b/arch/x86/kernel/irq_work.c
> @@ -38,13 +38,7 @@ __visible void smp_trace_irq_work_interrupt(struct pt_regs *regs)
>  	exiting_irq();
>  }
>  
> -void arch_irq_work_raise(void)
> +void arch_irq_work_raise(int cpu)
>  {
> -#ifdef CONFIG_X86_LOCAL_APIC
> -	if (!cpu_has_apic)
> -		return;
> -
> -	apic->send_IPI_self(IRQ_WORK_VECTOR);
> -	apic_wait_icr_idle();
> -#endif
> +	apic->send_IPI_mask(cpumask_of(cpu), IRQ_WORK_VECTOR);
>  }
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index a82170e..2559383 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -48,7 +48,7 @@ static bool irq_work_claim(struct irq_work *work)
>  	return true;
>  }
>  
> -void __weak arch_irq_work_raise(void)
> +void __weak arch_irq_work_raise(int cpu)
>  {
>  	/*
>  	 * Lame architectures will get the timer tick callback
> @@ -79,7 +79,7 @@ bool irq_work_queue(struct irq_work *work)
>  	 */
>  	if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
>  		if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> -			arch_irq_work_raise();
> +			arch_irq_work_raise(smp_processor_id());
>  	}
>  
>  	preempt_enable();



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] irq_work: Architecture support for remote irq work raise
  2014-05-12  0:08   ` Benjamin Herrenschmidt
@ 2014-05-12  3:11     ` Benjamin Herrenschmidt
  2014-05-12 16:29       ` Frederic Weisbecker
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2014-05-12  3:11 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, David S. Miller, Ingo Molnar, Kevin Hilman,
	Paul E. McKenney, Paul Mackerras, Peter Zijlstra, Russell King,
	Thomas Gleixner, Viresh Kumar

On Mon, 2014-05-12 at 10:08 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2014-05-12 at 01:33 +0200, Frederic Weisbecker wrote:
> > We are going to extend irq work to support remote queuing.
> > 
> > So lets add a cpu argument to arch_irq_work_raise(). The architectures
> > willing to support that must then provide the backend to raise irq work
> > IPIs remotely.
> > 
> > Initial support is provided for x86 and ARM since they are easily
> > extended. The other archs that overwrite arch_irq_work_raise() seem
> > to use local clock interrupts and therefore need deeper rewrite of their
> > irq work support to implement remote raising.
> 
> Well, looks like it's time to turn it into an IPI... It gets a bit more
> tricky because whether whacking the interrupt controller from an NMI
> is safe or not might depend on that irq controller
> implementation...
> 
> It looks like XICS and MPIC should be safe though, so at least we
> should be able to cover ppc64, but I'll leave ppc32 alone.

Correction... that's actually a bit more tricky. We might need an MMIO
to trigger the IPI. That means potentially having to take a hash miss,
and we certainly can't do that at NMI time at the moment.

We *could* hard disable interrupts (which blocks our NMIs since they
> aren't real NMIs, they are just a way to bypass our soft-disable state
for perf interrupts) for hash_page, but that still makes me somewhat
nervous.

Another option would be to add an ioremap flag of some description to
be able to install bolted hash entries. (It already does so if called
early enough during boot, so it might actually just work by accident but
that's an undebuggable horror show waiting to happen if we ever change
that).

So needs a bit more thinking on our side.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] irq_work: Architecture support for remote irq work raise
  2014-05-11 23:33 ` [PATCH 1/5] irq_work: Architecture support for remote irq work raise Frederic Weisbecker
  2014-05-12  0:08   ` Benjamin Herrenschmidt
@ 2014-05-12  7:56   ` Peter Zijlstra
  2014-05-12 16:26     ` Frederic Weisbecker
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2014-05-12  7:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, Benjamin Herrenschmidt, David S. Miller,
	Ingo Molnar, Kevin Hilman, Paul E. McKenney, Paul Mackerras,
	Russell King, Thomas Gleixner, Viresh Kumar

[-- Attachment #1: Type: text/plain, Size: 1239 bytes --]

On Mon, May 12, 2014 at 01:33:53AM +0200, Frederic Weisbecker wrote:
> We are going to extend irq work to support remote queuing.
> 
> So lets add a cpu argument to arch_irq_work_raise(). The architectures
> willing to support that must then provide the backend to raise irq work
> IPIs remotely.
> 
> Initial support is provided for x86 and ARM since they are easily
> extended. The other archs that overwrite arch_irq_work_raise() seem
> to use local clock interrupts and therefore need deeper rewrite of their
> irq work support to implement remote raising.
> 

> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

Why not borrow the smp_call_function IPI for the remote bits? We could
limit the 'safe from NMI' to the local works. And we validate this by
putting a WARN_ON(in_nmi()) in irq_work_queue_on().

At some later stage we could look at maybe merging the smp_function_call
and irq_work into a single thing, where we have the irq_work interface
as async and the smp_function_call() as sync interface.

But for now a quick 'hack' would be to call __irq_work_run() from
generic_smp_call_function_single_interrupt().

That would leave arch_irq_work_raise() as the special NMI safe local IPI
hook.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] irq_work: Architecture support for remote irq work raise
  2014-05-12  7:56   ` Peter Zijlstra
@ 2014-05-12 16:26     ` Frederic Weisbecker
  2014-05-12 17:17       ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2014-05-12 16:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Andrew Morton, Benjamin Herrenschmidt, David S. Miller,
	Ingo Molnar, Kevin Hilman, Paul E. McKenney, Paul Mackerras,
	Russell King, Thomas Gleixner, Viresh Kumar

On Mon, May 12, 2014 at 09:56:50AM +0200, Peter Zijlstra wrote:
> On Mon, May 12, 2014 at 01:33:53AM +0200, Frederic Weisbecker wrote:
> > We are going to extend irq work to support remote queuing.
> > 
> > So lets add a cpu argument to arch_irq_work_raise(). The architectures
> > willing to support that must then provide the backend to raise irq work
> > IPIs remotely.
> > 
> > Initial support is provided for x86 and ARM since they are easily
> > extended. The other archs that overwrite arch_irq_work_raise() seem
> > to use local clock interrupts and therefore need deeper rewrite of their
> > irq work support to implement remote raising.
> > 
> 
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> 
> Why not borrow the smp_call_function IPI for the remote bits? We could
> limit the 'safe from NMI' to the local works. And we validate this by
> putting a WARN_ON(in_nmi()) in irq_work_queue_on().

Right, but although I don't need it to be safe from NMI, I need it
to be callable concurrently and when irqs are disabled.

So we can't use smp_call_function_single() for that. But we can use the async
version, in which case we must keep the irq work claim. But that's
about the same as the smp_queue_function_single() we had previously,
and we are back to our csd_lock issue.

> 
> At some later stage we could look at maybe merging the smp_function_call
> and irq_work into a single thing, where we have the irq_work interface
> as async and the smp_function_call() as sync interface.
> 
> But for now a quick 'hack' would be to call __irq_work_run() from
> generic_smp_call_function_single_interrupt().
> 
> That would leave arch_irq_work_raise() as the special NMI safe local IPI
> hook.

I don't know, calling irq_work_run() from there sounds like overkill.

Or we can encapsulate:

struct remote_irq_work {
    struct irq_work work;
    struct call_single_data csd;
}

bool irq_work_queue_on(struct remote_irq_work *remote_work, int cpu)
{
   if (!irq_work_claim(&remote_work->work))
      return false;
   remote_work->csd.func = irq_work_remote_interrupt;
   remote_work->csd.info = &remote_work->work;
   smp_call_function_single_async(&remote_work->csd, cpu);
   return true;
}

void irq_work_remote_interrupt(void *info)
{
    struct irq_work *work = info;

    work->func(work);
}

And some tweaking to get csd_lock() out of the way: smp_call_function_single_async()
doesn't need it.

Or the two other solutions:

1) expand irq_work to support remote queuing. And really this doesn't add more
atomic operations or smp barriers than the local queueing. The only complication
is that we need remote raise support from the arch.

2) expand the smp_call_function stuff with smp_queue_function_single(), as I did
previously. It's about the same as the above; irq_work_queue_on() will need to be
conditional though.

Native remote support in irq_work sounds the most proper to me. But unfortunately
it requires arch support that we already have with the smp_function stuff...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] irq_work: Architecture support for remote irq work raise
  2014-05-12  3:11     ` Benjamin Herrenschmidt
@ 2014-05-12 16:29       ` Frederic Weisbecker
  0 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2014-05-12 16:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: LKML, Andrew Morton, David S. Miller, Ingo Molnar, Kevin Hilman,
	Paul E. McKenney, Paul Mackerras, Peter Zijlstra, Russell King,
	Thomas Gleixner, Viresh Kumar

On Mon, May 12, 2014 at 01:11:50PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2014-05-12 at 10:08 +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2014-05-12 at 01:33 +0200, Frederic Weisbecker wrote:
> > > We are going to extend irq work to support remote queuing.
> > > 
> > > So lets add a cpu argument to arch_irq_work_raise(). The architectures
> > > willing to support that must then provide the backend to raise irq work
> > > IPIs remotely.
> > > 
> > > Initial support is provided for x86 and ARM since they are easily
> > > extended. The other archs that overwrite arch_irq_work_raise() seem
> > > to use local clock interrupts and therefore need deeper rewrite of their
> > > irq work support to implement remote raising.
> > 
> > Well, looks like it's time to turn it into an IPI... It gets a bit more
> > tricky because whether whacking the interrupt controller from an NMI
> > is safe or not might depend on that irq controller
> > implementation...
> > 
> > It looks like XICS and MPIC should be safe though, so at least we
> > should be able to cover ppc64, but I'll leave ppc32 alone.
> 
> Correction... that's actually a bit more tricky. We might need an MMIO
> to trigger the IPI. That means potentially having to take a hash miss,
> and we certainly can't do that at NMI time at the moment.
> 
> We *could* hard disable interrupts (which blocks our NMIs since they
> > aren't real NMIs, they are just a way to bypass our soft-disable state
> for perf interrupts) for hash_page, but that still makes me somewhat
> nervous.
> 
> Another option would be to add an ioremap flag of some description to
> be able to install bolted hash entries. (It already does so if called
> early enough during boot, so it might actually just work by accident but
> that's an undebuggable horror show waiting to happen if we ever change
> that).
> 
> So needs a bit more thinking on our side.

Yeah, well if we ever end up with native remote irq work, only local raise
will need to be NMI-safe. If that ever helps...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] irq_work: Architecture support for remote irq work raise
  2014-05-12 16:26     ` Frederic Weisbecker
@ 2014-05-12 17:17       ` Peter Zijlstra
  2014-05-12 17:41         ` Frederic Weisbecker
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2014-05-12 17:17 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, Benjamin Herrenschmidt, David S. Miller,
	Ingo Molnar, Kevin Hilman, Paul E. McKenney, Paul Mackerras,
	Russell King, Thomas Gleixner, Viresh Kumar

On Mon, May 12, 2014 at 06:26:49PM +0200, Frederic Weisbecker wrote:
> On Mon, May 12, 2014 at 09:56:50AM +0200, Peter Zijlstra wrote:
> > On Mon, May 12, 2014 at 01:33:53AM +0200, Frederic Weisbecker wrote:
> > > We are going to extend irq work to support remote queuing.
> > > 
> > > So lets add a cpu argument to arch_irq_work_raise(). The architectures
> > > willing to support that must then provide the backend to raise irq work
> > > IPIs remotely.
> > > 
> > > Initial support is provided for x86 and ARM since they are easily
> > > extended. The other archs that overwrite arch_irq_work_raise() seem
> > > to use local clock interrupts and therefore need deeper rewrite of their
> > > irq work support to implement remote raising.
> > > 
> > 
> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > 
> > Why not borrow the smp_call_function IPI for the remote bits? We could
> > limit the 'safe from NMI' to the local works. And we validate this by
> > putting a WARN_ON(in_nmi()) in irq_work_queue_on().
> 
> Right, but although I don't need it to be safe from NMI, I need it
> to be callable concurrently and when irqs are disabled.
> 
> So we can't use smp_call_function_single() for that. But we can use the async
> > version, in which case we must keep the irq work claim. But that's
> > about the same as the smp_queue_function_single() we had previously,
> > and we are back to our csd_lock issue.

Who said anything about using smp_call_function_single()?


---
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index a82170e2fa78..2fc9d8ece05a 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -61,7 +61,8 @@ void __weak arch_irq_work_raise(void)
  *
  * Can be re-enqueued while the callback is still in progress.
  */
-bool irq_work_queue(struct irq_work *work)
+static __always_inline bool
+__irq_work_queue_on(struct irq_work *work, int cpu)
 {
 	/* Only queue if not already pending */
 	if (!irq_work_claim(work))
@@ -78,16 +79,31 @@ bool irq_work_queue(struct irq_work *work)
 	 * for the next tick.
 	 */
 	if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
-		if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
-			arch_irq_work_raise();
> +		if (cmpxchg(&__get_cpu_var(irq_work_raised), 0, 1) == 0) {
+			if (cpu == smp_processor_id() || cpu == -1)
+				arch_irq_work_raise();
+			else
> +				arch_send_call_function_single_ipi(cpu);
+		}
 	}
 
 	preempt_enable();
 
 	return true;
 }
+
+bool irq_work_queue(struct irq_work *work)
+{
+	return __irq_work_queue_on(work, -1);
+}
 EXPORT_SYMBOL_GPL(irq_work_queue);
 
+bool irq_work_queue_on(struct irq_work *work, int cpu)
+{
+	WARN_ON_ONCE(in_nmi());
+	return __irq_work_queue_on(work, cpu);
+}
+
 bool irq_work_needs_cpu(void)
 {
 	struct llist_head *this_list;
diff --git a/kernel/smp.c b/kernel/smp.c
index 06d574e42c72..0fd53963c4fb 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -198,6 +198,12 @@ void generic_smp_call_function_single_interrupt(void)
 		csd->func(csd->info);
 		csd_unlock(csd);
 	}
+
+	/*
+	 * First run the synchronous callbacks, people are waiting on them;
+	 * then run the async ones.
+	 */
+	irq_work_run();
 }
 
 /*

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] irq_work: Architecture support for remote irq work raise
  2014-05-12 17:17       ` Peter Zijlstra
@ 2014-05-12 17:41         ` Frederic Weisbecker
  0 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2014-05-12 17:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Andrew Morton, Benjamin Herrenschmidt, David S. Miller,
	Ingo Molnar, Kevin Hilman, Paul E. McKenney, Paul Mackerras,
	Russell King, Thomas Gleixner, Viresh Kumar

On Mon, May 12, 2014 at 07:17:29PM +0200, Peter Zijlstra wrote:
> On Mon, May 12, 2014 at 06:26:49PM +0200, Frederic Weisbecker wrote:
> > On Mon, May 12, 2014 at 09:56:50AM +0200, Peter Zijlstra wrote:
> > > On Mon, May 12, 2014 at 01:33:53AM +0200, Frederic Weisbecker wrote:
> > > > We are going to extend irq work to support remote queuing.
> > > > 
> > > > So lets add a cpu argument to arch_irq_work_raise(). The architectures
> > > > willing to support that must then provide the backend to raise irq work
> > > > IPIs remotely.
> > > > 
> > > > Initial support is provided for x86 and ARM since they are easily
> > > > extended. The other archs that overwrite arch_irq_work_raise() seem
> > > > to use local clock interrupts and therefore need deeper rewrite of their
> > > > irq work support to implement remote raising.
> > > > 
> > > 
> > > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > 
> > > Why not borrow the smp_call_function IPI for the remote bits? We could
> > > limit the 'safe from NMI' to the local works. And we validate this by
> > > putting a WARN_ON(in_nmi()) in irq_work_queue_on().
> > 
> > Right, but although I don't need it to be safe from NMI, I need it
> > to be callable concurrently and when irqs are disabled.
> > 
> > So we can't use smp_call_function_single() for that. But we can use the async
> > version, in which case we must keep the irq work claim. But that's
> > about the same as the smp_queue_function_single() we had previously,
> > and we are back to our csd_lock issue.
> 
> Who said anything about using smp_call_function_single()?

Ah shortcutting, doesn't look bad indeed.

> 
> 
> ---
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index a82170e2fa78..2fc9d8ece05a 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -61,7 +61,8 @@ void __weak arch_irq_work_raise(void)
>   *
>   * Can be re-enqueued while the callback is still in progress.
>   */
> -bool irq_work_queue(struct irq_work *work)
> +static __always_inline bool
> +__irq_work_queue_on(struct irq_work *work, int cpu)
>  {
>  	/* Only queue if not already pending */
>  	if (!irq_work_claim(work))
> @@ -78,16 +79,31 @@ bool irq_work_queue(struct irq_work *work)
>  	 * for the next tick.
>  	 */
>  	if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
> -		if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
> -			arch_irq_work_raise();
> > +		if (cmpxchg(&__get_cpu_var(irq_work_raised), 0, 1) == 0) {
> +			if (cpu == smp_processor_id() || cpu == -1)
> +				arch_irq_work_raise();
> +			else
> > +				arch_send_call_function_single_ipi(cpu);
> +		}

Ok that needs some more tuning with the raised flag and the destination list
to pick, but I get the idea.

>  	}
>  
>  	preempt_enable();
>  
>  	return true;
>  }
> +
> +bool irq_work_queue(struct irq_work *work)
> +{
> +	return __irq_work_queue_on(work, -1);
> +}
>  EXPORT_SYMBOL_GPL(irq_work_queue);
>  
> +bool irq_work_queue_on(struct irq_work *work, int cpu)
> +{
> +	WARN_ON_ONCE(in_nmi());
> +	return __irq_work_queue_on(work, cpu);
> +}
> +
>  bool irq_work_needs_cpu(void)
>  {
>  	struct llist_head *this_list;
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 06d574e42c72..0fd53963c4fb 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -198,6 +198,12 @@ void generic_smp_call_function_single_interrupt(void)
>  		csd->func(csd->info);
>  		csd_unlock(csd);
>  	}
> +
> +	/*
> +	 * First run the synchronous callbacks, people are waiting on them;
> +	 * then run the async ones.
> +	 */
> +	irq_work_run();
>  }

Alright, I'm reiterating with that.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-05-12 17:41 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-11 23:33 [RFC PATCH 0/5] nohz: Move nohz kick out of scheduler IPI, v3 Frederic Weisbecker
2014-05-11 23:33 ` [PATCH 1/5] irq_work: Architecture support for remote irq work raise Frederic Weisbecker
2014-05-12  0:08   ` Benjamin Herrenschmidt
2014-05-12  3:11     ` Benjamin Herrenschmidt
2014-05-12 16:29       ` Frederic Weisbecker
2014-05-12  7:56   ` Peter Zijlstra
2014-05-12 16:26     ` Frederic Weisbecker
2014-05-12 17:17       ` Peter Zijlstra
2014-05-12 17:41         ` Frederic Weisbecker
2014-05-11 23:33 ` [PATCH 2/5] irq_work: Force non-lazy works on IPI Frederic Weisbecker
2014-05-11 23:33 ` [PATCH 3/5] irq_work: Allow remote queueing Frederic Weisbecker
2014-05-11 23:33 ` [PATCH 4/5] nohz: Move full nohz kick to its own IPI Frederic Weisbecker
2014-05-11 23:33 ` [PATCH 5/5] nohz: Use IPI implicit full barrier against rq->nr_running r/w Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox