All of lore.kernel.org
 help / color / mirror / Atom feed
From: tip-bot for Steven Rostedt <tipbot@zytor.com>
To: linux-tip-commits@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, efault@gmx.de,
	hpa@zytor.com, torvalds@linux-foundation.org, mingo@kernel.org,
	rostedt@goodmis.org, peterz@infradead.org
Subject: [tip:sched/core] sched/core: Add preempt checks in preempt_schedule() code
Date: Thu, 31 Mar 2016 02:28:50 -0700	[thread overview]
Message-ID: <tip-47252cfbac03644ee4a3adfa50c77896aa94f2bb@git.kernel.org> (raw)
In-Reply-To: <20160321112339.6dc78ad6@gandalf.local.home>

Commit-ID:  47252cfbac03644ee4a3adfa50c77896aa94f2bb
Gitweb:     http://git.kernel.org/tip/47252cfbac03644ee4a3adfa50c77896aa94f2bb
Author:     Steven Rostedt <rostedt@goodmis.org>
AuthorDate: Mon, 21 Mar 2016 11:23:39 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 31 Mar 2016 10:49:45 +0200

sched/core: Add preempt checks in preempt_schedule() code

While testing the tracer preemptoff, I hit this strange trace:

   <...>-259     0...1    0us : schedule <-worker_thread
   <...>-259     0d..1    0us : rcu_note_context_switch <-__schedule
   <...>-259     0d..1    0us : rcu_sched_qs <-rcu_note_context_switch
   <...>-259     0d..1    0us : rcu_preempt_qs <-rcu_note_context_switch
   <...>-259     0d..1    0us : _raw_spin_lock <-__schedule
   <...>-259     0d..1    0us : preempt_count_add <-_raw_spin_lock
   <...>-259     0d..2    0us : do_raw_spin_lock <-_raw_spin_lock
   <...>-259     0d..2    1us : deactivate_task <-__schedule
   <...>-259     0d..2    1us : update_rq_clock.part.84 <-deactivate_task
   <...>-259     0d..2    1us : dequeue_task_fair <-deactivate_task
   <...>-259     0d..2    1us : dequeue_entity <-dequeue_task_fair
   <...>-259     0d..2    1us : update_curr <-dequeue_entity
   <...>-259     0d..2    1us : update_min_vruntime <-update_curr
   <...>-259     0d..2    1us : cpuacct_charge <-update_curr
   <...>-259     0d..2    1us : __rcu_read_lock <-cpuacct_charge
   <...>-259     0d..2    1us : __rcu_read_unlock <-cpuacct_charge
   <...>-259     0d..2    1us : clear_buddies <-dequeue_entity
   <...>-259     0d..2    1us : account_entity_dequeue <-dequeue_entity
   <...>-259     0d..2    2us : update_min_vruntime <-dequeue_entity
   <...>-259     0d..2    2us : update_cfs_shares <-dequeue_entity
   <...>-259     0d..2    2us : hrtick_update <-dequeue_task_fair
   <...>-259     0d..2    2us : wq_worker_sleeping <-__schedule
   <...>-259     0d..2    2us : kthread_data <-wq_worker_sleeping
   <...>-259     0d..2    2us : pick_next_task_fair <-__schedule
   <...>-259     0d..2    2us : check_cfs_rq_runtime <-pick_next_task_fair
   <...>-259     0d..2    2us : pick_next_entity <-pick_next_task_fair
   <...>-259     0d..2    2us : clear_buddies <-pick_next_entity
   <...>-259     0d..2    2us : pick_next_entity <-pick_next_task_fair
   <...>-259     0d..2    2us : clear_buddies <-pick_next_entity
   <...>-259     0d..2    2us : set_next_entity <-pick_next_task_fair
   <...>-259     0d..2    3us : put_prev_entity <-pick_next_task_fair
   <...>-259     0d..2    3us : check_cfs_rq_runtime <-put_prev_entity
   <...>-259     0d..2    3us : set_next_entity <-pick_next_task_fair
gnome-sh-1031    0d..2    3us : finish_task_switch <-__schedule
gnome-sh-1031    0d..2    3us : _raw_spin_unlock_irq <-finish_task_switch
gnome-sh-1031    0d..2    3us : do_raw_spin_unlock <-_raw_spin_unlock_irq
gnome-sh-1031    0...2    3us!: preempt_count_sub <-_raw_spin_unlock_irq
gnome-sh-1031    0...1  582us : do_raw_spin_lock <-_raw_spin_lock
gnome-sh-1031    0...1  583us : _raw_spin_unlock <-drm_gem_object_lookup
gnome-sh-1031    0...1  583us : do_raw_spin_unlock <-_raw_spin_unlock
gnome-sh-1031    0...1  583us : preempt_count_sub <-_raw_spin_unlock
gnome-sh-1031    0...1  584us : _raw_spin_unlock <-drm_gem_object_lookup
gnome-sh-1031    0...1  584us+: trace_preempt_on <-drm_gem_object_lookup
gnome-sh-1031    0...1  603us : <stack trace>
 => preempt_count_sub
 => _raw_spin_unlock
 => drm_gem_object_lookup
 => i915_gem_madvise_ioctl
 => drm_ioctl
 => do_vfs_ioctl
 => SyS_ioctl
 => entry_SYSCALL_64_fastpath

As I'm tracing preemption disabled, it seemed incorrect that the trace
would go across a schedule and report not being in the scheduler.
Looking into this I discovered the problem.

schedule() calls preempt_disable() but the preempt_schedule() calls
preempt_enable_notrace(). What happened above was that the gnome-shell
task was preempted on another CPU, migrated over to the idle cpu. The
tracer stared with idle calling schedule(), which called
preempt_disable(), but then gnome-shell finished, and it enabled
preemption with preempt_enable_notrace() that does stop the trace, even
though preemption was enabled.

The purpose of the preempt_disable_notrace() in the preempt_schedule()
is to prevent function tracing from going into an infinite loop.
Because function tracing can trace the preempt_enable/disable() calls
that are traced. The problem with function tracing is:

  NEED_RESCHED set
  preempt_schedule()
    preempt_disable()
      preempt_count_inc()
        function trace (before incrementing preempt count)
          preempt_disable_notrace()
          preempt_enable_notrace()
            sees NEED_RESCHED set
               preempt_schedule() (repeat)

Now by breaking out the preempt off/on tracing into their own code:
preempt_disable_check() and preempt_enable_check(), we can add these to
the preempt_schedule() code. As preemption would then be disabled, even
if they were to be traced by the function tracer, the disabled
preemption would prevent the recursion.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160321112339.6dc78ad6@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 59 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..b5cf01d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2958,6 +2958,20 @@ u64 scheduler_tick_max_deferment(void)
 
 #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
 				defined(CONFIG_PREEMPT_TRACER))
+/*
+ * If the value passed in is equal to the current preempt count
+ * then we just disabled preemption. Start timing the latency.
+ */
+static inline void preempt_latency_start(int val)
+{
+	if (preempt_count() == val) {
+		unsigned long ip = get_lock_parent_ip();
+#ifdef CONFIG_DEBUG_PREEMPT
+		current->preempt_disable_ip = ip;
+#endif
+		trace_preempt_off(CALLER_ADDR0, ip);
+	}
+}
 
 void preempt_count_add(int val)
 {
@@ -2976,17 +2990,21 @@ void preempt_count_add(int val)
 	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
 				PREEMPT_MASK - 10);
 #endif
-	if (preempt_count() == val) {
-		unsigned long ip = get_lock_parent_ip();
-#ifdef CONFIG_DEBUG_PREEMPT
-		current->preempt_disable_ip = ip;
-#endif
-		trace_preempt_off(CALLER_ADDR0, ip);
-	}
+	preempt_latency_start(val);
 }
 EXPORT_SYMBOL(preempt_count_add);
 NOKPROBE_SYMBOL(preempt_count_add);
 
+/*
+ * If the value passed in equals to the current preempt count
+ * then we just enabled preemption. Stop timing the latency.
+ */
+static inline void preempt_latency_stop(int val)
+{
+	if (preempt_count() == val)
+		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
+}
+
 void preempt_count_sub(int val)
 {
 #ifdef CONFIG_DEBUG_PREEMPT
@@ -3003,13 +3021,15 @@ void preempt_count_sub(int val)
 		return;
 #endif
 
-	if (preempt_count() == val)
-		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
+	preempt_latency_stop(val);
 	__preempt_count_sub(val);
 }
 EXPORT_SYMBOL(preempt_count_sub);
 NOKPROBE_SYMBOL(preempt_count_sub);
 
+#else
+static inline void preempt_latency_start(int val) { }
+static inline void preempt_latency_stop(int val) { }
 #endif
 
 /*
@@ -3284,8 +3304,23 @@ void __sched schedule_preempt_disabled(void)
 static void __sched notrace preempt_schedule_common(void)
 {
 	do {
+		/*
+		 * Because the function tracer can trace preempt_count_sub()
+		 * and it also uses preempt_enable/disable_notrace(), if
+		 * NEED_RESCHED is set, the preempt_enable_notrace() called
+		 * by the function tracer will call this function again and
+		 * cause infinite recursion.
+		 *
+		 * Preemption must be disabled here before the function
+		 * tracer can trace. Break up preempt_disable() into two
+		 * calls. One to disable preemption without fear of being
+		 * traced. The other to still record the preemption latency,
+		 * which can also be traced by the function tracer.
+		 */
 		preempt_disable_notrace();
+		preempt_latency_start(1);
 		__schedule(true);
+		preempt_latency_stop(1);
 		preempt_enable_no_resched_notrace();
 
 		/*
@@ -3337,7 +3372,21 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
 		return;
 
 	do {
+		/*
+		 * Because the function tracer can trace preempt_count_sub()
+		 * and it also uses preempt_enable/disable_notrace(), if
+		 * NEED_RESCHED is set, the preempt_enable_notrace() called
+		 * by the function tracer will call this function again and
+		 * cause infinite recursion.
+		 *
+		 * Preemption must be disabled here before the function
+		 * tracer can trace. Break up preempt_disable() into two
+		 * calls. One to disable preemption without fear of being
+		 * traced. The other to still record the preemption latency,
+		 * which can also be traced by the function tracer.
+		 */
 		preempt_disable_notrace();
+		preempt_latency_start(1);
 		/*
 		 * Needs preempt disabled in case user_exit() is traced
 		 * and the tracer calls preempt_enable_notrace() causing
@@ -3347,6 +3396,7 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
 		__schedule(true);
 		exception_exit(prev_ctx);
 
+		preempt_latency_stop(1);
 		preempt_enable_no_resched_notrace();
 	} while (need_resched());
 }

      parent reply	other threads:[~2016-03-31  9:29 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-21 15:23 [PATCH v2] sched: Add preempt checks in preempt_schedule() code Steven Rostedt
2016-03-22  9:09 ` Peter Zijlstra
2016-03-22 12:52   ` Steven Rostedt
2016-03-29 10:22     ` Ingo Molnar
2016-03-31  9:28 ` tip-bot for Steven Rostedt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=tip-47252cfbac03644ee4a3adfa50c77896aa94f2bb@git.kernel.org \
    --to=tipbot@zytor.com \
    --cc=efault@gmx.de \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.