linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: "Niklas Söderlund" <niklas.soderlund@ragnatech.se>
Cc: Heiner Kallweit <hkallweit1@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-renesas-soc@vger.kernel.org
Subject: Re: Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle")
Date: Mon, 23 Apr 2018 11:54:20 +0200	[thread overview]
Message-ID: <20180423095420.GA23995@linaro.org> (raw)
In-Reply-To: <20180422221827.GB27674@bigcity.dyn.berto.se>

Hi Niklas,

Le Monday 23 Apr 2018 à 00:18:27 (+0200), Niklas Söderlund a écrit :
> Hi Vincent,
> 
> On 2018-04-20 18:00:13 +0200, Vincent Guittot wrote:
> 
> [snip]
> 
> > > 
> > > You results are similar to Heiner's ones. The problem is still there
> > > even if we only kick ilb which mainly send an IPI reschedule to the
> > > other CPU if Idle
> > > 
> > 
> > Could it be possible to record some traces of the problem to get a better view
> > of what happens ?
> > 
> > I have a small patch that adds some traces in the functions that seems to create
> > the problem
> 
> Sure, I applied the patch bellow on-top of c18bb396d3d261eb ("Merge 
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net").
> 
> > 
> > 
> > 
> > Also that would be good to stop tracing when the RCU stall happens
> > 
> > In case you are not familiar with the trace tool, I have put below how to configure ftrace to trace scheudler, irq, timer ... events and stop tracing when dump_backtrace function is called
> > 
> > trace-cmd reset > /dev/null
> > trace-cmd start -b 40000 -p function -l dump_backtrace:traceoff -e sched -e cpu_idle -e cpu_frequency -e timer -e ipi -e irq -e printk
> > trace-cmd start -b 40000 -p function -l dump_backtrace
> > 
> > The trace-cmd start has to be called twice to make sure that we will not trace all function
> > 
> > Once the dump happen and the trace is stopped, you can extract the traces with 
> > 
> > trace-cmd extract -o <trace file name>
> 
> Thanks for the help with trace-cmd, I have attached the full 
> trace-cmd.dat extracted from running this. But a quick run of trace-cmd 
> report.i

Thanks for the report. Can you re run with the following trace-cmd sequence ? My previous sequence disables ftrace events

trace-cmd reset > /dev/null
trace-cmd start -b 40000 -p function -l dump_backtrace:traceoff -e sched -e cpu_idle -e cpu_frequency -e timer -e ipi -e irq -e printk
trace-cmd start -b 40000 -p function -l dump_backtrace -e sched -e cpu_idle -e cpu_frequency -e timer -e ipi -e irq -e printk

I have updated the patch and added traces to check that scheduler returns from idle_balance function and doesn't stay stuck

---
 kernel/sched/fair.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0951d1c..4285511 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9606,6 +9606,8 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
 	 */
 	WRITE_ONCE(nohz.has_blocked, 0);
 
+	trace_printk("_nohz_idle_balance cpu %d idle %d flag %x", this_cpu, idle, flags);
+
 	/*
 	 * Ensures that if we miss the CPU, we must see the has_blocked
 	 * store from nohz_balance_enter_idle().
@@ -9628,6 +9630,8 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
 
 		rq = cpu_rq(balance_cpu);
 
+		trace_printk("_nohz_idle_balance update cpu %d ", balance_cpu);
+
 		has_blocked_load |= update_nohz_stats(rq, true);
 
 		/*
@@ -9680,6 +9684,8 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
 	if (likely(update_next_balance))
 		nohz.next_balance = next_balance;
 
+	trace_printk("_nohz_idle_balance return %d", ret);
+
 	return ret;
 }
 
@@ -9732,6 +9738,8 @@ static void nohz_newidle_balance(struct rq *this_rq)
 	    time_before(jiffies, READ_ONCE(nohz.next_blocked)))
 		return;
 
+	trace_printk("nohz_newidle_balance start update");
+
 	raw_spin_unlock(&this_rq->lock);
 	/*
 	 * This CPU is going to be idle and blocked load of idle CPUs
@@ -9742,6 +9750,9 @@ static void nohz_newidle_balance(struct rq *this_rq)
 	if (!_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE))
 		kick_ilb(NOHZ_STATS_KICK);
 	raw_spin_lock(&this_rq->lock);
+
+	trace_printk("nohz_newidle_balance lock back");
+
 }
 
 #else /* !CONFIG_NO_HZ_COMMON */
@@ -9869,6 +9880,8 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
 
 	rq_repin_lock(this_rq, rf);
 
+	trace_printk("idle_balance %d", pulled_task);
+
 	return pulled_task;
 }
 
-- 
2.7.4

>
> 

[snip]

>        rcu_sched-9     [000]   147.342741: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>        rcu_sched-9     [000]   147.342749: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 0 idle 2 flag 2
>        rcu_sched-9     [000]   147.342754: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>        rcu_sched-9     [000]   147.382739: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>        rcu_sched-9     [000]   147.382741: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 0 idle 2 flag 2
>        rcu_sched-9     [000]   147.382745: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>             sshd-1881  [000]   147.621105: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>             sshd-1881  [000]   147.621108: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 0 idle 2 flag 2
>             sshd-1881  [000]   147.621110: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1

This is the last call to the function added by the patch on CPU0 and it returns completely.
But then nothing else is schedule on CPU0

>           <idle>-0     [001]   147.662725: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 1 idle 0 flag 2
>           <idle>-0     [001]   147.662730: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>      kworker/1:1-31    [001]   151.032830: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>      kworker/1:1-31    [001]   151.032833: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 1 idle 2 flag 2
>      kworker/1:1-31    [001]   151.032835: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>      kworker/1:1-31    [001]   155.032777: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>      kworker/1:1-31    [001]   155.032780: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 1 idle 2 flag 2
>      kworker/1:1-31    [001]   155.032781: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>          haveged-1823  [001]   155.515789: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>          haveged-1823  [001]   155.515792: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 1 idle 2 flag 2
>          haveged-1823  [001]   155.515794: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>      kworker/1:1-31    [001]   157.032718: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>      kworker/1:1-31    [001]   157.032720: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 1 idle 2 flag 2
>      kworker/1:1-31    [001]   157.032722: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>      kworker/1:1-31    [001]   159.032724: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>      kworker/1:1-31    [001]   159.032726: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 1 idle 2 flag 2
>      kworker/1:1-31    [001]   159.032728: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>      kworker/1:1-31    [001]   161.032698: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>      kworker/1:1-31    [001]   161.032701: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 1 idle 2 flag 2
>      kworker/1:1-31    [001]   161.032702: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>      kworker/1:1-31    [001]   165.032671: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>      kworker/1:1-31    [001]   165.032673: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 1 idle 2 flag 2
>      kworker/1:1-31    [001]   165.032675: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>      kworker/1:1-31    [001]   167.032667: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>      kworker/1:1-31    [001]   167.032669: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 1 idle 2 flag 2
>      kworker/1:1-31    [001]   167.032671: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>      kworker/1:1-31    [001]   169.032654: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>      kworker/1:1-31    [001]   169.032657: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 1 idle 2 flag 2
>      kworker/1:1-31    [001]   169.032658: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>          rpcbind-1847  [001]   169.472688: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>          rpcbind-1847  [001]   169.472691: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 1 idle 2 flag 2
>          rpcbind-1847  [001]   169.472693: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>      kworker/1:1-31    [001]   171.032660: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>      kworker/1:1-31    [001]   171.032662: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 1 idle 2 flag 2
>      kworker/1:1-31    [001]   171.032664: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>          rpcbind-1847  [001]   199.482558: bputs:                pick_next_task_fair: nohz_newidle_balance start update
>          rpcbind-1847  [001]   199.482561: bprint:               _nohz_idle_balance: _nohz_idle_balance cpu 1 idle 2 flag 2
>          rpcbind-1847  [001]   199.482563: bprint:               _nohz_idle_balance: _nohz_idle_balance return 1
>           <idle>-0     [000]   241.615152: function:             dump_backtrace
> 
> -- 
> Regards,
> Niklas Söderlund

sorry for the inconvenience

Vincent

  reply	other threads:[~2018-04-23  9:54 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-12  9:18 Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle") Niklas Söderlund
2018-04-12 10:33 ` Vincent Guittot
2018-04-12 11:15   ` Niklas Söderlund
     [not found]     ` <20180412133031.GA551@linaro.org>
2018-04-12 19:43       ` Heiner Kallweit
2018-04-14 11:21         ` Vincent Guittot
2018-04-12 22:39       ` Niklas Söderlund
2018-04-14 11:24         ` Vincent Guittot
2018-04-20 16:00           ` Vincent Guittot
2018-04-20 16:30             ` Joel Fernandes
2018-04-22 22:18             ` Niklas Söderlund
2018-04-23  9:54               ` Vincent Guittot [this message]
     [not found]                 ` <20180425225603.GA26177@bigcity.dyn.berto.se>
2018-04-26 10:31                   ` Vincent Guittot
2018-04-26 11:48                     ` Peter Zijlstra
2018-04-26 14:41                     ` Niklas Söderlund
2018-04-26 15:27                       ` Vincent Guittot
2018-04-26 15:38                         ` Niklas Söderlund
2018-05-02 13:40                     ` Geert Uytterhoeven
2018-05-03  9:25                     ` [tip:sched/urgent] sched/fair: Fix the update of blocked load when newly idle tip-bot for Vincent Guittot
2018-04-13 20:38     ` Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle") Niklas Söderlund
2018-04-14 11:26       ` Vincent Guittot
2018-04-12 22:06   ` Niklas Söderlund

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180423095420.GA23995@linaro.org \
    --to=vincent.guittot@linaro.org \
    --cc=hkallweit1@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-renesas-soc@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=niklas.soderlund@ragnatech.se \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).