Re: [PATCH v5] sched/rt: Use IPI to trigger RT task push migration instead of pulling

From: Peter Zijlstra <peterz@infradead.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Clark Williams" <williams@redhat.com>,
	linux-rt-users <linux-rt-users@vger.kernel.org>,
	"Mike Galbraith" <umgwanakikbuti@gmail.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	"Jörn Engel" <joern@purestorage.com>
Subject: Re: [PATCH v5] sched/rt: Use IPI to trigger RT task push migration instead of pulling
Date: Fri, 20 Mar 2015 11:31:20 +0100
Message-ID: <20150320103120.GZ23123@twins.programming.kicks-ass.net>
In-Reply-To: <20150318144946.2f3cc982@gandalf.local.home>

On Wed, Mar 18, 2015 at 02:49:46PM -0400, Steven Rostedt wrote:
> 
> When debugging the latencies on a 40 core box, where we hit 300 to
> 500 microsecond latencies, I found there was heavy contention on the
> runqueue locks.
> 
> Investigating it further, running ftrace, I found that it was due to
> the pulling of RT tasks.
> 
> The test that was run was the following:
> 
>  cyclictest --numa -p95 -m -d0 -i100
> 
> This created a thread on each CPU, that would set its wakeup in iterations
> of 100 microseconds. The -d0 means that all the threads had the same
> interval (100us). Each thread sleeps for 100us and wakes up and measures
> its latencies.
> 
> cyclictest is maintained at:
>  git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git
> 
> What happened was another RT task would be scheduled on one of the CPUs
> that was running our test, when the other CPU tests went to sleep and
> scheduled idle. This caused the "pull" operation to execute on all
> these CPUs. Each one of these saw the RT task that was overloaded on
> the CPU of the test that was still running, and each one tried
> to grab that task in a thundering herd way.
> 
> To grab the task, each thread would do a double rq lock grab, grabbing
> its own lock as well as the rq of the overloaded CPU. As the sched
> domains on this box were rather flat for its size, I saw up to 12 CPUs
> block on this lock at once. This caused a ripple effect with the
> rq locks, especially since the taking was done via a double rq lock, which
> means that several of the CPUs had their own rq locks held while trying
> to take this rq lock. As these locks were blocked, any wakeups or load
> balancing on these CPUs would also block on these locks, and the wait
> time escalated.
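
The double rq lock pattern described above can be sketched in userspace. This is an illustration only, not code from the patch: `toy_rq`, `toy_double_lock()`, `toy_double_unlock()` and `toy_pull_one()` are hypothetical names, pthread mutexes stand in for the kernel's rq spinlocks, and taking the two locks in address order is an assumed way to keep the pair deadlock-free. With many pullers calling `toy_pull_one()` against the same overloaded runqueue, every caller serializes on that one lock, which is the contention the changelog describes.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-CPU runqueue; not the kernel's struct rq. */
struct toy_rq {
    pthread_mutex_t lock;
    int cpu;
    int nr_rt_running;   /* number of runnable RT tasks on this queue */
};

/*
 * Take both locks in a fixed (address) order, so that two CPUs locking
 * the same pair of runqueues cannot deadlock against each other.
 */
static void toy_double_lock(struct toy_rq *a, struct toy_rq *b)
{
    if (a == b) {
        pthread_mutex_lock(&a->lock);
        return;
    }
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void toy_double_unlock(struct toy_rq *a, struct toy_rq *b)
{
    pthread_mutex_unlock(&a->lock);
    if (a != b)
        pthread_mutex_unlock(&b->lock);
}

/*
 * Pull one task from src to dst if src is overloaded (more than one
 * runnable RT task). Returns 1 if a task was moved, 0 otherwise.
 */
static int toy_pull_one(struct toy_rq *dst, struct toy_rq *src)
{
    int pulled = 0;

    toy_double_lock(dst, src);
    if (src->nr_rt_running > 1) {
        src->nr_rt_running--;
        dst->nr_rt_running++;
        pulled = 1;
    }
    toy_double_unlock(dst, src);
    return pulled;
}
```
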
> 
> I've tried various methods to lessen the load, but things like an
> atomic counter to only let one CPU grab the task won't work, because
> the task may have a limited affinity, and we may pick the wrong
> CPU to take that lock and do the pull, only to find out that the
> CPU we picked isn't in the task's affinity.
> 
> Instead of doing the PULL, I now have the CPUs that want the pull to
> send over an IPI to the overloaded CPU, and let that CPU pick what
> CPU to push the task to. No more need to grab the rq lock, and the
> push/pull algorithm still works fine.
> 
> With this patch, the latency dropped to just 150us over a 20 hour run.
> Without the patch, the huge latencies would trigger in seconds.
> 
> I've created a new sched feature called RT_PUSH_IPI, which is enabled
> by default.
> 
> When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks
> and having the pulling CPU do the work is implemented. When RT_PUSH_IPI
> is enabled, the IPI is sent to the overloaded CPU to do a push.
> 
> To enable or disable this at run time:
> 
>  # mount -t debugfs nodev /sys/kernel/debug
>  # echo RT_PUSH_IPI > /sys/kernel/debug/sched_features
> or
>  # echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features
> 
> Update: The original patch would send an IPI to all CPUs in the RT overload
> list. But that could theoretically cause the reverse issue. That is, there
> could be lots of overloaded RT queues and one CPU lowers its priority. It would
> then send an IPI to all the overloaded RT queues and they could then all try
> to grab the rq lock of the CPU lowering its priority, and then we have the
> same problem.
> 
> The latest design sends out only one IPI to the first overloaded CPU. It tries to
> push any tasks that it can, and then looks for the next overloaded CPU that can
> push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable
> tasks that have priorities greater than the source CPU are covered. In case the
> source CPU lowers its priority again, a flag is set to tell the IPI traversal to
> restart with the first RT overloaded CPU after the source CPU.
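
The traversal order described above can be modeled with a small sketch. It is a toy under stated assumptions, not the patch's implementation: `toy_cpu`, `toy_next_ipi_target()` and `toy_walk_ipi_chain()` are hypothetical names, the CPU count is fixed at 8, and a larger number means a higher priority here (a convention chosen only for the example). The restart flag for the case where the source CPU lowers its priority again is not modeled.

```c
#include <stdbool.h>

#define TOY_NR_CPUS 8

/* Hypothetical per-CPU state; not the kernel's data structures. */
struct toy_cpu {
    bool overloaded;     /* CPU is in the RT overload list */
    int  next_pushable;  /* priority of its best pushable task
                          * (toy convention: higher = more urgent) */
};

/*
 * Return the next CPU after 'prev', wrapping around and stopping when
 * the walk gets back to 'src', that is overloaded and has a pushable
 * task with a priority greater than src_prio. Returns -1 when no such
 * CPU is left, i.e. the chain of IPIs ends.
 */
static int toy_next_ipi_target(const struct toy_cpu cpus[TOY_NR_CPUS],
                               int src, int prev, int src_prio)
{
    int cpu = prev;

    for (;;) {
        cpu = (cpu + 1) % TOY_NR_CPUS;
        if (cpu == src)
            return -1;
        if (cpus[cpu].overloaded && cpus[cpu].next_pushable > src_prio)
            return cpu;
    }
}

/*
 * Walk the whole chain starting after 'src'. Each visited CPU is
 * recorded in 'order'; the return value is the number of IPIs sent.
 */
static int toy_walk_ipi_chain(const struct toy_cpu cpus[TOY_NR_CPUS],
                              int src, int src_prio,
                              int order[TOY_NR_CPUS])
{
    int n = 0;
    int cpu = toy_next_ipi_target(cpus, src, src, src_prio);

    while (cpu >= 0) {
        order[n++] = cpu;
        cpu = toy_next_ipi_target(cpus, src, cpu, src_prio);
    }
    return n;
}
```

Only one CPU is targeted at a time, so at most one remote CPU is working on behalf of the source CPU at any moment, in contrast to sending an IPI to every overloaded CPU at once.
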
> 
> Parts-suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

OK, queued it. Do we want to look into making the same change for
deadline once this has settled?
