From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Peter Zijlstra <a.p.zijlstra@chello.nl>,
	pjt@google.com, Venkatesh Pallipadi <venki@google.com>,
	Ingo Molnar <mingo@kernel.org>, Li Zefan <lizefan@huawei.com>
Subject: [PATCH 3.4 14/24] sched/nohz: Fix rq->cpu_load[] calculations
Date: Tue, 18 Feb 2014 14:47:01 -0800	[thread overview]
Message-ID: <20140218224550.629528065@linuxfoundation.org> (raw)
In-Reply-To: <20140218224550.221535225@linuxfoundation.org>

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

commit 556061b00c9f2fd6a5524b6bde823ef12f299ecf upstream.

While investigating why the load-balancer behaved oddly, I found that the
rq->cpu_load[] tables were completely wrong; a bit more digging
revealed that the updates that got through were missing ticks, followed
by a catch-up of 2 ticks.

The catch-up assumes the cpu was idle during that time (since only nohz
can cause missed ticks and the machine is idle etc.), which means that
especially the higher indices were significantly lower than they ought to
be.
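
To make the effect concrete, here is a small stand-alone sketch (not
kernel code): it applies the cpu_load[] averaging rule to a cpu that is
fully busy, once updating every tick and once with a bogus 2-tick "idle"
catch-up per update. decay_load_missed() is approximated by a plain loop
and the small rounding term used by the kernel is omitted; the point is
only that the higher indices come out much lower in the second run.

#include <stdio.h>

#define CPU_LOAD_IDX_MAX 5

static unsigned long cpu_load[CPU_LOAD_IDX_MAX];

/* Naive stand-in for decay_load_missed(): decay 'load' as if each of
 * the 'missed' ticks had seen a zero load sample. */
static unsigned long decay_missed(unsigned long load, unsigned long missed,
				  int idx)
{
	while (missed--)
		load = (load * ((1UL << idx) - 1)) >> idx;
	return load;
}

/* Simplified cpu_load[] update: index i averages the load with a
 * "half-life" that grows with 2^i. */
static void update(unsigned long sample, unsigned long pending_updates)
{
	int i;

	cpu_load[0] = sample;		/* fast path for index 0 */
	for (i = 1; i < CPU_LOAD_IDX_MAX; i++) {
		unsigned long old = decay_missed(cpu_load[i],
						 pending_updates - 1, i);
		cpu_load[i] = (old * ((1UL << i) - 1) + sample) >> i;
	}
}

static void run(unsigned long pending, int calls, const char *tag)
{
	int i, t;

	for (i = 0; i < CPU_LOAD_IDX_MAX; i++)
		cpu_load[i] = 0;
	for (t = 0; t < calls; t++)
		update(1024, pending);	/* cpu is fully busy throughout */

	printf("%-24s", tag);
	for (i = 0; i < CPU_LOAD_IDX_MAX; i++)
		printf(" %4lu", cpu_load[i]);
	printf("\n");
}

int main(void)
{
	run(1, 64, "update every tick:");
	run(2, 32, "2-tick \"idle\" catch-up:");
	return 0;
}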

The reason for this is that it is not correct to compare against jiffies
on every jiffy on any cpu other than the cpu that updates jiffies.
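
A tiny user-space sketch of that drift (not kernel code; HZ=1000 and the
50us phase offset are made up purely for illustration): a remote cpu
ticking at exactly the right rate, but slightly out of phase with the
cpu that advances jiffies, sees per-tick jiffies deltas of 0 and 2
instead of 1.

#include <stdio.h>

#define TICK_US 1000	/* pretend HZ=1000, one jiffy per 1000us */

/* jiffies as advanced by the timekeeping cpu */
static long jiffies_at(long t_us)
{
	return t_us / TICK_US;
}

int main(void)
{
	long last = jiffies_at(0);
	int k;

	for (k = 1; k <= 8; k++) {
		/* this cpu's tick lands 50us before or after the jiffy
		 * edge, even though it fires at exactly the tick rate */
		long jitter = (k & 1) ? -50 : 50;
		long now = jiffies_at(k * TICK_US + jitter);

		printf("tick %d: jiffies delta = %ld\n", k, now - last);
		last = now;
	}
	return 0;
}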

This patch kludges around it by only doing the catch-up from
nohz_idle_balance() and doing the regular update unconditionally from
the tick.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pjt@google.com
Cc: Venkatesh Pallipadi <venki@google.com>
Link: http://lkml.kernel.org/n/tip-tp4kj18xdd5aj4vvj0qg55s2@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/sched/core.c  |   53 +++++++++++++++++++++++++++++++++++++--------------
 kernel/sched/fair.c  |    2 -
 kernel/sched/sched.h |    2 -
 3 files changed, 41 insertions(+), 16 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -692,8 +692,6 @@ int tg_nop(struct task_group *tg, void *
 }
 #endif
 
-void update_cpu_load(struct rq *this_rq);
-
 static void set_load_weight(struct task_struct *p)
 {
 	int prio = p->static_prio - MAX_RT_PRIO;
@@ -2620,22 +2618,13 @@ decay_load_missed(unsigned long load, un
  * scheduler tick (TICK_NSEC). With tickless idle this will not be called
  * every tick. We fix it up based on jiffies.
  */
-void update_cpu_load(struct rq *this_rq)
+static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
+			      unsigned long pending_updates)
 {
-	unsigned long this_load = this_rq->load.weight;
-	unsigned long curr_jiffies = jiffies;
-	unsigned long pending_updates;
 	int i, scale;
 
 	this_rq->nr_load_updates++;
 
-	/* Avoid repeated calls on same jiffy, when moving in and out of idle */
-	if (curr_jiffies == this_rq->last_load_update_tick)
-		return;
-
-	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
-	this_rq->last_load_update_tick = curr_jiffies;
-
 	/* Update our load: */
 	this_rq->cpu_load[0] = this_load; /* Fasttrack for idx 0 */
 	for (i = 1, scale = 2; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
@@ -2660,9 +2649,45 @@ void update_cpu_load(struct rq *this_rq)
 	sched_avg_update(this_rq);
 }
 
+/*
+ * Called from nohz_idle_balance() to update the load ratings before doing the
+ * idle balance.
+ */
+void update_idle_cpu_load(struct rq *this_rq)
+{
+	unsigned long curr_jiffies = jiffies;
+	unsigned long load = this_rq->load.weight;
+	unsigned long pending_updates;
+
+	/*
+	 * Bloody broken means of dealing with nohz, but better than nothing..
+	 * jiffies is updated by one cpu, another cpu can drift wrt the jiffy
+	 * update and see 0 difference the one time and 2 the next, even though
+	 * we ticked at roughtly the same rate.
+	 *
+	 * Hence we only use this from nohz_idle_balance() and skip this
+	 * nonsense when called from the scheduler_tick() since that's
+	 * guaranteed a stable rate.
+	 */
+	if (load || curr_jiffies == this_rq->last_load_update_tick)
+		return;
+
+	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
+	this_rq->last_load_update_tick = curr_jiffies;
+
+	__update_cpu_load(this_rq, load, pending_updates);
+}
+
+/*
+ * Called from scheduler_tick()
+ */
 static void update_cpu_load_active(struct rq *this_rq)
 {
-	update_cpu_load(this_rq);
+	/*
+	 * See the mess in update_idle_cpu_load().
+	 */
+	this_rq->last_load_update_tick = jiffies;
+	__update_cpu_load(this_rq, this_rq->load.weight, 1);
 
 	calc_load_account_active(this_rq);
 }
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5042,7 +5042,7 @@ static void nohz_idle_balance(int this_c
 
 		raw_spin_lock_irq(&this_rq->lock);
 		update_rq_clock(this_rq);
-		update_cpu_load(this_rq);
+		update_idle_cpu_load(this_rq);
 		raw_spin_unlock_irq(&this_rq->lock);
 
 		rebalance_domains(balance_cpu, CPU_IDLE);
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -873,7 +873,7 @@ extern void resched_cpu(int cpu);
 extern struct rt_bandwidth def_rt_bandwidth;
 extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
 
-extern void update_cpu_load(struct rq *this_rq);
+extern void update_idle_cpu_load(struct rq *this_rq);
 
 #ifdef CONFIG_CGROUP_CPUACCT
 #include <linux/cgroup.h>




Thread overview: 27+ messages
2014-02-18 22:46 [PATCH 3.4 00/24] 3.4.81-stable review Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 01/24] SELinux: Fix kernel BUG on empty security contexts Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 02/24] mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 03/24] mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 04/24] x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 05/24] printk: Fix scheduling-while-atomic problem in console_cpu_notify() Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 06/24] ext4: protect group inode free counting with group lock Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 07/24] drm/i915: kick any firmware framebuffers before claiming the gtt Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 08/24] mm/page_alloc.c: remove pageblock_default_order() Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 09/24] mm: setup pageblock_order before its used by sparsemem Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 10/24] dm sysfs: fix a module unload race Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 11/24] ftrace: Synchronize setting function_trace_op with ftrace_trace_function Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 12/24] ftrace: Fix synchronization location disabling and freeing ftrace_ops Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 13/24] ftrace: Have function graph only trace based on global_ops filters Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 14/24] sched/nohz: Fix rq->cpu_load[] calculations Greg Kroah-Hartman [this message]
2014-02-18 22:47 ` [PATCH 3.4 15/24] sched/nohz: Fix rq->cpu_load calculations some more Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 16/24] IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast() Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 17/24] target/file: Use O_DSYNC by default for FILEIO backends Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 18/24] target/file: Re-enable optional fd_buffered_io=1 operation Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 19/24] KVM: Fix buffer overflow in kvm_set_irq() Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 20/24] PM / Hibernate: Hibernate/thaw fixes/improvements Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 21/24] Input: synaptics - handle out of bounds values from the hardware Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 22/24] virtio-blk: Use block layer provided spinlock Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 23/24] lib/vsprintf.c: kptr_restrict: fix pK-error in SysRq show-all-timers(Q) Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 24/24] nfs: tear down caches in nfs_init_writepagecache when allocation fails Greg Kroah-Hartman
2014-02-19  4:26 ` [PATCH 3.4 00/24] 3.4.81-stable review Guenter Roeck
2014-02-20  0:03 ` Shuah Khan
