stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Peter Zijlstra <a.p.zijlstra@chello.nl>,
	pjt@google.com, Venkatesh Pallipadi <venki@google.com>,
	Ingo Molnar <mingo@kernel.org>, Li Zefan <lizefan@huawei.com>
Subject: [PATCH 3.4 15/24] sched/nohz: Fix rq->cpu_load calculations some more
Date: Tue, 18 Feb 2014 14:47:02 -0800	[thread overview]
Message-ID: <20140218224550.657593239@linuxfoundation.org> (raw)
In-Reply-To: <20140218224550.221535225@linuxfoundation.org>

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

commit 5aaa0b7a2ed5b12692c9ffb5222182bd558d3146 upstream.

Follow up on commit 556061b00 ("sched/nohz: Fix rq->cpu_load[]
calculations") since while that fixed the busy case it regressed the
mostly idle case.

Add a callback from the nohz exit to also age the rq->cpu_load[]
array. This closes the hole where either there was no nohz load
balance pass during the nohz, or there was a 'significant' amount of
idle time between the last nohz balance and the nohz exit.

So we'll update unconditionally from the tick to not insert any
accidental 0 load periods while busy, and we try and catch up from
nohz idle balance and nohz exit. Both these are still prone to missing
a jiffy, but that has always been the case.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pjt@google.com
Cc: Venkatesh Pallipadi <venki@google.com>
Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/sched.h    |    1 
 kernel/sched/core.c      |   53 ++++++++++++++++++++++++++++++++++++++---------
 kernel/time/tick-sched.c |    1 
 3 files changed, 45 insertions(+), 10 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -144,6 +144,7 @@ extern unsigned long this_cpu_load(void)
 
 
 extern void calc_global_load(unsigned long ticks);
+extern void update_cpu_load_nohz(void);
 
 extern unsigned long get_parent_ip(unsigned long addr);
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2649,25 +2649,32 @@ static void __update_cpu_load(struct rq
 	sched_avg_update(this_rq);
 }
 
+#ifdef CONFIG_NO_HZ
+/*
+ * There is no sane way to deal with nohz on smp when using jiffies because the
+ * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
+ * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
+ *
+ * Therefore we cannot use the delta approach from the regular tick since that
+ * would seriously skew the load calculation. However we'll make do for those
+ * updates happening while idle (nohz_idle_balance) or coming out of idle
+ * (tick_nohz_idle_exit).
+ *
+ * This means we might still be one tick off for nohz periods.
+ */
+
 /*
  * Called from nohz_idle_balance() to update the load ratings before doing the
  * idle balance.
  */
 void update_idle_cpu_load(struct rq *this_rq)
 {
-	unsigned long curr_jiffies = jiffies;
+	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
 	unsigned long load = this_rq->load.weight;
 	unsigned long pending_updates;
 
 	/*
-	 * Bloody broken means of dealing with nohz, but better than nothing..
-	 * jiffies is updated by one cpu, another cpu can drift wrt the jiffy
-	 * update and see 0 difference the one time and 2 the next, even though
-	 * we ticked at roughtly the same rate.
-	 *
-	 * Hence we only use this from nohz_idle_balance() and skip this
-	 * nonsense when called from the scheduler_tick() since that's
-	 * guaranteed a stable rate.
+	 * bail if there's load or we're actually up-to-date.
 	 */
 	if (load || curr_jiffies == this_rq->last_load_update_tick)
 		return;
@@ -2679,12 +2686,38 @@ void update_idle_cpu_load(struct rq *thi
 }
 
 /*
+ * Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
+ */
+void update_cpu_load_nohz(void)
+{
+	struct rq *this_rq = this_rq();
+	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
+	unsigned long pending_updates;
+
+	if (curr_jiffies == this_rq->last_load_update_tick)
+		return;
+
+	raw_spin_lock(&this_rq->lock);
+	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
+	if (pending_updates) {
+		this_rq->last_load_update_tick = curr_jiffies;
+		/*
+		 * We were idle, this means load 0, the current load might be
+		 * !0 due to remote wakeups and the sort.
+		 */
+		__update_cpu_load(this_rq, 0, pending_updates);
+	}
+	raw_spin_unlock(&this_rq->lock);
+}
+#endif /* CONFIG_NO_HZ */
+
+/*
  * Called from scheduler_tick()
  */
 static void update_cpu_load_active(struct rq *this_rq)
 {
 	/*
-	 * See the mess in update_idle_cpu_load().
+	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
 	__update_cpu_load(this_rq, this_rq->load.weight, 1);
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -582,6 +582,7 @@ void tick_nohz_idle_exit(void)
 	/* Update jiffies first */
 	select_nohz_load_balancer(0);
 	tick_do_update_jiffies64(now);
+	update_cpu_load_nohz();
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 	/*



  parent reply	other threads:[~2014-02-18 22:47 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-18 22:46 [PATCH 3.4 00/24] 3.4.81-stable review Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 01/24] SELinux: Fix kernel BUG on empty security contexts Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 02/24] mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 03/24] mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 04/24] x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 05/24] printk: Fix scheduling-while-atomic problem in console_cpu_notify() Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 06/24] ext4: protect group inode free counting with group lock Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 07/24] drm/i915: kick any firmware framebuffers before claiming the gtt Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 08/24] mm/page_alloc.c: remove pageblock_default_order() Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 09/24] mm: setup pageblock_order before its used by sparsemem Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 10/24] dm sysfs: fix a module unload race Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 11/24] ftrace: Synchronize setting function_trace_op with ftrace_trace_function Greg Kroah-Hartman
2014-02-18 22:46 ` [PATCH 3.4 12/24] ftrace: Fix synchronization location disabling and freeing ftrace_ops Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 13/24] ftrace: Have function graph only trace based on global_ops filters Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 14/24] sched/nohz: Fix rq->cpu_load[] calculations Greg Kroah-Hartman
2014-02-18 22:47 ` Greg Kroah-Hartman [this message]
2014-02-18 22:47 ` [PATCH 3.4 16/24] IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast() Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 17/24] target/file: Use O_DSYNC by default for FILEIO backends Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 18/24] target/file: Re-enable optional fd_buffered_io=1 operation Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 19/24] KVM: Fix buffer overflow in kvm_set_irq() Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 20/24] PM / Hibernate: Hibernate/thaw fixes/improvements Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 21/24] Input: synaptics - handle out of bounds values from the hardware Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 22/24] virtio-blk: Use block layer provided spinlock Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 23/24] lib/vsprintf.c: kptr_restrict: fix pK-error in SysRq show-all-timers(Q) Greg Kroah-Hartman
2014-02-18 22:47 ` [PATCH 3.4 24/24] nfs: tear down caches in nfs_init_writepagecache when allocation fails Greg Kroah-Hartman
2014-02-19  4:26 ` [PATCH 3.4 00/24] 3.4.81-stable review Guenter Roeck
2014-02-20  0:03 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140218224550.657593239@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=mingo@kernel.org \
    --cc=pjt@google.com \
    --cc=stable@vger.kernel.org \
    --cc=venki@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).