public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ben Hutchings <ben@decadent.org.uk>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Venkatesh Pallipadi <venki@google.com>,
	Ingo Molnar <mingo@kernel.org>
Subject: [ 19/73] sched/nohz: Fix rq->cpu_load calculations some more
Date: Tue, 31 Jul 2012 05:43:29 +0100	[thread overview]
Message-ID: <20120731044314.326949038@decadent.org.uk> (raw)
In-Reply-To: <20120731044310.013763753@decadent.org.uk>

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

commit 5aaa0b7a2ed5b12692c9ffb5222182bd558d3146 upstream.

Follow up on commit 556061b00 ("sched/nohz: Fix rq->cpu_load[]
calculations") since while that fixed the busy case it regressed the
mostly idle case.

Add a callback from the nohz exit to also age the rq->cpu_load[]
array. This closes the hole where either there was no nohz load
balance pass during the nohz, or there was a 'significant' amount of
idle time between the last nohz balance and the nohz exit.

So we'll update unconditionally from the tick to not insert any
accidental 0 load periods while busy, and we try and catch up from
nohz idle balance and nohz exit. Both these are still prone to missing
a jiffy, but that has always been the case.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pjt@google.com
Cc: Venkatesh Pallipadi <venki@google.com>
Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filenames and context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -145,6 +145,7 @@ extern unsigned long this_cpu_load(void)
 
 
 extern void calc_global_load(unsigned long ticks);
+extern void update_cpu_load_nohz(void);
 
 extern unsigned long get_parent_ip(unsigned long addr);
 
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3886,25 +3886,32 @@ static void __update_cpu_load(struct rq
 	sched_avg_update(this_rq);
 }
 
+#ifdef CONFIG_NO_HZ
+/*
+ * There is no sane way to deal with nohz on smp when using jiffies because the
+ * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
+ * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
+ *
+ * Therefore we cannot use the delta approach from the regular tick since that
+ * would seriously skew the load calculation. However we'll make do for those
+ * updates happening while idle (nohz_idle_balance) or coming out of idle
+ * (tick_nohz_idle_exit).
+ *
+ * This means we might still be one tick off for nohz periods.
+ */
+
 /*
  * Called from nohz_idle_balance() to update the load ratings before doing the
  * idle balance.
  */
 static void update_idle_cpu_load(struct rq *this_rq)
 {
-	unsigned long curr_jiffies = jiffies;
+	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
 	unsigned long load = this_rq->load.weight;
 	unsigned long pending_updates;
 
 	/*
-	 * Bloody broken means of dealing with nohz, but better than nothing..
-	 * jiffies is updated by one cpu, another cpu can drift wrt the jiffy
-	 * update and see 0 difference the one time and 2 the next, even though
-	 * we ticked at roughtly the same rate.
-	 *
-	 * Hence we only use this from nohz_idle_balance() and skip this
-	 * nonsense when called from the scheduler_tick() since that's
-	 * guaranteed a stable rate.
+	 * bail if there's load or we're actually up-to-date.
 	 */
 	if (load || curr_jiffies == this_rq->last_load_update_tick)
 		return;
@@ -3916,12 +3923,38 @@ static void update_idle_cpu_load(struct
 }
 
 /*
+ * Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
+ */
+void update_cpu_load_nohz(void)
+{
+	struct rq *this_rq = this_rq();
+	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
+	unsigned long pending_updates;
+
+	if (curr_jiffies == this_rq->last_load_update_tick)
+		return;
+
+	raw_spin_lock(&this_rq->lock);
+	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
+	if (pending_updates) {
+		this_rq->last_load_update_tick = curr_jiffies;
+		/*
+		 * We were idle, this means load 0, the current load might be
+		 * !0 due to remote wakeups and the sort.
+		 */
+		__update_cpu_load(this_rq, 0, pending_updates);
+	}
+	raw_spin_unlock(&this_rq->lock);
+}
+#endif /* CONFIG_NO_HZ */
+
+/*
  * Called from scheduler_tick()
  */
 static void update_cpu_load_active(struct rq *this_rq)
 {
 	/*
-	 * See the mess in update_idle_cpu_load().
+	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
 	__update_cpu_load(this_rq, this_rq->load.weight, 1);
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -549,6 +549,7 @@ void tick_nohz_restart_sched_tick(void)
 	/* Update jiffies first */
 	select_nohz_load_balancer(0);
 	tick_do_update_jiffies64(now);
+	update_cpu_load_nohz();
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 	/*



  parent reply	other threads:[~2012-07-31  4:50 UTC|newest]

Thread overview: 94+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
2012-07-31  4:43 ` [ 01/73] mm: reduce the amount of work done when updating min_free_kbytes Ben Hutchings
2012-07-31  4:43 ` [ 02/73] mm: compaction: allow compaction to isolate dirty pages Ben Hutchings
2012-07-31  4:43 ` [ 03/73] mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage Ben Hutchings
2012-07-31  4:43 ` [ 04/73] mm: page allocator: do not call direct reclaim for THP allocations while compaction is deferred Ben Hutchings
2012-07-31  4:43 ` [ 05/73] mm: compaction: make isolate_lru_page() filter-aware again Ben Hutchings
2012-07-31  4:43 ` [ 06/73] mm: compaction: introduce sync-light migration for use by compaction Ben Hutchings
2012-07-31 16:42   ` Herton Ronaldo Krzesinski
2012-07-31 17:00     ` Mel Gorman
2012-07-31 17:03       ` Mel Gorman
2012-07-31 23:12         ` Ben Hutchings
2012-07-31  4:43 ` [ 07/73] mm: vmscan: when reclaiming for compaction, ensure there are sufficient free pages available Ben Hutchings
2012-07-31  4:43 ` [ 08/73] mm: vmscan: do not OOM if aborting reclaim to start compaction Ben Hutchings
2012-07-31  4:43 ` [ 09/73] mm: vmscan: check if reclaim should really abort even if compaction_ready() is true for one zone Ben Hutchings
2012-07-31  4:43 ` [ 10/73] vmscan: promote shared file mapped pages Ben Hutchings
2012-07-31  4:43 ` [ 11/73] vmscan: activate executable pages after first usage Ben Hutchings
2012-07-31  4:43 ` [ 12/73] mm/vmscan.c: consider swap space when deciding whether to continue reclaim Ben Hutchings
2012-07-31  4:43 ` [ 13/73] mm: test PageSwapBacked in lumpy reclaim Ben Hutchings
2012-07-31  4:43 ` [ 14/73] mm: vmscan: convert global reclaim to per-memcg LRU lists Ben Hutchings
2012-07-31  4:43 ` [ 15/73] cpuset: mm: reduce large amounts of memory barrier related damage v3 Ben Hutchings
2012-07-31  4:43 ` [ 16/73] mm/hugetlb: fix warning in alloc_huge_page/dequeue_huge_page_vma Ben Hutchings
2012-07-31  4:43 ` [ 17/73] [SCSI] Fix NULL dereferences in scsi_cmd_to_driver Ben Hutchings
2012-07-31  4:43 ` [ 18/73] sched/nohz: Fix rq->cpu_load[] calculations Ben Hutchings
2012-07-31  4:43 ` Ben Hutchings [this message]
2012-07-31  4:43 ` [ 20/73] powerpc/ftrace: Fix assembly trampoline register usage Ben Hutchings
2012-07-31  4:43 ` [ 21/73] cx25821: Remove bad strcpy to read-only char* Ben Hutchings
2012-07-31  4:43 ` [ 22/73] x86: Fix boot on Twinhead H12Y Ben Hutchings
2012-07-31  4:43 ` [ 23/73] r8169: RxConfig hack for the 8168evl Ben Hutchings
2012-07-31  4:43 ` [ 24/73] cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps Ben Hutchings
2012-07-31  4:43 ` [ 25/73] wireless: rt2x00: rt2800usb add more devices ids Ben Hutchings
2012-07-31  4:43 ` [ 26/73] wireless: rt2x00: rt2800usb more devices were identified Ben Hutchings
2012-07-31  4:43 ` [ 27/73] rt2800usb: 2001:3c17 is an RT3370 device Ben Hutchings
2012-07-31  4:43 ` [ 28/73] ARM: OMAP2+: OPP: Fix to ensure check of right oppdef after bad one Ben Hutchings
2012-08-01  1:56   ` Herton Ronaldo Krzesinski
2012-08-01  2:36     ` Ben Hutchings
2012-07-31  4:43 ` [ 29/73] usb: gadget: Fix g_ether interface link status Ben Hutchings
2012-07-31  4:43 ` [ 30/73] ext4: pass a char * to ext4_count_free() instead of a buffer_head ptr Ben Hutchings
2012-07-31  4:43 ` [ 31/73] ftrace: Disable function tracing during suspend/resume and hibernation, again Ben Hutchings
2012-07-31  4:43 ` [ 32/73] x86, microcode: microcode_core.c simple_strtoul cleanup Ben Hutchings
2012-07-31  4:43 ` [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface Ben Hutchings
2012-08-03  9:04   ` Sven Joachim
2012-08-03  9:43     ` Borislav Petkov
2012-08-03 12:27       ` Borislav Petkov
2012-08-04 15:41         ` Ben Hutchings
2012-08-04 16:07           ` Henrique de Moraes Holschuh
2012-08-04 17:23             ` Ben Hutchings
2012-08-05  9:21               ` Borislav Petkov
2012-08-05 18:56                 ` Ben Hutchings
2012-07-31  4:43 ` [ 34/73] usbdevfs: Correct amount of data copied to user in processcompl_compat Ben Hutchings
2012-07-31  4:43 ` [ 35/73] ASoC: dapm: Fix locking during codec shutdown Ben Hutchings
2012-07-31 16:11   ` Herton Ronaldo Krzesinski
2012-07-31 16:13     ` Mark Brown
2012-07-31 23:20       ` Ben Hutchings
2012-07-31  4:43 ` [ 36/73] ext4: fix overhead calculation used by ext4_statfs() Ben Hutchings
2012-07-31  4:43 ` [ 37/73] udf: Improve table length check to avoid possible overflow Ben Hutchings
2012-07-31  4:43 ` [ 38/73] powerpc: Add "memory" attribute for mfmsr() Ben Hutchings
2012-07-31  4:43 ` [ 39/73] mwifiex: correction in mcs index check Ben Hutchings
2012-07-31  4:43 ` [ 40/73] USB: option: Ignore ZTE (Vodafone) K3570/71 net interfaces Ben Hutchings
2012-07-31  4:43 ` [ 41/73] USB: option: add ZTE MF821D Ben Hutchings
2012-07-31  4:43 ` [ 42/73] target: Add generation of LOGICAL BLOCK ADDRESS OUT OF RANGE Ben Hutchings
2012-07-31  4:43 ` [ 43/73] target: Add range checking to UNMAP emulation Ben Hutchings
2012-07-31  4:43 ` [ 44/73] target: Fix reading of data length fields for UNMAP commands Ben Hutchings
2012-07-31  4:43 ` [ 45/73] target: Fix possible integer underflow in UNMAP emulation Ben Hutchings
2012-07-31  4:43 ` [ 46/73] target: Check number of unmap descriptors against our limit Ben Hutchings
2012-07-31  4:43 ` [ 47/73] s390/idle: fix sequence handling vs cpu hotplug Ben Hutchings
2012-07-31  4:43 ` [ 48/73] rtlwifi: rtl8192de: Fix phy-based version calculation Ben Hutchings
2012-07-31  4:43 ` [ 49/73] workqueue: perform cpu down operations from low priority cpu_notifier() Ben Hutchings
2012-07-31  4:44 ` [ 50/73] ALSA: hda - Add support for Realtek ALC282 Ben Hutchings
2012-07-31  4:44 ` [ 51/73] iommu/amd: Fix hotplug with iommu=pt Ben Hutchings
2012-07-31  4:44 ` [ 52/73] drm/radeon: Try harder to avoid HW cursor ending on a multiple of 128 columns Ben Hutchings
2012-07-31  4:44 ` [ 53/73] ALSA: hda - Turn on PIN_OUT from hdmi playback prepare Ben Hutchings
2012-07-31  4:44 ` [ 54/73] block: add blk_queue_dead() Ben Hutchings
2012-07-31  4:44 ` [ 55/73] [SCSI] Fix device removal NULL pointer dereference Ben Hutchings
2012-07-31  4:44 ` [ 56/73] [SCSI] Avoid dangling pointer in scsi_requeue_command() Ben Hutchings
2012-07-31  4:44 ` [ 57/73] [SCSI] fix hot unplug vs async scan race Ben Hutchings
2012-07-31  4:44 ` [ 58/73] [SCSI] fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) Ben Hutchings
2012-07-31  4:44 ` [ 59/73] [SCSI] libsas: continue revalidation Ben Hutchings
2012-07-31  4:44 ` [ 60/73] [SCSI] libsas: fix sas_discover_devices return code handling Ben Hutchings
2012-07-31  4:44 ` [ 61/73] iscsi-target: Drop bogus struct file usage for iSCSI/SCTP Ben Hutchings
2012-07-31  4:44 ` [ 62/73] mmc: sdhci-pci: CaFe has broken card detection Ben Hutchings
2012-07-31  4:44 ` [ 63/73] ext4: dont let i_reserved_meta_blocks go negative Ben Hutchings
2012-07-31  4:44 ` [ 64/73] ext4: undo ext4_calc_metadata_amount if we fail to claim space Ben Hutchings
2012-07-31  4:44 ` [ 65/73] ASoC: dapm: Fix _PRE and _POST events for DAPM performance improvements Ben Hutchings
2012-07-31  4:44 ` [ 66/73] locks: fix checking of fcntl_setlease argument Ben Hutchings
2012-07-31  4:44 ` [ 67/73] ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check Ben Hutchings
2012-07-31  4:44 ` [ 68/73] drm/radeon: fix bo creation retry path Ben Hutchings
2012-07-31  4:44 ` [ 69/73] drm/radeon: fix non revealent error message Ben Hutchings
2012-07-31  4:44 ` [ 70/73] drm/radeon: fix hotplug of DP to DVI|HDMI passive adapters (v2) Ben Hutchings
2012-07-31  4:44 ` [ 71/73] drm/radeon: on hotplug force link training to happen (v2) Ben Hutchings
2012-07-31  4:44 ` [ 72/73] Btrfs: call the ordered free operation without any locks held Ben Hutchings
2012-07-31  4:44 ` [ 73/73] nouveau: Fix alignment requirements on src and dst addresses Ben Hutchings
2012-07-31  5:00 ` [ 00/73] 3.2.25-stable review Ben Hutchings
2012-08-01 12:55 ` Steven Rostedt
2012-08-05 22:26   ` Ben Hutchings

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120731044314.326949038@decadent.org.uk \
    --to=ben@decadent.org.uk \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=venki@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox