public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ben Hutchings <ben@decadent.org.uk>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Venkatesh Pallipadi <venki@google.com>,
	Ingo Molnar <mingo@kernel.org>
Subject: [ 18/73] sched/nohz: Fix rq->cpu_load[] calculations
Date: Tue, 31 Jul 2012 05:43:28 +0100	[thread overview]
Message-ID: <20120731044314.153043819@decadent.org.uk> (raw)
In-Reply-To: <20120731044310.013763753@decadent.org.uk>

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

commit 556061b00c9f2fd6a5524b6bde823ef12f299ecf upstream.

While investigating why the load-balancer did funny I found that the
rq->cpu_load[] tables were completely screwy.. a bit more digging
revealed that the updates that got through were missing ticks followed
by a catchup of 2 ticks.

The catchup assumes the cpu was idle during that time (since only nohz
can cause missed ticks and the machine is idle etc..) this means that
esp. the higher indices were significantly lower than they ought to
be.

The reason for this is that its not correct to compare against jiffies
on every jiffy on any other cpu than the cpu that updates jiffies.

This patch cludges around it by only doing the catch-up stuff from
nohz_idle_balance() and doing the regular stuff unconditionally from
the tick.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pjt@google.com
Cc: Venkatesh Pallipadi <venki@google.com>
Link: http://lkml.kernel.org/n/tip-tp4kj18xdd5aj4vvj0qg55s2@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filenames and context; keep functions static]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1887,7 +1887,7 @@ static void double_rq_unlock(struct rq *
 
 static void update_sysctl(void);
 static int get_update_sysctl_factor(void);
-static void update_cpu_load(struct rq *this_rq);
+static void update_idle_cpu_load(struct rq *this_rq);
 
 static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 {
@@ -3855,22 +3855,13 @@ decay_load_missed(unsigned long load, un
  * scheduler tick (TICK_NSEC). With tickless idle this will not be called
  * every tick. We fix it up based on jiffies.
  */
-static void update_cpu_load(struct rq *this_rq)
+static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
+			      unsigned long pending_updates)
 {
-	unsigned long this_load = this_rq->load.weight;
-	unsigned long curr_jiffies = jiffies;
-	unsigned long pending_updates;
 	int i, scale;
 
 	this_rq->nr_load_updates++;
 
-	/* Avoid repeated calls on same jiffy, when moving in and out of idle */
-	if (curr_jiffies == this_rq->last_load_update_tick)
-		return;
-
-	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
-	this_rq->last_load_update_tick = curr_jiffies;
-
 	/* Update our load: */
 	this_rq->cpu_load[0] = this_load; /* Fasttrack for idx 0 */
 	for (i = 1, scale = 2; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
@@ -3895,9 +3886,45 @@ static void update_cpu_load(struct rq *t
 	sched_avg_update(this_rq);
 }
 
+/*
+ * Called from nohz_idle_balance() to update the load ratings before doing the
+ * idle balance.
+ */
+static void update_idle_cpu_load(struct rq *this_rq)
+{
+	unsigned long curr_jiffies = jiffies;
+	unsigned long load = this_rq->load.weight;
+	unsigned long pending_updates;
+
+	/*
+	 * Bloody broken means of dealing with nohz, but better than nothing..
+	 * jiffies is updated by one cpu, another cpu can drift wrt the jiffy
+	 * update and see 0 difference the one time and 2 the next, even though
+	 * we ticked at roughtly the same rate.
+	 *
+	 * Hence we only use this from nohz_idle_balance() and skip this
+	 * nonsense when called from the scheduler_tick() since that's
+	 * guaranteed a stable rate.
+	 */
+	if (load || curr_jiffies == this_rq->last_load_update_tick)
+		return;
+
+	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
+	this_rq->last_load_update_tick = curr_jiffies;
+
+	__update_cpu_load(this_rq, load, pending_updates);
+}
+
+/*
+ * Called from scheduler_tick()
+ */
 static void update_cpu_load_active(struct rq *this_rq)
 {
-	update_cpu_load(this_rq);
+	/*
+	 * See the mess in update_idle_cpu_load().
+	 */
+	this_rq->last_load_update_tick = jiffies;
+	__update_cpu_load(this_rq, this_rq->load.weight, 1);
 
 	calc_load_account_active(this_rq);
 }
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -4735,7 +4735,7 @@ static void nohz_idle_balance(int this_c
 
 		raw_spin_lock_irq(&this_rq->lock);
 		update_rq_clock(this_rq);
-		update_cpu_load(this_rq);
+		update_idle_cpu_load(this_rq);
 		raw_spin_unlock_irq(&this_rq->lock);
 
 		rebalance_domains(balance_cpu, CPU_IDLE);



  parent reply	other threads:[~2012-07-31  4:50 UTC|newest]

Thread overview: 94+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
2012-07-31  4:43 ` [ 01/73] mm: reduce the amount of work done when updating min_free_kbytes Ben Hutchings
2012-07-31  4:43 ` [ 02/73] mm: compaction: allow compaction to isolate dirty pages Ben Hutchings
2012-07-31  4:43 ` [ 03/73] mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage Ben Hutchings
2012-07-31  4:43 ` [ 04/73] mm: page allocator: do not call direct reclaim for THP allocations while compaction is deferred Ben Hutchings
2012-07-31  4:43 ` [ 05/73] mm: compaction: make isolate_lru_page() filter-aware again Ben Hutchings
2012-07-31  4:43 ` [ 06/73] mm: compaction: introduce sync-light migration for use by compaction Ben Hutchings
2012-07-31 16:42   ` Herton Ronaldo Krzesinski
2012-07-31 17:00     ` Mel Gorman
2012-07-31 17:03       ` Mel Gorman
2012-07-31 23:12         ` Ben Hutchings
2012-07-31  4:43 ` [ 07/73] mm: vmscan: when reclaiming for compaction, ensure there are sufficient free pages available Ben Hutchings
2012-07-31  4:43 ` [ 08/73] mm: vmscan: do not OOM if aborting reclaim to start compaction Ben Hutchings
2012-07-31  4:43 ` [ 09/73] mm: vmscan: check if reclaim should really abort even if compaction_ready() is true for one zone Ben Hutchings
2012-07-31  4:43 ` [ 10/73] vmscan: promote shared file mapped pages Ben Hutchings
2012-07-31  4:43 ` [ 11/73] vmscan: activate executable pages after first usage Ben Hutchings
2012-07-31  4:43 ` [ 12/73] mm/vmscan.c: consider swap space when deciding whether to continue reclaim Ben Hutchings
2012-07-31  4:43 ` [ 13/73] mm: test PageSwapBacked in lumpy reclaim Ben Hutchings
2012-07-31  4:43 ` [ 14/73] mm: vmscan: convert global reclaim to per-memcg LRU lists Ben Hutchings
2012-07-31  4:43 ` [ 15/73] cpuset: mm: reduce large amounts of memory barrier related damage v3 Ben Hutchings
2012-07-31  4:43 ` [ 16/73] mm/hugetlb: fix warning in alloc_huge_page/dequeue_huge_page_vma Ben Hutchings
2012-07-31  4:43 ` [ 17/73] [SCSI] Fix NULL dereferences in scsi_cmd_to_driver Ben Hutchings
2012-07-31  4:43 ` Ben Hutchings [this message]
2012-07-31  4:43 ` [ 19/73] sched/nohz: Fix rq->cpu_load calculations some more Ben Hutchings
2012-07-31  4:43 ` [ 20/73] powerpc/ftrace: Fix assembly trampoline register usage Ben Hutchings
2012-07-31  4:43 ` [ 21/73] cx25821: Remove bad strcpy to read-only char* Ben Hutchings
2012-07-31  4:43 ` [ 22/73] x86: Fix boot on Twinhead H12Y Ben Hutchings
2012-07-31  4:43 ` [ 23/73] r8169: RxConfig hack for the 8168evl Ben Hutchings
2012-07-31  4:43 ` [ 24/73] cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps Ben Hutchings
2012-07-31  4:43 ` [ 25/73] wireless: rt2x00: rt2800usb add more devices ids Ben Hutchings
2012-07-31  4:43 ` [ 26/73] wireless: rt2x00: rt2800usb more devices were identified Ben Hutchings
2012-07-31  4:43 ` [ 27/73] rt2800usb: 2001:3c17 is an RT3370 device Ben Hutchings
2012-07-31  4:43 ` [ 28/73] ARM: OMAP2+: OPP: Fix to ensure check of right oppdef after bad one Ben Hutchings
2012-08-01  1:56   ` Herton Ronaldo Krzesinski
2012-08-01  2:36     ` Ben Hutchings
2012-07-31  4:43 ` [ 29/73] usb: gadget: Fix g_ether interface link status Ben Hutchings
2012-07-31  4:43 ` [ 30/73] ext4: pass a char * to ext4_count_free() instead of a buffer_head ptr Ben Hutchings
2012-07-31  4:43 ` [ 31/73] ftrace: Disable function tracing during suspend/resume and hibernation, again Ben Hutchings
2012-07-31  4:43 ` [ 32/73] x86, microcode: microcode_core.c simple_strtoul cleanup Ben Hutchings
2012-07-31  4:43 ` [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface Ben Hutchings
2012-08-03  9:04   ` Sven Joachim
2012-08-03  9:43     ` Borislav Petkov
2012-08-03 12:27       ` Borislav Petkov
2012-08-04 15:41         ` Ben Hutchings
2012-08-04 16:07           ` Henrique de Moraes Holschuh
2012-08-04 17:23             ` Ben Hutchings
2012-08-05  9:21               ` Borislav Petkov
2012-08-05 18:56                 ` Ben Hutchings
2012-07-31  4:43 ` [ 34/73] usbdevfs: Correct amount of data copied to user in processcompl_compat Ben Hutchings
2012-07-31  4:43 ` [ 35/73] ASoC: dapm: Fix locking during codec shutdown Ben Hutchings
2012-07-31 16:11   ` Herton Ronaldo Krzesinski
2012-07-31 16:13     ` Mark Brown
2012-07-31 23:20       ` Ben Hutchings
2012-07-31  4:43 ` [ 36/73] ext4: fix overhead calculation used by ext4_statfs() Ben Hutchings
2012-07-31  4:43 ` [ 37/73] udf: Improve table length check to avoid possible overflow Ben Hutchings
2012-07-31  4:43 ` [ 38/73] powerpc: Add "memory" attribute for mfmsr() Ben Hutchings
2012-07-31  4:43 ` [ 39/73] mwifiex: correction in mcs index check Ben Hutchings
2012-07-31  4:43 ` [ 40/73] USB: option: Ignore ZTE (Vodafone) K3570/71 net interfaces Ben Hutchings
2012-07-31  4:43 ` [ 41/73] USB: option: add ZTE MF821D Ben Hutchings
2012-07-31  4:43 ` [ 42/73] target: Add generation of LOGICAL BLOCK ADDRESS OUT OF RANGE Ben Hutchings
2012-07-31  4:43 ` [ 43/73] target: Add range checking to UNMAP emulation Ben Hutchings
2012-07-31  4:43 ` [ 44/73] target: Fix reading of data length fields for UNMAP commands Ben Hutchings
2012-07-31  4:43 ` [ 45/73] target: Fix possible integer underflow in UNMAP emulation Ben Hutchings
2012-07-31  4:43 ` [ 46/73] target: Check number of unmap descriptors against our limit Ben Hutchings
2012-07-31  4:43 ` [ 47/73] s390/idle: fix sequence handling vs cpu hotplug Ben Hutchings
2012-07-31  4:43 ` [ 48/73] rtlwifi: rtl8192de: Fix phy-based version calculation Ben Hutchings
2012-07-31  4:43 ` [ 49/73] workqueue: perform cpu down operations from low priority cpu_notifier() Ben Hutchings
2012-07-31  4:44 ` [ 50/73] ALSA: hda - Add support for Realtek ALC282 Ben Hutchings
2012-07-31  4:44 ` [ 51/73] iommu/amd: Fix hotplug with iommu=pt Ben Hutchings
2012-07-31  4:44 ` [ 52/73] drm/radeon: Try harder to avoid HW cursor ending on a multiple of 128 columns Ben Hutchings
2012-07-31  4:44 ` [ 53/73] ALSA: hda - Turn on PIN_OUT from hdmi playback prepare Ben Hutchings
2012-07-31  4:44 ` [ 54/73] block: add blk_queue_dead() Ben Hutchings
2012-07-31  4:44 ` [ 55/73] [SCSI] Fix device removal NULL pointer dereference Ben Hutchings
2012-07-31  4:44 ` [ 56/73] [SCSI] Avoid dangling pointer in scsi_requeue_command() Ben Hutchings
2012-07-31  4:44 ` [ 57/73] [SCSI] fix hot unplug vs async scan race Ben Hutchings
2012-07-31  4:44 ` [ 58/73] [SCSI] fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) Ben Hutchings
2012-07-31  4:44 ` [ 59/73] [SCSI] libsas: continue revalidation Ben Hutchings
2012-07-31  4:44 ` [ 60/73] [SCSI] libsas: fix sas_discover_devices return code handling Ben Hutchings
2012-07-31  4:44 ` [ 61/73] iscsi-target: Drop bogus struct file usage for iSCSI/SCTP Ben Hutchings
2012-07-31  4:44 ` [ 62/73] mmc: sdhci-pci: CaFe has broken card detection Ben Hutchings
2012-07-31  4:44 ` [ 63/73] ext4: dont let i_reserved_meta_blocks go negative Ben Hutchings
2012-07-31  4:44 ` [ 64/73] ext4: undo ext4_calc_metadata_amount if we fail to claim space Ben Hutchings
2012-07-31  4:44 ` [ 65/73] ASoC: dapm: Fix _PRE and _POST events for DAPM performance improvements Ben Hutchings
2012-07-31  4:44 ` [ 66/73] locks: fix checking of fcntl_setlease argument Ben Hutchings
2012-07-31  4:44 ` [ 67/73] ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check Ben Hutchings
2012-07-31  4:44 ` [ 68/73] drm/radeon: fix bo creation retry path Ben Hutchings
2012-07-31  4:44 ` [ 69/73] drm/radeon: fix non revealent error message Ben Hutchings
2012-07-31  4:44 ` [ 70/73] drm/radeon: fix hotplug of DP to DVI|HDMI passive adapters (v2) Ben Hutchings
2012-07-31  4:44 ` [ 71/73] drm/radeon: on hotplug force link training to happen (v2) Ben Hutchings
2012-07-31  4:44 ` [ 72/73] Btrfs: call the ordered free operation without any locks held Ben Hutchings
2012-07-31  4:44 ` [ 73/73] nouveau: Fix alignment requirements on src and dst addresses Ben Hutchings
2012-07-31  5:00 ` [ 00/73] 3.2.25-stable review Ben Hutchings
2012-08-01 12:55 ` Steven Rostedt
2012-08-05 22:26   ` Ben Hutchings

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120731044314.153043819@decadent.org.uk \
    --to=ben@decadent.org.uk \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=venki@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox