linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Frederic Weisbecker <fweisbec@gmail.com>,
	Rik van Riel <riel@redhat.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Stanislaw Gruszka <sgruszka@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Wanpeng Li <wanpeng.li@hotmail.com>,
	Ingo Molnar <mingo@kernel.org>,
	Ivan Delalande <colona@arista.com>
Subject: [PATCH 4.9 29/35] sched/cputime: Fix ksoftirqd cputime accounting regression
Date: Thu, 11 Oct 2018 17:35:31 +0200	[thread overview]
Message-ID: <20181011152521.346719028@linuxfoundation.org> (raw)
In-Reply-To: <20181011152520.174949126@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Frederic Weisbecker <fweisbec@gmail.com>

commit 25e2d8c1b9e327ed260edd13169cc22bc7a78bc6 upstream.

irq_time_read() returns the irqtime minus the ksoftirqd time. This
is necessary because irq_time_read() is used to substract the IRQ time
from the sum_exec_runtime of a task. If we were to include the softirq
time of ksoftirqd, this task would substract its own CPU time everytime
it updates ksoftirqd->sum_exec_runtime which would therefore never
progress.

But this behaviour got broken by:

  a499a5a14db ("sched/cputime: Increment kcpustat directly on irqtime account")

... which now includes ksoftirqd softirq time in the time returned by
irq_time_read().

This has resulted in wrong ksoftirqd cputime reported to userspace
through /proc/stat and thus "top" not showing ksoftirqd when it should
after intense networking load.

ksoftirqd->stime happens to be correct but it gets scaled down by
sum_exec_runtime through task_cputime_adjusted().

To fix this, just account the strict IRQ time in a separate counter and
use it to report the IRQ time.

Reported-and-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1493129448-5356-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/sched/cputime.c |   27 ++++++++++++++++-----------
 kernel/sched/sched.h   |    9 +++++++--
 2 files changed, 23 insertions(+), 13 deletions(-)

--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -37,6 +37,18 @@ void disable_sched_clock_irqtime(void)
 	sched_clock_irqtime = 0;
 }
 
+static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
+				  enum cpu_usage_stat idx)
+{
+	u64 *cpustat = kcpustat_this_cpu->cpustat;
+
+	u64_stats_update_begin(&irqtime->sync);
+	cpustat[idx] += delta;
+	irqtime->total += delta;
+	irqtime->tick_delta += delta;
+	u64_stats_update_end(&irqtime->sync);
+}
+
 /*
  * Called before incrementing preempt_count on {soft,}irq_enter
  * and before decrementing preempt_count on {soft,}irq_exit.
@@ -44,7 +56,6 @@ void disable_sched_clock_irqtime(void)
 void irqtime_account_irq(struct task_struct *curr)
 {
 	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
-	u64 *cpustat = kcpustat_this_cpu->cpustat;
 	s64 delta;
 	int cpu;
 
@@ -55,22 +66,16 @@ void irqtime_account_irq(struct task_str
 	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
 	irqtime->irq_start_time += delta;
 
-	u64_stats_update_begin(&irqtime->sync);
 	/*
 	 * We do not account for softirq time from ksoftirqd here.
 	 * We want to continue accounting softirq time to ksoftirqd thread
 	 * in that case, so as not to confuse scheduler with a special task
 	 * that do not consume any time, but still wants to run.
 	 */
-	if (hardirq_count()) {
-		cpustat[CPUTIME_IRQ] += delta;
-		irqtime->tick_delta += delta;
-	} else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) {
-		cpustat[CPUTIME_SOFTIRQ] += delta;
-		irqtime->tick_delta += delta;
-	}
-
-	u64_stats_update_end(&irqtime->sync);
+	if (hardirq_count())
+		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
+	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
+		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1743,6 +1743,7 @@ static inline void nohz_balance_exit_idl
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 struct irqtime {
+	u64			total;
 	u64			tick_delta;
 	u64			irq_start_time;
 	struct u64_stats_sync	sync;
@@ -1750,16 +1751,20 @@ struct irqtime {
 
 DECLARE_PER_CPU(struct irqtime, cpu_irqtime);
 
+/*
+ * Returns the irqtime minus the softirq time computed by ksoftirqd.
+ * Otherwise ksoftirqd's sum_exec_runtime is substracted its own runtime
+ * and never move forward.
+ */
 static inline u64 irq_time_read(int cpu)
 {
 	struct irqtime *irqtime = &per_cpu(cpu_irqtime, cpu);
-	u64 *cpustat = kcpustat_cpu(cpu).cpustat;
 	unsigned int seq;
 	u64 total;
 
 	do {
 		seq = __u64_stats_fetch_begin(&irqtime->sync);
-		total = cpustat[CPUTIME_SOFTIRQ] + cpustat[CPUTIME_IRQ];
+		total = irqtime->total;
 	} while (__u64_stats_fetch_retry(&irqtime->sync, seq));
 
 	return total;



  parent reply	other threads:[~2018-10-11 15:43 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-11 15:35 [PATCH 4.9 00/35] 4.9.133-stable review Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 01/35] mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 02/35] fbdev/omapfb: fix omapfb_memory_read infoleak Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 03/35] xen-netback: fix input validation in xenvif_set_hash_mapping() Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 04/35] x86/vdso: Fix asm constraints on vDSO syscall fallbacks Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 05/35] x86/vdso: Fix vDSO syscall fallback asm constraint regression Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 06/35] PCI: Reprogram bridge prefetch registers on resume Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 07/35] mac80211: fix setting IEEE80211_KEY_FLAG_RX_MGMT for AP mode keys Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 08/35] PM / core: Clear the direct_complete flag on errors Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 09/35] dm cache metadata: ignore hints array being too small during resize Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 10/35] dm cache: fix resize crash if user doesnt reload cache table Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 11/35] xhci: Add missing CAS workaround for Intel Sunrise Point xHCI Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 12/35] usb: xhci-mtk: resume USB3 roothub first Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 13/35] USB: serial: simple: add Motorola Tetra MTP6550 id Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 14/35] tty: Drop tty->count on tty_reopen() failure Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 15/35] of: unittest: Disable interrupt node tests for old world MAC systems Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 16/35] ext4: add corruption check in ext4_xattr_set_entry() Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 17/35] ext4: always verify the magic number in xattr blocks Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 18/35] cgroup: Fix deadlock in cpu hotplug path Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 19/35] ath10k: fix use-after-free in ath10k_wmi_cmd_send_nowait Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 20/35] ath10k: fix kernel panic issue during pci probe Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 21/35] powerpc/fadump: Return error when fadump registration fails Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 22/35] ARC: clone syscall to setp r25 as thread pointer Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 23/35] x86/mm: Expand static page table for fixmap space Greg Kroah-Hartman
2018-11-01 22:25   ` Ben Hutchings
2018-11-02  3:38     ` Feng Tang
2018-11-02 13:56       ` Sasha Levin
2018-10-11 15:35 ` [PATCH 4.9 24/35] f2fs: fix invalid memory access Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 25/35] ucma: fix a use-after-free in ucma_resolve_ip() Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 26/35] ubifs: Check for name being NULL while mounting Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 27/35] sched/cputime: Convert kcpustat to nsecs Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 28/35] sched/cputime: Increment kcpustat directly on irqtime account Greg Kroah-Hartman
2018-10-11 15:35 ` Greg Kroah-Hartman [this message]
2018-10-11 15:35 ` [PATCH 4.9 30/35] ath10k: fix scan crash due to incorrect length calculation Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 31/35] ebtables: arpreply: Add the standard target sanity check Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 32/35] x86/fpu: Remove use_eager_fpu() Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 33/35] x86/fpu: Remove struct fpu::counter Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 34/35] Revert "perf: sync up x86/.../cpufeatures.h" Greg Kroah-Hartman
2018-10-11 15:35 ` [PATCH 4.9 35/35] x86/fpu: Finish excising eagerfpu Greg Kroah-Hartman
2018-10-11 22:40 ` [PATCH 4.9 00/35] 4.9.133-stable review Shuah Khan
2018-10-12  4:33 ` Naresh Kamboju
2018-10-12 12:20 ` Guenter Roeck
2018-10-12 13:52   ` Greg Kroah-Hartman
2018-10-12 14:44     ` Greg Kroah-Hartman
2018-10-12 17:07 ` Nathan Chancellor
2018-10-12 17:50 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181011152521.346719028@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=brouer@redhat.com \
    --cc=colona@arista.com \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=sgruszka@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=wanpeng.li@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).