From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Frederic Weisbecker <fweisbec@gmail.com>,
	Rik van Riel <riel@redhat.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Stanislaw Gruszka <sgruszka@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Wanpeng Li <wanpeng.li@hotmail.com>,
	Ingo Molnar <mingo@kernel.org>,
	Ivan Delalande <colona@arista.com>
Subject: [PATCH 4.9 33/35] sched/cputime: Fix ksoftirqd cputime accounting regression
Date: Thu, 18 Oct 2018 19:55:02 +0200
Message-ID: <20181018175427.156878860@linuxfoundation.org>
In-Reply-To: <20181018175422.506152522@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Frederic Weisbecker <fweisbec@gmail.com>

commit 25e2d8c1b9e327ed260edd13169cc22bc7a78bc6 upstream.

irq_time_read() returns the irqtime minus the ksoftirqd time. This
is necessary because irq_time_read() is used to subtract the IRQ time
from the sum_exec_runtime of a task. If we were to include the softirq
time of ksoftirqd, this task would subtract its own CPU time every time
it updates ksoftirqd->sum_exec_runtime which would therefore never
progress.
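
The effect described above can be illustrated with a toy userspace
model. This is only a sketch: the function names are hypothetical and
the real accounting goes through the runqueue clock, which is not
reproduced here. It assumes the runtime credited to a task over an
interval is the clock delta minus the IRQ time reported for that
interval.

```c
#include <stdint.h>

/*
 * Toy model (hypothetical names, not kernel code): the runtime credited
 * to a task over an interval is the clock delta minus the IRQ time
 * reported for that same interval.
 */
static uint64_t task_runtime_delta(uint64_t clock_delta, uint64_t irq_delta)
{
	/* The reported IRQ time can never exceed the interval itself. */
	if (irq_delta > clock_delta)
		irq_delta = clock_delta;
	return clock_delta - irq_delta;
}

/*
 * ksoftirqd spends `busy` ns serving softirqs during an interval of the
 * same length. If that time is also reported as IRQ time, nothing is
 * left to credit to the thread.
 */
static uint64_t ksoftirqd_runtime(uint64_t busy, int count_own_softirq_as_irq)
{
	uint64_t irq_delta = count_own_softirq_as_irq ? busy : 0;

	return task_runtime_delta(busy, irq_delta);
}
```

In this model ksoftirqd_runtime(1000, 1) yields 0 while
ksoftirqd_runtime(1000, 0) yields 1000, which is the "never progress"
case versus the intended one.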

But this behaviour got broken by:

  a499a5a14db ("sched/cputime: Increment kcpustat directly on irqtime account")

... which now includes ksoftirqd softirq time in the time returned by
irq_time_read().

This has resulted in wrong ksoftirqd cputime being reported to
userspace through /proc/stat, and thus in "top" not showing ksoftirqd
when it should after intense networking load.

ksoftirqd->stime happens to be correct but it gets scaled down by
sum_exec_runtime through task_cputime_adjusted().
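
A simplified model of that scaling, assuming plain proportional
scaling and ignoring the monotonicity handling that the kernel also
performs. scaled_stime() is a hypothetical helper for illustration,
not the kernel's implementation.

```c
#include <stdint.h>

/*
 * Simplified sketch (hypothetical helper): split the precise runtime
 * `rtime` between system and user time in the ratio of the sampled
 * stime and utime values, and return the system share.
 */
static uint64_t scaled_stime(uint64_t stime, uint64_t utime, uint64_t rtime)
{
	uint64_t total = stime + utime;

	if (total == 0)
		return 0;
	/* Overflow for very large values is ignored in this sketch. */
	return stime * rtime / total;
}
```

With stime = 800 and utime = 200, a healthy rtime of 1000 keeps the
reported system time at 800, while an rtime stuck at 10 shrinks it
to 8, even though the sampled stime itself is correct.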

To fix this, just account the strict IRQ time in a separate counter and
use it to report the IRQ time.
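
The idea can be sketched in userspace as follows. All toy_* names are
hypothetical, and the u64_stats synchronization used by the real code
is omitted. The sketch assumes that softirq time served by ksoftirqd
still lands in the softirq reporting bucket through another path,
which is why summing the reporting buckets over-counts.

```c
#include <stdint.h>

enum toy_stat { TOY_IRQ, TOY_SOFTIRQ, TOY_NR_STATS };

/* Toy per-CPU state (hypothetical; synchronization omitted). */
struct toy_irqtime {
	uint64_t cpustat[TOY_NR_STATS];	/* buckets reported to userspace */
	uint64_t total;			/* strict IRQ time only */
};

/* Hardirq time, or softirq time served outside of ksoftirqd. */
static void toy_account_strict_irq(struct toy_irqtime *it, uint64_t delta,
				   enum toy_stat idx)
{
	it->cpustat[idx] += delta;
	it->total += delta;
}

/* Softirq time served by ksoftirqd: reported, but not strict IRQ time. */
static void toy_account_ksoftirqd(struct toy_irqtime *it, uint64_t delta)
{
	it->cpustat[TOY_SOFTIRQ] += delta;
}

/* Behaviour before the fix: sum of the reporting buckets. */
static uint64_t toy_irq_time_read_old(const struct toy_irqtime *it)
{
	return it->cpustat[TOY_IRQ] + it->cpustat[TOY_SOFTIRQ];
}

/* Behaviour after the fix: the dedicated counter. */
static uint64_t toy_irq_time_read_new(const struct toy_irqtime *it)
{
	return it->total;
}
```

After accounting 100 ns of hardirq time, 50 ns of non-ksoftirqd
softirq time and 400 ns of ksoftirqd softirq time, the old read
returns 550 while the new one returns 150.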

Reported-and-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1493129448-5356-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/sched/cputime.c |   27 ++++++++++++++++-----------
 kernel/sched/sched.h   |    9 +++++++--
 2 files changed, 23 insertions(+), 13 deletions(-)

--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -37,6 +37,18 @@ void disable_sched_clock_irqtime(void)
 	sched_clock_irqtime = 0;
 }
 
+static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
+				  enum cpu_usage_stat idx)
+{
+	u64 *cpustat = kcpustat_this_cpu->cpustat;
+
+	u64_stats_update_begin(&irqtime->sync);
+	cpustat[idx] += delta;
+	irqtime->total += delta;
+	irqtime->tick_delta += delta;
+	u64_stats_update_end(&irqtime->sync);
+}
+
 /*
  * Called before incrementing preempt_count on {soft,}irq_enter
  * and before decrementing preempt_count on {soft,}irq_exit.
@@ -44,7 +56,6 @@ void disable_sched_clock_irqtime(void)
 void irqtime_account_irq(struct task_struct *curr)
 {
 	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
-	u64 *cpustat = kcpustat_this_cpu->cpustat;
 	s64 delta;
 	int cpu;
 
@@ -55,22 +66,16 @@ void irqtime_account_irq(struct task_str
 	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
 	irqtime->irq_start_time += delta;
 
-	u64_stats_update_begin(&irqtime->sync);
 	/*
 	 * We do not account for softirq time from ksoftirqd here.
 	 * We want to continue accounting softirq time to ksoftirqd thread
 	 * in that case, so as not to confuse scheduler with a special task
 	 * that do not consume any time, but still wants to run.
 	 */
-	if (hardirq_count()) {
-		cpustat[CPUTIME_IRQ] += delta;
-		irqtime->tick_delta += delta;
-	} else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) {
-		cpustat[CPUTIME_SOFTIRQ] += delta;
-		irqtime->tick_delta += delta;
-	}
-
-	u64_stats_update_end(&irqtime->sync);
+	if (hardirq_count())
+		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
+	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
+		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1743,6 +1743,7 @@ static inline void nohz_balance_exit_idl
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 struct irqtime {
+	u64			total;
 	u64			tick_delta;
 	u64			irq_start_time;
 	struct u64_stats_sync	sync;
@@ -1750,16 +1751,20 @@ struct irqtime {
 
 DECLARE_PER_CPU(struct irqtime, cpu_irqtime);
 
+/*
+ * Returns the irqtime minus the softirq time computed by ksoftirqd.
+ * Otherwise ksoftirqd's sum_exec_runtime is substracted its own runtime
+ * and never move forward.
+ */
 static inline u64 irq_time_read(int cpu)
 {
 	struct irqtime *irqtime = &per_cpu(cpu_irqtime, cpu);
-	u64 *cpustat = kcpustat_cpu(cpu).cpustat;
 	unsigned int seq;
 	u64 total;
 
 	do {
 		seq = __u64_stats_fetch_begin(&irqtime->sync);
-		total = cpustat[CPUTIME_SOFTIRQ] + cpustat[CPUTIME_IRQ];
+		total = irqtime->total;
 	} while (__u64_stats_fetch_retry(&irqtime->sync, seq));
 
 	return total;


