public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Wanpeng Li <wanpeng.li@hotmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Rik van Riel <riel@redhat.com>,
	Stanislaw Gruszka <sgruszka@redhat.com>,
	brouer@redhat.com
Subject: Re: [PATCH] sched/cputime: Fix ksoftirqd cputime accounting regression
Date: Wed, 26 Apr 2017 16:30:00 +0200	[thread overview]
Message-ID: <20170426163000.19082290@redhat.com> (raw)
In-Reply-To: <1493129448-5356-1-git-send-email-fweisbec@gmail.com>

On Tue, 25 Apr 2017 16:10:48 +0200
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> irq_time_read() returns the irqtime minus the ksoftirqd time. This
> is necessary because irq_time_read() is used to substract the IRQ time
> from the sum_exec_runtime of a task. If we were to include the softirq
> time of ksoftirqd, this task would substract its own CPU time everytime
> it updates ksoftirqd->sum_exec_runtime which would therefore never
> progress.
> 
> But this behaviour got broken by commit a499a5a14db:
> 	("sched/cputime: Increment kcpustat directly on irqtime account")
> which now includes ksoftirqd softirq time in the time returned by
> irq_time_read().
> 
> This has resulted in wrong ksotfirqd cputime reported to userspace
                             ^^^^^^^^^
Typo in commit message, guess maintainer can fix/amend that.

> through /proc/stat and thus "top" not showing ksoftirqd when it should
> after intense networking load.
> 
> ksoftirqd->stime happens to be correct but it gets scaled down by
> sum_exec_runtime through task_cputime_adjusted().
> 
> To fix this, just account the strict IRQ time in a separate counter and
> use it to report the IRQ time.
> 
> Reported-and-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

Confirming I have tested patch before Frederic posted it, and have been
running with this applied on my net-next testlab kernels since Monday.

And it does fix the bisected regression I reported here:
  http://lkml.kernel.org/r/20170328101403.34a82fbf@redhat.com


> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Stanislaw Gruszka <sgruszka@redhat.com>
> Cc: Wanpeng Li <wanpeng.li@hotmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>

Please add:

Fixed: a499a5a14dbd ("sched/cputime: Increment kcpustat directly on irqtime account")

--Jesper


> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

> ---
>  kernel/sched/cputime.c | 27 ++++++++++++++++-----------
>  kernel/sched/sched.h   |  9 +++++++--
>  2 files changed, 23 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index f3778e2b..aea3135 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -34,6 +34,18 @@ void disable_sched_clock_irqtime(void)
>  	sched_clock_irqtime = 0;
>  }
>  
> +static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
> +				  enum cpu_usage_stat idx)
> +{
> +	u64 *cpustat = kcpustat_this_cpu->cpustat;
> +
> +	u64_stats_update_begin(&irqtime->sync);
> +	cpustat[idx] += delta;
> +	irqtime->total += delta;
> +	irqtime->tick_delta += delta;
> +	u64_stats_update_end(&irqtime->sync);
> +}
> +
>  /*
>   * Called before incrementing preempt_count on {soft,}irq_enter
>   * and before decrementing preempt_count on {soft,}irq_exit.
> @@ -41,7 +53,6 @@ void disable_sched_clock_irqtime(void)
>  void irqtime_account_irq(struct task_struct *curr)
>  {
>  	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
> -	u64 *cpustat = kcpustat_this_cpu->cpustat;
>  	s64 delta;
>  	int cpu;
>  
> @@ -52,22 +63,16 @@ void irqtime_account_irq(struct task_struct *curr)
>  	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
>  	irqtime->irq_start_time += delta;
>  
> -	u64_stats_update_begin(&irqtime->sync);
>  	/*
>  	 * We do not account for softirq time from ksoftirqd here.
>  	 * We want to continue accounting softirq time to ksoftirqd thread
>  	 * in that case, so as not to confuse scheduler with a special task
>  	 * that do not consume any time, but still wants to run.
>  	 */
> -	if (hardirq_count()) {
> -		cpustat[CPUTIME_IRQ] += delta;
> -		irqtime->tick_delta += delta;
> -	} else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) {
> -		cpustat[CPUTIME_SOFTIRQ] += delta;
> -		irqtime->tick_delta += delta;
> -	}
> -
> -	u64_stats_update_end(&irqtime->sync);
> +	if (hardirq_count())
> +		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
> +	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
> +		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
>  }
>  EXPORT_SYMBOL_GPL(irqtime_account_irq);
>  
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 5cbf922..767aab3 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1869,6 +1869,7 @@ static inline void nohz_balance_exit_idle(unsigned int cpu) { }
>  
>  #ifdef CONFIG_IRQ_TIME_ACCOUNTING
>  struct irqtime {
> +	u64			total;
>  	u64			tick_delta;
>  	u64			irq_start_time;
>  	struct u64_stats_sync	sync;
> @@ -1876,16 +1877,20 @@ struct irqtime {
>  
>  DECLARE_PER_CPU(struct irqtime, cpu_irqtime);
>  
> +/*
> + * Returns the irqtime minus the softirq time computed by ksoftirqd.
> + * Otherwise ksoftirqd's sum_exec_runtime is substracted its own runtime
> + * and never move forward.
> + */
>  static inline u64 irq_time_read(int cpu)
>  {
>  	struct irqtime *irqtime = &per_cpu(cpu_irqtime, cpu);
> -	u64 *cpustat = kcpustat_cpu(cpu).cpustat;
>  	unsigned int seq;
>  	u64 total;
>  
>  	do {
>  		seq = __u64_stats_fetch_begin(&irqtime->sync);
> -		total = cpustat[CPUTIME_SOFTIRQ] + cpustat[CPUTIME_IRQ];
> +		total = irqtime->total;
>  	} while (__u64_stats_fetch_retry(&irqtime->sync, seq));
>  
>  	return total;



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

  reply	other threads:[~2017-04-26 14:30 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-25 14:10 [PATCH] sched/cputime: Fix ksoftirqd cputime accounting regression Frederic Weisbecker
2017-04-26 14:30 ` Jesper Dangaard Brouer [this message]
2017-04-26 15:44 ` Rik van Riel
2017-04-27 14:16 ` [tip:sched/urgent] " tip-bot for Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170426163000.19082290@redhat.com \
    --to=brouer@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=sgruszka@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=wanpeng.li@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox