public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: riel@redhat.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	mingo@kernel.org, pbonzini@redhat.com, fweisbec@redhat.com,
	wanpeng.li@hotmail.com, efault@gmx.de, tglx@linutronix.de,
	rkrcmar@redhat.com
Subject: Re: [PATCH 1/4] sched,time: count actually elapsed irq & softirq time
Date: Tue, 5 Jul 2016 14:40:34 +0200	[thread overview]
Message-ID: <20160705124033.GA5332@lerouge> (raw)
In-Reply-To: <1467315350-3152-2-git-send-email-riel@redhat.com>

On Thu, Jun 30, 2016 at 03:35:47PM -0400, riel@redhat.com wrote:
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 3d60e5d76fdb..018bae2ada36 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -79,40 +79,50 @@ void irqtime_account_irq(struct task_struct *curr)
>  }
>  EXPORT_SYMBOL_GPL(irqtime_account_irq);
>  
> -static int irqtime_account_hi_update(void)
> +static cputime_t irqtime_account_hi_update(cputime_t maxtime)
>  {
>  	u64 *cpustat = kcpustat_this_cpu->cpustat;
>  	unsigned long flags;
> -	u64 latest_ns;
> -	int ret = 0;
> +	cputime_t irq_cputime;
>  
>  	local_irq_save(flags);
> -	latest_ns = this_cpu_read(cpu_hardirq_time);
> -	if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_IRQ])
> -		ret = 1;
> +	irq_cputime = nsecs_to_cputime(this_cpu_read(cpu_hardirq_time)) -
> +		      cpustat[CPUTIME_IRQ];

We might want to keep nsecs_to_cputime64(). If cputime_t == jiffies_t == unsigned long,
we may have a problem after 49 days of interrupts. Arguably that's a lot of IRQs
but lets be paranoid.

> +	irq_cputime = min(irq_cputime, maxtime);
> +	cpustat[CPUTIME_IRQ] += irq_cputime;
>  	local_irq_restore(flags);
> -	return ret;
> +	return irq_cputime;
>  }
>  
> -static int irqtime_account_si_update(void)
> +static cputime_t irqtime_account_si_update(cputime_t maxtime)
>  {
>  	u64 *cpustat = kcpustat_this_cpu->cpustat;
>  	unsigned long flags;
> -	u64 latest_ns;
> -	int ret = 0;
> +	cputime_t softirq_cputime;
>  
>  	local_irq_save(flags);
> -	latest_ns = this_cpu_read(cpu_softirq_time);
> -	if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_SOFTIRQ])
> -		ret = 1;
> +	softirq_cputime = nsecs_to_cputime(this_cpu_read(cpu_softirq_time)) -

Ditto.

> +			  cpustat[CPUTIME_SOFTIRQ];
> +	softirq_cputime = min(softirq_cputime, maxtime);
> +	cpustat[CPUTIME_SOFTIRQ] += softirq_cputime;
>  	local_irq_restore(flags);
> -	return ret;
> +	return softirq_cputime;
>  }
>  
>  #else /* CONFIG_IRQ_TIME_ACCOUNTING */
>  
>  #define sched_clock_irqtime	(0)
>  
> +static cputime_t irqtime_account_hi_update(cputime_t dummy)
> +{
> +	return 0;
> +}
> +
> +static cputime_t irqtime_account_si_update(cputime_t dummy)
> +{
> +	return 0;
> +}
> +
>  #endif /* !CONFIG_IRQ_TIME_ACCOUNTING */
>  
>  static inline void task_group_account_field(struct task_struct *p, int index,
> @@ -257,32 +267,45 @@ void account_idle_time(cputime_t cputime)
>  		cpustat[CPUTIME_IDLE] += (__force u64) cputime;
>  }
>  
> -static __always_inline unsigned long steal_account_process_tick(unsigned long max_jiffies)
> +static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
>  {
>  #ifdef CONFIG_PARAVIRT
>  	if (static_key_false(&paravirt_steal_enabled)) {
> +		cputime_t steal_cputime;
>  		u64 steal;
> -		unsigned long steal_jiffies;
>  
>  		steal = paravirt_steal_clock(smp_processor_id());
>  		steal -= this_rq()->prev_steal_time;
> +		this_rq()->prev_steal_time += steal;

We are accounting steal_cputime but you make it remember steal_nsecs. This is
leaking quite some steal time in the way.

Imagine that cputime_t == jiffies_t and HZ=100.
paravirt_steal_clock() returns 199 nsecs. prev_steal_time gets added those 199.
nsecs_to_cputime() return 1 jiffy (we are one nsec off the next jiffy). So
account_steal_time() is accounting 1 jiffy and the 99 remaining nsecs are leaked.
If some more steal time is to be accounted on the next tick, the 99 previous nsecs
are forgotten.

A non-leaking sequence would rather be:

	steal = paravirt_steal_clock(smp_processor_id());
	steal -= this_rq()->prev_steal_time;
	steal_cputime = min(nsecs_to_cputime(steal), maxtime);
	account_steal_time(steal_cputime);
	this_rq()->prev_steal_time += cputime_to_nsecs(steal_cputime);

Thanks!

>  
> -		/*
> -		 * steal is in nsecs but our caller is expecting steal
> -		 * time in jiffies. Lets cast the result to jiffies
> -		 * granularity and account the rest on the next rounds.
> -		 */
> -		steal_jiffies = min(nsecs_to_jiffies(steal), max_jiffies);
> -		this_rq()->prev_steal_time += jiffies_to_nsecs(steal_jiffies);
> +		steal_cputime = min(nsecs_to_cputime(steal), maxtime);
> +		account_steal_time(steal_cputime);
>  
> -		account_steal_time(jiffies_to_cputime(steal_jiffies));
> -		return steal_jiffies;
> +		return steal_cputime;

  reply	other threads:[~2016-07-05 12:40 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-30 19:35 [PATCH v3 0/4] sched,time: fix irq time accounting with nohz_idle riel
2016-06-30 19:35 ` [PATCH 1/4] sched,time: count actually elapsed irq & softirq time riel
2016-07-05 12:40   ` Frederic Weisbecker [this message]
2016-07-05 13:08     ` Rik van Riel
2016-07-05 14:00       ` Frederic Weisbecker
2016-07-05 16:47     ` [PATCH v3 " Rik van Riel
2016-07-06 14:15       ` Frederic Weisbecker
2016-06-30 19:35 ` [PATCH 2/4] nohz,cputime: replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code riel
2016-06-30 19:35 ` [PATCH 3/4] irqtime: add irq type parameter to irqtime_account_irq riel
2016-06-30 19:35 ` [PATCH 4/4] irqtime: drop local_irq_save/restore from irqtime_account_irq riel
2016-07-08 12:30   ` Frederic Weisbecker
2016-07-08 13:19     ` Rik van Riel
2016-07-08 14:01       ` Frederic Weisbecker
2016-07-08 14:34       ` Paolo Bonzini
2016-07-08 15:56         ` Rik van Riel
2016-07-08 23:58           ` Frederic Weisbecker
2016-07-05 13:02 ` [PATCH v3 0/4] sched,time: fix irq time accounting with nohz_idle Nikolay Borisov
2016-07-05 13:09   ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160705124033.GA5332@lerouge \
    --to=fweisbec@gmail.com \
    --cc=efault@gmx.de \
    --cc=fweisbec@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=wanpeng.li@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox