public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Luiz Capitulino <lcapitulino@redhat.com>
To: Wanpeng Li <kernellwp@gmail.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Wanpeng Li <wanpeng.li@hotmail.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Rik van Riel <riel@redhat.com>, Mike Galbraith <efault@gmx.de>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] tick/nohz: Fix wrong user and system time accouting against vtime sampling
Date: Fri, 7 Apr 2017 12:58:01 -0400	[thread overview]
Message-ID: <20170407125801.503ae098@redhat.com> (raw)
In-Reply-To: <1491460584-4908-1-git-send-email-wanpeng.li@hotmail.com>

On Wed,  5 Apr 2017 23:36:24 -0700
Wanpeng Li <kernellwp@gmail.com> wrote:

> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> This can be reproduced by setting up a CPU for nohz_full and pin two tasks 
> that hog the CPU 100% of the time to that CPU, top reports 70% system time 
> and 30% user time utilization sometimes. The trace like this:
> 
>    hog-11980 [015]   341.494491: function:    enter_from_user_mode <-- apic_timer_interrupt
> <idle>-0     [000]   341.494492: function:    smp_apic_timer_interrupt <-- apic_timer_interrupt
>    hog-11980 [015]   341.494492: function:    __context_tracking_exit <-- enter_from_user_mode
> <idle>-0     [000]   341.494492: function:    irq_enter <-- smp_apic_timer_interrupt
>    hog-11980 [015]   341.494492: bprint:      vtime_delta: diff=0 (now=4295008339 vtime_snap=4295008339)
>    hog-11980 [015]   341.494492: function:    smp_apic_timer_interrupt <-- apic_timer_interrupt
>    hog-11980 [015]   341.494492: function:    irq_enter <-- smp_apic_timer_interrupt
>    hog-11980 [015]   341.494493: function:    tick_sched_timer <-- __hrtimer_run_queues
> <idle>-0     [000]   341.494493: function:    tick_sched_timer <-- __hrtimer_run_queues
> <idle>-0     [000]   341.494493: function:    tick_do_update_jiffies64.part.14 <-- tick_sched_do_timer
> <idle>-0     [000]   341.494494: function:    do_timer <-- tick_do_update_jiffies64.part.14
>    hog-11980 [015]   341.494494: function:    irq_exit <-- smp_apic_timer_interrupt
> <idle>-0     [000]   341.494494: bprint:      do_timer: updated jiffies_64=4295008340 ticks=1
>    hog-11980 [015]   341.494494: function:    __context_tracking_enter <-- prepare_exit_to_usermode
>    hog-11980 [015]   341.494494: function:    vtime_user_enter <-- __context_tracking_enter
>    hog-11980 [015]   341.494495: bprint:      vtime_delta: diff=1000000 (now=4295008340 vtime_snap=4295008339)
>    hog-11980 [015]   341.494495: function:    __vtime_account_system <-- vtime_user_enter
>    hog-11980 [015]   341.494495: bprint:      get_vtime_delta: vtime_snap=4295008339 now=4295008340
>    hog-11980 [015]   341.494495: function:    account_system_time <-- __vtime_account_system
>    hog-11980 [015]   341.494495: bprint:      account_system_time: cputime=995488
> <idle>-0     [000]   341.494497: function:    irq_exit <-- smp_apic_timer_interrupt
> 
> In this trace, we see the following:
> 
>  1. On CPU15, we transition from user-space to kernel-space because
>     of a timer interrupt (it's the tick)
> 
>  2. vtimer_delta() returns 0, because jiffies didn't change since the
>     last accounting
> 
>  3. While CPU15 is executing in kernel-space, jiffies is updated
>     by CPU0
> 
>  4. When going back to user-space, vtime_delta() returns non-zero
>     and the whole time is accounted for system time (observe how
>     the cputime parameter in account_system_time() is less than 1ms)
> 
> 
> After commit ff9a9b4c433 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN 
> to jiffy granularity"), it reduces timing accuracy on nohz_full CPUs to 
> jiffy based sampling. So the issue can happen if all CPUs fire their ticks 
> at the same time:
> 
> 
>                CPU 0                  CPU 1
>                -----                  -----
>                                       exit_user() // no cputime update 
> tick X         update_jiffies
>                                       enter_user() // cputime update
> 
> 
>                                       exit_user() // no cputime update
> tick X+1       update_jiffies
>                                       enter_user() // cputime update
> 
> The nohz_full CPU receives an interrupt at the same time the timer 
> interrupt fires on the housekeeping CPU.
> 
> This patch offsets the tick to avert all ticks alignment in order 
> that the vtime sampling does not end up "in phase" with the jiffies 
> incrementing.
> 
> Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
> Suggested-by: Rik van Riel <riel@redhat.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>

Reviewed-and-Tested-by: Luiz Capitulino <lcapitulino@redhat.com>

Thanks for everyone who participated on tracking this down
and fixing it!

> ---
>  kernel/time/tick-sched.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 7fe53be..35d79f7 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1197,8 +1197,12 @@ void tick_setup_sched_timer(void)
>  	/* Get the next period (per-CPU) */
>  	hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());
>  
> -	/* Offset the tick to avert jiffies_lock contention. */
> -	if (sched_skew_tick) {
> +	/*
> +	 * Offset the tick to avert jiffies_lock contention, and all ticks
> +	 * alignment in order that the vtime sampling does not end up "in
> +	 * phase" with the jiffies incrementing.
> +	 */
> +	if (sched_skew_tick || tick_nohz_full_enabled()) {
>  		u64 offset = ktime_to_ns(tick_period) >> 1;
>  		do_div(offset, num_possible_cpus());
>  		offset *= smp_processor_id();

  parent reply	other threads:[~2017-04-07 16:58 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-06  6:36 [PATCH] tick/nohz: Fix wrong user and system time accouting against vtime sampling Wanpeng Li
2017-04-06 14:36 ` Rik van Riel
2017-04-07 16:58 ` Luiz Capitulino [this message]
2017-04-10 15:45 ` Thomas Gleixner
2017-04-10 18:01   ` Frederic Weisbecker
2017-04-10 21:51     ` Wanpeng Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170407125801.503ae098@redhat.com \
    --to=lcapitulino@redhat.com \
    --cc=efault@gmx.de \
    --cc=fweisbec@gmail.com \
    --cc=kernellwp@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=wanpeng.li@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox