From: Luiz Capitulino <lcapitulino@redhat.com>
To: Wanpeng Li <kernellwp@gmail.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Wanpeng Li <wanpeng.li@hotmail.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
Rik van Riel <riel@redhat.com>, Mike Galbraith <efault@gmx.de>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] tick/nohz: Fix wrong user and system time accouting against vtime sampling
Date: Fri, 7 Apr 2017 12:58:01 -0400 [thread overview]
Message-ID: <20170407125801.503ae098@redhat.com> (raw)
In-Reply-To: <1491460584-4908-1-git-send-email-wanpeng.li@hotmail.com>
On Wed, 5 Apr 2017 23:36:24 -0700
Wanpeng Li <kernellwp@gmail.com> wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> This can be reproduced by setting up a CPU for nohz_full and pin two tasks
> that hog the CPU 100% of the time to that CPU, top reports 70% system time
> and 30% user time utilization sometimes. The trace like this:
>
> hog-11980 [015] 341.494491: function: enter_from_user_mode <-- apic_timer_interrupt
> <idle>-0 [000] 341.494492: function: smp_apic_timer_interrupt <-- apic_timer_interrupt
> hog-11980 [015] 341.494492: function: __context_tracking_exit <-- enter_from_user_mode
> <idle>-0 [000] 341.494492: function: irq_enter <-- smp_apic_timer_interrupt
> hog-11980 [015] 341.494492: bprint: vtime_delta: diff=0 (now=4295008339 vtime_snap=4295008339)
> hog-11980 [015] 341.494492: function: smp_apic_timer_interrupt <-- apic_timer_interrupt
> hog-11980 [015] 341.494492: function: irq_enter <-- smp_apic_timer_interrupt
> hog-11980 [015] 341.494493: function: tick_sched_timer <-- __hrtimer_run_queues
> <idle>-0 [000] 341.494493: function: tick_sched_timer <-- __hrtimer_run_queues
> <idle>-0 [000] 341.494493: function: tick_do_update_jiffies64.part.14 <-- tick_sched_do_timer
> <idle>-0 [000] 341.494494: function: do_timer <-- tick_do_update_jiffies64.part.14
> hog-11980 [015] 341.494494: function: irq_exit <-- smp_apic_timer_interrupt
> <idle>-0 [000] 341.494494: bprint: do_timer: updated jiffies_64=4295008340 ticks=1
> hog-11980 [015] 341.494494: function: __context_tracking_enter <-- prepare_exit_to_usermode
> hog-11980 [015] 341.494494: function: vtime_user_enter <-- __context_tracking_enter
> hog-11980 [015] 341.494495: bprint: vtime_delta: diff=1000000 (now=4295008340 vtime_snap=4295008339)
> hog-11980 [015] 341.494495: function: __vtime_account_system <-- vtime_user_enter
> hog-11980 [015] 341.494495: bprint: get_vtime_delta: vtime_snap=4295008339 now=4295008340
> hog-11980 [015] 341.494495: function: account_system_time <-- __vtime_account_system
> hog-11980 [015] 341.494495: bprint: account_system_time: cputime=995488
> <idle>-0 [000] 341.494497: function: irq_exit <-- smp_apic_timer_interrupt
>
> In this trace, we see the following:
>
> 1. On CPU15, we transition from user-space to kernel-space because
> of a timer interrupt (it's the tick)
>
> 2. vtimer_delta() returns 0, because jiffies didn't change since the
> last accounting
>
> 3. While CPU15 is executing in kernel-space, jiffies is updated
> by CPU0
>
> 4. When going back to user-space, vtime_delta() returns non-zero
> and the whole time is accounted for system time (observe how
> the cputime parameter in account_system_time() is less than 1ms)
>
>
> After commit ff9a9b4c433 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN
> to jiffy granularity"), it reduces timing accuracy on nohz_full CPUs to
> jiffy based sampling. So the issue can happen if all CPUs fire their ticks
> at the same time:
>
>
> CPU 0 CPU 1
> ----- -----
> exit_user() // no cputime update
> tick X update_jiffies
> enter_user() // cputime update
>
>
> exit_user() // no cputime update
> tick X+1 update_jiffies
> enter_user() // cputime update
>
> The nohz_full CPU receives an interrupt at the same time the timer
> interrupt fires on the housekeeping CPU.
>
> This patch offsets the tick to avert all ticks alignment in order
> that the vtime sampling does not end up "in phase" with the jiffies
> incrementing.
>
> Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
> Suggested-by: Rik van Riel <riel@redhat.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-and-Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Thanks for everyone who participated on tracking this down
and fixing it!
> ---
> kernel/time/tick-sched.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 7fe53be..35d79f7 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1197,8 +1197,12 @@ void tick_setup_sched_timer(void)
> /* Get the next period (per-CPU) */
> hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());
>
> - /* Offset the tick to avert jiffies_lock contention. */
> - if (sched_skew_tick) {
> + /*
> + * Offset the tick to avert jiffies_lock contention, and all ticks
> + * alignment in order that the vtime sampling does not end up "in
> + * phase" with the jiffies incrementing.
> + */
> + if (sched_skew_tick || tick_nohz_full_enabled()) {
> u64 offset = ktime_to_ns(tick_period) >> 1;
> do_div(offset, num_possible_cpus());
> offset *= smp_processor_id();
next prev parent reply other threads:[~2017-04-07 16:58 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-06 6:36 [PATCH] tick/nohz: Fix wrong user and system time accouting against vtime sampling Wanpeng Li
2017-04-06 14:36 ` Rik van Riel
2017-04-07 16:58 ` Luiz Capitulino [this message]
2017-04-10 15:45 ` Thomas Gleixner
2017-04-10 18:01 ` Frederic Weisbecker
2017-04-10 21:51 ` Wanpeng Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170407125801.503ae098@redhat.com \
--to=lcapitulino@redhat.com \
--cc=efault@gmx.de \
--cc=fweisbec@gmail.com \
--cc=kernellwp@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=tglx@linutronix.de \
--cc=wanpeng.li@hotmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox