All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rik van Riel <riel@redhat.com>
To: Luiz Capitulino <lcapitulino@redhat.com>
Cc: fweisbec@gmail.com, linux-kernel@vger.kernel.org,
	linux-rt-users@vger.kernel.org
Subject: Re: [BUG nohz]: wrong user and system time accounting
Date: Thu, 23 Mar 2017 21:08:38 -0400	[thread overview]
Message-ID: <1490317718.8850.61.camel@redhat.com> (raw)
In-Reply-To: <20170323210541.5222eaf6@redhat.com>

On Thu, 2017-03-23 at 21:05 -0400, Luiz Capitulino wrote:
> On Thu, 23 Mar 2017 20:56:02 -0400
> Rik van Riel <riel@redhat.com> wrote:
> 
> > On Thu, 2017-03-23 at 16:55 -0400, Luiz Capitulino wrote:
> > > When there are two or more tasks executing in user-space and
> > > taking 100% of a nohz_full CPU, top reports 70% system time
> > > and 30% user time utilization. Sometimes I'm even able to get
> > > 100% system time and 0% user time.
> > > 
> > > This was reproduced with latest Linus tree (093b995), but I
> > > don't believe it's a regression (at least not a recent one)
> > > as I can reproduce it with older kernels. Also, I have
> > > CONFIG_IRQ_TIME_ACCOUNTING=y and haven't tried to reproduce
> > > without it yet.
> > > 
> > > Below you'll find the steps to reproduce and some initial
> > > analysis.
> > > 
> > > Steps to reproduce
> > > ------------------
> > > 
> > > 1. Set up a CPU for nohz_full with isolcpus= nohz_full=
> > > 
> > > 2. Pin two tasks that hog the CPU 100% of the time to that CPU
> > > 
> > > 3. Run top -d1 and check system time
> > > 
> > > NOTE: When there's only one task hogging a nohz_full CPU, top
> > >       shows 100% user-time, as expected
> > > 
> > > Initial analysis
> > > ----------------
> > > 
> > > When tracing vtime accounting functions and the user-space/kernel
> > > transitions when the issue is taking place, I see several of the
> > > following:
> > > 
> > > hog-10552 [015]  1132.711104:
> > > function:             enter_from_user_mode <--
> > > apic_timer_interrupt
> > > hog-10552 [015]  1132.711105:
> > > function:             __context_tracking_exit <--
> > > enter_from_user_mode
> > > hog-10552 [015]  1132.711105:
> > > bprint:               __context_tracking_exit.part.4: new state=1
> > > cur
> > > state=1 active=1
> > > hog-10552 [015]  1132.711105:
> > > function:             vtime_account_user <--
> > > __context_tracking_exit.part.4
> > > hog-10552 [015]  1132.711105:
> > > function:             smp_apic_timer_interrupt <--
> > > apic_timer_interrupt
> > > hog-10552 [015]  1132.711106: function:             irq_enter <--
> > > smp_apic_timer_interrupt
> > > hog-10552 [015]  1132.711106:
> > > function:             tick_sched_timer
> > > <-- __hrtimer_run_queues
> > > hog-10552 [015]  1132.711108: function:             irq_exit <--
> > > smp_apic_timer_interrupt
> > > hog-10552 [015]  1132.711108:
> > > function:             __context_tracking_enter <--
> > > prepare_exit_to_usermode
> > > hog-10552 [015]  1132.711108:
> > > bprint:               __context_tracking_enter.part.2: new
> > > state=1
> > > cur state=0 active=1
> > > hog-10552 [015]  1132.711109:
> > > function:             vtime_user_enter
> > > <-- __context_tracking_enter.part.2
> > > hog-10552 [015]  1132.711109:
> > > function:             __vtime_account_system <-- vtime_user_enter
> > > hog-10552 [015]  1132.711109:
> > > function:             account_system_time <--
> > > __vtime_account_system
> > > 
> > > On entering the kernel due to a timer interrupt,
> > > vtime_account_user()
> > > skips user-time accounting. Then later on when returning to user-
> > > space,
> > > vtime_user_enter() is probably accounting the whole time (ie.
> > > user-
> > > space
> > > plus kernel-space) to system time.
> > > 
> > > Now, when does vtime_account_user() skips accounting? Well, when
> > > the
> > > time delta is less then one jiffie. This would imply that
> > > vtime_account_user()
> > > is being called less than one jiffie since the last accounting,
> > > but I
> > > haven't
> > > confirmed any of this yet.  
> > 
> > Jiffies should be advanced by the timer interrupt, on the
> > housekeeping CPU, which is not doing context tracking.
> 
> The hypothesis isn't that it wasn't advanced, but that we stayed in
> user-space less than 1ms.

That is part of the hypothesis. The other part of the hypothesis
involves jiffies advancing on the nohz_full & isolated CPU while
that CPU is in kernel mode 30% of the time.

I have no good explanation for the latter yet...

> > Why is the isolated/nohz_full CPU receiving timer interrupts
> > at all?
> > 
> > I thought it would not, but obviously I am wrong. What is
> > going on here?
> 
> There are two runnable SCHED_OTHER tasks on the nohz_full CPU. When
> that happens, the tick is re-activated. We're not nohz_full anymore,
> but accounting should still work.

Isn't the scheduler tick distinct from the timer interrupt,
or am I confused?

  reply	other threads:[~2017-03-24  1:08 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-23 20:55 [BUG nohz]: wrong user and system time accounting Luiz Capitulino
2017-03-24  0:56 ` Rik van Riel
2017-03-24  1:05   ` Luiz Capitulino
2017-03-24  1:08     ` Rik van Riel [this message]
2017-03-24  1:39       ` Luiz Capitulino
2017-03-27  5:33   ` lkml
2017-03-24  1:52 ` Wanpeng Li
2017-03-24  3:56   ` Luiz Capitulino
2017-03-27  1:56 ` Wanpeng Li
2017-03-27 17:35   ` Rik van Riel
2017-03-28  7:19     ` Wanpeng Li
     [not found]     ` <20170328132406.7d23579c@redhat.com>
     [not found]       ` <20170328161454.4a5d9e8b@redhat.com>
2017-03-28 21:01         ` Rik van Riel
2017-03-28 21:26           ` Luiz Capitulino
2017-03-29  9:56             ` Wanpeng Li
2017-03-29 12:56               ` Frederic Weisbecker
2017-03-28 21:24         ` Rik van Riel
2017-03-28 21:30           ` Luiz Capitulino
     [not found]       ` <20170329131656.1d6cb743@redhat.com>
2017-03-29 20:08         ` Rik van Riel
2017-03-29 22:54           ` Frederic Weisbecker
2017-03-30 12:57             ` Rik van Riel
2017-03-30  1:58           ` Wanpeng Li
2017-03-30 12:40             ` Frederic Weisbecker
2017-03-30 13:19               ` Mike Galbraith
2017-03-30  4:27           ` Mike Galbraith
2017-03-30  6:47             ` Wanpeng Li
2017-03-30 11:52               ` Wanpeng Li
2017-03-30 12:33                 ` Mike Galbraith
2017-03-30 13:38               ` Frederic Weisbecker
2017-03-30 13:59                 ` Wanpeng Li
2017-03-30 14:18                   ` Frederic Weisbecker
2017-03-30 21:25                     ` Luiz Capitulino
2017-03-31 20:09                       ` Luiz Capitulino
2017-03-31 23:24                         ` Frederic Weisbecker
2017-04-01  3:11                           ` Luiz Capitulino
2017-04-03 15:23                             ` Frederic Weisbecker
2017-04-03 19:06                               ` Luiz Capitulino
2017-04-04 17:36                                 ` Luiz Capitulino
2017-04-05 14:26                                   ` Rik van Riel
2017-04-11 11:03                 ` Wanpeng Li
2017-04-11 11:36                   ` Peter Zijlstra
2017-04-11 11:43                     ` Wanpeng Li
2017-04-11 14:22               ` Thomas Gleixner
2017-04-12 13:18                 ` Frederic Weisbecker
2017-04-12 14:57                   ` Thomas Gleixner
2017-04-12 15:14                     ` Frederic Weisbecker
2017-04-13  4:31                     ` Wanpeng Li
2017-04-13 13:32                       ` Frederic Weisbecker
2017-05-02 10:01                         ` Wanpeng Li
2017-05-15  8:17                           ` Wanpeng Li
2017-06-29 17:22                             ` Frederic Weisbecker
2017-03-30 12:51             ` Frederic Weisbecker
2017-03-30 13:02               ` Rik van Riel
2017-03-30 13:35                 ` Mike Galbraith
2017-04-03 14:40                   ` Frederic Weisbecker
2017-04-04  7:32                     ` Mike Galbraith
2017-03-30 13:44                 ` Frederic Weisbecker
     [not found]         ` <20170329221700.GB23895@lerouge>
2017-03-29 22:46           ` Wanpeng Li
2017-03-30  2:14             ` Luiz Capitulino
2017-03-30 12:27               ` Wanpeng Li
2017-03-27 18:38   ` Luiz Capitulino
2017-03-28  5:28     ` Wanpeng Li
2017-03-28 13:44       ` Luiz Capitulino
2017-03-29 13:04 ` Frederic Weisbecker
2017-03-29 13:14   ` Rik van Riel
2017-03-29 13:23     ` Luiz Capitulino
2017-03-29 21:12       ` Frederic Weisbecker
2017-03-30  1:48         ` Luiz Capitulino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1490317718.8850.61.camel@redhat.com \
    --to=riel@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=lcapitulino@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.