public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Rik van Riel <riel@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>,
	Sonny Rao <sonnyrao@chromium.org>, Jann Horn <jann@thejh.net>,
	Robert Foss <robert.foss@collabora.com>,
	corbet@lwn.net, Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Konstantin Khlebnikov <koct9i@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	John Stultz <john.stultz@linaro.org>,
	ross.zwisler@linux.intel.com, jmarchan@redhat.com,
	Johannes Weiner <hannes@cmpxchg.org>,
	Kees Cook <keescook@chromium.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Robin Humble <plaguedbypenguins@gmail.com>,
	David Rientjes <rientjes@google.com>,
	eric.engestrom@imgtec.com, Janis Danisevskis <jdanis@google.com>,
	calvinowens@fb.com, Alexey Dobriyan <adobriyan@gmail.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	ldufour@linux.vnet.ibm.com, linux-doc@vger.kernel.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ben Zhang <benzh@chromium.org>, Bryan Freed <bfreed@chromium.org>,
	Filipe Brandenburger <filbranden@chromium.org>,
	Mateusz Guzik <mguzik@redhat.com>
Subject: Re: utime accounting regression since 4.6 (was: Re: [PACTH v2 0/3] Implement /proc/<pid>/totmaps)
Date: Tue, 23 Aug 2016 17:46:11 -0400	[thread overview]
Message-ID: <1471988771.2751.29.camel@redhat.com> (raw)
In-Reply-To: <20160823143330.GL23577@dhcp22.suse.cz>

On Tue, 2016-08-23 at 16:33 +0200, Michal Hocko wrote:
> On Tue 23-08-16 10:26:03, Michal Hocko wrote:
> > On Mon 22-08-16 19:47:09, Michal Hocko wrote:
> > > On Mon 22-08-16 19:29:36, Michal Hocko wrote:
> > > > On Mon 22-08-16 18:45:54, Michal Hocko wrote:
> > > > [...]
> > > > > I have no idea why those numbers are so different on my
> > > > > laptop
> > > > > yet. It surely looks suspicious. I will try to debug this
> > > > > further
> > > > > tomorrow.
> > > > 
> > > > Hmm, so I've tried to use my version of awk on other machine
> > > > and vice
> > > > versa and it didn't make any difference. So this is independent
> > > > on the
> > > > awk version it seems. So I've tried to strace /usr/bin/time and
> > > > wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0,
> > > > {ru_utime={0, 0}, ru_stime={0, 688438}, ...}) = 9128
> > > > 
> > > > so the kernel indeed reports 0 user time for some reason. Note
> > > > I
> > > > was testing with 4.7 and right now with 4.8.0-rc3 kernel (no
> > > > local
> > > > modifications). The other machine which reports non-0 utime is
> > > > 3.12
> > > > SLES kernel. Maybe I am hitting some accounting bug. At first I
> > > > was
> > > > suspecting CONFIG_NO_HZ_FULL because that is the main
> > > > difference between
> > > > my and the other machine but then I've noticed that the tests I
> > > > was
> > > > doing in kvm have this disabled too.. so it must be something
> > > > else.
> > > 
> > > 4.5 reports non-0 while 4.6 zero utime. NO_HZ configuration is
> > > the same
> > > in both kernels.
> > 
> > and one more thing. It is not like utime accounting would be
> > completely
> > broken and always report 0. Other commands report non-0 values even
> > on
> > 4.6 kernels. I will try to bisect this down later today.
> 
> OK, so it seems I found it. I was quite lucky because
> account_user_time
> is not all that popular function and there were basically no changes
> besides Riks ff9a9b4c4334 ("sched, time: Switch
> VIRT_CPU_ACCOUNTING_GEN
> to jiffy granularity") and that seems to cause the regression.
> Reverting
> the commit on top of the current mmotm seems to fix the issue for me.
> 
> And just to give Rik more context. While debugging overhead of the
> /proc/<pid>/smaps I am getting a misleading output from /usr/bin/time
> -v
> (source for ./max_mmap is [1])
> 
> root@test1:~# uname -r
> 4.5.0-rc6-bisect1-00025-gff9a9b4c4334
> root@test1:~# ./max_map 
> pid:2990 maps:65515
> root@test1:~# /usr/bin/time -v awk '/^Rss/{rss+=$2} /^Pss/{pss+=$2}
> END {printf "rss:%d pss:%d\n", rss, pss}' /proc/2990/smaps
> rss:263368 pss:262203
>         Command being timed: "awk /^Rss/{rss+=$2} /^Pss/{pss+=$2} END
> {printf "rss:%d pss:%d\n", rss, pss} /proc/2990/smaps"
>         User time (seconds): 0.00
>         System time (seconds): 0.45
>         Percent of CPU this job got: 98%
> 

> root@test1:~# /usr/bin/time -v awk '/^Rss/{rss+=$2} /^Pss/{pss+=$2}
> END {printf "rss:%d pss:%d\n", rss, pss}' /proc/3015/smaps
> rss:263316 pss:262199
>         Command being timed: "awk /^Rss/{rss+=$2} /^Pss/{pss+=$2} END
> {printf "rss:%d pss:%d\n", rss, pss} /proc/3015/smaps"
>         User time (seconds): 0.18
>         System time (seconds): 0.29
>         Percent of CPU this job got: 97%

The patch in question makes user and system
time accounting essentially tick-based. If
jiffies changes while the task is in user
mode, time gets accounted as user time, if
jiffies changes while the task is in system
mode, time gets accounted as system time.

If you get "unlucky", with a job like the
above, it is possible all time gets accounted
to system time.

This would be true both with the system running
with a periodic timer tick (before and after my
patch is applied), and in nohz_idle mode (after
my patch).

However, it does seem quite unlikely that you
get zero user time, since you have 125 timer
ticks in half a second. Furthermore, you do not
even have NO_HZ_FULL enabled...

Does the workload consistently get zero user
time?

If so, we need to dig further to see under
what precise circumstances that happens.

On my laptop, with kernel 4.6.3-300.fc24.x86_64
I get this:

$ /usr/bin/time -v awk '/^Rss/{rss+=$2} /^Pss/{pss+=$2} END {printf
"rss:%d pss:%d\n", rss, pss}' /proc/19825/smaps
rss:263368 pss:262145
	Command being timed: "awk /^Rss/{rss+=$2} /^Pss/{pss+=$2} END
{printf "rss:%d pss:%d\n", rss, pss} /proc/19825/smaps"
	User time (seconds): 0.64
	System time (seconds): 0.19
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.83

The main difference between your and my
NO_HZ config seems to be that NO_HZ_FULL
is set here. However, it is not enabled
at run time, so both of our systems
should only really get NO_HZ_IDLE
effectively.

Running tasks should get sampled with the
regular timer tick, while they are running.

In other words, vtime accounting should be
disabled in both of our tests, for everything
except the idle task.

Do I need to do anything special to reproduce
your bug, besides running the max mmap program
and the awk script?

  reply	other threads:[~2016-08-23 21:46 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-12 22:04 [PACTH v2 0/3] Implement /proc/<pid>/totmaps robert.foss
2016-08-12 22:04 ` [PACTH v2 1/3] mm, proc: " robert.foss
2016-08-13 14:39   ` Jann Horn
2016-08-15 13:57     ` Robert Foss
2016-08-15 20:14       ` Robert Foss
2016-08-12 22:04 ` [PACTH v2 2/3] Documentation/filesystems: Fixed typo robert.foss
2016-08-12 22:04 ` [PACTH v2 3/3] Documentation/filesystems: Added /proc/PID/totmaps documentation robert.foss
2016-08-14  9:04 ` [PACTH v2 0/3] Implement /proc/<pid>/totmaps Michal Hocko
2016-08-15 13:00   ` Robert Foss
2016-08-15 13:42     ` Michal Hocko
2016-08-15 16:25       ` Robert Foss
2016-08-16  7:12         ` Michal Hocko
2016-08-16 16:46           ` Robert Foss
2016-08-17  8:22             ` Michal Hocko
2016-08-17  9:31               ` Jann Horn
2016-08-17 13:03                 ` Michal Hocko
2016-08-17 16:48                   ` Robert Foss
2016-08-17 18:57                   ` Sonny Rao
2016-08-18  7:44                     ` Michal Hocko
2016-08-18 17:47                       ` Sonny Rao
2016-08-18 18:01                         ` Michal Hocko
2016-08-18 21:05                           ` Robert Foss
2016-08-19  6:27                             ` Sonny Rao
2016-08-19  2:26                           ` Minchan Kim
2016-08-19  6:47                             ` Sonny Rao
2016-08-19  8:05                             ` Michal Hocko
2016-08-19 18:20                               ` Sonny Rao
2016-08-22  0:07                               ` Minchan Kim
2016-08-22  7:40                                 ` Michal Hocko
2016-08-22 14:12                                   ` Minchan Kim
2016-08-22 14:37                                     ` Robert Foss
2016-08-22 16:45                                     ` Michal Hocko
2016-08-22 17:29                                       ` Michal Hocko
2016-08-22 17:47                                         ` Michal Hocko
2016-08-23  8:26                                           ` Michal Hocko
2016-08-23 14:33                                             ` utime accounting regression since 4.6 (was: Re: [PACTH v2 0/3] Implement /proc/<pid>/totmaps) Michal Hocko
2016-08-23 21:46                                               ` Rik van Riel [this message]
2016-08-24 16:56                                                 ` Michal Hocko
2016-09-30  9:49                                                 ` Michal Hocko
2016-09-30 12:35                                                   ` Rik van Riel
2016-08-19  6:43                           ` [PACTH v2 0/3] Implement /proc/<pid>/totmaps Sonny Rao
2016-08-19  7:59                             ` Michal Hocko
2016-08-19 17:57                               ` Sonny Rao
2016-08-22  7:54                                 ` Michal Hocko
2016-08-22 22:44                                   ` Sonny Rao
2016-08-24 10:14                                     ` Marcin Jabrzyk
2016-08-30  8:20                                       ` Michal Hocko
2016-08-29 14:37                                     ` Michal Hocko
2016-08-30  8:15                                       ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1471988771.2751.29.camel@redhat.com \
    --to=riel@redhat.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=benzh@chromium.org \
    --cc=bfreed@chromium.org \
    --cc=calvinowens@fb.com \
    --cc=corbet@lwn.net \
    --cc=eric.engestrom@imgtec.com \
    --cc=filbranden@chromium.org \
    --cc=gorcunov@openvz.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jann@thejh.net \
    --cc=jdanis@google.com \
    --cc=jmarchan@redhat.com \
    --cc=john.stultz@linaro.org \
    --cc=keescook@chromium.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=koct9i@gmail.com \
    --cc=ldufour@linux.vnet.ibm.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mguzik@redhat.com \
    --cc=mhocko@kernel.org \
    --cc=minchan@kernel.org \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=plaguedbypenguins@gmail.com \
    --cc=rientjes@google.com \
    --cc=robert.foss@collabora.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=sonnyrao@chromium.org \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox