All of lore.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Borislav Petkov <bp@alien8.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [sched next] overflowed cpu time for kernel threads in /proc/PID/stat
Date: Mon, 2 Sep 2013 15:07:45 +0200	[thread overview]
Message-ID: <20130902130744.GB2378@somewhere> (raw)
In-Reply-To: <20130902122845.GA2457@swordfish.minsk.epam.com>

On Mon, Sep 02, 2013 at 03:28:45PM +0300, Sergey Senozhatsky wrote:
> On (08/31/13 01:04), Frederic Weisbecker wrote:
> > > in cputime_adjust() `stime' is greater than `rtime', so `utime = rtime - stime'
> > > sets wrong value.
> > 
> > But stime should always be below rtime due to the calculation done by scale_stime()
> > which roughly sums up to:
> > 
> >       stime = (stime / stime + utime) * rtime
> > 
> > So this shouldn't happen.
> > 
> > I couldn't manage to reproduce it though. Can you still trigger it with latest -next?
> >
> 
> Hope this may help.
> I've added a silly check to make sure that `stime < rtime'
> 
> @@ -579,6 +582,10 @@ static void cputime_adjust(struct task_cputime *curr,
>         if (total) {
>                 stime = scale_stime((__force u64)stime,
>                                     (__force u64)rtime, (__force u64)total);
> +               if (stime > rtime) {
> +                       printk(KERN_ERR "Ooops: stime:%llu rtime:%llu\n", stime, rtime);
> +                       WARN_ON(1);
> +               }
>                 utime = rtime - stime;
>         } else {
>                 stime = rtime;
> 
> 
> and got the following results:
> 
> 
> [ 1295.311056] Ooops: stime:4622750929 rtime:4622750928
> [ 1295.311063] ------------[ cut here ]------------
> [ 1295.311072] WARNING: CPU: 1 PID: 2167 at kernel/sched/cputime.c:584 cputime_adjust+0x132/0x140()
> [ 1295.311123] CPU: 1 PID: 2167 Comm: top Tainted: G         C   3.11.0-rc7-next-20130902-dbg-dirty #2
> [ 1295.311126] Hardware name: Acer             Aspire 5741G    /Aspire 5741G    , BIOS V1.20 02/08/2011
> [ 1295.311128]  0000000000000009 ffff880099937c50 ffffffff8162fc85 0000000000000000
> [ 1295.311134]  ffff880099937c88 ffffffff8104ee2d ffff8801530bb158 ffff880099937df0
> [ 1295.311139]  00000001138994d0 00000001138994d1 ffff880153118ff0 ffff880099937c98
> [ 1295.311145] Call Trace:
> [ 1295.311155]  [<ffffffff8162fc85>] dump_stack+0x4e/0x82
> [ 1295.311162]  [<ffffffff8104ee2d>] warn_slowpath_common+0x7d/0xa0
> [ 1295.311166]  [<ffffffff8104ef0a>] warn_slowpath_null+0x1a/0x20
> [ 1295.311170]  [<ffffffff8108db72>] cputime_adjust+0x132/0x140
> [ 1295.311175]  [<ffffffff8108e941>] thread_group_cputime_adjusted+0x41/0x50
> [ 1295.311182]  [<ffffffff811f3ac4>] do_task_stat+0x9c4/0xb50
> [ 1295.311188]  [<ffffffff811f4904>] proc_tgid_stat+0x14/0x20
> [ 1295.311192]  [<ffffffff811edac4>] proc_single_show+0x54/0xa0
> [ 1295.311198]  [<ffffffff811a6d24>] seq_read+0x164/0x3d0
> [ 1295.311204]  [<ffffffff811805e1>] vfs_read+0xa1/0x180
> [ 1295.311208]  [<ffffffff8118119c>] SyS_read+0x4c/0xa0
> [ 1295.311213]  [<ffffffff8163ad06>] tracesys+0xd4/0xd9
> [ 1295.311216] ---[ end trace bab1e899ff45eea2 ]---
> 
> [ 1301.384396] Ooops: stime:4654072951 rtime:4654072950
> [ 1301.384401] ------------[ cut here ]------------
> [ 1301.384409] WARNING: CPU: 0 PID: 2167 at kernel/sched/cputime.c:584 cputime_adjust+0x132/0x140()
> [ 1301.384450] CPU: 0 PID: 2167 Comm: top Tainted: G        WC   3.11.0-rc7-next-20130902-dbg-dirty #2
> [ 1301.384452] Hardware name: Acer             Aspire 5741G    /Aspire 5741G    , BIOS V1.20 02/08/2011
> [ 1301.384454]  0000000000000009 ffff880099937c50 ffffffff8162fc85 0000000000000000
> [ 1301.384458]  ffff880099937c88 ffffffff8104ee2d ffff8801530bb158 ffff880099937df0
> [ 1301.384462]  0000000115678476 0000000115678477 ffff880153118ff0 ffff880099937c98
> [ 1301.384466] Call Trace:
> [ 1301.384475]  [<ffffffff8162fc85>] dump_stack+0x4e/0x82
> [ 1301.384481]  [<ffffffff8104ee2d>] warn_slowpath_common+0x7d/0xa0
> [ 1301.384484]  [<ffffffff8104ef0a>] warn_slowpath_null+0x1a/0x20
> [ 1301.384487]  [<ffffffff8108db72>] cputime_adjust+0x132/0x140
> [ 1301.384491]  [<ffffffff8108e941>] thread_group_cputime_adjusted+0x41/0x50
> [ 1301.384497]  [<ffffffff811f3ac4>] do_task_stat+0x9c4/0xb50
> [ 1301.384501]  [<ffffffff811f4904>] proc_tgid_stat+0x14/0x20
> [ 1301.384504]  [<ffffffff811edac4>] proc_single_show+0x54/0xa0
> [ 1301.384509]  [<ffffffff811a6d24>] seq_read+0x164/0x3d0
> [ 1301.384514]  [<ffffffff811805e1>] vfs_read+0xa1/0x180
> [ 1301.384517]  [<ffffffff8118119c>] SyS_read+0x4c/0xa0
> [ 1301.384521]  [<ffffffff8163ad06>] tracesys+0xd4/0xd9
> [ 1301.384523] ---[ end trace bab1e899ff45eea3 ]---
> 
> and so on.
> 
> 
> $ dmesg | grep Ooops
> [ 1295.311056] Ooops: stime:4622750929 rtime:4622750928
> [ 1301.384396] Ooops: stime:4654072951 rtime:4654072950
> [ 1307.438935] Ooops: stime:4687858107 rtime:4687858106
> [ 1313.493462] Ooops: stime:4724166945 rtime:4724166944
> [ 1316.521740] Ooops: stime:4741142233 rtime:4741142232
> [ 1325.605625] Ooops: stime:4793906690 rtime:4793906688
> [ 1331.648611] Ooops: stime:4308413629 rtime:4308413628
> [ 1337.708468] Ooops: stime:4510156747 rtime:4510156746
> [ 1340.744586] Ooops: stime:4880584390 rtime:4880584389
> [ 1343.773111] Ooops: stime:4442005879 rtime:4442005878
> [ 1343.773362] Ooops: stime:4558310943 rtime:4558310942
> [ 1349.815797] Ooops: stime:4927379646 rtime:4927379645
> [ 1352.836936] Ooops: stime:4942782836 rtime:4942782834
> [ 1358.885232] Ooops: stime:4979031940 rtime:4979031938
> [ 1364.977902] Ooops: stime:5019633914 rtime:5019633912
> [ 1364.978283] Ooops: stime:4601969101 rtime:4601969100
> [ 1364.978623] Ooops: stime:4826393815 rtime:4826393814
> [ 1371.034265] Ooops: stime:4894272175 rtime:4894272174
> [ 1374.077150] Ooops: stime:4688831829 rtime:4688831828
> [ 1374.077519] Ooops: stime:4916319315 rtime:4916319314
> [ 1377.110416] Ooops: stime:5084174048 rtime:5084174047
> [ 1377.111043] Ooops: stime:4949250198 rtime:4949250196
> [ 1380.137382] Ooops: stime:5096963048 rtime:5096963046
> [ 1380.137739] Ooops: stime:4975014880 rtime:4975014879
> [ 1383.161441] Ooops: stime:5002466396 rtime:5002466394
> [ 1389.212166] Ooops: stime:5140400038 rtime:5140400036
> [ 1392.239303] Ooops: stime:5082051028 rtime:5082051027
> [ 1395.270391] Ooops: stime:5170118636 rtime:5170118634
> [ 1398.303593] Ooops: stime:5184355318 rtime:5184355316


Thanks a lot Sergey for testing this further!

Interesting results, so rtime is always one or two units off stime after scaling.
Stanislaw made the scaling code with Linus and he has a better idea on the math guts
here.

Stanislaw, any idea?

> 
> 
> 	-ss

  reply	other threads:[~2013-09-02 13:07 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-20 11:14 [sched next] overflowed cpu time for kernel threads in /proc/PID/stat Sergey Senozhatsky
2013-08-20 15:15 ` Frederic Weisbecker
2013-08-20 15:35   ` Sergey Senozhatsky
2013-08-20 15:42     ` Frederic Weisbecker
2013-08-20 15:53       ` Sergey Senozhatsky
2013-08-20 18:48       ` Sergey Senozhatsky
2013-08-21 15:39       ` Sergey Senozhatsky
2013-08-30 23:04         ` Frederic Weisbecker
2013-08-31 19:01           ` Sergey Senozhatsky
2013-09-08 10:56             ` Frederic Weisbecker
2013-09-08 10:59             ` Frederic Weisbecker
2013-09-02 12:28           ` Sergey Senozhatsky
2013-09-02 13:07             ` Frederic Weisbecker [this message]
2013-09-02 13:39               ` Sergey Senozhatsky
2013-09-02 13:50               ` Stanislaw Gruszka
2013-09-02 14:00                 ` Sergey Senozhatsky
2013-09-03  8:43                   ` Stanislaw Gruszka
2013-09-03  9:33                     ` Sergey Senozhatsky
2013-09-03 13:15                     ` Sergey Senozhatsky
2013-09-03 18:09                     ` Sergey Senozhatsky
2013-09-03 18:32                       ` Ingo Molnar
2013-09-04 12:08                         ` [PATCH -tip] sched/cputime: do not scale when utime == 0 Stanislaw Gruszka
2013-09-04 12:33                           ` Ingo Molnar
2013-09-04 13:16                             ` [PATCH -tip v2] " Stanislaw Gruszka
2013-09-04 14:31                               ` Ingo Molnar
2013-09-04 14:37                               ` [tip:timers/urgent] sched/cputime: Do " tip-bot for Stanislaw Gruszka
2013-09-04 15:35                                 ` Frederic Weisbecker
2013-09-02 14:13                 ` [sched next] overflowed cpu time for kernel threads in /proc/PID/stat Frederic Weisbecker
2013-09-02 14:24                   ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130902130744.GB2378@somewhere \
    --to=fweisbec@gmail.com \
    --cc=bp@alien8.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=sgruszka@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.